In AI-assisted decision-making, humans often passively review the AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and to face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories of human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for the design of future AI-assisted decision tools.
This yielded 226 human deliberative statements. To extract diverse intentions from these statements, two authors conducted qualitative coding using thematic analysis, and the results are summarized in Table 2. We then iteratively refined LLM prompts based on the collected data and built an "Intention Analyzer" that identifies the themes of participant statements with 96% accuracy. Specific prompts are available in the supplementary materials.

I-2. Deliberation Facilitator. This component addresses DC2 (Justification Rationality) and DC5 (Respect and Agreement), as discussed in Sec. 3.3, through corresponding LLM prompts. In particular, we instruct the LLM to (1) demonstrate a nuanced understanding of the human's statement; (2) analyze the specific content of the statement; and (3) provide a thoughtful and critical response. For detailed prompts, please refer to the supplementary materials.
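As an illustration, the three instructions given to the Deliberation Facilitator could be assembled into a system prompt along the following lines. This is a minimal sketch, not the authors' actual prompt; the function and variable names are assumptions.

```python
# Hypothetical sketch of a Deliberation Facilitator prompt builder.
# The three numbered instructions mirror those described in the text;
# everything else (names, wording) is illustrative.

FACILITATOR_INSTRUCTIONS = [
    "Demonstrate a nuanced understanding of the human's statement.",
    "Analyze the specific content of the statement.",
    "Provide a thoughtful and critical response.",
]

def build_facilitator_prompt(dimension: str, human_statement: str) -> str:
    """Assemble an LLM prompt asking for a deliberative reply on one dimension."""
    steps = "\n".join(
        f"{i}. {instruction}"
        for i, instruction in enumerate(FACILITATOR_INSTRUCTIONS, start=1)
    )
    return (
        f"You are an AI deliberation partner discussing the dimension "
        f"'{dimension}' with a human decision-maker.\n"
        f'The human said: "{human_statement}"\n'
        f"Respond by following these steps:\n{steps}"
    )
```

The returned string would then be sent to the LLM together with the conversation history; the point of the sketch is only that each facilitator turn is grounded in the same fixed set of deliberative instructions.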
I-3. Argument Evaluator. The main function of this component is to assess the strength of a person's statement, which informs updates to AI opinions. Drawing on established theories of human argumentation evaluation [55, 127, 128], we devised a comprehensive scoring mechanism with nine key items: Clarity, Relevance, Evidence, Logic, Consistency, Counterarguments, Depth, Credibility, and Alignment. These criteria are integrated into a prompt that guides the LLM in evaluating human statements. We then average and scale the scores to obtain the overall human argument strength (from 0 to 1; 0: weakest, 1: strongest). Additional details, including scoring schemas and prompts, can be found in the supplementary materials.

In summary, the communication layer can engage in general interactions with humans. To imbue it with specific model opinions and knowledge, we require a control layer to mediate between the LLM and the DS-model.

II. Control Layer.
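The Argument Evaluator's averaging-and-scaling step described above can be sketched as follows. The text specifies only that scores are averaged and scaled to [0, 1]; the assumption here that each criterion is rated on a 1-5 scale is ours, for illustration.

```python
# Hypothetical sketch of the Argument Evaluator's aggregation step.
# Assumption (not from the paper): the LLM rates each criterion 1-5.

CRITERIA = [
    "Clarity", "Relevance", "Evidence", "Logic", "Consistency",
    "Counterarguments", "Depth", "Credibility", "Alignment",
]

def argument_strength(scores: dict[str, int], lo: int = 1, hi: int = 5) -> float:
    """Average the nine criterion scores and scale to [0, 1].

    0 corresponds to the weakest possible argument, 1 to the strongest.
    """
    missing = set(CRITERIA) - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    mean = sum(scores[c] for c in CRITERIA) / len(CRITERIA)
    return (mean - lo) / (hi - lo)
```

With this scaling, an argument rated at the bottom of the range on every criterion maps to 0, one rated at the top maps to 1, and the resulting strength can drive how far the AI shifts its opinion.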
This layer manages the querying and extraction of specific DS-model opinions and knowledge while controlling the entire conversation flow.

II-1. Dialogue/Discussion Controller. This component serves as the control center for the discussion process.
It unfolds as follows:

[Thought Elicitation] Participants express their WoE on each dimension; the AI responds with its perspectives.

[Discussion] The AI highlights commonalities and discrepancies, inviting participants to provide justifications or question differing viewpoints, and responds with critical insights. All three components of the Communication layer (Intention Analyzer, Deliberation Facilitator, and Argument Evaluator) play vital roles in this phase. After one round of discussion, the AI offers input options for participants to update their opinion, maintain it, or continue the discussion, and proceeds based on participants' choices. If they wish to move to the next dimension, the AI summarizes any pending dimensions, highlighting differences. Participants can choose to explore untouched dimensions, revisit previous discussions, or skip this round. Participants have the flexibility to initiate dialogues on any dimension at any time, using qui
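The flow above can be sketched as a small state machine. This is an illustrative reconstruction under our own naming, not the authors' implementation: states, option labels, and method names are all assumptions.

```python
# Illustrative sketch of the Dialogue/Discussion Controller as a state
# machine over per-dimension discussion phases. All names are assumed.

class DiscussionController:
    STATES = ("ELICITATION", "DISCUSSION", "CHOICE", "SUMMARY")

    def __init__(self, dimensions: list[str]):
        self.pending = list(dimensions)
        self.current = self.pending.pop(0) if self.pending else None
        self.state = "ELICITATION"

    def elicited(self) -> None:
        """Both sides have expressed opinions on the current dimension."""
        self.state = "DISCUSSION"

    def round_finished(self) -> None:
        """One discussion round done; offer update/maintain/continue options."""
        self.state = "CHOICE"

    def choose(self, option: str) -> None:
        """Advance according to the participant's choice."""
        if option == "continue":           # keep discussing this dimension
            self.state = "DISCUSSION"
        elif option in ("update", "maintain"):
            if self.pending:               # move on to the next dimension
                self.current = self.pending.pop(0)
                self.state = "ELICITATION"
            else:                          # nothing left: summarize
                self.state = "SUMMARY"
        else:
            raise ValueError(f"unknown option: {option}")
```

A real controller would additionally let participants jump to an arbitrary dimension at any time, as the text describes; the sketch shows only the default sequential flow.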