Adult - Anxiety
Artificial Intelligence’s Recognition of Generalized Anxiety Disorder and Recommendations for Treatment: A Comparative Study Between ChatGPT-3.5 and Humans
Estefania Andrade, B.A.
Clinical Psychology Master's Student Researcher
University of Houston-Clear Lake
McAllen, Texas, United States
Jessica M. Montoya, B.A.
Master of Arts in Clinical Psychology Student
University of Houston-Clear Lake
Houston, Texas, United States
Sean Lauderdale, Ph.D.
Assistant Professor
University of Houston-Clear Lake
Houston, Texas, United States
Hela Desai, B.S.
Graduate Student
University of Houston-Clear Lake
Houston, Texas, United States
Anxiety disorders are the most common mental health problem in the US (Greenberg et al., 1999). Generalized Anxiety Disorder (GAD) is the anxiety disorder most frequently addressed in primary health care settings, with a prevalence of 4% to 7% among US adults (Hoge et al., 2012). Patients with GAD have higher health care use and costs, increased risk for suicide attempts, and reduced life satisfaction (Hoge et al., 2012). Over 20% of US internet users search for mental health information, and the majority report that their findings help them make health care decisions (Pew Internet & American Life Project, 2006). With the release of artificial intelligence (AI) tools, researchers have investigated whether AI could enhance mental health care. Previous research suggests that AI provides unbiased, evidence-based recommendations for major depression better than physicians do (Levkovich & Elyoseph, 2023). However, no research has examined how successfully AI identifies GAD and recommends mental health care for it. This study assesses the utility of AI as a resource for clinical decision making about GAD.

ChatGPT-3.5, an AI model that learns from human feedback, was used to assess how well AI could identify causes of GAD and make treatment recommendations. A vignette describing an individual with GAD was presented to ChatGPT-3.5, followed by questions asking it to identify the perceived causes and symptoms and to provide treatment recommendations (Coles & Coleman, 2010). Other questions asked whether ChatGPT-3.5 found the problem distressing or difficult to treat, whether it sympathized with the individual, and asked it to rate potential treatments (Furnham & Lousley, 2013). The AI was also asked whether the individual met DSM-5-TR criteria A-D for GAD (Clark et al., 2017). All trials were conducted in an incognito browser, and each conversation was deleted after the data were recorded. The full set of questions was presented to ChatGPT-3.5 in 10 separate trials.

ChatGPT-3.5 correctly identified GAD 100% of the time, significantly more often than human participants (41.4%; χ²(1) = 13.46, p < .001). ChatGPT-3.5 selected stress as the primary cause half (50%) of the time, which did not differ significantly from human responses (61.7%; χ²(1) = 0.556, p = .46). The AI chose environmental factors as the primary cause in the remaining trials (50%), which differed significantly from human participants (16.6%; χ²(1) = 7.36, p = .0067). ChatGPT-3.5 advised that the individual seek professional help in every trial (100%), which also differed significantly from human participants (51.8%; χ²(1) = 8.99, p = .0027). ChatGPT-3.5 and human participants did not differ significantly in their ratings of whether the individual should see a psychologist (t(325) = 1.52, p = .129) or a general practitioner (t(325) = 0.69, p = .491).

ChatGPT-3.5 identified GAD in the vignette more accurately than its human counterparts. The differences between AI and human responses, as well as the AI's accuracy in defining GAD symptoms, will be discussed further.
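As an illustration of the proportion comparisons reported above, the sketch below shows how a chi-square test of independence could be computed in Python with SciPy. The ChatGPT-3.5 counts (10 of 10 trials correct) come from the abstract; the human sample size and correct-response count are placeholder assumptions chosen only to demonstrate the calculation, not the study's actual data.

```python
"""Illustrative chi-square comparison of GAD identification rates.

Assumed values are marked below; only the ChatGPT-3.5 counts are taken
from the abstract.
"""
import numpy as np
from scipy.stats import chi2_contingency

chatgpt_correct, chatgpt_trials = 10, 10       # 100% of 10 trials (from abstract)
human_correct, human_respondents = 131, 317    # assumed sample; 131/317 ≈ 41.4%

# 2 x 2 contingency table: rows = rater group, columns = correct vs. incorrect
observed = np.array([
    [chatgpt_correct, chatgpt_trials - chatgpt_correct],
    [human_correct, human_respondents - human_correct],
])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.4f}")
```

With the study's actual human counts, this test would correspond to the reported χ²(1) comparisons; the rating comparisons (psychologist and general practitioner) could be examined analogously with an independent-samples t-test such as scipy.stats.ttest_ind.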