Adult - Anxiety
Artificial Intelligence’s Recognition of Social Anxiety Disorder and Recommendations for Treatment: A Comparative Study Between ChatGPT-3.5 and Humans
Jessica M. Montoya, B.A.
Master of Arts in Clinical Psychology Student
University of Houston-Clear Lake
Houston, Texas, United States
Estefania Andrade, B.A.
Clinical Psychology Master's Student Researcher
University of Houston-Clear Lake
McAllen, Texas, United States
Hela Desai, B.S.
Graduate Student
University of Houston-Clear Lake
Houston, Texas, United States
Sean Lauderdale, Ph.D.
Assistant Professor
University of Houston-Clear Lake
Houston, Texas, United States
ChatGPT-3.5 is a large language model (LLM) that allows users to input a variety of requests. ChatGPT-3.5 draws on information from the internet and other databases to generate narrative responses. ChatGPT-3.5 learns from human feedback but may be biased in the information it provides (OpenAI, 2022). Recent research has examined ChatGPT-3.5’s and ChatGPT-4’s ability to recognize suicide risk (Elyoseph & Levkovich, 2023; Levkovich & Elyoseph, 2023).
Currently, there is no research on ChatGPT-3.5’s identification of social anxiety disorder (SAD). This gap is important given the prevalence of SAD (7%) in the US and ongoing efforts to incorporate LLMs into clinical decision-making. The current study assessed ChatGPT-3.5’s responses to a SAD vignette to understand how innovations in artificial intelligence may influence clinical practice.
A vignette of an individual with SAD was used to assess ChatGPT-3.5’s identification of SAD, potential causes of the disorder, and whether the individual should seek professional help (Coles & Coleman, 2010). Additional questions assessed the distress and difficulty associated with this problem, whether ChatGPT-3.5 would be sympathetic toward this person, and ratings of treatment options (Furnham & Lousley, 2013). Other questions assessed whether the individual in the vignette met DSM-5-TR criteria for SAD (Clark et al., 2017). The vignette and series of questions were provided to ChatGPT-3.5 in an incognito browser, and the conversation was deleted after ChatGPT-3.5 answered all the items. This procedure was repeated for a total of 10 trials. ChatGPT-3.5’s responses were compared directly to the mental health literacy of Coles and Coleman’s (2010) human participants. Additionally, ChatGPT-3.5’s ratings of different treatments were compared to human participants’ ratings (Furnham & Lousley, 2013). These results are preliminary, and other responses about SAD will be presented.
ChatGPT-3.5 selected SAD 100% of the time, which did not differ significantly from human participants (86.8%; χ²(1) = 1.51, p = 0.22). ChatGPT-3.5 selected environmental factors as the primary cause of SAD 100% of the time, which differed significantly from human participants’ selection of environmental factors (30.9%; χ²(1) = 20.71, p < 0.0001). Human participants also selected other primary causes for this problem; however, there was no significant difference between human participants’ and ChatGPT-3.5’s selection of stress (χ²(1) = 0.66, p = 0.42) or biological factors (χ²(1) = 1.21, p = 0.27) as the primary cause. ChatGPT-3.5 was also more likely than human participants to indicate that the person should seek help (χ²(1) = 5.43, p = 0.02).
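The group comparisons above are Pearson chi-square tests of independence on 2×2 contingency tables (response category × group). As a minimal sketch of that computation, the snippet below applies the test to the SAD-identification comparison; the human counts are hypothetical, since the abstract reports only the 86.8% rate (a sample of n = 68 is assumed here purely for illustration, and the actual sample comes from Coles & Coleman, 2010).

```python
# Hypothetical 2x2 contingency table for correct SAD identification.
# ChatGPT-3.5: 10/10 trials correct (from the study).
# Humans: 86.8% correct applied to an ASSUMED n = 68 (illustrative only).
table = [[10, 0],    # ChatGPT-3.5: correct, incorrect
         [59, 9]]    # humans (hypothetical counts): correct, incorrect

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table (1 df,
    no continuity correction): sum of (observed - expected)^2 / expected."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, row, col in [(a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)]:
        expected = row * col / n
        chi2 += (obs - expected) ** 2 / expected
    return chi2

print(round(chi_square_2x2(table), 2))
```

With these assumed counts the statistic lands near the value reported above, but the exact figure depends on the true human sample size and on whether a continuity correction was applied.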
For treatment options, ChatGPT-3.5 was more likely than human participants to rate utilizing a psychologist as extremely likely (t(325) = 2.70, p = 0.0072). There was no significant difference in ratings for use of a general medical practitioner (t(325) = 1.50, p = 0.13).
ChatGPT-3.5 correctly identified SAD; however, continued evaluation of its differences from human responses is needed to learn more about ChatGPT-3.5’s mental health literacy and its suggestions for treatment options.
The potential implications for ChatGPT-3.5 within real world clinical settings will be discussed.