by Vivek Gupta
Multiple recent reports have raised serious concerns about the use of artificial intelligence chatbots for medical advice. Research published this week shows that these tools often fail to provide accurate or safe guidance when real people use them for health-related questions. The findings have prompted experts to urge caution as more people turn to AI for health information.
A large study published this week in Nature Medicine examined how artificial intelligence (AI) chatbots perform when people use them to assess health concerns and decide on a course of action. Researchers at the Oxford Internet Institute, part of the University of Oxford, conducted a randomised trial involving roughly 1,300 participants in the United Kingdom.
Participants were given medical scenarios, such as a severe headache or persistent exhaustion, and asked to identify likely conditions and recommend what to do next. Three well-known AI models were tested: OpenAI’s GPT-4o, Meta’s Llama 3 and Cohere’s Command R+. People in the study either consulted one of these chatbots or relied on traditional methods such as internet search engines.
While these models demonstrate near-expert performance on formal medical exams, the real-world results were far weaker. Users interacting with the AI correctly identified conditions in just over 34 percent of cases and chose appropriate next steps in about 44 percent of cases. Those outcomes were roughly the same as for people using standard search tools or their own judgement, the researchers found. The study also noted that minor differences in how questions were phrased produced very different answers from the AI.
Researchers said a key problem is communication. People often do not provide the precise details an AI needs, and chatbots sometimes mix accurate information with misleading or irrelevant suggestions, leaving users confused about what to trust. A lead medical expert on the study warned that consulting an AI about symptoms could be “dangerous” because wrong advice might delay appropriate care.
Several news agencies have published related coverage in the past 24 to 48 hours.
A report from an international news organisation noted that AI chatbots, despite scoring highly on licensing-exam-style questions, provide no better health guidance than traditional online information sources when ordinary users consult them. It quoted the study authors saying the tools are not ready to replace professional medical advice.
Other media coverage highlighted how the phrasing and completeness of what users type affect chatbot responses, with incomplete or unclear descriptions leading to contradictory or unhelpful medical suggestions.

The concerns around medical advice from AI chatbots align with broader evidence about the limitations of AI in healthcare.
An independent analysis published today found that AI systems are more likely to accept and repeat medical misinformation when it appears in a seemingly authoritative context. In tests, large language models more readily incorporated false medical claims into their responses when those claims appeared in credible-looking hospital discharge notes rather than in other sources. The finding raises questions about how AI might spread incorrect health information in real-world settings.
Other recent work has highlighted safety concerns when AI chatbots handle sensitive queries, such as questions about mental health or suicide, pointing to inconsistent responses and the potential for harm when guidance deviates from clinical norms. Experts have noted that chatbots trained on general datasets lack the context awareness and safeguards typically required in healthcare settings.
The relevance of these findings is underscored by patterns in public use of AI for health.
Independent hazard reporting released earlier this year identified the misuse of AI chatbots for healthcare advice as a top health technology risk for 2026. The safety experts noted that as chatbots become easier to access, people increasingly use them for health-related questions, even though these tools are not regulated as medical devices or validated for clinical decision-making.
Those experts highlighted examples where AI responses have included incorrect diagnoses or unsafe recommendations, reinforcing the need for users to verify health information with trained professionals rather than relying on AI guidance alone.
Health researchers and safety organisations have called for stronger evaluation and oversight of AI health tools. They advocate real-world testing that mirrors how people actually interact with chatbots, rather than relying solely on laboratory benchmarks. They also recommend clearer public education about the limitations of AI when it comes to medical decisions.
For now, experts say that while AI can support background research on medical topics or help users understand general health concepts, it should not be used as a substitute for professional medical advice. Decisions about conditions that could be serious or life-threatening are best left to qualified health professionals.