CONFERENCE PROCEEDING
Assessing the performance of ChatGPT-3.5, ChatGPT-4, and the Custom Trained Model in answering questions regarding breastfeeding problems
 
Antalya Bilim University, Midwifery, Döşemealtı, Turkey
 
 
Eur J Midwifery 2026;10(Supplement 1):A824
 
ABSTRACT
BACKGROUND:
ChatGPT, an AI-based conversational tool, is increasingly used for healthcare-related inquiries, including breastfeeding support. Despite its growing popularity among parents and healthcare professionals, concerns remain about the scientific accuracy, motivational tone, and overall reliability of its responses. Evaluating these outputs is particularly crucial in sensitive and high-impact domains like breastfeeding.

OBJECTIVES:
This study aims to evaluate the responses of three AI-based large language models (ChatGPT-3.5, ChatGPT-4, and a custom-trained GPT) to frequently asked questions about breastfeeding. The evaluation focuses on scientific accuracy and comprehensiveness, tone and motivational language, and reliability. The goal is to inform the integration of AI tools into midwifery education and client support.

METHODS:
This study was conducted between July 20 and August 1, 2024. Ten breastfeeding-related questions were selected based on international guidelines and expert validation. Each model answered the same questions; the responses were anonymized and rated by eight qualified breastfeeding consultants using structured Likert scales. Evaluation dimensions included (1) scientific accuracy and scope, (2) tone, language, and motivation, and (3) reliability via mDISCERN scoring. Friedman and Wilcoxon signed-rank tests with Bonferroni correction were used to determine statistical significance.
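The statistical comparison described above can be illustrated with a minimal sketch. It assumes the Friedman test served as the omnibus comparison across the three models and that Bonferroni correction was applied to three pairwise Wilcoxon comparisons; the abstract does not spell out this workflow. All function names and the example scores are hypothetical, not the study's data. The p-value shortcut holds only for three models (chi-square with 2 degrees of freedom), and the pairwise p-values themselves would come from a signed-rank routine such as `scipy.stats.wilcoxon`, which is not reimplemented here.

```python
from collections import Counter
from itertools import combinations
from math import exp


def average_ranks(values):
    """Rank values from 1..k; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_three_models(scores):
    """Tie-corrected Friedman test for exactly three related samples.

    scores: one row per rater (or question), three columns (one per model).
    Returns (chi_square, p_value).
    """
    n, k = len(scores), 3
    rank_sums = [0.0] * k
    tie_term = 0
    for row in scores:
        if len(row) != k:
            raise ValueError("each row needs exactly three scores")
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:  # every row fully tied: no evidence of a difference
        return 0.0, 1.0
    chi2 = (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1)) / correction
    # For df = k - 1 = 2 the chi-square survival function is exp(-x / 2).
    return chi2, exp(-chi2 / 2.0)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical Likert ratings: rows are raters, columns are the three models.
example_scores = [[3, 4, 5], [2, 4, 5], [3, 3, 5], [4, 4, 5]]
chi2, p = friedman_three_models(example_scores)
pairs = list(combinations(["ChatGPT-3.5", "ChatGPT-4", "custom GPT"], 2))
print(f"Friedman chi-square = {chi2:.3f}, p = {p:.4f}")
print(f"{len(pairs)} pairwise comparisons, per-test alpha = {0.05 / len(pairs):.4f}")
```

With three models there are three pairwise comparisons, so under this assumption each adjusted p-value is the raw Wilcoxon p-value multiplied by three, or equivalently each raw p-value is compared with 0.05/3.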

RESULTS:
Statistically significant differences were found among the three models in scientific accuracy (p = 0.0038) and language/motivation (p = 0.0001). The custom-trained GPT outperformed both ChatGPT-3.5 and ChatGPT-4 in these domains. Although the custom GPT had the highest mDISCERN score, the difference did not reach statistical significance (p = 0.0575), suggesting comparable reliability among the models in structured information delivery.

CONCLUSIONS:
Custom-trained GPTs tailored for breastfeeding education may offer superior performance in delivering accurate, empathetic, and motivational responses. These findings support the potential integration of domain-specific AI models into midwifery education and maternal support systems, provided that their ethical and clinical limitations are acknowledged.

KEY MESSAGE:
Custom-trained GPTs provide accurate and supportive answers to breastfeeding questions, offering valuable potential for midwifery education and care.

Poster session 3 (Group B)
eISSN:2585-2906