Assessing the performance of ChatGPT-3.5, ChatGPT-4, and the Custom Trained Model in answering questions regarding breastfeeding problems

Seda Serhatlioglu

Abstract book of the 34th ICM Triennial...

CONFERENCE PROCEEDING

Seda Serhatlioglu ¹

Antalya Bilim University, Midwifery, Döşemealtı, Turkey

Eur J Midwifery 2026;10(Supplement 1):A824

ABSTRACT

BACKGROUND:
ChatGPT, an AI-based conversational tool, is increasingly used for healthcare-related inquiries, including breastfeeding support. Despite its growing popularity among parents and healthcare professionals, concerns remain about the scientific accuracy, motivational tone, and overall reliability of its responses. Evaluating these outputs is particularly crucial in sensitive and high-impact domains like breastfeeding.

OBJECTIVES:
This study aims to evaluate the responses of three AI-based large language models (ChatGPT-3.5, ChatGPT-4, and a custom-trained GPT) to frequently asked questions about breastfeeding. The evaluation focuses on scientific accuracy and comprehensiveness, tone and motivational language, and reliability. The goal is to inform the integration of AI tools into midwifery education and client support.

METHODS:
This study was conducted between July 20 and August 1, 2024. Ten breastfeeding-related questions were selected based on international guidelines and expert validation. Each model generated answers to the same questions; the answers were anonymized and evaluated by eight qualified breastfeeding consultants using structured Likert scales. Evaluation dimensions included (1) scientific accuracy and scope, (2) tone, language, and motivation, and (3) reliability via mDISCERN scoring. Friedman tests and Wilcoxon signed-rank tests with Bonferroni correction were used to determine statistical significance.
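
The statistical procedure named above can be illustrated with a minimal sketch. It is not the study's analysis code: the function names and the ratings are hypothetical, the data are invented for illustration, and the abstract does not state which software was used or how scores were aggregated. The sketch computes a tie-corrected Friedman statistic for exactly three related samples (for which the chi-square p-value with 2 degrees of freedom is exp(-Q/2)) and applies a Bonferroni adjustment to a list of pairwise p-values, such as those a Wilcoxon signed-rank routine (for example `scipy.stats.wilcoxon`) would return.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three(rows):
    """Friedman test for exactly three related samples.

    rows: one (score_a, score_b, score_c) tuple per rater (block).
    Returns (statistic, p_value), with the usual correction for ties.
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        if len(row) != k:
            raise ValueError("each row must hold exactly three scores")
        for col, rank in enumerate(average_ranks(list(row))):
            rank_sums[col] += rank
        ties += sum(t ** 3 - t for t in Counter(row).values())
    correction = 1.0 - ties / (n * k * (k * k - 1))
    if correction == 0.0:
        raise ValueError("all scores are tied within every row")
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    q /= correction
    # For df = k - 1 = 2, the chi-square survival function is exp(-q / 2).
    return q, math.exp(-q / 2.0)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented ratings (NOT the study's data): one row per rater, with columns
# standing for ChatGPT-3.5, ChatGPT-4, and the custom GPT.
hypothetical_ratings = [
    (3, 4, 5), (2, 4, 5), (3, 3, 4), (4, 4, 5),
    (3, 5, 5), (2, 3, 4), (3, 4, 4), (4, 5, 5),
]
statistic, p_value = friedman_three(hypothetical_ratings)
print(f"Friedman Q = {statistic:.3f}, p = {p_value:.4f}")
```

With three models there are three pairwise comparisons, so under these assumptions each pairwise p-value would be multiplied by 3 before being compared with the significance threshold.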

RESULTS:
Statistically significant differences were found among the three models in scientific accuracy (p = 0.0038) and language/motivation (p = 0.0001). The custom-trained GPT outperformed both ChatGPT-3.5 and ChatGPT-4 in these domains. Although the mDISCERN score was highest for the custom GPT, the difference did not reach statistical significance (p = 0.0575), suggesting comparable reliability among models in structured information delivery.

CONCLUSIONS:
Custom-trained GPTs tailored for breastfeeding education may offer superior performance in delivering accurate, empathetic, and motivational responses. These findings support the potential integration of domain-specific AI models into midwifery education and maternal support systems, while acknowledging ethical and clinical limitations.

KEY MESSAGE:
Custom-trained GPTs provide accurate and supportive answers to breastfeeding questions, offering valuable potential for midwifery education and care.

Poster session 3 (Group B)
