XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models

Omkar Thawakar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz Khan

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

Abstract

The latest breakthroughs in large language models (LLMs) and vision-language models (VLMs) have showcased promising capabilities across a wide range of tasks. Such models are typically trained on massive datasets comprising billions of image-text pairs covering diverse tasks. However, their performance in specialized domains, such as radiology, is still under-explored. While a few works have recently explored LLM-based conversational medical models, they mainly focus on text-based analysis. In this paper, we introduce XrayGPT, a conversational medical vision-language model (VLM) that can analyze and answer open-ended questions about chest radiographs. Specifically, we align a medical visual encoder with a fine-tuned LLM so that the model possesses visual conversation abilities grounded in an understanding of radiographs and medical knowledge. For improved alignment of chest radiograph data, we generate ~217k interactive and high-quality summaries from free-text radiology reports. Extensive experiments are conducted to validate the merits of XrayGPT. For expert evaluation, certified medical doctors assessed the output of XrayGPT on a test subset; the results reveal that more than 70% of the responses are scientifically accurate, with an average score of 4/5. Our code and models are available at: https://github.com/mbzuai-oryx/XrayGPT.
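The abstract does not say how the visual encoder is aligned with the LLM. One common pattern in vision-language models is a small learned projection that maps visual features into the LLM's embedding space, so that image tokens can be placed in front of the text prompt. The sketch below illustrates only that general idea. The dimensions, weights, and function names (`make_projection`, `project`, `build_llm_input`) are hypothetical and are not taken from the XrayGPT code.

```python
import random


def make_projection(d_vis, d_llm, seed=0):
    """Create a randomly initialised linear layer (weight, bias).

    Hypothetical stand-in for a learned alignment layer; in a real
    system these parameters would be trained, not sampled.
    """
    rng = random.Random(seed)
    scale = 1.0 / d_vis ** 0.5
    weight = [[rng.gauss(0.0, scale) for _ in range(d_vis)] for _ in range(d_llm)]
    bias = [0.0] * d_llm
    return weight, bias


def project(visual_features, weight, bias):
    """Map each visual feature vector (length d_vis) to length d_llm."""
    return [
        [sum(w * x for w, x in zip(row, feat)) + b for row, b in zip(weight, bias)]
        for feat in visual_features
    ]


def build_llm_input(visual_tokens, text_embeddings):
    """Prepend projected visual tokens to the text prompt embeddings."""
    return visual_tokens + text_embeddings


# Toy example: 3 visual feature vectors of size 4, LLM embedding size 6.
d_vis, d_llm = 4, 6
weight, bias = make_projection(d_vis, d_llm)
patches = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.0, 1.0, 0.0, 1.0],
]
prompt = [[0.0] * d_llm for _ in range(2)]  # placeholder text embeddings

visual_tokens = project(patches, weight, bias)
sequence = build_llm_input(visual_tokens, prompt)
print(len(sequence), len(sequence[0]))
```

In this toy setup the combined sequence holds the three projected visual tokens followed by the two prompt embeddings, all sharing the LLM embedding width.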

Original language: English
Title of host publication: BioNLP 2024 - 23rd Meeting of the ACL Special Interest Group on Biomedical Natural Language Processing, Proceedings of the Workshop and Shared Tasks
Editors: Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Kirk Roberts, Junichi Tsujii
Publisher: Association for Computational Linguistics
Pages: 440-448
Number of pages: 9
ISBN (Electronic): 9798891761308
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: Biomedical Natural Language Processing Workshop - Bangkok, Thailand
Duration: 16 Aug 2024 → 16 Aug 2024
Conference number: 23

Conference

Conference: Biomedical Natural Language Processing Workshop
Abbreviated title: BioNLP
Country/Territory: Thailand
City: Bangkok
Period: 16/08/2024 → 16/08/2024
