Evaluating Text Summarization Techniques and Factual Consistency with Language Models

Md Moinul Islam*, Usman Muhammad, Mourad Oussalah

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review


Abstract

Standard evaluation of automated text summarization (ATS) methods relies on manually crafted golden summaries. With advances in Large Language Models (LLMs), it is legitimate to ask whether these models can now complement or replace human-crafted summaries. This study examines the effectiveness of several language models (LMs), with a specific focus on preserving factual consistency. Through a thorough assessment of conventional and state-of-the-art performance metrics, including ROUGE, BLEU, BERTScore, FActScore, and LongDocFACTScore, across diverse datasets, we highlight the important relationship between linguistic eloquence and factual accuracy. The findings suggest that while LLMs such as GPT and LLaMA demonstrate considerable competence in producing concise and contextually aware summaries, difficulties remain in ensuring factual accuracy, particularly in domain-specific settings. Moreover, this work adds to existing knowledge of summarization dynamics and highlights the need to develop more reliable and tailored evaluation techniques that minimize the probability of factual errors in ATS-generated text. In particular, the findings advance the field by providing a rigorous assessment of the balance between linguistic fluency and factual correctness, highlighting the limitations of current ATS frameworks and metrics in ensuring the factual reliability of LM-generated summaries.
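The abstract contrasts lexical-overlap metrics (ROUGE, BLEU) with embedding-based ones (BERTScore). As a rough illustration of how such scores are typically computed for a candidate summary against a reference, the sketch below uses the publicly available rouge-score, nltk, and bert-score packages. This is an assumption for illustration only, not the paper's code; the example sentences are hypothetical, and FActScore and LongDocFACTScore rely on their own research codebases, which are not reproduced here.

```python
# Minimal sketch (not the authors' implementation) of scoring one candidate
# summary against one reference summary with overlap- and embedding-based metrics.
# Requires: pip install rouge-score nltk bert-score

from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from bert_score import score as bert_score

# Hypothetical example texts (not from the paper's datasets).
reference = "The committee approved the budget after a two-hour debate."
candidate = "After a lengthy debate, the committee passed the budget."

# ROUGE-1 and ROUGE-L: unigram and longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

# BLEU: n-gram precision; smoothing is needed because summaries are short.
bleu = sentence_bleu(
    [reference.split()],
    candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# BERTScore: contextual-embedding similarity, closer to "linguistic eloquence"
# than to factual consistency.
P, R, F1 = bert_score([candidate], [reference], lang="en")

print(f"ROUGE-1 F1:   {rouge['rouge1'].fmeasure:.3f}")
print(f"ROUGE-L F1:   {rouge['rougeL'].fmeasure:.3f}")
print(f"BLEU:         {bleu:.3f}")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```

Note that all three scores reward surface or semantic similarity to the reference; none of them directly checks the factual consistency that FActScore and LongDocFACTScore target, which is the gap the paper examines.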

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024
Editors: Wei Ding, Chang-Tien Lu, Fusheng Wang, Liping Di, Kesheng Wu, Jun Huan, Raghu Nambiar, Jundong Li, Filip Ilievski, Ricardo Baeza-Yates, Xiaohua Hu
Publisher: IEEE
Pages: 116-122
Number of pages: 7
ISBN (Electronic): 979-8-3503-6248-0
ISBN (Print): 979-8-3503-6249-7
DOIs
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: IEEE International Conference on Big Data - Washington, United States
Duration: 15 Dec 2024 - 18 Dec 2024
https://www3.cs.stonybrook.edu/~ieeebigdata2024/

Publication series

Name: IEEE International Conference on Big Data
ISSN (Electronic): 2573-2978

Conference

Conference: IEEE International Conference on Big Data
Abbreviated title: BigData
Country/Territory: United States
City: Washington
Period: 15/12/2024 - 18/12/2024
Internet address: https://www3.cs.stonybrook.edu/~ieeebigdata2024/

Keywords

  • automated text summarization
  • evaluation metrics
  • FActScore
  • large language models
  • LongDocFACTScore
