Data quality management in big data: Strategies, tools, and educational implications

  • Thu Nguyen*
  • Hong Tri Nguyen
  • Tu Anh Nguyen-Hoang

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

8 Citations (Scopus)
301 Downloads (Pure)

Abstract

This study addresses the critical need for effective Big Data Quality Management (BDQM) in education, a field where data quality has profound implications but remains underexplored. The work systematically progresses from requirement analysis and standard development to the deployment of tools for monitoring and enhancing data quality in big data workflows. The study's contributions are substantiated through five research questions that explore the impact of data quality on analytics, the establishment of evaluation standards, centralized management strategies, improvement techniques, and education-specific BDQM adaptations. By addressing these questions, the research advances both theoretical and practical frameworks, equipping stakeholders with the tools to enhance the reliability and efficiency of data-driven educational initiatives. Integrating Artificial Intelligence (AI) and distributed computing, this research introduces a novel multi-stage BDQM framework that emphasizes data quality assessment, centralized governance, and AI-enhanced improvement techniques. This work underscores the transformative potential of robust BDQM systems in supporting informed decision-making and achieving sustainable outcomes in educational projects. The survey findings highlight the potential for automated data management within big data architectures, suggesting that data quality frameworks can be significantly enhanced by leveraging AI and distributed computing. Additionally, the survey emphasizes emerging trends in big data quality management, specifically (i) automated data cleaning and cleansing and (ii) data enrichment and augmentation.

Original language: English
Article number: 105067
Journal: Journal of Parallel and Distributed Computing
Volume: 200
Publication status: Published - Jun 2025
MoE publication type: A1 Journal article-refereed

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 17 - Partnerships for the Goals

Keywords

  • Artificial intelligence
  • Big data
  • Data quality
  • Distributed computing
  • Education projects
