
Detecting Simpson’s Paradox: A Machine Learning Perspective

  • Rahul Sharma*
  • Huseyn Garayev
  • Minakshi Kaushik
  • Sijo Arakkal Peious
  • Prayag Tiwari
  • Dirk Draheim

  *Corresponding author for this work

Research output: Chapter in book/conference proceedings; Conference article in proceedings; Scientific; peer-reviewed

7 Citations (Scopus)

Abstract

The size of data collected around the world is growing exponentially, and it has become popular as big data. The volume and velocity of big data are facilitating the transition of machine learning (ML), deep learning (DL) and artificial intelligence (AI) from research laboratories to real life. There are numerous other claims made about Big Data. Can we, however, rely on data blindly? What happens when a dataset used to train ML models has a hidden statistical paradox? Data, like fossil fuels, is valuable, but it must be refined carefully for accurate outcomes. Statistical paradoxes are hard to observe in classical data cleaning and analysis techniques. Still, they are required to be investigated separately in training datasets. In this paper, we discuss the impact of Simpson’s paradox on categorical data and demonstrate its effects on AI and ML application scenarios. Next, we provide an algorithm to automatically identify the confounding variable and detect Simpson’s paradox within categorical datasets. The algorithm experiments on datasets from two real-world case studies. The outcome of the algorithm uncovers the existence of the paradox and indicates that Simpson’s paradox is severely harmful in automatic data analysis, especially in AI, ML and DL.
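
The abstract describes an algorithm that identifies a confounding variable and detects Simpson's paradox in categorical data, but this record does not reproduce it. The following is only a minimal sketch of the general idea, not the authors' algorithm: it compares the overall success-rate gap between two groups with the gap inside each stratum of a candidate column, and flags the column when every stratum reverses the overall direction. The function names (`success_rates`, `rate_gap`, `find_reversals`), the column names, and the restriction to a binary outcome with exactly two groups are all assumptions made for illustration. The sample counts are the widely cited kidney-stone treatment figures, not data from the paper.

```python
from collections import defaultdict


def success_rates(rows, group_key, outcome_key):
    """Return {group value: success rate} for a binary (0/1) outcome."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        hits[row[group_key]] += row[outcome_key]
    return {g: hits[g] / totals[g] for g in totals}


def rate_gap(rows, group_key, outcome_key, a, b):
    """Success rate of group a minus group b, or None if either is absent."""
    rates = success_rates(rows, group_key, outcome_key)
    if a not in rates or b not in rates:
        return None
    return rates[a] - rates[b]


def find_reversals(rows, group_key, outcome_key, candidates):
    """Return the candidate columns whose every stratum reverses the overall gap.

    Hypothetical helper: assumes exactly two groups and a 0/1 outcome.
    """
    a, b = sorted({row[group_key] for row in rows})
    overall = rate_gap(rows, group_key, outcome_key, a, b)
    flagged = []
    for col in candidates:
        # Split the rows into strata by the candidate column's value.
        strata = defaultdict(list)
        for row in rows:
            strata[row[col]].append(row)
        gaps = [rate_gap(s, group_key, outcome_key, a, b) for s in strata.values()]
        gaps = [g for g in gaps if g is not None]  # skip strata missing a group
        # A reversal needs at least two comparable strata, all opposing the overall sign.
        if len(gaps) > 1 and all(g * overall < 0 for g in gaps):
            flagged.append(col)
    return flagged


def expand(treatment, stone, successes, total):
    """Expand aggregated counts into one record per patient."""
    cured = [{"treatment": treatment, "stone": stone, "cured": 1}] * successes
    failed = [{"treatment": treatment, "stone": stone, "cured": 0}] * (total - successes)
    return cured + failed


# Textbook kidney-stone counts (successes, patients) per treatment and stone size.
rows = (
    expand("A", "small", 81, 87)
    + expand("A", "large", 192, 263)
    + expand("B", "small", 234, 270)
    + expand("B", "large", 55, 80)
)

print(find_reversals(rows, "treatment", "cured", ["stone"]))  # prints ['stone']
```

In this example treatment B has the higher success rate overall, while treatment A has the higher rate within both the small-stone and large-stone strata, so `stone` is flagged. A sign check like this says nothing about statistical significance or about which direction is causally correct; it only surfaces candidate columns for closer inspection.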

Original language: English
Title of host publication: Database and Expert Systems Applications - 33rd International Conference, DEXA 2022, Proceedings
Editors: Christine Strauss, Alfredo Cuzzocrea, Gabriele Kotsis, Ismail Khalil, A Min Tjoa
Publisher: Springer
Pages: 323-335
Number of pages: 13
ISBN (print): 978-3-031-12422-8
DOI - permanent links
Status: Published - 2022
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: International Conference on Database and Expert Systems Applications - Vienna, Austria
Duration: 22 Aug 2022 to 24 Aug 2022
Conference number: 33

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13426 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: International Conference on Database and Expert Systems Applications
Abbreviated title: DEXA
Country/Territory: Austria
City: Vienna
Period: 22/08/2022 to 24/08/2022

Funding

Acknowledgements. This work has been partially conducted in the project “ICT programme” which was supported by the European Union through the European Social Fund.
