Abstract
Dimensionality reduction techniques can be employed to produce robust, cost-effective predictive models, and to enhance interpretability in exploratory data analysis. However, the models produced by many of these methods are formulated in terms of abstract factors or are too high-dimensional to facilitate insight and fit within low computational budgets. In this paper we explore an alternative approach to interpretable dimensionality reduction. Given a data matrix, we study the following question: are there subsets of variables that can be primarily explained by a single factor? We formulate this challenge as the problem of finding submatrices close to rank one. Despite its potential, this topic has not been sufficiently addressed in the literature, and there exist virtually no algorithms for this purpose that are simultaneously effective, efficient and scalable. We formalize the task as two problems which we characterize in terms of computational complexity, and propose efficient, scalable algorithms with approximation guarantees. Our experiments demonstrate how our approach can produce insightful findings in data, and show our algorithms to be superior to strong baselines.
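To make "submatrices close to rank one" concrete, the following is a minimal sketch of one common way to score a candidate subset of variables. It is not the paper's algorithm. The function name `rank_one_residual` and its parameters are hypothetical, chosen for illustration only. The score is the fraction of the submatrix's squared Frobenius norm that is not captured by its best rank-one approximation, which by the Eckart-Young theorem equals one minus the squared top singular value divided by the sum of all squared singular values. A score near 0 suggests the chosen columns are largely explained by a single factor.

```python
import math
import random


def rank_one_residual(matrix, columns, iters=500, seed=0):
    """Illustrative score (an assumption, not the paper's method).

    Returns the share of squared Frobenius norm of matrix[:, columns]
    left unexplained by the best rank-one approximation.
    """
    # Restrict every row to the chosen column subset.
    sub = [[row[j] for j in columns] for row in matrix]
    total = sum(x * x for row in sub for x in row)
    if total == 0.0:
        return 0.0  # an all-zero submatrix is trivially rank <= 1

    k = len(columns)
    # Gram matrix sub^T sub; its top eigenvalue is the squared
    # top singular value of the submatrix.
    gram = [[sum(row[a] * row[b] for row in sub) for b in range(k)]
            for a in range(k)]

    # Power iteration from a random start, so the start vector is
    # almost surely not orthogonal to the top eigenvector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    for _ in range(iters):
        w = [sum(gram[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Rayleigh quotient of the unit-norm vector v estimates the top eigenvalue.
    v_norm_sq = sum(x * x for x in v)
    top = sum(v[a] * gram[a][b] * v[b]
              for a in range(k) for b in range(k)) / v_norm_sq
    return max(0.0, 1.0 - top / total)


# Toy data: column 1 is exactly twice column 0; column 2 is unrelated.
data = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 2.0],
]
print(rank_one_residual(data, [0, 1]))  # proportional columns: near 0
print(rank_one_residual(data, [0, 2]))  # unrelated columns: clearly above 0
```

Scoring every subset this way is exponential in the number of variables. The sketch only illustrates the quantity being searched over and does not address how to search efficiently.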
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the Web Conference, WWW 2021 |
| Publisher | ACM |
| Pages | 3066-3075 |
| Number of pages | 10 |
| ISBN (Electronic) | 9781450383127 |
| Publication status | Published - 3 Jun 2021 |
| MoE publication type | A4 Conference publication |
| Event | The Web Conference, Ljubljana, Slovenia (19 Apr 2021 → 23 Apr 2021) |
Conference
| Conference | The Web Conference |
|---|---|
| Abbreviated title | WWW |
| Country/Territory | Slovenia |
| City | Ljubljana |
| Period | 19/04/2021 → 23/04/2021 |
Funding
This work was supported by the Academy of Finland project AIDA (317085), the EC H2020RIA project “SoBigData++” (871042), and the Polish National Agency for Academic Exchange within the Bekker programme, number PPN/BEK/2019/1/00133.
Keywords
- Data mining
- Dimensionality reduction
- Explainability
- Variable selection
Fingerprint
Dive into the research topics of 'Insightful dimensionality reduction with very low rank variable subsets'. Together they form a unique fingerprint.
Projects
2 Finished
- SoBigData-PlusPlus
  Roy, C. (Project Member), Kaski, K. (Project Member) & Bhattacharya, K. (Project Member)
  01/01/2020 → 31/12/2025
  Project: EU H2020 Framework program
- Adaptive and intelligent data
  Gionis, A. (Principal investigator), Mahadevan, A. (Project Member), Zhang, G. (Project Member), Papatheodorou, D. (Project Member), Ordozgoiti Rubio, B. (Project Member) & Muniyappa, S. (Project Member)
  01/01/2018 → 30/06/2022
  Project: Academy of Finland: Other research funding