Fair Column Subset Selection

Antonis Matakos, Bruno Ordozgoiti, Suhas Thejaswi

Tutkimustuotos: Artikkeli kirjassa/konferenssijulkaisussaConference article in proceedingsScientificvertaisarvioitu

Abstrakti

The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a setting where the matrix rows are partitioned into two groups, and the goal is to choose a subset of columns that minimizes the maximum reconstruction error of both groups, relative to their respective best rank-k approximation. Extending the known results of column subset selection to this fair setting is not straightforward: in certain scenarios it is unavoidable to choose columns separately for each group, resulting in double the expected column count. We propose a deterministic leverage-score sampling strategy for the fair setting and show that sampling a column subset of minimum size becomes NP-hard in the presence of two groups. Despite these negative results, we give an approximation algorithm that guarantees a solution within 1.5 times the optimal solution size. We also present practical heuristic algorithms based on rank-revealing QR factorization. Finally, we validate our methods through an extensive set of experiments using real-world data.

AlkuperäiskieliEnglanti
OtsikkoKDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
KustantajaACM
Sivut2189-2199
Sivumäärä11
ISBN (elektroninen)9798400704901
DOI - pysyväislinkit
TilaJulkaistu - 25 elok. 2024
OKM-julkaisutyyppiA4 Artikkeli konferenssijulkaisussa
TapahtumaACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Barcelona, Espanja
Kesto: 25 elok. 202429 elok. 2024
Konferenssinumero: 30

Conference

ConferenceACM SIGKDD International Conference on Knowledge Discovery and Data Mining
LyhennettäKDD
Maa/AlueEspanja
KaupunkiBarcelona
Ajanjakso25/08/202429/08/2024

Sormenjälki

Sukella tutkimusaiheisiin 'Fair Column Subset Selection'. Ne muodostavat yhdessä ainutlaatuisen sormenjäljen.

Siteeraa tätä