Fair Column Subset Selection

Antonis Matakos, Bruno Ordozgoiti, Suhas Thejaswi

Research output: Chapter in Book/Report/Conference proceedingConference article in proceedingsScientificpeer-review

Abstract

The problem of column subset selection asks for a subset of columns from an input matrix such that the matrix can be reconstructed as accurately as possible within the span of the selected columns. A natural extension is to consider a setting where the matrix rows are partitioned into two groups, and the goal is to choose a subset of columns that minimizes the maximum reconstruction error of both groups, relative to their respective best rank-k approximation. Extending the known results of column subset selection to this fair setting is not straightforward: in certain scenarios it is unavoidable to choose columns separately for each group, resulting in double the expected column count. We propose a deterministic leverage-score sampling strategy for the fair setting and show that sampling a column subset of minimum size becomes NP-hard in the presence of two groups. Despite these negative results, we give an approximation algorithm that guarantees a solution within 1.5 times the optimal solution size. We also present practical heuristic algorithms based on rank-revealing QR factorization. Finally, we validate our methods through an extensive set of experiments using real-world data.

Original languageEnglish
Title of host publicationKDD 2024 - Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PublisherACM
Pages2189-2199
Number of pages11
ISBN (Electronic)9798400704901
DOIs
Publication statusPublished - 25 Aug 2024
MoE publication typeA4 Conference publication
EventACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Barcelona, Spain
Duration: 25 Aug 202429 Aug 2024
Conference number: 30

Conference

ConferenceACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Abbreviated titleKDD
Country/TerritorySpain
CityBarcelona
Period25/08/202429/08/2024

Keywords

  • algorithmic fairness
  • column subset selection
  • dimensionality reduction
  • low rank approximation
  • matrix factorization

Fingerprint

Dive into the research topics of 'Fair Column Subset Selection'. Together they form a unique fingerprint.

Cite this