Abstract
Large 0-1 datasets arise in various applications, such as market basket analysis and information retrieval. We concentrate on the study of topic models, aiming at results that indicate why certain methods succeed or fail. We describe simple algorithms for finding topic models from 0-1 data. We give theoretical results showing that the algorithms can discover the epsilon-separable topic models of Papadimitriou et al. We present empirical results showing that the algorithms find natural topics in real-world datasets. We also briefly discuss the connections to matrix approaches, including nonnegative matrix factorization and independent component analysis.
Original language | English |
---|---|
Title of host publication | KDD '02: Proceedings of the eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Publisher | ACM |
Pages | 450-455 |
ISBN (Electronic) | 978-1-58113-567-1 |
Publication status | Published - 2002 |
MoE publication type | A4 Conference publication |
Event | ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Edmonton, Canada; Duration: 23 Jun 2002 → 26 Jun 2002; Conference number: 8 |
Conference
Conference | ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Abbreviated title | KDD |
Country/Territory | Canada |
City | Edmonton |
Period | 23/06/2002 → 26/06/2002 |
Keywords
- 0-1 data
- data mining
- latent variable model
- topic models