Abstract
In many domains, document sets are hierarchically organized, such as message forums with multiple levels of sections. Analysis of latent topics within such content is crucial for tasks like trend and user interest analysis. Nonparametric topic models are a powerful approach, but traditional Hierarchical Dirichlet Processes (HDPs) cannot fully account for topic sharing across deep hierarchical structures. We propose the Tree-structured Hierarchical Dirichlet Process, which allows Dirichlet-process-based topic modeling over a given tree structure of arbitrary size and height, where documents can arise at all tree nodes. Experiments on a hierarchical social message forum and a product reviews forum demonstrate better generalization performance than traditional HDPs, both in modeling new data and in classifying documents to sections.
Original language | English |
---|---|
Title of host publication | Distributed Computing and Artificial Intelligence, Special Sessions, 15th International Conference |
Editors | Sara Rodríguez, Javier Prieto, Pedro Faria, María N. Moreno, Santiago Mazuelas, Elena M. Navarro, Slawomir Klos, Alberto Fernández, M. Dolores Jiménez-López |
Pages | 291-299 |
Number of pages | 9 |
Publication status | Published - 1 Jan 2019 |
MoE publication type | A4 Article in a conference publication |
Event | International Conference on Distributed Computing and Artificial Intelligence, Toledo, Spain. Duration: 20 Jun 2018 → 22 Jun 2018. Conference number: 15 |
Publication series
Name | Advances in Intelligent Systems and Computing |
---|---|
Publisher | Springer |
Volume | 801 |
ISSN (Print) | 2194-5357 |
Conference
Conference | International Conference on Distributed Computing and Artificial Intelligence |
---|---|
Abbreviated title | DCAI |
Country/Territory | Spain |
City | Toledo |
Period | 20/06/2018 → 22/06/2018 |
Keywords
- Hierarchical Dirichlet Processes
- Message forum
- Topic modeling