Abstract
Recent studies have demonstrated the promise of data mining algorithms for seriation of paleontological data, i.e., the age-based ordering of excavation sites. A prominent approach is spectral ordering, which computes a similarity measure between the sites and orders them so that similar sites become adjacent and dissimilar sites are placed far apart. In the paleontological domain, the similarity measure is based on the mammal genera whose remains are retrieved at each excavation site. Although spectral ordering performs well in the seriation task, it ignores the background knowledge that is naturally present in the domain: paleontologists can derive the ages of the excavation sites to within some accuracy. On the other hand, the age information is uncertain, so the best approach would be to combine the background knowledge with the information on mammal co-occurrences. Motivated by this kind of partial supervision, we propose a novel semi-supervised spectral ordering algorithm. Our algorithm modifies the Laplacian matrix used in spectral ordering so that domain knowledge of the ordering is taken into account. It also performs feature selection (sparsification) by discarding the features that contribute most to the unwanted variability of the data in bootstrap sampling. We analyze the theoretical properties of the proposed algorithm and demonstrate that the proposed framework enhances the stability of the spectral ordering output and yields computational gains.
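For orientation, the plain (unsupervised) spectral ordering that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration under stated assumptions, not the semi-supervised algorithm proposed in the paper: the similarity is assumed to be the number of genera shared by two sites, the Laplacian is the unnormalized one, and the function name `spectral_order` is hypothetical.

```python
import numpy as np


def spectral_order(occurrence):
    """Order the rows (sites) of a 0/1 site-by-genus matrix.

    Assumption: similarity between two sites is the number of genera
    they share. The sites are sorted by the Fiedler vector, i.e. the
    eigenvector of the graph Laplacian with the second-smallest eigenvalue.
    """
    X = np.asarray(occurrence, dtype=float)

    # Pairwise similarity: shared-genera counts, with self-similarity removed.
    S = X @ X.T
    np.fill_diagonal(S, 0.0)

    # Unnormalized graph Laplacian L = D - S, where D holds the row sums of S.
    L = np.diag(S.sum(axis=1)) - S

    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so column 1 is the Fiedler vector when the similarity graph is connected.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]

    # Sorting by the Fiedler vector places similar sites next to each other.
    # The direction (oldest first or youngest first) is arbitrary.
    return np.argsort(fiedler)
```

On a toy matrix in which each site shares genera only with its temporal neighbours, the returned permutation should recover the original sequence up to reversal. Incorporating age estimates would require modifying `L`, which is the step the paper addresses and which this sketch deliberately leaves out.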
Original language | English |
---|---|
Title of host publication | ICDM'08, IEEE International Conference on Data Mining, Pisa, Italy, Dec. 15-19, 2008 |
Publisher | IEEE |
Pages | 462-471 |
Publication status | Published - 2008 |
MoE publication type | A4 Conference publication |