Category tree distance: a taxonomy-based transaction distance for web user analysis

Yinjia Zhang, Qinpei Zhao*, Yang Shi, Jiangfeng Li, Weixiong Rao

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

1 Citation (Scopus)


With the emergence of webpage services, huge amounts of customer transaction data flood cyberspace and are becoming increasingly useful for profiling users and making recommendations. Because web user transaction data are usually multi-modal, heterogeneous and large-scale, traditional data analysis methods face new challenges. One of these challenges is defining a distance between two transactions or between two web users. The distance definition plays an important role in further analysis, such as cluster analysis or k-nearest neighbor queries. In this paper we introduce a category tree distance, which uses product taxonomy information to convert user transaction data into vectors. The similarity between web users can then be evaluated from the vectors derived from their transaction data. Properties of the distance, such as its upper and lower bounds, and a complexity analysis are also given. To investigate the performance of the proposal, we conduct experiments on real web user transaction data. The results show that the proposed distance outperforms the other distances compared on user transaction analysis.
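The abstract does not state the formula for the category tree distance, so the following Python sketch only illustrates the general idea of turning transactions into taxonomy-based vectors and comparing them. Everything in it is an assumption made for illustration: the toy taxonomy `PARENT`, the helper names, the choice to credit each purchase to its category and all ancestor categories, and the use of a Euclidean distance. It is not the definition given in the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy taxonomy (assumption): child -> parent, None marks the root.
PARENT = {
    "root": None,
    "electronics": "root",
    "books": "root",
    "phone": "electronics",
    "laptop": "electronics",
    "novel": "books",
}

def path_to_root(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def to_vector(transactions):
    """Map a user's purchased items to a sparse vector over taxonomy nodes.

    Each purchase adds weight to its own node and to every ancestor,
    and the weights are normalised by the number of purchases.
    """
    if not transactions:
        return {}
    counts = Counter()
    for item in transactions:
        counts.update(path_to_root(item))
    total = len(transactions)
    return {node: c / total for node, c in counts.items()}

def distance(u, v):
    """Euclidean distance between two sparse taxonomy vectors (illustrative choice)."""
    nodes = set(u) | set(v)
    return sqrt(sum((u.get(n, 0.0) - v.get(n, 0.0)) ** 2 for n in nodes))

alice = to_vector(["phone", "laptop"])
bob = to_vector(["phone", "phone"])
carol = to_vector(["novel"])
```

In this toy setup, `distance(alice, bob)` is smaller than `distance(alice, carol)`, because Alice and Bob share the "electronics" branch of the tree while Carol's purchases fall under "books". Sharing ancestor weight is what lets two users who bought different products in the same category still appear similar.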

Original language: English
Pages (from-to): 39-66
Number of pages: 28
Journal: Data Mining and Knowledge Discovery
Issue number: 1
Early online date: 13 Oct 2022
Publication status: Published - Jan 2023
MoE publication type: A1 Journal article-refereed


  • Cluster analysis
  • Distance metric
  • k-nearest neighbor query
  • Taxonomy
  • Transaction data
  • Tree structure

