Abstract
In this paper, we show that a simple, data-dependent way of setting the initial vector can be used to substantially speed up the training of linear one-versus-all classifiers in extreme multi-label classification (XMC). We discuss the problem of choosing the initial weights from the perspective of three goals. We want to start in a region of weight space (a) with low loss value, (b) that is favourable for second-order optimization, and (c) where the conjugate-gradient (CG) calculations can be performed quickly. For margin losses, such an initialization is achieved by selecting the initial vector such that it separates the mean of all positive (relevant for a label) instances from the mean of all negatives, two quantities that can be calculated quickly for the highly imbalanced binary problems occurring in XMC. We demonstrate a training speedup of up to 5× on the Amazon-670K dataset with 670,000 labels. This comes in part from the reduced number of iterations that need to be performed due to starting closer to the solution, and in part from an implicit negative-mining effect that allows easy negatives to be ignored in the CG step. Because of the convex nature of the optimization problem, the speedup is achieved without any degradation in classification accuracy. The implementation can be found at https://github.com/xmc-aalto/dismecpp.
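The mean-separating idea can be illustrated with a minimal sketch. It is not taken from the linked implementation, and the details are assumptions: the function names `class_means` and `mean_separating_init` are hypothetical, the target scores of +1 and -1 are chosen for illustration, and the initial vector is restricted to the span of the two means. The sketch also shows why the two means are cheap to obtain for imbalanced labels: the feature sum over all instances is computed once, and the negative mean follows by subtracting the sum over the few positives.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def class_means(X, positive_rows, total_sum):
    """Mean of positives and mean of negatives for one label.

    X is a list of feature rows, positive_rows the indices of instances
    relevant for the label, total_sum the feature-wise sum over all rows
    (computed once and reused for every label).
    """
    dim = len(total_sum)
    num_pos = len(positive_rows)
    num_neg = len(X) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("label needs at least one positive and one negative")
    # Only the few positive rows are touched here.
    pos_sum = [sum(X[i][j] for i in positive_rows) for j in range(dim)]
    pos_mean = [s / num_pos for s in pos_sum]
    # Negatives are obtained by subtraction instead of a second pass.
    neg_mean = [(t - s) / num_neg for t, s in zip(total_sum, pos_sum)]
    return pos_mean, neg_mean


def mean_separating_init(pos_mean, neg_mean, pos_target=1.0, neg_target=-1.0):
    """Vector w = a * pos_mean + b * neg_mean with
    w . pos_mean = pos_target and w . neg_mean = neg_target.

    The targets +1 / -1 are an illustrative assumption.
    """
    pp = dot(pos_mean, pos_mean)
    pn = dot(pos_mean, neg_mean)
    nn = dot(neg_mean, neg_mean)
    det = pp * nn - pn * pn
    if abs(det) < 1e-12:
        raise ValueError("class means are collinear; no separating solution")
    # Solve the 2x2 Gram system [[pp, pn], [pn, nn]] [a, b] = [targets].
    a = (pos_target * nn - neg_target * pn) / det
    b = (neg_target * pp - pos_target * pn) / det
    return [a * p + b * n for p, n in zip(pos_mean, neg_mean)]


# Toy usage: four instances, two features, one positive instance.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
total_sum = [sum(row[j] for row in X) for j in range(2)]
pos_mean, neg_mean = class_means(X, [0], total_sum)
w0 = mean_separating_init(pos_mean, neg_mean)
```

In a one-versus-all setting, a vector such as `w0` would serve only as the starting point of the per-label optimizer; because the problem is convex, the choice of starting point affects training time, not the final solution.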
| Original language | English |
|---|---|
| Pages (from-to) | 3953-3976 |
| Journal | Machine Learning |
| Volume | 111 |
| Issue number | 11 |
| DOIs | |
| Publication status | Published - Nov 2022 |
| MoE publication type | A4 Conference publication |
| Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Grenoble, France. Duration: 19 Sept 2022 → 23 Sept 2022. https://2022.ecmlpkdd.org/ |
Keywords
- Large-scale multi-label classification
- Linear classification
- 2nd order optimization
- Class imbalance
- Weight initialization
Projects
- 1 Finished
- HPC-HD/Babbar: High Performance Computing for the Detection and Analysis of Historical Discourses
Babbar, R. (Principal investigator), Mohammadnia Qaraei, M. (Project Member), Schultheis, E. (Project Member), Zhang, J. (Project Member) & Marttinen, P. (Principal investigator)
01/01/2022 → 31/12/2024
Project: RCF Academy Project targeted call
Activities
- 1 Conference presentation
- Speeding-up one-versus-all training for extreme classification via mean-separating initialization
Schultheis, E. (Speaker)
21 Sept 2022
Activity: Talk or presentation types › Conference presentation