TY - JOUR
T1 - Doubly Stochastic Neighbor Embedding on Spheres
AU - Lu, Yao
AU - Corander, Jukka
AU - Yang, Zhirong
PY - 2019/12/1
Y1 - 2019/12/1
AB - Stochastic Neighbor Embedding (SNE) methods minimize the divergence between the similarity matrix of a high-dimensional dataset and its counterpart from a low-dimensional embedding, leading to widely applied tools for data visualization. Despite their popularity, current SNE methods experience a crowding problem when the data include highly imbalanced similarities: data points with higher total similarity tend to get crowded around the display center. To solve this problem, we introduce a fast normalization method and normalize the similarity matrix to be doubly stochastic, such that all data points have equal total similarities. Furthermore, we show empirically and theoretically that the double stochasticity constraint often leads to embeddings that are approximately spherical. This suggests replacing a flat space with spheres as the embedding space. The spherical embedding eliminates the discrepancy between the center and the periphery in visualization, which efficiently resolves the crowding problem. We compared the proposed method (DOSNES) with the state-of-the-art SNE method on three real-world datasets, and the results clearly indicate that our method is more favorable in terms of visualization quality. DOSNES is freely available at http://yaolubrain.github.io/dosnes/.
KW - Data visualization
KW - Information divergence
KW - Nonlinear dimensionality reduction
UR - http://www.scopus.com/inward/record.url?scp=85071483814&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2019.08.026
DO - 10.1016/j.patrec.2019.08.026
M3 - Article
AN - SCOPUS:85071483814
SN - 0167-8655
VL - 128
SP - 100
EP - 106
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -