Distributed Bayesian matrix factorization with limited communication

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Distributed Bayesian matrix factorization with limited communication. / Qin, Xiangju; Blomstedt, Paul; Leppäaho, Eemeli; Parviainen, Pekka; Kaski, Samuel.

In: Machine Learning, 01.01.2019, p. 1-26.



BibTeX

@article{8f440f9a370a48fd84334318b7a976ba,
title = "Distributed Bayesian matrix factorization with limited communication",
abstract = "Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.",
keywords = "Bayesian matrix factorization, Distributed inference, Embarrassingly parallel MCMC, Posterior propagation",
author = "Xiangju Qin and Paul Blomstedt and Eemeli Lepp{\"a}aho and Pekka Parviainen and Samuel Kaski",
note = "openaire: EC/H2020/671555/EU//ExCAPE",
year = "2019",
month = jan,
day = "1",
doi = "10.1007/s10994-019-05778-2",
language = "English",
pages = "1--26",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",

}
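The abstract contrasts the proposed method with embarrassingly parallel MCMC, in which workers run inference on disjoint data subsets with no communication and their subposteriors are combined afterwards. A minimal sketch of that baseline idea for a conjugate Gaussian mean with a flat prior, using precision-weighted combination (this is a generic illustration, not the paper's posterior-propagation algorithm; all names and numbers are hypothetical):

```python
import random

def subposterior_moments(data, noise_prec=1.0):
    # Conjugate inference for an unknown Gaussian mean with known noise
    # precision and a flat prior: the posterior is Gaussian with these moments.
    prec = noise_prec * len(data)
    mean = noise_prec * sum(data) / prec
    return mean, prec

def combine(parts):
    # Precision-weighted product of Gaussian subposteriors: each worker
    # reports only (mean, precision), so communication is one pair per worker.
    total_prec = sum(p for _, p in parts)
    mean = sum(m * p for m, p in parts) / total_prec
    return mean, total_prec

random.seed(0)
data = [2.0 + random.gauss(0.0, 1.0) for _ in range(1000)]
shards = [data[i::4] for i in range(4)]           # 4 workers, disjoint subsets
combined = combine([subposterior_moments(s) for s in shards])
full = subposterior_moments(data)                 # full-data reference posterior
# For this conjugate model the combination is exact up to float rounding;
# the unidentifiability of BMF factors is what breaks this simple picture.
```

For non-conjugate models like BMF, the subset posteriors can each sit in a different rotation of the latent space, which is exactly the unidentifiability problem the abstract says this naive combination suffers from.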

RIS

TY  - JOUR
T1  - Distributed Bayesian matrix factorization with limited communication
AU  - Qin, Xiangju
AU  - Blomstedt, Paul
AU  - Leppäaho, Eemeli
AU  - Parviainen, Pekka
AU  - Kaski, Samuel
N1  - openaire: EC/H2020/671555/EU//ExCAPE
PY  - 2019/1/1
Y1  - 2019/1/1
N2  - Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.
AB  - Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.
KW  - Bayesian matrix factorization
KW  - Distributed inference
KW  - Embarrassingly parallel MCMC
KW  - Posterior propagation
UR  - http://www.scopus.com/inward/record.url?scp=85064242641&partnerID=8YFLogxK
U2  - 10.1007/s10994-019-05778-2
DO  - 10.1007/s10994-019-05778-2
M3  - Article
SP  - 1
EP  - 26
JO  - Machine Learning
JF  - Machine Learning
SN  - 0885-6125
ER  - 
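The low-rank representation the abstract refers to factors an m × n matrix into row factors U (m × k) and column factors V (n × k), with X ≈ U Vᵀ; a missing entry (i, j) is predicted from row i of U and row j of V. A toy illustration of that structure with fixed rank-2 factors (no inference or uncertainty here; all numbers are hypothetical):

```python
def predict(U, V, i, j):
    # Predicted value for entry (i, j) from rank-k factors: dot product
    # of the i-th row factor and the j-th column factor.
    return sum(U[i][f] * V[j][f] for f in range(len(U[i])))

U = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # 3 rows, rank 2
V = [[2.0, 1.0], [0.0, 3.0]]               # 2 columns, rank 2
X = [[predict(U, V, i, j) for j in range(len(V))] for i in range(len(U))]
# In BMF, U and V carry posterior distributions rather than point values,
# which is what yields the per-entry confidence intervals mentioned above.
```

Because each row of U enters only the predictions for that row of X, row-wise data partitioning is natural, which is why the distribution of data and computation across workers discussed in the abstract is feasible at all.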
