Stochastic gradient estimate variance in contrastive divergence and persistent contrastive divergence

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

1 Citation (Scopus)

Abstract

Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training Restricted Boltzmann Machines. However, both methods rely on approximate sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for the stochastic gradient estimates of individual data points. It is well known that CD yields a biased gradient estimate. In this paper, however, we show empirically that CD has a lower stochastic gradient estimate variance than unbiased sampling, while the mean of subsequent PCD estimates has a higher variance than independent sampling. The results offer one explanation for the finding that CD can be used with smaller minibatches or higher learning rates than PCD.
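To make the distinction concrete, the following is a minimal sketch (not the authors' code) of a single stochastic gradient estimate for a Bernoulli RBM, contrasting CD-1, where the Gibbs chain restarts at the data point, with PCD-1, where it continues from a persistent fantasy particle. All names (`W`, `b`, `c`, `persistent_v`, `rbm_grad`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Sample binary units from their Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def rbm_grad(v_data, v_start, W, b, c, k=1):
    """One stochastic estimate of d log p / dW for a single data point.

    v_start is the initial state of the Gibbs chain:
      - CD:  v_start = v_data (chain restarts at the data point)
      - PCD: v_start = persistent fantasy particle kept between updates
    """
    # Positive phase: expected hidden activations given the data point.
    ph_data = sigmoid(v_data @ W + c)
    pos = np.outer(v_data, ph_data)

    # Negative phase: k steps of block Gibbs sampling from v_start.
    v = v_start
    for _ in range(k):
        h = sample(sigmoid(v @ W + c))
        v = sample(sigmoid(h @ W.T + b))
    ph_model = sigmoid(v @ W + c)
    neg = np.outer(v, ph_model)

    return pos - neg, v  # gradient estimate and final chain state

# Toy usage: CD restarts the chain at the data, PCD carries a persistent chain.
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
v_data = sample(np.full(n_vis, 0.5))

grad_cd, _ = rbm_grad(v_data, v_data, W, b, c)                     # CD-1
persistent_v = sample(np.full(n_vis, 0.5))
grad_pcd, persistent_v = rbm_grad(v_data, persistent_v, W, b, c)   # PCD-1
```

The paper's comparison concerns the bias and variance of such per-data-point estimates; the sketch only illustrates where the two estimators differ, namely in the starting state of the negative-phase chain.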
Original language: English
Title of host publication: ESANN 2016 - 24th European Symposium on Artificial Neural Networks
Pages: 521-526
Number of pages: 6
ISBN (Electronic): 978-2-87578-027-8
Publication status: Published - 1 Jan 2016
MoE publication type: A4 Article in a conference publication
Event: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning - Bruges, Belgium
Duration: 27 Apr 2016 - 29 Apr 2016
Conference number: 24

Conference

Conference: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Abbreviated title: ESANN
Country: Belgium
City: Bruges
Period: 27/04/2016 - 29/04/2016
