Abstract
Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training Restricted Boltzmann Machines. However, both methods only approximate sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances in the stochastic gradient estimates of individual data points. It is well known that CD yields a biased gradient estimate. In this paper we show empirically that CD nevertheless has a lower stochastic gradient estimate variance than unbiased sampling, while the mean of subsequent PCD estimates has a higher variance than independent sampling. These results offer one explanation for the finding that CD can be used with smaller minibatches or higher learning rates than PCD.
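To make the distinction concrete, below is a minimal NumPy sketch (not the authors' code) of how CD-1 and PCD-1 produce a stochastic gradient estimate of the RBM weight matrix for a single data point. The RBM sizes, variable names, and the single Gibbs step are illustrative assumptions; only the structure of the positive and negative phases follows the standard algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    # Draw binary units from their Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(float)

def rbm_grad_W(v_pos, v_neg_start, W, b, c, k=1):
    """Stochastic gradient estimate of the RBM log-likelihood w.r.t. W
    for one data point, using k Gibbs steps for the negative phase."""
    # Positive phase: hidden probabilities given the data vector.
    h_pos = sigmoid(v_pos @ W + c)

    # Negative phase: k steps of block Gibbs sampling from v_neg_start.
    v = v_neg_start
    for _ in range(k):
        h = sample(sigmoid(v @ W + c))
        v = sample(sigmoid(h @ W.T + b))
    h_neg = sigmoid(v @ W + c)

    grad = np.outer(v_pos, h_pos) - np.outer(v, h_neg)
    return grad, v  # return v so a persistent chain can be continued

# Toy RBM with 6 visible and 4 hidden units (illustrative sizes).
n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b = np.zeros(n_vis)
c = np.zeros(n_hid)

v_data = sample(np.full(n_vis, 0.5))        # stand-in for a training vector
persistent_v = sample(np.full(n_vis, 0.5))  # PCD's persistent fantasy particle

# CD-1: the negative chain is restarted at the data point for every update.
grad_cd, _ = rbm_grad_W(v_data, v_data, W, b, c, k=1)

# PCD-1: the negative chain continues from wherever it stopped last time.
grad_pcd, persistent_v = rbm_grad_W(v_data, persistent_v, W, b, c, k=1)
```

The only difference between the two estimators is where the negative-phase chain starts: CD restarts it at the data vector on every update, while PCD continues it across updates, which is what gives rise to the differing bias and variance behaviour discussed above.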
Original language | English |
---|---|
Title of host publication | ESANN 2016 - 24th European Symposium on Artificial Neural Networks |
Pages | 521-526 |
Number of pages | 6 |
ISBN (Electronic) | 978-2-87578-027-8 |
Publication status | Published - 1 Jan 2016 |
MoE publication type | A4 Article in a conference publication |
Event | European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning - Bruges, Belgium; Duration: 27 Apr 2016 → 29 Apr 2016; Conference number: 24 |
Conference
Conference | European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning |
---|---|
Abbreviated title | ESANN |
Country | Belgium |
City | Bruges |
Period | 27/04/2016 → 29/04/2016 |