Abstract
Recurrent neural network language models (RNNLMs) are becoming popular in state-of-the-art speech recognition systems. However, they cannot remember long-term patterns well due to the so-called vanishing gradient problem. Recently, the bag-of-words (BOW) representation of a word sequence has frequently been used as a context feature to improve the performance of standard feedforward NNLMs. However, BOW features have not been shown to benefit RNNLMs. In this paper, we introduce a technique that uses BOW features to remember long-term dependencies in an RNNLM by creating a context feature vector in a separate non-linear context layer during RNNLM training. The context information is incrementally updated based on the BOW features and processed further in the non-linear context layer. The output of this layer is used as a context feature vector and fed into both the hidden and output layers of the RNNLM. Experiments on the Penn Treebank corpus indicate that our approach can provide lower perplexity with fewer parameters and faster training than the conventional RNNLM. Moreover, we carried out speech recognition experiments on the Wall Street Journal corpus and achieved a lower word error rate than with the RNNLM.
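The abstract describes the architecture only at a high level: a BOW vector is updated incrementally, passed through a separate non-linear context layer, and the resulting context vector feeds both the hidden and output layers. The sketch below shows one forward step of such a model in plain Python. It is illustrative only, not the authors' implementation. Several details are assumptions that the abstract does not specify: the BOW update as an exponentially decayed sum (`decay`), `tanh` for the context layer, a sigmoid hidden layer, and all weight names (`U`, `W`, `F`, `G`, `V`, `H`) and the class name `BowContextRNNLM`, which are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class BowContextRNNLM:
    """Hypothetical RNNLM with a BOW-driven non-linear context layer (sketch)."""

    def __init__(self, vocab: int, hidden: int, context: int,
                 decay: float = 0.9, seed: int = 0):
        rng = random.Random(seed)
        self.vocab, self.decay = vocab, decay   # decay: assumed BOW forgetting factor
        self.U = rand_matrix(hidden, vocab, rng)    # input word -> hidden
        self.W = rand_matrix(hidden, hidden, rng)   # recurrent hidden -> hidden
        self.F = rand_matrix(context, vocab, rng)   # BOW -> context layer
        self.G = rand_matrix(hidden, context, rng)  # context -> hidden
        self.V = rand_matrix(vocab, hidden, rng)    # hidden -> output
        self.H = rand_matrix(vocab, context, rng)   # context -> output
        self.h = [0.0] * hidden
        self.bow = [0.0] * vocab

    def step(self, word: int) -> Vector:
        """Consume one word id and return P(next word | history)."""
        x = [0.0] * self.vocab
        x[word] = 1.0
        # Incremental BOW update (assumed form): decay old counts, add current word.
        self.bow = [self.decay * b + xi for b, xi in zip(self.bow, x)]
        # Separate non-linear context layer computed from the BOW vector.
        c = [math.tanh(z) for z in matvec(self.F, self.bow)]
        # The context vector feeds the hidden layer ...
        self.h = sigmoid(add(matvec(self.U, x), matvec(self.W, self.h),
                             matvec(self.G, c)))
        # ... and the output layer.
        return softmax(add(matvec(self.V, self.h), matvec(self.H, c)))


if __name__ == "__main__":
    lm = BowContextRNNLM(vocab=10, hidden=8, context=4)
    for w in [3, 1, 4, 1, 5]:
        probs = lm.step(w)
    print(len(probs), round(sum(probs), 6))  # prints: 10 1.0
```

Because the BOW vector decays slowly instead of being squashed through the recurrent weights at every step, words seen many steps earlier can still influence the context vector, which is the intuition behind using it to carry long-term information.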
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual Conference of the International Speech Communication Association |
Subtitle of host publication | Interspeech'16, San Francisco, USA, Sept. 8-12, 2016 |
Publisher | International Speech Communication Association (ISCA) |
Pages | 3504-3508 |
Number of pages | 5 |
Volume | 08-12-September-2016 |
ISBN (Electronic) | 978-1-5108-3313-5 |
DOIs | |
Publication status | Published - 2016 |
MoE publication type | A4 Conference publication |
Event | Interspeech, San Francisco, United States (8 Sept 2016 → 12 Sept 2016; conference number 17) |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association |
---|---|
Publisher | International Speech Communication Association |
ISSN (Print) | 1990-9770 |
ISSN (Electronic) | 2308-457X |
Conference
Conference | Interspeech |
---|---|
Country/Territory | United States |
City | San Francisco |
Period | 08/09/2016 → 12/09/2016 |
Keywords
- Bag-of-words
- Language modeling
- Recurrent neural networks
- Speech recognition