How to construct deep recurrent neural networks

Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio

    Research output: Contribution to conference › Paper › Scientific › peer-review


    In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing and understanding the architecture of an RNN, however, we find three points of an RNN which may be made deeper: (1) the input-to-hidden function, (2) the hidden-to-hidden transition and (3) the hidden-to-output function. Based on this observation, we propose two novel architectures of a deep RNN which are orthogonal to an earlier approach of stacking multiple recurrent layers to build a deep RNN (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental results support our claim that the proposed deep RNNs benefit from the depth and outperform conventional, shallow RNNs.
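To make the three points of depth concrete, the following is a minimal NumPy sketch (not the authors' code) of one recurrent step in which each of the three functions is replaced by a small tanh MLP instead of a single weight matrix. All dimensions, initializations, and helper names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    # A small tanh feedforward network: one layer per weight matrix.
    for W in weights:
        x = np.tanh(x @ W)
    return x

# Illustrative dimensions (not taken from the paper).
n_in, n_hid, n_out = 4, 8, 3

def init(shape):
    return rng.normal(0.0, 0.1, shape)

# A shallow RNN uses a single matrix at each point; here each point
# is a two-layer MLP, deepening (1) input-to-hidden, (2) the
# hidden-to-hidden transition, and (3) hidden-to-output.
W_in  = [init((n_in,  n_hid)), init((n_hid, n_hid))]  # (1) deep input-to-hidden
W_rec = [init((n_hid, n_hid)), init((n_hid, n_hid))]  # (2) deep transition
W_out = [init((n_hid, n_hid)), init((n_hid, n_out))]  # (3) deep hidden-to-output

def step(h_prev, x_t):
    # h_t combines the deep input path and the deep transition path.
    return np.tanh(mlp(x_t, W_in) + mlp(h_prev, W_rec))

def output(h_t):
    return mlp(h_t, W_out)

# Run a few time steps on random inputs.
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h = step(h, x)
y = output(h)
```

Collapsing each weight list to a single matrix recovers the conventional shallow RNN, which is what makes the comparison in the paper well defined.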

    Original language: English
    Publication status: Published - 1 Jan 2014
    MoE publication type: Not Eligible
    Event: International Conference on Learning Representations - Banff, Canada
    Duration: 14 Apr 2014 - 16 Apr 2014
    Conference number: 2


    Conference: International Conference on Learning Representations
    Abbreviated title: ICLR

