Audio-Driven Facial Animation by Joint End-to-End Learning of Pose and Emotion
Research output: Contribution to journal › Article › Scientific › peer-review
- Remedy Entertainment
We train our network with 3–5 minutes of high-quality animation data obtained using traditional, vision-based performance capture methods. Even though our primary goal is to model the speaking style of a single actor, our model yields reasonable results even when driven with audio from other speakers with different gender, accent, or language, as we demonstrate with a user study. The results are applicable to in-game dialogue, low-cost localization, virtual reality avatars, and telepresence.
| Journal | ACM Transactions on Graphics |
| Publication status | Published - Jul 2017 |
| MoE publication type | A1 Journal article-refereed |