Gameplay is often an emotionally charged activity, particularly when streamed in front of a live audience. From a games user research perspective, it would be beneficial to automatically detect and recognize players’ and streamers’ emotional expression, as this data can be used for identifying gameplay highlights, computing emotion metrics, or selecting parts of the videos for further analysis, e.g., through assisted recall. We contribute the first automatic game stream emotion annotation system that combines neural network analysis of facial expressions, video transcript sentiment, voice emotion, and low-level audio features (pitch, loudness). Using human-annotated emotional expression data as the ground truth, we reach accuracies of up to 70.7%, on par with the inter-rater agreement of the human annotators. In detecting the 5 most intense events of each video, we reach a higher accuracy of 80.4%. Our system is particularly accurate in detecting clearly positive emotions like amusement and excitement, but more limited with subtle emotions like puzzlement.
|Title of host publication|CHI PLAY 2019|
|Publication status|Accepted/In press - 2019|
|MoE publication type|A4 Article in a conference publication|