In the literature, affective computing solutions rely mainly on machine learning methods designed to accurately detect human affective states. Many of the proposed methods, however, are based on handcrafted features and therefore require substantial expert knowledge of signal processing. With the advent of deep learning, attention has turned toward reduced feature engineering and more end-to-end learning. Yet most of the proposed models rely on late fusion in a multimodal context, while interrelations between modalities at the intermediate level of data representation have been largely neglected. In this paper, we propose a novel deep convolutional neural network, called CN-Waterfall, consisting of two modules: Base and General. The Base module focuses on the low-level representation of data from each single modality, whereas the General module provides further information by capturing relations between modalities in the intermediate- and high-level data representations. The General module is designed on the basis of theoretically grounded concepts in the Explainable AI (XAI) domain and consists of four different fusions, which are mainly tailored to correlation- and non-correlation-based modalities. To validate our model, we conduct extensive experiments on WESAD and MAHNOB-HCI, two datasets that are publicly available for academic use in multimodal affective computing. We demonstrate that our proposed model significantly improves the performance of physiological-based multimodal affect detection.
- Data fusion
- Deep convolutional neural network
- Multimodal affect detection
- Physiological-based sensors
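
To make the distinction between per-modality processing and intermediate-level fusion concrete, the following is a minimal illustrative sketch in plain Python. It is not the CN-Waterfall architecture and does not reproduce any of its four fusions. The function names, the shared convolution kernel, the 0.5 correlation threshold, the rule of averaging correlated feature maps and concatenating the remaining ones, and the toy signals are all assumptions introduced here for illustration only.

```python
from math import sqrt


def conv1d_relu(signal, kernel):
    """Valid 1D convolution (cross-correlation form) followed by ReLU.

    Stands in for a per-modality, low-level feature extractor (hypothetical).
    """
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(max(0.0, s))
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def split_by_correlation(features, threshold=0.5):
    """Split modality names into (correlated, uncorrelated).

    Assumption: a modality counts as correlated if its feature map has
    |r| >= threshold with at least one other modality's feature map.
    """
    names = list(features)
    correlated = [
        n for n in names
        if any(abs(pearson(features[n], features[m])) >= threshold
               for m in names if m != n)
    ]
    uncorrelated = [n for n in names if n not in correlated]
    return correlated, uncorrelated


def fuse(features, threshold=0.5):
    """Intermediate-level fusion sketch (assumed rule, not the paper's).

    Correlated feature maps are averaged element-wise; uncorrelated ones
    are concatenated after that average. Feature maps must share a length.
    """
    correlated, uncorrelated = split_by_correlation(features, threshold)
    fused = []
    if correlated:
        maps = [features[n] for n in correlated]
        fused.extend(sum(col) / len(col) for col in zip(*maps))
    for n in uncorrelated:
        fused.extend(features[n])
    return fused, correlated, uncorrelated


# Toy signals (invented values, not taken from WESAD or MAHNOB-HCI).
signals = {
    "ecg": [0, 1, 2, 3, 4, 5],
    "eda": [0, 2, 4, 6, 8, 10],
    "temp": [1, 0, 1, 0, 1, 0],
}
features = {name: conv1d_relu(sig, [0.5, 0.5]) for name, sig in signals.items()}
fused, correlated, uncorrelated = fuse(features)
```

In contrast to late fusion, which would combine only the final per-modality decisions, this sketch combines feature maps before any classifier is applied, so relations between modalities can influence the shared representation.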