TY - JOUR
T1 - Metrics and Evaluations of Time Series Explanations: An Application in Affect Computing
AU - Fouladgar, Nazanin
AU - Alirezaie, Marjan
AU - Främling, Kary
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2022
Y1 - 2022
N2 - Explainable artificial intelligence (XAI) has shed light on enormous applications by clarifying why neural models make specific decisions. However, it remains challenging to measure how sensitive XAI solutions are to the explanations of neural models. Although different evaluation metrics have been proposed to measure sensitivity, the main focus has been on the visual and textual data. There is insufficient attention devoted to the sensitivity metrics tailored for time series data. In this paper, we formulate several metrics, including max short-term sensitivity (MSS), max long-term sensitivity (MLS), average short-term sensitivity (ASS) and average long-term sensitivity (ALS), that target the sensitivity of XAI models with respect to the generated and real time series. Our hypothesis is that for close series with the same labels, we obtain similar explanations. We evaluate three XAI models, LIME, integrated gradient (IG), and SmoothGrad (SG), on CN-Waterfall, a deep convolutional network. This network is a highly accurate time series classifier in affect computing. Our experiments rely on data-related, metric-related and XAI-hyperparameter-related settings on the WESAD and MAHNOB-HCI datasets. The results reveal that (i) IG and LIME provide a lower sensitivity scale than SG in all the metrics and settings, potentially due to the lower scale of importance scores generated by IG and LIME, (ii) the XAI models show higher sensitivities for a smaller window of data, (iii) the sensitivities of XAI models fluctuate when the network parameters and data properties change, and (iv) the XAI models provide unstable sensitivities under different settings of hyperparameters.
AB - Explainable artificial intelligence (XAI) has shed light on enormous applications by clarifying why neural models make specific decisions. However, it remains challenging to measure how sensitive XAI solutions are to the explanations of neural models. Although different evaluation metrics have been proposed to measure sensitivity, the main focus has been on the visual and textual data. There is insufficient attention devoted to the sensitivity metrics tailored for time series data. In this paper, we formulate several metrics, including max short-term sensitivity (MSS), max long-term sensitivity (MLS), average short-term sensitivity (ASS) and average long-term sensitivity (ALS), that target the sensitivity of XAI models with respect to the generated and real time series. Our hypothesis is that for close series with the same labels, we obtain similar explanations. We evaluate three XAI models, LIME, integrated gradient (IG), and SmoothGrad (SG), on CN-Waterfall, a deep convolutional network. This network is a highly accurate time series classifier in affect computing. Our experiments rely on data-related, metric-related and XAI-hyperparameter-related settings on the WESAD and MAHNOB-HCI datasets. The results reveal that (i) IG and LIME provide a lower sensitivity scale than SG in all the metrics and settings, potentially due to the lower scale of importance scores generated by IG and LIME, (ii) the XAI models show higher sensitivities for a smaller window of data, (iii) the sensitivities of XAI models fluctuate when the network parameters and data properties change, and (iv) the XAI models provide unstable sensitivities under different settings of hyperparameters.
KW - deep convolutional neural network
KW - Explainable AI
KW - metrics
KW - time series data
UR - http://www.scopus.com/inward/record.url?scp=85125751693&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2022.3155115
DO - 10.1109/ACCESS.2022.3155115
M3 - Article
AN - SCOPUS:85125751693
SN - 2169-3536
VL - 10
SP - 23995
EP - 24009
JO - IEEE Access
JF - IEEE Access
ER -