TY - JOUR
T1 - Metrics and methods for robustness evaluation of neural networks with generative models
AU - Buzhinsky, Igor
AU - Nerinovsky, Arseny
AU - Tripakis, Stavros
N1 - Funding Information:
The work was financially supported by the Government of the Russian Federation (Grant 08-08) and by the Russian Science Foundation (Project 20-19-00700). We acknowledge the computational resources provided by the Aalto Science-IT project. We thank Ari Heljakka for his help related to the use of the PIONEER generative autoencoder.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2023/10
Y1 - 2023/10
N2 - Recent studies have shown that modern deep neural network classifiers are easy to fool, assuming that an adversary is able to slightly modify their inputs. Many papers have proposed adversarial attacks, defenses, and methods to measure robustness to such adversarial perturbations. However, most commonly considered adversarial examples are based on perturbations in the input space of the neural network that are unlikely to arise naturally. Recently, especially in computer vision, researchers discovered “natural” perturbations, such as rotations, changes of brightness, or higher-level changes, but these perturbations have not yet been systematically used to measure the performance of classifiers. In this paper, we propose several metrics to measure the robustness of classifiers to natural adversarial examples, and methods to evaluate them. These metrics, called latent space performance metrics, are based on the ability of generative models to capture probability distributions. On four image classification case studies, we evaluate the proposed metrics for several classifiers, including ones trained in conventional and robust ways. We find that the latent counterparts of adversarial robustness are associated with the accuracy of the classifier rather than its conventional adversarial robustness, but the latter is still reflected in the properties of the latent perturbations that are found. In addition, our novel method of finding latent adversarial perturbations demonstrates that these perturbations are often perceptually small.
KW - Adversarial examples
KW - Generative models
KW - Natural adversarial examples
KW - Reliable machine learning
UR - http://www.scopus.com/inward/record.url?scp=85109283713&partnerID=8YFLogxK
U2 - 10.1007/s10994-021-05994-9
DO - 10.1007/s10994-021-05994-9
M3 - Article
AN - SCOPUS:85109283713
SN - 0885-6125
VL - 112
SP - 3977
EP - 4012
JO - Machine Learning
JF - Machine Learning
IS - 10
ER -