Countering Adversarial Examples by Means of Steganographic Attacks
Battisti F.
2019
Abstract
Deep learning models are now used in many contexts, including safety-critical applications. However, it has been shown that small adversarial alterations to the input can undermine a model's performance, leading to unreliable results while remaining barely visible to a human observer. Image watermarking shares similarities with this field: a small amount of information is embedded in the media, with the aim of being imperceptible yet robust. Many attacks have been developed to remove watermarks. In this paper, we evaluate the effectiveness of multiple image transformations in removing adversarial perturbations from images. Our experiments on the MNIST dataset against a Projected Gradient Descent-based adversary demonstrate that many transformations yield a significant gain in accuracy when classifying adversarial examples, without degrading image quality when the adversary is absent or the perturbation is not significant.
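As a rough illustration of the setting described in the abstract, the sketch below crafts an L-infinity Projected Gradient Descent perturbation and then applies two simple image transformations of the kind used to attack watermarks. It is only a sketch under stated assumptions: the toy logistic model, the random 28x28 input standing in for an MNIST digit, the parameter values, and the choice of median filtering and bit-depth reduction are all hypothetical and are not taken from the paper.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps=0.1, step=0.02, iters=20):
    """L-infinity PGD against a toy logistic model p = sigmoid(w.x + b).

    The model and all parameter values are illustrative assumptions.
    """
    x_adv = x.copy()
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = (p - y) * w                        # d(cross-entropy)/d(input)
        x_adv = x_adv + step * np.sign(grad)      # ascend the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep a valid pixel range
    return x_adv

def median3(img):
    """3x3 median filter with edge padding (one candidate transformation)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    windows = np.stack(
        [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    )
    return np.median(windows, axis=0)

def quantize(img, levels=4):
    """Bit-depth reduction to `levels` grey levels (another candidate)."""
    return np.round(img * (levels - 1)) / (levels - 1)

# Hypothetical stand-ins for an MNIST image and a trained classifier.
rng = np.random.default_rng(0)
x = rng.random(28 * 28)                # flattened image with pixels in [0, 1]
w = rng.normal(size=28 * 28) * 0.01    # small weights keep the sigmoid unsaturated
b, y = 0.0, 1.0

x_adv = pgd_linf(x, y, w, b)
filtered = median3(x_adv.reshape(28, 28))
reduced = quantize(x_adv.reshape(28, 28))
```

In an evaluation like the one the abstract describes, each transformed image would be passed back to the classifier, and accuracy on adversarial inputs would be compared with and without the transformation.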