Countering Adversarial Examples by Means of Steganographic Attacks

Colangelo, F.; Neri, A.; Battisti, F.

doi:10.1109/EUVIP47703.2019.8946254

Deep learning models are now used in multiple contexts, including safety-critical applications. However, it has been shown that small adversarial alterations to the input, hardly visible to a human observer, can undermine the performance of a model and lead to unreliable results. Image watermarking shares similarities with this field: a small amount of information is embedded in the media, with the aim of being imperceptible yet robust. Many attacks have been developed to remove watermarks. In this paper, we evaluate the effectiveness of multiple image transformations at removing adversarial perturbations from images. Our experiments on the MNIST dataset against a Projected Gradient Descent-based adversary demonstrate that many transformations yield a significant gain in accuracy when classifying adversarial examples, while not degrading image quality when the adversarial perturbation is absent or negligible.
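
As a rough illustration of the idea of applying an image transformation before classification, the sketch below runs two generic operations on an MNIST-sized array: a 3x3 median filter and bit-depth reduction. Both are assumptions chosen for illustration, since the abstract does not list the transformations that were evaluated. The function names (`median_filter3`, `quantize`), the toy image, and the bounded uniform noise standing in for an adversarial perturbation are likewise hypothetical; no PGD attack or classifier is implemented here.

```python
import numpy as np

def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter with edge replication (illustrative choice)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted views of the image, then take the per-pixel median.
    windows = np.stack(
        [padded[i:i + h, j:j + w] for i in range(3) for j in range(3)]
    )
    return np.median(windows, axis=0)

def quantize(img: np.ndarray, levels: int = 2) -> np.ndarray:
    """Snap pixels in [0, 1] to `levels` evenly spaced values."""
    return np.round(img * (levels - 1)) / (levels - 1)

# Toy 28x28 "digit": a single vertical stroke (not real MNIST data).
clean = np.zeros((28, 28))
clean[8:20, 12:16] = 1.0

# Bounded noise as a stand-in for an adversarial perturbation (assumption:
# this is NOT a PGD attack, only noise with the same L-infinity budget).
eps = 0.1
rng = np.random.default_rng(0)
perturbed = np.clip(clean + rng.uniform(-eps, eps, clean.shape), 0.0, 1.0)

# Transform the input before it would be passed to a classifier.
restored = quantize(median_filter3(perturbed))
```

In a pipeline of this kind, `restored` rather than `perturbed` would be fed to the classifier. Note that the median filter also erodes the corners of the stroke, which is the sort of quality cost that has to be weighed against the gain in robustness.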