Deep learning has been applied to many real-world problems, but it has also proven to be highly vulnerable to simple adversarial attacks. Although various defense mechanisms have been proposed to improve the robustness of deep learning software, many of them are ineffective against adaptive attacks. Studying attack and defense methods against adversarial examples is therefore an important and challenging task. Professor Song Fu from SIST and his collaborators have conducted long-term research in this field and recently made significant progress.
A research paper from Professor Song's group, entitled "Attack as Defense: Characterizing Adversarial Examples using Robustness," was accepted by the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2021), one of the most prestigious software engineering conferences in the world (CCF-A).
The paper proposes a novel characterization for distinguishing adversarial examples from benign ones, based on the observation that adversarial examples are significantly less robust than benign ones. Building on this observation, the authors propose a defense framework, named attack as defense (A2D), which detects adversarial examples by effectively evaluating an example's robustness.
A2D evaluates robustness by measuring the cost of attacking an input image: since less robust examples are easier to attack, inputs that are cheap to attack are classified as adversarial. Extensive experiments on the MNIST, CIFAR10, and ImageNet datasets show that A2D is more effective than recent promising approaches.
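At a high level, the attack-as-defense principle can be illustrated with the short PyTorch sketch below. This is a minimal, hypothetical illustration rather than the code from the paper: the particular attack used to probe robustness (an untargeted PGD-style attack), the step budget, and the detection threshold are all placeholder assumptions, and the function names are invented for this example.

```python
# Hypothetical sketch of attack-as-defense: approximate an input's robustness
# by the number of attack steps needed to change the model's prediction, and
# flag inputs that are cheap to attack (few steps) as likely adversarial.
import torch
import torch.nn.functional as F


def attack_cost(model, x, eps=0.03, step_size=0.005, max_steps=50):
    """Return the number of PGD-style steps needed to flip the prediction
    for a single input x (batch size 1); return max_steps if it survives."""
    model.eval()
    with torch.no_grad():
        original_label = model(x).argmax(dim=1)

    x_adv = x.clone().detach()
    for step in range(1, max_steps + 1):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), original_label)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            # gradient-sign step, projected onto an L-infinity ball around x
            x_adv = x_adv + step_size * grad.sign()
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps)
            x_adv = x_adv.clamp(0.0, 1.0)
            if model(x_adv).argmax(dim=1).item() != original_label.item():
                return step  # cheap to attack => low robustness
    return max_steps


def looks_adversarial(model, x, threshold=10):
    """Flag x as adversarial if its attack cost falls below a threshold
    that would, in practice, be calibrated on benign inputs."""
    return attack_cost(model, x) < threshold
```

In practice the detection threshold would be calibrated on a held-out set of benign inputs, so that examples requiring noticeably fewer attack steps than typical benign examples are rejected as adversarial.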
Figure 1. Illustration of the adversarial examples
The figure above has three columns, showing adversarial examples of targeted attacks from 'airplane' to 'cat' and 'horse'. The first column is the original image. Without any defense, adversarial examples with little distortion can be crafted, as shown in the second column. These adversarial examples are very deceptive: the perturbations are difficult for humans to detect, yet they fool the neural network into classifying the images as 'cat' (top) and 'horse' (bottom). When both the proposed defense and adversarial training are enabled, attackers need much more distortion to craft adversarial examples, as shown in the third column. The distortion is now too large to remain imperceptible, and users can clearly see the silhouette of a 'cat' or 'horse' on the adversarial examples.
PhD candidate Zhao Zhe is the first author, and Professor Song Fu is the corresponding author. The work was jointly completed by ShanghaiTech University, Zhejiang University, and Singapore Management University, with ShanghaiTech as the first affiliation.