Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

By Justus Renkhoff et al.

Table of Contents

Abstract
I. Introduction
II. Related Work
III. Methodology
A. Problem Formulation
B. Data Preparation
C. Network Behavior Deviation Detection
D. DLFuzz for Adversarial Example Generation
IV. Evaluation and Discussion
A. Seed Selection
B. Neuron Coverage Under Different Attacks
C. Network Vulnerability Analysis
D. Behavior Drift Under Different Attack Strengths
E. Distribution of the Number of Compromised Layers

Summary

Deep Learning (DL) is being applied across many domains, including safety-critical applications such as autonomous driving. This paper explores adversarial attacks on neural networks through an explainable approach: gradient heatmaps are used to analyze how the VGG-16 model responds when input images are mixed with adversarial noise and with Gaussian random noise. The findings show that adversarial noise causes severe behavior deviation by distracting the network's areas of concentration, and that specific blocks in the network are more vulnerable to adversarial attacks than others. The paper presents a methodology for analyzing the robustness of Deep Neural Networks (DNNs) to adversarial attacks: it identifies vulnerable layers, explores neuron coverage optimization under different attack scenarios, and measures the resulting behavior drift of the network. The study aims to provide insights for developing more reliable DNN models and underscores the importance of understanding and mitigating adversarial attacks.
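
To make the comparison concrete, the following is a minimal sketch (not the authors' code) of the kind of analysis described above: a simple gradient saliency heatmap for a pretrained VGG-16, computed on a clean image, on the same image with Gaussian random noise, and on the same image with an adversarial perturbation. One-step FGSM is used here only as a stand-in for the paper's DLFuzz-based adversarial example generation, "example.jpg" is a placeholder input path, and the attention-drift measure at the end is purely illustrative.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Pretrained VGG-16, the model analyzed in the paper.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
for p in model.parameters():
    p.requires_grad_(False)  # only input gradients are needed

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def saliency_heatmap(x):
    """Gradient of the top-class score w.r.t. the input, collapsed to a 2-D map."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    top_class = logits.argmax(dim=1)
    logits[0, top_class.item()].backward()
    # Max over color channels gives a per-pixel importance map.
    return x.grad.abs().max(dim=1)[0].squeeze(0)

def fgsm_noise(x, epsilon=0.01):
    """One-step FGSM perturbation (stand-in for DLFuzz in the paper)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, logits.argmax(dim=1))
    loss.backward()
    return epsilon * x.grad.sign()

# "example.jpg" is a placeholder path for any seed image.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

clean_map = saliency_heatmap(img)
gauss_map = saliency_heatmap(img + 0.01 * torch.randn_like(img))
adv_map   = saliency_heatmap(img + fgsm_noise(img))

def drift(a, b):
    """Rough measure of how far the network's attention moves from the clean map."""
    a, b = a / a.sum(), b / b.sum()
    return (a - b).abs().sum().item()

print("attention drift, Gaussian noise:   ", drift(clean_map, gauss_map))
print("attention drift, adversarial noise:", drift(clean_map, adv_map))
```

Under the paper's findings, the drift under adversarial noise would be expected to be markedly larger than under Gaussian noise of comparable magnitude, since the adversarial perturbation actively distracts the network's areas of concentration.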