Foiling Explanations in Deep Neural Networks

By Snir Vitrack Tamam et al.
Published on Aug. 10, 2023

Table of Contents

1 Introduction
2 Related Work
3 AttaXAI
4 Experiments and Results

Summary

This paper uncovers a troubling property of explanation methods for image-based DNNs: their explanations can be manipulated using evolution strategies. The proposed algorithm, AttaXAI, mounts adversarial attacks on XAI methods without any access to the model's internal details, relying only on queries to its outputs. The attack is based on evolution strategies and optimizes a fitness function to generate adversarial images that fool the explanation method. Experiments show successful manipulation of XAI methods on benchmark datasets across different deep-learning models.
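To make the idea concrete, here is a minimal sketch of a black-box, evolution-strategy search over image perturbations, in the spirit of the attack described above. It is not the paper's exact algorithm: the functions `model_predict` and `explanation_map` are hypothetical stand-ins for query access to the victim model and the XAI method, and the fitness terms and hyperparameters are illustrative assumptions.

```python
import numpy as np

def fitness(x_adv, x_orig, target_map, model_predict, explanation_map):
    """Lower is better: keep the model's prediction close to that of the
    original image while pushing the explanation toward a chosen target map.
    (Assumed fitness structure for illustration.)"""
    pred_term = np.sum((model_predict(x_adv) - model_predict(x_orig)) ** 2)
    expl_term = np.sum((explanation_map(x_adv) - target_map) ** 2)
    return pred_term + expl_term

def es_attack(x_orig, target_map, model_predict, explanation_map,
              pop_size=20, sigma=0.01, lr=0.1, iters=500):
    """NES-style evolution strategy: estimate the fitness gradient from random
    perturbations using only queries (no gradients from the model itself)."""
    x_adv = x_orig.copy()
    for _ in range(iters):
        # Sample a population of Gaussian perturbations around the current image.
        noise = np.random.randn(pop_size, *x_orig.shape)
        losses = np.array([
            fitness(x_adv + sigma * n, x_orig, target_map,
                    model_predict, explanation_map)
            for n in noise
        ])
        # Standardize fitness values and form a gradient estimate.
        losses = (losses - losses.mean()) / (losses.std() + 1e-8)
        weights = losses.reshape(-1, *([1] * x_orig.ndim))
        grad = (noise * weights).mean(axis=0) / sigma
        # Descend the estimated gradient and keep pixel values valid.
        x_adv = np.clip(x_adv - lr * grad, 0.0, 1.0)
    return x_adv
```

Under these assumptions, the attacker never touches the model's weights or gradients; every fitness evaluation costs only forward queries to the model and the explanation method, which is what makes the attack black-box.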