Table of Contents
1 Introduction
2 Related Work
3 AttaXAI
4 Experiments and Results
Summary
This paper uncovers a troubling property of explanation methods for image-based DNNs: their explanations can be manipulated through evolution strategies. The proposed algorithm, AttaXAI, mounts adversarial attacks on XAI methods without access to the model's internals. Based on an Evolution Strategy, it optimizes a fitness function to generate adversarial instances that fool the explanation method, and experiments demonstrate successful manipulation of XAI methods on benchmark datasets with different deep-learning models.
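To make the query-only setting concrete, the sketch below shows a generic evolution-strategy loop of the kind the summary describes, not AttaXAI itself: the fitness weighting, the callables `model_probs` and `explanation_map`, and all hyperparameters are illustrative assumptions. The attacker only queries the model's output and the explanation map, and evolves an image whose explanation drifts toward an arbitrary target map while the prediction stays close to the original.

```python
import numpy as np

def es_explanation_attack(x_orig, target_map, model_probs, explanation_map,
                          pop_size=50, sigma=0.05, lr=0.01, steps=500, alpha=1.0):
    """Query-only attack sketch: evolve an image whose explanation map moves
    toward `target_map` while the model's output stays near the original.

    model_probs(x)     -> class probabilities for image x (black-box query)
    explanation_map(x) -> explanation/saliency map for x (XAI method under attack)
    """
    p_orig = model_probs(x_orig)

    def fitness(x):
        # Lower is better: explanation distance to the target map plus
        # a penalty for drifting away from the original prediction.
        expl_term = np.mean((explanation_map(x) - target_map) ** 2)
        pred_term = np.mean((model_probs(x) - p_orig) ** 2)
        return expl_term + alpha * pred_term

    x_adv = np.asarray(x_orig, dtype=np.float64).copy()
    expand = (slice(None),) + (None,) * x_adv.ndim  # broadcast scores over image dims
    for _ in range(steps):
        eps = np.random.randn(pop_size // 2, *x_adv.shape)
        eps = np.concatenate([eps, -eps])                          # mirrored (antithetic) sampling
        scores = np.array([fitness(x_adv + sigma * e) for e in eps])
        scores = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalize fitness values
        grad_est = (scores[expand] * eps).mean(axis=0) / sigma     # search-gradient estimate
        x_adv = np.clip(x_adv - lr * grad_est, 0.0, 1.0)           # descend the fitness
    return x_adv
```

Because the loop only evaluates the fitness on perturbed copies of the image, it needs no gradients from the model or the explanation method, which is what allows such an attack to operate without access to internal model details; the mirrored sampling simply reduces the variance of the search-gradient estimate.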