Explaining Image Classifiers by Adaptive Dropout and Generative In-filling
Explanations of black-box classifiers often rely on saliency maps, which score the relevance of each input dimension to the resulting classification. Recent approaches compute saliency by searching for regions of the input that, when replaced by a reference value, most change the classification outcome. These reference values are chosen by ad hoc heuristics such as the input mean. In this work we instead marginalize out masked regions of the input by conditioning a generative model on the rest of the image. Our model-agnostic method produces realistic explanations, generating plausible inputs that would have caused the model to classify differently. When applied to image classification, our method produces more compact and relevant explanations, with fewer artifacts than previous methods.
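A minimal sketch of one variant of this idea, assuming a pretrained differentiable classifier `classifier`, an in-painting model `infill` that fills masked pixels conditioned on the visible ones, and a soft mask optimized by gradient descent to shrink the target-class probability; all names and hyperparameters are illustrative placeholders, not the authors' code.

```python
# Hypothetical sketch: optimize a soft mask so that in-filling the masked
# region (rather than replacing it with a heuristic reference value) reduces
# the classifier's confidence in the target class. `classifier` and `infill`
# are assumed placeholders for a pretrained model and a conditional in-filler.
import torch

def saliency_mask(classifier, infill, image, target_class,
                  steps=300, lr=0.05, sparsity_weight=1e-3):
    """Return a saliency map: high where in-filled replacement of the pixel
    most reduces the probability of `target_class`. `image` is (1, C, H, W)."""
    # One scalar logit per pixel; a sigmoid keeps the mask in (0, 1).
    mask_logits = torch.zeros(1, 1, *image.shape[-2:], requires_grad=True)
    optimizer = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        mask = torch.sigmoid(mask_logits)            # 1 = keep, 0 = drop
        # Marginalize the dropped region: blend the original pixels with a
        # plausible in-filling conditioned on the kept pixels.
        composite = mask * image + (1 - mask) * infill(image, mask)
        prob = torch.softmax(classifier(composite), dim=-1)[0, target_class]
        # Minimize the target-class probability plus a sparsity penalty so
        # the explanation stays compact.
        loss = prob + sparsity_weight * (1 - mask).abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # The dropped region is the salient one.
    return 1 - torch.sigmoid(mask_logits).detach()
```

The in-filled composite also serves as a counterfactual: a plausible image the classifier would have labeled differently.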