Influential Sample Selection: A Graph Signal Processing Approach
With the growing complexity of machine learning techniques, understanding the functioning of black-box models is more important than ever. A recently popular strategy for interpretability is to generate explanations based on examples, called influential samples, that have the largest influence on the model's observed behavior. However, such an analysis is confronted with a plethora of influence metrics. While each of these metrics provides a different level of representativeness and diversity, existing approaches implicitly couple the definition of influence to their sample selection algorithm, making it challenging to generalize to specific analysis needs. In this paper, we propose a generic approach to influential sample selection, which analyzes the influence metric as a function on a graph constructed using the samples. We show that the samples that are critical to recovering the high-frequency content of the function correspond to the most influential samples. Our approach decouples the influence metric from the actual sample selection technique, and hence can be used with any type of task-specific influence. Using experiments in prototype selection and semi-supervised classification, we show that, even with popularly used influence metrics, our approach can produce superior results in comparison to state-of-the-art approaches. Furthermore, we demonstrate how a novel influence metric can be used to recover the influence structure in characterizing the decision surface and in recovering corrupted labels efficiently.
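To make the idea of treating an influence metric as a graph signal concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a symmetric k-nearest-neighbor graph with binary weights, and it uses the magnitude of the combinatorial graph Laplacian applied to the influence values as a simple stand-in for "high-frequency content". The function names `knn_graph`, `high_frequency_scores`, and `select_influential` are hypothetical and introduced only for illustration.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric k-nearest-neighbor adjacency matrix with binary weights."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)  # keep an edge if either endpoint chose it


def high_frequency_scores(W, f):
    """Assumed proxy: |L f|, where L = D - W is the graph Laplacian.

    (L f)[i] sums the differences between f[i] and its neighbors' values,
    so it is large where the signal changes sharply across edges.
    """
    L = np.diag(W.sum(axis=1)) - W
    return np.abs(L @ np.asarray(f, dtype=float))


def select_influential(X, f, m, k=5):
    """Indices of the m samples with the largest high-frequency score."""
    scores = high_frequency_scores(knn_graph(X, k), f)
    return np.argsort(-scores)[:m]


# Toy example: ten points on a line, with an influence signal that jumps
# between samples 4 and 5. Only those two samples have a nonzero score.
X = np.arange(10, dtype=float).reshape(-1, 1)
f = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
selected = select_influential(X, f, m=2, k=2)
print(sorted(selected.tolist()))  # [4, 5]
```

Because the influence values enter only as the vector `f`, any task-specific influence metric could be substituted without changing the selection step, which mirrors the decoupling described above.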