Hardness of Samples Is All You Need: Protecting Deep Learning Models Using Hardness of Samples

06/21/2021
by   Amir Mahdi Sadeghzadeh, et al.
0

Several recent studies have shown that Deep Neural Network (DNN)-based classifiers are vulnerable against model extraction attacks. In model extraction attacks, an adversary exploits the target classifier to create a surrogate classifier imitating the target classifier with respect to some criteria. In this paper, we investigate the hardness degree of samples and demonstrate that the hardness degree histogram of model extraction attacks samples is distinguishable from the hardness degree histogram of normal samples. Normal samples come from the target classifier's training data distribution. As the training process of DNN-based classifiers is done in several epochs, we can consider this process as a sequence of subclassifiers so that each subclassifier is created at the end of an epoch. We use the sequence of subclassifiers to calculate the hardness degree of samples. We investigate the relation between hardness degree of samples and the trust in the classifier outputs. We propose Hardness-Oriented Detection Approach (HODA) to detect the sample sequences of model extraction attacks. The results demonstrate that HODA can detect the sample sequences of model extraction attacks with a high success rate by only watching 100 attack samples. We also investigate the hardness degree of adversarial examples and indicate that the hardness degree histogram of adversarial examples is distinct from the hardness degree histogram of normal samples.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset