Linear Guardedness and its Implications

10/18/2022
by Shauli Ravfogel et al.

Previous work on concept identification in neural representations has focused on linear concept subspaces and their neutralization. In this work, we formulate the notion of linear guardedness, the inability of a linear classifier to directly predict a given concept from the representation, and study its implications. We show that, in the binary case, the neutralized concept cannot be recovered by an additional linear layer. However, contrary to what was implicitly argued in previous works, multiclass softmax classifiers can be constructed that indirectly recover the concept. Linear guardedness therefore does not guarantee that linear classifiers make no use of the neutralized concept, shedding light on the theoretical limitations of linear information-removal methods.
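To make the multiclass claim concrete, here is a minimal sketch, not the paper's construction: synthetic XOR-style data in which the concept is linearly guarded by construction, so a binary linear probe performs at chance, yet a four-class softmax classifier whose predicted classes are merged after the fact recovers the concept. The data, the quadrant label set, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the authors' setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 2))
# XOR-style concept: 1 when the two coordinates share a sign. No single
# linear classifier beats chance on it, i.e., it is linearly guarded.
concept = (X[:, 0] * X[:, 1] > 0).astype(int)

# Binary linear probe on the concept: accuracy stays near 0.5.
binary_probe = LogisticRegression().fit(X, concept)
print("binary probe accuracy:", binary_probe.score(X, concept))  # ~0.5

# Four-class softmax over the quadrants (an illustrative label set),
# then merge the predicted classes back into the concept.
quadrant = 2 * (X[:, 0] > 0) + (X[:, 1] > 0)
softmax_clf = LogisticRegression().fit(X, quadrant)
pred = softmax_clf.predict(X)
# Quadrants (-,-) and (+,+), classes 0 and 3, correspond to concept = 1.
recovered = np.isin(pred, [0, 3]).astype(int)
print("recovered concept accuracy:", (recovered == concept).mean())  # ~1.0

Each softmax decision region here is a convex intersection of half-spaces, but the merged regions form a set no single hyperplane can express, which is exactly the kind of indirect recovery the abstract warns about.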
