Linear Guardedness and its Implications

10/18/2022
by Shauli Ravfogel et al.

Previous work on concept identification in neural representations has focused on linear concept subspaces and their neutralization. In this work, we formulate the notion of linear guardedness, the inability of a linear classifier to directly predict a given concept from the representation, and study its implications. We show that, in the binary case, the neutralized concept cannot be recovered by an additional linear layer. However, contrary to what was implicitly argued in previous works, multiclass softmax classifiers can be constructed that indirectly recover the concept. Linear guardedness therefore does not guarantee that linear classifiers make no use of the neutralized concept, shedding light on the theoretical limitations of linear information-removal methods.
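To make the multiclass claim concrete, here is a minimal sketch, not the paper's construction: synthetic XOR-style data in which the concept is linearly guarded by construction, so a binary linear probe performs at chance, yet a four-class softmax classifier whose predicted classes are merged after the fact recovers the concept. The data, the quadrant label set, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the authors' setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 2))
# XOR-style concept: 1 when the two coordinates share a sign. No single
# linear classifier beats chance on it, i.e., it is linearly guarded.
concept = (X[:, 0] * X[:, 1] > 0).astype(int)

# Binary linear probe on the concept: accuracy stays near 0.5.
binary_probe = LogisticRegression().fit(X, concept)
print("binary probe accuracy:", binary_probe.score(X, concept))  # ~0.5

# Four-class softmax over the quadrants (an illustrative label set),
# then merge the predicted classes back into the concept.
quadrant = 2 * (X[:, 0] > 0) + (X[:, 1] > 0)
softmax_clf = LogisticRegression().fit(X, quadrant)
pred = softmax_clf.predict(X)
# Quadrants (-,-) and (+,+), classes 0 and 3, correspond to concept = 1.
recovered = np.isin(pred, [0, 3]).astype(int)
print("recovered concept accuracy:", (recovered == concept).mean())  # ~1.0

Each softmax decision region here is a convex intersection of half-spaces, but the merged regions form a set no single hyperplane can express, which is exactly the kind of indirect recovery the abstract warns about.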
