Do Concept Bottleneck Models Learn as Intended?

by Andrei Margeloiu, et al.

Concept bottleneck models map from raw inputs to concepts, and then from concepts to targets. Such models aim to incorporate pre-specified, high-level concepts into the learning procedure, and have been motivated to meet three desiderata: interpretability, predictability, and intervenability. However, we find that concept bottleneck models struggle to meet these goals. Using post hoc interpretability methods, we demonstrate that concepts do not correspond to anything semantically meaningful in input space, thus calling into question the usefulness of concept bottleneck models in their current form.
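The two-stage mapping the abstract describes (inputs to concepts, then concepts to targets) can be illustrated with a minimal sketch. This is not the authors' implementation; the random linear maps stand in for trained networks, and all names and dimensions here are illustrative assumptions. The key structural property is that the label predictor sees only the concept activations, which is also what makes interventions possible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 8 input features, 3 concepts, 2 classes.
N_FEATURES, N_CONCEPTS, N_CLASSES = 8, 3, 2

# Concept predictor g: inputs -> concept activations. A random linear
# map through a sigmoid stands in for a trained network.
W_g = rng.normal(size=(N_FEATURES, N_CONCEPTS))

def predict_concepts(x):
    return 1.0 / (1.0 + np.exp(-(x @ W_g)))

# Label predictor f: concepts -> class scores. The final prediction
# depends on the input only through the concept bottleneck.
W_f = rng.normal(size=(N_CONCEPTS, N_CLASSES))

def predict_label(c):
    return np.argmax(c @ W_f, axis=-1)

x = rng.normal(size=(4, N_FEATURES))
c = predict_concepts(x)   # concept layer, shape (4, 3), values in [0, 1]
y = predict_label(c)      # targets predicted from concepts alone

# Intervenability: an expert overwrites a predicted concept, and the
# downstream prediction is recomputed from the corrected concepts.
c_fixed = c.copy()
c_fixed[:, 0] = 1.0
y_fixed = predict_label(c_fixed)
```

Because the target depends only on `c`, correcting a concept at test time propagates directly to the prediction; the paper's critique is that the learned concept activations may not actually align with the semantic concepts this interface assumes.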


Post-hoc Concept Bottleneck Models

Concept Bottleneck Models (CBMs) map the inputs onto a set of interpreta...

Concept Bottleneck Models

We seek to learn models that we can interact with using high-level conce...

Concept Bottleneck with Visual Concept Filtering for Explainable Medical Image Classification

Interpretability is a crucial factor in building reliable models for var...

Sparse Linear Concept Discovery Models

The recent mass adoption of DNNs, even in safety-critical scenarios, has...

Concept Embedding Models

Deploying AI-powered systems requires trustworthy models supporting effe...

GlanceNets: Interpretable, Leak-proof Concept-based Models

There is growing interest in concept-based models (CBMs) that combine hi...

A Closer Look at the Intervention Procedure of Concept Bottleneck Models

Concept bottleneck models (CBMs) are a class of interpretable neural net...