AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation

by Jiacheng Lin, et al.

Interactive Image Segmentation (IIS) has emerged as a promising technique for decreasing annotation time. Substantial progress has been made in pre- and post-processing for IIS, but the critical issue of interaction ambiguity, which notably hinders segmentation quality, has been under-researched. To address this, we introduce AdaptiveClick, a clicks-aware transformer incorporating an adaptive focal loss, which tackles annotation inconsistencies with tools for mask- and pixel-level ambiguity resolution. To the best of our knowledge, AdaptiveClick is the first transformer-based, mask-adaptive segmentation framework for IIS. The key ingredient of our method is the Clicks-aware Mask-adaptive Transformer Decoder (CAMD), which enhances the interaction between clicks and image features. Additionally, AdaptiveClick enables pixel-adaptive differentiation of hard and easy samples in the decision space, independent of their varying distributions. This is primarily achieved by optimizing a generalized Adaptive Focal Loss (AFL) with a theoretical guarantee, where two adaptive coefficients control the ratio of gradient values for hard and easy pixels. Our analysis reveals that the commonly used Focal and BCE losses can be considered special cases of the proposed AFL. Extensive experiments on nine datasets show that AdaptiveClick, with a plain ViT backbone, outperforms state-of-the-art methods. Code will be publicly available at
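
To make the relationship between the two baseline losses named in the abstract concrete, the following is a minimal sketch of the standard per-pixel BCE and Focal losses. It is not the paper's AFL, whose formula and two adaptive coefficients are not given here. The function names `bce_loss` and `focal_loss` and the default `gamma=2.0` are illustrative assumptions.

```python
import math


def bce_loss(p, y):
    """Binary cross-entropy for one pixel.

    p: predicted foreground probability in (0, 1); y: ground-truth label, 0 or 1.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0):
    """Standard Focal loss for one pixel (illustrative, not the paper's AFL).

    The factor (1 - p_t) ** gamma shrinks the loss of easy pixels
    (p_t close to 1) much more than that of hard pixels.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    easy, hard = 0.9, 0.3  # foreground pixels predicted well vs. poorly
    for name, fn in (("BCE", bce_loss), ("Focal", focal_loss)):
        print(name, "easy/hard loss ratio:", fn(easy, 1) / fn(hard, 1))
```

With `gamma=0` the modulating factor equals 1 and the Focal loss coincides with BCE, which is the simplest instance of one loss being a special case of another. A fixed `gamma` applies the same down-weighting to every pixel, whereas the abstract describes AFL as adapting this hard/easy balance through its coefficients.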




PiClick: Picking the desired mask in click-based interactive segmentation

Click-based interactive segmentation enables productive pixel-level anno...

EdgeFlow: Achieving Practical Interactive Segmentation with Edge-Guided Flow

High-quality training data play a key role in image segmentation tasks. ...

SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

Instance-level segmentation of documents consists in assigning a class-a...

Pixel Adaptive Deep Unfolding Transformer for Hyperspectral Image Reconstruction

Hyperspectral Image (HSI) reconstruction has made gratifying progress wi...

VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation

The integration of diverse visual prompts like clicks, scribbles, and bo...

Transformer-based Annotation Bias-aware Medical Image Segmentation

Manual medical image segmentation is subjective and suffers from annotat...

Unleashing the Power of Visual Prompting At the Pixel Level

This paper presents a simple and effective visual prompting method for a...
