Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

by Kibok Lee, et al.

Most existing work on few-shot object detection (FSOD) focuses on a setting where the pre-training and few-shot learning datasets come from similar domains. However, few-shot algorithms matter in many domains, so evaluation needs to reflect that breadth of applications. We propose a Multi-dOmain Few-Shot Object Detection (MoFSOD) benchmark consisting of 10 datasets from a wide range of domains to evaluate FSOD algorithms. We comprehensively analyze the impact of freezing layers, different architectures, and different pre-training datasets on FSOD performance. Our empirical results reveal several key factors that have not been explored in previous work: 1) contrary to previous belief, on a multi-domain benchmark, fine-tuning (FT) is a strong baseline for FSOD, performing on par with or better than state-of-the-art (SOTA) algorithms; 2) using FT as the baseline allows us to explore multiple architectures, and we find that the choice of architecture has a significant impact on downstream few-shot tasks, even when pre-training performance is similar; 3) by decoupling pre-training and few-shot learning, MoFSOD allows us to explore the impact of different pre-training datasets, and the right choice can significantly boost performance on downstream tasks. Based on these findings, we list possible avenues of investigation for improving FSOD performance and propose two simple modifications to existing algorithms that lead to SOTA performance on the MoFSOD benchmark. The code is available at




