Follow Anything: Open-set detection, tracking, and following in real-time

08/10/2023
by   Alaa Maalouf, et al.

Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed “follow anything” (FAn), is an open-vocabulary and multimodal model – it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source all our code on our project webpage at https://github.com/alaamaalouf/FollowAnything . We also encourage the reader to watch our 5-minute explainer video at https://www.youtube.com/watch?v=6Mgt3EPytrw .
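The abstract says that objects are detected by matching a query against visual descriptors, without specifying the matching rule. The sketch below illustrates one plausible form of that step, assuming each image cell carries a descriptor vector and that a cell matches when its cosine similarity to the query exceeds a threshold. The cosine measure, the threshold value, the toy 2-D vectors, and the names `cosine_similarity` and `match_query` are all illustrative assumptions and are not taken from the FAn code base.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_query(
    descriptors: List[List[Vector]],
    query: Vector,
    threshold: float = 0.8,  # assumed cut-off, chosen only for this toy example
) -> Tuple[List[List[bool]], Tuple[int, int]]:
    """Return a boolean mask of cells similar to the query, plus the
    (row, col) of the first cell with the highest similarity."""
    mask: List[List[bool]] = []
    best_score = -2.0  # below the minimum possible cosine value of -1
    best_cell = (0, 0)
    for r, row in enumerate(descriptors):
        mask_row = []
        for c, desc in enumerate(row):
            score = cosine_similarity(desc, query)
            mask_row.append(score >= threshold)
            if score > best_score:
                best_score, best_cell = score, (r, c)
        mask.append(mask_row)
    return mask, best_cell


# Toy 2x3 grid of hand-written descriptors. In a real system these would come
# from a pre-trained model, and the query from a text, image, or click prompt.
grid = [
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    [[0.0, 1.0], [0.1, 0.9], [1.0, 0.1]],
]
query = [0.0, 1.0]
mask, best = match_query(grid, query)
```

In this toy grid, the mask marks the three cells whose descriptors point roughly along the query direction. A tracker could then be seeded from the masked region or from the best-scoring cell.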


