G^3: Geolocation via Guidebook Grounding

11/28/2022
by Grace Luo et al.

We demonstrate how language can improve geolocation: the task of predicting the location where an image was taken. Here we study explicit knowledge from human-written guidebooks that describe the salient, class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding, which uses a dataset of StreetView images from a diverse set of locations and an associated textual guidebook for GeoGuessr, a popular interactive geolocation game. Our approach predicts a country for each image by attending over clues automatically extracted from the guidebook. Supervising the attention with country-level pseudo labels achieves the best performance. Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in accuracy. Our dataset and code can be found at https://github.com/g-luo/geolocation_via_guidebook_grounding.
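The core mechanism the abstract describes, attending over guidebook clues and supervising that attention with country-level pseudo labels, can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the linear classifier head, and the uniform target distribution over matching clues are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guidebook_attention(img_emb, clue_embs, clue_country_ids, n_countries, true_country):
    """Attend over guidebook clue embeddings and score countries.

    img_emb:          (d,)   image feature
    clue_embs:        (k, d) text features for k extracted clues
    clue_country_ids: (k,)   country-level pseudo label for each clue
    Returns country scores and an auxiliary attention-supervision loss
    that pushes attention mass onto clues whose pseudo label matches
    the image's true country.
    """
    # attention weights over clues, from image-clue similarity
    attn = softmax(clue_embs @ img_emb)                 # (k,)
    context = attn @ clue_embs                          # (d,) attended clue feature

    # hypothetical linear classifier head (random weights for illustration)
    rng = np.random.default_rng(0)
    W = rng.standard_normal((n_countries, context.shape[0])) * 0.01
    scores = W @ context                                # (n_countries,)

    # attention supervision: cross-entropy toward a uniform target over
    # clues pseudo-labeled with the image's country
    target = (clue_country_ids == true_country).astype(float)
    target = target / max(target.sum(), 1.0)
    attn_loss = -np.sum(target * np.log(attn + 1e-9))
    return scores, attn_loss
```

In training, a loss like `attn_loss` would be added to the standard classification loss so the model learns to ground its prediction in the relevant guidebook clues rather than attending arbitrarily.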


