Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy

12/16/2019
by Kaiyu Yang, et al.

Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the "person" subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.


