Scalable Data Balancing for Unlabeled Satellite Imagery

07/07/2021
by Deep Patel, et al.

Data imbalance is a ubiquitous problem in machine learning. In large-scale collected and annotated datasets, imbalance is either mitigated manually, by undersampling frequent classes and oversampling rare classes, or planned for with imputation and augmentation techniques. In both cases, balancing data requires labels; in other words, only annotated data can be balanced. Collecting fully annotated datasets is challenging, especially for large-scale satellite systems such as NASA's unlabeled 35 PB Earth Imagery dataset. Although this dataset is unlabeled, implicit properties of the data source, such as the distribution of land and water in imagery of the Earth, let us hypothesize about its imbalance. We present a new iterative method to balance unlabeled data. Our method uses image embeddings as a proxy for image labels when balancing the data, and training on the balanced data ultimately increases overall accuracy.
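
To make the idea of embeddings standing in for labels concrete, here is a minimal sketch, not the authors' implementation. The abstract does not say how embeddings are grouped or resampled, so everything below is an assumption made for illustration: plain k-means as the grouping step, the names `kmeans` and `balance_by_cluster`, the parameters `k` and `per_cluster`, and the synthetic 2-D "embeddings" (900 points near one center, 100 near another) that mimic a land/water style imbalance. It also shows a single pass only, whereas the paper describes an iterative method.

```python
import numpy as np

def kmeans(embeddings, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the grouping step).

    Returns the cluster id assigned to each embedding in the last iteration.
    """
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points (fancy indexing copies).
    centers = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every embedding to every center.
        dists = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = embeddings[assign == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return assign

def balance_by_cluster(embeddings, k, per_cluster, seed=0):
    """Pick `per_cluster` sample indices from each non-empty cluster."""
    rng = np.random.default_rng(seed)
    assign = kmeans(embeddings, k, seed=seed)
    chosen = []
    for c in range(k):
        idx = np.flatnonzero(assign == c)
        if len(idx) == 0:
            continue
        # Undersample large clusters; oversample small ones with replacement.
        replace = len(idx) < per_cluster
        chosen.append(rng.choice(idx, size=per_cluster, replace=replace))
    return np.concatenate(chosen), assign

# Synthetic, imbalanced "embeddings": 900 points near (0, 0), 100 near (10, 10).
data_rng = np.random.default_rng(42)
embeddings = np.vstack([
    data_rng.normal(0.0, 1.0, size=(900, 2)),
    data_rng.normal(10.0, 1.0, size=(100, 2)),
])

idx, assign = balance_by_cluster(embeddings, k=2, per_cluster=50)
print("cluster sizes before:", np.bincount(assign, minlength=2))
print("cluster sizes after: ", np.bincount(assign[idx], minlength=2))
```

Each non-empty cluster contributes exactly `per_cluster` indices, so `idx` can be used to select a balanced subset of images for training. In practice the embeddings would come from a pretrained or self-supervised image encoder, and the number of clusters would need tuning.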

research
04/28/2021

Automated System for Ship Detection from Medium Resolution Satellite Optical Imagery

In this paper, we present a ship detection pipeline for low-cost medium ...
research
06/13/2021

Reducing Effects of Swath Gaps on Unsupervised Machine Learning Models for NASA MODIS Instruments

Due to the nature of their pathways, NASA Terra and NASA Aqua satellites...
research
04/27/2022

An Iterative Labeling Method for Annotating Fisheries Imagery

In this paper, we present a methodology for fisheries-related data that ...
research
09/01/2022

Enabling Country-Scale Land Cover Mapping with Meter-Resolution Satellite Imagery

High-resolution satellite images can provide abundant, detailed spatial ...
research
05/04/2022

Positional Accuracy Assessment of Historical Google Earth Imagery

Google Earth is the most popular virtual globe in use today. Given its p...
research
09/14/2023

Large-scale Weakly Supervised Learning for Road Extraction from Satellite Imagery

Automatic road extraction from satellite imagery using deep learning is ...
research
12/28/2022

Curator: Creating Large-Scale Curated Labelled Datasets using Self-Supervised Learning

Applying Machine learning to domains like Earth Sciences is impeded by t...
