MVImgNet: A Large-scale Dataset of Multi-view Images

by Xianggang Yu, et al.

Being data-driven is one of the most iconic properties of deep learning algorithms. The birth of ImageNet drove a remarkable trend of "learning from large-scale data" in computer vision. Pretraining on ImageNet to obtain rich universal representations has been shown to benefit various 2D visual tasks and has become a standard in 2D vision. However, because collecting real-world 3D data is laborious, there is not yet a generic dataset that serves as a counterpart of ImageNet in 3D vision, so how such a dataset could impact the 3D community remains unexplored. To remedy this, we introduce MVImgNet, a large-scale dataset of multi-view images, which is highly convenient to collect by shooting videos of real-world objects in daily life. It contains 6.5 million frames from 219,188 videos covering objects from 238 classes, with rich annotations of object masks, camera parameters, and point clouds. The multi-view attribute endows our dataset with 3D-aware signals, making it a soft bridge between 2D and 3D vision. We conduct pilot studies to probe the potential of MVImgNet on a variety of 3D and 2D visual tasks, including radiance field reconstruction, multi-view stereo, and view-consistent image understanding, where MVImgNet demonstrates promising performance, leaving many possibilities for future exploration. In addition, via dense reconstruction on MVImgNet, we derive a 3D object point cloud dataset called MVPNet, covering 87,200 samples from 150 categories, with a class label on each point cloud. Experiments show that MVPNet can benefit real-world 3D object classification while posing new challenges to point cloud understanding. MVImgNet and MVPNet will be made publicly available, in the hope of inspiring the broader vision community.




