Fine-grained Text and Image Guided Point Cloud Completion with CLIP Model

08/17/2023
by   Wei Song, et al.

This paper focuses on the recently popular task of point cloud completion guided by multimodal information. Although existing methods achieve excellent performance by fusing auxiliary images, they still have deficiencies, including poor generalization ability and insufficient fine-grained semantic information in the extracted features. In this work, we propose a novel multimodal fusion network for point cloud completion that simultaneously fuses visual and textual information to effectively predict the semantic and geometric characteristics of incomplete shapes. Specifically, to overcome the lack of prior information caused by small-scale datasets, we employ a pre-trained vision-language model trained on a large number of image-text pairs, so its textual and visual encoders have stronger generalization ability. We then propose a multi-stage feature fusion strategy that progressively fuses the textual and visual features into the backbone network. Meanwhile, to further explore the effectiveness of fine-grained text descriptions for point cloud completion, we also build a text corpus with fine-grained descriptions, which provide richer geometric details for 3D shapes. These descriptions are used for training and evaluating our network. Extensive quantitative and qualitative experiments demonstrate the superior performance of our method compared with state-of-the-art point cloud completion networks.
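The multi-stage fusion idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' architecture: it only assumes that the text and image encoders each produce one global embedding per shape, and that each stage injects the pair into the per-point features through a residual projection. The names `fuse_stage` and `progressive_fusion`, the dimensions, and the random weights (standing in for learned parameters and for real CLIP embeddings) are all hypothetical.

```python
import numpy as np


def fuse_stage(point_feats, text_emb, image_emb, weight):
    """One hypothetical fusion stage: residual update of per-point features."""
    # Concatenate the two global guidance vectors: shape (2D,)
    guidance = np.concatenate([text_emb, image_emb])
    # Repeat the guidance for every point: shape (N, 2D)
    tiled = np.broadcast_to(guidance, (point_feats.shape[0], guidance.shape[0]))
    # Join point features with guidance: shape (N, C + 2D)
    joint = np.concatenate([point_feats, tiled], axis=1)
    # Project back to C channels and add as a residual: shape (N, C)
    return point_feats + np.tanh(joint @ weight)


def progressive_fusion(point_feats, text_emb, image_emb, weights):
    """Apply one fusion stage per weight matrix, in order."""
    for weight in weights:
        point_feats = fuse_stage(point_feats, text_emb, image_emb, weight)
    return point_feats


rng = np.random.default_rng(0)
N, C, D, stages = 128, 32, 16, 3           # assumed sizes, for illustration only
point_feats = rng.standard_normal((N, C))  # stand-in for backbone point features
text_emb = rng.standard_normal(D)          # stand-in for a text-encoder embedding
image_emb = rng.standard_normal(D)         # stand-in for an image-encoder embedding
# Random matrices stand in for learned projection weights.
weights = [0.1 * rng.standard_normal((C + 2 * D, C)) for _ in range(stages)]

fused = progressive_fusion(point_feats, text_emb, image_emb, weights)
print(fused.shape)  # (128, 32)
```

The residual form keeps the point features' shape fixed across stages, so guidance can be added at several depths of a backbone without changing its layer widths. The actual network may fuse features differently, for example with attention.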

research
09/21/2023

FGFusion: Fine-Grained Lidar-Camera Fusion for 3D Object Detection

Lidars and cameras are critical sensors that provide complementary infor...
research
07/04/2022

VEM^2L: A Plug-and-play Framework for Fusing Text and Structure Knowledge on Sparse Knowledge Graph Completion

Knowledge Graph Completion has been widely studied recently to complete ...
research
10/08/2022

FBNet: Feedback Network for Point Cloud Completion

The rapid development of point cloud learning has driven point cloud com...
research
05/25/2023

T2TD: Text-3D Generation Model based on Prior Knowledge Guidance

In recent years, 3D models have been utilized in many applications, such...
research
08/18/2021

ME-PCN: Point Completion Conditioned on Mask Emptiness

Point completion refers to completing the missing geometries of an objec...
research
08/04/2023

Exploring Part-Informed Visual-Language Learning for Person Re-Identification

Recently, visual-language learning has shown great potential in enhancin...
research
09/14/2023

Looking at words and points with attention: a benchmark for text-to-shape coherence

While text-conditional 3D object generation and manipulation have seen r...
