Reduce Information Loss in Transformers for Pluralistic Image Inpainting

05/10/2022
by   Qiankun Liu, et al.

Transformers have recently achieved great success in pluralistic image inpainting. However, we find that existing transformer-based solutions regard each pixel as a token and thus suffer from information loss in two respects: 1) They downsample the input image to a much lower resolution for efficiency, incurring information loss and extra misalignment at the boundaries of masked regions. 2) They quantize the 256^3 possible RGB values down to a small codebook (e.g., 512 entries), and the indices of these quantized pixels serve as both the inputs and the prediction targets of the transformer. Although an extra CNN is used to upsample and refine the low-resolution results, it is difficult to recover the lost information. To preserve as much input information as possible, we propose a new transformer-based framework, "PUT". Specifically, to avoid input downsampling while maintaining computational efficiency, we design a patch-based auto-encoder, P-VQVAE, whose encoder converts the masked image into non-overlapping patch tokens and whose decoder recovers the masked regions from the inpainted tokens while keeping the unmasked regions unchanged. To eliminate the information loss caused by quantization, an Un-Quantized Transformer (UQ-Transformer) is applied, which directly takes the features from the P-VQVAE encoder as input without quantization and regards the quantized tokens only as prediction targets. Extensive experiments show that PUT greatly outperforms state-of-the-art methods in image fidelity, especially for large masked regions and complex large-scale datasets. Code is available at https://github.com/liuqk3/PUT
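The two ideas in the abstract can be sketched in a few lines: an image is split into non-overlapping patch tokens (no downsampling on the way in), and vector quantization against a codebook is used only to produce discrete prediction targets, while the continuous features remain available as transformer input. This is a minimal NumPy sketch under those assumptions; the function names (`patchify`, `quantize`) and shapes are illustrative, not taken from the PUT codebase.

```python
import numpy as np

def patchify(image, patch_size=8):
    """Split an H x W x C image into non-overlapping patch tokens
    (one flattened vector per patch), avoiding any downsampling."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    return (image
            .reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, patch_size * patch_size * c))

def quantize(features, codebook):
    """Map each continuous feature to the index of its nearest codebook
    entry. In the scheme the abstract describes, these indices serve only
    as prediction TARGETS; the transformer's input is the continuous
    `features`, so no information is lost on the way in."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3)).astype(np.float32)
tokens = patchify(image)                  # continuous patch features, (16, 192)
codebook = rng.random((512, tokens.shape[1])).astype(np.float32)
targets = quantize(tokens, codebook)      # discrete indices in [0, 512), (16,)
```

A quantize-then-embed pipeline, by contrast, would feed the codebook entries `codebook[targets]` back in as transformer input, discarding whatever detail the nearest-neighbor assignment rounded away.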


