Towards Fine-grained Human Pose Transfer with Detail Replenishing Network

by Lingbo Yang, et al.

Human pose transfer (HPT) is an emerging research topic with huge potential in fashion design, media production, online advertising and virtual reality. For these applications, the visual realism of fine-grained appearance details is crucial for production quality and user engagement. However, existing HPT methods often suffer from three fundamental issues: detail deficiency, content ambiguity and style inconsistency, which severely degrade the visual quality and realism of generated images. Aiming towards real-world applications, we develop a more challenging yet practical HPT setting, termed Fine-grained Human Pose Transfer (FHPT), with a stronger focus on semantic fidelity and detail replenishment. Concretely, we analyze the potential design flaws of existing methods via an illustrative example, and establish the core FHPT methodology by combining the ideas of content synthesis and feature transfer in a mutually-guided fashion. Thereafter, we substantiate the proposed methodology with a Detail Replenishing Network (DRN) and a corresponding coarse-to-fine model training scheme. Moreover, we build a complete suite of fine-grained evaluation protocols to address the challenges of FHPT in a comprehensive manner, including semantic analysis, structural detection and perceptual quality assessment. Extensive experiments on the DeepFashion benchmark dataset have verified the power of the proposed benchmark against state-of-the-art works, with a 12%-14% gain on top-10 retrieval recall, 5% higher joint localization accuracy, and a nearly 40% gain on face identity preservation. Moreover, the evaluation results offer further insights into the subject matter, which could inspire many promising future works along this direction.




Towards General Visual-Linguistic Face Forgery Detection

Deepfakes are realistic face manipulations that can pose serious threats...

Retrieve in Style: Unsupervised Facial Feature Transfer and Retrieval

We present Retrieve in Style (RIS), an unsupervised framework for fine-g...

RetouchingFFHQ: A Large-scale Dataset for Fine-grained Face Retouching Detection

The widespread use of face retouching filters on short-video platforms h...

Region-adaptive Texture Enhancement for Detailed Person Image Synthesis

The ability to produce convincing textural details is essential for the ...

A New Benchmark and Approach for Fine-grained Cross-media Retrieval

Cross-media retrieval is to return the results of various media types co...

Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis

Semantic image synthesis is a challenging task with many practical appli...

No-Frills Human-Object Interaction Detection: Factorization, Appearance and Layout Encodings, and Training Techniques

We show that with an appropriate factorization, and encodings of layout ...
