CLIP-based Neural Neighbor Style Transfer for 3D Assets
We present a method for transferring the style of a set of images to a 3D object. The texture appearance of an asset is optimized with a differentiable renderer in a pipeline driven by losses computed with pretrained deep neural networks. More specifically, we use a nearest-neighbor feature matching loss with CLIP-ResNet50 to extract the style from the images. We show that a CLIP-based style loss yields a qualitatively different appearance than a VGG-based loss, emphasizing texture over geometric shapes. Additionally, we extend the loss to support multiple style images and add loss-based control over the color palette, combined with automatic palette extraction from the style images.
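The abstract does not include implementation details, but the core loss it names is concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch (not the paper's released code) of a nearest-neighbor feature matching style loss over CLIP-ResNet50 spatial features, assuming the openai/clip package; the extraction depth (`layer2`), the cosine distance, and the function names are assumptions for illustration.

```python
# Sketch of a nearest-neighbor feature matching style loss using
# CLIP-ResNet50 spatial features. Assumes PyTorch and openai/clip;
# the extraction depth and distance metric are illustrative choices.

import torch
import torch.nn.functional as F
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _preprocess = clip.load("RN50", device=device)
model.eval()  # CLIP stays frozen; only the rendered texture is optimized


def clip_spatial_features(images: torch.Tensor) -> torch.Tensor:
    """Run the CLIP-ResNet50 visual trunk part-way, keeping a spatial grid.

    `images`: (B, 3, 224, 224), already CLIP-normalized.
    Returns a flat (B*H*W, C) matrix of per-location features.
    """
    v = model.visual
    x = images.type(v.conv1.weight.dtype)
    # Three-conv stem of CLIP's modified ResNet.
    x = F.relu(v.bn1(v.conv1(x)))
    x = F.relu(v.bn2(v.conv2(x)))
    x = F.relu(v.bn3(v.conv3(x)))
    x = v.avgpool(x)
    x = v.layer1(x)
    x = v.layer2(x)  # stop before global pooling so spatial detail survives
    b, c, h, w = x.shape
    return x.permute(0, 2, 3, 1).reshape(-1, c)


def nn_style_loss(rendered: torch.Tensor, style_feats: torch.Tensor) -> torch.Tensor:
    """Match each rendered-view feature to its cosine-nearest style feature
    and penalize the remaining cosine distance."""
    f = F.normalize(clip_spatial_features(rendered), dim=-1)  # (N, C)
    s = F.normalize(style_feats, dim=-1)                      # (M, C)
    sim = f @ s.t()                                           # (N, M) cosine similarities
    nearest = sim.argmax(dim=1)                               # best style match per feature
    return (1.0 - (f * s[nearest]).sum(dim=-1)).mean()
```

Style features would be precomputed once under `torch.no_grad()`; extending the loss to multiple style images then amounts to concatenating their feature matrices, e.g. `style_feats = torch.cat([clip_spatial_features(img) for img in style_images])`, so the nearest-neighbor search runs over the union of all styles.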