CLIP-based Neural Neighbor Style Transfer for 3D Assets

Shailesh Mishra (Saarland University, NVIDIA), Jonathan Granskog (NVIDIA)


We present a method for transferring the style from a set of images to the texture of a 3D object. The texture of an asset is optimized with a differentiable renderer and losses computed with pretrained deep neural networks. More specifically, we use a nearest-neighbor feature matching (NNFM) loss with CLIP-ResNet50, which we extend to support multiple style images. We improve color accuracy and artistic control with an additional loss on user-provided or automatically extracted color palettes. Finally, we show that a CLIP-based NNFM loss produces a different appearance from a VGG-based one, focusing more on textural details than on geometric shapes. However, we note that user preference is still subjective.
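To make the NNFM idea concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which each feature vector of the rendered image is matched to its nearest style feature under cosine distance, and the matched distances are averaged. It also assumes that multiple style images are supported by pooling their features into one candidate set; the abstract does not spell out the actual extension, so treat this as an illustration and not as the authors' implementation. The names `cosine_distance` and `nnfm_loss` are hypothetical, and in practice the feature vectors would come from a pretrained network such as CLIP-ResNet50 instead of the toy lists used here.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Small epsilon guards against division by zero for all-zero vectors.
    return 1.0 - dot / (norm_a * norm_b + 1e-8)


def nnfm_loss(render_feats, style_feats_per_image):
    """Mean cosine distance from each rendered feature to its nearest style feature.

    render_feats: list of feature vectors extracted from the rendered image.
    style_feats_per_image: one list of feature vectors per style image.
    """
    # Assumption: features of all style images are pooled into one candidate set,
    # so each rendered feature may match a feature from any style image.
    pool = [f for feats in style_feats_per_image for f in feats]
    total = 0.0
    for r in render_feats:
        total += min(cosine_distance(r, s) for s in pool)
    return total / len(render_feats)
```

With toy 2D features, a rendered set whose directions are all covered by the pooled style features gives a loss near zero, while dropping one style image leaves some rendered features without a close match and raises the loss. In an actual optimization loop this quantity would be differentiated with respect to the texture through the renderer, which requires an autodiff framework and is outside the scope of this sketch.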


Scenes


Turntables


Citation

@inproceedings {N20115:2023,
booktitle = {Eurographics 2023 - Short Papers},
editor = {Babaei, Vahid and Skouras, Melina},
title = {{CLIP-based Neural Neighbor Style Transfer for 3D Assets}},
author = {Mishra, Shailesh and Granskog, Jonathan},
year = {2023},
publisher = {The Eurographics Association},
ISBN = {978-3-03868-209-7},
pages = {25-28},
DOI = {10.2312/egs.20231006}
}