Compositional Neural Scene Representations for Shading Inference
Jonathan Granskog (NVIDIA/ETH Zurich), Fabrice Rousselle (NVIDIA), Marios Papas (DisneyResearch|Studios), Jan Novák (NVIDIA)
Paper | Supplemental | Code
Abstract
We present a technique for adaptively partitioning neural scene representations. Our method disentangles lighting, material, and geometric information, yielding a scene representation that preserves the orthogonality of these components, improves interpretability of the model, and allows compositing new scenes by mixing components of existing ones. The proposed adaptive partitioning respects the uneven entropy of individual components and permits compressing the scene representation to lower its memory footprint and potentially reduce the evaluation cost of the model. Furthermore, the partitioned representation enables an in-depth analysis of existing image generators. We compare the flow of information through individual partitions, and by contrasting it to the impact of additional inputs (G-buffer), we are able to identify the roots of undesired visual artifacts, and propose one possible solution to remedy the poor performance. We also demonstrate the benefits of complementing traditional forward renderers with neural representations and synthesis, e.g. to infer expensive shading effects, and show how these could improve production rendering in the future if developed further.
1 Minute Summary
We use neural networks to infer shading in computer graphics scenes. Previous work (Nalbach et al. [2018]) used neural networks to synthesize shading from input image buffers, but this approach discards any information that is not visible in the buffers. Inspired by Eslami et al. [2018], we use neural scene representations to fill in the information missing from the buffers. Our approach uses two networks: an encoder and a generator. The encoder extracts the scene representation from a set of image observations of the scene and the corresponding camera coordinates. The generator then synthesizes the output image from a query camera coordinate, a set of query buffers, and the extracted scene representation.
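The setup can be sketched with a small PyTorch example. The module names, layer sizes, and the 7-dimensional camera encoding below are illustrative assumptions, not the architectures used in the paper:

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps one observation image and its camera pose to a scene representation vector."""
    def __init__(self, repr_dim=256, cam_dim=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 + cam_dim, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, repr_dim)

    def forward(self, image, camera):
        # Broadcast the camera parameters over the image plane and concatenate with the pixels.
        cam = camera[:, :, None, None].expand(-1, -1, *image.shape[-2:])
        return self.fc(self.conv(torch.cat([image, cam], dim=1)).flatten(1))

class Generator(nn.Module):
    """Synthesizes the output image from query buffers, a query camera, and the representation."""
    def __init__(self, repr_dim=256, cam_dim=7, buffer_channels=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(buffer_channels + cam_dim + repr_dim, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, buffers, camera, representation):
        h, w = buffers.shape[-2:]
        cam = camera[:, :, None, None].expand(-1, -1, h, w)
        rep = representation[:, :, None, None].expand(-1, -1, h, w)
        return self.net(torch.cat([buffers, cam, rep], dim=1))

# The per-observation representations are aggregated into one scene representation, e.g.:
# representation = torch.stack([encoder(img, cam) for img, cam in observations]).mean(dim=0)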
The extracted neural representations contain useful scene information, but are difficult to study without any understanding of their structure. To combat the black-box nature of the representations, we partition them into separate parts for lighting, material, and geometric information using the method of Kulkarni et al. [2015]. This method yields a static partitioning, where the size of each partition has to be predefined before training. We instead propose to learn the partition sizes. To accomplish this, we replace the non-differentiable hard boundaries with differentiable soft boundaries defined by sigmoid functions whose positions can be shifted. Over the course of training, we sharpen the sigmoids so that they converge to hard boundaries. This also lets us learn to compress the representation: we add a fourth "null" partition that is always filled with zeros and apply an additional loss term to encourage its growth. After training, the network can be pruned according to the achieved level of compression.
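A minimal sketch of such learned soft partitions is shown below, again assuming PyTorch; the boundary parameterization, sharpness schedule, and compression-loss weight are hypothetical stand-ins rather than the exact formulation used in the paper:

import torch
import torch.nn as nn

class SoftPartition(nn.Module):
    """Splits a D-dimensional representation into K partitions with learnable boundaries."""
    def __init__(self, dim, num_partitions=4):
        super().__init__()
        # Interior boundaries, initialized at equal spacing; the last partition acts as the "null" one.
        # (Ordering of the boundaries is assumed here; a full implementation would enforce it.)
        self.boundaries = nn.Parameter(torch.linspace(0, dim, num_partitions + 1)[1:-1])
        self.register_buffer("index", torch.arange(dim, dtype=torch.float32))
        self.dim = dim

    def masks(self, sharpness):
        # Fixed outer boundaries at 0 and D so that every entry belongs to some partition.
        b = torch.cat([self.index.new_zeros(1), self.boundaries,
                       self.index.new_full((1,), float(self.dim))])
        # mask_k(i) = sigmoid(s*(i - b_k)) - sigmoid(s*(i - b_{k+1})): ~1 inside [b_k, b_{k+1}), ~0 outside.
        left = torch.sigmoid(sharpness * (self.index[None, :] - b[:-1, None]))
        right = torch.sigmoid(sharpness * (self.index[None, :] - b[1:, None]))
        return left - right  # shape (K, D)

    def forward(self, representation, sharpness):
        m = self.masks(sharpness)
        # Soft masking during training; the null partition is zeroed before reaching the generator.
        parts = representation[:, None, :] * m[None, :, :]  # (B, K, D)
        return parts[:, :-1, :].sum(dim=1)

def compression_loss(masks, weight=1e-3):
    """Rewards growth of the last ("null") partition so the representation can be pruned later."""
    return -weight * masks[-1].sum()

During training, the sharpness passed to the module is gradually increased until the masks are effectively binary; entries covered by the null partition can then be pruned from both networks.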
Finally, we use an attribution method to study how the networks function. With our partitioning scheme, we can color the attributions according to the partition they originate from, which lets us investigate how the networks extract different types of information from the observations. The image below shows attributions computed for a few selected patches: yellow indicates lighting information, blue geometry, and red materials.
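Per-partition attributions can be obtained by grouping any per-entry attribution of the representation with the (near-)hard partition masks. The sketch below uses plain gradient-times-input for simplicity and reuses the hypothetical names from the earlier sketches; the attribution method used in the paper may differ:

import torch

def partition_attributions(generator, buffers, camera, representation, masks, patch):
    """Splits a gradient-x-input attribution of one output patch across the representation partitions.

    masks: partition masks of shape (K, D); patch: boolean mask selecting the output pixels of interest.
    """
    representation = representation.detach().requires_grad_(True)
    image = generator(buffers, camera, representation)
    # Scalar target: total predicted radiance inside the query patch.
    target = (image * patch.float()).sum()
    grad, = torch.autograd.grad(target, representation)
    attribution = grad * representation  # per-entry contribution, shape (B, D)
    # Aggregate the contributions falling inside each partition (lighting / material / geometry / null).
    return [(attribution * m).sum().item() for m in masks]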
We hope this work inspires further research into neural scene representations for computer graphics. We believe that more scalable and compositional representations could have a large impact if explored in more detail.
Citation
@article{granskog2020,
author = {Granskog, Jonathan and Rousselle, Fabrice and Papas, Marios and Nov\'{a}k, Jan},
title = {Compositional Neural Scene Representations for Shading Inference},
journal = {ACM Transactions on Graphics (Proceedings of SIGGRAPH)},
volume = {39},
number = {4},
year = {2020},
month = jul,
keywords = {rendering, neural networks, neural scene representations, disentanglement, attribution}
}