Now, as you might already know, surface normals are necessary for shading in computer graphics. Scalar fields (SDFs are scalar fields) have a property called the gradient, which is a vector pointing in the direction of the biggest change from any point. The gradient also turns out to be the normal of a surface. So all we need to do is evaluate the gradient at the hit point to get the normal. To evaluate the gradient, we need to calculate the derivative at our point. Now this might sound complicated, but really what we do is we take a tiny step along each of our 3D axis vectors in both positive and negative directions and then we evaluate the SDF in each of these points. Based on the differences between the positive and negative direction values we will know along which axis the values changed the most. By normalizing the result we get the normal.
fx = SDF(point + (step, 0, 0)) - SDF(point - (step, 0, 0))
fy = SDF(point + (0, step, 0)) - SDF(point - (0, step, 0))
fz = SDF(point + (0, 0, step)) - SDF(point - (0, 0, step))
normal = normalize(fx, fy, fz)
This is really all you need to understand about ray marchers. In the last part you can see how I’ve implemented all of this. But before we can look at my implementation, we need to cover some CUDA concepts.
A Short Introduction to CUDA
A GPU is a component in your computer that is very good at running independent processes in parallel. And because we have multiple pixels whose results are independent of each other GPUs work really well in rendering. One way to program for GPUs is to use CUDA. CUDA’s syntax is similar to C++ so it’s fairly easy to get started with but unfortunately it only works on NVIDIA’s GPUs (because NVIDIA created it, duh).
Let’s go through some terms. In CUDA the GPU is called ‘Device’ and the CPU is called the ‘Host’. The host and the device don’t share memory so when you have some data on the CPU that you want to process on the GPU you have to move it over. CUDA has a function that allocates memory on the GPU (cudaMalloc) and in CUDA you can also copy memory from one to the other (cudaMemcpy). When you don’t need the memory you’ve allocated you need to free it (cudaFree). In CUDA you can write functions for the GPU and/or the CPU, so functions need to specify for which component they are supposed to run on. This is done by adding __device__ or __host__ in the function definition. You can also put both after each other which means that the function can run on both the CPU and the GPU.
There’s another function which is called for each thread and is used to start a process on the GPU. This is called the ‘Kernel’. It is the main function in a CUDA program. The kernel is run for each thread and from the kernel you can call __device__ functions, but not __host__ functions. Instead of __device__, a kernel function is defined with __global__ and when you call a kernel you also need to specify how many GPU threads will run the kernel. We will see an example in the code later.
Two more things left to mention before we can look at my implementation. In CUDA, 3D vectors are of type float3 and you create one using the make_float3() function. Random values are a little bit trickier in CUDA than on the CPU. For each thread you need to initialize a random state based on a seed. The random state object type is called curandState and it is initialized with a seed using a function called curand_init(). Then when you want the next (uniform) random value you have to call another function, which is called curand_uniform(). When managing a curandState you will have to pass the state’s pointer around because it is given as input to the curand functions. Don’t worry if this feels a little confusing, hopefully looking at the code will make everything a tiny bit clearer.
Implementation Walkthrough