Tile-Based Deferred Rendering on CPU
Dynamic triangle mesh rendering with perspective-correct interpolation, accelerated on Intel architectures with SSE2. All operations are performed with floating-point arithmetic (save for the final pixel colour quantization, of course). Fun with EMMS! It is also multi-threaded using OpenMP.
The output image is divided into tiles, each of which has an associated list of triangles. These tiles are dispatched to processor cores and rendered independently by finding the closest intersecting triangle for each pixel. I tried sorting the links so that they would be contiguous in memory for each tile, but this degraded performance because the per-frame sorting took too long.
This approach allows for one interesting application: shadow volumes are easily implemented, as they do not depend on the order of primitive fragments within a pixel. The shadow volume code operates similarly to the classical stencil-buffer volumetric shadow technique, but only one pass is required to render the hard-edged shadows. Furthermore, it is easy to perform lighting from multiple sources by tagging each shadow polygon with a lightsource identifier. This is then used to perform lighting calculations for all sources in one pass.
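The tile binning, per-pixel closest-triangle search and order-independent shadow counting described above can be sketched in scalar C++ roughly as follows. This is a minimal illustration and not the demo's actual code: every name here (ScreenTri, PixelResult, kTile, binTriangles, shadePixel, render, perspectiveAttr) is hypothetical, the tile size of 16 and the limit of 32 lights are assumptions, the triangles are assumed to be already projected to screen space with a positive 1/w per vertex, and the case of the camera sitting inside a shadow volume is ignored. The real renderer uses SSE2 and OpenMP; here the tile loop is only marked as the place where it could be parallelised.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// All names below are hypothetical; none are taken from the demo's source.
struct Vec2 { float x, y; };

struct ScreenTri {
    Vec2 p[3];      // vertex positions in pixels, after projection
    float invW[3];  // 1/w per vertex; a larger value is closer to the camera
    int light;      // -1 for scene geometry, else id (0..31) of the light casting this shadow polygon
};

struct PixelResult {
    int tri = -1;                // index of the closest scene triangle, or -1 if none
    float invW = 0.0f;           // interpolated 1/w at the hit
    float b[3] = {0, 0, 0};      // screen-space barycentric weights at the hit
    std::uint32_t shadowed = 0;  // bit i set: pixel is in shadow for light i
};

constexpr int kTile = 16;  // assumed tile size in pixels

// Twice the signed area of triangle (a, b, c).
static float edgeFn(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// True if point q lies inside t; fills barycentric weights b and the signed area.
static bool cover(const ScreenTri& t, Vec2 q, float b[3], float& area) {
    area = edgeFn(t.p[0], t.p[1], t.p[2]);
    if (area == 0.0f) return false;
    b[0] = edgeFn(t.p[1], t.p[2], q) / area;
    b[1] = edgeFn(t.p[2], t.p[0], q) / area;
    b[2] = edgeFn(t.p[0], t.p[1], q) / area;
    return b[0] >= 0.0f && b[1] >= 0.0f && b[2] >= 0.0f;
}

// 1/w is linear in screen space, so plain barycentric interpolation is exact.
static float interpInvW(const ScreenTri& t, const float b[3]) {
    return b[0] * t.invW[0] + b[1] * t.invW[1] + b[2] * t.invW[2];
}

// Perspective-correct attribute: interpolate a/w and 1/w linearly, then divide.
float perspectiveAttr(const float invW[3], const float b[3], const float a[3]) {
    const float num = b[0] * a[0] * invW[0] + b[1] * a[1] * invW[1] + b[2] * a[2] * invW[2];
    const float den = b[0] * invW[0] + b[1] * invW[1] + b[2] * invW[2];
    return num / den;
}

// One list of triangle indices per tile, built from screen-space bounding boxes.
std::vector<std::vector<int>> binTriangles(const std::vector<ScreenTri>& tris, int width, int height) {
    assert(width > 0 && height > 0);
    const int tilesX = (width + kTile - 1) / kTile, tilesY = (height + kTile - 1) / kTile;
    std::vector<std::vector<int>> bins(tilesX * tilesY);
    for (int i = 0; i < (int)tris.size(); ++i) {
        const Vec2* p = tris[i].p;
        const float minX = std::min({p[0].x, p[1].x, p[2].x}), maxX = std::max({p[0].x, p[1].x, p[2].x});
        const float minY = std::min({p[0].y, p[1].y, p[2].y}), maxY = std::max({p[0].y, p[1].y, p[2].y});
        if (maxX < 0 || maxY < 0 || minX >= width || minY >= height) continue;  // fully off-screen
        const int x0 = (int)std::max(minX, 0.0f) / kTile;
        const int y0 = (int)std::max(minY, 0.0f) / kTile;
        const int x1 = std::min(tilesX - 1, (int)std::min(maxX, (float)(width - 1)) / kTile);
        const int y1 = std::min(tilesY - 1, (int)std::min(maxY, (float)(height - 1)) / kTile);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx) bins[ty * tilesX + tx].push_back(i);
    }
    return bins;
}

PixelResult shadePixel(const std::vector<ScreenTri>& tris, const std::vector<int>& bin, int px, int py) {
    PixelResult r;
    const Vec2 q{px + 0.5f, py + 0.5f};  // sample at the pixel centre
    float b[3], area;
    for (int i : bin) {  // closest scene triangle: largest 1/w wins, list order is irrelevant
        const ScreenTri& t = tris[i];
        if (t.light >= 0 || !cover(t, q, b, area)) continue;
        const float invW = interpInvW(t, b);
        if (invW > r.invW) { r.tri = i; r.invW = invW; std::copy(b, b + 3, r.b); }
    }
    if (r.tri < 0) return r;
    int count[32] = {};  // one signed counter per light
    for (int i : bin) {  // shadow polygons in front of the visible surface
        const ScreenTri& t = tris[i];
        if (t.light < 0 || t.light >= 32 || !cover(t, q, b, area)) continue;
        if (interpInvW(t, b) > r.invW) count[t.light] += (area > 0.0f) ? 1 : -1;
    }
    for (int l = 0; l < 32; ++l)
        if (count[l] != 0) r.shadowed |= 1u << l;  // unbalanced faces: surface is inside the volume
    return r;
}

std::vector<PixelResult> render(const std::vector<ScreenTri>& tris, int width, int height) {
    const auto bins = binTriangles(tris, width, height);
    const int tilesX = (width + kTile - 1) / kTile;
    std::vector<PixelResult> out(width * height);
    // Tiles write disjoint pixels, so this loop is where an OpenMP parallel for could go.
    for (int t = 0; t < (int)bins.size(); ++t) {
        const int ox = (t % tilesX) * kTile, oy = (t / tilesX) * kTile;
        for (int y = oy; y < std::min(oy + kTile, height); ++y)
            for (int x = ox; x < std::min(ox + kTile, width); ++x)
                out[y * width + x] = shadePixel(tris, bins[t], x, y);
    }
    return out;
}
```

Calling render(tris, width, height) yields, for each pixel, the index of the visible triangle, its barycentric weights and a bitmask of lights that are blocked. A shading step could then feed those weights to perspectiveAttr to interpolate vertex colours and skip the contribution of any light whose bit is set. Because the shadow counters are only summed and compared against zero, the order in which shadow polygons appear in a tile's list does not affect the result, which is the property the text relies on.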
I implemented this algorithm after reading some material which suggested that PVR's GPUs do not require a depth buffer and instead perform a process akin to raycasting. Since then I have found that in fact all GPUs today use something very similar, and of course they do not perform scanline rasterization a la software renderers at all. This is to be expected, as this 'raycasting' operation is highly parallel.
This demo loads a mesh of the well-known Sponza atrium, colours its vertices and creates some shadow volumes which then become animated. It should be noted that any vertex or triangle in the scene could be changed at any time during the demo, as this renderer does not require any pre-computation. The camera can be rotated and moved using the key controls listed below.
Once again, there are more things I would like to try out with this, such as the newer instructions available in later versions of SSE, and even those of AVX, which is 8-wide SIMD as opposed to SSE's 4-wide for single-precision floats. It would also be nice to have the shadow volumes projected from a lightsource.
Download. Source code is licensed under the zlib license.
http://fgiesen.wordpress.com/2011/07/06/a-trip-through-the-graphics-pipeline-2011-part-6 - A relevant part of a series on GPU pipelines. On this page you should be able to find a link to an article written by Michael Abrash on 'software' rasterization for Larrabee.
http://imgtec.com/powervr/insider/powervr-sdk-docs.asp - Some actual information about the real PowerVR hardware.