15/01/2013

SSAO Tutorial

Originally posted on 05/01/2011

Background

Ambient occlusion is an approximation of the amount by which a point on a surface is occluded by the surrounding geometry, which affects the accessibility of that point by incoming light. In effect, ambient occlusion techniques allow the simulation of proximity shadows - the soft shadows that you see in the corners of rooms and the narrow spaces between objects. Ambient occlusion is often subtle, but will dramatically improve the visual realism of a computer-generated scene:

The basic idea is to compute an occlusion factor for each point on a surface and incorporate this into the lighting model, usually by modulating the ambient term such that more occlusion = less light, less occlusion = more light. Computing the occlusion factor can be expensive; offline renderers typically do it by casting a large number of rays in a normal-oriented hemisphere to sample the occluding geometry around a point. In general this isn't practical for realtime rendering.

To achieve interactive frame rates, computing the occlusion factor needs to be optimized as far as possible. One option is to pre-calculate it, but this limits how dynamic a scene can be (the lights can move around, but the geometry can't).

Way back in 2007, Crytek implemented a realtime solution for Crysis, which quickly became the yardstick for game graphics. The idea is simple: use per-fragment depth information as an approximation of the scene geometry and calculate the occlusion factor in screen space. This means that the whole process can be done on the GPU, is 100% dynamic and completely independent of scene complexity. Here we'll take a quick look at how the Crysis method works, then look at some enhancements.

Crysis Method

Rather than cast rays in a hemisphere, Crysis samples the depth buffer at points derived from samples in a sphere:

This works in the following way:
  • project each sample point into screen space to get the coordinates into the depth buffer
  • sample the depth buffer
  • if the sample position is behind the sampled depth (i.e. inside geometry), it contributes to the occlusion factor (see the sketch below)
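In GLSL this boils down to something like the following (just a rough sketch, assuming origin is the fragment's view space position, uSphereKernel is an array of offsets inside a unit sphere and the remaining uniform names mirror those used in the full shader later in the article):
float occlusion = 0.0;
for (int i = 0; i < uSampleKernelSize; ++i) {
   // get a sample position inside a sphere centred on the fragment:
   vec3 samplePos = origin + uSphereKernel[i] * uRadius;

   // project it to obtain depth buffer texture coordinates:
   vec4 offset = uProjectionMat * vec4(samplePos, 1.0);
   offset.xy = (offset.xy / offset.w) * 0.5 + 0.5;

   // the sample contributes to occlusion if the stored depth is in front of it:
   float sceneDepth = texture(uTexLinearDepth, offset.xy).r;
   occlusion += (sceneDepth <= samplePos.z) ? 1.0 : 0.0;
}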
Clearly the quality of the result is directly proportional to the number of samples, which needs to be minimized in order to achieve decent performance. Reducing the number of samples, however, produces ugly 'banding' artifacts in the result. This problem is remedied by randomly rotating the sample kernel at each pixel, trading banding for high frequency noise which can be removed by blurring the result.
The Crysis method produces occlusion factors with a particular 'look' - because the sample kernel is a sphere, flat walls end up looking grey because ~50% of the samples end up being inside the surrounding geometry. Concave corners darken as expected, but convex ones appear lighter since fewer samples fall inside geometry. Although these artifacts are visually acceptable, they produce a stylistic effect which strays somewhat from photorealism.

Normal-oriented Hemisphere

Rather than sample a spherical kernel at each pixel, we can sample within a hemisphere, oriented along the surface normal at that pixel. This improves the look of the effect with the penalty of requiring per-fragment normal data. For a deferred renderer, however, this is probably already available, so the cost is minimal (especially when compared with the improved quality of the result).

Generating the Sample Kernel

The first step is to generate the sample kernel itself. The requirements are that
  • sample positions fall within the unit hemisphere
  • sample positions are more densely clustered towards the origin. This effectively attenuates the occlusion contribution according to distance from the kernel centre - samples closer to a point occlude it more than samples further away
Generating the hemisphere is easy:
for (int i = 0; i < kernelSize; ++i) {
   kernel[i] = vec3(
      random(-1.0f, 1.0f),
      random(-1.0f, 1.0f),
      random(0.0f, 1.0f)
   );
   kernel[i].normalize();
}
This creates sample points on the surface of a hemisphere oriented along the z axis. The choice of orientation is arbitrary - it will only affect the way we reorient the kernel in the shader. The next step is to scale each of the sample positions to distribute them within the hemisphere. This is most simply done as:
   kernel[i] *= random(0.0f, 1.0f);
which will produce an evenly distributed set of points. What we actually want is for the distance from the origin to fall off as we generate more points, according to a curve like this:

We can use an accelerating interpolation function to achieve this:
   float scale = float(i) / float(kernelSize);
   scale = lerp(0.1f, 1.0f, scale * scale);
   kernel[i] *= scale;

Generating the Noise Texture

Next we need to generate a set of random values used to rotate the sample kernel, which will effectively increase the sample count and minimize the 'banding' artefacts mentioned previously.
for (int i = 0; i < noiseSize; ++i) {
   noise[i] = vec3(
      random(-1.0f, 1.0f),
      random(-1.0f, 1.0f),
      0.0f
   );
   noise[i].normalize();
}
Note that the z component is zero; since our kernel is oriented along the z-axis, we want the random rotation to occur around that axis.

These random values are stored in a texture and tiled over the screen. The tiling of the texture causes the orientation of the kernel to be repeated and introduces regularity into the result. By keeping the texture size small we can make this regularity occur at a high frequency, which can then be removed with a blur step that preserves the low-frequency detail of the image. Using a 4x4 texture and blur kernel produces excellent results at minimal cost. This is the same approach as used in Crysis.

The SSAO Shader

With all the prep work done, we come to the meat of the implementation: the shader itself. There are actually two passes: calculating the occlusion factor, then blurring the result.

Calculating the occlusion factor requires first obtaining the fragment's view space position and normal:
   vec3 origin = vViewRay * texture(uTexLinearDepth, vTexcoord).r;
I reconstruct the view space position by combining the fragment's linear depth with the interpolated vViewRay. See Matt Pettineo's blog for a discussion of other methods for reconstructing position from depth. The important thing is that origin ends up being the fragment's view space position.
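For reference, here's one way vViewRay might be computed in the vertex shader of the SSAO fullscreen pass (just a sketch; it mirrors the approach described in the comments below, with assumed uniforms uTanHalfFov and uAspect holding tan(fov / 2) and the camera's aspect ratio - the exact scaling needs to match however you store your linear depth):
in vec2 aPosition; // fullscreen quad vertex, already in NDC ([-1, 1] in xy)

uniform float uTanHalfFov; // tan(fov / 2), computed once on the CPU
uniform float uAspect;     // viewport width / height

noperspective out vec2 vTexcoord;
noperspective out vec3 vViewRay;

void main() {
   gl_Position = vec4(aPosition, 0.0, 1.0);
   vTexcoord = aPosition * 0.5 + 0.5;

   // ray from the camera through this vertex towards the far clipping plane:
   vViewRay = vec3(
      aPosition.x * uTanHalfFov * uAspect,
      aPosition.y * uTanHalfFov,
      1.0
   );
}
Interpolating vViewRay across the quad gives each fragment a ray which, scaled by its linear depth, yields the view space position.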
Retrieving the fragment's normal is a little more straightforward; the scale/bias and normalization steps are necessary unless you're using some high precision format to store the normals:
   vec3 normal = texture(uTexNormals, vTexcoord).xyz * 2.0 - 1.0;
   normal = normalize(normal);
Next we need to construct a change-of-basis matrix to reorient our sample kernel along the origin's normal. We can cunningly incorporate the random rotation here, as well:
   vec3 rvec = texture(uTexRandom, vTexcoord * uNoiseScale).xyz * 2.0 - 1.0;
   vec3 tangent = normalize(rvec - normal * dot(rvec, normal));
   vec3 bitangent = cross(normal, tangent);
   mat3 tbn = mat3(tangent, bitangent, normal);
The first line retrieves a random vector rvec from our noise texture. uNoiseScale is a vec2 which scales vTexcoord to tile the noise texture. So if our render target is 1024x768 and our noise texture is 4x4, uNoiseScale would be (1024 / 4, 768 / 4). (This can just be calculated once when initialising the noise texture and passed in as a uniform).

The next three lines use the Gram-Schmidt process to compute an orthogonal basis, incorporating our random rotation vector rvec.

The last line constructs the transformation matrix from our tangent, bitangent and normal vectors. The normal occupies the third column (the z axis) of the matrix because that is the axis along which the base kernel is oriented.

Next we loop through the sample kernel (passed in as an array of vec3, uSampleKernel), sample the depth buffer and accumulate the occlusion factor:
float occlusion = 0.0;
for (int i = 0; i < uSampleKernelSize; ++i) {
// get sample position:
   vec3 sample = tbn * uSampleKernel[i];
   sample = sample * uRadius + origin;
  
// project sample position:
   vec4 offset = vec4(sample, 1.0);
   offset = uProjectionMat * offset;
   offset.xy /= offset.w;
   offset.xy = offset.xy * 0.5 + 0.5;
  
// get sample depth:
   float sampleDepth = texture(uTexLinearDepth, offset.xy).r;
  
// range check & accumulate:
   float rangeCheck = abs(origin.z - sampleDepth) < uRadius ? 1.0 : 0.0;
   occlusion += (sampleDepth <= sample.z ? 1.0 : 0.0) * rangeCheck;
}
Getting the view space sample position is simple; we multiply by our orientation matrix tbn, scale the sample by uRadius (a nice artist-adjustable factor, passed in as a uniform), then add the fragment's view space position origin.
We now need to project sample (which is in view space) back into screen space to get the texture coordinates with which we sample the depth buffer. This step follows the usual process - multiply by the current projection matrix (uProjectionMat), perform w-divide then scale and bias to get our texture coordinate: offset.xy.

Next we read sampleDepth out of the depth buffer (uTexLinearDepth). If this is in front of the sample position, the sample is 'inside' geometry and contributes to occlusion. If sampleDepth is behind the sample position, the sample doesn't contribute to the occlusion factor. Introducing a rangeCheck helps to prevent erroneous occlusion between large depth discontinuities:

As you can see, rangeCheck works by zeroing any contribution from outside the sampling radius.

The final step is to normalize the occlusion factor and invert it, in order to produce a value that can be used to directly scale the light contribution.
 occlusion = 1.0 - (occlusion / uSampleKernelSize);
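The occlusion factor is written to a (single channel) render target, ready for the blur pass; in the final lighting pass the blurred result can then be used to modulate the ambient term, for example (uTexAO, uAmbientColor and albedo are assumed names, not part of the demo code):
   float ao = texture(uTexAO, vTexcoord).r;
   vec3 ambient = uAmbientColor * albedo * ao;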

The Blur Shader

The blur shader is very simple: all we want to do is average a 4x4 rectangle around each pixel to remove the 4x4 noise pattern:
uniform sampler2D uTexInput;

uniform int uBlurSize = 4; // use size of noise texture

noperspective in vec2 vTexcoord; // input from vertex shader

out float fResult;

void main() {
   vec2 texelSize = 1.0 / vec2(textureSize(uTexInput, 0));
   float result = 0.0;
   vec2 hlim = vec2(float(-uBlurSize) * 0.5 + 0.5);
   for (int i = 0; i < uBlurSize; ++i) {
      for (int j = 0; j < uBlurSize; ++j) {
         vec2 offset = (hlim + vec2(float(i), float(j))) * texelSize;
         result += texture(uTexInput, vTexcoord + offset).r;
      }
   }
 
   fResult = result / float(uBlurSize * uBlurSize);
}
The only thing to note in this shader is texelSize, which allows us to accurately sample texel centres based on the resolution of the AO render target.

Conclusion

The normal-oriented hemisphere method produces a more realistic-looking result than the basic Crysis method, without much extra cost, especially when implemented as part of a deferred renderer where the extra per-fragment data is readily available. It's pretty scalable, too - the main performance bottleneck is the size of the sample kernel, so you can either go for fewer samples or have a lower resolution AO target.

A demo implementation is available here.


The Wikipedia article on SSAO has a good set of external links and references for information on other techniques for achieving real time ambient occlusion.

24 comments:

  1. Thank you so much for this tutorial. It's a pretty hard effect to implement; I used your tutorial to add ssao to my own python molecular viewer https://github.com/chemlab/chemlab. What I've obtained so far is this: http://troll.ws/image/d9ab2364 (I have to add the blur step) it looks right to me but there are many mistakes that I could have made.

    I would have never been able to implement such a cool looking effect without your tutorial and your code. I sincerely thank you.

    1. I've added blur and have a very small problem.

      In this picture I've rendered a set of procedurally-generated sphere imposters with 128 samples; the problem I'm having is that around each sphere there is a thin halo of non-occlusion:

      http://troll.ws/image/f57a6a71

      Do you have any idea/suggestion about what causes this issue and how to solve this problem?

      The shaders I'm using are in this directory: https://github.com/chemlab/chemlab/tree/master/chemlab/graphics/postprocessing/shaders

    2. This is the main issue with indiscriminately blurring the AO result. Areas of occlusion/non-occlusion will tend to 'leak' - most noticeably where there are sharp discontinuities in the depth buffer.

      The solution is to use a more complex blur which samples the depth buffer and only blurs pixels which are at a similar depth.
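
      Something along these lines, for example (just a sketch, reusing the names from the blur shader in the post, with a hypothetical uTexDepth and uDepthThreshold):

      float centerDepth = texture(uTexDepth, vTexcoord).r;
      float result = 0.0;
      float weightSum = 0.0;
      for (int i = 0; i < uBlurSize; ++i) {
         for (int j = 0; j < uBlurSize; ++j) {
            vec2 offset = (hlim + vec2(float(i), float(j))) * texelSize;
            // only accept samples at a similar depth to the centre pixel:
            float weight = abs(texture(uTexDepth, vTexcoord + offset).r - centerDepth) < uDepthThreshold ? 1.0 : 0.0;
            result += texture(uTexInput, vTexcoord + offset).r * weight;
            weightSum += weight;
         }
      }
      fResult = result / max(weightSum, 1.0);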

      Another option might be to simply dilate the AO result slightly (after applying the blur). I'm not sure how well this will work, though.

  2. Hi,

    Thanks for the great tutorial! I have a question for you:

    What do I need in order to compute the vViewRay vector used for the unprojection?

    The article you linked to says that vViewRay is a vector pointing towards the far-clipping plane - how would I obtain it? Would this be done in the vertex shader of the SSAO fullscreen quad pass or via some other means? Maybe you can share your method of obtaining it :)?

    1. You can compute the view ray from the normalized device coordinates of the fragment in question and the field of view angle and aspect ratio of the camera, like this:

      float thfov = tan(fov / 2.0); // can do this on the CPU
      viewray = vec3(
         ndc.x * thfov * aspect,
         ndc.y * thfov,
         1.0
      );

      You can do this either in the vertex shader (and interpolate the view ray), or directly in the fragment shader (compute ndc as texcoords * 2.0 - 1.0).

      Matt actually has another, more in-depth blog post on this topic, which may help clarify things better.

  3. Hey John,
    First off, thanks for these great tutorials.

    I have been trying to implement your ssao for a few days and I'm stumped. I took a few images. Some of them are averages across 8 samples.

    Origin: https://dl.dropboxusercontent.com/u/11216481/SSAO/origin.png
    Depth: https://dl.dropboxusercontent.com/u/11216481/SSAO/depth.png
    SamplePosition: https://dl.dropboxusercontent.com/u/11216481/SSAO/sampleP.png
    SampleDepth: https://dl.dropboxusercontent.com/u/11216481/SSAO/sampleDepth.png
    Offset.xy: https://dl.dropboxusercontent.com/u/11216481/SSAO/offset.png

    Could you please let me know if these look right. I have a feeling the sample depth is not right.
    Any help would be greatly appreciated. Thanks!

    1. The sample depth does look a bit odd, but it's difficult to tell what's wrong with it. What does the end result look like?

    2. I made some progress. When I make uRadius linear (* 1.0/(far-near)) the final result seems a lot better. However everything is flipped around. I tried negating axes of the sample position but it produces strange results. The ssao seems to be calculated correctly, it's just not in the right place lol.

      Result: https://dl.dropboxusercontent.com/u/11216481/SSAO/final.png

    3. Is uRadius not linear anyway? It should simply be a scale value for the sample kernel. Also, remember that uRadius should be appropriate to the scale of the scene.

      Are you using a left or right handed system; is +z into, or out of the screen?

    4. That was the initial problem. uRadius was much too large and it was throwing off the samplePosition and therefore the offset for the sample depth lookup.

      I am using a left handed system (thanks, I wasn't thinking about it). I negated the z component of the viewRay (I calculate it as per the comments above). This properly oriented the ssao and produces great results... mostly.
      looking down z: https://dl.dropboxusercontent.com/u/11216481/SSAO/final-oriented-downz.png

      The above screen has the camera facing in the -z direction. When I turn the camera around 180 to face +z I get a strange artifact. The artifact is gradual when turning the camera and gets the worst when facing +z dead on.
      looking up z: https://dl.dropboxusercontent.com/u/11216481/SSAO/final-oriented-upz.png

      The artifacts move slightly with animation in the scene; some objects are moving but they are away from the sphere pyramid. It's like the sample depth lookup is wrapping around. However, I have my depth texture clamped to border so that shouldn't happen.
      For more info, I am using a 32x32 kernel and 4x4 noise texture.

      Have you seen anything like this?

    5. Looks like the kernel isn't being properly oriented - check that your normals are correct.

    6. You were totally right. The problem was with my normal texture. It wasn't in view space... **facepalm** Multiplied by the good ol' inverse transpose modelview and voila!

      FINAL

      Thanks for all your help man!

  4. This is what it looks like now:

    https://dl.dropboxusercontent.com/u/43006973/ssao/origin%202.png

    I moved in closer to the cube to take this picture.

    I'm not sure if this is what it's supposed to look like.

    1. Yes, that looks correct - remember that you're in view space, so you'd expect to see red along the X axis, Y green, Z blue.

      As you're working with right-handed coordinates you may need to negate the kernel sample.

    2. I negated the sample kernel, which didn't fix my problem.

      I looked at the range check value, and found out it only reached 1 when I either got very close to the object, or made the radius very large (~10 instead of 1). I doubt this is correct, and I have no idea what could be causing it.

      The depth is being linearized when fetching the sample depth from the texture.

    3. Well we know that the view space position is being calculated correctly now. Are you getting any occlusion results at all? Take the range check out and see if you get any results.

    4. No, I'm not getting anything, just a white screen.

    5. Well all I can really say is go back to the beginning and make sure you're getting sensible results at every step. You can use the sample implementation to visualize the outputs from a working version and check that what you're getting matches.

    6. Well, I finally got it working. Thanks for all the help.

      I had two things missing to make this work with a right handed system:

      First, I had to negate the sample depth after sampling and linearization to make it increase and decrease with the z axis.

      Second, when accumulating the occlusion, the operator comparing the sample depth and sample z needs to be '>=', not '<=' (or, if using the step function, switch the arguments).

      Finally, I found a way to reduce the halo due to blurring: instead of iterating on [0, noiseSize[, iterate on [-noiseSize / 2, noiseSize / 2[. Iterating on [-noiseSize / 2, noiseSize / 2] and dividing by (noiseSize + 1)^2 might even be better (haven't tested it yet).

    7. Good to hear - I am going to add some notes on the differences between left/right handed implementations of this as it seems to be the main source of problems for people trying to implement it themselves.

      I was going to say that you were wrong about reducing the halo, but then I realized that the simple blur code in the tutorial is actually different to the code in the demo implementation! The demo does what you suggest - it centres the halo on the object border, which reduces the halo. This is about as good as it gets without dilating the SSAO result.

  5. hi.. I'm a student of Informatics engineering, this article is very informative, thanks for sharing :)

  6. Hi there, I was learning from this tutorial, and I loved the camera movements, would you tell how you implemented the smoothness/lerping? Thanks.

    1. Hi - you can take a look at the source code in the demos section; most of the interpolation is in common.zip (in sml_0.6.0, in include/itpl.h)

  7. Hello John,

    first, thanks for the tutorial!
    I tried to implement the SSAO Shader, but there is still something wrong with it.

    Here is the code of the fragment shader: http://pastebin.com/a3Jb2wCT
    The result looks like this: https://www.dropbox.com/s/zwhi9rton9of3sw/ssao.png?dl=0

    Here are some other things that I've rendered. I did it like Derek Sorensen and averaged the offset values from the loop:
    Origin: https://www.dropbox.com/s/inj9m9p18autegd/origin.png?dl=0
    Offset.xy: https://www.dropbox.com/s/cf94bt38dcekfzy/offset.png?dl=0

    To me, it seems like offset.xy is wrong, but I just cannot figure out why.
    Do you have an idea?
