GPU Particles

Modern graphics cards offer enormous computing power that can be used for more than rendering. How about updating over a million particles in real time while the environment or the emitter position keeps changing?


There are many ways to compute data on the GPU. This example shows how to do it with Transform Feedback. This feature, introduced in OpenGL 3.0, lets you write the data computed by a vertex shader back into a vertex buffer.
The image below shows the flow of updating and drawing particles.

In this example one particle contains a position (3 floats), a color (4 floats), a velocity (3 floats), a life time (lt) and a "was emitted" flag (we), which gives us 12 GLfloats per particle.

Assuming that we want to update one million particles, two Vertex Buffer Objects (VBOs) are needed, each 48 million bytes (~46 MB) in size. Two VBOs are needed because the vertex shader output can't be written to the same VBO that the input is read from.
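
The size estimate can be checked with a small sketch. It assumes that GLfloat is a 32-bit float and that the particle is stored as 12 tightly packed floats; the Particle struct and the bufferBytes and totalBytes helpers are hypothetical names used only for illustration, not code from the application.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical CPU-side mirror of one particle: 12 tightly packed floats,
// in the same order as the shader inputs. Assumes GLfloat is a 32-bit float.
struct Particle {
    float position[3];  // x, y, z
    float color[4];     // r, g, b, a
    float velocity[3];  // x, y, z
    float others[2];    // life time (lt), "was emitted" flag (we)
};

static_assert(sizeof(Particle) == 12 * sizeof(float),
              "Particle must be 12 floats with no padding");

// Bytes needed for one VBO holding `count` particles.
inline std::size_t bufferBytes(std::size_t count) {
    // Guard against overflow in the multiplication below.
    assert(count <= std::numeric_limits<std::size_t>::max() / sizeof(Particle));
    return count * sizeof(Particle);
}

// Bytes needed for both VBOs (input and output).
inline std::size_t totalBytes(std::size_t count) {
    return 2 * bufferBytes(count);
}
```

For one million particles, bufferBytes(1000000) gives 48,000,000 bytes per buffer, so the pair of buffers takes 96,000,000 bytes of GPU memory.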

The program that updates the data is unusual because it consists of just one shader: the vertex shader.

Before linking, the program has to be told which shader variables are outputs. In this case there are four inputs (Position, Color, Velocity and Others) and four corresponding outputs:
/**
 * These are input variables with arranged locations.
 */
layout(location = 1) in vec3 inPosition;
layout(location = 2) in vec4 inColor;
layout(location = 3) in vec3 inVelocity;
layout(location = 4) in vec2 inOthers;
 
/**
 * Output variables (the same order as input variables!)
 */
out vec3 outPosition;
out vec4 outColor;
out vec3 outVelocity;
out vec2 outOthers;
The output names are passed to the program before it is linked:
const char* shaderOutputs[4] = {
 "outPosition",
 "outColor",
 "outVelocity",
 "outOthers"
};
glTransformFeedbackVaryings(shader_compute, 4, shaderOutputs, GL_INTERLEAVED_ATTRIBS);
It is very important to keep the inputs, the outputs and the names passed to glTransformFeedbackVaryings in the same order. Otherwise the data will get mixed up. Once the update program is in use and the vertex attribute pointers are set, run this code to perform the computation with the shader:
// Enable rasterizer discard, because the update pass produces no fragments
glEnable(GL_RASTERIZER_DISCARD);
 
// Bind the transform feedback buffer using the second vertex buffer object.
// All transformed data will be stored to it.
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, VBO[1]);
 
// Draw arrays using transform feedback
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, particlesCount);
glEndTransformFeedback();
 
// Unbind the transform feedback for safety
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, 0);
 
// Disable the rasterizer discard, because we need rasterization in drawing.
glDisable(GL_RASTERIZER_DISCARD);
The last thing to do is to swap the VBOs, so that the output buffer becomes the input buffer:
std::swap(VBO[0], VBO[1]);
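
The read-from-one, write-to-the-other pattern can be tried out without a GPU. The following is a minimal CPU sketch in which two std::vector<float> buffers stand in for the two VBOs; PingPong, update and current are hypothetical names, and the per-element step function only plays the role of the vertex shader.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU stand-in for the two VBOs: buffers[0] is read, buffers[1] is written.
struct PingPong {
    std::vector<float> buffers[2];

    explicit PingPong(const std::vector<float>& initial) {
        buffers[0] = initial;
        buffers[1].resize(initial.size());
    }

    // Runs `step` on every input element, stores the result in the other
    // buffer, then swaps so that the output becomes the next input.
    template <typename Step>
    void update(Step step) {
        const std::vector<float>& in = buffers[0];
        std::vector<float>& out = buffers[1];
        assert(in.size() == out.size());
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = step(in[i]);
        }
        std::swap(buffers[0], buffers[1]);
    }

    // After update(), the freshest data is always in buffers[0].
    const std::vector<float>& current() const { return buffers[0]; }
};
```

Because std::swap on vectors only exchanges their internal pointers, no particle data is copied, which mirrors swapping the two VBO handles.
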
Because the VBO always contains up-to-date data, rendering it is easy. Just use the render shader with the vertex attribute pointers set to the places in the VBO where the positions and colors are stored, and issue an OpenGL draw call.
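
To make the attribute pointers concrete, here is a sketch that derives the byte offset of each attribute and the stride of one particle from the component counts of the shader inputs. The Attribute struct and both functions are hypothetical helpers; the locations are the ones declared in the update shader above, and the render shader's own locations are not shown here and may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One interleaved vertex attribute (all attributes are floats here).
struct Attribute {
    const char* name;     // shader input name
    unsigned location;    // layout(location = N) in the update shader
    int components;       // number of floats
    std::size_t offset;   // byte offset inside one particle, filled in later
};

// Attributes in the same order as the shader inputs and outputs.
inline std::array<Attribute, 4> particleAttributes() {
    std::array<Attribute, 4> attrs = {{
        {"inPosition", 1, 3, 0},
        {"inColor",    2, 4, 0},
        {"inVelocity", 3, 3, 0},
        {"inOthers",   4, 2, 0},
    }};
    return attrs;
}

// Assigns consecutive offsets and returns the stride of one particle in bytes.
inline std::size_t assignOffsets(std::array<Attribute, 4>& attrs) {
    std::size_t offset = 0;
    for (Attribute& a : attrs) {
        assert(a.components > 0);
        a.offset = offset;
        offset += static_cast<std::size_t>(a.components) * sizeof(float);
    }
    return offset;
}

// With a GL context, each entry would feed a call such as:
//   glVertexAttribPointer(a.location, a.components, GL_FLOAT, GL_FALSE,
//                         stride, reinterpret_cast<const void*>(a.offset));
```

With 4-byte floats this yields a stride of 48 bytes, with the color starting at byte 12, so a render pass only needs the first two entries.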

For this example I made an emitter that emits from four different places whose positions change over time. The update shader emits each particle at the proper time and then updates its velocity and visibility.
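
The actual update rules are not listed in this article, so the following is only one possible per-particle update, written in C++ for illustration. Everything in it is an assumption: the meaning given to the life time and the "was emitted" flag, the gravity and maximum life constants, the use of alpha as visibility, and the names Particle and updateParticle.

```cpp
#include <cassert>

// Hypothetical particle with the same 12 floats as in the article.
struct Particle {
    float position[3];
    float color[4];     // color[3] (alpha) is used as "visibility" here
    float velocity[3];
    float lifeTime;     // waiting: delay until emission; flying: time left
    float wasEmitted;   // 0 = waiting to be emitted, 1 = flying
};

constexpr float kGravity = 9.81f;  // assumed downward acceleration
constexpr float kMaxLife = 2.0f;   // assumed life span in seconds

// Returns the state of `p` after `dt` seconds (one "shader invocation").
inline Particle updateParticle(Particle p, const float emitter[3], float dt) {
    assert(dt >= 0.0f);
    p.lifeTime -= dt;

    if (p.wasEmitted == 0.0f) {
        // Waiting: emit from the current emitter position once the delay ends.
        if (p.lifeTime <= 0.0f) {
            for (int i = 0; i < 3; ++i) p.position[i] = emitter[i];
            p.lifeTime = kMaxLife;
            p.wasEmitted = 1.0f;
            p.color[3] = 1.0f;
        }
        return p;
    }

    if (p.lifeTime <= 0.0f) {
        // Life is over: hide the particle and mark it for re-emission.
        p.wasEmitted = 0.0f;
        p.color[3] = 0.0f;
        return p;
    }

    // Flying: apply gravity, move, and fade out with the remaining life.
    p.velocity[1] -= kGravity * dt;
    for (int i = 0; i < 3; ++i) p.position[i] += p.velocity[i] * dt;
    p.color[3] = p.lifeTime / kMaxLife;
    return p;
}
```

The function takes the particle by value and returns a new one, which matches the transform feedback model: each invocation reads one input particle and writes one output particle without touching any other.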

But how does the performance compare to the CPU?

I wrote a C++ equivalent of the update shader and tested the application on a computer with an Intel Core i7-3770 3.9GHz processor and a GeForce GTX760 graphics card. The frame rate was measured with Fraps.

As shown below, the GPU handles this kind of computation much better than the CPU. With the GPU, even 16 million particles could be rendered at 30 fps, whereas the CPU struggled with 2 million.

It is unlikely that a game will need this many particles, but this might be a good technique for relieving the CPU of some computation.


You can download the source of the application on GitHub, or just get a working .exe from here.