Posterous theme by Cory Watilo

Filed under: penumbra

shaders in clojure, revisited

A number of months back, I wrote about how to use OpenGL shaders in Clojure, using Penumbra. This introduced an s-expression representation of GLSL, which allowed for an idiomatic specification of shader code. It didn’t, however, cut down on the boilerplate that surrounds the OpenGL programmable pipeline.

There are two steps in the programmable pipeline: the vertex shader and the fragment shader. The vertex shader transforms geometry from the world-space to screen-space. The geometry is then discretized into pixels, and then each pixel is processed through the fragment shader. For a more in-depth look at this process, check out this set of articles.

This is referred to as a pipeline because data flows from the vertices to the vertex shader to the fragment shader. Since these shaders are meant to be swappable, the vertex shader has to declare what data it passes the fragment shader, and the fragment shader has to declare what data it receives from the vertex shader. Mismatched declarations are allowed, which can be useful, but mostly it’s just redundant and increases the potential for errors. If we’re willing to treat the pipeline as a single entity, our specification can become shorter, simpler, and much more straightforward to develop.

So, let’s revisit the marble teapot:

In the previous example, we used the Perlin noise lookup that’s built into GLSL. This approach had a lot of limitations; this built-in function only works properly on a fraction of all hardware, and it can only be used within the vertex shader. As a result, the marble coloration was per-vertex rather than per-pixel, and looked a lot less detailed.

To create our own per-pixel Perlin noise implementation, we need a block of 3-D random noise. We can get this by creating a 3-D texture, which is seeded with random numbers from -1 to 1. To perform a per-pxiel lookup, each pixel needs to know its location in the world space. This is different than its location with respect to the camera; if we used the location in camera-space, our teapot looks like this:

We will also need to know the normal on a per-pixel basis, to allow for per-pixel lighting. These two values, along with the transformed screen-space location, are represented as a hash-map at the end of the vertex shader. These values are effectively the “return value” of the vertex shader.

Notice that in the fragment code, we reference both position and normal, which are defined in the vertex shader. The % variable represents the 3-D texture that’s seeded with random values. Also notice that while this looks a lot like Clojure code, it uses operators like += and *= that belie the underlying C implementation. The above code expands to this, and the complete code can be found here.

Stay tuned for future announcements. I hope to have vertex buffers implemented in the next week or two.

creating a simple game in clojure

There’s a certain type of person who, upon seeing a new language, feels compelled to write a game with it. Functional languages represent a unique challenge in this respect, as has been written about at some length elsewhere. Clojure adds an interesting wrinkle to this problem, in the form of its Java interop. If a game consists of a thin layer of Clojure wrapped around a full-featured Java game engine, is it actually “written in Clojure”?

Without getting bogged down in semantics, I suggest that a Clojure application (such as the game discussed in this post) is something that can be easily extended using Clojure’s data structures and concurrency primitives. This may or may not be true of the hypothetical Clojure-wrapped game engine; it depends on the design and implementation details.

To explore what goes into making a game in Clojure, we’ll look at an implementation of Asteroids that uses Penumbra, an idiomatic wrapper for OpenGL. Penumbra was written to streamline doing interesting things with the graphics card, which includes small-scale game development. The issues I encountered while developing this game informed a number of design decisions I later made.

a description of the game

The game of Asteroids has three major elements: the spaceship, the bullets, and the eponymous asteroids. Optional elements include exploision effects for when an asteroid is split or the spaceship is destroyed, and exhaust effects to indicate thrust from the spaceship.

When implementing the game, we should first consider how each of these elements behave, as well as how they interact. Some of these details will vary between implementations, but for simplicity’s sake we will assume the following:

  • Bullets travel a straight, predictable path.
    • If they come into contact with an asteroid, they blow up the asteroid and disappear.
    • If they don’t encounter an asteroid, they will eventually disappear.
  • Asteroids travel a straight, predictable path.
    • Asteroids come in three sizes: large, medium, and small.
    • Asteroids explode when they come into contact with a bullet or the spaceship.
    • When a large or medium asteroid explodes, it emits four smaller, faster asteroids in random directions. Small asteroids disappear altogether.
  • The spaceship travels an unpredictable path dictated by user input.
    • When the spaceship comes into contact with an asteroid, it explodes.
    • The spaceship has a position and velocity.
      • The position is constantly updated w.r.t. the velocity and elapsed time since the last update.
      • When the engine is off, the velocity remains constant. When the engine is engaged, the velocity is constantly updated w.r.t. the orientation of the ship and the elapsed time since the last update.
  • Exhaust particles travel a straight, predictable path.
    • They are emitted whenever the ship’s engine is engaged.
    • They travel the opposite direction of the ship, adjusted for the ship’s velocity (this is important; things will look very strange otherwise).
    • They are purely for show, and don’t interact with anything.
    • After a while, they disappear.
  • Explosion particles travel a straight, predictable path.
    • They are emitted in random directions from the explosion’s center.
    • They are purely for show, and don’t interact with anything.
    • After a while, they disappear.

Looking at this list, we can see that four out of the five elements have paths that are predestined; their position at any given moment is determined the moment they are created. This means that while we could store their position in memory and update that position each frame, we don’t have to. Instead, we have the option of defining a function which takes the game time as a parameter, and returns the current position.

This approach has two benefits. First, it’s much less taxing to the garbage collector; while a perfect garbage collector would be able to handle constant, predictable churn by just allocating over the previous frame’s state, reality is a little less kind. Updating too many things too often will lead to noticeable GC pauses, which any real-time application should strive to avoid. Second, it lets us treat the game as a constantly changing system which is sampled by the renderer, rather than a succession of snapshots whose frequency is arbitrarily dictated by the game’s frame rate. Specific reasons for why this is desirable will be discussed later.

It’s also obvious that the only difference between the exhaust and explosion particles are their initial conditions. As long as we make our generation method flexible enough, there’s no reason to deal with these elements separately.

So to create the game, we need methods for collision testing, generation, physics (for the spaceship), and finally rendering. Below, we’ll look at a few of the more interesting aspects. The complete source can be found here.

collision

We need to be able to test collisions between asteroids and ships, and asteroids and bullets. Bullets are perfectly circular, but asteroids and the ship are irregular shapes, so doing this accurately could be pretty complicated. But if we just assume everything’s a circle, it’s easy:

Notice that this code makes a number of assumptions about how the program will be structured: all of our game elements will be hashes, all of them will have :position and :radius keys, and those keys will either return static values or functions.

Nowhere do we define an interface that explicitly states this, nor can we tell at compile time whether the game elements actually satisfy these requirements. It’s easy to see how this approach could cause issues in larger scale development, but it does give us a certain flexibility, and for a program this size, the complexity is manageable.

generating asteroids

All of the elements exist on a 2-D plane, but that doesn’t mean we can’t render them as 3-D objects. We want our asteroids to be spherical, but irregular. We could randomly perturb the vertices, but that would just give us a spiky ball. The position of one vertex can’t be independent of its neighbors. In other words, we want to generate a fractal asteroid.

There are a number of different ways to accomplish this, but one of the simplest is the fault line algorithm:

  • Split the asteroid into equal halves.
  • Expand everything in one half, and shrink everything in the other.
  • Repeat.

The resulting asteroid isn’t too distinctive, so we can just generate a few asteroid templates, and choose one at random whenever we create a new asteroid.

generating particles

Particles are soft-edged circles which, in large numbers, can be used to represent amorphous shapes. To create a texture, we define a texture which is perfectly white, but with varying transparency. Towards the middle, it is perfectly opaque, but falls off towards the edges per a Gaussian function.

which yields

Notice that we’ve wrapped the definitions of particle-tex and particle-quad inside a function. This is because we need to have both of these executed inside an OpenGL context. This allows us to freely reference these vars in the rest of the code, as long as we make sure init-particles is executed within an OpenGL context, and before any code that references the internal definitions.

Particles can fill a number of roles. Individual particles can represent bullets, and large quantities can be used to represent both explosions and the flame from the ship.

updating the game

Every frame, we need to determine the position of the various game elements, and draw them. We also need to update these elements – testing for collisions, generating particles when necessary – but this is a completely orthogonal concern to the rendering. Most games are single-threaded, and therefore the sequence of events is something like this:

  • Test for user input, and change any related values.
  • Update all the elements in the game, taking into consideration the elapsed time since we last updated the elements.
  • Render the newly updated elements.
  • Repeat.

This artificially conflates the act of updating and rendering, but with Clojure, we’re not nearly as constrained.

In Penumbra, an application is defined as a series of callbacks, such as :init, :mouse-down and :display. With the exception of :display, which cannot affect the game state, these are pure functions, which take the current state of the game, and return an altered version of that state. As a result, each function is independent of the others; as long as they don’t execute at the same time, it doesn’t matter what thread it’s executing on.

These callbacks are called in response to user input (:key-press, :mouse-up) or a frame being rendered (:update, :display). However, how often we test for collisions shouldn’t have to be coupled to how often we render, nor should how often we emit particles for the spaceship’s exhaust. In fact, in the latter case this would only make things look strange: if our frame rate stutters, so would the exhaust. Our rendering should just be a periodic glance into the world of the game, not something that dictates its appearance.

Penumbra supports periodic updates, which are defined as pure functions which alter the state at regular intervals. We can test for collisions 10 times a second, and emit exhaust particles 50 times a second, regardless of the game’s frame rate. This is used somewhat trivially in this game, but it’s worth noting that this can be an extremely powerful capability: we can reason about the game as a collection of independent, asynchronous processes, without concern for the underlying implementation. In this Tetris implementation, for instance, the descent of the blocks is controlled by a periodic update which alters its own frequency based on whether the down arrow is pressed.

Brian Carper recently posted about his difficulties using Clojure’s concurrency primitives during his implementation of an RPG. While I don’t think the approach used by Penumbra is the only way forward, at the very least it demonstrates that it is quite possible to leverage Clojure’s unique capabilities towards the development of games or other real-time graphical applications.

I have recently released version 0.5.0 of Penumbra to clojars, which should greatly simplify its use in a separate project. If anyone has any questions regarding how they could use it for their project, I’d be happy to help.

 

exploring the mandelbrot set with your gpu

The Mandelbrot set is a good candidate for GPU implementation: it's simple, uses vector math, and is embarrassingly parallel.  Doing so, however, generally adds a significant amount of complexity to a program.  The programmer is forced to twist all computations to fit within the graphics pipeline, and a significant amount of ceremony surrounds even the simplest operation.  

The GPGPU framework in Penumbra is designed to hide away all these rough edges, with the goal of making it easier to use the GPU for these sorts of tasks than it would be to use plain Clojure.  It's not quite there yet, but in this particular example it proves to be extremely useful.
The full source code for this program weighs in at just over 100 lines of code, and can be found here.  Here's what it looks like, slowed down to 10 iterations per frame:
Let's look at the different GPU kernels we define for this program, and what they do.
This initializes our initial conditions for a particular view.  We need to keep track of our position as we iterate, and how many iterations we've gone through, and we can store both these pieces of data together in a 3-vector.  The :coord keyword represents the position of the current pixel we're calculating, and :dim the maximum value for :coord.  We calculate the initial position by doing a bilinear interpolation between upper-left and lower-right, which are parameters passed in when the kernel is run.
This is a single iteration of every pixel.  This must be repeated dozens of times for the Mandelbrot set to resolve.  The % symbol represents a pixel from the input data (further data sets would be represented by %2, %3...).  This 3-vector is deconstructed into a 2-vector and the counter, and checks whether we've exited the boundaries.  If we have, we return the value unchanged, since we want to keep track of which iteration it exited on.  Otherwise, we update the position, and increment the counter. 
This transforms the position and iteration count into something we can display onscreen.  We deconstruct the input pixel, and check if it's still inside the boundaries.  If so, we leave it black.  If not, we interpolate between dark blue and white, based on how many iterations it took to escape.
This is a handpicked example of how Penumbra's GPGPU capabilities can be used.  Obviously, not every problem is going to lend itself as easily to this sort of implementation, but I'm hopeful that other people will end up being able to use it for real-world applications.  If you have any ideas in this vein, let me know.

 

performance in factor, java, and clojure

Slava Pestov recently posted a performance comparison between Factor and Java, based on a heavily numerical benchmark.  I've recently added support to Penumbra for offloading computation to the graphics card (GPGPU), which is ideally suited to this sort of computation, so I decided to give it a try.  It's not exactly an apples-to-apples comparison, but the results are interesting nonetheless.

Here is the Clojure+Penumbra implementation, in its entirety:
The first thing you'll notice is that this is about 4x shorter than the Factor version, but that's to be expected; I'm sure Factor would be similarly terse if it were allowed to hide all the code behind a library (Java, on the other hand...).  The really cool part is that on my MacBook Pro, with a fairly middle of the road GPU, it runs in 350ms.  That's 4x faster than the optimized Factor code, and more than 10x faster than Java.

The GPGPU support in Penumbra is a work in progress, but it's looking promising.  Up until now, Clojure has only been able to asymptotically approach the performance of Java (and, in the process, its verbosity).  For this benchmark, at least, it's ahead on both counts by a order of magnitude.

I'll be posting a less trivial example, and a more thorough explanation, in the near future.  

UPDATE
So this is really interesting: I installed Snow Leopard, and my performance improved by a factor of ten.  This now runs 100x times faster than Java.  How unexpectedly awesome.

using shaders in clojure

The graphics card is a massively parallel processor.  It processes graphics first at the geometry level (per-vertex), and then at the image level (per-pixel).  These two steps are called the vertex shader and fragment shader, respectively.  These are normally specified in a C-like language, but Penumbra uses an intermediate s-expression representation, which is translated into GLSL at runtime.

The above shader, with 'noise' equal to 0, looks like this:

Media_httpdlgetdropboxcomu174179teapotnoturbulencepng_edkjxgbhdykbxdc

But if we use a Perlin noise lookup instead, we get something that looks a little bit like marble:
Media_httpfilesgetdropboxcomu174179teapotwithturbulencepng_fftgphtbsxizjhb

This is not a texture; rather, it is a slice of a large, periodic 3-space.  This becomes obvious when we move the square.

This means that we don't have to use a simple shape like a square.  We can use any shape we like, and the grains will travel a continuous path along the z-axis.

Complete code can be found in the /examples subdirectory of the Penumbra repository.

rendering textures in clojure

At its simplest, a texture is an image stretched across a shape. We can define a texture like so:

(def checkers (create-color-texture 128 128))

(draw-to-subsampled-texture 
  checkers 
  
  (fn [[x y] _] 
   
    (if (xor (even? (int (/ x 16))) (even? (int (/ y 16)))) 
   
      [0.8 0.1 0.1 1] 
   
      [0.1 0.1 0.1 1])))) 
First we create a texture that is 128x128 pixels. Then we populate the texture using a function that returns a color for each individual pixel.   We render the texture by associating two-dimensional coordinates in texture-space with each vertex. These coordinates are independent of any specific texture. To associate them with the texture we just defined, we must call bind-texture.

 
(defn textured-quad [] 
  (push-matrix 
   (translate -0.5 -0.5 0.5) 
   (normal 0 0 -1) 
   (draw-quads 
    (texture 1 1) (vertex 1 1 0) 
    (texture 0 1) (vertex 0 1 0) 
    (texture 0 0) (vertex 0 0 0) 
    (texture 1 0) (vertex 1 0 0)))) 
 
(bind-texture checkers) 
(textured-quad) 

We can apply the same texture to multiple shapes. Wherever a textured quad is drawn, the texture that is currently bound will be drawn across it.

 
(defn textured-cube [] 
  (dotimes [_ 4] 
   (rotate 90 0 1 0) 
   (textured-quad)) 
  (rotate 90 1 0 0) 
  (textured-quad) 
  (rotate 180 1 0 0) 
  (textured-quad)) 
 
(bind-texture checkers) 
(textured-cube) 

This rendering of a cube can itself be made into a texture, which can be applied to another shape. 

 
(def scene (create-texture 128 128)) 
 
(render-to-texture scene 
 (with-projection (frustum-view 50 1 0.1 10) 
  (textured-cube))) 

Putting everything together, we get this:

These examples use the Penumbra library. Full code can be found in the /examples subdirectory.

rendering a sierpinski pyramid in clojure

A three-dimensional Sierpinski triangle ( http://en.wikipedia.org/wiki/Sierpinski_triangle ) is created recursively, by replacing each pyramid with five sub-pyramids. To create one, we use Penumbra ( http://github.com/ztellman/penumbra/tree/master ), which is a thin but idiomatic wrapper for OpenGL.

First, we create a pyramid.

(defn draw-pyramid [] 
  (material 0.8 0.2 0.2 1) 
  (draw-triangle-fan 
   (vertex 0 1 0) 
   (dotimes [_ 5] 
    (rotate 90 0 1 0) 
    (normal 1 0.5 1) 
    (vertex 0.5 0 0.5))) 
  (draw-quads 
   (normal 0 -1 0) 
   (dotimes [_ 4] 
    (rotate -90 0 1 0) 
    (vertex 0.5 0 0.5)))) 
The pyramid is defined in two parts: a fan of triangles around the tip, and a square at the bottom. Both are defined by rotating iteratively around the central axis.
 (draw-pyramid) 

Next, we turn one pyramid into five smaller pyramids. 

(defn subdivide [display-list] 
  (push-matrix 
   (scale 0.5 0.5 0.5) 
   (push-matrix 
    (translate 0 1 0) 
    (call-display-list display-list)) 
   (dotimes [_ 4] 
    (rotate 90 0 1 0) 
    (push-matrix 
     (translate 0.5 0 0.5) 
     (call-display-list display-list)))))
The subdivide function takes a display list as its argument, which is a cached list of OpenGL calls. This list is called five times, first at the top and then four times around the central axis.
(subdivide (get-display-list (draw-pyramid)))

We can call subdivide as many times as we like, but Clojure gives us a superior approach, by way of lazy sequences. 



(defn sierpinski [] 
  (iterate 
    #(get-display-list (subdivide %)) 
    (get-display-list (draw-pyramid)))) 
The 'iterate' function creates a lazy sequence based on a function that takes n and returns n+1. On each iteration, the previous display list is shrunk to half its size on all dimensions, and drawn five times. This lets us generate as much geometry as we need, and no more.
(call-display-list (nth (sierpinski) 2))

(call-display-list (nth (sierpinski) 6))

Complete code can be found in the /examples subfolder in the Penumbra repository.