Game Jam Entry: Super Overkill

It's difficult to get a screenshot without dying!



Technical:

It's an arena shooter. The Game Jam theme was 'super' so I wanted to see if I could do some technical stuff for over the top effects, as well as having lots of bullets and enemies to kill.

I thought about using hardware sprites for effects. There are some restrictions with using sprites on the Genesis, namely a limit of 80 on screen at once, and 20 per scan line (also a max number of pixels per scan line).

It's possible to rewrite the sprite list on the horizontal blank interrupt to reuse sprites, and effectively have a lot more on screen. I looked into this a little, but gave up on the idea: if it didn't work, there would not be much time left in the Jam to do something different. I will have a go at it in the future, to see what can be done.

The other alternative with sprites is to be clever with the graphics being used, to make it appear that there are more sprites than are actually there. For instance, drawing multiple particles into a single sprite gives the appearance of lots of particles. This would require a lot of experimentation and would be tricky for me to do, graphically.

Another approach, which appears to be what the Genesis version of Robotron and other Williams games do, is to use a bitmap screen for everything.

In the end I tried using a Scroll Layer for effects, with the built-in limitation that everything moves in whole characters, i.e. 8x8 pixels at a time. This shouldn't be too much of a problem with fast moving objects, I thought, as long as they stay on the main axes / diagonals. That would look just like fast moving sprites.

I commandeered my particle system code, which is basically a simple circular buffer. Instead of using this to write sprites, as it was doing previously, I'm writing the animations to a buffer in RAM which is DMA'd to the VDP during the vertical blank.
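
As a rough illustration of the circular-buffer idea, here is a small Python model (not the game's 68000 code). The class name, the particle fields and the capacity are all made up for this sketch. The point is only that spawning overwrites the oldest slot, so it never needs a search or an allocation.

```python
class ParticleRing:
    """Fixed-size pool: spawning overwrites the oldest slot (hypothetical layout)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # None marks an unused slot
        self.head = 0                    # next slot to overwrite

    def spawn(self, x, y, dx, dy, life):
        self.slots[self.head] = {"x": x, "y": y, "dx": dx, "dy": dy, "life": life}
        self.head = (self.head + 1) % len(self.slots)

    def update(self):
        """Move every live particle one step and retire the expired ones."""
        for i, p in enumerate(self.slots):
            if p is None:
                continue
            p["x"] += p["dx"]
            p["y"] += p["dy"]
            p["life"] -= 1
            if p["life"] <= 0:
                self.slots[i] = None

    def live(self):
        return [p for p in self.slots if p is not None]
```

Each live particle would then be drawn into the RAM buffer instead of being turned into a hardware sprite.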

First implementation: 320x224 with a 64x32 plane. Until now the game had been running in 40-character-wide mode. This has the disadvantage of needing a 64x32 character VDP buffer, as the Genesis doesn't allow a 40-character-wide scroll plane in hardware (plane widths are 32, 64 or 128 cells). If I were writing directly to the VDP this would be fine, but as I'm going through a RAM buffer it makes the DMA part very wasteful: only 40 of the 64 cells in each row are ever visible.

So, the first optimization decision (unsurprisingly) was to go to 32-wide mode. This means that I can use a single DMA operation to blit 1792 bytes from RAM to VRAM. The disadvantage is that now I can only have 64 sprites on screen at once, and it looks a bit more Master-Systemy than Megadrivey! :(
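
The transfer sizes can be checked with a little arithmetic. This Python sketch assumes 28 visible rows (224 / 8), 2 bytes per cell, and that the visible rows are sent as one contiguous block starting at the top of the plane. The function names are only for illustration.

```python
BYTES_PER_CELL = 2      # one 16-bit name-table word per 8x8 character
VISIBLE_ROWS = 28       # 224 pixels / 8

def contiguous_dma_bytes(plane_width_cells, rows=VISIBLE_ROWS):
    """Bytes moved when the visible rows are sent as one contiguous block."""
    return plane_width_cells * rows * BYTES_PER_CELL

def visible_bytes(screen_width_cells, rows=VISIBLE_ROWS):
    """Bytes that actually end up on screen."""
    return screen_width_cells * rows * BYTES_PER_CELL

h40 = contiguous_dma_bytes(64)   # 40-cell screen on a 64-cell-wide plane
h32 = contiguous_dma_bytes(32)   # 32-cell screen on a 32-cell-wide plane
print(h40, visible_bytes(40))    # 3584 2240
print(h32, visible_bytes(32))    # 1792 1792
```

In 32-wide mode every byte transferred is visible, whereas the 64-wide plane moves 3584 bytes to show 2240.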

So I have a load of particle effects going on to test this out: explosions, circles, etc. The initial code works, but is rather slow. For each 'pixel' (i.e. an 8x8 character) I need to calculate its position in the 2k buffer, cull it against each edge, and then write to that position in the buffer. The code was something like:

; d0 = x, d1 = y, a0 = buffer, d2 = 'pixel' character data

    tst     d0
    blt     .cull           ; off the left edge
    tst     d1
    blt     .cull           ; off the top edge
    cmp     #WIDTH,d0
    bge     .cull           ; off the right edge
    cmp     #HEIGHT,d1
    bge     .cull           ; off the bottom edge
    ; here, calculate screen offset and 'render' the character
.cull:

(actually this would be using registers for width and height, but the principle is shown here)

It's not bad, I can get a lot of effects on screen. I'm going to need quite a bit of CPU for gameplay though with lots of sprites and collisions, and I am definitely not going under 60 fps, that would be sacrilege.

But.. this game is trying to be like an 80s / 90s arcade game. A simple decision makes this a lot faster. I decided to let the particles wrap round the edges, Asteroids-like - It might even look 'better'.

Instead of the above code I'm left with:

    and #31,d0
    and #31,d1

(again, in reality using registers, so only 8 cycles compared with many times more if we have to cull)
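
To make the difference concrete, here is a Python model of the two strategies. It assumes a 32x32-cell buffer, which matches the `#31` masks above (the real WIDTH and HEIGHT of the culled version are not given in the post). The wrap only works because both sizes are powers of two.

```python
WIDTH = 32    # buffer width in cells (assumed)
HEIGHT = 32   # buffer height in cells (assumed)

def cull(x, y):
    """Return (x, y) if the cell is inside the buffer, else None (four tests)."""
    if x < 0 or y < 0 or x >= WIDTH or y >= HEIGHT:
        return None
    return x, y

def wrap(x, y):
    """Always return a cell inside the buffer (two ANDs, no branches)."""
    return x & (WIDTH - 1), y & (HEIGHT - 1)

print(cull(33, -1))   # None
print(wrap(33, -1))   # (1, 31)
```

A particle that leaves the right edge reappears on the left, and one that leaves the top reappears at the bottom, which is the Asteroids-style behaviour described above.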

I tried this out, and was happy with the results. I also convinced myself it looked better than the culled version.. more going on on screen at once. And I was able to add a lot more particles, which is what I was after.

The render code is something like this:

    move fxps_x(a0),d0        ; x position
    move fxps_y(a0),d1        ; y position
    and d6,d0                 ; wrap round in x
    and d6,d1                 ; wrap round in y
    lsl #6,d1                 ; y * 64 (32 character wide buffer, each char is represented by 1 word)
    add d0,d0                 ; x * 2
    add d1,d0                 ; add them
    move d2,(a1,d0)           ; a1 points at RAM buffer. Write the character to the buffer.

That's still quite hefty - that shift isn't very nice either.
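
The address arithmetic in that listing can be modelled in a few lines of Python. The name `cell_offset` is just for this sketch. It returns the byte offset into the 2k RAM buffer that the final `move` writes to.

```python
MASK = 31           # 32x32-cell buffer, same mask for x and y
ROW_BYTES = 64      # 32 cells * 2 bytes per cell

def cell_offset(x, y):
    """Byte offset of cell (x, y) in the RAM buffer, wrapping at the edges."""
    x &= MASK                    # and d6,d0
    y &= MASK                    # and d6,d1
    return (y << 6) + (x << 1)   # y * 64 + x * 2

print(cell_offset(25, 23))   # 1522
print(cell_offset(-1, 0))    # 62
```

The result is always even and always below 2048, so the word write can never land outside the buffer.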

Also, up until now I'd been writing the write address into a list, which I could use to 'undraw' all the rendered pixels instead of clearing the buffer every frame. But the number of pixels I'm rendering is now growing to be more than I'd originally envisioned for this system. Once I've started filling up the screen with moving particles, it's quicker to simply clear the buffer every frame.
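
A Python model of the two clearing strategies, assuming that cell value 0 means 'empty' and that the buffer is held as 1024 words. All names here are hypothetical.

```python
BUFFER_WORDS = 32 * 32   # 2k buffer, one word per cell

def render_with_undo(buffer, offsets, char, undo_list):
    """Draw each 'pixel' and remember where it went."""
    for off in offsets:
        buffer[off // 2] = char
        undo_list.append(off)

def undraw(buffer, undo_list):
    """Erase only the cells recorded last frame."""
    for off in undo_list:
        buffer[off // 2] = 0
    undo_list.clear()

def clear_all(buffer):
    """Erase everything, regardless of how much was drawn."""
    buffer[:] = [0] * len(buffer)
```

The undo list costs one extra write per pixel when drawing and one read plus one write per pixel when erasing, so its cost grows with the particle count, while the full clear costs the same every frame.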

Anyway, at this point I started thinking about pre-calculating the particle animations, as my particle update code was getting a bit over the top, drawing spirals and lines etc. That's the obvious place to go, but it would require some tooling. The first option was to write some code to take a bunch of PNGs and convert them to whatever data I required. But authoring with Aseprite would be something I'd have to learn how to do, so of course I made a little editor for this.



This meant I had full control over the animation, and could track individual pixels over multiple frames if I needed to. (for fade-out effects for instance)

This tool exported a list of bytes which the render code could read in, add an offset to, and render at the final position. This worked beautifully, using similar code to that in the particle system. However you know what's next.. if you are pre-storing data, maybe you can also do some pre-calculation as well, to speed things up. 

I wanted *at least* to get rid of that 18 cycle shift instruction. But the difficulty lies in the fact that the animation needs to have an origin point added to it for every pixel rendered, also needing to be masked to wrap around.

OK, so what if I could reduce the masking quality a little? Would it be noticeable if, when a pixel wrapped round to the opposite screen edge, it wasn't quite level with its siblings over on the other edge? It turns out that I, at least, couldn't notice anything bad about it, which led to the possibility of further optimizations.

I ended up with the following code:

    lea FXMap,a2        ; a2 points to the RAM map
    add d0,d0           ; d0 is 'origin x' - pre multiply it by 2
    lsl #6,d1           ; d1 is 'origin y' - pre multiply it by 64
    add d1,d0           ; add them together.
    move #(32*64)-1,d3  ; This is the mask for the whole buffer
.lp:
    move.w (a1)+,d1     ; grab the exported positional offset.
    add d0,d1           ; add the origin position
    and d3,d1           ; mask it to the buffer size
    move d2,(a2,d1)     ; draw the pixel.
    dbra d4,.lp         ; and loop.

This now masks the pixel to the whole buffer rather than to each axis individually, and pre-calculates the offset for each pixel in the animation. I then also generated a jump table version to unroll it something like 200 times, to get rid of the wasted loop cycles.
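
Here is a Python model of that loop, split into a tool side and a runtime side. The function names and the (dx, dy) input format are assumptions for this sketch, as the post does not show the editor's export format. It also shows the 'reduced masking quality': a pixel that runs off the right edge comes back on the left one row lower.

```python
BUFFER_BYTES = 32 * 64          # 32 rows of 64 bytes
BUFFER_MASK = BUFFER_BYTES - 1  # 2047, the value loaded into d3

def export_offsets(cells):
    """Tool side: turn (dx, dy) cell offsets into signed byte offsets."""
    return [dy * 64 + dx * 2 for dx, dy in cells]

def render(buffer, offsets, origin_x, origin_y, char):
    """Runtime side: one add and one AND per 'pixel'."""
    origin = (origin_y << 6) + (origin_x << 1)
    for off in offsets:
        buffer[((origin + off) & BUFFER_MASK) // 2] = char

buf = [0] * 1024                          # the RAM map as 1024 words
offsets = export_offsets([(0, 0), (1, 0)])
render(buf, offsets, 31, 5, 7)            # origin in the last column of row 5
print(buf[5 * 32 + 31], buf[6 * 32 + 0])  # 7 7
```

The second pixel lands in column 0 of row 6 rather than row 5, which is the small misalignment that turned out not to be noticeable.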

Now everything works really well, and it's something like 8x faster than the original code. Even with tens of explosions and effects going on it only takes a few scanlines. This of course opened up the idea of adding more effects! I then started adding huge animated text made of individual 'particles'.

After finalizing these effects I realized that I'm not moving this text around at all; it's all fixed in position. Therefore we don't need any of the masking or offset code. In fact all we need to do is export a pre-calculated piece of code for each animation frame, like this:

.frame31:
    move.w d2,1522(a2)
    move.w d2,1524(a2)
    move.w d2,1526(a2)
    move.w d2,1528(a2)
    move.w d2,1530(a2)
    move.w d2,1458(a2)
    rts

This can be optimized even further by combining neighbouring 'pixels' into longword instructions. But in reality, this code is very fast, and I have plenty of CPU free. It's something to bother with if it ever becomes necessary. This is now a 'compiled sprite', a technique we used heavily in VGA / EGA PC games in the 90s.
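
An exporter for this kind of frame can be very small. The following Python sketch is not the actual tool. The function name and the (x, y) input format are assumptions, and the cell coordinates were chosen so that the output reproduces the offsets in the listing above.

```python
def emit_frame(label, cells, row_bytes=64):
    """Return 68000 source text that draws the given fixed (x, y) cells."""
    lines = [f".{label}:"]
    for x, y in cells:
        offset = y * row_bytes + x * 2      # byte offset into the RAM map
        lines.append(f"    move.w d2,{offset}(a2)")
    lines.append("    rts")
    return "\n".join(lines)

frame = [(25, 23), (26, 23), (27, 23), (28, 23), (29, 23), (25, 22)]
print(emit_frame("frame31", frame))
```

All the arithmetic happens at export time, so at runtime each 'pixel' costs exactly one `move.w` with a constant displacement.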

Because we are clearing the RAM buffer each frame, we can try different effects instead of clearing. I wrote a 'fade' function which instead of clearing each pixel, subtracts one from its value instead. Combining this with cleverly animated characters gives a great effect. I'm turning in to Jeff Minter here. But the fade out code is a hell of a lot slower than a simple clear. I'm not currently using it in the final game, but hopefully it will find a home in a future game.
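
A minimal Python model of the difference between the two passes. It assumes that cell value 0 is blank and that the characters are arranged so that lower indices look dimmer. It also ignores the palette and flip bits that a real name-table word carries.

```python
def clear(buffer):
    """Plain clear: every cell is overwritten without being read."""
    buffer[:] = [0] * len(buffer)

def fade(buffer):
    """Fade: each non-empty cell steps one character index towards 0."""
    for i, cell in enumerate(buffer):
        if cell:
            buffer[i] = cell - 1

cells = [3, 0, 1]
fade(cells)
print(cells)   # [2, 0, 0]
```

In this model the fade has to read and test every cell before writing it back, whereas the clear only writes, which is one way to see why the fade pass is so much more expensive.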


Gameplay

Even though it's trying to be a technical showcase in some ways, gameplay is the most important thing. Asute and I both like arena shooters, and there aren't really any technical hurdles to overcome while making one, so we decided on this. I'd had the idea of a game called OVERKILL for ages - in which you're rewarded for killing enemies with more power than you need to. We went through a few iterations of this and settled on something we both felt was OK.

We gradually added different types of enemy pretty much ad-hoc. I'd code something and Asute would make graphics for it, or Asute would make an enemy and I'd code something for it which seemed to match.

The one thing I really wanted to concentrate on was getting an editor in place for designing levels. In this game Levels are a series of different waves, aiming for a few quick peaks and troughs of action, giving you a few seconds of breath before the next onslaught. Having quick turnaround time for testing out levels is really important.. so an editor it was. Also I had been wanting to make a node-based editor in Unity for ages, so here was the opportunity!

Here's a screenshot of Level 1



So here I can quickly test out combinations of enemies, different wave sizes etc. Pressing a button in the editor saves out the data, compiles, and runs the game (in a couple of seconds) so turnaround is really fast. I can 'mute' or disable nodes so that I can test individual waves, then enable them all to test a whole level.

For variety I added sub-types for most enemies. Eg. 'Random Walker' has 3 options: Slow, Medium, Fast. Adding these in really made the combinations of enemies more interesting.


Music / Sound

For this game jam I added some features to my sound 'driver'. I didn't yet have sampled sounds in there, so I added that, as we thought it'd be funny to have someone shouting OVERKILL! I'm not entirely sure that what I added is bug-free though, so something bad will probably come to light in the next project!

Most of the music I did for the previous game was not particularly fast paced. To get more driving sounds, I thought I could add a Pattern module to my music editor, just to speed up the creation process. It's basically a simple arpeggiator, so I can 'notate' chords, and assign patterns to them.



Here a track is notated as chords, but the pattern is applied to it. Each entry in the pattern has its own volume and keyoff value, which creates nice effects. I thought about adding FM modifier values to each note, but that ability is already there with a curve control. This is something I might add later though. Or even use the pattern editor as a controller for the FM values separately from notes.
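
The chord-plus-pattern idea can be sketched as follows. This is only a guess at the shape of the data. The post says each pattern entry has its own volume and keyoff value, while the `index` field (which note of the chord to play) and the use of MIDI-style note numbers are assumptions made for this example.

```python
def expand(chords, pattern):
    """Turn chords plus a pattern into a flat list of (note, volume, keyoff) events."""
    events = []
    for chord in chords:
        for step in pattern:
            note = chord[step["index"] % len(chord)]   # pick a chord tone
            events.append((note, step["volume"], step["keyoff"]))
    return events

chords = [[60, 64, 67], [62, 65, 69]]
pattern = [
    {"index": 0, "volume": 15, "keyoff": 2},
    {"index": 2, "volume": 10, "keyoff": 1},
]
print(expand(chords, pattern))
# [(60, 15, 2), (67, 10, 1), (62, 15, 2), (69, 10, 1)]
```

Changing the chord list leaves the rhythm and dynamics untouched, which is what makes this quicker than notating every note by hand.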


Next up I want to add triggered samples, perhaps for the next game jam.


