Gauntlet Of Power. Sega Genesis.

Initial Thoughts.

 Heroes Of Loot : Gauntlet Of Power is another game by Pascal at OrangePixel. 

I was on the lookout for another game to convert after finishing Meganoid. 
GoP is a similar sort of rogue-like to Heroes Of Loot, but it's room based. 

My HoL conversion pre-created random levels on the PC side, but for this I wanted to actually do it all on the Genesis. 
Having looked in to the Level Generation code, I really didn't fancy converting that to asm. Nothing hard about it, but it's a *lot* of code, and there's massive room for tiny logic errors in the conversion. I didn't want to spend months in the future tracking down subtle little bugs.

So, Pascal very generously allowed me to take the gfx and make a similar game to the original, but not necessarily sticking with exactly how the original works.

From me feeling quite down about the original conversion task, I'm now on quite a high. This is the first time for a few games that I'll be able to make changes which work in the Genesis's favour. I can write to the hardware again! 

This opens up lots of possibilities for compromise which should make this version better than the strict conversion could ever have been.

The first change which comes to mind immediately is to use fixed sizes for the game rooms. (eg. 512x256) rather than a bunch of different sizes as in the original. (I can still make smaller rooms, but this would be the max. size). Something like this would immediately get rid of the need for updating scroll edges, saving 10 scan lines or so of CPU / VBL which can be used elsewhere, without really anyone noticing anything different!

This game requires lots and lots of bullets. The player can shoot from 8 different slots at once, in each of 8 directions. Some generic weapons shoot straight in these directions, while others move around in different ways, like homing missiles.
As it's not a straight conversion, I can control the number of different things which are mounted at once.

Stress Test Tech Stuff

Implemented the 512x256 map, and a simple 0,1 collision map in code which matches the arena image I knocked up. 

Collision Boundary

All arenas will have a collision boundary around the border. As long as nothing is travelling faster than 8px per frame, this removes the need for separate edge tests for moving objects. In games without borders, I usually add an extra collision border around the map, but that can end up with less optimal map sizes.

Coordinates

As nothing can go outside the arena range, I need 9 bits for X and 8 bits for Y. So I can use a single word for each, rather than the generic 16:16 fixed point. This makes standard things like adding dx to x quicker on 68000 (28 vs 40 cycles), and of course saves space.
However, this is slower when accessing the pixel position of a value, as it needs a shift rather than just reading the top word of a 16:16 value.
I think the lowest fractional value I can use is probably 4 bits, so we have at least 12:4 for each axis. The exact range will depend on other ways we need to access the values, for instance getting the tile value (dividing by 8). 
I've settled on 12:4 for now. It's possible that we could have different ranges for X and Y, as we only need 8 bits for Y values.
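
A minimal sketch of the 12:4 layout, written in C purely for readability (the game itself is all asm). The struct and function names are made up for illustration. It shows the single word add per axis, and the shifts needed to get back to a pixel or an 8x8 tile position.

```c
#include <assert.h>
#include <stdint.h>

/* 12:4 fixed point: top 12 bits are whole pixels, low 4 bits are 1/16ths. */
#define FRAC_BITS  4
#define TILE_SHIFT 3 /* 8x8 pixel tiles */

typedef struct {
    uint16_t x, y;   /* position, 12:4 */
    int16_t  dx, dy; /* velocity per frame, 12:4 */
} Mover;             /* hypothetical name */

/* Convert a whole pixel position to 12:4. */
static uint16_t from_pixel(uint16_t px) {
    assert(px < 4096); /* only 12 bits of whole pixels are available */
    return (uint16_t)(px << FRAC_BITS);
}

/* One word-sized add per axis. */
static void mover_step(Mover *m) {
    m->x = (uint16_t)(m->x + m->dx);
    m->y = (uint16_t)(m->y + m->dy);
}

/* Pixel position: drop the 4 fraction bits. */
static uint16_t to_pixel(uint16_t v) { return (uint16_t)(v >> FRAC_BITS); }

/* Tile position: drop the fraction bits and divide by 8 in one shift. */
static uint16_t to_tile(uint16_t v) { return (uint16_t)(v >> (FRAC_BITS + TILE_SHIFT)); }
```

The tile lookup is a single shift by 7, so the extra cost compared to 16:16 only really shows up when the pixel position itself is needed.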

Rendering

I can render things as hardware sprites, or as tiles on a separate layer (as we have no parallax layer), depending on how they are needed to move.
For generic 8 directional weapons I chose to make them travel quickly, so I could use tiles for these, and it won't look too bad.
For others, I need to use hardware sprites. (I tried homing missiles as tiles, and they look bad)

For the 'tile sprite' layer, I have a couple of choices. 
The easiest way is to store a ram map for the entire arena, write in to that, and DMA the whole thing in VBL. This is 4k, so it's a hefty chunk of VBL time used up.
The other choice (which I'm currently using) is to write individual tiles. It's possible that doing this could become less efficient than dumping the whole map, depending on the number of items rendered. I don't think this will happen but it's something to watch out for.

So for each rendered bullet, at update time I add an entry to a list which is rebuilt each frame. This list has the VDP_CONTROL and VDP_DATA values for each tile. The control data is a long word, and the data is the tile value. This is done to be as efficient as possible during VBL, where CPU time is very important. I can dump the values to VDP with a tight loop.
This is a double buffered list, with the previous list being the 'clear list' 
So in VBL I first replace all the previous frame's bullet tiles with zero, then copy in all the new ones. 
Then I swap the buffers each frame.  
 *Possible optimizations* - Unroll the loop. Or write the values directly in to an unrolled code loop in RAM. I have a feeling that the CPU might need to wait a little on the VRAM write, so maybe the unrolling isn't needed. Not sure. 
Will need to do some timing tests.
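
Here's a rough sketch of the double buffered list in C, just to show the flow. It is not the real code: the VDP is faked with an array, the 'control' value is a plain cell index rather than a real VDP_CONTROL long, and all the names and the list size are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TILE_WRITES 64        /* assumed list size */
#define MAP_CELLS       (64 * 32)

typedef struct {
    uint32_t ctrl; /* on hardware: precomputed VDP_CONTROL long; here: a cell index */
    uint16_t tile; /* tile value that goes to VDP_DATA */
} TileWrite;

typedef struct {
    TileWrite e[MAX_TILE_WRITES];
    int count;
} TileList;

static TileList lists[2];
static int cur;                        /* list being filled this frame */
static uint16_t fake_plane[MAP_CELLS]; /* stand-in for the plane in VRAM */

/* Stand-in for "write ctrl to VDP_CONTROL, then data to VDP_DATA". */
static void vdp_put(uint32_t ctrl, uint16_t data) { fake_plane[ctrl] = data; }

/* Update time: queue one bullet tile for the next VBL. */
static void queue_tile(uint32_t cell, uint16_t tile) {
    TileList *l = &lists[cur];
    assert(cell < MAP_CELLS);
    if (l->count < MAX_TILE_WRITES) {
        l->e[l->count].ctrl = cell;
        l->e[l->count].tile = tile;
        l->count++;
    }
}

/* VBL: erase last frame's tiles, draw this frame's, then swap buffers. */
static void vbl_flush(void) {
    TileList *prev = &lists[cur ^ 1];
    TileList *now  = &lists[cur];
    for (int i = 0; i < prev->count; i++) vdp_put(prev->e[i].ctrl, 0);
    for (int i = 0; i < now->count; i++)  vdp_put(now->e[i].ctrl, now->e[i].tile);
    cur ^= 1;
    lists[cur].count = 0; /* the old clear list becomes the new fill list */
}
```

Erasing first and drawing second matters: a bullet which stays in the same cell for two frames is cleared and then immediately redrawn, so it never disappears.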
 
For sprites, I'm currently doing the standard approach - 'cached sprites'. And still looking for ways to improve this. A great optimization of the fixed arena size is that I don't need to do any sprite culling, which improves the speed a lot. I could still feasibly do X culling if there are a lot of sprites spread out horizontally over the level or something like that. No need for Y culling, at least.

I store a 64x32 longword table, containing the VDP_CONTROL value for each tile in the whole arena. So I don't need to calculate the VDP value each time a tile is rendered.
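
As a sketch of how such a table could be built (again in C for readability, not the actual asm): each entry is the standard VDP 'VRAM write' command long for that cell's address. The plane base address of 0xE000 and the names are assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_W      64
#define MAP_H      32
#define PLANE_BASE 0xE000u /* assumed VRAM address of the tile layer */

static uint32_t ctrl_table[MAP_W * MAP_H];

/* VRAM write command long for a 16-bit VRAM address:
   address bits 13..0 go in bits 29..16, address bits 15..14 in bits 1..0. */
static uint32_t vram_write_cmd(uint16_t addr) {
    return 0x40000000u
         | ((uint32_t)(addr & 0x3FFFu) << 16)
         | (uint32_t)((addr >> 14) & 3u);
}

/* Fill the table once at startup; each map cell is one word in VRAM. */
static void build_ctrl_table(void) {
    for (int y = 0; y < MAP_H; y++) {
        for (int x = 0; x < MAP_W; x++) {
            uint32_t addr = PLANE_BASE + (uint32_t)(y * MAP_W + x) * 2u;
            assert(addr <= 0xFFFFu);
            ctrl_table[y * MAP_W + x] = vram_write_cmd((uint16_t)addr);
        }
    }
}
```

At render time the tile offset indexes straight in to the table, and the value can be written to VDP_CONTROL as is.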

Bullet / Enemy Collision

For this much action, this part is potentially by far the most dangerous bit of code. I've limited the number of player bullets to 64 in total, and I want to handle 50+ enemies.
I think the chaotic nature of the game allows me to be a little inaccurate with the collision. You probably won't notice some small errors, especially if they favour the player.

My approach is purely to use an 8x8pix collision map. One byte per tile, using 2048 bytes in total.
I fast-clear the map at the start of the frame, using unrolled movems. Again this needs to be tested against maintaining a list, and clearing individual bytes. I expect the big-clear approach is fastest, and it's certainly simplest. But maybe not!

Bullets are a single list of structs, 64 long. So I write their index in to the collision map. When a collision is registered, that's used to look up the original bullet. This can be used for an accurate box check if needed. (though it seems fine for now without). It also gives access to the bullet's HP etc.

The enemies themselves check the collision map. This is done in-line at the moment, though a separate loop might be better, depending if I have spare registers available.
Enemies are going to have a hard coded function to check the bullet map, depending on their size. I can take their map position, and check each tile in eg. a 3x4 block around them.

The alternative approach would be to have the enemies writing their positions in to the map, and have the bullets check them. This alternative would be faster, but less accurate: I think enemies overwriting each other when they're bunched together is worse than bullets overwriting each other.

Again, having borders around the arena means that I don't need to do any clipping for these tests, which is a HUGE benefit.
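
The bullet map idea, sketched in C rather than asm. The names are invented, and storing index + 1 (so that zero can mean 'empty') is an assumption made for this example rather than a description of the real code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAP_W       64
#define MAP_H       32
#define MAX_BULLETS 64

/* One byte per 8x8 tile: 0 = empty, otherwise bullet index + 1 (assumed encoding). */
static uint8_t bullet_map[MAP_W * MAP_H];

/* Start of frame: the asm version would use unrolled movems instead of memset. */
static void clear_bullet_map(void) { memset(bullet_map, 0, sizeof bullet_map); }

/* Each live bullet writes its index in to the tile it occupies. */
static void mark_bullet(int index, int tile_x, int tile_y) {
    assert(index >= 0 && index < MAX_BULLETS);
    bullet_map[tile_y * MAP_W + tile_x] = (uint8_t)(index + 1);
}

/* Enemy side: scan a w x h block of tiles whose top-left tile is (tx, ty).
   Returns the index of the first bullet found, or -1 if the block is empty.
   There is no clipping, relying on the arena border to keep the block in range. */
static int enemy_hit(int tx, int ty, int w, int h) {
    const uint8_t *row = &bullet_map[ty * MAP_W + tx];
    for (int y = 0; y < h; y++, row += MAP_W) {
        for (int x = 0; x < w; x++) {
            if (row[x]) return row[x] - 1;
        }
    }
    return -1;
}
```

In the real thing the 3x4 loop would be unrolled per enemy size, but the logic is the same: any non-zero byte in the block is a hit, and the value leads back to the bullet struct.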

Scenery Collision

Another byte map for the whole arena. Moving objects just check the byte under their position, and react accordingly. Bullets are destroyed, and monsters can collide and slide around walls.

Pickups 

Another 64x32 map! (Potentially this can be incorporated in to the collision map, if needed)
When a monster drops a pickup, it's written to this map, and also directly to the VDP tile map. 
When it's picked up, the pickup collision map is cleared for this item and the VDP tile map is restored to its original state.

Enemy Bullets

Not entirely sure what's needed here, but I tried using the tile render approach for this, and it works really well. 
It's very similar to the player bullet approach. But non cardinal direction bullets don't look great at 8x8 tile resolution when moving.
The route I'm taking limits the size to 4x4 pixels, but gives the movement double the resolution of the 8x8s. I write one of 4 tiles in to the map depending on the bullet's position within the 8x8 tile. Surprisingly this works well enough for bullet hell, and is pretty fast. 256 bullets fits within half a frame of CPU.
At this point it might well be faster to use the RAM tile map, dumping 4k to the VDP each frame, rather than erasing and rendering >300 tiles each frame.
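
A small C sketch of picking one of the 4 tiles from the bullet's pixel position. The tile base number, the ordering of the four tiles and the names are all assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_W            64
#define BULLET_TILE_BASE 0x0100u /* hypothetical: first of 4 consecutive bullet tiles */

typedef struct {
    uint16_t cell; /* offset of the 8x8 tile in the 64x32 map */
    uint16_t tile; /* which of the 4 bullet tiles to write there */
} BulletTile;

/* Assumed tile order: +0 top-left, +1 top-right, +2 bottom-left, +3 bottom-right. */
static BulletTile bullet_tile(uint16_t px, uint16_t py) {
    BulletTile r;
    uint16_t quadrant;
    assert(px < 512 && py < 256);
    /* Bit 2 of each pixel coordinate says which 4 pixel half of the tile we are in. */
    quadrant = (uint16_t)((((py >> 2) & 1u) << 1) | ((px >> 2) & 1u));
    r.cell = (uint16_t)((py >> 3) * MAP_W + (px >> 3));
    r.tile = (uint16_t)(BULLET_TILE_BASE + quadrant);
    return r;
}
```

So the bullet still lands on the 8x8 tile grid, but it visibly moves in 4 pixel steps.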

For collision, I believe I can re-use the player bullet collision map without needing to clear anything. I have 1 or 2 spare bits in each byte, so can just use the top bit and do a negative test rather than non-zero test. This is assuming that all the enemy bullets are of the same type.
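
Sketching the shared map in C (names invented, and the exact bit layout is an assumption): the low bits carry the player bullet value as before, and the top bit flags an enemy bullet. On 68000 the top bit test is just a tst.b followed by bmi.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_CELLS        (64 * 32)
#define ENEMY_BULLET_BIT 0x80u

/* Low 7 bits: player bullet value, top bit: an enemy bullet is in this tile. */
static uint8_t shared_map[MAP_CELLS];

static void mark_enemy_bullet(int cell) {
    assert(cell >= 0 && cell < MAP_CELLS);
    shared_map[cell] |= ENEMY_BULLET_BIT;
}

/* Player check: the 'negative test', top bit set. */
static int enemy_bullet_at(int cell) {
    return (shared_map[cell] & ENEMY_BULLET_BIT) != 0;
}

/* Enemy check: mask the top bit off before the non-zero test. */
static int player_bullet_at(int cell) {
    return shared_map[cell] & 0x7F;
}
```

One thing this sketch makes obvious: once the top bit is in use, the enemies' non-zero test would need to ignore it, otherwise an enemy bullet on its own would look like a player bullet.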

Implementation      

All these maps of different types are the same dimensions, 64x32. But their entries are of different sizes, some byte, some word, some longword. Often, objects like bullets and monsters need to access several of these maps in their update loop. Map collision, bullet collision map, visible tile map, etc.
I can calculate the 8x8 tile offset once, from the X,Y positions. This can index in to the byte tables directly.
Then for indexing in to the maps of other sizes, I just need to multiply the value by 2 or 4 and use that as an index. This is wonderfully fast!
All or some of the maps could feasibly be combined in to one, though clearing the ones which need to be cleared would be very much slower. And some maps are not writable, so can be in ROM only. Combining them would waste RAM. These are all options, depending on how the game turns out. Access would not be any slower, I think, if it could all be combined in to 64x32x8 bytes.
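
The offset calculation, sketched in C with made-up names, assuming the 12:4 coordinates from earlier. The tile offset is worked out once, then doubled or quadrupled to get the byte offsets in to the word and longword maps.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_W 64
#define MAP_H 32

typedef struct {
    uint16_t byte_off; /* offset in to byte maps (collision, bullets, pickups) */
    uint16_t word_off; /* byte offset in to word maps (eg. a RAM tile map) */
    uint16_t long_off; /* byte offset in to longword maps (eg. VDP_CONTROL table) */
} MapOffsets;

/* x_fp and y_fp are 12:4 fixed point: shifting by 7 drops the 4 fraction
   bits and divides the pixel position by 8 in one go. */
static MapOffsets map_offsets(uint16_t x_fp, uint16_t y_fp) {
    MapOffsets o;
    uint16_t tx = (uint16_t)(x_fp >> 7);
    uint16_t ty = (uint16_t)(y_fp >> 7);
    assert(tx < MAP_W && ty < MAP_H);
    o.byte_off = (uint16_t)(ty * MAP_W + tx); /* on 68000 the *64 is a shift */
    o.word_off = (uint16_t)(o.byte_off << 1); /* an add to itself in asm */
    o.long_off = (uint16_t)(o.byte_off << 2);
    return o;
}
```

The largest longword offset is 2047 x 4 = 8188, so everything stays comfortably inside a word.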


Dungeon Room Generation

Previously I would generate different maps on PC, and import hundreds of them in to the Genesis game to seem random. This time I wanted to do the whole thing on the Genesis side, mainly because I can't sit at the PC and do tools work for many minutes at a time.

Whilst not as complicated as HoL, the room generation in the original code has lots and lots of long "if this and this or this then this, else this or that" sort of lines of code.
This sort of branching code in asm is notoriously long winded, and it's usually much better to use tables and logic in different ways.

I decided to do an RPN (Reverse Polish Notation) logic system to help translate this logic in to machine code. (I could have just used C, but honestly doing all the compiler setup is something I really don't want to deal with, and I have my "100% Machine Code" label to adhere to!)

RPN is great because it's extremely simple to implement. It's a stack-based system, where you push the parameters on to a stack, perform an operation, and push the result to the stack. This way very complex logic can be built up fairly simply. I used this a lot in the 90s for simple scripting languages.

As a simple example, to do the test: "If (IsRaining AND InHole), GetWet!"

We push the terms and tests on to the stack like this:
    IsRaining
    InHole
    AND
    DONE

The AND instruction pops the previous two things from the stack and performs an AND on them (meaning that they BOTH have to be true). It puts the result of this back on the stack, and 'DONE' gets this value, which is the result of the entire statement.

An example from the code is as follows. The Java code is this:
           if (y>0 && isWall(x,y-1) &&  (isWall(x-1,y-1) || isWall(x+1,y-1) || isWall(x,y-2) ) ) { Do Something }

And the corresponding assembler data is: 
RPN_TEST:
    RG_IsWall,-1,-1
    RG_IsWall,1,-1
    RG_OR
    RG_IsWall,0,-2
    RG_OR
    RG_Y_GreaterThan,0
    RG_AND
    RG_IsWall,0,-1
    RG_AND
    RG_DONE
 
The nested ORs and ANDs, and the braces in the original map out to the list of statements above. 
This might look complex, but it's much MUCH easier to debug than a list of compares and branches which would be the standard asm approach.

Another simple example:
               if (!isWall(x-1,y-1)) (DoSomething)
RPN_TEST:
    RG_IsWall,-1,-1
    RG_NOT
    RG_DONE

The NOT command allows me to NOT the result on the stack, which is equivalent to ! (or == false) in the higher level language.
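
To show how little code the evaluator needs, here is a minimal sketch of one in C. The real one is in asm and will differ. The opcode names follow the listings above, but the byte encoding, the stack depth, the wall map and treating out-of-range cells as walls are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_W 64
#define MAP_H 32

enum { RG_IsWall, RG_Y_GreaterThan, RG_AND, RG_OR, RG_NOT, RG_DONE };

static uint8_t wall_map[MAP_W * MAP_H]; /* non-zero = wall */

static int is_wall(int x, int y) {
    /* Assumption: anything outside the map counts as wall. */
    if (x < 0 || y < 0 || x >= MAP_W || y >= MAP_H) return 1;
    return wall_map[y * MAP_W + x] != 0;
}

/* Run one RPN list for the cell (x, y). Returns 0 or 1. */
static int rpn_eval(const int8_t *pc, int x, int y) {
    int stack[16];
    int sp = 0;
    for (;;) {
        switch (*pc++) {
        case RG_IsWall: {            /* operands: dx, dy */
            int dx = *pc++;
            int dy = *pc++;
            assert(sp < 16);
            stack[sp++] = is_wall(x + dx, y + dy);
            break;
        }
        case RG_Y_GreaterThan: {     /* operand: value to compare y against */
            int v = *pc++;
            assert(sp < 16);
            stack[sp++] = (y > v);
            break;
        }
        case RG_AND:                 /* pop two, push (a AND b) */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] && stack[sp];
            break;
        case RG_OR:                  /* pop two, push (a OR b) */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] || stack[sp];
            break;
        case RG_NOT:                 /* invert the top of the stack */
            assert(sp >= 1);
            stack[sp - 1] = !stack[sp - 1];
            break;
        case RG_DONE:                /* the single value left is the result */
            assert(sp == 1);
            return stack[0];
        default:
            assert(0);
            return 0;
        }
    }
}

static const int8_t RPN_TEST[] = {
    RG_IsWall, -1, -1,
    RG_IsWall, 1, -1,
    RG_OR,
    RG_IsWall, 0, -2,
    RG_OR,
    RG_Y_GreaterThan, 0,
    RG_AND,
    RG_IsWall, 0, -1,
    RG_AND,
    RG_DONE
};

static const int8_t RPN_NOT_TEST[] = { RG_IsWall, -1, -1, RG_NOT, RG_DONE };
```

One difference from the Java worth keeping in mind: && stops evaluating as soon as it hits a false term, whereas an RPN list like this evaluates every term. So the y>0 test no longer protects the isWall(x,y-1) lookup, and the wall test itself (or a border around the map) has to be safe for any offset.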








 

 
