Design decisions

Preliminary considerations

Before diving into writing Jazz (and later Sjaak), I did some uncharacteristic planning and designing. Usually when I want to code something up, especially in my spare time, I just start typing and hack things together. However, I wasn't entirely happy with how some things turned out in my last chess program, and I wanted to do better this time. A few things were obvious:

  1. Bitboards. I wanted to use bitboards.
    With 64 squares on a chessboard and 64 bits in... well, a 64 bit integer, how can you not feel that bitboards are what you're supposed to use when writing a computer chess program? Previously, I had had this idea and used it without bothering to look on the internet. As usual, when you have an idea, someone else has had it too (and if they haven't they usually will in the near future), so I decided to do some homework and read up on all the clever things people had done with bitboards over the past decades, in particular "rotated bitboards". I started out with these (in Jazz), but later switched to using something called a kindergarten bitboard, which turned out to be both faster (because I don't need to store and copy so much data) and simpler. Sjaak has used these from the beginning.
  2. Optimise for 64 bit hardware.
    Pretty much a consequence of the first point. It means my program will likely run sub-optimal on 32 bit systems. Well, it's a hobby project, and my machine is 64 bit, so I don't care too much at this point.
  3. Use C, not C++
    Not much of a design decision. I slightly prefer C over C++, although I would probably benefit from some C++ style object-oriented programming. It should fit the way I program in C very well. Another no-brainer was to use the C99 standard, or at least have no qualms about using C99 constructs or ideas. It probably means the code doesn't compile using Microsoft's C++ compiler. Again though, a hobby project. Windows does not rank highly on my personal list of things that matter.
  4. No global variables.
    Again not much of a decision. When writing maintainable code, global variables are evil. No more, no less. Yes, they're often very handy and useful - certainly in the beginning. Yes, it may make sense to load some data in a global structure and reference it often. Again, it's a rule of thumb. Certainly nothing gameplay or search related can rely on global variables.
  5. Function inlining is a good thing
    This means I put a lot of code in header files in "static inline" functions - so the exact same code gets inserted multiple times by the compiler in each source file. It makes the code a bit more ugly to work with, but hopefully it's worth it. It still beats macro's to accomplish the same effect.

Having decided on these things, the next step was planning data structures and the functions that would manipulate them (what would be class member functions in C++). First the board.

Intermezzo: BigSjaak

Originally, I designed Sjaak to play variants on an 8x8 board, which meant that I could reuse directly a lot of ideas (if not code) from Jazz. Ultimately, of course, I wanted to lift that restriction and play variants on a larger board.

One approach could have been to scrap the bitboard structure and use plain arbitrary-sized mailboxes. I didn't really want to do that though, so I never considered doing that. I really wanted to retain the bitboard-based design.

I had some experience writing a bitboard-based Xiangqi engine, where I used one 64-bit bitboard to represent each bank of the river. This worked reasonably well, and since I had written the code in C++, it still looked very much like normal C 64-bit bitboard code. So my first attempt was a C++ port of Sjaak called BigSjaak, with a double-bitboard replacing the single 64 bit bitboard. Ultimately I scrapped this idea because I'd made some updates/bugfixes to Sjaak that I would have to port into BigSjaak, and also because the code was actually more cumbersome in practice than I had thought it would be.

That wasn't the end of it though, because I soon learned about a GCC extension that made it possible to use 128-bit integers (on 64 bit hardware) or 128 bit vectors of integers (on 32 bit hardware). The main advantage of these would be that logical set-wise operators would work exactly the same as on a 64 bit bitboard. The main disadvantage would be a potential large performance hit on 32 bit hardware (especially without SSE), but as per the above design principles, I wasn't too worried about that. The major annoyance was an inconsistency in the way multiplication and bitshifts are handled in those two modes: on 128 bit integer, they just work as you'd expect, but on a vector they work on each component alone, so I still had to implement general 128 bit bitshifts and 128 bit multiplication (working on tuples of 64 bit integers).

This does, of course, mean that there is some duplication in Sjaak, since I kept the 64 bits for 8x8 (and potentially smaller) boards and added structures to store 128 bit bitboards as well as the (duplicate) code to manipulate them. Ultimately this code duplication will become irksome again and it'll have to go, at which point moving to C++ and templates is probably the right thing to do.

The board layout

I've seen all sort of funky board layout schemes on-line, where people enumerate squares starting at A8 or H8. To me, the only thing that makes sense is to start counting down the files starting with A1, B1, C1 and on to H1, at which point we wrap to A2. Using conventional 0-based counting, we have A1 = 0, B1 = 1, A2 = 8, H8 = 63. Not complicated.

The representation of the board is a collection of bitboards, and is essentially the same in Jazz as it is in Sjaak:

typedef struct board_t {
   /* Bitboard representation of the chess board.
    *  Index 0 is for pawns
    *  Index 1 is for knights
    *  Index 2 is for bishops
    *  Index 3 is for rooks
    *  Index 4 is for queens
    *  Index 5 is for kings
    */
   bitboard_t bbp[6];

   /* Bitboard encoding initial positions, used in move generation for
    * castling moves.
    */
   bitboard_t init;

   /* Board occupancy bitboards
    *  Index 0 is for white pieces
    *  Index 1 is for black pieces 
    */
   bitboard_t bbc[2];

   /* Hash key for this board */
   uint64_t hash, pawn_hash;
   
   /* En-passant square */
   int8_t ep_square;

   /* Material balance for black and white. */
   int16_t material[2];

   /* Whether the current player is in check or not. */
   bool in_check;
} board_t;

Nothing too deep or complicated. The init bitboard is something that is left over from the first chess program I wrote, which relied on knowing which pieces had moved before and which hadn't. I still use it out of habit for manging castling rights, but I may drop it at some point and simply encode those in bit flags, which is what everyone else seems to do.

I don't use a bitboard to store which squares are occupied, instead I just store where black and white pieces are. Occupied squares are given by the union of these two bitboards.

There is some extra book keeping that doesn't seem to be directly related to the current position. The init bitboard is an example of this as well. The two 64 bit hash keys are used to index the transposition table (again, later) and a table of pawn structure evaluations. The material balance (total material for black and total material for white) is stored here too for reasons I'll get to shortly. The in_check field is a convenience that can help speed up some parts of the search or the move generator.

In Sjaak, the bbp array is 16 elements big, to keep track of up to 16 different piece types. Sjaak also has an extra royal bitboard that keeps track of the location of the royal pieces. This is mainly used to detect check: the piece type alone doesn't tell us whether a piece is "royal" or not. There are a few other minor differences:

   /* Location of royal pieces.
    *  In a sense, this information is redundant, since the king is redundant. However, having a separate
    *  bitboard to track the position of the royal pieces gives us a lot more flexibility in defining chess
    *  variants where there can be multiple royal pieces.
    */
   bitboard_t royal;

   /* Bitboards for all piece types, up to 16 per variant */
   bitboard_t bbp[MAX_PIECE_TYPES];

   /* Byte board holding the type of piece at each location.
    * NB: the contents of empty squares is not defined!
    */
   int8_t piece[64];

   /* Piece holdings.
    * These are indexed by [piece type][side to move]
    * We actually have a bool flag to specify whether we're interested in these or not, so we can skip a chunk
    * of code for variants where we're not.
    */
   int8_t holdings[MAX_PIECE_TYPES][NUM_SIDES];
   bool use_holdings;

As well as some extra status information that affect the rules of the game.

Storing moves (Jazz)

On its most basic level, for each move we need to store the origin and the destination squares, and (for promotions) the type of promotion piece. The first two require six bits each, the latter requires only two bits (for four possibilities in total). That's 14 bits, so it easily fits in one 16 bit integer.

We could do that, and there are probably people who do exactly that. But I don't. When actually making a move on the computer's internal board we need to

  1. know the piece that makes the move;
  2. decide whether it's a capture or not, and if it is, know what type of piece was taken;
  3. decide whether or not it's an en-passant capture or not and figure out what square the pawn we just captured was actually on (since it isn't on our destination square);
  4. decide whether the move is a promotion and if it is, replace the pawn with whatever new piece it's supposed to turn into;
  5. decide whether a king move is a castling move, and then move the appropriate rook to where it needs to go as well.

All of these require either extra conditional code or lookups, which we like to avoid. And there should be no real reason we can't, because we know all of these things when we decide that this is a legal move - so we could simply store that data and use it later.

First, an important consideration, however.

Because computers play chess by trying different moves on its internal board in order to work out if they're any good, we need to be able to make moves, and then take them back. This is not a nice feature to have for the player's benefit so that they can undo their stupid mistakes, it's absolutely essential. How to do this tough?

My first chess program simply stored all the information it needed within the move itself: given that we knew what the last move had been, it would have enough information to restore the previous position (including tacky things like castling rights and en-passant squares). That is one option, but it has at least one drawback: it makes reversing a move as complicated as making a move.

There is another option: simply back up data we cannot easily restore otherwise. Or, even better, just back up everything.

There is a cost associated with that, of course: it takes more memory (but memory is cheap) and it requires that we make a copy of the current state so that we can restore it later. However, if we're clever, we can make the "take back move" function essentially free.

I keep an array of board_t in my game struct and a board_t *board pointer that always points to the current position. When making a move, I increment the pointer, copy the state on the old board to the new one (this is potentially slow!) and then make the move on the new board.

When I unmake the move, all I have to do is decrement the board pointer. This is practically for free.

I spent some time wondering whether this was really the way I wanted to do this, because I was afraid that copying the board might turn out to be slow in practice. However, in the alternative case, the function to unmake the move would be much more complicated and slower. I decided that the gain in simplicity there would probably outweigh copying the board at each move, and even if somehow it didn't - the added simplicity in the code did.

I haven't looked back. So far things work great.

This is why there is some extra book keeping done in the board_t struct: things that are incrementally updated by the make move function and that need to be restored by the unmake move function are stored in the board_t struct, because the unmake move function doesn't do anything besides decrement a pointer (well, it doesn't do much beside anyway).

Move structure (old)

My data structure for storing moves originally looked like this:

typedef union move_t {
   uint64_t m;             /* use only to compare/clear struct */
   struct {
      uint8_t piece;       /* Piece on source square */
      uint8_t from;        /* Origin square */
      uint8_t to;          /* Destination square */
      uint8_t capture;     /* Capture square; distinct because of en-passant */
      uint8_t tpiece;      /* Piece on destination square (for promotions) */
      uint8_t castle;      /* Castle move; actually xor mask for rook */
      uint8_t ptaken;      /* The captured piece */
      uint8_t ep:6;        /* En-passant square */
      uint8_t is_capture:1;/* Bit flag: this is a capture move */
   };

It's 64 bit, which is more than the minimum we need and may be bad for the processor cache.

The reason for storing so much extra data was to remove branches from the move generator. As can be seen from the struct, I had to resort to using bitfields to squeeze some of the data into 64 bits...

The m element is strictly used to set all bits in the struct to 0 (initialise the move) and to compare moves quickly using one 64 bit compare. Mainly, this is just a convenience. Very importantly, no assumptions are made anywhere about the relation between the other struct elements and the bits of m. These are, of course, highly non-portable.

The from and to elements are quite clear: they represent the source and destination squares for the move. The only other piece of information that cannot be deduced from the current state of the board is the tpiece element, which mainly holds the promotion piece. The other elements are only used to speed up making the move.

The piece field is a clear example of this. This encodes the piece type and colour, using bitflags to indicate the different piece types:

#define PAWN     0x01
#define KNIGHT   0x02
#define BISHOP   0x04
#define ROOK     0x08
#define QUEEN    0x10
#define KING     0x20

#define WHITE    0x00
#define BLACK    0x80

So a black king would be represented by KING|BLACK. The tpiece field uses the same encoding. Obviously, this is far from the minimal encoding needed to store a piece type.

Storing the piece type in the move struct avoids having to look them up on the board when making the move, which would be slow. In practice, we don't really need to know the type of piece when making a move (it's enough to know that we're removing a piece from its source square), so this isn't essential. It's nice that we know the type (and colour) of the piece without looking at the board for display purposes, but I may remove this field at some point if I find that I have something more importat to store here. Either way, using bits to represent a piece type is probably not the best design decision here, and I plan to get rid of that in the future.

It's clear that we don't need to know what piece type to remove (since we can simply remove all pieces from a given square; there should only be one, after all), but we do need to know what sort of piece to put on the target square. For a promotion this is stored in the tpiece element and we can do the same for a normal move. With this little trick, we need no special code to make promotions on the board: for promotions, piece is a pawn and tpiece is whatever piece the pawn promotes into, for regular moves, tpiece = tpiece.

A very similar trick is applied to the capture and to squares: the to square is the square the piece moves to, the capture square is the square that needs to be cleared because this move captures a piece there. For normal captures, capture = to, but this is not the case for en-passant captures, where the pawn moves to a different square than the pawn it captures is on (it may also come in handy if I ever want to implement some kind of fairy chess pieces). By clearing the capture square, we assure that the to square is empty in all cases (if this were not the case, putting another piece there would currupt our board data structure). So again, we need no special code to handle en-passant captures.

The castle flag is only needed when making a castling move. It marks the origin and destination files for the rook, so simply xor'ing it into the rook and colour bitboards will move the rook to its new file. The one thing it does not handle well at the moment are the rotated bitboards, which are currently set explicitly (however, we could use a table lookup indexed by this bitfield to do those too). Again, this avoids some special purpose code (it would be even better if we did use the lookup table approach for the rotated bitboards).

The ep field holds the en-passant square after a double pawn move. This is copied as status information on the board after the move is made so that the move generator will know to allow en-passant moves for this position. It's truncated to 6 bits (which is enough) because I needed an extra bit to store whether the move is a capture or not (the is_capture flag). This is again used for display purposes, but also for move ordering.

Move structure (new)

Obviously the original move-type encoding was rather wasteful in terms of storage, and it also had a major drawback that piece-types weren't simply sequential integers, which made code dealing with piece-table lookups more hassle than it should have been. So this had to be changed.

Stored as sequential integers, 3 bits are really all that is needed to store a piece type and we certainly don't need more than 6 bits to store a square. The captured piece, it turns out, was unused and in practice the castle mask was only used as a boolean flag (with the actual mask reconstructed from a bitboard). A lot of wasted and useless information, in other words.

The goal was simple: get rid of the struct, decode the bitfields by hand, and reduce the move type to 32 bits (which turned out to be plenty).

Some things would have to stay, however: there was a piece "NONE", meant to be 0, so that meant 7 piece types in total rather than 6 (doesn't matter, still fits in 3 bits) and the colour bit, BLACK, was expected everywhere to be bit 7. The new piece encoding and flags became

#define NONE     0
#define PAWN     1
#define KNIGHT   2
#define BISHOP   3
#define ROOK     4
#define QUEEN    5
#define KING     6

#define ANY_PROMOTION   (1<<KNIGHT | 1<<BISHOP | 1<<ROOK | 1<<QUEEN)
#define QUEEN_PROMOTION (1<<QUEEN)

#define WHITE    0x00
#define BLACK    0x80

#define PIECE     (0xff ^ BLACK)

Bitfields are extracted from the 32-bit move structure through a combination of shifts and masks:

#define MOVE_SQUARE_MASK      0x3f
#define MOVE_PIECE_MASK       0x07
#define MOVE_PLAYER_MASK      0x80
#define MOVE_IS_CAPTURE_MASK  0x01
#define MOVE_IS_DPAWN_MASK    0x01
#define MOVE_IS_CASTLE_MASK   0x01

#define MOVE_FROM_SHIFT       0           /* From square    (6 bits) */
#define MOVE_TO_SHIFT         6           /* To square      (6 bits) */
#define MOVE_PIECE_SHIFT      12          /* Piece          (3 bits) */
#define MOVE_PLAYER_SHIFT     8           /* Player bit     (1 bit, offset by 7; it's really bit 15) */
#define MOVE_CAPTURE_SHIFT    16          /* Capture square (6 bits) */
#define MOVE_EP_SHIFT         16          /* En-passant square (6 bits), doubles with the capture square field */
#define MOVE_TPIECE_SHIFT     22          /* Piece on destination square (promotion) (3 bits) */
#define MOVE_IS_CAPTURE_SHIFT 25          /* Whether the move is a capture (1 bit) */
#define MOVE_IS_DPAWN_SHIFT   26          /* Whether the move is a double pawn move and stores an EP square */
#define MOVE_IS_CASTLE_SHIFT  27          /* Whether the move is a castling move */

For instance

uint8_t get_move_origin(move_t move)
{
   return (move >> MOVE_FROM_SHIFT) & MOVE_SQUARE_MASK;
}

uint8_t get_move_piece(move_t move)
{
   uint8_t piece;

   piece = (move.m >> MOVE_PIECE_SHIFT) & MOVE_PIECE_MASK; 
   piece |= (move.m >> MOVE_PLAYER_SHIFT) & MOVE_PLAYER_MASK;
   return piece;
}

The new code is certainly no slower (though as fas as I can tell it's not faster on 64 bit hardware either, 32 bit might be different) and feels a lot cleaner than the old code now.

Storing moves (Sjaak)

Jazz' original move structure wasn't flexible enough for a more general variant engine. Inspired by a description of how moves are stored by the general variant engine ChessV, Sjaak conceptually splits the possible actions that can be performed during a move as follows:

  • Pickups pick pieces up from the board. A normal move has one pickup, a capture or castling move has two. A piece drop has no pickups at all.
  • Drops are the reverse of pickups: a piece is placed on the board. Normal moves have one drop, castling moves have two.
  • Store places a piece in a player's holdings.
  • Retrieve takes a piece from a player's holdings.
  • Status change reset the 50-move counter, change side to move, set the en-passant target square.

For each drop or pickup, we need to store the target square (7 bits, to accomodate large boards), the piece type (4 bits) and the colour (1 bit) for a total of 12 bits. For a store/retrieve, we need the piece type (4 bits), the side (1 bit) and whether this is a store or retrieve (1 bit). If we limit the number of pickups and drops to 0-2 (or 0-3) we need 2 bits to store their number, and we need 1 bit to store the number of retrieve/stores. That means we need 5 bits to describe the number of actions in a move. Since castling is two pickups and two drops, we need at least 48 bits to describe a castling move using this format, or 53 bits in total.

Long story short, we use a 64 bit integer, and use the lowest 5 bits to describe the number of drops, pickups and store/retrieve operations. The next 48 bits are used to record the actual move data and the final bits are used to store state-changes. We use bitshifts to access different elements in the struct.

This works very well in practice: when making a move, we first resolve all pickups, then we perform all drops. Finally, we update the holdings and implement state changes. To reverse a move, we undo all status changes, we undo the store/retrieve, we treat all drops as pickups and finally all pickups as drops.