BitBoards

BertTuyt · Post by **BertTuyt** » Wed Oct 29, 2008 21:24

Because many programs (more and more?) use BitBoards, and so far I see al kind of references to them in misc. topics, I decided to start a new topic covering them. Also I want to provide more Damage bitboard details to share with others, and also to learn more.
Maybe I already provide some insights in the past, so for some programmers this information will be known.

For the Board representation I use 5 BitBoards (WhiteMan, WhiteKing, BlackMan, BlackKing and Empty). I know it is possible to use only 3 Boards (WhitePiece, BlackPiece, WhiteBlackKing), but i have not implemented this, maybe I will give it some thought, some day, if this could be beneficial.

Both the MoveGenerator as the Evaluation Function are BitBoard based.
It is clear that I keep the Eval Function for myself, but Im open to share the source code of the MoveGenerator with everyone who is willing to do the same with his/her code.

The Bitboard implementation is very fast for detecting a possible kill (just a sequence of shifts ands. and ors) and also executing a non-kill move.
I doubt however if in the kill-routines, the bitboards are really faster (and for the king-routines i doubt this even more).
Also in case of just a man-move (which is only a shift and and-operation) i convert the 2 move-bitboards (which indicate the possibility to move to right or left) in a move-list containing from - and to - position, so i need in the end the index of the bitpositions (hope that some-one does not need the bit-index, due to a more clever implementation).

In the past I only had 2 BitBoard functions, BitFind() and BitCount().
Nowadays I only use the BitCount as the BitFind is an Intrinsic in VC on my quadcore (_BitScanForward).
The BitCount is based on dividing the 64bit-word in 4 parts of 16bits and then do a table-lookup. Also this function could become an intrinsic as on the new Nehalem processor this is only 1 instruction (although I don't know the implementation details and/or speed, compared with a table-lookup).

The BitBoards are embedded in a Structure with many additional parameters which I update during a move (such as number whitepieces, number blackpieces, hashvalue).

When I do a move I write the updated values to a new structure on the tree-stack. This way the remove becomes very simple/fast, it is just a pointer decrement.

I think that this method is faster as i have to do a read a write anyway.
Some parameters which I don't use during search are not incrementally updated, and re-calculated in the evaluation function (but mostly only in the case of a full-eval, many evals are lazy-evals).

The hashvalue and counts for white and black pieces is something which i need during the search, and therefore they are incrementally updated.

Rein Halbersma · Post by **Rein Halbersma** » Thu Oct 30, 2008 02:01

BertTuyt wrote:Because many programs (more and more?) use BitBoards, and so far I see al kind of references to them in misc. topics, I decided to start a new topic covering them. Also I want to provide more Damage bitboard details to share with others, and also to learn more.
Maybe I already provide some insights in the past, so for some programmers this information will be known.

For the Board representation I use 5 BitBoards (WhiteMan, WhiteKing, BlackMan, BlackKing and Empty). I know it is possible to use only 3 Boards (WhitePiece, BlackPiece, WhiteBlackKing), but i have not implemented this, maybe I will give it some thought, some day, if this could be beneficial.

Both the MoveGenerator as the Evaluation Function are BitBoard based.
It is clear that I keep the Eval Function for myself, but Im open to share the source code of the MoveGenerator with everyone who is willing to do the same with his/her code.

The Bitboard implementation is very fast for detecting a possible kill (just a sequence of shifts ands. and ors) and also executing a non-kill move.
I doubt however if in the kill-routines, the bitboards are really faster (and for the king-routines i doubt this even more).
Also in case of just a man-move (which is only a shift and and-operation) i convert the 2 move-bitboards (which indicate the possibility to move to right or left) in a move-list containing from - and to - position, so i need in the end the index of the bitpositions (hope that some-one does not need the bit-index, due to a more clever implementation).

In the past I only had 2 BitBoard functions, BitFind() and BitCount().
Nowadays I only use the BitCount as the BitFind is an Intrinsic in VC on my quadcore (_BitScanForward).
The BitCount is based on dividing the 64bit-word in 4 parts of 16bits and then do a table-lookup. Also this function could become an intrinsic as on the new Nehalem processor this is only 1 instruction (although I don't know the implementation details and/or speed, compared with a table-lookup).

The BitBoards are embedded in a Structure with many additional parameters which I update during a move (such as number whitepieces, number blackpieces, hashvalue).

When I do a move I write the updated values to a new structure on the tree-stack. This way the remove becomes very simple/fast, it is just a pointer decrement.

I think that this method is faster as i have to do a read a write anyway.
Some parameters which I don't use during search are not incrementally updated, and re-calculated in the evaluation function (but mostly only in the case of a full-eval, many evals are lazy-evals).

The hashvalue and counts for white and black pieces is something which i need during the search, and therefore they are incrementally updated.

Hi Bert,

Thanks for your elaborate explanations. I am happy to share my move generator code with you. We can do this by email. I have also done this with Ed. My move generator is 100% bitboard based. There is no piece array or an array for captured pieces. Like Ed, I store a position as 3 bitboard (white, black, kings), but we do some things a little different. Ed generates successor positions and I generate moves as differences with respect to the current position. As was discussed last year, most people except Ed use ghost squares and I also have them. In my representation they are bits 10,21,32,43 and 54 through 63. During move generation, I generate 4 additional bitboards: not_occupied, active_men, active_kings and opponent_pieces as each of them is only a couple of bitwise operations so generating them on the fly is cheaper than making and unmaking them.

What do I mean with "moves are differences with respect to the current position": my move struct is exactly the same as a position! The white bitboard in a move struct has 1s for the white pieces that change during a move (with white to move there are only 2 bits set to 1, except for captures that go back to the originating square). The black bitboard in a move struct has the black pieces that change during the move: with white to move these are all the pieces that get captured. Finally, the kings bitboard in a move struct contains the captured kings (if any) and the field on which a piece makes a promotion (if any). Making and unmaking is simply done by XOR-ing the 3 bitboards in a position by the corresponding 3 bitboards in a move. It is always the same code (and the same speed!), both for man and king moves, both for captures and normal moves. With these operations, you can "subtract" two positions and get a move, or you can "add" multiple moves and get a variation that can be added in one pass to a position.

I keep all my move lists in a global move table (move**): one list for each search depth. I experimented with both with the "make-unmake" approach that most of the chess guys use and the "copy-make" approach of Ed / Feike (where unmake is a pointer decrement). I thought you also did this Bert, but above you stated that you increment material counts per move. Do you perhaps store a list of moves and a single position list of the entire line from the root position? This I haven't tried, but it seems a new idea. One thing that I measured was that the copy-make approach is a lot faster on perft() without bulk-counting (since you avoid a lot of unmakes on the leaf nodes). But I figure I will lose many times this advantage as soon as I had to compute the material balance and hash keys from scratch in every leaf node, so currently I am sticking to the make/unmake method.

My move generator is very straightforward: only a little over 600 lines of C++ code. E.g. the code for generating man captures has a calling function and a recursive function that each fit on a single screen in MSVC++. During recursion, I loop over all directions except the direction I came from, and I only have recursive calls in the directions orthogonal to the one I came from. I learned this from Ed's code: it didn't make a speed difference, but it felt good to have eliminated an extra recursion call I have enumerated my directions in such a way that I can find the next "positive" orthogonal direction (i.e. a direction in which I have to shift right for white) by a simple arithmetic expression. I started out with a somewhat non-obvious capture generator based on very specific draughts knowledge, but it turns out it is 5-10% slower than the straightforward approach. I can elaborate for those who are interested (I got the inspiration from the little purple book by Wiersma, Scholma and Vd Kooij on combination patterns in draughts)

One question for you Bert: do you use incremental move generation? WIth that I mean do you postpone converting the man move left and man move right bitboards to move lists until you have tried the hash move, the history table and killer moves? It could save you some time if you got a cutoff before you had to find out which other moves are actually in those bitboards. It's something that is on my TODO list on the next visit to the move generator (currently it's all evaluation that I'm coding on).

Ed has kindly measured the perft timings for my program on his pc and I get speeds which are a little behind yours Bert, and comparable to Ed's. One thing that bugs me is that king move generation is very slow. I use simple sliding along a direction, with a bitmask in each direction for captures to terminate sliding 2 squares from the edge. Ed uses a more clever table with the exact distances to the edge in each direction, but then I need to use a bit to square conversion, and out of laziness (and stubborness to use any non-bitboard methods) I haven't tried that yet. And there is a whole technology of "magic bitboards" (see the chess Wiki) waiting to be implemented once I have version 1.0 ready (which I am afraid will not be in time for Gorredijk). There is plenty of room for innovation in move generation for draughts, IMHO. OTOH, the move generator speed will probably not kill a program.

Rein

TAILLE · Post by **TAILLE** » Thu Oct 30, 2008 14:45

Hi Rein,

Rein Halbersma wrote: What do I mean with "moves are differences with respect to the current position": my move struct is exactly the same as a position! The white bitboard in a move struct has 1s for the white pieces that change during a move (with white to move there are only 2 bits set to 1, except for captures that go back to the originating square). The black bitboard in a move struct has the black pieces that change during the move: with white to move these are all the pieces that get captured. Finally, the kings bitboard in a move struct contains the captured kings (if any) and the field on which a piece makes a promotion (if any). Making and unmaking is simply done by XOR-ing the 3 bitboards in a position by the corresponding 3 bitboards in a move. It is always the same code (and the same speed!), both for man and king moves, both for captures and normal moves. With these operations, you can "subtract" two positions and get a move, or you can "add" multiple moves and get a variation that can be added in one pass to a position.
Rein

I do not understand why you chose a move struct exactly the same as a position. I am just beginning to study a new implementation with some mixture between generating successor positions and generating moves as differences. My view for storing a position by difference is only to store one 64bits word containing only the new empty squares which is quite simple, maybe too simple !

Gérard

Rein Halbersma · Post by **Rein Halbersma** » Thu Oct 30, 2008 15:42

TAILLE wrote:Hi Rein,

Rein Halbersma wrote: What do I mean with "moves are differences with respect to the current position": my move struct is exactly the same as a position! The white bitboard in a move struct has 1s for the white pieces that change during a move (with white to move there are only 2 bits set to 1, except for captures that go back to the originating square). The black bitboard in a move struct has the black pieces that change during the move: with white to move these are all the pieces that get captured. Finally, the kings bitboard in a move struct contains the captured kings (if any) and the field on which a piece makes a promotion (if any). Making and unmaking is simply done by XOR-ing the 3 bitboards in a position by the corresponding 3 bitboards in a move. It is always the same code (and the same speed!), both for man and king moves, both for captures and normal moves. With these operations, you can "subtract" two positions and get a move, or you can "add" multiple moves and get a variation that can be added in one pass to a position.
Rein
I do not understand why you chose a move struct exactly the same as a position. I am just beginning to study a new implementation with some mixture between generating successor positions and generating moves as differences. My view for storing a position by difference is only to store one 64bits word containing only the new empty squares which is quite simple, maybe too simple !

Gérard

Hi Gérard,

Yes it is possible to use only one bitboard to represent a move where each 1 represents a square that changes in the position. However, you can only make such a move, not unmake it, because you miss the information what kind of pieces you captured (could be kings or men), and you also miss the whether a promotion took place (e.g. how to undo a move like 6-1? it could have been 6-1 with or without promotion).

It is possible to do it as you describe, but then you have to unmake moves by a pointer decrement on a position list. In any case, doing the make move requires to update all 3 bitboards in the position struct anyway, so it is almost equivalent in number of operations to have a move struct already contain these 3 bitboards during move generation. In that way, I only have to do 3 XOR operations to either make or unmake a move.

In my generate man captures routine e.g., I use these instructions

Code: Select all

        move_list[num_moves].pieces[color] = from_sq ^ dest_sq;        
        move_list[num_moves].pieces[!color] = captured;
        move_list[num_moves].kings = (captured & kings) ^ (dest_sq & Board::BACK_ROW[!color]);

Making or unmaking a move is simply equal to

Code: Select all

        pieces[BLACK] ^= move.pieces[BLACK];
        pieces[WHITE] ^= move.pieces[WHITE];
                kings ^= move.kings;

Rein

BertTuyt · Post by **BertTuyt** » Thu Oct 30, 2008 21:37

I keep all my move lists in a global move table (move**): one list for each search depth. I experimented with both with the "make-unmake" approach that most of the chess guys use and the "copy-make" approach of Ed / Feike (where unmake is a pointer decrement). I thought you also did this Bert, but above you stated that you increment material counts per move. Do you perhaps store a list of moves and a single position list of the entire line from the root position?

I also keep all my move lists in a global move table. I basically have a copy-make process for the Bitboards, as well as for some related position derived parameters (like hashvalue , piece-count).
So my unmove is a pointer decrement, getting the previous bitboards as the other parameters.

One question for you Bert: do you use incremental move generation? WIth that I mean do you postpone converting the man move left and man move right bitboards to move lists until you have tried the hash move, the history table and killer moves? It could save you some time if you got a cutoff before you had to find out which other moves are actually in those bitboards. It's something that is on my TODO list on the next visit to the move generator (currently it's all evaluation that I'm coding on).

I don't do incremental move generation, although i also thought about this idea, for the reasons mentioned by you. In the end i thought it would not bring much as I believe (but need to verify this statement) that most MoveGen time is consumed with positions with a kill-move. And for these positions i need to get the whole list as i do not have (yet) a simple algorithm which easily detects the number of killed pieces ( which you need to maximize).

Bert

BertTuyt · Post by **BertTuyt** » Thu Oct 30, 2008 21:43

Rein,

maybe i missed something, I still don't understand why the make-unmake process is as fast as the copy-make process.

For both processes you need to get the initial bitboard/parameter out of memory, modify it and write it back.

In case of the make-unmake you write it to the same memory location, in case of a copy-make, to a different one.
So here no difference, but the unmake is faster as it is just a structure-pointer decrement.

And as i use a more complex structure with not only bitboards, but some position derived parameters as well, I update all during a remove , with just one instruction.
Help me where im wrong ??

Bert

BertTuyt · Post by **BertTuyt** » Thu Oct 30, 2008 21:50

Rein, another question.

You wrote somewhere (and Gerard also) that you don't need to calculate the lsb index of a bitboard.

But if you don't have the index, how are you able to update/modify the hash-value, as (at least I) need this on every depth-level.
I could calculate this again with a table-base implementation, but guess that an incremental update is faster.

For evaluation-function parameters, it is possible that for most parameters incremental update is slower, especially with increasing depths.

But I guess for the hasvalue, which you need every depth, this could be different.

Bert

FeikeBoomstra · Post by **FeikeBoomstra** » Thu Oct 30, 2008 23:41

Let me start with some pseudo code:

Code: Select all

short int search(NodePnt node, alpha, beta, depth) 
struct work_set cur_work_set;
struct tnode next_node;
     if depth == 0 return evaluate(node)
     generate_legal_moves(node, cur_work_set)
     for all moves 
         populate(&next_node, cur_work_set, move_nr)
         z=search (&next_node, alpha, beta, depth-1)
         update(z, best value)
         if cutoff(alpha, beta, best_value) break
     next move
return best_value

So I never undo a move, it is more like decrementing a pointer, as it is handled by the return statement.

In Work_set I have for each move a structure with move_start (index), move_end(index) and a bit map of the captured pieces.
In node there are three bitmaps: black_white, man_king and occ_empty. Like Rein I generate on the fly during movegeneration own_man, own_king, other_pieces and occupied.

I generate the counts at each level by using the gcc provided internal function: __builtin_popcountll
I did't try the incremental approach.

For the hash_value I increment the value (one of the big advantages of Zobrist hash).

For King moves and king captures I use a table with a bitmap for each field in each direction with all fields (minus 2 for captures) till the edge.

Rein Halbersma · Post by **Rein Halbersma** » Fri Oct 31, 2008 01:41

BertTuyt wrote:Rein,

maybe i missed something, I still don't understand why the make-unmake process is as fast as the copy-make process.

For both processes you need to get the initial bitboard/parameter out of memory, modify it and write it back.

In case of the make-unmake you write it to the same memory location, in case of a copy-make, to a different one.
So here no difference, but the unmake is faster as it is just a structure-pointer decrement.

And as i use a more complex structure with not only bitboards, but some position derived parameters as well, I update all during a remove , with just one instruction.
Help me where im wrong ??

Bert

With a perft() without bulk counting, the copy-make approach is faster than the make-unmake approach. If you compare the timings with the perft() with bulk counting in leaf nodes, then you can compute the timing of an average move generate and the average make+unmake of each move. Here's the math: on my pc (P4 3.6 Ghz) I save about 2 ns per move generated if I use the make/unmake approach (because I don't have to do 3 XORs the move with the current position). However, I lose about 8ns per move make/unmake pair. To compare: in the initial position move generation takes about 220 ns *per node* and make/unmake about 18ns *per move* (this means on my pc the makemove() costs 9ns and a pointer decrement costs 1ns).

What does this mean? On a *full-width search* with a perft() without bulk-counting with a branching factor of 6, I save about 36ns per frontier node, on a total of about 340ns, so copy-make is about 10% faster than make-unmake. However, on a *selective search*, I don't earn enough back to compensate the more expensive move generation. The break-even point is when I make/unmake less than 25% of the move I generate.

But suppose my move ordering is bad at frontier nodes and they are not being cut by more than 75%. I think (but have not tested this yet) copy-make will still be slower than make/unmake because of the hash and eval updates. Computing a Zobrist hash from scratch requires looping over all pieces rather than looping over all moving/captured pieces. Similarly for the eval: computing the material change or tempo change is cheaper than computing the material balance or tempo balance from scratch for each move.

So my conclusion is that on a very badly ordered tree copy-make will gain at most 10%, on a reasonably ordered tree (25% chance of first move being the best) it is already the same as make-unmake, and recomputing the eval + hash + tempo at every node will only worsen the score for copy-make. So *at worst* am I 10% slower on the move generation (but probably at least equal), but the upside from the eval/hash/tempo is a lot bigger I believe.

BertTuyt · Post by **BertTuyt** » Sun Nov 02, 2008 21:40

Rein, I like your approach for elegance and simplicity. And also your perft score showed that is is very fast.
Nevertheless I have a question, as I also think about improving board representation.

During a move I also modify/update the Empty (or not occupied) BitBoard.
You don't do this during move, only when you call the MoveGenerator.

To what extend do you think it is efficient to update this Bitboard later.
My reasoning, in every node you need to call the MoveGenerator at least to check if there is no forced kill move. And this requires the Empty Bitboard.

If it is the terminal node, and no kills are possible, you call (in most cases) the evaluation function which again needs to have the Empty Board.

So it seems to me (but i oversee something most likely), that it is more efficient to update the Empty BitBoard every time.

Bert

Rein Halbersma · Post by **Rein Halbersma** » Sun Nov 02, 2008 21:59

BertTuyt wrote:Rein, I like your approach for elegance and simplicity. And also your perft score showed that is is very fast.
Nevertheless I have a question, as I also think about improving board representation.

During a move I also modify/update the Empty (or not occupied) BitBoard.
You don't do this during move, only when you call the MoveGenerator.

To what extend do you think it is efficient to update this Bitboard later.
My reasoning, in every node you need to call the MoveGenerator at least to check if there is no forced kill move. And this requires the Empty Bitboard.

If it is the terminal node, and no kills are possible, you call (in most cases) the evaluation function which again needs to have the Empty Board.

So it seems to me (but i oversee something most likely), that it is more efficient to update the Empty BitBoard every time.

Bert

I haven't thought about it, I just wanted a very compact Move and Board struct, and simplicity in updating. But I am not sure it would be faster to incrementally update Empty BitBoard. The reason is that it takes also some instructions to compute the mutations in the empty squares. How are you going to do that quicker than the expression:

Code: Select all

not_occupied = ~(pieces[BLACK] ^ pieces[WHITE] ^ Board::GHOSTS);

Rein

BertTuyt · Post by **BertTuyt** » Mon Nov 10, 2008 02:56

Rein, in your movegenerator you construct a bitboard from the From_ and To_square. Based on this doing a Move and Unmove is relatively easy.

On the other hand i guess you need the From_ and To_ information when you want to update the hashvalue (as I think all programs use the same Zobrist type of implementation).

So why don't you just pass the From_ & To_Square and the captured bitboard, and construct the From_To bitboard at that time (so you don't need to do a _bitscan operation.

The advantage of this "delayed" operation is even more, in case you get an immediate beta-cutoff in a Cutoff-Node.

I also agree that this will not make the final breakthrough in the end for your program, but it is anyway fun thinking and discussing about this.

Greetings from the USA, by the way (so I had little time posting the last week and most likely next week).

Bert

Rein Halbersma · Post by **Rein Halbersma** » Mon Nov 10, 2008 08:32

BertTuyt wrote:Rein, in your movegenerator you construct a bitboard from the From_ and To_square. Based on this doing a Move and Unmove is relatively easy.

On the other hand i guess you need the From_ and To_ information when you want to update the hashvalue (as I think all programs use the same Zobrist type of implementation).

So why don't you just pass the From_ & To_Square and the captured bitboard, and construct the From_To bitboard at that time (so you don't need to do a _bitscan operation.

The advantage of this "delayed" operation is even more, in case you get an immediate beta-cutoff in a Cutoff-Node.

I also agree that this will not make the final breakthrough in the end for your program, but it is anyway fun thinking and discussing about this.

Greetings from the USA, by the way (so I had little time posting the last week and most likely next week).

Bert

Hi Bert,

Yes that's a good idea. In fact, I have been thinking about it myself since the DamExchange protocol requires a From and To square as well. In some rare events, a capture move can have From=To which in my format would translate into an empty bitfield (I have From XOR To in that bitboard). It would be easier to do it as you suggest. However, that would expand my Move struct from 24 to 32 bytes as I also need to store captured men and captured kings.

BTW, I will need to do a _bitscan for the hash updating since my from_sq and dest_sq in my move generation are not square numbers but bitboards with a single bit set to 1 on the appropriate squares.

Rein

FeikeBoomstra · Post by **FeikeBoomstra** » Mon Nov 10, 2008 09:25

This is what I described earlier: I've split move generation and node population. I indeed use an index for From and To and a bitmap for the pieces captured.

Ed Gilbert · Post by **Ed Gilbert** » Mon Nov 10, 2008 14:00

BertTuyt wrote:Rein, in your movegenerator you construct a bitboard from the From_ and To_square. Based on this doing a Move and Unmove is relatively easy.

On the other hand i guess you need the From_ and To_ information when you want to update the hashvalue (as I think all programs use the same Zobrist type of implementation).

Hi Bert,

I do not use a Zobrist hash function in kingsrow. I use a series of XORs and shifts, and I calculate the hashing function 'from scratch' whenever it is needed rather than updating it incrementally during the search. You might think this should be slow, but I tried switching to an incremental Zobrist hash once and found that the search speed dropped slightly. I suspect the reason is that I don't do hashtable lookups at leaf nodes or during a quiescent extension search, but because of many extensions and truncations I still have to update an incremental hashcode in these functions because a leaf node can become an interior node when truncations are cancelled.

BTW, my movelist format is exactly a destination bitboard, so there are no makemove() and unmake() functions needed during the search.

-- Ed

World Draughts Forum

BitBoards

BitBoards

Re: BitBoards

Re: BitBoards

Re: BitBoards