PDN standard

ildjarn · Post by **ildjarn** » Tue May 12, 2009 09:59

Defining the possible results as 2-0[^0-9], 1-1[^0-9] and 0-2[^0-9] would solve the prefix problem, wouldn't it?

If 'free' result formats were allowed, the Delft results '2-8' and '8-2' would be a problem, since they are valid moves too.

Ed Gilbert · Post by **Ed Gilbert** » Tue May 12, 2009 13:41

There are two use cases for which I now think that a game separator is relevant. The first one is when you want to store multiple positions without any moves in a PDN file. If there is no game separator, then all the FEN tags of the positions will become part of the same game. This seems undesirable to me. The second one is when you want to store multiple games without any tags in a PDN file.

For the case of multiple games having moves but no tags, my preference would be to write null event tags as separators, [Event ""]. For multiple positions with no moves, I see no easy solution, and I guess in these cases result codes can be used as terminators. I'm assuming we are talking about pdn writing now. For reading pdn, result codes will always be terminators as they are now, but we're trying to eliminate most cases of having to write them.

-- Ed
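A minimal Python sketch of the writing policy described above, assuming a hypothetical `Game` record that holds a tag dictionary and a list of move strings. Tagless games with moves get an empty Event tag; positions without moves are terminated by their Result value, or `*` if none is known. Escaping of quotes inside tag values is not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Game:
    # Hypothetical in-memory game: tag name -> value, plus move strings.
    tags: dict = field(default_factory=dict)
    moves: list = field(default_factory=list)


def write_pdn(games):
    """Serialize games, writing result codes only where nothing else separates games."""
    chunks = []
    for game in games:
        tags = dict(game.tags)
        if not tags and not game.moves:
            continue  # nothing to write
        if game.moves and not tags:
            # Tagless game with moves: an empty Event tag marks the game start.
            tags["Event"] = ""
        lines = [f'[{name} "{value}"]' for name, value in tags.items()]
        if game.moves:
            lines.append(" ".join(game.moves))
        else:
            # Position without moves: a result code is the only terminator,
            # so use the known result if there is one, otherwise "*".
            lines[-1] += " " + tags.get("Result", "*")
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)
```

For example, a FEN-only position followed by a tagless game is written as `[FEN "W:W31:B1"] *`, a blank line, `[Event ""]` and then the moves.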

Rein Halbersma · Post by **Rein Halbersma** » Tue May 12, 2009 18:35

Ed Gilbert wrote:
There are two use cases for which I now think that a game separator is relevant. The first one is when you want to store multiple positions without any moves in a PDN file. If there is no game separator, then all the FEN tags of the positions will become part of the same game. This seems undesirable to me. The second one is when you want to store multiple games without any tags in a PDN file.
For the case of multiple games having moves but no tags, my preference would be to write null event tags as separators, [Event ""]. For multiple positions with no moves, I see no easy solution, and I guess in these cases result codes can be used as terminators. I'm assuming we are talking about pdn writing now. For reading pdn, result codes will always be terminators as they are now, but we're trying to eliminate most cases of having to write them.

-- Ed

What is the problem of using an extra newline as a game separator? This is the way most PDN files on the web are laid out.

Rein Halbersma · Post by **Rein Halbersma** » Tue May 12, 2009 18:44

Wieger Wesselink wrote:I have made an EBNF for a 'forgiving' parser, see below. This one accepts moves with spaces (one before and/or after the separator). It also accepts moves in chess notation. One more difference with respect to the previous version is that the Nag annotations are at the same level as variations and comments. Apparently that is the way they are used in TurboDambase. With this grammar I have tested a lot of pdn files that I found on the internet. All of them pass, except for the ones with real errors or with 0-2, 1-1 or 2-0 as game separators. The 1-1 result is what causes a real problem. It is a prefix of a move like 1-12, and therefore it requires a non-trivial parser to deal with that case. This does not seem acceptable to me, so I left those results out.

Wieger,

I think the confusion between 1-1 as a result and 1-12 as a move is a limitation of the parser generator (TPG) that you are using. In C/C++, the standard tool chain includes Bison/Flex or Yacc/Lex as parser/scanner generators. These tools generate LR(1) parsers: i.e. they read the input from left to right, build a rightmost derivation in reverse (i.e. work bottom-up, reducing the right-hand side of a rule to its non-terminal) and use 1 token of lookahead to deduce the context. Combined with the scanner's longest-match rule, this lets you define the token DRAW as "1-1" without confusing it with the move 1-12.

Rein
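The split between "1-1" and "1-12" is usually settled in the scanner rather than the parser: Lex/Flex pick the longest matching token and, on a tie, the rule listed first. The Python sketch below imitates that rule with a hypothetical two-token table; it illustrates the principle and is not Flex itself.

```python
import re

# Hypothetical token table. Like Lex/Flex, the scanner tries every pattern at
# the current position, keeps the longest match, and breaks ties in favor of
# the pattern listed first.
TOKENS = [
    ("DRAW", re.compile(r"1-1")),
    ("MOVE", re.compile(r"\d+(?:[-x]\d+)+")),
    ("SPACE", re.compile(r"\s+")),
]


def scan(text):
    pos, out = 0, []
    while pos < len(text):
        best_name, best_match = None, None
        for name, pattern in TOKENS:
            m = pattern.match(text, pos)
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name != "SPACE":
            out.append((best_name, best_match.group()))
        pos = best_match.end()
    return out


print(scan("1-12 1-1"))  # [('MOVE', '1-12'), ('DRAW', '1-1')]
```

"1-12" is longer than the "1-1" prefix, so it becomes a move; a bare "1-1" matches both patterns equally and the earlier DRAW rule wins.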

Piet Bouma · Post by **Piet Bouma** » Tue May 12, 2009 20:04

ildjarn wrote:Defining the possible results as 2-0[^0-9], 1-1[^0-9] and 0-2[^0-9] would solve the prefix problem, wouldn't it?

If 'free' result formats were allowed, the Delft results '2-8' and '8-2' would be a problem, since they are valid moves too.

I don't think it's really a problem. In the Result tag we can use 2-8, 1-9, 5-5 and 1+ - 1- (the plus and minus may be a real problem, since they are not numeric; they also make it really difficult to calculate total results in a tournament table).

The separator at the end of the notation can still be 0-1, 1/2-1/2, 1-0 and *.

Rein Halbersma wrote:Wieger,

I think the confusion between 1-1 as a result and 1-12 as a move is a limitation of the parser generator (TPG) that you are using. In C/C++, the standard tool chain includes Bison/Flex or Yacc/Lex as parser/scanner generators. These tools generate LR(1) parsers: i.e. they read the input from left to right, build a rightmost derivation in reverse (i.e. work bottom-up, reducing the right-hand side of a rule to its non-terminal) and use 1 token of lookahead to deduce the context. Combined with the scanner's longest-match rule, this lets you define the token DRAW as "1-1" without confusing it with the move 1-12.

Rein

Rein, does that mean that your parser will accept leading zeros in moves? For example, 1-6 is written as 01-06 in Toernooibase (which may also cause a problem with the separator/result 1-0).

Rein Halbersma · Post by **Rein Halbersma** » Tue May 12, 2009 20:34

Piet Bouma wrote:
ildjarn wrote:Defining the possible results as 2-0[^0-9], 1-1[^0-9] and 0-2[^0-9] would solve the prefix problem, wouldn't it?

If 'free' result formats were allowed, the Delft results '2-8' and '8-2' would be a problem, since they are valid moves too.
I don't think it's really a problem. In the Result tag we can use 2-8, 1-9, 5-5 and 1+ - 1- (the plus and minus may be a real problem, since they are not numeric; they also make it really difficult to calculate total results in a tournament table).

The separator at the end of the notation can still be 0-1, 1/2-1/2, 1-0 and *.

Rein Halbersma wrote:Wieger,

I think the confusion between 1-1 as a result and 1-12 as a move is a limitation of the parser generator (TPG) that you are using. In C/C++, the standard tool chain includes Bison/Flex or Yacc/Lex as parser/scanner generators. These tools generate LR(1) parsers: i.e. they read the input from left to right, build a rightmost derivation in reverse (i.e. work bottom-up, reducing the right-hand side of a rule to its non-terminal) and use 1 token of lookahead to deduce the context. Combined with the scanner's longest-match rule, this lets you define the token DRAW as "1-1" without confusing it with the move 1-12.

Rein
Rein, does that mean that your parser will accept leading zeros in moves? For example, 1-6 is written as 01-06 in Toernooibase (which may also cause a problem with the separator/result 1-0).

Ahem, I don't have a PDN parser myself yet. But yes, I would want to be able to accept all forms "1-6", "1 - 6", "01-06" etc. I do, however, already have a FEN parser in the program I am writing (to be able to set up positions). It will gladly accept any sequence of square numbers, completely ignoring all characters other than W, B and K.
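A rough Python sketch of both kinds of leniency. `normalize_move` maps the spellings above to one canonical form; `parse_fen_lenient` keeps only digits, W, B and K. The FEN convention assumed here is hypothetical: the first W or B is the side to move, later W/B letters set the color of the squares that follow, and K marks the next square as a king. Range notation such as "31-35" would be read as two separate squares.

```python
import re


def normalize_move(text):
    """Turn "1-6", "1 - 6" or "01-06" into "1-6"; captures keep their "x"."""
    squares = [int(s) for s in re.findall(r"\d+", text)]
    separator = "x" if "x" in text else "-"
    return separator.join(str(s) for s in squares)


def parse_fen_lenient(fen):
    """Read a FEN string using only digits, W, B and K (assumed convention)."""
    side_to_move, color, king = None, None, False
    pieces = {}  # square -> (color, is_king)
    digits = ""
    for ch in fen + " ":  # the trailing space flushes the last number
        if ch.isdigit():
            digits += ch
            continue
        if digits:
            if color is not None:
                pieces[int(digits)] = (color, king)
            digits, king = "", False
        if ch in ("W", "B"):
            if side_to_move is None:
                side_to_move = ch  # first color letter: side to move
            else:
                color = ch         # later color letters: owner of following squares
        elif ch == "K":
            king = True            # next square holds a king
    return side_to_move, pieces
```

With these assumptions, `parse_fen_lenient("W:W31,32,K40:B1,2")` returns white to move, white men on 31 and 32, a white king on 40 and black men on 1 and 2.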

I also don't quite understand the fuss about the result being redundant at the end. The chess guys have had this for years, and so has the draughts community. Of course, we don't have to blindly copy what the chess folks do, but they have so many tools that it is much more economical to free-ride on their experience with what works and what doesn't. Apart from aesthetics and the few extra characters, what is *really broken* about the game result as a game terminator? And a possible confusion of "8-2" as a Delft majority draw result with the move "8-2" does not bother me, as I would take the result from the tag information anyway.

The main thing I worry about is that we, as 10x10 draughts programmers, end up with a standard that does not encompass other variants, for other players, with other result types, board sizes etc.

Ed Gilbert · Post by **Ed Gilbert** » Tue May 12, 2009 23:14

Rein Halbersma wrote: I also don't quite understand the fuss about the result being redundant at the end. The chess guys have had this for years, and so has the draughts community. Of course, we don't have to blindly copy what the chess folks do, but they have so many tools that it is much more economical to free-ride on their experience with what works and what doesn't. Apart from aesthetics and the few extra characters, what is *really broken* about the game result as a game terminator?

I agree that we should not make gratuitous changes. But the redundant result codes have always bothered me. When I recently added pdn file support to kingsrow, I was sorely tempted to stop writing them. I went so far as to verify that CheckerBoard and Dam2.2 do not care whether they are present or not. After reading Wieger's initial post, I am firmly convinced to drop them now. I don't think it will break compatibility with anything.

-- Ed

Piet Bouma · Post by **Piet Bouma** » Tue May 12, 2009 23:41

Ed Gilbert wrote:
I also don't quite understand the fuss about the result being redundant at the end. The chess guys have had this for years, and so has the draughts community. Of course, we don't have to blindly copy what the chess folks do, but they have so many tools that it is much more economical to free-ride on their experience with what works and what doesn't. Apart from aesthetics and the few extra characters, what is *really broken* about the game result as a game terminator?
I agree that we should not make gratuitous changes. But the redundant result codes have always bothered me. When I recently added pdn file support to kingsrow, I was sorely tempted to stop writing them. I went so far as to verify that CheckerBoard and Dam2.2 do not care whether they are present or not. After reading Wieger's initial post, I am firmly convinced to drop them now. I don't think it will break compatibility with anything.

-- Ed

But I think you still need a game separator.
It can be [Event ""] (that is how I separate games in a .pdn file, but when the tag is missing my program crashes) or the "old way".

But I consider the following more important:

Wieger Wesselink wrote: How should clock times be added to a game? This is useful for games played on an electronic board and for games between computers. How should time controls and times used by the players be recorded? Finally, it would be nice to be able to set up a new position at arbitrary points. This helps to store analyses of games, and it also makes it possible to store games with illegal moves.

If we have agreement on:

- a way to add clock times at every move
- a FEN (tag?) within the notation when there are illegal moves, when an electronic board cannot reconstruct the notation, or when a capture of pieces can't be recognized in any other way.

then we may have solved the major problems.

Wieger Wesselink · Post by **Wieger Wesselink** » Wed May 13, 2009 00:06

ildjarn wrote:Defining the possible results as 2-0[^0-9], 1-1[^0-9] and 0-2[^0-9] would solve the prefix problem, wouldn't it?

If 'free' result formats were allowed, the Delft results '2-8' and '8-2' would be a problem, since they are valid moves too.

Thanks for the useful idea! The above regular expressions have to be slightly modified, otherwise they will consume too many characters. But using the negative lookahead option (?!...) it works. So now I can define the forgiving parser as below. It can deal with all common results as a game terminator, and it also can handle moves with leading zeroes. I have verified this grammar with the collection of games that I have, and the only games it cannot handle are the ones with 'Delft' results like 4-6.
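A short Python `re` illustration of why the character class `[^0-9]` had to become the negative lookahead `(?![0-9])`: the class consumes the following character, while the lookahead only inspects it and also succeeds at the end of the input. The helper `first_match` is introduced only for this example.

```python
import re

CONSUMING = r"1-1[^0-9]"     # the character class eats the next character
LOOKAHEAD = r"1-1(?![0-9])"  # the lookahead only checks it


def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None


print(repr(first_match(CONSUMING, "1-1 2-8")))  # '1-1 '  (space swallowed)
print(repr(first_match(LOOKAHEAD, "1-1 2-8")))  # '1-1'
print(repr(first_match(LOOKAHEAD, "1-12")))     # None: left for the move token
print(repr(first_match(CONSUMING, "1-1")))      # None: needs one more character
print(repr(first_match(LOOKAHEAD, "1-1")))      # '1-1': works at end of input
```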

The good news is that this grammar requires only an LL parser (at least I think so). This means that almost any parser will be able to handle it, which is IMO a very important feature of a language definition. But this claim needs to be verified. I want to make a reference implementation in C++ with Boost.Spirit, and perhaps also one in ANTLR.

Hopefully we do agree that the grammar below covers PDN as it is being used nowadays. A detail that might still need some discussion is what exactly is allowed inside a quoted string or inside a comment.

The next step is to define the restrictions we want to impose on "good style PDN". I think this can best be done by stripping elements from this grammar.

```
separator space: '\s+' ;

token Win:           '1-0(?![0-9])' ;
token Draw:          '1/2-1/2' ;
token Loss:          '0-1' ;
token IWin:          '2-0(?![0-9])' ;
token IDraw:         '1-1(?![0-9])' ;
token ILoss:         '0-2' ;
token NoResult:      '\*' ;
token NumericMove:   '\d+(\s?[-x]\s?\d+)+[*?!]?' ;
token ChessMove:     '[a-h][1-8]([-x][a-h][1-8])+[*?!]?' ;
token Identifier:    '[a-zA-Z]\w*' ;
token String:        '"[^"]*"' ;
token Comment:       '{[^}]*}' ;
token MoveNumber:    '\d+\.(\.\.)?' ;
token Nag:           '\$\d+' ;

PDNFile       -> Game (GameSeparator Game)* GameSeparator? ;
GameSeparator -> Win | Draw | Loss | NoResult | IWin | IDraw | ILoss ;
Game          -> (GameHeader GameBody?) | GameBody ;
GameHeader    -> (Tag)+ ;
Tag           -> '\[' Identifier String '\]' ;
GameMove      -> MoveNumber? (NumericMove | ChessMove) ;
GameBody      -> Annotation? (GameMove Annotation?)+ ;
Variation     -> '\(' GameBody '\)' ;
Annotation    -> (Variation | Comment | Nag)+ ;
```
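To try the token definitions outside TPG, here is a small Python `re` tokenizer that transcribes them. Assumptions: tokens are tried in the order listed and the first alternative that matches wins (Python's alternation works this way; TPG's own lexer may differ), `Punct` is an added token for the literal brackets and parentheses, and the `(?![0-9])` guard is applied to every numeric result so that moves with leading zeros such as "1-05" are not split.

```python
import re

TOKEN_SPEC = [
    ("Win", r"1-0(?![0-9])"),
    ("Draw", r"1/2-1/2"),
    ("Loss", r"0-1(?![0-9])"),
    ("IWin", r"2-0(?![0-9])"),
    ("IDraw", r"1-1(?![0-9])"),
    ("ILoss", r"0-2(?![0-9])"),
    ("NoResult", r"\*"),
    ("MoveNumber", r"\d+\.(?:\.\.)?"),
    ("NumericMove", r"\d+(?:\s?[-x]\s?\d+)+[*?!]?"),
    ("ChessMove", r"[a-h][1-8](?:[-x][a-h][1-8])+[*?!]?"),
    ("Identifier", r"[a-zA-Z]\w*"),
    ("String", r'"[^"]*"'),
    ("Comment", r"\{[^}]*\}"),
    ("Nag", r"\$\d+"),
    ("Punct", r"[\[\]()]"),  # added: literal tokens used by Tag and Variation
    ("Space", r"\s+"),
]

# One named group per token; inner groups are non-capturing so that
# Match.lastgroup always names the token that matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}: {text[pos:pos + 10]!r}")
        if m.lastgroup != "Space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For instance, `tokenize("1-05 1-0")` yields a `NumericMove` followed by a `Win`, and `tokenize("1-12 1-1")` yields a `NumericMove` followed by an `IDraw`.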

FeikeBoomstra · Post by **FeikeBoomstra** » Wed May 13, 2009 00:13

For completeness: 0-0 is also a valid result

Wieger Wesselink · Post by **Wieger Wesselink** » Wed May 13, 2009 00:27

Rein Halbersma wrote: Ahem, I don't have a PDN parser myself yet. But yes, I would want to be able to accept all forms "1-6", "1 - 6", "01-06" etc. I do, however, already have a FEN parser in the program I am writing (to be able to set up positions). It will gladly accept any sequence of square numbers, completely ignoring all characters other than W, B and K.

I also don't quite understand the fuss about the result being redundant at the end. The chess guys have had this for years, and so has the draughts community. Of course, we don't have to blindly copy what the chess folks do, but they have so many tools that it is much more economical to free-ride on their experience with what works and what doesn't. Apart from aesthetics and the few extra characters, what is *really broken* about the game result as a game terminator? And a possible confusion of "8-2" as a Delft majority draw result with the move "8-2" does not bother me, as I would take the result from the tag information anyway.

The main thing I worry about is that we, as 10x10 draughts programmers, end up with a standard that does not encompass other variants, for other players, with other result types, board sizes etc.

At first I was under the impression that the results at the end of a game would severely limit the class of parsers that are able to parse it. Now I think that I was wrong about this. So the only problem I still have with results as game terminators is that it is ugly to misuse the result of a game for this purpose. And it still complicates the job of writing a parser.

I'm no longer so sure that we want to entirely get rid of game separators. For the use case of storing multiple positions without moves it seems cleaner to me to have

[FEN ...] *
[FEN ...] *
[FEN ...]

instead of having to put empty Event or Result tags in between the FEN tags and requiring those to start a new game.
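On the reading side, the layout above only needs a rule that a result code closes the current game and that a final, unterminated game is still accepted. A minimal Python sketch under those assumptions; it also accepts 0-0, reuses the digit guard so that moves such as 1-12 are not taken for results, and treats every other non-space chunk as a move (comments and variations are not handled).

```python
import re

# Result codes used as terminators; the digit guard keeps moves such as
# 1-12 or 1-06 from being read as results.
RESULT = r"(?:(?:1/2-1/2|1-0|0-1|2-0|1-1|0-2|0-0)(?![0-9])|\*)"
TOKEN = re.compile(rf'\[\s*(\w+)\s+"([^"]*)"\s*\]|({RESULT})|(\S+)')


def split_games(text):
    """Return (tags, moves, result) triples; result is None for an unterminated last game."""
    games, tags, moves = [], {}, []
    for m in TOKEN.finditer(text):
        name, value, result, other = m.groups()
        if name is not None:
            tags[name] = value
        elif result is not None:
            games.append((tags, moves, result))  # a result code ends the game
            tags, moves = {}, []
        else:
            moves.append(other)
    if tags or moves:
        games.append((tags, moves, None))
    return games
```

Applied to three FEN tags where only the first two are followed by `*`, this returns three games, the last one with result `None`.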

Wieger Wesselink · Post by **Wieger Wesselink** » Wed May 13, 2009 00:40

Piet Bouma wrote: If we have agreement on:

- a way to add clock times at every move
- a FEN (tag?) within the notation when there are illegal moves, when an electronic board cannot reconstruct the notation, or when a capture of pieces can't be recognized in any other way.

then we may have solved the major problems.

Exactly, these are points we need to handle as well. FEN tags and clock times can be handled separately from the EBNF grammar that we have discussed so far.

Concerning illegal moves, I don't know what the best option is. If illegal moves or setups are handled inside comments, this would suggest that older programs are able to deal with the PDN. But this is not the case, since ignoring the setup may result in impossible moves right after it. So we might consider introducing a special token for doing a setup that can appear anywhere in the game. What do others think of this?

Wieger Wesselink · Post by **Wieger Wesselink** » Wed May 13, 2009 00:42

FeikeBoomstra wrote:For completeness: 0-0 is also a valid result

Yes, you're right! I'm afraid this one is not in my system yet.

Ed Gilbert · Post by **Ed Gilbert** » Wed May 13, 2009 01:14

Wieger Wesselink wrote: I'm no longer so sure that we want to entirely get rid of game separators. For the use case of storing multiple positions without moves it seems cleaner to me to have

[FEN ...] *
[FEN ...] *
[FEN ...]

instead of having to put empty Event or Result tags in between the FEN tags and requiring those to start a new game.

I agree. I was only suggesting empty Event tags in between games consisting of moves but no headers. But since result codes are still needed as terminators, as in your example above, they cannot be completely eliminated, so maybe there is no benefit in eliminating them from only most situations.

BTW, the [FEN "..."] * might also have to be [FEN "..."] 1-0 or [FEN "..."] 1/2-1/2, etc., to document a known result of the position.

I wonder whether the fact that 1-0 is a white win in international draughts but a black win in English checkers is a source of any conflict or ambiguity. I guess a program that can deal with multiple game types has to have some way to determine the game type of a file it is reading: probably a default type if none is specified, else there has to be a gametype tag.

-- Ed
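One way to make the ambiguity explicit is to read the first number of a result code as the score of the side that moves first, which matches both readings of 1-0 mentioned above. A Python sketch with hypothetical game-type names; international draughts is assumed as the default when no game type is given.

```python
from enum import Enum


class GameType(Enum):
    # Hypothetical names; a real file would signal this through a game-type tag.
    INTERNATIONAL = "international"  # 10x10, white moves first
    ENGLISH = "english"              # 8x8 checkers, black moves first


FIRST_MOVER = {GameType.INTERNATIONAL: "white", GameType.ENGLISH: "black"}


def winner(result, game_type=GameType.INTERNATIONAL):
    """Return the winning color for a result code, or None for draws/unknown."""
    first = FIRST_MOVER[game_type]
    second = "black" if first == "white" else "white"
    if result in ("1-0", "2-0"):  # first score belongs to the side that moves first
        return first
    if result in ("0-1", "0-2"):
        return second
    return None
```

Here `winner("1-0")` gives white, while `winner("1-0", GameType.ENGLISH)` gives black.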

Ed Gilbert · Post by **Ed Gilbert** » Wed May 13, 2009 01:22

Wieger Wesselink wrote: Concerning illegal moves, I don't know what the best option is. If illegal moves or setups are handled inside comments, this would suggest that older programs are able to deal with the PDN. But this is not the case, since ignoring the setup may result in impossible moves right after it. So we might consider introducing a special token for doing a setup that can appear anywhere in the game. What do others think of this?

I would not expect a draughts program to read a pdn file and then seamlessly show me a full game containing an illegal move! For those unusual situations I think embedding all moves following the illegal move in a comment is sufficient to document what happened and allow someone to sort it out afterwards while reading the comment.

-- Ed