Bert wrote:
"(BTW, I also insist on separating them, which is why I assumed from the beginning that 'optimizer' could only refer to the maths part; but it was only a guess.) The tool came with source, so I could change the input. I wanted the optimization tool to have no knowledge of the evaluation, so I separated them. All the knowledge about the evaluation is in the tool I made (called cpn), which reads the .pdn files, derives the relevant positions, and constructs a feature vector for every position. I have two types of features: pattern features (where I only pass the pattern index to the optimization tool) and property features, which have a numerical value."
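To make that interface concrete, here is a minimal sketch, in Python, of how a position could be reduced to those two kinds of features. Everything in it is invented for illustration (the region layout, the bitboard arguments, the names); cpn itself is not shown in this thread, so this is only my guess at the shape of its output:

# A minimal sketch, not Bert's actual cpn code: every name below is invented.
# A position is reduced to (a) pattern indices, whose weight-table entries the
# optimizer looks up directly, and (b) numerical properties such as material.

from dataclasses import dataclass
from typing import List

@dataclass
class FeatureVector:
    pattern_indices: List[int]   # sparse part: index into a big weight table,
                                 # implicit feature value of 1
    properties: List[float]      # dense part: one numerical value per property

PATTERN_SIZE = 3 ** 8            # e.g. 8 squares, 3 states each (hypothetical)

def extract_features(white_men: int, black_men: int) -> FeatureVector:
    """Map a position (hypothetical bitboard layout) to a feature vector."""
    patterns = []
    for region in range(4):                      # 4 hypothetical 8-square regions
        index = 0
        for sq in range(region * 8, region * 8 + 8):
            state = 1 if (white_men >> sq) & 1 else 2 if (black_men >> sq) & 1 else 0
            index = index * 3 + state            # base-3 encoding of the region
        patterns.append(region * PATTERN_SIZE + index)
    material = bin(white_men).count("1") - bin(black_men).count("1")
    return FeatureVector(pattern_indices=patterns, properties=[float(material)])

The point of passing only the indices is that the optimizer never needs to know what a pattern means; it only sees that feature number k fired.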
If, in a parallel universe, Ed's code were actually doing everything (converting draughts games/positions to a final weight vector), then I would feel it was (way) too much the same as Kingsrow; fine for personal experiments, though.
While I felt uneasy after congratulating Bert perhaps a bit quickly (not being sure what Ed's code was doing), from this quote I now see that Bert has done the work I expect of any game programmer. I'm not sure that the missing gradient piece is so important anymore (and it might become a common developer tool in the future).
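For readers wondering what that "gradient piece" amounts to: below is a generic sketch, assuming the common setup where a linear evaluation is squashed through a sigmoid and fitted to game results. This is not Ed's or Bert's code, just the textbook formulation applied to the sparse pattern features from the quote above:

import math

def gradient_step(samples, weights, lr=0.01, scale=1.0):
    """One pass of gradient descent on the squared error between
    sigmoid(evaluation) and the game result, for a linear evaluation
    over sparse pattern features plus dense properties.
    samples: list of (pattern_indices, properties, result), result in [0, 1].
    weights: flat list; pattern indices address it directly, property
    weights sit at the end. (Generic formulation, no specific program.)"""
    n_props = len(samples[0][1])
    prop_base = len(weights) - n_props
    for patterns, props, result in samples:
        score = sum(weights[i] for i in patterns)          # sparse dot product
        score += sum(w * v for w, v in zip(weights[prop_base:], props))
        p = 1.0 / (1.0 + math.exp(-scale * score))         # sigmoid
        # d(error)/d(score) for squared error (p - result)^2
        g = 2.0 * (p - result) * p * (1.0 - p) * scale
        for i in patterns:                                 # sparse update: the
            weights[i] -= lr * g                           # fired patterns only
        for k, v in enumerate(props):
            weights[prop_base + k] -= lr * g * v
    return weights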
Although I don't use 3rd-party libraries, I'll take a shot in the dark and give Jan-Jaap a possibly missing piece of the puzzle. At first sight, it seems far-fetched that those libraries could help with learning patterns at all. However, by what I consider a historical coincidence, natural language processing (NLP) also requires sparse vectors (or at least used to). For this reason, possibly alone, ML libraries usually have some way to cope with sparse features. Rein found mention of "wide learning" (if I remember correctly) in TF, for instance.
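As a hedged illustration of that sparse-feature support, the toy example below uses scikit-learn and SciPy rather than TF's wide learning, purely because it is shorter; the data are invented. The point is only that the "pattern index only" representation can be fed to such a library directly:

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.linear_model import LogisticRegression

# Invented toy data: 4 positions, a weight table of 10 pattern slots.
# Each position only reports which pattern indices fired (value 1),
# i.e. exactly the sparse representation discussed above.
rows = [0, 0, 1, 1, 2, 2, 3, 3]            # position number
cols = [1, 7, 1, 4, 3, 7, 4, 9]            # pattern index that fired
data = np.ones(len(rows))
X = csr_matrix((data, (rows, cols)), shape=(4, 10))
y = np.array([1, 1, 0, 0])                 # toy game results (win/loss)

model = LogisticRegression()
model.fit(X, y)                            # sparse input is handled natively
print(model.coef_)                         # one learned weight per pattern slot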
Fabien.