158 games seems far too few to determine whether one version of a program is better than another.
BertTuyt wrote:
With the modified evaluation function (+ voorpost), with corrected symmetry bugs (as detected and corrected by Ed), I did another Horizon Damage 158-game test.
The end result: 33+ 3- 122=, in comparison with the previous match (with the bugged evaluation) not really better (in reality somewhat worse, 25+ 5- 128=).
But I think this is a statistical fluctuation (at least I hope) also observed by others.
If I remember well also Ed did not reveal a huge (or any) difference with the modified eval.
Bert
I usually compute a match score (2 points for a win, 1 for a draw, so it ranges from 0 to 2) as
Code:
score=(2*win+draw)/(number of games)
If you compute the standard deviation of this score, you get
Code:
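double n1 = win + draw + lose;  /* number of games; double avoids integer division */
/* standard error of the mean score: sqrt((E[x^2] - E[x]^2) / n1) */
sigma = sqrt(((4*win + draw)/n1 - (2*win + draw)*(2*win + draw)/(n1*n1)) / n1);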
Or in other words, if you want to be 95% sure that player 1 is better than player 2 in a 158-game match, player 1 needs to score at least 56% to 44% or so. Any score between 44 and 56 percent doesn't mean much in such a short match.
I am doing the calculations in my head, so they may be off a little, and they also depend on the draw rate.
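For anyone who wants to check the numbers rather than trust my head, here is a minimal Python sketch of the two formulas above, applied to the two results Bert posted. The function name `match_stats` and the labels are just my own choices for illustration, and the sketch assumes the games are independent, which is only approximately true in practice.

```python
import math

def match_stats(win, draw, lose):
    """Return (score, sigma) on the 0..2 points-per-game scale.

    score is the mean points per game (2 for a win, 1 for a draw),
    sigma is the standard error of that mean.
    """
    n = win + draw + lose
    score = (2 * win + draw) / n
    second_moment = (4 * win + draw) / n          # E[x^2]
    sigma = math.sqrt((second_moment - score * score) / n)
    return score, sigma

# The two 158-game results from Bert's post, as (win, draw, lose).
results = {
    "modified eval": (33, 122, 3),
    "bugged eval": (25, 128, 5),
}

for label, (win, draw, lose) in results.items():
    score, sigma = match_stats(win, draw, lose)
    # Multiply by 50 to convert the 0..2 scale to a 0..100 percentage.
    print(f"{label}: {50 * score:.1f}% +/- {50 * sigma:.1f}% (one sigma)")
```

The factor 50 only converts from the 0 to 2 point scale to the usual percentage scale. Note how sigma shrinks as the draw rate goes up, which is why the threshold for a significant result depends on the draw rate.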
Michel