After some challenges, I was able to incorporate NNUE into Scan.
The first test seemed to work.
Match result with the usual settings (1 min / 65 moves per game, 6-piece DB for each engine, no book, 1 core): 10 wins for KR and 148 draws, which translates into an Elo difference of 22 in favor of KR.
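The Elo figure follows from the match score under the usual logistic Elo model, D = -400 * log10(1/p - 1). Here KR scores p = (10 + 148/2) / 158 = 84/158, about 0.532, which gives D of about 22. Below is a minimal C++ sketch of that arithmetic. It assumes the 158 games were 10 KR wins, 148 draws and no Scan wins, as the result line implies, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Score of one side over a match: win = 1, draw = 0.5, loss = 0.
double match_score(int wins, int draws, int losses) {
    const int games = wins + draws + losses;
    assert(games > 0);
    return (wins + 0.5 * draws) / games;
}

// Elo difference implied by score p under the logistic model:
//   p = 1 / (1 + 10^(-D/400))  =>  D = -400 * log10(1/p - 1)
// Undefined for p = 0 or p = 1 (a clean sweep has no finite estimate).
double elo_difference(int wins, int draws, int losses) {
    const double p = match_score(wins, draws, losses);
    assert(p > 0.0 && p < 1.0);
    return -400.0 * std::log10(1.0 / p - 1.0);
}

// Usage, from KR's point of view (10 wins, 148 draws, 0 losses):
//   elo_difference(10, 148, 0)  // about 22.0
```

With only 10 decisive games out of 158, the error bar on such an estimate is wide, so the 22 should be read as a rough indication.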
The network was the same as before: 191:256x32x32x1.
In my case I can only use AVX2 (256-bit registers), as I have an older processor, and all weights are int16.
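Reading 191:256x32x32x1 as 191 inputs, a 256-neuron first layer, two further layers of 32 neurons and a single output, a forward pass with int16 weights can be sketched as below. This is a minimal scalar C++ sketch, not Scan's code: the names, the memory layout and the quantization (clipped ReLU at 127, rescaling by a shift of 6) are assumptions for illustration. An AVX2 version would perform the same int16 x int16 to int32 arithmetic 16 lanes at a time.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a 191:256x32x32x1 network with int16 weights.
// The clipping range and fixed-point shift are assumptions, not Scan's values.
constexpr int kInputs = 191;
constexpr int kL1 = 256;
constexpr int kL2 = 32;
constexpr int kL3 = 32;
constexpr int kShift = 6;      // assumed scale: layer sums are divided by 2^6
constexpr int kClipMax = 127;  // assumed upper bound of the clipped ReLU

struct Network {
    // First layer stored feature-major: the 256 weights of one input are contiguous.
    std::array<std::array<int16_t, kL1>, kInputs> w1{};
    std::array<int16_t, kL1> b1{};
    std::array<std::array<int16_t, kL1>, kL2> w2{};  // one row per output neuron
    std::array<int32_t, kL2> b2{};
    std::array<std::array<int16_t, kL2>, kL3> w3{};
    std::array<int32_t, kL3> b3{};
    std::array<int16_t, kL3> w4{};
    int32_t b4 = 0;
};

// First-layer accumulator. Inputs are 0/1, so a move only adds or removes a few
// weight columns instead of recomputing the full 191x256 product.
struct Accumulator {
    std::array<int16_t, kL1> v{};
    void reset(const Network& net) { v = net.b1; }
    void add(const Network& net, int f) {
        assert(f >= 0 && f < kInputs);
        for (int i = 0; i < kL1; ++i) v[i] = int16_t(v[i] + net.w1[f][i]);
    }
    void remove(const Network& net, int f) {
        assert(f >= 0 && f < kInputs);
        for (int i = 0; i < kL1; ++i) v[i] = int16_t(v[i] - net.w1[f][i]);
    }
};

// Dense layer: int16 x int16 products summed in int32 (the operation that
// _mm256_madd_epi16 vectorizes on AVX2), then rescaled and clipped to int16.
template <std::size_t In, std::size_t Out>
void dense(const std::array<std::array<int16_t, In>, Out>& w,
           const std::array<int32_t, Out>& b,
           const std::array<int16_t, In>& in, std::array<int16_t, Out>& out) {
    for (std::size_t o = 0; o < Out; ++o) {
        int32_t sum = b[o];
        for (std::size_t i = 0; i < In; ++i) sum += int32_t(w[o][i]) * in[i];
        out[o] = int16_t(std::clamp<int32_t>(sum >> kShift, 0, kClipMax));
    }
}

// Forward pass from an up-to-date accumulator; returns the raw network output.
int32_t evaluate(const Network& net, const Accumulator& acc) {
    std::array<int16_t, kL1> a1;
    for (int i = 0; i < kL1; ++i)
        a1[i] = int16_t(std::clamp<int32_t>(acc.v[i], 0, kClipMax));  // clipped ReLU
    std::array<int16_t, kL2> a2;
    std::array<int16_t, kL3> a3;
    dense(net.w2, net.b2, a1, a2);
    dense(net.w3, net.b3, a2, a3);
    int32_t out = net.b4;
    for (int i = 0; i < kL3; ++i) out += int32_t(net.w4[i]) * a3[i];
    return out;
}
```

Keeping the first layer as an incrementally updated accumulator is what makes the 191x256 layer cheap; the small 256x32x32x1 tail is recomputed on every evaluation.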
I will also check parallel performance to make sure that this works as well. By design the implementation should be thread-safe, but the proof of the pudding is in the eating. With the low NPS of NNUE, I assume it will benefit more from additional cores, but the actual match will provide the final answer.
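The post does not describe how Scan's NNUE code achieves thread safety. A common pattern for NNUE evaluators is to keep the trained weights read-only and shared, and to give each search thread its own accumulator. Below is a minimal C++ sketch of that pattern. The names (`Weights`, `ThreadState`, `run_threads`) are hypothetical and the weights are dummy values. Identical results across threads only illustrate the ownership split; they are not a proof of thread safety, which, as said above, the match has to provide.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical sketch: weights are written once at startup and then only read,
// so all search threads can share them without locking. Dummy values stand in
// for a trained first layer (191 inputs x 256 neurons).
constexpr int kInputs = 191;
constexpr int kHidden = 256;

struct Weights {
    std::vector<int16_t> w;  // feature-major: w[f * kHidden + i]
    Weights() : w(kInputs * kHidden) {
        for (int f = 0; f < kInputs; ++f)
            for (int i = 0; i < kHidden; ++i)
                w[f * kHidden + i] = int16_t((f + i) % 7 - 3);
    }
};

// Mutable evaluation state lives in the thread, never in shared memory:
// each search thread updates its own accumulator during make/unmake.
struct ThreadState {
    std::array<int32_t, kHidden> acc{};
    void add(const Weights& net, int f) {
        assert(f >= 0 && f < kInputs);
        for (int i = 0; i < kHidden; ++i) acc[i] += net.w[f * kHidden + i];
    }
    void remove(const Weights& net, int f) {
        assert(f >= 0 && f < kInputs);
        for (int i = 0; i < kHidden; ++i) acc[i] -= net.w[f * kHidden + i];
    }
    int64_t checksum() const {
        int64_t s = 0;
        for (int32_t x : acc) s += x;
        return s;
    }
};

// Runs the same update sequence in n_threads threads; with no shared mutable
// state, every thread must end with the same accumulator.
std::vector<int64_t> run_threads(const Weights& net, int n_threads) {
    std::vector<int64_t> results(n_threads, 0);
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t)
        pool.emplace_back([&net, &results, t] {
            ThreadState st;
            for (int f = 0; f < kInputs; ++f) st.add(net, f);
            for (int f = 0; f < kInputs; f += 2) st.remove(net, f);
            results[t] = st.checksum();  // each thread writes only its own slot
        });
    for (auto& th : pool) th.join();
    return results;
}
```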
Besides that, I also want to test with double thinking time for Scan NNUE, to compensate for the missing AVX-512 VNNI and to get an idea of the network's potential. This test was the easiest to do, since I trust the 1-core implementation and therefore don't need to keep an eye on the eval scores.
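For int16 weights, the missing VNNI costs speed rather than accuracy. With AVX2, each multiply-accumulate step in a dense layer needs `_mm256_madd_epi16` followed by `_mm256_add_epi32`. AVX-512 VNNI's `vpdpwssd` (intrinsic `_mm256_dpwssd_epi32`, with AVX512VL) fuses both into one instruction and yields the same wrapped int32 result. The scalar model below shows that equivalence for one 256-bit register. It is plain C++ so it runs on any CPU, and the function names mirror the intrinsics but are our own. It does not measure how much time the fused instruction would save in Scan; the double-time match is the practical answer to that.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// One 256-bit register, viewed as 16 x int16 or 8 x int32.
using I16x16 = std::array<int16_t, 16>;
using I32x8 = std::array<int32_t, 8>;

// Wrap to 32 bits as the hardware does (two's complement, no saturation).
int32_t wrap32(int64_t x) { return static_cast<int32_t>(static_cast<uint32_t>(x)); }

// AVX2 step 1, modelled on _mm256_madd_epi16: multiply int16 pairs and
// add each adjacent pair of products into one int32 lane.
I32x8 madd_epi16(const I16x16& a, const I16x16& b) {
    I32x8 r{};
    for (int j = 0; j < 8; ++j)
        r[j] = wrap32(int64_t(a[2 * j]) * b[2 * j] + int64_t(a[2 * j + 1]) * b[2 * j + 1]);
    return r;
}

// AVX2 step 2, modelled on _mm256_add_epi32: add to the running int32 sums.
I32x8 add_epi32(const I32x8& acc, const I32x8& x) {
    I32x8 r{};
    for (int j = 0; j < 8; ++j) r[j] = wrap32(int64_t(acc[j]) + x[j]);
    return r;
}

// AVX-512 VNNI, modelled on vpdpwssd: the same multiply-add-accumulate
// as the two AVX2 steps above, done in a single instruction.
I32x8 dpwssd_epi32(const I32x8& acc, const I16x16& a, const I16x16& b) {
    I32x8 r{};
    for (int j = 0; j < 8; ++j)
        r[j] = wrap32(int64_t(acc[j]) + int64_t(a[2 * j]) * b[2 * j] +
                      int64_t(a[2 * j + 1]) * b[2 * j + 1]);
    return r;
}

// Usage: both paths give identical results for any inputs, e.g.
//   assert(add_epi32(acc, madd_epi16(a, b)) == dpwssd_epi32(acc, a, b));
```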
So far so good for this test (DXP match): 50 games, 50 draws. But, to borrow the football saying, in the end KR will win.
If all works well, I hope to share the Scan sources with NNUE this weekend (I first need to improve the readability of some changes).
Keep in mind that this work would have been impossible without Jonathan's groundwork, and without Joost's support and our exchange of ideas.
I really hope that others will embark on the NNUE voyage and share results and new insights in this forum.
I'm not 100% sure that NNUE will bring a revolution similar to the pattern-based eval, but it is really interesting, and we are only getting started. Besides that, I really like this black-box approach, where the network has no predefined features at all, unlike the current pattern-based evals.
As I personally believe that we are close to the performance optimum, I don't expect NNUE to surpass the current state-of-the-art evals by a large margin (if at all); I would already applaud on-par performance.
On the other hand, I'm sure we will see much progress in CPU hardware acceleration for neural networks (such as AVX-512 VNNI), much as 64-bit processors and advances in instruction sets enabled efficient bitboard implementations.
Bert