Nothing fancy, I just wanted to poke at it and see if it was worth investing more into. I was wondering whether it was better to switch up playstyles according to unknown factors in the game state. Maybe something about the initial state indicated it's better to draft faster or slower, or maybe a state in the early mid-game indicated it's better to go for score over deliveries, or something else. So I recorded self-play game states to a text file, using a tweaked bot as the reference opponent (to avoid symmetric games while still keeping a ~50% winrate), forcing a decision on one side and recording the game result.
Inputs were component counts / 5 for recipes, spells, and tomes, price / 20, and where applicable score / 100 and deliveries / 5. Output was one value per decision plus a default, fitted to the game result encoded as win/draw/loss = 1/0/-1. I used a basic fully-connected network with various numbers of layers and neurons, batch norm, and ReLU/tanh activations, and tried both MSE and NLL as the loss function. The training set was 10k games and the test set 1k games. I used PyTorch to keep momentum and not waste too much time. All it ever did was memorize the training set; the test-set loss never went down. I gave up soon after, so maybe 2-3 hours wasted in total, plus self-play time during downtime.
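For the curious, a minimal PyTorch sketch of the kind of network described above. The feature count, output count, layer sizes, and optimizer settings here are placeholder assumptions, not the actual values from the experiment, and the batch is random data standing in for the recorded self-play states:

```python
# Sketch only: sizes and hyperparameters are assumptions, not the real setup.
import torch
import torch.nn as nn

N_FEATURES = 32  # assumed: normalized components, prices, score, deliveries
N_OUTPUTS = 6    # assumed: one value per forced decision + a default

class ValueNet(nn.Module):
    """Fully-connected net with batch norm and ReLU hidden layers."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_OUTPUTS),
            nn.Tanh(),  # squash to [-1, 1] to match the 1/0/-1 encoding
        )

    def forward(self, x):
        return self.net(x)

model = ValueNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One training step on a fake batch (real data came from recorded self-play).
x = torch.rand(128, N_FEATURES)
y = torch.randint(-1, 2, (128, N_OUTPUTS)).float()  # win/draw/loss targets
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```

With targets in {-1, 0, 1} and a tanh output, MSE regression is the natural fit; the NLL variant would instead treat the result as a 3-class label with a log-softmax head.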