I implement that a long time ago without any gains.
Do you have a pool of nodes allocated once for all ? This will help you reuse the tree and make your bot stronger.
Yes I have a pool, but at this point 80% of time is spent in simulations which do not involve nodes anyway, also i donāt store board in nodes which maybe slow things as I have to recalculate every move (it seems from profiling that time is taken by play(move) which is almost only about few bit moving and tests.) Anyway thanks for help, iāll keep digging.
Do you have 1-ply wincheck during your playouts? It will make them slower but more accurate.
Yes I have 1 ply win check and 1 ply prevent opponent win check. It almost changed nothing for speed but the bot was much better. The reason is ( I can check) that the simulation are shorter than full random.
Unless your game state is very small, it is a good practice not to store it into nodes. With a fast sim, you will need a lot of nodes whose memory requirements will exceed what is allowed by CG.
My code for playing a move on a small board
bool playMove (int player, unsigned int move)
{
unsigned int& playerMoves = players[player];
playerMoves |= 1 << move;
return winningPositions[playerMoves];
}
I have a 9 bit bitboard for each player. I use the bitboard to index a precomputed wincheck table. Combining bitboards for both players give me the candidate moves.
Please note that there are many different implementations that achieve the same result.
Now i have MCTS+tree reuse+solver+pool working and sitting around 79 (peak at 74). Rollouts at turn 2 are still quite low (40k to 43k). But i noticed that the memory pool is almost useless: wether i preallocate with malloc or with an array of node it is not faster than using new for each new node, maybe iām doing something completely wrong here.
Yes, there is something fishy here. The master rule for performance is ādo not allocate/free anything in your real time loopā. Otherwise, your response time becomes unpredictable.
Little update: big bug correction, more pruning from score bounded mcts,minor optimizations(win check with avx, faster movegen), also remembered from reading that you can play with fpu and Iām now sitting at 37. Still miles away from the coveted 80k rollouts.
Do you use rand() instead of some simple random generation function? rand() is rather slow on CG.
I use some simple custom generator I found on internet also tried with random lib. On average Ć full
simulation starting from initial position is 3 micro seconds. On starting position I reach 350-400k rollouts which means that almost all time is spent on simulation. As Iām not home I canāt profile properly.
Thanks darkhorse,
Can you help me to manage games by myself?
How do I simulate a game in my own dev tools? Do I need to re-code the tic tac toe game?
Yes. To simulate the game, you have to provide your own implementation in your bot.
After a lot of tries, profile and changes a few conclusions:
-node size is not so much a big deal (if i go minimalist or embed full list of moves, speed does not change)
-if i use only pointer in nodes or use vector(slightly better was vector<Node*> the difference if very small.
-Memory Pool or not change almost nothing (going all the way new() Node tadatada)
-On local my implementation is around 2 times faster(even tried to compile on local and send binary , still 2 times slower)
-The ratio time of playouts/total time is about 50% and i canāt figure how to make playouts faster(even tried no struct imple)
So still struggling around 40k at turn 2, only good news is with full moves list at every nodes and move ordering speed is almost the same as withoutā¦ I should be doing something pretty nasty:)
It may be a very minor optimization but 1 / Quake Inverse Square Root is disgustingly fast if you dont need exact values. According to profiling, quake is 135.14 times faster than std::sqrt().
Graph of calculating UCT, closer to 0 equals less difference compared to using std::sqrt(). Wins equal visits/2, parent equals childVisits*2.
I think you can do even better by using the rsqrtss instruction (approximate reciprocal square root) and by using a multiplication instead of a division to find the square root.
float fastSquareRoot(float x) {
__m128 in = _mm_set_ss(x);
return _mm_cvtss_f32(_mm_mul_ss(in, _mm_rsqrt_ss(in)));
}
There are also some similar ideas to optimize log2 by losing some precision here.
I donāt think it makes a big difference though.
Ooh nice, thanks! Yea, these things dont yield much but its good to know about them anyway.
If you rely much on randomness, you would want to get random number in range using multiplication instead of modulo as it is faster.
uint64_t seed = 12345L;
uint64_t nextLong() {
seed ^= seed << 13;
seed ^= seed >> 7;
seed ^= seed << 17;
return seed;
}
uint32_t nextInt(int n) {
uint32_t x = nextLong() >> 32;
uint64_t m = uint64_t(x) * uint64_t(n);
return m >> 32;
}
Is there any way the timeout limit can be increased? This puzzle is tagged minimax, but applying minimax (atleast in python) results in a timeout. If i copy my solution over to another online IDE, such as repl, it doesnt take very long at all
(BTW im only on wood 1, before the extra layer of tic tac toe is added)
I donāt think the limit can be increased, but also donāt think it should be necessary. My code doesnāt use minimax, nor is in python but I am pretty sure those are sufficient to solve the problem (to certain extent). Even more so, given that is is not āultimateā yet, so the search space is smaller.
About the IDE though. How do you define ānot very longā? How do you measure it?
I have 1 second for first round and 0,1 second for subsequent rounds. Iād say that is not very long, but twice as that wouldnāt be either, yet would result in a timeout. Limits may differ from language to language (and maybe should), so your limits may be different.
Besides, I also keep timing out. So that is just life Got tired trying to fix it. The ālegend dreamā will have to wait a little
Having timeouts is a good opportunity to learn to profile and optimize your code. Finding the source of inefficiency will boost your coding skills like nothing else. You might start with measuring elapsed time and printing it to error stream (debug).
import sys
from timeit import default_timer as tmr
# start measuring after the first input() in the game loop
start_tmr = tmr()
your_function_to_measure()
print(f"time elapsed: {(tmr() - start_tmr) * 1000:.3f} ms.", file=sys.stderr, flush=True)