Ultimate Tic Tac Toe [Puzzle discussion]

darkhorse64 · August 9, 2023, 9:33pm

I implement that a long time ago without any gains.
Do you have a pool of nodes allocated once for all ? This will help you reuse the tree and make your bot stronger.

MrSubZero · August 9, 2023, 11:47pm

Yes I have a pool, but at this point 80% of time is spent in simulations which do not involve nodes anyway, also i don’t store board in nodes which maybe slow things as I have to recalculate every move (it seems from profiling that time is taken by play(move) which is almost only about few bit moving and tests.) Anyway thanks for help, i’ll keep digging.

jacek1 · August 10, 2023, 5:32am

Do you have 1-ply wincheck during your playouts? It will make them slower but more accurate.

MrSubZero · August 10, 2023, 6:41am

Yes I have 1 ply win check and 1 ply prevent opponent win check. It almost changed nothing for speed but the bot was much better. The reason is ( I can check) that the simulation are shorter than full random.

darkhorse64 · August 10, 2023, 10:15am

Unless your game state is very small, it is a good practice not to store it into nodes. With a fast sim, you will need a lot of nodes whose memory requirements will exceed what is allowed by CG.

My code for playing a move on a small board

bool playMove (int player, unsigned int move)
{
	unsigned int& playerMoves = players[player];
	playerMoves |= 1 << move;
	return winningPositions[playerMoves];
}

I have a 9 bit bitboard for each player. I use the bitboard to index a precomputed wincheck table. Combining bitboards for both players give me the candidate moves.

Please note that there are many different implementations that achieve the same result.

MrSubZero · August 15, 2023, 9:10pm

Now i have MCTS+tree reuse+solver+pool working and sitting around 79 (peak at 74). Rollouts at turn 2 are still quite low (40k to 43k). But i noticed that the memory pool is almost useless: wether i preallocate with malloc or with an array of node it is not faster than using new for each new node, maybe i’m doing something completely wrong here.

darkhorse64 · August 15, 2023, 9:54pm

Yes, there is something fishy here. The master rule for performance is “do not allocate/free anything in your real time loop”. Otherwise, your response time becomes unpredictable.

MrSubZero · August 19, 2023, 7:41pm

Little update: big bug correction, more pruning from score bounded mcts,minor optimizations(win check with avx, faster movegen), also remembered from reading that you can play with fpu and I’m now sitting at 37. Still miles away from the coveted 80k rollouts.

jacek1 · August 20, 2023, 7:18am

Do you use rand() instead of some simple random generation function? rand() is rather slow on CG.

MrSubZero · August 20, 2023, 7:47am

I use some simple custom generator I found on internet also tried with random lib. On average à full
simulation starting from initial position is 3 micro seconds. On starting position I reach 350-400k rollouts which means that almost all time is spent on simulation. As I’m not home I can’t profile properly.

Nilhav · August 25, 2023, 12:29pm

Thanks darkhorse,

Can you help me to manage games by myself?

How do I simulate a game in my own dev tools? Do I need to re-code the tic tac toe game?

darkhorse64 · August 25, 2023, 1:13pm

Yes. To simulate the game, you have to provide your own implementation in your bot.

MrSubZero · August 30, 2023, 4:50pm

After a lot of tries, profile and changes a few conclusions:
-node size is not so much a big deal (if i go minimalist or embed full list of moves, speed does not change)
-if i use only pointer in nodes or use vector(slightly better was vector<Node*> the difference if very small.
-Memory Pool or not change almost nothing (going all the way new() Node tadatada)
-On local my implementation is around 2 times faster(even tried to compile on local and send binary , still 2 times slower)
-The ratio time of playouts/total time is about 50% and i can’t figure how to make playouts faster(even tried no struct imple)
So still struggling around 40k at turn 2, only good news is with full moves list at every nodes and move ordering speed is almost the same as without… I should be doing something pretty nasty:)

Biglover · November 25, 2023, 2:27pm

It may be a very minor optimization but 1 / Quake Inverse Square Root is disgustingly fast if you dont need exact values. According to profiling, quake is 135.14 times faster than std::sqrt().

Graph of calculating UCT, closer to 0 equals less difference compared to using std::sqrt(). Wins equal visits/2, parent equals childVisits*2.

TomAlard · November 26, 2023, 7:17am

I think you can do even better by using the rsqrtss instruction (approximate reciprocal square root) and by using a multiplication instead of a division to find the square root.

float fastSquareRoot(float x) {
    __m128 in = _mm_set_ss(x);
    return _mm_cvtss_f32(_mm_mul_ss(in, _mm_rsqrt_ss(in)));
}

There are also some similar ideas to optimize log2 by losing some precision here.

I don’t think it makes a big difference though.

Biglover · November 26, 2023, 10:10am

Ooh nice, thanks! Yea, these things dont yield much but its good to know about them anyway.

jacek1 · November 26, 2023, 3:58pm

If you rely much on randomness, you would want to get random number in range using multiplication instead of modulo as it is faster.

    uint64_t seed = 12345L;

    uint64_t nextLong() {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        return seed;
    }

    uint32_t nextInt(int n) {
        uint32_t x = nextLong() >> 32;
        uint64_t m = uint64_t(x) * uint64_t(n);
        return m >> 32;
    }

NKYT1 · March 27, 2024, 9:34am

Is there any way the timeout limit can be increased? This puzzle is tagged minimax, but applying minimax (atleast in python) results in a timeout. If i copy my solution over to another online IDE, such as repl, it doesnt take very long at all

(BTW im only on wood 1, before the extra layer of tic tac toe is added)

brokenelevator · April 6, 2024, 6:44pm

I don’t think the limit can be increased, but also don’t think it should be necessary. My code doesn’t use minimax, nor is in python but I am pretty sure those are sufficient to solve the problem (to certain extent). Even more so, given that is is not ‘ultimate’ yet, so the search space is smaller.

About the IDE though. How do you define ‘not very long’? How do you measure it?
I have 1 second for first round and 0,1 second for subsequent rounds. I’d say that is not very long, but twice as that wouldn’t be either, yet would result in a timeout. Limits may differ from language to language (and maybe should), so your limits may be different.

Besides, I also keep timing out. So that is just life Got tired trying to fix it. The ‘legend dream’ will have to wait a little

Uljahn · April 6, 2024, 8:00pm

Having timeouts is a good opportunity to learn to profile and optimize your code. Finding the source of inefficiency will boost your coding skills like nothing else. You might start with measuring elapsed time and printing it to error stream (debug).

import sys
from timeit import default_timer as tmr

# start measuring after the first input() in the game loop
start_tmr = tmr()
your_function_to_measure()
print(f"time elapsed: {(tmr() - start_tmr) * 1000:.3f} ms.", file=sys.stderr, flush=True)