I use normalized values. The rationale is that you want to score more than your opponents, with the largest gap possible.
There is no multithreading on CG.
I also thought so, but yesterday I ran thread::hardware_concurrency()
and it returned 8 on CG, so I thought they might have enabled it without me hearing about it.
I’ve tested everything that came to my mind and it seems to me that everything works as expected, but the bot is still very, very bad on the platform. The biggest suspicion I have at the moment is that my tree seems to grow really wide and not very deep, even with a low UCT value. I’ve deployed a visual representation here for 3 turns, 1000 MCTS iterations each. It’s a very basic site that needs a lot of improvements: Numpad + for next iteration, Numpad - for previous, Page Up for next turn, Page Down for previous turn, mouse scroll to zoom in/out, hold left mouse button to pan around.
For each turn I reach max depths between 3 and 6. Is it normal for my tree to see only this many turns into the future?
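For reference, this is the standard UCT selection rule I assume you are using; with rewards normalized to [-1, 1], a tree that grows wide and shallow usually means the exploration constant is too large relative to the reward scale. A minimal sketch, with `Node` reduced to just the fields UCT needs:

```cpp
#include <cmath>
#include <vector>

struct Node {
    double totalReward = 0.0;     // sum of backpropagated rewards in [-1, 1]
    int visits = 0;
    std::vector<Node*> children;
};

// Standard UCT: mean reward (exploitation) plus C * exploration bonus.
// With rewards in [-1, 1], C in roughly 0.4-2.0 is the usual tuning range.
int selectChild(const Node& parent, double C) {
    int best = -1;
    double bestScore = -1e18;
    for (int i = 0; i < (int)parent.children.size(); ++i) {
        const Node* ch = parent.children[i];
        if (ch->visits == 0) return i; // visit every child at least once
        double exploit = ch->totalReward / ch->visits;
        double explore = C * std::sqrt(std::log((double)parent.visits) / ch->visits);
        double score = exploit + explore;
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return best;
}
```

If the bonus term dwarfs the gap between child means, selection keeps cycling through siblings instead of committing to a promising branch, which is exactly a wide-and-shallow tree.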
Probably my evaluations are really bad; I always have problems with them. I’ll share several variants I’ve tried. For each I have the state at the start of the playout and after it. All evaluations are normalized to the range [-1, 1]:
A playout from GAME TURN 0:
==== STATE BEFORE PLAYOUT ====
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
....#....#....#....#.......... 0 0 0 0 0 0 0
31155382221123 6 7 6 7 6 7 0
DRUL 0 0 0 0 0 0 15
DULRURDDRLLLL 0 0 0 0 0 0 0
==== STATE AFTER PLAYOUT ====
27 1 0 0 1 0 0 0 1 0 1 0 0
0 0 0 1 0 0 1 1 0 0 0 1 0
0 0 1 0 0 1 0 0 0 1 0 0 1
GAME_OVER 29 26 28 0 0 0 0
GAME_OVER 4 6 11 -5 8 2 0
GAME_OVER 22 25 21 4 0 2 0
GAME_OVER 11 8 2 1 1 1 0
==== Evaluation based on MEDALS before and after ====
Player[0]: 0.916667
Player[1]: -0.0833333
Player[2]: -0.166667
==== Evaluation based on SCORE before and after ====
Player[0]: -0.333333
Player[1]: -1
Player[2]: -1
==== Evaluation based on TOTAL SCORE at the end ====
Player[0]: -0.999912
Player[1]: -1
Player[2]: -1
A playout from GAME TURN 50:
==== STATE BEFORE PLAYOUT ====
0 2 0 0 2 0 1 3 0 0 0 0 3
540 1 1 0 1 2 0 0 3 0 3 0 0
540 1 1 0 1 2 0 0 3 0 3 0 0
....#...#....#...#....#...#... 22 13 13 2 1 1 0
3649232 10 -8 -2 -12 10 -8 0
GAME_OVER 25 17 17 2 4 4 0
RDRDDLRRLU 10 7 10 0 1 0 0
==== STATE AFTER PLAYOUT ====
1701 3 0 0 2 1 1 3 0 0 1 0 3
540 1 1 1 1 2 1 0 3 0 3 0 1
1200 1 2 0 2 2 0 0 3 0 3 1 0
GAME_OVER 29 19 24 0 0 0 0
GAME_OVER 19 -9 -20 -10 9 -14 0
GAME_OVER 25 17 17 2 4 4 0
GAME_OVER 16 8 13 0 0 0 0
==== Evaluation based on MEDALS before and after ====
Player[0]: 0.416667
Player[1]: -1
Player[2]: 0.333333
==== Evaluation based on SCORE before and after ====
Player[0]: 0.166667
Player[1]: -0.732143
Player[2]: -0.404762
==== Evaluation based on TOTAL SCORE at the end ====
Player[0]: -0.994465
Player[1]: -0.998243
Player[2]: -0.996095
A playout from GAME TURN 95:
==== STATE BEFORE PLAYOUT ====
0 4 0 1 5 0 1 6 0 0 0 0 6
9504 3 2 0 1 5 0 0 6 0 6 0 0
9504 3 2 0 1 5 0 0 6 0 6 0 0
....#....#....#............... 4 4 3 2 2 0 0
43223 -4 3 8 5 7 4 0
GAME_OVER 27 13 13 -2 -1 -1 0
DULDUL 1 6 10 0 0 0 0
==== STATE AFTER PLAYOUT ====
0 4 0 1 5 0 1 6 0 0 0 0 6
9504 3 2 0 1 5 0 0 6 0 6 0 0
9504 3 2 0 1 5 0 0 6 0 6 0 0
....#....#....#............... 8 7 9 0 0 1 0
3 -3 3 13 3 11 3 0
GAME_OVER 27 13 13 -2 -1 -1 0
UL 1 6 10 0 0 0 0
==== Evaluation based on MEDALS before and after ====
Player[0]: -1
Player[1]: -1
Player[2]: -1
==== Evaluation based on SCORE before and after ====
Player[0]: -1
Player[1]: -0.346939
Player[2]: -0.346939
==== Evaluation based on TOTAL SCORE at the end ====
Player[0]: -1
Player[1]: -0.969075
Player[2]: -0.969075
Looking at the different evaluations now as I’m writing, they all look pretty bad… But I’m not sure how to proceed with the experiments to improve them.
Looking at your score at turn 95, you are winning, no? 4 golds in hurdles, 5 golds in archery, 6 golds in skating, that’s more than the others; but 0 medals in diving. Is that normal?
And no, I don’t use multithreading. I wrote some code optimizations: CG doesn’t like std:: routines, e.g. std::vector or rand(), so I remove them as much as possible, and I use many global variables (faster on CG). But some people have much faster code than mine.
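On the rand() point: a common replacement for playouts on CG is a tiny xorshift generator, which avoids the cost (and any locking) of the library routine. A sketch, with the seed constant being an arbitrary choice:

```cpp
#include <cstdint>

// xorshift64: a few shifts and xors per number, much cheaper than rand().
// Statistical quality is plenty for random playouts (not for cryptography).
static uint64_t rngState = 0x9E3779B97F4A7C15ULL; // arbitrary non-zero seed

inline uint64_t xorshift64() {
    rngState ^= rngState << 13;
    rngState ^= rngState >> 7;
    rngState ^= rngState << 17;
    return rngState;
}

// Integer in [0, n), e.g. randBelow(4) to pick one of the 4 moves.
// The modulo introduces a tiny bias, which is irrelevant for playouts.
inline int randBelow(int n) {
    return (int)(xorshift64() % (uint64_t)n);
}
```

Because the state is a single global uint64_t, this also fits the “many global variables” style mentioned above.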
Yes, somehow only diving is ignored and I do not earn medals there no matter what I do. I’m literally out of ideas about where the problem may be. I’ve checked the UCT selection and the backpropagation 100 times and I do not see what I’m missing…
Isn’t that because the Silver Boss only plays diving? What if you play vs. 1 silver bot and another bot?
I’ve tried different kinds of combinations and the result is not very different, but yes, sometimes I win gold in diving.
I think I’ll take a break from this task; it seems my skills could only get me so far. Thanks for all the help. I’ll try to implement DUCT for another game (Xmas Rush maybe). If I get stuck there too, I’ll implement it offline for the most basic simultaneous game I can find (a duel: 2 players, 2 moves per player, simple evaluation… hopefully). And afterwards I’ll try once more to cross the invisible barrier that prevents me from moving to Legend.
I decided to take another look at the PMs (post-mortems) in the contest forum thread, and the post by @_Royale caught my eye, especially:
- Reward is my_score / total_score (with an epsilon free silver medal for all players).
I was sure that I had a problem with my evaluation function by overcomplicating it. I decided to implement _Royale’s and it instantly worked: my bot was able to beat the Gold Boss for the first time, and I was very excited. I submitted with 2 for the exploration-vs-exploitation value and reached the top 10 of the Gold league; I couldn’t believe it. Then, after tweaking the value to 0.5 and 0.4, I was literally placed first in the Gold league. I stayed first for the whole night, during which I could hardly sleep. In the morning I had some ideas, but decided to submit again with 0.5 just in case, and it happened: LEGEEEEND
Finally I’ve managed to achieve the long-desired goal!
In the end it turned out that the evaluation/reward function is probably the most important part, and it needs to be tweaked to favor a bigger difference between the players’ scores. Unfortunately randomness plays a role in this task, but that’s part of the game.
For completeness here are the examples of states and different evaluations that I’ve tried:
GAME TURN 0
==== STATE BEFORE PLAYOUT ====
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
....#....#....#....#.......... 0 0 0 0 0 0 0
31155382221123 6 7 6 7 6 7 0
DRUL 0 0 0 0 0 0 15
DULRURDDRLLLL 0 0 0 0 0 0 0
==== STATE AFTER PLAYOUT ====
0 0 0 1 0 1 0 0 1 0 1 0 0
9 1 0 0 1 0 0 0 1 0 0 1 0
0 0 1 0 0 0 1 1 0 0 0 1 0
GAME_OVER 19 29 25 0 0 0 0
GAME_OVER 7 -1 4 4 18 14 0
GAME_OVER 20 20 25 3 3 4 0
GAME_OVER 5 2 2 2 0 0 0
==== Evaluation based on MEDALS before and after ====
Player[0]: 0.333333
Player[1]: 0.833333
Player[2]: 0.333333
==== Evaluation based on SCORE before and after ====
Player[0]: -1
Player[1]: -0.777778
Player[2]: -1
==== Evaluation based on TOTAL SCORE at the end ====
Player[0]: -1
Player[1]: -0.999971
Player[2]: -1
==== Evaluation based on MY SCORE / TOTAL SCORE and a silver medal epsilon ====
Player[0]: -0.666667
Player[1]: 0.333333
Player[2]: -0.666667
GAME TURN 50
==== STATE BEFORE PLAYOUT ====
0 2 0 0 2 0 1 3 0 0 0 0 3
540 1 1 0 1 2 0 0 3 0 3 0 0
540 1 1 0 1 2 0 0 3 0 3 0 0
....#...#....#...#....#...#... 23 13 13 0 1 1 0
3649232 10 -12 12 -10 12 -10 0
LUDR 6 4 4 -2 -2 -2 13
RDRDDLRRLU 15 10 10 5 0 0 0
==== STATE AFTER PLAYOUT ====
756 3 0 0 2 1 1 4 0 0 0 1 3
1920 1 2 0 2 2 0 0 4 0 4 0 0
900 1 2 0 1 2 1 0 4 0 3 0 1
GAME_OVER 29 17 17 0 1 1 0
GAME_OVER 4 -15 14 5 12 -11 0
GAME_OVER 25 21 21 4 2 4 0
GAME_OVER 16 19 11 0 0 0 0
==== Evaluation based on MEDALS before and after ====
Player[0]: 0.833333
Player[1]: 0.833333
Player[2]: -0.166667
==== Evaluation based on SCORE before and after ====
Player[0]: -0.481481
Player[1]: -0.047619
Player[2]: -0.553571
==== Evaluation based on TOTAL SCORE at the end ====
Player[0]: -0.99754
Player[1]: -0.993753
Player[2]: -0.997072
==== Evaluation based on MY SCORE / TOTAL SCORE and a silver medal epsilon ====
Player[0]: -0.437077
Player[1]: -0.0500677
Player[2]: -0.512855
GAME TURN 95
==== STATE BEFORE PLAYOUT ====
0 4 0 1 5 0 1 6 0 0 0 0 6
9504 3 2 0 1 5 0 0 6 0 6 0 0
9504 3 2 0 1 5 0 0 6 0 6 0 0
....#....#....#............... 4 4 3 2 2 0 0
43223 -4 3 8 3 7 4 0
GAME_OVER 25 13 13 -1 -1 -1 0
DULDUL 0 6 7 0 0 1 0
==== STATE AFTER PLAYOUT ====
0 4 0 1 5 0 1 6 0 0 0 0 6
9504 3 2 0 1 5 0 0 6 0 6 0 0
9504 3 2 0 1 5 0 0 6 0 6 0 0
....#....#....#............... 7 9 5 0 2 0 0
3 -2 8 3 1 3 5 0
GAME_OVER 25 13 13 -1 -1 -1 0
UL 0 6 16 0 0 0 0
==== Evaluation based on MEDALS before and after ====
Player[0]: -1
Player[1]: -1
Player[2]: -1
==== Evaluation based on SCORE before and after ====
Player[0]: -1
Player[1]: -0.346939
Player[2]: -0.346939
==== Evaluation based on TOTAL SCORE at the end ====
Player[0]: -1
Player[1]: -0.969075
Player[2]: -0.969075
==== Evaluation based on MY SCORE / TOTAL SCORE and a silver medal epsilon ====
Player[0]: -0.75814
Player[1]: -0.12093
Player[2]: -0.12093
I have some ideas for improvements, like removing the std::string I use in each game state for each mini-game, in an effort to increase the MCTS iterations, just to see if more iterations will improve my rank.
Thanks again for all the help. Awesome game and experience. Never stop believing and trying.
How do you play a game other than “course de haies” (the hurdles race)? I don’t understand the other games.
Can someone explain why I have 0? I have more gold medals??? Look at the picture.
All four games are controlled with the same move. If you output UP: in hurdles you jump and move two spaces; in archery your cursor moves up by the wind speed; in skating you make a certain move depending on the position of ‘U’ in the risk order; and in diving you score points or not depending on whether ‘U’ happens to be the move needed that turn.
You most likely have gold and silver medals in only three (or fewer) of the four games, and 0 in one (or more) of them. If you have 0 gold/silver → 0 points in one of the games, your total score is 0 as well, because the scores for all four mini-games are multiplied together to determine the final score.
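Concretely, if each mini-game is worth 3 points per gold and 1 per silver (the point values as I recall them from the rules), the multiplication looks like this, and a single medal-less game zeroes everything:

```cpp
// Final score: each mini-game contributes 3*gold + 1*silver points,
// and the four per-game contributions are multiplied together.
// (The 3/1 point values are taken from the challenge rules as I recall them.)
long long finalScore(const int gold[4], const int silver[4]) {
    long long total = 1;
    for (int g = 0; g < 4; ++g)
        total *= 3LL * gold[g] + silver[g];
    return total; // any mini-game with 0 medals zeroes the whole product
}
```

This is why a pile of hurdles golds is worthless without at least one diving medal: the diving factor stays 0 and so does the product.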
How do I avoid being 3rd? I simulate only one turn per step in my MCTS. How do I estimate the medals in the simulation? Can someone help me please?
If you only simulate one turn at a time, I would say you do not need to do anything about medal expectation. Every turn you get the number of medals as input (score_info gives, per player, the total points followed by the medals per mini-game). So each turn you can check in which of the games you are lacking medals, and based on that decide which game to prioritize.
The only medal-related use of one-turn simulation I see is predicting whether a game is expected to finish within the upcoming turn, and what your position might be at that finish.
In fact, I also do basically one-turn simulation, prioritizing games each turn based on the number of medals, with some expectation of whether games are about to end. Not enough for Legend (I should switch to full simulation for that, I guess), but at least enough for high Gold (currently Gold 207).
For each mini-game (except skating) it is pretty doable to write code dedicated to making the best possible move in that game. It is practically guaranteed that, if you run mini-game-specific code, you will get silver or gold for it multiple times (since the other players are not continuously focusing on that specific mini-game). If you then pick which mini-game-specific code to use based on which mini-game has your lowest number of gold/silver medals, you will most certainly start getting some wins.
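The selection step described above can be sketched very simply: pick the mini-game where you currently hold the fewest gold+silver medals (weighting gold higher would be one possible refinement, since it is worth more in the product):

```cpp
// Choose the mini-game to focus on this turn: the one with the lowest
// medal count (gold + silver), since the final score multiplies the
// per-game scores and the weakest game hurts the most.
int pickFocusGame(const int gold[4], const int silver[4]) {
    int best = 0;
    int bestCount = gold[0] + silver[0];
    for (int g = 1; g < 4; ++g) {
        int count = gold[g] + silver[g];
        if (count < bestCount) { bestCount = count; best = g; }
    }
    return best; // ties go to the earliest game index
}
```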
At first I did diving alone, winning it 2 times in Silver, and after that only hurdles and archery, doing skating alone. I skip the games with 0 score. But I still get many 3rd places.
I have a replay of my game; can somebody help me? What do you think about it?
How did you do your random playout? Totally randomly, like random.randint(0, 3)?
Yep, purely random.
Hi Di Masta,
Can you look at my code? I have the same problem as you, which I didn’t manage to resolve.
If you have time, and if you want to, I’ll be happy. Thanks, Di Masta, for answering me every time!!!
The main problem on my side was caused by the evaluation function. Try different approaches for it. The one that worked best for me was:
- Reward is my_score / total_score (with an epsilon free silver medal for all players).
Alright, nothing works for me!!! But I see some errors in my code; I think it’ll be better after I fix them.
Do you have one tree only, or for each player?