The perfect place to explain it all... and share with others how you proceeded.
As for me, a short summary of what I did in 1v1 :
General algorithm : minimax with alpha-beta pruning
At each depth level, I considered the following actions :
-- 1 move in the direction direction of shortest path
-- A number of walls centered around my pawn
-- A number of walls centered around my opponent's pawn
- The depth level was adapted with the following criteria :
-- Stop the search if the number of walls considered is >= 4
-- Stop the search if the total depth is >= 16
(this adaptative depth allows for the detection of unblockable corridors created by fixed-strategies)
Evaluation function was a combination of :
-- My distance
-- Opponent distance
-- My wall number
-- Opponent's wall number
-- GameTurn at which the wall is placed (-> if the wall can be placed later I'll do so)
-- Specific hardcoded coefficients explained below :
Minimax is great at local game calculations, but pretty bad regarding broad gameplay understanding. I tweaked it so that it would heavily favor placing horizontal walls between turn 5 and turn 9, and vertical walls between turn 10 and 11. This tends to create a box containing you and your opponent, with the exit where you want to go. It also retains all the good local decision-making of minimax for great in-game adaptability.
1v2 strategy was based on 1v1 with the following modification :
-- If an opponent is within 2 distance of the arrival, he's my target
-- If no opponent is within 2 distance of the arrival, my target is the one with the minimal (distance + 3*number of walls left)
-- Evaluation function is modified to take into account both opponents' distances to arrival with equal weight, and my distance to arrival with double weight.
My algorithm is based on paranoid. The way it works is similar to minimax, except that amongst the possible action, opponents will take the one that is the less favorable to me.
I took care of two situation in priority, out my recursion:
- No AI alive has a wall
- Moving once put me on a path that cannot be blocked and let me be the next one to end if I follow it.
For alI other cases, I discerned 3 state of the game:
Hadn’t time to really use it. It helps me to avoid some rare timeout occurring on specific situation, following my path.
Mid and late game:
I used flags to pinpoint preferred actions to play. I launched recursion(s) based on it to determine my preferred/best action.
Some were the calculated for both mid and late game:
- urgentToPutAWall => Was it imperative to put a wall to increase opponents’ paths?
- isTricky => Was someone trying to trick me? (This last one was especially useful in 1v2)
One was calculated for mid game, but was automatically true in late game:
- shouldPutASavingWall => Is this a moment where I can put a wall to defend my path?
Consequences of flags on my recursion:
- isTricky => My first action is a wall making unable one opponent to pursue its fixed strategy.
- urgentToPutAWall (and not isTricky) => My first action is a wall on any opponents’ paths.
- shouldPutASavingWall (and not urgentToPutAWall and not isTricky) => My first action is a wall that counter the combination of two walls that makes my path size to increase the most.
- Finally, when there was none of these flags, my first action could be either moving or putting a wall.
In some cases, I launched the recursion twice, (or even thrice in one case,) so I make sure the recursion would not calculate what was already calculated.
Finally, there is a last parameter for the recursion: depth. As I was using a paranoid algorithm, going too far in 1v2 makes my AI too fearful. Even though having a cute AI to comfort is great, I decided that it would fend for itself.
- Depth = 0 (My action only): isTricky
- Depth = 2 (My action and two next players actions): 1v1 or shouldPutASavingWall or when I was trying to put a wall with no flag on.
- Depth = 1 (My action and next opponent action): All other cases
- Difference of path size between first player to end and the second one. (Negative if I am not first)
- Difference of path size between second player to end and the third one. (Negative if I am third)
- Walls left of each player
In addition, hardcoded a value to tend being the first player to end in 1v1.
In addition, hardcoded a value to tend being the first player to end in 1v1.
Paranoid is a variation on minimax to allow for a game with more than 2 players. As I understand it, in 1v1 you end up having a min-max algorithm. If you have a correct evaluation function, minimax will look for solutions that make you end first. What is this hardcoded value you need to add to your evaluation function specifically for 1v1 ?
Yeah, in 1v1, the worst action an opponent can do to me is to play its best action.
Well, i find my 1v2 valuation more "correct" than than the one I used for 1v1, but for some reason, at some point of the challenge, I got better result in 1v1 adding "+100" to the valuation when I am first.
My 1v2 valuation is not "correct" enough so the hardcoded value is like using a crutch to go further in the ranking.
I think that as most of the players, my general algorithm is a minimax with alpha beta pruning. But I used a genetic algorithm to determine the weight in my evaluation function.
I put in place an iterative deepening (starting from depth 0), and was going deeper until I reach a time-out at 80ms. In order to keep the benefit of the alpha beta pruning, in 1vs2 I was only considering playing against the next player, and considered the third one was not playing at all.
Of course on each node I take into account the different movements of the players
I also have the basic offensive walls placed between the other players and their next shortest path cell.
Additionally I take also some defensive walls, behind the player (between the player and the cells around the player with a greater distance), but also the walls between the next cell and the cell with the same distances (that's usually the horizontal walls creating tunnels)
Finally, around each already placed walls (including those placed in the game tree) I also take the walls that continues the wall in one of the 3 possible directions. It allows my bot to foresee two wall placement in a row that might close an area, or potentially anticipate some walls that might close a path way before reaching it
Considering all these possible movements, it was really difficult to reach a good depth, and I needed to optimize a lot. I particularly enhance the function that allows me to know if a wall is valid or not. In this method I verified if the wall could create a closed area (it touches at least two of either a wall either a board limit) and if it is the case, double check with a kind of Dijkstra approach that it exists at least one path to one of the targets. But the branching factor is still high, and if at the beginning of the game I can go up to 4 in depth, in the late game, I might be able to only goes up to depth 2.
My evaluation function is a weighted sum of different characteristics of the game:
The shortest path to the objective. This is evaluated with a Bread First Search algorithm. I preferred it over A* because it supports multi destination in a single run. I also use these distances in other characteristics
The number of walls that are still available for the player.
The number of reachable cells squared. I am counting all the cells that are reachable for a player, knowing that I consider there is no possibilities to move from an objective cell to an other. It allows my bot to close himself in a small area because in this case he will have only a few reachable cells.
The number of branches along the shortest path. I simply count the number of neighbour minus 2 each cell has between my bot and the objective on the shortest path. It allows my bot to prefer positions in front of a tunnel instead of a wide area. It also allows to place defensive walls that will limit the number of path up to the destination.
Genetic selection of weights
Defining the weights of the different parameters was difficult. On the other hand I already developed a trueskill based ranking system that allowed me to play matches locally when I studied the impact of trueskill on the rankings. It was not a big deal to write a game implementation of the great escape, because I assume my bots were only producing valid moves. In this case, I just needed to remember the number of walls, their coordinates, and just check that the new position of a player is a winning position or not. In a couple of hours I had a working trueskill ranking system allowing me to play my own bots against themselves and select the bests. If it was nice to test new features and at least verify that it enhance my bots against previous versions, it also allowed me to put in place a kind of genetic algorithm to determine the best weights for my evaluation function. I modified my bot so it can take additional parameters at the start-up using the command line arguments. It was then taking those parameters into account, and with a same codeline, I was then able to play matches with different parameters. On each parameters, I took random values at the beginning for 20 players, played around 20 matches for each, select the 4 best, and then starting from those, generate new parameters which were either identical, mean, and a few brand new random values. This next generation played again matches, I select the best, create the next generation, and the process continues 5 times for each parameter.
It takes a lot of time to run those matches, that's why I limited it to only 20 players and 5 iteration on each parameter. Each run took around 6 hours to complete... But even if I can enhance the process, I find out that there was some of the characteristics of my evaluation function that were useless (I did not describe them), and it was also the opportunity to remove them. In the end I did not get the precise values as I expected, but it gave me an interesting estimation of the best parameters, and allowed my bot at that time to climb up from the 100 position to the top 20.
Improvements not done
I had several interesting ideas that I did not implement due to lack of time.
The first one is to place the wall only when necessary. I usually place my walls way too early, and I underestimate it before reading all your strategies and seeing it was an important thing. That was not so costly to implement, and it might have been a great improvement...
The second one is the number of path your bot has to reach the objective. When reading strategic advises on quoridor, you quickly understand that one of the goal of the game, even more than the distance to the objective, is to minimise the number of possible path you have, while maximizing the number of your opponents. I started an implementation of connected components, that would group the walls that touch themselves, and see them as blocks you will have to go around. If your opponent has more blocks in front of him, then he has more possible path.
My branching factor was quite high due to the number of walls I selected. That would be great if I limited them to only those that were really expanding the continuous walls, and avoid selecting all the intermediary.
Nice debriefing. By the way, I'm always interested in seeing your source code if you want to share it.
Early on in the competition, I decided that I was not going to use minimax. I did some back of the envelope calculations and found that I will only be able to go to depth 4 in 2-player games and that is not enough.
Instead I made the following simplifying assumptions:
- The opponent doesn't place any walls. Instead the opponent only walks. Each time the opponent will choose a move that reduces his path to the goal.
- I can only place walls adjacent to the opponent. This means I only need to consider up to 8 wall placements per opponent.
I considered two types of moves:
- Place a wall now, walk for K turns, place another wall
- Walk for (K+1) turns, place a wall
I then evaluated the value of the end state of each possible sequence, found the best sequence (highest value) and played the first move from that sequence. The value of a state was: 100*(enemyDistanceToGoal - myDistanceToGoal) + (myWallCount - enemyWallCount). In 3-player games the enemy was chosen as the one who is closest to their goal.
During the first week, I stumbled upon a fascinating fixed strategy by Laochao[TEAM_AUBAY]. He was making a nice passage for himself around the edge. The strategy was so powerful that it usually won against the top players at the time. I implemented this strategy with some minor improvements and shot up into the top 10. The next day I saw more people using this strategy and I've heard that people were discussing my bot. Then I saw people countering this strategy and realised that it is not going to hold for much longer.
In secret I worked on a new improved fixed strategy. It was even better than the first and was easier to implement. All you had to do is place 4 walls away from you, like so:
I did this for the first few moves and then switched to my regular search strategy (above). The minimax players at the time struggled against this strategy because they didn't have enough lookahead depth. If your opponent was below the line (B) then you could block his path (to B) with your remaining 6 walls and he would have to walk all the way around. By that time you were already near the end. Your opponent couldn't block you easily because this was your only path to the goal. Similarly this worked well when your opponent was above the line as you just need to block the path to A. I decided to keep this strategy as a secret until the end. Unfortunately in the last two days I noticed other bots using this same strategy, it even worked well in some 3-player games. I believe they came up with the same idea independently as it was very close to the surface. Now I had nothing to lose and submitted my bot with this strategy, but by this time the minimax players got really good and I could only manage top 30. My final submission was a fixed strategy that could counter people using the same strategy. Basically if I detect the opponent using this strategy then I put a wall in front of them like so:
B|---- | <---- put this wall!
I was disappointed with my final submission because it only got 52nd. I think if they played more games I could possibly slowly climb back up into top 30, but that didn't happen. This was my only complaint about the competition. Everything else was really great. The game was a good choice and the competition was very strong. The fact that it is a well known game didn't spoil the competition for me, because current AI strategies for Quoridor are not that good. I hope we have improved the current AI and hopefully even the way humans play. I believe these fixed opening strategies could also help human play.
My name is Roman, and I was placed 3rd in the final rankings.
As this is my first contest here at Codingame, I'd like to say a few words about this platform. I think it is very cool because it supports a lot of languages. There is a lot of single player tasks where you can practice and earn some funny achievements. Player of battles is also very cool and handy. I appreciate a lot the frequency of battles between players, which allows to estimate your new AI version in less than hour.
One thing that frustrated me was that I didn't receive any answer from the organizers on my topic "How will winners of The Great Escape be determined?" =(
Now about my AI, which was quite good, but of course not so good as Recar's one =)
At the beginning of the contest I decided that minimax is not the way to go because of large amount of possible wall placements on each turn. So I ended up with some kind of a greedy algorithm, that was some mix of heuristics and brute-force methods with some reasonable assumptions. I said this algorithm is greedy because it tries to do some actions in the fixed order of their importance, and performs first suitable action. On each turn my AI used the following scenario without any information from the previous turns:
1) If I have a path to the goal, such that no opponent can finish before me and no wall can block this path, I'll go straight to the goal.
2) If I can put some wall which will lead to the situation above, I'll do this. This search was done with backtracking with depth = 3. That means I do recursive search of all possible placements of three walls discarding the actions of the opponent.
3) If opponent can put some wall with similar property, I find a wall to prevent this and put this wall.
4A) Find wall which I can put now in order to make opponent's path as long as possible. This search was done with backtracking with depth = 5, assuming that opponent won't put any walls and will just move towards his goal. Lets say this is currentBestWall.
4B) Find similar wall, but for the next turn, when opponent made one step. Lets say this is nextBestWall.
4C) If nextBestWall will make opponent's path not so long as currentBestWall, put currentBestWall right now.
5) If I can put some wall which will leave me in small part of map, I'll do this. This search was done with backtracking with depth = 3.
6) If opponent can put some wall with similar property, I find a wall to prevent this and put this wall.
7) Otherwise I move towards my goal.
For 1vs2 battles I decided that my opponent is always the next player. I thought:
If I am player 0, and player 2 is about to win, why should I spent my turn in order to prevent it? There is player 1 who can block him, so I will care about blocking player 1 supposing that he will block player 2!
Of course there are many details and optimizations which allowed to perform all this recursive searches in less than 100 ms. Final version of my AI has 1200 lines of code.
Are you willing to share some of the optimizations you used in order to perform so many recursive searches? Would be very interesting!
I tried to use optimal types to store and copy data, and used MapReduce to evaluate branches (actually, poorly implemented Map, and parallelized the Reduce function).
To give you an example of what I call optimal type to store data, here are what I did for my BFS, which is very linked to the performance of the recursion. The types I give in my explanation are Java type.
First of all, after looking at how optimizations are done for chess AI, I decided that positions (players and walls) would be Short. Actually, I wanted 4 bits for the x position and 4 bits for the y position. (Byte type use 8 bits, but one bit is used for negative number so it wasn't handy).
In my BFS, I use a hash that gives me the previous point needed to get to one point. As points are of Short type, there are no possible collisions and this hash is accessible in O(1).
This hash allowed me to check whether I already tried a point or not, and I used it to recreate the shortest path.
To know which point I should check, I used the Queue ArrayDeque. I just wanted to push elements at the end of this queue and pop the first element. To do this, ArrayDeque is the fastest type I know.
To check if one point is on the objective, I used bitwise operations. Objective points were either going to a fixed x or a fixed y, so I put the non-fixed coordinate as 0x0F to distinguish it easily.
Even though it was fast, I just realized I could have done better: instead of doing bitwise operations every time, I could have created a hash in the beginning of the first round that link an objective to a Set of points. Then, I would have to get in O(1) the Set (only once per path finding) and look in O(1) if the Set contains the current point.
When the current point was not the objective, I used a hash to add all neighbors to the queue. This hash was created in the beginning of the first round and updated at the beginning of each round to remove neighbors that were not accessible because of a wall one AI put on the board.
In a nutshell, I stored data in hash (with no collisions) and queue, and used MapReduce. When possible, hashes were created upstream.
I tried two approaches.
First was mini-max with alpha beta pruning. However, after some tries, I decided to experiment with simple heuristics version and it shows not bad result. I had some ideas how to improve mini-max version but it has not so much sense =)
One of unusual things =). Max path length heuristic. I called this one “path area”. It counted as sum of all areas, which connected to my dragon and to exit or which connected to my dragon’s area at least with two cells. Exit cells considered as impassable.
Here are some examples. “Path area” for Orange dragon marked
Opponent cannot make my path longer than this “path area”. Sometimes it is greater than actual max path length but never is less.
Usually our rival is next player. However, if he already won or his minimal path is longer than my maximal path I take
next rival. Alternatively, if my minimal path is better and rival has no walls
to change it.
Hardcoded turn for pattern
If enemy puts first wall on one of those lines then I put my wall near with one cell gap. This pattern become active in last day, so it was fast fix =)
Check if enemy can cut his max path by placing one wall and if we can block this turn – block it.
In this Replay my first wall prevents closing area by red player.
Find if I can cut my maximal path with three walls for 2p game or with two walls for 3p game then do it. If this is multiwall
solution then select wall which makes enemy path longer and put this wall first.
If enemy can put super blocking wall and we can block his turn – do it. Super blocking is depends from number of walls we have and from number of players. Usually, it is from three to five cells.
Looking for one blocking wall on entire board, which can delay enemy at least on five cells. Check if enemy cannot use it on next turn to super block me. If he can't then place this wall.
Finding best blocking wall with in depth search.
Some kind of mini-max but very limited. Enemy can only select one of decreasing his path turns. Walls placed only to block opponent from one of next moves or we can skip place wall. Depth is about 10 and maximal amount of our walls is four.
After wall placing turn found, check if enemy cannot use it to delay me more. If he cannot do this turn.
If we are here, it means we decide not to place wall on this turn. I just check all equal decreasing our path moves and selecting best which rival can delay less.
I looking for only next steps not entire path. Only cells marked by green on this image:
I send wave from all target points until it reaches my dragon.
100000 calls of this function on empty board from (8,5) for red player consumes about 110-120ms on codegame’s servers.
Minimal path length.
Often I need only minimal path length, but do not care about path itself. So, for this case I have separate function. It sends wave from dragon until it reaches any exit point.
100000 calls of this function on empty board from (8,5) for red player consumes about 75-80ms on codegame’s servers.
@olegkuznecov this is some amazing work, it must have taken you at least one hour or two to just share all this with printed image and stuff, great sharing
Thank you for sharing!
Hi Recar, I'm actually working on Great Escape multiplayer game. I tried MinMax but i find out it will be very hard to have something very effective due to the execution time. So i'll try an heuristic version too and i read your strategie feeback but I'm not sur I understood your "path area". Is this all the cells that your dragon can walk in if he follows the shortest path (considering that the opponent can block you). Here is an example can you tell me if the "path area" is correct in this picture for the orange dragon :
I didn't really understand how the path area is created. The starting point are obviously all areas connected to the dragon or to the exit. But I can't see how the rule "connected with 2 areas in the path area" can create the entire path area. Something is blocking me.
Is there any pseudo code to explain it?
My evaluation functions for finding a path and checking for playable walls seem really slow. Does anyone have any clever data structures they used to minimize generating new objects while searching for winning paths and things like that? I'm doing a lot of recursive a* and it's very expensive.