The Accountant - Feedback & Strategy

It explains why I moved up two places. But how did they cheat? (If it can be disclosed.)

What did they do?

Hi there,

On my side, much like people described above, I was simulating many entirely random moves with a very deep evaluation (up to 60 turns into the future, unless the game was won or lost before that). At first it was a simple Monte Carlo search, ranking solutions only by their final outcome (win / lose, then end score). Then I moved to a Genetic Algorithm search, which allowed it to converge more predictably to good solutions.

My final evaluation function ranked solutions based on, in order (a rough sketch follows this list):

  • whether or not we killed at least one enemy: if Wolff doesn’t kill anyone, the game is lost.
  • whether or not we are alive (or killed only after 5-10 turns, which I treated as being “kind of alive”, hoping to find a better solution in the following turns). This allowed the algorithm to keep solutions that looked almost good, and to focus on staying alive only if we were going to die within the next few turns.
  • if Wolff isn’t alive at the end of the solution, the later we die, the better. If Wolff is alive, the depth is only taken into account after the score, and if it’s a winning solution (no more enemies), the depth is always considered maximal.
  • maximize the score, as a decaying sum of the score for every turn of the solution. This favored solutions where we lose data points less quickly, or where we kill enemies more quickly. The decay factor I used was around 0.95 (so each turn’s score is weighted by 0.95^turn in the compared sums).
  • maximize the distance to the enemy that killed Wolff, if he’s dead at the end of the solution.
  • maximize the ratio of total enemy life to total shot count.
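
To make that ordering concrete, here is a compressed sketch of how such a lexicographic comparison could look. The field names, the decay helper and the way I fold the depth criterion in are illustrative only, not my actual code.

```cpp
#include <cstdio>
#include <vector>

// Illustrative summary of one simulated solution (not the real data layout).
struct Outcome {
    bool killedAnyEnemy;    // criterion 1
    bool aliveish;          // criterion 2: alive, or only killed after 5-10 turns
    bool won;               // no enemies left at the end
    int  depth;             // turn at which the solution ends (or Wolff dies)
    double decayedScore;    // criterion 4
    double distToKiller;    // criterion 5: only meaningful if Wolff died
    double lifePerShot;     // criterion 6: total enemy life / total shot count
};

// Decaying score sum described above: each turn's score weighted by 0.95^turn.
double decayedScore(const std::vector<double>& scorePerTurn, double decay = 0.95) {
    double total = 0.0, w = 1.0;
    for (double s : scorePerTurn) { total += w * s; w *= decay; }
    return total;
}

// Returns true if a should be ranked above b (simplified: criteria applied strictly in order).
bool better(const Outcome& a, const Outcome& b) {
    if (a.killedAnyEnemy != b.killedAnyEnemy) return a.killedAnyEnemy;
    if (a.aliveish != b.aliveish) return a.aliveish;
    // Winning solutions count as maximum depth; when dead, dying later is better.
    int da = a.won ? 1 << 30 : a.depth;
    int db = b.won ? 1 << 30 : b.depth;
    if (!a.aliveish && da != db) return da > db;
    if (a.decayedScore != b.decayedScore) return a.decayedScore > b.decayedScore;
    if (a.aliveish && da != db) return da > db;   // depth only after the score when alive
    if (a.distToKiller != b.distToKiller) return a.distToKiller > b.distToKiller;
    return a.lifePerShot > b.lifePerShot;
}

int main() {
    Outcome a{true, true, false, 42, decayedScore({50.0, 50.0, 40.0}), 0.0, 8.3};
    Outcome b{true, false, false, 30, decayedScore({60.0, 20.0, 10.0}), 1500.0, 9.1};
    std::printf("a better than b: %d\n", better(a, b));
}
```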

I quickly implemented the initial version, which got me quite a high score on the first Sunday; I was ranked first and couldn’t really believe it.

The main trouble for some time was timeout issues that I didn’t understand at first (test case 27, I’m looking at you). It took quite some time to figure out what was happening.

In the end, my Genetic Algorithm still didn’t work very well on scenarios with many entities (probably it wasn’t simulating enough solutions to be effective), so I also implemented a fallback to my simpler Monte Carlo search when the number of enemies was over 50, switching back to the GA once enough of them had been killed.

I managed to stay ranked first during the first week, spending most of my evenings improving my code, but then on Sunday, after losing my rank, I decided to step back and didn’t make any more changes.

For one thing, I didn’t exactly know what to improve. I knew my scores on the Extreme test case weren’t very good and that could have been a lead, but overall I felt I was already spending too much time on the challenge and didn’t want to spend the following week on it as well.

That’s probably the main negative point in my opinion: two weeks is too long for me, given the amount of involvement I generally put into CG challenges. Otherwise the challenge was fun, and the game mechanics were fairly straightforward and easy to implement.

6 Likes

Can you explain the ‘cheating’ in detail (as detailed as you can), please?

+1. How could they cheat? Did they hardcode actions, as some people did in Code vs Zombies?

How did you implement crossover? I tried PMX, as I had no “shoot enemy” genes but “kill enemy” genes, and you obviously can’t kill an enemy twice. But I didn’t find any situation where it did something useful; I got my best scores with mutation only (adding move commands between kill commands, swapping two kill commands).
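
For reference, the two mutations I kept look roughly like the sketch below. The gene layout and names are simplified for illustration; my real chromosome is messier.

```cpp
#include <cstdio>
#include <random>
#include <vector>

// Simplified gene layout: a kill command targets an enemy id, a move command
// is an offset to walk before the next kill. Purely illustrative.
struct Gene {
    bool isKill;
    int  enemyId;   // valid if isKill
    int  dx, dy;    // valid if !isKill
};

std::mt19937 rng{12345};

// Mutation 1: swap two kill commands (their enemy ids), leaving move genes in place.
void swapTwoKills(std::vector<Gene>& genes) {
    std::vector<int> killIdx;
    for (int i = 0; i < (int)genes.size(); ++i)
        if (genes[i].isKill) killIdx.push_back(i);
    if (killIdx.size() < 2) return;
    std::uniform_int_distribution<int> pick(0, (int)killIdx.size() - 1);
    int a = killIdx[pick(rng)], b = killIdx[pick(rng)];
    std::swap(genes[a].enemyId, genes[b].enemyId);
}

// Mutation 2: insert a random move command somewhere between the kill commands.
void insertRandomMove(std::vector<Gene>& genes) {
    std::uniform_int_distribution<int> pos(0, (int)genes.size());
    std::uniform_int_distribution<int> off(-1000, 1000);
    genes.insert(genes.begin() + pos(rng), Gene{false, -1, off(rng), off(rng)});
}

int main() {
    std::vector<Gene> genes = {{true, 3, 0, 0}, {true, 7, 0, 0}, {true, 1, 0, 0}};
    swapTwoKills(genes);
    insertRandomMove(genes);
    for (const Gene& g : genes) {
        if (g.isKill) std::printf("KILL %d\n", g.enemyId);
        else          std::printf("MOVE %+d %+d\n", g.dx, g.dy);
    }
}
```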

As far as I know you can write local files when using Bash, for instance (which I sometimes use to compile part of my Bash script to C++ and then run it from the local directory, which is quite practical for beating the speed requirements on some puzzles ^^). I suspect some people could have dumped the validators into a file in the local directory and then retrieved it; that is one possibility, for instance.

Stuff like that seems like obvious cheating, so I didn’t even try to go that way.

No matter how long the contest is, I always feel like I ran out of time, even though I forced myself to code faster than I normally would. I spent most of my time devising algorithms in my head and feel like I should have been trying to code things faster. But I’ve been programming like that for ages and it’s hard to change now :slight_smile:

Single-point crossover (applied with a probability of 0.88).
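
Roughly like the sketch below, nothing fancy; the chromosome is just treated as a flat vector of genes, and this isn’t my exact code.

```cpp
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

std::mt19937 rng{42};

// Single-point crossover applied with probability 0.88; otherwise the parents
// are passed through unchanged. The gene type does not matter to the operator.
template <typename Gene>
std::pair<std::vector<Gene>, std::vector<Gene>>
crossover(const std::vector<Gene>& p1, const std::vector<Gene>& p2) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    size_t minLen = std::min(p1.size(), p2.size());
    if (coin(rng) >= 0.88 || minLen < 2)
        return {p1, p2};
    std::uniform_int_distribution<size_t> cutDist(1, minLen - 1);
    size_t cut = cutDist(rng);
    std::vector<Gene> c1(p1.begin(), p1.begin() + cut);
    c1.insert(c1.end(), p2.begin() + cut, p2.end());
    std::vector<Gene> c2(p2.begin(), p2.begin() + cut);
    c2.insert(c2.end(), p1.begin() + cut, p1.end());
    return {c1, c2};
}

int main() {
    std::vector<int> a{1, 2, 3, 4, 5}, b{6, 7, 8, 9, 10};
    auto kids = crossover(a, b);
    return (int)kids.first.size();  // just to use the result
}
```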

My solution was based on building a search tree for the world and selecting the best possible moves - basically similar to what multiple people have already described.

Firstly, the world state changes are deterministic and known, so I coded the world evaluation function based on the problem description. Getting this part right was pretty critical to the solution. Even so, I think there were some mismatches between the judge system and my implementation, which might have led to lower scores for me.

At each step I kept the search tree and a queue of unevaluated nodes leading deeper into the search space. If there was a mismatch between the state of the root of the search tree and the current input from the judge system, the entire tree was rebuilt from scratch. To reduce the search space I maintained a set of seen states, where states were described in a reduced form: the player position divided by the enemy step size (to shrink the board space), plus vectors of alive enemies and data points.
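
Conceptually the reduced state looked like the sketch below. This is not my actual layout, and the step-size constant is just an assumed value to illustrate the quantization.

```cpp
#include <unordered_set>
#include <vector>

// Enemy step size used to quantize the player position; the value here is an
// assumption for illustration, use the one from the statement.
constexpr int ENEMY_STEP = 500;

struct StateKey {
    int qx, qy;                      // player position divided by the enemy step size
    std::vector<bool> enemiesAlive;  // which enemies are still alive
    std::vector<bool> pointsAlive;   // which data points still exist

    bool operator==(const StateKey& o) const {
        return qx == o.qx && qy == o.qy &&
               enemiesAlive == o.enemiesAlive && pointsAlive == o.pointsAlive;
    }
};

struct StateKeyHash {
    size_t operator()(const StateKey& k) const {
        size_t h = std::hash<int>()(k.qx) * 31 + std::hash<int>()(k.qy);
        for (bool b : k.enemiesAlive) h = h * 2 + b;
        for (bool b : k.pointsAlive)  h = h * 2 + b;
        return h;
    }
};

StateKey makeKey(int px, int py, std::vector<bool> enemies, std::vector<bool> points) {
    return StateKey{px / ENEMY_STEP, py / ENEMY_STEP, std::move(enemies), std::move(points)};
}

int main() {
    std::unordered_set<StateKey, StateKeyHash> seen;
    seen.insert(makeKey(4200, 1300, {true, true, false}, {true}));
    bool duplicate = !seen.insert(makeKey(4201, 1301, {true, true, false}, {true})).second;
    return duplicate ? 0 : 1;  // both positions quantize to the same cell, so this is a duplicate
}
```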

To explore the search space I had a few move variants: shoot the closest enemy, shoot the enemy closest to a data point, move to the closest enemy, move to the enemies’ centroid, run from the enemies, and so on. To implement the “run from the enemies” step I calculated vectors from each close enemy and wall to the player, summed them up, normalized the result and used it as a direction; to handle the case where enemies are distributed all around the player, I searched for the widest opening between enemy points, in terms of the angle seen from the player position. This part wasn’t implemented entirely correctly in the final solution, but it worked well enough, so I kept it.
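
The repulsion part of the “run from the enemies” step amounts to something like the sketch below (names are made up, and the wall handling and the widest-opening fallback are left out).

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec { double x, y; };

// Sum of unit vectors pointing from each nearby enemy towards the player,
// normalized into a flee direction. The radius is illustrative.
Vec fleeDirection(const Vec& player, const std::vector<Vec>& enemies, double radius = 4000.0) {
    Vec sum{0.0, 0.0};
    for (const Vec& e : enemies) {
        double dx = player.x - e.x, dy = player.y - e.y;
        double d = std::sqrt(dx * dx + dy * dy);
        if (d > 1e-9 && d < radius) {   // only enemies that are close enough
            sum.x += dx / d;             // unit vector away from this enemy
            sum.y += dy / d;
        }
    }
    double len = std::sqrt(sum.x * sum.x + sum.y * sum.y);
    if (len < 1e-9) return Vec{0.0, 0.0};   // surrounded, or no close enemies: fall back to the angle search
    return Vec{sum.x / len, sum.y / len};
}

int main() {
    Vec dir = fleeDirection({8000, 4500}, {{7000, 4500}, {8000, 3500}});
    std::printf("flee towards %+.2f %+.2f\n", dir.x, dir.y);
}
```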

I coded my solution in C++. The solution file was growing pretty quickly, and I wanted to have separate testable modules, so I implemented a simple system to combine multiple source files into a single output artifact that can be pasted into the CodinGame IDE. It consists of a simple Python script that reads a description file, which simply lists the filenames of the headers and sources in the order in which they should be combined into the final solution file. I automated the build (calling the combining script and compiling everything) using CMake.

For anyone interested here is my solution: https://github.com/xffox/codingame_the_accountant .

My end score was ~36000.

This was real fun, many thanks to the organizers and players.

2 Likes

The solution was implemented in C++, but I used Python and a Jupyter notebook for the local UI and tools. pybind11 makes it easy to call C++ code (the simulation) from Python (in the notebook). Also, with Python I could gather and display nice stats to know how the GA was performing.
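
The binding side is tiny; something along the lines of the sketch below, where the module name and the Simulator API are made up for the example.

```cpp
// Minimal pybind11 module exposing a C++ simulation entry point to Python.
// Everything here (module name, Simulator API) is a made-up example, not my real code.
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <string>
#include <vector>

namespace py = pybind11;

struct Simulator {
    int turns = 0;
    // Plays a list of commands and returns the resulting score (dummy here).
    double play(const std::vector<std::string>& commands) {
        turns += (int)commands.size();
        return 1000.0 - turns;
    }
};

PYBIND11_MODULE(accountant_sim, m) {
    m.doc() = "Toy simulation bindings for use from a Jupyter notebook";
    py::class_<Simulator>(m, "Simulator")
        .def(py::init<>())
        .def("play", &Simulator::play)
        .def_readonly("turns", &Simulator::turns);
}
```

From the notebook you then just import the compiled module, drive the simulation from Python, and keep all the plotting and stats gathering on the Python side.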

The Jupyter part is really great. I’ve wanted to implement something like this, but never had a good idea of how to organize it. Thanks for sharing.

Finished rank 8 after the leaderboard correction. It was quite fun overall, but not as exciting as Hypersonic, for reasons I will detail below.

Game feedback

  • Even if it started way too soon after Hypersonic, I really enjoyed the 2 weeks (3 week-ends!), it allowed me to think about things without feeling rushed.
  • The description of the mechanics was stellar this time, save for the small floating-point bug that ended up fixed. Please keep up this level of detail in the future.
  • The game was simple to grasp, easy to get started with, and had good depth, making it hard to master. Not all games need to be like that, but the simplicity was refreshing.
  • Way too few test cases at the start, but the decision to add more later was great. I still think there could have been more, and closer to the validators (for example, validator 34 had nothing close to it).
  • Lack of feedback from the validators was frustrating.
  • Having a single number to work with is not motivating either. Why did my latest submit drop 2k points when I worked hard for it to actually score higher and more consistently in the IDE? Who knows! There was no point even using the IDE at the end, as it was barely representative of a submit result. Just guesswork about what seems to score better.
  • The scoring method in these challenges needs a big overhaul. Having to spam submits in the hope of landing your actual best was not fun, especially with the huge variance in score (1000-1500) I had with the same code, which was entirely deterministic given a similar number of iterations.
  • Too much variation in server performance, especially with the memory cache I believe. I think it significantly penalizes some classes of algorithms. Probably nothing that can be fixed, but definitely something I will keep in mind for next time.

My strategy

  • I used a best-first variation of beam search for exploring potential moves. It kept a maximum of 512k nodes (player actions with heuristic), 64k states (all enemy positions at a given point in time) and 64k timelines (uninterrupted sequences of states). When one of the buffers got full, it was trimmed to make room for the next expansions.
  • The actions explored were only the direct low-level ones: moving and shooting. I tried implementing higher-level actions like I did in Hypersonic, without much success. There is really only one goal here (shoot enemies) and too many viable ways to go about it for this to be helpful.
  • The movement strategy starts by going in 16 directions around the player at full speed, then keeps going in that direction for at least 3 turns. Afterwards, 5 new sub-deviations are added to the original direction to compensate for the lack of precision, and it keeps searching in those directions (see the sketch after this list).
  • Should an enemy get close enough, the previous strategy is ignored, and the next movements are much more varied and fine-grained: 9 directions around the general direction and 5 speeds are considered, but only for as long as the enemy remains close.
  • After each move, try shooting targets. Once it starts shooting, it will not consider moving again until the target is dead. The rationale is that if a move was actually needed to survive the shootout, the player should have started shooting from another position in the first place, so pruning that node early was good.
  • The balance of exploration vs. exploitation was done through the heuristic. I tried different things but narrowed it down to the things that actually helped me score better:
    • Shooting for low damage had a much higher cost.
    • Moving in a way that does not decrease the minimum distance to an enemy gets a big penalty.
    • Killing an enemy yields a good score, but only if the average damage per shot is reasonable; otherwise it is devalued.
    • A point getting destroyed incurs another big penalty. It helps the algorithm to consider many scenarios before accepting that a point cannot be saved and move on.
  • I could not figure out a way in my final version to balance good overall scoring strategy while correctly solving the “Near impossible to save any points” case.
    • It always preferred shooting the nearby targets, realizing too late that the points would be destroyed by the single armored guy.
    • Any tweak to make it consider that scenario (without hardcoding) would end up screwing up the scoring in other test cases.
    • I would have needed a major revision of my search technique to make it happen, but I decided I did not have enough time left to try yet another approach, preferring to do well on all the other cases, which should give me a better score. Had I known I would end up scoring so well, maybe I would have tried again to get everything right instead. That could have gotten me very close to the top 3.
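
To illustrate the movement candidates mentioned above, here is a rough sketch; the full-speed distance and the deviation spread are assumed values, not necessarily my exact ones.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Assumed full-speed move distance per turn; take the real value from the statement.
const double MOVE_DIST = 1000.0;
const double PI = std::acos(-1.0);

// Initial candidates: 16 headings around the player, each at full speed.
std::vector<Point> initialHeadings(const Point& player) {
    std::vector<Point> out;
    for (int i = 0; i < 16; ++i) {
        double a = 2.0 * PI * i / 16.0;
        out.push_back({player.x + MOVE_DIST * std::cos(a),
                       player.y + MOVE_DIST * std::sin(a)});
    }
    return out;
}

// After a few turns on one heading, refine it with 5 small sub-deviations
// around that heading (the spread used here is illustrative).
std::vector<Point> refineHeading(const Point& player, double heading) {
    std::vector<Point> out;
    for (int i = -2; i <= 2; ++i) {
        double a = heading + i * (10.0 * PI / 180.0);   // -20, -10, 0, +10, +20 degrees
        out.push_back({player.x + MOVE_DIST * std::cos(a),
                       player.y + MOVE_DIST * std::sin(a)});
    }
    return out;
}

int main() {
    Point p{8000, 4500};
    return (int)(initialHeadings(p).size() + refineHeading(p, 0.0).size());  // 16 + 5 = 21
}
```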

Other notes

  • I thought of making a visualization of my algorithm, but was not sure the time investment would be worthwhile vs. testing out more stuff. After much firing in the dark, and seeing what others did, I will definitely prepare better so I have an easy way to produce metrics and visualizations for the next contest. Great work there, guys!
  • The 1 second on the first move, and the 25-submits-per-5-hours limit, should have been part of the statement. Ideally, no one should need to hang around in the chat to get all the information.

Thanks again for the contest. Looking forward to Fantastic Bits!

3 Likes

Could you explain more about this, please? :slight_smile:

That was a great challenge. Many contributions here are stochastic, and I believe that was the best option. Which is, of course, why I went fully deterministic, basically with a topological sort. In short, it goes like this:

  • Extract the next positions of the enemies. There is a choice here: are the enemies dependent on each other or not? Since we plan to kill them in order, I assumed they are independent (i.e. each enemy reaches its target before choosing another one).
  • We now have a table of the estimated arrival of each enemy at each data point. Now we run a single simulation: kill the first enemy to reach a data point, then the next, and so on. If we realize that some data point cannot be saved, we put it in a trash can and restart the simulation from scratch, ignoring every trashed data point. When we are done, we simply know which data points to ignore and which ones to save.
  • To kill an enemy, we assume that the best way is to move n steps at full speed towards the enemy’s target data point, then shoot s times until it is dead. We (almost) always want to minimize the total time (n+s), then s. So we try each combination – there are only a few of them – of n and s according to the future positions of the enemy (see the sketch after this list). (The exception is when there is only one enemy left: then we just want to minimize s before the enemy reaches any data point.)
  • Of course, some adjustments have to be made to avoid being in the path of the enemies. This has to be done carefully, as it can become quite costly.
  • Even though the algorithm is polynomial, it times out on the larger tests (with a Python 3 implementation), so I had to fall back on a basic strategy (shoot the closest enemy) for those.
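
The n/s search is basically the double loop sketched below (written in C++ here just as a sketch; my real code is Python, and the move distance and damage formula are placeholders showing the structure, not guaranteed to match the statement).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

struct Point { double x, y; };

// Assumed constants / formulas for the sketch; plug in the real ones from the statement.
constexpr double MOVE_DIST = 1000.0;
int damagePerShot(double d) {                         // placeholder damage model
    d = std::max(d, 1.0);
    return std::max(1, (int)(125000.0 / std::pow(d, 1.2)));
}

double dist(const Point& a, const Point& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// Move one turn at full speed from `from` towards `target`.
Point stepTowards(const Point& from, const Point& target) {
    double d = dist(from, target);
    if (d <= MOVE_DIST) return target;
    return {from.x + (target.x - from.x) * MOVE_DIST / d,
            from.y + (target.y - from.y) * MOVE_DIST / d};
}

// Try every (n moves, then shoot until dead) plan against the enemy's known future
// positions, and keep the one minimizing n+s, then s.
std::pair<int, int> bestPlan(Point player, const Point& enemyTargetData,
                             const std::vector<Point>& enemyPath, int enemyLife) {
    std::pair<int, int> best{-1, -1};
    Point pos = player;
    for (int n = 0; n + 1 < (int)enemyPath.size(); ++n) {
        // Shoot from `pos` starting at turn n until the enemy is dead (or we run out of path).
        int life = enemyLife, s = 0;
        for (int t = n; t < (int)enemyPath.size() && life > 0; ++t) {
            life -= damagePerShot(dist(pos, enemyPath[t]));
            ++s;
        }
        if (life <= 0) {
            if (best.first < 0 || n + s < best.first + best.second ||
                (n + s == best.first + best.second && s < best.second))
                best = {n, s};
        }
        pos = stepTowards(pos, enemyTargetData);   // take one more move turn before shooting
    }
    return best;
}

int main() {
    std::vector<Point> path = {{6000, 2000}, {5500, 2000}, {5000, 2000}, {4500, 2000},
                               {4000, 2000}, {3500, 2000}, {3000, 2000}};
    auto plan = bestPlan({1000, 2000}, {3000, 2000}, path, 20);
    std::printf("best: move %d turns, shoot %d times\n", plan.first, plan.second);
}
```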

This algorithm seems simplistic and, well, it is not very bright, and there is plenty of room for improvement. Still, I reached 30k with it, and I believe it could have been the backbone of a more complex implementation.

5 Likes

Can you name the specific items of the rules they violated? Or did you disqualify users just because it seemed to you that they cheated? The formal side of the issue is very important with such prizes.

3 Likes

Which part? The “GA on LCG seeds”?
It can sound a bit cryptic indeed. Let me describe the three methods I tried.

First, in all the methods I used a simple, fast 32-bit LCG to generate a series of 32-bit random values as my random source, pulling one every time I had to make a random decision. For instance, when I was in range to choose between shooting and getting closer to my current destination, I would just pull a value out of the current 32-bit value, and next time I would use the next value, and so on.

At the end of the process I would save the number of values used as the length of the current attempt, together with the corresponding score. The idea was to extract a list of commands from the best-scoring list of random values.

1.) I first used a GA, but with the idea of keeping the number of 32-bit values used to a minimum in order to save some processing time. I tried a method that multiplied the current seed by the range of values I wanted to extract and took the result from the upper 32 bits (using a temporary 64-bit value to do so). I then just saved the new lower 32 bits as the current seed, and only retrieved the next seed when the lower 16 bits of the current seed were zero. In that method I forced bit 0 of every random 32-bit value to one, and I often used primes, or at least odd values, as ranges so as to minimize the number of times I would have to retrieve another value; that is pretty much doing a second LCG on those random values.
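
In other words, something in the spirit of the sketch below; this is a reconstruction of the idea rather than my exact code, and the LCG constants are just a well-known pair.

```cpp
#include <cstdint>
#include <cstdio>

// Plain 32-bit LCG used as the underlying random source (Numerical Recipes constants).
uint32_t lcgState = 0x12345678u;
uint32_t nextLcg() { return lcgState = lcgState * 1664525u + 1013904223u; }

// Current working seed; bit 0 forced to 1 as described above.
uint32_t seed = 0;

// Draw a value in [0, range) by multiplying the seed by the range and keeping the
// high 32 bits; the low 32 bits become the next seed. Only pull a fresh LCG value
// when the low 16 bits of the seed hit zero, to stretch each 32-bit value.
uint32_t drawBounded(uint32_t range) {
    if ((seed & 0xFFFFu) == 0)
        seed = nextLcg() | 1u;
    uint64_t prod = (uint64_t)seed * range;
    seed = (uint32_t)prod;           // keep the low 32 bits as the new seed
    return (uint32_t)(prod >> 32);   // high 32 bits: the bounded value
}

int main() {
    for (int i = 0; i < 5; ++i)
        std::printf("%u\n", drawBounded(101));   // e.g. pick among 101 options (a prime range)
}
```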

The GA would then consist of a normal single-point crossover on those seeds, but with bit boundaries, using a crossover rate of about 90% (well, I use power-of-two ratios to speed up the calculations, so it was 15/16), and a mutation. For mutation I tried several things: flipping one bit (1-31), xor with 0x5a << (0-30), or replacing a full byte with a random value. It was indeed basically very hard to change a single bit in a value without completely changing the list of commands, but for some reason this gave me the best overall performance, at least from my observations over the course of a few tries. I suspect it was just allowing more tries per frame, the overall process being the limiting factor, but that is just an assumption.

Maybe I was fooled by some steep variance, that is a possibility; it was pretty hard to get a real statistical view of the results without writing an external simulator, which I did not do.

2.) I tried a more “classical” GA, retrieving a 32-bit value every time I needed a random value; crossover would then just use 32-bit boundaries and mutation would just change a whole value. That should have made a change in the chromosome have less of an impact on the subsequent commands, but in fact changing Wolff’s behavior at some point logically affected the entire process anyway, so all in all it just gave me a little less bang for my buck, as it needed a lot more 32-bit values to run and was therefore likely slower. At least that is my understanding.

3.) Thinking that my GA was probably essentially generating random tries rather than actually mixing solutions together, I tried a simple MC-style approach, generating new values on the fly and saving them, only keeping the best result. It did relatively OK, as my algorithm was quite deterministic, but it wasn’t as good as the GA, which was probably still retaining some solution mixing in the first random choices.

I would really have expected option 2 to perform better, but for some reason, maybe performance, method 1, which was very much like performing a GA on LCG seeds, gave better results…

When I added some extra randomness to the position, I used several primes as ranges. When you want to go to the left or right of an enemy, you need to go about 2000 units left or right (in my fixed method I just used the perpendicular of the vector from Wolff to the enemy, scaled to 2010). I then tried primes in the range of ~4000, with the x and y of the position being a random point in a square of +/- 2000 around it. I tried several values, and the best results I got were with a range of about +/- 1500; that’s actually how I got results of 7384, 7580 and even 7890 on Extreme, although with a greater variance than with my fixed method.

I regret not trying a power of 2 as the range, like 2048 or 4096, in order to clear a maximum number of bits in the current seed, so as to allow the GA to randomly try different positions without so much of an impact on the next commands. But that would have been pretty much the same as method 2, which didn’t show any improvement over method 1, still to my surprise even now.

However, my main regret will remain not being able to do many submits with that last method, because it was really showing some improvement over my first fixed-position method; from a peak-submit perspective it had really good potential. Oh well, I should probably have been coding faster instead of spending most of my time devising algorithms. My first really working version was up about 3 days before the end of the contest, which was still better than having my first “final version” as my last submit like I did with Hypersonic. In fact I didn’t expect a submit to take SO long (it was like 6 hours!!!), and in that last contest I was really caught off guard by that stupidly long submit time, which kept me from making any improvements…

I guess I should just try to code things up faster, as most of the time I end up with something pretty similar to what I had in mind from the get-go anyway…

Sorry for the wall of text :wink:

2 Likes

Even if you do that and create a local file on the data server, how can you upload it to your PC?

First of all, it’s all assumption, nothing I actually tried; but the idea would be to retrieve, when running in the IDE, what you saved during a submit, and then just print the values out in the IDE’s debug output.

That implies you are running in the same environment both when submitting and when running test cases, or at least that you manage to save the data in a way that can be retrieved from another environment. That just might not work, but AFAIK, since you can’t send anything over the network, you would need to do something like that to extract the validators. Or you would need to hack into CG’s servers in some way, or have some insider contact, or some access to the remote servers CG might be using for processing power… Well, just standard hacking stuff…

I’m not a security expert though, so it might be something entirely different.

You can write and read files in any language, but you can’t store them between sessions, so I don’t think it’s possible to extract the validators, unless you actually get access to them by hacking the server or being sent them by a member of staff. You could find out how many data points and enemies were in a level, and also narrow down some very specific cases. You could also link some IDE tests to the validators and know which one is which (although they were altered), but this doesn’t really help you hardcode any solution (because they are altered, of course); it just gives you a general idea of what you need to improve on. It’s interesting, though, that improving the IDE score often made the validator scores worse.

I believe this is because most of the score variation was essentially on validators that were not available as test cases, and optimizing on the test cases would win you a few hundred points here and there while losing you thousands on a non-representative validator.

That is one of the reasons I feel this was not really an optimization contest: you can’t really optimize against something you have no data on, beyond knowing how it works, when you have such limited time to compute each frame. Sadly, the best scores were probably achieved on lucky submits rather than through overall performance.

Not to say that the top people don’t deserve to be where they are, but when you realize that using random positions gives you a huge margin of improvement within a larger variance, it becomes clear that as soon as your algorithm achieves regular scores on most validators, it all boils down to being lucky enough to line up a good draw on each heavy validator in the same submit…

It’s clear that if your score can vary by about 3k on one validator, say 5k to 8k depending on the random draws, with about a 1/8 chance of hitting the max, then with 3 validators in that class you have about a 1/512 probability of hitting your highest possible score in a single submit; even with 256 submits, you would still have roughly a 1/2 chance of never seeing it happen. And that is with a total score range of 9k, which is really a lot relative to the overall scoring range.
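
To spell out the arithmetic behind that guess (my numbers, under the assumed 1/8 chance per high-variance validator):

$$
\left(\tfrac{1}{8}\right)^{3} = \tfrac{1}{512},
\qquad
\Pr[\text{never in 256 submits}] = \left(1 - \tfrac{1}{512}\right)^{256} \approx e^{-256/512} \approx 0.61
$$

so even after 256 submits you are more likely than not to have missed your best possible combination.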

It’s indeed not exactly in line with the numbers you can read above in SaiksyApo’s post, but the idea remains valid: whether the randomness is spread over 3 validators at 3k each or over 6 validators with different values, your score is still completely down to a submit lottery.

That is why, imho, the scoring should have been made more variance-independent, with either the mean score over several runs on the same validator or possibly the best score of those runs; probably the mean rather than the peak, as the peak would yet again favor high variance and multiple submits.

1 Like

It is true that the very point of the contest was to optimize against the validators, not against the test cases. And when people do that – you ban them. Logic :smiley:

I’m so in love with this decision that there is no strict line between ‘valid’ optimizations and ‘prohibited’ ones, so your possibilities are infinite.