Send your feedback or ask for help here!
Is there a way to create a custom map for testing?
No, but it is a good idea. I’ll try to code something tomorrow to add that!
Hmm, this one sounds a bit familiar…
Is it exactly the same physics/rules as CSB, just solo?
(Sorry, I could check it myself, but I am as lazy as a Haskell expression evaluator…)
Yes, it is CSB, but solo. My own motivation for making this is to have a playground to work with simpler NN for CG.
Unlike in CSB, positions are truncated, not rounded to the nearest integer, even though the statement says otherwise.
I found this isn't possible, as there is no way to enter custom input for Optimization puzzles.
Some of us aren’t up to real AI and must resort to mere simulation.
We have to know: truncate or round?
Angles are rounded.
Position is truncated.
Speed is truncated.
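Putting those three rules together, an end-of-turn update might look like the sketch below. This is hedged: the friction constant 0.85 and the update order are taken from classic CSB and should be verified against this puzzle's referee; the function name and signature are illustrative.

```python
import math

def end_of_turn(x, y, vx, vy, angle_deg, thrust):
    """One turn of a CSB-style pod, with the rounding rules from this
    thread: position truncated, speed truncated, angle rounded.
    Assumes CSB's 0.85 friction -- check the referee."""
    rad = math.radians(angle_deg)
    vx += math.cos(rad) * thrust
    vy += math.sin(rad) * thrust
    # Move, then apply friction, then truncate (CSB order, assumed here).
    x += vx
    y += vy
    vx = math.trunc(vx * 0.85)
    vy = math.trunc(vy * 0.85)
    x, y = math.trunc(x), math.trunc(y)
    angle_deg = round(angle_deg)
    return x, y, vx, vy, angle_deg
```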
First, I'd like to thank @Illedan for putting this together; very much appreciated! I was looking for a pure driving simulation to work on my predictions for CSB, and this is it. Well done.
Now for a little rant. The turning physics in the game do not behave at all as described. From the description:
It will at max turn 18 degrees from where the current heading are.
And, in the expert rules:
The car rotates to face the target point, with a maximum of 18 degrees.
Furthermore, in the input specification:
angle. Heading angle in degrees between 0 and 360 for the Car.
So one might think the ‘angle’ input is what defines the ‘current heading’. Not so. There are two scenarios:
- The desired turning angle is < pi/10
- The desired turning angle is >= pi/10
In scenario 1, the current heading is actually defined by the relative positions of the car and the target and no rounding of angles occurs in the computation of the acceleration for the next move. Using the angle reported in the turn’s input will (occasionally) yield wrong predictions in this scenario.
That’s confusing enough, but it gets better. The reported ‘angle’ is not the ‘current heading’ in scenario 2 either! That’s because there is an intermediate rounding hidden in the engine: at the end of each turn, the game engine converts the current angle to degrees and rounds it to the nearest integer degree. Then it gets converted back to radians and mapped to [0, 2pi]. Finally, when the angle is reported to the player, it gets rounded to the nearest integer degree again. But the engine uses the angle from before this final rounding to compute the acceleration for the next turn. That’s a value player code never sees nor can easily derive from the inputs. Sure, knowing this you can keep track, but hey…
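So to reproduce scenario 2 locally, you have to track the engine's internal angle yourself. A hedged sketch of the bookkeeping described above (names are mine, and the behavior is reconstructed from observation, not verified against the referee):

```python
import math

TWO_PI = 2 * math.pi

def end_of_turn_angle(engine_angle_rad):
    """Angle bookkeeping as described in this post (assumed, not verified):
    the engine rounds to an integer degree at end of turn, converts back
    to radians, and keeps using that value -- not the reported one."""
    # Intermediate rounding hidden in the engine:
    # radians -> nearest integer degree -> radians, mapped to [0, 2*pi).
    rad = math.radians(round(math.degrees(engine_angle_rad))) % TWO_PI
    # What the player is shown next turn (rounded once more).
    reported_deg = round(math.degrees(rad))
    return rad, reported_deg
```

Keeping `rad` between turns is what lets a local simulator match the engine in scenario 2.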
To be fair, there is this sentence with a link to the engine code on github in the documentation:
If you’re going to run local simulations, you’ll need to look at the referee.
Maybe this one sentence should replace the entire description of the game physics in the current state of affairs. I have a hunch that the behavior described above is not intended. But maybe it is – I’d like to know.
Obviously, I had too much time on my hands and some fun figuring this out. So thanks again!
Thx for the feedback, glad you had fun figuring this out
I will add a little more descriptive information in the statement!
Is there a timing bug? I can only use 37-38 ms at most.
If yes, @Illedan, can you fix it?
Same here: 50 ms works for the test cases, but when I submit I must use 30 ms to get 100%.
I think it would be cool to share a bit about how we achieved our results. Mine is currently 10771.
I trained offline, and my bot simply recognizes the race and outputs precomputed actions.
To precompute these actions, I train an NN using DPG (Deterministic Policy Gradient, https://arxiv.org/abs/1509.02971) and record the best result seen for a given race during training. Without recording the best result, my end-of-training NN scores about 10900-11000. I trained my NN on all the races at once (I did not try to learn one race at a time).
For DPG, I used two NNs:
- one for the Q value, which takes the state (full context: pod, all checkpoints, plus the action angle/thrust) and outputs a single value
- one for the action, which takes the state (full context: pod, all checkpoints) and outputs two values (angle and thrust).
Note: I used my own NN toolbox, developed for CSB, so to compute dQ/dAction I approximate the derivative with (q(action + delta) - q(action - delta)) / (2 * delta).
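That central-difference approximation, applied per action component, can be sketched as follows (`q` and `action` are placeholders for whatever your own toolbox uses):

```python
def dq_daction(q, action, delta=1e-3):
    """Central finite-difference estimate of dQ/dAction.
    q: callable taking an action vector and returning a scalar Q value.
    action: list of floats (e.g. [angle, thrust])."""
    grad = []
    for i in range(len(action)):
        plus = list(action)
        minus = list(action)
        plus[i] += delta
        minus[i] -= delta
        grad.append((q(plus) - q(minus)) / (2 * delta))
    return grad
```

For a smooth Q this is accurate to O(delta^2), which is usually plenty to drive the actor update.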
For me, the main issues were:
- Stable Q training. I tried the slowly moving weight average used as the target in the original paper, but it did not work well in my tests/setup. I ended up using ensemble training with multiple Q NNs (and their associated action networks) and using the average Q as the target.
- Defining the right reward (and associated gamma)
- The amount of exploration