Hi,
My current bot for CSB is also an NN, but I used a different approach from Agade's and pb4's.
My NN directly outputs a joint policy for both of my bots (6 choices for each bot, so 6x6 = 36 for the joint policy), and I sample from this policy at each turn. I have no search algorithm on top of it: I do only one NN evaluation per turn.
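For illustration, here is a minimal sketch (not my actual code; `probs`, the RNG, and the action encoding are placeholders) of what sampling a joint action from the 36-way softmax output can look like in C:

```c
#include <stdlib.h>

/* Sample a joint action index from the 6x6 policy.
   'probs' is assumed to hold the 36 softmax outputs of the network. */
int sample_joint_action(const float probs[36])
{
    float r = (float)rand() / (float)RAND_MAX;
    float cum = 0.0f;
    for (int a = 0; a < 36; a++) {
        cum += probs[a];
        if (r <= cum)
            return a; /* e.g. bot 1 plays a / 6, bot 2 plays a % 6 */
    }
    return 35; /* safety net if the probabilities sum slightly below 1 */
}
```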
For the NN training, I used a policy gradient algorithm: A2C. A few points:
- self-play using the same NN for both players
- full roll-outs and GAE (Generalized Advantage Estimation) instead of n-step returns for the advantage (see the sketch after this list)
- advantages are normalized for each batch
- an entropy bonus on the policy in the cost function to (try to) avoid converging too fast on a sub-optimal policy (also sketched below)
- 2 types of reward, a training reward and the real reward; I linearly switched from the training reward to the real reward during the first part of training (details in the rewards section below)
- Adam for the gradient descent, with really large batches (a few thousand samples)
- input normalization (hand-guessed statistics at first, then statistics gathered over a whole training run; sketched below)
- very long training duration. The NN learns a very good runner really fast (in less than 30 minutes, I think), but getting a good blocker takes a very long time.
- a gamma close to 1 (I am not sure which value I used for the submitted bot: either 0.999 or 0.9999)
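The advantage computation looks roughly like this (a minimal sketch, assuming textbook GAE over a full roll-out; all names are placeholders):

```c
#include <math.h>

/* GAE over one full roll-out of length T. rewards[t] and values[t]
   come from the roll-out; values[T] is the bootstrap value (0 for a
   terminal state). gamma and lambda are the usual GAE parameters. */
void compute_gae(const float *rewards, const float *values, int T,
                 float gamma, float lambda, float *advantages)
{
    float gae = 0.0f;
    for (int t = T - 1; t >= 0; t--) {
        float delta = rewards[t] + gamma * values[t + 1] - values[t];
        gae = delta + gamma * lambda * gae;
        advantages[t] = gae;
    }
}

/* Per-batch advantage normalization: zero mean, unit variance. */
void normalize_advantages(float *adv, int n)
{
    float mean = 0.0f, var = 0.0f;
    for (int i = 0; i < n; i++) mean += adv[i];
    mean /= (float)n;
    for (int i = 0; i < n; i++) var += (adv[i] - mean) * (adv[i] - mean);
    float std = sqrtf(var / (float)n) + 1e-8f;
    for (int i = 0; i < n; i++) adv[i] = (adv[i] - mean) / std;
}
```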
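The entropy term is the standard A2C one; in code (a sketch, with `beta` and the loss combination written in the usual textbook form, not my exact constants):

```c
#include <math.h>

/* Entropy H = -sum_a p_a * log(p_a) of the 36-way joint policy.
   Subtracting beta * H from the loss rewards staying stochastic,
   which delays premature convergence on a sub-optimal policy. */
float policy_entropy(const float probs[36])
{
    float h = 0.0f;
    for (int a = 0; a < 36; a++)
        if (probs[a] > 0.0f)
            h -= probs[a] * logf(probs[a]);
    return h;
}

/* loss = policy_loss + c_v * value_loss - beta * policy_entropy(probs) */
```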
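And the input normalization is nothing fancy (a sketch; the `mean`/`std` arrays stand for the gathered statistics):

```c
/* Per-feature normalization of the 80 inputs: x <- (x - mean) / std.
   mean[] and std[] come from hand guesses at first, then from
   statistics gathered over a whole training run. */
void normalize_inputs(float x[80], const float mean[80], const float std[80])
{
    for (int i = 0; i < 80; i++)
        x[i] = (x[i] - mean[i]) / std[i];
}
```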
For the rewards:
- Real reward: -1, 0, or 1 at the end of the game, and 0 otherwise
- Training reward: the real reward plus a small bonus/malus for each checkpoint passed and a bonus for each collision, at each step (sketched after this list)
- A hack to avoid run-aways: I made a player lose if one of its bots went too far away. It helped during early training, but it had side effects: my NN didn’t do anything anymore when an opponent bot was running away, and I had to work around that in my submitted bot (thanks pb4 for showing me this issue).
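Putting the two rewards and the linear switch together, the per-step reward can be sketched like this (the bonus constants and the `alpha` schedule are placeholders, not my real values):

```c
/* Per-step reward during training. 'real_reward' is -1/0/1 at the end
   of the game and 0 otherwise. 'alpha' ramps linearly from 0 to 1 over
   the first part of training, fading the shaping terms out. */
float training_reward(float real_reward, int checkpoints_passed,
                      int collisions, float alpha)
{
    float shaping = 0.05f * (float)checkpoints_passed  /* placeholder bonus */
                  + 0.01f * (float)collisions;         /* placeholder bonus */
    return real_reward + (1.0f - alpha) * shaping;
}
```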
My NN has 80 inputs, 1 + 36 outputs, and 4 hidden layers using leaky ReLU (1 common + 3 per output head), with tanh for the value and softmax for the policy.
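In code, the shape of the forward pass is roughly the following (a sketch only: the hidden width `H` and the weight layout are placeholders, the real sizes are not the point here):

```c
#include <math.h>

#define N_IN  80
#define H     64   /* hypothetical hidden width */
#define N_ACT 36

typedef struct { const float *W; const float *b; } layer_t; /* row-major W */

static float leaky_relu(float x) { return x > 0.0f ? x : 0.01f * x; }

/* y = activation(W x + b) for a dense layer of shape n_out x n_in. */
static void dense(layer_t l, const float *x, int n_in,
                  float *y, int n_out, int relu)
{
    for (int o = 0; o < n_out; o++) {
        float s = l.b[o];
        for (int i = 0; i < n_in; i++)
            s += l.W[o * n_in + i] * x[i];
        y[o] = relu ? leaky_relu(s) : s;
    }
}

static void softmax(float *x, int n)
{
    float m = x[0], sum = 0.0f;
    for (int i = 1; i < n; i++) if (x[i] > m) m = x[i];
    for (int i = 0; i < n; i++) { x[i] = expf(x[i] - m); sum += x[i]; }
    for (int i = 0; i < n; i++) x[i] /= sum;
}

/* 1 common hidden layer, then 3 hidden layers per head, a tanh value
   output and a 36-way softmax policy output. */
void forward(const float in[N_IN], layer_t common,
             const layer_t vh[3], const layer_t ph[3],
             layer_t vout, layer_t pout,
             float *value, float policy[N_ACT])
{
    float c[H], h1[H], h2[H], v;
    dense(common, in, N_IN, c, H, 1);

    dense(vh[0], c,  H, h1, H, 1);      /* value head */
    dense(vh[1], h1, H, h2, H, 1);
    dense(vh[2], h2, H, h1, H, 1);
    dense(vout,  h1, H, &v, 1, 0);
    *value = tanhf(v);

    dense(ph[0], c,  H, h1, H, 1);      /* policy head */
    dense(ph[1], h1, H, h2, H, 1);
    dense(ph[2], h2, H, h1, H, 1);
    dense(pout,  h1, H, policy, N_ACT, 0);
    softmax(policy, N_ACT);
}
```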
To get a submitted bot below 100k, I do a final training pass with the weights quantized down to 7 bits, using a quantization-'aware' gradient descent with a reduced learning rate and entropy coefficient.
It shrinks the NN from ~200k of binary floats to a ~60k base85-encoded string used by my code (in C) while keeping quite good accuracy.
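The quantization itself could look like this (a sketch under assumptions: I am showing a symmetric uniform scheme with a per-layer scale and a straight-through-style update, which is one common way to do 'aware' training, not necessarily the exact scheme I used):

```c
#include <math.h>

/* Symmetric uniform 7-bit quantization (assumed scheme): weights are
   mapped to integers in [-63, 63] with a per-layer scale. During the
   'aware' pass, the forward pass uses dequantize7(quantize7(w)) while
   the gradient keeps updating the full-precision w. */
int quantize7(float w, float scale)
{
    int q = (int)lroundf(w / scale);
    if (q >  63) q =  63;
    if (q < -63) q = -63;
    return q;
}

float dequantize7(int q, float scale)
{
    return (float)q * scale;
}
```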
Good game to Agade and pb4, and thanks a lot for your article. I tried multiple times to do Nash/CE Q-learning, but I never succeeded in getting anything to converge.
Don’t hesitate to ask if you have any questions.