Neural Network Ressources

@SaiksyApo bump! Is there any plans of officially supporting ‘good’ way of using neural nets on multiplayer and contests? It’s very interesting topic for me.

Maybe just increasing code size limit or allowing us to upload and read files. Or any other functional solution to achieve this?

3 Likes

Atm, it’s not in our priority list. :frowning:

1 Like

** And I had to workaround CG code size limitation, in very very horrible ways…

Could you explain/share the code of how you did that? Did you manage to write your coefficients in ascii/base 256?

Yes, basiscally it’s something like this.

I though of multiple approaches at the time.

The first step would have been to encode the coefficients as a base64 string, instead of an array of hexadecimal constants, although the gain is maybe not very high.

Then, instead of encoding the coefficients themselves, I thought of compressing the coefficient array first with gzip, lzma, or anything else, and encode the result as the base64 string. At program load time, I would have to decode the string and decompress the data to get back the coefficients.

The problem with these two approaches, it that although they gain a bit of code size, there is also an overhead for the decoder / decompresser code.

My codingame framework code was already very big and I had very little margin to include additional code. So, I went further and instead of compressing and encoding the coefficients only, I did it for the program binary itself:

  1. I compile all of my framework code with NN coefficient arrays directly as an executable in local, which may already be smaller than the source code itself because of unused code, and other things.
  2. The resulting binary is then optimally compressed using UPX, which is a nice executable compression tool, which embeds decompression code directly in the executable, in addition to aggressive transformations to achieve minimal executable size.
  3. I encode the resulting binary in base64, and I include it in another small source code that will decode the base64 to a temporary file, at runtime on CodinGame servers, and then execute the resulting binary.
4 Likes

A great q-learning ressource has been mentioned in another thread (thanks @Gaarv !) so I’m posting the link here.

I was also recently able to succesfully do Q-learning on CSB. I trained a runner bot using 8,64,64,64,6 neural network architecture using Deep Double Q-Learning. The neural network takes as inputs information relative to the position of the next two checkpoints and outputs the Q-value of 6 possible moves (+18,0,-18 angle)x(0,200 thrust). This neural network performs pretty well at its task of passing checkpoints as fast as possible. Leading to replays such as this, this and this. I have submitted it to the online arena and it reached a rank of ~90. You can therefore play games with it if you’d like.

Unlike [CPC]Herbert (previously rOut) I havn’t tried to train a blocker yet. However unlike him my training is stable for other activation functions than tanh. I suspect this comes from my strict application of Deepmind’s Deep Q learning algorithm, including error clipping to avoid seldom passing huge gradients through the NN (pass an error signal no larger than 1 in absolute value), sweeping the epsilon in the epsilon-greedy policy from 0.95 to 0.05 at the beginning of the training and the Deep Double Q learning improvement. I also have 1 network instead of his two separate networks for thrust and angle.

Supervised vs Reinforcement Learning
A while back I did my first NN runner bot on CSB, it ranked pretty well for a runner bot (~37 at the time) but it was trained by supervised learning to copy my heuristic runner bot and this showed in it’s play style. On average it would copy pretty well but it would sometimes make a small mistake like turning too early and waste like 50 turns going back to the CP it had missed. What is really nice about this reinforcement learning algorithm is that it is directly optimising for what you care about, in this case racing, instead of optimising for average error on a copy which is only indirectly related to racing.

Non-NN Moves
I hardcoded the first move of boosting to the next CP. And I override the NN’s thrust decision to “SHIELD” if there will be a collision with the enemy next turn (assuming no thrusts) of more than some constant in relative speed. The SHIELD doesn’t seem to be influencing the ranking much though.

Implementation
I implemented it in C++ without libraries. I was able to thread the mini-batch learning process but the scaling isn’t very good: almost but not quite ~2x when using 4 cores

Details
Linear output layer, relu fully connected hidden layers.
Learning rate of 1e-4 to 1e-7 in the backpropagation. I lower it to 1e-7 by hand at the end of the training to get a bit closer to the optimum.
Reward of 1 for passing a CP 0 otherwise
Gamma 0.95 in the bellman equation
Mini-batch of 100 bellman iterations per turn of a race
Memory of 1000 for action replay
Sweep epsilon from 0.95 to 0.05 linearly over the first 1e4 races
50 races until updating the target network in deep double Q-Learning
Inputs normalised to average 0 and standard deviation 1 by sampling random states and finding the Mean/Std of the inputs, then I can subtract the empirical mean and divide by the empirical Std .

Conclusion
I think I can improve performance to train my NN’s faster. The ideal would be if I could train it using a high speed library like tensor flow on a GPU and then export to C++ code. I would like to convert my Stochastic Gradient Descent to a learning-rate-free algorithm like RMSprop, hoping for better and quicker convergence. I want to try training a blocker. I’m also curious about Policy Gradient optimisation which is considered a better alternative to Q-learning, here is an article linked by inorry on that subject.

Acknowledgement
Doing this I sparked pb4’s interest in Reinforcement learning again so he has come back to try it and has very recently succeeded as well. Chatting with him, he helped me find important bugs. I also used as inputs what he was using back in our Supervised learning days. Of course [CPC]rOut/Herbert’s work was a major inspiration as well.

16 Likes

Nice !

I refactored mine some time ago, and it’s now all broken. I don’t know where exactly and I don’t have the motivation to make it work again, but this will maybe make me want to try again :slight_smile:

1 Like

Great news ! Waiting for you to come back then ! :slight_smile:

If by any chance you have a linux system, I encourage you to use Agade’s Runner Arena : https://github.com/Agade09/CSB-Runner-Arena. It’s a great way to monitor progress for your Neural Networks.

1 Like

You could be interested in the article called Importance of Machine Learning Applications in Various Spheres

I made a chatbot using a Neural Network. I basically copy pasted this tutorial and worked from there.

Architecture
It uses a Gated Recurrent Unit (GRU) which is a type of Recurrent Neural Network (RNN) to “eat” the characters in the chat one by one and output what it thinks the next character should be. I trained three versions on the world, fr and ru channels by teaching it to predict the respective chat logs in the format “username:text” (timestamp removed). Then whenever the bot is triggered to speak I feed it “Bot_Name:” and it generates a sentence character by character which I push onto the chat. The network doesn’t actually generate characters but probabilities for the next character, a categorical distribution is then used to select the next character.

Motivation: Automaton2000
One motivation I had for this project apart from personal interest was to help bring back Automaton2000. Automaton2000 was taking too much RAM on Magus’ cheap dedicated server (1Go RAM). The markov chain of all possible words had exceeded that limit. I made Magus a variable-length Markov chain C++ version (standalone and daemon) that consumes less RAM but it still easily exceeds 1Go. Magus took that work and made it consume even less RAM (github) but I suppose it wasn’t good enough. He also tried to store the tree on the disk instead of RAM but it was too slow. A Neural Network seemed like the ultimate solution against RAM consumption because even large NNs may only take ~100Mo worth of weights (e.g: keras applications networks for image classification). Potentially a NN could also take context into account, like answering people’s names and talking about the subject of the ongoing conversation which a Markov chain cannot.

The training consumes a lot of RAM (GPU VRAM actually…training takes ~2 hours per channel on a 1070) but the inference can be done on CPU and the bot currently consumes roughly ~300Mo of RAM + ~20Mo for each channel. It is a bit slow to generate sentences character by character taking up to a second on my computer and several seconds on Magus’ server. Automaton2000 is currently back using this work.

Issues
One thing that is quite noticeable is the overuse of common words and phrases. One version liked to say “merde” a lot on the french channel. I guess the network just learns what is most common.

Future work (including suggestion from @inoryy)

  • Bug fixes
  • BLEU metric
  • Seq2Seq architecture?
  • Use a higher level NLP framework like Sockeye?
  • Word-level instead of character-level?
  • Somehow ignore “bad data” like things said by Automaton2000?
6 Likes

The C++ version was bugged :frowning: It can’t work like that.

I think i’ll give a last hope shot with a noSQL database.

This isn’t directly related to CG, but the question of getting started with reinforcement learning gets brought up a lot, so figured I’d share my new blog post Deep Reinforcement Learning with TensorFlow 2.0.

In the blog post I give an overview of features in the upcoming major TensorFlow 2.0 release by implementing a popular DRL algorithm (A2C) from scratch. While focus is on TensorFlow, I tried to make the blog post self-contained and accessible with regards to DRL theory, with a birds eye overview of methods involved.

3 Likes

Agade, I’m interested to apply q-learning to the problem also. I’ve been assuming that you must extract data from the system to do training off-line and then inject the learned weights and models. I haven’t seen a way to get data out of the system. However, reading your write-up above it almost sounds like you are learning on-line. Is that the case?

@algae, you don’t need to extract data out of the system. You can play offline games with a simulator which is close enough to the real game, which is what I and @pb4 do. For example to train a NN which controls a runner you just need a simulator which moves pods and passes checkpoints (no need to simulate with >1 pods and pod collisions).

4 Likes

@agade, thanks for the response. I see some simulators on git. I’ll look into it.

As some of you know, @pb4 and I have been working on a NN based reinforcement learning AI on CSB. We have finished a write-up and are pleased to present Mastering Coders Strike Back with Nash-DQN.
leaderboard

21 Likes

Thank you for this gem.
I have some questions as I’m not quite as intelligent as you two:

  1. You train the runner because you have a training set with good trajectories to pass checkpoints.
    But on which training set can you train the blocker ?

  2. I don’t get the difference between what you call depth 0 and depth 1 implementation.

  3. Just to be sure, online on codingame the neural network is simply an evaluation function, right ?

  4. What are the one or two best tutorials you ever did on Q-learning ?
    I’m sure more will come from me or others

Thanks !

3 Likes

Hi,

  1. I made the runner dataset a long time ago, and never bothered to make one for a blocker. Those datasets would only be applied to supervised learning, which is a different approach compared to the Q-Learning described in the article.

  2. Depth 0 : Read the state. Feed it to the neural network. The neural network has 49 outputs corresponding to one pair of actions each. Use the iterative matrix-game solver algorithm to find the optimal mixed strategy for our bot.
    Depth 1 : Simulate all possible actions by my pod and the opponent. For each pair of actions : feed the simulated state to the network, read the output of the network and calculate the expected future rewards, this is the evaluation of the pair of actions. Upon constructing the table which evaluates all pairs of actions, use the iterative matrix-game solver algorithm linked in the article to find the optimal mixed strategy for our bot.

  3. Pretty much, yes. We use it to evaluate actions or a state depending on the algorithm in which it is plugged (depth 0, depth 1, depth 2, MCTS).

  4. Not really a tutorial, but I liked this one.

Don’t hesitate if you have anymore questions !

5 Likes

Thank you !

Hey I saw the link ! Cool I’m on the right rack :stuck_out_tongue:

So on depth 1, 2 or MCTS, the NN always has 49 output ?

The part where we train the NN is totally unrelated to the part where we use the NN.

So yes: in all different versions (depth 0, 1, 2, MCTS) the same NN was used, with the same 49 outputs.