Fall Challenge 2020 - Feedbacks & Strategies

Illedan · November 24, 2020, 8:29am

TLDR:

(details found in other posts)
C#, 56th, Beam search
Depth 30 with beam of 500, ending mostly between 10-20 depth
Run for enemy to find end of game.
Scoring based on:

ingredients needed for customers
offline spell learning score
closest customer in ingredients needed
Spells usable with given inventory + extra for repeats

Extra:

Beautiful UI, but terrible for debugging!
Too little opponent interactions for my taste
Randomness
Had given up as I had zero motivation since mid week, but my gf told me I would regret it if I didn’t make Legend => made me do an all nighter with coffee and red bull getting into Legend around 6 AM the last night #NoRegrets
Big thanks to CodinGame for having this great platform and free games, which makes me want to plan my year around the contests

And using VR for the contest was not the most efficient
( https://twitter.com/erik_kvanli/status/1326879016817750017?s=20 )

skyyker · November 24, 2020, 9:37am

Hello Wala,
Thanks for your feedback! I like the idea of array with the size in the first element.
I would love to discuss about your code because mine in Java is really not optimized. Not setting the time limit to 40ms would lead frequently to timeout.
I wasn’t aware about the JVM memory issues. Did you try to remove it to see if it really matters?

codybumba · November 24, 2020, 9:50am

What is the philosophy behind “sum(blues)/2 + 1”?

AntiSquid · November 24, 2020, 10:43am

My scores are based on how long it takes to get a gem if all spells are exhausted.

You get 2 blues from the starting spell and +1 if you need to rest it. So i count a blue as half turn.

I was considering to adjust scores based on spells I learn, but never got around doing that. Not yet.

codybumba · November 24, 2020, 12:07pm

Can you explain this factor “tomeIndex/3” and why you are dividing by 3?

BrunoFelthes · November 24, 2020, 12:39pm

Hey @wala congrats for the 1st place in java, It is a challenge, I was chasing you

Two question:

How many nodes are you visiting?
How do you build, store and compare the hashes in Java? All solutions that i used looks like so a little inefficient…

_CG_Simon · November 24, 2020, 12:56pm

Rank 204 - Gold league (Java)

Firstly, thanks CG for the great contest, I loved it!!

Explaining the evolution of my bot will be boring for most of you so I’ll stick to my final AI (don’t hesitate to ask if you want more details).

My AI is a BFS. Nodes can be cast, brew or learn (only during first turn). I avoid duplicated states (same inventory and same castable spells). When time is out, I take all leaves (max reached depth) and evaluate depending on the potions brewed and ingredients left, somehow ponderated with a decay rate (not the best eval but I never figured out a better one). I was able to simulate at most 3-4k nodes, never got better…
At the beggining I compute the shortest path for the opponent to brew each potion (same BFS without learn, max depth 4, timeout 10ms) and forbid my BFS to brew these potions after the opponent (not the best way to do it, but the other strategies I tried wouldn’t be better).
A few optimizations: caching state objects, and encoding castable spells in an int (that was fun, but performances were basically the same).

First 5-8 turns: learn something I don’t know (a spell whose effect is not the same as the ones I already have)
When any of the players have brewed 4 potions, try solve the end game with a minimax (max depth 2, it’s bugged, but still works well enough to make me win if I can).

I guess the worst parts of my AI are the eval and the perfomances. Next time I’ll go for C++. Also I abandonned most of what I did during the week end because it kept getting worse.

The best part is that it was so fun!!

See you all nest year!

wala · November 24, 2020, 12:56pm

Hi skkyker, thanks for your feedback.
I tried to remove the code to avoid timeouts and no my ai doesn’t timeout in this multi.
But i knew that if i removed it in my UTTT bot, it timeouts every time (but in UTTT i create some garbage collectable objects during each turn).
If you’re not using the whole first round second, i advise you to put it anyway (it can only help).
In this multi i create no garbage collectable objects (except for some log strings).
I have a cache with 30000-50000 arrays for the ways. Most of my variables are static fields that i reuse (example : for the logs i use the same static new StringBuilder(5000) that i empty at each use).

DomiKo · November 24, 2020, 1:11pm

You need to use blues to get spell with tomeIndex > 0, so I give a little penalty for that.

DomiKo · November 24, 2020, 1:20pm

It should be similar.
We can go even further and do something similar to @NitMpez.

codybumba · November 24, 2020, 1:37pm

Why dividing by 3? Is it some kind of random weightage?

skyyker · November 24, 2020, 1:45pm

Thanks for your response and your test.
I will try to add it to see if I can easily up my time limit with this.

I have a lot to do if I want to remove all my created objects at each node creation (object Node, Gamer and depending on the action also a Game)

DomiKo · November 24, 2020, 1:48pm

Just try some weights and see which one work best for you.

kovi · November 24, 2020, 1:51pm

Depending on architecture (cache), I think direct encoding can be the fastest, as it returns validity at once
int newinv = caston[spell][oldinv];
if (newinv) {
…
But in the end it didn’t matter…

skyyker · November 24, 2020, 2:02pm

In order to add my data:
Currently only around 10k nodes. It was around 25-30k before I added the hash code. I don’t really know why it is so slow with this.
My hashcode:

int hc = inventory[0] | inventory[1]<<4 | inventory[2]<<8 | inventory[3]<<12;//size 16
hc = hc << 3 | createdPotions;//max 6 so 3 bit - size 19
hc = hc << 6 | score; //max 128 so 7 - size 26	
hc ^= Arrays.hashCode(usableSpells); //Not perfect but few prob to have same values with diff data

With usableSpells an array containing the ids of my available spells

wala · November 24, 2020, 2:06pm

Hi BrunoFelthes, thanks for your feedback.
Including the discarding ways (not enough ingredients to do the action…), i go up to ~70000 actions when i simulate my possible moves (i also simulate the opponent moves for 5-10ms depending on how much potions we’ve already done).
But it was more than i could handle and to improve my AI, it’s the evaluation that should be improved.

To calculate the hashcode, i inspired myself from Arrays.hashCode(long a[]) :

long hash = 0
hash += 31^0 * wayIng0Nb + 31^1 * wayIng1Nb  + 31^2 * wayIng2Nb + 31^3 * wayIng3Nb
for each action in the way : if not rest : hash += 31^(actionId + 4)

The actionId is not the one of the game, I redefined them for each player to keep them smaller.

So basically i remove the rests and the action order doesn’t matter (to avoid CAST 1, CAST 2 and CAST 2, CAST 1).
But before testing if the hashes are the same, the ways must have the same score.
So BREW 1, CAST 2 and CAST 2, BREW 1, even if they have the same hashcode, will both be kept because their scores are different (more points for brewing first).

I stored the hashes in the hash array (as i tried to explain in my PM).

For each new way, i do a binary search on its score in the score’s array :

if the index returned is negative : the score isn’t in the array so there no identical way yet => i add it (as well as its score and hashcode)
if the index is positive : the score is in the score’s array. So there might be an identical way. I look around the found index to get all the identical scores and for each scoreIndex, i compare the corresponding hashcode (in the hascode’s array) with the hashcode of the new way.

I hope it’s a bit clearer.
This is far from being perfect.

pb4 · November 24, 2020, 2:23pm

Here is a link to my post mortem

reCurse · November 24, 2020, 2:25pm

Nothing fancy, I just wanted to poke and see if it was worth investing more into. I was wondering if it was better to switch up playstyles according to unknown factors in the game state. Maybe something about the initial state indicated it’s better to draft faster/slower, or maybe a state in the early mid-game indicated it’s better to go for score over deliveries, or… Anyway so I recorded self-play game states to a text file, used a tweaked bot as a reference opponent to not get symmetric games but still ~50% winrate, forcing a decision on one side and recording the game result.

Input was components / 5 for recipes, spells and tomes, price / 20, and if applicable, score / 100, delivery / 5. Output is # of decisions + default, fitting for win/draw/loss value encoded as 1/0/-1. Used a basic fully-connected network with various # of layers and neurons, batch norm, relu/tanh activation, tried both MSE and NLL for loss function. Used 10k games for training set and 1k games for test set. Also used pytorch to keep momentum and not waste too much time. All it ever did was memorize the training set, the test set loss never went down. I gave up soon after, so maybe wasted 2-3 hours in total, plus self-play time during down time.

skyyker · November 24, 2020, 3:29pm

Hello everyone !
Ended 101th, but currently 7th on multi.
YES, I AM A REAL IDIOT!

Beam Search
Like many people, I have performed a Beam Search. Here are my specificities:

300 nodes per depth
Max depth 20 but it is more frequently depth 10 reached because no more time.
Has I said in my previous messages. I have an average around 10k nodes with hash to avoid duplicated nodes. It was around 25-30k nodes with duplicates.

Heuristic
My heuristic to compute a node’s score was REALLY bad (huge coeff on points that leads frequently to points but with empty inventory…).
Thanks a lot @Magus for your feedback (and also others who have similar eval). I have set exactly the same things as yours and that’s what leads me to the current 7th place:

1 point for each score point
1 points for each tier 0 ingredient in my inventory
2 points for each tier 1 ingredient in my inventory
3 points for each tier 2 ingredient in my inventory
4 points for each tier 3 ingredient in my inventory
1.1 point for each crafted potion

The only modification, is an inner score on spells were @pb4 helped me =)

Spells

Taking a spell on the 7 first turns (the ‘best’ between the first and the second at each turn).
I allow to learn new spells with max 13 spells.
In the endgame (2 or less potions left),I check if I can end faster with new spell.

Opponent
I am performing the same beam search for my opponent but only with max 5ms and max depth 10. I used it to know if he will take a potion before me or if he can win and if he does, in how many turns.
I have coded the game mechanic for +3 and +1 bonuses, and that’s why sometimes my bot can naturally wait before BREW just because I expect my opponent to BREW and allow me a bonus in next turn.

Thanks a lot for this contest, it was really interesting. Once again I could have done something really better just by talking more with others and stop being stuck with my huge coeff on points… See you for the Spring Challenge 2021

codybumba · November 24, 2020, 3:59pm

It seems most of the Gold and Legend bot programmers use beam search technique. Can anybody share good resources for beam search tutorial?