Timeout in Smash the Code

eldidou · May 8, 2016, 6:58pm

Hello,

There is something with my AI that I don’t really understand.
I haven’t submitted anything new in 2 days.
For the first ~1700 battles everything seemed fine, and AFAIK my AI was in the top 3.
But since battle ~1700, my AI suddenly times out almost every time.

Is there anything new in the way the timeouts are checked?

Can this be related to the server load at the end of the contest?

Edit: It didn’t happen after battle 1728.

Psyho · May 8, 2016, 9:39pm

Hey,

FYI, if you need more data: near the end of the contest I tested ~80 runs against your AI. I believe you timed out in 3 or 4 of those ~80, during the first round. It didn’t happen before, so something definitely changed with your submission in the last few hours.

Also, the result was 45 wins (for me), 29 losses and 6 ties

eldidou · May 8, 2016, 10:57pm

Thanks for the data.
What I find weird is that I didn’t submit (or resubmit) anything in the last two days…
In the end I think there were enough matches after that to totally cancel the effect, so I’m fine with my final rank.

Magus · May 9, 2016, 8:41am

How does your AI work? Do you run tests until the timeout limit? How do you check this limit?

If you run a fixed number of tests, the problem is that at rush hour the CodinGame servers are crying in a corner trying not to die, so your AI can’t run as many tests as it does at normal times.

eldidou · May 9, 2016, 7:55pm

I used an anytime approach.
I checked the time using rdtsc (handling both 32-bit and 64-bit architectures), which is very fast, especially in a sandbox environment.
The time was checked frequently (roughly every 1ms), and I kept a reasonable margin before stopping the algorithm (100ms for the first turn and 10ms for the others).
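For readers who want to see the shape of such an anytime loop, here is a minimal sketch. It is not eldidou's code: it swaps rdtsc for std::chrono::steady_clock, and improve_once, anytime_search, SearchResult and the check_every parameter are hypothetical names invented for the illustration.

```cpp
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;

struct SearchResult {
    std::uint64_t best;        // best score seen so far
    std::uint64_t iterations;  // how many work units were completed
};

// Hypothetical stand-in for one unit of search work (not a real evaluator).
inline std::uint64_t improve_once(std::uint64_t state) {
    return state * 6364136223846793005ULL + 1442695040888963407ULL;
}

// Keeps improving until (budget - margin) has elapsed since `start`.
// The clock is read once every `check_every` iterations (must be non-zero)
// so that the time check itself stays cheap.
inline SearchResult anytime_search(Clock::time_point start,
                                   std::chrono::milliseconds budget,
                                   std::chrono::milliseconds margin,
                                   unsigned check_every = 1024) {
    const Clock::time_point deadline = start + (budget - margin);
    std::uint64_t best = 0, state = 1, iterations = 0;
    for (;;) {
        if (iterations % check_every == 0 && Clock::now() >= deadline) break;
        state = improve_once(state);
        if (state > best) best = state;
        ++iterations;
    }
    return {best, iterations};
}
```

With the limits discussed in this thread, a regular turn would be something like anytime_search(turn_start, std::chrono::milliseconds(100), std::chrono::milliseconds(10)). The right value for check_every depends on how expensive one work unit is, so treat 1024 as a placeholder.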

Using rdtsc requires hardcoding the CPU frequency as a constant in the code, so if for some reason one server had a lower frequency than the others, that could explain what happened.

Another explanation is the time allocated for the first turn, which is always longer than for the other turns. I didn’t see it in the statement, so I tried different values in order to guess it and ended up with 1000ms (so with the margin my algorithm runs for 900ms on the first turn). If the real value was smaller than 1000ms, that could also explain the timeouts (but then why did it start failing only at some point near the end of the contest?)
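To make the frequency argument concrete, here is a small arithmetic sketch. It assumes the counter ticks at the machine's nominal clock rate, and real_elapsed_ms is a hypothetical helper, not something from eldidou's bot.

```cpp
// Real milliseconds that pass while the counter advances by the number of
// ticks the program believes to be `intended_ms`.
//   assumed_ghz: frequency hardcoded in the program
//   actual_ghz:  frequency of the machine the program really runs on
inline double real_elapsed_ms(double intended_ms, double assumed_ghz, double actual_ghz) {
    const double ticks = intended_ms * 1e-3 * assumed_ghz * 1e9;  // ticks the program waits for
    return ticks / (actual_ghz * 1e9) * 1e3;                      // time those ticks really take
}
```

As an illustration only: 2.9 GHz is the figure mentioned later in this thread, while 3.4 GHz is a made-up value because the real one is not given. With those inputs, real_elapsed_ms(900.0, 3.4, 2.9) comes out at roughly 1055 ms, which would overshoot a 1000ms first turn even with a 100ms margin.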

_CG_SaiksyApo · May 9, 2016, 8:12pm

1s first turn, 100ms after.

We tried adding some Amazon machines at the end of the contest, but they are slightly different from the ones provided by OVH (2.9GHz instead of 3.?GHz).

We stopped them to avoid more timeouts for you, but they might be useful to improve the end-of-contest calculations. So be careful next time.

eldidou · May 9, 2016, 8:48pm

Well, it makes sense then.
Thanks!

Wildcat-SC · May 9, 2016, 9:29pm

I read somewhere that the check is done with a wall clock; since RDTSC counts CPU clock cycles, that could explain the timeouts, I guess.

I had a few weird timeouts when I used clock(), before switching to gettimeofday(), but I’m still not sure whether I initialized tStart in the right place (just after the parsing).

LBandy · May 9, 2016, 9:34pm

I’ve experienced an interesting issue: there were a few matches where my bot timed out even though the cerr output sent AFTER the cout was printed correctly, while the stdout output was missing.

kiwijam · May 9, 2016, 9:46pm

Yes I also printed to cerr and cout together, and there were times when I saw the cerr printing OK but got a timeout message and no cout.

magaiti · May 9, 2016, 9:49pm

Flushing stdout right after printing your move might be a good idea.

Psyho · May 10, 2016, 12:02am

RDTSC and gettimeofday() measure wall clock time, and RDTSC is the cheapest (in terms of clock cycles used) and most accurate way of measuring elapsed time. The only problem (mentioned by @eldidou) is that you need to know the clock frequency, which is problematic when the program runs on a different machine or when boost/power-saving mode is enabled on the CPU, since both can change the CPU clock frequency dynamically. It’s great when you’re manually profiling code, but it’s generally best to stay away from it when running code in an uncontrolled environment.

On the other hand, clock() measures CPU time, and since you already know that the competition used wall clock time, you know why it timed out.
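The difference between the two kinds of clock can be seen with a small experiment. This is a sketch assuming a POSIX-like system, where std::clock() reports processor time used by the process; Elapsed and measure_sleep are hypothetical names.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Elapsed {
    double cpu_ms;   // processor time seen by std::clock()
    double wall_ms;  // real time seen by std::chrono::steady_clock
};

// Sleeps for `pause` and reports how much time each clock saw go by.
inline Elapsed measure_sleep(std::chrono::milliseconds pause) {
    const std::clock_t c0 = std::clock();
    const auto w0 = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(pause);  // wall time passes, almost no CPU is used
    const std::clock_t c1 = std::clock();
    const auto w1 = std::chrono::steady_clock::now();
    return {1000.0 * static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
            std::chrono::duration<double, std::milli>(w1 - w0).count()};
}
```

While the process sleeps, or is descheduled on a busy server, wall_ms keeps growing but cpu_ms barely moves, so a limit based on clock() can be exceeded in real time without the program noticing.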

Wildcat-SC · May 10, 2016, 6:34am

Yup, that’s my point: it can hardly be considered a reliable wall clock, as it depends on some CPU parameters (dynamic frequency scaling, CPU frequency variance across a cluster, virtualization?…)

_CPC_Herbert · May 10, 2016, 8:06am

I believe that the main issue here is the virtualized environment. RDTSC is not guaranteed to be correctly virtualized, and you might get jumping values because of this. Also, unless the CPU has some very specific features (which are now common on the latest Intel CPUs, to be honest), RDTSC is not even guaranteed to be synchronized between cores or monotonic across different P-states.

IMHO it’s better to rely on high-level function calls such as clock_gettime, gettimeofday, or std::chrono::high_resolution_clock, which will be more reliable, especially on EC2. The performance penalty should not be noticeable unless you want to measure very small blocks of code, which isn’t the case here.

magaiti · May 10, 2016, 8:13am

I was using clock_gettime(), and had to set the margin at 80ms, because 85 already resulted in occasional timeouts.
Even with 80ms, I saw one timeout among the few hundred replays I viewed over the course of the contest.

_CPC_Herbert · May 10, 2016, 8:33am

This is probably not an issue with clock_gettime (be sure to use CLOCK_MONOTONIC). I am using std::chrono::high_resolution_clock, which is basically just calling the same function, and I use a 95ms timeout without any trouble ever.

Wildcat-SC · May 10, 2016, 8:33am

Weird, I had set a very tight limit (92-96 ms depending on defense calculations), and did not have a single timeout during the matches.

Use of gettimeofday
Time initialization just after the parsing
Memory free management just after this initialization

Didn’t you have an unexpectedly long task at the end of your loop, like freeing a big structure?

Niii · May 10, 2016, 8:43am

It would be useful to have stats when submitting a program, so that we do not have to examine every simulation to know whether our program times out.

magaiti · May 10, 2016, 9:59am

I didn’t know about CLOCK_MONOTONIC, thanks for the tip!
It might be because I used CLOCK_REALTIME and/or I was not checking time often enough.

_CPC_Herbert · May 10, 2016, 2:40pm

Also, CLOCK_REALTIME will jump whenever the system time changes (NTP or manual adjustments), whereas CLOCK_MONOTONIC will not.
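For reference, a minimal sketch of a turn timer built on clock_gettime with CLOCK_MONOTONIC, assuming a POSIX system; monotonic_ms and out_of_time are hypothetical helper names.

```cpp
#include <time.h>  // POSIX: clock_gettime, CLOCK_MONOTONIC, timespec

// Current reading of the monotonic clock in milliseconds.
// The absolute value has no meaning; only differences do.
inline double monotonic_ms() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<double>(ts.tv_sec) * 1000.0 + static_cast<double>(ts.tv_nsec) / 1e6;
}

// True once at least `budget_ms` have elapsed since `start_ms`.
inline bool out_of_time(double start_ms, double budget_ms) {
    return monotonic_ms() - start_ms >= budget_ms;
}
```

A bot would record start_ms = monotonic_ms() at the start of the turn and poll out_of_time(start_ms, 95.0) inside its search loop. The 95 is just the limit quoted above, not a recommendation.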