Would it make sense, and be possible, for CG to measure our AI’s computation time in CPU time instead of wall clock time? The Fall2020 contest had quite a lot of timeouts, even in AIs limiting themselves to 30ms of wall clock time.
Sometimes, these timeouts are the programmer’s own fault. For example, the AI may spend a long time in destructors when exiting a function. But CG seems to bear some of the blame for the timeouts. I, for example, saw turns where my AI performed only 200 simulations in 50ms, and I am pretty sure I was avoiding all mallocs and map re-hashes.
I have to suspect that something is sometimes not giving our AIs CPU time server-side, and timing us by CPU time would solve this issue.
If CG were to make this change, it would not break existing bots: they time themselves by wall clock time, and for a single-threaded bot CPU time <= wall clock time.
If CG were to make this change, it would probably not significantly affect their server costs because for most turns, wall clock should be very close to CPU time.
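To make the distinction concrete, here is a minimal Python sketch. It is not CG's referee code, and `measure`, `descheduled` and `busy` are names made up for illustration. It times a task with both clocks: `time.perf_counter()` for wall clock time and `time.process_time()` for CPU time. Sleeping stands in for a turn during which the server does not give the bot any CPU.

```python
import time


def measure(task):
    """Run task() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()   # monotonic wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def descheduled():
    # Stand-in for a turn where the process is not running: sleeping
    # advances the wall clock but consumes almost no CPU time.
    time.sleep(0.05)


def busy():
    # Stand-in for a turn spent computing.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    for name, task in (("descheduled", descheduled), ("busy", busy)):
        wall, cpu = measure(task)
        print(f"{name}: wall={wall * 1000:.1f}ms cpu={cpu * 1000:.1f}ms")
```

On a quiet machine the sleeping task should show roughly 50ms of wall time and close to 0ms of CPU time, while the busy task should show the two clocks close together. Note that `process_time()` sums over all threads of the process, so a multi-threaded bot can accumulate more CPU time than wall time.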
Open questions that others should feel free to chime in on:
1. Do you think this would indeed reduce timeouts in the arena?
2. Is it possible to implement this on CG’s side?
On question 2, I vaguely remember CG doing this back in the Tron days because people were threading to get more CPU time in the same wall clock time.
Like we already discussed, I’m not sure this is the only issue.
Some players tested with a simple C++ program doing exactly the same thing at each turn (and outputting WAIT at each turn). It revealed that sometimes, just doing this takes 20ms (according to wall clock time).
So, from my point of view, switching everyone to CPU time before fixing this 20ms “hiccup” will only make games longer to compute. And CG already has a hard time keeping submits fast during a contest…
But I’m not sure you can have a difference of 20ms between wall clock and CPU time in such a short time (50ms). So I think there are other issues.
It’s possible it could help reduce the timeouts in the arena. But it’s far from certain, since even CG themselves do not know the root cause of the timeouts (or hiccups, as I prefer calling that specific issue: random spikes of 10-20ms of lag for no reason, even in C++). If they happen while the bot’s process is scheduled, they could very well still count towards the bot’s CPU time, and nothing would change. But we could validate that by running @blasterpoard’s test again while also measuring CPU time.
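One way such a validation could look, sketched in Python under assumptions: this is not @blasterpoard’s actual test, and `run_turns`, `classify`, the 10ms threshold and the 0.5 ratio are all made up for illustration. The idea is to run identical work every turn, record both clocks, and check whether each wall clock spike also shows up in CPU time.

```python
import time


def run_turns(turns, work, spike_ms=10.0):
    """Run the same work() each turn; return the turns whose wall time exceeds spike_ms."""
    spikes = []
    for turn in range(turns):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        work()
        cpu_ms = (time.process_time() - cpu_start) * 1000
        wall_ms = (time.perf_counter() - wall_start) * 1000
        if wall_ms > spike_ms:
            spikes.append((turn, wall_ms, cpu_ms))
    return spikes


def classify(wall_ms, cpu_ms, ratio=0.5):
    """Label a spike by whether the CPU clock also saw it (ratio is an arbitrary cut-off)."""
    if cpu_ms >= ratio * wall_ms:
        return "counted as CPU time"
    return "hidden from CPU time"


if __name__ == "__main__":
    for turn, wall_ms, cpu_ms in run_turns(200, lambda: sum(range(10_000))):
        print(turn, f"{wall_ms:.1f}ms wall", f"{cpu_ms:.1f}ms cpu", classify(wall_ms, cpu_ms))
```

If most spikes come out as "hidden from CPU time", timing by CPU time would plausibly help; if they come out as "counted as CPU time", it would change little.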
But overall I would be in favor of this change, as it only brings upsides (more stability!) with the only downside of being more complicated to explain to most people. And as JBM rightly pointed out, maybe more difficult to get a hold of in some languages.
This is addressing a new issue on CG where a delay of 10-20ms can be added at random, pretty much anywhere in your code, even in native C++. If this happens towards the end of a turn, you are guaranteed to time out, with no alternative other than using much less time than allowed.
See blasterpoard’s example here which could experience such massive hiccups doing nothing at all.
But I did! You just need to scroll down the ranking for some time… I did experience timeouts too, but there were clearly other things to consider in my code before blaming a narcoleptic clock.
Another option I’ve suggested in the past was the “time budget” concept. The general idea being to let the AI be more flexible in how it allocates its time. In broad strokes:
the AI is provided with a “remaining time” input
it starts at 1s
at each turn, it’s increased by 0.050s
the AI’s compute time then eats into it
the AI is responsible for ending the game with >0
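The steps above could be sketched like this in Python. It is only an illustration of the bookkeeping, not an existing CG feature: `TimeBudget` and `play` are hypothetical names, and I am assuming the starting value of 1 is in seconds.

```python
class TimeBudget:
    """Referee-side bookkeeping for the "remaining time" idea (all names hypothetical)."""

    def __init__(self, initial=1.0, per_turn=0.050):
        self.remaining = initial   # assumed to be in seconds
        self.per_turn = per_turn   # time credited at each turn, in seconds

    def start_turn(self):
        """Credit the turn allowance and return the value sent to the AI."""
        self.remaining += self.per_turn
        return self.remaining

    def end_turn(self, used):
        """Charge the AI's measured compute time for this turn."""
        self.remaining -= used

    def passes_endgame_check(self):
        """The AI must end the game with more than 0 left."""
        return self.remaining > 0


def play(turn_costs, budget=None):
    """Replay a list of per-turn compute times and return the final budget."""
    if budget is None:
        budget = TimeBudget()
    for used in turn_costs:
        budget.start_turn()
        budget.end_turn(used)
    return budget
```

For example, `play([0.04, 0.12, 0.04])` ends with about 0.95s left: the 120ms turn, which a strict 50ms limit would have punished, is simply absorbed by the budget.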
Pros:
can erase load spikes
a (lax) per-turn timeout can still be applied
match time is still bounded
both per-turn timeout and endgame >0 check can be made external to the game and more of a “platform health-related” variable. And, for example, increased by 20ms when there are unexplained spikes happening randomly.
Cons:
an AI can pile up time to use at the end. Well, I’m not sure that’s really a con. Match time is still bounded. It’s just a possibly different feel.
needs per-game protocol adjustment
Many variations are possible, taking inspiration from accounting.
overdraft allowed, must be repaid within 5 turns
overdraft carries interest (less added time per turn while in debt)
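As a rough Python sketch of those two variations combined: `step` is a hypothetical helper, the 0.040s "in debt" allowance is a number I made up, and I am reading "repaid within 5 turns" as "no more than 5 consecutive turns ending below zero".

```python
def step(remaining, debt_turns, used,
         per_turn=0.050, debt_per_turn=0.040, max_debt_turns=5):
    """Advance one turn; return (remaining, debt_turns, still_alive)."""
    # Interest: a bot that starts the turn in debt is credited less time.
    credit = debt_per_turn if remaining < 0 else per_turn
    remaining = remaining + credit - used
    # Count consecutive turns that ended in overdraft; reset once repaid.
    debt_turns = debt_turns + 1 if remaining < 0 else 0
    return remaining, debt_turns, debt_turns <= max_debt_turns
```

A bot hit by a spike goes negative, gets a few turns to run lean and climb back, and only loses if it stays overdrawn for too long.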
That’s a post-hoc rationalization, not really an explanation… In real-time systems, application developers often have much more global control over the system.
It wouldn’t. And it’s not the goal. The goal is to make rankings resilient in the face of the spikes.