Inconsistent timing behavior between arena and local simulations

Chahine · November 6, 2015, 10:05pm

Hello,

The following observation came up while playing Tron Battle, but it might be relevant for any AI battle.
The game description advertises that the AI must answer in less than 100ms for each game turn.
I made some time measurements between the moment the AI reads game inputs and the moment it prints the answer. Results were very consistent:

for the first turn, the answer was accepted after up to roughly 500ms
for all other turns, the answer was accepted after up to precisely 100ms (as advertised)

Those measurements were performed through local simulations, ie hitting the “play” button, not in the arena.

Then I designed some AI that would make those most of this timespan, through an iterative-deepening-tree-search algorithm, and it was working like a charm… until I sent it to the arena, where it timed out within the first 2nd turns in most battles. I’ve even tried to restrain the admissible delay for my AI to answer down to 10ms, and it keeps timing out.

Either the way I’m measuring time isn’t reliable, or the way timing constraints are managed differs between local simulations and the arena. Or both.

To other gamers: has anyone experienced the same issue ?
To codingame developers: care to share any insight about this inconsistency ?

Note: I’m writing the AI in Haskell, and the way I’m managing time relies on Control.Concurrent.threadDelay and System.Timeout.timeout functions.

Thank you.

_CG_SaiksyApo · November 6, 2015, 10:14pm

You have more or less 1s for the first turn. Depend on how much your language need to be ready

Galilleos · November 7, 2015, 7:06am

It is strange indeed.
In other posts, I have seen some people putting the limit at 100 ms CPU time. Do you use massive multi threading?

_CG_Maxime · November 7, 2015, 8:52am

There’s no difference between a local play in the IDE and in the arena: when you submit in the arena, we just select the matchs and run them the exact same way.

Aries · November 7, 2015, 12:30pm

AFAIK, to your program is allotted 100 ms of CPU time, not real time. Since the functions you are using are based on the wall clock, it can’t work reliably on CG servers (the fact it works in IDE is curious though). Unfortunately I know of no Haskell function to create timeouts based on CPU time, that’s why I’ve created one. You should adapt it to your need and see if it works.

Chahine · November 7, 2015, 9:10pm

I managed to improve the situation through the following actions:

I’m bounding both CPU time and wall time
bound is set to 70ms (which is waaaaay too conservative if you ask me)

The timeouts still happen in roughly 10% of the battles. Measurements show some spikes in the duration of some turns, so I’m starting to suspect the garbage collector. As I have no control over the linking process of the program, I can’t tune the way garbage collection is done.

Unless you have some brilliant idea, I think the game is over for me .

Aries · November 8, 2015, 7:50am

A common pitfall in Haskell when measuring performances is its laziness. Are you sure everything is done inside your time guard? The evaluate function could be used to locally enforce strict evaluation. However, all of this still doesn’t explain the difference between the EDI and the arena…

Chahine · November 8, 2015, 9:27am

Yes, I’m already using the evaluate function.
I’m also calling performMajorGC after printing the answer, but that doesn’t help.

Aries · November 8, 2015, 10:34am

Sorry, but the last item on my checklist is: does the user has been recently cursed / is experiencing hallucinations / tapping in dark powers with known incompatibilities with its hardware? You could share your code if you want, I’m always eager to play with fun problems in Haskell (solving them is another matter).

Aries · November 8, 2015, 7:49pm

One more thing to test in fact. You probably have the first, but maybe not the second.

hSetBuffering stdout NoBuffering
hSetBuffering stderr NoBuffering

Chahine · November 9, 2015, 7:41pm

I’ve disabled buffering on stderr as you suggested. No change (as expected, since I have timeouts even when nothing is logged to stderr).
I can’t paste the whole code here, but if you wish to give it a try, here’s the part that exhibits the issue:

data Move = LEFT | RIGHT | UP | DOWN | DIE
  deriving(Enum, Eq, Generic, Show)

iterativeDeepening :: Int -> GameTree -> IO (Int, Move, ThreadId)
iterativeDeepening delay gameTree = do
  startCPU <- fromInteger <$> getCPUTime
  startClock <- getClockTime

  result <- newIORef (0, LEFT)
  thread <- forkIO $ worker result 0

  sleep delay startCPU startClock

  (d, move) <- readIORef result
  return (d, move, thread)

  where worker output depth = case bestMove of
          DIE -> return ()
          _ -> do
            void $ evaluate bestMove
            output `writeIORef` (depth+1, bestMove)
            worker output (depth+1)
          where bestMove = getBestMove (depth+1) gameTree
          -- Implement any getBestMove that is exponential in depth

sleep :: Int -> Int -> ClockTime -> IO ()
sleep delay startCPU startClock = do
  -- threadDelay $ 3*1000
  endCPU <- fromInteger <$> getCPUTime
  endClock <- getClockTime

  let durationClock = toMilliseconds $ diffClockTimes endClock startClock
      durationCPU = div (endCPU - startCPU) (1000*1000*1000)

  if max durationCPU durationClock > delay
  then hPutStrLn stderr $ unwords [show durationCPU, show durationClock]
  else sleep delay startCPU startClock

You should be able to easily adapt it to reproduce the issue. If you need any other part of the code, let me know, I’ll gladly provide it.

Aries · November 9, 2015, 9:00pm

I’ve adapted my Pi calculation to Tron and, from what I see from your code, we basically do the same thing. The only difference is that I output things on stderr and, regarding this, not disabling the corresponding buffering leads to timeouts in the IDE, not only in the arena (I’m still getting timeouts in arena from time to time). Since output buffering is just another thing which need to be garbage collected, it always ends with the same culprit I guess. The cost incurred however seems a bit excessive to me, even without any special configuration (or simply more memory).

That’s not an answer to the problem, but perhaps Codingame should introduce some kind of average time budget for each bot instead of a hard limit (even if it cancel the benefit of some languages).

_CG_Maxime · November 10, 2015, 7:02am

It’s hard to compute an average time before knowing the number of rounds the game will last, and we can’t use the maximum number of rounds because that would have as effect to give to much computation time to the AI which would slow down everything. But the idea is interesting though. Maybe the average should apply on a certain time window.

Aries · November 10, 2015, 5:42pm

I had something simpler in mind: instead of having a hard limit of 100 ms each turn, a player is only required to stay bellow N*100 ms for its N last turns, starting from its Nth turn (N = 10 should be enough). This way, its bot could spent 190 ms during one turn then 90 ms the 9 following turns, it will be ok. A possible refinement would be to add an input parameter to instruct the bot about its time budget (per instance, -90 after the turn where its spent 190 ms, then -80 the turn after and so on). This way, it could go back in line by adjusting any internal timer. Unpredictable GC triggering should be no more a problem and I don’t see any way this mechanism could be abused by other, more predictable, languages.

_CG_Maxime · November 10, 2015, 7:56pm

We’ll keep that idea in mind. For now, we are quite busy so don’t expect any change on short term.

Aries · March 13, 2016, 1:12pm

A little grave digging today…

The delay I was observing from time to time (20 ms) was not caused by the GC after all, but to the way Haskell deals with threads. Basically, a thread which doesn’t yield keeps running until the rescheduling timer, which runs on a 20ms granularity by default, preempts it. There are different ways to tweak how Haskell deals (rather efficiently by the way) with threads (green and native), but, in our case, simply inserting yield at the right places makes a pretty responsive program. I’ve update my example which now always answers in less than 1 ms after being forced to stop.