Java - Random timeouts due to

My Java uTTT bot is now consistently timeouting on turn 1. It’s doing some precalculations, very little memory usage. I don’t have the numbers at hand, but I think it was not using anything close to the 1000ms of the first turn before. This problem seems fairly recent, as my bot is still #33 in legend. Maybe something is seriously wrong with the Java 11 update, or with another recent change?

For information, the current GC is the serial Garbage Collector.

You can see this by executing the following code:

import java.lang.management.GarbageCollectorMXBean;
...


        List<GarbageCollectorMXBean> gcBeans = java.lang.management.ManagementFactory.getGarbageCollectorMXBeans();

        for (GarbageCollectorMXBean gcBean : gcBeans) {
            String name = gcBean.getName();
            System.err.println(name);
        }

Java 11 is cgroup-aware (contrary to java 8) so I guess that as Java sees that the program operates on a single cpu it selects the serial GC (as we do not force any from the command line). This is a good choice I guess for maximum throughput on a single cpu but the program is completely blocked during GC. Maybe with Java 8, it was using the parallel GC which was competing with the main program but at least you could escape the while() loop.

We’ll try to force some other Garbage collectors. We’ll use some of your bots for testing. Sorry for the issue!

7 Likes

I would also be interested in a solution here. My bot (Java/Code a la mode) keeps timing out on random turns as well and I can’t see a reason. Happens mostly on turn 2 or early in the game. It just suddenly stops with “timeout” message with the last timestamp having plenty of time left. I cannot definitely say that it is not my fault (all very experimental) but as the output just stops it is hard to tell.

Maybe as an intermediate solution you could give the bots 200 ms extra time if they don’t respond? They could still time out after 50 ms and won’t get another turn but this way we could check if it is a code-based or a system-based timeout and get some debug output?

I was looking into that myself after i talked to Thibaud about this yesterday.
It does not seem to be the GC. I added ‘finalize’ methods to all my own classes and there was no garbage collection during those peeks. Even when i gave the GC a hint (System.gc()) and i saw him cleaning there was no improvement and still got the same timeouts.

1 Like

Hmm.

If you want to check whether it comes from GC or not, the best would be to build on the code above:

for (GarbageCollectorMXBean gcBean : gcBeans) {
            String name = gcBean.getName();
            System.err.println(name);
            ((NotificationEmitter)gcBean).addNotificationListener(new GCListener(), null, null);
  }
...
public static class GCListener implements NotificationListener {

        @Override
        public void handleNotification(Notification notification, Object handback) {
            if (!notification.getType().equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
                return;
            }

            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo.from((CompositeData) notification.getUserData());
            GcInfo gcInfo = info.getGcInfo();
            String message = String.format("GC #%d: %dms [Cause:%s]", gcInfo.getId(), gcInfo.getDuration(), info.getGcCause());
            System.err.println(message);
}

But if you timeout, you won’t get the message…

If does not come from GC, then we are in trouble. Could come from the JIT as well.

Anyway investigating…

Could you suggest a code that fails always with Java 11 and used to work with Java 8 (just let me know which bot/game to take the code from)?

1 Like

Similar timeout problems for Clojure in Code a la mode (previously Java 8, Clojure 1.9 and now Clojure 1.10):


Currently my bots in ‘Code a la Mode’ multiplayer is a pretty save crash and it was working before the update.
Just take any of the last battles i end as rank 3 and you should get a good crash.

Thats also the bot that i located the use 10ms for a loop init problem in

e.g. in https://www.codingame.com/replay/381570637 with my current ide code i can see in frame 54 a waste of
9-10ms between line 480 and 483

all i do at those lines is

		System.err.println("blub " + (System.nanoTime() - begin));
		for (int i = 0; i < ingredientsNeeded.size(); ++i) {
			System.err.println(i + " " + ingredientsNeeded.get(i) + " " + ingredientsNeeded.size());
			System.err.println("blub1 " + (System.nanoTime() - begin));

Output is

blub 22025533
0 TART 2
blub1 31818889

Sometimes i’ve also seen that time is beeing wasted between line 272 and 277 and there is also only some comparison in between, shouldn’t take that long

Thanks @Tiramon. I do not see the “blub” and “blub1” thing on frame 54 of your replay.

Anyway, when I replay your code I can see you time out with the last output being “blub xxx” and then when a new “Code a la Mode” round starts I see the continuation of the ouput like 100 frames further (so the difference between “blub” and “blub 1” timings can be explained because you are eliminated at round 1 and do not get more time until round 2, the processes which timeouts are frozen in between the 2 rounds).

Anyway that does not explain why you timeout in the first place, but it gave me a hint as you have a lot of traces all over your code: I believe all your problems will be solved if you comment out all your System.err.println statements.

Could you try it?

If it solves the problem, we’ll fix this Java 8->Java 11 regression easily.

1 Like

I don’t think it is just the System.err.println. I did a quick try to printout dots and linebreaks for 40ms and got no timeout while this simple code timed out twice (out of 4 tries):

public static void main(String args[]) {
    Scanner in = new Scanner(System.in);
    int numAllCustomers = in.nextInt();
    for (int i = 0; i < numAllCustomers; i++) {
        String customerItem = in.next(); // the food the customer is waiting for
        int customerAward = in.nextInt(); // the number of points awarded for delivering the food
    }
    in.nextLine();
    for (int i = 0; i < 7; i++) {
        String kitchenLine = in.nextLine();
    }

    // game loop
    int numTablesWithItems = 0;
    int numCustomers = 0;
    while (true) {
        in.nextInt(); in.nextInt(); in.nextInt(); in.next(); in.nextInt();in.nextInt(); in.next();
        numTablesWithItems = in.nextInt();
        for (int i = 0; i < numTablesWithItems; i++) { in.nextInt(); in.nextInt(); in.next(); }
        in.next(); in.nextInt(); 
        numCustomers = in.nextInt(); // the number of customers currently waiting for food
        for (int i = 0; i < numCustomers; i++) { in.next(); in.nextInt(); }

        try { Thread.sleep(40); } catch(Exception ex) {}
        // =====================================================================
        
        System.out.println("WAIT");
    }
}

(Basically the starter code + a thread.sleep of 40ms).

Best Regards
Martin

Ok my head is spinning…

What I see is that by commenting out the System.err.println lines from @Tiramon code, the code no longer times out.

@ThomasNicoullaud Would be good to test your code without any System.err (in case you have some debug in your code). Either it solves the problem too and we have at least one lead. Or it does not solve anything and what I see with @Tiramon code is due to something else to be found…

Still, I can reproduce with @SuperNit code, so still investigating.

No change for me, timeout 90% of test run.
(You can try my CR Code. Just put debug var to false line 434).

Thomas

Same problem for me with Ultimate Tic Tac Toe.
Several timeout checks inside my code, with System.err.println time that exceed the timeout. Each timeout exceeded immediatly quit the method (return).

I have many System.err.println about this timeouts, without any code between them (except method return), and i’ve already experiment (random) deltas above 40~50ms.

Trying to optimize my code to avoid a maximum of garbage collected objects, quite complicated with Monte Carlo TS :disappointed_relieved:

PS : sorry for my english :wink:

1 Like

I added GC notifications on my GitC code that timeouts everytime, and I had 0 notifications. The code just seems to run much slower than before.

Same problem with Code of Kutulu…
Light code with only parameters reading and a few very simple distances calculation, random timeout.

Appearing that timeout becomes systematic with one “System.err.println(…)”

1 Like

Can you simply rollback @_CG_XorMode ? It seems that the new version is much worse than the old one for everyone. There is no reason to keep it in this condition.

1 Like

Indeed. On Code à la Mode, my code is timing out for all matches now although I had no such issue before on this game…

1 Like

@CG Also to validate that timeouts are due to GC you can test by setting Epsilon-GC (experimental in Java 11).

Any news here?

2 Likes

Just tried in Code A La Mode.
No timeouts in 10 games with commented err output

but a real simple bot in CultistWars wastes time while calling a method.
err output was just added afterwards to locate the time waster

timestamp before method call and first line in method is 70ms apart …

Couldn’t find the 70ms again
and sometimes id don’t even reach the method call

in this game it is 60ms vanishing

We’ll fallback to Java 8 tomorrow. We’ll continue to investigate so we can finally go back to Java 11 at some point.

8 Likes