Timeouts with C++ in CSB (might be more general)

So according to the graph, it’s better to submit at 10pm :smiley:

I think compilation errors can be ruled out, because I never saw any compiler output in these failures (no line error or warning, just an empty error or the mount proc one). It’s the same tested code that compiles fine in the CG IDE, on my local GCC and in Visual Studio. It also runs fine in the rest of the matches, except when the dreaded proc error happens.
I saw the same behavior as Agade: some submits have crazy high failure rates at the start, with up to 5 crashes in the first 15 matches, all of them at turn 0. It’s annoying as hell; lately I’ve just given up trying to get a clean submit without any mount error in the first 20 matches.
I can’t say much about the execution part, but with 60ms in it and a lot of code guards for possible overflows/errors, I don’t get any assert errors. I’d say the problem is in that umount/mount phase.

Also, I can’t tell whether it’s a problem with compiled languages only; in multis I’m usually C++/C# masterrace.

Code failed: your program was terminated before reaching the main entry point for your language
(possible reasons: segfault on static initializer, too much memory used, etc.)
/usr/bin/stdbuf: failed to run command ‘/tmp/Answer’: Permission denied

That doesn’t sound good

Code failed: your program was terminated before reaching the main entry point for your language
(possible reasons: segfault on static initializer, too much memory used, etc.)
/usr/bin/stdbuf: failed to run command ‘/tmp/Answer’: No such file or directory

No file?

In Hypersonic, but I don’t think that matters.

Code failed: your program was terminated before reaching the main entry point for your language
(possible reasons: segfault on static initializer, too much memory used, etc.)
“/tmp/Answer”: not in executable format: File truncated
No executable file specified.
Use the “file” or “exec-file” command.

Some news regarding the “mount proc”:

  • We made some fixes in the code. @Marchete and others, could you please check whether this improved anything?
  • As we now trace these errors, I can tell you that their number is highly variable. We can go a full day without one, and then all of a sudden get many over a short period of time.
  • So if you still get the errors, please let me know which code you used on which game, as I now believe this is tied to particular code (although I cannot understand why it would be).

Regarding the “‘/tmp/Answer’: No such file or directory” or “‘/tmp/Answer’: not in executable format: File truncated” issues, we are investigating. This can be due to two things: either the compilation takes too long or it takes too much memory, and in both cases our system kills the compilation process. We see it happening for C++ and Swift programs.

Regarding the “‘/tmp/Answer’: Permission denied” error, no idea. It is probably a bug in our compilation caching system, but we have no clue as to why for now. Anyway, let’s fix the above problems first.

We also raised the allowed compile time to 20s for all languages. This should remove the “No such file or directory” error, as we confirmed that the 10s limit was the cause.

It seems that these changes are working. Before them, I struggled to get through the first 10 matches without a timeout. Now I have tested 3 resubmits and I haven’t seen any in the first matches.
About the compilation time: in C++ you can have code that precalculates stuff at compile time, or maybe the compiler does it by itself as an optimization, but on my local GCC my code barely takes 2 seconds to compile. I have no idea how to control memory usage at compile time.
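For anyone unsure what compile-time precalculation looks like, here is a minimal sketch, assuming a C++17 compiler. The table size and the names `kTableSize`, `makeSquares` and `kSquares` are made up for illustration and are not from anyone’s bot.

```cpp
#include <array>
#include <cstddef>

// Hypothetical example: a lookup table of squares filled in by the compiler.
constexpr std::size_t kTableSize = 256;

constexpr std::array<int, kTableSize> makeSquares() {
    std::array<int, kTableSize> table{};
    for (std::size_t i = 0; i < kTableSize; ++i) {
        table[i] = static_cast<int>(i * i);
    }
    return table;
}

// A constexpr variable must be initialized at compile time,
// so no work is left for program startup.
constexpr auto kSquares = makeSquares();

// Checked by the compiler, not at runtime.
static_assert(kSquares[12] == 144, "table is computed at compile time");
```

The bigger the table or the heavier the `constexpr` function, the more work the compiler has to do, which is one way compile time can creep up without the source getting longer.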
If there is still some corner case of the problem (maybe some weird race condition), it could be enough to just retry the process once if an error is detected during the mount phase.
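The single-retry idea could look roughly like this. It is only a sketch: `runWithSingleRetry` and the `launch` callback are hypothetical names, and nothing here reflects how CodinGame’s launcher actually works.

```cpp
#include <functional>

// Calls `launch` once; if it reports failure, calls it exactly one more time.
// Returns the attempt number that succeeded (1 or 2), or 0 if both failed.
int runWithSingleRetry(const std::function<bool()>& launch) {
    if (launch()) {
        return 1;  // first attempt worked, no retry needed
    }
    if (launch()) {
        return 2;  // transient failure, the retry worked
    }
    return 0;      // failed twice: probably not a transient problem
}
```

Capping it at one retry keeps a real, persistent failure visible instead of hiding it behind a loop.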

I am glad we fixed things.

Regarding compile time, that’s fine: some programs were near the 10s limit and would sometimes take just over 10s. With a 20s limit it seems to be OK for all languages (except Scala, for which we still grant more time).
We ruled out the memory problem at compile time.

With the next release when we detect a “mount” problem or any other internal problem, the game will be flagged as “in error” and retried 5 minutes later (plus, we get an alert). This “error flag” mechanism is not new but it was not applied to our C system code that would do the jailing, mounting, etc. So it means that in case of an issue, either it is a transient issue that will get solved by itself or we will be spammed with alerts and forced to look into it.

We’ll also put a probe into the “/usr/bin/stdbuf: failed to run command ‘/tmp/Answer’: Permission denied” issue to try to understand it and fix it.

I think we can finally go back to the initial timeout issue (spiking issue). A more complex problem… (next culprit in sight: systemd which consumes CPU all over the place when mounting partitions)

@_CG_XorMode I am currently experiencing an abnormally high number of CSB games crashing on the very first turn with this error message: /usr/bin/stdbuf: Resource temporarily unavailable

When trying to see if Python3 was the culprit, I obtained a similar one using C++:

Code failed: your program was terminated before reaching the main entry point for your language
(possible reasons: segfault on static initializer, too much memory used, etc.)
/usr/bin/stdbuf: Resource temporarily unavailable

This is a very common occurrence, as my last submission had 13 of its 44 losses (~30%) directly caused by a game crashing on the first turn. An additional 2 losses were very suspicious timeouts, which could indicate the spiking issue is back as well, and possibly another symptom. I can also reproduce the crash on first turn maybe once every 10 games in IDE.

So far I have only obtained a few clues from my investigation. This crash seems to happen well before any of my code gets executed, as I have been unable to retrieve any logs when it happens. However, a key piece of information I discovered later is that 100% of those occurrences happen as player 1. Not only have I been unable to reproduce this crash with my bot as player 2, but I have also been able to reproduce it with another opponent as player 1 (see replay).

This seems to be a recent regression, as I have not had this problem at all until today. I will happily provide any more information or assistance to track down and fix this issue. This is severely impacting my progression at the moment. :confused: Thanks!

EDIT: Revamped the post to match my investigation progress.

Can confirm the /usr/bin/stdbuf: Resource temporarily unavailable error. This happened to me several times yesterday (C++, BR2048). I also had some suspicious sporadic timeout spikes of ~70ms(!) over the past week or so, both in CSB and BR. I haven’t checked any other multis.

I have already seen this message, but in a different context.

Create an executable file (a .sh file should be enough), then write a multithreaded program that spawns several processes executing that file. Some of the processes will fail to spawn with the following message: myfile.sh: Resource temporarily unavailable.

I can’t be sure, but I assume that CodinGame tries to spawn multiple child processes at the same time with the stdbuf command. Or maybe the file is locked by something else.

Same problem here (Bender4 / Java).
It occurs on submit, but also with single test cases (never the same one twice).

Thank you for reporting this and sorry this has been happening. It seems to be linked to an issue we had with our code machines yesterday. We’ve since rebooted them manually so you should not be having the same issues today.

If this is indeed the case, we’ll regularly reboot machines to avoid this kind of problem from now on.

I did not try yesterday, but I still get some /usr/bin/stdbuf: Resource temporarily unavailable errors on C4L (the only multi I tried). I don’t know whether the frequency is higher or not.
So I guess it still occurs on multis as well.

Still happening every 2-3 plays in the IDE as Player 1 as reCurse described, not sure about actual submits.
EDIT: in BR at least, not checked others

Same. On BR, 19 crashes out of 90 games :frowning:

ok :frowning: @_CG_XorMode will have a look this afternoon

EDIT: we’ve further investigated and discovered more machines affected with the issue. We rebooted them and will add a warning notification if it happens again so we can take action. We’ll also look into the root cause: zombies (don’t ask me more :sweat_smile:)
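For context on the zombie remark: on Linux, a child process that has exited but has not yet been waited for by its parent stays in the process table as a zombie, and “Resource temporarily unavailable” is the usual message text for `EAGAIN`, which `fork` can return when a process limit is reached. Whether that is exactly what happened on the CG machines is not confirmed in this thread. Below is a minimal POSIX sketch of spawning a child and reaping it; `spawnAndReap` is a hypothetical helper name, not anything from CodinGame’s system code.

```cpp
#include <cerrno>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Spawns `program` (looked up in PATH) with no arguments and waits for it.
// Returns the child's exit code, 127 if the exec failed in the child,
// or -1 with `error` filled in if fork/waitpid failed or the child was
// killed by a signal.
int spawnAndReap(const std::string& program, std::string& error) {
    pid_t pid = fork();
    if (pid < 0) {
        // With errno == EAGAIN this text is typically
        // "Resource temporarily unavailable".
        error = std::strerror(errno);
        return -1;
    }
    if (pid == 0) {
        // Child: replace the process image with the requested program.
        execlp(program.c_str(), program.c_str(), static_cast<char*>(nullptr));
        _exit(127);  // only reached if exec failed
    }
    int status = 0;
    // Parent: waitpid reaps the child. Without this call the exited child
    // would linger as a zombie for as long as the parent keeps running.
    if (waitpid(pid, &status, 0) < 0) {
        error = std::strerror(errno);
        return -1;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    error = "child terminated by a signal";
    return -1;
}
```

Calling it as `std::string err; int code = spawnAndReap("true", err);` should give `code == 0` on a typical Linux box.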

Currently it seems ok for me.
Thanks :wink:

Not a single crash in my last submit, thanks!

You might want to look here for strategies on how to kill those.

Beware, they might strike back!

Which might lead to crashes by raising the temperatures of the CPUs.