[JAVA] JVM memory issues

Happened today on Vox Codei, when launching an IDE test case:
ERROR: ld.so: object ‘/usr/lib/coreutils/libstdbuf.so’ from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

Then got a timeout.

You can call System.gc() yourself.

I sure can, but this call can take dozens of ms, which can make your program time out.
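In Java the explicit call is System.gc(), and it is only a hint to the JVM. With the parallel collector it typically triggers a full collection, which is why it can be slow. To check how long it takes in a given environment, here is a minimal timing sketch; the class GcTiming is hypothetical, and the measured time will depend on the heap size and the collector in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: times one explicit System.gc() call.
// System.gc() is only a request; the JVM may ignore it (e.g. with -XX:+DisableExplicitGC).
public class GcTiming {

    /** Returns the wall-clock time, in milliseconds, spent inside System.gc(). */
    static double measureGcMillis() {
        long start = System.nanoTime();
        System.gc();
        return (System.nanoTime() - start) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Allocate some objects, then drop them so the collector has garbage to find.
        List<long[]> garbage = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            garbage.add(new long[4]);
        }
        garbage = null;
        // Debug output goes to stderr so it does not disturb the game protocol on stdout.
        System.err.printf("System.gc() took %.2f ms%n", measureGcMillis());
    }
}
```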

I decided to print out the JVM options from the running code of the Smash the Code AI puzzle (not the contest) and got this:
-Xmx512m, -Xms32m, -Xss128k
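The post doesn't show how the options were printed. One way to read them from inside a running program is the standard RuntimeMXBean, as in this sketch (the class name PrintJvmOptions is hypothetical). Printing to stderr keeps the output out of the game protocol on stdout.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical class: prints the flags the JVM was started with.
public class PrintJvmOptions {

    /** Command-line options passed to the JVM, e.g. [-Xmx512m, -Xms32m, -Xss128k]. */
    static List<String> jvmOptions() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.err.println(jvmOptions());
    }
}
```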

The available max memory seems different from what is written on this page: https://www.codingame.com/faq

Also, it seems like there is room for improvement in the GC parameters, but until then, initializing a huge memory buffer as suggested by Neumann seems reasonable.


Hey.

I have a performance issue (or maybe just broken code): I have a class ‘GameState’ which can be deeply cloned.

On my local PC, which runs a high-end VM, I can do about 15-20k clone() calls per ms (inside the VM), while on CG’s servers I can barely do 1500 per ms (10 times slower).

I am looking for advice on possible solutions (anything that could speed up my cloning by 10x) or any info that can lead me there. I am also curious about the real environment used for sandboxing.

Are there memory access patterns that I should avoid? Or any hints about memory management that are useful to know?
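The thread doesn't answer the cloning question directly. One common way to cut allocation (and therefore GC pressure) in clone-heavy searches, not something proposed in this thread, is to keep the state in flat primitive arrays and copy into preallocated instances with System.arraycopy instead of allocating a fresh deep copy per node. The GameState below is hypothetical, with illustrative fields, and this is not guaranteed to explain the 10x gap.

```java
// Hypothetical GameState: fields are illustrative, not taken from the poster's code.
public class GameState {
    static final int ROWS = 12;
    static final int COLS = 6;

    // Flat array instead of int[ROWS][COLS]: one array object instead of ROWS + 1.
    final int[] grid = new int[ROWS * COLS];
    int score;

    /** Allocating deep copy: creates a new object and a new array on every call. */
    GameState copy() {
        GameState c = new GameState();
        c.copyFrom(this);
        return c;
    }

    /** Non-allocating copy: overwrites this instance with the contents of other. */
    void copyFrom(GameState other) {
        System.arraycopy(other.grid, 0, grid, 0, grid.length);
        score = other.score;
    }

    public static void main(String[] args) {
        GameState root = new GameState();
        root.grid[0] = 3;
        // Preallocate one scratch state per search depth and reuse it for every node.
        GameState[] scratch = new GameState[8];
        for (int d = 0; d < scratch.length; d++) {
            scratch[d] = new GameState();
        }
        scratch[0].copyFrom(root);
        System.err.println(scratch[0].grid[0]);
    }
}
```

The allocating copy() is kept only for comparison; in a search loop, copyFrom() on a preallocated scratch state avoids creating any garbage per node.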

@MaximeC @SaiksyApo can you share some info about sandbox env?


The definitions of those parameters are:
-Xms sets the initial Java heap size
-Xmx sets the maximum Java heap size
-Xss sets the Java thread stack size
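The heap values can also be observed at runtime through java.lang.Runtime: maxMemory() roughly reflects -Xmx, and totalMemory() starts near -Xms and grows as the heap expands. There is no equivalent getter for -Xss there. A small sketch (HeapInfo is a hypothetical class name):

```java
// Hypothetical class: reports the heap figures the running JVM is using.
public class HeapInfo {
    private static final long MB = 1024L * 1024L;

    /** Upper bound the heap may grow to; roughly the -Xmx value. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently committed; starts near -Xms and grows on demand. */
    static long committedHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    /** Unused part of the committed heap. */
    static long freeHeapMb() {
        return Runtime.getRuntime().freeMemory() / MB;
    }

    public static void main(String[] args) {
        System.err.println("max=" + maxHeapMb() + " MB, committed=" + committedHeapMb()
                + " MB, free=" + freeHeapMb() + " MB");
    }
}
```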

Is the “place for improvement” you mentioned the big difference between 32 (initial) and 512 (max)?
Would the point of initializing a huge memory buffer be to raise the heap size close to 512 right from the beginning?
Any advice on how best to perform this init?

Yes, the easiest improvement is to increase Xms; see below for why.
Yes: by allocating a lot of memory from the start, you’re telling the JVM that you actually need more memory, so it is supposed to readjust its default GC settings. You can find a hint on how to do this in Neumann’s comment #4 above.
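Neumann's actual technique is not reproduced in this thread, so the following is only a guess at what such an initialization might look like: it temporarily keeps a large amount of data alive at startup, which forces the JVM to commit a bigger heap, then drops it. The class HeapWarmUp and the 1MB chunk size are hypothetical choices. Whether the young generation keeps the extra space afterwards depends on the collector's sizing policy, so check with GC logs (e.g. -verbose:gc) before relying on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical warm-up: hold `megabytes` of live data so the JVM must grow the
// committed heap, then release it before the game loop starts.
public class HeapWarmUp {

    /** Returns the committed heap size (bytes) observed while the chunks are still live. */
    static long warmUp(int megabytes) {
        List<byte[]> chunks = new ArrayList<>(megabytes);
        for (int i = 0; i < megabytes; i++) {
            chunks.add(new byte[1024 * 1024]); // 1 MB per chunk
        }
        long committed = Runtime.getRuntime().totalMemory();
        chunks.clear(); // drop the references; the memory becomes garbage
        return committed;
    }

    public static void main(String[] args) {
        long before = Runtime.getRuntime().totalMemory();
        long during = warmUp(200); // stays well below -Xmx512m
        long after = Runtime.getRuntime().totalMemory();
        System.err.printf("committed heap: before=%d MB, during=%d MB, after=%d MB%n",
                before >> 20, during >> 20, after >> 20);
    }
}
```

The warm-up itself triggers collections, so it belongs in initialization rather than inside a timed turn.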


To test how JVM settings affect performance, I created a dummy test based on the skeleton of a Smash the Code contest solution. It follows a pattern common to most multiplayer contests: a few global state objects that should never be GCed (usually very few of them), and a huge number of objects generated in the game loop that are short-lived and should be GCed very quickly. The test runs for 3 minutes, a typical max game time.

Here’s my test class:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class JVMTest {
    public static void main(String[] args) throws Exception {
        Scanner in = new Scanner(System.in);
        in.nextLine(); // wait for one line of input before starting
        SimulatorMetric simulatorMetric = new SimulatorMetric();
        // Small set of long-lived objects that should never be GCed
        List<Block> globalState = new ArrayList<>();
        initGlobalState(globalState);
        long start = System.currentTimeMillis();
        long end = start + 3 * 60 * 1000; // 3 minutes, a typical max game time
        long acceptedDelay = 100L;        // per-turn time limit
        long iterationMaxRun = 93L;       // time budget for the simulation itself
        long startIteration;
        while ((startIteration = System.currentTimeMillis()) < end) {
            long endIteration = startIteration + iterationMaxRun;
            Board board = readBoard();
            simulatorMetric.simulateBoard(board, endIteration);
            if (System.currentTimeMillis() > startIteration + acceptedDelay) {
                throw new RuntimeException("Unexpected delay found after");
            }
            waitOpponentMove();
        }
        System.out.println("Min iterations simulated per run: " + simulatorMetric.minIterations);
        System.out.println("Max iterations simulated per run: " + simulatorMetric.maxIterations);
        System.out.println("Average iterations per all runs: " + simulatorMetric.avgIterations);
    }

    private static void initGlobalState(List<Block> globalState) {
        for (int i = 0; i < 200; i++) {
            globalState.add(new Block());
        }
    }

    private static Board readBoard() {
        return new Board();
    }

    private static void waitOpponentMove() throws InterruptedException {
        Thread.sleep(100L);
    }

    private static class Board {
        private Block[][] grid = new Block[12][6];

        public Board() {
            for (int i = 0; i < 12; i++) {
                for (int j = 0; j < 6; j++) {
                    grid[i][j] = new Block();
                }
            }
        }

        public void addBlock(Block block) {
        }
    }

    private static class Block {
        private long row;
        private long column;
        private long color;
    }

    private static class SimulatorMetric {
        private long minIterations = Long.MAX_VALUE;
        private long maxIterations = 0L;
        private long avgIterations = 0L;
        private long numberOfCalls = 0L;

        // Allocates short-lived Blocks until endTime, then updates the statistics
        private void simulateBoard(Board board, long endTime) {
            long iterations = 0L;
            while (System.currentTimeMillis() < endTime) {
                board.addBlock(new Block());
                iterations++;
            }
            if (iterations < minIterations) {
                minIterations = iterations;
            }
            if (iterations > maxIterations) {
                maxIterations = iterations;
            }
            numberOfCalls++;
            avgIterations = (avgIterations * (numberOfCalls - 1) + iterations) / numberOfCalls;
        }
    }
}
```
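The test doesn't print GC activity itself, and the post doesn't say how the GC cycles were counted (running with -verbose:gc is one option). To observe collections from inside the program, the standard GarbageCollectorMXBean exposes a count and accumulated time per collector. The class GcStats is a hypothetical sketch.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical class: sums collection counts and times over all collectors
// (e.g. young and old generation) known to the running JVM.
public class GcStats {

    static long totalCollections() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount()); // -1 means "undefined"
        }
        return count;
    }

    static long totalCollectionMillis() {
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            millis += Math.max(0, gc.getCollectionTime()); // -1 means "undefined"
        }
        return millis;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // With -XX:+UseParallelGC on Java 8 the names are typically
            // "PS Scavenge" (young) and "PS MarkSweep" (old).
            System.err.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Calling totalCollections() at the start and end of a turn and comparing the two values is one way to tell whether a slow turn coincided with a collection.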
And here are the results I got:

  • -Xms32m -Xmx512m -XX:+UseParallelGC (seems to be the default JVM settings in all contests)
    Heap 32MB, YoungGen=10.5MB, Eden=8.5MB, Survivor=1MB
    Got 4-5 young-generation GC cycles, each usually lasting 3-5 ms
    Out of 20 runs, about 10% failed: some GCs took a few ms longer and apparently happened at the end of the simulation cycle, so the code threw exceptions reporting that the execution time was too long

Results of one test:
Min iterations simulated per run: 2267946
Max iterations simulated per run: 4463943
Average iterations per all runs: 3303737

  • -Xms180m -Xmx512m -XX:+UseParallelGC
    Heap 180MB, YoungGen=60MB, Eden=45MB, Survivor=7.5MB
    Got 1 GC cycle lasting 7-8 ms
    Didn’t get timeout exceptions, as the probability of GC kicking in at the end of the simulation cycle is much lower

Results of one test:
Min iterations simulated per run: 2296191
Max iterations simulated per run: 4457412
Average iterations per all runs: 3310138

  • -Xms320m -Xmx512m -XX:+UseParallelGC
    Heap 320MB, YoungGen=106.5MB, Eden=80.5MB, Survivor=13MB
    No GC cycles, no timeout exceptions

Results of one test:
Min iterations simulated per run: 2354515
Max iterations simulated per run: 4492568
Average iterations per all runs: 3292464

  • -Xms32m -Xmx512m -XX:+UseConcMarkSweepGC
    Heap 32MB, YoungGen=10.5MB, Eden=8.5MB, Survivor=1MB
    Got 4 young-generation GC cycles, each usually lasting 3-4 ms; however, the first cycle always seems to last 30-40ms. While this seems to affect the performance of the algorithm itself, it never threw the timeout exception, though that could have just been luck

Results of one test:
Min iterations simulated per run: 1322192 (must be due to the time when 30-40ms GC kicked in)
Max iterations simulated per run: 4488003
Average iterations per all runs: 3312208

  • -Xms180m -Xmx512m -XX:+UseConcMarkSweepGC -XX:NewRatio=2
    Heap 180MB, YoungGen=60MB, Eden=48MB, Survivor=6MB
    Got 1 GC cycle lasting 8-9 ms
    Didn’t get timeout exceptions

Results of one test:
Min iterations simulated per run: 1719210
Max iterations simulated per run: 5230574
Average iterations per all runs: 3452837

  • -Xms320m -Xmx512m -XX:+UseConcMarkSweepGC
    Heap 320MB, YoungGen=170MB, Eden=136MB, Survivor=17MB
    No GC cycles, no timeout exceptions

Results of one test:
Min iterations simulated per run: 2699521
Max iterations simulated per run: 4490146
Average iterations per all runs: 3325435
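Most of the YoungGen figures above are consistent with HotSpot's generation-sizing ratio: with the default -XX:NewRatio=2 the young generation gets one third of the heap (32MB gives about 10.5MB, 180MB gives 60MB, 320MB gives about 106.5MB), and the young generation is Eden plus two survivor spaces. The 320MB CMS run (170MB YoungGen) does not fit this formula; CMS on Java 8 apparently uses its own default young-generation sizing. A worked version for the 180MB heap, including the -XX:NewRatio=1 setting proposed in conclusion 4 below:

```latex
\begin{aligned}
\text{Young} &= \frac{\text{Heap}}{\text{NewRatio} + 1} = \text{Eden} + 2\,\text{Survivor} \\
\text{Heap} = 180\,\text{MB},\ \text{NewRatio} = 2:&\quad \text{Young} = \tfrac{180}{3} = 60\,\text{MB} = 45 + 2 \cdot 7.5\ \text{(Parallel)} = 48 + 2 \cdot 6\ \text{(CMS)} \\
\text{Heap} = 180\,\text{MB},\ \text{NewRatio} = 1:&\quad \text{Young} = \tfrac{180}{2} = 90\,\text{MB}
\end{aligned}
```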

Conclusions:

  1. Increasing -Xms definitely makes sense, as the vast majority of solutions will only benefit from it in terms of GC pause times. How much to raise it really depends. Setting it to the max value looks like the best option, as it avoids GC in almost all cases; however, memory-greedy solutions should be avoided, because cleaning a YoungGen full of 170-256MB of data can be expensive (YoungGen is usually 1/3 of the heap size). Setting Xms to 240-360MB could be a trade-off.
  2. In terms of GC time, the CMS collector surprisingly showed worse results than the parallel one; however, with small heap sizes it never threw the timeout exception. The average and max iterations don’t differ much either: CMS gives slightly better results there, but worse min iterations.
  3. So it looks like -XX:+UseParallelGC (the default), or even better an explicit -XX:+UseParallelOldGC, combined with an increased Xms could give much better results.
  4. Other optimization settings could be tried. E.g. due to the nature of these algorithms, very few objects should ever reach the OldGen, so we could shrink it and grow the YoungGen instead by setting -XX:NewRatio=1. Another option is to tune the survivor ratio (-XX:SurvivorRatio), but I’m afraid this depends too much on the algorithm and there’s no value that suits all.
  5. A note about Xss. This parameter controls the thread stack size and indirectly affects memory usage. If the value is too small, deep recursion fails with a StackOverflowError. On the other hand, a value that is too big limits how many threads a multithreaded solution can create, as we could get an OutOfMemoryError.
  6. This test was mostly a fun, informational exercise. Of course the admins at CodinGame are not required to change anything: if some people can write solutions without memory issues, then each of us can too, it’s just a matter of how you write your algorithm. Still, some help on that would obviously be appreciated :slight_smile:
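To see the Xss trade-off concretely, the sketch below measures how deep a trivial recursive method gets before java.lang.StackOverflowError on a thread created with an explicit stack size, via the Thread(ThreadGroup, Runnable, String, long) constructor. The class and method names are hypothetical; the JVM treats the requested stack size as a hint, and the depth varies with frame size and JIT state, so the numbers are only indicative.

```java
// Hypothetical class: probes the recursion depth reachable with a given stack size.
public class StackDepth {

    private static int depth;

    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recursion depth reached before StackOverflowError on a new thread with the given stack size. */
    static int maxDepth(long stackSizeBytes) {
        final int[] result = new int[1];
        Thread t = new Thread(null, () -> {
            depth = 0;
            try {
                recurse();
            } catch (StackOverflowError e) {
                result[0] = depth; // the stack has unwound back to this frame
            }
        }, "stack-probe", stackSizeBytes);
        t.start();
        try {
            t.join(); // join() also makes result[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }

    public static void main(String[] args) {
        // 128 KB mirrors the -Xss128k reported earlier in this thread.
        System.err.println("128 KB stack: depth " + maxDepth(128 * 1024));
        System.err.println("  1 MB stack: depth " + maxDepth(1024 * 1024));
    }
}
```

A real search function with more locals per frame will reach a smaller depth than this trivial one.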

I also had this problem multiple times today… always on init, before my first turn started…

[quote]ERROR: ld.so: object ‘/usr/lib/coreutils/libstdbuf.so’ from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
[/quote]

i.e.



That is a different problem, most likely caused by a bug in their virtual platforms. It only happens in the IDE though, so that’s not a big deal.

The replays I posted are ranking games…

@Tiramon Is this still happening?

I got one today @MaximeC, but this time it was a normal play action in Tron, not a ranking game.
If it happens again I’ll add a replay here.

@MaximeC got another OutOfMemory

ERROR: ld.so: object ‘/usr/lib/coreutils/libstdbuf.so’ from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread

And another replay for @MaximeC

ERROR: ld.so: object ‘/usr/lib/coreutils/libstdbuf.so’ from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread

Just got something similar, but not quite the same:

ERROR: ld.so: object ‘/usr/lib/coreutils/libstdbuf.so’ from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
Error occurred during initialization of VM
Cannot create VM thread. Out of system resources.

Me too, I regularly encounter this problem:

ERROR: ld.so: object ‘/usr/lib/coreutils/libstdbuf.so’ from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.lang.ref.Finalizer.<clinit>(Finalizer.java:226)

It also seems to happen on the server when we submit our code. In the Codebusters multiplayer game I lose some games where my busters never move. I don’t think it’s a timeout, as I can’t reproduce it in the IDE.

In the Code of the Rings puzzle I have to submit 3-4 times before all tests pass.

This bug is really annoying as we lose some time and possibly ranks in multiplayer games.

I confirm the bug. A quick fix is to restart the faulty servers but that’s only temporary. We’re currently working on a long term fix ;).


OK, then I’ll stop my reports until you tell us here that it’s fixed :wink: