CodinGame, C++ and the STL

Hello guys!

The latest blog entry is about codingame legends sharing their tips for multiplayer contests (if you are learning something here, go read it now :slight_smile: )

Two of them agree that in C++, one should avoid the STL whenever possible. Why is that?

performance consideratoins is the only reason I can think of.
If you don’t care about 2-3 times slower performance, there’s no reason not to use STL.
If you do care, every STL object used is a tradeoff between code size and performance.

STL should be used in “real life” conditions. STL has been highly optimized and is now way more efficient than any code you could develop yourself. The problem with CG is that the compilation flags are not including the optimization flag O3 (I think everything is in O2, but I need a confirmation). So the STL is slower than “plain static tables”.

The reason for the absence of the flag is still a bit mysterious to me though.

1 Like

It’s not -O2 it’s -g without optimisation flags and the code is ran with gdb. If I remember correctly the main problem is this -g. You can get some performance back with the following pragma at the top of your code:

#pragma GCC optimize “-O3”

But it doesn’t have the performance something actually compiled with -O3 has because -g cancels alot of optimisations.
By the way you can reproduce this locally just compile some code with -g and this pragma vs compiling with -O3.

The code is ran through GDB O_o ?

1 Like

Thanks for the replies!

As far as I can tell, that’s saying that old-school arrays should be preferred to vectors, Maybe also STL arrays? I thought I read somewhere that STL arrays are strictly equivalent to handmade static arrays…
In any case, I can live with that.

As for the rest of the STL, I don’t believe any implementation of hash tables or balanced trees that I could come up with will be more efficient than STL implementations, with or without optimizations. :smiley:

Forcing -g is indeed mysterious, does CG need the debug instruments?

And side-thought: why not use CG’s support for bash to work-around this limitation?

g++ -O3 -o foo <<EOF
(...)
EOF

./foo
1 Like

Just out of curiosity I tried this during the last contest to compare the performances.
The issue is that the compiler takes some time so you timeout during the first turn.

It is however possible to send a binary of your locally compiled program (for example using base64 in Bash) but I guess this would be considered cheating :slight_smile:
Also if you don’t use STL, the performance gap is not very relevant.

1 Like

Love that idea. Is it still cheating if the code that is submitted is the one that’s compiled and sent?

That’s the easiest way to get nice stack traces whenever a crash happens. But it doesn’t have a big impact on performance if any.

Actually “-g” doesn’t cancel any optimization, it just adds debugging symbols to the binary. It makes it larger, but the code sections themselves are the same with and without “-g”.

The pragma trick works perfectly fine. However “O3” might not be the best optimization setting depending on your code. “O3” allows very aggressive inlining and loop unrolling, which can induce generated code bloat, and ultimately leads to suboptimal performance. “O2” might be better in some (a lot of) cases, and is usually advised as a default optimization setting, unless proved otherwise.

There are also other optimization options that a performance freak could enable to squeeze every last drop of CPU time, but I believe that this kind of coding tricks is part of the C++ coding challenge.

Of course, inline assembly and SIMD intrinsics are also another way to achieve better performance.

You are wrong.

-g does not affect performance on linux so it’s ok. And without -g we can’t have the stack when the program crash (not really sure of that).

You are right. I didn’t think of this.

1 Like

I don’t think there is loop unrolling in O3. Only inlining and register tricks.
But I guess the problems with O3 are quite rare and marginal now. I remember reading somewhere on stackoverflow (found it : http://stackoverflow.com/questions/11546075/is-optimisation-level-o3-dangerous-in-g) that most of the problems arising with O3 were in earlier versions of gcc/g++. And that now, problems can arise if you use a low L1 cache machine or rely on undefined behaviour (which is always a bad idea right :D).

1 Like

Even then, O3 isn’t always optimal. I’ve got a ~50% performance drop when switching from O2 to O3 with the same code on CSB.

Alright, my bad then. It looks like the pragma O3 isn’t enabling the full inlining O3 is supposed to offer, as well as some other global optimizations, like functions and jumps alignments and frame pointer optimization.

Regarding frame pointer optimization, it can be enforced with the same pragma trick (#pragma GCC optimize("-fno-omit-frame-pointer")), but the alignment cannot be enforced outside of the command-line apparently. The full inlining is also problematic, especially for the standard library headers which functions will usually not get inlined - which would explain to me why the use of the STL is so slow on CG. For user functions, one can still mark them as “always_inline” and get the O3-style inlining on them.

Indeed, I also tried to enable loop unrolling in addition, but it was even worse on CSB (but slightly beneficial on STC).

What that means in any case is that you should always measure and not rely on the maximum optimization level, it may not be optimal for a given code.

1 Like

In CG case, STL arrays turns out to be slower than vectors

What that means in any case is that you should always measure and not rely on the maximum optimization level, it may not be optimal for a given code.

That’s actually a sanity rule that should be always true ^^.

As a parenthesis, indeed, loop-unrolling should almost never be used as a “standard flag”. Devs should be picking the loops they want to unroll “by hand” (using gcc attributes).

1 Like