Regarding this topic, here is some more detail on how CG compiles C++ code and what the optimization pragmas actually do.
Basically, CG compiles C++ code with the “-O0 -g” options, which means no optimization and no inlining at all. For dummy samples using std::max, std::vector and std::sort, this gives the following generated code: https://godbolt.org/g/YOoPB0
As you can see on the assembly side, this is really bad: every single function is called separately and nothing is optimized.
For comparison, here is the code generated when -O3 is passed on the command line instead: https://godbolt.org/g/kLkX62
You see that the compiler inlines a lot of functions, and does aggressive optimization. The call to std::max completely disappears, and the sort implementation is almost fully inlined.
When one uses the #pragma GCC optimize("O3") trick, here’s what happens: https://godbolt.org/g/EZdEqB
As you can see, each function gets optimized according to the O3 flag, but none of them are inlined. This is why any call to std::max is slower than a macro.
So, what happens here? Well, it looks like the pragma only tells GCC to optimize each function O3-style, but it doesn’t activate all the global optimization flags, such as inlining, so GCC still handles that part O0-style…
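To make the macro comparison concrete, here is a minimal sketch of the two variants. The names MAX_OF, clamp_low_macro and clamp_low_std are hypothetical and only serve the illustration. The macro is expanded by the preprocessor, so there is no call left to inline, whereas the std::max version depends on the compiler inlining it.

```cpp
#include <algorithm>

// Hypothetical macro: expanded textually by the preprocessor, so no function
// call is emitted whatever the optimization level. Note that the selected
// argument is evaluated twice, so avoid arguments with side effects.
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

// Macro version: the comparison ends up directly in this function's body.
int clamp_low_macro(int value, int floor) {
    return MAX_OF(value, floor);
}

// std::max version: a real call remains unless the compiler inlines it.
int clamp_low_std(int value, int floor) {
    return std::max(value, floor);
}
```

Both functions return the same result; only the generated code differs.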
Is it possible to do better with pragmas? Yes. Not as good as command-line O3, but still, quite good: https://godbolt.org/g/syhzgm
By adding another #pragma GCC optimize("inline"), we can override the implicit -fno-inline that comes from O0 optimization, and tell GCC to try inlining the functions that are explicitly marked as inline. Also, the #pragma GCC optimize("omit-frame-pointer") removes the stores of the frame pointer, which are kept at O0 but useless most of the time.
As you can see, for std::max, which is marked as inline in the STL headers, these additional tricks make it as good as if it were compiled with command-line O3.
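Put together, the top of a source file using these pragmas could look like the sketch below. Placing the pragmas before the includes is an assumption on my side, based on the fact that the pragma only affects functions defined after it, so this is how the STL functions get covered too. The helper name largest is hypothetical.

```cpp
// Placed before the includes so the options also apply to the standard
// library functions defined in those headers (assumption: the pragma only
// affects functions defined later in the file).
#pragma GCC optimize("O3")
#pragma GCC optimize("inline")
#pragma GCC optimize("omit-frame-pointer")

#include <algorithm>
#include <vector>

// Hypothetical helper: returns the largest element, or 0 for an empty vector.
int largest(const std::vector<int>& values) {
    int best = values.empty() ? 0 : values.front();
    for (int v : values) {
        best = std::max(best, v);  // candidate for inlining with the pragmas above
    }
    return best;
}
```

Compilers other than GCC may simply ignore these pragmas, so the code stays portable.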
So why is this still not as good as -O3? I’m not entirely sure, but I did notice that the pragma trick for enabling inlining works for functions marked with the inline keyword and for small functions. Functions not marked as such will sometimes not be considered for inlining, even though they would be with -O3. This is also the case for every implicitly created function, such as default constructors and assignment operators. This means that you should define these explicitly and mark them with inline, even if you want the default behavior:
struct bla {
inline bla() = default;
inline bla(bla const&) = default;
inline bla(bla&&) = default;
inline bla& operator=(bla const&) = default;
inline bla& operator=(bla&&) = default;
};
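As a quick sanity check, here is the same pattern applied to a type with data members. The names Point and make_point are hypothetical. Since the special members are defaulted on their first declaration, they are not user-provided, so the type should keep the default behavior. The static_assert below checks that it is still trivially copyable.

```cpp
#include <type_traits>

// Hypothetical example type; the name and members are illustrative only.
struct Point {
    int x;
    int y;

    inline Point() = default;
    inline Point(Point const&) = default;
    inline Point(Point&&) = default;
    inline Point& operator=(Point const&) = default;
    inline Point& operator=(Point&&) = default;
};

// Defaulting on the first declaration keeps the special members trivial.
static_assert(std::is_trivially_copyable<Point>::value,
              "explicitly defaulted members should not change triviality");

// Hypothetical helper that exercises the defaulted constructor and copy/move.
Point make_point(int x, int y) {
    Point p;
    p.x = x;
    p.y = y;
    return p;
}
```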
Now, using all these tricks, you should have performance almost on par with -O3.