Data structures used by the dataflow framework in GCC were reorganized for better memory usage and more cache locality. Compile time improves, especially for units with large functions (possibly the result of heavy inlining) that do not fit into the processor cache. The compile time of the GCC C compiler binary with link-time optimization went down by over 10% (benchmarked on an x86-64 target).
Interprocedural optimization improvements
The interprocedural framework was re-tuned for link time optimization. Several scalability issues were resolved.
Auto-detection of const and pure functions was improved, and noreturn functions are now auto-detected as well.
The new -Wsuggest-attribute=[const|pure|noreturn] flag informs users when adding attributes to declarations in headers might improve code generation.
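As an illustration of the attributes that flag refers to, here is a minimal sketch. The function names (square, count_char, fail) are hypothetical and not taken from GCC. If the annotations were left off, compiling with -O2 and -Wsuggest-attribute=const, -Wsuggest-attribute=pure or -Wsuggest-attribute=noreturn may suggest them, although whether a warning actually fires depends on what the compiler can prove.

```c
#include <stddef.h>
#include <stdlib.h>

/* const: the result depends only on the argument values;
   the function reads no global memory and has no side effects. */
__attribute__((const)) int square(int x);

/* pure: no side effects, but the function may read memory
   (here, the string it is given). */
__attribute__((pure)) size_t count_char(const char *s, char c);

/* noreturn: control never comes back to the caller. */
__attribute__((noreturn)) void fail(void);

int square(int x)
{
    return x * x;
}

size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c)
            ++n;
    }
    return n;
}

void fail(void)
{
    abort();
}
```

Putting the attributes on the declarations in a header lets callers in other translation units benefit, for example by merging repeated calls to square with the same argument.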
A number of inlining heuristic improvements. In particular:
Partial inlining is now supported and enabled by default at -O2 and greater. The feature can be controlled via -fpartial-inlining.
Partial inlining splits functions that have a short hot path to return. This allows more aggressive inlining of the hot path, leading to better performance and often to code size reductions (because the cold parts of functions are not duplicated).
Scalability for large compilation units was improved significantly.
Inlining of callbacks is now more aggressive.
Virtual methods are considered for inlining when the caller is inlined and devirtualization is then possible.
Inlining when optimizing for size (either in cold regions of a program or when compiling with -Os) was improved to better handle C++ programs with larger abstraction penalty, leading to smaller and faster code.
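The shape of function that partial inlining targets can be sketched as follows. This is only an illustration: struct cache, get_value and sum_twice are hypothetical names, and whether GCC actually splits get_value depends on its heuristics and the optimization level.

```c
/* Hypothetical one-entry cache used only for this sketch. */
struct cache {
    int valid;
    long value;
};

long get_value(struct cache *c, int n)
{
    /* Short hot path: a cheap test followed by an immediate return.
       This is the part a partial inliner could copy into callers. */
    if (c->valid)
        return c->value;

    /* Cold part: the expensive computation, which could stay
       out of line in a separate function body. */
    long sum = 0;
    for (int i = 1; i <= n; ++i)
        sum += (long)i * i;

    c->value = sum;
    c->valid = 1;
    return sum;
}

long sum_twice(struct cache *c, int n)
{
    /* With the hot path inlined, the second call reduces to
       a load and a compare once the cache is filled. */
    return get_value(c, n) + get_value(c, n);
}
```

Compiling with and without -fno-partial-inlining and comparing the generated assembly is one way to see whether the split happened for a given function.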
The IPA reference optimization pass detecting global variables used or modified by functions was strengthened and sped up.
Functions whose address was taken are now optimized out when all references to them are dead.
A new interprocedural static profile estimation pass detects functions that are executed once or are unlikely to be executed. Functions unlikely to be executed are optimized for size. Functions executed once are optimized for size except for their inner loops.
On most targets with named section support, functions used only at startup (static constructors and main), functions used only at exit and functions detected to be cold are placed into separate text segment subsections. This extends the -freorder-functions feature and is controlled by the same switch. The goal is to improve the startup time of large C++ programs.
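The kinds of functions this placement applies to can be marked explicitly with GCC's constructor and cold attributes. The sketch below uses hypothetical names (init_tables, report_error, checked_div). On ELF targets such functions typically end up in subsections such as .text.startup and .text.unlikely, but the exact section names are an assumption here and can be checked with objdump -t.

```c
#include <stdio.h>

static int initialized;

/* Runs once before main: a startup-only function. */
__attribute__((constructor)) static void init_tables(void)
{
    initialized = 1;
}

/* Marked cold by hand; GCC may also detect cold functions itself. */
__attribute__((cold)) int report_error(const char *msg)
{
    fprintf(stderr, "error: %s\n", msg);
    return -1;
}

/* Ordinary hot-path function that only rarely calls the cold one. */
int checked_div(int a, int b, int *out)
{
    if (b == 0)
        return report_error("division by zero");
    *out = a / b;
    return 0;
}

/* Helper so callers can observe that the constructor ran. */
int is_initialized(void)
{
    return initialized;
}
```

Grouping startup and cold code away from the hot code means fewer pages need to be touched during normal execution, which is the startup-time benefit the paragraph above describes.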
Yes, GCC 4.6 optimizes templates much better: intermediate code optimization is improved and loop unrolling is much more intelligent. For the full report, see: http://www.phoronix.com/scan.php?page=article&item=gcc_4248_intelamd&num=1