Feedback directed optimization with GCC and Perf

GCC 5.0 added support for a form of FDO that uses perf to generate the profile, enabled with -fauto-profile. This is documented in the GCC manual; to quote:

Enable sampling-based feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer, -ftree-vectorize, -finline-functions, -fipa-cp, -fipa-cp-clone, -fpredictive-commoning, -funswitch-loops, -fgcse-after-reload, and -ftree-loop-distribute-patterns.
path is the name of a file containing AutoFDO profile information. If omitted, it defaults to fbdata.afdo in the current directory.
Producing an AutoFDO profile data file requires running your program with the perf utility on a supported GNU/Linux target system. For more information, see .
perf record -e br_inst_retired:near_taken -b -o perf.data \
-- your_program
Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC. You must also supply the unstripped binary for your program to this tool. See .
create_gcov --binary=your_program.unstripped --profile=perf.data \

However, this skims over a few details:

  • br_inst_retired:near_taken is not available as shown there. See this gcc thread for details.

    I recorded with the raw event encoding instead:

    perf record \
    -e "cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp" \
    -p ...  -b -o perf.data

    You can use ocperf from pmu-tools here to find the correct event encoding for your CPU (with ocperf.py list).

  • create_gcov is not packaged with GCC; it is only available from Google's autofdo project.

  • You can also run into autofdo being incompatible with recent versions of perf. I am using the perf from Linux 4.0. You can apply the patches here to address this.

    • I also have a GitHub branch with the patches applied here.
  • Finally, you can also run into a gcov version incompatibility when GCC reads the profile:
    AutoFDO profile version 875575082 does match 1.
  • To avoid this, pass gcov_version explicitly to create_gcov:
    create_gcov --binary=/pxc56/bin/mysqld \
    --profile=perf.data -gcov_version 1
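As an aside, the odd-looking number in that error is easier to reason about once it is split into bytes. The sketch below does that in bash; the helper name `decode_version` is mine and not part of perf, autofdo or GCC. The four bytes come out as `407*`, which looks like a GCC 4.7-style gcov version tag rather than the AutoFDO version 1 that GCC expects. That would be consistent with having to pass -gcov_version 1 explicitly, though I have not confirmed it against the autofdo source.

```shell
#!/usr/bin/env bash
# Split a 32-bit version number into four bytes (most significant first)
# and print them as ASCII characters.
decode_version() {
    local v=$1 out="" s byte
    for s in 24 16 8 0; do
        byte=$(( (v >> s) & 0xff ))
        # Convert the byte value to a character via an octal escape.
        out+=$(printf "\\$(printf '%03o' "$byte")")
    done
    printf '%s\n' "$out"
}

decode_version 875575082   # prints 407*
```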

Now, with all tools in place, all you need to do is:

  1. Build the program. In my case, I built percona-xtradb-cluster with the RelWithDebInfo profile. The debug symbols are required.
  2. Run it against a representative workload. I used sysbench oltp for this.

    sysbench --test=/pxc56/db/oltp.lua --db-driver=mysql \
    --mysql-engine-trx=yes --mysql-table-engine=innodb \
    --mysql-user=root --mysql-password=test --oltp-table-size=100000 \
    --num-threads=4 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=60 \
    --max-requests=100000 --oltp-tables-count=5 run

  3. While the workload is running, run perf concurrently.
    perf record -e \
    "cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp" \
    -p $(pidof mysqld)  -b -o perf.data
  4. After sysbench ends, stop perf and then convert perf.data to the gcov format.
    create_gcov --binary=/pxc56/bin/mysqld \
    --profile=perf.data -gcov_version 1 --gcov=perf.ado
  5. Now rebuild the program, this time with the profile (here placed at /tmp/perf.ado) passed to the compiler:
    export CFLAGS+=" -fauto-profile=/tmp/perf.ado "
    export CXXFLAGS+=" -fauto-profile=/tmp/perf.ado "

    The binary produced now is the one optimized with hints/feedback from the profile captured by perf.

    I have skipped the results for now; that is for another post with actual benchmarking in place and a better representative workload.
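    The perf, create_gcov and rebuild steps above can be kept together in one small script. This is only a dry-run sketch: it prints the commands so they can be reviewed and pasted, and executes nothing. The paths and the event encoding are the ones used in this post; the variable and function names, and the PID 12345, are placeholders of mine.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the perf and create_gcov commands from the steps
# above so they can be reviewed before running. Nothing is executed.
set -eu

BINARY=/pxc56/bin/mysqld      # unstripped binary built with debug symbols
PERF_DATA=perf.data           # raw perf output
PROFILE=/tmp/perf.ado         # converted profile, as passed to -fauto-profile

# Raw encoding of the taken-branch event used in this post.
EVENT='cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp'

# Print the perf invocation for an already-running process; $1 is its PID.
record_cmd() {
    printf 'perf record -e "%s" -p %s -b -o %s\n' "$EVENT" "$1" "$PERF_DATA"
}

# Print the conversion of perf.data into the profile that GCC reads.
convert_cmd() {
    printf 'create_gcov --binary=%s --profile=%s -gcov_version 1 --gcov=%s\n' \
        "$BINARY" "$PERF_DATA" "$PROFILE"
}

# Print the extra flag for the second build.
rebuild_flags() {
    printf -- '-fauto-profile=%s\n' "$PROFILE"
}

record_cmd 12345   # 12345 is a placeholder PID
convert_cmd
rebuild_flags
```

    Keeping the event string and the profile path in one place makes it harder for the record, convert and rebuild steps to drift apart, for example by writing the profile to one path and pointing -fauto-profile at another.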

    To conclude, even though GCC has had gcov-based profiling before, it wasn’t that convenient to use. perf is a good low-overhead profiler that is already in use in various environments, so being able to feed its profile to the compiler certainly makes profile-based optimization easier.

    Name of author

    Name: Raghavendra

    Short Bio: http://wnohang.net/about
