GCC 5.0 added support for AutoFDO, a form of feedback-directed optimization (FDO) that uses perf to generate the profile. The GCC manual documents it; to quote:

-fauto-profile=path
Enable sampling-based feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer, -ftree-vectorize,
-finline-functions, -fipa-cp, -fipa-cp-clone, -fpredictive-commoning, -funswitch-loops, -fgcse-after-reload, and -ftree-loop-distribute-patterns.
path is the name of a file containing AutoFDO profile information. If omitted, it defaults to fbdata.afdo in the current directory.
Producing an AutoFDO profile data file requires running your program with the perf utility on a supported GNU/Linux target system. For more information, see https://perf.wiki.kernel.org/.
E.g.
perf record -e br_inst_retired:near_taken -b -o perf.data \
-- your_program
Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC. You must also supply the unstripped binary for your program to this tool. See https://github.com/google/autofdo.
E.g.
create_gcov --binary=your_program.unstripped --profile=perf.data \
--gcov=profile.afdo

However, this skims over a few details:

br_inst_retired:near_taken is not available under the name shown there. See this gcc thread for details.
I recorded with the explicit event encoding instead:
```
perf record \
-e "cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp" \
-p ...  -b -o perf.data
```
You can use ocperf from pmu-tools here to look up the correct event (with ocperf.py list).
create_gcov is not packaged with gcc; it is only available with autofdo from Google.
However, you can run into problems because autofdo is incompatible with the latest perf. I am using the perf from Linux 4.0. You can apply the patches here.
- I also have a github branch with the patches applied here.
Finally, you can also run into a gcov version incompatibility:

AutoFDO profile version 875575082 does match 1.

You need to explicitly provide the gcov_version for this:

create_gcov --binary=/pxc56/bin/mysqld \
 --profile=perf.data -gcov_version 1 \
 --gcov=perf.ado
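
The large number in that error message is easier to read once it is decoded. The sketch below uses a hypothetical helper, `decode_version` (not part of gcc or autofdo), to print a 32-bit value in hex and as four ASCII bytes. One plausible reading, stated here as an assumption, is that the result is a gcov-style version tag that create_gcov writes by default, whereas GCC expects AutoFDO profile version 1.

```shell
# Hypothetical helper: show a 32-bit integer as hex and as four ASCII bytes,
# most significant byte first.
decode_version() {
  n=$1
  out=""
  for s in 24 16 8 0; do
    byte=$(( (n >> s) & 255 ))
    # Convert the byte value to an octal escape that printf can emit.
    out="$out$(printf "\\$(printf '%03o' "$byte")")"
  done
  printf '0x%08x %s\n' "$n" "$out"
}

decode_version 875575082   # prints: 0x3430372a 407*
```

If that reading is right, passing `-gcov_version 1` simply makes create_gcov stamp the profile with the version GCC checks for.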

Now, with all tools in place, all you need to do is:

Build the program. In my case, I built percona-xtradb-cluster with the RelWithDebInfo build type. Debug symbols are required.

Run it against a representative workload. I used sysbench oltp for this.

sysbench --test=/pxc56/db/oltp.lua --db-driver=mysql \
--mysql-engine-trx=yes --mysql-table-engine=innodb \
--mysql-user=root --mysql-password=test --oltp-table-size=100000 \
--num-threads=4 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=60 \
--max-requests=100000 --oltp-tables-count=5 run

While the workload is running, run perf concurrently.

perf record -e \
"cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009/pp" \
-p $(pidof mysqld) -b -o perf.data
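
The event string passed to `-e` is just an event code, a unit mask, a label and a sampling period glued together. The sketch below rebuilds it from those parts and also prints the short raw form. Both function names are hypothetical helpers, not part of perf or pmu-tools, and the raw form assumes the usual Intel x86 layout (umask byte followed by event byte); check `ocperf.py list` for your CPU before relying on it.

```shell
# Hypothetical helpers for illustration only.

# Build the cpu/.../pp event string from its parts.
# $1 = event code, $2 = umask, $3 = label, $4 = sampling period
event_spec() {
  printf 'cpu/event=0x%02x,umask=0x%02x,name=%s,period=%d/pp\n' "$1" "$2" "$3" "$4"
}

# Assumption: on Intel x86 the raw code is the umask byte followed by the event byte.
# $1 = event code, $2 = umask
raw_code() {
  printf 'r%02x%02x\n' "$2" "$1"
}

event_spec 0xc4 0x20 br_inst_retired_near_taken 400009
raw_code 0xc4 0x20   # prints: r20c4
```

The first call reproduces the string used in the perf record command above.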

After sysbench ends, stop perf and then convert perf.data to gcov format.

create_gcov --binary=/pxc56/bin/mysqld \
--profile=perf.data -gcov_version 1 --gcov=perf.ado

Now rebuild the program, this time with:

export CFLAGS+=" -fauto-profile=/tmp/perf.ado"
export CXXFLAGS+=" -fauto-profile=/tmp/perf.ado"
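
Since a missing or empty profile is easy to overlook in a long build, a small guard before setting the flags can help. This is a minimal sketch: `afdo_flag` is a hypothetical helper, and it only checks that the file exists and is non-empty, not that it is a valid AutoFDO profile.

```shell
# Hypothetical helper: print the -fauto-profile flag for a profile file,
# or fail if the file is missing or empty.
afdo_flag() {
  profile=$1
  if [ ! -s "$profile" ]; then
    echo "profile '$profile' is missing or empty" >&2
    return 1
  fi
  printf '%s%s\n' '-fauto-profile=' "$profile"
}
```

It could then be used as `flag=$(afdo_flag /tmp/perf.ado) && export CFLAGS="$CFLAGS $flag" CXXFLAGS="$CXXFLAGS $flag"`, so the build only picks up the flag when the profile is actually there.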

The binary produced now is optimized using feedback from the profile captured by perf.

I have skipped the results for now; those are for another post, with actual benchmarking and a more representative workload.

To conclude, gcc has long had gcov-based profiling, but it wasn't that convenient to use. perf is a low-overhead profiler already in use in many environments, so being able to feed its profile to the compiler makes profile-based optimization much easier.

Feedback directed optimization with GCC and Perf
