{"id":160,"date":"2015-04-29T14:41:08","date_gmt":"2015-04-29T09:11:08","guid":{"rendered":"https:\/\/blog.wnohang.net\/?p=160"},"modified":"2015-05-02T13:19:36","modified_gmt":"2015-05-02T07:49:36","slug":"feedback-directed-optimization-with-gcc-and-perf","status":"publish","type":"post","link":"https:\/\/blog.wnohang.net\/index.php\/2015\/04\/29\/feedback-directed-optimization-with-gcc-and-perf\/","title":{"rendered":"Feedback directed optimization with GCC and Perf"},"content":{"rendered":"<p><a href=\"https:\/\/blog.wnohang.net\/wp-content\/uploads\/2015\/04\/feedback.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blog.wnohang.net\/wp-content\/uploads\/2015\/04\/feedback.jpg\" alt=\"feedback\" width=\"520\" height=\"147\" class=\"aligncenter size-full wp-image-181\" srcset=\"https:\/\/blog.wnohang.net\/wp-content\/uploads\/2015\/04\/feedback.jpg 520w, https:\/\/blog.wnohang.net\/wp-content\/uploads\/2015\/04\/feedback-300x85.jpg 300w, https:\/\/blog.wnohang.net\/wp-content\/uploads\/2015\/04\/feedback-500x141.jpg 500w\" sizes=\"auto, (max-width: 520px) 100vw, 520px\" \/><\/a><br \/>\nGCC 5 has <a href=\"https:\/\/gcc.gnu.org\/gcc-5\/changes.html\">added<\/a> support for feedback-directed optimization (FDO) that uses <a href=\"https:\/\/perf.wiki.kernel.org\/\">perf<\/a> to generate the profile. 
There is documentation for this in the GCC manual, to quote:<\/p>\n<p><!--more--><\/p>\n<blockquote><p>\n  -fauto-profile=path<br \/>\n              Enable sampling-based feedback-directed optimizations, and the following optimizations which are generally profitable only with profile feedback available: -fbranch-probabilities, -fvpt, -funroll-loops, -fpeel-loops, -ftracer, -ftree-vectorize,<br \/>\n              -finline-functions, -fipa-cp, -fipa-cp-clone, -fpredictive-commoning, -funswitch-loops, -fgcse-after-reload, and -ftree-loop-distribute-patterns.<br \/>\n              path is the name of a file containing AutoFDO profile information.  If omitted, it defaults to fbdata.afdo in the current directory.<br \/>\n             Producing an AutoFDO profile data file requires running your program with the perf utility on a supported GNU\/Linux target system.  For more information, see &lt;https:\/\/perf.wiki.kernel.org&gt;.<br \/>\n              E.g.<br \/>\n                      perf record -e br_inst_retired:near_taken -b -o perf.data \\<br \/>\n                          -- your_program<br \/>\n             Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC.  You must also supply the unstripped binary for your program to this tool.  See &lt;https:\/\/github.com\/google\/autofdo&gt;.<br \/>\n              E.g.<br \/>\n                      create_gcov --binary=your_program.unstripped --profile=perf.data \\<br \/>\n                          --gcov=profile.afdo<br \/>\n<\/p><\/blockquote>\n<p>However, this skims over a few details:<\/p>\n<ul>\n<li>The br_inst_retired:near_taken event is not available in perf under the name shown there. 
See this <a href=\"https:\/\/www.mail-archive.com\/gcc@gcc.gnu.org\/msg76447.html\">gcc thread<\/a> for details.\n<p>I recorded the event by its raw encoding instead:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nperf record \\\n-e &quot;cpu\/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009\/pp&quot; \\\n-p ...  -b -o perf.data\n<\/pre>\n<p>You can use ocperf from <a href=\"https:\/\/github.com\/andikleen\/pmu-tools\">pmu-tools<\/a> to find the correct event (with ocperf.py list).<\/p><\/li>\n<li>\n<p>create_gcov is not packaged with GCC; it is only available as part of <a href=\"https:\/\/github.com\/google\/autofdo\">autofdo<\/a> from Google.<\/p>\n<\/li>\n<li>\n<p>However, autofdo can be incompatible with recent versions of perf; I am using perf from Linux 4.0. You can apply the patches <a href=\"https:\/\/gcc.gnu.org\/ml\/gcc\/2015-04\/msg00271.html\">here<\/a>.<\/p>\n<ul>\n<li>I also have a GitHub branch with the patches applied <a href=\"https:\/\/github.com\/ronin13\/autofdo\">here<\/a>.<\/li>\n<\/ul>\n<\/li>\n<li>Finally, you can also run into a gcov version incompatibility:<\/li>\n<\/ul>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nAutoFDO profile version 875575082 does match 1.\n<\/pre>\n<ul>\n<li>You need to explicitly provide the gcov_version for this:<\/li>\n<\/ul>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncreate_gcov --binary=\/pxc56\/bin\/mysqld \\\n --profile=perf.data -gcov_version 1 \\\n --gcov=perf.ado\n<\/pre>\n<p>Now, with all the tools in place, all you need to do is:<\/p>\n<ol>\n<li>Build the program. In my case, I built <a href=\"https:\/\/github.com\/percona\/percona-xtradb-cluster\">percona-xtradb-cluster<\/a> with the RelWithDebInfo build type. The debug symbols are required.<\/li>\n<li>\n<p>Run it against a representative workload. 
I used sysbench oltp for this.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsysbench --test=\/pxc56\/db\/oltp.lua --db-driver=mysql \\\n--mysql-engine-trx=yes --mysql-table-engine=innodb \\\n--mysql-user=root --mysql-password=test --oltp-table-size=100000 \\\n--num-threads=4 --init-rng=on --max-requests=0 --oltp-auto-inc=off --max-time=60 \\\n--max-requests=100000 --oltp-tables-count=5 run\n<\/pre>\n<\/li>\n<li>While the workload is running, run perf concurrently.\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nperf record -e \\\n&quot;cpu\/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=400009\/pp&quot; \\\n-p $(pidof mysqld)  -b -o perf.data\n<\/pre>\n<\/li>\n<li>After sysbench ends, stop perf and then convert perf.data to gcov format.\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncreate_gcov --binary=\/pxc56\/bin\/mysqld \\\n--profile=perf.data -gcov_version 1 --gcov=perf.ado\n<\/pre>\n<\/li>\n<li>Now, rebuild the program, but this time with:\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nexport CFLAGS+=&quot; -fauto-profile=\/tmp\/perf.ado &quot;\nexport CXXFLAGS+=&quot; -fauto-profile=\/tmp\/perf.ado &quot;\n<\/pre>\n<\/li>\n<li>The resulting binary is optimized using feedback from the profile captured by perf.<\/li>\n<\/ol>\n<p>I have skipped the results for now; those are for another post with actual benchmarking and a more representative workload.<\/p>\n<p>To conclude, even though GCC has had gcov-based profiling before, it wasn&#8217;t that convenient to use. 
perf is a low-overhead profiler already in use in many environments, so being able to feed its profiles directly to the compiler makes profile-based optimization much easier.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>GCC 5 has added support for feedback-directed optimization (FDO) that uses perf to generate the profile. There is documentation for this in the GCC manual, to quote:<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[48,47,46,43,49],"class_list":["post-160","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-fdo","tag-gcc","tag-perf","tag-pxc","tag-sysbench"],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p3AlYV-2A","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":59,"url":"https:\/\/blog.wnohang.net\/index.php\/2014\/04\/30\/saving-form-data\/","url_meta":{"origin":160,"position":0},"title":"Saving form data in firefox","author":"Raghavendra","date":"April 30, 2014","format":false,"excerpt":"When commenting on sites, I have sometimes, seen that the commenting system just swallows the comment, or there is a browser crash, or a system one. In these cases it would be great if you can recover it somehow, particularly when you typed quite a bit. 
There are plugins for\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":334,"url":"https:\/\/blog.wnohang.net\/index.php\/2020\/05\/22\/gossips-in-distributed-systems-physalia\/","url_meta":{"origin":160,"position":1},"title":"Gossips in Distributed Systems:  Physalia","author":"Raghavendra","date":"May 22, 2020","format":false,"excerpt":"I often take notes and jot down observations when I read academic\/industry papers. \u00a0 Thinking of a name for this series \u2018Gossips in Distributed Systems\u2019 seemed apt to me, inspired by the gossip protocol with which peers in these systems communicate with each other which mimics the spread of ideas\u2026","rel":"","context":"In \"availability\"","block_context":{"text":"availability","link":"https:\/\/blog.wnohang.net\/index.php\/tag\/availability\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/blog.wnohang.net\/wp-content\/uploads\/2020\/05\/Screen-Shot-2020-05-22-at-4.40.40-PM.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/blog.wnohang.net\/wp-content\/uploads\/2020\/05\/Screen-Shot-2020-05-22-at-4.40.40-PM.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/blog.wnohang.net\/wp-content\/uploads\/2020\/05\/Screen-Shot-2020-05-22-at-4.40.40-PM.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/blog.wnohang.net\/wp-content\/uploads\/2020\/05\/Screen-Shot-2020-05-22-at-4.40.40-PM.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/blog.wnohang.net\/wp-content\/uploads\/2020\/05\/Screen-Shot-2020-05-22-at-4.40.40-PM.png?resize=1050%2C600&ssl=1 
3x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/posts\/160","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/comments?post=160"}],"version-history":[{"count":22,"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/posts\/160\/revisions"}],"predecessor-version":[{"id":194,"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/posts\/160\/revisions\/194"}],"wp:attachment":[{"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/media?parent=160"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/categories?post=160"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.wnohang.net\/index.php\/wp-json\/wp\/v2\/tags?post=160"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}