<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||
|
<html>
|
||
|
<!-- Copyright (C) 1988-2018 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Funding Free Software", the Front-Cover
Texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
"GNU Free Documentation License".

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software. Copies published by the Free Software Foundation raise
     funds for GNU development. -->
|
||
|
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
|
||
|
<head>
|
||
|
<title>Optimize Options (Using the GNU Compiler Collection (GCC))</title>
|
||
|
|
||
|
<meta name="description" content="Optimize Options (Using the GNU Compiler Collection (GCC))">
|
||
|
<meta name="keywords" content="Optimize Options (Using the GNU Compiler Collection (GCC))">
|
||
|
<meta name="resource-type" content="document">
|
||
|
<meta name="distribution" content="global">
|
||
|
<meta name="Generator" content="makeinfo">
|
||
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
||
|
<link href="index.html#Top" rel="start" title="Top">
|
||
|
<link href="Option-Index.html#Option-Index" rel="index" title="Option Index">
|
||
|
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
|
||
|
<link href="Invoking-GCC.html#Invoking-GCC" rel="up" title="Invoking GCC">
|
||
|
<link href="Instrumentation-Options.html#Instrumentation-Options" rel="next" title="Instrumentation Options">
|
||
|
<link href="Debugging-Options.html#Debugging-Options" rel="prev" title="Debugging Options">
|
||
|
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
|
||
|
|
||
|
|
||
|
</head>
|
||
|
|
||
|
<body lang="en">
|
||
|
<a name="Optimize-Options"></a>
|
||
|
<div class="header">
|
||
|
<p>
|
||
|
Next: <a href="Instrumentation-Options.html#Instrumentation-Options" accesskey="n" rel="next">Instrumentation Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="prev">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
<hr>
|
||
|
<a name="Options-That-Control-Optimization"></a>
|
||
|
<h3 class="section">3.10 Options That Control Optimization</h3>
|
||
|
<a name="index-optimize-options"></a>
|
||
|
<a name="index-options_002c-optimization"></a>
|
||
|
|
||
|
<p>These options control various sorts of optimizations.
|
||
|
</p>
|
||
|
<p>Without any optimization option, the compiler’s goal is to reduce the
|
||
|
cost of compilation and to make debugging produce the expected
|
||
|
results. Statements are independent: if you stop the program with a
|
||
|
breakpoint between statements, you can then assign a new value to any
|
||
|
variable or change the program counter to any other statement in the
|
||
|
function and get exactly the results you expect from the source
|
||
|
code.
|
||
|
</p>
|
||
|
<p>Turning on optimization flags makes the compiler attempt to improve
|
||
|
the performance and/or code size at the expense of compilation time
|
||
|
and possibly the ability to debug the program.
|
||
|
</p>
|
||
|
<p>The compiler performs optimization based on the knowledge it has of the
|
||
|
program. Compiling multiple files at once into a single output file allows
|
||
|
the compiler to use information gained from all of the files when compiling
|
||
|
each of them.
|
||
|
</p>
|
||
|
<p>Not all optimizations are controlled directly by a flag. Only
|
||
|
optimizations that have a flag are listed in this section.
|
||
|
</p>
|
||
|
<p>Most optimizations are only enabled if an <samp>-O</samp> level is set on
|
||
|
the command line. Otherwise they are disabled, even if individual
|
||
|
optimization flags are specified.
|
||
|
</p>
|
||
|
<p>Depending on the target and how GCC was configured, a slightly different
|
||
|
set of optimizations may be enabled at each <samp>-O</samp> level than
|
||
|
those listed here. You can invoke GCC with <samp>-Q --help=optimizers</samp>
|
||
|
to find out the exact set of optimizations that are enabled at each level.
|
||
|
See <a href="Overall-Options.html#Overall-Options">Overall Options</a>, for examples.
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt><code>-O</code></dt>
|
||
|
<dt><code>-O1</code></dt>
|
||
|
<dd><a name="index-O"></a>
|
||
|
<a name="index-O1"></a>
|
||
|
<p>Optimize. Optimizing compilation takes somewhat more time, and a lot
|
||
|
more memory for a large function.
|
||
|
</p>
|
||
|
<p>With <samp>-O</samp>, the compiler tries to reduce code size and execution
|
||
|
time, without performing any optimizations that take a great deal of
|
||
|
compilation time.
|
||
|
</p>
|
||
|
<p><samp>-O</samp> turns on the following optimization flags:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">-fauto-inc-dec
-fbranch-count-reg
-fcombine-stack-adjustments
-fcompare-elim
-fcprop-registers
-fdce
-fdefer-pop
-fdelayed-branch
-fdse
-fforward-propagate
-fguess-branch-probability
-fif-conversion2
-fif-conversion
-finline-functions-called-once
-fipa-pure-const
-fipa-profile
-fipa-reference
-fmerge-constants
-fmove-loop-invariants
-fomit-frame-pointer
-freorder-blocks
-fshrink-wrap
-fshrink-wrap-separate
-fsplit-wide-types
-fssa-backprop
-fssa-phiopt
-ftree-bit-ccp
-ftree-ccp
-ftree-ch
-ftree-coalesce-vars
-ftree-copy-prop
-ftree-dce
-ftree-dominator-opts
-ftree-dse
-ftree-forwprop
-ftree-fre
-ftree-phiprop
-ftree-sink
-ftree-slsr
-ftree-sra
-ftree-pta
-ftree-ter
-funit-at-a-time
</pre></div>
|
||
|
|
||
|
</dd>
|
||
|
<dt><code>-O2</code></dt>
|
||
|
<dd><a name="index-O2"></a>
|
||
|
<p>Optimize even more. GCC performs nearly all supported optimizations
|
||
|
that do not involve a space-speed tradeoff.
|
||
|
As compared to <samp>-O</samp>, this option increases both compilation time
|
||
|
and the performance of the generated code.
|
||
|
</p>
|
||
|
<p><samp>-O2</samp> turns on all optimization flags specified by <samp>-O</samp>. It
|
||
|
also turns on the following optimization flags:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">-fthread-jumps
-falign-functions -falign-jumps
-falign-loops -falign-labels
-fcaller-saves
-fcrossjumping
-fcse-follow-jumps -fcse-skip-blocks
-fdelete-null-pointer-checks
-fdevirtualize -fdevirtualize-speculatively
-fexpensive-optimizations
-fgcse -fgcse-lm
-fhoist-adjacent-loads
-finline-small-functions
-findirect-inlining
-fipa-cp
-fipa-bit-cp
-fipa-vrp
-fipa-sra
-fipa-icf
-fisolate-erroneous-paths-dereference
-flra-remat
-foptimize-sibling-calls
-foptimize-strlen
-fpartial-inlining
-fpeephole2
-freorder-blocks-algorithm=stc
-freorder-blocks-and-partition -freorder-functions
-frerun-cse-after-loop
-fsched-interblock -fsched-spec
-fschedule-insns -fschedule-insns2
-fstore-merging
-fstrict-aliasing
-ftree-builtin-call-dce
-ftree-switch-conversion -ftree-tail-merge
-fcode-hoisting
-ftree-pre
-ftree-vrp
-fipa-ra
</pre></div>
|
||
|
|
||
|
<p>Please note the warning under <samp>-fgcse</samp> about
|
||
|
invoking <samp>-O2</samp> on programs that use computed gotos.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-O3</code></dt>
|
||
|
<dd><a name="index-O3"></a>
|
||
|
<p>Optimize yet more. <samp>-O3</samp> turns on all optimizations specified
|
||
|
by <samp>-O2</samp> and also turns on the following optimization flags:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">-finline-functions
-funswitch-loops
-fpredictive-commoning
-fgcse-after-reload
-ftree-loop-vectorize
-ftree-loop-distribution
-ftree-loop-distribute-patterns
-floop-interchange
-floop-unroll-and-jam
-fsplit-paths
-ftree-slp-vectorize
-fvect-cost-model
-ftree-partial-pre
-fpeel-loops
-fipa-cp-clone
</pre></div>
|
||
|
|
||
|
</dd>
|
||
|
<dt><code>-O0</code></dt>
|
||
|
<dd><a name="index-O0"></a>
|
||
|
<p>Reduce compilation time and make debugging produce the expected
|
||
|
results. This is the default.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-Os</code></dt>
|
||
|
<dd><a name="index-Os"></a>
|
||
|
<p>Optimize for size. <samp>-Os</samp> enables all <samp>-O2</samp> optimizations that
|
||
|
do not typically increase code size.
|
||
|
</p>
|
||
|
<p><samp>-Os</samp> disables the following optimization flags:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">-falign-functions -falign-jumps -falign-loops
-falign-labels -freorder-blocks -freorder-blocks-algorithm=stc
-freorder-blocks-and-partition -fprefetch-loop-arrays
</pre></div>
|
||
|
|
||
|
<p>It also enables <samp>-finline-functions</samp>, causes the compiler to tune for
|
||
|
code size rather than execution speed, and performs further optimizations
|
||
|
designed to reduce code size.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-Ofast</code></dt>
|
||
|
<dd><a name="index-Ofast"></a>
|
||
|
<p>Disregard strict standards compliance. <samp>-Ofast</samp> enables all
|
||
|
<samp>-O3</samp> optimizations. It also enables optimizations that are not
|
||
|
valid for all standard-compliant programs.
|
||
|
It turns on <samp>-ffast-math</samp> and the Fortran-specific
|
||
|
<samp>-fstack-arrays</samp>, unless <samp>-fmax-stack-var-size</samp> is
|
||
|
specified, and <samp>-fno-protect-parens</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-Og</code></dt>
|
||
|
<dd><a name="index-Og"></a>
|
||
|
<p>Optimize debugging experience. <samp>-Og</samp> enables optimizations
|
||
|
that do not interfere with debugging. It should be the optimization
|
||
|
level of choice for the standard edit-compile-debug cycle, offering
|
||
|
a reasonable level of optimization while maintaining fast compilation
|
||
|
and a good debugging experience.
|
||
|
</p></dd>
|
||
|
</dl>
|
||
|
|
||
|
<p>If you use multiple <samp>-O</samp> options, with or without level numbers,
|
||
|
the last such option is the one that is effective.
|
||
|
</p>
|
||
|
<p>Options of the form <samp>-f<var>flag</var></samp> specify machine-independent
|
||
|
flags. Most flags have both positive and negative forms; the negative
|
||
|
form of <samp>-ffoo</samp> is <samp>-fno-foo</samp>. In the table
|
||
|
below, only one of the forms is listed—the one you typically
|
||
|
use. You can figure out the other form by either removing ‘<samp>no-</samp>’
|
||
|
or adding it.
|
||
|
</p>
|
||
|
<p>The following options control specific optimizations. They are either
|
||
|
activated by <samp>-O</samp> options or are related to ones that are. You
|
||
|
can use the following flags in the rare cases where you want to fine-tune
the optimizations that are performed.
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt><code>-fno-defer-pop</code></dt>
|
||
|
<dd><a name="index-fno_002ddefer_002dpop"></a>
|
||
|
<p>Always pop the arguments to each function call as soon as that function
|
||
|
returns. For machines that must pop arguments after a function call,
|
||
|
the compiler normally lets arguments accumulate on the stack for several
|
||
|
function calls and pops them all at once.
|
||
|
</p>
|
||
|
<p>Because <samp>-fdefer-pop</samp> is enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>, this option is disabled at those levels.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fforward-propagate</code></dt>
|
||
|
<dd><a name="index-fforward_002dpropagate"></a>
|
||
|
<p>Perform a forward propagation pass on RTL. The pass tries to combine two
|
||
|
instructions and checks if the result can be simplified. If loop unrolling
|
||
|
is active, two passes are performed and the second is scheduled after
|
||
|
loop unrolling.
|
||
|
</p>
|
||
|
<p>This option is enabled by default at optimization levels <samp>-O</samp>,
|
||
|
<samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ffp-contract=<var>style</var></code></dt>
|
||
|
<dd><a name="index-ffp_002dcontract"></a>
|
||
|
<p><samp>-ffp-contract=off</samp> disables floating-point expression contraction.
|
||
|
<samp>-ffp-contract=fast</samp> enables floating-point expression contraction
|
||
|
such as forming of fused multiply-add operations if the target has
|
||
|
native support for them.
|
||
|
<samp>-ffp-contract=on</samp> enables floating-point expression contraction
|
||
|
if allowed by the language standard. This is currently not implemented
|
||
|
and treated equal to <samp>-ffp-contract=off</samp>.
|
||
|
</p>
|
||
|
<p>The default is <samp>-ffp-contract=fast</samp>.
|
||
|
</p>
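<p>As a rough illustration, the hypothetical functions below (the names are
chosen for this sketch and are not part of GCC) contain the multiply-then-add
pattern that contraction targets. Whether a fused multiply-add is actually
emitted depends on the target and on the <samp>-ffp-contract</samp> setting.
</p>

```c
#include <assert.h>

/* A multiply followed by an add.  With -ffp-contract=fast on a target
   that has fused multiply-add instructions, the compiler may emit one
   fused operation here, which rounds once instead of twice.  With
   -ffp-contract=off the product is rounded before the addition. */
double mul_add(double a, double b, double c)
{
    return a * b + c;
}

/* Horner's rule: every loop iteration is a candidate for contraction. */
double horner(const double *coef, int n, double x)
{
    assert(n > 0);                 /* at least one coefficient is required */
    double acc = coef[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = acc * x + coef[i];
    return acc;
}
```

<p>For small integer-valued inputs such as <code>mul_add(2.0, 3.0, 4.0)</code>
the result is exact with or without contraction; differences only show up
when the intermediate product cannot be represented exactly.
</p>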
|
||
|
</dd>
|
||
|
<dt><code>-fomit-frame-pointer</code></dt>
|
||
|
<dd><a name="index-fomit_002dframe_002dpointer"></a>
|
||
|
<p>Omit the frame pointer in functions that don’t need one. This avoids the
|
||
|
instructions to save, set up and restore the frame pointer; on many targets
|
||
|
it also makes an extra register available.
|
||
|
</p>
|
||
|
<p>On some targets this flag has no effect because the standard calling sequence
|
||
|
always uses a frame pointer, so it cannot be omitted.
|
||
|
</p>
|
||
|
<p>Note that <samp>-fno-omit-frame-pointer</samp> doesn’t guarantee the frame pointer
|
||
|
is used in all functions. Several targets always omit the frame pointer in
|
||
|
leaf functions.
|
||
|
</p>
|
||
|
<p>Enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-foptimize-sibling-calls</code></dt>
|
||
|
<dd><a name="index-foptimize_002dsibling_002dcalls"></a>
|
||
|
<p>Optimize sibling and tail recursive calls.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
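<p>A minimal sketch of the kind of code this option is aimed at, using
hypothetical function names: <code>sum_to</code> ends in a tail-recursive call
and <code>sum_first</code> ends in a call to a sibling function. When the
optimization applies, the compiler may reuse the current stack frame instead
of creating a new one for such calls.
</p>

```c
#include <assert.h>

/* Tail-recursive sum of 1..n: the recursive call is the last action,
   so its result is returned unchanged. */
unsigned long sum_to(unsigned n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail-recursive call */
}

/* The call to sum_to is in tail position: a sibling call. */
unsigned long sum_first(unsigned n)
{
    /* Keep the recursion shallow so the sketch is also safe
       when the optimization is not applied. */
    assert(n <= 1000u);
    return sum_to(n, 0);
}
```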
|
||
|
</dd>
|
||
|
<dt><code>-foptimize-strlen</code></dt>
|
||
|
<dd><a name="index-foptimize_002dstrlen"></a>
|
||
|
<p>Optimize various standard C string functions (e.g. <code>strlen</code>,
|
||
|
<code>strchr</code> or <code>strcpy</code>) and
|
||
|
their <code>_FORTIFY_SOURCE</code> counterparts into faster alternatives.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-inline</code></dt>
|
||
|
<dd><a name="index-fno_002dinline"></a>
|
||
|
<p>Do not expand any functions inline apart from those marked with
|
||
|
the <code>always_inline</code> attribute. This is the default when not
|
||
|
optimizing.
|
||
|
</p>
|
||
|
<p>Single functions can be exempted from inlining by marking them
|
||
|
with the <code>noinline</code> attribute.
|
||
|
</p>
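<p>The two attributes mentioned above can be written as in the following
sketch. The function names are made up for illustration; only the attribute
names come from GCC.
</p>

```c
#include <assert.h>

/* always_inline: expanded inline even when -fno-inline is in effect. */
static inline __attribute__((always_inline)) int square(int x)
{
    return x * x;
}

/* noinline: exempted from inlining at every optimization level. */
__attribute__((noinline)) int cube(int x)
{
    return x * square(x);
}

int sum_of_powers(int x)
{
    assert(x > -1000 && x < 1000);   /* avoid signed overflow in this sketch */
    return square(x) + cube(x);
}
```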
|
||
|
</dd>
|
||
|
<dt><code>-finline-small-functions</code></dt>
|
||
|
<dd><a name="index-finline_002dsmall_002dfunctions"></a>
|
||
|
<p>Integrate functions into their callers when their body is smaller than the expected
function call code (so the overall size of the program gets smaller). The compiler
|
||
|
heuristically decides which functions are simple enough to be worth integrating
|
||
|
in this way. This inlining applies to all functions, even those not declared
|
||
|
inline.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-findirect-inlining</code></dt>
|
||
|
<dd><a name="index-findirect_002dinlining"></a>
|
||
|
<p>Also inline indirect calls that are discovered to be known at compile
time thanks to previous inlining. This option has an effect only
|
||
|
when inlining itself is turned on by the <samp>-finline-functions</samp>
|
||
|
or <samp>-finline-small-functions</samp> options.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-finline-functions</code></dt>
|
||
|
<dd><a name="index-finline_002dfunctions"></a>
|
||
|
<p>Consider all functions for inlining, even if they are not declared inline.
|
||
|
The compiler heuristically decides which functions are worth integrating
|
||
|
in this way.
|
||
|
</p>
|
||
|
<p>If all calls to a given function are integrated, and the function is
|
||
|
declared <code>static</code>, then the function is normally not output as
|
||
|
assembler code in its own right.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O3</samp>, <samp>-Os</samp>. Also enabled
by <samp>-fprofile-use</samp> and <samp>-fauto-profile</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-finline-functions-called-once</code></dt>
|
||
|
<dd><a name="index-finline_002dfunctions_002dcalled_002donce"></a>
|
||
|
<p>Consider all <code>static</code> functions called once for inlining into their
|
||
|
caller even if they are not marked <code>inline</code>. If a call to a given
|
||
|
function is integrated, then the function is not output as assembler code
|
||
|
in its own right.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O1</samp>, <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fearly-inlining</code></dt>
|
||
|
<dd><a name="index-fearly_002dinlining"></a>
|
||
|
<p>Inline functions marked by <code>always_inline</code> and functions whose body seems
smaller than the function call overhead early, before the
<samp>-fprofile-generate</samp> instrumentation and the real inlining pass. Doing so
makes profiling significantly cheaper and usually makes inlining faster on programs
that have large chains of nested wrapper functions.
|
||
|
</p>
|
||
|
<p>Enabled by default.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-sra</code></dt>
|
||
|
<dd><a name="index-fipa_002dsra"></a>
|
||
|
<p>Perform interprocedural scalar replacement of aggregates, removal of
|
||
|
unused parameters and replacement of parameters passed by reference
|
||
|
by parameters passed by value.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp> and <samp>-Os</samp>.
|
||
|
</p>
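<p>The following sketch shows a function with the properties this pass looks
for; all names are hypothetical. <code>scaled_x</code> is <code>static</code>,
receives a structure by reference but reads only one field, and has a
parameter it never uses. Under those assumptions the compiler may create a
variant that takes the field by value and drops the unused parameter.
</p>

```c
#include <assert.h>

struct point { int x; int y; int z; };

/* noinline keeps the function out of line in this sketch; a function
   this small would otherwise probably just be inlined. */
__attribute__((noinline))
static int scaled_x(const struct point *p, int factor, int unused)
{
    (void)unused;            /* never read: a candidate for removal */
    return p->x * factor;    /* only p->x is needed, not the whole struct */
}

int double_x(int x)
{
    struct point p = { x, 0, 0 };
    assert(x > -10000 && x < 10000);   /* avoid signed overflow */
    return scaled_x(&p, 2, 42);
}
```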
|
||
|
</dd>
|
||
|
<dt><code>-finline-limit=<var>n</var></code></dt>
|
||
|
<dd><a name="index-finline_002dlimit"></a>
|
||
|
<p>By default, GCC limits the size of functions that can be inlined. This flag
|
||
|
allows coarse control of this limit. <var>n</var> is the size of functions that
|
||
|
can be inlined in number of pseudo instructions.
|
||
|
</p>
|
||
|
<p>Inlining is actually controlled by a number of parameters, which may be
|
||
|
specified individually by using <samp>--param <var>name</var>=<var>value</var></samp>.
|
||
|
The <samp>-finline-limit=<var>n</var></samp> option sets some of these parameters
|
||
|
as follows:
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt><code>max-inline-insns-single</code></dt>
|
||
|
<dd><p>is set to <var>n</var>/2.
|
||
|
</p></dd>
|
||
|
<dt><code>max-inline-insns-auto</code></dt>
|
||
|
<dd><p>is set to <var>n</var>/2.
|
||
|
</p></dd>
|
||
|
</dl>
|
||
|
|
||
|
<p>See below for documentation of the individual
|
||
|
parameters controlling inlining and for the defaults of these parameters.
|
||
|
</p>
|
||
|
<p><em>Note:</em> there may be no value to <samp>-finline-limit</samp> that results
|
||
|
in default behavior.
|
||
|
</p>
|
||
|
<p><em>Note:</em> a pseudo instruction represents, in this particular context, an
abstract measurement of a function’s size. In no way does it represent a count
of assembly instructions, and as such its exact meaning might change from one
release to another.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-keep-inline-dllexport</code></dt>
|
||
|
<dd><a name="index-fno_002dkeep_002dinline_002ddllexport"></a>
|
||
|
<p>This is a more fine-grained version of <samp>-fkeep-inline-functions</samp>,
|
||
|
which applies only to functions that are declared using the <code>dllexport</code>
|
||
|
attribute or declspec. See <a href="Function-Attributes.html#Function-Attributes">Declaring Attributes of
|
||
|
Functions</a>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fkeep-inline-functions</code></dt>
|
||
|
<dd><a name="index-fkeep_002dinline_002dfunctions"></a>
|
||
|
<p>In C, emit <code>static</code> functions that are declared <code>inline</code>
|
||
|
into the object file, even if the function has been inlined into all
|
||
|
of its callers. This switch does not affect functions using the
|
||
|
<code>extern inline</code> extension in GNU C90. In C++, emit any and all
|
||
|
inline functions into the object file.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fkeep-static-functions</code></dt>
|
||
|
<dd><a name="index-fkeep_002dstatic_002dfunctions"></a>
|
||
|
<p>Emit <code>static</code> functions into the object file, even if the function
|
||
|
is never used.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fkeep-static-consts</code></dt>
|
||
|
<dd><a name="index-fkeep_002dstatic_002dconsts"></a>
|
||
|
<p>Emit variables declared <code>static const</code> when optimization isn’t turned
|
||
|
on, even if the variables aren’t referenced.
|
||
|
</p>
|
||
|
<p>GCC enables this option by default. If you want to force the compiler to
|
||
|
check if a variable is referenced, regardless of whether or not
|
||
|
optimization is turned on, use the <samp>-fno-keep-static-consts</samp> option.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fmerge-constants</code></dt>
|
||
|
<dd><a name="index-fmerge_002dconstants"></a>
|
||
|
<p>Attempt to merge identical constants (string constants and floating-point
|
||
|
constants) across compilation units.
|
||
|
</p>
|
||
|
<p>This option is the default for optimized compilation if the assembler and
|
||
|
linker support it. Use <samp>-fno-merge-constants</samp> to inhibit this
|
||
|
behavior.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
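<p>A small sketch of what merging means for string constants, with made-up
function names. It uses a single compilation unit for brevity, whereas the
option is about merging across units. The two literals have identical
contents; whether they also share one address is left to the implementation,
so portable code should not rely on either outcome.
</p>

```c
#include <assert.h>
#include <string.h>

const char *greeting_a(void) { return "hello, world"; }
const char *greeting_b(void) { return "hello, world"; }

/* Returns 1 if the two literals ended up at the same address and 0
   otherwise.  The C standard permits either result. */
int literals_share_storage(void)
{
    const char *a = greeting_a();
    const char *b = greeting_b();
    assert(strcmp(a, b) == 0);   /* the contents are equal in any case */
    return a == b;
}
```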
|
||
|
</dd>
|
||
|
<dt><code>-fmerge-all-constants</code></dt>
|
||
|
<dd><a name="index-fmerge_002dall_002dconstants"></a>
|
||
|
<p>Attempt to merge identical constants and identical variables.
|
||
|
</p>
|
||
|
<p>This option implies <samp>-fmerge-constants</samp>. In addition to
|
||
|
<samp>-fmerge-constants</samp> this considers e.g. even constant initialized
|
||
|
arrays or initialized constant variables with integral or floating-point
|
||
|
types. Languages like C or C++ require each variable, including multiple
|
||
|
instances of the same variable in recursive calls, to have distinct locations,
|
||
|
so using this option results in non-conforming
|
||
|
behavior.
|
||
|
</p>
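<p>The requirement about distinct locations can be observed with a sketch
like the one below (the function name is hypothetical). Each active call has
its own instance of <code>local</code>, and a conforming compilation keeps
those instances at different addresses.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Compares the address of the caller's instance of 'local' with the
   address of the current instance.  In conforming mode the two live
   instances are distinct objects, so the comparison yields 1. */
int distinct_in_recursion(const int *outer, int depth)
{
    const int local[2] = { 1, 2 };

    assert(depth >= 0 && depth <= 8);   /* keep the recursion shallow */
    if (depth == 0)
        return outer != local;
    return distinct_in_recursion(local, depth - 1);
}
```

<p>Calling <code>distinct_in_recursion(NULL, 1)</code> compares the instances
of two nested calls. With <samp>-fmerge-all-constants</samp> the compiler is
allowed to give both instances the same location.
</p>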
|
||
|
</dd>
|
||
|
<dt><code>-fmodulo-sched</code></dt>
|
||
|
<dd><a name="index-fmodulo_002dsched"></a>
|
||
|
<p>Perform swing modulo scheduling immediately before the first scheduling
|
||
|
pass. This pass looks at innermost loops and reorders their
|
||
|
instructions by overlapping different iterations.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fmodulo-sched-allow-regmoves</code></dt>
|
||
|
<dd><a name="index-fmodulo_002dsched_002dallow_002dregmoves"></a>
|
||
|
<p>Perform more aggressive SMS-based modulo scheduling with register moves
|
||
|
allowed. When this flag is set, certain anti-dependence edges are
deleted, which triggers the generation of reg-moves based on the
|
||
|
life-range analysis. This option is effective only with
|
||
|
<samp>-fmodulo-sched</samp> enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-branch-count-reg</code></dt>
|
||
|
<dd><a name="index-fno_002dbranch_002dcount_002dreg"></a>
|
||
|
<p>Avoid running a pass scanning for opportunities to use “decrement and
|
||
|
branch” instructions on a count register instead of generating sequences
|
||
|
of instructions that decrement a register, compare it against zero, and
|
||
|
then branch based upon the result. This option is only meaningful on
|
||
|
architectures that support such instructions, which include x86, PowerPC,
|
||
|
IA-64 and S/390. Note that the <samp>-fno-branch-count-reg</samp> option
|
||
|
doesn’t remove the decrement and branch instructions from the generated
|
||
|
instruction stream introduced by other optimization passes.
|
||
|
</p>
|
||
|
<p><samp>-fbranch-count-reg</samp> is enabled by default at <samp>-O1</samp> and higher.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fbranch-count-reg</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-function-cse</code></dt>
|
||
|
<dd><a name="index-fno_002dfunction_002dcse"></a>
|
||
|
<p>Do not put function addresses in registers; make each instruction that
|
||
|
calls a constant function contain the function’s address explicitly.
|
||
|
</p>
|
||
|
<p>This option results in less efficient code, but some strange hacks
|
||
|
that alter the assembler output may be confused by the optimizations
|
||
|
performed when this option is not used.
|
||
|
</p>
|
||
|
<p>The default is <samp>-ffunction-cse</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-zero-initialized-in-bss</code></dt>
|
||
|
<dd><a name="index-fno_002dzero_002dinitialized_002din_002dbss"></a>
|
||
|
<p>If the target supports a BSS section, GCC by default puts variables that
|
||
|
are initialized to zero into BSS. This can save space in the resulting
|
||
|
code.
|
||
|
</p>
|
||
|
<p>This option turns off this behavior because some programs explicitly
|
||
|
rely on variables going to the data section—e.g., so that the
|
||
|
resulting executable can find the beginning of that section and/or make
|
||
|
assumptions based on that.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fzero-initialized-in-bss</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fthread-jumps</code></dt>
|
||
|
<dd><a name="index-fthread_002djumps"></a>
|
||
|
<p>Perform optimizations that check to see if a jump branches to a
|
||
|
location where another comparison subsumed by the first is found. If
|
||
|
so, the first branch is redirected to either the destination of the
|
||
|
second branch or a point immediately following it, depending on whether
|
||
|
the condition is known to be true or false.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
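<p>As a source-level illustration (the function is hypothetical), consider
two consecutive tests where the first subsumes the second. On the path where
<code>x &gt; 10</code> held, the outcome of <code>x &gt; 5</code> is already
known, so the branch can be redirected past the second comparison.
</p>

```c
#include <assert.h>

int classify(int x)
{
    int score = 0;

    if (x > 10)
        score += 1;
    /* When the first test was true, this test is known to be true
       as well, so control can go straight to the addition below. */
    if (x > 5)
        score += 2;

    /* x > 10 implies x > 5, so a score of exactly 1 cannot occur. */
    assert(score != 1);
    return score;
}
```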
|
||
|
</dd>
|
||
|
<dt><code>-fsplit-wide-types</code></dt>
|
||
|
<dd><a name="index-fsplit_002dwide_002dtypes"></a>
|
||
|
<p>When using a type that occupies multiple registers, such as <code>long
|
||
|
long</code> on a 32-bit system, split the registers apart and allocate them
|
||
|
independently. This normally generates better code for those types,
|
||
|
but may make debugging more difficult.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>,
|
||
|
<samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcse-follow-jumps</code></dt>
|
||
|
<dd><a name="index-fcse_002dfollow_002djumps"></a>
|
||
|
<p>In common subexpression elimination (CSE), scan through jump instructions
|
||
|
when the target of the jump is not reached by any other path. For
|
||
|
example, when CSE encounters an <code>if</code> statement with an
|
||
|
<code>else</code> clause, CSE follows the jump when the condition
|
||
|
tested is false.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcse-skip-blocks</code></dt>
|
||
|
<dd><a name="index-fcse_002dskip_002dblocks"></a>
|
||
|
<p>This is similar to <samp>-fcse-follow-jumps</samp>, but causes CSE to
|
||
|
follow jumps that conditionally skip over blocks. When CSE
|
||
|
encounters a simple <code>if</code> statement with no else clause,
|
||
|
<samp>-fcse-skip-blocks</samp> causes CSE to follow the jump around the
|
||
|
body of the <code>if</code>.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-frerun-cse-after-loop</code></dt>
|
||
|
<dd><a name="index-frerun_002dcse_002dafter_002dloop"></a>
|
||
|
<p>Re-run common subexpression elimination after loop optimizations are
|
||
|
performed.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fgcse</code></dt>
|
||
|
<dd><a name="index-fgcse"></a>
|
||
|
<p>Perform a global common subexpression elimination pass.
|
||
|
This pass also performs global constant and copy propagation.
|
||
|
</p>
|
||
|
<p><em>Note:</em> When compiling a program using computed gotos, a GCC
|
||
|
extension, you may get better run-time performance if you disable
|
||
|
the global common subexpression elimination pass by adding
|
||
|
<samp>-fno-gcse</samp> to the command line.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
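<p>For readers unfamiliar with computed gotos, the sketch below shows the
typical shape of such code: a small interpreter that dispatches through a
table of label addresses. The opcode names and the <code>run</code> function
are invented for this example; the <code>&amp;&amp;label</code> and
<code>goto *</code> forms are the GCC extension the note refers to.
</p>

```c
#include <assert.h>
#include <stddef.h>

enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs a byte-code program that must end with OP_HALT. */
int run(const unsigned char *code, size_t len)
{
    /* Table of label addresses, indexed by opcode. */
    static void *const dispatch[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    int acc = 0;
    size_t pc = 0;

/* Fetch the next opcode and jump to its handler. */
#define NEXT() \
    do { \
        assert(pc < len && code[pc] <= OP_HALT); \
        goto *dispatch[code[pc++]]; \
    } while (0)

    NEXT();
do_inc:    acc += 1; NEXT();
do_dec:    acc -= 1; NEXT();
do_double: acc *= 2; NEXT();
do_halt:   return acc;

#undef NEXT
}
```

<p>Code of this shape is where comparing timings with and without
<samp>-fno-gcse</samp> may be worthwhile.
</p>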
|
||
|
</dd>
|
||
|
<dt><code>-fgcse-lm</code></dt>
|
||
|
<dd><a name="index-fgcse_002dlm"></a>
|
||
|
<p>When <samp>-fgcse-lm</samp> is enabled, global common subexpression elimination
|
||
|
attempts to move loads that are only killed by stores into themselves. This
|
||
|
allows a loop containing a load/store sequence to be changed to a load outside
|
||
|
the loop, and a copy/store within the loop.
|
||
|
</p>
|
||
|
<p>Enabled by default when <samp>-fgcse</samp> is enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fgcse-sm</code></dt>
|
||
|
<dd><a name="index-fgcse_002dsm"></a>
|
||
|
<p>When <samp>-fgcse-sm</samp> is enabled, a store motion pass is run after
|
||
|
global common subexpression elimination. This pass attempts to move
|
||
|
stores out of loops. When used in conjunction with <samp>-fgcse-lm</samp>,
|
||
|
loops containing a load/store sequence can be changed to a load before
|
||
|
the loop and a store after the loop.
|
||
|
</p>
|
||
|
<p>Not enabled at any optimization level.
|
||
|
</p>
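<p>The combined effect of <samp>-fgcse-lm</samp> and <samp>-fgcse-sm</samp> can
be pictured at the source level roughly as follows. This is only a sketch: the
function names are hypothetical, the real transformation happens on RTL, and it
is applied only when the compiler can show that it is safe (here, that
<code>total</code> does not point into <code>values</code>).
</p>

```c
/* As written: every iteration loads *total, adds to it, and stores it back. */
void sum_into(int *total, const int *values, int n)
{
    for (int i = 0; i < n; i++)
        *total += values[i];
}

/* Roughly what load motion plus store motion aim for, assuming no aliasing:
   one load before the loop and one store after it. */
void sum_into_moved(int *total, const int *values, int n)
{
    int t = *total;              /* load moved before the loop */
    for (int i = 0; i < n; i++)
        t += values[i];          /* loop works on a register copy */
    *total = t;                  /* store moved after the loop */
}
```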
|
||
|
</dd>
|
||
|
<dt><code>-fgcse-las</code></dt>
|
||
|
<dd><a name="index-fgcse_002dlas"></a>
|
||
|
<p>When <samp>-fgcse-las</samp> is enabled, the global common subexpression
|
||
|
elimination pass eliminates redundant loads that come after stores to the
|
||
|
same memory location (both partial and full redundancies).
|
||
|
</p>
|
||
|
<p>Not enabled at any optimization level.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fgcse-after-reload</code></dt>
|
||
|
<dd><a name="index-fgcse_002dafter_002dreload"></a>
|
||
|
<p>When <samp>-fgcse-after-reload</samp> is enabled, a redundant load elimination
|
||
|
pass is performed after reload. The purpose of this pass is to clean up
|
||
|
redundant spilling.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-faggressive-loop-optimizations</code></dt>
|
||
|
<dd><a name="index-faggressive_002dloop_002doptimizations"></a>
|
||
|
<p>This option tells the loop optimizer to use language constraints to
|
||
|
derive bounds for the number of iterations of a loop. This assumes that
|
||
|
loop code does not invoke undefined behavior by, for example, causing signed
integer overflows or out-of-bounds array accesses. The bounds for the
|
||
|
number of iterations of a loop are used to guide loop unrolling and peeling
|
||
|
and loop exit test optimizations.
|
||
|
This option is enabled by default.
|
||
|
</p>
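<p>A minimal sketch of a loop whose iteration count can be bounded from
language rules alone is shown below. The array name <code>squares</code>, the
macro <code>TABLE_SIZE</code> and the function <code>fill_table</code> are
assumptions made for this example.
</p>

```c
#define TABLE_SIZE 8

int squares[TABLE_SIZE];

/* The access squares[i] is only defined for 0 <= i < TABLE_SIZE, so the
   loop optimizer may assume the body runs at most TABLE_SIZE times, whatever
   value n has.  Callers must therefore pass n <= TABLE_SIZE. */
int fill_table(int n)
{
    int i;
    for (i = 0; i < n; i++)
        squares[i] = i * i;
    return i;
}
```

<p>Code that relies on running past such a bound is undefined and may behave
differently from what the source suggests; <samp>-fno-aggressive-loop-optimizations</samp>
turns this assumption off.
</p>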
|
||
|
</dd>
|
||
|
<dt><code>-funconstrained-commons</code></dt>
|
||
|
<dd><a name="index-funconstrained_002dcommons"></a>
|
||
|
<p>This option tells the compiler that variables declared in common blocks
|
||
|
(e.g. Fortran) may later be overridden with longer trailing arrays. This
|
||
|
prevents certain optimizations that depend on knowing the array bounds.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcrossjumping</code></dt>
|
||
|
<dd><a name="index-fcrossjumping"></a>
|
||
|
<p>Perform cross-jumping transformation.
|
||
|
This transformation unifies equivalent code and saves code size. The
|
||
|
resulting code may or may not perform better than without cross-jumping.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fauto-inc-dec</code></dt>
|
||
|
<dd><a name="index-fauto_002dinc_002ddec"></a>
|
||
|
<p>Combine increments or decrements of addresses with memory accesses.
|
||
|
This pass is always skipped on architectures that do not have
|
||
|
instructions to support this. Enabled by default at <samp>-O</samp> and
|
||
|
higher on architectures that support this.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdce</code></dt>
|
||
|
<dd><a name="index-fdce"></a>
|
||
|
<p>Perform dead code elimination (DCE) on RTL.
|
||
|
Enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdse</code></dt>
|
||
|
<dd><a name="index-fdse"></a>
|
||
|
<p>Perform dead store elimination (DSE) on RTL.
|
||
|
Enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fif-conversion</code></dt>
|
||
|
<dd><a name="index-fif_002dconversion"></a>
|
||
|
<p>Attempt to transform conditional jumps into branch-less equivalents. This
|
||
|
includes use of conditional moves, min, max, set flags and abs instructions, and
|
||
|
some tricks achievable with standard arithmetic. The use of conditional execution
|
||
|
on chips where it is available is controlled by <samp>-fif-conversion2</samp>.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
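<p>Typical candidates for if-conversion are small conditionals without side
effects, such as the two hypothetical functions below. Which instruction is
actually chosen (conditional move, min/max, or a plain branch) depends on the
target and on the compiler's cost model.
</p>

```c
/* May be turned into max(x, 0) or a conditional move instead of a branch. */
int clamp_to_zero(int x)
{
    if (x < 0)
        x = 0;
    return x;
}

/* May be turned into a min instruction or a conditional move. */
int select_smaller(int a, int b)
{
    return (a < b) ? a : b;
}
```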
|
||
|
</dd>
|
||
|
<dt><code>-fif-conversion2</code></dt>
|
||
|
<dd><a name="index-fif_002dconversion2"></a>
|
||
|
<p>Use conditional execution (where available) to transform conditional jumps into
|
||
|
branch-less equivalents.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdeclone-ctor-dtor</code></dt>
|
||
|
<dd><a name="index-fdeclone_002dctor_002ddtor"></a>
|
||
|
<p>The C++ ABI requires multiple entry points for constructors and
|
||
|
destructors: one for a base subobject, one for a complete object, and
|
||
|
one for a virtual destructor that calls operator delete afterwards.
|
||
|
For a hierarchy with virtual bases, the base and complete variants are
|
||
|
clones, which means two copies of the function. With this option, the
|
||
|
base and complete variants are changed to be thunks that call a common
|
||
|
implementation.
|
||
|
</p>
|
||
|
<p>Enabled by <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdelete-null-pointer-checks</code></dt>
|
||
|
<dd><a name="index-fdelete_002dnull_002dpointer_002dchecks"></a>
|
||
|
<p>Assume that programs cannot safely dereference null pointers, and that
|
||
|
no code or data element resides at address zero.
|
||
|
This option enables simple constant
|
||
|
folding optimizations at all optimization levels. In addition, other
|
||
|
optimization passes in GCC use this flag to control global dataflow
|
||
|
analyses that eliminate useless checks for null pointers; these assume
|
||
|
that a memory access to address zero always results in a trap, so
|
||
|
that if a pointer is checked after it has already been dereferenced,
|
||
|
it cannot be null.
|
||
|
</p>
|
||
|
<p>Note however that in some environments this assumption is not true.
|
||
|
Use <samp>-fno-delete-null-pointer-checks</samp> to disable this optimization
|
||
|
for programs that depend on that behavior.
|
||
|
</p>
|
||
|
<p>This option is enabled by default on most targets. On Nios II ELF, it
|
||
|
defaults to off. On AVR, CR16, and MSP430, this option is completely disabled.
|
||
|
</p>
|
||
|
<p>Passes that use the dataflow information
|
||
|
are enabled independently at different optimization levels.
|
||
|
</p>
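<p>The pattern targeted by the null-pointer dataflow analysis can be sketched
as follows. The structure and function names are hypothetical. Because
<code>p</code> is dereferenced before it is tested, the compiler may conclude
that <code>p</code> cannot be null at the test and drop the check.
</p>

```c
#include <stddef.h>

struct packet { int length; };

int packet_length(const struct packet *p)
{
    int len = p->length;   /* dereference happens first */
    if (p == NULL)         /* check may be removed: p was already dereferenced */
        return -1;
    return len;
}
```

<p>Code for environments where address zero is valid should either test the
pointer before the dereference or be compiled with
<samp>-fno-delete-null-pointer-checks</samp>.
</p>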
|
||
|
</dd>
|
||
|
<dt><code>-fdevirtualize</code></dt>
|
||
|
<dd><a name="index-fdevirtualize"></a>
|
||
|
<p>Attempt to convert calls to virtual functions to direct calls. This
|
||
|
is done both within a procedure and interprocedurally as part of
|
||
|
indirect inlining (<samp>-findirect-inlining</samp>) and interprocedural constant
|
||
|
propagation (<samp>-fipa-cp</samp>).
|
||
|
Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdevirtualize-speculatively</code></dt>
|
||
|
<dd><a name="index-fdevirtualize_002dspeculatively"></a>
|
||
|
<p>Attempt to convert calls to virtual functions to speculative direct calls.
|
||
|
Based on the analysis of the type inheritance graph, determine for a given call
|
||
|
the set of likely targets. If the set is small, preferably of size 1, change
|
||
|
the call into a conditional deciding between direct and indirect calls. The
|
||
|
speculative calls enable more optimizations, such as inlining. When they seem
|
||
|
useless after further optimization, they are converted back into original form.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdevirtualize-at-ltrans</code></dt>
|
||
|
<dd><a name="index-fdevirtualize_002dat_002dltrans"></a>
|
||
|
<p>Stream extra information needed for aggressive devirtualization when running
|
||
|
the link-time optimizer in local transformation mode.
|
||
|
This option enables more devirtualization but
|
||
|
significantly increases the size of streamed data. For this reason it is
|
||
|
disabled by default.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fexpensive-optimizations</code></dt>
|
||
|
<dd><a name="index-fexpensive_002doptimizations"></a>
|
||
|
<p>Perform a number of minor optimizations that are relatively expensive.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-free</code></dt>
|
||
|
<dd><a name="index-free"></a>
|
||
|
<p>Attempt to remove redundant extension instructions. This is especially
|
||
|
helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
|
||
|
registers after writing to their lower 32-bit half.
|
||
|
</p>
|
||
|
<p>Enabled for Alpha, AArch64 and x86 at levels <samp>-O2</samp>,
|
||
|
<samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
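<p>A small sketch of where a redundant extension can arise is given below; the
function name is an assumption for this example. On x86-64, a 32-bit addition
already clears the upper half of the destination register, so a separate
zero-extension instruction for the conversion to 64 bits would be redundant.
</p>

```c
#include <stdint.h>

uint64_t add_then_widen(uint32_t a, uint32_t b)
{
    uint32_t sum = a + b;   /* 32-bit addition, wraps modulo 2^32 */
    return (uint64_t)sum;   /* zero extension to 64 bits */
}
```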
|
||
|
</dd>
|
||
|
<dt><code>-fno-lifetime-dse</code></dt>
|
||
|
<dd><a name="index-fno_002dlifetime_002ddse"></a>
|
||
|
<p>In C++ the value of an object is only affected by changes within its
|
||
|
lifetime: when the constructor begins, the object has an indeterminate
|
||
|
value, and any changes during the lifetime of the object are dead when
|
||
|
the object is destroyed. Normally dead store elimination will take
|
||
|
advantage of this; if your code relies on the value of the object
|
||
|
storage persisting beyond the lifetime of the object, you can use this
|
||
|
flag to disable this optimization. To preserve stores before the
|
||
|
constructor starts (e.g. because your operator new clears the object
|
||
|
storage) but still treat the object as dead after the destructor, you
|
||
|
can use <samp>-flifetime-dse=1</samp>. The default behavior can be
|
||
|
explicitly selected with <samp>-flifetime-dse=2</samp>.
|
||
|
<samp>-flifetime-dse=0</samp> is equivalent to <samp>-fno-lifetime-dse</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flive-range-shrinkage</code></dt>
|
||
|
<dd><a name="index-flive_002drange_002dshrinkage"></a>
|
||
|
<p>Attempt to decrease register pressure through register live range
|
||
|
shrinkage. This is helpful for fast processors with small or moderate
|
||
|
size register sets.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fira-algorithm=<var>algorithm</var></code></dt>
|
||
|
<dd><a name="index-fira_002dalgorithm"></a>
|
||
|
<p>Use the specified coloring algorithm for the integrated register
|
||
|
allocator. The <var>algorithm</var> argument can be ‘<samp>priority</samp>’, which
|
||
|
specifies Chow’s priority coloring, or ‘<samp>CB</samp>’, which specifies
|
||
|
Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented
|
||
|
for all architectures, but for those targets that do support it, it is
|
||
|
the default because it generates better code.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fira-region=<var>region</var></code></dt>
|
||
|
<dd><a name="index-fira_002dregion"></a>
|
||
|
<p>Use the specified regions for the integrated register allocator. The
|
||
|
<var>region</var> argument should be one of the following:
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt>‘<samp>all</samp>’</dt>
|
||
|
<dd><p>Use all loops as register allocation regions.
|
||
|
This can give the best results for machines with a small and/or
|
||
|
irregular register set.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt>‘<samp>mixed</samp>’</dt>
|
||
|
<dd><p>Use all loops except for loops with small register pressure
|
||
|
as the regions. This value usually gives
|
||
|
the best results for most architectures,
|
||
|
and is enabled by default when compiling with optimization for speed
|
||
|
(<samp>-O</samp>, <samp>-O2</samp>, …).
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt>‘<samp>one</samp>’</dt>
|
||
|
<dd><p>Use all functions as a single region.
|
||
|
This typically results in the smallest code size, and is enabled by default for
|
||
|
<samp>-Os</samp> or <samp>-O0</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
</dl>
|
||
|
|
||
|
</dd>
|
||
|
<dt><code>-fira-hoist-pressure</code></dt>
|
||
|
<dd><a name="index-fira_002dhoist_002dpressure"></a>
|
||
|
<p>Use IRA to evaluate register pressure in the code hoisting pass for
|
||
|
decisions to hoist expressions. This option usually results in smaller
|
||
|
code, but it can slow the compiler down.
|
||
|
</p>
|
||
|
<p>This option is enabled at level <samp>-Os</samp> for all targets.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fira-loop-pressure</code></dt>
|
||
|
<dd><a name="index-fira_002dloop_002dpressure"></a>
|
||
|
<p>Use IRA to evaluate register pressure in loops for decisions to move
|
||
|
loop invariants. This option usually results in generation
|
||
|
of faster and smaller code on machines with large register files (>= 32
|
||
|
registers), but it can slow the compiler down.
|
||
|
</p>
|
||
|
<p>This option is enabled at level <samp>-O3</samp> for some targets.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-ira-share-save-slots</code></dt>
|
||
|
<dd><a name="index-fno_002dira_002dshare_002dsave_002dslots"></a>
|
||
|
<p>Disable sharing of stack slots used for saving call-used hard
|
||
|
registers living through a call. Each hard register gets a
|
||
|
separate stack slot, and as a result function stack frames are
|
||
|
larger.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-ira-share-spill-slots</code></dt>
|
||
|
<dd><a name="index-fno_002dira_002dshare_002dspill_002dslots"></a>
|
||
|
<p>Disable sharing of stack slots allocated for pseudo-registers. Each
|
||
|
pseudo-register that does not get a hard register gets a separate
|
||
|
stack slot, and as a result function stack frames are larger.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flra-remat</code></dt>
|
||
|
<dd><a name="index-flra_002dremat"></a>
|
||
|
<p>Enable CFG-sensitive rematerialization in LRA. Instead of loading
|
||
|
values of spilled pseudos, LRA tries to rematerialize (recalculate)
|
||
|
values if it is profitable.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fdelayed-branch</code></dt>
|
||
|
<dd><a name="index-fdelayed_002dbranch"></a>
|
||
|
<p>If supported for the target machine, attempt to reorder instructions
|
||
|
to exploit instruction slots available after delayed branch
|
||
|
instructions.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fschedule-insns</code></dt>
|
||
|
<dd><a name="index-fschedule_002dinsns"></a>
|
||
|
<p>If supported for the target machine, attempt to reorder instructions to
|
||
|
eliminate execution stalls due to required data being unavailable. This
|
||
|
helps machines that have slow floating point or memory load instructions
|
||
|
by allowing other instructions to be issued until the result of the load
|
||
|
or floating-point instruction is required.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fschedule-insns2</code></dt>
|
||
|
<dd><a name="index-fschedule_002dinsns2"></a>
|
||
|
<p>Similar to <samp>-fschedule-insns</samp>, but requests an additional pass of
|
||
|
instruction scheduling after register allocation has been done. This is
|
||
|
especially useful on machines with a relatively small number of
|
||
|
registers and where memory load instructions take more than one cycle.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-sched-interblock</code></dt>
|
||
|
<dd><a name="index-fno_002dsched_002dinterblock"></a>
|
||
|
<p>Don’t schedule instructions across basic blocks. This is normally
|
||
|
enabled by default when scheduling before register allocation, i.e.
|
||
|
with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-sched-spec</code></dt>
|
||
|
<dd><a name="index-fno_002dsched_002dspec"></a>
|
||
|
<p>Don’t allow speculative motion of non-load instructions. This is normally
|
||
|
enabled by default when scheduling before register allocation, i.e.
|
||
|
with <samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-pressure</code></dt>
|
||
|
<dd><a name="index-fsched_002dpressure"></a>
|
||
|
<p>Enable register pressure sensitive insn scheduling before register
|
||
|
allocation. This only makes sense when scheduling before register
|
||
|
allocation is enabled, i.e. with <samp>-fschedule-insns</samp> or at
|
||
|
<samp>-O2</samp> or higher. Usage of this option can improve the
|
||
|
generated code and decrease its size by preventing register pressure
|
||
|
increase above the number of available hard registers and subsequent
|
||
|
spills in register allocation.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-spec-load</code></dt>
|
||
|
<dd><a name="index-fsched_002dspec_002dload"></a>
|
||
|
<p>Allow speculative motion of some load instructions. This only makes
|
||
|
sense when scheduling before register allocation, i.e. with
|
||
|
<samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-spec-load-dangerous</code></dt>
|
||
|
<dd><a name="index-fsched_002dspec_002dload_002ddangerous"></a>
|
||
|
<p>Allow speculative motion of more load instructions. This only makes
|
||
|
sense when scheduling before register allocation, i.e. with
|
||
|
<samp>-fschedule-insns</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-stalled-insns</code></dt>
|
||
|
<dt><code>-fsched-stalled-insns=<var>n</var></code></dt>
|
||
|
<dd><a name="index-fsched_002dstalled_002dinsns"></a>
|
||
|
<p>Define how many insns (if any) can be moved prematurely from the queue
|
||
|
of stalled insns into the ready list during the second scheduling pass.
|
||
|
<samp>-fno-sched-stalled-insns</samp> means that no insns are moved
|
||
|
prematurely, <samp>-fsched-stalled-insns=0</samp> means there is no limit
|
||
|
on how many queued insns can be moved prematurely.
|
||
|
<samp>-fsched-stalled-insns</samp> without a value is equivalent to
|
||
|
<samp>-fsched-stalled-insns=1</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-stalled-insns-dep</code></dt>
|
||
|
<dt><code>-fsched-stalled-insns-dep=<var>n</var></code></dt>
|
||
|
<dd><a name="index-fsched_002dstalled_002dinsns_002ddep"></a>
|
||
|
<p>Define how many insn groups (cycles) are examined for a dependency
|
||
|
on a stalled insn that is a candidate for premature removal from the queue
|
||
|
of stalled insns. This has an effect only during the second scheduling pass,
|
||
|
and only if <samp>-fsched-stalled-insns</samp> is used.
|
||
|
<samp>-fno-sched-stalled-insns-dep</samp> is equivalent to
|
||
|
<samp>-fsched-stalled-insns-dep=0</samp>.
|
||
|
<samp>-fsched-stalled-insns-dep</samp> without a value is equivalent to
|
||
|
<samp>-fsched-stalled-insns-dep=1</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched2-use-superblocks</code></dt>
|
||
|
<dd><a name="index-fsched2_002duse_002dsuperblocks"></a>
|
||
|
<p>When scheduling after register allocation, use superblock scheduling.
|
||
|
This allows motion across basic block boundaries,
|
||
|
resulting in faster schedules. This option is experimental, as not all machine
|
||
|
descriptions used by GCC model the CPU closely enough to avoid unreliable
|
||
|
results from the algorithm.
|
||
|
</p>
|
||
|
<p>This only makes sense when scheduling after register allocation, i.e. with
|
||
|
<samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-group-heuristic</code></dt>
|
||
|
<dd><a name="index-fsched_002dgroup_002dheuristic"></a>
|
||
|
<p>Enable the group heuristic in the scheduler. This heuristic favors
|
||
|
the instruction that belongs to a schedule group. This is enabled
|
||
|
by default when scheduling is enabled, i.e. with <samp>-fschedule-insns</samp>
|
||
|
or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-critical-path-heuristic</code></dt>
|
||
|
<dd><a name="index-fsched_002dcritical_002dpath_002dheuristic"></a>
|
||
|
<p>Enable the critical-path heuristic in the scheduler. This heuristic favors
|
||
|
instructions on the critical path. This is enabled by default when
|
||
|
scheduling is enabled, i.e. with <samp>-fschedule-insns</samp>
|
||
|
or <samp>-fschedule-insns2</samp> or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-spec-insn-heuristic</code></dt>
|
||
|
<dd><a name="index-fsched_002dspec_002dinsn_002dheuristic"></a>
|
||
|
<p>Enable the speculative instruction heuristic in the scheduler. This
|
||
|
heuristic favors speculative instructions with greater dependency weakness.
|
||
|
This is enabled by default when scheduling is enabled, i.e.
|
||
|
with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp>
|
||
|
or at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-rank-heuristic</code></dt>
|
||
|
<dd><a name="index-fsched_002drank_002dheuristic"></a>
|
||
|
<p>Enable the rank heuristic in the scheduler. This heuristic favors
|
||
|
the instruction belonging to a basic block with greater size or frequency.
|
||
|
This is enabled by default when scheduling is enabled, i.e.
|
||
|
with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or
|
||
|
at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-last-insn-heuristic</code></dt>
|
||
|
<dd><a name="index-fsched_002dlast_002dinsn_002dheuristic"></a>
|
||
|
<p>Enable the last-instruction heuristic in the scheduler. This heuristic
|
||
|
favors the instruction that is less dependent on the last instruction
|
||
|
scheduled. This is enabled by default when scheduling is enabled,
|
||
|
i.e. with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or
|
||
|
at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsched-dep-count-heuristic</code></dt>
|
||
|
<dd><a name="index-fsched_002ddep_002dcount_002dheuristic"></a>
|
||
|
<p>Enable the dependent-count heuristic in the scheduler. This heuristic
|
||
|
favors the instruction that has more instructions depending on it.
|
||
|
This is enabled by default when scheduling is enabled, i.e.
|
||
|
with <samp>-fschedule-insns</samp> or <samp>-fschedule-insns2</samp> or
|
||
|
at <samp>-O2</samp> or higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-freschedule-modulo-scheduled-loops</code></dt>
|
||
|
<dd><a name="index-freschedule_002dmodulo_002dscheduled_002dloops"></a>
|
||
|
<p>Modulo scheduling is performed before traditional scheduling. If a loop
|
||
|
is modulo scheduled, later scheduling passes may change its schedule.
|
||
|
Use this option to control that behavior.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fselective-scheduling</code></dt>
|
||
|
<dd><a name="index-fselective_002dscheduling"></a>
|
||
|
<p>Schedule instructions using the selective scheduling algorithm. Selective
|
||
|
scheduling runs instead of the first scheduler pass.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fselective-scheduling2</code></dt>
|
||
|
<dd><a name="index-fselective_002dscheduling2"></a>
|
||
|
<p>Schedule instructions using the selective scheduling algorithm. Selective
|
||
|
scheduling runs instead of the second scheduler pass.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsel-sched-pipelining</code></dt>
|
||
|
<dd><a name="index-fsel_002dsched_002dpipelining"></a>
|
||
|
<p>Enable software pipelining of innermost loops during selective scheduling.
|
||
|
This option has no effect unless one of <samp>-fselective-scheduling</samp> or
|
||
|
<samp>-fselective-scheduling2</samp> is turned on.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsel-sched-pipelining-outer-loops</code></dt>
|
||
|
<dd><a name="index-fsel_002dsched_002dpipelining_002douter_002dloops"></a>
|
||
|
<p>When pipelining loops during selective scheduling, also pipeline outer loops.
|
||
|
This option has no effect unless <samp>-fsel-sched-pipelining</samp> is turned on.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsemantic-interposition</code></dt>
|
||
|
<dd><a name="index-fsemantic_002dinterposition"></a>
|
||
|
<p>Some object formats, like ELF, allow interposing of symbols by the
|
||
|
dynamic linker.
|
||
|
This means that for symbols exported from the DSO, the compiler cannot perform
|
||
|
interprocedural propagation, inlining and other optimizations in anticipation
|
||
|
that the function or variable in question may change. While this feature is
|
||
|
useful, for example, to rewrite memory allocation functions by a debugging
|
||
|
implementation, it is expensive in terms of code quality.
|
||
|
With <samp>-fno-semantic-interposition</samp> the compiler assumes that
|
||
|
if interposition happens for functions, the overwriting function will have
|
||
|
precisely the same semantics (and side effects).
|
||
|
Similarly, if interposition happens
|
||
|
for variables, the constructor of the variable will be the same. The flag
|
||
|
has no effect for functions explicitly declared inline
|
||
|
(where it is never allowed for interposition to change semantics)
|
||
|
and for symbols explicitly declared weak.
|
||
|
</p>
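<p>The situation described above can be sketched with two functions in one
translation unit of a shared library. The names are hypothetical. When compiled
as position-independent code for a DSO, the call to
<code>scale_factor</code> generally has to stay a real call, because another
definition might be interposed at run time; with
<samp>-fno-semantic-interposition</samp> the compiler may inline it.
</p>

```c
/* Exported with default visibility, so on ELF it may be interposed. */
int scale_factor(void)
{
    return 3;
}

int scaled(int x)
{
    /* Candidate for inlining only if interposition cannot change semantics. */
    return x * scale_factor();
}
```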
|
||
|
</dd>
|
||
|
<dt><code>-fshrink-wrap</code></dt>
|
||
|
<dd><a name="index-fshrink_002dwrap"></a>
|
||
|
<p>Emit function prologues only before parts of the function that need it,
|
||
|
rather than at the top of the function. This flag is enabled by default at
|
||
|
<samp>-O</samp> and higher.
|
||
|
</p>
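<p>A function with a cheap early exit is the typical case where shrink-wrapping
can pay off. In the hypothetical function below, the path that returns
immediately needs no stack frame, so the prologue may be emitted only on the
path that makes the call. Whether this happens depends on the target and the
compiler version.
</p>

```c
#include <stddef.h>
#include <string.h>

size_t checked_length(const char *s)
{
    if (s == NULL)          /* early exit: no frame or saved registers needed */
        return 0;
    return strlen(s) + 1;   /* this path makes a call and may need a prologue */
}
```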
|
||
|
</dd>
|
||
|
<dt><code>-fshrink-wrap-separate</code></dt>
|
||
|
<dd><a name="index-fshrink_002dwrap_002dseparate"></a>
|
||
|
<p>Shrink-wrap separate parts of the prologue and epilogue separately, so that
|
||
|
those parts are only executed when needed.
|
||
|
This option is on by default, but has no effect unless <samp>-fshrink-wrap</samp>
|
||
|
is also turned on and the target supports this.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcaller-saves</code></dt>
|
||
|
<dd><a name="index-fcaller_002dsaves"></a>
|
||
|
<p>Enable allocation of values to registers that are clobbered by
|
||
|
function calls, by emitting extra instructions to save and restore the
|
||
|
registers around such calls. Such allocation is done only when it
|
||
|
seems to result in better code.
|
||
|
</p>
|
||
|
<p>This option is always enabled by default on certain machines, usually
|
||
|
those which have no call-preserved registers to use instead.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcombine-stack-adjustments</code></dt>
|
||
|
<dd><a name="index-fcombine_002dstack_002dadjustments"></a>
|
||
|
<p>Tracks stack adjustments (pushes and pops) and stack memory references
|
||
|
and then tries to find ways to combine them.
|
||
|
</p>
|
||
|
<p>Enabled by default at <samp>-O1</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-ra</code></dt>
|
||
|
<dd><a name="index-fipa_002dra"></a>
|
||
|
<p>Use caller save registers for allocation if those registers are not used by
|
||
|
any called function. In that case it is not necessary to save and restore
|
||
|
them around calls. This is only possible if the called functions are part of
the same compilation unit as the current function and they are compiled before it.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>; however, the option
|
||
|
is disabled if generated code will be instrumented for profiling
|
||
|
(<samp>-p</samp>, or <samp>-pg</samp>) or if callee’s register usage cannot be known
|
||
|
exactly (this happens on targets that do not expose prologues
|
||
|
and epilogues in RTL).
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fconserve-stack</code></dt>
|
||
|
<dd><a name="index-fconserve_002dstack"></a>
|
||
|
<p>Attempt to minimize stack usage. The compiler attempts to use less
|
||
|
stack space, even if that makes the program slower. This option
|
||
|
implies setting the <samp>large-stack-frame</samp> parameter to 100
|
||
|
and the <samp>large-stack-frame-growth</samp> parameter to 400.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-reassoc</code></dt>
|
||
|
<dd><a name="index-ftree_002dreassoc"></a>
|
||
|
<p>Perform reassociation on trees. This flag is enabled by default
|
||
|
at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcode-hoisting</code></dt>
|
||
|
<dd><a name="index-fcode_002dhoisting"></a>
|
||
|
<p>Perform code hoisting. Code hoisting tries to move the
|
||
|
evaluation of expressions executed on all paths to the function exit
|
||
|
as early as possible. This is especially useful as a code size
|
||
|
optimization, but it often helps for code speed as well.
|
||
|
This flag is enabled by default at <samp>-O2</samp> and higher.
|
||
|
</p>
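<p>As a sketch of what code hoisting looks for, the hypothetical function below
evaluates <code>a * b</code> on both branches. Hoisting can move that
multiplication above the branch so that only one copy remains in the generated
code.
</p>

```c
int pick(int flag, int a, int b)
{
    int r;
    if (flag)
        r = (a * b) + 1;    /* a * b is evaluated on this path ... */
    else
        r = (a * b) - 1;    /* ... and on this one */
    return r;
}
```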
|
||
|
</dd>
|
||
|
<dt><code>-ftree-pre</code></dt>
|
||
|
<dd><a name="index-ftree_002dpre"></a>
|
||
|
<p>Perform partial redundancy elimination (PRE) on trees. This flag is
|
||
|
enabled by default at <samp>-O2</samp> and <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-partial-pre</code></dt>
|
||
|
<dd><a name="index-ftree_002dpartial_002dpre"></a>
|
||
|
<p>Make partial redundancy elimination (PRE) more aggressive. This flag is
|
||
|
enabled by default at <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-forwprop</code></dt>
|
||
|
<dd><a name="index-ftree_002dforwprop"></a>
|
||
|
<p>Perform forward propagation on trees. This flag is enabled by default
|
||
|
at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-fre</code></dt>
|
||
|
<dd><a name="index-ftree_002dfre"></a>
|
||
|
<p>Perform full redundancy elimination (FRE) on trees. The difference
|
||
|
between FRE and PRE is that FRE only considers expressions
|
||
|
that are computed on all paths leading to the redundant computation.
|
||
|
This analysis is faster than PRE, though it exposes fewer redundancies.
|
||
|
This flag is enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
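<p>The difference between a full and a partial redundancy can be sketched with
two hypothetical functions. In the first, <code>a + b</code> has been computed
on every path that reaches its second use, which is the case FRE handles. In
the second, it has been computed only on the path through the
<code>if</code>, which is a case left to PRE.
</p>

```c
/* Fully redundant: the second a + b can simply reuse x. */
int fully_redundant(int a, int b)
{
    int x = a + b;
    int y = a + b;
    return x * y;
}

/* Partially redundant: a + b is already available only when flag is nonzero. */
int partially_redundant(int a, int b, int flag)
{
    int x = 0;
    if (flag)
        x = a + b;
    int y = a + b;
    return x + y;
}
```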
|
||
|
</dd>
|
||
|
<dt><code>-ftree-phiprop</code></dt>
|
||
|
<dd><a name="index-ftree_002dphiprop"></a>
|
||
|
<p>Perform hoisting of loads from conditional pointers on trees. This
|
||
|
pass is enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fhoist-adjacent-loads</code></dt>
|
||
|
<dd><a name="index-fhoist_002dadjacent_002dloads"></a>
|
||
|
<p>Speculatively hoist loads from both branches of an if-then-else if the
|
||
|
loads are from adjacent locations in the same structure and the target
|
||
|
architecture has a conditional move instruction. This flag is enabled
|
||
|
by default at <samp>-O2</samp> and higher.
|
||
|
</p>
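<p>The shape of code this option applies to can be sketched as follows, using
a hypothetical structure with two adjacent fields. Both loads may be performed
before the branch, after which the result can be selected with a conditional
move.
</p>

```c
struct pair { int first; int second; };

int choose_field(const struct pair *p, int use_first)
{
    if (use_first)
        return p->first;    /* load from one field ... */
    else
        return p->second;   /* ... or from the adjacent field */
}
```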
|
||
|
</dd>
|
||
|
<dt><code>-ftree-copy-prop</code></dt>
|
||
|
<dd><a name="index-ftree_002dcopy_002dprop"></a>
|
||
|
<p>Perform copy propagation on trees. This pass eliminates unnecessary
|
||
|
copy operations. This flag is enabled by default at <samp>-O</samp> and
|
||
|
higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-pure-const</code></dt>
|
||
|
<dd><a name="index-fipa_002dpure_002dconst"></a>
|
||
|
<p>Discover which functions are pure or constant.
|
||
|
Enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-reference</code></dt>
|
||
|
<dd><a name="index-fipa_002dreference"></a>
|
||
|
<p>Discover which static variables do not escape the
|
||
|
compilation unit.
|
||
|
Enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-pta</code></dt>
|
||
|
<dd><a name="index-fipa_002dpta"></a>
|
||
|
<p>Perform interprocedural pointer analysis and interprocedural modification
|
||
|
and reference analysis. This option can cause excessive memory and
|
||
|
compile-time usage on large compilation units. It is not enabled by
|
||
|
default at any optimization level.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-profile</code></dt>
|
||
|
<dd><a name="index-fipa_002dprofile"></a>
|
||
|
<p>Perform interprocedural profile propagation. The functions called only from
|
||
|
cold functions are marked as cold. Also functions executed once (such as
|
||
|
<code>cold</code>, <code>noreturn</code>, static constructors or destructors) are identified. Cold
|
||
|
functions and loopless parts of functions executed once are then optimized for
|
||
|
size.
|
||
|
Enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-cp</code></dt>
|
||
|
<dd><a name="index-fipa_002dcp"></a>
|
||
|
<p>Perform interprocedural constant propagation.
|
||
|
This optimization analyzes the program to determine when values passed
|
||
|
to functions are constants and then optimizes accordingly.
|
||
|
This optimization can substantially increase performance
|
||
|
if the application has constants passed to functions.
|
||
|
This flag is enabled by default at <samp>-O2</samp>, <samp>-Os</samp> and <samp>-O3</samp>.
|
||
|
</p>
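<p>As a rough illustration of the kind of code that benefits, the sketch below
passes the same constant to a helper at its only call site. The names
<code>scale</code> and <code>double_all</code> are hypothetical, and whether GCC
specializes the helper depends on the optimization level and its heuristics.
</p>

```c
#include <stddef.h>

/* Hypothetical helper: multiplies every element by `factor`. */
static void scale(int *v, size_t n, int factor)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= factor;
}

/* The only call site passes the constant 2 for `factor`, so an
   interprocedural pass can treat `factor` as a known constant
   inside scale(). */
void double_all(int *v, size_t n)
{
    scale(v, n, 2);
}
```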
|
||
|
</dd>
|
||
|
<dt><code>-fipa-cp-clone</code></dt>
|
||
|
<dd><a name="index-fipa_002dcp_002dclone"></a>
|
||
|
<p>Perform function cloning to make interprocedural constant propagation stronger.
|
||
|
When enabled, interprocedural constant propagation performs function cloning
|
||
|
when an externally visible function can be called with constant arguments.
|
||
|
Because this optimization can create multiple copies of functions,
|
||
|
it may significantly increase code size
|
||
|
(see <samp>--param ipcp-unit-growth=<var>value</var></samp>).
|
||
|
This flag is enabled by default at <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-bit-cp</code></dt>
|
||
|
<dd><a name="index-_002dfipa_002dbit_002dcp"></a>
|
||
|
<p>When enabled, perform interprocedural bitwise constant
|
||
|
propagation. This flag is enabled by default at <samp>-O2</samp>. It
|
||
|
requires that <samp>-fipa-cp</samp> is enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-vrp</code></dt>
|
||
|
<dd><a name="index-_002dfipa_002dvrp"></a>
|
||
|
<p>When enabled, perform interprocedural propagation of value
|
||
|
ranges. This flag is enabled by default at <samp>-O2</samp>. It requires
|
||
|
that <samp>-fipa-cp</samp> is enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fipa-icf</code></dt>
|
||
|
<dd><a name="index-fipa_002dicf"></a>
|
||
|
<p>Perform Identical Code Folding for functions and read-only variables.
|
||
|
The optimization reduces code size and may disturb unwind stacks by replacing
|
||
|
a function with an equivalent one under a different name. The optimization works
|
||
|
more effectively with link-time optimization enabled.
|
||
|
</p>
|
||
|
<p>Although the behavior is similar to the Gold linker's ICF optimization, GCC ICF
works on different levels, so the two optimizations are not the same: there are
equivalences that are found only by GCC and equivalences found only by Gold.
|
||
|
</p>
|
||
|
<p>This flag is enabled by default at <samp>-O2</samp> and <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fisolate-erroneous-paths-dereference</code></dt>
|
||
|
<dd><a name="index-fisolate_002derroneous_002dpaths_002ddereference"></a>
|
||
|
<p>Detect paths that trigger erroneous or undefined behavior due to
|
||
|
dereferencing a null pointer. Isolate those paths from the main control
|
||
|
flow and turn the statement with erroneous or undefined behavior into a trap.
|
||
|
This flag is enabled by default at <samp>-O2</samp> and higher and depends on
|
||
|
<samp>-fdelete-null-pointer-checks</samp> also being enabled.
|
||
|
</p>
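<p>A minimal sketch of a path this option targets follows. The function
<code>read_value</code> is hypothetical. On the path where
<code>use_default</code> is nonzero the dereference is known to be of a null
pointer, so that path may be isolated and turned into a trap.
</p>

```c
#include <stddef.h>

/* Hypothetical example: when `use_default` is nonzero, `p` is known
   to be null at the dereference below. */
int read_value(int *p, int use_default)
{
    if (use_default)
        p = NULL;
    return *p;   /* undefined behavior on the use_default != 0 path */
}
```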
|
||
|
</dd>
|
||
|
<dt><code>-fisolate-erroneous-paths-attribute</code></dt>
|
||
|
<dd><a name="index-fisolate_002derroneous_002dpaths_002dattribute"></a>
|
||
|
<p>Detect paths that trigger erroneous or undefined behavior due to a null value
|
||
|
being used in a way forbidden by a <code>returns_nonnull</code> or <code>nonnull</code>
|
||
|
attribute. Isolate those paths from the main control flow and turn the
|
||
|
statement with erroneous or undefined behavior into a trap. This is not
|
||
|
currently enabled, but may be enabled by <samp>-O2</samp> in the future.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-sink</code></dt>
|
||
|
<dd><a name="index-ftree_002dsink"></a>
|
||
|
<p>Perform forward store motion on trees. This flag is
|
||
|
enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-bit-ccp</code></dt>
|
||
|
<dd><a name="index-ftree_002dbit_002dccp"></a>
|
||
|
<p>Perform sparse conditional bit constant propagation on trees and propagate
|
||
|
pointer alignment information.
|
||
|
This pass only operates on local scalar variables and is enabled by default
|
||
|
at <samp>-O</samp> and higher. It requires that <samp>-ftree-ccp</samp> is enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-ccp</code></dt>
|
||
|
<dd><a name="index-ftree_002dccp"></a>
|
||
|
<p>Perform sparse conditional constant propagation (CCP) on trees. This
|
||
|
pass only operates on local scalar variables and is enabled by default
|
||
|
at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fssa-backprop</code></dt>
|
||
|
<dd><a name="index-fssa_002dbackprop"></a>
|
||
|
<p>Propagate information about uses of a value up the definition chain
|
||
|
in order to simplify the definitions. For example, this pass strips
|
||
|
sign operations if the sign of a value never matters. The flag is
|
||
|
enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fssa-phiopt</code></dt>
|
||
|
<dd><a name="index-fssa_002dphiopt"></a>
|
||
|
<p>Perform pattern matching on SSA PHI nodes to optimize conditional
|
||
|
code. This pass is enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-switch-conversion</code></dt>
|
||
|
<dd><a name="index-ftree_002dswitch_002dconversion"></a>
|
||
|
<p>Perform conversion of simple initializations in a switch to
|
||
|
initializations from a scalar array. This flag is enabled by default
|
||
|
at <samp>-O2</samp> and higher.
|
||
|
</p>
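<p>The following sketch shows the shape of the transformation by hand. The
function names are hypothetical, and the code GCC actually generates may differ
(for example in how the out-of-range case is checked).
</p>

```c
/* Before: every case only assigns a constant to the same variable. */
int days_in_quarter_switch(int q)
{
    int days;
    switch (q) {
    case 0: days = 90; break;
    case 1: days = 91; break;
    case 2: days = 92; break;
    case 3: days = 92; break;
    default: days = 0; break;
    }
    return days;
}

/* After (written by hand): the same mapping as a load from a
   constant scalar array, guarded by a range check. */
int days_in_quarter_table(int q)
{
    static const int table[4] = { 90, 91, 92, 92 };
    return (q >= 0 && q < 4) ? table[q] : 0;
}
```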
|
||
|
</dd>
|
||
|
<dt><code>-ftree-tail-merge</code></dt>
|
||
|
<dd><a name="index-ftree_002dtail_002dmerge"></a>
|
||
|
<p>Look for identical code sequences. When found, replace one with a jump to the
|
||
|
other. This optimization is known as tail merging or cross jumping. This flag
|
||
|
is enabled by default at <samp>-O2</samp> and higher. The compilation time
spent in this pass can be limited using the <samp>max-tail-merge-comparisons</samp>
and <samp>max-tail-merge-iterations</samp> parameters.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-dce</code></dt>
|
||
|
<dd><a name="index-ftree_002ddce"></a>
|
||
|
<p>Perform dead code elimination (DCE) on trees. This flag is enabled by
|
||
|
default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-builtin-call-dce</code></dt>
|
||
|
<dd><a name="index-ftree_002dbuiltin_002dcall_002ddce"></a>
|
||
|
<p>Perform conditional dead code elimination (DCE) for calls to built-in functions
|
||
|
that may set <code>errno</code> but are otherwise free of side effects. This flag is
|
||
|
enabled by default at <samp>-O2</samp> and higher if <samp>-Os</samp> is not also
|
||
|
specified.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-dominator-opts</code></dt>
|
||
|
<dd><a name="index-ftree_002ddominator_002dopts"></a>
|
||
|
<p>Perform a variety of simple scalar cleanups (constant/copy
|
||
|
propagation, redundancy elimination, range propagation and expression
|
||
|
simplification) based on a dominator tree traversal. This also
|
||
|
performs jump threading (to reduce jumps to jumps). This flag is
|
||
|
enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-dse</code></dt>
|
||
|
<dd><a name="index-ftree_002ddse"></a>
|
||
|
<p>Perform dead store elimination (DSE) on trees. A dead store is a store into
|
||
|
a memory location that is later overwritten by another store without
|
||
|
any intervening loads. In this case the earlier store can be deleted. This
|
||
|
flag is enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
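<p>A minimal sketch of a dead store, using a hypothetical function
<code>set_flag</code>:
</p>

```c
/* Hypothetical example: the first store is dead because the second
   one overwrites the same location with no load in between. */
void set_flag(int *flag)
{
    *flag = 0;   /* dead store, a candidate for removal */
    *flag = 1;
}
```

<p>If <code>flag</code> pointed to a <code>volatile</code> object, both stores
would have to be kept.
</p>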
|
||
|
</dd>
|
||
|
<dt><code>-ftree-ch</code></dt>
|
||
|
<dd><a name="index-ftree_002dch"></a>
|
||
|
<p>Perform loop header copying on trees. This is beneficial since it increases
|
||
|
effectiveness of code motion optimizations. It also saves one jump. This flag
|
||
|
is enabled by default at <samp>-O</samp> and higher. It is not enabled
|
||
|
for <samp>-Os</samp>, since it usually increases code size.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-optimize</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002doptimize"></a>
|
||
|
<p>Perform loop optimizations on trees. This flag is enabled by default
|
||
|
at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-linear</code></dt>
|
||
|
<dt><code>-floop-strip-mine</code></dt>
|
||
|
<dt><code>-floop-block</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002dlinear"></a>
|
||
|
<a name="index-floop_002dstrip_002dmine"></a>
|
||
|
<a name="index-floop_002dblock"></a>
|
||
|
<p>Perform loop nest optimizations. Same as
|
||
|
<samp>-floop-nest-optimize</samp>. To use this code transformation, GCC has
|
||
|
to be configured with <samp>--with-isl</samp> to enable the Graphite loop
|
||
|
transformation infrastructure.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fgraphite-identity</code></dt>
|
||
|
<dd><a name="index-fgraphite_002didentity"></a>
|
||
|
<p>Enable the identity transformation for Graphite. For every SCoP we generate
the polyhedral representation and transform it back to GIMPLE. Using
|
||
|
<samp>-fgraphite-identity</samp> we can check the costs or benefits of the
|
||
|
GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations
|
||
|
are also performed by the code generator isl, like index splitting and
|
||
|
dead code elimination in loops.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-floop-nest-optimize</code></dt>
|
||
|
<dd><a name="index-floop_002dnest_002doptimize"></a>
|
||
|
<p>Enable the isl based loop nest optimizer. This is a generic loop nest
|
||
|
optimizer based on the Pluto optimization algorithms. It calculates a loop
|
||
|
structure optimized for data-locality and parallelism. This option
|
||
|
is experimental.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-floop-parallelize-all</code></dt>
|
||
|
<dd><a name="index-floop_002dparallelize_002dall"></a>
|
||
|
<p>Use the Graphite data dependence analysis to identify loops that can
|
||
|
be parallelized. Parallelize all the loops that can be analyzed to
|
||
|
not contain loop-carried dependences without checking that it is
|
||
|
profitable to parallelize the loops.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-coalesce-vars</code></dt>
|
||
|
<dd><a name="index-ftree_002dcoalesce_002dvars"></a>
|
||
|
<p>While transforming the program out of the SSA representation, attempt to
|
||
|
reduce copying by coalescing versions of different user-defined
|
||
|
variables, instead of just compiler temporaries. This may severely
|
||
|
limit the ability to debug an optimized program compiled with
|
||
|
<samp>-fno-var-tracking-assignments</samp>. In the negated form, this flag
|
||
|
prevents SSA coalescing of user variables. This option is enabled by
|
||
|
default if optimization is enabled, and it does very little otherwise.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-if-convert</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002dif_002dconvert"></a>
|
||
|
<p>Attempt to transform conditional jumps in the innermost loops to
|
||
|
branch-less equivalents. The intent is to remove control-flow from
|
||
|
the innermost loops in order to improve the ability of the
|
||
|
vectorization pass to handle these loops. This is enabled by default
|
||
|
if vectorization is enabled.
|
||
|
</p>
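<p>The sketch below shows the idea by hand with two hypothetical functions.
The second form always stores, which is one reason the compiler can only apply
the conversion when it can show that doing so is safe.
</p>

```c
#include <stddef.h>

/* Before: a conditional jump inside the innermost loop. */
void clamp_branchy(int *v, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit)
            v[i] = limit;
    }
}

/* After (written by hand): the store is unconditional and the
   condition only selects which value is stored. */
void clamp_branchless(int *v, size_t n, int limit)
{
    for (size_t i = 0; i < n; i++)
        v[i] = (v[i] > limit) ? limit : v[i];
}
```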
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-distribution</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002ddistribution"></a>
|
||
|
<p>Perform loop distribution. This flag can improve cache performance on
|
||
|
big loop bodies and allow further loop optimizations, like
|
||
|
parallelization or vectorization, to take place. For example, the loop
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">DO I = 1, N
  A(I) = B(I) + C
  D(I) = E(I) * F
ENDDO
</pre></div>
|
||
|
<p>is transformed to
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">DO I = 1, N
   A(I) = B(I) + C
ENDDO
DO I = 1, N
   D(I) = E(I) * F
ENDDO
</pre></div>
|
||
|
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-distribute-patterns</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002ddistribute_002dpatterns"></a>
|
||
|
<p>Perform loop distribution of patterns that can be code generated with
|
||
|
calls to a library. This flag is enabled by default at <samp>-O3</samp>.
|
||
|
</p>
|
||
|
<p>This pass distributes the initialization loops and generates a call to
<code>memset</code> that zeroes the array. For example, the loop
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">DO I = 1, N
  A(I) = 0
  B(I) = A(I) + I
ENDDO
</pre></div>
|
||
|
<p>is transformed to
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">DO I = 1, N
   A(I) = 0
ENDDO
DO I = 1, N
   B(I) = A(I) + I
ENDDO
</pre></div>
|
||
|
<p>and the initialization loop is transformed into a call to <code>memset</code> that zeroes <code>A</code>.
|
||
|
</p>
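<p>For C readers, the same idea can be sketched by hand as follows. The
function names are hypothetical, indices are zero-based rather than one-based,
and the sketch assumes that <code>a</code> and <code>b</code> do not overlap.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Before: initialization and computation share one loop. */
void init_fused(int *a, int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = a[i] + (int)i;
    }
}

/* After (written by hand): the zeroing loop becomes a memset call
   and the remaining computation keeps its own loop. */
void init_distributed(int *a, int *b, size_t n)
{
    memset(a, 0, n * sizeof *a);
    for (size_t i = 0; i < n; i++)
        b[i] = a[i] + (int)i;
}
```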
|
||
|
</dd>
|
||
|
<dt><code>-floop-interchange</code></dt>
|
||
|
<dd><a name="index-floop_002dinterchange"></a>
|
||
|
<p>Perform loop interchange outside of Graphite. This flag can improve cache
performance on loop nests and allow further loop optimizations, like
|
||
|
vectorization, to take place. For example, the loop
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++)
    for (int k = 0; k < N; k++)
      c[i][j] = c[i][j] + a[i][k]*b[k][j];
</pre></div>
|
||
|
<p>is transformed to
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">for (int i = 0; i < N; i++)
  for (int k = 0; k < N; k++)
    for (int j = 0; j < N; j++)
      c[i][j] = c[i][j] + a[i][k]*b[k][j];
</pre></div>
|
||
|
<p>This flag is enabled by default at <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-floop-unroll-and-jam</code></dt>
|
||
|
<dd><a name="index-floop_002dunroll_002dand_002djam"></a>
|
||
|
<p>Apply unroll and jam transformations on feasible loops. In a loop
|
||
|
nest this unrolls the outer loop by some factor and fuses the resulting
|
||
|
multiple inner loops. This flag is enabled by default at <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-im</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002dim"></a>
|
||
|
<p>Perform loop invariant motion on trees. This pass moves only invariants that
|
||
|
are hard to handle at RTL level (function calls, operations that expand to
|
||
|
nontrivial sequences of insns). With <samp>-funswitch-loops</samp> it also moves
|
||
|
operands of conditions that are invariant out of the loop, so that we can use
|
||
|
just trivial invariantness analysis in loop unswitching. The pass also includes
|
||
|
store motion.
|
||
|
</p>
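<p>As a rough sketch of an invariant function call, consider the hypothetical
functions below. Whether GCC actually hoists the call depends on what it can
prove about the loop body, here that the string is not modified.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Before: strlen(s) is recomputed for every loop test although
   its value does not change inside the loop. */
size_t count_char(const char *s, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == c)
            count++;
    return count;
}

/* After (written by hand): the invariant call is moved out of the loop. */
size_t count_char_hoisted(const char *s, char c)
{
    size_t count = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```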
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-ivcanon</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002divcanon"></a>
|
||
|
<p>Create a canonical counter for the number of iterations in loops for which
determining the number of iterations requires complicated analysis. Later
|
||
|
optimizations then may determine the number easily. Useful especially
|
||
|
in connection with unrolling.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fivopts</code></dt>
|
||
|
<dd><a name="index-fivopts"></a>
|
||
|
<p>Perform induction variable optimizations (strength reduction, induction
|
||
|
variable merging and induction variable elimination) on trees.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-parallelize-loops=n</code></dt>
|
||
|
<dd><a name="index-ftree_002dparallelize_002dloops"></a>
|
||
|
<p>Parallelize loops, i.e., split their iteration space to run in n threads.
|
||
|
This is only possible for loops whose iterations are independent
|
||
|
and can be arbitrarily reordered. The optimization is only
|
||
|
profitable on multiprocessor machines, for loops that are CPU-intensive,
|
||
|
rather than constrained e.g. by memory bandwidth. This option
|
||
|
implies <samp>-pthread</samp>, and thus is only supported on targets
|
||
|
that have support for <samp>-pthread</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-pta</code></dt>
|
||
|
<dd><a name="index-ftree_002dpta"></a>
|
||
|
<p>Perform function-local points-to analysis on trees. This flag is
|
||
|
enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-sra</code></dt>
|
||
|
<dd><a name="index-ftree_002dsra"></a>
|
||
|
<p>Perform scalar replacement of aggregates. This pass replaces structure
|
||
|
references with scalars to prevent committing structures to memory too
|
||
|
early. This flag is enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
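<p>The effect can be sketched by hand with a small local structure. The type
<code>struct point</code> and both functions are hypothetical.
</p>

```c
struct point { int x; int y; };

/* Before: the local aggregate `p` is accessed field by field. */
int manhattan_struct(int x, int y)
{
    struct point p;
    p.x = x < 0 ? -x : x;
    p.y = y < 0 ? -y : y;
    return p.x + p.y;
}

/* After (written by hand): each field becomes an independent scalar
   that can live in a register. */
int manhattan_scalars(int x, int y)
{
    int px = x < 0 ? -x : x;
    int py = y < 0 ? -y : y;
    return px + py;
}
```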
|
||
|
</dd>
|
||
|
<dt><code>-fstore-merging</code></dt>
|
||
|
<dd><a name="index-fstore_002dmerging"></a>
|
||
|
<p>Perform merging of narrow stores to consecutive memory addresses. This pass
|
||
|
merges contiguous stores of immediate values narrower than a word into fewer
|
||
|
wider stores to reduce the number of instructions. This is enabled by default
|
||
|
at <samp>-O2</samp> and higher as well as <samp>-Os</samp>.
|
||
|
</p>
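<p>A hand-written sketch of the pattern follows, using hypothetical function
names. The second function copies from a byte array so that the result does not
depend on the byte order of the target.
</p>

```c
#include <stdint.h>
#include <string.h>

/* Before: four adjacent one-byte stores of immediate values. */
void fill_bytes(uint8_t *p)
{
    p[0] = 0x11;
    p[1] = 0x22;
    p[2] = 0x33;
    p[3] = 0x44;
}

/* After (written by hand): the same four bytes written as one
   4-byte block. */
void fill_word(uint8_t *p)
{
    static const uint8_t bytes[4] = { 0x11, 0x22, 0x33, 0x44 };
    memcpy(p, bytes, sizeof bytes);
}
```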
|
||
|
</dd>
|
||
|
<dt><code>-ftree-ter</code></dt>
|
||
|
<dd><a name="index-ftree_002dter"></a>
|
||
|
<p>Perform temporary expression replacement during the SSA->normal phase. Single
|
||
|
use/single def temporaries are replaced at their use location with their
|
||
|
defining expression. This results in non-GIMPLE code, but gives the expanders
|
||
|
much more complex trees to work on resulting in better RTL generation. This is
|
||
|
enabled by default at <samp>-O</samp> and higher.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-slsr</code></dt>
|
||
|
<dd><a name="index-ftree_002dslsr"></a>
|
||
|
<p>Perform straight-line strength reduction on trees. This recognizes related
|
||
|
expressions involving multiplications and replaces them by less expensive
|
||
|
calculations when possible. This is enabled by default at <samp>-O</samp> and
|
||
|
higher.
|
||
|
</p>
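<p>The following sketch illustrates the idea by hand. The function names are
hypothetical, and the sketch assumes the values are small enough that no
multiplication or addition overflows.
</p>

```c
/* Before: three multiplications with related operands. */
long sum_mul(long base, long stride)
{
    long a = base * stride;
    long b = (base + 1) * stride;
    long c = (base + 2) * stride;
    return a + b + c;
}

/* After (written by hand): one multiplication, then additions. */
long sum_add(long base, long stride)
{
    long a = base * stride;
    long b = a + stride;
    long c = b + stride;
    return a + b + c;
}
```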
|
||
|
</dd>
|
||
|
<dt><code>-ftree-vectorize</code></dt>
|
||
|
<dd><a name="index-ftree_002dvectorize"></a>
|
||
|
<p>Perform vectorization on trees. This flag enables <samp>-ftree-loop-vectorize</samp>
|
||
|
and <samp>-ftree-slp-vectorize</samp> if not explicitly specified.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-loop-vectorize</code></dt>
|
||
|
<dd><a name="index-ftree_002dloop_002dvectorize"></a>
|
||
|
<p>Perform loop vectorization on trees. This flag is enabled by default at
|
||
|
<samp>-O3</samp> and when <samp>-ftree-vectorize</samp> is enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-slp-vectorize</code></dt>
|
||
|
<dd><a name="index-ftree_002dslp_002dvectorize"></a>
|
||
|
<p>Perform basic block vectorization on trees. This flag is enabled by default at
|
||
|
<samp>-O3</samp> and when <samp>-ftree-vectorize</samp> is enabled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fvect-cost-model=<var>model</var></code></dt>
|
||
|
<dd><a name="index-fvect_002dcost_002dmodel"></a>
|
||
|
<p>Alter the cost model used for vectorization. The <var>model</var> argument
|
||
|
should be one of ‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’.
|
||
|
With the ‘<samp>unlimited</samp>’ model the vectorized code-path is assumed
|
||
|
to be profitable while with the ‘<samp>dynamic</samp>’ model a runtime check
|
||
|
guards the vectorized code-path to enable it only for iteration
|
||
|
counts that will likely execute faster than when executing the original
|
||
|
scalar loop. The ‘<samp>cheap</samp>’ model disables vectorization of
|
||
|
loops where doing so would be cost prohibitive for example due to
|
||
|
required runtime checks for data dependence or alignment but otherwise
|
||
|
is equal to the ‘<samp>dynamic</samp>’ model.
|
||
|
The default cost model depends on other optimization flags and is
|
||
|
either ‘<samp>dynamic</samp>’ or ‘<samp>cheap</samp>’.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsimd-cost-model=<var>model</var></code></dt>
|
||
|
<dd><a name="index-fsimd_002dcost_002dmodel"></a>
|
||
|
<p>Alter the cost model used for vectorization of loops marked with the OpenMP
|
||
|
simd directive. The <var>model</var> argument should be one of
|
||
|
‘<samp>unlimited</samp>’, ‘<samp>dynamic</samp>’, ‘<samp>cheap</samp>’. All values of <var>model</var>
|
||
|
have the same meaning as described in <samp>-fvect-cost-model</samp> and by
|
||
|
default a cost model defined with <samp>-fvect-cost-model</samp> is used.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftree-vrp</code></dt>
|
||
|
<dd><a name="index-ftree_002dvrp"></a>
|
||
|
<p>Perform Value Range Propagation on trees. This is similar to the
|
||
|
constant propagation pass, but instead of values, ranges of values are
|
||
|
propagated. This allows the optimizers to remove unnecessary range
|
||
|
checks like array bound checks and null pointer checks. This is
|
||
|
enabled by default at <samp>-O2</samp> and higher. Null pointer check
|
||
|
elimination is only done if <samp>-fdelete-null-pointer-checks</samp> is
|
||
|
enabled.
|
||
|
</p>
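<p>A minimal sketch of a removable range check, using a hypothetical function
<code>sum_table</code>:
</p>

```c
/* Hypothetical example: the loop condition limits i to the range
   [0, 15], so the inner check is always true and can be removed. */
int sum_table(const int table[16])
{
    int sum = 0;
    for (int i = 0; i < 16; i++) {
        if (i >= 0 && i < 16)   /* redundant given the range of i */
            sum += table[i];
    }
    return sum;
}
```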
|
||
|
</dd>
|
||
|
<dt><code>-fsplit-paths</code></dt>
|
||
|
<dd><a name="index-fsplit_002dpaths"></a>
|
||
|
<p>Split paths leading to loop backedges. This can improve dead code
|
||
|
elimination and common subexpression elimination. This is enabled by
|
||
|
default at <samp>-O2</samp> and above.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsplit-ivs-in-unroller</code></dt>
|
||
|
<dd><a name="index-fsplit_002divs_002din_002dunroller"></a>
|
||
|
<p>Enables expression of values of induction variables in later iterations
|
||
|
of the unrolled loop using the value in the first iteration. This breaks
|
||
|
long dependency chains, thus improving efficiency of the scheduling passes.
|
||
|
</p>
|
||
|
<p>A combination of <samp>-fweb</samp> and CSE is often sufficient to obtain the
|
||
|
same effect. However, that is not reliable in cases where the loop body
|
||
|
is more complicated than a single basic block. It also does not work at all
|
||
|
on some architectures due to restrictions in the CSE pass.
|
||
|
</p>
|
||
|
<p>This optimization is enabled by default.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fvariable-expansion-in-unroller</code></dt>
|
||
|
<dd><a name="index-fvariable_002dexpansion_002din_002dunroller"></a>
|
||
|
<p>With this option, the compiler creates multiple copies of some
|
||
|
local variables when unrolling a loop, which can result in superior code.
|
||
|
</p>
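<p>The sketch below shows by hand what expanding an accumulator can look like.
The function names are hypothetical, and unsigned arithmetic is used so that
reordering the additions cannot change the result.
</p>

```c
#include <stddef.h>

/* Before: every addition depends on the previous one through `sum`. */
unsigned sum_single(const unsigned *v, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* After (written by hand): the loop is unrolled by two and the
   accumulator is split into two independent copies that are
   combined once at the end. */
unsigned sum_expanded(const unsigned *v, size_t n)
{
    unsigned sum0 = 0, sum1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        sum0 += v[i];
        sum1 += v[i + 1];
    }
    if (i < n)               /* leftover element when n is odd */
        sum0 += v[i];
    return sum0 + sum1;
}
```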
|
||
|
</dd>
|
||
|
<dt><code>-fpartial-inlining</code></dt>
|
||
|
<dd><a name="index-fpartial_002dinlining"></a>
|
||
|
<p>Inline parts of functions. This option has an effect only
|
||
|
when inlining itself is turned on by the <samp>-finline-functions</samp>
|
||
|
or <samp>-finline-small-functions</samp> options.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fpredictive-commoning</code></dt>
|
||
|
<dd><a name="index-fpredictive_002dcommoning"></a>
|
||
|
<p>Perform predictive commoning optimization, i.e., reusing computations
|
||
|
(especially memory loads and stores) performed in previous
|
||
|
iterations of loops.
|
||
|
</p>
|
||
|
<p>This option is enabled at level <samp>-O3</samp>.
|
||
|
</p>
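<p>As an illustration, the hand-written sketch below reuses a value loaded in
the previous iteration. The function names are hypothetical, and the sketch
assumes that <code>out</code> does not overlap <code>a</code>.
</p>

```c
#include <stddef.h>

/* Before: a[i + 1] is loaded in iteration i and loaded again as
   a[i] in iteration i + 1. */
void pair_sums(const int *a, int *out, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++)
        out[i] = a[i] + a[i + 1];
}

/* After (written by hand): the value loaded in one iteration is
   carried to the next in a local variable. */
void pair_sums_commoned(const int *a, int *out, size_t n)
{
    if (n < 2)
        return;
    int prev = a[0];
    for (size_t i = 0; i + 1 < n; i++) {
        int next = a[i + 1];
        out[i] = prev + next;
        prev = next;
    }
}
```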
|
||
|
</dd>
|
||
|
<dt><code>-fprefetch-loop-arrays</code></dt>
|
||
|
<dd><a name="index-fprefetch_002dloop_002darrays"></a>
|
||
|
<p>If supported by the target machine, generate instructions to prefetch
|
||
|
memory to improve the performance of loops that access large arrays.
|
||
|
</p>
|
||
|
<p>This option may generate better or worse code; results are highly
|
||
|
dependent on the structure of loops within the source code.
|
||
|
</p>
|
||
|
<p>Disabled at level <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-printf-return-value</code></dt>
|
||
|
<dd><a name="index-fno_002dprintf_002dreturn_002dvalue"></a>
|
||
|
<p>Do not substitute constants for known return value of formatted output
|
||
|
functions such as <code>sprintf</code>, <code>snprintf</code>, <code>vsprintf</code>, and
|
||
|
<code>vsnprintf</code> (but not <code>printf</code> or <code>fprintf</code>). This
|
||
|
transformation allows GCC to optimize or even eliminate branches based
|
||
|
on the known return value of these functions called with arguments that
|
||
|
are either constant, or whose values are known to be in a range that
|
||
|
makes determining the exact return value possible. For example, when
|
||
|
<samp>-fprintf-return-value</samp> is in effect, both the branch and the
|
||
|
body of the <code>if</code> statement (but not the call to <code>snprintf</code>)
|
||
|
can be optimized away when <code>i</code> is a 32-bit or smaller integer
|
||
|
because the return value is guaranteed to be at most 8.
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">char buf[9];
if (snprintf (buf, sizeof buf, "%08x", i) >= sizeof buf)
  …
</pre></div>
|
||
|
|
||
|
<p>The <samp>-fprintf-return-value</samp> option relies on other optimizations
|
||
|
and yields best results with <samp>-O2</samp> and above. It works in tandem
|
||
|
with the <samp>-Wformat-overflow</samp> and <samp>-Wformat-truncation</samp>
|
||
|
options. The <samp>-fprintf-return-value</samp> option is enabled by default.
|
||
|
</p>
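<p>A self-contained variant of the case described above might look as follows.
The function <code>format_hex</code> is hypothetical, and the sketch assumes a
32-bit <code>unsigned int</code>, for which <code>%08x</code> always produces
exactly eight characters.
</p>

```c
#include <stdio.h>

/* Hypothetical helper: formats `i` as eight hex digits and returns
   nonzero if the output was truncated or an error occurred. */
int format_hex(char buf[9], unsigned int i)
{
    int n = snprintf(buf, 9, "%08x", i);
    /* With a 32-bit unsigned int the result is always 8, so a
       compiler that knows this can simplify the test below. */
    return n < 0 || n >= 9;
}
```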
|
||
|
</dd>
|
||
|
<dt><code>-fno-peephole</code></dt>
|
||
|
<dt><code>-fno-peephole2</code></dt>
|
||
|
<dd><a name="index-fno_002dpeephole"></a>
|
||
|
<a name="index-fno_002dpeephole2"></a>
|
||
|
<p>Disable any machine-specific peephole optimizations. The difference
|
||
|
between <samp>-fno-peephole</samp> and <samp>-fno-peephole2</samp> is in how they
|
||
|
are implemented in the compiler; some targets use one, some use the
|
||
|
other, a few use both.
|
||
|
</p>
|
||
|
<p><samp>-fpeephole</samp> is enabled by default.
|
||
|
<samp>-fpeephole2</samp> is enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-guess-branch-probability</code></dt>
|
||
|
<dd><a name="index-fno_002dguess_002dbranch_002dprobability"></a>
|
||
|
<p>Do not guess branch probabilities using heuristics.
|
||
|
</p>
|
||
|
<p>GCC uses heuristics to guess branch probabilities if they are
|
||
|
not provided by profiling feedback (<samp>-fprofile-arcs</samp>). These
|
||
|
heuristics are based on the control flow graph. If some branch probabilities
|
||
|
are specified by <code>__builtin_expect</code>, then the heuristics are
|
||
|
used to guess branch probabilities for the rest of the control flow graph,
|
||
|
taking the <code>__builtin_expect</code> info into account. The interactions
|
||
|
between the heuristics and <code>__builtin_expect</code> can be complex, and in
|
||
|
some cases, it may be useful to disable the heuristics so that the effects
|
||
|
of <code>__builtin_expect</code> are easier to understand.
|
||
|
</p>
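<p>The sketch below shows one annotated branch next to a loop that is left to
the heuristics. The macro and function names are hypothetical;
<code>__builtin_expect</code> is the GCC built-in discussed above.
</p>

```c
#include <stddef.h>

/* Hypothetical convenience wrappers around the GCC built-in. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

long checked_sum(const int *v, size_t n)
{
    if (unlikely(v == NULL))          /* annotated as rarely true */
        return -1;
    long sum = 0;
    for (size_t i = 0; i < n; i++)    /* probability left to the heuristics */
        sum += v[i];
    return sum;
}
```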
|
||
|
<p>The default is <samp>-fguess-branch-probability</samp> at levels
|
||
|
<samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-freorder-blocks</code></dt>
|
||
|
<dd><a name="index-freorder_002dblocks"></a>
|
||
|
<p>Reorder basic blocks in the compiled function in order to reduce the number of
|
||
|
taken branches and improve code locality.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-freorder-blocks-algorithm=<var>algorithm</var></code></dt>
|
||
|
<dd><a name="index-freorder_002dblocks_002dalgorithm"></a>
|
||
|
<p>Use the specified algorithm for basic block reordering. The
|
||
|
<var>algorithm</var> argument can be ‘<samp>simple</samp>’, which does not increase
|
||
|
code size (except sometimes due to secondary effects like alignment),
|
||
|
or ‘<samp>stc</samp>’, the “software trace cache” algorithm, which tries to
|
||
|
put all often executed code together, minimizing the number of branches
|
||
|
executed by making extra copies of code.
|
||
|
</p>
|
||
|
<p>The default is ‘<samp>simple</samp>’ at levels <samp>-O</samp>, <samp>-Os</samp>, and
|
||
|
‘<samp>stc</samp>’ at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-freorder-blocks-and-partition</code></dt>
|
||
|
<dd><a name="index-freorder_002dblocks_002dand_002dpartition"></a>
|
||
|
<p>In addition to reordering basic blocks in the compiled function, in order
|
||
|
to reduce the number of taken branches, partition hot and cold basic blocks
|
||
|
into separate sections of the assembly and <samp>.o</samp> files, to improve
|
||
|
paging and cache locality performance.
|
||
|
</p>
|
||
|
<p>This optimization is automatically turned off in the presence of
|
||
|
exception handling or unwind tables (on targets using setjmp/longjmp or a
target-specific scheme), for linkonce sections, for functions with a user-defined
|
||
|
section attribute and on any architecture that does not support named
|
||
|
sections. When <samp>-fsplit-stack</samp> is used this option is not
|
||
|
enabled by default (to avoid linker errors), but may be enabled
|
||
|
explicitly (if using a working linker).
|
||
|
</p>
|
||
|
<p>Enabled for x86 at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-freorder-functions</code></dt>
|
||
|
<dd><a name="index-freorder_002dfunctions"></a>
|
||
|
<p>Reorder functions in the object file in order to
|
||
|
improve code locality. This is implemented by using special
|
||
|
subsections <code>.text.hot</code> for the most frequently executed functions and
<code>.text.unlikely</code> for functions unlikely to be executed. Reordering is done by
the linker, so the object file format must support named sections and the linker must
place them in a reasonable way.
|
||
|
</p>
|
||
|
<p>Profile feedback must also be available to make this option effective. See
|
||
|
<samp>-fprofile-arcs</samp> for details.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fstrict-aliasing</code></dt>
|
||
|
<dd><a name="index-fstrict_002daliasing"></a>
|
||
|
<p>Allow the compiler to assume the strictest aliasing rules applicable to
|
||
|
the language being compiled. For C (and C++), this activates
|
||
|
optimizations based on the type of expressions. In particular, an
|
||
|
object of one type is assumed never to reside at the same address as an
|
||
|
object of a different type, unless the types are almost the same. For
|
||
|
example, an <code>unsigned int</code> can alias an <code>int</code>, but not a
|
||
|
<code>void*</code> or a <code>double</code>. A character type may alias any other
|
||
|
type.
|
||
|
</p>
|
||
|
<a name="Type_002dpunning"></a><p>Pay special attention to code like this:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}
</pre></div>
|
||
|
<p>The practice of reading from a different union member than the one most
|
||
|
recently written to (called “type-punning”) is common. Even with
|
||
|
<samp>-fstrict-aliasing</samp>, type-punning is allowed, provided the memory
|
||
|
is accessed through the union type. So, the code above works as
|
||
|
expected. See <a href="Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation">Structures unions enumerations and bit-fields implementation</a>. However, this code might not:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">int f() {
  union a_union t;
  int* ip;
  t.d = 3.0;
  ip = &amp;t.i;
  return *ip;
}
</pre></div>
|
||
|
|
||
|
<p>Similarly, access by taking the address, casting the resulting pointer
|
||
|
and dereferencing the result has undefined behavior, even if the cast
|
||
|
uses a union type, e.g.:
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">int f() {
  double d = 3.0;
  return ((union a_union *) &amp;d)-&gt;i;
}
</pre></div>
|
||
|
|
||
|
<p>The <samp>-fstrict-aliasing</samp> option is enabled at levels
|
||
|
<samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
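<p>As a portable alternative to the pointer casts shown above, the bytes of an
object can be copied into an object of the other type. The following is a
minimal sketch, not taken from GCC itself: the helper name
<code>double_bits</code> is hypothetical, and the code assumes that
<code>double</code> and <code>uint64_t</code> have the same size.
</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the object representation of a double as a 64-bit integer.
   memcpy accesses the object as bytes (character type), and a character
   type may alias any other type, so this remains well defined when
   -fstrict-aliasing is in effect.
   double_bits is a hypothetical helper name used for illustration. */
static uint64_t double_bits(double d)
{
    uint64_t bits;

    /* Assumption: both types occupy the same number of bytes. */
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

<p>Compilers commonly reduce a fixed-size <code>memcpy</code> like this to a
plain register move when optimizing, so the portable form need not cost more
than the cast.
</p>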
|
||
|
</dd>
|
||
|
<dt><code>-falign-functions</code></dt>
|
||
|
<dt><code>-falign-functions=<var>n</var></code></dt>
|
||
|
<dd><a name="index-falign_002dfunctions"></a>
|
||
|
<p>Align the start of functions to the next power-of-two greater than or equal to
<var>n</var>, skipping up to <var>n</var>-1 bytes. For instance,
|
||
|
<samp>-falign-functions=32</samp> aligns functions to the next 32-byte
|
||
|
boundary, but <samp>-falign-functions=24</samp> aligns to the next
|
||
|
32-byte boundary only if this can be done by skipping 23 bytes or less.
|
||
|
</p>
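<p>The arithmetic behind these two examples can be sketched as follows. This is
only a model of the rule as described here, not GCC's or the assembler's actual
implementation; the function names <code>pow2_ceil</code> and
<code>align_padding</code> are hypothetical, and the model assumes
<var>n</var> is at least 1 and that at most <var>n</var>-1 bytes may be skipped.
</p>

```c
#include <assert.h>

/* Smallest power of two that is >= n (hypothetical helper, n >= 1). */
static unsigned long pow2_ceil(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Model of -falign-functions=n: number of padding bytes placed before a
   function that would otherwise start at addr.  Returns 0 when the function
   is already aligned, or when reaching the boundary would require skipping
   more than n-1 bytes (in which case no alignment is performed). */
static unsigned long align_padding(unsigned long addr, unsigned long n)
{
    unsigned long boundary, pad;

    assert(n >= 1);
    boundary = pow2_ceil(n);
    pad = (boundary - addr % boundary) % boundary;
    return pad <= n - 1 ? pad : 0;
}
```

<p>Under this model, a function at address 0x1008 is padded by 24 bytes with
<samp>-falign-functions=32</samp>, but is left where it is with
<samp>-falign-functions=24</samp>, because reaching the 32-byte boundary would
need 24 bytes and only 23 may be skipped.
</p>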
|
||
|
<p><samp>-fno-align-functions</samp> and <samp>-falign-functions=1</samp> are
|
||
|
equivalent and mean that functions are not aligned.
|
||
|
</p>
|
||
|
<p>Some assemblers only support this flag when <var>n</var> is a power of two;
|
||
|
in that case, it is rounded up.
|
||
|
</p>
|
||
|
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
|
||
|
The maximum allowed <var>n</var> option value is 65536.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flimit-function-alignment</code></dt>
|
||
|
<dd><p>If this option is enabled, the compiler tries to avoid unnecessarily
|
||
|
overaligning functions. It attempts to instruct the assembler to align
|
||
|
by the amount specified by <samp>-falign-functions</samp>, but not to
|
||
|
skip more bytes than the size of the function.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-falign-labels</code></dt>
|
||
|
<dt><code>-falign-labels=<var>n</var></code></dt>
|
||
|
<dd><a name="index-falign_002dlabels"></a>
|
||
|
<p>Align all branch targets to a power-of-two boundary, skipping up to
|
||
|
<var>n</var> bytes like <samp>-falign-functions</samp>. This option can easily
|
||
|
make code slower, because it must insert dummy operations for when the
|
||
|
branch target is reached in the usual flow of the code.
|
||
|
</p>
|
||
|
<p><samp>-fno-align-labels</samp> and <samp>-falign-labels=1</samp> are
|
||
|
equivalent and mean that labels are not aligned.
|
||
|
</p>
|
||
|
<p>If <samp>-falign-loops</samp> or <samp>-falign-jumps</samp> are applicable and
|
||
|
are greater than this value, then their values are used instead.
|
||
|
</p>
|
||
|
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default
|
||
|
which is very likely to be ‘<samp>1</samp>’, meaning no alignment.
|
||
|
The maximum allowed <var>n</var> option value is 65536.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-falign-loops</code></dt>
|
||
|
<dt><code>-falign-loops=<var>n</var></code></dt>
|
||
|
<dd><a name="index-falign_002dloops"></a>
|
||
|
<p>Align loops to a power-of-two boundary, skipping up to <var>n</var> bytes
|
||
|
like <samp>-falign-functions</samp>. If the loops are
|
||
|
executed many times, this makes up for any execution of the dummy
|
||
|
operations.
|
||
|
</p>
|
||
|
<p><samp>-fno-align-loops</samp> and <samp>-falign-loops=1</samp> are
|
||
|
equivalent and mean that loops are not aligned.
|
||
|
The maximum allowed <var>n</var> option value is 65536.
|
||
|
</p>
|
||
|
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-falign-jumps</code></dt>
|
||
|
<dt><code>-falign-jumps=<var>n</var></code></dt>
|
||
|
<dd><a name="index-falign_002djumps"></a>
|
||
|
<p>Align branch targets to a power-of-two boundary, for branch targets
|
||
|
where the targets can only be reached by jumping, skipping up to <var>n</var>
|
||
|
bytes like <samp>-falign-functions</samp>. In this case, no dummy operations
|
||
|
need be executed.
|
||
|
</p>
|
||
|
<p><samp>-fno-align-jumps</samp> and <samp>-falign-jumps=1</samp> are
equivalent and mean that jump targets are not aligned.
|
||
|
</p>
|
||
|
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
|
||
|
The maximum allowed <var>n</var> option value is 65536.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-funit-at-a-time</code></dt>
|
||
|
<dd><a name="index-funit_002dat_002da_002dtime"></a>
|
||
|
<p>This option is left for compatibility reasons. <samp>-funit-at-a-time</samp>
|
||
|
has no effect, while <samp>-fno-unit-at-a-time</samp> implies
|
||
|
<samp>-fno-toplevel-reorder</samp> and <samp>-fno-section-anchors</samp>.
|
||
|
</p>
|
||
|
<p>Enabled by default.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-toplevel-reorder</code></dt>
|
||
|
<dd><a name="index-fno_002dtoplevel_002dreorder"></a>
|
||
|
<p>Do not reorder top-level functions, variables, and <code>asm</code>
|
||
|
statements. Output them in the same order that they appear in the
|
||
|
input file. When this option is used, unreferenced static variables
|
||
|
are not removed. This option is intended to support existing code
|
||
|
that relies on a particular ordering. For new code, it is better to
|
||
|
use attributes when possible.
|
||
|
</p>
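<p>As one illustration of the attribute-based approach, the GCC
<code>used</code> attribute keeps an otherwise unreferenced static object in
the output without relying on <samp>-fno-toplevel-reorder</samp>. This is a
sketch under assumptions: the variable name and its contents are hypothetical,
and the attribute only addresses removal, not output ordering.
</p>

```c
/* A hypothetical identification string that no code refers to.
   Without the attribute, an optimizing compilation may discard it;
   with it, GCC emits the object even though it appears unreferenced. */
__attribute__((used))
static const char build_tag[] = "build-tag-v1";
```

<p>Code that depends on the relative order of definitions needs a different
mechanism, such as explicit section placement, which is outside the scope of
this sketch.
</p>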
|
||
|
<p>Enabled at level <samp>-O0</samp>. When disabled explicitly, it also implies
|
||
|
<samp>-fno-section-anchors</samp>, which is otherwise enabled at <samp>-O0</samp> on some
|
||
|
targets.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fweb</code></dt>
|
||
|
<dd><a name="index-fweb"></a>
|
||
|
<p>Constructs webs as commonly used for register allocation purposes and assigns
each web an individual pseudo register. This allows the register allocation pass
|
||
|
to operate on pseudos directly, but also strengthens several other optimization
|
||
|
passes, such as CSE, loop optimizer and trivial dead code remover. It can,
|
||
|
however, make debugging impossible, since variables no longer stay in a
|
||
|
“home register”.
|
||
|
</p>
|
||
|
<p>Enabled by default with <samp>-funroll-loops</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fwhole-program</code></dt>
|
||
|
<dd><a name="index-fwhole_002dprogram"></a>
|
||
|
<p>Assume that the current compilation unit represents the whole program being
|
||
|
compiled. All public functions and variables with the exception of <code>main</code>
|
||
|
and those merged by attribute <code>externally_visible</code> become static functions
|
||
|
and in effect are optimized more aggressively by interprocedural optimizers.
|
||
|
</p>
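<p>A short sketch of how a symbol might be kept visible under this option is
shown below. The function names are hypothetical; the example assumes a GCC
compiler, since <code>externally_visible</code> is a GCC attribute that other
compilers may ignore.
</p>

```c
/* Marked externally_visible: remains a public symbol even when this
   unit is compiled with -fwhole-program, for example because a library
   or another object looks it up by name.  plugin_entry is a hypothetical
   name used for illustration. */
__attribute__((externally_visible))
int plugin_entry(int x)
{
    return x + 1;
}

/* Not marked: with -fwhole-program this is treated like a static
   function, so the interprocedural optimizers are free to inline it
   or remove the out-of-line copy. */
int helper(int x)
{
    return plugin_entry(x) * 2;
}
```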
|
||
|
<p>This option should not be used in combination with <samp>-flto</samp>.
Relying on a linker plugin instead should provide safer and more precise
information.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flto[=<var>n</var>]</code></dt>
|
||
|
<dd><a name="index-flto"></a>
|
||
|
<p>This option runs the standard link-time optimizer. When invoked
|
||
|
with source code, it generates GIMPLE (one of GCC’s internal
|
||
|
representations) and writes it to special ELF sections in the object
|
||
|
file. When the object files are linked together, all the function
|
||
|
bodies are read from these ELF sections and instantiated as if they
|
||
|
had been part of the same translation unit.
|
||
|
</p>
|
||
|
<p>To use the link-time optimizer, <samp>-flto</samp> and optimization
|
||
|
options should be specified at compile time and during the final link.
|
||
|
It is recommended that you compile all the files participating in the
|
||
|
same link with the same options and also specify those options at
|
||
|
link time.
|
||
|
For example:
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">gcc -c -O2 -flto foo.c
gcc -c -O2 -flto bar.c
gcc -o myprog -flto -O2 foo.o bar.o
</pre></div>
|
||
|
|
||
|
<p>The first two invocations to GCC save a bytecode representation
|
||
|
of GIMPLE into special ELF sections inside <samp>foo.o</samp> and
|
||
|
<samp>bar.o</samp>. The final invocation reads the GIMPLE bytecode from
|
||
|
<samp>foo.o</samp> and <samp>bar.o</samp>, merges the two files into a single
|
||
|
internal image, and compiles the result as usual. Since both
|
||
|
<samp>foo.o</samp> and <samp>bar.o</samp> are merged into a single image, this
|
||
|
causes all the interprocedural analyses and optimizations in GCC to
|
||
|
work across the two files as if they were a single one. This means,
|
||
|
for example, that the inliner is able to inline functions in
|
||
|
<samp>bar.o</samp> into functions in <samp>foo.o</samp> and vice-versa.
|
||
|
</p>
|
||
|
<p>Another (simpler) way to enable link-time optimization is:
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">gcc -o myprog -flto -O2 foo.c bar.c
</pre></div>
|
||
|
|
||
|
<p>The above generates bytecode for <samp>foo.c</samp> and <samp>bar.c</samp>,
|
||
|
merges them together into a single GIMPLE representation and optimizes
|
||
|
them as usual to produce <samp>myprog</samp>.
|
||
|
</p>
|
||
|
<p>The only important thing to keep in mind is that to enable link-time
|
||
|
optimizations you need to use the GCC driver to perform the link step.
|
||
|
GCC then automatically performs link-time optimization if any of the
|
||
|
objects involved were compiled with the <samp>-flto</samp> command-line option.
|
||
|
You generally
|
||
|
should specify the optimization options to be used for link-time
|
||
|
optimization though GCC tries to be clever at guessing an
|
||
|
optimization level to use from the options used at compile time
|
||
|
if you fail to specify one at link time. You can always override
|
||
|
the automatic decision to do link-time optimization
|
||
|
by passing <samp>-fno-lto</samp> to the link command.
|
||
|
</p>
|
||
|
<p>To make whole program optimization effective, it is necessary to make
|
||
|
certain whole program assumptions. The compiler needs to know
|
||
|
what functions and variables can be accessed by libraries and runtime
|
||
|
outside of the link-time optimized unit. When supported by the linker,
|
||
|
the linker plugin (see <samp>-fuse-linker-plugin</samp>) passes information
|
||
|
to the compiler about used and externally visible symbols. When
|
||
|
the linker plugin is not available, <samp>-fwhole-program</samp> should be
|
||
|
used to allow the compiler to make these assumptions, which leads
|
||
|
to more aggressive optimization decisions.
|
||
|
</p>
|
||
|
<p>When <samp>-fuse-linker-plugin</samp> is not enabled, an object file
compiled with <samp>-flto</samp> is larger than
a regular object file because it contains GIMPLE bytecodes and the usual
final code (see <samp>-ffat-lto-objects</samp>). This means that
object files with LTO information can be linked as normal object
files; if <samp>-fno-lto</samp> is passed to the linker, no
interprocedural optimizations are applied. Note that when
<samp>-fno-fat-lto-objects</samp> is enabled the compile stage is faster
but you cannot perform a regular, non-LTO link on the resulting object files.
|
||
|
</p>
|
||
|
<p>Additionally, the optimization flags used to compile individual files
|
||
|
are not necessarily related to those used at link time. For instance,
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">gcc -c -O0 -ffat-lto-objects -flto foo.c
gcc -c -O0 -ffat-lto-objects -flto bar.c
gcc -o myprog -O3 foo.o bar.o
</pre></div>
|
||
|
|
||
|
<p>This produces individual object files with unoptimized assembler
|
||
|
code, but the resulting binary <samp>myprog</samp> is optimized at
|
||
|
<samp>-O3</samp>. If, instead, the final binary is generated with
|
||
|
<samp>-fno-lto</samp>, then <samp>myprog</samp> is not optimized.
|
||
|
</p>
|
||
|
<p>When producing the final binary, GCC only
|
||
|
applies link-time optimizations to those files that contain bytecode.
|
||
|
Therefore, you can mix and match object files and libraries with
|
||
|
GIMPLE bytecodes and final object code. GCC automatically selects
|
||
|
which files to optimize in LTO mode and which files to link without
|
||
|
further processing.
|
||
|
</p>
|
||
|
<p>There are some code generation flags preserved by GCC when
|
||
|
generating bytecodes, as they need to be used during the final link
|
||
|
stage. Generally options specified at link time override those
|
||
|
specified at compile time.
|
||
|
</p>
|
||
|
<p>If you do not specify an optimization level option <samp>-O</samp> at
|
||
|
link time, then GCC uses the highest optimization level
|
||
|
used when compiling the object files.
|
||
|
</p>
|
||
|
<p>Currently, the following options and their settings are taken from
|
||
|
the first object file that explicitly specifies them:
|
||
|
<samp>-fPIC</samp>, <samp>-fpic</samp>, <samp>-fpie</samp>, <samp>-fcommon</samp>,
|
||
|
<samp>-fexceptions</samp>, <samp>-fnon-call-exceptions</samp>, <samp>-fgnu-tm</samp>
|
||
|
and all the <samp>-m</samp> target flags.
|
||
|
</p>
|
||
|
<p>Certain ABI-changing flags are required to match in all compilation units,
|
||
|
and trying to override this at link time with a conflicting value
|
||
|
is ignored. This includes options such as <samp>-freg-struct-return</samp>
|
||
|
and <samp>-fpcc-struct-return</samp>.
|
||
|
</p>
|
||
|
<p>Other options such as <samp>-ffp-contract</samp>, <samp>-fno-strict-overflow</samp>,
|
||
|
<samp>-fwrapv</samp>, <samp>-fno-trapv</samp> or <samp>-fno-strict-aliasing</samp>
|
||
|
are passed through to the link stage and merged conservatively for
|
||
|
conflicting translation units. Specifically
|
||
|
<samp>-fno-strict-overflow</samp>, <samp>-fwrapv</samp> and <samp>-fno-trapv</samp> take
|
||
|
precedence; and for example <samp>-ffp-contract=off</samp> takes precedence
|
||
|
over <samp>-ffp-contract=fast</samp>. You can override them at link time.
|
||
|
</p>
|
||
|
<p>If LTO encounters objects with C linkage declared with incompatible
|
||
|
types in separate translation units to be linked together (undefined
|
||
|
behavior according to ISO C99 6.2.7), a non-fatal diagnostic may be
|
||
|
issued. The behavior is still undefined at run time. Similar
|
||
|
diagnostics may be raised for other languages.
|
||
|
</p>
|
||
|
<p>Another feature of LTO is that it is possible to apply interprocedural
|
||
|
optimizations on files written in different languages:
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">gcc -c -flto foo.c
g++ -c -flto bar.cc
gfortran -c -flto baz.f90
g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran
</pre></div>
|
||
|
|
||
|
<p>Notice that the final link is done with <code>g++</code> to get the C++
|
||
|
runtime libraries and <samp>-lgfortran</samp> is added to get the Fortran
|
||
|
runtime libraries. In general, when mixing languages in LTO mode, you
|
||
|
should use the same link command options as when mixing languages in a
|
||
|
regular (non-LTO) compilation.
|
||
|
</p>
|
||
|
<p>If object files containing GIMPLE bytecode are stored in a library archive, say
|
||
|
<samp>libfoo.a</samp>, it is possible to extract and use them in an LTO link if you
|
||
|
are using a linker with plugin support. To create static libraries suitable
|
||
|
for LTO, use <code>gcc-ar</code> and <code>gcc-ranlib</code> instead of <code>ar</code>
|
||
|
and <code>ranlib</code>;
|
||
|
to show the symbols of object files with GIMPLE bytecode, use
|
||
|
<code>gcc-nm</code>. Those commands require that <code>ar</code>, <code>ranlib</code>
|
||
|
and <code>nm</code> have been compiled with plugin support. At link time, use the
|
||
|
flag <samp>-fuse-linker-plugin</samp> to ensure that the library participates in
|
||
|
the LTO optimization process:
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo
</pre></div>
|
||
|
|
||
|
<p>With the linker plugin enabled, the linker extracts the needed
|
||
|
GIMPLE files from <samp>libfoo.a</samp> and passes them on to the running GCC
|
||
|
to make them part of the aggregated GIMPLE image to be optimized.
|
||
|
</p>
|
||
|
<p>If you are not using a linker with plugin support and/or do not
|
||
|
enable the linker plugin, then the objects inside <samp>libfoo.a</samp>
|
||
|
are extracted and linked as usual, but they do not participate
|
||
|
in the LTO optimization process. In order to make a static library suitable
|
||
|
for both LTO optimization and usual linkage, compile its object files with
|
||
|
<samp>-flto</samp> <samp>-ffat-lto-objects</samp>.
|
||
|
</p>
|
||
|
<p>Link-time optimizations do not require the presence of the whole program to
|
||
|
operate. If the program does not require any symbols to be exported, it is
|
||
|
possible to combine <samp>-flto</samp> and <samp>-fwhole-program</samp> to allow
|
||
|
the interprocedural optimizers to use more aggressive assumptions which may
|
||
|
lead to improved optimization opportunities.
|
||
|
Use of <samp>-fwhole-program</samp> is not needed when the linker plugin is
|
||
|
active (see <samp>-fuse-linker-plugin</samp>).
|
||
|
</p>
|
||
|
<p>The current implementation of LTO makes no
|
||
|
attempt to generate bytecode that is portable between different
|
||
|
types of hosts. The bytecode files are versioned and there is a
|
||
|
strict version check, so bytecode files generated in one version of
|
||
|
GCC do not work with an older or newer version of GCC.
|
||
|
</p>
|
||
|
<p>Link-time optimization does not work well with generation of debugging
|
||
|
information on systems other than those using a combination of ELF and
|
||
|
DWARF.
|
||
|
</p>
|
||
|
<p>If you specify the optional <var>n</var>, the optimization and code
|
||
|
generation done at link time is executed in parallel using <var>n</var>
|
||
|
parallel jobs by utilizing an installed <code>make</code> program. The
|
||
|
environment variable <code>MAKE</code> may be used to override the program
|
||
|
used. The default value for <var>n</var> is 1.
|
||
|
</p>
|
||
|
<p>You can also specify <samp>-flto=jobserver</samp> to use GNU make’s
|
||
|
job server mode to determine the number of parallel jobs. This
|
||
|
is useful when the Makefile calling GCC is already executing in parallel.
|
||
|
You must prepend a ‘<samp>+</samp>’ to the command recipe in the parent Makefile
|
||
|
for this to work. This option likely only works if <code>MAKE</code> is
|
||
|
GNU make.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flto-partition=<var>alg</var></code></dt>
|
||
|
<dd><a name="index-flto_002dpartition"></a>
|
||
|
<p>Specify the partitioning algorithm used by the link-time optimizer.
The value is either ‘<samp>1to1</samp>’ to specify a partitioning mirroring
the original source files, ‘<samp>balanced</samp>’ to specify partitioning
into equally sized chunks (whenever possible), or ‘<samp>max</samp>’ to create
a new partition for every symbol where possible. Specifying ‘<samp>none</samp>’
as an algorithm disables partitioning and streaming completely.
The default value is ‘<samp>balanced</samp>’. While ‘<samp>1to1</samp>’ can be used
as a workaround for various code ordering issues, the ‘<samp>max</samp>’
partitioning is intended for internal testing only.
The value ‘<samp>one</samp>’ specifies that exactly one partition should be
used while the value ‘<samp>none</samp>’ bypasses partitioning and executes
the link-time optimization step directly from the WPA phase.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flto-odr-type-merging</code></dt>
|
||
|
<dd><a name="index-flto_002dodr_002dtype_002dmerging"></a>
|
||
|
<p>Enable streaming of mangled type names of C++ types and their unification
at link time. This increases the size of LTO object files, but enables
|
||
|
diagnostics about One Definition Rule violations.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-flto-compression-level=<var>n</var></code></dt>
|
||
|
<dd><a name="index-flto_002dcompression_002dlevel"></a>
|
||
|
<p>This option specifies the level of compression used for intermediate
|
||
|
language written to LTO object files, and is only meaningful in
|
||
|
conjunction with LTO mode (<samp>-flto</samp>). Valid
|
||
|
values are 0 (no compression) to 9 (maximum compression). Values
|
||
|
outside this range are clamped to either 0 or 9. If the option is not
|
||
|
given, a default balanced compression setting is used.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fuse-linker-plugin</code></dt>
|
||
|
<dd><a name="index-fuse_002dlinker_002dplugin"></a>
|
||
|
<p>Enables the use of a linker plugin during link-time optimization. This
|
||
|
option relies on plugin support in the linker, which is available in gold
|
||
|
or in GNU ld 2.21 or newer.
|
||
|
</p>
|
||
|
<p>This option enables the extraction of object files with GIMPLE bytecode out
|
||
|
of library archives. This improves the quality of optimization by exposing
|
||
|
more code to the link-time optimizer. The plugin also tells the compiler which
symbols can be accessed externally (by non-LTO objects or during dynamic
linking). Resulting code quality improvements on binaries (and shared
|
||
|
libraries that use hidden visibility) are similar to <samp>-fwhole-program</samp>.
|
||
|
See <samp>-flto</samp> for a description of the effect of this flag and how to
|
||
|
use it.
|
||
|
</p>
|
||
|
<p>This option is enabled by default when LTO support in GCC is enabled
|
||
|
and GCC was configured for use with
|
||
|
a linker supporting plugins (GNU ld 2.21 or newer or gold).
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ffat-lto-objects</code></dt>
|
||
|
<dd><a name="index-ffat_002dlto_002dobjects"></a>
|
||
|
<p>Fat LTO objects are object files that contain both the intermediate language
|
||
|
and the object code. This makes them usable for both LTO linking and normal
|
||
|
linking. This option is effective only when compiling with <samp>-flto</samp>
|
||
|
and is ignored at link time.
|
||
|
</p>
|
||
|
<p><samp>-fno-fat-lto-objects</samp> improves compilation time over plain LTO, but
|
||
|
requires the complete toolchain to be aware of LTO. It requires a linker with
|
||
|
linker plugin support for basic functionality. Additionally,
|
||
|
<code>nm</code>, <code>ar</code> and <code>ranlib</code>
|
||
|
need to support linker plugins to allow a full-featured build environment
|
||
|
(capable of building static libraries etc). GCC provides the <code>gcc-ar</code>,
|
||
|
<code>gcc-nm</code>, <code>gcc-ranlib</code> wrappers to pass the right options
|
||
|
to these tools. With non-fat LTO, makefiles need to be modified to use them.
|
||
|
</p>
|
||
|
<p>Note that modern binutils provide a plugin auto-load mechanism.
|
||
|
Installing the linker plugin into <samp>$libdir/bfd-plugins</samp> has the same
|
||
|
effect as usage of the command wrappers (<code>gcc-ar</code>, <code>gcc-nm</code> and
|
||
|
<code>gcc-ranlib</code>).
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-fat-lto-objects</samp> on targets with linker plugin
|
||
|
support.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcompare-elim</code></dt>
|
||
|
<dd><a name="index-fcompare_002delim"></a>
|
||
|
<p>After register allocation and post-register allocation instruction splitting,
|
||
|
identify arithmetic instructions that compute processor flags similar to a
|
||
|
comparison operation based on that arithmetic. If possible, eliminate the
|
||
|
explicit comparison operation.
|
||
|
</p>
|
||
|
<p>This pass only applies to certain targets that cannot explicitly represent
|
||
|
the comparison operation before register allocation is complete.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fcprop-registers</code></dt>
|
||
|
<dd><a name="index-fcprop_002dregisters"></a>
|
||
|
<p>After register allocation and post-register allocation instruction splitting,
|
||
|
perform a copy-propagation pass to try to reduce scheduling dependencies
|
||
|
and occasionally eliminate the copy.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O</samp>, <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fprofile-correction</code></dt>
|
||
|
<dd><a name="index-fprofile_002dcorrection"></a>
|
||
|
<p>Profiles collected using an instrumented binary for multi-threaded programs may
|
||
|
be inconsistent due to missed counter updates. When this option is specified,
|
||
|
GCC uses heuristics to correct or smooth out such inconsistencies. By
|
||
|
default, GCC emits an error message when an inconsistent profile is detected.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fprofile-use</code></dt>
|
||
|
<dt><code>-fprofile-use=<var>path</var></code></dt>
|
||
|
<dd><a name="index-fprofile_002duse"></a>
|
||
|
<p>Enable profile feedback-directed optimizations,
|
||
|
and the following optimizations
|
||
|
which are generally profitable only with profile feedback available:
|
||
|
<samp>-fbranch-probabilities</samp>, <samp>-fvpt</samp>,
|
||
|
<samp>-funroll-loops</samp>, <samp>-fpeel-loops</samp>, <samp>-ftracer</samp>,
|
||
|
<samp>-ftree-vectorize</samp>, and <samp>-ftree-loop-distribute-patterns</samp>.
|
||
|
</p>
|
||
|
<p>Before you can use this option, you must first generate profiling information.
|
||
|
See <a href="Instrumentation-Options.html#Instrumentation-Options">Instrumentation Options</a>, for information about the
|
||
|
<samp>-fprofile-generate</samp> option.
|
||
|
</p>
|
||
|
<p>By default, GCC emits an error message if the feedback profiles do not
|
||
|
match the source code. This error can be turned into a warning by using
|
||
|
<samp>-Wcoverage-mismatch</samp>. Note this may result in poorly optimized
|
||
|
code.
|
||
|
</p>
|
||
|
<p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find
|
||
|
the profile feedback data files. See <samp>-fprofile-dir</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fauto-profile</code></dt>
|
||
|
<dt><code>-fauto-profile=<var>path</var></code></dt>
|
||
|
<dd><a name="index-fauto_002dprofile"></a>
|
||
|
<p>Enable sampling-based feedback-directed optimizations,
|
||
|
and the following optimizations
|
||
|
which are generally profitable only with profile feedback available:
|
||
|
<samp>-fbranch-probabilities</samp>, <samp>-fvpt</samp>,
|
||
|
<samp>-funroll-loops</samp>, <samp>-fpeel-loops</samp>, <samp>-ftracer</samp>,
|
||
|
<samp>-ftree-vectorize</samp>,
|
||
|
<samp>-finline-functions</samp>, <samp>-fipa-cp</samp>, <samp>-fipa-cp-clone</samp>,
|
||
|
<samp>-fpredictive-commoning</samp>, <samp>-funswitch-loops</samp>,
|
||
|
<samp>-fgcse-after-reload</samp>, and <samp>-ftree-loop-distribute-patterns</samp>.
|
||
|
</p>
|
||
|
<p><var>path</var> is the name of a file containing AutoFDO profile information.
|
||
|
If omitted, it defaults to <samp>fbdata.afdo</samp> in the current directory.
|
||
|
</p>
|
||
|
<p>Producing an AutoFDO profile data file requires running your program
|
||
|
with the <code>perf</code> utility on a supported GNU/Linux target system.
|
||
|
For more information, see <a href="https://perf.wiki.kernel.org/">https://perf.wiki.kernel.org/</a>.
|
||
|
</p>
|
||
|
<p>E.g.
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">perf record -e br_inst_retired:near_taken -b -o perf.data \
    -- your_program
</pre></div>
|
||
|
|
||
|
<p>Then use the <code>create_gcov</code> tool to convert the raw profile data
|
||
|
to a format that can be used by GCC. You must also supply the
|
||
|
unstripped binary for your program to this tool.
|
||
|
See <a href="https://github.com/google/autofdo">https://github.com/google/autofdo</a>.
|
||
|
</p>
|
||
|
<p>E.g.
|
||
|
</p><div class="smallexample">
|
||
|
<pre class="smallexample">create_gcov --binary=your_program.unstripped --profile=perf.data \
    --gcov=profile.afdo
</pre></div>
|
||
|
</dd>
|
||
|
</dl>
|
||
|
|
||
|
<p>The following options control compiler behavior regarding floating-point
|
||
|
arithmetic. These options trade off between speed and
|
||
|
correctness. All must be specifically enabled.
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt><code>-ffloat-store</code></dt>
|
||
|
<dd><a name="index-ffloat_002dstore"></a>
|
||
|
<p>Do not store floating-point variables in registers, and inhibit other
|
||
|
options that might change whether a floating-point value is taken from a
|
||
|
register or memory.
|
||
|
</p>
|
||
|
<a name="index-floating_002dpoint-precision"></a>
|
||
|
<p>This option prevents undesirable excess precision on machines such as
|
||
|
the 68000 where the floating registers (of the 68881) keep more
|
||
|
precision than a <code>double</code> is supposed to have. Similarly for the
|
||
|
x86 architecture. For most programs, the excess precision does only
|
||
|
good, but a few programs rely on the precise definition of IEEE floating
|
||
|
point. Use <samp>-ffloat-store</samp> for such programs, after modifying
|
||
|
them to store all pertinent intermediate computations into variables.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fexcess-precision=<var>style</var></code></dt>
|
||
|
<dd><a name="index-fexcess_002dprecision"></a>
|
||
|
<p>This option allows further control over excess precision on machines
|
||
|
where floating-point operations occur in a format with more precision or
|
||
|
range than the IEEE standard and interchange floating-point types. By
|
||
|
default, <samp>-fexcess-precision=fast</samp> is in effect; this means that
|
||
|
operations may be carried out in a wider precision than the types specified
|
||
|
in the source if that would result in faster code, and it is unpredictable
|
||
|
when rounding to the types specified in the source code takes place.
|
||
|
When compiling C, if <samp>-fexcess-precision=standard</samp> is specified then
|
||
|
excess precision follows the rules specified in ISO C99; in particular,
|
||
|
both casts and assignments cause values to be rounded to their
|
||
|
semantic types (whereas <samp>-ffloat-store</samp> only affects
|
||
|
assignments). This option is enabled by default for C if a strict
|
||
|
conformance option such as <samp>-std=c99</samp> is used.
|
||
|
<samp>-ffast-math</samp> enables <samp>-fexcess-precision=fast</samp> by default
|
||
|
regardless of whether a strict conformance option is used.
|
||
|
</p>
|
||
|
<a name="index-mfpmath"></a>
|
||
|
<p><samp>-fexcess-precision=standard</samp> is not implemented for languages
|
||
|
other than C. On the x86, it has no effect if <samp>-mfpmath=sse</samp>
|
||
|
or <samp>-mfpmath=sse+387</samp> is specified; in the former case, IEEE
|
||
|
semantics apply without excess precision, and in the latter, rounding
|
||
|
is unpredictable.
|
||
|
</p>
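<p>Whether excess precision is in play for a given build can be checked from C
through the standard <code>FLT_EVAL_METHOD</code> macro in
<code>&lt;float.h&gt;</code>. The sketch below is illustrative only; the
function name <code>has_excess_precision</code> is hypothetical, and the value
it reports depends on the target and on the options in effect.
</p>

```c
#include <float.h>

/* FLT_EVAL_METHOD (ISO C99, <float.h>) describes how floating-point
   operations are evaluated:
      0  in the range and precision of the operand type
      1  float and double are evaluated as double
      2  all operations are evaluated as long double
     -1  indeterminable
   has_excess_precision is a hypothetical helper: it returns 1 when
   operations may be carried out in a wider format than the source
   types specify, and 0 otherwise. */
static int has_excess_precision(void)
{
    return FLT_EVAL_METHOD != 0;
}
```

<p>A program that depends on exact rounding to the source types could use such
a check to decide whether <samp>-fexcess-precision=standard</samp> or
<samp>-ffloat-store</samp> is worth considering for its build.
</p>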
|
||
|
</dd>
|
||
|
<dt><code>-ffast-math</code></dt>
|
||
|
<dd><a name="index-ffast_002dmath"></a>
|
||
|
<p>Sets the options <samp>-fno-math-errno</samp>, <samp>-funsafe-math-optimizations</samp>,
|
||
|
<samp>-ffinite-math-only</samp>, <samp>-fno-rounding-math</samp>,
|
||
|
<samp>-fno-signaling-nans</samp>, <samp>-fcx-limited-range</samp> and
|
||
|
<samp>-fexcess-precision=fast</samp>.
|
||
|
</p>
|
||
|
<p>This option causes the preprocessor macro <code>__FAST_MATH__</code> to be defined.
|
||
|
</p>
|
||
|
<p>This option is not turned on by any <samp>-O</samp> option besides
|
||
|
<samp>-Ofast</samp> since it can result in incorrect output for programs
|
||
|
that depend on an exact implementation of IEEE or ISO rules/specifications
|
||
|
for math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
</p>
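<p>A minimal sketch of how source code might detect this mode through the
<code>__FAST_MATH__</code> macro, for example to refuse to build code that
depends on NaN handling. The function name is hypothetical.
</p>

```c
/* Returns 1 if this translation unit was compiled with __FAST_MATH__
   defined, and 0 otherwise. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```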
|
||
|
</dd>
|
||
|
<dt><code>-fno-math-errno</code></dt>
|
||
|
<dd><a name="index-fno_002dmath_002derrno"></a>
|
||
|
<p>Do not set <code>errno</code> after calling math functions that are executed
|
||
|
with a single instruction, e.g., <code>sqrt</code>. A program that relies on
|
||
|
IEEE exceptions for math error handling may want to use this flag
|
||
|
for speed while maintaining IEEE arithmetic compatibility.
|
||
|
</p>
|
||
|
<p>This option is not turned on by any <samp>-O</samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fmath-errno</samp>.
|
||
|
</p>
|
||
|
<p>On Darwin systems, the math library never sets <code>errno</code>. There is
|
||
|
therefore no reason for the compiler to consider the possibility that
|
||
|
it might, and <samp>-fno-math-errno</samp> is the default.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-funsafe-math-optimizations</code></dt>
|
||
|
<dd><a name="index-funsafe_002dmath_002doptimizations"></a>
|
||
|
|
||
|
<p>Allow optimizations for floating-point arithmetic that (a) assume
|
||
|
that arguments and results are valid and (b) may violate IEEE or
|
||
|
ANSI standards. When used at link time, it may include libraries
|
||
|
or startup files that change the default FPU control word or other
|
||
|
similar optimizations.
|
||
|
</p>
|
||
|
<p>This option is not turned on by any <samp>-O</samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
Enables <samp>-fno-signed-zeros</samp>, <samp>-fno-trapping-math</samp>,
|
||
|
<samp>-fassociative-math</samp> and <samp>-freciprocal-math</samp>.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-unsafe-math-optimizations</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fassociative-math</code></dt>
|
||
|
<dd><a name="index-fassociative_002dmath"></a>
|
||
|
|
||
|
<p>Allow re-association of operands in series of floating-point operations.
|
||
|
This violates the ISO C and C++ language standards by possibly changing
the computation result. NOTE: re-ordering may change the sign of zero as
well as ignore NaNs and inhibit or create underflow or overflow (and
thus cannot be used on code that relies on rounding behavior like
<code>(x + 2**52) - 2**52</code>). It may also reorder floating-point comparisons
|
||
|
and thus may not be used when ordered comparisons are required.
|
||
|
This option requires that both <samp>-fno-signed-zeros</samp> and
|
||
|
<samp>-fno-trapping-math</samp> be in effect. Moreover, it doesn’t make
|
||
|
much sense with <samp>-frounding-math</samp>. For Fortran the option
|
||
|
is automatically enabled when both <samp>-fno-signed-zeros</samp> and
|
||
|
<samp>-fno-trapping-math</samp> are in effect.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-associative-math</samp>.
|
||
|
</p>
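<p>The rounding idiom mentioned above can be sketched in C as follows. The
example assumes IEEE double arithmetic in round-to-nearest mode with no
excess precision, and an argument between 0 and 2**52; the function name is
hypothetical. If the compiler were allowed to re-associate the expression,
it could reduce it to <code>x</code> and the rounding would be lost.
</p>

```c
/* Rounds x to the nearest integer (ties to even) by relying on the
   rounding of the intermediate sum.  This only works if the expression
   is evaluated exactly as written. */
double round_via_magic_constant(double x)
{
    const double two_pow_52 = 4503599627370496.0; /* 2**52 */
    return (x + two_pow_52) - two_pow_52;
}
```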
|
||
|
</dd>
|
||
|
<dt><code>-freciprocal-math</code></dt>
|
||
|
<dd><a name="index-freciprocal_002dmath"></a>
|
||
|
|
||
|
<p>Allow the reciprocal of a value to be used instead of dividing by
|
||
|
the value if this enables optimizations. For example <code>x / y</code>
|
||
|
can be replaced with <code>x * (1/y)</code>, which is useful if <code>(1/y)</code>
|
||
|
is subject to common subexpression elimination. Note that this loses
|
||
|
precision and increases the number of flops operating on the value.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-reciprocal-math</samp>.
|
||
|
</p>
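<p>The two forms can be written out by hand as a sketch of what the
transformation amounts to. The function names are hypothetical. For
operands that are not exactly representable the two results may differ in
the last bit, because the second form rounds twice.
</p>

```c
/* Division as written in the source: one correctly rounded operation. */
double quotient(double x, double y)
{
    return x / y;
}

/* The rewritten form: the reciprocal is computed once and can be reused
   for every other division by the same y. */
double product_with_reciprocal(double x, double y)
{
    double reciprocal = 1.0 / y;
    return x * reciprocal;
}
```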
|
||
|
</dd>
|
||
|
<dt><code>-ffinite-math-only</code></dt>
|
||
|
<dd><a name="index-ffinite_002dmath_002donly"></a>
|
||
|
<p>Allow optimizations for floating-point arithmetic that assume
|
||
|
that arguments and results are not NaNs or +-Infs.
|
||
|
</p>
|
||
|
<p>This option is not turned on by any <samp>-O</samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-finite-math-only</samp>.
|
||
|
</p>
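<p>A common casualty of this assumption is the self-comparison test for
NaN. The sketch below shows the idiom; the function names are hypothetical.
Under the default rules the test works, whereas with
<samp>-ffinite-math-only</samp> the compiler may assume the comparison is
always false.
</p>

```c
/* Produces a NaN at run time; the volatile zero keeps the division from
   being evaluated at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Classic self-comparison test: only a NaN compares unequal to itself. */
int is_nan(double x)
{
    return x != x;
}
```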
|
||
|
</dd>
|
||
|
<dt><code>-fno-signed-zeros</code></dt>
|
||
|
<dd><a name="index-fno_002dsigned_002dzeros"></a>
|
||
|
<p>Allow optimizations for floating-point arithmetic that ignore the
|
||
|
signedness of zero. IEEE arithmetic specifies the behavior of
|
||
|
distinct +0.0 and -0.0 values, which then prohibits simplification
|
||
|
of expressions such as x+0.0 or 0.0*x (even with <samp>-ffinite-math-only</samp>).
|
||
|
This option implies that the sign of a zero result isn’t significant.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fsigned-zeros</samp>.
|
||
|
</p>
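<p>The following sketch shows why <code>x + 0.0</code> is not an identity
under IEEE rules: in round-to-nearest mode, adding +0.0 to -0.0 yields +0.0.
The function names are hypothetical.
</p>

```c
/* Distinguishes -0.0 from +0.0: dividing by a negative zero gives -Inf. */
int is_negative_zero(double x)
{
    return x == 0.0 && 1.0 / x < 0.0;
}

/* With signed zeros honored, this cannot be simplified to "return x",
   because the result for x == -0.0 is +0.0. */
double add_positive_zero(double x)
{
    return x + 0.0;
}
```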
|
||
|
</dd>
|
||
|
<dt><code>-fno-trapping-math</code></dt>
|
||
|
<dd><a name="index-fno_002dtrapping_002dmath"></a>
|
||
|
<p>Compile code assuming that floating-point operations cannot generate
|
||
|
user-visible traps. These traps include division by zero, overflow,
|
||
|
underflow, inexact result and invalid operation. This option requires
|
||
|
that <samp>-fno-signaling-nans</samp> be in effect. Setting this option may
|
||
|
allow faster code if one relies on “non-stop” IEEE arithmetic, for example.
|
||
|
</p>
|
||
|
<p>This option should never be turned on by any <samp>-O</samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions.
|
||
|
</p>
|
||
|
<p>The default is <samp>-ftrapping-math</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-frounding-math</code></dt>
|
||
|
<dd><a name="index-frounding_002dmath"></a>
|
||
|
<p>Disable transformations and optimizations that assume default floating-point
|
||
|
rounding behavior. This is round-to-zero for all floating-point
|
||
|
to integer conversions, and round-to-nearest for all other arithmetic
|
||
|
truncations. This option should be specified for programs that change
|
||
|
the FP rounding mode dynamically, or that may be executed with a
|
||
|
non-default rounding mode. This option disables constant folding of
|
||
|
floating-point expressions at compile time (which may be affected by
|
||
|
rounding mode) and arithmetic transformations that are unsafe in the
|
||
|
presence of sign-dependent rounding modes.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-rounding-math</samp>.
|
||
|
</p>
|
||
|
<p>This option is experimental and does not currently guarantee to
|
||
|
disable all GCC optimizations that are affected by rounding mode.
|
||
|
Future versions of GCC may provide finer control of this setting
|
||
|
using C99’s <code>FENV_ACCESS</code> pragma. This command-line option
|
||
|
will be used to specify the default state for <code>FENV_ACCESS</code>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsignaling-nans</code></dt>
|
||
|
<dd><a name="index-fsignaling_002dnans"></a>
|
||
|
<p>Compile code assuming that IEEE signaling NaNs may generate user-visible
|
||
|
traps during floating-point operations. Setting this option disables
|
||
|
optimizations that may change the number of exceptions visible with
|
||
|
signaling NaNs. This option implies <samp>-ftrapping-math</samp>.
|
||
|
</p>
|
||
|
<p>This option causes the preprocessor macro <code>__SUPPORT_SNAN__</code> to
|
||
|
be defined.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-signaling-nans</samp>.
|
||
|
</p>
|
||
|
<p>This option is experimental and does not currently guarantee to
|
||
|
disable all GCC optimizations that affect signaling NaN behavior.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fno-fp-int-builtin-inexact</code></dt>
|
||
|
<dd><a name="index-fno_002dfp_002dint_002dbuiltin_002dinexact"></a>
|
||
|
<p>Do not allow the built-in functions <code>ceil</code>, <code>floor</code>,
|
||
|
<code>round</code> and <code>trunc</code>, and their <code>float</code> and <code>long
|
||
|
double</code> variants, to generate code that raises the “inexact”
|
||
|
floating-point exception for noninteger arguments. ISO C99 and C11
|
||
|
allow these functions to raise the “inexact” exception, but ISO/IEC
|
||
|
TS 18661-1:2014, the C bindings to IEEE 754-2008, does not allow these
|
||
|
functions to do so.
|
||
|
</p>
|
||
|
<p>The default is <samp>-ffp-int-builtin-inexact</samp>, allowing the
|
||
|
exception to be raised. This option does nothing unless
|
||
|
<samp>-ftrapping-math</samp> is in effect.
|
||
|
</p>
|
||
|
<p>Even if <samp>-fno-fp-int-builtin-inexact</samp> is used, if the functions
|
||
|
generate a call to a library function then the “inexact” exception
|
||
|
may be raised if the library implementation does not follow TS 18661.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsingle-precision-constant</code></dt>
|
||
|
<dd><a name="index-fsingle_002dprecision_002dconstant"></a>
|
||
|
<p>Treat floating-point constants as single precision instead of
|
||
|
implicitly converting them to double-precision constants.
|
||
|
</p>
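<p>A small sketch of code whose result depends on the type of an unsuffixed
constant. It assumes a target where <code>float</code> and <code>double</code>
are the IEEE single and double formats; the function name is hypothetical.
</p>

```c
/* 0.1f is the float nearest to 0.1.  The unsuffixed 0.1 is normally a
   double, so the float operand is widened before the comparison and the
   two values differ.  If unsuffixed constants are treated as single
   precision, both operands are expected to be the same float value. */
int float_equals_unsuffixed_constant(void)
{
    float f = 0.1f;
    return f == 0.1;
}
```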
|
||
|
</dd>
|
||
|
<dt><code>-fcx-limited-range</code></dt>
|
||
|
<dd><a name="index-fcx_002dlimited_002drange"></a>
|
||
|
<p>When enabled, this option states that a range reduction step is not
|
||
|
needed when performing complex division. Also, there is no checking
|
||
|
whether the result of a complex multiplication or division is <code>NaN
|
||
|
+ I*NaN</code>, with an attempt to rescue the situation in that case. The
|
||
|
default is <samp>-fno-cx-limited-range</samp>, but the option is enabled by
|
||
|
<samp>-ffast-math</samp>.
|
||
|
</p>
|
||
|
<p>This option controls the default setting of the ISO C99
|
||
|
<code>CX_LIMITED_RANGE</code> pragma. Nevertheless, the option applies to
|
||
|
all languages.
|
||
|
</p>
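<p>To make the missing range reduction concrete, the sketch below spells out
the textbook quotient formula on a plain pair of doubles. It is an
illustration under the assumption that skipping range reduction amounts to
this formula, not a copy of the code GCC emits; the type and function names
are hypothetical.
</p>

```c
/* A complex number as a plain pair, to keep the arithmetic explicit. */
struct cplx { double re, im; };

/* Textbook quotient (a + bi) / (c + di) with no range reduction.
   The denominator c*c + d*d can overflow or underflow even when the true
   quotient is representable. */
struct cplx divide_without_range_reduction(struct cplx n, struct cplx d)
{
    double denom = d.re * d.re + d.im * d.im;
    struct cplx q;
    q.re = (n.re * d.re + n.im * d.im) / denom;
    q.im = (n.im * d.re - n.re * d.im) / denom;
    return q;
}
```

<p>Dividing <code>1e200</code> by itself with this formula overflows both the
numerator and the denominator, so the real part of the result is a NaN
rather than 1.
</p>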
|
||
|
</dd>
|
||
|
<dt><code>-fcx-fortran-rules</code></dt>
|
||
|
<dd><a name="index-fcx_002dfortran_002drules"></a>
|
||
|
<p>Complex multiplication and division follow Fortran rules. Range
|
||
|
reduction is done as part of complex division, but there is no checking
|
||
|
whether the result of a complex multiplication or division is <code>NaN
|
||
|
+ I*NaN</code>, with an attempt to rescue the situation in that case.
|
||
|
</p>
|
||
|
<p>The default is <samp>-fno-cx-fortran-rules</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
</dl>
|
||
|
|
||
|
<p>The following options control optimizations that may improve
|
||
|
performance, but are not enabled by any <samp>-O</samp> options. This
|
||
|
section includes experimental options that may produce broken code.
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt><code>-fbranch-probabilities</code></dt>
|
||
|
<dd><a name="index-fbranch_002dprobabilities"></a>
|
||
|
<p>After running a program compiled with <samp>-fprofile-arcs</samp>
|
||
|
(see <a href="Instrumentation-Options.html#Instrumentation-Options">Instrumentation Options</a>),
|
||
|
you can compile it a second time using
|
||
|
<samp>-fbranch-probabilities</samp>, to improve optimizations based on
|
||
|
the number of times each branch was taken. When a program
|
||
|
compiled with <samp>-fprofile-arcs</samp> exits, it saves arc execution
|
||
|
counts to a file called <samp><var>sourcename</var>.gcda</samp> for each source
|
||
|
file. The information in this data file is very dependent on the
|
||
|
structure of the generated code, so you must use the same source code
|
||
|
and the same optimization options for both compilations.
|
||
|
</p>
|
||
|
<p>With <samp>-fbranch-probabilities</samp>, GCC puts a
|
||
|
‘<samp>REG_BR_PROB</samp>’ note on each ‘<samp>JUMP_INSN</samp>’ and ‘<samp>CALL_INSN</samp>’.
|
||
|
These can be used to improve optimization. Currently, they are only
|
||
|
used in one place: in <samp>reorg.c</samp>, instead of guessing which path a
|
||
|
branch is most likely to take, the ‘<samp>REG_BR_PROB</samp>’ values are used to
|
||
|
exactly determine which path is taken more often.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fprofile-values</code></dt>
|
||
|
<dd><a name="index-fprofile_002dvalues"></a>
|
||
|
<p>If combined with <samp>-fprofile-arcs</samp>, it adds code so that some
|
||
|
data about values of expressions in the program is gathered.
|
||
|
</p>
|
||
|
<p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered
|
||
|
from profiling values of expressions for usage in optimizations.
|
||
|
</p>
|
||
|
<p>Enabled with <samp>-fprofile-generate</samp> and <samp>-fprofile-use</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fprofile-reorder-functions</code></dt>
|
||
|
<dd><a name="index-fprofile_002dreorder_002dfunctions"></a>
|
||
|
<p>Reorder functions based on profile instrumentation. The profile records
the time at which each function is first executed, and the functions are
placed in ascending order of that time.
|
||
|
</p>
|
||
|
<p>Enabled with <samp>-fprofile-use</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fvpt</code></dt>
|
||
|
<dd><a name="index-fvpt"></a>
|
||
|
<p>If combined with <samp>-fprofile-arcs</samp>, this option instructs the compiler
|
||
|
to add code to gather information about values of expressions.
|
||
|
</p>
|
||
|
<p>With <samp>-fbranch-probabilities</samp>, it reads back the data gathered
|
||
|
and actually performs the optimizations based on them.
|
||
|
Currently the optimizations include specialization of division operations
|
||
|
using the knowledge about the value of the denominator.
|
||
|
</p>
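<p>The kind of division specialization described above can be pictured with
a hand-written equivalent. The sketch assumes a value profile showing that
the denominator is almost always 16; that value and the function names are
hypothetical.
</p>

```c
/* Generic division: the denominator is only known at run time. */
unsigned divide_generic(unsigned n, unsigned d)
{
    return n / d;
}

/* Specialized form for a denominator that the profile says is usually 16. */
unsigned divide_specialized(unsigned n, unsigned d)
{
    if (d == 16)
        return n >> 4;  /* cheap shift on the common path */
    return n / d;       /* fallback keeps the original semantics */
}
```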
|
||
|
</dd>
|
||
|
<dt><code>-frename-registers</code></dt>
|
||
|
<dd><a name="index-frename_002dregisters"></a>
|
||
|
<p>Attempt to avoid false dependencies in scheduled code by making use
|
||
|
of registers left over after register allocation. This optimization
|
||
|
most benefits processors with lots of registers. Depending on the
|
||
|
debug information format adopted by the target, however, it can
|
||
|
make debugging impossible, since variables no longer stay in
|
||
|
a “home register”.
|
||
|
</p>
|
||
|
<p>Enabled by default with <samp>-funroll-loops</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fschedule-fusion</code></dt>
|
||
|
<dd><a name="index-fschedule_002dfusion"></a>
|
||
|
<p>Performs a target-dependent pass over the instruction stream to schedule
instructions of the same type together, because the target machine can execute them
more efficiently if they are adjacent to each other in the instruction flow.
|
||
|
</p>
|
||
|
<p>Enabled at levels <samp>-O2</samp>, <samp>-O3</samp>, <samp>-Os</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-ftracer</code></dt>
|
||
|
<dd><a name="index-ftracer"></a>
|
||
|
<p>Perform tail duplication to enlarge superblock size. This transformation
|
||
|
simplifies the control flow of the function allowing other optimizations to do
|
||
|
a better job.
|
||
|
</p>
|
||
|
<p>Enabled with <samp>-fprofile-use</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-funroll-loops</code></dt>
|
||
|
<dd><a name="index-funroll_002dloops"></a>
|
||
|
<p>Unroll loops whose number of iterations can be determined at compile time or
|
||
|
upon entry to the loop. <samp>-funroll-loops</samp> implies
|
||
|
<samp>-frerun-cse-after-loop</samp>, <samp>-fweb</samp> and <samp>-frename-registers</samp>.
|
||
|
It also turns on complete loop peeling (i.e. complete removal of loops with
|
||
|
a small constant number of iterations). This option makes code larger, and may
|
||
|
or may not make it run faster.
|
||
|
</p>
|
||
|
<p>Enabled with <samp>-fprofile-use</samp>.
|
||
|
</p>
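<p>As a sketch of what unrolling does, the second function below is a
hand-unrolled version of the first, with an unroll factor of four chosen
only for illustration. The function names are hypothetical, and the factor
GCC picks depends on its own heuristics.
</p>

```c
/* Original loop: one addition and one branch per element. */
long sum(const int *v, int n)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Unrolled by four: fewer branches, more code.  A remainder loop
   handles the last n % 4 elements. */
long sum_unrolled(const int *v, int n)
{
    long total = 0;
    int i = 0;
    for (; i + 3 < n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```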
|
||
|
</dd>
|
||
|
<dt><code>-funroll-all-loops</code></dt>
|
||
|
<dd><a name="index-funroll_002dall_002dloops"></a>
|
||
|
<p>Unroll all loops, even if their number of iterations is uncertain when
|
||
|
the loop is entered. This usually makes programs run more slowly.
|
||
|
<samp>-funroll-all-loops</samp> implies the same options as
|
||
|
<samp>-funroll-loops</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fpeel-loops</code></dt>
|
||
|
<dd><a name="index-fpeel_002dloops"></a>
|
||
|
<p>Peels loops for which there is enough information that they do not
|
||
|
roll much (from profile feedback or static analysis). It also turns on
|
||
|
complete loop peeling (i.e. complete removal of loops with small constant
|
||
|
number of iterations).
|
||
|
</p>
|
||
|
<p>Enabled with <samp>-O3</samp> and/or <samp>-fprofile-use</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fmove-loop-invariants</code></dt>
|
||
|
<dd><a name="index-fmove_002dloop_002dinvariants"></a>
|
||
|
<p>Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
|
||
|
at level <samp>-O1</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsplit-loops</code></dt>
|
||
|
<dd><a name="index-fsplit_002dloops"></a>
|
||
|
<p>Split a loop into two if it contains a condition that’s always true
|
||
|
for one side of the iteration space and false for the other.
|
||
|
</p>
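<p>A hand-written sketch of the transformation: the condition
<code>i &lt; mid</code> holds for the first part of the iteration space and
fails for the rest, so the loop can be split at that point. The function
names are hypothetical.
</p>

```c
/* Original loop: the test on i is evaluated in every iteration. */
void fill(int *out, int n, int mid)
{
    for (int i = 0; i < n; i++) {
        if (i < mid)
            out[i] = 1;
        else
            out[i] = 2;
    }
}

/* Split form: two loops, neither of which contains the test. */
void fill_split(int *out, int n, int mid)
{
    /* Clamp the split point to the iteration space [0, n]. */
    int split = mid < 0 ? 0 : (mid > n ? n : mid);
    int i = 0;
    for (; i < split; i++)
        out[i] = 1;
    for (; i < n; i++)
        out[i] = 2;
}
```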
|
||
|
</dd>
|
||
|
<dt><code>-funswitch-loops</code></dt>
|
||
|
<dd><a name="index-funswitch_002dloops"></a>
|
||
|
<p>Move branches with loop invariant conditions out of the loop, with duplicates
|
||
|
of the loop on both branches (modified according to result of the condition).
|
||
|
</p>
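<p>The effect can be sketched by hand: the test on <code>negate</code> does
not change inside the loop, so it is hoisted and the loop is duplicated on
each branch. The function names are hypothetical.
</p>

```c
/* Original loop: the invariant condition is tested in every iteration. */
void transform(int *v, int n, int negate)
{
    for (int i = 0; i < n; i++) {
        if (negate)
            v[i] = -v[i];
        else
            v[i] = 2 * v[i];
    }
}

/* Unswitched form: the condition is tested once, and each branch
   carries its own copy of the loop. */
void transform_unswitched(int *v, int n, int negate)
{
    if (negate) {
        for (int i = 0; i < n; i++)
            v[i] = -v[i];
    } else {
        for (int i = 0; i < n; i++)
            v[i] = 2 * v[i];
    }
}
```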
|
||
|
</dd>
|
||
|
<dt><code>-ffunction-sections</code></dt>
|
||
|
<dt><code>-fdata-sections</code></dt>
|
||
|
<dd><a name="index-ffunction_002dsections"></a>
|
||
|
<a name="index-fdata_002dsections"></a>
|
||
|
<p>Place each function or data item into its own section in the output
|
||
|
file if the target supports arbitrary sections. The name of the
|
||
|
function or the name of the data item determines the section’s name
|
||
|
in the output file.
|
||
|
</p>
|
||
|
<p>Use these options on systems where the linker can perform optimizations to
|
||
|
improve locality of reference in the instruction space. Most systems using the
|
||
|
ELF object format have linkers with such optimizations. On AIX, the linker
|
||
|
rearranges sections (CSECTs) based on the call graph. The performance impact
|
||
|
varies.
|
||
|
</p>
|
||
|
<p>Together with linker garbage collection (linker <samp>--gc-sections</samp>
|
||
|
option) these options may lead to smaller statically-linked executables (after
|
||
|
stripping).
|
||
|
</p>
|
||
|
<p>On ELF/DWARF systems these options do not degrade the quality of the debug
|
||
|
information. There could be issues with other object files/debug info formats.
|
||
|
</p>
|
||
|
<p>Only use these options when there are significant benefits from doing so. When
|
||
|
you specify these options, the assembler and linker create larger object and
|
||
|
executable files and are also slower. These options affect code generation.
|
||
|
They prevent optimizations by the compiler and assembler using relative
|
||
|
locations inside a translation unit since the locations are unknown until
|
||
|
link time. An example of such an optimization is relaxing calls to short call
|
||
|
instructions.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fbranch-target-load-optimize</code></dt>
|
||
|
<dd><a name="index-fbranch_002dtarget_002dload_002doptimize"></a>
|
||
|
<p>Perform branch target register load optimization before prologue / epilogue
|
||
|
threading.
|
||
|
The use of target registers can typically be exposed only during reload,
|
||
|
thus hoisting loads out of loops and doing inter-block scheduling needs
|
||
|
a separate optimization pass.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fbranch-target-load-optimize2</code></dt>
|
||
|
<dd><a name="index-fbranch_002dtarget_002dload_002doptimize2"></a>
|
||
|
<p>Perform branch target register load optimization after prologue / epilogue
|
||
|
threading.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fbtr-bb-exclusive</code></dt>
|
||
|
<dd><a name="index-fbtr_002dbb_002dexclusive"></a>
|
||
|
<p>When performing branch target register load optimization, don’t reuse
|
||
|
branch target registers within any basic block.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fstdarg-opt</code></dt>
|
||
|
<dd><a name="index-fstdarg_002dopt"></a>
|
||
|
<p>Optimize the prologue of variadic argument functions with respect to usage of
|
||
|
those arguments.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>-fsection-anchors</code></dt>
|
||
|
<dd><a name="index-fsection_002danchors"></a>
|
||
|
<p>Try to reduce the number of symbolic address calculations by using
|
||
|
shared “anchor” symbols to address nearby objects. This transformation
|
||
|
can help to reduce the number of GOT entries and GOT accesses on some
|
||
|
targets.
|
||
|
</p>
|
||
|
<p>For example, the implementation of the following function <code>foo</code>:
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">static int a, b, c;
|
||
|
int foo (void) { return a + b + c; }
|
||
|
</pre></div>
|
||
|
|
||
|
<p>usually calculates the addresses of all three variables, but if you
|
||
|
compile it with <samp>-fsection-anchors</samp>, it accesses the variables
|
||
|
from a common anchor point instead. The effect is similar to the
|
||
|
following pseudocode (which isn’t valid C):
|
||
|
</p>
|
||
|
<div class="smallexample">
|
||
|
<pre class="smallexample">int foo (void)
|
||
|
{
|
||
|
register int *xr = &x;
|
||
|
return xr[&a - &x] + xr[&b - &x] + xr[&c - &x];
|
||
|
}
|
||
|
</pre></div>
|
||
|
|
||
|
<p>Not all targets support this option.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>--param <var>name</var>=<var>value</var></code></dt>
|
||
|
<dd><a name="index-param"></a>
|
||
|
<p>In some places, GCC uses various constants to control the amount of
|
||
|
optimization that is done. For example, GCC does not inline functions
|
||
|
that contain more than a certain number of instructions. You can
|
||
|
control some of these constants on the command line using the
|
||
|
<samp>--param</samp> option.
|
||
|
</p>
|
||
|
<p>The names of specific parameters, and the meaning of the values, are
|
||
|
tied to the internals of the compiler, and are subject to change
|
||
|
without notice in future releases.
|
||
|
</p>
|
||
|
<p>In each case, the <var>value</var> is an integer. The allowable choices for
|
||
|
<var>name</var> are:
|
||
|
</p>
|
||
|
<dl compact="compact">
|
||
|
<dt><code>predictable-branch-outcome</code></dt>
|
||
|
<dd><p>When a branch is predicted to be taken with probability lower than this threshold
|
||
|
(in percent), then it is considered well predictable. The default is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-rtl-if-conversion-insns</code></dt>
|
||
|
<dd><p>RTL if-conversion tries to remove conditional branches around a block and
|
||
|
replace them with conditionally executed instructions. This parameter
|
||
|
gives the maximum number of instructions in a block which should be
|
||
|
considered for if-conversion. The default is 10, though the compiler will
|
||
|
also use other heuristics to decide whether if-conversion is likely to be
|
||
|
profitable.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-rtl-if-conversion-predictable-cost</code></dt>
|
||
|
<dt><code>max-rtl-if-conversion-unpredictable-cost</code></dt>
|
||
|
<dd><p>RTL if-conversion will try to remove conditional branches around a block
|
||
|
and replace them with conditionally executed instructions. These parameters
|
||
|
give the maximum permissible cost for the sequence that would be generated
|
||
|
by if-conversion depending on whether the branch is statically determined
|
||
|
to be predictable or not. The units for this parameter are the same as
|
||
|
those for the GCC internal seq_cost metric. The compiler will try to
|
||
|
provide a reasonable default for this parameter using the BRANCH_COST
|
||
|
target macro.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-crossjump-edges</code></dt>
|
||
|
<dd><p>The maximum number of incoming edges to consider for cross-jumping.
|
||
|
The algorithm used by <samp>-fcrossjumping</samp> is <em>O(N^2)</em> in
|
||
|
the number of edges incoming to each block. Increasing values mean
|
||
|
more aggressive optimization, making the compilation time increase with
|
||
|
probably small improvement in executable size.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-crossjump-insns</code></dt>
|
||
|
<dd><p>The minimum number of instructions that must be matched at the end
|
||
|
of two blocks before cross-jumping is performed on them. This
|
||
|
value is ignored in the case where all instructions in the block being
|
||
|
cross-jumped from are matched. The default value is 5.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-grow-copy-bb-insns</code></dt>
|
||
|
<dd><p>The maximum code size expansion factor when copying basic blocks
|
||
|
instead of jumping. The expansion is relative to a jump instruction.
|
||
|
The default value is 8.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-goto-duplication-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions to duplicate to a block that jumps
|
||
|
to a computed goto. To avoid <em>O(N^2)</em> behavior in a number of
|
||
|
passes, GCC factors computed gotos early in the compilation process,
|
||
|
and unfactors them as late as possible. Only computed jumps at the
end of basic blocks with no more than <code>max-goto-duplication-insns</code>
instructions are unfactored. The default value is 8.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-delay-slot-insn-search</code></dt>
|
||
|
<dd><p>The maximum number of instructions to consider when looking for an
|
||
|
instruction to fill a delay slot. If more than this arbitrary number of
|
||
|
instructions are searched, the time savings from filling the delay slot
|
||
|
are minimal, so stop searching. Increasing values mean more
|
||
|
aggressive optimization, making the compilation time increase with probably
|
||
|
small improvement in execution time.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-delay-slot-live-search</code></dt>
|
||
|
<dd><p>When trying to fill delay slots, the maximum number of instructions to
|
||
|
consider when searching for a block with valid live register
|
||
|
information. Increasing this arbitrarily chosen value means more
|
||
|
aggressive optimization, increasing the compilation time. This parameter
|
||
|
should be removed when the delay slot code is rewritten to maintain the
|
||
|
control-flow graph.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-gcse-memory</code></dt>
|
||
|
<dd><p>The approximate maximum amount of memory that can be allocated in
|
||
|
order to perform the global common subexpression elimination
|
||
|
optimization. If more memory than specified is required, the
|
||
|
optimization is not done.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-gcse-insertion-ratio</code></dt>
|
||
|
<dd><p>If the ratio of expression insertions to deletions is larger than this value
for any expression, then RTL PRE does not insert or remove the expression and thus
leaves partially redundant computations in the instruction stream. The default value is 20.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-pending-list-length</code></dt>
|
||
|
<dd><p>The maximum number of pending dependencies scheduling allows
|
||
|
before flushing the current state and starting over. Large functions
|
||
|
with few branches or calls can create excessively large lists which
|
||
|
needlessly consume memory and resources.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-modulo-backtrack-attempts</code></dt>
|
||
|
<dd><p>The maximum number of backtrack attempts the scheduler should make
|
||
|
when modulo scheduling a loop. Larger values can exponentially increase
|
||
|
compilation time.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-inline-insns-single</code></dt>
|
||
|
<dd><p>Several parameters control the tree inliner used in GCC.
|
||
|
This number sets the maximum number of instructions (counted in GCC’s
|
||
|
internal representation) in a single function that the tree inliner
|
||
|
considers for inlining. This only affects functions declared
|
||
|
inline and methods implemented in a class declaration (C++).
|
||
|
The default value is 400.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-inline-insns-auto</code></dt>
|
||
|
<dd><p>When you use <samp>-finline-functions</samp> (included in <samp>-O3</samp>),
|
||
|
a lot of functions that would otherwise not be considered for inlining
|
||
|
by the compiler are investigated. To those functions, a different
|
||
|
(more restrictive) limit compared to functions declared inline can
|
||
|
be applied.
|
||
|
The default value is 30.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>inline-min-speedup</code></dt>
|
||
|
<dd><p>When the estimated performance improvement of caller + callee runtime exceeds this
|
||
|
threshold (in percent), the function can be inlined regardless of the limit on
|
||
|
<samp>--param max-inline-insns-single</samp> and <samp>--param
|
||
|
max-inline-insns-auto</samp>.
|
||
|
The default value is 15.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>large-function-insns</code></dt>
|
||
|
<dd><p>The limit specifying really large functions. For functions larger than this
|
||
|
limit after inlining, inlining is constrained by
|
||
|
<samp>--param large-function-growth</samp>. This parameter is useful primarily
|
||
|
to avoid extreme compilation time caused by non-linear algorithms used by the
|
||
|
back end.
|
||
|
The default value is 2700.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>large-function-growth</code></dt>
|
||
|
<dd><p>Specifies the maximal growth of a large function caused by inlining, in percent.
|
||
|
The default value is 100 which limits large function growth to 2.0 times
|
||
|
the original size.
|
||
|
</p>
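<p>The relationship between a growth percentage and the resulting size limit
can be checked with a little arithmetic. The helper below is hypothetical
and is not part of GCC; it only restates the rule that a growth of 100
percent doubles the original size.
</p>

```c
/* Hypothetical helper: the size limit implied by a growth parameter
   expressed in percent of the original size. */
long size_limit_after_growth(long original_size, long growth_percent)
{
    return original_size + original_size * growth_percent / 100;
}
```

<p>For instance, a function of 2700 instructions with a growth of 100 may
reach 5400 instructions, and a growth of 1000 corresponds to 11 times the
original size.
</p>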
|
||
|
</dd>
|
||
|
<dt><code>large-unit-insns</code></dt>
|
||
|
<dd><p>The limit specifying a large translation unit. Growth caused by inlining of
units larger than this limit is limited by <samp>--param inline-unit-growth</samp>.
For small units this might be too tight.
For example, consider a unit consisting of function A
that is inline and B that just calls A three times. If B is small relative to
A, the growth of the unit is 300% and yet such inlining is very sane. For very
large units consisting of small inlineable functions, however, the overall unit
growth limit is needed to avoid exponential explosion of code size. Thus for
smaller units, the size is increased to <samp>--param large-unit-insns</samp>
before applying <samp>--param inline-unit-growth</samp>. The default is 10000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>inline-unit-growth</code></dt>
|
||
|
<dd><p>Specifies maximal overall growth of the compilation unit caused by inlining.
|
||
|
The default value is 20 which limits unit growth to 1.2 times the original
|
||
|
size. Cold functions (either marked cold via an attribute or by profile
|
||
|
feedback) are not accounted into the unit size.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipcp-unit-growth</code></dt>
|
||
|
<dd><p>Specifies maximal overall growth of the compilation unit caused by
|
||
|
interprocedural constant propagation. The default value is 10 which limits
|
||
|
unit growth to 1.1 times the original size.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>large-stack-frame</code></dt>
|
||
|
<dd><p>The limit specifying large stack frames.  While inlining, the algorithm tries
not to grow past this limit too much.  The default value is 256 bytes.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>large-stack-frame-growth</code></dt>
|
||
|
<dd><p>Specifies the maximal growth of large stack frames caused by inlining, in percent.
|
||
|
The default value is 1000 which limits large stack frame growth to 11 times
|
||
|
the original size.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-inline-insns-recursive</code></dt>
|
||
|
<dt><code>max-inline-insns-recursive-auto</code></dt>
|
||
|
<dd><p>Specifies the maximum number of instructions an out-of-line copy of a
|
||
|
self-recursive inline
|
||
|
function can grow into by performing recursive inlining.
|
||
|
</p>
|
||
|
<p><samp>--param max-inline-insns-recursive</samp> applies to functions
|
||
|
declared inline.
|
||
|
For functions not declared inline, recursive inlining
|
||
|
happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is
|
||
|
enabled; <samp>--param max-inline-insns-recursive-auto</samp> applies instead. The
|
||
|
default value is 450.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-inline-recursive-depth</code></dt>
|
||
|
<dt><code>max-inline-recursive-depth-auto</code></dt>
|
||
|
<dd><p>Specifies the maximum recursion depth used for recursive inlining.
|
||
|
</p>
|
||
|
<p><samp>--param max-inline-recursive-depth</samp> applies to functions
|
||
|
declared inline. For functions not declared inline, recursive inlining
|
||
|
happens only when <samp>-finline-functions</samp> (included in <samp>-O3</samp>) is
|
||
|
enabled; <samp>--param max-inline-recursive-depth-auto</samp> applies instead. The
|
||
|
default value is 8.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-inline-recursive-probability</code></dt>
|
||
|
<dd><p>Recursive inlining is profitable only for functions having deep recursion
on average and can hurt functions having little recursion depth by
increasing the prologue size or the complexity of the function body for other
optimizers.
|
||
|
</p>
|
||
|
<p>When profile feedback is available (see <samp>-fprofile-generate</samp>) the actual
|
||
|
recursion depth can be guessed from the probability that the function recurses
via a given call expression.  This parameter limits inlining only to call
expressions whose probability exceeds the given threshold (in percent).
|
||
|
The default value is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>early-inlining-insns</code></dt>
|
||
|
<dd><p>Specify growth that the early inliner can make. In effect it increases
|
||
|
the amount of inlining for code having a large abstraction penalty.
|
||
|
The default value is 14.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-early-inliner-iterations</code></dt>
|
||
|
<dd><p>Limit of iterations of the early inliner. This basically bounds
|
||
|
the number of nested indirect calls the early inliner can resolve.
|
||
|
Deeper chains are still handled by late inlining.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>comdat-sharing-probability</code></dt>
|
||
|
<dd><p>Probability (in percent) that C++ inline functions with comdat visibility
are shared across multiple compilation units.  The default value is 20.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>profile-func-internal-id</code></dt>
|
||
|
<dd><p>A parameter to control whether to use function internal id in profile
|
||
|
database lookup. If the value is 0, the compiler uses an id that
|
||
|
is based on function assembler name and filename, which makes old profile
|
||
|
data more tolerant to source changes such as function reordering etc.
|
||
|
The default value is 0.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-vect-loop-bound</code></dt>
|
||
|
<dd><p>The minimum number of iterations under which loops are not vectorized
|
||
|
when <samp>-ftree-vectorize</samp> is used. The number of iterations after
|
||
|
vectorization needs to be greater than the value specified by this option
|
||
|
to allow vectorization. The default value is 0.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>gcse-cost-distance-ratio</code></dt>
|
||
|
<dd><p>Scaling factor in calculation of maximum distance an expression
|
||
|
can be moved by GCSE optimizations. This is currently supported only in the
|
||
|
code hoisting pass. The bigger the ratio, the more aggressive code hoisting
|
||
|
is with simple expressions, i.e., the expressions that have cost
|
||
|
less than <samp>gcse-unrestricted-cost</samp>. Specifying 0 disables
|
||
|
hoisting of simple expressions. The default value is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>gcse-unrestricted-cost</code></dt>
|
||
|
<dd><p>Cost, roughly measured as the cost of a single typical machine
|
||
|
instruction, at which GCSE optimizations do not constrain
|
||
|
the distance an expression can travel. This is currently
|
||
|
supported only in the code hoisting pass. The lesser the cost,
|
||
|
the more aggressive code hoisting is. Specifying 0
|
||
|
allows all expressions to travel unrestricted distances.
|
||
|
The default value is 3.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-hoist-depth</code></dt>
|
||
|
<dd><p>The depth of search in the dominator tree for expressions to hoist.
|
||
|
This is used to avoid quadratic behavior in the hoisting algorithm.
A value of 0 does not limit the search, but may slow down compilation
|
||
|
of huge functions. The default value is 30.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-tail-merge-comparisons</code></dt>
|
||
|
<dd><p>The maximum number of similar basic blocks to compare a basic block with.  This is used to
|
||
|
avoid quadratic behavior in tree tail merging. The default value is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-tail-merge-iterations</code></dt>
|
||
|
<dd><p>The maximum number of iterations of the pass over the function.  This is used to
|
||
|
limit compilation time in tree tail merging. The default value is 2.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>store-merging-allow-unaligned</code></dt>
|
||
|
<dd><p>Allow the store merging pass to introduce unaligned stores if it is legal to
|
||
|
do so. The default value is 1.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-stores-to-merge</code></dt>
|
||
|
<dd><p>The maximum number of stores to attempt to merge into wider stores in the store
|
||
|
merging pass. The minimum value is 2 and the default is 64.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-unrolled-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions that a loop may have to be unrolled.
|
||
|
If a loop is unrolled, this parameter also determines how many times
|
||
|
the loop code is unrolled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-average-unrolled-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions biased by probabilities of their execution
|
||
|
that a loop may have to be unrolled. If a loop is unrolled,
|
||
|
this parameter also determines how many times the loop code is unrolled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-unroll-times</code></dt>
|
||
|
<dd><p>The maximum number of unrollings of a single loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-peeled-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions that a loop may have to be peeled.
|
||
|
If a loop is peeled, this parameter also determines how many times
|
||
|
the loop code is peeled.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-peel-times</code></dt>
|
||
|
<dd><p>The maximum number of peelings of a single loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-peel-branches</code></dt>
|
||
|
<dd><p>The maximum number of branches on the hot path through the peeled sequence.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-completely-peeled-insns</code></dt>
|
||
|
<dd><p>The maximum number of insns of a completely peeled loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-completely-peel-times</code></dt>
|
||
|
<dd><p>The maximum number of iterations of a loop to be suitable for complete peeling.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-completely-peel-loop-nest-depth</code></dt>
|
||
|
<dd><p>The maximum depth of a loop nest suitable for complete peeling.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-unswitch-insns</code></dt>
|
||
|
<dd><p>The maximum number of insns of an unswitched loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-unswitch-level</code></dt>
|
||
|
<dd><p>The maximum number of branches unswitched in a single loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-loop-headers-insns</code></dt>
|
||
|
<dd><p>The maximum number of insns in loop header duplicated by the copy loop headers
|
||
|
pass.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>lim-expensive</code></dt>
|
||
|
<dd><p>The minimum cost of an expensive expression in the loop invariant motion.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>iv-consider-all-candidates-bound</code></dt>
|
||
|
<dd><p>Bound on number of candidates for induction variables, below which
|
||
|
all candidates are considered for each use in induction variable
|
||
|
optimizations. If there are more candidates than this,
|
||
|
only the most relevant ones are considered to avoid quadratic time complexity.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>iv-max-considered-uses</code></dt>
|
||
|
<dd><p>The induction variable optimizations give up on loops that contain more
|
||
|
induction variable uses.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>iv-always-prune-cand-set-bound</code></dt>
|
||
|
<dd><p>If the number of candidates in the set is smaller than this value,
|
||
|
always try to remove unnecessary ivs from the set
|
||
|
when adding a new one.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>avg-loop-niter</code></dt>
|
||
|
<dd><p>Average number of iterations of a loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>dse-max-object-size</code></dt>
|
||
|
<dd><p>Maximum size (in bytes) of objects tracked bytewise by dead store elimination.
|
||
|
Larger values may result in larger compilation times.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>scev-max-expr-size</code></dt>
|
||
|
<dd><p>Bound on size of expressions used in the scalar evolutions analyzer.
|
||
|
Large expressions slow the analyzer.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>scev-max-expr-complexity</code></dt>
|
||
|
<dd><p>Bound on the complexity of the expressions in the scalar evolutions analyzer.
|
||
|
Complex expressions slow the analyzer.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-tree-if-conversion-phi-args</code></dt>
|
||
|
<dd><p>Maximum number of arguments in a PHI supported by TREE if conversion
|
||
|
unless the loop is marked with simd pragma.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>vect-max-version-for-alignment-checks</code></dt>
|
||
|
<dd><p>The maximum number of run-time checks that can be performed when
|
||
|
doing loop versioning for alignment in the vectorizer.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>vect-max-version-for-alias-checks</code></dt>
|
||
|
<dd><p>The maximum number of run-time checks that can be performed when
|
||
|
doing loop versioning for alias in the vectorizer.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>vect-max-peeling-for-alignment</code></dt>
|
||
|
<dd><p>The maximum number of loop peels to enhance access alignment
|
||
|
for vectorizer. Value -1 means no limit.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-iterations-to-track</code></dt>
|
||
|
<dd><p>The maximum number of iterations of a loop the brute-force algorithm
|
||
|
for analysis of the number of iterations of the loop tries to evaluate.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>hot-bb-count-ws-permille</code></dt>
|
||
|
<dd><p>A basic block profile count is considered hot if it contributes to
|
||
|
the given permillage (i.e. 0...1000) of the entire profiled execution.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>hot-bb-frequency-fraction</code></dt>
|
||
|
<dd><p>Select the fraction of the entry block's execution frequency that a given
basic block in the function needs to have to be considered hot.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-predicted-iterations</code></dt>
|
||
|
<dd><p>The maximum number of loop iterations we predict statically. This is useful
|
||
|
in cases where a function contains a single loop with known bound and
|
||
|
another loop with unknown bound.
|
||
|
The known number of iterations is predicted correctly, while
|
||
|
the unknown number of iterations average to roughly 10. This means that the
|
||
|
loop without bounds appears artificially cold relative to the other one.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>builtin-expect-probability</code></dt>
|
||
|
<dd><p>Control the probability of the expression having the specified value. This
|
||
|
parameter takes a percentage (i.e. 0 ... 100) as input.
|
||
|
The default probability of 90 is obtained empirically.
|
||
|
</p>
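<p>The parameter name suggests that it governs hints given through GCC's
<code>__builtin_expect</code> built-in function.  The following sketch shows
such a hint; the function <code>sum_or_minus_one</code> is a hypothetical
example and does not come from GCC itself.
</p>

```c
#include <stddef.h>

/* Hypothetical example: sum an array, hinting to the compiler that
   the null-pointer check is expected to fail.  */
long sum_or_minus_one(const int *values, size_t count)
{
    long total = 0;

    /* The second argument (0) is the value the condition is expected
       to have, so this branch is treated as unlikely.  */
    if (__builtin_expect(values == NULL, 0))
        return -1;

    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

<p>The hint changes only the compiler's branch probability estimate; the
function returns the same results with or without it.
</p>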
|
||
|
</dd>
|
||
|
<dt><code>align-threshold</code></dt>
|
||
|
<dd>
|
||
|
<p>Select fraction of the maximal frequency of executions of a basic block in
|
||
|
a function to align the basic block.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>align-loop-iterations</code></dt>
|
||
|
<dd>
|
||
|
<p>A loop expected to iterate at least the selected number of iterations is
|
||
|
aligned.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>tracer-dynamic-coverage</code></dt>
|
||
|
<dt><code>tracer-dynamic-coverage-feedback</code></dt>
|
||
|
<dd>
|
||
|
<p>This value is used to limit superblock formation once the given percentage of
|
||
|
executed instructions is covered. This limits unnecessary code size
|
||
|
expansion.
|
||
|
</p>
|
||
|
<p>The <samp>tracer-dynamic-coverage-feedback</samp> parameter
|
||
|
is used only when profile
|
||
|
feedback is available. The real profiles (as opposed to statically estimated
|
||
|
ones) are much less balanced, allowing the threshold to be a larger value.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>tracer-max-code-growth</code></dt>
|
||
|
<dd><p>Stop tail duplication once code growth has reached given percentage. This is
|
||
|
a rather artificial limit, as most of the duplicates are eliminated later in
|
||
|
cross jumping, so it may be set to much higher values than is the desired code
|
||
|
growth.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>tracer-min-branch-ratio</code></dt>
|
||
|
<dd>
|
||
|
<p>Stop reverse growth when the reverse probability of best edge is less than this
|
||
|
threshold (in percent).
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>tracer-min-branch-probability</code></dt>
|
||
|
<dt><code>tracer-min-branch-probability-feedback</code></dt>
|
||
|
<dd>
|
||
|
<p>Stop forward growth if the best edge has probability lower than this
|
||
|
threshold.
|
||
|
</p>
|
||
|
<p>Similarly to <samp>tracer-dynamic-coverage</samp> two parameters are
|
||
|
provided. <samp>tracer-min-branch-probability-feedback</samp> is used for
|
||
|
compilation with profile feedback and <samp>tracer-min-branch-probability</samp>
for compilation without.  The value for compilation with profile feedback
|
||
|
needs to be more conservative (higher) in order to make tracer
|
||
|
effective.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>stack-clash-protection-guard-size</code></dt>
|
||
|
<dd><p>Specify the size of the operating system provided stack guard as
|
||
|
2 raised to <var>num</var> bytes. The default value is 12 (4096 bytes).
|
||
|
Acceptable values are between 12 and 30. Higher values may reduce the
|
||
|
number of explicit probes, but a value larger than the operating system
|
||
|
provided guard will leave code vulnerable to stack clash style attacks.
|
||
|
</p>
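<p>Because the parameter is an exponent rather than a byte count, a small
conversion may help when choosing a value.  The helper name below is an
assumption made for illustration.
</p>

```c
/* Hypothetical helper: convert the parameter value (an exponent)
   into the guard size in bytes, that is, 2 raised to num.  */
unsigned long guard_size_bytes(unsigned int num)
{
    return 1UL << num;
}
```

<p>For the default value of 12 this gives 4096 bytes; a value of 16 gives
65536 bytes.
</p>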
|
||
|
</dd>
|
||
|
<dt><code>stack-clash-protection-probe-interval</code></dt>
|
||
|
<dd><p>Stack clash protection involves probing stack space as it is allocated. This
|
||
|
param controls the maximum distance between probes into the stack as 2 raised
|
||
|
to <var>num</var> bytes.  Acceptable values are between 10 and 16; the default is
12.  Higher values may reduce the number of explicit probes, but a value
|
||
|
larger than the operating system provided guard will leave code vulnerable to
|
||
|
stack clash style attacks.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-cse-path-length</code></dt>
|
||
|
<dd>
|
||
|
<p>The maximum number of basic blocks on path that CSE considers.
|
||
|
The default is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-cse-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions CSE processes before flushing.
|
||
|
The default is 1000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ggc-min-expand</code></dt>
|
||
|
<dd>
|
||
|
<p>GCC uses a garbage collector to manage its own memory allocation. This
|
||
|
parameter specifies the minimum percentage by which the garbage
|
||
|
collector’s heap should be allowed to expand between collections.
|
||
|
Tuning this may improve compilation speed; it has no effect on code
|
||
|
generation.
|
||
|
</p>
|
||
|
<p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when
|
||
|
RAM >= 1GB. If <code>getrlimit</code> is available, the notion of “RAM” is
|
||
|
the smallest of actual RAM and <code>RLIMIT_DATA</code> or <code>RLIMIT_AS</code>. If
|
||
|
GCC is not able to calculate RAM on a particular platform, the lower
|
||
|
bound of 30% is used. Setting this parameter and
|
||
|
<samp>ggc-min-heapsize</samp> to zero causes a full collection to occur at
|
||
|
every opportunity. This is extremely slow, but can be useful for
|
||
|
debugging.
|
||
|
</p>
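<p>The default described above can be sketched as follows.  This is a
simplified illustration, not GCC's code: the function name is hypothetical,
1GB is assumed to mean 1024 megabytes, and the <code>getrlimit</code>
adjustment is left to the caller.
</p>

```c
/* Hypothetical helper: default ggc-min-expand percentage for a given
   amount of RAM in megabytes.  Pass 0 when RAM cannot be determined.  */
int default_ggc_min_expand(long ram_mb)
{
    if (ram_mb <= 0)
        return 30;              /* lower bound when RAM is unknown */
    if (ram_mb >= 1024)
        return 100;             /* upper bound when RAM >= 1GB */

    /* 30% + 70% * (RAM / 1GB), using integer arithmetic.  */
    return 30 + (int) (70 * ram_mb / 1024);
}
```

<p>Under these assumptions, a machine with 512 megabytes of RAM gets a
default of 65.
</p>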
|
||
|
</dd>
|
||
|
<dt><code>ggc-min-heapsize</code></dt>
|
||
|
<dd>
|
||
|
<p>Minimum size of the garbage collector’s heap before it begins bothering
|
||
|
to collect garbage. The first collection occurs after the heap expands
|
||
|
by <samp>ggc-min-expand</samp>% beyond <samp>ggc-min-heapsize</samp>. Again,
|
||
|
tuning this may improve compilation speed, and has no effect on code
|
||
|
generation.
|
||
|
</p>
|
||
|
<p>The default is the smaller of RAM/8, RLIMIT_RSS, or a limit that
|
||
|
tries to ensure that RLIMIT_DATA or RLIMIT_AS are not exceeded, but
|
||
|
with a lower bound of 4096 (four megabytes) and an upper bound of
|
||
|
131072 (128 megabytes). If GCC is not able to calculate RAM on a
|
||
|
particular platform, the lower bound is used. Setting this parameter
|
||
|
very large effectively disables garbage collection. Setting this
|
||
|
parameter and <samp>ggc-min-expand</samp> to zero causes a full collection
|
||
|
to occur at every opportunity.
|
||
|
</p>
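<p>Ignoring the resource limits, the default reduces to RAM/8 clamped between
the two bounds.  The sketch below assumes sizes in kilobytes and uses a
hypothetical function name; it deliberately omits the <code>RLIMIT_RSS</code>,
<code>RLIMIT_DATA</code> and <code>RLIMIT_AS</code> adjustments.
</p>

```c
/* Hypothetical helper: default ggc-min-heapsize in kilobytes for a
   given amount of RAM in kilobytes, without the rlimit adjustments.
   Pass 0 when RAM cannot be determined.  */
long default_ggc_min_heapsize_kb(long ram_kb)
{
    const long lower = 4096;     /* four megabytes */
    const long upper = 131072;   /* 128 megabytes */
    long size;

    if (ram_kb <= 0)
        return lower;            /* RAM unknown: use the lower bound */

    size = ram_kb / 8;
    if (size < lower)
        size = lower;
    if (size > upper)
        size = upper;
    return size;
}
```

<p>Under these assumptions, 256 megabytes of RAM gives 32768, and any machine
with 1 gigabyte or more gives the upper bound of 131072.
</p>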
|
||
|
</dd>
|
||
|
<dt><code>max-reload-search-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions reload should look backward for an equivalent
|
||
|
register. Increasing values mean more aggressive optimization, making the
|
||
|
compilation time increase with probably slightly better performance.
|
||
|
The default value is 100.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-cselib-memory-locations</code></dt>
|
||
|
<dd><p>The maximum number of memory locations cselib should take into account.
|
||
|
Increasing values mean more aggressive optimization, making the compilation time
|
||
|
increase with probably slightly better performance. The default value is 500.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-sched-ready-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions ready to be issued the scheduler should
|
||
|
consider at any given time during the first scheduling pass. Increasing
|
||
|
values mean more thorough searches, making the compilation time increase
|
||
|
with probably little benefit. The default value is 100.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-sched-region-blocks</code></dt>
|
||
|
<dd><p>The maximum number of blocks in a region to be considered for
|
||
|
interblock scheduling. The default value is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-pipeline-region-blocks</code></dt>
|
||
|
<dd><p>The maximum number of blocks in a region to be considered for
|
||
|
pipelining in the selective scheduler. The default value is 15.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-sched-region-insns</code></dt>
|
||
|
<dd><p>The maximum number of insns in a region to be considered for
|
||
|
interblock scheduling. The default value is 100.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-pipeline-region-insns</code></dt>
|
||
|
<dd><p>The maximum number of insns in a region to be considered for
|
||
|
pipelining in the selective scheduler. The default value is 200.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-spec-prob</code></dt>
|
||
|
<dd><p>The minimum probability (in percent) of reaching a source block
|
||
|
for interblock speculative scheduling. The default value is 40.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-sched-extend-regions-iters</code></dt>
|
||
|
<dd><p>The maximum number of iterations through CFG to extend regions.
|
||
|
A value of 0 (the default) disables region extensions.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-sched-insn-conflict-delay</code></dt>
|
||
|
<dd><p>The maximum conflict delay for an insn to be considered for speculative motion.
|
||
|
The default value is 3.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sched-spec-prob-cutoff</code></dt>
|
||
|
<dd><p>The minimal probability of speculation success (in percent), so that
|
||
|
speculative insns are scheduled.
|
||
|
The default value is 40.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sched-state-edge-prob-cutoff</code></dt>
|
||
|
<dd><p>The minimum probability an edge must have for the scheduler to save its
|
||
|
state across it.
|
||
|
The default value is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sched-mem-true-dep-cost</code></dt>
|
||
|
<dd><p>Minimal distance (in CPU cycles) between a store and a load targeting the same
memory location.  The default value is 1.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>selsched-max-lookahead</code></dt>
|
||
|
<dd><p>The maximum size of the lookahead window of selective scheduling. It is a
|
||
|
depth of search for available instructions.
|
||
|
The default value is 50.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>selsched-max-sched-times</code></dt>
|
||
|
<dd><p>The maximum number of times that an instruction is scheduled during
|
||
|
selective scheduling. This is the limit on the number of iterations
|
||
|
through which the instruction may be pipelined. The default value is 2.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>selsched-insns-to-rename</code></dt>
|
||
|
<dd><p>The maximum number of best instructions in the ready list that are considered
|
||
|
for renaming in the selective scheduler. The default value is 2.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sms-min-sc</code></dt>
|
||
|
<dd><p>The minimum value of stage count that swing modulo scheduler
|
||
|
generates. The default value is 2.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-last-value-rtl</code></dt>
|
||
|
<dd><p>The maximum size measured as number of RTLs that can be recorded in an expression
|
||
|
in combiner for a pseudo register as last known value of that register. The default
|
||
|
is 10000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-combine-insns</code></dt>
|
||
|
<dd><p>The maximum number of instructions the RTL combiner tries to combine.
|
||
|
The default value is 2 at <samp>-Og</samp> and 4 otherwise.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>integer-share-limit</code></dt>
|
||
|
<dd><p>Small integer constants can use a shared data structure, reducing the
|
||
|
compiler’s memory usage and increasing its speed. This sets the maximum
|
||
|
value of a shared integer constant. The default value is 256.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ssp-buffer-size</code></dt>
|
||
|
<dd><p>The minimum size of buffers (i.e. arrays) that receive stack smashing
|
||
|
protection when <samp>-fstack-protector</samp> is used.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-size-for-stack-sharing</code></dt>
|
||
|
<dd><p>The minimum size of variables taking part in stack slot sharing when not
|
||
|
optimizing. The default value is 32.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-jump-thread-duplication-stmts</code></dt>
|
||
|
<dd><p>Maximum number of statements allowed in a block that needs to be
|
||
|
duplicated when threading jumps.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-fields-for-field-sensitive</code></dt>
|
||
|
<dd><p>Maximum number of fields in a structure treated in
|
||
|
a field sensitive manner during pointer analysis. The default is zero
|
||
|
for <samp>-O0</samp> and <samp>-O1</samp>,
|
||
|
and 100 for <samp>-Os</samp>, <samp>-O2</samp>, and <samp>-O3</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>prefetch-latency</code></dt>
|
||
|
<dd><p>Estimate on average number of instructions that are executed before
|
||
|
prefetch finishes. The distance prefetched ahead is proportional
|
||
|
to this constant.  Increasing this number may also lead to fewer
|
||
|
streams being prefetched (see <samp>simultaneous-prefetches</samp>).
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>simultaneous-prefetches</code></dt>
|
||
|
<dd><p>Maximum number of prefetches that can run at the same time.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>l1-cache-line-size</code></dt>
|
||
|
<dd><p>The size of cache line in L1 cache, in bytes.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>l1-cache-size</code></dt>
|
||
|
<dd><p>The size of L1 cache, in kilobytes.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>l2-cache-size</code></dt>
|
||
|
<dd><p>The size of L2 cache, in kilobytes.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-interchange-max-num-stmts</code></dt>
|
||
|
<dd><p>The maximum number of stmts in a loop to be interchanged.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-interchange-stride-ratio</code></dt>
|
||
|
<dd><p>The minimum ratio between stride of two loops for interchange to be profitable.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-insn-to-prefetch-ratio</code></dt>
|
||
|
<dd><p>The minimum ratio between the number of instructions and the
|
||
|
number of prefetches to enable prefetching in a loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>prefetch-min-insn-to-mem-ratio</code></dt>
|
||
|
<dd><p>The minimum ratio between the number of instructions and the
|
||
|
number of memory references to enable prefetching in a loop.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>use-canonical-types</code></dt>
|
||
|
<dd><p>Whether the compiler should use the “canonical” type system. By
|
||
|
default, this should always be 1, which uses a more efficient internal
|
||
|
mechanism for comparing types in C++ and Objective-C++. However, if
|
||
|
bugs in the canonical type system are causing compilation failures,
|
||
|
set this value to 0 to disable canonical types.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>switch-conversion-max-branch-ratio</code></dt>
|
||
|
<dd><p>Switch initialization conversion refuses to create arrays that are
|
||
|
bigger than <samp>switch-conversion-max-branch-ratio</samp> times the number of
|
||
|
branches in the switch.
|
||
|
</p>
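<p>The size test described above amounts to a single comparison.  The sketch
below is an illustration only; the function name is hypothetical and the
ratio of 8 used in the note that follows is an arbitrary example value.
</p>

```c
#include <stdbool.h>

/* Hypothetical helper: report whether an array of array_elems entries
   is small enough for a switch with the given number of branches.  */
bool switch_array_allowed(unsigned long array_elems, unsigned long branches,
                          unsigned long max_branch_ratio)
{
    /* Arrays bigger than ratio times the branch count are refused.  */
    return array_elems <= max_branch_ratio * branches;
}
```

<p>With a ratio of 8, a switch with 4 branches could be converted to an array
of up to 32 elements, while a 33-element array would be refused.
</p>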
|
||
|
</dd>
|
||
|
<dt><code>max-partial-antic-length</code></dt>
|
||
|
<dd><p>Maximum length of the partial antic set computed during the tree
|
||
|
partial redundancy elimination optimization (<samp>-ftree-pre</samp>) when
|
||
|
optimizing at <samp>-O3</samp> and above. For some sorts of source code
|
||
|
the enhanced partial redundancy elimination optimization can run away,
|
||
|
consuming all of the memory available on the host machine. This
|
||
|
parameter sets a limit on the length of the sets that are computed,
|
||
|
which prevents the runaway behavior. Setting a value of 0 for
|
||
|
this parameter allows an unlimited set length.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sccvn-max-scc-size</code></dt>
|
||
|
<dd><p>Maximum size of a strongly connected component (SCC) during SCCVN
|
||
|
processing. If this limit is hit, SCCVN processing for the whole
|
||
|
function is not done and optimizations depending on it are
|
||
|
disabled. The default maximum SCC size is 10000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sccvn-max-alias-queries-per-access</code></dt>
|
||
|
<dd><p>Maximum number of alias-oracle queries we perform when looking for
|
||
|
redundancies for loads and stores. If this limit is hit the search
|
||
|
is aborted and the load or store is not considered redundant. The
|
||
|
number of queries is algorithmically limited to the number of
|
||
|
stores on all paths from the load to the function entry.
|
||
|
The default maximum number of queries is 1000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ira-max-loops-num</code></dt>
|
||
|
<dd><p>IRA uses regional register allocation by default. If a function
|
||
|
contains more loops than the number given by this parameter, only at most
|
||
|
the given number of the most frequently-executed loops form regions
|
||
|
for regional register allocation. The default value of the
|
||
|
parameter is 100.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ira-max-conflict-table-size</code></dt>
|
||
|
<dd><p>Although IRA uses a sophisticated algorithm to compress the conflict
|
||
|
table, the table can still require excessive amounts of memory for
|
||
|
huge functions. If the conflict table for a function could be more
|
||
|
than the size in MB given by this parameter, the register allocator
|
||
|
instead uses a faster, simpler, and lower-quality
|
||
|
algorithm that does not require building a pseudo-register conflict table.
|
||
|
The default value of the parameter is 2000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ira-loop-reserved-regs</code></dt>
|
||
|
<dd><p>IRA can be used to evaluate more accurate register pressure in loops
|
||
|
for decisions to move loop invariants (see <samp>-O3</samp>). The number
|
||
|
of available registers reserved for some other purposes is given
|
||
|
by this parameter. The default value of the parameter is 2, which is
|
||
|
the minimal number of registers needed by typical instructions.
|
||
|
This value is the best found from numerous experiments.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>lra-inheritance-ebb-probability-cutoff</code></dt>
|
||
|
<dd><p>LRA tries to reuse values reloaded in registers in subsequent insns.
|
||
|
This optimization is called inheritance. EBB is used as a region to
|
||
|
do this optimization. The parameter defines a minimal fall-through
|
||
|
edge probability in percentage used to add BB to inheritance EBB in
|
||
|
LRA. The default value of the parameter is 40. The value was chosen
|
||
|
from numerous runs of SPEC2000 on x86-64.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-invariant-max-bbs-in-loop</code></dt>
|
||
|
<dd><p>Loop invariant motion can be very expensive, both in compilation time and
|
||
|
in amount of needed compile-time memory, with very large loops. Loops
|
||
|
with more basic blocks than this parameter won’t have loop invariant
|
||
|
motion optimization performed on them. The default value of the
|
||
|
parameter is 1000 for <samp>-O1</samp> and 10000 for <samp>-O2</samp> and above.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-max-datarefs-for-datadeps</code></dt>
|
||
|
<dd><p>Building data dependencies is expensive for very large loops. This
|
||
|
parameter limits the number of data references in loops that are
|
||
|
considered for data dependence analysis.  These large loops are not
|
||
|
handled by the optimizations using loop data dependencies.
|
||
|
The default value is 1000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-vartrack-size</code></dt>
|
||
|
<dd><p>Sets a maximum number of hash table slots to use during variable
|
||
|
tracking dataflow analysis of any function. If this limit is exceeded
|
||
|
with variable tracking at assignments enabled, analysis for that
|
||
|
function is retried without it, after removing all debug insns from
|
||
|
the function. If the limit is exceeded even without debug insns, var
|
||
|
tracking analysis is completely disabled for the function. Setting
|
||
|
the parameter to zero makes it unlimited.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-vartrack-expr-depth</code></dt>
|
||
|
<dd><p>Sets a maximum number of recursion levels when attempting to map
|
||
|
variable names or debug temporaries to value expressions. This trades
|
||
|
compilation time for more complete debug information. If this is set too
|
||
|
low, value expressions that are available and could be represented in
|
||
|
debug information may end up not being used; setting this higher may
|
||
|
enable the compiler to find more complex debug expressions, but compile
|
||
|
time and memory use may grow. The default is 12.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-debug-marker-count</code></dt>
|
||
|
<dd><p>Sets a threshold on the number of debug markers (e.g. begin stmt
|
||
|
markers) to avoid complexity explosion at inlining or expanding to RTL.
|
||
|
If a function has more such gimple stmts than the set limit, such stmts
|
||
|
will be dropped from the inlined copy of a function, and from its RTL
|
||
|
expansion. The default is 100000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>min-nondebug-insn-uid</code></dt>
|
||
|
<dd><p>Use uids starting at this parameter for nondebug insns. The range below
|
||
|
the parameter is reserved exclusively for debug insns created by
|
||
|
<samp>-fvar-tracking-assignments</samp>, but debug insns may get
|
||
|
(non-overlapping) uids above it if the reserved range is exhausted.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-sra-ptr-growth-factor</code></dt>
|
||
|
<dd><p>IPA-SRA replaces a pointer to an aggregate with one or more new
|
||
|
parameters only when their cumulative size is less than or equal to
|
||
|
<samp>ipa-sra-ptr-growth-factor</samp> times the size of the original
|
||
|
pointer parameter.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sra-max-scalarization-size-Ospeed</code></dt>
|
||
|
<dt><code>sra-max-scalarization-size-Osize</code></dt>
|
||
|
<dd><p>The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to
|
||
|
replace scalar parts of aggregates with uses of independent scalar
|
||
|
variables. These parameters control the maximum size, in storage units,
|
||
|
of aggregate which is considered for replacement when compiling for
|
||
|
speed
|
||
|
(<samp>sra-max-scalarization-size-Ospeed</samp>) or size
|
||
|
(<samp>sra-max-scalarization-size-Osize</samp>) respectively.
|
||
|
</p>
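<p>The following sketch is illustrative only; the structure and function
names are hypothetical and not taken from GCC.  It shows the kind of small
local aggregate that scalarization targets.  Whether the replacement
actually happens depends on the target, the optimization level and the
limits set by these parameters.
</p>
<div class="smallexample">
<pre class="smallexample">/* Hypothetical example: a small aggregate whose fields are only
   used as independent scalars.  */
struct pair { int x; int y; };

int
sum_fields (int x, int y)
{
  struct pair p;   /* candidate for scalarization */
  p.x = x;
  p.y = y;
  /* After SRA the compiler may treat p.x and p.y as two plain
     int variables and never keep the struct in memory.  */
  return p.x + p.y;
}

int
main (void)
{
  return sum_fields (3, 4) == 7 ? 0 : 1;
}
</pre></div>
<p>A larger limit when compiling for speed could be requested with, for
example, <samp>--param sra-max-scalarization-size-Ospeed=64</samp>; the
value 64 is arbitrary and chosen only for illustration.
</p>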
|
||
|
</dd>
|
||
|
<dt><code>tm-max-aggregate-size</code></dt>
|
||
|
<dd><p>When making copies of thread-local variables in a transaction, this
|
||
|
parameter specifies the size in bytes after which variables are
|
||
|
saved with the logging functions as opposed to save/restore code
|
||
|
sequence pairs. This option only applies when using
|
||
|
<samp>-fgnu-tm</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>graphite-max-nb-scop-params</code></dt>
|
||
|
<dd><p>To avoid exponential effects in the Graphite loop transforms, the
|
||
|
number of parameters in a Static Control Part (SCoP) is bounded. The
|
||
|
default value is 10 parameters; a value of zero can be used to lift
|
||
|
the bound. A variable whose value is unknown at compilation time and
|
||
|
defined outside a SCoP is a parameter of the SCoP.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-block-tile-size</code></dt>
|
||
|
<dd><p>Loop blocking or strip mining transforms, enabled with
|
||
|
<samp>-floop-block</samp> or <samp>-floop-strip-mine</samp>, strip mine each
|
||
|
loop in the loop nest by a given number of iterations. The strip
|
||
|
length can be changed using the <samp>loop-block-tile-size</samp>
|
||
|
parameter. The default value is 51 iterations.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-unroll-jam-size</code></dt>
|
||
|
<dd><p>Specify the unroll factor for the <samp>-floop-unroll-and-jam</samp> option. The
|
||
|
default value is 4.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>loop-unroll-jam-depth</code></dt>
|
||
|
<dd><p>Specify the dimension to be unrolled (counting from the innermost loop)
|
||
|
for the <samp>-floop-unroll-and-jam</samp> option. The default value is 2.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-cp-value-list-size</code></dt>
|
||
|
<dd><p>IPA-CP attempts to track all possible values and types passed to a function’s
|
||
|
parameter in order to propagate them and perform devirtualization.
|
||
|
<samp>ipa-cp-value-list-size</samp> is the maximum number of values and types it
|
||
|
stores per one formal parameter of a function.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-cp-eval-threshold</code></dt>
|
||
|
<dd><p>IPA-CP calculates its own score of cloning profitability heuristics
|
||
|
and performs those cloning opportunities with scores that exceed
|
||
|
<samp>ipa-cp-eval-threshold</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-cp-recursion-penalty</code></dt>
|
||
|
<dd><p>Percentage penalty the recursive functions will receive when they
|
||
|
are evaluated for cloning.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-cp-single-call-penalty</code></dt>
|
||
|
<dd><p>Percentage penalty functions containing a single call to another
|
||
|
function will receive when they are evaluated for cloning.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-max-agg-items</code></dt>
|
||
|
<dd><p>IPA-CP can also propagate a number of scalar values passed
|
||
|
in an aggregate. <samp>ipa-max-agg-items</samp> controls the maximum
|
||
|
number of such values per one parameter.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-cp-loop-hint-bonus</code></dt>
|
||
|
<dd><p>When IPA-CP determines that a cloning candidate would make the number
|
||
|
of iterations of a loop known, it adds a bonus of
|
||
|
<samp>ipa-cp-loop-hint-bonus</samp> to the profitability score of
|
||
|
the candidate.
|
||
|
</p>
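<p>As a hedged illustration, the sketch below shows a situation of the kind
described above.  The function names are hypothetical, and whether IPA-CP
really creates a clone depends on its overall profitability score.
</p>
<div class="smallexample">
<pre class="smallexample">/* Hypothetical example: the only caller passes the constant 4.  */
static int
sum_first (const int *v, int n)
{
  int total = 0;
  for (int i = 0; i != n; i++)   /* trip count depends on n */
    total += v[i];
  return total;
}

int
sum_four (const int *v)
{
  /* A clone of sum_first specialized for n == 4 would contain a
     loop with a known number of iterations.  */
  return sum_first (v, 4);
}

int
main (void)
{
  int data[4] = { 1, 2, 3, 4 };
  return sum_four (data) == 10 ? 0 : 1;
}
</pre></div>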
|
||
|
</dd>
|
||
|
<dt><code>ipa-cp-array-index-hint-bonus</code></dt>
|
||
|
<dd><p>When IPA-CP determines that a cloning candidate would make the index of
|
||
|
an array access known, it adds a bonus of
|
||
|
<samp>ipa-cp-array-index-hint-bonus</samp> to the profitability
|
||
|
score of the candidate.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>ipa-max-aa-steps</code></dt>
|
||
|
<dd><p>During its analysis of function bodies, IPA-CP employs alias analysis
|
||
|
in order to track values pointed to by function parameters. In order
|
||
|
not to spend too much time analyzing huge functions, it gives up and
|
||
|
considers all memory clobbered after examining
|
||
|
<samp>ipa-max-aa-steps</samp> statements modifying memory.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>lto-partitions</code></dt>
|
||
|
<dd><p>Specify desired number of partitions produced during WHOPR compilation.
|
||
|
The number of partitions should exceed the number of CPUs used for compilation.
|
||
|
The default value is 32.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>lto-min-partition</code></dt>
|
||
|
<dd><p>Size of minimal partition for WHOPR (in estimated instructions).
|
||
|
This avoids the expense of splitting very small programs into too many
|
||
|
partitions.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>lto-max-partition</code></dt>
|
||
|
<dd><p>Size of max partition for WHOPR (in estimated instructions).
|
||
|
This provides an upper bound for the individual size of a partition.
|
||
|
Meant to be used only with balanced partitioning.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>cxx-max-namespaces-for-diagnostic-help</code></dt>
|
||
|
<dd><p>The maximum number of namespaces to consult for suggestions when C++
|
||
|
name lookup fails for an identifier. The default is 1000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>sink-frequency-threshold</code></dt>
|
||
|
<dd><p>The maximum relative execution frequency (in percent) of the target block
|
||
|
relative to a statement’s original block to allow statement sinking of a
|
||
|
statement. Larger numbers result in more aggressive statement sinking.
|
||
|
The default value is 75. A small positive adjustment is applied for
|
||
|
statements with memory operands as those are even more profitable to sink.
|
||
|
</p>
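<p>The following sketch, with hypothetical names, shows the kind of
statement that sinking can move.  It illustrates the transformation only
and does not claim to reproduce the exact decision GCC makes.
</p>
<div class="smallexample">
<pre class="smallexample">int
product_if_needed (int a, int b, int use_it)
{
  int t = a * b;      /* computed on every path in the source */
  if (use_it)
    return t;         /* only this path needs the product */
  return 0;
}

int
main (void)
{
  if (product_if_needed (6, 7, 1) != 42)
    return 1;
  return product_if_needed (6, 7, 0) == 0 ? 0 : 1;
}
</pre></div>
<p>Sinking would move the multiplication into the block guarded by
<code>use_it</code>, provided that block is executed rarely enough relative
to the block that originally contained the statement.
</p>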
|
||
|
</dd>
|
||
|
<dt><code>max-stores-to-sink</code></dt>
|
||
|
<dd><p>The maximum number of conditional store pairs that can be sunk. Set to 0
|
||
|
if either vectorization (<samp>-ftree-vectorize</samp>) or if-conversion
|
||
|
(<samp>-ftree-loop-if-convert</samp>) is disabled. The default is 2.
|
||
|
</p>
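<p>As a sketch under stated assumptions (the function name is hypothetical),
the code below contains two conditional store pairs: each location is
written in both arms of the same <code>if</code>.
</p>
<div class="smallexample">
<pre class="smallexample">void
store_pair (int *a, int *b, int cond, int x, int y)
{
  if (cond)
    {
      *a = x;     /* pair 1, then-arm */
      *b = y;     /* pair 2, then-arm */
    }
  else
    {
      *a = y;     /* pair 1, else-arm */
      *b = x;     /* pair 2, else-arm */
    }
}

int
main (void)
{
  int a[1] = { 0 };
  int b[1] = { 0 };
  store_pair (a, b, 1, 5, 9);
  return (a[0] == 5 ? 0 : 1) + (b[0] == 9 ? 0 : 2);
}
</pre></div>
<p>Sinking both pairs leaves one unconditional store per location after the
branch, with the stored value selected by <code>cond</code>.  With a limit
of 1, at most one of the two pairs would be eligible.
</p>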
|
||
|
</dd>
|
||
|
<dt><code>allow-store-data-races</code></dt>
|
||
|
<dd><p>Allow optimizers to introduce new data races on stores.
|
||
|
Set to 1 to allow, otherwise to 0. This option is enabled by default
|
||
|
at optimization level <samp>-Ofast</samp>.
|
||
|
</p>
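<p>The following sketch, with hypothetical names, shows where a new store
data race could appear.  The program below is single-threaded; the comments
describe what could go wrong if another thread updated the same variable.
</p>
<div class="smallexample">
<pre class="smallexample">int match_count;   /* imagine this is shared with other threads */

void
count_matches (const int *v, int n, int key)
{
  for (int i = 0; i != n; i++)
    if (v[i] == key)
      match_count++;   /* the source stores only when a match is found */
}

int
main (void)
{
  int data[4] = { 1, 2, 2, 3 };
  match_count = 0;
  count_matches (data, 4, 2);
  return match_count == 2 ? 0 : 1;
}
</pre></div>
<p>An optimizer that is allowed to introduce store data races may keep
<code>match_count</code> in a register during the loop and write it back
unconditionally afterwards.  That adds a store on the path where no element
matches, which could overwrite a concurrent update made by another thread.
</p>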
|
||
|
</dd>
|
||
|
<dt><code>case-values-threshold</code></dt>
|
||
|
<dd><p>The smallest number of different values for which it is best to use a
|
||
|
jump-table instead of a tree of conditional branches. If the value is
|
||
|
0, use the default for the machine. The default is 0.
|
||
|
</p>
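<p>For illustration, the hypothetical function below has six different case
values.  Whether a jump table is actually emitted also depends on the target
and on other passes that may transform the switch first.
</p>
<div class="smallexample">
<pre class="smallexample">int
opcode_cost (int op)
{
  switch (op)
    {
    case 0: return 1;
    case 1: return 1;
    case 2: return 3;
    case 3: return 4;
    case 4: return 10;
    case 5: return 12;
    default: return 0;
    }
}

int
main (void)
{
  if (opcode_cost (4) != 10)
    return 1;
  return opcode_cost (9) == 0 ? 0 : 1;
}
</pre></div>
<p>With <samp>--param case-values-threshold=7</samp> the six values fall
below the threshold, which favors a tree of conditional branches; a
threshold of 6 or lower makes the switch eligible for a jump table.
</p>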
|
||
|
</dd>
|
||
|
<dt><code>tree-reassoc-width</code></dt>
|
||
|
<dd><p>Set the maximum number of instructions executed in parallel in
|
||
|
reassociated tree. This parameter overrides target dependent
|
||
|
heuristics used by default if it has a nonzero value.
|
||
|
</p>
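<p>The effect of the width can be sketched by hand.  The two hypothetical
functions below compute the same sum; the second has the shape that a
reassociation width of 2 aims for, with two additions that do not depend on
each other.  Unsigned operands are used so that regrouping cannot change the
result.
</p>
<div class="smallexample">
<pre class="smallexample">unsigned
sum_chain (unsigned a, unsigned b, unsigned c, unsigned d)
{
  return ((a + b) + c) + d;   /* three dependent additions */
}

unsigned
sum_width2 (unsigned a, unsigned b, unsigned c, unsigned d)
{
  return (a + b) + (c + d);   /* two independent additions, then one */
}

int
main (void)
{
  return sum_chain (1, 2, 3, 4) == sum_width2 (1, 2, 3, 4) ? 0 : 1;
}
</pre></div>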
|
||
|
</dd>
|
||
|
<dt><code>sched-pressure-algorithm</code></dt>
|
||
|
<dd><p>Choose between the two available implementations of
|
||
|
<samp>-fsched-pressure</samp>. Algorithm 1 is the original implementation
|
||
|
and is the more likely to prevent instructions from being reordered.
|
||
|
Algorithm 2 was designed to be a compromise between the relatively
|
||
|
conservative approach taken by algorithm 1 and the rather aggressive
|
||
|
approach taken by the default scheduler. It relies more heavily on
|
||
|
having a regular register file and accurate register pressure classes.
|
||
|
See <samp>haifa-sched.c</samp> in the GCC sources for more details.
|
||
|
</p>
|
||
|
<p>The default choice depends on the target.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-slsr-cand-scan</code></dt>
|
||
|
<dd><p>Set the maximum number of existing candidates that are considered when
|
||
|
seeking a basis for a new straight-line strength reduction candidate.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-globals</code></dt>
|
||
|
<dd><p>Enable buffer overflow detection for global objects. This kind
|
||
|
of protection is enabled by default if you are using
|
||
|
<samp>-fsanitize=address</samp> option.
|
||
|
To disable global objects protection use <samp>--param asan-globals=0</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-stack</code></dt>
|
||
|
<dd><p>Enable buffer overflow detection for stack objects. This kind of
|
||
|
protection is enabled by default when using <samp>-fsanitize=address</samp>.
|
||
|
To disable stack protection use <samp>--param asan-stack=0</samp> option.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-instrument-reads</code></dt>
|
||
|
<dd><p>Enable buffer overflow detection for memory reads. This kind of
|
||
|
protection is enabled by default when using <samp>-fsanitize=address</samp>.
|
||
|
To disable memory reads protection use
|
||
|
<samp>--param asan-instrument-reads=0</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-instrument-writes</code></dt>
|
||
|
<dd><p>Enable buffer overflow detection for memory writes. This kind of
|
||
|
protection is enabled by default when using <samp>-fsanitize=address</samp>.
|
||
|
To disable memory writes protection use
|
||
|
<samp>--param asan-instrument-writes=0</samp> option.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-memintrin</code></dt>
|
||
|
<dd><p>Enable detection for built-in functions. This kind of protection
|
||
|
is enabled by default when using <samp>-fsanitize=address</samp>.
|
||
|
To disable built-in functions protection use
|
||
|
<samp>--param asan-memintrin=0</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-use-after-return</code></dt>
|
||
|
<dd><p>Enable detection of use-after-return. This kind of protection
|
||
|
is enabled by default when using the <samp>-fsanitize=address</samp> option.
|
||
|
To disable it use <samp>--param asan-use-after-return=0</samp>.
|
||
|
</p>
|
||
|
<p>Note: By default the check is disabled at run time. To enable it,
|
||
|
add <code>detect_stack_use_after_return=1</code> to the environment variable
|
||
|
<code>ASAN_OPTIONS</code>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>asan-instrumentation-with-call-threshold</code></dt>
|
||
|
<dd><p>If the number of memory accesses in the function being instrumented
|
||
|
is greater than or equal to this number, use callbacks instead of inline checks.
|
||
|
E.g. to disable inline code use
|
||
|
<samp>--param asan-instrumentation-with-call-threshold=0</samp>.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>use-after-scope-direct-emission-threshold</code></dt>
|
||
|
<dd><p>If the size of a local variable in bytes is smaller than or equal to this
|
||
|
number, directly poison (or unpoison) shadow memory instead of using
|
||
|
run-time callbacks. The default value is 256.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>chkp-max-ctor-size</code></dt>
|
||
|
<dd><p>Static constructors generated by Pointer Bounds Checker may become very
|
||
|
large and significantly increase compile time at optimization level
|
||
|
<samp>-O1</samp> and higher. This parameter is a maximum number of statements
|
||
|
in a single generated constructor. The default value is 5000.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-fsm-thread-path-insns</code></dt>
|
||
|
<dd><p>Maximum number of instructions to copy when duplicating blocks on a
|
||
|
finite state automaton jump thread path. The default is 100.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-fsm-thread-length</code></dt>
|
||
|
<dd><p>Maximum number of basic blocks on a finite state automaton jump thread
|
||
|
path. The default is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-fsm-thread-paths</code></dt>
|
||
|
<dd><p>Maximum number of new jump thread paths to create for a finite state
|
||
|
automaton. The default is 50.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>parloops-chunk-size</code></dt>
|
||
|
<dd><p>Chunk size of omp schedule for loops parallelized by parloops. The default
|
||
|
is 0.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>parloops-schedule</code></dt>
|
||
|
<dd><p>Schedule type of omp schedule for loops parallelized by parloops (static,
|
||
|
dynamic, guided, auto, runtime). The default is static.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>parloops-min-per-thread</code></dt>
|
||
|
<dd><p>The minimum number of iterations per thread of an innermost parallelized
|
||
|
loop for which the parallelized variant is preferred over the single-threaded
|
||
|
one. The default is 100. Note that for a parallelized loop nest the
|
||
|
minimum number of iterations of the outermost loop per thread is two.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-ssa-name-query-depth</code></dt>
|
||
|
<dd><p>Maximum depth of recursion when querying properties of SSA names in things
|
||
|
like fold routines. One level of recursion corresponds to following a
|
||
|
use-def chain.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>hsa-gen-debug-stores</code></dt>
|
||
|
<dd><p>Enable emission of special debug stores within HSA kernels which are
|
||
|
then read and reported by libgomp plugin. Generation of these stores
|
||
|
is disabled by default; use <samp>--param hsa-gen-debug-stores=1</samp> to
|
||
|
enable it.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-speculative-devirt-maydefs</code></dt>
|
||
|
<dd><p>The maximum number of may-defs we analyze when looking for a must-def
|
||
|
specifying the dynamic type of an object that invokes a virtual call
|
||
|
we may be able to devirtualize speculatively.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>max-vrp-switch-assertions</code></dt>
|
||
|
<dd><p>The maximum number of assertions to add along the default edge of a switch
|
||
|
statement during VRP. The default is 10.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>unroll-jam-min-percent</code></dt>
|
||
|
<dd><p>The minimum percentage of memory references that must be optimized
|
||
|
away for the unroll-and-jam transformation to be considered profitable.
|
||
|
</p>
|
||
|
</dd>
|
||
|
<dt><code>unroll-jam-max-unroll</code></dt>
|
||
|
<dd><p>The maximum number of times the outer loop should be unrolled by
|
||
|
the unroll-and-jam transformation.
|
||
|
</p></dd>
|
||
|
</dl>
|
||
|
</dd>
|
||
|
</dl>
|
||
|
<hr>
|
||
|
<div class="header">
|
||
|
<p>
|
||
|
Next: <a href="Instrumentation-Options.html#Instrumentation-Options" accesskey="n" rel="next">Instrumentation Options</a>, Previous: <a href="Debugging-Options.html#Debugging-Options" accesskey="p" rel="prev">Debugging Options</a>, Up: <a href="Invoking-GCC.html#Invoking-GCC" accesskey="u" rel="up">Invoking GCC</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
</body>
|
||
|
</html>
|