<html lang="en">
<head>
<title>Optimize Options - Using the GNU Compiler Collection (GCC)</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="Using the GNU Compiler Collection (GCC)">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="up" href="Invoking-GCC.html#Invoking-GCC" title="Invoking GCC">
<link rel="prev" href="Debugging-Options.html#Debugging-Options" title="Debugging Options">
<link rel="next" href="Preprocessor-Options.html#Preprocessor-Options" title="Preprocessor Options">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<!--
|
||
|
Copyright (C) 1988-2015 Free Software Foundation, Inc.
|
||
|
|
||
|
Permission is granted to copy, distribute and/or modify this document
|
||
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
||
|
any later version published by the Free Software Foundation; with the
|
||
|
Invariant Sections being ``Funding Free Software'', the Front-Cover
|
||
|
Texts being (a) (see below), and with the Back-Cover Texts being (b)
|
||
|
(see below). A copy of the license is included in the section entitled
|
||
|
``GNU Free Documentation License''.
|
||
|
|
||
|
(a) The FSF's Front-Cover Text is:
|
||
|
|
||
|
A GNU Manual
|
||
|
|
||
|
(b) The FSF's Back-Cover Text is:
|
||
|
|
||
|
You have freedom to copy and modify this GNU Manual, like GNU
|
||
|
software. Copies published by the Free Software Foundation raise
|
||
|
funds for GNU development.-->
|
||
|
<meta http-equiv="Content-Style-Type" content="text/css">
|
||
|
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; }
  span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
|
||
|
</head>
|
||
|
<body>
<div class="node">
<a name="Optimize-Options"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="Preprocessor-Options.html#Preprocessor-Options">Preprocessor Options</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="Debugging-Options.html#Debugging-Options">Debugging Options</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="Invoking-GCC.html#Invoking-GCC">Invoking GCC</a>
<hr>
</div>
|
||
|
<h3 class="section">3.10 Options That Control Optimization</h3>
|
||
|
|
||
|
<p><a name="index-optimize-options-881"></a><a name="index-options_002c-optimization-882"></a>
|
||
|
These options control various sorts of optimizations.
|
||
|
|
||
|
<p>Without any optimization option, the compiler's goal is to reduce the
|
||
|
cost of compilation and to make debugging produce the expected
|
||
|
results. Statements are independent: if you stop the program with a
|
||
|
breakpoint between statements, you can then assign a new value to any
|
||
|
variable or change the program counter to any other statement in the
|
||
|
function and get exactly the results you expect from the source
|
||
|
code.
|
||
|
|
||
|
<p>Turning on optimization flags makes the compiler attempt to improve
|
||
|
the performance and/or code size at the expense of compilation time
|
||
|
and possibly the ability to debug the program.
|
||
|
|
||
|
<p>The compiler performs optimization based on the knowledge it has of the
|
||
|
program. Compiling multiple files at once into a single output file allows
|
||
|
the compiler to use information gained from all of the files when compiling
|
||
|
each of them.
|
||
|
|
||
|
<p>Not all optimizations are controlled directly by a flag. Only
|
||
|
optimizations that have a flag are listed in this section.
|
||
|
|
||
|
<p>Most optimizations are only enabled if an <samp><span class="option">-O</span></samp> level is set on
|
||
|
the command line. Otherwise they are disabled, even if individual
|
||
|
optimization flags are specified.
|
||
|
|
||
|
<p>Depending on the target and how GCC was configured, a slightly different
|
||
|
set of optimizations may be enabled at each <samp><span class="option">-O</span></samp> level than
|
||
|
those listed here. You can invoke GCC with <samp><span class="option">-Q --help=optimizers</span></samp>
|
||
|
to find out the exact set of optimizations that are enabled at each level.
|
||
|
See <a href="Overall-Options.html#Overall-Options">Overall Options</a>, for examples.
|
||
|
|
||
|
<dl>
|
||
|
<dt><code>-O</code><dt><code>-O1</code><dd><a name="index-O-883"></a><a name="index-O1-884"></a>Optimize. Optimizing compilation takes somewhat more time, and a lot
|
||
|
more memory for a large function.
|
||
|
|
||
|
<p>With <samp><span class="option">-O</span></samp>, the compiler tries to reduce code size and execution
|
||
|
time, without performing any optimizations that take a great deal of
|
||
|
compilation time.
|
||
|
|
||
|
<p><samp><span class="option">-O</span></samp> turns on the following optimization flags:
|
||
|
<pre class="smallexample">          -fauto-inc-dec
          -fbranch-count-reg
          -fcombine-stack-adjustments
          -fcompare-elim
          -fcprop-registers
          -fdce
          -fdefer-pop
          -fdelayed-branch
          -fdse
          -fforward-propagate
          -fguess-branch-probability
          -fif-conversion2
          -fif-conversion
          -finline-functions-called-once
          -fipa-pure-const
          -fipa-profile
          -fipa-reference
          -fmerge-constants
          -fmove-loop-invariants
          -fshrink-wrap
          -fsplit-wide-types
          -ftree-bit-ccp
          -ftree-ccp
          -fssa-phiopt
          -ftree-ch
          -ftree-copy-prop
          -ftree-copyrename
          -ftree-dce
          -ftree-dominator-opts
          -ftree-dse
          -ftree-forwprop
          -ftree-fre
          -ftree-phiprop
          -ftree-sink
          -ftree-slsr
          -ftree-sra
          -ftree-pta
          -ftree-ter
          -funit-at-a-time
</pre>
|
||
|
<p><samp><span class="option">-O</span></samp> also turns on <samp><span class="option">-fomit-frame-pointer</span></samp> on machines
|
||
|
where doing so does not interfere with debugging.
|
||
|
|
||
|
<br><dt><code>-O2</code><dd><a name="index-O2-885"></a>Optimize even more. GCC performs nearly all supported optimizations
|
||
|
that do not involve a space-speed tradeoff.
|
||
|
As compared to <samp><span class="option">-O</span></samp>, this option increases both compilation time
|
||
|
and the performance of the generated code.
|
||
|
|
||
|
<p><samp><span class="option">-O2</span></samp> turns on all optimization flags specified by <samp><span class="option">-O</span></samp>. It
|
||
|
also turns on the following optimization flags:
|
||
|
<pre class="smallexample">          -fthread-jumps
          -falign-functions  -falign-jumps
          -falign-loops  -falign-labels
          -fcaller-saves
          -fcrossjumping
          -fcse-follow-jumps  -fcse-skip-blocks
          -fdelete-null-pointer-checks
          -fdevirtualize -fdevirtualize-speculatively
          -fexpensive-optimizations
          -fgcse  -fgcse-lm
          -fhoist-adjacent-loads
          -finline-small-functions
          -findirect-inlining
          -fipa-cp
          -fipa-cp-alignment
          -fipa-sra
          -fipa-icf
          -fisolate-erroneous-paths-dereference
          -flra-remat
          -foptimize-sibling-calls
          -foptimize-strlen
          -fpartial-inlining
          -fpeephole2
          -freorder-blocks -freorder-blocks-and-partition -freorder-functions
          -frerun-cse-after-loop
          -fsched-interblock  -fsched-spec
          -fschedule-insns  -fschedule-insns2
          -fstrict-aliasing -fstrict-overflow
          -ftree-builtin-call-dce
          -ftree-switch-conversion -ftree-tail-merge
          -ftree-pre
          -ftree-vrp
          -fipa-ra
</pre>
|
||
|
<p>Please note the warning under <samp><span class="option">-fgcse</span></samp> about
|
||
|
invoking <samp><span class="option">-O2</span></samp> on programs that use computed gotos.
|
||
|
|
||
|
<br><dt><code>-O3</code><dd><a name="index-O3-886"></a>Optimize yet more. <samp><span class="option">-O3</span></samp> turns on all optimizations specified
|
||
|
by <samp><span class="option">-O2</span></samp> and also turns on the <samp><span class="option">-finline-functions</span></samp>,
|
||
|
<samp><span class="option">-funswitch-loops</span></samp>, <samp><span class="option">-fpredictive-commoning</span></samp>,
|
||
|
<samp><span class="option">-fgcse-after-reload</span></samp>, <samp><span class="option">-ftree-loop-vectorize</span></samp>,
|
||
|
<samp><span class="option">-ftree-loop-distribute-patterns</span></samp>,
|
||
|
<samp><span class="option">-ftree-slp-vectorize</span></samp>, <samp><span class="option">-fvect-cost-model</span></samp>,
|
||
|
<samp><span class="option">-ftree-partial-pre</span></samp> and <samp><span class="option">-fipa-cp-clone</span></samp> options.
|
||
|
|
||
|
<br><dt><code>-O0</code><dd><a name="index-O0-887"></a>Reduce compilation time and make debugging produce the expected
|
||
|
results. This is the default.
|
||
|
|
||
|
<br><dt><code>-Os</code><dd><a name="index-Os-888"></a>Optimize for size. <samp><span class="option">-Os</span></samp> enables all <samp><span class="option">-O2</span></samp> optimizations that
|
||
|
do not typically increase code size. It also performs further
|
||
|
optimizations designed to reduce code size.
|
||
|
|
||
|
<p><samp><span class="option">-Os</span></samp> disables the following optimization flags:
|
||
|
<pre class="smallexample">          -falign-functions  -falign-jumps  -falign-loops
          -falign-labels  -freorder-blocks  -freorder-blocks-and-partition
          -fprefetch-loop-arrays
</pre>
|
||
|
<br><dt><code>-Ofast</code><dd><a name="index-Ofast-889"></a>Disregard strict standards compliance. <samp><span class="option">-Ofast</span></samp> enables all
|
||
|
<samp><span class="option">-O3</span></samp> optimizations. It also enables optimizations that are not
|
||
|
valid for all standard-compliant programs.
|
||
|
It turns on <samp><span class="option">-ffast-math</span></samp> and the Fortran-specific
|
||
|
<samp><span class="option">-fno-protect-parens</span></samp> and <samp><span class="option">-fstack-arrays</span></samp>.
|
||
|
|
||
|
<br><dt><code>-Og</code><dd><a name="index-Og-890"></a>Optimize debugging experience. <samp><span class="option">-Og</span></samp> enables optimizations
|
||
|
that do not interfere with debugging. It should be the optimization
|
||
|
level of choice for the standard edit-compile-debug cycle, offering
|
||
|
a reasonable level of optimization while maintaining fast compilation
|
||
|
and a good debugging experience.
|
||
|
|
||
|
<p>If you use multiple <samp><span class="option">-O</span></samp> options, with or without level numbers,
|
||
|
the last such option is the one that is effective.
|
||
|
</dl>
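<p>As a rough illustration of the levels above, a program can report which kind of optimization was requested when it was compiled. The following is a minimal sketch, assuming the predefined macros <code>__OPTIMIZE__</code> and <code>__OPTIMIZE_SIZE__</code> described in the preprocessor manual; the function name <code>optimization_mode</code> is hypothetical and not part of GCC.

```c
/* Hypothetical helper: describes the optimization mode this translation
   unit was compiled with, based on GCC's predefined macros. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";      /* optimizing for size was requested */
#elif defined(__OPTIMIZE__)
    return "speed";     /* some optimizing compilation was requested */
#else
    return "none";      /* no optimization was requested */
#endif
}
```

<p>Because only the last <samp><span class="option">-O</span></samp> option on the command line is effective, the value returned reflects that last option, not earlier ones.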
|
||
|
|
||
|
<p>Options of the form <samp><span class="option">-f</span><var>flag</var></samp> specify machine-independent
|
||
|
flags. Most flags have both positive and negative forms; the negative
|
||
|
form of <samp><span class="option">-ffoo</span></samp> is <samp><span class="option">-fno-foo</span></samp>. In the table
|
||
|
below, only one of the forms is listed—the one you typically
|
||
|
use. You can figure out the other form by either removing ‘<samp><span class="samp">no-</span></samp>’
|
||
|
or adding it.
|
||
|
|
||
|
<p>The following options control specific optimizations. They are either
|
||
|
activated by <samp><span class="option">-O</span></samp> options or are related to ones that are. You
|
||
|
can use the following flags in the rare cases when “fine-tuning” of
|
||
|
optimizations to be performed is desired.
|
||
|
|
||
|
<dl>
|
||
|
<dt><code>-fno-defer-pop</code><dd><a name="index-fno_002ddefer_002dpop-891"></a>Always pop the arguments to each function call as soon as that function
|
||
|
returns. For machines that must pop arguments after a function call,
|
||
|
the compiler normally lets arguments accumulate on the stack for several
|
||
|
function calls and pops them all at once.
|
||
|
|
||
|
<p>Disabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fforward-propagate</code><dd><a name="index-fforward_002dpropagate-892"></a>Perform a forward propagation pass on RTL. The pass tries to combine two
|
||
|
instructions and checks if the result can be simplified. If loop unrolling
|
||
|
is active, two passes are performed and the second is scheduled after
|
||
|
loop unrolling.
|
||
|
|
||
|
<p>This option is enabled by default at optimization levels <samp><span class="option">-O</span></samp>,
|
||
|
<samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-ffp-contract=</code><var>style</var><dd><a name="index-ffp_002dcontract-893"></a><samp><span class="option">-ffp-contract=off</span></samp> disables floating-point expression contraction.
|
||
|
<samp><span class="option">-ffp-contract=fast</span></samp> enables floating-point expression contraction
|
||
|
such as forming of fused multiply-add operations if the target has
|
||
|
native support for them.
|
||
|
<samp><span class="option">-ffp-contract=on</span></samp> enables floating-point expression contraction
|
||
|
if allowed by the language standard. This is currently not implemented
|
||
|
and is treated the same as <samp><span class="option">-ffp-contract=off</span></samp>.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-ffp-contract=fast</span></samp>.
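<p>A dot product is a typical place where contraction may apply, since each step is a multiplication followed by an addition. The sketch below is illustrative only; the function name <code>dot</code> is hypothetical, and whether a fused multiply-add is actually generated depends on the target and on the options in effect.

```c
#include <stddef.h>

/* Each iteration computes acc + x[i] * y[i].  With contraction enabled
   and a target that has a fused multiply-add instruction, the multiply
   and the add may be combined into one operation with a single rounding. */
double dot(const double *x, const double *y, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];
    return acc;
}
```

<p>For inputs whose products and sums are exactly representable the result is the same either way; for other inputs the low-order bits of the result may differ between <samp><span class="option">-ffp-contract=off</span></samp> and <samp><span class="option">-ffp-contract=fast</span></samp>.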
|
||
|
|
||
|
<br><dt><code>-fomit-frame-pointer</code><dd><a name="index-fomit_002dframe_002dpointer-894"></a>Don't keep the frame pointer in a register for functions that
|
||
|
don't need one. This avoids the instructions to save, set up and
|
||
|
restore frame pointers; it also makes an extra register available
|
||
|
in many functions. <strong>It also makes debugging impossible on
|
||
|
some machines.</strong>
|
||
|
|
||
|
<p>On some machines, such as the VAX, this flag has no effect, because
|
||
|
the standard calling sequence automatically handles the frame pointer
|
||
|
and nothing is saved by pretending it doesn't exist. The
|
||
|
machine-description macro <code>FRAME_POINTER_REQUIRED</code> controls
|
||
|
whether a target machine supports this flag. See <a href="../gccint/Registers.html#Registers">Register Usage</a>.
|
||
|
|
||
|
<p>The default setting (when not optimizing for
|
||
|
size) for 32-bit GNU/Linux x86 and 32-bit Darwin x86 targets is
|
||
|
<samp><span class="option">-fomit-frame-pointer</span></samp>. You can configure GCC with the
|
||
|
<samp><span class="option">--enable-frame-pointer</span></samp> configure option to change the default.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-foptimize-sibling-calls</code><dd><a name="index-foptimize_002dsibling_002dcalls-895"></a>Optimize sibling and tail recursive calls.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
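<p>A tail-recursive function is the usual candidate for this optimization. The following sketch (the name <code>gcd</code> is chosen for illustration) ends in a call to itself whose result is returned unchanged, so the call may be turned into a jump that reuses the current stack frame.

```c
/* Greatest common divisor by Euclid's algorithm.  The recursive call is
   in tail position: nothing remains to be done after it returns. */
unsigned gcd(unsigned a, unsigned b)
{
    if (b == 0)
        return a;
    return gcd(b, a % b);   /* tail-recursive call */
}
```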
|
||
|
|
||
|
<br><dt><code>-foptimize-strlen</code><dd><a name="index-foptimize_002dstrlen-896"></a>Optimize various standard C string functions (e.g. <code>strlen</code>,
|
||
|
<code>strchr</code> or <code>strcpy</code>) and
|
||
|
their <code>_FORTIFY_SOURCE</code> counterparts into faster alternatives.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
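<p>The kind of code this option targets can be sketched as follows. The function name <code>copy_and_measure</code> is hypothetical, and the transformation described in the comment is one the compiler may perform, not one it is guaranteed to perform.

```c
#include <string.h>

/* Copies src into dst and returns its length.  The caller must supply a
   destination buffer large enough for src and its terminating null. */
size_t copy_and_measure(char *dst, const char *src)
{
    size_t len = strlen(src);
    /* The length of src is already known here, so the copy may be
       replaced by a faster fixed-length copy of len + 1 bytes. */
    strcpy(dst, src);
    return len;
}
```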
|
||
|
|
||
|
<br><dt><code>-fno-inline</code><dd><a name="index-fno_002dinline-897"></a>Do not expand any functions inline apart from those marked with
|
||
|
the <code>always_inline</code> attribute. This is the default when not
|
||
|
optimizing.
|
||
|
|
||
|
<p>Single functions can be exempted from inlining by marking them
|
||
|
with the <code>noinline</code> attribute.
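<p>The two attributes mentioned above can be sketched as follows; the function names are hypothetical.

```c
/* always_inline: expanded inline even when -fno-inline is in effect. */
static inline __attribute__((always_inline)) int square(int x)
{
    return x * x;
}

/* noinline: exempted from inlining at every optimization level. */
static __attribute__((noinline)) int cube(int x)
{
    return x * square(x);
}

int sum_of_powers(int x)
{
    return square(x) + cube(x);
}
```

<p>In this sketch the calls to <code>square</code> are expanded in place, while <code>cube</code> remains a separate function that is called from <code>sum_of_powers</code>.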
|
||
|
|
||
|
<br><dt><code>-finline-small-functions</code><dd><a name="index-finline_002dsmall_002dfunctions-898"></a>Integrate functions into their callers when their body is smaller than expected
|
||
|
function call code (so overall size of program gets smaller). The compiler
|
||
|
heuristically decides which functions are simple enough to be worth integrating
|
||
|
in this way. This inlining applies to all functions, even those not declared
|
||
|
inline.
|
||
|
|
||
|
<p>Enabled at level <samp><span class="option">-O2</span></samp>.
|
||
|
|
||
|
<br><dt><code>-findirect-inlining</code><dd><a name="index-findirect_002dinlining-899"></a>Inline also indirect calls that are discovered to be known at compile
|
||
|
time thanks to previous inlining. This option has an effect only
|
||
|
when inlining itself is turned on by the <samp><span class="option">-finline-functions</span></samp>
|
||
|
or <samp><span class="option">-finline-small-functions</span></samp> options.
|
||
|
|
||
|
<p>Enabled at level <samp><span class="option">-O2</span></samp>.
|
||
|
|
||
|
<br><dt><code>-finline-functions</code><dd><a name="index-finline_002dfunctions-900"></a>Consider all functions for inlining, even if they are not declared inline.
|
||
|
The compiler heuristically decides which functions are worth integrating
|
||
|
in this way.
|
||
|
|
||
|
<p>If all calls to a given function are integrated, and the function is
|
||
|
declared <code>static</code>, then the function is normally not output as
|
||
|
assembler code in its own right.
|
||
|
|
||
|
<p>Enabled at level <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<br><dt><code>-finline-functions-called-once</code><dd><a name="index-finline_002dfunctions_002dcalled_002donce-901"></a>Consider all <code>static</code> functions called once for inlining into their
|
||
|
caller even if they are not marked <code>inline</code>. If a call to a given
|
||
|
function is integrated, then the function is not output as assembler code
|
||
|
in its own right.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O1</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp> and <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fearly-inlining</code><dd><a name="index-fearly_002dinlining-902"></a>Inline functions marked by <code>always_inline</code> and functions whose body seems
|
||
|
smaller than the function call overhead early, before doing
<samp><span class="option">-fprofile-generate</span></samp> instrumentation and the real inlining pass. Doing so
|
||
|
makes profiling significantly cheaper and usually inlining faster on programs
|
||
|
having large chains of nested wrapper functions.
|
||
|
|
||
|
<p>Enabled by default.
|
||
|
|
||
|
<br><dt><code>-fipa-sra</code><dd><a name="index-fipa_002dsra-903"></a>Perform interprocedural scalar replacement of aggregates, removal of
|
||
|
unused parameters and replacement of parameters passed by reference
|
||
|
by parameters passed by value.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp> and <samp><span class="option">-Os</span></samp>.
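<p>The following sketch shows the shape of code this pass looks for. All names are hypothetical, and the <code>noinline</code> attribute is present only so that the call is not simply inlined away in such a small example.

```c
struct point { int x; int y; int z; };

/* Only p->x is read and the second parameter is never used, so the pass
   may rewrite this function to take a single int by value. */
static __attribute__((noinline)) int get_x(const struct point *p, int unused)
{
    (void)unused;
    return p->x;
}

int first_coordinate(int x, int y, int z)
{
    struct point p = { x, y, z };
    return get_x(&p, 0);
}
```

<p>The function is declared <code>static</code> so that the compiler can see every caller and adjust each call site to match the rewritten parameter list.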
|
||
|
|
||
|
<br><dt><code>-finline-limit=</code><var>n</var><dd><a name="index-finline_002dlimit-904"></a>By default, GCC limits the size of functions that can be inlined. This flag
|
||
|
allows coarse control of this limit. <var>n</var> is the size of functions that
|
||
|
can be inlined in number of pseudo instructions.
|
||
|
|
||
|
<p>Inlining is actually controlled by a number of parameters, which may be
|
||
|
specified individually by using <samp><span class="option">--param </span><var>name</var><span class="option">=</span><var>value</var></samp>.
|
||
|
The <samp><span class="option">-finline-limit=</span><var>n</var></samp> option sets some of these parameters
|
||
|
as follows:
|
||
|
|
||
|
<dl>
|
||
|
<dt><code>max-inline-insns-single</code><dd>is set to <var>n</var>/2.
|
||
|
<br><dt><code>max-inline-insns-auto</code><dd>is set to <var>n</var>/2.
|
||
|
</dl>
|
||
|
|
||
|
<p>See below for documentation of the individual
|
||
|
parameters controlling inlining and for the defaults of these parameters.
|
||
|
|
||
|
<p><em>Note:</em> there may be no value to <samp><span class="option">-finline-limit</span></samp> that results
|
||
|
in default behavior.
|
||
|
|
||
|
<p><em>Note:</em> a pseudo instruction represents, in this particular context, an
abstract measurement of a function's size. In no way does it represent a count
of assembly instructions, and as such its exact meaning might change from one
release to another.
|
||
|
|
||
|
<br><dt><code>-fno-keep-inline-dllexport</code><dd><a name="index-fno_002dkeep_002dinline_002ddllexport-905"></a>This is a more fine-grained version of <samp><span class="option">-fkeep-inline-functions</span></samp>,
|
||
|
which applies only to functions that are declared using the <code>dllexport</code>
|
||
|
attribute or declspec (see <a href="Function-Attributes.html#Function-Attributes">Declaring Attributes of Functions</a>).
|
||
|
|
||
|
<br><dt><code>-fkeep-inline-functions</code><dd><a name="index-fkeep_002dinline_002dfunctions-906"></a>In C, emit <code>static</code> functions that are declared <code>inline</code>
|
||
|
into the object file, even if the function has been inlined into all
|
||
|
of its callers. This switch does not affect functions using the
|
||
|
<code>extern inline</code> extension in GNU C90. In C++, emit any and all
|
||
|
inline functions into the object file.
|
||
|
|
||
|
<br><dt><code>-fkeep-static-consts</code><dd><a name="index-fkeep_002dstatic_002dconsts-907"></a>Emit variables declared <code>static const</code> when optimization isn't turned
|
||
|
on, even if the variables aren't referenced.
|
||
|
|
||
|
<p>GCC enables this option by default. If you want to force the compiler to
|
||
|
check if a variable is referenced, regardless of whether or not
|
||
|
optimization is turned on, use the <samp><span class="option">-fno-keep-static-consts</span></samp> option.
|
||
|
|
||
|
<br><dt><code>-fmerge-constants</code><dd><a name="index-fmerge_002dconstants-908"></a>Attempt to merge identical constants (string constants and floating-point
|
||
|
constants) across compilation units.
|
||
|
|
||
|
<p>This option is the default for optimized compilation if the assembler and
|
||
|
linker support it. Use <samp><span class="option">-fno-merge-constants</span></samp> to inhibit this
|
||
|
behavior.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
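<p>The effect of merging can sometimes be observed by comparing addresses, although a program should not rely on the outcome. A minimal sketch, with hypothetical function names:

```c
/* Two functions returning identical string constants. */
const char *greeting_a(void) { return "hello"; }
const char *greeting_b(void) { return "hello"; }

/* Returns 1 if both constants occupy the same storage, 0 otherwise.
   The C standard leaves this unspecified, so either result is valid. */
int same_storage(void)
{
    return greeting_a() == greeting_b();
}
```

<p>When the constants are merged, only one copy of the string needs to be present in the output; with <samp><span class="option">-fno-merge-constants</span></samp> two copies may be emitted.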
|
||
|
|
||
|
<br><dt><code>-fmerge-all-constants</code><dd><a name="index-fmerge_002dall_002dconstants-909"></a>Attempt to merge identical constants and identical variables.
|
||
|
|
||
|
<p>This option implies <samp><span class="option">-fmerge-constants</span></samp>. In addition to
|
||
|
<samp><span class="option">-fmerge-constants</span></samp>, this also considers, for example, constant-initialized
|
||
|
arrays or initialized constant variables with integral or floating-point
|
||
|
types. Languages like C or C++ require each variable, including multiple
|
||
|
instances of the same variable in recursive calls, to have distinct locations,
|
||
|
so using this option results in non-conforming
|
||
|
behavior.
|
||
|
|
||
|
<br><dt><code>-fmodulo-sched</code><dd><a name="index-fmodulo_002dsched-910"></a>Perform swing modulo scheduling immediately before the first scheduling
|
||
|
pass. This pass looks at innermost loops and reorders their
|
||
|
instructions by overlapping different iterations.
|
||
|
|
||
|
<br><dt><code>-fmodulo-sched-allow-regmoves</code><dd><a name="index-fmodulo_002dsched_002dallow_002dregmoves-911"></a>Perform more aggressive SMS-based modulo scheduling with register moves
|
||
|
allowed. By setting this flag, certain anti-dependence edges are
|
||
|
deleted, which triggers the generation of reg-moves based on the
|
||
|
life-range analysis. This option is effective only with
|
||
|
<samp><span class="option">-fmodulo-sched</span></samp> enabled.
|
||
|
|
||
|
<br><dt><code>-fno-branch-count-reg</code><dd><a name="index-fno_002dbranch_002dcount_002dreg-912"></a>Do not use “decrement and branch” instructions on a count register,
|
||
|
but instead generate a sequence of instructions that decrement a
|
||
|
register, compare it against zero, then branch based upon the result.
|
||
|
This option is only meaningful on architectures that support such
|
||
|
instructions, which include x86, PowerPC, IA-64 and S/390.
|
||
|
|
||
|
<p>Enabled by default at <samp><span class="option">-O1</span></samp> and higher.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fbranch-count-reg</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fno-function-cse</code><dd><a name="index-fno_002dfunction_002dcse-913"></a>Do not put function addresses in registers; make each instruction that
|
||
|
calls a constant function contain the function's address explicitly.
|
||
|
|
||
|
<p>This option results in less efficient code, but some strange hacks
|
||
|
that alter the assembler output may be confused by the optimizations
|
||
|
performed when this option is not used.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-ffunction-cse</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fno-zero-initialized-in-bss</code><dd><a name="index-fno_002dzero_002dinitialized_002din_002dbss-914"></a>If the target supports a BSS section, GCC by default puts variables that
|
||
|
are initialized to zero into BSS. This can save space in the resulting
|
||
|
code.
|
||
|
|
||
|
<p>This option turns off this behavior because some programs explicitly
|
||
|
rely on variables going to the data section—e.g., so that the
|
||
|
resulting executable can find the beginning of that section and/or make
|
||
|
assumptions based on that.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fzero-initialized-in-bss</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fthread-jumps</code><dd><a name="index-fthread_002djumps-915"></a>Perform optimizations that check to see if a jump branches to a
|
||
|
location where another comparison subsumed by the first is found. If
|
||
|
so, the first branch is redirected to either the destination of the
|
||
|
second branch or a point immediately following it, depending on whether
|
||
|
the condition is known to be true or false.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
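<p>The situation described above can be sketched in C as follows; the function name <code>classify</code> is hypothetical.

```c
int classify(int x)
{
    int score = 0;

    if (x > 10)
        score += 2;

    /* Whenever x > 10 held above, x > 5 is known to hold as well, so
       control coming from the first branch may jump straight to the
       addition below without repeating the comparison. */
    if (x > 5)
        score += 1;

    return score;
}
```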
|
||
|
|
||
|
<br><dt><code>-fsplit-wide-types</code><dd><a name="index-fsplit_002dwide_002dtypes-916"></a>When using a type that occupies multiple registers, such as <code>long
|
||
|
long</code> on a 32-bit system, split the registers apart and allocate them
|
||
|
independently. This normally generates better code for those types,
|
||
|
but may make debugging more difficult.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>,
|
||
|
<samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fcse-follow-jumps</code><dd><a name="index-fcse_002dfollow_002djumps-917"></a>In common subexpression elimination (CSE), scan through jump instructions
|
||
|
when the target of the jump is not reached by any other path. For
|
||
|
example, when CSE encounters an <code>if</code> statement with an
|
||
|
<code>else</code> clause, CSE follows the jump when the condition
|
||
|
tested is false.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fcse-skip-blocks</code><dd><a name="index-fcse_002dskip_002dblocks-918"></a>This is similar to <samp><span class="option">-fcse-follow-jumps</span></samp>, but causes CSE to
|
||
|
follow jumps that conditionally skip over blocks. When CSE
|
||
|
encounters a simple <code>if</code> statement with no else clause,
|
||
|
<samp><span class="option">-fcse-skip-blocks</span></samp> causes CSE to follow the jump around the
|
||
|
body of the <code>if</code>.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-frerun-cse-after-loop</code><dd><a name="index-frerun_002dcse_002dafter_002dloop-919"></a>Re-run common subexpression elimination after loop optimizations are
|
||
|
performed.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fgcse</code><dd><a name="index-fgcse-920"></a>Perform a global common subexpression elimination pass.
|
||
|
This pass also performs global constant and copy propagation.
|
||
|
|
||
|
<p><em>Note:</em> When compiling a program using computed gotos, a GCC
|
||
|
extension, you may get better run-time performance if you disable
|
||
|
the global common subexpression elimination pass by adding
|
||
|
<samp><span class="option">-fno-gcse</span></samp> to the command line.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
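<p>For reference, a program using computed gotos typically looks like the small interpreter sketched below. The opcodes and the function name <code>run</code> are hypothetical; the <code>&amp;&amp;label</code> and <code>goto *</code> forms are the GCC extension referred to in the note. The sketch does not validate opcodes, so the input must contain only the three values shown and must end with <code>OP_HALT</code>.

```c
enum { OP_INC, OP_DOUBLE, OP_HALT };

/* Executes a sequence of opcodes on an accumulator and returns it. */
int run(const unsigned char *code, int acc)
{
    /* Table of label addresses indexed by opcode (GCC extension). */
    static void *const dispatch[] = { &&do_inc, &&do_double, &&do_halt };

    goto *dispatch[*code++];

do_inc:
    acc += 1;
    goto *dispatch[*code++];

do_double:
    acc *= 2;
    goto *dispatch[*code++];

do_halt:
    return acc;
}
```

<p>Each handler ends in its own indirect jump, which is the pattern for which <samp><span class="option">-fno-gcse</span></samp> may give better run-time performance.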
|
||
|
|
||
|
<br><dt><code>-fgcse-lm</code><dd><a name="index-fgcse_002dlm-921"></a>When <samp><span class="option">-fgcse-lm</span></samp> is enabled, global common subexpression elimination
|
||
|
attempts to move loads that are only killed by stores into themselves. This
|
||
|
allows a loop containing a load/store sequence to be changed to a load outside
|
||
|
the loop, and a copy/store within the loop.
|
||
|
|
||
|
<p>Enabled by default when <samp><span class="option">-fgcse</span></samp> is enabled.
|
||
|
|
||
|
<br><dt><code>-fgcse-sm</code><dd><a name="index-fgcse_002dsm-922"></a>When <samp><span class="option">-fgcse-sm</span></samp> is enabled, a store motion pass is run after
|
||
|
global common subexpression elimination. This pass attempts to move
|
||
|
stores out of loops. When used in conjunction with <samp><span class="option">-fgcse-lm</span></samp>,
|
||
|
loops containing a load/store sequence can be changed to a load before
|
||
|
the loop and a store after the loop.
|
||
|
|
||
|
<p>Not enabled at any optimization level.
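<p>The combined effect of load motion and store motion can be pictured by writing the transformed loop by hand. In the sketch below both function names are hypothetical, and the second function shows only the intended result, under the assumption that <code>total</code> does not point into <code>values</code>.

```c
#include <stddef.h>

/* As written: *total is loaded and stored on every iteration. */
void accumulate(int *total, const int *values, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *total += values[i];
}

/* Hand-written equivalent of the optimized form. */
void accumulate_hoisted(int *total, const int *values, size_t n)
{
    int t = *total;             /* single load before the loop */
    for (size_t i = 0; i < n; i++)
        t += values[i];
    *total = t;                 /* single store after the loop */
}
```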
|
||
|
|
||
|
<br><dt><code>-fgcse-las</code><dd><a name="index-fgcse_002dlas-923"></a>When <samp><span class="option">-fgcse-las</span></samp> is enabled, the global common subexpression
|
||
|
elimination pass eliminates redundant loads that come after stores to the
|
||
|
same memory location (both partial and full redundancies).
|
||
|
|
||
|
<p>Not enabled at any optimization level.

<br><dt><code>-fgcse-after-reload</code><dd><a name="index-fgcse_002dafter_002dreload-924"></a>When <samp><span class="option">-fgcse-after-reload</span></samp> is enabled, a redundant load elimination
pass is performed after reload. The purpose of this pass is to clean up
redundant spilling.

<br><dt><code>-faggressive-loop-optimizations</code><dd><a name="index-faggressive_002dloop_002doptimizations-925"></a>This option tells the loop optimizer to use language constraints to
derive bounds for the number of iterations of a loop. This assumes that
loop code does not invoke undefined behavior by, for example, causing signed
integer overflows or out-of-bounds array accesses. The bounds for the
number of iterations of a loop are used to guide loop unrolling and peeling
and loop exit test optimizations.
This option is enabled by default.
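<p>To make the idea of a language-derived bound concrete, here is a small sketch. The names <code>table</code>, <code>N</code> and <code>sum_first</code> are hypothetical. Because reading <code>table[i]</code> with <code>i &gt;= N</code> would be undefined, an optimizer is entitled to assume that the loop body runs at most <code>N</code> times, whatever value <code>limit</code> has at run time:

```c
#define N 8

static int table[N] = {1, 2, 3, 4, 5, 6, 7, 8};

/* The caller must keep limit <= N. The array size alone lets the
   optimizer assume the body executes at most N times. */
int sum_first(int limit)
{
    int sum = 0;
    for (int i = 0; i < limit; i++)
        sum += table[i];
    return sum;
}
```

<p>Code that relies on reading past the end of an array therefore cannot expect the loop to behave as written.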

<br><dt><code>-funsafe-loop-optimizations</code><dd><a name="index-funsafe_002dloop_002doptimizations-926"></a>This option tells the loop optimizer to assume that loop indices do not
overflow, and that loops with a nontrivial exit condition are not
infinite. This enables a wider range of loop optimizations even if
the loop optimizer itself cannot prove that these assumptions are valid.
If you use <samp><span class="option">-Wunsafe-loop-optimizations</span></samp>, the compiler warns you
if it finds this kind of loop.
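<p>The following sketch shows a loop of the kind these assumptions are about; the function name is hypothetical. When the distance from <code>start</code> to <code>end</code> is odd, <code>i</code> steps over <code>end</code>, wraps around, and never becomes equal to it, so the compiler cannot prove in general that the loop terminates:

```c
/* Terminates only when (end - start) is even; otherwise the unsigned
   index wraps around and the exit condition is never met. */
unsigned count_even_steps(unsigned start, unsigned end)
{
    unsigned steps = 0;
    for (unsigned i = start; i != end; i += 2)
        steps++;
    return steps;
}
```

<p>With the option enabled, the optimizer may treat such a loop as if it always ran <code>(end - start) / 2</code> times, which is correct only for the well-behaved inputs.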

<br><dt><code>-fcrossjumping</code><dd><a name="index-fcrossjumping-927"></a>Perform cross-jumping transformation.
This transformation unifies equivalent code and saves code size. The
resulting code may or may not perform better than without cross-jumping.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
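<p>As an illustration of equivalent code that could be unified, the two branches below end in the same two statements. The names <code>record</code> and <code>handle</code> are hypothetical, and this is only a sketch of the source-level pattern; the transformation itself works on the generated instructions:

```c
static int log_total;

/* Hypothetical helper with a side effect, so the calls cannot be dropped. */
static void record(int v)
{
    log_total += v;
}

int handle(int kind, int x)
{
    int r;
    if (kind == 0) {
        r = x + 1;
        record(r);      /* identical tail ...      */
        r *= 2;
    } else {
        r = x - 1;
        record(r);      /* ... in both branches    */
        r *= 2;
    }
    return r;
}
```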

<br><dt><code>-fauto-inc-dec</code><dd><a name="index-fauto_002dinc_002ddec-928"></a>Combine increments or decrements of addresses with memory accesses.
This pass is always skipped on architectures that do not have
instructions to support this. Enabled by default at <samp><span class="option">-O</span></samp> and
higher on architectures that support this.
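<p>A byte-copy loop is a typical source pattern in which each memory access is paired with a pointer increment. The function name below is hypothetical, and the sketch says nothing about which targets provide suitable addressing modes:

```c
/* Each iteration reads through src and writes through dst, then
   advances both pointers by one element. */
void copy_bytes(char *dst, const char *src, int n)
{
    while (n-- > 0)
        *dst++ = *src++;
}
```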

<br><dt><code>-fdce</code><dd><a name="index-fdce-929"></a>Perform dead code elimination (DCE) on RTL.
Enabled by default at <samp><span class="option">-O</span></samp> and higher.

<br><dt><code>-fdse</code><dd><a name="index-fdse-930"></a>Perform dead store elimination (DSE) on RTL.
Enabled by default at <samp><span class="option">-O</span></samp> and higher.
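<p>A dead store is a write whose value is overwritten before anything reads it. The sketch below, with a hypothetical function name, shows the source-level pattern; which pass actually removes the first store in a given build is not specified here:

```c
/* The first store is dead: *p is overwritten before it is read. */
int dse_candidate(int *p)
{
    *p = 1;     /* never observed */
    *p = 2;
    return *p;
}
```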

<br><dt><code>-fif-conversion</code><dd><a name="index-fif_002dconversion-931"></a>Attempt to transform conditional jumps into branch-less equivalents. This
includes use of conditional moves, min, max, set flags and abs instructions, and
some tricks doable by standard arithmetic. The use of conditional execution
on chips where it is available is controlled by <samp><span class="option">-fif-conversion2</span></samp>.

<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
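<p>The sketch below pairs a conditional written with a branch with one hand-written arithmetic formulation of the same selection. Both function names are hypothetical, and the second function is only one possible branch-less form, not necessarily the one GCC chooses (a conditional move or a max instruction is equally plausible where available):

```c
/* Selection written with a conditional jump in the source. */
int max_int(int a, int b)
{
    int r = b;
    if (a > b)
        r = a;
    return r;
}

/* A hand-written branch-less equivalent using plain arithmetic. */
int max_int_branchless(int a, int b)
{
    int take_a = a > b;                     /* 1 or 0 */
    return take_a * a + (1 - take_a) * b;   /* exactly one term is nonzero */
}
```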

<br><dt><code>-fif-conversion2</code><dd><a name="index-fif_002dconversion2-932"></a>Use conditional execution (where available) to transform conditional jumps into
branch-less equivalents.

<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fdeclone-ctor-dtor</code><dd><a name="index-fdeclone_002dctor_002ddtor-933"></a>The C++ ABI requires multiple entry points for constructors and
destructors: one for a base subobject, one for a complete object, and
one for a virtual destructor that calls operator delete afterwards.
For a hierarchy with virtual bases, the base and complete variants are
clones, which means two copies of the function. With this option, the
base and complete variants are changed to be thunks that call a common
implementation.

<p>Enabled by <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fdelete-null-pointer-checks</code><dd><a name="index-fdelete_002dnull_002dpointer_002dchecks-934"></a>Assume that programs cannot safely dereference null pointers, and that
no code or data element resides there. This enables simple constant
folding optimizations at all optimization levels. In addition, other
optimization passes in GCC use this flag to control global dataflow
analyses that eliminate useless checks for null pointers; these assume
that if a pointer is checked after it has already been dereferenced,
it cannot be null.

<p>Note, however, that in some environments this assumption is not true.
Use <samp><span class="option">-fno-delete-null-pointer-checks</span></samp> to disable this optimization
for programs that depend on that behavior.

<p>Some targets, especially embedded ones, disable this option at all levels.
Otherwise it is enabled at all levels: <samp><span class="option">-O0</span></samp>, <samp><span class="option">-O1</span></samp>,
<samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>. Passes that use the information
are enabled independently at different optimization levels.
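<p>The "checked after it has already been dereferenced" case can be sketched as follows. The type and function names are hypothetical. Because <code>n-&gt;value</code> is read first, an optimizer working under this assumption may treat the later comparison with <code>NULL</code> as always false and drop the early return:

```c
#include <stddef.h>

struct node {
    int value;
};

/* The dereference comes before the check, so under the option's
   assumption the check can never succeed. */
int read_value(struct node *n)
{
    int v = n->value;   /* dereference first ...              */
    if (n == NULL)      /* ... so this test may be eliminated */
        return -1;
    return v;
}
```

<p>Moving the check ahead of the dereference keeps it meaningful regardless of the option.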

<br><dt><code>-fdevirtualize</code><dd><a name="index-fdevirtualize-935"></a>Attempt to convert calls to virtual functions to direct calls. This
is done both within a procedure and interprocedurally as part of
indirect inlining (<samp><span class="option">-findirect-inlining</span></samp>) and interprocedural constant
propagation (<samp><span class="option">-fipa-cp</span></samp>).
Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fdevirtualize-speculatively</code><dd><a name="index-fdevirtualize_002dspeculatively-936"></a>Attempt to convert calls to virtual functions to speculative direct calls.
Based on the analysis of the type inheritance graph, determine for a given call
the set of likely targets. If the set is small, preferably of size 1, change
the call into a conditional deciding between direct and indirect calls. The
speculative calls enable more optimizations, such as inlining. When they seem
useless after further optimization, they are converted back into their original form.

<br><dt><code>-fdevirtualize-at-ltrans</code><dd><a name="index-fdevirtualize_002dat_002dltrans-937"></a>Stream extra information needed for aggressive devirtualization when running
the link-time optimizer in local transformation mode.
This option enables more devirtualization but
significantly increases the size of streamed data. For this reason it is
disabled by default.

<br><dt><code>-fexpensive-optimizations</code><dd><a name="index-fexpensive_002doptimizations-938"></a>Perform a number of minor optimizations that are relatively expensive.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-free</code><dd><a name="index-free-939"></a>Attempt to remove redundant extension instructions. This is especially
helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
registers after writing to their lower 32-bit half.

<p>Enabled for Alpha, AArch64 and x86 at levels <samp><span class="option">-O2</span></samp>,
<samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
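<p>A widening conversion right after 32-bit arithmetic is the source pattern in which such an extension can become redundant. The function name below is hypothetical, and the sketch only shows where an explicit zero extension would appear in the source:

```c
#include <stdint.h>

/* The 32-bit sum is converted to 64 bits. On a target whose 32-bit
   writes already clear the upper half, no extra instruction is needed. */
uint64_t widen_sum(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;     /* 32-bit add, wraps modulo 2^32 */
    return (uint64_t)s;     /* zero extension to 64 bits */
}
```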

<br><dt><code>-fno-lifetime-dse</code><dd><a name="index-fno_002dlifetime_002ddse-940"></a>In C++ the value of an object is only affected by changes within its
lifetime: when the constructor begins, the object has an indeterminate
value, and any changes during the lifetime of the object are dead when
the object is destroyed. Normally dead store elimination will take
advantage of this; if your code relies on the value of the object
storage persisting beyond the lifetime of the object, you can use this
flag to disable this optimization.

<br><dt><code>-flive-range-shrinkage</code><dd><a name="index-flive_002drange_002dshrinkage-941"></a>Attempt to decrease register pressure through register live range
shrinkage. This is helpful for fast processors with small or moderate
size register sets.

<br><dt><code>-fira-algorithm=</code><var>algorithm</var><dd><a name="index-fira_002dalgorithm-942"></a>Use the specified coloring algorithm for the integrated register
allocator. The <var>algorithm</var> argument can be ‘<samp><span class="samp">priority</span></samp>’, which
specifies Chow's priority coloring, or ‘<samp><span class="samp">CB</span></samp>’, which specifies
Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented
for all architectures, but for those targets that do support it, it is
the default because it generates better code.

<br><dt><code>-fira-region=</code><var>region</var><dd><a name="index-fira_002dregion-943"></a>Use the specified regions for the integrated register allocator. The
<var>region</var> argument should be one of the following:

<dl>
<dt>‘<samp><span class="samp">all</span></samp>’<dd>Use all loops as register allocation regions.
This can give the best results for machines with a small and/or
irregular register set.

<br><dt>‘<samp><span class="samp">mixed</span></samp>’<dd>Use all loops except for loops with small register pressure
as the regions. This value gives
the best results in most cases and for most architectures,
and is enabled by default when compiling with optimization for speed
(<samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <small class="dots">...</small>).

<br><dt>‘<samp><span class="samp">one</span></samp>’<dd>Use all functions as a single region.
This typically results in the smallest code size, and is enabled by default for
<samp><span class="option">-Os</span></samp> or <samp><span class="option">-O0</span></samp>.

</dl>

<br><dt><code>-fira-hoist-pressure</code><dd><a name="index-fira_002dhoist_002dpressure-944"></a>Use IRA to evaluate register pressure in the code hoisting pass for
decisions to hoist expressions. This option usually results in smaller
code, but it can slow the compiler down.

<p>This option is enabled at level <samp><span class="option">-Os</span></samp> for all targets.

<br><dt><code>-fira-loop-pressure</code><dd><a name="index-fira_002dloop_002dpressure-945"></a>Use IRA to evaluate register pressure in loops for decisions to move
loop invariants. This option usually results in generation
of faster and smaller code on machines with large register files (>= 32
registers), but it can slow the compiler down.

<p>This option is enabled at level <samp><span class="option">-O3</span></samp> for some targets.

<br><dt><code>-fno-ira-share-save-slots</code><dd><a name="index-fno_002dira_002dshare_002dsave_002dslots-946"></a>Disable sharing of stack slots used for saving call-used hard
registers living through a call. Each hard register gets a
separate stack slot, and as a result function stack frames are
larger.

<br><dt><code>-fno-ira-share-spill-slots</code><dd><a name="index-fno_002dira_002dshare_002dspill_002dslots-947"></a>Disable sharing of stack slots allocated for pseudo-registers. Each
pseudo-register that does not get a hard register gets a separate
stack slot, and as a result function stack frames are larger.

<br><dt><code>-fira-verbose=</code><var>n</var><dd><a name="index-fira_002dverbose-948"></a>Control the verbosity of the dump file for the integrated register allocator.
The default value is 5. If the value <var>n</var> is greater than or equal to 10,
the dump output is sent to stderr using the same format as <var>n</var> minus 10.

<br><dt><code>-flra-remat</code><dd><a name="index-flra_002dremat-949"></a>Enable CFG-sensitive rematerialization in LRA. Instead of loading
values of spilled pseudos, LRA tries to rematerialize (recalculate)
values if it is profitable.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fdelayed-branch</code><dd><a name="index-fdelayed_002dbranch-950"></a>If supported for the target machine, attempt to reorder instructions
to exploit instruction slots available after delayed branch
instructions.

<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fschedule-insns</code><dd><a name="index-fschedule_002dinsns-951"></a>If supported for the target machine, attempt to reorder instructions to
eliminate execution stalls due to required data being unavailable. This
helps machines that have slow floating point or memory load instructions
by allowing other instructions to be issued until the result of the load
or floating-point instruction is required.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.

<br><dt><code>-fschedule-insns2</code><dd><a name="index-fschedule_002dinsns2-952"></a>Similar to <samp><span class="option">-fschedule-insns</span></samp>, but requests an additional pass of
instruction scheduling after register allocation has been done. This is
especially useful on machines with a relatively small number of
registers and where memory load instructions take more than one cycle.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fno-sched-interblock</code><dd><a name="index-fno_002dsched_002dinterblock-953"></a>Don't schedule instructions across basic blocks. This is normally
enabled by default when scheduling before register allocation, i.e.
with <samp><span class="option">-fschedule-insns</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fno-sched-spec</code><dd><a name="index-fno_002dsched_002dspec-954"></a>Don't allow speculative motion of non-load instructions. This is normally
enabled by default when scheduling before register allocation, i.e.
with <samp><span class="option">-fschedule-insns</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-pressure</code><dd><a name="index-fsched_002dpressure-955"></a>Enable register pressure sensitive insn scheduling before register
allocation. This only makes sense when scheduling before register
allocation is enabled, i.e. with <samp><span class="option">-fschedule-insns</span></samp> or at
<samp><span class="option">-O2</span></samp> or higher. Usage of this option can improve the
generated code and decrease its size by preventing register pressure
increase above the number of available hard registers and subsequent
spills in register allocation.

<br><dt><code>-fsched-spec-load</code><dd><a name="index-fsched_002dspec_002dload-956"></a>Allow speculative motion of some load instructions. This only makes
sense when scheduling before register allocation, i.e. with
<samp><span class="option">-fschedule-insns</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-spec-load-dangerous</code><dd><a name="index-fsched_002dspec_002dload_002ddangerous-957"></a>Allow speculative motion of more load instructions. This only makes
sense when scheduling before register allocation, i.e. with
<samp><span class="option">-fschedule-insns</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-stalled-insns</code><dt><code>-fsched-stalled-insns=</code><var>n</var><dd><a name="index-fsched_002dstalled_002dinsns-958"></a>Define how many insns (if any) can be moved prematurely from the queue
of stalled insns into the ready list during the second scheduling pass.
<samp><span class="option">-fno-sched-stalled-insns</span></samp> means that no insns are moved
prematurely, <samp><span class="option">-fsched-stalled-insns=0</span></samp> means there is no limit
on how many queued insns can be moved prematurely.
<samp><span class="option">-fsched-stalled-insns</span></samp> without a value is equivalent to
<samp><span class="option">-fsched-stalled-insns=1</span></samp>.

<br><dt><code>-fsched-stalled-insns-dep</code><dt><code>-fsched-stalled-insns-dep=</code><var>n</var><dd><a name="index-fsched_002dstalled_002dinsns_002ddep-959"></a>Define how many insn groups (cycles) are examined for a dependency
on a stalled insn that is a candidate for premature removal from the queue
of stalled insns. This has an effect only during the second scheduling pass,
and only if <samp><span class="option">-fsched-stalled-insns</span></samp> is used.
<samp><span class="option">-fno-sched-stalled-insns-dep</span></samp> is equivalent to
<samp><span class="option">-fsched-stalled-insns-dep=0</span></samp>.
<samp><span class="option">-fsched-stalled-insns-dep</span></samp> without a value is equivalent to
<samp><span class="option">-fsched-stalled-insns-dep=1</span></samp>.

<br><dt><code>-fsched2-use-superblocks</code><dd><a name="index-fsched2_002duse_002dsuperblocks-960"></a>When scheduling after register allocation, use superblock scheduling.
This allows motion across basic block boundaries,
resulting in faster schedules. This option is experimental, as not all machine
descriptions used by GCC model the CPU closely enough to avoid unreliable
results from the algorithm.

<p>This only makes sense when scheduling after register allocation, i.e. with
<samp><span class="option">-fschedule-insns2</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-group-heuristic</code><dd><a name="index-fsched_002dgroup_002dheuristic-961"></a>Enable the group heuristic in the scheduler. This heuristic favors
the instruction that belongs to a schedule group. This is enabled
by default when scheduling is enabled, i.e. with <samp><span class="option">-fschedule-insns</span></samp>
or <samp><span class="option">-fschedule-insns2</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-critical-path-heuristic</code><dd><a name="index-fsched_002dcritical_002dpath_002dheuristic-962"></a>Enable the critical-path heuristic in the scheduler. This heuristic favors
instructions on the critical path. This is enabled by default when
scheduling is enabled, i.e. with <samp><span class="option">-fschedule-insns</span></samp>
or <samp><span class="option">-fschedule-insns2</span></samp> or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-spec-insn-heuristic</code><dd><a name="index-fsched_002dspec_002dinsn_002dheuristic-963"></a>Enable the speculative instruction heuristic in the scheduler. This
heuristic favors speculative instructions with greater dependency weakness.
This is enabled by default when scheduling is enabled, i.e.
with <samp><span class="option">-fschedule-insns</span></samp> or <samp><span class="option">-fschedule-insns2</span></samp>
or at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-rank-heuristic</code><dd><a name="index-fsched_002drank_002dheuristic-964"></a>Enable the rank heuristic in the scheduler. This heuristic favors
the instruction belonging to a basic block with greater size or frequency.
This is enabled by default when scheduling is enabled, i.e.
with <samp><span class="option">-fschedule-insns</span></samp> or <samp><span class="option">-fschedule-insns2</span></samp> or
at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-last-insn-heuristic</code><dd><a name="index-fsched_002dlast_002dinsn_002dheuristic-965"></a>Enable the last-instruction heuristic in the scheduler. This heuristic
favors the instruction that is less dependent on the last instruction
scheduled. This is enabled by default when scheduling is enabled,
i.e. with <samp><span class="option">-fschedule-insns</span></samp> or <samp><span class="option">-fschedule-insns2</span></samp> or
at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-fsched-dep-count-heuristic</code><dd><a name="index-fsched_002ddep_002dcount_002dheuristic-966"></a>Enable the dependent-count heuristic in the scheduler. This heuristic
favors the instruction that has more instructions depending on it.
This is enabled by default when scheduling is enabled, i.e.
with <samp><span class="option">-fschedule-insns</span></samp> or <samp><span class="option">-fschedule-insns2</span></samp> or
at <samp><span class="option">-O2</span></samp> or higher.

<br><dt><code>-freschedule-modulo-scheduled-loops</code><dd><a name="index-freschedule_002dmodulo_002dscheduled_002dloops-967"></a>Modulo scheduling is performed before traditional scheduling. If a loop
is modulo scheduled, later scheduling passes may change its schedule.
Use this option to control that behavior.

<br><dt><code>-fselective-scheduling</code><dd><a name="index-fselective_002dscheduling-968"></a>Schedule instructions using the selective scheduling algorithm. Selective
scheduling runs instead of the first scheduler pass.

<br><dt><code>-fselective-scheduling2</code><dd><a name="index-fselective_002dscheduling2-969"></a>Schedule instructions using the selective scheduling algorithm. Selective
scheduling runs instead of the second scheduler pass.

<br><dt><code>-fsel-sched-pipelining</code><dd><a name="index-fsel_002dsched_002dpipelining-970"></a>Enable software pipelining of innermost loops during selective scheduling.
This option has no effect unless one of <samp><span class="option">-fselective-scheduling</span></samp> or
<samp><span class="option">-fselective-scheduling2</span></samp> is turned on.

<br><dt><code>-fsel-sched-pipelining-outer-loops</code><dd><a name="index-fsel_002dsched_002dpipelining_002douter_002dloops-971"></a>When pipelining loops during selective scheduling, also pipeline outer loops.
This option has no effect unless <samp><span class="option">-fsel-sched-pipelining</span></samp> is turned on.

<br><dt><code>-fsemantic-interposition</code><dd><a name="index-fsemantic_002dinterposition-972"></a>Some object formats, like ELF, allow interposing of symbols by the
dynamic linker.
This means that for symbols exported from the DSO, the compiler cannot perform
interprocedural propagation, inlining and other optimizations in anticipation
that the function or variable in question may change. While this feature is
useful, for example, to replace memory allocation functions with a debugging
implementation, it is expensive in terms of code quality.
With <samp><span class="option">-fno-semantic-interposition</span></samp> the compiler assumes that
if interposition happens for functions, the overwriting function has
precisely the same semantics (and side effects).
Similarly, if interposition happens
for variables, the constructor of the variable is the same. The flag
has no effect for functions explicitly declared inline
(where it is never allowed for interposition to change semantics)
and for symbols explicitly declared weak.
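<p>The situation can be sketched with two functions in one translation unit; both names are hypothetical. When this code is built into a shared object, <code>scale_factor</code> is an exported symbol, so a different definition could be interposed at dynamic link time and the call inside <code>scaled</code> cannot simply be replaced by the constant:

```c
/* Exported function: in a shared object on an ELF system, another
   definition of this symbol could be interposed at run time. */
int scale_factor(void)
{
    return 3;
}

/* With interposition respected, this call must go through the symbol.
   If the compiler may assume identical semantics, it can inline the
   callee and fold the multiplication. */
int scaled(int x)
{
    return x * scale_factor();
}
```

<p>Declaring the callee <code>static</code>, where the interface allows it, sidesteps the question because the symbol is then not exported at all.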

<br><dt><code>-fshrink-wrap</code><dd><a name="index-fshrink_002dwrap-973"></a>Emit function prologues only before parts of the function that need it,
rather than at the top of the function. This flag is enabled by default at
<samp><span class="option">-O</span></samp> and higher.
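<p>A function with a cheap early exit is the usual beneficiary. In the sketch below (hypothetical names), the early-return path needs no saved registers or stack frame, so the prologue only has to run on the path that reaches the loop. Whether a prologue is needed at all depends on the target and on inlining decisions:

```c
/* Hypothetical helper; it may or may not be inlined. */
int weight(int x)
{
    return x * x + 1;
}

int total_weight(const int *data, int n)
{
    if (data == 0 || n <= 0)    /* fast path: nothing to set up */
        return 0;

    int sum = 0;                /* slow path: loop and calls */
    for (int i = 0; i < n; i++)
        sum += weight(data[i]);
    return sum;
}
```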

<br><dt><code>-fcaller-saves</code><dd><a name="index-fcaller_002dsaves-974"></a>Enable allocation of values to registers that are clobbered by
function calls, by emitting extra instructions to save and restore the
registers around such calls. Such allocation is done only when it
seems to result in better code.

<p>This option is always enabled by default on certain machines, usually
those which have no call-preserved registers to use instead.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fcombine-stack-adjustments</code><dd><a name="index-fcombine_002dstack_002dadjustments-975"></a>Tracks stack adjustments (pushes and pops) and stack memory references
and then tries to find ways to combine them.

<p>Enabled by default at <samp><span class="option">-O1</span></samp> and higher.

<br><dt><code>-fipa-ra</code><dd><a name="index-fipa_002dra-976"></a>Use caller-saved registers for allocation if those registers are not used by
any called function. In that case it is not necessary to save and restore
them around calls. This is only possible if the called functions are part of
the same compilation unit as the current function and they are compiled before it.

<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.

<br><dt><code>-fconserve-stack</code><dd><a name="index-fconserve_002dstack-977"></a>Attempt to minimize stack usage. The compiler attempts to use less
stack space, even if that makes the program slower. This option
implies setting the <samp><span class="option">large-stack-frame</span></samp> parameter to 100
and the <samp><span class="option">large-stack-frame-growth</span></samp> parameter to 400.

<br><dt><code>-ftree-reassoc</code><dd><a name="index-ftree_002dreassoc-978"></a>Perform reassociation on trees. This flag is enabled by default
at <samp><span class="option">-O</span></samp> and higher.

<br><dt><code>-ftree-pre</code><dd><a name="index-ftree_002dpre-979"></a>Perform partial redundancy elimination (PRE) on trees. This flag is
enabled by default at <samp><span class="option">-O2</span></samp> and <samp><span class="option">-O3</span></samp>.

<br><dt><code>-ftree-partial-pre</code><dd><a name="index-ftree_002dpartial_002dpre-980"></a>Make partial redundancy elimination (PRE) more aggressive. This flag is
enabled by default at <samp><span class="option">-O3</span></samp>.

<br><dt><code>-ftree-forwprop</code><dd><a name="index-ftree_002dforwprop-981"></a>Perform forward propagation on trees. This flag is enabled by default
at <samp><span class="option">-O</span></samp> and higher.

<br><dt><code>-ftree-fre</code><dd><a name="index-ftree_002dfre-982"></a>Perform full redundancy elimination (FRE) on trees. The difference
between FRE and PRE is that FRE only considers expressions
that are computed on all paths leading to the redundant computation.
This analysis is faster than PRE, though it exposes fewer redundancies.
This flag is enabled by default at <samp><span class="option">-O</span></samp> and higher.
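<p>The distinction between full and partial redundancy can be sketched with two small functions; both names are hypothetical. In the first, <code>a + b</code> has been computed on every path before it is needed again. In the second, it has been computed only on the path where <code>flag</code> is nonzero:

```c
/* Fully redundant: a + b is available on all paths to the second use. */
int fre_case(int a, int b)
{
    int x = a + b;
    int y = a + b;          /* same value as x on every path */
    return x * y;
}

/* Partially redundant: a + b is available only when flag is nonzero. */
int pre_case(int a, int b, int flag)
{
    int x = 0;
    if (flag)
        x = a + b;
    return x + (a + b);     /* redundant on one path, needed on the other */
}
```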

<br><dt><code>-ftree-phiprop</code><dd><a name="index-ftree_002dphiprop-983"></a>Perform hoisting of loads from conditional pointers on trees. This
pass is enabled by default at <samp><span class="option">-O</span></samp> and higher.

<br><dt><code>-fhoist-adjacent-loads</code><dd><a name="index-fhoist_002dadjacent_002dloads-984"></a>Speculatively hoist loads from both branches of an if-then-else if the
loads are from adjacent locations in the same structure and the target
architecture has a conditional move instruction. This flag is enabled
by default at <samp><span class="option">-O2</span></samp> and higher.
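<p>A sketch of the pattern described above, using a hypothetical structure and function: the two branches load from neighbouring fields of the same object, so both loads could be issued before the test and the result selected afterwards:

```c
struct pair {
    int lo;
    int hi;     /* adjacent to lo in the same structure */
};

/* Each branch loads one of two adjacent fields. */
int pick(const struct pair *p, int use_hi)
{
    if (use_hi)
        return p->hi;
    else
        return p->lo;
}
```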

<br><dt><code>-ftree-copy-prop</code><dd><a name="index-ftree_002dcopy_002dprop-985"></a>Perform copy propagation on trees. This pass eliminates unnecessary
copy operations. This flag is enabled by default at <samp><span class="option">-O</span></samp> and
higher.

<br><dt><code>-fipa-pure-const</code><dd><a name="index-fipa_002dpure_002dconst-986"></a>Discover which functions are pure or constant.
Enabled by default at <samp><span class="option">-O</span></samp> and higher.
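<p>For orientation, the three hypothetical functions below fall into the categories this analysis distinguishes, following the usual meaning of GCC's <code>const</code> and <code>pure</code> function attributes. The classification in the comments is what one would expect, not a statement of what a particular GCC version reports:

```c
static int counter;

/* Constant-like: the result depends only on the argument. */
int square(int x)
{
    return x * x;
}

/* Pure-like: reads global state but does not modify anything. */
int peek_counter(void)
{
    return counter;
}

/* Neither: modifies global state, so calls cannot be merged or removed. */
int next_id(void)
{
    return ++counter;
}
```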
|
||
|
|
||
|
<br><dt><code>-fipa-reference</code><dd><a name="index-fipa_002dreference-987"></a>Discover which static variables do not escape the
|
||
|
compilation unit.
|
||
|
Enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-fipa-pta</code><dd><a name="index-fipa_002dpta-988"></a>Perform interprocedural pointer analysis and interprocedural modification
|
||
|
and reference analysis. This option can cause excessive memory and
|
||
|
compile-time usage on large compilation units. It is not enabled by
|
||
|
default at any optimization level.
|
||
|
|
||
|
<br><dt><code>-fipa-profile</code><dd><a name="index-fipa_002dprofile-989"></a>Perform interprocedural profile propagation. The functions called only from
|
||
|
cold functions are marked as cold. Also functions executed once (such as
|
||
|
<code>cold</code>, <code>noreturn</code>, static constructors or destructors) are identified. Cold
|
||
|
functions and loop less parts of functions executed once are then optimized for
|
||
|
size.
|
||
|
Enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-fipa-cp</code><dd><a name="index-fipa_002dcp-990"></a>Perform interprocedural constant propagation.
|
||
|
This optimization analyzes the program to determine when values passed
|
||
|
to functions are constants and then optimizes accordingly.
|
||
|
This optimization can substantially increase performance
|
||
|
if the application has constants passed to functions.
|
||
|
This flag is enabled by default at <samp><span class="option">-O2</span></samp>, <samp><span class="option">-Os</span></samp> and <samp><span class="option">-O3</span></samp>.
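<p>A minimal sketch of the kind of code that can benefit, assuming a hypothetical helper <code>scale</code> whose callers all pass the same constant:

```c
/* Hypothetical helper: every caller in this file passes factor == 4. */
static int scale(int x, int factor)
{
    return x * factor;
}

int scale_pair(int a, int b)
{
    /* With interprocedural constant propagation the compiler may treat
       factor as the constant 4 inside scale(). */
    return scale(a, 4) + scale(b, 4);
}
```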
|
||
|
|
||
|
<br><dt><code>-fipa-cp-clone</code><dd><a name="index-fipa_002dcp_002dclone-991"></a>Perform function cloning to make interprocedural constant propagation stronger.
|
||
|
When enabled, interprocedural constant propagation performs function cloning
|
||
|
when an externally visible function can be called with constant arguments.
|
||
|
Because this optimization can create multiple copies of functions,
|
||
|
it may significantly increase code size
|
||
|
(see <samp><span class="option">--param ipcp-unit-growth=</span><var>value</var></samp>).
|
||
|
This flag is enabled by default at <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fipa-cp-alignment</code><dd><a name="index-g_t_002dfipa_002dcp_002dalignment-992"></a>When enabled, this optimization propagates alignment of function
|
||
|
parameters to support better vectorization and string operations.
|
||
|
|
||
|
<p>This flag is enabled by default at <samp><span class="option">-O2</span></samp> and <samp><span class="option">-Os</span></samp>. It
|
||
|
requires that <samp><span class="option">-fipa-cp</span></samp> is enabled.
|
||
|
|
||
|
<br><dt><code>-fipa-icf</code><dd><a name="index-fipa_002dicf-993"></a>Perform Identical Code Folding for functions and read-only variables.
|
||
|
The optimization reduces code size and may disturb unwind stacks by replacing
|
||
|
a function by an equivalent one with a different name. The optimization works
|
||
|
more effectively with link time optimization enabled.
|
||
|
|
||
|
<p>Although the behavior is similar to the Gold linker's ICF optimization, GCC ICF
works on different levels, and thus the optimizations are not the same: there are
equivalences that are found only by GCC and equivalences found only by Gold.
|
||
|
|
||
|
<p>This flag is enabled by default at <samp><span class="option">-O2</span></samp> and <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fisolate-erroneous-paths-dereference</code><dd><a name="index-fisolate_002derroneous_002dpaths_002ddereference-994"></a>Detect paths that trigger erroneous or undefined behavior due to
|
||
|
dereferencing a null pointer. Isolate those paths from the main control
|
||
|
flow and turn the statement with erroneous or undefined behavior into a trap.
|
||
|
This flag is enabled by default at <samp><span class="option">-O2</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-fisolate-erroneous-paths-attribute</code><dd><a name="index-fisolate_002derroneous_002dpaths_002dattribute-995"></a>Detect paths that trigger erroneous or undefined behavior due to a null value
|
||
|
being used in a way forbidden by a <code>returns_nonnull</code> or <code>nonnull</code>
|
||
|
attribute. Isolate those paths from the main control flow and turn the
|
||
|
statement with erroneous or undefined behavior into a trap. This is not
|
||
|
currently enabled, but may be enabled by <samp><span class="option">-O2</span></samp> in the future.
|
||
|
|
||
|
<br><dt><code>-ftree-sink</code><dd><a name="index-ftree_002dsink-996"></a>Perform forward store motion on trees. This flag is
|
||
|
enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-bit-ccp</code><dd><a name="index-ftree_002dbit_002dccp-997"></a>Perform sparse conditional bit constant propagation on trees and propagate
|
||
|
pointer alignment information.
|
||
|
This pass only operates on local scalar variables and is enabled by default
|
||
|
at <samp><span class="option">-O</span></samp> and higher. It requires that <samp><span class="option">-ftree-ccp</span></samp> is enabled.
|
||
|
|
||
|
<br><dt><code>-ftree-ccp</code><dd><a name="index-ftree_002dccp-998"></a>Perform sparse conditional constant propagation (CCP) on trees. This
|
||
|
pass only operates on local scalar variables and is enabled by default
|
||
|
at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-fssa-phiopt</code><dd><a name="index-fssa_002dphiopt-999"></a>Perform pattern matching on SSA PHI nodes to optimize conditional
|
||
|
code. This pass is enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-switch-conversion</code><dd><a name="index-ftree_002dswitch_002dconversion-1000"></a>Perform conversion of simple initializations in a switch to
|
||
|
initializations from a scalar array. This flag is enabled by default
|
||
|
at <samp><span class="option">-O2</span></samp> and higher.
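<p>The following sketch shows the shape of code involved. The function names are illustrative, and the second function is only a hand-written approximation of the table-driven form, not the compiler's actual output.

```c
/* A switch whose cases only assign constants to one variable. */
int days_switch(int month)
{
    int days;
    switch (month) {
    case 0: days = 31; break;
    case 1: days = 28; break;
    case 2: days = 31; break;
    case 3: days = 30; break;
    default: days = 0; break;
    }
    return days;
}

/* Roughly equivalent form that initializes from a scalar array. */
int days_table(int month)
{
    static const int table[4] = { 31, 28, 31, 30 };
    return (month >= 0 && month < 4) ? table[month] : 0;
}
```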
|
||
|
|
||
|
<br><dt><code>-ftree-tail-merge</code><dd><a name="index-ftree_002dtail_002dmerge-1001"></a>Look for identical code sequences. When found, replace one with a jump to the
|
||
|
other. This optimization is known as tail merging or cross jumping. This flag
|
||
|
is enabled by default at <samp><span class="option">-O2</span></samp> and higher. The compilation time
|
||
|
in this pass can
|
||
|
be limited using the <samp><span class="option">max-tail-merge-comparisons</span></samp> and
<samp><span class="option">max-tail-merge-iterations</span></samp> parameters.
|
||
|
|
||
|
<br><dt><code>-ftree-dce</code><dd><a name="index-ftree_002ddce-1002"></a>Perform dead code elimination (DCE) on trees. This flag is enabled by
|
||
|
default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-builtin-call-dce</code><dd><a name="index-ftree_002dbuiltin_002dcall_002ddce-1003"></a>Perform conditional dead code elimination (DCE) for calls to built-in functions
|
||
|
that may set <code>errno</code> but are otherwise side-effect free. This flag is
|
||
|
enabled by default at <samp><span class="option">-O2</span></samp> and higher if <samp><span class="option">-Os</span></samp> is not also
|
||
|
specified.
|
||
|
|
||
|
<br><dt><code>-ftree-dominator-opts</code><dd><a name="index-ftree_002ddominator_002dopts-1004"></a>Perform a variety of simple scalar cleanups (constant/copy
|
||
|
propagation, redundancy elimination, range propagation and expression
|
||
|
simplification) based on a dominator tree traversal. This also
|
||
|
performs jump threading (to reduce jumps to jumps). This flag is
|
||
|
enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-dse</code><dd><a name="index-ftree_002ddse-1005"></a>Perform dead store elimination (DSE) on trees. A dead store is a store into
|
||
|
a memory location that is later overwritten by another store without
|
||
|
any intervening loads. In this case the earlier store can be deleted. This
|
||
|
flag is enabled by default at <samp><span class="option">-O</span></samp> and higher.
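<p>For instance, in the following sketch (the structure and function names are hypothetical) the first store to <code>p-&gt;x</code> is dead in the sense described above:

```c
struct point { int x; int y; };

void init_point(struct point *p)
{
    p->x = 0;   /* dead store: overwritten below with no load in between */
    p->y = 0;
    p->x = 5;   /* only this store to p->x needs to remain */
}
```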
|
||
|
|
||
|
<br><dt><code>-ftree-ch</code><dd><a name="index-ftree_002dch-1006"></a>Perform loop header copying on trees. This is beneficial since it increases
|
||
|
effectiveness of code motion optimizations. It also saves one jump. This flag
|
||
|
is enabled by default at <samp><span class="option">-O</span></samp> and higher. It is not enabled
|
||
|
for <samp><span class="option">-Os</span></samp>, since it usually increases code size.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-optimize</code><dd><a name="index-ftree_002dloop_002doptimize-1007"></a>Perform loop optimizations on trees. This flag is enabled by default
|
||
|
at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-linear</code><dd><a name="index-ftree_002dloop_002dlinear-1008"></a>Perform loop interchange transformations on trees. Same as
|
||
|
<samp><span class="option">-floop-interchange</span></samp>. To use this code transformation, GCC has
|
||
|
to be configured with <samp><span class="option">--with-isl</span></samp> to enable the Graphite loop
|
||
|
transformation infrastructure.
|
||
|
|
||
|
<br><dt><code>-floop-interchange</code><dd><a name="index-floop_002dinterchange-1009"></a>Perform loop interchange transformations on loops. Interchanging two
|
||
|
nested loops switches the inner and outer loops. For example, given a
|
||
|
loop like:
|
||
|
<pre class="smallexample">     DO J = 1, M
       DO I = 1, N
         A(J, I) = A(J, I) * C
       ENDDO
     ENDDO
</pre>
|
||
|
<p class="noindent">loop interchange transforms the loop as if it were written:
|
||
|
<pre class="smallexample">     DO I = 1, N
       DO J = 1, M
         A(J, I) = A(J, I) * C
       ENDDO
     ENDDO
</pre>
|
||
|
<p>which can be beneficial when <code>N</code> is larger than the caches,
|
||
|
because in Fortran, the elements of an array are stored in memory
|
||
|
contiguously by column, and the original loop iterates over rows,
|
||
|
potentially creating at each access a cache miss. This optimization
|
||
|
applies to all the languages supported by GCC and is not limited to
|
||
|
Fortran. To use this code transformation, GCC has to be configured
|
||
|
with <samp><span class="option">--with-isl</span></samp> to enable the Graphite loop transformation
|
||
|
infrastructure.
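<p>The same idea carries over to C, where arrays are stored row by row rather than column by column, so the roles of the two indices are reversed. The following is a hand-written sketch; the function names and array sizes are illustrative and not taken from GCC.

```c
#define ROWS 4
#define COLS 3

/* Inner loop walks down a column: consecutive accesses are COLS ints apart. */
void scale_column_first(int a[ROWS][COLS], int c)
{
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            a[i][j] *= c;
}

/* Interchanged form: the inner loop walks along a row, which is contiguous in C. */
void scale_row_first(int a[ROWS][COLS], int c)
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] *= c;
}
```

<p>Both functions compute the same result; only the order of memory accesses differs.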
|
||
|
|
||
|
<br><dt><code>-floop-strip-mine</code><dd><a name="index-floop_002dstrip_002dmine-1010"></a>Perform loop strip mining transformations on loops. Strip mining
|
||
|
splits a loop into two nested loops. The outer loop has strides
|
||
|
equal to the strip size and the inner loop has strides of the
|
||
|
original loop within a strip. The strip length can be changed
|
||
|
using the <samp><span class="option">loop-block-tile-size</span></samp> parameter. For example,
|
||
|
given a loop like:
|
||
|
<pre class="smallexample">     DO I = 1, N
       A(I) = A(I) + C
     ENDDO
</pre>
|
||
|
<p class="noindent">loop strip mining transforms the loop as if it were written:
|
||
|
<pre class="smallexample">     DO II = 1, N, 51
       DO I = II, min (II + 50, N)
         A(I) = A(I) + C
       ENDDO
     ENDDO
</pre>
|
||
|
<p>This optimization applies to all the languages supported by GCC and is
|
||
|
not limited to Fortran. To use this code transformation, GCC has to
|
||
|
be configured with <samp><span class="option">--with-isl</span></samp> to enable the Graphite loop
|
||
|
transformation infrastructure.
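<p>A C rendering of the same transformation might look as follows. This is a hand-written sketch with zero-based indexing; the function names are hypothetical and the strip size of 51 is simply carried over from the Fortran example.

```c
#define STRIP 51  /* strip size taken from the Fortran example */

/* Original loop: a[i] += c for every element. */
void add_plain(int *a, int n, int c)
{
    for (int i = 0; i < n; i++)
        a[i] += c;
}

/* Strip-mined form: the outer loop steps by STRIP, the inner loop covers one strip. */
void add_strip_mined(int *a, int n, int c)
{
    for (int ii = 0; ii < n; ii += STRIP) {
        int end = (ii + STRIP < n) ? ii + STRIP : n;
        for (int i = ii; i < end; i++)
            a[i] += c;
    }
}
```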
|
||
|
|
||
|
<br><dt><code>-floop-block</code><dd><a name="index-floop_002dblock-1011"></a>Perform loop blocking transformations on loops. Blocking strip mines
|
||
|
each loop in the loop nest such that the memory accesses of the
|
||
|
element loops fit inside caches. The strip length can be changed
|
||
|
using the <samp><span class="option">loop-block-tile-size</span></samp> parameter. For example, given
|
||
|
a loop like:
|
||
|
<pre class="smallexample">     DO I = 1, N
       DO J = 1, M
         A(J, I) = B(I) + C(J)
       ENDDO
     ENDDO
</pre>
|
||
|
<p class="noindent">loop blocking transforms the loop as if it were written:
|
||
|
<pre class="smallexample">     DO II = 1, N, 51
       DO JJ = 1, M, 51
         DO I = II, min (II + 50, N)
           DO J = JJ, min (JJ + 50, M)
             A(J, I) = B(I) + C(J)
           ENDDO
         ENDDO
       ENDDO
     ENDDO
</pre>
|
||
|
<p>which can be beneficial when <code>M</code> is larger than the caches,
|
||
|
because the innermost loop iterates over a smaller amount of data
|
||
|
which can be kept in the caches. This optimization applies to all the
|
||
|
languages supported by GCC and is not limited to Fortran. To use this
|
||
|
code transformation, GCC has to be configured with <samp><span class="option">--with-isl</span></samp>
|
||
|
to enable the Graphite loop transformation infrastructure.
|
||
|
|
||
|
<br><dt><code>-fgraphite-identity</code><dd><a name="index-fgraphite_002didentity-1012"></a>Enable the identity transformation for Graphite. For every SCoP we generate
the polyhedral representation and transform it back to GIMPLE. Using
|
||
|
<samp><span class="option">-fgraphite-identity</span></samp> we can check the costs or benefits of the
|
||
|
GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations
|
||
|
are also performed by the code generator ISL, like index splitting and
|
||
|
dead code elimination in loops.
|
||
|
|
||
|
<br><dt><code>-floop-nest-optimize</code><dd><a name="index-floop_002dnest_002doptimize-1013"></a>Enable the ISL based loop nest optimizer. This is a generic loop nest
|
||
|
optimizer based on the Pluto optimization algorithms. It calculates a loop
|
||
|
structure optimized for data-locality and parallelism. This option
|
||
|
is experimental.
|
||
|
|
||
|
<br><dt><code>-floop-unroll-and-jam</code><dd><a name="index-floop_002dunroll_002dand_002djam-1014"></a>Enable unroll and jam for the ISL based loop nest optimizer. The unroll
|
||
|
factor can be changed using the <samp><span class="option">loop-unroll-jam-size</span></samp> parameter.
|
||
|
The unrolled dimension (counting from the innermost one) can be changed
using the <samp><span class="option">loop-unroll-jam-depth</span></samp> parameter.
|
||
|
|
||
|
<br><dt><code>-floop-parallelize-all</code><dd><a name="index-floop_002dparallelize_002dall-1015"></a>Use the Graphite data dependence analysis to identify loops that can
|
||
|
be parallelized. Parallelize all the loops that can be analyzed to
|
||
|
not contain loop-carried dependences without checking that it is
|
||
|
profitable to parallelize the loops.
|
||
|
|
||
|
<br><dt><code>-fcheck-data-deps</code><dd><a name="index-fcheck_002ddata_002ddeps-1016"></a>Compare the results of several data dependence analyzers. This option
|
||
|
is used for debugging the data dependence analyzers.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-if-convert</code><dd><a name="index-ftree_002dloop_002dif_002dconvert-1017"></a>Attempt to transform conditional jumps in the innermost loops to
|
||
|
branch-less equivalents. The intent is to remove control-flow from
|
||
|
the innermost loops in order to improve the ability of the
|
||
|
vectorization pass to handle these loops. This is enabled by default
|
||
|
if vectorization is enabled.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-if-convert-stores</code><dd><a name="index-ftree_002dloop_002dif_002dconvert_002dstores-1018"></a>Attempt to also if-convert conditional jumps containing memory writes.
|
||
|
This transformation can be unsafe for multi-threaded programs as it
|
||
|
transforms conditional memory writes into unconditional memory writes.
|
||
|
For example,
|
||
|
<pre class="smallexample">     for (i = 0; i < N; i++)
       if (cond)
         A[i] = expr;
</pre>
|
||
|
<p>is transformed to
|
||
|
<pre class="smallexample">     for (i = 0; i < N; i++)
       A[i] = cond ? expr : A[i];
</pre>
|
||
|
<p>potentially producing data races.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-distribution</code><dd><a name="index-ftree_002dloop_002ddistribution-1019"></a>Perform loop distribution. This flag can improve cache performance on
|
||
|
big loop bodies and allow further loop optimizations, like
|
||
|
parallelization or vectorization, to take place. For example, the loop
|
||
|
<pre class="smallexample">     DO I = 1, N
       A(I) = B(I) + C
       D(I) = E(I) * F
     ENDDO
</pre>
|
||
|
<p>is transformed to
|
||
|
<pre class="smallexample">     DO I = 1, N
       A(I) = B(I) + C
     ENDDO
     DO I = 1, N
       D(I) = E(I) * F
     ENDDO
</pre>
|
||
|
<br><dt><code>-ftree-loop-distribute-patterns</code><dd><a name="index-ftree_002dloop_002ddistribute_002dpatterns-1020"></a>Perform loop distribution of patterns that can be code generated with
|
||
|
calls to a library. This flag is enabled by default at <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<p>This pass distributes the initialization loops and generates a call to
<code>memset</code> that zeroes the array. For example, the loop
|
||
|
<pre class="smallexample">     DO I = 1, N
       A(I) = 0
       B(I) = A(I) + I
     ENDDO
</pre>
|
||
|
<p>is transformed to
|
||
|
<pre class="smallexample">     DO I = 1, N
       A(I) = 0
     ENDDO
     DO I = 1, N
       B(I) = A(I) + I
     ENDDO
</pre>
|
||
|
<p>and the initialization loop is transformed into a call to <code>memset</code> that zeroes <code>A</code>.
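<p>In C terms, the effect can be pictured roughly as below. This is a hand-written sketch with hypothetical function names, assuming <code>n</code> is non-negative; it is not the code GCC actually emits.

```c
#include <string.h>

/* Original shape: initialization and use share one loop. */
void fill_fused(int *a, int *b, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = 0;
        b[i] = a[i] + i;
    }
}

/* After distribution, the zeroing loop can be replaced by a memset call. */
void fill_distributed(int *a, int *b, int n)
{
    memset(a, 0, (size_t)n * sizeof *a);
    for (int i = 0; i < n; i++)
        b[i] = a[i] + i;
}
```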
|
||
|
|
||
|
<br><dt><code>-ftree-loop-im</code><dd><a name="index-ftree_002dloop_002dim-1021"></a>Perform loop invariant motion on trees. This pass moves only invariants that
|
||
|
are hard to handle at RTL level (function calls, operations that expand to
|
||
|
nontrivial sequences of insns). With <samp><span class="option">-funswitch-loops</span></samp> it also moves
|
||
|
operands of conditions that are invariant out of the loop, so that only
trivial invariantness analysis is needed in loop unswitching. The pass also includes
|
||
|
store motion.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-ivcanon</code><dd><a name="index-ftree_002dloop_002divcanon-1022"></a>Create a canonical counter for the number of iterations in loops for which
determining the number of iterations requires complicated analysis. Later
|
||
|
optimizations then may determine the number easily. Useful especially
|
||
|
in connection with unrolling.
|
||
|
|
||
|
<br><dt><code>-fivopts</code><dd><a name="index-fivopts-1023"></a>Perform induction variable optimizations (strength reduction, induction
|
||
|
variable merging and induction variable elimination) on trees.
|
||
|
|
||
|
<br><dt><code>-ftree-parallelize-loops=n</code><dd><a name="index-ftree_002dparallelize_002dloops-1024"></a>Parallelize loops, i.e., split their iteration space to run in n threads.
|
||
|
This is only possible for loops whose iterations are independent
|
||
|
and can be arbitrarily reordered. The optimization is only
|
||
|
profitable on multiprocessor machines, for loops that are CPU-intensive,
|
||
|
rather than constrained e.g. by memory bandwidth. This option
|
||
|
implies <samp><span class="option">-pthread</span></samp>, and thus is only supported on targets
|
||
|
that have support for <samp><span class="option">-pthread</span></samp>.
|
||
|
|
||
|
<br><dt><code>-ftree-pta</code><dd><a name="index-ftree_002dpta-1025"></a>Perform function-local points-to analysis on trees. This flag is
|
||
|
enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-sra</code><dd><a name="index-ftree_002dsra-1026"></a>Perform scalar replacement of aggregates. This pass replaces structure
|
||
|
references with scalars to prevent committing structures to memory too
|
||
|
early. This flag is enabled by default at <samp><span class="option">-O</span></samp> and higher.
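<p>A small sketch of a candidate for this pass, using a hypothetical structure and function; whether the replacement actually happens depends on the compiler's own analysis.

```c
struct pair { int lo; int hi; };

int span(int a, int b)
{
    struct pair p;          /* small aggregate whose address never escapes */
    p.lo = a < b ? a : b;
    p.hi = a < b ? b : a;
    /* SRA may treat p.lo and p.hi as two independent scalars,
       so p itself need not be stored to memory. */
    return p.hi - p.lo;
}
```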
|
||
|
|
||
|
<br><dt><code>-ftree-copyrename</code><dd><a name="index-ftree_002dcopyrename-1027"></a>Perform copy renaming on trees. This pass attempts to rename compiler
|
||
|
temporaries to other variables at copy locations, usually resulting in
|
||
|
variable names which more closely resemble the original variables. This flag
|
||
|
is enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-coalesce-inlined-vars</code><dd><a name="index-ftree_002dcoalesce_002dinlined_002dvars-1028"></a>Tell the copyrename pass (see <samp><span class="option">-ftree-copyrename</span></samp>) to attempt to
|
||
|
combine small user-defined variables too, but only if they are inlined
|
||
|
from other functions. It is a more limited form of
|
||
|
<samp><span class="option">-ftree-coalesce-vars</span></samp>. This may harm debug information of such
|
||
|
inlined variables, but it keeps variables of the inlined-into
|
||
|
function apart from each other, such that they are more likely to
|
||
|
contain the expected values in a debugging session.
|
||
|
|
||
|
<br><dt><code>-ftree-coalesce-vars</code><dd><a name="index-ftree_002dcoalesce_002dvars-1029"></a>Tell the copyrename pass (see <samp><span class="option">-ftree-copyrename</span></samp>) to attempt to
|
||
|
combine small user-defined variables too, instead of just compiler
|
||
|
temporaries. This may severely limit the ability to debug an optimized
|
||
|
program compiled with <samp><span class="option">-fno-var-tracking-assignments</span></samp>. In the
|
||
|
negated form, this flag prevents SSA coalescing of user variables,
|
||
|
including inlined ones. This option is enabled by default.
|
||
|
|
||
|
<br><dt><code>-ftree-ter</code><dd><a name="index-ftree_002dter-1030"></a>Perform temporary expression replacement during the SSA->normal phase. Single
|
||
|
use/single def temporaries are replaced at their use location with their
|
||
|
defining expression. This results in non-GIMPLE code, but gives the expanders
|
||
|
much more complex trees to work on resulting in better RTL generation. This is
|
||
|
enabled by default at <samp><span class="option">-O</span></samp> and higher.
|
||
|
|
||
|
<br><dt><code>-ftree-slsr</code><dd><a name="index-ftree_002dslsr-1031"></a>Perform straight-line strength reduction on trees. This recognizes related
|
||
|
expressions involving multiplications and replaces them by less expensive
|
||
|
calculations when possible. This is enabled by default at <samp><span class="option">-O</span></samp> and
|
||
|
higher.
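<p>To illustrate what "related expressions involving multiplications" can mean, the sketch below pairs a function with a hand-reduced equivalent. The names are hypothetical, and unsigned arithmetic is used so that the two forms agree even on wraparound.

```c
/* Three related products that share the factor s. */
unsigned sum_products(unsigned x, unsigned s)
{
    unsigned y0 = x * s;
    unsigned y1 = (x + 1) * s;
    unsigned y2 = (x + 2) * s;
    return y0 + y1 + y2;
}

/* Hand-written equivalent: later products are derived by adding s. */
unsigned sum_products_reduced(unsigned x, unsigned s)
{
    unsigned y0 = x * s;
    unsigned y1 = y0 + s;
    unsigned y2 = y1 + s;
    return y0 + y1 + y2;
}
```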
|
||
|
|
||
|
<br><dt><code>-ftree-vectorize</code><dd><a name="index-ftree_002dvectorize-1032"></a>Perform vectorization on trees. This flag enables <samp><span class="option">-ftree-loop-vectorize</span></samp>
|
||
|
and <samp><span class="option">-ftree-slp-vectorize</span></samp> if not explicitly specified.
|
||
|
|
||
|
<br><dt><code>-ftree-loop-vectorize</code><dd><a name="index-ftree_002dloop_002dvectorize-1033"></a>Perform loop vectorization on trees. This flag is enabled by default at
|
||
|
<samp><span class="option">-O3</span></samp> and when <samp><span class="option">-ftree-vectorize</span></samp> is enabled.
|
||
|
|
||
|
<br><dt><code>-ftree-slp-vectorize</code><dd><a name="index-ftree_002dslp_002dvectorize-1034"></a>Perform basic block vectorization on trees. This flag is enabled by default at
|
||
|
<samp><span class="option">-O3</span></samp> and when <samp><span class="option">-ftree-vectorize</span></samp> is enabled.
|
||
|
|
||
|
<br><dt><code>-fvect-cost-model=</code><var>model</var><dd><a name="index-fvect_002dcost_002dmodel-1035"></a>Alter the cost model used for vectorization. The <var>model</var> argument
|
||
|
should be one of ‘<samp><span class="samp">unlimited</span></samp>’, ‘<samp><span class="samp">dynamic</span></samp>’ or ‘<samp><span class="samp">cheap</span></samp>’.
|
||
|
With the ‘<samp><span class="samp">unlimited</span></samp>’ model the vectorized code-path is assumed
|
||
|
to be profitable while with the ‘<samp><span class="samp">dynamic</span></samp>’ model a runtime check
|
||
|
guards the vectorized code-path to enable it only for iteration
|
||
|
counts that will likely execute faster than when executing the original
|
||
|
scalar loop. The ‘<samp><span class="samp">cheap</span></samp>’ model disables vectorization of
|
||
|
loops where doing so would be cost prohibitive, for example due to
|
||
|
required runtime checks for data dependence or alignment but otherwise
|
||
|
is equal to the ‘<samp><span class="samp">dynamic</span></samp>’ model.
|
||
|
The default cost model depends on other optimization flags and is
|
||
|
either ‘<samp><span class="samp">dynamic</span></samp>’ or ‘<samp><span class="samp">cheap</span></samp>’.
|
||
|
|
||
|
<br><dt><code>-fsimd-cost-model=</code><var>model</var><dd><a name="index-fsimd_002dcost_002dmodel-1036"></a>Alter the cost model used for vectorization of loops marked with the OpenMP
|
||
|
or Cilk Plus simd directive. The <var>model</var> argument should be one of
|
||
|
‘<samp><span class="samp">unlimited</span></samp>’, ‘<samp><span class="samp">dynamic</span></samp>’, ‘<samp><span class="samp">cheap</span></samp>’. All values of <var>model</var>
|
||
|
have the same meaning as described in <samp><span class="option">-fvect-cost-model</span></samp> and by
|
||
|
default a cost model defined with <samp><span class="option">-fvect-cost-model</span></samp> is used.
|
||
|
|
||
|
<br><dt><code>-ftree-vrp</code><dd><a name="index-ftree_002dvrp-1037"></a>Perform Value Range Propagation on trees. This is similar to the
|
||
|
constant propagation pass, but instead of values, ranges of values are
|
||
|
propagated. This allows the optimizers to remove unnecessary range
|
||
|
checks like array bound checks and null pointer checks. This is
|
||
|
enabled by default at <samp><span class="option">-O2</span></samp> and higher. Null pointer check
|
||
|
elimination is only done if <samp><span class="option">-fdelete-null-pointer-checks</span></samp> is
|
||
|
enabled.
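<p>As an example of a range check that becomes provably redundant, consider the following sketch (the function name is hypothetical):

```c
int table_lookup(const int table[8], unsigned idx)
{
    unsigned i = idx % 8;    /* i is known to lie in the range [0, 7] */
    if (i >= 8)              /* always false given that range; may be removed */
        return -1;
    return table[i];
}
```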
|
||
|
|
||
|
<br><dt><code>-fsplit-ivs-in-unroller</code><dd><a name="index-fsplit_002divs_002din_002dunroller-1038"></a>Enables expression of values of induction variables in later iterations
|
||
|
of the unrolled loop using the value in the first iteration. This breaks
|
||
|
long dependency chains, thus improving efficiency of the scheduling passes.
|
||
|
|
||
|
<p>A combination of <samp><span class="option">-fweb</span></samp> and CSE is often sufficient to obtain the
|
||
|
same effect. However, that is not reliable in cases where the loop body
|
||
|
is more complicated than a single basic block. It also does not work at all
|
||
|
on some architectures due to restrictions in the CSE pass.
|
||
|
|
||
|
<p>This optimization is enabled by default.
|
||
|
|
||
|
<br><dt><code>-fvariable-expansion-in-unroller</code><dd><a name="index-fvariable_002dexpansion_002din_002dunroller-1039"></a>With this option, the compiler creates multiple copies of some
|
||
|
local variables when unrolling a loop, which can result in superior code.
|
||
|
|
||
|
<br><dt><code>-fpartial-inlining</code><dd><a name="index-fpartial_002dinlining-1040"></a>Inline parts of functions. This option has an effect only
|
||
|
when inlining itself is turned on by the <samp><span class="option">-finline-functions</span></samp>
|
||
|
or <samp><span class="option">-finline-small-functions</span></samp> options.
|
||
|
|
||
|
<p>Enabled at level <samp><span class="option">-O2</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fpredictive-commoning</code><dd><a name="index-fpredictive_002dcommoning-1041"></a>Perform predictive commoning optimization, i.e., reusing computations
|
||
|
(especially memory loads and stores) performed in previous
|
||
|
iterations of loops.
|
||
|
|
||
|
<p>This option is enabled at level <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fprefetch-loop-arrays</code><dd><a name="index-fprefetch_002dloop_002darrays-1042"></a>If supported by the target machine, generate instructions to prefetch
|
||
|
memory to improve the performance of loops that access large arrays.
|
||
|
|
||
|
<p>This option may generate better or worse code; results are highly
|
||
|
dependent on the structure of loops within the source code.
|
||
|
|
||
|
<p>Disabled at level <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fno-peephole</code><dt><code>-fno-peephole2</code><dd><a name="index-fno_002dpeephole-1043"></a><a name="index-fno_002dpeephole2-1044"></a>Disable any machine-specific peephole optimizations. The difference
|
||
|
between <samp><span class="option">-fno-peephole</span></samp> and <samp><span class="option">-fno-peephole2</span></samp> is in how they
|
||
|
are implemented in the compiler; some targets use one, some use the
|
||
|
other, a few use both.
|
||
|
|
||
|
<p><samp><span class="option">-fpeephole</span></samp> is enabled by default.
|
||
|
<samp><span class="option">-fpeephole2</span></samp> is enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fno-guess-branch-probability</code><dd><a name="index-fno_002dguess_002dbranch_002dprobability-1045"></a>Do not guess branch probabilities using heuristics.
|
||
|
|
||
|
<p>GCC uses heuristics to guess branch probabilities if they are
|
||
|
not provided by profiling feedback (<samp><span class="option">-fprofile-arcs</span></samp>). These
|
||
|
heuristics are based on the control flow graph. If some branch probabilities
|
||
|
are specified by <code>__builtin_expect</code>, then the heuristics are
|
||
|
used to guess branch probabilities for the rest of the control flow graph,
|
||
|
taking the <code>__builtin_expect</code> info into account. The interactions
|
||
|
between the heuristics and <code>__builtin_expect</code> can be complex, and in
|
||
|
some cases, it may be useful to disable the heuristics so that the effects
|
||
|
of <code>__builtin_expect</code> are easier to understand.
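<p>For reference, a typical use of <code>__builtin_expect</code> looks like the sketch below. The function name is hypothetical, and the assumption that negative values are rare is made up for the example.

```c
/* Returns the number of negative entries; negatives are assumed to be rare. */
int count_negative(const int *v, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        /* __builtin_expect(expr, 0) hints that expr is usually false. */
        if (__builtin_expect(v[i] < 0, 0))
            count++;
    }
    return count;
}
```

<p>The hint affects only the compiler's layout and optimization decisions; the function returns the same result either way.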
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fguess-branch-probability</span></samp> at levels
|
||
|
<samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-freorder-blocks</code><dd><a name="index-freorder_002dblocks-1046"></a>Reorder basic blocks in the compiled function in order to reduce the number of
|
||
|
taken branches and improve code locality.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<br><dt><code>-freorder-blocks-and-partition</code><dd><a name="index-freorder_002dblocks_002dand_002dpartition-1047"></a>In addition to reordering basic blocks in the compiled function in order
to reduce the number of taken branches, partition hot and cold basic blocks
|
||
|
into separate sections of the assembly and .o files, to improve
|
||
|
paging and cache locality performance.
|
||
|
|
||
|
<p>This optimization is automatically turned off in the presence of
|
||
|
exception handling, for linkonce sections, for functions with a user-defined
|
||
|
section attribute and on any architecture that does not support named
|
||
|
sections.
|
||
|
|
||
|
<p>Enabled for x86 at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<br><dt><code>-freorder-functions</code><dd><a name="index-freorder_002dfunctions-1048"></a>Reorder functions in the object file in order to
|
||
|
improve code locality. This is implemented by using special
|
||
|
subsections <code>.text.hot</code> for most frequently executed functions and
|
||
|
<code>.text.unlikely</code> for functions unlikely to be executed. Reordering is done by
the linker, so the object file format must support named sections and the linker must
|
||
|
place them in a reasonable way.
|
||
|
|
||
|
<p>Also profile feedback must be available to make this option effective. See
|
||
|
<samp><span class="option">-fprofile-arcs</span></samp> for details.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fstrict-aliasing</code><dd><a name="index-fstrict_002daliasing-1049"></a>Allow the compiler to assume the strictest aliasing rules applicable to
|
||
|
the language being compiled. For C (and C++), this activates
|
||
|
optimizations based on the type of expressions. In particular, an
|
||
|
object of one type is assumed never to reside at the same address as an
|
||
|
object of a different type, unless the types are almost the same. For
|
||
|
example, an <code>unsigned int</code> can alias an <code>int</code>, but not a
|
||
|
<code>void*</code> or a <code>double</code>. A character type may alias any other
|
||
|
type.
|
||
|
|
||
|
<p><a name="Type_002dpunning"></a>Pay special attention to code like this:
|
||
|
<pre class="smallexample">     union a_union {
       int i;
       double d;
     };

     int f() {
       union a_union t;
       t.d = 3.0;
       return t.i;
     }
</pre>
<p>The practice of reading from a different union member than the one most
recently written to (called “type-punning”) is common. Even with
<samp><span class="option">-fstrict-aliasing</span></samp>, type-punning is allowed, provided the memory
is accessed through the union type. So, the code above works as
expected. See <a href="Structures-unions-enumerations-and-bit_002dfields-implementation.html#Structures-unions-enumerations-and-bit_002dfields-implementation">Structures unions enumerations and bit-fields implementation</a>. However, this code might not:
<pre class="smallexample">     int f() {
       union a_union t;
       int* ip;
       t.d = 3.0;
       ip = &t.i;
       return *ip;
     }
</pre>
<p>Similarly, access by taking the address, casting the resulting pointer
and dereferencing the result has undefined behavior, even if the cast
uses a union type, e.g.:
<pre class="smallexample">     int f() {
       double d = 3.0;
       return ((union a_union *) &d)->i;
     }
</pre>
<p>The <samp><span class="option">-fstrict-aliasing</span></samp> option is enabled at levels
<samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
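<p>When the bytes of an object must be reinterpreted and a union is inconvenient, copying them with <code>memcpy</code> is one approach that stays within the aliasing rules, because the copy is performed as if through a character type. The sketch below is illustrative; the name <code>float_bits</code> is hypothetical, and it assumes that <code>float</code> is 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float occupies exactly 32 bits. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Return the object representation of f as an integer.  The bytes are
   copied rather than read through a pointer of an incompatible type. */
uint32_t float_bits(float f)
{
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```

<p>On a target that uses the IEEE 754 single format, <code>float_bits(1.0f)</code> yields <code>0x3f800000</code>.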
<br><dt><code>-fstrict-overflow</code><dd><a name="index-fstrict_002doverflow-1050"></a>Allow the compiler to assume strict signed overflow rules, depending
on the language being compiled. For C (and C++) this means that
overflow when doing arithmetic with signed numbers is undefined, which
means that the compiler may assume that it does not happen. This
permits various optimizations. For example, the compiler assumes
that an expression like <code>i + 10 > i</code> is always true for
signed <code>i</code>. This assumption is only valid if signed overflow is
undefined, as the expression is false if <code>i + 10</code> overflows when
using two's complement arithmetic. When this option is in effect, any
attempt to determine whether an operation on signed numbers
overflows must be written carefully to not actually involve overflow.
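<p>As a sketch of what such a careful check can look like, the function below tests the operands against <code>INT_MAX</code> and <code>INT_MIN</code> before adding, so that no intermediate result overflows. The name <code>add_would_overflow</code> is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Report whether a + b would overflow, without ever computing a + b.
   Each subtraction below stays in range because of the sign of b. */
bool add_would_overflow(int a, int b)
{
  if (b > 0)
    return a > INT_MAX - b;
  if (b < 0)
    return a < INT_MIN - b;
  return false;
}
```

<p>A test written as <code>a + b < a</code> would itself rely on the overflow it is trying to detect, and may be folded away when this option is in effect.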
<p>This option also allows the compiler to assume strict pointer
semantics: given a pointer to an object, if adding an offset to that
pointer does not produce a pointer to the same object, the addition is
undefined. This permits the compiler to conclude that <code>p + u > p</code>
is always true for a pointer <code>p</code> and unsigned integer
<code>u</code>. This assumption is only valid because pointer wraparound is
undefined, as the expression is false if <code>p + u</code> overflows using
two's complement arithmetic.
<p>See also the <samp><span class="option">-fwrapv</span></samp> option. Using <samp><span class="option">-fwrapv</span></samp> means
that integer signed overflow is fully defined: it wraps. When
<samp><span class="option">-fwrapv</span></samp> is used, there is no difference between
<samp><span class="option">-fstrict-overflow</span></samp> and <samp><span class="option">-fno-strict-overflow</span></samp> for
integers. With <samp><span class="option">-fwrapv</span></samp> certain types of overflow are
permitted. For example, if the compiler gets an overflow when doing
arithmetic on constants, the overflowed value can still be used with
<samp><span class="option">-fwrapv</span></samp>, but not otherwise.
<p>The <samp><span class="option">-fstrict-overflow</span></samp> option is enabled at levels
<samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
<br><dt><code>-falign-functions</code><dt><code>-falign-functions=</code><var>n</var><dd><a name="index-falign_002dfunctions-1051"></a>Align the start of functions to the next power-of-two greater than
<var>n</var>, skipping up to <var>n</var> bytes. For instance,
<samp><span class="option">-falign-functions=32</span></samp> aligns functions to the next 32-byte
boundary, but <samp><span class="option">-falign-functions=24</span></samp> aligns to the next
32-byte boundary only if this can be done by skipping 23 bytes or less.
<p><samp><span class="option">-fno-align-functions</span></samp> and <samp><span class="option">-falign-functions=1</span></samp> are
equivalent and mean that functions are not aligned.
<p>Some assemblers only support this flag when <var>n</var> is a power of two;
in that case, it is rounded up.
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
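<p>The skipping rule for <samp><span class="option">-falign-functions=</span></samp><var>n</var> can be modelled with a few lines of arithmetic. This is only a sketch of the behavior as described for this option, not GCC's implementation; <code>round_up_pow2</code> and <code>model_function_start</code> are hypothetical names, and the model assumes that the boundary is the smallest power of two not less than <var>n</var> and that at most <var>n</var> - 1 bytes are skipped.

```c
#include <stdint.h>

/* Smallest power of two that is >= n, for n >= 1. */
static uintptr_t round_up_pow2(uintptr_t n)
{
  uintptr_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

/* Model where a function would start if the next free address is addr:
   pad up to the boundary only when the padding is at most n - 1 bytes. */
uintptr_t model_function_start(uintptr_t addr, uintptr_t n)
{
  uintptr_t boundary = round_up_pow2(n);
  uintptr_t padding = (boundary - addr % boundary) % boundary;
  return padding <= n - 1 ? addr + padding : addr;
}
```

<p>With <var>n</var> = 32 the padding never exceeds 31 bytes, so the model always aligns; with <var>n</var> = 24 it aligns to a 32-byte boundary only when 23 bytes or fewer are needed.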
<br><dt><code>-falign-labels</code><dt><code>-falign-labels=</code><var>n</var><dd><a name="index-falign_002dlabels-1052"></a>Align all branch targets to a power-of-two boundary, skipping up to
<var>n</var> bytes like <samp><span class="option">-falign-functions</span></samp>. This option can easily
make code slower, because it must insert dummy operations for when the
branch target is reached in the usual flow of the code.
<p><samp><span class="option">-fno-align-labels</span></samp> and <samp><span class="option">-falign-labels=1</span></samp> are
equivalent and mean that labels are not aligned.
<p>If <samp><span class="option">-falign-loops</span></samp> or <samp><span class="option">-falign-jumps</span></samp> are applicable and
are greater than this value, then their values are used instead.
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default
which is very likely to be ‘<samp><span class="samp">1</span></samp>’, meaning no alignment.
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
<br><dt><code>-falign-loops</code><dt><code>-falign-loops=</code><var>n</var><dd><a name="index-falign_002dloops-1053"></a>Align loops to a power-of-two boundary, skipping up to <var>n</var> bytes
like <samp><span class="option">-falign-functions</span></samp>. If the loops are
executed many times, this makes up for any execution of the dummy
operations.
<p><samp><span class="option">-fno-align-loops</span></samp> and <samp><span class="option">-falign-loops=1</span></samp> are
equivalent and mean that loops are not aligned.
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
<br><dt><code>-falign-jumps</code><dt><code>-falign-jumps=</code><var>n</var><dd><a name="index-falign_002djumps-1054"></a>Align branch targets to a power-of-two boundary, for branch targets
where the targets can only be reached by jumping, skipping up to <var>n</var>
bytes like <samp><span class="option">-falign-functions</span></samp>. In this case, no dummy operations
need be executed.
<p><samp><span class="option">-fno-align-jumps</span></samp> and <samp><span class="option">-falign-jumps=1</span></samp> are
equivalent and mean that jump targets are not aligned.
<p>If <var>n</var> is not specified or is zero, use a machine-dependent default.
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>.
<br><dt><code>-funit-at-a-time</code><dd><a name="index-funit_002dat_002da_002dtime-1055"></a>This option is left for compatibility reasons. <samp><span class="option">-funit-at-a-time</span></samp>
has no effect, while <samp><span class="option">-fno-unit-at-a-time</span></samp> implies
<samp><span class="option">-fno-toplevel-reorder</span></samp> and <samp><span class="option">-fno-section-anchors</span></samp>.
<p>Enabled by default.
<br><dt><code>-fno-toplevel-reorder</code><dd><a name="index-fno_002dtoplevel_002dreorder-1056"></a>Do not reorder top-level functions, variables, and <code>asm</code>
statements. Output them in the same order that they appear in the
input file. When this option is used, unreferenced static variables
are not removed. This option is intended to support existing code
that relies on a particular ordering. For new code, it is better to
use attributes when possible.
<p>Enabled at level <samp><span class="option">-O0</span></samp>. When disabled explicitly, it also implies
<samp><span class="option">-fno-section-anchors</span></samp>, which is otherwise enabled at <samp><span class="option">-O0</span></samp> on some
targets.
<br><dt><code>-fweb</code><dd><a name="index-fweb-1057"></a>Construct webs as commonly used for register allocation purposes and assign
each web an individual pseudo register. This allows the register allocation pass
to operate on pseudos directly, but also strengthens several other optimization
passes, such as CSE, the loop optimizer and the trivial dead code remover. It can,
however, make debugging impossible, since variables no longer stay in a
“home register”.
<p>Enabled by default with <samp><span class="option">-funroll-loops</span></samp>.
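<p>To illustrate the idea, in the hypothetical function below the variable <code>t</code> holds two unrelated values. Each definition, together with the uses it reaches, forms its own web, so with <samp><span class="option">-fweb</span></samp> the two may be assigned different pseudo registers. This is a sketch of the concept only; what the compiler does for a given function depends on the target and on the other passes.

```c
/* t is reused for two independent computations. */
int two_webs(int a, int b)
{
  int t;
  int first;

  t = a * 2;          /* first web: this definition ...        */
  first = t + 1;      /* ... and the only use it reaches       */

  t = b * 3;          /* second web: an unrelated definition   */
  return first + t;   /* ... and its use                       */
}
```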
<br><dt><code>-fwhole-program</code><dd><a name="index-fwhole_002dprogram-1058"></a>Assume that the current compilation unit represents the whole program being
compiled. All public functions and variables with the exception of <code>main</code>
and those merged by attribute <code>externally_visible</code> become static functions
and in effect are optimized more aggressively by interprocedural optimizers.
<p>This option should not be used in combination with <samp><span class="option">-flto</span></samp>.
Instead, relying on a linker plugin should provide safer and more precise
information.
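<p>The sketch below shows how a symbol can be kept public when a unit is compiled with <samp><span class="option">-fwhole-program</span></samp>. It assumes the GCC <code>externally_visible</code> function attribute; both function names are hypothetical.

```c
/* Hypothetical helper.  Under -fwhole-program it is treated like a static
   function, so it may be inlined and its out-of-line copy removed. */
int clamp_percent(int v)
{
  if (v < 0)
    return 0;
  if (v > 100)
    return 100;
  return v;
}

/* Hypothetical entry point that is referenced from outside this unit,
   for instance from hand-written assembly.  The attribute keeps the
   symbol visible even though the unit is treated as the whole program. */
__attribute__((externally_visible))
int plugin_set_volume(int requested)
{
  return clamp_percent(requested);
}
```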
<br><dt><code>-flto[=</code><var>n</var><code>]</code><dd><a name="index-flto-1059"></a>This option runs the standard link-time optimizer. When invoked
with source code, it generates GIMPLE (one of GCC's internal
representations) and writes it to special ELF sections in the object
file. When the object files are linked together, all the function
bodies are read from these ELF sections and instantiated as if they
had been part of the same translation unit.
<p>To use the link-time optimizer, <samp><span class="option">-flto</span></samp> and optimization
options should be specified at compile time and during the final link.
For example:
<pre class="smallexample">     gcc -c -O2 -flto foo.c
     gcc -c -O2 -flto bar.c
     gcc -o myprog -flto -O2 foo.o bar.o
</pre>
<p>The first two invocations to GCC save a bytecode representation
of GIMPLE into special ELF sections inside <samp><span class="file">foo.o</span></samp> and
<samp><span class="file">bar.o</span></samp>. The final invocation reads the GIMPLE bytecode from
<samp><span class="file">foo.o</span></samp> and <samp><span class="file">bar.o</span></samp>, merges the two files into a single
internal image, and compiles the result as usual. Since both
<samp><span class="file">foo.o</span></samp> and <samp><span class="file">bar.o</span></samp> are merged into a single image, this
causes all the interprocedural analyses and optimizations in GCC to
work across the two files as if they were a single one. This means,
for example, that the inliner is able to inline functions in
<samp><span class="file">bar.o</span></samp> into functions in <samp><span class="file">foo.o</span></samp> and vice-versa.
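<p>As an illustration, the contents of the two files might look like the sketch below. The file contents and function names are hypothetical, and both files are shown in one listing for brevity.

```c
/* ---- bar.c (hypothetical) ---- */
int scale(int x)
{
  return 3 * x;
}

/* ---- foo.c (hypothetical) ---- */
int scale(int x);   /* normally obtained from a shared header */

int scaled_sum(int a, int b)
{
  /* Compiled separately without LTO, these stay calls into bar.o,
     because the body of scale is not visible here.  With -flto the
     inliner can see that body at link time and may inline it. */
  return scale(a) + scale(b);
}
```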
<p>Another (simpler) way to enable link-time optimization is:
<pre class="smallexample"> gcc -o myprog -flto -O2 foo.c bar.c
</pre>
<p>The above generates bytecode for <samp><span class="file">foo.c</span></samp> and <samp><span class="file">bar.c</span></samp>,
merges them together into a single GIMPLE representation and optimizes
them as usual to produce <samp><span class="file">myprog</span></samp>.
<p>The only important thing to keep in mind is that to enable link-time
optimizations you need to use the GCC driver to perform the link-step.
GCC then automatically performs link-time optimization if any of the
objects involved were compiled with the <samp><span class="option">-flto</span></samp> command-line option.
You generally
should specify the optimization options to be used for link-time
optimization though GCC tries to be clever at guessing an
optimization level to use from the options used at compile-time
if you fail to specify one at link-time. You can always override
the automatic decision to do link-time optimization at link-time
by passing <samp><span class="option">-fno-lto</span></samp> to the link command.
<p>To make whole program optimization effective, it is necessary to make
certain whole program assumptions. The compiler needs to know
what functions and variables can be accessed by libraries and runtime
outside of the link-time optimized unit. When supported by the linker,
the linker plugin (see <samp><span class="option">-fuse-linker-plugin</span></samp>) passes information
to the compiler about used and externally visible symbols. When
the linker plugin is not available, <samp><span class="option">-fwhole-program</span></samp> should be
used to allow the compiler to make these assumptions, which leads
to more aggressive optimization decisions.
<p>When <samp><span class="option">-fuse-linker-plugin</span></samp> is not enabled then, when a file is
compiled with <samp><span class="option">-flto</span></samp>, the generated object file is larger than
a regular object file because it contains GIMPLE bytecodes and the usual
final code (see <samp><span class="option">-ffat-lto-objects</span></samp>). This means that
object files with LTO information can be linked as normal object
files; if <samp><span class="option">-fno-lto</span></samp> is passed to the linker, no
interprocedural optimizations are applied. Note that when
<samp><span class="option">-fno-fat-lto-objects</span></samp> is enabled, the compile stage is faster
but you cannot perform a regular, non-LTO link on the resulting object files.
<p>Additionally, the optimization flags used to compile individual files
are not necessarily related to those used at link time. For instance,
<pre class="smallexample">     gcc -c -O0 -ffat-lto-objects -flto foo.c
     gcc -c -O0 -ffat-lto-objects -flto bar.c
     gcc -o myprog -O3 foo.o bar.o
</pre>
<p>This produces individual object files with unoptimized assembler
code, but the resulting binary <samp><span class="file">myprog</span></samp> is optimized at
<samp><span class="option">-O3</span></samp>. If, instead, the final binary is generated with
<samp><span class="option">-fno-lto</span></samp>, then <samp><span class="file">myprog</span></samp> is not optimized.
<p>When producing the final binary, GCC only
applies link-time optimizations to those files that contain bytecode.
Therefore, you can mix and match object files and libraries with
GIMPLE bytecodes and final object code. GCC automatically selects
which files to optimize in LTO mode and which files to link without
further processing.
<p>There are some code generation flags preserved by GCC when
generating bytecodes, as they need to be used during the final link
stage. Generally options specified at link-time override those
specified at compile-time.
<p>If you do not specify an optimization level option <samp><span class="option">-O</span></samp> at
link-time then GCC computes one based on the optimization levels
used when compiling the object files. The highest optimization
level wins here.
<p>Currently, the following options and their settings are taken from
the first object file that explicitly specified them:
<samp><span class="option">-fPIC</span></samp>, <samp><span class="option">-fpic</span></samp>, <samp><span class="option">-fpie</span></samp>, <samp><span class="option">-fcommon</span></samp>,
<samp><span class="option">-fexceptions</span></samp>, <samp><span class="option">-fnon-call-exceptions</span></samp>, <samp><span class="option">-fgnu-tm</span></samp>
and all the <samp><span class="option">-m</span></samp> target flags.
<p>Certain ABI changing flags are required to match in all compilation-units
and trying to override this at link-time with a conflicting value
is ignored. This includes options such as <samp><span class="option">-freg-struct-return</span></samp>
and <samp><span class="option">-fpcc-struct-return</span></samp>.
<p>Other options such as <samp><span class="option">-ffp-contract</span></samp>, <samp><span class="option">-fno-strict-overflow</span></samp>,
<samp><span class="option">-fwrapv</span></samp>, <samp><span class="option">-fno-trapv</span></samp> or <samp><span class="option">-fno-strict-aliasing</span></samp>
are passed through to the link stage and merged conservatively for
conflicting translation units. Specifically
<samp><span class="option">-fno-strict-overflow</span></samp>, <samp><span class="option">-fwrapv</span></samp> and <samp><span class="option">-fno-trapv</span></samp> take
precedence and for example <samp><span class="option">-ffp-contract=off</span></samp> takes precedence
over <samp><span class="option">-ffp-contract=fast</span></samp>. You can override them at link time.
<p>It is recommended that you compile all the files participating in the
same link with the same options and also specify those options at
link time.
<p>If LTO encounters objects with C linkage declared with incompatible
types in separate translation units to be linked together (undefined
behavior according to ISO C99 6.2.7), a non-fatal diagnostic may be
issued. The behavior is still undefined at run time. Similar
diagnostics may be raised for other languages.
<p>Another feature of LTO is that it is possible to apply interprocedural
optimizations on files written in different languages:
<pre class="smallexample">     gcc -c -flto foo.c
     g++ -c -flto bar.cc
     gfortran -c -flto baz.f90
     g++ -o myprog -flto -O3 foo.o bar.o baz.o -lgfortran
</pre>
<p>Notice that the final link is done with <samp><span class="command">g++</span></samp> to get the C++
runtime libraries and <samp><span class="option">-lgfortran</span></samp> is added to get the Fortran
runtime libraries. In general, when mixing languages in LTO mode, you
should use the same link command options as when mixing languages in a
regular (non-LTO) compilation.
<p>If object files containing GIMPLE bytecode are stored in a library archive, say
<samp><span class="file">libfoo.a</span></samp>, it is possible to extract and use them in an LTO link if you
are using a linker with plugin support. To create static libraries suitable
for LTO, use <samp><span class="command">gcc-ar</span></samp> and <samp><span class="command">gcc-ranlib</span></samp> instead of <samp><span class="command">ar</span></samp>
and <samp><span class="command">ranlib</span></samp>;
to show the symbols of object files with GIMPLE bytecode, use
<samp><span class="command">gcc-nm</span></samp>. Those commands require that <samp><span class="command">ar</span></samp>, <samp><span class="command">ranlib</span></samp>
and <samp><span class="command">nm</span></samp> have been compiled with plugin support. At link time, use the
flag <samp><span class="option">-fuse-linker-plugin</span></samp> to ensure that the library participates in
the LTO optimization process:
<pre class="smallexample"> gcc -o myprog -O2 -flto -fuse-linker-plugin a.o b.o -lfoo
</pre>
<p>With the linker plugin enabled, the linker extracts the needed
GIMPLE files from <samp><span class="file">libfoo.a</span></samp> and passes them on to the running GCC
to make them part of the aggregated GIMPLE image to be optimized.
<p>If you are not using a linker with plugin support and/or do not
enable the linker plugin, then the objects inside <samp><span class="file">libfoo.a</span></samp>
are extracted and linked as usual, but they do not participate
in the LTO optimization process. In order to make a static library suitable
for both LTO optimization and usual linkage, compile its object files with
<samp><span class="option">-flto</span></samp> <samp><span class="option">-ffat-lto-objects</span></samp>.
<p>Link-time optimizations do not require the presence of the whole program to
operate. If the program does not require any symbols to be exported, it is
possible to combine <samp><span class="option">-flto</span></samp> and <samp><span class="option">-fwhole-program</span></samp> to allow
the interprocedural optimizers to use more aggressive assumptions which may
lead to improved optimization opportunities.
Use of <samp><span class="option">-fwhole-program</span></samp> is not needed when the linker plugin is
active (see <samp><span class="option">-fuse-linker-plugin</span></samp>).
<p>The current implementation of LTO makes no
attempt to generate bytecode that is portable between different
types of hosts. The bytecode files are versioned and there is a
strict version check, so bytecode files generated in one version of
GCC do not work with an older or newer version of GCC.
<p>Link-time optimization does not work well with generation of debugging
information. Combining <samp><span class="option">-flto</span></samp> with
<samp><span class="option">-g</span></samp> is currently experimental and expected to produce unexpected
results.
<p>If you specify the optional <var>n</var>, the optimization and code
generation done at link time is executed in parallel using <var>n</var>
parallel jobs by utilizing an installed <samp><span class="command">make</span></samp> program. The
environment variable <samp><span class="env">MAKE</span></samp> may be used to override the program
used. The default value for <var>n</var> is 1.
<p>You can also specify <samp><span class="option">-flto=jobserver</span></samp> to use GNU make's
job server mode to determine the number of parallel jobs. This
is useful when the Makefile calling GCC is already executing in parallel.
You must prepend a ‘<samp><span class="samp">+</span></samp>’ to the command recipe in the parent Makefile
for this to work. This option likely only works if <samp><span class="env">MAKE</span></samp> is
GNU make.
<br><dt><code>-flto-partition=</code><var>alg</var><dd><a name="index-flto_002dpartition-1060"></a>Specify the partitioning algorithm used by the link-time optimizer.
The value is either ‘<samp><span class="samp">1to1</span></samp>’ to specify a partitioning mirroring
the original source files or ‘<samp><span class="samp">balanced</span></samp>’ to specify partitioning
into equally sized chunks (whenever possible) or ‘<samp><span class="samp">max</span></samp>’ to create
a new partition for every symbol where possible. Specifying ‘<samp><span class="samp">none</span></samp>’
as an algorithm disables partitioning and streaming completely.
The default value is ‘<samp><span class="samp">balanced</span></samp>’. While ‘<samp><span class="samp">1to1</span></samp>’ can be used
as a workaround for various code ordering issues, the ‘<samp><span class="samp">max</span></samp>’
partitioning is intended for internal testing only.
The value ‘<samp><span class="samp">one</span></samp>’ specifies that exactly one partition should be
used while the value ‘<samp><span class="samp">none</span></samp>’ bypasses partitioning and executes
the link-time optimization step directly from the WPA phase.
<br><dt><code>-flto-odr-type-merging</code><dd><a name="index-flto_002dodr_002dtype_002dmerging-1061"></a>Enable streaming of mangled type names of C++ types and their unification
at link time. This increases the size of LTO object files, but enables
diagnostics about One Definition Rule violations.
<br><dt><code>-flto-compression-level=</code><var>n</var><dd><a name="index-flto_002dcompression_002dlevel-1062"></a>This option specifies the level of compression used for intermediate
language written to LTO object files, and is only meaningful in
conjunction with LTO mode (<samp><span class="option">-flto</span></samp>). Valid
values are 0 (no compression) to 9 (maximum compression). Values
outside this range are clamped to either 0 or 9. If the option is not
given, a default balanced compression setting is used.
<br><dt><code>-flto-report</code><dd><a name="index-flto_002dreport-1063"></a>Prints a report with internal details on the workings of the link-time
optimizer. The contents of this report vary from version to version.
It is meant to be useful to GCC developers when processing object
files in LTO mode (via <samp><span class="option">-flto</span></samp>).
<p>Disabled by default.
<br><dt><code>-flto-report-wpa</code><dd><a name="index-flto_002dreport_002dwpa-1064"></a>Like <samp><span class="option">-flto-report</span></samp>, but only print for the WPA phase of Link
Time Optimization.
<br><dt><code>-fuse-linker-plugin</code><dd><a name="index-fuse_002dlinker_002dplugin-1065"></a>Enables the use of a linker plugin during link-time optimization. This
option relies on plugin support in the linker, which is available in gold
or in GNU ld 2.21 or newer.
<p>This option enables the extraction of object files with GIMPLE bytecode out
of library archives. This improves the quality of optimization by exposing
more code to the link-time optimizer. The plugin also tells the compiler which
symbols can be accessed externally (by non-LTO objects or during dynamic
linking). Resulting code quality improvements on binaries (and shared
libraries that use hidden visibility) are similar to <samp><span class="option">-fwhole-program</span></samp>.
See <samp><span class="option">-flto</span></samp> for a description of the effect of this flag and how to
use it.
<p>This option is enabled by default when LTO support in GCC is enabled
and GCC was configured for use with
a linker supporting plugins (GNU ld 2.21 or newer or gold).
<br><dt><code>-ffat-lto-objects</code><dd><a name="index-ffat_002dlto_002dobjects-1066"></a>Fat LTO objects are object files that contain both the intermediate language
and the object code. This makes them usable for both LTO linking and normal
linking. This option is effective only when compiling with <samp><span class="option">-flto</span></samp>
and is ignored at link time.
<p><samp><span class="option">-fno-fat-lto-objects</span></samp> improves compilation time over plain LTO, but
requires the complete toolchain to be aware of LTO. It requires a linker with
linker plugin support for basic functionality. Additionally,
<samp><span class="command">nm</span></samp>, <samp><span class="command">ar</span></samp> and <samp><span class="command">ranlib</span></samp>
need to support linker plugins to allow a full-featured build environment
(capable of building static libraries etc). GCC provides the <samp><span class="command">gcc-ar</span></samp>,
<samp><span class="command">gcc-nm</span></samp>, <samp><span class="command">gcc-ranlib</span></samp> wrappers to pass the right options
to these tools. With non-fat LTO objects, makefiles need to be modified to use them.
<p>The default is <samp><span class="option">-fno-fat-lto-objects</span></samp> on targets with linker plugin
support.
<br><dt><code>-fcompare-elim</code><dd><a name="index-fcompare_002delim-1067"></a>After register allocation and post-register allocation instruction splitting,
identify arithmetic instructions that compute processor flags similar to a
comparison operation based on that arithmetic. If possible, eliminate the
explicit comparison operation.
<p>This pass only applies to certain targets that cannot explicitly represent
the comparison operation before register allocation is complete.
<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
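<p>The kind of source that gives this pass something to work with is sketched below: an arithmetic result is immediately compared with zero. The function name is hypothetical, and whether the explicit comparison is actually removed depends on the target, since the pass applies only to certain targets.

```c
/* Store a - b through diff and report whether the result is zero.
   On a target whose subtract instruction also sets the processor flags,
   the flags from the subtraction already answer the question asked by
   the comparison, so a separate compare instruction may be unnecessary. */
int sub_and_test(unsigned int a, unsigned int b, unsigned int *diff)
{
  unsigned int d = a - b;   /* arithmetic instruction */
  *diff = d;
  return d == 0;            /* comparison based on that arithmetic */
}
```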
<br><dt><code>-fcprop-registers</code><dd><a name="index-fcprop_002dregisters-1068"></a>After register allocation and post-register allocation instruction splitting,
perform a copy-propagation pass to try to reduce scheduling dependencies
and occasionally eliminate the copy.
<p>Enabled at levels <samp><span class="option">-O</span></samp>, <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
<br><dt><code>-fprofile-correction</code><dd><a name="index-fprofile_002dcorrection-1069"></a>Profiles collected using an instrumented binary for multi-threaded programs may
be inconsistent due to missed counter updates. When this option is specified,
GCC uses heuristics to correct or smooth out such inconsistencies. By
default, GCC emits an error message when an inconsistent profile is detected.
<br><dt><code>-fprofile-dir=</code><var>path</var><dd><a name="index-fprofile_002ddir-1070"></a>
Set the directory in which to search for the profile data files to <var>path</var>.
This option affects only the profile data generated by
<samp><span class="option">-fprofile-generate</span></samp>, <samp><span class="option">-ftest-coverage</span></samp>, <samp><span class="option">-fprofile-arcs</span></samp>
and used by <samp><span class="option">-fprofile-use</span></samp> and <samp><span class="option">-fbranch-probabilities</span></samp>
and its related options. Both absolute and relative paths can be used.
By default, GCC uses the current directory as <var>path</var>, thus the
|
||
|
profile data file appears in the same directory as the object file.
|
||
|
|
||
|
<br><dt><code>-fprofile-generate</code><dt><code>-fprofile-generate=</code><var>path</var><dd><a name="index-fprofile_002dgenerate-1071"></a>
|
||
|
Enable options usually used for instrumenting an application to produce
|
||
|
a profile useful for later recompilation with profile feedback-based
|
||
|
optimization. You must use <samp><span class="option">-fprofile-generate</span></samp> both when
|
||
|
compiling and when linking your program.
|
||
|
|
||
|
<p>The following options are enabled: <samp><span class="option">-fprofile-arcs</span></samp>, <samp><span class="option">-fprofile-values</span></samp>, <samp><span class="option">-fvpt</span></samp>.
|
||
|
|
||
|
<p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find
|
||
|
the profile feedback data files. See <samp><span class="option">-fprofile-dir</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fprofile-use</code><dt><code>-fprofile-use=</code><var>path</var><dd><a name="index-fprofile_002duse-1072"></a>Enable profile feedback-directed optimizations,
|
||
|
and the following optimizations
|
||
|
which are generally profitable only with profile feedback available:
|
||
|
<samp><span class="option">-fbranch-probabilities</span></samp>, <samp><span class="option">-fvpt</span></samp>,
|
||
|
<samp><span class="option">-funroll-loops</span></samp>, <samp><span class="option">-fpeel-loops</span></samp>, <samp><span class="option">-ftracer</span></samp>,
|
||
|
<samp><span class="option">-ftree-vectorize</span></samp>, and <samp><span class="option">-ftree-loop-distribute-patterns</span></samp>.
|
||
|
|
||
|
<p>By default, GCC emits an error message if the feedback profiles do not
|
||
|
match the source code. This error can be turned into a warning by using
|
||
|
<samp><span class="option">-Wcoverage-mismatch</span></samp>. Note this may result in poorly optimized
|
||
|
code.
|
||
|
|
||
|
<p>If <var>path</var> is specified, GCC looks at the <var>path</var> to find
|
||
|
the profile feedback data files. See <samp><span class="option">-fprofile-dir</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fauto-profile</code><dt><code>-fauto-profile=</code><var>path</var><dd><a name="index-fauto_002dprofile-1073"></a>Enable sampling-based feedback-directed optimizations,
|
||
|
and the following optimizations
|
||
|
which are generally profitable only with profile feedback available:
|
||
|
<samp><span class="option">-fbranch-probabilities</span></samp>, <samp><span class="option">-fvpt</span></samp>,
|
||
|
<samp><span class="option">-funroll-loops</span></samp>, <samp><span class="option">-fpeel-loops</span></samp>, <samp><span class="option">-ftracer</span></samp>,
|
||
|
<samp><span class="option">-ftree-vectorize</span></samp>,
|
||
|
<samp><span class="option">-finline-functions</span></samp>, <samp><span class="option">-fipa-cp</span></samp>, <samp><span class="option">-fipa-cp-clone</span></samp>,
|
||
|
<samp><span class="option">-fpredictive-commoning</span></samp>, <samp><span class="option">-funswitch-loops</span></samp>,
|
||
|
<samp><span class="option">-fgcse-after-reload</span></samp>, and <samp><span class="option">-ftree-loop-distribute-patterns</span></samp>.
|
||
|
|
||
|
<p><var>path</var> is the name of a file containing AutoFDO profile information.
|
||
|
If omitted, it defaults to <samp><span class="file">fbdata.afdo</span></samp> in the current directory.
|
||
|
|
||
|
<p>Producing an AutoFDO profile data file requires running your program
|
||
|
with the <samp><span class="command">perf</span></samp> utility on a supported GNU/Linux target system.
|
||
|
For more information, see <a href="https://perf.wiki.kernel.org/">https://perf.wiki.kernel.org/</a>.
|
||
|
|
||
|
<p>E.g.
|
||
|
<pre class="smallexample"> perf record -e br_inst_retired:near_taken -b -o perf.data \
|
||
|
-- your_program
|
||
|
</pre>
|
||
|
<p>Then use the <samp><span class="command">create_gcov</span></samp> tool to convert the raw profile data
|
||
|
to a format that can be used by GCC. You must also supply the
|
||
|
unstripped binary for your program to this tool.
|
||
|
See <a href="https://github.com/google/autofdo">https://github.com/google/autofdo</a>.
|
||
|
|
||
|
<p>E.g.
|
||
|
<pre class="smallexample"> create_gcov --binary=your_program.unstripped --profile=perf.data \
|
||
|
--gcov=profile.afdo
|
||
|
</pre>
|
||
|
</dl>
|
||
|
|
||
|
<p>The following options control compiler behavior regarding floating-point
|
||
|
arithmetic. These options trade off between speed and
|
||
|
correctness. All must be specifically enabled.
|
||
|
|
||
|
<dl>
|
||
|
<dt><code>-ffloat-store</code><dd><a name="index-ffloat_002dstore-1074"></a>Do not store floating-point variables in registers, and inhibit other
|
||
|
options that might change whether a floating-point value is taken from a
|
||
|
register or memory.
|
||
|
|
||
|
<p><a name="index-floating_002dpoint-precision-1075"></a>This option prevents undesirable excess precision on machines such as
|
||
|
the 68000 where the floating registers (of the 68881) keep more
|
||
|
precision than a <code>double</code> is supposed to have. Similarly for the
|
||
|
x86 architecture. For most programs, the excess precision does only
|
||
|
good, but a few programs rely on the precise definition of IEEE floating
|
||
|
point. Use <samp><span class="option">-ffloat-store</span></samp> for such programs, after modifying
|
||
|
them to store all pertinent intermediate computations into variables.
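<p>As a rough sketch of that kind of modification, the first function below
compares an expression directly, while the second first assigns it to a
<code>double</code> variable. The function names are hypothetical, and whether
the two versions ever differ depends on the target and on the options in effect.

<pre class="smallexample">     /* The product may be compared while still held in a wider register.  */
     int product_matches_direct (double a, double b, double expected)
     {
       return a * b == expected;
     }

     /* The product is first assigned to a double variable.  With
        -ffloat-store the variable is not kept in a register, so the
        value is rounded to double before the comparison.  */
     int product_matches_stored (double a, double b, double expected)
     {
       double product = a * b;
       return product == expected;
     }
</pre>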
|
||
|
|
||
|
<br><dt><code>-fexcess-precision=</code><var>style</var><dd><a name="index-fexcess_002dprecision-1076"></a>This option allows further control over excess precision on machines
|
||
|
where floating-point registers have more precision than the IEEE
|
||
|
<code>float</code> and <code>double</code> types and the processor does not
|
||
|
support operations rounding to those types. By default,
|
||
|
<samp><span class="option">-fexcess-precision=fast</span></samp> is in effect; this means that
|
||
|
operations are carried out in the precision of the registers and that
|
||
|
it is unpredictable when rounding to the types specified in the source
|
||
|
code takes place. When compiling C, if
|
||
|
<samp><span class="option">-fexcess-precision=standard</span></samp> is specified then excess
|
||
|
precision follows the rules specified in ISO C99; in particular,
|
||
|
both casts and assignments cause values to be rounded to their
|
||
|
semantic types (whereas <samp><span class="option">-ffloat-store</span></samp> only affects
|
||
|
assignments). This option is enabled by default for C if a strict
|
||
|
conformance option such as <samp><span class="option">-std=c99</span></samp> is used.
|
||
|
|
||
|
<p><a name="index-mfpmath-1077"></a><samp><span class="option">-fexcess-precision=standard</span></samp> is not implemented for languages
|
||
|
other than C, and has no effect if
|
||
|
<samp><span class="option">-funsafe-math-optimizations</span></samp> or <samp><span class="option">-ffast-math</span></samp> is
|
||
|
specified. On the x86, it also has no effect if <samp><span class="option">-mfpmath=sse</span></samp>
|
||
|
or <samp><span class="option">-mfpmath=sse+387</span></samp> is specified; in the former case, IEEE
|
||
|
semantics apply without excess precision, and in the latter, rounding
|
||
|
is unpredictable.
|
||
|
|
||
|
<br><dt><code>-ffast-math</code><dd><a name="index-ffast_002dmath-1078"></a>Sets the options <samp><span class="option">-fno-math-errno</span></samp>, <samp><span class="option">-funsafe-math-optimizations</span></samp>,
|
||
|
<samp><span class="option">-ffinite-math-only</span></samp>, <samp><span class="option">-fno-rounding-math</span></samp>,
|
||
|
<samp><span class="option">-fno-signaling-nans</span></samp> and <samp><span class="option">-fcx-limited-range</span></samp>.
|
||
|
|
||
|
<p>This option causes the preprocessor macro <code>__FAST_MATH__</code> to be defined.
|
||
|
|
||
|
<p>This option is not turned on by any <samp><span class="option">-O</span></samp> option besides
|
||
|
<samp><span class="option">-Ofast</span></samp> since it can result in incorrect output for programs
|
||
|
that depend on an exact implementation of IEEE or ISO rules/specifications
|
||
|
for math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
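<p>Code that depends on strict semantics can test the
<code>__FAST_MATH__</code> macro mentioned above. The following is a minimal
sketch; the function name is hypothetical.

<pre class="smallexample">     /* Return 1 if this translation unit was compiled with -ffast-math
        (which defines __FAST_MATH__), and 0 otherwise.  */
     int compiled_with_fast_math (void)
     {
     #ifdef __FAST_MATH__
       return 1;
     #else
       return 0;
     #endif
     }
</pre>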
|
||
|
|
||
|
<br><dt><code>-fno-math-errno</code><dd><a name="index-fno_002dmath_002derrno-1079"></a>Do not set <code>errno</code> after calling math functions that are executed
|
||
|
with a single instruction, e.g., <code>sqrt</code>. A program that relies on
|
||
|
IEEE exceptions for math error handling may want to use this flag
|
||
|
for speed while maintaining IEEE arithmetic compatibility.
|
||
|
|
||
|
<p>This option is not turned on by any <samp><span class="option">-O</span></samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fmath-errno</span></samp>.
|
||
|
|
||
|
<p>On Darwin systems, the math library never sets <code>errno</code>. There is
|
||
|
therefore no reason for the compiler to consider the possibility that
|
||
|
it might, and <samp><span class="option">-fno-math-errno</span></samp> is the default.
|
||
|
|
||
|
<br><dt><code>-funsafe-math-optimizations</code><dd><a name="index-funsafe_002dmath_002doptimizations-1080"></a>
|
||
|
Allow optimizations for floating-point arithmetic that (a) assume
|
||
|
that arguments and results are valid and (b) may violate IEEE or
|
||
|
ANSI standards. When used at link-time, it may include libraries
|
||
|
or startup files that change the default FPU control word or other
|
||
|
similar optimizations.
|
||
|
|
||
|
<p>This option is not turned on by any <samp><span class="option">-O</span></samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
Enables <samp><span class="option">-fno-signed-zeros</span></samp>, <samp><span class="option">-fno-trapping-math</span></samp>,
|
||
|
<samp><span class="option">-fassociative-math</span></samp> and <samp><span class="option">-freciprocal-math</span></samp>.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-unsafe-math-optimizations</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fassociative-math</code><dd><a name="index-fassociative_002dmath-1081"></a>
|
||
|
Allow re-association of operands in series of floating-point operations.
|
||
|
This violates the ISO C and C++ language standards by possibly changing the
|
||
|
computation result. NOTE: re-ordering may change the sign of zero as
|
||
|
well as ignore NaNs and inhibit or create underflow or overflow (and
|
||
|
thus cannot be used on code that relies on rounding behavior like
|
||
|
<code>(x + 2**52) - 2**52</code>). May also reorder floating-point comparisons
|
||
|
and thus may not be used when ordered comparisons are required.
|
||
|
This option requires that both <samp><span class="option">-fno-signed-zeros</span></samp> and
|
||
|
<samp><span class="option">-fno-trapping-math</span></samp> be in effect. Moreover, it doesn't make
|
||
|
much sense with <samp><span class="option">-frounding-math</span></samp>. For Fortran the option
|
||
|
is automatically enabled when both <samp><span class="option">-fno-signed-zeros</span></samp> and
|
||
|
<samp><span class="option">-fno-trapping-math</span></samp> are in effect.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-associative-math</span></samp>.
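<p>The rounding idiom mentioned above might look as follows in C. This is only
a sketch: the function name is hypothetical, and it assumes IEEE
<code>double</code> arithmetic without excess precision, the default
round-to-nearest mode, and an argument between 0 and 2**51.

<pre class="smallexample">     /* Round X to an integral value.  Adding 2**52 pushes the fraction
        bits out of the significand; subtracting it again leaves the
        rounded value.  */
     double round_to_integer (double x)
     {
       const double two52 = 4503599627370496.0;  /* 2**52 */
       double shifted = x + two52;
       return shifted - two52;
     }
</pre>

<p>With re-association allowed, the compiler may simplify the two operations
to just <code>x</code>, which defeats the idiom.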
|
||
|
|
||
|
<br><dt><code>-freciprocal-math</code><dd><a name="index-freciprocal_002dmath-1082"></a>
|
||
|
Allow the reciprocal of a value to be used instead of dividing by
|
||
|
the value if this enables optimizations. For example <code>x / y</code>
|
||
|
can be replaced with <code>x * (1/y)</code>, which is useful if <code>(1/y)</code>
|
||
|
is subject to common subexpression elimination. Note that this loses
|
||
|
precision and increases the number of flops operating on the value.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-reciprocal-math</span></samp>.
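<p>The replacement described above can also be written by hand. In the sketch
below (function names are hypothetical) the second function computes the
reciprocal once and multiplies by it. The two functions agree exactly when
<code>y</code> is a power of two, but may differ in the last bit otherwise.

<pre class="smallexample">     /* One division per element.  */
     void scale_by_division (double *v, int n, double y)
     {
       int i;
       for (i = 0; i &lt; n; i++)
         v[i] = v[i] / y;
     }

     /* One division in total, then one multiplication per element.  */
     void scale_by_reciprocal (double *v, int n, double y)
     {
       double r = 1.0 / y;
       int i;
       for (i = 0; i &lt; n; i++)
         v[i] = v[i] * r;
     }
</pre>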
|
||
|
|
||
|
<br><dt><code>-ffinite-math-only</code><dd><a name="index-ffinite_002dmath_002donly-1083"></a>Allow optimizations for floating-point arithmetic that assume
|
||
|
that arguments and results are not NaNs or +-Infs.
|
||
|
|
||
|
<p>This option is not turned on by any <samp><span class="option">-O</span></samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions. It may, however, yield faster code for programs
|
||
|
that do not require the guarantees of these specifications.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-finite-math-only</span></samp>.
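<p>A typical casualty is the self-comparison test for NaN, sketched below with
a hypothetical function name. Under IEEE rules the comparison is true only for
a NaN; when the compiler may assume that no NaNs occur, it is free to treat the
result as always 0.

<pre class="smallexample">     /* Return 1 if X is a NaN, relying on the IEEE rule that a NaN
        compares unequal to everything, including itself.  */
     int value_is_nan (double x)
     {
       return x != x;
     }
</pre>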
|
||
|
|
||
|
<br><dt><code>-fno-signed-zeros</code><dd><a name="index-fno_002dsigned_002dzeros-1084"></a>Allow optimizations for floating-point arithmetic that ignore the
|
||
|
signedness of zero. IEEE arithmetic specifies the behavior of
|
||
|
distinct +0.0 and −0.0 values, which then prohibits simplification
|
||
|
of expressions such as x+0.0 or 0.0*x (even with <samp><span class="option">-ffinite-math-only</span></samp>).
|
||
|
This option implies that the sign of a zero result isn't significant.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fsigned-zeros</span></samp>.
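<p>The following sketch shows why <code>x+0.0</code> cannot simply be replaced
by <code>x</code> when signed zeros are honored. The function names are
hypothetical, and the default round-to-nearest mode is assumed.

<pre class="smallexample">     /* Return 1 if X is negative zero.  Dividing by a zero yields an
        infinity whose sign matches the sign of the zero.  */
     int is_negative_zero (double x)
     {
       return x == 0.0 &amp;&amp; 1.0 / x &lt; 0.0;
     }

     /* Under IEEE rules, -0.0 + 0.0 is +0.0, so this function is not
        the identity for a negative zero argument.  */
     double add_zero (double x)
     {
       return x + 0.0;
     }
</pre>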
|
||
|
|
||
|
<br><dt><code>-fno-trapping-math</code><dd><a name="index-fno_002dtrapping_002dmath-1085"></a>Compile code assuming that floating-point operations cannot generate
|
||
|
user-visible traps. These traps include division by zero, overflow,
|
||
|
underflow, inexact result and invalid operation. This option requires
|
||
|
that <samp><span class="option">-fno-signaling-nans</span></samp> be in effect. Setting this option may
|
||
|
allow faster code if one relies on “non-stop” IEEE arithmetic, for example.
|
||
|
|
||
|
<p>This option should never be turned on by any <samp><span class="option">-O</span></samp> option since
|
||
|
it can result in incorrect output for programs that depend on
|
||
|
an exact implementation of IEEE or ISO rules/specifications for
|
||
|
math functions.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-ftrapping-math</span></samp>.
|
||
|
|
||
|
<br><dt><code>-frounding-math</code><dd><a name="index-frounding_002dmath-1086"></a>Disable transformations and optimizations that assume default floating-point
|
||
|
rounding behavior. This is round-to-zero for all floating point
|
||
|
to integer conversions, and round-to-nearest for all other arithmetic
|
||
|
truncations. This option should be specified for programs that change
|
||
|
the FP rounding mode dynamically, or that may be executed with a
|
||
|
non-default rounding mode. This option disables constant folding of
|
||
|
floating-point expressions at compile time (which may be affected by
|
||
|
rounding mode) and arithmetic transformations that are unsafe in the
|
||
|
presence of sign-dependent rounding modes.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-rounding-math</span></samp>.
|
||
|
|
||
|
<p>This option is experimental and does not currently guarantee to
|
||
|
disable all GCC optimizations that are affected by rounding mode.
|
||
|
Future versions of GCC may provide finer control of this setting
|
||
|
using C99's <code>FENV_ACCESS</code> pragma. This command-line option
|
||
|
will be used to specify the default state for <code>FENV_ACCESS</code>.
|
||
|
|
||
|
<br><dt><code>-fsignaling-nans</code><dd><a name="index-fsignaling_002dnans-1087"></a>Compile code assuming that IEEE signaling NaNs may generate user-visible
|
||
|
traps during floating-point operations. Setting this option disables
|
||
|
optimizations that may change the number of exceptions visible with
|
||
|
signaling NaNs. This option implies <samp><span class="option">-ftrapping-math</span></samp>.
|
||
|
|
||
|
<p>This option causes the preprocessor macro <code>__SUPPORT_SNAN__</code> to
|
||
|
be defined.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-signaling-nans</span></samp>.
|
||
|
|
||
|
<p>This option is experimental and does not currently guarantee to
|
||
|
disable all GCC optimizations that affect signaling NaN behavior.
|
||
|
|
||
|
<br><dt><code>-fsingle-precision-constant</code><dd><a name="index-fsingle_002dprecision_002dconstant-1088"></a>Treat floating-point constants as single precision instead of
|
||
|
implicitly converting them to double-precision constants.
|
||
|
|
||
|
<br><dt><code>-fcx-limited-range</code><dd><a name="index-fcx_002dlimited_002drange-1089"></a>When enabled, this option states that a range reduction step is not
|
||
|
needed when performing complex division. Also, there is no checking
|
||
|
whether the result of a complex multiplication or division is <code>NaN
|
||
|
+ I*NaN</code>, with an attempt to rescue the situation in that case. The
|
||
|
default is <samp><span class="option">-fno-cx-limited-range</span></samp>, but is enabled by
|
||
|
<samp><span class="option">-ffast-math</span></samp>.
|
||
|
|
||
|
<p>This option controls the default setting of the ISO C99
|
||
|
<code>CX_LIMITED_RANGE</code> pragma. Nevertheless, the option applies to
|
||
|
all languages.
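<p>To give an idea of what a range reduction step buys, the sketch below
contrasts the textbook division formula with Smith's method, one common way of
scaling the operands. The type and function names are hypothetical, and this is
not necessarily the sequence GCC itself emits.

<pre class="smallexample">     struct cplx { double re, im; };

     /* Textbook formula with no range reduction: the intermediate
        re*re + im*im can overflow or underflow even when the quotient
        itself is representable.  */
     struct cplx divide_naive (struct cplx x, struct cplx y)
     {
       double denom = y.re * y.re + y.im * y.im;
       struct cplx q;
       q.re = (x.re * y.re + x.im * y.im) / denom;
       q.im = (x.im * y.re - x.re * y.im) / denom;
       return q;
     }

     /* Smith's method: scale by the ratio of the smaller to the larger
        component of the divisor so that intermediates stay in range.  */
     struct cplx divide_scaled (struct cplx x, struct cplx y)
     {
       double abs_re = y.re &lt; 0 ? -y.re : y.re;
       double abs_im = y.im &lt; 0 ? -y.im : y.im;
       struct cplx q;
       if (abs_re &gt;= abs_im)
         {
           double r = y.im / y.re;
           double denom = y.re + y.im * r;
           q.re = (x.re + x.im * r) / denom;
           q.im = (x.im - x.re * r) / denom;
         }
       else
         {
           double r = y.re / y.im;
           double denom = y.re * r + y.im;
           q.re = (x.re * r + x.im) / denom;
           q.im = (x.im * r - x.re) / denom;
         }
       return q;
     }
</pre>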
|
||
|
|
||
|
<br><dt><code>-fcx-fortran-rules</code><dd><a name="index-fcx_002dfortran_002drules-1090"></a>Complex multiplication and division follow Fortran rules. Range
|
||
|
reduction is done as part of complex division, but there is no checking
|
||
|
whether the result of a complex multiplication or division is <code>NaN
|
||
|
+ I*NaN</code>, with an attempt to rescue the situation in that case.
|
||
|
|
||
|
<p>The default is <samp><span class="option">-fno-cx-fortran-rules</span></samp>.
|
||
|
|
||
|
</dl>
|
||
|
|
||
|
<p>The following options control optimizations that may improve
|
||
|
performance, but are not enabled by any <samp><span class="option">-O</span></samp> options. This
|
||
|
section includes experimental options that may produce broken code.
|
||
|
|
||
|
<dl>
|
||
|
<dt><code>-fbranch-probabilities</code><dd><a name="index-fbranch_002dprobabilities-1091"></a>After running a program compiled with <samp><span class="option">-fprofile-arcs</span></samp>
|
||
|
(see <a href="Debugging-Options.html#Debugging-Options">Options for Debugging Your Program or <samp><span class="command">gcc</span></samp></a>), you can compile it a second time using
|
||
|
<samp><span class="option">-fbranch-probabilities</span></samp>, to improve optimizations based on
|
||
|
the number of times each branch was taken. When a program
|
||
|
compiled with <samp><span class="option">-fprofile-arcs</span></samp> exits, it saves arc execution
|
||
|
counts to a file called <samp><var>sourcename</var><span class="file">.gcda</span></samp> for each source
|
||
|
file. The information in this data file is very dependent on the
|
||
|
structure of the generated code, so you must use the same source code
|
||
|
and the same optimization options for both compilations.
|
||
|
|
||
|
<p>With <samp><span class="option">-fbranch-probabilities</span></samp>, GCC puts a
|
||
|
‘<samp><span class="samp">REG_BR_PROB</span></samp>’ note on each ‘<samp><span class="samp">JUMP_INSN</span></samp>’ and ‘<samp><span class="samp">CALL_INSN</span></samp>’.
|
||
|
These can be used to improve optimization. Currently, they are only
|
||
|
used in one place: in <samp><span class="file">reorg.c</span></samp>, instead of guessing which path a
|
||
|
branch is most likely to take, the ‘<samp><span class="samp">REG_BR_PROB</span></samp>’ values are used to
|
||
|
exactly determine which path is taken more often.
|
||
|
|
||
|
<br><dt><code>-fprofile-values</code><dd><a name="index-fprofile_002dvalues-1092"></a>If combined with <samp><span class="option">-fprofile-arcs</span></samp>, it adds code so that some
|
||
|
data about values of expressions in the program is gathered.
|
||
|
|
||
|
<p>With <samp><span class="option">-fbranch-probabilities</span></samp>, it reads back the data gathered
|
||
|
from profiling values of expressions for usage in optimizations.
|
||
|
|
||
|
<p>Enabled with <samp><span class="option">-fprofile-generate</span></samp> and <samp><span class="option">-fprofile-use</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fprofile-reorder-functions</code><dd><a name="index-fprofile_002dreorder_002dfunctions-1093"></a>Function reordering based on profile instrumentation collects
|
||
|
the time of first execution of each function and orders these functions
|
||
|
in ascending order.
|
||
|
|
||
|
<p>Enabled with <samp><span class="option">-fprofile-use</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fvpt</code><dd><a name="index-fvpt-1094"></a>If combined with <samp><span class="option">-fprofile-arcs</span></samp>, this option instructs the compiler
|
||
|
to add code to gather information about values of expressions.
|
||
|
|
||
|
<p>With <samp><span class="option">-fbranch-probabilities</span></samp>, it reads back the data gathered
|
||
|
and actually performs the optimizations based on them.
|
||
|
Currently the optimizations include specialization of division operations
|
||
|
using the knowledge about the value of the denominator.
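<p>The division specialization can be pictured as the source-level rewrite
below. It assumes, purely for illustration, that the profile shows the
denominator is nearly always 16; the function name and that value are
hypothetical, and GCC performs the transformation internally rather than on
the source.

<pre class="smallexample">     /* Fast path for the commonly observed denominator, where the
        division by a constant can become a shift; general division
        otherwise.  */
     unsigned divide_specialized (unsigned n, unsigned d)
     {
       if (d == 16)
         return n / 16;
       return n / d;
     }
</pre>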
|
||
|
|
||
|
<br><dt><code>-frename-registers</code><dd><a name="index-frename_002dregisters-1095"></a>Attempt to avoid false dependencies in scheduled code by making use
|
||
|
of registers left over after register allocation. This optimization
|
||
|
most benefits processors with lots of registers. Depending on the
|
||
|
debug information format adopted by the target, however, it can
|
||
|
make debugging impossible, since variables no longer stay in
|
||
|
a “home register”.
|
||
|
|
||
|
<p>Enabled by default with <samp><span class="option">-funroll-loops</span></samp> and <samp><span class="option">-fpeel-loops</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fschedule-fusion</code><dd><a name="index-fschedule_002dfusion-1096"></a>Performs a target dependent pass over the instruction stream to schedule
|
||
|
instructions of the same type together because the target machine can execute them
|
||
|
more efficiently if they are adjacent to each other in the instruction flow.
|
||
|
|
||
|
<p>Enabled at levels <samp><span class="option">-O2</span></samp>, <samp><span class="option">-O3</span></samp>, <samp><span class="option">-Os</span></samp>.
|
||
|
|
||
|
<br><dt><code>-ftracer</code><dd><a name="index-ftracer-1097"></a>Perform tail duplication to enlarge superblock size. This transformation
|
||
|
simplifies the control flow of the function allowing other optimizations to do
|
||
|
a better job.
|
||
|
|
||
|
<p>Enabled with <samp><span class="option">-fprofile-use</span></samp>.
|
||
|
|
||
|
<br><dt><code>-funroll-loops</code><dd><a name="index-funroll_002dloops-1098"></a>Unroll loops whose number of iterations can be determined at compile time or
|
||
|
upon entry to the loop. <samp><span class="option">-funroll-loops</span></samp> implies
|
||
|
<samp><span class="option">-frerun-cse-after-loop</span></samp>, <samp><span class="option">-fweb</span></samp> and <samp><span class="option">-frename-registers</span></samp>.
|
||
|
It also turns on complete loop peeling (i.e. complete removal of loops with
|
||
|
a small constant number of iterations). This option makes code larger, and may
|
||
|
or may not make it run faster.
|
||
|
|
||
|
<p>Enabled with <samp><span class="option">-fprofile-use</span></samp>.
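<p>As an illustration of the effect, the sketch below shows a loop and a
hand-unrolled equivalent. The function names and the unroll factor of 4 are
hypothetical; the compiler chooses its own factor.

<pre class="smallexample">     long sum_rolled (const int *v, int n)
     {
       long total = 0;
       int i;
       for (i = 0; i &lt; n; i++)
         total += v[i];
       return total;
     }

     long sum_unrolled (const int *v, int n)
     {
       long total = 0;
       int i = 0;
       /* Main body: four elements per iteration, one loop test.  */
       for (; i + 3 &lt; n; i += 4)
         total += (long) v[i] + v[i + 1] + v[i + 2] + v[i + 3];
       /* Remainder loop for the last n % 4 elements.  */
       for (; i &lt; n; i++)
         total += v[i];
       return total;
     }
</pre>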
|
||
|
|
||
|
<br><dt><code>-funroll-all-loops</code><dd><a name="index-funroll_002dall_002dloops-1099"></a>Unroll all loops, even if their number of iterations is uncertain when
|
||
|
the loop is entered. This usually makes programs run more slowly.
|
||
|
<samp><span class="option">-funroll-all-loops</span></samp> implies the same options as
|
||
|
<samp><span class="option">-funroll-loops</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fpeel-loops</code><dd><a name="index-fpeel_002dloops-1100"></a>Peels loops for which there is enough information that they do not
|
||
|
roll much (from profile feedback). It also turns on complete loop peeling
|
||
|
(i.e. complete removal of loops with a small constant number of iterations).
|
||
|
|
||
|
<p>Enabled with <samp><span class="option">-fprofile-use</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fmove-loop-invariants</code><dd><a name="index-fmove_002dloop_002dinvariants-1101"></a>Enables the loop invariant motion pass in the RTL loop optimizer. Enabled
|
||
|
at level <samp><span class="option">-O1</span></samp>.
|
||
|
|
||
|
<br><dt><code>-funswitch-loops</code><dd><a name="index-funswitch_002dloops-1102"></a>Move branches with loop invariant conditions out of the loop, with duplicates
|
||
|
of the loop on both branches (modified according to the result of the condition).
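<p>Written out by hand, the transformation looks roughly like the sketch below;
the function names are hypothetical.

<pre class="smallexample">     /* Before: the loop-invariant condition is tested on every
        iteration.  */
     void update (int *v, int n, int doubling)
     {
       int i;
       for (i = 0; i &lt; n; i++)
         {
           if (doubling)
             v[i] *= 2;
           else
             v[i] += 1;
         }
     }

     /* After: the condition is tested once, and each branch has its
        own copy of the loop.  */
     void update_unswitched (int *v, int n, int doubling)
     {
       int i;
       if (doubling)
         for (i = 0; i &lt; n; i++)
           v[i] *= 2;
       else
         for (i = 0; i &lt; n; i++)
           v[i] += 1;
     }
</pre>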
|
||
|
|
||
|
<br><dt><code>-ffunction-sections</code><dt><code>-fdata-sections</code><dd><a name="index-ffunction_002dsections-1103"></a><a name="index-fdata_002dsections-1104"></a>Place each function or data item into its own section in the output
|
||
|
file if the target supports arbitrary sections. The name of the
|
||
|
function or the name of the data item determines the section's name
|
||
|
in the output file.
|
||
|
|
||
|
<p>Use these options on systems where the linker can perform optimizations
|
||
|
to improve locality of reference in the instruction space. Most systems
|
||
|
using the ELF object format and SPARC processors running Solaris 2 have
|
||
|
linkers with such optimizations. AIX may have these optimizations in
|
||
|
the future.
|
||
|
|
||
|
<p>Only use these options when there are significant benefits from doing
|
||
|
so. When you specify these options, the assembler and linker
|
||
|
create larger object and executable files and are also slower.
|
||
|
You cannot use <samp><span class="command">gprof</span></samp> on all systems if you
|
||
|
specify this option, and you may have problems with debugging if
|
||
|
you specify both this option and <samp><span class="option">-g</span></samp>.
|
||
|
|
||
|
<br><dt><code>-fbranch-target-load-optimize</code><dd><a name="index-fbranch_002dtarget_002dload_002doptimize-1105"></a>Perform branch target register load optimization before prologue / epilogue
|
||
|
threading.
|
||
|
The use of target registers can typically be exposed only during reload,
|
||
|
thus hoisting loads out of loops and doing inter-block scheduling needs
|
||
|
a separate optimization pass.
|
||
|
|
||
|
<br><dt><code>-fbranch-target-load-optimize2</code><dd><a name="index-fbranch_002dtarget_002dload_002doptimize2-1106"></a>Perform branch target register load optimization after prologue / epilogue
|
||
|
threading.
|
||
|
|
||
|
<br><dt><code>-fbtr-bb-exclusive</code><dd><a name="index-fbtr_002dbb_002dexclusive-1107"></a>When performing branch target register load optimization, don't reuse
|
||
|
branch target registers within any basic block.
|
||
|
|
||
|
<br><dt><code>-fstack-protector</code><dd><a name="index-fstack_002dprotector-1108"></a>Emit extra code to check for buffer overflows, such as stack smashing
|
||
|
attacks. This is done by adding a guard variable to functions with
|
||
|
vulnerable objects. This includes functions that call <code>alloca</code>, and
|
||
|
functions with buffers larger than 8 bytes. The guards are initialized
|
||
|
when a function is entered and then checked when the function exits.
|
||
|
If a guard check fails, an error message is printed and the program exits.
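<p>For instance, a function such as the following would be expected to receive
a guard, because its local buffer is larger than 8 bytes. This is a sketch with
a hypothetical function name; the copy is bounded, so the guard is never
triggered here.

<pre class="smallexample">     #include &lt;string.h&gt;

     /* Copy at most 31 characters of INPUT into a local buffer and
        return the length of the copy.  */
     size_t copy_and_measure (const char *input)
     {
       char buffer[32];
       strncpy (buffer, input, sizeof buffer - 1);
       buffer[sizeof buffer - 1] = '\0';
       return strlen (buffer);
     }
</pre>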
|
||
|
|
||
|
<br><dt><code>-fstack-protector-all</code><dd><a name="index-fstack_002dprotector_002dall-1109"></a>Like <samp><span class="option">-fstack-protector</span></samp> except that all functions are protected.
|
||
|
|
||
|
<br><dt><code>-fstack-protector-strong</code><dd><a name="index-fstack_002dprotector_002dstrong-1110"></a>Like <samp><span class="option">-fstack-protector</span></samp> but includes additional functions to
|
||
|
be protected — those that have local array definitions, or have
|
||
|
references to local frame addresses.
|
||
|
|
||
|
<br><dt><code>-fstack-protector-explicit</code><dd><a name="index-fstack_002dprotector_002dexplicit-1111"></a>Like <samp><span class="option">-fstack-protector</span></samp> but only protects those functions which
|
||
|
have the <code>stack_protect</code> attribute.
|
||
|
|
||
|
<br><dt><code>-fstdarg-opt</code><dd><a name="index-fstdarg_002dopt-1112"></a>Optimize the prologue of variadic argument functions with respect to usage of
|
||
|
those arguments.
|
||
|
|
||
|
<br><dt><code>-fsection-anchors</code><dd><a name="index-fsection_002danchors-1113"></a>Try to reduce the number of symbolic address calculations by using
|
||
|
shared “anchor” symbols to address nearby objects. This transformation
|
||
|
can help to reduce the number of GOT entries and GOT accesses on some
|
||
|
targets.
|
||
|
|
||
|
<p>For example, the implementation of the following function <code>foo</code>:
|
||
|
|
||
|
<pre class="smallexample"> static int a, b, c;
|
||
|
int foo (void) { return a + b + c; }
|
||
|
</pre>
|
||
|
<p class="noindent">usually calculates the addresses of all three variables, but if you
|
||
|
compile it with <samp><span class="option">-fsection-anchors</span></samp>, it accesses the variables
|
||
|
from a common anchor point instead. The effect is similar to the
|
||
|
following pseudocode (which isn't valid C):
|
||
|
|
||
|
<pre class="smallexample"> int foo (void)
|
||
|
{
|
||
|
register int *xr = &x;
|
||
|
return xr[&a - &x] + xr[&b - &x] + xr[&c - &x];
|
||
|
}
|
||
|
</pre>
|
||
|
<p>Not all targets support this option.
|
||
|
|
||
|
<br><dt><code>--param </code><var>name</var><code>=</code><var>value</var><dd><a name="index-param-1114"></a>In some places, GCC uses various constants to control the amount of
|
||
|
optimization that is done. For example, GCC does not inline functions
|
||
|
that contain more than a certain number of instructions. You can
|
||
|
control some of these constants on the command line using the
|
||
|
<samp><span class="option">--param</span></samp> option.
|
||
|
|
||
|
<p>The names of specific parameters, and the meaning of the values, are
|
||
|
tied to the internals of the compiler, and are subject to change
|
||
|
without notice in future releases.
|
||
|
|
||
|
<p>In each case, the <var>value</var> is an integer. The allowable choices for
|
||
|
<var>name</var> are:
|
||
|
|
||
|
<dl>
|
||
|
<dt><code>predictable-branch-outcome</code><dd>When a branch is predicted to be taken with probability lower than this threshold
|
||
|
(in percent), then it is considered well predictable. The default is 10.
|
||
|
|
||
|
<br><dt><code>max-crossjump-edges</code><dd>The maximum number of incoming edges to consider for cross-jumping.
|
||
|
The algorithm used by <samp><span class="option">-fcrossjumping</span></samp> is O(N^2) in
|
||
|
the number of edges incoming to each block. Increasing values mean
|
||
|
more aggressive optimization, making the compilation time increase with
|
||
|
probably small improvement in executable size.
|
||
|
|
||
|
<br><dt><code>min-crossjump-insns</code><dd>The minimum number of instructions that must be matched at the end
|
||
|
of two blocks before cross-jumping is performed on them. This
|
||
|
value is ignored in the case where all instructions in the block being
|
||
|
cross-jumped from are matched. The default value is 5.
|
||
|
|
||
|
<br><dt><code>max-grow-copy-bb-insns</code><dd>The maximum code size expansion factor when copying basic blocks
|
||
|
instead of jumping. The expansion is relative to a jump instruction.
|
||
|
The default value is 8.
|
||
|
|
||
|
<br><dt><code>max-goto-duplication-insns</code><dd>The maximum number of instructions to duplicate to a block that jumps
|
||
|
to a computed goto. To avoid O(N^2) behavior in a number of
|
||
|
passes, GCC factors computed gotos early in the compilation process,
|
||
|
and unfactors them as late as possible. Only computed jumps at the
|
||
|
end of a basic block with no more than max-goto-duplication-insns are
|
||
|
unfactored. The default value is 8.
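<p>For readers unfamiliar with computed gotos, the following sketch shows the kind of code this parameter concerns. It relies on the GNU C labels-as-values extension; the opcode names and the <code>run</code> function are hypothetical, and the sketch does not show what GCC does internally when it factors or unfactors the jumps.

```c
/* Tiny bytecode interpreter using computed gotos (GNU C extension).
   Opcode names and run() are made up for this illustration.  */
enum { OP_INC, OP_DEC, OP_HALT };

int run(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode.  */
    static void *const dispatch[] = { &&do_inc, &&do_dec, &&do_halt };
    int acc = 0;

    goto *dispatch[*code++];    /* computed goto ending a basic block */
do_inc:
    acc++;
    goto *dispatch[*code++];    /* another block ending in a computed goto */
do_dec:
    acc--;
    goto *dispatch[*code++];
do_halt:
    return acc;
}
```

<p>Each handler here is only a few instructions long and ends in its own computed jump, which is the situation the description above refers to.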
|
||
|
|
||
|
<br><dt><code>max-delay-slot-insn-search</code><dd>The maximum number of instructions to consider when looking for an
|
||
|
instruction to fill a delay slot. If more than this arbitrary number of
|
||
|
instructions are searched, the time savings from filling the delay slot
|
||
|
are minimal, so stop searching. Increasing values mean more
|
||
|
aggressive optimization, making the compilation time increase with probably
|
||
|
small improvement in execution time.
|
||
|
|
||
|
<br><dt><code>max-delay-slot-live-search</code><dd>When trying to fill delay slots, the maximum number of instructions to
|
||
|
consider when searching for a block with valid live register
|
||
|
information. Increasing this arbitrarily chosen value means more
|
||
|
aggressive optimization, increasing the compilation time. This parameter
|
||
|
should be removed when the delay slot code is rewritten to maintain the
|
||
|
control-flow graph.
|
||
|
|
||
|
<br><dt><code>max-gcse-memory</code><dd>The approximate maximum amount of memory that can be allocated in
|
||
|
order to perform the global common subexpression elimination
|
||
|
optimization. If more memory than specified is required, the
|
||
|
optimization is not done.
|
||
|
|
||
|
<br><dt><code>max-gcse-insertion-ratio</code><dd>If the ratio of expression insertions to deletions is larger than this value
|
||
|
for any expression, then RTL PRE does not insert or remove the expression and thus
|
||
|
leaves partially redundant computations in the instruction stream. The default value is 20.
|
||
|
|
||
|
<br><dt><code>max-pending-list-length</code><dd>The maximum number of pending dependencies scheduling allows
|
||
|
before flushing the current state and starting over. Large functions
|
||
|
with few branches or calls can create excessively large lists which
|
||
|
needlessly consume memory and resources.
|
||
|
|
||
|
<br><dt><code>max-modulo-backtrack-attempts</code><dd>The maximum number of backtrack attempts the scheduler should make
|
||
|
when modulo scheduling a loop. Larger values can exponentially increase
|
||
|
compilation time.
|
||
|
|
||
|
<br><dt><code>max-inline-insns-single</code><dd>Several parameters control the tree inliner used in GCC.
|
||
|
This number sets the maximum number of instructions (counted in GCC's
|
||
|
internal representation) in a single function that the tree inliner
|
||
|
considers for inlining. This only affects functions declared
|
||
|
inline and methods implemented in a class declaration (C++).
|
||
|
The default value is 400.
|
||
|
|
||
|
<br><dt><code>max-inline-insns-auto</code><dd>When you use <samp><span class="option">-finline-functions</span></samp> (included in <samp><span class="option">-O3</span></samp>),
|
||
|
a lot of functions that would otherwise not be considered for inlining
|
||
|
by the compiler are investigated. To those functions, a different
|
||
|
(more restrictive) limit compared to functions declared inline can
|
||
|
be applied.
|
||
|
The default value is 40.
|
||
|
|
||
|
<br><dt><code>inline-min-speedup</code><dd>When estimated performance improvement of caller + callee runtime exceeds this
|
||
|
threshold (in percent), the function can be inlined regardless of the limit on
|
||
|
<samp><span class="option">--param max-inline-insns-single</span></samp> and <samp><span class="option">--param
|
||
|
max-inline-insns-auto</span></samp>.
|
||
|
|
||
|
<br><dt><code>large-function-insns</code><dd>The limit specifying really large functions. For functions larger than this
|
||
|
limit after inlining, inlining is constrained by
|
||
|
<samp><span class="option">--param large-function-growth</span></samp>. This parameter is useful primarily
|
||
|
to avoid extreme compilation time caused by non-linear algorithms used by the
|
||
|
back end.
|
||
|
The default value is 2700.
|
||
|
|
||
|
<br><dt><code>large-function-growth</code><dd>Specifies maximal growth of a large function caused by inlining, in percent.
|
||
|
The default value is 100 which limits large function growth to 2.0 times
|
||
|
the original size.
|
||
|
|
||
|
<br><dt><code>large-unit-insns</code><dd>The limit specifying a large translation unit. Growth caused by inlining of
|
||
|
units larger than this limit is limited by <samp><span class="option">--param inline-unit-growth</span></samp>.
|
||
|
For small units this might be too tight.
|
||
|
For example, consider a unit consisting of function A
|
||
|
that is inline and B that just calls A three times. If B is small relative to
|
||
|
A, the growth of unit is 300% and yet such inlining is very sane. For very
|
||
|
large units consisting of small inlineable functions, however, the overall unit
|
||
|
growth limit is needed to avoid exponential explosion of code size. Thus for
|
||
|
smaller units, the size is increased to <samp><span class="option">--param large-unit-insns</span></samp>
|
||
|
before applying <samp><span class="option">--param inline-unit-growth</span></samp>. The default is 10000.
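<p>The rule in the last sentence can be sketched as plain arithmetic. The function below is an illustration under the assumption that sizes are simple instruction counts; its name and signature are hypothetical and it is not GCC's own code.

```c
/* Upper bound on unit size after inlining, following the description
   above: units smaller than large_unit_insns are first raised to that
   size, then the growth percentage is applied.  Illustrative only.  */
long unit_size_limit(long unit_insns, long large_unit_insns,
                     long growth_percent)
{
    long base = unit_insns;

    if (base < large_unit_insns)
        base = large_unit_insns;    /* small units get extra headroom */

    return base + base * growth_percent / 100;
}
```

<p>With the defaults quoted in this section (10000 and 20), a unit of 500 instructions may grow to 12000, while a unit of 50000 instructions may grow to 60000.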
|
||
|
|
||
|
<br><dt><code>inline-unit-growth</code><dd>Specifies maximal overall growth of the compilation unit caused by inlining.
|
||
|
The default value is 20 which limits unit growth to 1.2 times the original
|
||
|
size. Cold functions (either marked cold via an attribute or by profile
|
||
|
feedback) are not accounted into the unit size.
|
||
|
|
||
|
<br><dt><code>ipcp-unit-growth</code><dd>Specifies maximal overall growth of the compilation unit caused by
|
||
|
interprocedural constant propagation. The default value is 10 which limits
|
||
|
unit growth to 1.1 times the original size.
|
||
|
|
||
|
<br><dt><code>large-stack-frame</code><dd>The limit specifying large stack frames. While inlining, the algorithm tries
|
||
|
to not grow past this limit too much. The default value is 256 bytes.
|
||
|
|
||
|
<br><dt><code>large-stack-frame-growth</code><dd>Specifies maximal growth of large stack frames caused by inlining, in percent.
|
||
|
The default value is 1000 which limits large stack frame growth to 11 times
|
||
|
the original size.
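<p>The growth parameters in this group are percentages added on top of the original size, which is why 100 corresponds to a factor of 2.0 and 1000 to a factor of 11. A minimal sketch of that arithmetic follows; the function name is hypothetical and this is not GCC's implementation.

```c
/* Size ceiling implied by a growth limit given in percent:
   original size plus growth_percent percent of it.  */
long growth_limit(long original_size, long growth_percent)
{
    return original_size + original_size * growth_percent / 100;
}
```

<p>For example, a 256-byte frame with a growth limit of 1000 gives a ceiling of 2816 bytes, which is 11 times 256.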
|
||
|
|
||
|
<br><dt><code>max-inline-insns-recursive</code><dt><code>max-inline-insns-recursive-auto</code><dd>Specifies the maximum number of instructions an out-of-line copy of a
|
||
|
self-recursive inline
|
||
|
function can grow into by performing recursive inlining.
|
||
|
|
||
|
<p><samp><span class="option">--param max-inline-insns-recursive</span></samp> applies to functions
|
||
|
declared inline.
|
||
|
For functions not declared inline, recursive inlining
|
||
|
happens only when <samp><span class="option">-finline-functions</span></samp> (included in <samp><span class="option">-O3</span></samp>) is
|
||
|
enabled; <samp><span class="option">--param max-inline-insns-recursive-auto</span></samp> applies instead. The
|
||
|
default value is 450.
|
||
|
|
||
|
<br><dt><code>max-inline-recursive-depth</code><dt><code>max-inline-recursive-depth-auto</code><dd>Specifies the maximum recursion depth used for recursive inlining.
|
||
|
|
||
|
<p><samp><span class="option">--param max-inline-recursive-depth</span></samp> applies to functions
|
||
|
declared inline. For functions not declared inline, recursive inlining
|
||
|
happens only when <samp><span class="option">-finline-functions</span></samp> (included in <samp><span class="option">-O3</span></samp>) is
|
||
|
enabled; <samp><span class="option">--param max-inline-recursive-depth-auto</span></samp> applies instead. The
|
||
|
default value is 8.
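<p>As an illustration of the kind of function these limits apply to, here is a self-recursive function declared inline. The function names are hypothetical; whether and how deeply GCC inlines the recursion depends on the options and release in use.

```c
/* Self-recursive and declared inline, so the non-"auto" variants of the
   recursive inlining parameters are the ones that would apply.  */
static inline unsigned long factorial(unsigned int n)
{
    return n <= 1 ? 1UL : n * factorial(n - 1);
}

/* Out-of-line entry point so the inline function has a caller.  */
unsigned long factorial_of(unsigned int n)
{
    return factorial(n);
}
```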
|
||
|
|
||
|
<br><dt><code>min-inline-recursive-probability</code><dd>Recursive inlining is profitable only for functions having deep recursion
|
||
|
on average and can hurt for functions having little recursion depth by
|
||
|
increasing the prologue size or complexity of function body to other
|
||
|
optimizers.
|
||
|
|
||
|
<p>When profile feedback is available (see <samp><span class="option">-fprofile-generate</span></samp>) the actual
|
||
|
recursion depth can be guessed from probability that function recurses via a
|
||
|
given call expression. This parameter limits inlining only to call expressions
|
||
|
whose probability exceeds the given threshold (in percent).
|
||
|
The default value is 10.
|
||
|
|
||
|
<br><dt><code>early-inlining-insns</code><dd>Specify growth that the early inliner can make. In effect it increases
|
||
|
the amount of inlining for code having a large abstraction penalty.
|
||
|
The default value is 14.
|
||
|
|
||
|
<br><dt><code>max-early-inliner-iterations</code><dd>Limit of iterations of the early inliner. This basically bounds
|
||
|
the number of nested indirect calls the early inliner can resolve.
|
||
|
Deeper chains are still handled by late inlining.
|
||
|
|
||
|
<br><dt><code>comdat-sharing-probability</code><dd>Probability (in percent) that C++ inline functions with comdat visibility
|
||
|
are shared across multiple compilation units. The default value is 20.
|
||
|
|
||
|
<br><dt><code>profile-func-internal-id</code><dd>A parameter to control whether to use function internal id in profile
|
||
|
database lookup. If the value is 0, the compiler uses an id that
|
||
|
is based on function assembler name and filename, which makes old profile
|
||
|
data more tolerant to source changes such as function reordering etc.
|
||
|
The default value is 0.
|
||
|
|
||
|
<br><dt><code>min-vect-loop-bound</code><dd>The minimum number of iterations under which loops are not vectorized
|
||
|
when <samp><span class="option">-ftree-vectorize</span></samp> is used. The number of iterations after
|
||
|
vectorization needs to be greater than the value specified by this option
|
||
|
to allow vectorization. The default value is 0.
|
||
|
|
||
|
<br><dt><code>gcse-cost-distance-ratio</code><dd>Scaling factor in calculation of maximum distance an expression
|
||
|
can be moved by GCSE optimizations. This is currently supported only in the
|
||
|
code hoisting pass. The bigger the ratio, the more aggressive code hoisting
|
||
|
is with simple expressions, i.e., the expressions that have cost
|
||
|
less than <samp><span class="option">gcse-unrestricted-cost</span></samp>. Specifying 0 disables
|
||
|
hoisting of simple expressions. The default value is 10.
|
||
|
|
||
|
<br><dt><code>gcse-unrestricted-cost</code><dd>Cost, roughly measured as the cost of a single typical machine
|
||
|
instruction, at which GCSE optimizations do not constrain
|
||
|
the distance an expression can travel. This is currently
|
||
|
supported only in the code hoisting pass. The lesser the cost,
|
||
|
the more aggressive code hoisting is. Specifying 0
|
||
|
allows all expressions to travel unrestricted distances.
|
||
|
The default value is 3.
|
||
|
|
||
|
<br><dt><code>max-hoist-depth</code><dd>The depth of search in the dominator tree for expressions to hoist.
|
||
|
This is used to avoid quadratic behavior in hoisting algorithm.
|
||
|
A value of 0 does not limit the search, but may slow down compilation
|
||
|
of huge functions. The default value is 30.
|
||
|
|
||
|
<br><dt><code>max-tail-merge-comparisons</code><dd>The maximum number of similar basic blocks to compare a basic block with. This is used to
|
||
|
avoid quadratic behavior in tree tail merging. The default value is 10.
|
||
|
|
||
|
<br><dt><code>max-tail-merge-iterations</code><dd>The maximum number of iterations of the pass over the function. This is used to
|
||
|
limit compilation time in tree tail merging. The default value is 2.
|
||
|
|
||
|
<br><dt><code>max-unrolled-insns</code><dd>The maximum number of instructions that a loop may have to be unrolled.
|
||
|
If a loop is unrolled, this parameter also determines how many times
|
||
|
the loop code is unrolled.
|
||
|
|
||
|
<br><dt><code>max-average-unrolled-insns</code><dd>The maximum number of instructions biased by probabilities of their execution
|
||
|
that a loop may have to be unrolled. If a loop is unrolled,
|
||
|
this parameter also determines how many times the loop code is unrolled.
|
||
|
|
||
|
<br><dt><code>max-unroll-times</code><dd>The maximum number of unrollings of a single loop.
|
||
|
|
||
|
<br><dt><code>max-peeled-insns</code><dd>The maximum number of instructions that a loop may have to be peeled.
|
||
|
If a loop is peeled, this parameter also determines how many times
|
||
|
the loop code is peeled.
|
||
|
|
||
|
<br><dt><code>max-peel-times</code><dd>The maximum number of peelings of a single loop.
|
||
|
|
||
|
<br><dt><code>max-peel-branches</code><dd>The maximum number of branches on the hot path through the peeled sequence.
|
||
|
|
||
|
<br><dt><code>max-completely-peeled-insns</code><dd>The maximum number of insns of a completely peeled loop.
|
||
|
|
||
|
<br><dt><code>max-completely-peel-times</code><dd>The maximum number of iterations of a loop to be suitable for complete peeling.
|
||
|
|
||
|
<br><dt><code>max-completely-peel-loop-nest-depth</code><dd>The maximum depth of a loop nest suitable for complete peeling.
|
||
|
|
||
|
<br><dt><code>max-unswitch-insns</code><dd>The maximum number of insns of an unswitched loop.
|
||
|
|
||
|
<br><dt><code>max-unswitch-level</code><dd>The maximum number of branches unswitched in a single loop.
|
||
|
|
||
|
<br><dt><code>lim-expensive</code><dd>The minimum cost of an expensive expression in the loop invariant motion.
|
||
|
|
||
|
<br><dt><code>iv-consider-all-candidates-bound</code><dd>Bound on number of candidates for induction variables, below which
|
||
|
all candidates are considered for each use in induction variable
|
||
|
optimizations. If there are more candidates than this,
|
||
|
only the most relevant ones are considered to avoid quadratic time complexity.
|
||
|
|
||
|
<br><dt><code>iv-max-considered-uses</code><dd>The induction variable optimizations give up on loops that contain more
|
||
|
induction variable uses than this value.
|
||
|
|
||
|
<br><dt><code>iv-always-prune-cand-set-bound</code><dd>If the number of candidates in the set is smaller than this value,
|
||
|
always try to remove unnecessary ivs from the set
|
||
|
when adding a new one.
|
||
|
|
||
|
<br><dt><code>scev-max-expr-size</code><dd>Bound on size of expressions used in the scalar evolutions analyzer.
|
||
|
Large expressions slow the analyzer.
|
||
|
|
||
|
<br><dt><code>scev-max-expr-complexity</code><dd>Bound on the complexity of the expressions in the scalar evolutions analyzer.
|
||
|
Complex expressions slow the analyzer.
|
||
|
|
||
|
<br><dt><code>omega-max-vars</code><dd>The maximum number of variables in an Omega constraint system.
|
||
|
The default value is 128.
|
||
|
|
||
|
<br><dt><code>omega-max-geqs</code><dd>The maximum number of inequalities in an Omega constraint system.
|
||
|
The default value is 256.
|
||
|
|
||
|
<br><dt><code>omega-max-eqs</code><dd>The maximum number of equalities in an Omega constraint system.
|
||
|
The default value is 128.
|
||
|
|
||
|
<br><dt><code>omega-max-wild-cards</code><dd>The maximum number of wildcard variables that the Omega solver is
|
||
|
able to insert. The default value is 18.
|
||
|
|
||
|
<br><dt><code>omega-hash-table-size</code><dd>The size of the hash table in the Omega solver. The default value is
|
||
|
550.
|
||
|
|
||
|
<br><dt><code>omega-max-keys</code><dd>The maximal number of keys used by the Omega solver. The default
|
||
|
value is 500.
|
||
|
|
||
|
<br><dt><code>omega-eliminate-redundant-constraints</code><dd>When set to 1, use expensive methods to eliminate all redundant
|
||
|
constraints. The default value is 0.
|
||
|
|
||
|
<br><dt><code>vect-max-version-for-alignment-checks</code><dd>The maximum number of run-time checks that can be performed when
|
||
|
doing loop versioning for alignment in the vectorizer.
|
||
|
|
||
|
<br><dt><code>vect-max-version-for-alias-checks</code><dd>The maximum number of run-time checks that can be performed when
|
||
|
doing loop versioning for alias in the vectorizer.
|
||
|
|
||
|
<br><dt><code>vect-max-peeling-for-alignment</code><dd>The maximum number of loop peels to enhance access alignment
|
||
|
for the vectorizer. A value of -1 means no limit.
|
||
|
|
||
|
<br><dt><code>max-iterations-to-track</code><dd>The maximum number of iterations of a loop the brute-force algorithm
|
||
|
for analysis of the number of iterations of the loop tries to evaluate.
|
||
|
|
||
|
<br><dt><code>hot-bb-count-ws-permille</code><dd>A basic block profile count is considered hot if it contributes to
|
||
|
the given permillage (i.e. 0...1000) of the entire profiled execution.
|
||
|
|
||
|
<br><dt><code>hot-bb-frequency-fraction</code><dd>Select the fraction of the entry block's execution frequency that a basic
|
||
|
block in a function must reach to be considered hot.
|
||
|
|
||
|
<br><dt><code>max-predicted-iterations</code><dd>The maximum number of loop iterations we predict statically. This is useful
|
||
|
in cases where a function contains a single loop with known bound and
|
||
|
another loop with unknown bound.
|
||
|
The known number of iterations is predicted correctly, while
|
||
|
the unknown number of iterations average to roughly 10. This means that the
|
||
|
loop without bounds appears artificially cold relative to the other one.
|
||
|
|
||
|
<br><dt><code>builtin-expect-probability</code><dd>Control the probability of the expression having the specified value. This
|
||
|
parameter takes a percentage (i.e. 0 ... 100) as input.
|
||
|
The default probability of 90 is obtained empirically.
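<p>This parameter concerns code that uses <code>__builtin_expect</code>. The sketch below shows one such use; the <code>unlikely</code> macro and the <code>checked_div</code> function are hypothetical names chosen for illustration, and the value 95 in the comment is an arbitrary example.

```c
/* One possible invocation (the value 95 is only an example):

     gcc -O2 --param builtin-expect-probability=95 -c expect_demo.c  */

/* Conventional helper macro; not provided by GCC itself.  */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Divide num by den, storing the quotient in *out.
   Returns 0 on success and -1 if den is zero.  */
int checked_div(int num, int den, int *out)
{
    if (unlikely(den == 0))
        return -1;      /* error path, hinted as rarely taken */

    *out = num / den;
    return 0;
}
```

<p>The hint affects only how the compiler lays out and optimizes the branches; both paths still behave as written.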
|
||
|
|
||
|
<br><dt><code>align-threshold</code><dd>
|
||
|
Select fraction of the maximal frequency of executions of a basic block in
|
||
|
a function to align the basic block.
|
||
|
|
||
|
<br><dt><code>align-loop-iterations</code><dd>
|
||
|
A loop expected to iterate at least the selected number of iterations is
|
||
|
aligned.
|
||
|
|
||
|
<br><dt><code>tracer-dynamic-coverage</code><dt><code>tracer-dynamic-coverage-feedback</code><dd>
|
||
|
This value is used to limit superblock formation once the given percentage of
|
||
|
executed instructions is covered. This limits unnecessary code size
|
||
|
expansion.
|
||
|
|
||
|
<p>The <samp><span class="option">tracer-dynamic-coverage-feedback</span></samp> parameter
|
||
|
is used only when profile
|
||
|
feedback is available. The real profiles (as opposed to statically estimated
|
||
|
ones) are much less balanced, allowing the threshold to be a larger value.
|
||
|
|
||
|
<br><dt><code>tracer-max-code-growth</code><dd>Stop tail duplication once code growth has reached given percentage. This is
|
||
|
a rather artificial limit, as most of the duplicates are eliminated later in
|
||
|
cross jumping, so it may be set to much higher values than is the desired code
|
||
|
growth.
|
||
|
|
||
|
<br><dt><code>tracer-min-branch-ratio</code><dd>
|
||
|
Stop reverse growth when the reverse probability of best edge is less than this
|
||
|
threshold (in percent).
|
||
|
|
||
|
<br><dt><code>tracer-min-branch-probability</code><dt><code>tracer-min-branch-probability-feedback</code><dd>
|
||
|
Stop forward growth if the best edge has probability lower than this
|
||
|
threshold.
|
||
|
|
||
|
<p>Similarly to <samp><span class="option">tracer-dynamic-coverage</span></samp> two values are present, one for
|
||
|
compilation for profile feedback and one for compilation without. The value
|
||
|
for compilation with profile feedback needs to be more conservative (higher) in
|
||
|
order to make tracer effective.
|
||
|
|
||
|
<br><dt><code>max-cse-path-length</code><dd>
|
||
|
The maximum number of basic blocks on path that CSE considers.
|
||
|
The default is 10.
|
||
|
|
||
|
<br><dt><code>max-cse-insns</code><dd>The maximum number of instructions CSE processes before flushing.
|
||
|
The default is 1000.
|
||
|
|
||
|
<br><dt><code>ggc-min-expand</code><dd>
|
||
|
GCC uses a garbage collector to manage its own memory allocation. This
|
||
|
parameter specifies the minimum percentage by which the garbage
|
||
|
collector's heap should be allowed to expand between collections.
|
||
|
Tuning this may improve compilation speed; it has no effect on code
|
||
|
generation.
|
||
|
|
||
|
<p>The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when
|
||
|
RAM >= 1GB. If <code>getrlimit</code> is available, the notion of “RAM” is
|
||
|
the smallest of actual RAM and <code>RLIMIT_DATA</code> or <code>RLIMIT_AS</code>. If
|
||
|
GCC is not able to calculate RAM on a particular platform, the lower
|
||
|
bound of 30% is used. Setting this parameter and
|
||
|
<samp><span class="option">ggc-min-heapsize</span></samp> to zero causes a full collection to occur at
|
||
|
every opportunity. This is extremely slow, but can be useful for
|
||
|
debugging.
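<p>The default described above can be written out as a small calculation. This is a sketch of the stated formula only, not GCC's source; the function name is hypothetical, and passing 0 is an assumed convention here for the case where RAM cannot be determined.

```c
/* Default ggc-min-expand percentage as described in the text:
   30% + 70% * (RAM / 1GB), capped at 100%.
   ram_bytes == 0 stands for "RAM unknown" in this sketch.  */
double default_ggc_min_expand(double ram_bytes)
{
    const double one_gb = 1024.0 * 1024.0 * 1024.0;
    double percent = 30.0;              /* lower bound */

    if (ram_bytes > 0.0)
        percent += 70.0 * (ram_bytes / one_gb);

    return percent > 100.0 ? 100.0 : percent;
}
```

<p>Under this formula a machine with 512MB of RAM gets a default of 65, and any machine with 1GB or more gets 100.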
|
||
|
|
||
|
<br><dt><code>ggc-min-heapsize</code><dd>
|
||
|
Minimum size of the garbage collector's heap before it begins bothering
|
||
|
to collect garbage. The first collection occurs after the heap expands
|
||
|
by <samp><span class="option">ggc-min-expand</span></samp>% beyond <samp><span class="option">ggc-min-heapsize</span></samp>. Again,
|
||
|
tuning this may improve compilation speed, and has no effect on code
|
||
|
generation.
|
||
|
|
||
|
<p>The default is the smaller of RAM/8, RLIMIT_RSS, or a limit that
|
||
|
tries to ensure that RLIMIT_DATA or RLIMIT_AS are not exceeded, but
|
||
|
with a lower bound of 4096 (four megabytes) and an upper bound of
|
||
|
131072 (128 megabytes). If GCC is not able to calculate RAM on a
|
||
|
particular platform, the lower bound is used. Setting this parameter
|
||
|
very large effectively disables garbage collection. Setting this
|
||
|
parameter and <samp><span class="option">ggc-min-expand</span></samp> to zero causes a full collection
|
||
|
to occur at every opportunity.
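<p>A simplified sketch of the default follows. It assumes the value is measured in kilobytes, as the figures 4096 and 131072 suggest, and it deliberately ignores the <code>RLIMIT_RSS</code>, <code>RLIMIT_DATA</code> and <code>RLIMIT_AS</code> terms, so it covers only the RAM/8 part with its bounds. The function name is hypothetical.

```c
/* Simplified default for ggc-min-heapsize, in kilobytes:
   RAM / 8, clamped to the range [4096, 131072].
   Resource limits (getrlimit) are NOT modelled in this sketch.  */
long default_ggc_min_heapsize_kb(long ram_kb)
{
    long size = ram_kb / 8;

    if (size < 4096)
        size = 4096;        /* lower bound: four megabytes */
    if (size > 131072)
        size = 131072;      /* upper bound: 128 megabytes */

    return size;
}
```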
|
||
|
|
||
|
<br><dt><code>max-reload-search-insns</code><dd>The maximum number of instructions reload should look backward for equivalent
|
||
|
register. Increasing values mean more aggressive optimization, making the
|
||
|
compilation time increase with probably slightly better performance.
|
||
|
The default value is 100.
|
||
|
|
||
|
<br><dt><code>max-cselib-memory-locations</code><dd>The maximum number of memory locations cselib should take into account.
|
||
|
Increasing values mean more aggressive optimization, making the compilation time
|
||
|
increase with probably slightly better performance. The default value is 500.
|
||
|
|
||
|
<br><dt><code>reorder-blocks-duplicate</code><dt><code>reorder-blocks-duplicate-feedback</code><dd>
|
||
|
Used by the basic block reordering pass to decide whether to use unconditional
|
||
|
branch or duplicate the code on its destination. Code is duplicated when its
|
||
|
estimated size is smaller than this value multiplied by the estimated size of
|
||
|
unconditional jump in the hot spots of the program.
|
||
|
|
||
|
<p>The <samp><span class="option">reorder-blocks-duplicate-feedback</span></samp> parameter
|
||
|
is used only when profile
|
||
|
feedback is available. It may be set to higher values than
|
||
|
<samp><span class="option">reorder-blocks-duplicate</span></samp> since information about the hot spots is more
|
||
|
accurate.
|
||
|
|
||
|
<br><dt><code>max-sched-ready-insns</code><dd>The maximum number of instructions ready to be issued the scheduler should
|
||
|
consider at any given time during the first scheduling pass. Increasing
|
||
|
values mean more thorough searches, making the compilation time increase
|
||
|
with probably little benefit. The default value is 100.
|
||
|
|
||
|
<br><dt><code>max-sched-region-blocks</code><dd>The maximum number of blocks in a region to be considered for
|
||
|
interblock scheduling. The default value is 10.
|
||
|
|
||
|
<br><dt><code>max-pipeline-region-blocks</code><dd>The maximum number of blocks in a region to be considered for
|
||
|
pipelining in the selective scheduler. The default value is 15.
|
||
|
|
||
|
<br><dt><code>max-sched-region-insns</code><dd>The maximum number of insns in a region to be considered for
|
||
|
interblock scheduling. The default value is 100.
|
||
|
|
||
|
<br><dt><code>max-pipeline-region-insns</code><dd>The maximum number of insns in a region to be considered for
|
||
|
pipelining in the selective scheduler. The default value is 200.
|
||
|
|
||
|
<br><dt><code>min-spec-prob</code><dd>The minimum probability (in percent) of reaching a source block
|
||
|
for interblock speculative scheduling. The default value is 40.
|
||
|
|
||
|
<br><dt><code>max-sched-extend-regions-iters</code><dd>The maximum number of iterations through CFG to extend regions.
|
||
|
A value of 0 (the default) disables region extensions.
|
||
|
|
||
|
<br><dt><code>max-sched-insn-conflict-delay</code><dd>The maximum conflict delay for an insn to be considered for speculative motion.
|
||
|
The default value is 3.
|
||
|
|
||
|
<br><dt><code>sched-spec-prob-cutoff</code><dd>The minimal probability of speculation success (in percent), so that
|
||
|
speculative insns are scheduled.
|
||
|
The default value is 40.
|
||
|
|
||
|
<br><dt><code>sched-spec-state-edge-prob-cutoff</code><dd>The minimum probability an edge must have for the scheduler to save its
|
||
|
state across it.
|
||
|
The default value is 10.
|
||
|
|
||
|
<br><dt><code>sched-mem-true-dep-cost</code><dd>Minimal distance (in CPU cycles) between a store and a load targeting the same
|
||
|
memory locations. The default value is 1.
|
||
|
|
||
|
<br><dt><code>selsched-max-lookahead</code><dd>The maximum size of the lookahead window of selective scheduling. It is a
|
||
|
depth of search for available instructions.
|
||
|
The default value is 50.
|
||
|
|
||
|
<br><dt><code>selsched-max-sched-times</code><dd>The maximum number of times that an instruction is scheduled during
|
||
|
selective scheduling. This is the limit on the number of iterations
|
||
|
through which the instruction may be pipelined. The default value is 2.
|
||
|
|
||
|
<br><dt><code>selsched-max-insns-to-rename</code><dd>The maximum number of best instructions in the ready list that are considered
|
||
|
for renaming in the selective scheduler. The default value is 2.
|
||
|
|
||
|
<br><dt><code>sms-min-sc</code><dd>The minimum value of stage count that swing modulo scheduler
|
||
|
generates. The default value is 2.
|
||
|
|
||
|
<br><dt><code>max-last-value-rtl</code><dd>The maximum size measured as number of RTLs that can be recorded in an expression
|
||
|
in combiner for a pseudo register as last known value of that register. The default
|
||
|
is 10000.
|
||
|
|
||
|
<br><dt><code>max-combine-insns</code><dd>The maximum number of instructions the RTL combiner tries to combine.
|
||
|
The default value is 2 at <samp><span class="option">-Og</span></samp> and 4 otherwise.
|
||
|
|
||
|
<br><dt><code>integer-share-limit</code><dd>Small integer constants can use a shared data structure, reducing the
|
||
|
compiler's memory usage and increasing its speed. This sets the maximum
|
||
|
value of a shared integer constant. The default value is 256.
|
||
|
|
||
|
<br><dt><code>ssp-buffer-size</code><dd>The minimum size of buffers (i.e. arrays) that receive stack smashing
|
||
|
protection when <samp><span class="option">-fstack-protector</span></samp> is used.
|
||
|
|
||
|
<br><dt><code>min-size-for-stack-sharing</code><dd>The minimum size of variables taking part in stack slot sharing when not
|
||
|
optimizing. The default value is 32.
|
||
|
|
||
|
<br><dt><code>max-jump-thread-duplication-stmts</code><dd>Maximum number of statements allowed in a block that needs to be
|
||
|
duplicated when threading jumps.
|
||
|
|
||
|
<br><dt><code>max-fields-for-field-sensitive</code><dd>Maximum number of fields in a structure treated in
|
||
|
a field sensitive manner during pointer analysis. The default is zero
|
||
|
for <samp><span class="option">-O0</span></samp> and <samp><span class="option">-O1</span></samp>,
|
||
|
and 100 for <samp><span class="option">-Os</span></samp>, <samp><span class="option">-O2</span></samp>, and <samp><span class="option">-O3</span></samp>.
|
||
|
|
||
|
<br><dt><code>prefetch-latency</code><dd>Estimate on average number of instructions that are executed before
|
||
|
prefetch finishes. The distance prefetched ahead is proportional
|
||
|
to this constant. Increasing this number may also lead to fewer
|
||
|
streams being prefetched (see <samp><span class="option">simultaneous-prefetches</span></samp>).
|
||
|
|
||
|
<br><dt><code>simultaneous-prefetches</code><dd>Maximum number of prefetches that can run at the same time.
|
||
|
|
||
|
<br><dt><code>l1-cache-line-size</code><dd>The size of cache line in L1 cache, in bytes.
|
||
|
|
||
|
<br><dt><code>l1-cache-size</code><dd>The size of L1 cache, in kilobytes.
|
||
|
|
||
|
<br><dt><code>l2-cache-size</code><dd>The size of L2 cache, in kilobytes.
|
||
|
|
||
|
<br><dt><code>min-insn-to-prefetch-ratio</code><dd>The minimum ratio between the number of instructions and the
|
||
|
number of prefetches to enable prefetching in a loop.
|
||
|
|
||
|
<br><dt><code>prefetch-min-insn-to-mem-ratio</code><dd>The minimum ratio between the number of instructions and the
|
||
|
number of memory references to enable prefetching in a loop.
|
||
|
|
||
|
<br><dt><code>use-canonical-types</code><dd>Whether the compiler should use the “canonical” type system. By
|
||
|
default, this should always be 1, which uses a more efficient internal
|
||
|
mechanism for comparing types in C++ and Objective-C++. However, if
|
||
|
bugs in the canonical type system are causing compilation failures,
|
||
|
set this value to 0 to disable canonical types.
|
||
|
|
||
|
<br><dt><code>switch-conversion-max-branch-ratio</code><dd>Switch initialization conversion refuses to create arrays that are
|
||
|
bigger than <samp><span class="option">switch-conversion-max-branch-ratio</span></samp> times the number of
|
||
|
branches in the switch.
|
||
|
|
||
|
<br><dt><code>max-partial-antic-length</code><dd>Maximum length of the partial antic set computed during the tree
|
||
|
partial redundancy elimination optimization (<samp><span class="option">-ftree-pre</span></samp>) when
|
||
|
optimizing at <samp><span class="option">-O3</span></samp> and above. For some sorts of source code
|
||
|
the enhanced partial redundancy elimination optimization can run away,
|
||
|
consuming all of the memory available on the host machine. This
|
||
|
parameter sets a limit on the length of the sets that are computed,
|
||
|
which prevents the runaway behavior. Setting a value of 0 for
|
||
|
this parameter allows an unlimited set length.
|
||
|
|
||
|
<br><dt><code>sccvn-max-scc-size</code><dd>Maximum size of a strongly connected component (SCC) during SCCVN
|
||
|
processing. If this limit is hit, SCCVN processing for the whole
|
||
|
function is not done and optimizations depending on it are
|
||
|
disabled. The default maximum SCC size is 10000.
|
||
|
|
||
|
<br><dt><code>sccvn-max-alias-queries-per-access</code><dd>Maximum number of alias-oracle queries we perform when looking for
|
||
|
redundancies for loads and stores. If this limit is hit the search
|
||
|
is aborted and the load or store is not considered redundant. The
|
||
|
number of queries is algorithmically limited to the number of
|
||
|
stores on all paths from the load to the function entry.
|
||
|
The default maximum number of queries is 1000.
|
||
|
|
||
|
<br><dt><code>ira-max-loops-num</code><dd>IRA uses regional register allocation by default. If a function
|
||
|
contains more loops than the number given by this parameter, at most
|
||
|
the given number of the most frequently-executed loops form regions
|
||
|
for regional register allocation. The default value of the
|
||
|
parameter is 100.
|
||
|
|
||
|
<br><dt><code>ira-max-conflict-table-size</code><dd>Although IRA uses a sophisticated algorithm to compress the conflict
|
||
|
table, the table can still require excessive amounts of memory for
|
||
|
huge functions. If the conflict table for a function could be more
|
||
|
than the size in MB given by this parameter, the register allocator
|
||
|
instead uses a faster, simpler, and lower-quality
|
||
|
algorithm that does not require building a pseudo-register conflict table.
|
||
|
The default value of the parameter is 2000.
|
||
|
|
||
|
<br><dt><code>ira-loop-reserved-regs</code><dd>IRA can be used to evaluate more accurate register pressure in loops
|
||
|
for decisions to move loop invariants (see <samp><span class="option">-O3</span></samp>). The number
|
||
|
of available registers reserved for some other purposes is given
|
||
|
by this parameter. The default value of the parameter is 2, which is
|
||
|
the minimal number of registers needed by typical instructions.
|
||
|
This value is the best found from numerous experiments.
|
||
|
|
||
|
<br><dt><code>lra-inheritance-ebb-probability-cutoff</code><dd>LRA tries to reuse values reloaded in registers in subsequent insns.
|
||
|
This optimization is called inheritance. EBB is used as a region to
|
||
|
do this optimization. The parameter defines a minimal fall-through
|
||
|
edge probability in percentage used to add BB to inheritance EBB in
|
||
|
LRA. The default value of the parameter is 40. The value was chosen
|
||
|
from numerous runs of SPEC2000 on x86-64.
|
||
|
|
||
|
<br><dt><code>loop-invariant-max-bbs-in-loop</code><dd>Loop invariant motion can be very expensive, both in compilation time and
|
||
|
in amount of needed compile-time memory, with very large loops. Loops
|
||
|
with more basic blocks than this parameter won't have loop invariant
|
||
|
motion optimization performed on them. The default value of the
|
||
|
parameter is 1000 for <samp><span class="option">-O1</span></samp> and 10000 for <samp><span class="option">-O2</span></samp> and above.
|
||
|
|
||
|
<br><dt><code>loop-max-datarefs-for-datadeps</code><dd>Building data dependencies is expensive for very large loops. This
|
||
|
parameter limits the number of data references in loops that are
|
||
|
considered for data dependence analysis. These large loops are not
|
||
|
handled by the optimizations using loop data dependencies.
|
||
|
The default value is 1000.
|
||
|
|
||
|
<br><dt><code>max-vartrack-size</code><dd>Sets a maximum number of hash table slots to use during variable
|
||
|
tracking dataflow analysis of any function. If this limit is exceeded
|
||
|
with variable tracking at assignments enabled, analysis for that
|
||
|
function is retried without it, after removing all debug insns from
|
||
|
the function. If the limit is exceeded even without debug insns, variable
|
||
|
tracking analysis is completely disabled for the function. Setting
|
||
|
the parameter to zero makes it unlimited.
|
||
|
|
||
|
<br><dt><code>max-vartrack-expr-depth</code><dd>Sets a maximum number of recursion levels when attempting to map
|
||
|
variable names or debug temporaries to value expressions. This trades
|
||
|
compilation time for more complete debug information. If this is set too
|
||
|
low, value expressions that are available and could be represented in
|
||
|
debug information may end up not being used; setting this higher may
|
||
|
enable the compiler to find more complex debug expressions, but compile
|
||
|
time and memory use may grow. The default is 12.
|
||
|
|
||
|
<br><dt><code>min-nondebug-insn-uid</code><dd>Use uids starting at this parameter for nondebug insns. The range below
|
||
|
the parameter is reserved exclusively for debug insns created by
|
||
|
<samp><span class="option">-fvar-tracking-assignments</span></samp>, but debug insns may get
|
||
|
(non-overlapping) uids above it if the reserved range is exhausted.
|
||
|
|
||
|
<br><dt><code>ipa-sra-ptr-growth-factor</code><dd>IPA-SRA replaces a pointer to an aggregate with one or more new
|
||
|
parameters only when their cumulative size is less than or equal to
|
||
|
<samp><span class="option">ipa-sra-ptr-growth-factor</span></samp> times the size of the original
|
||
|
pointer parameter.
|
||
|
|
||
|
<br><dt><code>sra-max-scalarization-size-Ospeed</code><br><dt><code>sra-max-scalarization-size-Osize</code><dd>The two Scalar Replacement of Aggregates passes (SRA and IPA-SRA) aim to
|
||
|
replace scalar parts of aggregates with uses of independent scalar
|
||
|
variables. These parameters control the maximum size, in storage units,
|
||
|
of aggregate which is considered for replacement when compiling for
|
||
|
speed
|
||
|
(<samp><span class="option">sra-max-scalarization-size-Ospeed</span></samp>) or size
|
||
|
(<samp><span class="option">sra-max-scalarization-size-Osize</span></samp>) respectively.
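<p>As an illustration only, the hypothetical <code>struct point</code> below
is a small aggregate whose fields are used independently. A pass of this kind
could replace <code>p.x</code> and <code>p.y</code> with two scalar variables,
provided the size of the aggregate does not exceed the limit set by these
parameters.</p>

<pre class="smallexample">/* Hypothetical aggregate made of two independent scalar fields.  */
struct point
{
  int x;
  int y;
};

int
sum_coords (int x, int y)
{
  struct point p;

  /* Each field is written and read on its own, so the aggregate
     as a whole never needs to exist in memory.  */
  p.x = x;
  p.y = y;
  return p.x + p.y;
}
</pre>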
|
||
|
|
||
|
<br><dt><code>tm-max-aggregate-size</code><dd>When making copies of thread-local variables in a transaction, this
|
||
|
parameter specifies the size in bytes after which variables are
|
||
|
saved with the logging functions as opposed to save/restore code
|
||
|
sequence pairs. This option only applies when using
|
||
|
<samp><span class="option">-fgnu-tm</span></samp>.
|
||
|
|
||
|
<br><dt><code>graphite-max-nb-scop-params</code><dd>To avoid exponential effects in the Graphite loop transforms, the
|
||
|
number of parameters in a Static Control Part (SCoP) is bounded. The
|
||
|
default value is 10 parameters. A variable whose value is unknown at
|
||
|
compilation time and defined outside a SCoP is a parameter of the SCoP.
|
||
|
|
||
|
<br><dt><code>graphite-max-bbs-per-function</code><dd>To avoid exponential effects in the detection of SCoPs, the size of
|
||
|
the functions analyzed by Graphite is bounded. The default value is
|
||
|
100 basic blocks.
|
||
|
|
||
|
<br><dt><code>loop-block-tile-size</code><dd>Loop blocking or strip mining transforms, enabled with
|
||
|
<samp><span class="option">-floop-block</span></samp> or <samp><span class="option">-floop-strip-mine</span></samp>, strip mine each
|
||
|
loop in the loop nest by a given number of iterations. The strip
|
||
|
length can be changed using the <samp><span class="option">loop-block-tile-size</span></samp>
|
||
|
parameter. The default value is 51 iterations.
|
||
|
|
||
|
<br><dt><code>loop-unroll-jam-size</code><dd>Specify the unroll factor for the <samp><span class="option">-floop-unroll-and-jam</span></samp> option. The
|
||
|
default value is 4.
|
||
|
|
||
|
<br><dt><code>loop-unroll-jam-depth</code><dd>Specify the dimension to be unrolled (counting from the innermost loop)
|
||
|
for the <samp><span class="option">-floop-unroll-and-jam</span></samp> option. The default value is 2.
|
||
|
|
||
|
<br><dt><code>ipa-cp-value-list-size</code><dd>IPA-CP attempts to track all possible values and types passed to a function's
|
||
|
parameter in order to propagate them and perform devirtualization.
|
||
|
<samp><span class="option">ipa-cp-value-list-size</span></samp> is the maximum number of values and types it
|
||
|
stores per one formal parameter of a function.
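<p>A sketch, using hypothetical function names, of the situation described
above. The formal parameter <code>factor</code> of <code>scale</code> only
ever receives the values 2 and 3, so both could be recorded for it and used
to create specialized clones. In practice a function this small would more
likely be inlined; the example only shows which values are tracked.</p>

<pre class="smallexample">/* Hypothetical callee: 'factor' is the formal parameter whose
   incoming values would be tracked.  */
static int
scale (int value, int factor)
{
  return value * factor;
}

/* Every caller passes a compile-time constant for 'factor'.  */
int
double_it (int value)
{
  return scale (value, 2);
}

int
triple_it (int value)
{
  return scale (value, 3);
}
</pre>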
|
||
|
|
||
|
<br><dt><code>ipa-cp-eval-threshold</code><dd>IPA-CP calculates its own score of cloning profitability heuristics
|
||
|
and performs those cloning opportunities with scores that exceed
|
||
|
<samp><span class="option">ipa-cp-eval-threshold</span></samp>.
|
||
|
|
||
|
<br><dt><code>ipa-cp-recursion-penalty</code><dd>Percentage penalty the recursive functions will receive when they
|
||
|
are evaluated for cloning.
|
||
|
|
||
|
<br><dt><code>ipa-cp-single-call-penalty</code><dd>Percentage penalty functions containing a single call to another
|
||
|
function will receive when they are evaluated for cloning.
|
||
|
|
||
|
<br><dt><code>ipa-max-agg-items</code><dd>IPA-CP can also propagate a number of scalar values passed
|
||
|
in an aggregate. <samp><span class="option">ipa-max-agg-items</span></samp> controls the maximum
|
||
|
number of such values per one parameter.
|
||
|
|
||
|
<br><dt><code>ipa-cp-loop-hint-bonus</code><dd>When IPA-CP determines that a cloning candidate would make the number
|
||
|
of iterations of a loop known, it adds a bonus of
|
||
|
<samp><span class="option">ipa-cp-loop-hint-bonus</span></samp> to the profitability score of
|
||
|
the candidate.
|
||
|
|
||
|
<br><dt><code>ipa-cp-array-index-hint-bonus</code><dd>When IPA-CP determines that a cloning candidate would make the index of
|
||
|
an array access known, it adds a bonus of
|
||
|
<samp><span class="option">ipa-cp-array-index-hint-bonus</span></samp> to the profitability
|
||
|
score of the candidate.
|
||
|
|
||
|
<br><dt><code>ipa-max-aa-steps</code><dd>During its analysis of function bodies, IPA-CP employs alias analysis
|
||
|
in order to track values pointed to by function parameters. In order
|
||
|
not to spend too much time analyzing huge functions, it gives up and
|
||
|
considers all memory clobbered after examining
|
||
|
<samp><span class="option">ipa-max-aa-steps</span></samp> statements modifying memory.
|
||
|
|
||
|
<br><dt><code>lto-partitions</code><dd>Specify the desired number of partitions produced during WHOPR compilation.
|
||
|
The number of partitions should exceed the number of CPUs used for compilation.
|
||
|
The default value is 32.
|
||
|
|
||
|
<br><dt><code>lto-minpartition</code><dd>Size of minimal partition for WHOPR (in estimated instructions).
|
||
|
This avoids the expense of splitting very small programs into too many
|
||
|
partitions.
|
||
|
|
||
|
<br><dt><code>cxx-max-namespaces-for-diagnostic-help</code><dd>The maximum number of namespaces to consult for suggestions when C++
|
||
|
name lookup fails for an identifier. The default is 1000.
|
||
|
|
||
|
<br><dt><code>sink-frequency-threshold</code><dd>The maximum relative execution frequency (in percent) of the target block
|
||
|
relative to a statement's original block to allow sinking of that
|
||
|
statement. Larger numbers result in more aggressive statement sinking.
|
||
|
The default value is 75. A small positive adjustment is applied for
|
||
|
statements with memory operands as those are even more profitable to sink.
|
||
|
|
||
|
<br><dt><code>max-stores-to-sink</code><dd>The maximum number of conditional store pairs that can be sunk. Set to 0
|
||
|
if either vectorization (<samp><span class="option">-ftree-vectorize</span></samp>) or if-conversion
|
||
|
(<samp><span class="option">-ftree-loop-if-convert</span></samp>) is disabled. The default is 2.
|
||
|
|
||
|
<br><dt><code>allow-store-data-races</code><dd>Allow optimizers to introduce new data races on stores.
|
||
|
Set to 1 to allow, otherwise to 0. This option is enabled by default
|
||
|
at optimization level <samp><span class="option">-Ofast</span></samp>.
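<p>The following sketch, with hypothetical names, shows the kind of
transformation this parameter permits. As written, <code>hits</code> is
stored to only when a nonzero element is found. If new store data races are
allowed, an optimizer may keep <code>hits</code> in a register during the
loop and write it back unconditionally afterwards, so another thread could
observe a store that the source program never performs.</p>

<pre class="smallexample">/* Hypothetical shared counter that other threads might also access.  */
int hits;

void
count_nonzero (const int *v, unsigned int n)
{
  unsigned int i;

  for (i = 0; i != n; i++)
    if (v[i] != 0)
      hits++;   /* The only store to 'hits' in the source.  */
}
</pre>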
|
||
|
|
||
|
<br><dt><code>case-values-threshold</code><dd>The smallest number of different values for which it is best to use a
|
||
|
jump-table instead of a tree of conditional branches. If the value is
|
||
|
0, use the default for the machine. The default is 0.
|
||
|
|
||
|
<br><dt><code>tree-reassoc-width</code><dd>Set the maximum number of instructions executed in parallel in
|
||
|
a reassociated tree. This parameter overrides target-dependent
|
||
|
heuristics used by default if it has a nonzero value.
|
||
|
|
||
|
<br><dt><code>sched-pressure-algorithm</code><dd>Choose between the two available implementations of
|
||
|
<samp><span class="option">-fsched-pressure</span></samp>. Algorithm 1 is the original implementation
|
||
|
and is the more likely to prevent instructions from being reordered.
|
||
|
Algorithm 2 was designed to be a compromise between the relatively
|
||
|
conservative approach taken by algorithm 1 and the rather aggressive
|
||
|
approach taken by the default scheduler. It relies more heavily on
|
||
|
having a regular register file and accurate register pressure classes.
|
||
|
See <samp><span class="file">haifa-sched.c</span></samp> in the GCC sources for more details.
|
||
|
|
||
|
<p>The default choice depends on the target.
|
||
|
|
||
|
<br><dt><code>max-slsr-cand-scan</code><dd>Set the maximum number of existing candidates that are considered when
|
||
|
seeking a basis for a new straight-line strength reduction candidate.
|
||
|
|
||
|
<br><dt><code>asan-globals</code><dd>Enable buffer overflow detection for global objects. This kind
|
||
|
of protection is enabled by default if you are using the
|
||
|
<samp><span class="option">-fsanitize=address</span></samp> option.
|
||
|
To disable global objects protection use <samp><span class="option">--param asan-globals=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>asan-stack</code><dd>Enable buffer overflow detection for stack objects. This kind of
|
||
|
protection is enabled by default when using <samp><span class="option">-fsanitize=address</span></samp>.
|
||
|
To disable stack protection use <samp><span class="option">--param asan-stack=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>asan-instrument-reads</code><dd>Enable buffer overflow detection for memory reads. This kind of
|
||
|
protection is enabled by default when using <samp><span class="option">-fsanitize=address</span></samp>.
|
||
|
To disable memory reads protection use
|
||
|
<samp><span class="option">--param asan-instrument-reads=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>asan-instrument-writes</code><dd>Enable buffer overflow detection for memory writes. This kind of
|
||
|
protection is enabled by default when using <samp><span class="option">-fsanitize=address</span></samp>.
|
||
|
To disable memory writes protection use
|
||
|
<samp><span class="option">--param asan-instrument-writes=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>asan-memintrin</code><dd>Enable detection for built-in functions. This kind of protection
|
||
|
is enabled by default when using <samp><span class="option">-fsanitize=address</span></samp>.
|
||
|
To disable built-in functions protection use
|
||
|
<samp><span class="option">--param asan-memintrin=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>asan-use-after-return</code><dd>Enable detection of use-after-return. This kind of protection
|
||
|
is enabled by default when using <samp><span class="option">-fsanitize=address</span></samp>.
|
||
|
To disable use-after-return detection use
|
||
|
<samp><span class="option">--param asan-use-after-return=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>asan-instrumentation-with-call-threshold</code><dd>If the number of memory accesses in the function being instrumented
|
||
|
is greater than or equal to this number, use callbacks instead of inline checks.
|
||
|
For example, to disable inline code use
|
||
|
<samp><span class="option">--param asan-instrumentation-with-call-threshold=0</span></samp>.
|
||
|
|
||
|
<br><dt><code>chkp-max-ctor-size</code><dd>Static constructors generated by Pointer Bounds Checker may become very
|
||
|
large and significantly increase compile time at optimization level
|
||
|
<samp><span class="option">-O1</span></samp> and higher. This parameter is the maximum number of statements
|
||
|
in a single generated constructor. The default value is 5000.
|
||
|
|
||
|
<br><dt><code>max-fsm-thread-path-insns</code><dd>Maximum number of instructions to copy when duplicating blocks on a
|
||
|
finite state automaton jump thread path. The default is 100.
|
||
|
|
||
|
<br><dt><code>max-fsm-thread-length</code><dd>Maximum number of basic blocks on a finite state automaton jump thread
|
||
|
path. The default is 10.
|
||
|
|
||
|
<br><dt><code>max-fsm-thread-paths</code><dd>Maximum number of new jump thread paths to create for a finite state
|
||
|
automaton. The default is 50.
|
||
|
|
||
|
</dl>
|
||
|
</dl>
|
||
|
|
||
|
</body></html>