<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Copyright (C) 1988-2018 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Funding Free Software", the Front-Cover
Texts being (a) (see below), and with the Back-Cover Texts being (b)
(see below). A copy of the license is included in the section entitled
"GNU Free Documentation License".

(a) The FSF's Front-Cover Text is:

A GNU Manual

(b) The FSF's Back-Cover Text is:

You have freedom to copy and modify this GNU Manual, like GNU
software. Copies published by the Free Software Foundation raise
funds for GNU development. -->
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
<head>
<title>LTO Overview (GNU Compiler Collection (GCC) Internals)</title>

<meta name="description" content="LTO Overview (GNU Compiler Collection (GCC) Internals)">
<meta name="keywords" content="LTO Overview (GNU Compiler Collection (GCC) Internals)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Option-Index.html#Option-Index" rel="index" title="Option Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="LTO.html#LTO" rel="up" title="LTO">
<link href="LTO-object-file-layout.html#LTO-object-file-layout" rel="next" title="LTO object file layout">
<link href="LTO.html#LTO" rel="prev" title="LTO">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>


</head>

<body lang="en">
<a name="LTO-Overview"></a>
<div class="header">
<p>
Next: <a href="LTO-object-file-layout.html#LTO-object-file-layout" accesskey="n" rel="next">LTO object file layout</a>, Up: <a href="LTO.html#LTO" accesskey="u" rel="up">LTO</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Design-Overview"></a>
<h3 class="section">25.1 Design Overview</h3>

<p>Link-time optimization (LTO) is implemented as a GCC front end for a
bytecode representation of GIMPLE that is emitted in special sections
of <code>.o</code> files. Currently, LTO support is enabled on most
ELF-based systems, as well as on Darwin, Cygwin and MinGW systems.
</p>
<p>Since GIMPLE bytecode is saved alongside final object code, object
files generated with LTO support are larger than regular object files.
This “fat” object format makes it easy to integrate LTO into
existing build systems, as one can, for instance, produce archives of
the files. Additionally, one might be able to ship one set of fat
objects which could be used both for development and the production of
optimized builds. A perhaps surprising side effect of this format
is that any mistake in the toolchain that bypasses the LTO machinery
(e.g. an older <code>libtool</code> calling <code>ld</code> directly)
leads to the LTO information being silently ignored: the link still
succeeds, using the regular object code. This is both an advantage,
as the system is more robust, and a disadvantage, as the user is not
informed that the optimization has been disabled.
</p>
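<p>The difference between object flavours can be illustrated with a small
sketch. It is not part of GCC: the function name <code>classify_object</code>,
the example section name <code>.gnu.lto_main</code> and the size-based
heuristic are assumptions made for illustration, and the
<code>.gnu.lto_</code> prefix is assumed here to mark sections holding LTO
bytecode. The input is a plain mapping from section name to size in bytes,
such as one might collect from a section listing.
</p>

```python
LTO_PREFIX = ".gnu.lto_"  # assumed prefix of sections holding LTO bytecode


def classify_object(sections):
    """Classify an object from a {section name: size in bytes} mapping.

    Rough heuristic only: non-empty LTO sections mean bytecode is present,
    and a non-empty .text* section means native code is present.
    """
    has_lto = any(name.startswith(LTO_PREFIX) and size > 0
                  for name, size in sections.items())
    has_code = any(name.startswith(".text") and size > 0
                   for name, size in sections.items())
    if has_lto and has_code:
        return "fat"      # bytecode alongside final object code
    if has_lto:
        return "slim"     # intermediate code alone
    return "regular"      # no LTO information at all


# Hypothetical section listing of an object compiled with LTO support.
print(classify_object({".text": 120, ".data": 8, ".gnu.lto_main": 300}))
```

<p>A tool that looks only at the <code>.text</code> contents treats the
object in this example exactly like a regular one, which is why bypassing
the LTO machinery disables the optimization without any diagnostic.
</p>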
<p>The current implementation only produces “fat” objects, effectively
doubling compilation time and increasing file sizes to as much as five
times the original size. This hides the problem that some tools, such as
<code>ar</code> and <code>nm</code>, need to understand the symbol tables of LTO
sections. These tools were extended to use the plugin infrastructure,
and with these problems solved, GCC will also support “slim” objects
consisting of the intermediate code alone.
</p>
<p>At the highest level, LTO splits the compiler in two. The first half
(the “writer”) produces a streaming representation of all the
internal data structures needed to optimize and generate code. This
includes declarations, types, the callgraph and the GIMPLE representation
of function bodies.
</p>
<p>When <samp>-flto</samp> is given during compilation of a source file, the
pass manager executes all the passes in <code>all_lto_gen_passes</code>.
Currently, this phase is composed of two IPA passes:
</p>
<ul>
<li> <code>pass_ipa_lto_gimple_out</code>
This pass executes the function <code>lto_output</code> in
<samp>lto-streamer-out.c</samp>, which traverses the call graph, encoding
every reachable declaration, type and function. This generates an
in-memory representation of all the file sections described below.

</li><li> <code>pass_ipa_lto_finish_out</code>
This pass executes the function <code>produce_asm_for_decls</code> in
<samp>lto-streamer-out.c</samp>, which takes the memory image built in the
previous pass and encodes it in the corresponding ELF file sections.
</li></ul>

<p>The second half of LTO support is the “reader”. This is implemented
as the GCC front end <samp>lto1</samp> in <samp>lto/lto.c</samp>. When
<samp>collect2</samp> detects a link set of <code>.o</code>/<code>.a</code> files with
LTO information and <samp>-flto</samp> is enabled, it invokes
<samp>lto1</samp>, which reads the set of files and aggregates them into a
single translation unit for optimization. The main entry point for
the reader is <samp>lto/lto.c</samp>:<code>lto_main</code>.
</p>
<a name="LTO-modes-of-operation"></a>
<h4 class="subsection">25.1.1 LTO modes of operation</h4>

<p>One of the main goals of the GCC link-time infrastructure was to allow
effective compilation of large programs. For this reason GCC implements two
link-time compilation modes.
</p>
<ol>
<li> <em>LTO mode</em>, in which the whole program is read into the
compiler at link-time and optimized much as if it
were a single source-level compilation unit.

</li><li> <em>WHOPR or partitioned mode</em>, designed to utilize multiple
CPUs and/or a distributed compilation environment to quickly link
large applications. WHOPR stands for WHOle Program optimizeR (not to
be confused with the semantics of <samp>-fwhole-program</samp>). It
partitions the aggregated callgraph from many different <code>.o</code>
files and distributes the compilation of the sub-graphs to different
CPUs.

<p>Note that distributed compilation is not implemented yet, but since
the parallelism is driven by a generated <code>Makefile</code>, it
would be easy to implement.
</p></li></ol>
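<p>The idea of expressing the parallelism as a generated
<code>Makefile</code> can be sketched as follows. This is a minimal
illustration, not GCC's actual generator: the function name
<code>ltrans_makefile</code>, the <code>.ltrans.o</code> suffix and the
command <code>ltrans-compile</code> are hypothetical placeholders.
</p>

```python
def ltrans_makefile(partitions, command="ltrans-compile"):
    """Return Makefile text with one independent rule per partition.

    `command` is a hypothetical placeholder, not a real GCC invocation.
    """
    outputs = [part + ".ltrans.o" for part in partitions]
    lines = ["all: " + " ".join(outputs), ""]
    for part, out in zip(partitions, outputs):
        # Each rule depends only on its own partition, so a parallel
        # make run may build all of them at the same time.
        lines.append(f"{out}: {part}")
        lines.append(f"\t{command} {part} -o {out}")
        lines.append("")
    return "\n".join(lines)


print(ltrans_makefile(["part0.o", "part1.o"]))
```

<p>Because no rule depends on another, the job scheduling is left entirely
to <code>make</code>; a distributed build would only need a different way of
running each rule's command.
</p>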

<p>WHOPR splits LTO into three main stages:
</p><ol>
<li> Local generation (LGEN)
This stage executes in parallel. Every file in the program is compiled
into the intermediate language and packaged together with the local
call-graph and summary information. This stage is the same for both
the LTO and WHOPR compilation modes.

</li><li> Whole Program Analysis (WPA)
WPA is performed sequentially. The global call-graph is generated, and
a global analysis procedure makes transformation decisions. The global
call-graph is partitioned to facilitate parallel optimization during
the LTRANS stage. The results of the WPA stage are stored into new object
files which contain the partitions of the program expressed in the
intermediate language and the optimization decisions.

</li><li> Local transformations (LTRANS)
This stage executes in parallel. All the decisions made during the WPA
stage are implemented locally in each partitioned object file, and the final
object code is generated. Optimizations which cannot be decided
efficiently during the WPA stage may be performed on the local
call-graph partitions.
</li></ol>
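<p>The control flow of the three stages can be sketched in a few lines.
This is a toy model under stated assumptions, not GCC code: the functions
<code>lgen</code>, <code>wpa</code>, <code>ltrans</code> and
<code>whopr</code> are hypothetical, a translation unit is reduced to a
function name with a list of callees, and a thread pool stands in for
separate compiler processes.
</p>

```python
from concurrent.futures import ThreadPoolExecutor


def lgen(unit):
    """LGEN: summarize one translation unit as {name: callees}."""
    name, calls = unit
    return {name: sorted(calls)}


def wpa(summaries, n_parts):
    """WPA: merge the local summaries, then split the program into partitions."""
    graph = {}
    for summary in summaries:
        graph.update(summary)
    names = sorted(graph)
    # Round-robin split; a real partitioner would use the call-graph edges.
    return [names[i::n_parts] for i in range(n_parts)]


def ltrans(partition):
    """LTRANS: stand-in for generating object code for one partition."""
    return "code(" + ",".join(partition) + ")"


def whopr(units, n_parts=2):
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(lgen, units))      # parallel
        partitions = wpa(summaries, n_parts)         # sequential
        return list(pool.map(ltrans, partitions))    # parallel


units = [("main", ["f", "g"]), ("f", ["g"]), ("g", [])]
print(whopr(units))
```

<p>Only <code>wpa</code> sees the whole program; the two mapped stages each
work on one unit or one partition at a time, which is what makes them
suitable for parallel execution.
</p>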

<p>WHOPR can be seen as an extension of the usual LTO mode of
compilation. In LTO, WPA and LTRANS are executed within a single
execution of the compiler, after the whole program has been read into
memory.
</p>
<p>When compiling in WHOPR mode, the callgraph is partitioned during
the WPA stage. The whole program is split into a given number of
partitions of roughly the same size. The compiler tries to
minimize the number of references which cross partition boundaries.
The main advantage of WHOPR is that it allows the parallel execution of
the LTRANS stages, which are the most time-consuming part of the
compilation process. Additionally, it avoids the need to load the
whole program into memory.
</p>
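<p>The two goals of partitioning, roughly equal sizes and few
cross-partition references, can be illustrated with a simple greedy sketch.
This is not the algorithm GCC uses; the functions
<code>partition_callgraph</code> and <code>cross_edges</code>, and the
per-function size estimates, are assumptions made for illustration.
</p>

```python
def partition_callgraph(sizes, edges, n_parts):
    """Greedily assign functions to partitions.

    sizes: {function: size estimate}; edges: list of (caller, callee).
    Each function goes to the partition that still has room and already
    holds most of its neighbours; ties go to the least loaded partition.
    """
    cap = -(-sum(sizes.values()) // n_parts)  # ceiling of the average load
    neighbours = {f: set() for f in sizes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    part_of, load = {}, [0] * n_parts
    for f in sorted(sizes, key=sizes.get, reverse=True):
        def score(p):
            shared = sum(1 for g in neighbours[f] if part_of.get(g) == p)
            fits = load[p] + sizes[f] <= cap
            return (fits, shared, -load[p])

        best = max(range(n_parts), key=score)
        part_of[f] = best
        load[best] += sizes[f]
    return part_of


def cross_edges(part_of, edges):
    """Count references that cross a partition boundary."""
    return sum(1 for a, b in edges if part_of[a] != part_of[b])


sizes = {"a": 2, "b": 2, "c": 2, "d": 2}
edges = [("a", "b"), ("c", "d"), ("b", "c")]
parts = partition_callgraph(sizes, edges, 2)
print(parts, cross_edges(parts, edges))
```

<p>In this example the chain of calls is cut once, between <code>b</code>
and <code>c</code>, so each partition can be compiled with only one
reference to code outside it.
</p>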

<hr>
<div class="header">
<p>
Next: <a href="LTO-object-file-layout.html#LTO-object-file-layout" accesskey="n" rel="next">LTO object file layout</a>, Up: <a href="LTO.html#LTO" accesskey="u" rel="up">LTO</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Option-Index.html#Option-Index" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>