You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

133 lines
6.1 KiB
HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This manual describes how to install and use the GNU multiple precision
arithmetic library, version 6.1.0.
Copyright 1991, 1993-2015 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.3 or any later
version published by the Free Software Foundation; with no Invariant Sections,
with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover
Texts being "You have freedom to copy and modify this GNU Manual, like GNU
software". A copy of the license is included in
GNU Free Documentation License. -->
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Assembly Writing Guide (GNU MP 6.1.0)</title>
<meta name="description" content="How to install and use the GNU multiple precision arithmetic library, version 6.1.0.">
<meta name="keywords" content="Assembly Writing Guide (GNU MP 6.1.0)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Assembly-Coding.html#Assembly-Coding" rel="up" title="Assembly Coding">
<link href="Internals.html#Internals" rel="next" title="Internals">
<link href="Assembly-Loop-Unrolling.html#Assembly-Loop-Unrolling" rel="prev" title="Assembly Loop Unrolling">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Assembly-Writing-Guide"></a>
<div class="header">
<p>
Previous: <a href="Assembly-Loop-Unrolling.html#Assembly-Loop-Unrolling" accesskey="p" rel="prev">Assembly Loop Unrolling</a>, Up: <a href="Assembly-Coding.html#Assembly-Coding" accesskey="u" rel="up">Assembly Coding</a> &nbsp; [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Writing-Guide"></a>
<h4 class="subsection">15.8.10 Writing Guide</h4>
<a name="index-Assembly-writing-guide"></a>
<p>This is a guide to writing software pipelined loops for processing limb
vectors in assembly.
</p>
<p>First determine the algorithm and which instructions are needed. Code it
without unrolling or scheduling, to make sure it works. On a 3-operand CPU
try to write each new value to a new register, this will greatly simplify later
steps.
</p>
<p>Then note for each instruction the functional unit and/or issue port
requirements. If an instruction can use either of two units, like U0 or U1
then make a category &ldquo;U0/U1&rdquo;. Count the total using each unit (or combined
unit), and count all instructions.
</p>
<p>Figure out from those counts the best possible loop time. The goal will be to
find a perfect schedule where instruction latencies are completely hidden.
The total instruction count might be the limiting factor, or perhaps a
particular functional unit. It might be possible to tweak the instructions to
help the limiting factor.
</p>
<p>Suppose the loop time is <em>N</em>, then make <em>N</em> issue buckets, with the
final loop branch at the end of the last. Now fill the buckets with dummy
instructions using the functional units desired. Run this to make sure the
intended speed is reached.
</p>
<p>Now replace the dummy instructions with the real instructions from the slow
but correct loop you started with. The first will typically be a load
instruction. Then the instruction using that value is placed in a bucket an
appropriate distance down. Run the loop again, to check it still runs at
target speed.
</p>
<p>Keep placing instructions, frequently measuring the loop. After a few you
will need to wrap around from the last bucket back to the top of the loop. If
you used the new-register for new-value strategy above then there will be no
register conflicts. If not then take care not to clobber something already in
use. Changing registers at this time is very error prone.
</p>
<p>The loop will overlap two or more of the original loop iterations, and the
computation of one vector element result will be started in one iteration of
the new loop, and completed one or several iterations later.
</p>
<p>The final step is to create feed-in and wind-down code for the loop. A good
way to do this is to make a copy (or copies) of the loop at the start and
delete those instructions which don&rsquo;t have valid antecedents, and at the end
replicate and delete those whose results are unwanted (including any further
loads).
</p>
<p>The loop will have a minimum number of limbs loaded and processed, so the
feed-in code must test if the request size is smaller and skip either to a
suitable part of the wind-down or to special code for small sizes.
</p>
<hr>
<div class="header">
<p>
Previous: <a href="Assembly-Loop-Unrolling.html#Assembly-Loop-Unrolling" accesskey="p" rel="prev">Assembly Loop Unrolling</a>, Up: <a href="Assembly-Coding.html#Assembly-Coding" accesskey="u" rel="up">Assembly Coding</a> &nbsp; [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>