You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
133 lines
6.1 KiB
HTML
133 lines
6.1 KiB
HTML
4 years ago
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||
|
<html>
|
||
|
<!-- This manual describes how to install and use the GNU multiple precision
|
||
|
arithmetic library, version 6.1.0.
|
||
|
|
||
|
Copyright 1991, 1993-2015 Free Software Foundation, Inc.
|
||
|
|
||
|
Permission is granted to copy, distribute and/or modify this document under
|
||
|
the terms of the GNU Free Documentation License, Version 1.3 or any later
|
||
|
version published by the Free Software Foundation; with no Invariant Sections,
|
||
|
with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover
|
||
|
Texts being "You have freedom to copy and modify this GNU Manual, like GNU
|
||
|
software". A copy of the license is included in
|
||
|
GNU Free Documentation License. -->
|
||
|
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
|
||
|
<head>
|
||
|
<title>Assembly Writing Guide (GNU MP 6.1.0)</title>
|
||
|
|
||
|
<meta name="description" content="How to install and use the GNU multiple precision arithmetic library, version 6.1.0.">
|
||
|
<meta name="keywords" content="Assembly Writing Guide (GNU MP 6.1.0)">
|
||
|
<meta name="resource-type" content="document">
|
||
|
<meta name="distribution" content="global">
|
||
|
<meta name="Generator" content="makeinfo">
|
||
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||
|
<link href="index.html#Top" rel="start" title="Top">
|
||
|
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
|
||
|
<link href="Assembly-Coding.html#Assembly-Coding" rel="up" title="Assembly Coding">
|
||
|
<link href="Internals.html#Internals" rel="next" title="Internals">
|
||
|
<link href="Assembly-Loop-Unrolling.html#Assembly-Loop-Unrolling" rel="prev" title="Assembly Loop Unrolling">
|
||
|
<style type="text/css">
|
||
|
<!--
|
||
|
a.summary-letter {text-decoration: none}
|
||
|
blockquote.indentedblock {margin-right: 0em}
|
||
|
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
|
||
|
blockquote.smallquotation {font-size: smaller}
|
||
|
div.display {margin-left: 3.2em}
|
||
|
div.example {margin-left: 3.2em}
|
||
|
div.lisp {margin-left: 3.2em}
|
||
|
div.smalldisplay {margin-left: 3.2em}
|
||
|
div.smallexample {margin-left: 3.2em}
|
||
|
div.smalllisp {margin-left: 3.2em}
|
||
|
kbd {font-style: oblique}
|
||
|
pre.display {font-family: inherit}
|
||
|
pre.format {font-family: inherit}
|
||
|
pre.menu-comment {font-family: serif}
|
||
|
pre.menu-preformatted {font-family: serif}
|
||
|
pre.smalldisplay {font-family: inherit; font-size: smaller}
|
||
|
pre.smallexample {font-size: smaller}
|
||
|
pre.smallformat {font-family: inherit; font-size: smaller}
|
||
|
pre.smalllisp {font-size: smaller}
|
||
|
span.nolinebreak {white-space: nowrap}
|
||
|
span.roman {font-family: initial; font-weight: normal}
|
||
|
span.sansserif {font-family: sans-serif; font-weight: normal}
|
||
|
ul.no-bullet {list-style: none}
|
||
|
-->
|
||
|
</style>
|
||
|
|
||
|
|
||
|
</head>
|
||
|
|
||
|
<body lang="en">
|
||
|
<a name="Assembly-Writing-Guide"></a>
|
||
|
<div class="header">
|
||
|
<p>
|
||
|
Previous: <a href="Assembly-Loop-Unrolling.html#Assembly-Loop-Unrolling" accesskey="p" rel="prev">Assembly Loop Unrolling</a>, Up: <a href="Assembly-Coding.html#Assembly-Coding" accesskey="u" rel="up">Assembly Coding</a> [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
<hr>
|
||
|
<a name="Writing-Guide"></a>
|
||
|
<h4 class="subsection">15.8.10 Writing Guide</h4>
|
||
|
<a name="index-Assembly-writing-guide"></a>
|
||
|
|
||
|
<p>This is a guide to writing software pipelined loops for processing limb
|
||
|
vectors in assembly.
|
||
|
</p>
|
||
|
<p>First determine the algorithm and which instructions are needed. Code it
|
||
|
without unrolling or scheduling, to make sure it works. On a 3-operand CPU
|
||
|
try to write each new value to a new register, this will greatly simplify later
|
||
|
steps.
|
||
|
</p>
|
||
|
<p>Then note for each instruction the functional unit and/or issue port
|
||
|
requirements. If an instruction can use either of two units, like U0 or U1
|
||
|
then make a category “U0/U1”. Count the total using each unit (or combined
|
||
|
unit), and count all instructions.
|
||
|
</p>
|
||
|
<p>Figure out from those counts the best possible loop time. The goal will be to
|
||
|
find a perfect schedule where instruction latencies are completely hidden.
|
||
|
The total instruction count might be the limiting factor, or perhaps a
|
||
|
particular functional unit. It might be possible to tweak the instructions to
|
||
|
help the limiting factor.
|
||
|
</p>
|
||
|
<p>Suppose the loop time is <em>N</em>, then make <em>N</em> issue buckets, with the
|
||
|
final loop branch at the end of the last. Now fill the buckets with dummy
|
||
|
instructions using the functional units desired. Run this to make sure the
|
||
|
intended speed is reached.
|
||
|
</p>
|
||
|
<p>Now replace the dummy instructions with the real instructions from the slow
|
||
|
but correct loop you started with. The first will typically be a load
|
||
|
instruction. Then the instruction using that value is placed in a bucket an
|
||
|
appropriate distance down. Run the loop again, to check it still runs at
|
||
|
target speed.
|
||
|
</p>
|
||
|
<p>Keep placing instructions, frequently measuring the loop. After a few you
|
||
|
will need to wrap around from the last bucket back to the top of the loop. If
|
||
|
you used the new-register for new-value strategy above then there will be no
|
||
|
register conflicts. If not then take care not to clobber something already in
|
||
|
use. Changing registers at this time is very error prone.
|
||
|
</p>
|
||
|
<p>The loop will overlap two or more of the original loop iterations, and the
|
||
|
computation of one vector element result will be started in one iteration of
|
||
|
the new loop, and completed one or several iterations later.
|
||
|
</p>
|
||
|
<p>The final step is to create feed-in and wind-down code for the loop. A good
|
||
|
way to do this is to make a copy (or copies) of the loop at the start and
|
||
|
delete those instructions which don’t have valid antecedents, and at the end
|
||
|
replicate and delete those whose results are unwanted (including any further
|
||
|
loads).
|
||
|
</p>
|
||
|
<p>The loop will have a minimum number of limbs loaded and processed, so the
|
||
|
feed-in code must test if the request size is smaller and skip either to a
|
||
|
suitable part of the wind-down or to special code for small sizes.
|
||
|
</p>
|
||
|
|
||
|
<hr>
|
||
|
<div class="header">
|
||
|
<p>
|
||
|
Previous: <a href="Assembly-Loop-Unrolling.html#Assembly-Loop-Unrolling" accesskey="p" rel="prev">Assembly Loop Unrolling</a>, Up: <a href="Assembly-Coding.html#Assembly-Coding" accesskey="u" rel="up">Assembly Coding</a> [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
|
||
|
|
||
|
|
||
|
</body>
|
||
|
</html>
|