You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

139 lines
6.9 KiB
HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This manual describes how to install and use the GNU multiple precision
arithmetic library, version 6.1.0.
Copyright 1991, 1993-2015 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.3 or any later
version published by the Free Software Foundation; with no Invariant Sections,
with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover
Texts being "You have freedom to copy and modify this GNU Manual, like GNU
software". A copy of the license is included in
GNU Free Documentation License. -->
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Binary GCD (GNU MP 6.1.0)</title>
<meta name="description" content="How to install and use the GNU multiple precision arithmetic library, version 6.1.0.">
<meta name="keywords" content="Binary GCD (GNU MP 6.1.0)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Greatest-Common-Divisor-Algorithms.html#Greatest-Common-Divisor-Algorithms" rel="up" title="Greatest Common Divisor Algorithms">
<link href="Lehmer_0027s-Algorithm.html#Lehmer_0027s-Algorithm" rel="next" title="Lehmer's Algorithm">
<link href="Greatest-Common-Divisor-Algorithms.html#Greatest-Common-Divisor-Algorithms" rel="prev" title="Greatest Common Divisor Algorithms">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Binary-GCD"></a>
<div class="header">
<p>
Next: <a href="Lehmer_0027s-Algorithm.html#Lehmer_0027s-Algorithm" accesskey="n" rel="next">Lehmer's Algorithm</a>, Previous: <a href="Greatest-Common-Divisor-Algorithms.html#Greatest-Common-Divisor-Algorithms" accesskey="p" rel="prev">Greatest Common Divisor Algorithms</a>, Up: <a href="Greatest-Common-Divisor-Algorithms.html#Greatest-Common-Divisor-Algorithms" accesskey="u" rel="up">Greatest Common Divisor Algorithms</a> &nbsp; [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Binary-GCD-1"></a>
<h4 class="subsection">15.3.1 Binary GCD</h4>
<p>At small sizes GMP uses an <em>O(N^2)</em> binary style GCD. This is described
in many textbooks, for example Knuth section 4.5.2 algorithm B. It simply
consists of successively reducing odd operands <em>a</em> and <em>b</em> using
</p>
<blockquote>
<p><em>a,b = abs(a-b),min(a,b)</em> <br>
strip factors of 2 from <em>a</em>
</p></blockquote>
<p>The Euclidean GCD algorithm, as per Knuth algorithms E and A, repeatedly
computes the quotient <em>q = floor(a/b)</em> and replaces
<em>a,b</em> by <em>v, u - q v</em>. The binary algorithm has so far been found to
be faster than the Euclidean algorithm everywhere. One reason the binary
method does well is that the implied quotient at each step is usually small,
so often only one or two subtractions are needed to get the same effect as a
division. Quotients 1, 2 and 3 for example occur 67.7% of the time, see Knuth
section 4.5.3 Theorem E.
</p>
<p>When the implied quotient is large, meaning <em>b</em> is much smaller than
<em>a</em>, then a division is worthwhile. This is the basis for the initial
<em>a mod b</em> reductions in <code>mpn_gcd</code> and <code>mpn_gcd_1</code> (the latter
for both Nx1 and 1x1 cases). But after that initial reduction,
big quotients occur too rarely to make it worth checking for them.
</p>
<br>
<p>The final <em>1x1</em> GCD in <code>mpn_gcd_1</code> is done in the generic C
code as described above. For two N-bit operands, the algorithm takes about
0.68 iterations per bit. For optimum performance some attention needs to be
paid to the way the factors of 2 are stripped from <em>a</em>.
</p>
<p>Firstly it may be noted that in twos complement the number of low zero bits on
<em>a-b</em> is the same as <em>b-a</em>, so counting or testing can begin on
<em>a-b</em> without waiting for <em>abs(a-b)</em> to be determined.
</p>
<p>A loop stripping low zero bits tends not to branch predict well, since the
condition is data dependent. But on average there&rsquo;s only a few low zeros, so
an option is to strip one or two bits arithmetically then loop for more (as
done for AMD K6). Or use a lookup table to get a count for several bits then
loop for more (as done for AMD K7). An alternative approach is to keep just
one of <em>a</em> or <em>b</em> odd and iterate
</p>
<blockquote>
<p><em>a,b = abs(a-b), min(a,b)</em> <br>
<em>a = a/2</em> if even <br>
<em>b = b/2</em> if even
</p></blockquote>
<p>This requires about 1.25 iterations per bit, but stripping of a single bit at
each step avoids any branching. Repeating the bit strip reduces to about 0.9
iterations per bit, which may be a worthwhile tradeoff.
</p>
<p>Generally with the above approaches a speed of perhaps 6 cycles per bit can be
achieved, which is still not terribly fast with for instance a 64-bit GCD
taking nearly 400 cycles. It&rsquo;s this sort of time which means it&rsquo;s not usually
advantageous to combine a set of divisibility tests into a GCD.
</p>
<p>Currently, the binary algorithm is used for GCD only when <em>N &lt; 3</em>.
</p>
<hr>
<div class="header">
<p>
Next: <a href="Lehmer_0027s-Algorithm.html#Lehmer_0027s-Algorithm" accesskey="n" rel="next">Lehmer's Algorithm</a>, Previous: <a href="Greatest-Common-Divisor-Algorithms.html#Greatest-Common-Divisor-Algorithms" accesskey="p" rel="prev">Greatest Common Divisor Algorithms</a>, Up: <a href="Greatest-Common-Divisor-Algorithms.html#Greatest-Common-Divisor-Algorithms" accesskey="u" rel="up">Greatest Common Divisor Algorithms</a> &nbsp; [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>