You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

185 lines
9.9 KiB
HTML

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This manual describes how to install and use the GNU multiple precision
arithmetic library, version 6.1.0.
Copyright 1991, 1993-2015 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.3 or any later
version published by the Free Software Foundation; with no Invariant Sections,
with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover
Texts being "You have freedom to copy and modify this GNU Manual, like GNU
software". A copy of the license is included in
GNU Free Documentation License. -->
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
<head>
<title>FFT Multiplication (GNU MP 6.1.0)</title>
<meta name="description" content="How to install and use the GNU multiple precision arithmetic library, version 6.1.0.">
<meta name="keywords" content="FFT Multiplication (GNU MP 6.1.0)">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link href="index.html#Top" rel="start" title="Top">
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
<link href="Multiplication-Algorithms.html#Multiplication-Algorithms" rel="up" title="Multiplication Algorithms">
<link href="Other-Multiplication.html#Other-Multiplication" rel="next" title="Other Multiplication">
<link href="Higher-degree-Toom_0027n_0027half.html#Higher-degree-Toom_0027n_0027half" rel="prev" title="Higher degree Toom'n'half">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="FFT-Multiplication"></a>
<div class="header">
<p>
Next: <a href="Other-Multiplication.html#Other-Multiplication" accesskey="n" rel="next">Other Multiplication</a>, Previous: <a href="Higher-degree-Toom_0027n_0027half.html#Higher-degree-Toom_0027n_0027half" accesskey="p" rel="prev">Higher degree Toom'n'half</a>, Up: <a href="Multiplication-Algorithms.html#Multiplication-Algorithms" accesskey="u" rel="up">Multiplication Algorithms</a> &nbsp; [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="FFT-Multiplication-1"></a>
<h4 class="subsection">15.1.6 FFT Multiplication</h4>
<a name="index-FFT-multiplication-1"></a>
<a name="index-Fast-Fourier-Transform"></a>
<p>At large to very large sizes a Fermat style FFT multiplication is used,
following Sch&ouml;nhage and Strassen (see <a href="References.html#References">References</a>). Descriptions of FFTs
in various forms can be found in many textbooks, for instance Knuth section
4.3.3 part C or Lipson chapter IX. A brief description of the form used in
GMP is given here.
</p>
<p>The multiplication done is <em>x*y mod 2^N+1</em>, for a given
<em>N</em>. A full product <em>x*y</em> is obtained by choosing <em>N&gt;=bits(x)+bits(y)</em> and padding
<em>x</em> and <em>y</em> with high zero limbs. The modular product is the native
form for the algorithm, so padding to get a full product is unavoidable.
</p>
<p>The algorithm follows a split, evaluate, pointwise multiply, interpolate and
combine similar to that described above for Karatsuba and Toom-3. A <em>k</em>
parameter controls the split, with an FFT-<em>k</em> splitting into <em>2^k</em>
pieces of <em>M=N/2^k</em> bits each. <em>N</em> must be a multiple of
<em>(2^k)*<code>mp_bits_per_limb</code></em> so
the split falls on limb boundaries, avoiding bit shifts in the split and
combine stages.
</p>
<p>The evaluations, pointwise multiplications, and interpolation, are all done
modulo <em>2^N'+1</em> where <em>N'</em> is <em>2M+k+3</em> rounded up to a
multiple of <em>2^k</em> and of <code>mp_bits_per_limb</code>. The results of
interpolation will be the following negacyclic convolution of the input
pieces, and the choice of <em>N'</em> ensures these sums aren&rsquo;t truncated.
</p>
<div class="example">
<pre class="example"> ---
\ b
w[n] = / (-1) * x[i] * y[j]
---
i+j==b*2^k+n
b=0,1
</pre></div>
<p>The points used for the evaluation are <em>g^i</em> for <em>i=0</em> to
<em>2^k-1</em> where <em>g=2^(2N'/2^k)</em>. <em>g</em> is a
<em>2^k'</em>th root of unity mod <em>2^N'+1</em>, which produces necessary
cancellations at the interpolation stage, and it&rsquo;s also a power of 2 so the
fast Fourier transforms used for the evaluation and interpolation do only
shifts, adds and negations.
</p>
<p>The pointwise multiplications are done modulo <em>2^N'+1</em> and either
recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or
basecase), whichever is optimal at the size <em>N'</em>. The interpolation is
an inverse fast Fourier transform. The resulting set of sums of <em>x[i]*y[j]</em> are added at appropriate offsets to give the final result.
</p>
<p>Squaring is the same, but <em>x</em> is the only input so it&rsquo;s one transform at
the evaluate stage and the pointwise multiplies are squares. The
interpolation is the same.
</p>
<p>For a mod <em>2^N+1</em> product, an FFT-<em>k</em> is an <em>O(N^(k/(k-1)))</em> algorithm, the exponent representing <em>2^k</em> recursed
modular multiplies each <em>1/2^(k-1)</em> the size of the original.
Each successive <em>k</em> is an asymptotic improvement, but overheads mean each
is only faster at bigger and bigger sizes. In the code, <code>MUL_FFT_TABLE</code>
and <code>SQR_FFT_TABLE</code> are the thresholds where each <em>k</em> is used. Each
new <em>k</em> effectively swaps some multiplying for some shifts, adds and
overheads.
</p>
<p>A mod <em>2^N+1</em> product can be formed with a normal
<em>NxN-&gt;2N</em> bit multiply plus a subtraction, so an FFT
and Toom-3 etc can be compared directly. A <em>k=4</em> FFT at
<em>O(N^1.333<!-- /@w -->)</em> can be expected to be the first faster than Toom-3 at
<em>O(N^1.465<!-- /@w -->)</em>. In practice this is what&rsquo;s found, with
<code>MUL_FFT_MODF_THRESHOLD</code> and <code>SQR_FFT_MODF_THRESHOLD</code> being between
300 and 1000 limbs, depending on the CPU. So far it&rsquo;s been found that only
very large FFTs recurse into pointwise multiplies above these sizes.
</p>
<p>When an FFT is to give a full product, the change of <em>N</em> to <em>2N</em>
doesn&rsquo;t alter the theoretical complexity for a given <em>k</em>, but for the
purposes of considering where an FFT might be first used it can be assumed
that the FFT is recursing into a normal multiply and that on that basis it&rsquo;s
doing <em>2^k</em> recursed multiplies each <em>1/2^(k-2)</em> the size of
the inputs, making it <em>O(N^(k/(k-2)))</em>. This would mean
<em>k=7</em> at <em>O(N^1.4<!-- /@w -->)</em> would be the first FFT faster than Toom-3.
In practice <code>MUL_FFT_THRESHOLD</code> and <code>SQR_FFT_THRESHOLD</code> have been
found to be in the <em>k=8</em> range, somewhere between 3000 and 10000 limbs.
</p>
<p>The way <em>N</em> is split into <em>2^k</em> pieces and then <em>2M+k+3</em> is
rounded up to a multiple of <em>2^k</em> and <code>mp_bits_per_limb</code> means that
when <em>2^k&gt;=<code>mp\_bits\_per\_limb</code></em> the effective <em>N</em> is a
multiple of <em>2^(2k-1)</em> bits. The <em>+k+3</em> means some values of
<em>N</em> just under such a multiple will be rounded to the next. The
complexity calculations above assume that a favourable size is used, meaning
one which isn&rsquo;t padded through rounding, and it&rsquo;s also assumed that the extra
<em>+k+3</em> bits are negligible at typical FFT sizes.
</p>
<p>The practical effect of the <em>2^(2k-1)</em> constraint is to introduce a
step-effect into measured speeds. For example <em>k=8</em> will round <em>N</em>
up to a multiple of 32768 bits, so for a 32-bit limb there&rsquo;ll be 512 limb
groups of sizes for which <code>mpn_mul_n</code> runs at the same speed. Or for
<em>k=9</em> groups of 2048 limbs, <em>k=10</em> groups of 8192 limbs, etc. In
practice it&rsquo;s been found each <em>k</em> is used at quite small multiples of its
size constraint and so the step effect is quite noticeable in a time versus
size graph.
</p>
<p>The threshold determinations currently measure at the mid-points of size
steps, but this is sub-optimal since at the start of a new step it can happen
that it&rsquo;s better to go back to the previous <em>k</em> for a while. Something
more sophisticated for <code>MUL_FFT_TABLE</code> and <code>SQR_FFT_TABLE</code> will be
needed.
</p>
<hr>
<div class="header">
<p>
Next: <a href="Other-Multiplication.html#Other-Multiplication" accesskey="n" rel="next">Other Multiplication</a>, Previous: <a href="Higher-degree-Toom_0027n_0027half.html#Higher-degree-Toom_0027n_0027half" accesskey="p" rel="prev">Higher degree Toom'n'half</a>, Up: <a href="Multiplication-Algorithms.html#Multiplication-Algorithms" accesskey="u" rel="up">Multiplication Algorithms</a> &nbsp; [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
</div>
</body>
</html>