You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
185 lines
9.9 KiB
HTML
185 lines
9.9 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<!-- This manual describes how to install and use the GNU multiple precision
|
|
arithmetic library, version 6.1.0.
|
|
|
|
Copyright 1991, 1993-2015 Free Software Foundation, Inc.
|
|
|
|
Permission is granted to copy, distribute and/or modify this document under
|
|
the terms of the GNU Free Documentation License, Version 1.3 or any later
|
|
version published by the Free Software Foundation; with no Invariant Sections,
|
|
with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover
|
|
Texts being "You have freedom to copy and modify this GNU Manual, like GNU
|
|
software". A copy of the license is included in
|
|
GNU Free Documentation License. -->
|
|
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
|
|
<head>
|
|
<title>FFT Multiplication (GNU MP 6.1.0)</title>
|
|
|
|
<meta name="description" content="How to install and use the GNU multiple precision arithmetic library, version 6.1.0.">
|
|
<meta name="keywords" content="FFT Multiplication (GNU MP 6.1.0)">
|
|
<meta name="resource-type" content="document">
|
|
<meta name="distribution" content="global">
|
|
<meta name="Generator" content="makeinfo">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
|
<link href="index.html#Top" rel="start" title="Top">
|
|
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
|
|
<link href="Multiplication-Algorithms.html#Multiplication-Algorithms" rel="up" title="Multiplication Algorithms">
|
|
<link href="Other-Multiplication.html#Other-Multiplication" rel="next" title="Other Multiplication">
|
|
<link href="Higher-degree-Toom_0027n_0027half.html#Higher-degree-Toom_0027n_0027half" rel="prev" title="Higher degree Toom'n'half">
|
|
<style type="text/css">
|
|
<!--
|
|
a.summary-letter {text-decoration: none}
|
|
blockquote.indentedblock {margin-right: 0em}
|
|
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
|
|
blockquote.smallquotation {font-size: smaller}
|
|
div.display {margin-left: 3.2em}
|
|
div.example {margin-left: 3.2em}
|
|
div.lisp {margin-left: 3.2em}
|
|
div.smalldisplay {margin-left: 3.2em}
|
|
div.smallexample {margin-left: 3.2em}
|
|
div.smalllisp {margin-left: 3.2em}
|
|
kbd {font-style: oblique}
|
|
pre.display {font-family: inherit}
|
|
pre.format {font-family: inherit}
|
|
pre.menu-comment {font-family: serif}
|
|
pre.menu-preformatted {font-family: serif}
|
|
pre.smalldisplay {font-family: inherit; font-size: smaller}
|
|
pre.smallexample {font-size: smaller}
|
|
pre.smallformat {font-family: inherit; font-size: smaller}
|
|
pre.smalllisp {font-size: smaller}
|
|
span.nolinebreak {white-space: nowrap}
|
|
span.roman {font-family: initial; font-weight: normal}
|
|
span.sansserif {font-family: sans-serif; font-weight: normal}
|
|
ul.no-bullet {list-style: none}
|
|
-->
|
|
</style>
|
|
|
|
|
|
</head>
|
|
|
|
<body lang="en">
|
|
<a name="FFT-Multiplication"></a>
|
|
<div class="header">
|
|
<p>
|
|
Next: <a href="Other-Multiplication.html#Other-Multiplication" accesskey="n" rel="next">Other Multiplication</a>, Previous: <a href="Higher-degree-Toom_0027n_0027half.html#Higher-degree-Toom_0027n_0027half" accesskey="p" rel="prev">Higher degree Toom'n'half</a>, Up: <a href="Multiplication-Algorithms.html#Multiplication-Algorithms" accesskey="u" rel="up">Multiplication Algorithms</a> [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
|
|
</div>
|
|
<hr>
|
|
<a name="FFT-Multiplication-1"></a>
|
|
<h4 class="subsection">15.1.6 FFT Multiplication</h4>
|
|
<a name="index-FFT-multiplication-1"></a>
|
|
<a name="index-Fast-Fourier-Transform"></a>
|
|
|
|
<p>At large to very large sizes a Fermat style FFT multiplication is used,
|
|
following Schönhage and Strassen (see <a href="References.html#References">References</a>). Descriptions of FFTs
|
|
in various forms can be found in many textbooks, for instance Knuth section
|
|
4.3.3 part C or Lipson chapter IX. A brief description of the form used in
|
|
GMP is given here.
|
|
</p>
|
|
<p>The multiplication done is <em>x*y mod 2^N+1</em>, for a given
|
|
<em>N</em>. A full product <em>x*y</em> is obtained by choosing <em>N>=bits(x)+bits(y)</em> and padding
|
|
<em>x</em> and <em>y</em> with high zero limbs. The modular product is the native
|
|
form for the algorithm, so padding to get a full product is unavoidable.
|
|
</p>
|
|
<p>The algorithm follows a split, evaluate, pointwise multiply, interpolate and
|
|
combine similar to that described above for Karatsuba and Toom-3. A <em>k</em>
|
|
parameter controls the split, with an FFT-<em>k</em> splitting into <em>2^k</em>
|
|
pieces of <em>M=N/2^k</em> bits each. <em>N</em> must be a multiple of
|
|
<em>(2^k)*<code>mp_bits_per_limb</code></em> so
|
|
the split falls on limb boundaries, avoiding bit shifts in the split and
|
|
combine stages.
|
|
</p>
|
|
<p>The evaluations, pointwise multiplications, and interpolation, are all done
|
|
modulo <em>2^N'+1</em> where <em>N'</em> is <em>2M+k+3</em> rounded up to a
|
|
multiple of <em>2^k</em> and of <code>mp_bits_per_limb</code>. The results of
|
|
interpolation will be the following negacyclic convolution of the input
|
|
pieces, and the choice of <em>N'</em> ensures these sums aren’t truncated.
|
|
</p>
|
|
<div class="example">
|
|
<pre class="example"> ---
|
|
\ b
|
|
w[n] = / (-1) * x[i] * y[j]
|
|
---
|
|
i+j==b*2^k+n
|
|
b=0,1
|
|
</pre></div>
|
|
|
|
<p>The points used for the evaluation are <em>g^i</em> for <em>i=0</em> to
|
|
<em>2^k-1</em> where <em>g=2^(2N'/2^k)</em>. <em>g</em> is a
|
|
<em>2^k'</em>th root of unity mod <em>2^N'+1</em>, which produces necessary
|
|
cancellations at the interpolation stage, and it’s also a power of 2 so the
|
|
fast Fourier transforms used for the evaluation and interpolation do only
|
|
shifts, adds and negations.
|
|
</p>
|
|
<p>The pointwise multiplications are done modulo <em>2^N'+1</em> and either
|
|
recurse into a further FFT or use a plain multiplication (Toom-3, Karatsuba or
|
|
basecase), whichever is optimal at the size <em>N'</em>. The interpolation is
|
|
an inverse fast Fourier transform. The resulting set of sums of <em>x[i]*y[j]</em> are added at appropriate offsets to give the final result.
|
|
</p>
|
|
<p>Squaring is the same, but <em>x</em> is the only input so it’s one transform at
|
|
the evaluate stage and the pointwise multiplies are squares. The
|
|
interpolation is the same.
|
|
</p>
|
|
<p>For a mod <em>2^N+1</em> product, an FFT-<em>k</em> is an <em>O(N^(k/(k-1)))</em> algorithm, the exponent representing <em>2^k</em> recursed
|
|
modular multiplies each <em>1/2^(k-1)</em> the size of the original.
|
|
Each successive <em>k</em> is an asymptotic improvement, but overheads mean each
|
|
is only faster at bigger and bigger sizes. In the code, <code>MUL_FFT_TABLE</code>
|
|
and <code>SQR_FFT_TABLE</code> are the thresholds where each <em>k</em> is used. Each
|
|
new <em>k</em> effectively swaps some multiplying for some shifts, adds and
|
|
overheads.
|
|
</p>
|
|
<p>A mod <em>2^N+1</em> product can be formed with a normal
|
|
<em>NxN->2N</em> bit multiply plus a subtraction, so an FFT
|
|
and Toom-3 etc can be compared directly. A <em>k=4</em> FFT at
|
|
<em>O(N^1.333<!-- /@w -->)</em> can be expected to be the first faster than Toom-3 at
|
|
<em>O(N^1.465<!-- /@w -->)</em>. In practice this is what’s found, with
|
|
<code>MUL_FFT_MODF_THRESHOLD</code> and <code>SQR_FFT_MODF_THRESHOLD</code> being between
|
|
300 and 1000 limbs, depending on the CPU. So far it’s been found that only
|
|
very large FFTs recurse into pointwise multiplies above these sizes.
|
|
</p>
|
|
<p>When an FFT is to give a full product, the change of <em>N</em> to <em>2N</em>
|
|
doesn’t alter the theoretical complexity for a given <em>k</em>, but for the
|
|
purposes of considering where an FFT might be first used it can be assumed
|
|
that the FFT is recursing into a normal multiply and that on that basis it’s
|
|
doing <em>2^k</em> recursed multiplies each <em>1/2^(k-2)</em> the size of
|
|
the inputs, making it <em>O(N^(k/(k-2)))</em>. This would mean
|
|
<em>k=7</em> at <em>O(N^1.4<!-- /@w -->)</em> would be the first FFT faster than Toom-3.
|
|
In practice <code>MUL_FFT_THRESHOLD</code> and <code>SQR_FFT_THRESHOLD</code> have been
|
|
found to be in the <em>k=8</em> range, somewhere between 3000 and 10000 limbs.
|
|
</p>
|
|
<p>The way <em>N</em> is split into <em>2^k</em> pieces and then <em>2M+k+3</em> is
|
|
rounded up to a multiple of <em>2^k</em> and <code>mp_bits_per_limb</code> means that
|
|
when <em>2^k>=<code>mp\_bits\_per\_limb</code></em> the effective <em>N</em> is a
|
|
multiple of <em>2^(2k-1)</em> bits. The <em>+k+3</em> means some values of
|
|
<em>N</em> just under such a multiple will be rounded to the next. The
|
|
complexity calculations above assume that a favourable size is used, meaning
|
|
one which isn’t padded through rounding, and it’s also assumed that the extra
|
|
<em>+k+3</em> bits are negligible at typical FFT sizes.
|
|
</p>
|
|
<p>The practical effect of the <em>2^(2k-1)</em> constraint is to introduce a
|
|
step-effect into measured speeds. For example <em>k=8</em> will round <em>N</em>
|
|
up to a multiple of 32768 bits, so for a 32-bit limb there’ll be 512 limb
|
|
groups of sizes for which <code>mpn_mul_n</code> runs at the same speed. Or for
|
|
<em>k=9</em> groups of 2048 limbs, <em>k=10</em> groups of 8192 limbs, etc. In
|
|
practice it’s been found each <em>k</em> is used at quite small multiples of its
|
|
size constraint and so the step effect is quite noticeable in a time versus
|
|
size graph.
|
|
</p>
|
|
<p>The threshold determinations currently measure at the mid-points of size
|
|
steps, but this is sub-optimal since at the start of a new step it can happen
|
|
that it’s better to go back to the previous <em>k</em> for a while. Something
|
|
more sophisticated for <code>MUL_FFT_TABLE</code> and <code>SQR_FFT_TABLE</code> will be
|
|
needed.
|
|
</p>
|
|
|
|
<hr>
|
|
<div class="header">
|
|
<p>
|
|
Next: <a href="Other-Multiplication.html#Other-Multiplication" accesskey="n" rel="next">Other Multiplication</a>, Previous: <a href="Higher-degree-Toom_0027n_0027half.html#Higher-degree-Toom_0027n_0027half" accesskey="p" rel="prev">Higher degree Toom'n'half</a>, Up: <a href="Multiplication-Algorithms.html#Multiplication-Algorithms" accesskey="u" rel="up">Multiplication Algorithms</a> [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
|
|
</div>
|
|
|
|
|
|
|
|
</body>
|
|
</html>
|