This chapter describes low-level GMP functions, used to implement the high-level GMP functions, but also intended for time-critical user code.
These functions start with the prefix mpn_.
The mpn functions are designed to be as fast as possible, not to provide a coherent calling interface. The different functions have somewhat similar interfaces, but there are variations that make them hard to use. These functions do as little as possible apart from the real multiple precision computation, so that no time is spent on things that not all callers need.
A source operand is specified by a pointer to the least significant limb and a limb count. A destination operand is specified by just a pointer. It is the responsibility of the caller to ensure that the destination has enough space for storing the result.
With this way of specifying operands, it is possible to perform computations on subranges of an argument, and store the result into a subrange of a destination.
A common requirement for all functions is that each source area needs at least one limb. No size argument may be zero. Unless otherwise stated, in-place operations are allowed where source and destination are the same, but not where they only partly overlap.
The mpn functions are the base for the implementation of the mpz_, mpf_, and mpq_ functions.
This example adds the number beginning at s1p and the number beginning at s2p and writes the sum at destp. All areas have n limbs.
cy = mpn_add_n (destp, s1p, s2p, n)
It should be noted that the mpn functions make no attempt to identify high or low zero limbs on their operands, or other special forms. On random data such cases will be unlikely and it’d be wasteful for every function to check every time. An application knowing something about its data can take steps to trim or perhaps split its calculations.
In the notation used below, a source operand is identified by the pointer to the least significant limb, and the limb count in braces. For example, {s1p, s1n}.
Add {s1p, n} and {s2p, n}, and write the n least significant limbs of the result to rp. Return carry, either 0 or 1.
This is the lowest-level function for addition. It is the preferred function for addition, since it is written in assembly for most CPUs. For addition of a variable to itself (i.e., s1p equals s2p) use mpn_lshift with a count of 1 for optimal speed.
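To make the carry semantics concrete, here is a minimal portable C sketch of the n-limb addition described above. It is a reference model only, not GMP's implementation (which is normally assembly). The names `limb_t` and `ref_add_n` are hypothetical, and a 64-bit limb without nail bits is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Add {s1p, n} and {s2p, n}, store the n low limbs at rp and return the
   carry out of the top limb (0 or 1).  Both inputs are read before rp[i]
   is written, so rp may be identical to s1p and/or s2p. */
limb_t ref_add_n(limb_t *rp, const limb_t *s1p, const limb_t *s2p, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = s1p[i], b = s2p[i];
        limb_t sum = a + b;        /* wraps modulo 2^64 */
        limb_t c1 = sum < a;       /* carry out of a + b */
        limb_t res = sum + carry;
        limb_t c2 = res < sum;     /* carry out of adding the incoming carry */
        rp[i] = res;
        carry = c1 | c2;           /* c1 and c2 cannot both be 1 */
    }
    return carry;
}
```

Called with all three pointers equal, the sketch doubles the operand in place, which is the case the text suggests handling with a one-bit left shift instead.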
Add {s1p, n} and s2limb, and write the n least significant limbs of the result to rp. Return carry, either 0 or 1.
Add {s1p, s1n} and {s2p, s2n}, and write the s1n least significant limbs of the result to rp. Return carry, either 0 or 1.
This function requires that s1n is greater than or equal to s2n.
Subtract {s2p, n} from {s1p, n}, and write the n least significant limbs of the result to rp. Return borrow, either 0 or 1.
This is the lowest-level function for subtraction. It is the preferred function for subtraction, since it is written in assembly for most CPUs.
Subtract s2limb from {s1p, n}, and write the n least significant limbs of the result to rp. Return borrow, either 0 or 1.
Subtract {s2p, s2n} from {s1p, s1n}, and write the s1n least significant limbs of the result to rp. Return borrow, either 0 or 1.
This function requires that s1n is greater than or equal to s2n.
Perform the negation of {sp, n}, and write the result to {rp, n}. This is equivalent to calling mpn_sub_n with an n-limb zero minuend and passing {sp, n} as subtrahend. Return borrow, either 0 or 1.
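The equivalence with "zero minus {sp, n}" can be sketched in portable C as follows. This is a reference model under the assumption of 64-bit limbs without nails. `limb_t` and `ref_neg` are hypothetical names, not part of GMP.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Write 0 - {sp, n} (modulo 2^(64*n)) to {rp, n} and return the borrow.
   The borrow is 1 exactly when the source is non-zero. */
limb_t ref_neg(limb_t *rp, const limb_t *sp, size_t n)
{
    limb_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = sp[i];
        rp[i] = (limb_t)0 - s - borrow;   /* 0 - s - borrow, modulo 2^64 */
        borrow |= (limb_t)(s != 0);       /* once a borrow occurs it persists */
    }
    return borrow;
}
```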
Multiply {s1p, n} and {s2p, n}, and write the 2*n-limb result to rp.
The destination has to have space for 2*n limbs, even if the product’s most significant limb is zero. No overlap is permitted between the destination and either source.
If the two input operands are the same, use mpn_sqr.
Multiply {s1p, s1n} and {s2p, s2n}, and write the (s1n+s2n)-limb result to rp. Return the most significant limb of the result.
The destination has to have space for s1n + s2n limbs, even if the product’s most significant limb is zero. No overlap is permitted between the destination and either source.
This function requires that s1n is greater than or equal to s2n.
Compute the square of {s1p, n} and write the 2*n-limb result to rp.
The destination has to have space for 2*n limbs, even if the result’s most significant limb is zero. No overlap is permitted between the destination and the source.
Multiply {s1p, n} by s2limb, and write the n least significant limbs of the product to rp. Return the most significant limb of the product. {s1p, n} and {rp, n} are allowed to overlap provided rp <= s1p.
This is a low-level function that is a building block for general multiplication as well as other operations in GMP. It is written in assembly for most CPUs.
Don’t call this function if s2limb is a power of 2; use mpn_lshift with a count equal to the logarithm of s2limb instead, for optimal speed.
Multiply {s1p, n} and s2limb, and add the n least significant limbs of the product to {rp, n} and write the result to rp. Return the most significant limb of the product, plus carry-out from the addition. {s1p, n} and {rp, n} are allowed to overlap provided rp <= s1p.
This is a low-level function that is a building block for general multiplication as well as other operations in GMP. It is written in assembly for most CPUs.
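As an illustration of the "building block" remark, the sketch below models a multiply-and-accumulate by one limb and uses it for a schoolbook product. It is a reference model, not how GMP is implemented. To keep the arithmetic portable it assumes 32-bit limbs so that a 64-bit integer can hold each partial product. `limb32_t`, `ref_addmul_1` and `ref_mul_basecase` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb32_t;   /* hypothetical 32-bit limb, no nails */

/* {rp, n} += {s1p, n} * s2limb.  Returns the most significant limb of the
   product plus the carry-out from the addition. */
limb32_t ref_addmul_1(limb32_t *rp, const limb32_t *s1p, size_t n, limb32_t s2limb)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* At most (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so no overflow. */
        uint64_t t = (uint64_t)s1p[i] * s2limb + rp[i] + carry;
        rp[i] = (limb32_t)t;
        carry = t >> 32;
    }
    return (limb32_t)carry;
}

/* {rp, an + bn} = {ap, an} * {bp, bn}.  rp must not overlap either input. */
void ref_mul_basecase(limb32_t *rp, const limb32_t *ap, size_t an,
                      const limb32_t *bp, size_t bn)
{
    for (size_t i = 0; i < an + bn; i++)
        rp[i] = 0;
    /* One row per limb of b; the returned limb becomes the new top limb. */
    for (size_t j = 0; j < bn; j++)
        rp[an + j] = ref_addmul_1(rp + j, ap, an, bp[j]);
}
```

The quadratic loop is only meant to show how the single-limb primitive composes. It makes no claim about the algorithms GMP selects for large operands.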
Multiply {s1p, n} and s2limb, and subtract the n least significant limbs of the product from {rp, n} and write the result to rp. Return the most significant limb of the product, plus borrow-out from the subtraction. {s1p, n} and {rp, n} are allowed to overlap provided rp <= s1p.
This is a low-level function that is a building block for general multiplication and division as well as other operations in GMP. It is written in assembly for most CPUs.
Divide {np, nn} by {dp, dn} and put the quotient at {qp, nn-dn+1} and the remainder at {rp, dn}. The quotient is rounded towards 0.
No overlap is permitted between arguments, except that np might equal rp. The dividend size nn must be greater than or equal to divisor size dn. The most significant limb of the divisor must be non-zero. The qxn operand must be zero.
[This function is obsolete. Please call mpn_tdiv_qr instead for best performance.]
Divide {rs2p, rs2n} by {s3p, s3n}, and write the quotient at r1p, with the exception of the most significant limb, which is returned. The remainder replaces the dividend at rs2p; it will be s3n limbs long (i.e., as many limbs as the divisor).
In addition to an integer quotient, qxn fraction limbs are developed, and stored after the integral limbs. For most usages, qxn will be zero.
It is required that rs2n is greater than or equal to s3n. It is required that the most significant bit of the divisor is set.
If the quotient is not needed, pass rs2p + s3n as r1p. Aside from that special case, no overlap between arguments is permitted.
Return the most significant limb of the quotient, either 0 or 1.
The area at r1p needs to be rs2n - s3n + qxn limbs large.
Divide {s2p, s2n} by s3limb, and write the quotient at r1p. Return the remainder.
The integer quotient is written to {r1p+qxn, s2n} and in addition qxn fraction limbs are developed and written to {r1p, qxn}. Either or both s2n and qxn can be zero. For most usages, qxn will be zero.
mpn_divmod_1 exists for upward source compatibility and is simply a macro calling mpn_divrem_1 with a qxn of 0.
The areas at r1p and s2p have to be identical or completely separate, not partially overlapping.
[This function is obsolete. Please call mpn_tdiv_qr instead for best performance.]
Divide {sp, n} by d, expecting it to divide exactly, and writing the result to {rp, n}. If d doesn’t divide exactly, the value written to {rp, n} is undefined. The areas at rp and sp have to be identical or completely separate, not partially overlapping.
Divide {sp, n} by 3, expecting it to divide exactly, and writing the result to {rp, n}. If 3 divides exactly, the return value is zero and the result is the quotient. If not, the return value is non-zero and the result won’t be anything useful.
mpn_divexact_by3c takes an initial carry parameter, which can be the return value from a previous call, so a large calculation can be done piece by piece from low to high. mpn_divexact_by3 is simply a macro calling mpn_divexact_by3c with a 0 carry parameter.
These routines use a multiply-by-inverse and will be faster than mpn_divrem_1 on CPUs with fast multiplication but slow division.
The source a, result q, size n, initial carry i, and return value c satisfy c*b^n + a - i = 3*q, where b = 2^GMP_NUMB_BITS. The return c is always 0, 1 or 2, and the initial carry i must also be 0, 1 or 2 (these are both borrows really). When c = 0 clearly q = (a-i)/3. When c != 0, the remainder (a-i) mod 3 is given by 3-c, because b ≡ 1 mod 3 (when mp_bits_per_limb is even, which is always so currently).
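The relation c*b^n + a - i = 3*q can be checked with a small reference model of the multiply-by-inverse idea. The sketch below is an illustration, not GMP's code. It assumes 32-bit limbs without nails (so b = 2^32), and `limb32_t` and `ref_divexact_by3c` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb32_t;   /* hypothetical 32-bit limb, so b = 2^32 */

/* Divide {sp, n} by 3 from low to high, starting with borrow `carry`
   (0, 1 or 2).  Returns the final borrow c, which is 0 when the division
   is exact.  rp may be identical to sp. */
limb32_t ref_divexact_by3c(limb32_t *rp, const limb32_t *sp, size_t n,
                           limb32_t carry)
{
    const limb32_t inverse3 = 0xAAAAAAABu;   /* 3 * inverse3 == 1 (mod 2^32) */
    limb32_t c = carry;
    for (size_t i = 0; i < n; i++) {
        limb32_t s = sp[i];
        limb32_t l = (limb32_t)(s - c);      /* wraps when s < c */
        limb32_t borrow = s < c;
        limb32_t q = (limb32_t)(l * inverse3);   /* 3*q == l (mod 2^32) */
        rp[i] = q;
        /* 3*q = l + k*2^32 with k in {0, 1, 2}; k follows from the size of q. */
        c = borrow + (q > 0x55555555u) + (q > 0xAAAAAAAAu);
    }
    return c;
}
```

For the single limb 7 the sketch returns c = 2, matching the rule that the remainder is 3 - c = 1. Feeding the return value of one call into the next call as the carry processes a larger operand piece by piece from low to high.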
Divide {s1p, s1n} by s2limb, and return the remainder. s1n can be zero.
Shift {sp, n} left by count bits, and write the result to {rp, n}. The bits shifted out at the left are returned in the least significant count bits of the return value (the rest of the return value is zero).
count must be in the range 1 to mp_bits_per_limb-1. The regions {sp, n} and {rp, n} may overlap, provided rp >= sp.
This function is written in assembly for most CPUs.
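The overlap rule for the left shift follows from the direction in which limbs must be processed. The following reference sketch (not GMP's assembly) assumes 64-bit limbs without nails and uses the hypothetical names `limb_t` and `ref_lshift`.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Shift {sp, n} left by count bits (1 <= count <= 63) into {rp, n} and
   return the bits shifted out, in the low count bits of the result.
   Limbs are handled from high to low, so every source limb is read before
   a destination at the same or a higher address overwrites it. */
limb_t ref_lshift(limb_t *rp, const limb_t *sp, size_t n, unsigned count)
{
    limb_t out = sp[n - 1] >> (64 - count);
    for (size_t i = n - 1; i > 0; i--)
        rp[i] = (sp[i] << count) | (sp[i - 1] >> (64 - count));
    rp[0] = sp[0] << count;
    return out;
}
```

Restricting count to 1..63 keeps both shift amounts strictly between 0 and 64, which is why a count of zero is excluded.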
Shift {sp, n} right by count bits, and write the result to {rp, n}. The bits shifted out at the right are returned in the most significant count bits of the return value (the rest of the return value is zero).
count must be in the range 1 to mp_bits_per_limb-1. The regions {sp, n} and {rp, n} may overlap, provided rp <= sp.
This function is written in assembly for most CPUs.
Compare {s1p, n} and {s2p, n} and return a positive value if s1 > s2, 0 if they are equal, or a negative value if s1 < s2.
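Because limbs are stored least significant first, a comparison has to start from the top limb. A minimal reference sketch, assuming 64-bit limbs and using the hypothetical names `limb_t` and `ref_cmp`:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Compare {s1p, n} with {s2p, n}: positive if s1 > s2, 0 if equal,
   negative if s1 < s2.  The first differing limb from the top decides. */
int ref_cmp(const limb_t *s1p, const limb_t *s2p, size_t n)
{
    while (n-- > 0) {
        if (s1p[n] != s2p[n])
            return s1p[n] > s2p[n] ? 1 : -1;
    }
    return 0;
}
```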
Test {sp, n} and return 1 if the operand is zero, 0 otherwise.
Set {rp, retval} to the greatest common divisor of {xp, xn} and {yp, yn}. The result can be up to yn limbs, the return value is the actual number produced. Both source operands are destroyed.
It is required that xn >= yn > 0, and the most significant limb of {yp, yn} must be non-zero. No overlap is permitted between {xp, xn} and {yp, yn}.
Return the greatest common divisor of {xp, xn} and ylimb. Both operands must be non-zero.
Let U be defined by {up, un} and let V be defined by {vp, vn}.
Compute the greatest common divisor G of U and V. Compute a cofactor S such that G = US + VT. The second cofactor T is not computed but can easily be obtained from (G - U*S) / V (the division will be exact). It is required that un >= vn > 0, and the most significant limb of {vp, vn} must be non-zero.
S satisfies S = 1 or abs(S) < V / (2 G). S = 0 if and only if V divides U (i.e., G = V).
Store G at gp and let the return value define its limb count. Store S at sp and let |*sn| define its limb count. S can be negative; when this happens *sn will be negative. The area at gp should have room for vn limbs and the area at sp should have room for vn+1 limbs.
Both source operands are destroyed.
Compatibility notes: GMP 4.3.0 and 4.3.1 defined S less strictly. Earlier as well as later GMP releases define S as described here. GMP releases before GMP 4.3.0 required additional space for both input and output areas. More precisely, the areas {up, un+1} and {vp, vn+1} were destroyed (i.e. the operands plus an extra limb past the end of each), and the areas pointed to by gp and sp should each have room for un+1 limbs.
Compute the square root of {sp, n} and put the result at {r1p, ceil(n/2)} and the remainder at {r2p, retval}. r2p needs space for n limbs, but the return value indicates how many are produced.
The most significant limb of {sp, n} must be non-zero. The areas {r1p, ceil(n/2)} and {sp, n} must be completely separate. The areas {r2p, n} and {sp, n} must be either identical or completely separate.
If the remainder is not wanted then r2p can be NULL, and in this case the return value is zero or non-zero according to whether the remainder would have been zero or non-zero.
A return value of zero indicates a perfect square. See also mpn_perfect_square_p.
Return the size of {xp,n} measured in number of digits in the given base. base can vary from 2 to 62. Requires n > 0 and xp[n-1] > 0. The result will be either exact or 1 too big. If base is a power of 2, the result is always exact.
Convert {s1p, s1n} to a raw unsigned char array at str in base base, and return the number of characters produced. There may be leading zeros in the string. The string is not in ASCII; to convert it to printable format, add the ASCII codes for ‘0’ or ‘A’, depending on the base and range. base can vary from 2 to 256.
The most significant limb of the input {s1p, s1n} must be non-zero. The input {s1p, s1n} is clobbered, except when base is a power of 2, in which case it’s unchanged.
The area at str has to have space for the largest possible number represented by a s1n long limb array, plus one extra character.
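Since the converted string holds raw digit values rather than characters, a caller has to map it to printable form itself. The helper below is a sketch of that step. `raw_digits_to_ascii` is a hypothetical name, and it assumes a base of at most 36 with upper-case letters for digit values 10 and above.

```c
#include <stddef.h>

/* Convert raw digit values (0 .. base-1, base <= 36 assumed) in place to
   printable characters: '0'..'9' for 0..9 and 'A'..'Z' for 10..35. */
void raw_digits_to_ascii(unsigned char *str, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char d = str[i];
        str[i] = (unsigned char)(d < 10 ? '0' + d : 'A' + (d - 10));
    }
}
```

The buffer is not NUL-terminated by this helper. A caller wanting a C string would append the terminator itself, and may also want to skip the leading zeros the conversion can produce.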
Convert bytes {str,strsize} in the given base to limbs at rp.
str[0] is the most significant input byte and str[strsize-1] is the least significant input byte. Each byte should be a value in the range 0 to base-1, not an ASCII character. base can vary from 2 to 256.
The converted value is {rp,rn} where rn is the return value. If the most significant input byte str[0] is non-zero, then rp[rn-1] will be non-zero, else rp[rn-1] and some number of subsequent limbs may be zero.
The area at rp has to have space for the largest possible number with strsize digits in the chosen base, plus one extra limb.
The input must have at least one byte, and no overlap is permitted between {str,strsize} and the result at rp.
Scan s1p from bit position bit for the next clear bit.
It is required that there be a clear bit within the area at s1p at or beyond bit position bit, so that the function has something to return.
Scan s1p from bit position bit for the next set bit.
It is required that there be a set bit within the area at s1p at or beyond bit position bit, so that the function has something to return.
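The requirement that a matching bit exists is what lets a scan run without a limb count. The following reference sketch scans for the next set bit. It assumes 64-bit limbs without nails, and `limb_t` and `ref_scan1` are hypothetical names (the return type here is `size_t` purely for illustration).

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Return the position of the first set bit at or above `bit`.  The caller
   must guarantee that such a bit exists, since no length is passed. */
size_t ref_scan1(const limb_t *s1p, size_t bit)
{
    size_t i = bit / 64;
    /* Ignore the bits of the first limb that lie below the start position. */
    limb_t w = s1p[i] & (~(limb_t)0 << (bit % 64));
    while (w == 0)
        w = s1p[++i];
    size_t pos = 0;
    while (((w >> pos) & 1) == 0)
        pos++;
    return i * 64 + pos;
}
```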
Generate a random number of length r1n and store it at r1p. The most significant limb is always non-zero. mpn_random generates uniformly distributed limb data, whereas mpn_random2 generates long strings of zeros and ones in the binary representation.
mpn_random2 is intended for testing the correctness of the mpn routines.
Count the number of set bits in {s1p, n}.
Compute the hamming distance between {s1p, n} and {s2p, n}, which is the number of bit positions where the two operands have different bit values.
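The Hamming distance is the population count of the bitwise exclusive or of the operands. A portable reference sketch, assuming 64-bit limbs, with the hypothetical names `limb_t`, `ref_popcount` and `ref_hamdist`:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Count the set bits in {s1p, n}. */
size_t ref_popcount(const limb_t *s1p, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (limb_t w = s1p[i]; w != 0; w &= w - 1)   /* clear lowest set bit */
            count++;
    return count;
}

/* Count the bit positions where {s1p, n} and {s2p, n} differ. */
size_t ref_hamdist(const limb_t *s1p, const limb_t *s2p, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        for (limb_t w = s1p[i] ^ s2p[i]; w != 0; w &= w - 1)
            count++;
    return count;
}
```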
Return non-zero iff {s1p, n} is a perfect square. The most significant limb of the input {s1p, n} must be non-zero.
Perform the bitwise logical and of {s1p, n} and {s2p, n}, and write the result to {rp, n}.
Perform the bitwise logical inclusive or of {s1p, n} and {s2p, n}, and write the result to {rp, n}.
Perform the bitwise logical exclusive or of {s1p, n} and {s2p, n}, and write the result to {rp, n}.
Perform the bitwise logical and of {s1p, n} and the bitwise complement of {s2p, n}, and write the result to {rp, n}.
Perform the bitwise logical inclusive or of {s1p, n} and the bitwise complement of {s2p, n}, and write the result to {rp, n}.
Perform the bitwise logical and of {s1p, n} and {s2p, n}, and write the bitwise complement of the result to {rp, n}.
Perform the bitwise logical inclusive or of {s1p, n} and {s2p, n}, and write the bitwise complement of the result to {rp, n}.
Perform the bitwise logical exclusive or of {s1p, n} and {s2p, n}, and write the bitwise complement of the result to {rp, n}.
Perform the bitwise complement of {sp, n}, and write the result to {rp, n}.
Copy from {s1p, n} to {rp, n}, increasingly.
Copy from {s1p, n} to {rp, n}, decreasingly.
Zero {rp, n}.
The functions prefixed with mpn_sec_ and mpn_cnd_ are designed to perform the exact same low-level operations and have the same cache access patterns for any two same-size arguments, assuming that function arguments are placed at the same position and that the machine state is identical upon function entry. These functions are intended for cryptographic purposes, where resilience to side-channel attacks is desired.
These functions are less efficient than their “leaky” counterparts; their performance for operands of the sizes typically used for cryptographic applications is between 15% and 100% worse. For larger operands, these functions might be inadequate, since they rely on asymptotically elementary algorithms.
These functions do not make any explicit allocations. Those of these functions that need scratch space accept a scratch space operand. This convention allows callers to keep sensitive data in designated memory areas. Note however that compilers may choose to spill scalar values used within these functions to their stack frame and that such scalars may contain sensitive data.
In addition to these specially crafted functions, the following mpn functions are naturally side-channel resistant: mpn_add_n, mpn_sub_n, mpn_lshift, mpn_rshift, mpn_zero, mpn_copyi, mpn_copyd, mpn_com, and the logical functions (mpn_and_n, etc.).
There are some exceptions to this side-channel resilience: (1) Some assembly implementations of mpn_lshift identify shift-by-one as a special case. This is a problem iff the shift count is a function of sensitive data. (2) Alpha ev6 and Pentium4 using 64-bit limbs have leaky mpn_add_n and mpn_sub_n. (3) Alpha ev6 has a leaky mpn_mul_1 which also makes mpn_sec_mul on those systems unsafe.
These functions do conditional addition and subtraction. If cnd is non-zero, they produce the same result as a regular mpn_add_n or mpn_sub_n, and if cnd is zero, they copy {s1p,n} to the result area and return zero. The functions are designed to have timing and memory access patterns depending only on size and location of the data areas, but independent of the condition cnd. As with mpn_add_n and mpn_sub_n, on most machines the timing will also be independent of the actual limb values.
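One common way to make an addition conditional without branching on cnd is to mask the second operand. The sketch below illustrates that idea only. It is not GMP's implementation, and portable C gives no guarantee that a compiler keeps the code free of data-dependent branches. It assumes 64-bit limbs without nails, and `limb_t` and `ref_cnd_add_n` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* If cnd != 0, {rp, n} = {s1p, n} + {s2p, n} and the carry is returned.
   If cnd == 0, {rp, n} = {s1p, n} and 0 is returned.  The same loads,
   stores and arithmetic are performed in both cases. */
limb_t ref_cnd_add_n(limb_t cnd, limb_t *rp, const limb_t *s1p,
                     const limb_t *s2p, size_t n)
{
    limb_t mask = (limb_t)0 - (limb_t)(cnd != 0);   /* all ones or all zeros */
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = s1p[i];
        limb_t b = s2p[i] & mask;    /* becomes zero when cnd == 0 */
        limb_t sum = a + b;
        limb_t c1 = sum < a;
        limb_t res = sum + carry;
        limb_t c2 = res < sum;
        rp[i] = res;
        carry = c1 | c2;
    }
    return carry;
}
```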
Set R to A + b or A - b, respectively, where R = {rp,n}, A = {ap,n}, and b is a single limb. Returns carry.
These functions take O(N) time, unlike leaky functions such as mpn_add_1, which are O(1) on average. They require scratch space of mpn_sec_add_1_itch(n) and mpn_sec_sub_1_itch(n) limbs, respectively, to be passed in the tp parameter. The scratch space requirements are guaranteed to be at most n limbs, and increase monotonically in the operand size.
If cnd is non-zero, swaps the contents of the areas {ap,n} and {bp,n}. Otherwise, the areas are left unmodified. Implemented using logical operations on the limbs, with the same memory accesses independent of the value of cnd.
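The "logical operations on the limbs" approach can be sketched with a masked exclusive or. As with any portable C, this only illustrates the idea and does not guarantee constant-time machine code. It assumes 64-bit limbs, and `limb_t` and `ref_cnd_swap` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Swap {ap, n} and {bp, n} if cnd != 0, otherwise leave both unchanged.
   Every limb of both areas is read and written regardless of cnd. */
void ref_cnd_swap(limb_t cnd, limb_t *ap, limb_t *bp, size_t n)
{
    limb_t mask = (limb_t)0 - (limb_t)(cnd != 0);   /* all ones or all zeros */
    for (size_t i = 0; i < n; i++) {
        limb_t t = (ap[i] ^ bp[i]) & mask;   /* zero when cnd == 0 */
        ap[i] ^= t;
        bp[i] ^= t;
    }
}
```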
Set R to A * B, where A = {ap,an}, B = {bp,bn}, and R = {rp,an+bn}.
It is required that an >= bn > 0.
No overlapping between R and the input operands is allowed. For A = B, use mpn_sec_sqr for optimal performance.
This function requires scratch space of mpn_sec_mul_itch(an, bn) limbs to be passed in the tp parameter. The scratch space requirements are guaranteed to increase monotonically in the operand sizes.
Set R to A^2, where A = {ap,an}, and R = {rp,2an}.
It is required that an > 0.
No overlapping between R and the input operands is allowed.
This function requires scratch space of mpn_sec_sqr_itch(an) limbs to be passed in the tp parameter. The scratch space requirements are guaranteed to increase monotonically in the operand size.
Set R to (B raised to E) modulo M, where R = {rp,n}, M = {mp,n}, and E = {ep,ceil(enb / GMP_NUMB_BITS)}.
It is required that B > 0, that M > 0 is odd, and that E < 2^enb.
No overlapping between R and the input operands is allowed.
This function requires scratch space of mpn_sec_powm_itch(bn, enb, n) limbs to be passed in the tp parameter. The scratch space requirements are guaranteed to increase monotonically in the operand sizes.
Select entry which from table tab, which has nents entries, each n limbs. Store the selected entry at rp.
This function reads the entire table to avoid side-channel information leaks.
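Reading the whole table and keeping only the wanted entry through a mask can be sketched as below. This is an illustration of the access pattern, not GMP's code, and a compiler is free to generate something less uniform. It assumes 64-bit limbs and that entry k occupies the n limbs starting at tab + k*n. `limb_t` and `ref_tabselect` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a 64-bit limb, no nails */

/* Copy entry `which` of a table of nents entries, each n limbs, to {rp, n}.
   Every entry is read, so the memory access pattern does not depend on
   `which`. */
void ref_tabselect(limb_t *rp, const limb_t *tab, size_t n, size_t nents,
                   size_t which)
{
    for (size_t j = 0; j < n; j++)
        rp[j] = 0;
    for (size_t k = 0; k < nents; k++) {
        limb_t mask = (limb_t)0 - (limb_t)(k == which);   /* ones only for the match */
        for (size_t j = 0; j < n; j++)
            rp[j] |= tab[k * n + j] & mask;
    }
}
```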
Set Q to the truncated quotient N / D and R to N modulo D, where N = {np,nn}, D = {dp,dn}, Q’s most significant limb is the function return value and the remaining limbs are {qp,nn-dn}, and R = {np,dn}.
It is required that nn >= dn >= 1, and that dp[dn-1] != 0. This does not imply that N >= D since N might be zero-padded.
Note the overlapping between N and R. No other operand overlapping is allowed. The entire space occupied by N is overwritten.
This function requires scratch space of mpn_sec_div_qr_itch(nn, dn) limbs to be passed in the tp parameter.
Set R to N modulo D, where N = {np,nn}, D = {dp,dn}, and R = {np,dn}.
It is required that nn >= dn >= 1, and that dp[dn-1] != 0. This does not imply that N >= D since N might be zero-padded.
Note the overlapping between N and R. No other operand overlapping is allowed. The entire space occupied by N is overwritten.
This function requires scratch space of mpn_sec_div_r_itch(nn, dn) limbs to be passed in the tp parameter.
Set R to the inverse of A modulo M, where R = {rp,n}, A = {ap,n}, and M = {mp,n}. This function’s interface is preliminary.
If an inverse exists, return 1, otherwise return 0 and leave R undefined. In either case, the input A is destroyed.
It is required that M is odd, and that nbcnt >= ceil(log(A+1)) + ceil(log(M+1)). A safe choice is nbcnt = 2 * n * GMP_NUMB_BITS, but a smaller value might improve performance if M or A are known to have leading zero bits.
This function requires scratch space of mpn_sec_invert_itch(n) limbs to be passed in the tp parameter.
Everything in this section is highly experimental and may disappear or be subject to incompatible changes in a future version of GMP.
Nails are an experimental feature whereby a few bits are left unused at the top of each mp_limb_t. This can significantly improve carry handling on some processors.
All the mpn functions accepting limb data will expect the nail bits to be zero on entry, and will return data with the nails similarly all zero. This applies both to limb vectors and to single limb arguments.
Nails can be enabled by configuring with ‘--enable-nails’. By default the number of bits will be chosen according to what suits the host processor, but a particular number can be selected with ‘--enable-nails=N’.
At the mpn level, a nail build is neither source nor binary compatible with a non-nail build, strictly speaking. But programs acting on limbs only through the mpn functions are likely to work equally well with either build, and judicious use of the definitions below should make any program compatible with either build, at the source level.
For the higher level routines, meaning mpz etc, a nail build should be fully source and binary compatible with a non-nail build.
GMP_NAIL_BITS is the number of nail bits, or 0 when nails are not in use. GMP_NUMB_BITS is the number of data bits in a limb. GMP_LIMB_BITS is the total number of bits in an mp_limb_t. In all cases
GMP_LIMB_BITS == GMP_NAIL_BITS + GMP_NUMB_BITS
Bit masks for the nail and number parts of a limb. GMP_NAIL_MASK is 0 when nails are not in use.
GMP_NAIL_MASK is not often needed, since the nail part can be obtained with x >> GMP_NUMB_BITS, and that means one less large constant, which can help various RISC chips.
The maximum value that can be stored in the number part of a limb. This is the same as GMP_NUMB_MASK, but can be used for clarity when doing comparisons rather than bit-wise operations.
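The relationship between the bit counts and masks, and the way nail bits can simplify carry handling, can be illustrated with a toy model. Everything prefixed `HYP_` or `hyp_` below is hypothetical and merely mirrors the definitions described above. A 32-bit limb with 2 nail bits is assumed purely for the example.

```c
#include <stdint.h>

typedef uint32_t limb32_t;   /* hypothetical 32-bit limb */

enum {
    HYP_LIMB_BITS = 32,
    HYP_NAIL_BITS = 2,                              /* assumed for illustration */
    HYP_NUMB_BITS = HYP_LIMB_BITS - HYP_NAIL_BITS   /* limb = nail + numb */
};

#define HYP_NUMB_MASK ((limb32_t)(~(limb32_t)0 >> HYP_NAIL_BITS))
#define HYP_NAIL_MASK ((limb32_t)~HYP_NUMB_MASK)

/* The nail part is available by a shift alone, without the nail mask. */
limb32_t hyp_nail_part(limb32_t x)
{
    return x >> HYP_NUMB_BITS;
}

/* Add two limbs whose nails are zero.  The sum cannot wrap, so the carry
   simply shows up in the nail bits and no overflow comparison is needed. */
limb32_t hyp_add_limbs(limb32_t a, limb32_t b, limb32_t *carry)
{
    limb32_t t = a + b;
    *carry = t >> HYP_NUMB_BITS;
    return t & HYP_NUMB_MASK;
}
```

Both the returned limb and the carry have zero nails again, matching the convention that limb data is passed with nail bits cleared.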
The term “nails” comes from finger or toe nails, which are at the ends of a limb (arm or leg). “numb” is short for number, but is also how the developers felt after trying for a long time to come up with sensible names for these things.
In the future (the distant future most likely) a non-zero nail might be permitted, giving non-unique representations for numbers in a limb vector. This would help vector processors since carries would only ever need to propagate one or two limbs.