You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
121 lines
5.9 KiB
HTML
121 lines
5.9 KiB
HTML
4 years ago
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
||
|
<html>
|
||
|
<!-- This manual describes how to install and use the GNU multiple precision
|
||
|
arithmetic library, version 6.1.0.
|
||
|
|
||
|
Copyright 1991, 1993-2015 Free Software Foundation, Inc.
|
||
|
|
||
|
Permission is granted to copy, distribute and/or modify this document under
|
||
|
the terms of the GNU Free Documentation License, Version 1.3 or any later
|
||
|
version published by the Free Software Foundation; with no Invariant Sections,
|
||
|
with the Front-Cover Texts being "A GNU Manual", and with the Back-Cover
|
||
|
Texts being "You have freedom to copy and modify this GNU Manual, like GNU
|
||
|
software". A copy of the license is included in
|
||
|
GNU Free Documentation License. -->
|
||
|
<!-- Created by GNU Texinfo 6.4, http://www.gnu.org/software/texinfo/ -->
|
||
|
<head>
|
||
|
<title>Assembly Cache Handling (GNU MP 6.1.0)</title>
|
||
|
|
||
|
<meta name="description" content="How to install and use the GNU multiple precision arithmetic library, version 6.1.0.">
|
||
|
<meta name="keywords" content="Assembly Cache Handling (GNU MP 6.1.0)">
|
||
|
<meta name="resource-type" content="document">
|
||
|
<meta name="distribution" content="global">
|
||
|
<meta name="Generator" content="makeinfo">
|
||
|
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
|
||
|
<link href="index.html#Top" rel="start" title="Top">
|
||
|
<link href="Concept-Index.html#Concept-Index" rel="index" title="Concept Index">
|
||
|
<link href="Assembly-Coding.html#Assembly-Coding" rel="up" title="Assembly Coding">
|
||
|
<link href="Assembly-Functional-Units.html#Assembly-Functional-Units" rel="next" title="Assembly Functional Units">
|
||
|
<link href="Assembly-Carry-Propagation.html#Assembly-Carry-Propagation" rel="prev" title="Assembly Carry Propagation">
|
||
|
<style type="text/css">
|
||
|
<!--
|
||
|
a.summary-letter {text-decoration: none}
|
||
|
blockquote.indentedblock {margin-right: 0em}
|
||
|
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
|
||
|
blockquote.smallquotation {font-size: smaller}
|
||
|
div.display {margin-left: 3.2em}
|
||
|
div.example {margin-left: 3.2em}
|
||
|
div.lisp {margin-left: 3.2em}
|
||
|
div.smalldisplay {margin-left: 3.2em}
|
||
|
div.smallexample {margin-left: 3.2em}
|
||
|
div.smalllisp {margin-left: 3.2em}
|
||
|
kbd {font-style: oblique}
|
||
|
pre.display {font-family: inherit}
|
||
|
pre.format {font-family: inherit}
|
||
|
pre.menu-comment {font-family: serif}
|
||
|
pre.menu-preformatted {font-family: serif}
|
||
|
pre.smalldisplay {font-family: inherit; font-size: smaller}
|
||
|
pre.smallexample {font-size: smaller}
|
||
|
pre.smallformat {font-family: inherit; font-size: smaller}
|
||
|
pre.smalllisp {font-size: smaller}
|
||
|
span.nolinebreak {white-space: nowrap}
|
||
|
span.roman {font-family: initial; font-weight: normal}
|
||
|
span.sansserif {font-family: sans-serif; font-weight: normal}
|
||
|
ul.no-bullet {list-style: none}
|
||
|
-->
|
||
|
</style>
|
||
|
|
||
|
|
||
|
</head>
|
||
|
|
||
|
<body lang="en">
|
||
|
<a name="Assembly-Cache-Handling"></a>
|
||
|
<div class="header">
|
||
|
<p>
|
||
|
Next: <a href="Assembly-Functional-Units.html#Assembly-Functional-Units" accesskey="n" rel="next">Assembly Functional Units</a>, Previous: <a href="Assembly-Carry-Propagation.html#Assembly-Carry-Propagation" accesskey="p" rel="prev">Assembly Carry Propagation</a>, Up: <a href="Assembly-Coding.html#Assembly-Coding" accesskey="u" rel="up">Assembly Coding</a> [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
<hr>
|
||
|
<a name="Cache-Handling"></a>
|
||
|
<h4 class="subsection">15.8.4 Cache Handling</h4>
|
||
|
<a name="index-Assembly-cache-handling"></a>
|
||
|
|
||
|
<p>GMP aims to perform well both on operands that fit entirely in L1 cache and
|
||
|
those which don’t.
|
||
|
</p>
|
||
|
<p>Basic routines like <code>mpn_add_n</code> or <code>mpn_lshift</code> are often used on
|
||
|
large operands, so L2 and main memory performance is important for them.
|
||
|
<code>mpn_mul_1</code> and <code>mpn_addmul_1</code> are mostly used for multiply and
|
||
|
square basecases, so L1 performance matters most for them, unless assembly
|
||
|
versions of <code>mpn_mul_basecase</code> and <code>mpn_sqr_basecase</code> exist, in
|
||
|
which case the remaining uses are mostly for larger operands.
|
||
|
</p>
|
||
|
<p>For L2 or main memory operands, memory access times will almost certainly be
|
||
|
more than the calculation time. The aim therefore is to maximize memory
|
||
|
throughput, by starting a load of the next cache line while processing the
|
||
|
contents of the previous one. Clearly this is only possible if the chip has a
|
||
|
lock-up free cache or some sort of prefetch instruction. Most current chips
|
||
|
have both these features.
|
||
|
</p>
|
||
|
<p>Prefetching sources combines well with loop unrolling, since a prefetch can be
|
||
|
initiated once per unrolled loop (or more than once if the loop covers more
|
||
|
than one cache line).
|
||
|
</p>
|
||
|
<p>On CPUs without write-allocate caches, prefetching destinations will ensure
|
||
|
individual stores don’t go further down the cache hierarchy, limiting
|
||
|
bandwidth. Of course for calculations which are slow anyway, like
|
||
|
<code>mpn_divrem_1</code>, write-throughs might be fine.
|
||
|
</p>
|
||
|
<p>The distance ahead to prefetch will be determined by memory latency versus
|
||
|
throughput. The aim of course is to have data arriving continuously, at peak
|
||
|
throughput. Some CPUs have limits on the number of fetches or prefetches in
|
||
|
progress.
|
||
|
</p>
|
||
|
<p>If a special prefetch instruction doesn’t exist then a plain load can be used,
|
||
|
but in that case care must be taken not to attempt to read past the end of an
|
||
|
operand, since that might produce a segmentation violation.
|
||
|
</p>
|
||
|
<p>Some CPUs or systems have hardware that detects sequential memory accesses and
|
||
|
initiates suitable cache movements automatically, making life easy.
|
||
|
</p>
|
||
|
|
||
|
<hr>
|
||
|
<div class="header">
|
||
|
<p>
|
||
|
Next: <a href="Assembly-Functional-Units.html#Assembly-Functional-Units" accesskey="n" rel="next">Assembly Functional Units</a>, Previous: <a href="Assembly-Carry-Propagation.html#Assembly-Carry-Propagation" accesskey="p" rel="prev">Assembly Carry Propagation</a>, Up: <a href="Assembly-Coding.html#Assembly-Coding" accesskey="u" rel="up">Assembly Coding</a> [<a href="Concept-Index.html#Concept-Index" title="Index" rel="index">Index</a>]</p>
|
||
|
</div>
|
||
|
|
||
|
|
||
|
|
||
|
</body>
|
||
|
</html>
|