<!-- HTML header for doxygen 1.8.7-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.6.3"/>
<!--BEGIN PROJECT_NAME--><title>avr-libc: avr-libc: Compiler optimization</title><!--END PROJECT_NAME-->
<!--BEGIN !PROJECT_NAME--><title>avr-libc: Compiler optimization</title><!--END !PROJECT_NAME-->
<link href="$relpath^tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="$relpath^jquery.js"></script>
<script type="text/javascript" src="$relpath^dynsections.js"></script>
$treeview
$search
$mathjax
<link href="$relpath^$stylesheet" rel="stylesheet" type="text/css" />
$extrastylesheet
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->

<!--BEGIN TITLEAREA-->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<!--BEGIN PROJECT_LOGO-->
<td id="projectlogo"><img alt="Logo" src="$relpath^$projectlogo"/></td>
<!--END PROJECT_LOGO-->
<!--BEGIN PROJECT_NAME-->
<td style="padding-left: 0.5em;">
<div id="projectname">avr-libc
<!--BEGIN PROJECT_NUMBER-->&#160;<span id="projectnumber">2.0.0</span><!--END PROJECT_NUMBER-->
</div>
<!--BEGIN PROJECT_BRIEF--><div id="projectbrief">$projectbrief</div><!--END PROJECT_BRIEF-->
</td>
<!--END PROJECT_NAME-->
<!--BEGIN !PROJECT_NAME-->
<!--BEGIN PROJECT_BRIEF-->
<td style="padding-left: 0.5em;">
<div id="projectbrief">$projectbrief</div>
</td>
<!--END PROJECT_BRIEF-->
<!--END !PROJECT_NAME-->
<!--BEGIN DISABLE_INDEX-->
<!--BEGIN SEARCHENGINE-->
<td>$searchbox</td>
<!--END SEARCHENGINE-->
<!--END DISABLE_INDEX-->
</tr>
</tbody>
</table>
<table>
<tr>
<td align="left"><a href="http://www.nongnu.org/avr-libc/"><h2>AVR Libc Home Page</h2></a></td>
<td align="center" colspan=4><img src="avrs.png" alt="AVRs" align="middle" border="0"></td>
<td align="right"><a href="https://savannah.nongnu.org/projects/avr-libc/"><h2>AVR Libc Development Pages</h2></a></td>
</tr>
<tr>
<td align="center" width="20%"><a href="index.html"><h2>Main Page</h2></a></td>
<td align="center" width="20%"><a href="pages.html"><h2>User Manual</h2></a></td>
<td align="center" width="20%"><a href="modules.html"><h2>Library Reference</h2></a></td>
<td align="center" width="20%"><a href="FAQ.html"><h2>FAQ</h2></a></td>
<td align="center" width="20%"><a href="group__demos.html"><h2>Example Projects</h2></a></td>
</tr>
</table>
</div>
<!--END TITLEAREA-->
<!-- end header part -->
<!-- Generated by Doxygen 1.6.3 -->
<script type="text/javascript"><!--
var searchBox = new SearchBox("searchBox", "search",false,'Search');
--></script>
<div class="contents">

<h1><a class="anchor" id="optimization">Compiler optimization</a></h1>
<h2><a class="anchor" id="optim_code_reorder">Problems with reordering code</a></h2>
<dl class="author"><dt><b>Author:</b></dt><dd>Jan Waclawek</dd></dl>
<p>Programs contain sequences of statements, and a naive compiler would emit code that executes them exactly in the order in which they are written. An optimizing compiler, however, is free to <em>reorder</em> the statements, or even parts of them, as long as the resulting "net effect" is the same. The "measure" of that "net effect" is what the standard calls "side effects", which are accomplished exclusively through accesses (reads and writes) to variables qualified as <code>volatile</code>. So, as long as all volatile reads and writes are to the same addresses and in the same order (and writes write the same values), the program is correct, regardless of other operations in it. (One important point to note here is that the time elapsed between consecutive volatile accesses is not considered at all.)</p>
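<p>As a minimal sketch of this rule (not part of the original text), the following fragment uses two hypothetical volatile variables, <code>ctrl_reg</code> and <code>data_reg</code>, as stand-ins for I/O registers, and a plain variable <code>scratch</code>. The compiler has to keep the three volatile writes in the written order, while the computation and store for <code>scratch</code> may be placed anywhere within the function.</p>

```c
/* Hypothetical stand-ins for two memory-mapped I/O registers. */
volatile unsigned char ctrl_reg;
volatile unsigned char data_reg;

/* Ordinary (non-volatile) variable: accesses to it are not side effects. */
unsigned int scratch;

void send_byte(unsigned char value)
{
    scratch = value * 3u;  /* may be moved relative to the volatile writes */
    ctrl_reg = 1;          /* volatile write 1: must stay first            */
    data_reg = value;      /* volatile write 2: must stay second           */
    ctrl_reg = 0;          /* volatile write 3: must stay last             */
}
```

<p>Nothing in this fragment says how much time passes between the three volatile writes; only their order and values are fixed.</p>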
<p>Unfortunately, there are also operations which are not covered by volatile accesses. An example of this in avr-gcc/avr-libc are the <a class="el" href="group__avr__interrupts.html#ga68c330e94fe121eba993e5a5973c3162">cli()</a> and <a class="el" href="group__avr__interrupts.html#gaad5ebd34cb344c26ac87594f79b06b73">sei()</a> macros defined in &lt;<a class="el" href="interrupt_8h.html">avr/interrupt.h</a>&gt;, which translate directly into the respective assembler mnemonics through the __asm__() statement. These do not constitute a variable access at all, not even a volatile one, so the compiler is free to move them around. Although there is a "volatile" qualifier which can be attached to the __asm__() statement, its effect on (re)ordering is not clear from the documentation (it more likely only prevents complete removal by the optimiser), as the documentation states, among other things:</p>
<p><em>Note that even a volatile asm instruction can be moved relative to other code, including across jump instructions. [...] Similarly, you can't expect a sequence of volatile asm instructions to remain perfectly consecutive.</em></p>
<dl class="see"><dt><b>See also:</b></dt><dd><a href="http://gcc.gnu.org/onlinedocs/gcc-4.3.4/gcc/Extended-Asm.html">http://gcc.gnu.org/onlinedocs/gcc-4.3.4/gcc/Extended-Asm.html</a></dd></dl>
<p>There is another mechanism which can be used to achieve something similar: <em>memory barriers</em>. A memory barrier is created by adding a special "memory" clobber to the inline <code>asm</code> statement; it ensures that all variables are flushed from registers to memory before the statement and re-read after it. The purpose of a memory barrier is slightly different from enforcing code ordering: it is supposed to ensure that no variables are "cached" in registers, so that it is safe to change the content of registers, e.g. when switching context in a multitasking OS. (On "big" processors with out-of-order execution, memory barriers also imply the use of special instructions which force the processor into an "in-order" state; this is not the case for AVRs.)</p>
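<p>A minimal sketch of such a barrier, assuming GCC-style inline assembly: the macro name <code>barrier()</code>, the variable <code>shared</code> and the function <code>read_twice()</code> are made up for illustration. The <code>asm</code> statement has an empty instruction string, so it generates no machine code; only the "memory" clobber matters.</p>

```c
/* Compiler-level memory barrier: no instruction is emitted, but the
   compiler must assume that memory was read and written at this point. */
#define barrier() __asm__ __volatile__("" ::: "memory")

unsigned int shared;  /* deliberately not volatile */

unsigned int read_twice(void)
{
    unsigned int first;
    unsigned int second;

    first = shared;   /* first load from memory                     */
    barrier();        /* a copy held in a register is now stale     */
    second = shared;  /* must be loaded from memory again           */

    return first + second;
}
```

<p>Without the barrier, the compiler would be allowed to load <code>shared</code> once and reuse the register copy for the second read.</p>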
<p>A memory barrier works well in ensuring that all volatile accesses before and after the barrier occur in the given order with respect to the barrier. However, it does not prevent the compiler from moving non-volatile-related statements across the barrier. Peter Dannegger provided a nice example of this effect:</p>
<div class="fragment"><pre class="fragment"><span class="preprocessor">#define cli() __asm volatile( "cli" ::: "memory" )</span>
<span class="preprocessor"></span><span class="preprocessor">#define sei() __asm volatile( "sei" ::: "memory" )</span>
<span class="preprocessor"></span>
<span class="keywordtype">unsigned</span> <span class="keywordtype">int</span> ivar;

<span class="keywordtype">void</span> test2( <span class="keywordtype">unsigned</span> <span class="keywordtype">int</span> val )
{
  val = 65535U / val;

  <a class="code" href="group__avr__interrupts.html#ga68c330e94fe121eba993e5a5973c3162">cli</a>();

  ivar = val;

  <a class="code" href="group__avr__interrupts.html#gaad5ebd34cb344c26ac87594f79b06b73">sei</a>();
}
</pre></div><p>compiles with optimisations switched on (-Os) to</p>
<div class="fragment"><pre class="fragment">
00000112 &lt;test2&gt;:
 112:   bc 01           movw    r22, r24
 114:   f8 94           cli
 116:   8f ef           ldi     r24, 0xFF       ; 255
 118:   9f ef           ldi     r25, 0xFF       ; 255
 11a:   0e 94 96 00     call    0x12c   ; 0x12c &lt;__udivmodhi4&gt;
 11e:   70 93 01 02     sts     0x0201, r23
 122:   60 93 00 02     sts     0x0200, r22
 126:   78 94           sei
 128:   08 95           ret
</pre></div><p>where the potentially slow division is moved across <a class="el" href="group__avr__interrupts.html#ga68c330e94fe121eba993e5a5973c3162">cli()</a>, resulting in interrupts being disabled longer than intended. Note that the access to <code>ivar</code> still occurs in order with respect to <a class="el" href="group__avr__interrupts.html#ga68c330e94fe121eba993e5a5973c3162">cli()</a> and <a class="el" href="group__avr__interrupts.html#gaad5ebd34cb344c26ac87594f79b06b73">sei()</a>, so the "net effect" required by the standard is achieved as intended; it is "only" the timing which is off. However, for most embedded applications, timing is an important, sometimes critical factor.</p>
<dl class="see"><dt><b>See also:</b></dt><dd><a href="https://www.mikrocontroller.net/topic/65923">https://www.mikrocontroller.net/topic/65923</a></dd></dl>
<p>Unfortunately, at the moment, neither avr-gcc nor the C standard provides a mechanism to enforce a complete match between written and executed code ordering, except perhaps switching optimization off completely (-O0) or writing all the critical code in assembly.</p>
<p>To sum it up:</p>
<ul>
<li>memory barriers ensure proper ordering of volatile accesses </li>
<li>memory barriers do not prevent statements with no volatile accesses from being reordered across the barrier </li>
</ul>
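<p>The first summary point suggests one possible mitigation for the division example, sketched here under stated assumptions; it is not part of the original text and is not a general solution. The slow result is written to a volatile temporary (the hypothetical <code>tmp</code>) before cli(), so the division has to be finished before the barrier is reached. On non-AVR hosts the macros are reduced to a plain barrier so that the fragment compiles anywhere. The generated assembly should still be inspected to confirm the timing on a given compiler version.</p>

```c
#ifdef __AVR__
#define cli() __asm__ __volatile__("cli" ::: "memory")
#define sei() __asm__ __volatile__("sei" ::: "memory")
#else
/* Host build: keep only the barrier so the sketch compiles anywhere. */
#define cli() __asm__ __volatile__("" ::: "memory")
#define sei() __asm__ __volatile__("" ::: "memory")
#endif

unsigned int ivar;

/* Hypothetical helper variable, not present in the original example. */
static volatile unsigned int tmp;

void test2_volatile_tmp(unsigned int val)
{
    tmp = 65535U / val;  /* volatile write: ordered before the barrier in cli() */

    cli();
    ivar = tmp;          /* volatile read and plain store inside the critical section */
    sei();
}
```

<p>The price is an extra store and load of <code>tmp</code>, which is usually far cheaper than keeping interrupts disabled during a software division.</p>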
</div>
<!--- window showing the filter options -->
<div id="MSearchSelectWindow"
     onmouseover="return searchBox.OnSearchSelectShow()"
     onmouseout="return searchBox.OnSearchSelectHide()"
     onkeydown="return searchBox.OnSearchSelectKey(event)">
<a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(0)"><span class="SelectionMark"> </span>All</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(1)"><span class="SelectionMark"> </span>Data Structures</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(2)"><span class="SelectionMark"> </span>Files</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(3)"><span class="SelectionMark"> </span>Functions</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(4)"><span class="SelectionMark"> </span>Variables</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(5)"><span class="SelectionMark"> </span>Typedefs</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(6)"><span class="SelectionMark"> </span>Enumerations</a><a class="SelectItem" href="javascript:void(0)" onclick="searchBox.OnSelectItem(7)"><span class="SelectionMark"> </span>Defines</a></div>

<!-- iframe showing the search results (closed by default) -->
<div id="MSearchResultsWindow">
<iframe src="" frameborder="0"
        name="MSearchResults" id="MSearchResults">
</iframe>
</div>

<!-- HTML footer for doxygen 1.8.7-->
<!-- start footer part -->
<!--BEGIN GENERATE_TREEVIEW-->
<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
<ul>
$navpath
<li class="footer">$generatedby
<a href="http://www.doxygen.org/index.html">
<img class="footer" src="$relpath^doxygen.png" alt="doxygen"/></a> 1.6.3 </li>
</ul>
</div>
<!--END GENERATE_TREEVIEW-->
<!--BEGIN !GENERATE_TREEVIEW-->
<hr class="footer"/><address class="footer"><small>
$generatedby &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="$relpath^doxygen.png" alt="doxygen"/>
</a> 1.6.3
</small></address>
<!--END !GENERATE_TREEVIEW-->
</body>
</html>