ePenguin-Software-Framework/toolchains/avr/share/doc/cppinternals/Token-Spacing.html

<html lang="en">
<head>
<title>Token Spacing - The GNU C Preprocessor Internals</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Preprocessor Internals">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="prev" href="Macro-Expansion.html#Macro-Expansion" title="Macro Expansion">
<link rel="next" href="Line-Numbering.html#Line-Numbering" title="Line Numbering">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; } 
  span.sansserif { font-family:sans-serif; font-weight:normal; } 
--></style>
</head>
<body>
<div class="node">
<a name="Token-Spacing"></a>
<p>
Next:&nbsp;<a rel="next" accesskey="n" href="Line-Numbering.html#Line-Numbering">Line Numbering</a>,
Previous:&nbsp;<a rel="previous" accesskey="p" href="Macro-Expansion.html#Macro-Expansion">Macro Expansion</a>,
Up:&nbsp;<a rel="up" accesskey="u" href="index.html#Top">Top</a>
<hr>
</div>

<h2 class="unnumbered">Token Spacing</h2>

<p><a name="index-paste-avoidance-14"></a><a name="index-spacing-15"></a><a name="index-token-spacing-16"></a>
First, consider an issue that only concerns the stand-alone
preprocessor: there needs to be a guarantee that re-reading its preprocessed
output results in an identical token stream.  Without taking special
measures, this might not be the case because of macro substitution. 
For example:

<pre class="smallexample">     #define PLUS +
     #define EMPTY
     #define f(x) =x=
     +PLUS -EMPTY- PLUS+ f(=)
             ==&gt; + + - - + + = = =
     <em>not</em>
             ==&gt; ++ -- ++ ===
</pre>
   <p>One solution would be to simply insert a space between all adjacent
tokens.  However, we would like to keep space insertion to a minimum,
both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.

   <p>For now, just notice that when tokens are added (or removed, as shown by
the <code>EMPTY</code> example) from the original lexed token stream, we need
to check for accidental token pasting.  We call this <dfn>paste
avoidance</dfn>.  Token addition and removal can only occur because of macro
expansion, but accidental pasting can occur in many places: both before
and after each macro replacement, each argument replacement, and
additionally each token created by the &lsquo;<samp><span class="samp">#</span></samp>&rsquo; and &lsquo;<samp><span class="samp">##</span></samp>&rsquo; operators.

   <p>Look at how the preprocessor gets whitespace output correct
normally.  The <code>cpp_token</code> structure contains a flags byte, and one
of those flags is <code>PREV_WHITE</code>.  This is flagged by the lexer, and
indicates that the token was preceded by whitespace of some form other
than a new line.  The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.

   <p>Now consider the result of the following macro expansion:

<pre class="smallexample">     #define add(x, y, z) x + y +z;
     sum = add (1,2, 3);
             ==&gt; sum = 1 + 2 +3;
</pre>
   <p>The interesting thing here is that the tokens &lsquo;<samp><span class="samp">1</span></samp>&rsquo; and &lsquo;<samp><span class="samp">2</span></samp>&rsquo; are
output with a preceding space, and &lsquo;<samp><span class="samp">3</span></samp>&rsquo; is output without a
preceding space, but when lexed none of these tokens had that property. 
Careful consideration reveals that &lsquo;<samp><span class="samp">1</span></samp>&rsquo; gets its preceding
whitespace from the space preceding &lsquo;<samp><span class="samp">add</span></samp>&rsquo; in the macro invocation,
<em>not</em> replacement list.  &lsquo;<samp><span class="samp">2</span></samp>&rsquo; gets its whitespace from the
space preceding the parameter &lsquo;<samp><span class="samp">y</span></samp>&rsquo; in the macro replacement list,
and &lsquo;<samp><span class="samp">3</span></samp>&rsquo; has no preceding space because parameter &lsquo;<samp><span class="samp">z</span></samp>&rsquo; has none
in the replacement list.

   <p>Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions.  So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
<dfn>padding token</dfn>, into the token stream to indicate that spacing of
the subsequent token is special.  The preprocessor inserts padding
tokens in front of every macro expansion and expanded macro argument. 
These point to a <dfn>source token</dfn> from which the subsequent real token
should inherit its spacing.  In the above example, the source tokens are
&lsquo;<samp><span class="samp">add</span></samp>&rsquo; in the macro invocation, and &lsquo;<samp><span class="samp">y</span></samp>&rsquo; and &lsquo;<samp><span class="samp">z</span></samp>&rsquo; in the
macro replacement list, respectively.

   <p>It is quite easy to get multiple padding tokens in a row, for example if
a macro's first replacement token expands straight into another macro.

<pre class="smallexample">     #define foo bar
     #define bar baz
     [foo]
             ==&gt; [baz]
</pre>
   <p>Here, two padding tokens are generated with sources the &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; token
between the brackets, and the &lsquo;<samp><span class="samp">bar</span></samp>&rsquo; token from foo's replacement
list, respectively.  Clearly the first padding token is the one to
use, so the output code should contain a rule that the first
padding token in a sequence is the one that matters.

   <p>But what if a macro expansion is left?  Adjusting the above
example slightly:

<pre class="smallexample">     #define foo bar
     #define bar EMPTY baz
     #define EMPTY
     [foo] EMPTY;
             ==&gt; [ baz] ;
</pre>
   <p>As shown, now there should be a space before &lsquo;<samp><span class="samp">baz</span></samp>&rsquo; and the
semicolon in the output.

   <p>The rules we decided above fail for &lsquo;<samp><span class="samp">baz</span></samp>&rsquo;: we generate three
padding tokens, one per macro invocation, before the token &lsquo;<samp><span class="samp">baz</span></samp>&rsquo;. 
We would then have it take its spacing from the first of these, which
carries source token &lsquo;<samp><span class="samp">foo</span></samp>&rsquo; with no leading space.

   <p>It is vital that cpplib get spacing correct in these examples since any
of these macro expansions could be stringified, where spacing matters.

   <p>So, this demonstrates that not just entering macro and argument
expansions, but leaving them requires special handling too.  I made
cpplib insert a padding token with a <code>NULL</code> source token when
leaving macro expansions, as well as after each replaced argument in a
macro's replacement list.  It also inserts appropriate padding tokens on
either side of tokens created by the &lsquo;<samp><span class="samp">#</span></samp>&rsquo; and &lsquo;<samp><span class="samp">##</span></samp>&rsquo; operators. 
I expanded the rule so that, if we see a padding token with a
<code>NULL</code> source token, <em>and</em> that source token has no leading
space, then we behave as if we have seen no padding tokens at all.  A
quick check shows this rule will then get the above example correct as
well.

   <p>Now a relationship with paste avoidance is apparent: we have to be
careful about paste avoidance in exactly the same locations we have
padding tokens in order to get white space correct.  This makes
implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste. 
The function <code>cpp_avoid_paste</code> advises whether a space is required
between two consecutive tokens.  To avoid excessive spacing, it tries
hard to only require a space if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.

   </body></html>
the monkey dances and laughs and sings and dances 4 years ago			`<html lang="en">`
			`<head>`
			`<title>Token Spacing - The GNU C Preprocessor Internals</title>`
			`<meta http-equiv="Content-Type" content="text/html">`
			`<meta name="description" content="The GNU C Preprocessor Internals">`
			`<meta name="generator" content="makeinfo 4.13">`
			`<link title="Top" rel="start" href="index.html#Top">`
			`<link rel="prev" href="Macro-Expansion.html#Macro-Expansion" title="Macro Expansion">`
			`<link rel="next" href="Line-Numbering.html#Line-Numbering" title="Line Numbering">`
			`<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">`
			`<meta http-equiv="Content-Style-Type" content="text/css">`
			`<style type="text/css"><!--`
			`pre.display { font-family:inherit }`
			`pre.format { font-family:inherit }`
			`pre.smalldisplay { font-family:inherit; font-size:smaller }`
			`pre.smallformat { font-family:inherit; font-size:smaller }`
			`pre.smallexample { font-size:smaller }`
			`pre.smalllisp { font-size:smaller }`
			`span.sc { font-variant:small-caps }`
			`span.roman { font-family:serif; font-weight:normal; }`
			`span.sansserif { font-family:sans-serif; font-weight:normal; }`
			`--></style>`
			`</head>`
			`<body>`
			`<div class="node">`
			`<a name="Token-Spacing"></a>`
			`<p>`
			`Next: <a rel="next" accesskey="n" href="Line-Numbering.html#Line-Numbering">Line Numbering</a>,`
			`Previous: <a rel="previous" accesskey="p" href="Macro-Expansion.html#Macro-Expansion">Macro Expansion</a>,`
			`Up: <a rel="up" accesskey="u" href="index.html#Top">Top</a>`
			`<hr>`
			`</div>`

			`<h2 class="unnumbered">Token Spacing</h2>`

			`<p><a name="index-paste-avoidance-14"></a><a name="index-spacing-15"></a><a name="index-token-spacing-16"></a>`
			`First, consider an issue that only concerns the stand-alone`
			`preprocessor: there needs to be a guarantee that re-reading its preprocessed`
			`output results in an identical token stream. Without taking special`
			`measures, this might not be the case because of macro substitution.`
			`For example:`

			`<pre class="smallexample"> #define PLUS +`
			`#define EMPTY`
			`#define f(x) =x=`
			`+PLUS -EMPTY- PLUS+ f(=)`
			`==> + + - - + + = = =`
			`<em>not</em>`
			`==> ++ -- ++ ===`
			`</pre>`
			`<p>One solution would be to simply insert a space between all adjacent`
			`tokens. However, we would like to keep space insertion to a minimum,`
			`both for aesthetic reasons and because it causes problems for people who`
			`still try to abuse the preprocessor for things like Fortran source and`
			`Makefiles.`

			`<p>For now, just notice that when tokens are added (or removed, as shown by`
			`the <code>EMPTY</code> example) from the original lexed token stream, we need`
			`to check for accidental token pasting. We call this <dfn>paste`
			`avoidance</dfn>. Token addition and removal can only occur because of macro`
			`expansion, but accidental pasting can occur in many places: both before`
			`and after each macro replacement, each argument replacement, and`
			`additionally each token created by the ‘<samp><span class="samp">#</span></samp>’ and ‘<samp><span class="samp">##</span></samp>’ operators.`

			`<p>Look at how the preprocessor gets whitespace output correct`
			`normally. The <code>cpp_token</code> structure contains a flags byte, and one`
			`of those flags is <code>PREV_WHITE</code>. This is flagged by the lexer, and`
			`indicates that the token was preceded by whitespace of some form other`
			`than a new line. The stand-alone preprocessor can use this flag to`
			`decide whether to insert a space between tokens in the output.`

			`<p>Now consider the result of the following macro expansion:`

			`<pre class="smallexample"> #define add(x, y, z) x + y +z;`
			`sum = add (1,2, 3);`
			`==> sum = 1 + 2 +3;`
			`</pre>`
			`<p>The interesting thing here is that the tokens ‘<samp><span class="samp">1</span></samp>’ and ‘<samp><span class="samp">2</span></samp>’ are`
			`output with a preceding space, and ‘<samp><span class="samp">3</span></samp>’ is output without a`
			`preceding space, but when lexed none of these tokens had that property.`
			`Careful consideration reveals that ‘<samp><span class="samp">1</span></samp>’ gets its preceding`
			`whitespace from the space preceding ‘<samp><span class="samp">add</span></samp>’ in the macro invocation,`
			`<em>not</em> replacement list. ‘<samp><span class="samp">2</span></samp>’ gets its whitespace from the`
			`space preceding the parameter ‘<samp><span class="samp">y</span></samp>’ in the macro replacement list,`
			`and ‘<samp><span class="samp">3</span></samp>’ has no preceding space because parameter ‘<samp><span class="samp">z</span></samp>’ has none`
			`in the replacement list.`

			`<p>Once lexed, tokens are effectively fixed and cannot be altered, since`
			`pointers to them might be held in many places, in particular by`
			`in-progress macro expansions. So instead of modifying the two tokens`
			`above, the preprocessor inserts a special token, which I call a`
			`<dfn>padding token</dfn>, into the token stream to indicate that spacing of`
			`the subsequent token is special. The preprocessor inserts padding`
			`tokens in front of every macro expansion and expanded macro argument.`
			`These point to a <dfn>source token</dfn> from which the subsequent real token`
			`should inherit its spacing. In the above example, the source tokens are`
			`‘<samp><span class="samp">add</span></samp>’ in the macro invocation, and ‘<samp><span class="samp">y</span></samp>’ and ‘<samp><span class="samp">z</span></samp>’ in the`
			`macro replacement list, respectively.`

			`<p>It is quite easy to get multiple padding tokens in a row, for example if`
			`a macro's first replacement token expands straight into another macro.`

			`<pre class="smallexample"> #define foo bar`
			`#define bar baz`
			`[foo]`
			`==> [baz]`
			`</pre>`
			`<p>Here, two padding tokens are generated with sources the ‘<samp><span class="samp">foo</span></samp>’ token`
			`between the brackets, and the ‘<samp><span class="samp">bar</span></samp>’ token from foo's replacement`
			`list, respectively. Clearly the first padding token is the one to`
			`use, so the output code should contain a rule that the first`
			`padding token in a sequence is the one that matters.`

			`<p>But what if a macro expansion is left? Adjusting the above`
			`example slightly:`

			`<pre class="smallexample"> #define foo bar`
			`#define bar EMPTY baz`
			`#define EMPTY`
			`[foo] EMPTY;`
			`==> [ baz] ;`
			`</pre>`
			`<p>As shown, now there should be a space before ‘<samp><span class="samp">baz</span></samp>’ and the`
			`semicolon in the output.`

			`<p>The rules we decided above fail for ‘<samp><span class="samp">baz</span></samp>’: we generate three`
			`padding tokens, one per macro invocation, before the token ‘<samp><span class="samp">baz</span></samp>’.`
			`We would then have it take its spacing from the first of these, which`
			`carries source token ‘<samp><span class="samp">foo</span></samp>’ with no leading space.`

			`<p>It is vital that cpplib get spacing correct in these examples since any`
			`of these macro expansions could be stringified, where spacing matters.`

			`<p>So, this demonstrates that not just entering macro and argument`
			`expansions, but leaving them requires special handling too. I made`
			`cpplib insert a padding token with a <code>NULL</code> source token when`
			`leaving macro expansions, as well as after each replaced argument in a`
			`macro's replacement list. It also inserts appropriate padding tokens on`
			`either side of tokens created by the ‘<samp><span class="samp">#</span></samp>’ and ‘<samp><span class="samp">##</span></samp>’ operators.`
			`I expanded the rule so that, if we see a padding token with a`
			`<code>NULL</code> source token, <em>and</em> that source token has no leading`
			`space, then we behave as if we have seen no padding tokens at all. A`
			`quick check shows this rule will then get the above example correct as`
			`well.`

			`<p>Now a relationship with paste avoidance is apparent: we have to be`
			`careful about paste avoidance in exactly the same locations we have`
			`padding tokens in order to get white space correct. This makes`
			`implementation of paste avoidance easy: wherever the stand-alone`
			`preprocessor is fixing up spacing because of padding tokens, and it`
			`turns out that no space is needed, it has to take the extra step to`
			`check that a space is not needed after all to avoid an accidental paste.`
			`The function <code>cpp_avoid_paste</code> advises whether a space is required`
			`between two consecutive tokens. To avoid excessive spacing, it tries`
			`hard to only require a space if one is likely to be necessary, but for`
			`reasons of efficiency it is slightly conservative and might recommend a`
			`space where one is not strictly needed.`

			`</body></html>`