<html lang="en">
<head>
<title>Token Spacing - The GNU C Preprocessor Internals</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name="description" content="The GNU C Preprocessor Internals">
<meta name="generator" content="makeinfo 4.13">
<link title="Top" rel="start" href="index.html#Top">
<link rel="prev" href="Macro-Expansion.html#Macro-Expansion" title="Macro Expansion">
<link rel="next" href="Line-Numbering.html#Line-Numbering" title="Line Numbering">
<link href="http://www.gnu.org/software/texinfo/" rel="generator-home" title="Texinfo Homepage">
<meta http-equiv="Content-Style-Type" content="text/css">
<style type="text/css"><!--
  pre.display { font-family:inherit }
  pre.format  { font-family:inherit }
  pre.smalldisplay { font-family:inherit; font-size:smaller }
  pre.smallformat  { font-family:inherit; font-size:smaller }
  pre.smallexample { font-size:smaller }
  pre.smalllisp    { font-size:smaller }
  span.sc    { font-variant:small-caps }
  span.roman { font-family:serif; font-weight:normal; }
  span.sansserif { font-family:sans-serif; font-weight:normal; }
--></style>
</head>
<body>
<div class="node">
<a name="Token-Spacing"></a>
<p>
Next: <a rel="next" accesskey="n" href="Line-Numbering.html#Line-Numbering">Line Numbering</a>,
Previous: <a rel="previous" accesskey="p" href="Macro-Expansion.html#Macro-Expansion">Macro Expansion</a>,
Up: <a rel="up" accesskey="u" href="index.html#Top">Top</a>
<hr>
</div>
<h2 class="unnumbered">Token Spacing</h2>

<p><a name="index-paste-avoidance-14"></a><a name="index-spacing-15"></a><a name="index-token-spacing-16"></a>
First, consider an issue that only concerns the stand-alone
preprocessor: there needs to be a guarantee that re-reading its preprocessed
output results in an identical token stream. Without taking special
measures, this might not be the case because of macro substitution.
For example:

<pre class="smallexample">     #define PLUS +
     #define EMPTY
     #define f(x) =x=
     +PLUS -EMPTY- PLUS+ f(=)
       ==> + + - - + + = = =
     <em>not</em>
       ==> ++ -- ++ ===
</pre>
<p>One solution would be to simply insert a space between all adjacent
tokens. However, we would like to keep space insertion to a minimum,
both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.

<p>For now, just notice that when tokens are added (or removed, as shown by
the <code>EMPTY</code> example) from the original lexed token stream, we need
to check for accidental token pasting. We call this <dfn>paste
avoidance</dfn>. Token addition and removal can only occur because of macro
expansion, but accidental pasting can occur in many places: both before
and after each macro replacement, each argument replacement, and
additionally each token created by the ‘<samp><span class="samp">#</span></samp>’ and ‘<samp><span class="samp">##</span></samp>’ operators.

<p>Look at how the preprocessor normally gets whitespace output
correct. The <code>cpp_token</code> structure contains a flags byte, and one
of those flags is <code>PREV_WHITE</code>. This is flagged by the lexer, and
indicates that the token was preceded by whitespace of some form other
than a newline. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.
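
<p>As a minimal sketch of that idea, and not cpplib's actual output code,
the following prints a token list while honouring a <code>PREV_WHITE</code>-style
flag. The <code>struct token</code> type, the flag value and the
<code>write_tokens</code> function are hypothetical stand-ins invented for this
illustration; only the name <code>PREV_WHITE</code> comes from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; cpplib's real cpp_token
   carries more information than a spelling and a flags byte.  */
#define PREV_WHITE 0x01  /* lexer saw non-newline whitespace before the token */

struct token {
    const char *spelling;
    unsigned char flags;
};

/* Copy the spellings of toks[0..n-1] into buf, writing a single space
   before every token that has PREV_WHITE set.  The result is truncated
   to fit and always NUL-terminated.  Returns the length written.  */
size_t write_tokens(const struct token *toks, size_t n, char *buf, size_t size)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(toks[i].spelling);

        if ((toks[i].flags & PREV_WHITE) && used + 1 < size)
            buf[used++] = ' ';
        if (len > size - 1 - used)
            len = size - 1 - used;
        memcpy(buf + used, toks[i].spelling, len);
        used += len;
    }
    buf[used] = '\0';
    return used;
}
```

<p>Note that any run of whitespace in the input collapses to a single space
in this sketch, because the flag records only that whitespace was present,
not how much.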

<p>Now consider the result of the following macro expansion:

<pre class="smallexample">     #define add(x, y, z) x + y +z;
     sum = add (1,2, 3);
       ==> sum = 1 + 2 +3;
</pre>
<p>The interesting thing here is that the tokens ‘<samp><span class="samp">1</span></samp>’ and ‘<samp><span class="samp">2</span></samp>’ are
output with a preceding space, and ‘<samp><span class="samp">3</span></samp>’ is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that ‘<samp><span class="samp">1</span></samp>’ gets its preceding
whitespace from the space preceding ‘<samp><span class="samp">add</span></samp>’ in the macro invocation,
<em>not</em> from the replacement list. ‘<samp><span class="samp">2</span></samp>’ gets its whitespace from the
space preceding the parameter ‘<samp><span class="samp">y</span></samp>’ in the macro replacement list,
and ‘<samp><span class="samp">3</span></samp>’ has no preceding space because parameter ‘<samp><span class="samp">z</span></samp>’ has none
in the replacement list.

<p>Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
<dfn>padding token</dfn>, into the token stream to indicate that spacing of
the subsequent token is special. The preprocessor inserts padding
tokens in front of every macro expansion and expanded macro argument.
These point to a <dfn>source token</dfn> from which the subsequent real token
should inherit its spacing. In the above example, the source tokens are
‘<samp><span class="samp">add</span></samp>’ in the macro invocation, and ‘<samp><span class="samp">y</span></samp>’ and ‘<samp><span class="samp">z</span></samp>’ in the
macro replacement list, respectively.

<p>It is quite easy to get multiple padding tokens in a row, for example if
a macro's first replacement token expands straight into another macro.

<pre class="smallexample">     #define foo bar
     #define bar baz
     [foo]
       ==> [baz]
</pre>
<p>Here, two padding tokens are generated, whose sources are the ‘<samp><span class="samp">foo</span></samp>’ token
between the brackets and the ‘<samp><span class="samp">bar</span></samp>’ token from foo's replacement
list, respectively. Clearly the first padding token is the one to
use, so the output code should contain a rule that the first
padding token in a sequence is the one that matters.

<p>But what happens when we leave a macro expansion? Adjusting the above
example slightly:

<pre class="smallexample">     #define foo bar
     #define bar EMPTY baz
     #define EMPTY
     [foo] EMPTY;
       ==> [ baz] ;
</pre>
<p>As shown, now there should be a space before ‘<samp><span class="samp">baz</span></samp>’ and the
semicolon in the output.

<p>The rule we decided on above fails for ‘<samp><span class="samp">baz</span></samp>’: we generate three
padding tokens, one per macro invocation, before the token ‘<samp><span class="samp">baz</span></samp>’.
We would then have it take its spacing from the first of these, which
carries source token ‘<samp><span class="samp">foo</span></samp>’ with no leading space.

<p>It is vital that cpplib get spacing correct in these examples since any
of these macro expansions could be stringified, where spacing matters.

<p>So, this demonstrates that not just entering macro and argument
expansions, but also leaving them, requires special handling. I made
cpplib insert a padding token with a <code>NULL</code> source token when
leaving macro expansions, as well as after each replaced argument in a
macro's replacement list. It also inserts appropriate padding tokens on
either side of tokens created by the ‘<samp><span class="samp">#</span></samp>’ and ‘<samp><span class="samp">##</span></samp>’ operators.
I expanded the rule so that, if we see a padding token with a
<code>NULL</code> source token, <em>and</em> the source token remembered so far has
no leading space, then we behave as if we have seen no padding tokens at
all. A quick check shows this rule will then get the above example correct as
well.
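
<p>The "quick check" can be made concrete with a small sketch. It is one
possible reading of the rules described here, not a copy of cpplib's code:
the <code>struct token</code> layout, the <code>TOK_REAL</code> and
<code>TOK_PADDING</code> kinds and the <code>space_before</code> function are
all hypothetical names introduced for illustration. The function walks the run
of padding tokens in front of a real token, keeps the first source it sees, and
forgets it again when a <code>NULL</code>-source padding token arrives while the
remembered source has no leading space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREV_WHITE 0x01  /* hypothetical flag value */

enum kind { TOK_REAL, TOK_PADDING };

/* Hypothetical token layout, much simpler than cpplib's cpp_token. */
struct token {
    enum kind kind;
    unsigned char flags;         /* PREV_WHITE, meaningful for real tokens */
    const struct token *source;  /* padding tokens only; may be NULL */
};

/* Decide whether the real token toks[i] is output with a leading space,
   based on the padding tokens that immediately precede it.  */
bool space_before(const struct token *toks, size_t i)
{
    const struct token *source = NULL;
    size_t start = i;

    assert(toks[i].kind == TOK_REAL);

    /* Find the start of the run of padding tokens before toks[i]. */
    while (start > 0 && toks[start - 1].kind == TOK_PADDING)
        start--;

    for (size_t j = start; j < i; j++) {
        const struct token *pad_source = toks[j].source;

        /* First padding token wins, unless a NULL-source padding token
           follows a remembered source that has no leading space: then
           we start over as if no padding had been seen.  */
        if (source == NULL
            || (!(source->flags & PREV_WHITE) && pad_source == NULL))
            source = pad_source;
    }

    /* No usable source: the token keeps its own lexed spacing. */
    if (source == NULL)
        source = &toks[i];
    return (source->flags & PREV_WHITE) != 0;
}
```

<p>For ‘<samp><span class="samp">[foo] EMPTY;</span></samp>’ the padding sources in front of
‘<samp><span class="samp">baz</span></samp>’ are ‘<samp><span class="samp">foo</span></samp>’, ‘<samp><span class="samp">bar</span></samp>’, ‘<samp><span class="samp">EMPTY</span></samp>’ and then <code>NULL</code>; the
<code>NULL</code> resets the remembered ‘<samp><span class="samp">foo</span></samp>’, so ‘<samp><span class="samp">baz</span></samp>’ falls back on its own
leading space from the replacement list of ‘<samp><span class="samp">bar</span></samp>’.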

<p>Now a relationship with paste avoidance is apparent: we have to be
careful about paste avoidance in exactly the same locations we have
padding tokens in order to get whitespace correct. This makes
implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste.
The function <code>cpp_avoid_paste</code> advises whether a space is required
between two consecutive tokens. To avoid excessive spacing, it tries
hard to only require a space if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.
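
<p>To illustrate what "slightly conservative" can mean, here is a rough
sketch of such a check. It is not <code>cpp_avoid_paste</code> itself: the
<code>needs_space</code> function is a hypothetical helper that works on token
spellings rather than on <code>cpp_token</code> structures, and it covers only a
subset of C punctuators. It looks at just the last character of the left token
and the first character of the right one, which is cheap but occasionally asks
for a space that re-lexing would not strictly require.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: would writing `left` immediately followed by
   `right` risk being re-lexed as different tokens?  Both arguments are
   non-empty token spellings.  Errs on the side of returning true.  */
bool needs_space(const char *left, const char *right)
{
    assert(left[0] != '\0' && right[0] != '\0');

    char a = left[strlen(left) - 1];  /* last character of left token */
    char b = right[0];                /* first character of right token */
    bool left_is_number = isdigit((unsigned char)left[0])
        || (left[0] == '.' && isdigit((unsigned char)left[1]));

    /* Identifiers and numbers run together: `foo` `bar` -> `foobar`. */
    if (is_word_char(a) && is_word_char(b))
        return true;
    /* A preprocessing number can absorb '.', and a sign after an
       exponent; we do not bother checking for the exponent.  */
    if (left_is_number && (b == '.' || b == '+' || b == '-'))
        return true;

    switch (a) {
    case '+': return b == '+' || b == '=';
    case '-': return b == '-' || b == '=' || b == '>';
    case '=': return b == '=';
    case '<': return b == '<' || b == '=';
    case '>': return b == '>' || b == '=';
    case '&': return b == '&' || b == '=';
    case '|': return b == '|' || b == '=';
    case '*': case '%': case '^': case '!':
        return b == '=';
    case '/': return b == '/' || b == '*' || b == '=';
    case '.': return b == '.' || isdigit((unsigned char)b);
    case '#': return b == '#';
    default:  return false;
    }
}
```

<p>With this sketch, ‘<samp><span class="samp">+</span></samp>’ followed by ‘<samp><span class="samp">+</span></samp>’ requires a space, as in the first
example of this section, while ‘<samp><span class="samp">[</span></samp>’ followed by ‘<samp><span class="samp">baz</span></samp>’ does not. The pair
‘<samp><span class="samp">==</span></samp>’ and ‘<samp><span class="samp">=</span></samp>’ also gets a space even though ‘<samp><span class="samp">===</span></samp>’ would re-lex
as the same two tokens, which is the kind of harmless over-caution the text
describes.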
</body></html>