The language for writing expression simplifications resembles the other domain-specific languages GCC uses, so it is Lisp-like. Let's start with an example from the match.pd file:
(simplify (bit_and @0 integer_all_onesp) @0)
This example contains all required parts of an expression simplification.
A simplification is wrapped inside a (simplify ...) expression.
It contains at least two operands: an expression that is matched
against the GIMPLE or GENERIC IL, and a replacement expression that
is returned if the match was successful.
Expressions have an operator ID, bit_and in this case. Operators can
be lower-case tree codes with _expr stripped off, or built-in
function code names in all caps, like BUILT_IN_SQRT.
@n denotes a so-called capture. It captures the operand and lets
you refer to it in other places of the match-and-simplify. In the
above example it is referred to in the replacement expression. Captures
are @ followed by a number or an identifier.
(simplify (bit_xor @0 @0) { build_zero_cst (type); })
In this example @0 is mentioned twice, which constrains the matched
expression to have two equal operands. This example also introduces
operands written in C code. These can be used in the expression
replacements and are supposed to evaluate to a tree node which has to
be a valid GIMPLE operand (so you cannot generate expressions in C code).
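
The effect of a repeated capture can be sketched outside of GCC. The following Python model is only an illustration and not how GCC implements matching: expressions are modelled as nested tuples, and the helper names match and simplify_bit_xor are made up for this sketch. It shows that the second occurrence of @0 only matches an operand equal to the first one.

```python
# Toy model: expressions are nested tuples (operator, operand, ...).
# Leaves are plain strings (names) or ints (constants).
# Pattern captures are strings starting with "@".

def match(pattern, expr, captures):
    """Try to match expr against pattern, recording captures in place."""
    if isinstance(pattern, str) and pattern.startswith("@"):
        if pattern in captures:
            # A repeated capture requires an operand equal to the first one.
            return captures[pattern] == expr
        captures[pattern] = expr
        return True
    if isinstance(pattern, tuple):
        return (isinstance(expr, tuple)
                and len(pattern) == len(expr)
                and pattern[0] == expr[0]
                and all(match(p, e, captures)
                        for p, e in zip(pattern[1:], expr[1:])))
    return pattern == expr


def simplify_bit_xor(expr):
    """Toy counterpart of (simplify (bit_xor @0 @0) { build_zero_cst (type); })."""
    captures = {}
    if match(("bit_xor", "@0", "@0"), expr, captures):
        return 0  # stands in for build_zero_cst (type)
    return None  # no match: the expression is left alone


print(simplify_bit_xor(("bit_xor", "a", "a")))  # 0
print(simplify_bit_xor(("bit_xor", "a", "b")))  # None
```

Captures bind whole sub-expressions in this model, so (bit_xor (plus a b) (plus a b)) is simplified as well.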
(simplify
  (trunc_mod integer_zerop@0 @1)
  (if (!integer_zerop (@1)))
  @0)
Here @0 captures the first operand of the trunc_mod expression,
which is also predicated with integer_zerop. Expression operands
may be either expressions, predicates or captures. Captures
can be unconstrained or capture expressions or predicates.
This example introduces an optional operand of simplify,
the if-expression. This condition is evaluated after the
expression has matched in the IL and is required to evaluate to true
to enable the replacement expression. The expression operand
of the if is a standard C expression which may contain references
to captures.
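
The order of evaluation described above (match first, then the condition, then the replacement) can be sketched in Python. This is a toy model under the assumption that operands are plain ints or symbolic names; simplify_trunc_mod is a hypothetical helper, not a GCC function.

```python
def simplify_trunc_mod(op0, op1):
    """Toy model of (trunc_mod integer_zerop@0 @1) guarded by !integer_zerop (@1)."""
    # Step 1: structural match, including the integer_zerop predicate on @0.
    if op0 != 0:
        return None
    # Step 2: the if-condition is evaluated only after the match succeeded.
    if op1 == 0:
        return None
    # Step 3: the replacement expression is the capture @0.
    return op0


print(simplify_trunc_mod(0, "x"))  # 0
print(simplify_trunc_mod(0, 0))    # None
```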
An if expression can be used to specify a common condition
for multiple simplify patterns, avoiding the need
to repeat it multiple times:
(if (!TYPE_SATURATING (type)
     && !FLOAT_TYPE_P (type) && !FIXED_POINT_TYPE_P (type))
  (simplify
    (minus (plus @0 @1) @0)
    @1)
  (simplify
    (minus (minus @0 @1) @0)
    (negate @1)))
Ifs can be nested.
Captures can also be used for capturing results of sub-expressions.
#if GIMPLE
(simplify
  (pointer_plus (addr@2 @0) INTEGER_CST_P@1)
  (if (is_gimple_min_invariant (@2)))
  {
    HOST_WIDE_INT off;
    tree base = get_addr_base_and_unit_offset (@0, &off);
    off += tree_to_uhwi (@1);
    /* Now with that we should be able to simply write
       (addr (mem_ref (addr @base) (plus @off @1)))  */
    build1 (ADDR_EXPR, type,
            build2 (MEM_REF, TREE_TYPE (TREE_TYPE (@2)),
                    build_fold_addr_expr (base),
                    build_int_cst (ptr_type_node, off)));
  })
#endif
In the above example, @2 captures the result of the expression
(addr @0). For the outermost expression only its type can be captured,
and the keyword type is reserved for this purpose. The above
example also shows how to restrict patterns so that they apply only
to GIMPLE or GENERIC, by using the predefined preprocessor macros
GIMPLE and GENERIC together with preprocessor directives.
(simplify (bit_and:c integral_op_p@0 (bit_ior:c (bit_not @0) @1)) (bit_and @1 @0))
Here we introduce flags on match expressions. There is currently
a single flag, c, which denotes that the expression should
also be matched commutated. Thus the above match expression
is really the following four match expressions:
(bit_and integral_op_p@0 (bit_ior (bit_not @0) @1))
(bit_and (bit_ior (bit_not @0) @1) integral_op_p@0)
(bit_and integral_op_p@0 (bit_ior @1 (bit_not @0)))
(bit_and (bit_ior @1 (bit_not @0)) integral_op_p@0)
The usual canonicalizations known from GENERIC expressions are applied before matching; for example, constant operands always come second in commutative expressions.
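
The expansion of the c flag into four match expressions can be reproduced with a small Python sketch. It is a model of the idea only, with patterns written as nested tuples and the hypothetical helpers expand_commutative and show; it does not reflect how GCC processes match.pd internally.

```python
from itertools import product


def expand_commutative(pattern):
    """Return every variant of pattern with the :c flags expanded."""
    if not isinstance(pattern, tuple):
        return [pattern]
    op, *operands = pattern
    # Expand the operands first; each operand may itself have variants.
    operand_variants = [expand_commutative(o) for o in operands]
    commutative = op.endswith(":c")
    name = op[:-2] if commutative else op
    results = []
    for combo in product(*operand_variants):
        results.append((name, *combo))
        if commutative and len(combo) == 2:
            # Also emit the variant with the two operands swapped.
            results.append((name, combo[1], combo[0]))
    return results


def show(pattern):
    """Render a tuple pattern in the parenthesized notation used above."""
    if isinstance(pattern, tuple):
        return "(" + " ".join(show(p) for p in pattern) + ")"
    return pattern


pattern = ("bit_and:c", "integral_op_p@0",
           ("bit_ior:c", ("bit_not", "@0"), "@1"))
variants = expand_commutative(pattern)
for v in variants:
    print(show(v))
```

Running the sketch prints the four match expressions listed above, one per line.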
More features exist to avoid too much repetition.
(for op (plus pointer_plus minus bit_ior bit_xor) (simplify (op @0 integer_zerop) @0))
A for expression can be used to repeat a pattern for each
operator specified, substituting op. A for can be
nested, and a for can have multiple operators to iterate.
(for opa (plus minus) opb (minus plus) (for opc (plus minus) (simplify...
In this example the pattern will be repeated four times, with
opa, opb, opc being plus, minus, plus;
plus, minus, minus; minus, plus, plus; and
minus, plus, minus.
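
The four repetitions follow from two rules: operator lists inside the same for advance in lockstep, while a nested for multiplies the repetitions. A minimal Python sketch of that reading follows; expand_for and expand_nested are hypothetical names, and the sketch assumes that all lists within one for have the same length.

```python
from itertools import product


def expand_for(bindings):
    """Expand one for: its operator lists are stepped through together."""
    names = list(bindings)
    return [dict(zip(names, ops)) for ops in zip(*bindings.values())]


def expand_nested(outer, inner):
    """Expand an outer for with one nested inner for."""
    return [{**o, **i}
            for o, i in product(expand_for(outer), expand_for(inner))]


combos = expand_nested(
    {"opa": ("plus", "minus"), "opb": ("minus", "plus")},
    {"opc": ("plus", "minus")},
)
for c in combos:
    print(c["opa"], c["opb"], c["opc"])
```

The sketch prints the same four opa, opb, opc combinations as listed above, in the same order.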
To avoid repeating operator lists in for you can name them via
(define_operator_list pmm plus minus mult)
and use them in for operator lists where they get expanded.
(for opa (pmm trunc_div) (simplify...
So this example iterates over plus, minus, mult and trunc_div.
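
The expansion of a named operator list inside a for can be pictured as a simple flattening step. The following Python lines are a sketch of that idea; the dictionary operator_lists and the helper expand_operators are made up for illustration.

```python
# Named lists, as they would be introduced by define_operator_list.
operator_lists = {"pmm": ["plus", "minus", "mult"]}


def expand_operators(items, operator_lists):
    """Replace each named operator list by its members, keeping order."""
    result = []
    for item in items:
        # Plain operators are kept; named lists are spliced in.
        result.extend(operator_lists.get(item, [item]))
    return result


print(expand_operators(["pmm", "trunc_div"], operator_lists))
# ['plus', 'minus', 'mult', 'trunc_div']
```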
Using operator lists can also remove the need to explicitly write
a for. All operator list uses that appear in a simplify
or match pattern in operator positions will implicitly
be added to a new for. For example
(define_operator_list SQRT BUILT_IN_SQRTF BUILT_IN_SQRT BUILT_IN_SQRTL)
(define_operator_list POW BUILT_IN_POWF BUILT_IN_POW BUILT_IN_POWL)
(simplify
  (SQRT (POW @0 @1))
  (POW (abs @0) (mult @1 { build_real (TREE_TYPE (@1), dconsthalf); })))
is the same as
(for SQRT (BUILT_IN_SQRTF BUILT_IN_SQRT BUILT_IN_SQRTL)
     POW (BUILT_IN_POWF BUILT_IN_POW BUILT_IN_POWL)
  (simplify
    (SQRT (POW @0 @1))
    (POW (abs @0) (mult @1 { build_real (TREE_TYPE (@1), dconsthalf); }))))
Another building block is the with expression in the
result expression, which nests the generated code in a new C block
followed by its argument:
(simplify
  (convert (mult @0 @1))
  (with { tree utype = unsigned_type_for (type); }
    (convert (mult (convert:utype @0) (convert:utype @1)))))
This allows code nested in the with to refer to the declared
variables. In the above case we use the feature to specify the
type of a generated expression with the :type syntax, where
type needs to be an identifier that refers to the desired type.
Usually the types of the generated result expressions are
determined from the context, but sometimes, as in the above case,
you are required to specify them explicitly.
As intermediate conversions are often optional, there is a way to
avoid the need to repeat patterns both with and without such
conversions. Namely, you can mark a conversion as being optional
with a ?:
(simplify (eq (convert@0 @1) (convert? @2)) (eq @1 (convert @2)))
which will match both (eq (convert @1) (convert @2)) and
(eq (convert @1) @2). The optional converts are supposed
to be all either present or not, thus
(eq (convert? @1) (convert? @2)) will result in two
patterns only. If you want to match all four combinations you
have access to two additional conditional converts as in
(eq (convert1? @1) (convert2? @2)).
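
The difference between two and four resulting patterns can be checked with a small Python model. It treats each distinct optional operator name (convert?, convert1?, convert2?) as one switch that is either on or off for all of its uses. The helpers optional_names, instantiate and expand_optional are hypothetical, and the sketch only handles optional operators that stand for convert.

```python
from itertools import product


def optional_names(pattern, found=None):
    """Collect the distinct optional operator names used in pattern."""
    found = [] if found is None else found
    if isinstance(pattern, tuple):
        if pattern[0].endswith("?") and pattern[0] not in found:
            found.append(pattern[0])
        for operand in pattern[1:]:
            optional_names(operand, found)
    return found


def instantiate(pattern, present):
    """Build one concrete pattern for a given on/off choice per name."""
    if not isinstance(pattern, tuple):
        return pattern
    op, *operands = pattern
    operands = [instantiate(o, present) for o in operands]
    if op.endswith("?"):
        if not present[op]:
            return operands[0]  # conversion absent: keep only its operand
        op = "convert"  # assumption: every optional operator is a convert
    return (op, *operands)


def expand_optional(pattern):
    """Enumerate all concrete patterns an optional-convert pattern stands for."""
    names = optional_names(pattern)
    return [instantiate(pattern, dict(zip(names, flags)))
            for flags in product([True, False], repeat=len(names))]


same = expand_optional(("eq", ("convert?", "@1"), ("convert?", "@2")))
distinct = expand_optional(("eq", ("convert1?", "@1"), ("convert2?", "@2")))
print(len(same), len(distinct))  # 2 4
```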
Predicates available from the GCC middle-end need to be made
available explicitly via define_predicates:
(define_predicates integer_onep integer_zerop integer_all_onesp)
You can also define predicates using the pattern matching language
and the match form:
(match negate_expr_p
  INTEGER_CST
  (if (TYPE_OVERFLOW_WRAPS (type)
       || may_negate_without_overflow_p (t))))
(match negate_expr_p
  (negate @0))
This shows that for match expressions there is t
available, which captures the outermost expression (something
not possible in the simplify context). As you can see,
match has an identifier as its first operand, which is how
you refer to the predicate in patterns. Multiple match forms
for the same identifier add additional cases where the predicate
matches.
Predicates can also match an expression in which case you need to provide a template specifying the identifier and where to get its operands from:
(match (logical_inverted_value @0)
  (eq @0 integer_zerop))
(match (logical_inverted_value @0)
  (bit_not truth_valued_p@0))
You can use the above predicate as follows:
(simplify (bit_and @0 (logical_inverted_value @0)) { build_zero_cst (type); })
This matches a bitwise AND of an operand with its logically inverted value.
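
A predicate that matches an expression behaves like a function that returns the captured operand on success. The following Python sketch models the two logical_inverted_value cases and their use in the bit_and pattern. It is an illustration under assumptions: expressions are nested tuples, truth_valued_p is approximated by a caller-supplied set of names known to be boolean, and both function names are made up here.

```python
def logical_inverted_value(expr, truth_valued=()):
    """Return X if expr is a logical inversion of X, otherwise None."""
    if isinstance(expr, tuple):
        # Case 1: (eq @0 integer_zerop)
        if len(expr) == 3 and expr[0] == "eq" and expr[2] == 0:
            return expr[1]
        # Case 2: (bit_not truth_valued_p@0)
        if len(expr) == 2 and expr[0] == "bit_not" and expr[1] in truth_valued:
            return expr[1]
    return None


def simplify_bit_and(op0, op1, truth_valued=()):
    """Toy model of (bit_and @0 (logical_inverted_value @0))."""
    inverted = logical_inverted_value(op1, truth_valued)
    # The repeated @0 requires the inverted operand to equal op0.
    if inverted is not None and inverted == op0:
        return 0  # stands in for build_zero_cst (type)
    return None


print(simplify_bit_and("a", ("eq", "a", 0)))                         # 0
print(simplify_bit_and("b", ("bit_not", "b"), truth_valued={"b"}))   # 0
print(simplify_bit_and("a", ("bit_not", "a")))                       # None
```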