Term rewriting macros for Nimrod

Author: Andreas Rumpf

Term rewriting macros are macros or templates that have not only a name but also a pattern that is searched for after the semantic checking phase of the compiler: This means they provide an easy way to enhance the compilation pipeline with user defined optimizations:

template optMul{`*`(a, 2)}(a: int): int = a+a

let x = 3
echo x * 2

The compiler now rewrites x * 2 as x + x. The code inside the curlies is the pattern to match against. The operators *, **, |, ~ have a special meaning in patterns if they are written in infix notation, so to match verbatim against * the ordinary function call syntax needs to be used.

Unfortunately optimizations are hard to get right and even the tiny example is wrong:

template optMul{`*`(a, 2)}(a: int): int = a+a

proc f(): int =
  echo "side effect!"
  result = 55

echo f() * 2

We cannot duplicate 'a' if it denotes an expression that has a side effect! Fortunately Nimrod supports side effect analysis:

template optMul{`*`(a, 2)}(a: int{noSideEffect}): int = a+a

proc f(): int =
  echo "side effect!"
  result = 55

echo f() * 2 # not optimized ;-)

So what about 2 * a? We should tell the compiler * is commutative. We cannot really do that however as the following code only swaps arguments blindly:

template mulIsCommutative{`*`(a, b)}(a, b: int): int = b*a

What optimizers really need to do is a canonicalization:

template canonMul{`*`(a, b)}(a: int{lit}, b: int): int = b*a

The int{lit} parameter pattern matches against an expression of type int, but only if it's a literal.

Parameter constraints

The parameter constraint expression can use the operators | (or), & (and) and ~ (not) and the following predicates:

PredicateMeaning
atomThe matching node has no children.
litThe matching node is a literal like "abc", 12.
symThe matching node must be a symbol (a bound identifier).
identThe matching node must be an identifier (an unbound identifier).
callThe matching AST must be a call/apply expression.
lvalueThe matching AST must be an lvalue.
sideeffectThe matching AST must have a side effect.
nosideeffectThe matching AST must have no side effect.
paramA symbol which is a parameter.
genericparamA symbol which is a generic parameter.
moduleA symbol which is a module.
typeA symbol which is a type.
varA symbol which is a variable.
letA symbol which is a let variable.
constA symbol which is a constant.
resultThe special result variable.
procA symbol which is a proc.
methodA symbol which is a method.
iteratorA symbol which is an iterator.
converterA symbol which is a converter.
macroA symbol which is a macro.
templateA symbol which is a template.
fieldA symbol which is a field in a tuple or an object.
enumfieldA symbol which is a field in an enumeration.
forvarA for loop variable.
labelA label (used in block statements).
nk*The matching AST must have the specified kind. (Example: nkIfStmt denotes an if statement.)
aliasStates that the marked parameter needs to alias with some other parameter.
noaliasStates that every other parameter must not alias with the marked parameter.

The alias and noalias predicates refer not only to the matching AST, but also to every other bound parameter; syntactially they need to occur after the ordinary AST predicates:

template ex{a = b + c}(a: int{noalias}, b, c: int) =
  # this transformation is only valid if 'b' and 'c' do not alias 'a':
  a = b
  inc a, b

Pattern operators

The operators *, **, |, ~ have a special meaning in patterns if they are written in infix notation.

The | operator

The | operator if used as infix operator creates an ordered choice:

template t{0|1}(): expr = 3
let a = 1
# outputs 3:
echo a

The matching is performed after the compiler performed some optimizations like constant folding, so the following does not work:

template t{0|1}(): expr = 3
# outputs 1:
echo 1

The reason is that the compiler already transformed the 1 into "1" for the echo statement. However, a term rewriting macro should not change the semantics anyway. In fact they can be deactived with the --patterns:off command line option or temporarily with the patterns pragma.

The {} operator

A pattern expression can be bound to a pattern parameter via the expr{param} notation:

template t{(0|1|2){x}}(x: expr): expr = x+1
let a = 1
# outputs 2:
echo a

The ~ operator

The ~ operator is the not operator in patterns:

template t{x = (~x){y} and (~x){z}}(x, y, z: bool): stmt =
  x = y
  if x: x = z

var
  a = false
  b = true
  c = false
a = b and c
echo a

The * operator

The * operator can flatten a nested binary expression like a & b & c to &(a, b, c):

var
  calls = 0

proc `&&`(s: varargs[string]): string =
  result = s[0]
  for i in 1..len(s)-1: result.add s[i]
  inc calls

template optConc{ `&&` * a }(a: string): expr = &&a

let space = " "
echo "my" && (space & "awe" && "some " ) && "concat"

# check that it's been optimized properly:
doAssert calls == 1

The second operator of * must be a parameter; it is used to gather all the arguments. The expression "my" && (space & "awe" && "some " ) && "concat" is passed to optConc in a as a special list (of kind nkArgList) which is flattened into a call expression; thus the invocation of optConc produces:

`&&`("my", space & "awe", "some ", "concat")

The ** operator

The ** is much like the * operator, except that it gathers not only all the arguments, but also the matched operators in reverse polish notation:

import macros

type
  TMatrix = object
    dummy: int

proc `*`(a, b: TMatrix): TMatrix = nil
proc `+`(a, b: TMatrix): TMatrix = nil
proc `-`(a, b: TMatrix): TMatrix = nil
proc `$`(a: TMatrix): string = result = $a.dummy
proc mat21(): TMatrix =
  result.dummy = 21

macro optM{ (`+`|`-`|`*`) ** a }(a: TMatrix): expr =
  echo treeRepr(a)
  result = newCall(bindSym"mat21")

var x, y, z: TMatrix

echo x + y * z - x

This passes the expression x + y * z - x to the optM macro as an nnkArgList node containing:

Arglist
  Sym "x"
  Sym "y"
  Sym "z"
  Sym "*"
  Sym "+"
  Sym "x"
  Sym "-"

(Which is the reverse polish notation of x + y * z - x.)

Parameters

Parameters in a pattern are type checked in the matching process. If a parameter is of the type varargs it is treated specially and it can match 0 or more arguments in the AST to be matched against:

template optWrite{
  write(f, x)
  ((write|writeln){w})(f, y)
}(x, y: varargs[expr], f: TFile, w: expr) =
  w(f, x, y)
Generated: 2014-03-11 21:26:34 UTC