Metaprogramming in Nim

A couple of weeks ago, I set out to give Golang another try. I don't want to talk about Golang in this article, but suffice it to say it wasn't long before I found myself perusing Rust's documentation.

While experimenting with Rust I learned about another language called Nim (formerly Nimrod), which caught my attention with its vaguely Python-like syntax. Apparently Nim is a statically typed imperative language with an optional non-tracing GC and well-rounded generics; it compiles to (reportedly) quality C and has a strong focus on meta-programming.

Now, I've been focused on Python for a long time. I have spent non-trivial time with some statically typed languages over the years (C# mostly) and have come to appreciate a good generics system, but I have never really had any experience with language macros whatsoever. Frankly, I had a bad impression of them and really only had a sense of them in the context of Lisp.

I suppose that's why I decided to give Nim a good look. I wanted to know how a language that looks like Python could have a useful macro and templating system and what that even meant.

The test app

I started by writing a small app in Nim that implemented a procedure called execute that took a sequence of int and a sequence of procs. The idea is that execute iterates over the sequence of int, using each value as an index into the sequence of procs and calling the procedure it refers to. Essentially the first sequence is a script, or ordering, by which to call the procedures in the second sequence:

import macros

proc A() = echo "A"
proc B() = echo "B"
proc C() = echo "C"

proc execute(order: seq[int], callbacks: seq[proc]) =
  for i in items(order):
    callbacks[i]()

execute(@[0,0,1,2,1,2], @[A, B, C])

A A B C B C

This is a simple, pointless app, but what it gives us is some repetition that we can try to eliminate with Nim's meta-programming facilities.

Nim templates

The most straightforward way to save ourselves from implementing the three A, B and C procs is going to be templates, which are a substitution mechanism that operates on Nim's AST!

The idea is that we write a template function, and at compile-time, any invocation of the template will be replaced by the contents of the template body's AST. Alright, that's pretty cool.

import macros

template abcProc(n: expr): stmt  =
  proc n() = echo astToStr(n)

abcProc(A)
abcProc(B)
abcProc(C)

proc execute(order: seq[int], callbacks: seq[proc]) =
  for i in items(order):
    callbacks[i]()

execute(@[0,0,1,2,1,2], @[A, B, C])

A A B C B C

It works. At compile time each invocation of abcProc is replaced with an expansion of the template body's AST. In this case it is our original proc, but with the template parameter n substituted wherever we originally hardcoded a letter, such as each A in proc A() = echo "A".

parseStmt Macro

With a template, it is the AST of the template body that is substituted for any invocation of the template. Macros, however, are different. A macro acts more like a normal function that builds and returns an AST object, and it is the macro's return value that is substituted for the invocation. A macro is not constrained to having its body literally spell out the AST to be substituted. Instead it can use a for loop or any other normal procedural means to generate the resulting AST. This is far more powerful but obviously a bit more tedious.
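To make the "normal procedural means" part concrete, here is a minimal sketch of a macro that builds its result with an ordinary for loop. The macro incTimes and the variable counter are hypothetical names of mine, not part of the test app, and I am assuming a recent Nim compiler (1.x or later), where the expr and stmt types used in this article are spelled untyped.

```nim
import macros

# Hypothetical macro, not from the test app: emits `inc(target)` once per
# iteration of an ordinary for loop that runs at compile time.
# Assumes a recent Nim where `untyped` replaces the article's `expr`/`stmt`.
macro incTimes(target: untyped, count: static[int]): untyped =
  result = newStmtList()
  for i in 0 ..< count:
    # Each pass appends one more call node to the returned statement list.
    result.add(newCall(ident("inc"), copyNimTree(target)))

var counter = 0
incTimes(counter, 3)   # expands to three `inc(counter)` statements
echo counter           # prints 3
```

A template could not do this, since its body is fixed; the macro decides how many statements to emit while it runs.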

As it happens, in our simple case we can leverage the help of macros.parseStmt which will take a string and give you back the AST of the code in the string!

That means all we have to do is simply interpolate the macro's argument into a string resembling the original A, B, and C procs above and parse it with parseStmt as the result:

import macros, strutils

macro abcProc(n: expr): stmt =
  result = parseStmt("proc $1() = echo \"$1\"" % $n)

abcProc(A)
abcProc(B)
abcProc(C)

proc execute(order: seq[int], callbacks: seq[proc]) =
  for i in items(order):
    callbacks[i]()

execute(@[0,0,1,2,1,2], @[A, B, C])

A A B C B C

Pretty straight-forward. We build a string with the code we want, parseStmt parses it into AST and the macro returns it. I have to imagine that parseStmt is a macro writer's best friend.
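If you are curious what parseStmt actually hands back, one way to look is to dump the tree as text. The sketch below is my own addition: parsedTree is a hypothetical helper, and I am assuming a recent Nim compiler where the macro return type is written untyped. I am deliberately not showing the dump itself, since its exact layout differs between compiler versions.

```nim
import macros, strutils

# Hypothetical helper: parses the same source string the macro above builds
# and returns a textual dump of the resulting AST as a string literal.
macro parsedTree(): untyped =
  let ast = parseStmt("proc A() = echo \"A\"")
  result = newLit(treeRepr(ast))

const tree = parsedTree()
echo tree                                            # the node kinds parseStmt produced
echo "has a ProcDef node: ", "ProcDef" in tree       # strutils substring check
```

Reading a dump like this is also a handy way to find out which nodes you would need to construct by hand.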

Building the AST

parseStmt is handy, but we should also be able to build the exact AST that we want ourselves and indeed we can. In this version we construct the exact proc objects we need. Did I mention how cool this is?!

import macros

macro abcProc(n: expr): stmt  =
  let body = newCall("echo", newStrLitNode($n))
  result = newProc(n, body=body)

abcProc(A)
abcProc(B)
...
A A B C B C

Let's break this one down. First, the macro constructs the body of the proc, which is our call to echo where we print out the corresponding letter. Using newCall we can specify the proc we want to call and its arguments. We create the string argument by passing the stringified n (that's what $ does) to newStrLitNode, which constructs a "string literal node".

On the second line, we then construct a new proc with newProc, the first argument being the name of the proc and the second being the body we just constructed.

At compile time every invocation of abcProc will be replaced by the AST returned by executing the macro. The macro will build a tiny proc that executes echo using the literal identifier name we pass to abcProc.
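A quick way to convince yourself that the hand-built nodes really describe the proc we want is to render them back to source text. This is a sketch under my own assumptions: generatedSource is a hypothetical helper, and I am using a recent Nim compiler where expr and stmt are spelled untyped.

```nim
import macros, strutils

# Hypothetical helper: builds the proc exactly like abcProc above, but returns
# its rendered source as a string instead of injecting the proc.
macro generatedSource(n: untyped): untyped =
  let body = newCall("echo", newStrLitNode($n))
  result = newLit(repr(newProc(n, body = body)))

const src = generatedSource(A)
echo src                                        # the proc, rendered as Nim source
echo "defines proc A: ", "proc A" in src        # strutils substring check
```

Nothing named A is declared here; the macro only reports what it would have generated.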

Vararg Macros

Since our example features the repetition of only very trivially small procs it is not surprising that we are not saving much typing by utilizing meta-programming. However it should be easy to extrapolate the mechanics out to much larger or more complicated procedures and imagine the benefits.

That said, we can reduce how much is needed to create our three procs by upgrading our macro to support a variable number of arguments.

import macros

macro abcProcs(n: varargs[expr]): stmt  =
  result = newStmtList()
  for i in 0.. <n.len:
    let body = newCall("echo", newStrLitNode($n[i]))
    result.add(newProc(n[i], body=body))

abcProcs(A, B, C)
...
test.nim(9, 9) Error: undeclared identifier: 'A'

Whoops. This version barfs. The error tells us that at our invocation of the macro, abcProcs(A, B, C), A is an unknown identifier. The reason is that we have hit a subtlety in Nim's current macro implementation. You see, there are actually two kinds of macros: ordinary and immediate.

We might have expected to see an 'undeclared identifier' error earlier in this article, such as when we invoked our single-arg macro with abcProc(A). After all, A is undeclared. What are we actually passing to the macro when we invoke it with an undeclared A? And how come we're only seeing the error now that we are passing multiple arguments to the macro?

The documentation explains the difference between ordinary and immediate in terms of templates, and the article's macros behave the same way:

There are two different kinds of templates: immediate templates and ordinary templates. Ordinary templates take part in overloading resolution. As such their arguments need to be type checked before the template is invoked. So ordinary templates cannot receive undeclared identifiers.

This means that ordinary Nim macros enforce the constraint that any arguments passed to the macro must have been declared. The compiler can only know the type of arguments that you've previously declared, so it makes sense that this is a prerequisite for distinguishing between overloaded calls to the same macro.

So then why does it work for our single-argument macros? It turns out that Nim has actually had some work done to implicitly 'demote' a macro to its immediate form when you call it with undeclared arguments. However, that is only supported for single argument macros.

What the compiler is doing is turning this:

macro abcProc(n: expr): stmt =
  let body = newCall("echo", newStrLitNode($n))
  result = newProc(n, body=body)

into this:

macro abcProc(n: expr): stmt {.immediate.} =
  let body = newCall("echo", newStrLitNode($n))
  result = newProc(n, body=body)

...based on whether you invoke the macro with undeclared arguments (and the macro is defined to accept them as type expr).

So what does it mean for a macro to be immediate anyway? It means that the arguments you pass to the macro are passed lexically. If we invoke abcProc with an undeclared identifier A as in abcProc(A), the argument n will be filled with the AST node (nnkIdent) representing the identifier, rather than the value of whatever type A was declared to be in the non-immediate case (such as an int, or string, or whatever).
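You can watch this distinction happen by asking a macro what kind of node it received. The sketch below assumes a recent Nim compiler (1.x or later), where, as far as I can tell, the role of the immediate pragma is played by untyped parameters, while typed parameters are checked before the macro runs. The names untypedKind, typedKind, neverDeclared and answer are mine.

```nim
import macros

# Receives its argument lexically, so an undeclared name is acceptable.
macro untypedKind(n: untyped): untyped =
  result = newLit($n.kind)

# Receives its argument after type checking, so the name must be declared.
macro typedKind(n: typed): untyped =
  result = newLit($n.kind)

let answer = 42

const lexicalKind = untypedKind(neverDeclared)
const checkedKind = typedKind(answer)

echo lexicalKind   # prints nnkIdent
echo checkedKind   # prints nnkSym
```

Swapping the arguments shows the other half of the story: typedKind(neverDeclared) is rejected with the same 'undeclared identifier' error we hit above.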

The compiler doesn't make this mode switch automatically for macros invoked with multiple arguments. That's okay; adding the "pragma" tag is easy enough:

import macros

macro abcProcs(n: varargs[expr]): stmt {.immediate.}  =
  result = newStmtList()
  for i in 0.. <n.len:
    let body = newCall("echo", newStrLitNode($n[i]))
    result.add(newProc(n[i], body=body))

abcProcs(A, B, C)

proc execute(order: seq[int], callbacks: seq[proc]) =
  for i in items(order):
    callbacks[i]()

execute(@[0,0,1,2,1,2], @[A, B, C])

test.nim(15, 9) Error: undeclared identifier: 'A'

Huh, okay. This time we have the same error, but it occurs 6 lines lower, on line 15, at our invocation of execute. This indicates that our macro invocation succeeded: the compiler no longer had a problem with us referring to A before it existed. Since we tagged our macro as immediate, it merely passed along the as-yet meaningless AST token A.

But by the time we reach line 15, A should definitely exist. The macro invocation should have been expanded to the definition of our three procs A, B, and C. Alas, the compiler says it can't find it. That means something in our macro must have gone wrong.

Callsite

The problem essentially boils down to the fact that macro arguments for multi-argument immediate macros are simply not supported. The argument signature we have written for our immediate macro is essentially meaningless, and relying on varargs functionality on top of it is even more hopeless. This is just a current limitation, however, and there is a trivial workaround:

import macros

macro abcProcs(n: varargs[expr]): stmt {.immediate.}  =
  let callargs = callsite()
  result = newStmtList()
  for i in 1.. <callargs.len:
    let body = newCall("echo", newStrLitNode($callargs[i]))
    result.add(newProc(callargs[i], body=body))

abcProcs(A, B, C)
...
A A B C B C

Finally a working 'varargs' version of our macro. Since we cannot rely on the macro's argument signature for providing the arguments passed to our macro we make a call to callsite which returns a sequence of the macro name followed by any arguments passed to it. That's why our for loop has been updated to start from 1 to skip the name of the macro.

We simply iterate over callargs instead, but the macro is largely the same otherwise. In a larger macro it would be a relatively minor change, so it's not a huge deal for a feature which will eventually work just like single-argument immediate macros (including actual varargs support!).
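For comparison, here is a sketch of how I would expect the same macro to look on a recent Nim compiler (1.x or later), where, to my knowledge, varargs[untyped] delivers the arguments directly and callsite is no longer needed. Two things here are my own changes rather than part of the test app: the parameter is renamed names, and the generated procs return their name instead of echoing it, purely so the result is easy to check.

```nim
import macros

# Assumes a recent Nim; `varargs[untyped]` stands in for the immediate macro
# plus callsite() used above. Generated procs return their name (my change).
macro abcProcs(names: varargs[untyped]): untyped =
  result = newStmtList()
  for name in names:
    # Body is the bare string literal, which becomes the proc's result.
    let body = newStmtList(newStrLitNode(name.strVal))
    # First entry of `params` is the return type: proc A(): string = "A"
    result.add(newProc(name, params = [ident("string")], body = body))

abcProcs(A, B, C)

echo A(), " ", B(), " ", C()   # prints A B C
```

Note that the loop now covers every argument, because there is no macro name in the first slot to skip.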

The future

As better support for automatic immediate-mode macro invocation becomes available I would say this is about as good as it will get (pretty freaking good imo):

import macros

template abcProc(n: expr): stmt  =
  proc n() =  echo astToStr(n)

macro abcProcs(n: varargs[expr]): stmt =
  result = newStmtList()
  for i in 0.. <n.len:
    result.add(getAst(abcProc(n[i])))

abcProcs(A, B, C)

proc execute(order: seq[int], callbacks: seq[proc]) =
  for i in items(order):
    callbacks[i]()

execute(@[0,0,1,2,1,2], @[A, B, C])

Conclusion

I haven't solved a whole lot of problems with Nim yet, but I have to say it really has accomplished its goal of selling me on the merits of meta-programming. The macros and templates do not seem hard to reason about at all. One of the difficulties I run into with static languages is that I want to make my code as DRY as possible. I absolutely loathe doing anything more than once, and even with generics you run into problems where you are going to have to repeat some code. Templates and macros seem to close the gap and truly allow you to keep it DRY in the land of strict typing.