Closures in C

Code execution ultimately involves a lot of shared resources and a lot of mutation of state around those resources, and since C is a fairly thin wrapper around those gory details, it tends to leave the guts exposed. To keep programs comprehensible, however, some level of context preservation is generally helpful to shield us from the bits flying between addresses and registers.

A particularly common aid to preserving context is the closure: a function that, through lexical scoping, still has access at invocation time to the same bindings that were in scope where it was defined. C is lexically scoped, but standard C has no nested functions and a function cannot hold on to the local variables of an enclosing scope, so it does not have closures. What follows aims to provide a reasonable approximation.

Stripping away some of the ideological dressing, closures and lexical scoping revolve around preserving a binding environment that contains those references. Many newer languages draw on Scheme's lexical scoping, in which every variable in scope is available for use in a closure. Other languages such as PHP and C++ make the environment more explicit by requiring captures (PHP's use clause, C++'s lambda capture list) that name the variables to be loaded into the environment. It is this style that can also be adopted in C.

In an attempt to assemble the above thoughts into a comprehensible package: a closure represents a binding environment and some logic acting upon that environment. This fits in with a struct representation along the lines of:

struct closure {
  binding env;                           /* placeholder for the environment type */
  void (*invoke)(binding env, ...args);  /* ...args: the closure's parameters */
};
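To make that first design concrete, here is a minimal sketch assuming a single int argument and a binding that simply points at a running total; the contents of binding, add_to_sum and run_first_design are hypothetical names chosen for illustration. It shows the drawback directly: every call site has to reach into the struct and pass env along itself.

```c
/* Hypothetical environment for this sketch: a pointer to a running total. */
typedef struct {
  int *sum;
} binding;

/* First design: environment and logic stored side by side. */
struct closure {
  binding env;
  void (*invoke)(binding env, int inc);
};

static void add_to_sum(binding env, int inc) {
  *env.sum += inc;
}

/* The caller still has to know that c.env exists and pass it in. */
int run_first_design(void) {
  int sum = 0;
  struct closure c = { { &sum }, add_to_sum };
  c.invoke(c.env, 1);
  c.invoke(c.env, 2);
  return sum;
}
```

Calling run_first_design() returns 3, but nothing has been gained over calling add_to_sum with an explicit environment.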

…but that doesn’t really seem to provide any benefit over just passing an environment in, and the struct ultimately seems fairly pointless. The goal should be that client code can invoke the closure without needing to know anything else, and that we can put anything in there. What is in binding? Should we even have to care what is in binding?

Let’s adjust so that the calling code only needs to care that this thing is a closure with the desired signature and nothing else.

typedef struct closure {
  /* ...args is a placeholder for the closure's own parameters */
  void (*invoke)(struct closure *self, ...args);
  binding env;
} *Closure;

The general pattern of passing the owning context in as the first parameter is common in object-oriented languages, although it is sometimes hidden from users of the language (compare Python's explicit self with C++'s implicit this). That seems slightly better, and we could then also wrap it up a bit:

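/* Pseudocode: standard C cannot forward a variadic argument list, so in
   practice each closure signature gets its own invoke wrapper. */
void invoke(Closure c, ...args) {
  c->invoke(c, args);
}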
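A minimal sketch of this second design, assuming a single int argument in place of ...args and a hypothetical binding that points at a running total (add_to_sum and run_second_design are likewise illustrative names). Since standard C cannot forward a variadic argument list from one function to another, the sketch writes one invoke for this one signature.

```c
/* Hypothetical concrete binding for this sketch. */
typedef struct {
  int *sum;
} binding;

/* Second design: the closure receives itself, so it can reach its own env. */
typedef struct closure {
  void (*invoke)(struct closure *self, int inc);
  binding env;
} *Closure;

/* One wrapper per signature. */
void invoke(Closure c, int inc) {
  c->invoke(c, inc);
}

static void add_to_sum(Closure self, int inc) {
  *self->env.sum += inc;
}

int run_second_design(void) {
  int sum = 0;
  struct closure c = { add_to_sum, { &sum } };
  invoke(&c, 1); /* the caller never touches c.env */
  invoke(&c, 2);
  return sum;
}
```

The call sites now only need the closure and the argument; the environment is the closure's own business.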

That seems promising. Now any code can invoke a closure with the right signature by calling invoke(c, ...args), which is fairly comparable to other languages. What the closure is remains somewhat opinionated, but with these calls only the closure itself needs to know what it is, and any closure can identify itself. Each closure could therefore work out what its own binding environment looks like, leaving the resolution of that type to the closure. More simply still, the resolution of what the closure itself is could be left to the closure, with the details hidden behind a void pointer. What the binding environment looks like, or whether it exists at all, is then up to the closure, so the updated struct definition looks like:

typedef struct {
  void (*invoke)(void *self, ...args);
} *Closure;

Since the closure can identify itself and the top-level invoke always passes the closure instance (which can be encapsulated with opaque pointers), each implementation can simply cast the parameter back to its concrete type. Where invoke is the first member of the struct, as above, a simple recast suffices. If the function pointer is not the initial member, the cast must be supplemented by adjusting the pointer by the offset of the function pointer within the struct.
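The offset adjustment can be sketched with offsetof from <stddef.h>, in the same spirit as the Linux kernel's container_of macro. This assumes a closure signature with one int argument; scaled_adder, scaled_add and run_offset_example are hypothetical names. The closure handed out is a pointer to the invoke member itself, so the implementation steps back by that member's offset to recover the enclosing struct.

```c
#include <stddef.h>

/* Closure type for this sketch, specialized to one int argument. */
typedef struct {
  void (*invoke)(void *self, int inc);
} *Closure;

void invoke(Closure c, int inc) {
  c->invoke(c, inc);
}

/* Hypothetical closure whose function pointer is NOT the first member. */
typedef struct scaled_adder {
  int *sum;
  int scale;
  void (*invoke)(void *self, int inc);
} scaled_adder;

static void scaled_add(void *self, int inc) {
  /* self points at the invoke member; subtract its offset to reach
     the start of the enclosing scaled_adder. */
  scaled_adder *a =
      (scaled_adder *) ((char *) self - offsetof(scaled_adder, invoke));
  *a->sum += a->scale * inc;
}

int run_offset_example(void) {
  int sum = 0;
  scaled_adder a = { &sum, 10, scaled_add };
  /* Hand out a pointer to the invoke member, not to the struct. */
  Closure c = (Closure) &a.invoke;
  invoke(c, 1);
  invoke(c, 2);
  return sum;
}
```

run_offset_example() adds 10 * 1 and then 10 * 2 to sum, returning 30.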

For example:

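#include <assert.h>

/* Closure and invoke from above, specialized to a single int argument. */
typedef struct {
  void (*invoke)(void *self, int inc);
} *Closure;

void invoke(Closure c, int inc) {
  c->invoke(c, inc);
}

/* invoke is the first member, so a plain recast recovers the adder. */
typedef struct adder {
  void (*invoke)(void *self, int inc);
  int *sum;
} adder;

void add_it(void *self, int inc) {
  adder *a = self;
  *a->sum += inc;
}

/* Compound literal: it only lives as long as the enclosing block. */
#define adder(sum) ((Closure) &(adder) { add_it, (sum) })

int main(void) {
  int sum = 0;
  Closure count = adder(&sum);
  invoke(count, 1);
  invoke(count, 2);
  assert(sum == 3);
  return 0;
}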
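The payoff of this layout is that the calling code only sees the signature. As a sketch (max_tracker, track_max, feed and run_mixed are hypothetical names), a second closure with a completely different environment can sit in the same array as an adder, and feed drives both without knowing either concrete type.

```c
#include <stddef.h>

/* Closure type for a single int argument, as in the example. */
typedef struct {
  void (*invoke)(void *self, int value);
} *Closure;

void invoke(Closure c, int value) {
  c->invoke(c, value);
}

/* The adder from the example: adds each value to an external total. */
typedef struct adder {
  void (*invoke)(void *self, int value);
  int *sum;
} adder;

static void add_it(void *self, int value) {
  adder *a = self;
  *a->sum += value;
}

/* Hypothetical second closure: same signature, different environment. */
typedef struct max_tracker {
  void (*invoke)(void *self, int value);
  int *max;
} max_tracker;

static void track_max(void *self, int value) {
  max_tracker *m = self;
  if (value > *m->max) {
    *m->max = value;
  }
}

/* Invokes every closure in the array without knowing their concrete types. */
static void feed(Closure *cs, size_t n, int value) {
  for (size_t i = 0; i < n; i++) {
    invoke(cs[i], value);
  }
}

void run_mixed(int *sum, int *max) {
  adder a = { add_it, sum };
  max_tracker m = { track_max, max };
  Closure cs[] = { (Closure) &a, (Closure) &m };
  feed(cs, 2, 5);
  feed(cs, 2, 3);
}
```

Starting from a sum of 0 and a max of 0, feeding 5 and then 3 leaves the sum at 8 and the max at 5.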

This approach is primitive but effective, and the example above should be representative both of how it works and of the hoops involved. There is certainly enough overhead that the approach is not as readily useful as the concept is in languages with direct support. At the outset this is likely to be “too messy and ugly” to offer immediate benefit (“Lambda” 2021); it is rather a design approach that needs enough utility to amortize its weight. Currently I make use of it as part of error callbacks in C.
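The error callbacks mentioned above are not shown, so what follows is only a hypothetical sketch of that kind of use: ErrorHandler, report, parse_digit, error_counter and count_bad_digits are invented names, and the callback signature takes a single message string. The handler needs its own closure type and its own report wrapper, since its signature differs from the adder's.

```c
#include <stddef.h>

/* Hypothetical closure type for error callbacks: one message argument. */
typedef struct {
  void (*invoke)(void *self, const char *msg);
} *ErrorHandler;

static void report(ErrorHandler h, const char *msg) {
  h->invoke(h, msg);
}

/* Hypothetical operation that reports failure through the handler. */
static int parse_digit(char ch, ErrorHandler on_error) {
  if (ch < '0' || ch > '9') {
    report(on_error, "not a digit");
    return -1;
  }
  return ch - '0';
}

/* One possible handler: counts errors and remembers the last message. */
typedef struct error_counter {
  void (*invoke)(void *self, const char *msg);
  int count;
  const char *last;
} error_counter;

static void count_error(void *self, const char *msg) {
  error_counter *e = self;
  e->count++;
  e->last = msg;
}

/* Returns how many characters of s were rejected by parse_digit. */
int count_bad_digits(const char *s) {
  error_counter e = { count_error, 0, NULL };
  for (; *s != '\0'; s++) {
    parse_digit(*s, (ErrorHandler) &e);
  }
  return e.count;
}
```

count_bad_digits("12a4b") reports two errors through the handler and returns 2; parse_digit never learns what the handler does with them.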

As it stands, this does not allow for significant reuse of logic across different signatures, since each signature needs its own closure type and invoke function. Introducing some generalized data structures may help in places, which may be explored as this gets used.

Another risk with this approach is tied to lifetime: if the binding environment contains pointers to data outside of that environment (as in the example above), those references could be invalidated by the time the closure is invoked. The same applies to the closure itself when it is created with a compound literal, as the adder macro does: a compound literal inside a function has automatic storage duration and disappears when the enclosing block exits. Where the closure is invoked immediately or discarded this is unlikely to matter, but if the closure represents some strategy that may be called at an undetermined later time, it can lead to undefined behavior. This should be guarded against, likely by avoiding such external references (in the example above, by transferring ownership of sum to the adder).
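A minimal sketch of that ownership transfer, assuming the same one-int signature: the hypothetical owning_adder stores the total itself rather than a pointer to the caller's variable, and the hypothetical new_owning_adder allocates it with malloc so the closure does not depend on any stack frame. The trade-off is that someone now has to free it.

```c
#include <stdlib.h>

typedef struct {
  void (*invoke)(void *self, int inc);
} *Closure;

void invoke(Closure c, int inc) {
  c->invoke(c, inc);
}

/* The adder owns its total instead of pointing at the caller's variable. */
typedef struct owning_adder {
  void (*invoke)(void *self, int inc);
  int sum;
} owning_adder;

static void add_owned(void *self, int inc) {
  owning_adder *a = self;
  a->sum += inc;
}

/* Heap allocation: the closure outlives the function that created it.
   Returns NULL if allocation fails; the caller must free() the result. */
owning_adder *new_owning_adder(int start) {
  owning_adder *a = malloc(sizeof *a);
  if (a != NULL) {
    a->invoke = add_owned;
    a->sum = start;
  }
  return a;
}
```

A caller could create one with new_owning_adder(0), pass (Closure) a to invoke with 1 and then 2, read a->sum as 3, and finally free(a).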

I’ll likely circle back to this page and add an example program demonstrating actual use.

“Lambda.” 2021. https://martinfowler.com/bliki/Lambda.html.