Code execution ultimately involves a lot of shared resources and mutation of state around those resources, and as C is a pretty thin wrapper around those gory details it tends to leave those guts exposed. To have comprehensible programs, however, some level of context preservation is generally helpful to shield us from bits flying between addresses and registers.
A particularly common aid to preserving context is the notion of closures, which are essentially functions that make use of lexical scoping such that the references that were defined at the time of the function definition are the same ones that are present at the time of function invocation. C is itself lexically scoped, but standard C has no nested functions and no way for a function to carry its defining environment along with it, so it does not have closures. This post aims to provide a reasonable approximation.
Stripping away some of the ideological dressing, the concepts of closures and lexical scoping revolve around preserving a binding environment that contains those references. Many newer languages draw on Scheme’s notion of lexical scoping, in which any variable in scope is available for use in a closure. Other languages such as PHP and C++ make the concept more explicit: they use capture clauses that indicate which variables should be loaded into the environment. It is this style that can also be adopted in C.
In an attempt to assemble the above thoughts into a comprehensible package: a closure represents a binding environment and some logic acting upon that environment. This fits in with a struct representation along the lines of:
struct closure {
  binding env;
  void (*invoke)(binding env, ...args);
};
…but that doesn’t really seem to provide any benefit over just passing an environment in, and the struct ultimately seems fairly pointless. The goal should be that the client code can just invoke the closure without needing to know anything else, and we should be able to drop anything in there: what is in binding? Should we even have to care what is in binding?
Let’s adjust so that the calling code only needs to care that this thing is a closure with the desired signature and nothing else.
typedef struct closure {
  void (*invoke)(struct closure *self, ...args);
  binding env;
} *Closure;
The general pattern of passing the owning context in as a first parameter is a common one, at least in object-oriented languages, although it is sometimes hidden from users of the language. That seems slightly better…and we could then also wrap this up a bit:
void invoke(Closure c, ...args) {
  c->invoke(c, args);
}
That seems promising…now to invoke a closure with the right signature, any given code should be able to just call invoke(c, ...args), which is pretty comparable to other languages. What the closure itself is is still a bit opinionated, but with these calls only the closure itself needs to actually know what it is…and any closure should be able to identify itself. Each closure could then also figure out what its binding environment looks like, so we could leave the resolution of that type to the closure itself. Even more simply, we could leave the resolution of what the closure itself is to the closure and hide the details behind a void pointer. What the binding environment looks like, or whether it even exists at all, can be left up to the closure, so the updated struct definition looks like:
typedef struct {
  void (*invoke)(void *self, ...args);
} *Closure;
Since the closure is able to self-identify and the top-level invoke always passes the closure instance (which can be encapsulated with opaque pointers), each instance can simply cast the parameter back to the desired type. In cases such as the above where invoke is the first member of the struct, a simple cast will suffice; if the function pointer is not the initial struct member, then the cast would need to be supplemented by adjusting the pointer based on the offset of the function pointer within the struct.
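The offset adjustment for the non-first-member case can be sketched roughly as follows. This is only an illustration under some assumptions: the struct tally, the member name base, and tally_demo are invented here, and the signature is fixed to a single int argument because C cannot forward ...args generically.

```c
#include <assert.h>
#include <stddef.h>  /* offsetof */

/* Concrete stand-in for the Closure type, fixed to one int argument. */
typedef struct closure {
    void (*invoke)(void *self, int inc);
} *Closure;

/* Hypothetical closure whose function pointer is NOT at offset 0. */
struct tally {
    int count;            /* binding environment comes first */
    struct closure base;  /* the part that callers see */
};

static void tally_add(void *self, int inc) {
    /* self points at the base member, so step back by its offset
       to recover the enclosing struct tally. */
    struct tally *t =
        (struct tally *)((char *)self - offsetof(struct tally, base));
    t->count += inc;
}

static void invoke(Closure c, int inc) {
    c->invoke(c, inc);
}

int tally_demo(void) {
    struct tally t = { 0, { tally_add } };
    invoke(&t.base, 1);
    invoke(&t.base, 2);
    assert(t.count == 3);
    return t.count;
}
```

The caller still only ever holds a Closure; the pointer arithmetic stays inside the function that knows the real layout.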
For example, with invoke as the first member:
typedef struct adder {
  void (*invoke)(void *self, int inc);
  int *sum;
} adder;
void add_it(void *self, int inc) {
  /* invoke is the first member, so a plain cast recovers the adder */
  adder *a = (adder *) self;
  *a->sum += inc;
}
#define adder(sum) ((Closure) &(adder) { add_it, (sum) })
...
int sum = 0;
Closure count = adder(&sum);
invoke(count, 1);
invoke(count, 2);
assert(sum == 3);
This approach is primitive but effective, and the example above should be representative of both how it works and some of the related hoops. There is certainly enough overhead that this approach is not as readily useful as the concept is in systems where there is direct support. At the outset this is likely to be “too messy and ugly” to offer immediate benefit (1). This is rather a design approach that needs enough utility to amortize its weight. Currently I make use of it as part of error callbacks in C.
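As a rough idea of what an error callback in this style might look like, here is a minimal sketch. It is not the code referred to above: ErrorClosure, error_log, parse_digit, and error_demo are all hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdio.h>  /* snprintf */

/* Closure shape for error reporting: self plus a message. */
typedef struct error_closure {
    void (*invoke)(void *self, const char *msg);
} *ErrorClosure;

/* Hypothetical environment: counts errors and keeps the last message. */
struct error_log {
    struct error_closure base;  /* first member, so a plain cast works */
    int count;
    char last[64];
};

static void log_error(void *self, const char *msg) {
    struct error_log *log = self;
    log->count++;
    snprintf(log->last, sizeof log->last, "%s", msg);
}

/* Hypothetical routine that reports failures through the closure
   without knowing anything about what is behind it. */
static int parse_digit(char ch, ErrorClosure on_error) {
    if (ch < '0' || ch > '9') {
        on_error->invoke(on_error, "not a digit");
        return -1;
    }
    return ch - '0';
}

int error_demo(void) {
    struct error_log log = { { log_error }, 0, "" };
    parse_digit('7', &log.base);  /* succeeds, callback not called */
    parse_digit('x', &log.base);  /* fails, callback called once */
    assert(log.count == 1);
    return log.count;
}
```

The routine doing the work only depends on the closure signature, so the caller is free to decide whether errors are counted, logged, or ignored.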
As it stands, this does not allow for significant reuse of logic across different signatures. Introducing some generalized data structures may help in some places, which may be explored as this is used.
Another risk with this approach is tied to lifetime: if the binding environment contains pointers to any data outside of that environment (as in the example above), those references could be invalidated by the time the closure is invoked. In cases where the closure is invoked immediately or discarded this is unlikely, but if the closure is used to represent some strategy that may be called at an undetermined later time, this could lead to reads or writes through dangling pointers. This should be guarded against, likely by avoiding such external references (in the example above this could be addressed by transferring ownership of sum to the adder).
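One way the ownership transfer could look is sketched below, assuming a heap-allocated closure that stores sum by value. The names owning_adder, owning_adder_new, owning_adder_sum, owning_adder_free, and owning_demo are invented for this sketch, and the signature is again fixed to a single int.

```c
#include <assert.h>
#include <stdlib.h>  /* malloc, free */

typedef struct closure {
    void (*invoke)(void *self, int inc);
} *Closure;

/* The adder now holds sum itself instead of pointing at a caller's int. */
struct owning_adder {
    struct closure base;  /* first member, so a plain cast works */
    int sum;
};

static void owning_add(void *self, int inc) {
    struct owning_adder *a = self;
    a->sum += inc;
}

/* Returns NULL if allocation fails. */
Closure owning_adder_new(void) {
    struct owning_adder *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->base.invoke = owning_add;
    a->sum = 0;
    return &a->base;
}

int owning_adder_sum(Closure c) {
    return ((struct owning_adder *)c)->sum;
}

void owning_adder_free(Closure c) {
    free(c);  /* base is at offset 0, so this is the malloc'd pointer */
}

int owning_demo(void) {
    Closure c = owning_adder_new();
    if (c == NULL) return -1;
    c->invoke(c, 1);
    c->invoke(c, 2);
    int total = owning_adder_sum(c);
    owning_adder_free(c);
    assert(total == 3);
    return total;
}
```

The trade-off is that the closure now needs an explicit free, and the caller has to ask the closure for the result instead of reading a local variable.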
I’ll likely circle back to this page and add an example program demonstrating actual use.