Weave-only Literate Programming

For a few years now I’ve been an advocate of Literate Programming (LP). I think I first had substantial exposure to the concept through the facilities provided by org-mode, and it appealed to me fairly immediately: I’ve frequently introduced new technologies and approaches onto teams and then had to field usage questions. As is often the case, my future self was one of the people seeking insight, especially when a solution goes long enough without needing attention that much of it is forgotten between uses (a particularly obvious example is IaC tools such as Terraform, which are very useful but likely to be touched by engineers only a few times a year). Providing explanations and links to references next to the code seemed like a very straightforward way to reduce the need to reconstruct context or track down someone who can help, and combined with other forms of documentation it seems like a promising means of capturing essential information.

I started down the path of applying it largely as originally proposed by Knuth(1), though gravitating towards documentation formats that are more oriented towards logical design and human readability. I have yet to land on tooling I’ve really invested in (for various reasons), and now that the appropriate honeymoon period has expired my preference is to adopt a weave-only form of LP.

The original design revolved around a WEB source file, a mixture of TeX and Pascal. That single source was run through tangle to produce machine-friendly source code and through weave to produce human-friendly documentation. Raw TeX (and family) is not particularly conducive to reading, and one of the underlying motivations was to provide macros that let Pascal be structured in ways which aided exposition to the reader but which the compiler did not support. Ultimately the system as a whole revolves around authoring a WEB source for the sake of the consumable derived outputs, rather than the source being inherently consumable itself.

Within the context of computer science the original system was created a long time ago and the landscape has evolved significantly.

De-Tangling

Languages are now far more flexible and are built on top of far more sophisticated optimizing compilers (and hardware). This obviates many of the underlying motivations for requiring indirection from the source code: many of the limitations of Pascal are unlikely to be extant in modern systems, and approaches such as invoking subroutines, which “impose[d] considerable overhead”(1, p. 51) at the time LP was conceived, are less likely to pose practical overhead and can be inlined by compilers. This is certainly nothing revelatory; it is simply a retrospective from a place over the horizon of what was expected from compiler optimizations at the time. Many of the mechanisms envisioned in or extended from “program manipulation systems” have since come to fruition.

The original design also produced output that was intended solely for a compiler, comparable to current practices of minification. This is likely a trivial and irrelevant detail by itself, but it seems worth identifying as it increases the distance between the source and the intermediate outputs it produces.

Un-Weaving

Another potentially stale force is that LP was crafted while Knuth was breaking new ground in applying computers to typesetting. While TeX is amazingly still in use, there are typically layers of abstraction on top of it. Formats such as Markdown allow for directly legible documentation, lend themselves to more semantic authoring, and can then be passed through assorted transformations to generate presentation formats.

What Now?

Removing the need for tangling or weaving lends itself to the notion that the authored file can be consumed directly to provide both the legible documentation and the code. So long as a decently flexible language is in use, the code can be structured to facilitate communication, and the comment facility the language provides can be used for documentation in a format such as Markdown. While this may constrain some potential LP practices, it is also dramatically simpler and provides immediately usable source files.
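As a rough illustration of what such a directly consumable file might look like, here is a small Python module whose top-level comments are written as Markdown. The function and its name are purely hypothetical and only there to give the comments something to explain.

```python
# # Word counting
#
# Count how often each word occurs in a piece of text. Splitting on
# whitespace is a deliberate simplification: punctuation stays attached
# to words, which is acceptable for rough frequency checks.

from collections import Counter


def count_words(text: str) -> Counter:
    # Lower-casing first means "The" and "the" are counted together.
    return Counter(text.lower().split())


# ## Usage
#
# The result behaves like a dictionary from word to count.

counts = count_words("the cat saw The dog")
```

The file runs as-is, and the comments read as a short Markdown document even in a plain text editor.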

The one significant gap in the above is that semantic formats such as Markdown are still typically best consumed through some presentation format to which they can be converted. Some programs can do this conversion automatically, but my preference is to remain tool agnostic and I personally don’t like to need more than a basic text editor. This suggests that some notion of weaving is still desired to extract the documentation and potentially process it further. This is analogous to the original weaving and also to assorted code documentation generators. The logic involved should be fairly straightforward and can amount to extracting the relevant code comments and wrapping the code itself in fenced blocks.
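A minimal sketch of that weave step in Python follows. It is not the weave program mentioned in this post, only an illustration under some assumptions: documentation lives in full-line comments starting at column 0, the comment marker is `#`, and everything else (including indented comments) counts as code. The names `weave`, `FENCE`, and `SAMPLE` are made up for the example.

```python
FENCE = "`" * 3  # Markdown code fence, built indirectly to keep this listing readable

SAMPLE = '''\
# # Greeting
#
# Build a greeting for `name`.

def greet(name):
    # Inline comments stay with the code.
    return f"Hello, {name}!"

# Usage is a single call.
print(greet("world"))
'''


def weave(source: str, lang: str = "python", marker: str = "#") -> str:
    """Turn comment lines into Markdown prose and everything else into fenced code."""
    out: list[str] = []
    code: list[str] = []

    def flush_code() -> None:
        had_lines = bool(code)
        # Trim blank lines at both ends so the fences hug the code.
        while code and not code[0].strip():
            code.pop(0)
        while code and not code[-1].strip():
            code.pop()
        if had_lines and out and out[-1] != "":
            out.append("")  # keep a blank line between prose and what follows
        if code:
            out.extend([FENCE + lang, *code, FENCE, ""])
        code.clear()

    for line in source.splitlines():
        if line.startswith(marker):
            flush_code()
            text = line[len(marker):]
            # Drop the single space that conventionally follows the marker.
            out.append(text[1:] if text.startswith(" ") else text)
        else:
            code.append(line)
    flush_code()
    return "\n".join(out).rstrip() + "\n"


if __name__ == "__main__":
    print(weave(SAMPLE))
```

Treating only column-0 comments as documentation is a design choice that keeps explanatory notes inside a function attached to the code they describe. A real tool would likely also need to handle details this sketch ignores, such as shebang lines and languages with block comments.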

Tangle-only

Prior to this approach I had started down the path of going in the other direction, where a Markdown file was authored and source files could be extracted from it. This was likely carried forward from the approach provided by org-mode. This model seemed to offer some potential benefits, such as being able to house multiple related files in the same source file. After some time, however, the opposite direction seems far more enticing:

The last point is particularly significant given that code can often start as something very small or cookie-cutter but then evolve, and it also allows the practice to be introduced non-invasively, such as on a project with additional, less literate contributors.

Conclusion

I’ll be adopting this moving forward and integrating it into this site and elsewhere. I have a small, simple weave program which I’ll be evolving over time, and I can likely document some of the more interesting integrations if it starts to be distributed.

1. KNUTH, D. E. Literate programming [online]. Cambridge University Press, 1992. Center for the Study of Language and Information publication lecture notes. ISBN 9780937073803. Available from: https://books.google.com/books?id=fqPIPgAACAAJ