Tangle - Matt Whipple



I’m an advocate of Literate Programming (LP)(1), so one of the first tools I’ll be looking for is something to tangle source code out of LP documents. In keeping with my software reboot, I’ll start with something somewhat homegrown until I have the bandwidth to dig deeply into existing alternatives.

I’ll use Markdown(2) as the source format and rely on additional attributes attached to fenced code blocks to indicate tangle targets.

The simplest initial solution would just be to look for fenced blocks and output the blocks associated with a specific file, so in the interest of bootstrapping I’ll start with that. This could be executed with a command similar to < lpfile tangle filename > tangledsource.
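
The bootstrapping idea can be sketched as a plain line scanner with no Markdown parser at all. This is only an illustration under assumptions not stated above: a fence is any line starting with three backticks, and the target is marked by a file="NAME" attribute on the opening fence. The names tangleLines and fence are hypothetical.

```haskell
import Data.List (isInfixOf, isPrefixOf)
import System.Environment (getArgs)

-- Assumption: a fence is any line starting with three backticks.
fence :: String
fence = replicate 3 '`'

-- Collect the bodies of fenced blocks whose opening fence
-- mentions file="<target>" (a naive substring match).
tangleLines :: String -> [String] -> [String]
tangleLines target = outside
  where
    marker  = "file=\"" ++ target ++ "\""
    isFence = (fence `isPrefixOf`)

    -- Outside any block: skip prose until an opening fence appears.
    outside [] = []
    outside (l:ls)
        | isFence l = inside (marker `isInfixOf` l) ls
        | otherwise = outside ls

    -- Inside a block: keep lines only if the block targets our file.
    inside _ [] = []
    inside keep (l:ls)
        | isFence l = outside ls
        | keep      = l : inside keep ls
        | otherwise = inside keep ls

main :: IO ()
main = do
    args <- getArgs
    case args of
        (target:_) -> interact (unlines . tangleLines target . lines)
        []         -> pure ()  -- no target given: output nothing
```

Compiled as tangle, this would fit the invocation < lpfile tangle filename > tangledsource. The substring match is deliberately crude (it would also accept an attribute such as myfile="NAME"), which is part of the motivation for working on a parsed document instead.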

The tangle implementation is in Haskell, which coincidentally supports literate programming(3); together with Pandoc’s support for that feature, this allows tangle to bootstrap itself. The relevant code blocks in this document carry both the .haskell and .literate classes, which allows pandoc to produce a <format>+lhs output file that can then be compiled directly with ghc.

This approach is quite limited and is likely to be replaced by something that allows more expressive structuring of programs, more in line with the style espoused by Don Knuth(1). If it addresses short-term needs, it may be kept long enough for me to adopt an existing tool or a more accessible platform.

Technology Choice

The initial basic version of this utility was written in C, and I subsequently started to port it to Bash. While Bash would be adequate for the current use and would provide a portable and readily modifiable solution, the thought of porting also led me to consider future functionality, so that the work would be more clearly forward rather than potentially lateral motion.

A quick assessment of existing LP tools did not reveal a clear choice with the desired behavior, but one option I had envisioned is to leverage pandoc(4) to provide it, so pursuing that is the next step for this project.


The immediate advantage to making use of Pandoc is that it enables operating on the abstract syntax tree rather than duplicating parsing logic.

Pandoc provides numerous extension points; that (combined with the prospect of using it as a library) provides confidence that logic added using the most immediately convenient option can be evolved as needed, even if that later requires adopting a different means of integration. For parity with the initial C implementation, a filter combined with the plain output writer seems sufficient.

The first pass borrows heavily from the examples documented in the pandoc filters documentation, particularly that for Include files(5 #include-files).

I have a fair amount of familiarity with the concepts of Haskell but have not spent a significant amount of time working with it, so any code is unlikely to start off particularly idiomatic.

The filter expects the name of the desired file as its first positional argument. The piped invocation style is used so that the argument can be passed, for example:

pandoc tangle.md -t json | ./tangle 'tangle.hs' | pandoc -f json -t plain

Setup and Main

The main entrypoint will use toJSONFilter to compose an appropriate filter out of the provided logic; that function wraps the filter so that it reads and returns JSON(6). OverloadedStrings will also be enabled to ease dealing with assorted string representations, and pack will be imported to support some manual conversion from String to Text.

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (pack)
import Text.Pandoc.JSON

main :: IO ()
main = toJSONFilter tangle


tangle currently handles Blocks, as reflected in its type, and uses the filter form that takes an initial [String] argument, which provides access to the command-line parameters(6).

tangle :: [String] -> Block -> Block

Code Blocks

Any code block that includes a file attribute with a value matching the provided argument should be included; those that do not have such a matching attribute should be replaced by a Null.

tangle args (CodeBlock att contents)
    | matchesFile args att = (CodeBlock att contents)
    | otherwise = Null

Other Block Types

Anything other than a code block will simply be Nulled.

tangle _ _ = Null
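
A caveat for anyone building this against a recent pandoc-types: as far as I can tell, releases from 1.23 onward no longer provide the Null constructor, so the clauses above would not compile there. A minimal sketch of one workaround, assuming the list-returning filter form that toJSONFilter also accepts (a -> [a]), drops a block by returning an empty list. The name tangleList is hypothetical, and matchesFile is repeated in a compact form only to keep the block self-contained.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (pack)
import Text.Pandoc.JSON

-- Returning a list lets a block be dropped with [] instead of Null.
tangleList :: [String] -> Block -> [Block]
tangleList args block@(CodeBlock att _)
    | matchesFile args att = [block]   -- keep matching code blocks
tangleList _ _ = []                    -- drop everything else

-- Compact equivalent of the matchesFile described in this document.
matchesFile :: [String] -> Attr -> Bool
matchesFile (target:_) (_, _, namevals) =
    lookup "file" namevals == Just (pack target)
matchesFile [] _ = False

main :: IO ()
main = toJSONFilter tangleList
```

When the guard in the first clause fails, matching falls through to the catch-all clause, so non-matching code blocks are dropped the same way as any other block.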


matchesFile will be in charge of comparing the arguments provided with the attributes of a given Block and returning a boolean indicating whether the desired attribute value exists.

matchesFile :: [String] -> Attr -> Bool


For now we’ll assume that there is a single argument corresponding to the name of the target output file, and this function will check whether the attribute named file has a matching value. I’m fairly sure there’s a way to move the string comparison into the pattern match, but my quick attempt didn’t work and the current code is still fairly readable (though I may swap it around later).

Invalid input will result in nothing being done. To handle one evident edge case this will define an arm that does nothing in the case of no arguments (any non-initial arguments will simply be ignored).

matchesFile []         _                = False
matchesFile (target:_) (_, _, namevals) =
    case lookup "file" namevals of
        Just file  -> file == pack target
        Nothing    -> False
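
One way to shorten the comparison, offered as a sketch rather than what the filter uses, is to compare the Maybe returned by lookup directly instead of unpacking it with case. To keep the example runnable without pandoc-types, Attr is redefined locally with the same shape as pandoc's type; that local definition and the name matchesFileEq are assumptions made for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text, pack)

-- Local stand-in with the same shape as pandoc-types' Attr:
-- (identifier, classes, key-value pairs).
type Attr = (Text, [Text], [(Text, Text)])

-- Compare the lookup result as a whole: Nothing never equals Just _.
matchesFileEq :: [String] -> Attr -> Bool
matchesFileEq (target:_) (_, _, namevals) =
    lookup "file" namevals == Just (pack target)
matchesFileEq [] _ = False

main :: IO ()
main = do
    let attr = ("", ["haskell"], [("file", "tangle.hs")]) :: Attr
    print (matchesFileEq ["tangle.hs"] attr)  -- True
    print (matchesFileEq ["other.hs"] attr)   -- False
    print (matchesFileEq [] attr)             -- False
```

The behavior matches the case version: a missing file attribute and a non-matching one both yield False.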

(1) KNUTH, D. E. Literate programming [online]. Cambridge University Press, 1992. Center for the study of language and information publication lecture notes. ISBN 9780937073803. Available from: https://books.google.com/books?id=fqPIPgAACAAJ
(2) Markdown - wikipedia [online]. 17 April 2021. Available from: https://en.wikipedia.org/wiki/Markdown
(3) Literate programming - HaskellWiki [online]. 5 October 2021. Available from: https://wiki.haskell.org/Literate_programming
(4) Pandoc - about pandoc [online]. 20 September 2021. Available from: https://pandoc.org
(5) Pandoc - pandoc filters [online]. 2 October 2021. Available from: https://pandoc.org/filters.html