Tangle - Matt Whipple


I’m an advocate of Literate Programming (LP)(1), so one of the first tools I’ll be looking for is something to tangle source code out of LP documents. In the nature of my software reboot I’ll be starting with something somewhat homegrown until I have the bandwidth to dig deeply into existing alternatives.

I’ll be using Markdown(2) as the source input and will use additional attributes on fenced code blocks to indicate tangle targets.

The simplest initial solution would be to look for fenced blocks and output those associated with a specific file, so in the interest of bootstrapping I’ll start with that. This could be executed with a command similar to < lpfile tangle filename > tangledsource.
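
As a rough illustration of that simplest approach, the sketch below scans Markdown on stdin and prints the bodies of fenced blocks whose opening fence names the requested file. The function name tangle_sketch and the file="..." spelling on the fence line are assumptions made for the example; this is not the C or Haskell implementation discussed in this document, and it ignores details such as nested or tilde fences.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the bodies of fenced code blocks whose opening
# fence carries file="<target>". Reads Markdown on stdin.
tangle_sketch() {
    local target="$1" fence inside=0 wanted=0 line
    fence=$(printf '\140\140\140')   # three backticks, written as octal escapes
    while IFS= read -r line || [ -n "${line}" ]; do
        case "${line}" in
            "${fence}"*)
                if [ "${inside}" -eq 1 ]; then
                    # Closing fence: leave the block.
                    inside=0
                    wanted=0
                else
                    # Opening fence: note whether it names the target file.
                    inside=1
                    case "${line}" in
                        *"file=\"${target}\""*) wanted=1 ;;
                    esac
                fi
                ;;
            *)
                # Only lines inside a matching block are emitted.
                if [ "${wanted}" -eq 1 ]; then
                    printf '%s\n' "${line}"
                fi
                ;;
        esac
    done
}
```

It would be used along the lines of tangle_sketch tangle.hs < tangle.md > tangle.hs, mirroring the command shape given above.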

The tangle implementation is in Haskell, which coincidentally provides support for literate programming(3); that, along with Pandoc’s support for the feature, allows tangle itself to be bootstrapped. The relevant code blocks in this document have both .haskell and .literate classes, which allow pandoc to produce a <format>+lhs output file that can then be compiled directly with ghc.

This approach is particularly limited and is likely to be replaced to allow for more expressive structuring of programs, more in line with the styles espoused by Don Knuth(1). If it addresses short-term needs it may be kept long enough for me to adopt an existing tool or a more accessible platform.

Technology Choice

The initial basic version of this utility was written in C, and I subsequently started to port it to Bash. While Bash would be adequate for the current use and would provide a portable and readily modifiable solution, the thought of porting also led me to consider future functionality so that the work was more clearly forward rather than potentially lateral motion.

Quickly assessing existing LP tools did not produce a clear choice with the desired behavior, but an envisioned option has been to leverage pandoc(4) to provide this behavior, and so that pursuit will start as the next step for this project.


The immediate advantage to making use of Pandoc is that it enables operating on the abstract syntax tree rather than duplicating parsing logic.

Pandoc provides numerous extension points; that (combined with the prospect of using it as a library) provides confidence that logic added using the most immediately convenient option can be evolved as needed, even if that eventually means adopting a different means of integration. For parity with the initial C implementation, a filter combined with the plain output writer seems sufficient.

The first pass borrows heavily from the examples documented in the pandoc filters documentation, particularly that for Include files(5 #include-files).

I have a fair amount of familiarity with the concepts of Haskell but have not spent a significant amount of time working with it, so any code is unlikely to start off particularly idiomatic.

The filter will be expected to be invoked with the name of the desired file passed as the first positional argument. This will use the piped invocation style so that the argument can be passed, for example:

pandoc tangle.md -t json | ./tangle 'tangle.hs' | pandoc -f json -t plain

The above invocation currently has issues where indentation will be included to match the source structure. A resolution for this is pending.

Setup and Main

The main entrypoint will make use of toJSONFilter to compose an appropriate filter out of the provided logic, where that function serves to augment the filter such that it operates on and returns JSON(6). OverloadedStrings will also be enabled to ease dealing with assorted string representations, and pack will be imported to support some related manual coercion between Strings and Text.

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (pack)
import Text.Pandoc.JSON

main :: IO ()
main = toJSONFilter tangle


tangle currently provides handling of Blocks as reflected in the type and will make use of the form using an initial [String] that allows for accessing of the parameters(6).

tangle :: [String] -> Block -> Block

Code Blocks

Any code block that includes a file attribute with a value matching the provided argument should be kept; those without a matching attribute should be replaced by a Null.

tangle args (CodeBlock att contents)
    | matchesFile args att = (CodeBlock att contents)
    | otherwise = Null

Other Block Types

Anything other than a code block will simply be Nulled.

tangle _ _ = Null


matchesFile will be in charge of comparing the arguments provided with the attributes of a given Block and returning a boolean indicating whether the desired attribute value exists.

matchesFile :: [String] -> Attr -> Bool


For now we’ll assume that there is a single argument corresponding to the name of the target output file, and this function will see if the attribute named file has a matching value. I’m fairly sure the string comparison could be moved into the pattern match, but my quick attempt didn’t work and the current code is still fairly readable (though I may swap it around later).

Invalid input will result in nothing being done. To handle one evident edge case this will define an arm that does nothing in the case of no arguments (any non-initial arguments will simply be ignored).

matchesFile []          _                = False
matchesFile (target:xs) (_, _, namevals) =
    case lookup "file" namevals of
        Just file  -> file == pack target
        Nothing    -> False


A script will be used to wrap up the noisy pandoc invocation. This will write the output to stdout so it will typically require redirection.

The script will start with some basic bash boilerplate to provide some stricter behavior.

#!/usr/bin/env bash
set -euo pipefail


The script itself will expect two positional arguments where the first indicates the source file and the second indicates the name of the file for which blocks will be expanded.

For now validation will just involve checking the number of arguments provided.

readonly me="${0}"
tangle::usage() {
    echo "${me} <source_file> <file_to_tangle>"
}
# Bail out with the usage message unless exactly two arguments were given.
if (( $# != 2 )); then
    tangle::usage >&2
    exit 1
fi

Filtering Fences

At the moment there is also a loose end: plain output results in the indentation of the code blocks and so the above likely needs to evolve into a custom writer. In the short term using markdown output and filtering out the fence blocks can get the job done.

This could be easily handled through grep, but as grep is not currently in my official toolbox I’ll use a bash function for that purpose.

This should be used as a pipe to filter out triple backtick fences. IFS is set to null to prevent any whitespace normalization.

tangle::filter() {
    # IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
    while IFS= read -r line; do
        [[ ${line} =~ ^\`\`\` ]] || printf '%s\n' "${line}"
    done
}
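
The effect of clearing IFS can be seen in isolation with the small sketch below, which reads one line in each style and prints it between brackets. The helper names read_default and read_preserved are hypothetical, and the -r flag is an addition assumed here so that backslashes in source code also survive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting two ways of reading a single line.
read_default()   { local line; read line;         printf '[%s]\n' "${line}"; }
read_preserved() { local line; IFS= read -r line; printf '[%s]\n' "${line}"; }

sample='    where x = "a\nb"  '
# Default read trims surrounding whitespace and consumes the backslash.
printf '%s\n' "${sample}" | read_default     # [where x = "anb"]
# IFS= with -r hands the line back byte for byte.
printf '%s\n' "${sample}" | read_preserved   # [    where x = "a\nb"  ]
```

For tangled source code the second form is the one that matters, since indentation is significant in languages such as Haskell and Make.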

Calling pandoc

The invocation of pandoc can be handled through expanding arguments in the appropriate locations. This will assume that pandoc is available on the path and the filter is in the current directory.

pandoc "${1}" --preserve-tabs --to=json | ./tangle "${2}" | pandoc --preserve-tabs --from=json --to=markdown | tangle::filter
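
The two assumptions stated above (pandoc on the path, the filter in the current directory) could be checked before running the pipeline. The following is only a sketch of such a pre-flight check; the helper names require_cmd and require_exec are hypothetical and not part of the script in this document.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the original wrapper script.

# Succeeds only if the named command can be found on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "required command not found on PATH: $1" >&2
        return 1
    }
}

# Succeeds only if the given path is a regular file with the execute bit set.
require_exec() {
    { [ -f "$1" ] && [ -x "$1" ]; } || {
        echo "missing or not executable: $1" >&2
        return 1
    }
}

# Possible use ahead of the pipeline:
#   require_cmd pandoc && require_exec ./tangle || exit 1
```

Failing early this way gives a clearer message than a broken pipe partway through the pandoc invocation.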


Bootstrapping will largely just be ignored for now and it will be assumed that the tangleFile script exists or can be constructed manually. This can be dealt with far more easily when the writer is introduced such that the overall invocation is simpler.

This makes use of some fairly interesting make stuff that should be documented. The setting of the execution bit could also be captured as a rule but there is currently only one relevant file.

GHC     := ghc

OUTPUTS := Makefile.tangle tangleFile tangle.hs
TANGLES := $(addsuffix .tangled,${OUTPUTS})

all: ${OUTPUTS} tangle
.PHONY: all

%.tangled: tangle.md
    @./tangleFile ${<} ${*} > ${@}

tangle-%: %.tangled
    @mv ${<} ${*}

tangle: tangle.hs ; ${GHC} ${<}

.SECONDEXPANSION: # required for $${@} in the prerequisite lists below
${OUTPUTS}: tangle-$${@}

tangleFile: tangle-$${@}
    @chmod +x ${@}
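
Until the make rules are documented properly, their net effect can be approximated in shell: each output is tangled into a .tangled file and then moved into place. The sketch below is an approximation under that reading of the rules, with a hypothetical function name tangle_outputs; it does not reproduce make's dependency tracking.

```shell
#!/usr/bin/env bash
# Hypothetical shell approximation of the %.tangled and tangle-% rules.
# Arguments: <tangler> <source_file> <output>...
tangle_outputs() {
    local tangler="$1" source_file="$2" out
    shift 2
    for out in "$@"; do
        # %.tangled rule: tangle the blocks for this output into a side file.
        "${tangler}" "${source_file}" "${out}" > "${out}.tangled" || return 1
        # tangle-% rule: move the side file onto the real output name.
        mv "${out}.tangled" "${out}" || return 1
    done
}

# Possible use, followed by the remaining two steps from the Makefile:
#   tangle_outputs ./tangleFile tangle.md Makefile.tangle tangleFile tangle.hs
#   chmod +x tangleFile && ghc tangle.hs
```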

(1) KNUTH, D. E. Literate programming [online]. Cambridge University Press, 1992. Center for the study of language and information publication lecture notes. ISBN 9780937073803. Available from: https://books.google.com/books?id=fqPIPgAACAAJ
(2) Markdown - wikipedia [online]. 17 April 2021. Available from: https://en.wikipedia.org/wiki/Markdown
(3) Literate programming - HaskellWiki [online]. 5 October 2021. Available from: https://wiki.haskell.org/Literate_programming
(4) Pandoc - about pandoc [online]. 7 May 2022. Available from: https://pandoc.org
(5) Pandoc - pandoc filters [online]. 2 October 2021. Available from: https://pandoc.org/filters.html