I’m an advocate of Literate Programming (LP)(1), so one of the first tools I’ll be looking for is something to tangle source code out of LP documents. In the nature of my software reboot I’ll be starting with something somewhat homegrown until I have the bandwidth to dig deeply into existing alternatives.
I’ll be using Markdown(2) as the source input and will use additional attributes on fenced code blocks to indicate tangle targets.
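For illustration (the `.haskell` and `.literate` classes and the `file` attribute correspond to what the tooling below expects; the contents here are a placeholder), a block destined for `tangle.hs` might be marked up as:

````
```{.haskell .literate file=tangle.hs}
-- code destined for tangle.hs goes here
```
````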
The simplest initial solution would just be to look for fenced blocks and output the blocks associated with a specific file, so in the interest of bootstrapping I’ll start with that. This could be executed with a command similar to `< lpfile tangle filename > tangledsource`.
The tangle implementation is in Haskell, which coincidentally provides support for literate programming(3); that, along with Pandoc’s support for the same feature, allows bootstrapping of `tangle` itself. The relevant code blocks in this document have both `.haskell` and `.literate` classes, which allow pandoc to produce a `<format>+lhs` output file that can then be compiled directly with `ghc`.
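As a sketch of that bootstrap (assuming this document is saved as `tangle.md`; the exact file names are illustrative), the sequence looks roughly like:

```bash
# Render the document as literate Haskell; blocks marked .haskell and
# .literate come out as bird-track (>) code.
pandoc tangle.md --to=markdown+lhs --output=tangle.lhs

# GHC compiles .lhs files directly.
ghc tangle.lhs -o tangle
```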
This simple approach is particularly limited and is likely to be replaced to allow for more expressive structuring of the programs, more in line with the styles espoused by Don Knuth(1). If it addresses short term needs it may be kept long enough for me to adopt an existing tool or a more accessible platform.
The initial basic version of this utility was written in C, and I subsequently started to port it to Bash. While Bash would be adequate for the current use and would provide a portable and readily modifiable solution, the thought of porting also led me to consider future functionality so that the work would be more clearly forward rather than potentially lateral motion.
Quickly assessing existing LP tools did not produce a clear choice with the desired behavior, but one envisioned option has been to leverage pandoc(4) to provide this behavior, so that pursuit will be the next step for this project.
The immediate advantage to making use of Pandoc is that it enables operating on the abstract syntax tree rather than duplicating parsing logic.
Pandoc provides numerous extension points; that (combined with the prospect of using it as a library) provides confidence that logic added using the most immediately convenient option can be evolved as needed, even if that eventually requires adopting a different means of integration. For parity with the initial C implementation, the creation of a filter combined with the `plain` output writer seems sufficient.
The first pass borrows heavily from the examples documented in the pandoc filters documentation, particularly that for Include files(5).
I have a fair amount of familiarity with the concepts of Haskell but have not spent a significant amount of time working with it, so any code is unlikely to start off particularly idiomatic.
The filter will be expected to be invoked with the name of the desired file passed as the first positional argument. This will use the piped invocation style so that the argument can be passed, for example:
pandoc tangle.md -t json | ./tangle 'tangle.hs' | pandoc -f json -t plain
The above invocation currently has an issue where indentation is included to match the source structure. A resolution for this is pending.
The main entrypoint will make use of `toJSONFilter` to compose an appropriate filter out of the provided logic, where that function serves to augment the filter such that it operates on and returns JSON(6). `OverloadedStrings` will also be enabled to ease dealing with assorted string representations, and `pack` will be imported to support some related manual coercion between `String`s and `Text`.
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (pack)
import Text.Pandoc.JSON
main :: IO ()
main = toJSONFilter tangle
`tangle` currently provides handling of `Block`s as reflected in the type, and will make use of the form with an initial `[String]` that allows access to the filter’s parameters(6).
tangle :: [String] -> Block -> Block
Any code block that includes a `file` attribute with a value matching the provided argument should be included; those that do not have such a matching attribute should be replaced by a `Null`.
tangle args (CodeBlock att contents)
  | matchesFile args att = (CodeBlock att contents)
  | otherwise = Null
Anything other than a code block will simply be Nulled.
tangle _ _ = Null
`matchesFile` will be in charge of comparing the arguments provided with the attributes of a given `Block` and returning a boolean indicating whether the desired attribute value exists.
matchesFile :: [String] -> Attr -> Bool
For now we’ll assume that there is a single argument corresponding to the name of the target output file, and this function will see if the attribute named `file` has a matching value. I’m fairly sure there’s an option to move the string comparison into the pattern match, but it didn’t work quickly and the current code is still fairly readable (though I may swap it around later).
Invalid input will result in nothing being done. To handle one evident edge case this will define an arm that does nothing in the case of no arguments (any non-initial arguments will simply be ignored).
matchesFile [] _ = False
matchesFile (target:xs) (_, _, namevals) =
  case lookup "file" namevals of
    Just file -> file == pack target
    Nothing -> False
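As an illustrative check (a hypothetical GHCi session with the module above loaded, `OverloadedStrings` enabled, and a pandoc-types version where `Attr` is built from `Text`), the lookup behaves like:

```
ghci> :set -XOverloadedStrings
ghci> matchesFile ["tangle.hs"] ("", ["haskell", "literate"], [("file", "tangle.hs")])
True
ghci> matchesFile ["other.hs"] ("", ["haskell", "literate"], [("file", "tangle.hs")])
False
```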
A script will be used to wrap up the noisy pandoc invocation. This will write the output to stdout so it will typically require redirection.
The script will start with some basic bash boilerplate to provide some stricter behavior.
#!/usr/bin/env bash
set -euo pipefail
The script itself will expect two positional arguments where the first indicates the source file and the second indicates the name of the file for which blocks will be expanded.
For now validation will just involve checking the number of arguments provided.
readonly me="${0}"
tangle::usage() {
  echo "${me} <source_file> <file_to_tangle>"
}

if (( $# != 2 )); then
  tangle::usage
  exit 1
fi
At the moment there is also a loose end: `plain` output results in indentation of the code blocks, and so the above likely needs to evolve into a custom writer. In the short term, using markdown output and filtering out the fence lines can get the job done.
This could be easily handled through `grep`, but as `grep` is not currently in my official toolbox I’ll use a bash function for that purpose.
This should be used as a pipe to filter out triple backtick fences. `IFS` is set to null for the `read` to prevent any whitespace normalization (and `-r` keeps backslashes in the code intact).
tangle::filter() {
  # Read lines verbatim: an empty IFS avoids whitespace trimming and -r
  # preserves backslashes.
  while IFS= read -r line; do
    # Drop fence lines (leading triple backticks); emit everything else.
    [[ ${line} =~ ^\`\`\` ]] || echo "${line}"
  done
}
The invocation of pandoc can be handled through expanding arguments in the appropriate locations. This will assume that pandoc is available on the path and the filter is in the current directory.
pandoc "${1}" --preserve-tabs --to=json | ./tangle "${2}" | pandoc --preserve-tabs --from=json --to=markdown | tangle::filter
Bootstrapping will largely just be ignored for now and it will be assumed that the `tangleFile` script exists or can be constructed manually. This can be dealt with far more easily when the writer is introduced such that the overall invocation is simpler.
This makes use of some fairly interesting make features (pattern rules and secondary expansion) that should be documented. The setting of the execution bit could also be captured as a rule but there is currently only one relevant file.
GHC := ghc
OUTPUTS := Makefile.tangle tangleFile tangle.hs
TANGLES := $(addsuffix .tangled,${OUTPUTS})

all: ${OUTPUTS} tangle
.PHONY: all

# Recipe lines below are indented with a literal tab.
# Tangle any named output from this document into a staging file.
%.tangled: tangle.md
	@./tangleFile ${<} ${*} > ${@}

# Move a staged tangle into place.
tangle-%: %.tangled
	@mv ${<} ${*}

# Compile the filter itself.
tangle: tangle.hs ; ${GHC} ${<}

# Secondary expansion lets each output depend on its own tangle-<name> target.
.SECONDEXPANSION:
${OUTPUTS}: tangle-$${@}
tangleFile: tangle-$${@}
	@chmod +x ${@}
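As a usage sketch (assuming GNU Make and that the bootstrapped files are already in place), the default target rebuilds everything, while the `tangle-%` pattern can refresh a single file:

```bash
# Tangle all outputs and compile the filter.
make

# Re-tangle just the Haskell source via the tangle-% pattern rule.
make tangle-tangle.hs
```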