Orchestrating the Tangle: makefile-tangle
Why Tangling is Separate

The tangling process is an ideal candidate for GNU make (as is the weaving process, described elsewhere). However, while it is possible to encapsulate the weaving process entirely in a single makefile common to all directories, the same is not true for the tangling process. Tangling requires a knowledge not only of which source (.tei) files require tangling (which could be determined automatically) and what programming languages their code chunks are written in (which could be determined, or encoded), but also of external issues such as compiler parameters (directives, include files, libraries - which may differ in different operating system environments) and the intended location of use of the compiled output (especially relevant for gMLP VARKON MBS, the running of which may be incorporated back into the source documents).

For now at least, then (and I may change my mind later), I'll simply use a separate makefile in each directory. I'll call this the same thing in each directory, "makefile-tangle", but it may (and probably will) be different in each. The model here, which is the makefile-tangle for graphein itself, may be copied, modified, and used as necessary. This can be done either by literally copying and modifying "makefile-tangle" or by taking the gMLP source for the present section of this document. (This section has been written in a separate file, "makefile-tangle.tei-entity", which may be copied and used as a gMLP source.)

Once again, I'll use gMLP techniques to put the most important parts first. Perhaps more accurately, I'll put the tricky bit first.

Trickery, and a Script to Run It

An XML "SYSTEM entity" is an external piece of text (usually, and here, a file, though in theory not necessarily) encoded in a markup language defined by XML (such as the TEI). The graphein system allows the use of XML SYSTEM entities. They can be handy, for example, for cutting up a document into chapters, one chapter per SYSTEM entity. (The present section of this chapter, for example, is sitting in an XML SYSTEM entity, just to make it separable from the rest of the chapter at a source level.) Because the graphein make (weaving) process automatically detects TEI files, XML SYSTEM entities (which aren't standalone, but are incorporated as part of TEI files) can't have the ".tei" extension. By convention, graphein uses ".tei-entity" - clunky, but obvious.
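
For readers who have not met SYSTEM entities before, the sketch below writes a small, made-up TEI file that declares one and then lists the declared system identifiers. The file contents, the entity name, and the helper list_system_ids are illustrative assumptions. The sed pattern is only a quick textual scan of single-line declarations; it is not the parse-time detection that the rest of this section develops.

```shell
# Hypothetical helper: print the system identifier of every
# single-line <!ENTITY name SYSTEM "..."> declaration in a file.
list_system_ids() {
    sed -n 's/.*<!ENTITY[^>]*SYSTEM "\([^"]*\)".*/\1/p' "$1"
}

demo_dir=$(mktemp -d)

# A made-up document standing in for a real graphein source file.
cat > "$demo_dir/index.tei" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE TEI [
  <!ENTITY makefile-tangle SYSTEM "makefile-tangle.tei-entity">
]>
<TEI><text><body>&makefile-tangle;</body></text></TEI>
EOF

list_system_ids "$demo_dir/index.tei"
```

The entity is declared once in the DOCTYPE and pulled in where "&makefile-tangle;" appears, which is why a change to the ".tei-entity" file changes the document without changing the ".tei" file itself.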

The problem is detecting, in the make process, that a document has changed when only a ".tei-entity" file referenced by the document has changed.

There's no way to do this with XSL. By the time the XSL processor sees the input, the XML parser has resolved and incorporated all SYSTEM entities. It must be done at XML parse time.

What I need, then, is a way to go into a TEI file and detect all of the external SYSTEM entities it uses. Here's a bit of trickery to accomplish this:

xmllint --debug index.tei 2> /dev/null | \
awk '{if (substr($1,1,10) == "ENTITY_REF") { \
printf("%s.tei-entity ", substr($1,12,length($1)-12))}}'

What this does is use the xmllint tool to parse the TEI (and all of the SYSTEM entities it uses), outputting a parsed tree as if debugging. Any actual error messages are thrown away ("2> /dev/null" redirects stderr to the bit bucket). The rest of the tree is passed through an Awk script which detects those lines in the debugging output which are of the form "ENTITY_REF(entityname)" and returns only the "entityname" part. This is the base name of the external SYSTEM entity - add ".tei-entity" to it and the result is the filename.
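
The filter can be tried in isolation by feeding it a few hand-written lines in place of real xmllint output. In the sketch below, the function name entity_files_from_debug and the sample lines are illustrative assumptions; only the "ENTITY_REF(entityname)" shape is taken from the description above.

```shell
# Hypothetical wrapper around the same Awk logic, reading debug output on stdin.
entity_files_from_debug() {
    awk '{
        # Keep only first fields of the form ENTITY_REF(name).
        if (substr($1, 1, 10) == "ENTITY_REF") {
            # Skip "ENTITY_REF(" (11 characters) and drop the final ")".
            printf("%s.tei-entity ", substr($1, 12, length($1) - 12))
        }
    }'
}

# Hand-written stand-in for the output of "xmllint --debug index.tei".
printf '%s\n' \
    'ELEMENT TEI' \
    '  ELEMENT body' \
    '    ENTITY_REF(makefile-tangle)' \
    '    ENTITY_REF(target-dependencies)' |
    entity_files_from_debug
```

Because Awk splits fields on whitespace, the indentation of the debug tree does not matter: $1 is always the first non-blank token on the line.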

Naively, I might just put this in the right-hand side of a GNU make rule, but that won't work. To do so, I'd put it in a GNU make "$(shell )" function call. But GNU make evaluates all functions at its invocation. By the time it gets around to working through the rule dependencies, all of the functions have been called already.

I could (and will) put this into a bit of bash shell code which cycles through all of the ".tei" files and, for each, cycles through all of the ".tei-entity" SYSTEM entities (if any). If any TEI file calls a SYSTEM entity that is newer than itself, I touch that file. This is enough to trigger a rebuild later.
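
The timestamp test at the heart of this loop can be seen on its own with two throwaway files. This is a minimal sketch under stated assumptions: the helper name touch_if_stale and the demo filenames are hypothetical, and the fixed dates exist only to make the outcome predictable.

```shell
# Hypothetical helper: touch the TEI file (first argument) if any of the
# entity files (remaining arguments) is newer than it.
# Returns 0 if the file was touched, 1 otherwise.
touch_if_stale() {
    stale_tei=$1
    shift
    for stale_entity in "$@"; do
        if [ "$stale_entity" -nt "$stale_tei" ]; then
            touch "$stale_tei"
            return 0
        fi
    done
    return 1
}

demo_dir=$(mktemp -d)
touch -t 202001010000 "$demo_dir/doc.tei"           # older document
touch -t 202101010000 "$demo_dir/part.tei-entity"   # newer entity

if touch_if_stale "$demo_dir/doc.tei" "$demo_dir/part.tei-entity"; then
    echo "doc.tei was out of date and has been touched"
fi
```

After the touch, the TEI file is newer than every entity it uses, so a second pass over the same files changes nothing.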

The only trouble with this is that I can't figure out how to force GNU make to execute a particular target first. Unless I can do that, it may execute after other targets which are supposed to rely upon what it does - not good.

My solution here is crude, alas. I put this bit of trickery in a shell script and have that same script then execute make.

for tei_file in `ls *.tei`; \
do \
   for tei_entity_file in `xmllint --debug $tei_file 2> /dev/null | awk '{if (substr($1,1,10) == "ENTITY_REF") { printf("%s.tei-entity ", substr($1,12,length($1)-12))}}'`; \
   do \
      if [ $tei_entity_file -nt $tei_file ]; then \
         touch $tei_file; \
      fi; \
   done; \
done
make -r -f makefile-tangle $1

I try to avoid writing code that feels sneaky, but this sure does.

There are two other situations in which I'll need to do nearly identical things: when weaving (only) and when both weaving and tangling. The shell scripts for these differ only in their line invoking make. Since I've just explained all of this here, I'll include them here, too.

for tei_file in `ls *.tei`; \
do \
   for tei_entity_file in `xmllint --debug $tei_file 2> /dev/null | awk '{if (substr($1,1,10) == "ENTITY_REF") { printf("%s.tei-entity ", substr($1,12,length($1)-12))}}'`; \
   do \
      if [ $tei_entity_file -nt $tei_file ]; then \
         touch $tei_file; \
      fi; \
   done; \
done
make -r -f makefile $1

Note that the makefile for weaving is called just "makefile" (because it is what gets done in ordinary graphein making, without gMLP) rather than "makefile-weave" (which would have been more parallel with "makefile-tangle").

for tei_file in `ls *.tei`; \
do \
   for tei_entity_file in `xmllint --debug $tei_file 2> /dev/null | awk '{if (substr($1,1,10) == "ENTITY_REF") { printf("%s.tei-entity ", substr($1,12,length($1)-12))}}'`; \
   do \
      if [ $tei_entity_file -nt $tei_file ]; then \
         touch $tei_file; \
      fi; \
   done; \
done
make -r -f makefile-tangle $1
make -r -f makefile $1
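
Since the three wrappers differ only in their final make invocation, they could in principle be folded into one script that takes a mode argument. The following is a minimal sketch of that dispatch only, and is not part of the graphein distribution: the function name run_graphein_make, the mode names, and the MAKE override used for dry runs are all assumptions, and the entity-touching loop shown above would still have to run first.

```shell
# Hypothetical dispatcher: choose which makefile(s) to run from a mode word.
# $1 is the mode (tangle, weave, or both); $2 is an optional make target.
run_graphein_make() {
    mode=$1
    target=$2
    make_cmd=${MAKE:-make}    # set MAKE=echo to print instead of execute
    case "$mode" in
        tangle)
            "$make_cmd" -r -f makefile-tangle $target
            ;;
        weave)
            "$make_cmd" -r -f makefile $target
            ;;
        both)
            "$make_cmd" -r -f makefile-tangle $target
            "$make_cmd" -r -f makefile $target
            ;;
        *)
            echo "usage: run_graphein_make tangle|weave|both [target]" >&2
            return 2
            ;;
    esac
}

# Dry run: show the make command lines that "both clean" would issue.
MAKE=echo run_graphein_make both clean
```

The target is deliberately left unquoted so that an empty target disappears from the command line, which mirrors how the bare $1 behaves in the scripts above.
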

makefile-tangle

The rules that do the actual work are much simpler. Here is the core of the makefile, a set of GNU make targets to make each type of program (bash shell script, GNU make makefile, XSLT stylesheet, and so on, as required).

Note (Noweb): The default behavior for notangle is to expand tabs to 8 blank spaces. GNU make requires tabs in its rule actions. So when tangling this, be sure to specify the "-tN" parameter to notangle ("-t8" works). Depending on the weaving process, the tabs may or may not be visible below; they're there in the source.
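
One quick way to confirm that tabs survived tangling is to look for lines that begin with a tab character. The sketch below is only an illustration: the helper name has_tab_indented_lines and the two sample files are hypothetical, and the check says nothing about whether the makefile is otherwise correct.

```shell
# Hypothetical check: succeed only if the file contains at least one
# line that starts with a literal tab.
has_tab_indented_lines() {
    tab=$(printf '\t')
    grep -q "^$tab" "$1"
}

demo_dir=$(mktemp -d)

# One sample with a real tab before the action, one with 8 spaces instead.
printf 'clean:\n\trm -f *.nw\n' > "$demo_dir/with-tabs"
printf 'clean:\n        rm -f *.nw\n' > "$demo_dir/with-spaces"

has_tab_indented_lines "$demo_dir/with-tabs" && echo "with-tabs: tabs present"
has_tab_indented_lines "$demo_dir/with-spaces" || echo "with-spaces: tabs were expanded"
```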

TARGETS_BASHES_MLP=hello.sh run-saxon.sh \
   run-make-tangle.sh run-make-weave.sh run-make-weave-and-tangle.sh # generate-dependencies.sh
TARGETS_BASHES_WEAVE=
TARGETS_MAKEFILES_MLP= makefile-tangle
TARGETS_MAKEFILES_WEAVE=
TARGETS_XSLTS_MLP=graphein-tonotangle.xsl
TARGETS_XSLTS_WEAVE=
TARGETS_XMLS=xml-catalog

bashes: $(TARGETS_BASHES_MLP) $(TARGETS_BASHES_WEAVE)
makefiles: $(TARGETS_MAKEFILES_MLP) $(TARGETS_MAKEFILES_WEAVE)
xslts: $(TARGETS_XSLTS_MLP) $(TARGETS_XSLTS_WEAVE)
xmls: $(TARGETS_XMLS)

make-bashes
make-makefiles
make-xslts

The various targets above are meant to cover all of the code components of the graphein system (not just the ones required for gMLP). So, for example, TARGETS_BASHES_MLP includes not only "run-make-tangle.sh" (used in the gMLP part of graphein) but also has a place for "generate-dependencies.sh" (used in scaling images during weaving), which is commented out in the listing above.

$(TARGETS_BASHES_MLP): make.nw literate-programming.nw scaling.nw
	notangle -t8 -R$@ make.nw literate-programming.nw scaling.nw > $@
	chmod +x $@

$(TARGETS_MAKEFILES_MLP): literate-programming.nw
	notangle -t8 -R$@ literate-programming.nw > $@

$(TARGETS_XSLTS_MLP): literate-programming.nw
	notangle -t8 -R$@ literate-programming.nw > $@

$(TARGETS_XMLS): text-processors.nw
	notangle -t8 -R$@ text-processors.nw > $@

All of these depend on a rule to make the Noweb intermediate files:

BASENAMES_TEI=$(basename $(wildcard *.tei))
TARGETS_NW=$(patsubst %,%.nw,$(BASENAMES_TEI))

$(TARGETS_NW): %.nw : %.tei
	java \
	-classpath "/usr/share/java/xerces-j2.jar:/usr/share/saxon-6.5.3/saxon.jar:/usr/share/java/xml-commons-resolver-1.1.jar:/etc/java/resolver" \
	com.icl.saxon.StyleSheet \
	-x org.apache.xml.resolver.tools.ResolvingXMLReader \
	-y org.apache.xml.resolver.tools.ResolvingXMLReader \
	-r org.apache.xml.resolver.tools.CatalogResolver \
	-u \
	$*.tei graphein-tonotangle.xsl > $*.nw
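
To see what TARGETS_NW ends up holding, the same name mapping (every "x.tei" becomes "x.nw") can be mimicked in plain shell. This is a rough equivalent for illustration only, not something the makefile uses; the helper name nw_targets and the demo filenames are assumptions.

```shell
# Hypothetical helper: print the .nw name for each .tei file in a directory,
# mirroring the wildcard / basename / patsubst combination.
nw_targets() {
    for tei_path in "$1"/*.tei; do
        [ -e "$tei_path" ] || continue    # no .tei files: the glob stays literal
        tei_name=${tei_path##*/}          # strip the directory part
        printf '%s\n' "${tei_name%.tei}.nw"
    done
}

demo_dir=$(mktemp -d)
touch "$demo_dir/make.tei" "$demo_dir/literate-programming.tei" "$demo_dir/notes.txt"

nw_targets "$demo_dir"
```

Files without the ".tei" suffix (here "notes.txt") are ignored, just as they are by the wildcard in the makefile.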

Finally, every makefile should be able to clean up after itself. It should be safe to remove all Noweb files (".nw"), as they're only used as intermediate files. It is probably not safe to remove all shell scripts by the ".sh" suffix, as I (or you) may have written entirely unrelated shell scripts that shouldn't simply disappear. The same is true of XSL(T) (".xsl") stylesheets. So remove them by name, and also remove by name the generated makefiles (which have no filename suffixes).

clean:
	rm -f *.nw
	rm -f $(TARGETS_BASHES_MLP) $(TARGETS_BASHES_WEAVE)
	rm -f $(TARGETS_MAKEFILES_MLP) $(TARGETS_MAKEFILES_WEAVE)
	rm -f $(TARGETS_XSLTS_MLP) $(TARGETS_XSLTS_WEAVE)

Putting it all together:

all: bashes makefiles xslts xmls

targets-by-type
make-code-chunks
clean

gMLP Distribution Items

So, after all of this, what does the standard distribution of the MLP portion of graphein contain?

First, it contains the minimal items out of which it can be re-created:

literate-programming.tei
makefile-tangle.tei-entity
target-dependencies.tei-entity
graphein-tonotangle.xsl-illiterate

The first two of these are the TEI source files for the MLP portion itself. The makefile-tangle.tei-entity source file may be used as a starting point for gMLP tangling makefiles elsewhere. The third item is a SYSTEM entity referenced in literate-programming.tei, but less tightly a part of the gMLP source itself. The fourth item is an XSL stylesheet which extracts the code chunks. (At some point in my own development process this was handwritten; as-distributed, it's just a copy of the MLP-generated "graphein-tonotangle.xsl".)

Second, it contains the three shell scripts used to invoke graphein's weaving and tangling processes. These invocations could be done by hand (and thus these scripts extracted from the gMLP source), but they're handy to have from the start.

run-make-tangle.sh
run-make-weave.sh
run-make-weave-and-tangle.sh

Third, it contains the gMLP-generated XSLT stylesheet for extracting the code chunks, and makefile for tangling. These could be generated from the items above, but are, again, handy to have from the start.

graphein-tonotangle.xsl
makefile-tangle

This makefile may be used, literally, as a template for tangling makefiles in other directories, or its gMLP source (see above) can be used for gMLP variants.

Fourth, it contains a script which runs the Saxon XSLT processor standalone. This "illiterate" version is just a copy of the gMLP version described earlier. This script is useful (but then very useful) only if rebuilding the gMLP system from source.

run-saxon.sh-illiterate

Finally, it may or may not contain the example shell script "hello.sh", depending on how I copy things.

To regenerate the system from the source, should you wish to do so:

./run-make-tangle.sh clean
cp graphein-tonotangle.xsl-illiterate graphein-tonotangle.xsl
cp run-saxon.sh-illiterate run-saxon.sh
./run-saxon.sh literate-programming.tei > literate-programming.nw
notangle -t8 -Rmakefile-tangle literate-programming.nw > makefile-tangle
notangle -t8 -Rrun-make-tangle.sh literate-programming.nw > run-make-tangle.sh
notangle -t8 -Rrun-make-weave.sh literate-programming.nw > run-make-weave.sh
notangle -t8 -Rrun-make-weave-and-tangle.sh literate-programming.nw > run-make-weave-and-tangle.sh
chmod +x run-make-*.sh
./run-make-weave-and-tangle.sh

This requires most of the full complement of graphein tools (described elsewhere), including: an XSLT processor (Saxon, here), the XML Commons Resolver (installed and configured), the TEI P5, xmllint, Noweb (notangle only), GNU make, bash, and Awk.
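
Before attempting a rebuild, it may be worth checking that the command-line tools in this list are actually on the PATH. The sketch below is an assumption-laden convenience, not part of graphein: the helper name missing_tools is hypothetical, and it checks only for executables, so it cannot tell whether Saxon, the XML Commons Resolver, or the TEI P5 files are installed and configured.

```shell
# Hypothetical helper: print the name of each given command that is
# not found on the PATH, one per line.
missing_tools() {
    for tool_name in "$@"; do
        command -v "$tool_name" >/dev/null 2>&1 || printf '%s\n' "$tool_name"
    done
}

# java stands in for Saxon here; the jar files themselves are not checked.
missing=$(missing_tools java xmllint notangle make bash awk)

if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all command-line tools found"
fi
```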

To use the system, copy these components (and the components used for weaving) into the "new-static-files" subdirectory of the root directory of the graphein hierarchy.