Including external tables for markdown/pandoc

I use pandoc for all sorts of workflows. From one report, we can create a PDF, docx, or HTML page.

One difficulty, though, is managing tabular data that is dynamic or likely to change. In its basic form, the markdown format doesn’t have an “include” concept like LaTeX does. But you can easily work around this shortcoming using awk.

What I like to do is something similar to this:

This is a sample paragraph introducing the table.

TAB table1.md

More prose....

I essentially use TAB as a placeholder for where I want the contents of a separate file to go.

Replacing that line with the contents of the file can then easily be done using a one-line awk script.

awk '/TAB/ { system("cat " $2); next } { print }' my-document.md
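To try the one-liner end to end, here is a small self-contained sketch. The table contents, the scratch directory, and the output name expanded.md are made up for illustration; only the awk command itself comes from this post.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hypothetical table file, as some other program might generate it.
cat > table1.md <<'EOF'
| Item | Count |
|------|-------|
| A    | 3     |
EOF

# A document that uses the TAB placeholder.
cat > my-document.md <<'EOF'
This is a sample paragraph introducing the table.

TAB table1.md

More prose....
EOF

# Replace the placeholder line with the contents of the named file.
awk '/TAB/ { system("cat " $2); next } { print }' my-document.md > expanded.md

# Show the result: the TAB line is gone and the table sits in its place.
cat expanded.md
```

The expanded file can then be handed to pandoc like any other markdown document.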

If you don’t know awk, you should, but here is the basic gist of the script.

awk scans its input line by line. If a line matches the text “TAB”, awk runs the commands inside the braces. The first command calls the cat program, which prints the contents of a file to standard output. The file to print is the second field ($2) of the line. awk splits fields on whitespace by default, so make sure you don’t have spaces in your filename (which hopefully you don’t do, right?).

The next statement skips straight to the following input line. Without it, awk would fall through to the second set of braces, whose print command prints the line as-is, and the placeholder itself would end up in the output.
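Two things can trip up the one-liner: the pattern /TAB/ matches that text anywhere in a line, and $2 stops at the first space in a filename. A slightly stricter variant, offered as a sketch rather than the definitive way to do it, only reacts when TAB is the first field and reads the file with getline instead of shelling out to cat. The function name expand_tables is hypothetical.

```shell
# expand_tables FILE: print FILE with each "TAB <name>" line replaced
# by the contents of <name>. The function name is made up for this sketch.
expand_tables() {
    awk '
    $1 == "TAB" && NF >= 2 {
        # Everything after the keyword is the filename, spaces included.
        file = $0
        sub(/^[ \t]*TAB[ \t]+/, "", file)   # drop the keyword
        sub(/[ \t]+$/, "", file)            # drop trailing whitespace
        # Copy the file line by line; a missing file is silently skipped.
        while ((getline line < file) > 0) print line
        close(file)
        next
    }
    { print }
    ' "$1"
}
```

Calling expand_tables my-document.md writes the expanded document to standard output, just like the one-liner. Because the filename never passes through a shell, odd characters in it are less of a worry.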

So this allows a different program to have the sole job of just creating the data for the table, leaving the actual insertion of the contents to this short script.
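What that separate program looks like depends on where the data comes from. As one hedged illustration, assuming the data lives in a simple CSV file with a header row and no quoted fields or embedded commas, a few lines of awk can turn it into a pipe table. The function name csv_to_table and the file results.csv are hypothetical.

```shell
# csv_to_table FILE: print a simple CSV file as a markdown pipe table.
# Assumes a header row and no quoted fields; the name is hypothetical.
csv_to_table() {
    awk -F, '
    {
        # Wrap every field of the current row in pipes.
        row = "|"
        for (i = 1; i <= NF; i++) row = row " " $i " |"
        print row
    }
    NR == 1 {
        # Separator row under the header, one cell per column.
        sep = "|"
        for (i = 1; i <= NF; i++) sep = sep " --- |"
        print sep
    }
    ' "$1"
}
```

Running csv_to_table results.csv > table1.md regenerates the table file whenever the data changes, and the TAB placeholder picks up the new contents on the next build.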

Tie it all together in a Makefile and you’re on your way to building some cool documents!

Disclaimer: I first came across this trick in the awk manual, which everyone should peruse.