Automating the creation of research artifacts

A simple way to automate generating source code documentation, creating HTML and PDF versions of user documentation, compiling a technical (research) document to PDF, generating the bibliography, and provisioning virtual machines.

In my work as a programming language researcher, I need to create artifacts that are easy to understand and well documented. To make my work easier, I found a simple way to automate generating source code documentation, creating HTML and PDF versions of user documentation, compiling a technical (research) document to PDF, generating the bibliography, and provisioning virtual machines with the software artifact installed, so that my research is easy to reproduce.

The tools I use are:

  • Make (Makefiles) for overall orchestration of all components
  • Haddock for generating source code documentation
  • Pandoc for generating PDF and HTML files from a Markdown file
  • Vagrant for provisioning virtual machines
  • Stack for downloading Haskell dependencies, compiling, running tests, etc.
  • pdfLaTeX for compiling a LaTeX file to PDF format
  • BibTeX for generating a bibliography
  • Zip to pack everything and get it ready for distribution
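
Before running the top-level Makefile, it can help to confirm that these tools are installed. The following is a minimal sketch in POSIX shell and is not part of the original setup: the function name missing_tools and the REQUIRED variable are hypothetical, and the list assumes the usual command names (make, pandoc, vagrant, stack, pdflatex, bibtex, zip). Haddock is run through Stack later in this article, so it is not checked separately.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one name per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names for the tools listed above.
REQUIRED="make pandoc vagrant stack pdflatex bibtex zip"
```

Calling `missing_tools $REQUIRED` prints nothing when every tool is available; otherwise it lists the ones that still need to be installed.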

I use the following folder and file structure:

├── Makefile
├── Vagrantfile
├── code
│   └── typechecker-oopl (Project)
│       ├── Makefile
│       └── ...
├── documentation
│   ├── Makefile
│   ├── (user documentation source, in Markdown)
│   ├── assets
│   │   ├── pandoc.css (Customised CSS for Pandoc)
│   │   └── submitted-version.pdf (PDF of your research)
│   └── meta.yaml
└── research
    ├── Makefile
    ├── ACM-Reference-Format.bst
    ├── acmart.cls
    ├── biblio.bib
    └── typecheckingMonad.tex

The top-level Makefile glues together the output from all of the tools listed above. The code folder contains the source code of the tool/language I created. The documentation folder contains a Makefile with instructions for generating PDF and HTML versions of the user instructions, which are written in a Markdown file. I generate both versions with Pandoc. The assets are simply the CSS style to use and a PDF of my research article, which is hyperlinked from the generated user documentation so that it is easy to follow. meta.yaml contains metadata that Pandoc uses when generating the user documentation, e.g., the author names. The research folder contains my research article in LaTeX format, but it could hold any other technical document.

As you can see in the structure, I have a Makefile for each folder to decouple each Makefile's responsibility and keep a (somewhat) maintainable design. Here is an overview of the top-level Makefile, which orchestrates generating the user documentation, research paper, bibliography, documentation from source code, and provisioning of a virtual machine.

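# DOC_SRC, CODE_PATH, RESEARCH, ARTEFACT_FOLDER, README, ASSETS,
# RESEARCH_PAPER, and ZIP_FILE are assumed to be defined above (not shown).

all: doc gen

# Build user documentation, source code documentation, and the research paper
doc:
	$(MAKE) -C $(DOC_SRC) $@
	$(MAKE) -C $(CODE_PATH) $@
	$(MAKE) -C $(RESEARCH)

# Assemble the artefact folder and zip it
gen:
	# Creation of folder with artefact, empty at the moment
	mkdir -p $(ARTEFACT_FOLDER)

	# Moving user documentation to artefact folder
	cp $(DOC_SRC)/$(README).pdf $(ARTEFACT_FOLDER)
	cp $(DOC_SRC)/$(README).html $(ARTEFACT_FOLDER)
	cp -r $(DOC_SRC)/$(ASSETS) $(ARTEFACT_FOLDER)

	# Moving research article to artefact folder
	cp $(RESEARCH)/$(RESEARCH_PAPER).pdf $(ARTEFACT_FOLDER)/$(ASSETS)/submitted-version.pdf

	# Moving code and autogenerated doc to artefact folder
	cp -r $(CODE_PATH) $(ARTEFACT_FOLDER)
	cd ../..
	rm -rf $(ARTEFACT_FOLDER)/$(DOC_SRC)

	# zip it! (-r includes the folder contents, not just the folder entry)
	zip -r $(ZIP_FILE) $(ARTEFACT_FOLDER)

# Boot the virtual machine and run its provisioning script
update:
	vagrant up
	vagrant provision

clean:
	rm -rf $(ARTEFACT_FOLDER)

.PHONY: all clean doc gen update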

First, the doc target generates the user documentation using Pandoc, then it uses Haddock to generate the documentation from the Haskell library source code, and finally, it creates a PDF from the LaTeX file. As depicted in the image below, the generated user documentation is in HTML and CSS. It contains links to the generated source code documentation, also in HTML and CSS, and to the technical (research) paper. The generated source code documentation links directly to the source code, in case the reader would like to understand the implementation.

[Image: Artifact automation structure]

The user documentation is generated with the following Makefile:

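.PHONY: all doc clean

all: doc

# META and DOCS are assumed to be defined above (not shown);
# DOCS names the Markdown source of the user documentation.
doc: $(DOCS)
	pandoc -s $(META) $(DOCS) --listings --pdf-engine=xelatex -c assets/pandoc.css -o $(DOCS:md=pdf)
	pandoc -s $(META) $(DOCS) --self-contained -c assets/pandoc.css -o $(DOCS:md=html)

clean:
	rm -f $(DOCS:md=pdf) $(DOCS:md=html)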

To generate documentation from Haskell code, I use another Makefile, which uses Stack to compile the library and download dependencies, and Haddock (invoked through OPTS, the options passed to Stack) to create documentation in HTML:

OPTS=exec -- haddock --html --hyperlinked-source --odir=docs

doc:
	stack $(OPTS) src/Initial/AST.hs src/Initial/Typechecker.hs \
	src/Reader/AST.hs src/Reader/Typechecker.hs \
	src/Backtrace/AST.hs src/Backtrace/Typechecker.hs \
	src/Warning/AST.hs src/Warning/Typechecker.hs \
	src/MultiError/AST.hs src/MultiError/Typechecker.hs \
	src/PhantomFunctors/AST.hs src/PhantomFunctors/Typechecker.hs \
	src/PhantomPhases/AST.hs src/PhantomPhases/Typechecker.hs \
	src/Applicative/AST.hs src/Applicative/Typechecker.hs \
	src/Final/AST.hs src/Final/Typechecker.hs

.PHONY: doc

I compile the research paper from LaTeX to PDF with this simple Makefile:

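.PHONY: research

research:
	# First pass records the citations in the .aux file
	pdflatex typecheckingMonad.tex
	# BibTeX builds the bibliography from the .aux file
	bibtex typecheckingMonad
	# Two more passes resolve citations and cross-references
	pdflatex typecheckingMonad.tex
	pdflatex typecheckingMonad.tex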

The virtual machine (VM) relies on Vagrant and the Vagrantfile, where I can write all the commands to set up the VM. The one thing that I do not know how to automate is moving all of the documentation, once it is generated, into the VM. If you know how to transfer files from the host machine to the VM, please share your solution in the comments. Currently, I log in to the VM manually and place the documentation in the Desktop folder.

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.ssh.username = "vagrant"
  config.ssh.password = "vagrant"
  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    vb.gui = true

    # Customize the amount of memory on the VM:
    vb.memory = "2048"
    vb.customize ["modifyvm", :id, "--vram", "64"]
  end

  config.vm.provision "shell", inline: <<-SHELL
    ## Installing dependencies, comment after this has been done once.
    # sudo apt-get update -y
    # sudo apt-get install ubuntu-desktop -y
    # sudo apt-get install -y build-essential linux-headers-server

    # echo 'PATH="/home/vagrant/.local/bin:$PATH"' >> /home/vagrant/.profile

    ## Comment and remove the folder sharing before submission
    mkdir -p /home/vagrant/Desktop/TypeChecker
    cp -r /vagrant/artefact-submission/* /home/vagrant/Desktop/TypeChecker/
    chown -R vagrant:vagrant /home/vagrant/Desktop/TypeChecker/
  SHELL
end
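
One possible way to automate the manual copy step is the `vagrant upload` command, which recent Vagrant releases provide for copying files and directories from the host to a running guest. The sketch below is an assumption-laden example rather than the setup used in this article: the function name upload_artefact is hypothetical, the default paths are taken from the Vagrantfile above, and the exact destination semantics are worth checking with `vagrant upload --help` for your Vagrant version. Vagrant's "file" provisioner is another option for the same task.

```shell
#!/bin/sh
# Hypothetical helper: copy a host folder into the running VM.
# Defaults are assumptions based on the paths in the Vagrantfile.
upload_artefact() {
    src="${1:-artefact-submission}"
    dest="${2:-/home/vagrant/Desktop/TypeChecker}"

    # Fail early, before calling Vagrant, if the source folder is missing.
    if [ ! -d "$src" ]; then
        echo "upload_artefact: '$src' is not a directory" >&2
        return 1
    fi

    # Requires the VM to be up (for example, after `vagrant up`).
    vagrant upload "$src" "$dest"
}
```

Running `upload_artefact` from the folder that contains the Vagrantfile, after the artefact folder has been generated and the VM has booted, would then replace the manual copy.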

With this final step, everything is wired together. You can see one example of the result in HTML and in PDF. I have created a GitHub repo with all the source code for ease of study and reproducibility.

I have used this setup for two conferences, the European Conference on Object-Oriented Programming (ECOOP) and the International Conference on Software Language Engineering (SLE), and we won the Distinguished Artifact Award at both.

Kiko is a PhD student in programming languages and the main lecturer of the course Advanced Software Design at Uppsala University. He is also a core developer of the Encore programming language, has written research publications about concurrent and parallel data structures and has won two Best Paper awards and two Distinguished Artifact awards in his short (yet) academic career.


This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.