Automating the creation of research artifacts

A simple way to automate generating source code documentation, creating HTML and PDF versions of user documentation, compiling a technical (research) document to PDF, generating the bibliography, and provisioning virtual machines.

In my work as a programming language researcher, I need to create artifacts that are easy to understand and well documented. To make this easier, I found a simple way to automate generating source code documentation, creating HTML and PDF versions of user documentation, compiling a technical (research) document to PDF, generating the bibliography, and provisioning virtual machines with the software artifact installed, so that my research is easy to reproduce.

The tools I use are:

Make (Makefiles) for overall orchestration of all components
Haddock for generating source code documentation
Pandoc for generating PDF and HTML files from a Markdown file
Vagrant for provisioning virtual machines
Stack for downloading Haskell dependencies, compiling, running tests, etc.
pdfLaTeX for compiling a LaTeX file to PDF format
BibTeX for generating a bibliography
Zip to pack everything and get it ready for distribution

I use the following folder and file structure:

├── Makefile
├── Vagrantfile
├── code
│   └── typechecker-oopl (Project)
│       ├── Makefile
│       └── ...
│
├── documentation
│   ├── Makefile
│   ├── README.md
│   ├── assets
│   │   ├── pandoc.css (Customised CSS for Pandoc)
│   │   └── submitted-version.pdf (PDF of your research)
│   └── meta.yaml
│
├── research
│   ├── Makefile
│   ├── ACM-Reference-Format.bst
│   ├── acmart.cls
│   ├── biblio.bib
│   └── typecheckingMonad.tex

The top-level Makefile glues together the output from all of the tools listed above. The code folder contains the source code of the tool/language I created. The documentation folder contains a Makefile with instructions for generating PDF and HTML versions of the user instructions, which live in the README.md file. I generate the PDF and HTML user documentation using Pandoc. The assets are simply the CSS style to use and a PDF of my research article, which is hyperlinked from the generated user documentation so that it is easy to follow. meta.yaml contains the metadata that Pandoc uses when generating the user documentation, e.g., the author names. The research folder contains my research article in LaTeX format, but it could hold any other technical document.

As you can see in the structure, I have a Makefile for each folder to decouple each Makefile's responsibility and keep a (somewhat) maintainable design. Here is an overview of the top-level Makefile, which orchestrates generating the user documentation, research paper, bibliography, documentation from source code, and provisioning of a virtual machine.

all: doc gen

doc:
	$(MAKE) -C $(DOC_SRC) $@
	$(MAKE) -C $(CODE_PATH) $@
	$(MAKE) -C $(RESEARCH)

gen:
	# Creation of folder with artefact, empty at the moment
	mkdir -p $(ARTEFACT_FOLDER)

	# Moving user documentation to artefact folder
	cp $(DOC_SRC)/$(README).pdf $(ARTEFACT_FOLDER)
	cp $(DOC_SRC)/$(README).html $(ARTEFACT_FOLDER)
	cp -r $(DOC_SRC)/$(ASSETS) $(ARTEFACT_FOLDER)

	# Moving research article to artefact folder
	cp $(RESEARCH)/$(RESEARCH_PAPER).pdf $(ARTEFACT_FOLDER)/$(ASSETS)/submitted-version.pdf

	# Moving code and autogenerated doc to artefact folder
	cp -r $(CODE_PATH) $(ARTEFACT_FOLDER)
	# Each recipe line runs in its own shell, so cd and the Stack
	# command must share one line
	cd $(ARTEFACT_FOLDER)/$(CODE_SRC) && $(STACK)
	rm -rf $(ARTEFACT_FOLDER)/$(DOC_SRC)
	mv $(ARTEFACT_FOLDER)/$(CODE_SRC)/$(HADDOCK) $(ARTEFACT_FOLDER)/$(DOC_SRC)

	# zip it! (-r includes the folder's contents, not just its name)
	zip -r $(ZIP_FILE) $(ARTEFACT_FOLDER)

update:
	vagrant up
	vagrant provision

clean:
	rm -rf $(ARTEFACT_FOLDER)

.PHONY: all clean doc gen update
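
One detail that matters when reading the gen recipe: Make hands every recipe line to a fresh shell, so a cd on one line is forgotten by the time the next line runs. The sketch below imitates that behaviour with plain sh -c calls. The function names are made up for illustration and are not part of the project.

```shell
#!/bin/sh
# Imitates how Make executes a recipe: every line gets its own shell.

# Two separate "recipe lines": the cd happens in a shell that exits
# immediately, so pwd still reports the caller's directory.
separate_lines() {
    sh -c 'cd "$1"' sh "$1"
    sh -c 'pwd'
}

# One "recipe line": cd and the command are chained with &&, so pwd
# runs inside the target directory.
chained_line() {
    sh -c 'cd "$1" && pwd' sh "$1"
}
```

In a Makefile, the chained form corresponds to writing cd $(ARTEFACT_FOLDER)/$(CODE_SRC) && $(STACK) on a single recipe line.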

First, the doc target generates the user documentation using Pandoc, then it uses Haddock to generate the documentation from the Haskell library source code, and finally, it creates a PDF from the LaTeX file. As depicted in the image below, the generated user documentation is in HTML and CSS. The user documentation contains links to the generated source code documentation, also in HTML and CSS, and to the technical (research) paper. The generated source code documentation links directly to the source code, in case the reader would like to understand the implementation.

The user documentation is generated with the following Makefile:

DOCS=README.md
META=meta.yaml
NUMBER_SECTION_HEADINGS=-N

.PHONY: all doc clean

all: doc

doc: $(DOCS)
	pandoc -s $(META) $(DOCS) --listings --pdf-engine=xelatex -c assets/pandoc.css -o $(DOCS:md=pdf)
	pandoc -s $(META) $(DOCS) --self-contained -c assets/pandoc.css -o $(DOCS:md=html)

clean:
	rm -f $(DOCS:md=pdf) $(DOCS:md=html)
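
The $(DOCS:md=pdf) expression is a Make substitution reference: it swaps the md suffix of README.md for pdf. For readers more at home in the shell, here is a rough equivalent of the doc recipe, with the Pandoc flags copied from the Makefile above. The function name render_docs is hypothetical.

```shell
#!/bin/sh
# render_docs INPUT.md META.yaml
# Builds a PDF and an HTML file next to INPUT.md.
render_docs() {
    src=$1
    meta=$2
    # README.md -> README, the shell counterpart of $(DOCS:md=...)
    base=${src%.md}
    pandoc -s "$meta" "$src" --listings --pdf-engine=xelatex \
        -c assets/pandoc.css -o "$base.pdf"
    pandoc -s "$meta" "$src" --self-contained \
        -c assets/pandoc.css -o "$base.html"
}
```

Calling render_docs README.md meta.yaml from the documentation folder should produce README.pdf and README.html, assuming Pandoc and XeLaTeX are installed.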

To generate documentation from Haskell code, I use this other Makefile, which makes use of Stack to compile the library and download dependencies, and Haddock (inside its OPTS, or options) to create documentation in HTML:

OPTS=exec -- haddock --html --hyperlinked-source --odir=docs

doc:
	stack $(OPTS) src/Initial/AST.hs src/Initial/Typechecker.hs \
	src/Reader/AST.hs src/Reader/Typechecker.hs \
	src/Backtrace/AST.hs src/Backtrace/Typechecker.hs \
	src/Warning/AST.hs src/Warning/Typechecker.hs \
	src/MultiError/AST.hs src/MultiError/Typechecker.hs \
	src/PhantomFunctors/AST.hs src/PhantomFunctors/Typechecker.hs \
	src/PhantomPhases/AST.hs src/PhantomPhases/Typechecker.hs \
	src/Applicative/AST.hs src/Applicative/Typechecker.hs \
	src/Final/AST.hs src/Final/Typechecker.hs

.PHONY: doc
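
Listing every module by hand works, but the list has to be updated whenever a new variant of the type checker is added. A possible alternative, sketched below, collects the files with find. The function name list_haddock_inputs is hypothetical, and the sketch assumes that the order in which Haddock receives the files does not matter and that no path contains spaces.

```shell
#!/bin/sh
# list_haddock_inputs SRC_DIR
# Prints every AST.hs and Typechecker.hs below SRC_DIR, one per line,
# sorted so that repeated runs give the same order.
list_haddock_inputs() {
    find "$1" -type f \( -name 'AST.hs' -o -name 'Typechecker.hs' \) | sort
}
```

The result could then be passed to the same command the Makefile uses, for example stack exec -- haddock --html --hyperlinked-source --odir=docs $(list_haddock_inputs src).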

I compile the research paper from LaTeX to PDF with this simple Makefile:

.PHONY: research

research:
	pdflatex typecheckingMonad.tex
	bibtex typecheckingMonad
	pdflatex typecheckingMonad.tex
	pdflatex typecheckingMonad.tex
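
The repeated pdflatex calls are not an accident: each pass feeds the next one. The same sequence is sketched below as a shell function, annotated with what each step contributes. The function name build_paper is hypothetical. Unlike a plain list of commands, the && chain stops at the first failing step.

```shell
#!/bin/sh
# build_paper NAME
# Compiles NAME.tex together with its bibliography.
build_paper() {
    name=$1
    pdflatex "$name.tex" &&   # 1st pass: records the cited keys in NAME.aux
    bibtex "$name" &&         # reads NAME.aux, writes the bibliography to NAME.bbl
    pdflatex "$name.tex" &&   # 2nd pass: pulls NAME.bbl into the document
    pdflatex "$name.tex"      # 3rd pass: resolves citations and cross-references
}
```

For this project the call would be build_paper typecheckingMonad, run from the research folder.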

The virtual machine (VM) relies on Vagrant and the Vagrantfile, where I can write all the commands to set up the VM. The one thing that I do not know how to automate is moving all of the documentation, once it is generated, into the VM. If you know how to transfer the files from the host machine to the VM, please share your solution in the comments. For now, I log in to the VM manually and place the documentation in the Desktop folder.

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.ssh.username = "vagrant"
  config.ssh.password = "vagrant"
  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
    vb.gui = true

    # Customize the amount of memory on the VM:
    vb.memory = "2048"
    vb.customize ["modifyvm", :id, "--vram", "64"]
  end
  config.vm.provision "shell", inline: <<-SHELL
    ## Installing dependencies, comment after this has been done once.
    # sudo apt-get update -y
    # sudo apt-get install ubuntu-desktop -y
    # sudo apt-get install -y build-essential linux-headers-server

    # echo 'PATH="/home/vagrant/.local/bin:$PATH"' >> /home/vagrant/.profile

    ## Comment and remove the folder sharing before submission
    mkdir -p /home/vagrant/Desktop/TypeChecker
    cp -r /vagrant/artefact-submission/* /home/vagrant/Desktop/TypeChecker/
    chown -R vagrant:vagrant /home/vagrant/Desktop/TypeChecker/
  SHELL
end
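
One possible way to automate the transfer into the VM, sketched under assumptions: recent Vagrant releases provide a vagrant upload command that copies a file or folder from the host to the guest. The function name push_artefact is hypothetical, and the sketch assumes that the installed Vagrant version includes the upload subcommand and that the VM is already running.

```shell
#!/bin/sh
# push_artefact SRC DEST
# Copies SRC from the host to DEST inside the running VM.
# Assumption: `vagrant upload` exists in the installed Vagrant version
# and `vagrant up` has already been run.
push_artefact() {
    src=$1
    dest=$2
    # Fail early with a clear message if there is nothing to copy.
    [ -e "$src" ] || { echo "missing: $src" >&2; return 1; }
    vagrant upload "$src" "$dest"
}
```

With the paths from the Vagrantfile above, the call would look like push_artefact artefact-submission /home/vagrant/Desktop/TypeChecker, and it could be added as the last line of the update target.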

With this final step, everything has been wired together. You can see one example of the result in HTML and in PDF. I have created a GitHub repo with all the source code for ease of study and reproducibility.

I have used this setup for two conferences, the European Conference on Object-Oriented Programming (ECOOP) and the International Conference on Software Language Engineering (SLE), and we won the Distinguished Artifact Award at both.
