Get the highlights in your inbox every week.
Automating the creation of research artifacts | Opensource.com
Automating the creation of research artifacts
A simple way to automate generating source code documentation, creating HTML and PDF versions of user documentation, compiling a technical (research) document to PDF, generating the bibliography, and provisioning virtual machines.
In my work as a programming language researcher, I need to create artifacts that are easy to understand and well-documented. To make my work easier, I found a simple way to automate generating source code documentation, creating HTML and PDF versions of user documentation, compiling a technical (research) document to PDF, generating the bibliography, and provisioning of virtual machines with the software artefact installed for ease of reproducibility of my research.
The tools I use are:
- Make makefiles for overall orchestration of all components
- Haddock for generating source code documentation
- Pandoc for generating PDF and HTML files from a Markdown file
- Vagrant for provisioning virtual machines
- Stack for downloading Haskell dependencies, compiling, running tests, etc
- pdflaTeX for compiling a LaTeX file to PDF format
- BibTeX for generating a bibliography
- Zip to pack everything and get it ready for distribution
I use the following folder and file structure:
│ └── typechecker-oopl (Project)
│ ├── Makefile
│ └── ...
│ ├── Makefile
│ ├── README.md
│ ├── assets
│ │ ├── pandoc.css (Customised CSS for Pandoc)
│ │ └── submitted-version.pdf (PDF of your research)
│ └── meta.yaml
│ ├── Makefile
│ ├── ACM-Reference-Format.bst
│ ├── acmart.cls
│ ├── biblio.bib
│ └── typecheckingMonad.tex
The Makefile glues together the output from all of the tools listed above. The code folder contains the source code of the tool/language I created. The documentation folder contains a Makefile that has instructions on how to generate PDF and HTML versions of the user instructions, located in the README.md file. I generate the PDF and HTML user documentation using Pandoc. The assets are simply the CSS style to use and a PDF of my research article that will be hyperlinked from the user-generated documentation, so that it is easy to follow. meta.yaml contains meta instructions for generating the user documentation, used by Pandoc for e.g., for author names. The research folder contains my research article in LaTeX format, but it could hold any other technical document.
As you can see in the structure, I have a Makefile for each folder to decouple each Makefile's responsibility and keep a (somewhat) maintainable design. Here is an overview of the top-level Makefile, which orchestrates generating the user documentation, research paper, bibliography, documentation from source code, and provisioning of a virtual machine.
all: doc gen
make -C $(DOC_SRC) $@
make -C $(CODE_PATH) $@
make -C $(RESEARCH)
# Creation of folder with artefact, empty at the moment
mkdir -p $(ARTEFACT_FOLDER)
# Moving user documentation to artefact folder
cp $(DOC_SRC)/$(README).pdf $(ARTEFACT_FOLDER)
cp $(DOC_SRC)/$(README).html $(ARTEFACT_FOLDER)
cp -r $(DOC_SRC)/$(ASSETS) $(ARTEFACT_FOLDER)
# Moving research article to artefact folder
cp $(RESEARCH)/$(RESEARCH_PAPER).pdf $(ARTEFACT_FOLDER)/$(ASSETS)/submitted-version.pdf
# Moving code and autogenerated doc to artefact folder
cp -r $(CODE_PATH) $(ARTEFACT_FOLDER)
rm -rf $(ARTEFACT_FOLDER)/$(DOC_SRC)
mv $(ARTEFACT_FOLDER)/$(CODE_SRC)/$(HADDOCK) $(ARTEFACT_FOLDER)/$(DOC_SRC)
# zip it!
zip $(ZIP_FILE) $(ARTEFACT_FOLDER)
rm -rf $(ARTEFACT_FOLDER)
.PHONY: all clean doc gen update
First, the doc target generates the user documentation using Pandoc, then it uses Haddock to generate the documentation from the Haskell library source code, and finally, it creates a PDF from the LaTeX file. As depicted in the image below, the generated user documentation is in HTML and CSS. The user documentation contains links to the generated source code documentation, also in HTML and CSS, and to the technical (research) paper . The generated source code documentation links directly to the source code, in case the reader would like to understand the implementation.
The user documentation is generated with the following Makefile:
.PHONY: all doc clean
pandoc -s $(META) $(DOCS) --listings --pdf-engine=xelatex -c assets/pandoc.css -o $(DOCS:md=pdf)
pandoc -s $(META) $(DOCS) --self-contained -c assets/pandoc.css -o $(DOCS:md=html)
rm $(DOCS:md=pdf) $(DOCS:md=html)
To generate documentation from Haskell code, I use this other Makefile, which makes use of Stack to compile the library and download dependencies, and Haddock (inside its OPTS, or options) to create documentation in HMTL:
OPTS=exec -- haddock --html --hyperlinked-source --odir=docs
stack $(OPTS) src/Initial/AST.hs src/Initial/Typechecker.hs \
src/Reader/AST.hs src/Reader/Typechecker.hs \
src/Backtrace/AST.hs src/Backtrace/Typechecker.hs \
src/Warning/AST.hs src/Warning/Typechecker.hs \
src/MultiError/AST.hs src/MultiError/Typechecker.hs \
src/PhantomFunctors/AST.hs src/PhantomFunctors/Typechecker.hs \
src/PhantomPhases/AST.hs src/PhantomPhases/Typechecker.hs \
src/Applicative/AST.hs src/Applicative/Typechecker.hs \
I compile the research paper from LaTeX to PDF with this simple Makefile:
The virtual machine (VM) relies on Vagrant and the Vagrantfile, where I can write all the commands to set up the VM. The one thing that I do not know how to automate is moving all of the documentation, once it is generated, into the VM. If you know how to transfer the file from the host machine to the VM, please share your solution in the comments. That means that, currently, I manually enter in the VM and place the documentation in the Desktop folder.
# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure("2") do |config|
config.vm.box = "ubuntu/trusty64"
config.ssh.username = "vagrant"
config.ssh.password = "vagrant"
config.vm.provider "virtualbox" do |vb|
# Display the VirtualBox GUI when booting the machine
vb.gui = true
# Customize the amount of memory on the VM:
vb.memory = "2048"
vb.customize ["modifyvm", :id, "--vram", "64"]
config.vm.provision "shell", inline: <<-SHELL
## Installing dependencies, comment after this has been done once.
# sudo apt-get update -y
# sudo apt-get install ubuntu-desktop -y
# sudo apt-get install -y build-essential linux-headers-server
# echo 'PATH="/home/vagrant/.local/bin:$PATH"' >> /home/vagrant/.profile
## Comment and remove the folder sharing before submission
mkdir -p /home/vagrant/Desktop/TypeChecker
cp -r /vagrant/artefact-submission/* /home/vagrant/Desktop/TypeChecker/
chown -R vagrant:vagrant /home/vagrant/Desktop/TypeChecker/
I have used this setup for two conferences—the European Conference on Object-Oriented Programming (ECOOP) and the International Conference on Software Language Engineering (SLE), where we won (in both) the Disguinshed Artifact Award.