In May 2020, in the early months of the coronavirus pandemic, I started a summer internship with Red Hat. COVID-19 had cut short my third year studying computer engineering at Georgia Tech, and I learned I would have to work remotely all summer. I wasn't sure what to expect from a virtual internship.
This was my second internship but my first time working completely remotely. I adjusted quickly to the new virtual environment, and I think having so little experience in a physical workspace helped. The hardest part of working remotely was knowing when to stop and call it a day. It was easy to get wrapped up in my work, write another line of code, and finish just one more task. I didn't miss the commute, though.
I spent my internship working on the Pulp team. Pulp is a platform for managing repositories of software packages and making them available to many consumers. Pulp can mirror all or part of a repository locally, host software packages in repositories, and manage many types of content from multiple sources in one place. Support for each content type comes from a plugin: to manage Python packages, for example, you add the Python content plugin to Pulp.
Bringing the Pulp Python plugin up to date
Since Pulp 3's release in December 2019, the Pulp community has been putting out a rapid stream of releases. During my internship, I focused on bringing the Pulp Python plugin up to date and adding functionality so that it could mirror the entire PyPI repository, which hosts hundreds of thousands of packages. While this was technically possible before, it was extremely time-consuming and required a huge volume of requests to PyPI's servers, which made it impractical.
I was mentored by Pulp engineers Daniel Alley, Dennis Kliban, and Grant Gainey. As we looked at how to approach the problem, Daniel suggested that the Pulp plugin should interact with Python's repository-mirroring software, Bandersnatch.
However, nothing worth doing is ever that easy. The Bandersnatch API required some updates to work with Pulp's Python plugin. Daniel opened a conversation with the Bandersnatch community and explained what we intended to do. They were receptive to our ideas and willing to generalize the code so that it could be used more widely. I ended up contributing to both Pulp Python and Bandersnatch so that the Pulp Python plugin could take advantage of the Bandersnatch filtering toolset.
Now that this work is complete, you can use the Pulp Python plugin to mirror all of PyPI in just over an hour. With the Pulp team's contributions to Bandersnatch, it should also be possible to use the Bandersnatch API to mirror Python content from sources other than PyPI (including Pulp itself).
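To give a feel for what filtering a mirror means in practice, here is a minimal sketch of allowlist and blocklist filtering over project names. It is not Bandersnatch's or Pulp's actual API: the function names `normalize` and `select_packages`, and the sample project list, are hypothetical and exist only for illustration. The only thing borrowed from the real ecosystem is the name normalization rule described in PEP 503.

```python
import re
from typing import Iterable, List, Optional


def normalize(name: str) -> str:
    """Normalize a project name following the rule described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def select_packages(
    names: Iterable[str],
    allow: Optional[Iterable[str]] = None,
    block: Iterable[str] = (),
) -> List[str]:
    """Return the project names that should be mirrored (hypothetical helper).

    A name is dropped if it is on the blocklist. If an allowlist is given,
    only names on that list are kept. Comparison uses normalized names, so
    "pulp_python" and "Pulp-Python" refer to the same project.
    """
    allowed = {normalize(n) for n in allow} if allow is not None else None
    blocked = {normalize(n) for n in block}

    selected = []
    for name in names:
        key = normalize(name)
        if key in blocked:
            continue
        if allowed is not None and key not in allowed:
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    # Made-up stand-in for the list of projects an upstream index offers.
    upstream = ["Django", "requests", "pulp_python", "numpy"]
    print(select_packages(upstream, allow=["django", "pulp-python"]))
    print(select_packages(upstream, block=["NumPy"]))
```

Real mirroring tools layer more kinds of filters on top of this idea, but the core decision is the same: decide which projects to fetch before making any requests for them.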
Working across communities
Working in both the Pulp and Bandersnatch communities taught me that every open source community does things differently. It is important to understand each community's preferred methods for issue tracking, testing, commit messages, pull requests, and changelog submissions. I also learned that in any community you work with, understanding its members and their goals is crucial to getting work done that benefits everyone.
The latest version of the Pulp Python plugin is available with Pulp 3.9 and higher. You can check out all of its features and how to use them through the documentation. If you'd like to try it out, Pulp Python can be installed from PyPI or source. Client bindings for Python and Ruby are also available.