The challenges of maintaining a large open source project

Open source dependency management is a balancing act

During my career I have spent a lot of time packaging other people's code, writing my own, and working on large software frameworks. I have seen projects that still haven't released a stable version, never quite hitting 1.0, while others made 1.0 releases within months of beginning development and then quickly moved on to 2.0, 3.0, etc. Release cycles vary widely, and this variance, coupled with the demands of maintaining a large project, can make things difficult.

I will go through some of the decisions we have faced in projects I have worked on and the pressures behind them. At one extreme, users would like a stable API that never changes, with dependencies that don't specify a minimum version so that they can choose whatever version works best. The other extreme pushes us to use the latest features of the language, hardware-accelerated APIs, compilers, and the libraries we depend upon.

Should you use these on day one, demanding the very latest versions, or give your users, distros, and others downstream time to update? When changing an API becomes necessary, do you change it immediately, or wait and batch changes into a major release where due warning is given, reducing the frequency of API changes?

Open Chemistry and C++11

When we started development of the Open Chemistry project, we looked quite seriously at requiring C++11, and I was dissuaded at the time by several in our community. We ended up using some small parts of C++11 that could be made optional, falling back to Boost implementations or empty macro definitions. Requiring C++11 was perhaps a little too aggressive at the time, but if I could go back I would tell my former self to go for it. The project was new, had few existing users, and was mainly targeting the desktop. Add to that the fact that adoption often takes a few years, and that supporting older compilers has a cost of its own.
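As a rough illustration of the "optional C++11 with empty macro fallback" approach described above, here is a minimal sketch. The `EXAMPLE_` macros and the `Reader` classes are hypothetical names invented for this example, not code from Open Chemistry, and the version test is an assumption about how the detection might be done (some compilers report `__cplusplus` conservatively, so real projects often detect features in the build system instead).

```cpp
#include <string>

// Hypothetical compatibility macros, for illustration only.
#if __cplusplus >= 201103L
  // C++11 or newer: expand to the real keywords.
  #define EXAMPLE_OVERRIDE override
  #define EXAMPLE_NULLPTR nullptr
#else
  // Older compilers: expand to nothing, or to a pre-C++11 substitute.
  #define EXAMPLE_OVERRIDE
  #define EXAMPLE_NULLPTR 0
#endif

class Reader
{
public:
  virtual ~Reader() {}
  virtual std::string formatName() const { return "generic"; }
};

class XyzReader : public Reader
{
public:
  // With C++11 the compiler verifies this overrides a base class method.
  // Without it the code still compiles, just without that check.
  std::string formatName() const EXAMPLE_OVERRIDE { return "xyz"; }
};

// Returns the format name, or "none" when no reader is supplied.
std::string describe(const Reader* reader)
{
  if (reader == EXAMPLE_NULLPTR)
    return "none";
  return reader->formatName();
}
```

The cost hinted at in the text shows up even in this small sketch: every new language feature needs its own macro, and the older code path silently loses the safety the feature was meant to provide.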

Several major projects, such as Qt, have only started to require C++11 in the last year, with others making the switch within a year or so either side of that. I think it is now quite clear that the advantages outweigh the costs, but requiring very new compilers can be a hard sell. This has become easier with Microsoft releasing a community edition of Visual Studio 2015 that features full C++11 support, with Linux and Mac having had well-supported C++11 compilers for a while. The secure and long-term support variants of Linux distributions still present some challenges, but it is largely a solved problem now.

I have made the rest of our dependencies work with Visual Studio 2015 (still the hardest platform to get dependencies working on well), and will remove our optional Boost fallback code soon. There are now other clear and distinct advantages, such as the pybind11 project, which offers a header-only Python wrapping API for C++ with no dependencies other than C++11. The language changes also make some local implementations irrelevant and remove the need for complex fallbacks in order to use threads, atomics, and other new features. The question of C++14 is still looming, and I am tempted to jump straight to that as it looks like all of our platforms have good support now. C++14 looks like more of a bug fix release, but it has some great fixes in it!
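To make the point about threads and atomics concrete, here is a minimal sketch that uses only the C++11 standard library, with no Boost fallback and no platform-specific threading API. The function `countAbove` is a hypothetical example written for this article rather than code from any of the projects mentioned. On some toolchains, linking it requires a threading flag such as `-pthread`.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Count how many values exceed a threshold, splitting the work across threads.
int countAbove(const std::vector<double>& values, double threshold,
               unsigned int threadCount)
{
  if (threadCount == 0)
    threadCount = 1;

  std::atomic<int> total(0);
  std::vector<std::thread> workers;
  // Size of each thread's slice, rounded up so every element is covered.
  const std::size_t chunk = (values.size() + threadCount - 1) / threadCount;

  for (unsigned int t = 0; t < threadCount; ++t) {
    const std::size_t begin = std::min<std::size_t>(values.size(), t * chunk);
    const std::size_t end = std::min<std::size_t>(values.size(), begin + chunk);
    workers.emplace_back([&values, &total, threshold, begin, end]() {
      int local = 0;
      for (std::size_t i = begin; i < end; ++i) {
        if (values[i] > threshold)
          ++local;
      }
      total += local; // atomic addition, safe across threads
    });
  }

  // Wait for every worker before reading the result.
  for (auto& worker : workers)
    worker.join();
  return total.load();
}
```

Before C++11, the same function would have needed either a third-party threading library or separate code paths per platform, which is the kind of fallback the paragraph above describes removing.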

Third-party libraries

As someone who spent many years packaging software for Gentoo Linux, I would curse your name if you bundled a bunch of third-party libraries in your project. As a software developer, I can totally see why projects do it and how it can make it much easier to get up and running. If the project is well behaved, developers will come up with a solution that builds bundled libraries by default and makes it easy for packagers and others to use system libraries instead.

This relies on the upstream third-party libraries maintaining stable APIs, on downstream projects sticking to those stable APIs, and on regular releases that downstream projects adopt quickly. For the Linux distributions, a security fix should ideally be applied in only one or two places, with all dependent projects then either rebuilt or simply linked to the updated library the next time they load.

This gets much more difficult when the library's API changes, or when the project's developers don't update their third-party libraries regularly (or update too often while the API is changing, forcing updates in everything that depends on them). The ideal solution is obviously to use projects with regular release cycles, to integrate their new releases rapidly, and to rely on a stable API that allows some flexibility in the version of the library that can be used.

On Windows and Mac OS X these libraries are rarely updated after the package has been installed, so tracking upstream and making sure any security fixes are integrated is even more important. Doing so is challenging, and it is one of the reasons why I have been using CMake's ExternalProject module to build dependencies: it offers many of the advantages of bundling third-party libraries, with simpler integration of updated versions.

OpenGL and scientific visualization

I authored a paper nearly two years ago that discusses the rewrite of the rendering backend for the Visualization Toolkit (VTK). I think that rewrite was long overdue, and as someone who had written rendering code before joining Kitware I found it frustrating that we couldn't use the latest advances in OpenGL (or at least some from the last decade). I knew that several rendering workloads I was interested in could benefit from an update, and it seemed like this would be quite universal.

I had started looking at how we might make some incremental changes, but the problem was that OpenGL had changed so significantly that it was tough. Others agreed that a rewrite was needed, but felt that more funding was needed to rewrite all of the rendering code. This was quite an undertaking, and is still not finished today. Great progress has been made, but there are always new features people want, optimizations, and new parts of the rendering drivers being written to get the most out of the latest graphics cards.

We called the new rendering backend OpenGL2; the old one was mostly OpenGL 1.1, with some runtime detection of newer features. Initially we were targeting OpenGL 2.1 on the desktop and OpenGL ES 2.0 on embedded systems, making the name OpenGL2 feel quite apt. My research also made it clear that OpenGL 2.1 and OpenGL ES 2.0 share a common subset of the API that a number of other projects use. As the project evolved there was a wish to use more and more from OpenGL 3.2 and OpenGL ES 3.0, and the minimum requirement moved up accordingly.
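Enforcing a minimum version like this usually comes down to a runtime check against what the driver reports. The sketch below is a hypothetical illustration, not VTK's code: the names `GLVersion`, `parseVersion`, `meetsMinimum`, and `chooseBackend`, and the "legacy" label, are invented here. It assumes a desktop OpenGL version string that begins with "major.minor", the form returned by `glGetString(GL_VERSION)`, and takes that string as a plain argument so the example runs without a graphics context.

```cpp
#include <cstdio>
#include <string>

struct GLVersion
{
  int majorVersion;
  int minorVersion;
};

// Parse the leading "major.minor" from a string such as "3.2.0 NVIDIA 390.25".
// Returns false if the string does not start with two dot-separated integers.
bool parseVersion(const std::string& text, GLVersion& out)
{
  int first = 0;
  int second = 0;
  if (std::sscanf(text.c_str(), "%d.%d", &first, &second) != 2)
    return false;
  out.majorVersion = first;
  out.minorVersion = second;
  return true;
}

// True when the available version is at least the required one.
bool meetsMinimum(const GLVersion& have, const GLVersion& need)
{
  if (have.majorVersion != need.majorVersion)
    return have.majorVersion > need.majorVersion;
  return have.minorVersion >= need.minorVersion;
}

// Choose a backend label based on an assumed minimum of OpenGL 3.2.
std::string chooseBackend(const std::string& versionString)
{
  const GLVersion required = { 3, 2 };
  GLVersion have = { 0, 0 };
  if (parseVersion(versionString, have) && meetsMinimum(have, required))
    return "OpenGL2";
  return "legacy";
}
```

Raising the minimum is then a one-line change to `required`, but every increase moves some hardware and software renderers from the first branch to the second, which is the trade-off discussed here.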

We have had trouble with some systems/chips that we would like to deploy to, and software rendering was challenging at times. Overall we have been able to make things work. It comes down to a fine balance: the extremes can be clear, but the middle ground often ends up feeling harder to define. There is a real cost to pushing too far toward either extreme. I think that VTK was in danger of being left behind due to its use of very old APIs, but were we to start requiring OpenGL 4.4 next week we would be far out at the other end of the spectrum.

Qt 3 vs. 4 vs. 5

I have seen three major releases of Qt during my development career, with the transition from 3 to 4 being the most challenging. Each of these updates has involved quite significant changes, though far fewer in the 4 to 5 transition. The Avogadro 2 project moved to Qt 5 several years ago as it was newer (it began with 4, and is another case where in retrospect I wish we had jumped forward a little sooner). In the Tomviz project we reuse ParaView, and used Qt 4 until about a year ago, at which point we did the work to move over to 5.

The ParaView project itself still defaults to Qt 4, and supporting Qt 4 on modern operating systems is becoming increasingly difficult. They have supported both for a while, but they still have several issues holding them back. Modern operating systems have moved on significantly, with high resolution displays and updates to the user interface both causing significant divergence from the Qt 4 target.

Qt 5 also made significant changes to the way that Qt integrates with the host operating system and how it interacts with OpenGL. Add to this significant changes to the threading models and updates to take advantage of new features in C++11 and beyond, and supporting both major versions can be hard.

Python 2 vs. 3

This has kept me up at night, and several projects have had to choose which way to jump or straddle such that they can offer both. In the VTK project our generator code had to be updated, and we are able to offer Python 2 or 3, but haven't put in the work to offer both simultaneously. The ParaView project, which builds on VTK, recently updated its code to support both Python 2 and 3 but Python 3 support is still very experimental there. I am hoping to update Tomviz in the next month or two to offer support for Python 3, and this is becoming increasingly important as more Python modules switch to Python 3 only.

On the web side I worked on a project that extended the Girder platform, which itself uses CherryPy, PyMongo, and many other modules. When we started the project it only supported Python 2, but they added support for Python 3 shortly after—enabling us to switch to Python 3. There we decided not to support both as it was a new project, and we knew it was unlikely to be widely used in the short term. At the Google Summer of Code mentor summit several in the Python community expressed frustration with the situation and stated they had mixed feelings on whether the cost of supporting both and forgoing new language features was worth it.

Closing thoughts

Hopefully we can maintain a good middle ground that best serves our users, and be cognizant of the cost of being too conservative or too aggressive. Most developers are eager to use the latest features, and it can be extremely frustrating to know there is a better way that cannot be employed. I think there is a significant cost to being too conservative, but I have seen other projects that update and change too aggressively lose mind share.

2 Comments

igoddard

Why do people offer APIs? Presumably because they want people to use them. What are they offering to make the API useful to others? The functionality, certainly. But that's only one part of what makes it useful. Stability must be another. Who would want to commit to using a specific API if they knew its authors were going to keep changing it? The offer of an API ought to be seen as carrying an implicit promise to maintain stability.

mhanwell

As I said in the article in an ideal world APIs would be stable forever, but the reality is that this is rarely possible. New technologies are developed, better ways of batching commands, composing data, etc. If you don't evolve your API to take advantage of these things then people will often move on to libraries that do, and this is why most libraries have used version numbers to signal what might have changed - major version could be API changes, minor version is usually additions to API, bug fix release just fixes bugs/issues. There is an implicit promise, but there is also an implicit promise to continue developing the library, adding features, and taking advantage of new technology/approaches/ideas.