What's the difference between a fork and a distribution?

Open source software distributions and forks are not the same. Learn the differences and the potential risks.
Forks and spoons, Open Office and Libre Office

Photo by Jason Hibbets

If you've been around open source software for any length of time, you've heard the terms fork and distribution thrown around casually in conversation. For many people, the distinction between the two isn't clear, so here I'll try to clear up the confusion.

First, some definitions

Before explaining the nuances of a fork vs. a distribution and the pitfalls thereof, let's define key concepts.

Open source software is software that:

  • Is freely available to distribute, subject to certain license restrictions
  • Permits its source code to be viewed and modified, subject to certain license restrictions

Open source software can be consumed in the following ways:

  • Downloaded in binary or source code format, often at no charge (e.g., the Eclipse developer environment)
  • As a distribution (product) by a vendor, sometimes at a cost to the user (e.g., Red Hat products)
  • Embedded into proprietary software solutions (e.g., some smartphones and browsers display fonts using the open source FreeType software)

Free and open source software (FOSS) is not necessarily "free" as in "zero cost." Free and open source simply means the software is free to distribute, modify, study, and use, subject to the software's licensing. The software distributor may attach a purchase price to it. For example, Linux is available at no cost as Fedora, CentOS, Gentoo, etc., or as a paid distribution such as Red Hat Enterprise Linux, SUSE, etc.

Community refers to the organizations and individuals that work collaboratively on an open source project. Any individual or organization can contribute to the project by writing or reviewing code, documentation, and test suites, by managing meetings, by updating websites, etc., provided they abide by the license. For example, at Openhub.net, we see government, nonprofit, commercial, and education organizations contributing to some open source projects.

An open source project is the result of this collaborative development, documentation, and testing. Most projects have a central repository where code, documentation, testing, and so forth are developed.

A distribution is a copy, in binary or source code format, of an open source project. For example, CentOS, Fedora, Red Hat Enterprise Linux, SUSE, Ubuntu, and others are distributions of the Linux project. Tectonic, Google Kubernetes Engine, Amazon Elastic Kubernetes Service (EKS), and Red Hat OpenShift are distributions of the Kubernetes project.

Vendor distributions of open source projects are often called products. For example, Red Hat OpenStack Platform is Red Hat's product distribution of the upstream OpenStack project, and it is still 100% open source.

The trunk is the main workstream in the community where the open source project is developed.

An open source fork is a version of the open source project that is developed along a separate workstream from the main trunk.

Thus, a distribution is not the same as a fork. A distribution is a packaging of the upstream project that is made available by vendors, often as products, and the core code and documentation in the distribution adhere to the version in the upstream project. A fork, and any distribution based on the fork, results in a version of the code and documentation that is different from the upstream project. Users who have forked upstream open source code have to maintain it on their own, meaning they lose the benefit of the collaboration that takes place in the upstream community.
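One way to picture the difference is to treat the trunk, a distribution, and a fork as positions in a commit history and count how far each has drifted from the trunk. The following is only a minimal sketch under stated assumptions: the commit names (u1 to u4, g1 to g3), the history layout, and the helper functions ancestors and divergence are all hypothetical and are not taken from any real project or tool. It also assumes the simplest case, where a distribution ships an unmodified snapshot of the trunk.

```python
def ancestors(history: dict[str, list[str]], tip: str) -> set[str]:
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(history[commit])  # walk back through the parents
    return seen


def divergence(history: dict[str, list[str]], tip: str, trunk_tip: str) -> tuple[int, int]:
    """Return (ahead, behind): commits only in `tip`, commits only in the trunk."""
    ours = ancestors(history, tip)
    theirs = ancestors(history, trunk_tip)
    return len(ours - theirs), len(theirs - ours)


# Hypothetical history: each commit maps to its list of parent commits.
history = {
    "u1": [],
    "u2": ["u1"],
    "u3": ["u2"],
    "u4": ["u3"],  # current tip of the trunk
    "g1": ["u1"],  # a fork that split off early...
    "g2": ["g1"],
    "g3": ["g2"],  # ...and never merged the trunk again
}

# Assumption: the distribution packages the trunk as of commit u3.
distribution = divergence(history, "u3", "u4")
fork = divergence(history, "g3", "u4")

print("distribution (ahead, behind):", distribution)  # (0, 1)
print("fork (ahead, behind):", fork)                  # (3, 3)
```

In this toy model the distribution carries no commits of its own and is merely one release behind, so catching up is a matter of repackaging. The fork is both ahead and behind, and every commit on either side is work someone has to reconcile.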

To further explain a software fork, let's use the analogy of migrating animals. Whales and sea lions migrate from the Arctic to California and Mexico; monarch butterflies migrate from Canada and the United States to Mexico; and (in the Northern Hemisphere) swallows and many other birds fly south for the winter. The key to a successful migration is that all animals in the group stick together, follow the leaders, find food and shelter, and don't get lost.

Risks of going it alone

A bird, butterfly, or whale that strays from the group loses the benefit of remaining with the group and knowing where to find food, shelter, and the desired destination.

Similarly, users or organizations that fork and modify an upstream project and maintain it on their own run the following risks:

  1. They cannot easily update their code from the upstream because their code differs. This divergence is a form of technical debt: the more changes made to forked code, the more it costs in time and money to rebase the fork onto the upstream project.
  2. They potentially run less secure code. If a vulnerability is found in open source code and fixed by the community in the upstream, a forked version of the code may not benefit from this fix because it is different from the upstream.
  3. They might not benefit from new features. The upstream community, using input from many organizations and individuals, creates new features for the benefit of all users of the upstream project. If an organization forks the upstream, they potentially cannot incorporate the new features because their code differs.
  4. They might not integrate with other software packages. Open source projects are rarely developed as single entities; rather they often are packaged together with other projects to create a solution. Forked code may not be able to be integrated with other projects because the developers of the forked code are not collaborating in the upstream with other participants.
  5. They might not be certified on hardware platforms. Software packages are often certified to run on hardware platforms so that, if problems arise, the hardware and software vendors can collaborate to find the root cause of the problem. Forked code typically falls outside such certifications.
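The second risk in the list above can be illustrated with a toy patch model. This is a minimal sketch, not how any real patch tool works: the Patch class, the apply_patch helper, and the sample source lines (including the names HEADER and MAX_READ inside them) are all hypothetical, and a patch here is simply "replace this exact line with that one."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A toy patch: replace one exact line with another (hypothetical model)."""
    old_line: str
    new_line: str


def apply_patch(source: list[str], patch: Patch) -> list[str]:
    """Return a patched copy, or raise ValueError if the expected line is missing."""
    if patch.old_line not in source:
        raise ValueError("patch does not apply: expected line not found")
    return [patch.new_line if line == patch.old_line else line for line in source]


# The code as it exists in the upstream project.
upstream = [
    "def read(buf, size):",
    "    return buf[:size]",
]

# The fork changed the same line for its own purposes.
fork = [
    "def read(buf, size):",
    "    return buf[:size + HEADER]",
]

# The community's security fix, written against the upstream code.
security_fix = Patch(
    old_line="    return buf[:size]",
    new_line="    return buf[:min(size, MAX_READ)]",
)

patched_upstream = apply_patch(upstream, security_fix)

try:
    apply_patch(fork, security_fix)
    fork_fixed = True
except ValueError:
    fork_fixed = False

print("upstream fixed:", patched_upstream != upstream)  # True
print("fork fixed:", fork_fixed)                        # False
```

The fix lands cleanly upstream but is rejected by the fork because the line it expects is no longer there. In practice the fork's maintainers would have to notice the fix, understand it, and port it by hand.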

In summary, an open source distribution is simply a packaging of an upstream, multi-organizational, collaborative open source project sold and supported by a vendor. A fork is a separate development workstream of an open source project and risks not being able to benefit from the collaborative efforts of the upstream community.

Jonathan Gershater has lived and worked in Silicon Valley since 1996, and is primarily focused on cloud and virtualization. He has experience with VMware, KVM, Hyper-V, AWS, OpenStack, and CloudStack technologies. At Red Hat, Jonathan focuses on competitive differentiation for Red Hat’s virtualization and OpenStack solutions.

2 Comments

Alternative description for Fork. A Fork is like a split in the road. Getting to the same place if you take different directions will be harder though not impossible. You just have to work out how to get to the same place, and if it's worth not following a single route.

Distribution is hard though. I'm pretty sure most distros maintain their own patches as well as pulling from mainline, and they also contribute to it. I don't believe they follow the group by default so much as work out whether they agree with the group.

More worryingly, though, distributions often struggle to keep up with mainline projects, which creates problems for some users. It's kind of a catch-22. If you're not using a distribution, you probably need lots of support, which is expensive in both time and money.

If you use the distribution, they won't support non-distro features, so you lose control. This is why some large companies seem like their technology is light-years behind. They have decided that paying for more than the distribution is prohibitively expensive and doesn't align with their goals. By doing this, they leave themselves vulnerable to whatever the buy-in to that distro is.

For desktop and general-consumer users, this model has been problematic since the beginning.

I remember in the '90s asking someone how to make a splash screen, and they berated me about how I should watch the init process and be grateful for the Tux graphic in the corner.

Hardware often wouldn't work for months. Fast-forward, and device vendors are doing great work with open source communities to push out drivers, ensuring support (and sales). It's a fragile balance though. You might get a device like an ARM SBC that runs on a sunxi fork of Linux, or a smartphone you cannot flash or control.

Often those are abandoned a few months in and never updated again. Companies like Hardkernel, SparkFun, and even the Raspberry Pi Foundation release hardware that cannot be fully supported without their involvement.

The part I recommend adding to this article is an explanation of the difference, in the modern Git-based workflow, between 'bad forking' and 'good forking'. The appearance of 'fork me in git' on projects feeds confusion here.

Forking in Git can be a good thing if the forking developers are working on a side branch with the intention of merging it back into the main codebase. This sort of 'good forking' is part of the social coding ethos that is made massively easier by the quality of the Git toolchain.

In addition to the 'bad forking' that Jonathan describes in this article, there is the use of the fork as a 'nuclear option'. That is, a community uses its right to fork as a way to force consensus with wayward leaders; otherwise, the community will simply take its ball and go play elsewhere. This is a third type of fork that is neither good nor bad in itself, but it usually indicates that something untoward has happened that the community is responding to.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.