Should open source projects have copyright statements in their code?
Copyright statements proliferate inside open source code
I was looking at a source file for the OpenStack Ceilometer docs one day and noticed that there's a copyright statement at the top. Now, in no way do I want to pick on Nicholas. There are hundreds of such copyright statements in the OpenStack docs and code, and this is just the example I happened to be looking at.
(Note that my employer has its share of copyright statements in the OpenStack code. Pretty much every company participating in OpenStack does this. I think we need to stop.)
I sent a note to the OpenStack-docs list, and it has generated a thread of remarks. As I understand it, people are encouraged to put copyright statements in contributed source code and documentation, and to add copyright lines to files that they modify.
I believe this to be a very bad thing to do for the following reasons:
- If I edit a file and it says at the top that the file is copyright BigCo, I am discouraged from editing that file, because of the implication that I'm treading on someone else's toes. This actively discourages people from jumping in and fixing stuff. Files should not have any indication that they are "owned" by any one person or company. (See this by Karl Fogel for more on "owning" code.)
- If N people contribute to a file, are we supposed to have N copyright statements in the file? This doesn't scale over time. Imagine what these files will look like 10 years from now, and fix the problem now.
- Having author names in a file encourages people to contribute for the wrong reasons.
- Git keeps track of who contributed what changes. It's not necessary to have explicit copyright statements.
The first of those reasons is, to me, the most compelling. Anything that discourages contribution, particularly from beginners, should be eschewed as much as possible.
I have had people ask me, when encountering a copyright statement in source code, whether they have to ask that person's permission before submitting a patch. If we can avoid even one person asking this question, we've done a service to the project.
I also worry that companies that insist on copyright statements in their contributions understand neither copyright law nor Open Source. On the one hand, the audit trail in Git protects your record of contribution, and thus your copyright. On the other hand, if your copyright is that important to you, perhaps you shouldn't be contributing that work to an Open Source project. It's anti-community to say to a project that they can have your contribution, but only as long as you get to assert that it's your personal property. Open Source is about community and collaboration. If the building is owned by the community, what do you gain by insisting that a particular brick is yours?
At the Apache Software Foundation, we had this debate a decade ago and decided that author tags in source code were anti-community, and thus to be discouraged. To quote Sander Striker from the thread of comments at the time:
At the Apache Software Foundation we discourage the use of author tags in source code. There are various reasons for this, apart from the legal ramifications. Collaborative development is about working on projects as a group and caring for the project as a group. Giving credit is good, and should be done, but in a way that does not allow for false attribution, even by implication. There is no clear line for when to add or remove an author tag. Do you add your name when you change a comment? When you put in a one-line fix? Do you remove other author tags when you refactor the code and it looks 95% different? What do you do about people who go about touching every file, changing just enough to make the virtual author tag quota, so that their name will be everywhere?
There are better ways to give credit, and our preference is to use those. From a technical standpoint author tags are unnecessary; if you wish to find out who wrote a particular piece of code, the version control system can be consulted to figure that out. Author tags also tend to get out of date. Do you really wish to be contacted in private about a piece of code you wrote five years ago and were glad to have forgotten?
This is a slightly different issue (author tags rather than copyright statements) but makes exactly the same point. Do I add a copyright statement when I correct grammar or spelling in a doc? How about when I add a paragraph or reorder sentences for greater clarity? At what point do I remove your copyright statement because I've changed so much of that file?
And then, of course, you should consider the bigger question: why do you care? What are you trying to protect against? If you're trying to protect against your contribution being taken by the community and used for other purposes, perhaps contributing to an Apache-licensed code base isn't the smartest thing to do.
Originally posted on Notes in the Margin. Reposted with permission.