Copyright statements proliferate inside open source code

Opensource.com

I was looking at a source file for the OpenStack Ceilometer docs one day and noticed that there's a copyright statement at the top. Now, in no way do I want to pick on Nicholas. There are hundreds of such copyright statements in the OpenStack docs and code, and this is just the example I happened to be looking at.

(Note that my employer has its share of copyright statements in the OpenStack code. Pretty much every company participating in OpenStack does this. I think we need to stop.)

I sent a note to the OpenStack-docs list, and it has generated a thread of remarks. As I understand it, people are encouraged to put copyright statements in contributed source code and documentation, and add copyright lines to files that they modify.

I believe this to be a very bad thing to do for the following reasons:

  • If I edit a file and it says at the top that the file is copyright BigCo, I am discouraged from editing that file, because of the implication that I'm treading on someone else's toes. Files should not have any indication that they are "owned" by any one person or company. (See this by Karl Fogel for more on "owning" code.) This actively discourages people jumping in and fixing stuff.
  • If N people contribute to a file, are we supposed to have N copyright statements in the file? This doesn't scale over time. Imagine what these files will look like 10 years from now, and fix the problem now.
  • Having author names in a file encourages people to contribute for the wrong reasons.
  • Git keeps track of who contributed what changes. It's not necessary to have explicit copyright statements.

The first of those reasons is, to me, the most compelling. Anything that discourages contribution, particularly from beginners, should be eschewed as much as possible.

I have had people ask me, when encountering a copyright statement in source code, whether they have to ask that person's permission before submitting a patch. If we can avoid even one person asking this question, we've done a service to the project.

I also worry that companies that insist on copyright statements in their contributions understand neither copyright law nor Open Source. On the one hand, the audit trail in Git protects your record of contribution, and thus your copyright. On the other hand, if your copyright is that important to you, perhaps you shouldn't be contributing it to an Open Source project. It's anti-community to say to a project that they can have your contribution, but only as long as you get to assert that it's your personal property. Open Source is about community and collaboration. If the building is owned by the community, what do you gain by insisting that a particular brick is yours?

At the Apache Software Foundation, we had this debate a decade ago, and decided that author tags in source code were anti-community, and thus discouraged. To quote from the thread of comments at the time, and, in particular, to quote Sander Striker:

At the Apache Software Foundation we discourage the use of author tags in source code. There are various reasons for this, apart from the legal ramifications. Collaborative development is about working on projects as a group and caring for the project as a group. Giving credit is good, and should be done, but in a way that does not allow for false attribution, even by implication. There is no clear line for when to add or remove an author tag. Do you add your name when you change a comment? When you put in a one-line fix? Do you remove other author tags when you refactor the code and it looks 95% different? What do you do about people who go about touching every file, changing just enough to make the virtual author tag quota, so that their name will be everywhere?

There are better ways to give credit, and our preference is to use those. From a technical standpoint author tags are unnecessary; if you wish to find out who wrote a particular piece of code, the version control system can be consulted to figure that out. Author tags also tend to get out of date. Do you really wish to be contacted in private about a piece of code you wrote five years ago and were glad to have forgotten?

This is a slightly different issue (author tags rather than copyright statements) but makes exactly the same point. Do I add a copyright statement when I correct grammar or spelling in a doc? How about when I add a paragraph or reorder sentences for greater clarity? At what point do I remove your copyright statement because I've changed so much of that file?

And then of course, you should consider the bigger question—why do you care? What are you trying to protect against? If you're trying to protect against your contribution being taken by the community and used for other purposes, perhaps contributing to an Apache-licensed code base isn't the smartest thing to do.

Originally posted on Notes in the Margin. Reposted with permission.

Rich Bowen
Rich is an Open Source Advocate at AWS. He's a director, member, and VP Conferences, at The Apache Software Foundation.

12 Comments

Maybe a source control system keeps track of that - but I'm old skool. Each and every line that has changed in my source files goes into a change log: what has changed, who changed it and when. I find it invaluable when I'm scratching my head over what certain code does - and why. In that case, I do think there is a value to "authorship". As you correctly state: there is a difference between authorship and copyright. As far as I'm concerned, every change in my project is mine - no matter who did it. If you want to have full authorship of a certain change, fork the project.

"Maybe a source control system keeps track of that - but I'm old skool. Each and every line that has changed in my source files goes into a change log: what has changed, who changed it and when."

That's not old skool, that's just *terrible*. SCMs exist so you don't have to do that stuff. Go check out a git codebase and run 'git blame' - you'll look like one of those animated GIFs people are always passing around...

No, that's not terrible, that's discipline. Something some programmers are completely lacking, judging by what they enter into CVS, SVN or Git because they assume that the SCM "will take care of that". Well, it doesn't - it only does when discipline is there. Moreover, a change log will stay with the source code all along (when it's not manually removed, of course). Whether a Git repository stays clearly tied to the code remains to be seen. Do you also think that Doxygen will automagically create the proper documentation?

git will always know who checked in what. It is incapable of *not* knowing that.

"Why" requires you to write a good commit message, but that's the appropriate level of/for discipline. The business of tracking who changed what to what is not productive work, and if you can outsource it to a tool, you should.

Have you actually played with git for an afternoon? It really is pretty much the best thing since sliced bread.

What I would really want is to pass git log a line number range in addition to a file path, so that it shows the history of that block of code.

I can do "grep <function name> <sourcefile>" and it does just that. Note the date and name of the developer take about 16 characters. That's rather a small saving for an SCM - and that's not why I use it. An SCM has far greater advantages than just "keeping the books".

You mean you want git log to have a shortcut to run git blame -L$start,$end -- $path? Can your manual changelog match the git blame feature of using a regex for $start/$end instead of a literal line number? How readily can you follow code that moved from one file to another, possibly multiple times between when it was first written and where it is now? Git blame does this automatically for full file renames, and can be asked to do it for more granular blocks by using -C and -M.
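For anyone who wants to try the line-range history discussed in this thread, here is a minimal shell sketch. It assumes Git is installed and on the path. The throwaway repository, the file name notes.txt, the commit messages, and the author identity are all made up for illustration. It runs the git blame -L form quoted above, then the equivalent git log -L <start>,<end>:<file> form, which walks the history of just those lines.

```shell
#!/bin/sh
# Sketch only: builds a throwaway repository so the commands have history to inspect.
# The repo location, file name, and author identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# First commit: a three-line file.
printf 'line one\nline two\nline three\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Second commit: change only line 2.
printf 'line one\nline TWO\nline three\n' > notes.txt
git commit -q -am "Shout line two"

# Who last touched lines 2-3 of the file?
blame_out=$(git blame -L 2,3 -- notes.txt)
echo "$blame_out"

# Which commits touched line 2, newest first? -L also prints the matching diffs.
log_out=$(git log -L 2,2:notes.txt --format='%s')
echo "$log_out"
```

The blame output has one row per requested line, each with the commit, author, and date that last changed it. The log output lists every commit that touched the range, which is the "history of this code block" asked for earlier in the thread.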

Open Source is not Public Domain. An Open Source application is copyrighted - or rather copylefted. If the copyright notice were removed, the code would be Public Domain. Public Domain is nearly the opposite of Open Source [or copylefted] code. Anyone may, at any time, without reference, deference, or even a nod, take Public Domain code and incorporate it into a project of any kind, proprietary or open source or otherwise. With a copyleft notice, such as the GPL, the author of the code is NOT giving the code to you to do with as you please. This is very badly understood by many people...

Sorry, but this comment is almost completely false. Attribution of copyright ownership is absolutely not required for copyright to be valid on a work, unless you happen to be in a country that is not a signatory to the Berne Convention. It does not become public domain without the attribution, and it is wholly wrong to assume that an unattributed work is in the Public Domain.

Additionally, while it is correct to say that Open Source licensed files are not in the Public Domain, since Public Domain files have had their copyright waived (or expired), they are completely compatible with Open Source licensed works.

Understanding Public Domain is rather difficult (and it is quite possibly not a valid option in many parts of the world due to Moral Rights), and it is for that reason that we have licenses like Creative Commons Zero.

You are not seeing the real meaning of a copyright comment. As Fossman said, copyright is not purely about authorship; it is about protecting your code and being able to protect the freedom you are attaching to it.

The Software Freedom Law Center has a publication that talks about file-scope copyright statements v. centralized ones: http://www.softwarefreedom.org/resources/2012/ManagingCopyrightInformation.html

I think it is far more valuable to have per-file license attribution than per-file copyright owner(s) and date. Far too often, an effort to remove the latter also removes (or prohibits) the former, leaving everyone confused when that code migrates into other projects.

I fully agree with you.
This practice tends to go against collective code ownership described in XP rules ( http://www.jamesshore.com/Agile-Book/collective_code_ownership.html ).

Thank you for encouraging collaboration in open source.