Copyright notices in source code are inconsistently applied and poorly maintained. As a result, such notices are poor sources of information. Should more resources be applied to the maintenance of copyright notices? No.
Copyright notices are one-line strings that typically include the word "Copyright" (or some substitute, like ©), a name (usually a person or company), and a year.
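As a rough illustration of that shape, the sketch below uses a regular expression to pull the year and the name out of a one-line notice. The pattern and the function name `parse_notice` are assumptions made up for this example. Real-world notices vary far more than the pattern allows, and a match says nothing about whether a notice is accurate or meets any statutory requirement.

```python
import re

# Hypothetical pattern: a marker ("Copyright", "(c)", or the © sign), then a
# year or year range, then a name. Real notices vary far more than this.
NOTICE_RE = re.compile(
    r"(?:Copyright|\(c\)|©)(?:\s*(?:\(c\)|©))?\s+"
    r"(?P<years>\d{4}(?:\s*-\s*\d{4})?)\s+"
    r"(?P<name>.+?)\s*$",
    re.IGNORECASE,
)


def parse_notice(line):
    """Return (years, name) if the line looks like a notice, else None."""
    match = NOTICE_RE.search(line)
    if match is None:
        return None
    return match.group("years"), match.group("name")
```

A line that merely mentions the word "Copyright" without a year, such as a sentence in a license text, is not treated as a notice by this sketch.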
In this article, I am not focusing on licenses or license notices (which may sometimes include a copyright notice). My suggestion to give copyright notice maintenance a low priority does NOT apply to license information. License information should be clearly presented and kept accurate. If you invite others to take and do something with your software, please make clear what permissions you are granting by presenting and maintaining clear license information.
Returning to copyright notices: What is their legal significance? If you think copyright notices satisfy a legal requirement or at least provide a significant legal benefit, think again. The legal significance of such notices in open source software is so small that one can easily find practical considerations outweighing the legal significance.
While these notices may appear important, their presence in source code today is largely a residue of the copyright law of the past. There was a time when failure to include a copyright notice in published material could result in complete loss of rights under US copyright law; that changed when the United States finally joined the many other countries that were already parties to the Berne Convention (US accession to the treaty came on November 16, 1988, and became effective in the US on March 1, 1989).
If these notices have some utility in open source software, a project need not maintain notices that meet the US statutory requirements for a "copyright notice." It could instead adopt conventions that take less effort to maintain and still provide some practical value.
As US Copyright Law has been such a significant factor in driving the use of copyright notices, this is where I will dive more deeply. The US Copyright Office publishes guidance documents known as Circulars. Circular 3, Copyright Notice, includes:
"Copyright notice was required for all works first published before March 1, 1989, subject to some exceptions discussed below. If the notice was omitted or a mistake was made in using copyright notice, the work generally lost copyright protection in the United States. Copyright notice is optional for works published on or after March 1, 1989, unpublished works, and foreign works; however, there are legal benefits for including notice on your work."
That passage makes it clear that, in the US, copyright notices were very important as recently as 1988. But when the US joined the many other countries that were parties to the Berne Convention, the critical role of copyright notices under US law was eliminated by the Convention's provision that: "The enjoyment and the exercise of these rights shall not be subject to any formality…"
Software projects at MIT (the X Window System) and at the University of California, Berkeley (Berkeley Software Distribution) produced early license texts that originated when the draconian notice-or-lose-it requirement was still in force (or at least fresh in the minds of those contributing to the text of these licenses). As a result of that timing, these license texts have explicit language about reproducing copyright notices.
With the continued pervasive use of licenses based on these texts, most developers of open source software have seen licenses that appear to place importance on copyright notices. But those texts were created with an earlier legal regime in mind. We are now 30 years past the time when the no-formalities-required feature of the Berne Convention (which most other countries had already embraced) was first applicable to the US. To appreciate the extent of Berne Convention adoption, see the list of contracting parties maintained by the World Intellectual Property Organization, which administers the Berne Convention.
You might be wondering about those "legal benefits" mentioned in the quote above. The answer is at the end of Circular 3:
Although notice is optional for unpublished works, foreign works, or works published on or after March 1, 1989, using a copyright notice carries the following benefits:
- Notice makes potential users aware that copyright is claimed in the work.
- In the case of a published work, a notice may prevent a defendant in a copyright infringement action from attempting to limit his or her liability for damages or injunctive relief based on an innocent infringement defense.
- Notice identifies the copyright owner at the time the work was first published for parties seeking permission to use the work.
- Notice identifies the year of first publication, which may be used to determine the term of copyright protection in the case of an anonymous work, a pseudonymous work, or a work made for hire.
- Notice may prevent the work from becoming an orphan work by identifying the copyright owner and specifying the term of the copyright.
That's it. That's the benefit.
I have quoted from the US Copyright Office Circular 3 because it provides a slightly more readable phrasing of the requirements than in the underlying statute. The statutory law at the federal level in the United States is codified in what is known as the United States Code, which is organized as a set of "titles." Title 17 is Copyrights. The details of copyright notices are found in sections 401–406 of that title. One can start at 17 USC 401. See 17 USC 401(b) for a description of the three elements that the statute requires to be present in a copyright notice. If you want to see the details of the "Effect of Omission on Innocent Infringers," see 17 USC 405(b).
To provide more accurate information, why not clean up the copyright notices in a codebase? Awkwardly, 17 USC 506(c) (Fraudulent Copyright Notice), 17 USC 506(d) (Fraudulent Removal of Copyright Notice), and 17 USC 1202(a) (False Copyright Management Information) provide some disincentive (even if limited to bad intent). With low value and some risk of getting it wrong when making changes, it is no wonder that more resources are not applied to the maintenance of copyright notices.
Some people and some companies place emphasis on putting detailed notices into code that they make available under an open source license; others do not. As open source projects develop, some contributions may include notices, while others do not. A file may include an original notice and no other notices, even though its content has changed substantially from its original version. Or a later contributor might add a notice to a file that previously had none. And what about that element of copyright notices that is the "year of first publication of the work"? What does that mean? Different people have different practices. Should the year be updated when a file changes? What about when other contributions are made?
As to drawing conclusions from mining copyright notice data, be cautious. Have low expectations.
What should an open source project do?
Please, present and maintain clear, accurate license information.
As to copyright notices, it is difficult to justify investing in maintaining their details. But some people may expect notices to be present. As to the "origin of the software," perhaps it would be more useful and more accurate to simply refer to the project, rather than attempt to capture something more fine-grained. Year of publication? This is unlikely to be worth the trouble of maintaining manually in source files; source management tools provide more accurate information at lower resource cost.
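To make the point about source management tools concrete, here is a minimal sketch that asks Git for the years in which a file was changed, instead of reading a hand-maintained year in the file. It assumes the project uses Git and that the script runs inside the repository. The function names `year_span` and `file_year_span` are made up for this example. Commit author dates are not the same thing as the statutory "year of first publication," so treat the result as project history, not as a legal fact.

```python
import subprocess


def year_span(log_output):
    """Return (first_year, last_year) from text with one year per line."""
    years = [int(line) for line in log_output.split() if line.isdigit()]
    if not years:
        return None
    return min(years), max(years)


def file_year_span(path):
    """Ask Git for the author-date year of every commit touching `path`."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=%ad", "--date=format:%Y", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return year_span(result.stdout)
```

Calling `file_year_span("README.md")` in a repository would return the earliest and latest commit years for that file, or `None` if Git reports no history for it.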
For more details on a practical approach, I direct your attention to an excellent reconsideration of copyright notice practice: Copyright notices in open source software projects, by Steve Winslow, January 10, 2020.