Having access to source code makes it possible to analyze the security and safety of applications. But if nobody actually looks at the code, the issues won’t get caught, and even when people are actively looking at code, there’s usually quite a lot to look at. Fortunately, GitHub has an active security team, and recently, they revealed a Trojan that had been committed into several Git repositories, having snuck past even the repo owners. While we can’t control how other people manage their own repositories, we can learn from their mistakes. To that end, this article reviews some of the best practices when it comes to adding files to your own repositories.
Know your repo
This is arguably Rule Zero for a secure Git repository. As a project maintainer, whether you started it yourself or you’ve adopted it from someone else, it’s your job to know the contents of your own repository. You might not have a memorized list of every file in your codebase, but you need to know the basic components of what you’re managing. Should a stray file appear after a few dozen merges, you’ll be able to spot it easily because you won’t know what it’s for, and you’ll need to inspect it to refresh your memory. When that happens, review the file and make sure you understand exactly why it’s necessary.
Ban binary blobs
Git is meant for text, whether it’s C or Python or Java written in plain text, or JSON, YAML, XML, Markdown, HTML, or something similar. Git isn’t ideal for binary files.
It’s the difference between this:
$ cat hello.txt
This is plain text.
It's readable by humans and machines alike.
Git knows how to version this.
$ git diff hello.txt
diff --git a/hello.txt b/hello.txt
index f227cc3..0d85b44 100644
@@ -1,2 +1,3 @@
This is plain text.
+It's readable by humans and machines alike.
Git knows how to version this.
$ git diff pixel.png
diff --git a/pixel.png b/pixel.png
index 563235a..7aab7bc 100644
Binary files a/pixel.png and b/pixel.png differ
$ cat pixel.png
The data in a binary file can’t be parsed in the same way plain text can be parsed, so if anything is changed in a binary file, the whole thing must be rewritten. The only difference between one version and the other is everything, which adds up quickly.
Worse still, binary data can’t be reasonably audited by you, the Git repository maintainer. That’s a violation of Rule Zero: know what’s in your repository.
In addition to the usual POSIX tools, you can detect binaries using
git diff. When you try to diff a binary file using the
--numstat option, Git returns a null result:
$ git diff --numstat /dev/null pixel.png | tee
- - /dev/null => pixel.png
$ git diff --numstat /dev/null file.txt | tee
5788 0 /dev/null => list.txt
If you’re considering committing binary blobs to your repository, stop and think about it first. If it’s binary, it was generated by something. Is there a good reason not to generate them at build time instead of committing them to your repo? Should you decide it does make sense to commit binary data, make sure you identify, in a README file or similar, where the binary files are, why they’re binary, and what the protocol is for updating them. Updates must be performed sparingly, because, for every change you commit to a binary blob, the storage space for that blob effectively doubles.
Keep third-party libraries third-party
Third-party libraries are no exception to this rule. While it’s one of the many benefits of open source that you can freely re-use and re-distribute code you didn’t write, there are many good reasons not to house a third-party library in your own repository. First of all, you can’t exactly vouch for a third party, unless you’ve reviewed all of its code (and future merges) yourself. Secondly, when you copy third party libraries into your Git repo, it splinters focus away from the true upstream source. Someone confident in the library is technically only confident in the master copy of the library, not in a copy lying around in a random repo. If you need to lock into a specific version of a library, either provide developers with a reasonable URL the release your project needs or else use Git Submodule.
Resist a blind git add
If your project is compiled, resist the urge to use
git add . (where
. is either the current directory or the path to a specific folder) as an easy way to add anything and everything new. This is especially important if you’re not manually compiling your project, but are using an IDE to manage your project for you. It can be extremely difficult to track what’s gotten added to your repository when an IDE manages your project, so it’s important to only add what you’ve actually written and not any new object that pops up in your project folder.
If you do use
git add ., review what’s in staging before you push. If you see an unfamiliar object in your project folder when you do a
git status, find out where it came from and why it’s still in your project directory after you’ve run a
make clean or equivalent command. It’s a rare build artifact that won’t regenerate during compilation, so think twice before committing it.
Use Git ignore
Many of the conveniences built for programmers are also very noisy. The typical project directory for any project, programming, or artistic or otherwise, is littered with hidden files, metadata, and leftover artifacts. You can try to ignore these objects, but the more noise there is in your
git status, the more likely you are to miss something.
You can Git filter out this noise for you by maintaining a good gitignore file. Because that’s a common requirement for anyone using Git, there are a few starter gitignore files available. Github.com/github/gitignore offers several purpose-built gitignore files you can download and place into your own project, and Gitlab.com integrated gitignore templates into the repo creation workflow several years ago. Use these to help you build a reasonable gitignore policy for your project, and stick to it.
Review merge requests
When you get a merge or pull request or a patch file through email, don’t just test it to make sure it works. It’s your job to read new code coming into your codebase and to understand how it produces the result it does. If you disagree with the implementation, or worse, you don’t comprehend the implementation, send a message back to the person submitting it and ask for clarification. It’s not a social faux pas to question code looking to become a permanent fixture in your repository, but it’s a breach of your social contract with your users to not know what you merge into the code they’ll be using.
Good software security in open source is a community effort. Don’t encourage poor Git practices in your repositories, and don’t overlook a security threat in repositories you clone. Git is powerful, but it’s still just a computer program, so be the human in the equation and keep everyone safe.