If you are into open source development, you have probably worked with Git to manage source code. You might have come across projects with numerous dependencies and/or sub-projects. How do you manage them?
For an open source organization, it can be tricky to achieve single-source documentation and dependency management for the community and the product. The documentation and project often end up fragmented and redundant, which makes them difficult to maintain.
Suppose you want to use a single project as a child project inside a repository. The traditional method is just to copy the project to the parent repository. But, what if you want to use the same child project in many parent repositories? It wouldn't be feasible to copy the child project into every parent and have to make changes in all of them whenever you update it. This would create redundancy and inconsistency in the parent repositories and make it difficult to update and maintain the child project.
Git submodules and subtrees
What if you could put one project within another using a single command? What if you could just add the project as a child to any number of projects and push changes on the go, whenever you want to? Git provides solutions for this: Git submodules and Git subtrees. These tools were created to support code-sharing development workflows on a more modular level, aspiring to bridge the gap between the Git repository's source-code management (SCM) and the sub-repos within it.
This is a real-life scenario of the concepts this article will cover in detail. If you're already familiar with trees, here is what this model will look like:
What are Git submodules?
Git provides submodules in its default package that enable Git repositories to be nested within other repositories. To be precise, the Git submodule points to a specific commit on the child repository. Here is what Git submodules look like in my Docs-test GitHub repo:
The format folder@commitId indicates that the repository is a submodule, and you can directly click on the folder to go to the child repository. The config file called .gitmodules contains all the submodule repository details. My repo's .gitmodules file looks like this:
You can use the following commands to use Git submodules in your repositories.
Clone a repository and load submodules
To clone a repository containing submodules:
$ git clone --recursive <URL to Git repo>
If you have already cloned a repository and want to load its submodules:
$ git submodule update --init
If there are nested submodules:
$ git submodule update --init --recursive
Downloading submodules sequentially can be a tedious task, so clone and submodule update will support the --jobs or -j parameter.
For example, to download eight submodules at once, use:
$ git submodule update --init --recursive -j 8
$ git clone --recursive --jobs 8 <URL to Git repo>
Before running or building the parent repository, you have to make sure that the child dependencies are up to date.
To pull all changes in submodules:
$ git submodule update --remote
Create repositories with submodules
To add a child repository to a parent repository:
$ git submodule add <URL to Git repo>
To initialize an existing Git submodule:
$ git submodule init
You can also create branches and track commits in your submodules by adding --update to your submodule update command:
$ git submodule update --remote
Update submodule commits
As explained above, a submodule is a link that points to a specific commit in the child repository. If you want to update the commit of the submodule, don't worry. You don't need to specify the latest commit explicitly. You can just use the general submodule update command:
$ git submodule update
Just add and commit as you normally would to create and push the parent repository to GitHub.
Delete a submodule from a parent repository
Merely deleting a child project folder manually won't remove the child project from the parent repository. To delete a submodule named childmodule, use:
$ git rm -f childmodule
Although Git submodules may appear easy to work with, it can be difficult for beginners to find their way around them.
What are Git subtrees?
Git subtrees, introduced in Git 1.7.11, allow you to insert a copy of any repository as a subdirectory of another one. It is one of several ways Git projects can inject and manage project dependencies. It stores the external dependencies in regular commits. Git subtrees provide clean integration points, so they're easier to revert.
If you use the subtrees tutorial provided by GitHub to use subtrees, you won't see a .gittrees config file in your local whenever you add a subtree. This makes it difficult to recognize subtrees because subtrees look like general folders, but they are copies of the child repository. The version of Git subtree with the .gittrees config file is not available with the default Git package, so to get the git-subtree with .gittrees config file, you must download git-subtree from the /contrib/subtree folder in the Git source repository.
You can clone any repository containing subtrees, just like any other general repository, but it may take longer because entire copies of the child repository reside in the parent repository.
You can use the following commands to use Git subtrees in your repositories.
Add a subtree to a parent repository
To add a new subtree to a parent repository, you first need to remote add it and then run the subtree add command, like:
$ git remote add remote-name <URL to Git repo>
$ git subtree add --prefix=folder/ remote-name <URL to Git repo> subtree-branchname
This merges the whole child project's commit history to the parent repository.
Push and pull changes to and from the subtree
$ git subtree push-all
$ git subtree pull-all
Which should you use?
Every tool has pros and cons. Here are some features that may help you decide which is best for your use case.
- Git submodules have a smaller repository size since they are just links that point to a specific commit in the child project, whereas Git subtrees house the entire child project along with its history.
- Git submodules need to be accessible in a server, but subtrees are decentralized.
- Git submodules are mostly used in component-based development, whereas Git subtrees are used in system-based development.
A Git subtree isn't a direct alternative to a Git submodule. There are certain caveats that guide where each can be used. If there is an external repository you own and are likely to push code back to, use Git submodule since it is easier to push. If you have third-party code that you are unlikely to push to, use Git subtree since it is easier to pull.
Give Git subtrees and submodules a try and let me know how it goes in the comments.