Git subtree merging: low level plumbing



In this article, I will share a couple of tricks for handling a git repo that contains one or more subrepos organized using the git subtree merging strategy. This article is probably somewhat irrelevant if you use the "git subtree" command, because I'm using plumbing commands directly here. Nevertheless, you might find it interesting to understand what happens under the hood, or how to deal with a situation where the high-level commands fail.

My situation

I have three apps which share common code that is too much of a moving target for me to stabilize its API and publish it. I used to use submodules (just as I used subrepos back when I used Mercurial), but these turned out to be way too complicated for my taste. I liked the conceptual simplicity of subtree merging so I moved to that, using instructions from the git book.

Things went happily for a while. Whenever I needed to change something in my common repo, I would do so in the context of a change in one of my apps. Then, while on my common branch, I would use git merge --squash -s subtree --no-commit master and push my commit to my common repo. Whenever I worked on another of my apps, I could fetch the latest commits from the common repo, use git merge --squash -s subtree again (with a commit message like "Updated common repo") and everything worked well... until the structure of that repo changed.

I'm not exactly sure what happened, but after I added a new file to my common repo (until then, no modification I made involved the addition or removal of a file), my merging operation crapped out. I guess it has something to do with tree hashes changing, but well, I had to dig deeper in git's internals and use git read-tree. I ended up fixing my problem and I like the read-tree approach, but whenever I need to do it again (which isn't that often), I forget how I did it the last time, which is why I'm writing it down here.

What to do

So, as you might know already, everything git stores is either a blob or a tree (or a commit or a tag, but well...). The subtree merging strategy already forbids sharing commit history between your main repos and your subrepos, so why bother with the merge command at all? What we do is simply read the common subtree of your main app's tree into the staging area of your common branch, which you then commit and push to your common repo.

Using read-tree is really, really easy: you check out the branch you want to read into (your common branch) and then do:

git read-tree <tree-hash>

This takes the whole contents of that tree and writes it into your staging area. The problem now is to find the right tree hash. You can do so with cat-file. First, you start by cat-ing the commit you want to extract your content from. So you copy your commit hash from a git log and do:

git cat-file -p ba855c4d9ef4018b28a78463e63596378d274a9d

(-p is to tell git to automatically guess the type of contents and pretty-print it). You'll get something like:

tree 96d303843e22eec05d5ad49ba61b5d3cbf85df07
parent 152f5f37ce04056329aaa11195e60ffbc9da9967
author Virgil Dupras <hsoft@hardcoded.net> 1387732419 -0500
committer Virgil Dupras <hsoft@hardcoded.net> 1387732419 -0500

Some commit I've made. Had to change code in common folder.

See the hash next to tree? That's the hash of our root tree for that commit. That's not the one we want to read though. We want the common tree. So we'll need to dig deeper:

git cat-file -p 96d303843e22eec05d5ad49ba61b5d3cbf85df07

which will get us:

100644 blob 67f794c9e41c5d1f53b05a2ac41cf41a0a668859    .gitignore
100644 blob 7f6c8001062c687fb0f8fe794d2a4e2c9f18044f    LICENSE
100644 blob cc6db5964d1c46fce06aadf032806a65a7b712e4    README.md
100644 blob 3cae2df1af0277e4f2c48b9001c0c78002525b0f    setup.py
040000 tree fa226ac268a8d4f19ff2d00fa7f90a989601db56    src
040000 tree 8e308cd8103193c24928676e0e9c16e8390e573d    doc
040000 tree e2c5ea17991f4e8b569fe626acedb1cf9b8ed602    common
100644 blob 9e8cf87cd8314be151cf937cb6ca881a8f216ab3    requirements.txt

Ahh, there we have it, the hash for the tree we're looking for. So, after having checked out our common branch, we simply do:

git read-tree e2c5ea17991f4e8b569fe626acedb1cf9b8ed602

That will update your staging area (but not your working copy, which might look a bit weird in a git status), which you can commit with a relevant commit message and push to your common repo.
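To make the steps above easier to replay, here is a minimal sketch that runs them end to end in a throwaway repository. Everything in it is an assumption made for illustration: the temporary directory, the file names, the commit messages, and the branch name common-lib, which stands in for the common branch and is created empty here rather than fetched from a real common repo. The awk filters are just one way to pick the hashes out of the cat-file output.

```shell
#!/bin/sh
# Sketch only: every repository, branch, and file name here is made up.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# An app commit that carries the shared code in a "common" subfolder.
mkdir common
echo "shared code" > common/util.txt
echo "app code" > app.txt
git add .
git commit -q -m "Had to change code in common folder"
app_commit=$(git rev-parse HEAD)

# Step 1: the commit object names its root tree on the "tree" line.
root_tree=$(git cat-file -p "$app_commit" | awk '$1 == "tree" { print $2; exit }')

# Step 2: the root tree lists "common" as a tree entry (mode, type, hash, name).
common_tree=$(git cat-file -p "$root_tree" | awk '$2 == "tree" && $4 == "common" { print $3 }')

# Step 3: on an empty branch standing in for the common branch (assumption),
# load that tree into the staging area and commit it.
git checkout -q --orphan common-lib
git read-tree "$common_tree"
git commit -q -m "Updated common repo"

echo "common tree: $common_tree"
echo "commit tree: $(git rev-parse 'HEAD^{tree}')"
```

The two hashes printed at the end should match, since the new commit's root tree is exactly the tree that was read. The working copy is left untouched throughout, so it still holds the app's files afterwards.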

Is this whole procedure complicated? Technically, yes, conceptually, no. Does it make a repo less usable by mere mortals? No because the only people having to know about that stuff are the ones doing the merging. Developers cloning the modules or even committing to a repo organized like that don't have to know about this plumbing, which makes this organisation method rather elegant IMO.

Addendum 2013-12-24

While trying to propagate changes to the common library into your app, you will likely get this error message on read-tree:

error: Entry 'common/somefile' overlaps with 'common/somefile'.  Cannot bind.

Well, that's another hurdle in our path. This time, I solved it with git diff-tree. I ask git for the diff between two tree hashes and then ask it to apply the patch. First, we get the hash from the root tree of our newly updated "common" branch (let's call it "hash-new") and then we get the hash of the tree for the "common" subfolder we have in our app needing update propagation (let's call it "hash-old"). As you remember, we get all of that with cat-file. Then, we diff-and-apply with:

git diff-tree -p hash-old hash-new | git apply --directory=common

We can then commit our changes. Now that I think of it, it might be simpler to do things this way all the time instead of using read-tree.
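The diff-and-apply approach can be tried out safely with a sketch like the one below. It is built on assumptions: a throwaway repository, made-up file contents, and a branch called common-lib standing in for the updated common branch. The hashes are resolved with git rev-parse for brevity; walking them with cat-file gives the same values.

```shell
#!/bin/sh
# Sketch only: every repository, branch, and file name here is made up.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# App branch holding an outdated copy of the shared code under common/.
mkdir common
echo "v1" > common/util.txt
echo "app code" > app.txt
git add .
git commit -q -m "App with old common"
app_branch=$(git symbolic-ref --short HEAD)

# Separate branch whose root tree is the newer shared code.
git checkout -q --orphan common-lib
git rm -rqf .
echo "v2" > util.txt
echo "new file" > extra.txt
git add util.txt extra.txt
git commit -q -m "Newer common"

git checkout -q "$app_branch"

# hash-old: tree of the app's common/ folder; hash-new: root tree of common-lib.
hash_old=$(git rev-parse "HEAD:common")
hash_new=$(git rev-parse "common-lib^{tree}")

# Diff the two trees and apply the patch under common/ in the working copy.
git diff-tree -p "$hash_old" "$hash_new" | git apply --directory=common

# git apply only touched the working copy here, so stage before committing.
git add -A common
git commit -q -m "Updated common"
```

After the commit, the app's common/ tree should have the same hash as hash-new, which is a handy way to check that nothing was missed.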

Addendum 2015-01-02

It's a bit late to acknowledge it, but Chris' comment below is very sound. It's much easier to perform subtree merging this way. So, for example, if you want to apply updates from your common repo (fetched into a common branch) to the common subfolder of your master branch, you would simply do the following while master is checked out:

git diff-tree -p master:common/ common: | git apply --directory=common

But still, it's always interesting to know about the lower level plumbing going on under this neat command.
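For the curious, the master:common/ and common: arguments use git's rev:path notation, which names the same tree objects that the cat-file digging uncovers. The sketch below checks that in a throwaway repository. The file names and contents are assumptions, and the branch names master and common are forced only to mirror the ones used in this article.

```shell
#!/bin/sh
# Sketch only: the repository layout and file names are made up.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the branch name used in the article
git config user.email "you@example.com"
git config user.name "Example"

# master carries the shared code in a common/ subfolder.
mkdir common
echo "shared code" > common/util.txt
echo "app code" > app.txt
git add .
git commit -q -m "App commit"

# A "common" branch whose root tree holds the shared code directly.
git checkout -q --orphan common
git rm -rqf .
echo "shared code, updated" > util.txt
git add util.txt
git commit -q -m "Common repo commit"
git checkout -q master

# "master:common/" names the tree entry "common" inside master's root tree.
old_via_path=$(git rev-parse master:common/)
old_via_ls=$(git ls-tree -d master common | awk '{ print $3 }')

# "common:" (empty path) names the root tree of the common branch.
new_via_path=$(git rev-parse common:)
new_via_peel=$(git rev-parse 'common^{tree}')

echo "old: $old_via_path $old_via_ls"
echo "new: $new_via_path $new_via_peel"
```

Each line should print the same hash twice, which is why the one-liner needs no manual hash hunting.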