Bernd H., Xaidat Senior Software Engineer
Integrating Your Work With git (part 1)
When you’ve built fixes or features, you want to integrate them with the main line of development, generally the “master” branch. Git offers multiple ways you can go about that task. Let’s take a look at your options using a little sample repository.
Merging is, in a way, the “default” way of integrating changes. The command is even called merge!
The process is pretty easy: You start with the branch you want to integrate your changes with already checked out. Then you call git merge other-branch for each of the other branches you want to integrate. We begin:
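As a rough sketch of what such a session can look like, the following script builds a throwaway repository and merges one branch into master. The file names and commit messages are made up for illustration, and the changes are deliberately chosen so that they do not conflict; the sample repository used in this post does produce conflicts.

```shell
set -eu
# Throwaway repository; every name below is illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > main.txt
git add main.txt
git commit -q -m "Initial commit"

# Work that happens on the branch we want to integrate later.
git checkout -q -b other-branch
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes on other-branch"

# Meanwhile, master moves on as well.
git checkout -q master
echo "world" >> main.txt
git commit -q -am "Extend main.txt on master"

# Integrate: we are on master and merge the other branch into it.
git merge -q --no-edit other-branch
git log --oneline --graph
```

Because both branches have moved, git records a merge commit with two parents, which is what shows up as a fork and join in the graph output.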
Resolving conflicts through git can be done manually, but that is very inconvenient. For a better user experience, git supports many different tools for resolving conflicts. A full list is available under the configuration documentation for merge.tool in the online help for git merge, which you can access using git help merge. I use opendiff because it is available by default on Macs with the developer tools installed and I am reasonably used to it. There are many other options, though, so take a look! When you have made your choice, you can configure it for your account using
git config --global merge.tool chosen-tool
git tells me that it starts merging with main.txt. “Local”, in this context, means the branch we are on; “remote” is the branch that we want to merge. After these messages, git starts the opendiff program for me, or whatever three-way merge tool you have configured:
Once we’re done merging, we save the results and quit the merge tool. If the result seems unchanged, git helpfully warns us and asks whether we want to restore the state from before the tool was run. In that case, it will also ask whether we even want to attempt to merge the other conflicts.
I just performed a sample merge and git drops me back to a shell prompt. Let’s take a look at git status:
We can see that git status shows us that the merge has not been concluded, but the conflicts are already resolved. There’s also a .orig file with the unmerged contents:
This is nice for a final comparison if we’re unsure about our merge, but in a real project we might have a lot of resolved conflicts scattered over a large directory tree. Let’s get rid of this untracked file and do what git says to finish the merge:
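A small sketch of that cleanup step, assuming a throwaway repository in which main.txt.orig stands in for the backup file left behind by the merge tool. The mergetool.keepBackup setting at the end is not used in this post; it is git's own switch for not keeping .orig backups in the first place, shown here as an optional extra.

```shell
set -eu
# Throwaway repository; file contents are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "merged contents" > main.txt
git add main.txt
git commit -q -m "Initial commit"

# Stand-in for a backup file left behind by git mergetool.
echo "unmerged contents" > main.txt.orig

# The backup shows up as untracked ("??") until we remove it.
git status --short
rm main.txt.orig

# Optional: ask git mergetool not to keep .orig backups in this repository.
# Add --global to apply the setting to your whole account instead.
git config mergetool.keepBackup false
```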
git prefills the commit message with a note about what we merged, and adds some helpful comments below about what we did, what we’re about to do and which files are affected:
When we just save & quit, git performs the merge commit and we get a short confirmation message:
After we repeat the process with the other branches to merge, we get the following history structure:
Merging like this is fairly easy for most users, and it has definite benefits:
- each merge is explicitly marked, it is easy to see what was merged and when it happened
- if conflicts had to be resolved, the required changes have their own easy-to-see commit
These benefits have several drawbacks following from them, however:
- The history gets awfully busy, and quickly – it’s not awful when one person merges two branches, but when a whole team integrates many branches, you quickly end up with a diagram that looks like a major railway station, with many branches fanning out at the start of a sprint and lots of overlapping branches coming together in the last week or so.
- Merge forces you to integrate all the commits of one branch with all the commits of another branch in one go. If you have a bigger project and multiple concerns you had to touch, this can make merges very complicated. You wouldn’t define or implement a big story as one task, so why perform all merge tasks at once?
- Merge commits often add very little value: Is putting calls for one feature in a method below calls for another feature in the same method always its own “thing” that you want to see later?
Sub-case: Octopus Merge
If changes can be merged without any conflict, they can be octopus merged. If we didn’t have any conflicts in our branches, we could e.g. do
git merge feature-1 feature-2 feature-3 quick-fix-1.
This would cut down on the number of merge commits & save some time. Since it’s only applicable when there are no conflicts, it is rarely applicable, alas – something of an edge case.
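To make the octopus case concrete, here is a minimal sketch in a throwaway repository. The branch names follow the ones used in this post; the files are hypothetical and chosen so that no two branches touch the same file, which is the precondition for an octopus merge.

```shell
set -eu
# Throwaway repository; file names are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

# Each branch adds its own file, so no two branches can conflict.
for branch in feature-1 feature-2 feature-3 quick-fix-1; do
  git checkout -q -b "$branch" master
  echo "work for $branch" > "$branch.txt"
  git add "$branch.txt"
  git commit -q -m "Add $branch.txt"
done

# master gets a commit of its own, so a real merge commit is needed.
git checkout -q master
echo "more" >> main.txt
git commit -q -am "Extend main.txt on master"

# One merge commit that integrates all four branches at once.
git merge -q -m "Merge features and quick fix" feature-1 feature-2 feature-3 quick-fix-1
git log --oneline --graph
```

The resulting merge commit has five parents: the previous head of master plus the four merged branches.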
Another option for integrating changes is a Rebase. In a rebase, we typically stay on the branch that we want to merge when we’re done implementing it and rebase it onto the branch into which we want to merge it. Let’s try it and see what happens:
As we can see, git comes back to us that it found a conflict when merging the first commit of the branch that we are rebasing.
Here, we see the most important difference between merge and rebase: While merge brings together two branch heads and all the changes they build on, rebase takes the branch we are rebasing, and applies it, commit for commit, to the branch we are rebasing onto.
If you think about it, it does exactly what it says on the label: Rebase takes a branch from its existing “base” (where its branch history first diverged from the rebase target) and puts it onto a new one.
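The change of base can be observed directly with git merge-base. The following sketch assumes a throwaway repository with hypothetical files that do not conflict, so the rebase runs through without stopping.

```shell
set -eu
# Throwaway repository; file names are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

git checkout -q -b feature-1
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Start feature-1"
echo "two" >> feature.txt
git commit -q -am "Finish feature-1"

git checkout -q master
echo "more" >> main.txt
git commit -q -am "Extend main.txt on master"

# Where did feature-1 diverge from master before the rebase?
old_base="$(git merge-base master feature-1)"
old_tip="$(git rev-parse feature-1)"

git checkout -q feature-1
git rebase -q master

# After the rebase, the branch starts at the current head of master.
new_base="$(git merge-base master feature-1)"
new_tip="$(git rev-parse feature-1)"
echo "old base: $old_base"
echo "new base: $new_base"
```

Both commits of feature-1 are re-applied on top of master, which is why the branch tip gets a new ID even though its changes are the same.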
Let’s see what git status can tell us:
git’s message is fairly self-explanatory: We are in a rebase, there are conflicts that we need to resolve. git also tells us how we can proceed.
Let’s forge ahead. We run git mergetool -y just as with merge; when we’re done, we get the following git status output:
We have fixed the conflicts that the first commit of the branch we are rebasing, feature-1, had with the branch we are rebasing it onto, master. We can now run git rebase --continue, and git will commit the new, deconflicted contents that we have created in our merge tool and immediately continue applying the other commits in feature-1. At this point, as at any other point during a rebase, we could also run git rebase --abort to abandon the rebase (which would leave both branches looking exactly as they did when we started the process with git rebase master).
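Both exits from a stopped rebase can be tried out safely in a scratch repository. The sketch below is an assumption-laden stand-in for the session described here: it resolves the conflict by writing the file directly instead of running git mergetool, and it sets GIT_EDITOR=true so that git does not open an editor for the commit message when continuing.

```shell
set -eu
# Throwaway repository; file contents are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

# Both branches change the same line, which guarantees a conflict.
git checkout -q -b feature-1
echo "feature line" > main.txt
git commit -q -am "Change main.txt on feature-1"

git checkout -q master
echo "master line" > main.txt
git commit -q -am "Change main.txt on master"

git checkout -q feature-1
before="$(git rev-parse feature-1)"

# First attempt: the rebase stops on the conflict, and we back out.
git rebase master >/dev/null 2>&1 || echo "rebase stopped on a conflict"
git rebase --abort
after_abort="$(git rev-parse feature-1)"

# Second attempt: resolve the conflict by hand, stage it, and continue.
git rebase master >/dev/null 2>&1 || echo "rebase stopped on a conflict"
printf 'master line\nfeature line\n' > main.txt
git add main.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
git log --oneline --graph
```

After the abort, feature-1 points at exactly the commit it pointed at before; after the continue, it sits on top of master with the resolved contents.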
We can see that the first change got applied without conflicts after we resolved them, but we get another conflict on the next commit, which we have to resolve in turn.
This is both a curse and a blessing of rebasing: When we rebase, we have to resolve potential conflicts for each change that we need to apply, meaning that we may have to resolve more conflicts (or at least, in more passes) than with a direct merge. The corresponding benefit is that we don’t have to merge a lot of stuff at once, but can take it one change at a time.
In the end, nothing determines how well rebase works for you as much as the quality of the branch histories that you’re integrating: The more each commit implements one whole intention, the easier it will be to merge mentally. If you don’t have feature branches that touch half of your source tree, you’ll have fewer opportunities for conflicting changes. If there are few back-and-forth experimental commits in your branch, there is also less risk of having to resolve conflicts in one location multiple times.
Much of this can be “fixed in post” when using git, in any case. We will look at the ways git can help you achieve a clean history for easier conflict resolution in the next section.
Before that, let’s take a look at a git history where multiple branches have been integrated through git rebase. For that, we have to rebase all the feature branches and finally make the master branch point to the newest commit in the linear history we produce. The easiest way to achieve this in our demo is to rebase each feature (or fix) branch onto the previous one that we have finished rebasing, without touching master in the meantime, and only at the end check out master and merge the last feature branch that we have rebased:
What we get is a “Fast-forward” merge: git has determined that the master branch into which we want to merge feature-3 is a direct ancestor of feature-3, so no further merging is necessary; the master reference is just switched to point to the same commit that is the head of feature-3. Finally, git prints a summary of the changes that happened between the old and new head commits of master.
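The chain of rebases followed by a fast-forward can be sketched as follows, again in a throwaway repository with hypothetical, non-conflicting files. The --ff-only flag is not used in this post; it is added here as a safety net so that git refuses to proceed rather than silently creating a merge commit if a fast-forward is not possible.

```shell
set -eu
# Throwaway repository; file names are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

for branch in feature-1 feature-2 feature-3; do
  git checkout -q -b "$branch" master
  echo "work for $branch" > "$branch.txt"
  git add "$branch.txt"
  git commit -q -m "Add $branch.txt"
done

git checkout -q master
echo "more" >> main.txt
git commit -q -am "Extend main.txt on master"

# Rebase each branch onto the previously rebased one, leaving master alone.
git checkout -q feature-1
git rebase -q master
git checkout -q feature-2
git rebase -q feature-1
git checkout -q feature-3
git rebase -q feature-2

# master is now a direct ancestor of feature-3, so this only moves the reference.
git checkout -q master
git merge -q --ff-only feature-3
git log --oneline --graph
```

The graph output is a single straight line of five commits with no merge commits in it.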
What we have achieved with this is the following picture:
I would say it’s much easier to understand what happened when than in a history built using “normal” merge.
We mentioned in the section on rebase that rebasing works best with feature branch histories that are already fairly “clean”, with just a few commits that each implement one clear goal.
At the same time, while working on an issue or story, it is normally best to commit as often as feasible, to allow fine-grained undo and have frequent backups of your work.
To bridge these conflicting goals, git offers “interactive rebase” to modify history in a targeted fashion.
Let’s restart from our original sample and integrate our feature and fix branches once again, but this time rewriting history as we go.
Let’s start with feature-1 once again:
As before, git determines which commits happened in the history of feature-1 since it diverged from master’s history, but this time, git doesn’t begin applying these commits right away. Instead, it gives us a text editor with the following contents:
As usual, git includes some comments with instructions with the active contents of the file. Let’s review:
The non-comment lines in the file each represent a commit that will be rebased onto the target. The lines contain a verb that determines what will be done with the commit, the commit hash for identification and the start of the commit message to help the user get oriented.
Note that the commit message here is strictly for the user’s convenience; don’t get confused and change it here when using rebase commands that modify commit messages, since your changes will be ignored (you’ll get an editor to do the actual modification of commit messages later in the process).
We’ll get to the available verbs in a second, but let’s first note the messages below:
- You can reorder commits in the branch – but note what the message does not mention: There is no guarantee whatsoever that commits would apply cleanly in any other order. Since reordering does not change the commits, this may seem like risky busywork, but we’ll see how it can be extremely useful when exploring the available verbs below.
- You can just drop a commit by deleting the corresponding line from the rebase instructions. One case where this is very useful is when you had to revert some commit during branch development: Instead of integrating a commit and its inverse commit, you can just drop both during integration.
- You can also abandon the rebase immediately by removing all the lines.
If you kept vim as your git editor, you can quickly delete all lines by typing :%d followed by return while in normal mode (i.e. not in insert mode – the characters you type don’t show up in the edited text). vim seems intimidating at first, but its commands have a consistent structure that allows very efficient editing once mastered. In this case, e.g., : begins a command to operate on specific lines, % specifies “all lines” and d means delete. While learning vim is a worthwhile endeavour, if you really want to avoid the time investment, you can also make git use a different editor. Just set the environment variable “VISUAL” to the editor of your choice. One popular choice is “nano”, a text editor that prioritizes approachability.
With that out of the way, let’s take a look at the available verbs, the actual rebase instructions:
pick just integrates the commit as-is, with the same contents and message. Note that you will not get a commit with the same ID hash, however: since commits include the hash of the parent commit, rebased commits will always have IDs that differ from the originals.
reword will allow you to choose a new message for a commit without changing its content. Good commit messages will make the intention and the decisions behind a given change much easier to understand. The developer trying to understand your code a few years down the line may very well be you, so write the very best message you can! Rewording commit messages at the end of a feature branch allows you to do so with the benefit of hindsight.
edit applies a commit, then pauses the rebase to allow you to modify the commit, e.g. to make a small modification or split it up into multiple commits. It will only continue applying further commits when you run git rebase --continue.
squash uses the commit’s contents and message, but combines both the contents and the message with the previous commit. It will give you the opportunity to edit the commit message of the combined commit before going on. Editing is highly recommended, since the default is a concatenation of the messages of the commits that were squashed together, which is not very readable – and the description of reword above applies!
fixup melds this commit’s contents into the previous commit while discarding its message. One situation where this is very practical is with fixing minor mistakes in earlier commits of the branch. Imagine, e.g., that you have committed a database schema migration at the start of the branch. You produce multiple commits with service code changes, and before integrating you notice an embarrassing typo in your schema change a few commits ago. The cleanest way to fix this is to:
- Create a quick commit only containing the fixed typo with a throwaway message identifying the commit to fix (NB: Don’t just use that commit’s hash – if you rebase in the meantime to integrate parent branch changes mid-stream, that hash will be invalid!). I like to prepend this message with “!!fu” or a similar visually striking tag so that I immediately see that this commit needs to be melded into another one before integration.
- While rebasing interactively during feature integration, reorder commits so that the fixup commit comes directly after the commit that it amends.
- Change its verb to “fixup” (or just “f”) and in the final result, the commit and its message will have vanished, but the error is gone as well, as if it had never happened!
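Git also ships a built-in variant of this workflow, which is not what the steps above describe but automates the same idea: git commit --fixup creates a commit whose message is “fixup!” followed by the subject of the target commit, and git rebase --interactive --autosquash reorders and marks such commits in the rebase plan for you. The sketch below assumes a throwaway repository with a hypothetical schema file; GIT_SEQUENCE_EDITOR=true accepts the generated plan without opening an editor, which you would normally want to review.

```shell
set -eu
# Throwaway repository; file names and contents are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

git checkout -q -b feature-1
echo "CREATE TABEL users;" > schema.sql          # note the typo
git add schema.sql
git commit -q -m "Add schema migration"
echo "service code" > service.txt
git add service.txt
git commit -q -m "Add service code"

# Fix the typo and record the fix as a fixup for the schema commit (HEAD~1).
echo "CREATE TABLE users;" > schema.sql
git commit -q -a --fixup=HEAD~1
git log --format=%s master..HEAD

# --autosquash moves the fixup commit right after its target and marks it "fixup".
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master
git log --format=%s master..HEAD
```

Because the fixup commit refers to its target by subject line rather than by hash, it still finds the right commit if the branch was rebased in the meantime.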
exec just runs the rest of the line after it as a shell command. This means it makes little sense as the verb before an existing commit; you will generally insert new lines between commits to use it. Just running a command after a specific commit can be used in many ways, but it is especially great for quality control: Take the same scenario as for fixup above. Wouldn’t it be nice to have all automated tests of the old version run with the new schema before we make changes to the code? After all, if we have to go back on the feature in the application (e.g. because it caused unexpected system load in production) the new schema will still be around, so why not test up-front that it will actually work? With exec, you can just add a line that runs your tests (such as exec mvn clean test if you build with Apache Maven) after the schema commit. If Maven is successful (i.e. produces a 0 return code), git will go on with rebasing according to your plan. If not, exec will behave like edit and interrupt rebasing after the schema commit, giving you an opportunity to make changes before you git rebase --continue.
drop removes a commit – but why not just leave out the line from the rebase plan instead? Either will do.
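If you want a check to run after every commit rather than after one specific commit, git rebase also accepts an --exec option that inserts the corresponding exec line after each commit in the plan. That differs from the hand-placed exec line described for the schema scenario, so treat the following as a related sketch only. The check command is a hypothetical stand-in for a real test run such as mvn clean test, and GIT_SEQUENCE_EDITOR=true accepts the generated plan without opening an editor.

```shell
set -eu
# Throwaway repository; file names and the check command are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

git checkout -q -b feature-1
echo "CREATE TABLE users;" > schema.sql
git add schema.sql
git commit -q -m "Add schema migration"
echo "service code" > service.txt
git add service.txt
git commit -q -m "Add service code"

git checkout -q master
echo "more" >> main.txt
git commit -q -am "Extend main.txt on master"
git checkout -q feature-1

# Stand-in for a test suite: succeed if the schema file exists, and log each run
# to a file outside the repository so the working tree stays clean.
log="$(mktemp)"
check="test -f schema.sql && echo checked >> '$log'"

# The check runs after every rebased commit; a non-zero exit would pause the rebase.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --exec "$check" master
cat "$log"
```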
rebase --interactive In Practice
Let’s try this out with our sample repo and finish the rebase --interactive we started:
We basically want to put the changes from feature-1 into one commit, and give it a better commit message. Of course, we’ll still have to resolve the conflict – inherently conflicting changes will have to be handled in every way we could perform our integrations. How we perform the conflict resolution during interactive rebase is just the same as with other techniques, so let’s not cover it in detail yet again.
We next rebase the fix branch:
We drop a failed fix attempt and its reversion, wholesale. We decide to keep the documentation of the underlying problem that we identified in the process and squash it together with the proper implementation of the fix.
Squashing gives us a commit message like the above to edit. I added some actual content to the sample messages to show the value – if you have complex commits with explanatory commit messages (which are highly encouraged!), you will want to keep elements of them around when you combine commits.
In this case, we might end up with a message such as the following:
Note that this is a bit shorter than the combined commit messages – but it loses little information. This is one of the benefits of rewriting history when all is said and done: It is much easier to provide well-written messages with all information in hand.
Also note that this only pays off when you have longer, informative commit messages. When there’s just not much to write about some change, don’t force it – but consider rewriting one commit and just mark the other commits to merge as fixup. Squash is only “worth it” when there’s interesting messages to blend.
Next we add another feature branch:
Here we see a bit of reordering:
We put implementation of the new component and its documentation before its use in the history. Personally, I find it productive to program from usage to implementation (i.e., I write the function calls into my code as if the function already existed, and then let my IDE create the skeletons of the functions from the signature of its use). This is also true when writing tests first – TDD will have you build users of your functions before the functions themselves by definition. If we only commit when tests, users and implementation are done, on the other hand, we are committing very rarely, giving us little granularity for undoing a step or two.
Working by committing every step while creating our tests and/or users and while building parts of the implementation is a productive, fluid style while building a feature. There is a price, however: Once we integrate the use of the implementation before the implementation, we get non-functioning software if we have to move back a commit or two. If we integrate the new component first, however, worst case is that an unused component is included. This re-ordering, then, is not just a cosmetic but a real semantic change! This robustness will help you most at a time when you need it most urgently – in our experience, we don’t always have time to act with considered strategy when a hot fix is required.
Note that we implement the lesson from above here: A new component may need a bit of description and a richer commit message, so we choose to squash the messages together.
There’s a lot of talk of good commit messages in this post, but what is a good commit message? Content-wise, it depends entirely on your project and team, but generally, a good message should be as short as it can be while fully describing what was done, why it was done and what may be unusual about the solution (e.g. why a dependency was introduced or why a more straightforward solution could not be used), so that anybody reviewing or modifying that part of the code is warned off potential gotchas that may be tempting again later. The formal structure of a good commit message is explained very well in this post by Chris Beams.
Just using the new implementation should be pretty straightforward, so we just reword into a quick note and just fixup the actual call of the component into the first commit.
We finally rebase feature-2, just rewording the message, and end up with this history:
Actively writing a history helps you end up with a much more intentional narrative about how the state of the project came to be.
Most developers understand the importance of readable code and test coverage, not just for building a robust system, but also for building a system that can be understood by others.
The same is true of version history, but this is much neglected on most projects. Writing your history as a biography of your code, instead of treating it as just a file store with undo or an audit log of what was typed when, is a powerful tool for staying effective as systems grow.
Being able to git blame or git log a certain section of the code and actually seeing not just what happened, but getting both an explanation of why it happened and what else was a logical unit with that change, is very powerful. You will not want to miss it once you have experienced it.
So try it for yourself! Don’t get discouraged by the length of the section on rebase --interactive – it is only that long because I skipped far fewer details than in the other sections, since interactive rebase is not yet habitual for most git users.
Hint: You don’t have to rebase --interactive onto a different branch! It’s often very productive to first beautify your branch history by running git rebase --interactive commit-id-of-commit-before-you-branched-off. While rewriting the branch by itself, you won’t have to deal with merge conflicts, except maybe from reordering or dropping commits. When you are done with rewriting the branch, you can then rebase the result (which should have more self-contained, easier-to-read commits) onto your actual target for integration!
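One way to find that commit ID without digging through the log is git merge-base, which prints the commit where two branches diverged. The two-step sketch below assumes a throwaway repository with hypothetical files; in real use, the first rebase would open the plan in your editor, whereas here GIT_SEQUENCE_EDITOR=true accepts it unchanged so the script can run unattended.

```shell
set -eu
# Throwaway repository; file names are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

git checkout -q -b feature-1
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Start feature-1"
echo "two" >> feature.txt
git commit -q -am "Finish feature-1"

git checkout -q master
echo "more" >> main.txt
git commit -q -am "Extend main.txt on master"
git checkout -q feature-1

# The commit where feature-1 branched off from master.
base="$(git merge-base master feature-1)"

# Step 1: rewrite the branch on its own base; master's new commits are not involved.
GIT_SEQUENCE_EDITOR=true git rebase -q -i "$base"
base_after_cleanup="$(git merge-base master feature-1)"

# Step 2: move the cleaned-up branch onto the actual integration target.
git rebase -q master
git log --oneline --graph
```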
Back to Merge: Squashing
One important alternative to fully rewriting a branch history for integration is just compressing it into one commit. In fact, this is what we used rebase --interactive for in parts of the previous section!
There is an easier way to achieve that: git merge --squash. Let’s go over our sample once more:
We start on master, and this time we stay on it. When we run git merge --squash feature-1, we get the usual notification about conflicts (remember, no technique can do away with inherently conflicting changes!), and we resolve them.
When we are done and run git status, we’ll see that the totality of changes in feature-1 is visible and already staged for commit (we can also verify that everything is as expected with a quick git diff --cached):
Let’s commit the changes with a short message:
git commit -m 'Implement feature-1.'
We can then repeat the process with feature-3 (which we can just as well merge without --squash since it consists of just one commit). The resulting history looks like this:
- All feature branches are still present, and unchanged.
- The history of master is extremely linear and easy to read.
- We never left the master branch.
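These properties can be checked in a scratch repository. The sketch below uses a hypothetical, conflict-free feature branch with two commits, so unlike the sample in this post there is nothing to resolve between the merge and the commit.

```shell
set -eu
# Throwaway repository; file names are illustrative only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > main.txt
git add main.txt
git commit -q -m "Initial commit"

git checkout -q -b feature-1
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Start feature-1"
echo "two" >> feature.txt
git commit -q -am "Finish feature-1"
feature_tip="$(git rev-parse feature-1)"

# Back on master: stage the combined changes of feature-1 without committing.
git checkout -q master
git merge -q --squash feature-1
git diff --cached --stat

# One ordinary commit carries the whole feature.
git commit -q -m 'Implement feature-1.'
git log --oneline --graph --all
```

The new commit on master has a single parent, so the history stays linear, while feature-1 still points at its original two commits.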
merge --squash can be an extremely quick and straightforward process. Of course it also has drawbacks:
- Sometimes you want commits from a branch to remain separate. This can be highly important – as in the note on commit order in the section on rebase --interactive above – or nice-to-have, like keeping typo fixes separate from semantically complicated commits so that the typo doesn’t come back when a problem forces a revert on the algorithm change.
- Like any merge, you have to resolve conflicts all at once, unlike rebase where you can tackle commits one-by-one.
Despite these drawbacks, merge --squash is very useful for straightforward feature implementations that you just want to get onto master with the least possible fuss.
Git gives you a lot of options for integrating changes – which is great, but can make it hard to decide on one of them. In everyday usage, none of them are always wrong or always the right call, but there is a meaningful choice to be made in any given situation.
Having a full toolbox of concepts and techniques gives you the best git experience and will help you and your team build and deliver more robust and understandable software.