Bernd Haug, Senior Software Engineer

15.min

Integrating Your Work With git (part 3) – interactive rebase

rebase --interactive

We mentioned in the section on rebase that rebasing works best with feature branch histories that are already fairly “clean”, with just a few commits that each implement one clear goal.

At the same time, while working on an issue or story, it is normally best to commit as often as feasible, to allow fine-grained undo and have frequent backups of your work.

To bridge these conflicting goals, git offers “interactive rebase” to modify history in a targeted fashion.

Let’s restart from our original sample and integrate our feature- and fix-branches once again, but this time rewriting history as we go.

Let’s start with feature-1 once again:

As before, git determines which commits happened in the history of feature-1 since it diverged from master’s history, but this time, git doesn’t begin applying these commits to master immediately.

Instead, it gives us a text editor with the following contents:

As usual, git includes some comments with instructions alongside the active contents of the file. Let’s review:

The non-comment lines in the file each represent a commit that will be rebased onto the target. The lines contain a verb that determines what will be done with the commit, the commit hash for identification and the start of the commit message to help the user get oriented.
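As a rough illustration of this step, here is a minimal sketch using a throwaway repository rather than the sample from this series. The file names and commit messages are made up, and GIT_SEQUENCE_EDITOR=cat stands in for a real editor: it prints the instruction file and leaves it unchanged, so the rebase simply proceeds with every commit set to pick.

```shell
set -eu

# Scratch repository (all names below are hypothetical, not the article's sample).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: write a file and commit it.
commit_file() {
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file README.txt "hello" "Initial commit"
git checkout -q -b feature-1
commit_file feature.txt "step 1" "WIP feature-1"
commit_file feature.txt "step 2" "More feature-1 work"
git checkout -q master
commit_file other.txt "unrelated" "Unrelated change on master"
git checkout -q feature-1

# Start the interactive rebase; "cat" shows the instruction file instead of editing it.
todo=$(GIT_SEQUENCE_EDITOR=cat git rebase --interactive master 2>/dev/null)

# Show only the active (non-comment) lines: verb, commit hash, start of the message.
printf '%s\n' "$todo" | grep '^pick'
pick_count=$(printf '%s\n' "$todo" | grep -c '^pick')
```

In everyday use you would just run git rebase --interactive master and edit the file by hand; the environment variable is only there to make the sketch non-interactive.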

Note that the commit message here is strictly for the user’s convenience; don’t get confused and change it here for rebase commands that modify commit messages, since your changes will be ignored (you’ll get an editor to do the actual modification of commit messages later in the process).

We’ll get to the available verbs in a second, but let’s first note the messages below:

  • You can reorder commits in the branch – but note what the message does not mention: There is no guarantee whatsoever that commits would apply cleanly in any other order. Since reordering does not change the commits, this may seem like risky busywork, but we’ll see how it can be extremely useful when exploring the available verbs below.
  • You can just drop a commit by deleting the corresponding line from the rebase instructions. One case where this is very useful is when you had to revert some commit during branch development: Instead of integrating a commit and its inverse commit, you can just drop both during integration.
  • You can also abandon the rebase immediately by removing all the lines.
If you kept vim as your git editor, you can quickly delete all lines by typing :%d followed by return while in normal mode (i.e. not in insert mode – the characters you type don’t show up in the edited text). vim seems intimidating at first, but its commands have a consistent structure that allows very efficient editing once mastered. In this case, e.g., : begins a command to operate on specific lines, % specifies “all lines” and d means delete. While learning vim is a worthwhile endeavour, if you really want to avoid the time investment, you can also make git use a different editor. Just set the environment variable “VISUAL” to the editor of your choice; note that git checks the GIT_EDITOR environment variable and the core.editor configuration setting first, so VISUAL only takes effect when neither is set. One popular choice is “nano”, a text editor that prioritizes approachability.
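A small sketch of the editor setup, assuming nano is installed. The core.editor setting is an addition to what the text describes; it is shown in a scratch repository so that the sketch does not touch your real configuration.

```shell
set -eu

# Option from the text: an environment variable, typically exported from your shell profile.
export VISUAL=nano

# Alternative (an addition to the text): git's own core.editor setting, which
# takes precedence over VISUAL. It is set per repository here;
# "git config --global core.editor nano" would apply it to all repositories.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.editor nano

# GIT_EDITOR would override both, so make sure it is not set for this check.
unset GIT_EDITOR

# Ask git which editor it would launch.
resolved_editor=$(git var GIT_EDITOR)
echo "git will launch: $resolved_editor"
```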

With that out of the way, let’s take a look at the available verbs, the actual rebase instructions:

  • pick just integrates the commit as-is, with the same contents and message. Note that you will not get a commit with the same ID hash, however: since commits include the hash of the parent commit, rebased commits will always have IDs that differ from the originals.
  • reword will allow you to choose a new message for a commit without changing its content. Good commit messages will make the intention and the decisions behind a given change much easier to understand. The developer trying to understand your code a few years down the line may very well be you, so write the very best message you can! Rewording commit messages at the end of a feature branch allows you to do so with the benefit of hindsight.
  • edit applies a commit, then pauses the rebase to allow you to modify the commit, e.g. to make a small modification or split it up into multiple commits. It will only continue applying further commits when you run git rebase --continue.
  • squash uses the commit’s contents and message, but combines both the contents and the message with the previous commit. It will give you the opportunity to edit the commit message of the combined commit before going on. This is highly recommended, since the default is a concatenation of the messages of the commits that were squashed together, which is not very readable – and the description of reword above applies!
  • fixup melds this commit’s contents into the previous commit while discarding its message. One situation where this is very practical is with fixing minor mistakes in earlier commits of the branch. Imagine, e.g., that you have committed a database schema migration at the start of the branch. You produce multiple commits with service code changes, and before integrating you notice an embarrassing typo in your schema change a few commits ago. The cleanest way to fix this is to:
    • Create a quick commit only containing the fixed typo with a throwaway message identifying the commit to fix (NB: Don’t just use that commit’s hash – if you rebase in the meantime to integrate parent branch changes mid-stream, that hash will be invalid!). I like to prepend this message with “!!fu” or a similar visually striking tag so that I immediately see that this commit needs to be melded into another one before integration.
    • While rebasing interactively during feature integration, reorder commits so that the fixup commit is directly behind the commit that it amends.
    • Change its verb to “fixup” (or just “f”) and in the final result, the commit and its message will have vanished, but the error is gone as well, as if it had never happened!
  • exec just runs the rest of the line behind it as a shell command. This means it makes little sense as the verb before an existing commit; you will generally insert new lines between commits to use it. Just running a command after a specific commit can be used in many ways, but it is especially great for quality control: Take the same scenario as for fixup above. Wouldn’t it be nice to have all automated tests of the old version run with the new schema before we make changes to the code? After all, if we have to go back on the feature in the application (e.g. because it caused unexpected system load in production) the new schema will still be around, so why not test up-front that it will actually work? With exec, you can just add a line that runs your tests (such as exec mvn clean test if you build with Apache Maven) behind the schema commit. If Maven is successful (i.e. produces a 0 return code), git will go on with rebasing according to your plan. If not, exec will behave like edit and interrupt rebasing after the schema commit, giving you an opportunity to make changes before you git rebase --continue.
  • drop removes a commit – but why not just leave out the line from the rebase plan instead? Either will do.
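The fixup and exec scenarios above can be sketched end to end in a throwaway repository. Everything concrete here is an assumption made for illustration: the file names, the commit messages, and the grep command standing in for a real test run such as mvn clean test. To keep the sketch non-interactive, the rebase plan is written to a file and copied over git's instruction file via GIT_SEQUENCE_EDITOR; by hand, you would make the same changes in your editor.

```shell
set -eu

# Scratch repository (hypothetical contents).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git tag base   # the commit we rebase onto

# Schema migration with a typo in the table name.
echo "CREATE TABLE custmer (id INT);" > schema.sql
git add schema.sql
git commit -q -m "Add customer table migration"

# Service code built on top of it.
echo "SELECT id FROM customer;" > service.sql
git add service.sql
git commit -q -m "Add customer lookup"

# Quick fix commit, tagged so it stands out in the plan.
echo "CREATE TABLE customer (id INT);" > schema.sql
git commit -q -am "!!fu customer table migration: fix table name"

migration=$(git rev-parse HEAD~2)
lookup=$(git rev-parse HEAD~1)
typo_fix=$(git rev-parse HEAD)

# The edited plan: move the fix directly behind the migration, meld it in,
# then run a check before the service commit is applied.
{
  echo "pick $migration"
  echo "fixup $typo_fix"
  echo "exec grep -q 'TABLE customer' schema.sql"
  echo "pick $lookup"
} > .git/plan.txt

# Replace git's generated instruction file with our plan and run the rebase.
GIT_SEQUENCE_EDITOR="cp '$repo/.git/plan.txt'" \
  git rebase --interactive base >/dev/null 2>&1

# The fix commit and its message are gone; the corrected schema remains.
git log --format='%s' base..HEAD
```

If the exec command returned a non-zero code, the rebase would stop right after the melded migration commit and wait for git rebase --continue.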

rebase --interactive In Practice

Let’s try this out with our sample repo and finish the rebase --interactive we started:

becomes:

We basically want to put the changes from feature-1 into one commit, and give it a better commit message. Of course, we’ll still have to resolve the conflict – inherently conflicting changes will have to be handled in every way we could perform our integrations. How we perform the conflict resolution during interactive rebase is just the same as with other techniques, so let’s not cover it in detail yet again.

We next rebase quick-fix-1 onto feature-1.

becomes:

We drop a failed fix attempt and its reversion, wholesale. We decide to keep the documentation of the underlying problem that we identified in the process and squash it together with the proper implementation of the fix.

Squashing gives us a commit message like the above to edit. I added some actual content to the sample messages to show the value – if you have complex commits with explanatory commit messages (which are highly encouraged!), you will want to keep elements of them around when you combine commits.

In this case, we might end up with a message such as the following:

Note that this is a bit shorter than the combined commit messages – but it loses little information. This is one of the benefits of rewriting history when all is said and done: It is much easier to provide well-written messages with all information in hand.

Also note that this only pays off when you have longer, informative commit messages. When there’s just not much to write about some change, don’t force it – but consider rewording one commit and just marking the other commits to merge as fixup. Squash is only “worth it” when there are interesting messages to blend.
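A minimal sketch of a squash with a rewritten message, again in a throwaway repository. The commits, file names and messages are invented for illustration and are not the ones from the sample. Two stand-in editors keep it non-interactive: GIT_SEQUENCE_EDITOR supplies the plan, and GIT_EDITOR replaces the concatenated default message with the text we would otherwise type.

```shell
set -eu

# Scratch repository (hypothetical contents).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
git tag base

# First commit: documents the underlying problem.
echo "root cause: cache key ignores locale" > NOTES.txt
git add NOTES.txt
git commit -q -m "Document cache problem" \
  -m "Cache keys ignore the locale, so users see stale translations."

# Second commit: the actual fix.
echo "fixed" >> app.txt
git commit -q -am "Fix cache" -m "Include the locale in the cache key."

first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)

# Plan: keep the first commit and squash the second one into it.
printf 'pick %s\nsquash %s\n' "$first" "$second" > .git/plan.txt

# The blended message, shorter than both originals put together.
printf '%s\n\n%s\n' "Include locale in cache key" \
  "Cache keys ignored the locale, so users saw stale translations." \
  > .git/message.txt

GIT_SEQUENCE_EDITOR="cp '$repo/.git/plan.txt'" \
GIT_EDITOR="cp '$repo/.git/message.txt'" \
  git rebase --interactive base >/dev/null 2>&1

git log --format='%s%n%n%b' base..HEAD
```

Changing squash to fixup in the plan would keep only the first commit's message and would not open the message editor at all.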

Next we add feature-2:

becomes

Here we see a bit of reordering:

We put implementation of the new component and its documentation before its use in the history. Personally, I find it productive to program from usage to implementation (i.e., I write the function calls into my code as if the function already existed, and then let my IDE create the skeletons of the functions from the signature of its use). This is also true when writing tests first – TDD will have you build users of your functions before the functions themselves by definition. If we only commit when tests, users and implementation are done, on the other hand, we are committing very rarely, giving us little granularity for undoing a step or two.

Working by committing every step while creating our tests and/or users and while building parts of the implementation is a productive, fluid style while building a feature. There is a price, however: Once we integrate the use of the implementation before the implementation, we get non-functioning software if we have to move back a commit or two. If we integrate the new component first, however, worst case is that an unused component is included. This re-ordering, then, is not just a cosmetic but a real semantic change! This robustness will help you most at a time when you need it most urgently – in our experience, we don’t always have time to act with considered strategy when a hot fix is required.

Note that we implement the lesson from above here: A new component may need a bit of description and a richer commit message, so we choose to squash the messages together.

There’s a lot of talk of good commit messages in this post, but what is a good commit message? Content-wise, it depends entirely on your project and team, but generally, a good message should be as short as it can be while fully describing what was done, why it was done and what may be unusual about the solution (e.g. why a dependency was introduced or why a more straightforward solution could not be used), so that anybody reviewing or modifying that part of the code is warned off potential gotchas that may be tempting again later. The formal structure of a good commit message is explained very well in this post by Chris Beams.

Just using the new implementation should be pretty straightforward, so we just reword into a quick note and just fixup the actual call of the component into the first commit.

We finally rebase feature-3 onto feature-2, just rewording the message, and end up with this history:

Actively writing a history helps you end up with a much more intentional narrative about how the state of the project came to be.

Most developers understand the importance of readable code and test coverage, not just for building a robust system, but also for building a system that can be understood by others.

The same is true of version history, but this is much neglected on most projects. Writing your history as a biography of your code, instead of just a file store with undo or an audit log of what was typed when, is a powerful tool for staying more effective as systems grow.

Being able to git blame or git log a certain section of the code and actually seeing not just what happened, but getting both an explanation why it happened and what else was a logical unit with that change is very powerful. You will not want to miss it once you have experienced it.

So try it for yourself! Don’t get discouraged by the length of the section on rebase --interactive – I have simply skipped far fewer details than in the other sections because interactive rebase is not yet habitual to most git users.

You don’t have to rebase --interactive onto a different branch! It’s often very productive to first beautify your branch history by running git rebase --interactive commit-id-of-commit-before-you-branched-off. While rewriting the branch by itself, you won’t have to deal with merge conflicts, except maybe from reordering or dropping commits. When you are done with rewriting the branch, you can then rebase the result (which should have more self-contained, easier-to-read commits) onto your actual target for integration!
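The two-step approach can be sketched as follows, assuming a throwaway repository with made-up branch contents. git merge-base is used here to look up the commit before the branch-off, and the plan file copied in via GIT_SEQUENCE_EDITOR stands in for editing the instructions by hand.

```shell
set -eu

# Scratch repository (hypothetical contents).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "one" > feature.txt
git add feature.txt
git commit -q -m "Start feature"
echo "two" >> feature.txt
git commit -q -am "wip"

git checkout -q master
echo "other" > other.txt
git add other.txt
git commit -q -m "Unrelated master change"
git checkout -q feature

# The commit where feature branched off from master.
fork_point=$(git merge-base master feature)

# Step 1: beautify the branch in place. Nothing from master is involved yet.
printf 'pick %s\nfixup %s\n' \
  "$(git rev-parse feature~1)" "$(git rev-parse feature)" > .git/plan.txt
GIT_SEQUENCE_EDITOR="cp '$repo/.git/plan.txt'" \
  git rebase --interactive "$fork_point" >/dev/null 2>&1
cleaned_base=$(git merge-base master feature)   # still the old fork point

# Step 2: move the cleaned-up branch onto the actual integration target.
git rebase -q master

git log --format='%s' master..feature
```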

Back to Merge: Squashing

One important alternative to fully rewriting a branch history for integration is just compressing it into one commit. In fact, this is what we used rebase --interactive for in parts of the previous section!

There is an easier way to achieve that: git merge --squash. Let’s go over our sample once more:

We start on master, and this time we stay on it. When we run git merge --squash feature-1, we get the usual notification about conflicts (remember, no technique can do away with inherently conflicting changes!), and we resolve them.

When we are done and run git status, we’ll see that the totality of changes in feature-1 is visible, and already staged for commit (we can also verify that everything is as expected with a quick git diff --cached):

Let’s commit the changes with a short message: git commit -m 'Implement feature-1.'

We can then repeat the process with quick-fix-1, feature-2 and feature-3 (which we can just as well merge without --squash since it consists of just one commit). The resulting history looks like this:

Note that:

  • All feature branches are still present, and unchanged.
  • The history of master is extremely linear and easy to read.
  • We never left the master branch.
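A minimal sketch of the merge --squash flow for a single branch, assuming a throwaway repository with invented contents and, unlike the sample, no conflicts to resolve:

```shell
set -eu

# Scratch repository (hypothetical contents).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > README.txt
git add README.txt
git commit -q -m "Initial commit"

# A feature branch with two small commits.
git checkout -q -b feature-1
echo "step 1" > feature.txt
git add feature.txt
git commit -q -m "WIP feature-1"
echo "step 2" >> feature.txt
git commit -q -am "More feature-1 work"
feature_tip=$(git rev-parse feature-1)

# Integrate from master without ever leaving it.
git checkout -q master
git merge --squash feature-1 >/dev/null

# The whole branch is now staged as a single change.
git diff --cached --stat

git commit -q -m 'Implement feature-1.'

# One new, ordinary commit on master; feature-1 is untouched.
git log --format='%s' master
```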

merge --squash can be an extremely quick and straightforward process. Of course it also has drawbacks:

  • Sometimes you want commits from a branch to remain separate. This can be highly important – as in the note on commit order in the section on rebase --interactive above – or nice-to-have, like keeping typo fixes separate from semantically complicated commits so that the typo doesn’t come back when a problem forces a revert on the algorithm change.
  • Like any merge, you have to resolve conflicts all at once, unlike rebase where you can tackle commits one-by-one.

Despite these drawbacks, merge --squash is very useful for straightforward feature implementations that you just want to get onto master with the least possible fuss.

Conclusion

Git gives you a lot of options for integrating changes – which is great, but can make it hard to decide on one of them. In everyday usage, none of them are always wrong or always the right call, but there is a meaningful choice to be made in any given situation.

Having a full toolbox of concepts and techniques gives you the best git experience and will help you and your team build and deliver more robust and understandable software.

Help your team make the most of git by obtaining customized training. Our offering in this space, focusing on interactive exercises in your individual environment and considering your workflow needs, can be found here.