Bernd Haug, Senior Software Engineer


Don’t Fear The git

Perhaps the biggest obstacle we encounter when we provide git training to our clients is trepidation: for one reason or another, clients are afraid of shredding their source code while using git. Beginners have often heard rumours that it is easy to lose even already committed source code.

Others, mostly coming from SVN or SourceSafe, fear losing work that exists only on their own computers, since with distributed version control systems like git, changes are not automatically transmitted to a remote server with every commit. Both worries are unjustified, and sadly they can keep new users from experimenting, which is the best way to learn git. Let’s see why we can git with peace of mind!

Losing Commits Ain’t Easy

Fear number one is accidentally losing commits, mostly when integrating changes with git rebase instead of git merge (we encourage rebasing).

Let’s work through an example in detail.

We have finished implementing Feature 1. It’s time to rebase! We want to rebase feature-1 onto master so that we resolve any conflicts on the feature branch before its commits are included in master.
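As a minimal sketch of that step, the following script builds a throwaway repository and rebases feature-1 onto master. The file names, the commit on master and the demo identity are our own assumptions for illustration; only the branch names and the "Feature 1 is finished." message come from the example. The sketch deliberately avoids conflicts so that it runs unattended.

```shell
set -e
# Throwaway repository in a temporary directory (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Work on the feature branch.
git checkout -q -b feature-1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature 1 is finished."

# Meanwhile, master moves on.
git checkout -q master
echo "other work" > other.txt
git add other.txt
git commit -q -m "Other work on master"

# Replay the feature-1 commits on top of the current master.
git checkout -q feature-1
git rebase -q master

git log --oneline --graph --all
```

In a real rebase with conflicts, git stops at each conflicting commit; after editing the files you would run git add on them and then git rebase --continue.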


We resolve conflicts as they come up, merging each commit in turn, and end up with the history looking like this:

We are done rebasing, but on closer inspection, we’re really unhappy with how we resolved the conflicts. What now? We seem to be stuck with our results and the original commit contents are nowhere to be seen!

Reflog to the Rescue

Happily, the git reflog is always there for us. Call git reflog and behold:

The reflog shows the commits that HEAD (the checked out commit) last pointed to. It goes quite a way back (the list is truncated here, but already goes back to before the start of feature-1), and we can still work with every commit that is shown here – we just don’t see them by default since they’re not pointed to by a branch or tag (a “ref” in git lingo).

Each line of the list consists of a commit hash followed by a description of the operation that moved HEAD to that commit. Let’s make a branch that points to the HEAD before the rebase started, listed as 56e70d5 HEAD@{5}: commit: Feature 1 is finished.:
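The recovery can be sketched end to end in a throwaway repository. The setup (file names, the extra commit on master, the demo identity) is assumed for illustration. In a real session you would copy the hash from the reflog listing, as with 56e70d5 above; the script instead uses feature-1@{1}, the previous tip of the branch according to its own reflog, because the hashes differ on every run.

```shell
set -e
# Throwaway repository with a rebased feature branch (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git checkout -q -b feature-1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature 1 is finished."
git checkout -q master
echo "other work" > other.txt
git add other.txt
git commit -q -m "Other work on master"
git checkout -q feature-1
git rebase -q master

# Inspect where HEAD has been; the pre-rebase commit is still listed.
git reflog

# feature-1@{1} is the commit the branch pointed to before the rebase moved it.
before=$(git rev-parse 'feature-1@{1}')

# Give the old commit a name so that it shows up in logs and GUIs again.
git branch restore-feature-1 "$before"

git log --oneline --graph --all
```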


We can now see our unchanged master and our rebased feature-1, while restore-feature-1 points to the state of feature-1 before we ever started rebasing.

Short of explicitly deleting the reflog (or the whole repository clone containing it!), this lifeline stays available for a good while: by default, git only expires reflog entries for otherwise unreachable commits after 30 days. So rely on it and rebase away, or just create the “safety” branch before even starting the rebase so that it’ll always be there for comparison. Unless you push it, nobody even needs to see it; that’s the beauty of distributed version control.
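The “safety” branch variant might look like the sketch below. The branch name safety-feature-1 and the demo setup are hypothetical. The script bookmarks the branch, rebases, compares the two states and then rolls the rebase back.

```shell
set -e
# Throwaway repository (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"
git checkout -q -b feature-1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature 1 is finished."
git checkout -q master
echo "other work" > other.txt
git add other.txt
git commit -q -m "Other work on master"

# Bookmark the current state of feature-1 before touching it.
git branch safety-feature-1 feature-1

git checkout -q feature-1
git rebase -q master

# Compare the rebased branch with the bookmark.
git diff --stat safety-feature-1 feature-1

# Not happy with the result? Move feature-1 back to the bookmark.
# Note: --hard also discards uncommitted changes, so commit or stash first.
git reset -q --hard safety-feature-1
```

Once the rebase has been reviewed and accepted, the bookmark can be removed with git branch -D safety-feature-1.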

Distributed Version Control Doesn’t Mean No Safety Net

The other fear we often encounter is that it’s easy to lose work with git, since work doesn’t get backed up to a server automatically with every version control operation as it does with centralized systems. We’ll offer two quick suggestions on that topic:

You Don’t Have To Share Every Commit

With SVN or similar systems, we’re often afraid to commit every change since it may not work immediately, or look silly; creating and merging (or discarding) a lot of branches has the same issue (and merging with SVN is a lot of work anyway).

SVN users tend to commit rarely, and use few branches.

With git, commit everything, branch where you want – just don’t push the results. Always having a checked in history locally gives a lot of safety.

It may not be a remote backup, but in our experience, we’ve made a lot more mistakes that we wish we could undo safely than we had disks die on us.

Actually, Push Everything Anyway, But…

Lots of git hosting systems allow making personal clones (“forks”) of repositories. Where this facility is available, have a personal remote copy of the repository, and push to that. This provides a backup, and a place where you can e.g. selectively share refactorings with colleagues without muddying up the “official” history.

Personal forks also facilitate a “pull request” based workflow, where you always push your changes to your own fork and ask another developer to “pull” the changes from there for review and integration.
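A rough sketch of pushing to a personal fork follows. A local bare repository stands in for the fork on the hosting system, and the remote name myfork and the branch name refactoring are made up for this example; with a real hosting service you would use the URL of your fork instead of a local path.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the personal fork on a hosting service (hypothetical).
git init -q --bare "$work/my-fork.git"

# Our local working clone (hypothetical demo data).
git init -q "$work/clone"
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Some work that is not ready for the official history.
git checkout -q -b refactoring
echo "cleanup" > cleanup.txt
git add cleanup.txt
git commit -q -m "Try out a refactoring"

# Register the fork as an additional remote and push only this branch to it.
git remote add myfork "$work/my-fork.git"
git push -q myfork refactoring

# The branch now exists in the fork as a backup, ready to be shared or reviewed.
git ls-remote --heads myfork
```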

This seems bothersome at first, but is one of the few workflows that organically lead into code review. Reviews may take some time, but skipping them, in our experience, is a bit like saving time while driving by not checking fluids.

Nothing’s Impossible, Though

All this said, there actually are some git operations that can destroy data in a way that cannot be recovered by using git’s tooling.

The data in danger here are not changes that have ever been committed, whether the commit is currently visible in a GUI or not, but rather changes that have not been committed yet.

The first type of these changes is changes to version controlled files that only exist in the working directory (freshly saved edits) or the index (you have run git add but not committed yet).

The main command that can accidentally destroy these is git reset:

git reset --soft only moves the HEAD to an earlier commit, which is recoverable using the reflog. git reset --mixed (the default if no mode is specified) additionally overwrites the staging area (where changes go when you git add them) with the commit’s contents. git reset --hard overwrites both the staging area and the work tree (the directory with the files you edit directly) with the commit’s contents. Overwritten staging area and work tree changes are not recorded in the reflog, so do not count on git itself to bring them back.
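To see the three modes side by side, here is a small experiment in a throwaway repository. The file name and its contents are assumptions for illustration; the comments describe the state after each command.

```shell
set -e
# Throwaway repository with two commits (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
echo "v2" > file.txt
git add file.txt
git commit -q -m "Second"
second=$(git rev-parse HEAD)

# --soft: only HEAD moves; staging area and work tree still contain "v2".
git reset -q --soft HEAD~1
git status --short              # file.txt is listed as staged
git reset -q --soft "$second"   # move back; the hash is also in the reflog

# --mixed: the staging area is overwritten, the work tree is left alone.
echo "v3" > file.txt
git add file.txt                # "v3" is staged but not committed
git reset -q --mixed HEAD
git status --short              # file.txt is modified but no longer staged
cat file.txt                    # the work tree still holds "v3"

# --hard: staging area AND work tree are overwritten.
git reset -q --hard HEAD
cat file.txt                    # back to the committed "v2"; "v3" is gone
```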

The other kind of changes that you can accidentally destroy with git are untracked files – i.e. files that you never ran git add on.

The main command that accidentally deletes untracked files is git clean.

git clean serves explicitly to remove untracked files (i.e., files not under version control) from a work tree. git clean -f deletes untracked files; adding -d (git clean -fd) also removes whole untracked directories. Of course, untracked files cannot be restored from git commits.

This is really useful, e.g. to restore work areas on build servers to a pristine condition – as if just cloned – without actually repeating the whole time-consuming git clone run.

Just remember never to run git clean in an actual working directory without first checking with git status which untracked files you might wish to preserve.
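One cautious routine, sketched in a throwaway repository, is to look at git status, let git clean -n list what it would delete without deleting anything, rescue what matters and only then clean for real. The file and directory names are invented for this example.

```shell
set -e
# Throwaway repository with one tracked file (hypothetical demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "tracked" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"

# Untracked content: one file worth keeping, one scratch directory.
echo "important notes" > notes.txt
mkdir scratch
echo "temporary" > scratch/tmp.txt

# Step 1: see what is untracked.
git status --short

# Step 2: dry run; -n only lists what would be removed.
git clean -n -d

# Step 3: rescue what should be preserved by committing it.
git add notes.txt
git commit -q -m "Keep the notes"

# Step 4: now actually remove the remaining untracked files and directories.
git clean -f -d -q
```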

In Conclusion

git offers a lot of “safety equipment” in its tooling and organic workflows, but is not always obvious about it. While some stumbling blocks exist, very few operations are truly unrecoverable, so don’t fear experimentation, especially with already committed contents.
