Version control (pt. 1): An introduction

This post is the first part of a series on version control. It provides an introduction by explaining what version control actually is, why you should probably use it, and how it works in general.

Important terms

Version control (also: revision control) is a means of preserving various versions of a file or of multiple files. This can be done in a lot of different ways, but over time some best practices have emerged that are more or less followed in all modern version control systems.

There are lots of cases where version control makes sense. One of the most common is software development, where using version control is virtually mandatory. That is why there’s even a separate term describing this form of version control: source (code) control or source code management.

Version control systems fall into two groups: local and network-based systems. The latter can be further divided into centralized and distributed ones.

In short: Why you may need it

Depending on the choice of tool, there are various situations where you can benefit from version control:

  • Version control enables you to quickly return to an older, known-good state in case something broke
  • Version control gives you the possibility to precisely document any changes you made and thus provides both traceability and a quick overview of changes
  • Version control solves a lot of the problems that occur if multiple people work on the same files at the same time
  • Version control makes it simple to clearly find out the author of any change

Or as a former colleague of mine put it (in a very vivid way):
[Image: Why to use version control!]

Manual version control

The simplest form of version control is working with a backup copy. You make a copy of e.g. a configuration file before making a change to the live file. Afterwards you test your changes, and if they seem to work you either delete the backup or keep it for reference. If the changes had undesired effects, the live file is overwritten with the backup (and the latter is usually deleted again). This is actually a (rather primitive but sometimes sufficient) form of version control: Thanks to the backup copy you have two versions of the file at hand!
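The backup-copy routine described above could be sketched in Python roughly like this. It is only an illustration: the function name `change_with_backup`, the `.bak` suffix and the two callbacks are assumptions made up for this example, not part of any real tool.

```python
import shutil
from pathlib import Path


def change_with_backup(path, apply_change, looks_good, keep_backup=False):
    """Edit a file with a backup copy as a safety net (hypothetical helper).

    apply_change(path) modifies the live file, looks_good(path) tests it.
    Returns True if the change was kept, False if the original was restored.
    """
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # assumed naming scheme

    shutil.copy2(path, backup)   # version 1: the untouched original
    apply_change(path)           # version 2: the live file after the edit

    if looks_good(path):
        if not keep_backup:
            backup.unlink()      # change works, backup no longer needed
        return True

    shutil.copy2(backup, path)   # undesired effects: restore the original
    backup.unlink()
    return False
```

Even this tiny helper shows the limits of the approach: there are never more than two versions, and nothing records what was changed or why.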

[Image: Backup copy – we’ve all done it]

Another variant is to make a copy after a fixed amount of time (or at random). Often people prepend the date to the file name. If all you want to accomplish is that e.g. the data as it was on the first of each month is preserved for one year, that’s also a sufficient method (along with rotating the backup copies so that you don’t keep around more of them than you need).
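As a rough sketch, dated copies with rotation might look like this in Python. The function name `backup_with_date`, the `YYYY-MM-DD_` prefix and the `keep` parameter are assumptions chosen for this example; a real setup would more likely be a small script run by a scheduler.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_with_date(source, backup_dir, keep=12, today=None):
    """Copy `source` into `backup_dir` with the date prepended, then rotate.

    Hypothetical helper: keeps only the `keep` newest dated copies.
    `today` can be passed in to make the function testable.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    stamp = (today or date.today()).isoformat()       # e.g. 2024-03-01
    target = backup_dir / f"{stamp}_{source.name}"
    shutil.copy2(source, target)

    # ISO dates sort chronologically when sorted as plain strings.
    backups = sorted(backup_dir.glob(f"????-??-??_{source.name}"))
    for old in backups[:-keep]:                       # everything but the newest `keep`
        old.unlink()
    return target
```

Called on the first of each month with `keep=12`, this would preserve one year of monthly states and drop anything older.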

Those means of manual version control are however pretty limited. And worse: There’s plenty of room for making mistakes!

[Image: Manual version control]

How a version control system (VCS) works

Let’s think about a simplified versioning process by pretending to do things by hand. For each recorded change of a file you’d make a copy of the file and keep all the old versions of it. A VCS does not forget a single version of a file it monitors! That’s what it is meant to do, after all. Each new version of the file gets a comment that is meant to briefly sum up the changes that were made.

Keeping dozens or even hundreds of files around because one file has that many revisions would really clutter your disk. Also, it would not make sense to keep nearly the same file twice if only one line was changed! That’s why local VCS, which version on a per-file basis, keep a “history file” around for each versioned file. That file records only the changes between the various versions as well as the comments.
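To make the idea of a history file a bit more tangible, here is a minimal Python sketch that stores the first version in full and, for every later revision, only the lines that changed plus a comment. It relies on `difflib.SequenceMatcher` from the standard library. The names `make_delta`, `apply_delta` and `HistoryFile` are invented for this example, and the delta format is a simplification, not the format any real VCS uses.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn `old` into `new`, storing only the new lines."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse unchanged lines old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))   # lines introduced by this revision
        # "delete": nothing to store, the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


class HistoryFile:
    """Toy per-file history: one full base version plus deltas and comments."""

    def __init__(self, lines, comment):
        self.base = list(lines)
        self.revisions = [(comment, None)]      # revision 1 has no delta
        self._latest = list(lines)

    def commit(self, lines, comment):
        self.revisions.append((comment, make_delta(self._latest, lines)))
        self._latest = list(lines)

    def checkout(self, number):
        """Return revision `number` (1-based) by replaying the deltas."""
        lines = list(self.base)
        for _comment, delta in self.revisions[1:number]:
            lines = apply_delta(lines, delta)
        return lines


history = HistoryFile(["port=80", "debug=off"], "Initial version")
history.commit(["port=8080", "debug=off"], "Move to port 8080")
old_version = history.checkout(1)   # the original two lines, rebuilt on demand
```

For the second revision the history holds just the one changed line and a reference to the unchanged one, which is the whole point of recording changes instead of copies.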

Network-based VCS are able to organize a whole project (multiple files together instead of each one separately). Like local systems, they record the changes between revisions rather than the full files, along with the comments. All of that data is collected in a so-called repository.

If a new team member wants to start working on the project, he or she first needs to get the files of that project. For centralized VCS this is done by checking out the most current revisions of all the project files from the remote repository. By doing so, a local working copy of the project files is created to be worked on by the user. When using a distributed VCS the remote repository is cloned instead, so the user receives the full repository with all revisions and not just the most current version of each file. The working copy is then checked out from the local repository clone.
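The difference between checking out and cloning can be illustrated with a toy model in Python. Everything here is hypothetical (`Repository`, `checkout`, `clone`), and for simplicity each revision is stored as a full snapshot instead of as a set of changes; the only point is what ends up on the new team member’s machine.

```python
import copy


class Repository:
    """Toy repository: a list of revisions, each mapping file name to content."""

    def __init__(self):
        self.revisions = []

    def commit(self, files):
        self.revisions.append(dict(files))

    def latest(self):
        return dict(self.revisions[-1])


def checkout(remote):
    """Centralized style: fetch only the most current revision as a working copy."""
    return remote.latest()


def clone(remote):
    """Distributed style: copy the whole repository including all revisions."""
    local = Repository()
    local.revisions = copy.deepcopy(remote.revisions)
    return local


remote = Repository()
remote.commit({"README": "first draft"})
remote.commit({"README": "second draft", "setup.cfg": "[options]"})

working_copy = checkout(remote)        # centralized: just the current files
local_repo = clone(remote)             # distributed: full history, stored locally
working_copy_2 = local_repo.latest()   # working copy taken from the local clone
```

With the clone, older revisions remain available even without a connection to the remote repository; with a plain checkout they have to be fetched from the server.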

At the beginning of a new project there is no repository yet. In this case an empty repository is created and checked out, and the new working directory is populated with files. In the next step, those files are placed under version control (which means that the VCS is told to watch them and record changes). Then all changes (at this point all of the files, since they are all new) are placed into the repository by doing a commit.
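Here is a small Python sketch of that sequence: create an empty repository, check it out, add files, put them under version control and commit. The class `ToyRepository` and its methods are made up for illustration and only mimic the steps conceptually; real systems differ in their commands and in how they store data.

```python
class ToyRepository:
    """Toy model of a repository that records only files it was told to watch."""

    def __init__(self):
        self.tracked = set()   # names of files under version control
        self.history = []      # list of (comment, snapshot) tuples

    def add(self, name):
        """Place a file under version control."""
        self.tracked.add(name)

    def commit(self, working_copy, comment):
        """Record the current state of all tracked files; returns the revision number."""
        snapshot = {
            name: working_copy[name]
            for name in sorted(self.tracked)
            if name in working_copy
        }
        self.history.append((comment, snapshot))
        return len(self.history)

    def checkout(self):
        """Return a working copy of the most current revision (empty if none)."""
        return dict(self.history[-1][1]) if self.history else {}


repo = ToyRepository()                 # 1. create an empty repository
working = repo.checkout()              # 2. check it out: an empty working copy
working["README"] = "Hello project"    # 3. populate the working directory
working["notes.tmp"] = "scratch"       #    (this file is never added)
repo.add("README")                     # 4. place files under version control
revision = repo.commit(working, "Initial import")   # 5. commit

colleague_copy = repo.checkout()       # another member gets the current state
```

Note that `notes.tmp` never reaches the repository: only files that were explicitly placed under version control are recorded.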

After every change made to the project you commit again, recording the changed state in the repository. Other project members can now get a current copy from the repository. This way it’s easy to work together on the same project without the risk of (unknowingly) getting in somebody else’s way.