In the previous post I gave a short introduction to version control and explained a few basics that help to understand the topic. This post assumes you know the things that version control systems have in common. So now it’s time to discuss what sets them apart – not individually, yet, but in terms of characteristics that some of them share with each other.
Each version control system (VCS) has its unique advantages and disadvantages. However, there are some traits which are common to several of them. And since the programs that share those traits were typically released rather close to each other, it makes sense to speak of generations of version control systems.
So far there are three of them, with the first obviously being the oldest and the third the newest. What may surprise you is that tools of the earlier generations, even though they are much more limited, have not completely disappeared. How come? Well, just keep it in mind while we take a look at those generations. Perhaps you can see where tools of older generations may still make sense!
The previous article discussed manual “version control” and its limitations. In short: It can work for you if your project is rather small, you’re working on it alone and you’ve got the discipline to always back up your files properly before you make bigger changes. In a world of more sophisticated software it’s quite unlikely that many projects meet all of those requirements. Therefore it totally makes sense to develop programs that assist you in doing proper version control on your projects.
The first generation
The first generation did exactly that: It preserved any changes you made to a single file. Yes, it is one characteristic trait of the first generation that it works on a per-file basis. Version control is entirely separate for each and every file that you choose to record changes for. For each file it manages, the VCS creates a “history file” which contains all the differences from one version to the next plus a comment.
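To make the idea of a “history file” a bit more tangible, here is a minimal sketch in Python. It is not how any particular tool stores its data. The names `FileHistory`, `make_delta` and `apply_delta` are made up for this post, and the layout (first version in full, then one delta plus comment per revision) is just one assumed way to do it.

```python
import difflib


def make_delta(old, new):
    """Return (start, end, replacement_lines) edits that turn old into new."""
    matcher = difflib.SequenceMatcher(a=old, b=new)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old, delta):
    """Apply the edits produced by make_delta and return the new lines."""
    result = list(old)
    # Work from the end so that earlier indices stay valid.
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


class FileHistory:
    """Hypothetical history of ONE file: a base version plus deltas."""

    def __init__(self, lines, comment):
        self.base = list(lines)
        # Revision 1 is the base itself, so its delta is empty.
        self.entries = [(comment, [])]

    def check_in(self, lines, comment):
        """Record a new revision and return its revision number."""
        delta = make_delta(self.checkout(), list(lines))
        self.entries.append((comment, delta))
        return len(self.entries)

    def checkout(self, revision=None):
        """Rebuild the given revision (default: the latest one)."""
        if revision is None:
            revision = len(self.entries)
        lines = list(self.base)
        for _comment, delta in self.entries[:revision]:
            lines = apply_delta(lines, delta)
        return lines


history = FileHistory(["a", "b"], "initial version")
history.check_in(["a", "b", "c"], "add a third line")
print(history.checkout(1))  # the file as it was at revision 1
```

Note that every file gets its own `FileHistory` with its own revision counter. Nothing ties the histories of two files together, which is exactly the per-file trait described above.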
Originally, it was common to use VCS of the first generation on multiuser systems. If multiple users can work on the same project at the same time (via different logins), it’s quite possible that conflicts arise. If two people make changes to the same file, the one who saves last “wins” and overwrites all changes that somebody else may have made in the meantime. To avoid that, locking was invented. Files can be locked while they are being edited. If somebody decides to edit a file, acquires a lock and then does something else, the file remains locked. An administrator can, however, break a lock if something like this happens.
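The locking rules fit into a few lines of Python. This is only a sketch under my own assumptions: the class `LockTable`, the exception `LockError` and the user names are invented for illustration and do not mirror the interface of any real tool.

```python
class LockError(Exception):
    """Raised when a lock cannot be taken or released."""


class LockTable:
    """Hypothetical lock bookkeeping: at most one editor per file."""

    def __init__(self, admins=()):
        self.locks = {}  # file name -> user holding the lock
        self.admins = set(admins)

    def lock(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise LockError(f"{path} is locked by {holder}")
        self.locks[path] = user

    def unlock(self, path, user):
        holder = self.locks.get(path)
        if holder is None:
            return  # nothing to release
        # Only the holder may release the lock; an admin may break it.
        if holder != user and user not in self.admins:
            raise LockError(f"{path} is locked by {holder}")
        del self.locks[path]


table = LockTable(admins={"root"})
table.lock("httpd.conf", "alice")    # alice starts editing
try:
    table.lock("httpd.conf", "bob")  # bob has to wait
except LockError as error:
    print(error)
table.unlock("httpd.conf", "root")   # the administrator breaks the lock
table.lock("httpd.conf", "bob")      # now bob can edit
```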
Today VCS of the first generation are more or less obsolete. There are niches where they managed to survive and are still being used. One example is management of configuration files on *nix systems without centralized configuration management. These configuration files are only relevant to the system they exist on, so the lack of networking capabilities (which the second generation introduced) is no disadvantage. And most of the time these configuration files are separate entities not related to other files, so even the limitation of managing each file on its own is not a problem here.
The second generation
There are two important aspects that set tools of the second generation apart from those of the first: They offer network capabilities and they can manage multiple files in one project! The latter ability solves a whole bunch of problems which made first generation tools hard to work with on anything but very small projects. Managing each file separately does not sound too bad at first. But think about it for a minute.
Let’s imagine we work on a simple project. Nothing too fancy: a few source code files and one header file. Currently the program is broken and we have decided to go back to a working version. Good thing that we have version control, right? Right… Sort of. The file main.c is currently at revision 96, foo.c at revision 44, bar.c at 24 and baz.h at revision 7. See the problem? After we found out that revision 89 of main.c broke the program and we reverted to 88, how do we find out which revisions of the other files belong to revision 88 of main.c? Yes, we have time stamps, and with them we can work out which revision each of our files had when main.c was at revision 88. Maybe it’s not even that bad when we only have four files. But what if we have 20? 100? It’s cumbersome and really a waste of time. Keep things like this in mind and you’ll definitely come to appreciate the ability to manage multiple files together in one project, where the revision number increases whichever file was changed and however many files were modified!
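The time stamp detective work could be automated roughly like this. The sketch assumes that each file’s history is available as a sorted list of check-in times, where position 0 belongs to revision 1. The function name `snapshot_for` and the small numbers standing in for time stamps are made up, and the result is only a heuristic: it tells you what was checked in at that moment, not that this combination of files ever worked together.

```python
from bisect import bisect_right


def snapshot_for(histories, anchor_file, anchor_revision):
    """Guess which revision of every file was current when
    anchor_file reached anchor_revision (0 = file did not exist yet)."""
    when = histories[anchor_file][anchor_revision - 1]
    return {
        name: bisect_right(times, when)
        for name, times in histories.items()
    }


# Hypothetical check-in times (plain integers instead of real dates).
histories = {
    "main.c": [10, 20, 30, 40],
    "foo.c": [12, 35],
    "baz.h": [5],
}

print(snapshot_for(histories, "main.c", 3))
# {'main.c': 3, 'foo.c': 1, 'baz.h': 1}
```

With one revision number for the whole project, none of this is needed: asking for a single revision gives you every file in the matching state.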
Now that larger projects are possible because the whole project is managed together in one repository, it makes sense to use the network as well. This networking capability is achieved by providing one centralized repository which all project members (or even everybody interested in the project) can check out to create a local working copy of the latest revision (or any older one if needed). Changes are made locally and then committed, which checks them back in to the centralized repository. Tools of the second generation only allow a check-in if nobody else has checked in in the meantime. If somebody has, you need to update your working copy first and, if at least one file was modified by both of you, merge your changes with the remote ones. Because of this, locking is not necessary anymore!
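Here is a deliberately simplified Python model of that rule. `CentralRepository` and `OutOfDateError` are names I made up, the whole project is just a dictionary of file contents, and the “merge” in the example is nothing more than re-applying a change on top of a fresh checkout. Real tools are more fine-grained than this.

```python
class OutOfDateError(Exception):
    """Raised when a working copy is older than the repository."""


class CentralRepository:
    """Hypothetical central repository with project-wide revisions."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty project

    @property
    def head(self):
        return len(self.revisions) - 1

    def checkout(self, revision=None):
        """Return (revision number, copy of all files at that revision)."""
        if revision is None:
            revision = self.head
        return revision, dict(self.revisions[revision])

    def commit(self, base_revision, files):
        """Check in a working copy that was based on base_revision."""
        if base_revision != self.head:
            raise OutOfDateError("update your working copy first")
        self.revisions.append(dict(files))
        return self.head


repo = CentralRepository()
alice_base, alice_files = repo.checkout()
bob_base, bob_files = repo.checkout()

alice_files["main.c"] = "int main(void) { return 0; }"
repo.commit(alice_base, alice_files)     # becomes revision 1

bob_files["README"] = "Hello"
try:
    repo.commit(bob_base, bob_files)     # rejected: alice was faster
except OutOfDateError:
    bob_base, bob_files = repo.checkout()  # update ...
    bob_files["README"] = "Hello"          # ... re-apply the change ...
    repo.commit(bob_base, bob_files)       # ... becomes revision 2
```

Nobody had to lock anything. The second check-in is simply refused until the working copy is up to date.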
Today tools of the second generation still play an important role. Their appeal is declining, however. This is due to a few shortcomings which the third generation tries to address.
The third generation
The big innovation that is common to all tools of the third generation is that they are decentralized. Users usually don’t check out files from a central (probably remote) repository. Instead they clone the full repository and then check out the files from their local clone. Since the local repository is exactly the same as the original one, there’s no longer one central repository – at least from a technical view. And while cloning requires transferring a lot more data over the wire (especially for large projects), there are some huge benefits to it.
If you have a local clone, you can work on the project even when you’re not online. You can see the complete history and check out earlier revisions if you need to – all without having to access a central repository. If you’re online, you can always sync your local clone with the original repo (pull down changes) or even the original one with your local repository (push up changes) if you have write access.
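Clone, pull and push can be modelled in a few lines as well. The following Python sketch assumes a strictly linear history in which a commit is just its message. The class `Repository` and its methods are invented for this post. Actual tools keep much richer data and can merge diverged histories instead of giving up.

```python
class Repository:
    """Hypothetical repository holding a linear list of commits."""

    def __init__(self, commits=()):
        self.commits = list(commits)

    def clone(self):
        """Copy the COMPLETE history, not just the latest files."""
        return Repository(self.commits)

    def commit(self, message):
        """Record a commit locally; no network access involved."""
        self.commits.append(message)

    def pull(self, other):
        """Fetch commits that other has and we lack."""
        shared = min(len(self.commits), len(other.commits))
        if self.commits[:shared] != other.commits[:shared]:
            raise ValueError("histories have diverged, a merge is needed")
        self.commits.extend(other.commits[len(self.commits):])

    def push(self, other):
        """Send our new commits to other (needs write access in real life)."""
        other.pull(self)


origin = Repository(["initial import"])
local = origin.clone()

local.commit("fix typo")      # works offline
print(local.commits)          # full history is available locally
local.push(origin)            # back online: publish the change
print(origin.commits == local.commits)  # True
```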
One of the biggest advantages of decentralized tools is that they make forking much easier. While forking a project was frowned upon in the past, today you’ll often see projects asking you to fork and play with their code (e.g. the well-known “Fork me on GitHub”). Experience has shown that quite a few people fork a project, add a feature they need and then give their code back to the project. This is done by creating a pull request, which invites the administrators of the original project to pull in the changes if they want.
Which one to use?
If you like software history, there’s nothing wrong with trying out tools of the older generations. But if you’re just starting out with version control and you want to learn something now, it makes sense to choose a tool of the third generation. Which one would I recommend? I can only give the usual answer to such a question: It depends. Each one has its strengths and weaknesses. In the next blog posts we’ll take a closer look at some of the open source VCS of all generations. This might help you to choose the right one for your purpose.