Developer-centric version control considered harmful
Why you shouldn't host a project under your own name if you care about others using it
Git is definitely the flavour of the month when it comes to version control systems. In a relatively short space of time, it has attained quite substantial mindshare. I think this has a lot to do with GitHub. And I think GitHub's success has a lot to do with its friendly, inviting user interface. It's a great service, and it has managed to become a de facto home for Git repositories for a lot of people.
The great strength of Git - enhanced by GitHub - is that it allows people to put up their own repositories very easily. As a developer, I have a natural home for all manner of personal projects, and a great tool chain around them, including a Wiki and an issue tracker.
So what's the problem? Consider this URL:
Notice that my (user)name is in the URL.
Once the project outgrows my own tinkering, I may have some contributors, who want to tinker on their own. Git and GitHub encourage this kind of thing. They can fork my project with just one click. But now there are multiple versions of the repository (and the wiki and the issue tracker), and no-one can really be too sure which one to use.
There are lots of potential problems with this, with lots of potential solutions. Let's think about a few scenarios.
- Someone else forks my repository and tinkers a bit. They then email me to ask me to pull their changes into my repository. If I have the time and inclination (and read my email), that may bring us back to one canonical version. Over time, however, I become a single point of failure as the maintainer. At some point, I may stop caring or get too busy to follow up.
- Here, Git has an answer: anyone can fork my repository and take over ownership. This is powerful and adds fault tolerance to the "single maintainer" model. But socially, it is dubious. Who really controls development? Who has a right to publish a new release? What if I come back after a few months and decide I wanted to own the project all along? We now need to reconcile the two forks. This may be tricky. Perhaps we don't do it. People now have to figure out which one of two versions of the software to "bet" on. This is onerous, and likely to deter other contributors.
- Let's say we bring the main line of development back to "my" repository. Once again, I have to commit a lot of time to reviewing and pulling in changes. Maybe I fall behind. So another user comes along and wants to do some work. Which of the two trees does he fork? Which one does he pull updates from?
- Perhaps I decide to relinquish control to another developer. I let my GitHub fork die, and expect people to use another one. But how do I manage this? How do I hand over ownership to someone else's account without confusing people who've been using my repository? Or people who stumble upon it in the future?
Sooner or later, some order has to be imposed on this model. Here, unfortunately, Git and GitHub offer only a handwave. There is a multitude of models that we could adopt. Perhaps we move the project to a more "project-centric" hosting service such as Google Code or Launchpad. Perhaps we set up an "organisation" user on GitHub with shared ownership, and move our code there (but what organisation would that be? For a small project, there may not be an obvious one). Perhaps we continue with a highly decentralised model, where pulling changes is always an ad-hoc task, and no-one has a stronger claim to ownership than anyone else. There are pros and cons to all of these options, of course, but none seems especially good for a project that has a few users, a well-populated wiki and a bunch of issues in its tracker.
Here's a cautionary tale that happened to me today: I've been using Soaplib, a Python library for building SOAP servers. There was a release on PyPI, but it had a bug that meant I had to work from a checkout instead. The PyPI page listed a GitHub URL, so I used that to clone the repository. A few weeks later, all our builds suddenly broke. The remote repository was gone, deleted from GitHub. It turns out the original maintainer had handed ownership to another person, who had his own GitHub repository. Development had gone on there for months, unbeknownst to me. One day, the original maintainer decided to delete his repository so as not to confuse people. Noble, but rather inconvenient.
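One way a build could guard against a remote disappearing like this is to probe a list of candidate repositories and use the first one that still answers. The following is only a minimal sketch, not something I actually had in place: the function names and the example URLs are hypothetical, and it assumes a `git` executable is available on the PATH.

```python
import os
import subprocess
from typing import Callable, Iterable, Optional


def remote_is_reachable(url: str, timeout: float = 30.0) -> bool:
    """Return True if `git ls-remote` can find HEAD at `url`."""
    # Disable interactive credential prompts so a deleted or private
    # repository makes the command fail instead of hanging the build.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--exit-code", url, "HEAD"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            env=env,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    return result.returncode == 0


def first_reachable(
    urls: Iterable[str],
    probe: Callable[[str], bool] = remote_is_reachable,
) -> Optional[str]:
    """Return the first URL for which `probe` succeeds, or None."""
    for url in urls:
        if probe(url):
            return url
    return None


if __name__ == "__main__":
    # Hypothetical candidates, listed in order of preference.
    candidates = [
        "https://example.invalid/original-owner/project.git",
        "https://example.invalid/new-owner/project.git",
    ]
    print(first_reachable(candidates))
```

This only softens the failure. Someone still has to know that the second URL exists, which is exactly the discovery problem described above.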
Again, there are lots of solutions to this problem. However, I think that fundamentally, as a user and potential contributor to a library, I want to find "the" repository and commit my changes there. A personal fork is a good idea until I can get access, but there has to be a path for me to get repository access and become a recognised, trusted contributor. Open source projects have used this model for years, as a way to encourage, recognise and empower contributors and build a shared sense of ownership around the project. I expect the project to be bigger than any one contributor or owner, and I expect the infrastructure to be able to outlive their involvement.
This is why, for things that are a bit bigger than one person, and a bit smaller than a major open source project, I've got mixed feelings about GitHub, and even mixed feelings about Git itself. They are great tools. I find myself wanting to use them. But I also worry that they are too flexible for their own good, and that the most obvious way of using GitHub is not a good one for open source projects.
I think Hanno Schlichting put it well: With Subversion, you get a development model which, whilst not perfect, is easy to understand and has worked out very well in practice for numerous projects. With Git, you're expected to make up your own model. I don't think people are very good at that, and they rarely plan ahead for when their code outgrows them.
I know it's possible to use Git like Subversion, with a central repository. But I'm also not seeing a great many people doing that. We have to remember that Git was first built for the Linux kernel, which has a development model unlike most other projects, where it really is up to a set of core maintainers to review every patch and selectively pull it in. They need the kind of flexibility and power that Git offers. For many other projects, I'm not sure this flexibility is always a good thing.