Python package management
easy_install, zc.buildout, workingenv, setuptools, virtualenv... are we done yet?
A while ago now, a debate was raging on the Plone mailing lists about the relative merits of workingenv and zc.buildout, two systems to make Python eggs easier to manage. Very briefly, the problem is that setuptools, the system that enables eggs with dependency management, will tend to install packages in the global Python site-packages directory. Sometimes that makes a lot of sense, but when we work with something like Plone, we often want different versions of different packages for different instances, and we don't necessarily want to pollute the global Python interpreter with packages that are used exclusively for one particular Plone instance.
There are roughly two approaches to this problem: Workingenv took the approach of creating a mini-environment inside a particular directory, complete with bin/python, bin/easy_install, lib/python2.4/site-packages and so on. By "activating" this environment, or simply by ensuring that you always used the binaries for python and easy_install inside this directory, you could make sure that eggs installed (whether by running "python setup.py install" or using easy_install) were contained within that environment only. For Zope development, one approach, championed by Daniel Nouri in the ploneenv tool, is to create a standard Zope 2 instance that is also a workingenv.
Buildout, in contrast, ties everything to a single configuration file - buildout.cfg. The buildout tool runs various "recipes" listed in this file. Some recipes will use setuptools to install eggs (and their dependencies) locally in the buildout. Buildout is usually very explicit about which eggs it activates, for example by generating wrapper scripts (such as ./bin/instance, which starts a Zope instance installed via the plone.recipe.zope2instance recipe) that explicitly activate a particular working set of eggs. Each time you change the config file, you have to re-run the buildout tool to re-create these working sets.
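To make the idea of an explicit working set a bit more concrete, here is a rough sketch of the kind of wrapper script a recipe might generate. The render_wrapper helper, the egg paths and the myapp.ctl:main entry point are all made up for illustration, and real generated scripts differ in their details.

```python
# Hypothetical helper: renders a wrapper script that pins an explicit
# list of eggs onto sys.path before calling an entry point.
WRAPPER_TEMPLATE = '''#!%(executable)s
import sys
# Only these eggs are put on the path for this script.
sys.path[0:0] = [
%(paths)s
]

import %(module)s

if __name__ == '__main__':
    %(module)s.%(function)s()
'''


def render_wrapper(executable, egg_paths, entry_point):
    """Return the text of a wrapper script for a 'module:function' entry point."""
    module, function = entry_point.split(":")
    paths = "\n".join("    %r," % path for path in egg_paths)
    return WRAPPER_TEMPLATE % {
        "executable": executable,
        "paths": paths,
        "module": module,
        "function": function,
    }


# Placeholder paths and names, not taken from a real buildout.
script = render_wrapper(
    "/usr/bin/python",
    [
        "/home/me/buildout/eggs/myapp-1.0-py2.4.egg",
        "/home/me/buildout/eggs/somelib-2.1-py2.4.egg",
    ],
    "myapp.ctl:main",
)
print(script)
```

The point is that the script itself lists exactly which eggs it uses, so nothing depends on what happens to be in the global site-packages.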
Ian Bicking recently created virtualenv, which supersedes workingenv. He blogged about some of the problems with workingenv (which he now seems to discourage people from using). At the moment, virtualenv doesn't work on Windows or on the standard Python installation on OS X, which is a bit of a problem, but I'm sure it will be overcome. I haven't had a chance to use it yet, but if Ian says it's more appropriate, I trust him. No doubt, ploneenv and other such scripts could quite easily be made to use virtualenv rather than workingenv.
zc.buildout doesn't create an isolated Python environment in the same style, but achieves similar results through a declarative config file that sets up scripts with very particular packages. As a declarative system, it is somewhat easier to repeat and manage, but more difficult to experiment with. zc.buildout includes the ability to setup non-Python systems (e.g., a database server or an Apache instance).
As he notes, buildout is somewhat broader in scope than virtualenv/workingenv. You can use it to run any kind of command, for example to download non-egg packages or create Zope instances or configure a cache server. The buildout tutorial on plone.org explains the concepts in more detail.
Are we there yet?
In my book, I advocate buildout pretty much exclusively. I'm quite glad that I did, especially given that workingenv now seems to be deprecated and virtualenv doesn't yet work on Windows (I'm sure it will eventually, though). More importantly, though, in shared development or deployment scenarios, I think it's very important to have something that's explicit, manageable and repeatable. If you're not careful, it can be hard to know exactly what's in your environment, and what's local and what's global, with the workingenv/virtualenv approach. Setuptools also does not have a very good uninstall story (buildout, which re-calculates the working set of eggs each time it's run, will simply not activate eggs it doesn't need, so there's nothing to install or uninstall).
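The "nothing to uninstall" point can be illustrated with a toy model. This is not buildout's actual resolution algorithm, just a sketch that treats the working set as everything reachable from the eggs named in the configuration. The dependency table and all egg names in it are invented.

```python
# Hypothetical dependency table: egg name -> eggs it requires.
DEPENDENCIES = {
    "my.policy": ["my.theme", "helper.lib"],
    "my.theme": ["helper.lib"],
    "helper.lib": [],
    "old.experiment": ["helper.lib", "scratch.lib"],
    "scratch.lib": [],
}


def working_set(requested, dependencies):
    """Return the requested eggs plus everything they transitively require."""
    active = set()
    pending = list(requested)
    while pending:
        egg = pending.pop()
        if egg in active:
            continue
        active.add(egg)
        pending.extend(dependencies.get(egg, []))
    return active


# First run: the config asks for two eggs.
before = working_set(["my.policy", "old.experiment"], DEPENDENCIES)
# Second run: old.experiment has been removed from the config.
after = working_set(["my.policy"], DEPENDENCIES)

# Eggs that simply stop being activated; nothing had to be uninstalled.
print(sorted(before - after))
```

Here the second run no longer activates old.experiment or scratch.lib, while helper.lib stays because my.policy still needs it.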
Buildout also gives me some peace of mind, in that I know I can write a new recipe to solve problems that aren't solved with existing recipes or the egg mechanism. For example, I had to create a recipe to build Deliverance, due to problems with dependencies on some platforms, and a recipe to configure Plone, in order to make the installation of a particular version of Plone predictable and simple. Ideally, neither should be necessary (both Deliverance and Plone could, in theory, just be installed as simple eggs), but for the moment, we need a bit more magic. By writing a recipe, I can isolate this magic, and when Plone does become just as simple as a collection of eggs in the future, I can update the recipe to take advantage of that - the user of the recipe will probably never notice.
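For anyone who hasn't written one, a recipe is just a class that buildout instantiates and calls. The sketch below follows the usual shape (a constructor taking buildout, name and options, plus install and update methods), but MkDirRecipe and its "path" option are invented for this example, and the dictionaries at the bottom are stand-ins for the objects buildout would normally pass in.

```python
import os
import tempfile


class MkDirRecipe(object):
    """Hypothetical recipe that creates a directory for a buildout part."""

    def __init__(self, buildout, name, options):
        self.name = name
        self.options = options
        # Default to a directory named after the part, under parts/.
        options.setdefault(
            "path",
            os.path.join(buildout["buildout"]["parts-directory"], name),
        )

    def install(self):
        path = self.options["path"]
        if not os.path.isdir(path):
            os.makedirs(path)
        # Report what was created so buildout can clean it up later.
        return [path]

    def update(self):
        # Nothing to redo if the part is already installed.
        pass


# Stand-ins for what buildout would pass in; a temp dir plays the parts dir.
parts_dir = tempfile.mkdtemp()
fake_buildout = {"buildout": {"parts-directory": parts_dir}}
recipe = MkDirRecipe(fake_buildout, "var-data", {})
created = recipe.install()
print(created)
```

A real recipe would also be registered under the zc.buildout entry point group in its package's setup.py, so that buildout.cfg can refer to it by name.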
However, this is still an over-generalisation. I think the buildout approach works well for Zope/Plone development and similar scenarios, but it would be quite cumbersome to create buildouts for every single Python script you ever wrote. I think we need to step back and look at the different use cases for Python package management, and promote appropriate tools for each one.
A standalone program
Say a user wants to install and use a program, which happens to be written in Python. This is where setuptools and easy_install shine. Simply easy_install the package, and you will get the latest version. Using the console_scripts support in setuptools, this will create an appropriate binary (which even works on Windows). For the most part, I think it's appropriate to install this globally (I install Paste Script this way, for example, and use its paster command extensively to create new buildouts and packages, via ZopeSkel).
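If you haven't seen console_scripts before, a setup.py along these lines is roughly what it takes. The project name hello-tool, the hello module and its main function are placeholders, and the setup call is wrapped in a function here only so the sketch can be read and imported without side effects.

```python
# Sketch of a setup.py for a hypothetical package; all names are placeholders.
SETUP_KWARGS = dict(
    name="hello-tool",
    version="0.1",
    py_modules=["hello"],
    entry_points={
        "console_scripts": [
            # Installs a 'hello' command that calls main() in hello.py.
            "hello = hello:main",
        ],
    },
)


def run():
    # Imported lazily so that merely loading this file does nothing.
    from setuptools import setup
    setup(**SETUP_KWARGS)
```

In a real setup.py you would call setup(...) directly at module level. Installing the package then puts a hello executable on the path, which is the "appropriate binary" mentioned above.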
Some developers will want to have multiple versions of these binaries, e.g. for testing purposes. Here, workingenv/virtualenv is a good approach, since it creates an isolated mini-environment. Most users won't need this though - they'll just want the latest version. easy_install also makes it easy to upgrade, and to install specific versions if necessary.
A dependency for an ad-hoc script
Developers and integrators may write simple ad-hoc scripts that are not themselves packaged as eggs. If they need specific libraries, they may use easy_install to get these. Again, they're probably more interested in the latest version.
Using the global easy_install will typically work here. It'd be better if easy_install were better at uninstalling, especially when it comes to transitive dependencies. There is a chance that two globally easy_install'ed things could conflict. It may be better to use a workingenv/virtualenv here, though I suspect developers may not always have the patience to do so.
A dependency for a standalone program or library
Developers who create standalone programs or libraries will typically package these as eggs. Dependencies are declared in setup.py. During development, it may make sense to install these using workingenv/virtualenv or manage the application with a buildout, but the final version is just a standalone egg. How it ends up being used depends on what it'll be used for - typically, it'll correspond to one of the other use cases considered here.
Some of the Zope 3 packages actually take a hybrid approach here. They are standard eggs, with a setup.py, but they also ship with a buildout.cfg file. This means that developers can build a minimal environment with this package and its dependencies, for example to run tests. In the case of a standalone program with console scripts, this approach also lets you get an archive of the program, run buildout and get the executable in a bin/ folder inside the archive. To my mind, this is a little strange, but it should work. I can certainly see the attraction of having something like this in svn to make it easy to run tests.
A library used in a specific, standalone application
This is the Zope/Plone use case. You build a web server and various libraries, and you may need to deploy it to different environments, but it's not necessarily going to end up as an egg that other people will download and grapple with.
Here, I strongly prefer the buildout approach, not least because it gives a lot of predictability and flexibility. The main downside is that if you want to quickly experiment with a new egg, you need to edit buildout.cfg (or your own package's setup.py) to add it, then re-run buildout. It may be useful to have a standalone workingenv/virtualenv for trial-and-error as well, where you can easy_install anything you want (and blow it away when it goes awry). Some people in the past have experimented with creating a buildout directory that is also a workingenv/virtualenv. I think this gets pretty messy, though, and I wouldn't recommend it.
These are the use cases I can think of right now. I'd be interested to hear what other people use and what works for them. Also, although I think we have solutions for most of these, I still find there is plenty of scope for confusion, especially when you deal with packages that may come from the global site-packages or a local environment, or "stateful" environments such as workingenv that may or may not be active at any one time. Having tools to make it easier to manage (upgrade, uninstall, disable, trace dependencies of) packages installed via easy_install in site-packages would actually go a long way, I think.
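As a small aid against the "which environment am I in?" confusion, a script like the following can at least report which interpreter is running. It is only a sketch: it relies on attributes set by virtualenv and by the later venv module, not on anything workingenv did, and the describe_environment helper is my own invention.

```python
import sys


def describe_environment():
    """Report which interpreter is running and whether it looks isolated."""
    # Older virtualenv releases set sys.real_prefix; the venv module and
    # newer virtualenv releases set sys.base_prefix instead.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": base_prefix,
        "isolated": base_prefix != sys.prefix,
    }


info = describe_environment()
for key in ("executable", "prefix", "base_prefix", "isolated"):
    print("%-12s %s" % (key, info[key]))
```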
The only parallels I can think of are .NET DLL assemblies and Java JAR files. I'm not sure these do much better, though. In a recent project, we had so many JAR files in Tomcat's "lib" directory, with such cryptic names and no clear dependency information, that we didn't know what libraries we were actually using. :)