The PICAX Media Creation System


Introduction

Debian needs good installation media. While much focus has been placed on the installer itself (and its virtues or lack thereof), the process of putting packages on the media and making them available is also important. That task has historically fallen to the debian-cd package, and for many years it served its purpose well. Recently, however, issues surrounding its future have come up, and its lack of flexibility has also been noted.

Progeny Systems has used debian-cd in the past (for its PGI graphical installer) and found it inadequate for our needs. As a result, when Progeny began work on porting Red Hat's Anaconda installer to Debian, we decided to take the opportunity to write a new media creation tool. Rather than tie our new tool tightly to Anaconda, we decided to make the tool flexible and modular, able to handle any special Debian media project. The result is picax (for Progeny Installer Creator and Archive eXtractor). The package is currently available in Debian testing and unstable.

Architecture

Picax is written in Python and requires at least version 2.2 of the interpreter. It comes as a Python script in /usr/bin and some modules loaded in the appropriate "site-packages" directory. Among other things, it needs python-apt and a Python XML parser; for the complete list of dependencies, see the package or its source.

While the master script controls the general flow of the program, the modules do most of the actual work. One provides a custom configuration system, another controls interaction with apt, and yet another provides a runtime build system. In addition, add-on modules perform two of the most important tasks: setting up an installer and writing particular media formats. Add-on modules are installed in the picax.modules namespace, and picax can detect their presence at runtime.
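How picax performs this detection is not described here. As a minimal sketch of one common approach, assuming add-ons are ordinary submodules of a package, the package's search path can be scanned with pkgutil. The find_addons helper is hypothetical, the code is written for current Python 3 rather than the 2.2 interpreter picax targeted, and the demonstration uses the standard-library json package because picax may not be installed.

```python
import importlib
import pkgutil


def find_addons(package_name):
    """Return the sorted names of the submodules found in a package."""
    package = importlib.import_module(package_name)
    # Only packages have __path__; a plain module has nothing to scan.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


# Stand-in for something like find_addons("picax.modules").
print(find_addons("json"))
```

Scanning the package path rather than keeping a fixed list means a newly installed add-on becomes visible without any change to the main program.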

Fundamentally, picax is an apt repository splitter with some extra functionality. It strives to divide the repositories passed to it into approximately equal parts, with each succeeding part building on the previous one (or, in the case of the first, standing alone). The appropriate add-on modules for handling installers and media type creation are called at the appropriate times in the split process to ensure that the split is done properly. For example, the media module is asked before the split begins about the maximum media size, and the installer is given a chance to install itself before the split begins so the split sizes can be adjusted appropriately.
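The interplay between media size, installer space, and the split can be illustrated with a simple greedy fill. This is a deliberately simplified stand-in, not picax's actual algorithm: it assumes the package list is already ordered so that dependencies come first, it does not try to balance the parts, and the function name, package names, and sizes are all hypothetical.

```python
def split_packages(packages, media_size, reserved_on_first=0):
    """Split an ordered list of (name, size) pairs into per-medium lists.

    Keeping the input order means each part only builds on earlier parts.
    `reserved_on_first` is the space the installer occupies on medium 1.
    """
    parts = [[]]
    free = media_size - reserved_on_first
    for name, size in packages:
        if size > media_size:
            raise ValueError(f"{name} does not fit on any medium")
        if size > free:
            # The current medium is full: start the next one.
            parts.append([])
            free = media_size
        parts[-1].append(name)
        free -= size
    return parts


packages = [("libc6", 400), ("bash", 300), ("python", 500), ("gtk", 200)]
print(split_packages(packages, media_size=1000, reserved_on_first=200))
# [['libc6', 'bash'], ['python', 'gtk']]
```

The reserved space is why the installer must be consulted before splitting: without it, the first part could be filled beyond what the first medium can actually hold.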

Source packages are also handled (optionally) by picax. By default, picax builds source media in a similar manner to debian-cd, with separate source media corresponding to the binary packages installed on the binary media. But it can also make a continuous set of media, with the first source immediately following the last binaries on one of the middle media, and it can also create "mixed-mode" media, with each medium containing both binaries and the complete source for those binaries. This was intended to assist with compliance with copyleft-style copyright licenses such as the GPL.

Configuration

The picax.config module handles all configuration for picax. It is data-driven, with all possible items defined in a dictionary. The dictionary contains information about the item's name, data type, and documentation, used to display a short help screen in response to the --help argument. This dictionary is used to interpret both the command line and any configuration files picax is directed to use. This makes the configuration system very flexible and easy to extend when necessary.
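A data-driven design of this kind can be sketched in a few lines. The example below is an assumption-laden illustration, not the picax.config code: it uses the standard argparse module as a stand-in for picax's own parser, the dictionary layout is invented, and the "num-parts" item is hypothetical ("media" is the only item name taken from this document).

```python
import argparse

# Hypothetical option table: each entry records a data type and help text.
OPTIONS = {
    "media": {"type": str, "doc": "media module to use"},
    "num-parts": {"type": int, "doc": "number of parts to create"},
}


def build_parser(options):
    """Create a command-line parser entirely from the option table."""
    parser = argparse.ArgumentParser(prog="picax-sketch")
    for name, spec in options.items():
        parser.add_argument("--" + name, type=spec["type"], help=spec["doc"])
    return parser


args = build_parser(OPTIONS).parse_args(["--media", "cd", "--num-parts", "3"])
print(vars(args))  # {'media': 'cd', 'num_parts': 3}
```

Because the parser and the help text are both generated from the table, adding a configuration item means adding one dictionary entry and nothing else.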

As mentioned, configuration is read from two sources: the command line and configuration files. Configuration files must themselves be listed on the command line. The format is a free-form XML base derived from the configuration dictionaries; no DTD is provided, as the format is open-ended and can be extended. Conveniently, the config module can be told to write its configuration to a file. Since the command line overrides any configuration file settings, changing a configuration file can be done within picax itself, by reading a configuration file, adding command-line changes, and writing a new configuration file.
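The read, override, and write-back cycle described above might look roughly like the following. The XML layout, the "config" root element, and the "source" item with its values are assumptions made for illustration; the real file format is derived from picax's configuration dictionaries and may differ.

```python
import xml.etree.ElementTree as ET


def read_config(xml_text):
    """Turn each child element of the root into a key/value pair."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def merge(file_settings, cli_settings):
    """Command-line values win over values read from a file."""
    merged = dict(file_settings)
    merged.update({k: v for k, v in cli_settings.items() if v is not None})
    return merged


def write_config(settings):
    """Serialize the settings back to XML text."""
    root = ET.Element("config")  # hypothetical root element name
    for key, value in sorted(settings.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


from_file = read_config("<config><media>cd</media><source>none</source></config>")
# None marks an option that was not given on the command line.
merged = merge(from_file, {"source": "mixed", "media": None})
print(write_config(merged))
# <config><media>cd</media><source>mixed</source></config>
```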

Add-on modules are also expected to participate in configuration. Each provides a dictionary of its own to define its configuration items. These items can be set on the command line using a special prefix ("inst" for installer modules, or "media" for media modules), and are given a special section within configuration files. The resulting dictionary of items returned by the config module contains a sub-dictionary for each of these add-ons.

Building an Installer Runtime

The picax.unpack module is not used directly anywhere within picax, but it is nevertheless very useful for installers. Its function is to create a very small runtime environment, suitable for booting and supporting the install program. Its design is somewhat similar to Debian's "udeb" system for debian-installer, but it has the additional advantage of being able to use any of the packages in the Debian distribution being built, and not just the udebs. Its use is not mandatory; installers with their own runtime build system can ignore this module as long as they are able to build a runtime image and write it to the media when appropriate.

The module operates from an XML description of the packages to be used. The first section of this file amounts to a list of packages to unpack to the runtime destination, while the second defines how these packages are to be unpacked and what needs to be done to them afterwards. Since the unpack process does not run any package scripts, it allows small per-package scripts to be supplied that perform the necessary tasks.

One might think that such a bare-bones method of installing packages would not work. However, our experience with the Anaconda installer, which uses picax.unpack, suggests that the method works well. We are able, using this method, to unpack and configure such packages as LVM, Python and Python-GTK+, and TrueType font packages (with the resulting fonts showing up nicely in the runtime).

Handling Media

Adding a new media module is not difficult; when the media API was planned, the CD module was finished in just a few days (most of which was spent debugging the API itself). Only three functions are required: one to return the configuration dictionary to picax.config, one to return the maximum size of each medium, and one to actually create each image. Modules can be installed into the picax.modules namespace, and are selected using the "media" configuration item.
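To make the shape of such a module concrete, here is a skeleton with the three roles filled in. Only the roles come from the description above; every function name, signature, and option shown is hypothetical, and the image-creation step is reduced to computing an output file name rather than invoking a real image-building tool.

```python
import os

# All names below are hypothetical; only the three roles come from the text.
CD_BYTES = 650 * 1024 * 1024


def get_options():
    """Role 1: describe this module's configuration items."""
    return {"label": {"type": str, "doc": "volume label for each image"}}


def get_media_size():
    """Role 2: report the maximum size of one medium, in bytes."""
    return CD_BYTES


def create_image(index, image_root, dest_dir):
    """Role 3: build image number `index` from the tree at `image_root`.

    A real module would run an image-building tool here; this sketch
    only computes and returns the output file name.
    """
    return os.path.join(dest_dir, f"image{index}.iso")


print(get_media_size())
print(create_image(1, "/tmp/image-root", "out"))
```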

Handling Installers

Installers also use modules to hook into the picax infrastructure, installed into the same place as media modules (picax.modules). Like media modules, installer modules need only provide a few functions: one to return a configuration dictionary, one to return a list of packages the installer would prefer to put on the first medium, one to write the installer to the media, and one to perform any post-install cleanup.
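How picax acts on the installer's preferred-package list is not spelled out here. One plausible handling, shown purely as an assumption, is to move those packages to the front of the ordering before the split so that they land on the first medium. The prioritize helper and the package names are hypothetical.

```python
def prioritize(packages, preferred):
    """Move preferred package names to the front, keeping relative order."""
    wanted = set(preferred)
    first = [name for name in packages if name in wanted]
    rest = [name for name in packages if name not in wanted]
    return first + rest


ordering = prioritize(["bash", "gtk", "anaconda", "python"],
                      ["anaconda", "python"])
print(ordering)  # ['anaconda', 'python', 'bash', 'gtk']
```

A real implementation would also need to pull in the dependencies of the preferred packages, which this sketch ignores.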

Installer modules can be fairly complex. Progeny's Anaconda port to Debian (package "anaconda", available from http://platform.progeny.com/) provides a good example of a complex installer module.

Installer and Media Interaction

When building installers that boot off media, it's often the case that the installer needs to control the media build process. For example, apt-based installers often need to write the apt-cdrom information for all the CDs to the first CD, so it may be retrieved at install time without forcing the user to cycle through all the CDs. This can't be done unless the first CD is the last one created.

For this reason, installers are given the opportunity to drive the media-building process. By default, a simple media builder, defined in picax.media, controls the process; it simply builds each image in order. If an installer module defines a get_media_builder function, however, picax will call this function, and the object it returns will be given control over the process. As a function of the installer module, it is able to control exactly when each image is created, and can do anything it needs to prepare the image root before creating each one.
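The hand-off can be sketched as follows. The get_media_builder name comes from the text above, but its signature, the builder classes, and the build method are assumptions made for illustration. The custom builder creates the first image last, which is what an installer that writes apt-cdrom data for every CD onto the first CD would need.

```python
from types import SimpleNamespace


class SimpleBuilder:
    """Stand-in for the default behaviour: build images in order."""

    def __init__(self, create_image):
        self.create_image = create_image

    def build(self, count):
        return [self.create_image(i) for i in range(1, count + 1)]


class FirstLastBuilder(SimpleBuilder):
    """Build images 2..N first and image 1 last."""

    def build(self, count):
        order = list(range(2, count + 1)) + [1]
        return [self.create_image(i) for i in order]


def choose_builder(installer, create_image):
    """Use the installer's builder if it offers one, else the default."""
    factory = getattr(installer, "get_media_builder", None)
    if factory is None:
        return SimpleBuilder(create_image)
    return factory(create_image)


# Two hypothetical installer modules, modelled as simple namespaces.
plain = SimpleNamespace()
apt_based = SimpleNamespace(get_media_builder=FirstLastBuilder)

log = []
choose_builder(apt_based, log.append).build(3)
print(log)  # [2, 3, 1]
```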

Conclusion

The right to choose lies at the heart of free software's successes over the years. Alternatives are never a liability, despite the beliefs of some; they simply help to direct the community towards better solutions, either by adopting superior alternatives or by enhancing the tried-and-true.

Picax is offered in that spirit. While some grumble about their problems with debian-cd, in the end it has served the Debian community well through several releases, and will likely serve it through at least one more. It certainly should not be thrown out without careful consideration. Whatever the Debian community decides, picax is serving Progeny well right now, and was worth doing for that reason alone.