Packaging Java code

2008/11/30 23:12:11 -08:00

Java applets were originally just collections of class files and other resources, such as sound and image files, published on a web server and downloaded over HTTP, one at a time, by early web browsers.

This approach was, of course, not terribly efficient in terms of bandwidth, and so the user experience over a slow connection was pretty unpleasant. There was, moreover, no way to cryptographically sign an applet in order to guarantee its integrity and authenticate its publisher.

Thus in 1996 was born the JAR file format, a simple extension of the popular ZIP archive format with manifest and signature files. In the absence of a better alternative the JAR format rapidly became the standard way to package reusable Java libraries and, with the advent of the Main-Class manifest header, even entire applications. Experience has shown, however, that the JAR format is not very well-suited to these more-ambitious uses.

Failures of expedience

A JAR file is little more than a set of classes packaged into a single unit which can be transported over the wire, cryptographically verified, and, ultimately, placed on a traditional class loader’s class path. Code packaged in JAR files is hence susceptible to all the usual problems of the class-path model, in particular those arising from the fact that all classes on the path are in the same flat namespace with their visibility determined only by the order, on the path, of their containing JAR files.

There are many ways in which this model can fail. The most common failure mode occurs when two versions of a library are inadvertently placed on the class path. In this situation chaos is likely to ensue—and be very difficult to debug. Application code might appear to use the older version, the newer version, or some bizarre combination of the two, all depending on the exact differences between the versions and their placement on the class path.

This failure mode, and others, have been encountered by so many unwary developers over the years that this general set of problems has been dubbed “JAR hell.”
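The duplicate-version failure described above can be modeled in a few lines. This is only a toy sketch, not a real class loader: the `Jar` type, the JAR names, and the class names are all hypothetical, and lookup is reduced to "first JAR on the path that contains the class wins."

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ClassPathModel {

    /** Hypothetical stand-in for a JAR file: a name plus the classes it contains. */
    static final class Jar {
        final String name;
        // class name -> the library version that class was built from
        final Map<String, String> classes = new LinkedHashMap<>();

        Jar(String name) {
            this.name = name;
        }

        Jar with(String className, String builtFrom) {
            classes.put(className, builtFrom);
            return this;
        }
    }

    /** Flat-namespace lookup: the first JAR on the path that has the class wins. */
    static String resolve(List<Jar> classPath, String className) {
        for (Jar jar : classPath) {
            String builtFrom = jar.classes.get(className);
            if (builtFrom != null) {
                return builtFrom;
            }
        }
        return null; // stands in for a ClassNotFoundException
    }

    public static void main(String[] args) {
        Jar v1 = new Jar("lib-1.0.jar")
                .with("lib.Parser", "1.0")
                .with("lib.Legacy", "1.0");   // dropped in 2.0
        Jar v2 = new Jar("lib-2.0.jar")
                .with("lib.Parser", "2.0")
                .with("lib.Streams", "2.0");  // new in 2.0

        // Both versions end up on the path, newer one first.
        List<Jar> path = Arrays.asList(v2, v1);
        System.out.println(resolve(path, "lib.Parser"));  // 2.0
        System.out.println(resolve(path, "lib.Legacy"));  // 1.0
        System.out.println(resolve(path, "lib.Streams")); // 2.0
    }
}
```

With both JARs on the path the application silently sees a mix of 1.0 and 2.0 classes, and swapping the order of the two JARs changes which `lib.Parser` it gets.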

Modular code

If we’re going to modularize the JDK then why not do the same for Java libraries and applications? If we have the facilities required to divide the JDK into a set of well-specified and separate, yet interdependent, modules then we should be able to leverage those same tools even further in order to climb out of JAR hell.

This can work because the metadata in a module is much richer than that in a simple JAR file. A module’s metadata describes its own version as well as the dependences it has upon other modules. The dependences can themselves be constrained with respect to version numbers so that, e.g., a module X can declare that it needs version 1.2.3 of module Y, or any later version.
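To make the "version 1.2.3 of module Y, or any later version" constraint concrete, here is a minimal sketch of how such a check might look. The `Version` and `Dependence` types are hypothetical, and the dotted numeric version format, compared field by field, is an assumption made for illustration rather than anything specified here.

```java
class ModuleMetadataSketch {

    /** A dotted numeric version such as 1.2.3 (an assumed format). */
    static final class Version implements Comparable<Version> {
        final int[] parts;

        Version(String text) {
            String[] fields = text.split("\\.");
            parts = new int[fields.length];
            for (int i = 0; i < fields.length; i++) {
                parts[i] = Integer.parseInt(fields[i]);
            }
        }

        @Override
        public int compareTo(Version other) {
            int n = Math.max(parts.length, other.parts.length);
            for (int i = 0; i < n; i++) {
                // Missing trailing fields count as 0, so 1.2 equals 1.2.0.
                int a = i < parts.length ? parts[i] : 0;
                int b = i < other.parts.length ? other.parts[i] : 0;
                if (a != b) {
                    return Integer.compare(a, b);
                }
            }
            return 0;
        }
    }

    /** "Needs module `name` at version `minimum`, or any later version." */
    static final class Dependence {
        final String name;
        final Version minimum;

        Dependence(String name, String minimum) {
            this.name = name;
            this.minimum = new Version(minimum);
        }

        boolean isSatisfiedBy(String candidateName, String candidateVersion) {
            return name.equals(candidateName)
                    && new Version(candidateVersion).compareTo(minimum) >= 0;
        }
    }

    public static void main(String[] args) {
        Dependence needsY = new Dependence("Y", "1.2.3");
        System.out.println(needsY.isSatisfiedBy("Y", "1.2.3"));  // true
        System.out.println(needsY.isSatisfiedBy("Y", "1.10.0")); // true: 10 > 2 numerically
        System.out.println(needsY.isSatisfiedBy("Y", "1.2"));    // false
    }
}
```

Comparing fields as integers rather than as strings is what keeps 1.10.0 newer than 1.2.3; a real scheme would also need rules for qualifiers and upper bounds, which this sketch leaves out.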

At run time the classes in a module are not simply added to a class path. The loading and linking of the classes within a module is, rather, guided by the dependence and versioning constraints in the module’s metadata. During this process care is taken to ensure that classes in one module are visible to classes in another only when intended, and that no module is ever linked to more than one version of another.
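One way to picture linking that is guided by metadata is a small resolver that binds each module name to exactly one version. The sketch below is a deliberately naive, greedy version with no backtracking; the `Module` type and the `"name>=minimum"` requirement strings are invented for this example and do not reflect any actual module-system design.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResolverSketch {

    /** Hypothetical descriptor: name, version, and "name>=minimum" requirements. */
    static final class Module {
        final String name;
        final String version;
        final List<String> requires;

        Module(String name, String version, String... requires) {
            this.name = name;
            this.version = version;
            this.requires = Arrays.asList(requires);
        }
    }

    /** Compares dotted numeric versions field by field; missing fields count as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\."), y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int p = i < x.length ? Integer.parseInt(x[i]) : 0;
            int q = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (p != q) {
                return Integer.compare(p, q);
            }
        }
        return 0;
    }

    /** Returns the single version chosen for every module reachable from root. */
    static Map<String, String> resolve(Module root, List<Module> available) {
        Map<String, String> chosen = new LinkedHashMap<>();
        chosen.put(root.name, root.version);
        link(root, available, chosen);
        return chosen;
    }

    private static void link(Module module, List<Module> available,
                             Map<String, String> chosen) {
        for (String requirement : module.requires) {
            String[] parts = requirement.split(">=");
            String name = parts[0], minimum = parts[1];

            String bound = chosen.get(name);
            if (bound != null) {
                // Already linked: never bind a second version, just re-check the constraint.
                if (compare(bound, minimum) < 0) {
                    throw new IllegalStateException(
                            name + " " + bound + " is older than required " + minimum);
                }
                continue;
            }

            // Pick the newest available version that meets the minimum.
            Module best = null;
            for (Module candidate : available) {
                if (candidate.name.equals(name)
                        && compare(candidate.version, minimum) >= 0
                        && (best == null || compare(candidate.version, best.version) > 0)) {
                    best = candidate;
                }
            }
            if (best == null) {
                throw new IllegalStateException(
                        "no version of " + name + " satisfies >=" + minimum);
            }
            chosen.put(name, best.version);
            link(best, available, chosen);
        }
    }

    public static void main(String[] args) {
        List<Module> available = Arrays.asList(
                new Module("Y", "1.2.3"),
                new Module("Y", "1.4.0"),
                new Module("Z", "2.0", "Y>=1.3"));
        Module app = new Module("X", "1.0", "Y>=1.2.3", "Z>=2.0");
        System.out.println(resolve(app, available)); // {X=1.0, Y=1.4.0, Z=2.0}
    }
}
```

The important property is that a conflict surfaces as an explicit error at link time instead of as a silent mix of versions. Visibility control between modules, the other half of the story, is not modeled here.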

Packaging Java libraries and applications as modules would allow us to ascend from JAR hell into the clear light of day, where versioning and dependence information is declared at compile time and then leveraged during distribution, installation, and run time. The techniques for improving download time, startup time, and memory footprint applicable to a modular JDK would, moreover, be equally applicable to modularized Java libraries and applications. Truly modular Java components would, finally, also enable us to address at least one other longstanding problem area …

Native packaging

One of the age-old critiques of the Java platform is that it doesn’t integrate very well with the native operating systems upon which it runs. An oft-cited aspect of this critique is that the usual means of packaging Java code (i.e., JAR files) has no well-defined relationship to native packaging systems.

Many an application developer has coped with this impedance-mismatch problem by creating, for each target platform, a native package containing the JAR files for the application, the JAR files for all required libraries—and an entire JRE.

This is an effective solution, but a crude one which introduces problems of its own. Delivering monolithic, self-contained native packages wastes both download time and disk space. More importantly, the lack of sharing of common components—whether of libraries or of the JRE itself—makes it impossible to update those components independently in order to fix security bugs and other critical issues.

A different approach to the impedance-mismatch problem is to address it directly, by building suitable native packages for Java libraries and applications that were originally delivered as JAR files. The enterprising developers behind the JPackage Project have done exactly this, for RPM-based Linux platforms, for a wide variety of popular Java components.

A large part of the time spent in this sort of effort, whether for RPM or for any other reasonably-capable native packaging system—e.g., Debian, SVR4, or IPS—rests in identifying and encoding inter-component dependences. Simple JAR files do not contain such metadata, so in many cases quite a bit of tedious detective work is required.

If Java libraries and applications were packaged as modules, complete with accurate version and dependence metadata, then it would be almost trivial to transform them into sensible native packages. One could even imagine a single tool, delivered as part of the JDK, that would implement this transformation for common native platforms, at least for simple cases.
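As a rough illustration of why the transformation becomes nearly mechanical once the metadata exists, the sketch below renders a fragment of a Debian-style control stanza from a module's name, version, and minimum-version dependences. The method, the module names, and the name-mapping rule (simply lowercasing) are assumptions for this example; a real package also needs fields such as the maintainer, architecture, and description, which are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class NativePackageSketch {

    /**
     * Renders Package, Version, and Depends lines from module metadata.
     * `requires` maps each required module name to its minimum version.
     */
    static String toControlFragment(String name, String version,
                                    Map<String, String> requires) {
        List<String> depends = new ArrayList<>();
        for (Map.Entry<String, String> r : requires.entrySet()) {
            // Debian expresses "this version or later" as: name (>= version)
            depends.add(r.getKey().toLowerCase(Locale.ROOT) + " (>= " + r.getValue() + ")");
        }
        StringBuilder out = new StringBuilder();
        out.append("Package: ").append(name.toLowerCase(Locale.ROOT)).append('\n');
        out.append("Version: ").append(version).append('\n');
        if (!depends.isEmpty()) {
            out.append("Depends: ").append(String.join(", ", depends)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> requires = new LinkedHashMap<>();
        requires.put("Demo-Lib", "1.2.3"); // hypothetical module name
        System.out.print(toControlFragment("Demo-App", "1.0", requires));
        // Package: demo-app
        // Version: 1.0
        // Depends: demo-lib (>= 1.2.3)
    }
}
```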

Now—that would be cool.