The massive, monolithic JDK
2008/11/25 00:35:55 -08:00

Space is big. Really big. You just won’t believe how vastly, hugely, mindbogglingly big it is. I mean, you may think it’s a long way down the road to the chemist, but that’s just peanuts to space.

— Douglas Adams, The Hitchhiker’s Guide to the Galaxy

The JDK is big, too—though not (yet) as big as space.

It’s big because over the last thirteen years the Java SE platform has grown from a small system originally intended for embedded devices into a rich collection of libraries serving a wide variety of needs across a broad range of environments.

It’s incredibly handy to have such a large and capable Swiss-Army knife at one’s disposal, but size is not without its costs.

Size. The JDK and its runtime subset, the JRE, have always been delivered as massive, indivisible artifacts. The growth of the platform has thus inevitably led to the growth of the basic JRE download, which now stands at well over 14 MB despite heroic engineering efforts such as the Pack200 class-file compression format.

Complexity. The JDK is big, and it’s also deeply interconnected. It has been built, on the whole, as a monolithic software system. In this mode of development it’s completely natural to take advantage of other parts of the platform when writing new code, or even when just improving old code, relying upon the flexible linking mechanism of the Java virtual machine to make it all work at runtime.

Over the years, however, this style of development can create unexpected connections between APIs, and between their implementations, which in turn increase startup time and memory footprint. A trivial command-line “Hello, world!” program, for example, now loads and initializes over 300 separate classes, taking around 100 ms on a recent desktop machine despite yet more heroic engineering efforts such as class-data sharing. The situation is even worse, of course, for larger applications.
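One rough way to see this effect for yourself is to ask a running JVM how many classes it has loaded. The sketch below is an illustration, not the measurement behind the figures above; the class name `HelloClassCount` and helper `loadedClassCount` are hypothetical. It uses the standard `java.lang.management` API, which itself loads extra classes, so the count it reports somewhat overstates what a bare “Hello, world!” needs, and the number on a modern JDK will differ from the 2008 figure. Running `java -verbose:class` on a plain hello-world program gives a class-by-class listing instead.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

class HelloClassCount {

    // Number of classes currently loaded in this JVM. Note that using the
    // management API loads some classes of its own, so this over-counts slightly.
    static int loadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Hello, world!");
        System.out.println("Classes loaded so far: " + loadedClassCount());
    }
}
```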

Palliatives. The Java Kernel and Quickstarter features in the JDK 6u10 release do improve download time and (cold) startup time, at least for Windows users. These techniques really just address the symptoms of long-term interconnected growth, however, rather than the underlying cause.

The modular JDK. The most promising way to improve the key metrics of download time, startup time, and memory footprint is to attack the root problem head-on: Divide the JDK into a set of well-specified and separate, yet interdependent, modules.

The process of restructuring the JDK into modules would force all of the unexpected interconnections out into the open where they can be analyzed and, in many cases, either hidden or eliminated. This would, in turn, reduce the total number of classes loaded and thereby improve both startup time and memory footprint.

If we had a modular JDK, then at download time we could deliver just those modules required to start a particular application, rather than the entire JRE. The Java Kernel is a first step toward this kind of solution; a further advantage of having well-specified modules is that the download stream could be customized, in advance, to the particular needs of the application at hand.

Now, wouldn’t all that be cool?
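With well-specified module dependencies, deciding what to download for a given application reduces to computing the transitive closure of the modules it directly uses. The following is a minimal sketch under stated assumptions: the module names and dependency edges are entirely hypothetical, the dependency graph is a plain `Map` rather than any real module metadata format, and the class `ModuleClosure` is illustrative only. It performs a depth-first traversal and tolerates cycles by skipping modules already seen.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ModuleClosure {

    // Returns every module reachable from the roots: the set that would have
    // to be delivered before the application could start.
    static Set<String> requiredModules(Map<String, List<String>> deps, List<String> roots) {
        Set<String> needed = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String m = work.pop();
            if (needed.add(m)) {
                // First visit to m: schedule everything it depends on.
                for (String d : deps.getOrDefault(m, List.of())) {
                    work.push(d);
                }
            }
        }
        return needed;
    }

    public static void main(String[] args) {
        // Hypothetical module graph, for illustration only.
        Map<String, List<String>> deps = Map.of(
                "base", List.of(),
                "logging", List.of("base"),
                "xml", List.of("base"),
                "desktop", List.of("base", "xml"),
                "corba", List.of("base", "desktop"));

        // A command-line tool that uses only logging needs just two modules.
        System.out.println(requiredModules(deps, List.of("logging"))); // prints [logging, base]
    }
}
```

The payoff depends entirely on the graph: the fewer unexpected edges between modules, the smaller the closure for a typical application, which is why exposing and pruning those interconnections matters as much as drawing the module boundaries.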

Going further, the modularization process could be applied not just to the JDK but to libraries and applications themselves so as to improve these metrics even more. Doing so might also enable us to address some other longstanding problems related to the packaging and delivery of Java code.

Hmm …