The massive, monolithic JDK
2008/11/25 00:35:55 -08:00

Space is big. Really big. You just won’t believe how vastly, hugely, mindbogglingly big it is. I mean, you may think it’s a long way down the road to the chemist, but that’s just peanuts to space.

Douglas Adams, The Hitchhiker’s Guide to the Galaxy

The JDK is big, too—though not (yet) as big as space.

It’s big because over the last thirteen years the Java SE platform has grown from a small system originally intended for embedded devices into a rich collection of libraries serving a wide variety of needs across a broad range of environments.

It’s incredibly handy to have such a large and capable Swiss-Army knife at one’s disposal, but size is not without its costs.

Size

The JDK and its runtime subset, the JRE, have always been delivered as massive, indivisible artifacts. The growth of the platform has thus inevitably led to the growth of the basic JRE download, which now stands at well over 14MB despite heroic engineering efforts such as the Pack200 class-file compression format.

Complexity

The JDK is big, and it’s also deeply interconnected. It has been built, on the whole, as a monolithic software system. In this mode of development it’s completely natural to take advantage of other parts of the platform when writing new code or even just improving old code, relying upon the flexible linking mechanism of the Java virtual machine to make it all work at runtime.

Over the years, however, this style of development can lead to unexpected connections between APIs, and between their implementations, leading in turn to increased startup time and memory footprint. A trivial command-line “Hello, world!” program, for example, now loads and initializes over 300 separate classes, taking around 100ms on a recent desktop machine despite yet more heroic engineering efforts such as class-data sharing. The situation is even worse, of course, for larger applications.
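To observe this class-loading overhead yourself, here is a minimal sketch. The class name `Hello` and the method `loadedClassCount` are illustrative only. The program prints its greeting and then uses the standard `java.lang.management` API to ask the JVM how many classes are currently loaded. Querying the management bean loads a few classes of its own, so the reported number slightly overstates what a bare “Hello, world!” needs. The exact figure also depends on the JDK version and platform.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// A trivial "Hello, world!" program that also reports how many classes
// the JVM has loaded by the time it finishes (illustrative sketch).
public class Hello {

    // Returns the number of classes currently loaded in this JVM.
    // Note: the java.lang.management classes used here are loaded on
    // first use, so they are included in the count.
    static int loadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Hello, world!");
        System.out.println("Classes loaded: " + loadedClassCount());
    }
}
```

Alternatively, running `java -verbose:class Hello` prints one line per class as it is loaded. That approach avoids the management overhead and shows exactly which classes the program pulls in.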

Palliatives

The Java Kernel and Quickstarter features in the JDK 6u10 release do improve download time and (cold) startup time, at least for Windows users. These techniques really just address the symptoms of long-term interconnected growth, however, rather than the underlying cause.

The modular JDK

The most promising way to improve the key metrics of download time, startup time, and memory footprint is to attack the root problem head-on: Divide the JDK into a set of well-specified and separate, yet interdependent, modules.

The process of restructuring the JDK into modules would force all of the unexpected interconnections out into the open where they can be analyzed and, in many cases, either hidden or eliminated. This would, in turn, reduce the total number of classes loaded and thereby improve both startup time and memory footprint.

If we had a modular JDK then at download time we could deliver just those modules required to start a particular application, rather than the entire JRE. The Java Kernel is a first step toward this kind of solution; a further advantage of having well-specified modules is that the download stream could be customized, in advance, to the particular needs of the application at hand.

Now, wouldn’t all that be cool?

Going further, the modularization process could be applied not just to the JDK but to libraries and applications themselves so as to improve these metrics even more. Doing so might also enable us to address some other longstanding problems related to the packaging and delivery of Java code.

Hmm …