The purpose of the Leyden repository is to prototype improvements to the startup time, time to peak performance, and footprint of Java programs, as a part of Project Leyden. We solicit feedback from the Java community, with the hope that some of these improvements can eventually be incorporated into future JDK releases.
- This repository contains experimental and unstable code. It is not intended to be used in a production environment.
- This repository is intended for developers of the JDK, and advanced Java developers who are familiar with building the JDK.
- The experimental features in this repository may be changed or removed without notice. Command line flags and workflows will change.
- The benchmark results reported on this page are for illustrative purposes only. Your applications may get better or worse results.
The Leyden "premain" prototype includes many optimizations that shift work from run time to earlier executions of the application, which are called training runs. In a training run, we pre-compute various kinds of information. Importantly, we pre-compile bytecode to native code, guided by observations of the application's actual behavior during the training run.
The Leyden repository closely tracks the JDK main line. We are typically only a few weeks behind the main-line JDK repo.
We have implemented the following improvements:
- Ahead-of-Time Class Loading & Linking (JEP 483): This gives the JVM the ability to put classes in the linked state as soon as the application starts up. As a result, we can implement many other time-shifting optimizations with considerably simplified assumptions.
  - Please refer to the JEP 483 document for more details.
- Ahead-of-Time Method Profiling (JEP draft 8325147): We store method profiles from training runs in the CDS archive, thereby enabling the JIT to begin compiling earlier during warmup. As a result, Java applications can reach peak performance faster.
  - This feature is enabled by the new diagnostic (-XX:+UnlockDiagnosticVMOptions) VM flags -XX:+RecordTraining and -XX:+ReplayTraining.
- Ahead-of-Time Code Compilation (JEP draft 8335368): Methods that are frequently used during the training run can be compiled and stored along with the CDS archive. As a result, as soon as the application starts up in the production run, its methods can be natively executed.
  - This feature is enabled by the new VM flags -XX:+StoreCachedCode, -XX:+LoadCachedCode, and -XX:CachedCodeFile.
  - Currently, the native code is stored in a separate file, but our plan is to eventually store the native code inside the CDS archive file.
- Ahead-of-time resolution of constant pool entries: Many constant pool entries are resolved during the assembly phase. This allows the application to start up faster. Also, the existence of resolved constant pool entries allows the AOT compiler to generate better code. For diagnostic purposes, you can use -XX:+UnlockDiagnosticVMOptions -XX:-AOTInvokeDynamicLinking to disable the AOT linking of constant pool entries for the invokedynamic bytecode.
- Ahead-of-time generation of Dynamic Proxies: Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
  - This feature is enabled by the new VM flag -XX:+ArchiveDynamicProxies.
- Ahead-of-time generation of reflection data: Reflection data (such as instances of java.lang.reflect.Method) are generated by the JVM to support java.lang.reflect operations. We can generate these ahead of time to improve start-up.
  - This feature is enabled by the new VM flag -XX:+ArchiveReflectionData.
- Class Not Found Cache: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
  - This feature is enabled by the new VM flag -XX:+ArchiveLoaderLookupCache.
By default, all optimizations listed above are enabled. This simplifies testing of the whole prototype. If necessary for more detailed testing, each feature can be individually disabled by negating its associated flag.
The names of all of these VM flags will change in a future EA build as we transition from the old “CDS” terminology to the new “AOT” terminology, as discussed here.
The Leyden repository can be built in the same way as the main-line JDK repository. Please use the "premain" branch: https://github.com/openjdk/leyden/tree/premain.
For build instructions please see the online documentation, or either of these files:
- doc/building.html (html version)
- doc/building.md (markdown version)
See https://openjdk.org/ for more information about the OpenJDK Community and the JDK and see https://bugs.openjdk.org for JDK issue tracking.
The easiest way to try out the Leyden optimizations is to build a JVM from the Leyden repository, and use it with your application with the -XX:AOTCache flag.
Note: in an earlier version of the Leyden prototype, the optimizations were controlled by an experimental flag -XX:CacheDataStore. This flag has been deprecated and will be removed. For a reference to this flag, please see an older version of this document.
Here's a small benchmark that uses the JDK's built-in JavaCompiler class to compile some Java source files. This benchmark spends a significant amount of start-up time setting up the classes used by JavaCompiler, so it will benefit from the Leyden features.
First, download JavacBenchApp.java and compile it into a JAR file. (Remember to use the java program that you built from the Leyden repository.)
$ javac JavacBenchApp.java
$ jar cvf JavacBenchApp.jar JavacBenchApp*.class
added manifest
adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
We can run this benchmark without any Leyden features. It takes 893 ms:
$ java -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 893 ms
To use AOT optimizations for JavacBenchApp, we should first perform a training run and capture the profiling information into JavacBenchApp.aotconfig:
$ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.aotconfig
-rw-rw-r-- 1 iklam iklam 27652096 Mar 3 16:23 JavacBenchApp.aotconfig
With the JavacBenchApp.aotconfig file, we can create the AOT cache. This is called the assembly phase:
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 42332160 Mar 3 16:58 JavacBenchApp.aot
Now, we can make a production run of the program using the AOT cache JavacBenchApp.aot. It finishes in 423 ms, or more than twice as fast as before.
$ java -XX:AOTCache=JavacBenchApp.aot -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 423 ms
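The three steps above (training run, assembly phase, production run) can be wrapped in a small shell helper. This is only a sketch: the function names aot_workflow and run_step, and the JAVA and DRY_RUN variables, are made up for illustration and are not part of the Leyden repository. Only the -XX flags are taken from the commands shown above.

```shell
# Hypothetical helper, not part of the Leyden repo.
JAVA="${JAVA:-java}"      # should point at the java built from the Leyden repository
DRY_RUN="${DRY_RUN:-0}"   # set to 1 to print the commands instead of running them

# Run a command, or just print it when DRY_RUN=1.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# usage: aot_workflow <name> <jar> <main-class> [application args...]
aot_workflow() {
  name="$1"; jar="$2"; main="$3"
  shift 3
  # 1. Training run: record into <name>.aotconfig
  run_step "$JAVA" -XX:AOTMode=record -XX:AOTConfiguration="$name.aotconfig" \
    -cp "$jar" "$main" "$@" || return 1
  # 2. Assembly phase: create the AOT cache <name>.aot from the configuration
  run_step "$JAVA" -XX:AOTMode=create -XX:AOTConfiguration="$name.aotconfig" \
    -cp "$jar" -XX:AOTCache="$name.aot" || return 1
  # 3. Production run: use the AOT cache
  run_step "$JAVA" -XX:AOTCache="$name.aot" -cp "$jar" "$main" "$@"
}
```

For example, `DRY_RUN=1; aot_workflow JavacBenchApp JavacBenchApp.jar JavacBenchApp 50` prints the same three java commands used in this section without executing them.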
By default, training runs end when the application terminates. You have two other options to end training runs:
-XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>[,count=100]
jcmd <pid> AOT.end_training
Note that -XX:AOTEndTrainingOnMethodEntry uses the same format as -XX:CompileOnly and the default count is 1.
See EndTrainingOnMethodEntry.java for a test case.
We have implemented part of the JEP draft 8350022: Ahead-of-time Command Line Ergonomics. This simplifies the training process, so that an AOT cache can be created with a single command-line:
$ java -XX:AOTCacheOutput=JavacBenchApp.aot -cp JavacBenchApp.jar \
JavacBenchApp 50
Generated source code for 51 classes and compiled them in 2212 ms
[3.720s][warning][cds] Skipping Sanity: Unsupported location
Temporary AOTConfiguration recorded: JavacBenchApp.aot.config
Launching child process /bld/images/jdk/bin/java to assemble AOT cache JavacBenchApp.aot
using configuration JavacBenchApp.aot.config
Picked up JAVA_TOOL_OPTIONS: -Djava.class.path=JavacBenchApp.jar
-XX:AOTCacheOutput=JavacBenchApp.aot
-XX:AOTConfiguration=JavacBenchApp.aot.config -XX:AOTMode=create
Reading AOTConfiguration JavacBenchApp.aot.config and writing AOTCache JavacBenchApp.aot
AOTCache creation is complete: JavacBenchApp.aot 39956480 bytes
Removed temporary AOT configuration file JavacBenchApp.aot.config
As seen in the log messages, when the application is about to exit, the JVM writes a temporary AOT configuration file and spawns a child process to create an AOT cache using this configuration file.
In the above example, the configuration file's name JavacBenchApp.aot.config is automatically picked by the JVM. This is a temporary file written into the same directory as the file specified by -XX:AOTCacheOutput, so you must make sure this directory is writable. After the cache is created, this temporary file is automatically removed.
If you want to persist the configuration file (for debugging purposes, or for manual cache creation at a later time), you can explicitly set the AOTConfiguration option:
$ java -XX:AOTCacheOutput=JavacBenchApp.aot -XX:AOTConfiguration=myconfig.aotconfig \
-cp JavacBenchApp.jar JavacBenchApp 50
When -XX:AOTCacheOutput is specified, the JVM automatically runs with -XX:AOTMode=record, although you can also specify this explicitly:
$ java -XX:AOTMode=record -XX:AOTCacheOutput=JavacBenchApp.aot \
-cp JavacBenchApp.jar JavacBenchApp 50
An alternative way for single-command cache creation is to specify -XX:AOTCache=<file> along with -XX:AOTMode=record. This essentially says, "I want to record my training run directly into the specified AOT cache":
$ java -XX:AOTMode=record -XX:AOTCache=JavacBenchApp.aot \
-cp JavacBenchApp.jar JavacBenchApp 50
Note that in all the examples in this section, -XX:AOTCache and -XX:AOTCacheOutput are allowed to be specified at the same time, but in that case they are required to have the same value.
When creating an AOT cache with a single command, the environment variable AOT_TOOL_OPTIONS can be used to pass extra VM options to the child JVM process that performs the assembly step. For example:
$ export AOT_TOOL_OPTIONS=-Xmx128m
$ java -XX:AOTCacheOutput=JavacBenchApp.aot -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 2242 ms
[3.732s][warning][cds] Skipping Sanity: Unsupported location
Temporary AOTConfiguration recorded: JavacBenchApp.aot.config
Launching child process /jdk3/bld/le4-fastdebug/images/jdk/bin/java to assemble AOT
cache JavacBenchApp.aot using configuration JavacBenchApp.aot.config
Picked up AOT_TOOL_OPTIONS: -Xmx128m
Picked up JAVA_TOOL_OPTIONS: -Djava.class.path=JavacBenchApp.jar -XX:AOTCacheOutput=JavacBenchApp.aot
-XX:AOTConfiguration=JavacBenchApp.aot.config -XX:AOTMode=create -Xmx128m
Reading AOTConfiguration JavacBenchApp.aot.config and writing AOTCache JavacBenchApp.aot
AOTCache creation is complete: JavacBenchApp.aot 51937280 bytes
Removed temporary AOT configuration file JavacBenchApp.aot.config
All of the optimizations described in the Overview section above are enabled by default. This ensures that you can get all the optimizations without specifying them individually.
For diagnostic purposes, you can selectively disable some of the options:
- The -XX:+LoadCachedCode and -XX:+ReplayTraining flags affect only the production run.
- The -XX:+RecordTraining option affects only the training run and the assembly phase.
- All other options affect only the assembly phase.
For example, you can disable the loading of AOT-compiled methods during the production run. Notice that the benchmark now starts more slowly than it did when AOT-compiled methods were loaded.
$ java -XX:AOTCache=JavacBenchApp.aot -Xlog:cds=error -XX:-LoadCachedCode \
-cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 647 ms
You can also disable AOT compilation in the assembly phase. Note that the size of the AOT cache is smaller because it no longer has AOT-compiled methods.
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar \
-XX:AOTCache=JavacBenchApp.aot -XX:-StoreCachedCode
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 29990912 Mar 3 16:34 JavacBenchApp.aot
When trying out Leyden, please pay attention to the following limitations.
The CDS archive generated by the Leyden prototype includes machine instructions that are specific to the garbage collector. We recommend that you explicitly specify the same collector during both training and production runs. For example:
# assembly phase.
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar \
-XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC
# production run
$ java -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC -cp JavacBenchApp.jar \
JavacBenchApp 50
Otherwise, the CDS archive may not be usable for the production run, leading to suboptimal performance. For example, sometimes you may perform the assembly phase on a large development host, and then use a container to run the application in a small production node. In the following scenario, as the collector is not explicitly specified, the VM will automatically pick G1 for the assembly phase, and SerialGC for the production run (due to its limited amount of memory):
# Assembly phase (uses G1 by default)
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
# Production run (uses SerialGC)
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
--memory=1024m \
container-registry.oracle.com/java/openjdk \
bash -c 'cd /test; ' \
'/jdk/bin/java -XX:AOTCache=JavacBenchApp.aot ' \
' -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
GC used during dump time (G1) is not the same as runtime (Serial)
[0.001s][error][cds] An error has occurred while processing the AOT cache.
[0.001s][error][cds] Unable to map shared spaces
Error occurred during initialization of VM
Unable to use AOT cache.
Currently, if you use a garbage collector other than G1, Parallel, Serial, Epsilon, or Shenandoah in combination with -XX:AOTMode or -XX:AOTCache, the VM will exit with an error.
$ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar -XX:+UseZGC JavacBenchApp 50
Error occurred during initialization of VM
Cannot create the AOT configuration file: UseCompressedClassPointers must be enabled,
and collector must be G1, Parallel, Serial, Epsilon, or Shenandoah
As seen in the example immediately above, in the production run, if the CDS archive cannot be used for any reason, the JVM will report an error and exit. This happens as if -XX:AOTMode=on was specified in the command-line.
In the standard JDK, when the CDS archive cannot be used for any reason (for example, the archive was created for a different version of the JDK), the application will continue to run without using CDS. This fall-back strategy ensures that the application will function correctly, though at a lower level of performance.
With the Leyden prototype, we have changed this fall-back behavior to make it easier to diagnose performance issues. For example, when the start-up time is not as good as one would expect, we want to know whether it's caused by a misconfiguration that prevents the CDS archive from being used, or by a deficiency in the implementation of the Leyden optimizations.
To revert to the behavior of the standard JDK, you can explicitly add -XX:AOTMode=auto to the command-line.
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
--memory=1024m \
container-registry.oracle.com/java/openjdk \
bash -c 'cd /test; ' \
'/jdk/bin/java -XX:AOTMode=auto -XX:AOTCache=JavacBenchApp.aot ' \
' -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][cds] CDS archive has aot-linked classes. It cannot be used because
GC used during dump time (G1) is not the same as runtime (Serial)
Generated source code for 51 classes and compiled them in 831 ms
See JEP 483 for a discussion of -XX:AOTMode=on vs -XX:AOTMode=auto.
We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.
(FIXME: add a benchmark for javac)
To compare the performance of Leyden vs the main-line JDK, you need:
- An official build of JDK 21
- An up-to-date build of the JDK main-line
- The latest Leyden build
- Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind a firewall, you may need to set up proxies for Maven
The same steps are used for benchmarking all of the above demos. For example:
$ cd helidon-quickstart-se
$ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
BLDJDK_HOME=/usr/local/jdk21 \
bench
run,mainline default,mainline custom static cds,mainline aot cache,premain aot cache
1,456,229,156,117
2,453,227,157,117
3,455,232,155,116
4,448,230,154,114
5,440,228,156,114
6,446,228,156,114
7,448,232,156,114
8,465,261,159,114
9,448,226,157,113
10,442,233,154,114
Geomean,450.05,232.41,155.99,114.69
Stdev,6.98,9.72,1.41,1.35
Markdown snippets in mainline_vs_premain.md
The above command runs each configuration 10 times, in an interleaved order. This way the noise of the system (background processes, thermal throttling, etc.) is more likely to be spread across the different runs.
As is typical for benchmarking start-up performance, the numbers are not very steady. It is best to plot the results (as saved in the file mainline_vs_premain.csv) in a spreadsheet to check for noise and other artifacts.
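As a quick check before opening a spreadsheet, the geometric mean and standard deviation of one column can be recomputed with awk. This is a rough sketch: col_stats is a hypothetical helper, and it assumes the CSV file has the same layout as the output printed above (a header row, numbered run rows, then the Geomean and Stdev rows). It uses the population standard deviation, which appears to match the Stdev row in the sample output.

```shell
# Hypothetical helper, not part of the Leyden repo.
# usage: col_stats <csv-file> <column-number>
# Prints "<geomean> <stdev>" for the given column.
col_stats() {
  awk -F, -v col="$2" '
    $1 ~ /^[0-9]+$/ {              # numbered run rows only; skips header and summary rows
      n++
      sum    += $col
      sumsq  += $col * $col
      logsum += log($col)
    }
    END {
      if (n == 0) exit 1
      mean = sum / n
      var  = sumsq / n - mean * mean   # population variance
      if (var < 0) var = 0             # guard against floating-point rounding
      printf "%.2f %.2f\n", exp(logsum / n), sqrt(var)
    }' "$1"
}
```

For the ten runs shown above, `col_stats mainline_vs_premain.csv 5` (the "premain aot cache" column) should print `114.69 1.35`, matching the Geomean and Stdev rows.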
The "make bench" target also generates GitHub markdown snippets (in the file mainline_vs_premain.md) for creating the graphs below.
This is useful for Leyden developers to measure the benefits of a particular optimization. The steps are similar to above, but we use the "make compare_premain_builds" target:
$ cd helidon-quickstart-se
$ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
BLDJDK_HOME=/usr/local/jdk21 \
compare_premain_builds
Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
New build = /repos/leyden_new/build/linux-x64/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,110,109
2,131,111
3,118,115
4,110,108
5,117,110
6,114,109
7,110,109
8,118,110
9,110,110
10,113,114
Geomean,114.94,110.48
Stdev,6.19,2.16
Markdown snippets in compare_premain_builds.md
Please see test/hotspot/jtreg/premain/lib/Bench.gmk for more details.
Note: due to the variability of start-up time, the benefit of minor improvements may be difficult to measure.
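One rough way to see why small improvements are hard to measure is to compare the difference between the two geomeans with the run-to-run standard deviation. The sketch below does this for the sample numbers above. compare_geomeans is a hypothetical helper, and "within one stdev" is only a crude heuristic, not a proper statistical test.

```shell
# Hypothetical helper, not part of the Leyden repo.
# usage: compare_geomeans <old-geomean> <new-geomean> <old-stdev>
compare_geomeans() {
  awk -v old="$1" -v new="$2" -v sd="$3" 'BEGIN {
    diff = old - new
    ad = (diff < 0) ? -diff : diff     # absolute difference
    printf "diff=%.2f ms (%.2f%%), %s one stdev of the old build\n",
           diff, 100 * diff / old, (ad < sd ? "within" : "beyond")
  }'
}

# Geomean and Stdev values taken from the sample output above.
compare_geomeans 114.94 110.48 6.19
# prints: diff=4.46 ms (3.88%), within one stdev of the old build
```

Here the 4.46 ms improvement is smaller than the 6.19 ms standard deviation of the old build, so more runs would be needed before drawing a conclusion.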
The following charts show the relative start-up performance of the Leyden/Premain branch vs the JDK main-line.
For example, a number of "premain aot cache: 255" indicates that if the application takes 1000 ms to start up with the JDK main-line, it takes only 255 ms to start up when the current set of Leyden optimizations is enabled.
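The normalization can be reproduced from a Geomean row, assuming each value is simply divided by the "mainline default" value and scaled to 1000. The normalize function below is a hypothetical helper written for illustration. The input numbers are the Geomean row of the sample make bench output shown earlier.

```shell
# Hypothetical helper, not part of the Leyden repo.
# usage: normalize <baseline> <value>...
# Prints each value scaled so that the baseline equals 1000.
normalize() {
  base="$1"; shift
  for v in "$@"; do
    awk -v b="$base" -v v="$v" 'BEGIN { printf "%.0f\n", 1000 * v / b }'
  done
}

# Geomean row: mainline default, custom static cds, mainline aot cache, premain aot cache
normalize 450.05 450.05 232.41 155.99 114.69
# prints 1000, 516, 347 and 255, one per line
```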
The benchmark results are collected with make bench in the following directories:
helidon-quickstart-se
micronaut-first-app
quarkus-getting-started
spring-boot-getting-started
spring-petclinic
The meaning of the four rows in the following charts:
Row | Meaning |
---|---|
mainline default | Run benchmark with no optimizations |
mainline custom static cds | Run benchmark with a custom static CDS archive |
mainline aot cache | Run benchmark with a custom AOT cache (JEP 483) |
premain aot cache | Run benchmark with a custom AOT cache, plus all Leyden optimizations such as AOT profiles and AOT-compiled methods |
These JDK versions were used in the comparisons:
- JDK main-line: JDK 24, build 24+36-3646
- Leyden: https://github.com/openjdk/leyden/tree/bbac8f2d845aa6408182ca3ff9ce60b5ca6e0390
For detailed information about the hardware and raw numbers, see bench.20250307.txt
---
config:
xyChart:
chartOrientation: horizontal
height: 300
---
xychart-beta
x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
bar [1000, 516, 347, 255]
---
config:
xyChart:
chartOrientation: horizontal
height: 300
---
xychart-beta
x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
bar [1000, 475, 366, 321]
---
config:
xyChart:
chartOrientation: horizontal
height: 300
---
xychart-beta
x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
bar [1000, 437, 380, 284]
---
config:
xyChart:
chartOrientation: horizontal
height: 300
---
xychart-beta
x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
bar [1000, 502, 382, 287]
---
config:
xyChart:
chartOrientation: horizontal
height: 300
---
xychart-beta
x-axis "variant" ["mainline default", "mainline custom static cds", "mainline aot cache", "premain aot cache"]
y-axis "Elapsed time (normalized, smaller is better)" 0 --> 1000
bar [1000, 625, 586, 376]
Please see test/hotspot/jtreg/premain/ for more information.