The purpose of the Leyden repository is to prototype improvements to the startup time, time to peak performance, and footprint of Java programs, as a part of Project Leyden. We solicit feedback from the Java community, with the hope that some of these improvements can eventually be incorporated into future JDK releases.
- This repository contains experimental and unstable code. It is not intended to be used in a production environment.
- This repository is intended for developers of the JDK, and advanced Java developers who are familiar with building the JDK.
- The experimental features in this repository may be changed or removed without notice. Command line flags and workflows will change.
- The benchmark results reported on this page are for illustrative purposes only. Your applications may get better or worse results.
To try out the Leyden prototype without building it from source code, please download the Leyden EA Release from https://jdk.java.net/leyden/.
As of JDK 25, the Leyden Project has delivered the following ahead-of-time (AOT) optimization JEPs:
- JEP 483 - Ahead-of-Time Class Loading & Linking
- JEP 514 - Ahead-of-Time Command-Line Ergonomics
- JEP 515 - Ahead-of-Time Method Profiling
Please refer to the above JEPs for a detailed discussion of AOT optimizations.
The Leyden "premain" prototype includes new experimental AOT optimizations that are not yet integrated into the JDK mainline:
- Ahead-of-Time Code Compilation (JEP draft 8335368): Methods that are frequently used during the training run can be compiled and stored along with the AOT cache. As a result, as soon as the application starts up in the production run, its methods can be natively executed.
  - This feature is enabled by default when you create an AOT cache. It can be disabled with the diagnostic flag -XX:-AOTCodeCaching.
- Ahead-of-time generation of Dynamic Proxies: Dynamic proxies are frequently used by popular application frameworks. We can improve start-up time by generating these proxies ahead of time.
  - This feature is enabled by default when you create an AOT cache. It can be disabled with the diagnostic flag -XX:-ArchiveDynamicProxies.
- Ahead-of-time generation of reflection data: Reflection data (such as instances of java.lang.reflect.Method) are generated by the JVM to support java.lang.reflect operations. We can generate these ahead of time to improve start-up.
  - This feature is enabled by default when you create an AOT cache. It can be disabled with the diagnostic flag -XX:-ArchiveReflectionData.
- Class Not Found Cache: Sometimes application frameworks repeatedly try to load classes that do not exist. This optimization allows such failing lookups to be done quickly without repeatedly scanning the class path.
  - This feature is enabled by default when you create an AOT cache. It can be disabled with the diagnostic flag -XX:-ArchiveLoaderLookupCache.
The Leyden repository can be built in the same way as the main-line JDK repository. Please use the "premain" branch: https://github.com/openjdk/leyden/tree/premain.
For build instructions please see the online documentation, or either of these files:
- doc/building.html (html version)
- doc/building.md (markdown version)
See https://openjdk.org/ for more information about the OpenJDK Community and the JDK and see https://bugs.openjdk.org for JDK issue tracking.
The easiest way to try out the Leyden optimizations is to build a JVM from the Leyden repository, and run your application on it with the -XX:AOTCache flag.
Here's a small benchmark that uses the JDK's built-in JavaCompiler class to compile some Java source files. This benchmark spends a significant amount of start-up time setting up the classes used by JavaCompiler, so it will benefit from the Leyden features.
First, download JavacBenchApp.java and compile it into a JAR file. (Remember to use the java program that you built from the Leyden repository.)
$ javac JavacBenchApp.java
$ jar cvf JavacBenchApp.jar JavacBenchApp*.class
added manifest
adding: JavacBenchApp$ClassFile.class(in = 1608) (out= 787)(deflated 51%)
adding: JavacBenchApp$FileManager.class(in = 2090) (out= 979)(deflated 53%)
adding: JavacBenchApp$SourceFile.class(in = 1351) (out= 671)(deflated 50%)
adding: JavacBenchApp.class(in = 7571) (out= 3302)(deflated 56%)
We can run this benchmark without any AOT optimizations. It takes 893 ms:
$ java -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 893 ms
To use AOT optimizations for JavacBenchApp, we should first perform a training run and capture the profiling information into JavacBenchApp.aotconfig:
$ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.aotconfig
-rw-rw-r-- 1 iklam iklam 27652096 Mar 3 16:23 JavacBenchApp.aotconfig
With the JavacBenchApp.aotconfig file, we can create the AOT cache. This is called the assembly phase:
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 42332160 Mar 3 16:58 JavacBenchApp.aot
Alternatively, you can combine the training run and the assembly phase into a single command:
$ java -XX:AOTCacheOutput=JavacBenchApp.aot \
-cp JavacBenchApp.jar JavacBenchApp 50
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 42332160 Mar 3 16:58 JavacBenchApp.aot
Now, we can make a production run of the program using the AOT cache JavacBenchApp.aot. It finishes in 423 ms, more than twice as fast as before.
$ java -XX:AOTCache=JavacBenchApp.aot -cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 423 ms
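As a quick check of the "more than twice as fast" figure, the ratio of the two timings reported above can be computed in the shell. This is only an illustrative sketch: speedup is a hypothetical helper name, not part of the Leyden tooling.

```shell
# speedup BASELINE_MS OPTIMIZED_MS: print baseline/optimized with two decimals.
# "speedup" is a hypothetical helper, not a Leyden or JDK command.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", base / opt }'
}

# Timings from the runs above: 893 ms without AOT, 423 ms with the AOT cache.
speedup 893 423   # prints 2.11
```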
By default, training runs end when the application terminates. You have two other options to end training runs:
-XX:AOTEndTrainingOnMethodEntry=<method1,method2,...>[,count=100]
jcmd <pid> AOT.end_training
Note that -XX:AOTEndTrainingOnMethodEntry uses the same format as -XX:CompileOnly and the default count is 1.
See EndTrainingOnMethodEntry.java for a test case.
As mentioned below, parts or all of the AOT cache may be disabled under certain circumstances. This may lead to lower performance than expected. To diagnose potential performance issues, you can add -Xlog:aot* to the command line to see detailed information about what parts of the AOT cache are being utilized. For example, if the AOT-compiled code cannot be loaded, you will see log messages like these:
[0.008s][info][aot,codecache,init] Mapped 652184 bytes at address 0x00007f491005f028 from AOT Code Cache
[0.008s][info][aot,codecache,init] Loaded 439 AOT code entries from AOT Code Cache
[0.008s][info][aot,codecache,init] Unable to use AOT Code Cache.
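When the log output is saved or piped, a check like the following could flag this situation automatically. It is a sketch only: the function name is hypothetical, and it assumes the message text is exactly as shown above, which may not hold for other builds of the prototype.

```shell
# Succeeds (exit status 0) if the log text on stdin contains the message
# shown above. The function name is hypothetical.
aot_code_cache_rejected() {
    grep -q 'Unable to use AOT Code Cache'
}

# Example with two of the log lines from above fed in directly:
printf '%s\n' \
    '[0.008s][info][aot,codecache,init] Loaded 439 AOT code entries from AOT Code Cache' \
    '[0.008s][info][aot,codecache,init] Unable to use AOT Code Cache.' \
    | aot_code_cache_rejected && echo "AOT code cache was not used"
```

In practice the input would come from the production run's -Xlog:aot* output rather than from printf.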
All of the optimizations described in the Overview section above are enabled by default, so you get them without specifying each one individually.
For diagnostic purposes, you can selectively disable some of the options:
- The -XX:+AOTCodeCaching flag affects only the assembly phase and the production run.
- The -XX:+AOTRecordTraining flag affects only the training run and the assembly phase.
- The -XX:+AOTReplayTraining flag affects only the production run.
- All other options affect only the assembly phase.
For example, you can disable the loading of AOT-compiled methods during the production run. Notice that the benchmark now starts more slowly than it did when AOT-compiled methods were loaded.
$ java -XX:AOTCache=JavacBenchApp.aot -Xlog:cds=error -XX:-AOTCodeCaching \
-cp JavacBenchApp.jar JavacBenchApp 50
Generated source code for 51 classes and compiled them in 647 ms
You can also disable AOT compilation in the assembly phase. Note that the size of the AOT cache is smaller because it no longer has AOT-compiled methods.
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar \
-XX:AOTCache=JavacBenchApp.aot -XX:-AOTCodeCaching
$ ls -l JavacBenchApp.aot
-r--r--r-- 1 iklam iklam 29990912 Mar 3 16:34 JavacBenchApp.aot
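The size difference can be quantified from the two ls -l listings above. The following sketch simply does the arithmetic on those byte counts; the variable names are illustrative, and the figures apply only to this particular example.

```shell
# Byte counts copied from the "ls -l" outputs above.
with_code=42332160      # AOT cache including AOT-compiled methods
without_code=29990912   # AOT cache created with -XX:-AOTCodeCaching

# Integer difference in bytes, then the same figure in MiB and as a percentage.
diff_bytes=$((with_code - without_code))
echo "$diff_bytes bytes"
awk -v d="$diff_bytes" -v t="$with_code" \
    'BEGIN { printf "%.2f MiB, %.1f%% of the full cache\n", d / 1048576, 100 * d / t }'
```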
When trying out the Leyden prototype, please pay attention to the following limitations.
The AOT-compiled code will only be used if the production run is on a machine with the same type of CPU as used in the training run and assembly phase. If this is not the case (for example, the production run is on a machine that has different AVX capabilities), the AOT-compiled code will be ignored.
The AOT cache generated by the Leyden prototype includes machine instructions that are specific to the garbage collector. We recommend that you explicitly specify the same collector during both training and production runs. For example, if you prefer to use the SerialGC:
# assembly phase.
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar \
-XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC
# production run
$ java -XX:AOTCache=JavacBenchApp.aot -XX:+UseSerialGC -cp JavacBenchApp.jar \
JavacBenchApp 50
Otherwise, the AOT cache may not be usable for the production run, leading to suboptimal performance. For example, sometimes you may perform the assembly phase on a large development host, and then use a container to run the application in a small production node. In the following scenario, as the collector is not explicitly specified, the VM will automatically pick G1 for the assembly phase, and SerialGC for the production run (due to the container's limited amount of memory):
# Assembly phase (uses G1 by default)
$ java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar -XX:AOTCache=JavacBenchApp.aot
# Production run (uses SerialGC)
$ docker run --rm -v /repos/leyden/build/linux-x64/images/jdk:/jdk -v $(pwd):/test \
--memory=1024m \
container-registry.oracle.com/java/openjdk \
bash -c 'cd /test; ' \
'/jdk/bin/java -XX:AOTCache=JavacBenchApp.aot ' \
' -cp JavacBenchApp.jar JavacBenchApp 50'
[0.001s][error][aot] AOT cache has aot-linked classes. It cannot be used because
GC used during dump time (G1) is not the same as runtime (Serial)
[0.001s][error][aot] An error has occurred while processing the AOT cache.
[0.001s][error][aot] Unable to map shared spaces
Error occurred during initialization of VM
Unable to use AOT cache.
Currently, if you use an unsupported garbage collector (such as ZGC in the example below) in combination with -XX:AOTMode or -XX:AOTCache, the VM will exit with an error.
$ java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig \
-cp JavacBenchApp.jar -XX:+UseZGC JavacBenchApp 50
Error occurred during initialization of VM
Cannot create the AOT configuration file: UseCompressedClassPointers must be enabled,
and collector must be G1, Parallel, Serial, Epsilon, or Shenandoah
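One way to follow the recommendation of using the same collector in every phase is to define the GC flag once and reuse it when building each command line. The sketch below only prints the commands instead of running them. The aot_command function and the GC_FLAG and APP_ARGS variables are hypothetical names introduced for illustration; the java flags themselves are the ones used in the examples above.

```shell
# Hypothetical helper: the collector flag is defined once and reused in every phase.
GC_FLAG="-XX:+UseSerialGC"
APP_ARGS="-cp JavacBenchApp.jar JavacBenchApp 50"

# Print (rather than run) the command for a given phase: record, create, or run.
aot_command() {
    case "$1" in
        record) echo "java -XX:AOTMode=record -XX:AOTConfiguration=JavacBenchApp.aotconfig $GC_FLAG $APP_ARGS" ;;
        create) echo "java -XX:AOTMode=create -XX:AOTConfiguration=JavacBenchApp.aotconfig -XX:AOTCache=JavacBenchApp.aot $GC_FLAG -cp JavacBenchApp.jar" ;;
        run)    echo "java -XX:AOTCache=JavacBenchApp.aot $GC_FLAG $APP_ARGS" ;;
        *)      echo "unknown phase: $1" >&2; return 1 ;;
    esac
}

aot_command create
aot_command run
```

Because all three commands read the same GC_FLAG, changing the collector in one place changes it for the training run, the assembly phase, and the production run together.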
We use a small set of benchmarks to demonstrate the performance of the optimizations in the Leyden repo.
To compare the performance of Leyden vs the main-line JDK, you need:
- An official build of JDK 21
- An up-to-date build of the JDK main-line
- The latest Leyden build
- Maven (ideally 3.8 or later, as required by some of the demos). Note: if you are behind a firewall, you may need to set up proxies for Maven
The same steps are used for benchmarking all of the above demos. For example:
$ cd test/hotspot/jtreg/premain/helidon-quickstart-se
$ make PREMAIN_HOME=/repos/leyden/build/linux-x64/images/jdk \
MAINLINE_HOME=/repos/jdk/build/linux-x64/images/jdk \
BLDJDK_HOME=/usr/local/jdk21 \
bench
run,mainline default,mainline custom static cds,mainline aot cache,premain aot cache
1,456,229,156,117
2,453,227,157,117
3,455,232,155,116
4,448,230,154,114
5,440,228,156,114
6,446,228,156,114
7,448,232,156,114
8,465,261,159,114
9,448,226,157,113
10,442,233,154,114
Geomean,450.05,232.41,155.99,114.69
Stdev,6.98,9.72,1.41,1.35
Markdown snippets in mainline_vs_premain.md
The above command runs each configuration 10 times, in interleaved order. This way the noise of the system (background processes, thermal throttling, etc.) is more likely to be spread across the different runs.
As is typical for benchmarking start-up performance, the numbers are not very steady. It is best to plot the results (as saved in the file mainline_vs_premain.csv) in a spreadsheet to check for noise and other artifacts.
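The Geomean and Stdev rows can also be cross-checked from the per-run numbers. The sketch below recomputes them for the "mainline aot cache" column of the output above. It assumes the Stdev row is a population standard deviation around the arithmetic mean, which agrees with the reported figures for this column but is not stated in the source; summarize is a hypothetical helper name.

```shell
# summarize: read one number per line on stdin and print the geometric mean
# and the population standard deviation, both with two decimals.
summarize() {
    awk '{ x[NR] = $1; sum += $1; logsum += log($1) }
         END {
             mean = sum / NR
             for (i = 1; i <= NR; i++) sq += (x[i] - mean) ^ 2
             printf "geomean=%.2f stdev=%.2f\n", exp(logsum / NR), sqrt(sq / NR)
         }'
}

# "mainline aot cache" column from the ten runs above.
printf '%s\n' 156 157 155 154 156 156 156 159 157 154 | summarize
# prints: geomean=155.99 stdev=1.41
```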
The "make bench" target also generates GitHub markdown snippets (in the file mainline_vs_premain.md) for creating the graphs below.
This is useful for Leyden developers to measure the benefits of a particular optimization. The steps are similar to those above, but we use the "make compare_premain_builds" target:
$ cd helidon-quickstart-se
$ make PM_OLD=/repos/leyden_old/build/linux-x64/images/jdk \
PM_NEW=/repos/leyden_new/build/linux-x64/images/jdk \
BLDJDK_HOME=/usr/local/jdk21 \
compare_premain_builds
Old build = /repos/leyden_old/build/linux-x64/images/jdk with options
New build = /repos/leyden_new/build/linux-x64/images/jdk with options
Run,Old CDS + AOT,New CDS + AOT
1,110,109
2,131,111
3,118,115
4,110,108
5,117,110
6,114,109
7,110,109
8,118,110
9,110,110
10,113,114
Geomean,114.94,110.48
Stdev,6.19,2.16
Markdown snippets in compare_premain_builds.md
Please see test/hotspot/jtreg/premain/lib/Bench.gmk for more details.
Note: due to the variability of start-up time, the benefit of minor improvements may be difficult to measure.
The following charts show the relative start-up performance of the Leyden/Premain branch vs the JDK main-line.
For example, a number of "premain aot cache: 255" indicates that if the application takes 1000 ms to start up with the JDK main-line, it takes only 255 ms to start up when the current set of Leyden optimizations is enabled.
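The normalization can be sketched as follows, using the Geomean values from the helidon-quickstart-se output shown earlier (450.05 ms for mainline default, 114.69 ms for premain aot cache). normalize is a hypothetical helper name, and the exact rounding used for the charts is an assumption.

```shell
# normalize BASELINE_MS MEASURED_MS: scale so that the baseline becomes 1000.
# "normalize" is a hypothetical helper, not part of the benchmark scripts.
normalize() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.0f\n", 1000 * t / base }'
}

normalize 450.05 114.69   # prints 255
```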
The benchmark results are collected with make bench in the following directories under test/hotspot/jtreg/premain:
- helidon-quickstart-se
- javac-bench
- micronaut-first-app
- quarkus-getting-started
- spring-boot-getting-started
- spring-petclinic
The meaning of the four rows in the following charts:
Row | Meaning |
---|---|
mainline default | Run benchmark with no optimizations |
mainline custom static cds | Run benchmark with a custom static CDS archive |
mainline aot cache | Run benchmark with a custom AOT cache (JDK mainline) |
premain aot cache | Run benchmark with a custom AOT cache (Leyden Premain Prototype) |
We have benchmark results from two types of configurations using the script test/hotspot/jtreg/premain/bench_data/do_bench.sh:
- Desktop/Server Class: these are the results when running on a modern desktop or server, using the command bash bench_data/do_bench.sh.
- 2 Cores Only: these are the results when running in a limited configuration where only two cores are available, using the command taskset -c 1,2 bash bench_data/do_bench.sh
The 2 Cores Only setting is intended to emulate microservice configurations where a very small number of cores are allocated for small Java programs. In this setting, the JIT compiler may compete for CPU with the Java program, making start-up slower. The premain aot cache numbers usually are much better in this setting because most of the start-up code has been AOT-compiled, so the app can spend most of the available CPU time executing application logic.
These JDK versions were used in the comparisons:
- JDK main-line: JDK 25, build 25+37-LTS-3491
- Leyden: https://github.com/openjdk/leyden/tree/ce150637130086ad2b47916d66148007f5331a28
For detailed information about the hardware and raw numbers, see bench.20250930.txt and bench.20250930-2cpu.txt.
This is the speed-up of premain aot cache vs mainline default in the two types of configurations:
Benchmark | Desktop/Server Class (28 Cores) | 2 Cores Only |
---|---|---|
Helidon Quick Start | 3.59x | 4.11x |
JavacBenchApp 50 source files | 2.21x | 3.17x |
Micronaut First App Demo | 2.91x | 4.90x |
Quarkus Getting Started Demo | 2.97x | 3.74x |
Spring-boot Getting Started Demo | 4.13x | 4.70x |
Spring PetClinic Demo | 3.33x | 3.03x |
Elapsed time per benchmark, normalized so that mainline default is 1000 (smaller is better), for the Desktop/Server Class configuration:
Benchmark | mainline default | mainline custom static cds | mainline aot cache | premain aot cache |
---|---|---|---|---|
Helidon Quick Start | 1000 | 520 | 350 | 279 |
JavacBenchApp 50 source files | 1000 | 785 | 572 | 452 |
Micronaut First App Demo | 1000 | 482 | 387 | 344 |
Quarkus Getting Started Demo | 1000 | 499 | 417 | 337 |
Spring-boot Getting Started Demo | 1000 | 492 | 332 | 242 |
Spring PetClinic Demo | 1000 | 619 | 568 | 301 |
Elapsed time per benchmark, normalized so that mainline default is 1000 (smaller is better), for the 2 Cores Only configuration:
Benchmark | mainline default | mainline custom static cds | mainline aot cache | premain aot cache |
---|---|---|---|---|
Helidon Quick Start | 1000 | 585 | 459 | 244 |
JavacBenchApp 50 source files | 1000 | 845 | 674 | 315 |
Micronaut First App Demo | 1000 | 439 | 355 | 204 |
Quarkus Getting Started Demo | 1000 | 512 | 495 | 268 |
Spring-boot Getting Started Demo | 1000 | 607 | 518 | 213 |
Spring PetClinic Demo | 1000 | 632 | 572 | 330 |
Please see test/hotspot/jtreg/premain/ for more information.