Frequently asked questions about version catalogs

11 April 2021

Tags: gradle catalog convenience

Version catalogs FAQ

Can I use a version catalog to declare plugin versions?

No. The initial implementation of version catalogs had, in TOML files, a dedicated section for plugins:

[plugins]
id.of.my.awesome.plugin="1.2.3"

However, after community feedback and for consistency reasons, we removed this feature from the initial release. This means that, currently, you have to use the pluginManagement block of the settings file to deal with your plugin versions, and this block cannot, in particular, use the TOML file to declare plugin versions:

settings.gradle
pluginManagement {
    plugins {
        id("me.champeau.jmh") version("0.6.3")
    }
}

It may look surprising that you can’t use version(libs.plugins.jmh), for example, in the pluginManagement block, but it’s a chicken-and-egg problem: the pluginManagement block has to be evaluated before the catalogs are defined, because settings plugins may contribute more catalogs or enhance the existing ones. Therefore, the libs extension doesn’t exist yet when this block is evaluated.

The limitation of not being able to deal with plugin versions in catalogs will be lifted in one way or another in the future.

Can I use the version catalog in buildSrc?

Yes, you can, and not only in buildSrc: basically in any included build too. You have several options, but the easiest is to include the TOML catalog in your buildSrc/settings.gradle(.kts) file:

buildSrc/settings.gradle
dependencyResolutionManagement {
    versionCatalogs {
        libs {
            from(files("../gradle/libs.versions.toml"))
        }
    }
}

But how can I use the catalog in plugins defined in buildSrc?

The solution above lets you use the catalogs in the build scripts of buildSrc itself, but what if you want to use the catalog(s) in the plugins that buildSrc defines, or in precompiled script plugins? Long story short: currently, you can only do it using a type-unsafe API.

First, you need to access the version catalogs extension from your plugin/build script, for example in Groovy:

def catalogs = project.extensions.getByType(VersionCatalogsExtension)

or in Kotlin:

val catalogs = extensions.getByType<VersionCatalogsExtension>()

Then you can access the version catalogs in your script, for example by writing:

pluginManager.withPlugin("java") {
    val libs = catalogs.named("libs")
    dependencies.addProvider("implementation", libs.findDependency("lib").get())
}

Note that this API doesn’t provide any static accessor but is nevertheless safe, since it uses the Optional API. There’s a reason why you cannot access type-safe accessors in plugins/precompiled script plugins; you will find more details on this issue. In a nutshell, that’s because buildSrc plugins (precompiled or not) are plugins which can be applied to any kind of project, and we don’t know what the target project catalogs will be: there’s no inherent reason why they would be the same. In the future, we will probably provide a way to say that, at your own risk, you expect the target catalog model to be the same.
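
For the record, the same lookup works from Groovy; here is a minimal sketch assuming the target catalog is named libs and contains an alias lib, relying only on the Optional-based API shown above:

def catalogs = project.extensions.getByType(VersionCatalogsExtension)
pluginManager.withPlugin('java') {
    def libs = catalogs.named('libs')
    // findDependency returns an Optional, so we only add the dependency if the alias exists
    libs.findDependency('lib').ifPresent { lib ->
        dependencies.addProvider('implementation', lib)
    }
}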

Can I use version catalogs in production code?

No, you can’t. Version catalogs are only accessible to build scripts/plugins, not your production code.

Should I use a platform or a catalog?

You should probably use both; look at our docs for a complete explanation.
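
To illustrate, nothing prevents you from combining them in the same project; here is a sketch, assuming a hypothetical my-platform project and a versionless guava alias in the catalog:

dependencies {
    // the platform recommends versions...
    implementation(platform(project(':my-platform')))
    // ...while the catalog provides the coordinates to depend on
    implementation(libs.guava)
}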

Why did you choose TOML and not YAML?

Or XML (or pick your favorite format)? The rationale is described in the design document.

My IDE is red everywhere, MISSING_DEPENDENCY_CLASS error

If you are seeing this error:

[screenshot: MISSING_DEPENDENCY_CLASS error]

upgrade to the latest IntelliJ IDEA 2021.1, which fixes this problem.

Why can’t I have nested aliases with the same prefix?

Imagine that you want to have 2 aliases, say junit and junit-jupiter, and that both represent distinct dependencies: Gradle won’t let you do this, and you will have to rename your aliases to, say, junit-core and junit-jupiter. That’s because Gradle maps those aliases to accessors, that is to say libs.getJunit() and libs.getJunit().getJupiter(). The problem is that you can’t have an accessor which is both a leaf (it represents a dependency notation) and a node (that is to say an intermediate node used to access a real dependency). The reason we can’t do this is that we’re using lazy accessors of type Provider<MinimalExternalModuleDependency> for leaves, and that type cannot be extended to provide accessors for "children" dependencies. In other words, the type which represents a node with children provides accessors which return Provider<...> for dependencies, but a provider itself cannot have children. A potential workaround would be to support, in the future, an explicit call to say "I’m stopping here, that’s the dependency I need", for example:

dependencies {
    testImplementation(libs.junit.get())
    // or
    testImplementation(libs.junit.peek()) // because `get()` might be confusing, as it would return a `Provider` on which you can call `get()` itself
}

For now, the team has decided to restrict what you can do by preventing aliases which have "name clashes".

Why can’t I use an alias with dots directly?

You will have noticed that if you declare an alias like this:

[libraries]
junit-jupiter = "..."

then Gradle will generate the following accessor: libs.junit.jupiter (basically the dashes are transformed to dots). The question is, why can’t we just write:

[libraries]
junit.jupiter = "..."

And the reason is: tooling support. The previous declaration is actually equivalent to writing:

[libraries]
   [libraries.junit]
   jupiter = "..."

but technically, it’s undecidable where the "nesting hierarchy" stops, which would prevent tools from providing good completion (for example, where you can use { module = "..." }). It also makes it harder for tooling to automatically patch the file, since it wouldn’t know where to look.

As a consequence, we’ve decided to keep the format simple and implement this mapping strategy.
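
Concretely, with the junit-jupiter alias above, the generated accessor is used like this in a build script:

dependencies {
    testImplementation(libs.junit.jupiter) // alias "junit-jupiter" is mapped to libs.junit.jupiter
}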

Should I use commons-lang3 as an alias or commonsLang3?

Probably neither one nor the other :) By choosing commons-lang3, you’re implicitly creating a group of dependencies called commons, which will include a number of dependencies, including lang3. The question then is, does that commons group make sense? It’s rather abstract, no? Does it actually say it’s "Apache Commons"?

A better solution would therefore be to use commonsLang3 as the alias, but then you’d realize that you have encoded a version number in the alias name, so why not commonsLang directly?

Therefore:

[libraries]
commonsLang = { module="org.apache.commons:commons-lang3", version="3.3.1" }

This means that dashes should be limited to grouping dependencies, so that they are organized in "folders". This can be practical when you have lots of dependencies, but it also makes them less discoverable by completion, since you’d have to know in which subtree to look. Proper guidance on what to use will be discussed later, based on your feedback and practices.

Should I use the settings API or the TOML file?

Gradle comes with both a settings API to declare the catalog and a convenience TOML file. I would personally say that most people should only care about the TOML file, as it covers 80% of use cases. The settings API is great as soon as you want to implement settings plugins or if, for example, you want to use your own, existing format to declare a catalog instead of using the TOML format.
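
For reference, here is a minimal sketch of what the settings API looks like (the alias and coordinates are illustrative):

settings.gradle
dependencyResolutionManagement {
    versionCatalogs {
        libs {
            version("guava", "30.0-jre")
            alias("guava").to("com.google.guava", "guava").versionRef("guava")
        }
    }
}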

Why can’t I use excludes or classifiers?

By design, version catalogs talk about dependency coordinates only. The choice of applying excludes is on the consumer side: for example, for a specific project, you might need to exclude a transitive dependency because you don’t use the code path which exercises this dependency, but this might not be the case for all projects. Similarly, a classifier falls into the category of variant selectors (see the variant model): for the same dependency coordinates, one consumer might want classifier X, another classifier Y, and it’s not necessarily allowed to have both in the same graph. Therefore, classifiers need to be declared at the dependency declaration site:

dependencies {
   implementation(variantOf(libs.myLib) { classifier('test-fixtures') })
}

The rationale behind this limitation is that the use of classifiers is an artifact of the poor pom.xml modeling, which doesn’t assign semantics to classifiers (we don’t know what they represent), contrary to Gradle Module Metadata. Therefore, a consumer should only care about the dependency coordinates, and the right variant (e.g. classifier) should be selected automatically by the dependency resolution engine. We want to encourage this model, rather than supporting ad-hoc classifiers which will eventually require more work for all consumers.

How do I tell Gradle to use a specific artifact?

Similarly to classifiers or excludes, artifact selectors belong to the dependency declaration site. You need to write:

dependencies {
    implementation(libs.myLib) {
        artifact {
            name = 'my-lib' // note that ideally this will go away, see https://github.com/gradle/gradle/issues/16768
            type = 'aar'
        }
    }
}

Where should I report bugs or feature requests?

As usual, on our issue tracker. There’s also the dedicated epic where you will find the initial specification linked, which explains a lot of the design process.


Simplified version management with Gradle 7

24 March 2021

Tags: gradle catalog convenience

Gradle 7 introduces the concept of version catalogs, which I’ve been working on for several months already. Long story short, I’m extremely excited by this new feature which should make dependency management easier with Gradle. Let’s get started!

Please also read my Version catalogs FAQ follow up post if you have more questions!

Sharing dependencies between projects

One of the most frequent questions raised by Gradle users is how to properly share dependency versions between projects. For example, let’s imagine that you have a multi-project build with this layout:

root
 |---- client
 |---- server

Because they live in the same "multi-project", it is expected that both client and server would require the same dependencies. For example, both of them would need Guava as an implementation detail and JUnit 5 for testing:

build.gradle
dependencies {
    implementation("com.google.guava:guava:30.0-jre")
    testImplementation("org.junit.jupiter:junit-jupiter-api:5.7.1")
    testRuntimeOnly("org.junit.jupiter:junit-jupiter-engine")
}

Without any sharing mechanism, both projects would replicate the dependency declarations, which is subject to a number of drawbacks:

  • upgrading a library requires updating all build files which use it

  • you have to remember about the dependency coordinates (group, artifact, version) of all dependencies

  • you might accidentally use different versions in different projects

  • some dependencies are always used together but you have to duplicate entries in build files

Existing patterns

For these reasons, users have invented different patterns for dealing with dependency versions over the years. For example:

Versions in properties files:

gradle.properties
guavaVersion=30.0-jre

then in a build file:

dependencies {
    implementation("com.google.guava:guava:${guavaVersion}")
}

Or versions in "extra properties" in the root project:

extra properties
ext {
   guavaVersion = '30.0-jre'
}

// ...

dependencies {
    implementation("com.google.guava:guava:${guavaVersion}")
}

Sometimes you even find full coordinates in dependencies.gradle files.
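
Such a file typically looks something like this (a sketch; the names are illustrative):

dependencies.gradle
ext.libraries = [
    guava       : 'com.google.guava:guava:30.0-jre',
    junitJupiter: 'org.junit.jupiter:junit-jupiter-api:5.7.1'
]

Build scripts then use implementation(libraries.guava).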

And since the rise of the Kotlin DSL, another pattern became extremely popular in the Android world: declaring libraries in buildSrc then using type-safe accessors to declare dependencies in build scripts:

buildSrc/src/main/kotlin/Libs.kt
object Libs {
   val guava = "com.google.guava:guava:30.0-jre"
}

and in a build script:

build.gradle
dependencies {
    implementation(Libs.guava)
}

This last example is interesting because it goes in the direction of more type-safety and more compile-time errors (as opposed to runtime errors). But it has a major drawback: any change to any dependency will trigger recompilation of build scripts and invalidate the build script classpath, causing up-to-date checks to fail and, in the end, rebuilding a lot more than what a single version change should require.

Introducing version catalogs

A version catalog is basically a Gradle-supported replacement for all the previous patterns, without their drawbacks. To add support for version catalogs, you need to enable the experimental feature in your settings file:

settings.gradle
enableFeaturePreview("VERSION_CATALOGS")

In its simplest form, a catalog is a file found in a conventional location and uses the TOML configuration format:

gradle/libs.versions.toml
[libraries]
guava = "com.google.guava:guava:30.0-jre"
junit-jupiter = "org.junit.jupiter:junit-jupiter-api:5.7.1"
junit-engine = { module="org.junit.jupiter:junit-jupiter-engine" }

[bundles]
testDependencies = ["junit-jupiter", "junit-engine"]

This declares the dependency coordinates which will be used in build scripts. You still have to declare your dependencies, but this can now be done using a type-safe API:

build.gradle
dependencies {
    implementation(libs.guava)
    testImplementation(libs.bundles.testDependencies)
}

The benefit of type-safe accessors is immediately visible in the IDE, which can provide code completion for them:

[screenshot: IDE code completion on the libs accessor]

In the catalog file above, we inlined dependency versions directly in the coordinates. However, it’s possible to externalize them so that you can share a dependency version between dependencies. For example:

gradle/libs.versions.toml
[versions]
groovy = "2.5.14"
guava = "30.0-jre"
jupiter = "5.7.1"

[libraries]
guava = { module="com.google.guava:guava", version.ref="guava" }
junit-jupiter = { module="org.junit.jupiter:junit-jupiter-api", version.ref="jupiter" }
junit-engine = { module="org.junit.jupiter:junit-jupiter-engine" }

groovy-core = { module="org.codehaus.groovy:groovy", version.ref="groovy" }
groovy-json = { module="org.codehaus.groovy:groovy-json", version.ref="groovy" }

[bundles]
testDependencies = ["junit-jupiter", "junit-engine"]

This new feature makes it trivial to update a dependency version: there’s a single place to look at.

This comes with other benefits, like the fact that updating the GAV coordinates (group, artifact or version) of a dependency doesn’t trigger recompilation of build scripts. The TOML format also provides us with the ability to declare rich versions.
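
For example, a version can be declared with strict bounds and a preferred version within those bounds; here is a sketch using the settings API presented in the next section (the bounds are illustrative):

settings.gradle
dependencyResolutionManagement {
    versionCatalogs {
        libs {
            version("groovy") {
                strictly("[2.5, 3.0[") // resolution must stay within these bounds
                prefer("2.5.14")       // and should prefer this version
            }
        }
    }
}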

Under the hood

Under the hood, Gradle provides an API to declare catalogs. This API is found on the Settings object, which means that plugin authors can contribute catalogs, for example via convention plugins applied to the settings.gradle(.kts) file.

This API is more verbose than the TOML file, but it is designed for type-safety. The equivalent of the catalog above would be this:

settings.gradle
dependencyResolutionManagement {
   versionCatalogs {
      libs {
           alias("guava").to("com.google.guava", "guava").versionRef("guava")
           alias("junit-jupiter").to("org.junit.jupiter", "junit-jupiter-api").versionRef("jupiter")
           alias("junit-engine").to("org.junit.jupiter", "junit-jupiter-engine").withoutVersion()
           alias("groovy-core").to("org.codehaus.groovy", "groovy").versionRef("groovy")
           alias("groovy-json").to("org.codehaus.groovy", "groovy-json").versionRef="groovy")

           version("groovy", "2.5.14")
           version("guava", "30.0-jre")
           version("jupiter", "5.7.1")
      }
   }
}

This API must actually be used if you are consuming an external catalog. That’s one of the big selling points of this feature: it allows teams (or framework authors) to publish catalogs, so that users can get recommendations. For example, let’s imagine that the Spring Boot team publishes a catalog of recommendations (they do something similar today with a BOM, but BOMs have an impact on your transitive dependencies that you might not want).

Consuming this catalog in a Gradle build would look like this:

settings.gradle
dependencyResolutionManagement {
   versionCatalogs {
       spring {
           from("org.springframework:spring-catalog:1.0')
       }
   }
}

This would make a catalog available under the spring namespace in your build scripts. Therefore, you’d be able to use whatever version of SLF4J the Spring team recommends by declaring this dependency:

build.gradle
dependencies {
    implementation(spring.slf4j)
}

Such a catalog would be published on a regular Maven repository, as a TOML file. Thanks to Gradle’s advanced dependency resolution engine, it’s totally transparent to the user that the actual dependency is a catalog.
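
Gradle 7 also comes with a version-catalog plugin which lets you build and publish such a catalog; a minimal sketch (the coordinates are illustrative):

build.gradle
plugins {
    id 'version-catalog'
    id 'maven-publish'
}

catalog {
    versionCatalog {
        version("guava", "30.0-jre")
        alias("guava").to("com.google.guava", "guava").versionRef("guava")
    }
}

publishing {
    publications {
        maven(MavenPublication) {
            from components.versionCatalog // publishes the generated TOML file
        }
    }
}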

What version catalogs are not

At this stage, it becomes important to state what version catalogs are not:

  • they are not the "single source of truth" for your dependencies: having a catalog doesn’t prevent you from declaring dependencies directly using the "old" notation in build scripts, nor does it prevent plugins from adding dependencies. Long story short: the presence of a catalog makes discoverability and maintenance easier, but it doesn’t remove any of the flexibility that Gradle offers. We’re thinking about ways to enforce that all direct dependencies are declared via a catalog in the future.

  • the version declared in a catalog is not necessarily the one which is going to be resolved: a catalog only talks about direct dependencies (not transitive ones), and the version that you declare is only an input to dependency resolution. With transitive dependencies, it’s typically possible that a version gets upgraded, for example.

  • while version catalogs make it possible for third-party tooling to automatically update versions, this wasn’t a goal of this work. If you relate this to the previous point, it all makes sense: as long as you rely on the input (what is written) to assume what is going to be resolved, you’re only hoping that it is what is going to be resolved. It may be enough for some cases, though. Please refer to my blog post about Dependabot for more insights on this topic. Again, future work we have in mind includes adding some linting to make sure that the first-level dependencies you declare match what you resolved, because in general, a difference there is a sign that something is wrong in the setup. I’m going to repeat myself, but don’t assume that the version you see in a config file is the one you will get.

Please take a look at the documentation for further details, and give us your feedback!


Using Java 16 with Gradle

17 March 2021

Tags: gradle java16

Java 16 is out and I’m seeing a number of folks trying to figure out how to use Java 16 with Gradle. Often they would try to run Gradle with JDK 16 and see it fail. There’s a ticket about Java 16 support in Gradle but in most cases you can already work with JDK 16 without waiting for official support.

The happy path

Gradle 7, which is due soon, will provide official support for Java 16. If you have an existing build that you want to try on Java 16, you can update the wrapper to use the latest Gradle 7.0 milestone release:

gradle/wrapper/gradle-wrapper.properties
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-7.0-milestone-3-bin.zip
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists

If you are lucky this is all you need to do.

However, Gradle 7 is a major release, and as such it brings a number of changes which may break your build (deprecated methods being removed, or, in particular for the Java 16 support, upgrading to Groovy 3 internally). It may be a bit involved to migrate to Gradle 7 just to try Java 16.

Decouple the Java version used for Gradle itself from the version you need!

It’s actually better to decouple the version of Java required to run Gradle from the version of Java your application requires. In general, it’s considered best practice to use a version of the JDK that Gradle officially supports to run Gradle itself, and to configure the build to use a different JDK.

Configuring Java toolchains

In Gradle terminology, this is called activating Java Toolchains.

Let’s get started with a sample project running on the latest stable Gradle, which is 6.8.3; make sure that you have it on your PATH. I personally recommend using sdkman! to install Gradle:

$ sdk install gradle 6.8.3

At the same time, we want to make sure we run Gradle with a supported version, which is anything between Java 8 and 15:

$ java -version
openjdk version "11.0.9.1" 2020-11-04
OpenJDK Runtime Environment 18.9 (build 11.0.9.1+1)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.9.1+1, mixed mode)

If it outputs anything other than 8 to 15, please make sure to update your PATH to point to such a JDK. Again, you can do this with sdkman!:

$ sdk install java 11.0.9.open

Demo application

Now, let’s create a sample Gradle project:

$ mkdir demo-app
$ cd demo-app
$ gradle init

Then select:

Select type of project to generate:
  1: basic
  2: application
  3: library
  4: Gradle plugin
Enter selection (default: basic) [1..4] 2

Select implementation language:
  1: C++
  2: Groovy
  3: Java
  4: Kotlin
  5: Scala
  6: Swift
Enter selection (default: Java) [1..6] 3

Split functionality across multiple subprojects?:
  1: no - only one application project
  2: yes - application and library projects
Enter selection (default: no - only one application project) [1..2] 1

Select build script DSL:
  1: Groovy
  2: Kotlin
Enter selection (default: Groovy) [1..2] 1

Select test framework:
  1: JUnit 4
  2: TestNG
  3: Spock
  4: JUnit Jupiter
Enter selection (default: JUnit 4) [1..4] 4

Project name (default: demo-app):
Source package (default: demo.app):

and confirm the default name and packages.

Then let’s run our app:

$ ./gradlew run

> Task :app:run
Hello World!

BUILD SUCCESSFUL in 4s
2 actionable tasks: 2 executed

Migrating the application to Java 16

All good! Now let’s configure Gradle to use Java 16 to build and run our app instead. Let’s open the build script, found under app:

app/build.gradle
plugins {
    // Apply the application plugin to add support for building a CLI application in Java.
    id 'application'
}

// Add this under the `plugins` section:
java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(16)
    }
}

The java.toolchain block lets us configure the toolchain that Gradle is going to use to build and run your application. We’re setting 16, which means that we’re going to compile the main and test sources, as well as execute them, with a Java 16 JDK. Gradle will automatically try to find a Java 16 installation in a conventional location. If it cannot find one, you will see something like this happening:

Provisioning toolchain adoptopenjdk-16-x64-linux.tar.gz > adoptopenjdk-16-x64-linux.tar.gz > 66 MiB/195.8 MiB

which means that Gradle is downloading the JDK for you!

Let’s check:

$ ./gradlew run

Dang! The build fails! To some extent, it’s good news: it means that Gradle is really using Java 16. But why is it failing?

Disabling incremental compilation

Well, you’re facing one of the bugs we fixed in Gradle 7: our incremental compiler isn’t compatible with Java 16, because it uses classes which have been made "hidden" by the module system in Java 16.

There’s an easy fix: let’s disable incremental compilation!

Again, let’s open our app/build.gradle file and add this:

tasks.withType(JavaCompile).configureEach {
    // disable incremental compilation
    options.incremental = false
}

And let’s run the build again:

$ ./gradlew run

Yay! This time the build passed! Congrats, you have your first Java 16 app running!

As an alternative to disabling incremental compilation, you might just want to let Gradle access the JDK internals. This solution is better for performance, even if a bit "hackish":

tasks.withType(JavaCompile).configureEach {
    options.forkOptions.jvmArgs.addAll( ['--add-opens', 'jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED'] )
}

In case you want to use one of the experimental features that Java 16 provides, the setup I described in a previous post about Java Feature Previews still holds and is a good follow-up to this post!
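
If you go down that route, the gist of that setup is passing --enable-preview at both compile time and runtime; a quick sketch:

app/build.gradle
tasks.withType(JavaCompile).configureEach {
    options.compilerArgs.add('--enable-preview')
}
tasks.withType(Test).configureEach {
    jvmArgs('--enable-preview')
}
tasks.withType(JavaExec).configureEach {
    jvmArgs('--enable-preview')
}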


Gradle + WebAssembly

21 February 2021

Tags: gradle webassembly

A couple of weeks ago, I listened to an interview (in French) with Geoffroy Couprie and Ivan Enderlin, who gave a very nice overview of what WebAssembly is and what it’s good for. To be honest, it wasn’t clear to me what the advantages of WebAssembly were compared to a good old JVM: all in all, both are virtual machines, and both promise the same thing, that is to say, write once, run everywhere. Also, in my mind, WebAssembly was some kind of restricted version of JavaScript, like asm.js. I was wrong: it’s more than that.

Why WebAssembly?

One of the aspects discussed during this interview was sandboxing: with WebAssembly, it’s the responsibility of the embedder to give the wasm program access to, for example, I/O. In practice, it means that wasm binaries are very restricted by default. In fact, by design, they are restricted to pure computations, which makes them very suitable for isolating work. Extensions from embedders are responsible for giving access to host resources (for example the GPU, or the file system).

Another promise of WebAssembly is running at close-to-native speeds.

Therefore, a crazy idea came to me: what if I could use a wasm binary as a task implementation in Gradle? This, for example, would let us use any language that compiles to WASM as an implementation language for Gradle tasks.

The result of this experiment is available in this repository. The results are quite promising.

Isolating task inputs from the task execution

In a nutshell, a Gradle task can be seen as a function which takes a number of inputs and returns a value. Cacheability is derived from the "pureness" of the function: for the same inputs, the output should be the same.

This makes it very suitable to the WebAssembly model where functions are exported to an embedder.

In reality, it’s more complicated: Gradle tasks most likely use files as inputs, and also produce files. This means that the WebAssembly runtime would need to provide I/O access for this to be really useful.

After some support from Ivan Enderlin, I quickly figured out that it would be difficult to make file access work, so I simplified the problem. In the prototype, my WASM tasks are not able to produce a file output and are limited to simply displaying the execution result on the console.

For tasks which actually have files as inputs, I’m reading the file contents from Java code into byte arrays which are "propagated" to the WASM runtime memory. This means, effectively, that the WASM functions I played with don’t have any kind of access to the file system and remain pure functions.
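
To give an idea, here is a rough sketch of that propagation using the wasmer-java API (a simplification of what the prototype does; the fixed memory offset is an assumption for illustration):

// read the task input on the Java side
byte[] input = inputFile.get().asFile.bytes
def instance = new org.wasmer.Instance(wasmBytes)
// copy the input bytes into the WASM linear memory...
instance.exports.getMemory('memory').buffer().put(input)
// ...and call the pure function with a (pointer, length) pair
def hashPointer = instance.exports.getFunction('process').apply(0, input.length)[0]
instance.close()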

With those limitations in mind, here’s what I came up with.

Declaring the task I/O protocol

Gradle is evolving fast, and nowadays the idiomatic way to declare the inputs and outputs of a task is to use the lazy configuration API. However, this isn’t enforced, and nothing prevents you from writing tasks which do not use this API. As the implementor of a new integration mechanism, I can set the rules and restrict the scope to tasks which use this API, which has a number of advantages.

Say that we want to write a task which computes the sum of two integers. With the lazy configuration API, you need two "properties" corresponding to the input numbers:

abstract class MyTask extends DefaultTask {
   abstract Property<Integer> getX(); // implementation is generated by Gradle
   abstract Property<Integer> getY();
}

The actual value is "lazy" in the sense that it can be set via a plugin, overridden in a build script, mapped from another property, etc. For example, the value of the x property can be computed from the output of another task:

x.set(myOtherTask.outputFile.map { ... })

The advantage of this is that Gradle can track the inputs and outputs of tasks for you and that you don’t have to declare any explicit dependency between tasks, making the build less brittle.

However, once you’re about to execute a task, you don’t really care about the Property<...> wrappers anymore: what matters are the actual values, which you can get():

int x = getX().get();
int y = getY().get();

This often leads to some boilerplate code in task implementations. In our case, the x and y integers are actually the only things we care about to call our WASM function: we don’t need to pass the richer Provider type to the functions.
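
To make the boilerplate concrete, a complete handwritten version of the task above might look like this (the sum stands in for the actual WASM call):

abstract class MyTask extends DefaultTask {
    @Input
    abstract Property<Integer> getX()

    @Input
    abstract Property<Integer> getY()

    @TaskAction
    void run() {
        // unwrap the lazy properties to get the actual values
        int x = getX().get()
        int y = getY().get()
        logger.lifecycle("sum = ${x + y}")
    }
}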

Let’s imagine that we have this function written in Rust:

#[no_mangle]
pub extern fn sum(x: i32, y: i32) -> i32 {
    x + y
}

We can see that the inputs of this function are integers and that it returns another integer: it’s that simple.

So, what if we could actually simplify the implementation of the Gradle task itself? With all this in mind, I decided to prototype an annotation processor which would generate the boilerplate code for me.

To transform the Rust function above into a full Gradle task, all I have to write is a declaration of what I call its I/O protocol:

@WasmProtocol(
        taskName = "Sum",
        classpathBinary = "demo_lib",
        functionName = "sum"
)
public interface SumIO {
    @Input
    Property<Integer> getX();

    @Input
    Property<Integer> getY();
}

This will generate a task of type Sum, which will use a wasm binary named demo_lib found on the classpath, and use the sum function from that binary as the task implementation. That’s it!
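
Using the generated task in a build script would then look something like this (a sketch; values are wired via the regular lazy properties):

tasks.register('sum', Sum) {
    x = 16
    y = 26
}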

Note that this protocol isn’t declaring any output: that’s a limitation of the prototype right now, which is inherently caused by this whole "Gradle tasks are mostly generating files" problem. But we don’t really care for now.

A more realistic example

For my tests, I used 3 different functions: the sum function above, a fib function (computing the Fibonacci sequence), and something more complicated: computing an MD5 hash.

This, for example, is how I would define the protocol for the MD5 hashing function:

public interface HasherIO {
    @InputFile
    @PathSensitive(PathSensitivity.NAME_ONLY)
    RegularFileProperty getInputFile();

    @OutputFile
    RegularFileProperty getOutputFile();
}

Except that in this case, because the input is a file, I couldn’t use my annotation processor code generator yet (but it’s planned). Instead, I wrote code to read the file manually and allocate WebAssembly memory buffers, in order to call the function implemented in Rust:

#[no_mangle]
pub extern fn process(bytes: *const u8, len: usize) -> *const c_void {
    let data: &[u8] = unsafe { std::slice::from_raw_parts(bytes, len) };
    let mut hasher = Md5::new();
    hasher.update(data);
    let result = hasher.finalize_fixed();
    let vec = result.to_vec();
    let pointer = vec.as_ptr();
    // keep the buffer alive: the embedder reads it from WASM memory
    mem::forget(vec);

    pointer as *const c_void
}

Then all I need is to use this task in a build script. Let’s see how it performs…

The WebAssembly runtimes

I wrote 2 different implementations of the WASM integration runtime: the first one was pretty straightforward to write and makes use of wasmer-java. The second one took me significantly more time to implement and is using GraalVM.

Integrating Wasmer was easy for two reasons:

  1. it’s just a library which you add to your classpath

  2. it’s relatively well documented

GraalVM was more complicated because:

  1. you actually need to run your program on GraalVM

  2. you need to install the WASM support separately (it’s not downloadable as a regular Maven dependency, for example)

  3. you still need to add the GraalVM polyglot API on classpath

  4. it’s poorly documented at this stage (in particular, there’s no documentation whatsoever on how to share memory between the Java host and the WASM guest)

Anyway, I think (but I haven’t done it yet) that the GraalVM runtime will make it easier to support I/O, since it already offers configuration options to let the WASM guest access the host file system. Wasmer doesn’t support I/O yet.

Let’s talk about performance, now. Disclaimer: this isn’t proper benchmarking. The results you will see were obtained via functional testing of a plugin. There’s a lot of variance, but they were reproducible.

Measuring the wasmer runtime

In short, the wasmer runtime is very promising: it’s easy to set up and actually performs extremely well. The API is not very Java-friendly, but the abstraction layer I wrote (which supports both wasmer and GraalVM) makes it significantly easier to use.

Here are some results for a memoized Fibonacci, which compares a version I wrote in Java vs the one I wrote in Rust:

Memoized Fibonacci on wasmer
Java fib(90) = 2880067194370816120
Took 3ms
Precompiled Rust fib(90) = 2880067194370816120
Took 366μs

The WASM version compiled from Rust is already faster than the Java version!

Let’s see how it performs when computing the MD5 hash of a file (remember that for this use case, I’m actually passing a byte array to the WASM program, not a file):

Hashing a 4MB file on wasmer
hash from Java is 49DFDCEF6751973A236D35401B6CBFC8
Took 64ms
hash from Rust is 49DFDCEF6751973A236D35401B6CBFC8
Took 58ms

Again, the WASM version is still faster!

On both operations, the WASM binary performs better than the pure Java version. However, there’s a catch: in order to reach that level of performance, the WASM binary has to be precompiled to a native binary by wasmer, and this already takes time. If you include this in the whole picture, the numbers are different: 36ms for Fibonacci (compared to 3ms in Java, 10x slower). In practice this is not a big deal, though, since those binaries can be cached: if we have to call them multiple times, or from different builds, we can fetch them from the cache.

All in all, it means that the wasmer runtime is very fast and integrates quite well with Java.

Measuring the GraalVM runtime

The WebAssembly support in GraalVM is still experimental. However, it has the advantage of building on the Truffle API, which promises better integration between languages and, eventually, better performance.

In my case, that wasn’t quite true. Again as usual don’t trust benchmarks, but here are the numbers:

Memoized Fibonacci on GraalVM
Java fib(90) = 2880067194370816120
Took 3ms
Precompiled Rust fib(90) = 2880067194370816120
Took 21ms

This time, the WASM code is significantly slower. The explanation is probably that, contrary to the wasmer runtime, the WASM binary has to be parsed and transformed into a model that the Truffle API can understand, and as far as I could tell, this is not cacheable. However, this isn’t the only explanation, as we can see with the hash example:

Hashing a 4MB file on GraalVM
hash from Java is 49DFDCEF6751973A236D35401B6CBFC8
Took 57ms
hash from Rust is 49DFDCEF6751973A236D35401B6CBFC8
Took 407ms

Again, we can see that the performance is significantly worse with GraalVM. I must say that maybe I’m not using the API properly. In particular, I have found no better way to pass the byte[] to the WASM memory model than byte by byte!

What we’ve learnt

In this blog post, we’ve seen that we can use a wasm binary as the implementation of a Gradle task. This binary can be written in any language which supports compiling to WebAssembly. In my test project, I have written tasks in 2 different languages: Rust and AssemblyScript.

We’ve seen that we can integrate WASM binaries using 2 different "runtimes":

  • Wasmer, which is using JNI and compiles, as far as I understand, WebAssembly binaries to native code

  • GraalVM, which is a different Java Virtual Machine, which usually performs extremely well with Java, and provides a Polyglot runtime leveraging the Truffle API.

As of today, the Wasmer version performs significantly better, and WASM functions can execute even faster than Java code! The GraalVM version is still experimental and performs quite poorly compared to native Java code. It’s also more painful to test, because it’s not enough to install GraalVM: you also have to install components separately, which is neither build-friendly nor CI-friendly.

The next step for me is to try to integrate more directly with the file system: at the current stage, none of the approaches is suitable for Gradle, as we need to read and write files.

Also, one has to keep in mind that it’s pretty rare that you need to integrate arbitrary code like this in a build: in general, you want to call external tools (javac, gcc, …). Nevertheless, this experiment is quite fun, and I’m going to experiment more with this annotation processing API, which, I think, would be valuable in any case.


