21 February 2021

Tags: gradle webassembly

A couple of weeks ago, I listened to an interview (in French) with Geoffroy Couprie and Ivan Enderlin, who gave a very nice overview of what WebAssembly is and what it’s good for. To be honest, it wasn’t clear to me what the advantages of WebAssembly were compared to a good old JVM: after all, both are virtual machines, and both promise the same thing, namely "write once, run anywhere". Also, in my mind, WebAssembly was some kind of restricted version of JavaScript, like asm.js. I was wrong: it’s more than that.

Why WebAssembly?

One of the aspects discussed during this interview was sandboxing: with WebAssembly, it’s the responsibility of the embedder to give the wasm program access to, for example, I/O. In practice, it means that wasm binaries are very restricted by default. In fact, by design, they are restricted to pure computations, which makes them very suitable for isolating work. Extensions provided by embedders are responsible for giving access to host resources (for example the GPU, or the file system).

Another promise of WebAssembly is running at close-to-native speeds.

Therefore, a crazy idea came to me: what if I could use a wasm binary as a task implementation in Gradle? This, for example, would let us use any language which compiles to WASM as an implementation language for Gradle tasks. mmmmmmm

The result of this experiment is available at this repository. The results are quite promising.

Isolating task inputs from the task execution

In a nutshell, a Gradle task can be seen as a function which takes a number of inputs and returns a value. Cacheability is derived from the "pureness" of the function: for the same inputs, the output should be the same.
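
To make the "pure function" idea concrete, here is a minimal Java sketch of a task modeled as a function of its inputs, with the result reused when the same inputs come back. The PureTaskCache class is hypothetical and only illustrates the principle: Gradle’s real up-to-date checks and build cache rely on fingerprints of inputs and outputs, not on an in-memory map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration: this is not Gradle's actual caching mechanism.
final class PureTaskCache<I, R> {
    private final Map<I, R> results = new HashMap<>();
    private final Function<I, R> action;
    private int executions = 0;

    PureTaskCache(Function<I, R> action) {
        this.action = action;
    }

    // Runs the action only if these inputs have not been seen before.
    R run(I inputs) {
        R cached = results.get(inputs);
        if (cached == null) {
            executions++;
            cached = action.apply(inputs);
            results.put(inputs, cached);
        }
        return cached;
    }

    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        PureTaskCache<List<Integer>, Integer> sum =
                new PureTaskCache<>(in -> in.get(0) + in.get(1));
        System.out.println(sum.run(List.of(1, 2))); // executes the action
        System.out.println(sum.run(List.of(1, 2))); // same inputs: served from the cache
        System.out.println("executions = " + sum.executions());
    }
}
```

The reuse is only safe because the action depends on nothing but its declared inputs, which is exactly the property a sandboxed wasm function has by construction.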

This makes it very suitable to the WebAssembly model where functions are exported to an embedder.

In reality, it’s more complicated: Gradle tasks most likely use files as inputs, and also produce files. This means that the WebAssembly runtime would need to provide I/O access for this to be really useful.

After some support from Ivan Enderlin, I quickly figured out that it would be difficult to make file access work, so I simplified the problem. In the prototype, my WASM tasks are not able to produce a file output and are limited to simply displaying the execution result on the console.

For tasks which actually have files as inputs, I’m reading the file contents from Java code into byte arrays which are "propagated" to the WASM runtime memory. This means, effectively, that the WASM functions I played with don’t have any kind of access to the file system and remain pure functions.
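
As a rough sketch of what "propagating" a byte array means, the Java snippet below models the guest memory as a flat buffer: the host copies the bytes in and hands the guest an offset (the "pointer") plus a length. LinearMemorySketch is a hypothetical stand-in written for this post, not the API of wasmer-java or GraalVM, and its allocator is deliberately naive.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a WASM linear memory: a flat, byte-addressed buffer.
final class LinearMemorySketch {
    private final ByteBuffer memory;
    private int next = 8; // address 0 is left unused so it can act as a null pointer

    LinearMemorySketch(int sizeInBytes) {
        this.memory = ByteBuffer.allocate(sizeInBytes);
    }

    // Copies host data into guest memory and returns the offset ("pointer") of the copy.
    int write(byte[] data) {
        if (data.length > memory.capacity() - next) {
            throw new IllegalStateException("Not enough guest memory");
        }
        int pointer = next;
        ByteBuffer view = memory.duplicate();
        view.position(pointer);
        view.put(data); // bulk copy of the whole array
        next += data.length;
        return pointer;
    }

    // Reads `length` bytes back from guest memory, e.g. a result written by the guest.
    byte[] read(int pointer, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = memory.duplicate();
        view.position(pointer);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        LinearMemorySketch memory = new LinearMemorySketch(1024);
        byte[] fileContents = "hello from the host".getBytes(StandardCharsets.UTF_8);
        int pointer = memory.write(fileContents);
        // A guest function would now be called with (pointer, fileContents.length)
        System.out.println("pointer = " + pointer + ", len = " + fileContents.length);
        System.out.println(new String(memory.read(pointer, fileContents.length), StandardCharsets.UTF_8));
    }
}
```

The guest never sees a path or a file handle, only bytes inside its own memory, which is why the function stays pure.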

With those limitations in mind, here’s what I came up with.

Declaring the task I/O protocol

Gradle is evolving fast and nowadays the idiomatic way to declare inputs and outputs of a task is to use the lazy configuration API. However, this isn’t enforced and nobody prevents you from writing tasks which do not use this API. As an implementor of a new integration mechanism, I can set the rules and actually restrict the scope to tasks which use this API, which has a number of advantages.

Say that we want to write a task which computes the sum of two integers. With the lazy configuration API, you need two "properties" corresponding to the input numbers:

abstract class MyTask extends DefaultTask {
   @Input
   abstract Property<Integer> getX(); // implementation is generated by Gradle
   @Input
   abstract Property<Integer> getY();
}

The actual value is "lazy" in the sense that it can be set via a plugin, overridden in a build script, mapped from another property, etc. For example, the value of the x property can be computed from the output of another task:

x.set(myOtherTask.outputFile.map { ... })

The advantage of this is that Gradle can track the inputs and outputs of tasks for you and that you don’t have to declare any explicit dependency between tasks, making the build less brittle.

However, once you’re about to execute a task, you don’t really care about the Property<…​> wrappers anymore: what matters are the actual values, that you can get():

int x = getX().get();
int y = getY().get();

This often leads to some boilerplate code in task implementations. In our case, the x and y integers are actually the only thing we care about to call our WASM function: we don’t need to pass the richer Provider type to the functions.

Let’s imagine that we have this function written in Rust:

#[no_mangle]
pub extern fn sum(x: i32, y: i32) -> i32 {
    x + y
}

We can see that the inputs of this function are integers and that it returns another integer: it’s that simple.

So, what if we could actually simplify the implementation of the Gradle task itself? With all this in mind, I decided to prototype an annotation processor which would generate the boilerplate code for me.

To transform the Rust function above into a full Gradle task, all I have to write is a declaration of what I call its I/O protocol:

@WasmProtocol(
        taskName = "Sum",
        classpathBinary = "demo_lib",
        functionName = "sum"
)
public interface SumIO {
    @Input
    Property<Integer> getX();

    @Input
    Property<Integer> getY();
}

This will generate a task of type Sum, which will use a wasm binary found on classpath named demo_lib, and use the function sum from that binary as the task implementation. That’s it!

Note that this protocol isn’t declaring any output: that’s a limitation of the prototype right now, which is inherently caused by this whole "Gradle tasks are mostly generating files" problem. But we don’t really care for now.

A more realistic example

For my tests, I used 3 different functions: this sum function, a fib function (which computes Fibonacci numbers) and, to try something more complicated, a function computing an MD5 hash.

This, for example, is how I would define the protocol for the MD5 hashing function:

public interface HasherIO {
    @InputFile
    @PathSensitive(PathSensitivity.NAME_ONLY)
    RegularFileProperty getInputFile();

    @OutputFile
    RegularFileProperty getOutputFile();
}

Except that in this case, because the input is a file, I couldn’t use my annotation processor yet (but it’s planned). Instead, I wrote code by hand to read the file and allocate WebAssembly memory buffers, in order to call the function, which is implemented in Rust:

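use std::{ffi::c_void, mem};
use md5::{digest::FixedOutput, Digest, Md5}; // assuming the md-5 crate

#[no_mangle]
pub extern "C" fn process(bytes: *const u8, len: usize) -> *const c_void {
    // The host guarantees that `bytes` points to `len` readable bytes in WASM memory
    let data: &[u8] = unsafe { std::slice::from_raw_parts(bytes, len) };
    let mut hasher = Md5::new();
    hasher.update(data);
    let result = hasher.finalize_fixed();
    // Copy the digest to the heap and leak the Vec itself (not the pointer),
    // so that the pointer is still valid when the host reads the 16 hash bytes
    let digest = result.to_vec();
    let pointer = digest.as_ptr();
    mem::forget(digest);

    pointer as *const c_void
}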

Then all I need is to use this task in a build script. Let’s see how it performs…​

The WebAssembly runtimes

I wrote 2 different implementations of the WASM integration runtime: the first one was pretty straightforward to write and makes use of wasmer-java. The second one took me significantly more time to implement and is using GraalVM.

Integrating Wasmer was easy for two reasons:

  1. it’s just a library which you have to add to your classpath

  2. it’s relatively well documented

GraalVM was more complicated because:

  1. you actually need to run your program on GraalVM

  2. you need to install the WASM support separately (it’s not downloadable as a regular Maven dependency, for example)

  3. you still need to add the GraalVM polyglot API on classpath

  4. it’s poorly documented at this stage (in particular, there’s no documentation whatsoever on how to share memory between the Java host and the WASM guest)

Anyway, I think (but I haven’t done it yet) that the GraalVM runtime will make it easier to support I/O, since it already offers configuration options to let the WASM guest access the host file system. Wasmer doesn’t support I/O yet.

Let’s talk about performance, now. Disclaimer: this isn’t proper benchmarking. The results you will see were obtained via functional testing of a plugin. There’s a lot of variance, but they were reproducible.

Measuring the wasmer runtime

In short, the wasmer runtime is very promising: it’s easy to set up and actually performs extremely well. The API is not very Java friendly, but the abstraction layer I wrote (which supports both wasmer and GraalVM) makes it significantly easier to use.

Here are some results for a memoized Fibonacci, which compares a version I wrote in Java vs the one I wrote in Rust:

Memoized Fibonacci on wasmer
Java fib(90) = 2880067194370816120
Took 3ms
Precompiled Rust fib(90) = 2880067194370816120
Took 366μs

The WASM version compiled from Rust is already faster than the Java version!
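
For reference, a memoized Fibonacci in Java can look like the sketch below. This is not the exact code used for the measurement above, only a plausible equivalent: the MemoizedFib class and its timing code are assumptions made for illustration, and a single timed call like this is subject to JIT warm-up noise.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not the code that produced the numbers in this post.
final class MemoizedFib {
    private final Map<Integer, Long> memo = new HashMap<>();

    // Each fib(n) is computed once, then served from the memo table.
    long fib(int n) {
        if (n < 2) {
            return n;
        }
        Long cached = memo.get(n);
        if (cached != null) {
            return cached;
        }
        long value = fib(n - 1) + fib(n - 2);
        memo.put(n, value);
        return value;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = new MemoizedFib().fib(90);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("Java fib(90) = " + result);
        System.out.println("Took " + elapsedMicros + "μs");
    }
}
```

Note that fib(90) still fits in a signed 64-bit integer, which is why it can be compared directly with a Rust function working on 64-bit integers.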

Let’s see how it performs when computing the MD5 hash of a file (remember that for this use case, I’m actually passing a byte array to the WASM program, not a file):

Hashing a 4MB file on wasmer
hash from Java is 49DFDCEF6751973A236D35401B6CBFC8
Took 64ms
hash from Rust is 49DFDCEF6751973A236D35401B6CBFC8
Took 58ms

Again, the WASM version is still faster!
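
The Java side of such a comparison only needs the JDK. The sketch below hashes a byte array with java.security.MessageDigest and prints the digest as uppercase hex, like the output above. The Md5Hex class is an assumption made for illustration (it is not the code that produced those timings), and the 4MB zero-filled buffer is just a placeholder input.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: hashing an in-memory byte array with the JDK's MD5.
final class Md5Hex {
    static String md5(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[4 * 1024 * 1024]; // placeholder for the file contents
        long start = System.nanoTime();
        String hash = md5(data);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("hash from Java is " + hash);
        System.out.println("Took " + elapsedMillis + "ms");
    }
}
```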

On both operations, the WASM binary performs better than the pure Java version. However, there’s a catch: to reach that level of performance, the WASM binary first has to be precompiled to native code by wasmer, and that takes time. If you include this step in the picture, the numbers are different: 36ms for Fibonacci, compared to 3ms in Java, so about 12x slower. In practice, this is not a big deal, since the precompiled binaries can be cached: if we have to call them multiple times, or from different builds, we can fetch them from the cache instead of compiling again.
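
One way to picture such a cache is a directory keyed by the hash of the wasm bytes, so that the expensive compilation only runs on a cache miss, even across builds. The sketch below is hypothetical: CompiledModuleCache is a name made up for this post, and the compiler function is a stand-in for whatever precompilation step the runtime offers, not a real wasmer call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.UnaryOperator;

// Hypothetical on-disk cache; "compiler" stands in for the runtime's precompilation step.
final class CompiledModuleCache {
    private final Path cacheDir;
    private final UnaryOperator<byte[]> compiler;

    CompiledModuleCache(Path cacheDir, UnaryOperator<byte[]> compiler) {
        this.cacheDir = cacheDir;
        this.compiler = compiler;
    }

    // Returns the compiled form of the module, compiling only on a cache miss.
    byte[] load(byte[] wasm) {
        Path entry = cacheDir.resolve(sha256Hex(wasm) + ".bin");
        try {
            if (Files.exists(entry)) {
                return Files.readAllBytes(entry);
            }
            byte[] compiled = compiler.apply(wasm);
            Files.createDirectories(cacheDir);
            Files.write(entry, compiled);
            return compiled;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The cache key is derived from the module contents, not from its file name.
    private static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wasm-cache");
        int[] compilations = {0};
        CompiledModuleCache cache = new CompiledModuleCache(dir, wasm -> {
            compilations[0]++;
            return wasm.clone(); // a real runtime would return native code here
        });
        byte[] module = {0x00, 0x61, 0x73, 0x6d}; // the "\0asm" magic number, as a dummy module
        cache.load(module);
        cache.load(module); // second call is a cache hit
        System.out.println("compilations = " + compilations[0]);
    }
}
```

Keying on the content hash means a rebuilt wasm binary automatically gets a fresh entry, while an unchanged one is never compiled twice.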

All in all, it means that the wasmer runtime is very fast and integrates quite well with Java.

Measuring the GraalVM runtime

The WebAssembly support for GraalVM is still experimental. However, it builds on the Truffle API, which promises better integration between languages and, eventually, better performance.

In my case, that wasn’t quite true. As usual, don’t trust benchmarks, but here are the numbers:

Memoized Fibonacci on GraalVM
Java fib(90) = 2880067194370816120
Took 3ms
Precompiled Rust fib(90) = 2880067194370816120
Took 21ms

This time, the WASM code is significantly slower. The explanation is probably that contrary to the wasmer runtime, the WASM binary has to be parsed and transformed into a model that the Truffle API can understand, and as far as I could tell, this is not cacheable. However, this isn’t the only explanation, as we can see with the hash example:

Hashing a 4MB file on GraalVM
hash from Java is 49DFDCEF6751973A236D35401B6CBFC8
Took 57ms
hash from Rust is 49DFDCEF6751973A236D35401B6CBFC8
Took 407ms

Again, we can see that the performance is significantly worse with GraalVM. I must say again that maybe I’m not using the API properly. In particular, I have found no better way to copy the byte[] into the WASM memory than byte by byte!

What we’ve learnt

In this blog post, we’ve seen that we can use a wasm binary in Gradle as the implementation of a task. This binary can be written in any language which supports compiling to WebAssembly. In my test project, I have written tasks in 2 different languages: Rust and AssemblyScript.

We’ve seen that we can integrate WASM binaries using 2 different "runtimes":

  • Wasmer, which is using JNI and compiles, as far as I understand, WebAssembly binaries to native code

  • GraalVM, which is a different Java Virtual Machine, which usually performs extremely well with Java, and provides a Polyglot runtime leveraging the Truffle API.

As of today, the Wasmer version performs significantly better and WASM functions can be executed even faster than Java code! The GraalVM version is still experimental and performs quite poorly compared to using native Java code. It’s also more painful to test because it’s not enough to install GraalVM: you also have to install components separately, which is not build friendly, nor CI friendly.

The next step for me is to try to integrate more directly with the file system: at the current stage, none of the approaches is suitable for Gradle as we need to read and write files.

Also, one has to keep in mind that it’s pretty rare to integrate arbitrary code into a build like this: in general, you want to call external tools (javac, gcc, …). Nevertheless, this experiment was quite fun, and I’m going to experiment more with this annotation processing API, which, I think, would be valuable in any case.