24 May 2022

Tags: gradle laziness

Yesterday, I wrote this tweet:

[Image: 2022 05 24 tweet]

I got a surprisingly high number of answers, so I thought it would be a good idea to expand a bit on the topic.

Gradle introduced lazy APIs several years ago. Those APIs are mostly directed at plugin authors, but some build authors may have to deal with them too. Lazy APIs are designed to improve performance by avoiding the creation and configuration of tasks which would never be invoked during a build. While lots of users wouldn’t notice the difference between a build using lazy APIs and one which doesn’t, in some ecosystems like Android, or with large projects, it makes a dramatic difference. In other words, while Gradle’s performance is often praised, it’s easy to break it by unintentionally triggering the configuration of tasks which shouldn’t be configured.

Task configuration

The discussion was triggered when I was doing a code review yesterday. I saw the following block:

tasks.withType(Test) {
    testLogging {
        showStandardStreams = true
        exceptionFormat = 'full'
    }
}

This block configures logging for all test tasks of the project. At first glance, this seems appropriate, but there’s this gotcha: you should use .configureEach:

tasks.withType(Test).configureEach {
    testLogging {
        showStandardStreams = true
        exceptionFormat = 'full'
    }
}

If you don’t, then all tasks of type Test will always be configured, even if you don’t call them in a build. In other words, lazy configuration is about only configuring tasks which are going to be invoked.

Unfortunately, there are no warnings about eager configuration, or "unnecessary" configuration in a build. If you use Build Scans, you can have insights about configuration and realize that, but casual users wouldn’t.
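
Short of a Build Scan, a crude way to see what gets configured is to print from the configuration blocks. The sketch below is only an illustration: the Probe type and the docs and lint task names are made up for this example.

```groovy
// build.gradle (illustrative probe; Probe, docs and lint are hypothetical names)
class Probe extends DefaultTask {}

tasks.register("docs", Probe) {
    println "configuring docs"   // printed only when 'docs' is realized
}
tasks.register("lint", Probe) {
    println "configuring lint"   // printed only when 'lint' is realized
}

// Lazy: applies to each Probe task if and when it gets realized.
tasks.withType(Probe).configureEach {
    group = "demo"
}

// Eager: uncommenting this line realizes every Probe task right away.
// tasks.withType(Probe) { group = "demo" }
```

With the lazy variant, running gradle docs should print only the message for docs. With the eager line uncommented, both messages should show up, even for a command like gradle help which needs neither task.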

Similarly, this code:

test {
    testLogging {
        showStandardStreams = true
        exceptionFormat = 'full'
    }
}

Will configure the test task (not all test tasks) eagerly: even if the test task isn’t executed in a build, it would be configured. Now you see the problem: this configuration pattern has been there basically forever, so it’s hard to remove. To do lazy configuration, you have to write:

tasks.named('test') {
    testLogging {
        showStandardStreams = true
        exceptionFormat = 'full'
    }
}

Obviously, this isn’t as nice, DSL-wise. One thing you may wonder is why Gradle’s DSL doesn’t default to the lazy version. In other words, why does it call the eager version instead of the lazy one?

It’s because of backwards compatibility: because this pattern has been present since day one in Gradle, eager configuration is everywhere in older builds. If you search for configuration blocks on Stack Overflow, it’s very likely that you’ll end up copying and pasting eager configuration samples. But, as the name implies, lazy configuration behaves differently from eager configuration: in the lazy case, the configuration block is invoked only when the task is needed, either because it’s going to be executed, or because another task depends on its configuration to configure itself. In the eager case, configuration is executed immediately. Unfortunately, there are lots of builds which accidentally depend on this order of execution, so changing from eager to lazy could result in breaking changes!
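
To make the ordering difference concrete, here is a small sketch. The task names and the seen list are invented for the illustration, and it uses the eager create method only to show the contrast.

```groovy
// build.gradle (illustrative; lazyTask, eagerTask and seen are hypothetical names)
def seen = []

tasks.register("lazyTask") {
    seen << "lazyTask"    // deferred until something needs this task
}

tasks.create("eagerTask") {
    seen << "eagerTask"   // runs right here, while the script is evaluated
}

// This line also runs during script evaluation, so it only sees the
// blocks which have already been executed at this point.
println "configured so far: ${seen}"
```

In a build where nothing else realizes lazyTask, the printed list is expected to contain only eagerTask. A script which reads seen at this point and relies on both entries being there would break when moving from create to register.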

What should you use?

The consequence is that there’s a mix of lazy and eager APIs in Gradle, and telling what is going to trigger configuration from what isn’t is not obvious, even for Gradle experts. Let’s summarize a few patterns:

  • If you want to configure one particular task by name, you should write:

tasks.named("myTask") {
   // configure the task
}

or

tasks.named("myTask", SomeType) {
   // configure the task
}

  • If you want to configure all tasks of a particular type, you should write:

tasks.withType(SomeType).configureEach {
   // configure the task
}

  • If you want to create a new task, don’t use create, but register instead:

tasks.register("myTask", SomeType) {
    ...
}

In the DSL, the following code that you find in many tutorials would immediately create a task:

task hello {
   doLast {
       println "Hello!"
   }
}

So the correct way to do this is:

tasks.register("hello") {
    doLast {
         println "Hello!"
    }
}

Note that the two calls have different return types: the eager version returns a Task, while the second one returns a TaskProvider. This is the reason why upgrading plugins isn’t that trivial, since it’s a binary breaking change!
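
The difference in return types can be sketched as follows. The task names are made up, and the explicit types are only there to make the point visible.

```groovy
// build.gradle (illustrative; eagerHello and lazyHello are hypothetical names)

// Eager: create returns the Task itself, which already exists.
Task eagerHello = tasks.create("eagerHello")
eagerHello.doLast {
    println "Hello from ${name}"
}

// Lazy: register returns a TaskProvider, the task doesn't exist yet.
TaskProvider<Task> lazyHello = tasks.register("lazyHello")
lazyHello.configure {
    // This block only runs if and when the task is realized.
    doLast {
        println "Hello from ${name}"
    }
}
```

Code which calls Task methods such as doLast directly on the returned value works with the first form only: with a TaskProvider you have to go through configure (or get(), which realizes the task and defeats the purpose).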

Task collections and implicit dependencies

In a previous blog post I explained that the provider API is the right way to handle implicit inputs. For example, you can pass a TaskProvider directly as an element of a file collection: Gradle will automatically resolve dependencies, trigger the configuration of that task, include it in the task graph and use its output as an input of the task you’re invoking.
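
As a minimal sketch of that idea, assuming two ad-hoc tasks whose names and output path are invented for the example:

```groovy
// build.gradle (illustrative; task names and the output path are hypothetical)
def generate = tasks.register("generate") {
    def outputFile = layout.buildDirectory.file("generated/hello.txt")
    outputs.file(outputFile)
    doLast {
        def target = outputFile.get().asFile
        target.parentFile.mkdirs()
        target.text = "Hello!"
    }
}

tasks.register("printGenerated") {
    // The TaskProvider stands for the outputs of 'generate' and carries
    // the dependency on it, so no explicit dependsOn is declared here.
    def generated = files(generate)
    inputs.files(generated)
    doLast {
        generated.each { println it.text }
    }
}
```

Running gradle printGenerated should then schedule generate first, without the two tasks ever being wired together by name.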

Therefore, understanding lazy APIs means understanding when things are executed. In the example above, the call tasks.withType(Test) by itself does not configure anything. You can see it as a lazy predicate: it returns a live task collection, which is a declaration of intent: "this models all tasks of type `Test`".

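Therefore, the following blocks of code all end up configuring every Test task eagerly. The first form is shorthand for tasks.withType(Test).all, which also applies to tasks added later, whereas .each only visits the tasks already in the collection: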

tasks.withType(Test) {
   // configure
}

or

tasks.withType(Test).each {
    // configure
}

or

def testTasks = tasks.withType(Test)
testTasks.each {
    // configure
}

In other words, the last version explains the "magic" behind the traditional Gradle DSL. The first line is lazy, returns a task collection, and it’s the fact of calling .each which triggers configuration of all tasks! Replace .each with .configureEach and you are now lazy!

Newer APIs like named are lazy from day one, but are not necessarily user friendly.

A Gradle puzzle

In effect, named is lazy in terms of configuration, but eager in terms of lookup: it will fail if the task that you’re looking for doesn’t exist. It’s a bit strange, since in Gradle everything is now supposed to be lazy, so you can’t know when a task is going to be available or not. As an illustration, let’s explore the following script (don’t write this in your own builds, this is for demonstration purposes!):

tasks.register("hello") {
   doLast {
       println "Hello,"
   }
}

tasks.named("hello") {
   doLast {
        println "World!"
   }
}

If you run gradle hello, then the output is what you expect:

> Task :hello
Hello,
World!

Now, swap the two blocks:

tasks.named("hello") {
   doLast {
        println "World!"
   }
}

tasks.register("hello") {
   doLast {
       println "Hello,"
   }
}

and run again. Boom!

* Where:
Build file '/tmp/ouudfd/build.gradle' line: 1

* What went wrong:
A problem occurred evaluating root project 'ohnoes'.
> Task with name 'hello' not found in root project 'ohnoes'.

That is very unexpected: I think what most people would expect is that, if anything changed, the World! and Hello, outputs would be swapped. But because named eagerly searches for a task registered with a particular name, it fails if the task is not found.

As a consequence, plugin authors who want to react to other plugins, or react to tasks which may be present or not, tend to use the following API instead:

tasks.matching { it.name == 'hello' }.configureEach {
    doLast {
        println "World!"
   }
}

tasks.register("hello") {
   doLast {
       println "Hello,"
   }
}

Now let’s run our hello task:

> Task :hello
World!
Hello,

Yay! No failure anymore, and the output is in the order we expected. Problem solved, right?

Well, not so fast. You’ve used configureEach, so everything should be lazy, right? Sorry, nope: the matching API is an old, eager API! Actually, if you look at what the predicate uses, it becomes obvious:

// T is a Task!
TaskCollection<T> matching(Spec<? super T> var1)

Because it works on Task instances, it needs to create and configure the tasks so that you can run an arbitrary predicate on them!

That’s why, if you have to write things like this, you must guard calls to matching with a preceding withType, which restricts the set of tasks which will be configured. For example:

tasks.withType(Greeter).matching { it.name == 'hello' }.configureEach {
   messages.add("World!")
}

tasks.register("hello", Greeter) {
   messages.add("Hello,")
}

Of course the example is a bit stupid, but it makes sense when you’re not the one in control of when a task is configured, or when you don’t even know whether it ever will be.
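
The Greeter type isn’t shown above. To try the snippet, one possible definition (purely an assumption about what such a task could look like) is:

```groovy
// build.gradle (hypothetical definition of the Greeter type used above)
abstract class Greeter extends DefaultTask {
    // Messages accumulated by the configuration blocks, in the order
    // in which those blocks were executed.
    @Input
    abstract ListProperty<String> getMessages()

    @TaskAction
    void greet() {
        messages.get().each { println it }
    }
}
```

With this in the same build script, messages.add(...) in both blocks appends to the same list property, and the task prints the entries when it runs.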

Unfortunately, Gradle doesn’t provide an API which is fully lazy and lenient to tasks being present or not. If you simply want to configure a task, that is not a big deal since you can simply use configureEach:

tasks.configureEach {
    if (it.name == 'hello') { ... }
}

This is fine because the configuration block will be called for each task being configured. However, this configureEach block is a configurer, not a predicate, so you can’t use it as an input to another task:

tasks.named("md5") {
    inputFiles.from(tasks.named("userguide"))
}

The code above would fail if the userguide task doesn’t exist before the md5 task is configured…
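
A partial workaround is to check the registered names before calling named. This is only a sketch: the Md5 type below is a hypothetical stand-in, since the post doesn’t show how the md5 task is declared, and the check is still order-sensitive because it only sees what is registered at the time the block runs.

```groovy
// build.gradle (illustrative; the Md5 type is a hypothetical stand-in)
abstract class Md5 extends DefaultTask {
    @InputFiles
    abstract ConfigurableFileCollection getInputFiles()

    @TaskAction
    void run() {
        inputFiles.each { println "would hash ${it.name}" }
    }
}

tasks.register("md5", Md5) {
    // Looking at the names does not create or configure any task.
    if (tasks.names.contains("userguide")) {
        inputFiles.from(tasks.named("userguide"))
    }
}
```

This avoids the failure when userguide is absent, but it silently does nothing if userguide gets registered after the md5 task has been configured, so it is not the fully lazy and lenient API which is missing.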

Conclusion

In this blog post, I have explained why you should use the new lazy APIs instead of their eager counterparts. I have also described that while they are more verbose, they make it possible to have faster builds by avoiding configuration of tasks which would not be executed. However, Gradle doesn’t warn you if you eagerly configure tasks, and it’s easy to shoot yourself in the foot. Some would blame the docs, some would blame the APIs.

As a former Gradler, I would blame none of those: the docs are here, and changing the APIs to be lazy everywhere is either a binary breaking change (return type of methods which create instead of register), or a behavior change (deferred configuration vs immediate configuration). This makes it particularly complicated to upgrade builds without pissing off a number of users!