Gradle and Kotlin, a personal perspective

22 May 2016

Tags: jvm groovy gradle kotlin

Gradle embraces Kotlin, what about Groovy?

First of all, it’s been a long time since I last blogged, and I’d like to remind you that everything written here is my own opinion and not the view of my employer, which happens to be Gradle Inc. as I write these lines.

A few days ago, Gradle and JetBrains announced a partnership to make Kotlin a first class language for Gradle builds, both for build scripts and plugins. Most likely, you know Gradle has been using Groovy since its inception. Lots of people think that Gradle is written in Groovy, which is actually wrong. Most of Gradle is written in Java. The build scripts are written in Groovy, lots of plugins are written in Groovy, our test cases are written in Groovy (using the best testing framework out there, Spock), but Gradle itself is written in Java.

From my perspective, this situation has been very disturbing and continues to be so. I have very good friends in the Groovy community, and this move has been seen by some of them as a betrayal. As an Apache Groovy committer, and someone who spent almost 4 years full time implementing new features of the language, most importantly its static compiler, seeing Kotlin promoted as the language of choice for Gradle’s future is a little strange. One could legitimately say, WTF? I’ve been aware of this work for several months now, and my colleagues Rodrigo B. de Oliveira and Chris Beams have done an amazing job in a very short period of time. From a long-time Groovy user and Groovy developer point of view, it’s hard not to make this move an emotional thing. However, business is not about emotions. In particular, what are we trying to achieve with Gradle? We’re trying to help developers build their applications. We’re trying to make this elegant, reproducible, scalable and fast. We’re language agnostic. We can build Java, Groovy, Scala, Kotlin, C++, Python, … Gradle has never been the tool to build Groovy applications: it’s been a tool to build software. It’s a tool about automation. And I’ve been complaining enough about communities that build their own tool for their very specific language to understand that this is super important: Gradle is (or aims to be) the best tool for building any kind of software. In short, we must think in terms of what is best for our users, and sometimes, this means changing technologies. A product should not be bound to a technology, and a company even less so. And given the response that we had after the announcement, supporting Kotlin seems to drive a lot of excitement around Gradle, and that’s a very good thing. So, let’s set that aside, and think about what it means for Groovy.

Groovy support is not abandoned

First of all, I already said it several times, but better continue to spread the message: support for Groovy in Gradle is not deprecated nor removed. You can still write your scripts in Groovy, you can write your plugins in Groovy, and you will still be able to do it. But Gradle will likely encourage users to migrate to Kotlin. To be clear, Kotlin support is incubating, and there’s a lot to do to make it as usable as the Groovy version. Second, there are tens of thousands of builds written using Groovy, hundreds of plugins written in Groovy, so it’s not tomorrow that Kotlin is going to replace Groovy. However, we care about the future, so we need to think about what it means in the long term. Should we be excited about supporting Kotlin? Yes we should, because Kotlin is an amazing language. Should we continue to be excited about Groovy? Of course we should, because it’s also an amazing language. But it’s old and as such brings a lot of legacy with it. As someone who implemented the static compiler for Groovy, I know it very well. There are things that are hard to change, because a large part of the Groovy community is very fond of its dynamic behavior.

So let’s focus on the two major aspects that led to embracing Kotlin in Gradle. The first one, and the principal one, is IDE support. Let’s face it: even before I joined Gradle, when I was giving talks about it, people were complaining about IDE support. Compared to a tool like Maven, supporting Gradle build scripts is complicated. Supporting XML is easy (to some extent). Supporting a dynamic DSL is not. Some say it’s Groovy’s fault, and I want to correct this statement right now: it’s not Groovy’s fault. While Groovy lets you design dynamic DSLs, the design of the DSL can be changed to make it easier for tools to "discover" things. But when Gradle was designed, there wasn’t any statically compiled Groovy. The idiomatic way to write DSLs in Groovy, at that time, was to heavily rely on runtime metaprogramming. While loving metaprogramming, I’ve always preferred compile time metaprogramming over runtime metaprogramming. For multiple reasons:

  • because in most cases, what you want to do at runtime can be done in a unique, setup phase. For example, create your metaclasses, enrich existing types, configure property missing, method missing, … If it’s setup, it’s better done at compile time, because you can report errors, and because it gives higher performance. This led the way I designed the static compiler, and more features of Groovy after that (traits, type checking extensions, …) : describe what you want to do at compile time.

  • because it makes the life of tools easier. While IntelliJ or Eclipse support DSL descriptors that help them provide completion, those are hard to implement, and often inaccurate. They can only approximate what is going to happen at runtime. And in the end, you’re doing the same job twice: you’re writing a runtime for your DSL, which is dynamic, then you need to write a DSL descriptor for the IDE to understand it. Wouldn’t it be better if all was done in a unique place? Something that both the compiler and the IDE can understand?

So while we know we can describe dynamic Groovy DSLs so that they are understood by IDEs, it’s effectively a lot of work. And if you want to support multiple IDEs, it’s even more work. But in the case of Gradle, it’s even worse: each plugin can provide its own "micro DSL". While there’s an "idiomatic" way to configure Gradle builds, there’s no single rule. One can implement one’s own Groovy DSL within the Gradle build, and there is no chance the IDE would ever understand it. Another pain point is that Gradle piles complexity on complexity in terms of DSL capabilities. For example, when you have a build script that has:

dependencies {
   compile libraries.groovy
}

greeter {
   message = 'hello'
}

sign {
   signature = top
}

often people do not realize that:

  • dependencies is found on a Project instance

  • libraries is a user declared variable, that can be found in a plugin, another build script, a project properties file, … (how does the IDE find out about it?)

  • greeter is a convention object, defined by a plugin, to configure the default values of its task

  • sign is a task, which has a signature property, and top references an extension property from the project

So while this build script is simple to read, it’s hard to understand how it effectively works, because objects can be found at different places, can be provided by different providers (plugins, properties, extensions), but everything is accessed using a single notation. This is bad, because it makes it almost impossible for an IDE to understand what is going on.

The question is, is it Groovy’s fault? My answer is not totally. The fault is mostly on the DSL design, and Groovy made it too easy to do so. But again, that was designed at a time when dynamic Groovy was the rule. I gave a talk, recently, about building modern DSLs with Groovy, where I discourage such practices, and encourage the use of static DSLs instead.

That leads me to the second main reason for embracing Kotlin in Gradle: performance. When we talk about performance, lots of folks tend to think that Groovy is slow. This is not the case. Groovy is pretty fast. However, depending on the design of the DSL, you can easily fall into traps that can lead to catastrophic performance. Before I go further with it, I’m reading way too often that Gradle is slow because it’s written in Groovy and that Groovy is dynamic so it’s slow. F* no, those who tell you that just didn’t profile a build. As I said, Gradle is mostly written in Java. And I’ve spent the last 3 months optimizing the performance of Gradle, and I can tell you that of the dramatic performance improvements that one can see in Gradle 2.13 and 2.14, almost none was obtained by rewriting Groovy to Java, or rewriting Groovy code. None! Most of the hotspots were pure Java code. Period. However, as soon as you use plugins, which are today mostly written in dynamic Groovy, or your build scripts involve a lot of nested closures, things start to become complicated for "Groovy". Let me explain that clearly. I think at some point, someone made a terrible design decision in Groovy. I don’t know who it was, but the idea was to rely on exceptions to control the flow of resolution of properties. This means that when a property is missing, typically in a closure, an exception is thrown. When a method is not found, an exception is thrown. That seemed to be a good idea, because in the end, you want to provide the user with an error, but in practice, this is catastrophic, because Groovy can capture those exceptions. Typically, in a delegation chain (nested closures), a containing closure or class can actually have this property defined, or implement property missing/method missing. Now, think again for a second about the "simple" example of Gradle build above: how do you know where to look up message, top, signature, …? Now you know: f* exceptions are thrown, stack traces are filled, and eventually captured because some composite dynamic object finally wants to answer the message… In practice, for some builds I have profiled, it was tens of thousands of exceptions being thrown and stack traces filled for nothing. And that has a terrible impact on performance. So even if we have implemented strategies in Gradle to try to avoid throwing those exceptions (which are responsible for part of the performance improvements in 2.14), this is very hard to do, and we’re still throwing way too many of them. A static language doesn’t have this problem, because every single reference in source is resolved at compile time. So, if you’re writing a plugin in Groovy, for the sake of performance, please add @CompileStatic.
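To make the cost of exception-driven resolution concrete, here is a minimal Java sketch. It is not Groovy’s or Gradle’s actual implementation: the names PropertyLookupSketch, Scope and MissingProperty are hypothetical, and the three scopes only loosely mirror the task, convention and extension levels of the build script shown earlier. It resolves the same name through a delegation chain twice, once by throwing on every miss and once without exceptions, and counts how many exceptions (each with a filled stack trace) the first approach creates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: two ways to resolve a name through a chain of scopes. */
final class PropertyLookupSketch {

    /** Stand-in for a "missing property" exception (the name is ours, not Groovy's). */
    static final class MissingProperty extends RuntimeException {
        MissingProperty(String name) {
            super("No such property: " + name); // constructing it fills in a stack trace
        }
    }

    /** One level of the delegation chain, e.g. a task, a convention or an extension. */
    static final class Scope {
        private final Map<String, Object> properties = new HashMap<>();

        Scope with(String name, Object value) {
            properties.put(name, value);
            return this;
        }

        /** Exception-driven lookup: a miss is signalled by throwing. */
        Object getOrThrow(String name) {
            if (!properties.containsKey(name)) {
                throw new MissingProperty(name);
            }
            return properties.get(name);
        }

        /** Exception-free lookup: a miss is an empty Optional. */
        Optional<Object> find(String name) {
            return Optional.ofNullable(properties.get(name));
        }
    }

    /** Number of exceptions swallowed by the last call to resolveWithExceptions. */
    static int thrown;

    static Object resolveWithExceptions(List<Scope> chain, String name) {
        thrown = 0;
        for (Scope scope : chain) {
            try {
                return scope.getOrThrow(name);
            } catch (MissingProperty e) {
                thrown++; // swallowed so that the next scope gets a chance to answer
            }
        }
        throw new MissingProperty(name);
    }

    static Object resolveWithoutExceptions(List<Scope> chain, String name) {
        for (Scope scope : chain) {
            Optional<Object> value = scope.find(name);
            if (value.isPresent()) {
                return value.get();
            }
        }
        throw new MissingProperty(name); // only a genuine error pays for an exception
    }

    static List<Scope> sampleChain() {
        List<Scope> chain = new ArrayList<>();
        chain.add(new Scope().with("signature", "none")); // innermost: the task
        chain.add(new Scope().with("message", "hello"));  // a plugin convention
        chain.add(new Scope().with("top", "secret"));     // outermost: a project extension
        return chain;
    }

    public static void main(String[] args) {
        List<Scope> chain = sampleChain();
        // prints "secret after 2 exceptions"
        System.out.println(resolveWithExceptions(chain, "top") + " after " + thrown + " exceptions");
        // prints "secret"
        System.out.println(resolveWithoutExceptions(chain, "top"));
    }
}
```

Both calls return the same value, but the first one built and discarded two stack traces to get there. Multiply that by every unqualified name in every nested closure of a large build and you get the tens of thousands of wasted exceptions mentioned above.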

So there goes Kotlin. Kotlin has excellent support for static builders, which makes it practical for IDE support: that will dramatically improve the user experience in terms of understanding what to write, what is an error, having documentation, refactorings, … It is also a very pleasant language to work with. Honestly, I don’t have anything bad to say about the language (apart from the fun keyword that I don’t like). To some degree, it’s not very surprising: Kotlin was heavily inspired by Groovy and another popular JVM language: Scala. And again, being the one behind the static compiler of Groovy, I can’t blame them for doing what I like about static languages. Their builder support is awesome, and very elegant. And it’s supported out of the box by IntelliJ of course, but also Eclipse.

A static DSL for Groovy?

Ok, so one might think at this point that I’m mad. I wrote a "competing" language, and I’m happy to see Kotlin being promoted in Gradle. I wrote the static compiler, that is capable of doing everything Kotlin can do (minus reified generics, plus superior scripting support, type checking extensions, …), so wtf? Ok, so let’s be very clear: I have absolutely no doubt that Groovy can do everything that we’ve done with the Kotlin support in Gradle. It can be statically compiled, provide an elegant DSL that is statically compiled, and it can be understood by the IDE. I had no doubt before the Kotlin work started, I have even less doubts now. And I can say I have no doubts because I tried it: I implemented experimental support for statically compiled Gradle scripts, written in Groovy. Here’s an example:

apply plugin: 'java'
apply plugin: 'eclipse'
apply plugin: 'idea'
apply plugin: 'groovy'
apply plugin: GreetingPlugin

repositories {
}

dependencies {
    compile 'commons-lang:commons-lang:2.5'
    compile "commons-httpclient:commons-httpclient:3.0"
    compile "commons-codec:commons-codec:1.2"
    compile "org.slf4j:jcl-over-slf4j:1.7.10"
    compile "org.codehaus.groovy:groovy:2.4.4"
    testCompile 'junit:junit:4.12'
    runtime 'com.googlecode:reflectasm:1.01'
}

tasks.configure('test', Test) {
    jvmArgs '-XX:MaxPermSize=512m', '-XX:+HeapDumpOnOutOfMemoryError'
}

dependencies {
    compile 'org.codehaus:groovy:groovy-all:2.4.4'
}

extension(GreetingPluginExtension) {
    message = 'Hi'
    greeter = findProperty('greeter')?:'static Gradle!'
}

tasks.create('dependencyReport', DependencyReportTask) {
    outputs.upToDateWhen { false }
    outputFile = new File( project.buildDir, "dependencies.txt")
}

class GreetingPlugin implements Plugin<Project> {
    void apply(Project project) {
        project.extensions.create("greeting", GreetingPluginExtension)
        project.task('hello') << {
            println "${project.extension(GreetingPluginExtension).message} from ${project.extension(GreetingPluginExtension).greeter}"
        }
    }
}

class GreetingPluginExtension {
    String message
    String greeter
}

This is an example Gradle build that is compiled statically. It has none of the problems I described about the Groovy implementation in Gradle above. It uses all the techniques that static Groovy provides: extension methods, powerful scripting with implicit imports, type checking extensions, … All this works. And interestingly, the work that is done to enable support for Kotlin also benefits statically compiled Groovy, and Java! Let’s not forget about the latter, which is years behind in terms of "modern" languages support. So if this works, why do we need Kotlin? To be honest, I asked myself this many times. It was very difficult for me, because I knew Groovy could do it. Again, I had no doubt about the language capabilities, no doubt about the performance impact of doing this. However, I missed two critical points:

  1. IDE support. Even if support of Groovy in IntelliJ is by far the most advanced of all IDEs, it still lags behind when static compilation is on. But more importantly, it doesn’t know that my script is statically compiled, nor does it know about my custom extension methods. I tried to implement a GDSL descriptor to make it aware of them, and it somehow worked: I do have code completion, but errors are not marked as errors, and the IDE still doesn’t understand that it should only suggest to me what is relevant in the context. With Kotlin scripts, which are natively static, there’s no such issue. The IDE understands everything natively, in IntelliJ and Eclipse. So, I have no doubt that JetBrains can implement support for this, just like I had no doubt I could implement a static Groovy DSL, but who is going to write this? Me? Gradle? I don’t have the time to do it. And it’s not Gradle’s job to write IDE plugins. And what about Eclipse? One big issue that the Groovy community has, today, is that nobody is supporting Eclipse since Pivotal dropped sponsorship of Groovy. After more than one year, nobody took over the development of Groovy Eclipse. Nobody. Groovy itself saw lots of new contributors and a lot of bugfixes, and the download numbers were never as high as they are today, but IDE support is critical. And nobody took over the development of it. I saw some people referring to what JetBrains is doing as "blackmailing". Seriously? JetBrains? Think of what they’ve done for Groovy. Groovy would never have been as popular as it is without them. They provided us with the best Groovy IDE possible. They are constantly supporting new features of the language, adding support for AST transformations, traits, … They even added the ability, in IDEA 14, to use Groovy (and not Kotlin, guys!) as the language to evaluate expressions in the debugger. And they would try to kill Groovy? Kill part of their business? Come on guys! So yes, they invested a lot in Kotlin and want to promote it, but how could it be otherwise? And it’s not as if the language sucked: it’s awesome!

  2. Does it make sense? Now that we made the decision to support Kotlin, that we proved it would provide the level of user friendliness we want and that it is statically compiled by default, does it make sense to put resources into supporting static Groovy in addition? I don’t have an answer to this. I thought yes, but now I’m not sure. Kotlin does the job. And honestly, they have great engineers working on the language. Even if it lags behind in terms of scripting and compilation times compared to Groovy, I have no doubt they will fix it. How arrogant would we be if we thought other languages could not do what we’ve done with Groovy?

The future of Groovy

The last point I want to address is what it means for the future of Groovy, and what it means for my future in Groovy. First of all, I always thought that the future of Groovy was in the hands of its community. It’s not Gradle that has Groovy’s future in its hands. It’s you. The move to the Apache Software Foundation was also done for this very same reason: community first. If you want to continue to use Groovy, to improve it, to support it, all you have to do is f* do it! And I will continue! I love this language, I know too well how far it can go in terms of DSL support, AST transformations, now in 2.5 we have macros, that’s just a crazily powerful language that’s super fun to use. Should we fear competition? No, we shouldn’t. Competition is good. It should be inspiring. And if Gradle moving to Kotlin means the death of Groovy, maybe the problem is elsewhere. And even if lots of people get introduced to Groovy through Gradle, it’s not the only entry point. Grails is another. Jenkins (through Flow) is another. And many, many more. There was a tweet a few days ago which showed the 100 most popular dependencies in GitHub projects. Groovy was one of them. No Kotlin. No Scala. Groovy. It’s everywhere, and it’s going to be there for a long time.

Part of the community’s fear, after Pivotal dropped its sponsorship, is that Groovy is a dying language. It’s not. It has never been so widely used. The move to the Apache Software Foundation drew a lot of attention and brought us many more contributors. But the community has to realize what the problems with Groovy are, and it has to face them: the introduction of the static compiler came too late. IDE support is important. Java 9 support is going to be super important. If you love your language, contribute. Help it. Help yourselves. The future of Groovy must be in your hands. I can’t recall how many times I’ve said this since I joined VMware, a few years ago, to develop Groovy. In every talk I give, I’m always telling how important it is that you contribute. JetBrains is not going to write Groovy Eclipse for you.

And I would like to finish with one word: if people move from Groovy to Kotlin, is it really a problem? Isn’t any technology inspired by another? Aren’t we, developers, always rebuilding the same things, but improving them, learning lessons from the past? Is Kotlin a better Groovy? I don’t have the answer yet. Maybe it is. Maybe not. Today Groovy remains greatly superior in terms of scripting and DSL support, but it comes with a price that Gradle doesn’t want to pay. And let’s not forget the original community of Groovy: a dynamic language for the JVM. There are still lots of people who like this aspect of the language (and I do too, typically when I write Groovy scripts in place of bash scripts, I don’t care about types). Its compile time metaprogramming features also make it incredibly powerful. Modern Groovy definitely doesn’t deserve its "bad press". Would you compare Java 8 with Java 1? No. So don’t compare Groovy 2.4 with Groovy 1 either. Reputation should change, and you can help there too.

This leads me to what I should do. And there, I’m a bit lost, to be honest. I work for a company that embraced Groovy, that is now embracing Kotlin. I love my job, I love working with Gradle, I love Groovy, and I quite enjoy Kotlin. I’m a passionate developer. I just want to continue having fun. But if you think that as such, I’m not a good representative of the Groovy community anymore, maybe I should step off from the Groovy project. I would hate that, but I’ve kind of been hurt by the bad comments we (Gradle) received from some members of the Groovy community. I don’t want to fall into a language war, I don’t care about this. I care about users. What I love to do is helping people, period.

I would like to finish this post with a thought about what I’m going to do, as a Gradle developer, for you, Groovy users. In particular, I am convinced that the success of Gradle is largely due to its Groovy DSL, despite its problems. The fact that it’s simple, easy to read, is super important. I joined the Groovy project because I was using Groovy as a DSL platform in a natural language processing context. Groovy is super powerful for this. And I learnt a lot in terms of DSL design. In particular, I will try to make sure that it doesn’t become a Kotlin API. What I mean by that is that I think we should elevate from a Groovy DSL to a Gradle language. And this language is meant for describing builds. And our users are not Kotlin developers. Most of them are not Groovy developers either. They come, as I described earlier, from different horizons. And I would hate it if a user had to understand concepts like generics or type inference to write a build script. That would be horribly wrong. A build author should understand how to model an application, not what a type is, what an extension method is, or what generic return type inference is. It’s different for plugin authors, but for a build author, it’s super important. So I will try to make sure that Kotlin scripting support improves, even if it means that it would get even closer to what Groovy supports. I would do this not because I want Groovy to die, I don’t (and it wouldn’t help my royalties for Groovy in Action 2 ;)), but because it would help users of Gradle. That’s what I care most about, just like I care about what Groovy users want when I work on the Groovy project.

As for talking about Gradle, Groovy and its future, I’ll be at GR8Conf next week, and I’d be happy to answer you in person there too!

Keep on Groovying!


A week fighting a PermGen leak

29 August 2015

Tags: jvm groovy gradle permgen yourkit

A challenge at Gradle

A new job

This is my first blog post after 4 months being a full time employee of Gradle Inc.! I am pretty excited by this job, even though so far it didn’t give me much time to contribute to Apache Groovy (but we did manage to release 2.4.4 though). One of the reasons I joined the company was because I love technical challenges. And Gradle has a lot of them, with an incredible team of smart people working together to make software automation better. This week, I worked on my first true challenge, and I must confess that I miserably failed :-) This is a long post, and anyone who ever fought against the infamous "PermGen space error" in their application is going to understand why…

CodeNarc as the source of a leak?

The first suspicious piece of code that drew our attention was CodeNarc. CodeNarc is a source code quality analysis tool for Groovy, which is used by a lot of Groovy developers, including in Gradle itself (since Gradle makes intensive use of Groovy). CodeNarc can be seen as the equivalent of FindBugs for Groovy code. And problems seemed to start with an upgrade of the Gradle CodeNarc plugin to use CodeNarc 0.23. We actually saw reports like this one in the forums or this other one in GitHub but thought that the PermGen error was just a consequence of CodeNarc including more rules: rules are written in Groovy, so compiled down to classes and classes eat PermGen. So increasing the PermGen space was enough, and it did actually solve the error. Problem solved. Or not. The riddle only started for me with a seemingly insignificant question on our internal mailing lists: "Can someone investigate why our build sometimes fails with a PermGen space error?", and I volunteered.

Interestingly, I had just finished pushing an upgrade of Gradle to Groovy 2.4.4 on our master branch, and I had noticed that I had to increase the PermGen space too. At first, I naively thought that it was also required because Groovy 2.4 consumed more memory, but I was wrong. I should have known, because before joining Gradle, I had actually worked on Groovy for Android, and a consequence of this work was that Groovy 2.4 had a reduced memory footprint: we generate less bytecode, which directly relates to a reduced PermGen space usage. So why on earth would Groovy 2.4 require more memory? And what is the relation with the CodeNarc plugin? Actually this plugin works in a Gradle version that uses Groovy 2.3.10, so why would there be a relation between the two?

In such a case, your best friend is a profiler. But as I will explain here, it can also lead you down the wrong track. Be careful. The second best friend is the JVM options -XX:+TraceClassLoading and -XX:+TraceClassUnloading. I also used -XX:SoftRefLRUPolicyMSPerMB=0, an option I had no idea existed before my friend David Gageot told me about it. Basically, it will force the garbage collector to aggressively collect all soft references, which is very useful to understand, in combination with the 2 other options, from which classloader we are leaking memory.

The first wrong track was actually thinking that CodeNarc was the source of the problem. I wrote an email on the gradle-dev list, explaining my findings, and I had indeed found that a lot of classes from CodeNarc were not unloaded. Before I go further, let’s explain how the JVM is supposed to behave with regards to classes in Java 7. We all know that objects are garbage collected, but for a lot of people, classes are not. That’s why we have the PermGen space (which has disappeared in Java 8 but that’s another story): this segregated space of the JVM memory is used to store classes. And in Java, a class is loaded by a classloader. There is a strong reference between a class and its classloader. But what the JVM is able to do is actually simple: if no instance of the class is strongly reachable and the classloader is not strongly reachable either, then both the class and the classloader can be unloaded. This means that PermGen can be recovered, and it is pretty useful, especially for a language like Groovy which can generate a lot of classes at runtime.

In Gradle, and particularly in the Gradle CodeNarc plugin, CodeNarc is executed through an Ant task, which spawns its own isolated classloader, containing both the CodeNarc and Groovy classes. So when the plugin execution is finished, if we do not keep track of the classloader, classes should be garbage collected. So a good candidate for the memory leak was actually the IsolatedAntBuilder that Gradle uses to execute the Ant task. And guess what? There is such a leak, because the DefaultIsolatedAntBuilder performs classloader caching! That was also discovered by my colleague Sterling, who immediately spotted that: while we do cache the classloaders, keeping a strong reference on them, we don’t have any code to release the classloader in case of memory pressure. Conclusion, we’ve found the memory leak, hurray! And it has nothing to do with CodeNarc or Apache Groovy, phew!
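To illustrate what "releasing the classloader in case of memory pressure" could look like, here is a minimal Java sketch of a cache that holds its classloaders through SoftReference instead of strong references. This is an assumption-laden illustration, not the code of DefaultIsolatedAntBuilder: the class name SoftClassLoaderCache, the string key and the empty classpath are all hypothetical.

```java
import java.lang.ref.SoftReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a classloader cache that yields to memory pressure. */
final class SoftClassLoaderCache {

    // The map itself is strongly held, but each classloader is only softly referenced,
    // so the garbage collector may clear an entry once nothing else uses the loader.
    private final Map<String, SoftReference<ClassLoader>> cache = new ConcurrentHashMap<>();

    ClassLoader get(String key, URL[] classpath) {
        SoftReference<ClassLoader> ref = cache.get(key);
        ClassLoader loader = (ref == null) ? null : ref.get();
        if (loader == null) {
            // First request, or the previous loader was reclaimed: build a fresh one.
            // A null parent keeps the loader isolated from the application classpath.
            loader = new URLClassLoader(classpath, null);
            cache.put(key, new SoftReference<>(loader));
        }
        return loader;
    }

    public static void main(String[] args) {
        SoftClassLoaderCache cache = new SoftClassLoaderCache();
        URL[] classpath = new URL[0]; // empty on purpose; a real caller would list jars here
        ClassLoader first = cache.get("codenarc", classpath);
        ClassLoader second = cache.get("codenarc", classpath);
        // prints true: 'first' is still strongly held, so the soft reference cannot be cleared
        System.out.println(first == second);
    }
}
```

Note that a soft reference only helps if nothing else keeps the loader, or any class it defined, strongly reachable. A cache like this removes one strong reference, not every possible one.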

So I immediately tried to disable the cache, which turned out to be pretty trivial. Run the build again and… another PermGen space error. No CodeNarc classes unloaded, no Groovy classes unloaded. Wow. So the problem wasn’t solved, first "oh my!" moment of the week: there was another leak.

One test I did, then, is to totally comment out the code that, in the ant builder code, performed the definition of the CodeNarc task. Eventually, there was only that code left in the CodeNarc plugin:

    void run() {
        def classpath = new DefaultClassPath(getCodenarcClasspath())
        antBuilder.withClasspath(classpath.asFiles).execute {
            // ... thou shalt not leak!
        }
    }

I executed the code again, and there was definitely a leak: after several loops, a PermGen error occurred. CodeNarc was ruled out as the source of the leak. After some hours of trials and study of memory snapshots, I eventually came up with a piece of code that reproduces the problem independently of Gradle:

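int i = 0;
try {
    while (true) {
        // Isolated classloader: only the Groovy jar is on its classpath
        URLClassLoader loader = new URLClassLoader(
            new URL[]{new File(GROOVY_JAR).toURI().toURL()},
            ClassLoader.getSystemClassLoader().getParent());
        Class system = loader.loadClass("groovy.lang.GroovySystem");
        // Create the Groovy runtime by asking for the metaclass registry
        system.getDeclaredMethod("getMetaClassRegistry").invoke(null);
        loader.close();
        i++;
    }
} catch (OutOfMemoryError e) {
    System.err.println("Failed after " + i + " loadings");
}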

As you can see, the code is very simple: it creates a new isolated classloader, which only contains Groovy on classpath. Then it invokes the creation of the Groovy runtime, by asking the metaclass registry, then it closes the classloader. On my JVM, after about 40 runs, the code fails with a PermGen space error: despite the fact that no class, no object is kept out of the classloader, the Groovy runtime is not unloaded, leading to a memory leak. The key point here is that I had noticed some oddities during my hours of debugging: a class, named ClassInfo, was at the center of those oddities.

ClassInfo leak

In particular, although YourKit (the profiler I was using) was telling me that all classes, all classloaders were weakly or softly reachable (btw, it’s really a pity that YourKit doesn’t show them separately, that is, weakly referenced objects from softly referenced ones), the classes were not garbage collected. Also strangely, some classes appeared as GC roots, meaning they could be collected, but they weren’t! And when I navigated through some of the duplicate classes I was seeing, ClassInfo was present, as a value of a map entry of class value. Here we are, I had found the real source of the leak. Something had changed in Groovy. And the fact that CodeNarc was leaking since 0.23 was just a side effect of upgrading its dependency to Groovy 2.4! So although Gradle was using Groovy 2.3.10, CodeNarc, by default, was using a more recent version of Groovy. That doesn’t explain why it leaks yet, but we now know who is responsible: Groovy.

ClassValue, friend or foe?

Since Groovy 2.4, Groovy uses a new mechanism for storing its runtime metadata in case you run on JDK 7 or later: ClassValue. ClassValue allows storing information on the class level. Typically, a language like Groovy would use it to store the metaclass of a class. In practice, Groovy doesn’t directly store the metaclass here, but a higher level concept called ClassInfo, which in turn gives access to the metaclass of a class.

Before Groovy 2.4, all that information was stored directly in the ClassInfo class itself, through a private static field. ClassInfo is therefore the global entry point for accessing runtime information about a class. While Groovy 2.4 still uses ClassInfo as an entry point, there are 2 possible storage mechanisms, based on the underlying JDK. If ClassValue is available, which is the case for any JDK 7+, then it is used, otherwise we fall back on the old mechanism. ClassValue is supposed to be more efficient and more direct. Of course, in any case (old and new storage mechanism), it is memory sensitive: in case a class is not available anymore, its ClassInfo is removed. What ClassValue storage provides is basically the same as ThreadLocal, but at the Class level instead of the Thread level. Users are allowed to store information here, but one should be aware that, as with thread locals, if you start using it, you may face memory leaks if you don’t use it properly.
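For readers who have never used it, here is a minimal Java sketch of the java.lang.ClassValue API described above. It is not Groovy’s code: the Metadata class is a hypothetical stand-in for something like ClassInfo, and the computation counter only exists to show when values are computed.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: attaching lazily computed metadata to classes with ClassValue. */
final class ClassValueSketch {

    /** Stand-in for per-class runtime metadata (this is not Groovy's ClassInfo). */
    static final class Metadata {
        final String description;

        Metadata(String description) {
            this.description = description;
        }
    }

    /** Counts how many times computeValue ran, to show that results are cached. */
    static final AtomicInteger computations = new AtomicInteger();

    static final ClassValue<Metadata> METADATA = new ClassValue<Metadata>() {
        @Override
        protected Metadata computeValue(Class<?> type) {
            // Called on the first get() for a class, and again after remove()
            computations.incrementAndGet();
            return new Metadata("metadata for " + type.getName());
        }
    };

    public static void main(String[] args) {
        // String is loaded by the bootstrap classloader, yet we can attach our own data to it
        Metadata first = METADATA.get(String.class);
        Metadata second = METADATA.get(String.class);
        System.out.println(first == second); // prints true: the value was cached

        // Explicit cleanup: drop the entry so that the next get() recomputes it
        METADATA.remove(String.class);
        System.out.println(METADATA.get(String.class) == first); // prints false
    }
}
```

The point to keep in mind is where the entry lives: it is attached to the class you pass to get(), here String.class, and not to the code that defined the ClassValue.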

That’s for the theory, let’s see how in practice this change led to a giant memory leak in the Gradle build.

The theory is that ClassValue should behave like ThreadLocal. That is, the entries stored in the internal map of the Class class, should be garbage collected when the referent is not strongly referenced anymore. This behavior is however not what the JVM does. It was confirmed to me by Charles Nutter (JRuby) a few hours later: although we all expected the JVM to collect the unreachable, it does not.

Uh. Second "oh my!" moment of the week. I had now a candidate (Groovy) and a reason (ClassInfo leaking). However, it doesn’t explain by itself why the class loader is not garbage collected: if Groovy stores information on classes, it’s ok, as long as the classes to which it writes some metadata are from the classloader, or any child classloader, of the Groovy runtime itself. Everything would be self-contained, meaning we would have a graph of objects that do not leak outside of the isolated classloader. However… Groovy uses Strings, integers, List, … all coming from the system classloader. And that is the main difference with the old metadata storage mechanism: the old one only referenced classes from the system classloader. With ClassValue, we are modifying classes from the system classloader too! That is, the String class, for example, contains in its class value map, information from the Groovy runtime! The famous ClassInfo instance is present there! There we are! We leaked a ClassInfo instance into the system classloader! So what happens is that when we are done with our "isolated" Groovy runtime, we think it should unload because nothing references any object or class from that classloader. However, the Groovy runtime did update classes from the system classloader, and it started leaking into it! The ghost in the shell! Groovy is spoiling everywhere…

So far so good, I had the explanation, I could write a workaround: let’s iterate over all those classes that Groovy updated, remove the ClassInfo, and we’re done. I wrote that code, and it turned out to be a bit ugly, but… it worked! Here is, for information, the cleanup code:

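    import java.lang.reflect.Field;
    import java.lang.reflect.Method;
    import java.util.Iterator;

    static void removeClassFromGlobalClassSet(Class<?> classInfoClass) throws Exception {
        // ClassInfo internals are private, so every reflective handle has to be made accessible
        Field globalClassValueField = classInfoClass.getDeclaredField("globalClassValue");
        globalClassValueField.setAccessible(true);
        Object globalClassValue = globalClassValueField.get(null);
        Method removeFromGlobalClassValue = globalClassValueField.getType().getDeclaredMethod("remove", Class.class);
        removeFromGlobalClassValue.setAccessible(true);

        Field globalClassSetField = classInfoClass.getDeclaredField("globalClassSet");
        globalClassSetField.setAccessible(true);
        Object globalClassSet = globalClassSetField.get(null);
        globalClassSetField = globalClassSet.getClass().getDeclaredField("items");
        globalClassSetField.setAccessible(true);
        Object globalClassSetItems = globalClassSetField.get(globalClassSet);

        Field clazzField = classInfoClass.getDeclaredField("klazz");
        clazzField.setAccessible(true);

        Iterator it = (Iterator) globalClassSetItems.getClass().getDeclaredMethod("iterator").invoke(globalClassSetItems);

        // For each ClassInfo known to this runtime, remove it from the class it was attached to
        while (it.hasNext()) {
            Object classInfo = it.next();
            Object clazz = clazzField.get(classInfo);
            removeFromGlobalClassValue.invoke(globalClassValue, clazz);
        }
    }
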

After executing that code, no ClassInfo instance was leaking anymore into the system classloader, and the runtime could be shut down properly. The garbage collector did its job, and yay! I’m so happy, I’ll be able to sleep soon! That was Tuesday night. And that night, I thought I had found the solution.

Memory sensitive classloader caching

So on Wednesday, I spent the day trying to implement the same strategy inside Gradle. More precisely, inside the IsolatedAntBuilder thing I told you about. I implemented the code, launched my test again and hurray! it worked! My test passed! No more PermGen space error! So all I had to do, now, was to reactivate classloader caching, otherwise we would lose a feature that is important performance wise.

So I reactivated the cache, and boom! That time, the Gradle build did not fail with a PermGen error, but with very strange errors like this one:

groovy.lang.MissingMethodException: No signature of method: is applicable for argument types: (java.lang.Integer) values: [0]
Possible solutions: plus(java.lang.String), plus(java.lang.Character), abs(), use([Ljava.lang.Object;), split(groovy.lang.Closure), minus(java.lang.Character)

Mmmmmm… 3rd "oh my!" moment of the week. I understood what I had just done. By clearing the ClassInfo stuff from the classloader, I had effectively shut down the Groovy runtime that was initiated in that cached classloader. So when some code was trying to reuse the runtime from that cached classloader, since I had disabled it, it was failing! And there’s no option to reinitialize the Groovy runtime. It’s just not doable, because everything happens in static initializers (private final fields, …). So unless the JVM had an option to re-execute the static initializers of a class (and who knows what oddities it would lead to), I was out of luck.

That’s about when I told my mates at Gradle "I think we have to choose between caching and leaking memory". But the night came, and I actually had an idea. I could implement a memory sensitive cache: by writing a smart cache structure with appropriate SoftReferences and reference queues, I would be able to execute the shutdown code only when I know that the GC is trying to reclaim memory. The idea is simple: we have a map whose key is a SoftReference<String>, and whose value is our cached classloader. The String represents the classpath that we are caching for the classloader.

Now imagine that the GC is out of memory. The semantics of SoftReference are clear: before throwing an OutOfMemoryError, the JVM will do its best and clear all soft references. Doing so, using a custom reference queue, we can be notified that the reference is cleared. Then, we can execute the Groovy runtime shutdown code, which will in turn make the ClassLoader collectible.
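The idea can be sketched in a few lines of plain Java. This is not the code that went into Gradle: the SoftKeyCache class, its Key type and the shutdown callback are hypothetical names chosen for this illustration, and the main method simulates the garbage collector by clearing and enqueuing the reference by hand, since a real soft reference is only cleared under memory pressure.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class SoftKeyCache<V> {

    /** Soft reference to the classpath string; the hash is kept so a cleared key can still be removed. */
    public static final class Key extends SoftReference<String> {
        private final int hash;

        Key(String classpath, ReferenceQueue<String> queue) {
            super(classpath, queue);
            this.hash = classpath.hashCode();
        }

        @Override
        public int hashCode() {
            return hash;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Key)) return false;
            String mine = get();
            return mine != null && mine.equals(((Key) other).get());
        }
    }

    private final ReferenceQueue<String> queue = new ReferenceQueue<>();
    private final Map<Key, V> entries = new HashMap<>();
    private final Consumer<V> shutdown; // cleanup to run when an entry is evicted

    public SoftKeyCache(Consumer<V> shutdown) {
        this.shutdown = shutdown;
    }

    /** Stores a value; the key is returned only so that the demo below can simulate the GC. */
    public synchronized Key put(String classpath, V value) {
        evictCleared();
        Key key = new Key(classpath, queue);
        V previous = entries.remove(key);
        if (previous != null) shutdown.accept(previous);
        entries.put(key, value);
        return key;
    }

    public synchronized V get(String classpath) {
        evictCleared();
        return entries.get(new Key(classpath, null));
    }

    /** Runs the shutdown callback for every entry whose key reference has been enqueued. */
    private void evictCleared() {
        Reference<? extends String> cleared;
        while ((cleared = queue.poll()) != null) {
            V value = entries.remove(cleared);
            if (value != null) shutdown.accept(value);
        }
    }

    public static void main(String[] args) {
        List<String> shutDown = new ArrayList<>();
        SoftKeyCache<String> cache = new SoftKeyCache<>(shutDown::add);
        Key key = cache.put("codenarc.jar:groovy.jar", "isolated loader");
        System.out.println(cache.get("codenarc.jar:groovy.jar"));

        // Simulate what the GC does under memory pressure: clear the reference and enqueue it.
        key.clear();
        key.enqueue();
        System.out.println(cache.get("codenarc.jar:groovy.jar"));
        System.out.println("shut down: " + shutDown);
    }
}
```

For this to work with a real garbage collector, the classpath string must be reachable only through the soft reference: a string literal like the one in main, or a value that keeps a strong reference to its own key, would never be cleared.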

Honestly I was pretty happy with my implementation. I executed the code and it worked! Caching was working until the GC tried to reclaim memory, then I saw my shutdown code executed, memory reclaimed and green tests. Woooo!!! I had eventually knocked that memory leak down! Ha ha!

Then I remembered that my colleague Sterling had a test which involved a loop in an integration test. To make sure I had really fixed the leak, I asked him to tell me how he did that. The code was very simple, just involving a loop thanks to @Unroll in a Spock specification. I did it and… PermGen error showed up again. WAT?!

That was the fourth "oh my!" moment. And not the last one. I then spent hours modifying my caching code, refactoring it to add more complicated memory leak strategies, seeing that there were still thread locals, clearing them explicitly, adding a memory leak strategy for Ant itself, for the Java Beans introspector, … None of my attempts worked. In the end, it always failed. But there was always one mystery: I saw that the Groovy classes were unloading. But the Ant classes were not… I should have figured out the rest much sooner. But when you have so many potential sources of leaks that are much more evident, it’s hard to figure out.

In particular, one thing would have made things much easier to discover. In YourKit, you can see that there are duplicate classes. Classes that have the same name, but come from different class loaders. However, there’s nothing that will show you those duplicates. You have to find them yourself. And in the end, when in the dump you see an instance of that class, all you can see is that it is an instance of ClassInfo. Nothing, visually, tells you that the instance of ClassInfo that you are seeing actually comes from a different classloader from the one you are seeing just next to it. A bit of color, for example, would help… And it would have helped me see that some ClassInfo elements that I was seeing in the classes from Ant didn’t come from the "disposable" Groovy runtime… No. They were coming from… the Gradle runtime itself!

Where it all ends

OK, that was the last "oh my!" moment of the week. The one that killed all my hopes. And to understand the problem, I now have to explain to you how IsolatedAntBuilder works. It’s a very small, yet very smart and practical piece of code. Maybe too smart.

Gradle, as a core feature, lets you execute Ant tasks thanks to code inherited from the Groovy codebase itself: AntBuilder. It’s a very elegant piece of code, that lets you write things like:

task check << {
    ant.taskdef(name: 'pmd',
                classname: 'net.sourceforge.pmd.ant.PMDTask',
                classpath: configurations.pmd.asPath)
    ant.pmd(shortFilenames: 'true',
            failonruleviolation: 'true',
            rulesetfiles: file('pmd-rules.xml').toURI().toString()) {
        formatter(type: 'text', toConsole: 'true')
        fileset(dir: 'src')
    }
}

While this works, there’s actually a lot involved behind that. Including classloader magic. In particular, in the example above, we create a task definition in Ant, which uses a classpath defined in Gradle. "ant" here is a global object which is shared across the build, but it is possible to keep the classes of the Ant tasks from being mixed with the Gradle classpath itself by using antBuilder instead. That’s what the CodeNarc plugin does:

antBuilder.withClasspath(classpath.asFiles).execute {

means "Gradle, please, create an isolated classloader for me, that will contain only the classpath necessary for CodeNarc, and execute that Ant task with it". It seems very trivial, but there is a problem. The code that you see here is found in a Gradle script. It means that the "antBuilder" object that you are seeing here comes from Gradle. It is our IsolatedAntBuilder instance. When we call "withClasspath", a new instance of IsolatedAntBuilder is created, with an isolated classloader corresponding to the supplied classpath. Then "execute" is called with a closure that lets you configure the Ant task using the Groovy AntBuilder syntax.

So the "Closure" class that we are seeing here comes from Gradle. Then, we have a classloader which contains the Ant runtime, and a "bridge" class, written in Groovy, called "AntBuilderDelegate", which has one responsibility: when the code of the Ant builder is going to be executed, it is likely that the version of Groovy which will be found on classpath will be different from the one that Gradle uses. That is exactly what happens with CodeNarc: Gradle 2.6 uses Groovy 2.3.10, but the CodeNarc plugin executes with Groovy 2.4.1, so the Ant task works with a different "Closure" class than the one that Gradle has. We will really have two distinct "Closure" classes here, and "AntBuilderDelegate" is responsible for filling the gap: when the Ant configuration code, which will use AntBuilder from the Ant classpath, is going to be executed, it will be calling AntBuilderDelegate instead of directly the Closure code. And that code will intercept the missing methods in order to "reroute" them to the builder. You don’t have to understand that in detail, it’s not really the point here, but it is important to understand that this "AntBuilderDelegate" class is instantiated… in Gradle, using the Gradle classloader.

Now you may see it coming. I told you I had upgraded Gradle to use Groovy 2.4.4. So what does it mean? Gradle now uses Groovy with ClassValue. And what is the problem with ClassValue? All classes "touched" by Groovy get "polluted" with the necessary metadata information. So when we create an instance of "AntBuilderDelegate", we’re doing that using the Groovy 2.4 runtime from Gradle, which comes with its own ClassInfo. And that delegate references an AntBuilder which is instantiated using the Ant classloader, with a different Groovy runtime, having its own ClassInfo. So what I had found earlier was that the ClassInfo from the Groovy "Ant" runtime was leaking into Gradle. But I hadn’t realized that the opposite was also true! By bridging the runtimes, we were leaking Groovy "Gradle" into the isolated classloader, through ClassValue!

So, what happened, is that the Groovy classes from the isolated classpath were garbage collected, because no ClassInfo from Gradle leaked into them. However, the Ant classes were touched. And they were NOT collectible then.

And this is where I stopped. Because while I found a way to "unload" ClassInfo from the isolated classpath and the touched classes from the system classloader, I haven’t found a way to do the same for the ClassInfo instances that leak into the Ant runtime… Of course I tried a "brute force" thing, by removing all ClassInfo from those classes, but as you understood, it’s a desperate attempt: it’s equivalent to shutting down the runtime. And then it totally breaks subsequent calls in the Gradle build: we’ve just broken the Groovy runtime from Gradle…

To add some confusion to the problem, I think it’s now a good time to explain that I actually simplified the isolated ant builder classloader hierarchy. There are actually (at least) 3 classloaders involved:

  • The classloader from Gradle, which loads the Gradle runtime, the IsolatedAntBuilder instance and also the AntBuilderDelegate instance

  • A classloader for the Ant runtime, which is isolated from Gradle, apart from logging classes necessary for Gradle to be able to capture the output. This classloader is per classpath, and is the one which is cached.

  • A classloader that is filtering some classes from the Gradle runtime classloader to make them available. This is what the bridging Ant builder uses. This classloader is shared among all isolated ant builder instances.

So when I say that something leaks, it can leak to any of those classloaders, and any of the parent loaders…

That’s where I felt desperate. After a week fighting those memory leaks, and so many "ah ah, got you!" moments, I was in front of a wall. Basically, while in the first case (isolated ClassInfo leaking into Gradle), I know I can totally clean all the ClassInfo references because I know I can shut down the Groovy runtime, in the second case (Gradle ClassInfo leaking), I basically have no idea whether a class comes from the isolated classloader or not. So it’s hard to say if you should remove the ClassInfo or not. I am currently experimenting with a brute force "try to determine if a class belongs in the class loader hierarchy", but it is weak (ah ah!) because I need to know about several potential class loader types.
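As a rough illustration of that brute force check, here is a sketch in plain Java that walks the parent chain of a class’s defining loader. It is not the experiment from the Gradle codebase: LoaderHierarchy, isWithin and isLoadedWithin are hypothetical names, and the sketch assumes that getParent() faithfully describes the hierarchy, which is exactly what filtering or otherwise custom classloaders may not guarantee.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderHierarchy {

    /** Returns true if candidate is root itself or has root somewhere in its parent chain. */
    public static boolean isWithin(ClassLoader candidate, ClassLoader root) {
        for (ClassLoader cl = candidate; cl != null; cl = cl.getParent()) {
            if (cl == root) return true;
        }
        return false;
    }

    /** Returns true if the class was defined by root or by one of its descendants. */
    public static boolean isLoadedWithin(Class<?> type, ClassLoader root) {
        // Classes from the bootstrap loader report a null loader, so they never match.
        return isWithin(type.getClassLoader(), root);
    }

    public static void main(String[] args) {
        ClassLoader outer = LoaderHierarchy.class.getClassLoader();
        ClassLoader isolated = new URLClassLoader(new URL[0], outer);
        ClassLoader child = new URLClassLoader(new URL[0], isolated);

        System.out.println(isWithin(child, isolated));              // a descendant of the isolated loader
        System.out.println(isWithin(outer, isolated));              // the parent is outside of it
        System.out.println(isLoadedWithin(String.class, isolated)); // a JDK class
    }
}
```

With a check like this, a ClassInfo would only be removed from classes for which isLoadedWithin answers true for the isolated loader; anything loaded through a loader that hides its real parent would be misclassified.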

What’s next?

So, what can we do next? One has to remember that fixing Groovy is not the ultimate solution because Gradle uses Groovy internally, but the various tasks can very well use a different Groovy version which is beyond our control.

  1. Roll back Groovy in Gradle to 2.3.10. It would prevent the Groovy classes from Gradle from leaking into the isolated classloaders, but is also unfortunate given the improvements that Groovy 2.4 provides. Also, those who write Groovy applications for Android use Groovy 2.4+…

  2. CodeNarc would still use Groovy 2.4+, we could downgrade it too. However, if people rely on features of Groovy 2.4+, they just have no choice, so we would still have the problem.

  3. Use smarter techniques like instrumentation to track the leakages of ClassInfo, record them, and revert when we’re done. It’s doable, but it’s a huge amount of work, and relying on instrumentation for Gradle would be very bad for performance.

  4. Update Gradle to use FilteringClassLoader everywhere, including in its main process, to prevent ClassValue from being found. This would work because without that class, Groovy wouldn’t use ClassValue to store the metadata and would fall back to the old mechanism.

  5. Wait for a fix in Groovy. Jochen is already working on that, but we know that the old mechanism isn’t perfect either, and has memory leaks too. That was one of the reasons to migrate to ClassValue.

  6. Wait for a fix of the JVM. That’s beyond our control.

  7. Increase the PermGen space for builds that use the code quality plugins, which internally use AntBuilder. It’s what we do today. It works, but it’s just hiding the problem. And we have to explain to our users to do it too.

  8. Some smart people come up with a smart solution, that’s why I wrote this too. During that week, I got help from lots of people, including Jochen "Groovy" Theodorou, Henri "EasyMock" Tremblay, David Gageot and Nikita Salnikov from Plumbr, thank you guys!

By the way, if you wonder, the same problem exists in JDK 8 too, it’s just not visible immediately because of the metaspace that replaced the PermGen space.

Now, it was fun writing this "post-mortem", I hope it wasn’t too obscure, it helped me a lot because I had so many "got it" and "oh noes!" moments that I felt it was very interesting to share this story with you. And if you like technical challenges, do not forget that Gradle is hiring!


After writing this post, I made great progress, and I did manage to get rid of the leak, but I also discovered that the leak happens with the various GroovyCompile tasks… Anyway you can follow my progress on this branch.


Improved sandboxing of Groovy scripts

27 March 2015

Tags: groovy sandoxing type checking AST secure

One of the most common use cases of Groovy is scripting. Groovy makes it very easy to execute code dynamically, at runtime. Depending on the application, scripts can be found in multiple sources: file system, databases, remote services, … but more importantly, the designer of the application which executes scripts is not necessarily the one writing those scripts. Moreover, the scripts might run in a constrained environment (limited memory, file descriptors, time, …) or you simply don’t want to allow users to access the full capabilities of the language from a script.

What you will learn in this post
  • why Groovy is a good fit to write internal DSLs

  • what it implies in terms of security in your applications

  • how you can customize compilation to improve the DSL

  • the meaning of SecureASTCustomizer

  • what are type checking extensions

  • how you can rely on type checking extensions to offer proper sandboxing

For example, imagine that you want to offer users the ability to evaluate mathematical expressions. One option would be to implement your own internal DSL, create a parser and eventually an interpreter for those expressions. Obviously this involves a bit of work, but if in the end you need to improve performance, for example by generating bytecode for the expressions instead of evaluating them, or introduce caching of those runtime generated classes, then Groovy is probably a very good option.

There are lots of options available, described in the documentation but the most simple example is just using the Eval class:
int sum = (Integer) Eval.me("1+1");

The code 1+1 is parsed, compiled to bytecode, loaded and eventually executed by Groovy at runtime. Of course in this example the code is very simple, and you will want to add parameters, but the idea is that the code which is executed here is arbitrary. This is probably not what you want. For a calculator, you want to allow expressions like:


but certainly not

println 'Hello'
(0..100).each { println 'Blah' }
Pong p = new Pong()
println(new File('/etc/passwd').text)
System.exit(-1)
Eval.me('System.exit(-1)') // a script within a script!

This is where things start to become complicated, and where we start seeing actually multiple needs:

  • restricting the grammar of the language to a subset of its capabilities

  • preventing users from executing unexpected code

  • preventing users from executing malicious code

The calculator example is pretty simple, but for more complex DSLs, people might actually start writing problematic code without noticing, especially if the DSL is sufficiently elegant to be used by non developers.

I was in this situation a few years ago, when I designed an engine that used Groovy "scripts" written by linguists. One of the problems was that they could unintentionally create infinite loops, for example. Code was executing on the server, so you would end up with a thread eating 100% of the CPU and no choice but to restart the application server. I had to find a way to mitigate that problem without compromising the DSL, the tooling or the performance of the application.

Actually, lots of people have similar needs. During the past 4 years, I spoke to tons of users who had a similar question: How can I prevent users from doing bad things in Groovy scripts?

Compilation customizers

At that time, I had implemented my own solution, but I knew that other people had also implemented similar ones. In the end, Guillaume Laforge suggested that I write something that would help fix those issues and make it into Groovy core. This happened in Groovy 1.8.0 with compilation customizers.

Compilation customizers are a set of classes that are aimed at tweaking the compilation of Groovy scripts. You can write your own customizer but Groovy ships with:

  • an import customizer, which aims at adding imports transparently to scripts, so that users do not have to add "import" statements

  • an AST (Abstract Syntax Tree) transformation customizer, which allows to add AST transformations transparently to scripts

  • a secure AST customizer, which aims at restricting the grammar and syntactical constructs of the language

The AST transformation customizer allowed me to solve the infinite loop issue, by applying the @ThreadInterrupt transformation, but the SecureASTCustomizer is probably the one which has been the most misinterpreted of them all.

I must apologize for this. Back then, I had no better name in mind. The important part of SecureASTCustomizer is AST. It was aimed at restricting access to some features of the AST. The "secure" part is actually not a good name at all, and I will illustrate why. You can even find a blog post from Kohsuke Kawaguchi, of Jenkins fame, named "Groovy SecureASTCustomizer is harmful". It is very true. The SecureASTCustomizer has never been designed with sandboxing in mind. It was designed to restrict the language at compile time, not runtime. So a much better name, in retrospect, would have been GrammarCustomizer. But as you’re certainly aware, there are two hard things in computer science: cache invalidation, naming things and off by one errors.

So imagine that you think of the secure AST customizer as a way of securing your script, and that you want to use this to prevent a user from calling System.exit from a script. The documentation says that you can prevent calls on some specific receivers by defining either a blacklist or a whitelist. In terms of securing something, I would always recommend using a whitelist, that is to say listing explicitly what is allowed, rather than a blacklist, saying what is disallowed. The reason is that hackers always think of things you don’t, so let’s illustrate this.

Here is how a naive "sandbox" script engine could be configured using the SecureASTCustomizer. I am writing the examples of configuration in Java, even though I could write them in Groovy, just to make the difference between the integration code and the scripts clear.

import java.util.Arrays;
import groovy.lang.GroovyShell;
import org.codehaus.groovy.control.CompilerConfiguration;
import org.codehaus.groovy.control.customizers.SecureASTCustomizer;

public class Sandbox {
    public static void main(String[] args)  {
        CompilerConfiguration conf = new CompilerConfiguration();                    // (1)
        SecureASTCustomizer customizer = new SecureASTCustomizer();                  // (2)
        customizer.setReceiversBlackList(Arrays.asList(System.class.getName()));     // (3)
        conf.addCompilationCustomizers(customizer);                                  // (4)
        GroovyShell shell = new GroovyShell(conf);                                   // (5)
        Object v = shell.evaluate("System.exit(-1)");                                // (6)
        System.out.println("Result = " + v);                                         // (7)
    }
}
1 create a compiler configuration
2 create a secure AST customizer
3 declare that the System class is blacklisted as the receiver of method calls
4 add the customizer to the compiler configuration
5 associate the configuration to the script shell, that is, try to create a sandbox
6 execute a nasty script
7 print the result of the execution of the script

If you run this class, when the script is executed, it will throw an error:

General error during canonicalization: Method calls not allowed on [java.lang.System]
java.lang.SecurityException: Method calls not allowed on [java.lang.System]

This is the result of the application of the secure AST customizer, which prevents the execution of methods on the System class. Success! Now we have secured our script! Oh wait…

SecureASTCustomizer pwned!

Secure you say? So what if I do:

def c = System
c.exit(-1)

Execute again and you will see that the program exits without error and without printing the result. The return code of the process is -1, which indicates that the user script has been executed! What happened? Basically, at compile time, the secure AST customizer is not able to recognize that c.exit is a call on System, because it works at the AST level! It analyzes a method call, and in this case the method call is c.exit(-1), then gets the receiver and checks if the receiver is in the whitelist (or blacklist). In this case, the receiver is c and this variable is declared with def, which is equivalent to declaring it as Object, so it will think that c is Object, not System!

Actually there are many ways to work around the various configurations that you can make on the secure AST customizer. Just for fun, a few of them:

('java.lang.System' as Class).exit(-1)

import static java.lang.System.exit
exit(-1)

and there are many more options. The dynamic nature of Groovy just makes it impossible to resolve those cases at compile time. There are solutions though. One option is to rely on the JVM standard security manager. However this is a system wide solution which is often considered a hammer, and it doesn’t really work for all cases: for example, you might not want to prevent the creation of files, but only reads…

This limitation - or should I say frustration for lots of us - led several people to create a solution based on runtime checks. Runtime checks do not suffer the same problem, because you will have for example the actual receiver type of a message before checking if a method call is allowed or not. In particular, those implementations are of particular interest:

However none of those implementations is totally secure or reliable. For example, the version by Kohsuke relies on hacking the internal implementation of call site caching. The problem is that it is not compatible with the invokedynamic version of Groovy, and those internal classes are going to be removed in future versions of Groovy. The version by Simon, on the other hand, relies on AST transformations but misses a lot of possible hacks.
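To illustrate why a runtime check sees what the AST cannot, here is a minimal sketch in plain Java, using reflection in place of Groovy’s method dispatch. It is not how any of the projects mentioned above work: RuntimeReceiverCheck, checkedCall and the ALLOWED list are hypothetical names, and treating a Class receiver as a static call is a simplification made for the example.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RuntimeReceiverCheck {
    // Hypothetical whitelist: only these receiver types may be called.
    private static final Set<Class<?>> ALLOWED =
            new HashSet<Class<?>>(Arrays.<Class<?>>asList(Math.class, String.class));

    /** Checks the actual receiver type at call time, then dispatches reflectively. */
    public static Object checkedCall(Object receiver, String name, Class<?>[] paramTypes, Object... args) {
        // Simplification: a Class receiver means "static call on that class".
        boolean isStatic = receiver instanceof Class;
        Class<?> type = isStatic ? (Class<?>) receiver : receiver.getClass();
        if (!ALLOWED.contains(type)) {
            throw new SecurityException("Method calls not allowed on " + type.getName());
        }
        try {
            Method method = type.getMethod(name, paramTypes);
            return method.invoke(isStatic ? null : receiver, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkedCall(Math.class, "abs", new Class<?>[] {int.class}, -3));

        // The "def c = System" trick: statically, c is just an Object...
        Object c = System.class;
        try {
            checkedCall(c, "exit", new Class<?>[] {int.class}, -1);
        } catch (SecurityException e) {
            // ...but at runtime the check sees the real receiver and refuses the call.
            System.out.println("Blocked: " + e.getMessage());
        }
    }
}
```

The check happens before the method is even looked up, so the aliasing tricks that defeat a compile time check on receivers do not help here.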

As a result, with friends of mine Corinne Krych, Fabrice Matrat and Sébastien Blanc, we decided to create a new runtime sandboxing mechanism that would not have the issues of those projects. We started implementing this during a hackathon in Nice, and we gave a talk about this last year at the Greach conference. It relies on AST transformations and heavily rewrites the code in order to perform a check before each method call, property access, increment of variable, binary expression, … The implementation is still incomplete, and not much work has been done because I realized there was still a problem in case of methods or properties called on "implicit this", like in builders for example:

xml {
   cars {				 // cars is a method call on an implicit this: "this".cars(...)
     car(make:'Renault', model: 'Clio')
   }
}

As of today I still haven’t found a way to properly handle this because of the design of the meta-object protocol in Groovy, which here relies on the fact that a receiver throws an exception when the method is not found before trying another receiver. In short, it means that you cannot know the type of the receiver before the method is actually called. And if it is called, it’s already too late…

Until earlier this year I still had no perfect solution to this problem when the script being executed uses the dynamic features of the language. But now has come the time to explain how you can significantly improve the situation if you are ready to lose some of the dynamism of the language.

Type checking

Let’s come back to the root problem of the SecureASTCustomizer: it works on the abstract syntax tree and has no knowledge of the concrete types of the receivers of messages. But since Groovy 2, Groovy has optional static type checking and static compilation, and in Groovy 2.1, we added type checking extensions.

Type checking extensions are very powerful: they allow the designer of a Groovy DSL to help the compiler infer types, but they also let you raise compilation errors where the compiler normally would not. Type checking extensions are even used internally in Groovy to support the static compiler, for example to implement traits or the markup template engine.

What if, instead of relying on the information available after parsing, we could rely on information from the type checker? Take the following code that our hacker tried to write:

((Object)System).exit(-1)

If you activate type checking, this code would not compile:

1 compilation error:

[Static type checking] - Cannot find matching method java.lang.Object#exit(java.lang.Integer). Please check if the declared type is right and if the method exists.

So this code would not compile anymore. But what if the code is:

def c = System
c.exit(-1)

You can verify that this passes type checking by wrapping the code into a method and running the script with the groovy command line tool:

@groovy.transform.TypeChecked // or even @CompileStatic
void foo() {
  def c = System
  c.exit(-1)
}

Then the type checker will recognize that the exit method is called on the System class and is valid. It will not help us there. But what we know, if this code passes type checking, is that the compiler recognized the call on the System receiver. The idea, then, is to rely on a type checking extension to disallow the call.

A simple type checking extension

Before we dig into the details about sandboxing, let’s try to "secure" our script using a traditional type checking extension. Registering a type checking extension is easy: just set the extensions parameter of the @TypeChecked annotation (or @CompileStatic if you want to use static compilation):

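@groovy.transform.TypeChecked(extensions = ['SecureExtension.groovy']) // the file name is only an example
void foo() {
  def c = System
  c.exit(-1)
}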

The extension will be searched on classpath in source form (there’s an option to have precompiled type checking extensions but this is beyond the scope of this blog post):

onMethodSelection { expr, methodNode ->                              // (1)
   if (methodNode.declaringClass.name == 'java.lang.System') {       // (2)
      addStaticTypeError("Method call is not allowed!", expr)        // (3)
   }
}
1 when the type checker selects the target method of a call
2 then if the selected method belongs to the System class
3 make the type checker throw an error

That’s really all needed. Now execute the code again, and you will see that there’s a compile time error!

/home/cchampeau/tmp/securetest.groovy: 6: [Static type checking] - Method call is not allowed!
 @ line 6, column 3.

1 error

So this time, thanks to the type checker, c is really recognized as an instance of class System and we can really disallow the call. This is a very simple example, but it doesn’t really go as far as what we can do with the secure AST customizer in terms of configuration. The extension that we wrote has hardcoded checks, but it would probably be nicer if we could configure it. So let’s start working with a bit more complex example.

Imagine that your application computes a score for a document and that you allow the users to customize the score. Then your DSL:

  • will expose (at least) a variable named score

  • will allow the user to perform mathematical operations (including calling methods like cos, abs, …)

  • should disallow all other method calls

An example of user script would be:

abs(cos(1+score))

Such a DSL is easy to set up. It’s a variant of the one we defined earlier:

CompilerConfiguration conf = new CompilerConfiguration();
ImportCustomizer customizer = new ImportCustomizer();
customizer.addStaticStars("java.lang.Math");                        (1)
conf.addCompilationCustomizers(customizer);
Binding binding = new Binding();
binding.setVariable("score", 2.0d);                                 (2)
GroovyShell shell = new GroovyShell(binding,conf);
Double userScore = (Double) shell.evaluate("abs(cos(1+score))");    (3)
System.out.println("userScore = " + userScore);
1 add an import customizer that will add import static java.lang.Math.* to all scripts
2 make the score variable available to the script
3 execute the script

There are options to cache the scripts, instead of parsing and compiling them each time. Please check the documentation for more details.
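
To give an idea of what such caching amounts to, here is a minimal sketch in plain Java. It is not the Groovy caching API: the class name ScriptCacheSketch and the "compiler" function are placeholders invented for this illustration, and in a real integration the cached value would be whatever compiled form of the script you decide to keep.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Caches the result of an expensive "compile" step, keyed by the script source. */
class ScriptCacheSketch<C> {
    private final Map<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;

    ScriptCacheSketch(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    /** Compiles the source on first use and returns the cached result afterwards. */
    C get(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        // Stand-in for the real parse/compile step (hypothetical: it only upper-cases the text).
        ScriptCacheSketch<String> cache = new ScriptCacheSketch<>(src -> {
            compilations.incrementAndGet();
            return src.toUpperCase();
        });
        cache.get("abs(cos(1+score))");
        cache.get("abs(cos(1+score))"); // served from the cache, no second compilation
        System.out.println("compilations = " + compilations.get());
    }
}
```

Keying on the source text is the simplest choice; if the compiler configuration can vary between calls, it would have to be part of the key as well.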

So far, our script works, but nothing prevents a hacker from executing malicious code. Since we want to use type checking, I would recommend applying the @CompileStatic transformation transparently:

  • it will activate type checking on the script, and we will be able to perform additional checks thanks to the type checking extension

  • it will improve the performance of the script

Adding @CompileStatic transparently is easy. We just have to update the compiler configuration:

ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(CompileStatic.class);
conf.addCompilationCustomizers(astcz);

Now if you try to execute the script again, you will face a compile time error:

Script1.groovy: 1: [Static type checking] - The variable [score] is undeclared.
 @ line 1, column 11.

Script1.groovy: 1: [Static type checking] - Cannot find matching method int#plus(java.lang.Object). Please check if the declared type is right and if the method exists.
 @ line 1, column 9.

2 errors

What happened? If you read the script from a "compiler" point of view, it doesn’t know anything about the "score" variable. You, as a developer, know that it’s a variable of type double, but the compiler cannot infer it. This is precisely what type checking extensions are designed for: you can provide additional information to the compiler, so that compilation passes. In this case, we will want to indicate that the score variable is of type double.

So we will slightly change the way we transparently add the @CompileStatic annotation:

ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
        singletonMap("extensions", singletonList("SecureExtension2.groovy")),
        CompileStatic.class);

This will "emulate" code annotated with @CompileStatic(extensions=['SecureExtension2.groovy']). Of course now we need to write the extension which will recognize the score variable:

unresolvedVariable { var ->                     (1)
   if (var.name == 'score') {                   (2)
      return makeDynamic(var, double_TYPE)      (3)
   }
}
1 in case the type checker cannot resolve a variable
2 if the variable name is score
3 then instruct the compiler to resolve the variable dynamically, and that the type of the variable is double

You can find a complete description of the type checking extension DSL in the Groovy documentation, but what you have here is an example of mixed mode compilation: the compiler is not able to resolve the score variable. You, as the designer of the DSL, know that the variable is in fact found in the binding, and is of type double, so the makeDynamic call is here to tell the compiler: "ok, don’t worry, I know what I am doing, this variable can be resolved dynamically and it will be of type double". That’s it!

First completed "secure" extension

Now it’s time to put this all together. We wrote one type checking extension which is capable of preventing calls on System, and another which is able to resolve the score variable. If we combine both, we have a first, complete, securing type checking extension:

// disallow calls on System
onMethodSelection { expr, methodNode ->
    if (methodNode.declaringClass.name == 'java.lang.System') {
        addStaticTypeError("Method call is not allowed!", expr)
    }
}

// resolve the score variable
unresolvedVariable { var ->
    if (var.name == 'score') {
        return makeDynamic(var, double_TYPE)
    }
}

Don’t forget to update the configuration in your Java class to use the new type checking extension:

ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
        singletonMap("extensions", singletonList("SecureExtension3.groovy")),
        CompileStatic.class);

Execute the code again and it still works. Now, try to do:

abs(cos(1+score));System.exit(-1)

And the script compilation will fail with:

Script1.groovy: 1: [Static type checking] - Method call is not allowed!
 @ line 1, column 19.

1 error

Congratulations, you just wrote your first type checking extension that prevents the execution of malicious code!

Improving configuration of the extension

So far so good, we are able to prevent calls on System, but it is likely that we are going to discover new vulnerabilities, and that we will want to prevent execution of such code. So instead of hardcoding everything in the extension, we will try to make our extension generic and configurable. This is probably the trickiest thing to do, because there’s no direct way to provide context to a type checking extension. Our idea therefore relies on the (ugly) thread locals to pass configuration data to the type checker.

The first thing we’re going to do is to make the variable list configurable. Here is the code on the Java side of things:

public class Sandbox {
    public static final String VAR_TYPES = "sandboxing.variable.types";

    public static final ThreadLocal<Map<String, Object>> COMPILE_OPTIONS = new ThreadLocal<>();         (1)

    public static void main(String[] args) {
        CompilerConfiguration conf = new CompilerConfiguration();
        ImportCustomizer customizer = new ImportCustomizer();
        customizer.addStaticStars("java.lang.Math");
        ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
                singletonMap("extensions", singletonList("SecureExtension4.groovy")),                  (2)
                CompileStatic.class);
        conf.addCompilationCustomizers(astcz);
        conf.addCompilationCustomizers(customizer);

        Binding binding = new Binding();
        binding.setVariable("score", 2.0d);
        try {
            Map<String,ClassNode> variableTypes = new HashMap<String, ClassNode>();                    (3)
            variableTypes.put("score", ClassHelper.double_TYPE);                                       (4)
            Map<String,Object> options = new HashMap<String, Object>();                                (5)
            options.put(VAR_TYPES, variableTypes);                                                     (6)
            COMPILE_OPTIONS.set(options);                                                              (7)
            GroovyShell shell = new GroovyShell(binding, conf);
            Double userScore = (Double) shell.evaluate("abs(cos(1+score));System.exit(-1)");
            System.out.println("userScore = " + userScore);
        } finally {
            COMPILE_OPTIONS.remove();                                                                  (8)
        }
    }
}
1 create a ThreadLocal that will hold the contextual configuration of the type checking extension
2 update the extension to SecureExtension4.groovy
3 variableTypes is a map variable name → variable type
4 so this is where we’re going to add the score variable declaration
5 options is the map that will store our type checking configuration
6 we set the "variable types" value of this configuration map to the map of variable types
7 and assign it to the thread local
8 finally, to avoid memory leaks, it is important to remove the configuration from the thread local

And now, here is how the type checking extension can use this:

import static Sandbox.*

def typesOfVariables = COMPILE_OPTIONS.get()[VAR_TYPES]				(1)

unresolvedVariable { var ->
    if (typesOfVariables[var.name]) {                                   (2)
        return makeDynamic(var, typesOfVariables[var.name])             (3)
    }
}
1 Retrieve the map of variable types from the thread local
2 if an unresolved variable is found in the map of known variables
3 then declare to the type checker that the variable is of the type found in the map

Basically, the type checking extension, because it is executed when the type checker verifies the script, can access the configuration through the thread local. Then, instead of using hard coded names in unresolvedVariable, we can just check that the variable that the type checker doesn’t know about is actually declared in the configuration. If it is, then we can tell it which type it is. Easy!
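
The hand-off through the thread local can be tried in isolation. The following is a rough sketch in plain Java, with no Groovy involved: the class and method names are made up for the illustration, and Class<?> stands in for Groovy's ClassNode. One method plays the host that sets and clears the options, the other plays the extension that reads them.

```java
import java.util.HashMap;
import java.util.Map;

class ThreadLocalOptionsSketch {
    static final String VAR_TYPES = "sandboxing.variable.types";
    static final ThreadLocal<Map<String, Object>> COMPILE_OPTIONS = new ThreadLocal<>();

    /** Plays the role of the extension: looks up the declared type of an unknown variable. */
    static Class<?> resolveVariableType(String name) {
        Map<String, Object> options = COMPILE_OPTIONS.get();
        if (options == null) {
            return null; // nothing configured on this thread
        }
        @SuppressWarnings("unchecked")
        Map<String, Class<?>> types = (Map<String, Class<?>>) options.get(VAR_TYPES);
        return types == null ? null : types.get(name);
    }

    /** Plays the role of the host: configure, run the lookup, always clean up. */
    static Class<?> withVariableTypes(Map<String, Class<?>> types, String variable) {
        Map<String, Object> options = new HashMap<>();
        options.put(VAR_TYPES, types);
        COMPILE_OPTIONS.set(options);
        try {
            return resolveVariableType(variable);
        } finally {
            COMPILE_OPTIONS.remove();
        }
    }

    public static void main(String[] args) {
        Map<String, Class<?>> types = new HashMap<>();
        types.put("score", double.class);
        System.out.println(withVariableTypes(types, "score")); // declared variable
        System.out.println(withVariableTypes(types, "other")); // undeclared variable
        System.out.println(resolveVariableType("score"));      // after cleanup, nothing is configured
    }
}
```

The important part is the try/finally: the options are only visible on the current thread and only for the duration of the call, which is why this trick only works when compilation happens on the thread that set them.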

Now we have to find a way to explicitly declare the list of allowed method calls. It is a bit trickier to find a proper configuration for that, but here is what we came up with.

Configuring a white list of methods

The idea of the whitelist is simple. A method call will be allowed if the method descriptor can be found in the whitelist. This whitelist consists of regular expressions, and the method descriptor consists of the fully-qualified name of the class declaring the method, its name and its parameters. For example, for System.exit, the descriptor would be:

java.lang.System#exit(int)

So let’s see how to update the Java integration part to add this configuration:

public class Sandbox {
    public static final String WHITELIST_PATTERNS = "sandboxing.whitelist.patterns";

    // ...

    public static void main(String[] args) {
        // ...
        try {
            Map<String,ClassNode> variableTypes = new HashMap<String, ClassNode>();
            variableTypes.put("score", ClassHelper.double_TYPE);
            Map<String,Object> options = new HashMap<String, Object>();
            List<String> patterns = new ArrayList<String>();                                    (1)
            patterns.add("java\\.lang\\.Math#");                                                (2)
            options.put(VAR_TYPES, variableTypes);
            options.put(WHITELIST_PATTERNS, patterns);                                          (3)
            COMPILE_OPTIONS.set(options);
            GroovyShell shell = new GroovyShell(binding, conf);
            Double userScore = (Double) shell.evaluate("abs(cos(1+score));System.exit(-1)");
            System.out.println("userScore = " + userScore);
        } finally {
            COMPILE_OPTIONS.remove();
        }
    }
}
1 declare a list of patterns
2 add all methods of java.lang.Math as allowed
3 put the whitelist to the type checking options map

Then on the type checking extension side:

import groovy.transform.CompileStatic
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.MethodNode
import org.codehaus.groovy.ast.Parameter
import org.codehaus.groovy.transform.stc.ExtensionMethodNode

import static Sandbox.*

// prints a type name, using the "Type[]" notation for arrays
@CompileStatic
private static String prettyPrint(ClassNode node) {
    node.isArray() ? "${prettyPrint(node.componentType)}[]" : node.toString(false)
}

@CompileStatic
private static String toMethodDescriptor(MethodNode node) {                                                     (1)
    if (node instanceof ExtensionMethodNode) {
        return toMethodDescriptor(node.extensionMethodNode)
    }
    // descriptor format: declaring.Class#methodName(paramType,paramType)
    def sb = new StringBuilder()
    sb.append(node.declaringClass.toString(false))
    sb.append('#')
    sb.append(node.name)
    sb.append('(')
    sb.append(node.parameters.collect { Parameter it ->
        prettyPrint(it.originType)
    }.join(','))
    sb.append(')')
    sb.toString()
}

def typesOfVariables = COMPILE_OPTIONS.get()[VAR_TYPES]
def whiteList = COMPILE_OPTIONS.get()[WHITELIST_PATTERNS]                                                       (2)

onMethodSelection { expr, MethodNode methodNode ->
    def descr = toMethodDescriptor(methodNode)                                                                  (3)
    if (!whiteList.any { descr =~ it }) {                                                                       (4)
        addStaticTypeError("You tried to call a method which is not allowed, what did you expect?: $descr", expr)   (5)
    }
}

unresolvedVariable { var ->
    if (typesOfVariables[var.name]) {
        return makeDynamic(var, typesOfVariables[var.name])
    }
}
1 this method will generate a method descriptor from a MethodNode
2 retrieve the whitelist of methods from the thread local option map
3 convert a selected method into a descriptor string
4 if the descriptor doesn’t match any of the whitelist entries
5 then throw an error

So if you execute the code again, you will now have a very cool error:

Script1.groovy: 1: [Static type checking] - You tried to call a method which is not allowed, what did you expect?: java.lang.System#exit(int)
 @ line 1, column 19.

1 error

There we are! We now have a type checking extension which handles both the types of the variables that you export in the binding and a whitelist of allowed methods. This is still not perfect, but we’re very close to the final solution! It’s not perfect because we only took care of method calls here, but you have to deal with more than that. For example, properties (like foo.text which is implicitly converted into foo.getText()).
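
The method half of that check, building a descriptor and matching it against the whitelist, can be sketched without the Groovy AST. The example below is only an approximation in plain Java: it uses java.lang.reflect.Method in place of MethodNode, and the class and helper names (WhitelistSketch, describe, isAllowed) are invented for the illustration.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WhitelistSketch {
    /** Builds "declaring.Class#name(paramType,...)" from a reflective method. */
    static String toMethodDescriptor(Method method) {
        String params = Arrays.stream(method.getParameterTypes())
                .map(Class::getTypeName)
                .collect(Collectors.joining(","));
        return method.getDeclaringClass().getName() + "#" + method.getName() + "(" + params + ")";
    }

    /** Convenience lookup that turns the checked reflection exception into an unchecked one. */
    static String describe(Class<?> owner, String name, Class<?>... params) {
        try {
            return toMethodDescriptor(owner.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** A call is allowed when at least one whitelist pattern is found in the descriptor. */
    static boolean isAllowed(String descriptor, List<String> whitelist) {
        return whitelist.stream().anyMatch(p -> Pattern.compile(p).matcher(descriptor).find());
    }

    public static void main(String[] args) {
        List<String> whitelist = Arrays.asList("java\\.lang\\.Math#");
        String exit = describe(System.class, "exit", int.class);
        String cos = describe(Math.class, "cos", double.class);
        System.out.println(exit + " allowed? " + isAllowed(exit, whitelist));
        System.out.println(cos + " allowed? " + isAllowed(cos, whitelist));
    }
}
```

Note that find() accepts a match anywhere in the descriptor, so a pattern like the one above would also accept a class named, say, evil.java.lang.Math. Anchoring the patterns with ^ is worth considering when you write a real whitelist.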

Putting it all together

Dealing with properties is a bit more complicated because the type checker doesn’t have a handler for "property selection" like it does for methods. We can work around that, and if you are interested in seeing the resulting code, check it out below. It’s a type checking extension which is not written exactly as you have seen in this blog post, because it is meant to be precompiled for improved performance. But the idea is exactly the same.
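
To see why properties need their own check, it helps to compare the two descriptors involved. The sketch below is plain Java with invented names (PropertyAccessSketch, example.Document) and a deliberately simplified getter naming rule; it only illustrates that a whitelist entry written for the method form does not cover the property form, and the other way around.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PropertyAccessSketch {
    /** Simplified getter naming: "text" -> "getText" (ignores boolean "is" getters). */
    static String getterName(String property) {
        return "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
    }

    /** Descriptor for a property access, in the owner#property form. */
    static String propertyDescriptor(String ownerType, String property) {
        return ownerType + "#" + property;
    }

    /** Descriptor for the no-argument getter call that the property access stands for. */
    static String getterDescriptor(String ownerType, String property) {
        return ownerType + "#" + getterName(property) + "()";
    }

    static boolean isAllowed(String descriptor, List<String> whitelist) {
        return whitelist.stream().anyMatch(p -> Pattern.compile(p).matcher(descriptor).find());
    }

    public static void main(String[] args) {
        // Hypothetical whitelist that only allows the getText() method call form.
        List<String> whitelist = Arrays.asList("^example\\.Document#getText\\(\\)$");
        String asMethod = getterDescriptor("example.Document", "text");
        String asProperty = propertyDescriptor("example.Document", "text");
        System.out.println(asMethod + " allowed? " + isAllowed(asMethod, whitelist));
        System.out.println(asProperty + " allowed? " + isAllowed(asProperty, whitelist));
    }
}
```

Because the two forms are checked against the same list, the patterns have to be written with both in mind.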

import groovy.transform.CompileStatic
import org.codehaus.groovy.ast.ClassCodeVisitorSupport
import org.codehaus.groovy.ast.ClassHelper
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.MethodNode
import org.codehaus.groovy.ast.Parameter
import org.codehaus.groovy.ast.expr.PropertyExpression
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.transform.sc.StaticCompilationMetadataKeys
import org.codehaus.groovy.transform.stc.ExtensionMethodNode
import org.codehaus.groovy.transform.stc.GroovyTypeCheckingExtensionSupport
import org.codehaus.groovy.transform.stc.StaticTypeCheckingSupport

import static Sandbox.*

class SandboxingTypeCheckingExtension extends GroovyTypeCheckingExtensionSupport.TypeCheckingDSL {

    // prints a type name, using the "Type[]" notation for arrays
    @CompileStatic
    private static String prettyPrint(ClassNode node) {
        node.isArray() ? "${prettyPrint(node.componentType)}[]" : node.toString(false)
    }

    // descriptor format: declaring.Class#methodName(paramType,paramType)
    @CompileStatic
    private static String toMethodDescriptor(MethodNode node) {
        if (node instanceof ExtensionMethodNode) {
            return toMethodDescriptor(node.extensionMethodNode)
        }
        def sb = new StringBuilder()
        sb.append(node.declaringClass.toString(false))
        sb.append('#')
        sb.append(node.name)
        sb.append('(')
        sb.append(node.parameters.collect { Parameter it ->
            prettyPrint(it.originType)
        }.join(','))
        sb.append(')')
        sb.toString()
    }

    @Override
    Object run() {

        // Fetch white list of regular expressions of authorized method calls
        def whiteList = COMPILE_OPTIONS.get()[WHITELIST_PATTERNS]
        def typesOfVariables = COMPILE_OPTIONS.get()[VAR_TYPES]

        onMethodSelection { expr, MethodNode methodNode ->
            def descr = toMethodDescriptor(methodNode)
            if (!whiteList.any { descr =~ it }) {
                addStaticTypeError("You tried to call a method which is not allowed, what did you expect?: $descr", expr)
            }
        }

        unresolvedVariable { var ->
            if (isDynamic(var) && typesOfVariables[var.name]) {
                storeType(var, typesOfVariables[var.name])
                handled = true
            }
        }

        // handling properties (like foo.text) is harder because the type checking extension
        // does not provide a specific hook for this. Harder, but not impossible!

        afterVisitMethod { methodNode ->
            def visitor = new PropertyExpressionChecker(context.source, whiteList)
            visitor.visitMethod(methodNode)
        }
    }

    private class PropertyExpressionChecker extends ClassCodeVisitorSupport {
        private final SourceUnit unit
        private final List<String> whiteList

        PropertyExpressionChecker(final SourceUnit unit, final List<String> whiteList) {
            this.unit = unit
            this.whiteList = whiteList
        }

        @Override
        protected SourceUnit getSourceUnit() {
            unit
        }

        @Override
        void visitPropertyExpression(final PropertyExpression expression) {
            super.visitPropertyExpression(expression)

            ClassNode owner = expression.objectExpression.getNodeMetaData(StaticCompilationMetadataKeys.PROPERTY_OWNER)
            if (owner) {
                if (expression.spreadSafe && StaticTypeCheckingSupport.implementsInterfaceOrIsSubclassOf(owner, classNodeFor(Collection))) {
                    owner = typeCheckingVisitor.inferComponentType(owner, ClassHelper.int_TYPE)
                }
                def descr = "${prettyPrint(owner)}#${expression.propertyAsString}"
                if (!whiteList.any { descr =~ it }) {
                    addStaticTypeError("Property is not allowed: $descr", expression)
                }
            }
        }
    }
}

And a final version of the sandbox that includes assertions to make sure that we catch all cases:

public class Sandbox {
    public static final String WHITELIST_PATTERNS = "sandboxing.whitelist.patterns";
    public static final String VAR_TYPES = "sandboxing.variable.types";

    public static final ThreadLocal<Map<String, Object>> COMPILE_OPTIONS = new ThreadLocal<Map<String, Object>>();

    public static void main(String[] args) {
        CompilerConfiguration conf = new CompilerConfiguration();
        ImportCustomizer customizer = new ImportCustomizer();
        customizer.addStaticStars("java.lang.Math");
        ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
                singletonMap("extensions", singletonList("SandboxingTypeCheckingExtension.groovy")),
                CompileStatic.class);
        conf.addCompilationCustomizers(astcz);
        conf.addCompilationCustomizers(customizer);

        Binding binding = new Binding();
        binding.setVariable("score", 2.0d);
        try {
            Map<String, ClassNode> variableTypes = new HashMap<String, ClassNode>();
            variableTypes.put("score", ClassHelper.double_TYPE);
            Map<String, Object> options = new HashMap<String, Object>();
            List<String> patterns = new ArrayList<String>();
            // allow method calls on Math
            patterns.add("java\\.lang\\.Math#");
            // allow constructors calls on File
            patterns.add("File#<init>");
            // because we let the user call each/times/...
            patterns.add("org\\.codehaus\\.groovy\\.runtime\\.DefaultGroovyMethods");
            options.put(VAR_TYPES, variableTypes);
            options.put(WHITELIST_PATTERNS, patterns);
            COMPILE_OPTIONS.set(options);
            GroovyShell shell = new GroovyShell(binding, conf);
            Object result;
            try {
                result = shell.evaluate("Eval.me('1')"); // error
                assert false;
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }
            try {
                result = shell.evaluate("System.exit(-1)"); // error
                assert false;
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }
            try {
                result = shell.evaluate("((Object)Eval).me('1')"); // error
                assert false;
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }

            try {
                result = shell.evaluate("new File('/etc/passwd').getText()"); // getText is not allowed
                assert false;
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }

            try {
                result = shell.evaluate("new File('/etc/passwd').text");  // getText is not allowed
                assert false;
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }

            Double userScore = (Double) shell.evaluate("abs(cos(1+score))");
            System.out.println("userScore = " + userScore);
        } finally {
            COMPILE_OPTIONS.remove();
        }
    }
}

This post has explored why Groovy is interesting as a platform for scripting on the JVM. It introduced various mechanisms for integration, and showed that this comes at the price of security. However, we illustrated some concepts like compilation customizers that make it easier to sandbox the environment of execution of scripts. The current list of customizers available in the Groovy distribution, and the currently available sandboxing projects in the wild, are not sufficient to guarantee security of execution of scripts in the general case (depending, of course, on your users and where the scripts come from).

We then illustrated how you could, if you are ready to pay the price of losing some of the dynamic features of the language, properly work around those limitations through type checking extensions. Those type checking extensions are so powerful that you can even introduce your own error messages during the compilation of scripts. Finally, by doing this and caching your scripts, you will also benefit from dramatic performance improvements in script execution.

Finally, the sandboxing mechanism that we illustrated is not a replacement for the SecureASTCustomizer. We recommend that you actually use both, because they work on different levels: the secure AST customizer will work on the grammar level, allowing you to restrict some language constructs (for example preventing the creation of closures or classes inside the script), while the type checking extension will work after type inference (allowing you to reason on inferred types rather than declared types).

Last but not least, the solution that I described here is incomplete. It is not available in Groovy core. As I will have less time to work on Groovy, I would be very glad if someone or some people improve this solution and make a pull request so that we can have something!


Who is Groovy?

04 March 2015

Tags: groovy apache OSS

With all the changes that the Groovy project has seen since the beginning of the year, I thought it was a good time to summarize its history. In particular, with the end of sponsorship from Pivotal, as well as Guillaume Laforge announcing he is joining Restlet, a lot of people state that Groovy is done. It will be the occasion to talk about the history of the project, both in terms of community and sponsorship.

First of all, Groovy is, and will remain, alive. Groovy is a community project and we are pleased to announce that Groovy has started the process to join the Apache Software Foundation. The community deserved it, and even if it will mean some adaptations on our side, we think that the ASF will be a great fit for the project.

To build the statistics you will read in this post, I have used a Groovy script (of course!) that takes the number of commits as its reference. This is far from being a perfect measure: some commits just fix typos while others are full new features, patches submitted before Git didn’t carry the author information, and we maintain multiple branches of Groovy, which requires a lot of work that the number doesn’t show. But it gives an idea… And because people think that the end of sponsorship may be equivalent to death, I separated commits into two categories: sponsored and community. Sponsored commits are commits which are likely to have been made by someone directly paid to contribute to Groovy. We will see how this proportion evolved over time.
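
The arithmetic behind those figures is simple. The script I used is not reproduced here; what follows is only a small Java sketch of the same kind of computation, fed with the 2003 figures from this post. The class and method names are made up, and the percentages are truncated to integers, which is what the numbers in this post appear to use.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class CommitStatsSketch {
    /** Integer percentage of part in total, truncated rather than rounded. */
    static int percent(int part, int total) {
        return total == 0 ? 0 : part * 100 / total;
    }

    /** Sum of all commits in the map. */
    static int sum(Map<String, Integer> commitsByAuthor) {
        return commitsByAuthor.values().stream().mapToInt(Integer::intValue).sum();
    }

    /** Sum of the commits made by authors flagged as sponsored. */
    static int sponsored(Map<String, Integer> commitsByAuthor, Set<String> sponsoredAuthors) {
        return commitsByAuthor.entrySet().stream()
                .filter(e -> sponsoredAuthors.contains(e.getKey()))
                .mapToInt(Map.Entry::getValue)
                .sum();
    }

    public static void main(String[] args) {
        // Per-author commit counts for 2003, taken from this post.
        Map<String, Integer> commits = new LinkedHashMap<>();
        commits.put("James Strachan", 374);
        commits.put("Bob McWhirter", 76);
        commits.put("Sam Pullara", 23);
        commits.put("Kasper Nielsen", 2);
        commits.put("Aslak Hellesoy", 1);
        Set<String> sponsoredAuthors = Collections.emptySet(); // nobody was sponsored in 2003

        int total = sum(commits);
        int community = total - sponsored(commits, sponsoredAuthors);
        System.out.println("Total: " + total + " commits Community: " + community
                + " commits (" + percent(community, total) + "%)");
        commits.forEach((author, count) ->
                System.out.println(author + " : " + count + " commits (" + percent(count, total) + "%)"));
    }
}
```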

If you think Groovy is dead, please read the following carefully. Let’s start our journey back in time!

2003: A dynamic language for the JVM

Total: 476 commits Community: 476 commits (100%)

  1. James Strachan : 374 commits (78%)

  2. Bob McWhirter : 76 commits (15%)

  3. Sam Pullara : 23 commits (4%)

  4. Kasper Nielsen : 2 commits (0%)

  5. Aslak Hellesoy : 1 commits (0%)

2003 is the inception year of Groovy. James Strachan, in August 2003, wanted to create a dynamic language for the JVM, inspired by Ruby, but closer to Java. Groovy was born, and the idea never changed over time: Groovy is the perfect companion for Java, a language which can be very close to Java in terms of syntax but also removes a lot of its boilerplate. Bob McWhirter is a famous name in the JVM world, he is now Director of Polyglot at Red Hat, so you can see that when he started contributing to the language back then, there was already a story for him! Sam Pullara is another very smart guy in the JVM world and he worked for top companies like BEA, Yahoo! or Twitter to name a few. He is a technical reviewer for the JavaOne conference.

2004: Guillaume Laforge joins the project

Total: 871 commits Community: 871 commits (100%)

  1. James Strachan : 495 commits (56%)

  2. Guillaume Laforge : 101 commits (11%)

  3. John Wilson : 70 commits (8%)

  4. Sam Pullara : 66 commits (7%)

  5. Jeremy Rayner : 33 commits (3%)

  6. Chris Poirier : 28 commits (3%)

  7. Bing Ran : 21 commits (2%)

  8. Steve Goetze : 12 commits (1%)

  9. John Stump : 10 commits (1%)

  10. Russel Winder : 8 commits (0%)

  11. Zohar Melamed : 8 commits (0%)

  12. Jochen Theodorou : 7 commits (0%)

  13. Damage Control : 4 commits (0%)

  14. Bob McWhirter : 3 commits (0%)

  15. Christiaan ten Klooster : 3 commits (0%)

  16. Yuri Schimke : 2 commits (0%)

Groovy is hosted at Codehaus (which has just announced its retirement) and 2004 sees the appearance of famous names of the community. In particular, you can already see Guillaume Laforge and Jochen Theodorou. Both of them still work directly on the project as of today. John Wilson started contributing the famous XML support of Groovy, and you can also note names like Russel Winder of Gant and GPars fame. Jeremy Rayner’s work is famous in the Groovy community since he wrote the first versions of the Groovy grammar using Antlr.

2005: Jochen Theodorou joins the project

Total: 934 commits Community: 934 commits (100%)

  1. Jochen Theodorou : 244 commits (26%)

  2. James Strachan : 162 commits (17%)

  3. Pilho Kim : 104 commits (11%)

  4. John Wilson : 79 commits (8%)

  5. Guillaume Laforge : 75 commits (8%)

  6. Dierk Koenig : 70 commits (7%)

  7. Jeremy Rayner : 62 commits (6%)

  8. Christian Stein : 48 commits (5%)

  9. Alan Green : 20 commits (2%)

  10. Russel Winder : 20 commits (2%)

  11. Martin C. Martin : 14 commits (1%)

  12. Sam Pullara : 10 commits (1%)

  13. John Rose : 8 commits (0%)

  14. Hein Meling : 7 commits (0%)

  15. Scott Stirling : 6 commits (0%)

  16. Franck Rasolo : 5 commits (0%)

I would tend to think that 2005 is the year when Jochen Theodorou took the technical lead of Groovy. In 2005, he becomes the most prolific contributor, even beyond the creator of the language himself. Dierk Koenig makes an appearance here: he is known for his work on GPars, but also for the reference book for Groovy: Groovy in Action.

2006: Rise of Paul King

Total: 480 commits Community: 480 commits (100%)

  1. Jochen Theodorou : 221 commits (46%)

  2. John Wilson : 56 commits (11%)

  3. Paul King : 54 commits (11%)

  4. Guillaume Laforge : 47 commits (9%)

  5. Dierk Koenig : 37 commits (7%)

  6. Jeremy Rayner : 23 commits (4%)

  7. Russel Winder : 14 commits (2%)

  8. Guillaume Alleon : 12 commits (2%)

  9. Joachim Baumann : 6 commits (1%)

  10. Martin C. Martin : 4 commits (0%)

  11. Graeme Rocher : 2 commits (0%)

  12. Marc Guillemot : 2 commits (0%)

  13. Christian Stein : 1 commits (0%)

  14. Steve Goetze : 1 commits (0%)

2006 is a very calm year for Groovy in terms of code production. James, the creator of the language, has already disappeared from the contributors list, and will not contribute anymore. Guillaume Laforge, in agreement with the other contributors, takes the project lead (he is still the lead today).

With half as many commits as in 2007, in retrospect, I would say that this was a critical year: either the project would die, or it would become what it is today. And my personal feeling is that the person who saved Groovy just appeared in the contributors list: Paul King. Paul is undoubtedly the most active contributor to Groovy. He wrote a lot of the Groovy Development Kit, that is to say the APIs without which a language would be nothing. Having a nice language is one thing, having proper APIs and libraries that unleash its full potential is another. Paul King did it. Look at his ranking here: 3rd place. You will never see him ranked lower than that. And guess what? Paul is not paid to do this. He runs his own business and if you want to work with a Groovy expert, he’s probably the best.

Joachim Baumann is a name some people would recognize: he is still working with Groovy and one of the most regular contributors, with the Windows installer. Joachim takes time, for each Groovy release, to produce a Windows installer, which today we are still not capable of handling automatically.

2007: Groovy 1.0

Total: 1469 commits Sponsored: 423 commits (28%) Community: 1046 commits (71%)

  1. Paul King : 447 commits (30%)

  2. Jason Dillon : 265 commits (18%)

  3. Jochen Theodorou (Sponsored) : 242 commits (16%)

  4. Danno Ferrin : 101 commits (6%)

  5. Alex Tkachman (Sponsored) : 87 commits (5%)

  6. Graeme Rocher (Sponsored) : 61 commits (4%)

  7. Russel Winder : 46 commits (3%)

  8. Marc Guillemot : 36 commits (2%)

  9. Andres Almiray : 34 commits (2%)

  10. Guillaume Laforge (Sponsored) : 33 commits (2%)

  11. Jeremy Rayner : 26 commits (1%)

  12. Alexandru Popescu : 24 commits (1%)

  13. John Wilson : 22 commits (1%)

  14. Joachim Baumann : 21 commits (1%)

  15. Jeff Brown : 8 commits (0%)

  16. Dierk Koenig : 6 commits (0%)

  17. Martin C. Martin : 6 commits (0%)

  18. Guillaume Alleon : 4 commits (0%)

2007 is an important year in the history of Groovy. On January 2nd, Groovy 1.0 is out. Paul King ranks #1 for the first time, and will remain on top for a long time. This year also sees the creation of G2One, the first company built specifically for Groovy and Grails, by Guillaume Laforge, Graeme Rocher and Alex Tkachman. Both Graeme and Alex make their first appearance in the contributors graph, and both of them made significant contributions to the Groovy ecosystem: Graeme is famous for co-creating the Grails framework, and is still the lead of the project, while Alex is the one who contributed major performance improvements to the Groovy runtime (call site caching) and first experimented with a static compiler for Groovy (Groovy++).

Danno Ferrin contributed what is still one of my personal favorite features of Groovy, AST transformations, and probably one of the reasons I got paid to work on Groovy so thank you Danno! Andrés Almiray, listed here for the first time, is famous for the Griffon framework, a Grails-like framework for desktop applications which is still actively developed. He spent a lot of time improving the Swing support in Groovy.

Starting from 2007, you will see that the sponsored ratio of commits is changing. People who were employed by G2One fall into that category. As you can see, 2007 is more than important for Groovy, it is its second birth. And to conclude, Groovy won first prize at the JAX 2007 innovation awards.

2008: The G2One era

Total: 1069 commits Sponsored: 287 commits (26%) Community: 782 commits (73%)

  1. Paul King : 445 commits (41%)

  2. Danno Ferrin : 176 commits (16%)

  3. Jochen Theodorou (Sponsored) : 126 commits (11%)

  4. Alex Tkachman (Sponsored) : 125 commits (11%)

  5. Guillaume Laforge (Sponsored) : 33 commits (3%)

  6. Jim White : 32 commits (2%)

  7. Russel Winder : 31 commits (2%)

  8. Martin Kempf : 22 commits (2%)

  9. Roshan Dawrani : 19 commits (1%)

  10. Jeremy Rayner : 14 commits (1%)

  11. Martin C. Martin : 12 commits (1%)

  12. Jason Dillon : 9 commits (0%)

  13. Andres Almiray : 8 commits (0%)

  14. Thom Nichols : 5 commits (0%)

  15. Graeme Rocher (Sponsored) : 3 commits (0%)

  16. Jeff Brown : 3 commits (0%)

  17. John Wilson : 3 commits (0%)

  18. James Williams : 1 commits (0%)

  19. Marc Guillemot : 1 commits (0%)

  20. Vladimir Vivien : 1 commits (0%)

In 2008, Paul King still ranks #1 and you can see that the people who were sponsored by G2One were actually not the main contributors. Actually, most of them did consulting to pay salaries, which doesn’t leave much time to contribute to the language. Fortunately, a great project such as Groovy can rely on its community! Guillaume, Graeme and Alex were looking for an opportunity to spend more time on actual development, and it happened in November 2008 when G2One got acquired by SpringSource.

Some of the contributors you see in this list are still actively using Groovy or contributing: Jim White for example is famous for his contributions on the scripting sides of the language. Roshan Dawrani is one of the few guys capable of opening cryptic code and fixing bugs. Jeff Brown is a name you should know, since he is now a key member of the Grails team.

2009: milestones and the inappropriate quote

Total: 835 commits Sponsored: 183 commits (21%) Community: 652 commits (78%)

  1. Paul King : 342 commits (40%)

  2. Roshan Dawrani : 128 commits (15%)

  3. Jochen Theodorou (Sponsored) : 101 commits (12%)

  4. Alex Tkachman (Sponsored) : 41 commits (4%)

  5. Guillaume Laforge (Sponsored) : 40 commits (4%)

  6. Jason Dillon : 31 commits (3%)

  7. Jim White : 31 commits (3%)

  8. Danno Ferrin : 24 commits (2%)

  9. Peter Niederwieser : 23 commits (2%)

  10. Hamlet D’Arcy : 18 commits (2%)

  11. Russel Winder : 14 commits (1%)

  12. Martin C. Martin : 13 commits (1%)

  13. Thom Nichols : 13 commits (1%)

  14. Andres Almiray : 12 commits (1%)

  15. Vladimir Vivien : 3 commits (0%)

  16. Graeme Rocher (Sponsored) : 1 commits (0%)

2009 is another important year concluding with the release of Groovy 1.7, the first version of Groovy supporting inner classes or the famous power asserts from Peter Niederwieser. If you know Groovy, you must know Peter, the father of the famous Spock testing framework which just reached 1.0!

Hamlet D’Arcy contributed a lot in terms of code quality, but also became the first specialist of AST transformations. 2009 is also the year I started to use Groovy, as a user. I never stopped, and I actually started contributing back then. At that time, Groovy was still using Subversion (we’re now using Git like all the cool kids), so contributions went the good old patch way, losing authorship.

This is also the year when James Strachan wrote a very famous quote about Groovy. This quote is probably the most inappropriately used quote about Groovy of all time, because it was written by its creator, but remember that James left the project in 2005!

I can honestly say if someone had shown me the Programming in Scala book by Martin Odersky, Lex Spoon & Bill Venners back in 2003 I’d probably have never created Groovy.
on his blog
— James Strachan

First of all, James says nothing about the language itself here. He had already left the project and says that if he had known about Scala before, he wouldn’t have created Groovy. Today I am very happy that he didn’t know about it, or we would have missed an incredibly powerful language. Groovy today is nothing like what it was when James left the project, thanks to the leadership of Guillaume Laforge and incredibly talented people like Paul King, Jochen Theodorou and all the contributors listed on this page. Groovy and Scala both have their communities, but also different use cases. I wouldn’t sell one for the other…

At the end of 2009, another important milestone occurred for the project, with VMware acquiring SpringSource.

2010: DSLs all the way

Total: 894 commits Sponsored: 189 commits (21%) Community: 705 commits (78%)

  1. Paul King : 443 commits (49%)

  2. Roshan Dawrani : 134 commits (14%)

  3. Jochen Theodorou (Sponsored) : 96 commits (10%)

  4. Guillaume Laforge (Sponsored) : 93 commits (10%)

  5. Hamlet D’Arcy : 71 commits (7%)

  6. Alex Tkachman : 28 commits (3%)

  7. Peter Niederwieser : 19 commits (2%)

  8. Andres Almiray : 7 commits (0%)

  9. Jason Dillon : 1 commits (0%)

  10. Russel Winder : 1 commits (0%)

  11. Thom Nichols : 1 commits (0%)

2010 is a pretty stable year for Groovy. Groovy reaches 1.8 in 2010 with important features for its incredible DSL design capabilities. With command chain expressions, native JSON support and performance improvements, Groovy sets the bar very high in terms of integration in the Java ecosystem. Today, no other JVM language is as simple as Groovy to integrate with Java. With cross-compilation and the use of the very same class model, Groovy is at that time the best language for scripting on the JVM. It is so good that a lot of people start to see it as a better Java and want to use it as a first class language. However, being dynamic, Groovy is still a problem for a category of users…

2011: Time to move to GitHub

Total: 841 commits Sponsored: 514 commits (61%) Community: 327 commits (38%)

  1. Cédric Champeau (Sponsored) : 252 commits (29%)

  2. Paul King : 212 commits (25%)

  3. Jochen Theodorou (Sponsored) : 163 commits (19%)

  4. Guillaume Laforge (Sponsored) : 98 commits (11%)

  5. Jochen : 44 commits (5%)

  6. Hamlet D’Arcy : 33 commits (3%)

  7. Roshan Dawrani : 26 commits (3%)

  8. Andres Almiray : 1 commits (0%)

  9. Andrew Eisenberg : 3 commits (0%)

  10. Alex Tkachman : 2 commits (0%)

  11. Bobby Warner : 1 commits (0%)

  12. Colin Harrington : 1 commits (0%)

  13. Dierk Koenig : 1 commits (0%)

  14. Dirk Weber : 1 commits (0%)

  15. John Wagenleitner : 1 commits (0%)

  16. Lari Hotari (Sponsored) : 1 commits (0%)

  17. Peter Niederwieser : 1 commits (0%)

In 2011, I became a committer to the Groovy project. As I said, I had contributed several fixes or features for Groovy 1.8, but for the first time, I was a committer and able to push changes to the codebase without having to ask permission. So this is basically the first time you see my name on the contributors list, but you can see that I am ranking #1, and I have never lost that ranking since then. It surprised me too, but there is a very good reason for that. In October 2011, in addition to being a committer, I also started being paid to work on Groovy. Full-time. I entered the club of lucky people being paid to work on open-source software. It was truly a dream, and I will never be thankful enough to Guillaume Laforge for giving me this opportunity. He changed my life and I think I became a better developer thanks to him. VMware was my employer back then, and while I had never worked on a language before, Guillaume trusted my skills and offered me the chance to work on something that would dramatically change the language: a static type checker.

I also worked on the infrastructure of the language, starting with the migration to GitHub. It was an important move to make: as you can see, there was a very limited set of committers to Groovy. With GitHub, we had the tool we needed to increase the size of our community, and from the numbers that follow, I think it’s a success.

2012: Groovy 2 and static compilation

  1. Cédric Champeau (Sponsored) : 515 commits (46%)

  2. Paul King : 249 commits (22%)

  3. Jochen Theodorou (Sponsored) : 169 commits (15%)

  4. Guillaume Laforge (Sponsored) : 74 commits (6%)

  5. PascalSchumacher : 12 commits (1%)

  6. Peter Niederwieser : 11 commits (0%)

  7. René Scheibe : 11 commits (0%)

  8. Andre Steingress : 9 commits (0%)

  9. John Wagenleitner : 7 commits (0%)

  10. Peter Ledbrook : 6 commits (0%)

  11. Andres Almiray : 6 commits (0%)

  12. Adrian Nistor : 5 commits (0%)

  13. Tim Yates : 5 commits (0%)

  14. Baruch Sadogursky : 4 commits (0%)

  15. Andrew Eisenberg : 3 commits (0%)

  16. Rich Freedman : 3 commits (0%)

  17. Stephane Maldini : 3 commits (0%)

  18. Andrew Taylor : 2 commits (0%)

  19. Jeff Brown : 2 commits (0%)

  20. Luke Daley : 2 commits (0%)

  21. Tiago Fernandez : 2 commits (0%)

  22. Andrey Bloschetsov : 1 commits (0%)

  23. Johnny Wey : 1 commits (0%)

  24. Kenneth Kousen : 1 commits (0%)

  25. Mathieu Bruyen : 1 commits (0%)

  26. Paul Bakker : 1 commits (0%)

  27. Paulo Poiati : 1 commits (0%)

  28. Sean Flanigan : 1 commits (0%)

  29. Suk-Hyun Cho : 1 commits (0%)

  30. Vladimir Orany : 1 commits (0%)

2012 is one of the most important years for the language. It was the year Groovy 2.0 was released. As you can see, I am still ranking #1 and Paul King, an unpaid contributor, is #2. This tells you the importance of community! Groovy 2 is a major change in the language, because it introduced both optional type checking and static compilation. For the first time, Groovy was able to provide at compile time the same level of feedback that Java would. Some people wanted to kill me for having introduced that into the language. The truth is that it wasn’t my decision, but in retrospect, I am very happy with what the language is now. Without this, some people would have abandoned Groovy in favor of other JVM languages like Scala, while now in Groovy you can have the same level of performance as Java, with type safety, powerful type inference, extension methods, functional style programming and without the boilerplate. And it’s optional. I don’t know of any other language that allows this, especially when you take type checking extensions into account, a feature that allows Groovy to go far beyond what Java and other languages offer in terms of type safety or static compilation.

2012 also sees the appearance of Pascal Schumacher, a silent but very active Groovy committer. Since 2012, Pascal has been doing an amazing job helping us filter JIRA issues, writing bugfixes, reviewing pull requests and, lately, writing documentation.

2013: Documentation effort and explosion of contributions

  1. Cédric Champeau (Sponsored) : 244 commits (22%)

  2. Paul King : 188 commits (17%)

  3. PascalSchumacher : 180 commits (16%)

  4. Jochen Theodorou (Sponsored) : 96 commits (8%)

  5. Thibault Kruse : 84 commits (7%)

  6. Guillaume Laforge (Sponsored) : 54 commits (4%)

  7. Andrey Bloschetsov : 43 commits (3%)

  8. Andre Steingress : 36 commits (3%)

  9. Pascal Schumacher : 27 commits (2%)

  10. Tim Yates : 24 commits (2%)

  11. René Scheibe : 12 commits (1%)

  12. kruset : 12 commits (1%)

  13. Martin Hauner : 8 commits (0%)

  14. Andres Almiray : 8 commits (0%)

  15. Larry Jacobson : 4 commits (0%)

  16. John Wagenleitner : 6 commits (0%)

  17. Paolo Di Tommaso : 6 commits (0%)

  18. Jeff Scott Brown (Sponsored) : 5 commits (0%)

  19. Masato Nagai : 5 commits (0%)

  20. Jochen Eddelbüttel : 3 commits (0%)

  21. hbaykuslar : 3 commits (0%)

  22. shalecraig : 3 commits (0%)

  23. Andrew Eisenberg : 2 commits (0%)

  24. Jacopo Cappellato : 2 commits (0%)

  25. Peter Niederwieser : 2 commits (0%)

  26. Rafael Luque : 2 commits (0%)

  27. Vladimir Orany : 1 commits (0%)

  28. saschaklein : 2 commits (0%)

  29. seanjreilly : 2 commits (0%)

  30. upcrob : 2 commits (0%)

  31. Adrian Nistor : 1 commits (0%)

  32. Alan Thompson : 1 commits (0%)

  33. Alessio Stalla : 1 commits (0%)

  34. DJBen : 1 commits (0%)

  35. Eric Dahl : 1 commits (0%)

  36. Ingo Hoffmann : 1 commits (0%)

  37. JBaruch : 1 commits (0%)

  38. Jacob Aae Mikkelsen : 1 commits (0%)

  39. Jim White : 1 commits (0%)

  40. John Engelman : 1 commits (0%)

  41. Jon Schneider : 1 commits (0%)

  42. Karel Piwko : 1 commits (0%)

  43. Kenneth Endfinger : 1 commits (0%)

  44. Kohsuke Kawaguchi : 1 commits (0%)

  45. Luke Kirby : 1 commits (0%)

  46. Michal Mally : 1 commits (0%)

  47. Miro Bezjak : 1 commits (0%)

  48. Olivier Croquette : 1 commits (0%)

  49. Rob Upcraft : 1 commits (0%)

  50. Sergey Egorov : 1 commits (0%)

  51. Stefan Armbruster : 1 commits (0%)

  52. Yasuharu NAKANO : 1 commits (0%)

While we continued to improve Groovy, 2013 was very important for the community. You can start to see the GitHub effect here, with many more contributors than before. It is impressive to see the difference between before 2011 and after. The number of contributors is continuously growing. In 2013, 63% of commits came from the community!

In February 2013, we also launched a big new project: the documentation and website overhaul. It is incredible to think that this effort is still incomplete, but when you consider that the old wiki has more than a thousand pages of content (often outdated), you can imagine what effort it takes to rewrite the documentation. Fortunately, we’re close to filling the gap now, and with the demise of Codehaus, we officially launched our new website where you can see the result of this work.

I also started working on Android support during 2013, with a first overview given at GR8Conf 2014, and continued working on improving the infrastructure, with Bintray, TeamCity and Gradle. And Pivotal was born, out of EMC and VMware. Groovy and Grails, along with the Spring Framework, became part of this new company, which is still paying me today to work on Groovy (and I, we, should be very thankful for this).

2014: Towards Android support

  1. Cédric Champeau (Sponsored) : 446 commits (37%)

  2. Paul King : 261 commits (22%)

  3. Jochen Theodorou (Sponsored) : 85 commits (7%)

  4. Guillaume Laforge (Sponsored) : 61 commits (5%)

  5. Thibault Kruse : 54 commits (4%)

  6. Pascal Schumacher : 47 commits (3%)

  7. Jim White : 26 commits (2%)

  8. Yu Kobayashi : 18 commits (1%)

  9. Andre Steingress : 16 commits (1%)

  10. Richard Hightower : 3 commits (0%)

  11. James Northrop : 11 commits (0%)

  12. Kenneth Endfinger : 9 commits (0%)

  13. Tomek Janiszewski : 9 commits (0%)

  14. Matias Bjarland : 8 commits (0%)

  15. Tobia Conforto : 8 commits (0%)

  16. Michael Schuenck : 7 commits (0%)

  17. Sargis Harutyunyan : 7 commits (0%)

  18. Andrey Bloschetsov : 6 commits (0%)

  19. Craig Andrews : 5 commits (0%)

  20. Kent : 5 commits (0%)

  21. Paolo Di Tommaso : 5 commits (0%)

  22. Peter Ledbrook : 5 commits (0%)

  23. Sergey Egorov : 5 commits (0%)

  24. Yasuharu Nakano : 5 commits (0%)

  25. Andrew Hamilton : 4 commits (0%)

  26. Lari Hotari (Sponsored) : 4 commits (0%)

  27. Bloshchetsov Andrey Evgenyevich : 3 commits (0%)

  28. Johannes Link : 3 commits (0%)

  29. Keegan Witt : 3 commits (0%)

  30. Tim Yates : 3 commits (0%)

  31. anto_belgin : 3 commits (0%)

  32. Baruch Sadogursky : 2 commits (0%)

  33. Dan Allen : 2 commits (0%)

  34. Jan Sykora : 2 commits (0%)

  35. John Wagenleitner : 2 commits (0%)

  36. Luke Kirby : 2 commits (0%)

  37. Martin Stockhammer : 2 commits (0%)

  38. UEHARA Junji : 2 commits (0%)

  39. Vihang D : 2 commits (0%)

  40. Andres Almiray : 2 commits (0%)

  41. Andy Hamilton : 1 commits (0%)

  42. Bobby Warner : 1 commits (0%)

  43. Carsten Lenz : 1 commits (0%)

  44. Chris Earle : 1 commits (0%)

  45. David Avenante : 1 commits (0%)

  46. David Nahodil : 1 commits (0%)

  47. David Tiselius : 1 commits (0%)

  48. Dimitar Dimitrov : 1 commits (0%)

  49. Grant McConnaughey : 1 commits (0%)

  50. Jeff Sheets : 1 commits (0%)

  51. Jess Sightler : 1 commits (0%)

  52. Logan Gorence : 1 commits (0%)

  53. Luke Daley : 1 commits (0%)

  54. Manuel Prinz : 1 commits (0%)

  55. Marc Guillemot : 1 commits (0%)

  56. Marcin Grzejszczak : 1 commits (0%)

  57. Nathan Mische : 1 commits (0%)

  58. Peter Swire : 1 commits (0%)

  59. Sagar Sane : 1 commits (0%)

  60. Stephen Mallette : 1 commits (0%)

  61. Tobias Schulte : 1 commits (0%)

  62. Wil Selwood : 1 commits (0%)

  63. davidmichaelkarr : 1 commits (0%)

  64. fintelia : 1 commits (0%)

  65. kruset : 1 commits (0%)

  66. paul-bjorkstrand : 1 commits (0%)

2014 was a difficult year. We had a lot of work to do on the documentation side, new features to deliver (traits) and an important topic we definitely wanted to highlight: Android support. This took longer than expected, but in the end, the new Groovy 2.4 release delivers it. We’re lucky to have half of the commits coming from the community here. In particular, lots of people helped us with the documentation. And it wasn’t easy, because our documentation requires that every snippet of code that appears in the docs belongs to a unit test, to make sure that the documentation is always up-to-date.

Meanwhile, at the end of the year, we learnt from Pivotal that they would stop sponsoring our jobs. It means that Guillaume Laforge, Jochen Theodorou and myself, for the Groovy team, plus Graeme Rocher, Jeff Brown and Lari Hotari, for the Grails team, were losing both our jobs and our full-time work on the projects at the same time. This wasn’t really a surprise, and I am very happy I could work for so long on Groovy, full time, but as I said in a previous post, I also hope I will still be able to do that, because you can see from the numbers and features that it matters. In case you are wondering, we are still in discussions with several potential sponsors.

2015: Your story

Total: 178 commits Sponsored: 81 commits (45%) Community: 97 commits (54%)

  1. Cédric Champeau (Sponsored) : 69 commits (38%)

  2. Pascal Schumacher : 59 commits (33%)

  3. Jochen Theodorou (Sponsored) : 12 commits (6%)

  4. Paul King : 12 commits (6%)

  5. JBrownVisualSpection : 7 commits (3%)

  6. Yu Kobayashi : 3 commits (1%)

  7. Christoph Frick : 2 commits (1%)

  8. Kamil Szymanski : 2 commits (1%)

  9. Michael Schuenck : 2 commits (1%)

  10. Sean Gilligan : 2 commits (1%)

  11. Sergey Egorov : 2 commits (1%)

  12. Thibault Kruse : 2 commits (1%)

  13. Andy Wilkinson : 1 commits (0%)

  14. Maksym Stavytskyi : 1 commits (0%)

  15. Mario Garcia : 1 commits (0%)

  16. Radovan Synek : 1 commits (0%)

2015 will be another important year. It’s going to be huge for the community. Guillaume Laforge announced that he was joining Restlet, so for the first time since 2007 he will not be employed full time to work on Groovy, but I don’t expect this to have a big impact on the language development itself: as you can see from the numbers, about half of the commits already come from the community, and Guillaume hasn’t contributed much code lately. He was instead the lead of the project, the one who made decisions, the one speaking about the project and talking to and leading the community. He was the voice. It was a hard job, and a very important one for Groovy. Guillaume is still the lead of the project today, and he will continue to contribute to the language, but I know from him that he wanted to be able to write more code and put Groovy into action in a new project.

With the end of Pivotal’s sponsorship, the demise of Codehaus and Guillaume’s decision, it became even more important to move Groovy to a foundation where it will be able to live with or without us. I honestly have no idea where I will be working a few weeks from now. I sincerely hope I will still be able to contribute to the language full time, but let’s be clear: today, it is very unlikely that this is going to happen. That makes it very important for the project to grow the community even more. We had more than 4.5 million downloads last year. This is huge. And with Android support, I see a lot of potential, even if we face tough competition from other languages and from people being paid to develop them. The Apache Software Foundation is going to help us secure the future of the language and build a community. I am proud of what you have done, collectively, and this is not over. Groovy is ready for a rebirth under the Apache umbrella!

More than ever, the future of Groovy is you.

To conclude this post, here are the top 10 contributors, in terms of number of commits, of Groovy, for the past 12 years. Congratulations Paul and thanks to our 100+ contributors!

  1. Paul King : 2653 commits (23%)

  2. Jochen Theodorou : 1562 commits (13%)

  3. Cédric Champeau : 1526 commits (13%)

  4. James Strachan : 1031 commits (9%)

  5. Guillaume Laforge : 709 commits (6%)

  6. Roshan Dawrani : 307 commits (2%)

  7. Jason Dillon : 306 commits (2%)

  8. Danno Ferrin : 301 commits (2%)

  9. Alex Tkachman : 283 commits (2%)

  10. John Wilson : 230 commits (2%)

