Improved sandboxing of Groovy scripts

27 March 2015

Tags: groovy sandboxing type checking AST secure

One of the most common use cases of Groovy is scripting. Groovy makes it very easy to execute code dynamically, at runtime. Depending on the application, scripts can come from multiple sources (file system, databases, remote services, …), but more importantly, the designer of the application which executes scripts is not necessarily the one writing those scripts. Moreover, the scripts might run in a constrained environment (limited memory, file descriptors, time, …), or you may simply not want to allow users to access the full capabilities of the language from a script.

What you will learn in this post
  • why Groovy is a good fit to write internal DSLs

  • what it implies in terms of security in your applications

  • how you can customize compilation to improve the DSL

  • the meaning of SecureASTCustomizer

  • what are type checking extensions

  • how you can rely on type checking extensions to offer proper sandboxing

For example, imagine that you want to offer users the ability to evaluate mathematical expressions. One option would be to implement your own internal DSL, create a parser and eventually an interpreter for those expressions. Obviously this involves a bit of work, but if in the end you need to improve performance, for example by generating bytecode for the expressions instead of evaluating them, or introduce caching of those runtime generated classes, then Groovy is probably a very good option.

There are lots of options available, described in the documentation, but the simplest example is just using the Eval class:

Example.java
int sum = (Integer) Eval.me("1+1");

The code 1+1 is parsed, compiled to bytecode, loaded and eventually executed by Groovy at runtime. Of course in this example the code is very simple, and you will want to add parameters, but the idea is that the code which is executed here is arbitrary. This is probably not what you want. For a calculator, you want to allow expressions like:

1+1
x+y
1+(2*x)**y
cos(alpha)*r
v=1+x

but certainly not

println 'Hello'
(0..100).each { println 'Blah' }
Pong p = new Pong()
println(new File('/etc/passwd').text)
System.exit(-1)
Eval.me('System.exit(-1)') // a script within a script!

This is where things start to become complicated, and where we actually start seeing multiple needs:

  • restricting the grammar of the language to a subset of its capabilities

  • preventing users from executing unexpected code

  • preventing users from executing malicious code

The calculator example is pretty simple, but for more complex DSLs, people might actually start writing problematic code without noticing, especially if the DSL is sufficiently elegant to be used by non-developers.

I was in this situation a few years ago, when I designed an engine that used Groovy "scripts" written by linguists. One of the problems was that they could unintentionally create infinite loops, for example. The code was executing on the server, so you would end up with a thread eating 100% of the CPU and no choice but to restart the application server. I had to find a way to mitigate that problem without compromising the DSL, the tooling or the performance of the application.

Actually, lots of people have similar needs. During the past 4 years, I spoke to tons of users who had a similar question: How can I prevent users from doing bad things in Groovy scripts?

Compilation customizers

At that time, I had implemented my own solution, but I knew that other people had also implemented similar ones. In the end, Guillaume Laforge suggested that I write something that would help fix those issues and make it into Groovy core. This happened in Groovy 1.8.0 with compilation customizers.

Compilation customizers are a set of classes that are aimed at tweaking the compilation of Groovy scripts. You can write your own customizer but Groovy ships with:

  • an import customizer, which aims at adding imports transparently to scripts, so that users do not have to add "import" statements

  • an AST (Abstract Syntax Tree) transformation customizer, which allows adding AST transformations transparently to scripts

  • a secure AST customizer, which aims at restricting the grammar and syntactical constructs of the language

The AST transformation customizer allowed me to solve the infinite loop issue, by applying the @ThreadInterrupt transformation, but the SecureASTCustomizer is probably the one which has been the most misinterpreted of them all.

I must apologize for this. Back then, I had no better name in mind. The important part of SecureASTCustomizer is AST. It was aimed at restricting access to some features of the AST. The "secure" part is actually not a good name at all, and I will illustrate why. You can even find a blog post from Kohsuke Kawaguchi, of Jenkins fame, named "Groovy SecureASTCustomizer is harmful" (http://kohsuke.org/2012/04/27/groovy-secureastcustomizer-is-harmful/). It is very true. The SecureASTCustomizer was never designed with sandboxing in mind. It was designed to restrict the language at compile time, not at runtime. So a much better name, in retrospect, would have been GrammarCustomizer. But as you’re certainly aware, there are two hard things in computer science: cache invalidation, naming things and off-by-one errors.

So imagine that you think of the secure AST customizer as a way of securing your script, and that you want to use it to prevent a user from calling System.exit from a script. The documentation says that you can prevent calls on some specific receivers by defining either a blacklist or a whitelist. In terms of securing something, I would always recommend using a whitelist, that is to say listing explicitly what is allowed, rather than a blacklist, which says what is disallowed. The reason is that hackers always think of things you don’t, so let’s illustrate this.

Here is how a naive "sandbox" script engine could be configured using the SecureASTCustomizer. I am writing the examples of configuration in Java, even though I could write them in Groovy, just to make the difference between the integration code and the scripts clear.

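import groovy.lang.GroovyShell;
import org.codehaus.groovy.control.CompilerConfiguration;
import org.codehaus.groovy.control.customizers.SecureASTCustomizer;

import java.util.Arrays;

public class Sandbox {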
    public static void main(String[] args)  {
        CompilerConfiguration conf = new CompilerConfiguration();				(1)
        SecureASTCustomizer customizer = new SecureASTCustomizer();				(2)
        customizer.setReceiversBlackList(Arrays.asList(System.class.getName()));		(3)
        conf.addCompilationCustomizers(customizer);						(4)
        GroovyShell shell = new GroovyShell(conf);						(5)
        Object v = shell.evaluate("System.exit(-1)");						(6)
        System.out.println("Result = " +v);							(7)
    }
}
1 create a compiler configuration
2 create a secure AST customizer
3 declare that the System class is blacklisted as the receiver of method calls
4 add the customizer to the compiler configuration
5 associate the configuration to the script shell, that is, try to create a sandbox
6 execute a nasty script
7 print the result of the execution of the script

If you run this class, when the script is executed, it will throw an error:

General error during canonicalization: Method calls not allowed on [java.lang.System]
java.lang.SecurityException: Method calls not allowed on [java.lang.System]

This is the result of the application of the secure AST customizer, which prevents the execution of methods on the System class. Success! Now we have secured our script! Oh wait…

SecureASTCustomizer pwned!

Secure you say? So what if I do:

def c = System
c.exit(-1)

Execute again and you will see that the program exits without error and without printing the result. The return code of the process is -1, which indicates that the user script has been executed! What happened? Basically, at compile time, the secure AST customizer is not able to recognize that c.exit is a call on System, because it works at the AST level! It analyzes a method call, and in this case the method call is c.exit(-1), then gets the receiver and checks if the receiver is in the whitelist (or blacklist). In this case, the receiver is c and this variable is declared with def, which is equivalent to declaring it as Object, so it will think that c is Object, not System!
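
The failure mode can be reproduced without Groovy at all. The following is a toy sketch in plain Java, not the real SecureASTCustomizer: the class and method names are made up for illustration, and a "call" is reduced to the text of its receiver expression. It only aims to show why a check that sees the source text lets an alias through, while a check made with the actual receiver object in hand does not.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: hypothetical names, unrelated to Groovy's real AST classes.
final class ReceiverCheckDemo {
    private static final Set<String> BLACKLISTED_NAMES =
            new HashSet<>(Arrays.asList("System", "java.lang.System"));

    // Compile-time style check: all it knows is how the receiver is written in the source.
    static boolean allowedBySyntax(String receiverText) {
        return !BLACKLISTED_NAMES.contains(receiverText);
    }

    // Runtime style check: it is handed the object the call is really made on.
    static boolean allowedAtRuntime(Object receiver) {
        return receiver != System.class;
    }

    public static void main(String[] args) {
        // System.exit(-1): the receiver is written as "System", so it is rejected
        System.out.println(allowedBySyntax("System"));   // false
        // def c = System; c.exit(-1): the receiver is written as "c", so it slips through
        System.out.println(allowedBySyntax("c"));        // true
        // at runtime, c holds the System class itself, whatever it is called in the source
        Object c = System.class;
        System.out.println(allowedAtRuntime(c));         // false
    }
}
```

The syntactic check is only as good as the names it can see, which is why every aliasing trick defeats it.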

Actually, there are many ways to work around the various configurations that you can make on the secure AST customizer. Just for fun, a few of them:

((Object)System).exit(-1)
Class.forName('java.lang.System').exit(-1)
('java.lang.System' as Class).exit(-1)

import static java.lang.System.exit
exit(-1)

and there are many more options. The dynamic nature of Groovy just makes it impossible to resolve those cases at compile time. There are solutions, though. One option is to rely on the standard JVM security manager. However, this is a system-wide solution which is often considered a hammer, and it doesn’t really work for all cases: you might, for example, not want to prevent the creation of files, but only reads…

This limitation (or should I say frustration, for lots of us) led several people to create solutions based on runtime checks. Runtime checks do not suffer from the same problem, because you have, for example, the actual receiver type of a message before checking whether a method call is allowed or not. Two such implementations are of particular interest: one by Kohsuke and one by Simon.

However none of those implementations is totally secure or reliable. For example, the version by Kohsuke relies on hacking the internal implementation of call site caching. The problem is that it is not compatible with the invokedynamic version of Groovy, and those internal classes are going to be removed in future versions of Groovy. The version by Simon, on the other hand, relies on AST transformations but misses a lot of possible hacks.

As a result, my friends Corinne Krych, Fabrice Matrat and Sébastien Blanc and I decided to create a new runtime sandboxing mechanism that would not have the issues of those projects. We started implementing this during a hackathon in Nice, and we gave a talk about this last year at the Greach conference. It relies on AST transformations and heavily rewrites the code in order to perform a check before each method call, property access, increment of variable, binary expression, … The implementation is still incomplete, and not much work has been done because I realized there was still a problem with methods or properties called on an "implicit this", as in builders for example:

xml {
   cars {				 // cars is a method call on an implicit this: "this".cars(...)
     car(make:'Renault', model: 'Clio')
   }
}

As of today, I still haven’t found a way to properly handle this because of the design of the meta-object protocol in Groovy, which here relies on the fact that a receiver throws an exception when the method is not found, before another receiver is tried. In short, it means that you cannot know the type of the receiver before the method is actually called. And if it is called, it’s already too late…
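
To make the "try, fail, try the next one" behaviour concrete, here is a deliberately simplified model in plain Java. It is not Groovy’s meta-object protocol: the Receiver interface, the MissingMethod exception and the dispatch method are all hypothetical names invented for this sketch. The point it tries to illustrate is that when receivers are opaque, the only way to learn which one handles a message is to send it, and by then any side effect has already happened.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model with made-up names; Groovy's actual dispatch is far more involved.
final class ImplicitThisDemo {
    /** Thrown by a receiver that does not understand a message. */
    static final class MissingMethod extends RuntimeException {
        MissingMethod(String name) { super(name); }
    }

    /** An opaque dynamic receiver: it cannot be asked in advance what it supports. */
    interface Receiver {
        Object invoke(String name, Object... args);
    }

    // Tries each candidate in order; the first one that does not throw wins.
    static Object dispatch(List<Receiver> candidates, String name, Object... args) {
        for (Receiver candidate : candidates) {
            try {
                return candidate.invoke(name, args); // the call has already run if we get here
            } catch (MissingMethod e) {
                // this candidate does not know the method, fall through to the next one
            }
        }
        throw new MissingMethod(name);
    }

    static Object demo() {
        Receiver script = (name, args) -> { throw new MissingMethod(name); };
        Receiver builder = (name, args) -> "<" + name + "/>"; // accepts any method name
        return dispatch(Arrays.asList(script, builder), "cars");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // <cars/>
    }
}
```

A check inserted before dispatch has no receiver type to look at, and a check inserted after it comes too late.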

Until earlier this year, I still had no perfect solution to this problem when the script being executed uses the dynamic features of the language. But now the time has come to explain how you can significantly improve the situation if you are ready to lose some of the dynamism of the language.

Type checking

Let’s come back to the root problem of the SecureASTCustomizer: it works on the abstract syntax tree and has no knowledge of the concrete types of the receivers of messages. But since Groovy 2, Groovy has optional static type checking and static compilation, and in Groovy 2.1, we added type checking extensions.

Type checking extensions are very powerful: they allow the designer of a Groovy DSL to help the compiler infer types, but they also let you throw compilation errors where the compiler normally would not. Type checking extensions are even used internally in Groovy to support the static compiler, for example to implement traits or the markup template engine.

What if, instead of relying on the information available after parsing, we could rely on information from the type checker? Take the following code that our hacker tried to write:

((Object)System).exit(-1)

If you activate type checking, this code would not compile:

1 compilation error:

[Static type checking] - Cannot find matching method java.lang.Object#exit(java.lang.Integer). Please check if the declared type is right and if the method exists.

So this code would not compile anymore. But what if the code is:

def c = System
c.exit(-1)

You can verify that this passes type checking by wrapping the code into a method and running the script with the groovy command line tool:

@groovy.transform.TypeChecked // or even @CompileStatic
void foo() {
  def c = System
  c.exit(-1)
}
foo()

Then the type checker will recognize that the exit method is called on the System class and is valid. It will not help us there. But what we know, if this code passes type checking, is that the compiler recognized the call on the System receiver. The idea, then, is to rely on a type checking extension to disallow the call.

A simple type checking extension

Before we dig into the details about sandboxing, let’s try to "secure" our script using a traditional type checking extension. Registering a type checking extension is easy: just set the extensions parameter of the @TypeChecked annotation (or @CompileStatic if you want to use static compilation):

@TypeChecked(extensions=['SecureExtension1.groovy'])
void foo() {
  def c = System
  c.exit(-1)
}
foo()

The extension will be searched for on the classpath in source form (there’s an option to have precompiled type checking extensions, but this is beyond the scope of this blog post):

SecureExtension1.groovy
onMethodSelection { expr, methodNode ->					(1)
   if (methodNode.declaringClass.name=='java.lang.System') {		(2)
      addStaticTypeError("Method call is not allowed!", expr)		(3)
   }
}
1 when the type checker selects the target method of a call
2 then if the selected method belongs to the System class
3 make the type checker throw an error

That’s really all that is needed. Now execute the code again, and you will see that there’s a compile time error!

/home/cchampeau/tmp/securetest.groovy: 6: [Static type checking] - Method call is not allowed!
 @ line 6, column 3.
     c.exit(-1)
     ^

1 error

So this time, thanks to the type checker, c is really recognized as referring to the System class, and we can really disallow the call. This is a very simple example, but it doesn’t really go as far as what we can do with the secure AST customizer in terms of configuration. The extension that we wrote has hardcoded checks, but it would probably be nicer if we could configure it. So let’s start working with a slightly more complex example.

Imagine that your application computes a score for a document and that you allow the users to customize the score. Then your DSL:

  • will expose (at least) a variable named score

  • will allow the user to perform mathematical operations (including calling methods like cos, abs, …)

  • should disallow all other method calls

An example of user script would be:

abs(cos(1+score))

Such a DSL is easy to set up. It’s a variant of the one we defined earlier:

Sandbox.java
CompilerConfiguration conf = new CompilerConfiguration();
ImportCustomizer customizer = new ImportCustomizer();
customizer.addStaticStars("java.lang.Math");                        (1)
conf.addCompilationCustomizers(customizer);
Binding binding = new Binding();
binding.setVariable("score", 2.0d);                                 (2)
GroovyShell shell = new GroovyShell(binding,conf);
Double userScore = (Double) shell.evaluate("abs(cos(1+score))");    (3)
System.out.println("userScore = " + userScore);
1 add an import customizer that will add import static java.lang.Math.* to all scripts
2 make the score variable available to the script
3 execute the script
There are options to cache the scripts, instead of parsing and compiling them each time. Please check the documentation for more details.

So far, our script works, but nothing prevents a hacker from executing malicious code. Since we want to use type checking, I would recommend applying the @CompileStatic transformation transparently:

  • it will activate type checking on the script, and we will be able to perform additional checks thanks to the type checking extension

  • it will improve the performance of the script

Adding @CompileStatic transparently is easy. We just have to update the compiler configuration:

ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(CompileStatic.class);
conf.addCompilationCustomizers(astcz);

Now if you try to execute the script again, you will face a compile time error:

Script1.groovy: 1: [Static type checking] - The variable [score] is undeclared.
 @ line 1, column 11.
   abs(cos(1+score))
             ^

Script1.groovy: 1: [Static type checking] - Cannot find matching method int#plus(java.lang.Object). Please check if the declared type is right and if the method exists.
 @ line 1, column 9.
   abs(cos(1+score))
           ^

2 errors

What happened? If you read the script from a "compiler" point of view, it doesn’t know anything about the "score" variable. You, as a developer, know that it’s a variable of type double, but the compiler cannot infer it. This is precisely what type checking extensions are designed for: you can provide additional information to the compiler, so that compilation passes. In this case, we will want to indicate that the score variable is of type double.

So we will slightly change the way we transparently add the @CompileStatic annotation:

ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
        singletonMap("extensions", singletonList("SecureExtension2.groovy")),
        CompileStatic.class);

This will "emulate" code annotated with @CompileStatic(extensions=['SecureExtension2.groovy']). Of course now we need to write the extension which will recognize the score variable:

SecureExtension2.groovy
unresolvedVariable { var ->			(1)
   if (var.name=='score') {			(2)
      return makeDynamic(var, double_TYPE)	(3)
   }
}
1 in case the type checker cannot resolve a variable
2 if the variable name is score
3 then instruct the compiler to resolve the variable dynamically, and that the type of the variable is double

You can find a complete description of the type checking extension DSL in this section of the documentation (http://docs.groovy-lang.org/latest/html/documentation/#type_checking_extensions), but you have here an example of mixed mode compilation: the compiler is not able to resolve the score variable. You, as the designer of the DSL, know that the variable is in fact found in the binding, and is of type double, so the makeDynamic call is here to tell the compiler: "ok, don’t worry, I know what I am doing, this variable can be resolved dynamically and it will be of type double". That’s it!

First completed "secure" extension

Now it’s time to put this all together. We wrote one type checking extension which is capable of preventing calls on System, and another which is able to resolve the score variable. So if we combine both, we have a first, complete, securing type checking extension:

SecureExtension3.groovy
// disallow calls on System
onMethodSelection { expr, methodNode ->
    if (methodNode.declaringClass.name=='java.lang.System') {
        addStaticTypeError("Method call is not allowed!", expr)
    }
}

// resolve the score variable
unresolvedVariable { var ->
    if (var.name=='score') {
        return makeDynamic(var, double_TYPE)
    }
}

Don’t forget to update the configuration in your Java class to use the new type checking extension:

ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
        singletonMap("extensions", singletonList("SecureExtension3.groovy")),
        CompileStatic.class);

Execute the code again and it still works. Now, try to do:

abs(cos(1+score))
System.exit(-1)

And the script compilation will fail with:

Script1.groovy: 1: [Static type checking] - Method call is not allowed!
 @ line 1, column 19.
   abs(cos(1+score));System.exit(-1)
                     ^

1 error

Congratulations, you just wrote your first type checking extension that prevents the execution of malicious code!

Improving configuration of the extension

So far so good, we are able to prevent calls on System, but it is likely that we are going to discover new vulnerabilities, and that we will want to prevent execution of such code. So instead of hardcoding everything in the extension, we will try to make our extension generic and configurable. This is probably the trickiest thing to do, because there’s no direct way to provide context to a type checking extension. Our idea therefore relies on the (ugly) thread locals to pass configuration data to the type checker.

The first thing we’re going to do is to make the variable list configurable. Here is the code on the Java side of things:

Sandbox.java
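import groovy.lang.Binding;
import groovy.lang.GroovyShell;
import groovy.transform.CompileStatic;
import org.codehaus.groovy.control.CompilerConfiguration;
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer;
import org.codehaus.groovy.control.customizers.ImportCustomizer;

import java.util.HashMap;
import java.util.Map;

import static java.util.Collections.singletonList;
import static java.util.Collections.singletonMap;

public class Sandbox {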
    public static final String VAR_TYPES = "sandboxing.variable.types";

    public static final ThreadLocal<Map<String, Object>> COMPILE_OPTIONS = new ThreadLocal<>();		(1)

    public static void main(String[] args) {
        CompilerConfiguration conf = new CompilerConfiguration();
        ImportCustomizer customizer = new ImportCustomizer();
        customizer.addStaticStars("java.lang.Math");
        ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
                singletonMap("extensions", singletonList("SecureExtension4.groovy")),			(2)
                CompileStatic.class);
        conf.addCompilationCustomizers(astcz);
        conf.addCompilationCustomizers(customizer);

        Binding binding = new Binding();
        binding.setVariable("score", 2.0d);
        try {
            Map<String,Class> variableTypes = new HashMap<String, Class>();				(3)
            variableTypes.put("score", Double.TYPE);							(4)
            Map<String,Object> options = new HashMap<String, Object>();					(5)
            options.put(VAR_TYPES, variableTypes);							(6)
            COMPILE_OPTIONS.set(options);								(7)
            GroovyShell shell = new GroovyShell(binding, conf);
            Double userScore = (Double) shell.evaluate("abs(cos(1+score));System.exit(-1)");
            System.out.println("userScore = " + userScore);
        } finally {
            COMPILE_OPTIONS.remove();									(8)
        }
    }
}
1 create a ThreadLocal that will hold the contextual configuration of the type checking extension
2 update the extension to SecureExtension4.groovy
3 variableTypes is a map variable name → variable type
4 so this is where we’re going to add the score variable declaration
5 options is the map that will store our type checking configuration
6 we set the "variable types" value of this configuration map to the map of variable types
7 and assign it to the thread local
8 finally, to avoid memory leaks, it is important to remove the configuration from the thread local

And now, here is how the type checking extension can use this:

import static Sandbox.*

def typesOfVariables = COMPILE_OPTIONS.get()[VAR_TYPES]				(1)

unresolvedVariable { var ->
    if (typesOfVariables[var.name]) {						(2)
        return makeDynamic(var, classNodeFor(typesOfVariables[var.name]))	(3)
    }
}
1 retrieve the map of variable types from the thread local
2 if an unresolved variable is found in the map of known variables
3 then declare to the type checker that the variable is of the type found in the map

Basically, the type checking extension, because it is executed when the type checker verifies the script, can access the configuration through the thread local. Then, instead of using hard coded names in unresolvedVariable, we can just check that the variable that the type checker doesn’t know about is actually declared in the configuration. If it is, then we can tell it which type it is. Easy!

Now we have to find a way to explicitly declare the list of allowed method calls. It is a bit trickier to find a proper configuration for that, but here is what we came up with.

Configuring a whitelist of methods

The idea of the whitelist is simple. A method call will be allowed if the method descriptor matches an entry in the whitelist. This whitelist consists of regular expressions, and the method descriptor consists of the fully-qualified name of the class declaring the method, its name and its parameter types. For example, for System.exit, the descriptor would be:

java.lang.System#exit(int)
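
Before wiring this into the compiler, it may help to see how such patterns behave against descriptors. This is a minimal sketch in plain Java, assuming the patterns are applied with find-style (unanchored) matching, which is how a Groovy matcher behaves when used as a condition. The WhitelistDemo class and the evil.java.lang.Math descriptor are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, only meant to illustrate descriptor matching.
final class WhitelistDemo {
    // A descriptor is allowed as soon as one pattern is found anywhere inside it.
    static boolean isAllowed(List<String> patterns, String descriptor) {
        for (String regex : patterns) {
            if (Pattern.compile(regex).matcher(descriptor).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> loose = Arrays.asList("java\\.lang\\.Math#");
        System.out.println(isAllowed(loose, "java.lang.Math#cos(double)"));    // true
        System.out.println(isAllowed(loose, "java.lang.System#exit(int)"));    // false
        // unanchored: a class whose name merely contains the pattern is accepted too
        System.out.println(isAllowed(loose, "evil.java.lang.Math#hack()"));    // true

        // anchoring the pattern at the start of the descriptor closes that gap
        List<String> anchored = Arrays.asList("^java\\.lang\\.Math#");
        System.out.println(isAllowed(anchored, "evil.java.lang.Math#hack()")); // false
    }
}
```

Under that assumption, anchoring each pattern with ^ is a cheap way to make sure an entry only matches the class it was written for.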

So let’s see how to update the Java integration part to add this configuration:

public class Sandbox {
    public static final String WHITELIST_PATTERNS = "sandboxing.whitelist.patterns";

    // ...

    public static void main(String[] args) {
        // ...
        try {
            Map<String,Class> variableTypes = new HashMap<String, Class>();
            variableTypes.put("score", Double.TYPE);
            Map<String,Object> options = new HashMap<String, Object>();
            List<String> patterns = new ArrayList<String>();					(1)
            patterns.add("java\\.lang\\.Math#");						(2)
            options.put(VAR_TYPES, variableTypes);
            options.put(WHITELIST_PATTERNS, patterns);						(3)
            COMPILE_OPTIONS.set(options);
            GroovyShell shell = new GroovyShell(binding, conf);
            Double userScore = (Double) shell.evaluate("abs(cos(1+score));System.exit(-1)");
            System.out.println("userScore = " + userScore);
        } finally {
            COMPILE_OPTIONS.remove();
        }
    }
}
1 declare a list of patterns
2 add all methods of java.lang.Math as allowed
3 put the whitelist to the type checking options map

Then on the type checking extension side:

import groovy.transform.CompileStatic
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.MethodNode
import org.codehaus.groovy.ast.Parameter
import org.codehaus.groovy.transform.stc.ExtensionMethodNode

import static Sandbox.*

@CompileStatic
private static String prettyPrint(ClassNode node) {
    node.isArray()?"${prettyPrint(node.componentType)}[]":node.toString(false)
}

@CompileStatic
private static String toMethodDescriptor(MethodNode node) {								(1)
    if (node instanceof ExtensionMethodNode) {
        return toMethodDescriptor(node.extensionMethodNode)
    }
    def sb = new StringBuilder()
    sb.append(node.declaringClass.toString(false))
    sb.append("#")
    sb.append(node.name)
    sb.append('(')
    sb.append(node.parameters.collect { Parameter it ->
        prettyPrint(it.originType)
    }.join(','))
    sb.append(')')
    sb
}
def typesOfVariables = COMPILE_OPTIONS.get()[VAR_TYPES]
def whiteList = COMPILE_OPTIONS.get()[WHITELIST_PATTERNS]								(2)

onMethodSelection { expr, MethodNode methodNode ->
    def descr = toMethodDescriptor(methodNode)										(3)
    if (!whiteList.any { descr =~ it }) {										(4)
        addStaticTypeError("You tried to call a method which is not allowed, what did you expect?: $descr", expr)	(5)
    }
}

unresolvedVariable { var ->
    if (typesOfVariables[var.name]) {
        return makeDynamic(var, classNodeFor(typesOfVariables[var.name]))
    }
}
1 this method will generate a method descriptor from a MethodNode
2 retrieve the whitelist of methods from the thread local option map
3 convert a selected method into a descriptor string
4 check whether the descriptor matches at least one of the whitelist entries
5 if it does not, make the type checker report an error that includes the offending descriptor

So if you execute the code again, you will now have a very cool error:

Script1.groovy: 1: [Static type checking] - You tried to call a method which is not allowed, what did you expect?: java.lang.System#exit(int)
 @ line 1, column 19.
   abs(cos(1+score));System.exit(-1)
                     ^

1 error
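
The descriptor format shown in this error message can be explored outside of the compiler. The sketch below is plain Java and uses reflection instead of Groovy’s MethodNode, so it is only an analogy to the toMethodDescriptor method above, handy for checking what a whitelist pattern has to match. The DescriptorDemo class and its describe helper are made up for illustration.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

// Reflection-based analogy with hypothetical names; the extension itself works on AST nodes.
final class DescriptorDemo {
    // Builds "declaring.Class#name(paramType,paramType)" for a reflected method.
    static String toMethodDescriptor(Method method) {
        StringJoiner params = new StringJoiner(",", "(", ")");
        for (Class<?> type : method.getParameterTypes()) {
            params.add(type.getTypeName()); // arrays are rendered as "element.Type[]"
        }
        return method.getDeclaringClass().getName() + "#" + method.getName() + params;
    }

    // Convenience wrapper that looks up a public method and describes it.
    static String describe(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            return toMethodDescriptor(owner.getMethod(name, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(System.class, "exit", int.class));
        // java.lang.System#exit(int)
        System.out.println(describe(String.class, "join", CharSequence.class, CharSequence[].class));
        // java.lang.String#join(java.lang.CharSequence,java.lang.CharSequence[])
    }
}
```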

There we are! We now have a type checking extension which handles both the types of the variables that you export in the binding and a whitelist of allowed methods. This is still not perfect, but we’re very close to the final solution! It’s not perfect because we only took care of method calls here, but you have to deal with more than that. For example, properties (like foo.text which is implicitly converted into foo.getText()).
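
To see why property access is really a method call in disguise, here is a small plain Java sketch that maps a property name to the getter it would stand for under the JavaBeans naming convention. It is an illustration only: PropertyDemo and its methods are hypothetical names, and Groovy’s own property resolution covers more cases than this.

```java
import java.io.File;
import java.lang.reflect.Method;

// Illustration only: hypothetical helper, not part of Groovy.
final class PropertyDemo {
    // Looks for a public getter following the JavaBeans convention: getXxx(), then isXxx().
    static Method getterFor(Class<?> type, String property) {
        String suffix = Character.toUpperCase(property.charAt(0)) + property.substring(1);
        for (String prefix : new String[] {"get", "is"}) {
            try {
                return type.getMethod(prefix + suffix);
            } catch (NoSuchMethodException e) {
                // no such getter, try the next naming convention
            }
        }
        return null;
    }

    // Name of the getter behind a property, or null when the class declares none.
    static String getterName(Class<?> type, String property) {
        Method getter = getterFor(type, property);
        return getter == null ? null : getter.getName();
    }

    public static void main(String[] args) {
        System.out.println(getterName(File.class, "name"));   // getName
        System.out.println(getterName(File.class, "hidden")); // isHidden
        System.out.println(getterName(File.class, "text"));   // null
    }
}
```

The last line is the interesting one: java.io.File has no getText() of its own, and in Groovy file.text is provided by an extension method, so a property check cannot simply look at the methods declared on the receiver class.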

Putting it all together

Dealing with properties is a bit more complicated because the type checker doesn’t have a handler for "property selection" like it does for methods. We can work around that, and if you are interested in seeing the resulting code, check it out below. It’s a type checking extension which is not written exactly as you have seen in this blog post, because it is meant to be precompiled for improved performance. But the idea is exactly the same.

import groovy.transform.CompileStatic
import org.codehaus.groovy.ast.ClassCodeVisitorSupport
import org.codehaus.groovy.ast.ClassHelper
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.MethodNode
import org.codehaus.groovy.ast.Parameter
import org.codehaus.groovy.ast.expr.PropertyExpression
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.transform.sc.StaticCompilationMetadataKeys
import org.codehaus.groovy.transform.stc.ExtensionMethodNode
import org.codehaus.groovy.transform.stc.GroovyTypeCheckingExtensionSupport
import static Sandbox.*

class SandboxingTypeCheckingExtension extends GroovyTypeCheckingExtensionSupport.TypeCheckingDSL {

    @CompileStatic
    private static String prettyPrint(ClassNode node) {
        node.isArray()?"${prettyPrint(node.componentType)}[]":node.toString(false)
    }

    @CompileStatic
    private static String toMethodDescriptor(MethodNode node) {
        if (node instanceof ExtensionMethodNode) {
            return toMethodDescriptor(node.extensionMethodNode)
        }
        def sb = new StringBuilder()
        sb.append(node.declaringClass.toString(false))
        sb.append("#")
        sb.append(node.name)
        sb.append('(')
        sb.append(node.parameters.collect { Parameter it ->
            prettyPrint(it.originType)
        }.join(','))
        sb.append(')')
        sb
    }

    @Override
    Object run() {

        // Fetch white list of regular expressions of authorized method calls
        def whiteList = COMPILE_OPTIONS.get()[WHITELIST_PATTERNS]
        def typesOfVariables = COMPILE_OPTIONS.get()[VAR_TYPES]

        onMethodSelection { expr, MethodNode methodNode ->
            def descr = toMethodDescriptor(methodNode)
            if (!whiteList.any { descr =~ it }) {
                addStaticTypeError("You tried to call a method which is not allowed, what did you expect?: $descr", expr)
            }
        }

        unresolvedVariable { var ->
            if (isDynamic(var) && typesOfVariables[var.name]) {
                storeType(var, ClassHelper.make(typesOfVariables[var.name]))
                handled = true
            }
        }

        // handling properties (like foo.text) is harder because the type checking extension
        // does not provide a specific hook for this. Harder, but not impossible!

        afterVisitMethod { methodNode ->
            def visitor = new PropertyExpressionChecker(context.source, whiteList)
            visitor.visitMethod(methodNode)
        }
    }

    private class PropertyExpressionChecker extends ClassCodeVisitorSupport {
        private final SourceUnit unit
        private final List<String> whiteList

        PropertyExpressionChecker(final SourceUnit unit, final List<String> whiteList) {
            this.unit = unit
            this.whiteList = whiteList
        }

        @Override
        protected SourceUnit getSourceUnit() {
            unit
        }

        @Override
        void visitPropertyExpression(final PropertyExpression expression) {
            super.visitPropertyExpression(expression)

            ClassNode owner = expression.objectExpression.getNodeMetaData(StaticCompilationMetadataKeys.PROPERTY_OWNER)
            if (owner) {
                def descr = "${prettyPrint(owner)}#${expression.propertyAsString}"
                if (!whiteList.any { descr =~ it }) {
                    addStaticTypeError("Property is not allowed: $descr", expression)
                }
            }
        }
    }
}

And a final version of the sandbox that includes assertions to make sure that we catch all cases:

Sandbox.java
import groovy.lang.Binding;
import groovy.lang.GroovyShell;
import groovy.transform.CompileStatic;
import org.codehaus.groovy.control.CompilerConfiguration;
import org.codehaus.groovy.control.MultipleCompilationErrorsException;
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer;
import org.codehaus.groovy.control.customizers.ImportCustomizer;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import static java.util.Collections.singletonList;
import static java.util.Collections.singletonMap;

public class Sandbox {
    public static final String WHITELIST_PATTERNS = "sandboxing.whitelist.patterns";
    public static final String VAR_TYPES = "sandboxing.variable.types";

    public static final ThreadLocal<Map<String, Object>> COMPILE_OPTIONS = new ThreadLocal<Map<String, Object>>();

    public static void main(String[] args) {
        CompilerConfiguration conf = new CompilerConfiguration();
        ImportCustomizer customizer = new ImportCustomizer();
        customizer.addStaticStars("java.lang.Math");
        ASTTransformationCustomizer astcz = new ASTTransformationCustomizer(
                singletonMap("extensions", singletonList("SandboxingExtension.groovy")),
                CompileStatic.class);
        conf.addCompilationCustomizers(astcz);
        conf.addCompilationCustomizers(customizer);

        Binding binding = new Binding();
        binding.setVariable("score", 2.0d);
        try {
            Map<String, Class> variableTypes = new HashMap<String, Class>();
            variableTypes.put("score", Double.TYPE);
            Map<String, Object> options = new HashMap<String, Object>();
            List<String> patterns = new ArrayList<String>();
            // allow method calls on Math
            patterns.add("java\\.lang\\.Math#");
            // allow constructor calls on File
            patterns.add("File#<init>");
            // because we let the user call each/times/...
            patterns.add("org\\.codehaus\\.groovy\\.runtime\\.DefaultGroovyMethods");
            options.put(VAR_TYPES, variableTypes);
            options.put(WHITELIST_PATTERNS, patterns);
            COMPILE_OPTIONS.set(options);
            GroovyShell shell = new GroovyShell(binding, conf);
            Object result;
            try {
                result = shell.evaluate("Eval.me('1')"); // error
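                // fail even when JVM assertions are disabled (no -ea flag)
                throw new AssertionError("Eval.me should have been rejected");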
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }
            try {
                result = shell.evaluate("System.exit(-1)"); // error
                throw new AssertionError("System.exit should have been rejected");
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }
            try {
                result = shell.evaluate("((Object)Eval).me('1')"); // error
                throw new AssertionError("Eval.me through a cast should have been rejected");
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }

            try {
                result = shell.evaluate("new File('/etc/passwd').getText()"); // getText is not allowed
                throw new AssertionError("File#getText should have been rejected");
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }

            try {
                result = shell.evaluate("new File('/etc/passwd').text");  // getText is not allowed
                throw new AssertionError("the File text property should have been rejected");
            } catch (MultipleCompilationErrorsException e) {
                System.out.println("Successful sandboxing: "+e.getMessage());
            }

            Double userScore = (Double) shell.evaluate("abs(cos(1+score))");
            System.out.println("userScore = " + userScore);
        } finally {
            COMPILE_OPTIONS.remove();
        }
    }
}
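
The whitelist check in the extension is `whiteList.any { descr =~ it }`. In a boolean context, Groovy's `=~` behaves like `Matcher.find()`, so a pattern is accepted if it is found anywhere in the descriptor. The following is a minimal plain-Java sketch of that matching rule, not part of the sandbox itself: the class name `WhitelistSketch`, the helper `isAllowed` and the example descriptor strings (including the made-up `com.example.TempFile`) are assumptions for illustration, and the real descriptors depend on what `prettyPrint` returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WhitelistSketch {

    // Mirrors `whiteList.any { descr =~ it }`: an unanchored regex search,
    // true as soon as one pattern is found somewhere in the descriptor.
    static boolean isAllowed(List<String> whiteList, String descriptor) {
        for (String regex : whiteList) {
            if (Pattern.compile(regex).matcher(descriptor).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same patterns as in Sandbox.java
        List<String> patterns = Arrays.asList(
                "java\\.lang\\.Math#",
                "File#<init>",
                "org\\.codehaus\\.groovy\\.runtime\\.DefaultGroovyMethods");

        // Assumed descriptors of the shape Class#member(params)
        System.out.println(isAllowed(patterns, "java.lang.Math#cos(double)"));            // true
        System.out.println(isAllowed(patterns, "java.lang.System#exit(int)"));            // false
        System.out.println(isAllowed(patterns, "java.io.File#<init>(java.lang.String)")); // true

        // Unanchored search: any class whose name ends in "File" passes too.
        System.out.println(isAllowed(patterns, "com.example.TempFile#<init>()"));         // true

        // Anchoring the pattern to the full class name closes that gap.
        List<String> strict = Arrays.asList("^java\\.io\\.File#<init>");
        System.out.println(isAllowed(strict, "com.example.TempFile#<init>()"));           // false
    }
}
```

If you reuse this approach, it is probably worth anchoring every pattern with `^` and spelling out the fully qualified class name, so that a loosely written pattern does not let an unrelated class through.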

Conclusion

This post has explored why Groovy is an attractive platform for scripting on the JVM. It introduced various integration mechanisms and showed that this flexibility comes at the price of security. However, we illustrated some concepts, like compilation customizers, that make it easier to sandbox the execution environment of scripts. The customizers currently available in the Groovy distribution, and the sandboxing projects currently available in the wild, are not sufficient to guarantee secure execution of scripts in the general case (depending, of course, on your users and where the scripts come from).

We then illustrated how, if you are ready to pay the price of losing some of the dynamic features of the language, you can properly work around those limitations through type checking extensions. Those type checking extensions are so powerful that you can even introduce your own error messages during the compilation of scripts. Finally, by doing this and caching your scripts, you will also benefit from dramatic performance improvements in script execution.

That said, the sandboxing mechanism that we illustrated is not a replacement for the SecureASTCustomizer. We recommend that you actually use both, because they work on different levels: the secure AST customizer works on the grammar level, allowing you to restrict some language constructs (for example preventing the creation of closures or classes inside the script), while the type checking extension works after type inference (allowing you to reason on inferred types rather than declared types).

Last but not least, the solution that I described here is incomplete, and it is not available in Groovy core. As I will have less time to work on Groovy, I would be very glad if someone improved this solution and made a pull request so that we can have something!

Who is Groovy?

04 March 2015

Tags: groovy apache OSS

With all the changes that the Groovy project has seen since the beginning of the year, I thought it was a good time to summarize its history. In particular, with the end of sponsorship from Pivotal, as well as Guillaume Laforge announcing he is joining Restlet, a lot of people state that Groovy is done. This is the occasion to talk about the history of the project, both in terms of community and sponsorship.

First of all, Groovy is, and will remain, alive. Groovy is a community project and we are pleased to announce that Groovy has started the process to join the Apache Software Foundation. The community deserved it, and even if it will mean some adaptations on our side, we think that the ASF will be a great fit for the project.

To build the statistics you will read in this post, I used a Groovy script (of course!) that counts commits. This is far from a perfect metric: some commits only fix typos while others deliver complete new features, patches applied before Git didn’t carry author information, and maintaining multiple branches of Groovy takes a lot of work that the count does not reflect. Still, it gives an idea. And because people think that the end of sponsorship may be equivalent to death, I separated commits into two categories: sponsored and community. Sponsored commits are commits which are likely to have been made by someone directly paid to contribute to Groovy. We will see how this proportion evolved over time.
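
To make the counting concrete, here is a minimal sketch of the sponsored/community split. It is not the script used for this post (that one was written in Groovy and is not shown here). It is written in plain Java, and the names `CommitStatsSketch`, `percent` and `summarize` are hypothetical. The input uses the 2008 figures listed further down, with the smaller community contributors folded into one aggregate entry. Integer division is an assumption, chosen because the published percentages look truncated (26% + 73% for 2008).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class CommitStatsSketch {

    // Integer division truncates, so two shares can add up to 99%.
    static int percent(int part, int total) {
        return total == 0 ? 0 : part * 100 / total;
    }

    // Sums commits per category and formats one summary line per year.
    static String summarize(Map<String, Integer> commitsByAuthor, Set<String> sponsoredAuthors) {
        int total = 0;
        int sponsored = 0;
        for (Map.Entry<String, Integer> entry : commitsByAuthor.entrySet()) {
            total += entry.getValue();
            if (sponsoredAuthors.contains(entry.getKey())) {
                sponsored += entry.getValue();
            }
        }
        int community = total - sponsored;
        return "Total: " + total + " commits"
                + " Sponsored: " + sponsored + " commits (" + percent(sponsored, total) + "%)"
                + " Community: " + community + " commits (" + percent(community, total) + "%)";
    }

    public static void main(String[] args) {
        Map<String, Integer> commits2008 = new LinkedHashMap<>();
        commits2008.put("Paul King", 445);
        commits2008.put("Danno Ferrin", 176);
        commits2008.put("Jochen Theodorou", 126);
        commits2008.put("Alex Tkachman", 125);
        commits2008.put("Guillaume Laforge", 33);
        commits2008.put("Graeme Rocher", 3);
        // Aggregate of the remaining 2008 community contributors
        commits2008.put("(other community contributors)", 161);

        Set<String> sponsored = new HashSet<>(Arrays.asList(
                "Jochen Theodorou", "Alex Tkachman", "Guillaume Laforge", "Graeme Rocher"));

        // prints "Total: 1069 commits Sponsored: 287 commits (26%) Community: 782 commits (73%)"
        System.out.println(summarize(commits2008, sponsored));
    }
}
```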

If you think Groovy is dead, please read the following carefully. Let’s start our journey back in time!

2003: A dynamic language for the JVM

Total: 476 commits Community: 476 commits (100%)

  1. James Strachan : 374 commits (78%)

  2. Bob McWhirter : 76 commits (15%)

  3. Sam Pullara : 23 commits (4%)

  4. Kasper Nielsen : 2 commits (0%)

  5. Aslak Hellesoy : 1 commits (0%)

2003 is the inception year of Groovy. James Strachan, in August 2003, wanted to create a dynamic language for the JVM, inspired by Ruby, but closer to Java. Groovy was born, and the idea never changed over time: Groovy is the perfect companion for Java, a language which can be very close to Java in terms of syntax but also removes a lot of its boilerplate. Bob McWhirter is a famous name in the JVM world. He is now Director of Polyglot at Red Hat, so you can see that when he started contributing to the language back then, there was already a story for him! Sam Pullara is another very smart guy in the JVM world who has worked for top companies like BEA, Yahoo! and Twitter, to name a few. He is a technical reviewer for the JavaOne conference.

2004: Guillaume Laforge joins the project

Total: 871 commits Community: 871 commits (100%)

  1. James Strachan : 495 commits (56%)

  2. Guillaume Laforge : 101 commits (11%)

  3. John Wilson : 70 commits (8%)

  4. Sam Pullara : 66 commits (7%)

  5. Jeremy Rayner : 33 commits (3%)

  6. Chris Poirier : 28 commits (3%)

  7. Bing Ran : 21 commits (2%)

  8. Steve Goetze : 12 commits (1%)

  9. John Stump : 10 commits (1%)

  10. Russel Winder : 8 commits (0%)

  11. Zohar Melamed : 8 commits (0%)

  12. Jochen Theodorou : 7 commits (0%)

  13. Damage Control : 4 commits (0%)

  14. Bob McWhirter : 3 commits (0%)

  15. Christiaan ten Klooster : 3 commits (0%)

  16. Yuri Schimke : 2 commits (0%)

Groovy is hosted at Codehaus (which has just announced its retirement) and 2004 sees the appearance of famous names of the community. In particular, you can already see Guillaume Laforge and Jochen Theodorou. Both of them still work directly on the project today. John Wilson started contributing the famous XML support of Groovy, and you can also note names like Russel Winder of Gant and GPars fame. Jeremy Rayner’s work is famous in the Groovy community since he wrote the first versions of the Groovy grammar using Antlr.

2005: Jochen Theodorou joins the project

Total: 934 commits Community: 934 commits (100%)

  1. Jochen Theodorou : 244 commits (26%)

  2. James Strachan : 162 commits (17%)

  3. Pilho Kim : 104 commits (11%)

  4. John Wilson : 79 commits (8%)

  5. Guillaume Laforge : 75 commits (8%)

  6. Dierk Koenig : 70 commits (7%)

  7. Jeremy Rayner : 62 commits (6%)

  8. Christian Stein : 48 commits (5%)

  9. Alan Green : 20 commits (2%)

  10. Russel Winder : 20 commits (2%)

  11. Martin C. Martin : 14 commits (1%)

  12. Sam Pullara : 10 commits (1%)

  13. John Rose : 8 commits (0%)

  14. Hein Meling : 7 commits (0%)

  15. Scott Stirling : 6 commits (0%)

  16. Franck Rasolo : 5 commits (0%)

I would tend to think that 2005 is the year when Jochen Theodorou took the technical lead of Groovy. In 2005, he becomes the most prolific contributor, even beyond the creator of the language himself. Dierk Koenig makes an appearance here: he is known for his work on GPars, but also for the reference book for Groovy: Groovy in Action.

2006: Rise of Paul King

Total: 480 commits Community: 480 commits (100%)

  1. Jochen Theodorou : 221 commits (46%)

  2. John Wilson : 56 commits (11%)

  3. Paul King : 54 commits (11%)

  4. Guillaume Laforge : 47 commits (9%)

  5. Dierk Koenig : 37 commits (7%)

  6. Jeremy Rayner : 23 commits (4%)

  7. Russel Winder : 14 commits (2%)

  8. Guillaume Alleon : 12 commits (2%)

  9. Joachim Baumann : 6 commits (1%)

  10. Martin C. Martin : 4 commits (0%)

  11. Graeme Rocher : 2 commits (0%)

  12. Marc Guillemot : 2 commits (0%)

  13. Christian Stein : 1 commits (0%)

  14. Steve Goetze : 1 commits (0%)

2006 is a very calm year for Groovy in terms of code production. James, the creator of the language, has already disappeared from the contributors list and will not contribute anymore. Guillaume Laforge, in agreement with the other contributors, takes the project lead (he is still the lead today).

With half as many commits as in 2005, in retrospect, I would say that this was a critical year: either the project would die, or it would become what it is today. And my personal feeling is that the person who saved Groovy just appeared in the contributors list: Paul King. Paul is undoubtedly the most active contributor to Groovy. He wrote a lot of the Groovy Development Kit, that is to say the APIs without which a language would be nothing. Having a nice language is one thing, having proper APIs and libraries that unleash its full potential is another. Paul King did it. Look at his ranking here: 3rd place. You will never see him ranked lower than that. And guess what? Paul is not paid to do this. He runs his own business and if you want to work with a Groovy expert, he’s probably the best.

Joachim Baumann is a name some people would recognize: he is still working with Groovy and one of the most regular contributors, with the Windows installer. Joachim takes time, for each Groovy release, to produce a Windows installer, which today we are still not capable of handling automatically.

2007: Groovy 1.0

  1. Paul King : 447 commits (30%)

  2. Jason Dillon : 265 commits (18%)

  3. Jochen Theodorou (Sponsored) : 242 commits (16%)

  4. Danno Ferrin : 101 commits (6%)

  5. Alex Tkachman (Sponsored) : 87 commits (5%)

  6. Graeme Rocher (Sponsored) : 61 commits (4%)

  7. Russel Winder : 46 commits (3%)

  8. Marc Guillemot : 36 commits (2%)

  9. Andres Almiray : 34 commits (2%)

  10. Guillaume Laforge (Sponsored) : 33 commits (2%)

  11. Jeremy Rayner : 26 commits (1%)

  12. Alexandru Popescu : 24 commits (1%)

  13. John Wilson : 22 commits (1%)

  14. Joachim Baumann : 21 commits (1%)

  15. Jeff Brown : 8 commits (0%)

  16. Dierk Koenig : 6 commits (0%)

  17. Martin C. Martin : 6 commits (0%)

  18. Guillaume Alleon : 4 commits (0%)

2007 is an important year in the history of Groovy. On January 2nd, Groovy 1.0 is out. Paul King ranks #1 for the first time, and will remain on top for a long time. This year also sees the creation of G2One, the first company built specifically for Groovy and Grails, by Guillaume Laforge, Graeme Rocher and Alex Tkachman. Both Graeme and Alex make their first appearance in the contributors graph, and both of them made significant contributions to the Groovy ecosystem: Graeme is famous for co-creating the Grails framework, and is still the lead of the project, while Alex is the one who contributed major performance improvements to the Groovy runtime (call site caching) and first experimented with a static compiler for Groovy (Groovy++).

Danno Ferrin contributed what is still one of my personal favorite features of Groovy, AST transformations, and probably one of the reasons I got paid to work on Groovy so thank you Danno! Andrés Almiray, listed here for the first time, is famous for the Griffon framework, a Grails-like framework for desktop applications which is still actively developed. He spent a lot of time improving the Swing support in Groovy.

Starting from 2007, you will see that the sponsored ratio of commits is changing. People who were employed by G2One fall into that category. As you can see, 2007 is more than important for Groovy: it is its second birth. And to top it off, Groovy won first prize at the JAX 2007 innovation award.

2008: The G2One era

Total: 1069 commits Sponsored: 287 commits (26%) Community: 782 commits (73%)

  1. Paul King : 445 commits (41%)

  2. Danno Ferrin : 176 commits (16%)

  3. Jochen Theodorou (Sponsored) : 126 commits (11%)

  4. Alex Tkachman (Sponsored) : 125 commits (11%)

  5. Guillaume Laforge (Sponsored) : 33 commits (3%)

  6. Jim White : 32 commits (2%)

  7. Russel Winder : 31 commits (2%)

  8. Martin Kempf : 22 commits (2%)

  9. Roshan Dawrani : 19 commits (1%)

  10. Jeremy Rayner : 14 commits (1%)

  11. Martin C. Martin : 12 commits (1%)

  12. Jason Dillon : 9 commits (0%)

  13. Andres Almiray : 8 commits (0%)

  14. Thom Nichols : 5 commits (0%)

  15. Graeme Rocher (Sponsored) : 3 commits (0%)

  16. Jeff Brown : 3 commits (0%)

  17. John Wilson : 3 commits (0%)

  18. James Williams : 1 commits (0%)

  19. Marc Guillemot : 1 commits (0%)

  20. Vladimir Vivien : 1 commits (0%)

In 2008, Paul King still ranks #1 and you can see that the people who were sponsored by G2One were actually not the main contributors. In fact, most of them did consulting to pay salaries, which doesn’t leave much time to contribute to the language. Fortunately, a great project such as Groovy can rely on its community! Guillaume, Graeme and Alex were looking for an opportunity to spend more time on actual development, and it happened in November 2008 when G2One got acquired by SpringSource.

Some of the contributors you see in this list are still actively using Groovy or contributing: Jim White for example is famous for his contributions on the scripting sides of the language. Roshan Dawrani is one of the few guys capable of opening cryptic code and fixing bugs. Jeff Brown is a name you should know, since he is now a key member of the Grails team.

2009: milestones and the inappropriate quote

Total: 835 commits Sponsored: 183 commits (21%) Community: 652 commits (78%)

  1. Paul King : 342 commits (40%)

  2. Roshan Dawrani : 128 commits (15%)

  3. Jochen Theodorou (Sponsored) : 101 commits (12%)

  4. Alex Tkachman (Sponsored) : 41 commits (4%)

  5. Guillaume Laforge (Sponsored) : 40 commits (4%)

  6. Jason Dillon : 31 commits (3%)

  7. Jim White : 31 commits (3%)

  8. Danno Ferrin : 24 commits (2%)

  9. Peter Niederwieser : 23 commits (2%)

  10. Hamlet D’Arcy : 18 commits (2%)

  11. Russel Winder : 14 commits (1%)

  12. Martin C. Martin : 13 commits (1%)

  13. Thom Nichols : 13 commits (1%)

  14. Andres Almiray : 12 commits (1%)

  15. Vladimir Vivien : 3 commits (0%)

  16. Graeme Rocher (Sponsored) : 1 commits (0%)

2009 is another important year, concluding with the release of Groovy 1.7, the first version of Groovy supporting inner classes and the famous power asserts from Peter Niederwieser. If you know Groovy, you must know Peter, the father of the famous Spock testing framework which just reached 1.0!

Hamlet D’Arcy contributed a lot in terms of code quality, but also became the first specialist of AST transformations. 2009 is also the year I started to use Groovy, as a user. I never stopped and actually I started contributing back then. At that time, Groovy was still using Subversion (we’re now using Git like all the cool kids), so it was the good old patch way, losing authorship.

This year is also the year when James Strachan wrote a very famous quote about Groovy. This quote is probably the most inappropriately used quote about Groovy of all time, because it was made by its creator, but remember that James left the project in 2005!

I can honestly say if someone had shown me the Programming in Scala book by Martin Odersky, Lex Spoon & Bill Venners back in 2003 I’d probably have never created Groovy.
on his blog
— James Strachan

First of all James says nothing about the language itself here. He had already left the project and says that if he had known about Scala before, he wouldn’t have created Groovy. I am today very happy that he didn’t know about it, or we would have missed an incredibly powerful language. Groovy today is nothing close to what it was when James left the project, thanks to the lead of Guillaume Laforge and incredibly talented people like Paul King, Jochen Theodorou and all the contributors listed on this page. Groovy and Scala both have their communities, but also different use cases. I wouldn’t sell one for the other…

At the end of 2009, another important milestone occurred for the project, with VMware acquiring SpringSource.

2010: DSLs all the way

Total: 894 commits Sponsored: 189 commits (21%) Community: 705 commits (78%)

  1. Paul King : 443 commits (49%)

  2. Roshan Dawrani : 134 commits (14%)

  3. Jochen Theodorou (Sponsored) : 96 commits (10%)

  4. Guillaume Laforge (Sponsored) : 93 commits (10%)

  5. Hamlet D’Arcy : 71 commits (7%)

  6. Alex Tkachman : 28 commits (3%)

  7. Peter Niederwieser : 19 commits (2%)

  8. Andres Almiray : 7 commits (0%)

  9. Jason Dillon : 1 commits (0%)

  10. Russel Winder : 1 commits (0%)

  11. Thom Nichols : 1 commits (0%)

2010 is a pretty stable year for Groovy. Groovy reaches 1.8 in 2010 with important features for its incredible DSL design capabilities. With command chain expressions, native JSON support and performance improvements, Groovy set the bar very high in terms of integration in the Java ecosystem. Today, no other JVM language is as simple as Groovy to integrate with Java. With cross-compilation and the use of the very same class model, Groovy is at that date the best language for scripting on the JVM. It is so good that a lot of people start to see it as a better Java and want to use it as a first class language. However, being dynamic, Groovy is still a problem for a category of users…

2011: Time to move to GitHub

Total: 841 commits Sponsored: 514 commits (61%) Community: 327 commits (38%)

  1. Cédric Champeau (Sponsored) : 252 commits (29%)

  2. Paul King : 212 commits (25%)

  3. Jochen Theodorou (Sponsored) : 163 commits (19%)

  4. Guillaume Laforge (Sponsored) : 98 commits (11%)

  5. Jochen : 44 commits (5%)

  6. Hamlet D’Arcy : 33 commits (3%)

  7. Roshan Dawrani : 26 commits (3%)

  8. Andres Almiray : 1 commits (0%)

  9. Andrew Eisenberg : 3 commits (0%)

  10. Alex Tkachman : 2 commits (0%)

  11. Bobby Warner : 1 commits (0%)

  12. Colin Harrington : 1 commits (0%)

  13. Dierk Koenig : 1 commits (0%)

  14. Dirk Weber : 1 commits (0%)

  15. John Wagenleitner : 1 commits (0%)

  16. Lari Hotari (Sponsored) : 1 commits (0%)

  17. Peter Niederwieser : 1 commits (0%)

In 2011, I became a committer to the Groovy project. As I said, I had contributed several fixes or features for Groovy 1.8, but for the first time, I became a committer and I started to be able to push changes to the codebase without having to ask permission. So this is basically the first time you see my name on the contributors list, but you can see that I am ranking #1 and I have never lost that ranking since then. It surprised me too, but there is a very good reason for that. In October 2011, in addition to being a committer, I also started being paid to work on Groovy. Full-time. I entered the club of lucky people being paid to work on open-source software. It was sincerely a dream, and I will never be thankful enough to Guillaume Laforge for giving me this opportunity. He changed my life and I think I became a better developer thanks to him. VMware was my employer back then, and while I had never worked on a language before, Guillaume trusted my skills and proposed that I work on something that would dramatically change the language: a static type checker.

I also worked on the infrastructure of the language, starting from the migration to GitHub. It was an important move to make: as you can see, there was a very limited set of committers to Groovy. With GitHub, we had the tool we needed to increase the size of our community and from the numbers that will follow, I think it’s a success.

2012: Groovy 2 and static compilation

  1. Cédric Champeau (Sponsored) : 515 commits (46%)

  2. Paul King : 249 commits (22%)

  3. Jochen Theodorou (Sponsored) : 169 commits (15%)

  4. Guillaume Laforge (Sponsored) : 74 commits (6%)

  5. PascalSchumacher : 12 commits (1%)

  6. Peter Niederwieser : 11 commits (0%)

  7. René Scheibe : 11 commits (0%)

  8. Andre Steingress : 9 commits (0%)

  9. John Wagenleitner : 7 commits (0%)

  10. Peter Ledbrook : 6 commits (0%)

  11. Andres Almiray : 6 commits (0%)

  12. Adrian Nistor : 5 commits (0%)

  13. Tim Yates : 5 commits (0%)

  14. Baruch Sadogursky : 4 commits (0%)

  15. Andrew Eisenberg : 3 commits (0%)

  16. Rich Freedman : 3 commits (0%)

  17. Stephane Maldini : 3 commits (0%)

  18. Andrew Taylor : 2 commits (0%)

  19. Jeff Brown : 2 commits (0%)

  20. Luke Daley : 2 commits (0%)

  21. Tiago Fernandez : 2 commits (0%)

  22. Andrey Bloschetsov : 1 commits (0%)

  23. Johnny Wey : 1 commits (0%)

  24. Kenneth Kousen : 1 commits (0%)

  25. Mathieu Bruyen : 1 commits (0%)

  26. Paul Bakker : 1 commits (0%)

  27. Paulo Poiati : 1 commits (0%)

  28. Sean Flanigan : 1 commits (0%)

  29. Suk-Hyun Cho : 1 commits (0%)

  30. Vladimir Orany : 1 commits (0%)

2012 is one of the most important years for the language. It was the year Groovy 2.0 was released. As you can see, I am still ranking #1 and Paul King, an unpaid contributor, is #2. This tells you the importance of community! Groovy 2 is a major change in the language, because it introduced both optional type checking and static compilation. For the first time, Groovy was able to provide at compile time the same level of feedback that Java would have. Some people wanted to kill me for having introduced that into the language. The truth is that it wasn’t my decision, but in retrospect, I am very happy with what the language is now. Without this, some people would have abandoned Groovy in favor of other JVM languages like Scala, while now in Groovy you can have the same level of performance as Java, with type safety, powerful type inference, extension methods, functional style programming and without the boilerplate. And it’s optional. I don’t know any other language that allows this, especially when you take type checking extensions into account, a feature that allows Groovy to go far beyond what Java and other languages offer in terms of type safety or static compilation.

2012 also sees the appearance of Pascal Schumacher, a silent but very active Groovy committer. Since 2012, Pascal has done an amazing job helping us filter JIRA issues, writing bugfixes, reviewing pull requests and lately writing documentation.

2013: Documentation effort and explosion of contributions

  1. Cédric Champeau (Sponsored) : 244 commits (22%)

  2. Paul King : 188 commits (17%)

  3. PascalSchumacher : 180 commits (16%)

  4. Jochen Theodorou (Sponsored) : 96 commits (8%)

  5. Thibault Kruse : 84 commits (7%)

  6. Guillaume Laforge (Sponsored) : 54 commits (4%)

  7. Andrey Bloschetsov : 43 commits (3%)

  8. Andre Steingress : 36 commits (3%)

  9. Pascal Schumacher : 27 commits (2%)

  10. Tim Yates : 24 commits (2%)

  11. René Scheibe : 12 commits (1%)

  12. kruset : 12 commits (1%)

  13. Martin Hauner : 8 commits (0%)

  14. Andres Almiray : 8 commits (0%)

  15. Larry Jacobson : 4 commits (0%)

  16. John Wagenleitner : 6 commits (0%)

  17. Paolo Di Tommaso : 6 commits (0%)

  18. Jeff Scott Brown (Sponsored) : 5 commits (0%)

  19. Masato Nagai : 5 commits (0%)

  20. Jochen Eddelbüttel : 3 commits (0%)

  21. hbaykuslar : 3 commits (0%)

  22. shalecraig : 3 commits (0%)

  23. Andrew Eisenberg : 2 commits (0%)

  24. Jacopo Cappellato : 2 commits (0%)

  25. Peter Niederwieser : 2 commits (0%)

  26. Rafael Luque : 2 commits (0%)

  27. Vladimir Orany : 1 commits (0%)

  28. saschaklein : 2 commits (0%)

  29. seanjreilly : 2 commits (0%)

  30. upcrob : 2 commits (0%)

  31. Adrian Nistor : 1 commits (0%)

  32. Alan Thompson : 1 commits (0%)

  33. Alessio Stalla : 1 commits (0%)

  34. DJBen : 1 commits (0%)

  35. Eric Dahl : 1 commits (0%)

  36. Ingo Hoffmann : 1 commits (0%)

  37. JBaruch : 1 commits (0%)

  38. Jacob Aae Mikkelsen : 1 commits (0%)

  39. Jim White : 1 commits (0%)

  40. John Engelman : 1 commits (0%)

  41. Jon Schneider : 1 commits (0%)

  42. Karel Piwko : 1 commits (0%)

  43. Kenneth Endfinger : 1 commits (0%)

  44. Kohsuke Kawaguchi : 1 commits (0%)

  45. Luke Kirby : 1 commits (0%)

  46. Michal Mally : 1 commits (0%)

  47. Miro Bezjak : 1 commits (0%)

  48. Olivier Croquette : 1 commits (0%)

  49. Rob Upcraft : 1 commits (0%)

  50. Sergey Egorov : 1 commits (0%)

  51. Stefan Armbruster : 1 commits (0%)

  52. Yasuharu NAKANO : 1 commits (0%)

While we continued to improve Groovy, 2013 was very important for the community. You can start to see the GitHub effect here, with many more contributors than before. It is impressive to see the difference before 2011 and after. The number of contributors is continuously growing. In 2013, 63% of commits came from the community!

In February 2013, we also launched a new big project: the documentation and website overhaul. It is incredible to think that this effort is still incomplete, but when you see that the old wiki has more than a thousand pages of content (often outdated), you can imagine what effort it takes to rewrite the documentation. Fortunately, we’re close to filling the gap now, and with the demise of Codehaus, we officially launched our new website where you can see the result of this job.

I also started working on Android support during 2013, for a first overview at GR8Conf 2014, and continued working on improving the infrastructure, with Bintray, TeamCity and Gradle. And Pivotal was born, out of EMC and VMware. Groovy and Grails, along with the Spring Framework, became part of this new company which is still paying me today to work on Groovy (and I, we, should be very thankful for this).

2014: Towards Android support

  1. Cédric Champeau (Sponsored) : 446 commits (37%)

  2. Paul King : 261 commits (22%)

  3. Jochen Theodorou (Sponsored) : 85 commits (7%)

  4. Guillaume Laforge (Sponsored) : 61 commits (5%)

  5. Thibault Kruse : 54 commits (4%)

  6. Pascal Schumacher : 47 commits (3%)

  7. Jim White : 26 commits (2%)

  8. Yu Kobayashi : 18 commits (1%)

  9. Andre Steingress : 16 commits (1%)

  10. Richard Hightower : 3 commits (0%)

  11. James Northrop : 11 commits (0%)

  12. Kenneth Endfinger : 9 commits (0%)

  13. Tomek Janiszewski : 9 commits (0%)

  14. Matias Bjarland : 8 commits (0%)

  15. Tobia Conforto : 8 commits (0%)

  16. Michael Schuenck : 7 commits (0%)

  17. Sargis Harutyunyan : 7 commits (0%)

  18. Andrey Bloschetsov : 6 commits (0%)

  19. Craig Andrews : 5 commits (0%)

  20. Kent : 5 commits (0%)

  21. Paolo Di Tommaso : 5 commits (0%)

  22. Peter Ledbrook : 5 commits (0%)

  23. Sergey Egorov : 5 commits (0%)

  24. Yasuharu Nakano : 5 commits (0%)

  25. Andrew Hamilton : 4 commits (0%)

  26. Lari Hotari (Sponsored) : 4 commits (0%)

  27. Bloshchetsov Andrey Evgenyevich : 3 commits (0%)

  28. Johannes Link : 3 commits (0%)

  29. Keegan Witt : 3 commits (0%)

  30. Tim Yates : 3 commits (0%)

  31. anto_belgin : 3 commits (0%)

  32. Baruch Sadogursky : 2 commits (0%)

  33. Dan Allen : 2 commits (0%)

  34. Jan Sykora : 2 commits (0%)

  35. John Wagenleitner : 2 commits (0%)

  36. Luke Kirby : 2 commits (0%)

  37. Martin Stockhammer : 2 commits (0%)

  38. UEHARA Junji : 2 commits (0%)

  39. Vihang D : 2 commits (0%)

  40. Andres Almiray : 2 commits (0%)

  41. Andy Hamilton : 1 commits (0%)

  42. Bobby Warner : 1 commits (0%)

  43. Carsten Lenz : 1 commits (0%)

  44. Chris Earle : 1 commits (0%)

  45. David Avenante : 1 commits (0%)

  46. David Nahodil : 1 commits (0%)

  47. David Tiselius : 1 commits (0%)

  48. Dimitar Dimitrov : 1 commits (0%)

  49. Grant McConnaughey : 1 commits (0%)

  50. Jeff Sheets : 1 commits (0%)

  51. Jess Sightler : 1 commits (0%)

  52. Logan Gorence : 1 commits (0%)

  53. Luke Daley : 1 commits (0%)

  54. Manuel Prinz : 1 commits (0%)

  55. Marc Guillemot : 1 commits (0%)

  56. Marcin Grzejszczak : 1 commits (0%)

  57. Nathan Mische : 1 commits (0%)

  58. Peter Swire : 1 commits (0%)

  59. Sagar Sane : 1 commits (0%)

  60. Stephen Mallette : 1 commits (0%)

  61. Tobias Schulte : 1 commits (0%)

  62. Wil Selwood : 1 commits (0%)

  63. davidmichaelkarr : 1 commits (0%)

  64. fintelia : 1 commits (0%)

  65. kruset : 1 commits (0%)

  66. paul-bjorkstrand : 1 commits (0%)

2014 was a difficult year. We had a lot of work to do on the documentation side, new features to deliver (traits) and an important topic we definitely wanted to highlight: Android support. This took longer than expected, but in the end, the new Groovy 2.4 delivers it. We’re lucky to have half of the commits coming from the community here. In particular, lots of people helped us on the documentation. And it wasn’t easy, because our documentation requires that every snippet of code that appears in the docs belongs to a unit test, to make sure that the documentation is always up-to-date.

Meanwhile, at the end of the year, we learnt from Pivotal that they would stop sponsoring our jobs. It means that Guillaume Laforge, Jochen Theodorou and myself, for the Groovy team, plus Graeme Rocher, Jeff Brown and Lari Hotari, for the Grails team, were all losing both our jobs and our full-time work on the projects at the same time. This wasn’t really a surprise and I am very happy I could work for so long on Groovy, full time, but as I said in a previous post I also hope I will still be able to do that, because you can see from the numbers and features that it matters. If you wonder, we are still discussing with several potential sponsors.

2015: Your story

Total: 178 commits Sponsored: 81 commits (45%) Community: 97 commits (54%)

  1. Cédric Champeau (Sponsored) : 69 commits (38%)

  2. Pascal Schumacher : 59 commits (33%)

  3. Jochen Theodorou (Sponsored) : 12 commits (6%)

  4. Paul King : 12 commits (6%)

  5. JBrownVisualSpection : 7 commits (3%)

  6. Yu Kobayashi : 3 commits (1%)

  7. Christoph Frick : 2 commits (1%)

  8. Kamil Szymanski : 2 commits (1%)

  9. Michael Schuenck : 2 commits (1%)

  10. Sean Gilligan : 2 commits (1%)

  11. Sergey Egorov : 2 commits (1%)

  12. Thibault Kruse : 2 commits (1%)

  13. Andy Wilkinson : 1 commits (0%)

  14. Maksym Stavytskyi : 1 commits (0%)

  15. Mario Garcia : 1 commits (0%)

  16. Radovan Synek : 1 commits (0%)

2015 will be another important year. It’s going to be huge for the community. Guillaume Laforge announced that he was joining Restlet, so for the first time since 2007 he will not be fully employed to work on Groovy, but I don’t expect this to have a big impact on the language development itself: as you can see from the numbers, about half of the commits already come from the community and Guillaume didn’t contribute much code lately. He was instead the lead of the project, the one who made decisions, the one speaking about the project and talking to and leading the community. He was the voice. It was a hard job, a very important one for Groovy. Guillaume is still today the lead of the project, and he will continue to contribute to the language, but I know from him that he wanted to be able to write more code, and put Groovy into action in a new project.

With the end of sponsorship of Pivotal, the demise of Codehaus and Guillaume’s decision, it became even more important to move Groovy to a foundation where it will be able to live with or without us. I have honestly no idea where I will work in a few weeks now. I sincerely hope I will still be able to contribute to the language full time, but let’s be clear: today, it is very unlikely this is going to happen. It makes it very important for the project to be able to develop the community even more. We had more than 4.5 million downloads last year. This is huge. And with Android support, I see a lot of potential, even if we have tough competition with other languages and people being paid to develop them. The Apache Software Foundation is going to help us with securing the future of the language and building a community. I am proud of what you have done, collectively, and this is not over. Groovy is ready for a rebirth under the Apache umbrella!

More than ever, the future of Groovy is you.

To conclude this post, here are the top 10 contributors, in terms of number of commits, of Groovy, for the past 12 years. Congratulations Paul and thanks to our 100+ contributors!

  1. Paul King : 2653 commits (23%)

  2. Jochen Theodorou : 1562 commits (13%)

  3. Cédric Champeau : 1526 commits (13%)

  4. James Strachan : 1031 commits (9%)

  5. Guillaume Laforge : 709 commits (6%)

  6. Roshan Dawrani : 307 commits (2%)

  7. Jason Dillon : 306 commits (2%)

  8. Danno Ferrin : 301 commits (2%)

  9. Alex Tkachman : 283 commits (2%)

  10. John Wilson : 230 commits (2%)


Looking for a new job

19 January 2015

Tags: groovy jvm job

Will you be the new Groovy sponsor?

There we are. January 19th was a bad day as Pivotal announced that they wouldn’t sponsor the development of the Groovy language, as well as its long time friend the Grails framework, starting from March 31st. As I was paid to work on Groovy, it has a direct consequence for me: starting from April 1st, I am for hire. For you, it probably doesn’t change much as Groovy and Grails existed before Pivotal without corporate funding.

It was hard to keep this secret for weeks, but now that everything is public, it is easier for us. I have been lucky to work full time on the Groovy language for a bit more than 3 years now. Lucky because it was both a great piece of open-source software and my passion. I have implemented major features of the language since Groovy 1.8: static type checking, static compilation, Android support, numerous AST transformations, continuous integration… I have given talks at various conferences, including world class ones like JavaOne, Devoxx, GR8Conf or SpringOne2GX and it was a fantastic opportunity that wouldn’t have been possible without the support from VMware then Pivotal.

However I feel that I still have a lot to do. So many things are on the list, so much to improve, like for example adding language support for asynchronous programming, improved Java 8 support, a new meta-object protocol, not forgetting, of course, bugfixes… I am 100% convinced that the language wouldn’t have reached the level of maturity it has without support from a company like Pivotal. And even though Java is considered mature, it doesn’t mean that the language shouldn’t continue to evolve: it is rather the opposite. People are constantly asking for evolutions, because programming models evolve too.

For those reasons, my #1 choice would be to find a new company securing the development of Groovy and Grails, that is to say willing to pay me and my mates to continue working on them. The Groovy community is huge, there are lots of companies using Groovy in a variety of domains, I sincerely hope this is possible. With more than 4 million downloads in 2014, I can’t imagine how many hours of development have been saved thanks to Groovy and Grails… Should you be interested in sponsoring our projects, please write to sponsorship@groovy-lang.org.

About me

If unfortunately we can’t find third parties willing to take over the development of the language and hire us, then I would obviously be available to help you in your projects. For those of you who are looking for someone with a good knowledge of the JVM internals, I am listening. My areas of interest:

  • R&D: innovation, technical challenges are what drive my motivation

  • open-source over closed-source: If you are an Open-Source company, by that I mean a company which contributes Open-Source, I think we are on the same line. I love to share what I do and I am convinced that open-source development is the best way to improve the global quality of a project. I have worked on closed-source software in the past, there are good reasons to do so but working on OSS software is really what I prefer. I also enjoy speaking publicly about projects I work on.

  • JVM: I have spent most of my career on that platform and believe me, it’s not dead :)

  • back-end over front-end: I love working on tooling, performance tuning, algorithmics… everything that is hidden to the end user. On the other side I am not so good on the front-end part (HTML, CSS, …) so it would probably not be a good idea to make me work on that topic :)

  • remote working: I live in Saint Hilaire de Loulay, a small countryside town near Nantes, in France. I have now been working from home for more than 3 years; most of the Spring Framework team works that way, and it has been very successful so far. If you are not against the idea, I would love to continue that way.

In any case if you think a profile like mine is interesting for your company, you can contact me or ask for my résumé at cedric.champeau@gmail.com. Meanwhile, rest assured that we are focused and we will soon release Groovy 2.4 to prove that!


10 things your static language can’t do

15 December 2014

Tags: groovy languages static dynamic java javascript scala C++

But maybe mine can…

For those of you who do not know me, and you are likely much more in that category than in the other, I’ve been working on the Groovy language full-time for more than 3 years now. I started as a user several years ago, in the context of DSLs, and eventually became a contributor before getting employed to work on the language. I love static typing, but not to the point of thinking that we should only reason about types. This is why I also love dynamic typing, and why I invested so much in Groovy.

Groovy is primarily a dynamic language. It is probably the most widely used alternative language on the JVM, with use cases ranging from DSLs (scripting Jenkins builds for example) to full blown applications on mobile devices (Android), going through full stack web applications with Grails. Despite being primarily a dynamic language, I spent a lot of time writing a static compiler for Groovy, making it a pretty unique language in the JVM world, but not limited to the JVM world:

Groovy is a language which supports dynamic typing and compile-time type checking. This makes it a surprisingly powerful and versatile language, capable of adapting to a great variety of contexts.

When I tell people that I wrote the static compiler for Groovy, I often get a reaction which is "so you admit that dynamic languages are less powerful than static ones", and they see me as the one that made the language right. Oh no, I did not. In fact, I love the dynamic aspects of the language. It is always annoying that I, as a designer of a static compiler, have to defend dynamic languages, but it’s an interesting topic, especially these days where I read lots of articles doing dynamic-bashing.

So in this post, I’m going to illustrate 10 things that a static language (most likely) cannot do. It doesn’t mean that there are only 10 things that a static language cannot do compared to a dynamic one, but it is here to illustrate the fact that this idea that static languages are superior or more scalable just because they are type safe is IMHO stupid. Compare languages with each other, but do not compare categories of languages. While static languages will be excellent in making type safety guarantees (errors at compile time), dynamic languages are often far superior in cutting down verbosity. That’s just an example which illustrates that comparing on the sole aspect of type safety is not enough.

In this post I will also illustrate how Groovy is special, because it is capable of handling things in a mixed mode, making it totally unique and incredibly powerful, bringing the best of the two worlds together. Last disclaimer, this post is mostly centered on the JVM world, because this is the one I know the best.

As expected I got lots of comments on various social media. Some are positive, some not, that’s ok, but again, I would like to remind you that this is not static languages bashing, nor dynamic languages promotion. Maybe some of you will think "hey, but my static language can do it!" and yes, it is possible, because as the subtitle of this post says, mine can too. But when it does, often, there’s a drawback (like losing type safety, decreased performance,… ) or you fall back on dynamic behavior without even noticing it. When it is the case, I tried to be honest and tell about the possibilities. But I also deliberately hid some static features of Groovy that make it a very interesting solution (flow typing for example). Last but not least, I am not saying that all dynamic languages implement all those items. I am saying that by nature a dynamic language makes it possible. And possible doesn’t mean required. So please read carefully before screaming!

Ready?

10. Object-oriented programming done right

This is a very academic point of view. Users of static languages tend to think that their language is object-oriented. Because C++ has a compiler, because Java has a compiler, they conclude that object-oriented languages have to be compiled. Object-oriented programming does not require languages to be compiled. OOP is about message passing. Object-oriented programming does not imply type safety. It means that an object is something that receives messages. In the Java world, the message is a method with arguments, and the contract is enforced at compile time, but it is an implementation detail which reduces OO programming to a small subset of its capabilities.

The great Alan Kay himself explains it.

Groovy, as a dynamic language, supports common OOP concepts (and also functional concepts) like classes, interfaces, abstract classes or traits but also has a meta-object protocol. For those of you who did Smalltalk programming, it’s the same idea: the behavior of objects is not determined at compile time, it’s a runtime behavior determined by a meta-object protocol. In Groovy, it translates to the fact that to each class corresponds a meta-class which determines the behavior that an object will have when it receives a message (a method call).

This capability of doing things at runtime instead of compile time is the core of many features of dynamic languages and most of the points illustrated in this blog post derive from it.

I had some comments that the fact that dynamic languages can do OO right wasn’t really interesting. In fact, I insisted on keeping this because this is actually what makes most of the following items possible. So think of 10. as the foundation for most of the following items.

9. Multimethods

Java supports overloaded methods. The question whether it is a good or a bad idea is beyond the scope of this post (and believe me it is a very interesting question both in terms of semantics and performance). The idea is that an object can have two methods of the same name accepting different parameters. Let’s illustrate this with Java code:

public static int foo(String o) { return 1; }
public static int foo(Date o) { return 2; }
public static int foo(Object o) { return 3; }

Then you call it like this:

public static void main(String[] args) {
    Object[] array = new Object[] { "a string", new Date(), 666 };
    for (Object o : array) {
        System.out.println(foo(o));
    }
}

What do you think it prints? Well, most beginners will probably answer something that looks natural when you know the contents of the array:

1
2
3

But the correct answer is:

3
3
3

Because the static type of o when the call to foo is made is Object. To say it more clearly, the declared type of o is Object so we are calling foo(Object). The reason for this is that the code is statically compiled so the compiler has to know at compile time which method is going to be called. A dynamic language like Groovy chooses the method at runtime (unless, of course, you use @CompileStatic to enforce static semantics), so the method which is going to be called corresponds to the best fitting arguments. So Groovy, unlike Java, will print the less surprising result:

1
2
3

It is theoretically possible for a static language to do the same. But it comes at the price of performance. It would mean that the arguments have to be checked at runtime, and since static languages do not, as far as I know, implement an inline cache, performance would be lower than that of a well designed dynamic language…
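To make that runtime check concrete, here is a minimal Java sketch of what choosing the overload by hand looks like. The class name RuntimeDispatch and the dispatch method are hypothetical names introduced only for this illustration; this is not how Groovy implements method selection, just the manual equivalent a Java developer would have to write.

```java
import java.util.Date;

// Hypothetical illustration: emulating multimethod dispatch by hand in Java.
class RuntimeDispatch {
    static int foo(String o) { return 1; }
    static int foo(Date o) { return 2; }
    static int foo(Object o) { return 3; }

    // Inspects the runtime type of the argument, then uses an explicit cast
    // so that the compiler binds the call to the matching overload.
    static int dispatch(Object o) {
        if (o instanceof String) {
            return foo((String) o);
        }
        if (o instanceof Date) {
            return foo((Date) o);
        }
        return foo(o);
    }

    public static void main(String[] args) {
        Object[] array = new Object[] { "a string", new Date(), 666 };
        for (Object o : array) {
            // Prints 1, 2 and 3: the runtime type now drives the choice.
            System.out.println(dispatch(o));
        }
    }
}
```

Every call pays for the instanceof checks, and the dispatch method has to be updated by hand each time a new overload is added.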

But to add one more point for dynamic languages, what if you remove the Object version of foo, and remove 666 from the array? As an exercise for the reader, would this Java code compile?

public static int foo(String o) { return 1; }
public static int foo(Date o) { return 2; }

public static void main(String[] args) {
    Object[] array = new Object[] { "a string", new Date() };
    for (Object o : array) {
        System.out.println(foo(o));
    }
}

If not, what do you have to do to make it pass? Yes, dynamic languages are superior here…

8. Duck typing

Duck typing has always been a selling point of dynamic languages. Basically imagine two classes:

class Duck {
   String getName() { 'Duck' }
}
class Cat {
   String getName() { 'Cat' }
}

Those two classes define the same getName method, but it is not defined explicitly in a contract (for example through an interface). There are many reasons why this can happen. For example, you didn’t write those classes, they are in a third party library and for some reason those methods were not intended to be part of the contract. Imagine that you have a list of objects containing either ducks, cats, or anything else defining a getName method. Then a dynamic language will let you call that method:

def list = [cat, dog, human, hal]
list.each { obj ->
   println obj.getName()
}

A static language like Java would force you to have a cast here. But since you don’t have an interface defining getName and implemented by all objects, you cannot cast to that type so you have to consider all types and delegate appropriately like in the following code:

if (obj instanceof Cat) {
   return ((Cat)obj).getName();
}
if (obj instanceof Duck) {
   return ((Duck)obj).getName();
}
if (obj instanceof Human) {
   return ((Human)obj).getName();
}
if (obj instanceof Computer) {
   return ((Computer)obj).getName();
}

The real solution in Java is to define either a common super class or an interface for all those, but again, sometimes you just cannot because you don’t have access to the code! Imagine that the Cat and Dog classes were designed like this for example:

public abstract class Something {} // should define getName, but does not for some obscure reason
public class Cat extends Something {
   public String getName() { return "Cat"; }
}
public class Dog extends Something {
   public String getName() { return "Dog"; }
}

Often the developer didn’t even realize that all objects share a common interface. That’s bad for you, and if you find this code you have no choice but the cascading instanceof solution. There are multiple issues with that code:

  • it is very repetitive, the only thing which changes is the type used in the test and the cast

  • it has to be exhaustive, that is to say that if your list happens to contain an object having a getName method but not in your list of cases to consider, the code is broken. This means that you have to think about changing that method if you add a new type in your list.

  • in the JVM world, as the number of cases to consider grows, the size of the method will increase to the point where the JIT (just-in-time compiler) decides it’s not worth inlining, potentially dramatically reducing performance.

Of course, one may say "but why the hell didn’t you use an interface". This is of course a good way to solve this in Java, but it is not always possible. Not for example if you don’t have access to the source code (think of the various classes being split in third party libraries). I often faced this problem in the past, and believe me it’s no fun (I look at you, Apache Lucene).

There are actually alternatives for static languages. In Java, you could use a reflective proxy: define an interface, then create a proxy implementing that interface which will delegate to the appropriate getName method. Of course it is overkill: for each object of your list you have a proxy instantiated… Another option, again in Java, is to make the call reflective. But in that case, the call becomes slow and in fact, what you are doing is a dynamic call like a dynamic language would do. A language like Groovy doesn’t have that problem because it implements smart techniques like call site caching and runtime bytecode generation which make it much faster than what a reflective call would do…
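For the curious, the reflective proxy workaround can be sketched in Java as follows. This is a minimal sketch under assumptions: the Named interface, the asNamed helper and the nested Duck and Cat stand-ins are hypothetical names for this illustration, and the handler does a reflective lookup on every call, which is exactly the overhead described above.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical illustration of the reflective proxy workaround.
class DuckProxyDemo {
    // The contract that the third party classes never declared.
    interface Named {
        String getName();
    }

    // Stand-ins for classes we cannot modify; they share no interface.
    static class Duck {
        public String getName() { return "Duck"; }
    }

    static class Cat {
        public String getName() { return "Cat"; }
    }

    // Wraps any object in a proxy implementing Named. Each call is forwarded
    // reflectively to the public method of the same name on the target.
    static Named asNamed(Object target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Method real = target.getClass()
                    .getMethod(method.getName(), method.getParameterTypes());
            return real.invoke(target, args);
        };
        return (Named) Proxy.newProxyInstance(
                Named.class.getClassLoader(),
                new Class<?>[] { Named.class },
                handler);
    }

    public static void main(String[] args) {
        Object[] objects = { new Duck(), new Cat() };
        for (Object o : objects) {
            // One proxy instance is created per object, as noted in the text.
            System.out.println(asNamed(o).getName());
        }
    }
}
```

If the target has no getName method, the failure only shows up at runtime as a NoSuchMethodException, so the compile-time guarantee is gone anyway.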

An elegant alternative used by other static languages is structural typing. This is for example what the Go language does. In this case, you define an interface, but the object does not have to explicitly implement the interface: the fact that the object defines a method corresponding to the method in the interface is enough to implement it. This is elegant but it changes the semantics of an interface as you define it in Java. Last but not least, this technique cannot be used on a platform like the JVM, because the virtual machine has no way to do it. Well, this is not totally true since now we have the invokedynamic bytecode instruction but guess what? You are relying on a dynamic feature of the VM… Can you hear it?

Some argued that this is very bad design. I must repeat that if you think so, you missed the point. The idea is to work around poorly designed APIs (or APIs which were "optimized"). When I talked about Lucene it was for a very good reason. I faced the problem. Lucene is a highly optimized piece of code. It makes design decisions which are often based on performance: flattening class hierarchies as much as possible (the HotSpot JIT doesn’t like deep class hierarchies), making classes final, preferring abstract classes over interfaces, … So it is easy to find classes that you want to extend, but you can’t because they are final, or classes that implicitly implement a contract but do not define interfaces. This is a pain to work with, and the ability of a dynamic language to call such methods without having to explicitly declare a contract is a real gain. Some static languages offer similar features through structural typing, but then you have to think about what it means (virtual table lookup?) and how it is implemented depending on the platform (on the JVM, relying on reflection is possible but you lose all type safety and have very bad performance). So every time I used duck typing, it wasn’t on APIs that I had designed. It was on 3rd party APIs, that for some reason didn’t provide me with a way to call some methods.

7. Respond to non-existing methods

A dynamic language answers to messages (method calls) at runtime. This means that a well designed dynamic language should be able to let you answer any kind of method call, including… non existing methods! This feature is at the core of powerful facilitating frameworks like Grails. In Grails, you can define a domain class like this:

class Person {
   String firstName
   String lastName
   int age
}

The Person class does not define any method, nor does it have any explicit relation to a datastore, an ORM or SQL server. However, you can write code like this:

def adults = Person.findByAge { it>= 18 }

I will not dig into the details about how this is done, but the idea is to intercept the fact that the findByAge method does not exist, then parse the name of the method and build a query based on the method name and the rest of the arguments (here, a closure, an open block of code). Queries can be as complex as you wish, like findByLastNameAndAge or whatever you can think of. Of course Grails does some smart things here, like generating a new method at runtime, so that the next time this method is hit, it is not an unknown method anymore, and can be invoked faster! Only a dynamic language would let you do that. Say bye to infamous DAOs that you have to change every time you have a new query, it is not necessary. One could say that they prefer safety at compile time rather than the ability to do this, but Grails also offers that possibility of checking that the syntax is correct at compile time, while still leveraging the dynamic runtime to make this work… It’s all about boilerplate removed, code verbosity and productivity…
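The name-parsing step on its own is easy to picture. Here is a minimal Java sketch of it, under loud assumptions: FinderNameParser and its properties method are hypothetical names, this is not Grails code, and it only handles the findBy prefix with properties joined by "And". The interception of the missing method itself is the part that needs the dynamic runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the name-parsing step only; not Grails code.
class FinderNameParser {
    private static final String PREFIX = "findBy";

    // Turns "findByLastNameAndAge" into the property names [lastName, age].
    // Naive on purpose: a property whose name contains "And" would be split.
    static List<String> properties(String methodName) {
        if (!methodName.startsWith(PREFIX) || methodName.length() == PREFIX.length()) {
            throw new IllegalArgumentException("Not a dynamic finder: " + methodName);
        }
        List<String> result = new ArrayList<>();
        for (String part : methodName.substring(PREFIX.length()).split("And")) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty property in: " + methodName);
            }
            // "LastName" becomes "lastName"
            result.add(Character.toLowerCase(part.charAt(0)) + part.substring(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(properties("findByLastNameAndAge")); // prints [lastName, age]
    }
}
```

From such a list of property names, a framework can then build the actual query against the datastore.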

The ability to react to arbitrary messages is actually at the core of many DSLs (domain specific languages) written in Groovy. They are at the core of builders for example, which will let you write code like:

catalog {
   book {
      isbn 123
      name 'Awesome dynamic languages'
      price 11.5
      tags {
         dynamic,
         groovy,
         awesome
      }
   }
}

Instead of the less readable Java 8 version (for the reader’s mental sanity, I will not write the Java 7 version):

builder.catalog( (el) -> {
  el.book ( (p) -> {
     p.setISBN("123");
     p.setName("Awesome dynamic languages");
     p.setPrice(11.5);
     p.setTags("dynamic","groovy","awesome");
  });
});

6. Mocking and monkey-patching

Mocking is at the core of many unit testing strategies. Most static languages make use of an external library to do this. While this can be true of dynamic languages too, it is often not strictly necessary. For example Groovy offers built-in stubbing/mocking capabilities, very easily thanks to its dynamic nature. Monkey patching relies on the very same behavior but is easier to explain so I will illustrate this concept here. Imagine that you use a closed-source library (I won’t judge you, I promise) or an open-source library which you don’t want to or don’t have time to contribute to, but you have found a serious security issue in a method:

public class VulnerableService {
   public void vulnerableMethod() {
      FileUtils.recurseDeleteWithRootPrivileges("/");
   }
}

You know how to fix it, but you have to wait for the maintainer to upgrade the library. Unfortunately, you can’t wait because attackers are already leveraging the vulnerability on your production server (yeah, they like to). One option that a dynamic language can let you do is redefine the method at runtime. For example, in Groovy, you could write:

VulnerableService.metaClass.vulnerableMethod = {
   println "Well tried, but you have been logged to Santa's naughty guys list!"
}

Then a caller invoking vulnerableMethod would reach the monkey-patched version instead of the original one. Of course in a language like Groovy, this would only be true if the caller is dynamically compiled: if you use @CompileStatic to behave like a static compiler, you’re out of luck, because the method which will be invoked is selected at compile time, so you will be vulnerable even if you try to monkey patch… Groovy provides other extension mechanisms to work around this, but it’s not the topic here ;-)

5. Dynamic objects

Most dynamic languages let you create… dynamic objects. It is basically an object for which you attach methods and properties at runtime. Not that I am a big fan of it but there are some valid use cases (serialization, languages like Golo not supporting classes, prototype based construction, …). It can also be convenient if you want to rapidly prototype a class.

As an example, let’s see how you could create an arbitrary object to represent a person, without actually relying on a class, using the Groovy language:

def p = new Expando()
p.name = 'Cédric'
p.sayHello = { println "Hello $name" }

p.sayHello()

The code is totally dynamic here. It lets you create an arbitrary object, attach new methods to it, data, …, without relying on strong typing. Of course it is interesting when you see that the sayHello method is capable of referencing "pseudo-fields" which are themselves dynamic!

4. Scripting

Static languages can do scripting. But it is definitely not what I would call scripting. Having to write types is not natural in a script. I even worked in the past in a context where people who wrote scripts were not programmers. They didn’t even know what a type is, and they didn’t care. The most popular scripting technologies like Bash do not have types, and it’s not a problem, so imagine the following. You arrive late at your office, your boss is very angry about that and shouts at you: "you have 5 minutes, not more, to give me the total number of followers of users who have submitted an accepted pull request on the Groovy repo recently". It’s a weird query, most probably your boss is going into social networking madness but you have no choice otherwise you’re fired.

In that case, most developers would think of:

  • using a Bash script combining curl, grep, regular expressions and hoping that man works

  • using a tool they know like Java, but since they have so little time, they will probably rely on a regular expression to parse the JSON feed until they realize they have to do a second HTTP query for each user

  • quitting their job

In Groovy, you would do:

import groovy.json.JsonSlurper

def json = new JsonSlurper().parse('https://api.github.com/repos/groovy/groovy-core/issues?state=closed'.toURL())
json.user.collectEntries { u ->
   // second query to fetch the nb of followers
   def followers = new JsonSlurper().parse(u.followers_url.toURL())
   [u.login,followers.size()]
}.values().sum()

What you can see here is that we use a facility, JsonSlurper, which actually parses the JSON result. It is much more reliable than what you would have done with a quick hack like a regex, but not only:

  • all data is accessible in a path-like fashion (json.user.address.city.postalCode)

  • you don’t need a single type here

Even if you use a smart JSON parser with your static language, you would still have to write a collection of classes to unmarshall the JSON structure into beans/classes. For such a simple use case, you really don’t care. You just want things done, easily, quickly. You don’t need type safety. You don’t need it to be super clean and tolerant to future changes of the JSON format. Get. Things. Done. (and boss happy).

3. Runtime coercions

Another thing that dynamic languages are particularly good at is runtime coercions. In general users of static languages know about one type of conversion, which is casting. Some are lucky enough to know about coercion (like the use of implicit in Scala), the others rely on the adapter pattern. In a dynamic language, runtime coercions are often easy to implement. A coercion differs from a cast in the sense that you want to convert an object of class A to an object of class B, even though an A cannot be assigned to a B.

Groovy provides "natural" conversions for some widely used types: lists to objects, and maps to objects, like in the example here:

Point p1 = [1,2] // coercion of a list literal into an object of class Point thanks to constructor injection
Point p2 = [x:1, y:2] // coercion of a map literal into an object of class Point thanks to setter injection

But if it happened to be that you cannot use maps or lists but really want to convert one type to another, you can just declare a converter:

class A {
   Object asType(Class<?> clazz) { new B(...) }
}

I can see you raising an eyebrow here, because I wrote the conversion code directly in class A, but remember it’s a dynamic language with a meta-object protocol, so nothing prevents you from writing this conversion code outside of the class A itself, through its metaclass, which would let you add conversion algorithms for classes which are beyond your control. It’s a win!

2. Dynamic binding

Dynamic binding is linked to DSL evaluation and scripting. Imagine the following script:

a+b

In this script, variables a and b are unbound. They are not known to the compiler, so if you tried to statically compile this with a Java compiler (or C++, or Scala) it would definitely blow up. Not if you compile this with Groovy. Because it’s dynamic, it’s able to know that those variables will be eventually bound, when the script is executed. Groovy provides means to inject those variables when you need them. It is some kind of late binding, but it is the core of expression languages, and it is no surprise that products like ElasticSearch use Groovy as the default scripting language: it allows it to be both compilable and late bound. But there is more, if you think you have an issue with not being able to resolve a and b at compile time and that you fear to write code which might fail at runtime…
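To picture what "compilable and late bound" means mechanically, here is a toy Java sketch. It is only an assumption-laden illustration: LateBindingSketch, compile and lookup are hypothetical names, the "script" may only be of the form name+name, and the binding is a plain map. Groovy does this for real scripts through groovy.lang.Binding, which this sketch does not use.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

// Hypothetical toy, not Groovy's implementation: it only understands "x+y".
class LateBindingSketch {
    // Turns the script text into a reusable function once. The variable
    // names are captured, but their values are looked up only at call time.
    static ToIntFunction<Map<String, Integer>> compile(String script) {
        String[] names = script.split("\\+");
        if (names.length != 2) {
            throw new IllegalArgumentException("Expected <name>+<name>, got: " + script);
        }
        String left = names[0].trim();
        String right = names[1].trim();
        return binding -> lookup(binding, left) + lookup(binding, right);
    }

    // Fails at runtime, not at compile time, when a variable is unbound.
    static int lookup(Map<String, Integer> binding, String name) {
        Integer value = binding.get(name);
        if (value == null) {
            throw new IllegalStateException("Unbound variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        ToIntFunction<Map<String, Integer>> script = compile("a+b");
        Map<String, Integer> binding = new HashMap<>();
        binding.put("a", 1);
        binding.put("b", 2);
        System.out.println(script.applyAsInt(binding)); // prints 3
        binding.put("b", 41);
        System.out.println(script.applyAsInt(binding)); // prints 42
    }
}
```

Note that the static language only gets there by giving up on types for a and b: everything goes through a map of names, and an unknown variable is discovered when the script runs.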

1. Mixed mode compilation

The last thing that a dynamic language like Groovy is capable of doing is leveraging mixed mode compilation. Behind this curious term is a unique concept in programming languages: Groovy is capable of mixing static code with dynamic code, and what is more, you can instruct the compiler how to do so. So if you design a DSL like in ElasticSearch where you know that some variables will be bound, that the number, names and types of those variables are fixed and known in advance, then you can instruct the compiler and switch to a statically compilable mode! This means that if the user uses an unknown variable, compilation will fail.

This technique is already used in Groovy itself, in the powerful Markup Template Engine. It is a template engine which is capable of generating markup-like contents with a very nice builder-like syntax, but all templates are statically compiled even if the code seems to be full of unresolved method calls or variables!

For those who are interested in this, I invite them to take a look at my blog posts describing how you can do this.

Conclusion

In conclusion, I have highlighted some points where dynamic languages can do what static languages cannot. Users of the most widely used dynamic language, Javascript, probably have lots of ideas too. The point for me is not to tell which one is better than the other because I don’t care. In general, I am not much into the war between those, because I really enjoy both. I do static typing most of the time, but I really enjoy the dynamic nature of the language too because often I don’t want to be slowed down just to make a compiler happy. I, as a developer, should be happy. Making a compiler happy is secondary and often not necessary. Last but not least, you might have thought, reading this post, that your static language can do this or that. I won’t blame you here, because mine can too. The idea here is more to show that it is totally unnatural for a static language, or that it often comes with horrible drawbacks like verbosity or performance issues, or is simply difficult to implement.
