Cédric Champeau's blog: Groovy 2.0 from an insider

29 June 2012

Groovy 2.0 is has landed!

It’s with great pleasure that we released Groovy 2 yesterday. For me, this is a special release and some kind of achievement because this is the first one for which I had a large contribution (a bit selfish I admit, but well…). Before that release, I had contributed code, then became committer, but I didn’t have much time to give to the Groovy project. Now that I’m working full time on it, it’s a bit easier!

In this post, I will try to investigate some of the new features of Groovy 2 and provide you additional details that you won’t find in the announcement like some technical insight. There fore, it’s a good complement to the introduction that you can find on InfoQ.

Modularization

Modules

This is probably one of my favorite features of Groovy 2. In this version, we made a lot of effort to split Groovy into several modules. This effort, driven by Paul King and backed by a Groovy Enhancement Proposal has introduced several major changes in the codebase:

A build which is now fully based on Gradle
Core Groovy only contains the minimal set of classes required to run the language
Several optional modules in their own source subtree
an extension module mechanism has been put in place

There are dependencies between modules. Actually, if you look at the Gradle build, you would find a rather complicated dependency graph, but most of the complexity is just that we need to build a minimal core of Groovy (written in Java) to build the full Groovy project itself. As for the modules, here is the current dependency graph (click for larger image):

Each module is available as a Maven artifact, under the groupId org.codehaus.groovy and an artifactId corresponding to the name on the schema. The core Groovy module is named groovy (not groovy-core) for compatibility with previous releases (you just need to upgrade the version number). We also provide a groovy-all jar which bundles all those modules in a single jar, as well as packaging some dependencies into a separate package to avoid class loading issues (in particular with ASM which is used by a lot of frameworks, but in different versions). If you upgrade from a previous version of Groovy, there are only two possible paths: if you were using the groovy-all jar, then just upgrade to the new groovy-all version. If you were using groovy.jar, then it is possible that some of the classes you were using have now moved into a module. In that case, include the module jar as a dependency.

It’s worth noting that Groovy 2.0.0 is just a milestone in the modularization task: the following releases of Groovy will add new features, like the ability to add custom file extensions that actually do something different (like transparently applying AST transformations).

Extension modules

When I talked about modularization, I told you that this was one of my favorite features. Now, I’m going to explain why! Groovy is more than a language. It also provides a lot of helpful APIs to reduce code verbosity. For example, if you have the java.io.File class in Java, there is no getText(String encoding) method on it which allows you to retrieve the contents of the file, although this is a quite common operation. Groovy adds this method onto the File class. This is sometimes (incorrectly) called known category methods or to be more correct_extension methods_. In previous versions of Groovy, most of those methods were defined in a class named DefaultGroovyMethods. With modularization, we have exploded that class into multiple helper classes, but more importantly, we added the ability for you to add your custom extension methods!. In the past, if you wanted to add a method named foo on the String class, there weren’t many options:

write a Category class, then use it: while this is quite easy to do, it has major drawbacks. First, it is lexically scoped and thread bound. This means that it is a dynamic feature, and that opening a use block influences all Groovy code being executed until the block is closed, including method calls. In other words, category usage ``leaks'' into the current thread (which can have its own advantages but I don’t like side effects). Moreover, it cannot be statically checked (which is one of the major additions of Groovy 2). Additionally, categories are performance killers, as they imply disabling a lot of possible optimizations. I think you understand, now, that I really don’t like categories!
use a metaclass: it is very easy to do, but once again, it’s a pure dynamic feature, so it cannot be statically checked. It also has the disadvantage of not being easily reusable, and eventually, you may still be in conflict with another part of the code replacing the metaclass (which would remove your changes). Unlike categories, metaclass changes are visible to every thread and have immediate effect on every Groovy class. This is why this feature is the perfect candidate for monkey patching.
write an AST transformation: done at compile time, this would transform your foo call into another call. This has the advantage of beeing done at compile time, also meaning that they can be statically checked (and compiled!), but writing AST transformations requires knowledge of (part of) the Groovy compiler infrastructure (specifically, the AST part). In general, unless you already have written AST transformations, this is much slower to develop than playing with a metaclass. It is also much harder to debug. But in the end, this is a very powerful feature and performance-wise, it is in general better than metaclass hacks. One important thing with AST transforms, though, is that they are selectively applied: you may choose on what classes they run, meaning that it’s not necessarily a global feature.

With Groovy 2, we provide a new way to write such extension methods, in a very friendly way: basically, you just need to write a support class that ``looks like'' DefaultGroovyMethods or Groovy categories. I will not repeat here how you can write such extension methods, but documentation can be found here. This technique has many advantages:

easy to write
easy to share (modules are jar files which may be included as a dependency)
compatible with static type checking and static compilation

This is so easy to put in place that we already have, for example, an extension module from Tim Yates adding generators to Groovy. Another interesting thing with the way extension modules are written is that many third-party libraries already use that kind of extension mechanism. Think of Apache Commons StringUtils class for example. Every method in that class is a static method taking a String as the first parameter. With extension modules, this means that all you have to do is to write a module descriptor referencing that class to have those methods added directly on the String class!

For advanced programmers, the extension module framework that have been added to Groovy allows you to register MetaMethod_s to Groovy. This means that it not necessary for you to follow the static method way of doing extension methods: you could implement your own! Whatever technique you choose, you must be aware that just like _metaclass tricks, extension methods are global: you cannot choose on what lexical scope they apply. Once the hooks are installed, extension methods are there forever.

Static type checking and compilation

Why static type checking in a dynamic language?

There are various reasons why we wanted to add static type checking to the Groovy language. They are discussed in Guillaume Laforge’s article on InfoQ but the main reason is definitely that we want Groovy to be even easier to use by Java developers. There has been a lot of FUD about Groovy performance and the fact that it is dynamically typed, but in the end, Groovy is among the fastest dynamic languages (if not the fastest) and performance problems are more often coming from I/O operations than from the language itself! Also, we must think about all developers: not everyone builds a Twitter… But for people coming from Java, the lack of static type checking was often a barrier preventing them from using Groovy. There are many things to say about dynamic vs static languages, but for people who are interested in evaluating productivity or ease of debugging, there are some interesting results in this study from Stefan Hanenberg.

Speaking of that, we often see (and we are really irritated by this) the quote from James Strachan, founder of the Groovy language, promoted as a very good reason why not to choose Groovy (and its corrolate, prefer Scala over it): ``I can honestly say if someone had shown me the Programming Scala book by Martin Odersky, Lex Spoon & Bill Venners back in 2003 I’d probably have never created Groovy.'' I don’t know James personaly and I really respect his work and opinion and I joined the project well after but:

James made this statement in 2009 after leaving the project in 2006 even before Groovy 1.0 was released. He is now also working on the Kotlin language. I don’t think that would mean that if Kotlin had existed back in 2003, Odersky wouldn’t have created Scala… or that James would drop Scala anyway. Different languages acheive different goals.
If JPA existed in 2002, do you think Hibernate would have been created? Technology is innovation and languages are not different from any other piece of software in that case!
a lot of improvements have been made on the language since then. I mean, you cannot honestly compare Groovy 1.0 with what Groovy 2.0 is. It’s now a mature, widely adopted language and reading such a quote should not prevent you from testing it by yourself.
this is a personal comment, but definitely, I find Groovy much more readable than Scala. It was easy to make Java developers use Groovy, it’s not so easy to write Scala, and it’s even more difficult to debug other people’s Scala code…
Groovy was never meant to replace Java. All we want is to make the life of developers easier and reintroduce fun in programming.

One interesting thing to know is where I come from. Before working on Groovy for VMware, I was employed by a small French company named Lingway where I introduced the usage of Groovy in several projects. Groovy was used in many contexts, but one thing that is worth noting is that the audience was pretty large: from developers to linguists. Developers used Groovy as a Java scripting tool, sometimes as an ETL processor and linguists used it in the form of a DSL for natural language processing. In the end, we wrote a high performance natural language processor which performed text extraction, named entity recognition, part-of-speech tagging, … just using DSL rules written in Groovy. We had performance issues with Groovy, but in the end, they were all fixed by rewriting classes in Java. Of course, none of the rules were written in Java, so the engine was mostly written in Java, but rules, which are the core of the extraction system, are all written in Groovy, and performance is very good (and it’s using Groovy 1.8 by the way). However, it’s true that it was quite frustrating to rewrite classes in Java when they would have been much shorter and readable if written in Groovy. Moreover, none of those classes were using any of the dynamic features of the language…

Developers were Java developers. And some of them (who may recognize themselves reading this paper) were defintely not fans of dynamic typing. While I spent a lot of time advocating the advantages of Groovy over Java for readability and conciseness, in the end, if you end up using Groovy as a scripting language for Java and that you don’t use the dynamic features of the language, it is very legitimate to ask for static type checking, so that errors are catched early in the development process.

It’s optional!

With Groovy 2, we want people to have choice. Groovy is a very elegant language that removes most of the boilerplate of Java while staying very close to it, grammar-wise, hence very easy to learn. But it was definitely a problem for us to see people choosing other languages because of the lack of a type checker, although we think Groovy is much more readable than a lot of its competitors. Now, you will have the choice. You will have the choice to type check your code, and explicitely choose not to use the dynamic features of the language, and you will even have the choice to statically compile your code (with its own semantics). This adresses the two main issues we’ve talked about here: failing early in the development process thanks to static type checking, and adressing episodic performance issues by statically compiling your code.

Insider

Developing the static type checker was a very challenging task. I spent much less time on static compilation, which is an order of magnitude simpler than type checking. I learnt a lot doing this, and despite all the time spent in implementing that feature, I am pretty sure there are still bugs (in fact, I’m already aware of some), especially when it comes to generics. Oh man. Generics. I mean, I really like using generics and the kind of type inference they offer, but algorithms are damn complex. Type inference with generics is the real challenge. There were two problems that I had to face: first, the internal representation of generics in Groovy is not well suited for type inference, which makes algorithms unnecessary complex (complexity over complexity). Second problem was testing: although you think about a lot of cases, you only develop what you think of. We need testing and we need tests made by others. This is also why there were so many bug fixes in the RC phase: people tend to test the RC versions, not the betas, but this is life!

If you’re interested in the internals, just be aware that the type checker is implemented as an AST transformation (yes!). So basically, it’s a (very) big transformation. The first step was to make it work, then make it right. Now, before making it fast (we didn’t talked about performance of the type checker itself, but it should be almost unnoticeable) I need to simplify code and, of course, fix the bugs!

Invoke Dynamic

A small word on invoke dynamic support in Groovy 2. We must congratulate Jochen Theodorou for that, because he spent a lot of time working on a feature which is hidden for most users. But his work is an important milestone for the future of Groovy, and while we will work on improving the performance of invoke dynamic in the 2.x versions of Groovy (as well as the JVM folks are going to improve performance of invokedynamic in the JVM itself because it will be at the core of the upcoming lambdas in JDK 8), Groovy 3 will probably be a language heavily relying on this feature.

Get it and test it!

Last but important, here are some links: