31 January 2011

Tags: asm bytecode groovy programming

Inline assembly with Groovy… Evil or awesome ?

I’m borrowing the title of this post from a Tweet by Cédric Beust, the creator of TestNG that followed my post about inlining JVM bytecode instructions in Groovy. I have read many different reactions from this, going from stupid to awesome. Though I started this as an ironic response to the languages micro-benchmarking frenzy, it seemed at the end that there was real interest for this. The question, now, is to determine whether this idea is, indeed, evil, or awesome.

One of the first things to consider is that inlining JVM bytecode instructions is not inlining assembly. The code that we’re inlining is indeed virtual machine instruction sequences. That it really important to understand, because a JVM won’t let you do nasty things. It’s really, when you’re inlining assembly code in a C program, to crash your program (or, worse, the operating system itself). In a JVM, every bytecode sequence gets verified by the JVM before getting executed, and checked at runtime just like regular code.

What does it mean ? It means that you won’t be able to execute bytecode which does things regular Java code wouldn’t be authorized to do. Let’s take an example. One could think that you could, through bytecode instructions, bypass the fact that you have to use reflection to access private fields of a class from another one :

class Private {
    private int value;
}

@groovyx.ast.bytecode.Bytecode
void set(Private b, int x) {
    aload 1
    iload 2
    putfield Private.value >> int
    return
}

def bidon = new Private()
set(bidon, 5)

The previous code, when executed, throws the regular java.lang.IllegalAccessError. It’s not different from what you would expect in pure Java. Now, what would happen if you wrote invalid bytecode sequences ? Do you think you would crash the JVM ? Let’s try :

@groovyx.ast.bytecode.Bytecode
int sum(int x, int y) {
    aload 1
    iload 2
    iadd
    ireturn
}

println sum(3,6)

Can you spot the error ? It’s the first line of the bytecode sequence : aload 1 means push the reference of the first parameter object onto the stack. Here, the first parameter is not an object, but a primitive type (int), so we should have used iload 1. Here’s the output of this program :

Caught: java.lang.VerifyError: (class: Helper, method: sum signature: (II)I) Register 1 contains wrong type

No crash at all, your program is just invalid and buggy, but nothing harmful. From this point of view, the @Bytecode annotation just provides another DSL for Groovy. It’s not different from any other DSL, apart from the fact that it’s translated directly into real JVM bytecode. Are DSLs inherently evil ? I don’t think so.

So where’s the evil ?

Well, I think the problem is that every feature you add to a language will eventually be used. So, what would prevent someone from writing his whole code with bytecode instructions ? How can this be maintainable ? Not everyone speaks bytecode (to be honest, I find this more readable than Clojure, but that’s another debate ;)).

That’s true, no one will prevent anyone from writing bytecode everywhere. But that’s exactly where code reviews and code analysis tools are useful : it’s easy to write crap with any language. It’s easy not to follow conventions, and it’s easy to take wrong decisions : why would you want to write inline bytecode right into Groovy ? The first bad reason would be to improve performance. Most likely, it’s easier to write a Java class that you’ll use into your groovy code. The second bad reason would be because that’s cool. It is, but don’t try this at work. The third one would be to think you are smarter than the Java compiler. Here are, in my opinion, a few reasons why you would want to inline bytecode into Groovy code :

  • you are a student having programming language theory and compilation courses

  • you already use bytecode generation in a project, and using this annotation it’s incredibly easy to test bytecode sequences. It makes Groovy the perfect partner for developers who use dynamic bytecode generation (try the IntelliJ IDEA plugin to have a real good combo)

  • you are a genius Groovy hacker and understand that inlining bytecode could help you save a bunch of metaclass resolutions, and smart enough to understand why writing a Java class wouldn’t help you much about that

  • you have written a dynamic bytecode generator which uses Groovy instead of direct ASM code to produce the class file

  • you eventually want to write your own JVM language

  • playing around with bytecode instructions helps you becoming a better Java programmer (totally subjective)

So, what do you think ? Evil ? Awesome ? Any good reason why you would allow or disallow inlined bytecode ?