Optimization of higher-order functions in Clojure
Start with a very simple Clojure program:
$ mkdir -p src/blog
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn -main
[& args]
(println "foo"))
EOF
Make a directory for compiled classes and compile it:
$ mkdir out
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
And check that it worked:
$ java -cp clojure.jar:out blog.core
foo
If you take a look in the out directory, you will see that simple program has
turned into five separate class files:
$ du -a out
4 out/blog/core__init.class
4 out/blog/core$loading__6434__auto____279.class
4 out/blog/core$_main.class
4 out/blog/core$fn__281.class
4 out/blog/core.class
24 out/blog
28 out
Let's add a function to our namespace:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn foo [args] args)
(defn -main
[& args]
(println (foo args)))
EOF
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
(bar)
You should now see that another class file has been created in out called
core$foo.class. Each new Clojure function we create will result in a new Java
class file being created.
Let's take a look at it:
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: aload_0
1: aconst_null
2: astore_0
3: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #16 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: return
}
Our Clojure function blog.core/foo has turned into a Java class
blog.core$foo. It has a constructor that just calls the superclass constructor
and two methods: invokeStatic and invoke.
Let's make foo do something more than just return its argument:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn foo [args] (rest args))
(defn -main
[& args]
(println (foo args)))
EOF
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(baz)
And disassemble it again:
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
public static final clojure.lang.Var const__0;
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: getstatic #15 // Field const__0:Lclojure/lang/Var;
3: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: checkcast #23 // class clojure/lang/IFn
9: aload_0
10: aconst_null
11: astore_0
12: invokeinterface #26, 2 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
17: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #30 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: ldc #33 // String clojure.core
2: ldc #35 // String rest
4: invokestatic #41 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
7: checkcast #17 // class clojure/lang/Var
10: putstatic #15 // Field const__0:Lclojure/lang/Var;
13: return
}
The constructor hasn't changed and invoke still just calls invokeStatic, but
invokeStatic now does something more. We have a Clojure Var that is
initialized to clojure.core/rest in the class' static initialization block
and whose invoke method is called (via the clojure.lang.IFn interface) in
our invokeStatic method. It is a bit convoluted but it's pretty clear how that
comes from our use of rest.
Now let's add another function of our own and map it over the args:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x] (str x "y"))
(defn foo [args] (map addy args))
(defn -main
[& args]
(println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(bary bazy)
We now have a new file in out: core$addy.class. This contains the new class
for our addy function, which has its own constructor, invoke and
invokeStatic methods and its own Var object for clojure.core/str.
If we disassemble core$foo again, we can see how it is used:
public final class blog.core$foo extends clojure.lang.AFunction {
public static final clojure.lang.Var const__0;
public static final clojure.lang.Var const__1;
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: getstatic #15 // Field const__0:Lclojure/lang/Var;
3: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: checkcast #23 // class clojure/lang/IFn
9: getstatic #26 // Field const__1:Lclojure/lang/Var;
12: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
15: aload_0
16: aconst_null
17: astore_0
18: invokeinterface #30, 3 // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
23: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #34 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: ldc #37 // String clojure.core
2: ldc #39 // String map
4: invokestatic #45 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
7: checkcast #17 // class clojure/lang/Var
10: putstatic #15 // Field const__0:Lclojure/lang/Var;
13: ldc #47 // String blog.core
15: ldc #49 // String addy
17: invokestatic #45 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
20: checkcast #17 // class clojure/lang/Var
23: putstatic #26 // Field const__1:Lclojure/lang/Var;
26: return
}
Unsurprisingly, we have two Var objects: one for clojure.core/map and one
for blog.core/addy. We call the invoke method on map via the IFn
interface, just as we did for rest last time, and we pass the addy Var as
an argument to that method.
This is quite a lot of indirection. By using Var objects like this, Clojure
allows the user to redefine any Var at runtime. You could connect to a
running REPL and redefine map to do something else, then foo would call
your new function instead. This is very flexible but not always necessary,
especially in a production environment.
Since indirection is slow, Clojure offers the clojure.compiler.direct-linking
option when compiling. This will replace the invocations of map via a Var
with direct method invocations:
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar baz
(bary bazy)
$ javap -c out/blog/core\$foo.class
Compiled from "core.clj"
public final class blog.core$foo extends clojure.lang.AFunction {
public static final clojure.lang.Var const__1;
public blog.core$foo();
Code:
0: aload_0
1: invokespecial #9 // Method clojure/lang/AFunction."<init>":()V
4: return
public static java.lang.Object invokeStatic(java.lang.Object);
Code:
0: getstatic #15 // Field const__1:Lclojure/lang/Var;
3: invokevirtual #21 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
6: aload_0
7: aconst_null
8: astore_0
9: invokestatic #26 // Method clojure/core$map.invokeStatic:(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
12: areturn
public java.lang.Object invoke(java.lang.Object);
Code:
0: aload_1
1: aconst_null
2: astore_1
3: invokestatic #31 // Method invokeStatic:(Ljava/lang/Object;)Ljava/lang/Object;
6: areturn
public static {};
Code:
0: ldc #34 // String blog.core
2: ldc #36 // String addy
4: invokestatic #42 // Method clojure/lang/RT.var:(Ljava/lang/String;Ljava/lang/String;)Lclojure/lang/Var;
7: checkcast #17 // class clojure/lang/Var
10: putstatic #15 // Field const__1:Lclojure/lang/Var;
13: return
}
The map invocation is no longer performed via the invoke method on a Var:
instead we call the core$map class' invokeStatic method directly. But we
do still have a Var for the function that we pass to map. This hasn't been
removed because we are not directly invoking it, we are passing it as an
argument to map.
This remaining level of indirection, the function call, adds a considerable
overhead to the use of map. To eliminate this overhead, we would like to
inline the body of the addy function.
If we were writing in a procedural language with a for loop, we probably
wouldn't have introduced a function call here anyway: the functionality provided
by addy is so trivial that we would have just written it in the body of the
for loop. But if we wish to use a functional programming style with map then
we must provide a function. We are now at the mercy of the compiler: is it
clever enough to automatically inline addy and remove the function call
overhead?
Clearly the first-stage Java compiler did not inline it. However, Java bytecode is executed on the JVM, which has a JIT that optimizes hot code. We can ask the JVM to log whenever the JIT inlines a function:
$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz | grep addy
And we see nothing. (Without the | grep addy we see lots of things, but we
want to see addy being inlined and we don't see that.) This isn't very
surprising since we only call addy once: the JIT won't consider that code a
candidate for optimization unless it is called many times.
So let's call it lots of times. Also, let's print to stderr so that we can
easily distinguish between our program's output and the JVM's PrintInlining
output:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x] (str x "y"))
(defn foo [args] (map addy args))
(defn -main
[& args]
(binding [*out* *err*]
(dotimes [_ 100000]
(println (foo args)))))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz 2>/dev/null | grep addy
@ 3 blog.core$addy::invokeStatic (19 bytes)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
Hooray! inline (hot) is what we wanted to see. That means the JIT has
determined that addy is a hot function and has successfully inlined it. This
removes the function call overhead and makes our use of map as fast as a for
loop. Or does it?
We can see that core$addy::invokeStatic has been inlined into its caller. That
might be enough to eliminate the function call overhead in some languages, but
we have already seen that a function call in Clojure consists of more than a
single function call in the resulting Java. Let's see the complete stack trace:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x]
(.printStackTrace (new Exception))
(str x "y"))
(defn foo [args] (doall (map addy args)))
(defn -main
[& args]
(println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
java.lang.Exception
at blog.core$addy.invokeStatic(core.clj:4)
at blog.core$addy.invoke(core.clj:4)
at clojure.core$map$fn__5587.invoke(core.clj:2747)
at clojure.lang.LazySeq.sval(LazySeq.java:40)
at clojure.lang.LazySeq.seq(LazySeq.java:49)
at clojure.lang.RT.seq(RT.java:528)
at clojure.core$seq__5124.invokeStatic(core.clj:137)
at clojure.core$dorun.invokeStatic(core.clj:3125)
at clojure.core$doall.invokeStatic(core.clj:3140)
at blog.core$foo.invokeStatic(core.clj:8)
at blog.core$_main.invokeStatic(core.clj:10)
at blog.core$_main.doInvoke(core.clj:10)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at blog.core.main(Unknown Source)
(bary)
Note that I added a doall to foo just to make the stack trace clearer:
otherwise addy is called when the sequence is being realized so foo does
not appear in the stack, which may be surprising. This does not change the
important point, which is that map calls core$addy.invoke that in turn calls
core$addy.invokeStatic.
invokeStatic is a static class method. We have already seen that the function
is passed to map as a Var object, which implements the IFn interface that
provides an invoke method. So it is not enough for core$addy.invokeStatic to
be inlined: we also need core$addy.invoke to be inlined too. Unfortunately, as
we've seen from the -XX:+PrintInlining output, core$addy.invoke was not
inlined.
Do transducers help with this?
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x] (str x "y"))
(defn foo [args] (into '() (map addy) args))
(defn -main
[& args]
(binding [*out* *err*]
(dotimes [_ 1000000]
(println (foo args)))))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
[charles@sunshine clj-blog]$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -cp clojure.jar:out blog.core bar baz 2>/dev/null | grep addy
@ 3 blog.core$addy::invokeStatic (19 bytes)
@ 20 blog.core$addy::invoke (7 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
@ 20 blog.core$addy::invoke (7 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
@ 20 blog.core$addy::invoke (7 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
@ 3 blog.core$addy::invokeStatic (19 bytes) inline (hot)
Yes! Now we have both core$addy::invokeStatic and core$addy::invoke inlined.
Let's check that is enough:
$ cat >src/blog/core.clj <<EOF
(ns blog.core
(:gen-class))
(defn addy [x]
(.printStackTrace (new Exception))
(str x "y"))
(defn foo [args] (into '() (map addy) args))
(defn -main
[& args]
(println (foo args)))
$ java -cp clojure.jar:src -Dclojure.compile.path=out -Dclojure.compiler.direct-linking=true clojure.lang.Compile blog.core
Compiling blog.core to out
$ java -cp clojure.jar:out blog.core bar
java.lang.Exception
at blog.core$addy.invokeStatic(core.clj:4)
at blog.core$addy.invoke(core.clj:4)
at clojure.core$map$fn__5583$fn__5584.invoke(core.clj:2734)
at clojure.lang.ArraySeq.reduce(ArraySeq.java:109)
at clojure.core$transduce.invokeStatic(core.clj:6803)
at clojure.core$into.invokeStatic(core.clj:6819)
at blog.core$foo.invokeStatic(core.clj:8)
at blog.core$_main.invokeStatic(core.clj:10)
at blog.core$_main.doInvoke(core.clj:10)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at blog.core.main(Unknown Source)
(bary)
addy.invokeStatic and addy.invoke are the only functions beneath map in
the call stack so if they are both inlined then we have eliminated all function
calls.
Why does using the transducer version of map improve inlining? I have no idea.
I tried it because, 27 minutes and 20 seconds into his talk "Transducers" at
StrangeLoop 2014, Rich Hickey said of transducers, "They're just a stack of
function calls. They're short. Potentially they could be inlined." But it is not
obvious to me why the resulting Java bytecode is any more amenable to inlining
than it was in the non-transducer case. Both times we call a static class method
via a dynamic interface method.
Obviously, in this particular case, something about the code (and especially the hot paths through the running code) allowed the JIT to inline the call in the transducer case and not in the other case. I am not certain that this would generalize: I think all we can reasonably infer from this is that inlining is far from certain.
So sometimes, if we are lucky, we can use higher-order functions with standard
functional-programming patterns such as map, filter, reduce, etc. and the
JVM JIT will optimize hot code to achieve parity with the old procedural-style
for loop. But sometimes it won't.
We must therefore accept that the use of map, filter, reduce et al. in
Clojure will often result in suboptimal code. Is there a benefit to the
programmer that outweighs the cost to the computer? I suspect this is a matter
of taste.
Personally, I find the myriad different ways of performing the same task quite
confusing. Should I be using map directly? Or as a transducer? Or with
transients? Or a loop? These are not questions that I want to think about when I
have a problem to solve: I want to think about the problem itself, not the many
different ways in which I might express the same algorithm. Having a single loop
construct would be simpler both for me and for the computer.