Tin Can Camera

Link: http://blogs.sun.com/roller/page/dagastine?entry=java_is_faster_than_c

In response to David Dagastine's use of the SciMark numeric benchmark, where he finds that there's very little difference between in speed between Sun's JVM and native C compiled with Visual C++ 6, using a numeric benchmark that uses static methods to manipulate arrays of primitive data types, primarily doubles.

This agrees with my experience for numerical code. When I was lead software engineer in the technical computing group of the world's largest aerospace company, I moved several numerical projects from Fortran or C++ to Java. At the time, one the major reason were bugs in VC's optimisation code which meant the results were wrong, and discrepancies between release and debug builds that made C++ bugs harder to track, whereas Java had performance within 10% and much stronger guarantees of its results.

But MS got round to fixing the bugs, and in VS 2005 appear to have made some major gains in performance.

On my laptop (WinXP SP2, AMD Athlon 3400+), comparing Java 1.6.0-beta with VS 2005 express gives 388.67 and 631.12 mflops respectively, which is a much bigger difference than I observed between Java 1.4 and VS6. The SciMark code is portable C, so they don't use C++ intrinsics, which can give an order of magnitude speed up for certain code (though since it's working in double precision you probably wouldn't get that much improvement for quite a lot less readability, as you can only process 2 doubles at a time with SSE2, but can process 4 floats, and so keep whole vectors in single registers).

So I'd disagree that Java 1.6 is faster than C/C++ in this case on the two counts - firstly it's significantly slower than the C code it was benchmarked against, and secondly SciMark is not using current C++ techniques for optimising numeric code, which any serious numerical library would.

The SciMark code is optimised to a similar degree on both C and Java platforms, but the nature of the Java language means that there are further optimisations which are impossible (due to not having access to intrinsics or inline assembler), and many more that go against Java's intent of being a safe object oriented programming language (such as passing around char arrays rather than String objects).

I've observed before, that without any kind of meta-programming facility, writing optimised numerical Java is very painful (though I haven't had cause to try again - even with better function inlining, you have to copy all the parameters you need, and construct and object for any result which is more complicated than a single type, so it doesn't impact on the complexity much). For small projects, you can do it be hand, for others you have to either use a code generator from a higher level model, or just put up with it running slow. It would be nice if having to hard code optimisations disappear from Java code as Sun improves the JVM's capabilities. For example, copying all state out of an object into local variables, then performing a tight loop, then copying back and gives a measurable performance boost. The same applies for nested loops on multidimensional arrays - you hoist the reference to the row. Currently you have to do such tedious code by hand, though Fortress seems to favour such optimisations in its transactions, so may be a better way ahead for numeric code.

For something a little closer to Java's enterprise use, say date formatting functions, the difference between Java6 and C++ in small benchmarks is closer to a factor of 7 slower, for example see this thread.

However, what I learnt in that thread isn't that if you port C to Java it runs seven times slower (usually it's not as bad as that, and as the SciMark above shows - a good programmer can write Fortran in any language and it'll run quickly), but rather that even good Java programmers won't think of solving the problem in the faster way, but use Java idioms instead.

The idiomatic Java way of solving that problem was twenty times slower than the C code. If you're used to thinking that StringBuilder is faster than StringBuffer is faster than concatenating String, you won't write the sort of code that works fast, however good the VM gets. The granularity at which safety is assured in Java - immutable objects - means you can't optimise across method calls. You need to wrap up a string as an immutable object to protect it from modification, rather than proving its const via the type system.

Java is designed to be a safe, object oriented language. Fast code either operates above this level - for example, using a DSL specify that you want to multiple a matrix row by factor and it'll generate the code inline for it which works on the primitive types - or below this level, such as C can do, and the generated code would. Having to work only with objects with a statically typed interface means you don't have the flexibility to get a high level view of the logic, and get to feel the bits between your toes if you want to.

The granularity problem in Java also applies to array bounds and other checks - much of what can be known at compile time is lost by the Java type system. The attempt to fix this with generics didn't get far in terms of capability - you can only specify the generic type as far as the current Java type system allows, and the information is erased at runtime, so the JVM can't selectively remove redundant checks (other than checkcast). Traditionally, this hit you in loops of a known number of iterations against an array of known length – the array access instruction would still check the bounds. That one may have been mitigated, but only by means of specifically getting the JVM to look for that particular pattern. Traits allow you to provide a general mechanism. For example, a method which can only take a square matrix only needs to check its arguments if they are not known to be square at compile time, which depends where the method's being called from, so requires a backwards possible-caller analysis, and different code to be generated based on the results. Having an NxM matrix with a trait for squareness, also square matrix subclasses with the trait fixed, and knowing this trait doesn't change after the object is created would allow the check to be done both at compile and run time as appropriate. I'm not aware of any language yet that has that level of sophistication of traits inference. (I also don't like the compiler's lack of escape analysis for generics - if you've only ever put a Foo into a list, then you shouldn't have to tell it that it's a list of Foo. But that's for another time, as I'm not doing much Java programming in the real world now, and getting on to how traits can improve dynamic language performance is another post.)

So I don't think the JVM will ever be the fastest thing. Java's too much in the middle ground:

The language lacks the meta-programming facilities that hide the tedium of hand optimisations

The language lacks access to low level instructions that allow optimisations

There is no concept of a cross-object inference to prove unsafe operations cannot happen, rather it relies on information hiding to prevent them, which forces conservative restrictions and redundancy

Java idioms use object level safety and synchronised blocks, rather than traits and transactions. This prevents many possible optimisations based on direct access, thread-safe caching, and often requires redundant checks on information already known.

There were good reasons for these decisions in the evolution of Java, but the effects of them won't ever go away, and as long as it's the Java virtual machine, the JVM will have some compromise on its performance.

But it never was intended to compete with Fortran, and is fast enough for many uses.

TME

Labels: java, programming, virtual-machine

Tin Can Camera

2006-11-14

Current reading

2006-06-11

How fast is Java?

About Me

Previous Posts

Blog Roll

Archives