09 April 2007 - 04:52
More Numbers (OpenDS and a hint at ApacheDS)
Some more numbers have come in. Howard continued over the weekend and into this morning. The last post was about benchmarking OpenLDAP 2.3.34 (OL) against Fedora Directory Server 1.0.4 (FDS). This post gives a glimpse of OpenLDAP 2.3.34 against OpenDS 0.1-34 (ODS) on the same hardware and software as FDS.
Authentication Rate performance of OpenDS was much closer to FDS than we expected; OpenLDAP was 3.7 - 3.9 times faster. OpenDS actually outperformed FDS at 2.5M entries (by a little). Load times and Search rate (see earlier post) were much worse: loading took 247% - 265% of the OpenLDAP time, and the search rate was 3.3 - 7.7 times slower.
Oh yeah, Howard struggled to put ApacheDS's Release 1.0.1 through its paces. Let's just say that we agreed that enough was enough after a clean run at 250K. Load times - OpenLDAP 45 seconds, ApacheDS 3,051 seconds. Search rate: 1,689 entries/sec. Authentication Rate: 632 auths/sec.
First, OpenDS is to be congratulated on achieving almost as good authentication rate performance with their current code level as Fedora (descendant of iPlanet and Netscape's efforts) gets. We consider FDS performance-equivalent with Red Hat Directory Server (RHDS), the commercially supported Red Hat directory product, and this bodes well for the OpenDS team.
The less said about the ApacheDS numbers, the better. The most important metric, Authentication Rate, was 4% of OpenLDAP and 14% of FDS. We wish the project well but it's evident from many of Howard's experiences with running this benchmark that there's lots of work to do on the code. For example, the load time was obtained using OpenLDAP's ldapadd tool, because using ApacheDS's bundled import tool would have taken pretty much forever. (It was only able to load 83,000 entries in 62 minutes, and the load rate was asymptotically approaching zero entries per second.) Remember, whatever else they do, computers are supposed to be fast, faster than humans. At these speeds, we could've carved the entry data by hand onto stone tablets faster than ApacheDS could load them.
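To put the load figures quoted above into the same units, here is a small sketch of the arithmetic. It assumes that "250K" means exactly 250,000 entries (the post does not give the exact count), and the function and variable names are ours, chosen only for illustration.

```python
def entries_per_second(entries: int, seconds: float) -> float:
    """Average load rate over a whole run."""
    return entries / seconds


# Assumption: "250K" is taken to be exactly 250,000 entries.
ENTRIES_250K = 250_000

# OpenLDAP load at 250K: 45 seconds.
openldap_rate = entries_per_second(ENTRIES_250K, 45)

# ApacheDS load at 250K, driven by OpenLDAP's ldapadd: 3,051 seconds.
apacheds_rate = entries_per_second(ENTRIES_250K, 3_051)

# ApacheDS bundled import tool: 83,000 entries in 62 minutes.
bundled_rate = entries_per_second(83_000, 62 * 60)

# How many times longer the ApacheDS load took than the OpenLDAP load.
load_time_ratio = 3_051 / 45

print(f"OpenLDAP:              {openldap_rate:8.1f} entries/sec")
print(f"ApacheDS (ldapadd):    {apacheds_rate:8.1f} entries/sec")
print(f"ApacheDS (own import): {bundled_rate:8.1f} entries/sec")
print(f"Load time ratio:       {load_time_ratio:8.1f}x")
```

Under that assumption the averages come out at roughly 5,556, 82 and 22 entries per second, and the ApacheDS load took roughly 68 times as long as the OpenLDAP load. These are whole-run averages; they say nothing about how the rate changed during a run.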
Here are the graphs for the OpenDS versus OpenLDAP comparison. The OpenLDAP numbers are the same as in the (updated) FDS benchmark posting.
Load Time
[Graph: load time, OpenDS versus OpenLDAP. Note that smaller is better.]
... Marty and Howard
fourteen comments:
Re: ApacheDS—At those speeds, I could have carved the data onto stone tablets … if Marty would hold the chisel for me! ... Jordan
Another note we forgot to mention – OpenDS also includes their own set of command line tools, also written in Java. Out of curiosity I also measured their ldapsearch command’s performance.
The time to ldapsearch the entire 250K entry OpenLDAP database, returning the complete entries (not just their DNs, as in the previously posted data) was about 6.3 seconds with OpenLDAP’s ldapsearch tool. With the OpenDS tool, the best time was about 19.0 seconds. The Sun folks talk about how their JIT compiler can identify hotspots in running code and optimize it on the fly, but it’s clear from this simple case that even their best code can only get to 3x slower than good C code. (This should represent a best-case for a hotspot compiler – a small loop that does the exact same instructions, iterated 250,000 times. It can’t get any easier than that.) The other thing this test revealed is that their current code still needs some work, as 7 out of 10 tries to use their ldapsearch command aborted with SEGVs; only 3 tries ran to completion. I don’t understand how there can be such variation in performance, since it was the identical command invocation each time, against the same server. With a fresh process for each invocation the runs should have been 100% repeatable, but they’re not.
With their stability issues aside, this also reinforces my point that Java is not the right tool when performance matters. We used Sun’s SLAMD benchmark suite for these tests, which is also 100% Java. Not by preference, but because the only published data we have for Sun’s own directory server performance was generated with SLAMD, and we wanted directly comparable numbers. Our preference would be to use load generators written in C, and timing routines written in C, so that we can be sure the client environment isn’t introducing any latencies or measurement errors into the results. Given the at best 3:1 performance deficit between Java and C, the absolute accuracy of SLAMD’s numbers is somewhat suspect. But it’s the best we can do to get apples-to-apples numbers right now.
Java vs C is irrelevant. The difference is in the backend code and persistence. 3:1 performance deficit? Where do you get that from? The main area in business computing where C is clearly better than Java despite all JIT is disk I/O. Everything else is “voodoo” or bias.
@Andy: “where do you get that from?” In case you weren’t paying attention, there is no disk I/O performed when running an ldapsearch client. 19 seconds for the Java client vs 6.3 seconds for the C client is, duh, a 3:1 performance difference. The same difference is observed with SLAMD’s own load generators vs load generators in C – it takes at least 3 times as many client machines to generate a given level of server load with SLAMD’s Java load generators as it does with C load generators. Again, there’s no disk I/O involved there, just BER coding and network I/O. No voodoo, just reality.
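For readers who want to repeat this kind of client-side comparison, here is a minimal timing-harness sketch. It is not the script used for the numbers in this thread: the function names are hypothetical, and the demo command is a placeholder that you would replace with the actual client invocation you want to measure. Output is discarded, mirroring the "dumped to /dev/null" setup described in the comments, and each run's exit status is recorded so that aborted runs can be told apart from clean ones.

```python
import subprocess
import sys
import time


def time_command(argv, runs=10):
    """Run argv `runs` times; return a list of (elapsed_seconds, returncode).

    On POSIX, subprocess reports a child killed by signal N as returncode -N.
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,  # discard the search results
            stderr=subprocess.DEVNULL,
        )
        results.append((time.perf_counter() - start, proc.returncode))
    return results


def best_clean_time(results):
    """Fastest run that exited with status 0, or None if no run completed."""
    clean = [elapsed for elapsed, code in results if code == 0]
    return min(clean) if clean else None


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real client command line.
    demo = time_command([sys.executable, "-c", "pass"], runs=3)
    print("best clean run (s):", best_clean_time(demo))

    # Ratio of the two best times quoted in the thread: 19.0 s vs 6.3 s.
    print("quoted ratio:", round(19.0 / 6.3, 2))
```

This measures wall-clock time for a fresh process per run, so it includes process startup (and, for a Java client, JVM startup) as well as the search itself.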
Yes…there is magic good stuff injected in the CPU when it knows the byte instructions it is executing came from a C compiler rather than a Java JIT. That has nothing to do with language. Where was the data in the ldapsearch stored? Humm.. oh yeah…the DISK! Again 3x is arbitrary based on the implementations, it has nothing to do with the language of origination. Java should be marginally slower due to the weight of the type system and memory allocation, however not anything by comparison to the cost of I/O. Both Java and C end up as native code when hit against the CPU. Modern Java JITs such as Sun’s 1.5 and 1.6 JVMs and recent versions of JRockit generate pretty tight code after a few iterations. It is simple voodoo to suggest there is a 3x performance difference that is related to Java itself rather than oh I dunno…say the implementation is 0.9 vs freaking 10 or better.
@Andy:
You are still completely missing the point. I’ll type more slowly since you obviously can’t comprehend that fast.
Of course it’s not just about the language, because Java includes the JVM. None of these Java-based projects are compiling with gcj and running natively, so the JVM is implicitly included in the discussion.
And again, get off your idiotic point about the disk. Disk I/O has nothing to do with the difference here. Disk I/O matters to the server sure, but the server was the same in both cases. I was only talking about differences in the client, comparing a Java client to a C client. Disk I/O is irrelevant on the client side. The data is retrieved and dumped to /dev/null, no further I/O to speak of, so just forget about it.
And as I already noted above, the client runs a very small code path iterated several thousand times. This is a best case for the JIT optimizer, and 3x slower is the best that Sun’s JVM could get. Any other case will have far more diverse code paths, yielding an even greater performance deficit.
It is simple voodoo to suggest that any level of optimized code running inside a virtual machine can deliver performance equal to or better than native code running on the native machine.
Do you have to be so hostile? What flags are you passing to the JVM? GC logs? What does the code look like? JNDI does suck. Also, have you tried on Java 6? Sun likes to incorporate this stupid “activation” framework thing in their code that has bad synchronization. Java 6 mitigates that largely by optimizing out stupidity.
@Andy:
(typing slowly) ... At its core, the JVM is a machine simulator simulating a virtual machine and the code generated simulates the operation of that machine and is not optimized for the actual machine (at the assembler level) that you are running on. Java developers can be very clever in the JIT compilers but it’s hard to compete with gcc’s optimizations for C code. Particularly because the compilation penalty for C, large as it may be, is paid once at compile time (before it’s packaged for delivery and installation). The server code in C doesn’t need iterations to achieve optimum performance. Howard’s point about the “heavy” type penalty and garbage collection are also worth thinking about as there is a code-cost for both. They are defensive mechanisms introduced to protect developers from shoddy coding practices and carry measurable performance penalties.
Meanwhile, we all have strong opinions and respect your enthusiastic support of a language and infrastructure that apparently serves you well. The performance advantage OpenLDAP enjoys over Java and other C implementations was achieved by careful engineering and, HYC would assert, the advantages of the C language.
I’m well aware of how the JVM works. GCC on the whole has a very poor optimizer compared with other compilers. There are plenty of papers about how garbage collection is not inherently slower than direct allocation, depending on the collector. The default server collectors are generally more interested in lower pause times than in overall consumption. In fact the more enterprise collectors actually use MORE in the way of resources quite deliberately but should easily outperform most C code doing direct allocation (parallel erasure of memory vs single threaded). So you guys can choose to believe the problem is the JVM by taking a flying leap off of scant evidence. There is a ramp-up time for Java and its JIT, that is a given. I’m merely pointing out this has all the validity of the latest Microsoft-funded study on how Windows is the most secure operating system ever because there was a hole in some Linux program. I’d be happy to help you create a more valid benchmark. I expect the JVM to slightly underperform, but merely that. In fact GCJ, which operates a Java language front end, underperforms the VM (for a number of reasons) for most stuff. OpenLDAP is far from the best example of great C code.
@Andy:
Well, common research with Java shows that garbage collection performs much worse than you indicate. E.g. http://www.realworldtech.com/forums/inde..
And having already gone thru this debate once before, I’ll leave you to read those threads.
http://www.realworldtech.com/forums/inde..
As for OpenLDAP’s code quality – it is certainly uneven, as we haven’t excised all of the crappy old code from it yet. But it is reliable, its performance is unmatched, and it is maintainable. That’s great code in anybody’s book.
re: GCC… Show some evidence for this comment. Back when I was a GCC maintainer my platforms were among the fastest in the world, both for code generation and resulting code. Granted, my code (M68K and i860) isn’t part of the current GCC codebase, so I really can’t say one way or the other.
It looks like things might have changed according to this post.
http://thoughtblender.info/2008/11/04/co..
All right, you’ve inspired me to try this. Thanks
Thanks for the in-depth article. I can tell you put a lot of work into this one. Thumbs way up.
Awesome! Some really helpful information in there. Bookmarked. Excellent source.