|
Death by Decompilation I recently got involved in trying to improve performance of a large customer application. Azul's JVMs come with an always-on profiling tool - RTPM - so it was easy to spot where the time was going. The name has been changed to protect the guilty but the name might as well have been "addToSet". A quick check of the JIT'd assembly showed that all time was spent in spinning down a long array (an ArrayList actually), checking for duplicates. In the end, no dup gets found and the new element gets appended. Adding N elements gives us a classic O(N^2) algorithm. The array was already >32000 elements in length at the time we profiled it, so time is proportional to (32K^2) ~ 1billion. Ugly stuff. Naturally (grumble) this code is in a large 3rd party not-open-source application, where the owning corporation has been recently purchased by an even larger Faceless Corporation. The "proper approach" would be to file a performance bug and wait and wait and ... We voted to try and work around the problem by doing a little decompilation, hacking the ArrayList into a HashSet, and inserting a new jar file in the bootclasspath. So we pulled this class file out of the pile-o-jar's and hit it with the standard decompiler jad (www.kpdus.com/jad.html). Presto! About 2000 lines of fairly legible Java code. Plus a pile-o-warnings about un-decompilable stuff (Ut-oh!) mostly around nested try/catch/finally clauses and inner classes. I surf the Java code looking for other uses of the offending ArrayList - there are definitely some - and start to see places where jad bailed out. When I turn javac loose on the new source file, I get lots of error messages. No problem thinks I, I'll just get another decompiler. I immediately google-up cavaj (cavaj-java-decompiler.en.softonic.com). A few moments later I get another Java file - and whoops! It's header mentions jad by name down to the exact same revision. Sure enough this decompiler is a GUI front-end for jad complete with identically broken Java code. Turns out there are a number of such GUI wrappers for jad. So I search some more, discarding mocha (www.brouhaha.com/~eric/software/mocha) for being abandoned, ClassCracker (mayon.actewagl.net.au/index.html) because the demo version didn't produce any Java code at all, and IceBreaker (www.breakertech.com) because their web site was down. I also tried Java Decompiler (java.decompiler.free.fr) but it also produced loads of syntax errors although in different places than jad. DJ (members.fortunecity.com/neshkov/dj.html) also choked on my class file but just barely. Very long forward jumps spanning nested try/finally blocks would confuse it - but the inner classes came out looking nice. A colleague also tried JDec (sourceforge.net/projects/jdec) with limited success. I also stumbled across this nice list of decompilers from google. In the end, I hand-integrated the code from JDec & DJ to get something that would compile. I then hand-refactored the code to insert (probably re-insert) generics and Java6 style iterators - this step was entirely mechanical on my part but no decompiler I tried did it. The resulting code was much smaller and more readable... and made a bug in the code more obvious (there are "before" & "after" Sets that periodically get unioned and during the union the dup-elimination code is broken and allows dups). The actual replacement of ArrayList with a HashSet took only a few minutes. In summary: decompiling a single class file to reasonable-looking java code took about 6 hrs (and 7 decompilers looked at and 5 actually turned loose on the class file), while the refactoring to correct the egregious performance bug took maybe 10-minutes (yanked all the search loops and just used 'contains' and 'addAll') and included fixing the dups-allowed bug as a pleasant side-effect. Is it time for another Java decompiler bake-off? Has somebody got a recent comparison of decompilers floating about? Thanks,
Category: Web/Tech | | TrackBack (0) TrackBackTrackBack URL for this entry: Listed below are links to weblogs that reference Death by Decompilation: CommentsAn alternative approach would be to use AOP to reroute calls to the array list to a hash map. Aspectj allows you to intercept field access and method call/execution changing the code path and/or returning value. This could be done at compile time or load time. William Posted by: William Louth | Sep 4, 2008 10:09:49 AM What I needed was an easy way to replace *just this one* usage of ArrayList - (this 3rd party program is easily a million-line program) and there wasn't a 1-to-1 call replacement. The original program had a loop over all ArrayList elements and that loop was the root cause of the performance. Also, I can't name all the places that need replacing without understanding the code... for which I need the Java. If I had the sources in the first place, then it's <10 minute fix. I suspect AOP is a poor fit for this kind of problem. Cliff Posted by: Cliff Click | Sep 4, 2008 10:34:35 AM Yes AOP is a poor fit when you do not know the underlying code. No disputing that but you did reverse engineer the code a little bit which was enough to give you the name of the field and some insight into how it is used (called). That would be sufficient to create an adhoc aspect. Neither are good options in this context but I felt that maybe the AOP solution would provide a bit more distance between you and the code alteration act. Sorry but my response was in relation to the second half of the problem - changing the code with incomplete reverse engineering of the source code. William Posted by: William Louth | Sep 4, 2008 12:20:27 PM Not exactly a comparison, but have you tried JODE http://jode.sourceforge.net/ ? Sometimes JODE works better than JAD in my stuff, sometimes JAD does. Posted by: Johann | Sep 4, 2008 1:07:34 PM AOP - yeah no problem; I didn't mean my response to sound so abrupt. AOP has it's strengths but it just didn't fit this job. JODE - Saw it, but only after I had 3 "close enough" decompilations to attempt a hand-merge. It's on my radar for next time, though. Cliff Posted by: Cliff Click | Sep 4, 2008 2:17:56 PM Sometimes classes have such an ugly mix of try/finally blocks and loops/ifs that no decompiler produces any decent Java code. If you need to make a relatively small fix to a code, then a pair of jasmin/jasper may do the trick: http://www.angelfire.com/tx4/cus/jasper/Readme.html. You disassemble with jasper into jasmin assembler's format, make correction, the reassemble with jasmin. Posted by: Roman Elizarov | Sep 4, 2008 11:27:24 PM I agree with Roman, and I also wanted to point you to jasmin. This even works for byte code that is legal but not produced by using javac. Regards, Posted by: Markus Kohler | Sep 5, 2008 1:33:33 AM Yeah, going the bytecode/asm route was next on my list but I didn't know the name of the tools. Now I do! Posted by: Cliff Click | Sep 5, 2008 9:02:35 AM One simple solution: Just open up the class file in a hex editor and change the constant pool entry "java/util/ArrayList" to "/clif/clik/ArrayList". Write a regular .java file to match that and make it do what you want. Of course if it changes the method signatures, that's a problem, but if it only uses ArrayList *internally* then that'll do ya. (Or you can just add another constant pool entry. The class file format is offset based IIRC, so you can just jam it in there without having to update locations all throughout the rest of the file.) "Dava" is a half-way decent decompiler, but it's hard to get set up and doesn't work 100% either: http://citeseerx.ist.psu.edu/viewdoc/summary;?doi=10.1.1.87.9082 There's also the now-defunct Judoscript Jamaica which I could have sworn had a .class->.ja disassembler built into it at some point. http://web.archive.org/web/20070918071746/http://www.judoscript.com/articles/jamaica.html Basically, none of the tools I've found have worked properly, at all, even Jasper/Jasmin. If you have some extra money laying around, I'd love to be the first "Azul Summer of Code" project. :) There's still a few weeks left! Posted by: Joe | Sep 5, 2008 9:59:17 AM How is it O(n^2)? Isn't checking every element of an n-sized list O(n) or am I missing something? Posted by: Matthew M. Boedicker | Sep 5, 2008 10:12:41 AM Class-replacement alone wasn't going to work - I needed to remove some bad loops buried in the middle of the nested try/catch/finally clauses (and replace the entire loop with a call to HashSet.contains) Cliff Posted by: Cliff Click | Sep 5, 2008 10:13:44 AM Each insertion is O(n), and there are O(n) insertions. Posted by: Cliff Click | Sep 5, 2008 10:14:56 AM The best resource on Java decompilers I know are http://saffeine.com/links.jsp and http://www.program-transformation.org/Transform/JavaDecompilers The latter also provides comparison of Java decompilers http://www.program-transformation.org/Transform/JavaDecompilerTests Guys from Sable team that wrote Dava decompiler also published an interesting paper on decompilation. http://www.sable.mcgill.ca/publications/papers/#cc2002-2 Posted by: Eugene Kuleshov | Sep 5, 2008 11:10:09 AM Thanks - just bookmarked this page! Posted by: Cliff Click | Sep 5, 2008 1:35:07 PM Hi Cliff, It has happened quite often to need and change some 3rd party component. In short - I stopped using decompilers after the code is horribly obfuscated and not compilable afterward. If the changes are not too big I resort on simple byte code editing. I usually use JBE ( http://cs.ioc.ee/~ando/jbe/ ) not very good but it's doable. Posted by: stanimir | Nov 27, 2008 8:52:22 AM Post a comment |


