A Plea For Programs
September 19, 2008
[Update 9/21/2008: I've got a simple sample program for people to port, plus at least some code for Clojure, JavaScript/Rhino, JPC & JRuby; I'm still missing Scala at least. Thanks!]
I would like some non-Java Java-bytecode programs for performance testing, for a talk I'm giving this coming Friday (my bad for starting this late), and I'm hoping my gentle readers can supply some. I'd like programs in different languages, but ones that are easy to set up and run. I'm going to do internal JVM profiling, so I'm not all that concerned with the output or "Foo-per-second" results. Ideally, my programs would be:
- Non-Java. Clojure, Scala, JPython, JRuby all come to mind. The more variation, the merrier!
- Easy setup. I'm not an expert in any of these, so the resulting program has to be easy to set up and run. Preferably a simple "java -cp Weirdo.jar FunnyProgram" command line.
- Plain JVM. Note that the 'java' command has to be there; I intend to profile with Azul Systems' JVM, and we have our own 'java'. Any kind of odd-ball jar or class files should be fine.
- Long enough. The program has to run for several minutes at least, without "babysitting". Long enough for the JIT to settle down (if it's going to), and long enough for decent profiling.
- Little I/O. Besides DBs being a pain to set up, I'm really looking for CPU-bound programs. Plain file I/O is fine, if the files are provided and can be scripted easily (e.g. "java -cp Weirdo.jar FunnyProgram < BigInput.dat > /dev/null").
- Be multi-threaded. Not a requirement, but a definite nice-to-have. Several of these languages support alternative threading & coherency models and I'd like to test these features.
- Be Open Source, so I can post the collection for others to compare against. This is NOT a hard requirement; I'm fine with keeping private anything you ask to be kept private. Performance profiling data *will* be released, as that is what the talk is about! (I'm also fine with signing NDAs, but that's probably not going to be an issue with this crowd.)
- An example: A multi-threaded Mandelbrot program would be fine, computing a 1000x1000 grid of points centered around (1.0,1.0) with a spread of (1.0,1.0) - so fill in the grid (0.5,0.5) to (1.5,1.5), using your choice of thread controls.
- Please include any names, so I can give credit where credit is due.
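For anyone wanting a starting point to port, the Mandelbrot example in the list above could look roughly like the following Java sketch. The class name, the iteration cap of 1000, the one-task-per-row split, and sampling at pixel centers are all assumptions made for illustration; the post only fixes the 1000x1000 grid and the (0.5,0.5) to (1.5,1.5) region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MandelbrotSketch {
    static final int MAX_ITER = 1000;  // assumed iteration cap, not from the post
    static final double LO = 0.5;      // grid covers (0.5,0.5) to (1.5,1.5)
    static final double SPAN = 1.0;

    /** Counts iterations of z = z*z + c until |z| > 2, capped at MAX_ITER. */
    static int escapeCount(double cr, double ci) {
        double zr = 0.0, zi = 0.0;
        int n = 0;
        while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            n++;
        }
        return n;
    }

    /** Fills a size x size grid, one task per row, on a fixed pool of threads. */
    static int[][] render(int size, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<int[]>> rows = new ArrayList<>();
            for (int y = 0; y < size; y++) {
                final double ci = LO + SPAN * (y + 0.5) / size;  // pixel center
                rows.add(() -> {
                    int[] row = new int[size];
                    for (int x = 0; x < size; x++) {
                        row[x] = escapeCount(LO + SPAN * (x + 0.5) / size, ci);
                    }
                    return row;
                });
            }
            int[][] grid = new int[size][];
            List<Future<int[]>> done = pool.invokeAll(rows);  // blocks until all rows finish
            for (int y = 0; y < size; y++) {
                grid[y] = done.get(y).get();
            }
            return grid;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int[][] grid = render(1000, threads);
        long total = 0;
        for (int[] row : grid) {
            for (int v : row) total += v;
        }
        System.out.println("total iterations: " + total);
    }
}
```

Rows are independent, so the result should not depend on the thread count; a port to another language would swap the thread pool for that language's own threading model.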
I hope to discover things like:
- How closely does "plain code" match the JVM/JIT's expectations? How well does the JIT turn "plain code" into machine instructions? I hope to present the JIT'd code for sample language constructs and detailed profiling data.
- How well does the function-call logic match the JVM/JIT's expectations? Can trivial functions be inlined? What's the cost of a not-inlined function-call?
- Other interesting costs? (e.g., endless new-Class churning, endless new-bytecode churning causing endless JIT'ing; endless new weak-ref or finalizer creation causing GC grief, etc)
- How well does the alternative threading & coherency scale? Can Mandelbrot run on a thousand CPUs? (I expect: trivially yes). How about programs with more interesting coherency requirements?
I put a sample Java program here, if you'd like to port something really simple. The inner loop of this program looks like: "for( i=0; i<1000000; i++ ) { sum += ((int)(sum^i)/i); }". The JIT'd assembly code from HotSpot's server compiler looks like this, unrolled a few times:
 2.83%  243  0x12d93878  add4 r5, r4, 1           // tmp=i+1; unrolled 8 times, this is #1
 0.06%    5  0x12d9387c  xor  r3, r5, r1          // sum in r1, tmp in r5
 0.06%    5  0x12d93880  beq  r5, 0, 0x012d93b40  // zero check before divide
 0.35%   30  0x12d93884  div4 r0, r3, r5          // divide, notice cycles on next op
 2.64%  227  0x12d93888  add4 r1, r0, r1          // sum += (sum ^ tmp)/tmp
As expected, there's a pretty direct mapping from the source code to the machine code. I'd like to see how other JVM-based languages stack up here. Email me directly with small programs, or post links here.
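For readers without the sample program to hand, here is a minimal Java sketch of a benchmark with this shape. It is reconstructed from the Talc port posted in the comments (loop bounds of 10000000 and 100, and the 0x3FFFFFFF mask), not copied from the original file; the class name and the `limit` parameter are assumptions, and the inner loop starts at 1 so the divide never sees zero.

```java
class SimpleLoopSketch {
    /** Inner hot loop: sum += (sum ^ i) / i, with sum masked to 30 bits each pass. */
    static int test(int sum, int limit) {
        for (int i = 1; i < limit; i++) {
            sum &= 0x3FFFFFFF;      // keep sum non-negative and bounded
            sum += (sum ^ i) / i;   // xor, divide, add: the ops seen in the JIT'd code
        }
        return sum;
    }

    public static void main(String[] args) {
        int sum = 0;
        // Repeat the hot loop so the JIT has time to compile and settle.
        for (int i = 0; i < 100; i++) {
            sum += test(sum, 10_000_000);
            sum &= 0x3FFFFFFF;
        }
        System.out.println(sum);
    }
}
```

A port to another JVM language only needs to preserve the xor/divide/add sequence in the inner loop for the JIT'd code to be comparable.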
Thanks! Cliff
Category: Web/Tech
Comments
There have been some recent JavaScript performance battles going on among Mozilla, Google, and Apple, each with their own JS engines.
Mozilla has TraceMonkey (plus Tamarin with Adobe):
http://weblogs.mozillazine.org/roadmap/archives/2008/08/tracemonkey_javascript_lightsp.html
... Trace Trees based JIT
Google has V8 by Lars Bak (+ Strongtalk & Animorphic guys):
http://code.google.com/apis/v8/design.html
... Fast Property Access, Dynamic Machine Code Generation, Efficient Garbage Collection
Apple has SquirrelFish Extreme:
http://webkit.org/blog/214/introducing-squirrelfish-extreme/
... bytecode optimizations, polymorphic inline caching, a lightweight “context threaded” JIT compiler, and a new regular expression engine that uses our JIT infrastructure.
So back to your original request, there's a JavaScript engine implementation for Java called Rhino. And you can run the same tests (SunSpider) that the above 3 companies are using to evaluate their implementations.
svn checkout http://svn.webkit.org/repository/webkit/trunk/SunSpider
wget ftp://ftp.mozilla.org/pub/mozilla.org/js/rhino1_7R1.zip
unzip rhino1_7R1.zip
cd SunSpider
./sunspider --shell=java --args="-jar ../rhino1_7R1/js.jar"
The whole testsuite will run a number of times (with one warmup round). But I believe it'll run each test with a new instance of the engine, so that might be problematic for your "long enough" requirement.
The V8 benchmarks are included with the SunSpider tests, and they seem to be longer running, and I suppose you could tweak the file to do more iterations. (e.g., open tests/v8-richards.js and jump to the bottom of the file and up the bounds)
Then you can run a single test with:
./sunspider --tests richards --v8-suite --shell=java --args="-jar ../rhino1_7R1/js.jar"
It might complain at the end when it's analyzing the results, but if you're just interested in watching it run, this should be fine. If you do want to see the analysis results, you can do:
java -jar ../rhino1_7R1/js.jar -f tmp/sunspider-test-prefix.js -f resources/sunspider-analyze-results.js -f tmp/sunspider-results-2008-09-19-12.28.05.js
Where the last file is the output of the run.
Posted by: Edward Lee | Sep 19, 2008 11:11:51 AM
If you post code in Java or even pseudo code, it will be much easier to post a "translation" to another language of your choice.
Posted by: Ones Self | Sep 19, 2008 11:12:28 AM
Oh, and if you just wanted to run the plain js file without any SunSpider harness stuff:
java -jar ../rhino1_7R1/js.jar -f tests/v8-richards.js
Or if you do want the timer but not the sunspider perl script:
1) create a file myTests.js
2) make its contents this one line: var tests = [ "v8-richards" ];
3) java -jar ../rhino1_7R1/js.jar -f myTests.js -f resources/sunspider-standalone-driver.js
Posted by: Edward Lee | Sep 19, 2008 11:25:42 AM
Where do I get "../rhino1_7R1/js.jar"? Looks like the 1st part of your post got clipped off.
Thanks,
Cliff
Posted by: Cliff Click | Sep 19, 2008 12:15:49 PM
Does this help? http://beust.com/weblog/archives/000493.html
Posted by: Eugene Kaganovich | Sep 19, 2008 3:29:54 PM
Fooled around with Rhino just now, on Azul. Our CPUs are slower than an x86 (but we have *lots* more of them), so the tradeoffs are certainly different. In this case the hot code looks like this:
Hot method:
org.mozilla.javascript.ScriptableObject.accessSlot(Ljava/lang/String;II)Lorg/mozilla/javascript/ScriptableObject$Slot;
....
0.70% 33 0x12f924a4 move r1, r13 0x05b68066
0.96% 45 0x12f924a8 extract r0, r13, 0, 30, 0 0x21a0001e
3.15% 148 0x12f924ac rem4 r2, r0, r9 0x0414808c
33.35% 1,565 0x12f924b0 aadd8 r2, r5, r2 0xa0b10083
3.15% 148 0x12f924b4 ld8 r0, 16(r2) 0xec500002
5.82% 273 0x12f924b8 lvb r0, 16(r2) 0xa4500002
1.15% 54 0x12f924bc beq r0, 0, 0x012f9252c 0x5000001c
...
It's hard to read so let me decipher: 1/3 of all time is spent in 'rem4' (the 4-byte modulus instruction), followed by an array lookup. This is a classic hash-table pattern using a mod function to do a lookup, and it's happening *all* the time.
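In Java terms, the pattern being described can be sketched as below. The class and method names are hypothetical and this is not Rhino's source; the first method mirrors the clear-the-sign-bit-then-remainder sequence visible in the listing, and the second is the usual mask-based alternative, which only works when the table length is a power of two.

```java
class SlotIndexSketch {
    /** Remainder-based index: works for any table length, but pays for an integer rem. */
    static int indexByRem(int hash, int tableLength) {
        return (hash & 0x7fffffff) % tableLength;  // clear the sign bit, then mod
    }

    /** Mask-based index: a single AND, valid only for power-of-two table lengths. */
    static int indexByMask(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int hash = "accessSlot".hashCode();
        // With a power-of-two length the two computations pick the same slot.
        System.out.println(indexByRem(hash, 16) == indexByMask(hash, 16));
    }
}
```

Whether swapping the remainder for a mask is acceptable depends on how the table is sized and how well the hash bits are mixed, which the profile alone does not show.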
Cliff
Posted by: Cliff Click | Sep 19, 2008 3:40:45 PM
The cedric/beust challenge codes mostly run too fast - or at least the Java versions, with Scala close behind. Might do the trick for JRuby/JPython-style solutions.
Let me add the obvious optimization to Crazy Bob's solution - replace the linked-list with just a pile-o-bits. Each digit can be represented in 4 bits; there are only 10 digits, so the BCD-style number comes in at 40 bits - fits in a long. Just requires some shift/mask operations to insert & delete values - should be faster than cache-hitting pointer operations.
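A minimal Java sketch of that pile-o-bits idea follows, assuming digit slot 0 lives in the lowest 4 bits of the long. The class and method names are made up for illustration, and this is not wired into Crazy Bob's solution; it only shows the shift/mask insert and delete.

```java
class PackedDigitsSketch {
    /** Reads the 4-bit digit stored at the given slot (slot 0 is the lowest nibble). */
    static int digitAt(long packed, int index) {
        return (int) (packed >>> (4 * index)) & 0xF;
    }

    /** Deletes the digit at index, sliding all higher digits down one slot. */
    static long removeAt(long packed, int index) {
        int shift = 4 * index;
        long low = packed & ((1L << shift) - 1);         // digits below index stay put
        long high = (packed >>> (shift + 4)) << shift;   // digits above index move down
        return high | low;
    }

    /** Inserts a digit at index, sliding the digits at and above it up one slot. */
    static long insertAt(long packed, int index, int digit) {
        int shift = 4 * index;
        long low = packed & ((1L << shift) - 1);
        long high = (packed >>> shift) << (shift + 4);
        return high | ((long) digit << shift) | low;
    }

    public static void main(String[] args) {
        long all = 0x9876543210L;                         // digits 0..9, 40 bits total
        long without3 = removeAt(all, 3);
        System.out.println(Long.toHexString(without3));   // prints 987654210
        System.out.println(Long.toHexString(insertAt(without3, 3, 3)));  // prints 9876543210
    }
}
```

With at most ten digits the largest shift is 40 bits, so every operation stays inside a single long and touches no memory beyond the register holding it.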
Thanks for the link,
Cliff
Posted by: Cliff Click | Sep 19, 2008 4:06:17 PM
Hi,
Although sometimes controversial, the Computer Language Benchmarks has various benchmarks for many languages. Maybe some of them can be used for this experiment. The link for the scala ones is:
http://shootout.alioth.debian.org/u64q/scala.php
Ismael
Posted by: Ismael Juma | Sep 20, 2008 9:23:18 AM
I've uploaded a simple Clojure ant system solver for the traveling salesman problem. It's not a very good solver, but it is a simple, truly parallel implementation, with agents for 'ants' and refs/transactions for the shared edge data.
http://clojure.googlegroups.com/web/tsp.zip
Rich
Posted by: Rich Hickey | Sep 21, 2008 9:12:50 AM
Thanks for the links. So far I think I have code for Clojure, JPC & JavaScript/Rhino, and have links for JRuby at least. After I chase down the Scala link (thanks Ismael) I probably have a quorum, although I'm still interested in other languages and ports of my simple sample program (the better to get apples/apples comparisons - yes, it's a trivial program & does not test nearly any interesting features; I'm looking for really simple language/JVM mismatch issues first).
Thanks
Cliff
Posted by: Cliff Click | Sep 21, 2008 9:51:26 AM
Thanks Rich - a quick peek at this shows about 50% of cycles spent on GC, doing about 250 Meg/sec of allocation. There's also a fair amount of cycles doing not-inlined reflective calls (e.g. JVM_GetArrayElement), and cycles in some top-level generated invoke function (clojure.spread__132.invoke), and cycles in the slow-path subtyping check. About 200 threads get spawned, but only 10 are busy (thread-pool heuristic spawning 1 thread per cpu? but we have more cpus than that).
More later, as I get time to drill down.
Cliff
Posted by: Cliff Click | Sep 21, 2008 10:22:48 AM
Cliff -
If you pass a number higher than 10 you should get that many busy threads:
java -server -cp clojure.jar clojure.lang.Script tsp-ants.clj -- 100
Rich
Posted by: Rich Hickey | Sep 21, 2008 11:01:09 AM
For fun I cranked it up to 600 ants. I see: +700 threads in pool-1 (1 per cpu looks like), all idle, and 600 threads in pool-2 all busy worker ants.
Allocation jumped to 10Gig/sec, but the per-thread profile didn't really change. It's smeared out across:
- clojure.spread__132.invoke,
- JVM_GetArrayElement (something the JIT should inline),
- partial_subtype_check (means JIT couldn't deduce types),
- new_tlab (allocation)
- java.lang.reflect.Array.getLength (again, JIT should inline)
- clojure.lang.Ref.get
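To make the reflective-array entries in the list above concrete, here is a small Java sketch contrasting access through java.lang.reflect.Array with plain typed indexing. It is not Clojure's code; the class and method names are hypothetical, and it only shows the kind of call the thread ties to JVM_GetArrayElement and Array.getLength.

```java
import java.lang.reflect.Array;

class ReflectiveArraySketch {
    /** Sums via reflection: each element read goes through Array.get on an untyped Object. */
    static long sumReflective(Object array) {
        long sum = 0;
        int n = Array.getLength(array);
        for (int i = 0; i < n; i++) {
            sum += ((Number) Array.get(array, i)).longValue();
        }
        return sum;
    }

    /** Sums via typed indexing: the array type is known statically, so loads are plain. */
    static long sumDirect(Number[] array) {
        long sum = 0;
        for (Number n : array) {
            sum += n.longValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Integer[] data = {1, 2, 3, 4};
        System.out.println(sumReflective(data));  // prints 10
        System.out.println(sumDirect(data));      // prints 10
    }
}
```

Both methods compute the same sum; the difference the profile points at is that the reflective path hides the array type from the JIT.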
Cliff
Posted by: Cliff Click | Sep 21, 2008 11:33:30 AM
The pool usage is as designed (this demo doesn't use the API that uses the other pool).
Thanks for the reflection tipoff - I've already removed the reflective array calls responsible for spread/JVM_GetArrayElement/Array.getLength. That fix is in Clojure SVN now.
The ephemeral garbage is an as-designed aspect of Clojure and something I hope for a VM to handle well.
Rich
Posted by: Rich Hickey | Sep 21, 2008 11:54:02 AM
Rich - Can you shoot me an updated clojure.jar?
As for the ephemeral garbage, it's a good bet that escape analysis will show up as the default in the next major release or so.
Thanks,
Cliff
Posted by: Cliff Click | Sep 21, 2008 11:59:23 AM
Ok, here's the 2-min perf analysis job on JRuby running my simple benchmark:
This guy is allocating about 150M/sec, although it's all young-gen objects (so definitely cheaper).
It's all RubyFixnum's as Charles Nutter predicted.
The most common call-stack looks like this:
* org.jruby.RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.call (org/jruby/RubyFixnum$i_method_1_0$RUBYINVOKER$op_plus.gen, bci=-1, server compiler)
* org.jruby.runtime.callsite.CachingCallSite.call (CachingCallSite.java:114, bci=39, server compiler)
* simple.method__1$RUBY$test1 (simple.rb:6, bci=145, server compiler)
where a gen'd method $test1 calls ...CachingCallSite.call calls op_plus/op_xor/op_div/op_and, etc.
The op_plus flavor calls allocate a new FixNum after verifying the result of the 'plus' doesn't overflow.
The $RUBY$test1 guy is interesting; the CachingCallSite.calls have all been inlined, so he's got a series of calls to op_plus, op_div, etc. Those calls in-turn are inline-cached - meaning correctly predicted but NOT candidates for inlining.
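The allocate-after-overflow-check behaviour described for op_plus can be sketched in Java as below. This is a hypothetical stand-in, not JRuby's RubyFixnum: the class name is invented, and where Ruby would promote an overflowing result to a Bignum, the sketch simply throws.

```java
final class BoxedIntSketch {
    final long value;

    BoxedIntSketch(long value) {
        this.value = value;
    }

    /** Adds two boxed values, checking for signed overflow, and allocates a fresh box. */
    BoxedIntSketch plus(BoxedIntSketch other) {
        long result = value + other.value;
        // Overflow occurred iff both operands differ in sign from the result.
        if (((value ^ result) & (other.value ^ result)) < 0) {
            throw new ArithmeticException("fixnum overflow");  // simplification, see lead-in
        }
        return new BoxedIntSketch(result);  // one new object per arithmetic op
    }

    public static void main(String[] args) {
        BoxedIntSketch sum = new BoxedIntSketch(0);
        BoxedIntSketch one = new BoxedIntSketch(1);
        for (int i = 0; i < 1_000_000; i++) {
            sum = sum.plus(one);  // a million adds, a million short-lived boxes
        }
        System.out.println(sum.value);  // prints 1000000
    }
}
```

Every arithmetic step producing a new short-lived object is consistent with the young-gen allocation rate reported in this comment.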
Cliff
Posted by: Cliff Click | Sep 21, 2008 12:17:57 PM
You can find here[1] two translations of the sample program into Scala. One of them is a "fast" version that is even uglier than the Java version, but performs about the same and the other is a "slow" version that uses nicer constructs but hits one of the weak spots in Scala at the moment: when working with collections of primitives, a lot of boxing/unboxing takes place.
[1] http://github.com/ijuma/misc/tree/master/scala/bench/Simple.scala
Posted by: Ismael Juma | Sep 21, 2008 12:37:36 PM
Ok, Ismael's FastSimple.scala is the closest to "make fast bytecodes", as the JIT'd code essentially matches my Java version.
The SlowSimple version allocates at 70M/sec, and has this top level profile:
38.3% 75,748 new_tlab (allocation)
33.4% 65,945 scala.Iterator$class.foldLeft
10.1% 20,001 scala.Range.length
3.3% 6,585 scala.RandomAccessSeq$$anon$12.next
So, pretty darned far performance-wise from the FastSimple version.
Cliff
Posted by: Cliff Click | Sep 21, 2008 5:38:08 PM
Hello Cliff,
please let me chime in with a Jython translation of your simple Java program.
Setup:
1. Download http://downloads.sourceforge.net/jython/jython_installer-2.5a3.jar
2. cd into the download directory
3. java -jar jython_installer-2.5a3.jar -s -t standalone -d YOUR_DIRECTORY
(this will create a 'jython-complete.jar' file in YOUR_DIRECTORY)
4. Point your browser to http://pylonshq.com/pasties/958
5. Copy the Jython code (lines 1 - 12) and save it as YOUR_DIRECTORY/simple.py
6. Check that YOUR_DIRECTORY contains exactly 2 files: jython-complete.jar and simple.py
Run:
7. cd into YOUR_DIRECTORY
8. java -jar jython-complete.jar simple.py
Have fun, and thanks!
Oti (on behalf of the Jython development team)
Posted by: Oti | Sep 24, 2008 3:22:25 PM
Ohhh - easiest launch instructions yet.
I got it up in the profiler already.
Thanks,
Cliff
Posted by: Cliff | Sep 24, 2008 3:30:31 PM
Cliff,
since the Jython compiler in 2.5a3 is completely different from the one in the (stable) 2.2.1 version, it might be interesting to compare the two.
The procedure is almost the same: The jar files are named differently, and you should make sure to install into an empty directory...
Setup:
1. Download http://downloads.sourceforge.net/jython/jython_installer-2.2.1.jar
2. cd into the download directory
3. java -jar jython_installer-2.2.1.jar -s -t standalone -d YOUR_DIRECTORY_2
(this will create a 'jython.jar' file in YOUR_DIRECTORY_2)
4. Point your browser to http://pylonshq.com/pasties/958
5. Copy the Jython code (lines 1 - 12) and save it as YOUR_DIRECTORY_2/simple.py
6. Check that YOUR_DIRECTORY_2 contains exactly 2 files: jython.jar and simple.py
Run:
7. cd into YOUR_DIRECTORY_2
8. java -jar jython.jar simple.py
Thanks,
Oti.
Posted by: Oti | Sep 24, 2008 3:49:24 PM
on my system, your simple.java takes 11s. translating to http://code.google.com/p/talc/ i get the following source:
function test(sum: int) : int {
for( i:=1; i":([Ljava/lang/String;)V
8: putstatic #18; //Field ARGS:Lorg/jessies/talc/ListValue;
11: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
14: iconst_0
15: aaload
16: checkcast #47; //class org/jessies/talc/IntegerValue
19: checkcast #47; //class org/jessies/talc/IntegerValue
22: dup
23: putstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
26: pop
27: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
30: iconst_0
31: aaload
32: checkcast #47; //class org/jessies/talc/IntegerValue
35: checkcast #47; //class org/jessies/talc/IntegerValue
38: dup
39: astore_1
40: pop
41: aload_1
42: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
45: iconst_1
46: aaload
47: checkcast #47; //class org/jessies/talc/IntegerValue
50: invokeinterface #56, 2; //InterfaceMethod java/lang/Comparable.compareTo:(Ljava/lang/Object;)I
55: iflt 64
58: getstatic #62; //Field org/jessies/talc/BooleanValue.FALSE:Lorg/jessies/talc/BooleanValue;
61: goto 67
64: getstatic #65; //Field org/jessies/talc/BooleanValue.TRUE:Lorg/jessies/talc/BooleanValue;
67: getstatic #62; //Field org/jessies/talc/BooleanValue.FALSE:Lorg/jessies/talc/BooleanValue;
70: if_acmpeq 134
73: getstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
76: getstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
79: checkcast #47; //class org/jessies/talc/IntegerValue
82: invokestatic #69; //Method test:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
85: checkcast #47; //class org/jessies/talc/IntegerValue
88: invokevirtual #72; //Method org/jessies/talc/IntegerValue.add:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
91: checkcast #47; //class org/jessies/talc/IntegerValue
94: dup
95: putstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
98: pop
99: getstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
102: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
105: iconst_2
106: aaload
107: checkcast #47; //class org/jessies/talc/IntegerValue
110: invokevirtual #75; //Method org/jessies/talc/IntegerValue.and:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
113: checkcast #47; //class org/jessies/talc/IntegerValue
116: dup
117: putstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
120: pop
121: aload_1
122: dup
123: invokevirtual #79; //Method org/jessies/talc/IntegerValue.increment:()Lorg/jessies/talc/IntegerValue;
126: checkcast #47; //class org/jessies/talc/IntegerValue
129: astore_1
130: pop
131: goto 41
134: getstatic #49; //Field sum:Lorg/jessies/talc/IntegerValue;
137: invokestatic #85; //Method org/jessies/talc/Functions.puts:(Ljava/lang/Object;)V
140: return
public static org.jessies.talc.IntegerValue test(org.jessies.talc.IntegerValue);
Code:
0: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
3: iconst_3
4: aaload
5: checkcast #47; //class org/jessies/talc/IntegerValue
8: checkcast #47; //class org/jessies/talc/IntegerValue
11: dup
12: astore_1
13: pop
14: aload_1
15: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
18: iconst_4
19: aaload
20: checkcast #47; //class org/jessies/talc/IntegerValue
23: invokeinterface #89, 2; //InterfaceMethod java/lang/Comparable.compareTo:(Ljava/lang/Object;)I
28: iflt 37
31: getstatic #62; //Field org/jessies/talc/BooleanValue.FALSE:Lorg/jessies/talc/BooleanValue;
34: goto 40
37: getstatic #65; //Field org/jessies/talc/BooleanValue.TRUE:Lorg/jessies/talc/BooleanValue;
40: getstatic #62; //Field org/jessies/talc/BooleanValue.FALSE:Lorg/jessies/talc/BooleanValue;
43: if_acmpeq 96
46: aload_0
47: getstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
50: iconst_2
51: aaload
52: checkcast #47; //class org/jessies/talc/IntegerValue
55: invokevirtual #75; //Method org/jessies/talc/IntegerValue.and:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
58: checkcast #47; //class org/jessies/talc/IntegerValue
61: dup
62: astore_0
63: pop
64: aload_0
65: aload_0
66: aload_1
67: invokevirtual #92; //Method org/jessies/talc/IntegerValue.xor:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
70: aload_1
71: invokevirtual #95; //Method org/jessies/talc/IntegerValue.divide:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
74: invokevirtual #72; //Method org/jessies/talc/IntegerValue.add:(Lorg/jessies/talc/IntegerValue;)Lorg/jessies/talc/IntegerValue;
77: checkcast #47; //class org/jessies/talc/IntegerValue
80: dup
81: astore_0
82: pop
83: aload_1
84: dup
85: invokevirtual #79; //Method org/jessies/talc/IntegerValue.increment:()Lorg/jessies/talc/IntegerValue;
88: checkcast #47; //class org/jessies/talc/IntegerValue
91: astore_1
92: pop
93: goto 14
96: aload_0
97: checkcast #47; //class org/jessies/talc/IntegerValue
100: areturn
private static void __init_constants__();
Code:
0: iconst_5
1: anewarray #4; //class java/lang/Object
4: dup
5: iconst_0
6: new #47; //class org/jessies/talc/IntegerValue
9: dup
10: ldc #97; //String 0
12: bipush 10
14: invokespecial #100; //Method org/jessies/talc/IntegerValue."<init>":(Ljava/lang/String;I)V
17: aastore
18: dup
19: iconst_1
20: new #47; //class org/jessies/talc/IntegerValue
23: dup
24: ldc #102; //String 100
26: bipush 10
28: invokespecial #100; //Method org/jessies/talc/IntegerValue."<init>":(Ljava/lang/String;I)V
31: aastore
32: dup
33: iconst_2
34: new #47; //class org/jessies/talc/IntegerValue
37: dup
38: ldc #104; //String 1073741823
40: bipush 10
42: invokespecial #100; //Method org/jessies/talc/IntegerValue."<init>":(Ljava/lang/String;I)V
45: aastore
46: dup
47: iconst_3
48: new #47; //class org/jessies/talc/IntegerValue
51: dup
52: ldc #106; //String 1
54: bipush 10
56: invokespecial #100; //Method org/jessies/talc/IntegerValue."<init>":(Ljava/lang/String;I)V
59: aastore
60: dup
61: iconst_4
62: new #47; //class org/jessies/talc/IntegerValue
65: dup
66: ldc #108; //String 10000000
68: bipush 10
70: invokespecial #100; //Method org/jessies/talc/IntegerValue."<init>":(Ljava/lang/String;I)V
73: aastore
74: putstatic #45; //Field $__talc_constants:[Ljava/lang/Object;
77: return
}
i have an uncommitted alternative that's fixnum-only. that takes 55s, which isn't bad because just changing your Java to use long instead of int (my fixnums are 64-bit) increases its run-time to 33s. i'm a bit surprised, actually. a 3x slowdown for switching from int to long, and only a 2x slowdown for boxing? with a 64-bit JVM on a 64-bit machine?
anyway, if you can be bothered to build, "./bin/talc -DnS simple.talc" will show you the generated code [S] without running [n], or "./bin/talc simple.talc" will just run it.
i was pleased to see i get the same answer as with your Java, though ;-)
Posted by: Elliott Hughes | Sep 29, 2008 9:53:06 PM
comments system doesn't escape for html? bah!
function test(sum: int) : int {
  for( i:=1; i<10000000; i++ ) {
    sum &= 0x3FFFFFFF;
    sum += (sum^i)/i;
  }
  return sum;
}
sum:=0;
for( i:=0; i<100; i++ ) {
  sum += test(sum);
  sum &= 0x3FFFFFFF;
}
puts(sum);
Posted by: Elliott Hughes | Sep 29, 2008 9:55:26 PM
comments system swallows HTML-encoded "less-than"s? grr!
Posted by: Elliott Hughes | Sep 29, 2008 9:56:42 PM
It's not really about the speed - my benchmark is too trivial to measure any interesting speed issue. It's really about how close you could possibly get "to the metal" if you wanted to.... for which I'm using speed-within-an-order-of-magnitude as a proxy. I haven't been measuring runtimes; I'm just profiling the trivial hot loop and looking at the code.
If you send me a jar file with a simple command line I'll profile your bytecodes on Azul's JVM - although I suspect the results will be fairly close to what Jython, JRuby, JavaScript/Rhino & Clojure all reported: runtimes are dominated by allocation costs (although switching to a long will trigger long-division, which probably turns into a largish subroutine call).
Cliff
Posted by: Cliff Click | Sep 29, 2008 10:05:55 PM