Cliff Click Jr.’s Blog's Blog

« How to stop a compiler | Main | Travel Time »

NonBlocking HashTable Source Code
April 24, 2007

I am please to announce, after loooong delay, the source code to my NonBlocking Hash Table is available as open-source on SourceForge:      

http://sourceforge.net/projects/high-scale-lib

I'll be adding to this library over time when I'm not so busy!  Right now I'm porting Java6 to Azul, reviewing OOPSLA papers (19 papers, about 15 pages each, mostly thick academic stuff), and making JavaOne slides.  In fact, I'll be taking about the NonBlocking Hash Table at JavaOne (slides) this year, along with a couple of other talks.

Here's an interesting (to me) discussion about tiered compilation in HotSpot.  Interesting to me because at Azul we've basically forked from the main HotSpot source base on this issue.  Our version of tiered compilation is alive and well, with a tidy performance win pretty much across the board.  Of course, we don't inline true virtual calls (i.e., megamorphic calls or calls which actually target more than one method at runtime - as opposed to those that can be statically predicted), because our hardware lets us do such calls fairly cheaply.  Inlining the megamorphic call "nails down" the decision to do the Java-level call via the most expensive mechanism (albeit with compiler aided scheduling, which will help Sparc & X86 but not Azul), and nails it down at "server" compile time. 

Since Azul's tiered compilation is not nailing down the decision to do such calls "the hard way", if it turns out the call site is really monomorphic we get to do the call via an inline-cache, i.e., really cheap.

Cliff

PS: I struck out on Wikipedia today, failing to find entries for: megamorphic calls, inline caches, tiered JIT compilation (and several variations on that theme), as well as entries for IBM's J9 JVM (which I know has tiered compilation).  How many readers of this blog know what an inline-cache is?  (hint: it's a way to make >95% of virtual Java calls go nearly as fast as plain static calls).

Category: Web/Tech | | TrackBack (0)

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d83451bd7669e200d83474be2969e2

Listed below are links to weblogs that reference NonBlocking HashTable Source Code:

 

Comments

Congrats!

Posted by: Bob Lee | Apr 24, 2007 9:03:05 AM

 

> How many readers of this blog know what an inline-cache is?

First! I do! Well, actually I guess all of your readers do.

Looking forward to your BOF.

Matthias

Posted by: Matthias | Apr 24, 2007 9:36:16 AM

 

Browse files shows nothing.
CVS only?

Posted by: Eric | Apr 25, 2007 10:20:23 AM

 

Yup; I'm new to SourceForge. How do I get files from CVS to "browse"?

and... no further updates to the code until after JavaOne. I've got my hands full now.

Posted by: Cliff Click | Apr 25, 2007 10:33:59 AM

 

inline-cache? Pretty weird name. It's the same as a polymorphic inline cache, isn't it? Seem to recall reading about it in a paper about a language with multiple dispatch - Cecil iirc.

Posted by: Steven Shaw | Apr 27, 2007 1:17:36 PM

 

It's a 1-entry PIC, so we drop the 'polymorphic' part. It's used by HotSpot for Java calls which cannot be proven monomorphic (but call a single target in practice). I'm sure something similar is in use by all other JVM vendors as well.

Posted by: Cliff Click | Apr 28, 2007 9:19:15 AM

 

Hi Cliff, I just found your Google talk, and then this. As it happens I'm implementing a hash table for a Java runtime right at the moment anyway, but in a very different environment to yours!

I very much like a lot of the ideas in your implementation, especially "put() does not cons" and the very closely intertwined "allocate just one power-of-two block for the hash table" (and I implemented those in Gwydion Dylan's hash tables back in 2001, but using a prime about 75% of the total table size for the number of buckets, and the rest of the table for chaining).

But two things give me the shits! Non-prime table size, and linear probing.

I see how linear probing benefits your algorithm, especially as having a link field would expand your state diagram quite a lot. I guess you must run at a pretty low load factor -- I'm working on very memory-constrained devices and want to keep my load factor high.

But what really makes me nervous is the non-prime table size. That means that if you're getting a lot of collisions to the same bucket and chaining a lot then you still will after a resize - the only thing that will improve is that if you're lucky those collisions will now be going to two buckets instead of to one.

I'm not sure why you have such a fear of mod. Yeah, division is slow, but multiplication is not -- at least on what I'm working on: for small operands it's only a couple of times the cost of AND or ADD.

The thing is ... with a hash table, the table stays the same size for quite a whle, so what you're taking the modulus with respect to stays the same for quite a while, so you can computer an integer reciprocal of the size when you resize the hash table, and then use the same reciprocal for a long time. So for a hash code of H and a table size of N (with reciprocal R = 2^32/N) you're looking at:

bucket = H - ((R*H)>>32)*N
bucket -= N if bucket >= N

OK, this is about ten times the cost of a simple AND, assuming you can grab just the upper half of a multiply, and cheap conditional execution (both of which I have on ARM). But .. you're doing calls to Object.hashCode() before it, and Object.equals() after it so I've got to think the speed difference might not be that significant.

What do you think? Did you consider and reject this?

Posted by: Bruce Hoult | Jun 3, 2007 11:47:24 PM

 

Gonna repost some of these comments as a separate blog entry.

Posted by: Cliff Click | Jun 4, 2007 8:16:37 AM

 

Hi Cliff. Just watched the video - I noticed for very high numbers of threads you mentioned it was GC throughput that was slowing you down as when copying old data you had to wrap it with a new Prime object. Given that as a bottleneck - would it help if instead of wrapping with a new prime, if values were always wrapped with a prime object from the start, and you just flipped a boolean or something on it?

Just a thought, good to see the code is out there, I will take a look.

Posted by: Michael Neale | Jun 30, 2007 2:05:49 AM

 

Nope... avoiding allocation on insert is a big help, as it allows me to delay allocation until the value is live at table-copy time (if ever).

Also I fixed some other GC issues with the version of code I was presenting at Google, that I might have been blaming on Prime creation. Prime creation is not an issue (and probably never was).

Thanks
Cliff

Posted by: Cliff Click | Jul 1, 2007 10:35:45 AM

 

Post a comment