|
A Non-Blocking HashTable, Part 2 In a previous blog I talked about my Non-Blocking Hash Table, and how to implement 'get()'. This blog will focus on 'put()' and all variations. The hard part will be figuring out how to include state machine diagrams in a blog! Quick recap: the HashTable is a closed power-of-2 sized table. The table holds Key/Value pairs in even/odd entries in a large Java Object[] array. Key slots are limited to States {null,K} and make a 1-time transition from null to K (when a new Key is inserted). Value slots are limited to States {null,V/T} where V is any Value, and T is a Tombstone. Value slots can waffle about according to the last put(); keys are deleted by replacing the value with a Tombstone; keys can be re-inserted by replacing the Tombstone with a real Value again. Two Key states times three Value states gives me 6 total states:
States 5 & 6 are functionally identical, and are only visible to threads doing a get() call where they prefetch the Value slot despite seeing a null Key. The null key counts as a miss for the get(); the stuff in the Value slot belongs to some not-quite-fully-inserted Key/Value pair - but the get()-thread does not know for which Key! Note that Key slots, once set, can never be UNset - this is required for correctness, lest Thread A tests for K1 in a slot, then Thread B deletes it & inserts K2/V2, and finally Thread A gets around to reading the value slot and reads V2 - and not the now replaced V1. put() can be broken down into 2 main steps: key-slot claim and value update. Key-slot Claim: First up: standard hash, mask-to-table-size, then look at the slot. If the slot has a Key already and it's the desired Key - we're done. If it's the wrong key - then just like get() we reprobe. If we find a null Key slot, then we attempt to CAS the slot from null to Key. If we succeed, then we're done. If not, we reprobe. If we reprobe too often, we'll trigger a resize and that's a subject for a later blog. Object put( Object K, Object V ) {
What, exactly, is CAS_key() doing? It's implemented via sun.misc.Unsafe.WeakCompareAndSwap(). I'm using the Weak version because I do not need a fence, and at least on Azul the fence costs (roughly the cost of a trip to memory). On most other platforms the CAS includes an unavoidable fence, hence the CAS includes some unavoidable cost. The weak version of CAS allows for spurious failure; so my CAS_key code will loop if it thinks the failure is spurious. i.e., it loops until it succeeds or the value in memory no longer matches the expected value. Value-slot Update: The output of the prior step is that the Key-slot is correct, but the Value-slot is really unknown. You might think that after a fresh key-claim you could be assured that the value slot is null. But another thread can leap in and change the null at any time. Hence value-update has to inspect & go with what it finds there. As before, I'm using an unfenced CAS that does not spuriously fail (internal looping on spurious failure). Object old = table[2*idx+1]; // key/val kept in even/odd pairs If the CAS fails, then somebody else must have published another value out from under us. We can either give up OR retry: if we give up we can claim it is "as if" our CAS succeeded but another thread immediately stomped our update. The problem with this approach is that we can't make the OTHER update return OUR value as it's "old value". i.e., if we had truly succeeded and been immediately stomped, the stomper would be returning our value as it's "old value". So instead we'll retry: while( true ) { What about all those put() variations? putIfAbsent? remove? replace? Turns out, it's all the same implementation in the end - we just need to gate the CAS attempt a bit more. We'll pass in an extra value which has to match the old value, or the extra value is some sentinel meaning "dont care". while( true ) {
and so forth. There are only a few more tricks to go in here: things like 'remove()' should not insert a missing key, only to turn around and insert a tombstone. 'replace()' gives up on seeing a tombstone unlike 'putifAbsent()' which only succeeds on seeing a tombstone (or null). The returned old value has to have tombstones mapped back to null (the normal "no old value" return result). Stayed for next posting, where I try to explain resize() in a blog... and probably punt to a real white-paper style presentation. Cliff Category: Web/Tech | | TrackBack (0) TrackBackTrackBack URL for this entry: Listed below are links to weblogs that reference A Non-Blocking HashTable, Part 2: CommentsWhen resizing happens, what is the good time to delete the old table? Posted by: Steven | Apr 3, 2007 10:21:43 PM It's garbage collected at the JVM's discretion. Posted by: Chris Purcell | Apr 4, 2007 1:19:13 AM You could also do epoch-based collection. But you cannot rely on there being no old threads in the old table at any time - in theory a thread read the old-table pointer, then go to sleep in the OS "forever". Only to awaken next year and attempt to touch the old table. You need 2 bits of global knowledge, both of which GC provide (but other things do it to): (1) you know the thread "got somewhere else" that is known to be outside the Hash Table code, hence it's done using any old table pointer, and (2) or else the thread has died & been reaped by the OS. The knowledge is global in the sense that it's not YOUR thread in question, it's the union of all OTHER threads. Cliff Posted by: Cliff Click | Apr 4, 2007 8:50:11 AM So I guess in a non-java world, keeping a reference count in the old table can help to determine the deleting time. Posted by: Steven | Apr 4, 2007 10:59:10 AM Reference counts are one of those things that tend to scale badly, like occupancy counts. Michael's hazard pointers (Safe Memory Reclamation) work better. Posted by: Chris Purcell | Apr 4, 2007 3:09:07 PM Thanks, it sounds something interesting that I need to check out. Posted by: Steven | Apr 4, 2007 3:44:09 PM This is very interesting stuff. Are you going to be putting out Java source code at some point? Posted by: Rob Bygrave | Apr 10, 2007 2:06:19 PM "Soon" hopefully; I'm waiting on Source Forge project approval, but I have the go-ahead from Azul to open-source it. Posted by: Cliff Click | Apr 11, 2007 5:22:21 PM Very interesting stuff! I've been working on something vaguely similar recently. See http://random-state.net/log/3387038617.html. Thanks to different constraints on the table I'm writing I need to CAS only on new writes, and modifications can just smash their values in. Posted by: Nikodemus Siivola | May 2, 2007 1:03:06 AM How do you handle copying the data without locking? That's always been the hard part of any lock-free hash table. I can use a 'smash' for modification instead of a 'CAS' only if I do not support resizing. That was my original design - but you risk losing the last update to the table during the resize if you don't somehow tell late-arriving-writers that the table entry is stale. Otherwise your solution closely mirrors mine; CAS a key (wrapper in your jargon) down; then CAS down a value; delete CAS's down a sentinel over the value but leaves the keys untouched. (null for you; I use null for a different sentinel). Posted by: Cliff Click | May 2, 2007 9:59:41 AM Hi Cliff! I spend some time thinking about and am currently trying to implement it. Could it be that the following part of the put routine if( K == key || key.equals(K) ) // key hit? can/must be replaced by the CAS statement only? Posted by: Lars Schäfers | Apr 26, 2009 1:06:03 AM No, no chance of the same key in the table twice. For a given key, all threads agree on which slot to CAS it in. If multiple threads try to insert the same key there's no telling which one 'wins' but the key goes in the same slot regardless. Cliff Posted by: Cliff Click | Apr 26, 2009 10:16:05 AM Post a comment |


