|
Adding Transactions to the Non-BlockingHashMap I went to the winter meeting of DaCapo, a small group of senior academics and industrial scientists mostly focussed on GC & Memory-related issues (but with fairly wide interests). Much of the work is not yet published; work-in-progress stuff (lack of rigorous experiments, or badly broken implementations, or Big-Ideas-No-Code state, etc). So I won't repeat what I saw here directly, contact the authors directly if you're interested in something. The DaCapo group has decided to come out with a Java Server benchmark, similar to their popular DaCapo Benchmark Suite. This will be something that's much much easier to run than SpecJAppServer (that won't be hard! SJAS is a total nightmare to setup) and much more realistic than SpecJBB2005. Of the DaCapo material that's already published, I like Tracking Bad Apples the best. I'm going to float this one around Azul and see if we want implement it. I also stumbled over this gem while perusing Emery Burger's page. Adding Transactions to the Non-BlockingHashMap One of the other things that came out was the result of a quick conversation between Rick Hudson, Eliot Moss and myself: adding software transactions to the NonBlockingHashMap. A transaction would mean that a single thread could do a series of reads & modifications to the HashMap and either abort them all, or commit them all atomically. If committed, no other thread would see any partial updates and this thread is assured that no other thread modified any K/V pair read during the transaction. Transactional support would go a long ways towards making a (fast, scalable, non-blocking) in-memory database (A C I but no D). One of the reasons this idea looks so tractable, is that all the accesses to the transacted memory are already wrapped in function calls - there's no requirement for a "below the JVM" or "inside the C compiler" implementation. It can all be done in pure Java. The basic idea here is to add another State on the Value slot, representing a "transaction in progress". Non-transacting threads seeing this state will be required to do more work: at least 1 more layer of indirection to get the actual value, and at least 1 flag check to decide if they need to report the before- or after- transaction-committed value. The State would be implemented as a variant of box'ing like I do now: an instanceof test reports that the Value is really a Box, and the reader of the Box now reads through the Box to get the actual value. I believe I can implement this in the State Machine that drives the NBHM, or at least I have a back-of-the-envelope FSM that handles 2 conflicting transactions. Transactions would be non-blocking (so no live-lock - always some thread gets to complete a transaction) - but not fair (I couldn't find a Wikipedia article on the notion of fairness, although there are plenty of articles with the word fairness in their title). It's possible for a thread with long transaction to be continuously aborted by other threads with doing repeated short conflicting transactions. In DB's, I suspect fairness is a big deal. In non-blocking algorithms it's tough to implement: the endless winners need to stop winning at some point and help the endless loser to win every now and then. The problem is, the winners are different threads than the loser and can't actually execute the loser's code directly (other than the little bits inside the NBHM calls). If I block the winners to let the loser get through, then the algorithm becomes ... well..., blocking. I can 'pause' the winners for a fixed period of time and retain the non-blocking title, although this seems messy. Does somebody have a better idea here? Another, API-related question: should the transactional calls be under a completely separate API, or should I just have "begin_transaction", "commit", and "abort" calls, and assume all the calls in the middle are being transacted over? What happens if a thread starts a transaction, then does a non-transactional conflicting write (abort the transaction?)? Do I allow more than one thread to work on the same transaction? i.e., make a "transaction object" and allow it to be passed around, and require it in transactional calls? Thanks, Category: Web/Tech | | TrackBack (0) TrackBackTrackBack URL for this entry: Listed below are links to weblogs that reference Adding Transactions to the Non-BlockingHashMap: CommentsSPECjbb2005 :) Posted by: Azeem Jiva | Jan 7, 2008 11:22:19 AM Since you're basically implementing an STM, you may want to check out the STM literature - I seem to recall some papers covering API issues. Posted by: Chris Purcell | Jan 7, 2008 12:11:53 PM Thanks for both comments. Posted by: Cliff Click | Jan 7, 2008 2:39:26 PM Hello Cliff! Your blog is great and another great article here. In an ideal implementation, the intermediate methods should acquire and commit the transaction if there was not any transaction bound to the calling thread - kind of what spring's transaction manager does - so the client code that does not care about transaction dont have to worry about such things. On the other hand, i doubt such map would be useful for client code that does not care about transactions so it might not matter much. Regarding the transaction object, i think it would be nice if there is such a thing, but im not sure it would be much used (i do suspect that if it would degrade performance, you are better of without it). Posted by: takeshi | Jan 7, 2008 3:07:13 PM The first problem is to even know that one thread never gets through. After X failures of a transaction, you need to set some flag saying "there is a transaction that cannot pass". You can store with that flag the value on which the loser transaction failed last time, and then immediately fail & restart any other transaction attempting to write over this value - until the loser transaction passes and cleans up the flag. If the loser transaction fails on another value, do the same thing with it. In fact, you would store the failing thread id in the slot where it failed. With STM you can also achieve a different kind of durability. Normally durable means "committed to disk so that it can be recovered". In clusters, you can instead redefine durable to mean "it has been replicated to 2 other machines, so that everything will work normally even if this and one other machines fail". Posted by: Nikola Toshev | Jan 8, 2008 6:37:02 AM Yeah, I figured out how to 'warn' other threads that this word is contended by a repeating 'loser'. I just haven't figured how much it's worth stalling a winner to achieve fairness. And yes, I'm aware of the different kinds of durability - it's old-hat to the DB world (although still new to me). The "sent to other cluster" kind of durability is apparently faster but weaker by an amount related to physical proximity. (in same rack: circuit breaker pop takes down both; in same datacenter: lightning strike takes down both; in same city: wide-scale power outage (ala CA a few years ago) takes out both, etc.) Cliff Posted by: Cliff Click | Jan 8, 2008 7:58:13 AM I tried to outline a different solution from stalling the threads. Sorry if the description didn't make sense or we have different assumptions - let me try again. If the other threads are about to *read* the contended word, you could stall them, but I think it's better to just allow them to continue. They may be able to complete their transaction without interfering with the "loser". On the other hand, if another thread attempts to *write* to the word contended by the "loser", the system can just fail this transaction (that would normally succeed). It doesn't make sense to stall the thread, because the "loser" presumably will overwrite that word any moment and you would have to restart the transaction anyway (after its thread woke up). So why not restart immediately. Failing all other attempts to write will allow the "loser" to continue past the last failure, and ultimately succeed. I assume you have the regular STM model when all transactions try to do their thing and if someone else overwrote their data, restart the transaction.
I don't know about any research on cluster durability and would appreciate if someone can refer me to an overview or something. Posted by: Nikola Toshev | Jan 8, 2008 8:58:30 AM I think I got it first time already. There's a lot of options for the 'winner' with not-killing the 'loser'. If we decide that the 'loser' must win, then obviously any already-conflicting transactions should abort as soon as possible. However, if some non-loser comes along and finds a word already written by the 'loser' he's got some options: If the non-loser finds a word read by a 'loser' he wants to write it, he's got similar options: I'm sure other options abound. Cliff Posted by: Cliff Click | Jan 8, 2008 9:22:21 AM It seems to me that initially the system should not treat the loser as a special case unless absolutely necessary - so, assume the loser will succeed and abort the transactions that we know would stay on the way. The other options seem to be performance optimizations that can be tested and added incrementally, if they are found useful. Anyway, thank you for the great article and perspective! Posted by: Nikola Toshev | Jan 8, 2008 10:11:22 AM Re "fairness". I use the term forward progress. Conventional synchronization can't guarantee forward progress but it can guarantee when forward progress won't be made. When I was trying to do formal semantics for synchronization I kept forward progress out of the synchronization definitions and made it part of or a hint at least in the signaling stuff. And in volatile. When you set a volatile variable, the system should try to make it visible to other threads, as opposed to regular variables which didn't ever have to be made visible as long as no consistency constraints were violated. I did have a rather inefficient lock-free algorithm where the slowest thread always won. Haven't figured out what to do with it. For transactions, I suggested using a conventional rwlock. Run all the transactions using a read lock. If a thread has problems getting a transaction through, it gets the lock in write mode and retries. For the transaction api, I've been using Callable inner anonymous classes for the body of the transaction. You don't have to depend of the user using a try/finally block correctly. You can also stick in retry policies easily in the wrapper method. BTW, in my STM implementation I was using a lot of WeakReferences on short lived objects. I took those out and used reference counting and got a 1/3 reduction in runtime. I'll probably try hazard pointers at some point. Posted by: Joe Seigh | Jan 9, 2008 7:13:46 PM I think I can do transactions in the hashtable with my State Machine trick - so no locks (R/W or otherwise), and very low overhead (dozen clocks, max) for folks touching the transacted entries but not otherwise involved. Folks doing the transaction will obviously pay a higher cost, but again I think I can keep it fairly low. Just to be clear: I'm not talking about doing a full-blown STM here, just allowing transactions on the hashtable; writes to memory outside the hashtable remain un-transacted. I like the idea of requiring the user to provide the body of the transaction as a thunk. It does make it easy to provide a variety of policies. My original idea (and probably what I'll provide as underpinning) is something like: "x = hashtable.create_xaction()" There's no need for the atomic "putIfAbsent" varieties, since you can do a 'get' then a 'put' atomically with the transaction. This is a very low level view of a transaction - it calls out the explicit transaction actions. This needs to be combined with user-code that can be retried if need be; wrapping the combo of user-code+above_transaction_api in a thunk would make for a better coding style than loosely interleaving retry policy w/transaction calls w/user-code. However, I also like the low-level API for it's flexibility - I can pass around the x-action object to other threads, or interleave multiple unrelated x-actions inside the same thread. Cliff Posted by: Cliff Click | Jan 9, 2008 10:34:41 PM Hi Cliff, thanks for all the wonderful info you post on your Blog. When you wrote the NBHM library, did you consider Extendible/Dynamic hashing to avoid re-hashing everytime? I haven't been able to find any useful (practical) info on this topic. Almost everyone just sticks to either Open Addressing or Chaining with re-hashing. Thanks. Posted by: Ashwin Jayaprakash | Jan 30, 2008 12:06:47 AM I think it's because the extendible/dynamic stuff adds a layer of indirection. Each layer has it's own resize/rehash rules. The top layer still needs the full re-size/ re-hash logic. The lower layer is intended to represent an entire disk-block's worth of records (for hashed DB's that don't fit in ram). Extended hashing can be used to reduce "disk block misses", but it doesn't map so well to "cache line misses" (except maybe for very long cache lines?). Given that all RAM is uniformly fast, I don't need the indirection layer for clustering. There is also a claimed space efficiency, but I haven't seen any theorem's showing one is better than the other. Finally, for hash tables with more elements than a Java array allows (for NBHM, I'm limited to 2billion words or 1billion key/value pairs - Azul makes boxes that can easily handle this), I have considered a 2-layer hash table. If I ever go there, I'll revisit the extendabile/dynamic hashing schemes. Cliff Posted by: Cliff Click | Jan 30, 2008 8:37:45 AM Speaking of "cache line aware" code, in your JavaOne slides on NBHM, you said that Open Addressing is more cache-aware than Chaining. I'm still mulling over this statement, when it comes to Java. Pardon my ignorance, but a non-primitive array in Java is just an array of pointers and not an array of objects. So, how is it more efficient than a linked-list of pointers? (Apart from having to allocate the List-Entry objects) Posted by: Ashwin Jayaprakash | Jan 30, 2008 10:54:28 AM For open addressing, I have an array of pointers. The next ptr is a stride-1 access away. For chaining, the next ptr is always following a link - i.e., ptr chaining. For the array, I get a bunch of ptrs on the same cache line, plus the hardware knows to prefetch stride-1. Thus each successive probe after the first one is a cache hit. For chaining, each successive probe is often a cache miss, so you must arrange to probe less often. Cliff Posted by: Cliff Click | Jan 30, 2008 11:19:10 AM The interrupted update problem may be possible to be solved in the following way: Each time the long running transaction is interrupted, raise its priority a step. After a number of step increases, the transaction will finish because of priority. Note that in a very busy update situation, you may also have to periodically monitor how high the priority is getting, possibly at the raise time. If it exceeds a limit you set (or possibly a delay time), you suspend other threads and let this one finish. It seems minimal and fail safe on quick review. BillN Posted by: Bill Nicholls | Mar 2, 2008 9:07:41 AM The interrupted update problem may be possible to be solved in the following way: Each time the long running transaction is interrupted, raise its priority a step. After a number of step increases, the transaction will finish because of priority. Note that in a very busy update situation, you may also have to periodically monitor how high the priority is getting, possibly at the raise time. If it exceeds a limit you set (or possibly a delay time), you suspend other threads and let this one finish. It seems minimal and fail safe on quick review. BillN Posted by: Bill Nicholls | Mar 2, 2008 9:09:03 AM There's lots of heuristics for fairness and helping forward progress on transactions. In my case, I think I figured out to be both lock-free (no live-lock) but also fair... but that's a conference paper I need to write someday. Cliff Posted by: Cliff Click | Mar 2, 2008 4:21:00 PM For fairness, how about randomized loosers/winners? Posted by: gruber | Mar 6, 2008 3:06:29 AM Probably works in theory (i.e., random is probably good enough for a conference paper). In practice, you'd have to be careful that it was really random, because any repeating pattern is going to starve somebody - but sounds good. Cliff Posted by: Cliff Click | Mar 6, 2008 8:50:06 AM Hi, just came across your blog via mailinator tonight. Very interesting stuff. Question which I, in a different form, also asked at mailinator--any thoughts on an non-blocking LRU map? I've been looking for one, and this post about adding higher-level functionality into your non-blocking map via the state-machine that underlies it made me start to wonder about the possibility for doing something similar for LRU functionality. The type of non-blocking algo used in CHM doesn't seem to lend itself (based on my toying with it) to doing something like this. Or, you could also generalize to (as Sun did), using the state machine to do a linked map with the option of doing LRU. Anyhow, I'm sure I'll spend some time ruminating on it, the thought just occurred to me, figured I'd ask. Posted by: Jacy | Mar 28, 2008 10:40:39 PM When you say 'LRU map' do you mean with an explicit peak/cap on size? There are lots of policies to choose from when aging out entries: Are you looking specifically for LRU? Just curious, Posted by: Cliff Click | Mar 29, 2008 8:51:35 AM well, explicit peak/cap in size. not something with weak/soft refs. i'd like to be able to control, for some situations, exactly how many of an object type are live, and be able to do some cleanup when they're done, and i'd like it to order them by last access. for some context, it's a financial system, so, when i see a message related to an order, i'm likely to see more in the short term, and there's some objects related to the order i'd like to keep around, at least in the short term. as long as they keep getting accessed, i would want them to stay near the head of the map. i want to cap the size of the datastructure because i don't want the underlying array from growing too large. certainly i could do some timestamping and have a cleanup thread which goes through a standard concurrent map and removes the oldest objects to bring it back down to size, but i would rather avoid that overhead. i think it would be appealing to have a concurrent, fixed-size datastructure where i can control the eviction policy (and, ideally, have an opportunity to do cleanup on evicted objects if necessary). for my purposes, nearly-LRU would be fine, strong ordering isn't necessary, but approximate is important. Posted by: Jacy | Mar 29, 2008 9:40:09 AM Hi cliff my question is regarding the google talk on the lock free hash table. appreciate your time Amit. Posted by: Amit Ben | May 29, 2008 9:13:49 AM Just wanted to let you know that I've built a concurrent hash table with transactional support. It's BSD licensed and the transactional hash table core is written in C. Since I do my development work on a Mac, it's got some Mac OS X centric quirks to it, but I've tried to keep them separate from the core. There is also an Objective-C portion that uses the transactional hash table to create a NSDictionary and NSMutableDictionary drop in replacement that's multi-reader, multi-writer safe. You can enumerate a mutable dictionary while other threads continue to make mutations to the dictionary and it doesn't effect the iteration. You can even mutate the dictionary inside the iteration. I'm not claiming it's ready for anything close to production work. The second released version 0.2, really strengthens the transaction logic under corner cases (ie, inserting or deleting the same key simultaneously, etc). The basic architecture is that the data structure uses a chaining hash table. The hash chains are really an atomic LIFO queue, new items are pushed on to a hash buckets LIFO queue. MVCC is used to support the transactional capability, and it allows me to easily support atomically unlinking an item that's in the middle of the LIFO queue (which becomes a multi-step operation, and when there are no transactions open that are < the transaction number the first atomic unlinking occurred, you can guarantee that no one has any references to the unlinked node). http://transactionkit.sourceforge.net/ Posted by: John Engelhart | Jul 21, 2008 12:13:37 PM
Next
»
Post a comment |


