|
state machines & non-blocking algs Chris Purcell writes:
This is absolutely the Right Thing To Do here. The hardware guys have been using State Machines for years to do concurrent algorithms - in hardware. That is, they have tool support such that they can write large State Machines, have the description maintained by many people (CVS for State Machines?), automatically tested, executable code generated (Verilog), etc. I think the concurrent algorithms crowd definitely needs to head down this road as well. Cliff TrackBackTrackBack URL for this entry: Listed below are links to weblogs that reference state machines & non-blocking algs: CommentsHere's a tool for state machine support that's been around for a while: Posted by: Brian Goetz | Apr 10, 2007 9:06:46 AM Yeah, I surfed SourceForge for awhile. All the tools I see are too heavy-weight for what I'm doing now. That is, the gen'd code has more overhead than I can tolerate in this application. The hardware guys do better than, e.g. SMC, in terms of mapping a state machine notion down to raw hardware (or in my case, raw Java bytecodes). Also, I could not find concurrent state updates without real heavyweight locking going on. Posted by: Cliff Click | Apr 11, 2007 5:20:08 PM Hi Cliff, 5 seconds worth of thoughts as a random reader.. Your state machine is for one K/V pair, in one thread? you have to show that the system behavior works? It's kind of like how a MOESI state machine I always hated when people said "MOESI" FSM and though For instance, you need to show that if two threads Probably means that on every CAS to a shared location, it's probably obvious to you, but it's missing in the analysis, Once you prove those details, you don't need the page about stronger I'm not really sure it's fair to call Your local state is implied by transitions. The only real state You could analyze it as a state machine from
-kevin Posted by: Kevin | May 4, 2007 4:28:36 PM Hi Kevin... you said a lot in that post(!) and the replies won't easily fit in a reply. Since I'm in Chicago x-fering planes right now, I'm gonna punt and reply in a full post later. But here's the gist of a reply: You argue I don't have a "state machine", just some K/V pairs and transitions.... but that's EXACTLY all any state machine ever is! The state machine shows I'm OK if 2 different threads bang on the same K/V pair with different values. The state machine shows I don't NEED store-ordering on CAS, just atomicity. If the machine lacks store-ordering I'll get a looser semantics in the presence of races, but it will still be "hash- table-like" semantics. I don't NEED any more guarantees to show I've got a "hash table"-like property (any get of key K will return some value V that was the subject of a prior put(K,V)). But I might want a java.util.Concurrent.HashTable-like property which makes stronger guarantees - and thus requires more fencing. Your argument that a collection of K/V-state machines does not scale... seems lacking. They compose directly. You might make an argument that I haven't discussed the composition function very much- it's exactly (and only) the key-slot select function... which could be treated as a state machine in it's own right. (and deserves more discussion). I DO end up composing 2 state machines, when I copy from the old table to the new table as part of a resize operation. Come to JavaOne! I'm giving the talk again Thursday and these questions would be great for Q&A. Cliff Posted by: Cliff Click | May 4, 2007 5:46:32 PM Hi Cliff, I guess I was just thinking that
I guess I'm just saying (you might The requirement for a key being static This M*N set of states, is the total system If hw diagnostics guys were testing this, It also makes it more obvious that you I'm really just reacting based on past Understanding all legal combinations, and having Maybe what I'm really thinking is that Posted by: kevin | May 7, 2007 2:37:57 PM "One could argue that there is no _true_ K/V state, only that which each thread believes to be true at a point in time." Actually, I am talking about the exposed hardware abstraction - that of a shared memory machine. Hence there actually IS only 1 true K/V state for the machine but each thread gets a snapshot of it as the state charges forward in time. Just because a thread sees state "Q" now does not mean the K/V will be at state "Q" when the same thread looks again, or attempts a change. However I slice it, the hardware presents to my program the abstraction that there is but a single K/V state - I just can't know what it is (I can know "what it WAS" just 1 clock ago). Hopefully this will clear up the issue. I don't need some kind of MOESI protocol - that is the hardware's job. I'm relying on the hardware to be correct. Given correct hardware, I want to show that my state machine is correct. At different point in time, the threads read the current K/V state (there is only 1!!!), but the state can change immediately after reading. After a short time, I end up with a lot of threads with different ideas of what the K/V state should be - plus the ACTUAL K/V state. Then the threads act (i.e, attempt to CAS). The CAS only succeeds if the thread's view of the state and the actual state are in sync; otherwise the CAS fails.
Posted by: Cliff Click | May 7, 2007 3:05:39 PM hmm. I think we'll have to end this. I would note there's no way for you to i.e. if you do a load 1000 times, you You can complete a CAS for a V, and still Once you start talking about the "actual" If you agree that the "actual" K/V state So the only thing that's worth discussing I'm really just saying something simple. The proof requires a full enumeration of Also, the hw doesn't provide the Posted by: kevin | May 9, 2007 8:56:30 AM "I would note there's no way for you to read the current K/V state. You have to agree with that." There are 2 times where you DO get the "current K/V state" - (1) the clock cycle upon reading a K-that-is-an-X, a K-that-is-a-NULL, or a V (given you already read a not-null not-X K) - and CAS counts as a read. And (2) after a "long time" past the last update to that slot. "You can complete a CAS for a V and still have an old view of a K, for instance." Yup; the FSM guarantees that at the time you attempted the CAS, the "old value for K" is also the current value of K and also the future value of K for all time. You don't CAS on a V until K has hit it's final value.
Yup again; fortunately the FSMs for each K/V slot are independent so the proof of this is nothing more than the trivially replicated proof of each separate K/V FSM. Now I need to show that many threads running the same FSM are legal. Again, this is trivial to reduce - by definition of a FSM a FSM does not care which thread makes transitions. So the proof devolves down to "is the FSM correct".
Yes again - and these views are not contradictory. i.e., all these statements are true all at the same time: For the last 2 statements, I'll note that the abstraction of a single K/V state exists, it's just hard to witness. But not impossible. I witness it via a successful CAS or if I wait "long enough". (I'll add that I've seen "long enough" run into the several minute range on real hardware!!!).
Perhaps it would help if you would provide a counter-example? Some set of thread schedules & successful CAS's that move outside the FSM? You're allowed to assume any amount of delay on K/V visibility but a successful CAS has to remain atomic. Cliff Posted by: Cliff Click | May 9, 2007 9:54:59 AM "You can complete a CAS for a V, and still have an old view of a K, for instance." That's fine, because of the monotone properties of K. Posted by: Chris Purcell | May 10, 2007 7:39:14 AM I think kevin has a point. There is not a single FSM for each K/V cell. There is an FSM for each thread's view of each K/V cell, like kevin says. For example, one thread can see a given cell go from (null,null)->(null,V)->(K,V), and another can see it go from (null,null)->(K,null)->(K,V). Fortunately, the analysis doesn't depend on there being a single FSM for each cell. It's sufficient that when some thread A performs a given sequence (as defined by program order) of updates to a location, each other thread sees the updates by thread A in that order. If that were not the case then it would be possible for some thread to see a key location go to K, and then see a prior state in which it was null, which would break the algorithm. But the algorithm (and this method of analysis in general) does not depend on every thread seeing the same interleaving of updates to a location. Note that even x86 does not guarantee that for bare loads/stores (I think it may do for CAS, but don't quote me on that). Posted by: David Hopwood | Aug 4, 2007 1:09:02 PM You wrote: Actually, I am talking about the exposed hardware abstraction - that of a shared memory machine. Hence there actually IS only 1 true K/V state for the machine but each thread gets a snapshot of it as the state charges forward in time. That is making quite a strong assumption about the machine memory model. Posted by: David Hopwood | Aug 4, 2007 1:12:29 PM >> For example, one thread can see a given cell go from (null,null)->(null,V)->(K,V), and another can see it go from (null,null)->(K,null)->(K,V). Notice you said "see" as in "witness" - but not "change". Seeing a NULL key counts as a miss, so the Value witnessed in a (null,V) state is ignored.
Sparc TSO does guarentee this, but I've also heard otherwise for X86.
So I take it we're in violent agreement here?
Posted by: Cliff Click | Aug 4, 2007 11:15:07 PM Post a comment |


