Cliff Click Jr.’s Blog's Blog

« A Non-Blocking HashTable, Part 2 | Main | quick update »

state machines & non-blocking algs
April 2, 2007

Chris Purcell writes:

Re. your talk: State machines permeate my algorithms, too. I suspect it's because they are easy to do atomically, make the "progress" through an algorithm obvious (simplifying concurrent assistance), and make enumerating all cases (the bane of lock-free algorithms) both explicit and fundamental.

This is absolutely the Right Thing To Do here.  The hardware guys have been using State Machines for years to do concurrent algorithms - in hardware.  That is, they have tool support such that they can write large State Machines, have the description maintained by many people (CVS for State Machines?), automatically tested, executable code generated (Verilog), etc.  I think the concurrent algorithms crowd definitely needs to head down this road as well.

Cliff

TrackBack (0)

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d83451bd7669e200d8352aa72b69e2

Listed below are links to weblogs that reference state machines & non-blocking algs:

 

Comments

Here's a tool for state machine support that's been around for a while:

http://smc.sourceforge.net/

Posted by: Brian Goetz | Apr 10, 2007 9:06:46 AM

 

Yeah, I surfed SourceForge for awhile. All the tools I see are too heavy-weight for what I'm doing now. That is, the gen'd code has more overhead than I can tolerate in this application. The hardware guys do better than, e.g. SMC, in terms of mapping a state machine notion down to raw hardware (or in my case, raw Java bytecodes). Also, I could not find concurrent state updates without real heavyweight locking going on.

Posted by: Cliff Click | Apr 11, 2007 5:20:08 PM

 

Hi Cliff,
I happened to scan your latest 2007_LockFreeHash.pdf

5 seconds worth of thoughts as a random reader..

Your state machine is for one K/V pair, in one thread?
The total system state is all K/V pairs, all threads.

you have to show that the system behavior works?
You make a leap from one key/one thread FSM, to saying
the implied system FSM obviously works, I think, which
isn't necessarily true. (in fact it's why people build
buggy cache-coherency systems)

It's kind of like how a MOESI state machine
doesn't prove cache coherency works in a system.

I always hated when people said "MOESI" FSM and though
they defined a cache coherent protocol.

For instance, you need to show that if two threads
try to operate on the same key, that the CAS-protected
transitions always go a way that's okay for each thread,
even if the value the thread sees on the CAS is the value from the other
thread. And that there are no other random values that
can appear and mess things up.

Probably means that on every CAS to a shared location,
different transitions CAS different values? or something like that.

it's probably obvious to you, but it's missing in the analysis,
I think.

Once you prove those details, you don't need the page about stronger
guarantees, which makes it sound like you're not totally sure
about correctness.

I'm not really sure it's fair to call
it a state machine anyhow, since you don't maintain any state..which
is a basic requirement of a state machine.

Your local state is implied by transitions. The only real state
is the set of K/V pairs.

You could analyze it as a state machine from
the hash table point of view, for all K/V pairs, showing that
the possible values a thread sees, in all cases, is legal for the algorithm
you're using in each thread. That would make it more clear what aspects
of a memory model are required also. The "state machine" your per-thread
algorithm goes thru, doesn't have to match the "state machine" implied
by the transitioning values of the K/V pairs, although it's probably
easiest if it does (as you imply).


You do have some system memory model properties required. For instance
probably store atomicity (everyone sees the same order of stores
to a single address..no one disagrees). Although maybe not. Maybe
you just need the CAS to work atomically (I wasn't sure if there
are any other stores besides CAS)

-kevin

Posted by: Kevin | May 4, 2007 4:28:36 PM

 

Hi Kevin... you said a lot in that post(!) and the replies won't easily fit in a reply. Since I'm in Chicago x-fering planes right now, I'm gonna punt and reply in a full post later. But here's the gist of a reply:

You argue I don't have a "state machine", just some K/V pairs and transitions.... but that's EXACTLY all any state machine ever is!

The state machine shows I'm OK if 2 different threads bang on the same K/V pair with different values.

The state machine shows I don't NEED store-ordering on CAS, just atomicity. If the machine lacks store-ordering I'll get a looser semantics in the presence of races, but it will still be "hash- table-like" semantics.

I don't NEED any more guarantees to show I've got a "hash table"-like property (any get of key K will return some value V that was the subject of a prior put(K,V)). But I might want a java.util.Concurrent.HashTable-like property which makes stronger guarantees - and thus requires more fencing.

Your argument that a collection of K/V-state machines does not scale... seems lacking. They compose directly. You might make an argument that I haven't discussed the composition function very much- it's exactly (and only) the key-slot select function... which could be treated as a state machine in it's own right. (and deserves more discussion).

I DO end up composing 2 state machines, when I copy from the old table to the new table as part of a resize operation.

Come to JavaOne! I'm giving the talk again Thursday and these questions would be great for Q&A.

Cliff

Posted by: Cliff Click | May 4, 2007 5:46:32 PM

 

Hi Cliff,
I wasn't trying to be too deep. :)
I know you get what I was thinking...
but a little more detail:

I guess I was just thinking that
at any point in time, there are many
viewpoints of a single K/V pair, not all
identical. These varying views define
the possible system state. One could argue
that there is no "true" K/V state, only
that which each thread believes to be
true at a point in time. Thus the sum of all
thread's viewpoints, is the total system
state.


It's like with cache coherency algorithms,
caches don't necessarily have the same
values for an address at any particular
time. So we talk instead about the legal combinations.

I guess I'm just saying (you might
already have it in your pdf) it'd be nice
to see a table, for each state in the fsm,
that shows the legal states for all other
fsms in the system, for that k/v pair.
(maybe all combinations are legal)

The requirement for a key being static
I think limits the possibilites though?

This M*N set of states, is the total system
FSM.

If hw diagnostics guys were testing this,
they'd want to know the full possibilities
to test them all.
(assuming 3 or more threads hitting the
same K/V pair doesn't increase the possibilit
ies...heck you should even call that out..
how many threads are the minimum required
to create the full combination of legal
state variations in the system. Need
to know that for testing too.

It also makes it more obvious that you
can interrupt and restore your fsm at
any time, without there being any issues.

I'm really just reacting based on past
experiences with MOESI algorithms, where
people get fixated on "what's the state
of the line" when that's not the important
question, since there are many views of
that at any point in time.

Understanding all legal combinations, and having
atomic transactions is what makes these
things work (as you have done).

Maybe what I'm really thinking is that
sw people don't think in FSMs more (for multi-cpu problems) because
they don't have the right language/approach
for discussing them. I'm thinking you could
advance that better...
-kevin

Posted by: kevin | May 7, 2007 2:37:57 PM

 

"One could argue that there is no _true_ K/V state, only that which each thread believes to be true at a point in time."

Actually, I am talking about the exposed hardware abstraction - that of a shared memory machine. Hence there actually IS only 1 true K/V state for the machine but each thread gets a snapshot of it as the state charges forward in time. Just because a thread sees state "Q" now does not mean the K/V will be at state "Q" when the same thread looks again, or attempts a change.

However I slice it, the hardware presents to my program the abstraction that there is but a single K/V state - I just can't know what it is (I can know "what it WAS" just 1 clock ago).

Hopefully this will clear up the issue. I don't need some kind of MOESI protocol - that is the hardware's job. I'm relying on the hardware to be correct.

Given correct hardware, I want to show that my state machine is correct. At different point in time, the threads read the current K/V state (there is only 1!!!), but the state can change immediately after reading. After a short time, I end up with a lot of threads with different ideas of what the K/V state should be - plus the ACTUAL K/V state. Then the threads act (i.e, attempt to CAS). The CAS only succeeds if the thread's view of the state and the actual state are in sync; otherwise the CAS fails.


Posted by: Cliff Click | May 7, 2007 3:05:39 PM

 

hmm. I think we'll have to end this.
I don't think we're going to get
into a mind sync.

I would note there's no way for you to
"read the current K/V state".
You have to agree with that. There's
no such thing or way.

i.e. if you do a load 1000 times, you
might still get 1000 old values
of K or V.

You can complete a CAS for a V, and still
have an old view of a K, for instance.

Once you start talking about the "actual"
K/V state, the thinking goes bad, in my
mind.

If you agree that the "actual" K/V state
is just a concept that you can't know, then
there's no reason to believe it exists
or even discuss it..i.e. it provides no
value if you can't know it.

So the only thing that's worth discussing
is what views each thread has, and
what the total legal system state is for
all threads.

I'm really just saying something simple.

The proof requires a full enumeration of
the legal states for all other threads,
along with the description of a single
threads FSM, and that no two threads
have bad cooperation that causes an
erroneous FSM transition.

Also, the hw doesn't provide the
abstraction that there is a single K/V
state. It provides the behaviors the
memory model describes, which is not
a set of rules around single values
...it's a set of
visibility rules, which inherently allow
multiple values for a single address to
exist in a system for a long time.

Posted by: kevin | May 9, 2007 8:56:30 AM

 

"I would note there's no way for you to read the current K/V state. You have to agree with that."

There are 2 times where you DO get the "current K/V state" - (1) the clock cycle upon reading a K-that-is-an-X, a K-that-is-a-NULL, or a V (given you already read a not-null not-X K) - and CAS counts as a read. And (2) after a "long time" past the last update to that slot.

"You can complete a CAS for a V and still have an old view of a K, for instance."

Yup; the FSM guarantees that at the time you attempted the CAS, the "old value for K" is also the current value of K and also the future value of K for all time. You don't CAS on a V until K has hit it's final value.


"The proof requires a full enumeration of the legal states for all other threads, along with the description of a single threads FSM, and that no two threads have bad cooperation that causes an erroneous FSM transition."

Yup again; fortunately the FSMs for each K/V slot are independent so the proof of this is nothing more than the trivially replicated proof of each separate K/V FSM. Now I need to show that many threads running the same FSM are legal. Again, this is trivial to reduce - by definition of a FSM a FSM does not care which thread makes transitions. So the proof devolves down to "is the FSM correct".


"Also, the hw doesn't provide the abstraction that there is a single K/V state. It provides the behaviors the memory model describes which is not a set of rules around single values ...it's a set of visibility rules, which inherently allow multiple values for a single address to exist in a system for a long time."

Yes again - and these views are not contradictory. i.e., all these statements are true all at the same time:
- "hardware provides a set of visibility rules"
- "hardware allows multiple values for a single address to exist for a long time"
- "hardware provides the abstraction of a single K/V state"

For the last 2 statements, I'll note that the abstraction of a single K/V state exists, it's just hard to witness. But not impossible. I witness it via a successful CAS or if I wait "long enough". (I'll add that I've seen "long enough" run into the several minute range on real hardware!!!).


---

Perhaps it would help if you would provide a counter-example? Some set of thread schedules & successful CAS's that move outside the FSM? You're allowed to assume any amount of delay on K/V visibility but a successful CAS has to remain atomic.

Cliff

Posted by: Cliff Click | May 9, 2007 9:54:59 AM

 

"You can complete a CAS for a V, and still have an old view of a K, for instance."

That's fine, because of the monotone properties of K.

Posted by: Chris Purcell | May 10, 2007 7:39:14 AM

 

I think kevin has a point. There is not a single FSM for each K/V cell. There is an FSM for each thread's view of each K/V cell, like kevin says. For example, one thread can see a given cell go from (null,null)->(null,V)->(K,V), and another can see it go from (null,null)->(K,null)->(K,V).

Fortunately, the analysis doesn't depend on there being a single FSM for each cell. It's sufficient that when some thread A performs a given sequence (as defined by program order) of updates to a location, each other thread sees the updates by thread A in that order. If that were not the case then it would be possible for some thread to see a key location go to K, and then see a prior state in which it was null, which would break the algorithm. But the algorithm (and this method of analysis in general) does not depend on every thread seeing the same interleaving of updates to a location. Note that even x86 does not guarantee that for bare loads/stores (I think it may do for CAS, but don't quote me on that).

Posted by: David Hopwood | Aug 4, 2007 1:09:02 PM

 

You wrote:

Actually, I am talking about the exposed hardware abstraction - that of a shared memory machine. Hence there actually IS only 1 true K/V state for the machine but each thread gets a snapshot of it as the state charges forward in time.

That is making quite a strong assumption about the machine memory model.

Posted by: David Hopwood | Aug 4, 2007 1:12:29 PM

 

>> For example, one thread can see a given cell go from (null,null)->(null,V)->(K,V), and another can see it go from (null,null)->(K,null)->(K,V).

Notice you said "see" as in "witness" - but not "change". Seeing a NULL key counts as a miss, so the Value witnessed in a (null,V) state is ignored.


>> every thread seeing the same interleaving of updates to a location. Note that even x86 does not guarantee that for bare loads/stores

Sparc TSO does guarentee this, but I've also heard otherwise for X86.


>>> Actually, I am talking about the exposed hardware abstraction - that of a shared memory machine. Hence there actually IS only 1 true K/V state for the machine but each thread gets a snapshot of it as the state charges forward in time.

So I take it we're in violent agreement here?
:-)


Cliff


Posted by: Cliff Click | Aug 4, 2007 11:15:07 PM

 

Post a comment