I've Been Slashdot'd

I gave the talk at the JVM Language Summit, which itself was a lot of fun. The talk is a repeat of one of the talks I did at JavaOne. I also gave two other talks, but the Sun JavaOne website appears to be unable to deliver the video right now. I also gave a short interview at JavaOne.

One of the talks I mentioned on the InfoQ video is also available here as a Google Tech Talk: Java on 1000 cores: Tales of Hardware/Software Co-Design. I also mentioned a talk, Azul's Experiences with Hardware Transactional Memory, and my blog on that is here. Alas, I don't believe the HTM talk has been video'd for public consumption at any time. If you are interested in HTM support, you should also check out this short gem.

The GC talk alluded to has slides all over the web; here's the original paper, but I could not find a public video presentation.

Cliff

Category: Web/Tech

Comments

Cliff -

Apologies for contacting you through this blog comment! It's not obvious how one might email you.

Your lock-free hash, back from 2007 - I noticed a comment from you in that thread asking, if someone implemented the algorithm in C, for a copy of the code.

I have a *license-free* lock-free library, written in C, here: www.liblfds.org

Nothing special yet, only a couple of the simplest algorithms, although to be fair most of the work so far has gone into making the library accessible to developers. People are using it commercially (I'm not charging for it, obviously, since it's license-free).

What would be your position on a request from me for permission to implement your algorithm in that library?

Posted by: Toby | Jan 26, 2010 3:15:37 PM

From the Wiki comments:

* atomic single-word increment

As far as I know, these requirements are *only* met by X86.
They also seem very strong requirements: you're unlikely to get anything beyond plain CAS on anything beyond an X86. Atomic increment can easily be implemented via a CAS (so you can remove it from your must-have list), but the double-word CAS is a tough sell.

high-scale-lib is under a very loose license; you are free to do with it what you will. All I ask is that you credit me with the base algorithm.

Cliff
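
To illustrate the remark that atomic increment can be built from plain CAS, here is a minimal Java sketch. It is not code from high-scale-lib or liblfds; the class name `CasIncrement` and the helper `incrementViaCas` are hypothetical, and `AtomicLong.compareAndSet` stands in for a hardware single-word CAS.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasIncrement {
    // Hypothetical helper: adds 1 atomically using only compare-and-set.
    static long incrementViaCas(AtomicLong counter) {
        while (true) {
            long current = counter.get();   // read the current value
            long next = current + 1;        // compute the new value
            // Succeeds only if nobody changed the value since we read it.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // Another thread won the race; loop and retry with a fresh read.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong counter = new AtomicLong();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    incrementViaCas(counter);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println(counter.get()); // 4 threads x 10,000 increments = 40000
    }
}
```

The retry loop is the whole trick: a failed CAS means another thread updated the word first, so the increment is simply recomputed from the new value. No such loop exists for double-word CAS, which is why that requirement is the harder one to drop.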
Posted by: Cliff Click | Jan 27, 2010 8:32:18 AM

ARM supports DWCAS (via LL/SC). I don't know much about other architectures - I've only ported to x86, x64 and ARM. I know SPARC and IA64 do not.

The reason for DWCAS is the absence of SMR; I'm using pointer-counter pairs to address ABA. SMR is something I'm about to look at, because if you want a linked list (which I do), you basically need SMR.

*nod* re license for high-scale-lib. There is a credits page on the wiki; if I successfully port, there would be an entry pointing to you and Azul. The wiki section for the API would also discuss the algorithm and describe the originator.

Posted by: Toby | Jan 27, 2010 9:10:12 AM

Josh Dybnis has an x86/64 port (with some other bits) up at http://code.google.com/p/nbds/

Posted by: Andrew M | Jan 27, 2010 3:26:14 PM

Ta! Useful reference if I run into problems and/or confusion, both of which are certainties :-)

Posted by: Toby | Jan 28, 2010 4:24:14 AM

I just read the SpecJBB2005 section in your IWannaBit! paper

Posted by: Andrew Trick | Feb 2, 2010 4:43:52 PM

Suppose I have a locked region with 2 writes in it (and some reads). Using the atomic-read bit I can verify that (so far at least), I am reading atomically. Then I come to the first write. Since I am still atomic, I allow the write to proceed. At this point the locked region is only 1/2 done, but one write has happened - and is now visible to other threads. If at this point I lose a line, I am no longer atomic.... but the damage is done: only 1 of the 2 writes happened.

Really I can only do 1 write atomically, and it has to be arranged to be the last operation in the locked region (but it's easy to move all reads before the write, with a little care taken about aliasing).

Cliff

Posted by: Cliff Click | Feb 3, 2010 8:49:19 AM

Thanks for the explanation.
I was thinking the lock acquisition could

The general concept is good because it eliminates the "acquire

Posted by: Andrew Trick | Feb 3, 2010 3:15:14 PM

No stall required for in-flight loads; you just fail to be atomic immediately. That means if you issue a ld/clr_atomic sequence, you always fail (because the ld is still 3 clocks even for an L1 hit). The fix is to insert a CPU-specific stall between the last load and the clr_atomic. This is a rather painful fix...

A "better" CPU implementation might allow L1 hits and stall the clr_atomic for any outstanding L1 *hits*, but just fail out for anything that misses L1 (and/or stall for anything but a private-cache hit; e.g. for machines with a shared L3 but private L1/L2, you could stall for an L2 hit but punt on an L3 hit).

Really the goal is to get everything you need cache-local, then retry the whole operation. Stalling defeats forward progress to the next cache-missing op, so it needs to be used with care.

Cliff

Posted by: Cliff Click | Feb 3, 2010 3:29:18 PM

That makes perfect sense. The optimization that I was considering

It would be more broadly applicable, but would also be more

Posted by: Andrew Trick | Feb 3, 2010 7:54:19 PM


