Why WeakHashMap Sucks
August 31, 2007

Came across this surprise while working on a customer app.

Symptom: with Concurrent GC, the app performance slowly degrades over time, with more and more time spent in WeakHashMap lookups - until the load is removed (typically because all transactions start timing out).  Then suddenly the app becomes performant again, the WeakHashMap lookups are fast, etc ... until slowly, the app performance starts to degrade again.  With the standard Stop-The-World fast parallel collector, the app performance is always fast (except during GC pauses, which is why the customer wants to use a Concurrent GC).

Root cause #1: Customer's HashMap hashcode did NOT includes value hashes, but it's equal call did!
Root cause #2: A failed "equals" calls in a WeakHashMap take time, and will typically cause a ConcurrentGC to believe the Key should be kept alive for another cycle.

Let me explain some more:

The App: ships objects over the wires between 2 JVMs, via RMI.  Each RMI structure carries a java object essentially encoded as a HashMap with Keys for fields and Values for field values.  I.e., Since String has 4 fields, an object like the String "abc" would be encoded as a HashMap with 4 key/value pairs: ("value", char[3]={'a','b','c'}), ("offset",int=0), {"count",int=3), and ("hash",int=0xab123ab}.  The same type of objects are getting shipped around, so all these HashMaps have the same 4 Keys.... and the same hashcode! ('cause the customer's HashMap hashcode ignores values) the 'equals' DID compare against values, so 2 maps with the same keys and different values will have the same hashcode but will not be equals.

These HashMap structures (suitably wrapped in a few more layers) are then stuffed into a WeakHashMap.  Alas, since all of 'em have the same hashcode the WeakHashMap turns into a linked list. This is a major bummer.  Lookups are forced to scan the linked list doing a full HashMap compare at each step.

Now comes the ConcurrentGC trick: CGC takes time.  During this time the application is probing the table.  As the app sweeps down the linked list it forces every Key it touches to be alive during this CGC cycle.  Assuming a single lookup can finish inside of a CGC cycle, all Keys end up being forced alive every time.  Nothing ever gets removed.  CGC starts running slower cycles because the live set is growing each time a new HashMap is inserted into the WeakHashMap.

If we stop the app even for a moment a CGC cycle completes with no probes causing Keys to remain alive and all the dead keys get stripped out at once.  Immediately the table is mostly empty and fast again (and our live set has hugely shrunk making CGC cycles fast again) - and we can start probing it with little problem since the linked list is short and the odds of CGC catching us in the middle of the linked-list walking is low.  But slowly over time we fail to remove some dead keys, the list grows, and the odds of getting caught in the middle of a walk grow until it's a surety.

..................

Suppose we look at a Stop-The-World collector.  When a GC cycle halts things, likely we're in the middle of 1 key probe - but no more.  All the other dead keys get collected and our live set does not grow.  App performance remains good (except for the obvious GC pauses).

..................

Suppose we look at a generational collector; young-gen is STW but old-gen is concurrent.  Same problem arises if we've got any keys in the old-gen.  The time to sweep the old-gen is so long that we surely touch every old-gen key in the table, keeping all of them alive.   As the young-gen periodically promotes a few keys they "get stuck" in the old-gen and can never be collected until we stop probing during a full old-gen cycle.  Old-gen cycles get slower and slower as the live set grows, unless we stop the app even for a moment.

..................

Suppose we keep going with our ConcurrentGC (and crap hashcodes) and the linked list grows without bound.  Eventually the time to walk the linked list (and call equals()) could exceed a CGC cycle (I say "could" because as the list grows CGC as more work to do so it depends on the relative costs to walk what's being kept alive vs the cost of an equals() call).  In this case, those portions of the linked list not walked inside the CGC cycle will be found dead and removed - i.e., there's an upper bound to the list length, but it depends on the ratio of CGC-walk-time vs time-to-equals for objects on the list.  Paradoxically, a slower equals call will keep less stuff alive!
..................

Suppose we correct the hashcode problem; the WeakHashMap turns into a real *hashtable* instead of a linked list.  Suppose also I work for a company with a ConcurrentGC which can cycle in a matter of seconds for almost any size heap... and thus the Weak keys in the WeakHashMap get stripped out every few seconds.  Then the WeakHashMap fails it's intended purpose of being a cache (with lazy old-key cleanup semantics).  Really, the customer wants a cache with a n explicit timeout on the entries.

I read the javadocs for WeakHashMap and they pretty clearly explain the fallacy's... but maybe WHM was made when Concurrent collectors where a far off dream, or could only cycle very slowly.  I think technology has caught up with WeakHashMap.

Cliff

Category: Web/Tech | | TrackBack (0)

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a00d83451bd7669e200e54eea195c8834

Listed below are links to weblogs that reference Why WeakHashMap Sucks:

 

Comments

Hey, Cliff. Another problem with WHM is it uses equals() equality, but weak references are fundamentally identity based.

I have a WHM replacement which supports weak/soft/strongly referenced keys and or values: http://google-guice.googlecode.com/svn/trunk/src/com/google/inject/internal/ReferenceMap.java

It cleans up after GC-ed entries in a background thread instead of inline, and it implements ConcurrentMap completely.

I would personally never use weak reference for caching for the same reasons you mention (it doesn't really work). Use soft references instead.

Here's a pretty simple way to GC entries after N ms. It only takes a few lines of code. Use a ConcurrentHashMap and a Timer. Every time you add an entry to the CHM, schedule a TimerTask to remove it (after ensuring that the value hasn't been replaced). You can share one static Timer between all the map instances.

Posted by: Bob Lee | Aug 31, 2007 5:09:27 PM

 

Sounds like a severe scaling limit to me. How does it work with million-entry tables? Billion-entry tables? (don't laugh, I got a customer looking at using NonBlockingHashMap with a 500-million entry table).

What's it do to the cost of an insertion ("put()" is cheap - what is the cost of another TimerTask)? How about the cost a successful lookup (if you don't do something then your hot entries are tossed out after N secs along with the cold).

This is an interesting hard problem... I can imagine that calling currentTimeNano or currentTimeMillis is cheap - so each entry can have a last-touched stamp. I can imagine sweeping a fraction of a table every few msec (but for every table???). You kinda need a fraction-of-a-thread for small tables (i.e. TimerTask), but my billion-entry table probably needs several threads full-time sweeping looking for entries to age out.

I already got a variant of NBHM for use with primitive long keys... maybe I'll look at one with a graceful aging policy.

Cliff

Posted by: Cliff Click | Aug 31, 2007 5:54:19 PM

 

Oh, I didn't realize you wanted to keep the entries around for N seconds *after the most recent access time*.

I would just use ReferenceMap with soft references (assuming "N seconds" has wiggle room).

Posted by: Bob Lee | Aug 31, 2007 6:05:10 PM

 

Also, where does AbstractMap come into play? AbstractMap.hashCode() does call hashCode() on the Map.Entry objects you provide.

Posted by: Bob Lee | Aug 31, 2007 9:18:59 PM

 

humm... which calls value.hashCode(). Sorry, faulty memory; the hashCode in question didn't hash on the values so I guess it was a user override. It was also older code... what does 1.3.1 do here?
Cliff

Posted by: Cliff Click | Aug 31, 2007 10:21:03 PM

 

I suspect AbstractMap has always worked like that. Modifying the hash code implementation after release would have broken compatibility.

Posted by: Bob Lee | Sep 1, 2007 4:51:30 PM

 

On the bright side, ReferenceMap is insanely concurrent. Charles and I even managed to implement the methods in ConcurrentMap without locking (at least without any more than ConcurrentHashMap does). Doug Lea and Josh Bloch have both reviewed its implementation at one time or another. We're hoping to get it into Java 7.

Posted by: Bob Lee | Sep 1, 2007 4:55:27 PM

 

Excellent post. Congratulations. You have just opened the door for future changes in the JDK.

Posted by: Diego Parrilla | Sep 3, 2007 1:34:05 AM

 

That's a really disapointing post, since a minute spent in the java source shows that HashMap, AbstractHashMap and WeakHashMap all do the right thing with equals() and hashcode().

The post needs to be updated to say "oops! I was wrong".

WeakHashMap has it's limitations, but it's still useful in the case that you want to cache references to classes that might be unloaded along with classloader and web app that they were deployed with.

Posted by: David Roussel | Sep 3, 2007 6:41:02 AM

 

> That's a really disapointing post, since a minute spent in the java source shows that HashMap, AbstractHashMap and WeakHashMap all do the right thing with equals() and hashcode().

Actually, it took more than a minute to find out if Java 1.3.1 did something wacky - 1.3.1 source doesn't pop out from Google very easily (and 1.3.1 does do something wacky; the Entry hashCode is weird), but I think it was the customer's hashcode in any case.

> The post needs to be updated to say "oops! I was wrong".

Post updated to refer to customer's hashCode.

.... and I stand by my complaints:

If you use something other than pointer-equality in a WeakHashMap, then keys may be stripped out in a very few seconds. If you use pointer-equality then a mismatch between hashCode and equals isn't an issue (but if the keys are only ptr-equals then why the need for a HashMap?)

Actually, this is a good question: what usage cases do people have of WHM that use the unique features of WHM productively? Let's skip all the "psuedo-LRU cache" uses, as those are fairly busted in a world with Concurrent GC's.

Thanks,
Cliff

Posted by: Cliff Click | Sep 3, 2007 12:03:55 PM

 

I'm hoping the irony in the title is intentional. There are several layers of "suck" going on here and WeakHashMap is the least of them. Assuming you have a correct hashcode and concurrent GC, as you said everything gets cleared out really fast and the Javadocs explain all this.

The fact that the developers were hoping to use it as some kind of cheap, really crappy caching mechanism really has nothing to do with anything. Their lofty goals aside, WeakHashMap is intended for another purpose entirely - to keep information about objects without preventing them from being GC'd.

As mentioned above, SoftReferences are the cheap, crappy caching mechanism they were thinking of. They're perfect for when you'd like your cache to be cleared at random times and in random order. I'd recommend looking into a real caching algorithm instead.

Posted by: Scott Vachalek | Sep 4, 2007 8:10:48 AM

 

[no question that the customer's app was severely broken in a number of ways]

Certainly the interactions with a crappy hashcode, weakrefs & concurrent GC are truely weird.

But what again is the use-case for WeakHashMap? If I have to find the Value via an 'equals' call and do not have a strong ref, then the Value can disappear out from under me very rapidly. i.e. Ask for the Symbol mapped by String "cliff" once and get an answer; ask for it again a moment later and the mapping (and Symbol) are gone. As the javadocs explain, it's concurrent programming where GC is one of the players - a recipe ripe for surprise NullPointerExceptions. Presumably this isn't the desired situation.

I have a strong ref Key ptr directly in hand (and do not need an equals call) then why didn't I just subclass the Key and add the Value as a direct object field? Is the use-case here an issue of software engineering? (it's too hard to refactor to subclass & add a field? )

I'm coming around to thinking that WeakIdentityHashMap might make sense sometimes, but WeakHashMap never.

Cliff

Posted by: Cliff Click | Sep 4, 2007 9:14:29 AM

 

You're correct that WeakIdentityHashMap makes sense and WeakHashMap does not. Like I said before, weak references are fundamentally identity based.

I think WeakHashMap just happens to work most of the time because your keys don't override equals() and hashCode(). The most common use case for WHM is to use Class instances as keys--you obviously can't extend Class and add a direct reference to your value.

BTW, ReferenceMap is now available as part of the Google Collections: http://code.google.com/p/google-collections/

Posted by: Bob Lee | Sep 10, 2007 4:16:10 PM

 

If you want cache use soft references. Weak references are only as 'weak' references, i.e. not preventing the GC and ensuring no leaks.
Usual caches also use strong keys and weak/soft values.

Posted by: stanimir | Mar 6, 2008 4:53:00 PM

 

i actually did come across a situation recently that i feel WeakHashMap was actually an appropriate solution for. i was doing an adaptive data structure which needed to know how many threads use it. i don't know how many of these structures might exist in a given program (i do a lot of framework dev), and i hardly wanted the overhead of some extra data-structure inside the object keeping track of the threads which have visited it. instead, i did a ThreadLocal> (where Object is a static new Object(), and DataObject uses instance equality) that can be used to see if this thread has accessed this data-structure instance yet, and then adapt it as necessary. since the weakmaps are threadlocal, they only live with the threads, and since they are weak, and using instance equality, they are keeping the intended WeakHashMap contract. at any rate, i think i can perhaps eliminate the usage of maps by adapting in a different fashion, but it is at least an example of using the features of WHM productively that you asked for.

Posted by: Jacy | Mar 28, 2008 11:07:56 PM

 

sigh, stupid HTML formatting. should have previewed. in the post above, the ThreadLocal was supposed to be ThreadLocal{WeakHashMap{DataObject, Object}}. should make more sense now.

Posted by: Jacy | Mar 28, 2008 11:13:05 PM

 

Hi Cliff,

Regarding lock and wait free structures. Your concept and design using state machine is great.

For real time java based system, I have started using non-blocking hashmap, hashset structures. I get around 30,000 objects per second. Some threads do building of tree structure, some threads do iteration, some threads remove expired objects based on time from non blocking structure. All these operations should happen concurrently without blocking. So I feel your structures are best.

a> There should be no synchronization in code. I tested this with non blocking hashlong, producer thread adding object and consumer thread removing object. Can thread which is iterating structure also remove object? Without "concurrent modification exception".

b> Kindly let us know how does non blocking structures work and concept behind "wait and lock free".

c>It seems nonblocking Set does not support "Comparator" like sun java Set/TreeSet. Is there any work around for this? To make this non blocking I have taken "sun" code for TreeSet modified to remove concurrent modification exception. This seems to work but bit fuzzy. There is a memory leak problem. I want to use azul structures with comparator clause.

d>With 30,000 objects/secound I need optimal jvm tunning - no major collections but only minor collections - need to attempt real time jvm. Any advice on this.

Thanks

Posted by: satish | Nov 9, 2008 12:45:14 AM

 

(a) - Anybody can remove anything at any time without throwing a CME. The self-iterator acts as expected, it's ok to call remove as part of the contract. Other threads can also do removes, and it's a data-race as to whether or not the iterating thread sees the removed item. But no CME's ever.

(b) - "how does it work & wait-free vs lock-free". See either my Google tech-talk or my JavaOne talks. Also links to the slides have been posted in various places. I've also given this talk at the NYC JUG and a number of other venues. It's not a suitable topic of this itsy-bitsy blog.

(c) - Adding Comparator isn't on my to-do list. Shoot me a email if you want to talk more about it.

(d) - Depending on your budget a low-end Azul box will easily handle that much allocation. High-end boxes handle 40G/sec of allocation with max pauses in the 10-20 msec range.

Cliff

Posted by: Cliff Click | Nov 9, 2008 12:54:01 PM

 

Hi Cliff,

Thanks. I will try to understand wait-lock
datastructures.

Kindly let me know your mail-id.
We will try to implement "comparators" in non-blocking
structures.

Regards

Posted by: satish | Nov 10, 2008 3:15:26 AM

 

email me at:
cliffc@azulsystems.org

Posted by: Cliff Click | Nov 10, 2008 8:00:19 AM

 

Hi Cliff,

Sent a mail. We need help to place comparator part in NonBlockingSet. Seen code quite complex (got lost). I think FSM is used.

Thanks

Posted by: satish | Nov 13, 2008 3:50:14 AM

 

Post a comment