Suggestion for a large hash table (2^25 elements)

Go To StackoverFlow.com

2

I want to write a birthday attack program in Haskell for a variant of SHA1 which only produces only a 50 bit hash. To do this I need a hash table capable of storing approx. 2^25 entries.

The keys in this map will be Int64 and the values will be short length strings (~ 16 bytes).

Any suggestions for which hash implementation to use?

(Disregard that last update - I would need a bit array of 2^50 elements.)

2012-04-03 21:49
by ErikR
Well, SQL would be the simplest. But their are lots of variations of in-storage and "sorta in-storage" hash tables. A lot would depend on your operating environment and some of the characteristics of the data (such as how evenly it's distributed) - Hot Licks 2012-04-03 21:54


6

For 2^25 entries at 8 bytes a piece, you are looking at something like 768MB of storage for just the data, at most probably around 3 gigabytes with actual overhead for storing bytestrings -- guesstimating 80 bytes per bytestring, then you have the hashtable/map's internals to store, and the boxing for the key, etc.

This means you can store the entire thing resident in memory on a decent machine, which keeps the problem relatively sane, but your collection times will kind of suck.

I would recommend using a lot of smaller hash tables, by partitioning your keyspace, that way you can run lots of the updates in parallel regardless of the hash table you use.

As for implementation:

You can either wrap a bunch of immutable hash tables like the wide-fanout ones from unordered-containers in IORefs and use some kind of atomicModifyIORef or something like ryan newton's compare and swap primitive, or you can try to use the old Data.HashTable implementation in a straightforward imperative manner.

The latter will improve your asymptotics by a logarithmic factor over the hash-array mapped tries used by unordered-containers, but Data.HashTable has bad constants. At the scale of your problem these factors probably cancel out though.

2012-04-04 01:46
by Edward KMETT
Also see Gregory Collins' hashtables package (http://hackage.haskell.org/package/hashtables) for mutable hashtables with better constant factors than Data.HashTable - reinerp 2012-04-04 02:22


2

I also posted same sort of question. And from some suggestion, I am using Kyoto Cabinet. It is pretty cool and gives nice performance also. You can check my posts also because I have similar issues. EX. one, two and three. Perhaps this may helpful.

2012-04-04 01:06
by Arpssss