You've probably written something like this a thousand times:
user_data = {}
user_data["name"] = "Alice"
Or this:
let mut map = HashMap::new();
map.insert("name", "Alice");
Hash tables. Dictionaries. Maps. Whatever your language calls them, they're everywhere. You reach for them without thinking. O(1) lookup, O(1) insertion—the promise is so reliable that you stop questioning it.
I want to talk about the moment that promise broke. But first, even if you already know how hash tables work, let me walk through the internals briefly—because the attack I'm about to describe exploits a detail that most of us gloss over.
What's Actually Happening Under the Hood
A hash table is, at its core, an array. When you insert a key-value pair, a hash function converts the key into an integer, and that integer (modulo the array size) becomes the index where the value is stored. Lookup works the same way: hash the key, jump to that index, done. That's where the O(1) comes from.
But what happens when two different keys hash to the same index? This is called a collision, and it's inevitable—you're mapping an infinite space of possible keys to a finite array. Every hash table implementation has to handle this somehow.
The two main approaches are chaining and open addressing. With chaining, each array slot holds a linked list (or sometimes a tree) of all key-value pairs that hashed to that index. When you look up a key, you hash it to find the slot, then scan the list until you find a match. With open addressing, when a collision occurs, you probe other slots in the array—linear probing checks the next slot, quadratic probing jumps by increasing amounts—until you find an empty one.
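To make chaining concrete, here's a toy sketch in Python. It's a deliberately naive illustration, not how CPython's dict or any production hash table actually works; real implementations also resize the array as it fills up:

class ChainedMap:
    """A minimal hash table using separate chaining. Illustration only."""

    def __init__(self, capacity=8):
        # Each slot holds a list of (key, value) pairs that hashed to that index.
        self.buckets = [[] for _ in range(capacity)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):   # scan the chain for an existing key
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):        # cost is the chain length, usually ~1
            if k == key:
                return v
        raise KeyError(key)

The whole O(1) story lives in those two loops: as long as the chains stay short, insert and get barely do any work.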
Most of the time, collisions are rare and evenly distributed. Your linked lists stay short, or your probes find empty slots quickly. The O(1) holds.
But here's the thing I never thought about until I read about what happened in December 2011: what if the collisions aren't random? What if someone constructs keys specifically designed to collide?
December 28, 2011
The Chaos Communication Congress in Berlin—28C3. Two German researchers, Alexander Klink from security firm n.runs and Julian Wälde from TU Darmstadt, took the stage to demonstrate something that would send shockwaves through the software industry.
They showed how a single laptop, with bandwidth measured in kilobits per second, could bring down a production web server. Not through some exotic zero-day. Through something far simpler: sending a carefully crafted HTTP POST request to any web application that parsed form data.
The attack worked on PHP. It worked on Java. It worked on Python, Ruby, ASP.NET, and the V8 JavaScript engine. Within days, CVEs were assigned across the entire ecosystem. Emergency patches were rushed out.
What made this possible? The hash tables that parse your form data.
How the Attack Works
PHP and several other platforms at the time hashed string keys with DJBX33A, a function created by Daniel J. Bernstein, or a close variant of it. In Python terms:
def djbx33a(s):
    h = 5381
    for c in s:
        h = h * 33 + ord(c)   # real implementations wrap at the word size; the collisions below survive that
    return h
Simple and fast. Also, it turns out, fatally flawed for adversarial inputs.
DJBX33A has a property the researchers called "equivalent substrings." If you know the function's structure, you can construct pairs of strings that hash to the same value. In PHP's implementation, "Ez" and "FY" produce the same hash. So do "EzEz" and "FYFY". And "EzFY" and "FYEz". From a single colliding pair, you can build 2^n colliding strings of length 2n just by concatenating the two blocks in every possible order.
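You can check this yourself. Here's a short sketch that reuses the djbx33a function above; itertools.product enumerates every way to concatenate n blocks chosen from the colliding pair:

from itertools import product

def colliding_keys(n, blocks=("Ez", "FY")):
    """All 2**n strings built by concatenating n colliding two-character blocks."""
    return ["".join(combo) for combo in product(blocks, repeat=n)]

keys = colliding_keys(3)                        # 8 keys: 'EzEzEz', 'EzEzFY', ..., 'FYFYFY'
assert len({djbx33a(k) for k in keys}) == 1     # all eight hash to the same value

With n = 16 that's 65,536 colliding keys of 32 characters each, about 2 MB of text.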
Klink and Wälde built a 2MB POST request containing hundreds of thousands of form fields—all with keys designed to collide. On a Tomcat server, parsing this request consumed 44 minutes of CPU time on an Intel i7 core.
Forty-four minutes. From a single HTTP request. With 6 kilobits per second of bandwidth.
The attack was called HashDoS. It worked because web frameworks did exactly what they were supposed to do: parse form data into a hash table. The code was correct. The algorithm was correct. The vulnerability was in an assumption nobody had questioned: that keys would be random, not adversarial.
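To see why this melts a CPU, count the work a chained table does when every key lands in the same bucket. This rough sketch builds on the two snippets above; the capacity and key counts are arbitrary, the shape of the numbers is the point:

def comparisons_to_insert(keys, hash_fn, capacity=1024):
    """Count key comparisons made while inserting into a chained table."""
    buckets = [[] for _ in range(capacity)]
    comparisons = 0
    for key in keys:
        bucket = buckets[hash_fn(key) % capacity]
        comparisons += len(bucket)   # every insert scans its whole chain first
        bucket.append(key)
    return comparisons

ordinary = [str(i) for i in range(4096)]   # keys that spread across buckets
hostile = colliding_keys(12)               # 4096 keys that all collide under djbx33a

print(comparisons_to_insert(ordinary, djbx33a))   # comparatively tiny: chains stay short
print(comparisons_to_insert(hostile, djbx33a))    # 0 + 1 + ... + 4095 = 8,386,560

Quadruple the number of hostile keys and the comparison count grows sixteenfold. That quadratic curve is what the 44 minutes of CPU time was made of.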
What gets me is that Perl had fixed this in 2003. Eight years earlier. The Perl developers randomized their hash function, making collision attacks impractical. But the lesson didn't spread.
What Different Languages Decided to Do
In the aftermath, language communities made different choices about their standard library hash tables. Those choices say something about how we think about defaults.
Rust, when it stabilized in 2015, made SipHash its default hasher for HashMap. SipHash is a cryptographically strong keyed hash function—designed to resist collision attacks even from adversaries who can observe outputs. It's slower than simpler functions, but the Rust developers decided security by default was worth the cost.
There was real debate about this. SipHash-2-4 was the original choice; Rust later switched to SipHash-1-3 for better performance while keeping enough security margin for hash-table use. The Rust position: hash flooding is obscure but devastating when it hits, and developers will forget to think about it, so the default should protect them.
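If "keyed hash" sounds abstract, the idea is that the function mixes in a secret value chosen at process start, so an attacker can't precompute collisions offline. Here's a sketch of the concept in Python using hashlib's keyed BLAKE2; it isn't SipHash and it isn't how Rust implements HashMap, but the shape is the same:

import hashlib
import os

SECRET_KEY = os.urandom(16)   # picked at process start, never exposed to clients

def keyed_hash(s):
    digest = hashlib.blake2b(s.encode(), key=SECRET_KEY, digest_size=8).digest()
    return int.from_bytes(digest, "big")

# "Ez" and "FY" collide under DJBX33A, but their keyed hashes depend on a key
# the attacker never sees, so precomputed collisions buy them nothing.
print(keyed_hash("Ez") == keyed_hash("FY"))   # almost certainly False

That, plus staying fast enough to run on every insert, is essentially what SipHash was designed for.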
Zig made the opposite choice. Its standard library HashMap uses Wyhash—a fast, non-cryptographic hash function with no built-in resistance to collision attacks. The reasoning: most applications don't need DoS protection, and those that do should explicitly choose a secure hasher.
I keep going back and forth on which approach is right. The Rust position feels paternalistic but also... correct? Most developers won't think about adversarial inputs until it's too late. The Zig position respects developer autonomy but relies on everyone understanding their threat model—and most people don't.
What I find interesting is that this isn't really a technical question. It's a question about human behavior, about what defaults encourage, about whether we trust developers to know what they need.
A Pattern That Keeps Repeating
HashDoS isn't an isolated incident. It's part of a broader category: algorithmic complexity attacks.
Regular expressions have the same problem. A carefully crafted input can cause certain regex patterns to take exponential time—an attack called ReDoS. The "billion laughs" attack against XML parsers uses nested entity definitions to consume exponential memory.
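The classic demonstration in Python is a pattern with nested quantifiers fed an input that almost matches. Be careful raising n; the running time roughly doubles with each extra character:

import re

# (a+)+ can split a run of 'a's between the inner and outer repetition in
# exponentially many ways, and on a non-matching input a backtracking engine
# ends up trying all of them.
evil = re.compile(r"(a+)+$")

n = 28
evil.match("a" * n + "b")   # hangs for a long time; n + 1 would take roughly twice as long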
The pattern is always the same: an algorithm that behaves well on typical inputs behaves catastrophically on adversarial inputs. And because we test with typical inputs, we don't discover the problem until someone exploits it.
What makes these attacks hard to diagnose is that they look like correct behavior. The server isn't crashing. It isn't throwing errors. It's just slow, dutifully doing what it was asked. From the outside, it might look like a traffic spike, or a resource leak, or a hardware problem. You might spend hours debugging before realizing that the slowness is the attack.
What I Think About Now
I still use hash tables constantly. I still reach for them without much thought, because most of the time, they're exactly right.
But when I'm writing code that processes external input, there's a question in the back of my mind now: what if the input is hostile? What if someone constructs keys designed to collide?
I don't always have a good answer. But asking the question feels like the right place to start.