### PlaidCTF 2015: radhos Writeup

Along with several coworkers, I participated a bit in the PlaidCTF 2015 competition this past weekend. The competition is geared mainly toward binary exploitation, with reverse engineering a secondary priority. There are also a number of challenges which focus on cryptography, forensics, and other security-related issues. Unfortunately, my skills lie more toward the latter category than the former, so I wasn’t as broadly useful as I might have otherwise liked to be. Nonetheless, I was able to contribute solutions to three problems: “RE GEX” (a large regular expression, where it was necessary to find a string that *didn’t* match the regular expression), “PNG Uncorrupt” (a PNG file with several missing “\x0d” bytes before “\x0a” bytes), and “radhos” (a Python hash collision challenge). I’ve seen good writeups online for RE GEX and PNG Uncorrupt, but haven’t yet seen any good ones for radhos, so I’m contributing my solution here.

You can download the radhos challenge here (or here if the first link has disappeared). If running it locally, replace line 24 with “` return '127.0.0.1'`

” to keep the server from having errors.

radhos is a simple HTTP server that emulates a small hashtable-based key-value store. Each value is stored using the hash of its key, where the hash is the lower 32 bits of Python’s `hash()`

function. (This appears to be intended to give consistent behavior between 32-bit and 64-bit versions of Python.) The level’s flag is stored in the hashtable’s location for the key “you_want_it_LOLOLOL?”, but the server will reject requests that ask for that key by name. Therefore, the remaining solution is to find something that will generate a collision in `hash(key)`

. To encourage players to solve the challenge without simply guessing random keys, the server restarts (and gets a random `hash()`

seed) every seven minutes.

## Hash Collisions

Using HTTP requests, you can get the value for a key you select, or alternatively, you can set the value for a key. Setting the value for a key will also return the old value in that location in the hashtable, if any, as well as the value of `hash(key)`

. It’s this information leak that allows us to break Python’s randomized hash function, generate a new key that has the same hash value as “you_want_it_LOLOLOL?”, and retrieve the flag.

I initially thought that this would be an instance of the bug described at https://bugs.python.org/issue14621, but that doesn’t appear to be the case. I spent far too long looking at that issue, and trying to replicate similar situations that might allow a collision in this case. However, the issue does have an attachment, hash.py, that gave Python code for the hash algorithm, which was somewhat useful for analysis. The (pre-3.4) Python `hash()`

function used two secret values to obscure the hash value: a “prefix” that affects the whole hash, and a “suffix” that is simply XOR’d in afterward to obscure the result and prevent attackers simply guessing the prefix. If an attacker is able to determine the “prefix”, they can generate hash collisions on their own. (Having the suffix is unnecessary if collisions are the only goal, as it is simply XOR’d on at the end.)

Here’s a bit of code for generating the hash:

HASHMASK = 0xffffffff def hashbytes32(data): if len(data) == 0: return 0 x = prefix x ^= ord(data[0]) << 7 for byte in data: x = (1000003 * x) ^ ord(byte) x &= HASHMASK x ^= len(data) if x == -1: x = -2 return x

## Finding the `prefix`

Luckily, there was a talk on hash collisions (Hash-flooding DoS reloaded: attacks and defenses) that referenced a tool that could be used to determine the hash secret in Python. I was able to find the tool at https://131002.net/siphash/poc.py, and verified that it worked.

Unfortunately, it worked based on hashes for “\x00” and “\x00\x00” (one and two null bytes), which we couldn’t submit to the radhos server (as far as I know). A small amount of math allows us to change the target used in the algorithm, however. Let’s say that we have `h1 = hash("A")`

and `h2 = hash("B")`

. Then we know:

h1 = ((prefix ^ (65 << 7)) * 1000003) ^ 65 ^ 1 ^ suffix h2 = ((prefix ^ (66 << 7)) * 1000003) ^ 66 ^ 1 ^ suffix

Let’s XOR the two, and note that the `1`

and `suffix`

terms cancel out, as they are on both sides. Then we have:

h1 ^ h2 = ((prefix ^ (65 << 7)) * 1000003) ^ 65 ^ ((prefix ^ (66 << 7)) * 1000003) ^ 66

The tool I mentioned above uses this fact, and guesses the value of `prefix`

one bit at a time, narrowing down possibilities as it goes. It then performs a verification step, testing those possibilities with a known hash of a third value. I was able to verify that I received the correct prefix value by testing locally, so I was satisfied with that part of the code.

As an aside, if the bit-by-bit guessing didn’t work to determine the `prefix`

, it would be possible to use the `h1 ^ h2`

math above and several additional hashes to brute-force the prefix, but that would likely take a while. Distributing the problem among multiple cores (or computers) might get a result before the server restarted, however, so it might have been worthwhile if the proof of concept code were not available. (Note that brute-forcing the `suffix`

is unnecessary in this case, as XOR’ing the hashes removes it from consideration.)

## Finding an actual collision

Simply having the `prefix`

value is insufficient, as we need to actually generate something that will collide. The obvious solution is a brute-force attack. This is easy to implement, and can be parallelized to get a result before the server restarts. Unfortunately, it’s also fairly slow in general, when another relatively simple solution is available.

Because the hash code is relatively simple, we can perform what is known as a “meet in the middle” attack. In this case, we start with the result we want (we can already calculate the hash for “you_want_it_LOLOLOL?”, because we have `prefix`

and `suffix`

mask = 0xffffffff

def hashback(instr, target):

val = target

for byte in instr[::-1]:

val = ((val ^ ord(byte)) * 2021759595) & mask

return val

This works because 1000003 * 2021759595 = 1 mod 2^{32}, so multiplying by 2021759595 is equivalent to dividing by 1000003. (Note that in this function, for simplicity, we have already removed the `suffix`

and the size of the string to be hashed.)

So we’ve worked backward and determined what the hash state should be for a given suffix. How does this help? Well, we can do the same thing for many suffixes, and have many possible targets for our brute-forcing. The most efficient way to do this is to try 2^{16} suffixes. This turns the brute-forcing from an expected 2^{32} operations with 1 unit of storage into 2*2^{16} operations with 2^{16} storage. RAM is relatively plentiful for this task, so it’s a good trade-off. I’ll let the code below speak for itself, but feel free to leave a comment if you have a question.

## Putting it all together

The code I ended up using is below, after some minor cleaning up:

(more…)