
## Leaking information using timing attacks on hash tables, part 2

This is the second part of Leaking information using timing attacks on hash tables, discussing implementation details of the leak. Read the first part for a high-level overview.

# Implementation details

Consider a hash table using double hashing [1]. Checking if a key belongs to a hash table T can be expressed as:

def lookup(T, k):
    h1 = hash1(k)
    h2 = hash2(k)

    idx = h1
    while True:
        if is_free(T[idx]):
            return False

        # structure holding (k, v)
        s = T[idx]
        if s.k == k:
            return True

        idx += h2
        idx = idx % size(T)

Note this can’t loop forever if $hash2()$ always returns a value relatively prime to $size(T)$ and $T$ is never 100% full (FF keeps both of these conditions true).
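The probing loop above can be exercised with a small self-contained sketch. The concrete hash functions and the `| 1` trick that keeps the step odd are illustrative assumptions for this demo, not FF's exact code:

```python
GOLDEN = 0x9E3779B9          # golden-ratio multiplier, odd
MASK32 = 0xFFFFFFFF

def hash1(k, log2):
    return ((k * GOLDEN) & MASK32) >> (32 - log2)

def hash2(k, log2):
    # forcing the step odd keeps it relatively prime to the power-of-two
    # table size, so the probe sequence visits every slot and can't cycle
    return (((((k * GOLDEN) & MASK32) << log2) & MASK32) >> (32 - log2)) | 1

def insert(T, k, v, log2):
    idx = hash1(k, log2)
    step = hash2(k, log2)
    while T[idx] is not None and T[idx][0] != k:
        idx = (idx + step) % len(T)
    T[idx] = (k, v)

def lookup(T, k, log2):
    # returns (found, number of probes): the probe count is the
    # timing signal the whole attack is built on
    idx = hash1(k, log2)
    step = hash2(k, log2)
    probes = 1
    while T[idx] is not None:
        if T[idx][0] == k:
            return True, probes
        idx = (idx + step) % len(T)
        probes += 1
    return False, probes

log2 = 4
T = [None] * (1 << log2)
for k in range(1, 9):        # fill the table half full
    insert(T, k, k, log2)
```

Keys on long chains pay for every extra probe, which is exactly what the timing side channel measures.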

Firefox is nice enough to store certain kinds of pointers and user-supplied integers in the same hash table. All of the magic happens in this function:

file: js\src\jsscope.cpp
function: Shape ** PropertyTable::search(jsid id, bool adding)

$lookup(T, k)$ is an abbreviation of T.search(k, false).

In order to lay out keys in $T$ as described in part 1, we set properties of a JavaScript object. The JS code:

obj[x] = 1;

translates to $add(T, x)$, and if $x$ is an integer, we have full control over where exactly in $T$ it’s going to be stored. The hash functions used by FF are (all operations modulo $2^{32}$):

def h(x):
    return x * GOLDEN

def hash1(x, shift):
    return h(x) >> shift

def hash2(x, log2, shift):
    return (h(x) << log2) >> shift

where $GOLDEN$ is an odd constant (golden ratio), $log2 = \log_2(size(T))$ and $shift = 32-log2$.
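In Python, with the modulo-$2^{32}$ arithmetic made explicit (taking $GOLDEN$ to be the usual golden-ratio multiplier 0x9E3779B9; the exact constant is an assumption here):

```python
GOLDEN = 0x9E3779B9   # assumed golden-ratio constant, odd
MASK32 = (1 << 32) - 1

def h(x):
    return (x * GOLDEN) & MASK32        # all operations modulo 2^32

def hash1(x, shift):
    return h(x) >> shift                # the top log2 bits of h(x)

def hash2(x, log2, shift):
    # drop the top log2 bits, then take the next log2 bits of h(x)
    return ((h(x) << log2) & MASK32) >> shift
```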

In order to set $T[i]$, we need to find $x$ s.t. $(x*GOLDEN)>>shift = i$. Since $GOLDEN$ is relatively prime to $2^{32}$, it has a multiplicative inverse [2] $GOLDEN^{-1}$. Solving for $x$ yields $x = ((i<<shift)+foo)*GOLDEN^{-1}$, where $foo$ is any value with $0 \leq foo < 2^{shift}$. Now the $foo$ factor 😉 would be $0$ if integers passed from JS were not mangled, but unfortunately they are: instead of $x$, the key being added to the table is $2x+1$ (this is related to how FF tags [3] pointers). To accommodate for this, set $foo=1$ and let

$x_0 = ((i<<shift)+1)*GOLDEN^{-1}$

$x' = x_0 >> 1$

Now obj[x’]=1; will result in setting $T[]$ at position $(2x'+1)*GOLDEN>>shift$, but that equals

$(2*(x_0>>1)+1)*GOLDEN>>shift$
$= x_0*GOLDEN>>shift$
$= ((i<<shift)+1)>>shift$
$= i$

The equality $2(x_0>>1)+1 = x_0$ holds, because if $foo=1$, $x_0$ must be odd (since both $(i<<shift)+1$ and $GOLDEN^{-1}$ are odd).

Using these formulas, we fill $T[]$ (mixing Python and JS pseudocode):

obj = {}
for i in range(N):
    x' = calc_x(i)
    obj[x'] = 1  # this results in T[i] being set

How to use this crafted table to leak information?

Invoking obj.hasOwnProperty(str) is equivalent to $lookup(T, ptr(str))$, so if we make enough JS objects (strings in this POC), we will eventually find one that takes considerably longer to check than the others. Let’s call this object $Mstr$. Recall that the running time of $lookup(T, k)$ is proportional to the length of $k$‘s chain, so a longer check means a longer chain.

Using the technique described in part 1, we can learn elements belonging to $Mstr$‘s chain. Let’s express them as $a_i = a_0-ri$ ($a_0 = ptr(Mstr)$, $r=h2(Mstr)$).

Part 1 describes how to recover $r$ with high probability, by computing the $gcd$ of a specific set. We will prove that the “exponentially low probability for failure” claim is correct and discuss a case where the described algorithm can fail.
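A synthetic run of the gcd recovery in the monotonic case, where observed differences are exact multiples of $r$ (all concrete numbers below are made up for illustration):

```python
from math import gcd
from functools import reduce

T_SIZE = 1 << 20
r = 977                                # the unknown step we want to recover
a0 = 900_000                           # chain start; r is small, so no wrap-around
# chain indices of the elements we happened to observe via timing
seen = [0, 3, 7, 12, 20]
elems = [a0 - r * i for i in seen]     # monotonic: no reduction mod T_SIZE needed

diffs = [abs(x - y) for x in elems for y in elems if x != y]
recovered = reduce(gcd, diffs)         # gcd of all pairwise differences
```

Here the index differences include 3 and 4, which are coprime, so the gcd collapses to exactly $r$.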

Let $S=\{a_0-ir: 0 \leq i < n\}$ and let $D(S)$ be the set of differences $|a_i-a_j|$ of $K$ elements chosen at random from $S$. Let $k=|D(S)|$ (note that $k \leq K$). What’s the probability that $gcd(D(S))=r$?

An equivalent question is to ask about probability of $gcd(D(S'))=1$ for $S'=\{i: 0 \leq i < n\}$. To see that, consider what happens if we divide all elements of $D(S)$ by $r$, or multiply all elements of $D(S')$ by $r$.

Nymann asked himself the exact same question in 1972 🙂 and proved [5] that the probability of $k$ random integers being relatively prime is $1/\zeta(k)$, where $\zeta$ is the Riemann zeta function [6]. I think it’s pretty cool that zeta made its way into a blog post about hashtables in Firefox :D. Yet another grave reason to solve the related millennium problem [8] ;).

It’s easy to see that exponents work in our favor. Since $\zeta(k)=\sum_{n=1}^{\infty}n^{-k}$, we have $\lim_{k \to \infty}2^k(\zeta(k)-1)=1$, so $\zeta(k) \sim 1+1/2^k$ and $1/\zeta(k) \sim 1/(1+1/2^k)$. The probability that $gcd(D)=r$ approaches $1$ exponentially fast in terms of $k$.
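A quick Monte Carlo sanity check of Nymann's result for $k=2$, where $1/\zeta(2)=6/\pi^2 \approx 0.6079$ (sample size and seed are arbitrary):

```python
import random
from math import gcd, pi

random.seed(1)
N = 20_000
# fraction of random integer pairs that are relatively prime
hits = sum(
    1 for _ in range(N)
    if gcd(random.randrange(1, 10**6), random.randrange(1, 10**6)) == 1
)
est = hits / N        # should be close to 6/pi^2
```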

The above reasoning works when the sequence of elements of $Mstr$‘s chain is monotonic, meaning that $r$ was small enough to never “wrap” $a_i$ around $|T|$. There’s a second case, which looks like this, when plotted on a graph:

Elements are scattered around both ends of the filled region. This happens when $r$ is big enough to “jump over” the free region at the end of the hashtable.

The $gcd$ in that case is most likely going to be equal to 1, but with $r \ne 1$. There’s a fast way to deal with this problem.

Let $d_n = j_n - i_n$, where $i_n$ and $j_n$ are the chain indices of the $n$-th observed pair of elements, so that we can express the differences of elements of $Mstr$‘s chain as:

$rd_0 \equiv c_0$ mod $|T|$
$\vdots$
$rd_k \equiv c_k$ mod $|T|$

Notice that the range of $d_n$ is $[-CL(Mstr), CL(Mstr)]$, where $CL$ denotes the chain length: the low end is attained with $j=0$, $i=CL(Mstr)$ and the high end with $j=CL(Mstr)$, $i=0$.

Knowing that, we can iterate over all possible values of $d_0$, solve the equation $rd_0 \equiv c_0$ for $r$ (using the linear congruence theorem [7]) and then compute:

$d_1 = j_1-i_1 \equiv c_1*r^{-1}$ mod $|T|$
$\vdots$
$d_k = j_k-i_k \equiv c_k*r^{-1}$ mod $|T|$

If the $r$ we found was correct, then $d_i \in [-CL(Mstr), CL(Mstr)]$ for all $i$. The probability that this condition holds but $r$ is incorrect is (again) exponentially low.
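A sketch of the wrap-around recovery, with the congruence solved via the linear congruence theorem [7]; the table size, the true step and the observed differences are synthetic:

```python
from math import gcd

def recover_r(cs, m, CL):
    # cs: observed differences c_n = r*d_n (mod m), with unknown |d_n| <= CL
    # returns the set of odd step candidates r consistent with every c_n
    found = set()
    c0 = cs[0]
    for d0 in range(-CL, CL + 1):
        if d0 == 0:
            continue
        d0m = d0 % m
        g = gcd(d0m, m)
        if c0 % g:
            continue                     # r*d0 = c0 (mod m) has no solution
        # linear congruence theorem: g solutions, spaced m//g apart
        base = (c0 // g) * pow(d0m // g, -1, m // g) % (m // g)
        for t in range(g):
            r = base + t * (m // g)
            if gcd(r, m) != 1:
                continue                 # double hashing needs r invertible mod m
            rinv = pow(r, -1, m)
            ok = True
            for c in cs:
                d = c * rinv % m
                if d > m // 2:
                    d -= m               # map back to a signed index difference
                if abs(d) > CL:
                    ok = False
                    break
            if ok:
                found.add(r)
    return found

m, CL = 1 << 12, 6
r_true = 1459                            # odd, large enough to wrap the table
cs = [r_true * d % m for d in (1, -3, 5, -2, 4)]
```

The true $r$ always survives the filter; spurious candidates are ruled out by the exponentially unlikely condition that every recovered $d_i$ stays inside $[-CL, CL]$.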

Part 1 discusses how to find the chain’s starting point $x_0=h1(Mstr)$, knowing a single element of the chain and its period $r$.

Having $x_0$ and $r$, we reconstruct ptr(Mstr) exactly:

$ptr = ((x_0<<shift) + ((r<<shift)>>log2))*GOLDEN^{-1}$
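A numeric check of the reconstruction. The constant, the pointer value and a table of $2^{16}$ slots are illustrative assumptions; with $log2=16$ the two hash values together cover all 32 bits of $h(ptr)$, so the recovery comes out exact:

```python
M = 1 << 32
GOLDEN = 0x9E3779B9                      # assumed golden-ratio constant
GOLDEN_INV = pow(GOLDEN, -1, M)
log2 = 16
shift = 32 - log2

ptr = 0x08A4B120                         # made-up pointer value
h = ptr * GOLDEN % M
x0 = h >> shift                          # hash1(ptr): top bits of h
r = (h << log2) % M >> shift             # hash2(ptr): next bits of h

# reassemble h from the two hash values, then undo the multiplication
recovered = ((x0 << shift) + ((r << shift) >> log2)) * GOLDEN_INV % M
```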

# Defining info leaks

As we can see, leaking information is not the same thing as reading memory. My proposal is to define “leaking” as any procedure that reduces the number of possible states the attacked program might be in. Say a program $P$ has 2 bits of memory $M$, and assume we learned (by any means) that $M[0]$ xor $M[1]=1$. This condition allows only $M[0]=0$, $M[1]=1$ or $M[0]=1$, $M[1]=0$, so the number of possible states is 2 instead of 4. The “memory disclosure” definition fails to account for these “partial” leaks and side-channel leaks, while the “entropy” one doesn’t.
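The two-bit example, enumerated directly:

```python
from itertools import product

# all states of 2 bits of memory, filtered by the leaked condition M[0] xor M[1] = 1
states = [m for m in product((0, 1), repeat=2) if m[0] ^ m[1] == 1]
```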

References

1. http://en.wikipedia.org/wiki/Double_hashing
2. http://en.wikipedia.org/wiki/Modular_multiplicative_inverse
3. http://en.wikipedia.org/wiki/Tagged_pointer
4. http://en.wikipedia.org/wiki/Linear_regression
5. J. E. Nymann. “On the Probability that k Positive Integers Are Relatively Prime.” J. Number Theory 4 (1972): 469–73.
6. http://en.wikipedia.org/wiki/Riemann_zeta_function
7. http://en.wikipedia.org/wiki/Linear_congruence_theorem
8. http://en.wikipedia.org/wiki/Millennium_Prize_Problems#The_Riemann_hypothesis

1. 14/08/2012 at 23:26

This is really, really cool. Leaking information about memory locations without *reading* memory values directly is mostly unexplored outside of cache timing side channel crypto literature (at least it was when I did my search). At Blackhat DC 2010, I presented an info leak attack using the ordering of elements in a heterogeneous hashtable (mixed heap addresses and integers) exposed via a hashtable iterator. I also tried to get the academic community interested in this at WOOT 2010. I don’t think this is the last we’ll see of this kind of attack.

Anyway, VERY cool! Well done.

2. 17/08/2012 at 22:08

I remember your trick — I actually tried to use it against FF, but foreach() loops in JS use “time of addition” for ordering, instead of ptr values 🙂