Home > Exploit development > Leaking information using timing attacks on hash tables, part 2

Leaking information using timing attacks on hash tables, part 2

This is the second part of Leaking information using timing attacks on hash tables, discussing implementation details of the leak. Read the first part for a high level overview.

Implementation details

 

 

Consider a hash table using double hashing [1]. Checking if a key belongs to a hash table T can be expressed as:

def lookup(T, k):
    h1 = hash1(k)
    h2 = hash2(k)

    idx = h1
    while 1:
        if is_free(T[idx]):
            return False

        #structure holding (k,v)
        s = T[idx]  
        if s.k == k:
            return True

        idx += h2
        idx = idx % size(T)

Note this can’t loop if hash2() always returns something relatively prime to size(T) and T is never 100% full (FF keeps both of these conditions true).

Firefox is nice enough to store certain kind of pointers and user supplied integers in the same hash table. All of the magic happens in this function:

file: js\src\jsscope.cpp
function: Shape ** PropertyTable::search(jsid id, bool adding)

lookup(T, k) is an abbreviation of T.search(k, false).

In order to layout keys in T as described in part 1:

we set properties of a javascript object. JS code:

obj[x] = 1;

translates to add(T, x) and if x is an integer, we have full control on where exactly in T it’s going to be stored. The hash functions used by FF are (all operations modulo 2^{32}):

def h(x):
    return x*GOLDEN

def hash1(x, shift):
    return h(x)>>shift

def hash2(x, log2, shift):
    return (h(x)<<log2)>>shift

where GOLDEN is an odd constant (golden ratio), log2 = log(2, size(T)), shift = 32-log2.

In order to set T[i], we need to find x s.t. (x*GOLDEN)>>shift = i. Since GOLDEN is relatively prime to 2^{32}, it has a multiplicative inverse [2] GOLDEN^{-1}. Solving for x yields x = ((i<<shift)+foo)*GOLDEN^{-1}. Now the foo factor 😉 would be 0 if integers passed from JS were not mangled, but unfortunately they are — instead of x, the key being added to the table is 2x+1 (this is related to how FF tags [3] pointers). To accomodate for this, set foo=1 and let

x_0 = ((i<<shift)+foo)*GOLDEN^{-1}

x' = x_0 >> 1

Now obj[x’]=1; will result in setting T[] at position (2x'+1)*GOLDEN>>shift, but that equals

(2*(x_0>>1)+1)*GOLDEN>>shift
= x_0*GOLDEN>>shift
= ((i<<shift)+foo)>>shift
= i

The equality 2(x_0>>1)+1 = x_0 holds, because if foo=1, x_0 must be odd (since both ((i<<shift)+foo) and GOLDEN^{-1} are odd).

Using these formulas, we fill T[] (mixing python and js pseudocode):

obj = {}
for i in range(N):
    x' = calc_x(i)
    obj[x'] = 1 #this results in T[i] being set

How to use this crafted table to leak information?

Invoking obj.hasOwnProperty(str) is equivalent to lookup(T, ptr(str)), so if we make enough JS objects (strings in this POC) we will eventually find one that takes considerably longer to check than others. Let’s call this object Mstr. Recall that running time of lookup(T, k) is proportional to the length of k‘s chain, so longer checking means longer chain.

Using the technique described in part 1, we can learn elements belonging to Mstr‘s chain. Let’s express them as a_i = a_0-ri (a_0 = ptr(Mstr), r=h2(Mstr)).

Part 1 describes how to recover r with high probability, by computing the gcd of a specific set. We will prove that the “exponentially low probability for failure” claim is correct and discuss a case where the described algorithm can fail.

Let S=\{a_0-ir: 0 \leq i < n\} and let D(S)=\{|a_i-a_j|: a_i,a_j \in S\}. D(S) is a set of differences of K randomly chosen elements from S. Let k=|D| (note that k \leq K). What’s the probability that gcd(D(S))=r?

An equivalent question is to ask about probability of gcd(D(S'))=1 for S'=\{i: 0 \leq i < n\}. To see that, consider what happens if we divide all elements of D(S) by r, or multiply all elements of D(S') by r.

Nymann asked himself the same exact question in 1972 🙂 and proved [5], that the probability of k random integers being relatively prime is 1/\zeta(k), where \zeta(k) is the Riemann’s zeta function [6]. I think it’s pretty cool that zeta made its way to a blog post about hashtables in Firefox :D. Yet another grave reason to solve the related millenium problem [8] ;).

It’s easy to see that exponenets work in our favor. Since \zeta(k)=\sum_{n}^{\infty}n^{-k}, then lim_{k \to \infty}2^k (\zeta(k)-1)=1, so \zeta(k) \sim 1+1/2^k and 1/\zeta(k) \sim 1/(1+1/2^k). Probability that gcd(D)=r approaches 1 exponentially fast in terms of k.

The above reasoning works when the sequence of elements of Mstr‘s chain is monotonic, meaning that r was small enough to never “wrap” a_i around |T|. There’s a second case, which looks like this, when plotted on a graph:

Elements are scattered around both ends of the filled region. This happens when r is big enough to “jump over” the free region at the end of the hashtable.

The gcd in that case is most likely going to be equal to 1, but with r \ne 1. There’s a fast way to deal with this problem.

Let d_n = j_n - i_n, so that we can express differences of elements of Mstr‘s chain as:

rd_0 \equiv c_0 mod |T|
.
.
rd_k \equiv c_k mod |T|

Notice that the range of d_n is [-CL(Mstr), CL(Mstr)]. Low end with j=0,i=CL(Mstr) and high end with j=CL(Mstr),i=0.

Knowing that, we can iterate over all possible values of d_0, solve the equation rd_0 \equiv c_0 for r (using the linear congruence theorem [7]) and then compute:

d_1 = j_1-i_1 \equiv c_1*r^{-1} mod |T|
.
.
d_k = j_k-i_k \equiv c_k*r^{-1} mod |T|

If r we found was correct, then \forall d_i d_i \in [-CL(Mstr), CL(Mstr)]. Probability that this condition is true, but r is incorrect is (again) exponentially low.

Part 1 discusses how to find the chain’s starting point x_0=h1(Mstr), knowing a single element of the chain and its period r.

Having x_0 and r, we reconstruct ptr(Mstr) exactly:

ptr = ((x_0<<shift)+(r<<shift)>>log2)*GOLDEN^{-1}

 

Defining info leaks

 

As we can see, leaking information is not the same thing as reading memory. My proposition is to define “leaking” as any procedure leading to diminishing the number of possible states the attacked program might be in. Let’s say a program P has 2 bits of memory. Let’s assume we learned (by any means) that M[0] xor M[1]=1 (M is the memory). This condition allows only M[0]=0,M[1]=1 or M[0]=1,M[1]=0, so the number of possible states is 2, instead of 4. “Memory disclosure” definition fails to account for these “partial” leaks and side channel leaks, while the “entropy” one doesn’t.

References

1. http://en.wikipedia.org/wiki/Double_hashing
2. http://en.wikipedia.org/wiki/Modular_multiplicative_inverse
3. http://en.wikipedia.org/wiki/Tagged_pointer
4. http://en.wikipedia.org/wiki/Linear_regression
5. J. E. Nymann. “On the Probability that k Positive Integers Are Relatively Prime.” J. Number
Theory 4 (1972):469-73.
6. http://en.wikipedia.org/wiki/Riemann_zeta_function
7. http://en.wikipedia.org/wiki/Linear_congruence_theorem
8. http://en.wikipedia.org/wiki/Millennium_Prize_Problems#The_Riemann_hypothesis

  1. Dion
    14/08/2012 at 23:26

    This is really really cool. Leaking information about memory locations without *reading* memory values directly is mostly unexplored outside of cache timing side channel crypto literature (at least when I did my search it was). At Blackhat DC 2010, presented an info leak attack using the ordering of elements in a heterogeneous hashtable (mixed heap addresses and integers) exposed via a hashtable iterator. I also tried to get the academic community interested in this at WOOT 2010. I don’t think this is the last we’ll see of this kind of attack.

    Anyway, VERY cool! Well done.

  2. p_k
    17/08/2012 at 22:08

    I remember your trick — I actually tried to use it against FF, but foreach() loops in JS use “time of addition” for ordering, instead of ptr values 🙂

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: