ROPC — Turing complete ROP compiler (part 3, implementation)
This is the third (and last) post in a series (first post here, second here) about ROPC, describing implementation of its features like tables, conditional jumps, recursive calls, etc. Please familiarize yourself with the two first posts, otherwise this one might be hard to follow.
After our ROP program finishes executing the main function, execution transfers to first address encountered on the stack. This is problematic for few reasons:
- if implementation of function F is present on the stack after the implementation of main, then the ROP program will start executing F, which contradicts our assumption that the execution starts and ends in main,
- if the address after implementation of main isn’t mapped in the process’ address space, then our target program terminates with an exception. On the other hand, if the address is mapped, worst case scenario is that we will start executing format_hard_drive() function 🙂 — we can’t predict what is going to happen.
Additionally, for ROP functions to be able to use the emulated stack, variables holding the stack top and the stack frame have to be initialized before entering main. The solution is to add an initialization procedure as a preamble to every compiled ROP program. This prologue will:
- initialize variables storing stack top and stack frame,
- copy contents of tables used in the ROP program to the target application’s data section,
- call the main function with the return address set to the very end of the ROP program.
Additionally, we add a special constant CRASH_ADDR at the end of the program. CRASH_ADDR is picked so that we are sure it’s an address that’s never mapped in the address space. This way the order of ROP functions on the stack doesn’t matter and execution terminates with an access violation while the processor tries to execute code stored under CRASH_ADDR — execution stops in a controlled manner.
All table assignments tab=[1,2,…] are replaced during AST rewriting with constant assignments tab=ADDR, where ADDR is the address where the wrapper prologue stored the table’s contents. At first glance this might seem like a waste of space, since we need a procedure in the preamble to store the table (one dword at a time) to a fixed (chosen during compilation) address in the process’ data section. The non-wasteful alternative, storing the table explicitly in the ROP payload requires a gadget that copies the ESP register to another one. It’s necessary because the ROP program doesn’t know its memory location, so in order to reference a table in its body, it needs to know the address of itself. The space wasting variant was chosen for its simplicity :).
All pseudoinstructions (see part 1) related to execution transfers make use of a symbolic constant FromTo(A,B), which is replaced by the byte distance between labels A and B in the compiled payload. Since the distance is signed (FromTo(A,B)=-FromTo(B,A)) it follows that FromTo(A,B)+FromTo(B,C)=FromTo(A,C).
The simplest way to implement conditional jumps is the use the lahf instruction. lahf copies some flags from the EFLAGS register to the older byte of AX: AH := EFLAGS(SF:ZF:0:AF:0:PF:1:CF). Below is the pseudocode of how ROPC implements a pair of instructions: cmp e1, e2; jXX jmp_lbl.
[compute e1 and store the result in reg1]
[compute e2 and store the result in reg2]
Sub(reg3, reg1, reg2)
SaveFlags ;just lahf
[set EAX=1 if and only if the jump's condition is met]
SetSymbolic(reg4, FromTo('@@', jmp_lbl))
Mul(EAX, EAX, reg4)
Pseudoinstruction (PI) Sub is used only to update EFLAGS. We can use subtraction instead of comparison on x86 (cmp instruction), because cmp does the subtraction too, but doesn’t save the result. The key is to ensure that EAX==1 if and only if the jump’s condition is met — this way multiplication works as a conditional assignment. It’s easy to express boolean conditions when EAX holds flags from EFLAGS.
For recursion to work we need a stack. Since the normal stack (pointed to by ESP) is already used, we need to emulate it using the data section of the targeted process.
We store two variables at the beginning of the data section: current top of the emulated stack and the current stack frame. The prologue of the ROP payload initializes them with the end of the data section (stack growns downwards). During execution of the payload these two variables are modified just like for the native x86 stack.
Below is the implementation of PushReg(reg) pseudoinstruction. ReadMemConst and WriteMemConst take memory address as a constant.
Sub(reg3, reg1, reg2)
Above pseudocode emulates the native push instruction:
- read the stack’s top from STACK_PTR address and store it to reg0,
- store reg’s value to address pointed by reg0,
- read the stack top again (reg0 could be overwritten),
- set reg2=4,
- compute reg3=reg1-reg2=[STACK_PTR]-4,
- update the top of the stack: [STACK_PTR]=[STACK_PTR]-4.
Pseudoinstructions Pop, Leave, Enter, Ret and Call are implemented in similar fashion.
Call & Ret
call foo(a1, ..., an)
is expressed in pseudoinstructions:
[compute an and store the result in reg]
[compute a1 and store the result in reg]
Set(reg, FromTo(foo, after))
Set(reg, FromTo(after, foo))
Computed parameters (a1,…,an) are stored on the emulated stack. After them, we store the signed distance between the labels foo and after. Then we jump to label foo — we offset ESP by the distance between after and foo.
After foo does its thing it’s time to return to label after (just after OpEsp). Ret pops the “return address” pushed by the call and adds to it the distance from itself to label foo, getting distance from itself to after as a result. Indeed: FromTo(ret,foo)+FromTo(foo,after)=FromTo(ret,after). Knowing this distance, Ret can jump back to label after and exit the current function :).
Invoking OS functions
ROPC uses the application’s import table for calling OS functions. This solution is simple to implement, but limits the power of expression if the import section contains small number of functions.
In order to call an import, we need to:
- save its parameters on the native stack,
- save the return address on the native stack,
- make sure the import’s local variables don’t overwrite our ROP payload.
The easiest way to fulfill all of the above conditions is to invoke the OS function from application’s data section, instead of the native stack. We copy the import calling stub to the data section and then jump to it. This way we are able to provide the return address (since we know the address of the stub in .data) and don’t need to worry about the import’s local variables.
ROPC is impractical because of the size of code it generates. Attacker’s goal is to take control of the victim’s machine, not to compute Fibonacci numbers ;). Conditional jumps and recursion are not necessary to spawn a shell.
Useful continuation of ROPC would be a tool that can find a sequence of gadgets (as short as possible) able to run a piece of native code given as a parameter. ATM tools that do that (like <em>mona.py</em>) use heuristics and don’t guarantee the semantics of sequences they find.