Home > Exploit development, Reverse Engineering > ROPC — Turing complete ROP compiler (part 1)

## ROPC — Turing complete ROP compiler (part 1)

This is a long overdue post describing ROPC (Return Oriented Programming Compiler, available here: https://github.com/pakt/ropc) with its own “higher level” language and features like conditional jumps, loops, functions (even recursive ones), tables, etc.. ROPC was released in 2012. Since then, Christian Heitman made a fork [0] capable of compiling ROP programs expressed in C (!).

# Return Oriented Programming

Let’s consider a simple example to refresh our understanding of ROP. Below is a short snippet of assembly with few gadgets.

start_rop:
ret
set_eax:
pop eax
ret
set_ebx:
pop ebx
ret
write_mem:
mov [eax], ebx

We want to write value X to address ADDR, using only gadgets we see above. One of possible stack arrangements is:

 Address Value ESP+0 set_ebx ESP+4 X ESP+8 set_eax ESP+12 ADDR ESP+16 write_mem

Assuming that EIP=start_rop and stack memory is arranged as above, here’s what happens:

• First RET transfers execution to set_ebx
• POP EBX loads X and saves it to EBX
• RET transfers exec. to set_eax
• RET transfers exec. to write_mem
• MOV [EAX], EBX writes X under ADDR
• last RET transfers exec. to undefined address

This was a really simple example, so creating a ROP program was easy, even by hand. Real world exploits are much more complex and in case of large applications (like Firefox) with a large number of gadgets, it’s not clear how to implement the desired functionality “manually”, just by staring at IDA :).

## Previous work

Q [1] was the first tool which did not require any human interaction to compile a ROP program. Before Q, the most advanced was Roemer’s compiler [2], but it did require some level of interaction — gadget database had to be constructed manually.

ROPC is based on Q. Since Q’s code isn’t available, parts of its functionality were implemented from scratch. ROPC searches, classifies and verifies gadgets just like Q. The set of types of gadgets used to express more complex operations is almost the same as in Q. That’s the end of similarities. ROPL (Return Oriented Programming Language ;)) is higher level and more advanced than QooL. It can express conditional jumps and procedures, even recursive ones.

ROPC also includes some new ideas / solutions:

• compilation based on pseudoinstructions,
• solving technical problems with stack emulation for ROP programs,
• expressing recursive procedures.

## Used tools

ROPC uses two tools:

• BAP
• STP

BAP [3] (Binary Analysis Platform) is a framework (written in ocaml) for binary code analysis. BAP first disassembles given code and then converts it to BIL (BAP’s intermediate language). BIL is easier to manipulate than assembly, because:

• it has less instructions,
• side effects (like EFLAGS mutations)  are expressed explicitly.

ROPC is using a small subset of BAP’s capabilities:

• lifting from x86 to BIL,
• BIL emulation,
• converting BIL to format digestable by STP.

If you want to do any kind of binary analysis there is a very good chance that BAP already does what you plan to implement. Seriously, use BAP :P.

STP [4] is a SMT solver. For our application, it’s just a black box that answer questions about code semantics. BAP provides an API for querying STP (and other solvers) about BIL, so it’s easy to formulate questions about code and get answers from STP. Questions asked by ROPC regard hypotheses about code’s functionality. For example, we might be interested, is it always true that after executing a particular piece of code, the final value in EBX is equal to the starting value of EAX. If STP answers YES, then we can be sure that the analyzed snippet is “semantically equivalent” to  MOV EBX, EAX.

# The compiler

Programs generated by ROPC consist of sequence of gadgets. It’s useful to think of them as instructions of very simple assembler.

Table above describes all gadget types recognized by ROPC. $X \leftarrow Y$ denotes assignment of Y to X. $\diamond_{b}$ is a binary arithmetic or logical operation. [X] denotes 32-bit value in memory cell under address X. Lahf copies processor flags to AH register.

For a given target app (we will call it “host”) we have to first construct a gadget database. This process is described in Q paper, so I won’t repeat it here.

Having the gadget db, compiler can start consuming the ROPL source. From a parsed program in AST (abstract syntax tree) form, there’s a long way to a binary (compiled) result. A simple instruction like: x=a+1 is complex to implement, if all we can operate with are gadgets:

• we need to load r1 register (abstract for now) with the value of local variable a
• set register r2 to 1
• calculate r3 = r1+r2
• write r3 to local variable x

Every read and write from/to a local variable also requires few gadget operations. Since implementations of ROPL instructions in gadgets are long, ROPC uses a higher level construct called pseudoinstructions, or PI. PI are “parametrized macros” allowing to express complex operations succinctly. PI can take multiple types of parameters:

• symbolic registers (abstract variable that can be set to any concrete register),
• concrete registers (EAX, EBX, etc),
• numerical constants.

Simplified set of pseudoinstructions (PI)

Table above describes a simplified set of PI, used to explain inner working of the compiler. Set used by ROPC is more complex, but the idea is the same. [[R]] means a concrete register assigned to symbolic register R. locals(Var) is the memory address of local variable Var. ReadLocal and WriteLocal implementations were omitted, since they are quite long.

Every ROPL instruction has its implementation in PI. For example, x=a+1 can be implemented as:

ReadLocal(r1, a)
Set(r2, 1)
WriteLocal(x, r3)

During compilation, ROPC first rewrites instructions from the source to PI sequences and then tries to unroll them into sequences of gadgets by assigning concrete registers to symbolic ones (r1, r2, r3 in the example above). If unrolling suceeds, the sequence of gadgets’ addresses is written to output file as the compiled result.

## Componenets

Compiler consists of three parts:

• candidate generator (CG),
• verifier (VER),
• the actual compiler (COMP).

CG and VER work exactly the same way as in Q, so I won’t repeat the details here. Together, they are able to construct a gadget database with verified semantics. Found gadgets are guaranteed to work exactly as advertised by their type and not crash :). VER additionally provides ROPC with a set of registers modified during gadget execution.

Below are diagrams providing a very high level overview of how components work together.

Candidate generator and verifier produce the gadget database

Compilation process

### Candidate generator + verifier

Candidate generator

Verifier

# Compilation stages

Compilation

ROPL source is parsed to produce Abstract Syntax Tree  representation of the program. Syntactically correct programs can be semantically incorrect, so we need to run some checks. AST verification stage checks if

• variables are initialized before use,
• all invoked functions are defined,
• function names are unique,
• there is a “main” function,
• functions are called with correct number of arguments.

If a program fails any of the above checks, it’s rejected as invalid and compilation terminates.

AST simplification massages the tree to make it easier to process. It’s done by converting complicated expressions to three address code [5]. For example, x=a+b+c is split (simplified) to two instructions: [t=a+b, x=t+c]. After this step, all trees describing arithmetic operating have depth at most 1. Price for this optimizations is increased number of temporary local variables (like ‘t’ in previous example).

AST rewriting converts ROPL instructions to sequences of pseudoinstructions. ROPL instruction “cmp” is a good example. To compare arguments, it’s necessary the compute them, save to registers, subtract these registers and then save EFLAGS contents to AH register. PI implementation of “cmp v1,v2” is:

ReadLocal(reg1, v1)
Sub(reg3, reg1, reg2)
SaveFlags

During this stage all instructions are expanded into PI sequences. Next stages operate only on PI lists.

### Assigning registers

This is the most complex step of the compilation process. To unroll a list of PI to a list of gadgets, it’s necessary to assign concrete x86 registers to all symbolic registers. The problem is, we need to take into account that some assignments will not work, because of gadget side effects (modifying registers) and because not every assignment is possible. We will call an assignment (assignments are functions from symbolic registers to concrete ones) correct, if (and only if) it’s possible (with a particular gadget database) and if it preserves semantics. Here’s an example explaining these two problems:

load_eax:
pop eax
ret
pop ebx
mov eax, 0
ret
pop ecx
ret
ret
ret

Assuming the above is our host app, how do we assign concrete regs to this sequence of PI:

Set(r1, 1)
Set(r2, 2)
Add(r3, r1, r2)

Assignment [[r1]]=ESI ([[r2]], [[r3]] anything) is not possible, because no gadget is able to set ESI=1. Assignment [[r1]]=EAX, [[r2]]=EBX, [[r3]]=EAX is possible, but does not preserve semantics since load_ebx gadget modifies EAX. Correct (so possible and semantic preserving) assignment is [[r1]]=[[r3]]=EAX, [[r2]]=ECX.

To express these observations more formally, we can say that for every gadget G(r1,..,rn) (r1,..,rn are symbolic) assignment must guarantee that G([[r1]], …, [[rn]]) belongs to a set returned by VER. It must also ensure that if [[r]]=CONCRETE_REG, then CONCRETE_REG isn’t modified between writing and reading r (so it has to be liveness [6] preserving).

Number of possible assignments is exponential, but we can speed up the naive search by observing that there’s no point in checking assignments which do not meet the above criteria (availability and liveness preservation).

Registers available for gadget arguments are provided “for free” by VER. In case of PI, we can use their expansions. Let $P(r) \rightarrow [P_0(r), P_1(r,t_0),P_2(t_0,t_1)]$ (this means that $P$ expands to $[...]$). Then we have $Available(P,r)=Available(P_0,r) \cap Available(P_1,r)$. Since r is used as an argument for $P_0$ and $P_1$ it has to belong to “available” sets for both of these PI.

After performing liveness analysis on ROPL source in PI form, for every pseudoinstruction we will get a set of symbolic registers which are “live” during its execution. These sets provide two important pieces of information: which registers can’t be modified and which registers can’t share a concrete register.

Set of registers modified during gadget execution (as a side effect) is provided by CG and verified by VER, so we can be sure about its correctness.

Having information about possible assignments, liveness and assignment conflicts, we can generate all legal assignments and check them one by one, until we find the one which can be fulfilled using the gadget database.

### Concretization of symbolic constants

Implementation of jumps is based on modifying ESP by a constant equal to the distance between the jump instruction and target label: ESP += C. This constant isn’t known until all PI are expanded to a sequence of gadgets. Only then, by summing their sizes  (how much space they take on the stack) we can calculate the needed offset.

Concretization is the last stage before producing the final ROP payload. The only thing that changes during this stage is the SetSymbolic(reg, S) pseudoinstruction, where reg is a concrete register and S is a symbolic constant expressing the distance between two labels. After computing S (S=X), SetSymbolic(reg, S) changes to Set(reg, C).

### To be continued

I hope this post was useful. Please post any questions in the comments.  Next part will discuss ROPL syntax and capabilities :).

1. 13/12/2013 at 16:43

Very cool. Can you clarify: is your approach guaranteed to find a compilation solution, if one exists? Maybe it’s a silly question; I have limited experience in the constraints imposed by rop.

• 13/12/2013 at 16:47

No, because it can’t find replacements for gadgets like Q does. So for example if you want to compile y=x*2 and there is no MUL gadget, compilation will fail.

2. 06/09/2016 at 17:39

Hello,

Does it provide some low level features like pointers or system calls ?

• 10/10/2016 at 15:17

System calls – yes.

3. 10/10/2016 at 22:04

System calls aren’t possible in such situation.

• 10/10/2016 at 22:45

But can other features works, or the does the whole compile process will fail ?

Can it work without the ret instruction *(in my case only call and jumps exists in the binary because of the 32 bytes alignment enforcing)* ?

• 11/10/2016 at 09:09

RETs are necessary. Compilation will fail without them.

• 11/10/2016 at 22:09

Does the first instruction need to be a ret or is it a requirement only for further instructions ?

Also, Is it possible to trigger the generated rop chain by jumping at specific addressing directly rather than using a stack overflow ? (in my case the stack isn’t executable)

1. 01/01/2014 at 16:09
2. 31/07/2014 at 15:27
3. 20/04/2015 at 23:08