Home > Exploit development, Reverse Engineering > ROPC — Turing complete ROP compiler (part 1)

ROPC — Turing complete ROP compiler (part 1)

13/12/2013 pakt Leave a comment Go to comments

This is a long overdue post describing ROPC (Return Oriented Programming Compiler, available here: https://github.com/pakt/ropc) with its own “higher level” language and features like conditional jumps, loops, functions (even recursive ones), tables, etc.. ROPC was released in 2012. Since then, Christian Heitman made a fork [0] capable of compiling ROP programs expressed in C (!).

Return Oriented Programming

Let’s consider a simple example to refresh our understanding of ROP. Below is a short snippet of assembly with few gadgets.

start_rop:
 ret
set_eax:
 pop eax
 ret
set_ebx:
 pop ebx
 ret
write_mem:
 mov [eax], ebx

We want to write value X to address ADDR, using only gadgets we see above. One of possible stack arrangements is:

Address	Value
ESP+0	set_ebx
ESP+4	X
ESP+8	set_eax
ESP+12	ADDR
ESP+16	write_mem

Assuming that EIP=start_rop and stack memory is arranged as above, here’s what happens:

First RET transfers execution to set_ebx
POP EBX loads X and saves it to EBX
RET transfers exec. to set_eax
POP EAX loads ADDR and saves it to EAX
RET transfers exec. to write_mem
MOV [EAX], EBX writes X under ADDR
last RET transfers exec. to undefined address

This was a really simple example, so creating a ROP program was easy, even by hand. Real world exploits are much more complex and in case of large applications (like Firefox) with a large number of gadgets, it’s not clear how to implement the desired functionality “manually”, just by staring at IDA :).

Previous work

Q [1] was the first tool which did not require any human interaction to compile a ROP program. Before Q, the most advanced was Roemer’s compiler [2], but it did require some level of interaction — gadget database had to be constructed manually.

ROPC is based on Q. Since Q’s code isn’t available, parts of its functionality were implemented from scratch. ROPC searches, classifies and verifies gadgets just like Q. The set of types of gadgets used to express more complex operations is almost the same as in Q. That’s the end of similarities. ROPL (Return Oriented Programming Language ;)) is higher level and more advanced than QooL. It can express conditional jumps and procedures, even recursive ones.

ROPC also includes some new ideas / solutions:

compilation based on pseudoinstructions,
solving technical problems with stack emulation for ROP programs,
expressing recursive procedures.

Used tools

ROPC uses two tools:

BAP [3] (Binary Analysis Platform) is a framework (written in ocaml) for binary code analysis. BAP first disassembles given code and then converts it to BIL (BAP’s intermediate language). BIL is easier to manipulate than assembly, because:

it has less instructions,
side effects (like EFLAGS mutations) are expressed explicitly.

ROPC is using a small subset of BAP’s capabilities:

lifting from x86 to BIL,
BIL emulation,
converting BIL to format digestable by STP.

If you want to do any kind of binary analysis there is a very good chance that BAP already does what you plan to implement. Seriously, use BAP :P.

STP [4] is a SMT solver. For our application, it’s just a black box that answer questions about code semantics. BAP provides an API for querying STP (and other solvers) about BIL, so it’s easy to formulate questions about code and get answers from STP. Questions asked by ROPC regard hypotheses about code’s functionality. For example, we might be interested, is it always true that after executing a particular piece of code, the final value in EBX is equal to the starting value of EAX. If STP answers YES, then we can be sure that the analyzed snippet is “semantically equivalent” to MOV EBX, EAX.

The compiler

Programs generated by ROPC consist of sequence of gadgets. It’s useful to think of them as instructions of very simple assembler.

Gadget types used in compiler

Table above describes all gadget types recognized by ROPC. $X \leftarrow Y$ denotes assignment of Y to X. $\diamond_{b}$ is a binary arithmetic or logical operation. [X] denotes 32-bit value in memory cell under address X. Lahf copies processor flags to AH register.

For a given target app (we will call it “host”) we have to first construct a gadget database. This process is described in Q paper, so I won’t repeat it here.

Having the gadget db, compiler can start consuming the ROPL source. From a parsed program in AST (abstract syntax tree) form, there’s a long way to a binary (compiled) result. A simple instruction like: x=a+1 is complex to implement, if all we can operate with are gadgets:

we need to load r1 register (abstract for now) with the value of local variable a
set register r2 to 1
calculate r3 = r1+r2
write r3 to local variable x

Every read and write from/to a local variable also requires few gadget operations. Since implementations of ROPL instructions in gadgets are long, ROPC uses a higher level construct called pseudoinstructions, or PI. PI are “parametrized macros” allowing to express complex operations succinctly. PI can take multiple types of parameters:

symbolic registers (abstract variable that can be set to any concrete register),
concrete registers (EAX, EBX, etc),
numerical constants.

Simplified set of pseudoinstructions (PI)

Table above describes a simplified set of PI, used to explain inner working of the compiler. Set used by ROPC is more complex, but the idea is the same. [[R]] means a concrete register assigned to symbolic register R. locals(Var) is the memory address of local variable Var. ReadLocal and WriteLocal implementations were omitted, since they are quite long.

Every ROPL instruction has its implementation in PI. For example, x=a+1 can be implemented as:

ReadLocal(r1, a)
Set(r2, 1)
Add(r3, r1, r2)
WriteLocal(x, r3)

During compilation, ROPC first rewrites instructions from the source to PI sequences and then tries to unroll them into sequences of gadgets by assigning concrete registers to symbolic ones (r1, r2, r3 in the example above). If unrolling suceeds, the sequence of gadgets’ addresses is written to output file as the compiled result.

Componenets

Compiler consists of three parts:

candidate generator (CG),
verifier (VER),
the actual compiler (COMP).

CG and VER work exactly the same way as in Q, so I won’t repeat the details here. Together, they are able to construct a gadget database with verified semantics. Found gadgets are guaranteed to work exactly as advertised by their type and not crash :). VER additionally provides ROPC with a set of registers modified during gadget execution.

Below are diagrams providing a very high level overview of how components work together.

Candidate generator and verifier produce the gadget database

Compilation process

Candidate generator + verifier

Candidate generator

Verifier

Compilation stages

Compilation

ROPL source is parsed to produce Abstract Syntax Tree representation of the program. Syntactically correct programs can be semantically incorrect, so we need to run some checks. AST verification stage checks if

variables are initialized before use,
jumps jump to defined labels,
all invoked functions are defined,
function names are unique,
there is a “main” function,
functions are called with correct number of arguments.

If a program fails any of the above checks, it’s rejected as invalid and compilation terminates.

AST simplification massages the tree to make it easier to process. It’s done by converting complicated expressions to three address code [5]. For example, x=a+b+c is split (simplified) to two instructions: [t=a+b, x=t+c]. After this step, all trees describing arithmetic operating have depth at most 1. Price for this optimizations is increased number of temporary local variables (like ‘t’ in previous example).

AST rewriting converts ROPL instructions to sequences of pseudoinstructions. ROPL instruction “cmp” is a good example. To compare arguments, it’s necessary the compute them, save to registers, subtract these registers and then save EFLAGS contents to AH register. PI implementation of “cmp v1,v2” is:

ReadLocal(reg1, v1)
ReadLocal(reg2, v2)
Sub(reg3, reg1, reg2)
SaveFlags

During this stage all instructions are expanded into PI sequences. Next stages operate only on PI lists.

Assigning registers

This is the most complex step of the compilation process. To unroll a list of PI to a list of gadgets, it’s necessary to assign concrete x86 registers to all symbolic registers. The problem is, we need to take into account that some assignments will not work, because of gadget side effects (modifying registers) and because not every assignment is possible. We will call an assignment (assignments are functions from symbolic registers to concrete ones) correct, if (and only if) it’s possible (with a particular gadget database) and if it preserves semantics. Here’s an example explaining these two problems:

load_eax:
  pop eax
  ret
load_ebx:
  pop ebx
  mov eax, 0
  ret
load_ecx:
  pop ecx
  ret
add1: 
  add eax, ebx
  ret
add2:
  add eax, ecx
  ret

Assuming the above is our host app, how do we assign concrete regs to this sequence of PI:

Set(r1, 1)
Set(r2, 2)
Add(r3, r1, r2)

Assignment [[r1]]=ESI ([[r2]], [[r3]] anything) is not possible, because no gadget is able to set ESI=1. Assignment [[r1]]=EAX, [[r2]]=EBX, [[r3]]=EAX is possible, but does not preserve semantics since load_ebx gadget modifies EAX. Correct (so possible and semantic preserving) assignment is [[r1]]=[[r3]]=EAX, [[r2]]=ECX.

To express these observations more formally, we can say that for every gadget G(r1,..,rn) (r1,..,rn are symbolic) assignment must guarantee that G([[r1]], …, [[rn]]) belongs to a set returned by VER. It must also ensure that if [[r]]=CONCRETE_REG, then CONCRETE_REG isn’t modified between writing and reading r (so it has to be liveness [6] preserving).

Number of possible assignments is exponential, but we can speed up the naive search by observing that there’s no point in checking assignments which do not meet the above criteria (availability and liveness preservation).

Registers available for gadget arguments are provided “for free” by VER. In case of PI, we can use their expansions. Let $P(r) \rightarrow [P_0(r), P_1(r,t_0),P_2(t_0,t_1)]$ (this means that $P$ expands to $[...]$ ). Then we have $Available(P,r)=Available(P_0,r) \cap Available(P_1,r)$ . Since r is used as an argument for $P_0$ and $P_1$ it has to belong to “available” sets for both of these PI.

After performing liveness analysis on ROPL source in PI form, for every pseudoinstruction we will get a set of symbolic registers which are “live” during its execution. These sets provide two important pieces of information: which registers can’t be modified and which registers can’t share a concrete register.

Set of registers modified during gadget execution (as a side effect) is provided by CG and verified by VER, so we can be sure about its correctness.

Having information about possible assignments, liveness and assignment conflicts, we can generate all legal assignments and check them one by one, until we find the one which can be fulfilled using the gadget database.

Concretization of symbolic constants

Implementation of jumps is based on modifying ESP by a constant equal to the distance between the jump instruction and target label: ESP += C. This constant isn’t known until all PI are expanded to a sequence of gadgets. Only then, by summing their sizes (how much space they take on the stack) we can calculate the needed offset.

Concretization is the last stage before producing the final ROP payload. The only thing that changes during this stage is the SetSymbolic(reg, S) pseudoinstruction, where reg is a concrete register and S is a symbolic constant expressing the distance between two labels. After computing S (S=X), SetSymbolic(reg, S) changes to Set(reg, C).

To be continued

I hope this post was useful. Please post any questions in the comments. Next part will discuss ROPL syntax and capabilities :).

0 – https://github.com/programa-stic/ropc-llvm

1 – http://users.ece.cmu.edu/~ejschwar/papers/usenix11.pdf

2 – http://cseweb.ucsd.edu/~hovav/dist/rop.pdf

3 – http://bap.ece.cmu.edu/

4 – https://sites.google.com/site/stpfastprover/

5 – http://en.wikipedia.org/wiki/Three_address_code

6 – http://en.wikipedia.org/wiki/Live_variable_analysis

Categories: Exploit development, Reverse Engineering Tags: bap, compilers, llvm, ocaml, rop, ropc

Comments (9) Trackbacks (3) Leave a comment Trackback

John

13/12/2013 at 16:43

Reply

Very cool. Can you clarify: is your approach guaranteed to find a compilation solution, if one exists? Maybe it’s a silly question; I have limited experience in the constraints imposed by rop.
- pakt
  
  13/12/2013 at 16:47
  
  Reply
  
  No, because it can’t find replacements for gadgets like Q does. So for example if you want to compile y=x*2 and there is no MUL gadget, compilation will fail.
ytrezq

06/09/2016 at 17:39

Reply

Hello,

Does it provide some low level features like pointers or system calls ?
- pakt
  
  10/10/2016 at 15:17
  
  Reply
  
  System calls – yes.
  - ytrezq
    
    10/10/2016 at 17:06
    
    Then what appends if the code being analysed doesn’t contains an instruction that can be used to perform a system call ? (I’m in the case where I would like to run it on code compiled for the ɴaᴄl sandbox https://developer.chrome.com/native-client/reference/sandbox_internals/x86-64-sandbox)
    
    Does performing system calls becomes impossible or the whole compiler stops ?
pakt

10/10/2016 at 22:04

Reply

System calls aren’t possible in such situation.
- ytrezq
  
  10/10/2016 at 22:45
  
  Reply
  
  But can other features works, or the does the whole compile process will fail ?
  
  Can it work without the ret instruction *(in my case only call and jumps exists in the binary because of the 32 bytes alignment enforcing)* ?
  - pakt
    
    11/10/2016 at 09:09
    
    RETs are necessary. Compilation will fail without them.
- ytrezq
  
  11/10/2016 at 22:09
  
  Reply
  
  Does the first instruction need to be a ret or is it a requirement only for further instructions ?
  
  Also, Is it possible to trigger the generated rop chain by jumping at specific addressing directly rather than using a stack overflow ? (in my case the stack isn’t executable)

01/01/2014 at 16:09

ROPC — Turing complete ROP compiler (part 2, language) | GDTR
31/07/2014 at 15:27

ROPC — Turing complete ROP compiler (part 3, implementation) | GDTR
20/04/2015 at 23:08

Offensive Computer Security Spring 2014 – CIS4930 & CIS5930 | Never Ending Security

GDTR

ROPC — Turing complete ROP compiler (part 1)

Return Oriented Programming

Previous work

Used tools

The compiler

Componenets

Candidate generator + verifier

Compilation stages

Assigning registers

Concretization of symbolic constants

To be continued

Leave a comment Cancel reply

Stream of consciousness

Recent Posts

Categories

Friendly sites

Meta

GDTR

ROPC — Turing complete ROP compiler (part 1)

Return Oriented Programming

Previous work

Used tools

The compiler

Componenets

Candidate generator + verifier

Compilation stages

Assigning registers

Concretization of symbolic constants

To be continued

Share this:

Related

Leave a comment Cancel reply

Stream of consciousness

Recent Posts

Categories

Friendly sites

Meta