GreHack 2015 Re150 challenge: as painless as possible
In this article, we analyze a GreHack 2015 challenge: reverseMe (Re150).
This is not the purpose of this post to offer a documented write-up; one is already available here, based on an execution trace.
This is more about how we could have analyze this challenge, with the help of the Miasm framework (in addition with others tools, as IDA / radare2 / …).
Overview:
This analysis is based on Miasm revision d2588f5.
First thoughts
The target, reverseMe seems to be a 32 bit ELF. Its main function starts at 0x8048e78. One can view it through Miasm using the disasm/full.py example:
$ python miasm2/example/disasm/full.py reverseMe
$ xdot graph_execflow.dot
Results:
Obviously, the binary starts by decrypting its next code.
Emulation
Even if we can manually apply the XOR, we decide to run it in the Miasm sandbox in order to dump the memory after the decryption process.
We will use one of the Sandbox subclasses, from miasm2.analysis.sandbox, which contains common element for emulation, such as memory emulation, stack initialization, mapping binary to memory, common options, etc.
Let’s inspire ourself from the example jitter/sandbox_pe_x86_32.py. Instead of Sandbox_Win_x86_32, we will use Sandbox_Linux_x86_32:
import os
from miasm2.analysis.sandbox import Sandbox_Linux_x86_32
# Insert here user defined methods
# Parse arguments
parser = Sandbox_Linux_x86_32.parser(description="ELF sandboxer")
parser.add_argument("filename", help="ELF Filename")
options = parser.parse_args()
# Create sandbox
sb = Sandbox_Linux_x86_32(options.filename, options, globals())
# Run
sb.run()
Let’s run it on our binary. Note that debuggers are available thanks to --debug (CLI) or --gdbserver (GDBserver, for instance with IDA) command line options.
We will use the -b (“dump blocks”) option, which output each freshly disassembled basic block encountered during emulation. As the loop (at 0x8048e7e) does not introduce new basic blocks, the output will only display this block once. If the code modifies a previously emulated block, the resulting block will be displayed the next time it will be reached.
$ python sandbox_elf_x86_32.py -b reverseMe
loc_0000000008048E78:0x08048e78
PUSH EDX
MOV EDX, 0x8048A5C
XOR BYTE PTR [EDX], 0xBC
DEC EDX
CMP EDX, 0x8048A45
JAE loc_0000000008048E7E:0x08048e7e
-> c_to:loc_0000000008048E7E:0x08048e7e c_next:loc_0000000008048E8A:0x08048e8a
loc_0000000008048E7E:0x08048e7e
XOR BYTE PTR [EDX], 0xBC
DEC EDX
CMP EDX, 0x8048A45
JAE loc_0000000008048E7E:0x08048e7e
-> c_next:loc_0000000008048E8A:0x08048e8a c_to:loc_0000000008048E7E:0x08048e7e
loc_0000000008048E8A:0x08048e8a
POP EDX
JMP loc_0000000008048A45:0x08048a45
-> c_to:loc_0000000008048A45:0x08048a45
loc_0000000008048A45:0x08048a45
...
loc_0000000008049961:0x08049961
POP EDX
JMP loc_000000000804B1F0:0x0804b1f0
-> c_to:loc_000000000804B1F0:0x0804b1f0
loc_000000000804B1F0:0x0804b1f0
MOV EBX, 0x1
MOV EAX, 0x4
INT 0x80
...
AssertionError
Considering the full output, there are 2 things to notice:
- the decryption pattern seems to repeat itself several time
- emulation ends with a crash: assert(self.get_exception() == 0)
This crash occurs because an exception is raised, through INT 0x80 and is not catched (we did not specify it to Miasm). But it seems that we have reached a new part of the code, different from the decryption steps. Let’s dump the current state of the memory and display the resulting CFG:
# Using -i, we obtain an interactive Python shell at the end of the script
$ python -i sandbox_elf_x86_32.py -b reverseMe
...
AssertionError
# sb is our Sandbox instance
# sb.jitter is the emulator instance
# sb.jitter.vm is the emulator virtual memory
>>> sb.jitter.vm
# Emulated memory. Columns are: address, size, memory permissions
# 0x130000 -> stack
ad 0x130000 size 0x10000 RW_
# 0x8048000 -> mapped binary
ad 0x8048000 size 0x4000 RW_
# Dump the binary (the ELF structure is conserved because in this specific
# binary, memory addresses are directly linked with file offset)
>>> open("dump.bin", "w").write(sb.jitter.vm.get_mem(0x8048000, 0x4000))
$ python miasm2/example/disasm/full.py dump.bin
Results:
The resulting CFG is hard to read, due to the repeated pattern of decryption.
CFG pattern matching
We will use Miasm to remove this pattern from the CFG.
Let’s start with a basic disassembling script:
# Container is the wrapper for ELF, PE, ...
from miasm2.analysis.binary import Container
# Machine is the wrapper for managing multiple architecture
from miasm2.analysis.machine import Machine
cont = Container.from_stream(open("dump.bin"))
# Bin stream is a view on the mapped binary
bin_stream = cont.bin_stream
# 'cont.arch' is "x86_32", extracted from the ELF header
machine = Machine(cont.arch)
# Disassembly engine associated with the current binary
mdis = machine.dis_engine(bin_stream)
# Disassemble the main function
blocks = mdis.dis_multibloc(cont.entry_point)
# Get back the CFG as a dot file
open("cfg.dot", "w").write(blocks.dot())
In cfg.dot, we got again our unreadable CFG. For more information on the AsmCFG instance returned by mdis.dis_multibloc, have a look at PR #309.
Let’s define and match our pattern using MatchGraph (see PR #310).
from miasm2.core.graph import MatchGraphJoker
# Pattern we want to match:
# |
# +----v----+
# | (dad) |
# | PUSH |
# | MOV |
# +----+----+
# |
# +----v----+
# | (middle)|<---+
# +----+--+-+ |
# | +------+
# +----v----+
# | (end) |
# +----+----+
# |
# v
# First block must have 2 assembly lines, a PUSH then a MOV, and an unlimited
# number of incomming edge
dad = MatchGraphJoker(name="dad", restrict_in=False,
filt=lambda block: (len(block.lines) == 2 and
block.lines[0].name == "PUSH" and
block.lines[1].name == "MOV"))
middle = MatchGraphJoker(name="middle")
# Last block can have an unlimited number of outcomming edge
end = MatchGraphJoker(name="end", restrict_out=False)
# Build a MatchGraph by specifing edges (A >> B means an edge from A to B)
matcher = dad >> middle >> middle >> end
# Visualize it to be sure
open("to_match.dot", "w").write(matcher.dot())
Results:
Now, we can match corresponding blocks:
for sol in matcher.match(blocks):
print sol[dad]
loc_00000000080496DE:0x080496de
PUSH EDX
MOV EDX, 0x8048AC0
-> c_next:loc_00000000080496E4:0x080496e4
loc_00000000080482F2:0x080482f2
PUSH EDX
MOV EDX, 0x804A9CE
-> c_next:loc_00000000080482F8:0x080482f8
loc_00000000080496C5:0x080496c5
PUSH EDX
MOV EDX, 0x8049DCB
-> c_next:loc_00000000080496CB:0x080496cb
loc_0000000008049981:0x08049981
PUSH EDX
MOV EDX, 0x8048DAE
-> c_next:loc_0000000008049987:0x08049987
loc_000000000804881F:0x0804881f
PUSH EDX
MOV EDX, 0x8049614
-> c_next:loc_0000000008048825:0x08048825
loc_0000000008049C88:0x08049c88
PUSH EDX
MOV EDX, 0x80480CA
-> c_next:loc_0000000008049C8E:0x08049c8e
loc_0000000008048E78:0x08048e78
PUSH EDX
MOV EDX, 0x8048A5C
-> c_next:loc_0000000008048E7E:0x08048e7e
...
CFG simplification
We will now remove this pattern from our CFG, linking its predecessors to its successors in order to keep a correct control flow. For this purpose, we will write a DiGraphSimplifier pass.
def block_merge(dgs, graph):
"""Remove the pattern defined in matcher
@dgs: instance of DiGraphSimplifier (for recursive call purpose)
@graph: graph to simplify
"""
for sol in matcher.match(graph):
# /!\ we modify the graph during an iteration on it
# Our results do not overlap, we can avoid breaking the 'for' after each
# modification
# Links each successors to our pattern predecessors
successors = graph.successors(sol[end])
for pred in graph.predecessors(sol[dad]):
for succ in successors:
graph.add_edge(pred, succ,
graph.edges2constraint[(pred, sol[dad])])
# Remove matched blocks (and associated edges) from the graph
for node in sol.itervalues():
graph.del_node(node)
Then, we instanciate a DiGraphSimplifier to associate our pass with it and apply it on our blocks.
from miasm2.core.graph import DiGraphSimplifier
dgs = DiGraphSimplifier()
dgs.enable_passes([block_merge])
blocks_after = dgs(blocks)
open("cfg_after.dot", "w").write(blocks_after.dot())
Viewing cfg_after.dot, we can now easily understand what the binary is doing. Keep in mind that only the executed path has been de-xored, so it is normal to obtain incomprehensible blocks on the other paths.
Taking the right execution path
Now, if we analyze what is going on, we can see that the program first ensure that the top stack value (argc because there is no libc init here) equals 2.
We then slightly modify our sandbox_elf_x86_32.py to push our argument on the stack before the run (for more explanation, see PR #284).
from miasm2.core.types import Str, set_allocator
from miasm2.os_dep.common import heap
# Instanciate a heap and set it as the default memory allocator
set_allocator(heap().vm_alloc)
MemStrAnsi = Str().lval
# Allocate a string containing a dummy password
addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, "SomePassword").get_addr()
# The program name, actually unused here
addr_bin = MemStrAnsi.from_str(sb.jitter.vm, "reverseMe").get_addr()
# Push arguments on the stack
sb.jitter.push_uint32_t(addr_pwd)
sb.jitter.push_uint32_t(addr_bin)
sb.jitter.push_uint32_t(0x2)
# Run
sb.run()
Here, manipulated structure are simples; we could use direct memory page allocation instead of the type system wrapper.
addr_pwd = 0x112233
addr_bin = 0x113344
from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE
# Add a memory pages containing dummy arguments
sb.jitter.vm.add_memory_page(addr_pwd, PAGE_READ | PAGE_WRITE, "SomePassword" + "\x00")
sb.jitter.vm.add_memory_page(addr_bin, PAGE_READ, "reverseMe" + "\x00")
Running again our script, we end up with a new error:
loc_000000000804B1DD:0x0804b1dd
INC ECX
XOR BYTE PTR [EBX+ECX], 0x33
JMP loc_0000000008049EAE:0x08049eae
-> c_to:loc_0000000008049EAE:0x08049eae
WARNING: address 0x2000000D is not mapped in virtual memory:
An access has been tried on a non-allocated memory. This access has been attempted by the XOR BYTE PTR [EBX+ECX], 0x33 instruction. As the dumpblock option dumps the full basic block before executing it, we also see the next instruction. Using the -z (--singlestep) argument, we obtain a step by step execution trace (with registers values printed before the instruction execution):
RAX 0000000000000000 RBX 0000000020000000 RCX 000000000000000D RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000013FFF0 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000001
RIP 000000000804B1DD
0804B1DE XOR BYTE PTR [EBX+ECX], 0x33
WARNING: address 0x2000000D is not mapped in virtual memory:
Actually, the program expects a longer input. We can either map a bigger memory page (for instance, 4KB) or use dichotomy to find the exact expected input size, which is left as an exercise to the reader. We obtain an expected input size of 28, so, for instance:
from string import printable
# printable avoid character repetition, useful for debugging
addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, printable[:28]).get_addr()
As before, we now end on a INT 0x80.
Using our previous script (with a new dump, dump2.bin), we output the resulting CFG.
Password matching
In the resulting CFG, we observe an instance of our pattern which have not been simplified (at 0x80491b1). Indeed, this is not exactly our pattern; the middle block (at 0x80491b7) has an incomming constraint c_next from 0x80491b5. As the instructions in this last block are still encrypted, they are disassembled as a ADD AL, 0x8. So, after executing this instruction, the next one must be at 0x80491b7 (this is what c_next mean: the block must be next to this predecessor). To remove this artefact, at least two options:
- modify the pattern to remove the restriction on the number of incomming edge for middle (restrict_in=False)
- tell the disassembler engine to stop on the bad JNZ branch, at 0x8049215
For the purpose of highlighting Miasm API, we choose the second option. We then modify our disassembler to tell it to stop disassembling if this specific address is reached.
# Disassembly engine associated with the current binary
mdis = machine.dis_engine(bin_stream, dont_dis=[0x8049215])
We end with a spaghetti CFG, that is to say a list of basic block with only one successor and only one predecessor.
If we analyze them and particularly the final loop, we understand that the code is XOR-ing each byte of our input with a given key, then test that the result is full of null byte. In other words, our input must be the key.
As this key computing is very simple to understand, a symbolic execution and SMT-solvers (such as z3) based approach is not necessary. It may be dealt with on a better example in a future post (cliffhanger!).
Instead of manually retrieving the key, let’s use a graph traversal starting from the CFG root and printing each XOR constant:
key = []
# Only one head (i.e. root) in our graph
head = blocks_after.heads()[0]
for block in blocks_after.walk_depth_first_forward(head):
# Depth first traversal starting from the root
if (len(block.lines) == 3 and
block.lines[1].name == "XOR"):
# Arguments are ([EBX + ECX], CST), in Miasm Intermediate Representation
cst = block.lines[1].args[1]
key.append(chr(int(cst.arg)))
# Output the resulting key
print "".join(key)
Which outs: Alph4t3stf0rc3mille!!!!!!!!!.
So, finally:
$ ./reverseMe 'Alph4t3stf0rc3mille!!!!!!!!!'
Well done!
Bonus
As noted before, this is a spaghetti code. Miasm already provides a simplification pass for this kind of code, using:
from miasm2.core.asmbloc import bbl_simplifier
merged = bbl_simplifier(blocks_after)
open("merged.dot", "w").write(merged.dot())
Resulting in a bigger basic block:
Conclusion
On a Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz single core (these values are the mean on several execution, removing extremes):
- the complete execution (with the right path and decryption) takes 21.376s
- the graph simplification and the solution extraction takes 1.540s
Finally, analyzing the challenge and writing these small scripts does not take more than 20 minutes (for somebody knowing the Miasm API). A first pass has been done to obtain the relevant code, through emulation. Once the annoying patterns are detected, matching and removing them only takes a couple of minutes.
As this method can be applied on other binaries, like malwares, we may write a second article on how to rebuild a working binary with the pattern and the decryption process removed, leading to a close-to-the-original binary.
Final scripts: