Moopler Closing Read more... ×
Moopler # Other Simple Optimize / Deobfuscation Challenge(s)

## Recommended Posts one - a:
The overlaying obfuscation is fairly easy to remove. Curious to see how well others do. This of course is much like Themida's techniques. Most automatic disassemblers like IDA and most public deobfuscators / libs / tools / misc out there seem to get optimizations wrong, miss the point of it completely, or are limited to handling certain registers only (reg32 seems to be typically the only supported).

I find approaching from an algebraic point of view is very handy. Simply creating downward blocks of equations until you reach stable data movement instructions. After solving, I continue to repeating my optimizations of blocking, solving downward, and filtering until I'm left with fairly readable equations that can easily be converted back into assembly.

example:

```↓push ebx			->	↓esp = esp -  0x4;	->	↓esp = esp -  0x4;	->	↓push eax
↓mov dword ptr ss : [esp], edi 		↓dword ptr *esp =  ebx;		↓dword ptr *esp =  eax;
↓mov dword ptr ss : [esp], eax 		↓dword ptr *esp =  edi;
↓dword ptr *esp =  eax;```

(hint: boolean algebra helps)

There are many possible ways to phrase the solution. try to be as minimal as possible. It shouldn't be much. I can post a video explaining how to walk through and simplify through an algebraic view point if anyone is interested. Don't really feel like doing all of the spoonfeeding just yet.

https://gist.github.com/evodz/e68257b2a405e8337695ea7e8142ecdb

reminders:
push reg
esp = esp - 0x4;
dword ptr *esp =  reg;

pop reg
esp = esp + 0x4;
reg =  dword ptr *esp;

mov reg, data
reg= esp +  0x4;
source = (dword ptr *reg) - data;
dword ptr *reg =  source;

mov reg2, reg1
esp = esp - 0x4;
dword ptr *esp =  reg1;
esp = esp + 0x4;
reg2 =  dword ptr *esp;
-------------------------------------------------------------------------------------------------------
If anyone else has small challenges like such above feel free to post here as well.

Edited by Guest
typo on registers for mov r,r

##### Link to post

If you deobfuscate all the stack manipulation, you're left with

```sub eax, 0x63d17c6b

Which is basically equal to nothing. But since you seem to expect a breakdown, here's top-to-bottom:

Step 1

```push ebx
mov dword ptr ss : [esp], edi
mov dword ptr ss : [esp], eax```

Equals to

`push eax`

Step 2 (considering step 1 results)

```push eax
push ebx
mov ebx, 0x63d17c6b
sub dword ptr ss : [esp + 0x4], ebx
push dword ptr ss : [esp]
pop ebx
pop eax```

Equals to

`sub eax, 0x63d17c6b`

Step 3

```push ebp
mov dword ptr ss : [esp], ecx
mov ecx, 0x63d17c6b
push dword ptr ss : [esp]
mov ecx, dword ptr ss : [esp]
push edi
mov edi, esp
xchg dword ptr ss : [esp], edi
pop esp```

Equals to

```push ecx

Step 4

```push 0x7d28
mov dword ptr ss : [esp], eax
mov eax, esp
push edi
mov edi, 0x4
pop edi
push ebp
mov ebp, 0x4
pop ebp
xchg dword ptr ss : [esp], eax
pop esp```

Equals to

`add esp, 0x4`

Step 5 - Now we have...

```sub eax, 0x63d17c6b
push ecx

The push ecx and add esp,0x04 eliminates eachother, while the sub and add for eax also eliminates eachother. Thus the result of the deobfuscation is:

`jmp jmpback`

Edited by NewSprux2.0?

##### Link to post 1 hour ago, NewSprux2.0? said:

If you deobfuscate all the stack manipulation, you're left with

```
sub eax, 0x63d17c6b

Which is basically equal to nothing.

Yup. You could also express that as well as the following. Which I hinted out in the example. It is a dead give away with the use of that constant being added and subtracted. All in all nothing changed. So yes, nothing is the answer.

```push eax			-> 	push eax	-> mov eax, eax
sub dword ptr[esp], 0x63d17c6b		pop eax
pop eax

* typo correction.

Edited by Guest
typo subtraction routine

##### Link to post
2 minutes ago, Ezekiel said:

Yup. You could also express that as well as the following. Which I hinted out in the example. It is a dead give away with the use of that constant being added and subtracted.

```
push eax	-> 	push eax	-> mov eax, eax
sub eax, 0x63d17c6b	pop eax
pop eax

But that's not right. The snippet you posted there, equals to add eax,<param> because you restore with the pop after the subtraction.

##### Link to post 4 hours ago, NewSprux2.0? said:

But that's not right. The snippet you posted there, equals to add eax,<param> because you restore with the pop after the subtraction.

* had a typo. Went off of vague memory. This example was something I was looking at 2~ weeks ago. Should've caught that off the bat when I posted. Stupidly missed adding to an already pushed to the stack value after looking at your code. Haha, wow. Thanks for seeing that mistake. Your process is correct as well.

```esp = esp - 4
dword ptr *esp = ebx

a = esp + 4
dword ptr *esp = (dword ptr *a) - 0x63d17c6b```

(dword ptr * esp + 4) in the subtraction also cancels out which will reduces it to just (dword ptr *esp - 0x63d17c6b). I didn't explain it in the link below however since mosty everything in part 2 is mostly able to cancel out when you do the math. Quite a pleasing process of elimination.

https://mega.nz/#!4epEzYIT!rRlq6qKPj9_f2QKCobBGqgs13nfiB940fBXdhYxtZWU

My lazy days way without really stepping through 100% since there is no real need. Just substitution and elimination with transcribed assembly to pseudo.

Edited by Guest

##### Link to post

bump so you know someones reading besides sprux

• 1

##### Link to post @db4206910 here is the really simple jist in solving with algebra. There are neater ways to go about but figured this would be the best way to get others interested so hence the hints at the start. It is basically Simplifying Algebraic Expressions something that everyone should know early on in their lives. The goal was and is to show ways to do it by hand to make it alot easier for others to understand and grasp.

example, assuming all register values before hand are zero:

```// really easy obfuscation example:
mov eax, 4
sub ebx, 8
mov eax, ebx
sub eax, 4
sub ebx, 4```
```// step 1: let mov operators mean splitting blocks for statements

// block 1
mov eax, 4

// block 2
sub ebx, 8
mov eax, ebx

// block 3
sub eax, 4
sub ebx, 4```
```// step 2: translate statement blocks seperately
// assuming problem provided is the only range given no greater unknown changes
// assume register values before statements begin to previous resolved or in this case it is 0

↓let ebx = x, eax = y

↓x = 0
↓y = 4				mov eax, 4

↓x = x + y			add ebx, eax
↓x = x - 8			sub ebx, 8
↓y = y + 4			add eax, 4
↓y (set to) x			mov eax, ebx

↓x = x - 4			sub eax, 4
↓y = y - 4			sub ebx, 4```
```// step 3: solve translated statements using elimination and substitution

for x = 0 and y = 4 solve:

↓x = x + y 		→ ↓ x = 4
↓x = x - 8 		→ ↓ x = - 4
↓y = y + 4 		→ ↓ y = 8
↓y (set to) x		→ ↓ y = x

↓x = x - 4
↓y = y - 4
```
```// step 4: apply remaining arithmetics with previous results

...
↓y (set to) x		→ ↓ y = x

for x = -4 and y = -4 solve:

↓x = x - 4		→ ↓ x = -8
↓y = y - 4		→ ↓ y = -8```
```// step 5: re-apply to assembly

// assuming only if it was all limited to the previous scope and static
// assuming if at start all registers are 0
mov eax, -8    (or)  	mov eax, -8 	(or) sub ebx,8 	(or) any other combination...
mov ebx, -8		mov ebx, eax	   mov eax,ebx```

Edited by Guest

##### Link to post
4 hours ago, Ezekiel said:

@db4206910 here is the really simple jist in solving with algebra. There are neater ways to go about but figured this would be the best way to get others interested so hence the hints at the start. It is basically Simplifying Algebraic Expressions something that everyone should know early on in their lives. The goal was and is to show ways to do it by hand to make it alot easier for others to understand and grasp.

example:

```
// step 5: re-apply to assembly

// assuming only if it was all limited to the previous scope and static
mov eax, -8    or  	mov eax, -8 	or sub ebx,8 	or any other combination...
mov ebx, -8		mov ebx, eax	   mov eax,ebx```

Thank you for your effort to explain this topic. But I'm not sure if it makes much sense to assume the register values are 0 at the beginning. It's probably better just to use a solution where your output is independent from the beginning value of eax and ebx. In this case your first two solution would give incorrect values if ebx != 0 but that one

```sub ebx,8
mov eax,ebx```

should work fine. Correct me if I'm wrong.

Edited by Schwan

##### Link to post @Schwan No problem. I did the claim of register values are 0 at the beginning to make it easier to understand. Assuming you are given just that one area to reduce down a bit. I didn't want to include thinking about dynamic registers and prexisting values just yet since it might confuse others more so than it should. The next post takes into value with the following being a small chunk of a larger chunk of code to show the limitations of downwards compression and elimination. Which produces a semi-larger result since we are restricted to determine the rest of the blocks before and after the mov.

Edited by Guest

##### Link to post
19 minutes ago, Ezekiel said:

@Schwan no problem. You are correct, I was going post an example for assuming where registers before are not 0 at start. Was too tired to continue.

```
// step 1: setup in the same manner fo easier viewing

mov eax, 4

sub ebx, 8

mov eax, ebx

sub eax, 4
sub ebx, 4```
```
// step 2: elimination of changes that result back to original state

mov eax, 4

sub ebx, 8

mov eax, ebx

sub ebx, 4```

Actually you did a mistake in step 2. You can't eliminate those opcodes when after add eax,4 the value of eax is overwritten before sub eax,4 is even reached. I hope you get my point.

mov eax, 4

sub ebx, 8

mov eax, ebx

sub eax, 4
sub ebx, 4

Edited by Schwan

##### Link to post Spoiler
```
eax,    ebx
mov eax, 4              4,      x
add ebx, eax            4,      x + 4
sub ebx, 8              4,      x - 4
add eax, 4              8,      x - 4
mov eax, ebx           x - 4,   x - 4
sub eax, 4             x - 8,   x - 4
sub ebx, 4             x - 8,   x - 8

mov eax, 4              4,       x
add ebx, eax            4,       x + 4
sub ebx, 8              4,       x - 4
mov eax, ebx            x - 4,   x - 4
sub ebx, 4              x - 8,   x - 4
sub eax, 4              x - 8,   x - 8

add ebx, 4              y,       x + 4
sub ebx, 8              y,       x - 4
mov eax, ebx            x - 4,   x - 4
sub ebx, 4              x - 4,   x - 8
sub eax, 4              x - 8,   x - 8

sub ebx, 4              y,       x - 4
mov eax, ebx            x - 4,   x - 4
sub ebx, 4              x - 4,   x - 8
sub eax, 4              x - 8,   x - 8

sub ebx, 4              y,       x - 4
mov eax, ebx            x - 4,   x - 4
sub ebx, 4              x - 4,   x - 8
sub eax, 4              x - 8,   x - 8

---------------------------------------
...
x = x - 4
y = x
---------------------------------------
x = x - 4
y = y - 4
...
---------------------------------------```

@Schwan Thanks for pointing it out. Yeah, I saw such.

If you want to express it in another style "dynamic" or so follow such: We can express eax as a constant 4 before hand to make viewing easier and can see that eax itself isn't used heavily to modify another register. A few steps down and it would be come (eax = 8) @ "add eax, 4" however, eax gets overwitten with ebx with the mov so totally nops out the "add eax, 4" meaning that we could also compress the "mov eax , 4" since eax becomes overwritten again after "sub ebx, 4". Knowing that then means that the current ebx is undetermined value before the "mov eax, ebx" so that is as far as one code reduce for that section (forces ebx to be a variable). after "mov eax, ebx" since ebx is still undetermined (forces ebx to be a variable). We have to consider that a eax is that of the ebx variable. Thus, any math applied after has to consider both eax and ebx to be variables which cannot be changed at the given moment.

```sub ebx, 4              y,       x - 4
mov eax, ebx            x - 4,   x - 4
sub ebx, 4              x - 4,   x - 8
sub eax, 4              x - 8,   x - 8

should be the most you can reduce it to in the "dynamic" sense```

However, even then you can still simplify it to:

```x = x - 4
y = x
x = x - 4
y = y - 4

y = x - 4
x = (x - 4) - 4
y = (x - 4) - 4

x = x - 8
y = x - 8

which will result in the same as the previous static way above :)```

~ there is no real difference.

Edited by Guest

##### Link to post

Hey, first off I'd like to say thanks for being kind enough to share information on optimization. Deobfuscation has been one of the most challenges things to learn as a reverse engineer lacking algebra skills. The only resources on this stuff seem to be scholarly articles that assume ridiculous math knowledge. This has been much easier to follow. I'm curious, in order to optimize obfuscated ASM in the real-world, like the ones in Themida for example, what level of math should I learn?

-----

Edit: Took a look at your other contributions like this one. Seriously, you are a godsend dude.

Edited by subi

##### Link to post
5 hours ago, subi said:

Hey, first off I'd like to say thanks for being kind enough to share information on optimization. Deobfuscation has been one of the most challenges things to learn as a reverse engineer lacking algebra skills. The only resources on this stuff seem to be scholarly articles that assume ridiculous math knowledge. This has been much easier to follow. I'm curious, in order to optimize obfuscated ASM in the real-world, like the ones in Themida for example, what level of math should I learn?

-----

Edit: Took a look at your other contributions like this one. Seriously, you are a godsend dude.

In short, you don't. Deobfuscation is not so much math, as it is applied logics, since deobfuscation is actually just "optimization". If you want to perform actual deobfuscation on a target, you have multiple approaches:

• Perform hardcoded pattern searches on assembly IR and swap patterns with their corresponding counter-pattern (optimized realization).
• Perform some heuristics scanning, that takes rules into consideration and tries to collapse instructions.
• Utilize compiler-theory approaches for code-optimization at a low level to deobfuscate low-level code (again, deobfuscation == optimization).

All of the above should perform at least the following approaches:

• Branch prediction
• Constant folding
• Constant propagation
Edited by NewSprux2.0?
• 1

##### Link to post
8 hours ago, NewSprux2.0? said:

In short, you don't. Deobfuscation is not so much math, as it is applied logics, since deobfuscation is actually just "optimization". If you want to perform actual deobfuscation on a target, you have multiple approaches:

• Perform hardcoded pattern searches on assembly IR and swap patterns with their corresponding counter-pattern (optimized realization).
• Perform some heuristics scanning, that takes rules into consideration and tries to collapse instructions.
• Utilize compiler-theory approaches for code-optimization at a low level to deobfuscate low-level code (again, deobfuscation == optimization).

All of the above should perform at least the following approaches:

• Branch prediction
• Constant folding
• Constant propagation

Thanks for your reply! So if I understand correctly, an extremely simple example of optimized realization would be, my program searches for:

```sub esp, 4
mov DWORD PTR SS:[esp], eax```

And swap it with:

`push eax`

I didn't quite understand your second bullet point there though, could you elaborate a bit more on heuristics scanning? Maybe using a simple example?

Appreciate the very specific and clear terminology though, that allows me to know which direction I should be headed to learn this stuff on my own.

##### Link to post
On ‎23‎.‎02‎.‎2018 at 14:53, Ezekiel said:

@Schwan Thanks for pointing it out. Yeah, I saw such.

If you want to express it in another style "dynamic" or so follow such: We can express eax as a constant 4 before hand to make viewing easier and can see that eax itself isn't used heavily to modify another register. A few steps down and it would be come (eax = 8) @ "add eax, 4" however, eax gets overwitten with ebx with the mov so totally nops out the "add eax, 4" meaning that we could also compress the "mov eax , 4" since eax becomes overwritten again after "sub ebx, 4". Knowing that then means that the current ebx is undetermined value before the "mov eax, ebx" so that is as far as one code reduce for that section (forces ebx to be a variable). after "mov eax, ebx" since ebx is still undetermined (forces ebx to be a variable). We have to consider that a eax is that of the ebx variable. Thus, any math applied after has to consider both eax and ebx to be variables which cannot be changed at the given moment.

```
sub ebx, 4              y,       x - 4
mov eax, ebx            x - 4,   x - 4
sub ebx, 4              x - 4,   x - 8
sub eax, 4              x - 8,   x - 8

should be the most you can reduce it to in the "dynamic" sense```

However, even then you can still simplify it to:

```
x = x - 4
y = x
x = x - 4
y = y - 4

y = x - 4
x = (x - 4) - 4
y = (x - 4) - 4

x = x - 8
y = x - 8

which will result in the same as the previous static way above :)```

~ there is no real difference.

@Ezekiel I hope you don't mind If I fix a mistake you made. I provided two solutions which should output correct values.

```x = x - 4				or		x = x - 4
y = x							y = x
x = x - 4						x = x - 4
y = y - 4						y = y - 4

y = x - 4						y = x - 4
y = (x - 4) - 4						x = (x - 4) - 4
x = (x - 4) - 4						y = x

y = x - 8						x = x - 8
x = x - 8						y = x```
Edited by Schwan

## Create an account or sign in to comment

You need to be a member in order to leave a comment

## Create an account

Sign up for a new account in our community. It's easy!

Register a new account