Reversing ASM to source code - Challenge 1

Can we take x86 assembly and turn it back into source code?

I thought it would be fun to write a simple C/C++ program and have a friend reverse it and replicate the source as closely as possible. He, in turn, sent me the same challenge. This is a short post about my process of reversing his code. We’ll see how close we got to the original source at the end.

A couple of quick notes: optimization is off and we’re in debug mode, and “Just My Code” debugging looks like it is off. Judging by the bottom of the function (cmp ebp, esp followed by a call), stack frame checks are on, so I can safely ignore that code.

Let’s get to work.

Here is the entirety of the function we’re going to reverse.

Assembly code of the function we're going to reverse

The first thing I noticed in the prologue is the stack space reserved for this function’s (_main) local variables.

sub esp, C0

The compiler has allocated 192 bytes for local variables. A couple of instructions below we have a

rep stosd

Which stores the value in EAX (0xCCCCCCCC) at the address in EDI, ECX times, basically a memset of the local variable area to 0xCCCCCCCC. For the life of me I can’t figure out how to generate this instruction. I’ve tried…

unsigned char buf[192] = { 0xCC };

That generates a memset call and doesn’t fill all the bytes: it sets the first one to 0xCC and zeros the rest. Clearly not what we want, because in the challenge binary we’re 95% positive there should be a buffer that is 48 dwords (192 bytes) long.

Assembly produced when I tried to replicate the rep stosd instruction
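To make the difference concrete, here’s a small portable C sketch (not part of the original challenge) that contrasts the two. fill_dwords is a hypothetical helper I’m using to model what rep stosd does with EAX = 0xCCCCCCCC and ECX = 48; the other function names are made up for illustration too.

```c
#include <stddef.h>
#include <stdint.h>

/* Models `rep stosd`: store `value` into `count` consecutive dwords at dst. */
void fill_dwords(uint32_t *dst, size_t count, uint32_t value)
{
    for (size_t i = 0; i < count; ++i) {
        dst[i] = value;
    }
}

/* Counts how many of the first `len` bytes of buf equal `byte`. */
size_t count_bytes(const unsigned char *buf, size_t len, unsigned char byte)
{
    size_t hits = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == byte) {
            ++hits;
        }
    }
    return hits;
}

/* The initializer from my attempt: only buf[0] is 0xCC, the rest is zeroed. */
size_t cc_bytes_after_initializer(void)
{
    unsigned char buf[192] = { 0xCC };
    return count_bytes(buf, sizeof buf, 0xCC);
}

/* A rep stosd style fill: all 48 dwords (192 bytes) become 0xCC. */
size_t cc_bytes_after_fill(void)
{
    uint32_t frame[48];
    fill_dwords(frame, sizeof frame / sizeof frame[0], 0xCCCCCCCCu);
    return count_bytes((const unsigned char *)frame, sizeof frame, 0xCC);
}
```

Calling cc_bytes_after_initializer() gives 1, while cc_bytes_after_fill() gives 192, which is why the array initializer can’t be what produced the rep stosd.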

Oh well, let’s move on, we can come back to how that instruction was actually generated later.

Next there’s a function call

call reversemecolin1.F71212

Peeking inside, it looks like this

A small assembly function that features a call to GetCurrentThreadId

This is annoying. I’m not sure if that was hand-written by the programmer or generated by the compiler. It clearly has something to do with a Windows API (GetCurrentThreadId). Note the value at

mov ecx, reversemecolin1.F7C034

is

Memory view of 7 bytes at the address +0xF7C034

This function takes no stack arguments, just a pointer in ECX to some data that is already initialized. Checking which section of the exe this memory resides in reveals it’s in the .msvcjmc section, something specific to Visual Studio. Googling .msvcjmc reveals that JMC stands for “Just My Code”, so that debugging feature is actually on! We were wrong initially. It’s safe to conclude (I think) that the author didn’t write this portion manually; instead it was added by the compiler/linker to support JMC in Visual Studio. And since there’s no evidence of EAX being used prior to the OpenClipboard call, it’s safe to assume the function doesn’t return anything particularly important. To replicate the code, we now know we need both JMC and stack frame checking enabled!
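For intuition, here’s a rough, portable model of what a helper like this appears to do, going by the disassembly (a flag byte reached through ECX, then a call to GetCurrentThreadId). This is my assumption about the logic, not the actual Visual Studio runtime source; stepping_thread_id, fake_current_thread_id and jmc_check_model are hypothetical names, and the thread ID is stubbed so the sketch runs anywhere.

```c
#include <stdint.h>

/* Hypothetical stand-in: nonzero only while a debugger is stepping a thread. */
uint32_t stepping_thread_id = 0;

/* Hypothetical stub for GetCurrentThreadId so the model is portable. */
uint32_t fake_current_thread_id(void)
{
    return 1234u;
}

/*
 * Returns 1 when the model would "notify" the debugger: the per-module flag
 * is set and the thread being stepped is the one running this code.
 */
int jmc_check_model(const unsigned char *jmc_flag)
{
    if (*jmc_flag == 0) {
        return 0; /* flag byte (the data ECX points at) is clear */
    }
    if (stepping_thread_id == 0) {
        return 0; /* nobody is stepping */
    }
    return stepping_thread_id == fake_current_thread_id();
}
```

With no debugger stepping, the model always returns 0, which matches the observation that the call has no visible effect on the rest of the function.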

In thinking more about the rep stosd instruction and the fact that stack frame checks are on, I’ve concluded that filling those 192 bytes is part of the runtime stack checks (/RTCs) rather than a buffer in the source: with that option the compiler initializes the local variable area to a nonzero pattern, 0xCC, so reads of uninitialized memory stand out. 0xCC also happens to be the opcode for int3, the breakpoint instruction. Great, no need to worry about that anymore, whew!

Basic Runtime Checks: Stack Frames (/RTCs)
Security Check: Enable Security Check (/GS)
Support Just My Code Debugging: Yes (/JMC)
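A side note on the 0xCC fill: if a local is read before it’s written in a build like this, it still holds the fill pattern, so a 32-bit int shows up as a very recognizable value. A minimal sketch, assuming a 4-byte local inside the filled area (read_uninitialized_pattern is a hypothetical name, and the fill is simulated with memset rather than produced by the compiler):

```c
#include <stdint.h>
#include <string.h>

/* Simulates reading a 4-byte local that still holds the 0xCC fill pattern. */
int32_t read_uninitialized_pattern(void)
{
    unsigned char frame[192];
    int32_t local;

    memset(frame, 0xCC, sizeof frame);   /* stand-in for the prologue's fill */
    memcpy(&local, frame, sizeof local); /* view 4 filled bytes as an int */
    return local;                        /* 0xCCCCCCCC as a signed 32-bit value */
}
```

This returns -858993460, which is 0xCCCCCCCC interpreted as a signed 32-bit integer.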

Next we have a call to OpenClipboard. Pulling up the MSDN page to see how it works reveals this:

BOOL OpenClipboard(
  HWND hWndNewOwner
);

Pretty simple: you pass a window handle to the function and it returns a nonzero value if it succeeded. The call right after OpenClipboard turns out, when I peek inside, to be a stack check, so we can ignore that.

test eax, eax

This checks the result of OpenClipboard and, if it returned zero (failure), jumps over some code.

This is what our source now looks like:

#include <Windows.h>

int main()
{
	if (OpenClipboard(NULL)) {
		__debugbreak();
	}

	return 0;
}

And the assembly is starting to take shape

Image of the assembly produced by our source code
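If the test eax, eax pattern is new to you, here’s a tiny model of it in portable C. test ANDs the register with itself and sets the zero flag when the result is 0; the conditional jump that follows (I’m assuming a je/jz here) then skips the if body. The function names are hypothetical.

```c
#include <stdint.h>

/* Models `test eax, eax`: returns the zero flag (1 when eax is 0). */
int zero_flag_after_test(uint32_t eax)
{
    return (eax & eax) == 0;
}

/* Models the jump that follows: the if body runs only when ZF is clear. */
int if_body_runs(uint32_t eax)
{
    return !zero_flag_after_test(eax);
}
```

So a failed OpenClipboard (EAX = 0) skips the body, and any nonzero return value falls through into it.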

Next up is a call to EmptyClipboard, which at first glance doesn’t take any arguments. MSDN confirms it:

BOOL EmptyClipboard();

Return Value
Type: BOOL

If the function succeeds, the return value is nonzero.
If the function fails, the return value is zero. To get extended error information, call GetLastError.

The return value is unused by the original programmer (there’s no reference to EAX afterwards), so we can safely discard it. The same goes for the CloseClipboard call that follows. That results in this change to the source

EmptyClipboard();
CloseClipboard();

And finally there’s a message box popup outside of the if statement. It will look like this

MessageBoxW(0, L"done!", 0, MB_OK);

Since MB_OK is defined as 0x00000000L, passing it as the last argument compiles to the same thing as a bare 0 but reads more explicitly for a MessageBox call.

Diffing our generated asm against the original shows the two are exactly the same, apart from addresses that differ slightly because the binaries were loaded at different offsets!

Hex diff of the bytes between the original file and the file we produced with our source code
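If you don’t have a hex diff tool handy, the comparison is easy to sketch in C. The helper below (diff_offsets is a name I made up) walks two equally sized byte buffers and records where they differ; it assumes you’ve already read the bytes of each function into memory.

```c
#include <stddef.h>

/*
 * Compares len bytes of a and b. Stores up to max differing offsets in out
 * and returns the total number of differing bytes.
 */
size_t diff_offsets(const unsigned char *a, const unsigned char *b, size_t len,
                    size_t *out, size_t max)
{
    size_t total = 0;
    for (size_t i = 0; i < len; ++i) {
        if (a[i] != b[i]) {
            if (total < max) {
                out[total] = i; /* remember where the mismatch is */
            }
            ++total;
        }
    }
    return total;
}
```

For two builds of the same source, I’d expect the reported offsets to land only inside address operands (call targets, string pointers), never in opcodes.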

We did it!

Final source code

#include <Windows.h>

int main()
{
	if (OpenClipboard(NULL))
	{
		EmptyClipboard();
		CloseClipboard();
	}

	MessageBoxW(0, L"done!", 0, MB_OK);

	return 0;
}

A trivial example for sure. I hope I’ve illustrated the process of recovering 1:1 source code from nothing but x86 asm, and shown how to deduce the compiler settings that were used.