In response to a Quora question, I wrote the following:
Of course it is possible [to reverse engineer code without the source]. Let me give you a simple example.
I am looking at a Linux executable that I created moments ago, and I'll pretend that I don't have its source code. The executable file is called t.
First, let me disassemble the file:
objdump -d t > t.asm
And just in case, let me also dump any global data:
objdump -s -j .rodata t > t.dat
And now let's look at the result. The file t.asm is over 200 lines long, but a lot of it is just the standard C runtime startup and cleanup code that gets linked into every program.
The relevant bit is the main() function, which begins as follows:
00000000004004ec <main>:
  4004ec: 55             push   %rbp
  4004ed: 48 89 e5       mov    %rsp,%rbp
  4004f0: 48 83 ec 10    sub    $0x10,%rsp
This is just the standard C function prologue. The caller's frame pointer is saved, %rbp becomes the new frame pointer, and 16 (0x10) bytes are reserved on the stack for local (automatic) variables by adjusting the stack pointer.
... 4004f4: c7 45 fc 02 00 00 00 movl $0x2,-0x4(%rbp)
The value 2 is stored in the first local variable, a 32-bit integer located 4 bytes below the frame pointer. So let me write down some C code that corresponds to this:
...
int A; // First variable
A = 2;
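The operands can also be read straight from the instruction bytes. As a minimal sketch that handles only this one instruction form (opcode c7 with an 8-bit displacement), and with helper names made up for illustration, the displacement byte fc is a signed value, -4, while the four bytes 02 00 00 00 are the constant 2 stored least significant byte first:

```c
#include <stdint.h>

/* Bytes of "movl $0x2,-0x4(%rbp)" as listed by objdump at 4004f4:
   c7 = opcode, 45 = ModRM byte, fc = 8-bit displacement,
   02 00 00 00 = 32-bit immediate. (Hypothetical array name.) */
static const uint8_t movl_bytes[7] = { 0xc7, 0x45, 0xfc, 0x02, 0x00, 0x00, 0x00 };

/* Sign-extend an 8-bit displacement: bytes 0x80..0xff are negative. */
static int disp8(uint8_t b)
{
    return b >= 0x80 ? (int)b - 0x100 : (int)b;
}

/* Assemble a 32-bit little-endian value (least significant byte first)
   and interpret it as a signed number. */
static int64_t imm32(const uint8_t *p)
{
    uint32_t v = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16
               | (uint32_t)p[3] << 24;
    return v >= 0x80000000u ? (int64_t)v - 0x100000000LL : (int64_t)v;
}
```

The same sign-extension rule explains the -0x8(%rbp) operand that appears next: the displacement byte f8 is -8.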
Continuing, I read:
...
  4004fb: 8b 45 fc       mov    -0x4(%rbp),%eax
  4004fe: 0f af 45 fc    imul   -0x4(%rbp),%eax
  400502: 89 45 f8       mov    %eax,-0x8(%rbp)
Here, the first variable is moved to the accumulator, multiplied by itself, and stored in a second variable (also a 32-bit integer). I'd write this as:
...
int B; // Second variable
B = A * A;
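Which register and which memory operand these instructions use is encoded in their ModRM byte, which is 45 in all three of them. A minimal sketch of splitting that byte into its three fields; the struct and function names are made up for illustration, and the register table covers only the eight legacy 32-bit registers (no REX prefixes):

```c
#include <stdint.h>

/* Fields of an x86 ModRM byte:
   mod (bits 7-6) selects register vs. memory and the displacement size,
   reg (bits 5-3) names a register operand,
   r/m (bits 2-0) names the other operand (a register or a base register). */
struct modrm {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 3u;
    m.reg = (byte >> 3) & 7u;
    m.rm  = byte & 7u;
    return m;
}

/* 32-bit register names in encoding order 0..7 (without REX prefixes). */
static const char *const reg32_names[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};
```

For 0x45 this gives mod = 1 (memory operand with an 8-bit displacement), reg = 0 (%eax) and r/m = 5, which with that mod value means the base register %rbp plus the displacement. Applied to the byte d6 in the later mov %edx,%esi, it gives mod = 3 (register to register), reg = 2 (%edx) and r/m = 6 (%esi).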
Carrying on:
... 400505: b8 0c 06 40 00 mov $0x40060c,%eax
The address 0x40060c is loaded into %eax. What is at this address? This is where the second file I created, t.dat, comes in handy, as it contains a dump of the read-only data section (.rodata):
Contents of section .rodata:
 400608 01000200 25640a00                    ....%d..
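The dump line starts at address 0x400608, so 0x40060c is four bytes in. There I find the bytes 25 64 0a 00, which spell the string "%d\n" followed by its terminating NUL.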
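The same lookup can be sketched in C. This assumes the dumped bytes were copied into an array by hand and handles only this one eight-byte line; string_at is a hypothetical helper, not part of any library:

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-copied from t.dat: "400608 01000200 25640a00".
   The first byte of the line lives at virtual address 0x400608. */
static const uint64_t rodata_base = 0x400608;
static const unsigned char rodata[] = {
    0x01, 0x00, 0x02, 0x00, 0x25, 0x64, 0x0a, 0x00
};

/* Return the NUL-terminated string that starts at virtual address addr,
   or NULL if addr lies outside the dump or no terminator follows it. */
static const char *string_at(uint64_t addr)
{
    if (addr < rodata_base || addr - rodata_base >= sizeof rodata)
        return NULL;
    size_t off = (size_t)(addr - rodata_base);
    for (size_t i = off; i < sizeof rodata; i++) {
        if (rodata[i] == 0)
            return (const char *)&rodata[off];
    }
    return NULL;
}
```

Calling string_at(0x40060c) returns a pointer to the four bytes '%', 'd', '\n' and NUL, i.e. the format string "%d\n".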
Going back to the code:
... 40050a: 8b 55 f8 mov -0x8(%rbp),%edx
The value of the second variable is loaded into %edx.
...
  40050d: 89 d6          mov    %edx,%esi
  40050f: 48 89 c7       mov    %rax,%rdi
  400512: b8 00 00 00 00 mov    $0x0,%eax
  400517: e8 cc fe ff ff callq  4003e8 <printf@plt>
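The arguments are now placed in the registers that the x86-64 calling convention uses: the address of the format string goes into %rdi (first argument) and the value of the second variable into %esi (second argument). %eax is set to zero because, for a variadic function such as printf, it tells the callee how many vector registers carry arguments (none here). Then printf is called. In other words, the original C code had to look something like this: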
... printf("%d\n", B);
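Even the 4003e8 that objdump prints next to callq is computed from the instruction bytes: e8 is a near call whose operand is a signed 32-bit displacement measured from the address of the following instruction. A minimal sketch that reproduces that address (call_target is a hypothetical name; only the 5-byte e8 form is handled):

```c
#include <stdint.h>

/* Target of a near call "e8 xx xx xx xx": the 32-bit little-endian
   displacement is relative to the next instruction (call address + 5). */
static uint64_t call_target(uint64_t call_addr, const uint8_t insn[5])
{
    uint32_t raw = (uint32_t)insn[1]
                 | (uint32_t)insn[2] << 8
                 | (uint32_t)insn[3] << 16
                 | (uint32_t)insn[4] << 24;
    /* Sign-extend without relying on implementation-defined casts. */
    int64_t rel = raw >= 0x80000000u ? (int64_t)raw - 0x100000000LL
                                     : (int64_t)raw;
    return call_addr + 5 + (uint64_t)rel;   /* wraps correctly for rel < 0 */
}
```

For the call at 0x400517, the next instruction is at 0x40051c, the displacement fffffecc is -0x134, and 0x40051c - 0x134 = 0x4003e8, the address of the printf@plt stub.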
Finally:
...
  40051c: c9             leaveq
  40051d: c3             retq
  40051e: 90             nop
  40051f: 90             nop
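leaveq tears down the stack frame, retq returns to the caller, and the two nop instructions are merely padding. This concludes the main() function.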
So fully reconstructed, the program reads:
main()
{
    int A;
    int B;
    A = 2;
    B = A * A;
    printf("%d\n", B);
}
For comparison, this was my original source code:
#include <stdio.h>

void main(void)
{
    int i, j;
    i = 2;
    j = i * i;
    printf("%d\n", j);
}
Apart from the variable names (which are not preserved in the executable unless it was compiled with debugging information), the code was correctly reconstructed.
Now, I am not trying to give the impression that reverse engineering is always easy. On the contrary, with real-world code it can be fiendishly difficult, not least because compiler optimizations can produce machine code that looks nothing like the original source. But it can be done.