Saturday, August 20, 2011

Assembly Overview 1

--------
-Intro:
--------

Well since we're starting this blog I figured a good place to start is a simple overview on assembly and the high-level to low-level comparison. This is an article I wrote a couple years ago under a different handle but figured it got the point across so I would share it with you all. Without further ado, here is the good stuff.

------------
-Terminology
------------

Before we can just jump into the logic behind high-level to low-level conversions, we must first be familiarized with the instructions used in assembly language. Other articles cover certain ones, yet I felt the more important ones were over-looked; therefore here is a simplified list:

Conditional codes:

cmp --- Compare, simple enough, compares two values, whether they be variables, or constants.
je --- Jump if equal, will perform the jump if the preceding values are equal
jne --- Jump if not equal, performs the jump if values are not equal
jmp --- Unconditional jump, will jump regardless

Mathematical:

add, sub, mult, div --- Common sense, don't need an explanation
shl --- Shift to the left, performs a 'binary shift' to the left, which is equivalent to multiplying by 2
shr --- Shift to the right, performs another 'binary shift', which is dividing by 2

Calls:

calls require a bit of a note before I can introduce the actual instructions. Firstly, there are two types of commands, imported, and internal. Secondly, CALL instructions include prologues(the 'call' itself) and epilogues(a 'ret' instruction), which are what make them easily recognizable from a reversing point of view.

call codeaddress --- Sample of an internal function, where it will jump to the written address

call DWORD PTR [pointer] --- Sample of an external function, pointers are above the scope of this article, however it is enough to understand that it will jump to the code that the pointer points to, not the address of the pointer itself.

Registers:

If assembly language code were to access RAM every time it needed to store a variable or a value, then programs would run horrendously slow. To counter this, CPUs have their own internal memory, called registers. Typical IA-32 processors have 8 registers, all of which are generic. The functions of ESI, EDI, EBP, and ESP are unfortunately outside the scope of this article, however here is a list of those 8, and a list of their general use:

EAX, EBX, EDX --- Used for any integer/boolean/logical/memory operations
ECX --- Sometimes used as a counter, as those used in loops
ESI/EDI --- Used as source/destination pointers
EBP --- Used as a stack base counter
ESP --- Used as the CPU stack pointer

----------------
Logic: Branching
----------------

Ok, now that we've gotten the basics out of the way, let's move on to the actual logic. Ever wonder how high-level language statements are represented in assembly language? no? Well what are you doing reading this article! Suffice to say, assembly language is represented much more abstractly, and sometimes it may not be easy to recognize the similarities. However, I will attempt clarify the ambiguity, allowing you to recognize patterns, and then reconstruct the high-level code from the assembly language code.

** Single Branching **

Single branching refers to a fork, where when code reaches a point, it will either go in one direction, or the other. This is represented in high-level programming with an 'if' statement.

-- The high level representation --

if (variable == 10)
CallFunction();

-- Assembly language representation --

mov eax, [variable] ---> moves variable into eax
cmp eax, 10 ---> compares value with 10
jne Afterfunction ---> jumps to Afterfunction if not equal
call Function
Afterfunction ---> section it jumps to if not equal

As you can see, the code moves our inputted variable into eax, then will cmp eax with 10. Now if they are equal it will skip over the jne, and move to 'call Function', which will call up our function. If they are NOT equal however, once the code reaches 'jne Afterfunction', it will see that they are not equal, and jump to 'Afterfunction'.

** Two Way Conditional Branching **

Two way conditionals are similar to single branching. They are represented by an if-else statement in high-level coding.

-- The high level representation --

if (variable == 10)
Function();
else
OtherFunction();

-- Assembly language representation --

mov eax, [variable]
cmp eax, 10
jne OtherFunction
call Function
jmp AfterIfElse
call OtherFunction
AfterIfElse...

A quick summary about the assembly code: Moves our variable into eax, then compares it to 10. If it is not equal to 10, it will jump to 'call OtherFunction', which calls the respected function and then exits the conditional section. If it is equal to 10, it does not jump, moves on to 'call Function', which again calls the respected Function, then after the function is complete, returns, and unconditionally jumps out of the conditional section.

** Multiple-Alternative Branching **

This is essentially the same as two way, just instead of an else, it would be an else-if. I will therefore quickly go over this one and skip the summary.

-- The high level representation --

if (variable == 10)
Function();
else if (variable == 365)
OtherFunction();

-- Assembly language representation --

mov eax, [variable]
cmp eax, 10
jne OtherFunction
call Function
jmp Afterfunction
cmp eax, 365 ---> Where 'jne Otherfunction' jumps to
jne Afterfunction
call OtherFunction
Afterfunction...

** Compound Conditional Branching **

Compound Conditionals refer to when more than one condition must be satisfied before the condition is s

-- The high level representation --

if (variable == 10 &&
variable2 == 50)
Function();

-- Assembly language representation --

cmp [variable], 100
jne Afterfunction
cmp [variable2], 50
jne Afterfunction
call Function
Afterfunction...

The same kind of conditional can be built using OR's instead of AND's.

-- The high level representation --

if (variable == 10 ||
variable2 == 50)
Function();

-- Assembly language representation --

cmp [variable], 10
je Function
cmp [variable2], 50
je Function
jump Afterfunction
call Function
Afterfunction...

As you can see, the code checks if the first variable equals 10, and if it does, jumps to the 'call Function', disregarding the second variable. If, however, the first variable does not equal 10, then the code immediately check the second variable to see if it equals 50. If it is, it then jumps to the 'call Function'. Otherwise it then unconditionally jumps to after the function at Afterfunction.

-----------
Logic:Loops
-----------

Loops are an essential part of programming. Loops are essentially the same as the code we have been watching except that the code gets repeated. There are essentially two types of loops, Pre-tested loops, and Post-tested loops. As their names indicate, pre-tested check the conditions before the code is executed, and post-tested execute the code first, then check the conditions.

** Pre-Tested Loop **

Pre-tested loops are slightly less efficient then post-tested loops in the fact that they need two jumps: a conditional branch in the beginning, and an unconditional jump at the end that jumps back to the beginning.

-- The high level representation --

a = 0
while (a < 100)
{
array[a] = a;
c++;
}

-- Assembly language representation --

mov ecx, DWORD PTR [array]
xor eax,eax ---> xor's eax to zero
mov DWORD PTR [ecx+eax*4], eax ----> beginning of loop
add eax, 1
cmp eax, 100
jl ------------------> Jumps to beginning of loop if lower then 100

Ok, this code needs a bit of explaining. Firstly, you might be wondering why the conditional is at the beginning in the high-level code, yet at the end of the assembly language code? That is actually due to the compiler. It is slightly more efficient to put the test at the end of the code, since it will not require an unconditional JMP, so the compiler moved it there. I will go more in-depth later in the article. What the code is doing is firstly XORing eax, which is setting it to zero. Then the code will execute the loop, ADD 1 to eax, which is our counter. It will then CMP our counter to 100, and if it is less it will jump back to the beginning of the loop, essentially continuing until it reaches 100, where it will then exit the loop.

** Post-Tested Loop **

If a compiler already produces a post-tested loop from a pre-tested loop, you may be asking yourself what the compiler will do with a post-tested loop. If you answered "They would essentially be the same" you are right! The compiler basically re-arranges the code to make it the most efficient. So a sample would look like:

-- The high level representation --

c=1;
do
{
array[a] = c;
c++;
} while (c < 100);


-- Assembly language representation --

mov ecx, DWORD PTR [array]
mov DWORD PTR [ecx+eax*4], eax ----> beginning of loop
add eax, 1
cmp eax, 100
jl ------------------> Jumps if lower then 100

As you can see the only difference is that there is no XORing involved in the post-tested. Since the Pre-tested loop's conditional statement was moved to the end, the compiler added the XOR function to make sure 'a' was within the boundaries of the conditional. The only difference with the post-tested is that in the post-tested the code will run at least once, no matter what a is set as, therefore there is no need to modify a.

----------
Efficiency
----------

The code you have seen so far in this article is only one of many ways of interpreting a high-level language. The style is mostly dependent on the compiler you use, however most compilers will strive to achieve the highest efficiency. Efficiency can be accomplished by either generating the smallest possible program binaries, or the most high high-performance code possible. In this section I'll give examples of such Optimization.

Mathematically:

Multiplication and Division are considered 'Complex' operations by the CPU. To compensate for that, the CPU will perform a series of 'Simpler' operations. Although there may be many more operations to get the equivalent result, the net result will still be faster than the slower multiplication and division.

** Multiplication **

A sample of multiplying a variable by 32 in an efficient way:

lea eax, DWORD PTR [edx+edx] ----> lea is what the processor uses to make quick additions and shifts
add eax, eax
add eax, eax
add eax, eax
add eax, eax

As you can see, the first line will add your variable which is stored in edx with itself, then store the result in eax. Then the code will add eax with itself 4 times, resulting in 32 times the original edx value.

** Division **

Division is by far the slowest of the arithmical operations of a processor. Division is upwards 50 cycles slower then Addition and Subtraction, each of which is less than 1 cycle. As a result, the CPU treats divisions differently and attempts to streamline this operation using 'Reciprocal Division'. This is not ALWAYS possible, as is the case with unknown divisors, however whenever possible, reciprocal division is sought after. An example of a typical reciprocal-division that is dividing our value by 3 is:

mov ecx, eax ---> moves our number into ecx
mov eax ,0xaaaaaaab ---> moves 2/3 into eax
mul ecx ---> multiplies ecx by eax, so our value * (2/3)
shr edx, 2 ---> then divides our value by 2, so value * (2/3) / 2 = value * (1/3)
mov eax, edx ---> then moves our divided value back into eax

What is that 0xaaaaaab you might be asking? Well that is how Reciprocal-Division is used. In the above code, the function is dividing the value by 3. It is combining a reciprocal 0xaaaaaab, whose value in fractional is 2/3, with another divisor of 2, which comes out to being divided by 3. (2/3) / 2 = (1/3). Combining a fraction with a simple divisor, such as a binary shift, the CPU can create a division as efficiently as possible.

Branching Optimization:

** Branching **

Optimization and efficiency aren't only restricted to mathematical operations. Code segments such as branching can also be modified to operate more efficiently. The following is an example of more efficient/less efficient using a simple IF-OR statement like the one shown earlier in the article.

--- less efficient ---

cmp [variable], 10
je Function
cmp [variable2], 365
je Function
jmp Afterfunction
Call Function
Afterfunction...

--- more efficient ---

cmp [variable], 100
je Function
cmp [variable2], 365
jne Afterfunction
call Function
Afterfunction...

As you can see both codes execute the same branching conditional, however the section on the right has less instructions, and is therefore more efficient. The same can be said for pretty much all sections of code, there is almost always a way to make the code more efficient.


And that's it for the first installation. Hopefully you all enjoyed it. Comments and suggestions are welcome. Until next time,

padraignix

No comments:

Post a Comment