IA-32 Assembly for Compiler Writers

Revised 18 November 2008

This is a brief introduction to IA32 assembly language for beginning compiler writers using Unix-like machines. It is not a complete description of the IA32 architecture, nor is it enough to write efficient programs from scratch. It is enough knowledge to build a compiler.

Note that the IA32 instruction set is described in several large volumes made freely available by Intel. These volumes should not be read cover to cover, but should be used to look up particular technical details once you have read this introduction. In particular, volumes 2A and 2B describe every instruction in great detail.

  • Volume 1: Overview
  • Volume 2A: Instructions A-M
  • Volume 2B: Instructions N-Z
  • Volume 3: System Programming

    Tools

    For these examples, we will be using the GNU compiler and assembler, known as gcc and as. A quick way to learn something about assembly is to view the assembler output of the compiler. To do this, run gcc with the -S flag, and the compiler will produce assembly output rather than a binary program. On Unix-like systems, assembly code is stored in files ending with .s. (The suffix "s" stands for "source" file, whereas the suffix "a" is used to indicate an "archive" (library) file.) If running on a 64-bit machine, you will also need the -m32 option to force output of 32-bit code. So, gcc -m32 -S hello.c on this program:
    #include <stdio.h>

    int main( int argc, char *argv[] )
    {
            printf("hello world!\n");
            return 0;
    }
    
    will yield a file hello.s that looks something like this:
            .file   "test.c"
            .section        .rodata
    .LC0:
            .string "hello world!\n"
            .text
    .globl main
            .type   main,@function
    main:
            pushl   %ebp
            movl    %esp, %ebp
            subl    $8, %esp
            andl    $-16, %esp
            movl    $0, %eax
            subl    %eax, %esp
            subl    $12, %esp
            pushl   $.LC0
            call    printf
            addl    $16, %esp
            movl    $0, %eax
            leave
            ret
    .Lfe1:
            .size   main,.Lfe1-main
            .section        .note.GNU-stack,"",@progbits
            .ident  "GCC: (GNU) 3.2.3 20030502 (Red Hat Linux 3.2.3-54)"
    
    Note that the assembly code has three different kinds of elements:
  • Directives begin with a dot and indicate structural information useful to the assembler or the linker. You will need to know two directives: .globl and .string. For example, .globl main indicates that the label main is a global symbol that can be referenced by other code modules. .string indicates a string constant that the assembler should insert into the output code. You need not be concerned with the other directives shown.
  • Labels end with a colon and indicate by their position the association between names and locations. For example, the label .LC0: indicates that the immediately following string should be called .LC0. The label main: indicates that the instruction pushl %ebp is the first instruction of the main function. By convention, labels beginning with a dot are temporary local labels generated by the compiler, while other symbols are user-visible functions and global variables.
  • Instructions are everything else, typically indented to visually distinguish them from directives and labels.

    By default, the assembly output is not optimized: it has many unnecessary instructions. It is interesting to consider the output of the compiler when we turn on optimization with the -O flag:

            .file   "test.c"
            .section        .rodata.str1.1,"aMS",@progbits,1
    .LC0:
            .string "hello world!"
            .text
    .globl main
            .type   main,@function
    main:
            pushl   %ebp
            movl    %esp, %ebp
            subl    $8, %esp
            andl    $-16, %esp
            subl    $12, %esp
            pushl   $.LC0
            call    puts
            movl    $0, %eax
            leave
            ret
    .Lfe1:
            .size   main,.Lfe1-main
            .section        .note.GNU-stack,"",@progbits
            .ident  "GCC: (GNU) 3.2.3 20030502 (Red Hat Linux 3.2.3-54)"
    
    This is a very aggressive optimizer! Not only have several unnecessary instructions been removed, but the compiler has determined that a call to the (complicated) function printf can be replaced with a call to puts, which only outputs strings! Your compiler need not be quite this clever.

    To take this assembly code and turn it into a runnable program, just run gcc, which will figure out that it is an assembly program, assemble it, and link it with the standard library:

    % gcc -m32 hello.s -o hello
    
    Be sure to take advantage of the fact that GCC emits assembly code. If you don't know quite what instructions to generate with your compiler, see what GCC emits and then look up the details in the Intel manual.

    Now that you know what tools to use, let's begin to look at the assembly instructions in detail.

    IA32 Registers and Data Types

    IA32 has six (almost) general purpose 32-bit registers:

    %eax  %ebx  %ecx  %edx  %esi  %edi

    We say almost general purpose because earlier versions of the processors had restrictions on which registers could be used for various purposes. As the design developed, new instructions and addressing modes were added to make the various registers almost equal. A few remaining instructions, particularly related to string processing, require the use of %esi and %edi. In addition, two registers are employed as the stack pointer and the base pointer:

    %esp  %ebp

    The IA32 architecture has expanded from 8 to 16 to 32 bits over the years, and so each register has some internal structure that you should know about:

    Register  Size
    %ah       8 bits
    %al       8 bits
    %ax       16 bits
    %eax      32 bits

    The lowest 8 bits of the %eax register are an 8-bit register %al, and the next 8 bits are known as %ah. The low 16 bits are collectively known as %ax, and the entire 32 bits are known as %eax. The same naming scheme applies to %ebx, %ecx, and %edx. The registers %esi, %edi, %esp, and %ebp have only 16-bit short forms (%si, %di, %sp, and %bp). Although we will generally make use of the full 32 bits, you are likely to encounter code that uses the small registers.
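
    The relationship between %eax and its smaller names can be sketched in C with masks and shifts. This is only an illustration: the function names reg_al, reg_ah, reg_ax, and write_al are hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the part of a 32-bit %eax value
   that the named sub-register refers to. */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */

/* Writing to %al replaces only the low byte; the upper 24 bits of %eax
   keep their old contents. */
uint32_t write_al(uint32_t eax, uint8_t value)
{
    uint32_t result = (eax & 0xFFFFFF00u) | value;
    assert(reg_al(result) == value);   /* the low byte now holds the new value */
    return result;
}
```

    For instance, if %eax holds 0x12345678, then %al is 0x78, %ah is 0x56, and %ax is 0x5678.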

    Addressing Modes

    The first instruction that you should know about is the MOV instruction, which moves data between registers and to and from memory. IA-32 is a complex instruction set (CISC) machine, so the MOV instruction has many different variants that move different types of data between different cells.

    MOV, like most instructions, has a single letter suffix that determines the amount of data to be moved. The following names are used to describe data values of various sizes:

    Suffix  Name  Size
    B       BYTE  8 bits
    W       WORD  16 bits
    L       LONG  32 bits

    So, MOVB moves a byte, MOVW moves a word, and MOVL moves a long. Clearly, the size of the locations you are moving to and from must match the suffix. It is possible to leave off the suffix, and the assembler will attempt to choose the right size based on the arguments. However, this is not recommended, as it can have unexpected effects.

    The arguments to MOV can have one of several addressing modes:

  • A global value is simply referred to by an unadorned name such as x or printf.
  • An immediate value is a constant indicated by a dollar sign, such as $56.
  • A register value is the name of a register, such as %ebx.
  • An indirect value is referred to by the address contained in a register. For example, (%esp) refers to the value pointed to by %esp.
  • A base-relative value is given by adding a constant to the address in a register. For example, -12(%ecx) refers to the value at the memory location twelve bytes below the address held in %ecx. This mode is important for manipulating stacks, local values, and function parameters.
  • There are a variety of more complex variations on base-relative. For example, -12(%esi,%ebx,4) refers to the value at the address -12+%esi+%ebx*4. This mode is useful for accessing array elements, with the scale factor (1, 2, 4, or 8) giving the element size.

    Here is an example of using each kind of addressing mode to load a value into %eax:

    Mode                         Example
    Global Symbol                MOVL x, %eax
    Immediate                    MOVL $56, %eax
    Register                     MOVL %ebx, %eax
    Indirect                     MOVL (%esp), %eax
    Base-Relative                MOVL -4(%ebp), %eax
    Offset-Scaled-Base-Relative  MOVL -12(%esi,%ebx,4), %eax

    Of course, the same addressing modes may be used to store data into registers and memory locations. IA32 is a CISC architecture, so most instructions allow many combinations of these addressing modes. However, not all modes are supported. For example, it is not possible to use base-relative for both arguments of MOV: MOVL -4(%ebx), -4(%ebx). To see exactly what combinations of addressing modes are supported, you must read the manual pages for the instruction in question.
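
    To make the most general mode concrete, here is a small C sketch of the address that an operand of the form disp(base,index,scale) refers to. The function name effective_address is an assumption made for illustration; the arithmetic wraps to 32 bits, as addresses do on this architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Address referred to by the AT&T operand disp(base,index,scale):
   disp + base + index*scale, computed modulo 2^32. */
uint32_t effective_address(int32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    /* IA-32 can only encode scale factors of 1, 2, 4, or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return (uint32_t)disp + base + index * scale;
}
```

    With %esi holding 0x1000 and %ebx holding 3, the operand -12(%esi,%ebx,4) refers to address 0x1000: the twelve bytes added by the scaled index cancel the displacement of -12.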

    You will need four basic arithmetic instructions for your compiler: ADD, SUB, IMUL, and IDIV. The first three instructions have two operands: a source and a destructive target. For example, this instruction:

    ADDL %ebx, %eax
    
    adds %ebx to %eax, and places the result in %eax, overwriting what might have been there before. This requires that you be a little careful in how you make use of registers. For example, suppose that you wish to translate c = b*(b+a), where a and b are global integers. To do this, you must be careful not to clobber the value of b when performing the addition. Here is one possible translation:
    MOVL a, %eax
    MOVL b, %ebx
    ADDL %ebx, %eax
    IMULL %ebx, %eax
    MOVL %eax, c
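
    The same sequence can be traced in C by treating each register as a variable that is overwritten in place. This is a sketch only: compute_c is a hypothetical name, and unsigned arithmetic is used so that overflow wraps the way the 32-bit registers do.

```c
#include <assert.h>
#include <stdint.h>

/* Traces c = b*(b+a) one instruction at a time.  Each operation
   overwrites its destination "register", just as ADDL and IMULL do. */
uint32_t compute_c(uint32_t a, uint32_t b)
{
    uint32_t eax = a;    /* MOVL  a, %eax                      */
    uint32_t ebx = b;    /* MOVL  b, %ebx                      */
    eax += ebx;          /* ADDL  %ebx, %eax  -> eax = a + b   */
    eax *= ebx;          /* IMULL %ebx, %eax  -> eax = (a+b)*b */
    assert(ebx == b);    /* b survived because it was only ever a source */
    return eax;          /* MOVL  %eax, c                      */
}
```

    With a = 2 and b = 3, the result is (3+2)*3 = 15.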
    
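    The IDIV instruction is a little unusual. It implicitly expects a 64-bit dividend in the register pair %edx:%eax, and it accepts the divisor as a register or memory operand (an immediate value is not allowed). The quotient is placed in %eax and the remainder in %edx. For example, to divide a by five, load a into %eax, sign-extend it into %edx with CLTD (called CDQ in the Intel manual), and place the divisor in a register:
    MOVL a, %eax
    CLTD
    MOVL $5, %ebx
    IDIVL %ebx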
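
    The results of a signed division can be modeled in C, whose / and % operators (since C99) truncate toward zero in the same way as IDIV. The type idiv_result and the function idiv_model are hypothetical names used only for this sketch, which assumes a 32-bit dividend that has been sign-extended into %edx:%eax.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t quotient;    /* what IDIV leaves in %eax */
    int32_t remainder;   /* what IDIV leaves in %edx */
} idiv_result;

/* Models a 32-bit signed division of dividend by divisor. */
idiv_result idiv_model(int32_t dividend, int32_t divisor)
{
    /* These are the two cases in which the hardware raises a divide error. */
    assert(divisor != 0);
    assert(!(dividend == INT32_MIN && divisor == -1));

    idiv_result r;
    r.quotient  = dividend / divisor;   /* truncates toward zero */
    r.remainder = dividend % divisor;   /* takes the sign of the dividend */
    return r;
}
```

    Dividing -17 by 5 therefore gives a quotient of -3 and a remainder of -2, not -4 and 3.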
    

    The instructions INC and DEC increment and decrement a register destructively. For example, the statement a = ++b could be translated as:

    MOVL b, %eax
    INCL %eax
    MOVL %eax, a
    
    Boolean operations work in a very similar manner: AND, OR, and XOR perform destructive boolean operations on two operands, while NOT performs a destructive boolean-not on one operand.

    Like the MOV instruction, the various arithmetic instructions can work on a variety of addressing modes. However, for your compiler project, you will likely find it most convenient to use MOV to load values in and out of registers, and then use only registers to perform arithmetic.

    AT&T Syntax versus Intel Syntax

    Note that the GNU tools use the traditional AT&T syntax, which is used across many processors on Unix-like operating systems, as opposed to the Intel syntax typically used on DOS and Windows systems. The following instruction is given in AT&T syntax:

    movl %esp, %ebp
    
    movl is the name of the instruction, and the percent signs indicate that esp and ebp are registers. In the AT&T syntax, the source is always given first, and the destination is always given second.

    In other places (such as the Intel manual), you will see the Intel syntax, which (among other things) dispenses with the percent signs and the size suffix, and reverses the order of the arguments. For example, this is the same instruction in the Intel syntax:

    MOV EBP, ESP
    
    When reading manuals and web pages, be careful to determine whether you are looking at AT&T or Intel syntax: look for the percent signs!

    Comparisons and Jumps

    Using the JMP instruction, we may create a simple infinite loop that counts up from zero using the %eax register:
            MOVL $0, %eax
    loop:
            INCL %eax
            JMP loop
    
    To define more useful structures such as terminating loops and if-then statements, we must have a mechanism for evaluating values and changing program flow. In most assembly languages, these are handled by two different kinds of instructions: compares and jumps.

    All comparisons are done with the CMP instruction. CMP compares two operands and then sets a few bits in the internal EFLAGS register, recording whether the values are the same, greater, or lesser. You don't need to look at the EFLAGS register directly. Instead, a selection of conditional jumps examine the EFLAGS register and jump appropriately. Note that in AT&T syntax the comparison reads right to left: CMPL $5, %eax followed by JL jumps if %eax is less than 5.

    Instruction  Meaning
    JE           Jump If Equal
    JNE          Jump If Not Equal
    JL           Jump If Less Than
    JLE          Jump If Less or Equal
    JG           Jump If Greater Than
    JGE          Jump If Greater or Equal
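
    The conditions above can be modeled in C as signed comparisons. In this sketch the names jump_kind and jump_taken are hypothetical; the argument order follows the AT&T form CMPL src, dst, where the jump condition describes dst relative to src.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { JE, JNE, JL, JLE, JG, JGE } jump_kind;

/* Reports whether the given conditional jump is taken when it follows
   "CMPL src, dst" (AT&T operand order), using signed comparison. */
bool jump_taken(jump_kind kind, int32_t src, int32_t dst)
{
    switch (kind) {
    case JE:  return dst == src;
    case JNE: return dst != src;
    case JL:  return dst <  src;
    case JLE: return dst <= src;
    case JG:  return dst >  src;
    case JGE: return dst >= src;
    }
    assert(!"unknown jump kind");
    return false;
}
```

    For example, after CMPL $5, %eax with %eax equal to 5, JLE is taken but JL is not.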

    For example, here is a loop to count %eax from zero to five. The jump is taken while %eax is still less than 5, so the loop exits once %eax reaches 5:

            MOVL $0, %eax
    loop:
            INCL %eax
            CMPL $5, %eax
            JL   loop
    
    And here is a conditional assignment: if global variable x is greater than zero, then global variable y gets ten, else twenty:
            MOVL x, %eax
            CMPL $0, %eax
            JLE  twenty
    ten:
            MOVL $10, %ebx
            JMP  done
    twenty:
            MOVL $20, %ebx
            JMP  done
    done:
            MOVL %ebx, y
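
    It can help to see the same control flow written in C with goto, since that is the shape a compiler must produce from an if-else statement. This is only a sketch, and choose_y is a hypothetical name; the label ten is omitted because nothing jumps to it.

```c
#include <assert.h>

/* Returns the value that the assembly above stores into y. */
int choose_y(int x)
{
    int ebx;

    if (x <= 0) goto twenty;   /* CMPL $0, %eax ; JLE twenty        */
    ebx = 10;                  /* fall through into the "ten" case  */
    goto done;                 /* skip over the else branch         */
twenty:
    ebx = 20;
done:
    assert(ebx == 10 || ebx == 20);
    return ebx;                /* MOVL %ebx, y */
}
```

    Note that the test is inverted: the C condition x > 0 becomes a jump to the else branch when x <= 0.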
    
    Note that jumps require the compiler to define target labels. These labels must be unique within one assembly file, and they are private to it: they cannot be seen outside the file unless a .globl directive is given. In C parlance, a plain assembly label is static, while a .globl label is extern.
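
    One simple way for a compiler to guarantee unique labels is to number them with a counter. The following C sketch shows the idea; the function new_label and the .L prefix (borrowed from the GCC output shown earlier) are choices made for illustration, not requirements.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a fresh local label name such as ".L0" into buf and returns
   the number that was used.  Every call yields a different label. */
int new_label(char *buf, size_t size)
{
    static int next_label = 0;   /* shared by all calls in this program */
    int id = next_label++;
    int written = snprintf(buf, size, ".L%d", id);

    assert(written > 0 && (size_t)written < size);   /* name was not truncated */
    return id;
}
```

    A code generator would call new_label once for each jump target it needs, then print the name followed by a colon at the target and use the bare name as the operand of the jump.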

    Calling Functions

    In order to call functions, we must first know how to use the stack. Recall that %esp is used to keep track of the stack pointer, and most items on the stack are 4 bytes (32 bits) long. By convention, stacks grow downward from high values to low values. So, to push %eax onto the stack, we must subtract 4 from %esp and then write to the location pointed to by %esp:
    SUBL $4, %esp
    MOVL %eax, (%esp)
    
    Popping a value from the stack involves the opposite:
    MOVL (%esp), %eax
    ADDL $4, %esp
    
    And, if we wish to discard the top value from the stack:
    ADDL $4, %esp
    
    Of course, pushing to and popping from the stack referred to by %esp is so common that the two operations have their own instructions:
    PUSHL %eax
    POPL  %eax
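
    The behavior of PUSHL and POPL can be simulated in C with a small byte array standing in for memory. This is a sketch under stated assumptions: the type sim_stack and the functions stack_init, push32, and pop32 are hypothetical, and the 64-byte size is arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

typedef struct {
    uint8_t  memory[STACK_BYTES];
    uint32_t esp;   /* offset of the top item; STACK_BYTES when empty */
} sim_stack;

void stack_init(sim_stack *s)
{
    memset(s->memory, 0, sizeof s->memory);
    s->esp = STACK_BYTES;   /* the stack starts at the high end */
}

void push32(sim_stack *s, uint32_t value)
{
    assert(s->esp >= 4);                     /* room for one more item */
    s->esp -= 4;                             /* SUBL $4, %esp          */
    memcpy(&s->memory[s->esp], &value, 4);   /* MOVL value, (%esp)     */
}

uint32_t pop32(sim_stack *s)
{
    uint32_t value;
    assert(s->esp + 4 <= STACK_BYTES);       /* stack is not empty     */
    memcpy(&value, &s->memory[s->esp], 4);   /* MOVL (%esp), value     */
    s->esp += 4;                             /* ADDL $4, %esp          */
    return value;
}
```

    Pushing 1 and then 2 lowers esp by 8, and the values come back in the opposite order: 2 first, then 1.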
    
    Using the stack, calling a function is straightforward. Each argument to the function must be pushed onto the stack in reverse order, then the function called. When the function returns, the result is found in the %eax register, overwriting whatever was there. The caller is then responsible for removing or discarding the arguments from the stack.

    For example, the following C code:

    x = printf("value: %d",y);
    
    could be translated to this:
    x:
            .long 0
    y:
            .long 0
    .LC0:
            .string "value: %d"
    start:
            PUSHL y         # push the last argument
            PUSHL $.LC0     # push the first argument
            CALL  printf    # invoke printf
            ADDL  $8, %esp  # discard the arguments
            MOVL  %eax, x   # save the result in x
    

    Defining Functions

    Now we are ready to define a function. Let's start with a simple recipe, and examine how it works. Consider this function that accepts three arguments and uses three local variables:
    .globl func
    func:
            pushl  %ebp          # save the old base pointer
            movl   %esp, %ebp    # set ebp to the current esp
            subl   $12,%esp      # allocate three local variables
    
            # body of function goes here
    
            addl   $12,%esp      # de-allocate local variables
            leave                # restore ebp and esp
            ret                  # return to the caller
    
    A function has quite a few details that must be kept track of: the arguments given to the function, the information necessary to return, and space for local computations. For this purpose, we use the base pointer register %ebp. Whereas the stack pointer %esp points to the end of the stack where new data will be pushed, the base pointer %ebp points to the middle of the values needed by the function.

    Note that the nomenclature here is a little confusing: the stack grows down toward smaller addresses, so the top of the stack is located at a lower address than the bottom of the stack. To avoid confusion, we will simply refer to offsets as either positive or negative relative to the base pointer.

    Consider the stack layout for func, defined above:

    Contents                     Location
    locals of calling function
    argument 2                   16(%ebp)
    argument 1                   12(%ebp)
    argument 0                   8(%ebp)
    old %eip register            4(%ebp)
    old %ebp register            (%ebp)
    local variable 0             -4(%ebp)
    local variable 1             -8(%ebp)
    local variable 2             -12(%ebp)  <-- %esp
    space for current function

    Note that the base pointer points to the middle of the stack layout. At positive values (relative to %ebp) are located the arguments to the function. Argument zero (the leftmost argument in C) is always at 8(%ebp), argument one at 12(%ebp), and so forth. The old instruction pointer and base pointer are stored at 4(%ebp) and (%ebp); these are needed to return when the function is complete. At negative values are stored variables local to the function. Finally, the stack pointer points to the last local variable. If we must use the stack for additional purposes, data will be pushed to further negative values.

    Take a moment to sketch out how this stack layout was arrived at. The caller is responsible for setting up items at positive values. First, the caller must push all the arguments onto the stack in reverse order. When the CALL instruction is executed, the old program counter is pushed onto the stack so that control can be returned to the caller when the function completes. The called function then pushes the old base pointer onto the stack, and makes space for three local variables.
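
    The sequence of pushes just described can be replayed in a small C simulation to confirm where each item ends up relative to %ebp. Everything here is an assumption made for illustration: the machine type, the 32-word memory, and the function names push_word, load_ebp, and enter_func.

```c
#include <assert.h>
#include <stdint.h>

enum { FRAME_WORDS = 32 };

typedef struct {
    uint32_t word[FRAME_WORDS];   /* simulated memory, one entry per 4 bytes */
    uint32_t esp;                 /* byte address of the top of the stack    */
    uint32_t ebp;                 /* byte address held in the base pointer   */
} machine;

void push_word(machine *m, uint32_t value)
{
    assert(m->esp >= 4);
    m->esp -= 4;
    m->word[m->esp / 4] = value;
}

/* Reads the 32-bit value at offset(%ebp). */
uint32_t load_ebp(const machine *m, int32_t offset)
{
    uint32_t addr = m->ebp + (uint32_t)offset;
    assert(addr % 4 == 0 && addr / 4 < FRAME_WORDS);
    return m->word[addr / 4];
}

/* Replays the steps taken by the caller, by CALL, and by the callee
   for a function with three arguments and three local variables. */
void enter_func(machine *m, uint32_t arg0, uint32_t arg1, uint32_t arg2,
                uint32_t return_addr)
{
    push_word(m, arg2);          /* caller pushes arguments in reverse   */
    push_word(m, arg1);
    push_word(m, arg0);
    push_word(m, return_addr);   /* CALL pushes the old %eip             */
    push_word(m, m->ebp);        /* pushl %ebp                           */
    m->ebp = m->esp;             /* movl  %esp, %ebp                     */
    assert(m->esp >= 12);
    m->esp -= 12;                /* subl  $12, %esp (three locals)       */
}
```

    After enter_func runs, load_ebp with offsets 8, 12, and 16 returns arguments zero, one, and two, offset 4 returns the return address, and offset 0 returns the old base pointer.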

    Within the function, we may use base-relative addressing against the base pointer to refer to both arguments and locals. Argument N is located at +8+N*4(%ebp) and local variable N is located at -4-N*4(%ebp). (You cannot write N in the operand directly, of course: your compiler must compute the offset and emit the resulting constant.)
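
    In a compiler, these two formulas usually become small helper functions that turn an argument or local variable number into an operand string. The following C sketch shows one possible shape; the names arg_offset, local_offset, and format_ebp_operand are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Offset from %ebp of argument n (n = 0 is the leftmost argument). */
int arg_offset(int n)
{
    assert(n >= 0);
    return 8 + 4 * n;
}

/* Offset from %ebp of local variable n. */
int local_offset(int n)
{
    assert(n >= 0);
    return -4 - 4 * n;
}

/* Writes an operand such as "-12(%ebp)" into buf and returns its length. */
int format_ebp_operand(char *buf, size_t size, int offset)
{
    int written = snprintf(buf, size, "%d(%%ebp)", offset);
    assert(written > 0 && (size_t)written < size);   /* not truncated */
    return written;
}
```

    For example, formatting local_offset(2) produces the operand -12(%ebp), matching the last local variable in the stack layout above.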

    There is one more complication: each function needs to use a selection of registers to perform computations. However, what happens when one function is called in the middle of another? We do not want any registers currently in use by the caller to be clobbered by the called function. To prevent this, each function must save and restore all of the registers that it uses by pushing them onto the stack at the beginning, and popping them off of the stack before returning.

    Here is a complete example that puts it all together. Suppose that you have a C function defined as follows:

    int addthree( int a, int b, int c )
    {
            int x;
            x = a+b+c;
            return x;
    }
    
    Here is a straightforward translation of the function:
    .globl addthree
    addthree:
            pushl %ebp             # save the base pointer
            movl  %esp, %ebp       # set new base pointer to esp
            subl  $4,%esp          # allocate one local variable
    
            pushl %ebx             # save registers that we will use
            pushl %ecx
            pushl %edx
    
            movl   8(%ebp), %ebx   # load each arg into a register
            movl  12(%ebp), %ecx
            movl  16(%ebp), %edx
    
            addl  %edx, %ecx       # add the args together
            addl  %ecx, %ebx
    
            movl  %ebx, -4(%ebp)   # store the result into local 0
            movl  -4(%ebp), %eax   # move local 0 into the result
    
            popl  %edx             # restore temporary registers
            popl  %ecx
            popl  %ebx
    
            addl  $4,%esp          # de-allocate local variables
            leave
            ret
    

    Further Reading

    You have learned the rudiments of IA32 assembly, but there are many more details to be discovered. IA32 lives up to the CISC name: it is a complex instruction set. As you implement your compiler, you will almost certainly find that you need a few more instructions not listed here. We recommend that you peruse the list of instructions found in section 5.1 of Volume 1 of the Intel manual. Once you have identified the desired instruction, look up its details in Volumes 2A and 2B.

    Happy compiling!