Introduction To x64 Assembly
Abstract
The name x64 refers to a 64-bit instruction set for Intel and AMD processors, which are commonly found in current-generation laptop and desktop computers. This document introduces a subset of x64, including the features needed for a Compilers course at USI. We write assembler files using "Intel syntax", and we adopt the C calling conventions of Mac OS X.

1 Example: Hello, World!

Figure 1 shows an example program in x64 assembler that prints a greeting to standard output. Before we look at what this does, let's try running it. The steps are as follows:

• Put the code in a file called main.s. The file extension .s indicates an assembler file.

• Run the assembler and linker. We use gcc for this. When the input file to gcc is an assembler file, gcc skips its C front-end and just assembles and links the file. We need to provide the option -m64 to select the 64-bit (x64) target. Putting it all together, we get the following command line:

    gcc -m64 main.s

• The previous step produced an executable file a.out with the machine code. Run it:

    ./a.out

While we are on the topic of tooling, you can also use gcc to compile a C program to assembly by using the gcc command-line option -S. This option instructs gcc to run the front-end only: it stops after generating assembly code, without assembling or linking it. That is a useful approach for getting concrete examples of various code sequences. If you specify -Wall to enable warnings, -ansi to select the C dialect (ANSI C, also known as C90), and -masm=intel to emit Intel rather than AT&T syntax, the following command line compiles a source program main.c to a target program main.s:

    gcc -Wall -ansi -m64 -masm=intel -S main.c

2 x64 Syntax

Rather than give a formal grammar for x64, this section describes it using the example in Figure 1. There is one statement per line, which means that adding or removing newlines would change the meaning of the program. Other than that, the syntax is insensitive to whitespace, meaning that additional spaces, tabs, or comments do not affect program behavior. Comments start with /* and end with */.

The example program contains two kinds of statements: directives and instructions. Directives start with a period, such as .intel_syntax, whereas instructions consist of an operator and a list of operands, such as mov rbp, rsp. In addition, a statement can start with labels, which are symbols followed by a colon, such as main.end:.

The directive .intel_syntax at the start of the file selects Intel syntax. Without that directive, the default is .att_syntax. One major difference between the two options is the order of operands: Intel syntax shows the destination operand first, whereas AT&T syntax shows the destination operand last. For example, mov rbp, rsp in Intel syntax is written movq %rsp, %rbp in AT&T syntax. While the

    .intel_syntax
    .text
    .globl main
    main:
        /* function prologue */
        push rbp
        mov rbp, rsp
        /* call puts("Hello, World!") */
        lea rdi, [rip + main.S_0]
        call puts
        /* return zero */
        mov rax, 0
        mov rsp, rbp
        pop rbp
        ret
    main.end:
    main.S_0:
        .string "Hello, World!"

Figure 1: A simple program in x64 assembler. See also: http://www.inf.usi.ch/faculty/soule/teaching/2015-fall/cc/x64-intro/hello_world.txt
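For comparison, here is a minimal C sketch of what main in Figure 1 does: it calls puts with the greeting and returns zero. The function name greet is hypothetical. It stands in for main only so that the sketch can be compiled into another program. The comments map each statement to the corresponding lines of Figure 1. The code sticks to the C90 dialect, so the -S command line from Section 1 should also accept it.

```c
#include <stdio.h>

/* Hypothetical stand-in for main in Figure 1: same steps, different name. */
int greet(void)
{
    /* call puts("Hello, World!"): Figure 1 first loads the string's
       address into rdi (the first argument register) with lea */
    puts("Hello, World!");
    /* return zero: Figure 1 does this with mov rax, 0, because the
       return value is passed back in rax */
    return 0;
}
```

You can run gcc -Wall -ansi -m64 -masm=intel -S on a file containing this function and compare the resulting Intel-syntax assembly with Figure 1. The exact output depends on the compiler version, optimization level, and platform, including label names and the epilogue instructions. Treat the comparison as illustrative rather than exact.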
3 Addresses

As mentioned before, an instruction consists of zero or more labels, an operator, and zero or more operands. We refer to operands as "addresses", even when they are non-pointer values. We use the following kinds of addresses:

• Registers (r). There are sixteen 64-bit general-purpose registers: rax, rbx, rcx, rdx, rsp, rbp, rsi, rdi, and r8 to r15. However, some of these registers play a special role; for example, rsp and rbp typically hold the stack pointer and base pointer, as their names imply.

• Immediate operands (i). These are either integer constants or labels. Integer constants are written as either the digit 0, or a digit from 1-9 followed by zero or more digits from 0-9.

4 Instructions

The following reference lists instructions in alphabetical order. When an instruction has multiple addressing modes, the alternatives are separated by a vertical bar |. As a general rule of thumb, most instructions support only one memory operand (m), not two. Typically, the first operand is the destination operand; in other words, many instructions store their result in the first operand.

add → add r, i | add r, r | add r, m | add m, i | add m, r
Compute the sum of the two operands, and store the result in the first operand.

call → call label
Subtract 8 from rsp. Store the return address (the address of the instruction following the call) into [rsp]. Jump to the label.

cmp → cmp r, i | cmp r, r | cmp r, m | cmp m, i | cmp m, r
Compare the two operands. Encode the result in status flags in an internal register, which can then be