Chapter 5
Assembly Language Programming
• In general, programming of a microprocessor usually
takes several iterations before the right sequence of
machine code instructions is written.
• The process, however, is made easier by a special
program called an “Assembler”.
• The Assembler allows the user to write alphanumeric
instructions, or mnemonics, called Assembly Language
instructions.
• The Assembler, in turn, generates the desired machine
instructions from the Assembly Language instructions.
Laboratory Requirements
• For the entire lab session, we will use the following software:
• Linux OS (Linux Mint)
• Netwide Assembler NASM (8086_Assembler)
Steps to execute an assembly program
1. Write the code in any text editor.
2. Save the file as file_name.asm (in a graphical editor, set the file type to All files).
3. Assemble by typing: nasm -f elf file_name.asm
• -f elf tells NASM to produce a 32-bit ELF object file.
• If there are any errors, the assembler reports them at this stage.
Otherwise, an object file of your program named file_name.o is
created.
4. Link the object file to create an executable file: ld -m elf_i386 -s -o file_name file_name.o
• -m elf_i386 links 32-bit code, and -s strips the symbol table from the executable.
5. Execute by typing: ./file_name
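A minimal program to try these steps on might look like the sketch below. It assumes 32-bit Linux and uses the int 0x80 system calls sys_write (number 4) and sys_exit (number 1), which belong to the Linux System Calls reading rather than to this chapter; the file name hello.asm is only an example.

```nasm
; hello.asm: minimal sketch for 32-bit Linux (build with nasm -f elf and ld -m elf_i386)
section .data
    msg db "Hello, NASM!", 0xA   ; the message followed by a newline
    len equ $ - msg              ; length of msg, computed by the assembler

section .text
    global _start

_start:
    mov eax, 4          ; system call number for sys_write
    mov ebx, 1          ; file descriptor 1 = standard output
    mov ecx, msg        ; address of the message
    mov edx, len        ; number of bytes to write
    int 0x80            ; call the kernel

    mov eax, 1          ; system call number for sys_exit
    mov ebx, 0          ; exit status 0
    int 0x80
```

After nasm -f elf hello.asm and ld -m elf_i386 -s -o hello hello.o, running ./hello should print Hello, NASM!.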
Assembling the program
• The assembler is used to convert the assembly language
instructions to machine code.
• It is used immediately after writing the Assembly Language
program.
• It starts by checking the syntax, or validity of the structure, of
each instruction in the source file.
• If any errors are found, the assembler displays a report on
these errors along with a brief explanation of their nature.
• However, if the program does not contain any errors, the
assembler produces an object file that has the same name as
the original file but with the .o extension.
Linking
• The linker is used to convert the object file to an executable
file.
• The executable file is the final set of machine code instructions
that can directly be executed by the microprocessor.
• An object file may represent one segment of a long program.
• This segment cannot operate by itself, and must be
integrated with other object files representing the rest of the
program, in order to produce the final self-contained
executable file.
• The executable file contains the machine language code.
• It can be loaded into RAM and executed by the
microprocessor simply by typing, at the Linux shell prompt, ./
followed by the name of the file and pressing the Enter key.
• If the program produces an output on the screen, or a
sequence of control signals to control a piece of hardware, the
effect should be noticed almost immediately.
• However, if the program manipulates data in memory, nothing
would seem to have happened as a result of executing the
program.
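One common way to check a program that only manipulates memory is to return a result as the program's exit status, which the shell prints with echo $?. The sketch below assumes 32-bit Linux (sys_exit is system call 1 under int 0x80); the variable name total is hypothetical.

```nasm
; result.asm: stores a sum in memory, then returns it as the exit status
section .data
    total dd 0              ; a double word in memory, initially 0

section .text
    global _start

_start:
    mov eax, 7
    add eax, 5              ; eax = 12
    mov [total], eax        ; store the result in memory (nothing appears on screen)

    mov eax, 1              ; system call number for sys_exit
    mov ebx, [total]        ; exit status = value stored in total
    int 0x80
```

After ./result, echo $? should print 12. Exit statuses are limited to 0 to 255, so this trick only suits small values.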
Structure of an Assembly Language Program
• An assembly language program is written according to the
following structure and is divided into the following sections,
each declared with an assembler directive:
• The data section
• The bss section
• The text section
The data Section
• The data section is used for declaring initialized data or
constants.
• These values are set when the program is loaded. They are often
used as constants, although the section itself is writable at runtime.
You can declare various constant values, file names or buffer
size etc. in this section.
The syntax for declaring data section is:
section .data
Example
section .data ; Directive to declare data segment
msg db "Hi", 0xA ; Define bytes (a string + newline)
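The data section is readable and writable at run time, even when its values are used as constants. The sketch below, assuming 32-bit Linux and the sys_write/sys_exit calls (numbers 4 and 1 under int 0x80), overwrites one byte of msg before printing it.

```nasm
section .data
    msg db "Hi", 0xA        ; 3 bytes: 'H', 'i', newline

section .text
    global _start

_start:
    mov byte [msg + 1], 'o' ; overwrite the second byte at run time: "Hi" becomes "Ho"

    mov eax, 4              ; sys_write
    mov ebx, 1              ; standard output
    mov ecx, msg            ; address of the (modified) data
    mov edx, 3              ; write 3 bytes
    int 0x80

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

Running it should print Ho rather than Hi.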
The bss Section
• The bss section is used for declaring variables that are not given initial values (uninitialized data).
• The syntax for declaring bss section is:
section .bss
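As a sketch of how bss space is used, the program below reserves three bytes with resb (a reserve directive, described later in this chapter), fills them at run time, and prints them. It assumes 32-bit Linux; the name buf is hypothetical.

```nasm
section .bss
    buf resb 3              ; reserve 3 uninitialized bytes

section .text
    global _start

_start:
    mov byte [buf], 'O'     ; fill the reserved space at run time
    mov byte [buf + 1], 'K'
    mov byte [buf + 2], 0xA ; newline

    mov eax, 4              ; sys_write
    mov ebx, 1              ; standard output
    mov ecx, buf            ; address of the reserved bytes
    mov edx, 3              ; write 3 bytes
    int 0x80

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

Running it should print OK.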
The text section
• The text section is used for keeping the actual code.
• This section must begin with the declaration global _start (or
global main when the program is linked through gcc), which makes the
entry-point label visible to the linker so that it knows where program
execution begins.
• The syntax for declaring text section is:
section .text
global _start ; or: global main
_start: ; or: main:
Example
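section .text
global _start
_start:
mov eax, 5 ; first number
add eax, 3 ; add second number (5 + 3 = 8)
mov ebx, eax ; exit status = 8
mov eax, 1 ; system call number for sys_exit
int 0x80 ; return control to the operating system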
Comments
• Assembly language comment begins with a semicolon (;).
• It may contain any printable character including blank. It
can appear on a line by itself, like:
; This program displays a message on screen
• or, on the same line along with an instruction, like:
• add eax, ebx ; adds ebx to eax
Assembly Language Statements
• Assembly language programs consist of three
types of statements:
• Executable instructions (or simply instructions)
• Assembler directives (or pseudo-ops)
• Macros
Executable instructions
• These are actual CPU instructions that are translated into machine code. They
tell the processor what to do, such as:
• move data,
• perform arithmetic,
• jump, or call a system service.
• Each executable instruction generates one machine language instruction.
• Example:
• mov eax, 5 ; Move 5 into the EAX register
• add eax, 3 ; Add 3 to EAX
Assembler directives or pseudo-ops
• These are commands to the assembler, not to the CPU.
They define data, allocate memory, or control the layout
of the program.
• They do not generate CPU instructions (data directives such as db only place bytes in the output file).
• Example:
• section .data ; Directive to declare data segment
• msg db "Hi", 0xA ; Define bytes (a string + newline)
• section .text ; Directive to declare code segment
• global _start ; Tell linker where program starts
• db, dw, dd – define bytes/words/double words
• equ – define a constant value
Macros
• Macros are basically a text substitution mechanism.
• A macro is like a shortcut: a name that expands into a set of
instructions.
• It helps reuse code and makes the program cleaner.
• Macros are expanded by the assembler, in its preprocessing stage, before the program is assembled.
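A minimal NASM macro sketch is shown below. %macro NAME N opens a macro that takes N parameters, referred to as %1, %2, ... inside the body, and %endmacro closes it. The macro name print_string is hypothetical, and the body assumes the 32-bit Linux sys_write call (number 4 under int 0x80).

```nasm
; Hypothetical macro with two parameters: %1 = address of the text, %2 = its length
%macro print_string 2
    mov eax, 4              ; sys_write
    mov ebx, 1              ; standard output
    mov ecx, %1             ; address of the text
    mov edx, %2             ; number of bytes
    int 0x80
%endmacro

section .data
    msg1 db "Hello", 0xA
    len1 equ $ - msg1
    msg2 db "World", 0xA
    len2 equ $ - msg2

section .text
    global _start

_start:
    print_string msg1, len1 ; expands into the five instructions of the macro body
    print_string msg2, len2

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

Each use of print_string is replaced by a copy of the macro body before assembly, so the program should print Hello and World on separate lines.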
Syntax of Assembly Language Statements
• Assembly language statements are entered one statement per line.
• Each statement follows the following format:
[label] mnemonic [operands] [; comment]
• The fields in the square brackets are optional.
• A basic instruction has two parts: the first is the name of the
instruction (the mnemonic) to be executed, and the second is the
operands, or parameters, of the instruction.
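The short program below, a sketch assuming 32-bit Linux, uses every field of the statement format: a label (again), mnemonics, operands and comments. The loop instruction decrements ECX and jumps back to the label while ECX is not zero, so EAX ends up holding 5 + 4 + 3 + 2 + 1.

```nasm
section .text
    global _start

_start:                     ; a label on its own line
    mov eax, 0              ; mnemonic, two operands, comment
    mov ecx, 5              ; loop counter
again:  add eax, ecx        ; label, mnemonic, operands and comment on one line
    loop again              ; ecx = ecx - 1; jump to "again" while ecx is not 0
    mov ebx, eax            ; exit status = 5 + 4 + 3 + 2 + 1 = 15
    mov eax, 1              ; sys_exit
    int 0x80
```

After running it, echo $? should print 15.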
Variables
• Here we will simply use the 8086 registers as the
variables in our programs.
• Registers have predefined names and do not need to be
declared.
Variable Declaration
• NASM provides various define directives for reserving
storage space for variables.
• The define assembler directive is used for allocation of
storage space.
• It can be used to reserve as well as initialize one or more
bytes.
Allocating Storage Space for Initialized Data
• The syntax for storage allocation statement for initialized
data is:
• [variable-name] define-directive initial-value [, initial-value] ...
• where variable-name is the identifier for each storage
space.
• The assembler associates an offset value for each variable
name defined in the data segment.
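• There are five basic forms of the define directive:
• DB (Define Byte): allocates 1 byte
• DW (Define Word): allocates 2 bytes
• DD (Define Doubleword): allocates 4 bytes
• DQ (Define Quadword): allocates 8 bytes
• DT (Define Ten Bytes): allocates 10 bytes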
• Each variable has a type and is assigned a memory address.
• A question mark (?) placed as the initial value leaves the variable uninitialized.
• L DB 4 ;define variable L with initial value 4
• J DB ? ;Define variable J with uninitialized value
• Name DB "Course" ;allocate 6 bytes for name
• K DB 5, 3,-1 ;allocate 3 bytes
• L1 db 0 ; byte labeled L1 with initial value 0
• L2 dw 1000 ; word labeled L2 with initial value 1000
• L3 db 110101b ; byte initialized to binary 110101
• L4 db 12h ; byte initialized to hex 12 (18 in decimal)
• L5 db 17o ; byte initialized to octal 17 (15 in decimal)
• L6 dd 1A92h ; double word initialized to hex 1A92
• L7 resb 1 ; 1 uninitialized byte
• L8 db "A" ; byte initialized to ASCII code for A (65)
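To check the number bases above, the sketch below loads three of the bytes and returns their sum as the exit status (32-bit Linux assumed). movzx copies a byte into a 32-bit register and fills the upper bits with zeros.

```nasm
section .data
    L3 db 110101b           ; binary 110101 = 53
    L4 db 12h               ; hex 12 = 18
    L5 db 17o               ; octal 17 = 15

section .text
    global _start

_start:
    movzx ebx, byte [L3]    ; ebx = 53
    movzx eax, byte [L4]    ; eax = 18
    add ebx, eax            ; 53 + 18 = 71
    movzx eax, byte [L5]    ; eax = 15
    add ebx, eax            ; 71 + 15 = 86
    mov eax, 1              ; sys_exit, exit status = 86
    int 0x80
```

After running it, echo $? should print 86.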
• Double quotes and single quotes are treated the same.
• Consecutive data definitions are stored sequentially in
memory.
• That is, the word L2 is stored immediately after L1 in
memory.
• Sequences of memory may also be defined.
• L9 db 0, 1, 2, 3 ; defines 4 bytes
• L10 db "w", "o", "r", 'd', 0 ; defines a C string = "word"
• L11 db 'word', 0 ; same as L10
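Because consecutive definitions are stored back to back, writing past the end of one variable continues into the next. The sketch below (32-bit Linux assumed; part1 and part2 are hypothetical names) writes five bytes starting at part1, which covers both definitions.

```nasm
section .data
    part1 db "wo"           ; 2 bytes
    part2 db "rd", 0xA      ; 3 bytes, stored immediately after part1

section .text
    global _start

_start:
    mov eax, 4              ; sys_write
    mov ebx, 1              ; standard output
    mov ecx, part1          ; start at the first definition...
    mov edx, 5              ; ...and write 5 bytes, running on into part2
    int 0x80

    mov eax, 1              ; sys_exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

Running it should print word, showing that part2 follows part1 in memory.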
Allocating Storage Space for Uninitialized Data
• The reserve directives are used for reserving space for
uninitialized data.
• The reserve directives take a single operand that specifies the
number of units of space to be reserved.
• Each define directive has a related reserve directive.
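• There are five basic forms of the reserve directive:
• RESB: reserves bytes (1 byte each)
• RESW: reserves words (2 bytes each)
• RESD: reserves double words (4 bytes each)
• RESQ: reserves quadwords (8 bytes each)
• REST: reserves ten-byte units (10 bytes each)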
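A sketch of a reserve directive in use, assuming 32-bit Linux: resd 10 reserves ten double words, and NASM's current-position symbol $ gives the number of bytes reserved. The names counts and counts_size are hypothetical.

```nasm
section .bss
    counts resd 10              ; 10 double words = 40 bytes
    counts_size equ $ - counts  ; bytes reserved, computed by the assembler

section .text
    global _start

_start:
    mov dword [counts], 7       ; reserved space can be written at run time
    mov eax, 1                  ; sys_exit
    mov ebx, counts_size        ; exit status = 40
    int 0x80
```

After running it, echo $? should print 40.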
Constants
• There are several directives provided by NASM that define
constants.
• Some of them are:
• EQU
• %assign
• %define
The EQU Directive
• The EQU directive is used for defining constants.
• The syntax of the EQU directive is as follows:
CONSTANT_NAME EQU expression
• For example,
TOTAL_STUDENTS equ 50
You can then use this constant value in your code, like:
• mov ecx, TOTAL_STUDENTS
• cmp eax, TOTAL_STUDENTS
• The operand of an EQU statement can be an expression:
• LENGTH equ 20
• WIDTH equ 10
• AREA equ LENGTH * WIDTH
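The constants above can be combined in a small program. Since EQU values are evaluated by the assembler, AREA becomes the number 200 before the program ever runs. The sketch assumes 32-bit Linux and returns AREA as the exit status.

```nasm
TOTAL_STUDENTS equ 50
LENGTH equ 20
WIDTH  equ 10
AREA   equ LENGTH * WIDTH    ; evaluated at assembly time: 200

section .text
    global _start

_start:
    mov ecx, TOTAL_STUDENTS  ; assembled as mov ecx, 50
    mov ebx, AREA            ; assembled as mov ebx, 200
    mov eax, 1               ; sys_exit, exit status = 200
    int 0x80
```

After running it, echo $? should print 200. Note that NASM symbol names are case-sensitive, so LENGTH and length would be two different names.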
The %assign Directive
• The %assign directive can be used to define numeric
constants like the EQU directive, but, unlike EQU, it allows
redefinition.
• For example, you may define the constant TOTAL as:
• %assign TOTAL 10
Later in the code, you can redefine it as:
• %assign TOTAL 20
• This directive is case-sensitive.
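The sketch below (32-bit Linux assumed; first and second are hypothetical names) shows the effect of redefinition: each use of TOTAL takes the value assigned most recently above it in the source.

```nasm
%assign TOTAL 10
section .data
    first  db TOTAL          ; assembled as db 10
%assign TOTAL 20             ; redefine: later uses see 20
    second db TOTAL          ; assembled as db 20

section .text
    global _start

_start:
    movzx ebx, byte [first]  ; ebx = 10
    movzx eax, byte [second] ; eax = 20
    add ebx, eax             ; 10 + 20 = 30
    mov eax, 1               ; sys_exit, exit status = 30
    int 0x80
```

After running it, echo $? should print 30.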
The %define Directive
• The %define directive allows defining both numeric and string
constants.
• This directive is similar to the #define in C.
• For example, you may define the constant PTR as:
%define PTR [EBP+4]
• The above code replaces every later occurrence of PTR with [EBP+4].
• This directive also allows redefinition and it is case-sensitive.
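Because %define performs plain text substitution, the replacement text can be any piece of an operand, including a complete memory reference such as [EBP+4]. In the sketch below (32-bit Linux assumed; NEWLINE and FIRST_CHAR are hypothetical names), FIRST_CHAR stands for byte [msg].

```nasm
%define NEWLINE 0xA
%define FIRST_CHAR byte [msg]   ; a whole memory operand as replacement text

section .data
    msg db "Hi", NEWLINE        ; becomes: msg db "Hi", 0xA

section .text
    global _start

_start:
    mov FIRST_CHAR, 'Y'         ; becomes: mov byte [msg], 'Y'

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; standard output
    mov ecx, msg
    mov edx, 3                  ; write 3 bytes
    int 0x80

    mov eax, 1                  ; sys_exit
    mov ebx, 0                  ; exit status 0
    int 0x80
```

Running it should print Yi.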
Reading Assignment
• Linux System Calls