Chapter 3
• This chapter focuses on the basic building blocks of the Microsoft Macro
Assembler (MASM).
• You will see how constants and variables are defined, standard formats for
numeric and string literals, and how to assemble and run your first programs.
Basic Language elements
Integer Literal
• An integer literal (also known as an integer constant) is made up of an optional leading sign, one or
more digits, and an optional radix character that indicates the number’s base.
• So, for example, 26 is a valid integer literal. It doesn’t have a radix, so we assume it’s in decimal
format. If we wanted it to be 26 hexadecimal, we would have to write it as 26h. Similarly, the
number 1101 would be considered a decimal value until we added a “b” at the end to make it
1101b (binary).
• A hexadecimal literal beginning with a letter must have a leading zero to prevent the assembler
from interpreting it as an identifier.
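The radix rules above can be sketched as a few data definitions. This is a fragment meant to sit inside a program's data segment, not a complete program; the variable names are hypothetical, and the DWORD, BYTE, and SDWORD directives are used only to give each literal a place to live.

```asm
.data
decVal  DWORD  26       ; no radix suffix: decimal 26
hexVal  DWORD  26h      ; hexadecimal 26 = decimal 38
binVal  BYTE   1101b    ; binary 1101 = decimal 13
hexA5   BYTE   0A5h     ; leading zero required: A5h alone would be read as an identifier
negVal  SDWORD -26      ; optional leading sign
```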
Integer expression
• An integer expression is a mathematical expression involving integer values and
arithmetic operators.
• The expression must evaluate to an integer that can be stored in 32 bits (0
through FFFFFFFFh).
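As a small illustration, the assembler evaluates each expression below at assembly time and stores only the result. The fragment assumes it is placed in a data segment; the names expr1 through expr4 are hypothetical.

```asm
.data
expr1 DWORD  16 / 5               ; 3 (integer division, remainder discarded)
expr2 SDWORD -(3 + 4) * (6 - 1)   ; -35
expr3 SDWORD -3 + 4 * 6 - 1       ; 20 (multiplication is done before addition)
expr4 DWORD  25 MOD 3             ; 1 (remainder of 25 / 3)
```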
Real Number literals
• Real number literals (also known as floating-point literals) are represented as either decimal reals
or encoded (hexadecimal) reals.
• A decimal real contains an optional sign followed by an integer, a decimal point, an optional
integer that expresses a fraction, and an optional exponent.
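A few decimal reals, sketched as data definitions. The REAL4 and REAL8 directives (4-byte and 8-byte reals) are not introduced in these slides and are used here only so the literals have somewhere to be stored; the variable names are hypothetical.

```asm
.data
rVal1 REAL4 -1.2        ; sign, integer, decimal point, fraction
rVal2 REAL4 2.          ; the fraction may be omitted
rVal3 REAL8 3.2E-260    ; with an exponent
rVal4 REAL8 -44.2E+05   ; sign, fraction, and signed exponent
```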
Character literals
• A character literal is a single character enclosed in single or double quotes. The
assembler stores the value in memory as the character’s binary ASCII code.
String literals
• A string literal is a sequence of characters (including spaces) enclosed in
single or double quotes:
• Examples:
'ABC'
'X'
"Good night, Gracie"
'4096'
Reserved words
• Reserved words have special meaning and can only be used in their correct
context. Reserved words, by default, are not case-sensitive.
• For example, MOV is the same as mov and Mov.
• There are different types of reserved words:
• Instruction mnemonics, such as MOV, ADD, and MUL
• Register names
• Directives, which tell the assembler how to assemble programs
• Attributes, which provide size and usage information for variables and
operands. Examples are BYTE and WORD
• Operators, used in constant expressions
• Predefined symbols, such as @data, which return constant integer values at
assembly time
Identifier
• An identifier is a programmer-chosen name. It might identify a variable, a constant, a
procedure, or a code label. There are a few rules on how they can be formed:
• They may contain between 1 and 247 characters.
• They are not case sensitive.
• The first character must be a letter (A..Z, a..z), underscore (_), @ , ?, or $. Subsequent
characters may also be digits.
• An identifier cannot be the same as an assembler reserved word.
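The rules above can be checked against a few sample names. This is an illustrative data-segment fragment; every name in it is hypothetical, and the two invalid cases are left as comments so the fragment still assembles.

```asm
.data
xVal        DWORD 0     ; starts with a letter
_lineBuffer BYTE  0     ; starts with an underscore
$first      BYTE  0     ; starts with $
open_file   BYTE  0     ; underscore inside the name
_12345      WORD  0     ; digits are allowed after the first character
; 1stValue  BYTE  0     ; invalid: begins with a digit
; mov       BYTE  0     ; invalid: MOV is a reserved word
```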
Directive
• A directive is a command embedded in the source code that is recognized and acted
upon by the assembler.
• One important function of assembler directives is to define program sections, or
segments. Segments are sections of a program that have different purposes.
• For example, one segment can be used to define variables, and is identified by the
.DATA directive:
.data
• The .CODE directive identifies the area of a program containing executable
instructions:
.code
• The .STACK directive identifies the area of a program holding the runtime stack,
setting its size:
.stack 100h
• Data-definition directives such as DWORD reserve storage for variables.
Directive Vs Instruction
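A short sketch of the difference: a directive is acted on by the assembler, while an instruction is executed by the CPU at runtime. The fragment below is not a complete program, and the variable name myVar is chosen for illustration.

```asm
.data
myVar DWORD 26          ; DWORD directive: the assembler reserves a doubleword

.code
    mov eax, myVar      ; MOV instruction: executes at runtime, copies myVar into EAX
```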
Instruction
An instruction is a statement that becomes executable when a program is assembled.
Instructions are translated by the assembler into machine language bytes, which are
loaded and executed by the CPU at runtime. An instruction contains four basic parts:
• Label (optional)
• Instruction mnemonic (required)
• Operand(s) (usually required)
• Comment (optional)
• Basic syntax
[label:] mnemonic [operands] [ ; comment]
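The syntax above, sketched with instructions that take zero, one, and two operands. This is a code-segment fragment rather than a full program, and the label L1 is arbitrary.

```asm
L1: mov eax, 10000h     ; label, mnemonic, two operands, comment
    stc                 ; no operands: set the Carry flag
    inc eax             ; one operand: add 1 to EAX
    add eax, 5          ; two operands: add 5 to EAX
```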
Labels
• Act as place markers
  • Mark the address (offset) of code and data
• Follow identifier rules
• Data label
  • Must be unique
  • Example: myArray (not followed by a colon)
  • count DWORD 100
• Code label
  • Target of jump and loop instructions
  • Example: L1: (followed by a colon)
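Both kinds of label can be seen together in the following fragment. It is a sketch, not a complete program: the data label count has no colon, while the code label L1 ends with one and serves as the target of a LOOP instruction.

```asm
.data
count DWORD 100         ; data label: not followed by a colon

.code
    mov ecx, 5          ; loop counter
    mov eax, 0          ; running total
L1: add eax, ecx        ; code label: followed by a colon
    loop L1             ; decrement ECX, jump back to L1 while ECX is not 0
                        ; afterwards EAX = 5 + 4 + 3 + 2 + 1 = 15
```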
Operands
• Constants and constant expressions are often called immediate values
• The ENDP directive marks the end of a procedure. Our program had a procedure
named main, so the ENDP directive must use the same name:
main ENDP
• Finally, the END directive marks the end of the program, and references the
program entry point:
END main
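Putting the pieces together, here is a minimal sketch of a complete program. It assumes a 32-bit Windows build linked against kernel32.lib (which supplies ExitProcess). The .386 and .model lines, the stack size, and the variable name sum are assumptions for illustration, not something these slides specify.

```asm
.386
.model flat, stdcall
.stack 4096
ExitProcess PROTO, dwExitCode:DWORD

.data
sum DWORD 0             ; variable defined in the data segment

.code
main PROC
    mov eax, 5          ; EAX = 5
    add eax, 6          ; EAX = 11
    mov sum, eax        ; store the result
    INVOKE ExitProcess, 0
main ENDP               ; same name as the PROC it closes
END main                ; end of program; main is the entry point
```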
• To leave a variable uninitialized, use the ? symbol as its initializer. The variable
is given no specific starting value; a value is expected to be assigned at runtime.
The BYTE (define byte) and SBYTE (define signed byte) directives allocate storage for one or
more unsigned or signed values. Each initializer must fit into 8 bits of storage.
For example,
value1 BYTE 'A' ; character literal
value2 BYTE 0 ; smallest unsigned byte
value3 BYTE 255 ; largest unsigned byte
value4 SBYTE -128 ; smallest signed byte
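Two further cases round out the list: the upper limit for a signed byte, and a byte declared with the ? initializer. The names value5 and value6 are hypothetical continuations of the example above.

```asm
value5 SBYTE +127       ; largest signed byte
value6 BYTE  ?          ; uninitialized: no starting value is assigned
```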
• Within a single data definition, its initializers can use different radixes. Character and string
literals can be freely mixed. In the following example, list1 and list2 have the same contents:
list1 BYTE 10, 32, 41h, 00100010b
list2 BYTE 0Ah, 20h, 'A', 22h
Defining Strings
To define a string of characters, enclose them in single or double quotation marks. The most
common type of string ends with a null byte (containing 0) and is called a null-terminated string.
Example:
greeting1 BYTE "Good afternoon",0
greeting2 BYTE 'Good night',0
Each character uses a byte of storage. Strings are an exception to the rule that byte values must
be separated by commas.
• A string can be divided between multiple lines without having to supply a label for each line:
greeting1 BYTE "Welcome to the Encryption Demo program "
BYTE "created by Kip Irvine.",0dh,0ah
BYTE "If you wish to modify this program, please "
BYTE "send me a copy.",0dh,0ah,0
The hexadecimal codes 0Dh and 0Ah are alternately called CR/LF (carriage-return line-feed)
or end-of-line characters.
DUP Operator
The DUP operator allocates storage for multiple data items, using an integer expression as a
counter. It is particularly useful when allocating space for a string or array, and can be used with
initialized or uninitialized data:
BYTE 20 DUP(0) ; 20 bytes, all equal to zero
BYTE 20 DUP(?) ; 20 bytes, uninitialized
BYTE 4 DUP("STACK") ; 20 bytes: "STACKSTACKSTACKSTACK"
var4 BYTE 10,3 DUP(0), 20
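Uninitialized data declared in the .data segment is stored in the compiled program. The following
code produces a compiled program 20,000 bytes larger than one that declares bigArray in the
uninitialized data segment (.data?):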
.data
smallArray DWORD 10 DUP(0) ; 40 bytes
bigArray DWORD 5000 DUP(?) ; 20,000 bytes
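For comparison, a sketch of the more compact declaration, which moves the large uninitialized array into a .data? segment. It reuses the same array names; the size comments describe the storage each array occupies at runtime.

```asm
.data
smallArray DWORD 10 DUP(0)      ; 40 bytes, initialized, stored in the program file

.data?
bigArray   DWORD 5000 DUP(?)    ; 20,000 bytes, reserved without enlarging the program file
```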