MIPSpro Assembly Language Programmer’s Guide: Document Number 007-2418-003
Contents
List of Figures
List of Tables
1. Registers
   Register Format
   General Registers
   Special Registers
   Floating-Point Registers
   Floating-Point Condition Codes
2. Addressing
   Instructions to Load and Store Unaligned Data
   Address Formats
   Address Descriptions
3. Exceptions
   Main Processor Exceptions
   Floating-Point Exceptions
4. Lexical Conventions
   Tokens
   Comments
   Identifiers
   Constants
      Scalar Constants
      Floating-Point Constants
      String Constants
   Multiple Lines Per Physical Line
   Section and Location Counters
   Statements
      Label Definitions
      Null Statements
      Keyword Statements
   Expressions
      Precedence
      Expression Operators
      Data Types
      Type Propagation in Expressions
   Relocations
Index
About This Guide
This book describes the assembly language supported by the RISCompiler system, its
syntax rules, and how to write assembly programs. For information on assembling and
linking an assembly language program, see the MIPSpro Compiling and Performance
Tuning Guide.
The assembler converts assembly language statements into machine code. In most
assembly languages, each instruction corresponds to a single machine instruction;
however, some assembly language instructions can generate several machine
instructions. This feature results in assembly programs that can run without modification
on future machines, which might have different machine instructions.
In this release of operating system and compiler software, the assembler supports
compilations in –o32, –n32, and –64 mode. Some of the implications of the different data
sizes are explained in this book. For additional information, please refer to the MIPSpro
64-Bit Porting and Transition Guide.
Audience
This book assumes that you are an experienced assembly language programmer. The
assembler produces object modules from the assembly instructions that the C and
Fortran 77 compilers generate. It therefore lacks many functions normally present in
assemblers. You should use the assembler only when you must:
• Maximize the efficiency of a routine, which might not be possible in C, Fortran 77,
or another high-level language; for example, to write low-level I/O drivers.
• Access machine functions unavailable in high-level languages or satisfy special
constraints such as restricted register usage.
• Change the operating system.
• Change the compiler system.
Topics Covered
Chapter 1
1. Registers
This chapter describes the organization of data in memory, and the naming and usage
conventions that the assembler applies to the CPU and FPU registers. This chapter covers
the following topics:
• “Register Format”
• “General Registers”
• “Special Registers”
• “Floating-Point Registers”
• “Floating-Point Condition Codes”
See Chapter 7, “Linkage Conventions,” for information on register use and linkage.
Register Format
The CPU uses four data formats: a 64-bit doubleword, a 32-bit word, a 16-bit halfword,
and an 8-bit byte. The CPU’s byte ordering scheme (its endianness) determines the order
of bytes within each of the larger data formats (doubleword, word, or halfword). It affects
memory organization and defines the relationship between the address and the byte
position of data in memory.
For R4000 and earlier systems, byte ordering is configurable as either big-endian or
little-endian (configuration occurs during hardware reset). When
configured as a big-endian system, byte 0 is always the most-significant (leftmost) byte.
When configured as a little-endian system, byte 0 is always the least-significant
(rightmost) byte.
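As an illustration of the two byte orders, the following C sketch models which byte of a 32-bit word sits at each byte offset under either configuration. The function name byte_at_offset and its parameters are hypothetical, chosen for this example only; they are not part of the assembler or of any system header.

```c
#include <stdint.h>

/* Hypothetical helper: returns the byte found at byte offset `offset`
 * (0..3) of a 32-bit word, for a big-endian (big_endian != 0) or a
 * little-endian (big_endian == 0) configuration. */
uint8_t byte_at_offset(uint32_t word, unsigned offset, int big_endian)
{
    /* Big-endian: byte 0 is the most-significant (leftmost) byte.
     * Little-endian: byte 0 is the least-significant (rightmost) byte. */
    unsigned shift = big_endian ? 8u * (3u - offset) : 8u * offset;
    return (uint8_t)(word >> shift);
}
```

For the word 0x11223344, byte 0 is 0x11 in the big-endian case and 0x44 in the little-endian case.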
General Registers
For the MIPS1 and MIPS2 architectures, the CPU has thirty-two 32-bit registers. In the
MIPS3 architecture and above, the size of each of the thirty-two integer registers is 64-bit.
Table 1-1 and Table 1-2 summarize the assembler’s usage, conventions and restrictions
for these registers. The assembler reserves all register names; you must use lowercase for
the names. All register names start with a dollar sign ($).
The general registers have the names $0..$31. By including the file regdef.h (use #include
<regdef.h>) in your program, you can use software names for some general registers.
The operating system and the assembler use the general registers $1, $26, $27, $28, and
$29 for specific purposes. Attempts to use these general registers in other ways can
produce unexpected results.
Note: General register $0 always contains the value 0. All other general registers are
equivalent, except that general register $31 also serves as the implicit link register for
jump and link instructions. See Chapter 7 for a description of register assignments.
Special Registers
The CPU defines three special registers: PC (program counter), HI and LO, as shown in
Table 1-3. The HI and LO special registers hold the results of the multiplication (mult and
multu) and division (div and divu) instructions.
You usually do not need to refer explicitly to these special registers; instructions that use
the special registers refer to them automatically.
Name Description
PC Program Counter
Note: In the MIPS3 architecture and later, the HI and LO registers hold 64 bits.
Floating-Point Registers
The FPU has 32 floating-point registers. In –n32 and –64 compiles, each register can hold
either a single-precision (32-bit) or double-precision (64-bit) value. For –32 compiles,
only 16 register values are allowed, specified with the even floating-point registers (for
example, $f4). For –32, the registers are limited to 32 bits, so for double-precision values,
$f0 holds the least-significant half and $f1 holds the most-significant half, but the
combined double-precision value is referenced with $f0. The –n32 register conventions
are different from the –64 convention, to be more compatible with –32. Tables 1-4, 1-5, and
1-6 summarize the assembler’s usage conventions.
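The pairing of 32-bit registers for a double-precision value in –32 compiles can be sketched in C as follows. This is a minimal model that assumes the C type double is a 64-bit IEEE 754 value; the structure fp_pair and the function split_double are hypothetical names used only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an even/odd register pair such as $f0/$f1. */
struct fp_pair {
    uint32_t even; /* for example $f0: least-significant half */
    uint32_t odd;  /* for example $f1: most-significant half  */
};

/* Splits a double-precision value into the two 32-bit halves that the
 * even and odd registers of a pair would hold. */
struct fp_pair split_double(double value)
{
    uint64_t bits;
    struct fp_pair pair;

    memcpy(&bits, &value, sizeof bits);   /* copy the raw bit pattern */
    pair.even = (uint32_t)(bits & 0xffffffffu);
    pair.odd  = (uint32_t)(bits >> 32);
    return pair;
}
```

For the value 1.0 (bit pattern 0x3FF0000000000000), the even register would hold 0 and the odd register 0x3FF00000.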
Floating-Point Condition Codes
Chapter 2
2. Addressing
This chapter describes the formats that you can use to specify addresses. Silicon Graphics
CPUs use a byte addressing scheme. Access to halfwords requires alignment on even
byte boundaries, and access to words requires alignment on byte boundaries that are
divisible by four. Access to doublewords (for 64-bit systems) requires alignment on byte
boundaries that are divisible by eight. Any attempt to address a data item that does not
have the proper alignment causes an alignment exception.
The unaligned assembler load and store instructions may generate multiple machine
language instructions. They do not raise alignment exceptions.
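The alignment rules above amount to a simple test on the low-order address bits. The following C sketch shows that test; the function name is_aligned is hypothetical and the sketch assumes that the access size is a power of two.

```c
#include <stdint.h>

/* Returns 1 if `address` is properly aligned for an access of `size`
 * bytes (2 = halfword, 4 = word, 8 = doubleword), and 0 otherwise.
 * `size` must be a power of two. */
int is_aligned(uint64_t address, unsigned size)
{
    return (address & (uint64_t)(size - 1u)) == 0;
}
```

An address such as 0x1002 passes the halfword test but fails the word test, so an aligned word access to it would raise an exception.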
Address Formats
The assembler accepts the address formats shown in Table 2-1. Table 2-2 explains
these formats in more detail.
Format Address
Address Descriptions
The assembler accepts any combination of the constants and operations described in this
chapter for expressions in address descriptions.
expression (base-register) Specifies a based address. To get the address, the CPU adds the
value of the expression to the contents of the base-register.
index-register (base-register) Same as expression (base-register), except that the index register
is used as the offset.
relocatable-symbol (index register) Specifies an indexed relocatable address. To get the address,
the CPU adds the index register to the relocatable symbol’s
address. The assembler generates the necessary instruction(s)
to address the item and generates relocatable information for
the link editor. If the symbol name does not appear as a label
anywhere in the assembly, the assembler assumes that the
symbol is external.
Chapter 3
3. Exceptions
This chapter describes the exceptions that you can encounter while running assembly
programs. The system detects some exceptions directly, and the assembler inserts
specific tests that signal other exceptions. This chapter lists only those exceptions that
occur frequently:
• “Main Processor Exceptions”
• “Floating-Point Exceptions”
The following exceptions are the most common to the main processor:
• Address error exceptions, which occur when a data item is referenced that is not on
its proper memory alignment or when an address is invalid for the executing
process.
• Overflow exceptions, which occur when arithmetic operations compute signed
values and the destination lacks the precision to store the result.
• Bus exceptions, which occur when an address is invalid for the executing process.
• Divide-by-zero exceptions, which occur when a divisor is zero.
Floating-Point Exceptions
Chapter 4
4. Lexical Conventions
Tokens
The assembler lets you put blank characters and tab characters anywhere between
tokens; however, it does not allow these characters within tokens (except for character
constants). A blank or tab must separate adjacent identifiers or constants that are not
otherwise separated.
Comments
The pound sign character (#) introduces a comment. Comments that start with a # extend
through the end of the line on which they appear. You can also use C-language notation
/*...*/ to delimit comments.
The assembler uses cpp (the C language preprocessor) to preprocess assembler code.
Because cpp interprets #s in the first column as pragmas (compiler directives), do not start
a # comment in the first column.
Identifiers
If an identifier is not defined to the assembler (only referenced), the assembler assumes
that the identifier is an external symbol. The assembler treats the identifier like a .globl
pseudo-operation (see Chapter 8). If the identifier is defined to the assembler and the
identifier has not been specified as global, the assembler assumes that the identifier is a
local symbol.
Constants
Scalar Constants
Floating-Point Constants
where:
• d1 is written as a decimal integer and denotes the integral part of the floating-point
value.
• d2 is written as a decimal integer and denotes the fractional part of the
floating-point value.
• d3 is written as a decimal integer and denotes a power of 10.
• The “+” symbol is optional.
For example:
21.73E-3
Optionally, .float and .double directives may use hexadecimal floating-point constants
instead of decimal ones. A hexadecimal floating-point constant consists of:
<+ or –> 0x <1 or 0 or nothing> . <hex digits> H 0x <hex digits>
The assembler places the first set of hex digits (excluding the 0 or 1 preceding the decimal
point) in the mantissa field of the floating-point format without attempting to normalize
it. It stores the second set of hex digits into the exponent field without biasing them. It
checks that the exponent is appropriate if the mantissa appears to be denormalized.
Hexadecimal floating-point constants are useful for generating IEEE special symbols,
and for writing hardware diagnostics.
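To show why unnormalized mantissa and unbiased exponent fields are convenient for IEEE special values, the following C sketch packs raw field values into a single-precision bit pattern (1 sign bit, 8 exponent bits, 23 mantissa bits). It is only a model: the function pack_single is hypothetical, and the sketch takes the field values directly rather than reproducing how the assembler aligns the hex digits of a constant within each field.

```c
#include <stdint.h>

/* Hypothetical helper: builds an IEEE 754 single-precision bit pattern
 * from raw field values, without normalizing the mantissa and without
 * biasing the exponent. */
uint32_t pack_single(unsigned sign, uint32_t exponent_field,
                     uint32_t mantissa_field)
{
    return ((uint32_t)(sign & 1u) << 31)        /* bit 31: sign       */
         | ((exponent_field & 0xffu) << 23)     /* bits 30..23        */
         | (mantissa_field & 0x7fffffu);        /* bits 22..0         */
}
```

With an exponent field of 0xff and a zero mantissa field, the result is an infinity (0x7F800000 for a positive sign); a nonzero mantissa field with the same exponent yields a NaN.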
String Constants
String constants begin and end with double quotation marks (").
The assembler observes C language backslash conventions. For octal notation, the
backslash conventions require three characters when the next character can be confused
with the octal number. For hexadecimal notation, the backslash conventions require two
characters when the next character can be confused with the hexadecimal number (that
is, use a 0 for the first character of a single character hex number).
Convention Meaning
\a Alert (0x07)
\b Backspace (0x08)
\n Newline (0x0a)
\\ Backslash (0x5c)
You can include multiple statements on the same line by separating the statements with
semicolons. The assembler does not recognize semicolons as separators when they
follow comment symbols (# or /*).
Assembled code and data fall in one of the sections shown in Figure 4-1.
[Figure 4-1 is not reproduced here; its surviving section labels are .data and .lit4.]
The assembler always generates the text section before other sections. Additions to the
text section happen in four-byte units. Each section has an implicit location counter,
which begins at zero and increments by one for each byte assembled in the section.
The bss section holds zero-initialized data. If a .lcomm pseudo-op defines a variable (see
Chapter 8), the assembler assigns that variable to the bss (block started by storage) section
or to the sbss (short block started by storage) section depending on the variable’s size. The
default variable size for sbss is 8 or fewer bytes.
The command line option –G for each compiler (C, Pascal, Fortran 77, or the assembler),
can increase the size of sbss to cover all but extremely large data items. The link editor
issues an error message when the –G value gets too large. If a –G value is not specified
to the compiler, 8 is the default. Items smaller than, or equal to, the specified size go in
sbss. Items greater than the specified size go in bss.
Because you can address items much more quickly through $gp than through a more
general method, put as many items as possible in sdata, srdata, or sbss. The size of sdata,
srdata, and sbss combined must not exceed 64 KB.
Statements
Each statement consists of an optional label, an operation code, and the operand(s). The
system allows these statements:
• Null statements
• Keyword statements
Label Definitions
Label definitions always end with a colon. You can put a label definition on a line by
itself.
A generated label is a single numeric value (1...255). To reference a generated label, put
an f (forward) or a b (backward) immediately after the digit. The reference tells the
assembler to look for the nearest generated label that corresponds to the number in the
lexically forward or backward direction.
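The lookup rule for generated labels can be modeled with a short C sketch. The structure gen_label and the function resolve_generated_label are hypothetical names for this illustration, and the sketch does not model a reference that appears on the same line as a definition of the same label.

```c
#include <stddef.h>

/* Hypothetical record of one generated label definition. */
struct gen_label {
    int number;   /* label number, 1..255 */
    long line;    /* source line on which the definition appears */
};

/* Resolves a reference such as "1f" or "1b" made on line `ref_line`.
 * `direction` is 'f' (forward) or 'b' (backward). Returns the line of
 * the nearest matching definition, or -1 if there is none. */
long resolve_generated_label(const struct gen_label *labels, size_t count,
                             int number, char direction, long ref_line)
{
    long best = -1;

    for (size_t i = 0; i < count; i++) {
        long line = labels[i].line;

        if (labels[i].number != number)
            continue;
        if (direction == 'f' && line > ref_line) {
            if (best == -1 || line < best)
                best = line;            /* nearest definition after */
        } else if (direction == 'b' && line < ref_line) {
            if (best == -1 || line > best)
                best = line;            /* nearest definition before */
        }
    }
    return best;
}
```

Because the same number can be defined many times, a reference always binds to the closest definition in the requested direction rather than to a unique label.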
Null Statements
A null statement is an empty statement that the assembler ignores. Null statements can
have label definitions. For example, this line has three null statements in it:
label: ; ;
Keyword Statements
A keyword statement begins with a predefined keyword. The syntax for the rest of the
statement depends on the keyword. All instruction opcodes are keywords. All other
keywords are assembler pseudo-operations (directives).
Expressions
An expression is a sequence of symbols that represent a value. Each expression and its
result have data types. The assembler does arithmetic in twos-complement integers (32
bits of precision in 32-bit mode; 64 bits of precision in 64-bit mode). Expressions follow
precedence rules and consist of:
• Operators
• Identifiers
• Constants
Also, you may use a single character string in place of an integer within an expression.
Thus:
.byte "a" ; .word "a"+0x19
is equivalent to:
.byte 0x61 ; .word 0x7a
Precedence
Unless parentheses enforce precedence, the assembler evaluates all operators of the same
precedence strictly from left to right. Because parentheses also designate index-registers,
ambiguity can arise from parentheses in expressions. To resolve this ambiguity, put a
unary + in front of parentheses in expressions.
The assembler has three precedence levels, which are listed here from lowest to highest
precedence:
least binding (lowest precedence): binary +, -
intermediate precedence: binary *, /, %, <<, >>, ^, &, |
most binding (highest precedence): unary -, +, ~
Note: The assembler’s precedence scheme differs from that of the C language.
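The difference from C can be seen by writing out, with explicit parentheses, how each scheme groups the same operator sequence. The following C sketch does this for two expressions; the function names are hypothetical, and the groupings follow from the rule that the shift, multiplicative, and bitwise operators share one level that binds tighter than binary + and -, with operators of equal precedence evaluated left to right.

```c
#include <stdint.h>

/* "1 + 2 << 3" as the assembler groups it: << binds tighter than +. */
int32_t shift_assembler_grouping(void) { return 1 + (2 << 3); }   /* 17 */

/* The same text as C groups it: + binds tighter than <<. */
int32_t shift_c_grouping(void) { return (1 + 2) << 3; }           /* 24 */

/* "6 & 3 * 2" as the assembler groups it: & and * share one level,
 * so evaluation proceeds strictly left to right. */
int32_t and_mul_assembler_grouping(void) { return (6 & 3) * 2; }  /* 4 */

/* The same text as C groups it: * binds tighter than &. */
int32_t and_mul_c_grouping(void) { return 6 & (3 * 2); }          /* 6 */
```

When an expression is shared between C and assembly sources (for example through cpp macros), explicit parentheses avoid depending on either scheme.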
Expression Operators
For expressions, you can rely on the precedence rules, or you can group expressions with
parentheses. The assembler recognizes the operators listed in Table 4-2.
Operator Meaning
+ Addition
- Subtraction
* Multiplication
/ Division
% Remainder
^ Bitwise Exclusive-OR
| Bitwise OR
- Minus (unary)
+ Identity (unary)
~ Complement
Data Types
The assembler manipulates several types of expressions. Each symbol you reference or
define belongs to one of the categories shown in Table 4-3.
Type Description
undefined Any symbol that is referenced but not defined becomes global
undefined, and this module will attempt to import it. The
assembler uses 32-bit addressing to access these symbols.
(Declaring such a symbol in a .globl pseudo-op merely makes its
status clearer).
text The text section contains the program’s instructions, which are
not modifiable during execution. Any symbol defined while
the .text pseudo-op is in effect belongs to the text section.
data The data section contains memory that the linker can initialize to
nonzero values before your program begins to execute. Any
symbol defined while the .data pseudo-op is in effect belongs to
the data section. The assembler uses 32-bit or 64-bit addressing to
access these symbols (depending on whether you are in 32-bit or
64-bit mode).
bss and sbss The bss and sbss sections consist of memory which the kernel
loader initializes to zero before your program begins to execute.
Any symbol defined in a .comm or .lcomm pseudo-op belongs to
these sections (except that a .data, .sdata, or .rdata pseudo-op can
override a .comm directive). If its size is less than the number of
bytes specified by the –G option on the command line (which
defaults to 8), it belongs to sbss (“small bss”), and the linker
places it within a 64 KB region pointed to by the $gp register so
that the assembler can use economical 16-bit addressing to access
it. Otherwise, it belongs to bss and the assembler uses 32-bit or
64-bit addressing (depending on whether you are in 32-bit or
64-bit mode). Local symbols in bss or sbss defined by .lcomm are
allocated memory by the assembler; global symbols are allocated
memory by the link editor; and symbols defined by .comm are
overlaid upon like-named symbols (in the fashion of Fortran
“COMMON” blocks) by the link editor.
Symbols in the undefined and small undefined categories are always global (that is, they
are visible to the link editor and can be shared with other modules of your program).
Symbols in the absolute, text, data, sdata, rdata, bss, and sbss categories are local unless
declared in a .globl pseudo-op.
When expression operators combine expression operands, the result’s type depends on
the types of the operands and on the operator. Expressions follow these type propagation
rules:
• If an operand is undefined, the result is undefined.
• If both operands are absolute, the result is absolute.
• If the operator is + and the first operand refers to a relocatable text-section,
data-section, bss-section, or an undefined external, the result has the postulated type
and the other operand must be absolute.
• If the operator is - and the first operand refers to a relocatable text-section,
data-section, or bss-section symbol, the second operand can be absolute (if it is
previously defined) and the result has the first operand’s type; or the second
operand can have the same type as the first operand and the result is absolute. If the
first operand is external undefined, the second operand must be absolute.
• The operators *, /, %, <<, >>, ~, ^, &, and | apply only to absolute symbols.
Relocations
With –n32 and –64 compiles, it is possible to specify a relocation explicitly in assembly.
For example:
lui $24,%hi(.data)
This example emits a lui $24,0 instruction with a R_MIPS_HI16 relocation that references
the .data symbol.
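The arithmetic behind a %hi/%lo pair can be sketched in C. The sketch assumes the usual MIPS convention that the low 16 bits are later added as a sign-extended immediate (for example by addiu or by a load or store offset), so the high half is rounded up when bit 15 of the address is set. The function names hi16, lo16, and rebuild_address are hypothetical and are not assembler or link editor interfaces.

```c
#include <stdint.h>

/* High half, adjusted so that adding the sign-extended low half
 * reproduces the original address. */
uint16_t hi16(uint32_t address)
{
    return (uint16_t)((address + 0x8000u) >> 16);
}

/* Low half, stored as a raw 16-bit field. */
uint16_t lo16(uint32_t address)
{
    return (uint16_t)(address & 0xffffu);
}

/* Models "lui reg,hi" followed by an add of the sign-extended low half. */
uint32_t rebuild_address(uint16_t hi, uint16_t lo)
{
    uint32_t upper = (uint32_t)hi << 16;
    int32_t offset = (lo & 0x8000u) ? (int32_t)lo - 0x10000 : (int32_t)lo;

    return upper + (uint32_t)offset;   /* wraps modulo 2^32 */
}
```

For the address 0x12348765 the high half is 0x1235 rather than 0x1234, because the low half 0x8765 is negative once sign-extended.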
%hi R_MIPS_HI16
%lo R_MIPS_LO16
%gp_rel R_MIPS_GPREL
%half R_MIPS_16
%call16 R_MIPS_CALL16
%call_hi R_MIPS_CALL_HI16
%call_lo R_MIPS_CALL_LO16
%got R_MIPS_GOT
%got_disp R_MIPS_GOT_DISP
%got_hi R_MIPS_GOT_HI16
%got_lo R_MIPS_GOT_LO16
%got_page R_MIPS_GOT_PAGE
%got_ofst R_MIPS_GOT_OFST
%neg R_MIPS_SUB
%higher R_MIPS_HIGHER
%highest R_MIPS_HIGHEST
See the N32/64 ELF Object-File Documentation for a description of what these
relocations do. Use these relocations in instructions only where it makes sense to do so;
otherwise they are illegal.
Chapter 5
5. The Instruction Set
This chapter describes instruction notation and discusses assembler instructions for the
main processor. Topics covered include:
• “Instruction Classes”
• “Reorganization Constraints and Rules”
• “Instruction Notation”
• “Instruction Set”
• “Computational Instructions”
• “Jump and Branch Instructions”
• “Special Instructions”
• “Coprocessor Interface Instructions”
Instruction Classes
The assembler has these classes of instructions for the main processor:
• Load and Store Instructions. These instructions load immediate values and move
data between memory and general registers.
• Computational Instructions. These instructions do arithmetic and logical
operations for values in registers.
• Jump and Branch Instructions. These instructions change program control flow.
To maximize performance, the goal of RISC designs is to achieve an execution rate of one
machine cycle per instruction. When writing assembly language instructions, you must
be aware of the rules to achieve this goal. You can find this information in the appropriate
microprocessor manual for your architecture (for example, the MIPS R8000
Microprocessor User’s Manual).
Instruction Notation
The tables in this chapter list the assembler format for each load, store, computational,
jump, branch, coprocessor, and special instruction. The format consists of an op-code and
a list of operand formats. The tables list groups of closely related instructions; for those
instructions, you can use any op-code with any specified operand.
The operands in the tables in this chapter have the following meanings:
Operand Description
Instruction Set
The tables in this section summarize the assembly language instruction set. Most of the
assembly language instructions have direct machine equivalents.
Load and store instructions are immediate-type instructions that move data between memory and the
general registers. Table 5-1 summarizes the load and store instruction format, and
Table 5-2 and Table 5-3 provide more detailed descriptions for each load instruction.
Table 5-4 and Table 5-5 provide details of each store instruction.
Load Byte LB
Load Halfword LH
Load Linked* LL
Load Word LW
Load Doubleword LD
Store Conditional * SC
Store Double SD
Store Halfword SH
Store Word SW
For all load instructions, the effective address is the 32-bit twos-complement sum of the
contents of the index-register and the (sign-extended) 16-bit offset. Instructions that have
symbolic labels imply an index register, which the assembler determines. The assembler
supports additional load instructions, which can produce multiple machine instructions.
Note: Load instructions can generate many code sequences for which the link editor
must fix the address by resolving external data items.
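The effective address computation for the machine load instructions can be modeled in C as follows. The function effective_address is a hypothetical name, and the sketch covers only the 32-bit case described above.

```c
#include <stdint.h>

/* Returns the 32-bit twos-complement sum of a base register value and a
 * sign-extended 16-bit offset field. */
uint32_t effective_address(uint32_t base, uint16_t offset_field)
{
    /* Sign-extend the 16-bit field to 32 bits. */
    int32_t offset = (offset_field & 0x8000u)
                         ? (int32_t)offset_field - 0x10000
                         : (int32_t)offset_field;

    return base + (uint32_t)offset;   /* wraps modulo 2^32 */
}
```

An offset field of 0xFFFC therefore addresses the word 4 bytes below the base, not 65532 bytes above it.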
Load Address (LA)
  Loads the destination register with the effective 32-bit address of the
  specified data item.
Load Doubleword Address (DLA)
  Loads the destination register with the effective 64-bit address of the
  specified data item (MIPS3 and above only).
Load Byte (LB)
  Loads the least-significant byte of the destination register with the
  contents of the byte that is at the memory location specified by the
  effective address. The system treats the loaded byte as a signed value:
  bit seven is extended to fill the three most-significant bytes.
Load Byte Unsigned (LBU)
  Loads the least-significant byte of the destination register with the
  contents of the byte that is at the memory location specified by the
  effective address. Because the system treats the loaded byte as an
  unsigned value, it fills the three most-significant bytes of the
  destination register with zeros.
Load Halfword (LH)
  Loads the two least-significant bytes of the destination register with
  the contents of the halfword that is at the memory location specified by
  the effective address. The system treats the loaded halfword as a signed
  value. If the effective address is not even, the system signals an
  address error exception.
Load Linked (LL)
  Loads the destination register with the contents of the word that is at
  the memory location. This instruction performs an SYNC operation
  implicitly; all loads and stores to shared memory fetched prior to the LL
  must access memory before the LL, and loads and stores to shared memory
  fetched subsequent to the LL must access memory after the LL. Load Linked
  and Store Conditional can be used to update memory locations atomically.
  The system signals an address exception when the effective address is not
  divisible by four.
  Note: This instruction is not valid in the MIPS1 architectures.
Load Word (LW)
  Loads the destination register with the contents of the word that is at
  the memory location. The system replaces all bytes of the register with
  the contents of the loaded word. The system signals an address error
  exception when the effective address is not divisible by four.
Load Word Left (LWL)
  Loads the sign; that is, Load Word Left loads the destination register
  with the most-significant bytes of the word specified by the effective
  address. The effective address must specify the byte containing the sign.
  In a big-endian system, the effective address specifies the lowest
  numbered byte; in a little-endian system, the effective address specifies
  the highest numbered byte. Only the bytes which share the same aligned
  word in memory are merged into the destination register.
Load Word Right (LWR)
  Loads the lowest precision bytes; that is, Load Word Right loads the
  destination register with the least-significant bytes of the word
  specified by the effective address. The effective address must specify
  the byte containing the least-significant bits. In a big-endian
  configuration, the effective address specifies the highest numbered byte;
  in a little-endian configuration, the effective address specifies the
  lowest numbered byte. Only the bytes which share the same aligned word in
  memory are merged into the destination register.
Unaligned Load Halfword (ULH)
  Loads a halfword into the destination register from the specified address
  and extends the sign of the halfword. Unaligned Load Halfword loads a
  halfword regardless of the halfword’s alignment in memory.
Unaligned Load Halfword Unsigned (ULHU)
  Loads a halfword into the destination register from the specified address
  and zero extends the halfword. Unaligned Load Halfword Unsigned loads a
  halfword regardless of the halfword’s alignment in memory.
Unaligned Load Word (ULW)
  Loads a word into the destination register from the specified address.
  Unaligned Load Word loads a word regardless of the word’s alignment in
  memory.
Load Immediate (LI)
  Loads the destination register with the 32-bit value of an expression
  that can be computed at assembly time.
  Note: Load Immediate can generate any efficient code sequence to put a
  desired value in the register.
Load Doubleword Immediate (DLI)
  Loads the destination register with the 64-bit value of an expression
  that can be computed at assembly time.
  Note: Load Immediate can generate any efficient code sequence to put a
  desired value in the register (MIPS3 and above only).
Load Doubleword (LD)
  Loads the destination register with the contents of the doubleword that
  is at the memory location. The system replaces all bytes of the register
  with the contents of the loaded doubleword. The system signals an address
  error exception when the effective address is not divisible by eight.
Load Linked Doubleword (LLD)
  Loads the destination register with the contents of the doubleword that
  is currently in the memory location. This instruction performs a SYNC
  operation implicitly. Load Linked Doubleword and Store Conditional
  Doubleword can be used to update memory locations atomically.
Table 5-3 (continued) Load Instruction Descriptions for MIPS3/4 Architecture Only
Unaligned Load Doubleword (ULD)
  Loads a doubleword into the destination register from the specified
  address. ULD loads a doubleword regardless of the doubleword’s alignment
  in memory.
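The way Load Word Left and Load Word Right merge bytes into the destination register can be modeled in C. The sketch below covers only the big-endian case and treats memory as a byte array; the function names are hypothetical. Combining the two operations as in ulw_be is a common way to realize an unaligned word load, shown here as an assumption rather than as the exact sequence the assembler emits for ULW.

```c
#include <stdint.h>

/* Reads the aligned big-endian word that contains byte address `addr`. */
static uint32_t aligned_word_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t base = addr & ~3u;

    return ((uint32_t)mem[base] << 24) | ((uint32_t)mem[base + 1] << 16)
         | ((uint32_t)mem[base + 2] << 8) | (uint32_t)mem[base + 3];
}

/* Load Word Left: merges the bytes from `addr` to the end of its aligned
 * word into the most-significant bytes of `reg`. */
uint32_t lwl_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    unsigned shift = 8u * (addr & 3u);
    uint32_t keep = (shift == 0) ? 0u : (0xffffffffu >> (32u - shift));

    return (aligned_word_be(mem, addr) << shift) | (reg & keep);
}

/* Load Word Right: merges the bytes from the start of the aligned word
 * through `addr` into the least-significant bytes of `reg`. */
uint32_t lwr_be(uint32_t reg, const uint8_t *mem, uint32_t addr)
{
    unsigned shift = 8u * (3u - (addr & 3u));
    uint32_t keep = ~(0xffffffffu >> shift);

    return (aligned_word_be(mem, addr) >> shift) | (reg & keep);
}

/* Unaligned word load built from the two halves (an assumed sequence). */
uint32_t ulw_be(const uint8_t *mem, uint32_t addr)
{
    uint32_t reg = 0;

    reg = lwl_be(reg, mem, addr);        /* most-significant bytes  */
    reg = lwr_be(reg, mem, addr + 3u);   /* least-significant bytes */
    return reg;
}
```

With memory bytes 00 11 22 33 44 55 66 77 starting at address 0, ulw_be at address 1 returns 0x11223344: the first call supplies 0x112233 from the first aligned word and the second supplies 0x44 from the next.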
For all machine store instructions, the effective address is the 32-bit twos-complement
sum of the contents of the index-register and the (sign-extended) 16-bit offset. The
assembler supports additional store instructions, which can produce multiple machine
instructions. Instructions that have symbolic labels imply an index-register, which the
assembler determines.
Store Byte (SB)
  Stores the contents of the source register’s least-significant byte in
  the byte specified by the effective address.
Store Conditional (SC)
  Stores the contents of a word from the source register into the memory
  location specified by the effective address. This instruction implicitly
  performs a SYNC operation; all loads and stores to shared memory fetched
  prior to the sc must access memory before the sc, and loads and stores to
  shared memory fetched subsequent to the sc must access memory after the
  sc. If any other processor or device has modified the physical address
  since the time of the previous Load Linked instruction, or if an RFE or
  ERET instruction occurs between the Load Linked and this store
  instruction, the store fails. The success or failure of the store
  operation (as defined above) is indicated by the contents of the source
  register after execution of the instruction. A successful store sets it
  to 1; and a failed store sets it to 0. The machine signals an address
  exception when the effective address is not divisible by four.
  Note: This instruction is not valid in the MIPS1 architectures.
Store Halfword (SH)
  Stores the two least-significant bytes of the source register in the
  halfword that is at the memory location specified by the effective
  address. The effective address must be divisible by two; otherwise the
  machine signals an address error exception.
Store Word Left (SWL)
  Stores the most-significant bytes of a word in the memory location
  specified by the effective address. The contents of the word at the
  memory location, specified by the effective address, are shifted right so
  that the leftmost byte of the unaligned word is in the addressed byte
  position. The stored bytes replace the corresponding bytes of the
  effective address. The effective address’s last two bits determine how
  many bytes are involved.
Store Word Right (SWR)
  Stores the least-significant bytes of a word in the memory location
  specified by the effective address. The contents of the word at the
  memory location, specified by the effective address, are shifted left so
  that the right byte of the unaligned word is in the addressed byte
  position. The stored bytes replace the corresponding bytes of the
  effective address. The effective address’s last two bits determine how
  many bytes are involved.
Store Word (SW)
  Stores the contents of a word from the source register in the memory
  location specified by the effective address. The effective address must
  be divisible by four; otherwise the machine signals an address error
  exception.
Unaligned Store Halfword (USH)
  Stores the contents of the two least-significant bytes of the source
  register in a halfword that the address specifies. The machine does not
  require alignment for the storage address.
Unaligned Store Word (USW)
  Stores the contents of the source register in a word specified by the
  address. The machine does not require alignment for the storage address.
Store Doubleword Stores the contents of a doubleword from the source register
(SD) in the memory location specified by the effective address.
The effective address must be divisible by eight, otherwise
the machine signals an address error exception.
Store Conditional Stores the contents of a doubleword from the source register
Doubleword (SCD) into the memory locations specified by the effective address.
This instruction implicitly performs a SYNC operation. If
any other processor or device has modified the physical
address since the time of the previous Load Linked
instruction, or if an ERET instruction occurs between the
Load Linked instruction and this store instruction, the store
fails and is inhibited from taking place. The success or
failure of the store operation (as defined above) is indicated
by the contents of the source register after execution of this
instruction. A successful store sets it to 1; and a failed store
sets it to 0. The machine signals an address exception when
the effective address is not divisible by eight.
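The way SWL and SWR divide an unaligned word between them is easier to follow in a small model. The Python sketch below is illustrative only: it assumes a big-endian memory layout, and the helper names swl_be and swr_be are invented for this example (they are not assembler features). It shows how the last two bits of the effective address select how many bytes each instruction writes.

```python
def swl_be(mem, addr, reg):
    """Model SWL on a big-endian machine (hypothetical helper)."""
    count = 4 - (addr & 3)                      # bytes from addr to the end of the aligned word
    data = (reg & 0xFFFFFFFF).to_bytes(4, "big")
    mem[addr:addr + count] = data[:count]       # most-significant bytes of the register


def swr_be(mem, addr, reg):
    """Model SWR on a big-endian machine (hypothetical helper)."""
    count = (addr & 3) + 1                      # bytes from the start of the aligned word to addr
    base = addr & ~3
    data = (reg & 0xFFFFFFFF).to_bytes(4, "big")
    mem[base:base + count] = data[4 - count:]   # least-significant bytes of the register


memory = bytearray(8)
swl_be(memory, 1, 0x11223344)   # writes 11 22 33 into bytes 1..3
swr_be(memory, 4, 0x11223344)   # writes 44 into byte 4
print(memory.hex())             # 0011223344000000
```

Issued as a pair on addresses addr and addr+3, the two stores place the whole word at an unaligned address, which is the net effect the USW entry describes.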
Computational Instructions
Table 5-6 summarizes the formats of the computational instructions, and Table 5-7 and
Table 5-8 describe these instructions in more detail.
Exclusive-OR XOR
Multiply MUL
NOT OR NOR
OR OR
NOT NOT
Absolute Value Computes the absolute value of the contents of src1 and puts
(ABS) the result in the destination register. If the value in src1 is
–2147483648, the machine signals an overflow exception.
Add with Overflow Computes the twos-complement sum of two signed values.
(ADD) This instruction adds the contents of src1 to the contents of
src2, or it can add the contents of src1 to the immediate value.
Add (with Overflow) puts the result in the destination
register. When the result cannot be represented as a 32-bit
signed number, the machine signals an overflow exception.
AND (AND) Computes the Logical AND of two values. This instruction
ANDs (bit-wise) the contents of src1 with the contents of
src2, or it can AND the contents of src1 with the immediate
value. The immediate value is not sign extended. AND puts
the result in the destination register.
Divide Signed (DIV) Computes the quotient of two values. Divide (with
Overflow) treats src1 as the dividend. The divisor can be src2
or the immediate value. The instruction divides the contents
of src1 by the contents of src2, or it can divide src1 by the
immediate value. It puts the quotient in the destination
register. If the divisor is zero, the machine signals an error
and may issue a BREAK instruction. The DIV instruction
rounds toward zero. Overflow is signaled when dividing
–2147483648 by –1. The machine may issue a BREAK
instruction for divide-by-zero or for overflow.
Note: The special case DIV $0,src1,src2 generates the real
machine divide instruction and leaves the result in the
HI/LO register. The HI register contains the remainder and
the LO register contains the quotient. No checking for
divide-by-zero is performed.
Exclusive-OR (XOR) Computes the XOR of two values. This instruction XORs
(bit-wise) the contents of src1 with the contents of src2, or it
can XOR the contents of src1 with the immediate value. The
immediate value is not sign extended. Exclusive-OR puts
the result in the destination register.
Multiply (MUL) Computes the product of two values. This instruction puts
the 32-bit product of src1 and src2, or the 32-bit product of
src1 and the immediate value, in the destination register. The
machine does not report overflow.
Note: Use MUL when you do not need overflow protection:
it’s often faster than MULO and MULOU. For multiplication
by a constant, the MUL instruction produces faster machine
instruction sequences than MULT or MULTU instructions
can produce.
Multiply (MULT) Computes the 64-bit product of two 32-bit signed values.
This instruction multiplies the contents of src1 by the
contents of src2 and puts the result in the HI and LO registers
(see Chapter 1). No overflow is possible.
Note: The MULT instruction is a real machine language
instruction.
Multiply with Computes the product of two 32-bit signed values. Multiply
Overflow (MULO) (with Overflow) puts the 32-bit product of src1 and src2, or
the 32-bit product of src1 and the immediate value, in the
destination register. When an overflow occurs, the machine
signals an overflow exception and may execute a BREAK
instruction.
Note: For multiplication by a constant, MULO produces
faster machine instruction sequences than MULT or MULTU
can produce; however, if you do not need overflow
detection, use the MUL instruction. It’s often faster than
MULO.
Negate without Negates the integer contents of src1 and puts the result in the
Overflow (NEGU) destination register. The machine does not report overflows.
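The overflow and rounding rules given for ADD, DIV, and MULT can be summarized in a short model. The following Python sketch is illustrative only: the function names are invented for this example, Python exceptions stand in for the machine's overflow and divide-by-zero signals, and HI is taken to hold the upper 32 bits of the MULT product.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def add_with_overflow(a, b):
    """ADD: signal overflow when the sum does not fit in 32 signed bits."""
    total = a + b
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError("overflow exception")
    return total


def div_signed(a, b):
    """DIV: quotient rounded toward zero, with the two error cases in the table."""
    if b == 0:
        raise ZeroDivisionError("divide-by-zero")
    if a == INT32_MIN and b == -1:
        raise OverflowError("overflow exception")
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


def mult(a, b):
    """MULT: 64-bit signed product returned as (HI, LO) 32-bit halves."""
    product = (a * b) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & 0xFFFFFFFF


print(div_signed(-7, 2))                # -3
print([hex(x) for x in mult(-1, 2)])    # ['0xffffffff', '0xfffffffe']
```

Note that Python's own // operator rounds toward negative infinity (-7 // 2 is -4), which is why the sketch divides magnitudes and restores the sign afterward.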
Rotate Left (ROL) Rotates the contents of a register left (toward the sign bit).
This instruction inserts in the least-significant bit any bits
that were shifted out of the sign bit. The contents of src1
specify the value to shift, and the contents of src2 (or the
immediate value) specify the amount to shift. Rotate Left
puts the result in the destination register. If src2 (or the
immediate value) is greater than 31, src1 shifts by (src2 MOD
32).
Rotate Right (ROR) Rotates the contents of a register right (toward the
least-significant bit). This instruction inserts in the sign bit
any bits that were shifted out of the least-significant bit. The
contents of src1 specify the value to shift, and the contents of
src2 (or the immediate value) specify the amount to shift.
Rotate Right puts the result in the destination register. If src2
(or the immediate value) is greater than 31, src1 shifts by src2
MOD 32.
Set Equal (SEQ) Compares two 32-bit values. If the contents of src1 equal the
contents of src2 (or src1 equals the immediate value) this
instruction sets the destination register to one; otherwise, it
sets the destination register to zero.
Set Greater Than Compares two signed 32-bit values. If the contents of src1 are
(SGT) greater than the contents of src2 (or src1 is greater than the
immediate value), this instruction sets the destination
register to one; otherwise, it sets the destination register to
zero.
Set Greater/Equal Compares two signed 32-bit values. If the contents of src1 are
(SGE) greater than or equal to the contents of src2 (or src1 is greater
than or equal to the immediate value), this instruction sets
the destination register to one; otherwise, it sets the
destination register to zero.
Set Greater/Equal Compares two unsigned 32-bit values. If the contents of src1
Unsigned (SGEU) are greater than or equal to the contents of src2 (or src1 is
greater than or equal to the immediate value), this
instruction sets the destination register to one; otherwise, it
sets the destination register to zero.
Set Greater Than Compares two unsigned 32-bit values. If the contents of src1
Unsigned (SGTU) are greater than the contents of src2 (or src1 is greater than
the immediate value), this instruction sets the destination
register to one; otherwise, it sets the destination register to
zero.
Set Less Than (SLT) Compares two signed 32-bit values. If the contents of src1 are
less than the contents of src2 (or src1 is less than the
immediate value), this instruction sets the destination
register to one; otherwise, it sets the destination register to
zero.
Set Less/Equal (SLE) Compares two signed 32-bit values. If the contents of src1 are
less than or equal to the contents of src2 (or src1 is less than
or equal to the immediate value), this instruction sets the
destination register to one; otherwise, it sets the destination
register to zero.
Set Less/Equal Compares two unsigned 32-bit values. If the contents of src1
Unsigned (SLEU) are less than or equal to the contents of src2 (or src1 is less
than or equal to the immediate value) this instruction sets
the destination register to one; otherwise, it sets the
destination register to zero.
Set Less Than Compares two unsigned 32-bit values. If the contents of src1
Unsigned (SLTU) are less than the contents of src2 (or src1 is less than the
immediate value), this instruction sets the destination
register to one; otherwise, it sets the destination register to
zero.
Set Not Equal (SNE) Compares two 32-bit values. If the contents of src1 do not
equal the contents of src2 (or src1 does not equal the
immediate value), this instruction sets the destination
register to one; otherwise, it sets the destination register to
zero.
Shift Left Logical Shifts the contents of a register left (toward the sign bit) and
(SLL) inserts zeros at the least-significant bit. The contents of src1
specify the value to shift, and the contents of src2 or the
immediate value specify the amount to shift. If src2 (or the
immediate value) is greater than 31 or less than 0, src1 shifts
by src2 MOD 32.
Shift Right Logical Shifts the contents of a register right (toward the
(SRL) least-significant bit) and inserts zeros at the most-significant
bit. The contents of src1 specify the value to shift, and the
contents of src2 (or the immediate value) specify the amount
to shift. If src2 (or the immediate value) is greater than 31 or
less than 0, src1 shifts by the result of src2 MOD 32.
Trap if Equal (TEQ) Compares two 32-bit values. If the contents of src1 equal the
contents of src2 (or src1 equals the immediate value), a trap
exception occurs.
Trap if Not Equal Compares two 32-bit values. If the contents of src1 do not
(TNE) equal the contents of src2 (or src1 does not equal the
immediate value), a trap exception occurs.
Trap if Less Than Compares two signed 32-bit values. If the contents of src1 are
(TLT) less than the contents of src2 (or src1 is less than the
immediate value), a trap exception occurs.
Trap if Less Than Compares two unsigned 32-bit values. If the contents of src1
Unsigned (TLTU) are less than the contents of src2 (or src1 is less than the
immediate value), a trap exception occurs.
Trap if Greater than or Equal (TGE) Compares two signed 32-bit values. If the
contents of src1 are greater than or equal to the contents of src2 (or src1 is
greater than or equal to the immediate value), a trap exception occurs.
Trap if Greater than or Equal Unsigned (TGEU) Compares two unsigned 32-bit values.
If the contents of src1 are greater than or equal to the contents of src2 (or src1
is greater than or equal to the immediate value), a trap exception occurs.
Doubleword Divide Computes the quotient of two 64-bit values. DDIV treats src1
Signed (DDIV) as the dividend. The divisor can be src2 or the immediate
value. It puts the quotient in the destination register. If the
divisor is zero, the system signals an error and may issue a
BREAK instruction. The DDIV instruction rounds towards
zero. Overflow is signaled when dividing -2**63 by -1.
Note: The special case DDIV $0,src1,src2 generates the real
doubleword divide instruction and leaves the result in the
HI/LO register. The HI register contains the remainder and
the LO register contains the quotient. No checking for
divide-by-zero is performed.
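The signed and unsigned variants of the set and trap instructions differ only in how the same 32 bits are interpreted. The Python sketch below is a minimal illustration of that difference; the function names mirror the op-codes for readability but are otherwise invented for this example, and the trap functions simply return True where the machine would raise a trap exception.

```python
def to_signed32(x):
    """Interpret the low 32 bits of x as a twos-complement signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def slt(src1, src2):
    """SLT: 1 if src1 < src2 as signed 32-bit values, else 0."""
    return int(to_signed32(src1) < to_signed32(src2))


def sltu(src1, src2):
    """SLTU: 1 if src1 < src2 as unsigned 32-bit values, else 0."""
    return int((src1 & 0xFFFFFFFF) < (src2 & 0xFFFFFFFF))


def tlt(src1, src2):
    """TLT: True where a trap exception would occur (signed comparison)."""
    return to_signed32(src1) < to_signed32(src2)


def tltu(src1, src2):
    """TLTU: True where a trap exception would occur (unsigned comparison)."""
    return (src1 & 0xFFFFFFFF) < (src2 & 0xFFFFFFFF)


print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))   # 1 0
```

The bit pattern 0xFFFFFFFF is -1 when read as signed and 4294967295 when read as unsigned, so the two comparisons against 1 give opposite answers.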
Doubleword Negate Negates the 64-bit contents of src1 and puts the result in the
without Overflow destination register. Overflow is not reported.
(DNEGU)
Doubleword Rotate Rotates the contents of a 64-bit register left (towards the sign
Left (DROL) bit). This instruction inserts in the least-significant bit any
bits that were shifted out of the sign bit. The contents of src1
specify the value to shift, and contents of src2 (or the
immediate value) specify the amount to shift. If src2 (or the
immediate value) is greater than 63, src1 shifts by src2 MOD
64.
Doubleword Rotate Right (DROR) Rotates the contents of a 64-bit register right (towards the
least-significant bit). This instruction inserts in the sign bit
any bits that were shifted out of the least-significant bit. The
contents of src1 specify the value to shift, and the contents of
src2 (or the immediate value) specify the amount to shift. If
src2 (or the immediate value) is greater than 63, src1 shifts by
src2 MOD 64.
Doubleword Shift Shifts the contents of a 64-bit register left (towards the sign
Left Logical (DSLL) bit) and inserts zeros at the least-significant bit. The contents
of src1 specify the value to shift, and the contents of src2 (or
the immediate value) specify the amount to shift. If src2 (or
the immediate value) is greater than 63, src1 shifts by src2
MOD 64.
Doubleword Shift Shifts the contents of a 64-bit register right (towards the
Right Arithmetic least-significant bit) and inserts the sign bit at the
(DSRA) most-significant bit. The contents of src2 (or the immediate
value) specify the amount to shift. If src2 (or the immediate
value) is greater than 63, src1 shifts by src2 MOD 64.
Doubleword Shift Shifts the contents of a 64-bit register right (towards the
Right Logical (DSRL) least-significant bit) and inserts zeros at the most-significant
bit. The contents of src1 specify the value to shift, and the
contents of src2 (or the immediate value) specify the amount
to shift. If src2 (or the immediate value) is greater than 63,
src1 shifts by src2 MOD 64.
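The rotate and shift descriptions share one rule: the shift amount is reduced modulo the register width before it is applied. The Python sketch below models that rule for both the 32-bit and 64-bit forms. It is an illustration only; the function names and the width parameter are invented for this example.

```python
def rotate_left(value, amount, width=32):
    """ROL (width=32) or DROL (width=64): bits leaving the sign bit re-enter at bit 0."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


def rotate_right(value, amount, width=32):
    """ROR (width=32) or DROR (width=64): bits leaving bit 0 re-enter at the sign bit."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value >> amount) | (value << (width - amount))) & mask


def shift_left_logical(value, amount, width=32):
    """SLL (width=32) or DSLL (width=64): zeros are inserted at the least-significant bit."""
    mask = (1 << width) - 1
    return (value << (amount % width)) & mask


print(hex(rotate_left(0x80000001, 1)))    # 0x3
print(hex(rotate_left(0x80000001, 33)))   # 0x3, because 33 MOD 32 is 1
print(hex(rotate_right(0x80000001, 1)))   # 0xc0000000
```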
Jump and Branch Instructions
The jump and branch instructions let you change an assembly program’s control flow.
Table 5-9 summarizes the formats of jump and branch instructions.
Jump J address
Branch B label
For the branch instructions in Table 5-10, branch destinations must be defined in the
source being assembled.
Branch and Link Branches unconditionally to the specified label and puts the
(BAL) return address in general register $31.
Branch on Equal Branches to the specified label when the contents of src1 equal
(BEQ) the contents of src2, or when the contents of src1 equal the
immediate value.
Branch on Equal to Branches to the specified label when the contents of src1 equal
Zero (BEQZ) zero.
Branch on Greater Branches to the specified label when the contents of src1 are
Than (BGT) greater than the contents of src2, or it can branch when the
contents of src1 are greater than the immediate value. The
comparison treats the comparands as signed 32-bit values.
Branch on Branches to the specified label when the contents of src1 are
Greater/Equal greater than or equal to the contents of src2, or it can branch
Unsigned (BGEU) when the contents of src1 are greater than or equal to the
immediate value. The comparison treats the comparands as
unsigned 32-bit values.
Branch on Branches to the specified label when the contents of src1 are
Greater/Equal greater than or equal to zero.
Zero (BGEZ)
Branch on Branches to the specified label when the contents of src1 are
Greater/Equal greater than or equal to zero and puts the return address in
Zero and Link general register $31. When this write is done, it destroys the
(BGEZAL) contents of the register. See the MIPS microprocessor user’s
manual appropriate to your architecture for more
information. Do not use BGEZAL $31.
Branch on Greater Branches to the specified label when the contents of src1 are
or Equal (BGE) greater than or equal to the contents of src2, or it can branch
when the contents of src1 are greater than or equal to the
immediate value. The comparison treats the comparands as
signed 32-bit values.
Branch on Greater Branches to the specified label when the contents of src1 are
Than Unsigned greater than the contents of src2, or it can branch when the
(BGTU) contents of src1 are greater than the immediate value. The
comparison treats the comparands as unsigned 32-bit values.
Branch on Greater Branches to the specified label when the contents of src1 are
Than Zero (BGTZ) greater than zero.
Branch on Less Branches to the specified label when the contents of src1 are
Than Zero (BLTZ) less than zero. The program must define the destination.
Branch on Less Branches to the specified label when the contents of src1 are
Than (BLT) less than the contents of src2, or it can branch when the
contents of src1 are less than the immediate value. The
comparison treats the comparands as signed 32-bit values.
Branch on Branches to the specified label when the contents of src1 are
Less/Equal less than or equal to the contents of src2, or it can branch when
Unsigned (BLEU) the contents of src1 are less than or equal to the immediate
value. The comparison treats the comparands as unsigned
32-bit values.
Branch on Branches to the specified label when the contents of src1 are
Less/Equal Zero less than or equal to zero. The program must define the
(BLEZ) destination.
Branch on Less or Branches to the specified label when the contents of src1 are
Equal (BLE) less than or equal to the contents of src2, or it can branch when
the contents of src1 are less than or equal to the immediate
value. The comparison treats the comparands as signed 32-bit
values.
Branch on Less Branches to the specified label when the contents of src1 are
Than Unsigned less than the contents of src2, or it can branch when the
(BLTU) contents of src1 are less than the immediate value. The
comparison treats the comparands as unsigned 32-bit values.
Branch on Less Than Zero and Link (BLTZAL) Branches to the specified label when the
contents of src1 are less than zero and puts the return address in general register
$31. Because the value is always stored in register 31, there is
a chance of a stored value being overwritten before it is used.
See the MIPS microprocessor user’s manual appropriate to
your architecture for more information. Do not use BLTZAL
$31.
Branch on Not Branches to the specified label when the contents of src1 do
Equal (BNE) not equal the contents of src2, or it can branch when the
contents of src1 do not equal the immediate value.
Branch on Not Branches to the specified label when the contents of src1 do
Equal to Zero not equal zero.
(BNEZ)
Jump And Link Unconditionally jumps to a specified location and puts the
(JAL) return address in a general register. A symbolic address or a
general register specifies the target location. By default, the
return address is placed in register $31. If you specify a pair of
registers, the first receives the return address and the second
specifies the target. The instruction JAL procname transfers to
procname and saves the return address. For the two-register
form of the instruction, the target register may not be the same
as the return-address register. For the one-register form, the
target may not be $31.
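The two register restrictions on JAL reduce to one comparison once the default return-address register is filled in. The Python sketch below expresses that check. It is illustrative only; the function name and the use of plain register numbers are assumptions made for this example.

```python
RA = 31   # register $31, the default return-address register


def jal_registers_ok(target, link=RA):
    """Return True if the register form of JAL obeys the restrictions above.

    One-register form: link defaults to $31, so the target may not be $31.
    Two-register form: the target may not be the same as the link register.
    """
    return target != link


print(jal_registers_ok(25))            # True:  one-register form, target $25
print(jal_registers_ok(31))            # False: one-register form, target $31
print(jal_registers_ok(4, link=4))     # False: two-register form, same register
```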
Special Instructions
The main processor’s special instructions do miscellaneous tasks. See Table 5-11.
Restore From Restores the previous interrupt mask and user/kernel state.
Exception (RFE) This instruction can execute only in kernel state and is
unavailable in user mode.
Syscall (SYSCALL) Causes a system call trap. The operating system interprets
the information set in registers to determine what system
call to do.
The coprocessor interface instructions provide standard ways to access your machine’s
coprocessors. See Table 5-12 and Table 5-13.
Note: You cannot use coprocessor load and store instructions with the system control
coprocessor (cp0).
Coprocessor Interface Instructions
Load Word Loads the destination with the contents of a word that is at
Coprocessor z the memory location specified by the effective address. The
(LWCz) z selects one of four distinct coprocessors. Load Word
Coprocessor replaces all register bytes with the contents of
the loaded word. If bits 0 and 1 of the effective address are
not zero, the machine signals an address exception.
Doubleword Move Stores the 64-bit contents of the general register src-gpr into
To Coprocessor z the coprocessor register specified by the destination.
(DMTCz)
Store Word Stores the contents of the coprocessor register in the memory
Coprocessor z location specified by the effective address. The z selects one
(SWCz) of four distinct coprocessors. If bits 0 and 1 of the effective
address are not zero, the machine signals an address error
exception.
Chapter 6
See Chapter 5 for a description of the main processor’s instructions and the coprocessor
interface instructions.
Chapter 6: Coprocessor Instruction Set
Instruction Notation
The tables in this chapter list the assembler format for each coprocessor’s load, store,
computational, jump, branch, and special instructions. The format consists of an op-code
and a list of operand formats. The tables list groups of closely related instructions; for
those instructions, you can use any op-code with any specified operand.
Floating-Point Instructions
Floating-Point Formats
The formats for the single- and double-precision floating-point constants are shown in
Figure 6-1:
SINGLE-PRECISION
Field width            1 bit    8 bits    23 bits
Bits (big-endian)      0        1-8       9-31
Bits (little-endian)   31       30-23     22-0

DOUBLE-PRECISION
Field width            1 bit    11 bits   52 bits
Bits (big-endian)      0        1-11      12-63
Bits (little-endian)   63       62-52     51-0
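The field widths in Figure 6-1 (1, 8, and 23 bits for single precision; 1, 11, and 52 bits for double precision) can be checked against real values. The Python sketch below uses the standard struct module to split a number into its three fields. The function names, and the IEEE 754 terms sign, exponent, and fraction used for the fields, are supplied for this example and do not come from the figure.

```python
import struct


def single_fields(x):
    """Return (sign, exponent, fraction) of x encoded in single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)


def double_fields(x):
    """Return (sign, exponent, fraction) of x encoded in double precision."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(single_fields(-1.5))   # (1, 127, 4194304)
print(double_fields(1.0))    # (0, 1023, 0)
```

The fields are extracted by numeric bit position (bit 31 or 63 is the most significant), which corresponds to the little-endian numbering in the figure; the big-endian numbering labels the same bits from the other end.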
Floating-point load and store instructions must use even registers. The operands in the
following table have these meanings:
Operand Meaning
Load Fp
Single L.S
Load Indexed Fp
Single LWXC1
Load Immediate Fp
Single LI.S
Store Fp
Single S.S
Store Indexed Fp
Single SWXC1
This part of Chapter 6 groups the instructions by function. Please consult “Floating-Point
Instructions” for the op-codes.
Instruction Description
Load Fp Instructions Load eight bytes for double-precision and four bytes for
single-precision from the specified effective address into
the destination register, which must be an even register
(32-bit only). The bytes must be word aligned. Note: It is
recommended that you use doubleword alignment for
double-precision operands. It is required in the MIPS2
architecture (R4000 and later).
Load Indexed Fp Indexed loads follow the same description as the load
Instructions instructions above except that indexed loads use
index+base to specify the effective address (64-bit only).
Store Fp Instructions Store eight bytes for double-precision and four bytes for
single-precision from the source floating-point register,
which must be an even register (32-bit only), to the
specified effective address. Note: It is recommended that
you use doubleword alignment for double-precision operands.
It is required in the MIPS2 architecture and later.
Store Indexed Fp Indexed stores follow the same description as the store
Instructions instructions above except that indexed stores use
index+base to specify the effective address (64-bit only).
Operand Meaning
Absolute Value Fp
Single ABS.S
Negate Fp
Double NEG.D
Single NEG.S
Add Fp
Single ADD.S
Divide Fp
Double DIV.D
Single DIV.S
Multiply Fp
Double MUL.D
Single MUL.S
Subtract Fp
Double SUB.D
Single SUB.S
Multiply Add FP
Single MADD.S
Double NMADD.D
Single NMADD.S
Multiply Subtract FP
Double MSUB.D
Single MSUB.S
Double NMSUB.D
Single NMSUB.S
This part of Chapter 6 groups the instructions by function. Refer to Table 6-6 and
Table 6-8 for the op-code names. Table 6-5 describes the floating-point Computational
instructions.
Instruction Description
Add Fp Instructions Add the contents of src1 (or the destination) to the
contents of src2 and put the result in the destination
register. When the sum of two operands with
opposite signs is exactly zero, the sum has a positive
sign for all rounding modes except round toward –∞.
For that rounding mode, the sum has a negative sign.
Convert Source to Another Convert the contents of src1 to the specified precision,
Precision Fp Instructions round according to the rounding mode, and put the
result in the destination register.
Multiply-Then-Add Fp Multiply the contents of src2 and src3, then add the
Instructions result to src1 and store in the destination register
(MADD). The NMADD instruction does the same
multiply then add, but then negates the sign of the
result (64-bit only).
Truncate and Round The TRUNC instructions truncate the value in the
instructions source floating-point register and put the resulting
integer in the destination floating-point register,
using the third (general-purpose) register to hold a
temporary value. (This is a macro-instruction.) The
ROUND instructions work like TRUNC, but round
the floating-point value to an integer instead of
truncating it.
Subtract Fp Instructions Subtract the contents of src2 from the contents of src1
(or the destination). These instructions put the result
in the destination register. When the difference of two
operands with the same signs is exactly zero, the
difference has a positive sign for all rounding modes
except round toward –∞. For that rounding mode, the
difference has a negative sign.
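The sign rule for an exactly zero sum or difference can be observed on any IEEE-conforming host. The Python sketch below checks it in the default round-to-nearest mode, which is the only rounding mode that standard Python floats expose; the other rounding modes named above are therefore not exercised. The helper sign_of is invented for this example.

```python
import math


def sign_of(x):
    """Return '+' or '-' for the sign of x, distinguishing +0.0 from -0.0."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


total = 1.5 + (-1.5)         # operands with opposite signs, sum exactly zero
difference = 2.25 - 2.25     # operands with the same sign, difference exactly zero
print(sign_of(total), sign_of(difference))   # + +
print(sign_of(-0.0))                         # -
```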
Table 6-6 summarizes the floating-point relational instructions. The first column under
Condition gives a mnemonic for the condition tested. As the “branch on true/false”
condition can be used logically to negate any condition, the second column supplies a
mnemonic for the logical negation of the condition in the first column. This provides a
total of 32 possible conditions. The four columns under Relations give the result of the
comparison based on each condition. The final column states if an invalid operation is
signaled for each condition.
For example, with an equal condition (EQ mnemonic in the True column), the logical
negation of the condition is not equal (NEQ). The comparison is True when the relation
is equal and False when it is greater than, less than, or unordered, and no Invalid
Operation Exception is given if the relation is unordered.
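Condition          Relations                                        Invalid Operation
True     False     Greater Than   Less Than   Equal   Unordered     Exception if Unordered
F        T         F              F           F       F             no
UN       OR        F              F           F       T             no
EQ       NEQ       F              F           T       F             no
UEQ      OLG       F              F           T       T             no
OLT      UGE       F              T           F       F             no
ULT      OGE       F              T           F       T             no
OLE      UGT       F              T           T       F             no
ULE      OGT       F              T           T       T             no
SF       ST        F              F           F       F             yes
NGLE     GLE       F              F           F       T             yes
SEQ      SNE       F              F           T       F             yes
NGL      GL        F              F           T       T             yes
LT       NLT       F              T           F       F             yes
NGE      GE        F              T           F       T             yes
LE       NLE       F              T           T       F             yes
NGT      GT        F              T           T       T             yes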
F      False                                    T     True
UN     Unordered                                OR    Ordered
EQ     Equal                                    NEQ   Not Equal
UEQ    Unordered or Equal                       OLG   Ordered or Less Than or Greater Than
OLT    Ordered Less Than                        UGE   Unordered or Greater Than or Equal
ULT    Unordered or Less Than                   OGE   Ordered Greater Than or Equal
OLE    Ordered Less Than or Equal               UGT   Unordered or Greater Than
ULE    Unordered or Less Than or Equal          OGT   Ordered Greater Than
SF     Signaling False                          ST    Signaling True
NGLE   Not Greater Than or Less Than or Equal   GLE   Greater Than, or Less Than or Equal
SEQ    Signaling Equal                          SNE   Signaling Not Equal
NGL    Not Greater Than or Less Than            GL    Greater Than or Less Than
LT     Less Than                                NLT   Not Less Than
NGE    Not Greater Than or Equal                GE    Greater Than or Equal
LE     Less Than or Equal                       NLE   Not Less Than or Equal
NGT    Not Greater Than                         GT    Greater Than
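The sixteen conditions in the True column follow a regular pattern: each one names the set of relations (less than, equal, unordered) for which it is true, and the second group of eight repeats the first while also signaling Invalid Operation on unordered operands. The Python sketch below captures that pattern. It is a model for illustration only: the bit values assigned to the relations, the dictionary names, and the compare function are assumptions made for this example, and a NaN operand stands in for the unordered relation.

```python
import math

# One flag per relation that can make a condition true (values chosen for this sketch).
UNORDERED, EQUAL, LESS = 1, 2, 4

CONDITIONS = {
    "F": 0, "UN": UNORDERED, "EQ": EQUAL, "UEQ": EQUAL | UNORDERED,
    "OLT": LESS, "ULT": LESS | UNORDERED, "OLE": LESS | EQUAL,
    "ULE": LESS | EQUAL | UNORDERED,
}
# Signaling conditions: same truth values as their partner, plus Invalid Operation.
SIGNALING = {"SF": "F", "NGLE": "UN", "SEQ": "EQ", "NGL": "UEQ",
             "LT": "OLT", "NGE": "ULT", "LE": "OLE", "NGT": "ULE"}


def compare(cond, a, b):
    """Return (condition_is_true, invalid_operation_signaled) for a True-column mnemonic."""
    if math.isnan(a) or math.isnan(b):
        relation = UNORDERED
    elif a == b:
        relation = EQUAL
    elif a < b:
        relation = LESS
    else:
        relation = 0   # greater than: no True-column condition tests for it
    mask = CONDITIONS[SIGNALING.get(cond, cond)]
    return bool(mask & relation), cond in SIGNALING and relation == UNORDERED


nan = float("nan")
print(compare("EQ", 1.0, 1.0))    # (True, False)
print(compare("ULT", nan, 1.0))   # (True, False)
print(compare("LT", nan, 1.0))    # (False, True)
```

A condition from the False column is obtained by negating the first element of the result, which corresponds to branching on false instead of true.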
Compare F
Single C.F.S
Compare UN
Double C.UN.D
Single C.UN.S
Compare EQ
Double C.EQ.D
Single C.EQ.S
Compare UEQ
Double C.UEQ.D
Single C.UEQ.S
Compare OLT
Double C.OLT.D
Single C.OLT.S
Compare ULT
Double C.ULT.D
Single C.ULT.S
Compare OLE
Double C.OLE.D
Single C.OLE.S
Compare ULE
Double C.ULE.D
Single C.ULE.S
Compare SF
Double C.SF.D
Single C.SF.S
Compare NGLE
Single C.NGLE.S
Compare SEQ
Double C.SEQ.D
Single C.SEQ.S
Compare NGL
Double C.NGL.D
Single C.NGL.S
Compare LT
Double C.LT.D
Single C.LT.S
Compare NGE
Double C.NGE.D
Single C.NGE.S
Compare LE
Double C.LE.D
Single C.LE.S
Compare NGT
Double C.NGT.D
Single C.NGT.S
Note: These are the most common Compare instructions. The MIPS coprocessor
instruction set provides others for IEEE compatibility.
Table 6-8 describes the relational instructions by function. Refer to Chapter 1
for information about registers.
Instruction Description
Compare OLE Instructions Compare the contents of src1 with the contents of
src2. If src1 is less than or equal to src2, a true
condition results; otherwise, a false condition results.
The machine does not signal an exception for
unordered values.
Compare OLT Instructions Compare the contents of src1 with the contents of
src2. If src1 is less than src2, a true condition results;
otherwise, a false condition results. The machine
does not signal an exception for unordered values.
Compare SEQ Instructions Compare the contents of src1 with the contents of
src2. If src1 equals src2, a true condition results;
otherwise, a false condition results. The machine
signals an exception for unordered values.
Compare ULE Instructions Compare the contents of src1 with the contents of
src2. If src1 is less than or equal to src2 (or the contents
are unordered), a true condition results; otherwise, a
false condition results. The machine does not signal
an exception for unordered values.
Compare UEQ Instructions Compare the contents of src1 with the contents of
src2. If src1 equals src2 (or src1 and src2 are
unordered), a true condition results; otherwise, a
false condition results. The machine does not signal
an exception for unordered values.
Compare ULT Instructions Compare the contents of src1 with the contents of
src2. If src1 is less than src2 (or the contents are
unordered), a true condition results; otherwise, a
false condition results. The machine does not signal
an exception for unordered values.
The floating-point move instructions move data from source to destination registers (only
floating-point registers are allowed).
Move FP
Double MOV.D
Double MOVF.D
Double MOVT.D
Double MOVN.D
Double MOVZ.D
Instruction Description
System Control Coprocessor Instructions
The system control coprocessor (cp0) handles all functions and special and privileged
registers for the virtual memory and exception handling subsystems. The system control
coprocessor translates addresses from a large virtual address space into the machine’s
physical memory space. The coprocessor uses a translation lookaside buffer (TLB) to
translate virtual addresses to physical addresses.
Description Op-code
Instruction Description
Translation Lookaside Loads the EntryHi and EntryLo registers with the
Buffer Read (TLBR) contents of the translation lookaside buffer (TLB)
entry specified in the TLB Index register.
Synchronize (SYNC) Ensures that all loads and stores fetched before the
sync are completed, before allowing any following
loads or stores. Use of sync to serialize certain
memory references may be required in
multiprocessor environments.
Note: This instruction is not valid in the MIPS1
architecture.
Control and Status Register (c = compare bit)

Bits     Width   Field
31-24    8       0
23       1       c
22-18    5       0
17-12    6       exceptions (E V Z O U I; bit 17 = E, bit 12 = I)
11-7     5       enables (V Z O U I; bit 11 = V, bit 7 = I)
6-2      5       sticky bits (V Z O U I; bit 6 = V, bit 2 = I)
1-0      2       0
The exception bits are set for instructions that cause an IEEE standard exception or an
optional exception used to emulate some of the more hardware-intensive features of the
IEEE standard.
The meaning of each bit in the exception field is given below. If two exceptions occur
together on one instruction, the field will contain the inclusive-OR of the bits for each
exception:
Exception Field Bit    Description
E                      Unimplemented Operation
I                      Inexact Exception
O                      Overflow Exception
U                      Underflow Exception
V                      Invalid Operation
Z                      Division-by-Zero
Field Description
I Inexact Exception
O Overflow Exception
U Underflow Exception
V Invalid Operation
Z Division-by-Zero
Each of the five exceptions is associated with a trap under user control, which is enabled
by setting one of the five bits of the enable field, shown above.
When an exception occurs, both the corresponding exception and status bits are set. If the
corresponding enable flag bit is set, a trap is taken. In some cases the result of an
operation is different if a trap is enabled.
The status flags are never cleared as a side effect of floating-point operations, but may be
set or cleared by writing a new value into the status register, using a “move to
coprocessor control” instruction.
The floating-point compare instruction places the condition which was detected into the
"c" bit of the control and status register, so that the state of the condition line may be
saved and restored. The "c" bit is set if the condition is true, and cleared if the condition
is false, and is affected only by compare and move to control register instructions.
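The field layout described above can be made concrete with a short C sketch. It assumes the bit positions drawn in the Control and Status Register figure (c in bit 23, exception bits 17..12, enable bits 11..7, sticky status bits 6..2). Every identifier is hypothetical and chosen for illustration; none comes from a MIPSpro header, and the code only manipulates a 32-bit image of the register rather than the register itself.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions as drawn in the Control and Status Register figure.
 * Every identifier below is illustrative, not a MIPSpro definition. */
#define FCSR_C_SHIFT       23  /* compare ("c") bit                  */
#define FCSR_CAUSE_SHIFT   12  /* exception bits 17..12: E V Z O U I */
#define FCSR_ENABLE_SHIFT   7  /* enable bits    11..7:    V Z O U I */
#define FCSR_FLAG_SHIFT     2  /* sticky bits     6..2:    V Z O U I */

/* Masks inside a field once it has been shifted down (I is the low bit). */
enum {
    FP_I = 1 << 0,  /* Inexact                                        */
    FP_U = 1 << 1,  /* Underflow                                      */
    FP_O = 1 << 2,  /* Overflow                                       */
    FP_Z = 1 << 3,  /* Division-by-Zero                               */
    FP_V = 1 << 4,  /* Invalid Operation                              */
    FP_E = 1 << 5   /* Unimplemented Operation (exception field only) */
};

static unsigned fcsr_compare(uint32_t fcsr) { return (fcsr >> FCSR_C_SHIFT) & 0x1u; }
static unsigned fcsr_cause(uint32_t fcsr)   { return (fcsr >> FCSR_CAUSE_SHIFT) & 0x3fu; }
static unsigned fcsr_enables(uint32_t fcsr) { return (fcsr >> FCSR_ENABLE_SHIFT) & 0x1fu; }
static unsigned fcsr_flags(uint32_t fcsr)   { return (fcsr >> FCSR_FLAG_SHIFT) & 0x1fu; }

/* Assemble a register image from its fields; arguments must fit their fields. */
static uint32_t fcsr_pack(unsigned c, unsigned cause, unsigned enables, unsigned flags)
{
    assert(c <= 1u && cause <= 0x3fu && enables <= 0x1fu && flags <= 0x1fu);
    return ((uint32_t)c << FCSR_C_SHIFT) | ((uint32_t)cause << FCSR_CAUSE_SHIFT) |
           ((uint32_t)enables << FCSR_ENABLE_SHIFT) | ((uint32_t)flags << FCSR_FLAG_SHIFT);
}

/* A trap is taken if a set exception bit has its enable bit set;
 * Unimplemented Operation has no enable bit and always traps. */
static int fcsr_would_trap(uint32_t fcsr)
{
    unsigned cause = fcsr_cause(fcsr);
    return (cause & FP_E) != 0 || (cause & fcsr_enables(fcsr)) != 0;
}
```

Two exceptions raised by one instruction appear as the inclusive-OR of their bits, so `fcsr_cause()` may return a value such as `FP_Z | FP_I`. The sketch treats an enable bit as a plain mask over the low five exception bits, which is why the same `FP_*` constants serve all three fields.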
For each IEEE standard exception, a status flag is provided that is set on any occurrence
of the corresponding exception condition with no corresponding exception trap
signaled. It may be reset by writing a new value into the status register. The flags may be
saved and restored individually, or as a group, by software. When no exception trap is
signaled, a default action is taken by the floating-point coprocessor, which provides a
substitute value for the original, exceptional, result of the floating-point operation. The
default action taken depends on the type of exception, and in the case of the Overflow
exception, the current rounding mode.
Invalid Operation Exception
The invalid operation exception is signaled if one or both of the operands are invalid for
an implemented operation. The result, when the exception occurs without a trap, is a
quiet NaN when the destination has a floating-point format, and is indeterminate if the
result has a fixed-point format. The invalid operations are:
• Addition or subtraction: magnitude subtraction of infinities, such as
( +∞ ) + ( –∞ ).
• Multiplication: 0 times ∞, with any signs.
• Division: 0 over 0 or ∞ over ∞, with any signs.
• Square root of x: where x is less than zero.
• Conversion of a floating-point number to a fixed-point format when an overflow, or
operand value of infinity or NaN, precludes a faithful representation in that format.
• Comparison of predicates involving < or > without ?, when the operands are
“unordered”.
• Any operation on a signaling NaN.
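The untrapped behavior in the list above can be observed from C on a host whose arithmetic follows the IEEE standard. The sketch below is an illustration under that assumption (with the invalid operation trap disabled, which is the usual default); it is not MIPS-specific, and the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* volatile operands force the arithmetic to happen at run time
 * instead of being folded by the compiler. */
static double magnitude_subtraction(void)
{
    volatile double a = INFINITY, b = INFINITY;
    return a - b;                 /* (+inf) - (+inf) */
}

static double zero_times_infinity(void)
{
    volatile double z = 0.0, inf = INFINITY;
    return z * inf;
}

static double zero_over_zero(void)
{
    volatile double z = 0.0;
    return z / z;
}

/* With the invalid operation trap disabled, each result is a NaN. */
static void check_invalid_results(void)
{
    assert(isnan(magnitude_subtraction()));
    assert(isnan(zero_times_infinity()));
    assert(isnan(zero_over_zero()));
}
```

Calling `check_invalid_results()` passes silently when the host delivers a NaN for each case. A NaN also compares unequal to itself, which gives a second way to detect it.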
Software may simulate this exception for other operations that are invalid for the given
source operands. Examples of these operations include IEEE-specified functions
implemented in software, such as Remainder: x REM y, where y is zero or x is infinite;
conversion of a floating-point number to a decimal format whose value causes an
overflow or is infinity or NaN; and transcendental functions, such as ln(–5) or cos⁻¹(3).
Division-by-zero Exception
If division by zero traps are enabled, the result register is not modified, and the source
registers are preserved.
Software may simulate this exception for other operations that produce a signed infinity,
such as ln(0), sec(π/2), csc(0), or 0⁻¹.
Overflow Exception
The overflow exception is signaled when what would have been the magnitude of the
rounded floating-point result, were the exponent range unbounded, is larger than the
destination format’s largest finite number. The result, when no trap occurs, is determined
by the rounding mode and the sign of the intermediate result.
If overflow traps are enabled, the result register is not modified, and the source registers
are preserved.
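The text says only that the untrapped overflow result depends on the rounding mode and the sign of the intermediate result. The sketch below fills in the specific values using the defaults defined by the IEEE 754 standard, shown here for double precision. The mode numbering follows the values listed under “Floating-Point Rounding”; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Encodings follow the rounding-mode values 0..3 used by control register 31. */
enum rounding_mode { RM_NEAREST = 0, RM_ZERO = 1, RM_PLUS_INF = 2, RM_MINUS_INF = 3 };

/* Default (untrapped) overflow result per IEEE 754, given the rounding
 * mode and the sign of the intermediate result. */
static double overflow_default(enum rounding_mode rm, int negative)
{
    const double inf = negative ? -(double)INFINITY : (double)INFINITY;
    const double max = negative ? -DBL_MAX : DBL_MAX;

    assert((unsigned)rm <= 3u);
    switch (rm) {
    case RM_ZERO:      return max;                   /* largest finite, same sign  */
    case RM_PLUS_INF:  return negative ? max : inf;  /* never rounds downward      */
    case RM_MINUS_INF: return negative ? inf : max;  /* never rounds upward        */
    case RM_NEAREST:
    default:           return inf;                   /* infinity, same sign        */
    }
}
```

For example, a negative overflow under round toward positive infinity yields the most negative finite number rather than negative infinity, because that mode never delivers a result below the infinitely precise one.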
Underflow Exception
Two related events contribute to underflow. One is the creation of a tiny non-zero result
between ±2^Emin (where Emin is the minimum expressible exponent) which, because it is
tiny, may cause some other exception later. The other is extraordinary loss of accuracy
during the approximation of such tiny numbers by denormalized numbers.
The IEEE standard permits a choice in how these events are detected, but requires that
they be detected the same way for all operations.
The IEEE standard specifies that “tininess” may be detected either “after rounding”
(when a nonzero result computed as though the exponent range were unbounded would
lie strictly between ±2^Emin), or “before rounding” (when a nonzero result computed as
though the exponent range and the precision were unbounded would lie strictly between
±2^Emin). The architecture requires that tininess be detected after rounding.
Loss of accuracy may be detected as either “denormalization loss” (when the delivered
result differs from what would have been computed if the exponent range were
unbounded), or “inexact result” (when the delivered result differs from what would
have been computed if the exponent range and precision were both unbounded). The
architecture requires that loss of accuracy be detected as inexact result.
When an underflow trap is not enabled, underflow is signaled (via the underflow flag)
only when both tininess and loss of accuracy have been detected. The delivered result
might be zero, denormalized, or ±2^Emin. When an underflow trap is enabled, underflow
is signaled when tininess is detected regardless of loss of accuracy.
If underflow traps are enabled, the result register is not modified, and the source registers
are preserved.
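The signaling rule for underflow reduces to a small truth table, sketched here in C. The inputs stand for the two detected events (tininess after rounding, and loss of accuracy detected as an inexact result); the function names are hypothetical and the code models only the decision, not the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* tiny:    tininess was detected (after rounding)
 * inexact: loss of accuracy was detected as an inexact result */
static bool underflow_signaled(bool trap_enabled, bool tiny, bool inexact)
{
    /* Trap enabled: tininess alone is enough.
     * Trap disabled: both events are required. */
    return trap_enabled ? tiny : (tiny && inexact);
}

/* The cases spelled out in the text. */
static void check_underflow_rule(void)
{
    assert(!underflow_signaled(false, true, false)); /* tiny but exact, no trap  */
    assert(underflow_signaled(false, true, true));   /* tiny and inexact         */
    assert(underflow_signaled(true, true, false));   /* trap enabled, tiny only  */
    assert(!underflow_signaled(true, false, false)); /* not tiny: never signaled */
}
```

The practical consequence is that a tiny result that happens to be exactly representable as a denormalized number raises the underflow flag only when the trap is enabled.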
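Inexact Exception
An inexact exception is signaled if the rounded result of an operation is not exact or if
it overflows.
Unimplemented Operation Exception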
If an operation is specified that the hardware may not perform, due to an implementation
restriction on the supported operations or supported formats, an unimplemented
operation exception may be signaled, which always causes a trap, for which there are no
corresponding enable or flag bits. The trap cannot be disabled.
This exception is also raised when an attempt is made to execute an instruction with an
operation code or format code which has been reserved for future architectural
definition. The unimplemented instruction trap is not optional, since the current
definition contains codes of this kind.
This exception may be signaled when unusual operands or result conditions are
detected, for which the implemented hardware cannot handle the condition properly.
These may include (but are not limited to), denormalized operands or results, NaN
operands, trapped overflow or underflow conditions. The use of this exception for such
conditions is optional.
Floating-Point Rounding
Bits 0 and 1 of coprocessor control register 31 set the rounding mode for
floating-point operations. The machine allows four rounding modes:
• Round to nearest rounds the result to the nearest representable value. When the
two nearest representable values are equally near, this mode rounds to the value
with the least significant bit zero. To select this mode, set bits 1..0 of control register
31 to 0.
• Round toward zero rounds toward zero. It rounds to the value that is closest to and
not greater in magnitude than the infinitely precise result. To select this mode, set
bits 1..0 of control register 31 to 1.
• Round toward positive infinity rounds to the value that is closest to and not less
than the infinitely precise result. To select this mode, set bits 1..0 of control register
31 to 2.
• Round toward negative infinity rounds toward negative infinity. It rounds to the
value that is closest to and not greater than the infinitely precise result. To select this
mode, set bits 1..0 of control register 31 to 3.
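The four encodings above can be captured in a few lines of C. This is a minimal sketch that edits a 32-bit image of control register 31; the names are hypothetical, and actually installing the value would still require a “move to coprocessor control” instruction, which is outside what portable C can express.

```c
#include <assert.h>
#include <stdint.h>

/* Values for bits 1..0 of control register 31, as listed in the text. */
enum fp_rounding {
    FP_ROUND_NEAREST   = 0,
    FP_ROUND_ZERO      = 1,
    FP_ROUND_PLUS_INF  = 2,
    FP_ROUND_MINUS_INF = 3
};

#define FCSR_RM_MASK 0x3u

/* Return a copy of the register image with only the rounding mode changed. */
static uint32_t fcsr_with_rounding(uint32_t fcsr, enum fp_rounding mode)
{
    assert((unsigned)mode <= 3u);
    return (fcsr & ~FCSR_RM_MASK) | (uint32_t)mode;
}

/* Extract the rounding mode from a register image. */
static enum fp_rounding fcsr_rounding(uint32_t fcsr)
{
    return (enum fp_rounding)(fcsr & FCSR_RM_MASK);
}
```

Masking before OR-ing keeps the enable and sticky bits intact, so a routine can switch rounding mode and later restore the saved image without disturbing accumulated status flags.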
Chapter 7
7. Linkage Conventions
This chapter gives rules and examples to follow when designing an assembly language
program. The chapter includes a tutorial section that contains information about how
calling sequences work. This involves writing a skeleton version of your prospective
assembly routine using a high level language, and then compiling it with the –S option
to generate a human-readable assembly language file. The assembly language file can
then be used as the starting point for coding your routine. Topics covered include:
• “Program Design”
• “Examples”
• “Writing Assembly Language Code”
This assembler works in 32-bit, high-performance 32-bit (N32), or 64-bit
compilation mode. While these modes are very similar, due to the difference in data,
register and address sizes, the N32 and 64-bit assembler linkage conventions are not
always the same as those for 32-bit mode. For details on some of these differences, see the
MIPSpro 64-Bit Porting and Transition Guide and MIPSpro N32 ABI Guide.
The procedures and examples in this chapter, for the most part, describe 32-bit
compilation mode. In some cases, specific differences necessitated by 64-bit mode are
highlighted.
Introduction
When you write assembly language routines, you should follow the same calling
conventions that the compilers observe, for two reasons:
• Often your code must interact with compiler-generated code, accepting and
returning arguments or accessing shared global data.
• The symbolic debugger gives better assistance in debugging programs using
standard calling conventions.
The conventions for the compiler system are a bit more complicated than some, mostly
to enhance the speed of each procedure call. Specifically:
• The compilers use the full, general calling sequence only when necessary; where
possible, they omit unneeded portions of it. For example, the compilers avoid
dedicating a register to the frame pointer whenever possible.
• The compilers and debugger observe certain implicit rules rather than
communicating via instructions or data at execution time. For example, the
debugger looks at information placed in the symbol table by a “.frame” directive at
compilation time, so that it can tolerate the lack of a register containing a frame
pointer at execution time.
Program Design
This section describes some general areas of concern to the assembly language
programmer:
• Stack frame requirements on entering and exiting a routine.
• The “shape” of data (scalars, arrays, records, sets) laid out by the various high-level
languages.
For information about register format, and general, special, and floating-point registers,
see Chapter 1.
This discussion of the stack frame, particularly regarding the graphics, describes 32-bit
operations. In 32-bit mode, restrictions such as stack addressing are enforced strictly.
While these restrictions are not enforced rigidly for 64-bit stack frame usage, their
observance is probably still a good coding practice, especially if you count on reliable
debugging information.
The compilers classify each routine into one of the following categories:
• Non-leaf routines, that is, routines that call other procedures.
• Leaf routines, that is, routines that do not themselves execute any procedure calls.
Leaf routines are of two types:
– Leaf routines that require stack storage for local variables
– Leaf routines that do not require stack storage for local variables.
You must decide the routine category before determining the calling sequence.
To write a program with proper stack frame usage and debugging capabilities, use the
following procedure:
1. Regardless of the type of routine, you should include a .ent pseudo-op and an entry
label for the procedure. The .ent pseudo-op is for use by the debugger, and the entry
label is the procedure name. The syntax is:
.ent procedure_name
procedure_name:
2. If you are writing a leaf procedure that does not use the stack, skip to step 3. For leaf
procedures that use the stack, and for non-leaf procedures, you must allocate all the stack
space that the routine requires. The syntax to adjust the stack size is:
subu $sp,framesize
where framesize is the size of frame required; framesize must be a multiple of 16.
Space must be allocated for:
• Local variables.
• Saved general registers. Space should be allocated only for those registers
saved. For non-leaf procedures, you must save $31, which is used in the calls to
other procedures from this routine. If you use registers $16–$23, you must also
save them.
• Saved floating-point registers. Space should be allocated only for those registers
saved. If you use registers $f20–$f30 (for 32-bit) or $f24-$f31 (for 64-bit), you
must also save them.
• Procedure call argument area. You must allocate the maximum number of bytes for
arguments of any procedure that you call from this routine.
Note: Once you have modified $sp, you should not modify it again for the rest of the
routine.
3. Now include a .frame pseudo-op:
.frame framereg,framesize,returnreg
The virtual frame pointer is a frame pointer as used in other compiler systems but
has no register allocated for it. It consists of the framereg ($sp, in most cases) added
to the framesize (see step 2 above). Figure 7-1 illustrates the stack components for –32
and Figure 7-2 shows the stack components for –n32 and –64.
The returnreg specifies the register containing the return address (usually $31).
These usual values may change if you use a varying stack pointer or are specifying
a kernel trap routine.
Figure 7-1 (stack components for –32):
high memory
    argument n
    ...
    argument 1           <- virtual frame pointer ($fp)
    argument build       <- stack pointer ($sp) (framereg)
    ...
low memory

Figure 7-2 (stack components for –n32 and –64):
high memory
    ...
    stack parameters
                         <- virtual frame pointer ($fp)
    register parameters a0 - a7
4. If the procedure is a leaf procedure that does not use the stack, skip to step 7.
Otherwise you must save the registers you allocated space for in step 2.
To save the general registers, use the following operations:
.mask bitmask,frameoffset
sw reg,framesize+frameoffset–N($sp)
The .mask directive specifies the registers to be stored and where they are stored. A
bit should be on in bitmask for each register saved (for example, if register $31 is
saved, bit 31 should be “1” in bitmask). Bits are set in bitmask in little-endian order,
even if the machine configuration is big-endian. The frameoffset is the offset from
the virtual frame pointer (this number is usually negative). N should be 0 for the
highest numbered register saved and then incremented by four for each
subsequently lower numbered register saved. For example:
sw $31,framesize+frameoffset($sp)
sw $17,framesize+frameoffset–4($sp)
sw $16,framesize+frameoffset–8($sp)
Figure 7-3 illustrates this example.
Now save any floating-point registers that you allocated space for in step 2 as
follows:
.fmask bitmask,frameoffset
s.[sd] reg,framesize+frameoffset–N($sp)
Notice that saving floating-point registers is identical to saving general registers
except for using the .fmask pseudo-op instead of .mask, and the stores are of
floating-point singles or doubles. The discussion regarding saving general registers
applies here as well, but remember that N should be incremented by 16 for doubles.
The stack framesize must be a multiple of 16.
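Figure 7-3 (stack layout for the register-save example):
high memory
                         <- virtual frame pointer ($fp)
    frame offset
    saved $31
    saved $17
    saved $16
    ...
                         <- stack pointer ($sp)
low memory
(framesize spans the distance from the stack pointer up to the virtual frame pointer)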
5. This step describes parameter passing: how to access arguments passed into your
routine and passing arguments correctly to other procedures. For information on
high-level language-specific constructs (call-by-name, call-by-value, string or
structure passing), refer to the MIPSpro Compiling and Performance Tuning Guide.
As specified in step 2, space must be allocated on the stack for all arguments even
though they may be passed in registers. This provides a saving area if their registers
are needed for other variables.
General registers must be used for passing arguments. For 32-bit compilations,
general registers $4–$7 and float registers $f12, $f14 are used for passing the first
four arguments (if possible). You must allocate a pair of registers (even if it’s a
single-precision argument) that start with an even register for floating-point
arguments appearing in registers.
For 64-bit compilations, general registers $4–$11 and float registers $f12 through
$f19 are used for passing the first eight arguments (if possible).
In Table 7-1 and Table 7-2, the “fN” arguments are considered single- and
double-precision floating-point arguments, and “nN” arguments are everything
else. The ellipses (...) mean that the rest of the arguments do not go in registers
regardless of their type. The “stack” assignment means that you do not put this
argument in a register. The register assignments occur in the order shown in order
to satisfy optimizing compiler protocols:
n1,d1 $4,$f13
n1,n2,n3,n4 $4,$5,$6,$7
n1,n2,n3,d1 $4,$5,$6,$f15
n1,s1,n2,s2 $4,$f13,$6,$f15
n1,s1,n2,n3 $4,$f13,$6,$7
6. Next, you must restore registers that were saved in step 4. To restore general
purpose registers:
lw reg,framesize+frameoffset–N($sp)
To restore the floating-point registers:
l.[sd] reg,framesize+frameoffset–N($sp)
(Refer to step 4 for a discussion of the value of N.)
7. Get the return address:
lw $31,framesize+frameoffset($sp)
8. Clean up the stack:
addu $sp,framesize
9. Return:
j $31
10. To end the procedure:
.end procedurename
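The frame arithmetic in steps 2 and 4 can be checked with a few helper functions. The C sketch below assumes 32-bit mode, where each saved general register takes four bytes, and follows the rules stated in those steps: framesize is a multiple of 16, bit r of the .mask bitmask stands for register $r, and N is 0 for the highest-numbered saved register and grows by four for each lower one. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 16, as framesize requires. */
static unsigned frame_round16(unsigned bytes)
{
    return (bytes + 15u) & ~15u;
}

/* Build a .mask bitmask: bit r is set for each saved register $r. */
static uint32_t reg_mask(const int *regs, int count)
{
    uint32_t mask = 0;
    for (int i = 0; i < count; i++) {
        assert(regs[i] >= 0 && regs[i] <= 31);
        mask |= (uint32_t)1 << regs[i];
    }
    return mask;
}

/* $sp-relative offset of the k-th saved general register, where k = 0 is
 * the highest-numbered register and N = 4 * k (32-bit stores). */
static int save_offset(int framesize, int frameoffset, int k)
{
    assert(k >= 0);
    return framesize + frameoffset - 4 * k;
}
```

With a framesize of 32 and a frameoffset of –4, saving $31, $17, and $16 gives the bitmask 0x80030000 and $sp-relative offsets of 28, 24, and 20.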
Differences in stack frame usage for –n32 and –64 compiles are summarized here. The
portion of the argument structure beyond the initial eight doublewords is passed in
memory on the stack, pointed to by the stack pointer at the time of call. The caller does
not reserve space for the register arguments; the callee is responsible for reserving it if
required (either adjacent to any caller-saved stack arguments if required, or elsewhere as
appropriate). No requirement is placed on the callee either to allocate space and save the
register parameters, or to save them in any particular place.
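For the –n32 and –64 conventions, the register assignment for the first eight arguments can be modeled by position alone. The C sketch below assumes plain scalar arguments to a non-variadic routine: the argument in slot k (counting from 0) goes in general register $4+k or, if it is single- or double-precision floating point, in $f12+k, and anything past the eighth slot goes on the stack. This matches the surviving table rows, such as n1,d1 in $4,$f13. The type and function names are hypothetical.

```c
#include <assert.h>

enum arg_kind { ARG_IN_GPR, ARG_IN_FPR, ARG_ON_STACK };

struct arg_loc {
    enum arg_kind kind;
    int reg;            /* register number; -1 when kind is ARG_ON_STACK */
};

/* position: 0-based argument slot; is_float: nonzero for single or double. */
static struct arg_loc arg_location(int position, int is_float)
{
    struct arg_loc loc = { ARG_ON_STACK, -1 };

    assert(position >= 0);
    if (position < 8) {
        loc.kind = is_float ? ARG_IN_FPR : ARG_IN_GPR;
        loc.reg  = is_float ? 12 + position : 4 + position;
    }
    return loc;
}
```

Because the slot number alone selects the register, an integer argument that follows a floating-point one skips a general register: in n1,s1,n2,s2 the third argument lands in $6, not $5.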
In most cases, high-level language routines and assembly routines communicate via
simple variables: pointers, integers, booleans, and single- and double-precision real
numbers. Describing the details of the various high-level data structures (arrays, records,
sets, and so on) is beyond our scope here. If you need to access such a structure as an
argument or as a shared global variable, refer to the MIPSpro Compiling, Debugging and
Performance Tuning Guide.
Examples
This section contains examples that illustrate the program design rules. Each example
shows a procedure written in C and its equivalent written in assembly language.
The following example shows a non-leaf procedure. Notice that it creates a stackframe,
and also saves its return address since it must put a new return address into register $31
when it invokes its callee:
float
nonleaf(int i, int *j)
{
double atof();
int temp;
temp = i - *j;
if (i < *j) temp = -temp;
return atof(temp);
}
.globl nonleaf
# 1 float
# 2 nonleaf(i, j)
# 3 int i, *j;
# 4 {
.ent nonleaf 2
nonleaf:
.cpload $25 ## Load $gp
subu $sp, 32 ## Create stackframe
sw $31, 20($sp) ## Save the return address
sw $gp, 24($sp) ## Save gp
.mask 0x80000000, -4
.frame $sp, 32, $31
# 5 double atof();
# 6 int temp;
# 7
# 8 temp = i - *j;
lw $2, 0($5) ## Arguments are in $4 and $5
subu $3, $4, $2
# 9 if (i < *j) temp = -temp;
bge $4, $2, $32 ## Note: $32 is a label,not a reg
negu $3, $3
$32:
# 10 return atof(temp);
move $4, $3
jal atof
cvt.s.d $f0, $f0 ## Return value goes in $f0
lw $gp, 24($sp) ## Restore gp
lw $31, 20($sp) ## Restore return address
addu $sp, 32 ## Delete stackframe
j $31 ## Return to caller
.end nonleaf
The –n32 code for the previous example is shown below. Note that this code is under .set
noreorder, so be aware of delay slots.
.set noreorder
# Program Unit: nonleaf
.ent nonleaf
.globl nonleaf
nonleaf: # 0x0
.frame $sp, 32, $31
.mask 0x80000000, -32
lw $7,0($5) # load *j
addiu $sp,$sp,-32 # .frame.len.nonleaf
sd $gp,8($sp) # save $gp
sd $31,0($sp) # save $ra
lui $31,%hi(%neg(%gp_rel(nonleaf +0))) # load new $gp
addiu $31,$31,%lo(%neg(%gp_rel(nonleaf +0))) #
addu $gp,$25,$31 #
slt $1,$4,$7 # compare i to *j
beq $1,$0,.L.1.1.temp #
subu $7,$4,$7 # i - *j, in delay slot of branch
subu $7,$0,$7 # temp = -temp
.L.1.1.temp: # 0x2c
lw $25,%call16(atof)($gp) #
jalr $25 # atof
or $4,$7,$0 # delay slot of jalr loads arg
ld $31,0($sp) # restore $ra
cvt.s.d $f0,$f0 #
ld $gp,8($sp) # restore $gp
jr $31 #
addiu $sp,$sp,32 # .frame.len.nonleaf
.end nonleaf
The example shown below is a leaf procedure that does not require stack space for local
variables. Notice that it creates no stackframe, and saves no return address.
int
leaf(int p1, int p2)
{
return (p1 > p2) ? p1 : p2;
}
.globl leaf
# 1 int
# 2 leaf(p1, p2)
# 3 int p1, p2;
# 4 {
.ent leaf 2
leaf:
.frame $sp, 0, $31
# 5 return (p1 > p2) ? p1 : p2;
ble $4, $5, $32 ## Arguments in
## $4 and $5
move $3, $4
b $33
$32:
move $3, $5
$33:
move $2, $3 ## Return value
## goes in $2
j $31 ## Return to caller
# 6 }
.end leaf
The –n32 code for the previous example looks like this:
.set noreorder
.ent leaf
.globl leaf
leaf: # 0x0
.frame $sp, 0, $31
slt $2,$5,$4 # compare p1 and p2
beq $2,$0,.L.1.2.temp #
or $9,$4,$0 # delay slot
b .L.1.1.temp #
or $2,$9,$0 # delay slot, return p1
.L.1.2.temp: # 0x14
or $2,$5,$0 # return p2
.L.1.1.temp: # 0x18
jr $31 #
nop # delay slot
.end leaf
Writing Assembly Language Code
The rules and parameter requirements that exist between assembly language and other
languages are varied and complex. The simplest approach to coding an interface
between an assembly routine and a routine written in a high-level language is to do the
following:
• Use the high-level language to write a skeletal version of the routine that you plan
to code in assembly language.
• Compile the program using the –S option, which creates an assembly language (.s)
version of the compiled source file (the –O option, though not required, reduces the
amount of code generated, making the listing easier to read).
• Study the assembly-language listing and then, imitating the rules and conventions
used by the compiler, write your assembly language code.
Chapter 8
8. Pseudo Op-Codes
This chapter describes pseudo op-codes (directives). These pseudo op-codes influence
the assembler’s later behavior. In the text, boldface type specifies a keyword and italics
represents an operand that you define.
Pseudo-Op Description
.aent name, symno Sets an alternate entry point for the current
procedure. Use this information when you
want to generate information for the
debugger. It must appear inside an .ent/.end
pair.
.ascii string [, string]... Assembles each string from the list into
successive locations. The .ascii directive does
not null pad the string. You MUST put
quotation marks (”) around each string. You
can use the backslash escape characters. For a
list of the backslash characters, see Chapter 4.
Table 8-1 (continued) Pseudo Op-Codes
.cpadd reg Emits code that adds the value of “_gp” to reg.
.cpsetup reg1, {offset | reg2}, label Causes the assembler to emit the following at
the point where it occurs:
sd $gp, offset ($sp)
lui $gp, 0 { label }
daddiu $gp, $gp, 0 { label }
daddu $gp, $gp, reg1
ld $gp, offset ($sp)
This sequence is used by
position-independent code following the
callee saved gp convention. It stores $gp in the
saved register area and calculates the virtual
address of label and places it in reg1. By
convention, reg1 is $25 (t9).
If reg2 is used instead of offset, $gp is saved
and restored to and from this register. (–n32
and –64 only)
.dynsym name value Specifies an ELF st_other value for the object
denoted by name. (–n32 and –64 only)
.fmask mask offset Sets a mask with a bit turned on for each
floating point register that the current routine
saved. The least-significant bit corresponds to
register $f0. The offset is the distance in bytes
from the virtual frame pointer at which the
floating point registers are saved. The
assembler saves higher register numbers
closer to the virtual frame pointer. You must
use .ent before .fmask and only one .fmask may
be used per .ent. Space should be allocated for
those registers specified in the .fmask.
.frame frame-register offset    Describes a stack frame. The first register is the
return_pc_register              frame-register, the offset is the distance from
                                the frame register to the virtual frame pointer,
                                and the second register is the return program
                                counter (or, if the first register is $0, this
                                directive shows that the return program
                                counter is saved four bytes from the virtual
                                frame pointer). You must use .ent before .frame
                                and only one .frame may be used per .ent. No
                                stack traces can be done in the debugger
                                without .frame.
.lcomm name, expression Makes the name’s data type bss. The assembler
allocates the named symbol to the bss area, and
the expression defines the named symbol’s
length. If a .globl directive also specifies the
name, the assembler allocates the named
symbol to external bss. The assembler puts bss
symbols in one of two bss areas. If the defined
size is smaller than (or equal to) the size
specified by the assembler or compiler’s –G
command line option, the assembler puts the
symbols in the sbss area and uses $gp to
address the data.
.loc file_number line_number    Specifies the source file and the line within
[column]                        that file that corresponds to the assembly
                                instructions that follow. For use by compilers.
                                The assembler ignores the file number when
                                this directive appears in the assembly source
                                file. Then, the assembler assumes that the
                                directive refers to the most recent .file
                                directive. The 64-bit and N32 assembler also
                                supports an optional value that specifies the
                                column number.
.mask mask, offset Sets a mask with a bit turned on for each
general purpose register that the current
routine saved. For use by compilers. Bit one
corresponds to register $1. The offset is the
distance in bytes from the virtual frame
pointer where the registers are saved. The
assembler saves higher register numbers
closer to the virtual frame pointer. Space
should be allocated for those registers
appearing in the mask. If bit zero is set it is
assumed that space is allocated for all 31
registers regardless of whether they appear in
the mask.
.section name [, section type, section    Instructs the assembler to create a section with
flags, section entry size, section        the given name and optional attributes. (–n32
alignment]                                and –64 only)
                                          Legal section type values are denoted by
                                          variables prefixed by SHT_ in <elf.h>.
                                          Legal section flags values are denoted by
                                          variables prefixed by SHF_ in <elf.h>.
                                          The section entry size specifies the size of each
                                          entry in the section. For example, it is 4 for .text
                                          sections.
                                          The section alignment specifies the byte
                                          boundary requirement for the section. For
                                          example, it is 16 for .text sections.
.set option (continued) The at option lets the assembler use the $at
register for macros, but generates warnings if
the source program uses $at. When you use the
noat option and an assembler operation
requires the $at register, the assembler issues a
warning message; however, the noat option
does let source programs use $at without
issuing warnings.
The nomove option tells the assembler to mark
each subsequent instruction so that it cannot
be moved during reorganization. Because the
assembler can still insert nop instructions
where necessary for pipeline constraints, this
option is less stringent than noreorder. The
assembler can still move instructions from
below the nomove region to fill delay slots
above the region or vice versa. The nomove
option has part of the effect of the “volatile” C
declaration; it prevents otherwise
independent loads or stores from occurring in
a different order than intended.
The move option cancels the effect of nomove.
The notransform option tells the assembler to
mark each subsequent instruction so that it
cannot be transformed by pixie(1), into an
equivalent set of instructions. For an overview
of pixie(1) see the SpeedShop User’s Guide.
The transform option cancels the effect of
notransform.
The directives listed below are only accepted in –32 compiles; they are only meant for
compiler-generated code, and should not be used in hand-written assembly code.
.alias
.asm0
.bgnb
.endb
.err
.gjaldef
.gjallive
.gjrlive
.livereg
.noalias
.set bopt/nobopt
.vreg
Tell Us About This Manual
As a user of Silicon Graphics products, you can help us to better understand your needs
and to improve the quality of our documentation.
Any information that you provide will be useful. Here is a list of suggested topics:
• General impression of the document
• Omission of material that you expected to find
• Technical errors
• Relevance of the material to the job you had to do
• Quality of the printing and binding
Please send the title and part number of the document with your comments. The part
number for this document is 007-2418-003.
Thank you!