Assembly Language
Assembly Language
HLL computer
Pascal, C, Fortran . . . . . . . . . hardware
complete picture:
----------- ------------
HLL ---> | compiler|---> assembly --->| assembler|--->machine
----------- language ------------ language
we assume that you know a hll (high level language, like C++, C,
Pascal, Fortran). From that, we can give you
SASM. Later in the semester, you will learn Pentium.
Programs will be written in both SASM and Pentium.
------- ----------
| CPU | <---------> | memory |
------- | ----------
|
|
-------
| I/O |
-------
1. instruction FETCH
2. operand (variable) load LOAD
3. operand (variable) store STORE
next step:
suppose we want to execute multiple instructions, like
a program
The added step could come at any time after step 1. It is convenient
to think of it as step 2.
Chapter 3 -- SASM
ABOUT SASM
----------
About SASM:
-- one instruction, declaration per line
-- comments are anything on a line following `;'
(comments may not span lines)
-- given the Intel architecture and its history, there are an enormous
number of RESERVED WORDS. Consult APPENDIX A always!
DECLARATIONS
------------
Pascal:
var variablename: type;
C or C++:
type variablename;
SASM:
variablename type value
type is dd if integer
db if character
dd if floating point
examples:
bool_flag dd 0
counter dd 0
variable3 dd ?
constant_e dd 2.71828
uservalue db ?
letter_a db 'a'
string1 db 'This is a string.', 0
; null terminated string example, VERY USEFUL!
remember:
-- one declaration per line.
DIRECTIVES
----------
a way to give information to the assembler.
examples:
ARITHMETIC instructions
----------------------
examples:
move count, 0
iadd sum, 1
CONDITIONAL EXECUTION
---------------------
sometimes an instruction (or a set of instructions) should
be executed, and sometimes it (they) shouldn't.
if (condition) then
statement
else
statement;
C if-then-else
if (condition)
statement;
else
statement;
SASM 'ifs' and 'gotos'
(a better name is CONTROL INSTRUCTIONS)
---------------------------------------
SASM effect of instruction
These condition codes get changed (set) according to the result of certain
instructions (iadd, isub, ineg, and some logical instructions).
The condition codes are used by the control instructions.
EXAMPLE:
--------
Pascal if-then-else:
C equivalent:
if (count < 0)
count = count + 1;
compare count, 0
blz ifstuff
br end_if
ifstuff: iadd count, 1
end_if: # next program instruction goes here
-- OR --
compare count, 0
bgez end_if
iadd count, 1
end_if: # next program instruction goes here
NOTE: Be careful not to use RESERVED WORDS for your variable names
or label names.
(some reserved words: end endif if else elseif for while repeat)
EXAMPLES:
---------
Pascal:
C or C++:
SASM:
while_loop:
compare count, 0
blez end_while
irem a, count
isub count, 1
br while_loop
end_while: # next program instruction goes here
C or C++:
while_loop:
compare count, limit
bgez end_while
compare c, d
bnz end_while
br while_loop
end_while:
Pascal:
for i:= 3 to 8 do
begin
a := a + i;
end;
C:
SASM:
move i, 3
for_loop:
compare i, 8
bgz end_for
iadd a, i
iadd i, 1
br for_loop
end_for:
NOTE: There are times when you will want to 'get' something that
isn't a character (like an integer or floating point value input
by user). In SASM, you can't, since the instruction doesn't exist.
At the end of Chapter 5, you will know enough about data representation
to be able to read an integer (or floating point value) character
by character and translate it to an integer.
It is done this way because input from a keyboard are only characters.
Output to a simple display (which we are assuming) are only characters.
The C library (that we utilize) gives easy implementation of
output for other types, so you get that benefit in this language.
EXAMPLES:
.code
put_str msg1
put_i int1
put_ch newline
put_str msg2
prints:
.486
.model flat, stdcall
.stack 1000h
include sasmacros.inc
.data
end_letter db 'z'
one_ch db 'a'
newline db 0ah
.code
main:
put_ch one_ch
compareb end_letter, one_ch
bez stop_printing ; when one_ch becomes 'z', done
printing
iadd one_ch, 1
br main ; label main is the top of the
loop
stop_printing:
put_ch newline
done
end
345 is really
3 x 100 + 4 x 10 + 5 x 1
3 x 10**2 + 4 x 10**1 + 5 x 10**0
3 is the most significant symbol (it carries the most weight)
5 is the least significant symbol (it carries the least weight)
hex decimal
0 0
1 1
.
.
.
9 9
a 10
b 11
c 12
d 13
e 14
f 15
S S . . . S S S
n-1 n-2 2 1 0
example:
36/2 = 18 r=0 <-- lsb
18/2 = 9 r=0
9/2 = 4 r=1
4/2 = 2 r=0
2/2 = 1 r=0
1/2 = 0 r=1 <-- msb
example:
example:
example:
3 9 c 8
0011 1001 1100 1000
octal --> binary
(just like hex to binary!)
(trivial!) just write down the 8 bit binary code for
each octal digit
on representing nonintegers
---------------------------
scientific notation
a way of representing rational numbers using integers
(used commonly to represent nonintegers in computers)
exponent
number = mantissa x base
normalization
to keep a unique form for every representable noninteger, they
are kept in NORMALIZED form. A normalized number will follow the
following rule:
1.2
x 2.2
-----
24
+ 24
-----
264 --> 2.64 --> 2.6 or 2.6 +- .05
BINARY FRACTIONS
----------------
f f . . . f f f . f f f . . .
n-1 n-2 2 1 0 -1 -2 -3
|
|
binary point
2**-1 = .5
2**-2 = .25
2**-3 = .125
2**-4 = .0625 etc.
----
.8 is .1100
NON-BINARY FRACTIONS
--------------------
f f . . . f f f . f f f . . .
n-1 n-2 2 1 0 -1 -2 -3
|
|
radix point
example:
101.001 (octal)
1 x 8**2 + 1 x 8**0 + 1 x 8**-3
64 + 1 + 1/512
65 1/512 = 65.0019 (approx)
13.a6 (hexadecimal)
1 x 16**1 + 3 x 16**0 + a x 16**-1 + 6 x 16**-2
16 + 3 + 10/16 + 6/256
19 166/256 = 19.64 (approx)
134 (base 5)
1 x 5**2 + 3 x 5**1 + 4 x 5**0
25 + 15 + 4
44 (base 10)
rem
100/5 20 0
20/5 4 0
4/5 0 4
Chapter 5 -- representations
ALL ABOUT INTEGER REPRESENTATION.
---------------------------------
UNSIGNED
--------
the standard binary encoding already given
SIGN MAGNITUDE
--------------
example: 4 bits
0101 is 5
1101 is -5
4 bits, -7 to +7
n=4, - 2**3 + 1 to 2**3 - 1
-8 + 1 to 8 - 1
0000 is 0, 1000 is 0
the computer must do all calculations such that they
come out correctly and the same whichever representation is used.
ONE's COMPLEMENT
----------------
Now, nobody builds machines that are based on 1's comp. integers.
In the past, early computers built by Semour Cray (while at CDC)
were based on 1's comp. integers.
00000 is 0
00111 is 7, etc.
that is -1.
a 3 bit example:
bit pattern: 100 101 110 111 000 001 010 011
1's comp: -3 -2 -1 0 0 1 2 3
2's comp.: -4 -3 -2 -1 0 1 2 3
the negative values are all "slid" down by one, eliminating the
extra zero representation.
positive values are the same as for sign/mag and 1's comp.
they will have a 0 in the MSB (but it is NOT a sign bit!)
EXAMPLE:
what decimal value does the two's complement 110011 represent?
a 0011
+b +0001
-- -----
sum 0100
BIASED REPRESENTATION
---------------------
choosing a bias:
the bias chosen is most often based on the number of bits
available for representing an integer. To get an approx.
equal distribution of true values above and below 0,
the bias should be 2 ** (n-1) or (2**(n-1)) - 1
SIGN EXTENSION
--------------
how to change an integer with a smaller number of bits into the
same integer (same representation) with a larger number of bits.
by representation:
unsigned: xxxxx --> 000xxxxx
copy the original integer into the LSBs, and put 0's elsewhere
example: 0010101
000 0010101
11110000
11111111 11110000
OVERFLOW
--------
011 (3)
+ 110 (6)
---------
? (9) it would require 4 bits (1001) to represent
the value 9 in unsigned rep.
CHARACTER REPRESENTATION
------------------------
examples:
0100 0001 is 41 (hex) or 65 (decimal). It represents `A'
0100 0010 is 42 (hex) or 66 (decimal). It represents `B'
The code has some nice properties. If the bit patterns are compared,
(pretending they represent integers), then `A' < `B'
This is good, because it helps with sorting things into alphabetical
order.
the digits:
`0' is 48 (decimal) or 30 (hex)
`9' is 57 (decimal) or 39 (hex)
Because of this, code must deal with both characters and integers
correctly.
in1 db ?
result db ?
get_ch in1
moveb result, in1
iadd result, in1
put_ch result
put_ch prints out `f', since the ASCII code for 102(decimal) is `f'
read `3'
translate `3' to 3
read `5'
translate `5' to 5
integer = 3 * 10 + 5 = 35
read `4'
translate `4' to 4
integer = 35 * 10 + 4 = 354
the algorithm:
integer = 0
while there are more characters
get character
digit <- character - 48
integer <- integer * 10 + digit
54 div 10 gives 5
translate 5 to `5' and print it out
54 mod 10 gives 4
4 div 1 gives 4
translate 4 to `4' and print it out
4 mod 1 gives 0, so you're done
NOTE: These algorithms work the same in bases other than 10.
Just change the 10 to the desired base. (There is some extra
work in bases > 10, since have to translate the digits > 9
into the characters. For example, 12 (decimal) is c (hex).)
Note that this is not the only way to represent floating point
numbers, it is just the IEEE standard way of doing it.
the representation
-------------------
| S | E | F |
-------------------
S e
(-1) x f x 2
where
e = E - bias
n
f = F/2 + 1
Note: Computers that did not use this representation, like those
built before the standard, did not always use a radix of 2.
Example: some IBM machines had radix of 16.
An example: put the decimal number 64.2 into the IEEE single
precision representation.
first step:
get a binary representation for 64.2
to do this, get binary reps. for the stuff to the left
and right of the decimal point separately.
64 is 1000000
.2 x 2 = 0.4 0 (msb)
.4 x 2 = 0.8 0
.8 x 2 = 1.6 1
.6 x 2 = 1.2 1
6
1.000000 00110011. . . x 2
third step:
6 is the true exponent. For the standard form, it needs to
be in biased-127 form.
6
+ 127
-----
133 (value of biased exponent)
fourth step:
the mantissa stored (F) is the stuff to the right of the radix point
in the normal form. We need 23 bits of it.
000000 00110011001100110
S E F
0 10000101 00000000110011001100110
0x 4 2 8 0 6 6 6 6
or
42806666h
a 0011
+b +0001
-- -----
sum 0100
ADDITION
--------
unsigned:
just like the simple addition given.
examples:
sign magnitude:
rules:
- add magnitudes only (do not carry into the sign bit)
- throw away any carry out of the msb of the magnitude
(Due, again, to the fixed precision constraints.)
- add only integers of like sign ( + to + or - to -)
- sign of result is same as sign of the addends
examples:
0 01011 (11)
+ 1 01110 (-14)
----------------
don't add! must do subtraction!
one's complement:
by example
so it seems that if there is a carry out (of 1) from the msb, then
the result will be off by 1, so add 1 again to get the correct
result. (Implementation in HW called an "end around carrry.")
two's complement:
rules:
- just add all the bits
- throw away any carry out of the msb
- (same as for unsigned!)
examples
after seeing examples for all these representations, you may see
why 2's complement addition requires simpler hardware than
sign mag. or one's complement addition.
SUBTRACTION
-----------
general rules:
1 - 1 = 0
0 - 0 = 0
1 - 0 = 1
10 - 1 = 1
0 - 1 = borrow!
unsigned:
examples
sign magnitude:
examples
one's complement:
two's complement:
a - b becomes a + (-b)
examples
10110 (-10)
- 00011 (3) --> 00011
------------ |
\|/
11100
+ 1
-------
11101 (-3)
so do
10110 (-10)
+ 11101 (-3)
------------
1 10011 (-13)
(throw away carry out)
1000 (8)
+1001 (9)
-----
1 0001 (1)
sign magnitude -- when there is a carry out of the msb of the magnitude
1 1000 (-8)
+ 1 1001 (-9)
-----
1 0001 (-1) (carry out of msb of magnitude)
2's complement -- when the signs of the addends are the same, and the
sign of the result is different
0011 (3)
+ 0110 (6)
----------
1001 (-7) (note that a correct answer would be 9, but
9 cannot be represented in 4 bit 2's complement)
unsigned -- never
sign magnitude -- never happen when doing subtraction
2's complement -- we never do subtraction, so use the addition rule
on the addition operation done.
MULTIPLICATION of integers
0 x 0 = 0
0 x 1 = 0
1 x 0 = 0
1 x 1 = 1
-- longhand, it looks just like decimal
1111 1111 -1
x 1111 1001 x -7
---------------- ------
11111111 7
00000000
00000000
11111111
11111111
11111111
11111111
+ 11111111
----------------
1 00000000111
-------- (correct answer underlined)
DIVISION of integers
unsigned only!
X = 0011
Y = 1010
X AND Y is 0010
X OR Y is 1011
X NOR Y is 0100
X XOR Y is 1001
etc.
SHIFT OPERATIONS
a way of moving bits around within a word
00110101
00110101
ADDITION
3.25 x 10 ** 3
+ 0.000263 x 10 ** 3
--------------------
3.250263 x 10 ** 3
(presumes use of infinite precision, without regard for accuracy)
-> choose to shift the .25, since we want to increase it's exponent.
-> shift by 10000101
-01111101
---------
00001000 (8) places.
step 2: add (don't forget the hidden bit for the 100)
0 10000101 1.10010000000000000000000 (100)
+ 0 10000101 0.00000001000000000000000 (.25)
---------------------------------------
0 10000101 1.10010001000000000000000
result is
0 10000101 10010001000000000000000
SUBTRACTION
then the algorithm for subtraction of sign mag. numbers takes over.
before subtracting,
compare magnitudes (don't forget the hidden bit!)
change sign bit if order of operands is changed.
MULTIPLICATION
3.0 x 10 ** 1
+ 0.5 x 10 ** 2
-----------------
3.0 x 10 ** 1
+ 0.5 x 10 ** 2
-----------------
1.50 x 10 ** 3
0 10000100 0100
x 1 00111100 1100
-----------------
mantissa multiplication: 1.0100
(don't forget hidden bit) x 1.1100
------
00000
00000
10100
10100
10100
---------
1000110000
becomes 10.00110000
biased:
10000100
+ 00111100
----------
1 01000001 10.00110000
1 01000001 10.00110000
becomes
1 01000010 1.000110000
similar to multiplication.
true division:
do unsigned division on the mantissas (don't forget the hidden bit)
subtract TRUE exponents
The IEEE standard is very specific about how all this is done.
Unfortunately, the hardware to do all this is pretty slow.
instead of doing a / b
they do a x 1/b.
figure out a reciprocal for b, and then use the fl. pt.
multiplication hardware.
rounding
--------
arithmetic operations on fl. pt. values compute results that cannot
be represented in the given amount of precision. So, we must round
results.
There are MANY ways of rounding. They each have "correct" uses, and
exist for different reasons. The goal in a computation is to have the
computer round such that the end result is as "correct" as possible.
There are even arguments as to what is really correct.
3 methods of rounding:
round toward 0 -- also called truncation.
figure out how many bits (digits) are available. Take that many
bits (digits) for the result and throw away the rest.
This has the effect of making the value represented closer
to 0.
example:
.7783 if 3 decimal places available, .778
if 2 decimal places available, .77
example:
1.23 if 2 decimal places, 1.3
-2.86 if 2 decimal places, -2.8
example:
1.23 if 2 decimal places, 1.2
-2.86 if 2 decimal places, -2.9
1.1101
|
1.11 | 10.00
------
1.001
|
1.00 | 1.01
-----
1.1101
|
1.11 | 10.00
------
1.001
|
1.00 | 1.01
-----
round toward zero (TRUNCATE) --
1.1101
|
1.11 | 10.00
------
1.001
|
1.00 | 1.01
-----
-1.1101
|
-10.00 | -1.11
------
-1.001
|
-1.01 | -1.00
-----
1.1111
|
1.11 | 10.00
------
1.1101
|
1.11 | 10.00
------
use of standards
----------------
--> allows all machines following the standard to exchange data
and to calculate the exact same results.
HW vs. SW computing
-------------------
floating point operations can be done by hardware (circuitry)
or by software (program code).
DATA STRUCTURES
---------------
ARRAYS
------
Properties of arrays:
1. each element is the same size (char = 1 byte, integer = 4 bytes)
2. elements are stored contiguously, with the first element
stored at the smallest memory address
example:
my_array db 8 dup(0)
8 character elements, numbered 0 - 7, initialized to 0
0 1 2 3 4 5 6
-----------------------------
| | | | |XXX| | |
| | | | |XXX| | |
-----------------------------
25 26 27 28 29 30 31 <---- address
la ar_addr, array1
la my_addr, array1
iadd my_addr, 4
integer elements --
array2: array[0..5] of integer; /* PASCAL */
in SASM,
array2 dd 6 dup(0)
0 1 2 3 4 5 <-- index
-------------------------
| | | |XXX| | |
| | | |XXX| | |
-------------------------
80 84 88 92 96 100 <-- memory address
2 DIMENSIONAL ARRAYS
--------------------
TERMINOLOGY:
r x c array -- r rows
c columns
example: 4 x 2 array
| |
-------
| 0,0 |
-------
| 0,1 |
-------
| 1,0 |
-------
| 1,1 |
-------
| 2,0 |
-------
| 2,1 |
-------
| 3,0 |
-------
| 3,1 |
-------
| |
| |
-------
| 0,0 |
-------
| 1,0 |
-------
| 2,0 |
-------
| 3,0 |
-------
| 0,1 |
-------
| 1,1 |
-------
| 2,1 |
-------
| 3,1 |
-------
| |
2-D Arrays
----------
(size) (x - first_col)
Column Major:
(size) (y - first_row)
Need to know:
1. row/column major (storage order)
2. base address
3. size of elements
4. dimensions of the array
BOUNDS CHECKING
---------------
Many HLL's offer some form of bounds checking. Your program crashes,
or you get an error message if an array index is out of bounds
cal example: x: array[1..6] of integer;
code. . .
y := x[8];
given -- a 5 x 3 array
byte size elements
row major order
first_row = 1
first_col = 1
STACKS
------
| |
| |
|-------|
| |
|-------|
| |
|-------|
| |
|-------|
| |
-------
Data put into the stack is said to be PUSHED onto the stack.
Data taken out of the stack is said to be POPPED off the stack.
Algorithm Example:
printing out a positive integer, character by character
(integer to character string conversion)
integer = 1024
if integer == 0 then
push '0'
else
while integer <> 0
digit <- integer mod base
char <- digit + 48
push char onto stack
integer <- integer div base
IMPLEMENTATION OF A STACK
-------------------------
One implementation of a stack out of an array.
Need to know:
index of TOP OF STACK (tos), often called a stack pointer (sp).
In most implementations, tos is not an index at all, it is
an ADDRESS (pointer).
(initial state)
sp
|
\ /
-----------------------------
| | | | | | | |
-----------------------------
my_stack dd 50 dup(0)
my_sp dd ?
a PUSH operation:
isub my_sp, 4
move data, M(my_sp)
(initial state)
sp
|
\ /
-----------------------------
| | | | | | | |
-----------------------------
a PUSH operation:
iadd my_sp, 4
move M(my_sp), data
a POP operation:
ANOTHER ALTERNATIVE:
The stack could "grow" from the end of the array towards
the beginning. (Note that which end of the array the stack
grows toward is independent of what stack pointer points to.)
QUEUES
-------
initial state:
--------------------------------------------
| | | | | |
--------------------------------------------
^
|
head, and tail
Chapter 9 -- registers
REGISTERS
---------
So we get
fetch instruction: 30 time units
decode 1 time unit
load x 10 time units
load y 10 time units
add 1 time unit
store x 10 time units
---------------------------------
total time: 62 time units
load z
add y
store x
BUT, what if the operation following the add was something like
div x, x, 3
then, the value for x is already in the accumulator, and the
code on the 1 address machine could be
load z
add y
div 3
store x
there is only 1 extra instruction (3 memory transactions) for this
whole sequence!
On the 3-address machine: 13 transactions
On the 1-address machine: 11 transactions
add x, y, z
load reg10, y
load reg11, z
add reg12, reg11, reg10
store x, reg12
Once a computer has registers (and they ALL do!), then there
can be lots of interesting uses of these registers.
iadd reg2, 1
Registers
---------
EAX AX AH AL
EBX BX BH BL
ECX CX CH CL
EDX DX DH DL
and
EBP BP
ESI SI
EDI DI
ESP SP
There are a few more, but we won't use or discuss them. They
are only used for memory accessability in the segmented memory
model.
Oddities:
This is the only architecture that I know of where the programmer
can designate part of a register as an operand. On ALL other
machines, the whole register is designated and used.
OF Overflow flag
SF Sign flag
ZF Zero flag
PF Parity flag
CF Carry flag
Accessing Memory
----------------
-- The memory model that we use. AND, the memory model that every
other manufactures' processors also use.
--
code
data
stack
other
Addressing Modes
----------------
Some would say that the Intel architectures only support 1 addressing
mode. It looks (something like) this:
where
base reg is EAX, EBX, ECX, EDX or ESP or EBP
index reg is EDI or ESI
scaling factor is 1, 2, 4, or 8
-- register mode --
The operand is in a register. The effective address is the
register (wierd).
Example instruction:
-- immediate mode --
The operand is in the instruction. The effective address is within
the instruction.
Example instruction:
mov eax, 26
Example instruction:
-- direct mode --
The effective address is in the instruction.
Example instruction:
Example instruction:
Example instruction:
-- PC relative mode --
The effective address is the sum of the contents of the PC and
a constant contained within the instruction.
Example instruction:
jmp a_label
Instruction Set
----------------
Generalities:
-- Many (most?) of the instructions have exactly 2 operands.
If there are 2 operands, then one of them will be required
to use register mode, and the other will have no restrictions
on its addressing mode.
Data Movement
-------------
EXAMPLES:
Integer Arithmetic
------------------
EXAMPLES:
Logical
-------
not r/m ; logical not
EXAMPLES:
I/O
---
The only instructions which actually allow the reading and
writing of I/O devices are priviledged. The OS must handle
these things. But, in writing programs that do something
useful, we need input and output. Therefore, there are some
simple macros defined to help us do I/O.
Control Instructions
--------------------
These are the same control instructions that all started with
the character 'b' in SASM.
Chapter 11 -- Procedures
All about Procedures
--------------------
an introduction to procedures
In Pascal:
begin
.
.
.
x := larger(a, b); CALL
.
.
.
end.
HEADER PARAMETERS
function larger (one, two: integer): integer;
begin
if ( one > two ) then
larger := one BODY
else
larger := two
end;
In C:
{
.
.
.
x = larger(a, b); CALL
.
.
.
}
HEADER PARAMETERS
int larger (int one, int two)
{
if ( one > two )
larger = one; BODY
else
larger = two;
}
; one call
lea EAX, rtn_point
jmp proc1
rtn_point:
; another call
lea EAX, rtn_point2
jmp proc1
rtn_point2:
; one call
lea EAX, rtn_point
jmp proc1
rtn_point:
.data
addr_stack dd 100 dup(0) ; hope that 100 addresses is enough!
stack_ptr dd ?
; one call
lea EAX, rtn_point
jmp proc1
rtn_point:
SYSTEM STACK
------------
A stack is so frequently used in implementing procedure call/return,
that many computer systems predefine a stack, the SYSTEM STACK.
In memory, we have:
address | |
0 | your |
| program |
| here |
| |
| |
| |
| |
| |
| system | / \
very | stack | | grows towards smaller addresses
large | here | |
addresses
terminology:
DOWN and UP are vague terms, unless you know what the
picture looks like.
push, in Pentium:
sub esp, 4
mov [esp], ?
OR
pop, in Pentium:
mov ?, [esp]
add esp, 4
NOTE: If you use register esp for any use other than for
a stack pointer, then the location of the stack is lost.
push r/m
reg
immed
pop reg
.
.
.
lea EAX, rtn3 ; this would overwrite the return
jmp proc2 ; address if it had not been saved.
rtn3:
.
.
.
proc2:
.
.
.
jmp [EAX]
call proc1
.
.
.
call proc1
.
.
.
proc1:
.
.
.
call proc2
.
.
.
ret
proc2:
.
.
.
ret
Space for a stack frame gets placed on the stack each time
a procedure is called, and taken off the stack each time a
return occurs. These stack frames are pushed/popped
DYNAMICALLY (while the program is running).
example:
call A
call B
.
.
A: call C
call D
ret
B: call D
ret
C: call E
ret
D: ret
E: ret
becomes . . .
The stack gets used for more than just pushing/popping stack frames.
During the execution of a procedure, there may be a need for temporary
storage of variables. The common example of this is in expression
evaluation.
Example: high level language statement
Z = (X * Y) + (A/2) - 100
The intermediate values of X*Y and A/2 must be stored somewhere.
On older machines, register space was at a premium. There just weren't
enough registers to be used for this sort of thing. So, intermediate
results (local variables) were stored on the stack.
| |
---------
| |
---------
| | --- <- esp
--------- |
| | |
--------- |
| | |--- procedure's frame
--------- |
| | |
--------- |
|rtnaddr| |
--------- ---
| |
---------
---------
| |
---------
| temp2 | <- esp
---------
| temp1 |
---------
| | ---
--------- |
| | |
--------- |
| | |--- procedure's frame
--------- |
| | |
--------- |
|rtnaddr| |
--------- ---
| |
---------
All this is motivation for keeping an extra pointer around that does
not move with respect to the current stack frame.
---------
| |
---------
| temp2 | <- esp
---------
| temp1 |
---------
| | ---
--------- |
| | |
--------- |
| | |--- procedure's frame
--------- |
| | |
--------- |
|rtnaddr| | <- frame pointer
--------- ---
| |
---------
Now items within the frame can be accessed with offsets from the
frame pointer, AND the offsets do not change within the procedure.
NOTE:
-- EBP must be initialized at the start of every procedure,
and restored at the end of every procedure.
parameter passing.
------------------
parameter = argument
example:
.
.
.
mov EAX, [EDX] ; put parameter in EAX
call decrement
mov [EDX], EAX ; recopy parameter to its correct place.
.
.
.
decrement:
sub EAX, 1
ret
CALLEE SAVED
A procedure clears out some registers for its own use.
CALLER SAVED
The calling program saves the registers and local variables
that it does not want a called procedure to overwrite.
REVISITING PROCEDURES.
Needed:
--> items to go in activation record.
return address
frame pointer (if used)
parameters
local variables --| may be some overlap here
saved registers --|
--> mechanism
ASSEMBLY
---------
the assembler's job is to
1. assign addresses
2. generate machine code
an assembler will
-- assign addresses
IMPORTANT:
there are separate memory spaces for data and instructions.
the assembler allocates them IN SEQENTIAL ORDER as it scans
through the source code program.
EXAMPLE INSTRUCTION:
add eax, 24
The only one that would match the operand types is the
third one in the list:
add r/m, immed 81 /0 id
BITS 7 6 5 4 3 2 1 0
mod reg/opcode r/m
BITS 7 6 5 4 3 2 1 0
mod reg/opcode r/m
1 1 0 0 0 0 0 0
dec r/m ff /1
The opcode is ff, and it describes that there will be one operand,
and it is of the general form. The /1 says that the
Reg field of the ModR/M byte will be 001.
BITS 7 6 5 4 3 2 1 0
Mod Reg/Opcode R/M
0 0 1
The table on page 353 describes the Mod and R/M fields.
Find the register direct addressing mode, using register EDX
in the table. It gives Mod 00 and R/M 010.
BITS 7 6 5 4 3 2 1 0
Mod Reg/Opcode R/M
0 0 0 0 1 0 1 0
A BIG EXAMPLE:
.data
a1 dd 4
a2 dd ?
a3 dd 5 dup(0)
.code
main: mov ecx, 20
mov eax, 15
mov edx, 0
jz target_label
loop1: add edx, eax
imul [ebp + 8]
dec ecx
jg loop1
target_label:
done
Symbol table
symbol address
---------------------
a1 0040 0000 (I made up this address, 'cuz we need
a starting address for the data section.)
and,
Assume that the code will be assembled such that the first
instruction is placed at address 0x0000 0000.
mov ecx, 20
mov eax, 15
mov edx, 0
jz target_label
jz rel32 0f 84 "cd"
0000 0000 b9 14 00 00 00
0000 0005 b8 0f 00 00 00
0000 000a ba 00 00 00 00
0000 000f 0f 84 ?? ?? ?? ??
0000 0015
symbol table
symbol address
---------------------
a1 0040 0000
a2 0040 0004
a3 0040 0008
0040 000c
0040 0010
0040 0014
0040 0018
BITS 7 6 5 4 3 2 1 0
Mod Reg/Opcode R/M
1 1 0 1 0 0 0 0
0000 0000 b9 14 00 00 00
0000 0005 b8 0f 00 00 00
0000 000a ba 00 00 00 00
0000 000f 0f 84 ?? ?? ?? ??
0000 0015 03 d0
0000 0017
imul [ebp + 8]
imul r/m f7 /5
BITS 7 6 5 4 3 2 1 0
Mod Reg/Opcode R/M
1 0 1 0 1 1 0 1
0000 0000 b9 14 00 00 00
0000 0005 b8 10 00 00 00
0000 000a ba 00 00 00 00
0000 000f 0f 84 ?? ?? ?? ??
0000 0015 03 d0
0000 0017 f7 ad 08 00 00 00
0000 001d
dec ecx
dec reg 48 + rd
0000 0000 b9 14 00 00 00
0000 0005 b8 10 00 00 00
0000 000a ba 00 00 00 00
0000 000f 0f 84 ?? ?? ?? ??
0000 0015 03 d0
0000 0017 f7 ad 08 00 00 00
0000 001d 49
0000 001e
jg loop1
jg rel32 0f 8f "cd"
To calculate "cd",
at assembly time:
in hex 0x ff ff ff f1
Again,
memory map of code section (so far)
address contents
0000 0000 b9 14 00 00 00
0000 0005 b8 10 00 00 00
0000 000a ba 00 00 00 00
0000 000f 0f 84 ?? ?? ?? ??
0000 0015 03 d0
0000 0017 f7 ad 80 00 00 00
0000 001d 49
0000 001e 0f 8f f1 ff ff ff
0000 0024
0000 0000 b9 14 00 00 00
0000 0005 b8 10 00 00 00
0000 000a ba 00 00 00 00
0000 000f 0f 84 0f 00 00 00
0000 0015 03 d0
0000 0017 f7 ad 80 00 00 00
0000 001d 49
0000 001e 0f 8f f1 ff ff ff
0000 0024
Chapter 13 -- I/O
all about I/O
-------------
computer model:
----- --------
|CPU| <----> |memory|
----- ^ --------
|
|
\ /
-----
|i/o|
-----
protection issues --
in a real system, there could be more than one terminal
(terminal is a keyboard and display together)
Should one user be able to display characters on another's
display? Lock up another's keyboard? Send a file of
infinite length to the printer, effectively shutting
all others out?
physical issues --
A computer today (1998) can complete an instruction at
the rate of about 1 each nsec.
Unfortunately, typical I/O devices are much slower, often
requiring 10s of milliseconds to deal with a single
character. That is approx. 1 million times slower!
This situation is dubbed the "access gap."
all platters are tied together and rotate around the SPINDLE
at a fixed speed.
So, the nitty gritty issue is: how does the OS accomplish
I/O requests? There are 2 possibilities.
How does the processor know that the instruction has completed?
(Is there any need to wait?)
What happens if the device encounters an error?
(Does this halt the computer?)
for example,
address
0x0000 0000 -|
. | real memory
. |
. |
0xffff 0000 -|
and
-- put_ch and put_str: how does the computer know that the device
is ready to print out a second character? What if the
printer jams? (printers and terminals are SLOW!)
address
0x0000 0000 -|
. | real memory
. |
. |
0xffff 0000 -|
and
EXCEPTION HANDLERS
------------------
The I/O devices tell the CPU that they are "ready."
EXCEPTIONS
----------
1. interrupts
--initiated outside the instruction stream
--arrive asynchronously (at no specific time)
examples:
I/O device status change
I/O device error condition
thermal override shutdown
internal error detection
2. traps
--occur due to something in instruction stream
--arrive synchronously (while instruction is executing)
good test: if program was re-run, the trap would
occur in precisely the same place in the code.
examples:
bad opcode
arithmetic overflow
I/O functionality, like put_ch
attempt to access privileged or unavailable memory
exception handling
------------------
some terms
----------
examples: vector
0 divide by zero (a trap, really)
2 non-maskable interrupt
4 overflow (a trap, really)
6 invalid opcode (how could we get one of these?)
7 device not available
there can be up to 256 unique vectors
2. Uses the vector to get at the code for the exception handler.
DBL --
| |
|--------------|
| PC (EIP) |
|--------------|
| | CS |
|--------------|
| EFLAGS |
|--------------|
| ESP |
|--------------|
| | SS |
|--------------|
| |
|--------------|
Note that we now have saved away the return address for
when the exception handling is finished.
PRIVILEDGE
Some resources:
the processor (for executing code!)
main memory
all I/O devices
programs (both applications and the OS code)
2. privileged instructions.
Specific instructions within the instruction set can only be
executed by the OS code. To make this work, there must be
one or more bits in a register somewhere that identify the
privilege level of the currently running program. This bit
is checked each time an instruction is decoded. Each instruction
has its own required privilege level.
PRIORITIES
general solutions:
FCFS -- the first one to arrive gets handled first.
Intel's solution:
Hardware is placed in a chip separate from the processor.
This separate chip is called the Programmable Interrupt
Controller (PIC), and it makes the decisions about what interrupts
are given to the processor, and which come first.
MEMORY HIERARCHIES
------------------
Known in current technologies: the time to access data
from memory is an order of magnitude greater than a
processor operation. (And note that this has been true
for well more than a decade.)
This average (mean) access time will change for each program.
It depends on the program, and its reference pattern, and how
that pattern interracts with the cache parameters.
address contents
(binary) (hex)
00000 aa bb cc dd
00100 00 11 22 33
01000 ff ee 01 23
01100 45 67 89 0a
10000 bc de f0 1a
10100 2a 3a 4a 5a
11000 6a 7a 8a 9a
11100 1b 2b 3b 4b
(1)
First memory reference is to the byte at address 01101.
(2)
Second memory reference is to the byte at address 01010.
(3)
Third memory reference is to the byte at address 01111.
This line within the cache has its valid bit set, so there
is a block (from memory) in the cache. BUT, is it the
block that we want? The tag of the desired byte is checked
against the tag of the block currently in the cache. They
match, and therefore we have a HIT.
(4)
Fourth memory reference is to the byte at address 11010.
This line within the cache has its valid bit set, so there
is a block (from memory) in the cache. BUT, is it the
block that we want? The tag of the desired byte is checked
against the tag of the block currently in the cache. They
do NOT match. Therefore, the block currently in the cache
is the wrong one. It will be overwritten with the block
(from memory) that we now do want.
(5)
Fifth memory reference is to the byte at address 11011.
This line within the cache has its valid bit set, so there
is a block (from memory) in the cache. BUT, is it the
block that we want? The tag of the desired byte is checked
against the tag of the block currently in the cache. They
match, and therefore we have a HIT.
Often
cache: instruction cache 1 cycle
data cache 1 cycle
main memory 20 cycles
Remember:
the cache is smaller than main memory, so not all blocks are in the cache.
blocks are larger than 1 word (spatial locality)
disk managed by OS
PIPELINING (ILP)
-----------------
concept
-------
A task is broken down into steps.
Assume that there are N steps, each takes the same amount of time.
steps: P -- prep
W -- wash
R -- rinse
D -- dry
X -- wax
assume each step takes 1 time unit
2-stage pipeline
----------------
steps:
F -- instruction fetch
E -- instruction execute (everything else)
5-stage pipeline
----------------
a popular pipelined implementation, that works really well for
teaching about pipelines and also for load/store architectures
steps:
IF -- instruction fetch
D -- instruction decode
OA -- operand access
OP -- ALU operation (can be effective address calculation)
R -- store results
data dependencies
-----------------
control dependencies
--------------------
jmp label1
inc eax
label1: mult ecx
Question for the advanced student: Which is better, (a) or (b)? Why?
squashing
A fancy name for branch prediction that always presumes the
branch will be taken, and keeps a copy of the PC that will
be needed in the case of backing out.
Amdahl's Law
------------
= 1
---------
1-f + f/S
Examples
f S speedup
--- --- -------
95% 1.10 1.094
5% 10 1.047
5% inf 1.052
lim 1
--------- = 1/ 1-f
S --> inf 1-f + f/S
f speedup
--- -------
1% 1.01
2% 1.02
5% 1.05
10% 1.11
20% 1.25
50% 2.00
This says that we should concentrate on the common case!
Chapter 16 -- Architectures
PERSPECTIVE ON ARCHITECTURE DESIGN
----------------------------------
price/performance curve
^ |
perf. | | x
| x Want to be to the "left" of these,
| x to have higher performance for the
| price. "More bang for the buck."
| x x
_________________
price ->
technology -- a perspective
----------
as hardware has gotten (much) faster and cheaper, attention has been
diverted to software.
OS
compilers
optimizers
IPC (inter-process communication)
1 instruction at a time isn't enough. The technology
isn't "keeping up." So, do more than one instruction
at a time:
parallelism
-----------
instruction level (ILP) -- pipelining
superscalar -- more than one instruction at a time
multis
VLIW
supercomputer
INSTRUCTIONS
- a relatively simple set, but NOT a load/store arch.
- a two-address architecture
- most instructions are specified in 16 bits -- fixed size.
- tight encoding, it is difficult to distinguish opcode from operands,
but the m.s. 4 bits are always part of the opcode.
integer arithmetic
different opcode for varying size data
(add.b add.w add.l)
8-bit 16-bit 32-bit
logical
different opcode for varying size data
control instructions
conditional branches, jumps
(condition code mechanism used -- where most instructions
had the effect of setting the condition codes)
procedure mechanisms
call and return instructions
floating point ??
(I guess not!)
decimal string
arithmetic presuming representation of binary coded decimal
REGISTERS
16 32-bit general purpose registers,
only one is not general purpose (it is a stack pointer)
the PC is not part of the general purpose registers
DATA TYPES
byte
word (16 bits)
longword (32 bits)
OPERAND ACCESS
PERFORMANCE
?, they got faster as new technologies got faster.
ORIGINAL SIZE
1 64-pin VLSI chip (a huge number of pins at that time)
The Cray 1 earned this honor, and was the fastest for a relatively long
period of time.
The man who designed the machine, Semour Cray, was a bit of an eccentric,
but he can get away with it because he was so good. The Cray 1 has an
exceptionally "clean" design, and that makes it fast. (This is probably
a bit exaggerated due to my bias -- the Cray 1 is probably my favorite
computer.)
Mostly my opinion:
To make the circuitry as fast as possible, the Cray 1 took 2 paths
1. Physical -- a relatively "time-tested" technology was used, but
much attention was paid to making circuits physically close
(Semour was aware of the limits imposed by the speed of light.)
and the technology was pushed to its limits.
2. Include only what was necessary, but on that, a "don't spare the
horses" philosophy was used.
This means that extra hardware was used (not paying attention to
the cost) wherever it could to make the machine faster. And, at
the same time, any functionality that wasn't necessary (in Semour's
opinion) was left out.
Just remember:
if something seems out of place to you, or some functionality of a
computer that you think is essential and was not included in the Cray 1,
it wasn't necessary! And, leaving something out made the machine
faster.
How it is used:
most often, a separate (not as fast/powerful) computer was hooked up
as what was commonly called a host computer. The host is where you do
all your editing and debugging of programs. The host also maintains a
queue of jobs to be run on the Cray. One by one the jobs are run, so
the only thing that the Cray is doing is running the final jobs -- often
with LOTS of data. Although its operating system would allow it,
the "multi-tasking" (more than 1 program running simultaneously)
ability was not often used.
instruction set
fixed length instructions
either 16 or 32 bit (no variability that depends on
the number of operands)
number of operands possible for an instruction
0-3 (it is a 3-address instruction set)
number and kind of instructions
op codes are 7 bits long -- giving 128 instrucitons
This includes complete integer and floating point
instructions.
REGISTERS:
There are an ENORMOUS number of registers.
There are 5 types of registers.
T registers --
These are 64 64-bit backup registers for the S registers.
If you were to do some heavy programming on the Cray 1,
you'd find these registers very useful. This is partially
because you run out of S registers quickly, so you
need temporary storage, but don't want your program
to store to main memory (slow!). There's also an
instruction that allows you to load a block of memory
to the T registers. That's 1 instruction to do a bunch
of loads.
B registers --
These are backups to the A regs and are used in the
same manner as the T regs.
hardware stack
no support for stack accesses at all! There is no
special stack pointer register.
cache
none. There's so many registers that there isn't
really a need for one.
size of machine
A bit bigger than 2 refrigerators.
speed of machine
Significantly faster than the VAX and 68000.
For a while, it was the fastest machine around.
price of machine
As an analogy to some very pricey restaurants:
If you need to see the prices on the menu, you can't
afford to eat there.
on the VAX
----------
Some details:
REGISTERS
16 32-bit general purpose registers,
except that they really weren't all general purpose
R15 is the PC -- note that the programmer can change the
PC at will!
R14 is a stack pointer
R13 is a frame pointer
R12 is an argument pointer (address of where a procedure's
parameters are stored -- sometimes on the stack,
and sometimes in main memory)
DATA TYPES
byte
word (16 bits)
longword (32 bits)
quadword (64 bits)
octaword (128 bits)
OPERAND ACCESS
for example
addl3 (R3), R4, R2
uses Register Direct addressing mode for the first
operand --
operation
the address of the first operand is in R3,
load the operand at the address, add to the
contents of R4, and place the result into R2
The addressing mode for each operand can (an often is)
be different! NO restrictions (unlike Pentium and Motorola 68000)
MACHINE CODE
same
8-bit opcode 0110 0011 0101 0100 0101 0010
^ ^ ^ ^ ^ ^
| | | | | |
---- | ---------- | --------- | -- mode
| | |
|------------|-----------|--- which register
PERFORMANCE
SIZE
one version: the VAX11 750 was about the size of a large-capacity
washing machine
another version: the VAX11 780 was about the size of 2 refridgerators,
standing side by side
some details:
MIPS architecture
These are details of the MIPS R2000 architecture.
load/store architecture
-----------------------
Registers
---------
Example: $5 is register 5.
There are also 16 64-bit registers used for floating point operands.
or, if one addend was in ebx, and the other was in ecx,
then the code could be
mov eax, ebx
add eax, ecx
sb $10, 0($18)
control instructions
--------------------
jump -- unconditional
A portion of an absolute address is placed into the
instruction.
jal proc4
machine code
------------
bits 31 26 25 21 20 16 15 11 10 0
6-bit 5-bit 5-bit 5-bit
opcode reg reg reg more opcode
desig. desig. desig.
31 26 25 21 20 16 15 0
6-bit 5-bit 5-bit 16-bit immediate
opcode reg reg
desig. desig.
sample code
------------
Some notes:
la is not really a MIPS R2000 instruction. It is an example of
and instruction that is allowed in assembly language code, but
is translated into MIPS R2000 code by the assembler. This makes
the job of writing this assembly language code easier.
Virtual Memory
Problems:
Solution:
Address Space
-------------
Associated with every program (process) is a set of addresses
that it may reference.
|
|
\ /
translation
|
|
\ /
Problems:
-- impossible to share code between 2 programs (while keeping
other code/data private to a program). This is because there
is only 1 segment for each process.
Segmentation
------------
Problems:
each entry also has PRESENT bit (called VALID bit in book,
to draw an analogy to cache's valid bit). Present bit
identifies whether physical page is resident in main memory
or not.
Have one page table for each process. Its base address is kept in
a register in the CPU.
Problems:
Problems:
-- If tables are in memory, can take more than one memory access
to get at a piece of data.
Solution: TLB
Also included,
Present bit
Referenced bit(s) - To help know chose victim if no free pages.
Protection - to facilitate sharing
Dirty bit - done at page level, not cache level.
Process ID - sometimes added so that the TLB does not have
to be flushed at each context switch.
PAGE FAULTS
-----------
What happens if the page is not resident in main memory.