CS 273 Course Notes: Assembly Language Programming With The Atmel AVR Microcontroller
Contents
1 Introduction
1.1 High Level Programming Languages
1.2 So How Does a CPU Work?
1.3 The Fetch-Decode-Execute Loop
5 Addressing Modes
7 Branches
7.1 Branches encode the relation operators
7.2 Signed versus unsigned branches
7.3 Branch addressing
7.4 How Branch Instructions Interact With Compare Instructions
8 Condition Code or Status Register (SREG)
8.1 Calculating the Arithmetic Condition Flags
16 Serial Communications
16.1 Basic Concepts of Serial Communication
16.2 Setting up Serial I/O on the AVR
16.3 ASCII Character Representation
1 Introduction
1.1 High Level Programming Languages
So far your experience programming computers has been with high level programming languages like C,
C++, and Java (and possibly other niche languages like JavaScript, PHP, and Perl). But these languages
are geared (mostly) towards how we think, not how the computer computes. They are independent of what
type of computer you are using (e.g., PC, Mac, or even your phone) and so must be translated into how
the particular computer you are using actually computes. This translation is done by a compiler or an
interpreter (or even sometimes both, like with Java). In this course we are going to learn how computers
really compute!
A computer has (essentially) three components:
[Figure: block diagram of a computer showing the CPU (Processor), Memory, and I/O (Input/Output)]
• a Processor, also called a CPU (central processing unit); this performs computations that are specified
by instructions (the program);
• some Memory, which stores both data and instructions (the program); and
• some Input and Output mechanisms, called I/O (such as keyboards, monitors, printers, disk drives,
networking, etc.).
These components are connected by busses, which are collections of wires that transfer information (bits).
Thus, a computer lets you get information in, perform some computation, and get results out. Getting
results out is the whole point!
AMD does with Intel’s x86 ISA, and why you can buy a PC with a CPU from either company and run the
same programs on it.
Another reason for different ISAs is that a CPU may be specialized for a certain task, like controlling a car
engine, or doing graphics (yes, graphics cards have a CPU), and so will have different machine instructions.
329 = 300 + 20 + 9
The digits 3 and 2 do not represent three and two, but 300 and 20. This is because the position of the digit
indicates a power of 10 by which the digit should be multiplied. This is represented as the formula
$329 = (3 \times 10^2) + (2 \times 10^1) + (9 \times 10^0)$
The positional scheme is much better than, say, the Roman Numeral scheme, where each symbol represented
a unique value no matter where in the number it existed (i.e., M=1000, D=500, C=100, L=50, X=10, V=5,
I=1). For more information you can check out (Wikipedia:Positional notation).
Positional numbering is great, and we will keep using it, but it is the decimal (base 10) part that gives
us problems with computers. We need some other bases! The general scheme for a positional numbering
system in a base B is:
• A hexadecimal number has a leading “0x” attached to the number (e.g., 0x4f2a is the hexadecimal
number for the decimal value 20266);
• An octal number has just a leading “0” attached to it, and then the number itself (e.g., 076 is decimal
62, and 089 is not a number!); and
• Sometimes (not often), a binary number has “0b” in front of it, and then the number (e.g., 0b10110 is
decimal 22).
Usually in Unix/C/C++/Java you have to convert a binary number to something else (like hex), then use
it in your source code. The Arduino environment allows you to use “B” as a prefix in your C/C++ code for
binary numbers, and in your assembly files you can use “0b” as a prefix for binary numbers.
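As a quick check of the prefixes above, here is a small C sketch that writes the example values in different bases. The function names are made up for illustration. The “0b” prefix is left out on purpose, since in C it is a compiler extension (GCC and Clang accept it; it only became standard in C23), so the binary example is rewritten in hex instead.

```c
/* Each function returns one of the example values, written with a different prefix.
   The function names are illustrative only. */
int hex_literal(void)    { return 0x4f2a; }  /* hexadecimal, decimal 20266 */
int octal_literal(void)  { return 076; }     /* octal (leading 0), decimal 62 */
int binary_via_hex(void) { return 0x16; }    /* 0b10110 regrouped as 1 0110, decimal 22 */
```

Notice the trap with octal: a leading 0 on what you meant as a decimal number silently changes its value.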
2.3 Converting Values between Number Systems
Computer Scientists use octal and hexadecimal a lot because these systems easily convert to binary, whereas
decimal does not. The reason for this is that the bases 8 and 16 are exact powers of 2, whereas 10 is not.
Because 8 is $2^3$, each octal digit is exactly 3 binary digits; because 16 is $2^4$, each hex digit is exactly 4 binary
digits. For decimal there is no exact mapping, so you have to do some arithmetic to convert to binary.
For example, octal 046 is binary 100110: the leftmost three bits (100) are the octal digit 4, and the
rightmost three (110) are the 6. Octal 036 is binary 011110: the leftmost three bits change, but the last three
do not, since only the leftmost octal digit changed. This is very nice because when doing conversion we only
need to think about each digit by itself; we do not need to be concerned about the overall value. The table
below shows all of the binary-hex-octal digit mappings for all 4-bit values, zero to fifteen:
Notice that one octal digit exactly captures all unique 3-bit values: when the fourth bit rolls over to one,
the octal value rolls over to a second digit, so one octal digit represents exactly 3 bits. The same is true for
hexadecimal, but with 4 bits. However, decimal notation rolls over from 9 to 10 part way down the second column; this makes
it much harder to convert between decimal and binary, which is why we like to use octal and hex!
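The digit mappings for the sixteen 4-bit values can also be generated by a short program. The following C sketch is one way to do it; the function names and the column layout are illustrative choices, not part of the course materials.

```c
#include <stdio.h>

/* Write the 4-bit binary form of v (0..15) into out, which must hold 5 chars. */
void to_bin4(unsigned v, char out[5]) {
    for (int i = 0; i < 4; i++)
        out[i] = (v & (8u >> i)) ? '1' : '0';   /* test bit 3 first, bit 0 last */
    out[4] = '\0';
}

/* Print one row per value: binary, hex, octal, decimal. */
void print_table(void) {
    char bits[5];
    printf("bin   hex  oct  dec\n");
    for (unsigned v = 0; v < 16; v++) {
        to_bin4(v, bits);
        printf("%s  %X    %02o   %2u\n", bits, v, v, v);
    }
}
```

Calling print_table() shows the octal column needing a second digit at binary 1000, while the hex column stays at one digit all the way to 1111.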
To convert between hex/octal/binary do the following simple things:
• From hex to binary, write the four bits that are equivalent to the hex digit, in exactly the same position
in the number as the hex digit. Always write four bits!
• From octal to binary, write the three bits that are equivalent to the octal digit, in exactly the same
position in the number as the octal digit. Always write three bits!
• From binary to hex, group the bits into fours, starting at the right. Then translate each group into
one hex digit, in the same position as the group of four bits.
• From binary to octal, group the bits into threes, starting at the right. Then translate each group into
one octal digit, in the same position as the group of three bits.
• If on the leftmost side of the binary number there are not enough bits to make a complete group, it is
OK to add 0’s to the left.
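The binary-to-hex rule above (group into fours from the right, pad with 0's on the left) can be sketched in C as follows. This is only an illustration under simple assumptions: the input is a string containing nothing but '0' and '1' characters, and the name bin_to_hex is made up.

```c
#include <string.h>

/* Convert a string of '0'/'1' characters to hex digits.
   out must hold at least (strlen(bits) + 3) / 4 + 1 characters. */
void bin_to_hex(const char *bits, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    size_t len = strlen(bits);
    size_t pad = (4 - len % 4) % 4;      /* 0's to imagine on the left */
    size_t groups = (len + pad) / 4;
    for (size_t g = 0; g < groups; g++) {
        unsigned value = 0;
        for (size_t i = 0; i < 4; i++) {
            size_t pos = g * 4 + i;      /* position in the padded string */
            unsigned bit = (pos >= pad && bits[pos - pad] == '1') ? 1u : 0u;
            value = (value << 1) | bit;
        }
        out[g] = digits[value];          /* one hex digit per group of four */
    }
    out[groups] = '\0';
}
```

For example, "10110" is padded to 0001 0110 and becomes "16". Changing the group size to 3 gives the octal version of the same idea.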
Unfortunately, we think in decimal, so we often need to convert numbers to and from decimal. This takes
a little more work. Converting to decimal is very easy: just multiply each digit by the power of the base for
that position. Examples:
• Octal: $\texttt{0473} = (4 \times 8^2) + (7 \times 8^1) + (3 \times 8^0) = (4 \times 64) + (7 \times 8) + (3 \times 1) = 315_{10}$
• Binary: $\texttt{0b1011} = (1 \times 2^3) + (0 \times 2^2) + (1 \times 2^1) + (1 \times 2^0) = (1 \times 8) + (0 \times 4) + (1 \times 2) + (1 \times 1) = 11_{10}$
• Hex: $\texttt{0xC6A} = (12 \times 16^2) + (6 \times 16^1) + (10 \times 16^0) = (12 \times 256) + (6 \times 16) + (10 \times 1) = 3178_{10}$
Notice that binary to decimal is especially easy, since anything multiplied by 1 is itself and anything multiplied
by 0 is 0. This means we can drop all the terms with a 0 bit, and just use the power of 2 directly for each 1
bit. Thus the example above becomes
$\texttt{0b1011} = 2^3 + 2^1 + 2^0 = 8 + 2 + 1 = 11_{10}$
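Here is a hedged C sketch of converting a digit string in any base from 2 to 16 into its value. Rather than computing each power of the base separately, it uses the equivalent running form: multiply the total so far by the base, then add the next digit. The names digit_value and to_decimal are made up for this example, and the digit string is written without its 0x/0/0b prefix.

```c
#include <assert.h>

/* Value of one digit character, or -1 if it is not a digit in any base up to 16. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Evaluate a digit string (no prefix) in the given base, 2..16. */
unsigned long to_decimal(const char *digits, unsigned base) {
    unsigned long value = 0;
    assert(base >= 2 && base <= 16);
    for (; *digits != '\0'; digits++) {
        int d = digit_value(*digits);
        assert(d >= 0 && (unsigned)d < base);    /* reject digits like 9 in octal */
        value = value * base + (unsigned long)d; /* shift left one position, add digit */
    }
    return value;
}
```

With the three examples above, to_decimal("473", 8), to_decimal("1011", 2), and to_decimal("C6A", 16) give 315, 11, and 3178.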
Converting from decimal to another base is a bit harder: we must repeatedly divide by the base we are
converting to. The remainders from these divisions will be our digits in the converted number (but they are
backwards!). The algorithm is:
1. Divide number by base, get quotient and remainder. This remainder is the rightmost digit.
2. Now divide quotient by base, get new quotient and remainder. This remainder is the second-rightmost digit.
3. Keep dividing each new quotient by the base to get the next rightmost digit. Stop when new quotient
is 0 and remainder is the old quotient; this is the final, leftmost digit of the number in the new base.
(shortcut: when you see a new quotient that is less than the base, this is your final leftmost digit)
Octal example: convert decimal 93 into octal. 93/8 is 11, remainder of 5. We then take the quotient
and again divide by 8. So 11/8 is 1, remainder of 3. We do the same again, so 1/8 is 0, remainder of 1. We
stop since we have a quotient of 0. Now we take those remainders (5,3,1) and reverse them for our octal
number of 0135, which is the conversion of decimal 93 into octal. We can check that by converting back:
$(1 \times 8^2) + (3 \times 8) + 5 = 64 + 24 + 5 = 93_{10}$
Binary example: convert decimal 21 to binary. 21/2 is 10, remainder 1; 10/2 is 5 remainder 0; 5/2 is 2
remainder 1; 2/2 is 1 remainder 0; 1/2 is 0 remainder 1; done. Reverse the remainders to get 0b10101. Check:
$16 + 4 + 1 = 21_{10}$.
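The repeated-division algorithm translates almost directly into code. The sketch below is one possible C version; the name from_decimal and the choice to return the digits as a string (without any prefix) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Convert n to a digit string in the given base (2..16).
   out must hold at least 8 * sizeof(unsigned long) + 1 characters. */
void from_decimal(unsigned long n, unsigned base, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    char reversed[sizeof(unsigned long) * 8];   /* worst case: base 2 */
    size_t count = 0;
    assert(base >= 2 && base <= 16);
    do {
        reversed[count++] = digits[n % base];   /* remainder = next rightmost digit */
        n /= base;                              /* quotient feeds the next division */
    } while (n != 0);                           /* stop when the quotient reaches 0 */
    for (size_t i = 0; i < count; i++)          /* remainders came out backwards */
        out[i] = reversed[count - 1 - i];
    out[count] = '\0';
}
```

Running it on the two examples, from_decimal(93, 8, buf) produces "135" and from_decimal(21, 2, buf) produces "10101".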
Important: We now have TWO number systems that we can use in a computer: an unsigned number
system which uses values from 0 on up, and a signed number system that uses both positive and negative
values (and 0). You can write programs that use either (or both if you are careful). In C/C++ programming
you also have access to both: when you declare a variable as an “int”, you are using the signed number
system, and when you declare it as “unsigned int” you are using the unsigned number system (this is also
true for other sizes of int, such as long, short, char, etc.). In Java there is no data type of “unsigned int”, so
all you can use are signed values.
Most arithmetic-type instructions do not care whether your values are signed or unsigned – the place
where it makes a difference is in your compare-and-branch instructions, and in certain bit manipulations
such as shifts.
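To see why comparisons care about the number system, here is a small C sketch that interprets the same 8-bit pattern both ways. It assumes 8-bit values and two's complement for the signed interpretation; the function names are invented for the example.

```c
#include <stdint.h>

/* Interpret an 8-bit pattern as unsigned: 0..255. */
int as_unsigned(uint8_t bits) { return bits; }

/* Interpret the same pattern as two's complement signed: -128..127. */
int as_signed(uint8_t bits) { return bits >= 0x80 ? (int)bits - 256 : (int)bits; }

/* "Is a less than b?" answered in each number system. */
int less_unsigned(uint8_t a, uint8_t b) { return as_unsigned(a) < as_unsigned(b); }
int less_signed(uint8_t a, uint8_t b)   { return as_signed(a) < as_signed(b); }
```

The pattern 0xFF is 255 as unsigned but -1 as signed, so "0xFF < 0x01" is false in one system and true in the other, even though the bits are identical.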
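[Figure: number circle. At the top of the circle, unsigned 255 meets 0 (overflow across the line) and signed -1 meets 0 (no overflow across the line). At the bottom of the circle, unsigned has no overflow across the line and signed has overflow across the line.]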
As with number lines, addition moves you “right” (clockwise) and subtraction moves you “left” (counter-clockwise).
The top of the circle is where the bit patterns of all 0’s and all 1’s meet; it is marked with their
hexadecimal representations and their decimal meanings in the two number systems. The bottom of the circle
is also marked with the hexadecimal representation of the bit patterns, and their decimal meanings in the
two number systems.
For the unsigned number system, the top of the circle is where the ends of the finite number line meet, and
for signed (two’s complement), the bottom is where the ends meet. The important lesson is that mathematical
operations will freely cross those points! We call this an overflow and it will play an important role later on,
but you should realize that you can indeed perform an operation in your code and end up with an incorrect
result if your operation crosses these lines.
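A small C sketch can make the two crossing points concrete. It assumes 8-bit values (the 255 on the circle suggests this) and two's complement; the function names are hypothetical, and the checks are done by computing the true sum in a wider type, not the way the AVR hardware sets its flags.

```c
#include <stdint.h>

/* 8-bit addition as the hardware does it: the result wraps around the circle. */
uint8_t add8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }

/* Unsigned overflow: the true sum does not fit in 0..255 (crosses the top). */
int add_overflows_unsigned(uint8_t a, uint8_t b) {
    return (unsigned)a + (unsigned)b > 255u;
}

/* Signed overflow: the true sum does not fit in -128..127 (crosses the bottom). */
int add_overflows_signed(uint8_t a, uint8_t b) {
    int sa = a >= 0x80 ? (int)a - 256 : (int)a;   /* two's complement meaning */
    int sb = b >= 0x80 ? (int)b - 256 : (int)b;
    int sum = sa + sb;
    return sum > 127 || sum < -128;
}
```

Adding 1 to 0xFF crosses the top: wrong as unsigned (255 + 1 gives 0) but fine as signed (-1 + 1 = 0). Adding 1 to 0x7F crosses the bottom: fine as unsigned (127 + 1 = 128) but wrong as signed (127 + 1 gives -128).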
3.1 Memory
You probably have at least one computer, and you probably know how much memory it has (maybe 4GB of
RAM). But what does that mean, and what is it used for?
From the section above we learned that all values in a computer are in binary: just lots and lots of 1’s and
0’s. This means that the data looks like “01100010101001000011110010111”, and the machine instructions
look like “0110001110101010111100” as well! Recall that the memory holds both instructions and data. So,
the memory holds a bunch of bits (actually lots and lots of bits!). Your 4GB of RAM holds more than 32
billion bits!
Memory must be organized somehow, in order for us to find the particular data or instruction that we
need. The best way to think about memory is to think of it as one huge 1-dimensional array, with each
location storing a certain number of bits (usually 8, or one byte). In a program we would use an index to
access an array element, but in memory we instead call it an address.
An address is a number that signifies a specific memory location, just like your apartment number
signifies a specific apartment. Generally, each memory location holds one byte (8 bits) of data or instruction;
this is true on your PC. On the AVR (the CPU we use in this course) the data memory has one byte per
address, but the program (instruction memory) has two bytes (one word) per address.
Accessing memory is just like indexing an array in a program:
• an address (i.e., index) is presented to the memory;
• the memory looks up that address and returns the bits that are stored at that location.
This describes reading a value from memory; to write (or store) a value into memory, both the address and
data are presented to the memory, along with a flag indicating a write operation, and the memory stores
the new value in the location at that address.
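The array analogy can be written down literally. This C sketch models a byte-addressed memory as an array with read and write operations. The size (2048 bytes) and the names are assumptions for illustration, and wrapping out-of-range addresses with a modulo is just a way to keep the toy model safe, not a claim about real hardware.

```c
#include <stdint.h>

#define MEM_SIZE 2048              /* hypothetical size chosen for the sketch */
static uint8_t memory[MEM_SIZE];   /* one byte per address, all 0 at start */

/* Read: present an address, get back the byte stored there. */
uint8_t mem_read(uint16_t address) {
    return memory[address % MEM_SIZE];
}

/* Write: present an address and data, and the memory stores the new value. */
void mem_write(uint16_t address, uint8_t data) {
    memory[address % MEM_SIZE] = data;
}
```

A write followed by a read of the same address returns the stored byte, and neighbouring addresses are untouched.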
• store the result in the register back into memory at the variable z.
So, four machine instructions are needed to implement one simple C/Java statement. This style of Load-
Compute-Store operation is very common, and ideally we want to make the “compute” part as long as
possible, so that we avoid using memory too much. On the AVR we have a fairly large number of registers to
use, and so it is often possible to fit an entire iterative loop of operations inside the “compute” step, without
loading from or storing to memory except before the loop begins and after it has completed its last iteration.
We now know that information in memory is retrieved by using addresses to indicate the location. The
compiler typically picks where our variables are stored in memory, but the CPU must also be able to fetch
the instructions from the program, and each time it finishes an instruction it must fetch another one at a
different memory location. How does the CPU know what address to use when fetching an instruction? Well,
the CPU has a very special register called the Program Counter, or just PC. The PC always contains
the address of the next instruction to be executed; when the CPU goes to fetch the next instruction, it uses
the value in the PC as the address at which to read the program memory. As each instruction executes, the PC
is automatically updated so that the next instruction needed is the one that gets fetched.
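To illustrate how the PC drives instruction fetching, here is a toy C model of a fetch/execute loop. Everything in it is invented for the illustration: the three opcodes are not AVR encodings, each instruction occupies exactly one address, and the function simply counts how many instructions were executed.

```c
#include <stdint.h>

/* Made-up opcodes for this toy model; these are not AVR encodings. */
enum { OP_NOP, OP_JUMP, OP_HALT };

struct instr {
    uint8_t  op;       /* one of the opcodes above */
    uint16_t target;   /* destination address, used only by OP_JUMP */
};

/* Run the program from address 0 and return how many instructions executed.
   Stops at OP_HALT, at the end of the program, or after max_steps. */
unsigned run(const struct instr *program, uint16_t length, unsigned max_steps) {
    uint16_t pc = 0;
    unsigned executed = 0;
    while (executed < max_steps && pc < length) {
        struct instr current = program[pc];   /* fetch: the PC is the address */
        pc++;                                 /* PC now holds the next address */
        executed++;
        if (current.op == OP_HALT)
            break;
        if (current.op == OP_JUMP)
            pc = current.target;              /* a jump overwrites the PC */
    }
    return executed;
}
```

For the program NOP, JUMP 3, NOP, HALT, only three instructions execute, because the jump changes the PC and the NOP at address 2 is never fetched.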
This “language” of words and abbreviations for the machine instructions is called assembly language,
and the assembler is the program which reads a program written in assembly language, translates it and
produces the binary machine code. This process is similar to compiling a program in a high level programming
language, but we call it assembling to distinguish it from compiling because:
• each assembly instruction is exactly one machine instruction, whereas high level programming language
statements are translated into many instructions each; and
• the assembly instructions are specific to a type of microprocessor (CPU) (e.g., an AVR assembly
program cannot be assembled to run on a Pentium).
Even though assembling is not the same as compiling, the gcc compiler, which we are using, also includes
an assembler, so we can use it to assemble our programs. In the Arduino environment we will combine our
assembly program with C/C++ code as well, so using gcc on all of the pieces makes sense.
An assembly language also has other nice features:
• It lets us use decimal, octal, and hex numbers (in addition to binary);
• It lets us assign names, or symbols, to certain values to make our program more readable;
• It lets us put comments in our code (This is important, really!).
3.4 An example AVR assembly language function w/ global data (not a complete program)
#
# Global data (val1 and val2) (new change)
#
.data
.comm val1,1
.comm val2,1
#
# Program code (compute function)
#
.text
.global compute
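compute:
lds r18, val2      ; load the byte stored at val2 into r18
ldi r19, 23        ; load the constant 23 into r19
add r18, r19       ; r18 = r18 + r19
sts val1, r18      ; store the sum into val1
ret                ; return to the caller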
The Arduino environment requires a C/C++ file that it calls “PDE” code to contain at least the functions
setup() and loop(); we use these as jumping off points to call our assembly language functions, and the
Arduino environment compiles and links the object code pieces, along with object code from libraries, into
a complete executable file. It also produces a version of the executable file that can be downloaded to the
Arduino board, which is where we really need to execute it.
1
2 #
3 # Global data (val1 and val2) (new change)
4 #
5 .data
6 .comm val1,1
7 .global val1
8 .comm val2,1
9 .global val2
10
11 #
12 # Program code (compute function)
13 #
14 .text
15 .global compute
16 compute:
17 0000 2091 0000 lds r18, val2
18 0004 37E1 ldi r19, 23
19 0006 230F add r18, r19
20 0008 2093 0000 sts val1, r18
21 000c 0895 ret
22
DEFINED SYMBOLS
*COM*:00000001 val1
*COM*:00000001 val2
compute.s:16 .text:00000000 compute
NO UNDEFINED SYMBOLS
This listing shows our program and the resulting machine code and the memory that it takes up. The
first column is the line number of the listing (and of the program). The second column is a relative memory
address in hexadecimal. These relative addresses begin at 0, but that does not mean that when the program
runs it will be at memory address 0! Following the relative address is the machine code that each instruction
results in. Most are 16-bit machine instructions, but some are 32-bit. Finally, the program text line is
displayed. The second page of the listing shows a table of defined symbols, and then a table of undefined
symbols if there are any (in this case, none).
Important: each of these symbols represents a memory address, but we don’t know yet what that address
is! The two instructions that refer to “val1” and “val2” have a second 16-bit value of 0, but this is temporary:
that value will be changed to be equal to the memory address represented by that symbol, when the linker
creates a final executable program. What’s a linker? It is the final step that a compiler or assembler uses to
create an executable program. Its job is to connect all the pieces together (“link”), figure out the actual
addresses used in the pieces, and then put those real addresses into the code wherever they are needed.
• Registers r18–r27 and r30–r31 are freely available to use in functions; the caller cannot expect their
values to remain unchanged across the call. Note that r27:r26 is the X indirect addressing register, and
r31:r30 is the Z indirect addressing register.
• Registers r2–r17 and r28–r29 (Y) can be used but only if they are saved first, and restored to their
original value after use; the caller expects that their values remain unchanged across the call. The Y
register is used as a frame pointer register by C functions (and by us).
• Registers r0 and r1 are never used for local data by the compiler, but may be used for temporary
purposes; r0 is like a freely available register and can be used for temporary values (not preserved
across function calls); r1 is assumed to always hold the value 0; if r1 is used for a temporary value (not
across function calls) it must be cleared when the value is no longer needed.
In this course, ALL assembly code that you write must adhere to these conventions. You will be tempted
to cut corners and skip these rules sometimes, but do not do it! It will save you countless hours of debugging
time if you strictly adhere to these rules.
Typically, your assembly code should use registers r18–r25 for “regular” code that is performing computations.
If you need to call some external function (such as a library delay routine), then you can either
save your values into memory somehow (more on this in a bit), or you can first save the current value of a
register in r2–r17, and then use that register for your own purposes, knowing that the delay function cannot corrupt it.
Whether you are saving your caller’s values in r2–r17 or r28–r29, or you are saving your own values in
r18–r27 or r30–r31 in order to perform a function call, the easiest way to save register values is by using the
push and pop instructions.
Later in the course we will learn about the detailed operation of the stack, but for now we can just say
that there is a special portion of memory that the AVR treats as a last-in-first-out data structure called a
stack, and we can place values on (or into) the stack using the push instruction, and retrieve them with a
pop instruction. Think of a stack of dinner plates in one of those spring-loaded plate columns at restaurant
salad bars; each dinner plate is a value, and you can only put a plate on the top (push) or take a plate off
the top (pop). This means that whatever you push on the stack must be popped off the stack in exactly the
reverse order; and because the stack is also used for other purposes, you must exactly undo everything you
do to the stack (i.e., every push must have a corresponding pop).
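The last-in-first-out behaviour can be tried out with a toy model in C. This is only a sketch of the idea: the capacity and names are made up, and it deliberately ignores how the AVR actually manages its stack in memory.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE 16              /* hypothetical capacity for the sketch */
static uint8_t stack[STACK_SIZE];
static int top = 0;                /* number of values currently on the stack */

/* Put a value on top of the stack. */
void push(uint8_t value) {
    assert(top < STACK_SIZE);
    stack[top++] = value;
}

/* Take the most recently pushed value off the stack. */
uint8_t pop(void) {
    assert(top > 0);
    return stack[--top];
}

/* How many pushes have not yet been undone by a pop. */
int depth(void) { return top; }
```

If you push the values of r18 and then r19, the first pop gives back r19's value and the second gives back r18's, which is why the pops must be written in the reverse order of the pushes. A depth of 0 at the end means every push had its matching pop.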
parameters and arguments actual parameters! I will try to use parameter and argument appropriately, but
will probably slip up occasionally.
Later in the course we will learn the full set of rules for function parameters, but similarly to the register
usage conventions in the last section, we will use the function parameter conventions that the C compiler
uses, and for now the simplified rule set is:
• Arguments are passed in registers r25 down to r8, starting with r25.
• BUT, all arguments are aligned to start in an even-numbered register; this means that 1-byte arguments
will only occupy even numbered registers and will skip odd-numbered registers; e.g., for a function with
three 1-byte parameters, the first (leftmost) will be in r24, the second in r22, and the third in r20.
• Arguments occupy as many registers as needed, but are still little endian; e.g., if parameter 1 is 16
bits, the argument value is in r25:r24, and if it is 32 bits, the argument value is in r25:r24:r23:r22.
• A function return value is passed in r25 down to r18, depending on the size of the return value (a 64-bit
return value would be in r25–r18), but as with parameters, a 1-byte return value is placed in r24, not
r25; also, a 1-byte return value must be zero or sign-extended through r25, so r25 must be either all 0
bits (for an unsigned or positive value in r24) or all 1 bits (for a negative value in r24).
Most of our work in this course will deal with 1-byte arguments, and these are always in even-numbered
registers starting at r24 and decreasing. We will sometimes use 2-byte arguments, and the Arduino library
delay() function takes a 32-bit (4-byte) argument; we just need to use the registers appropriately.
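The simplified argument rules can be turned into a small calculator. The C sketch below follows only the rules stated above (start just past r25, round each argument up to an even number of bytes, stop at r8); the function name is made up, and reporting -1 once the registers run out is a placeholder, since the full rules come later in the course.

```c
/* For each argument size in bytes, compute the lowest-numbered register
   that holds it (its least significant byte, since values are little endian).
   low_reg[i] is set to -1 once the registers r25..r8 are used up. */
void assign_arg_registers(const int *sizes, int count, int *low_reg) {
    int next = 26;                                /* one past r25 */
    for (int i = 0; i < count; i++) {
        int span = sizes[i] + (sizes[i] & 1);     /* round up to an even size */
        if (next >= 8 && next - span >= 8) {
            next -= span;                         /* move down to an even register */
            low_reg[i] = next;
        } else {
            next = 0;                             /* no more register arguments */
            low_reg[i] = -1;
        }
    }
}
```

Three 1-byte arguments land in r24, r22, and r20, matching the example above. A single 4-byte argument, like the one delay() takes, gets a low register of r22, meaning it occupies r25:r24:r23:r22.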
• the AVR has a 16-bit stack pointer, SP, which holds the address of where the stack is in memory; oddly
enough, the SP sits outside of the typical CPU registers, and must be accessed using I/O operations
(at I/O addresses 0x3d and 0x3e, which are memory addresses 0x5d and 0x5e). What’s a stack? We’ll learn in a bit;
• the AVR has a 16-bit program counter, PC, which as said before always contains the address of the
next instruction to execute; thus the PC keeps track of where the program is executing;
• the AVR has an 8-bit status flags register, SREG, which contains 8 individual 1-bit flags (true/false
conditions) that indicate any important conditions that resulted from the previous instruction(s); like
the stack pointer, SREG is accessed in I/O space, at I/O address 0x3f (memory address 0x5f).
All of this information is part of the AVR’s ISA (Instruction Set Architecture), because it is necessary for
understanding what its set of instructions actually does.
4.1 Time
We will need to understand how long our program, or a piece of our program, takes to execute. This is
important for such things as, say, turning the robot for 1/2 of a circle. We would take the speed of the motor
and figure out how long to keep the motor on, then write a program to handle that.
In a digital synchronous system, which is what a CPU is, time is strictly controlled by a clock signal.
During each clock cycle, something happens. On the AVR, most instructions take 1 or 2 clock cycles, with
a few taking 3 or 4 cycles. If you recall the fetch/decode/execute explanation of program execution, it is
hard to imagine that those three steps can all happen in one cycle!
Well, they usually don’t!
The AVR uses the idea of pipelining, which simply means the overlapping of the execution of sequential
instructions. A good analogy is washing clothes: you don’t wait until your first load is finished drying to
start your second load in the washer, you overlap the washing of the second load with the drying of the first
load. That’s pipelining!
Modern complex processors like the ones in your personal computer use very complex pipelining schemes,
but the AVR uses a simple one: the fetch of the next instruction is overlapped with the execution of the
current instruction. This allows most of the simple instructions to take only one cycle of execution time,
even though their total fetch+execute time is two cycles.
Branches cause problems with pipelining because the processor doesn’t know what instruction to fetch
next until the branch decides which way to go! The AVR does a small bit of speculation: it goes ahead and
fetches the next instruction in sequence while the branch instruction is executing; if the branch falls through
then the instruction fetched is ready to execute. If the branch needs to be taken, then the instruction that
was fetched is thrown away and the instruction at the branch target is fetched. This is why in the instruction
table branches are shown as taking either 1 or 2 cycles: they take 1 if they fall through, but take 2 if they
actually branch.
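As a rough sketch of how cycle counts turn into time, the C functions below count cycles for a simple loop and convert them to microseconds. They assume a loop whose body is followed by one conditional branch back to the top, taken on every iteration except the last. The names are invented, and the clock frequency is a parameter because these notes do not state it (16 MHz is a common Arduino clock, but treat that as an assumption).

```c
/* Cycles for a conditional branch: 1 if it falls through, 2 if it is taken. */
unsigned branch_cycles(int taken) { return taken ? 2u : 1u; }

/* Cycles for a loop whose body takes body_cycles and ends with a conditional
   branch back to the top: taken on every iteration but the last. */
unsigned long loop_cycles(unsigned long iterations, unsigned body_cycles) {
    if (iterations == 0)
        return 0;
    return iterations * body_cycles
         + (iterations - 1) * branch_cycles(1)
         + branch_cycles(0);
}

/* Convert a cycle count to microseconds for a clock given in MHz. */
double cycles_to_us(unsigned long cycles, double clock_mhz) {
    return (double)cycles / clock_mhz;
}
```

For example, 10 iterations of a 1-cycle body cost 10 + 9*2 + 1 = 29 cycles; at an assumed 16 MHz clock, 16 cycles take exactly 1 microsecond.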
4.2 Endianness
Although the AVR uses mostly 1-byte data values, it does sometimes operate on multi-byte values, with
addresses being 2-byte values and all machine instructions being 2-byte or 4-byte values. With multi-byte
values, bytes are stored in order, but the important question is: is the first byte the least significant byte
or the most significant byte? All CPUs must choose a byte ordering when dealing with multi-byte values.
The AVR chooses to store the lowermost byte first, and we call this choice little endian.
Other CPUs choose to store the highest byte first, and we call this big endian. Highest byte first seems
natural in one way, because if we look at memory as increasing in address from left to right, the number reads
in the same direction we would write it down. But what if you had a 16-bit result where you knew the upper
byte was zero, and you just wanted to access the lower byte? This actually happens a lot in C programming,
because many of the C library functions return an integer, but actually the result is a single-byte character.
In this case, then, you need to access the memory location at 1 plus the base address of the value on a big
endian CPU, but on a little endian CPU the value is directly at the address.
In computer science, we call this choice endianness. If the most significant byte is first, then that CPU is
“big-endian”, and if the least significant byte is first, then the CPU is “little-endian”. Our AVR processor
is little endian.
Intel/AMD CPUs (and thus the resulting computers) are little endian, while most Motorola CPUs are
big endian. If you sent a 16 or 32 bit number from an Intel-based computer to one with a Motorola CPU
without modification, they would not agree on what that number is!
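The disagreement between the two byte orders is easy to demonstrate. This C sketch stores and loads a 16-bit value in a 2-byte array using each convention; the helper names are made up, and the code works the same on any host because it never relies on the host CPU's own byte order.

```c
#include <stdint.h>

/* Little endian: least significant byte at the lower address. */
void store16_le(uint8_t *mem, uint16_t v) {
    mem[0] = (uint8_t)(v & 0xFF);
    mem[1] = (uint8_t)(v >> 8);
}

/* Big endian: most significant byte at the lower address. */
void store16_be(uint8_t *mem, uint16_t v) {
    mem[0] = (uint8_t)(v >> 8);
    mem[1] = (uint8_t)(v & 0xFF);
}

uint16_t load16_le(const uint8_t *mem) { return (uint16_t)(mem[0] | (mem[1] << 8)); }
uint16_t load16_be(const uint8_t *mem) { return (uint16_t)((mem[0] << 8) | mem[1]); }
```

Storing 0x1234 little endian and reading it back big endian yields 0x3412, which is exactly the misunderstanding described above. Also note that for the 16-bit value 0x0041, the useful low byte is at offset 0 in little endian but at offset 1 in big endian.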
[Figure: block diagram showing Program Memory, the CPU (Processor), Data Memory, and I/O (Input/Output)]
which simply means that the memory for data and the memory for the program are separate, and the
address spaces are separate. Furthermore, each program memory address in the AVR is a word address,
meaning each location holds 16 bits (two bytes), while the data memory is byte addressed, meaning each
location holds just one byte. This means the AVR can have up to 128KB of program memory and 64KB
of data memory. Our particular version (the ATmega328P) has 32KB of flash memory for storing programs,
but only 2KB of data memory (SRAM). The program memory is not accessible by the memory load/store
instructions, only the data memory is. The program memory is only accessed implicitly by fetching the
instructions as we need them. A special program is used when your program is downloaded that takes your
program instructions and flashes them into the program memory, but when your program actually runs, this
memory is read-only.
Recall that the AVR is little endian, meaning that wherever multi-byte values exist, they are always
stored least-significant-byte-first. This is true in program memory, data memory, the 16-bit registers X, Y,
and Z, and any other place where multi-byte values are used.
4.4 Input/Output
Microcontrollers like the AVR have built-in I/O capabilities that can do a variety of I/O tasks: digital input
and output, analog input (built-in analog-to-digital conversion), timers (lots of built-in timer capability),
and serial communications (sending bits to another device, and receiving bits back). The AVR has many
neat I/O capabilities, but it also has some weirdness that we need to deal with.
In designing a processor, the designers must consider how I/O will be accessed. Two main approaches
are: use special I/O instructions that use special I/O addresses to access the various I/O capabilities, or
carve out and reserve a section of the memory addresses for use in I/O capabilities, and then use the generic
memory load/store instructions for I/O purposes. This is called memory-mapped I/O and is possible because
input is similar to reading a value from memory, and output is similar to storing a value to memory, except
that rather than memory it is an I/O device that is interacted with.
Oddly enough, the AVR does both!
The AVR has a legacy I/O system where separate unique I/O addresses are used with special I/O
instructions: in, out, sbi, cbi, sbic, sbis. This I/O capability is still supported, but the I/O capability
has been extended and they ran out of I/O addresses; so now I/O operations are also memory-mapped,
where generic memory instructions can perform I/O operations by accessing special memory addresses. The
weird thing about having both modes is that where they overlap, different addresses are used in the I/O
space and memory space!
I/O addresses are from 0x00 to 0x3F: the bit-oriented instructions (cbi, sbi, sbic, sbis) can only operate
on I/O addresses from 0x00 to 0x1F. The instructions (in, out) can operate on any of them.
Memory-mapped I/O addresses are from 0x0020 to 0x00FF. These can only be used by memory in-
structions (e.g., load and store). The I/O addresses 0x00 to 0x3F correspond to memory addresses 0x0020
to 0x005F: just add 0x20 to the I/O address for the corresponding memory-mapped address. In the AVR
documentation of the I/O capabilities, you can see the I/O address given, and then the corresponding memory
address in parentheses.
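The relationship between the two address spaces is simple enough to write as code. This C sketch uses only the ranges given above; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-mapped address for an I/O address in 0x00..0x3F: just add 0x20. */
uint16_t io_to_mem(uint8_t io_addr) {
    assert(io_addr <= 0x3F);
    return (uint16_t)(io_addr + 0x20);
}

/* The bit-oriented instructions (cbi, sbi, sbic, sbis) only reach 0x00..0x1F. */
int bit_instructions_allowed(uint8_t io_addr) {
    return io_addr <= 0x1F;
}
```

For example, SREG at I/O address 0x3F maps to memory address 0x5F, and the stack pointer bytes at 0x3D and 0x3E map to 0x5D and 0x5E.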
Interestingly enough, the register set r0–r31 can also be accessed by the memory addresses 0x0000 to
0x001F! E.g., the instruction “lds r3, 0x0010” copies the value in register r16 to r3.
So, the data memory map looks like:
Address Range      Purpose
0x0100 - 0x08FF    2K SRAM (general purpose RAM)
0x0020 - 0x00FF    Memory Mapped I/O
0x0000 - 0x001F    Registers R0 - R31
If our AVR had more memory installed, the memory would continue above address 0x08FF, potentially all
the way up to address 0xFFFF.
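As a sketch, the data memory map can be expressed as a function that classifies an address. The ranges come straight from the table for our 2K part; the enum and function names are made up for the example.

```c
#include <stdint.h>

enum region { REGION_REGISTERS, REGION_IO, REGION_SRAM, REGION_UNUSED };

/* Which part of the data memory map does this address fall in? */
enum region classify(uint16_t addr) {
    if (addr <= 0x001F) return REGION_REGISTERS;   /* r0 - r31 */
    if (addr <= 0x00FF) return REGION_IO;          /* memory mapped I/O */
    if (addr <= 0x08FF) return REGION_SRAM;        /* 2K general purpose RAM */
    return REGION_UNUSED;                          /* nothing installed here on our part */
}
```

So address 0x0010 is really register r16, 0x005F is the memory-mapped SREG, and ordinary variables live at 0x0100 and above.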
The AVR manuals contain the full I/O map. Some of the most used I/O addresses are:
Other I/O capabilities that we’ll use are above the I/O address space and only accessible using memory
instructions and addresses. Especially notable are the A/D (analog-to-digital) capabilities.
5 Addressing Modes
A program must get data to and from memory; this is done with load and store instructions. We’ve already
learned that accessing memory requires an address that specifies the location in memory to access. But how
does the address get formed?
To make certain types of programming easier, such as accessing an array of data, the AVR supports
several different addressing modes, or ways of accessing memory locations. These addressing modes are:
Immediate The actual data is located in memory immediately following the instruction’s opcode, or
is embedded in the opcode. This is only useful for a constant; it won’t work for a variable since the
value is embedded in read-only program memory. Thus, immediate addressing is never available on store
instructions, since they write to memory. On the AVR, immediate addressing is available on load (ldi) and
some other instructions (add-to-word, compare, subtract, etc.).
Direct The 16-bit address of the data is located immediately following the opcode. This is used for
accessing single global variables where the linker knows the address of the variable when it is constructing
the executable program file. Since the address is embedded in program memory, it cannot change and so
the instruction always accesses the same memory location, but the location is in data memory and so direct
addressing can be used for both load and store operations. On the AVR the suffix ‘s’ is used to indicate
direct addressing (I have no idea why!), so lds and sts are the load and store instructions that use direct
addressing.
Indirect The value in an index register (X, Y, or Z) is used as the address of the data, potentially in
addition to a constant offset. This addressing mode has many uses, including accessing array elements, local
variables, function arguments, and other uses. It is very versatile and is really what gives computers their
power. In all of computer science, many many problems have been solved by providing a level of indirection
and then manipulating the indirection! By storing the address that will be used by the instruction in a
register, the address can be modified so that the instruction accesses different memory locations each time
it executes (it may be in a loop, or the function it is in may be called multiple times).
On the AVR, X, Y, and Z are the names of 16-bit “registers”, which are really the upper six 8-bit registers
treated as register pairs: r27:r26 is X, r29:r28 is Y, and r31:r30 is Z (recall that the AVR is little endian, so
the lowest numbered register is the lower byte of the 16-bit value). The load and store instructions that use
plain indirect addressing have no suffix and are just ld and st; a form that adds a constant displacement uses
the mnemonics ldd and std. Furthermore, the AVR indirect addressing instructions have auto-postincrement
and auto-predecrement modes where the index register is automatically adjusted by +/- 1 as it is used
to address memory. Note that there is ONLY post-increment and pre-decrement, which means that for
increment mode the register is adjusted by +1 after it is used as an address, and for decrement mode the
register is adjusted by -1 before it is used as an address. There are no pre-increment or post-decrement
modes!
Relative (or PC-Relative) This is used for branches and relative jumps/calls, not for accessing data.
These instructions need a 16-bit address of the location in the program to branch or jump to, but this
address is not stored in the machine code; only an offset is stored that is relative to the address of the
current instruction (contained in the PC). The actual formula is
targetAddress = currentPC + 1 + offset
where “currentPC” means the address of the branch/jump/call instruction itself. The offset is a 2’s
complement offset stored in the instruction’s opcode. This offset is only 7 bits for branch instructions (+63 /
-64 locations), but is 12 bits for the relative jump/call instructions (+2047 / -2048 locations). Recall that
program memory is word addressed, and most instructions are single 16-bit words. This means that typi-
cally, branches can branch forward or backward about 64 instructions, and relative jumps or calls about 2048
instructions. There are a few instructions that take two 16-bit words, so the offset count is not necessarily
equal to the instruction count.
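The fragment below is a small sketch (not taken from the course materials) that uses each of the modes above once. The names table, total, and sumTable are hypothetical, and GNU assembler syntax is assumed.

```asm
        .section .bss
table:  .space 4              ; hypothetical 4-byte array in data memory
total:  .space 1              ; hypothetical 1-byte global variable

        .section .text
sumTable:
        ldi  r26, lo8(table)  ; immediate: low byte of the array address into XL
        ldi  r27, hi8(table)  ; immediate: high byte of the array address into XH
        ldi  r18, 4           ; immediate: loop counter
        clr  r16              ; running sum starts at 0
sumLoop:
        ld   r17, X+          ; indirect, post-increment: load, then X = X + 1
        add  r16, r17
        dec  r18
        brne sumLoop          ; relative: branch back while the counter is not 0
        sts  total, r16       ; direct: store to a fixed data address
        ret
```

Because the address lives in X rather than in the instruction, the single ld instruction reads a different array element on every pass through the loop.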
Every AVR instruction mnemonic has an associated 16-bit machine code value, but since the instruction
has operands (registers, constants, and the like), the operand values also need to be embedded into the 16-bit
machine instructions.
Take the basic ADD instruction, which has two register operands, Rr and Rd. In AVR assembly the
destination is always the leftmost operand, so Rd is the left operand and Rr is the right operand.
The machine code format for ADD (which we find by looking in a machine instruction table) is
0000-11rd-dddd-rrrr
The 0’s and 1’s are fixed value bits that are those values for every ADD instruction (they are, in essence,
the bits that define an ADD instruction). The r’s and d’s are the bits that hold the operand values. If our
program had the instruction “ADD r25, r17” in it, then the r bits would hold the value 17, and the d bits
would hold the value 25. The bits above are all in normal magnitude order, left decreasing to right; even
though the r bits are not contiguous, they still represent one value (in this case, 17).
So, since decimal 25 is binary 11001, and decimal 17 is binary 10001, the instruction “ADD r25, r17” is
exactly the machine code
0000-1111-1001-0001
All we did was write the r bits in and the d bits in, left to right.
Finally, recall that the AVR is a little endian CPU. This means that for multi-byte values, it stores the
bytes by first starting at the least significant byte (i.e., right to left, but this applies to bytes, not bits; within
a byte the 8 bits are still left-to-right).
So to finish up the machine code as it is actually stored in program memory, we should swap the two
bytes, and so our example instruction has the machine code, in memory, as
1001-0001-0000-1111
which we could write in hex as 0x91 0x0F. Note that when referring to a single machine instruction we
will usually just leave it in the normal 16-bit order, but when showing machine code listings or discussing
machine code as it is stored in memory, we will view it in little endian (reverse) byte order.
instruction to the branch target location. A negative offset means the instruction branches/jumps backwards
(forming a loop). The offset is actually from the instruction that follows the branch/jump instruction, so
the formula for calculating the offset is targetAddress − (branchAddress + 1). You can actually use this
formula, even doing the arithmetic in hexadecimal since we usually write addresses in hex, but most students
prefer a simpler counting method. If you are writing a sequence of machine code words in a table or
even just in a list, then:
1. start at the machine code for the branch instruction you are working on (probably blank at the moment,
since you haven’t filled it in);
2. find the machine code for the target instruction (a label in a program always refers to the next instruc-
tion, either on the same line as the label or on the next line if the label is by itself); if the branch is a
forward branch you should leave space for the branch instruction, work out the machine code forward
through the target instruction, then go back to the branch and do these steps;
3. go forward from the branch instruction one program word (this is the “+1”);
4. now count off program words until you reach the target instruction (this is forward if the target is
forward, and backwards if the target is backwards); you must reach the target instruction itself, so
in going backwards you will count the target instruction opcode itself, to reach the address where it
begins, not where it ends;
5. convert your count to binary, putting 0 bits in front to make it as wide as your operand field (7 or 12
bits); if the branch was a forward branch, you are done and this is the offset to put in your opcode
field; if the branch is backwards, negate the count by performing the 2’s complement operation on it,
and put this in the opcode field.
You can do the above on the assembly program listing as well, but you must be careful to know how many
program words each instruction results in, because the offset is counting machine code program words, not
assembly language instructions. Most instructions result in just one program word, but a few produce two
(the most common for us are the LDS and STS instructions), and for those you must count them twice if
you are counting your offset over the assembly program instructions. In the assembly code, start at the
branch, go forward to the next instruction, then count off to the target instruction (counting two for any
two-program-word instruction). You will get the same offset as the machine code process above.
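Here is a worked sketch of both methods on a short loop. The starting address 0x0040 and the data address 0x0100 are assumptions chosen for the illustration, and the BRNE bit pattern quoted in the last comment comes from the AVR instruction set table rather than from these notes.

```asm
; Program-word addresses are shown in the comments (code assumed to start at 0x0040).
        ldi  r16, 10        ; 0x0040          1 word
loop:   lds  r17, 0x0100    ; 0x0041-0x0042   2 words
        inc  r17            ; 0x0043          1 word
        sts  0x0100, r17    ; 0x0044-0x0045   2 words
        dec  r16            ; 0x0046          1 word
        brne loop           ; 0x0047          1 word   (the branch)
        ret                 ; 0x0048          1 word   (the "+1" lands here)

; By formula:  0x0041 - (0x0047 + 1) = -7
; By counting: from 0x0048 back to 0x0041 is 7 words (lds and sts count twice)
; 7 in 7 bits is 0000111; its 2's complement is 1111001, which is the offset field
; With the BRNE pattern 1111-01kk-kkkk-k001 the word is 1111-0111-1100-1001 (0xF7C9)
```

Counting instructions instead of words would give 5 here, which is why the two-word LDS and STS instructions must be counted twice.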
The direct addressing instructions, such as LDS and STS, require a full 16-bit address as their second
program word. This is the data memory address that the instruction will access (read or write). In assign-
ments or exam questions you will be told the starting address of where global variables are allocated from,
and then according to the data section of the program (which reserves memory locations for global variables)
you can allocate and figure out the actual addresses for each global variable, and then translate from the
symbol name to the actual address. This address is also stored little-endian, so the lower byte is stored first
and the upper byte is stored last, just as with other machine code words.
1. write the program word in binary, flipping the bytes back so that the most significant byte is leftmost;
2. find the machine code pattern in the opcode table that matches the actual machine code; all of the
constant 1’s and 0’s in the pattern must exactly match the given program word; there will be only one
exact match;
3. the pattern shows two things: what instruction this program word is, and what the operand fields are;
write out each operand field separately, using its bits from the program word;
4. translate the operand fields from binary into something meaningful in the program, and write it in the
reconstructed assembly program.
For example, suppose we had the program word 0x5B 0xE2, shown as stored in program memory. Since
the stored word is little endian, our actual program word is 0xE2 0x5B, or binary 11100010 01011011. Looking
this up in our opcode table (the one ordered by opcode value), we find that this matches the LDI pattern of
1110-KKKK-dddd-KKKK. So we know this is a load immediate instruction. The constant K field is 8 bits,
and pulling those bits out they are 00101011 (the hex digits 2 and B). In either unsigned or in 2C this is the
decimal value 43. The Rd field is only 4 bits, so our rule about the 5th bit being assumed to be a 1 comes in
to play; the field itself is 0101, so our register number is 10101, or decimal 21. So the final instruction that
this program word represents is LDI r21, 43.
7 Branches
In high-level programming languages, we usually don’t have to think about how our code branches – it
is implicit in the control structures and the curly braces. For example, we know implicitly that if an if
condition is false our program will branch to the else clause, and that at the closing brace of a loop body
our program will branch back up to the loop condition at the top.
In assembly language, however, these branches are not implicit, but instead they are very explicit – you
have to pick them and place them in the right spots.
The two equality checks (equal to and not equal to) are the same for signed and unsigned: BREQ and
BRNE. But for the others, it is very important for you to pick the right branch depending on your data.
The wrong branch simply will not work!
Remember that AVR program memory is word-addressed, so a 7-bit 2C offset allows branches to branch
backwards or forwards approximately 64 instructions, though a few instructions take 2 words to store and
so could affect this rough estimate.
There is also a “relative jump” instruction, RJMP, which has a 12-bit offset and uses the same formula
as the branch instructions. This can branch backwards or forwards approximately 2048 addresses.
• Bit 1: the Z flag (Zero): The result was exactly zero. NOTE: the Z bit is 1 when “zero” is true!
• Bit 2: the N flag (Negative): For 2C numbers, the result was negative. For unsigned, this flag is
meaningless.
• Bit 3: the V flag (Overflow): a 2C result was too big. Analogous to the C flag, but for signed numbers.
• Bit 4: the S flag (Signed): a precomputed result of (N XOR V). Used by signed conditional branches.
• Bit 5: the H flag (Half Carry): A carry occurred from bit 3 to bit 4 (i.e., half the byte). This is useful
only when dealing with BCD numbers, something we do not discuss in this course.
• Bit 6: the T flag (Bit Store): A temporary storage place for one bit.
• Bit 7: the I flag (Interrupt): Set to 1 to enable (allow) interrupts to happen. We will talk about
interrupts later.
We will mostly use the first five flags. They are the flags signalling results of normal arithmetic and com-
parison operations.
The “official” calculation performed by the CPU to calculate the V bit is: if the last two carries are the
same, V=0. If they are different, V=1. Recall that we number bits from the right starting at 0, so the most
significant bit in a byte, the eighth bit, is termed bit 7. So if the carry from bit 6 to bit 7 and the carry from
bit 7 out are different, then V=1; otherwise they are the same and V=0.
Finally the S bit is simply the XOR (Exclusive OR) of the N and V bits that you computed. XOR can
be seen as a “difference” operator, so S=1 if N and V are different, otherwise S=0.
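Two worked additions may make the V and S rules concrete. This is only an illustrative sketch; the register choices and operand values are arbitrary, and the flag values in the comments were worked out by hand from the rules above.

```asm
        ldi  r16, 0x70      ; 0111 0000 = +112
        ldi  r17, 0x20      ; 0010 0000 =  +32
        add  r16, r17       ; r16 = 1001 0000 = 0x90
        ; carry from bit 6 into bit 7 = 1, carry out of bit 7 = 0: different, so V = 1
        ; N = 1 (bit 7 of the result), S = N XOR V = 0, Z = 0, C = 0

        ldi  r18, 0x90      ; 1001 0000 = -112
        ldi  r19, 0x90      ; 1001 0000 = -112
        add  r18, r19       ; r18 = 0010 0000 = 0x20
        ; carry from bit 6 into bit 7 = 0, carry out of bit 7 = 1: different, so V = 1
        ; N = 0, S = N XOR V = 1, Z = 0, C = 1
```

In both cases the 8-bit result has the wrong sign (V = 1), but S still reports the sign of the true answer: +144 is positive (S = 0) and -224 is negative (S = 1).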
9 Manipulating and Testing Bits
In high level programming languages you have used the logical operators: AND, OR, and NOT. In C/C++/Java,
these logical operators are represented as &&, ||, and !. Since each bit is essentially a boolean value, these
operators can be applied as bit operators, too! In fact even the high level languages allow you to do this,
with the operators &, |, ~, and ^ for XOR. The AVR, like most processors, has specific instructions to
perform these bitwise operations:
Instruction Description
AND bitwise and of two registers
ANDI bitwise and of register and immediate value
OR bitwise or of two registers
ORI bitwise or of register and immediate value
EOR bitwise exclusive or of two registers
COM bitwise not of one register (complement)
NEG 2’s complement negation of a register
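The short sequence below sketches how these instructions are typically used with bit masks. The starting value and the masks are arbitrary examples, and the binary results in the comments were worked out by hand.

```asm
        ldi  r16, 0b10110100
        andi r16, 0b00001111   ; keep only the low 4 bits: r16 = 0b00000100
        ori  r16, 0b10000000   ; force bit 7 to 1:         r16 = 0b10000100
        ldi  r17, 0b00000101
        eor  r16, r17          ; toggle bits 0 and 2:      r16 = 0b10000001
        com  r16               ; invert every bit:         r16 = 0b01111110
        neg  r16               ; 2's complement negate:    r16 = 0b10000010
```

The pattern to remember is: AND with 0 bits clears, OR with 1 bits sets, and EOR with 1 bits toggles; bits not named in the mask are left alone.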
• SBRS: Skip if Bit in Register is Set: skips the next instruction if the specified bit in the specified
register is 1.
• SBIC: Skip if Bit in I/O Port is Clear: skips the next instruction if the specified bit in the specified
I/O address is 0.
• SBIS: Skip if Bit in I/O Port is Set: skips the next instruction if the specified bit in the specified I/O
address is 1.
These instructions cannot be made to branch anywhere; they can only be used to skip the instruction
following the bit test instruction. Of course, that instruction can be an RJMP that can jump anywhere!
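A sketch of the skip-plus-RJMP idiom, assuming r16 holds a signed byte and r17 is being used as a counter (both choices are arbitrary for this example):

```asm
        sbrs r16, 7         ; bit 7 = 1 (negative)? then skip the rjmp
        rjmp notNegative    ; bit 7 = 0: jump around the two-instruction block
        neg  r16            ; make the value positive
        inc  r17            ; count how many values were negated
notNegative:
        nop                 ; the program continues here
```

The skip instruction only ever steps over the rjmp; it is the rjmp that makes a multi-instruction "if" body possible.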
10.2 The Stack Frame Pointer
If some of the arguments are passed on the stack, or if some local variables are created, then the procedure
must have some mechanism for accessing those. The typical way to do this on most processors, and on
the AVR, is to copy the SP register into an index register, and then use indexed addressing to access the
necessary items.
On the AVR, the convention is to use the Y index register (r29:r28) for this purpose. We call the register
used to hold a copy of SP the frame pointer, since it points to the stack frame (more formally called the
activation record) of the procedure. The normal way to copy the SP to the Y index register is simple:
in r28, 0x3d
in r29, 0x3e
Note that you could also use the “LDS” instruction, but in that case you would have to use the equivalent
memory addresses of 0x5d and 0x5e.
Once the SP is copied to the Y, then the key realization when thinking about how a procedure accesses
things in its activation record is that it does not pop them off the stack but simply accesses them in place,
where they are within the stack!
We can do this with the special indirect addressing instruction ldd, which is “indirect with displacement”.
This allows a constant unsigned offset to be applied to the address in the index register. We need to simply
count out how many memory locations away from Y are the arguments or local variables that we want to
access, and then use “Y+q” where q is that constant offset.
For example, if I had a function with 11 1-byte arguments, the first nine would be passed in the even-
numbered registers from r24 down to r8. The last two would be passed on the stack. If I am going to use
the Y register then I have to save it first, so the beginning of my function would look like:
push r28
push r29
in r28, 0x3d
in r29, 0x3e
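As a sketch of how the rest of such a function might reach its two stack arguments, the fragment below assumes a 2-byte return address (true for AVRs with up to 128K of program memory) and that the caller pushed both arguments just before the call. The function name addStackArgs is hypothetical, and which of the two arguments sits at which offset depends on the caller's push order.

```asm
addStackArgs:
        push r28            ; save the caller's Y (low byte)
        push r29            ; save the caller's Y (high byte)
        in   r28, 0x3d      ; Y = SP (low byte)
        in   r29, 0x3e      ; Y = SP (high byte)
        ; SP points at the next free byte, so counting upward from Y:
        ;   Y+1 = saved r29, Y+2 = saved r28
        ;   Y+3, Y+4 = return address pushed by the call
        ldd  r18, Y+5       ; stack argument nearest the return address
        ldd  r19, Y+6       ; the other stack argument
        add  r18, r19
        mov  r24, r18       ; leave the 1-byte result in r24
        pop  r29            ; restore the caller's Y
        pop  r28
        ret
```

Nothing is popped to read the arguments; ldd simply looks at them in place, and the only pops are the ones that undo the function's own pushes.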
used as arguments, because all the procedure gets is a copy of the current value, and doesn’t know where it
came from. The procedure can use its parameter as a “local variable”, and can assign to it and change it.
But this only changes the copy on the stack, and not anywhere else.
Call-by-reference, as its name implies, is not so safe. With this mechanism, the procedure actually gets
a reference to the variable that is being used as an argument in the procedure call. Now the procedure can
actually change the variable outside of itself. To do this, we need to pass the address of the variable to the
procedure, not its value! Of course, addresses on the AVR are 16 bits, so call by reference will always use
2 bytes – either 2 registers or 2 bytes on the stack.
To access the variable in the procedure itself, we need to copy the address into an index register (usually
X or Z), and then use indexed addressing with that address to access the variable.
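A minimal sketch of call-by-reference, assuming the address is the first argument and so arrives in r25:r24. The names counter, increment, and caller are hypothetical.

```asm
        .section .bss
counter: .space 1             ; the variable that will be passed by reference

        .section .text
increment:
        movw r26, r24         ; X (r27:r26) = address passed in r25:r24
        ld   r18, X           ; fetch the caller's variable through the address
        inc  r18
        st   X, r18           ; write the new value back to the same location
        ret

caller:
        ldi  r24, lo8(counter)  ; pass the ADDRESS of counter, not its value
        ldi  r25, hi8(counter)
        rcall increment         ; on return, counter itself has changed
        ret
```

With call-by-value the procedure would have received only a copy of counter in r24, and the st instruction would have had nothing outside the procedure to write to.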
in r24, 0x3d
in r25, 0x3e
sbiw r24, 4
out 0x3d, r24
out 0x3e, r25
This grabs the current SP, subtracts 4 from it, and stores it back. But that’s pretty difficult! We could also
do:
push r0
push r0
push r0
push r0
Here we don’t really care what value is in r0, we’re just using it to complete the push instruction; 4 pushes
create 4 bytes of space. This is good for a few bytes of local variable space; obviously if we needed 50 bytes
of local variables, the subtract method would be better. A really weird mechanism that the compiler will
generate to allocate even numbers of variable bytes is:
rcall .+1
rcall .+1
The above also allocates 4 bytes of local variable space – how? Well remember that an RCALL pushes the
return address on the stack, which is 2 bytes. The “.” in the operand means “current instruction address”,
and so each of the instructions is “calling” the next instruction, which has no effect! Well, none except
pushing 2 bytes on the stack!
Note that the “indirect plus displacement” addressing modes on the Y register (in the manual as ldd )
only allow an unsigned displacement. This means that transferring the SP to the Y register must be done
after local variable space is created, so that the offsets are positive.
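Putting the pieces together, here is a sketch of a function with two 1-byte local variables. The function name and the values stored are arbitrary; the point is the ordering of the steps.

```asm
withLocals:
        push r28            ; save the caller's Y
        push r29
        push r0             ; allocate one local byte (the value pushed is irrelevant)
        push r0             ; allocate a second local byte
        in   r28, 0x3d      ; copy SP to Y only now, after the space exists
        in   r29, 0x3e
        ldi  r18, 5
        std  Y+1, r18       ; first local = 5
        ldi  r18, 7
        std  Y+2, r18       ; second local = 7
        ldd  r24, Y+1
        ldd  r18, Y+2
        add  r24, r18       ; return 5 + 7 = 12 in r24
        pop  r0             ; release the two local bytes
        pop  r0
        pop  r29            ; restore the caller's Y
        pop  r28
        ret
```

Had Y been loaded before the two pushes of r0, the locals would sit below Y and would need negative displacements, which ldd and std cannot express.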
I/O Address   Memory Address   Designation
0x03          0x0023           PINB - input, port B
0x04          0x0024           DDRB - data direction, port B
0x05          0x0025           PORTB - output, port B
0x06          0x0026           PINC - input, port C
0x07          0x0027           DDRC - data direction, port C
0x08          0x0028           PORTC - output, port C
0x09          0x0029           PIND - input, port D
0x0A          0x002A           DDRD - data direction, port D
0x0B          0x002B           PORTD - output, port D
A further capability that these ports have is that each pin in each port has an individually selectable
internal pull-up resistor that can be enabled in input mode. This is very useful when reading devices like a
press-switch that is normally not pressed, but when it is pressed the program needs to detect it. An external
circuit only needs to connect the switch to ground (0 volts), like so:
[Figure: the digital I/O pin is tied to +5V through the internal pull-up resistor and to GND through the switch.]
When the switch is not pressed, with no pull-up resistor the pin is essentially unconnected to any circuit
and would thus have an undetermined value (the manual calls this a “tri-state” mode), but with an internal
pull-up resistor enabled, the pin will read as a 1 reliably. Then when the switch is pressed and a connection
to ground is made, the pin will read 0. To set a port pin with the pull-up resistor enabled, you need
to write a 0 to the appropriate DDRx bit (this sets the pin in input mode) and then write a 1 to the
corresponding PORTx bit (this enables the pull-up resistor). As usual, you read the current input value
using the corresponding PINx bit. An input value of 1 will mean “nothing happening”, while an input value
of 0 will mean “my switch is pressed!”. Section 13.2.3 in the AVR Technical Reference, and Table 13-1 in
that section, provide more detail.
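The steps above can be sketched as follows for a switch on Port C, pin 3. The pin choice and the symbol name SWITCH are assumptions for this example; the I/O addresses are the Port C entries from the table of I/O addresses.

```asm
.equ PINC,   0x06             ; input register, port C
.equ DDRC,   0x07             ; data direction register, port C
.equ PORTC,  0x08             ; output register, port C
.equ SWITCH, 3                ; hypothetical: the switch is wired to pin 3

        cbi  DDRC, SWITCH     ; 0 in the DDRC bit: the pin is an input
        sbi  PORTC, SWITCH    ; 1 in the PORTC bit: the pull-up is enabled

        in   r16, PINC        ; read all eight input pins of port C
        andi r16, (1 << SWITCH) ; isolate our pin: r16 = 0 (Z = 1) when pressed
```

Note that the read goes through PINC, not PORTC; reading PORTC would only return the 1 that was written to enable the pull-up.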
input bit rather than saving it somewhere, and so the AVR does provide instructions for testing individual
I/O bits. The instructions that do this are:
• sbis ioport,bit - this stands for “skip if bit set” and it reads the bit position at the I/O address given
and if the bit is a 1 it skips the next instruction in your program.
• sbic ioport,bit - this stands for “skip if bit clear” and it reads the bit position at the I/O address given
and if the bit is a 0 it skips the next instruction in your program.
Notice that these are basically “compare and branch” instructions, but they cannot branch to any arbitrary
label like normal branch instructions can. All they can do is skip the next instruction or not. If the action
you need to perform is only one instruction then you can put the instruction right there; but what if you need
more? The answer is: put an “rjmp” instruction there and use it to branch. For example, if you wanted to
do something if a switch at PORTC, pin 3, is pressed (which reads a 0 value), then you could do something
like:
...
sbic PINC,3 ; input pins are read through PINC, not PORTC
rjmp notpressed
;
; code here that handles the switch press
;
notpressed:
; continuation of program here
If the switch is pressed then the pin reads a 0 bit and the sbic instruction skips the jump and executes the
code that handles the switch press; otherwise the jump instruction branches around the code and continues
on in the program. The example above is only the if-pressed case; if you needed some alternative action for
the not-pressed case, you would have to have an if-then-else structure to handle both cases.
• 0x0078: ADCL: this contains the low byte result;
• 0x0079: ADCH: this contains the high byte result;
• 0x007A: ADCSRA: this contains most of the configuration bits;
• 0x007B: ADCSRB: this contains some configuration bits;
• 0x007C: ADMUX: this contains input selection bits and some other configuration flags;
• 0x007E: DIDR0: this contains flags that turn off digital input;
Recall that the AVR produces a 10-bit result, so at least 2 bits must be stored in a second byte; but which
two bits these are is configurable! The AVR can be configured so that either the upper two bits or the lower
two bits of the result are in the second byte. We will configure it so that the lower two bits are in a second
byte, and then we will just ignore them and use the upper 8 bits as our 1-byte value. Essentially we are
rounding off the 10-bit value into an 8-bit value.
Below we treat each of these I/O registers in the most practical sequence, which is not the same as the
address order above. The tables show the 8 bits in each I/O register, their names, and then below the tables
are the bits’ meanings and uses.
ADCSRA: A/D Control and Status Register A, 0x007A
7     6     5      4     3     2      1      0
ADEN  ADSC  ADATE  ADIF  ADIE  ADPS2  ADPS1  ADPS0
• ADEN is A/D Enable: writing a 1 in this flag turns the A/D system on (0 turns it off);
• ADSC is A/D Start Conversion: writing a 1 starts an input reading and conversion; when conversion
is complete, this bit will read back a 0;
• ADATE is A/D Auto Trigger Enable: this allows an event to automatically start a conversion; we will
not use this;
• ADIF is A/D Interrupt Flag (read only): this is set to 1 when a conversion is complete;
• ADIE is A/D Interrupt Enable: setting this to 1 enables the A/D interrupt; we will not use this for
now;
• ADPS2-ADPS0 is a 3-bit A/D clock PreScaler: Chapter 23 of the AVR tech manual says that A/D
works best with an A/D clock of 50KHz to 200KHz; Table 23-5 shows how these three bits set a value
to divide the system clock by; since our system clock is 16MHz, we should set these bits to binary 111
to get a divisor of 128.
ADMUX: A/D Multiplexer Selection Register, 0x007C
7      6      5      4      3     2     1     0
REFS1  REFS0  ADLAR  empty  MUX3  MUX2  MUX1  MUX0
• REFS1-0 is a 2-bit Reference voltage selector; Table 23-3 shows the options, but we need this to be
binary 01 to select the +5V board voltage;
• ADLAR is A/D Left Adjust Result: setting this to 1 causes the high 8 bits of the result to be in ADCH
and the lowest 2 in ADCL; setting it to 0 gives the opposite (lowest 8 in ADCL, highest 2 in ADCH);
you should set this to 1 and then ignore the lowest 2 bits; just use ADCH as your 8-bit value;
• MUX3-MUX0 is a 4-bit input selector: binary 0000 to 0111 select the 8 different input pins of Port
C; 1000 selects the internal CPU temperature sensor, 1110 selects the internal CPU voltage, and 1111
selects Ground.
DIDR0: Digital Input Disable Register, 0x007E
7 6 5 4 3 2 1 0
empty empty ADC5D ADC4D ADC3D ADC2D ADC1D ADC0D
• ADC[0-5]D: when set to 1, the digital input buffers are disabled, which saves power; this is for the pins
0-5 of Port C, which are also used for A/D input; pins 6 and 7 do not have digital input buffers; we
should set these to 1 to turn the buffers off.
ADCSRB: A/D Control and Status Register B, 0x007B
7      6     5      4      3      2      1      0
empty  ACME  empty  empty  empty  ADTS2  ADTS1  ADTS0
• ACME is A/D Comparator Multiplexer Enable: we will not use this;
• ADTS2-ADTS0 is a 3-bit A/D Auto-Trigger Source: we will not use this;
From the above, we can make A/D programming fairly easy. Firstly, we only need to use ADCSRA,
ADMUX, and DIDR0 for configuration and control. Secondly, if we set ADLAR to 1 then we only need
ADCH for a 1-byte data value. Pretty easy! Programming A/D comes in two parts: one part is the initial
setup that only needs to be done once, at the beginning of your program. The other part is the things you need
to do each time you want to read a new input value.
Initialization :
• In DIDR0, initialize ADC[0-5]D all to 1 to turn off digital input buffers;
• In ADMUX, initialize REFS1-REFS0 to 01 (select +5V as reference), ADLAR to 1 (left adjust output
bits), and MUX3-0 to 0000 (we’ll change the MUX bits later on);
• In ADCSRA, initialize ADEN to 1 (turn on A/D), ADPS2-0 to 111 (divide system clock by 128), and
the rest to 0 bits;
Reading a value :
• In ADMUX, set MUX2-0 to the desired input pin;
• In ADCSRA, set ADSC to 1 to start a conversion;
• In a loop, read ADSC until it reads 0;
• Fetch result byte from ADCH; ignore the lowest 2 bits in ADCL
Once you have your byte value, it’s up to you to make sense of it. All the AVR knows is that it read
some A/D input channel, created a digital value, and gave it to you in a byte. What that “means” in your
application program is up to you.
Also note that you must redo all of the non-initialization steps each time you want to read a new value.
If you simply load from ADCH, you will just get whatever value was in there before. To get an actual new
sensor reading you must do all four steps for reading a value each time.
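The initialization and reading steps above can be sketched as two small functions. The names adcInit and adcRead are hypothetical, as is the choice to pass the channel number in r24; the register addresses and bit positions are the ones listed in this section. Memory instructions (lds/sts) are used because these registers lie above the I/O address space.

```asm
.equ ADCH,   0x0079
.equ ADCSRA, 0x007A
.equ ADMUX,  0x007C
.equ DIDR0,  0x007E

adcInit:
        ldi  r18, 0b00111111   ; ADC5D..ADC0D = 1: digital input buffers off
        sts  DIDR0, r18
        ldi  r18, 0b01100000   ; REFS1-0 = 01, ADLAR = 1, MUX3-0 = 0000
        sts  ADMUX, r18
        ldi  r18, 0b10000111   ; ADEN = 1, ADPS2-0 = 111, all other bits 0
        sts  ADCSRA, r18
        ret

; On entry r24 = input channel (0-7); on return r24 = 8-bit result.
adcRead:
        andi r24, 0b00000111   ; keep only the channel bits
        ori  r24, 0b01100000   ; keep REFS1-0 = 01 and ADLAR = 1
        sts  ADMUX, r24        ; select the input pin
        lds  r18, ADCSRA
        ori  r18, 0b01000000   ; ADSC = 1: start a conversion
        sts  ADCSRA, r18
adcWait:
        lds  r18, ADCSRA
        sbrc r18, 6            ; ADSC reads 0 when done: then skip the rjmp
        rjmp adcWait
        lds  r24, ADCH         ; upper 8 bits of the 10-bit result
        ret
```

A program would call adcInit once at startup and then call adcRead every time it wants a fresh reading.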
[Figure: circuit with the photoresistor and a regular resistor in series from +5V.]
We can sense the voltage drop on an A/D input line, and thus obtain a value representing the amount of
light falling on the photoresistor. The A/D pin itself does not complete a circuit to Ground and thus does
not draw much current, so we need a regular resistor in series with the photoresistor to complete a circuit
from our +5V power source to Ground. The voltage drop across both resistors is always 5V (+5V to 0V),
but the voltage drop to the A/D pin will vary depending on how much light is falling on the photoresistor
(which will cause the current draw to vary since the voltage drop is constant). A good regular resistor size
to choose is a 1K resistor, since this is on the order of the resistance range of the photoresistor.
• sends multiple 8-bit bytes, each bit as with the address sending;
• sends a stop bit: like the start bit but a low-to-high transition.
Bytes are sent highest-bit-first; in other words, I2C is big-endian, the opposite of the AVR processor! Oh
well!
After each byte (including the address), the slave will send an acknowledgement bit back indicating
success, but the master must still generate the clock period for the ACK bit. So each byte actually needs
nine clock periods generated. Our program will ignore the ACK (slightly unsafe!), but we still need the clock
period.
• Send control byte 0x01 (indicating writing to the config register);
• Send byte 0x60 (configure for full 12 bits of resolution). Remember that each of the
three bytes needs an ACK clock period, too.
• Send a stop bit;
• Send a start (yes, this needs to be a second message);
• Send address byte 0x92 (7-bit address + 0 bit for write mode);
• Send control byte 0x00, this tells the thermometer to send the temperature data when we do our reads;
• Send the stop bit.
Those two messages configure the thermometer chip to be read; we only need to do this once, then we can
read as many times as we want.
To read the temperature, we need to generate the clock periods, start and stop bits, and I2C address as
with our writes, but input data rather than output it. This is how:
• Send a start bit;
• Send the address byte 0x93, NOTE: 7-bit address plus a 1 bit to indicate a read; don’t forget ACK;
• Read a byte: generate each bit’s clock period and while clock is high, do a bit read; this is somewhat
tricky; below this list is a function readI2CByte that you can call and it will leave the byte value read in
r24.
• Read another byte; the thermometer responds with two data bytes; so call readI2CByte again (after
saving r24 somewhere!);
• Send a stop bit.
The first byte that you get back is the signed integer number of degrees in Celsius. The thermometer has
a range of -55 to +128. The second byte that you get is the unsigned fractional part to add to the integer
part, in the count of 256ths. For example, if our first byte was binary 00010110, this is 16+4+2 = 22 degrees
Celsius, and if the second byte is binary 11000000, this is 128+64 = 192 256ths more, or 192/256 = 0.75;
this means you would add 0.75 to 22 and the temperature reading is 22.75 degrees Celsius. Note that if the
first byte is a negative number you still would add the positive fractional part; if for example the first byte
was -13 and the second byte was as above, the reading would be -12.25 degrees Celsius.
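Because the fraction is always added, rounding a reading to the nearest whole degree is simple. The sketch below assumes the first byte is in r24 and the second byte in r25; the function name roundTemp and that register assignment are hypothetical.

```asm
; On entry: r24 = first byte (signed degrees), r25 = second byte (256ths).
; On return: r24 = temperature rounded to the nearest whole degree.
roundTemp:
        sbrc r25, 7         ; bit 7 = 0 means the fraction is below 128/256 = 0.5: skip
        inc  r24            ; fraction is 0.5 or more: round up by one degree
        ret
```

With the examples above, 22 and 11000000 give 23, and -13 and 11000000 give -12, matching 22.75 and -12.25 rounded to the nearest degree. The sketch does not guard against a first byte of +127, where the increment would overflow.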
To read a data byte from the thermometer, use the following functions:
#
# Read a byte from the I2C bus (assumes all initialization has been done,
# and that the start bit and address have been sent; this only reads one
# byte (and does the ACK), caller must complete the communication, reading
# more bytes, sending a stop bit, etc.)
#
.global readI2CByte
readI2CByte:
push r16 ; save r16 in case program is using it
push r17 ; save r17 in case program is using it
cbi PORTC, SDA ; ensure output is low to switch to input
cbi DDIRC, SDA ; change SDA pin to input rather than output
ldi r16, 8 ; we’re going to read 8 bits
clr r17 ; r17 will hold data byte, so start it at 0
readLoop:
lsl r17 ; shift the bits we have so far one place to left
sbi PORTC, SCL ; set clock high
call I2CDelay ; keep high for a bit, gives time for therm to send bit
sbic PINC, SDA ; skip next instruction if input bit is 0
ori r17, 0x01 ; input bit is a 1, so put it into data byte
cbi PORTC, SCL ; set clock low
call I2CDelay ; keep low for a bit
dec r16 ; decrement our loop counter
brne readLoop ; if it is still not 0, go back to top of loop
readDone:
sbi DDIRC, SDA ; change SDA pin back to output
cbi PORTC, SDA ; set data line low for ACK
sbi PORTC, SCL ; start ACK clock period
call I2CDelay ; hold high
cbi PORTC, SCL ; set clock low
call I2CDelay ; hold low
mov r24, r17 ; move data over to return value register r24
pop r17 ; restore original r17
pop r16 ; restore original r16
ret ; data byte is left in r24
.extern delayMicroseconds ; need to tell assembler we are using this library function
I2CDelay:
ldi r24, 50
ldi r25, 0
call delayMicroseconds
ret
Note that in our lab experience, we could run the I2C protocol quite slowly when communicating with the
7-segment display chip (we could use the Arduino millisecond delay function), but the digital thermometer
chip was pickier; we had to speed up the protocol, using the delay function in the above example, for
everything to work.
The L293D chips also have an “enable” pin for each pair of its outputs. This allows another channel of
motor control and is used for Pulse Width Modulation, which is a fancy name for turning the motors on
and off really fast, to slow down the motors from their full-on speed. These enable pins are seen on the
schematic as the PWM signals (PWM0A, PWM0B, PWM2A, PWM2B). These signals are directly connected
to Arduino pins in PORTD (J1) and PORTB (J3).
Although we are not going to control motor speed in this lab, you will need to set these signals to enable
the latched motor byte to be seen on the motor output terminals (M1-M4). You can do this by outputting
a 1 on the pin that maps to each PWM output signal. On PORTB, PWM2A is connected to pin 3; on
PORTD, PWM0A and PWM0B are connected to pins 6 and 5, respectively, and PWM2B is connected to
pin 3. The template code has this in it.
From the motor perspective, in the motor control byte, motor 1 is mapped to bits 2 and 3; motor 2 to bits 1
and 4; motor 3 to bits 5 and 7; and motor 4 to bits 0 and 6. Setting these bits to 01 will turn the
motor in one direction; setting them to 10 will rotate the motor in the opposite direction. Setting both bits
for a motor to 00 or 11 will stop the motor, but you should avoid using 11.
Notice the two five-screw output terminals on the motor shield. These are labeled M1-M4 for each of the
four motors, which tells you where to connect your motors to the shield. The outer screws on each of these
terminals interface to the individual motors; the middle screws are grounds (which we won’t use). Looking
at the motor shield from above, with the 5-screw terminal that has “M4” and “M3” written underneath it at
the top, the bits are mapped to the 5-screw terminals as:
bit0  bit6  Gnd  bit5  bit7
 M4    M4         M3    M3
          Internal
 M1    M1         M2    M2
bit2  bit3  Gnd  bit1  bit4
Each motor also has an enable line that must be 1 in order for the motor to turn on. These are defined
as M1ENABLE through M4ENABLE, and are connected to PORTB and PORTD. All of the pins we will
use are:
Name       Port    Pin
MOTDATA    PORTB   0
MOTCLOCK   PORTD   4
MOTLATCH   PORTB   4
M1ENABLE   PORTB   3
M2ENABLE   PORTD   3
M3ENABLE   PORTD   5
M4ENABLE   PORTD   6
BOARDLED   PORTB   5
The last symbol is just the Arduino on-board LED.
4. Set the PWM outputs to high.
5. Reset the latch clock (MOTLATCH) by clearing the bit.
To send each bit of the motor byte to the shift register:
1. Clear or reset the latch (MOTLATCH).
2. Based on the bits in your motor byte (MOTDATA), send either a “0” or a “1”.
3. Set the motor clock signal (MOTCLOCK) high and keep it high for at least 1 ms.
4. Clear the motor clock signal (MOTCLOCK) to set the clock back to low.
5. Clear the MOTDATA bit so you’re ready for the next bit.
6. Delay for at least 1 ms before you send the next bit.
Once all 8 bits of the motor byte have been sent to the shift register, you need to latch this byte to the motor
output terminals. After the bits are latched, you should reset the latch by clearing the bit. To do this:
1. Set the latch clock signal (MOTLATCH) to high. Keep this signal high for at least 1 ms.
2. Set the proper MOTENABLE signal(s) to high to enable the motor byte to be seen on the output
terminals.
3. Reset the latch clock signal (MOTLATCH) by clearing the bit.
Vector #  Entry Address  Source Name    Description
1         0x0000         RESET          External Pin, Power-on Reset,
                                        Brown-out Reset and Watchdog System Reset
2         0x0002         INT0           External Interrupt Request 0
3         0x0004         INT1           External Interrupt Request 1
4         0x0006         PCINT0         Pin Change Interrupt Request 0
5         0x0008         PCINT1         Pin Change Interrupt Request 1
6         0x000A         PCINT2         Pin Change Interrupt Request 2
7         0x000C         WDT            Watchdog Time-out Interrupt
8         0x000E         TIMER2 COMPA   Timer/Counter2 Compare Match A
9         0x0010         TIMER2 COMPB   Timer/Counter2 Compare Match B
10        0x0012         TIMER2 OVF     Timer/Counter2 Overflow
11        0x0014         TIMER1 CAPT    Timer/Counter1 Capture Event
12        0x0016         TIMER1 COMPA   Timer/Counter1 Compare Match A
13        0x0018         TIMER1 COMPB   Timer/Counter1 Compare Match B
14        0x001A         TIMER1 OVF     Timer/Counter1 Overflow
15        0x001C         TIMER0 COMPA   Timer/Counter0 Compare Match A
16        0x001E         TIMER0 COMPB   Timer/Counter0 Compare Match B
17        0x0020         TIMER0 OVF     Timer/Counter0 Overflow
18        0x0022         SPI, STC       SPI Serial Transfer Complete
19        0x0024         USART, RX      USART Rx Complete
20        0x0026         USART, UDRE    USART, Data Register Empty
21        0x0028         USART, TX      USART, Tx Complete
22        0x002A         ADC            ADC Conversion Complete
The interrupt vector table is always at a fixed location in memory; for the AVR, as can be seen in the
second column, the table begins at address 0x0000 in flash (program) memory, and ends at address 0x002B.
Reading down the names of the interrupts, you can see that some are directly triggered by external circuitry
connected to external pins, some are communication oriented, to handle the “slow” nature of data transfer,
there is one for A/D conversion indication, and quite a few related to timers.
The first entry in the table is a pseudo-interrupt in the sense that it doesn’t have a current program to
get back to; it is the power-up condition. Did you ever wonder how a CPU knows what to do when you first
turn it on? Well, the first entry is it! This RESET interrupt is triggered whenever the CPU is power-reset or
software-reset; this entry contains the address of the code that needs to be executed first when the CPU starts up.
In the Arduino environment this code eventually leads to the function main() being called, which is where
all C programs start.
any two instructions in a program, and this means that it could execute between a comparison instruction
and the conditional branch instruction that will use the results of the comparison. So if the interrupt handler
needs to execute any instructions that change any flags in the SREG register, it needs to save the current
SREG value.
IMPORTANT: Interrupt handlers must always be kept SHORT and QUICK. Many times students will
try to write their whole program as a response to some interrupt. The problem is that further interrupts
are disabled until the current handler finishes! Thus, interrupt handlers should always be very short pieces
of code which simply do something immediately doable and/or just set a flag for later code to check. Your
goal always is to return from the interrupt handler as quickly as possible.
14.3 Timers
As seen in the above table, quite a few interrupts are related to timers. This is very typical of small processors
meant to be used in embedded systems. Think of your microwave oven; almost everything the processor in
that system needs to do is associated with keeping track of time.
Using a timer interrupt is essentially the same as setting your alarm clock; you configure the timer to
run at a certain rate and then set a specific trigger point, and the interrupt happens when the timer reaches
that trigger point. And just like your alarm clock that is automatically reset for the next day, the timer
resets and will cause an interrupt the next time it hits the trigger point (and the next, and the next, etc.,
until it is turned off). In this manner the timer acts like a metronome, giving you a tick (an interrupt) at a
very precise and regular interval.
Timers work as what is known as “free running counters”; that is, they are a self-contained circuit and
register that can do a “timer++” action at the rate that you specify. Setting a trigger point means setting
a value such that when the timer value reaches that value, an interrupt occurs. One natural point to trigger
an interrupt is when the timer value reaches its maximum and “overflows” (it wraps around to 0). In the
interrupt vector table you can see three overflow interrupts.
The AVR has three timers: 0, 1, and 2. Timers 0 and 2 are 8-bit timers, while timer 1 is a 16-bit timer.
There are also quite complex mechanisms available to set up timer interrupts, with various comparator
setups. We are going to keep it simple and just use the overflow interrupts.
The memory-mapped I/O addresses associated with the timers are somewhat scattered:
• Timer 0 controls are down in the “pure” I/O space, with memory addresses 0x44 to 0x48 (I/O addresses
0x24 to 0x28); all the rest are only in memory space;
• Timer 1 controls are 0x80 to 0x8B;
• Timer 2 controls are 0xB0 to 0xB4;
• All three timer interrupt controls are at 0x6E to 0x70;
In any case, the most logical setting for us is 101, selecting a divisor of 1024; because the system clock is
fairly fast (16 MHz), we want to slow it (even through Clk-I/O) down as much as possible.
The only other thing we need to do is enable the interrupt for the timer overflow. This is done in bit 0
of the TIMSKn control registers. Each names bit 0 TOIEn, meaning “timer overflow interrupt enable”.
int i;
int *pi;
void func()
{
    i = 4;
    pi = &i;
    *pi = i + 3;
}
Let’s translate this into AVR assembly language. First, the data declarations:
        .data
        .comm i,1
        .comm pi,2
The variable i is just one byte because the AVR uses 1-byte integers (mostly). But why is pi two bytes?
Well, pi must “point” to some data location, and thus it must hold an address. On the AVR, addresses are
16 bits, or 2 bytes. Now let’s add some code, for all but the last C statement:
        .data
        .comm i,1
        .comm pi,2
        .text
        .global func
func:
        ldi   r17, 4
        sts   i, r17
        ldi   r30, lo8(i)
        ldi   r31, hi8(i)
        sts   pi, r30
        sts   pi+1, r31
So, what is going on here? Well, the first two instructions are easy. They load the register r17 with a
constant 4, and then store it into the variable i using direct addressing. The next two instructions look odd,
but are correct. They load the value of the label i (not the value of the variable i) into the Z register
(which is r31:r30 treated as one 16-bit register). Think of these instructions as the “&” operator; in fact,
when C programmers speak, they say “address of” to denote the “&” operator. Remember, all labels are
address values, so the Z register now has the address of i in it.
The last two instructions store the 16-bit word r31:r30 into the variable pi
using direct addressing (the second byte must go in the next memory location, so we use +1 on the label).
The variable pi, which holds a 16-bit address, is now “pointing to” the variable i! This is exactly how a
pointer variable operates. When it comes down to the machine instruction level, a pointer (or reference)
variable is really just a variable that holds the address of another variable.
16 Serial Communications
We said earlier that the AVR has the ability to do serial communication, which is what the Arduino
environment uses when you open the “Serial Comm” window and enter text to send and see printed text that
the program outputs.
Our USB connections from the Arduino to a PC are full duplex connections. In addition to two data
wires, a USB connection has one wire for a common Ground (0 volts) reference, and one wire for a +5 volts
connection, which can be used for powering a small USB device (like your iPod), or recharging it.
Both sides must, of course, agree on a signalling protocol to be able to transmit data. USB defines
its own signalling protocol, but a peripheral chip on the Arduino board manages the USB protocol, so
a program that uses serial communication does not need to know about it. Instead the AVR processor
has a USART (Universal Synchronous-Asynchronous Receiver Transmitter) programming mechanism, and
a program simply needs to abide by its rules and use its I/O registers in order to program serial I/O on the
Arduino.
operations that we can program at assembly level are basically designed into the hardware in terms of gates.
And it takes a lot of gates: CPUs like the Pentium have millions of gates.
These basic boolean operations form a discrete mathematical system called boolean algebra. It obeys
many of the properties that we are used to seeing in math, and more:
• OR is represented by a plus sign, AND by a dot (or nothing, like multiplication). NOT is a bar over
the term (but sometimes a prime mark), and XOR is most often a plus sign inside of a circle
• like integer math’s plus operator, 0 is the boolean identity value for OR – that is, anything OR’d with
zero is itself.
• like integer math’s multiply operator, 1 is the boolean identity value for AND – that is, anything
AND’d with 1 is itself.
• AND and OR are associative, commutative, and distributive.
• NOT defines an inverse value
• DeMorgan’s Laws
1. NOT( AND(A, B)) is the same as OR( NOT(A), NOT(B))
2. NOT( OR(A, B)) is the same as AND( NOT(A), NOT(B))
Just like in regular math, we can write down formulas in boolean math.
(in class examples)
Since circuit designers like to draw their circuits rather than just write formulas, there is a set of diagrams,
one for each operator in Boolean logic:
As an example for how these gates can be constructed out of switches (or transistors), the following
diagram shows an AND gate and an OR gate made out of switches.
TODO: Diagram: gates from transistors
[Figure: storage circuit with inputs S (set) and R (reset) and outputs Q and Q’ (not-Q)]
This circuit essentially stores one bit of data! But we don’t have a “data” line into it, we just have two
controls (set and reset). To make a data-in line and a data-out line takes a little more doing:
[Figure: one-bit storage circuit with inputs D and Clock and outputs Q and Q’ (not-Q)]
This circuit is essentially a one-bit storage circuit. If you put eight of them in parallel, you have one 8-bit
register on the AVR!
That is, 5 bits to the left, and 3 bits to the right of the binary point. Conversion to a decimal value
is done exactly like we did for integers, with the positions to the right of the binary point being negative
exponents. For example, the value 01011.101 would be
0*2^4 + 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 + 1*2^-1 + 0*2^-2 + 1*2^-3 = 8 + 2 + 1 + 0.5 + 0.125 = 11.625
in decimal. With our representation, we can have values from 0.0 (our format is unsigned) to 31.875, in
steps of 0.125 (0.001 binary).
[Recall that negative exponents are reciprocals of positive exponents, so 2^-2 = 1/4, 2^-3 = 1/8, etc.]
A fixed point format is useful, and it has been used in computer design, but the problem is that it doesn’t
scale to different problems. What if all the values we had were smaller than 1? This representation would
waste all of the five bits in front of the binary point, and we wouldn’t have very much accuracy with only
three bits.
18.2 Idea 2: Floating point representation
A better solution would be to allow the decimal point to float – in other words have a floating point
representation. Actually, this is already familiar to us in the form of scientific notation. Think of a scientific
notation where you are only allowed to write down, let’s say, 8 digits (not including the 10 that is the base
of the exponent) – and that 3 of those digits are in the exponent, and five in the value.
With such a format, we could write down a number like 4.3275 * 10^182, or a number like 8.4832 * 10^003,
or even (if we can have a negative exponent) 6.3491 * 10^-821.
The important notion here is that we always have five digits of accuracy, even though our numbers range
from very big to extremely small. We are always able to use all our digits of accuracy. That is the advantage
of floating point numbers.
Ok, we can do the same thing in binary. But we need some names. The value that is the exponent is
called the exponent, and the value of our number without the exponent is called the mantissa. That is,
in the first example in the above paragraph, 4.3275 is the mantissa, and 182 is the exponent. 10 is the base
of the exponent, and when we do this in binary, the base will be 2.
So, let’s take our 8 bits and pick the same dividing point for the mantissa and exponent – that is:
mmmmm eee
So we have 5 mantissa bits and 3 exponent bits. For now, let’s assume nothing is signed, and that we put
a binary point to the right of the first mantissa bit. So, for example, the 8-bit value 10101011 is interpreted
as 1.0101 * 2^011 (both parts in binary). All of the conversions we have learned still apply, so this number is just
(1*2^0 + 0*2^-1 + 1*2^-2 + 0*2^-3 + 1*2^-4) * 2^3
which is (1.3125 * 8) == 10.5. Notice that the mantissa 1.0101 just gets shifted over 3 places, since the
exponent is 3. This is just like decimal scientific notation, there is nothing new going on here – it just
happens to all be in binary!
With our representation, the biggest number we can represent is a bit value 11111111, or 1.1111 * 2^7,
which is decimal 248. Notice that the next smaller number we can represent is 11110111, or 1.1110 * 2^7,
which is 240. So we skipped some integer values!
Well, this should be no surprise, since it is the same as with decimal scientific notation. If we limit ourselves
to 2 mantissa digits in base-10 scientific notation, we can write 1.3 * 10^3 and 1.2 * 10^3, which are the numbers
1,300 and 1,200 respectively. But without more mantissa digits, we cannot write the numbers between 1,300
and 1,200. So it is the same with binary floating point.
This is an important lesson, because it underscores the need for the programmer to understand the
accuracy of the floating point representation. It is not infinitely accurate.
Lastly, the only thing we haven’t dealt with is signs. We will talk about this in the next section, but just
notice that there are two signs needed – one on the mantissa, signalling a negative number, and one on the
exponent, signalling a very small number (the binary point is moving far left).
positive unsigned number, but it represents exponent values from -126 to +127. The exponent is a power of
2, of course.
In addition to the mantissa, there is a hidden bit that is a 1 bit tacked onto the front of the mantissa. If
you think about it, a binary mantissa always begins with 1 since we don’t write leading 0’s on numbers. So
the IEEE format just assumes that a 1 is there, and doesn’t store it. It is a free extra bit of accuracy.
So, the number represented by a single precision IEEE number is
Value = s * 1.mmmmmmmmmmmmmmmmmmmmmmm * 2 ^ (eeeeeeee - 127)
In decimal terms, this gives a number with about 7 digits of accuracy, and magnitudes from about 10^-38 to
10^38.
In double precision (64 bit) IEEE format, the mantissa is 52 bits, and the exponent is 11 bits (with an
offset of 1023). It gives us almost 16 decimal digits of precision, and magnitudes from 10^-308 to 10^308. This
is a much larger range than single precision. Quad precision is 112 bits of mantissa, 15 of exponent.
We said earlier that exponents of all 1’s and all 0’s are reserved. An exponent of all 1’s is used for special
error conditions, like trying to divide by 0, or taking the square root of a negative number.
An exponent of all 1’s with a mantissa of all 0’s is considered to be infinity: positive infinity if the sign
bit is 0, negative infinity if the sign bit is 1. Dividing a non-zero number by zero results in infinity.
An exponent of all 1’s with a non-zero mantissa is considered to be not-a-number, or NaN for short. Dividing
0 by 0, or taking the square root of a negative number, will result in a NaN value.
An exponent of all 0’s is used for zero itself (when the mantissa is also all 0’s) and for very small
“denormalized” numbers, which do not have the hidden 1 bit.
• directives are commands for the assembler itself, not instructions for the processor; these tell the
assembler the extra information it needs to be able to properly assemble your program, and offer
conveniences to make your program more readable; in Gnu assembly all directives begin with a dot
(’.’).
• labels are names we give for particular positions in our data and in our program; labels end up being
translated into addresses, either data addresses or program addresses;
• comments are always important in any programming endeavor, and we need to have the ability
to create comments in assembly programming as well; although the Gnu assembler allows C-style
comments (/*...*/), the better style is: a line beginning with a #-sign is a whole-line comment, and
any occurrence of a semicolon (;) indicates that the rest of the line is a comment.
An assembly file is typically organized with an indented column of instructions; only comments and labels
ever begin a line without any whitespace in front of them (whitespace is tabs or spaces). Instructions and
directives are indented at a constant indentation (6 or 8 spaces is a good indentation), and then instruction
and directive operands are spaced over to form a second column.
Below is an example assembly program file with a standard organization, which is explained below the
program.
#
# Symbolic constants
#
        .set  PORTB,0x05      ; symbolic constant for PORT B I/O address
        .set  DDIRB,0x04      ; symbolic constant for PORT B data dir I/O address
        .set  LEDB,5          ; symbolic constant for pin position of on-board LED
#
# Global data
#
        .data                 ; directive that starts the data section
        .comm val1,1          ; creates 1-byte global variable named val1
        .comm val2,1          ; creates 1-byte global variable named val2
#
# Program code
#
        .text                 ; directive that starts the program section
        .global compute       ; tells assembler to make label compute externally available
compute:
        lds   r18, val2       ; finally this is the first actual AVR instruction
        ldi   r19, 23         ; the first four instructions do:
        add   r18, r19        ;    val1 = val2 + 23
        sts   val1, r18
        ldi   r18, 0x20       ; set bit 5 to be 1 so that we can turn on the LED
        out   DDIRB, r18      ; tell the AVR we want pin 5 to be in output mode
        out   PORTB, r18      ; now output our 1 bit to pin 5, thus turning on the LED
        ret
The program above displays most of the basic syntax we will use, except for declaring arrays. The three
sections in the program are important.
The first section is where we define symbolic constants using the “.set” directive. These allow us to use
the symbol name in our program anywhere we would have needed to use the actual value. This makes your
programs MUCH more readable, especially when creating programs that use a lot of I/O ports. You should
use symbolic constants where ever appropriate. These are not variables, and do not exist when your program
is running; the assembler substitutes the symbol name for its value as it assembles your program.
The second section is where we declare any global data (variables) that we need. This section always
must begin with the “.data” directive. We can never put any actual machine instructions in this section, it is
only for declaring data, and so the only thing here will be other directives. The code above uses the “.comm”
directive (which means ‘common’, or shared) twice to create two variables. The operands for “.comm” are
the variable name and then its size in bytes; the assembler then reserves that much space for it and equates
the name of the variable with the address of where it is stored; whenever the variable name is used in the
program, the address is substituted for the name just like a symbolic constant’s value is substituted.
The third section is where we actually create our program, and this must begin with the “.text” directive.
Calling the program code “text” goes back many decades to the very early days of computers. Don’t ask
me where it came from, I have no idea! In our code section we create the functions that will make up the
program. In the above code there is only one function, but there could be many (with no need to repeat
the “.text”). The function’s name is simply a label in front of the first instruction of the function. To make
the function accessible outside of the assembly file to other pieces of the program, we must use the “.global”
directive to tell the assembler to share this label with other parts of the program; otherwise the label would
“disappear” once this file was assembled into machine code. All functions end with a return instruction,
which causes the execution to return to whoever called the function.
A label is a named location in the program; in this program “compute” is the only label in the program,
but the variable names “val1” and “val2” can also be considered labels since they refer to data locations.
Labels in the program (and sometimes in the data section) are always a name that begins at the beginning
of the line (no leading spaces are allowed) and ends with a colon (which is not part of the label name).
Function names are labels at the beginning of the function, and we use labels inside functions to indicate
places we need to branch to, such as the top or exit of a loop, and if-then-else blocks.
Notice also that the assembler understands both decimal (plain) and hexadecimal (leading 0x) numeric
constants; it also understands octal (with a leading 0), and binary (with a leading 0b).
• .ascii “string” creates the byte constants in memory for the characters inside the double quotes; as
with .byte, a label is needed to refer to the string. Warning: C-style strings must always end with an
extra byte value of 0 after the last character in the string, and this directive DOES NOT create that
byte. You should almost always use .asciz instead of the plain version; it will create the 0 byte.
• lo8(symbol) and hi8(symbol) are built-in assembler functions for accessing the lowest byte (8 bits) of a
16-bit value and the highest byte of a value; these are almost always used on a symbol that represents
an address, usually something like an array name, and are used to load the individual bytes of an address
into an indirect addressing register, one byte at a time.