C For The Microprocessor Engineer
Contents
Part II C
8 Naked C
8.1 A Tutorial Introduction
8.2 Variables and Constants
8.3 Operators, Expressions and Statements
8.4 Program Flow Control
10 ROMable C
10.1 Mixing Assembly Code and Starting Up
10.2 Exception Handling
10.3 Initializing Variables
10.4 Portability
11 Preliminaries
11.1 Specification
11.2 System Design
14 Software in C
14.1 Data Structure and Program
14.2 6809 – Target Code
14.3 68008 – Target Code
List of Figures
7.1 Onion skin view of the steps leading to an executable program.
7.2 Assembly-level machine code translation.
7.3 Assembly environment.
7.4 Syntax tree for sum = (n+1) * n/2;
7.5 The Whitesmiths C compiler process.
16.1 Typical X and Y waveforms, showing two ECG traces covering 2 s.
Target Processors
The microprocessor revolution began in 1971 with the introduction of the Intel
4004 device. This featured a 4-bit data bus, direct addressing of 512 bytes of
memory and 128 peripheral ports. It was clocked at 108 kHz and was imple-
mented with a transistor count of 2300. Within a year, the 8-bit 200 kHz 8008
appeared, addressing 16 kbyte of memory and needing a 3500 transistor imple-
mentation. The improved 8080 replacement appeared in 1974, followed a few
months later by the Motorola 6800 MPU [1]. Both processors could directly ad-
dress 64 kbytes of memory through a 16-bit address bus and could be clocked at
up to 2 MHz. These two families, together with descendants and inspired close
relatives, have remained the industry standards ever since.
The Motorola 6800 MPU [2] was perceived to be the easier of the two to use
by virtue of its single 5 V supply requirement and a clean internal structure. The
8085 MPU is the current state of the art Intel 8-bit device. First produced in
1976, it has an on-board clock generator and requires only a single power supply,
but has a virtually identical instruction set to the 8080 device. Soon after, Zilog
produced its Z80 MPU which was upwardly compatible with Intel's offering, then
the market leader, with a much extended instruction set and additional internal
registers [3].
The Motorola 6802/8 MPUs (1977) also have internal clock generators, with the
former featuring 128 bytes of on-board RAM. This integration of support mem-
ory and peripheral interface leads to the single-chip microcomputer unit (MCU) or
micro-controller, exemplified by the 6801, 6805 and 8051 MCU families [4]. The
6809 MPU introduced in 1979 [5, 6, 7] was seen as Motorola's answer to Zilog's Z80
and these both represent the most powerful 8-bit devices currently available. By
this date the focus was moving to 16- and 32-bit MPUs, and it is unlikely that
there will be further significant developments in general-purpose 8-bit devices.
Nevertheless, these latter generation 8-bit MPUs are powerful enough to act as the
controller for the majority of embedded control applications, and their architec-
ture is sophisticated enough to efficiently support the requirements of high-level
languages; more of which in later chapters. Furthermore, many MCU families
have a core and language derived from their allied 8-bit MPU cousins.
1.1 Architecture
The internal structure of a general purpose microprocessor can be partitioned
into three functional areas:
1. The mill.
2. Register array.
3. Control circuitry.
Figure 1.1 shows a simplified schematic of the 6809 MPU viewed from this per-
spective.
THE MILL
A rather old fashioned term used by Babbage [8] for his mechanical computer
of the last century to identify the arithmetic and logic processor which `ground'
the numbers. In our example the 6809 has an 8-bit arithmetic logic unit (ALU)
implementing Addition, Subtraction, Multiplication, AND, OR, Exclusive-OR, NOT
and Shift operations. Associated with the ALU is the Code Condition (or Sta-
tus) register (CCR). Five of the eight CCR bits indicate the status of the result of
ALU processes. They are: C indicating a Carry or borrow, V for 2's complement
oVerflow, Z for a Zero result, N for Negative (or bit 7 = 1) and H for the Half carry
between bits 3 and 4. These flags are set as a result of executing an instruction,
and are normally used either for testing and acting on the status of a process, or
for multiple-byte operations. The remaining three bits are associated with inter-
rupt handling. The I bit is used to lock out or mask the IRQ interrupt, and the
F bit carries out the same function for the FIRQ interrupt. During an interrupt
service routine the E flag may be consulted to see if the Entire register state has
been saved (IRQ, NMI and SWI) or not (FIRQ). More details are given in Section 6.1.
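The way the five ALU status flags follow from an 8-bit addition can be sketched in C. This is only an illustrative model, not a description of the 6809's internal logic; the names add8_flags and add8 are invented for the sketch.

```c
#include <stdint.h>

/* Result and status flags of an 8-bit addition (hypothetical helper type). */
typedef struct {
    uint8_t result;
    int c;  /* Carry out of bit 7                     */
    int v;  /* 2's complement oVerflow                */
    int z;  /* Zero result                            */
    int n;  /* Negative, i.e. bit 7 of the result = 1 */
    int h;  /* Half carry from bit 3 to bit 4         */
} add8_flags;

add8_flags add8(uint8_t a, uint8_t m)
{
    add8_flags f;
    unsigned sum = (unsigned)a + (unsigned)m;

    f.result = (uint8_t)sum;
    f.c = sum > 0xFF;
    f.z = f.result == 0;
    f.n = (f.result & 0x80) != 0;
    f.h = (((a & 0x0F) + (m & 0x0F)) & 0x10) != 0;
    /* Overflow: both operands have the same sign, but the result's differs. */
    f.v = ((~(a ^ m) & (a ^ f.result)) & 0x80) != 0;
    return f;
}
```

For instance, add8(0x7F, 0x01) gives 80h with N, V and H set, since +127 plus 1 cannot be represented as a signed byte.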
REGISTER ARRAY
The 6809 has two Data registers, termed Accumulators A and B. These Data reg-
isters are normally targeted by the ALU as the source and destination for at least
one of its operands. Thus ADDA #50 adds 50 to the contents of Accumulator_A (in
register transfer language, RTL, this is symbolized as [A] <- [A] + 50, which
reads `the contents of register A become the original contents of A plus 50'). Op-
erations requiring one operand can seemingly be done directly on external mem-
ory; for example, INC 6000h which increments the contents of location 6000h
([6000] <- [6000] + 1). The suffix h indicates the hexadecimal number base,
whilst b denotes binary. However, in reality the MPU executes this by bringing
down the contents of 6000h (written as [6000]), uses the ALU to add one and
returns it. Whilst this fetch and execute process is invisible to the programmer,
the penalty is space and time; INC M (3 bytes length) takes 7 µs and INCA or INCB
(1 byte length) takes 2 µs (at a 1 MHz clock rate). Thus while it is always better
to use the Data registers for operands, this is difficult in practice because there
are only two such registers. Unlike the older 6800 MPU, the 6809's two 8-bit Data
registers can be concatenated to one 16-bit double register A:B; the D Accumu-
lator. A few operations such as Add (e.g. ADDD #4567) can directly handle this.
But although the 6809 has pretensions to be a 16-bit MPU, the ALU is only 8-bits
wide and instructions such as this require two passes; but they are nevertheless
faster than two single operations.
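The idea of a 16-bit addition carried out in two passes through an 8-bit ALU can be modelled in C as below. The ordering shown (low byte first, with the carry passed to the high byte) is an assumption made for illustration; the function name addd_two_pass is likewise invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a 16-bit operand to the A:B pair using two 8-bit additions.
 * a is the upper byte of D, b the lower byte.
 * If carry_out is not NULL it receives the final carry (0 or 1). */
uint16_t addd_two_pass(uint8_t a, uint8_t b, uint16_t operand, int *carry_out)
{
    /* First pass: low bytes. */
    unsigned low = (unsigned)b + (operand & 0xFFu);
    /* Second pass: high bytes plus the carry from the first pass. */
    unsigned high = (unsigned)a + ((operand >> 8) & 0xFFu) + (low >> 8);

    if (carry_out != NULL) {
        *carry_out = (int)(high >> 8);
    }
    return (uint16_t)(((high & 0xFFu) << 8) | (low & 0xFFu));
}
```

With A = 12h and B = FFh, adding 4567h in this way gives 5866h, the same result as a single 16-bit addition.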
Six dedicated Address registers are accessible to the programmer and are as-
sociated with generating addresses of program and operand bytes external to the
processor. The Program Counter (PC) always points to the current program byte
in memory, and is automatically incremented by the number of operation bytes
during the fetch. It normally advances monotonically from its start (reset) value,
with discontinuities occurring only at Jump or Branch operations, and internal
and external interrupts.
Two Index registers are primarily used when a computed address facility is de-
sired. For example an Index register may be set up to address or point to the first
element of a byte array. At any time after this, the nth element of this array can be
fetched by augmenting the contents of the Index register by n. Thus the instruc-
tion LDA 6,X brings down array[6] to Accumulator_A ([A] <- [[X]+6]). Index
registers can also be automatically or manually incremented or decremented and
thus can systematically step through a table or array. The 6809 does not have a
separate ALU for computed address generation, and this can make the execution
of such operations rather lengthy. Sometimes Index registers are used, rather
surreptitiously, to perform simple 16-bit arithmetic, for example counting loop
passes. An example is given in the listing of Table 2.9.
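In C terms, an Index register behaves much like a pointer into a byte array. The sketch below models memory as a 64 kbyte array and treats the offset as a simple unsigned count; both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of LDA n,X: fetch the byte n places on from the address in X.
 * memory must point to a full 65,536-byte image. */
uint8_t load_indexed(const uint8_t *memory, uint16_t x, uint8_t n)
{
    return memory[(uint16_t)(x + n)];   /* [A] <- [[X]+n] */
}

/* Step through a table by incrementing the index after each access,
 * summing the bytes found there. */
unsigned sum_table(const uint8_t *memory, uint16_t x, size_t length)
{
    unsigned total = 0;

    while (length-- > 0) {
        total += memory[x++];
    }
    return total;
}
```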
The System Stack Pointer (SSP) register (also known as Hardware Stack Pointer)
is normally used to identify an area of RAM used as a temporary storage area,
to facilitate the implementation of subroutines and interrupts. These techniques
are discussed in Chapters 5 and 6. Rather unusually the 6809 also has a User
Stack Pointer (USP), which can be usefully employed to point to an area of RAM
which can be used by the programmer to place data for retrieval later and will
not get mixed in with the automatic action of the SSP. Both Stack Pointers can
also be used as Index registers.
The address size of most 8-bit MPUs is 16-bits wide, allowing direct access
to 65,536 ($2^{16}$) bytes. With a data bus of only 8-bits width, instructions which
specify absolute addresses will be at least three bytes long (one or more bytes
for the operation code and two for the address). As well as needing space, the
three fetches take time. To reduce this problem, the 6800 and 6502 processors
use the concept of zero page addressing. This is a shortform absolute address
mode which assumes that the upper address byte is 00h. Thus in 6800 code,
loading data from location 005Fh (LDAA 005F) can be coded as: B6-00-5F (4 cy-
cles) using the 3-byte Extended Direct address mode or 96-5F (3 cycles) with the
2-byte Direct address mode. In the 6809 MPU this concept has been extended in
that the direct page can be moved to any 256-byte segment based at 00 to FFh,
the segment number being held in the Direct Page register (DP). Thus, supposing
locations 8000 – 80FFh hold peripheral interface devices which are frequently be-
ing accessed, then transferring the segment number 80h into the DP means that
the instruction LDA 5F, coded as 96-5F, actually moves data from 805Fh into
Accumulator_A. When the 6809 is Reset, the DP is set to 00h and, unless its value
is changed, direct addressing is equivalent to zero page addressing. The DP can
be changed dynamically as the program progresses, but this is worthwhile only
if more than eight accesses within a page are to be made.
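The address formed by the Direct mode is simply the DP contents placed above the 8-bit address byte. A one-line C model (the function name is invented for illustration) makes this explicit:

```c
#include <stdint.h>

/* Effective address of a Direct-mode access: DP supplies the upper byte,
 * the instruction's address byte supplies the lower byte. */
uint16_t direct_address(uint8_t dp, uint8_t address_byte)
{
    return (uint16_t)(((unsigned)dp << 8) | address_byte);
}
```

Thus direct_address(0x80, 0x5F) gives 805Fh, whilst with DP at its Reset value of 00h the same instruction reaches 005Fh.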
CONTROL CIRCUITRY
The remaining registers shown in Fig. 1.1 are invisible to the programmer, in
that there is no direct access to their contents. Of these, the Instruction decoder
represents the `intelligence' of the MPU. In essence its job is to marshal all available
resources in response to the operation code word fetched from memory. This
sequential control function is the most complex internal process undertaken by
the MPU; however, its design is beyond the scope of this text. References [9, 10]
are useful background reading in this regard. Suffice to say that the 6809, like
its earlier relatives, uses a random logic circuit for its decoder implementation.
This provides for the highest implementation speed but at the expense of a less
structured set of programming operations.
CONTROL BUS
All MPUs have similar data and address buses, but differ considerably in the
miscellany of functions conveniently lumped together as the control bus. These
indicate to the outside world the status of the processor, or allow these external
circuits control over the processor operation.
Power
A single 5 V±5% supply dissipating a maximum of 1.0 W (200 mA). The analogous
Hitachi 6309 CMOS MPU dissipates 60 mW during normal operation and 10 mW
in its sleep mode.
Read/Write (R/W)
Used to indicate the status of the data bus, high for Read and low for Write. Halt
and DMA/BREQ float this signal.
Halt
A low level here causes the MPU to stop running at the end of the present instruc-
tion. Both data and address buses are floated, as is R/W. While halted, the MPU
does not respond to external interrupt requests. The system clocks (E and Q)
continue running.
DMA/BREQ
This is similar to Halt in that data, address and R/W signals are floated. How-
ever, the MPU does not wait until the end of the current instruction execution.
This gives a response delay (sometimes called a latency) of 1½ cycles, as opposed
to a worst-case Halt latency of 21 cycles [5]. The payback is that because the
processor clock is frozen, the internal dynamic registers will lose data unless
periodically refreshed. Thus the MPU automatically pulls out of this mode every
14 clock cycles for an internal refresh before resuming (cycle stealing).
Reset
A low level at this input will reset the MPU. As long as this pin is held low, the
vector address FFFEh will be presented on the address bus. On release, the 16-bit
data stored at FFFEh and FFFFh will be moved to the Program Counter; thus the
Reset vector FFFE:Fh should always hold the restart address (see Fig. 6.4).
Reset should be held low for not less than 100 ms to permit the internal clock
generator to stabilize after a power switch on. As the Reset pin has a Schmitt-
trigger input with a threshold (4 V minimum) higher than that of standard TTL-
compatible peripherals (2 V maximum), a simple capacitor/resistor network may
be used to reset the 6809. As the threshold is high, other peripherals should be
out of their reset state before the MPU is ready to run.
Non-Maskable Interrupt (NMI)
A negative edge (pulse width one clock cycle minimum) at this pin forces the MPU
to complete its current instruction, save all internal registers (except the System
Stack Pointer, SSP) on the System stack and vector to a program whose start ad-
dress is held in the NMI vector FFFC:Dh. The E flag in the CCR is set to indicate
that the Entire group of MPU registers (known as the machine state) has been
saved. The I and F mask bits are set to preclude further lower priority interrupts
(i.e. IRQ and FIRQ). If the NMI program service routine is terminated by the Re-
Turn from Interrupt (RTI) instruction, the machine state is restored and the
interrupted program continues. After Reset, NMI will not be recognized until
the SSP is set up (e.g. LDS #TOS+1 points the System Stack Pointer to just over
the top of the stack, TOS). More details are given in Section 6.1.
Fast Interrupt Request (FIRQ)
A low level at this pin causes an interrupt in a similar manner to NMI. However,
this time the interrupt will be locked out if the F mask in the CCR is set (as it is
automatically on Reset). If F is clear, then the MPU will vector via FFF6:7h after
saving only the PC and CCR on the System stack. The F and I masks are set to
lock out any further interrupts, except NMI, and the E flag cleared to show that
the Entire machine state has not been saved.
As FIRQ is level sensitive, the source of this signal must go back high before
the end of the service routine.
Interrupt Request (IRQ)
A low level at this pin causes the MPU to vector via FFF8:9h to the start of the
IRQ service routine, provided that the I mask bit is cleared (it is set automatically
at Reset). The entire machine state is saved on the System stack and I mask set
to prevent any further IRQ interrupts (but not FIRQ or NMI). As in FIRQ, the IRQ
signal must be removed before the end of the service routine. On RTI the machine
state will be restored, and as this includes the CCR, the I mask will return low
automatically.
Bus Available, Bus Status (BA, BS)
00 : Normally running
01 : Interrupt or Reset in progress
10 : A software SYNC is in progress (see Section 6.2)
11 : MPU halted or has granted its bus to DMA/BREQ
Figure 1.3 A snapshot of the 6809 MPU reading data from a peripheral device. Worst-case 1 MHz
device times are shown.
a MPU, it is necessary to understand the interplay between the relevant buses and
control signals. These involve sequences of events, and are usually presented as
timing or flow diagrams.
Consider the execution of the instruction LDA 6000h ([A] <- [6000h]). This
instruction takes four clock cycles to implement; three to fetch down the 3-
byte instruction (B6-60-00) and one to send out the peripheral (memory or oth-
erwise) address and put the resulting data into Accumulator_A. Figure 1.3 shows
a somewhat simplified state of affairs during that last cycle, with the assumption
of a 1 MHz clock frequency. The address will be out and stable by not later than
25 ns before Q goes high ($t_{AQ}$). The external device (at 6000h in our example)
must then respond and set up its data on the bus by no later than 80 ns ($t_{DSR}$)
before the falling edge of E, which signals the cycle end. Such data must remain
held for at least 10 ns ($t_{DHR}$) to ensure successful latching into the internal data
register. $t_{AQ}$, $t_{DSR}$ and $t_{DHR}$ for the 68B09 2 MHz processor are 15, 40 and 10 ns
respectively.
Writing data to an external device or memory cell is broadly similar, as illus-
trated in Fig. 1.4, which shows the waveforms associated with, for example, the
last cycle of a STA 8000h (Store) instruction.
Once again the Address and R/W signals appear just before the rising edge
of Q, $t_{AQ}$. This time it is the MPU which places the data on the bus, which will
be stable well before the falling edge of Q. This data will disappear within 30 ns
after the cycle end $t_{DHW}$; the corresponding address hold time $t_{AH}$ is 20 ns.
Earlier members of the 6800 family did not provide a Q clock signal. In these
cases the end of the E signal had to be used to turn off or trigger the external
device when writing. As there are only 30 ns after this edge before the data
collapses, care had to be taken to ensure that the sum of the address decoder
propagation delay plus the time data must be held at the peripheral interface
device after the trigger event (hold time) satisfies this criterion. Because of this
tight timing requirement, the E clock is normally routed directly to the interface
circuitry, rather than be delayed by the address decoder (e.g. see Fig. 1.9). With
the 6809, it is preferable to use the falling edge of the Q clock for this purpose
when writing. While reading of course, the peripheral interface must be enabled
up to (and a little beyond) the end of the E cycle, at which point the MPU captures
the proffered data.
The basic structure of a synchronous common data bus MPU-based system is
shown in Fig. 1.5. The term synchronous is used to denote that normal commu-
nication between peripheral device and MPU is open loop, with the latter having
no knowledge of whether data is available or will be accepted at the end of a clock
cycle. If a peripheral responds too slowly, its garbled data will be read at the end
of the cycle irrespective of its validity. In such cases MRDY can be used to slow
things down, although this is considered an abnormal transition. The alternative
closed-loop architecture is discussed on page 71.
as the duration from the application of a stable address or chip enable until the
activation of the cell to be read from or written to. In the 6116 RAM, this internal
decoding occurs irrespective of the state of the chip enables. Looking first at
RAM interfacing and taking Fig. 1.8 as an example, it is clear that the writing
action is the more critical as this will end earlier at the falling edge of Q. From
Fig. 1.4 we see that we have $t_{AQ} + t_{QH}$ less the RAM data setup time. The Hitachi
HM6116AP-20 has a setup requirement of 50 ns and a 200 ns access time, so:
$$t_{AQ} + t_{QH} - 50 \geq 200$$
$$t_{AQ} + t_{QH} \geq 250 \text{ ns}$$
At 1 MHz, $t_{AQ} + t_{QH}$ is 455 ns, but this shrinks to 230 ns for a 2 MHz clock. Thus
a 150 ns access time RAM chip must be used in the latter instance; for example
the Hitachi HM6116AP-15. The 6264 RAM has an access time measured from the
chip select. In this case the address decoder delay must be part of the calculation.
An example of this is given in Section 3.3.
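The write-cycle check above is easily mechanized. The C sketch below uses the figures quoted in the text; the function name is invented, and it assumes that the faster HM6116AP-15 has the same 50 ns setup requirement as the -20 part, which should be confirmed against the data sheet.

```c
#include <stdbool.h>

/* True if a RAM chip is fast enough for the write cycle:
 * the time available (t_AQ + t_QH) less the data setup time
 * must be at least the RAM's access time. All times in ns. */
bool ram_write_timing_ok(int t_aq_plus_t_qh_ns, int setup_ns, int access_ns)
{
    return t_aq_plus_t_qh_ns - setup_ns >= access_ns;
}
```

With 455 ns available at 1 MHz, the 200 ns part passes; with 230 ns at 2 MHz it fails, whilst a 150 ns part passes.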
ROM chips are interfaced in a similar fashion, but of course they are read-
only. Referring to the timing diagram of Fig. 1.3, we see that as data from the
ROM must be present t DSR before the end of the cycle, we have the relationship
$t_{cyc} - t_{AVS} - t_{DSR} \geq t_{access}$. At 1 MHz this sums to 720 ns, and 380 ns at 2 MHz. Most
of the smaller EPROMs, for example the 2 kbyte Texas Instruments TMSD2516JL,
have 450 ns access times. The TMS2764-25JL is an 8 kbyte 250 ns device and is
therefore suitable for the higher-speed processor.
Rather than qualifying each write-to peripheral by Q, it is possible to enable
the address decoder directly. Thus the decoder should have a lengthy output
pulse when a read is in operation, but be cut short (at the end of Q) when a write
is in progress. This relationship can be written as:
giving the decoder output waveforms shown in Fig. 1.6(b)(ii) and (iii). To make
use of the two active low G2A and G2B 74138 inputs, a little Boolean algebra
yields:
it is permissible to mix the two kinds of peripheral devices, each enabled by the
appropriate address decoder. For example, a primary address decoder could en-
able a simple secondary decoder for 68xx peripheral devices, and a more complex
Q related secondary decoder for simple interface circuitry.
References
[1] Noyce, R.N. and Hoff, M.E.; A History of Microprocessor Development at Intel,
IEEE Micro, Feb. 1981, pp. 8 – 21.
[2] Cahill, S.J.; Designing Microprocessor-Based Digital Circuitry, Prentice-Hall, 1985,
Chapters 8 and 9.
[3] Frazer, D.A. et al.; Introduction to Microcomputer Engineering, Ellis Horwood/Halsted
Press, 1985, Chapter 3.
[4] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1988.
[5] Ritter, T. and Boney, J.; A Microprocessor for the Revolution: The 6809, BYTE, 4, part
1, Jan. 1979, pp. 14 – 42; part 2, Feb. 1979, pp. 32 – 42; part 3, Mar. 1979, pp. 46 – 52.
[6] Wakerly, J.F.; Microcomputer Architecture and Programming: The 68000 Family,
Wiley, 1989, Chapter 16.
[7] Horvath, R.; Introduction to Microprocessors using the MC6809 or the MC68000,
McGraw-Hill, 1992.
[8] Hyman, A; Charles Babbage: Pioneer of the Computer, Princeton University
Press/Oxford University Press, 1982, Chapter 16.
[9] Agrawala, A.K. and Rauscher, T.G.; Foundations of Microprogramming, Academic
Press, 1976.
[10] Ercegovac, M.D. and Lang, T.; Digital Systems and Hardware/Firmware Algorithms,
Wiley, 1985, Chapter 11.
[11] Monolithic Memories; PAL Handbook, 3rd ed., 1983, pp. 6.27 – 6.39 and 8.40 – 8.43.
[12] Cahill, S.J.; Digital and Microprocessor Engineering, 2nd. ed., Ellis Horwood/Prentice-Hall,
1993, Chapter 5.3.
[13] Cahill, S.J.; Digital and Microprocessor Engineering, 2nd. ed., Ellis Horwood/Prentice-Hall,
1993, Chapter 5.3.4.
CHAPTER 2
complete code for PSHS A,B,X is 34-16h. In binary this is 0011 0100 0001
0110, where each bit of the post-byte represents a register to be saved according
to the format shown in Fig. 2.1. Of course the programmer normally need not be
concerned with detail at this level; the assembler will take care of such matters.
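What the assembler does here can be sketched in C. The bit assignments below follow the usual 6809 post-byte layout (CC in bit 0 up to PC in bit 7); as Fig. 2.1 is not reproduced in this sketch, they should be checked against it. The names PSH_xx and encode_pshs are invented for illustration.

```c
#include <stdint.h>

/* One bit per register in the PSHS post-byte (assumed layout). */
enum {
    PSH_CC = 0x01, PSH_A = 0x02, PSH_B = 0x04, PSH_DP = 0x08,
    PSH_X  = 0x10, PSH_Y = 0x20, PSH_U = 0x40, PSH_PC = 0x80
};

/* Assemble PSHS for the given register set into a 2-byte buffer. */
void encode_pshs(uint8_t registers, uint8_t code[2])
{
    code[0] = 0x34;        /* PSHS op-code                        */
    code[1] = registers;   /* post-byte: OR of the PSH_xx masks   */
}
```

Calling encode_pshs(PSH_A | PSH_B | PSH_X, code) leaves 34-16h in the buffer, in agreement with the example above.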
LDA #80h
TFR A,DP
first places the number 80h in Accumulator_A (it could equally be B) and then
transfers this to the DP register. This overhead is justified as the DP register
is (or should be) rarely altered. The TransFeR instruction can move the con-
tents of any 8-bit register (A,B,DP,CC) to any other, or any 16-bit register contents
(X,Y,U,S,D,PC) to any other. The upper and lower nybbles (four bits) of the post-
byte determine the source and destination register respectively, according to the
code:
thus TFR A,DP is coded as 1F-8Bh (post-byte 1000 1011b). EXchanGe works in
a similar way between like-sized registers with the same post-byte construction.
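The construction of the TFR post-byte may be modelled as follows. Only the codes for A (8h) and DP (Bh) can be confirmed from the example above; the remaining register codes are the customary 6809 values and are given here as an assumption. The identifiers are invented for the sketch.

```c
/* 4-bit register codes used in the TFR/EXG post-byte (assumed values).
 * The 8-bit registers all have bit 3 set. */
enum tfr_reg {
    R_D = 0x0, R_X = 0x1, R_Y = 0x2, R_U = 0x3, R_S = 0x4, R_PC = 0x5,
    R_A = 0x8, R_B = 0x9, R_CC = 0xA, R_DP = 0xB
};

/* Return the post-byte (source in the upper nybble, destination in the
 * lower), or -1 if the two registers are not the same size. */
int tfr_postbyte(enum tfr_reg source, enum tfr_reg destination)
{
    if (((source ^ destination) & 0x8) != 0) {
        return -1;   /* mixing an 8-bit with a 16-bit register */
    }
    return ((int)source << 4) | (int)destination;
}
```

Here tfr_postbyte(R_A, R_DP) returns 8Bh, the post-byte quoted for TFR A,DP.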
The programmer can easily keep two separate stacks using the System Stack
Pointer and User Stack Pointer registers. These stacks are normally set up at the
beginning of the program, simply by using the relevant Load operation. Thus if
we wish to define RAM from 1FFFh downwards as the System stack and 18FFh
downwards as a User stack, the sequence:
LDS #02000h
LDU #01900h
will accomplish this. Notice that the Top Of Stack (TOS) in both cases is one above
physical memory. This is because the Push and Pull operations, as well as the
Figure 2.3 Stacking registers in memory using PSH and PUL. Also applicable to IRQ and NMI interrupts.
Operation       Mnemonic      Flags V N Z C   Description
Add                                           Binary addition
  to A; to B    ADDA; ADDB          √ √ √ √   [A]<-[A]+[M]; [B]<-[B]+[M]
  to D          ADDD                √ √ √ √   [D]<-[D]+[M:M+1]
  B to X        ABX                 • • • •   [X]<-[X]+[00|B]
Note 1: Overflow set when passes from 10000000 to 01111111, i.e. an apparent sign change.
Note 2: Overflow set when passes from 01111111 to 10000000, i.e. an apparent sign change.
Note 3: Carry set to state of bit 7 product, i.e. MSB of lower byte; for rounding off.
Note 4: Overflow set if original data is 10000000 (−128), as there is no +128.
Note 5: Carry set if original data is 00000000; for multiple-byte negation.
calculates the effective address as [X] + 1 and loads it into the X Index register
([X] <- [X] + 1); thus it is the equivalent to an INcrement X (INX) instruction,
which is missing from the 6809's repertoire. Much more powerful permutations
of LEA exist, thus:
LEAY A,X ; Coded as 31-86h
Operation                          Mnemonic      Flags V N Z C    Description
Shift left, arithmetic or logic                                   Linear shift left into carry: C ← b7 … b0 ← 0
  memory                           ASL                 1 √ √ b7
  A; B                             ASLA; ASLB          1 √ √ b7
Circular or Rotate Shift instructions are similar to Add with Carry, in that they
can be used for multiple-precision operations. A Rotate takes in the Carry from
any previous Shift and in turn saves its ejected bit in the C flag. As an example,
a 24-bit word stored in three consecutive bytes (bits 23 – 16 in M, bits 15 – 8 in M+1
and bits 7 – 0 in M+2) can be shifted right once by the sequence [4]:
LSR M     ; 0 → [M] ⇒ b16 → C
ROR M+1   ; b16/C → [M+1] ⇒ b8 → C
ROR M+2   ; b8/C → [M+2] ⇒ b0 → C
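The passing of the Carry from byte to byte can be followed in a C model of this multiple-byte shift. The helper names are invented; m[0] holds the most significant byte, as in the text.

```c
#include <stdint.h>

/* LSR: shift right, 0 into bit 7; returns the ejected bit 0 (new Carry). */
static int lsr8(uint8_t *byte)
{
    int carry = *byte & 1;
    *byte = (uint8_t)(*byte >> 1);
    return carry;
}

/* ROR: shift right, old Carry into bit 7; returns the ejected bit 0. */
static int ror8(uint8_t *byte, int carry_in)
{
    int carry = *byte & 1;
    *byte = (uint8_t)((*byte >> 1) | (carry_in ? 0x80 : 0x00));
    return carry;
}

/* Shift a 24-bit word right once; returns the final Carry (old bit 0). */
int shift_right_24(uint8_t m[3])
{
    int carry = lsr8(&m[0]);       /* 0 into bit 23, b16 into C  */
    carry = ror8(&m[1], carry);    /* b16 into bit 15, b8 into C */
    carry = ror8(&m[2], carry);    /* b8 into bit 7, b0 into C   */
    return carry;
}
```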
In all types of Left Shifts, the oVerflow flag is set when bits 7 and 6 differ
before the shift (i.e. b7⊕b6), meaning that the (apparent) sign will change after
the shift.
The logic operations of AND, OR, Exclusive-OR and NOT (Complement) are
provided, as shown in Table 2.4. The only unusual feature here is the special
instructions of ANDCC and ORCC for clearing or setting flags in the Code Condition
register. Thus to clear the I mask (see Fig. 1.1) we have:
The setting of the CCR flags can be used after an operation to make some
deduction about, and hence act on, the state of the operand data. Thus, to deter-
mine if the value of a port located at, say, 8080h is zero, then:
LDA 8080h ; Move in data & set Z & N flags as appropriate {B6-80-80h}
BEQ SOMEWHERE ; Go somewhere if the data EQuals zero (Z flag set) {27-xxh}
will bring its contents into Accumulator_A and set the Z flag if it is zero. Branch
if EQual to zero will then cause the program to skip to another place. The
N flag is also set if bit 7 is logic 1, and thus a Load operation can enable us to
test the state of this bit. The problem is, loading destroys the old contents of the
Accumulator, and the new data is probably of little interest. A non-destructive
equivalent of loading is TeST, as shown in Table 2.5. The sequence now becomes:
TST 8080h ; Check data & set Z & N flags as appropriate {7D-80-80h}
BEQ SOMEWHERE ; Go somewhere if the data EQuals zero (Z flag set) {27-xxh}
but the Accumulator contents are not overwritten. However, 16-bit tests must
be carried out using a 16-bit Load operation as only 8-bit TeST instructions are
provided.
TeST can only check for all bits zero or the state of bit 7. For data already in
an 8-bit Accumulator, ANDing can check the state of any bit; thus:
will set the Z flag if bit 5 is 0, otherwise Z will be cleared. Once again this is a
destructive examination, and the equivalent from Table 2.5 is BIT test; thus:
BITB #00100000b ; Coded as C5-20h
does the same thing, but with the contents of Accumulator_B remaining un-
changed; and more tests can subsequently be carried out without reloading.
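The C programmer will recognize the BIT test as the familiar masking idiom. A minimal sketch, with an invented function name, returning the state of the Z flag:

```c
#include <stdint.h>

/* Z flag after a BIT test: 1 if every bit selected by the mask is 0.
 * The accumulator value is passed by value and so is left unaltered. */
int bit_test_z(uint8_t accumulator, uint8_t mask)
{
    return (accumulator & mask) == 0;
}
```

Thus bit_test_z(b, 0x20) is the counterpart of BITB #00100000b, giving 1 when bit 5 of b is 0.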
Comparison of the magnitude of data in an Accumulator with either a constant
or data in memory requires a different approach. Mathematically this can be
done by subtracting [M] from [A] and checking the state of the flags. Which
flags are relevant depend on whether the numbers are to be treated as unsigned
(magnitude only) or signed. Taking the former first gives:
[A] Higher than [M] : [A]−[M] gives no Carry and non-Zero C=0, Z=0 ($\overline{C + Z} = 1$)
[A] Equal to [M] : [A]−[M] gives Zero (Z=1)
[A] Lower than [M] : [A]−[M] gives a Carry (C=1)
The signed situation is more complex, involving both the Negative and oVer-
flow flag. Where a subtraction occurs and the difference is positive, then either
bit 7 will be 0 and there will be no overflow (both N and V are 0) or else an overflow
will occur with bit 7 at logic 1 (both N and V are 1). Logically, this is detected by
the function $\overline{N \oplus V}$. A negative difference is signalled whenever there is no overflow
and the sign bit is 1 (N is 1 and V is 0) or else an overflow occurs together
with a positive sign bit (N is 0 and V is 1). Logically, this is N⊕V. Based on these
outcomes we have:
[A] Greater than [M] : [A]−[M] → non-zero +ve result ($\overline{N \oplus V} \cdot \overline{Z} = 1$ or $(N \oplus V) + Z = 0$)
[A] Equal to [M] : [A]−[M] → zero (Z=1)
[A] Less than [M] : [A]−[M] → a negative result (N⊕V = 1)
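These flag relationships can be verified with a small C model of the subtraction [A]−[M]. It is a sketch only; the type and function names are invented for illustration.

```c
#include <stdint.h>

/* Flags produced by the 8-bit subtraction [A] - [M]. */
typedef struct { int n, z, v, c; } cmp_flags;

cmp_flags cmp8(uint8_t a, uint8_t m)
{
    cmp_flags f;
    uint8_t r = (uint8_t)(a - m);

    f.c = a < m;                  /* Carry acts as a borrow            */
    f.z = r == 0;
    f.n = (r & 0x80) != 0;
    /* Overflow: operand signs differ and the result's sign differs
     * from that of [A]. */
    f.v = (((a ^ m) & (a ^ r)) & 0x80) != 0;
    return f;
}

/* Unsigned (magnitude only) outcomes. */
int is_higher(cmp_flags f)  { return !(f.c || f.z); }
int is_lower(cmp_flags f)   { return f.c; }

/* Signed (2's complement) outcomes. */
int is_greater(cmp_flags f) { return !((f.n ^ f.v) || f.z); }
int is_less(cmp_flags f)    { return f.n ^ f.v; }
```

Comparing 80h with 01h illustrates the difference: as unsigned numbers 128 is Higher than 1, but as signed numbers −128 is Less than +1.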
applied to both Index and Stack Pointer registers as well as 8- and 16-bit Accu-
mulators.
2's complement Branch
All Conditional operations in the 6809 are in the form of a Branch instruction.
These cause the Program Counter to skip xx places forward or backwards; usu-
ally based on the state of the CCR flags. Excluding Branch to SubRoutine (see
Section 5.1), there are 16 Branches provided, which can be considered as the True
or False outcome of eight flag combinations. Thus Branch if Carry Set (BCS)
and Branch if Carry Clear (BCC) are based on the one test (C =?).
If the test is True, the offset following the Branch op-code is added to the
Program Counter. Thus if the Carry flag is zero:
will add 0008h to the Program Counter state E102h to give PC = E10Ah. Note
that the PC is already pointing to the following instruction when execution occurs,
giving an effective destination of ten places on from the Branch location. The
Branch offset is sign extended before addition to the Program Counter; thus if
the N flag is zero:
E100:1 BPL-F8 ; Coded as 2A-F8h
gives PC<-E102h + FFF8h = E0FAh, which is eight places back (six places back
from the Branch itself). With such a single signed-byte offset, the maximum range
is only +129 and −126 bytes from the Branch itself.
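The destination arithmetic can be expressed compactly in C. The sketch below (function name invented) sign extends the offset byte and adds it to the Program Counter, which by then points two bytes past the start of a short Branch.

```c
#include <stdint.h>

/* Destination of a taken short Branch located at branch_address. */
uint16_t short_branch_target(uint16_t branch_address, uint8_t offset)
{
    /* PC already points to the following instruction. */
    uint16_t pc = (uint16_t)(branch_address + 2);
    /* Sign extend: offsets of 80h to FFh represent -128 to -1. */
    int displacement = (offset & 0x80) ? (int)offset - 256 : (int)offset;

    return (uint16_t)(pc + displacement);   /* wraps within 64 kbyte */
}
```

For a Branch at E100h, an offset of 08h gives E10Ah and an offset of F8h gives E0FAh, as in the two examples above.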
Each 6809 Branch has a long equivalent which uses a double-byte offset. Thus
the Conditional Branch:
E100:1:2:3 BCC-100F ; Coded as 10-24-10-0Fh
Table 2.7: (a) The M6809 instruction set (continued next page).
Table 2.7: (b) The M6809 instruction set (continued next page).
Table 2.7 (c) (continued). The M6809 instruction set. Reproduced by courtesy of Motorola Semicon-
ductor Products Ltd.
inform the MPU's Control registers where this data is being held. There are a
few exceptions to this, the so called Inherent operations, such as NOP (No OP-
eration) and RTS (ReTurn from Subroutine). Single-byte instructions whose
operand is a single register, for example INCA (INCrement accumulator A), are
also sometimes classified as Inherent.
With the exception of Inherent instructions, the bytes following the op-code
are either the (constant) operand itself, or more usually a pointer to where the
operand can be found. We have already met the simplest of these, where the
absolute address itself follows, as in:
LDA 2000h ; [A] <- [2000] {Coded as B6-20-00h}
program would take 3072 cycles, whilst the loop equivalent takes considerably
longer at 4867 cycles to execute.
In the remainder of this section, we will look at the 6809 address modes. In
this catalog, op-code may be one or two bytes.
Inherent
op-code
All the operand information is contained in the op-code, with no specific address-
related bytes following. All of the 6809 inherent operations are one byte long
except SoftWare Interrupt 2. An example is NOP (No OPeration). Motorola
also classify most Register-Direct instructions as inherent, for example INCA (IN-
Crement A). Table 2.7 gives the Inherent instructions.
Register Direct, R
op-code post-byte
Immediate, #kk
With Immediate addressing, the byte or bytes following the op-code are constant
data and not a pointer to data. We have used this form of addressing before, in
the array argument routine in Table 2.8. Some examples are:
Absolute, M
Here the op-code is followed by a post-byte 9Fh and then a 16-bit address. This
is not the address of the operand but a pointer to where the operand address is
stored in memory. Thus, if the locations 2000:2001h hold the address 80-08h,
then the instruction:
effectively fetches the data down from 2000h and then 2001h, puts them to-
gether as a 16-bit address and sends this address out on the address bus to fetch
the data into Accumulator_A. Although the location in memory of this pointer
address is absolute, the pointer residing there can be altered as the program
progresses.
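The difference between an absolute and an absolute indirect fetch can be modelled in C, treating the 64 kbyte address space as an array. The names memory, load_absolute and load_absolute_indirect are hypothetical, chosen only for this sketch.

```c
#include <stdint.h>

/* Hypothetical model of the 6809's 64 kbyte address space. */
static uint8_t memory[65536];

/* Absolute: the address following the op-code is that of the operand. */
uint8_t load_absolute(uint16_t address)
{
    return memory[address];
}

/* Absolute Indirect: the address following the post-byte holds a 16-bit
 * pointer (high byte first), and that pointer locates the operand.     */
uint8_t load_absolute_indirect(uint16_t address)
{
    uint16_t pointer = (uint16_t)((memory[address] << 8)
                                 | memory[(uint16_t)(address + 1u)]);
    return memory[pointer];
}
```

With 80-08h stored at 2000:2001h, load_absolute_indirect(0x2000) returns the byte held at 8008h. Changing the two pointer bytes redirects the fetch without altering the instruction.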
As an example, consider the problem of implementing a subroutine (see Chap-
ter 5) which will process in some way the contents of an array of data. Rather
than passing each element of the array to the subroutine it makes sense to send
only the address or pointer to the first element. This can be done by using an
absolute address, say 2000:2001h, to store the pointer prior to jumping to the
subroutine. The subroutine can then use this pointer as a sort of base address
to access any element of the array relative to this location.
As this indirect address is at an absolute location, this address mode is only
slightly more flexible than the ordinary absolute modes. However, indirection
can be used in conjunction with the Indexed addressing modes discussed below.
As in the absolute case, the effective address is in fact only the address of a
pointer to the data and not the data itself.
Branch Relative
op-code offset 8-bit (Short)
We have already discussed this form of address mode in the previous section.
Regular (or short) Branches sign extend the following 8-bit offset, and add this to
the Program Counter. Effectively this means that offsets between 80h and FFh
are treated as negative. For example, the instruction BRA -06, coded as 20-FAh
(FAh is the 2's complement of 06h), is implemented when the PC is at E108h as:
Indexed
The Absolute address modes are used where operands lie in fixed locations. In
many cases, this places an unacceptable restriction on the data structures which
can easily be processed. Compilers, for example, like to pass parameters in a
stack, and these should then be capable of being retrieved in locations relative
to the Stack Pointer. The 6800 MPU has a primitive form of computed effective
address (ea), where this could be up to +FFh (+255) bytes from the contents of
one Index register thus:
LDAA 8,X ; [A] <- [[X]+8]
means that if X is 8000h at the time of execution, then 8008h is the ea of the
data brought down to Accumulator_A. The 6809 has an additional complement of
Index registers (X, Y, S, U and sometimes the PC), as well as an extended repertoire
of offsets. Constant offsets of up to ±2^15 are now possible, and Accumulator_A,
_B or _D can act as a variable offset. In addition, automatic incrementation or
decrementation submodes are possible. A level of indirection is also provided
for most combinations. Table 2.7(c) summarizes the submodes, which are coded
as an op-code followed by a post-byte. Notice that Absolute Indirect is part of
this table, although strictly it is not an Indexed address mode.
As we saw in the listing of Table 2.8(b), indexing comes into its own when stepping
through blocks of memory, arrays and related structures. To avoid having to
follow (or lead) the use of the Index register with an Increment or Decrement,
this mode provides for automatic advance or retard; thus:
LDA ,R+ ; Bring down data byte and then increment Index register R
LDA ,-R ; Decrement Index register R and then bring down data byte
LDA ,R++ ; Bring down data byte and then increment Index register R twice
LDA ,--R ; Decrement Index register R twice and then bring down data byte
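These submodes mirror the familiar pointer idioms of C, which is one reason why they suit compiled code. The sketch below is illustrative only: the function names are ours, and whether a given compiler actually emits the auto-increment forms depends on the compiler.

```c
#include <stdint.h>

/* Walk forward through a block: *p++ fetches and then advances the
 * pointer, the behaviour of LDA ,R+                                 */
uint16_t sum_forward(const uint8_t *p, unsigned count)
{
    uint16_t total = 0;
    while (count--)
        total = (uint16_t)(total + *p++);
    return total;
}

/* Walk backward through two blocks: *--p retards the pointer and then
 * accesses memory, the behaviour of LDA ,-R (and STA ,-R)            */
void copy_backward(uint8_t *dst_end, const uint8_t *src_end, unsigned count)
{
    while (count--)
        *--dst_end = *--src_end;
}
```

Both end pointers passed to copy_backward point one byte beyond the last element, just as a Pre-Decrement Index register would.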
op-code post-byte
Note that the value of the 8-bit Accumulator is sign extended before the addition,
giving a range of +127 to −128. Thus if B is FEh, then FF FEh is added to the X Index
register in the first example above to give the effective address. Of course, FFFEh
is effectively −2, so the target memory location is actually X − 2. If this is not
desirable, Accumulator_A may be cleared and D used as the offset, e.g.:
CLRA
LDA D,X ;[A] <- [00|[B]+[X]]
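The effect of the sign extension on the effective address can be checked with a small C model. The two function names are hypothetical; the first models an 8-bit offset in B, the second the CLRA followed by D,X alternative.

```c
#include <stdint.h>

/* ea for the B,X submode: B is sign extended to 16 bits first. */
uint16_t ea_b_offset(uint16_t x, uint8_t b)
{
    uint16_t extended = (b & 0x80u) ? (uint16_t)(0xFF00u | b)
                                    : (uint16_t)b;
    return (uint16_t)(x + extended);
}

/* ea for CLRA followed by the D,X submode: D = 00|B, so the offset
 * is always treated as a positive number 0..255.                  */
uint16_t ea_d_offset_a_cleared(uint16_t x, uint8_t b)
{
    return (uint16_t)(x + b);
}
```

With X at 8000h and B holding FEh, the first function gives 7FFEh (X − 2), whereas the second gives 80FEh (X + 254).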
The first instruction puts the absolute address of the first table element (E206h)
in the X Index register. The effective address calculated in the following instruc-
tion is B + X. If, say, B is 04h on entry, then this gives 0004 + E206 = E20Ah.
The data in here is 4Ch, and this is the value loaded into Accumulator_B. Notice
the assembler directive .BYTE, which states that the following bytes are to be put
into memory verbatim; that is not to be interpreted as instruction mnemonics.
One of the major advantages of the Relative address mode is that it produces
position independent code (PIC). Thus a Branch is relative to where the program
is at the time the decision is taken. If the program is moved to a different part
of memory, all the offsets move with it unchanged. This is what differentiates
a Branch from a Jump operation. The Program Counter Offset mode extends
the PIC capability to any instruction which has an Indexed address mode. This is
similar to the Constant Offset from Register mode, but with the Program Counter
being the Index register. For example in:
LDA 200h,PC ;[A] <- [200+[PC]]
the data 200h bytes on from where the PC is on execution (pointing to the fol-
lowing instruction) is placed in Accumulator_A. This of course is not an absolute
address, as only the distance from the instruction is of interest. PIC is especially
suitable for code in ROM (i.e. firmware) which can be placed anywhere in the ad-
dress space. Thus a vendor could sell a ROM-based floating-point package with
no a priori knowledge of where the customer will locate the firmware in memory.
As an example of this, consider the 7-segment decoder routine previously dis-
cussed. Line 1 of the actual code (shown second column from the left) contained
the bytes E2-06h, which is the absolute location of the table bottom. If, say, the
table of data was to start at C180h, then the ROM would have to be reprogrammed
to make these two bytes C1-80h, the rest of the code remaining unaltered. Here
is a PIC version of the same routine:
C102/3/4 30-8C-03 LEAX 3,PC
;Effective address PC+3 is loaded into X, which then points to the table
C105/6 E6-85 LDB B,X ; Get element [B]
C107 39 RTS
still produces the same code 30-8C-03h; that is the label TABLE_BOT is interpreted
by the assembler as the distance from the following instruction to the absolute
address TABLE_BOT and not the absolute value C108h.
We first met the Load Effective Address (LEA) instruction in Table 2.2. Here
we observed that it could be used to perform simple arithmetic on the X, Y, U or S
registers. Essentially, any effective address computed by any of the Direct Index
address modes, except Post-Increment/Pre-Decrement, can be loaded into one of
these four registers. A few examples are:
instruction mnemonic, the operand (if any) and an optional comment. Some as-
semblers require all fields to be present in spirit, their absence being signalled
by spaces or tabs. The Real Time Systems XA8 cross assembler¹ used here has a
free format, where absent fields can simply be omitted. The only essential role
of space is in separating the instruction mnemonic from its operand. However,
as the following code fragment shows, spaces and tabs should be used for read-
ability:
or
The latter source code is obviously more pleasing to the eye. Notice that lines 1
and 2 have no label, line 2 no comment and line 3 no operand field.
Looking at the syntax in more detail:
Labels
These are defined in the first field and should be delineated by a colon. The colon
is omitted when the label is referred to in the operand field. The label takes on
the value of the Program Counter pointing to the first instruction byte. Labels
can be up to 15 meaningful alphanumeric (including _ and .) characters long,
and should not start with a numeral.
Operator mnemonics
These are the standard manufacturer's mnemonics, with a few minor extensions.
There must be an entry in this field.
Operand
This may be a label, defined name, address or data constant. Numbers may be
in decimal, hexadecimal, octal, binary or ASCII. Thus the following all translate
to the same machine code:
LDA #43h ; Codes as 86-43h. Use a 0 prefix if MSD is alpha, e.g. 0F6h
LDA #67 ; Codes as 86-43h. Decimal 67 is 43 hex
LDA #01000011b ; Codes as 86-43h. Binary 01000011 is 43 hex
LDA #103o ; Codes as 86-43h. Octal 103 is 43 hex
LDA #'C' ; Codes as 86-43h. ASCII 'C' is 43 hex
1 Real Time Systems, M & G House, Head Road, Douglas, Isle of Man, British Isles; Intermetrics
Microsystems Software Inc., 733 Concord Avenue, Cambridge MA 02138, USA.; Whitesmiths Aus-
tralia Pty Ltd. PO Box 756, Suite 3, 47 Regent Street, Kogarah NSW 2217, Australia; COSMIC SARL,
33 rue Le Corbusier, EUROPARC CRETEIL, 94035 CRETEIL CEDEX, France and ADaC, Nihon Seimei
Otsuka Bldg., No. 13-4 Kita Otsuka 1-chome, Toshima-Ku, Tokyo 170 Japan.
but the use of the appropriate form aids in readability and thus documentation.
Mathematical expressions can be used to generate a constant at assembly time,
thus:
Comment
The final field is simply a documentation comment, delimited by a semicolon ;.
Whole-line comments are possible with an initial semicolon. Some assemblers
use an asterisk * to delimit comments.
Some of the more common assembler directives, all of which are distinguished
by a leading period, are:
.PROCESSOR
The first line of source code must indicate which processor is being targeted, e.g.:
.processor m6809
.END
The last line of source code must be .end.
.DEFINE
This gives a permanent value to a symbol. For example:
the source file by simply altering this header. The mnemonic EQU (EQUate) is
frequently used in other assemblers to perform the same function; see page 180.
.INCLUDE
Source code in separate files can be included for assembly by using this directive,
for example:
.PSECT
A useful feature of this assembler is the ability to delineate sections of the source
program to produce code in different memory areas. Thus program code and
fixed constants can be assigned to area _text which the linker can place in mem-
ory occupied by ROM, whilst section _data can be used for variable data destined
for RAM. An example of the use of .psect is given in Table 2.12.
.ORG
The assembler used here is configured to be relocatable, that is absolute ad-
dresses are not assigned until link time (see Section 7.2). The .ORG function
is normally used in an absolute assembler (one in which absolute locations are
assigned at assembly time) to denote where the code commences. In the RTS as-
sembler .ORG can be used in a relocatable manner relative to a label, for example:
Assuming that the section _text is linked to 0E000h, then the code at RVECTOR
is commanded to be placed in 0E000h + 1FFEh = 0FFFEh.
Statements such as this have to be used with caution where the program is
blasted into ROM. Constants can be located in ROM (e.g. .psect _text), but not
in RAM (e.g. .psect _data). This is because there is no download of code prior
to the run, and the contents of volatile memory are unpredictable on power up. Care must be taken
when using a simulator to debug such programs, as this data is downloaded into
RAM from the assembled machine code file and will then appear to be available
at start-up.
The algorithm used in Table 2.9 simply clears the initial total, temporarily
located in the X Index register, and adds to it the progressively decrementing in-
teger, kept in Accumulator_B. When B reaches zero, the grand total is transferred
to Accumulator_D for return. The instruction Add B to X (ABX) is a convenient
vehicle to add the 8-bit integer to the 16-bit partial summation. Without it, n
would have to be unsigned promoted to 16-bits by zeroing Accumulator_A and
then the instruction LEAX D,X used for the addition.
The source-code file is translated by the assembler program to produce a
machine-code file, which will eventually find its way into program memory. An
absolute listing file is also generated, which documents the machine code and its
location together with the original source code. The listing of Table 2.10 shows
the outcome of the translation, with the line number, location and machine code
occupying the leftmost three columns. This type of file is often referred to as
object code. The absolute location of the machine code is decided by the linker-
locator program, as described in Section 7.2. All 6809-based programs in this text
assume ROM from E000h upwards for the program sections designated _text,
and RAM from 0000h upwards for the _data sections. Only _text is needed in
this case.
The program of Table 2.10 is 12 bytes long and takes 16 + 13n cycles (max-
imum 3331). An alternative algorithm recognizes that the total is given by the
expression n × (n + 1) ÷ 2. In Table 2.11 this is implemented by copying n into
Accumulator_A, incrementing it, multiplying the two Accumulators and doing a
single double-byte shift right (i.e. ÷ 2). Only six bytes long and executing in a
fixed 28 cycles, this illustrates that time taken in refining the problem algorithm
can be profitable. However, there is a bug in this implementation, with one value
of n giving an erroneous zero answer. Can you determine which, and recode to
avoid this problem?
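One way of hunting for the rogue value is to model both algorithms in C using 8-bit and 16-bit types of the same width as the 6809 registers, and then compare them for every n. The sketch below is our own model, not a translation of Tables 2.9 and 2.11, and all three function names are hypothetical.

```c
#include <stdint.h>

/* Loop algorithm: the total is kept in a 16-bit variable (the X Index
 * register) and the decrementing integer in an 8-bit one (B).        */
uint16_t sum_by_loop(uint8_t n)
{
    uint16_t total = 0;
    while (n != 0) {
        total = (uint16_t)(total + n);   /* as ABX  */
        n--;                             /* as DECB */
    }
    return total;
}

/* Formula algorithm, keeping to the register widths of the 6809. */
uint16_t sum_by_formula_8bit(uint8_t n)
{
    uint8_t  a = (uint8_t)(n + 1u);               /* 8-bit increment, as INCA */
    uint16_t d = (uint16_t)((unsigned)a * n);     /* 8 x 8 multiply, as MUL   */
    return (uint16_t)(d >> 1);                    /* 16-bit shift right       */
}

/* Returns the first n for which the two disagree, or -1 if none. */
int first_mismatch(void)
{
    for (unsigned n = 0; n <= 255u; n++)
        if (sum_by_loop((uint8_t)n) != sum_by_formula_8bit((uint8_t)n))
            return (int)n;
    return -1;
}
```

Running first_mismatch() on a host computer identifies the value of n in question; examining the intermediate variable a for that value suggests the remedy.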
another program module, the resulting code will be acceptably short. In any
case, in the absence of a hardware divide operation in the 6809, execution time
is likely to be long.
An alternative algorithm, which is especially suitable for small numbers, is
illustrated in Fig. 2.4. Essentially the nth-decade digit is evaluated as the number
of successful subtractions by 10^n, where n begins at the highest possible value,
and is decremented towards zero after each decade evaluation. As the maximum
value for a 16-bit binary number is 65,535, this requires subtraction by 10,000,
1000, 100, 10 and 1. With the procedure being the same for each decade, it is
easier to store the constants as a table in ROM and use a loop with an advancing
pointer to select the decade and its corresponding table entry. This look-up table
is shown in the listing of Table 2.12 in line 43. Notice the additional zero word
at the end of the table; this is used to provide an escape mechanism after the
decade passes 10^0.
The actual subtraction of 10^n is performed in line 23, with the X Index register
pointing into the table of powers. If no borrow is generated (C = 0), the byte
holding the nth string character (initialized to ASCII '0' = 30h in lines 18 – 21)
is incremented and the process repeated (lines 25 – 28). On emerging from this
inner (decade) loop, the 10^n constant is added back to compensate for the one
subtraction too many. As line 30 uses the Double-Increment Index address mode
(ADDD ,X++), the table pointer is simultaneously advanced one word. LEAY 1,Y
then increments the string pointer (the Y Index register) one byte, and the scene
is set for the next decade evaluation. Before returning to the top of this outer
loop, the escape condition (i.e. NULL) must be tested. There is no instruction
to test the zero state of a double memory location; instead an unused double
register is loaded with the word data (LDU 0,X in line 34) and the Z flag will
be set accordingly. An alternative escape procedure would be to decrement a
count on each loop pass or simply to check the table pointer for 0E030h (e.g.
CMPX #PWR_10+10). Using a special terminate character is better where the length
Table 2.12 Object code for the conversion of 16-bit binary to an equivalent ASCII-coded decimal
string.
1 .processor m6809
2 ; *****************************************************************
3 ; * Converts 16-bit binary to a string of five ASCII-coded *
4 ; * characters terminated by 00 (NULL) *
5 ; * EXAMPLE : FFFF -> '6''5''5''3''5''0' (36/35/35/33/35/00h) *
6 ; * ENTRY : Binary word in D *
7 ; * EXIT : Decimal string in 6 RAM bytes starting from DEC_STRG*
8 ; * EXIT : All register contents unchanged *
9 ; *****************************************************************
10 .list +.text
11 .define NUL = 0000
12 .psect _text
13 E000 3476 BIN_2_DEC: pshs a,b,x,y,u ; Save pointer registers used
14 ; N=4
15 E002 308C21 leax PWR_10,pc ; Point to table bottom (10^4)
16 E005 108E0000 ldy #DEC_STRG ; Point to beginning of string in RAM
17 ; Nth decade = '0'
18 E009 1F03 NEW_N: tfr d,u ; Put away binary for safekeeping
19 E00B 8630 lda #'0' ; Put ASCII '0' in nth decade of string
20 E00D A7A4 sta 0,y
21 E00F 1F30 tfr u,d ; Get binary back
22 ; Binary - 10**N
23 E011 A384 NEXT_SUBT: subd 0,x
24 ; Can do?
25 E013 2504 bcs NEXT_DEC ; A Carry/borrow means No
26 ; IF Yes THEN increment Nth decade
27 E015 6CA4 inc 0,y
28 E017 20F8 bra NEXT_SUBT
29 ; ELSE restore 10**N to binary
30 E019 E381 NEXT_DEC: addd ,x++
31 ; N = N -1
32 E01B 3121 leay 1,y ; Advance one decade
33 ; N < 0?
34 E01D EE84 ldu 0,x ; Look for double-byte NULL in table
35 ; No
36 E01F 26E8 bne NEW_N
37 ; Yes
38 E021 6FA4 clr 0,y ; IF Yes terminate the string
39 ; End
40 E023 3576 puls a,b,x,y,u ; Return old register values
41 E025 39 rts ; Omit if above is puls a,b,x,y,u,pc!
; ****************************************************************
42 ; This is the table of powers of 10
43 E026 2710 PWR_10: .word 10000,1000,100,10,1,NUL
03E8
0064
000A
0001
0000
; ****************************************************************
44 ; This is the area of RAM where the number string is to be returned
45 .psect _data
46 0000 DEC_STRG: .byte [6] ; Reserve six memory bytes for string
47 .end
of the table can vary, and is the normal approach to character strings, as is spec-
ified in this example (line 38).
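For comparison, the same repeated-subtraction algorithm can be sketched in C. This is a minimal version that assumes the same zero-terminated table of powers; it tests before subtracting, rather than subtracting once too often and adding back as the listing does. The lower-case names echo the labels of Table 2.12 but are otherwise our own.

```c
#include <stdint.h>

/* Table of powers of ten, terminated by a zero word. */
static const uint16_t pwr_10[] = { 10000, 1000, 100, 10, 1, 0 };

/* Converts a 16-bit binary number to five ASCII digits plus NULL. */
void bin_2_dec(uint16_t value, char dec_strg[6])
{
    const uint16_t *power = pwr_10;    /* role of the X Index register */
    char *digit = dec_strg;            /* role of the Y Index register */

    do {
        *digit = '0';                  /* nth decade = '0'             */
        while (value >= *power) {      /* subtraction can be done?     */
            value = (uint16_t)(value - *power);
            ++*digit;                  /* increment nth decade         */
        }
        ++power;                       /* next table entry             */
        ++digit;                       /* advance one decade           */
    } while (*power != 0);             /* zero word is the escape      */

    *digit = '\0';                     /* terminate the string         */
}
```

As in the assembly-level version, leading zeros are retained, so 1234 is returned as the string 01234.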
None of the MPU's registers are altered by this subroutine, except the Code
Condition register. A subroutine with this property is known as transparent. This
is achieved by pushing the used registers onto the System stack at the beginning
(line 13) and restoring them at the end (line 40). In general the number of Push
and Pull operations should match to ensure that the System Stack Pointer is back
up to the return Program Counter, which was shoved out automatically when the
subroutine was called. Thus ReTurn from Subroutine (RTS) will then be able
to retrieve the original PC as required. One trick sometimes seen is to add the PC
to the last PULS, which of course does the same thing; thus:
PULS A,B,X,Y,U,PC
is the same as
PULS A,B,X,Y,U
RTS
The two pointers, X to the table and Y to the string, are set up just after
the initial Push. The table pointer is set up in line 15 using the Program Counter
Relative address mode, LEAX PWR_10,PC. Looking at the machine code produced
(namely 30-8C-21h), shows an operand of 21h, being the distance between the
PC (pointing at execution time to the following instruction at 0E005h) and the
start of the table at 0E026h. This relative operand ensures that no matter where
the program/table ROM is placed in address space, the code need not be altered.
This code is strictly speaking not position independent, as the string is in a fixed
location in the _data program section, that is in RAM. If DEC_STRG is the first
occurrence of .psect _data, then our linker will place the string at locations
0000h to 0005h. Thus the code in line 16 for LDY #DEC_STRG is 108E-0000h.
We could use the Program Counter Relative mode here (i.e. LEAY DEC_STRG,PC)
but this would mean that the address distance between the ROM and RAM chips
would have to remain constant, and they could not be independently relocated:
not very convenient.
Our last example also has a mathematical flavor. We are required to calculate
the factorial of an integer n passed in Accumulator_B. The factorial of n (repre-
sented as n!) is defined as n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1. By convention
0! is defined as 1 [5].
Superficially this appears to be the same as our first example, but with multi-
plication replacing addition, see Fig. 2.5. However, the product rapidly becomes
very large, with 12! = 479,001,600 being the largest factorial fitting into a 32-bit
binary number. Thus we will restrict n to the range 0 – 12, and will have to return
n! in four memory bytes, as no 6809 register of this size is available (although it
could be returned in two pieces using, for example, the X and Y Index registers).
Furthermore, we will use Accumulator_B to return an error status byte of FFh if
the programmer sent an out of range integer (n > 12), otherwise 00h indicating
success.
Our first problem is that product generation is a 4-byte long-word process,
whilst the 6809 can only perform an 8 × 8 multiplication. Thus our requirement
From the above discussion, we see that the addition process is different at
each position, as the 16-bit result from the multiplication `slides' from right
to left. This is a pity, as otherwise the four multiply/add steps are the same.
This inefficiency can be circumvented by allowing three buffer temporary bytes,
as shown dashed at the top of Fig. 2.6. This allows us to put the multiply/add
process in a loop ensuring that no other data object is inadvertently altered by the
slide leftward. In lines 36 – 48 of this loop, the X Index register is used to point to
both the relevant product byte (line 37) and, with offset, to the temporary addition
target bytes (lines 39 – 46). When the multiplication is over, the result becomes
the new product (lines 51 – 56). n is decremented in situ on the System stack,
using the System Stack Pointer as an Index register (line 56), and the process
continued until n = 1 (line 28). On exit Accumulator_B is cleared to indicate
success (line 58), unless n is >12 on entry, in which case FFh is put into B (line 17),
and an immediate exit made. Notice how four bytes for the product and seven
temporary locations are reserved in the data program section (RAM) in lines 65
and 66.
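The principle of building a 4-byte product from 8 × 8 multiplications can be sketched in C. Note that this is not the arrangement of the assembly listing: instead of sliding each 16-bit partial result into temporary bytes, the sketch carries the upper byte of each partial result forward into the next position. The function names are hypothetical.

```c
#include <stdint.h>

/* Multiplies a 4-byte number (most significant byte first, as the 6809
 * stores multiple-byte data) by an 8-bit n, using only 8 x 8
 * multiplications, each yielding a 16-bit partial result.            */
void mul32_by_8(uint8_t product[4], uint8_t n)
{
    uint16_t carry = 0;
    for (int i = 3; i >= 0; i--) {           /* least significant first */
        /* 255 x 255 + 255 = 65280, so 16 bits always suffice. */
        uint16_t partial = (uint16_t)((unsigned)product[i] * n + carry);
        product[i] = (uint8_t)(partial & 0xFFu);
        carry = (uint16_t)(partial >> 8);    /* slides one byte leftward */
    }
}

/* Returns 00h with n! in product[], or FFh if n is out of range. */
uint8_t factorial_bytes(uint8_t n, uint8_t product[4])
{
    if (n > 12)
        return 0xFF;
    product[0] = product[1] = product[2] = 0;
    product[3] = 1;                          /* 0! = 1! = 1 */
    while (n > 1) {
        mul32_by_8(product, n);
        n--;
    }
    return 0x00;
}
```

For n = 12 the four bytes returned are 1C-8C-FC-00h, which is 479,001,600.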
As there are only 13 legitimate outcomes of the program for n = 0 → 12, a
more efficient technique is to use a look-up table. The coding for this approach
is shown in Table 2.14. Basically, the X Index register is pointed to the bottom of
TABLE (line 22) and n (stored in Accumulator_B) is used as an offset to point into
the relevant area. As each table entry occupies four bytes, B must be multiplied
by four (by shifting twice left in lines 20 and 21), so that it goes up in 4-byte steps.
The operation Load Effective Address into X with the address mode B,X points
X to the entry in line 23 (the maximum value of B is 48, thus its sign extension
inherent with this address mode will have no deleterious effect). Now the high
word can be moved from the table to 2 bytes of memory via Index register_Y
(lines 24 and 25). As the Indexed with Post Double Increment address mode
is used, X will automatically point to the lower word, for a repeat performance
(lines 26 and 27).
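The look-up approach is equally compact in C, where indexing an array of 32-bit elements implies the same scaling of n by four. This is a minimal sketch with hypothetical names, returning the error status in the same manner as the assembly-level routine.

```c
#include <stdint.h>

/* The 13 legitimate outcomes, 0! to 12!. */
static const uint32_t factorial_table[13] = {
    1UL, 1UL, 2UL, 6UL, 24UL, 120UL, 720UL, 5040UL, 40320UL,
    362880UL, 3628800UL, 39916800UL, 479001600UL
};

/* Returns 00h with n! in *result, or FFh (leaving *result untouched)
 * if n is out of range.                                              */
uint8_t factorial_lookup(uint8_t n, uint32_t *result)
{
    if (n > 12)
        return 0xFF;
    *result = factorial_table[n];   /* element address = table + 4 x n */
    return 0x00;
}
```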
The coding shows the assembler directive .DOUBLE being used for the first
eight table entries and .WORD twice for each of the remaining entries. This is
deliberate, as the assembler used here has a bug which gives incorrect values
for .DOUBLE above 32,767 (00007FFFh). Assemblers, as all other software, are
not immune to bugs! See Table 4.14 for a look-up table using .DOUBLE for this
situation.
It is interesting to compare the performance of the two implementations. The
former mathematical algorithm requires 96 bytes of ROM and 11 of RAM. Its
operation time varies with n, from 53 cycles with n = 0 or 1 to 1724 cycles with
n = 12. The tabular approach takes 84 bytes of ROM and 4 of RAM, and takes a
fixed 42 cycles for n between 0 and 12. In both cases an error situation requires
30 cycles. The conclusion is obvious.
References
[1] Ritter, T. and Boney, J.; Preliminary Detailed Description MC6809, Motorola Bulletin
055, March 1978.
[2] Ritter, T. and Boney, J.; A Microprocessor for the Revolution: The 6809, BYTE, 4, part 1,
no. 1, Jan. 1979, pp. 14 – 42.
[3] Bartee, T.C.; Digital Computer Fundamentals, 5th ed., McGraw-Hill, 1981, Section 6.16.
[4] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987, Section 1.2.
[5] Dorn, W.S. and McCracken, D.D.; Introductory Finite Mathematics with Computing,
Wiley, 1976, Section 8.3.
CHAPTER 3
ics. In general the 68000 family is found in the more powerful personal comput-
ers, such as the APPLE Macintosh, as well as graphic workstations such as the
Hewlett Packard Apollo DN series.
All this growth in raw power has made the microcomputer at least as powerful
as a minicomputer from the last decade, but there has also been a spin-off into
the area of embedded microprocessor circuitry, with which we are concerned in
this text. Although the current 8-bit microprocessors are adequate in the majority
of embedded applications, either singly or in multiple-processor configurations,
many of the more powerful tasks are being implemented using these
newer devices. This is not necessarily due to their virtues, but because more aids
to hardware and software design, which have appeared in the last decade, have
been targeted in this direction. This is especially true in the field of compiler and
simulator work.
The 68000/8 is the second of the MPUs we have chosen to illustrate high-level
language techniques. This and the following chapter give an overview of its
hardware and software features.
THE MILL
A 16-bit ALU implements in hardware the arithmetic operations of Addition,
Subtraction, Multiplication and Division; the first two with and without carry/borrow
and the last two in signed and unsigned representations. The logic operations of
AND, OR, Exclusive-OR, NOT and Shift are also provided.
Five flags in the associated Code Condition register (CCR) provide a status
report on ALU activity. The Carry, Negative, Zero and 2's complement oVerflow
semaphores are standard, but the eXtend flag needs some explanation. The X flag
is similar to Carry, but is only affected by Addition, Subtraction, Negate and cer-
tain Shift operations. Multiple-precision versions of these instructions use the
X flag for their carry; thus the familiar ADd with Carry (ADC) instruction ap-
pears here as ADD with eXtend (ADDX). For example, this means that a Compare
operation, which of course affects the C flag, can be done in between multiple-
precision operations without affecting the `true' carry information (which is in
X).
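The benefit of a separate eXtend flag can be illustrated by a small C model. This is a sketch only: the structure and function names are hypothetical, only the X and C flags are modelled, and the remaining CCR flags are ignored.

```c
#include <stdint.h>

struct ccr_flags {
    unsigned x;    /* eXtend flag */
    unsigned c;    /* Carry flag  */
};

/* Word-sized ADDX: adds the two operands plus X, and sets X and C alike. */
uint16_t addx16(uint16_t a, uint16_t b, struct ccr_flags *f)
{
    uint32_t sum = (uint32_t)a + b + f->x;
    f->c = f->x = (unsigned)((sum >> 16) & 1u);
    return (uint16_t)sum;
}

/* Word-sized Compare: C reflects the borrow from destination minus
 * source, but X is left alone.                                       */
void cmp16(uint16_t destination, uint16_t source, struct ccr_flags *f)
{
    f->c = (destination < source) ? 1u : 0u;
}
```

Adding 0001 FFFFh to 0000 0001h as two word-sized steps, a call to cmp16() between the low-word and high-word additions may alter C, but the carry out of the low word survives in X, so the high word is still correct at 0002h.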
We shall see that the 68000 MPU directly operates on byte (8-bit), word (16-bit)
or long-word (32-bit) data. All CCR flags operate correctly (e.g. Carry from bit 7,
bit 15 or bit 31 respectively), automatically reflecting the operand size.
As shown, the Code Condition register occupies the lower byte of the 16-
bit Status register; the upper field containing masks and bits which control the
Status register (16 bits)
Control register (upper byte):
  T         Trace on/off
  S         Supervisor/User state
  I2 I1 I0  Interrupt Mask Priority Level
Code Condition register (lower byte):
  X         eXtend carry
  N         Negative (MSB=1)
  Z         Zero outcome
  V         oVerflow (2's complement)
  C         Carry/Borrow
operating mode of the processor. The three bits I2 I1 I0 represent the Interrupt
mask. The MPU will only respond to an interrupt request signalled externally
on pins IPL2 IPL1 IPL0 (IPL stands for Interrupt Priority Level) if this active-low
IPL number is above the mask number. For example, an IPL number Low High
Low (active-low 5) will trigger a level-5 request (IRQ5 in Fig. 3.1), if the mask is
set at between 000 and 100. The exception is a level-7 request, which is non-
maskable. More details are given in Section 6.1. The mask is set to level 7 at
Reset, thus inhibiting all but a non-maskable interrupt.
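The acceptance rule just described can be expressed as a short C predicate. This is a simplified model under our own naming; it ignores timing and the edge-sensitive nature of a real level-7 request.

```c
#include <stdint.h>
#include <stdbool.h>

/* ipl_pins        : logic levels on IPL2 IPL1 IPL0 in bits 2..0 (active low).
 * status_register : 16-bit Status register, mask I2 I1 I0 in bits 10..8.    */
bool interrupt_accepted(uint8_t ipl_pins, uint16_t status_register)
{
    unsigned level = (unsigned)(~ipl_pins) & 7u;        /* active-low number */
    unsigned mask  = (unsigned)(status_register >> 8) & 7u;

    /* Level 7 is non-maskable; otherwise the request must exceed the mask.
     * Level 0 (all pins High) means no request and never succeeds.        */
    return level == 7u || level > mask;
}
```

Pins at Low High Low (binary 010) give level 5, which is accepted with the mask at 100 but rejected with the mask at 101.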
The 68000 MPU leads a Jekyll and Hyde existence, in that it has two operating
states, which are virtually independent of each other. These are somewhat
more prosaically termed the Supervisor and User states. When the MPU is Reset,
the S bit in the Status register is automatically set to 1, the Supervisor state.
Certain so-called privileged instructions can only be executed in this state.
These instructions generally deal with the overall operation of the processor. For
example, it is only possible to change the Interrupt mask in the Supervisor state;
thus:
sets the mask to level 4. Moving data into the Status register is a privileged
instruction (but not reading it).
The only way to exit the Supervisor state is to clear the S bit, for example:
will clear bit 13 of the Status register and leave all else unchanged. As you might
expect AND Immediately to Status Register is privileged, as is ORI #data,SR
(to set individual bits) and EORI #data,SR (to toggle individual bits).
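The bit patterns involved in these operations on the Status register can be worked through in C. The sketch below only models the 16-bit value; on a real 68000 the register is reached through the privileged instructions named above, and the macro and function names here are our own.

```c
#include <stdint.h>

#define SR_TRACE       0x8000u   /* bit 15        */
#define SR_SUPERVISOR  0x2000u   /* bit 13        */
#define SR_INT_MASK    0x0700u   /* bits 10 to 8  */

/* Equivalent of ANDing the Status register with a mask having only
 * bit 13 clear: the S bit is cleared and all else left unchanged.  */
uint16_t sr_clear_supervisor(uint16_t sr)
{
    return (uint16_t)(sr & ~SR_SUPERVISOR);
}

/* Replaces the Interrupt mask field I2 I1 I0 with the given level. */
uint16_t sr_set_int_mask(uint16_t sr, unsigned level)
{
    return (uint16_t)((sr & ~SR_INT_MASK) | ((level & 7u) << 8));
}
```

Starting from 2700h (Supervisor state, mask at level 7), sr_set_int_mask(sr, 4) yields 2400h and sr_clear_supervisor(sr) yields 0700h.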
Once in the User state you cannot return to the Supervisor state by simply
setting the S bit, as the MOVE, ANDI, ORI and EORI #data,SR instructions are
illegal in this situation. Note, however, that the same instructions targeted to
the CCR part of the Status register are perfectly legal; for instance:
ORI #00000001b,CCR
sets the Carry flag. The only way back to the Supervisor state is when an inter-
rupt or Trap occurs (a Trap is a type of Software interrupt, and is described in
Chapter 6).
What is the point in having two distinct states? In a multitasking environment
(more than one program running concurrently on the same machine) it is usual
to have a master program, known as the operating system. The operating system
provides resources to the user program, such as an interface to a magnetic disk
store. Where more than one user program appears to run simultaneously, it may
switch between these programs in a time-slice manner in a fairly complex way [4].
As a simple example, consider a microprocessor development system to which
software can be downloaded into RAM, whence it can be run and tested. The
operating system, here called a monitor, usually resides in ROM. Once control is
passed from the monitor to the user program running in real time, the only way
back to the operating system is via a Software interrupt, Hardware interrupt or
Reset. In all these situations it is important to ensure that user programs do not
corrupt memory or other resources used by the operating system.
In the 68000 MPU, this operating system runs in the Supervisor state, into
which it enters automatically on Reset. The MPU informs the outside world which
mode it is running by using the three Function Code pins FC2 FC1 FC0, as detailed
in Section 3.2. Thus the hardware engineer can design the address decoder to
access Supervisor ROM and RAM chips in an entirely separate address space from
that accessible to the User program. Furthermore, the Supervisor and User modes
have separate System Stack Pointers, the Supervisor Stack Pointer (SSP) and User
Stack Pointer (USP). Thus, in reality there are two A7 registers, only one of which
is active in any mode. Both separate and mutually exclusive memory spaces
and System Stack Pointers make it difficult for the user program to accidentally
corrupt the operating software.
In small dedicated embedded systems there often is no operating system as
a separate entity. In such naked cases, it is normal to stay in the Supervisor
state and ignore the existence of the User state. We will do this for our project
in Part 3. However, the security of two distinct states is important for the
reliable operation of more sophisticated embedded systems, especially where an
extensive interrupt driven configuration is being used.
Finally, bit 15 of the Status register is the Trace bit. When set to 1, a Software
interrupt/Trap will occur at the end of each instruction execution. This can be
used in conjunction with a suitable operating system routine to print out infor-
mation, such as the register contents after each step of the program [5]. The
Trace bit is turned off on Reset.
REGISTER ARRAY
As in all microprocessors, the 68000 has a Program Counter (PC) which essentially
points to the next instruction to be fetched. With this MPU, the situation is a
little more complex. This is because use is made of time when the external buses
would normally be idle, to bring down words from program memory into a 2-word
prefetch queue buffer [1]. For example, when a Branch is executed, both the next
instruction and the Branch-to op-code will already be in the buffer. Which one
is executed will depend on the outcome of the condition test. Like most of the
registers, the PC is 32 bits wide, but in the basic 68000 MPU only the lower 24 bits
have any connection to the external address bus.
Two arrays of eight 32-bit registers are of major concern to the programmer.
These are functionally divided into Data and Address registers. Data registers
provide the source and/or the destination data for most operations. The Address
registers hold pointers to data stored outside the MPU, in memory. Motorola
have made a considerable effort to ensure that these registers behave in a
consistent and regular manner (they use the term orthogonal); for example any-
thing that can be done on D0 can also be done in exactly the same manner on,
say, D7. However, they have made a clear distinction between registers holding
operational data (Data registers) and those used to compute addresses (Address
registers).
The eight Data registers are the equivalent of the one or two Accumulator registers
found in most 8-bit MPUs. Most instructions use at least one Data register
to hold a source or destination operand; for example:
ADD.L [ea],D0 ; [D0] <- [D0] + [ea]
adds the 32-bit long operand at some effective address (ea) to D0, answer in D0.
ADD.L D1,[ea] ; [ea] <- [ea] + [D1]
Any bits outside the target field remain unchanged. I have used the notation
D2(n:m) as meaning bits n through m of Data register 2. Most instructions acting
on Data registers come in all three size varieties, indicated to the assembler by
using the extensions .B (for byte), .W (for word, the default) and .L (for long-word).
Two bits in the op-code word are used to represent the size, as shown in Fig. 4.4.
There are also a few instructions which can affect any bit in a Data register; for
example:
BSET #12,D4 ; Sets bit 12 of D4.L high, the rest unchanged.
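The rule that bits outside the target field are left alone can be sketched in C. This is only an illustrative model and not part of any 68000 tool: the names `add_sized`, `bset` and `enum op_size` are hypothetical, and the effect on the condition-code flags is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the .B, .W and .L operand sizes (field width in bits). */
enum op_size { SIZE_B = 8, SIZE_W = 16, SIZE_L = 32 };

/* Add src to the low 'size' bits of reg; bits above the field are preserved. */
uint32_t add_sized(uint32_t reg, uint32_t src, enum op_size size)
{
    uint32_t mask = (size == SIZE_L) ? 0xFFFFFFFFu : ((1u << size) - 1u);
    uint32_t sum  = (reg + src) & mask;   /* a carry out of the field is dropped */
    return (reg & ~mask) | sum;
}

/* Model of BSET #n,Dx: set bit n (0 to 31) and leave the rest unchanged. */
uint32_t bset(uint32_t reg, unsigned n)
{
    assert(n < 32u);
    return reg | ((uint32_t)1 << n);
}
```

For instance, a byte-sized add of 1 to a register holding 123456FFh wraps only the lowest byte, leaving the upper 24 bits as they were.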
In order to make it difficult to use an Address register for anything other than
its legitimate role, only a small range of special instructions can be used to alter
their contents. For example, to set up A0 to address 0000C000h we have:
MOVEA.L #0C000h,A0 ; [A0(31:0)] <- 0C000h
copies the contents of the data byte located in wherever A0 points to plus the
32-bit variable in D7 plus the constant 64 into the lower byte of D0! Incidentally,
if there is going to be lots of activity around this area of memory, the instruction:
puts the effective address (A0.L plus D0.L plus 64) in A1; and future accesses
can be made without further calculation using A1 as a pointer. More about Load
Effective Address and address modes in Chapter 4.
CONTROL CIRCUITRY
The 68000's Instruction decoder uses a microcoded design [6] as opposed to
the random logic employed by 8-bit processors, such as the 6809. The order of
magnitude increase in complexity exhibited in 4th generation devices makes the
design and testing of the more efficient random logic circuitry difficult. Thus
the disadvantages of larger and slower circuitry are considered more than offset
by the advantages of simplicity of design and testing, as well as the flexibility
of an easier change or enhancement of operation. In a microcoded design, the
sequence of steps in implementing an instruction are stored in integral ROMs [2].
The 68008 MPU is an 8-bit data bus version of the 68000. Despite the reduced
external functionality, as can be seen from Fig. 3.2, internally the two processors
are the same. Software is identical for both processors, although execution times
are typically 40% longer, due to the larger numbers of 8-bit fetches, as opposed
to 16-bit equivalents [7]. This still makes the 68008 a powerful alternative to a
purely 8-bit MPU, and it is often used for this purpose in embedded MPU circuitry.
Although the device itself is similar in price to its bigger brother, the smaller
package, bus width and number of memory chips (see Figs. 3.11, 3.12 and 13.3)
considerably reduce board space and hence costs.
The 68000 MPU is available in a 64-pin package, which is shown in Fig. 3.3 to-
gether with the 48-pin 68008. Unlike the 8086 family, all bus signals are non-
multiplexed. All signals are TTL voltage-level compatible. The 68HC000 is a
CMOS version with slightly different electrical and timing specifications. Unless
otherwise stated, figures are given for the normal HMOS version.
The term `address' is normally used in a rather careless way without qualification.
Address of what? In an 8-bit processor, at the hardware level it can be taken as
the bit pattern on the address bus, which is externally decoded to physically
enable the target 8-bit byte in memory or port onto the 8-bit data bus. Thus it
is a byte address. In a 16-bit processor, it is a word address; that is, points to
a word in memory space. In a high-level language, what meaning do we attach
to the address of, say, an object comprising an array of ten byte-sized elements
stored in consecutive memory locations? The general convention is to specify
the lowest byte address of the object. This is mainly for historic reasons, as MPU
technology came of age with 8-bit devices. Thus, if the array fred[ ] is stored
in memory between byte addresses 01C030h and 01C03Ah, then its address is
01C030h. In the 68000 MPU this base address is used for word and long-word
sized objects. Thus the instruction MOVE.W 01C030h,D0 will bring the word whose
MSB is at 1C030h and whose LSB is at 1C031h down into D0(15:0).
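As a minimal sketch of this convention, the following C function assembles a word from a byte array standing in for memory, taking the MSB from the base (even) address and the LSB from the next byte. The name `read_word_be` and the use of an array to model memory are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the 16-bit word whose MSB is at 'addr' and whose LSB is at 'addr + 1'. */
uint16_t read_word_be(const uint8_t *mem, size_t addr)
{
    assert((addr & 1u) == 0);   /* word objects start on an even byte address */
    return (uint16_t)(((unsigned)mem[addr] << 8) | mem[addr + 1]);
}
```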
The physical address bus reflects this natural word size by omitting line a0.
Thus each pattern on the bus a23 – a1 spans two internal byte addresses a23 – a0,
one even and one odd. As we shall see, the missing a0 line is implicitly available
in the guise of the two Data Strobe lines. Up to 8 Mwords or 16 Mbytes are directly
accessible on this address bus. The 68008 MPU has a natural byte-sized word, as
reflected in its byte organized address bus, which does explicitly provide an a0
line. This 68008 has 20 address pins, from a19 to a0, giving a 1 Mbyte address
space (there is a 52-pin version with 22 lines).
Address_Strobe (AS) is asserted when the state of the address bus is valid, see
Figs. 3.6 & 3.7. When enabled, the address lines can drive up to four LSTTL loads
into a 130 pF capacitive load. AS can similarly drive six LSTTL loads. Both sets of
lines are off when in a direct memory access (DMA) mode, whilst only the address
bus is off when halted.
DATA BUS and DATA STROBES (dn & UDS / LDS / DS)
The 68000 MPU uses a single bi-directional 16-bit data bus to carry both instruction
and operand data to the MPU (Read) and from it (Write). There is a problem
here, in that the 68000 sees a byte organized world out there through a
word-sized eye. Figure 3.4 shows the execution cycle of a MOVE instruction in
byte, word and long-word versions. In the case of a Read-byte action, the ac-
tual data lines used for the transfer depend on whether an even address (upper
eight lines) or odd address (lower eight lines) is specified. Data, considered
in byte-sized lumps, is organized as an EVEN byte (enabled by UDS) alongside an ODD byte (enabled by LDS). Thus the
Upper_Data_Strobe is seen to be equivalent to the missing a0 (active when a0
is 0, that is on even byte addresses) and Lower_Data_Strobe is active when a0 is 1
(odd byte address). Thus the two Data Strobes have a dual role. Firstly they signal
when data is valid during a Write action, as shown in Fig. 3.7. Secondly they can
enable either the upper or lower byte of an addressed word, effectively enabling
the 16-bit data bus to carry a single byte from a word-organized memory space.
A word transfer is signalled with both UDS and LDS asserted together,
and the two bytes feeding the bus simultaneously. Notice that the most signif-
icant byte (MSB) is always in the even address (lower byte address) in common
with all Motorola MPUs (see page 20). A long-word transfer simply involves two
word transfers in sequence. As can be seen, the execution time here is longer by
four clock periods (see Fig. 3.5) due to the extra transfer cycle. Byte and word
execution takes the same time. In both word and long-word cases the data has to
be organized starting with an even address (MSB). An attempt at an odd-address
word or long-word access, for example:
MOVE.W 0C101h,D0 ; This is erroneous
is an error, and the 68000 will terminate execution by returning to the Supervisor
state via an Address Error Trap (see Section 6.2).
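The strobe rules described above can be summarized in a short C model. It is a sketch only: `struct strobes` and `data_strobes` are hypothetical names, a long-word access is treated simply as word transfers with both strobes asserted, and the Address Error is represented by a flag rather than by a real exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct strobes {
    bool uds;            /* Upper_Data_Strobe: even byte, upper eight data lines */
    bool lds;            /* Lower_Data_Strobe: odd byte, lower eight data lines  */
    bool address_error;  /* odd-address word or long-word access attempted       */
};

/* size_bytes: 1 = byte, 2 = word, 4 = long-word (two word transfers). */
struct strobes data_strobes(uint32_t byte_addr, unsigned size_bytes)
{
    struct strobes s = { false, false, false };
    bool odd = (byte_addr & 1u) != 0;   /* the implicit a0 */

    assert(size_bytes == 1 || size_bytes == 2 || size_bytes == 4);
    if (size_bytes == 1) {
        s.uds = !odd;
        s.lds = odd;
    } else if (odd) {
        s.address_error = true;         /* e.g. MOVE.W 0C101h,D0 */
    } else {
        s.uds = true;
        s.lds = true;
    }
    return s;
}
```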
The 68008 has only a byte-sized data bus and a single Data Strobe (DS). There
is no problem here, as address line a0 is provided explicitly to reflect the natu-
ral byte size of the data bus, and thus each target memory byte is individually
enabled. This is exactly the same as an 8-bit MPU seeing the world through an
8-bit eye. Nevertheless, the even boundaries restriction for word and long-word
memory data are retained for compatibility with the 68000 processor. Execu-
tion times for the 68008 are shortest for a byte operand; word and long-word
operands taking one and three extra access cycles respectively. Fetching the op-
code also takes twice as long. At a clock frequency of 8 MHz, the 68000 moves
a word to a data register in 2 µs, whilst a 68008 takes 4 µs. However moving
between registers; for example:
MOVE.W D7,D0 ; A register to register move
takes ½ µs in both cases. The moral being to keep as much in the Data registers
as possible.
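The arithmetic behind these figures is simply the number of clock periods divided by the clock frequency. In the sketch below the function name `exec_time_ns` is hypothetical, and the cycle counts used in the note that follows (4, 16 and 32) are back-calculated from the times quoted above at 8 MHz rather than taken from a data sheet.

```c
#include <assert.h>

/* Time in nanoseconds taken by 'cycles' clock periods at 'clock_khz' kHz. */
unsigned long exec_time_ns(unsigned long cycles, unsigned long clock_khz)
{
    assert(clock_khz > 0);
    return (cycles * 1000000UL) / clock_khz;
}
```

At 8 MHz, `exec_time_ns(4, 8000)` gives the 500 ns of the register-to-register move, whilst 16 and 32 periods give the 2 µs and 4 µs quoted for the memory-to-register word move on the 68000 and 68008 respectively.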
When the data lines and DS signals are enabled, they can drive up to six 74LS
loads and 130 pF without external buffering. Data lines are high-impedance when
the processor is halted or in a DMA mode. DS signals are off only during DMA.
Reset
Asserting both Reset and Halt together initiates a total Reset of the processor.
This must be held for at least 100 ms when the power is initially applied. This
ensures stabilization of the internal bias voltage generator and external clock
source. Otherwise a duration of ten clock cycles is sufficient.
A total Reset causes the contents of the long-word at 000000 – 3h to be moved
into the Supervisor Stack Pointer (its initial setting) and long-word 000004 – 7h to
be moved into the Program Counter (the Restart address, see Fig. 6.5). The Status
register is also set to Supervisor state (S = 1), Trace off (T = 0) and Interrupt
mask to 7 (I2 I1 I0 = 111). No other registers are affected.
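The effect of a total Reset on the register set can be modelled in a few lines of C. This is a sketch under stated assumptions: `struct cpu`, `read_long_be` and `total_reset` are hypothetical names, memory is a plain byte array, and the low byte of the Status register is assumed to be left alone. The value 2700h for the upper byte follows from S being bit 13 and the Interrupt mask occupying bits 10 to 8.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint32_t ssp;   /* Supervisor Stack Pointer */
    uint32_t pc;    /* Program Counter          */
    uint16_t sr;    /* Status register          */
};

/* Fetch a long-word stored most significant byte first. */
uint32_t read_long_be(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 24) | ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] << 8) | (uint32_t)mem[addr + 3];
}

/* Model of a total Reset: load SSP and PC from the first two vectors. */
void total_reset(struct cpu *c, const uint8_t *mem)
{
    assert(c != 0 && mem != 0);
    c->ssp = read_long_be(mem, 0);   /* long-word at 000000 - 3h */
    c->pc  = read_long_be(mem, 4);   /* long-word at 000004 - 7h */
    /* S = 1, T = 0, I2 I1 I0 = 111; low byte kept (an assumption). */
    c->sr  = (uint16_t)((c->sr & 0x00FFu) | 0x2700u);
}
```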
Reset can also act as an output signal, activated by the privileged instruction
RESET. This drives the Reset pin low for 124 clock periods, which can be used
to reset peripheral devices. Because of this bi-directional action, external restart
circuitry must be more complex than a simple switch. An example of a typical
circuit [8] is shown in Fig. 13.3. Reset will also be driven low, together with Halt
when a Double-bus fault occurs, as described in the next paragraph.
Halt
Like Reset, this is also a bi-directional line. As an input it can be used in conjunc-
tion with Reset or alone. When asserted alone, it will stop the processor after the
current instruction is finished. The address and data buses will then be floated,
and other Control outputs negated. If Halt is then released for one cycle, the
processor will execute the next instruction and then stop. So, Halt can be used
to single-step the processor for debug purposes [9].
As an output, Halt is driven low (together with Reset) when the initial Super-
visor Stack Pointer setting obtained from the Vector table on Reset is odd or a
Bus Error is active in an exceptional event (see page 161). This is known as a
Double-bus fault. Halting the MPU is the obvious thing to do in these cases, as
such events are unrecoverable.
Read/Write (R/W)
This is low during a Write cycle, otherwise high. It is floated during DMA and, as
a precaution, normally has a pull-up resistor to prevent erroneous writes during
this situation. It can drive up to six LSTTL loads into 130 pF.
Data_Transfer_ACKnowledge (DTACK)
This is a signal sent back by the addressed device to indicate that the peripheral's
data is valid during a read cycle and that the peripheral is ready to accept the data
during a Write cycle. This asynchronous handshake protocol is discussed in detail
in the next section.
These Interrupt Priority Level input pins (IPL2 IPL1 IPL0) are driven from external devices requesting an interrupt. The 3-bit
active-low code thus placed is the request's priority level, ranging from zero (111) for no
interrupt (quiescent state) to seven (000) for a non-maskable top-priority request.
The IPL pins are constantly monitored, and any change lasting a minimum of two
successive clock periods is internally latched. At the end of each instruction, the
latched request level is compared with the Interrupt mask bits setting in the Status
register and acted upon if higher. If masked to level 7, a change from a lower level
to level 7 request will trigger an edge-triggered non-maskable interrupt response.
More details in Section 6.1.
The 68008 MPU (except in its 52-pin version) internally connects the IPL0 and
IPL2 lines as shown in Fig. 3.2. This means that only levels where bits 0 & 2 are
the same (111 = 0, 101 = 2, 010 = 5 and 000 = 7) are available.
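The active-low coding and the 68008 restriction can be checked with a small C sketch. Both function names are hypothetical; `pins` holds the logic levels on IPL2 IPL1 IPL0 as bits 2, 1 and 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Convert the active-low pattern on IPL2 IPL1 IPL0 to a priority level 0 to 7. */
unsigned ipl_level(unsigned pins)
{
    assert(pins <= 7u);
    return (~pins) & 7u;
}

/* With IPL0 and IPL2 tied together (48-pin 68008), a level is available
   only when its bits 0 and 2 are the same. */
bool level_available_48pin_68008(unsigned level)
{
    assert(level <= 7u);
    return ((level >> 2) & 1u) == (level & 1u);
}
```

Running the second function over levels 0 to 7 picks out 0, 2, 5 and 7, in agreement with the list above.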
Bus_ERRor (BERR)
This input acts as a special type of interrupt used to inform the processor that
something has gone wrong out there. As an example of what can go awry, perhaps
the addressed peripheral has not sent back its DTACK acknowledge signal. If
this continues indefinitely, the processor will hang up forever waiting for the
peripheral to respond. Using a re-triggerable monostable activated by DTACK to
drive BERR would ensure that in the absence of a correct response, say within
10 ms, the monostable will relax and alarm the processor. The use of a `watch-dog'
timer like this can be extended to ensure the veracity of the program
in high-noise situations, which can corrupt data and address lines, causing the
processor to go off to some illegal memory space and do its own thing. By using a
few lines of the legitimate program to trigger a watch-dog at some regular interval,
a Bus Error can be signalled if this area of program is not entered. See Section 6.2
for more details. If a Bus Error occurs during the Restart process, signalling that
the Reset vectors cannot be accessed, then the MPU stops with the HALT pin low.
During normal execution, if the external error-detection circuitry also drives
the Halt line in the correct fashion, the processor can be persuaded to rerun the
cycle which caused the error [10].
When a Bus Error occurs, the processor pushes data onto the Supervisor stack,
which can then be used by the operating system for diagnostic purposes. If a Bus
Error continues to be signalled, then a Double-bus fault is said to have occurred.
The processor signals this catastrophe by bringing Halt low and stopping.
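These three Function Code outputs (FC2 FC1 FC0) inform the outside world of the state of the
processor, according to the codes:
FC2 FC1 FC0   Processor state
 0   0   1    User data
 0   1   0    User program
 1   0   1    Supervisor data
 1   1   0    Supervisor program
 1   1   1    Interrupt acknowledge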
Being able to distinguish between User and Supervisor states allows the hardware
engineer to design address decoding circuitry which accesses different RAM and
ROM chips. Knowing that an interrupt is being serviced is useful in cancelling
the request, as discussed in Section 6.1.
Function_Code outputs can drive up to four LSTTL loads into 130 pF. They go
high impedance on DMA.
Bus_Request (BR)
External devices that wish to take over the buses for direct memory access (DMA)
Bus_Grant (BG)
Bus_Grant_ACKnowledge (BGACK)
Before taking over the buses, the DMA device checks that no other DMA device
is asserting BGACK. If it is, the new device waits until BGACK is negated before
asserting its own BGACK and proceeding. All DMA devices have their BGACK
outputs wire-OR'ed together. The 68008 does not have this handshake input
(except for the 52-pin version) and so can only handle systems where only one
DMA device is present.
CLocK (CLK)
This must be driven by an external TTL compatible oscillator. Small crystal con-
trolled DIL packaged circuits are readily available for this purpose. Rise and fall
times should be 10 ns or better (8 & 10 MHz). Maximum frequency versions of 8
and 10 MHz are readily available with 12.5 and 16 MHz (not 68008) variants ob-
tainable. The 68040/68060 can run up to 50 MHz. A typical Read or Write cycle
needs four clock periods (see Figs. 3.5 & 3.6), thus taking from 500 ns (8 MHz)
down to 80 ns (50 MHz). The 68000/8 has internal dynamic circuitry, so has
a lower clock frequency bound (2 MHz for the HMOS devices, 4 MHz for CMOS
versions).
The E (Enable) output is the CLK frequency divided by ten (six low, four high). It is equivalent to
the same-named signal in the older 6800 and 6809 MPUs (see Figs. 1.3 & 1.4), and
is used when interfacing to the older style specialized 6800-oriented peripheral
devices. It can drive up to six LSTTL loads at 130 pF.
Valid_Memory_Access (VMA)
This is also an `old-style' 6800 type signal (not 6809). It indicates that the address
bus data is valid, and is used as an Address Strobe synchronized to E for `old-style'
peripheral devices, such as the 6821 PIA (see Fig. 3.14). This is not available on the
68008 MPU, but can be generated with external circuitry [11]. It is only generated
when external circuitry asserts the MPU's VPA pin, and then will take some time
to lock into the E signal.
Valid_Peripheral_Address (VPA)
This input, which is usually driven from the address decoder, indicates that the
location the MPU wishes to communicate with is populated with a 6800-style
peripheral, and that a special 6800-type data transfer cycle (using E & VMA) should
be used. VPA is also used to indicate that the processor should use automatic
vectoring to respond to an interrupt, as described in Section 6.1.
Figure 3.6 The 68000/8 Read cycle. Times given are for the 8 MHz HMOS version.
1. The address on the bus will be valid within t CLAV (Clock Low to Address Valid)
of the beginning of phase 1 (φ1).
2. The AS and DS strobes are asserted by t CHSL (Clock High to Strobe Low) following
the start of φ2.
3. The peripheral device responds when ready by asserting its DTACK line. If this
can be done by t ASI (Asynchronous Setup Input) preceding the end of φ4, then
the cycle will go ahead. Otherwise, the MPU will insert wait states of one clock
period each (two phases) until DTACK is recognized on the falling edge.
4. The peripheral must set up its data on the bus no less than t DICL (Data In to
Clock Low) before the falling clock edge at the end of φ6, to ensure a successful read by the processor.
5. The AS and DS Strobes are then negated by no more than t CLSH (Clock Low to
Strobe High) following φ6.
6. The peripheral has up to two clock periods from this point to negate its DTACK
and remove its data.
Function_Code values, not shown in the diagram, are stable for the duration
of the asserted Strobe signals, as is R/W (high for Read).
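The rule in step 3 lends itself to a simple calculation of how many wait states a slow DTACK will cost. The following C function is a rough sketch, not a timing-accurate model: the name `wait_states` is hypothetical, times are measured from the start of φ0, the end of φ4 is taken to be two and a half clock periods into the cycle, and the t ASI value passed in is whatever the data sheet for the device in use specifies (the 20 ns mentioned in the note is only a placeholder).

```c
#include <assert.h>

/* Number of wait states inserted when DTACK is asserted 'dtack_ns' after
   the start of the cycle. Works in half-nanoseconds to avoid fractions. */
unsigned wait_states(unsigned dtack_ns, unsigned period_ns, unsigned t_asi_ns)
{
    long late2;   /* lateness beyond the deadline, in units of 0.5 ns */

    assert(period_ns > 0);
    /* Deadline is (2.5 * period - t_asi); lateness is DTACK time minus that. */
    late2 = 2L * (long)dtack_ns + 2L * (long)t_asi_ns - 5L * (long)period_ns;
    if (late2 <= 0)
        return 0;                                   /* cycle goes ahead */
    /* Each wait state is one whole clock period; round up. */
    return (unsigned)((late2 + 2L * (long)period_ns - 1) / (2L * (long)period_ns));
}
```

With a 125 ns clock period (8 MHz) and a placeholder t ASI of 20 ns, a DTACK arriving at 200 ns needs no wait states, whilst one arriving at 300 ns costs a single wait state.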
The Write cycle time sequence shown in Fig. 3.7 is broadly the same as for
reading. This time data is put on the bus by the MPU, and it is the job of the
peripheral device to capture this and acknowledge with the DTACK handshake.
The Data_Strobes are not asserted until the outgoing data is valid, somewhat later
in this situation than the Address_Strobe, which indicates a valid address. After
UDS/LDS is negated, the data is taken off the bus, and the peripheral should now
terminate its handshake.
In more detail:
1. The address bus will be valid within t CLAV (Clock Low to Address Valid) of the
beginning of phase 1 (φ1).
2. AS is asserted by t CHSL (Clock High to Strobe Low) following the start of φ2.
3. The MPU sends out data on the bus by no later than t CLDO (Clock Low to
Data Out) following φ3.
4. The UDS/LDS Strobes are asserted by t CHSL following the start of φ4.
5. The peripheral device responds when ready by asserting its DTACK line. If this
can be done by t ASI (Asynchronous Setup Input) preceding the end of φ4, then
the cycle will go ahead. Otherwise the MPU will insert wait states of one clock
period each (two phases) until DTACK is recognized on the falling clock edge.
6. All Strobes are negated by no more than t CLSH (Clock Low to Strobe High) fol-
lowing φ6.
7. Anytime after this, the peripheral can lift its DTACK handshake.
8. The MPU lifts its data off the bus no sooner than t SHDOI (Strobe High to Data Out
Invalid) after the Strobes negate. This is the time a peripheral has to grab the
data (including its setup time) after a rising Strobe edge (30 ns for the 8 MHz
device, 20 and 15 ns for the 10 and 12.5 MHz devices respectively).
Not shown are the Function_Code settings, which are valid for the duration of
the AS Strobe, whilst R/W is low for Write as long as the DS is active.
Figure 3.7 The 68000/8 Write cycle. Times given are for the 8 MHz HMOS version.
Designing an address decoder involves the definition of logic which will implement
the Boolean equations describing which combinations (addresses) of input
variables (address lines) are to select the various peripheral devices. In this
regard the 68000/8 does not differ from an 8-bit processor (see Section 2.3),
although the larger number of variables is a further inducement to use more so-
phisticated implementations, such as programmable array logic [12]. This is es-
pecially the case where high speed versions demand small propagation delays. It
is beyond the scope of this book to discuss the merits and features of the various
circuitry; reference [13] gives a good review for the interested reader.
A rather unlikely, but nevertheless working circuit, is shown in Fig. 3.8. Here
the 16 Mbyte address map can be considered split into four quarters using a23
and a22. A 74LS154 4 to 16-line decoder further splits the quarter defined by
a23 a22 = 00 into 16 pages of 256 Kbytes each. Page 0 is again sub-divided into
eight `paragraphs' of 16 Kbytes, which are assumed to directly enable the labelled
devices. In the cases where only a single peripheral interface is indicated, further
levels of decoding may be used. EPROM_EN combines two of these paragraphs
using a 74LS08 AND gate, as 27128 EPROM pairs have a 16 Kword (32 Kbyte)
capacity.
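The way this decoder carves up the address map can be expressed as bit-field extraction on the 24-bit byte address. The C below is a sketch only and does not reproduce the circuit of Fig. 3.8: `struct decode` and `decode_address` are hypothetical names, and since eight 16 Kbyte paragraphs fill only half of a 256 Kbyte page, the paragraph number is assumed to come from a16 to a14 with a17 ignored.

```c
#include <assert.h>
#include <stdint.h>

struct decode {
    unsigned quarter;     /* a23 a22: one of four 4 Mbyte quarters       */
    unsigned page;        /* a21 to a18: one of 16 pages of 256 Kbytes   */
    unsigned paragraph;   /* a16 to a14: one of eight 16 Kbyte blocks    */
};

/* Split a 24-bit byte address into the fields used by the decoder. */
struct decode decode_address(uint32_t addr)
{
    struct decode d;

    assert(addr <= 0xFFFFFFu);
    d.quarter   = (unsigned)((addr >> 22) & 0x3u);
    d.page      = (unsigned)((addr >> 18) & 0xFu);
    d.paragraph = (unsigned)((addr >> 14) & 0x7u);   /* a17 ignored (assumption) */
    return d;
}
```

A device is selected in this scheme when the quarter and page are both zero and the paragraph matches the line wired to its enable input.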
The secondary decoder is qualified by AS. As AS is only asserted when the
address signals have stabilized, this ensures that there are no spurious outputs
during times when the address bus is in transition. With AS being asserted ap-
proximately one clock phase after the address is valid, it should be applied to
the last decoder stage. This allows primary stages to `get on with it' as soon as
possible, and hence reduce the decoder's overall propagation delay. When high
clock-speed versions of the 68000 are used, AS is commonly fed directly to the
peripheral or memory's Chip Enable, to further reduce this delay; an example
being shown in Fig. 3.11(b).
Address decoding for the 68008 is identical to that for the 68000, but only
address lines up to a19 are available. Thus, a functionally equivalent page 0 split
could be obtained by replacing the 74LS154 decoder by a 74LS08 AND gate acting
on a19 and a18.
As we have seen, each peripheral addressed by a 68000 family MPU must re-
ply by asserting the DTACK line when ready. All 68xxx peripheral devices specifi-
cally designed to function in an asynchronous manner automatically provide this
handshake signal. An example of this is the 68230 Parallel Interface/Timer (PI/T)
shown in Fig. 3.13. However, memory chips and elementary interface devices
such as 3-state buffers and latches do not generate this information.
In the simplest of situations the 68000/8 MPU will run with its DTACK input
permanently asserted. No wait states will be inserted into its Read or Write cycle;
so all memory and peripheral interfaces must be fast enough to function correctly
in the allowed time. Figure 15.6 shows an example of this treatment of DTACK.
A slightly more sophisticated approach is depicted at the bottom of Fig. 3.8.
Here the pulse actually enabling the relevant device is also fed back to acknowl-
edge readiness. This will activate shortly after AS is asserted, and will thus appear
well before the end of clock phase 4, and no wait states will be introduced. The
AND gate used to sum the Enable signals to the relevant interfaces and memory,
is open-collector. Thus other similar signals from elsewhere in the memory space
can be wire-ORed to the one DTACK pin; see Fig. 3.9. The PI/T_EN of Fig. 3.8 does
not take any part in this scheme, as the 68230 provides its own open-collector
DTACK handshake output (see Fig. 3.12).
Although this approach is more flexible than simply grounding DTACK, it
still assumes that the addressed device is fast enough not to require wait states.
Where fast 68000 MPUs are used, this is not likely to be the case for all periph-
erals. Peripherals such as EPROMs and LCD interfaces tend to be rather slow. In
such situations a delay circuit is needed for each such DTACK reply. This may
take the form of a monostable, counter or shift register. An example of the latter
is given in Fig. 3.9. Normally when the device in question is not being accessed,
DEV_EN is high and all eight flip flops are low. The 74LS05 open-collector buffer
is then off. When the device is selected, DEV_EN goes low trailing AS by the ad-
dress decoder's propagation delay; thus releasing the register's CLR. As the serial
inputs are permanently held high, the flip flops will each in turn become logic 1,
with an advance from QA to QH on the rising edge of the 68000's Clock. Assum-
ing that the decoder's and 74LS05's propagation delay plus the 74LS164's setup
time is less than the difference between AS being asserted and t ASI before the end
of clock phase 4 (approximately one clock cycle, see Figs. 3.6 and 3.7), then wait
states of between 0 and 7 clock periods are available according to the position
of the link. Once the logic 1 reaches the link, the 74LS05's output goes low and
DTACK is asserted.
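The behaviour of the shift register delay can be simulated in a few lines of C. This is an illustrative model rather than a description of Fig. 3.9: the function name `edges_until_dtack` is hypothetical, the link position is numbered 0 for QA through 7 for QH, and propagation delays are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Count the rising clock edges, after CLR is released, needed for the
   flip flop selected by 'link' (0 = QA ... 7 = QH) to become logic 1. */
unsigned edges_until_dtack(unsigned link)
{
    uint8_t  q = 0;        /* all eight flip flops start cleared */
    unsigned edges = 0;

    assert(link <= 7u);
    while (((q >> link) & 1u) == 0) {
        q = (uint8_t)((q << 1) | 1u);   /* serial inputs held high */
        edges++;
    }
    return edges;
}
```

The result rises by one clock edge for each step the link is moved along the register, which is what gives the selectable range of wait states.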
Two 74LS377 octal flip flop registers are used in Fig. 3.10 to illustrate the im-
plementation of an elementary 16-bit output port. The registers are both enabled
by the address decoder, and the data clocked in by one or both Data Strobes, as
appropriate (see Fig. 3.4). The rising edge of the Strobe is the active transition, point 6
in Fig. 3.7. There is a minimum of t SHDOI between this point and the data becoming
invalid. In determining the margin, the hold time (5 ns) for the 74LS377 must
be subtracted. In the case of the 8 MHz 68000, this gives a worst-case margin
of 25 ns, which shrinks to 10 ns for the 12.5 MHz version. There is no problem
meeting the 25 ns 74LS377 setup requirement.
From these figures, it is clear that the Data_Strobes should directly clock the
registers and not be gated via additional logic. For example, it is tempting to
use R/W ANDed with UDS/LDS to ensure that an accidental read from this port
does not latch in irrelevant data. The alternative of using R/W in conjunction
with OUT_EN is preferable for this purpose. The falling edge of UDS/LDS via an
inverter or gate cannot be reliably used as the clock, as it is just possible that
if t CLDO is a maximum and t CHSL is a minimum, the data will not be valid at this
point.
In the case of the 68008 MPU, one 74LS377 will give an 8-bit output port, with
DS acting as the clock (see Fig. 13.1). The same timing considerations hold.
The 6264 is a static CMOS 64 Kbit RAM organized as an 8K × 8 array. It is
commonly available in 100, 120 and 150 ns access time selections. Take the
Hitachi HM6264CP-10 as an example of a 100 ns device, the access time defining
the minimum period from a stable address and device enabled (CS1 = 0, CS2 =
1) before data becomes valid during a Read. When writing, the address must
be stable for the full 100 ns and for at least 80 ns of this time the device must
be enabled and R/W = 0 for a successful Write-to action. The address must
remain stable for at least 5 ns after CS1 or R/W change state, or 15 ns after CS2
deactivates.
Referring to Fig. 3.11(a), we see that two broadside 6264s provide the 16 bits
at each word address. As there is no a0 byte address bit available from the 68000
MPU, address lines a1 – a13 drive the A0 – A12 RAM inputs, with UDS and LDS ef-
fectively providing the byte selection.
To determine whether wait states are required in using these devices, we need
to analyze the timing constraints [14]. Essentially the RAM is enabled for the
duration of the Data Strobes. As this is shortest during a Write cycle, we will use
this as the determining factor. From Fig. 3.7, the worst-case width of UDS/LDS
is the interval between points 4 and 6, or three clock phases − t CHSL, if we assume a minimum t CLSH of zero
(no figure is given). For the 8, 10 and 12.5 MHz MPUs, this is 120, 90 and 60 ns
respectively. Thus the 80 ns HM6264LP-10 figure is suitable for up to 10 MHz
systems. Actually we are being unduly pessimistic, as the 68000 data sheet gives
t DSL (Data Strobe Low) minimum as 80 ns for the 12.5 MHz MPU. For the Read
cycle, 160 ns is the equivalent 12.5 MHz figure, rising to 240 ns for the 8 MHz
version.
We have assumed that the propagation delay through the address decoder is
such that RAM_EN is asserted before the Data Strobes. During a Write cycle this
is the time between points 2 and 4 in Fig. 3.7, around one clock cycle. In the case of
a Read cycle, the propagation delay must be subtracted from the t DSL time that
the Data Strobes are low. In higher speed circuits, this propagation delay can be
minimized by omitting AS from the address decoder and using it to qualify the
R/W signal, as shown in Fig. 3.11(b). This is more economical than qualifying the
RAM_EN signal, as the modified R/W (i.e. RAM_R/W) can be used for any number
of RAM chips. The inverted MPU_R/W is normally used in this situation to turn
off the output 3-state buffers during a Write, by activating Output_Enable (OE).
Turn-off time is quicker from OE than from the RAM's Chip Select or R/W.
EPROMs cause problems as they tend to be very much slower. A typical 27128
16K×8 EPROM has a 250 ns access time from stable address/asserted Chip_Select.
Even at 8 MHz, there is only 235 ns from the falling edge of AS to the start of the
setup time before the end of φ6 (5 × ½ cycles − t CHSL − t DICL). Fortunately, the time
from Output_Enable (OE) to data valid is much less; for example 100 ns for the
Hitachi HN4827128AG-25; and the circuit of Fig. 3.12 makes use of this means
of access. Here CS is enabled whenever R/W is high; that is, during each Read.
The R/W signal is valid no later than 70 ns after φ0, which gives around 350 ns
enabling time to the end of φ6, less setup time t DICL (point 4 in Fig. 3.6). Provided
that the EPROM's OE is enabled at least 100 ns prior to this endpoint, a successful
Read will occur. As the time between AS enabling the address decoder and this
point is 235 ns, 135 ns is left to more than adequately cover this delay.
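The comparison being made here is a simple timing budget: the window available after AS, less the delay of whichever EPROM input is on the critical path. A minimal C sketch follows, using only the figures quoted above; the function name `decoder_budget_ns` is hypothetical.

```c
#include <assert.h>

/* Time in ns left over for the address decoder once the EPROM's own delay
   has been taken out of the available window. Negative means too slow. */
int decoder_budget_ns(int window_ns, int eprom_delay_ns)
{
    assert(window_ns >= 0 && eprom_delay_ns >= 0);
    return window_ns - eprom_delay_ns;
}
```

With the 235 ns window, `decoder_budget_ns(235, 250)` is negative for the 250 ns Chip_Select path, whereas `decoder_budget_ns(235, 100)` leaves the 135 ns noted above when the 100 ns Output_Enable path is used instead.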
Faster CMOS EPROMs, such as the 150 ns National Semiconductor NMC27C64
(60 ns from OE), facilitate no-wait state operation for faster processors. Alternatively
the contents of slow EPROM could be transferred 'lock, stock and barrel'
to fast RAM at the beginning of the program, and the EPROM henceforth ignored.
This technique is frequently used in IBM PCs, where the BIOS is shadowed in RAM
during the booting process.
RAM and ROM are interfaced to the 68008 MPU in the same way, but this time
the MPU provides the byte-address bit a0, and this goes to the memories' A0 line.
DS replaces UDS and LDS, see Fig. 13.3.
The 68000 family are supported by a series of dedicated peripheral interface
devices. The 68230 Parallel Interface/Timer (PI/T) is typical of these, providing
three 8-bit peripheral ports, two with handshake, and sharing functions with an
internal timer together with interrupt facilities. As shown in Fig. 3.13, interfacing
is straightforward, with a Data Strobe enabling the device together with the ad-
dress decoder output. DTACK is internally generated and is connected directly to
the MPU's DTACK node. Handshaking for the Interrupts (one for the parallel in-
terface PIRQ/PIACK and one for the timer TOUT/TIACK) is provided, as described
in Chapter 6.
There are 25 internal registers addressed by the five Register Select inputs
(RS1 – RS5). As shown driven by address lines a1 – a5, they will appear at alternate
byte addresses. Although this presents little inconvenience, a special instruc-
tion, MOVEP, can transfer two or four bytes at alternate addresses to suit this
arrangement.
The two main peripheral ports can be set up to act as one 16-bit port, although
the rather strange decision to use an 8-bit data bus means that two cycles are
needed to transfer the data word. Programming the 68230 is complex and beyond
the scope of this text; see reference [15] for a good description.
When the 68000 MPU was first released in 1979, the decision was taken to pro-
vide an operating mode to allow its use with the existing 68xx family of peripheral
interface devices. This would ensure that the MPU was immediately useful with-
out having to wait for further device introductions. We have already met the
6821 PIA in Fig. 1.9, and Fig. 3.14 shows this device in the alien environment of
the 68000.
Essentially a 68xx device prompts the 68000 MPU about its special status by
asserting the latter's VPA input, rather than DTACK; as shown in Fig. 3.8. The
Read and Write cycles are then synchronized to the E clock to give the normal
6800/6809-type synchronous data transfer sequence. The Valid_Memory_Address
(VMA) status output is used as an Address Strobe in this mode. DTACK should
not be asserted during this time. As E is the 68000's clock divided by ten, then
the normal 1 MHz 6821 version is adequate up to 10 MHz systems. The 1.5 MHz
68A21 is suitable for the 12.5 MHz 68000 MPU.
References
[1] Starnes, T.W.; Design Philosophy Behind Motorola's MC68000; Part 1: A 16-bit Pro-
cessor with Multiple 32-bit Registers, BYTE, 8, no. 4, April 1983, pp.70 – 92.
[2] Starnes, T.W.; Design Philosophy Behind Motorola's MC68000; Part 2: Data Move-
ment, Arithmetic, and Logic Instructions, BYTE, 8, no. 5, May 1983, pp.342 – 367.
[3] Starnes, T.W.; Design Philosophy Behind Motorola's MC68000; Part 3: Advanced
Instructions, BYTE, 8, no. 6, June 1983, pp.339 – 349.
[4] Lawrence, P.D. and Mauch, K.; Real-Time Microcomputer System Design: An Intro-
duction, McGraw-Hill, 1987, Chapter 16.
[5] Kane, G. et al.; 68000 Assembly Language Programming, Osborne/McGraw-Hill,
1981, Chapter 19.
[6] Stritter, S. and Tredennick, N.; Microprogrammed Implementation of a Single Chip
Microprocessor, Proc. 11th Ann. Microprogramming Workshop, Nov. 1978, IEEE, pp.8 –
16.
[7] Browne, J.W.; µp Fits 16-bit Performance into 8-bit Systems, Electronic Design, 30,
April 15th, 1982, pp.183 – 187.
[8] Wilcox, A.D.; 68000 Microcomputer Systems: Designing and Troubleshooting,
Prentice-Hall, 1987, Section 9.13.
[9] Starnes, T.W.; Handling Exceptions Gracefully Enhances Software Reliability, Elec-
tronics, 11th Sept. 1980, pp.153 – 155.
[10] Clements, A.; Microprocessor Systems Design: 68000 Hardware, Software, and Inter-
facing, PWS-KENT, 2nd ed., 1992, Section 6.5.
[11] Barth, A.J.; Designing with the 68008 MPU, 90, no. 1579, April 1984, pp.30 – 33 & 41.
[12] Cahill, S.J.; Digital and Microprocessor Engineering, Ellis Horwood/Prentice-Hall,
2nd ed., 1993, Section 6.1.
[13] Clements, A.; Microprocessor Systems Design: 68000 Hardware, Software, and Inter-
facing, PWS-KENT, 2nd ed., 1992, Sections 5.1 & 5.2.
[14] Wilcox, A.D.; 68000 Microcomputer Systems: Designing and Troubleshooting,
Prentice-Hall, 1987, Section 10.6.
[15] Clements, A.; Microprocessor Systems Design: 68000 Hardware, Software, and Inter-
facing, PWS-KENT, 2nd ed., 1992, Section 8.3.
CHAPTER 4
Although the 68000 architecture represents a complete break with its progenitor
6800 family, its software is in reality an evolution rather than a break from earlier
implementations. Many of the characteristics exhibited by the 6809 instruction
set (see Chapter 2) also appear in 68000 software, and indeed this is not
surprising as they both support high-level language compilation, with extensive
stack-oriented operations and a large repertoire of computed address modes.
The use of a full 16-bit op-code allows considerable scope in handling the
many instruction:op-code:register combinations. Nevertheless, a special effort
was made to make the assembly-level software user friendly. There are only
56 primary instructions [1], although variations on themes of several of these add
another 29 mnemonics (e.g. MOVE and MOVEQ for MOVE and MOVE Quick). Most
instructions are orthogonal, in that they apply to all registers within a group (Data
or Address) in the same manner. The `rules of grammar' are fairly consistent
across the range of instructions with relatively minor quirks [2].
In this chapter we look at the more important of the instructions and their
address modes. We will tie these together with the same example subroutines
used to illustrate 6809 software in Section 2.3. The same assembler will be used
here, details of which were given at that point. 68008 software is identical to that
for the 68000 (except that only the lower twenty address bits are significant) and
we will use the term 68000 as generic of the two.
It would take a complete book, rather than a single chapter, to do justice to
assembly-level programming for such a complex processor. References [3, 4, 5, 6]
are recommended to the interested reader.
Single-operand (or monadic) instructions, such as CLeaR, have only one entry
in the operand field, for example:
CLR.B 0E000h ; [E000] <- 00 {Coded as 4239-0000-E000h}
CLR.L D0 ; [D0(31:0)] <- 00000000 {Coded as 4280h}
Data Movement is the most common operation executed. Reference [7]
reports a frequency count of about 33% for MOVE, and it is with this in mind
that we start with Table 4.1. Here we can see that only three mnemonics cover
the range (see also LEA and PEA in Table 4.2). Of these the chief is MOVE, which
subsumes the Load and Store operations of the 6809 MPU. MOVE is so frequently
used that Motorola made it the most flexible of all the 68000 operations, a true 2-
address instruction. Data in 8-, 16- or 32-bit packets can be copied from anywhere
in memory, any register (except the PC) or immediately to any alterable memory
or to any register (except PC). All other 2-operand instructions must specify a
register as the source and/or destination, for instance ADD.B 0C000h,D0.
The MOVEA variation of the plain MOVE instruction must be used where an
Address register is the destination. For example:
MOVEA.L #0C000h,A0 ; [A0(31:0)] <- 0000C000 {Coded as 207C-0000-C000h}
Like all specific Address register-destination operations, the CCR flags are not
altered, and only word and long-word sizes are permitted. Word-sized operands
are sign extended to 32 bits, for example:
MOVEA.W #0C000h,A0 ; [A0(31:0)] <- FFFFC000 {Coded as 307C-C000h}
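The sign extension applied to word-sized Address register operands can be modelled in C. The following is only an illustrative sketch; the function name sign_extend is an assumption made for this example and is not part of any assembler or library.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the lowest `bits` bits of value to a full 32 bits.
   With bits = 16 this mirrors what happens to a word-sized operand
   written to an Address register. (Hypothetical helper.) */
uint32_t sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits < 32);
    uint32_t sign = 1u << (bits - 1);      /* position of the sign bit */
    value &= (sign << 1) - 1u;             /* keep only the low bits   */
    return (value ^ sign) - sign;          /* propagate the sign bit   */
}
```

Calling sign_extend(0xC000, 16) reproduces the FFFFC000h result of the word-sized example, whereas a value such as 7FFFh is returned unchanged.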
The state of the CCR flags can be set up using the MOVE <ea>,CCR variant
(some assemblers use the non-standard mnemonic MTCCR for Move To CCR).
Notice that its size is word only (the .W is usually omitted) although the CCR is
byte sized. The Status register equivalent is MOVE <ea>,SR (or MTSR <ea>), and
is only legal in the Supervisor state, that is privileged; but a Move From the SR,
MOVE SR,<ea> (or MFSR <ea>), can be made from anywhere. The Move From
the CCR is only available on the 68010 MPU and higher family members.
The MOVE Quick (MOVEQ) instruction is targeted exclusively to the Data regis-
ters. It is used to set up a 32-bit Data register to a fixed long number between +127
and −128 (signed 8-bit). Of course an ordinary MOVE can be used, but as the im-
mediate data is included in the op-code for MOVEQ, the latter's execution is much
faster, as shown here:
where ~ indicates clock cycle. Thus the ordinary MOVE takes 1.5 µs at an 8 MHz
clock rate against 0.5 µs for a MOVEQ. The timings for the 68008 MPU are 24~ (3 µs)
and 8~ (1 µs) respectively. Note that all 32 bits of the Data register are affected.
There is no MOVEQ.B or MOVEQ.W; an ordinary MOVE must be used in cases where
only the lower 8 or 16 bits are to be set up.
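A code generator has to decide whether a constant qualifies for MOVEQ. The C sketch below shows one way this check might look. The op-code layout used (0111, register number, 0, then the 8-bit data) is the standard 68000 MOVEQ format rather than something stated in this text, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if k lies in the signed 8-bit range accepted by MOVEQ. */
int fits_moveq(int32_t k)
{
    return k >= -128 && k <= 127;
}

/* Build the one-word MOVEQ op-code: 0111 rrr 0 dddddddd.
   dreg is the Data register number 0 to 7. */
uint16_t encode_moveq(int32_t k, unsigned dreg)
{
    assert(fits_moveq(k) && dreg < 8);
    return (uint16_t)(0x7000u | (dreg << 9) | ((uint32_t)k & 0xFFu));
}
```

A constant outside the range simply falls back to an ordinary MOVE.L with a long-word immediate operand.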
Using a regular MOVE with the appropriate address mode gives the equivalent
of a Push or Pull operation; for example:
MOVE.L D0,-(SP) ; Same as PSHS D0 (14~) {Coded as 2F00h}
pushes all of D0 out to the System stack, after the System Stack Pointer A7 has
been decremented four bytes, and
MOVE.L (SP)+,D0 ; Same as PULS D0 (12~) {Coded as 201Fh}
pulls four bytes off the System stack into D0.L and then increments the System
Stack Pointer. The actual System stack used depends on whether the MPU is in the
Supervisor or User mode, the assembler allowing the use of the mnemonic SP or,
indeed A7, for either System Stack Pointer. Note that a MOVE.B to/from the System
stack always results in a word being transferred, to preserve the evenness of the
System Stack Pointer (i.e. A7). Any of the other Address registers may be used in
place of A7. Pre-Decrement and Post-Increment address modes are discussed in
the next section.
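In C terms, a Pre-Decrement push and a Post-Increment pull correspond to the familiar pointer idioms *--sp and *sp++. The sketch below assumes a downward-growing stack of long-words held in an ordinary array; the names push_long and pull_long are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* MOVE.L Dn,-(SP): decrement the pointer first, then store. */
void push_long(uint32_t **sp, uint32_t value)
{
    assert(sp && *sp);
    *--(*sp) = value;
}

/* MOVE.L (SP)+,Dn: fetch first, then increment the pointer. */
uint32_t pull_long(uint32_t **sp)
{
    assert(sp && *sp);
    return *(*sp)++;
}
```

After a push followed by a pull the pointer is back at its starting value, just as the System Stack Pointer is.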
As there are 16 registers which may have to be pushed or pulled, clearly a
single instruction which can save or retrieve any or all Address and Data registers
at one go will be more efficient. The MOVE Multiple instruction fulfils this task;
for example:
MOVEM.L D2-D4/A2,-(SP) ; Push D2, D3, D4 and A2
pushes all of D2, D3, D4 and A2 out to the System stack, the System Stack Pointer
ending 16 bytes down; and
MOVEM.L (SP)+,D2-D4/A2 ; Pull D2, D3, D4 and A2
pulls the register contents back out, restoring the System Stack Pointer to its
original value. Any Address register can be used in place of A7. In general, the
time taken for a multiple Push is 8 + 8n~ and multiple Pull is 12 + 8n~, where
n is the number of registers involved. Thus to Push a full register complement
takes 136 clock cycles (17 µs at 8 MHz) against 224 clock cycles and 32 bytes
of program memory using ordinary MOVEs.
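The two timing formulas are easily tabulated. The following C sketch evaluates them for a given register count, for instance the four registers D2, D3, D4 and A2; the function names are assumptions, and the figures apply only to the long-word Pre-Decrement and Post-Increment forms quoted above.

```c
#include <assert.h>

/* Clock cycles for a long-word multiple Push of n registers: 8 + 8n. */
unsigned movem_push_cycles(unsigned n)
{
    assert(n >= 1 && n <= 16);
    return 8u + 8u * n;
}

/* Clock cycles for a long-word multiple Pull of n registers: 12 + 8n. */
unsigned movem_pull_cycles(unsigned n)
{
    assert(n >= 1 && n <= 16);
    return 12u + 8u * n;
}

/* Convert a cycle count to microseconds for a clock given in MHz. */
double cycles_to_us(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles / clock_mhz;
}
```

For four registers this gives 40 cycles to push and 44 to pull, or 5 µs and 5.5 µs at 8 MHz.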
The MOVEM instruction uses a post-word to the op-code to indicate which regis-
ters are involved, as shown in Fig. 4.1. If less than the full complement is involved,
then the order of storage in the stack is still that shown in the register list. There
is a word-sized MOVEM which only transfers the lower register words. This saves
stack space and time; however, on return all registers — both Data and Address —
are filled with the sign-extended long version of the stored word.
Less usefully, a fixed address can be used as MOVEM's address mode instead
of Pre-Decrement (registers to memory) or Post-Increment (memory to registers).
In this case no pointer marks the bottom of the dump, and the same address is
used for both directions.
EXchanGe (EXG) swaps around the complete 32-bit contents of any two regis-
ters, Data or Address. SWAP acts only on Data registers, and exchanges the lower
and upper words. This is useful, for example, when using the Division opera-
tion, which produces a 16-bit quotient in the lower part of a Data register and
the remainder in the upper 16-bits. Using SWAP makes getting at the remainder
easier (see Table 4.12). The 68020 MPU has a byte-sized SWAP which exchanges
the lower two bytes. The 68000 can use a ROL.W #8,Dn to perform the same
function (see Table 4.3).
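The effect of SWAP and of the ROL.W substitute can be checked with a small C model. This is a sketch only; swap_words and rol_w are hypothetical names, and a Data register is represented by a plain 32-bit unsigned integer.

```c
#include <assert.h>
#include <stdint.h>

/* SWAP Dn: exchange the upper and lower words of a 32-bit register. */
uint32_t swap_words(uint32_t d)
{
    return (d << 16) | (d >> 16);
}

/* ROL.W #count,Dn: rotate the lower word left, upper word untouched.
   An immediate count is limited to 1..8. */
uint32_t rol_w(uint32_t d, unsigned count)
{
    assert(count >= 1 && count <= 8);
    uint16_t w = (uint16_t)d;
    w = (uint16_t)((w << count) | (w >> (16 - count)));
    return (d & 0xFFFF0000u) | w;
}
```

With a count of 8 the rotate exchanges the two lower bytes, leaving the upper word as it was.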
The 68000 provides for Addition, Subtraction, Multiplication and Division op-
erations together with some ancillary instructions. The elementary Addition and
Subtraction operations are straightforward, with at least one of the operands
being a Data register, for example:
ADD.B D0,1234h ; [1234] <- [D0(7:0)] + [1234h]. Add <Source> to <Destination>
SUB.W 1234h,D1 ; [D1(15:0)] <- [D1(15:0)] - [1234:5h]. Sub <Source> from <Destination>
ADD.L D0,D1 ; [D1(31:0)] <- [D1(31:0)] + [D0(31:0)]. Add <Source> to <Destination>
                                          Flags
Operation          Mnemonic               X N Z V C    Description
Add                                                    Add source to destination
  to Data reg.     ADD.s3 ea,Dn           √ √ √ √ √    [Dn] <- [Dn] + [ea]
  to memory        ADD.s3 Dn,ea           √ √ √ √ √    [ea] <- [ea] + [Dn]
  to Address reg.  ADDA.s2 ea,An          • • • • •    [An] <- [An] + [ea]
  quick            ADDQ.s3(1) #d3,ea      √ √ √ √ √    [ea] <- [ea] + #d3(2)
  immediate        ADDI.s3 #kk,ea         √ √ √ √ √    [ea] <- [ea] + #kk
  with extend      ADDX.s3 Dy,Dx          √ √ (3) √ √  [Dx] <- [Dx] + [Dy] + X
                   ADDX.s3 -(Ay),-(Ax)    √ √ (3) √ √  [-(Ax)] <- [-(Ax)] + [-(Ay)] + X
Multiply
  signed           MULS ea,Dn             • √ √ 0 0    [Dn(31:0)] <- [Dn(15:0)] ×± [ea(15:0)]
  unsigned         MULU ea,Dn             • √ √ 0 0    [Dn(31:0)] <- [Dn(15:0)] × [ea(15:0)]
Note 1: Only Long and Word with Address register destination. Also CCR unchanged.
Note 2: d3 is a 3-bit number 1 to 8.
Note 3: Cleared for non-zero, otherwise unchanged.
Note 4: Not Address register.
In all cases the result is stored at the destination. Notice that in subtraction
the , can be read as from. When the destination is in memory, then it must of
course be alterable memory, usually RAM. Amongst the instructions, only MOVE
can have both operands in memory.
An Address register is not permitted as a destination, although legal as a
source. Instead the special instructions ADDA and SUBA are used. As is usual,
the CCR flags are not changed by any operation that alters an Address register,
and only word and long-word sizes are permitted. Word results are always sign
extended to a long-word.
The ADD immediate Quick and SUB immediate Quick instructions are used
as a substitute for the missing Increment and Decrement operations. A constant
between 1 and 8 can be added or subtracted from any Data or Address register
or read/write memory location, for example:
ADDA.W #1,A0 ; [A0(31:0)] <- [A0(31:0)] + 1. Increment (12~) {Coded as D0FC-0001h}
ADDQ.W #1,A0 ; [A0(31:0)] <- [A0(31:0)] + 1. Increment ( 8~) {Coded as 5248h}
SUBQ.B #1,1234h ; [1234h] <- [1234] - 1. Decrement (16~) {Coded as 5338-1234h}
The constant is encoded as a 3-bit group in the op-code itself. As can be seen
above, this halves the size of the instruction and therefore decreases execution
time. If an Address register is targeted, the usual word or long-word sizes are
permitted, with the former being sign extended to the whole 32 bits. The CCR flags
remain unaltered.
Notice that the last example above altered a memory location directly without
using a Data register as an intermediary stop. The ADD Immediate and SUB
Immediate instructions can be used where the data is greater than 8, for example:
SUBI.W #500h,0C000h ; [C000:1] <- [C000:1] - 500h
Where operands of greater than 32 bits are involved, then several sequential
Adds or Subtracts may be used to form the multiple-precision sum or difference.
In most processors the Carry flag provides the linkage between successive operations
but, as noted earlier, the X flag is used for this purpose in the
68000 family.
Figure 4.2 shows an example of a 96-bit addition made up of three 32-bit
operations. The program for this is:
MOVEA.L #0C00Ch,A0 ; Point A0 to just before least significant long-word <Source>
MOVEA.L #0C10Ch,A1 ; and A1 to just before least significant long-word <Destination>
ANDI.B #11101111b,CCR ; Clear the X flag so that no carry enters the first addition
ADDX.L -(A0),-(A1) ; Add LSLWs, sum in <Destination> LSLW
ADDX.L -(A0),-(A1) ; Add NSLWs, sum in <Destination> NSLW
ADDX.L -(A0),-(A1) ; Add MSLWs, sum in <Destination> MSLW
One main point to notice here is the use of the Pre-Decrement Address Register
Indirect address mode. As described in the next section, the Address register
used to point to the operand (like an Index register) is automatically decremented
by the appropriate number of bytes (by four here) before being used. With the
arrangement of Fig. 4.2, the address will naturally creep towards the most sig-
nificant bytes as we do each addition. This is the only memory targeted address
mode that can be used by ADDX and SUBX to access data in memory. Alternatively
both operands can lie in Data registers.
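The carry linkage between the long-word additions can be modelled in C. The sketch below is an illustration under assumptions: add_multi is a hypothetical name, the operands are arrays of 32-bit long-words stored most significant first (as they would lie in 68000 memory), and the variable x stands in for the X flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add src into dst, n long-words each, most significant long-word first.
   Works from the least significant end towards the most significant,
   passing the carry along as the X flag does. Returns the final carry. */
unsigned add_multi(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst && src && n > 0);
    unsigned x = 0;                         /* X flag, clear on entry */
    for (size_t i = n; i-- > 0; ) {         /* pointer creeps towards MSLW */
        uint64_t sum = (uint64_t)dst[i] + src[i] + x;
        dst[i] = (uint32_t)sum;
        x = (unsigned)(sum >> 32);          /* carry out of bit 31 */
    }
    return x;
}
```

With n = 3 this performs a 96-bit addition; the loop index running downwards plays the part of the automatically decremented Address registers.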
Wouldn't it be useful if you could tell whether the whole multiple sum or
difference was zero? A normal Add or Subtract will set the Z flag if the re-
sult is zero otherwise it will clear it; thus the state of Z reflects the last addi-
tion/subtraction. However, ADDX/SUBX does not affect the Z flag when the result
is zero, otherwise the flag is cleared. Thus setting the Z flag (and also clearing
the X flag) and using all ADDX or SUBXs will give a final Z setting of 1 only if all
outcomes in the sequence are zero. Use:
The middle example illustrates the use of LEA in position independent code (see
Sections 2.2 and 4.2).
Signed and unsigned 16 × 16 multiplication is provided as a primitive. The
Source can be anywhere in memory, a Data register or immediate data, whilst the
destination must be a Data register, for example:
MULU 0C000h,D0 ; [D0(31:0)] <- [D0(15:0)] x [C000:1]
MULS #-7,D0 ; [D0(31:0)] <- [D0(15:0)] x -7
MULU D1,D2 ; [D2(31:0)] <- [D2(15:0)] x [D1(15:0)]
The Division instructions are more complex. These are designed to divide a
32-bit dividend by a 16-bit divisor, giving a 16-bit quotient in the lower word of
the destination Data register and a 16-bit remainder in the upper word of the
same register. The following code fragment shows how a dividend in D0.L is
divided by 5000, with the quotient result placed in the bottom of D6 and the
remainder in the bottom of D7:
DIVU #5000,D0 ; Divide the destination by the source
; [D0(15:0)] <- [D0(31:0)] / 5000 (/ symbol is integer division)
; [D0(31:16)] <- [D0(31:0)] % 5000 (% symbol is integer remainder)
CLR.L D6 ; Will hold the quotient
CLR.L D7 ; Will hold the remainder
MOVE.W D0,D6 ; 16-bit quotient to D6.W
SWAP D0 ; 16-bit remainder in lower D0
MOVE.W D0,D7 ; to D7
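The packing of quotient and remainder into one register can be sketched in C. The function name divu and the overflow parameter are assumptions made for this illustration. The overflow behaviour modelled (destination left unchanged when the quotient will not fit in 16 bits) follows the standard 68000 definition rather than anything stated in this text.

```c
#include <assert.h>
#include <stdint.h>

/* Model of DIVU: 32-bit dividend, 16-bit divisor.
   Result: remainder in bits 31:16, quotient in bits 15:0.
   If the quotient needs more than 16 bits, *overflow is set and the
   dividend is returned unchanged. */
uint32_t divu(uint32_t dividend, uint16_t divisor, int *overflow)
{
    assert(divisor != 0 && overflow);   /* the real MPU traps on zero */
    uint32_t quotient = dividend / divisor;
    if (quotient > 0xFFFFu) {
        *overflow = 1;
        return dividend;
    }
    *overflow = 0;
    return ((dividend % divisor) << 16) | quotient;
}
```

Dividing 123,456 by 5000 in this model leaves 24 in the lower word and 3456 in the upper word, which a SWAP would then bring down.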
Alternatively, the number of shifts can be specified dynamically by the lower six
bits held in another Data register Dx[5:0]. For instance:
                                            Flags
Operation              Mnemonic             X  N Z V   C    Description
Arithmetic Shift Right                                      Linear Shift Right keeping the sign
  memory               ASR.W ea             b0 √ √ (1) b0
  static Data reg.     ASR.s3 #d3,Dn        b0 √ √ (1) b0
  dynamic Data reg.    ASR.s3 Dx,Dy         b0 √ √ (1) b0
[Diagram: operand shifted right with the sign bit retained; the emerging bit b0 passes to C and X.]
As well as being able to specify a shift number larger than eight, this type of
specification has the advantage of variability, as it can be changed dynamically
in software as conditions warrant, for example in a loop.
The Logic Shift instructions simply shift in 0s from the left or right as appro-
priate, with the emerging bit being caught by flags C and X. Arithmetic Shift
Left and Logic Shift Left are the same, except that the V flag is set if the MSbit
changes. If the operand is a signed number, this would signal a sign change, for
instance 0,1001111b → 1,0011110b. In the case of Arithmetic Shift Right,
the sign bit propagates right; thus 1,1110100b (−12) becomes 1,1111010b (−6)
becomes 1,1111101b (−3) etc. and 0,0001100b (+12) becomes 0,0000110b (+6)
becomes 0,0000011b (+3) etc.
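The sign propagation of an Arithmetic Shift Right can be demonstrated portably in C without relying on the behaviour of >> on negative values. This is a sketch for byte-sized operands; asr8 is a hypothetical name and the carry argument stands in for the C and X flags.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic Shift Right of an 8-bit value by count places.
   The sign bit (bit 7) is copied back in on each shift and the last
   bit shifted out is reported through *carry. */
uint8_t asr8(uint8_t v, unsigned count, unsigned *carry)
{
    assert(count >= 1 && count <= 8 && carry);
    for (unsigned i = 0; i < count; i++) {
        *carry = v & 1u;                        /* emerging bit */
        v = (uint8_t)((v >> 1) | (v & 0x80u));  /* keep the sign bit */
    }
    return v;
}
```

Starting from F4h (−12), successive single shifts give FAh (−6) and FDh (−3), matching the bit patterns listed above.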
ROtate through the eXtend instructions (ROXL, ROXR) are similar to ADD with
eXtend, in that they can be used for multiple-precision operations. A ROtate
through eXtend takes in the X flag from any previous Shift and in turn saves
its ejected bit in X. As an example, a 48-bit number stored as three consecutive
16-bit words in memory (bits 47 to 32 at M, bits 31 to 16 at M+2 and bits 15 to 0 at M+4) can
be shifted once right as follows [8]:
LSR M ; 0 → [M] ⇒ b32 → X
ROXR M+2 ; b32/X → [M+2] ⇒ b16 → X
ROXR M+4 ; b16/X → [M+4] ⇒ b0 → X
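The same three-step shift can be followed in C. The sketch assumes the 48-bit number is held in an array of three 16-bit words with the most significant word first; shift48_right is an invented name and the local variable x plays the role of the X flag.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a 48-bit number one place right.
   w[0] holds bits 47..32 (M), w[1] bits 31..16 (M+2), w[2] bits 15..0 (M+4).
   Returns the bit shifted out of the bottom (the final X). */
unsigned shift48_right(uint16_t w[3])
{
    assert(w);
    unsigned x = 0;                      /* LSR shifts a 0 into the top */
    for (int i = 0; i < 3; i++) {
        unsigned out = w[i] & 1u;        /* bit about to be ejected */
        w[i] = (uint16_t)((w[i] >> 1) | (x << 15));
        x = out;                         /* handed on to the next word */
    }
    return x;
}
```

The first pass of the loop corresponds to the LSR and the remaining two to the ROXR instructions.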
True circular ROtates are provided, where the shift is not through a flag (al-
though the C flag still catches the emerging bit). This emerging bit is copied into
the other end of the operand word. Thus:
ROR.W #8,D0 ; [D0(15:8)] <- [D0(7:0)], [D0(7:0)] <- [D0(15:8)]
moves the lower byte of D0 up eight places and the next higher byte around to be
the new lower byte. This is the equivalent of SWAP.W D0 (only SWAP.L is available,
except in the 68020 MPU and up).
The three binary logic operations AND, OR and Exclusive-OR (EOR), together with
the unary NOT, are provided, as shown in Table 4.4. The first two can bitwise operate
on any Data register or alterable memory location. EOR (rather inconsistently) can
only use a Data register as source. All three have an Immediate variant that can target an
alterable memory location directly or be used to change any bit or bits in the CCR
or SR (the latter only in the Supervisor state), for example:
ANDI.B #11111110b,CCR ; Clear Carry flag, others unchanged
OR                                                    Logic bitwise OR
  to Data register   OR.s3 ea,Dn       • √ √ 0 0      [Dn] <- [Dn] + [ea]
  to memory          OR.s3 Dn,ea       • √ √ 0 0      [ea] <- [ea] + [Dn]
  immediate          ORI.s3 #kk,ea     •(1) √ √ 0 0(2)  [ea] <- [ea] + #kk
circulates in a tight loop waiting for bit 6 of memory location 8080h to change
to logic 1. This may be the Control register of a PIA, and thus effectively the
program will be waiting for the active edge of handshake line CA2 (programmed
as an input) to occur. Of course if that event never occurs, due to a hardware
fault, then the system will hang up indefinitely. More about that later.
Strictly speaking BTST should be classified as a Data testing instruction, its
purpose being not to change the operand but to sense its state, which is reflected
in the Z flag to be used later by a Conditional Branch. The two other such instruc-
tions are CoMPare (CMP) and TeST (TST), as shown in Table 4.6. A CoMPare does
a subtraction of the source operand from the destination operand (as does SUB),
setting the flags accordingly but not putting the difference into the destination.
A TeST for zero or negative is just a CoMPare with a zero source operand.
Notice the comparison is destination with source, just as SUB is subtract source
from destination. Some processor assemblers, such as for the PDP-11 minicom-
puter and 80x86 family MPUs, reverse the order.
CMPA is used with Address register destinations. Unlike other such targeted
instructions (e.g. ADDA), the CCR flags are set normally, but with word-length
source operands sign extended in the usual way to a long-word, for example:
fragment exits with the address+1 of the first pair of bytes which differ in two
blocks of data or strings:
The TeST primitive is represented by the TST instruction. This can check
that the contents of any memory location or Data register is zero (sets Z flag) or
negative (sets N flag), for example:
Note 1: Normally a label is specified here and the assembler works out the offset.
Note 2: The condition codes (cc) are:
cc    Mnemonic  Meaning         True on         cc    Mnemonic  Meaning           True on
0000  T(3)      True always     Always          1000  VC        oVerflow Clear    V = 0
0001  F(3)      False always    Never           1001  VS        oVerflow Set      V = 1
0010  HI        HIgher than     C+Z = 0         1010  PL        PLus              N = 0
0011  LS        Lower or Same   C+Z = 1         1011  MI        MInus             N = 1
0100  CC        Carry Clear     C = 0           1100  GE        Greater or Equal  N⊕V = 0
0101  CS        Carry Set       C = 1           1101  LT        Less Than         N⊕V = 1
0110  NE        Not Equal       Z = 0           1110  GT        Greater Than      (N⊕V)+Z = 0
0111  EQ        EQual           Z = 1           1111  LE        Less or Equal     (N⊕V)+Z = 1
the 8-bit displacement if Short or all zeros if Word. In the latter case the 16-
bit displacement follows the op-code. Thus the instruction BPL .06 (Branch if
PLus six places on) is coded as 0110-1010-00000110b (6A06h).
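The short-form Branch coding and the condition tests can be expressed compactly in C. The sketch below is illustrative only: the enum and function names are invented, the flags are passed as separate 0/1 values, and GT is taken to be true when N⊕V and Z are both clear, which is the standard 68000 definition.

```c
#include <assert.h>
#include <stdint.h>

/* Condition-code field values, in op-code order 0000b to 1111b. */
enum cc { CC_T, CC_F, CC_HI, CC_LS, CC_CC, CC_CS, CC_NE, CC_EQ,
          CC_VC, CC_VS, CC_PL, CC_MI, CC_GE, CC_LT, CC_GT, CC_LE };

/* Short-form Bcc: 0110 cccc dddddddd, with an 8-bit signed displacement.
   A displacement of zero is excluded as it marks the word form. */
uint16_t encode_bcc_short(enum cc cond, int disp)
{
    assert(cond >= CC_HI && cond <= CC_LE);
    assert(disp >= -128 && disp <= 127 && disp != 0);
    return (uint16_t)(0x6000u | ((unsigned)cond << 8) | ((unsigned)disp & 0xFFu));
}

/* Evaluate a condition from the N, Z, V and C flags. */
int cc_true(enum cc cond, unsigned n, unsigned z, unsigned v, unsigned c)
{
    n = !!n; z = !!z; v = !!v; c = !!c;     /* normalise to 0 or 1 */
    switch (cond) {
    case CC_T:  return 1;
    case CC_F:  return 0;
    case CC_HI: return !(c | z);
    case CC_LS: return (int)(c | z);
    case CC_CC: return !c;
    case CC_CS: return (int)c;
    case CC_NE: return !z;
    case CC_EQ: return (int)z;
    case CC_VC: return !v;
    case CC_VS: return (int)v;
    case CC_PL: return !n;
    case CC_MI: return (int)n;
    case CC_GE: return !(n ^ v);
    case CC_LT: return (int)(n ^ v);
    case CC_GT: return !((n ^ v) | z);
    case CC_LE: return (int)((n ^ v) | z);
    }
    return 0;
}
```

Encoding CC_PL with a displacement of 6 reproduces the 6A06h op-code quoted for BPL.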
In the 68000 family the cc tests can be used with other instructions, the most
useful of which are the Decrement, Test and Branch loop operations. We have
already used software loops, for example the Block-Compare routine on page 99.
Essentially a loop is a mechanism in which a section of code can either be repeated
a fixed number of times (the loop count) or exit when a certain condition or
conditions are fulfilled, or both.
As an example of the latter situation, consider interfacing to a peripheral which
sets bit 6 of an interface device's Control register (e.g. a 6821 PIA) when it has
valid data it wishes to be read. This involves continually checking the state of
bit 6 in a loop until it goes high; only then do we move on and read the data.
But what happens if, say, due to hardware malfunction, this Data Ready signal is
never sent? The software will then hang indefinitely. Perhaps it would be better
to give up after a fixed number of times and go to an error routine if this sequence
of events happens. To do this we would have to check the flag; if it is not set,
then decrement the loop count, and if this hasn't fallen through zero (i.e. to −1)
then repeat. Following the structure of Fig. 4.3 a possible coding is:
MOVE.W #n,D1 ; Set loop count n
LOOP: BTST #6,CONTROL ; Test bit 6 of the Control register 16~
BNE EXIT ; IF True THEN EXIT (cc=Not Equal Zero) 12~
SUBQ.W #1,D1 ; ELSE decrement loop count 4~
BCC LOOP ; IF no Carry then [D1] is not -1 18~
EXIT: CMP #-1,D1 ; Exit with n = -1?
BEQ ERROR ; IF True THEN error
MOVE.B PORT,D0 ; ELSE read data from port
..... ....... ; and continue
For applications where speed is important (not this example) reducing the time
taken by the control mechanism is important, as this housekeeping overhead is
executed on each pass through the loop body. In this case the Test and Control
is 34~ as against 50~. Notice that BNE is shown with an execution time of 12~,
whilst BCC is 18~. This is because Branches taken (i.e. True) for byte offsets take
longer than Branches not taken (but the opposite for word offsets, 18~ and 20~!).
Similarly DBcc has a variable execution time. As the number n used by DBcc
is limited to 65,535 (a 16-bit count), the ordinary Branch construction must be used where the
default timeout parameter exceeds this number.
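The same poll-with-timeout idea carries over directly to C. The following is a minimal sketch under assumptions: wait_ready is a hypothetical function, the Control register is reached through a pointer supplied by the caller, and n + 1 attempts are made before giving up, as with a loop count that runs down through zero to −1.

```c
#include <assert.h>
#include <stdint.h>

#define READY_BIT 6u    /* bit 6 of the Control register signals Data Ready */

/* Poll bit 6 of *control. Returns 0 as soon as it is seen high,
   or -1 if it is still low after n + 1 attempts. */
int wait_ready(volatile const uint8_t *control, uint16_t n)
{
    assert(control);
    uint32_t tries = (uint32_t)n + 1u;
    while (tries-- > 0u) {
        if (*control & (1u << READY_BIT)) {
            return 0;               /* data available, go and read it */
        }
    }
    return -1;                      /* timed out, caller handles the error */
}
```

The volatile qualifier is what stops a compiler from reading the register once and reusing the stale value on every pass.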
Some situations require the number of loop passes to be fixed. As the normal
DBcc exits if either test is True, the variant DBF makes the first test always False,
and so an exit only happens when the loop count reaches −1. The routine below,
which is a fixed delay using an idle loop body, shows this:
DELAY: MOVE.W #n,D0 ; n is the delay parameter 16~
LOOP: NOP ; Do nothing and take 4~
DBF D0,LOOP ; one less pass 18~
The total delay here is 16 + (n + 1) × 22 (+8 extra when DBF exits), a total of
46 + 22n clock cycles. Thus a 0.1 s delay at a clock rate of 8 MHz requires
46 + 22n = 8 × 10^5 clock cycles, giving n ≈ 36,362. Remember that n has a maximum
value of 65,535 for DBF.
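The delay parameter can be worked out by a small C helper that subtracts the fixed 46-cycle overhead before dividing by the 22 cycles per pass. This is a sketch only; delay_cycles and delay_count are hypothetical names and the result is rounded to the nearest integer.

```c
#include <assert.h>
#include <stdint.h>

/* Total clock cycles taken by the delay routine for a given n. */
uint32_t delay_cycles(uint32_t n)
{
    return 46u + 22u * n;
}

/* Loop parameter n giving approximately delay_us microseconds
   at a clock of clock_hz, from 46 + 22n = required cycles. */
uint32_t delay_count(uint32_t clock_hz, uint32_t delay_us)
{
    uint64_t cycles = (uint64_t)clock_hz * delay_us / 1000000u;
    assert(cycles >= 46u);                      /* shorter delays not possible */
    uint64_t n = (cycles - 46u + 11u) / 22u;    /* round to nearest */
    assert(n <= 65535u);                        /* must fit a 16-bit loop counter */
    return (uint32_t)n;
}
```

For an 8 MHz clock and a 100,000 µs delay the helper is asked for 800,000 cycles, and delay_cycles can be used to check how close the chosen n comes.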
This routine occupies 6144 bytes of program memory and takes 20,480 clock
cycles (2560 µs at 8 MHz) to execute.
As we need to repeat the same operation 1024 times, clearly we have a prime
candidate for using a loop construction, thus:
CLEAR_ARR: MOVEA.L #0E000h,A0 ; Point A0 to ARRAY[0]
MOVE.W #1023,D0 ; Set up loop count less 1 in D0.W
CLOOP: CLR.B (A0)+ ; While [D0.W] > -1
; Clear Array element pointed to by A0 and move pointer on one byte
DBF D0,CLOOP ; Decrement loop count, exit on D0.W = -1
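For comparison, the loop can be written in C in a form that follows the assembly structure line by line. This is an illustrative sketch; clear_array is an invented name, and a compiler is of course free to generate something quite different from the hand-coded version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clear count bytes starting at array, mirroring the DBF loop:
   the counter starts at count - 1 and the loop exits when it reaches -1. */
void clear_array(uint8_t *array, size_t count)
{
    assert(array && count >= 1 && count <= 65536u);
    uint8_t *p = array;                  /* plays the role of A0 */
    int32_t d0 = (int32_t)count - 1;     /* loop count less 1, as in D0.W */
    do {
        *p++ = 0;                        /* CLR.B (A0)+ */
    } while (--d0 != -1);                /* DBF D0,CLOOP */
}
```

The upper limit on count reflects the 16-bit loop counter used by DBF.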
Inherent
Immediate, #kk
Here the operand is the data itself, not an address or pointer to an address. Gen-
erally the constant follows the op-code as one or two words. Three instructions
have Quick-Immediate variants where the data is embedded in the op-code itself,
MOVEQ reserves 8 bits for the signed constant (+127 to −128) and ADDQ/SUBQ
can only be used for unsigned 3-bit constants 1 to 8 (000b represents 8 here).
The instruction variants ADDI/SUBI permit constants of any applicable size to be
added or subtracted directly on alterable memory locations, rather than on Data
registers. Some examples are:
ADD.L #1,D0 ; [D0(31:0)]<-[D0(31:0)]+1 (16~) {Coded D0BC-0000-0001h}
ADDQ.L #1,D0 ; [D0(31:0)]<-[D0(31:0)]+1 ( 8~) {Coded 5280h}
ADDQ.W #1,0E000h ; [E000:1] <-[E000:1] +1 (20~) {Coded 5279-0000-E000h}
ADDI.W #56h,0E000h ; [E000:1] <-[E000:1] +56h (24~) {Coded 0679-0056-0000-E000h}
Notice the difference in size and execution time between the top two examples,
which do the same thing. Of course ADDQ is limited to constants of up to only
eight. The difference between ADDQ and ADDI for alterable memory destinations
is not so great, but still significant.
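How the 3-bit constant sits inside the ADDQ op-code can be shown with a short C encoder. The field layout assumed here (0101, 3-bit data, 0, 2-bit size, 6-bit effective address) is the standard 68000 format; the function name and parameter conventions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Build the first word of ADDQ: 0101 ddd 0 ss eeeeee.
   k    : constant 1..8 (8 is stored as 000b)
   size : 0 = byte, 1 = word, 2 = long
   ea   : 6-bit effective address field, mode in bits 5:3, register in 2:0 */
uint16_t encode_addq(unsigned k, unsigned size, unsigned ea)
{
    assert(k >= 1 && k <= 8 && size <= 2 && ea < 64);
    return (uint16_t)(0x5000u | ((k & 7u) << 9) | (size << 6) | ea);
}
```

With an ea field of 00h (D0) and long size the result is 5280h, and with 39h (absolute long) and word size it is 5279h, agreeing with the codings listed above.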
The vast majority of instructions use a Data register as the destination, source or
both — as listed in Table 4.8. The op-code itself holds the register number(s) (see
Fig. 4.4), so instructions using this address mode are short and also execute faster.
Thus, where convenient, variables should be kept in a register. The first two
examples under the Immediate heading also used Data Register Direct addressing
as the destination; some other possibilities are:
ADD.L D0,D1 ; [D1(31:0)] <- [D1(31:0)] + [D0(31:0)] {Coded as D280h}
ADD.B D1,0E000h; [E000] <- [E000] + [D1(7:0)] {Coded as D339-0000-E000h}
Addresses stored in an Address register can point to data for most instructions,
but only the special instructions ADDA, SUBA and MOVEA can also target and hence
change these pointers. The ADDQ and SUBQ variants can also target any Address
register in .W or .L sizes. They are useful to increment or decrement pointers.
Some examples are:
ADD.L A0,D0 ; [D0(31:0)] <- [D0(31:0)]+[A0(31:0)] {Coded as D088h}
ADDA.W #8000h,A1 ; [A1(31:0)] <- [A1(31:0)]+FFFF8000h {Coded as D2FC-8000h}
SUBQ.L #1,A1 ; [A1(31:0)] <- [A1(31:0)]-00000001h {Coded as 5389h}
Note again that any operation changing Address register contents always acts
on all 32 bits, and if word-sized (no byte size allowed), will be sign extended as
shown in the second example above.
The absolute address itself directly follows the op-code in this mode. In the
short-form version, only a 16-bit address is specified, and this is sign-extended
in the usual manner before being sent out on to the address bus. The applicable
range for this is 00007FFFh to 00000000h and FFFFFFFFh to FFFF8000h. Conceptualizing
the memory map as a grand circle, this can be thought of as a range
from +0 up to +32,767 and back to −32,768. The long form will of course specify
any address directly, but occupies an extra word of program memory and thus
takes an extra Read cycle (4~) during the fetch phase. Two examples are:
Here an Address register holds the location of the operand in memory, that is
points to the operand. The term Indirect is used, as the register does not hold
the data itself. Thus:
has the same effect as ADD.B 0E000h,D0, but of course once A0 is set up, the
shorter and faster indirect access can be used, and the target address dynamically
altered by changing the contents of A0.
There are two modes here, both of which automatically modify the designated
Address register, which points to the operand. The former decrements the ef-
fective address by one, two or four for a byte, word or long object respectively
before the operation. In the latter case the Address register holds the ea, which,
after the operation is complete, is incremented by the appropriate one, two or
four.
We have already illustrated these modes in use, see Fig. 4.1 and the opening
example on page 106, where we cleared an array. As a further example, which
also uses the previous indirect modes, consider the problem of digitally low-pass
filtering this same array. Taking the 1024 byte-array elements already stored
between locations E000 and E3FFh as samples in advancing time, originating
from, say, an analog to digital converter, then the 3-point algorithm [9] is given
as:
This mode offsets the contents of a designated Address register with both a
constant and a variable to give the effective address. The variable index can be
the contents of any Address or Data register. Either the entire 32 bits (.L) or a
sign-extended 16 bits (.W) can be used. The constant is a signed 8-bit byte. Thus
we have:
If we assume that D0.W is 0004h on entry, then the first instruction puts
the absolute address of the first table element (060Ch) into A0. The effective
address calculated in the following instruction is 00 + [A0] + SEX|D0(15:0),
in this case 00 + 0000060C + 00000004 = 00000610h. The data in this byte is
4Ch, and this is the value moved to D0(7:0) prior to return.
As can be seen from this example, this mode is useful for random access
into an array, with the array number (or a multiple of, for word or long-word
arrays) being in the Index register. It is instructive to compare this example with
its equivalent 6809 code on page 39, which used an Accumulator to hold the
variable offset and one of the Index registers to hold the base address.
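The effective-address arithmetic for the word-index form can be modelled in a few lines of C. This is a sketch under stated assumptions, not a description of the MPU's internal logic: the function name is hypothetical, and the casts rely on the two's complement conversion behavior found on essentially all current compilers.

```c
#include <stdint.h>

/* ea = displacement + [An] + sign-extended Xn.W
   for the Address Register Indirect with Index mode, d8(An,Xn.W). */
uint32_t ea_indexed_word(int8_t disp8, uint32_t an, uint16_t xn_w)
{
    int32_t index = (int16_t)xn_w;       /* sign-extend the 16-bit index */
    int32_t disp  = disp8;               /* sign-extend the 8-bit constant */
    return an + (uint32_t)index + (uint32_t)disp;
}
```

With the values used in the text, `ea_indexed_word(0, 0x060C, 0x0004)` gives 0610h.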
[Instruction format: op-code, displacement]
This is similar to Address Register Indirect with Displacement but this time the
Program Counter is the specified register. For example in:
MOVE.B 200h(PC),D0 ; [D0(7:0)] <- [[PC]+200h]
the data 200h bytes on from where the PC is (actually pointing to the next in-
struction) is placed in D0.B. This of course is not an absolute address, as only
the distance from the instruction is of interest. Like the relative Branch instruc-
tions, a label is normally used for the destination and the assembler evaluates
the appropriate offset.
The only difference between the two programs is in line 1. Previously the absolute
address of the table bottom was put into A0. In the PIC case, A0 is loaded with
the contents of PC plus 6, which is again the address of the bottom of the table,
but is calculated at run time. If we were to relocate the subroutine to start at
1780h, nothing would change.
In practice, if the first line of the program were:
LEA TABLE_BOT(PC),A0
the assembler would produce the same code (41FA-0006h), evaluating the dif-
ference between TABLE_BOT and the location of the following instruction, that is
6 bytes. The absolute value of TABLE_BOT is not used as the offset — as in the
case of Branch instructions.
Note the use of Load Effective Address to move the ea generated by any ad-
dress mode (except Pre-Decrement and Post-Increment) into an Address register.
Some other examples are:
LEA is long-word sized only, and must solely target an Address register.
This is similar to Address Register Indirect with Index in that a constant offset
plus a variable offset in either an Address or Data register is added to the PC to
give an effective address. The assembler permits a label to be used as the con-
stant, and will calculate the required difference. Using this mode the 7-segment
program reduces to:
Note the offset of 02 in the machine code generated by the first instruction.
The offset permissible for this mode is only +127 to −128, which represents
a considerable limitation compared to the plain offset-mode with a range of
+32,767 to −32,768 (both ranges have been extended for the 68020 MPU).
The twelve address modes covered here are summarized in Table 4.9. Except
for the two Register Direct modes, additional time is needed to calculate the
effective address. Some of this may be due to the necessity to fetch one or more
extension words, and some due to the address arithmetic. As an example, the
base time to CLeaR a memory byte is 8 clock cycles (4 to read the op-code and
4 to send out the zero on the data bus). Thus from the table, CLR.B (An) takes
8 + 4 = 12~, CLR.B 0E04567h takes 8 + 12 = 20~. Reference [4] gives timings for
all instructions. The 68008 takes longer to generate eas for most operations due
to its byte-sized Data bus.
Note 1: A 3-bit code indicating the target register for modes 000b to 110b,
otherwise a submode.
Note 2: The Index register, which can be any Data or Address register, is specified
as a 4-bit code in the extension word, which also carries the 8-bit offset.
Not all address modes are legitimate in every situation. For example, an Immediate
operand by definition cannot be specified as the destination ea. Also,
but not so obviously, the two Program Counter Indirect modes are also illegal for
a destination operand. This is because it is considered bad practice to modify
program code, and in any case the area around the PC will frequently be in ROM
and therefore cannot be altered. The group of address modes excluding PC Rel-
ative and Immediate are referred to as Alterable. Those also excluding Address
Register Direct are categorized as Data Alterable. In general, except for special
instructions such as ADDX, all address modes may be used as a source operand.
The destination operand may be a Data register only, an Address register only
or, in more comprehensive operations, such as MOVE and ADD, a Data Alterable
mode may be specified. Except for MOVE, one of the operands must be a register.
Table 4.8 summarizes the permitted address modes for each instruction.
Table 4.9 also lists a 6-bit code against each mode. This is the bit pattern
used in the op-code to specify the address mode for both source (if present) and
destination. Two examples are given in Fig. 4.4. Of course it is not necessary
for the programmer to work out the binary code for an instruction, unless he
or she suspects the assembler's integrity — I did once find an assembler which
incorrectly coded one instruction – address mode combination. After all this is
the main raison d'être for using an assembler.
source operand to the operand field and destination to the instruction mnemonic,
for instance:
In 68000 assembly language, the mnemonic does not contain any operand
information, and any operands appear explicitly or implicitly in the operand field
as <source>,<destination>, for example:
However, the size of the operands is indicated in the mnemonic field by the
extension .B, .W or .L as appropriate. Both operands are the same size.
One quirk peculiar to the XA8 cross assembler is the treatment of the MOVE
Multiple (MOVEM) instruction. The standard Motorola way of representing a
range of registers is to use the - range operator, for example D0-D3 meaning
D0/D1/D2/D3. Thus the two ways of indicating a Push of the registers D0 to D3
and A0 on to the System stack are:
The XA8 assembler unfortunately does not support the - range operator.
Each program module is written in the form of a complete subroutine, with
data assumed present on entry in some place, usually a Data register, and termi-
nated by a ReTurn from Subroutine (RTS) instruction. We will look at subrou-
tines in some detail in Chapter 5.
Our first program generates the sum of all integers up to a maximum n
of 65,535 (FFFFh). We assume that n is passed to the subroutine in the lower
word of D0. The maximum possible sum of 2,147,450,880 fits comfortably in
the 32-bit D1 for return. Compare this with the n = 255 limit in the 6809 equiva-
lent on page 45 due to its smaller registers, although of course external memory
could have been used for larger operands.
The algorithm used in the listing of Table 4.10 simply clears Data register D1,
which will hold the 32-bit sum, and also the upper 16 bits of D0. This latter operation
effectively promotes the word-sized parameter n, passed to the subroutine
in D0.W, to long-sized. Both operands must be the same size for the addition of line 12,
which adds the progressively decrementing n to the partial sum. The loop control
DBF implements this decrementation using n both as the operand and the
loop counter. When n drops below zero, the loop terminates and the final sum
is in D1.L as specified.
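For comparison, the same loop algorithm can be sketched in C. The function name is hypothetical and the code is only a model of the approach described, not a translation of Table 4.10; the explicit widening of n mirrors the promotion of D0.W to long size.

```c
#include <stdint.h>

/* Sum of all integers from 1 to n, with n limited to 16 bits. */
uint32_t sum_to_n(uint16_t n)
{
    uint32_t sum = 0;            /* plays the part of D1.L, cleared on entry */
    uint32_t count = n;          /* n promoted from word to long */

    for (; count > 0; count--) {
        sum += count;            /* add the progressively decrementing n */
    }
    return sum;
}
```

The largest argument, 65,535, yields 2,147,450,880, which fits in 32 bits as noted above.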
The object code shown in Table 4.10 is the result of passing the source code
file through the assembler and then the linker-loader, as described in Section 7.2.
All 68000-based programs in this book assume ROM from 0400h up for the pro-
gram sections designated _text and RAM from E000h up for the _data sec-
tions. Only _text is needed in this case. The program is 16 bytes long and takes
54 + 14n clock cycles to execute (maximum 114,694.75 µs at 8 MHz).
The alternative direct algorithm:
\[ \mathrm{sum} = n \times \frac{(n + 1)}{2} \]
is shown coded in Table 4.11. This copies n into D1.W, adds one, multiplies
to give the long n × (n + 1) and then divides by two using a Shift Right once
operation. Only 10 bytes in length, it takes 104 clock cycles to execute (13 µs
at 8 MHz) irrespective of n. However, like its 6809 equivalent of Table 2.10, one
value of n will give an erroneous zero answer. It is left to the reader to determine which.
The conversion loop simply divides repetitively by ten the long binary number
passed in D0, producing the 16-bit remainder in the top of D0 and the 16-bit
quotient at the bottom. SWAP (line 19) is used to reverse the order of these, and
with the quotient safely at the top, the following Convert to ASCII and Move-byte
operations leave this undisturbed (lines 20 to 22). Finally, clearing the remainder
and swapping again restores the quotient as a 32-bit quantity ready for the next
32 ÷ 16 bit DIVU.
Data register D1.W is used with DBF to give 5 passes around the loop, and A0
is used as a pointer to the next RAM byte for the string digits, in conjunction with
the Post-Increment address mode. Multiple MOVEs at the start and end of the
subroutine Push and Pull all used registers on to and off the System stack, and ensure that
the internal state (except the CCR) is returned unaltered on completion.
Unlike the 6809 equivalent in Table 2.12, the binary number is not restricted
to FFFFh (65,535). As we have coded the algorithm for five digits, the upper limit
is 99,999. Changing line 16 to MOVEQ #5,D1 (i.e. six digits) will increase this to
655,359 before overflow occurs. The reason the limit is not 999,999 is the 16-bit
quotient produced by DIVU. The 68020 MPU has a 32 × 32 divide, giving a 32-bit
quotient and remainder (e.g. DIVUL #10,D0:D1 puts the 32-bit quotient in D0
and 32-bit remainder in D1). With the 68000's DIVU, one approach is initially
to divide the binary number by 10,000, the quotient then holding the upper five
digits and the remainder the lower five digits. Each half is then processed as
shown. The limit thus is 4,294,967,295. Coding this is left as an exercise for the
reader.
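The 655,359 limit follows directly from the 16-bit quotient, and can be checked with a small C model of DIVU. This is a sketch with hypothetical function names; the digit ordering (least significant first) is an assumption made for simplicity and need not match the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 32-bit by 16-bit unsigned divide giving a 16-bit quotient
   and 16-bit remainder. Returns false on overflow or a zero divisor
   (the real MPU would trap on the latter). */
bool divu_32_by_16(uint32_t dividend, uint16_t divisor,
                   uint16_t *quotient, uint16_t *remainder)
{
    if (divisor == 0) {
        return false;
    }
    uint32_t q = dividend / divisor;
    if (q > 0xFFFFu) {
        return false;                    /* quotient will not fit in 16 bits */
    }
    *quotient = (uint16_t)q;
    *remainder = (uint16_t)(dividend % divisor);
    return true;
}

/* Repeated division by ten, writing 'digits' ASCII characters to 'out',
   least significant digit first. Returns false if any divide overflows. */
bool to_decimal(uint32_t value, unsigned digits, char *out)
{
    for (unsigned i = 0; i < digits; i++) {
        uint16_t q, r;
        if (!divu_32_by_16(value, 10, &q, &r)) {
            return false;
        }
        out[i] = (char)('0' + r);
        value = q;
    }
    return true;
}
```

In this model a six-digit conversion of 655,359 succeeds, while 655,360 fails on the first divide because its quotient, 65,536, needs 17 bits.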
in D1. As we observed on page 50, this restricts n to no more than 12, and to
signal a value outside this range, D0.L is used to return an error status, −1 for
error and 0 for success.
As in Section 2.3, there are two techniques for tackling problems of this nature.
The direct method uses the mathematical definition of factorial as the product
of all integers up to and including n (with the exception of 0! = 1), as shown
in Fig. 2.5. Although the 6809 MPU has a multiplication instruction, its 8 × 8 field
size meant that the necessary 32×8 products had to be evaluated as four separate
operations together with the necessary shifting and addition. Furthermore the
growing product had to be kept externally in four memory bytes, all of which led
to the messy coding of Table 2.13.
Matters are somewhat improved in the 68000 with its 16 × 16 multiply and
32-bit Data registers. Implementing a 32 × 8 multiplication now involves the
process:
Firstly the 32-bit sum of products is split into two words each of which is
multiplied by n (promoted to word size in line 12). The second product is shifted
left 16 places and the two products added to give the new sum. Repeating this
with M decrementing from n to 1 gives the loop algorithm of Table 4.13, lines 21 –
34.
Splitting up the sum of products, using a word MOVE from D1 (holding the
32-bit sum) to D2.W, gives the 16-bit SUM.L. Moving all of D1 to D3 and then
swapping words (SWAP D3) puts the 16-bit SUM.U in the lower word of D3. The
two MULUs of lines 26 and 27 then give the two sub-products. The second of these
is moved left 16 places by doing a SWAP and clearing the lower 16 bits. Finally,
they are summed in D1 (lines 30 and 31) to give the grand total. Decrementing n
(line 33) completes the loop.
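The split-multiply-recombine process can be mirrored in C by restricting all multiplications to 16 × 16 bits. The sketch below uses hypothetical function names and is a model of the technique rather than a transcription of Table 4.13; as in the text, any bits of the upper sub-product that would fall above bit 31 are simply lost.

```c
#include <stdint.h>

/* 16 x 16 -> 32 unsigned multiply, the only size the 68000 MULU offers. */
static uint32_t mulu(uint16_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* 32 x 16 multiply built from two 16 x 16 products. */
uint32_t mul32_by_16(uint32_t sum, uint16_t n)
{
    uint16_t lower = (uint16_t)sum;           /* word MOVE of the lower half */
    uint16_t upper = (uint16_t)(sum >> 16);   /* long MOVE followed by SWAP */

    uint32_t low_product  = mulu(lower, n);
    uint32_t high_product = mulu(upper, n);

    high_product <<= 16;                      /* SWAP, then clear lower word */
    return low_product + high_product;
}

/* Factorial by repeated multiplication; valid for n up to 12. */
uint32_t factorial_loop(uint16_t n)
{
    uint32_t product = 1;
    for (uint16_t m = n; m > 1; m--) {
        product = mul32_by_16(product, m);
    }
    return product;
}
```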
Once again this example is easier to implement with the 68020 MPU, which has
a 32×32-bit multiply MULU.L. This would avoid the need to split the multiplicand
in two and later combine the two sub-products.
On entry to the loop, n is tested for 1 or 0, and if True the subroutine is
exited with D0.L cleared. The alternative exit if n > 12 (lines 14 and 15) puts
FFFFFFFFh (−1) in D0.L to signal error and bypasses the clearing operation.
Where no simple mathematical algorithm exists to specify a function, using
a table of outcomes is the only approach, for example the 7-segment decoder
of page 111. Although this is not the case here, there are only 13 successful
outcomes to the subroutine, and the use of a look-up table is an attractive propo-
sition.
Using this approach, the resulting coding of Table 4.14 shows the active por-
tion of the program (i.e. excluding error checking and reporting, which is the same
as the previous listing) to be only lines 21 and 22. The first multiplies n by four to
match the size of the table entries. This is then used as the Index register (D0.W)
to point into the table, with A0 holding the base address 0426h (TABLE). For exam-
ple, if n = 4 then [D0(15:0)] becomes 10h (4 × 4) and MOVE.L 0(A0,D0.W),D1 ef-
fectively moves the 4 bytes starting at 0+[A0]+[D0(15 : 0)] = 0+0426h+10h =
0436h to D1.L. The contents of 0436:7:8:9h are 24 (00 00 00 18h), as required
for n!.
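The look-up approach translates naturally into a constant array in C, where the compiler performs the scaling of the index by the element size. The names below are hypothetical; the status convention (0 for success, −1 for error) follows the description above.

```c
#include <stdint.h>

/* The 13 factorials that fit in 32 bits, 0! to 12!. */
static const uint32_t factorial_table[13] = {
    1u, 1u, 2u, 6u, 24u, 120u, 720u, 5040u, 40320u,
    362880u, 3628800u, 39916800u, 479001600u
};

/* Returns 0 and stores n! in *result, or returns -1 if n is out of range. */
int32_t factorial_lookup(uint16_t n, uint32_t *result)
{
    if (n > 12) {
        return -1;                   /* error status, as FFFFFFFFh in D0.L */
    }
    *result = factorial_table[n];    /* index is scaled by 4 by the compiler */
    return 0;
}
```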
References
[1] Starnes, T.W.; Powerful Instructions and Flexible Registers of the 68000 Make Programming Easy, Electronic Design, 28, no. 9, April 1980, pp. 171 – 176.
[2] Wakerly, J.F.; Microcomputer Architecture and Programming: The 68000 Family,
Wiley, 1989, Section 8.4.1.
[3] Motorola; M68000 16/32-bit Microprocessor Programmer's Reference Manual,
5th ed., 1986.
[4] Leventhal, L.A.; 68000 Assembly Language Programming, McGraw-Hill, 2nd ed.,
1986.
[5] Leventhal, L.A. and Cordes, F.; Assembly Language Subroutines for the 68000,
McGraw-Hill, 1989.
[6] Kelly-Bootle, S. and Fowler, B.; 68000, 68010, 68020 Primer, H.W. Sams, 1985.
[7] Van de Goor, A.J.; Computer Architecture and Design, Addison-Wesley, 1989, Section 4.1.2.
[8] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987, Section 1.2.
[9] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987, Section 9.1.
CHAPTER 5
y = sin(x);
the program must jump out to the code, carrying with it the value of x. After
execution, the outcome y will be found at some prearranged location.
Subroutines are primarily used to reduce the size of the overall code, since
they may be called repeatedly from many points outside, including from other subroutines,
and even from inside themselves (when they are known as recursive). For
example, the calculation of sine may be needed at five different parts of the program,
but if it is coded as a subroutine, only one implementation is necessary.
Furthermore, subroutines can be nested, with one subroutine calling another. For
example, a cosine subroutine will typically make use of
the sine function.
Sets of useful subroutines are often organized in a library. These libraries are
scanned at link time (see Section 7.2) and the relevant entries referred to in the
user's program, extracted and added to the final code. To be used in this man-
ner, each subroutine must be documented with well-defined parameter-passing
protocols. Libraries may be built up by the user or be available as a commercial
package. High-level languages usually come with several such packages.
Aside from saving space, subroutines are the vehicle normally used to im-
plement modular programming [1]. A structured approach to hardware design
decomposes the system into functional modules, for example oscillator, gate,
counter, decoder, display. Each module has a relatively simple function and may
be designed, implemented and tested as a separate entity, with the appropriate
stimuli. This may not produce the smallest, most efficient circuit, but it is likely
that the product will come to fruition earlier and be more maintainable due to its
testability.
The software module is analogous to its hardware cousin as it too can be
inserted into its motherboard (the main program), takes one or more signals
(parameters, e.g. x) and has an outcome (return values, e.g. sin(x)). A software
JSR DELAY is equivalent to PSHS PC followed by JMP DELAY; RTS is equivalent to PULS PC.
Note 1: Available in signed 8-bit (+127, −128) and 16-bit offset (+32,767, −32,768)
varieties. Most assemblers can choose the appropriate version automatically.
The 68020 upwards have a full 32-bit offset Branch capability.
From Fig. 5.2 we see that the action of JSR/BSR and RTS on the System stack
is the same for both 6809 and 68000 MPUs, except the latter requires four bytes.
As is usual for Motorola MPUs, the lower byte is located in the higher address (i.e.
the lower byte of the address is pushed out first). The 68000's SSP must always
point to an even address, and this will be enforced even if a single byte is pushed
out.
Figure 5.2 Saving the return address on the Stack. The SSP assumed a priori set to 4000h.
As an example, consider a subroutine to give a 0.1 s delay. This is easily implemented
by loading a constant into a register and decrementing to zero. Coding
for the 6809 and 68000 processors is shown in Table 5.2. Other than the terminating
RTS, the programs are perfectly normal routines. Strictly, in calculating
their delay, the time to get to the subroutine should be considered, and this can
differ according to how far away the subroutine is from the caller and which Call
instruction and/or address mode is used. This also illustrates that there is a time
overhead in using a subroutine, and where speed is of the essence, in-line code
should be used.
Notice that in both cases illustrated in Table 5.2, one of the registers (X or A0)
will be returned in an altered state, the same being true of the Condition Code register
(CCR). Provided that such changes are well documented, this will frequently
be of little consequence. However, it is often preferable to make subroutines
transparent in that all registers, or perhaps a subset, remain unaltered. This can
be accomplished by pushing all relevant registers into a stack at the beginning
of the subroutine and pulling them out again just before the final exit RTS.
Table 5.2 A simple subroutine giving a fixed delay of 100 ms when called.
1 .processor m6809
2 ; ********************************************************************
3 ; * This subroutine does nothing and takes 0.1 s to do it *
4 ; * ENTRY : None *
5 ; * EXIT : X Address register = 0000, CCR destroyed *
6 ; ********************************************************************
7 .define N = 12500-(5+3)/8
8 E000 8E30D3 DELAY: ldx #N ; Delay factor, 3~
9 E003 301F DLOOP: leax -1,x ; Decrement , Nx5~
10 E005 26FC bne DLOOP ; to zero , Nx3~
11 E007 39 rts ; , 5~
12 .end
This is easy in the 6809 MPU, as any combination of registers, including the CCR, can
be Pushed or Pulled with a single instruction, see Table 5.3(a). There is a slight
problem with the 68000 MPU. The MOVEM instruction used for Pushing and Pulling
only acts on Address and Data registers. There is a MOVE SR,-(SP) instruction
which copies the whole Status register, of which the CCR is the lower byte. The
opposite Pull operation is supported, that is MOVE (SP)+,CCR! Although the lat-
ter only pulls out a byte, the SSP moves up two bytes. This is necessary to obey
the rule that the SSP always points to an even address, and thus preserves the
integrity of the System stack. Interestingly the 68010 and higher family mem-
bers have gained the missing MOVE CCR,<ea> instruction, which matches the
MOVE <ea>,CCR instruction.
From Table 5.1(b), we see that the 68000 family has a second Return instruc-
tion, RTR (ReTurn and Restore CCR). This is used as an equivalent to the se-
quence:
MOVE (SP)+,CCR
RTS
and assumes that the CCR has been saved out onto the System stack at the be-
ginning of the subroutine before any other stack-based operations have altered
the SSP. Notice from Table 5.3(b) that the CCR is saved first (line 8), before the
Data register is Pushed. The Pull sequence at the end of the subroutine is then
in the reverse order. Failure to observe this can lead to spectacular crashes! The
equivalent instruction PULS CCR,PC is sometimes used to terminate a 6809 sub-
routine (see Table 8.3).
Figure 5.3 The stack when executing the code of Table 5.3(b), viewed as word-oriented.
(a) 6809 code. Note lines 12 and 13 could be replaced by puls x,cc,pc.
1 .processor m68000
2 ; ********************************************************************
3 ; * This subroutine does nothing and takes 0.1 s to do it *
4 ; * ENTRY : None *
5 ; * EXIT : No change *
6 ; ********************************************************************
7 .define N = (200000-8-18-14-8-8)/14
8 000400 40E7 DELAY: move sr,-(sp) ; Save CCR (in SR) , 14~
9 000402 3F00 move.w d0,-(sp) ; and Data reg d0(15:0) , 8~
10 000404 303C37CC move.w #N,d0 ; Initial delay factor , 8~
11 000408 5340 DLOOP: subq.w #1,d0 ; Decrement , Nx4~
12 00040A 66FC bne DLOOP ; to zero , Nx10/8~(taken/not)
13 00040C 301F move.w (sp)+,d0 ; Retrieve old d0(15:0) , 8~
14 00040E 4E77 rtr ; Retrieve CCR then RTS , 20~
15 .end
(b) 68000 code. Note rtr is equivalent to 6809 code puls cc,pc.
Table 5.4 Using a register to pass the delay parameter. The call-up sequence shown above passed a
constant (ten) to the subroutine.
1 .processor m68000
2 ; ********************************************************************
3 ; * This subroutine does nothing and takes Zx0.1s to do it *
4 ; * EXAMPLE : Z = 10; delay = 1 second *
5 ; * ENTRY : Z passed in lower 16 bits of D1 *
6 ; * EXIT : D1(15:0) = FFFF, D0(15:0) = 0000, CCR destroyed *
7 ; ********************************************************************
8 .define N = (200000-8-10)/14
9 000400 6008 DELAY: bra LOOPTEST ; Check Z = 0 , 10~
10 000402 303C37CC OUTERLOOP: move.w #N,d0 ; 100ms delay factor , 8~
11 000406 5340 INNERLOOP: subq.w #1,d0 ; Decrement , Nx4~
12 000408 66FC bne INNERLOOP ; to zero , Nx10/8
13 00040A 51C9FFF6 LOOPTEST: dbf d1,OUTERLOOP ; One less 100ms click, 10/14~
14 00040E 4E75 rts ; , 16~
15 .end
The coding itself uses an inner loop (lines 11 and 12) identical to that in Ta-
bles 5.2 and 5.3, with DBF being employed in conjunction with D1.W (i.e. Z) to
count the number of passes through this inner core (i.e. 0.1 s ticks). This DBF
Decrement and Test is exercised immediately the subroutine is entered, to en-
sure a speedy exit should Z be zero. The delay due to line 9 only happens once,
and can be thought of together with the caller's JSR/BSR as a constant error in-
dependent of Z. No data is returned from this void subroutine.
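The control structure, with the DBF test entered before the first outer pass, can be modelled in C to confirm the pass count and the exit state of D1.W. This is a sketch only: the function and macro names are hypothetical, the constant is the 37CCh shown in the listing, and no attempt is made to reproduce the timing.

```c
#include <stdint.h>

#define DELAY_FACTOR 14284u      /* (200000-8-10)/14 = 37CCh, from the listing */

/* Counts inner-loop passes for a delay parameter z and reports the value
   left in the D1.W model on exit. */
uint32_t delay_passes(uint16_t z, uint16_t *d1_exit)
{
    uint32_t passes = 0;
    uint16_t d1 = z;

    /* DBF: decrement the counter and loop unless it has reached FFFFh.
       Testing first gives an immediate exit when z is zero. */
    while (d1-- != 0) {
        for (uint16_t d0 = DELAY_FACTOR; d0 != 0; d0--) {
            passes++;            /* SUBQ.W #1,D0 / BNE inner loop */
        }
    }
    *d1_exit = d1;               /* FFFFh, as stated in the EXIT comment */
    return passes;
}
```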
If the delay parameter is a variable, for example data read from an analog to
digital converter, and stored somewhere in memory at MEM_Z, then:
MOVE.W MEM_Z,D1 ; Copy the delay variable to D1
BSR DELAY ; to pass to subroutine
will do the necessary. Note that the parameter passed is a copy of the variable
(still in MEM_Z), not the variable itself. Thus when D1.W is decremented in the
subroutine, Z will not be altered, just its clone. Passing copied parameters is
known as call by value [2]. We will look at ways of directly affecting variables
through a subroutine later.
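C passes its function arguments in exactly this call-by-value manner. In the minimal sketch below (all names hypothetical) the function counts its own copy down to zero, yet the caller's variable is untouched afterwards.

```c
#include <stdint.h>

uint16_t mem_z = 10;             /* the variable, held in memory */

/* Receives a copy of the delay parameter and decrements that copy. */
void delay_by_value(uint16_t z)
{
    while (z != 0) {
        z--;                     /* only the clone is altered */
    }
}

/* Calls the delay and returns the caller's variable for inspection. */
uint16_t call_delay(void)
{
    delay_by_value(mem_z);       /* a copy of mem_z is passed */
    return mem_z;
}
```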
Using registers to pass parameters is convenient, fast and efficient. Further-
more, with some modification, it is suitable for recursion (subroutines that call
themselves), supports re-entrant code (subroutines which can be interrupted and
then called again by the service routine, see Section 6.1) and is position indepen-
dent. Its main problem is lack of generality, as the complement, range and type
of registers available vary considerably between devices. Thus the 6502 MPU has
two 8-bit Address registers and one 8-bit Data register, the 8086 has four 16-bit
Data registers and three 16-bit Address registers, while the 68000 has eight
32-bit registers of each type. This is especially a problem with high-level
language compilers, which attempt to be portable between processors.
Table 5.5 Using a static memory location to pass the delay parameter.
1 .processor m68000
2 ; ********************************************************************
3 ; * This subroutine does nothing and takes Zx0.1 s to do it *
4 ; * EXAMPLE : Z = 10; delay = 1 s *
5 ; * ENTRY : Z passed in memory location 6000/6001h *
6 ; * EXIT : D1(15:0) = FFFF, D0(15:0) = 0000, CCR destroyed *
7 ; ********************************************************************
8 .define N = (200000-8-10)/14
9 000400 32386000 DELAY: move.w 6000h,d1 ; Get delay parameter, 12~
10 000404 6008 bra LOOPTEST ; Check Z = 0 , 10~
11 000406 303C37CC OUTERLOOP: move.w #N,d0 ; 100 ms delay factor, 8~
12 00040A 5340 INNERLOOP: subq.w #1,d0 ; Decrement , Nx4~
13 00040C 66FC bne INNERLOOP ; to zero , Nx10/8
14 00040E 51C9FFF6 LOOPTEST: dbf d1,OUTERLOOP ; 1 less 100 ms click, 10/14~
15 000412 4E75 rts ; , 16~
16 .end
If MEM_Z was actually the common memory location, then this copy would not
need to be made, but care would have to be taken not to alter the variable itself
(rather than the copy).
The use of common static memory has the advantage of being able to pass
large numbers of parameters and structures such as arrays. However, as these lo-
cations are by definition fixed, such subroutines cannot be recursive or re-entrant.
Also, unless different static locations are used for each subroutine, nesting can
lead to unfortunate side effects as one subroutine inadvertently alters another
subroutine's variables. This makes debugging difficult, as routines other than the
one being tested may interact in unpredictable ways. Such common areas can be
used to hold global variables, which are known throughout all linked program
modules.
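The incompatibility between fixed parameter locations and recursion can be demonstrated in C by deliberately passing an argument through a single static variable. The sketch is contrived, with hypothetical names, purely to expose the side effect; the second function passes its argument in the normal way for comparison.

```c
#include <stdint.h>

static uint32_t shared_n;        /* the one fixed parameter location */

/* Recursive factorial taking its argument from the shared location. */
static uint32_t fact_static(void)
{
    if (shared_n <= 1) {
        return 1;
    }
    shared_n = shared_n - 1;             /* set up the nested call's parameter */
    uint32_t inner = fact_static();
    return (shared_n + 1) * inner;       /* this level's n has been overwritten */
}

uint32_t factorial_via_static(uint32_t n)
{
    shared_n = n;
    return fact_static();
}

/* Each level of recursion has its own private copy of n. */
uint32_t factorial_via_stack(uint32_t n)
{
    return (n <= 1) ? 1 : n * factorial_via_stack(n - 1);
}
```

`factorial_via_stack(4)` gives 24, whereas `factorial_via_static(4)` gives a different value, because every nested call has altered the one location all levels share.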
Many of these problems can be overcome by using a stack to pass variables
back and forth, or preferably putting them there in the first place [3, 4]. This
situation is depicted in the listing of Table 5.6 and Fig. 5.4. Now to call up DELAY,
a copy of the delay variable Z is pushed onto the System stack before calling the
subroutine. On return the System Stack Pointer must be moved back up again to
balance this Push and be returned to its original position. Using LEA 2(SP),SP
is an alternative to ADDQ #2,SP (or ADDA #2,SP), and can be used for operands
up to 32,767. The 8086 MPU family has a convenient RET #n instruction which
is equivalent to LEA +n(SP),SP after a RTS. Similarly, the 68010 and up has a
RTD #n equivalent (ReTurn and Deallocate parameters) where n is a 16-bit displacement.
Comparing Tables 5.6 and 5.5, we see that the only change is of Address mode
in line 9. From Fig. 5.4, we see that Z lies 4:5 bytes up from where the SSP points
to on arrival. Its effective address is thus 4(SP).
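The 4(SP) offset can be checked with a small C model of the System stack. Everything here is an illustrative assumption: the array stands in for memory below 4000h, the initial SSP of 4000h is borrowed from Fig. 5.2, and the function names are hypothetical. Words and long words are stored high byte first, as on the 68000.

```c
#include <stdint.h>

static uint8_t ram[0x4000];      /* models addresses 0000h to 3FFFh */
static uint32_t ssp = 0x4000;    /* System Stack Pointer */

static void push_word(uint16_t value)
{
    ssp -= 2;
    ram[ssp] = (uint8_t)(value >> 8);
    ram[ssp + 1] = (uint8_t)(value & 0xFFu);
}

static void push_long(uint32_t value)
{
    push_word((uint16_t)value);            /* lower word at the higher address */
    push_word((uint16_t)(value >> 16));
}

static uint16_t read_word(uint32_t address)
{
    return (uint16_t)((ram[address] << 8) | ram[address + 1]);
}

/* Pushes z, simulates the call, fetches the parameter at 4(SP),
   then returns and cleans up. */
uint16_t demo_stack_parameter(uint16_t z, uint32_t return_address)
{
    ssp = 0x4000;
    push_word(z);                          /* caller: push a copy of Z */
    push_long(return_address);             /* BSR/JSR: push the 4-byte PC */
    uint16_t seen = read_word(ssp + 4);    /* subroutine: MOVE.W 4(SP),D1 */
    ssp += 4;                              /* RTS pulls the return address */
    ssp += 2;                              /* caller: ADDQ #2,SP */
    return seen;
}

uint32_t current_ssp(void)
{
    return ssp;
}
```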
Passing parameters using dynamic allocation permits nesting, recursion and
re-entrancy as the SSP automatically moves down for each call and up again on
each return. Essentially such variables are local (sometimes called automatic) and
are known only to their own subroutine. The technique is general to all processors
supporting a stack, and is used by block-structured high-level languages such as
Algol, Pascal and C [5]. It is also possible to return values on a stack in a similar
manner.
All our examples so far have involved copying the value of a variable to pass to
the subroutine. The actual variable itself is somewhere out in read/write memory
the two locations. A comparison tests for a successful copy as well as advancing
the pointer. The DBNE loop control exits if it is true that the two bytes are not
equal (i.e. unsuccessful) otherwise decrements the count in D0.W (originally set
to LENGTH − 1 in line 19) and repeats. The residue in D0.W will be FFFFh if each
copied byte is verified, otherwise its exit state reflects the number of loop passes
taken.
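The copy-and-verify loop with its DBNE-style exit can be sketched in C as follows. The function name is hypothetical, and the sketch assumes a length of at least one; as with DBcc, the counter starts at one less than the number of passes.

```c
#include <stdint.h>

/* Copies 'length' bytes from src to dst, reading each one back to verify.
   Returns FFFFh if every byte verified, otherwise the counter value at
   the point of failure. Assumes length >= 1. */
uint16_t copy_and_verify(const uint8_t *src, volatile uint8_t *dst,
                         uint16_t length)
{
    uint16_t count = (uint16_t)(length - 1);

    for (;;) {
        uint8_t wanted = *src++;
        *dst = wanted;
        uint8_t stored = *dst++;     /* read back; differs if dst is not RAM */
        if (stored != wanted) {
            break;                   /* NE is true: exit, count unchanged */
        }
        if (count-- == 0) {
            break;                   /* counter expired: count is now FFFFh */
        }
    }
    return count;
}
```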
The System stack, as seen in Fig. 5.5, is used for three purposes. Firstly the
three parameters are pushed out prior to the call, in a sequence such as:
Then the actual call places the PC on the System stack automatically. Finally, as
the subroutine is to be transparent, the System stack is used to save any used
registers, apart from D0.
The code shown in Table 5.7 uses offsets from the SSP to obtain the three
parameters, for example MOVE.W 22(SP),D0. This can cause problems, since in
the body of many subroutines, the SSP is used to Push and Pull temporary results
of evaluation into and out of the System stack. In particular local variables (that
is variables used only by the subroutine and forgotten about after return) are also
frequently kept on this stack. All this means that the parameter offsets from the
SSP will be in a constant state of flux. To get around this problem another Address
register is frequently pointed to the top of the System stack at the beginning of
the subroutine and this remains as a fixed point of reference for the duration of
the subroutine, irrespective of what is happening to the SSP. This is known as the
Frame Pointer (FP), with the space used on the System stack after entry being the
Frame.
Our final example is used to illustrate the concept of a Frame. Consider a
subroutine where an analog signal must be sampled as rapidly as possible for a
variable number of times, using an 8-bit analog to digital converter, after which
the resulting array is to be processed in some manner. Typical processes are
filtering, averaging and peak detection. To keep our program as simple as pos-
sible, we will assume that we wish to return the simple sum of not more than
256 of these samples. To comply with the injunction that sampling should be as
quick as possible, it will be necessary to allocate space to store temporarily up
to 256 bytes. After this burst of sampling, the process can be carried out on the
array now in situ in this RAM buffer.
Our first implementation is based on the 6809 MPU, as an example of a proces-
sor without any specific Frame-handling instructions. The System stack reflecting
the coding of Table 5.8 is shown in Fig. 5.6. The variable i representing the num-
ber of samples to be taken is pushed on to this stack in the normal way prior to
the subroutine call. The subroutine itself commences by saving the contents of
the User Stack Pointer (USP) on the System stack. The USP is to point to the Top
Of Frame (TOF) and is thus to be the Frame Pointer. Transferring the contents of
the System Stack Pointer (SSP) to the USP effectively points the Frame Pointer to
the TOF, and then the SSP is moved down 257 bytes, one to hold the temporary
(local) variable holding the count and 256 for the array (lines 11 – 13). At this
point, the SSP points to the bottom of the frame (BOF) but, as all references in Table 5.8
use the Frame Pointer (e.g. line 21, DEC -1,U), the SSP can be used subsequently
for other purposes.
Figure 5.6 The 6809 System stack organized by the array averaging subroutine.
After the body of the subroutine, the Frame is closed by copying the Frame
Pointer to the SSP — that is moving it up to the TOF — and pulling out the old
Frame Pointer, before RTS (lines 34 and 35). Of course, after return the System
stack will need to be cleaned up to compensate for passing i.
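In a high-level language the opening and closing of the Frame is left to the compiler, which allocates a function's automatic variables in just this way. The following is a minimal sketch in C, assuming a hypothetical read_adc() routine; here it is replaced by a software stub producing a ramp, so that the fragment can be tried on any host machine.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SAMPLES 256u

/* Hypothetical stand-in for the 8-bit A/D converter: a simple ramp.
   On a real target this would read the converter's data register. */
static uint8_t adc_stub_value = 0;

static uint8_t read_adc(void)
{
    return adc_stub_value++;
}

/* Sample n times as fast as possible into a local (Frame) buffer,
   then return the simple sum of the stored samples. */
uint16_t sum_samples(uint16_t n)
{
    uint8_t  buffer[MAX_SAMPLES];   /* 256-byte array held in the Frame */
    uint16_t i;
    uint16_t sum = 0;

    assert(n <= MAX_SAMPLES);       /* not more than 256 samples */

    for (i = 0; i < n; i++)         /* burst of sampling */
        buffer[i] = read_adc();

    for (i = 0; i < n; i++)         /* process the array in situ */
        sum += buffer[i];

    return sum;                     /* at most 256 x 255, fits in 16 bits */
}
```

The array buffer and the counters i and sum exist only while sum_samples() is executing, just as the Frame in the assembly-level version is discarded before the return.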
138 C FOR THE MICROPROCESSOR ENGINEER
Figure 5.7 The 68000 System stack organized by the array-averaging subroutine.
The coding shown in Table 5.9 is designed to reflect the 6809 equivalent,
rather than using the more efficient features of the 68000, such as DBF. The
LINK A6,#-102h instruction in line 11 replaces the three equivalent 6809
instructions in lines 11 – 13 of Table 5.8. The old Frame Pointer (A6 in this
example, but any Address register except A7 could be used) is firstly saved in the
System stack. Then it is overwritten by the SSP to become the new Frame Pointer to
TOF. Finally, the SSP is moved down to open the 102h-byte (258-byte) Frame; the
displacement is negative because it is added to the SSP. The opposite UNLinK
(UNLK) instruction of line 30 undoes these three actions, also in one go.
Table 5.1(b) lists the behavior of this pair of instructions. Note that LINK An,#kk
is a word operation, with kk being sign extended to a 32-bit constant and then
added to SSP. Effectively this limits the frame size to 32,768 bytes. With relatively
little modification, the code given below could deal with sampled arrays of this
size. The 68020 MPU has a long LINK variant.
The core of the program is straightforward, with the only problem lying in
lines 25 and 26. Here a byte sample is to be added to a word sum. As both source
and destination operands must be the same size, the byte variable is promoted
to word size by moving into previously cleared D0.W. This is then added to D7.W.
In stepping an Address register through the array, A0 fulfils the same role as the
X Index register in the 6809 equivalent, leaving the Frame Pointer A6 untouched
(lines 16 and 25).
The 68000 family are blessed with a generous complement of registers. It
would thus be more efficient to use a Data register to hold the loop counter
rather than operate directly in memory. The C high-level language allows the
programmer to declare local (known as Auto) variables as Register variables. The
compiler will then make an attempt to lodge such variables in a register.
The last two examples have returned their single parameter in a Data register.
High-level languages such as Pascal and C permit only one return variable, which
is defined as the value of the function. Thus expressions in C such as:
if (block_copy(rom_start, ram_start, length) == -1)
{/* do this, as no error has occurred */}
else
{/* do that, on an error situation */}
are possible, where function block_copy() (see Table 5.7) is called up (with
the passed parameters indicated in brackets) and its value compared to −1. Its
`value' is in fact the returned value.
In C and Pascal, larger numbers of variables can be altered by passing pointers
(as in this example) or by declaring variables as global. Global variables are stored
in fixed RAM locations, and are thus accessible to any function.
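As a brief illustration of the pointer approach, the sketch below returns one result as the value of the function and passes a second back through a pointer supplied by the caller. The function name sum_and_peak and its parameters are hypothetical, chosen only to echo the sampling example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the sum as the value of the function, and pass the largest
   sample back through the pointer peak supplied by the caller. */
uint16_t sum_and_peak(const uint8_t *samples, size_t n, uint8_t *peak)
{
    uint16_t sum = 0;
    size_t   i;

    assert(samples != NULL && peak != NULL);

    *peak = 0;
    for (i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] > *peak)
            *peak = samples[i];     /* altered in the caller's own storage */
    }
    return sum;
}
```

A caller would declare its own variable, say uint8_t p, and pass its address as &p; on return p holds the peak whilst the function value is the sum.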
The System stack itself may be used to pass back multiple variables. In such
cases, room is normally left on this stack, just below the pass-to variables, be-
fore moving control to the subroutine. On return, the SSP will then point to the
returned parameters, which can be extracted before the stack is cleaned up.
References
[1] Yourdon, E.; Techniques of Program Structure and Design, Prentice-Hall, 1975,
Section 3.4.
[2] Goor, A.J. van de; Computer Architecture and Design, Addison-Wesley, 1989,
Section 8.3.
[3] Wakerly, J.K.; Microcomputer Architecture and Programming: The 68000 Family,
Wiley, 1989, Section 9.3.6.
[4] Maurer, W.D.; Subroutine Parameters, BYTE, 4, no. 7, July 1979, pp. 226 – 230.
[5] Wakerly, J.K.; Microcomputer Architecture and Programming: The 68000 Family,
Wiley, 1989, Section 9.2.
CHAPTER 6
INTERRUPTS PLUS TRAPS EQUALS EXCEPTIONS
explicit instructions which can cause the processor to act in much the same way
as a hardware interrupt. These are sometimes known as software interrupts. A
generic term for all hardware and software interrupts is an exception (for excep-
tional circumstances).
Processors handle exceptions in differing ways. In this chapter we will look
at the general concepts involved in interrupt handling, and how the 6809 and
68000 processors implement exceptions.
This is simple enough. But consider that N = FFFF FFFFh. If an interrupt strikes
in-between lines 2 and 7, and the interrupt service routine uses N, then the value it
will see is FFFF 0000h rather than 0000 0000h. Although problems like this can
be avoided at assembly level, they are difficult to overcome when using high-level
languages, as the machine-level code produced by the compiler is not directly
under the control of the programmer. This is particularly true as high-level
instructions are not indivisible entities as seen by an interrupt; a single statement
may compile to several machine instructions, between any two of which an
interrupt can strike. In general, do not share data between interrupt service
routines and other code (see Section 10.2). However, avoiding the use of global
variables is easier said than done.
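The hazard can be reproduced on any host machine by modelling the long word as two 16-bit halves. The following is a minimal sketch rather than the listing referred to above; the names split32, join and increment_split are assumptions made purely for illustration, and the interrupt is simulated by recording the value visible between the two steps.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit variable held as two 16-bit halves, as it would be handled
   by a processor working on one word at a time. */
struct split32 {
    uint16_t high;
    uint16_t low;
};

/* Value of the complete 32-bit variable, as any reader would see it. */
static uint32_t join(const struct split32 *n)
{
    return ((uint32_t)n->high << 16) | n->low;
}

/* Increment n in two steps. The value visible between the steps, where
   an interrupt could strike, is stored in *seen_by_isr. */
void increment_split(struct split32 *n, uint32_t *seen_by_isr)
{
    assert(n != 0 && seen_by_isr != 0);

    n->low++;                   /* step 1: low word, may wrap to 0000h */
    *seen_by_isr = join(n);     /* an interrupt here sees a torn value */
    if (n->low == 0)
        n->high++;              /* step 2: carry into the high word    */
}
```

Starting from FFFF FFFFh, the intermediate value recorded is FFFF 0000h, although the completed operation leaves 0000 0000h.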
Most interrupts can be inhibited during `sensitive moments', such as de-
scribed above, by setting the appropriate mask in the Code Condition register.
Specifically the 6809 MPU supports three interrupt lines. These are labelled in
Fig. 6.2(a) as IRQ (for Interrupt_ReQuest), FIRQ (for Fast_Interrupt_ReQuest) and
NMI (for Non_Maskable_Interrupt). The former two are inhibited by mask bits I
and F respectively. These are automatically set when the MPU is Reset, so that pe-
ripheral interface devices and relevant variables can be allocated their initial state
before dealing with an interrupt. The ANDCC instruction can be used at any point
in the program to clear either or both mask bits, for example ANDCC #10101111b
enables both IRQ and FIRQ lines. Conversely the ORCC instruction can be used to
inhibit, for example ORCC #01000000b disables FIRQ.
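The two masks are simply bit patterns, and their effect can be checked with ordinary bitwise operators. The sketch below assumes the 6809 register layout in which F is bit 6 and I is bit 4; the function names are hypothetical and the register is modelled as an ordinary byte value.

```c
#include <assert.h>
#include <stdint.h>

/* 6809 Code Condition register mask bits. */
#define CC_F 0x40u   /* FIRQ mask, bit 6 */
#define CC_I 0x10u   /* IRQ mask, bit 4  */

/* Equivalent of ANDCC #10101111b: clear F and I, enabling FIRQ and IRQ. */
uint8_t enable_irq_firq(uint8_t cc)
{
    uint8_t mask = (uint8_t)~(CC_F | CC_I);

    assert(mask == 0xAFu);          /* 10101111b, as used with ANDCC */
    return (uint8_t)(cc & mask);
}

/* Equivalent of ORCC #01000000b: set F, inhibiting FIRQ only. */
uint8_t disable_firq(uint8_t cc)
{
    return (uint8_t)(cc | CC_F);
}
```

All other flags pass through unchanged, which is why ANDCC and ORCC are preferred to loading the register with a fixed value.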
The 6809 has one non-maskable interrupt line. This cannot be locked out, and
as such must be used with caution. Unlike IRQ and FIRQ which are activated by
a low voltage level at the appropriate pin, NMI is triggered by a low-going
voltage; that is, it is edge triggered. This voltage may stay low after the event,
and will not cause another interrupt until the signal goes high and then low again.
In the event of one type of interrupt being interrupted by another, the NMI will
have top priority, that is NMI can interrupt an IRQ or FIRQ service routine, or even
itself. IRQ has the lowest priority, and can be interrupted by a FIRQ, as well as
NMI. As we shall see, the interrupt handling mechanism requires the use of the
System stack. After the 6809 is Reset a NMI interrupt event is latched, but not
acted upon, until the first load into the System Stack Pointer, which it is assumed
sets up the System stack, for instance LDS #0400h.
The interrupt structure of the 68000 MPU as shown in Fig. 6.2(b) is some-
what more complex. Here too there are three interrupt lines, and in a minimum
system these can be used to give three different responses. However, the
processor is actually designed to differentiate between seven different interrupt
requests, which it interprets from the 3-bit pattern on the Interrupt Priority Level
pins IPL2 IPL1 IPL0. As these pins are active-low, a pattern of 100b (active-low
011b) is considered a level 3 interrupt request. A level 0 request (IPL2 IPL1 IPL0 =
111b) is ignored (no interrupt), whilst level 7 is non-maskable and, like the 6809's
NMI equivalent, is edge triggered, an edge here being defined as a transition from
a lower level.
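The relationship between pin pattern and level is just a 3-bit inversion, as the following sketch shows. The function names are hypothetical; the pattern is given as a number with IPL2 as the most significant bit.

```c
#include <assert.h>

/* Convert the logic levels on IPL2 IPL1 IPL0 (active-low) into the
   requested interrupt level 0..7. */
unsigned ipl_pins_to_level(unsigned pins)
{
    assert(pins <= 7u);
    return ~pins & 7u;
}

/* Convert an interrupt level 0..7 into the pattern that must be
   applied to IPL2 IPL1 IPL0. */
unsigned level_to_ipl_pins(unsigned level)
{
    assert(level <= 7u);
    return ~level & 7u;
}
```

For example, ipl_pins_to_level(4) gives 3, matching the 100b pattern quoted above, and a pattern of 111b gives level 0, that is no request.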
The mask structure also echoes the level-oriented interrupt request. Three
mask bits in the Status register (see Fig. 3.1) set the level above which a request
Figure 6.2 Interrupt logic for the 6809 and 68000 processors.
HARDWARE INITIATED INTERRUPTS 147
Interrupt request lines from three peripheral interfaces may be directly con-
nected to IPL2 IPL1 IPL0, having level 1, 2 or 4 priorities. Up to seven interrupt
sources can be handled using external circuitry to encode these lines to 3-bit
binary. The most common approach shown in Fig. 6.3 uses a 74LS148 priority
encoder [2]. This has eight active-low inputs and three active-low outputs. The
74LS148 gives a 3-bit coded equivalent of the highest active input line. Thus if
devices 6 and 1 simultaneously request service (10111101b), then the output
will be 6 (001b, active-low). Once device 6 has been serviced and its interrupt
request line lifted, the 74LS148's output will change to 110b (active low 1), and
device 1 will then be eligible for service (if not masked out by I2 I1 I0). Similar
considerations apply to the 68008 MPU, although as we can see from Fig. 3.2 IPL0
and IPL2 are internally connected, effectively allowing only levels 2 (101), 5 (010)
and 7 (000) to be acceded to. The higher the level of interrupt request, the greater
is its priority. Thus if a level 5 interrupt is in progress, it can only be interrupted
by a level 6 or 7 request.
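The encoding action described for the 74LS148 can be modelled in a few lines. This is a sketch of the input to output mapping only, under the assumption that bit n of the input pattern is low when device n is requesting; the enable and cascading pins of the real device are not modelled, and the function name is hypothetical.

```c
#include <assert.h>

/* Model of the priority encoder's mapping. inputs is the 8-bit pattern
   on the active-low request lines (bit n low = device n requesting).
   Returns the 3-bit active-low code of the highest-numbered active
   input, or 111b if no input is active. */
unsigned priority_encode(unsigned inputs)
{
    int n;

    assert(inputs <= 0xFFu);

    for (n = 7; n >= 0; n--) {
        if ((inputs & (1u << n)) == 0)      /* line n is low (active) */
            return ~(unsigned)n & 7u;       /* active-low code for n  */
    }
    return 7u;                              /* nothing requesting     */
}
```

With the pattern 10111101b the result is 001b (device 6), and once line 6 is lifted the pattern 11111101b gives 110b (device 1), as in the example above.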
Once the MPU accepts an interrupt, it must change from executing the back-
ground program, and move to the appropriate interrupt service routine or fore-
ground program. This is similar to switching to a subroutine, but the change-
over is dictated by an apparently random call from outside. As this can happen
anywhere in the background program, the state of all the MPU's registers (its
context) used in the background program must be saved before the change-over.
On return these are restored, leaving the state of the MPU unchanged. Making
the interrupt process invisible in this manner allows the MPU apparently to ex-
ecute more than one task in parallel. Multitasking in this manner is of course
a serial process, and carries the overhead of the time to switch context between
background and foreground [3].
There are two approaches to context switching. At the very least the Program
Counter and Code Condition register/Status register must be saved: the former
so that control can be passed back to the background program at the point of
the break, as in the case of a subroutine call; the latter because the CCR will be
altered by any but the most trivial interrupt service routine. Any additional
registers altered by the service routine can be saved by Pushing and Pulling via a
stack, in the manner shown in Table 5.3. Some early microprocessors, such as the
6800, save all internal registers automatically on the System stack when an
interrupt response is initiated and restore them at the end. This entire-state
context switching is convenient, but in processors with a significant complement
of registers, the resulting time overhead can have a noticeable impact on system
response. This is not justified where only a few registers are actually used in the
service routine. Early processors have few registers and/or few stack-oriented
instructions (the 6800 has one Address register, two Data registers and cannot
directly Push or Pull the former), and thus an automatic whole-state context switch
is efficient. Both types of context switching use the System stack to save the
register states.
The 6809 MPU has both partial and full context switching. The IRQ and NMI
responses automatically cause all registers to be Pushed on to the System stack,
in the order shown in Fig. 6.4(b). The FIRQ response saves only the PC and CCR,
leaving the rest up to the programmer (see Table 6.1(b)). The E flag in the CCR is
set before the Push if the Entire state is to be saved and cleared otherwise, so that
the stacked copy of the CCR records which form of switch was made. This flag is
examined by the ReTurn from Interrupt instruction, which terminates all 6809
interrupt service routines. RTI reverses the context switch and restores the MPU
to its original state.
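The decision made by RTI can be expressed as a single test on the stacked CCR. The sketch below assumes the standard 6809 register set (CCR, A, B, DP, X, Y, U and PC, 12 bytes in all) and the E flag in bit 7; the function name is hypothetical.

```c
#include <assert.h>

#define CC_E 0x80u   /* Entire flag, bit 7 of the CCR */

/* Number of bytes stacked by each form of 6809 interrupt response. */
enum {
    FULL_STATE_BYTES = 12,  /* CCR, A, B, DP (1 each); X, Y, U, PC (2 each) */
    FAST_STATE_BYTES = 3    /* CCR (1) and PC (2) only                      */
};

/* Given the CCR value recovered from the stack, return how many bytes
   RTI must pull in total to undo the context switch. */
unsigned rti_bytes_to_pull(unsigned stacked_cc)
{
    assert(stacked_cc <= 0xFFu);
    return (stacked_cc & CC_E) ? FULL_STATE_BYTES : FAST_STATE_BYTES;
}
```

The difference of nine bytes in each direction is the saving that makes the FIRQ response fast.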
The FIRQ response automatically sets the I and F mask bits in the CCR before
entering its service routine, in order to ensure that it cannot be further inter-
rupted by any other than the non-maskable interrupt. Only the I mask is set in
the IRQ response. Consequently an IRQ service routine can be interrupted by
a FIRQ response as well as a NMI. Of course when the old value of the CCR is
Figure 6.4: How the 6809 responds to an interrupt request (continued next page).
Figure 6.5: How the 68000 responds to an interrupt request (continued next page).
the IRQ service routine in FFF8:9h. Normally this vector table is in ROM, and this
is a necessity for the Reset vector in FFFE:Fh, as the address for the main routine
must be present at power up (cold start). In systems where no actual memory
exists at these locations, the Address decoder must be designed to enable phys-
ical memory when these addresses are output by the MPU. If necessary, clever
address decoding can be used to place locations FFF2 – FFFDh in RAM where they
may be dynamically altered by the program, although this is rare.
As an example, consider an extension to the system shown in Fig. 6.1. An
external 16-bit counter records 1 ms ticks, whilst a detector circuit records signal
peaks. An array of 256 peak to peak times in milliseconds is to be displayed on
an oscilloscope. Two digital to analog converters are to be used to drive the X
and Y oscilloscope plates — see Fig. 11.3. The background program is to scan
this array sending its analog equivalent to the Y plates at the same time as the
X plates drive is being incremented from 0 to 255 (0 to full-scale analog). This
occurs as a continuous loop, giving a flicker-free display. Whenever a peak is
detected, the processor is to switch from its background display task to updating
the array with the latest period. When the array is full (256 peaks), the process
is to be repeated, over-writing the oldest values. Provided that this foreground
task is accomplished quickly, this switch back and forth will not be noticed on
the display.
We need not concern ourselves with the details of the Address decoder nor the
interfacing digital to analog converters here, but we must consider the problem
of driving the MPU's interrupt input from the peak detector. Taking the 6809 MPU
for our first solution, we will use the FIRQ input to keep the response time short.
Now FIRQ (and IRQ) are active as long as their level is low. We have not specified
the duration of the peak detector's active output, but in this situation it is likely
to be anything up to 250 ms, to avoid multiple triggering due to noise around
the peak. Thus if FIRQ is still low after the return to the background program,
then another interrupt response will be immediately set in train. In this case the
whole 256-word array will probably be updated in one go!
As shown in Fig. 6.6, interposing a D flip flop solves the problem. As the flip
flop is edge-triggered, its D input is only clocked in on the falling edge (in this
case). This interrupt flag is thus `lowered'. After the processor vectors to the
service routine, the act of reading the counter also activates the flip flop's Preset
input, which sets it to logic 1 (raises the flag). Thus on return, the interrupt line
is no longer active, irrespective of the indeterminate length of the source request.
Edge-triggered interrupts, such as NMI, can be directly driven without using an
external flag.
Peripheral devices designed specifically to interface to a MPU normally incor-
porate such flags as part of a Status or Control register. For example the 6821 PIA
of Fig. 1.9 uses bits 6 and 7 of each Control register for this purpose [4]. Reading
the appropriate Data register clears these flags automatically.
The 6809 code implementing our specification is shown in Table 6.1. This
comprises three separate source modules:
Figure 6.6 Using an external interrupt flag to drive a level-sensitive interrupt line.
1. The background module DISPLAY which extracts the 256 array values using
them to drive the oscilloscope Y plates as it ramps up the X plates. This module
runs continually except when interrupted by the foreground module.
2. The foreground module UPDATE is entered only when an external event occurs.
It reads the counter, evaluates the time since the last event, inserts the outcome
in the array and moves the array index on one.
3. The VECTOR module simply sets up the Interrupt and Reset vectors. The ac-
tual values are put into memory at load time, that is when the EPROM is pro-
grammed (or program downloaded into RAM in a Microprocessor Development
System). It does not execute as such at run time, it is simply in situ in a sup-
porting role to the two previous modules.
Each of these three modules is separately assembled and subsequently linked
together to give the listing of Table 6.1. We will discuss this linkage process in
the following chapter; here it is sufficient to note that the assembler reserves
256 words in its Data program space .psect _data (line 17 of Table 6.1(a)), the
start address of which is called ARRAY. This name is made known to the other
separately assembled modules through the linker by declaring it .public in line 16.
The foreground module needs to use this address, whose value is unknown at
assembly time, and it gets round this problem by declaring ARRAY as .external in
line 20 of Table 6.1(b). This directive is really saying to the assembler `hold your
fire, the actual address will be supplied at a later date via the linker'. Of course
this is an array of words, as the counter is 16 bits wide. In a similar manner, the
addresses of both run-time modules are made known to module VECTOR by
declaring their start addresses .public. They are consequently declared .external
in line 9 of Table 6.1(c).
In the background module, the ramp count x (the Scan pointer) is also used
as array index (i.e. at position x display ARRAY[x]). However, as each array
element is a double byte, x is first promoted to 16 bits (line 24) and then
multiplied by two (lines 25 and 26). The resulting value in Accumulator_D is then
used as the offset to the X Index register, which is pointing to ARRAY[0], to
extract ARRAY[x]. The same mechanism is used in the update interrupt module to
convert the Update pointer i to the byte offset of ARRAY[i] (lines 29 – 31). Both
the Scan and Update pointers conveniently wrap around from 255 to zero after
incrementing, as each is held in a single byte. Numbers other than 255 would need
to be actively zeroed.
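The same data structure can be sketched in C, where the scaling of the index by two is done implicitly by the compiler. This is an illustrative host-side model, not a translation of Table 6.1: the names update() and display_step() are hypothetical, the counter reading is passed in as a parameter, and the values destined for the two D/A converters are returned through pointers rather than written to hardware.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE 256u

static volatile uint16_t array[ARRAY_SIZE]; /* peak-to-peak periods in ms  */
static uint8_t  scan_index;                 /* Scan pointer x, wraps to 0  */
static uint8_t  update_index;               /* Update pointer i, wraps too */
static uint16_t last_count;                 /* counter reading at last peak */

/* Foreground task: called with the counter reading at each peak. */
void update(uint16_t count_now)
{
    array[update_index] = (uint16_t)(count_now - last_count);
    last_count = count_now;
    update_index++;             /* 8-bit index wraps from 255 to 0 */
}

/* One pass of the background loop: report the X and Y values that
   would be sent to the converters, then step the Scan pointer. */
void display_step(uint8_t *x_out, uint16_t *y_out)
{
    assert(x_out != 0 && y_out != 0);

    *x_out = scan_index;
    *y_out = array[scan_index]; /* compiler scales the index by two */
    scan_index++;               /* wraps from 255 to 0 */
}
```

Holding both indices in 8-bit variables gives the wrap-around for nothing, exactly as in the assembly-level version; an array of any other length would need an explicit test.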
Notice how in module VECTOR (lines 10 – 12) the start addresses for the Reset
and FIRQ service routines are located in their appropriate place. In practice other
vector addresses, not used in this fragment, would be defined here.
The 6809 MPU can only deal directly with three interrupt requests from sepa-
rate sources. Some applications require many more than can be handled in this
Table 6.1: 6809 code displaying heart rate on an oscilloscope (continued next page).
1 .processor m6809
2 ; ********************************************************************
3 ; * Background program which scans array of word data (ECG periods) *
4 ; * Sends out to oscilloscope Y plates in sequence *
5 ; * At same time incrementing X plates *
6 ; * so that ARRAY[0] is seen at the left of screen *
7 ; * and ARRAY[255] at the right of screen *
8 ; * ENTRY : None *
9 ; * EXIT : Endless loop *
10 ; ********************************************************************
11 ;
12 .define DAC_X=6000h, ; 8-bit X-axis D/A converter
13 DAC_Y=6001h ; 12-bit Y-axis D/A converter
14 ;
15 .psect _data ; Data space
16 .public ARRAY ; Make the array global
17 0000 ARRAY: .word [256] ; Reserve 256 words for the array
18 0200 X_COORD: .byte [1] ; and a byte for the X co-ordinate
19 ;
20 .psect _text ; Program space
21 .public DISPLAY; Make program known to the linker
22 E000 10CE0800 DISPLAY: lds #0800h ; Define Top Of Stack
23 E004 F60200 DLOOP: ldb X_COORD ; Get X co-ordinate
24 E007 4F clra ; Expand to word size
25 E008 58 lslb ; Multiply by two
26 E009 49 rola ; to give array index in Acc.D
27 E00A 8E0000 ldx #ARRAY ; Point to ARRAY[0]
28 E00D 308B leax d,x ; now to ARRAY[X]
29 E00F F60200 ldb X_COORD ; Get back X co-ordinate
30 E012 F76000 stb DAC_X ; Send it out to X plates
31 E015 EC84 ldd 0,x ; Get ARRAY[X] word
32 E017 FD6001 std DAC_Y ; and send it to the Y plates
33 E01A 7C0200 inc X_COORD ; Go one on in X direction
34 E01D 20E5 bra DLOOP ; and show next sample
35 .end
letting IRQ rise through the pull-up resistor to +V. If one or more request lines
go low then IRQ goes low. MPU-compatible peripheral interface devices, such as
the 6821 PIA and 68230 PI/T, have integral open-collector buffers at their
interrupt output lines.
Given that the MPU has gone to the service routine, how is it to distinguish
between the various possible sources? A simple procedure is to examine each in-
terrupt flag in turn, until the source is found. Where MPU-compatible peripherals
are used, this is accomplished by examining the relevant bits in the appropriate
peripheral Control/Status register.
Polling in this manner is rather slow but does have the advantage of simplicity,
and a priority scheme of arbitrary complexity can be implemented in software.
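A polling routine of this kind might be sketched as follows. The status bytes are passed in as an array so that the fragment can be run on a host; in a real system each would be read from a peripheral register. The flag position (bit 7) and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_BIT 0x80u   /* assumed position of the interrupt flag */

/* Examine each device's status byte in priority order (index 0 first).
   Returns the index of the first device whose flag is set, or -1 if
   no device is requesting service. */
int poll_sources(const uint8_t *status, int count)
{
    int i;

    assert(status != NULL && count >= 0);

    for (i = 0; i < count; i++) {
        if (status[i] & FLAG_BIT)
            return i;           /* highest-priority requester found */
    }
    return -1;
}
```

Priority is fixed here purely by the order of the array; reordering it, or rotating the starting index, changes the scheme without touching the hardware.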
There are many schemes which speed up the process of distinguishing be-
tween interrupting peripherals [5], one of which is shown in Fig. 6.7. Here, four
events (e.g. peak detectors) trigger interrupt flags in the manner of Fig. 6.6. These
four service requests are combined together with open-collector buffers to drive
the MPU's IRQ line. The state of these four lines can be read at any time through
3-state buffers at address Vector. Assuming that unconnected data lines read
as logic 0, we have:
Request Vector
0 00000100 (4)
1 00001000 (8)
2 00010000 (16)
3 00100000 (32)
When the 68000 MPU is Reset, the initial setting of the Supervisor Stack Pointer
(not the User Stack Pointer) is fetched from long-word 0 (000000 – 000003h),
followed by the start value of the Program Counter in long-word 1 (000004 –
000007h). This dual vector must be in ROM to ensure a successful cold start
(i.e. from power up), as must be the equivalent 6809 Reset vector at the top
of memory. The remaining 254 vectors are normally also located in ROM, but
clever address decoding can be used to overlay these vectors in RAM. This latter
procedure allows the software dynamically to relocate exception service routines.
The external decoder can distinguish between vectors 0 and 1, and 2 to 255
from the state of the Function Code status pins, which are 110b for the former
(Supervisor Program) and 101b for the latter (Supervisor Data) — see page 69.
As the Supervisor Stack Pointer is set up after the MPU leaves its Reset start-up,
interrupts can be immediately serviced. The Interrupt Mask bits in the Status
register are set to 111b, locking out all but level 7 interrupts (i.e. non-maskable).
bringing VPA low during this Read cycle. More sophisticated peripheral interface
devices specifically designed for the 68000 MPU can respond by putting a Vector
number on its data bus and activating DTACK in the normal asynchronous way
(see Fig. 3.6). The MPU multiplies this number by four (shift left twice) giving the
address of the user interrupt vector somewhere in the table.
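The arithmetic is easily checked. In the sketch below the function names are hypothetical; the autovector numbers 25 to 31 for levels 1 to 7 follow the 68000 exception table.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the long-word vector for a given 68000 vector number. */
uint32_t vector_address(unsigned vector_number)
{
    assert(vector_number <= 255u);
    return (uint32_t)vector_number << 2;    /* multiply by four */
}

/* Autovector number for interrupt levels 1..7 (vectors 25..31). */
unsigned autovector_number(unsigned level)
{
    assert(level >= 1u && level <= 7u);
    return 24u + level;
}
```

Thus vector number 40h lies at address 000100h, and the level-1 autovector (number 25) at 000064h.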
Referring to Fig. 6.8, we see that in both cases a 3 to 8-line decoder generates
one of seven Interrupt Acknowledge signals IACKn from the 3-bit level address.
This decoder is only active when the Function Code is 111, that is Interrupt Ac-
knowledge. The rest of the address lines are logic 1 and the general address
decoding must ensure that nothing else responds to this situation. The level,
and hence which IACK line is active, is determined by the connection of the pe-
ripheral's service request to a 74LS148 Priority encoder, as described in Fig. 6.3.
First we look at a dumb interface, such as shown in Figs 6.1 and 6.6, which
cannot generate its own Vector number. In Fig. 6.8 the level-1 request itself and
acknowledgement (IACK1) are ANDed to drive VPA low. The MPU will go auto-
matically to vector 25 (000064 – 7h) for its level-1 service routine. As previously
Table 6.2: 68000 code displaying heart rate on an oscilloscope (continued next page).
1 .processor m68000
2 ; ********************************************************************
3 ; * Background program which scans array of word data (ECG points) *
4 ; * Sends out to oscilloscope Y plates in sequence *
5 ; * At same time incrementing X plates *
6 ; * so that ARRAY[0] is seen at the left of screen *
7 ; * and ARRAY[255] at the right of screen *
8 ; * ENTRY : None *
9 ; * EXIT : Endless loop *
10 ; ********************************************************************
11 ;
12 .define DAC_X=6000h,; 8-bit X-axis D/A converter
13 DAC_Y=6001h ; 12-bit Y -axis D/A converter
14 ;
15 .psect _data ; Data space
16 .public ARRAY ; Make the array global
17 00E000 ARRAY: .word [256] ; Reserve 256 words for the array
18 00E200 X_COORD:.byte [1] ; and a byte for the X co-ordinate
19 ;
20 .psect _text ; Program space
21 .public DISPLAY ; This program known to the linker
22 000400 4240 DISPLAY: clr.w d0 ; Get X co-ordinate byte
23 000402 1039 DLOOP: move.b X_COORD,d0 ; expanded to word
0000E200
24 000408 E348 lsl.w #1,d0 ; x2 to give array index in D0.W
25 00040A 207C0000E000 movea.l #ARRAY,a0 ; Point A0 to ARRAY[0]
26 000410 31F000006001 move.w 0(a0,d0.w),DAC_Y ; Get ARRAY[x] to Y plates
27 000416 31F90000E2006000 move.w X_COORD,DAC_X ; Send X coord to X plates
28 00041E 52390000E200 addq.b #1,X_COORD ; Go one on in X direction
29 000424 60DC bra DLOOP ; and show next sample
30 .end
Read cycle. This vector number is programmed into the appropriate interface
register (the Port Interrupt Vector register in the 68230 [8]) during the setup
routine. If we wanted to vector via address 000100h, the programmed-in vector
number would be 40h (000100h ÷ 4). Vector numbers 0 – 63 should not be used,
although there is nothing physically to prevent this. Should a 68xxx peripheral
interface not have its vector register set up when an interrupt occurs, a default
vector 15 will be sent to indicate Uninitialized Interrupt.
The software for our example is given in Table 6.2. It matches the listing of
Table 6.1 for the 6809 MPU, and the comments made there apply equally. Notice
that the Interrupt Service routine UPDATE is terminated by RTE, the 68000 equiva-
lent for RTI. I have assumed that a level-1 autovector is being used as a pointer to
the service routine. A simple change of operand in line 12 of Table 6.2(c) would
move the start address to any other appropriate vector number.
Vector 24 is described in Fig. 6.5(c) as a Spurious Interrupt. This startup ad-
dress will be used if external circuitry asserts the Bus_Error (BERR) pin during an
Interrupt Acknowledge Read cycle. The hardware designer may wish to do this
when DTACK (or VPA) is not activated within a fixed time after the start of this
cycle, to indicate a hardware problem. Such circuitry is frequently implemented
as a retriggerable monostable which `collapses' if not clocked frequently enough.
Such a watch-dog timer can of course be used to indicate trouble `out there' during
a normal (i.e. not Interrupt Acknowledge) cycle. In such cases the MPU returns to
the Supervisor state and enters the Bus Error exception service routine pointed to
by Vector 2. Should the BERR signal persist when the status is being pushed out
to the Supervisor stack on entry to the service routine, a catastrophic situation is
assumed to have occurred. Such a Double-Bus fault causes the MPU to stop, driving
its Halt line low; only an external Reset can then restart it. This response will
occur in general where a problem occurs when an exception (including a Reset)
tries to Push out its registers, for example when the Supervisor Stack Pointer is odd.
Another possibility is to assert BERR and Halt simultaneously. Then the failed
bus cycle will be rerun, with the hope that a spurious failure occurred (perhaps
due to noise) and that the situation can be redeemed [9].
An IRQ will have no effect in this example, as it is masked out. Notice that unusu-
ally a FIRQ will enter its service routine with the entire machine state (context)
saved.
The SYNChronize instruction is similar, although any CCR flags will have to
be set by a preceding instruction. However, this time if the interrupt occurs but is
masked out, then the processor will simply move on to the following instruction.
If the interrupt is not masked out, and lasts for three clock cycles or more, then
it will be answered in the normal way. Tri-state buses go high impedance during
SYNC, allowing an external device to access memory directly [10].
The 68000's STOP instruction is comparable with the 6809's CWAI, but the
immediate word operand is the new state of the Status register, rather than being
ANDed with it. For example, STOP #00100 011 00000000b (2300h) will halt the
processor until an interrupt of level greater than 3 occurs. The MPU then responds
in the normal way. STOP is privileged and thus can only be used in the Supervisor
state. The machine context is not switched prior to the request. The equivalent
HaLT (HLT) for the 8086 family does not carry an immediate operand, but
otherwise operates in the same manner.
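The STOP operand is an ordinary 16-bit Status register image and can be assembled from its fields. The sketch below assumes the 68000 layout with the Supervisor bit at bit 13 and the interrupt mask in bits 10 to 8; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_S 0x2000u    /* Supervisor bit, bit 13 of the Status register */

/* Build a Status register word with the Supervisor bit set, the given
   interrupt mask level in bits 10..8, and all other bits clear. */
uint16_t sr_supervisor_mask(unsigned level)
{
    assert(level <= 7u);
    return (uint16_t)(SR_S | (level << 8));
}
```

A mask level of 3 gives 2300h, the operand used in the example above, whilst level 7 gives 2700h, which locks out all but the non-maskable request.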
The 6809 MPU has three instructions which explicitly initiate Software
interrupt operations. SWI causes the entire state to be saved, sets the I and F masks
to lock out all but NMI interrupts, and then vectors to the start of its service
routine via FFFA:Bh. Instructions SWI2 and SWI3 are similar, but use vectors
FFF4:5h and FFF2:3h respectively to hold their start addresses and do not lock out
the Hardware interrupts.
The 68000 MPU has 17 Software interrupts, known as TRAPs. Sixteen of these,
TRAP #0 to TRAP #15, are unconditional, whereas TRAPV causes a trap only if the
oVerflow flag is set at execution time. Looking at Fig. 6.5(c), we see that TRAP #0
vectors via location 000080 – 3h up to TRAP #15 at 0000BC – Fh, Exception vec-
tors 32 to 47. TRAPV has its service address located at 00001C – 00001Fh. Like
all other Exceptions, Traps execute in the Supervisor state.
Although what a Software interrupt/Trap does is clear enough, the reason for
using one is not entirely evident. Consider an environment where an applications
program is being written for a specific computer system. This system will have
various means of communicating to the world, using typically a keyboard, VDU,
serial and parallel ports, interrupts and various disk drives. Knowing the charac-
teristics of all these input/output (I/O) devices, the programmer can write a suite
of subroutines known as device handlers. Once this has been done, data can
be transferred by calling up the appropriate handler. However, a change of en-
vironment to a different computer will likely require a complete rewrite of these
handlers.
This approach is frequently adopted by the designers of embedded micropro-
cessor systems, where the hardware infrastructure is usually highly individual-
istic. Some standardization is possible for mass-produced computing machines,
such as engineering workstations and personal computers. These normally come
with an operating system, which can be thought of as a shell around the applica-
tions software shielding the programmer from the hardware. Typical operating
INTERRUPTS IN SOFTWARE 165
systems are UNIX [11] and MSDOS [12]. These systems are mainly disk-based and
loaded into RAM, but work in tandem with a Basic Input Output System (BIOS),
usually located in ROM. The applications programmer can then call up the ap-
propriate subroutine in the BIOS, to communicate with a peripheral. The BIOS
ROM will vary with different machines, but in such a way as to hide the hardware
details from the operating system. The use of an operating system leads to the
concept of system-independent (portable) software.
Using a Trap call to communicate, rather than a subroutine, has the advantage
that the address of the procedure need not be explicitly known, as the vector table
will be in the BIOS. Hiding explicit details of the BIOS is important for portabil-
ity. Thus, as an example, INT #25 in a MSDOS environment [12] will enable a
Read from a magnetic disk (INT is the 8086 family mnemonic for TRAP). Param-
eters such as track, sector and drive are placed in registers prior to the Trap. In
68000-family based systems, the operating system normally resides in the Super-
visor state, completely separated from the application program in the User state
memory space.
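The indirection that a Trap provides can be sketched in C as a table of function pointers indexed by trap number. This is an illustration only: the names trap_table, install_trap, trap_call and fake_disk_read are hypothetical, and a real Trap also changes the processor state, which plain C cannot show.

```c
#include <stddef.h>

#define NUM_TRAPS 16                 /* Hypothetical table size */

/* A handler takes two parameters, standing in for the registers
   loaded before the Trap */
typedef int (*trap_handler)(int arg0, int arg1);

static trap_handler trap_table[NUM_TRAPS];   /* The vector table */

/* System side: install a handler for a given trap number */
void install_trap(unsigned number, trap_handler handler)
{
    if (number < NUM_TRAPS)
        trap_table[number] = handler;
}

/* Application side: only the trap number is known, never the address */
int trap_call(unsigned number, int arg0, int arg1)
{
    if (number >= NUM_TRAPS || trap_table[number] == NULL)
        return -1;                   /* No handler installed */
    return trap_table[number](arg0, arg1);
}

/* Hypothetical handler standing in for a disk-read service */
int fake_disk_read(int track, int sector)
{
    return track * 100 + sector;     /* Dummy result for illustration */
}
```

The application never holds the address of fake_disk_read; replacing the handler in the table changes the hardware support without touching the caller.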
The 68000 MPU has two additional explicit software interrupt instructions.
The instruction ILLEGAL (op-code 4AFCh) causes a transfer via the vector at
addresses 000010h – 000013h, and the CHecK register (CHK) instruction vectors via
000018h – 00001Bh if the lower word of the designated Data register is below zero
or above the stated limit.
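The condition tested by CHK can be expressed in C as follows. This is a sketch of the comparison only, assuming a signed 16-bit interpretation of the lower word; the function name chk_would_trap is hypothetical and no exception is actually raised.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true where CHK would take its trap: the lower word of the
   Data register, treated as a signed quantity, is below zero or
   above the limit */
bool chk_would_trap(uint32_t data_register, int16_t limit)
{
    uint16_t word = (uint16_t)(data_register & 0xFFFFu);
    /* Sign-extend the word by hand to stay portable */
    int32_t value = (word >= 0x8000u) ? (int32_t)word - 0x10000
                                      : (int32_t)word;
    return (value < 0) || (value > limit);
}
```

A typical use is guarding an array index before it is applied, with the limit set to the highest legal subscript.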
There are also a number of implicit traps, triggered by some internal event.
These are:
unused, but this facility provides the means for emulating unimplemented in-
structions in software.
References
[1] Cahill, S.J. and McClure, G.; A Microcomputer-Based Heart-Rate Variability Monitor,
IEEE Trans. Biomed. Engng., BME-30, no. 2, Feb. 1983, pp. 87 – 92.
[2] Cahill, S.J.; Digital and Microprocessor Engineering, Ellis Horwood/Prentice-Hall,
2nd. ed., 1993, Section 3.2.1.
[3] Lawrence, P.D. and Mauch, K.; Real-Time Microcomputer Design, McGraw-Hill, 1987,
Section 16.3.
[4] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987, Chapter 3.
[5] Leventhal, L.A.; Introduction to Microprocessors, Prentice-Hall, 1978, Chapter 9.
[6] Motorola Application Note AN866; Vectoring by Device using Interrupt Sync Ac-
knowledge with the MC6809/MC6809E. Reprinted in MCU/MPU Applications Manual,
2, 1984.
[7] Motorola Application Note AN1012; A Discussion of Interrupts for the MC68000,
1988.
[8] Miller, M.A.; The 68000 Microprocessor, Merrill Publishing, 1988, Section 8.8 – 8.11.
[9] Clements, A.; Microprocessor Systems Design, PWS, 2nd ed., 1992, Section 6.5.
[10] Motorola Application Note AN865; The MC6809/MC6809E SYNC Instruction,
Reprinted in MCU/MPU Applications Manual, 2, 1984.
[11] McIlroy, M.D.; UNIX Time-Sharing System, The Bell System Technical Journal, 57,
no. 6, part 2, 1978, pp. 1899 – 1904.
[12] Simrin, S.; MS-DOS Bible, H.W. Sams, 3rd ed., 1989.
[13] Leventhal, L.A.; 68000 Assembly Language Programming, McGraw-Hill, 2nd ed.,
1986, Chapter 19.
PART II
The only reality as seen from a central processing unit, be it mainframe, mini or
microprocessor, is in the patterns of binary states in memory. This is generally
far removed from the human description of the task which is to be controlled by
the processor hardware. Going from the problem specification to executable
binary installed in memory involves many steps, both conceptual and in software.
Many translation processes must occur on the way (see Fig. 7.1). Furthermore,
testing, debugging and commissioning the system require additional skills and
aids.
In Part II we look at these steps in some detail, how they interact and their
limitations. In particular we will investigate the use of the high-level language C as
a buffer between the problem-oriented human thought process and the machine-
oriented assembly-level languages. Many of the concepts introduced here apply
to other high-level languages, such as Pascal and Forth, but C is a small, popular
and flexible language which is widely available, especially in cross-compiler form,
and can run on inexpensive development systems. I can do no better than quote
from the originators of the language:1
C is a general-purpose programming language featuring economy of ex-
pression, modern control flow and data structure capabilities, and a rich
set of operators and data types.
C is not a 'very high-level' language nor a big one and is not specialized
to any particular area of application. Its generality and an absence
of restrictions make it more convenient and effective for many tasks than
supposedly more powerful languages. C has been used for a wide variety
of programs, including the UNIX operating system, the C compiler itself,
and essentially all UNIX applications software. The language is sufficiently
expressive and efficient to have completely displaced assembly language
programming on UNIX.
1 Ritchie, D.M. et al.; The C Programming Language, The Bell System Technical Journal, 57, no. 6,
part 2, July – August 1978, pp. 1991 – 2019.
CHAPTER 7
Source to Executable Code
Consider the fragment of code below. To a 68000-family MPU this makes perfect
sense. Indeed a series of binary bits, typically represented by nominal 0 V and 5 V
potentials stored in memory, is the only code that an MPU, or any other type of
computer, can understand. To the software engineer, interpreting programs in
this pure machine code is virtually impossible. Writing code in this form is
torturous, involving at the very least working out each op-code by hand, together
with bits representing source, destination and any applicable data; evaluating
relative offsets; and keeping tally of where data is stored.
0001000000111000 0001001000110100
0101110000000000
0001000111000000 0001001000110101
Even with a program written in such a form, some means must be found of
putting or loading the code to its final place in memory. Very early computers
did not use electronic memory at all, the code being configured by wire links.
Using switches to set up each memory address and its corresponding data, in
effect a kind of direct memory access, was still used up to the 1960s to enter a
short startup program. This program was known as a bootstrap, as once in and
executed, a paper tape reader could be controlled. Programs could then be read
in from this source, that is the computer was able to pick itself up by its own
bootstraps. A modern version of this is the resident BIOS in a PC, which allows
the MPU to read in the operating system from magnetic disk after switch on,
hence the term 'to boot up'.
Using the computer to aid in translating code from more user-friendly (human)
forms to machine code and loading this into memory began in the late 1940s. At
the very least it permitted the use of higher order number bases such as octal
and hexadecimal. Using the latter, our code fragment becomes:
1038 1234
5C00
11C0 1235
A hexadecimal loader will translate this to binary and put the code in designated
addresses. Hexadecimal coding has little to commend it, except that the number
of keystrokes is reduced (but there are more keys!) and it is slightly easier to
spot certain types of errors. Nevertheless, this technique was extensively used in
the early 1970s for microprocessor software generation and is often still used in
education as a first introduction to programming simple MPUs.
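The action of a hexadecimal loader can be sketched in C. This is a minimal illustration rather than any particular product: the names hex_digit and hex_load are hypothetical, memory is modelled as a byte array, and white space between digit pairs is simply skipped.

```c
#include <stddef.h>
#include <stdint.h>
#include <ctype.h>

/* Convert one hexadecimal digit to its value, or -1 if not a digit */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Load hexadecimal text into memory starting at 'address'.
   Returns the number of bytes stored, or -1 on a bad digit, an odd
   number of digits or running off the end of memory */
long hex_load(const char *text, uint8_t *memory, size_t size, size_t address)
{
    long count = 0;
    while (*text != '\0') {
        int high, low;
        if (isspace((unsigned char)*text)) { text++; continue; }
        high = hex_digit(text[0]);
        if (high < 0) return -1;
        low = hex_digit(text[1]);    /* A terminator here is rejected */
        if (low < 0) return -1;
        if (address >= size) return -1;
        memory[address++] = (uint8_t)((high << 4) | low);
        text += 2;
        count++;
    }
    return count;
}
```

Each pair of digits becomes one byte, stored at successive addresses from the designated start.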
At the very least a symbolic translator or assembler is required for serious
programming. This allows the use of mnemonics for the instructions and internal
registers with names for constants, variables and addresses. We now have:
.DEFINE CONSTANT = 6
MOVE.B NUM1,D0 ; Get the number NUM1
ADDQ.B #CONSTANT,D0 ; Add the constant to it
MOVE.B D0,NUM2 ; is now the number NUM2
.ORG 1234h ; This is the data area
NUM1: .BYTE [1] ; NUM1 lives at 1234h
NUM2: .BYTE [1] ; and NUM2 at 1235h
Giving names to addresses and constants is especially valuable for long pro-
grams. Together with the use of comments, this makes code written in assembly
level easier to maintain. Furthermore, programs can be written as separate mod-
ules with symbols defined in only one module and a linker program used to
put them together with their actual values. This assembly of modules into one
program gave the name assembly-level to this type of language [1]. Of course
assemblers/linkers and their ancillary programs are rather more complex than
simple hexadecimal loaders. Thus they demand more of the computer running
them, especially in the area of memory and backup store. Because of this, their
use in small MPU-based projects was limited until the early 1980s, when powerful
personal computers (made possible by MPUs) appeared. Prior to this, either
mainframes and minicomputers or target-specific microprocessor development
systems (MDSs) were required. All of these solutions were expensive.
Assembly-level language is machine-oriented, in that there is generally a one-
to-one correspondence to the machine instructions. As such, code written at this
level bears little relationship to the problem being implemented. The use of a
high-level language permits a description of the problem in an algorithmically-
oriented language. In C, our code fragment becomes:
#define CONSTANT 6
unsigned char NUM1,NUM2; /* Define NUM1 and NUM2 as unsigned bytes */
{NUM2 = NUM1 + CONSTANT;} /* The process */
Now we no longer need to keep track of exactly where NUM1 and NUM2 have to
be stored. Also we have a large repertoire of mathematical and string functions,
which do not have a one-to-one machine level counterpart. Notice that our program
did not indicate which processor's machine code would eventually be produced;
the target might well be a Z80 rather than a 68000 (see Table 10.15).
Of course there are problems in using high-level languages, especially when
the target is an embedded MPU-based system. In general the further away the
level is from the machine code, the more isolated the programmer is from the
raw hardware. A compiler also demands much more of its supporting computer,
and for this reason only recently became popular as a tool in this type of design.
Figure 7.1 Onion skin view of the steps leading to an executable program.
nient editor allows alterations to be easily made to the source code, which can
then be quickly retranslated with the updated symbolic and offset values [1, 2].
In faithfully reflecting the underlying structure of the hardware, assembler
code can produce the smallest and quickest code of any of the symbolic lan-
guages. Even though it is furthest away from the problem algorithm, these advan-
tages frequently mean that assembly-level routines are linked in with high-level
code, or even used entirely to implement problems, especially when real-time
operation is required.
Assemblers are one of a class of translator programs and are available from
a wide range of originators for most target processors. Although some attempt
has been made to standardize syntax [3, 4], normally each package has its own
rules. Generally the MPU manufacturer's recommended mnemonics are adhered
to reasonably closely. Directives, which are pseudo operators used to pass infor-
mation to the assembler program, do differ considerably. Details of the layout
and syntax for the assemblers used in Part 1 are given in Section 2.3 and will
not be repeated here. Differences in other assemblers used later in this text are
pointed out where they occur.
No matter which language is being used, the programmer must prepare the
source form of the code in the appropriate format and syntax. This preparation
involves the use of an editor program or word processor. The actual one used
is irrelevant, provided that the text is stored in a form which can be read by
the translator, usually plain ASCII. Most operating systems come with a basic
editor, for example MSDOS's EDLIN and UNIX's ED. More sophisticated packages,
such as WordPerfect, are usually favored for larger projects. Table 7.1 shows
a slightly modified source form of the sum-of-integers program first presented
in Table 4.10 (actually entered using EDLIN). This document, which is normally
stored on magnetic disk, is the file presented to the assembler for translation.
Conventionally the file name is postfixed .S, .SRC or .ASM for assembly source,
thus the file printed in Table 7.1 was called list7_1.s.
Assemblers can be broadly classified as absolute or relocatable, according to
the type of code they produce. The former normally generates a file with the
machine code and its absolute location ready to be loaded into memory. This
machine code file is a finished entity, to which no further alterations need be
or should be made before loading. The output of a relocatable assembler is not
yet complete, as it usually does not contain information regarding the eventual
location of the machine code in memory. Furthermore, symbols may be used in
the source code which are not defined at this juncture and which are assumed to
be in modules coming from elsewhere. It will be the job of a Linker program to
satisfy these unrequited references and to define code addresses.
Absolute assemblers tend to be simpler to use, as the path between source and
machine code is more direct, as can be seen in Fig. 7.2(a). Despite their simplicity
they are rarely used in major projects due to their lack of flexibility.
As a demonstration, consider the source code listed in Table 7.1. This is vir-
tually identical to the source of Table 4.10, but with the directive .ORG replacing
.PSECT. As this source is to be processed by an absolute assembler, the pro-
grammer must specify the start address or origin (ORG) of each section of code
or data. The .ORG directive may be used as many times as required to locate the
various sectors, thus if necessary each subroutine may be located at a specific
start address.
In translating this source code input, the absolute assembler produces four
kinds of output. Should there be a problem with the syntax of the source, an
error file will be produced, giving the line in which it occurred and usually a
short description. Sometimes a syntax error in one line can lead to problems in
several other places. Table 7.2 is an example of such a file; it was generated by
replacing the instruction AND in line 11 of our source by the illegal mnemonic
ANP and the referenced label SLOOP in line 13 by LOOP, that is DBF D0,LOOP. The
source file is referred to as a:list7_2.s.
If all goes well, zero errors will be produced. This does not of course guarantee
that the program will work, only that there are no syntax errors! In this situation
a listing file will be generated, as illustrated in Table 7.3. This shows the original
source code together with addresses and the translated code. Other information
may be provided as well. In this case a cross-reference table shows where names,
other than reserved mnemonics and directives, are first defined and where they
Table 7.3 Listing file produced from the source code in Table 7.1.
1 .processor m68008
2 ; ******************************************************************
3 ; * FUNCTION : Sums all unsigned word numbers up to n (max 65,535) *
4 ; * ENTRY : n is passed in Data register D0.W *
5 ; * EXIT : Sum is returned in Data register D1.L *
6 ; ******************************************************************
7 ;
8 .define LONG_MASK = 0000FFFFh ; Used to promote word to long
9 .org 0400h ; Program starts at 0400h
10 ; for (sum=0;n>=0;n--){
11 000400 0280 SUM_OF_INT: and.l #LONG_MASK,d0 ; n promoted to long
0000FFFF
12 000406 4281 clr.l d1 ; Sum initialized to 00000000
13 000408 D280 SLOOP: add.l d0,d1 ; sum = sum + n
14 00040A 51C8FFFC dbf d0,SLOOP ; n--, n>-1? IF yes THEN repeat
15 00040E 4E75 SEXIT: rts
16 .end
LONG_MASK ----- 8 11
SEXIT 15
SLOOP 13 14
SUM_OF_INT 11
d0 ----- 11 13 14
d1 ----- 12 13
m68008 ----- 1
are referred to. This can be useful when maintaining large programs. Listing files
of this nature are for documentation only and have no executable function.
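The comments in lines 10 and 14 of Table 7.3 hint at the C form of this routine. A sketch of an equivalent function is given here; the name sum_of_int and the use of the fixed-width types of stdint.h are assumptions, chosen to mirror the word-sized entry value and long-word result.

```c
#include <stdint.h>

/* Sums all unsigned word numbers up to n, returning a long-word result */
uint32_t sum_of_int(uint16_t n)
{
    uint32_t count = n;      /* n promoted to long, as the AND.L mask does */
    uint32_t sum = 0;        /* Sum initialized to zero */
    do {
        sum += count;        /* sum = sum + n */
    } while (count-- != 0);  /* n--, repeat until n has passed zero */
    return sum;
}
```

As with the DBF loop, the body runs once more with n at zero before the loop exits, which leaves the total unchanged.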
Symbol files list all symbols which occur in the program, giving name, location
and sometimes other information. In Table 7.4 three labels are implicitly identi-
fied, SUM_OF_INT is located at 0400h (the 0x prefix is the hexadecimal indicator
used in C), SLOOP at 0408h, and SEXIT at 040Eh. The suffix t indicates text (i.e.
program section). The label LONG_MASK is explicitly valued and is suffixed a for
absolute. See Table 7.11(a) for a more complex example.
Table 7.4 Symbol file produced from the absolute source of Table 7.1.
0x0000ffffa LONG_MASK
0x0000040et SEXIT
0x00000408t SLOOP
0x00000400t SUM_OF_INT
Symbol files are commonly used by simulator (see Section 15.2) and in-circuit
emulator software (see Section 15.3) to replace addresses by their symbolic equiv-
alents, to aid in the debug process. They are also useful as a documentation aid.
THE ASSEMBLY PROCESS 175
: Start of line
10 Number of code bytes (16)
0400 The address of the first byte
00 Record type (code)
02800000FFFF4281D28051C8FFFC4E75 Code
80 Checksum (2's complement)
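The checksum of the record above can be reproduced with a few lines of C. This is a sketch only; the function name intel_checksum is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Checksum of an Intel hexadecimal record: the 2's complement of the
   modulo-256 sum of the count, address, type and code bytes */
uint8_t intel_checksum(uint8_t count, uint16_t address, uint8_t type,
                       const uint8_t *code)
{
    uint8_t sum = (uint8_t)(count + (address >> 8) + (address & 0xFFu) + type);
    size_t i;
    for (i = 0; i < count; i++)
        sum = (uint8_t)(sum + code[i]);
    return (uint8_t)(0u - sum);      /* 2's complement, modulo 256 */
}
```

A loader can equally add every byte of the record, checksum included, and expect a modulo-256 total of zero.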
Originally developed for the 6800 MPU, the Motorola S1/S9 object format is
similar, with a starting marker of S followed by 1 for a code record and 9 for a
termination line. This is succeeded by a count byte, which indicates the number
of bytes trailing it (address, code and checksum), a 2-byte (4-digit) address field
and then the code bytes. The checksum field is the 1's complement of the modulo-256
sum of all bytes following S1 or S9. If the line has been correctly received, the
loader's modulo-256 sum of all these bytes, including the checksum, will be FFh.
Using this format, we have from Table 7.5(b):
S1 Start of code line (S field)
13 Number of bytes after S field (19)
0400 The address of the first byte
02800000FFFF4281D28051C8FFFC4E75 Code
7C Checksum (1's complement)
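The same calculation for an S1 record can be sketched in C. The function name s1_checksum is hypothetical, and only the 2-byte address form is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Checksum of a Motorola S1 record: the 1's complement of the
   modulo-256 sum of the count, address and code bytes */
uint8_t s1_checksum(uint16_t address, const uint8_t *code, uint8_t code_len)
{
    /* Count covers 2 address bytes, the code and the checksum itself */
    uint8_t count = (uint8_t)(code_len + 3u);
    uint8_t sum = (uint8_t)(count + (address >> 8) + (address & 0xFFu));
    size_t i;
    for (i = 0; i < code_len; i++)
        sum = (uint8_t)(sum + code[i]);
    return (uint8_t)~sum;            /* 1's complement */
}
```

With no code bytes and a zero address the count is 03 and the result is FCh, the same arithmetic that produces the checksum of a termination line.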
Neither of these object formats can handle addresses of more than 2-byte size.
The Motorola S2/S8 format, developed for the 68000 MPU, is an extension to the
S1/S9 format, but with a 3-byte address field. The S3/S7 format is used for 32-bit
S1 13 0400 02800000FFFF4281D28051C8FFFC4E75 7C
S9 03 0000 FC
(b) Motorola S1/S9 format.
S2 14 0FC400 02800000FFFF4281D28051C8FFFC4E75 AC
S8 04 0FC400 28
(c) Motorola S2/S8 format.
processors, which require 4-byte addresses. Table 7.5(c) shows the hex file for our
example but originated at 0FC400h. The extended Intel hexadecimal equivalent
is rather more complex as it was designed to cope with the segmented address
space of the 80x86 family. This uses an extended address record (type 02) if the
load address is over FFFFh. The data field here holds a 4-digit address which is
shifted left four bits by the loader (giving here F0000h) before being added to
the start addresses of subsequent type 00 data records (here C400h) to give a
5-digit load address (i.e. 0FC400h).
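The loader's address arithmetic for the extended Intel format amounts to one shift and one addition. A sketch in C, with the hypothetical name intel_load_address:

```c
#include <stdint.h>

/* Load address formed from the data field of an extended address
   record and the start address of a following data record */
uint32_t intel_load_address(uint16_t extended, uint16_t record_address)
{
    return ((uint32_t)extended << 4) + record_address;
}
```

For the example in the text, an extended address field of F000h and a record start address of C400h give 0FC400h.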
The actual mechanism of the translation process used by the assembler is
of little importance to us here. Most assemblers are described as 2-pass, as
historically all but the simplest read the source code, which was frequently on
paper tape, twice through from beginning to end. During the first pass a loca-
tion counter keeps track of where each instruction is to be placed in memory.
In an absolute assembler, this will be set by any .ORG directive (0400h in Ta-
ble 7.1). As each operation mnemonic is encountered, the location counter is in-
cremented by the appropriate number; thus AND.L #LONG_MASK,D0 causes the
location counter to advance by six.
As labels are encountered, their name and the state of the location counter are
stored in the symbol table, which is built up during the first pass. Labels which
are explicitly defined, such as LONG_MASK, are of course added to the symbol table
without a translation being necessary.
It is necessary to build up a symbol table in the first pass to cope with forward
references; thus an instruction BRA NEXT, where NEXT is further on down the
source file, cannot be fully translated until NEXT has been encountered and given
a value. Some assemblers may save any translated machine code to speed up the
second pass.
During the second pass, the translation is repeated, but this time any refer-
ences to symbolic names are replaced by the values extracted from the symbol
Table 7.6 A simple macro creating the modulus of the target operand.
3 ; Define macro
4 .macro LABSOLUTE
5 tst.l ?1 ; Is number in ?1 positive
6 bpl 1$ ; IF so then no action to be taken
7 neg.l ?1 ; ELSE negate it
8 1$: .endm ; Continue
9 ;
10 ; Now this macro can be evoked at any time by using its name
11 ; followed by an operand
12 ;
13 ; This fragment converts [D0.L] to an absolute value
14 ;~~~~~~~~~~~~~~~~~
15 000400 4A80 LABSOLUTE d0 ;~ tst.l d0 ~
6A02 ;~ bpl 1$ ~
4480 ;~ neg.l d0 ~
16 1$: ; ~~~~~~~~~~~~~~~~
17 ; This fragment converts 20 long words from E100h up to absolute form
18 ;
19 000406 303C0013 move.w #19,d0
20 00040A 307CE100 move.w #0E100h,a0 ;~~~~~~~~~~~~~~~~~
21 00040E 4A98 LOOP: LABSOLUTE (a0)+ ;~ tst.l (a0)+ ~
6A02 ;~ bpl 1$ ~
4498 ;~ neg.l (a0)+ ~
22 000414 51C8FFF8 1$: dbf d0,LOOP ;~~~~~~~~~~~~~~~~~
23 ;
24 ;
table. With the translation complete, listing, symbol and object files are created
in the appropriate format.
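The two passes can be sketched in C for a toy assembler. This is a deliberately reduced model and not the mechanism of any particular product: each source line is assumed to be already parsed into an optional label, a code size and an optional symbol reference, and all names here are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 32

/* One parsed source line of the toy assembler */
struct source_line {
    const char *label;       /* NULL if the line has no label */
    unsigned    size;        /* Bytes of machine code produced */
    const char *reference;   /* NULL if no symbol is referenced */
};

struct symbol { const char *name; unsigned long value; };

static struct symbol symbol_table[MAX_SYMBOLS];
static size_t symbol_count;

/* Return the value of a symbol, or -1 if it is not in the table */
long symbol_lookup(const char *name)
{
    size_t i;
    for (i = 0; i < symbol_count; i++)
        if (strcmp(symbol_table[i].name, name) == 0)
            return (long)symbol_table[i].value;
    return -1;
}

/* Pass 1: advance the location counter and record each label */
int pass_one(const struct source_line *src, size_t lines, unsigned long origin)
{
    unsigned long location = origin;
    size_t i;
    symbol_count = 0;
    for (i = 0; i < lines; i++) {
        if (src[i].label != NULL) {
            if (symbol_count == MAX_SYMBOLS) return -1;  /* Table full */
            symbol_table[symbol_count].name = src[i].label;
            symbol_table[symbol_count].value = location;
            symbol_count++;
        }
        location += src[i].size;
    }
    return 0;
}

/* Pass 2: replace each symbolic reference by its value. resolved[i] is
   -1 where the line has no reference or the symbol is undefined.
   Returns the number of undefined references */
int pass_two(const struct source_line *src, size_t lines, long *resolved)
{
    int undefined = 0;
    size_t i;
    for (i = 0; i < lines; i++) {
        resolved[i] = -1;
        if (src[i].reference != NULL) {
            resolved[i] = symbol_lookup(src[i].reference);
            if (resolved[i] < 0) undefined++;
        }
    }
    return undefined;
}
```

Because the whole table exists before pass 2 begins, a reference to a label further down the source resolves in exactly the same way as a backward one.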
In general, assemblers bear a one-to-one relationship to their translated ma-
chine code. Macroassemblers represent a useful upward extension, by allowing
the programmer to define a group of assembly-language instructions as a named
macro [6]. This macro can be used repetitively anywhere in the program by sim-
ply naming it, followed by a list of operands. The assembler expands this source
line to its fundamental components whenever that name is encountered. The
programmer can thus emulate more powerful instructions that are not in the
MPU's repertoire. As an example, consider the operation where a long-word is to
be converted to its modulus (positive equivalent). This can be done by testing
for negative and if true negating the target. This sequence is defined in the body
of a macro in Table 7.6 between the directives .MACRO and .ENDM. The macro
name is LABSOLUTE (long absolute value) and takes one operand, a Data register
or address mode. This is indicated in the body of the macro by the dummy ?1
(first operand; this assembler can take up to nine). The numeric label 1$ used
in line 8 has the property that its lifetime only extends to the end of the macro.
This is necessary, as macro labels will appear in each expansion and will thus be
defined several times; see Table 7.6 lines 15/16 and 21/22.
The macro is invoked by using its name LABSOLUTE followed by the operand.
In Table 7.6 this is done twice, the first specifying a Data register (LABSOLUTE D0)
and the second the Address Register Post-increment address mode based on A0
(LABSOLUTE (A0)+).
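The C preprocessor offers a comparable textual substitution. The following sketch borrows the name LABSOLUTE for a C macro; the helper absolute_array is hypothetical. As the argument text is substituted wherever the parameter appears, an argument with a side effect (such as a post-increment) would be evaluated more than once, so the array version below indexes explicitly.

```c
/* C analogue of the LABSOLUTE macro: the preprocessor substitutes the
   argument text wherever x appears, much as the assembler does for ?1 */
#define LABSOLUTE(x) do { if ((x) < 0) (x) = -(x); } while (0)

/* Convert 'count' long values to absolute form, indexing explicitly
   so that the macro argument has no side effects */
void absolute_array(long *values, unsigned count)
{
    unsigned i;
    for (i = 0; i < count; i++)
        LABSOLUTE(values[i]);
}
```

The do/while wrapper plays a similar role to the local label 1$: it keeps each expansion self-contained wherever the macro is used.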
A logical progression of this ability to create a new and more powerful in-
struction set is the evolution of a high-level assembler [7], or even a high-level
language.
The 2-pass principle (and the use of macros) applies equally to relocatable
assemblers. This time the symbol table cannot be fully resolved, as some symbols
appear in other modules. This resolution is the job of a linker program, which is
the subject of the next section.
Machine code is passed to the linker in streams. The RTS assemblers fun-
damentally identify two streams, one for program code and the other for data.
Programs in Tables 6.1 and 6.2 used the directives .PSECT _TEXT for the former
and .PSECT _DATA for the latter, where .PSECT stands for Program SECTion.
Most embedded microprocessor systems will require text (which includes tables
of constants) in ROM and use RAM for variable data. In certain circumstances the
RTS assembler linker can handle two additional data sections, _ZPAGE for data
which will lie in the direct/absolute-short memory areas (zero page) of MPUs such
as the 6800/9 and 68000 devices and _BSS (Block Started by Symbol) frequently used
for variables which have no initial value (see Section 10.3).
The Microtec Research Paragon 68K products1 , used later in this section, can
handle up to 16 program sections. This is useful where several non-contiguous
memory chips are being targeted. For example, initialized variables could be put
in a specific segment and placed in ROM. Later, at run time, they can be copied
into RAM, where they can be treated as variables; that is changed at will (see
Section 10.3).
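The run-time copy of initialized variables from ROM to RAM can be sketched as below. In a real system the source, destination and length would be supplied by the linker; here they are ordinary parameters, and the name copy_initialized_data is hypothetical.

```c
#include <stddef.h>

/* Copy the initial values of variables from their ROM image into RAM,
   where they can subsequently be changed at will */
void copy_initialized_data(const unsigned char *rom_image,
                           unsigned char *ram_start, size_t length)
{
    size_t i;
    for (i = 0; i < length; i++)
        ram_start[i] = rom_image[i];
}
```

Such a routine would normally be called once from the startup code, before the main program makes any use of the variables.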
Some relocatable products do not permit absolute placement of code using the
.ORG directive, and in any case this is considered bad practice. The RTS products
do permit relocatable ORGs, thus the fragment:
START: -----------
-----------
.ORG START + 0FFEh
.WORD ADDR1
will place the data word ADDR1 at bytes 0FFEh – 0FFFh on from START. If you know that
the linker will locate START at 0E000h, then this will actually be at 0EFFEh – 0EFFFh.
That part of the machine code referring to labels, e.g. MOVEA.L #ARRAY,A0 in
line 30 of Table 6.2(b), which are relocatable or external is not resolvable at this
time. Thus the assembler must parallel the code streams with information relat-
ing these bytes to their label. Object code also contains headers giving processor
information, such as the order of address bytes (most or least significant first),
size of processor words, length of symbols, number of machine-code bytes etc.
With all this in mind it will be appreciated that relocatable object file formats are
much more complex than their absolute counterparts of Table 7.5. As a conse-
quence of this, their structure is very much specific to each product.
As our example for this section, we will follow through the program defined in
Table 6.2 but this time using the more sophisticated Microtec Research Paragon
68K assembler/linker. The instruction mnemonics and address mode represen-
tations follow the standard Motorola conventions, but the directives differ con-
siderably from the RTS mnemonics used up to now. Some key directives are:
The source code using this assembler for the Display module is given in Ta-
ble 7.7(a). I have placed data (ARRAY and X_COORD) in Section 14 and the program
text in Section 9. These are the sections chosen for data and text by the Microtec
Table 7.7: Assembling the Display module with the Microtec Research Relocatable assem-
bler (continued next page).
opt E,CASE
DISPLAY idnt
; ********************************************************************
; * Background program which scans array of word data (ECG points) *
; * Sends out to oscilloscope Y plates in sequence *
; * At same time incrementing X plates *
; * so that ARRAY[0] is seen at the left of screen *
; * and ARRAY[255] at the right of screen *
; * ENTRY : None *
; * EXIT : Endless loop *
; ********************************************************************
;
DAC_X: equ 6000h ; 8-bit X-axis D/A converter
DAC_Y: equ 6001h ; 12-bit Y-axis D/A converter
;
sect 14 ; Section 14 is Data space
xdef ARRAY ; Make the array global
ARRAY: ds.w 256 ; Reserve 256 words for the array
X_COORD: ds.b 1 ; and a byte for the X co-ordinate
;
sect 9 ; Program space
xdef DISPLAY ; Make this program known to the linker
DISPLAY: clr.w d0 ; Get X co-ordinate byte
DLOOP: move.b X_COORD,d0 ; expanded to word
lsl.w #1,d0 ; Multiply by two to give array index in D0.W
movea.l #ARRAY,a0 ; Point A0 to ARRAY[0]
move.w 0(a0,d0.w),DAC_Y ; Get ARRAY[x] to oscilloscope Y plates
move.w X_COORD,DAC_X ; Send X co-ordinate to X plates
addq.b #1,X_COORD ; Go one on in X direction
bra DLOOP ; and show next sample
end DISPLAY
Table 7.7 (continued). Assembling the Display module with the Microtec Research Relocatable as-
sembler.
Microtec Research ASM68008 V6.2a Page 1 Wed Jan 04 15:59:41 1989
Line Address
1 opt E,CASE
2 DISPLAY idnt
3 ; ********************************************************************
4 ; * Background program which scans array of word data (ECG points) *
5 ; * Sends out to oscilloscope Y plates in sequence *
6 ; * At same time incrementing X plates *
7 ; * so that ARRAY[0] is seen at the left of screen *
8 ; * and ARRAY[255] at the right of screen *
9 ; * ENTRY : None *
10 ; * EXIT : Endless loop *
11 ; ********************************************************************
12 ;
13 00006000 DAC_X: equ 6000h ; 8-bit X-axis D/A converter
14 00006001 DAC_Y: equ 6001h ; 12-bit Y-axis D/A converter
15 ;
16 sect 14 ; Section 14 is Data space
17 xdef ARRAY ; Make the array global
18 00000000 ARRAY: ds.w 256 ; Reserve 256 words for the array
19 00000200 X_COORD: ds.b 1 ; and a byte for the X co-ordinate
20 ;
21 sect 9 ; Program space
22 xdef DISPLAY ; This program known to the linker
23 00000000 4240 DISPLAY: clr.w d0 ; Get X co-ordinate byte
24 00000002 1039 DLOOP: move.b X_COORD,d0; expanded to word
0000 0200 R
25 00000008 E348 lsl.w #1,d0 ; x2 to give array index in D0.W
26 0000000A 207C movea.l #ARRAY,a0 ; Point A0 to ARRAY[0]
0000 0000 R
27 00000010 31F0 move.w 0(a0,d0.w),DAC_Y ; Get ARRAY[x] to Y plates
0000 6001
28 00000016 31F9 move.w X_COORD,DAC_X ; Send X co-ord to X plates
0000 0200 R 6000
29 0000001E 5239 addq.b #1,X_COORD; Go one on in X direction
0000 0200 R
30 00000024 60DC bra DLOOP ; and show next sample
31 end DISPLAY
Symbol Table
Label Value
ARRAY 14:00000000
DAC_X 00006000
DAC_Y 00006001
DISPLAY 9 :00000000
DLOOP 9 :00000002
X_COORD 14:00000200
Research C compiler (which we will use later; for example see Table 10.9).
The listing file produced after assembly is shown in Table 7.7(b). Both Data
and Text sections are shown starting from 00000000h; they will be subsequently
located by the linker. This uncertainty also affects machine code relating to la-
bels. Thus in line 29 the value of X_COORD is replaced by its offset from the data
segment's zero start value, that is 0200h. Notice that all lines with machine code
which contains values to be relocated later are tagged with R. A symbol table is
also produced by the Lister utility (as shown at the bottom of the listing), and
this shows the section number followed by an offset for each relocatable symbol.
Absolute symbols, such as DAC_X, have an absolute value attached.
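What the linker later does with a field tagged R can be sketched in C: the section's final base address is added to the offset already held in the code. This assumes a 32-bit field stored most significant byte first, as in the listing; the function name apply_relocation and the base address used in the example are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the section's final base address to the 32-bit big-endian field
   lying 'offset' bytes into the code stream */
void apply_relocation(uint8_t *code, size_t offset, uint32_t section_base)
{
    uint8_t *field = code + offset;
    uint32_t value = ((uint32_t)field[0] << 24) | ((uint32_t)field[1] << 16) |
                     ((uint32_t)field[2] << 8)  |  (uint32_t)field[3];
    value += section_base;           /* Offset becomes absolute address */
    field[0] = (uint8_t)(value >> 24);
    field[1] = (uint8_t)(value >> 16);
    field[2] = (uint8_t)(value >> 8);
    field[3] = (uint8_t)value;
}
```

Applied to the code of line 29 (5239 0000 0200) with an assumed data section base of E000h, the address field becomes 0000 E200 while the op-code word is untouched.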
The output from the Update module source is shown in Table 7.8. Here we
have an external symbol which is tagged with E in line 31 and identified in line 21.
No value is given for ARRAY in the Symbol table, just External.
The Vector code, shown in Table 7.9, similarly has external symbols which are
tagged with E. This has been placed in Section 0, so that it can be linked in as the
start of the 68000's vector table.
The linker program, depicted in Fig. 7.2(b), has several tasks to perform:
1. To concatenate code from the various input modules in the specified order,
to give one contiguous object module.
2. To resolve any intersegment and library symbolic references.
3. To extract code from libraries into the output object module.
4. To generate the object file together with any symbol, listing and link-time error
files.
In our example, incoming object modules contain code located in three sections:
0, 9 and 14 (two for Tables 6.1 and 6.2: _text and _data). The new composite
sections are built up by concatenating like streams from the input object files
as they come in. Unless otherwise directed, code from Section n simply begins
where the last Section n input left off. Thus looking back at Table 6.2, text in
the Display module goes from 0400h – 0425h and in the Update module from
0426h – 0463h. However, the programmer can sometimes override this progres-
sion by specifying a module's start address independently. This is how the Vector
module's text was forced to run from 0000h – 0068h, as directed in Table 7.10(b).
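The default placement rule, where each module's contribution to a section begins where the previous one left off, can be sketched in C. The structure and function names are hypothetical; the sizes used in the example are derived from the address ranges quoted above.

```c
#include <stddef.h>

/* Size of one section contributed by one input module */
struct module_section { const char *module; unsigned long size; };

/* Assign start addresses to like sections from successive modules:
   each begins where the previous one left off */
void place_sections(const struct module_section *in, size_t count,
                    unsigned long base, unsigned long *start)
{
    unsigned long location = base;
    size_t i;
    for (i = 0; i < count; i++) {
        start[i] = location;
        location += in[i].size;
    }
}
```

With text sizes of 26h bytes for the Display module and 3Eh bytes for the Update module, and a base of 0400h, the second module starts at 0426h and ends at 0463h.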
Table 7.10 shows the invocation of two linker programs. The top one, LOD68K
by Microtec Research, is used in our example, whilst the bottom one, LINKX by
RTS, was used to generate the code in Table 6.2.
Taking the latter first, LINKX is followed by a Command line comprising a
series of flags and file names, by which the programmer directs the action of the
linker. The action commanded in Table 7.10(b) is, reading from left to right:
Note the use of the C language prefix 0x to indicate hexadecimal. The action of
these commands can clearly be seen by looking at the addresses of the resulting
code of Table 6.2.
LINKING AND LOADING 183
Line Address
1 opt E,CASE
2 UPDATE idnt
3 ; *********************************************************************
4 ; * Interrupt service routine to update one array element *
5 ; * with the latest ECG period, as signalled by the peak detector *
6 ; * ENTRY : Via a Level1 interrupt *
7 ; * ENTRY : Location of ARRAY[0] is globally known through the linker *
8 ; * EXIT : ARRAY[i] updated, where i is a local index *
9 ; * EXIT : MPU state unchanged *
10 ; *********************************************************************
11 ;
12 00009000 COUNTER: equ 9000h ; The 16-bit period Counter
13 00009800 INT_FLAG: equ 9800h ; The external Interrupt flag
14 ;
15 sect 14 ; Data space
16 00000000 UPDATE_I: ds.b 1 ; for the array update index
17 00000002 LAST_TIME: ds.w 1 ; and for the last reading
18 ;
19 sect 9 ; Program space
20 xdef UPDATE ; This routine known to the linker
21 xref 14:ARRAY ; Get ARRAY from another module
22 00000000 48E7 UPDATE: movem.l d0/d1/a0,-(sp) ; Save used registers
C080
23 00000004 4279 clr INT_FLAG ; Reset external Interrupt flag
0000 9800
24 0000000A 3039 move.w COUNTER,d0 ; and get count from the counter
0000 9000
25 00000010 3200 move.w d0,d1 ; Put in D0.W for safekeeping
26 00000012 9279 sub.w LAST_TIME,d1 ; Sub frm last cnt for new period
0000 0002 R
27 00000018 33C0 move.w d0,LAST_TIME ; & update last counter reading
0000 0002 R
28 0000001E 4240 clr.w d0 ; Prepare to get update array indx
29 00000020 3039 move.w UPDATE_I,d0 ; expanded to word size
0000 0000 R
30 00000026 E348 lsl.w #1,d0 ; x2 to cope with word ARRAY
31 00000028 207C movea.l #ARRAY,a0 ; Point A0.L to ARRAY[0]
0000 0000 E
32 0000002E 3181 0000 move.w d1,0(a0,d0.w) ; Put new value (D1.W) in ARRAY[I]
33 00000032 5279 addq.w #1,UPDATE_I ; Move update marker on one
0000 0000 R
34 00000038 4CDF 0103 movem.l (sp)+,d0/d1/a0 ; Return machine state
35 0000003C 4E73 rte
36 end UPDATE
Symbol Table
Label Value
ARRAY 14:External
COUNTER 00009000
INT_FLAG 00009800
LAST_TIME 14:00000002
UPDATE 9 :00000000
UPDATE_I 14:00000000
Line Address
1 opt E,CASE
2 VECTOR idnt
3 ; *********************************************************************
4 ; * Sets up Interrupt and Reset vectors at bottom of ROM *
5 ; * using globally known labels through the linker *
6 ; *********************************************************************
7 ;
8 sect 0 ; Use Section 0 for vector table
9 xdef VECTOR ; Make this routine known globally
10 xref UPDATE,DISPLAY ; These will be got through linker
11 SSP:
12 00000000 VECTOR: dc.l 0F000h ; Init value of the System Stack pointer
0000 F000
13 00000004 PCR: dc.l DISPLAY ; Go to DISPLAY routine on Reset
0000 0000 E
14 00000008 ds.l 23 ; Other vectors not used here
15 00000064 LEVEL1: dc.l UPDATE ; Addr of Level1 IRQ service routine
0000 0000 E
16 end VECTOR
Symbol Table
Label Value
DISPLAY External
LEVEL1 0:00000064
PCR 0:00000004
SSP 0:00000000
UPDATE External
VECTOR 0:00000000
LOD68K @display.cmd,display.map,display.abs
LINKX -tb 0000 vector.o -tb 0x0400 -db 0xE000 display.o update.o -odisplay.xeq
(b) The equivalent linking process using the RTS products, see Table 6.2.
While code is being entered from the various input object files, a composite
symbol table is being built up by the linker. For our example this combined
symbol table is shown in the Map file produced by LOD68K to give the final location
(i.e. map) of all code sections and symbols. There are three types of symbols
entered into the linker. Absolute symbols have been given a fixed value by the
programmer. These are usually known addresses of external hardware, such as
the X and Y digital to analog converters of our Update module. These are marked
as ABSCONST under SECTION in the map.
Defined symbols are assigned relative to the beginning of the module in which
they are created. Thus DLOOP is indicated as Section 9 Offset 00000402 in the
Display module.
Symbols referred to but not actually defined in a module are usually assigned
a value when all the code is in. They must be declared Public where they are
defined. When known, the value of a ref (referred to) symbol is substituted in the
code wherever it is referred to. Public symbols are listed separately in the Map
file.
The LOD68K linker does not give an Absolute listing file output, unlike the
LINKX product (via a utility program ABSX). However, Table 6.2 is indicative of
Table 7.11: Output from the Microtec linker, comprising a Section Summary and a Module Summary.
series of object-code programs, each headed up with a name and a code length.
Should an unresolved symbol match such a name, the succeeding code is ex-
tracted and added to the appropriate Program sections already formed by the
linker. Thus, unlike a normal object-code file, only relevant portions of a library
file are extracted and used. The linker recognizes a library file from its unique
header. A typical evocation of a linker using a floating point mathematics library
might be:
In the situation where they are the same, as depicted in Fig. 7.3(a), the Loader
will frequently be part of the computer's operating system. Such a Loader can
usually deal with both relocatable and absolute object files produced by a Linker.
In the former case the operating system decides on the location of the various
code streams; in the latter the programmer can influence this decision through
the Linker. In such a resident system, the user program in its object form nor-
mally resides on disk. When the operator decides to run the program, the operat-
ing system first loads and locates the code, then proceeds directly to execution;
that is load and go. Some configurations, mainly mainframe, combine the linkage
and loading operations in one Linker-Loader program.
Although it is possible to interface devices to a computer and use a resident
configuration, in engineering applications the cross-target arrangement depicted
in Fig. 7.3(b) is the more usual. Here the microprocessor-oriented hardware is
distinct from the computing apparatus doing the code conversion. Indeed it is
unlikely that they even use the same processor. In this situation the assembler
is known as a cross-assembler as opposed to a resident assembler. Where the
user hardware is a dedicated controller with its software in ROM, the Loader must
be in the target system. This may well be part of the operating software of an
intelligent EPROM programmer, in which absolute object code is downloaded
into a RAM buffer for later programming. The blown EPROM is then moved by
hand to the target. Alternatively during development the Loader may be in an in-
circuit emulator interface package or the operating system of a microprocessor
development system (MDS). In all of these cases it is likely that the Loader will
act on absolute object code, such as depicted in Table 7.5. Absolute Loaders are
somewhat less complex than their relocatable counterparts, which are used in
resident configurations.
The cross environment is necessary because dedicated microcontrollers rarely
have the facilities necessary to develop their own software. The additional soft-
ware and hardware resources necessary for this purpose cannot be integral to the
system as they must be easily jettisoned when their use is over. Targets of this
form, without their own general-purpose operating system, are often referred to
as naked systems. The use of a general purpose computer with an in-circuit
emulator can be thought of as supplying these resources to a naked system in a
form that can be readily disengaged when no longer needed.
sum = (n + 1) * n/2;
which evaluates the sum of all integers up to n. A Lexical analysis would produce
something like that shown in Table 7.12, where each chunk is parsed into a token
and an attribute. For instance, the variable sum is an identifier (the name of a
variable is commonly known as its identifier) and its attribute is an address or
pointer into the Symbol table.
n+++m
where the ++ operator following a variable means Auto-Increment after use, and
prior to a variable means Auto-Increment before use. A single + in the normal
way means Add. Are the tokens n++ +m or n+ ++m?
Actually, the former is the correct interpretation, as C compilers analyze using
the `maximal munch' strategy [13]. Here the parser moves from left to right,
biting off the longest possible token; hence ++ first followed by +. Thus the
expression means add to m the post-incremented value of n.
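The effect of this rule can be checked with a few lines of C. The function name munch and its parameters are hypothetical, chosen only for this sketch; the expression itself is the one discussed above.

```c
/* n+++m is split into the tokens n ++ + m, that is (n++) + m.
   The final value of n is passed back through n_out. */
int munch(int n_in, int m, int *n_out)
{
    int n = n_in;
    int r = n+++m;   /* uses the old value of n, then increments n */

    *n_out = n;
    return r;
}
```

With n starting at 3 and m at 4, the result is 7 and n is left at 4. Had the tokens been n + ++m, the result would have been 8 and n would have been unchanged.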
A Lexical analysis says nothing about the relationships between the various
lexemes. For this, a subsequent Syntactic analysis must be performed. The
interrelationship and order of operators, constants and variables for our example
is shown in the Syntax tree of Fig. 7.4. The expression to the right of the
assignment operator is evaluated from the most distant parts up: that is, first
add n + 1, then multiply by n and then divide by 2. The variable sum is finally
overwritten by this value. This process is governed by the precedence order
defined by the language (e.g. multiplication has a higher precedence than
addition), the direction of evaluation (e.g. right to left), parentheses, brackets,
loop constructs etc.; see Table 8.4 for an example of both precedence and
direction. More elaborate Syntax trees are often called Parse trees.
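The order of evaluation matters because the division is an integer one. The sketch below contrasts the grouping given by the Syntax tree with a different grouping; both function names are hypothetical.

```c
/* Grouping implied by the Syntax tree: ((n + 1) * n) / 2.
   The * and / operators share a precedence level and associate left to right. */
unsigned sum_formula(unsigned n)
{
    return (n + 1) * n / 2;
}

/* A different tree: (n + 1) * (n / 2). The division is now done first,
   so for odd n the remainder is discarded too early. */
unsigned sum_regrouped(unsigned n)
{
    return (n + 1) * (n / 2);
}
```

For n = 5 the first form gives 15, the correct sum, while the regrouped form gives 12. For even n the two agree.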
Parse trees are in turn subjected to Semantic analysis. This gathers type in-
formation relevant to the coming code-generation phase. In particular the type
and size of variables and constants need to be checked and altered according to
the rules of the language. For example in C if we have an expression of the form:
Z = X + Y;
where X has been declared an integer (say 32 bits) and Y a short integer (say
16 bits), then Y must be expanded to 32 bits before the addition is performed.
Other type conversions include signed and unsigned combinations, floating and
fixed-point mixes etc. Errors may be reported during this phase as well as all
previous phases.
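Two of these conversions can be seen in the short sketch below. It uses the fixed-width types of the later <stdint.h> header, which is an assumption made to pin the sizes at 32 and 16 bits; the function names are hypothetical.

```c
#include <stdint.h>

/* Y (16 bits) is widened to 32 bits, keeping its sign, before the addition. */
int32_t add_mixed(int32_t x, int16_t y)
{
    return x + y;
}

/* In a signed/unsigned mix the signed operand is converted to unsigned
   before the comparison, so a negative s becomes a large positive value. */
int signed_less_than_unsigned(int s, unsigned u)
{
    return s < u;
}
```

Thus add_mixed(100000, -5) gives 99995, whereas signed_less_than_unsigned(-1, 1u) gives 0 (False), which is often not what the programmer expected.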
The output of the Semantic analysis is a type of Intermediate code. Although
Intermediate code is independent of a real machine, it nevertheless reflects the
type of operation available in the target. The synthesis of real machine code
involves the determination of storage requirements and addressing algorithms
for the variables and the expansion of the Intermediate code statements to se-
quences of machine-specific instructions. Intermediate code for our example may
be something like:
The actual machine code produced by the Cosmic 6809 C cross-compiler V3.1
for this example is shown in Table 7.13. Notice how closely it mirrors this pseudo
code.
The front end of the compiler covering the analysis phases through to the
production of Intermediate code is mainly a function of the source language and
largely independent of the target machine. The back end of the compiler includes
those portions of the compiler that generate the specific target language. In
theory the target may be changed by replacing only the back end components.
The code produced by the compiler illustrated in Table 7.13 is in normal
assembly-level format. To complete the production of machine code, it can be
passed on through the chain of Fig. 7.2, that is the assembly process. Other mod-
ules from high-level language programs, assembly-level programs and libraries
can be linked to generate the final executable code.
The compilation process used to produce the code in Table 7.13 is shown in
Fig. 7.5. The Whitesmiths Group series of compilers use separate programs to
implement the various processes discussed above [14]. These are:
pp
The preprocessor implements the Lexical analysis. It also expands #include
files, #define substitutions and macros.
p1
Performs Syntax and Semantic analysis to produce intermediate code.
p2nn
Generates source code for machine nn's assembler. For example, p209 synthe-
sizes source code for a 6809 MPU, p280 for 8080/8085 processors.
optnn
This is an optional peephole optimizer which eliminates redundant instruc-
tions generated by p2nn.
Not shown in the diagram are the Listing utility, which produces interleaved
listings of assembly-level statements as comments, and the optional front-end
Pascal to C (PTC) translator, used when Pascal source is desired.
Splitting up the compiler into separate programs has the advantage of
flexibility, and demands less of the computer's memory capacity. A single
composite compiler program, as used in most commercial products, is much
faster but does demand more of the translating engine. In both situations,
various options are selected by following the program(s) with flags in the form of
command lines or files, much as depicted in Table 7.10.
The top right process box in Fig. 7.5 is labelled Peephole optimizer. Depending
on the sophistication of the compiler translation, a variable percentage of the
machine code produced is either inefficient or redundant. For example, as each
high-level code statement is processed separately, a subsequent translation may
not be aware that a variable that it requires is already down in a Data register.
Accessing an array element in a loop may require the use of a complex address
mode to calculate its location (see line 27 of Table 7.7 as an example). However,
if the array elements are going to be accessed sequentially, it will be faster to
load an Address register prior to entering the loop with the address of the array
element to be accessed on the first loop iteration. Thereafter, indirect addressing
with automatic increment/decrement can be used. This is known as strength
reduction. There are of course obvious faux pas such as using multiplication for
the function X * 1!
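The array case can be shown at source level. The two hypothetical functions below compute the same total; the second is written the way a strength-reducing optimizer might effectively rewrite the first, and is a sketch rather than the output of any particular compiler.

```c
#include <stddef.h>

/* Indexed form: the element address is recalculated from the base
   and the index on every pass through the loop. */
unsigned sum_indexed(const unsigned short *array, size_t count)
{
    unsigned total = 0;
    size_t i;

    for (i = 0; i < count; i++)
        total += array[i];
    return total;
}

/* Pointer form: the address is set up once before the loop and then
   simply stepped on, which maps onto indirect addressing with
   automatic increment. */
unsigned sum_pointer(const unsigned short *array, size_t count)
{
    unsigned total = 0;
    const unsigned short *p = array;
    const unsigned short *end = array + count;

    while (p < end)
        total += *p++;
    return total;
}
```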
Peephole optimization is a method of improving the quality of the machine
code by moving a small window over the target program looking for redundancies.
This window is typically 30 – 100 code lines. In general, the window-sized scan
will be repeated until no further improvements can be made.
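A toy version of such a scan is sketched below, with a window of only two instructions and a single rule: a cmpx #0 immediately following an ldx is dropped, since the load has already set the Zero flag. This is the redundancy removed in Table 7.14(c). Holding the program as an array of text lines, and the function name peephole, are assumptions made for this illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Remove every "cmpx #0" that directly follows an "ldx" instruction.
   The array is compacted in place and the new line count returned.
   The scan is repeated until a pass makes no further change. */
size_t peephole(const char *code[], size_t count)
{
    int changed;

    do {
        size_t in, out = 0;

        changed = 0;
        for (in = 0; in < count; in++) {
            if (out > 0 &&
                strncmp(code[out - 1], "ldx ", 4) == 0 &&
                strcmp(code[in], "cmpx #0") == 0) {
                changed = 1;         /* drop the redundant compare */
                continue;
            }
            code[out++] = code[in];  /* keep this instruction */
        }
        count = out;
    } while (changed);
    return count;
}
```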
There are many different types of techniques for optimization transforma-
tions. Reference [15] gives an overview of this area. For example, the Microtec
Table 7.14: Passing a simple program through the compiler of Fig. 7.5 (continued next
page).
unsigned int sum_of_n()
{
    static unsigned int n;
    static unsigned int sum;
    sum=0;
    while(n>0)
    {
        sum=sum+n;
        n=n-1;
    }
    return(sum);
}
(a) C source.
; Compilateur C pour MC6809 (COSMIC-France)
.processor m6809
.psect _data
L3_n: .byte 0,0
L31_sum: .byte 0,0
.psect _text
_sum_of_n: clra (sum=0000 )
clrb
std L31_sum
L1: ldx L3_n
cmpx #0 (n>0? )
jbeq L11 (Exit IF true )
ldd L31_sum (Get sum )
addd L3_n (Add n to it )
std L31_sum (= new sum )
ldd L3_n (Get n )
addd #-1 (Subtract 1 )
std L3_n (n=n-1 )
jbr L1 (Repeat while )
L11: ldd L31_sum (Return sum )
rts
.public _sum_of_n
.end
; Compilateur C pour MC6809 (COSMIC-France)
.processor m6809
.psect _data
L3_n: .byte 0,0
L31_sum: .byte 0,0
; unsigned int sum_of_n() ; {
.psect _text
; static unsigned int n; ; static unsigned int sum; ; sum=0;
_sum_of_n: clra
clrb
std L31_sum
; line 6 ; while(n>0)
L1: ldx L3_n
;***** cmpx #0 (Removed by optimizer)
jbeq L11
; { ; sum=sum+n;
ldd L31_sum
addd L3_n
std L31_sum
; n=n-1;
ldd L3_n
addd #-1
std L3_n
; }
jbr L1
; line 10 ; return(sum);
L11: ldd L31_sum
rts
; }
.public _sum_of_n
.end
(c) Optimized, with C source interspersed.
1 ; Compilateur C pour MC6809 (COSMIC-France)
2 .processor m6809
3 .psect _data
4 0000 0000 L3_n: .byte 0,0
5 0002 0000 L31_sum: .byte 0,0
6 ; unsigned int sum_of_n()
7 ; {
8 .psect _text
9 ; static unsigned int n;
10; static unsigned int sum;
11; sum=0;
12 E000 4F _sum_of_n: clra
13 E001 5F clrb
14 E002 FD0002 std L31_sum
15; line 6; while(n>0)
16 E005 BE0000 L1: ldx L3_n
17;***** cmpx #0
18 E008 2714 jbeq L11
19; {
20; sum=sum+n;
21 E00A FC0002 ldd L31_sum
22 E00D F30000 addd L3_n
23 E010 FD0002 std L31_sum
24; n=n-1;
25 E013 FC0000 ldd L3_n
26 E016 C3FFFF addd #-1
27 E019 FD0000 std L3_n
28; }
29 E01C 20E7 jbr L1
30; line 10; return(sum);
31 E01E FC0002 L11: ldd L31_sum
32 E021 39 rts
33; }
34 .public _sum_of_n
35 .end
(d) Object listing.
:20E000004F5FFD0002BE00002714FC0002F30000FD0002FC0000C3FFFFFD000020E7FC00AD
:02E020000239C3
:0400000000000000FC
:00E000011F
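The four records above have the layout of Intel-format hex: a colon, a byte count, a 16-bit address, a record type, the data bytes and a final checksum chosen so that all the bytes of the record add up to zero modulo 256. That identification is an assumption based on the layout alone. Under that assumption, a minimal checker (with the hypothetical name hex_record_ok) might be:

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the bytes following the ':' sum to zero modulo 256,
   0 if they do not or if the text is not made of hex digit pairs. */
int hex_record_ok(const char *record)
{
    unsigned sum = 0, byte;
    size_t len = strlen(record);
    size_t i;

    /* The shortest record is ':' plus five bytes, i.e. 11 characters. */
    if (len < 11 || record[0] != ':' || (len - 1) % 2 != 0)
        return 0;
    for (i = 1; i < len; i += 2) {
        if (sscanf(record + i, "%2x", &byte) != 1)
            return 0;
        sum += byte;
    }
    return (sum & 0xFFu) == 0;
}
```

For the second record the bytes 02, E0, 20, 00, 02 and 39 total 13Dh, and adding the final C3h gives 200h, whose low byte is zero.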
BASIC language is usually run under an interpreter, although compilers are avail-
able for this language and may be used once the interpreter-based development
has been completed. C interpreters are also available as a development aid, but
are rarely used.
A compromise is sometimes effected, where a compiler produces an intermediate
code and at run time a much simplified interpreter `executes' this code as
its source. Pascal is traditionally used in this manner (the compiler producing
p-code).
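The flavour of such a run-time interpreter can be given by a very small stack machine. The instruction set below is invented for this sketch and is not real p-code; there is no checking for stack overflow or underflow.

```c
/* Hypothetical intermediate-code operations. */
enum op { OP_PUSH, OP_ADD, OP_MUL, OP_DIV, OP_HALT };

struct instr {
    enum op  op;
    unsigned operand;   /* used by OP_PUSH only */
};

/* Execute instructions in sequence until OP_HALT, then return
   whatever is on top of the evaluation stack. */
unsigned run(const struct instr *code)
{
    unsigned stack[16];
    int sp = 0;         /* index of the next free stack slot */

    for (;; code++) {
        switch (code->op) {
        case OP_PUSH: stack[sp++] = code->operand;      break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_DIV:  sp--; stack[sp - 1] /= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

The statement sum = (n + 1) * n/2 with n = 10 would then be represented by the sequence PUSH 10, PUSH 1, ADD, PUSH 10, MUL, PUSH 2, DIV, HALT, for which run returns 55.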
References
[1] Barron, D.W.; Assemblers and Loaders, MacDonald and Jane's, (UK), 3rd ed., 1978,
Chapters 1 – 4.
[2] Calingaert, P.; Assemblers, Compilers and Program Translation, Computer Science
Press, Springer-Verlag, 1979, Chapter 2.
[3] Fischer, W.P.; Microprocessor Assembly Language Draft Standard, Computer, 12,
no. 12, Dec. 1979, pp. 96 – 109.
[4] Standard for Microprocessor Assembly Language, ANSI/IEEE Standard 694-1985,
IEEE Service Center, Publications Sales Dept., 445 Hoes Lane, POB 1331, Piscataway,
NJ 08855-1331, USA.
[5] Wakerly, J.F.; Microcomputer Architecture and Programming: The 68000 Family,
Wiley, 1989, Section 6.3.
[6] Barron, D.W.; Assemblers and Loaders, MacDonald and Jane's, (UK), 3rd ed., 1978,
Chapter 6.
[7] Walker, G.; Towards a Structured 6809 Assembler Language, Parts 1 and 2, BYTE, 6,
nos. 11 and 12, Nov. and Dec., 1981, pp. 370 – 382 and 198 – 228.
[8] Barron, D.W.; Assemblers and Loaders, MacDonald and Jane's, (UK), 3rd ed., 1978,
Chapter 5.
[9] Barron, D.W.; Assemblers and Loaders, MacDonald and Jane's, (UK), 3rd ed., 1978,
Chapter 8.
[10] Aho, A.V.; Compilers, Addison-Wesley, 1986, Chapter 1.
[11] Aho, A.V.; Compilers, Addison-Wesley, 1986, Chapters 3 – 9.
[12] Calingaert, P.; Assemblers, Compilers and Program Translation, Computer Science
Press, Springer-Verlag, Chapters 6 and 7.
[13] Koenig, A.; C Traps and Pitfalls, Addison-Wesley, 1989, Section 1.3.
[14] Reid, L. and McKinlay, A.P.; Whitesmiths C Compiler, BYTE, 8, no. 1, Jan. 1983,
pp. 330 – 343.
[15] Aho, A.V.; Compilers, Addison-Wesley, 1986, Chapter 10.
CHAPTER 8
Naked C
possible, and that such changes should allow existing programs to compile with,
at most, minor changes. The two major changes were the tightening up of the
syntax for declaring and defining functions, so that the compiler can report er-
rors due to mismatched arguments. The original specification did not define
the libraries accompanying C; although many such functions became de facto
standards, there were many portability problems. ANSI C has a standard library,
which is a specified part of the language running in a hosted environment; that
is with an operating system in situ.
Within the scope of this book, it is impossible to do more than survey the
elements of programming in C. There are many excellent texts devoted entirely
to this end, some of which are listed at the end of this chapter [7, 8, 9, 10, 11]. To
reduce the size of this summary, aspects of C which are unlikely to be of interest
to non-hosted environments, that is naked MPU-based systems, have been omit-
ted, for example, file and terminal I/O functions. In addition I have concentrated
on the newer ANSI C language, which I have given the generic term C. Where the
original specification is alluded to, the term old C has been used. At the time of
writing, virtually all compilers are implementing ANSI C.
The program itself, shown in Table 8.1, is a slightly modified version of Ta-
ble 7.14(a). It is written in the form of a subroutine (known in C as a function)
with the variable n being passed to it from the calling program, and sum being
returned to the caller. The algorithm continually adds n to the initially cleared
sum, as n is decremented to zero.
A TUTORIAL INTRODUCTION 201
1–5: These five lines are comments. Any characters between delimiters /* and */
are regarded as a single space by the compiler. Comments can be anywhere
where whitespace (the collective term for blank, tab or newline) can appear.
Thus lines 10, 12 and 13 have comments after the executable part. Generally,
whitespace is used as a matter of style to make the code easier to read. The
language itself is entirely freeform, provided that the various statements etc.
can be distinguished by the preprocessor.
6: This line names the function sum_of_n and declares that it returns an un-
signed integer value and acts on an unsigned short integer variable n passed
to it by the calling program. Setting out the function parameters like this is
known as prototyping. Objects of type int mean that such variables have
fixed-point (as opposed to real floating-point) values. A short integer size is
typically 16 bits whilst a plain integer is typically 32 bits (see Fig. 8.3). Here
they are to be treated as unsigned numbers.
7: A left brace thus { is equivalent to begin in Pascal. All begins must be
matched with an end, or in C a right brace }. It is good programming style
to indent each begin from the column of the immediately preceding line(s)
and to ensure that begin and end braces line up. In this case line 16 is the
corresponding end brace. Between lines 7 and 16 is the body of the function
sum_of_n().
8: There is only one variable which is local to our function. Its name and type
are defined here. In C all variables (unless external) must be defined before
they are used. Conventionally, all variables are defined at the beginning of the
function. A definition tells the compiler what properties the named variable
has, for example size, so that it can allocate suitable storage. Several variables
of the same type may be defined in the one statement, for example:
int var1, var2, var3;
The line is terminated by a semicolon ; as are all statements in C.
9: Here we assign (=) the value 0 to the variable sum, that is clear it. A definition
and an initializing assignment can frequently be combined; thus:
unsigned int sum = 0;
is a legitimate statement combining lines 8 and 9.
10: In evaluating sum we need to repeat the same process for as long as n is greater
than zero. This is the purpose of the while loop introduced in this line. The
body of this loop, that is the statements which appear between the following
left and right braces of lines 11 and 14, is continually executed for as long
as the expression in the parentheses evaluates to non-zero (True in C). This
test is done before each pass through the body. Thus in our example, on
entry the expression n > 0 is evaluated. If True, then n is added to sum, n is
decremented, and the loop test repeated. In this case, eventually n reaches
zero. Then the expression n > 0 evaluates to False (zero), and the statement
following the closing brace is entered (line 15).
An alternative is while(n), which will also terminate when n reaches zero
(False). This is similar to the difference at assembly level between the Test and
Compare operations.
11: The begin brace defining the while body. Notice that for style it is indented.
12: The right expression to the assignment is evaluated, sum + n, and the result-
ing value (r_value) given to the left variable (l_value), sum. The expression
sum += n; in C is equivalent and means increment sum by n.
13: Here one is subtracted from n and the result becomes the new n. The decre-
ment operator -- can also be used giving the expression n--;.
14: The end brace. Notice the style with the opening brace (in line 11) and closing
brace vertically aligned. This reduces the chance of an error in complex
expressions. Braces are used to surround compound statements; that is se-
quences of single statements. Such blocks can be treated in exactly the way
a single statement is dealt with. Except where they surround the body of a
function, braces may be omitted when the block has only one statement (a
simple statement). In our example, lines 10 – 14 could be replaced by:
while(n) sum+=n--;
which reads: while n is non-zero, add n to sum and decrement n. C can be
written in a terse style like this, but the result can be difficult to read. The
style used in this book would be:
while(n)
{sum += n--;}
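Putting the line-by-line notes together, the function can be sketched as below. This is a reconstruction from the dissection above, not a copy of Table 8.1: the comments and line layout are not reproduced. The second function, with the hypothetical name sum_of_n_terse, uses the combined definition and the terse loop.

```c
/* Long-hand form, following the notes for lines 6 to 16. */
unsigned int sum_of_n(unsigned short int n)
{
    unsigned int sum;            /* the one local variable      */
    sum = 0;                     /* clear the running total     */
    while (n > 0)                /* test made before each pass  */
    {
        sum = sum + n;
        n = n - 1;
    }
    return (sum);
}

/* Terse form: definition combined with initialization, and the
   loop body reduced to a single statement. */
unsigned int sum_of_n_terse(unsigned short int n)
{
    unsigned int sum = 0;
    while (n)
        {sum += n--;}
    return (sum);
}
```

Both return 55 when called with n = 10, and 0 when n is 0.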
At a simple level, our dissection has given us a feeling for the basic architecture
of a C function. C programs are normally structured in a modular fashion with a
central function, conventionally named main(), calling up a series of functions,
some of which may be from a library. Functions can of course be nested. This
structure is shown in Fig. 8.1.
We will spend the remainder of this and the next chapter exploring the basic
concepts informally introduced here, and enlarging our repertoire of C operations
and constructions.
We will discuss these properties at some length in this section, except for scope
which is deferred until Section 9.1.
Simple objects are based on a fixed set of basic types, which are illustrated
in Fig. 8.3. The fundamental division is between real and integer forms. The
former are valued in terms of floating-point numbers, with sign, magnitude and
exponent parts. Three real types are specified, namely float, double float and
long double float (written float, double and long double in source code). C does
not guarantee that the three types will in any given implementation differ in
precision, only that a double float object will never be of lower precision than
a plain float equivalent, and similarly that a long double float will never be of
lower precision than a double float equivalent. The actual format is
implementation dependent, but typically conforms to the ANSI/IEEE Standard
754-1985 [12] shown in Fig. 8.3.
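The ordering of precisions can be checked for a particular compiler from the limits published in the standard header <float.h>. The function name below is hypothetical; the macros are the standard ones giving the number of decimal digits each type can hold.

```c
#include <float.h>

/* Returns 1 if precision never decreases from float through double
   to long double, as the language requires. Two or all three of the
   types may turn out to have equal precision. */
int real_precisions_ordered(void)
{
    return FLT_DIG <= DBL_DIG && DBL_DIG <= LDBL_DIG;
}
```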
Most microprocessor-target implementations treat long double objects as
the default double. Some also permit the optional situation where all real types
are treated as single-precision float objects. This gives faster processing at the
expense of precision, especially when real operations are not implemented in a
mathematics co-processor. Even when only one or two precisions are actually im-
plemented, it is not considered an error to declare an object of an unimplemented
size. For example, where an implementation only supports a single- and double-
[Fig. 8.3: Basic object types. Integer: char, short int, int and long int, each signed or unsigned. Floating-point: float, double float and long double float. Qualifiers shown: constant and volatile.]
where n is the number of bits. The qualifier short int can be shortened to just
short; similarly long int shortens to long.
Although all int types default to a signed representation, they may be prefixed
by the (redundant) qualifier signed for clarity. The unsigned qualifier can be
used to give an object which is positive only, and covering the range 0 to 2^n − 1.
defines an object called sum which can range from 0 to 4,294,967,295 (assuming
4-byte size).
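The figure of 4,294,967,295 is simply 2^32 − 1. The sketch below computes the upper limit for other sizes and compares it with the value the compiler publishes in <limits.h>; both function names are hypothetical.

```c
#include <limits.h>

/* Largest value of an unsigned object of n_bits bits, that is 2^n - 1.
   Handles up to 32 bits, which unsigned long is guaranteed to hold. */
unsigned long max_unsigned(unsigned n_bits)
{
    if (n_bits >= 32)
        return 0xFFFFFFFFUL;
    return (1UL << n_bits) - 1UL;
}

/* Number of bits in this implementation's unsigned int. */
unsigned bits_in_unsigned_int(void)
{
    return (unsigned)(sizeof(unsigned int) * CHAR_BIT);
}
```

On an implementation with 4-byte integers, max_unsigned(bits_in_unsigned_int()) equals UINT_MAX; with 2-byte integers both are 65,535.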
char types are not guaranteed either way. The qualifier signed or unsigned
must be used if such objects are reliably to partake in mathematical operations
with other integer types, see Section 8.4.
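The difference shows up when a char holding the bit pattern FFh is widened to an int. A short sketch, with hypothetical function names:

```c
#include <limits.h>

/* Reports how this compiler treats a plain char: 1 if signed, 0 if unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* A signed char is sign extended when widened. */
int widen_signed(signed char c)
{
    return c;
}

/* An unsigned char is zero extended when widened. */
int widen_unsigned(unsigned char c)
{
    return c;
}
```

Here widen_signed((signed char)-1) gives −1 while widen_unsigned((unsigned char)0xFF) gives 255. A plain char would give one or the other depending on the compiler.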
A void object does not exist and does not (naturally) take up any space. It is
normally used to declare that a function does not return a variable back to the
caller or that no variable is passed by the caller to the function. This is illustrated
in Section 9.1. void could properly be said to be a pseudo type.
One of the major properties of an object is where it will be stored: in a register,
in an absolute memory location or in a relative memory location in a stack-based
frame. The programmer can use the qualifiers register, static or auto to
declare which storage class the named variable is to be assigned.
In order to illustrate how these high-level attributes map down to assembly
level, I have compiled three versions of a slightly modified version of the
sum-of-integers C program of Table 8.1. The output of this compiler is shown in
Table 8.2. This shows the original C source code statements as comments (the
syntax of the assembler uses a * to denote a comment) interspersed with their
resulting assembly-level instructions. I have manually added comments in paren-
theses to clarify what is happening at this assembly level; the compiler cannot
do this.
By default, all variables are automatically assigned locations in a frame when
they are defined on entry to the function in which they operate. They are then
accessed relative to the Frame Pointer, as illustrated in Figs 5.6 and 5.7. This is
the situation shown in Table 8.2(a) where the variables are declared type auto
(line C3). The Frame is made eight deep (LINK A6,#-8) as described in Sec-
tion 5.2. The 4-byte variable n is located at A6-4:3:2:1, and can be fetched
using MOVE.L -4(A6),D7, where A6 is the Frame Pointer. Similarly the resulting
4-byte sum is located at A6-8:7:6:5, hence the operation MOVE.L D7,-8(A6).
Although the qualifier auto is shown in this example, it is usually omitted as the
default.
Once a function or compound statement has been completed, the frame for
any internally defined variables (that is variables defined inside the braces) is
closed and its contents lost. Thus if that code is re-entered at some time in the
future, no sensible use can be made of an auto variable's previous incarnation.
Thus an auto variable's lifetime is simply from where it is defined to its corre-
sponding closing brace. It is unknown outside this region, that is its scope is
local to that of the braces within which it was defined.
A static variable is permanently allocated storage, rather than residing inside
a transient frame. Thus in Table 8.2(b), both variables n and sum are given room
in the data program section (.data is the directive used for the Whitesmiths
assembler as equivalent to .psect _data, and similarly .text for the text
section), and the compiler names them L3_n and L31_sum respectively. Now to
fetch n we have the instruction MOVE.L L3_n,D7. Similarly, to update sum we
have MOVE.L D7,L31_sum. Both L3_n and L31_sum of course translate to
absolute addresses after linking. Absolutely located variables usually take longer
to fetch and return to memory than stack-based (i.e. auto) variables.
Internally defined static variables have the same scope as auto variables, that
is they are local to the function or compound statement in which they are defined.
Their lifetime is however that of the program run. Thus if the code is re-entered,
the last value of that static variable will still be known. static variables can
be declared outside a function, in which case they are globally known from their
definition point onwards. This will be discussed in Section 9.1.
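The difference in lifetime is easily demonstrated. The two hypothetical functions below are identical apart from the storage class of their one variable.

```c
/* The static variable is set up once and keeps its value between calls. */
unsigned count_calls_static(void)
{
    static unsigned calls = 0;
    calls = calls + 1;
    return calls;
}

/* The auto variable is created afresh in the frame on every entry,
   so its previous incarnation is lost. */
unsigned count_calls_auto(void)
{
    unsigned calls = 0;
    calls = calls + 1;
    return calls;
}
```

Three successive calls to count_calls_static return 1, 2 and 3, whereas count_calls_auto returns 1 every time. In both cases calls is unknown outside its own function.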
Variables have to be brought down to a register to be processed, and then re-
turned to their abode in memory (either to a fixed or relative address) afterwards.
All these toings and froings are time consuming and take up program space. In
processors with a copious supply of registers, some can be reserved to keep vari-
ables in situ for longer periods. This is especially valuable in a loop situation
where, otherwise, variables would have to be continually swapped in and out of
memory.
The programmer can designate any number of auto variables as candidates
for register storage, by using the keyword register. The compiler does not have
to take any notice of this, and if ignored, such variables are treated as auto types.
The Whitesmith 68020 C cross-compiler V3.2, used to generate the code shown
in Table 8.2(c), reserves three Data and three Address registers for this purpose.
Such register variables are widened to 32 bits (int) when fitted into the desig-
nated register. Floating-point variables cannot be designated register types, but
pointers (addresses) of such objects can. The scope and lifetime of register vari-
ables is identical to that of auto types; indeed they behave in an equivalent way
to these, except that their address cannot be taken (see Section 9.2).
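As a short illustration, the sketch below asks for a loop counter and a running total to be held in registers. The function name sum_to is an assumption made for this example; whether the request is honoured depends entirely on the compiler.

```c
/* Hypothetical example: sum the integers n, n-1, ... down to 1. */
unsigned long sum_to(unsigned int n)
{
    register unsigned long sum = 0;  /* candidate for a Data register      */
    register unsigned int  i;        /* loop counter, also a candidate     */

    for (i = n; i > 0; i--)
    {
        sum += i;                    /* no memory traffic if held in registers */
    }
    /* Writing &sum or &i here would be rejected by the compiler */
    return sum;
}
```

If the compiler ignores the hint, the code behaves identically but sum and i are treated as ordinary auto variables.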
Variables can be given a value at any time by simple assignment, for example
sum = 0, in line C4 of Table 8.2. It is possible to initialize a variable at the time
of its definition; thus we may have:
int x=5, y=10, z=-3;
defining x as a (signed) integer with an initial value of +5, y likewise at +10 and z
starting off life as −3. How this is done at machine code level, and the resulting
effects at high level, depends on whether the variable has a permanent storage
location (i.e. is static) or temporary (i.e. auto or register).
Variables that are static as viewed from the high-level perspective are given
their initial value before the program begins execution. This is obvious when the
assembly code of a static initializing definition is examined, as shown in Ta-
ble 8.3(a). Each static variable has its location reserved for it in data space with
the constant in situ, by using a .BYTE (or DC) directive. Thus when the program
is put into memory prior to execution by using a loader, these constants will be
placed at the appropriate addresses. Loading is a one-off procedure, and no mat-
ter how often the definition code is executed, the contents of these locations will
not be re-initialized.
Notice from the listing that c has been given an initial value of zero. The
language specification guarantees that all uninitialized static variables will be
zero (see also lines 4 and 5 of Table 7.14(d)).
Relying on initial static variable states is dangerous when ROMable C code
(that is code destined to be located in ROM) is executed. This is because there
is no loading action before execution, the program being permanently stored
in ROM. Data in RAM will be garbage, as the power-up state for such memory is
unspecified. This is discussed further in Section 10.3.
Variables that are auto or register can be initialized in their definition using
the same syntax, but the effects are very different from the previous situation. As
can be seen from Table 8.3(b) such a definition leads to executable code identical
to that produced by the sequence:
auto int a, b, c;
a = 5;
b = 23;
Such code is executed at each pass through the function; that is, the variables a
and b are re-initialized each time. auto and register variables are said to be
initialized at run time, as opposed to static variables which get their primary
values at load time. Uninitialized auto and register variables have no pre-
dictable value, as their locations will either hold the random power-up state of
volatile memory or a value generated by some other code which used the same
locations previously.
The const and volatile type modifiers are new to ANSI C [13]. An object
declared const must not be changed by the program subsequent to any optional
pre-initialization. Code such as:
.processor m6809
.psect _text
; 1 main()
; 2 {
_main: pshs u ; Open frame
leau ,s
leas -6,s ; Six deep
; 3 auto int a=5, b=23, c;
ldd #5 ; Make a 5
std -2,u
ldd #23 ; Make b 23
std -4,u
; 4 c=a+b;
ldd -2,u ; Get a
addd -4,u ; Add b
std -6,u ; = c
; 5 }
leas ,u ; Close frame
puls u,pc
.public _main
.end
int a, b;
const int c;
c = a + b;
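would be rejected by the compiler, since c may not be assigned to once it has been
defined. Objects declared const may therefore be placed in read-only memory,
usually in ROM.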
The const modifier can also be used together with the volatile modifier to
declare a peripheral register as read-only. Specifically the volatile qualifier
warns the compiler that the specified variable may be altered by some outside
agency not known by the program; that is its value is subject to spontaneous and
random change. Thus for example, an input port will reflect an external event
not under the program's control. Also the compiler should never try to modify
an input port's contents; it is read-only.
The classical example is monitoring a bit in a Status register, waiting for an
event to happen, for example:
unsigned char i; /* i is an ordinary variable */
volatile unsigned char status; /* status is the Status register */
const volatile unsigned char in_port; /* in_port is the read-only input */
while (!(status & 0x80)) /* As long as bit 7 is False (0) */
{;} /* Do this (a null statement) */
i = in_port; /* When bit 7 is True (1), read in_port */
Here the Status register is continually ANDed (&, the bitwise AND operator) with
the mask 10000000b (80h = 0x80). If bit 7 (the flag which says an event has hap-
pened out there) is 0 or False, then the expression !(status & 0x80) returns
!(False). As ! is the logic NOT operator, this yields NOT False or True, and
the body of the while construction is executed. The single ; statement termi-
nator is used to give a null body. When bit 7 is high !(status & 0x80) returns
!(True) = False and the polling terminates.
If the volatile qualifier is not used, then the compiler may well optimize
the situation by reading in status to a register. The compiler will then contin-
ually test this copy, to save regularly bringing it down into the MPU. This would
only make sense if status were an ordinary variable whose value could only
be changed by the compiler. Note that the in_port peripheral register has been
declared both const and volatile. This means it is a read-only object (the
compiler should not try and alter it) and its value can only be modified by an outside
agency. The object descriptor pair of modifiers unsigned char are the equiva-
lent to saying that the qualified object is byte sized, as illustrated below. Another
example of volatile is given in the listing of Table 9.6. Normally objects of this
kind are pointers to (i.e. addresses of) hardware ports or fixed memory locations,
rather than the objects themselves. We will discuss pointers in Section 9.2.
const        volatile                unsigned char   in_port;
(read-only)  (externally alterable)  (byte sized)    (name of object)
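The polling idea can be sketched with pointers to the hardware locations, as suggested above. The function name read_when_ready and the use of pointer parameters are assumptions of this sketch rather than the book's own listing; on a real target the pointers would be fixed addresses chosen by the hardware designer.

```c
/* Hypothetical routine: poll bit 7 of a Status register, then read a port.
   Both locations are passed in as pointers (an assumption of this sketch). */
unsigned char read_when_ready(volatile unsigned char *status,
                              const volatile unsigned char *in_port)
{
    while (!(*status & 0x80))
    {
        ;                   /* null body: *status is re-read on every pass */
    }
    return *in_port;        /* read-only object: the code never stores to it */
}
```

Because status points to a volatile object, the compiler is not entitled to test a cached copy; each pass through the loop performs a fresh read.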
are identical, with the 0x prefix indicating hexadecimal and 0 for octal. Be careful,
377 (decimal) and 0377 (octal) are very different in C. Character constants are
indicated by single quotes; thus:
i = 'a'; /* Same as i = 0x61; if ASCII */
x + y * z
x + (y * z)
(x + y) * z
as C follows the usual rules of computing the contents of parentheses first (in-
nermost outwards for nested parentheses).
The way in which an expression is combined is obviously of critical impor-
tance. In C, operators are graded in order of their precedence. Table 8.4, which
lists operators in descending order of precedence, shows that multiplication is
of a higher precedence than addition, and so will be implemented first. Thus
the first form of the parenthesized expression above is equivalent to our original
statement.
This still leaves us with the problem of mixing operators at the same level of
precedence. For example:
x/y/z
Is this (x/y)/z or x/(y/z)? The outcomes are very different. Most operators
associate from left to right, thus the equivalent here is:
(x/y)/z
f = x = y = z = 0;
What value will f have? The answer here is 0, as assignment operators associate
from right to left. Firstly 0 will be assigned to z, then z (i.e. 0) to y, then y
(i.e. 0) to x, then x (i.e. 0) to f; so all variables will be set to 0, that is:
f = (x = (y = (z = 0)));
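The two associativity rules can be checked with a small sketch. The function names are hypothetical and serve only to make the groupings testable.

```c
/* Hypothetical helpers illustrating associativity. */

/* Division associates left to right: x/y/z is (x/y)/z */
int divide_left(int x, int y, int z)
{
    return x / y / z;
}

/* The other grouping must be requested explicitly */
int divide_right(int x, int y, int z)
{
    return x / (y / z);
}

/* Assignment associates right to left: every object ends up as 0 */
int zero_all(int *f, int *x, int *y, int *z)
{
    return *f = *x = *y = *z = 0;
}
```

With x = 100, y = 10 and z = 5 the first form gives 2 and the second gives 50, which shows how different the outcomes can be.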
Table 8.4: C operators, their precedence and associativity (continued next page).
{
y = z + 5;
x = 12 * y + 2;
}
Most of the operators listed in Table 8.4 are intuitive and will not be covered
in any detail here. Apart from functions, arrays and structures, discussions of
which are deferred to Chapter 9, the unary operators have the highest priority.
Unary operators attach to a single object, for example ~x inverts all bits in x
(1's complement). Most operators are binary in that they connect two objects,
for example x + y. Unaries bind very tightly to their object due to their high
priority; thus:
a = b + ~x;
and
a = b + (~x);
are the same, as ~ has a higher priority than the binary addition operator.
Care must be taken when inverting C objects, as all zeros implicit in the vari-
able become ones. Consider:
int i = 0xA9, j;
j = ~i; /* j = 0xFF56 or 0xFFFFFF56 */
j = ~i & 0xFF; /* j = 0x0056 or 0x00000056 */
Although i is assigned constant 0xA9, its bit pattern will be 0000 0000 1010
1001b or 0000 0000 0000 0000 0000 0000 1010 1001b, depending on whether
int is 16 or 32 bits. On inversion, all the implicit zero bits will become one
as shown. Bit ANDing (the & operator) by 1111 1111b will clear these, as this
int constant has implicit leading digits of zero. & has a lower priority than ~, so
no parentheses are required. In a 2's complement machine the unary - operator
acts in a similar way to ~, that is -a is the same as ~a + 1 (2's complement is
invert plus 1). As you would expect, unary - simply changes the sign of the object.
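Both points can be verified with a short sketch, assuming a 2's complement machine. The function names are hypothetical.

```c
/* Hypothetical helpers; a 2's complement machine is assumed. */

/* Invert i, then clear everything above the low byte */
int invert_low_byte(int i)
{
    return ~i & 0xFF;       /* ~ binds more tightly than &, no parentheses needed */
}

/* Negate by inverting and adding one */
int negate_by_complement(int a)
{
    return ~a + 1;          /* same result as -a on a 2's complement machine */
}
```

The mask makes the result independent of whether int is 16 or 32 bits wide: invert_low_byte(0xA9) is 0x56 in either case.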
Consider the statement:
f = a + (b-c);
You might think that the expression (b-c) would be evaluated first and then a
added to it. A compiler conforming to the ANSI standard must indeed honour this
grouping. Older (pre-ANSI) compilers, however, were free to ignore the parentheses,
deeming them unnecessary, as the binary addition (+) and subtraction (-) operators
have the same level of priority. Evaluation could then occur from left to right
according to the table; that is a + b and then -c. If it is important to you to add
(b-c) to a and not b alone (perhaps because you are afraid of overflow), then with
such a compiler, where it is supported, the unary + operator can be used to ensure
this happens; that is:
f = a + +(b-c);
Here unary + forces evaluation of its operand as a unit, as it has a higher priority
than the binary + Addition operator (see Table 8.4).
Although C guarantees the way an expression is put together according to the
rules of precedence and associativity, it says nothing about the sequence in which
component sub-expressions are produced. Consider the following (convoluted)
statement:
f = a + (z = z+4) + 3*z;
where the writer hoped that the parentheses would force the variable z to be
incremented by four first, then multiplied by 3 and finally, left to right, a added
to the new value of z and then added to three times the new value of z. But
what if the compiler took it into its head firstly to multiply z by three (i.e. old z)
and store the answer away somewhere, then evaluate (z+4) and store it away,
and then add a to the new value of z plus three times the old value! This type
of occurrence is known as a side effect, as it is usually caused by using an as-
signment, increment, decrement or function that changes the value of an object
that appears elsewhere in the expression. C makes no promises that side effects
will occur in a predictable order within a single statement [14]. A safer sequence
would be:
z = z + 4;
f = a + z + 3*z;
or, if z itself need not be updated:
f = a + 4*(z + 4);
Unary operators normally tag their object to the left, the possible exception
being the Increment ++ and Decrement -- unaries. These can be before or after
the identifier; their effect being subtly different. A left Increment/Decrement
unary operator means first change the object and then use it. A right unary
means first use the (old) value in the calculations and then change the object. For
example:
sum = sum + n--; /* Add n to sum, then decrement n */
sum = sum + --n; /* Decrement n first, then add n to sum */
The former is clearly shown in line C6 of Table 8.2(b). First n is fetched from
memory into internal storage (MOVE.L L3_n,A1), then the original object out
there in memory is decremented (SUBQ.L #1,L3_n). Finally the original value is
used for the addition (MOVE.L A1,D7; ADD.L D7,L31_sum).
Because of side effects, care must be taken that Incremented/Decremented
objects do not appear elsewhere in the same statement; for example:
z = 6*n-- + a/n;
Will the n used in the denominator have the old or new value? You will be at the
mercy of the vagaries of your compiler in writing such code.
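The difference between the two forms is easily demonstrated when each modified object appears only once in its statement. The function names below are hypothetical.

```c
/* Hypothetical helpers showing right (post) and left (pre) decrement. */

/* Use the old value of *n in the sum, then decrement *n */
int add_then_decrement(int sum, int *n)
{
    sum = sum + (*n)--;
    return sum;
}

/* Decrement *n first, then use the new value in the sum */
int decrement_then_add(int sum, int *n)
{
    sum = sum + --(*n);
    return sum;
}
```

Starting from sum = 10 and n = 5, the first form returns 15 and the second 14; in both cases n ends up as 4.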
Rather confusingly, some of the unary operators have the same symbols as
binary operators, with very different meanings; particularly address of (&) and
separates the 8-bit object packed_BCD into its two 4-bit constituent BCD digits
in their ASCII form. The low digit is obtained by clearing the upper four bits with
a bitwise AND, whilst the high digit is separated out by shifting right four times.
Adding ASCII 0 (i.e. 0x30) converts to the appropriate ASCII code. Parentheses
are used, as & and >> are of lower priority than +. Table 8.5 shows how these
operations translate to 68000 code.
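A minimal sketch of the separation just described is given here. The object names packed_BCD, BCD_LOW and BCD_HIGH follow the text, but the function wrapper unpack_bcd and its pointer parameters are assumptions made so that the fragment can stand alone.

```c
/* Hypothetical wrapper around the packed-BCD separation; ASCII is assumed. */
void unpack_bcd(unsigned char packed_BCD,
                unsigned char *BCD_HIGH, unsigned char *BCD_LOW)
{
    /* Clear the upper four bits, then add ASCII '0' (0x30) */
    *BCD_LOW  = (unsigned char)((packed_BCD & 0x0F) + '0');

    /* Shift right four times, then add ASCII '0' (0x30) */
    *BCD_HIGH = (unsigned char)((packed_BCD >> 4) + '0');
}
```

Without the inner parentheses, the addition would be performed first, since + has a higher priority than both & and >>.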
The Shift Left operation always feeds in zeros. Shifting right is more problem-
atical. If the object is unsigned, then a Logic Shift Right is generated, with zeros
moving in. The situation is confused when a signed object is being acted upon.
Most compilers will emit an Arithmetic Shift Right, where the sign bit is propa-
gated along. However, this is not guaranteed. If a Logic Shift Right is desired,
z = (unsigned int)a >> 6;
where I have assumed a is a signed int type. The unary operator (type) used
to force the variable a to another type is known as a cast.
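The effect of the cast can be sketched as follows. The function name is hypothetical, and the test values are chosen so that the result does not depend on the width of int.

```c
/* Hypothetical helper: force a Logic Shift Right on a signed object. */
unsigned int shift_right_logical(int a)
{
    /* The cast makes the operand unsigned, so zeros are fed in from the left */
    return (unsigned int)a >> 6;
}
```

For a = -1, all bits are set after the cast, so the result has its top six bits clear and all the rest set, whatever the size of int.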
C has a range of relational and logic operations which treat objects as Booleans,
that is having only two values, True (non-zero) and False (zero). We have already
used the Greater Than (>) operator in line 5 of Table 8.2. Here the value of n is
compared to 0. If Greater Than, then the outcome of the expression n > 0 is 1
(i.e. True); otherwise the outcome is 0 (False). Actually in this case the construc-
tion while (n) would do the same thing. Unary logic NOT (!) simply changes
the truth value of the object; for example:
while ((!n && m) || (n && !m) ) /* while this is true */
{do this, that and the other} /* loop body */
executes the loop body if n is False (!n True) AND m is True OR ELSE n is True
AND m is False. In other words, only if exactly one of m or n is False (i.e. 0) will
the loop body be executed. Notice the use of && and || for logic AND and OR, as opposed
to the bitwise & and | operator symbols.
All logic (Boolean) expressions are guaranteed to be evaluated left to right, and
this evaluation ceases as soon as an overall result can be ascertained. Thus in the
example above, if n were False and m were True, the sub-expression (n && !m)
would not be executed. Thus fancy programming such as:
(!n && m)||(n && !m++)
would be dangerous as the m++ increment would only happen if n was True. Only
in this case is the first expression False and the n operand of the second
expression True, so that evaluation moves on to !m++.
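The hidden dependence of the increment on the other operands can be made visible with a sketch. The function name is hypothetical; m is passed by address so that the side effect can be observed by the caller.

```c
/* Hypothetical helper: the "fancy" expression, with m passed by address. */
int exclusive_or_unsafe(int n, int *m)
{
    /* (*m)++ is reached only when n is True; otherwise evaluation stops early */
    return (!n && *m) || (n && !(*m)++);
}
```

With n False the variable m is never incremented, whatever its value; with n True it is always incremented. Code that relies on m changing at every call will therefore misbehave.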
Mixing up the logic equivalent operator == and assignment operator = is a
major source of error [15] (not helped by most texts calling == equal). Compare
the following two statements:
if (a == b) {do this;} /* Correct */
if (a = b) {do this;} /* Dangerous */
In the former case the value of a is compared to that of b. If they are the same,
(True) the value of the expression is 1, and {this;} is executed. If they differ,
the result is 0 (False) and {this;} is skipped (see page 224). Neither a nor b are
changed by this process. In the latter case a is assigned the value of b, and the
value of the expression is b. If b is non-zero then {this;} is done, and if zero,
skipped. It is unlikely the programmer meant to do this, and if he/she did, then
it should be done in a less obscure fashion.
As a final example, consider the problem of determining the state of the
most significant bit of an unsigned int object x. This simply requires ANDing
by $2^{n-1}$, where n is the number of bits in the object. Unfortunately an int
object can have 16 bits in some implementations and 32 bits in others (other
values are also possible but rare). If the software is to be written in a portable form,
then one of the two masks $2^{15}$ (0x8000) and $2^{31}$ (0x80000000) has
to be chosen.
C has a unary operator called sizeof, which operates on a type designator or
object, and which returns its size in bytes. This also applies to composite objects
such as arrays and structures. Using this, a possible sequence might be:
if (sizeof(x) == 4) /* Has x got 4 bytes? */
{mask = 0x80000000;} /* If True */
else
{mask = 0x8000;} /* If not True */
msb = mask & x;
In Section 9.2, we repeat this example for larger binary numbers, using an array
data structure.
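An alternative sketch builds the mask directly from sizeof, so that no explicit choice between the two constants is needed. This is not the method of the fragment above; the function name msb_of is hypothetical, and the code assumes that unsigned int has no padding bits.

```c
#include <limits.h>   /* CHAR_BIT: the number of bits in a byte */

/* Hypothetical helper: returns 1 if the most significant bit of x is set. */
int msb_of(unsigned int x)
{
    /* 2 to the power (n-1), where n = number of bits in an unsigned int */
    unsigned int mask = 1u << (sizeof(x) * CHAR_BIT - 1);

    return (mask & x) != 0;
}
```

The shift count is worked out at compile time, so a reasonable compiler will reduce the mask to a constant for the target in use.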
Consider the statement above:
binary = binary/10;
This can be written more concisely as:
binary /= 10;
using the /= compound assignment operator. This could be read as divide binary
by 10.
Apart from compound assignments' concise notation, there can be advantages
in the size of machine code emitted where complex objects are involved. As an
example, consider a 2-dimensional byte array (see Section 9.2) of 100 rows and
12 columns. If, say, we wish to multiply the element 5 rows down and 2 columns
across by n, we could write:
x[5][2] = x[5][2] * n;
using simple assignment. The compiler knows where the start address of the
array is, so to get x[5][2] it must multiply the number of rows (5) by the max-
imum number of columns (i.e. 12). Finally add the actual number of columns
(2 across). This is the number of bytes on from the start (62), see Fig. 9.3(b), and
would then be used as part of some Indexed address mode to give the effective
address (ea). Once x[5][2] was down, it would be multiplied by n. The compiler
would then move to the left side of the assignment, and if not very bright would
again calculate the ea (probably previously thrown away) to determine the target
address for the Store/Move. This takes lots of wasted time and code.
The alternative compound assignment is written:
x[5][2] *= n;
The compiler now knows that the ea has only to be calculated once, which con-
sequently produces a superior coding.
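The address arithmetic and the compound assignment can both be sketched in a few lines. The macro and function names are hypothetical; the array dimensions are those quoted in the text.

```c
#define ROWS 100
#define COLS 12

/* Hypothetical helper: bytes on from the start of a byte array of COLS columns */
unsigned int byte_offset(unsigned int row, unsigned int col)
{
    return row * COLS + col;        /* e.g. 5 * 12 + 2 = 62 */
}

/* Hypothetical helper: multiply element [5][2] by n in place */
void scale_element(unsigned char x[ROWS][COLS], unsigned char n)
{
    x[5][2] *= n;                   /* the element is named only once */
}
```

Whether the effective address is really calculated only once is up to the compiler; the compound form simply makes that optimization easy to find.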
Using this notation, line 6 of Table 8.2 could be replaced by:
sum += n;
or even, using the comma operator (,), lines 5 and 6 could be combined as:
while (sum += n--, n > 0) {;}
The comma operator, shown at the bottom of Table 8.4, allows expressions
to be concatenated. Each such expression is guaranteed to be evaluated from left
sub-expression to right, with the value being that of the rightmost sub-expression.
Thus, in the example above, sum += n-- will be executed and then the test n > 0.
The value (True or False) of this latter is the one acted upon by the while instruc-
tion. Notice the use of {;} to indicate a null statement (i.e. do nothing). The
braces are optional. It is normally recommended that the comma operator be
used with caution.
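The combined loop can be wrapped in a function to check its behaviour. The name sum_down is hypothetical, and a signed n is assumed so that the test n > 0 also terminates the loop when n starts at zero.

```c
/* Hypothetical wrapper around the comma-operator loop. */
int sum_down(int n)
{
    int sum = 0;

    /* Left sub-expression first (add, then decrement), then the test n > 0 */
    while (sum += n--, n > 0) { ; }

    return sum;
}
```

Note that the addition is performed once before the first test, in the manner of a do-while loop.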
A close scrutiny of the code produced in Table 8.5 shows that the three objects
packed_BCD, BCD_LOW and BCD_HIGH are stored in memory as bytes (at [A6]-1,
[A6]-2 and [A6]-3 respectively), as expected by their declaration as char. How-
ever, when brought down into a MPU register, they are converted into 32-bit ints.
For example:
MOVEQ.L #0,D6 ; Clears all 32 bits
MOVE.B -1(A6),D6 ; packed_BCD occupies lower 8 bits
shows the promotion of the unsigned char packed_BCD to 32-bit status by mak-
ing the upper 24 bits zero. If packed_BCD had been signed, then a Sign Extension
would have been used (e.g. EXT for the 68000 MPU). This promotion to int is the
reason why an Arithmetic Shift Right (ASR) was used to implement >> in line C5,
as opposed to the expected LSR, as int is signed and the compiler sensibly uses
Arithmetic Shift operations for signed numbers.
In general, C prefers to do all its fixed point arithmetic in int form. Thus,
as shown by the thick arrow in Fig. 8.4, all objects declared signed or unsigned
char, signed or unsigned short are automatically made int for the duration
of their stay in the processor. Some compilers give the option of disabling this
widening, which can be useful for 8-bit MPUs which have difficulty in this area.
However, this extension facility is non-standard. In a similar manner, C prefers to
do its floating-point operations in double float form. This too may sometimes
be changed to the non-standard single-precision float size, to save time and
storage.
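The widening to int has visible consequences at high level, as the following sketch shows. The function names are hypothetical and 8-bit chars are assumed.

```c
/* Hypothetical helpers; 8-bit chars are assumed. */

/* Both operands are widened to int before the addition, so nothing is lost */
int add_widened(unsigned char a, unsigned char b)
{
    return a + b;
}

/* The int result is cut back down to a byte when converted */
unsigned char add_narrowed(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

Adding 200 and 100 gives 300 in the first case, but only 44 (300 − 256) once the result is forced back into a byte.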
C permits arithmetic with mixed types. Consider the following example:
short z;
int x;
unsigned long y;
float a;
a = x + y/z;
What type will the right-hand side end up with, and how will that equate with the
left-hand type?
Well, firstly object z will be promoted to unsigned long to match the numer-
ator, and the result will be unsigned long. Then x will be promoted to unsigned
long to match, and added to give an unsigned long right-hand value. Finally
this is converted to float, which is the value assigned to the left-hand variable.
In general, in a mixed type operation, the objects involved migrate upwards
to the highest commonality, as defined in the hierarchy of Fig. 8.4, with int being
the base integral type.
One point that needs watching is the notion that an unsigned integral type is of
a higher order than its signed counterpart. This is because an unsigned quantity
can hold a larger magnitude for the same size, see Fig. 8.3. This can cause strange
outcomes when mixing unsigned and signed types together. For example, in the
statement above, if x was −1 on a 2's complement machine, it would be stored
as 0xFFFFFFFF (for 32 bits). Now because of y, it must be converted to unsigned
long, and in this case it will be treated as a positive number (4,294,967,295). In
some situations, this can lead to spectacular results, although it will work out
correctly in this case. In general, if possible do not mix signed and unsigned
numbers.
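The conversion of a negative int to unsigned long can be sketched as below. The function name is hypothetical, and the cast is written out only to make visible the conversion that the compiler would otherwise apply implicitly.

```c
/* Hypothetical helper: add a signed int to an unsigned long. */
unsigned long add_mixed(int x, unsigned long y)
{
    /* x is converted to unsigned long; -1 becomes the largest possible value */
    return (unsigned long)x + y;
}
```

With x = −1 and y = 2 the sum wraps around to the expected answer of 1, but with y = 0 the result is the largest unsigned long value rather than −1.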
In an assignment, the right-hand value (r_value) is converted to the l_value
type, in this example, the float equivalent to the unsigned long r_value. Where
the l_value type is further down the hierarchy, then truncation or other unspeci-
fied shortening will occur, and unless the actual value can be fitted into the lower
type, an erroneous result will be recorded.
As a final example of what can go wrong consider the code fragment:
long int sum; /* Reserve 32 bits for sum */
unsigned int n; /* and a 16-bit n */
sum = (n+1)*n/2; /* Sum of all integers up to n */
compiled with a 16-bit int and 32-bit long compiler model. All arithmetic is
done at unsigned int level (i.e. 16-bit precision). However, if n is large enough,
overflow will occur; for example if n is 256, then (n+1)*n will give 256 and not
65,792 (256 is 65,792 − 65,536)! The fact that sum is defined as long will not
save the situation, as this means only that the final (erroneous) r_value will be
promoted to 32 bits. If values of sum greater than 65,535 are expected, then
the variable n may be treated as a 32-bit object by using the cast operator (i.e.
(long)n), which will force 32-bit arithmetic thus:
sum = ((long)n+1) * n/2;
Why didn't I bother to cast the second n? Why is the code of Table 7.13 safe?
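The overflow can be reproduced on any host by using the fixed-width types of stdint.h, which post-date the compilers discussed here. In this sketch the function names are hypothetical, and the 16-bit int arithmetic is imitated by cutting the product down to 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: imitates (n+1)*n/2 carried out at 16-bit precision */
uint32_t sum_narrow(uint16_t n)
{
    /* Only the lower 16 bits of the product survive, as with a 16-bit int */
    uint16_t product = (uint16_t)(((uint32_t)n + 1u) * n);

    return product / 2u;
}

/* Hypothetical helper: the cast forces the whole expression to 32 bits */
uint32_t sum_wide(uint16_t n)
{
    return ((uint32_t)n + 1u) * n / 2u;
}
```

For n = 256 the narrow version returns 128, while the wide version returns the correct 32,896. For small n, such as 10, both agree on 55.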
Here the ASCII code for '0' (i.e. 30h) is subtracted if the digit lies between '0'
and '9' (30h – 39h) and 37h is subtracted if it does not (which assumes that it
must be between 'A' and 'F', 41h – 46h).
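The conversion just described might be sketched with a plain if-else as follows. The function name is hypothetical, ASCII is assumed, and as in the text no check is made that a non-decimal character really lies between 'A' and 'F'.

```c
/* Hypothetical helper: ASCII-coded hexadecimal digit to its binary value. */
unsigned char hex_digit_to_binary(unsigned char digit)
{
    if (digit >= '0' && digit <= '9')
    {
        return (unsigned char)(digit - 0x30);   /* '0'..'9' give 0..9   */
    }
    else
    {
        return (unsigned char)(digit - 0x37);   /* 'A'..'F' give 10..15 */
    }
}
```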
if instructions may be nested, although care must be taken in using braces
to force the proper association. As an example, consider a Real-Time Clock func-
tion entered via an interrupt once a second. We will discuss how this might be
accomplished in a C program in Section 10.2. Once in the function, we have three
variables: Seconds, Minutes and Hours. The logic for the update is:
As shown in Table 8.6, the Seconds variable is first incremented and then
compared for greater than 59 in line 4 (note the ++ operator before the variable
Seconds). If this is not True then the following compound statement, delineated
by the braces of lines 5 and 15, is skipped and the function exits. Otherwise,
this compound statement is entered, Seconds are zeroed in line 6 and the next
if instruction executed. This does the same thing with Minutes, and if the result
is not greater than 59 its body, delineated by braces in lines 8 and 14, is skipped
and the function terminated. Finally the third-level nested if increments and
checks Hours. If the result is not greater than 23, then its body is skipped to the
brace in line 13 and thence the exit point at line 16.
Notice how the if instructions are indented, and how the different nesting
levels' braces line up. It is essential to take care with constructions like this to
avoid error.
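The nested structure described above can be sketched as follows. This is a reconstruction from the description, not a copy of Table 8.6, so the line numbers quoted in the text do not apply to it; the function name update_clock and the choice of global byte-sized variables are assumptions.

```c
/* Assumed to be global so that they survive between interrupts */
unsigned char Seconds, Minutes, Hours;

/* Hypothetical update routine, entered once a second. */
void update_clock(void)
{
    if (++Seconds > 59)             /* increment first, then compare */
    {
        Seconds = 0;
        if (++Minutes > 59)
        {
            Minutes = 0;
            if (++Hours > 23)
            {
                Hours = 0;
            }
        }
    }
}
```

Starting from 23:59:59, one call rolls all three variables over to zero; in most calls only the first test is ever made.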
Nesting ifs with elses can cause errors, as any else will associate itself to
the nearest unattached if, thus:
The writer of this code fragment meant to restrict the variable n to the range
0 – max, limiting it to these boundary values if beyond. Thus the logic was:
1. Check n above zero, IF False then make n = 0.
2. Check n above max, IF True make n = max, ELSE do nothing.
What actually happens is the else of line 3 will attach itself to the if of line 2,
not that of line 1; thus if n is lower than zero, all of lines 2 and 3 are bypassed.
Furthermore, if n is not above max, then n will be made zero! The situation is
solved by proper use of braces:
if (n > 0)
{
if (n > max) {n = max;}
}
else n = 0;
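The two associations can be compared side by side in a sketch. The function names are hypothetical; in the first function the braces have been written in to show how the compiler reads the unbraced version, with the else paired to the inner if.

```c
/* Hypothetical helper: how the unbraced code is actually read */
int limit_as_parsed(int n, int max)
{
    if (n > 0)
    {
        if (n > max) {n = max;}
        else         {n = 0;}       /* else has attached to the nearest if */
    }
    return n;
}

/* Hypothetical helper: what the writer meant */
int limit_as_intended(int n, int max)
{
    if (n > 0)
    {
        if (n > max) {n = max;}
    }
    else {n = 0;}
    return n;
}
```

With max = 10, a legitimate value such as n = 5 is wrongly forced to zero by the first version, and a negative n passes through unaltered.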
Although nested ifs can be utilized to make multiple decisions, their use is
not very elegant, and, as we have seen, error prone. The else-if construction
illustrated in Fig. 8.6 is the more structured approach. The several expressions
are evaluated in order until the first True result. The statement associated with
this expression is executed, and the rest of the chain is by-passed. An optional
final else can be used at the end to give a default action.
As an example, let us redo the Real-Time Clock function, this time using an
else-if construction. In Table 8.7, line 4 is a plain if, which checks the state
of Seconds after incrementing. Should Seconds be less than 60 the dummy
null statement {;} is executed and the rest of the structure bypassed. If not,
then the Minutes variable must in turn be incremented and checked. However,
first Seconds must be zeroed. I have used the comma concatenate operator to
outcome is met, the associated expression is evaluated and the whole of the rest
of the structure bypassed. As we shall see, there are more efficient ways to im-
plement this function.
each case statement is compound, ending with the break instruction. This forces
the execution to bypass all remaining statements down to the return of line 20.
Leaving out break is not a syntax error, but is rarely what the programmer meant
to do [16].
switch-case structures are frequently used in conjunction with a keyboard
to select an appropriate response to each keypress, usually by jumping to a sub-
routine. Thus, if key M is pressed, do a memory examine; if V is pressed, view a
block; etc.
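A keyboard dispatcher of this kind might be sketched as below. The enumeration and the function name are hypothetical; in a real monitor each case would more likely call a subroutine than set a code.

```c
/* Hypothetical response codes for a simple monitor */
enum response { NO_ACTION, MEMORY_EXAMINE, VIEW_BLOCK };

/* Hypothetical helper: choose a response for each keypress. */
enum response select_response(char key)
{
    enum response action;

    switch (key)
    {
        case 'M': action = MEMORY_EXAMINE; break;   /* examine memory   */
        case 'V': action = VIEW_BLOCK;     break;   /* view a block     */
        default:  action = NO_ACTION;      break;   /* unrecognized key */
    }
    return action;
}
```

If the break after case 'M' were omitted, execution would fall through and the M key would end up selecting the view response.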
The loop structure is the standard technique for repeating a process a number
of times, either on a single object or on an array or block of related objects.
We have already extensively used this approach at assembly level, for example
Table 5.7. C has three statements specifically handling loops: while, do-while
and for.
Initially, let us see how we could handle a loop without using specific looping
instructions. Consider the following code fragment, which evaluates the factorial
by repetitive multiplication of a decrementing n.
factorial = 1;
LOOP: if (n>1) {factorial *= n--; goto LOOP;}
This uses the goto instruction, together with a label, to repeat the if test on each
pass of the loop body, in a similar fashion to an assembly language implementa-
tion (see Table 4.13).
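Wrapped in a function, the fragment can be exercised directly. The function name factorial_goto is hypothetical, and results are only valid while the product fits in an unsigned long (n up to 12 for 32 bits).

```c
/* Hypothetical wrapper around the goto-based loop. */
unsigned long factorial_goto(unsigned int n)
{
    unsigned long factorial = 1;

LOOP:
    if (n > 1) {factorial *= n--; goto LOOP;}   /* repeat the test each pass */

    return factorial;
}
```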
The goto instruction can be used to force an unconditional branch to a label
anywhere within a function. However, its use is frowned upon (it has been stated
that the quality of programmers is a decreasing function of the density of goto
instructions in the programs they produce [17]), as used without care it can lead
to spaghetti (unstructured) code. Nevertheless, its use is sometimes virtually
indispensable, particularly when trying to escape to the outside world from within
nested loops. Use with caution.
We have already met the while loop back in Table 8.1. Here the body — lines
11 to 14 — was repetitively executed as long as the test n>0 was True. On False
the code following the body, that is line 15, is entered.
Three elements present in any loop should be noted. Firstly, variables must be
set to their initial state before the loop proper is entered, see line 9 in Table 8.1.
Then, in a while construct, a test is made, as shown in Fig. 8.8(a). If the outcome
is True, the loop body is executed. Finally, some change must be made to the test
variables, so that this test will eventually have a False outcome, and execution will
go on to the next code section. Sometimes this change is explicit, as in line 13 of
Table 8.1, and sometimes implicit in the loop body.
Two keywords are used in conjunction with while. A break forces an im-
mediate exit from within the loop, usually on some exceptional situation. In
Table 8.10, this occurs if n>12, which is the error condition demanding a return
of zero. This is done by testing for greater than 12 and breaking if True in line 7.
In the case of a nested loop, breaking will move the execution only to the next
outer level.
The continue keyword forces an early repeat of the test by jumping over the
rest of the loop body. As an example, consider an array of signed elements (see
Section 9.2). The following code totalizes only array members that are positive:
sum = 0, x = 0;
while (x < MAX)
{
if (array[x] < 0) {x++; continue;} /* Skip a negative member, but still move on */
sum += array[x++];
}
Sometimes it is necessary to go through the loop body first before testing for
exit. This ensures that at least one pass will be performed irrespective of the
outcome of the test. The structure of this do-while (repeat-until) loop is shown
in Fig. 8.8(b). A break can be used in a similar way as in while, and continue
causes a drop down to the test (rather than upwards). The do-while loop is the
least used of the three kinds of C loop constructions.
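A typical use of the do-while loop is sketched below, counting the decimal digits in a number by repeated division by 10. The function name count_digits is hypothetical.

```c
/* Hypothetical helper: number of decimal digits needed to print binary. */
unsigned int count_digits(unsigned int binary)
{
    unsigned int digits = 0;

    do
    {
        digits++;               /* at least one pass, even for zero */
        binary /= 10;
    } while (binary > 0);       /* the test comes after the body    */

    return digits;
}
```

The guaranteed first pass is what makes zero come out correctly as a one-digit number; a plain while loop would return 0.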
The most versatile of the three is the for loop. This is similar to while but
combines the initialization, test and loop variable update as its arguments; thus:
for(expr1; expr2; expr3)
{loop body}
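As a sketch of this form, here is a for loop that totalizes only the non-negative members of an array. The function name and the value of MAX are hypothetical.

```c
#define MAX 5   /* hypothetical array length */

/* Hypothetical helper: total of the non-negative members of array. */
int sum_positive(const int array[MAX])
{
    int sum = 0;
    int x;

    for (x = 0; x < MAX; x++)
    {
        if (array[x] < 0) {continue;}   /* jumps to x++, and then to the test */
        sum += array[x];
    }
    return sum;
}
```

Because continue in a for loop passes through the update expression, the index still advances when a member is skipped.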
(b) Resulting 68020 MPU assembler code (64 bytes) with annotated comments.
References
[1] Richards, M.; BCPL : A Tool for Compiler Writing and Systems Programming, Proc.
AFIPS SJCC, 34, 1969, pp. 557 – 566.
[2] Richards, M.; The Typeless Survivor, .EXE (UK), 6, no. 6, Nov. 1991, pp. 74 – 81.
[3] Ritchie, D.M. and Thompson, K.; The UNIX Time-Sharing System, Bell System Technical
Journal, 57, no. 6, part 2, July/Aug. 1978, pp. 1905 – 1929.
[4] Johnson, S.C. and Kernighan, B.W.; The Programming Language B, Comp. Sci. Tech.
Ref., no. 8, Bell Laboratories, Jan. 1973.
[5] Ritchie, D.M. et al.; The C Programming Language, Bell System Technical Journal, 57,
no. 6, part 2, July/Aug. 1978, pp. 1991 – 2019.
[6] Collinson, P.; What Dennis Ritchie Says; Part 1, .EXE (UK), 5, no. 8, Feb. 1991, pp. 14 –
18.
[7] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, Prentice-Hall,
1978.
[8] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, 2nd. ed., Prentice-
Hall, 1988.
[9] Banahan, M.; The C Book, Addison-Wesley, 1988.
[10] Kelley, A. and Pohl, I.; A Book on C, Benjamin Cummings Publishing Co., 2nd. ed.,
1989.
[11] Gardner, J.; From C to C, Harcourt Brace Jovanovich/Academic Press, 1989.
[12] Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985; IEEE
Service Center, Publications Sales Dept., 445 Hoes Lane, POB 1331, Piscataway,
NJ 08855-1331, USA.
[13] Jaeschke, R.; The Proposed ANSI C Language Standard, Programmer's Journal, 5,
part 4, pp. 38 – 40, 1987.
[14] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, 2nd. ed., Prentice-
Hall, 1988, Section 8.3.
[15] Koenig, A.; C Traps and Pitfalls, Addison-Wesley, 1988, Section 1.1.
[16] Koenig, A.; C Traps and Pitfalls, Addison-Wesley, 1988, Section 2.4.
[17] Dijkstra, E.W.; Goto Statement Considered Harmful, Letters to the Editor, Communi-
cations of the ACM, March 1968, pp. 147 – 148.
CHAPTER 9
More Naked C
9.1 Functions
The function in C is the direct equivalent to the subroutine at assembly level,
and is directly translated as such. A function encapsulates an idea or algorithm
into a named structure. It can be used in an expression as a normal variable,
by naming it together with any parameters that are being passed. All functions,
except void, return a value as defined by the return instruction. This is the
value that is substituted for the function in the calling expression. For example,
if we have the function defined in Table 8.11, then the code fragment:
x = 4;
y = 1/factor(x);
will make y the reciprocal of 4!, that is 1/24. The value of factor(4) is of course 24,
as returned in line 9 of that table.
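The fragment can be tried out with the sketch below. Table 8.11 is not reproduced here, so factor() is a hypothetical stand-in written for this example, and reciprocal_factorial() is an assumed helper name. Note that if factor() returns an integer type, the expression 1/factor(x) is an integer division and truncates to zero; a floating-point numerator is needed to obtain 1/24.

```c
/* Hypothetical stand-in for the factorial function of Table 8.11. */
unsigned long factor(int n)
{
    unsigned long result = 1;

    while (n > 1) {
        result *= (unsigned long)n;
        n--;
    }
    return result;
}

/* Returns 1/n!. The numerator 1.0 forces a floating-point division;
   with an integer 1 the quotient would be truncated to 0. */
double reciprocal_factorial(int n)
{
    return 1.0 / factor(n);
}
```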
In this section, we specifically need to look at how functions are declared and
defined, how parameters are passed back and forth, and the scope of objects
declared inside and outside functions.
C programs are structured as a collection of external objects. These objects
are mainly global variables and functions. This is graphically shown in a much
simplified form in Fig. 9.1. The main function, conventionally called main(),
acts as a central spine calling up the various ancillary functions in the appropri-
ate order, usually with a minimum of processing itself. In a hosted environment,
main() interacts with the operating system, from which it can obtain and some-
times return information. In a naked environment it is normally entered via an
assembly-level startup routine, and frequently runs forever in an endless loop.
More details are given in Section 10.1.
Although main() is regarded as a little special in C, in reality the compiler
treats it in the same fashion as any other function. The layout of Fig. 9.1 shows
this, with main() being one of three functions in the figure. Each of these func-
tions must be defined. A function definition, typical examples of which are
shown in Tables 8.1 and 8.6 – 8.11, consists of a prototype heading followed by
local variable definitions and any called function declarations, and then by the
body of the function. This body is a series of any legal C statements enclosed
in braces. Unusually, these braces must be present even if the body comprises a
single statement; for example:
{
return (x*x);
}
is a legitimate function body (squaring x, which has been passed to the function).
The return instruction is the mechanism whereby the function is assigned a
value, as seen by the caller. If return is omitted, the function will still exit back
to the caller (i.e. there will be an RTS or equivalent at the end of the subroutine),
but the value seen by the caller will be undefined. Functions which do not return
a value, that is are void (e.g. Table 8.6), can either omit this statement or as a
matter of style include a null return;. Parentheses are optional around return's
expression, but are frequently used for clarity.
The function prototype at the head of the body simply names and indicates
the type of the function (i.e. its return value) and the types of any parameters
passed to the function. Doing this, we have for our squaring function definition:
int square(signed char x) /* Prototype head */
{ return (x*x); } /* Body */
(b) Resulting 68020 MPU assembly code (line numbers added for clarity).
The software implementation of Table 9.1(a) uses this algorithm, but recognizes
that overflow will occur for certain combinations of y and exp, and returns 0
for this situation. This is determined when the result of the kth multiplication is
less than the (k − 1)th. However, as y can be signed, it is the modulus of these
results that must be compared, rather than their actual value.
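Table 9.1(a) is not reproduced here, so the following is only a sketch of a function with the same interface and the same return-0-on-overflow behaviour. It does not compare the moduli of successive products as described above; instead it swaps in a different technique, holding the running product in a wider long long object and checking it against the limits of int after each multiplication. This assumes a C99 compiler in which long long is wider than int.

```c
#include <limits.h>

/* Returns y raised to the power exp, or 0 if the result cannot be
   held in an int. Sketch only: the overflow test uses a wider
   intermediate rather than the comparison used in Table 9.1(a). */
int power(signed char y, unsigned char exp)
{
    long long result = 1;        /* wide enough for one extra multiply */
    unsigned char k;

    for (k = 0; k < exp; k++) {
        result *= y;
        if (result > INT_MAX || result < INT_MIN) {
            return 0;            /* overflow of the int range */
        }
    }
    return (int)result;
}
```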
The program is structured as three functions. The obligatory main() is for
demonstration only, and terminates with the value of 25³. Of interest to us is the
declaration in line C6 of the function power(). This declaration is in prototype
form, where the function return type (int here) is followed by its name and the
type of the two objects to be passed (a signed char and unsigned char). For
clarity the formal parameters y and exp are used in the declaration. This is
optional, and the declaration:
int power(signed char, unsigned char);
is acceptable.
This line is a declaration, unlike lines C3 – 5 where variables are defined. The
difference is that a definition gives the properties of the referenced object as
well as reserving storage. On the other hand a declaration only makes known the
object's properties; no storage space is assigned.
A declaration function prototype is not mandatory, and the alternative:
int power();
(lines 10 – 12). Finally a JSR is made to _power: in the normal way to transfer to
the subroutine.
Several implementation-specific properties of this compiler (Whitesmiths V3.2
68020 C cross-compiler) cloud this issue. Firstly, all functions identified as
name() commence at _name: at assembly level. This compiler passes chars (and
shorts) promoted to ints, although they are treated according to their proper
type in the function (lines 27 and 41). This is probably to cope with old-style non-
prototype declarations, where parameters are promoted to int or double [1].
Finally, and rather obscurely, the compiler always puts the first variable passed
(the rightmost) away using the plain Address Register Indirect address mode,
whereas further variables (moving leftward) use the Address Register Indirect
with Pre-Decrement mode for a proper Push action. Thus we have:
Why is this? Well, the former is quicker, especially as the System stack does
not have to be restored to its original value (cleaned up) on return (e.g. line 14)
to compensate for its decrementation. This is useful, as the majority of function
calls only pass a single parameter, but of course whatever was on the System
stack before will be overwritten. This compiler gets around this either by having
created a frame which is overly large (see Fig. 9.2) or else when registers were
saved on the System stack after the frame was opened (line 4) a sacrificial register
was put away. In either case the stack content is irrelevant at the time when
the parameters are sent out, and can be overwritten. This compiler specifies
that registers D3, D4, D5, A3, A4, A5, A6 are not to be altered on return from a
function. Thus, as main() uses D5, it is saved in line 4 together with D0, whose
value on return is unspecified; that is the sacrificial register.
Compiler-specific details like these are irrelevant at the higher level. However,
they are important when mixing C and assembly-level subroutines, as described
in Section 10.1, and when debugging, see Chapter 15.
Some compilers pass one or more variables in a register. For example the Cos-
mic/Intermetrics V3.3 6809 C cross-compiler will normally put the first copied
value in Accumulator_D if char, short or int, and pass copies of any subsequent
variables through the System stack in the normal way. In either case the objects
are always known only to the called function, living either in the local frame or
register set. Fig. 9.2 illustrates the frame as seen by power(). Notice how the
passed variables are referenced above the local Frame Pointer A6, that is y at
A6'+11 and exp at A6'+15, whereas the internal variables are below at A6'-8 for
result and A6'-12 for old_result. Notice the sacrificial long word at the bot-
tom of the stack, which is overwritten in lines 31 and 34 by the single parameter
passed to abs().
Figure 9.2 The System stack as seen from within power(), lines 21 – 38.
The concept of scope as the lifetime of an object was introduced in Section 8.2.
Let us look at this in more detail. Objects defined inside braces are local to that
multiple statement only. Thus, in the code fragment:
function()
{
int i;
{
/* do lots of things with i */
}
{
int i;
/* do more things with a different i */
}
/* etc. */
}
The two i's are different, and the lower i is known only down to its local }.
Outside this redeclared area the first i is known.
Generally, variables are declared at the opening function brace and disappear
from view at the closing brace. When the function is re-entered, static vari-
ables (placed in absolute memory) will have kept their last value, whereas auto
variables (assigned space in a stack-based frame) have lost theirs.
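The difference in lifetime can be seen in the following sketch. The two function names are invented for this illustration; each counts its own calls, one with a static and one with an auto variable.

```c
/* The static variable lives in absolute memory and is initialized
   once only, so it remembers its value between calls. */
int count_calls_static(void)
{
    static int calls = 0;

    calls++;
    return calls;
}

/* The auto variable is created afresh in the stack frame on each
   entry, so the count never gets past 1. */
int count_calls_auto(void)
{
    int calls = 0;

    calls++;
    return calls;
}
```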
Unless a function is declared static [2], its identifier is broadcast as an exter-
nal object; that is it is declared public or global at assembly level, and as such is
known to the linker. In lines 56 – 58 of Table 9.1(b), the three labels _main, _abs
and _power are declared .GLOBL (similar to .PUBLIC), as would be expected. This
means that any other file which has been separately compiled for future linking
can use, say, function power(), as the assembly line JSR _power will be recognized
by the linker (see Section 7.2). However, the declaration:
extern int power(signed char, unsigned char);
must appear in this separate file either before the first function (in which case its
scope is the entire file) or else in functions which call it. This tells the compiler
that the function power() will be found elsewhere through the linker. The extern
qualifier in C will generate a .EXTERNAL or XREF directive at assembly level, for
example:
.external _power
will not generate assembler line 58, that is the identifier _power is not broadcast
as public.
The first defines an array named arr[], comprising 1024 consecutive long-
words in absolute memory. At assembly level the reservation will be labelled
something like _arr: .double[1024]. Thus arr is actually the address of the
first element of the array (e.g. see Table 9.5(b), _Array:).
The second definition reserves 256 units in the frame. Although these loca-
tions are in relative memory, the root name fred in the C source still refers to
the address of element fred[0].
The final statement defines ten consecutive bytes of constants, beginning at
_table_7:, which are in absolute memory, probably destined for ROM. In prac-
tice, this is useless as it stands as an initial value must be given as part of its
definition, otherwise it will be filled with zeros by the compiler — in the normal
way for static objects — which will generate an assembly-level line something
like:
In the latter case the dimension of the array was not given, the compiler tak-
ing it as the number of initializers (i.e. eleven); and array elements square[0]
to square[10] will have the values shown. The dimension n, specified either
explicitly or implicitly in a definition, is the number of elements. However, as
element 0 is the first, the final element is n − 1. Thus the first example really
means factor[0] = 0, factor[1] = 1, factor[2] = 4, factor[3] = 9, and
factor[4] = 16. There is no factor[5]! The compiler will not warn you of er-
rors like this, and using undefined elements is a fruitful source of obscure errors.
If the explicit array size parameter is greater than the number of initializers, all
unspecified elements are set to zero. Old C did not permit auto array initialization
at all; ANSI C does, and zero-fills the unspecified elements of a partially
initialized auto array in the same way.
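These rules can be checked with the sketch below. The initializer values and the names padded[], square_count(), square_of() and padded_at() are assumptions made for the illustration. As the compiler itself does not check array bounds, the access functions test the index explicitly.

```c
/* Dimension taken from the number of initializers: eleven elements,
   square[0] to square[10]. Values are illustrative. */
static const unsigned char square[] =
    {0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100};

/* Explicit dimension larger than the initializer list: elements
   padded[3] to padded[7] are set to zero. */
static const int padded[8] = {1, 2, 3};

/* Number of elements in square[], evaluated at compile time. */
unsigned int square_count(void)
{
    return (unsigned int)(sizeof square / sizeof square[0]);
}

/* Bounds-checked look-up: returns -1 rather than read past the end. */
int square_of(unsigned int n)
{
    if (n >= square_count()) {
        return -1;
    }
    return square[n];
}

/* Bounds-checked read of padded[]; returns -1 for a bad index. */
int padded_at(unsigned int i)
{
    if (i >= sizeof padded / sizeof padded[0]) {
        return -1;
    }
    return padded[i];
}
```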
At any time an element m of an array can be referred to by following the root
name by [m]. The resulting object can then be treated like any other C object
of the same type. As an example, consider the following code fragment which
applies the 3-point low-pass filter transformation, defined on page 110, to an
array of 256 elements:
.text
.even
L5_array: .long 1 ; array[0]
.long 1 ; array[1] etc.
.long 2
.long 6
.long 24
.long 120
.long 720
.long 5040
.long 40320
.long 362880
.long 3628800
.long 39916800
.long 479001600 ; array[12]
* 1 unsigned long factor(int n)
* 2 {
.even
_factor: link a6,#-4
* 3 static unsigned const long array[13] =
{1,1,2,6,24,120,720,5040,40320,362880,3628800,39916800,479001600};
* 4 if(n>12) {return 0;}
cmpi.l #12,8(a6) ; Compare n living in [A6]+8 with 12
ble.s L1 ; IF lower or equal THEN continue
moveq.l #0,d7 ; ELSE exit with 0 in D7.L
unlk a6
rts
* 5 return(array[n]);
L1: move.l 8(a6),d7 ; Get n
asl.l #2,d7 ; Multiply by 4 to match array element size-long
move.l d7,a1 ; and put in A1.l
adda.l #L5_array,a1 ; Add it to the address of array[0]
move.l (a1),d7 ; which then points to array[n]. Get it
unlk a6 ; and return
rts
.globl _factor
* 6 }
A look-up table is a synonym for an array, usually, but not always, of constants.
Table 9.2 shows this technique used to generate our old friend, the factorial.
Here an array of 13 elements holds the equivalents of 0! to 12!. The array, or
table, is declared in line C3 as static (i.e. stored in absolute memory) and const,
together with the appropriate values. The const qualifier forces the compiler to
place these values in the text program section along with the program, both of
which will be in ROM in an embedded implementation. This storage is simply
thirteen sequential long-words beginning at L5_array:, in the same manner as
in Fig. 9.3(b).
The resulting assembly-level program itself uses the array index n multiplied
by four (to match the element size) to give the offset from the base address.
Putting this in an Address register (A1) and adding the base address L5_array
to it gives the position of array[n]. This is then transferred to D7.L for return.
Multi-dimensional arrays can be implemented in C, although their use is rare.
Some example definitions are:
caller()
-------------------------
-------------------------
--------------------------
--------------------------
This would result in the first n elements of array block_y[] being physically
replaced by the first n elements of array block_x[].
The code itself, reproduced in Table 9.3(b), shows that the address of array2[]'s
base was passed in a long-word 12 – 15 bytes above the Frame Pointer (A6) and
that of array1[] in locations 8 – 11 bytes above. Parameter length is passed
by copy in [A6]+16/19. Array element i's address is calculated as i (stored in
D5.L) plus the relevant base address. This is a pity, as the sequential nature of the
process would suit the use of the Address Register Indirect with Post-Increment
mode to creep up (walk) through the block, such as in lines 23 and 25 of Table 5.7.
However, the code size of 40 bytes compares well with that of 39 in the
hand-assembled version.
Notice how an array is denoted in the function prototype by the root name and
empty square brackets, for instance array1[]. A size is not necessary, although
it can be added for clarity if desired. Multi-dimensional arrays must give the
size of each dimension, except the leftmost. As we have seen, this is in order to
calculate the address of any element. Care must be taken, as the compiler does
not check for overrun; thus reference to array[9] in an array defined to have
four elements is accepted, and the contents of memory location array + 9 × w
actually fetched or, worse still, changed!
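Table 9.3(a) is not reproduced here, so the following is a sketch of a block copy function of the kind described, taking the source array first, the destination second and then the length. The second function, block_copy_checked(), is a hypothetical addition showing one way in which the caller can supply the array sizes so that an overrun is refused rather than silently carried out.

```c
/* Copy the first length elements of array1[] into array2[].
   No bounds are checked, exactly as the compiler does none. */
void block_copy(const char array1[], char array2[], int length)
{
    int i;

    for (i = 0; i < length; i++) {
        array2[i] = array1[i];
    }
}

/* Hypothetical guarded version: the sizes of both arrays are passed
   in, and the copy is refused (returning -1) if it would overrun. */
int block_copy_checked(const char src[], int src_size,
                       char dst[], int dst_size, int length)
{
    if (length < 0 || length > src_size || length > dst_size) {
        return -1;
    }
    block_copy(src, dst, length);
    return 0;
}
```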
What if we wanted to copy the contents of a ROM into a RAM, as was the case in
Table 5.7? The function will be exactly the same, but this time we must pass the
start address of the two chips. We have seen that we can determine the address
of an array by just referring to its root name; can we extend the principle? The
affirmative answer to this leads us to one of C's strengths, the use of pointers.
A pointer is a constant or variable object holding information relating to where
another object is stored. In MPUs with linear addressing techniques, such as most
8-bit devices and the 68000 family, this is just the absolute address. In segmented
architectures, as exhibited by the 8086 family, typically near and far pointers
exist; the former holding the address within the current segment (usually two
bytes) and the latter the segment:address (usually four bytes).
Pointers may be taken of any C object, except a register type, by using the
address-of unary operator & (see Table 8.4). For example, if a variable x exists,
then its address can be assigned to the pointer variable ptr thus:
ptr = &x;
y = *ptr;
which reads from left to right, y is assigned the contents of address ptr.
We now have the problem of what value will a pointer have if the pointed-to
object is bigger than one byte, say a 4-byte long-word variable or 100-word array?
And how would the construction ptr+1 be interpreted? In assembly/machine
code, the address is normally the lowest byte address of the object, for example:
and C uses the same convention. Thus, the value of a pointer to the 100 short-element
array (&ar[0]) stored in memory at 0xC000 to 0xC0C7 (200 bytes, taking a short as
two bytes) will simply be 0xC000. In the case of an array, we can use the root
name as the base address; hence:
ptr = &ar[0];
ptr = ar;
are the same, and ptr will be a pointer to whatever kind of object the array
comprises, provided of course that it has been previously defined as such.
Although the storage size of the base address is fixed, and is independent
of the object, pointers do take on the type of their referred-to object. Thus, for
example, we can have pointer-to-int and pointer-to-float entities. This, and
other properties, are bequeathed to the pointer variable during its definition. In
common with any other object, all pointer variables must be defined before use.
Some examples are:
char * port;
float * pvar1, * pvar2;
void * point;
In the first instance, the pointer variable port is brought into being and de-
clared to be addressing a char object. This can be read as `the contents of port
is a char'. Alternatively, the * indirection operator can be transcribed as `pointer
to', if read right to left; that is `port is pointer-to a char object'. The second ex-
ample creates two pointer-to float objects, namely pvar1 and pvar2. This reads
(from right to left) pvar1 is a pointer-to a float, pvar2 is a pointer-to a float.
The final definition is rather enigmatic, as it appears to be saying that point is
a pointer-to nothing! A pointer to void type is treated as a generic pointer (pure
address) and can be cast to any real type and back without any loss of integrity.
The concept of different types of pointers is important when dealing with
pointer arithmetic. Pointers can be incremented or decremented, can have integer
constants added to or subtracted from them, and can be subtracted from or compared
with pointers of the same kind. Consider a pointer pvar having a value at some instant of 0xC000,
then:
pvar += 4;
will have what value? If pvar is a pointer-to char (or void) then 0xC004 will be
the answer, but if pvar is a pointer-to long (or float) then 0xC010 is the answer.
Thus in pointer arithmetic, constants indicate objects, and are multiplied by their
sizes for the purposes of any calculation.
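The scaling can be measured directly, as in the sketch below, by converting both pointers to pointer-to char and subtracting. The function names are invented for the illustration. On a 68000-family compiler with a 4-byte long the second function gives 16, matching the 0xC010 above; on a host where long is eight bytes it gives 32.

```c
#include <stddef.h>

/* Distance in bytes moved by adding 4 to a pointer-to char. */
size_t char_step_bytes(void)
{
    char buf[8];
    char *pvar = buf;
    char *moved = pvar + 4;

    return (size_t)(moved - pvar);
}

/* Distance in bytes moved by adding 4 to a pointer-to long. */
size_t long_step_bytes(void)
{
    long buf[8];
    long *pvar = buf;
    long *moved = pvar + 4;

    /* Cast to char* so that the difference is counted in bytes. */
    return (size_t)((char *)moved - (char *)pvar);
}
```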
A more sophisticated example of pointer arithmetic is given by the example:
statement calendar[10] + 1;? The result of the addition will be the address
of December, that is 31 chars on, or calendar[11]. Thus, calendar[10] is a
pointer-to array-of-31-chars type! If a pointer variable is to be assigned to such
a type, then it must be defined accordingly, for example:
char (* month)[31]; /* Declare month as pointer-to-array-of 31 chars */
month = calendar[0] + i;
The complex definition of the pointer month reads from inside the parentheses
going left and then right: month is a pointer-to / an array of 31 / chars. Paren-
theses must be used, as [] is of higher precedence than *. Pointer variable month
can then participate in pointer arithmetic where the other pointer variables are
of the same type.
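A short sketch of arithmetic on a pointer-to-array follows. It assumes, purely for illustration, that calendar is a 12 by 31 array of chars, as the definition is not reproduced here; the function names are also invented. In this sketch the row pointer is formed from the array name calendar, which is treated as the address of its first 31-char row, so adding one to it moves on by a whole month.

```c
#include <stddef.h>

static char calendar[12][31];   /* assumed layout: 12 months of 31 days */

/* Distance in bytes between month and month + 1. */
size_t month_step_bytes(void)
{
    char (*month)[31] = calendar;    /* points at the January row */
    char (*next)[31] = month + 1;    /* one whole row further on */

    return (size_t)((char *)next - (char *)month);
}

/* Address of day d (0 to 30) of month m (0 to 11). */
char *day_slot(int m, int d)
{
    char (*month)[31] = calendar + m;

    return &(*month)[d];
}
```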
The pointer variable itself may be given properties, such as being const or
static (i.e. stored in absolute memory locations, see Table 14.8), for example:
static unsigned char * const port;
which says that port is a const pointer-to an unsigned char object and is stored
statically. Besides giving port its arithmetic properties, the compiler has been
told to store it in absolute memory and flag any attempt to change it (typically
placed in ROM in an embedded system).
In dealing with the interface to hardware, the software designer must be able
to direct data to and from ports at known addresses. Thus in C, we must be able
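to assign specific addresses to pointers. In ANSI C a pointer can only be assigned
a value if their types match; thus the statement:
char * port = 0x9000; /* !!!!!!!!!!! Incorrect */
is rejected, as 0x9000 is an integer constant and not a pointer. A cast is needed:
char * port = (char *)0x9000;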
which now states that port is a pointer-to a char type variable, and its value is
0x9000. The cast reads (right to left) pointer-to-char type.
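The sketch below wraps such an access in two small functions, port_write() and port_read(), whose names are assumptions made for this example. The pointer is qualified volatile so that the compiler does not optimize away an access to a hardware register.

```c
/* Write one byte through a fixed pointer to a memory-mapped port. */
void port_write(volatile unsigned char * const port, unsigned char value)
{
    *port = value;
}

/* Read one byte back through a fixed pointer to a port. */
unsigned char port_read(const volatile unsigned char * const port)
{
    return *port;
}
```

On the target the call would be of the form port_write((volatile unsigned char *)0x9000, 0x55);. On a host machine, where address 0x9000 is unlikely to be accessible, the address of an ordinary unsigned char variable can be passed instead to exercise the code.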
On this basis, if we wished to call up the function of Table 9.3 to copy a ROM
starting from E000h to RAM starting at 2000h of length 1000h bytes, then the
caller would include the following:
call() /* The caller function */
-----------------------
-----------------------
char * const ROM_start = (char*)0xE000; /* ROM_start is a pointer */
char * const RAM_start = (char*)0x2000; /* as is RAM_start */
-----------------------
-----------------------
block_copy(ROM_start, RAM_start, 0x1000); /* Invoke the copy function */
-----------------------
or even
call()
-----------------------
-----------------------
block_copy((char*)0xE000, (char*)0x2000, 0x1000);
-----------------------
This prototype identifies number as an unsigned char (i.e. byte) and the address
variable port as a const (fixed) pointer to an unsigned char object. The func-
tion is declared as returning int, as it is proposed to return −1 to indicate an
error (defined as n > 9), otherwise 0. Based on this declaration, to display 7 at
the digit located at 0x9000, we would use the call:
Notice the cast (unsigned char*), which is necessary to match the constant
address to the same pointer-to type as port.
The program given in Table 9.4 is straightforward. The code conversion is
done by means of a look-up table, as described in Table 9.2. The extracted value
is moved to port in line C5, by the simple expedient of saying that the contents
of port are the nth entry of the table. Notice that in line C3 the table is an array
of statically stored constant characters (i.e. bytes), and has thus been assigned to
the _text program section. In an embedded system this will be in ROM. Objects
which are statically stored are considered to have been given their initial values
at compile time (i.e. during loading), not run time. At assembly level this appears
as .BYTE (or equivalent) directives. The const qualifier ensures that an attempt
to modify the table values will be flagged as erroneous, which is sensible if the
table is in ROM! Thus these initial values are the only values that the table will ever
have. The compiler evaluates the size of the array from the number of initializers.
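As Table 9.4 is not reproduced here, the following is only a sketch of a function of this kind. The name display_digit() is invented, and the segment codes are the commonly used active-high values with bit 0 as segment a; they are not the table values of Table 9.4.

```c
/* Illustrative 7-segment codes for 0 to 9 (bit 0 = segment a up to
   bit 6 = segment g, active high). NOT the values of Table 9.4. */
static const unsigned char table_7[] =
    {0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F};

/* Send the code for number to the display at port.
   Returns -1 if number is greater than 9, otherwise 0. */
int display_digit(unsigned char number, volatile unsigned char * const port)
{
    if (number > 9) {
        return -1;               /* nothing is written on error */
    }
    *port = table_7[number];     /* contents of port = nth entry */
    return 0;
}
```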
The example of Table 9.4 shows that it is as easy to pass a pointer through to
a function as a copy of an object. This can be exploited to change an object itself
in a function, rather than the copy. Just send a reference to the object and not
the copy (e.g. &x, not x). The contents of that object can then be changed at will;
thus (rather trivially) to add ten to an int object x we have:
add_ten(&x);
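A minimal sketch of such a function is given here, assuming that the parameter is a pointer-to int named pvar:

```c
/* Add ten to the int object whose address is passed in pvar.
   The caller's own object is altered, not a copy of it. */
void add_ten(int *pvar)
{
    *pvar = *pvar + 10;
}
```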
The variable *pvar (i.e. x) can be manipulated in exactly the same way as any
`ordinary' variable. The Contents-Of operator *, in common with all other unary
operators, has a high precedence and thus parentheses need rarely be used; for
example:
z = *(pvar1)/5 + *(pvar2)*7;
z = *pvar1/5 + *pvar2*7;
are equivalent. However, care must be taken when the ++ and -- unary Incre-
ment and Decrement operators are used, as these have the same precedence. As
unaries read from right to left, we have as an example:
x = *pvar++; /* Take contents of pvar, then increment the pointer */
x = (*pvar)++; /* Take contents of pvar, then increment those contents */
x = ++*pvar; /* Increment contents of pvar, then take the new value */
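The three forms can be told apart with the sketch below; the function names are invented for the illustration.

```c
/* Sum n ints, walking through them with *pvar++: each element is
   read first and the pointer is then moved on to the next one. */
int sum_array(const int *pvar, int n)
{
    int sum = 0;

    while (n-- > 0) {
        sum += *pvar++;
    }
    return sum;
}

/* (*pvar)++ yields the old contents; the object is then incremented. */
int post_increment_contents(int *pvar)
{
    return (*pvar)++;
}

/* ++*pvar increments the object first and yields the new contents. */
int pre_increment_contents(int *pvar)
{
    return ++*pvar;
}
```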
Pointers to a function (i.e. where the function begins) are also possible (see also
Section 10.1), thus:
int (*fred)(parameter list); /* fred is a pointer to a function returning int */
Hence, it is possible to store a table of pointers to functions (see page 281), fre-
quently seen in assembly-level programs as jump tables [3]. Pointers to pointers
can be defined ad infinitum, if rarely used.
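A jump table of this kind might look like the following sketch, in which all of the names are invented for the illustration. The table is const, so that in an embedded system it can be placed in ROM, and the index is checked before use as the compiler will not do so.

```c
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* Jump table: a constant array of pointers to functions, each of
   which takes two ints and returns an int. */
static int (* const op_table[])(int, int) = {op_add, op_sub, op_mul};

/* Call the function selected by index; returns 0 for a bad index. */
int dispatch(unsigned int index, int a, int b)
{
    if (index >= sizeof op_table / sizeof op_table[0]) {
        return 0;
    }
    return op_table[index](a, b);
}
```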
We introduced pointers by noting that arrays were handled using addresses,
especially when being passed to functions. Thus, by inference it is possible to
use pointer rather than array notation in such functions. We did this as one way
of clearing a 100-element array. Another example is given in lines C3 and C5 of
Table 9.4, which can be replaced by:
unsigned const char * table_7 = {0x20, ......, 0x10}; /* 7-segment table */
*port = *(table_7 + n); /* Send out nth entry */
9.3 Structures
We have seen that arrays are data structures grouping many objects having the
same type under a single name. Many real situations require organizations of
objects having many different properties, but coming under the same banner. As
an example, consider a monitoring system in a hospital ward containing up to ten
patients. Treating each patient (rather unfeelingly) as a composite object, then a
record would contain data such as the hospital number, age, date and an array of
physiological readings, such as heart rate, temperature, blood pressure etc. This
would be continuously gathered, and perhaps once an hour transferred to a file
on magnetic disk for later analysis. These ten objects could be defined in C as:
Table 9.5: Displaying and updating heart rate (continued next page).
short Array[256]; /* Global array of 256 words */
/* The background routine is defined here */
void display(void)
{
register unsigned char x_coord; /* The x co-ordinate */
char *const dac_x = (char*)0x6000; /* 8-bit X-axis D/A converter */
short *const dac_y = (short*)0x6002; /* 12-bit Y-axis D/A converter */
while(1) /* Do forever */
{
*dac_y = Array[x_coord]; /* Get array[x] to Y plates */
*dac_x = x_coord++; /* Send X co-ordinate to X plates */
}
}
void update(void)
{
static unsigned short last_time; /* The last counter reading */
static unsigned char update_i; /* The array update index */
short * const counter = (short*)0x9000; /* The counter is at 9000/1h */
char * const int_flag = (char*)0x9080; /* The external interrupt flag */
*int_flag = 0; /* Reset external interrupt flag */
Array[update_i++] = *counter-last_time; /* Difference is new array value */
last_time = *counter; /* Last reading is updated */
}
struct med_record
{
unsigned long hosp_numb;
unsigned char age;
unsigned long time;
unsigned char day;
unsigned char month;
unsigned short year;
unsigned short array[256];
} patient[10];
For example, the statement:
patient[6].month = 3;
makes the object month inside the structure named patient[6] equal to three.
The tag med_record is optional, and is the name of the structure template.
Objects can be given this template any time later, for example dog_1 and dog_2
may be defined as:
struct med_record dog_1, dog_2;
Thus only a template (which does not cause storage to be allocated) can be de-
clared, and definitions can occur at any following point, for example within other
functions.
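The separation of template and definition can be sketched as follows. The template is the one given above; the access functions set_month() and get_month() are invented for the illustration.

```c
/* Template only: no storage is allocated by this declaration. */
struct med_record
{
    unsigned long hosp_numb;
    unsigned char age;
    unsigned long time;
    unsigned char day;
    unsigned char month;
    unsigned short year;
    unsigned short array[256];
};

/* Definition: storage for ten records is reserved here. */
static struct med_record patient[10];

/* Set the month element of the record for one bed (0 to 9). */
void set_month(unsigned int bed, unsigned char month)
{
    if (bed < 10) {
        patient[bed].month = month;
    }
}

/* Read the month element back; returns 0 for a bad bed number. */
unsigned char get_month(unsigned int bed)
{
    return (bed < 10) ? patient[bed].month : 0;
}
```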
Taking an example closer to the theme of this book, consider a compound
peripheral interface such as the 6821 PIA [4]. We have already seen how this
can be interfaced to a MPU in Figs 1.9 and 3.14; here we look at the internal
register structure as described in Fig. 9.5. There are six programmer-accessible
8-bit registers living in an address space of four bytes, as determined by the state
of the Register Select bits RS0 RS1.
Sharing a slot are the Data Direction and Data I/O registers. Which of the pair
is actually connected to the data bus when addressed is determined by the state
of bit-2 of the associated Control register. Each of the eight I/O bits may be set
to in or out, as defined by the corresponding bits in the Data Direction register;
for instance if ddr_a is 00001111b, then Data register_A has its upper half set as
input and lower half as output. Once a Direction register has been set up, then its
slot can be changed to the I/O port, by setting the appropriate Control register's
bit-2 high.
Each of the six component parts of a PIA can be defined as a pointer, in the
way described in the last section, and treated in the normal way. However, if
there are several PIAs in the system, then a template for this device as a single
compound object can be made and used for each physical port of this kind.
Lines C1 – C9 of Table 9.6(a) declare a template describing the PIA as a struc-
ture of pointers, thus each register is characterized as an address. Two PIAs are
defined based on this declaration, port0 and port1 in lines C12 and C13. Some
of the registers are qualified as pointer-to volatile, as bits read from the outside
world will change independently of the software. I have declared these structures
to be const and stored in absolute memory, that is static. This means that the
structure elements, which here are constant addresses, will be stored in ROM
along with the program (assembly lines 3 and 5 in Table 9.6(b)) and any attempt
to change these pointers will be flagged by the compiler as an error. Such struc-
tures are initialized in the same way as a comparable array. Notice in lines C12
and C13 how the casts are the same as in the template definition.
Functions can take structures as parameters and return them. In both cases
the structure name alone is sufficient; for example in line C15 the passed param-
eter is port0, and this causes copies of all six elements to be pushed into the
stack prior to the Jump (assembly lines 7 – 10).
Pass by copy is the same technique as used for ordinary single objects, and,
as such, the elements themselves cannot be altered by the function. In the situ-
ation depicted in Table 9.6, the structure elements are pointers, so although we
cannot alter their copies in function initialize(), we can alter the pointed-to
variable (i.e. PIA registers) through them. Thus, line C25 means that the contents
of structure type PIA named port element control_a is assigned to zero, that
is * port.control_a = 0;. As port is an element by element copy of struc-
ture type PIA named port0 (if called up from line C15), the contents of port0's
control_a register are affected.
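Table 9.6(a) is not reproduced in full here, so the following is a sketch only. The element names and their pointer types are inferred from the casts in lines C12 and C13, and the body of initialize() is an assumption based on its stated purpose of setting up side A as an input and side B as an output port.

```c
/* Template describing a PIA as a structure of six pointers. */
struct PIA
{
    volatile unsigned char *data_a;
    unsigned char *ddr_a;
    volatile unsigned char *control_a;
    volatile unsigned char *data_b;
    unsigned char *ddr_b;
    volatile unsigned char *control_b;
};

/* The structure arrives as a copy, but its elements are pointers,
   so the registers themselves are reached through them. */
void initialize(struct PIA port)
{
    *port.control_a = 0;       /* bit-2 low: shared slot is ddr_a */
    *port.ddr_a = 0x00;        /* all eight A lines are inputs */
    *port.control_a = 0x04;    /* bit-2 high: shared slot is data_a */
    *port.control_b = 0;       /* bit-2 low: shared slot is ddr_b */
    *port.ddr_b = 0xFF;        /* all eight B lines are outputs */
    *port.control_b = 0x04;    /* bit-2 high: shared slot is data_b */
}
```

On a host machine the function can be exercised by pointing the six elements at a 4-byte array standing in for the PIA's address space, with the Data and Data Direction elements of each side sharing a byte.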
Strangely, structure objects are passed by copy, whereas the equivalent pro-
cess with arrays causes a pointer to the array to be passed. This latter is much
more efficient, as only one object (the pointer) has to be pushed on to the stack
prior to the Jump to Subroutine, irrespective of the size of the array. Structures
* 10 main()
* 11 {
* 12 static const struct PIA port0 = {(unsigned volatile char*)0x8000,
(unsigned char*)0x8000, (unsigned volatile char*)0x8001,
(unsigned volatile char*)0x8002, (unsigned char*)0x8002,
(unsigned volatile char*)0x8003};
* 13 static const struct PIA port1 = {(unsigned volatile char*)0x8020,
(unsigned char*)0x8020, (unsigned volatile char*)0x8021,
(unsigned volatile char*)0x8022, (unsigned char*)0x8022,
(unsigned volatile char*)0x8023};
* 14 void initialize(struct PIA);
* 15 initialize(port0);
6: .even
7: _main: adda.l #-24,sp ; Prepare to push 24 bytes
8: move.l #L5_port0,-(sp) ; i.e. the six pointers of port0
9: move.l #24,d0 ; out onto the System stack
10: jsr a~pushstr ; Using this library subroutine
11: jsr _initialize
12: lea 24(sp),sp ; Restore the Stack Pointer
* 16 initialize(port1);
13: adda.l #-24,sp ; Repeat above to pass struct PIA
14: move.l #L51_port1,-(sp) ; port1 to initialize()
15: move.l #24,d0
16: jsr a~pushstr
17: jsr _initialize
18: lea 24(sp),sp
* 17 /* Main body sends out of port1's B reg the sum of port0 & 1's A reg */
* 18 *(port1.data_b) = *(port0.data_a) + *(port1.data_a);
19: movea.l L51_port1+12,a1 ; Point A1 to port1's data_b reg
20: movea.l L5_port0,a2 ; Point A2 to port0's data_a reg
21: moveq.l #0,d7
22: move.b (a2),d7 ; Get port0's data_a byte
23: movea.l L51_port1,a2 ; Point A2 to port1's data_a reg
24: moveq.l #0,d6
25: move.b (a2),d6 ; Get port1's data_a byte
26: add.l d6,d7 ; Add them
27: move.b d7,(a1) ; and send to port1's data_b reg
28: rts
* 19 }
* 20 /* Function sets up a PIA as a simple input A and output B port */
are generally smaller than arrays are likely to be, and presumably for this reason
the less efficient pass-by-value copy technique is used. In Table 9.6(b) this is done
by moving the System Stack Pointer down 24 bytes (6 × 4-byte pointers), pushing
the base address of the structure on to the System stack, loading the byte size
into D0.L, and calling the machine library subroutine a~pushstr (see Section 9.4)
to do the moving (lines 7 – 12 and 13 – 18).
Just like a simple object, a structure's address can be passed instead. This is
the more efficient method of passing a structure to a function. Thus to pass the
medical record of patient[6] to a function store(), which will store it on disk,
we could use the calling statement:
store(&patient[6]);
where patient[6] is the name of the structure and &patient[6] its address.
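The difference between the two calling conventions can be tried on a host compiler. The sketch below is illustrative only: the record layout and the function names rename_by_value() and rename_by_address() are hypothetical, as the text does not give the layout of the patient record.

```c
#include <string.h>

/* Hypothetical medical record; the real layout is not given in the text */
struct record {
    char name[20];
    unsigned short age;
    unsigned long id;
};

/* Pass by copy: the callee works on its own copy of the whole structure */
void rename_by_value(struct record r, const char *new_name)
{
    strncpy(r.name, new_name, sizeof r.name - 1);
    r.name[sizeof r.name - 1] = '\0';     /* Change is lost on return */
}

/* Pass by address: one pointer is passed and the caller's object is altered */
void rename_by_address(struct record *r, const char *new_name)
{
    strncpy(r->name, new_name, sizeof r->name - 1);
    r->name[sizeof r->name - 1] = '\0';
}
```

Calling rename_by_value(patient[6], "Jones") leaves patient[6] untouched, whereas rename_by_address(&patient[6], "Jones") changes the caller's record, and only a single pointer crosses the stack.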
The sizeof operator will give the size of the whole structure. This may be
greater than the total of the individual elements, as some machines enforce stor-
age boundaries, which effectively pads out elements with holes. For example in
the 68000 family, non-byte objects normally begin at even addresses. An exam-
ple of this is shown in the use of the .EVEN assembler directive in lines 2 and 4 of
Table 9.6. The & operator (i.e. address-of) can also be used to generate a pointer
C10: main()
C11: {
C12: static const struct PIA port0 = {(unsigned volatile char*)0x8000,
(unsigned char*)0x8000, (unsigned volatile char*)0x8001,
(unsigned volatile char*)0x8002, (unsigned char*)0x8002,
(unsigned volatile char*)0x8003};
C14: void initialize(struct PIA *);/* Declare ftn accepting a ptr to struct */
C15: initialize(&port0);
C16: initialize(&port1);
C17: /* Main body sends out of port1's B reg the sum of port0 & port1's A reg */
states that the address of port0, previously used to name a structure of type PIA,
is to be 8000h. As in previous pointer assignments, we must use a cast to convert
the constant to the appropriate type, which in this case is pointer-to struct PIA.
This gives us a major headache, as the Data and Data Direction registers share
the same address, so our six structure members cannot all have unique addresses.
The way around this problem is to use a union. A union is declared and initial-
ized in the same way as a structure, but all union members occupy the same
C1: union share /* Template for shared DDR and Data registers */
C2: {
C3: unsigned char ddr;
C4: unsigned volatile char data;
C5: };
C6: struct PIA /* Template for PIA */
C7: {
C8: union share a; /* Shared registers A side */
C9: unsigned volatile char control_a;
C10: union share b; /* Shared registers B side */
C11: unsigned volatile char control_b;
C12: };
C13: main()
C14: {
C15: struct PIA *pntr_2_port0 = (struct PIA *)0x8000;/* port0's base @ 8000h*/
C16: struct PIA *pntr_2_port1 = (struct PIA *)0x8020;/* port1's base @ 8020h*/
C17: void initialize(struct PIA *);/*Decl ftn taking ptr to struct type PIA */
C18: initialize(pntr_2_port0);
C19: initialize(pntr_2_port1);
C20: /* Main body sends out of port1's B reg the sum of port0 & port1's A reg */
C21: pntr_2_port1->b.data = pntr_2_port0->a.data + pntr_2_port1->a.data;
C22: }
C23: /* Function sets up a PIA as a simple input A and output B port */
that both the -> and . operators have the same precedence, and associate left
to right, so parentheses are not required.
In Table 9.8, I have not defined the structure as being static or const, as
opposed to Tables 9.6 and 9.7. This leads to the structure addresses being stored
in the frame (assembly lines 4 and 5) at run time rather than being in absolute
memory at load time (static). The qualifier const would not change this, but
would produce a warning if the program tried to meddle with these addresses.
Compiling this source produced 130 bytes of machine code.
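The overlay of the Data and Data Direction registers can be exercised on a host by pointing the same templates at an ordinary object instead of the absolute address 8000h. This is a minimal sketch, not the listing of Table 9.8: the function name pia_setup() is hypothetical, and the control values assume the usual 6821 convention that control bit 2 selects between the DDR (0) and the Data register (1).

```c
union share {                          /* DDR and Data register share one address */
    unsigned char ddr;
    volatile unsigned char data;
};

struct PIA {
    union share a;                     /* A side: DDR/Data */
    volatile unsigned char control_a;
    union share b;                     /* B side: DDR/Data */
    volatile unsigned char control_b;
};

/* Assumed 6821 convention: control bit 2 = 0 selects DDR, 1 selects Data */
void pia_setup(struct PIA *port)
{
    port->control_a = 0x00;            /* Select DDR A */
    port->a.ddr     = 0x00;            /* All A lines are inputs */
    port->control_a = 0x04;            /* Switch to Data register A */
    port->control_b = 0x00;            /* Select DDR B */
    port->b.ddr     = 0xFF;            /* All B lines are outputs */
    port->control_b = 0x04;            /* Switch to Data register B */
}
```

On the target the argument would be (struct PIA *)0x8000; on a host, passing the address of a local struct PIA lets the access pattern be checked without hardware.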
Although the procedure outlined in Table 9.8 seems best, there can be problems.
The resultant assembly code has located the four elements at Base (line 26),
Base+1 (line 24), Base+2 (line 32) and Base+3 (line 34), that is at sequential
addresses. However, many compilers will pad out elements to begin at even
addresses. Indeed the circuit of Fig. 3.14 shows the PIA elements located at four
sequential even addresses (eight bytes), as address line a0 is not provided by the
68000 MPU. Most compilers permit various alternative storage configurations for
structures and, in collusion with the hardware engineer, a suitable scheme can
be devised. Nevertheless, letting the hardware circuitry intrude on software
matters in this way leads to portability problems if the circuitry and/or
processor is changed.
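One way to catch a mismatch between the compiler's storage layout and the circuit is to check the member offsets before the structure is trusted. The sketch below assumes a hypothetical even-address arrangement of the kind described for Fig. 3.14, using explicit padding members; the names PIA_even and pia_layout_ok() are illustrative and do not come from the text.

```c
#include <stddef.h>

/* Hypothetical template: each register is followed by an unused odd byte */
struct PIA_even {
    volatile unsigned char data_a;    unsigned char pad0;   /* Base+0 */
    volatile unsigned char control_a; unsigned char pad1;   /* Base+2 */
    volatile unsigned char data_b;    unsigned char pad2;   /* Base+4 */
    volatile unsigned char control_b; unsigned char pad3;   /* Base+6 */
};

/* Returns 1 if the compiler has laid the template out as the circuit expects */
int pia_layout_ok(void)
{
    return offsetof(struct PIA_even, data_a)    == 0
        && offsetof(struct PIA_even, control_a) == 2
        && offsetof(struct PIA_even, data_b)    == 4
        && offsetof(struct PIA_even, control_b) == 6
        && sizeof(struct PIA_even)              == 8;
}
```

A startup routine could call pia_layout_ok() once and refuse to proceed if it returns zero, which localizes the hardware dependence to a single template.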
One final note on pointers to structures. If arithmetic is attempted on such
objects, then one is taken to be the size of the structure. For example:
pointer = pntr_2_port0 + 1;
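This scaling can be confirmed on a host. The sketch uses a hypothetical four-byte structure PIA4 standing in for struct PIA, and a helper step_in_bytes() that is not part of the original listings.

```c
#include <stddef.h>

/* Hypothetical four-register template, one byte per register */
struct PIA4 { unsigned char data_a, control_a, data_b, control_b; };

/* Byte distance between p and p + 1 */
size_t step_in_bytes(struct PIA4 *p)
{
    struct PIA4 *next = p + 1;                 /* Advances by one whole structure */
    return (size_t)((char *)next - (char *)p);
}
```

With a four-byte structure based at 8000h, adding one therefore yields 8004h rather than 8001h.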
HEADERS AND LIBRARIES 271
#include "hardware.h"
#define TRUE 1
#define FALSE 0
#define ERROR -1
#define FOREVER_DO for(;;)
#define IO_PORT (char*)0x8000
#define PYE 22/7
void update(void)
{
static unsigned short last_time; /* The last counter reading */
static unsigned char update_i; /* The array update index */
short * const counter = COUNT_PORT; /* The counter is at 9000/1h */
char * const int_flag = INTERRUPT_FLAG; /* The external interrupt flag */
*int_flag = 0; /* Reset external interrupt flag */
Array[update_i++] = *counter-last_time; /* Difference is new array value */
last_time = *counter; /* Last reading is updated */
}
Some compilers insist that the # character begins the line (no spaces) and that
there is no whitespace between the # and define. Table 9.9 repeats 9.5(a) using
a header for each module. Notice how the addresses, including casts, are named.
The #define directive can do more than simple text and mathematics substitutions;
it can be used to define macros with arguments, rather like in Table 7.6.
Consider the definition:
Although MAX(X,Y) looks like a function call, any reference to MAX later expands
into in-line code, for instance:
temperature = MAX(t1,t2);
Notice how the macro definition was carefully parenthesized to avoid prob-
lems with complex parameter substitutions. For example:
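As a hedged illustration, compare a fully parenthesized MAX with an unparenthesized variant when each is used inside a larger expression. Both definitions below are conventional forms assumed for this sketch; BAD_MAX and the two wrapper functions are hypothetical names.

```c
/* Assumed conventional definitions, for illustration only */
#define MAX(X,Y)     ((X) > (Y) ? (X) : (Y))    /* Fully parenthesized */
#define BAD_MAX(X,Y) X > Y ? X : Y              /* Unsafe: no parentheses */

/* Expands to ((t1) > (t2) ? (t1) : (t2)) + 1 */
int max_plus_one_good(int t1, int t2) { return MAX(t1, t2) + 1; }

/* Expands to t1 > t2 ? t1 : t2 + 1, so the +1 binds to t2 only */
int max_plus_one_bad(int t1, int t2)  { return BAD_MAX(t1, t2) + 1; }
```

With t1 = 5 and t2 = 3 the parenthesized form gives 6, but the unparenthesized form gives 5, because the addition is absorbed into the final operand of the conditional expression.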
#ifndef MPU
#error Microprocessor type not defined
#endif
#else
#error Unknown microprocessor type
#endif
There are quite a number of new keywords used in this example. The purpose
is to introduce two new types of C objects, namely WORD and LONG_WORD, rather
than use char, int etc. The C keyword typedef allows the writer to use synonyms
for object types of any complexity. For example the FILE type available
to hosted C compilers to open, close, write to or read from a named file on disk
is a synonym for a complex structure type.
Now to make the types WORD and LONG_WORD portable, the underlying base
type must be chosen according to the target processor. Thus for example, int
is a 16-bit word in most 6809 and 8086 compilers, and usually 32 bits for 68000
and 80386 target compilers. Our example defines the WORD and LONG_WORD types
differently according to the state of the variable MPU, which has been set by the
operator prior to using the compiler. For example in the MSDOS operating system:
SET MPU = 68K
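The selection mechanism can be sketched as follows. This is an illustration under stated assumptions, not the original header: MPU is assumed to reach the preprocessor as a numeric macro (6809 or 68000), since #if can only compare integer values, and it is given a default here purely so that the fragment builds on a host. The helper functions word_bits() and long_word_bits() are hypothetical.

```c
#include <limits.h>

/* Assumption: MPU normally arrives from the build environment.
   The default below exists only so that the sketch compiles on a host. */
#ifndef MPU
#define MPU 68000
#endif

#if MPU == 6809
typedef unsigned int   WORD;         /* int is 16 bits on most 6809 compilers */
typedef unsigned long  LONG_WORD;
#elif MPU == 68000
typedef unsigned short WORD;         /* int is usually 32 bits here */
typedef unsigned int   LONG_WORD;
#else
#error Unknown microprocessor type
#endif

/* Width in bits of the two portable types as actually built */
unsigned word_bits(void)      { return (unsigned)(sizeof(WORD) * CHAR_BIT); }
unsigned long_word_bits(void) { return (unsigned)(sizeof(LONG_WORD) * CHAR_BIT); }
```

Code written in terms of WORD and LONG_WORD then keeps the same object sizes when the target changes, provided the typedefs for each processor have been chosen correctly.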
In the case of the former, the preprocessor assumes that the file hardware.h is in
the same directory as itself. In the latter, various other specified directories are
searched as well, usually a special header subdirectory. The details are compiler
dependent. Usually the quotes version is used for your own private include files,
whereas the angle bracket form specifies standard library header files. Of course,
files other than headers may be included, such as other C source programs.
All C compilers come with a set of libraries, which give the writer facilities to
do complex mathematics, input and output routines, file handling, graphics etc.
These libraries consist of a number of functions (see Table 9.10) in object code
form, together with a dictionary. Such libraries are added to the linker's command
line as shown in Fig. 7.5; however, the linker does not treat a library object file
in the same way as a normal program object file. Rather than adding all the
object code in a library to the existing code, only functions which are referred
to and declared extern by the user's modules are extracted. Thus functions are
selectively added.
Most compilers come with a librarian utility. This allows the programmer to
make up a library of his/her own favorite functions or, more dangerously, alter
the commercial ones. The linker scans libraries in the order they are named in its
command line; thus it is possible to replace unsatisfactory commercial functions
by home-brew ones.
Old C did not specify a standard library, although many of the more common
functions became a de facto part of the language. The ANSI standard does specify
a de jure common core library [8], but most compilers have additional libraries
to deal with operating system-specific functions, graphics, communications etc.
In general the standard libraries are only relevant in a hosted environment.
In a free-standing situation, such as met in embedded microprocessor targets,
many library functions are either irrelevant or require modification.
Most compilers that are not operating-system specific use libraries at several
levels. The lowest of these is the machine library, which holds basic subroutines
which the assembly-level source code can use without the writer at the high-level
being aware of their existence. Thus, for example, an integer multiplication in a
6809 MPU target requires a 16 × 16 operation, although the processor itself has
only an integral 8 × 8 MUL instruction. It is likely that the C-originated assembly
code will include a JSR to the requisite subroutine held in the machine-level li-
brary. An example of this is given in line 10 of Table 9.6, where the subroutine
a~pushstr is used by the compiler to implement the passing of a structure to a
function (see also Table 14.6, line 115).
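How such a machine-library routine might build a 16 × 16 product from 8 × 8 multiplications can be sketched in C. This shows the principle only and is not the compiler's actual library code; the names mul8x8() and mul16x16() are hypothetical, with mul8x8() standing in for the 6809's MUL instruction.

```c
#include <stdint.h>

/* Stand-in for the 6809 MUL instruction: 8 x 8 -> 16-bit product */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * (unsigned)b);
}

/* 16 x 16 -> low 16 bits of the product, built from three 8 x 8 multiplies.
   The high x high partial product only affects bits 16 upward, so it is skipped. */
uint16_t mul16x16(uint16_t x, uint16_t y)
{
    uint8_t xl = (uint8_t)(x & 0xFF), xh = (uint8_t)(x >> 8);
    uint8_t yl = (uint8_t)(y & 0xFF), yh = (uint8_t)(y >> 8);

    uint16_t low   = mul8x8(xl, yl);
    uint16_t cross = (uint16_t)(mul8x8(xh, yl) + mul8x8(xl, yh));

    return (uint16_t)(low + (uint16_t)(cross << 8));
}
```

The C writer simply codes x * y; it is the compiler that quietly emits the JSR to a routine of this kind.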
The next up in the hierarchy of libraries provides low-level support routines
used by the user callable libraries, and includes all the operating-system inter-
face routines. For example, they may contain subroutines to obtain a character
from a terminal (typically called inch for input character) and to output a single
character (typically outch for output character). The actual code here depends
on the hardware. In a non-hosted environment, the writer will alter such routines
to suit the system.
The user-callable libraries contain all the functions which may be explicitly
called from the C program. These are the ANSI standard libraries and the various
high-level options, such as graphics. Such libraries make use of the low-level
support library when interacting with the environment.
Variations include optional integer libraries (suitable for embedded
applications where the normal floating-point functions may not be required) and
libraries coded to make use of mathematics co-processors.
Given that libraries comprise a number of functions external to the user's
program, such functions that are to be called must be declared extern and pro-
totyped in the normal way. To avoid this chore, compilers come with a number
of standard header files which may be #included as appropriate at the head of
the user program. Table 9.10 shows the header file math.h provided with the
Cosmic/Intermetrics cross 6809 C compiler V3.31, as an example. This declares
most of the standard ANSI maths library. As can be seen, the majority of maths
functions take double float arguments and return a double float value.
This header file is designed to be used by several related compilers. If the
variable _PROTO has been defined, then any text of the form __(a) will be replaced
by just a:
#ifdef _PROTO
#define __(a) a
#else
#define __(a) ()
#endif
For example, on this basis the first True line will be converted by the preprocessor
to:
double acos (double x);
which is the normal ANSI C function prototype. However, if _PROTO is not defined
we will get:
double acos ();
which is suitable for an old C-style compiler, which does not support prototyping.
Notice how the internal variable __MATH__ is defined at the top of the header. This
lets subsequent headers know that the math.h header is present.
Finally, the ANSI committee has authorized the #pragma directive as a pragmatic
way of introducing compiler-dependent directives, which may be anything
the compiler writer wishes. An example of this from the same compiler is:
#pragma space [] @ dir
which instructs the compiler to store (i.e. space) all non-auto data objects (des-
ignated []) in direct memory. That is, use the Direct address mode for static
and extern data objects instead of the default Extended Direct addressing mode.
#ifndef __MATH__
#define __MATH__ 1
/* set up prototyping */
#ifndef __
#ifdef _PROTO
#define __(a) a
#else
#define __(a) ()
#endif
#endif
/* function declarations */
double acos __((double x)); /* Computes the radian angle, cos of which is x */
double asin __((double x)); /* Computes the radian angle, sine of which is x */
double atan __((double x)); /* Computes the radian angle, tan of which is x */
double atan2 __((double y, double x));
/* Computes the radian angle of y/x. If y is -ve the result is -ve.
If x is -ve the magnitude of the result is greater than pi/2 */
double ceil __((double x)); /* Computes the smallest integer >=to x */
double cos __((double x)); /* Computes the cosine of x radians, range [0,pi] */
double cosh __((double x)); /* Computes the hyperbolic cosine of x */
double exp __((double x)); /* Computes the exponential of x */
double fabs __((double x)); /* Obtains the absolute value of x */
double floor __((double x)); /* Computes the largest integer <= x */
double fmod __((double x, double y));/* Computes the floating-pt remainder of x/y */
double log __((double x)); /* Computes the natural logarithm of x */
double log10 __((double x)); /* Computes the common logarithm of x */
double modf __((double value, double *pd));
/* Extracts the integral and fractional parts */
double pow __((double x, double y)); /* Raises x to the power of y */
double sin __((double x)); /* Computes the sine of x rads, range [-pi/2,pi/2] */
double sinh __((double x)); /* Computes the hyperbolic sine of x */
double sqrt __((double x)); /* Computes the sqr root of x; if x -ve returns 0 */
double tan __((double x)); /* Computes the tan of x rads, range [-pi/2,pi/2] */
double tanh __((double x)); /* Computes the hyperbolic tangent of x */
int abs __((int i)); /* Obtains the integer absolute value of i */
#endif
Obviously this is very target specific, and considerations of this nature are the
subject of the next chapter.
References
[1] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, 2nd. ed., Prentice-
Hall, 1988, Section A7.3.2.
[2] Jaeschke, R.; Recursion, Variable Classes and Scope, DEC Prof., 3, no. 4, 1984, pp. 84 –
93.
[3] Jaeschke, R.; Pointers to Functions, Programmer's Journal, 3, no. 2, 1985, pp. 20 – 21.
[4] Cahill, S.J.; Digital and Microprocessor Engineering, 2nd. ed., Ellis Horwood/Simon
and Schuster, 1993, Section 5.3.
[5] Jouvelot, P.; De L'Assembleur aux Languages Structures: Le Language `C'; Micro Sys-
tems (France), no. 42, June 1984, pp. 102 – 112.
[6] Banahan, M.; The C Book, Addison-Wesley, 1988, Section 8.2.2.
[7] Banahan, M.; The C Book, Addison-Wesley, 1988, Chapter 7.
[8] Banahan, M.; The C Book, Addison-Wesley, 1988, Chapter 9.
CHAPTER 10
ROMable C
In the last two chapters we have seen that it is possible to take a source program
written in C and compile to assembly level. This assembly code can then be
linked and converted into a machine code file, ready for loading, as described
in Chapter 7. In that chapter, we observed that the environment of a hosted
computer is very different to that of a naked system. In the former situation,
each operator request for a program run causes the relevant machine-code file
to be loaded into computer RAM (usually from disk) and execution to commence.
In a naked system the program is normally permanently resident in ROM. Thus
the initializing loading stage is eliminated. A compiler producing code which can
be run in ROM is known as a ROMable compiler.
At the very least, a ROMable compiler must provide the means to put pro-
gram code and constant data in one section of memory (i.e. ROM), and variable
data in another (i.e. RAM). However, there remain several other problems to over-
come before a high-level sourced program can successfully run in a naked system.
Typical of these are the means to set up the System stack, Reset and Interrupt
vector tables, link in hand-assembled routines and implement exception service
routines. MPU-specific tasks, such as setting interrupt mask bits in the
Status/CCR register, and portability issues also raise their heads.
In a hosted system most of these hardware-related activities are handled by the operating system,
but in a free-standing environment the programmer must provide such services
as are required by the executing software. In this chapter we examine this aspect
of software design in more detail.
MIXING ASSEMBLY CODE AND STARTING UP 279
Table 10.1: Elementary startup for a 6809-based system (continued next page).
.processor m6809
;*******************************************************************
;* Startup code for non-interrupt system *
;* Assumes RAM up to 07FFh *
;*******************************************************************
.external _main ; _main is outside this file
_Start: lds #0800h ; Point Stack Pointer to top of RAM
jsr _main ; Go to C code
bra _Start ; Should it return then repeat
.public _Start ; Make this routine known to linker
.end
.processor m6809
;******************************************************************
;* Vector table, Reset vector only *
;******************************************************************
.external _Start ; Start is outside this file
.word [6] ; Miss out the interrupt vectors
RESET: .word _Start ; Put restart address here
.end
main()
{
static int i;
while(1) {i++;}
}
6 ; *********************************************************
7 ; * Startup code for non-interrupt system *
8 ; * Assumes RAM up to 07FFh *
9 ; *********************************************************
10 .psect _text
11 .external _main ; _main is outside this file
12 E000 10CE0800 _Start: lds #0800h ; Point Stack Pointer to top of RAM
13 E004 BDE009 jsr _main ; Go to C code
14 E007 20F7 bra _Start ; Should it return then repeat
15 .public _Start ; Make this known to the linker
16 ; 1 main()
17 ; 2 {
18 ; 3 static int i;
19 ; 4 while(1) {i++;}
20 E009 7C0002 _main: inc L3_i+1
21 E00C 2603 jbne L4
22 E00E 7C0001 inc L3_i
23 E011 20F6 L4: jbr _main
24 ; 5 }
25 .public _main
26 ;******************************************************************
27 ;* Vector table, Reset vector only *
28 ;******************************************************************
29 .external _Start ; Start is outside this file
30 FFF2 .word [6] ; Miss out the interrupt vectors
31 FFFE E000 RESET: .word _Start ; Put restart address here
32 .end
The startup.s and vector.s files are assembled to their relocatable object ver-
sions startup.o and vector.o. The compiler then converts fred.c to fred.o
and links startup.o to this code followed by vector.o. startup begins at
E000h (+text -b0xE000) and vector at FFF2h (+text -b0xFFF2), where we are
assuming ROM between E000h and FFFFh (e.g. a 2764 EPROM). The code in Ta-
ble 10.1(d) shows everything in its proper place.
If desired, the Vector module could be written in C and compiled to vector.o
before linking. A possible C routine with the same role as Table 10.1(b) is given
in Table 10.2. This is an example of an array of pointers to functions, where
vector[n] is a const pointer to function n (the name of a function is its address,
thus main is the pointer to function main()) [2]. Only the Reset vector is shown;
to expand the function to include interrupt vectors just replace the null pointers
by the root name of the appropriate handler function (see Table 10.7).
ANSI C permits a pointer of any kind to be assigned the constant zero, as is
done on line C2 of Table 10.2; the result is a null pointer. No legitimate data
should be held at this address (i.e. 0000h) [3]. For this reason, the linker script
above, which assumed memory between 0000h and 07FFh (e.g. a 6116 RAM),
started the data bias at 0001h (+data -b1) rather than the more obvious 0000h bias.
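The shape of such a vector module can be sketched as below. This is a minimal illustration of an array of const pointers to functions, not the listing of Table 10.2: the type name vector_t and the stand-in function start() are hypothetical, and the seven entries follow the 6809 vector order from FFF2h to FFFFh.

```c
typedef void (*vector_t)(void);      /* Pointer to a function taking and returning nothing */

void start(void)                     /* Stand-in for the startup entry point */
{
}

/* 6809 vectors FFF2h-FFFFh: SWI3, SWI2, FIRQ, IRQ, SWI, NMI, Reset */
const vector_t vector[7] = {
    0, 0, 0, 0, 0, 0,                /* Unused interrupt vectors: null pointers */
    start                            /* Reset vector: a function name is its address */
};
```

Replacing one of the null entries by the name of a handler function is all that is needed to populate the corresponding interrupt vector; the linker must still be told to place the array at FFF2h.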
Starting up a 68000 MPU-based system can be done in the same way as for
the 6809, with a separate text bias for the Vector and Startup routines, typically
00000h and 00400h respectively. However, as the program usually directly fol-
lows the Vector table, a composite Vector/Startup module may be created and
linked in at zero. As shown in Table 10.3, the User Stack Pointer is set up and the
state changed to User before entering the main C routine (see also Table 14.10).
Tables 10.1 and 10.3 are simple examples of incorporating assembly routines
with C code. They are elementary because no data is explicitly passed between
them. It would have been quite easy to pass the value, say, i = 6 to main() in
Table 10.1, but we would have to know how the C compiler handles such variables,
5 00400 207c 00001000 _Start: movea.l #0x1000,a0* Fix to set-up User Stack Ptr
6 00406 4e60 move a0,usp * Privileged instruction
7 00408 027c dfff andi.w #0xDFFF,sr* Bit 13 changes to User state
8 0040c 4eb9 00000416 jsr _main * Go to C routine
9 00412 6000 fff8 bra _Start * Repeat if returns
10 .end
as each has its own house rules. In fact, that particular compiler would have
expected i to be passed in Accumulator_D rather than through the System stack
(see Table 10.6). Thus in any particular compiler, a knowledge of its operation is
needed in order to mesh the two successfully.
Before giving an example, why use a mixture of the two languages, except
for startup? It is an accepted rule of thumb that a program will spend some
90% of its time in around 10% of the code [4]. Where time is of the essence,
replacing this code by equivalent assembly-based subroutines will be beneficial.
Another candidate for assembly code is the creation of library routines (see also
Table 10.16). As these will be used by many different projects, time spent in
refining such code can be justified in some cases.
Our example here involves the creation of a library subroutine to return the
unsigned short int square root of an unsigned short int parameter. The
function is to mimic the C function:
1. Integral and pointer parameters are extended to four bytes and pushed onto
the System stack least significant byte first. Where there is more than one
parameter, then the compiler works along the list from right to left.
2. Registers D3 – D5 and A3 – A7 are guaranteed unaltered by the function on
return.
3. Integral and pointer parameters are returned in D7.L.
There are of course also rules for floating-point and structure objects.
The algorithm implemented by Table 10.4 uses Newton's numerical method [5].
This states that if we guess an initial value for $\sqrt{x}$, usually $\tfrac{1}{2}x$, then:
$$\text{NEW\_ESTIMATE} = \tfrac{1}{2}\left(\text{OLD\_ESTIMATE} + \frac{x}{\text{OLD\_ESTIMATE}}\right)$$
Table 10.4 A C-compatible assembler function evaluating the square root of an unsigned int.
.processor m68000
; *********************************************************************
; * Calculates the square root to the nearest lower integer *
; * using Newton's method where an original estimate of n/2 is made *
; * and successive estimates are = (old_estimate + n/old_estimate)/2 *
; * Exit either after 20 iterations or when new and old estimates *
; * are the same *
; * EXAMPLE : Return for n = 18 is 4 *
; * ENTRY : short unsigned int is passed on the Stack at SP+4/5 *
; * EXIT : Return in D7.W as an unsigned short int, max 256 *
; * EXIT : D0/D1/D2 and CCR altered *
; *********************************************************************
;
_sqr_root: move.w 4(sp),d7 ; Copy n to D7.W
cmp.w #1,d7 ; n = 0 or 1?
bhi CONTINUE ; IF higher then continue
bra EXIT ; ELSE exit with answer = n
CONTINUE: lsr.w #1,d7 ; Create first estimate by dividing by 2
move.w #19,d0 ; 19+1 iterations count in D0.W
; After initialization repetitively build up new estimate in D2.L
LOOP: move.w d7,d1 ; Copy estimate into D1.W
clr.l d2 ; Copy n into D2 as a 32-bit clone
move.w 4(sp),d2 ; for the division following
divu d7,d2 ; [D2.W] = n/old_estimate
move.w d2,d7 ; Move it to D7.W
add.w d1,d7 ; [D7.W] = old_estimate + n/old_estimate
lsr.w #1,d7 ; Divide by 2 to give the new estimate
cmp.w d1,d7 ; Compare new with old estimates
dbeq d0,LOOP ; IF equal exit ELSE dec loop count; IF not -1 repeat
EXIT: rts ; ELSE exit with answer in D7.W
.public _sqr_root; Make known to the outside world
.end
and if we keep going round the loop, the estimate will converge to the desired
value. In our listing, I have exited whenever NEW_ESTIMATE = OLD_ESTIMATE or
when the number of iterations reaches 20. The latter is necessary, as numerical
techniques often produce an oscillating outcome (for example x = 65535 pro-
duces an estimate alternating between 255 and 256), or even do not converge.
Without an unconditional out, such functions may go into an unscripted endless
loop.
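The same algorithm can be written in C, which is useful for checking the assembly version on a host. This is a sketch that follows the structure of Table 10.4 (first estimate n/2, at most 20 passes, exit on equal estimates); it is not necessarily the C function the assembly routine was written to mimic.

```c
/* Integer square root by Newton's method, rounded down in the usual case */
unsigned short sqr_root(unsigned short n)
{
    unsigned int estimate, old;
    int count;

    if (n <= 1)
        return n;                          /* Square root of 0 or 1 is itself */

    estimate = n >> 1;                     /* First estimate is n/2 */
    for (count = 0; count < 20; count++) { /* Unconditional way out after 20 passes */
        old = estimate;
        estimate = (old + n / old) >> 1;   /* (old_estimate + n/old_estimate)/2 */
        if (estimate == old)
            break;                         /* Converged: new equals old */
    }
    return (unsigned short)estimate;
}
```

For n = 18 the estimates run 9, 5, 4, 4 and the function returns 4. For n = 65535 the estimates settle into the 255/256 alternation described above, and only the iteration limit ends the loop.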
In Table 10.4, all variables are held in registers, so no frame is created and
SP is used as the reference to obtain the passed variable x (MOVE.W 4(SP),D7).
Furthermore, none of the preserved registers are used, therefore they do not
require saving and retrieving. The answer is returned in D7 as required.
Calling up the function from a C program is done in exactly the same way as
any function actually written in C, for example:
x = sqr_root(27U);
Table 10.5 Using in-line assembly code to set up the System stack.
main()
{
static int i;
_asm("lds #0800h ; Point Stack Pointer to top of RAM");
while(1) {i++;}
}
(a) C source.
One of the disadvantages of using any high-level language is the loss of the
ability to use any special feature of the underlying processor. For example, it
may be necessary to lock out any interrupt occurring during a specific part of the
code. How could we handle a 6809-based system with the requirement to stop
at a specific point and use the SYNC instruction (see page 163) to continue when
an interrupt subsequently occurs? Of course, we could write the code as part of
an assembly subroutine and link it in as previously shown, but this is not very
efficient for short sequences.
Many C compilers permit the insertion of assembly source lines interleaved in
the C source code. Although this is a common feature, it is not standard, and thus
is very implementation dependent. Where it is available, the keyword asm is usu-
ally involved. For example, the Aztec C compilers use #asm and #asmend to sand-
wich such code. The Microtec equivalent uses a #pragma asm – #pragma endasm
sandwich. Our illustration in Table 10.5 uses the Whitesmiths group built-in func-
tion _asm() for this purpose. Here I have forced a LDS #0800h assembler line
in at the beginning of the C code. This obviates the need for the Startup mod-
ule, but the Vector module must still be linked in. _asm() can take several lines
of assembly code as its argument between double quotes, and use \n and \t to
indicate New Line and Horizontal Tab respectively.
The Microtec asm() can optionally use an assembly command to return data
to a C object. For example:
switch = asm(unsigned char, "move.b 9000h,d0");
which assigns the value read from 9000h to an unsigned byte C variable.
Despite their flexibility, assembler windows should be used sparingly, as they
seriously compromise the portability of such code (see Section 10.4).
It is possible to call a function whose absolute location is known from a C pro-
gram, but which cannot be accessed in relocatable object form by the linker. This
is likely to occur when the target system has a resident operating system/monitor,
and the C user program wishes to use those external resources. Another situation
which requires this facility, is where a preprogrammed mathematics package is
resident, for example the 6839 floating-point ROM.
As an example, assume that a certain ROM-based 6809-monitor has a subrou-
tine called OUTCH (OUTput CHaracter) located at F830h. This sends out a single
character, passed to it in Accumulator_B, to the terminal. We wish to make use of
this subroutine in implementing a C function which sends a character ch to the
terminal whenever called.
We noted on page 281 (see also Table 10.2) that in ANSI C the name
of a function is a pointer to that function, that is its address. Thus, it might
be thought that the statement (0xF830)(ch); would pass ch and jump to the
subroutine at F830h. However, 0xF830 is an integer constant so we must first
cast it to type pointer-to a void function taking a single char parameter, that
is (void(*)(char))0xF830. This complex cast reads from inside out: pointer
to function (*)/ taking a char (char)/ returning void. The whole is enclosed
by the cast's parentheses and qualifies the constant 0xF830. Note how the
complex type reads from inside out, first right then left. This is the normal way
of constructing compound types.
In Table 10.6 I have used a header to replace the name OUTCH by this casting.
It would be normal to use a header to define the resources available in such a
co-resident ROM. Thus the statement:
(OUTCH)('\n');
compiles to:
LDB #10
JSR 0F830h
as desired.
Table 10.6 defines a function known as void new_line(void) which is de-
signed to send a Line Feed (ASCII code 10) to the terminal. This simply in turn
sends out '\n' to OUTCH. The character '\n' is C'ese for New Line (or Line Feed).
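The cast can be tried on a host by letting the address be a parameter rather than the fixed constant 0xF830. The sketch below is an illustration and not the listing of Table 10.6: FUNCTION_AT, fake_outch(), last_char_sent() and new_line_via() are hypothetical names, and converting between a function pointer and uintptr_t is implementation-defined, although it behaves as expected on common compilers.

```c
#include <stdint.h>

/* Cast an absolute address to pointer-to function taking char, returning void */
#define FUNCTION_AT(address) ((void (*)(char))(uintptr_t)(address))

/* Host stand-in for the monitor's OUTCH, so the idea can be tried off-target */
static char last_sent;
void fake_outch(char ch)   { last_sent = ch; }
char last_char_sent(void)  { return last_sent; }

/* On the target the address would be 0xF830; here it is passed in */
void new_line_via(uintptr_t outch_address)
{
    (FUNCTION_AT(outch_address))('\n');    /* Send a Line Feed */
}
```

On the 6809 target, a header line of the form #define OUTCH FUNCTION_AT(0xF830) would let the C source keep the monitor's own name for the routine.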
1 .processor m6809
2 ;****************************************************************
3 ;* Vector table, IRQ and Reset vectors only *
4 ;****************************************************************
5 .external _Start, IRQ_handler
6 FFF2 .word [3] ; Miss out SWI2, SWI3 and FIRQ
7 FFF8 E00B IRQ: .word IRQ_handler ; Put IRQ handler address here
8 FFFA .word [2] ; Miss out SWI & NMI
9 FFFE E000 RESET: .word _Start ; Put restart address here
10 .end
Consider the program of Table 9.5. There are two functions here, the back-
ground main function called display() and the interrupt service function update().
Function update() is not explicitly entered, or indeed known, by background
function display(); they communicate through global object Array[], which is
known to both of them.
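The pattern of a foreground interrupt function and a background function sharing a global array can be sketched as follows. This is a host-testable illustration rather than the code of Table 9.5: the array size, the volatile qualifier and the names update_from() and read_entry() are assumptions, and the counter reading is passed as a parameter instead of being fetched from the counter port.

```c
#define ARRAY_SIZE 256                       /* Assumed size, for illustration */

volatile unsigned short Array[ARRAY_SIZE];   /* Known to both functions */

/* Interrupt-side work: store the change in the counter since the last call */
void update_from(unsigned short counter_now)
{
    static unsigned short last_time;         /* The last counter reading */
    static unsigned char update_i;           /* Array index, wraps after 255 */

    Array[update_i++] = (unsigned short)(counter_now - last_time);
    last_time = counter_now;                 /* Last reading is updated */
}

/* Background-side work: read one entry left by the interrupt function */
unsigned short read_entry(unsigned char index)
{
    return Array[index];
}
```

Neither function calls the other; the only link between them is the global array, which is why it is qualified volatile here so that the background code rereads it on every access.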
We look first at the 6809 processor and assume the use of IRQ to switch con-
text. As the entire processor state is automatically saved, all our interrupt han-
dler (IRQ_handler in line 11 of Table 10.7(a)) has to do is jump to the subroutine
_update, and on return do a RTI. The address of IRQ_handler is placed in the
IRQ vector in Table 10.7(b). Thus, when an IRQ interrupt occurs, the processor
will save its state and go via the IRQ vector (FFF8:9h) to the stub IRQ handler in
the startup routine. This simply jumps to the appropriate C function and
terminates with an RTI. Notice that this startup routine clears the I mask bit in
the CCR (line 8), which allows the MPU to respond to IRQ requests. The I mask is
automatically set by a Reset.
The situation would be a little more complex if FIRQ were used to initiate the
10 00400 207c 00001000 _Start: movea.l #0x1000,a0* Make to set-up User Stack
11 00406 4e60 move a0,usp
12 00408 46f8 0100 move.w 0x0100,sr * User state, Int mask = 001
13 0040c 4eb9 00000426 ENTER: jsr _display * Go to background C routine
14 00412 6000 fff8 bra ENTER * Repeat if returns
exception. In this situation, only the PC and CCR are automatically saved. Thus
the handler must use a Push/Pull pair to sandwich the JSR, in order to preserve
the state. This is the situation for all 68000-based interrupts and the Push/Pull
sandwich is clearly seen at INT2_handler in lines 15 and 17 of Table 10.8. The
house rules of this compiler (Whitesmiths V3.2) are such that registers D3 to D5
and A3 to A7 are preserved in any C function, so only the remaining registers are
saved by the handler. The three interrupt mask bits are set to 001 in line 12 to
enable level-2 interrupts (they were set to 111 when the MPU was Reset).
Both Tables 10.7 and 10.8 are linked in with the C code in exactly the same
manner as for the corresponding Tables 10.1 and 10.3. Software interrupts and
exceptions are handled in the same way as hardware interrupts. Where interrupt
vectors are stored in RAM rather than the normal ROM, the startup routine must
dynamically load the address before enabling the interrupt mask.
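The ordering matters: the vector must be in place before the mask is lifted. As a rough illustration only, this can be modelled in portable C on a host machine. The names ram_vector, irq_mask, raise_irq() and startup() are hypothetical stand-ins for the RAM vector location, the mask bit and the hardware response; they do not belong to any of the compilers discussed here.

```c
#include <stddef.h>

typedef void (*isr_t)(void);

/* Hypothetical stand-ins: on a real target these would be the vector
   location in RAM and the interrupt mask bit in the CCR/SR.          */
static isr_t ram_vector = NULL;  /* RAM vector, undefined after power-up    */
static int   irq_mask   = 1;     /* 1 = interrupts masked (state at Reset)  */
static int   ticks      = 0;     /* Something for the service function to do */

static void update(void) { ticks++; }   /* The C interrupt service function */

/* Model of the hardware responding to an interrupt request */
static int raise_irq(void)
{
    if (irq_mask) return 0;              /* Masked: the request is ignored    */
    if (ram_vector == NULL) return -1;   /* Unloaded vector: a runaway system */
    ram_vector();                        /* Go via the vector to the handler  */
    return 1;
}

/* Startup order: load the vector first, and only then clear the mask */
static void startup(void)
{
    ram_vector = update;
    irq_mask   = 0;
}
```

Reversing the two assignments in startup() would open a window in which a request could be taken through an unloaded vector, which is the case the model reports as -1.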
In a realistic system, the startup is likely to be more complex than these exam-
ples show. For example, any programmable I/O devices should be configured be-
fore enabling interrupts. If the exception service routine communicates through
global variables and these are presumed to have an initial value, then this too
should be done in the startup module. This will be described in the following
section.
The double-hop response to an interrupt slows down the MPU's response to
a request. There are two ways around this problem. The first involves writing
the interrupt service routine (ISR) entirely in assembly language; thus the handler
becomes the whole routine. If the ISR is of any size, it is likely that it will be in a
separate file or library, and will be added in through the linker.
Some compilers allow the programmer to specify a C function as an interrupt
service routine. In such cases the generated assembly code includes an
entry sequence that saves all used registers that the compiler's house rules state
must be preserved. On exit these are restored and a RTI/RTE is generated.
Like assembly windows, these are extensions to the ANSI standard and are
highly compiler specific. As an example, the Microtec Research Paragon C
cross-compiler for the 68000/68020 (V3) requires such functions to be sandwiched
by the $INTERRUPT directive, for example:
#define $INTERRUPT
< definition of function ifred() >
#undef $INTERRUPT
Function ifred() will then be coded as an interrupt service routine rather than
a subroutine. As many functions as required may be sandwiched.
To illustrate the effect of $INTERRUPT, consider the Real-Time Clock program
of Table 8.6. Compiling this as an interrupt service function with the Paragon
compiler gives the code in Table 10.9. Notice how the registers are saved and
restored at the beginning and end of the routine, and the terminating RTE. In
this situation, the address of clock(), that is .clock, should be placed in the
appropriate vector, rather than that of an intermediate handler.
The Whitesmiths group compilers, versions 3.3 and up, use the prefix @port
to specify interrupt service functions, see Tables 14.6 and 14.12. Thus:
would give us the Real-Time Clock interrupt service function for these compilers.
Using interrupts in high-level code is fraught with difficulties. Unlike a machine
instruction, which runs to completion before an interrupt is serviced, a high-level
statement can be interrupted part-way through. If, for example, we had a global int
variable i which was shared between background and foreground routines, then an
interrupt in the middle of the statement i++ may well produce intriguing results,
for instance:
i++
inc L3_i+1 ; Increment lower byte
<<<- - - - - - - - - - - - - - Interrupt - - - - - - - - - - - - - - >>>
bne L4 ; IF not zero THEN continue
inc L3_i ; ELSE increment upper byte
L4:
Here we have assumed a 6809 compiler with a 16-bit int. To increment this
object, the lower byte has been incremented first, and only if this rolls over to
zero is the upper byte incremented (Table 10.1, lines 20 – 22). If i is initially
00FFh, then the first INC produces i = 0000h. If an interrupt now occurs, and
the ISR uses, say, i to update array[i], then array[0] will be altered instead
of array[256]! Clearly a compiler that used the sequence:
LDD L3_i
ADDD #1
STD L3_i
would be better; however, long 4-byte integers will still be prone to disjoint global
problems like this.
C compilers for 8-bit processors normally use absolute memory locations to
hold floating-point numbers, rather than internal registers, and this non-recursive
mode makes floating-point arithmetic particularly prone to this problem. Even
16/32-bit devices, which can handle all sizes of integers in one indivisible ma-
chine instruction, require multiple floating-point operations, unless using a math-
ematical co-processor. Thus, in general it is inadvisable to use floating-point
global variables which can be altered by interrupt service routines. Similar con-
siderations apply to any global compound-element structure and multiple-byte
integers for 8-bit MPUs. Of course it is always possible to mask out interrupts
during sensitive processing.
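The disjoint increment and the masking cure can be imitated on a host machine. The following is only a sketch under stated assumptions: the 16-bit int is modelled as two separate bytes, and the names masked, pending, isr() and instruction_boundary() are hypothetical devices for simulating an interrupt request arriving between machine instructions.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit int held as two bytes in memory */
static uint8_t  i_hi, i_lo;
static int      masked;        /* 1 = interrupts masked out          */
static int      pending;       /* 1 = an interrupt request is waiting */
static unsigned seen_by_isr;   /* Value of i as observed by the ISR   */

static unsigned read_i(void) { return ((unsigned)i_hi << 8) | i_lo; }

static void isr(void) { seen_by_isr = read_i(); pending = 0; }

/* The gap between two machine instructions, where a request is taken */
static void instruction_boundary(void)
{
    if (pending && !masked) isr();
}

/* i++ coded as INC lower byte, BNE, INC upper byte */
static void increment_i(void)
{
    i_lo++;                       /* Increment lower byte            */
    instruction_boundary();       /* <<< an interrupt may land here  */
    if (i_lo == 0) i_hi++;        /* Carry into the upper byte       */
    instruction_boundary();
}

/* Returns the value of i that the ISR saw */
static unsigned torn_demo(int mask_during_increment)
{
    i_hi = 0x00; i_lo = 0xFF;     /* i = 00FFh                        */
    pending = 1;                  /* Request arrives just beforehand  */
    masked = mask_during_increment;
    increment_i();
    masked = 0;                   /* Sensitive processing is over     */
    instruction_boundary();       /* A held-off request is taken now  */
    return seen_by_isr;
}
```

With the mask left clear, torn_demo(0) reports that the ISR saw 0000h; with the mask set for the duration of the increment, torn_demo(1) reports the intended 0100h (256).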
Interrupt problems occurring due to disjoint operations are particularly per-
nicious because they appear very rarely and apparently at random. As they are
not reproducible to order, it is virtually impossible to track them down!
If global variables have to be shared, the normal advice is to ensure that only
the highest order of interrupt service function making use of the variable actually
does the changing. Here the background function is treated as level 0. Thus
in our Real-Time Clock, the interrupt function clock() is permitted to change
the global variables Seconds, Minutes and Hours, with the background and any
lower priority interrupts only reading these variables. Higher priority interrupt
functions should not make any reference to these variables.
This procedure is not foolproof. Consider a background function turning off
the central heating pump each morning at 9 am, that is 09:00:00. It turns the
pump off and on by pulsing a toggling flip flop. It is now 09:59:59. The program
reads Hours as 09. Getting interested, it is about to read Minutes when an in-
terrupt occurs and alters the time to 10:00:00. On return, Minutes and Seconds
are then read as 00:00, and the processor thinks it is 9 am, toggles the flip flop
and turns the pump on again! This may happen perhaps once a year, but when it
does, the switching will continue at 180° from the proper sequence. The cure is
to mask out the interrupt when the time is being read, or to read it several times
in quick succession — and not to use a toggling flip flop as the pump interface!
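The second cure, reading several times in quick succession, might be sketched as follows. This is an illustration rather than the code of any table in this book: the struct time_of_day, the functions read_once() and read_time(), and the between_reads hook (which fakes an interrupt landing after Hours has been read) are all hypothetical names.

```c
/* Globals altered only by the clock interrupt function */
static volatile unsigned char Hours, Minutes, Seconds;

struct time_of_day { unsigned char h, m, s; };

/* Hypothetical test hook: stands in for an interrupt arriving
   between the reading of Hours and the reading of Minutes     */
static void (*between_reads)(void);

/* A single, unprotected reading; this is the one that can be torn */
static void read_once(struct time_of_day *t)
{
    t->h = Hours;
    if (between_reads) between_reads();
    t->m = Minutes;
    t->s = Seconds;
}

/* Keep reading until two successive readings agree */
static struct time_of_day read_time(void)
{
    struct time_of_day a, b;
    read_once(&a);
    for (;;) {
        read_once(&b);
        if (a.h == b.h && a.m == b.m && a.s == b.s) return b;
        a = b;                     /* Disagreement: try again */
    }
}

/* One-shot fake interrupt: 09:59:59 rolls over to 10:00:00 */
static void tick_once(void)
{
    Hours = 10; Minutes = 0; Seconds = 0;
    between_reads = 0;             /* Fires once only */
}
```

Starting from 09:59:59 with tick_once installed as the hook, read_once() returns the spurious 09:00:00 described above, whereas read_time() rejects it and returns 10:00:00. The loop relies on the clock changing far less often than it takes to read it twice.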
Table 8.3(b), these fixed values are moved into memory each time the local area
in which their scope applies is entered, that is run-time setup. Uninitialized
variables have an indeterminate value until assigned.
2. static and global variables (static or otherwise) can be initialized in their
definition. The resulting code leads to a compile-time set-up, where the con-
stants are placed in memory by the loader, see Table 8.3(a). When the program
starts, it assumes that these values are already in situ, put there by some out-
side agency (the loader). On subsequent executions, any altered variables will
not regain their original values, unless a load precedes the run. Uninitialized
static/global variables are given an explicit zero value, as for example in
Table 10.9.
3. static or global objects declared const, are placed by the compiler in the
text area of memory. In an embedded system, this will be in ROM, and is use-
ful for look-up tables and string constants. Such objects are always present,
with their initial values placed there by the one and only load into the EPROM
programmer. Table 9.2 shows an example of this situation.
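The three situations can be seen side by side in a few lines of C. This is a minimal sketch with hypothetical names (squares, calls, start_value and storage_demo()); where each object actually ends up in memory depends on the compiler and linker in use.

```c
/* Situation 3: a const object, a candidate for the text area (ROM) */
static const unsigned char squares[4] = {0, 1, 4, 9};

/* Situation 2: statics, set up before the program starts running */
static int calls;               /* Uninitialized: given the value zero      */
static int start_value = 100;   /* Initialized: needs its image putting in RAM */

static int storage_demo(void)
{
    int local = 5;              /* Situation 1: auto, set up at each entry  */
    local += squares[3];
    calls++;
    start_value++;              /* An altered static keeps its new value    */
    return local;
}
```

Each call returns 14, since local is set up afresh every time, whereas calls and start_value carry their altered values forward. In a ROM-based system nothing reloads start_value on a restart unless the startup code does so.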
Aztec:
__H0_org & __H0_end Code segment start and end+1.
__H1_org & __H1_end Initialized data segment start and end+1.
__H2_org & __H2_end Uninitialized data segment start and end+1.
Cosmic:
__text__ Code segment end+1.
__data__ Initialized data segment end+1.
__bss__ Uninitialized data segment end+1.
The linker allows the programmer to set the starting address of each section
separately. If the BSS/Section 14/_bss sections are not biased in this way, then
they normally follow directly on from the corresponding DSEG/Section 13/_data
section.
Finally, how do the compilers produce an image of the pre-initialized data
in ROM? The Aztec compiler does this automatically with the image following
on directly from the TSEG portion; that is starting at __H0_end. Its length is
__H1_end − __H1_org. Using this information, a possible startup for this Aztec
compiler is shown in Table 10.10. This is written as an extension to Table 10.3,
but using the Aztec's assembler syntax (standard Motorola). Operation is self-
evident from the comments; however, note that if a segment does not exist, then
the org and end labels are made the same by the linker, so a zero difference
signals non-existence.
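The job done by such a startup can be summarized in C. In this sketch the arrays rom_image, ram_data and ram_bss are hypothetical stand-ins for the regions that the linker symbols delimit; a real startup is written in assembly and works from the org and end addresses themselves.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the linker-defined regions */
static const unsigned char rom_image[4] = {0x12, 0x34, 0x56, 0x78}; /* Image in ROM */
static unsigned char ram_data[4];     /* Initialized data segment, in RAM      */
static unsigned char ram_bss[8] = {   /* Uninitialized segment: junk at power-up */
    0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA
};

/* Copy len bytes; a zero length (segment does not exist) copies nothing */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t len)
{
    while (len--) *dst++ = *src++;
}

/* Clear len bytes to zero */
static void clear_bytes(unsigned char *dst, size_t len)
{
    while (len--) *dst++ = 0;
}

/* What the startup must do before going to the C code */
static void startup_init(void)
{
    copy_bytes(ram_data, rom_image, sizeof ram_data);  /* Length is end - org */
    clear_bytes(ram_bss, sizeof ram_bss);
}
```

Because both loops test the length before touching memory, a segment whose org and end labels coincide is skipped without any special case.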
The Paragon product provides two library routines, which help this copy pro-
cess. These are .initdata and memclr(). The former is designed to be called
directly from the startup routine, for example jsr .initdata, and takes no pa-
rameters directly. The latter is normally used from the C program, requiring a
pointer to the first byte and an int count.
The linker must be informed through its command file (see Table 7.10) that an
image of Section 13 is required, by using the directive initdata. Thus in line 19
of Table 10.11, we tell the linker to generate an image of Section 13 starting
at 6000h (unfortunately there is no symbol denoting the end of Section 9, the
text). In the startup, jsr .initdata will then use the linker-generated symbols
automatically to do the copying. Line 10 informs the linker that Section 14
(uninitialized variables) is to follow Section 13. In doing this, RAM can easily
be cleared from ?RAM_START+?ROM_SIZE upwards.
In the Cosmic/Intermetrics compilers, the linker is followed by a hexer utility,
which generates machine code in the requested format for the EPROM program-
mer (see Table 7.5). Each program section can be shifted to a new start point by
the hexer; however, as the text remains unaltered, the program still assumes its
data is at the linker's (original) data bias. Thus, to produce an image of the data
section following on from the text we have:
Table 10.11 A typical lod68k command file to produce an image of initialized data in ROM for use
in the startup code.
***************************************************************************
* This is a prototype command file for the Microtec linker *
* Puts a copy of initialized data in ROM for the startup *
***************************************************************************
* Section 0 is for the entry code, e.g. vector table, in ROM *
* Section 9 is for the program in ROM usually *
* Section 13 holds initialized local static and global variables, in RAM *
* Section 14 is for other vars, e.g. Global and uninitialized statics *
* Put initialized static/globals after uninitialized same *
order 13,14 * Put Section 14 after 13 *
sect 0=0 * Vector table starting at 00000 *
sect 9=0400h * Program starts at 0400h *
sect 13=0E000h * Any data is at E000h up (RAM) *
absolute 0,9,13 * Put only these ROM sections in .hex file *
* Copy section in ROM at zzzzzh for initialized local static *
* data produced in Section 13 in RAM, if relevant *
* In entry program subroutine .initdata will copy it back *
* always at runtime into RAM *
initdata 13,6000h
list d,s,t,x,c * Public symbols in object module *
*; Local symbol table to object module; Lists it; and public *
* symbol table; Produces a cross-reference listing *
load startup * Start up assembler routine *
load fred * Then the compiled user program *
load 68000.lib\mcc68kab.lib * and absolute library *
end *
***************************************************************************
which says produce the (Intel coded) machine code file with the data bias (-db)
reset to E080h. The output file is called fred.hex and the input (from the linker)
is fred.xeq. The net result of this process is to create a copy of the initialized
data in ROM, beginning at E080h but leaving the actual data area unchanged. An
example is given in Table 10.12(b).
The Cosmic/Intermetrics 6809 compiler does not produce start_of labels (e.g.
Text segment start), but including all program sections in the startup routine,
as shown in Table 10.12(a), defines these local symbols according to the biases
set in the linker. Thus if the linker's data bias is 1, then Start_data is 0001h.
This routine is similar to that of Table 10.10, but with differing symbols.
Cosmic/Intermetrics provide a utility toprom with their compilers version 3.32
and up, to modify their linker output to create this image automatically. The
starting address in RAM and end address of this image in ROM are also embedded
Table 10.12 A startup for the Cosmic compiler, initializing statics/globals and setting up the DPR for
zero page.
.processor m6809
;********************************************************************
;* Startup routine for Cosmic 6809 V3.3 supporting zero page *
;* and copying initial values of statics/globals into RAM *
;********************************************************************
.external _main, __text_, __data_, __bss_
Start_data: .psect _data ; Define beginning of data section
Start_bss: .psect _bss ; Define beginning of bss section
Start_zero: .psect _zpage ; Define beginning of zero page section
.psect _text
_Start: lds #0800h ; Point Stack Pointer to top of RAM
; Now clear bss region
ldx #Start_bss ; Point to beginning of BSS
LOOP1: cmpx #__bss_ ; End yet?
beq INIT_DATA ; IF yes THEN move on
clr 0,x+ ; ELSE clear byte and advance pointer
bra LOOP1
; Now setup data region
INIT_DATA: ldx #Start_data ; Point to beginning of data
ldy #0E080h ; Point to beginning of image
LOOP2: cmpx #__data_ ; End yet?
beq ZERO_PAGE ; IF yes THEN move on
lda 0,y+ ; ELSE get byte from the image
sta 0,x+ ; and move it into the data region
bra LOOP2
; Set up DPR to point to zero page
ZERO_PAGE: ldd #Start_zero ; Start of zero page
tfr a,dp ; Top byte to Direct Page register
jsr _main ; Go to C code
bra _Start ; Should it return then repeat
.public _Start ; Make this routine known to linker
.end
into the start of this ROM record, and are used by their provided startup routine.
This works in the same way as outlined above, but with less hassle.
As can be seen in Table 10.12, this compiler supports the use of the 6809's
direct page (or zero page) address mode (see page 35) as a non-ANSI extension.
Any static or extern data object can be placed into the assembler's _zpage
program section by preceding it by the directive @dir. Thus, altering line C3 in
Table 10.5 to:
will change the .psect _bss to .psect _zpage and the two following INC com-
mands will use Direct rather than Extended addressing, as shown in Table 10.13
(see also Table 14.8). All such objects in the file can be placed in the zero page
by using the ANSI directive #pragma:
#pragma space[]@dir
but it must be remembered that a page in the 6809's space is only 256 bytes long.
The bias for this page can be set in the linker; for instance:
ln09 +zpage -b0x8000
sets it to 8000h. This will be the value of Start_zero in Table 10.12. Bringing
this down to Accumulator_D and then doing a TFR A,DP sets up the Direct Page
register to the upper address byte (80h in this example) as required.
I have assumed in Table 10.12(a) that the initial state of the zero page does not
matter. If it does, then all 256 bytes can be cleared or an image copied from ROM.
10.4 Portability
To the microprocessor engineer, portability is one of the major attractions of
a high-level language. Thus a company upgrading a 6502-based product line
to, say, the 68000 family, can continue to use the bulk of the original software,
without a substantial change. In reality, the migration of software between dif-
fering systems, at the lowest to the highest level, is fraught with difficulties to
the unwary [6].
As an example of low-level problems that can occur, most of the newer families
of MPU are software downwards compatible. Thus the 80386 MPU has an 8086
emulation mode and the 68020 MPU is object-code compatible with the 68000. Consider
the CLR <memory> instruction in the 68000/8 MPU. This is implemented as
a classical read–modify–write operation, although the data read is irrelevant (see
page 25). This means that the address of <memory> is put out on the address bus
twice. A devious hardware engineer may deliberately make use of the resulting
double address decoder pulse, by using CLR, say, to increment a counter twice.
At some time later, probably after this ingenious engineer has left, the company
decides to upgrade to a 68020-based microcomputer. They have been assured
the 68000 code will directly run under 68020 control. So it does, or does it? Mo-
torola have speeded up CLR on the 68020 MPU and subsequent family members,
by dispensing with the initial useless Read cycle, ergo a counter incrementing at
half its proper rate! Abstruse bugs like this are difficult and very expensive to
unearth, but abound where software is migrated between systems.
At the higher level, one solution to the portability problem is to define a
virtual machine (i.e. having a hypothetical structure) together with a UNiversal
Computer-Oriented Language (UNCOL) [7]. Each physical machine would have a
translator from UNCOL to its particular machine code. With such a scheme, a
high-level language would only require the one machine-independent compiler
to UNCOL.
Unfortunately no UNCOL exists in practice, although several half-hearted at-
tempts towards this goal have been made. At one time, A-natural [8] was in vogue
as a kind of standard assembly language, but its close relationship to the 808x-
MPU family led to its eventual demise. Some software engineers consider C as
an UNCOL. Certainly its origins as a high-level assembly language used to port
the operating system UNIX to various hardware hosts [9] would seem to fit it into
that role. Amongst its other virtues, the relative lack of dialects, now enforced by
the ANSI standard, makes C one of the most portable of the higher languages.
But even here, 100% portability is a pipe-dream, and the term transportable is a
more apt description.
Considerations of portability depend on the type and scope of the software.
This can roughly be categorized as follows:
scanf("%u",&n);
means go to stdin and get an unsigned decimal integer (%u), which will be put
away at the address of n (i.e. assigned to n). Other format tokens are %d, %ld,
%x, %f etc., for Decimal integer, Long Decimal integer, heXadecimal integer and
decimal Floating-point.
printf() is the formatted write to standard output function counterpart
(stdout is normally the VDU screen or printer), which sends messages with vari-
able values replacing embedded format tokens [12]. Thus:
prints the message in quotes, with the format token %u replaced by the decimal
value of num at that point in the program, and the long decimal value of sum
likewise. Notice the use of \n to give a new line. Table 10.14(b) shows a run-time
example.
Enter number
35
The sum of all integers up to 35 is : 629
(b) Typical run.
Compilers come with a set of header files, giving amongst other things, proto-
types of all the library functions. Some of these are <stdio.h> for the standard
input/output functions and <stdlib.h> for utility functions. Table 9.10 shows
a typical <math.h> mathematics function header.
Unfortunately, even with the ANSI standard, many details are left as
implementation dependent. For example, the size of ints (typically 16 or 32 bits),
whether an unqualified char is signed or unsigned, the direction of truncation
for / (divide) and the sign of the result for % (remainder) are machine-dependent
for negative operands. File handling, for example rules for naming, and various
system-related constants, such as the End of File constant (EOF is usually −1), are
operating-system specific.
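Some of these details can at least be interrogated from within the program. The sketch below uses only the standard header <limits.h>; the function names are hypothetical. Note that whichever way an implementation truncates, the quotient and remainder are required to be consistent with each other, and later revisions of the standard (C99 onwards) fix the truncation as towards zero.

```c
#include <limits.h>

/* Number of bits in an int for this implementation */
static int int_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* 1 if an unqualified char is signed here, 0 if it is unsigned */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Whatever the direction of truncation, / and % must agree:
   (a/b)*b + a%b is always equal to a (b must be non-zero)   */
static int div_rem_consistent(int a, int b)
{
    return (a / b) * b + (a % b) == a;
}
```

A program that depends on a particular answer, say a 32-bit int, can test for it once at startup rather than fail obscurely later.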
Most implementation and operating-system foibles tend to be obscure and
difficult to track down. As a simple example of the former, consider the code
fragment:
int i;
for (i=0; i<32768; i++) {do this;}
This will work perfectly well in an implementation which maps int on to a 32-bit
word, but the loop will never terminate on a 16-bit implementation, where i cannot
exceed its largest value of +32767 (7FFFh) and so is always less than 32768.
To reduce the possibility of this kind of problem, system-dependent variables
should be gathered together into a header file, which can easily be altered if the
software is transposed. Also standard types can be defined. Thus:
SIGNED_32 i;
for (i=0; i<32768; i++) {do this;}
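One possible form of such a definition is sketched below. The choice of long for SIGNED_32 is an assumption that suits most 16-bit and 32-bit compilers, since the standard guarantees that a long holds at least 32 bits; in practice the typedef would live in the project's own header file and be edited once for each target. The function count_iterations() is a hypothetical wrapper added only so that the loop can be exercised.

```c
/* Hypothetical contents of a project header, altered once per target.
   A long is at least 32 bits wide in any conforming implementation.  */
typedef long SIGNED_32;

/* The loop now terminates whatever the size of an int */
static long count_iterations(void)
{
    SIGNED_32 i;
    long done = 0;
    for (i = 0; i < 32768; i++) {
        done++;                  /* Stands in for {do this;} */
    }
    return done;
}
```

Only the typedef line changes when the software is moved; the loops that use SIGNED_32 are left alone.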
Table 10.15: Compiling the same source with a spectrum of CPUs (continued next page).

;:ts=8
;main(n)
;unsigned char n;
        public  _main
_main:  link    a6,#.2
        movem.l .3,-(sp)
;{
;static unsigned short sum;
        bss     .4,2
;for(sum=0;n>0;n--)
        clr.w   .4
        bra     .8
; {sum+=n;}
.7      move.l  #0,d0
        move.b  11(a6),d0
        add.w   d0,.4
.5      sub.b   #1,11(a6)
.8      tst.b   11(a6)
        bhi     .7
;return(sum);
.6      move.l  #0,d0
        move.w  .4,d0
.9      movem.l (sp)+,.3
        unlk    a6
        rts
;}
.2      equ     0
.3      reg
        dseg
        end
(a) Aztec 68000 MPU V3.30c.

;main(n)
;unsigned char n;
_main:  push    bp
        mov     bp,sp
        mov     cx,word ptr 4[bp]
;{
;static unsigned short sum;
;for (sum=0;n>0;n--)
        mov     word ptr [026c],0
        jmp     L2
; {sum+=n;}
L1:     add     word ptr [026c],cx
        dec     cx
L2:     or      cx,cx
        jne     L1
;return(sum);
        mov     ax,word ptr [026c]
;}
        pop     bp
        ret
        .public _main
        .end

;:ts=8
;main(n)
;unsigned char n;
        public  main_
main_   jsr     .csav#
        fcb     .3
        fdb     .2
;{
;static unsigned short sum;
        bss     .4,2
;for (sum=0;n>0;n--)
        stx     .4
        stx     .4+1
        jmp     .6
.5      clc
        lda     #255
        ldy     #11
        adc     (4),Y
        sta     (4),Y
        lda     #255
        adc     #0
.6      ldy     #11
        lda     (4),Y
        sta     24
        stx     25
        txa
        cmp     24
        sbc     25
        jcs     .7
; {sum+=n;}
        lda     (4),Y
        sta     24
        stx     25
        clc
        lda     .4
        adc     24
        sta     .4
        lda     .4+1
        adc     25
        sta     .4+1
        jmp     .5
;return(sum);
.7      lda     .4
        sta     8
        lda     .4+1
        sta     9
        rts
;}
.2      equ     0
.3      equ     0
        public  .begin
        dseg
        cseg
        end
(c) Aztec 6502 MPU V3.20c.
Table 10.15: Compiling the same source with a spectrum of CPUs (continued next page).

        NAME    summ(18)
        RSEG    CODE(0)
        RSEG    UDATA(0)
        PUBLIC  main(150,191)
        EXTERN  ?CL6801_1_15_L07
        RSEG    CODE
* 1. main(n)
        P6801
* 2. unsigned char n;
* 3. {
main:   PSHB
        PSHA
* 4. static unsigned short sum;
* 5. for (sum=0;n>0;n--)
        CLRB
        CLRA
        STD     ?0000
?0002:  TSX
        LDAB    1,X
        PSHB
        CLRB
        TSX
        SUBB    0,X
        INS
        BCC     ?0001
* 6. {sum+=n;}
?0003:  TSX
        LDAB    1,X
        CLRA
        PSHB
        PSHA
        LDD     #?0000
        PULX
        PSHB
        PSHA
        PSHX
        PULA
        PULB
        PULX
        ADDD    0,X
        STD     0,X
        TSX
        PSHB
        PSHA
        PSHX
        PULA
        PULB
        PULX
        ADDD    #1
        PSHB
        PSHA
        PSHX
        PULA
        PULB
        PULX
        LDAB    0,X
        DEC     0,X
* 7. return(sum);
        BRA     ?0002
?0001:  LDD     ?0000
* 8. }
?0005:  PULX
        RTS
        RSEG    UDATA
?0000:  RMB     2
        END
(d) IAR 6801 MCU V1.15/MD2.

        NLIST   D
        LIST    E,L
; Version 1.5 Compiler 860818 P Code Gen 860715
; Source: summ Prog: Date: 25-MAY-1989 12:50:15
        NAME    summ
        DSEG
        DEFS    00002H
        EXTPUB  GLOB_
GLOB_:  STKLN   100H
        EXTRN   ENTZ2_
        XSEG
CONST_: CSEG
M_summ: JP      ENTZ2_
; 1. main(n)
        EXTPUB  MAIN_
; 2. unsigned char n;
; 3. {
; 4. static unsigned short sum;
; 5. for (sum=0;n>0;n--)
; 6. {sum+=n;}
; 7. return(sum);
MAIN_:  PUSH    HL
; line 3
; line 5
        XOR     A
        LD      (GLOB_+0FFFFH),A
Q_2:    LD      HL,00000H
        ADD     HL,SP
        LD      A,(HL)
        AND     A
        JR      Z,Q_1 ;
        JR      Q_3 ;
Q_4:    LD      HL,00000H
        ADD     HL,SP
        LD      A,(HL)
        DEC     A
        LD      (HL),A
        JR      Q_2 ;
Q_3:; line 6
        LD      HL,00000H
        ADD     HL,SP
        LD      C,(HL)
        LD      A,(GLOB_+0FFFFH)
        ADD     A,C
        LD      (GLOB_+0FFFFH),A
        JR      Q_4 ;
Q_1:; line 7
        LD      A,(GLOB_+0FFFFH)
        LD      L,A
        LD      H,00000H
        POP     DE
        RET
; 8. }
        DSEG
        ORG     GLOB_
        XSEG
        ORG     CONST_
        END
; Code Bytes: 49 (4) Constant Bytes: 0
; Data Bytes: 2
; Constant Bytes: 0
(e) Microtec Z80 MPU V1.5.
Table 10.15 (continued) Compiling the same source with a spectrum of CPUs.

Transputer DECODE (V1.2) of t_sum.bin
ID T8 "occam 2 V2.1"
"CC_transputer V2.0"
SC 0
TOTALCODE 148 0
STATIC 2
1 main(n)
CODESYMB "main" 00000030
71      00030   ldl  1
30      00031   ldnl 0
20 20   00032   ldnl MODNUM
BF 60   00034   ajw  -1
D0      00036   stl  0
2 unsigned char n;
3 {
4 static unsigned short sum;
5 for(sum=0;n>0;n--)
40      00037   ldc  0
70      00038   ldl  0
E1      00039   stnl 1
13      0003A   ldlp 3
F1      0003B   lb
40      0003C   ldc  0
F9      0003D   gt
A0 21   0003E   cj   00050
6 {sum+=n;}
70      00040   ldl  0
31      00041   ldnl 1
13      00042   ldlp 3
F1      00043   lb
F5      00044   add
70      00045   ldl  0
E1      00046   stnl 1
13      00047   ldlp 3
F1      00048   lb
41      00049   ldc  1
F4      0004A   diff
13      0004B   ldlp 3
FB 23   0004C   sb
0A 61   0004E   j    0003A
7 return(sum);
70      00050   ldl  0
31      00051   ldnl 1
B1      00052   ajw  1
F0 22   00053   ret
8 }
(f) Parallel C INMOS Transputer T825 V2.0.

; Compilateur C pour MC68HC16 (COSMIC-France)
        .psect  _bss
        .even
L3_sum: .byte   [2]
        .psect  _text
        .even
_main:  pshm    x,d
        tsx
        .set    OFST=0
        clrw    L3_sum
L1:     ; line 5, offset 7
        ldab    OFST+3,x
        beq     L11
        clra
        addd    L3_sum
        std     L3_sum
        dec     OFST+3,x
        bra     L1
L11:    ; line 6, offset 27
        ldd     L3_sum
        ldx     0,x
        ais     #4
        rts
        .public _main
        .end
(g) COSMIC/Intermetrics V.3.32 6816 MCU.

1 smain(n)
2 unsigned char n;
smain:  .entry  smain,^m,r2,r3.
        subl2   #4,sp
        movab   $DATA,r3
3 {
4 static unsigned short sum;
5 for(sum=0;n>0;n--)
        clrw    (r3)
        moval   4(ap),r2
        movzbl  (r2),r0
        beql    sym.2
        nop
6 {sum+=n;}
sym.1:  movzwl  (r3),r1
        movzbl  (r2),r0
        addl2   r1,r0
        cvtlw   r0,(r3)
        decb    (r2)
        movzbl  (r2),r0
        bneq    sym.1
7 return(sum);
sym.2:  movzwl  (r3),r0
        ret
8 }
(h) DEC V.2.3-024 VAX 750 minicomputer.

        NAME    test(16)
        RSEG    CODE(0)
        RSEG    DATA(0)
        PUBLIC  main
        EXTERN  ?CL6811_3_00_L07
        RSEG    CODE
        P68H11
1 main(n)
2 unsigned char n;
3 {
main:   PSHB
        PSHA
4 static unsigned short
5 for(sum=0;n>0;n--)
        CLRB
        CLRA
        STD     ?0000
?0002:  TSX
        LDAB    1,X
        CMPB    #0
        BLS     ?0001
6 {sum+=n;}
?0003:  TSX
        LDAB    1,X
        CLRA
        ADDD    ?0000
        STD     ?0000
        DEC     1,X
7 return(sum);
        BRA     ?0002
?0001:  LDD     ?0000
8 }
        PULX
        RTS
        RSEG    DATA
?0000:  FCB     0,0
        END
(i) IAR 6811 MCU V3.00E.
will only be equivalent for the latter case. Particular problems arise in recon-
ciling targets with segmented address spaces and special I/O instructions (e.g.
the 80x86 family) to code targeted to processors with a linear address space and
memory-mapped I/O (e.g. the 680x0 family).
In practice, most portability problems occur in handling I/O and files. Op-
erating systems are designed to act as an insulating layer between applications
programs (software that you write) and such considerations. Most small and
medium-sized embedded systems are self-standing, or at most a ROM-based mon-
itor may be resident.
Without this decoupling, it is likely that the designer will have to write the
startup/support code and library routines to handle, where applicable, inter-
rupts, fault response, memory management and device protocol. A good deal
of this is processor dependent, and so must be coded at assembler level, which
by definition is non-portable.
A larger embedded system may be able to support the overhead of a resident
commercial operating system. The majority of standard operating systems are
not suitable for this category of system, supposing as they do a fairly standard
computer environment. More relevant real-time systems software can be pur-
chased, but hardly add to the portability score. Sometimes a single-board com-
puter may be available which mimics a standard computer architecture, such as
an IBM PC. This can then be used in certain circumstances with a standard oper-
ating system, such as MSDOS. A ROM version of the system software is available
where a magnetic disk bulk storage unit is not required.
An embedded configuration is characterized by a rich variety of I/O devices,
such as lamps, 7-segment and alphanumeric LCD displays, switches, keypads,
analog to digital converters and many more exotic examples. Using standard I/O
library routines, such as in Table 10.14, is hardly practicable in these situations.
Instead special device drivers must be developed. These can be written in C, but
care must be taken, as machine and architectural considerations intrude at this
level. Standard ANSI and other library routines which do not access I/O can be
utilized in the normal way.
Where peripherals resembling standard computer terminals will be attached
to the system, then the ANSI I/O routines can be used in the usual way. These
routines, such as printf() and scanf() as well as file input/output make use of
the base routines putchar() and getchar(). Thus if putchar() and getchar()
Table 10.16 Tailoring the ANSI I/O functions to suit an embedded target.
int putchar(unsigned char c)
{
_asm("clr.l d0\n");
_asm("move.b 7(sp),d0 * Get c out of stack widened to int \n");
_asm("jsr 0x7F1E * OUTCH \n");
_asm("clr.l d7\n");
_asm("move.b d0,d7 * Return(c); \n");
}
int getchar(void)
{
_asm("clr.l d7\n");
_asm("jsr 0x7F00 * INCH \n");
_asm("move.b d0,d7 * Return it \n");
}
#include <maximot.h>
#include <stdio.h>
main()
{
printf("Hello world");
}
* 1 #include <maximot.h>
* 2 #include <stdio.h>
L5: ; This is the string "Hello world",0
.byte 72,101,108,108,111,32,119,111
.byte 114,108,100,0
* 3 main()
* 4 {
.even
_main: link a6,#-4 ; Open a frame to send a pointer to the string
* 5 printf("Hello world");
move.l #L5,(sp) ; Push out the pointer to the string
jsr _printf ; Goto the printf() function declared in <stdio.h>
unlk a6 ; Close down the frame
rts ; and return to the Startup routine
.globl _main
.globl _printf
(d) The resulting source code, with printf() extracted from the library.
are written to suit the target hardware, then the higher-order input/output library
routines can be used in the normal way.
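The layering can be demonstrated without any target hardware. In this sketch, which is not the library's actual implementation, a hypothetical base routine out_char() plays the part of putchar() and writes into a buffer standing in for the serial link; the higher-order routines out_string() and out_unsigned() are equally hypothetical and know nothing about the hardware.

```c
/* Hypothetical capture buffer, standing in for the serial link */
static char link_buffer[32];
static int  link_count;

/* Base routine: the only function that knows about the hardware.
   Retargeting the system means rewriting this function alone.    */
static int out_char(int c)
{
    if (link_count < (int)sizeof link_buffer - 1) {
        link_buffer[link_count++] = (char)c;
        link_buffer[link_count] = '\0';    /* Keep the buffer a valid string */
    }
    return c;
}

/* Higher-order routine built purely on the base routine */
static int out_string(const char *s)
{
    int n = 0;
    while (*s) { out_char(*s++); n++; }
    return n;                              /* Number of characters sent */
}

/* Unsigned decimal output, again using only out_char() */
static void out_unsigned(unsigned n)
{
    if (n >= 10) out_unsigned(n / 10);     /* Leading digits first */
    out_char('0' + (int)(n % 10));
}
```

Calling out_string("n=") followed by out_unsigned(35) leaves the four characters n=35 in the buffer. Replacing the body of out_char() with a call to the monitor's output subroutine would redirect both routines to the terminal with no other change.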
As an example, consider a self-standing circuit based on an embedded 68000-
MPU. This system runs under an operating system monitor which communicates
with a terminal through a bidirectional serial link through a UART. The monitor
has two subroutines to send and receive single characters along this link. Sub-
routine OUTCH is located at 7F1Eh and sends out one 8-bit character located in the
bottom byte of D0. Subroutine INCH at 7F00h waits until a character is received
and returns with it in the lower byte of D0.
The definition of the C function putchar() is:
giving the declaration int putchar(unsigned char c), where c is the character
to be sent out. The definition of this function is given in Table 10.16(a),
and simply extracts c from the System stack (seven bytes up from SP), jumps
to subroutine OUTCH and then widens and copies it to D7.L, the normal return
register for this compiler (Cosmic V3.32).
The definition of getchar() is:
References
[1] Lawrence, P and Mauch, K.; Real Time Microcomputer Systems Design, McGraw-Hill,
1987, Section 7.6.
[2] Banahan, M.; The C Book, Addison-Wesley, 1988, Section 5.6.
[3] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, Prentice-Hall,
2nd. ed., 1988, Section 5.4.
[4] Doyle, J.; C – An Alternative to Assembly Programming, Microprocessors and Mi-
crosystems, 9, no. 3, April 1985, pp. 124 – 132.
[5] Crenshaw, J.W.; Square Roots are Simple?, Embedded Systems Programming, 4, no. 1,
Nov. 1991, pp. 30 – 52.
[6] Dettmer, R.; A Movable Feast: The TDF Route to Portable Software, IEE Review, 39,
no. 2, March 18th, 1993, pp. 79 – 82.
[7] Goor, A.J. van de; Computer Architecture and Design, Addison Wesley, 1989, Sec-
tion 2.5.3.
[8] Reid, L. and McKinly, A.P.; Whitesmiths C Compiler, BYTE, 8, no. 1, Jan. 1983,
pp. 330 – 344.
[9] Johnson, S.C. and Ritchie, D.M.; Portability of C Programs and the UNIX System, The
Bell System Technical Journal, 57, no. 6, part 2, 1978, pp. 2021 – 2048.
[10] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, Prentice-Hall,
1978, Chapter 7.
[11] Kernighan, B.W. and Ritchie, D.M.; The C Programming Language, Prentice-Hall,
2nd. ed., 1988, Chapter 7 and Appendix B.
[12] Barclay, K.A.; ANSI C Problem Solving, Prentice-Hall, 1990, Appendix F.
PART III
Project in C
Preliminaries
nature of the phosphor's luminance lifetime; thus the CRT must be selected with
the application in mind. A digital solution, in conjunction with a standard CRT,
provides a much more flexible solution. Using a MPU to control the acquisition,
storage and display of the data, means that additional features, such as freeze,
back spacing and signal processing, can also be accomplished. Furthermore, once
the data is in situ it can be used in ways not related to the display function.
In essence, we need to continuously sample the signal at a suitably slow rate,
while concurrently scanning and displaying several seconds' worth of past data at
a faster rate suitable for the human eye. A typical sequence, showing the resulting
scrolling trace, is shown in Fig. 11.2. This diagram shows file snapshots taken
at $\frac{1}{4}$ window intervals. A window here is defined as the time past shown on the
display. The most historical data is shown to the left of the display, and this
results in the trace scrolling to the left as new data is acquired. In implementing
this process for our project, we will have created a time-compressed memory.
Unlike Fig. 11.1, this technique does not rely on the phosphor luminance lifetime.
11.1 Specification
The customer specification is the rock on which the enterprise is built. As such,
it should be treated with the same respect afforded to the foundation of any
building.
The product request will normally originate either from the customer, or as
a projected need from marketing personnel. Unless the objective is the exact
replacement of a product already on the market, for example a central heating
controller, such a request is likely to be couched in the language of the application
rather than in technical terms. There will be obvious boundary constraints of both
a financial and technical nature, but other concerns may well involve complying,
say, with legal rulings, such as medical safety requirements.
In essence, the design team must tease as much information as possible from
the originator; take away the request and return with a set of proposals. This will
involve consideration of the following questions:
• What is it to do?
• Is it possible?
• How is it to be done?
• Can the request be modified?
1: Input
Three-lead ECG/EKG signal with integral amplifier having a bandwidth of 0.14 Hz
to 50 Hz.
2: Output
100 mm (4 in) width standard CRT, displaying a nominal two seconds worth of
data.
3: Data accuracy
±0.5% of full scale.
4: Display resolution
Better than 0.5 mm.
5: Facilities
Freeze on demand. Sampling variation of −50% to +100% around nominal.
The average adult has a resting heartrate of 72 beats per minute (0.83 Hz), with
a variation between 40 and 180 beats per minute over all conditions. Although the
frequency range is essentially contained in the range 0.14 Hz to 50 Hz [1], most of
the energy lies below 20 Hz. Thus the 128 Hz sampling rate will give at least six
samples per cycle, which is just adequate for reasonable visual representation.
Increasing the sampling rate to, say, 512 Hz, would require a 1024-word data
store and consequently a 10-bit digital to analog converter. Furthermore, eight
samples per scan would be needed to keep the dot rate on the screen the same.
Remember that the dot rate is the time used by the MPU to get and send out the
new X and Y values to the CRT amplifiers.
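The arithmetic behind these figures can be checked with a short sketch. The function names and the WINDOW_SECONDS constant are illustrative assumptions, not part of the project code. The store size is taken as the sampling rate multiplied by the nominal two-second display window, and the X-axis D/A converter is assumed to need enough bits to address every stored sample.

```c
/* Illustrative sketch only: all names here are hypothetical. */
#define WINDOW_SECONDS 2u   /* nominal display window from the specification */

/* Number of samples held in the data store for one display window. */
unsigned int store_size(unsigned int sample_rate_hz)
{
    return sample_rate_hz * WINDOW_SECONDS;
}

/* Smallest number of bits able to address 'count' distinct samples. */
unsigned int bits_needed(unsigned int count)
{
    unsigned int bits = 0;
    while ((1u << bits) < count)
        bits++;
    return bits;
}

/* Whole samples per cycle of a signal component at signal_hz. */
unsigned int samples_per_cycle(unsigned int sample_rate_hz,
                               unsigned int signal_hz)
{
    return sample_rate_hz / signal_hz;
}
```

With these definitions a 128 Hz rate gives a 256-word store addressed by 8 bits and six whole samples per 20 Hz cycle, whereas 512 Hz gives 1024 words and 10 bits.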
Of course designing a prototype and subsequent modifications is only the
beginning of the process. Setting up a production line is expensive, and the al-
ternative of subcontracting all or part of this activity is one of the major design
decisions that will be taken at this point. With the assumption of in-house man-
ufacture, which is only really feasible for large scale production, the next stage is
the construction of several preproduction prototypes. In making a few units, as
if for sale, the production team will be verifying that the system can be economi-
cally built on an assembly line. Electronic devices are relatively standard, but me-
chanical components, such as printed-circuit boards, switches, connectors, case
and artwork are somewhat variable. Decisions must be made regarding methods
of construction, second-sourcing of components, stock levels and even down to
whether to use surface mount or sockets for the integrated circuits. Just as im-
portant, but often overlooked, is how and when to test components, subsystems
and the final product.
The production literature covers assembly details and wiring patterns. In
some cases programs for computer-aided manufacture (CAM) facilities will be
covered under this heading. Included in this category is the testing documenta-
tion. This may be either a tester's manual or software for automatic test equip-
ment (ATE).
Post-production documentation covers service manuals and of course the user's
handbook. The quality of this material will often add considerably to the customer's
satisfaction, which hopefully will enhance the reputation of
the manufacturer and eventually increase sales.
and digital processes. For example, should an input signal be filtered before the
A/D conversion (analog filter) or after (a digital software filter)?
At the digital end of things, the choice essentially lies between random logic
(hard-wired digital combinational and sequential circuitry) and programmable
logic (microprocessor-based software-directed hardware). Conventional logic is
often best for small systems with few functions, which are unlikely to require ex-
pansion. Indeed the present project was based on a random-logic time-compressed
memory predecessor. In larger mass-produced products, this type of logic may
appear in the guise of programmable arrays, semi and fully custom-designed in-
tegrated circuits.
Microprocessors work sequentially doing one thing at a time, while random
logic can process in parallel. Thus, where nanosecond speed is important, con-
ventional logic is indicated (but note that analog electronics is even faster). It is
possible to run many microprocessor chips in parallel, the transputer being the
seminal example. The conventional approach uses mixed logic with a micropro-
cessor in a supervisory role controlling the action of supporting random logic
and analog circuitry.
In keeping with the objective of this book, we choose a microprocessor-based
implementation. In such cases, the processing tasks must be partitioned between
hardware and software. As an example, consider an extension to our specifica-
tion, where the time between ECG/EKG peaks is continually measured, and is to
be displayed on a separate alphanumeric readout. Now we have a choice between
using an expensive intelligent display, which incorporates an integral ASCII de-
coder [3], or a cheaper dumb display, where the segment patterns are picked out
by software. The former will cost more on a unit basis, but the latter will require
money before the product is launched, to design the software-driver package.
This of course is a fairly trivial example, but in general hardware is available off
the shelf and therefore has a low initial design cost and takes some load off the
central processor. At this level, software is rarely obtainable off the shelf and re-
quires initial investment in a (fairly highly paid) software engineer, but is usually
more flexible than a hardware-only solution. In some cases, technical consid-
erations rule out one or other approach. Thus in our example, it is likely that
the processor will not have time both to display the waveform and to pick out
the peak (a more difficult task than it seems) in software. Hence, external peak-
picking hardware is indicated, as shown in Fig. 6.1. Of course, this hardware could
be another MPU running a peak-detection software routine! Thus, when techni-
cally feasible, a software-oriented solution is indicated for large production runs,
where the initial investment is amortized by a lower unit cost.
With a provisional task allocation between hardware and software, what choices
has the designer available in implementing the hardware? There are three main
approaches to the problem.
In situations where the ratio of design cost to production numbers is poor,
a system implementation should be considered. This entails using a commer-
cial microcomputer, such as an IBM PC, as the processor. Such instruments are
normally sold with keyboard, VDU, magnetic and random access memory, which
are sufficient for most tasks. Ruggedized rack mounting industrial and portable
versions are also available. Generally, the hardware engineer will be concerned
only to customize the system, by designing specialist supporting hardware and
interface circuitry. The software engineer will create a software package based
on this microcomputer, which will drive the hardware. The microcomputer will
support commercially available development packages, such as editors, assem-
blers, compilers and debuggers, which facilitates the software design process at
low cost.
Tailoring a general purpose machine to a semi-dedicated role requires a rela-
tively low investment up-front and low production expenses. Furthermore, doc-
umentation and the provision of service facilities are eased, as a pre-existing
commercial product is used. Technically this type of implementation is bulky,
but where facilities such as a disk drive and VDU are needed, the size and unit
cost are not necessarily greater than a custom-designed equivalent. Sometimes
the customer may already possess the microcomputer; the vendor simply selling
the hardware plug-in interface and software package. This can be an attractive
proposition for the end user.
Thus a system-level implementation is indicated when low-to-medium produc-
tion runs are in prospect and the system complexity is high; for example com-
puterized laboratory equipment. For one-offs this approach is the only economic
proposition, provided that such a system will satisfy the technical boundary con-
straints. For instance, it would be obviously ridiculous (but technically feasible)
to employ this technique for a washing-machine controller.
At the middle range of complexity, a system may be constructed using a
bought-in single-board computer (SBC). Sometimes several modules are used (e.g.
MPU, memory, interface), and these are plugged into a mother-board carrying
a bus structure. If necessary these may be augmented with in-house designed
cards to complete the configuration.
Although the cost of these bought-in cards is many times that of the ma-
terial cost of the self-produced equivalent, they are likely to be competitive in
production runs of up to around a thousand. Like system-implemented config-
urations, they considerably reduce the up-front hardware expenses and do not
require elaborate production and test facilities. By shortening the design time,
the product can be marketed earlier, and subsequently the economics improved
by substitution of in-house boards. Whilst more expensive than a system based
on a commercial microcomputer, a board-level implementation gives greater flex-
ibility to configure the hardware to the specific product needs. Furthermore, in a
multiple-card configuration, at least some of the standard modules can be used
for more than one product (e.g. a memory card) thus gaining the cost benefits of
bulk buying. Reference [4] gives an example of this approach.
Neither system nor board-level implementations provide an economical means
of production for volumes much in excess of a thousand, with the exception of
high complexity-value products. In many cases, technical demands, such as size
and speed, preclude these techniques even for small production runs. In such
situations, a fundamental chip-level design, as outlined in Fig. 11.5, is indicated.
References
[1] Riggs, T., et al.; Spectral Analysis of the Normal Electrocardiogram in Children and
Adults, J. Electrocardiology, 12, no. 4, 1979, pp. 377 – 379.
[2] Wilcox, A.D.; 68000 Microcomputer Systems, Prentice-Hall, 1987, Part 1.
[3] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987, Appendix 3.
[4] Blasewitz, R.M. and Stern, F.; Microcomputer Systems, Hayden, 1987, Section 9.5.
[5] Cahill, S.J.; The Single-Chip Microcomputer, Prentice-Hall, 1987.
CHAPTER 12
Most real-world parameters are analog in nature. Some examples are tempera-
ture, pressure and light intensity. An analog parameter is a continuum — limited
in practice between an upper and lower level. Thus a dry-bulb thermometer can
be read to whatever resolution is necessary, between, say, −10 °C and +180 °C. Below
this, the mercury disappears into its bulb, and above this the top of the tube
is blown off! Theoretically, the quantum nature of matter sets a lower limit to the
continuous nature of things, but in practice noise levels and the limited accuracy
of the device generating the signal sets a realistic upper limit to resolution.
Digital circuitry deals with patterns of symbols, which represent amplitudes.
Depending on the number and type of digits making up the pattern, only a finite
total of values are possible. Most of us tend to use denary (decimal) digits to
represent our numbers, while computers prefer binary. Patterns made up of
eight binary digits can represent up to $2^8$ (256) discrete values, whilst 16 bits can
resolve down to $\frac{1}{2^{16}}$ ($\frac{1}{65536}$) of full scale (see Table 12.1).
Physically, bits are represented in hardware as two values of a signal parame-
ter. Most commonly this is voltage, and typical ranges are 0 – 0.8 V and 2.0 – 5.0 V
for logic 0/logic 1 respectively. Other values, such as ±12 V, and parameters such
as frequency, for instance 1.2/2.4 kHz, are in regular use. Digital signals are in
fact analog signals with analog characteristics, such as finite transition times
and noise. Although digital systems designers must be cognizant of these sig-
nal properties, they are normally regarded as secondary effects, caused by the
intrusion of an imperfect world!
Given that we wish to do our processing using digital techniques, in our case
a microprocessor, conversion to and from analog signals is necessary. The aim
of this chapter is to overview this process, with an eye to our specific project.
As well as the A/D and D/A converters themselves, we will look at some of the
consequences of the digitization of analog signals.
12.1 Signals
Our project specifically targeted the adult ECG/EKG signal as the system input.
The physical origins of this signal and the use of electrodes as the sensor are
outside the scope of this text; the interested reader is directed to references [1, 2]
for this information. The most common configuration measures the potential
difference across the chest, the LA (left arm) and RA (right arm) leads in Fig. 11.3.
The RL (right leg) is used as the reference point. As the RA–LA potential is rarely
more than a few millivolts, the following amplifier must provide differential gain
to the order of 1000 (60 dB). However, there is more to this stage than just gain.
A good ECG/EKG amplifier must have the following properties, in addition to the
usual requirements of linearity, slew rate and frequency response:
1. A common-mode rejection ratio (differential gain/common-mode gain) of at
least 80 dB. Common-mode signals arise from interference from external sources,
typically mains hum. Such extraneous signals are important, as they are usu-
ally much greater than the signal of interest. Signals appearing on both leads
should not affect a differential amplifier's output, but in practice there will be
some feedthrough.
2. Suppression of baseline drift. Large, essentially d.c., voltages can appear
across the electrodes, due to electrolytic action at the skin interface. These
are not constant, but change slowly with time. Straight amplification by 1000
would cause the amplifier to saturate.
3. As a safety requirement, leakage current between electrodes, that is through
the body, to be less than 10 µA [2]. Because of this, an isolating amplifier is
recommended. This uses a front end with optical or transformer coupling to
the normally-powered main gain block. This front end can either be battery
powered or supplied through an isolating power supply (sometimes integral
with the amplifier).
4. Protection against pacemaker spikes and, if applicable, defibrillator surges of
around 25 kV!
Because of safety considerations [2], I have resisted the temptation of describ-
ing an ECG/EKG amplifier here. A range of commercially available isolating am-
plifiers is available for biomedical applications, a typical example being the Burr
Brown ISO100P. Either disposable or reusable silver/silver chloride electrodes are
available from any medical supply house [3]. If necessary shave and clean the skin
with surgical spirit before application. However, all the hardware and software
to be presented can be fully and safely tested in comfort using a sinusoidal or
function generator.
Auxiliary circuits, such as filters, level shifters and sample and hold circuits,
lumped together in Fig. 11.3 as the signal conditioning process, are discussed at
the appropriate point later.
Quantizing a signal obviously distorts the original information. In essence,
quantizing is the comparison of the analog quantity with a fixed number of levels.
The nearest level is then the value taken in expressing the original in its digital
equivalent. Thus in Fig. 12.1, an input voltage of 0.4285 full scale is 0.0536 above
quantum level 3 and 0.0714 below level 4. Its quantized value will then be taken
as level 3 and coded as 011b in a 3-bit system.
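The nearest-level rule can be sketched in a few lines of C. The helper names are hypothetical, and the input is assumed to be a non-negative fraction of full scale. The function rounds to the nearest of the $2^n$ levels and clamps at the top level.

```c
/* Quantize a non-negative fraction of full scale to the nearest of
   2^bits levels, clamping at the highest level. Hypothetical helper. */
unsigned int quantize(double fraction, unsigned int bits)
{
    unsigned int levels = 1u << bits;
    unsigned int level = (unsigned int)(fraction * levels + 0.5);
    if (level > levels - 1u)
        level = levels - 1u;
    return level;
}

/* Error remaining after quantization, as a fraction of full scale. */
double quantize_error(double fraction, unsigned int bits)
{
    return (double)quantize(fraction, bits) / (double)(1u << bits) - fraction;
}
```

For the 3-bit case above, an input of 0.4285 full scale maps to level 3, leaving an error of a little over −0.053 of full scale.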
The residual error of −0.0536 will remain as quantizing noise, and can never
be eradicated (see Fig. 12.2(d)). The distribution of quantization error is given at
the bottom of Fig. 12.1, and is affected only by the number of levels. This can
simply be calculated by evaluating the average of the error function squared. The
square root of this is then the root mean square (r.m.s.) of the noise.
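$$F(x) = -\frac{L}{X}x + \frac{L}{2}$$

The mean square is:

$$\frac{1}{X}\int_0^X F^2(x)\,dx = \frac{1}{X}\int_0^X \left(\frac{L^2}{X^2}x^2 - \frac{L^2}{X}x + \frac{L^2}{4}\right)dx$$

$$= \frac{1}{X}\left[\frac{L^2}{3X^2}x^3 - \frac{L^2}{2X}x^2 + \frac{L^2}{4}x\right]_0^X = \frac{L^2}{12}$$

Giving an r.m.s. noise value of $\frac{L}{\sqrt{12}} = \frac{L}{2\sqrt{3}}$.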
A fundamental measure of a system's merit is the signal to noise ratio. Taking
the signal to be a sinusoidal wave of peak to peak amplitude $2^n L$ (see Fig. 12.2),
we have an r.m.s. signal of $\frac{\text{peak}}{\sqrt{2}}$, that is $\frac{2^n L}{2\sqrt{2}}$. Thus for a binary system with
n binary bits, we have a signal to noise ratio of:

$$\frac{2^n L / 2\sqrt{2}}{L / \sqrt{12}} = \frac{2^n \sqrt{12}}{2\sqrt{2}} = 1.22 \times 2^n$$

In decibels we have:

$$20 \log (1.22 \times 2^n) = 1.76 + 6.02n \text{ dB}$$
The dynamic range of a quantized system is given by the ratio of its full scale
($2^n L$) to its resolution, L. This is just $2^n$, or in dB, $20 \log 2^n = 20n \log 2 = 6.02n$.
The percentage resolution given in Table 12.1 is of course just another way of
expressing the same thing.
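Evaluating these expressions for a few word lengths shows how quickly the figures grow. The sketch below is illustrative only, with hypothetical names. It uses the rounded constants 6.02 dB per bit ($20 \log 2$) and 1.76 dB ($20 \log 1.22$, for a full-scale sinusoid), so no maths library is needed.

```c
/* Rounded constants: 20*log10(2) and 20*log10(1.22). */
#define DB_PER_BIT   6.02
#define SINE_OFFSET  1.76

/* Signal to quantization-noise ratio in dB for a full-scale sinusoid. */
double snr_db(unsigned int bits)
{
    return DB_PER_BIT * bits + SINE_OFFSET;
}

/* Dynamic range (full scale over one quantum) in dB. */
double dynamic_range_db(unsigned int bits)
{
    return DB_PER_BIT * bits;
}
```

A 3-bit system comes out at roughly 20 dB signal to noise ratio, an 8-bit system at roughly 50 dB and a 16-bit system at roughly 98 dB.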
The exponential nature of these quality parameters with respect to the number
of binary-word bits is clearly seen in Table 12.1. However, the implementation
complexity and thus price also follows this relationship. For example, a 20-bit
conversion of 1 V full scale would have to deal with quantum levels less than 1 µV
apart. Compact disks use 16-bit technology for high quality music. Pulse-code
modulated telephonic links use eight bits, but the quantum levels are unequally
spaced, being closer at the lower amplitude levels. This reduces quantization
hiss where conversations are held in hushed tones! Linear 8-bit conversions are
suitable for most general purposes, having a resolution of better than $\pm\frac{1}{4}$%. Actually
video looks quite acceptable at a 4-bit resolution, and music can just be
heard using a single bit (i.e. positive or negative)!!
The analog world treats time as a continuum, whereas digital systems sample
signals at discrete intervals. The sampling theorem [4] states that provided this
interval does not exceed half the period of the highest signal frequency, then no
information is lost. The reason for this theoretical twice highest frequency sampling
limit, called the Nyquist rate, can be seen by examining the spectrum of a train
of amplitude modulated pulses. Ideal impulses (pulses with zero width and unit
area) are characterized in the frequency domain as a series of equal-amplitude
harmonics at the repetition rate, extending to infinity [5]. Real pulses have a
similar spectrum but the harmonic amplitudes fall with increasing frequency.
If we modulate this pulse train by a baseband signal $A \sin \omega_f t$, then in the
frequency domain this is equivalent to multiplying the harmonic spectrum (the
pulse) by $A \sin \omega_f t$, giving sum and difference components thus:

$$A \sin \omega_f t \times B \sin \omega_h t = \frac{AB}{2}\left(\cos(\omega_h - \omega_f)t - \cos(\omega_h + \omega_f)t\right)$$
More complex baseband signals can be considered to be a band-limited (fm )
collection of individual sinusoids, and on the basis of this analysis each pulse
harmonic will sport an upper (sum) and lower (difference) sideband. We can see
from the geometry of Fig. 12.2(b) that the harmonics (multiples of the sampling
rate) must be spaced at least 2 × fm apart, if the sidebands are not to overlap.
A low-pass filter can be used, as shown in Fig. 12.2(d), to recover the baseband
from the pulse train. Realizable filters will pass some of the harmonic bands,
albeit in an attenuated form. A close examination of the frequency domain of
Fig. 12.2(d) shows a vestige of the first lower sideband appearing in the pass
band. However, most of the distortion in the reconstituted analog signal is due
to the quantizing error resulting from the crude 3-bit digitization. Such a system
will have a S/N ratio of around 20 dB.
In order to reduce the demands of the recovery filter, a sampling frequency
somewhat above the Nyquist limit is normally used. This introduces a guard
band between sidebands. For example the pulse code telephone network has an
analog input bandlimited to 3.4 kHz, but is sampled at 8 kHz. Similarly the audio
compact disk (CD) uses a sampling rate of 44.1 kHz, for an upper music frequency
of 20 kHz. This means that with a 16-bit sample and 70 minute play period, a CD
must store around 3000 Mbits per channel!
A more graphic illustration of the effects of sampling at below the Nyquist
rate is shown in Fig. 12.3. Here the sampling rate is only 0.75 of the baseband
frequency. When the samples are reconstituted by filtering the resulting pulse
train, the outcome, shown in Fig. 12.3(b), bears no simple relationship to the
original. This spurious signal is known as an alias.
Returning now to our project, we have established that the sampling rate will
be 128 per second. As the baseband of interest is limited to 20 Hz, this seems
to give us a Nyquist margin of around 300%. However, the ECG/EKG signal does
have components beyond 1 kHz; and noise, both from external and internal (e.g.
muscle noise) sources, will have a spectrum extending well above the Nyquist
limit. Thus, an anti-aliasing filter will be required as part of the front end signal
conditioning process.
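The need for the filter can be illustrated by working out where an out-of-band component would reappear after sampling. The helper below is a hypothetical sketch using whole hertz only: it folds the input frequency into one sampling interval and reflects it about half the sampling rate.

```c
/* Apparent frequency, after sampling at fs_hz, of a component at f_hz.
   Whole hertz only; fs_hz must be non-zero. Hypothetical helper. */
unsigned int alias_hz(unsigned int f_hz, unsigned int fs_hz)
{
    unsigned int r = f_hz % fs_hz;            /* fold into one sampling interval */
    return (2u * r > fs_hz) ? fs_hz - r : r;  /* reflect about fs/2 */
}
```

At 128 samples per second a 50 Hz component is unaffected, but unfiltered 100 Hz interference would appear at 28 Hz and a 1 kHz component at 24 Hz, both close to or inside the band of interest. Sampling a 100 Hz signal at 75 Hz, the 0.75 ratio of Fig. 12.3, gives a 25 Hz alias.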
Many filter designs exist. That shown in Fig. 12.4 is a 4th-order Butterworth
low-pass filter using the multiple-feedback configuration. The overall gain in the
passband is designed to be unity, with the −3 dB frequency at 24 Hz. Design
equations and other relevant data are given in reference [6]. In practical situations,
component tolerances will cause wide deviations from this figure; this is
especially true of the large capacitors used for low frequencies. Reference [6]
gives a tuning procedure using R1 R3, where more precise results are required.
The transfer characteristic of Fig. 12.4(b) is a real transfer of an untuned cir-
cuit, using 0.1% resistors and ±1% capacitors. Actual preferred values, as shown
bracketed, were used. From this characteristic, the gain at the Nyquist frequency
of 64 Hz is 32 dB down from the passband.
the virtual earth of an operational amplifier's summing junction, will give a com-
posite analog output, which is a function of the digital pattern and the resistor
values. Thus using an 8 kΩ resistor driven by b0, 4 kΩ driven by b1, 2 kΩ by b2
and 1 kΩ by b3 will feed currents in ascending orders of two into the summing
junction.
In a practical situation, the use of weighted resistors leads to severe accuracy
problems. These are the result of the wide range of resistance values; for
example, in a 12-bit system, if b11 switches in 1 kΩ then b0 will have to switch
in a 2.048 MΩ resistor. As well as the problem of matching the ratio of all these
resistors, the precision analog switches have to carry an equally wide spread of
currents.
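The spread of values is easily tabulated. The sketch below is illustrative only and uses a hypothetical helper name. It takes bit 0 as the least significant bit, as in the 4-bit example, so that each less significant bit doubles the resistance in order to halve the current.

```c
/* Resistor for bit k (0 = least significant) of an n-bit weighted-resistor
   converter whose most significant bit switches r_msb_ohms.
   Hypothetical helper; requires k < n. */
unsigned long weighted_resistor_ohms(unsigned int k, unsigned int n,
                                     unsigned long r_msb_ohms)
{
    return r_msb_ohms << (n - 1u - k);
}
```

For four bits and a 1 kΩ most significant resistor this reproduces the 8 kΩ, 4 kΩ, 2 kΩ and 1 kΩ values; for twelve bits the least significant resistor becomes 2.048 MΩ, a ratio of 2048 to 1.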
One way around the matching problem is to use a ladder network, such as
shown in Fig. 12.5(a). Looking left at node A we `see' a resistance 2R. At node B
$$I_O = \frac{I_{ref}}{16} \sum_{k=0}^{3} b_k \times 2^k \quad \text{where } b_k = 1 \text{ or } 0$$
constant offset and gain error. Unlike the non-linear errors, both these linear
errors may be trimmed out using the operational amplifier buffers.
Manufacturers specify their error figures in different ways. For example the
AD7528's relative accuracy is measured as the maximum deviation of any code
from the ideal, with offset and gain errors eliminated. Depending on the version,
this is given as ±1 bit and $\pm\frac{1}{2}$ bit. Differential non-linearity is the maximum
difference between the ideal 1-bit change expected between any two adjacent
codes and the actual measured value. This is given as ±1 bit, and therefore the
converter is guaranteed monotonic (just!). Gain error is the worst case full-scale
error due to offset and gain tolerance. It can be as high as ±6 bits, but is easily
trimmed out if need be.
The output port chosen for our project is the Analog Devices AD7528 dual
8-bit D/A converter. This provides for both X and Y analog channels in the one
device. The AD7528 is microprocessor-compatible, with two integral 8-bit trans-
parent latches. Besides any necessary operational amplifier network, only an
external precision reference voltage is required.
The heart of each converter is a current R-2R ladder network, which is essen-
tially an 8-bit version of Fig. 12.5(a). From Fig. 12.8 we see that an additional R
resistor connected to the output node is also provided (at pins 3 and 19), designed
to be used as the feedback resistor of the first amplifier stage of Fig. 12.5(b). Its
use is illustrated in Fig. 12.9.
The original specification in Section 11.1 called for 0.5% resolution and ±0.5%
full-scale accuracy. An 8-bit system gives $\frac{1}{256} \approx 0.4\%$ resolution, thus a ±1 bit
non-linear accuracy will fall within our target.
From Section 11.1, we have estimated a data rate interval to both D/A converters
of 56 µs. The settling time for a step between zero and full current
(00000000b ↔ 11111111b) is 400 ns maximum to within $\frac{1}{2}$ bit of the final
value (supply voltage V DD of +5 V and V ref of +10 V). This is around 0.7% of the
step period. Of course the amplifier circuitry converting this current to voltage
will worsen this figure.
Both channels are independent, with separate analog sections and 8-bit trans-
parent latches driving the ladder switches. DACA/DACB (pin 6) directs data from
the MPU bus through to the appropriate 8-bit latch whenever both the Chip_Select
(CS) and Write (WR) pins are low. In driving DACA/DACB from a0, the AD7528
looks to the MPU as two ordinary 8-bit digital output ports located at adjacent
Figure 12.8 The AD7528 dual D/A converter. Reprinted with the permission of Analog Devices, Inc.
addresses. An address decoder line enables the CS input, whilst a strobe of some
kind activates WR when sending data. In the 6809 MPU, the Q inverted clock
is normally used for this purpose (see Fig. 1.7), whilst DS fulfils this role for the
68008 device. The 68000 MPU would use UDS or LDS as appropriate, see Fig. 3.10.
With a V DD of +5 V, all digital signals are TTL and therefore MPU compatible. The
AD7528 response times are (just!) compatible with a 2 MHz 6809 and no wait-
state 8 MHz 68000/68008.
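Seen from C, driving the converter then amounts to two byte writes. The following is a minimal sketch and not the project's driver: the function name is hypothetical, the base address is passed in rather than fixed, and the choice of channel A for X and channel B for Y is an assumption made for illustration.

```c
/* Write one display point to a dual D/A converter mapped at two adjacent
   addresses. 'dac' points at the base address of the device; the X/Y
   channel assignment is an assumption of this sketch. */
void dac_write_xy(volatile unsigned char *dac,
                  unsigned char x, unsigned char y)
{
    dac[0] = x;   /* a0 = 0: DACA/DACB low, data goes to latch A  */
    dac[1] = y;   /* a0 = 1: DACA/DACB high, data goes to latch B */
}
```

The volatile qualifier tells the compiler that each store has a side effect outside the program, so neither write may be optimized away or reordered with respect to the other.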
The reference voltage must be supplied externally. In Fig. 12.9, I have used
the Plessey ZN040 4.01 V ±1% bandgap voltage reference IC for this purpose.
With the amplifier configuration shown, this gives a full-scale voltage output
of nominally +4 V. The actual voltage may be trimmed over the range ±5% by
connecting pin 2 to the center tap of a 100 kΩ potentiometer across pins 1 and 3.
The choice of reference voltage is fairly arbitrary in our case, as both channels are
to drive input amplifiers on an oscilloscope. If necessary, a Zener diode will act as
a reasonable substitute (anode to top) with 5.6 V having the lowest temperature
coefficient. The series resistance of 1.2 kΩ gives a bias current of around 9 mA.
The minimum current is given as 150 µA, with a maximum of 75 mA. This would
also be suitable for a 5.6 V Zener diode.
V ref can vary over the range ±10 V. By choosing a negative value, the single
amplifier/channel configuration used in Fig. 12.9 gives a positive unipolar range.
The internal feedback resistor has been used with an external 33 pF polystyrene
capacitor in parallel to stabilize the amplifier's high frequency behavior. The
TL082 operational amplifiers feature a typical slew rate of 13 V/µs (minimum
8 V/µs). Thus a full-scale swing of 4 V will take around 300 ns. Any operational
amplifier can be used here, but note that the general purpose 741 type has a
slew rate of only 0.5 V/µs. Analog power supplies of between ±8 V to ±15 V are
suitable, and can be used for the reference voltage IC bias.
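The slew-rate figures can be compared directly with the 56 µs interval between points. The helper below is a hypothetical sketch in integer arithmetic; it takes the swing in millivolts and the slew rate in millivolts per microsecond, so that 13 V/µs is entered as 13000.

```c
/* Time in nanoseconds to slew through swing_mv millivolts at a slew rate
   of rate_mv_per_us millivolts per microsecond. Hypothetical helper. */
unsigned long slew_time_ns(unsigned long swing_mv,
                           unsigned long rate_mv_per_us)
{
    return (swing_mv * 1000UL) / rate_mv_per_us;
}
```

A 4 V swing takes about 307 ns at the typical 13 V/µs and 500 ns at the minimum 8 V/µs, whereas a 741 at 0.5 V/µs would need 8 µs, around 14% of the interval.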
Ideally the analog ground should be run directly back to the power supply
common point, rather than make a direct connection to the noisy digital ground.
Where this is done, it is recommended that two back to back signal diodes be
connected between them, close to the IC. This reduces the chance of transient
voltages injecting noise into the system.
Testing the D/A converter dynamically is covered in Section 15.2. A simple
static test is possible before connecting the digital signals to the system. Firstly
meter the V ref inputs and power supplies. Then keeping pins 6, 15 and 16 to
digital ground, as well as all eight data lines, check output A is close to analog
ground. Bring DB7 to V DD. Now output A should be $\frac{1}{2}$ V ref. With all DB lines logic 1,
the output voltage will be ≈ V ref. Repeat with pin 6 at logic 1, for output B. The
deselected channel will retain its last value. Connecting all DB lines together to a
TTL-compatible square wave generator and monitoring the analog outputs is an
alternative test, which also checks delay and slew rate parameters.
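The meter readings to expect in the static test follow from the ideal 8-bit relationship, output magnitude = V ref × code/256. The sketch below uses a hypothetical helper name, works in millivolts and ignores polarity, offset and gain error.

```c
/* Ideal output magnitude in millivolts for an 8-bit code:
   Vref * code / 256. Hypothetical helper. */
unsigned long dac_output_mv(unsigned char code, unsigned long vref_mv)
{
    return (vref_mv * code) / 256UL;
}
```

With a 4000 mV reference, code 00h gives 0 mV, 80h (DB7 only) gives 2000 mV and FFh gives 3984 mV, which is one quantum short of the reference.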
Handle carefully to avoid electrostatic discharge damage, and do not insert
into a powered socket.
The former is initialized to zero (nothing on the pan) and the latter to 80h (largest
known weight, 10000000b).
The while loop adds the test weight to the trial pattern, sends it out to the
D/A converter and checks the comparator output. If this is logic 1 (i.e. in line 13,
the contents of the address comparator ANDed with 10000000b is non-zero;
*comparator & 0x80) then the test weight is removed (subtracted); otherwise
it is left as part of the aggregate. Each new test weight is generated by shifting
weight right once. As weight has been declared unsigned, this should be a
Logic Shift Right operation (but strictly is compiler dependent). After eight passes
through the loop, the state of the aggregate is the digital equivalent to V in.
Although the circuit of Fig. 12.11 works, it is fairly slow. Typically an 8-bit
conversion will take around 100 µs at best. A wide range of stand-alone succes-
sive approximation converters are available to interface directly to MPU buses,
analog_in()
{
    unsigned char * const d_a = d_a_address;
    unsigned char * const comparator = comp_address;
    register unsigned char digital = 0;     /* The digital trial             */
    register unsigned char weight = 0x80;   /* The walking weight            */
    while (weight != 0)
        {
        digital += weight;                  /* Add weight to trial           */
        *d_a = digital;                     /* Send out to d_a converter     */
        if (*comparator & 0x80)             /* IF too big                    */
            {digital -= weight;}            /* THEN remove weight from trial */
        weight >>= 1;                       /* weight divided by 2           */
        }
    return (digital);                       /* Nearest approximation returned*/
}
Figure 12.12 Functional diagram of the AD7576 A/D converter. Reprinted with the permission of
Analog Devices Inc.
For our project we have chosen to use the Analog Devices AD7576 8-bit A/D
converter, as outlined in Fig. 12.12. The AD7576 is a monolithic device contain-
ing all the digital and analog circuitry necessary to implement the successive
approximation strategy. In Fig. 12.12, the block labelled SAR is the Successive
Approximation Register, holding the bit pattern trial as it is built up (digital
in Table 12.2). The Control Logic box sequentially sets each flip flop in the SAR,
clearing it shortly after, if the comparator (COMP) indicates that the analog output
of the D/A converter (DAC) is above the input (Ain). The timing of this sequence
is a function of the internal Clock Oscillator box, whose frequency is controlled
by CR components at pin 5. The minimum conversion time is given as 10 µs. An
external oscillator may alternatively be used to drive pin 5, and in this situation
2 MHz gives the 10 µs minimum conversion time.
The AD7576 operates in two modes, depending on the state of the MODE
input. If pin 3 is high, then a low-going signal at the RD pin begins the conversion
process, provided that the device is enabled (CS = 0). BUSY goes low during
this process, and returns high when it has been completed. The new data is
transferred to the internal latch register on the rising edge of BUSY. These latches
are interfaced to the data bus via integral 3-state buffers, which are enabled when
RD (i.e. Read) is low and the device is enabled. Thus the RD control is a dual
purpose Start Convert and Read function; that is, in reading data, a new conversion
is automatically initiated.
The interface diagram of Fig. 12.13(a) uses the AD7576 in its asynchronous
mode (pin 3 low). Here the A/D converter performs continuous conversions. Data
in the output latches is always valid, and can be used (RD and CS low) at any time.
With the clock CR components shown, the data is never more than 10 µs out of
date.
The AD7576 is powered by a single +5 V supply. As this will probably be com-
mon with the logic supply, it should be decoupled to analog ground as close to
the device as possible, with a recommended 47 µF tantalum capacitor in parallel
with a 0.1 µF ceramic capacitor. This supply, and analog ground, should be run
directly back to the power supply.
The internal D/A converter requires an external V ref of 1.23 V ±5%. This is pro-
vided from an AD589 bandgap reference, and should be decoupled in the same
way. With this value of V ref, full scale at 2V ref is 2.46 V, giving a nominal 10 mV
resolution. The internal D/A converter suffers from the same errors discussed
in Section 12.2. The resulting non-linear error (the relative accuracy) is either ±1
or ±½ bit maximum, depending on the device selection type. This is within our
specification.
The input analog range is unipolar 0 – 2V ref. The simple operational amplifier
network shown in Fig. 12.13(b) will convert a bipolar input to the necessary range,
by adding a constant bias. This offset may be alternatively incorporated into the
anti-aliasing filter. The resulting code is in offset binary form and can, if neces-
sary, be converted to 2's complement form by inverting the MSB (see page 332).
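The MSB inversion mentioned above takes only one exclusive-OR operation in C. The sketch below is illustrative; the function names are my own and it assumes 8-bit char objects, as used throughout this project.

```c
/* Invert the MSB of an offset binary sample, giving the            */
/* 2's complement bit pattern (80h becomes 00h, 00h becomes 80h).   */
unsigned char offset_to_twos(unsigned char code)
{
    return (unsigned char)(code ^ 0x80);
}

/* The same sample expressed as a signed number, -128 to +127.      */
/* Mid-scale (80h) corresponds to zero volts at the bipolar input.  */
int offset_to_int(unsigned char code)
{
    return (int)code - 128;
}
```

Subtracting 128 in int arithmetic sidesteps any reliance on how a particular compiler converts an out-of-range value to a signed char.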
One consideration remains. The analog input is changing during the time the
conversion takes place. Accuracy considerations dictate that any change should
not exceed one bit during this aperture time. Taking, as a worst-case situation,
a sinusoid swinging through the full scale, as shown in Fig. 12.14, then we can
determine the rate of change by differentiation:
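Rate of change is:
$$\frac{d}{dt} = V_{ref}\,\omega\cos\omega t$$
Maximum, when $\cos\omega t = 1$, is $V_{ref}\,\omega$ volts s$^{-1}$.
Aperture time is 10 µs, therefore the change in 10 µs is:
$$\delta = 10^{-5}\,V_{ref}\,\omega$$
and thus, for $\delta \le 1$ bit:
$$10^{-5}\,V_{ref}\,\omega \le \frac{2V_{ref}}{256}$$
Solving gives $\omega \le 781$ rad s$^{-1}$, that is a maximum frequency of around 124 Hz.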
This is well within our specification, but if, say, a 12-bit conversion was needed
within 16 µs, then the upper frequency falls to less than 10 Hz! In such cases a
sample and hold (S/H) circuit preceding the A/D converter must be used. This
captures the signal, with typically a 40 ns aperture time [11]. The principle of
most S/Hs involves a capacitor being charged up during the sample period, and
held whilst conversion occurs. As with A/D and D/A converters, S/H circuits are
normally obtainable as monolithic integrated circuits.
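The one-bit aperture criterion generalizes to any resolution and conversion time. Rearranging the inequality used above for an n-bit converter with aperture time t gives f ≤ 1/(π 2ⁿ t). The function below is a sketch of that arithmetic only; its name and parameters are my own.

```c
/* Highest frequency (Hz) of a full-scale sinusoid that changes by  */
/* no more than one bit during the aperture time.                   */
/* From  t * Vref * w <= 2*Vref / 2^n,  with  w = 2*pi*f.           */
double max_input_freq(unsigned int bits, double aperture_s)
{
    const double pi = 3.14159265358979323846;
    double levels = (double)(1UL << bits);   /* 2^n quantum levels  */
    return 1.0 / (pi * levels * aperture_s);
}
```

For 8 bits and 10 µs this returns about 124 Hz, and for 12 bits and 16 µs just under 5 Hz. With a 40 ns sample and hold aperture, the 12-bit figure rises to roughly 1.9 kHz.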
Although S/H aperture times are low, they may take several µs to stabilize,
after which conversion can commence. They tend to droop during hold, as the
capacitor loses its charge (typically 20 µV/ms), and suffer from all analog
illnesses of the flesh, such as drift, offset and non-linearity. Thus the S/H must be
matched to the A/D converter's performance.
References
[1] Friedman, H.H.; Diagnostic Electrocardiography and Vectorcardiography, McGraw-
Hill, 3rd. ed., 1985.
From our discussion, the target microcomputer will have the following facilities:
All this is in addition to the memory, address decoder and other necessary sup-
port circuitry.
In consideration of the requested sampling rate variation of −50% to +100%
around the nominal 128 per second value, both of the following circuits use an
oscillator connected to the MPU's interrupt line(s). The interrupt frequency can
easily be varied using a potentiometer. Furthermore, a switch connected to this
sampling oscillator's Reset acts as a convenient freeze input. No sample rate –
no new samples.
The alternative scheme requires a switch port, not only to read the freeze-
request switch (see Fig. 11.3), but to read several switches requesting the sam-
pling rate. Although I have not used this technique, the two microcomputers
developed in this chapter have 4-bit switch ports provided. This gives a Read-
option expansion capability, and is exploited in Chapter 14, where diagnostic
software tests are discussed.
The provision of an 8-bit digital port is a little more expensive than the neces-
sary 1-bit output. This is also useful for diagnostic purposes and gives additional
scope for expansion.
Microcomputers based on both the 6809 and 68008 MPUs are developed in
the next two sections. By using C to target two different MPUs, we will be able to
investigate one of the major advantages of a high-level language.
Samples are acquired at a rate dictated by the astable network U7, C6, R3, R7.
Based on a 555 timer [1], the total period is given by the relationship:
tp = 0.693(R7 + 2R3)C6
giving a sampling frequency that can be varied with R3 from nominally 60 to 250 Hz. The 555 is a noisy device,
and thus the +5 V supply should be locally well decoupled. By connecting S1 to
the 555's Reset, the astable can be halted. Thus no further updates will occur,
giving a frozen display. NMI is used as the interrupt input, as its edge-triggered
nature obviates the need for an external interrupt flag, such as used in Fig. 6.6.
All unused interrupt lines, as well as BREQ and MRDY, are tied high through R5.
HLT has its own pull-up resistor R4, as this line is frequently used by in-circuit
emulators to control the progression of the MPU.
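The astable relationship quoted above is easily evaluated when choosing components. The sketch below simply codes the formula; the function names are my own, and any component values fed to it are the reader's choice rather than the values fitted in Fig. 13.1.

```c
/* Total astable period in seconds: tp = 0.693 (R7 + 2 R3) C6       */
double astable_period(double r7_ohm, double r3_ohm, double c6_farad)
{
    return 0.693 * (r7_ohm + 2.0 * r3_ohm) * c6_farad;
}

/* Resulting interrupt (sampling) rate in samples per second        */
double sample_rate_hz(double r7_ohm, double r3_ohm, double c6_farad)
{
    return 1.0 / astable_period(r7_ohm, r3_ohm, c6_farad);
}
```

As an illustration with purely hypothetical values, 1 kΩ for R7, 10 kΩ for R3 and 1 µF for C6 give a period of about 14.6 ms, or roughly 69 samples per second.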
The address map for the system is:
The address decoder comprises U3 and U4. The 74HCT138 splits the mem-
ory map into eight 8 kbyte pages, six of which enable the devices above. All
Write-to devices include Q as part of their enabling logic. The digital output
port U2 is clocked by the rising edge of this strobe, whilst the dual D/A converter
A/D1 is enabled by it. The 6116 RAM uses Q together with R/W as a modified
Read/Write control. This is shortened during a Write cycle, as described in Fig. 1.8.
Output_Enable is driven by R/W to ensure that no data is output during the pre-
mature ending of a Write cycle. OE of the 2716 EPROM (usually labelled V pgm)
is similarly enabled, to prevent accidental writing to a read-only memory. NAND
gates U4 provide these auxiliary functions.
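The page selected by the 74HCT138 follows from the top three address lines. As a sketch of the decoding arithmetic (the function name is my own), a 16-bit address can be reduced to its 8 kbyte page number as follows:

```c
/* Page number (0 to 7) decoded from a15, a14 and a13.              */
/* Each page spans 2000h (8 kbyte) addresses.                       */
unsigned int page_of(unsigned int addr)
{
    return (addr >> 13) & 0x7u;
}
```

Using the addresses that appear later in the compiled listing, 2000h and 2001h (the X and Y D/A ports) both fall in page 1, 6000h (the A/D converter) in page 3, A000h (the Z port) in page 5 and E000h (the EPROM) in page 7.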
Both RAM and EPROM have a 2 kbyte capacity, which is more than adequate
for our application. With a 1 MHz clock frequency, any speed selection will be
suitable. With a 2 MHz clock, a 300 ns EPROM is required. Although it is possible
to purchase such a 2716 (or Texas 2516) it is easier and cheaper to use a 2764
8 kbyte device at this speed (see Fig. 13.3). If desired, an integral battery backup
48Z02 RAM may be directly substituted for the 6116. RAMs with an access time
of 150 ns (min) should be used for a 2 MHz processor.
The two analog ports are as described in Figs 12.9 and 12.13. Figure 13.1 does
not show any necessary filtering and buffering.
Quad 3-state buffer U9/10 provides input port facilities for four switches. This
74HCT125 is directly enabled from the address decoder. A 74HCT377 connected
as described in Fig. 1.7 gives a byte-sized digital output port. One of the lines
can be used to blank out the CRO during flyback, and the others are free. Some
CROs require large negative voltages, typically −40 V, to perform this function.
In such cases a suitable transistor buffer and power supply will be required.
A free-run facility, comprising HDR1, R16, D1, D2 and SW2, is shown in Fig. 13.1. This
allows the user to exercise the processor before software is available for the EPROM
and without using an in-circuit emulator. Its action is described on page 399.
The complete circuit requires +5 V at typically 250 mA and ±12 to ±15 V at
25 mA. The analog ±15 V is conveniently supplied from a dual d.c./d.c. converter,
such as the Citec BC5151S +5V to ±15 V device. Care should be taken, as most
converters are not short-circuit proof. Any analog grounds should be returned to
this power supply 0 V together with the +5 V's ground return. The supplies should
be decoupled using a mixture of 1 µF tantalum and 0.1 µF ceramic capacitors, at
around one capacitor for every two devices.
Any suitable wiring technique may be used for the prototype. We use wire-
wrap with considerable success. This avoids close parallel paths for the clock and
bus signals and reduces crosstalk. It is especially important to keep the analog
signals as far away from such digital lines as possible. Whatever technique is
used, it is important to color-code any wiring to aid in the debug phase. Several
The address decoder comprises U9, U10 and U4C. The 74HCT138 splits mem-
ory up into eight 8 kbyte pages. Address lines a19 – a16 are ignored by this scheme,
and this of course gives 15 images of each page. Gates U10/U4C detect wherever
a memory access is made, and activate DTACK. All peripheral devices are fast
enough to support direct feedback in this manner, without the necessity of in-
troducing a delay as shown in Fig. 3.9. Care should be taken that the EPROM has
an access time of 250 ns or better, and the RAM has a 120 ns maximum access
time (see Section 3.3). Alternatively, a lower-frequency clock oscillator (minimum
2 MHz) can be used with slower devices. The digital output port is clocked by the
falling edge of DS, at which time data on the bus has stabilized (point 5 in Fig. 3.7).
Both RAM and analog output ports are enabled when DS is active. The EPROM is
only enabled when R/W is high, to prevent an accidental Write-to operation. The
RAM's output buffers are similarly disabled when R/W is low. Interface details
for the AD7528 and AD7576 are given in Figs 12.9 and 12.13 respectively.
A free-run facility, HDR1 and HDR2, is shown between the data bus/DTACK
and the MPU. By substituting the two headers, the user can check out the system
before software is available for the EPROM and without using an in-circuit emu-
lator. It is also a useful diagnostic aid when the system is in service. Its action is
described on page 400.
The complete circuit requires +5 V at typically 300 mA and ±12 to ±15 V for
the analog circuitry, at 25 mA. Normal power supply and decoupling practice,
as described in the last section, should be followed. However, the data sheet
indicates that the 68008 MPU can take current peaks of 1.5 A [5]. Thus a direct
connection using heavier or multiple wiring between the 68008's power pins and
the power supply is recommended, as is local decoupling.
If you have access to a PAL programmer, a PAL20L10 or 22V10 can be used to
implement the address decoder and other glue logic. Chips U9, U10 and U4C/R2
are replaced by the one 24-pin device. Connection details and the requisite equa-
tions are given in Fig. 13.4.
References
[1] Berlin, H.M.; The 555 Timer Applications Sourcebook with Experiments, H.W. Sams,
1976, Chapter 3.
[2] Alford, R.C.; Programmable Logic Designer's Guide, H.W. Sams, 1989, Chapter 5.
[3] Cahill, S.J.; Digital and Microprocessor Engineering, Ellis Horwood/Simon and Schus-
ter, 2nd. ed., 1993, Section 6.1.
[4] Berlin, H.M.; The 555 Timer Applications Sourcebook with Experiments, H.W. Sams,
1976, Chapter 2.
[5] Wilcox, A.D.; 68000 Microprocessor Systems: Designing and Troubleshooting,
Prentice-Hall, 1987, Section 9.1.1.
CHAPTER 14
Software in C
Task 1:
BEGIN:
Forever do:
Scan and send out to the Y-plates the 256 stored array values from oldest to
newest, while incrementing and sending out the X count to the X-plates (left to
right).
Flyback:
End each scan with a flyback procedure.
END:
Task 2:
BEGIN:
Forever do:
At regular intervals interrogate input and place sampled value into array, over-
writing oldest value.
END:
In the remainder of this chapter we will develop the necessary data structures
and interaction between these tasks. From this, a general program in C is devel-
oped; followed by topics specific to the two chosen targets.
elements are accessed from the oldest member (i = 0) to the newest member
(i = 255). At the same time, i is converted to its analog equivalent and hence
drives the X spot from left (oldest) to right (newest).
The job of the updating task is to fetch a sample into the array, to where
Oldest points, and move that index on one. Thus the element just before Oldest
is the most juvenile sample. When a whole scan of 256 samples has been com-
pleted, flyback occurs and the process begins again, but this time beginning from
the current most ancient element. The circular manner of this scan is simulated
by wrapping around the sum of Oldest plus i modulo-256, that is from 255 back
to 0 (11111111b + 1 = 00000000b).
The software implementation of our time-compressed memory is given in Ta-
ble 14.1. The two tasks are assigned to different functions. main() implements
the initialization, repetitive scan and flyback. New samples are acquired and
the array and Oldest pointer updated by the function update(). This is de-
signed to be entered via an interrupt, and so no data is sent or returned from it.
Communication between tasks is via the global data array Array[] and global
index Oldest. Both are defined before main(), and therefore are known to both
main()
{
    register short int i;                 /* Scan counter */
    register unsigned char leftmost;      /* The initial array index when x is 0 */
    unsigned char * const x = ANALOG_X;   /* x points to a byte @ (address) ANALOG_X */
    unsigned char * const y = ANALOG_Y;   /* y points to a byte @ (address) ANALOG_Y */
    unsigned char * const z = Z_BLANK;    /* The z-mod port (digital port) */
    Oldest = 0;                           /* Start New index at beginning of the array*/
    for(i=0; i<256; i++)                  /* Clear array */
        {Array[i] = 0;}
    while(1)                              /* Do forever display contents of array */
        {
        leftmost = Oldest;                /* Make leftmost point on the screen the oldest sample */
        for (i=0; i<256; i++)
            {
            *x = (unsigned char)i;            /* Send x co-ordinate to X plates */
            *y = Array[(leftmost+i)&0x0ff];   /* and the display byte to the Y D/A */
            }
        *z = BLANK_ON;                    /* Blank out for flyback */
        *x = 0;                           /* Move to left of screen */
        *y = Array[Oldest];               /* Y value at left of screen */
        for(i=0; i<5; i++) {;}            /* Delay */
        *z = BLANK_OFF;                   /* Blank off */
        }                                 /* Do another scan */
}
/***************************************************************************************
* This is the NMI interrupt service routine which puts the analog sample in the array *
* and updates the New index *
* ENTRY : Via NMI and startup *
* ENTRY : Array[] and Oldest are global *
* EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound *
* at 256 (modulo-256) *
***************************************************************************************/
void update(void)
{
    volatile unsigned char * const a_d = ANINPUT;  /* This is the Analog input port */
    Array[Oldest++] = *a_d;   /* Overwrite oldest sample in Array[] & inc Oldest index */
}
functions. The former is defined as having 256 unsigned char (byte) elements,
whilst the latter is a single unsigned char. Each element therefore can vary from
0 to 255. Details of the entry to update() and the header file hard.h, included at
the beginning of the file, are discussed in Sections 14.2 and 14.3. The header file
contains hardware-related detail, such as the locations of the various peripheral
devices.
main() begins by defining five local variables. Both i, the integer scan counter,
and leftmost, the char element indicating the most ancient array entry, are
defined as being of type register. Both are used inside the scan loop, and will
benefit from being stored internally. Processors with insufficient registers will
ignore this request. The variables x, y and z are defined as being fixed pointers to
unsigned chars (bytes), and are assigned as ANALOG_X, ANALOG_Y and Z_BLANK,
which are given values (addresses) in the header file. As they are qualified as
const, any subsequent attempt to change them will be reported by the compiler
as an error.
The program proper commences by zeroing the global variables Oldest and
Array[]. Strictly this run-time initialization is not necessary, as ANSI C specifies
that global variables are to be considered zero if not explicitly initialized in their
definition. To simulate this situation, the relevant RAM locations could be zeroed
in the startup routine. In this case we have chosen to do this in the C coding.
Actually the system will operate perfectly satisfactorily if not cleared, but there
would be a 2-second transient display while the array was being filled with the
first 256 samples.
After initialization, an endless loop is entered inside the body of while(1). At
the commencement of this loop the local variable leftmost is equated to Oldest.
This prevents changes in Oldest during the scan (i.e. via update()), altering the
display.
The scan itself uses a for loop construction, with i acting as the loop counter.
i has been defined as an int, so that the condition i < 256 becoming False can be
used as a loop terminator. If i were a char, it would wrap around at 255 and the
condition would never become False. In this situation
a break on i == 255 at the closing brace should be used as the out condition (see
Table 14.8).
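The byte-sized counter alternative can be sketched as follows. This is my own illustration of the break-at-255 idea, assuming 8-bit char objects, and is not the listing of Table 14.8; the function name and the pass counter are there only to make the behavior visible.

```c
static unsigned char Array[256];

/* Clear all 256 elements using an 8-bit loop counter.              */
/* The test i < 256 would always be True for a byte, so the loop    */
/* is left explicitly once element 255 has been dealt with.         */
unsigned int clear_with_char_counter(void)
{
    unsigned char i = 0;         /* Byte-sized scan counter         */
    unsigned int passes = 0;     /* Number of loop passes made      */
    for (;;)
    {
        Array[i] = 0;
        passes++;
        if (i == 255) {break;}   /* Out condition at closing brace  */
        i++;
    }
    return passes;
}
```

The function returns 256, confirming that every element is visited once and once only.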
The for body simply assigns i (0 to 255) to the contents of x (the ANALOG_X
output), and the array element to the contents of y (the ANALOG_Y output). The
index of the array is the sum of the X co-ordinate (i.e. i) plus the leftmost value,
truncated to 8 bits (modulo-256) by ANDing with 000011111111b (0xFF). This
stratagem achieves a wrap around at 255. For example if leftmost were 180 and
i were 159, then Array[83] is the value sent to the Y-plates (180+159 is 83 when
added modulo-256). A similar result could be obtained if the sum was given an
independent int-sized existence and then cast to char. I have used such a cast in
equating the char-sized contents of x to the integer i, *x = (unsigned char)i;.
In practice the compiler will truncate the r_value in assigning to a smaller l_value
(see page 223).
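Both ways of forming the wrapped index can be compared directly. The two functions below are an illustrative sketch with names of my own choosing, assuming 8-bit char objects.

```c
/* Wraparound by masking, as used in main()                         */
unsigned char wrap_by_mask(unsigned char leftmost, int i)
{
    return (unsigned char)((leftmost + i) & 0xFF);
}

/* Wraparound by giving the sum an int-sized existence and then     */
/* casting it down to a byte                                        */
unsigned char wrap_by_cast(unsigned char leftmost, int i)
{
    int sum = leftmost + i;
    return (unsigned char)sum;
}
```

For the example in the text, both wrap_by_mask(180, 159) and wrap_by_cast(180, 159) give 83.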
Flyback is generated by sending the correct patterns to the Z port (BLANK_ON
is defined in the header), zero to the X-plates and the initial array value to the
Y-plates. A short null for-loop gives a delay, before the BLANK_OFF pattern is
sent out to Z. After this, the scan begins again.
Function update() is very short. The local pointer variable a_d is defined as
being the const address ANINPUT, whose absolute value is given in the header.
This pointer is to an unsigned char (byte) which is volatile (changes spontaneously);
the pointer itself is const (fixed at ANINPUT). The value read from this port is then put into
the array at the oldest index, and the global variable Oldest is automatically
incremented. We are relying here on the char nature of Oldest wrapping around
at 255. An explicit wraparound would be necessary for other array lengths.
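For an array whose length is not 256, the index must be brought back to zero explicitly. A minimal sketch of such a step is given below; the function name and the length parameter are assumptions of mine and do not appear in the program of Table 14.1.

```c
/* Move a circular-buffer index on by one, wrapping back to zero    */
/* when the end of an array of the given length is reached.         */
unsigned int next_index(unsigned int index, unsigned int length)
{
    index++;
    if (index >= length) {index = 0;}   /* Explicit wraparound      */
    return index;
}
```

A compare-and-reset of this kind is usually cheaper on an 8-bit MPU than the % operator, which may call a division routine when the length is not a power of two.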
Function update() assumes that the analog to digital converter can be treated
as a simple read-only input port. In that respect the program is not portable.
Normally, a separate function is used for more complex ports, frequently called
getchar(). Such a function would be part of an input/output library, which was
hardware specific, or would appear in the header file. Similar assumptions have
also been made for output in main().
Portability has been further compromised by the assumption that char objects
are 8-bit wide. In practice this is true for the vast majority of microprocessor
targeted compilers. However, 9-bit character systems do exist, and the use of
complex character sets, such as Japanese, requires 16-bit characters. ANSI C
guarantees only that char objects are at least 8 bits wide.
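Where a program depends on byte-sized chars, as ours does, the assumption can at least be checked at compile time. The fragment below is a sketch using the standard header limits.h; the function char_bits() is my own addition, included only so that the value can be inspected.

```c
#include <limits.h>

/* Refuse to compile on a target where char is not 8 bits wide,     */
/* as the modulo-256 wraparound of Oldest relies on this.           */
#if CHAR_BIT != 8
#error "This program assumes 8-bit char objects"
#endif

/* Report the width of a char for the compiler in use               */
int char_bits(void)
{
    return CHAR_BIT;
}
```

Placing such a check in the hardware header file keeps the assumption visible alongside the other machine-specific detail.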
In the next two sections we look at machine-specific details regarding our two
target circuits of Figs 13.1 and 13.3.
CLRA: LEAY D,Y: LDB _Array,Y. As both processes are done in each loop pass,
the savings are obvious. Table 14.9 shows code where the register qualifier is
obeyed.
ANSI C specifies that chars and shorts are promoted to ints during process-
ing (see Fig. 8.4). Objects larger than bytes (chars) are handled with difficulty in
most 8-bit processors. The Intermetrics/COSMIC 6809 C cross-compiler permits
processing of chars in their byte form. Thus the assignment leftmost = Oldest;
is simply implemented in lines 54 and 55 using Accumulator_B only. However,
the usefulness of this option is not fully realized in this particular instance, as i
is a 16-bit object, and most arithmetic involves this variable.
Table 14.3: 6809 code resulting from Tables 14.1 and 14.2.
1 ; Compilateur C pour MC6809 (COSMIC-France)
2 .list +
3 .psect _text
4 ; 1 /* Version 16/11/89 */
5 ; 2 #include <hard_09.h>
18 ; 4 unsigned char Oldest; /* Index to the Oldest inserted data byte (left point on)
19 5
20 ; 6 main()
21 ; 7 {
22 E00D 3440 _main: pshs u ;## Open a frame
23 E00F 33E4 leau ,s ;## U is the Top Of Frame (TOF)
24 E011 3277 leas -9,s ;## Nine bytes deep
25 ; 8 register short int i; /* Scan counter */
26 ; 9 register unsigned char leftmost; /* The initial array index when x is 0 */
27 ; 10 unsigned char * const x = ANALOG_X;/* x points to a byte @ (address) ANALOG_X*/
28 E013 CC2000 ldd #2000h ;## Put constant 2000h in frame at FP-5/-4
29 E016 ED5B std -5,u
30 ; 11 unsigned char * const y = ANALOG_Y;/* y points to a byte @ (address) ANALOG_Y*/
31 E018 CC2001 ldd #2001h ;## Put constant 2001h in frame at FP-7/-6
32 E01B ED59 std -7,u
33 ; 12 unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) */
34 E01D CCA000 ldd #0A000h ;## Put constant A000h in frame at FP-9/-8
35 E020 ED57 std -9,u
36 ; 13 Oldest = 0; /* Start New index at beginning of the array */
37 E022 7F0001 clr _Oldest
38 ; 14 for(i=0; i<256;i++) /* Clear array */
39 E025 4F clra
40 E026 5F clrb
41 E027 ED5E std -2,u ;## i lives in FP-2/-1; is cleared, i=0
42 E029 AE5E L1: ldx -2,u ;## Get i into X
43 E02B 8C0100 cmpx #256 ;## i<256?
44 E02E 2C0C jbge L14 ;## IF not THEN jump out of for loop
45 ; 15 {Array[i]=0;}
46 E030 6F890002 clr _Array,x ;## EA is Array[0]+i, clear it
47 E034 6C5F inc -1,u ;## Double-precision increment of 16-bit int i, i++
48 E036 2602 jbne L4
49 E038 6C5E inc -2,u
50 E03A 20ED L4: jbr L1
51 ; 16 while(1) /* Do forever display contents of array */
52 ; 17 {
53 ; 18 leftmost = Oldest;/* Make leftmost point on the screen the oldest sample*/
54 E03C F60001 L14: ldb _Oldest ;## Put Oldest array index
55 E03F E75D stb -3,u ;## in FP-3/-2, where leftmost lives in the frame
56 ; 19 for (i=0; i<256; i++)
57 E041 4F clra
58 E042 5F clrb
59 E043 ED5E std -2,u ;## Again i=0
60 E045 AE5E L16: ldx -2,u ;## Get i into X
61 E047 8C0100 cmpx #256 ;## i<256?
62 E04A 2C1C jbge L17 ;## IF not THEN jump out of for loop
63 ; 20 {
64 ; 21 *x = (unsigned char)i; /* Send x co-ordinate to X plates */
65 E04C EC5E ldd -2,u ;## Get i into D
66 E04E E7D8FB stb [-5,u] ;## Put lower byte (char) indirectly into X D/A
67 ; 22 *y = Array[(leftmost+i)&0x0ff];/* and the display byte to the Y D/A*/
68 E051 E65D ldb -3,u ;## Get leftmost out of the frame into B
69 E053 4F clra ;## extended to 16 bits (int)
70 E054 E35E addd -2,u ;## Add to int i; leftmost+i
71 E056 4F clra ;## Neat way of ANDing with 0000 0000 1111 1111b!
72 E057 1F01 tfr d,x ;## X holds (leftmost+i)&0xff
73 E059 E6890002 ldb _Array, x ;## EA is Array[0]+(leftmost+i)&0xff; get element
74 E05D E7D8F9 stb [-7,u] ;## Put Array[(leftmost+i)&0xff] indirectly into Y
75 ; 23 }
76 E060 6C5F inc -1, u ;## Double-precision increment of 16-bit int i, i++
77 E062 2602 jbne L6
78 E064 6C5E inc -2,u
79 E066 20DD L6: jbr L16
80 ; 24 *z = BLANK_ON; /* Blank out for flyback */
81 E068 C6FF L17: ldb #255 ;## Send out indirectly 1111 1111b to Z
82 E06A E7D8F7 stb [-9,u]
83 ; 25 *x = 0; /* Move to left of screen */
84 E06D 6FD8FB clr [-5,u] ;## Send out indirectly 00h to X D/A; i.e. flyback
85 ; 26 *y = Array[Oldest]; /* Y value at left of screen */
86 E070 8E0002 ldx #_Array ;## While this is happening get Array[Oldest]
87 E073 F60001 ldb _Oldest
88 E076 4F clra
89 E077 E68B ldb d,x
90 E079 E7D8F9 stb [-7,u] ;## and put it indirectly into the Y D/A converter
91 ; 27 for(i=0;i<5;i++) {;} /* Delay */
92 E07C 5F clrb
93 E07D ED5E std -2,u ;## i=0
94 E07F AE5E L121: ldx -2,u ;## Get i into X
95 E081 8C0005 cmpx #5 ;## i<5?
96 E084 2C08 jbge L131 ;## IF not THEN jump out of for delay loop
97 E086 6C5F L141: inc -1,u ;## Double-precision increment of 16-bit int i, i++
98 E088 2602 jbne L01
99 E08A 6C5E inc -2,u
100 E08C 20F1 L01: jbr L121
101 ; 28 *z = BLANK_OFF; /* Blank off */
102 E08E 6FD8F7 L131: clr [-9,u] ;## Send out indirectly 0000 0000b to Z
103 ; 29 }
104 E091 20A9 jbr L14 ;## Do another scan; forever
105 ; 30 }
106 ; /********************************************************************************
107 ; 32 * This is the NMI interrupt service routine which puts the analog sample in the
108 ; 33 * ENTRY : Via NMI and startup
109 ; 34 * ENTRY : Array[] and Oldest are global
110 ; 35 * EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound
111 ; 36 ********************************************************************************
112 ; 37
113 ; 38 void update(void)
114 ; 39 {
115 E093 3440 _update:pshs u ;## Open a frame
116 E095 33E4 leau ,s ;## With U as TOF
117 E097 327E leas -2,s ;## of two bytes
118 ; 40 volatile unsigned char * const a_d = ANINPUT;/* This is the Analog input port*/
119 E099 CC6000 ldd #6000h ;## to locate the constant 6000
120 E09C ED5E std -2,u
121 ; 41 Array[Oldest++] = *a_d; /* Overwrite oldest sample in Array[] & inc
122 E09E 8E0002 ldx #_Array ;## Point x to Array[0]
123 E0A1 F60001 ldb _Oldest ;## Get Oldest
124 E0A4 7C0001 inc _Oldest ;## Oldest++
125 E0A7 4F clra ;## 16-bit Oldest++
126 E0A8 308B leax d,x ;## Point X to Array[0]+Oldest++; ie Array[Oldest++]
127 E0AA E6D8FE ldb [-2,u] ;## Get indirectly the contents of A/D; ie of 6000h
128 E0AD E784 L61: stb ,x ;## Put it away as the latest entry into the array
129 ; 42 }
130 E0AF 32C4 leas ,u ;## Close the frame
131 E0B1 35C0 puls u,pc
132 .public _update
133 .public _main
The startup routine, shown in Table 14.4(a) has three functions. The first is
to set the Stack Pointer to the top of the System stack. Hence in line 7 I have
put this at the top of the 6116 RAM. If the library routines malloc()[1] (Memory
ALLOCate) and other related functions are being used, then this can be lowered
somewhat and memory above used as a general storage pool (called the heap).
The second purpose of this startup routine is to go to the main C function.
This is implemented as a simple JSR _main in line 8. In this case, startup.s
does not pass any parameters to main(). main() is an endless loop and so no
return should occur, but if it does, a skip back to the beginning is actioned. The
re-entry point is labelled _exit, and can be reached from the C level by calling the
library routine exit(). exit() is supposed to return True or False to indicate
an error condition, but no use is made of this in our situation.
The final function deals with NMI interrupt handling. Function update() is
terminated with a Return From Subroutine operation (implemented in line 131
of Table 14.3 with a PULS PC) and therefore cannot be directly entered from an
Table 14.5 The machine-code file for the 6809-based time-compressed memory.
e093 _update
e00d _main
e009 NMI
e007 _exit
e000 start
0001 _Oldest
0002 _Array
e0b8 __prog_top
0001 __data_top
0102 __stack_bottom
0400 __stack_top
e7f6 a:vecttcm9.o
$
:20E0000010CE0400BDE00D20F7BDE0933B344033E43277CC2000ED5BCC2001ED59CCA000EB
:20E02000ED577F00014F5FED5EAE5E8C01002C0C6F8900026C5F26026C5E20EDF60001E7B0
:20E040005D4F5FED5EAE5E8C01002C1CEC5EE7D8FBE65D4FE35E4F1F01E6890002E7D8F91A
:20E060006C5F26026C5E20DDC6FFE7D8F76FD8FB8E0002F600014FE68BE7D8F95FED5EAED2
:20E080005E8C00052C086C5F26026C5E20F16FD8F720A9344033E4327ECC6000ED5E8E0048
:18E0A00002F600017C00014F308BE6D8FEE78432C435C03B32C435C0B0
:0AE7F600E000E000E000E009E000B0
:03E0B800E0BB00CA
:00E000011F
interrupt. Instead, the startup routine has a stub in lines 12 and 13, which is la-
belled NMI. This stub simply jumps to update() (JSR _update) and terminates
on return with RTI. If the address NMI is placed in the NMI vector, then on re-
ceiving such an interrupt, the processor will jump to NMI (E009h here) and again
jump to update(). The way back is a similar RTS–RTI double hop. As all registers
are saved on entry, no other action need be taken.
The vector routine of Table 14.4(b) is linked in after the C code and begins at
E7F6h, which is the FIRQ vector in the 2716 EPROM. All vectors are specified to
point to the beginning of the startup routine (E000h), except the NMI vector. The
addresses start and NMI have been broadcast by the startup routine as public
and declared external by the vector routine.
The end product of the compilation/assembly and linkage of these three
files is the Intel-format machine-code file of Table 14.5. This is used as the input
to the EPROM programmer or in-circuit emulator. In total there are 178 bytes of
EPROM text plus the ten Vector bytes.
The double-hop interrupt handling technique will work with any compiler.
However, most compilers specifically designed to produce ROMable code support
extensions to the ANSI standard, enabling the user to declare a function as an
interrupt handler (see Section 10.2). The function name, in our case update, is
then entered into the Vector table directly in the normal way. This direct entry
should decrease the response time to an interrupt and at the same time reduce
the code emitted by the compiler.
This particular compiler uses the directive @port to designate a function in
this way, the function header then becoming @port update(). It is instructive
to look at the code produced, which is shown in Table 14.6(a). Here we can
see the RTI in line 120, but also the RTS in line 133. What has happened is
that the original code has been cocooned by the RTI at the end and a library
subroutine c_cstk at the beginning. As you will remember, the 6809 has three
interrupt inputs: NMI, IRQ and FIRQ. The two former save all internal registers
on the System stack and retrieve them, whilst the latter saves only the CCR and
Table 14.7 Using _asm() to terminate a NMI/IRQ type interrupt service function.
; 31/**************************************************************************************
; 32 * This is the NMI service routine which puts the analog sample in the array and update
; 33 * ENTRY : Via NMI and startup
; 34 * ENTRY : Array[] and Oldest are global
; 35 * EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound @ 256
; 36 **************************************************************************************
; 37
; 38 void update(void)
; 39 {
_update: pshs u
leau ,s
leas -2,s
; 40 volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */
ldd #6000h
std -2,u
; 41 Array[Oldest++] = *a_d;/* Overwrite oldest sample in Array[] and inc Oldest index */
ldx #_Array
ldb _Oldest
inc _Oldest
clra
leax d,x
ldb [-2,u]
L61: stb ,x
; 42 _asm("LEAS ,U\nPULS U,PC\nRTI ; Wrap up frame and return to main \n");
LEAS ,U ;## Three inserted assembler-level instructions
PULS U,PC ;## to wrap up frame
RTI ;## and return from interrupt
; 43 }
leas ,u ;## These 2 instructions are now dead code; ie never entered
puls u,pc
.public _update
.public _main
.psect _bss
_Oldest: .byte [1]
.public _Oldest
_Array: .byte [256]
.public _Array
.end
PC. c_cstk, shown disassembled (see page 385) in Table 14.6(b), first checks the
E flag. If E is clear then a NMI or IRQ interrupt service is in progress and nothing
further needs doing. If not, the E flag is cleared and all the registers are put into
the System stack to pretend that the FIRQ is really an IRQ/NMI type interrupt.
Table 14.6 shows us that although @port is deceptively simple at the C level, it
neither improves the speed nor reduces the size of the resulting code. Knowing,
as we do, that update() is entered via an NMI, we could simply alter line 131
of Table 14.3 in its source form to PULS U : RTI, before letting it through to the
assembler. This is messy, and as an alternative the non-standard function _asm()
is used to insert three assembly-level instructions. The first two close the frame,
whilst an RTI terminates the interrupt routine. This is shown in Table 14.7,
line C42. The principle could be extended to FIRQ by saving registers at the
beginning, and pushing them out at the end. Incidentally, _asm() could also be
used to implement the startup routine as a front to the C code.
All three approaches are non-portable and error prone, so in the majority of
cases a stub approach is best, if rather slow. The @port solution gives 192 bytes
whilst using _asm() yields 179 bytes. Creatively editing the source file is the
most efficient of all, giving a total of 175 bytes. These figures take account of the
removal of the stub from startup.s, but not the vector table. Creative editing,
whilst being efficient, is the most dangerous, as it does not show up in the source
of any of the constituent files, and, unless extremely well documented, will cause
havoc if any but the original designer tries to make subsequent changes.
We will compare the resulting machine code to a hand-assembled version in
Section 16.1, but the question must be asked here: can the resulting machine
code be reduced in size, knowing the way the compiler produces such code?
Two possibilities spring to mind. As we have said the 6809 does not handle
16-bit quantities with any finesse. If we could use a char-sized i, instead of
short, a considerable economy should be achieved. This can be done, if rather
inelegantly, by defining i as unsigned char and replacing the statement:
for (i=0; i<256; i++)
{body;}
by:
i = 0;
do
{
body;
i++;
} while (i!=0);
Here i will be 1 after the first pass, and the while argument will be True. When i
reaches 255, then i++ will wrap around to 0 and the while argument will return
False, causing the do…while loop to exit.
This structure is of course only relevant to loops of 256 iterations on an 8-bit
machine, and presupposes an 8-bit char.
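The wraparound behavior can be checked on any host with a short sketch such as the following. It uses the fixed-width type uint8_t from <stdint.h> in place of unsigned char, so that the 8-bit assumption is explicit. The function name count_passes() is illustrative only.

```c
#include <stdint.h>

/* Count the passes made by a do...while loop controlled by an 8-bit index. */
unsigned count_passes(void)
{
    uint8_t i = 0;          /* exactly 8 bits wide */
    unsigned passes = 0;

    do {
        passes++;           /* stands in for the loop body */
        i++;                /* 255 + 1 wraps around to 0 */
    } while (i != 0);

    return passes;          /* one pass for each value 0 to 255 */
}
```

The loop body runs once for each of the 256 possible values of i, which is the same number of passes as the original for loop, but without a 16-bit comparison.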
A further reduction can be obtained if the compiler's treatment of pointer
constants, such as a_d in lines 26 – 33 of Table 14.3 is studied. There are four
such constants in our program, and each is put into the frame on entry to the
function, for example:
119 LDD #6000h ; the constant a_d
120 STD -2,U ; in the frame at TOF-2 and TOF-1
Once in the frame they can be used as a pointer via Indirect addressing, for in-
stance [-2,U] = 6000h. With main() this stack initialization is done only once
on entry, and execution proceeds to the core endless loop. The same setup oc-
curs on each entry to update(); however, this will happen around 128 times per
second!
It is not necessary to store constants in the essentially dynamic frame; it is
better to use absolute locations. This can be done by defining such pointers as
static; for example:
L5_x: .word 2000h ;## 1st word in the txt sect (ROM) holds the pointer constant 2000h
L51_y: .word 2001h ;## Next in absolute memory is the constant 2001h (Y amplifier port)
L52_z: .word 0A000h ;## and A000h the Z-blank port
; 3 unsigned char Array [256]; /* Global array holding display data */
; 4 @dir unsigned char Oldest; /* Index to the Oldest inserted data byte (left point on
; 5
; 6 main()
; 7 {
_main: pshs u
leau ,s
leas -2,s
; 8 unsigned char i; /* Scan counter */
; 9 unsigned char leftmost; /* The initial array index when x is 0 */
; 10 static unsigned char * const x = ANALOG_X; /* x points to a byte @ (address) ANALOG_X
; 11 static unsigned char * const y = ANALOG_Y; /* y points to a byte @ (address) ANALOG_Y
; 12 static unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) */
; 13 Oldest = 0; /* Start New index at beginning of the array */
clr _Oldest
; 14 i=0;
clr -1,u
; 15 do /* Clear array */
; 16 {Array[i]=0; i++;} while(i!=0);
L1: ldx #_Array ;## First do the body statements
ldb -1,u ;## i is now a char; and is at U-1
clra
clr d,x
inc -1,u ;## i++
lda -1,u ;## Then do the test i != 0
jbne L1
; 17 while(1) /* Do forever display contents of array */
; 18 {
; 19 leftmost = Oldest; /* Make the leftmost point on the screen the oldest sample
L13: ldb _Oldest
stb -2,u
; 20 i=0;
clr -1,u
; 21 do
; 22 {
; 23 *x = (unsigned char)i; /* Send x co-ordinate to X plates */
L15: ldb -1,u
stb [L5_x]
; 24 *y = Array[(leftmost+i)&0x0ff]; /* and the display byte to the y D/A */
clra
addb -2,u
rola
clra
tfr d,x
ldb _Array,x
stb [L51_y]
; 25 } while(++i!=0);
inc -1,u ;## i++
the qualifiers static and const tells the compiler to put constants in ROM, that
is the _text section. The compile-time nature of these constants is clearly seen
in lines 6 – 8 of Table 14.8, where they are placed in EPROM at locations E00D –
E012h. This saves 4 × 3 bytes and results in quicker execution (it also makes
the code easier to read). Defining const pointers externally is an alternative to
a static declaration, see Table 15.5. Notice that update() no longer requires a
frame.
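At the C level the change amounts to one extra keyword. The following is a minimal sketch of update() in this form, arranged so that it can be tried on a host machine. The variable fake_adc is a hypothetical stand-in for the A/D port; on the target, ANINPUT would be the fixed address 6000h as defined in the hardware header.

```c
/* Hypothetical stand-in for the A/D port so that the sketch runs on a host.
   On the target this would be (unsigned char *)0x6000. */
volatile unsigned char fake_adc;
#define ANINPUT (&fake_adc)

unsigned char Array[256];   /* Global array holding display data */
unsigned char Oldest;       /* Index to the oldest sample */

void update(void)
{
    /* static + const: the pointer value is fixed before the program runs,
       so nothing need be loaded into the frame on each entry */
    static volatile unsigned char * const a_d = ANINPUT;

    Array[Oldest++] = *a_d; /* Oldest wraps modulo 256 where char is 8 bits */
}
```

Because a_d has static storage duration, its initializer must be a constant expression, which is the case both for a cast literal address and for the address of a global object as used here.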
Defining Oldest to lie in zero page (with the @dir prefix in line C4) saves
another few bytes, giving a total size of 132 bytes, plus vectors. Table 14.8 uses
the startup stub entry for the interrupt entry to update(). A further few bytes
may be saved at the expense of portability by using _asm().
Another possibility, not implemented in Table 14.8, is to replace the array
representations in the three loops by equivalent pointer constructions. As these
loops walk through the array, this procedure should be more effective (see Section 9.2).
However, the saving is illusory with this rather efficient compiler [2].
See Table 15.5 for an example of this technique.
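As an indication of what such a rewrite looks like, the array-clearing loop might be sketched in both forms as follows. The function names are illustrative and do not appear in the tables.

```c
#include <stddef.h>

unsigned char Array[256];   /* Global array holding display data */

/* Index form: the element address is recomputed from i on every pass */
void clear_by_index(void)
{
    size_t i;
    for (i = 0; i < sizeof Array; i++)
        Array[i] = 0;
}

/* Pointer form: a single pointer is stepped through the array */
void clear_by_pointer(void)
{
    unsigned char *p = Array;
    unsigned char * const end = Array + sizeof Array;  /* one past the last element */

    while (p != end)
        *p++ = 0;
}
```

Whether the second form is any smaller or faster depends entirely on the compiler, so it is worth comparing the assembly-level output of both before committing to the less readable version.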
Table 14.9: 68000 code resulting from Tables 14.1 and 14.2 (continued next page).
~~1WSL 3.0 as68k Sat Dec 02 14:13:38 1989
1 * 1 /* Version 02/12/89 */
2 * 2 #include <hard_68k.h>
3 * 1 #define ANALOG_X (unsigned char *)0x2000 /* Analog output to X amplifier */
4 * 2 #define ANALOG_Y (unsigned char *)0x2001 /* Analog output to Y amplifier */
5 * 3 #define ANINPUT (unsigned char *)0x6000 /* Analog input port at 6000h */
6 * 4 #define SWITCH (unsigned char *)0x8000 /* Digital input port at 8000h */
7 * 5 #define Z_BLANK (unsigned char *)0xA000 /* Digital output port at A000h */
8 * 6 #define RAM_START (unsigned char *)0xE000 /* 6264 chip starts at location E000h*/
9 * 7 #define RAM_LENGTH 0x2000 /* 6264 byte capacity is 8K or 2000h */
10 * 8 #define ROM_START (unsigned short *)0x0000/* 2764 chip starts at location 0000h*/
11 * 9 #define ROM_LENGTH 0x2000/* 2764 byte capacity is 8K or 2000h */
12 *10 #define BLANK_ON 0xFF /* Bit pattern to blank out beam */
13 *11 #define BLANK_OFF 0 /* Bit pattern to enable beam */
14 * 3 unsigned char Array [256]; /* Global array holding display */
15 * 4 unsigned char Oldest; /* Index to the Oldest inserted */
16 * 5
17 * 6 main()
18 * 7 {
19 .text
20 .even
21 00418 4e56 fff4 _main: link a6,#-12 *## Frame of 3 words with A6 as FP (TOF)
22 0041C 48e7 0c00 movem.l d5/d4,-(sp) *## D4/D5 not to be changed by any ftn
23 * 8 register short int i; /* Scan counter */
24 * 9 register unsigned char leftmost; /* The initial array index when x is 0 */
25 *10 unsigned char * const x = ANALOG_X; /* x points to a byte @ (address) ANALOG_X*/
26 00420 2d7c 00002000fffc move.l #0x2000,-4(a6) *## Pointer constant 2000h @ TOF -4/-1
27 *11 unsigned char * const y = ANALOG_Y; /* y points to a byte @ (address) ANALOG_Y*/
28 00428 2d7c 00002001fff8 move.l #0x2001,-8(a6) *## Likewise constant 2001h @ TOF-8/-5
29 *12 unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) */
30 00430 2d7c 0000a000fff4 move.l #0xa000,-12(a6)*## Likewise constant A000h @ TOF-12/-9
31 *13 Oldest = 0; /* Start New index at beginning of array */
32 00438 4239 0000e000 clr.b _Oldest *## _Oldest lives in absolute memory @ E000h
33 *14 for(i=0; i<256; i++) /* Clear array */
34 0043E 4245 clr.w d5 *## D5(15:0) holds register short i
35 00440 0c45 0100 L1: cmpi.w #256,d5 *## Is i beyond 255?
36 00444 6c 0e bge.s L14 *## IF yes THEN exit clear for loop
37 *15 {Array[i]=0;}
38 00446 227c 0000e002 move.l #_Array,a1 *## ELSE point A1 to Array[0] each time thru
39 0044C 4231 5000 clr.b (a1,d5.w) *## Clear Array[i]
40 00450 5245 addq.w #1,d5 *## i++
41 00452 60 ec bra.s L1 *## and repeat
42 *16 while(1) /* Do forever display contents of array */
43 *17 {
44 *18 leftmost = Oldest; /* Make the leftmost point on screen the oldest sample */
45 00454 1839 0000e000 L14: move.b _Oldest,d4 *## Now make leftmost (in reg D4) = _Oldest
46 *19 for (i=0; i<256; i++)
47 0045A 4245 clr.w d5 *## i=0
48 0045C 0c45 0100 L16: cmpi.w #256,d5 *## i>255 yet?
49 00460 6c 2a bge.s L17 *## IF yes THEN end scan for loop
50 *20 {
51 *21 *x = (unsigned char)i; /* Send x co-ordinate to X plates */
52 00462 226e fffc move.l -4(a6),a1 *## Get pointer constant 2000h (ie x) to A1
53 00466 1e05 move.b d5,d7 *## Move lower 8 bits of i into D7[7:0]
54 00468 1287 move.b d7,(a1) *## and then send it to x
55 *22 *y = Array[(leftmost+i)&0x0ff]; /* and the display byte to the Y D/A */
56 0046A 226e fff8 move.l -8(a6),a1 *## Get pointer constant 2001h (y) to A1
57 0046E 7e00 moveq.l #0,d7 *## Move 8-bit leftmost extended to 32-bit
58 00470 1e04 move.b d4,d7 *## int to D7
59 00472 3c05 move.w d5,d6 *## Get i to D6[15:0]
60 00474 48c6 ext.l d6 *## and extend to 32-bit int
61 00476 de86 add.l d6,d7 *## Add them in int form = leftmost+i
62 00478 0287 000000ff andi.l #255,d7 *## Reduce to 8-bit (leftmost+i)&0xff
63 0047E 2447 move.l d7,a2 *## Put this array index in A2
64 00480 d5fc 0000e002 add.l #_Array,a2 *## + to Array gives address of Array[index]
65 00486 1292 move.b (a2),(a1) *## Move to y
66 *23 }
67 00488 5245 addq.w #1,d5 *## i++
68 0048A 60 d0 bra.s L16 *## and repeat scan
69 *24 *z = BLANK_ON; /* Blank out for flyback */
70 0048C 226e fff4 L17: move.l -12(a6),a1 *## Get constant pointer to z into A1
71 00490 12bc 00ff move.b #-1,(a1) *## Send 1111 1111 to z
72 *25 *x = 0; /* Move to right of screen */
73 00494 226e fffc move.l -4(a6),a1 *## Get constant pointer to x into A1
74 00498 4211 clr.b (a1) *## x=0
75 *26 *y = Array[Oldest]; /* Y value at left of screen */
76 0049A 226e fff8 move.l -8(a6),a1 *## A1 now points to y
77 0049E 247c 0000e002 move.l #_Array,a2 *## A2 now points to Array[0]
78 004A4 7e00 moveq.l #0,d7 *## Extend _Oldest to 32-bit int size
79 004A6 1e39 0000e000 move.b _Oldest,d7
80 004AC 12b2 7800 move.b (a2,d7.l),(a1)*## Send array[Oldest] to y
81 *27 for(i=0; i<5; i++) {;} /* Delay */
82 004B0 4245 clr.w d5 *## i=0
83 004B2 0c45 0005 L121: cmpi.w #5,d5 *## i>4 yet?
84 004B6 6c 04 bge.s L131 *## IF yes THEN exit from delay for loop
85 004B8 5245 L141: addq.w #1,d5 *## ELSE i++
86 004BA 60 f6 bra.s L121 *## and repeat
87 *28 *z = BLANK_OFF; /* Blank off */
88 004BC 226e fff4 L131: move.l -12(a6),a1 *## A1 now points to z
89 004C0 4211 clr.b (a1) *## Send 0000 0000 to z
90 *29 }
91 004C2 60 90 bra.s L14 *## Repeat the complete scan
92 *fnsize=86
93 *30 }
94 *31 /*********************************************************************************
95 *32 * This is the NMI interrupt service routine which puts the analog sample in the
96 *33 * ENTRY : Via NMI and startup
97 *34 * ENTRY : Array[] and Oldest are global
98 *35 * EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound
99 *36 *********************************************************************************
100 *37
101 *38 void update(void)
102 *39 {
103 .even
104 004C4 4e56 fffc _update: link a6,#-4 *## Make frame of 1 word for pointer const
105 *40 volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */
106 004C8 2d7c 00006000fffc move.l #0x6000,-4(a6)*## Constant pointer ANINPUT in TOF-4/-1
107 *41 Array[Oldest++] = *a_d; /* Overwrite oldest sample in Array[] & inc */
108 004D0 1e39 0000e000 move.b _Oldest,d7 *## _Oldest to D7[7:0]
109 004D6 5239 0000e000 addq.b #1,_Oldest *## _Oldest++
110 004DC 0287 000000ff and.l #255,d7 *## Expand to 32-bit int
111 004E2 2247 move.l d7,a1 *## A1 now holds array index
112 004E4 d3fc 0000e002 add.l #_Array,a1 *## A1 now points to Array[Oldest]
113 004EA 246e fffc move.l -4(a6),a2 *## A2 now points to ANINPUT
114 004EE 1292 move.b (a2),(a1) *## Put [ANINPUT] into Array[Oldest]
115 *42 }
116 004F0 4e5e unlk a6 *## Close up frame
117 004F2 4e75 rts *## and return
118 *fnsize=110
119 .globl _update
120 .globl _main
121 .bss
122 .even
123 0E000 _Oldest: .=.+1 *## Reserve one byte for char Oldest
124 .globl _Oldest
125 .even
126 0E002 _Array: .=.+256 *## Reserve 256 bytes for Array[256]
127 .globl _Array
no assembler errors
code segment size = 220
data segment size = 0
After all this 32-bit `fiddling around', the sum of these two is truncated by AND-
ing with 11111111b (0xFF):
It is clear from this discussion that nothing has been gained in making these
two register variables char and short. Unlike its 6809 counterpart, no provision
to buck the ANSI promotion requirement is provided by this compiler. We will
return to this point later.
Communication between the background main() (strictly void main(void))
and the interrupt function update() is handled via the two global objects, Oldest
and Array[]. By defining these outside any function (lines 14 and 15 in Ta-
ble 14.9), the compiler has placed their base labels _Oldest and _Array in ab-
solute memory (lines 123 – 127). One byte has been reserved for the former and
256 for the latter. However, as Array[] is not a byte object, a hole of one byte
is left after _Oldest, to ensure that it starts at an even address (i.e. .EVEN). Both
labels have been declared .GLOBL, and thus are known to all, through the linker.
The two labels have been placed in the _bss program section (directive .BSS),
which is used by this compiler for static and extern data with no initial values.
C specifies that these should be load-time initialized by default to zero, and that,
by inference, this should be done in the startup routine. However, in this instance
I have chosen to do this at run time in the main() function at the C level, in lines
33 – 41.
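Where the clearing is done in the startup routine instead, it reduces to a loop over the whole _bss section. The following is a minimal sketch in C, assuming that the section bounds are available as two pointers. On a real target these would normally be linker-defined symbols, whose names depend on the toolchain. The function name clear_bss() is hypothetical.

```c
/* Clear every byte in the range [start, end).
   On a target, start and end would come from linker symbols marking
   the limits of the _bss section; here they are ordinary pointers. */
void clear_bss(unsigned char *start, unsigned char *end)
{
    while (start < end)
        *start++ = 0;
}
```

Doing this once at startup covers every uninitialized static and extern object, not just Array[], so the C code no longer needs its own clearing loop.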
The linker has been configured to commence the _bss sector at E000h, which
locates _Oldest at E000h and _Array at E002h. Program section _text actually
begins at 0000h, but the startup routine vector table of Table 14.10 brings _main
up above the vector table top (03FFh).
The startup routine has three functions. The first is to place the initial System
Stack Pointer address in locations 00000 – 00003h and Reset address (i.e. initial
Program Counter value) in 00004 – 7h. In addition, the level-7 interrupt auto
vector, which points to the startup NMI stub, is placed in 0007C – 0007Fh. Other
vectors could of course be filled in the same manner (see Table 10.8). Space is
then reserved up to 003FFh.
The startup program proper begins at 00400h. This has two purposes. The
first is to go to the main C routine, which is implemented as a simple JSR _main
in line 12. No flags require changing in the Status register before this move, as we
are remaining within the Supervisor state, and the initial Interrupt mask setting
of 111b still permits edge triggered non-maskable interrupts. In our situation,
no parameters are passed (i.e. through the stack) to main(), and, as this is an
endless loop, there should be no return. If there is, a move back to the beginning
is actioned. This re-entry point is labelled _exit, and can be reached from the
C level by calling the ANSI library routine exit() [3]. exit() is supposed to
pass a status value back to its environment, to indicate an error condition, but
no use is made of this in our implementation.
The final function deals with the level-7 interrupt handler. The update()
function is terminated with RTS, in line 117 of Table 14.9, and so cannot be
directly entered via an interrupt. Instead, the startup has a stub, in lines 14 –
17, which is labelled NMI. This address was placed in the vector table earlier in
line 8. When a level-7 interrupt occurs, the processor goes to this stub. All that
happens here is that the registers D7, A1 and A2 are pushed onto the System
stack, and a subroutine Jump (JSR _update) is made to update(). The way
back is a similar double-hop, with update()'s RTS returning the processor to
the stub, the registers then being pulled off the stack followed by a terminating
RTE. This compiler's house rule always preserves D3, D4, D5, A3, A4, A5 (all of
which are used for register variables) and A6, A7 (the Frame and Stack Pointers)
on return from a function. Thus a general interrupt stub need only save D0, D1,
D2, D6, D7, A0, A1, A2. However, specifically update() only uses D7, A1 and A2.
The linker places the startup code before the output from the C compiler, giv-
ing the Intel-coded machine-code file of Table 14.11. This is used as the input
to an EPROM programmer or in-circuit emulator. In total there are 244 bytes of
EPROM text (excluding the fixed vector table). It is interesting to compare this
with the 178 bytes produced by the 6809 equivalent in the last section. Although
there are fewer lines, 68000 instructions tend to be longer than their 6809
counterparts.
The double-hop interrupt handling technique will work with any compiler.
However, most compilers with aspirations to produce ROMable code support
extensions to the ANSI standard, enabling the user to declare a function as an
interrupt handler (see Section 10.2). The function name, in our case _update,
should then be placed directly in the vector table, rather than the stub label. This
direct entry should decrease the response period to an interrupt, and at the same
time possibly reduce the code emitted by the compiler.
This compiler uses the directive @port to designate a function in this way,
the function heading becoming @port update(). The code produced by this
stratagem is shown in Table 14.12. Here, four instructions have been inserted
into the function code, lines 104 – 107. These instructions are virtually identical
to the stub of Table 14.10, but of course are directly entered at _update. In
reality no time is saved, as the main body of update() is unchanged, and is
simply treated as a subroutine. Thus a double hop still occurs on entry and exit.
Bonding with the update() function code can be improved by eschewing the
use of @port and using the (once again non-standard) _asm() function to insert
the relevant assembly-level code, as shown in Table 14.13. Thus:
:200400004EB90000041860F848E701604EB9000004C44CDF06804E734E56FFF448E70C00BE Startup
:200420002D7C00002000FFFC2D7C00002001FFF82D7C0000A000FFF442390000E000424519 and main()
:200440000C4501006C0E227C0000E00242315000524560EC18390000E00042450C450100A0
:200460006C2A226EFFFC1E051287226EFFF87E001E043C0548C6DE860287000000FF2447D2
:20048000D5FC0000E0021292524560D0226EFFF412BC00FF226EFFFC4211226EFFF8247CE9
:2004A0000000E0027E001E390000E00012B2780042450C4500056C04524560F6226EFFF4AC
:2004C000421160904E56FFFC2D7C00006000FFFC1E390000E00052390000E000028700000B update()
:1404E00000FF2247D3FC0000E002246EFFFC12924E5E4E754F
:00000001FF
at the close, tightly couples the additional interrupt code to the compiler-emitted
code. Care must be taken to mirror any registers pushed out on to the stack by
the compiler. _asm() could also, in principle, be used to implement the startup
code as a front end to the C code.
These latter two approaches are non-portable, and can be error prone. Thus,
in the majority of cases a startup stub approach is best, if rather slow. If speed
and/or space is extremely tight, then the compiler generated assembly-level file
can be creatively edited. Thus any MOVEM instruction emitted by the compiler
can be augmented with the registers not left untouched by the routine, and RTS
replaced by RTE. But this approach is dangerous, as it does not show up in the
source of any constituent file, and, unless extremely well documented, will cause
havoc if any but the original designer tries to make subsequent changes. In any
case, tinkering with intermediate files is not what compiling is all about.
In the last section we were able to fine tune our C source file, knowing the
characteristics of the target processor. The increase in speed and reduction in size is
of course at the expense of portability. Can we do this for the 68000-target version? For
94 * 31 /********************************************************************************
95 * 32 * This is the NMI interrupt service routine which puts the analog sample in the
96 * 33 * ENTRY : Via NMI and startup
97 * 34 * ENTRY : Array[] and Oldest are global
98 * 35 * EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound
99 * 36 ********************************************************************************
100 * 37
101 * 38 @port update()
102 * 39 {
103 .even
104 004B4 48e7 e3e0 _update: movem.l d0-d2/d6/d7/a0-a2,-(sp) *## Save all registers
105 004B8 4eb9 000000bc jsr L6 *## Go to update() proper
106 004BE 4cdf 07c7 movem.l (sp)+,d0-d2/d6/d7/a0-a2 *## Restore regs before
107 004C2 4e73 rte
108 004C4 4e56 fffc L6: link a6,#-4 *## As Table 14.9
109 * 40 volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port*/
110 004C8 2d7c 00006000fffc move.l #0x6000,-4(a6)
111 * 41 Array[Oldest++] = *a_d; /* Overwrite oldest sample in Array[] and inc index */
112 004D0 1e39 0000e000 move.b _Oldest,d7
113 004D6 5239 0000e000 addq.b #1,_Oldest
114 004DC 0287 000000ff and.l #255,d7
115 004E2 2247 move.l d7,a1
116 004E4 d3fc 0000e002 add.l #_Array,a1
117 004EE 246e fffc move.l -4(a6),a2
118 004F0 1292 move.b (a2),(a1)
119 004F2 4e5e unlk a6
120 004F4 4e75 rts
121 *fnsize=110
122 .globl _update
123 .globl _main
124 .bss
125 .even
126 0E000 _Oldest: . =.+1
127 .globl _Oldest
128 .even
129 0E002 _Array: . =.+256
130 .globl _Array
example, we have previously observed that the use of register short and char
objects is counterproductive, as such objects are extended to int during most
arithmetic processes. Neither short i nor char leftmost rely on modulo-256
wraparound, so they can profitably be redefined as ints to overcome this addi-
tional processing.
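The promotion rule at work here can be seen in a small host-side sketch. Both functions below compute the same array index as line C22, once with the narrow types of Table 14.9 and once with plain int. The function names are illustrative only.

```c
/* Narrow operands: both are promoted to int before the addition,
   which is where the extension instructions in the listing come from */
unsigned index_narrow(unsigned char leftmost, short i)
{
    return (unsigned)((leftmost + i) & 0x0ff);
}

/* int operands: no promotion is needed, and the result is the same */
unsigned index_int(int leftmost, int i)
{
    return (unsigned)((leftmost + i) & 0x0ff);
}
```

For example, with leftmost at 200 and i at 100 both forms give 44, that is 300 modulo 256. The mask, not the width of the variables, provides the wraparound.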
Most 68000-targeted compilers can be persuaded to define int as either a 16
or 32-bit word. All previous examples have been based on 32-bit ints. Using 16-
bit ints will speed up memory access and ALU processes. However, any address
arithmetic, such as the calculation of the position of an array element, will require
conversion to the 32-bit pointer size.
Where constants are being stored, for example pointers to fixed hardware
ports, it is not necessary to locate these values dynamically in the frame. We can
see this run-time setup in line 110 of Table 14.12, where the constant 6000h (the
address of the A/D) is put into the frame on each entry to update(). Constants
are best stored in absolute locations, preferably in ROM along with the program
text. In the case of constant pointers, this can be done by defining such objects
as static, for example:
.even
L53_a_d:.long 0x6000 *## Pointer constant to A_D here in ROM
*30 }
*31 /**************************************************************************************
*32 * This is the NMI interrupt service routine which puts the analog sample in the array
*33 * ENTRY : Via NMI and startup
*34 * ENTRY : Array[] and Oldest are global
*35 * EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound @ 256
*36 **************************************************************************************
*37
*38 void update(void)
*39 {
.even
*40 static volatile unsigned char * const a_d = ANINPUT;/* This is the Analog i/p port*/
*41 Array[Oldest++] = *a_d; /* Overwrite oldest sample in Array[] & inc Oldest index */
_update: move.b _Oldest,d7 *## Notice, no frame is made for this function as no autos
addq.b #1,_Oldest
and.l #255,d7
move.l d7,a1
add.l #_Array,a1
move.l L53_a_d,a2
move.b (a2),(a1)
*43 }
rts
*fnsize=99
.globl _update
.globl _main
.bss
.even
_Oldest: . =.+1
.globl _Oldest
.even
_Array: . =.+256
.globl _Array
that is the _text section. The compile-time nature of these constants is clearly
seen in lines 3 – 9 of Table 14.14, where they are placed in the EPROM at locations
00418 – 0041Fh. This saves four bytes for each of the four pointer constants.
Furthermore, neither main() nor update() require a frame, as no auto variables
are used.
Table 14.14 shows the tuned version of our software. It differs from Table 14.9
in the following respects:
1. The compiler has been configured for a 16-bit int. This obviates conversion to
32-bit for arithmetic processes, and suits the 16-bit ALU used by the 68000/8
processors. However, it is a double-edged sword, in that pointers are still 32-
bit, and the use of an int to generate an address (e.g. as an array index) will
require a promotion (see code between C22 and C23).
2. The register variables i and leftmost have been redefined as int types.
This avoids conversion extensions in arithmetic processes.
3. Pointer constants have been defined as static, which places them in ROM.
The alternative of defining them externally (see Table 15.5) does the same
thing. The compiler then uses absolute addressing to get these values (see
code between lines C21 and C22). Some compilers (not this one) have a small
model mode where the Short Absolute address mode is used. Where this is
available, two bytes are saved for each absolute access.
With these alterations, the total size is now down to 224 bytes plus data
storage and the fixed-size vector table. I have used the startup stub entry to
update(), which is the most portable technique. A few bytes may be saved at
the expense of this portability by direct editing of the assembly-level code. For
comparison, a hand-assembled 68000-based version is given in Table 16.2.
Another possibility, not implemented here, is to use pointers to implement the
three array handling loops. As these loops walk through the array, the process
may be more efficient (see Section 9.2). However, with this compiler savings are
minimal, as its array-handling code is quite efficient [2]. See Table 15.5 for an
example of this technique.
References
[1] Banahan, M.; The C Book, Addison-Wesley, 1988, Chapter 5.
[2] Sutherland, D.; Compiled Thoughts, Letters to the Editor, Embedded Systems, 6, no. 5,
May 1993, p. 11.
[3] Banahan, M.; The C Book, Addison-Wesley, 1988, Section 9.15.4.
CHAPTER 15
The process from program text to binary bits in ROM has already been charted
in Fig. 7.5. But what then? It would be naive to presume that the production of
a machine-code file and programmed ROM is the end of the story. Just inserting
this ROM into the target hardware, switching on and hoping for the best is unlikely
to be productive of anything except frustration. Invariably testing and debugging
this software will take far longer than the writing of the original code [1].
Testing involves executing the software in a controlled environment, to exer-
cise the various responses to typical input stimuli. It is impossible to test every
possible pathway in all but the simplest of routines, but a range of typical and
boundary values should help to ensure that the program behaves properly.
Decomposing the program into functional modules helps to facilitate this process.
Malfunctions are said to be caused by bugs (after an alleged incident where a
moth was caught in a relay of an early electro-mechanical computer [2]). Bugs are
normally found by applying a series of tests which focuses down onto the area
of software (or hardware) which is exhibiting the erroneous behavior. Hardware
testing and debugging is aided by a range of tools, varying in complexity from
meters through to the logic analyser. Similarly, software debug tools are available
in various levels of sophistication, to enable the tester to `see into the works'
while the program executes.
The easiest scenario arises when a general-purpose computer is used to gen-
erate (i.e. compile and assemble) code which it will itself subsequently run. Its
many resources, such as operating system, VDU and keyboard, can be utilized
with resident debug software to test the operation of the application software.
Virtually total ignorance of the underlying hardware is possible.
The situation is very different when the target system is a dedicated ROM-
based stand-alone system, usually with a different processor to the code-generating
computer. In this cross environment (see Fig. 7.3(b)), gone is any resident debug
software or superfluous peripheral devices. Interaction with the hardware at the
machine code level looms large. Problems are compounded where a high-level
language is used as the source, as the correlation between the executing code
and source code is tortuous. At the time of writing, high-level simulators and
emulators are the fastest growing area of cross software support.
In this chapter we will look at some of the debug tools available for cross-target
support. Our time-compressed memory will be used to illustrate the character-
istics of these aids.
15.1 Simulation
Given that a program has been written in a high-level language for another target,
how is it to be tested? We have already observed that a naked target will carry no
debug overhead to permit meaningful monitoring. Furthermore, it will execute
(obviously) at machine-code level.
A first approach to the problem is to use a native compiler running on the
host machine. Thus, if an IBM PC is utilized as the development system, then use
a compiler which produces native executable files. Obviously the environment of
the host is very different to that of the naked target. However, gross algorithmic
problems can often be eliminated using this technique. Function input/output
parameters can be simulated by using operating system input/output functions.
Table 10.14 shows a simple example, where the sum_of_n function is emulated
using keyboard input and VDU output.
Monitoring high-level objects can be accomplished by using output functions
to display or print their values. In Table 10.14, a printf() statement inside the
loop would be suitable. Many native compilers, especially (but not exclusively)
targeted to MSDOS 80x86 family hosts, can be run in conjunction with a debug
package [3]. Such packages allow the operator to watch a selection of variables as
the program advances in a single-step or trace mode. Alternatively a breakpoint
may be inserted (e.g. stop at line 26, and/or when Array[6] = 0), which permits
high-speed operation to a predetermined point, at which time execution ceases
and variables can be examined.
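As a minimal sketch of this native approach, the routine below can be compiled on the host and its loop variables printed as it runs. Table 10.14 is not reproduced here, so the signature of sum_of_n() and the TRACE_SUM switch are assumptions made for illustration only.

```c
#include <stdio.h>

/* Define TRACE_SUM at compile time to print the loop variables on the host. */
#ifdef TRACE_SUM
#define TRACE(n, sum) printf("n = %u, sum = %lu\n", (unsigned)(n), (unsigned long)(sum))
#else
#define TRACE(n, sum) ((void)0)
#endif

/* Sum of the integers n + (n-1) + ... + 1 (assumed form of the routine). */
unsigned long sum_of_n(unsigned int n)
{
    unsigned long sum = 0;
    while (n > 0) {
        sum += n;
        n--;
        TRACE(n, sum);   /* host-only monitoring of n and sum */
    }
    return sum;
}
```

On the host, a small main() can read n from the keyboard and print the returned value. The trace macro compiles to nothing when TRACE_SUM is not defined, so the same source can go through the cross compiler unchanged.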
The usefulness of this native technique is enhanced if the native and cross
compilers belong to the same family of products. Several compiler vendors pro-
vide suitable products, such as Aztec and the Intermetrics/COSMIC Whitesmiths
group. In these circumstances, native and cross products usually share common
characteristics, such as libraries.
Where there is a great deal of interaction between software and hardware,
native debugging is of limited use. This is particularly the case where the target
processor is different to the host, for example a 68030 MPU-based Hewlett
Packard workstation hosting a Z80-based target. Monitoring machine-level code
will often be necessary to reveal the more subtle problems, especially where hard-
ware interaction is involved.
One way of tackling this problem is to use the host to simulate the target MPU [4].
Such a cross-simulator, sometimes known as a low-level symbolic debugger, is
particularly of use in testing cross-assembler code. However, languages whose
compilers produce assembly-level code can also be tested in this manner. One
major advantage of the use of a simulator is that no target hardware is involved.
Thus the hardware and software design stages can stay apart longer. This takes
the load off expensive equipment, such as an in-circuit emulator, which can then
be used for the really obscure problems and final testing. By their nature, simula-
tors cannot run in real time and they still leave a lot to be desired when interaction
with hardware is problematical.
Most simulators take as their input the linker's output, in terms of the machine code,
location data and symbol tables. Part of the host's memory space is used to hold
this machine code, and the target's data memory space is likewise mapped. The
major facilities offered by a simulator are:
Disassembly
Displays the contents of simulated target memory as instruction mnemonics
— a sort of reverse assembler.
Register and memory examine/change
To be able to examine any internal register(s) or memory location(s) and make
necessary changes.
Step execution
To execute the target program one or more instructions at a time, usually
displaying registers after each one.
Trace execution
Similar to the previous item, but as fast as can be displayed.
Breakpoints
Insertion of conditions, such as reaching a certain address, which causes exe-
cution to pause or stop.
Execute
Similar to Trace, but as fast as the simulator can operate with no screen output.
Normally stops when a breakpoint is encountered.
The operation of a simulator is very much product specific. The COSMIC/Inter-
metrics MIMIC range of simulators have been used for the following three exam-
ples.
Our first simulation is our old friend the sum of n integers, Table 4.10. Ta-
ble 15.1 is a log of a simulation session, with comments added later for clarity.
After loading in the file, the process was:
1. Disassemble program mnemonics from the beginning (e SUM_OF_N or e 0x400).
2. Change D0 to 0xFF0003 to simulate D0.W = 0003h ($D0 = 0xFF0003).
3. Single step until S_EXIT is reached (s or s1). Note that Step goes from the
current value of PC, here initialized to 400h when the object file was loaded.
Thus to start again, do $pc = SUM_OF_N or $pc = 0x400.
The second example is more elaborate. Here the target is the 6809 equivalent
of Tables 2.9 and 2.10. We wish to trace the execution down to where the simu-
lated processor attempts to fetch the final instruction (RTS) at S_EXIT. Thus we
have to set up a breakpoint at this address.
This time the log shown in Table 15.2 was generated as follows:
1. Set a breakpoint at S_EXIT (b S_EXIT or b 0xE00C). Note, br S_EXIT sets a
break when reading from S_EXIT, which is an alternative in this situation.
Breaks on a Write and over a range of addresses are possible. For example
bw 0xE000, 0xFFFF breaks when a Write is attempted in memory between
E000h and FFFFh, which is one way of simulating ROM. An unlimited number
of breakpoints can be set.
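The idea of typed breakpoints over an address range can be sketched in C as follows. This is not how MIMIC is implemented; the type and function names are hypothetical, and the table simply mirrors the b, br and bw commands described above.

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { ACC_FETCH, ACC_READ, ACC_WRITE } access_t;

typedef struct {
    access_t kind;    /* which kind of bus access triggers the break */
    uint16_t lo, hi;  /* inclusive address range */
} breakpoint_t;

/* Return 1 if the access matches any breakpoint in the table, else 0. */
int hits_breakpoint(const breakpoint_t *bp, size_t count,
                    access_t kind, uint16_t addr)
{
    size_t i;
    for (i = 0; i < count; i++) {
        if (bp[i].kind == kind && addr >= bp[i].lo && addr <= bp[i].hi)
            return 1;
    }
    return 0;
}
```

A simulator loop built on this sketch would call hits_breakpoint() before every simulated bus cycle. A single ACC_WRITE entry covering E000h to FFFFh then behaves like ROM that objects to being written.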
Table 15.1 Simulating the program of Table 4.10. User input shown in quotes, comments bracketed.
"e SUM_OF_N #5"
(Disassemble from SUM_OF_N for five instructions)
0x000400 SUM_OF_N:
--> 0x000400 02800000ffff andi.l #0xffff,d0
0x000406 4281 clr.l d1
0x000408 SLOOP:
0x000408 d280 add.l d0,d1
0x00040a 51c8fffc dbf d0,SLOOP
0x00040e S_EXIT:
0x00040e 4e75 rts
"$d0 = 0xff0003"
(Set D0.L to 00FF0003h)
"s"
(Single step for as long as desired)
TS I XNZVC
d0:00ff0003 d1:00000000 d2:00000000 d3:00000000 ssp:00000000 sr:../0/..... |
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:00000400 | (Before
a0:00000000 a1:00000000 a2:00000000 a3:00000000 | ANDI.L #FFFFh)
a4:00000000 a5:00000000 a6:00000000 a7:00000000 andi.l #0xffff,d0 |
"s"
d0:00000003 d1:00000000 d2:00000000 d3:00000000 ssp:00000000 sr:../0/..... |
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:00000406 | (After execution.
a0:00000000 a1:00000000 a2:00000000 a3:00000000 | Next instruction
a4:00000000 a5:00000000 a6:00000000 a7:00000000 clr.l d1 | is CLR.L D1)
"s"
d0:00000003 d1:00000000 d2:00000000 d3:00000000 ssp:00000000 sr:../0/..Z.. |
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:00000408 | (Status register
a0:00000000 a1:00000000 a2:00000000 a3:00000000 | showing the
a4:00000000 a5:00000000 a6:00000000 a7:00000000 add.l d0,d1 | Z flag setting)
"s"
d0:00000003 d1:00000003 d2:00000000 d3:00000000 ssp:00000000 sr:../0/.....
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:0000040a
a0:00000000 a1:00000000 a2:00000000 a3:00000000
a4:00000000 a5:00000000 a6:00000000 a7:00000000 dbf d0,SLOOP
"s"
d0:00000002 d1:00000003 d2:00000000 d3:00000000 ssp:00000000 sr:../0/..... | (PC goes back to
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:00000408 | 408h, i.e. SLOOP
a0:00000000 a1:00000000 a2:00000000 a3:00000000 | as D0.W isn't
a4:00000000 a5:00000000 a6:00000000 a7:00000000 add.l d0,d1 | -1)
"s"
d0:00000002 d1:00000005 d2:00000000 d3:00000000 ssp:00000000 sr:../0/.....
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:0000040a
a0:00000000 a1:00000000 a2:00000000 a3:00000000
a4:00000000 a5:00000000 a6:00000000 a7:00000000 dbf d0,SLOOP
"s"
d0:00000001 d1:00000005 d2:00000000 d3:00000000 ssp:00000000 sr:../0/.....
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:00000408
a0:00000000 a1:00000000 a2:00000000 a3:00000000
a4:00000000 a5:00000000 a6:00000000 a7:00000000 add.l d0,d1
"s"
d0:00000001 d1:00000006 d2:00000000 d3:00000000 ssp:00000000 sr:../0/.....
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:0000040a
a0:00000000 a1:00000000 a2:00000000 a3:00000000
a4:00000000 a5:00000000 a6:00000000 a7:00000000 dbf d0,SLOOP
"s"
d0:00000000 d1:00000006 d2:00000000 d3:00000000 ssp:00000000 sr:../0/.....
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:00000408
a0:00000000 a1:00000000 a2:00000000 a3:00000000
a4:00000000 a5:00000000 a6:00000000 a7:00000000 add.l d0,d1
"s"
d0:00000000 d1:00000006 d2:00000000 d3:00000000 ssp:00000000 sr:../0/..... | (D0.W has been
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:0000040a | decremented to
a0:00000000 a1:00000000 a2:00000000 a3:00000000 | -1, so end of
a4:00000000 a5:00000000 a6:00000000 a7:00000000 dbf d0,SLOOP | DBF loop)
"s"
d0:0000ffff d1:00000006 d2:00000000 d3:00000000 ssp:00000000 sr:../0/..... | (Ans is in D1.L
d4:00000000 d5:00000000 d6:00000000 d7:00000000 usp:00000000 pc:0000040e | at end of
a0:00000000 a1:00000000 a2:00000000 a3:00000000 | subroutine. RTS
a4:00000000 a5:00000000 a6:00000000 a7:00000000 (rts ) | isn't executed)
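The register values in this log can be cross-checked with a small C model of the four instructions. It is only a sketch of their data movement (condition codes are ignored), and the names regs68k and sum_of_n_model are invented for the purpose.

```c
#include <stdint.h>

typedef struct { uint32_t d0, d1; } regs68k;

/* Model of SUM_OF_N: returns the final D0.L and D1.L for a given entry D0.L. */
regs68k sum_of_n_model(uint32_t d0_in)
{
    regs68k r;
    uint16_t w;
    r.d0 = d0_in & 0x0000FFFFul;                /* andi.l #0xffff,d0 */
    r.d1 = 0;                                   /* clr.l  d1         */
    do {
        r.d1 += r.d0;                           /* add.l  d0,d1      */
        w = (uint16_t)((r.d0 & 0xFFFFu) - 1u);  /* dbf decrements D0.W only */
        r.d0 = (r.d0 & 0xFFFF0000ul) | w;
    } while (w != 0xFFFFu);                     /* fall through when D0.W is -1 */
    return r;
}
```

Entering with D0.L = 00FF0003h, the model leaves D0.L = 0000FFFFh and D1.L = 00000006h, in agreement with the final register display of the session.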
"$b = 03"
(Make Accumulator B 03)
"t SUM_OF_N"
(Trace from sum_of_N down to breakpoint)
generates a level-7 interrupt each 20,000 cycles (100 interrupts per simulated
second with an 8 MHz clock) and continues on.
Simulating high-level sourced programs of any size at this level is tedious at
the very least. Table 7.14 was deliberately chosen to have only static variables,
so that each variable has a meaningful label attached. Most variables in C are
dynamic (i.e. auto), and have no fixed abode. Of course in a simulator, their posi-
tion relative to the Frame Pointer can be found and therefore accessed. However,
determining by hand where many variables are in the frame is time consum-
ing. Some compilers produce a report on the size and location of each variable.
An example of such a report is given in Table 15.4. Not only is this useful for
assembly-level simulation, but it is the first step towards high-level cross simu-
lation.
Simulators used to debug realistic high-level sourced software must provide
the facility to monitor directly high-level objects and instructions as well as ad-
dresses, registers and assembly-level instructions. The next few examples are
based on the Intermetrics/COSMIC CXDB (C cross DeBugger) products. These are
high-level front ends to the MIMIC simulator (renamed MICSIM for MICroproces-
sor SIMulator) we have just used. The user can move down to MICSIM at any time
to perform any machine-level task, for example to set up a breakpoint on an at-
tempt to write to simulated ROM, and move back up again. MICSIM's instructions
are the same as those illustrated in Tables 15.1 – 15.3.
At the high level, the following core features are available:
Listing
Displays the C source with or without the resulting assembly code, around the
current execution point in the source window.
Table 15.4 A report on the variables used in the 68008 TCM system of Table 15.5.
Information extracted from a:diag_682.xeq
FUNCTION : extern void update() lines 50 to 53 at 0x4d4-0x4f8 <- Function update() has
<- no local variables
FUNCTION : extern void diagnostic() lines 61 to 74 at 0x4f8-0x548 <- Nor has diagnostic()
FUNCTION : extern void output_test() lines 77 to 86 at 0x548-0x572 <- Ftn output_test has
VARIABLES: <- only 1 reg. variable
register unsigned char count at reg. d5
FUNCTION : extern void input_test() lines 88 to 92 at 0x572-0x590 <- Ftn input_test() has
<- no local variables
FUNCTION : extern void RAM_test() lines 94 to 111 at 0x590-0x5d0 <- Function RAM_test()
VARIABLES: <- All its local variables
register unsigned int i at reg. d5
register unsigned char temp at reg. d4
register unsigned char *memory at reg. a5
FUNCTION : extern void ROM_test() lines 113 to 120 at 0x5d0-0x606 <- Function ROM_test()
VARIABLES: <- All its local variables
register unsigned char *address at reg. a5
register unsigned short sum at reg. d5
Monitor
Displays state of C objects or values of C expressions continuously in the
monitor window.
Update
Allows a C data object to be altered at any time.
Step execution
Steps through the C program, each step executing one or more C source lines.
During this time, variables may be continually monitored, and the function
window shows which function the program is in and what values were passed
to it. Also the state of the frame and any variables in that function can be
examined.
Breakpoints
Insertion of high-level conditions, such as executing a C line or entering a func-
tion, which causes execution to pause. Actions may be taken automatically on
pause, such as changing the value of a variable.
Execute
Runs simulation at full speed from halt point, normally to the next breakpoint.
For our first example, consider the screen dump of Fig. 15.1(a). This shows the
sum-of-integers program of Table 7.14 in the central code window. The cursor is
at line 6, which has yet to be executed. The variables n and sum appear in the top
left monitor window. To get to this point, the following commands were entered:
1. Step from line 1 (entry point) to line 5 (s5 or five single steps, s).
2. Command that the variables n and sum be monitored (m n,sum).
3. Set (Update) the value of n to 20 to simulate a passed parameter (u n 20).
4. Single step around the loop once, that is four steps (s4). After doing this, n
has been added to sum, thus sum is 20. Also n has been decremented, thus n
is 19.
With the loop operation checked, the algorithm can be verified by either single
stepping until line 12, or more conveniently, setting a breakpoint and executing
at full speed. The screen dump of Fig. 15.1(b) is the result, with the following
additional commands:
1. Set a breakpoint at line 11 (b :11)
2. Go from current cursor (g :11 or just g)
3. Look at register state at breakpoint (r)
Now n has been decremented to zero and sum is correctly read as 210. Also
shown in the code window is the state of the registers. If we could have gone back
to the calling function, Accumulator_B would have been D2h (i.e. sum returned).
The register display can be toggled on and off by entering r.
Figure 15.1(b) showed that the underlying target was 6809 code. If the regis-
ter display had not been toggled, this fact would not have been known (actually
Fig. 15.1(a) was generated using the 68000 simulator, just to illustrate this point!).
Machine independence is a feature of high-level symbolic simulators. It is pos-
sible to step showing the mixture of high and low-level codes, but stepping and
monitoring are still at source level.
The top right window shows both the current function and the function path
taken to arrive at the current point. This is not very informative in the simple
situation simulated in Fig. 15.1, but is useful in realistic situations. Figure 15.2
depicts the exponentiation program of Table 9.1. Three functions are coded,
namely main(), power() and abs(). I have stepped through this program until
exp is 1. The function window then shows abs(15625) at the bottom, which says
you are now in function abs() which has had 15,625 passed to it (i.e. result).
Above this is power(25,3), which says the entry point was from power(), to
which had been passed the values 25 and 3. Finally above this is main(), the
caller of power(). The 68000 simulator was used for this diagram.
Finally let us examine our time-compressed memory software of Table 14.1. In
Fig. 15.3(a) I have stepped around the loop so that i is ten. The monitor window
shows the contents of x (the X D/A converter), which is just i, the contents of
y (the Y D/A converter), the variable Oldest (changed by update(), here = 0)
and the array value currently being sent out to y (here Array[10]). This shows
that expressions and indirection can be monitored, as well as simple variables.
If pointers are monitored, they display in hexadecimal. Ordinary objects default
to decimal, but using the mx command (examine Memory in heXadecimal) forces
hexadecimal.
In Fig. 15.3(b) I have converted the code window to a view box of the variables
in function main(). The top five (i, leftmost, *x, *y and *z) are all auto vari-
ables and their address is given in the System stack (set to 400h by the startup
code in the 6809 version). As *x, *y and *z are pointers, their value is given in
hexadecimal.
Array[] is an external variable and is listed under file variables. All 256 values
are given. If the window is scrolled down, the file variable Oldest is given as:
The screen dumps shown in Fig. 15.3(a) and (b) were taken on different runs
and thus *y and Array[10] vary between the two situations.
Both Array[] and Oldest objects are updated by an interrupt service routine.
How is such an interrupt simulated with CXDB? `Generating' an interrupt at any
time (typically in between steps or at a breakpoint) requires a move down to the
underlying assembly-level simulator (MICSIM with this product). The sequence
of commands to generate the screen dump shown in Fig. 15.4 was:
the next null instruction. Thus all address logic, Chip Enable connections and
some control signals, such as Reset and Clocks, can be monitored dynamically.
Simple test equipment, such as a logic probe or oscilloscope, is adequate for this
purpose. Free running is especially useful for signature analysis [6].
get MPU. Firstly it should not execute any Write cycles, as the data bus has been
hijacked. This means that it will either do something on an internal register or
even nothing at all. Secondly, its op-code should be the size of the data bus, or
if larger, a repetitive multiple.
Figures 15.6(a) and (b) show the free-run facility applied to the 6809 MPU. Normally
the switch is open, and the two back-to-back diodes do not conduct. With
the switch closed and the data bus isolated from the outside world, the pattern
01011111b (5Fh) is jammed onto the bus. Thus on Reset, the 6809 fetches down
the two bytes at FFFE:Fh, 5F5Fh in this case, and commences execution at this
address. Its first instruction is 5Fh, or CLRB. Once this has been done (all Read cycles),
the instruction at 5F60h is fetched, again CLRB…ad infinitum. CLRB rather
than NOP was chosen, as the latter's op-code of 12h would require six diodes.
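The diode count for any candidate op-code can be estimated by counting its zero bits. The sketch below assumes one diode per data line that must be pulled low, with the remaining lines left high; the function name is hypothetical.

```c
/* Number of data lines that must be pulled low to jam an 8-bit pattern
   onto the bus: one per zero bit (assumes undriven lines read as 1). */
unsigned zero_bits(unsigned char pattern)
{
    unsigned count = 0, bit;
    for (bit = 0; bit < 8; bit++) {
        if (!(pattern & (1u << bit)))
            count++;
    }
    return count;
}
```

For the CLRB pattern 5Fh (01011111b) the function returns 2, the two diodes of the circuit described.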
As long as the 6809 free runs, its address bus acts as a 16-bit counter, cycling
from 0000h to FFFFh. Assuming a 1 MHz clock (4 MHz crystal), a15 will cycle in
$2^{16} \times 2\,\mu\mathrm{s} = 0.131$ s, a14 in 0.0655 s, down to 4 µs for a0. During this time R/W and
E and Q can be monitored using an oscilloscope.
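The arithmetic generalizes to any address line. As a sketch, treating the address bus as a binary counter that advances once per null instruction (the function name and parameters are illustrative only):

```c
/* Period, in microseconds, of address line a<n> when the address bus
   counts up by one every step_us microseconds. */
double address_line_period_us(unsigned n, double step_us)
{
    return (double)(1UL << (n + 1)) * step_us;
}
```

With step_us = 2, line a15 gives 131072 µs (0.131 s) and a14 gives 65536 µs (0.0655 s), the figures quoted above.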
The address decoder outputs will last 1/8 of this cycle time, and will appear in
the correct sequence. Some typical examples are shown in Fig. 15.7. These can in
turn be traced to the appropriate Chip Enables. Although the data bus is discon-
nected from the MPU during this time, it will still be activated by any enabled in-
put device. Thus using a 2-beam oscilloscope, monitoring the Switch_Port_Enable
(i.e. 8000h) and d0, will enable the state of Switch 0 to be seen, gated through to
the data line. Similarly, activity at the time of the EPROM and RAM Chip Enables
can be viewed.
Figure 15.7 One free-run cycle, showing RAM, A/D and DIG_O/P Enables.
The requirements for a 68000 null instruction are more stringent, as its op-
code must be even. This is because of the requirements that both PC and SP
must be even, and these will be equal to the null op-code after Reset. Should an
odd word be fetched for these, then a fatal Double-Bus fault will occur [7]. The
word size of a 68000 op-code puts further restrictions on the choice of a null
instruction for the 68008 processor, as this will fetch the op-code down in two
identical bytes. Both considerations rule out the NOP instruction, with its op-code
of 4E-71h. Fortunately the op-code for ORI.B #0,D0 is 00-00h, and this fulfils
all these requirements.
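The two bus-pattern conditions can be expressed as a simple check on a candidate op-code word. This sketch covers only evenness and the identical-bytes condition for the 8-bit bus; whether the instruction performs any Write cycle must still be confirmed from the instruction set. The function name is hypothetical.

```c
#include <stdint.h>

/* Return 1 if a 16-bit op-code is even and its two bytes are identical,
   as needed when one jammed byte is fetched twice by the 68008. */
int null_opcode_pattern_ok(uint16_t opcode)
{
    uint8_t high = (uint8_t)(opcode >> 8);
    uint8_t low  = (uint8_t)(opcode & 0xFFu);
    return (opcode & 1u) == 0 && high == low;
}
```

NOP (4E71h) fails on both counts, whereas 0000h (ORI.B #0,D0) passes.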
The free-run circuitry shown in Figs 15.6(c) and 15.6(d) comprises two head-
ers. The normal header simply connects the eight data lines and DTACK directly
through. Free running is accomplished by replacing this by a header shorting
these lines to ground.
As in the 6809 case, the PC and hence the address bus repetitively cycle
through the entire address space. As the null instruction takes 16 cycles, then at
8 MHz a 2 µs instruction time is obtained. Address line a19 takes $2^{20} \times 2\,\mu\mathrm{s} = 2.1$ s
to complete a sweep. During this time both AS and DS operate in the normal way,
and R/W is high (Read).
The address decoder outputs cycle with a repetition rate of $2^{16} \times 2\,\mu\mathrm{s} = 0.131$ s,
as a15 is the highest decoded line. Waveforms are similar to those for the 6809,
shown in Fig. 15.7, but decoder outputs are qualified by AS, giving striated Chip
Enables. As previously described, these can be used with a 2-beam oscilloscope
to monitor activity on the data bus. The DTACK generator can also be monitored,
although this is trivial in simple circuits such as shown in Fig. 13.3. Logic analyzer
traces showing the 68000 in this free-run mode are shown in reference [8].
The free-run facility is useful, as it requires a minimum of built-in test hard-
ware. It is possible to take the process a stage further, and incorporate a hard-
ware single-step facility [8]. However, this isn't often done, as the extra testability
rarely justifies the expense.
With a reasonable assurance that the target hardware is functioning, the diagnostic
software can be loaded in. This can be done by using a ROMulator (ROM
emulator) or by programming an EPROM and inserting it into its socket. The former
uses a block of dual-port RAM to take the place of ROM memory. One port is con-
nected to the ROM socket in the target system via a ribbon cable and DIL plug.
The other port is controlled from the terminal, typically the workstation on which
the compiler/assembler runs. A driver software package downloads hex files to
this RAM, usually through a serial link. With the loading completed, the ROMula-
tor can be switched to emulate mode, and will appear as a programmed ROM [9].
The use of an in-circuit emulator for this purpose is the subject of the following
section.
The circuits of Figs 13.1 and 13.3 have a 4-bit switch port available to choose
between normal and diagnostic modes. With this port at zero at power-up, the
normal application program is run. In the diagnostic mode, one of four tests is
made, as follows:
Once one of the two modes has been entered, change-over can only be implemented
through a Reset. After Reset the switch port can be used as a normal
run-time port.
The application software shown on the first page of Table 15.5 is basically a
modified version of Table 14.14. There are two significant changes. Firstly, all
ports (now including the switch port, named diag_port) are defined externally,
that is before main(). This is because they are needed by the various diagnostic
routines, besides main() and update(). Although it is considered poor program-
ming practice to use public objects unnecessarily, hardware ports are by nature
global. As can be seen from Table 15.6, such objects are stored as constants
in ROM in the same manner as the static const equivalents of Table 14.14.
They could still be qualified as static, in which case they would not be made
globally known to the linker.
The second change checks the state of diag_port (ANDed with 00001111b
to zero the undefined upper four bits). If non-zero (True), then execution is
transferred to function diagnostic(). Otherwise the time-compressed mem-
ory endless loop is entered. The diagnostic software thus adds nothing to the
execution time of the applications software.
Function diagnostic() comprises a main body having an endless loop se-
lecting one of four subfunctions, depending on which switch is set. Notice from
Table 15.6(b), lines C69 – C72, that the BTST instruction is used to check the state
of the target switch, rather than use the less efficient AND or BIT instruction.
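The mode selection can be sketched as a pure function of the switch port value, which makes it easy to check on a host. The assignment of individual switches to tests is an assumption made for this illustration, as are the names run_mode and decode_switches.

```c
typedef enum { RUN_NORMAL, TEST_OUTPUT, TEST_INPUT, TEST_RAM, TEST_ROM } run_mode;

/* Decode the 4-bit switch port. The bit-to-test mapping is assumed. */
run_mode decode_switches(unsigned char port)
{
    unsigned char sw = port & 0x0F;   /* zero the undefined upper four bits */
    if (sw == 0)   return RUN_NORMAL; /* all switches off: application runs */
    if (sw & 0x01) return TEST_OUTPUT;
    if (sw & 0x02) return TEST_INPUT;
    if (sw & 0x04) return TEST_RAM;
    return TEST_ROM;                  /* only bit 3 can remain */
}
```

Because the upper four bits are masked first, a port reading such as F0h still selects the normal application.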
The output_test() function simply counts up from 0 to 255 and sends each
value to the Z digital and X analog output ports. The complement is sent to the
Y analog port. Using an oscilloscope, the X and Y ports give ramps, up and down
respectively, as shown in Fig. 15.8. The Z port acts as an 8-bit counter.
The input_test() function `connects' the analog input port to the two analog
outputs. Thus using a sinewave generator as an input should give two quantized
copies at the output. The switch input port is of course implicitly tested by getting
to this routine in the first place.
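As listed in Table 15.5, input_test() reads the input port once for each output, so the two outputs may receive different samples if the converter updates between the reads. A possible variant, sketched below with the ports passed as parameters so that it can be tried on a host, reads the input exactly once and never reads an output port. The function name is hypothetical.

```c
/* Copy one sample from the input port to both output ports. */
void copy_sample(volatile const unsigned char *in,
                 volatile unsigned char *out_x,
                 volatile unsigned char *out_y)
{
    unsigned char sample = *in;   /* read the converter exactly once */
    *out_x = sample;
    *out_y = sample;              /* neither output port is ever read */
}
```

On the target the call would be copy_sample(a_d, x, y), using the port pointers declared at the top of Table 15.5.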
Testing the RAM chip is in essence a matter of sending out a test pattern
(10101010b in this case) to each cell in turn and checking that it gets there [10].
This is of course a destructive test, so the original value must be fetched and
saved, before each cell is checked (line C102) and returned afterwards (line C109).
The pointer variable address is used to move up through memory. The values
RAM_START (a pointer) and RAM_LENGTH are defined for the circuit in the header
file (Tables 14.2 and 14.9). The digital Z port is used to indicate pass or fail by
being set respectively to all zeros or all ones. Of course exercising with such a
simplistic test pattern is not a fully comprehensive verification; for example, it
will not detect a bit stuck at 0 in any position where the pattern itself is 0. However, the principle is the same for a
Table 15.5: Complete 68008 package, including resident diagnostics (continued next
page).
/* Version 01/02/90 */
#include <hard_68k.h>
unsigned char Array [256]; /* Global array holding display data */
unsigned char Oldest; /* Index to the Oldest inserted data byte (left point on scr*/
unsigned char * const x = ANALOG_X; /* x points to a byte @ (address) ANALOG_X */
unsigned char * const y = ANALOG_Y; /* y points to a byte @ (address) ANALOG_Y */
unsigned char * const z = Z_BLANK; /* The z-mod port (digital port) */
volatile unsigned char * const a_d = ANINPUT; /* This is the Analog input port */
volatile unsigned char * const diag_port = SWITCH; /* The diagnostic mode switch port */
main()
{
register unsigned char * array_ptr; /* Pointer into array */
register unsigned char i; /* Scan counter */
register unsigned char leftmost; /* The initial array index when x is 0 */
void diagnostic (void); /* Declare the diagnostic function */
/****************************************************************************************
* This is the NMI ISR which puts the analog sample in the array & updates the New index*
* ENTRY : Via NMI and startup *
* ENTRY : Array[] and Oldest are global *
* EXIT : Value held at a_d in Array[Oldest], Oldest incremented with wraparound at 256*
****************************************************************************************/
void update(void)
{
Array[Oldest++] = *a_d;/* Overwrite oldest sample in Array[] & inc Oldest index mod-256 */
}
void output_test(void)
{
register unsigned char count = 0;
do
{
*x = count; /* Send count out to X D/A converter, i.e. ramp up */
*z = count; /* and to the Z digital port */
*y =~count; /* ramp Y output down */
} while(++count != 0);
}
void input_test(void)
{
*x = *a_d; /* Get input from a_d and send to X d/a */
*y = *a_d; /* and to Y d/a */
}
void RAM_test(void)
{
register unsigned int i;
register unsigned char temp;
register unsigned char * address;/* Address of the memory byte being tested */
*z = 0; /* Set digital port to all zeros (pass) */
for(address=RAM_START; address<RAM_START+RAM_LENGTH;)
{
temp = *address; /* Get ith memory byte */
*address = 0xAA; /* Send out 10101010b to it */
if(*address != 0xAA)
{ /* IF not this value THEN signal failure by sending out 11111111b*/
*z = 0xFF;
break;
}
*address++ = temp; /* Restore original value */
}
}
void ROM_test(void)
{
register unsigned short * address; /* Address points to 16_bit word in EPROM */
register unsigned short sum=0;
*z = 0; /* Set digital output to all zeros to signal pass */
for(address=ROM_START; address<ROM_START+ROM_LENGTH; sum+=*address++) {;}
if(sum) {*z = 0xFF;}/*IF a non-zero sum THEN signal error by digital output = 11111111*/
}
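The RAM test in the listing above writes a single pattern. One common refinement, sketched here as an assumption rather than as part of the original package, follows the pattern with its complement so that every bit of every cell is driven to both 0 and 1. The routine takes a start address and length so that it can be exercised on an ordinary host buffer.

```c
#include <stddef.h>

/* Non-destructive test of 'length' bytes from 'start' using 0xAA then 0x55.
   Returns 0 on pass, or 1 at the first cell that fails to hold a pattern. */
int ram_test_two_patterns(volatile unsigned char *start, size_t length)
{
    static const unsigned char patterns[2] = { 0xAA, 0x55 };
    size_t i;
    unsigned p;
    for (i = 0; i < length; i++) {
        unsigned char saved = start[i];      /* keep the original contents */
        for (p = 0; p < 2; p++) {
            start[i] = patterns[p];
            if (start[i] != patterns[p]) {
                start[i] = saved;
                return 1;
            }
        }
        start[i] = saved;                     /* restore the original value */
    }
    return 0;
}
```

On the target, the caller would set the Z port to all zeros or all ones according to the returned value. This sketch is only safe if the compiler keeps saved and the loop variables in registers, or if they lie outside the region under test.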
a time). Unprogrammed locations are usually FFh, and so one must be added to
this sum to compensate for the overwritten word (FFFFh is −1). Thus we have
$$CS + (\mathit{sum} + 1) = 0$$
or
$$CS = -(\mathit{sum} + 1) = (\overline{\mathit{sum}} + 1) - 1 = \overline{\mathit{sum}}$$
where $\overline{\mathit{sum}}$ is the 1's complement and $\overline{\mathit{sum}} + 1$ is the 2's complement, that is
$-\mathit{sum}$. Hence all that is needed is to invert the modulo-65536 (16-bit) summation
of all EPROM words and overwrite a convenient unprogrammed word. In cases
where unprogrammed locations are zero, then one should be added to
the inverted summation.
Function ROM_test() walks through the contents of the EPROM using a pointer-to-short
moving from ROM_START to ROM_START + ROM_LENGTH. Each word is
added to the 16-bit variable sum, which eventually will give the modulo-65536
check digit, which is hopefully zero. Care must be taken as the checksum is not
always calculated in this way. For example the modulo-65536 sum of all bytes
does not give the same answer.
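The checksum word can be computed on the host before the EPROM is programmed. The following is a minimal sketch, assuming the image is available as an array of 16-bit words; the function names are hypothetical and no particular programmer utility is implied.

```c
#include <stdint.h>
#include <stddef.h>

/* Modulo-65536 sum of 'nwords' 16-bit words, formed as in ROM_test(). */
uint16_t rom_sum(const uint16_t *rom, size_t nwords)
{
    uint16_t sum = 0;
    size_t i;
    for (i = 0; i < nwords; i++)
        sum = (uint16_t)(sum + rom[i]);
    return sum;
}

/* Word to write over one unprogrammed location so that the total becomes
   zero. 'sum' is the total including that location's blank value 'blank'. */
uint16_t checksum_word(uint16_t sum, uint16_t blank)
{
    return (uint16_t)(blank - sum);
}
```

With blank equal to FFFFh the result is simply the inverted sum. After the chosen word has been overwritten with this value, rom_sum() over the whole image returns zero, which is the condition ROM_test() checks.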
Code generated by the diagnostics() source is given in Table 15.6. The time
compressed code is virtually the same as in Table 14.14 and is not reproduced
here. The assembly-level code is commented and is straightforward. One inter-
esting point concerns lines C90 and C91. The reader might suppose the textbook
equivalent:
*x = *y = *a_d;
to be the same. Not necessarily so. This compiler implemented this as follows:
Table 15.6: Code for the 68008 implementation (continued next page).
* 77 void output_test(void)
* 78 {
.even
_output_test: move.l d5,-(sp)
* 79 register unsigned char count = 0;
clr.b d5 *## count lives in D5.B and is zeroed
* 80 do
* 81 {
* 82 *x = count; /* Send count out to X D/A converter; ie ramp up */
L162: move.l _x,a1 *## A1 points to X port
move.b d5,(a1) *## send count out to this port
* 83 *z = count; /* and to the Z digital port */
move.l _z,a1 *## A1 points to Z port
move.b d5,(a1) *## send count out to this port
* 84 *y =~count; /* ramp Y output down */
move.l _y,a1 *## Point to Y analog port
clr.w d7
move.b d5,d7 *## Get count again!
not.w d7 *## Invert it (ie ~count)
move.b d7,(a1) *## and send it out
* 85 } while(++count != 0);
addq.b #1,d5 *## First increment count
bne.s L162 *## IF not folded over to zero (ie 256) THEN repeat
move.l (sp)+,d5
rts
* 86 }
* 87
* 88 void input_test(void)
* 89 {
.even
* 90 *x = *a_d; /* Get input from a_d and send to X d/a */
_input_test: move.l _x,a1 *## Point A1 to X d/a output port
move.l _a_d,a2 *## Point A2 to a/d input port
move.b (a2),(a1) *## Send input data to output X port
* 91 *y = *a_d; /* and to Y d/a */
move.l _y,a1 *## Point A1 to Y d/a output port
move.l _a_d,a2 *## Point A2 to a/d input port
move.b (a2),(a1) *## Send input data to output Y
rts
* 92 }
* 93
* 94 void RAM_test(void)
* 95 {
_RAM_test: movem.l d5/d4/a5,-(sp)
* 96 register unsigned int i;
* 97 register unsigned char temp;
* 98 register unsigned char * address; /* Address of the memory byte being tested */
* 99 *z = 0; /* Set digital port to all zeros (pass) */
move.l _z,a1 *## A1 points to Z port
clr.b (a1) *## Send out 00000000b
While this may be logically correct, y is a write-only port; it cannot be read! There
is no way in ANSI C to designate an object write-only. A read-only object is of
course designated as const. Designating such an object volatile may help, in
that it should signal to the compiler that it cannot depend on what it reads, but
is dangerous, as the stack and frame area is of course in this part of memory. Even
though I have made RAM_test() non-destructive, problems can arise. By making
temp a register variable, the original value of any RAM location can temporarily
be stored (in D4.B) out of harm's way. However, the 6809 compiler ignores any
register qualifications (as do implementations for most 8-bit targets) and puts
temp as an auto variable in a frame, that is in RAM. When that particular address is
tested, temp will be overwritten by the test pattern. Similarly the pointer address
itself is in RAM.
Table 15.7 An alternative RAM testing module for the 6809 system.
void RAM_test(void)
{
_asm(" .define RAM_START = 0, RAM_LENGTH = 800h\n");
_asm(" ldy #0 ; i held in Y, =0\n");
_asm(" ldb #10101010b ; Test pattern in B\n");
_asm(" clr _z ; Send out all zeros to digital port to signal ok\n");
_asm("RLOOP: lda RAM_START,y ; Get mem byte @ RAM_START+i (RAM_START from header)\n");
_asm(" stb RAM_START,y ; Put pattern back out to same location\n");
_asm(" cmpb RAM_START,y ; Did it get there?\n");
_asm(" bne ERROR ; IF not THEN break to error handler\n");
_asm(" sta RAM_START,y ; ELSE put byte back\n");
_asm(" leay 1,y ; i++\n");
_asm(" cmpy #RAM_LENGTH+1 ; Finished yet?\n");
_asm(" bne RLOOP ; IF not THEN test next byte\n");
_asm(" rts\n");
_asm("ERROR: ldb #11111111b ; Put out the error code\n");
_asm(" stb _z\n");
} /* Exit via }'s RTS */
system can be monitored using the normal hardware tools. A variation of this
technique uses a ROMulator. This is a RAM pack with a flying lead and DIL plug
masquerading as an EPROM. Machine code can be downloaded into the ROMulator
which is plugged into the target EPROM socket. Such software is easier to change
than firmware and some monitoring of target variables is possible.
Where a more extensive examination of both hardware and software is neces-
sary, then an in-circuit emulator (ICE) is required [11]. An ICE is a microprocessor-
based product which exercises the target hardware under the control of a micro-
processor development system. A typical configuration is shown in Fig. 15.9.
Here the ICE replaces the target microprocessor via an umbilical cord and plug.
The ICE hosts the same processor as the target, often piggybacked onto the umbil-
ical plug, to be as close to the target as possible. This slave (i.e. target) processor
is controlled by the ICE master microprocessor, which also communicates with a
computer via a serial link. Thus, typically the target MPU might be a 68008, with
a Z80 master and 8086-based computer!
Many different configurations are possible. Historically the Intel corporation
invented the ICE in 1975 as part of their development system for the 8080 MPU.
The ICE-80 was a plug-in card to the Intel Microprocessor Development System
(MDS) bus. An 8080 processor handled emulation, control and communication
with the user. Most manufacturers followed with their own version, such as
Motorola's EXORmacs MDS. Some of the large test equipment manufacturers,
notably Tektronix and Hewlett Packard, developed a general purpose MDS, not
tied to one specific manufacturer's product [12]. Here the ICE could be altered
by changing the plug-in board, pod and associated software.
With the rise in popularity of the personal computer, the stand-alone configu-
ration of Fig. 15.9 has become popular. The user interface can be anything from a
dumb VDU terminal to a workstation or minicomputer. Firmware in the ICE itself
communicates with this terminal and is used by the master processor. Generally,
changing the target processor involves changing one or more of: the pod,
firmware, ICE board or terminal software.
Although most stand-alone ICEs will operate with a dumb terminal host, the
internal ROM-based ICE commands are elementary. Using an intelligent
terminal, such as a personal computer, allows a much more powerful
and user-friendly software interface to insulate the user from the complexities of
the ICE hardware. Aids such as menus and helpful prompts are useful to novice
users. As with other software aids, the protocols and commands available are
very product dependent, doubly so here as both hardware and software are
involved. The following examples use the Noral SDT product [13] (Noral
Microelectronics, Logic House, Gate St., Blackburn, Lancs, BB1 3AQ, UK), but the
facilities available are similar to those of most products [11].
All ICEs permit shadowing of the target's memory map. Thus memory is available
to the slave emulator MPU on-board the ICE. As seen by this slave, its memory
map can be set in chunks to either local internal memory (known as overlay) or
the target. As an example, consider a target with ROM between 3000h and 3FFFh,
6000h and 7FFFh, and E000h and FFFFh. The rest of its memory space is occupied
by RAM or memory-mapped peripherals. Normally on power-up all memory is
mapped to the target of type read/write. To `move' the three ROM areas into the
internal overlay ICE memory, use the MMO (Memory Map to Overlay) commands
thus:
is shown in the command area at the bottom of the page. The register window at
the top right shows the state of the MPU's registers. The Supervisor Stack Pointer
(SSP) has been set to 10000h by using the RW (Register Write) command. Below
this is the state of the System stack. This is useful for examining parameters
passed to the function or subroutine through the stack and for monitoring frame data.
Clicking a mouse on the [S] or [GO] boxes causes the program to Step or GO
and execute as appropriate. In the Step mode the line of code being executed is
highlighted on the screen.
Although the data presented in Fig. 15.9 looks similar to that of Table 15.1,
remember that the latter is a pure simulation whilst the former is running on an
actual 68008 microprocessor.
High-level ICE driven packages are now becoming available, which have the
same relationship as the low and high-level simulators discussed in Section 15.1.
Some of these are extensions of existing simulation products which makes mov-
ing between a simulation and emulation environment easier.
Although an in-circuit emulator is versatile, it is expensive (typically $7000+),
relatively bulky and fragile. They can also be cantankerous! Thus it makes sense
to use a simulator at the outset to check out the purely software aspects of the
project. If testability has been incorporated into the hardware, as described in
Section 15.2, then the ICE can be left for the final phases of the testing and `tough
nut' servicing situations.
References
[1] Wakerly, J.F.; Microcomputer Architecture and Programming, Wiley, 1989, Sec-
tion 13.1.
[2] Atherton, W.A.; Pioneers: Grace M. Hopper, Electronics World + Wireless World (UK),
95, no. 1646, Dec. 1989, pp. 1192 and 1194.
[3] MacClean, A.; The Great C Debugger Review, .EXE (UK), 1, no. 9, March 1988, pp. 12 –
25.
[4] Adams, M.; Development without Development Systems, from Microprocessor Devel-
opment and Development Systems, ed. Tseng, V., Granada (UK), 1982, Chapter 8.
[5] Adams, M.; C, 68000 assembler and the IBM PC, .EXE (UK), 1, no. 9, March 1988,
pp. 26 – 30.
[6] Ferguson, J.; Microprocessor Systems Engineering, Addison-Wesley, 1985, Section 8.3.
[7] Wilcox, A.D.; Bringing up the 68000 – A First step, Dr Dobb's Journal, 11, Jan. 1986,
pp. 33 – 40.
[8] Stockton, J and Scherer, V.; Learn the Timing and Interfacing of MC68000 Peripheral
Circuits, Electronic Design, 27, no. 26, Nov. 8, 1979, pp. 58 – 64.
[9] Ferguson, J.; Microprocessor Systems Engineering, Addison-Wesley, 1985, Section 4.2.
[10] Gilmour, P.S.; Caveat Tester, Embedded Systems Programming, 4, no. 7, July 1991,
pp. 58 – 65.
[11] Ferguson, J.; In-Circuit Emulation, Wireless World (UK), 84, no. 1580, June 1984,
pp. 53 – 55.
[12] Lejeuine, B.; In-Circuit Emulation, in Microprocessor and Microprocessor Development
Systems, ed. Tseng, V., Granada (UK), 1982, Chapter 7.
[13] Ferguson, J.; Microprocessor Systems Engineering, Addison-Wesley, 1985, Section 5.1.
CHAPTER 16
C'est la Fin
Having designed and tested our project it only remains to wrap up by doing a
comparative analysis of the various implementations and giving some sugges-
tions on how the basic specification can be extended.
16.1 Results
One of the first questions asked is how will a C-coded program compare with its
assembly-level equivalent? To try and answer this question I have coded both our
systems at assembly level, so that we can contrast the two approaches. In defence
of the expected outcome, it should be pointed out that small routines, especially
those that intimately interact with hardware, are the forte of assembly-level code
and the antithesis of high-level languages. Thus our results will be at the far
end of the spectrum; however, this will at least give us a worst-case yardstick to
balance the pros and cons of the two approaches.
Our first demonstration is the 6809-based coding of Table 16.1. This is struc-
tured after the C-level coding of Tables 14.3 and 14.4. Like the C program, the
variables Oldest and Array[] are stored in absolute memory locations and so
are globally known to both the routines MAIN and UPDATE.
At the beginning of the scan (lines 18 – 29) the address of the leftmost Y co-
ordinate, Array[Oldest], is calculated and placed in Index register_X. This cal-
culation is done in lines 18 – 21 by expanding out the 8-bit Oldest index, pointing
X to Array[0] and using the instruction LEAX D,X to put the effective address
Oldest + Array back in X.
The main scan routine simply uses the Post-Increment Index address mode au-
tomatically to advance this pointer once each time Array[i] is fetched, prepara-
tory to sending it out to the Y plates. The stratagem of keeping the X count
in Accumulator_A and fetching Array[i] down to Accumulator_B, means that
both X and Y co-ordinates can be output together using a single Store Double
instruction (line 35).
In incrementing the Array[] pointer, a check is made to detect the situation
when its value reaches Array[256] (Array + 256) and to reset back to the
beginning Array[0]. This gives a pseudo circular structure. Thus if Oldest were
40h, then the leftmost value (X = 00h) would be Array[64], and when X reached
BFh the Y point would be Array[255]. The next point at X = C0h should be
Array[0].
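The wraparound just described can be sketched in C. This is only an illustration, not the code of Table 16.1; the function name scan_index is hypothetical, and unsigned 8-bit addition stands in for the explicit pointer check against Array + 256.

```c
#include <stdint.h>

/* Return the array index displayed at horizontal position x when the
   leftmost point on the screen is Array[oldest]. Truncating the sum to
   8 bits wraps it modulo 256, mimicking the reset from Array + 256
   back to Array[0]. */
uint8_t scan_index(uint8_t oldest, uint8_t x)
{
    return (uint8_t)(oldest + x);
}
```

With Oldest at 40h, scan_index(0x40, 0xBF) selects element 255 and scan_index(0x40, 0xC0) wraps round to element 0.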
The NMI ISR UPDATE simply computes the address of Array[Oldest] in the
same way as the leftmost point was calculated, fetches the analog sample down
into this element and increments the index Oldest with wrap around from FFh
to 00h. As the NMI interrupt saves and restores all internal registers, there is no
restriction on register usage.
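The action of UPDATE can be modelled in C as below. This is a sketch under stated assumptions: the function update() is a hypothetical stand-in for the interrupt service routine, and the sample is passed as a parameter rather than being read from the A/D converter port.

```c
#include <stdint.h>

uint8_t Array[256];   /* circular store of samples, as in the text */
uint8_t Oldest;       /* index of the oldest element */

/* Hypothetical stand-in for the NMI service routine. */
void update(uint8_t sample)
{
    Array[Oldest] = sample;  /* newest sample overwrites the oldest */
    Oldest++;                /* 8-bit index wraps from FFh to 00h */
}
```

Because Oldest is only eight bits wide, no explicit range check is needed to keep the index inside the 256-element array.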
The total length of the routine is 77 bytes plus vectors. The scan time for one
screen of data, ignoring any interrupt service time, is 7.5 ms, giving a sweep rate
of 133 Hz.
The 68008-based equivalent is shown in Table 16.2. Like the C program of
Table 14.9, variables are preferentially located in registers, with only the global
object Array[256] being located in memory. The X count is held in D0.B and the
index to the oldest updated array element in D7.B. Address register_A0 is used in
the background program to point to the array element currently being fetched,
whilst A1 is a convenient way of holding the constant address of Array[0].
On entry to the scan loop (lines 33 – 40), A0 points to the leftmost array el-
ement to be displayed. After each point is displayed (lines 33 – 34) both the
X count (in D0.B) and array pointer (A0.L) are incremented. In the case of the
latter, wraparound occurs whenever the address reaches 256 above the array
base (lines 37 and 39). This gives the necessary circular data structure.
During the flyback delay, the array pointer is reset to the new leftmost point in
line 44 by using D7.W as an index (the oldest array element) with A1.L (pointing to
the base of the array), that is Yleftmost = Array[Oldest]. As this index address
mode uses a (sign-extended) word-sized index register (a byte-sized index is
not allowed), D7 was cleared as a whole word in line 22 to ensure that no
non-zero bits in D7[15:8] upset this calculation.
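The effect of stale upper bits in a word-sized index can be seen in a small C model. This is a sketch only: indexed_ea() is a hypothetical function that adds a sign-extended 16-bit index to a base address with a zero displacement, and the base address used in the example is invented for illustration.

```c
#include <stdint.h>

/* Model of an indexed effective address: base plus a 16-bit index that
   is sign-extended to 32 bits before the addition. */
uint32_t indexed_ea(uint32_t base, uint16_t index_word)
{
    int32_t offset = (index_word >= 0x8000u)
                   ? (int32_t)index_word - 0x10000   /* negative index */
                   : (int32_t)index_word;
    return base + (uint32_t)offset;
}
```

Taking an assumed array base of E000h, an index word of 0040h gives E040h as intended, whereas 1240h (the same low byte with rubbish in bits 15 to 8) gives F240h, well outside the array.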
The level-7 ISR is called UPDATE, and simply uses D7.W (the oldest index) as
an offset to A1.L (which is permanently pointing to the array base), to move the
value from the A/D converter to Array[Oldest]. Adding one to D7.B ensures
that this points to the array element furthest back in time on exit. The byte-sized
increment automatically gives wraparound. As none of the registers are saved or
retrieved by a 68000 MPU interrupt, using D7 and A1 as global register variables
is legitimate.
The program totals 102 bytes excluding vectors and takes 5.9 ms for one
screen's worth of data, ignoring any interrupt service time. This gives a sweep
rate of 169 Hz.
The final figures then show a size factor of 2.4 for the 6809-based circuit and
a speed factor of 2.7. The 68008 has a closer size factor of 1.7 together with a
speed factor of 2.9. If we treat these figures as a worst-case scenario, then for
realistic situations these factors are likely to be of the order of 1.5 at best; that is
a C coding will have around 50% more code and be 50% slower than an equivalent
assembly-level implementation. Against this must be ranged the high-level code
advantages of cost, portability and reliability.
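The arithmetic behind the quoted sweep rates is simply the reciprocal of the scan time. The short C sketch below reproduces it; the function names are hypothetical, and scaled_scan_ms() assumes that a speed factor is the ratio of C to assembly-level scan times, which is my reading of the figures rather than something stated explicitly.

```c
/* Sweep rate in Hz for a scan time given in milliseconds. */
double sweep_rate_hz(double scan_ms)
{
    return 1000.0 / scan_ms;
}

/* Scan time after applying a speed factor (assumed to be a time ratio). */
double scaled_scan_ms(double scan_ms, double speed_factor)
{
    return scan_ms * speed_factor;
}
```

Thus 7.5 ms gives about 133 Hz and 5.9 ms about 169 Hz. On the stated assumption, the 6809 factor of 2.7 would stretch 7.5 ms to roughly 20 ms.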
Figure 16.1 shows a typical set of X and Y traces captured on a Hewlett Packard
54501A digitizing oscilloscope. The upper trace shows the contents of the
256-byte array covering approximately two seconds. The bottom trace shows the
X sweep. The flyback blank has been deliberately increased to give a refresh rate
of 50 Hz.
Figure 16.1 Typical X and Y waveforms, showing two ECG traces covering 2 s.
the amount of data presented on the screen. Four minutes worth can be displayed
by using two traces, which are scanned in succession.
The overall double scan of the complete four minutes, stored in a 512-byte
array, will still have to be accomplished in 20 ms (10 ms per trace) to give a flicker-
free display.
The apparently two separate traces can be simulated by reducing the vertical
resolution to seven bits. The top trace is displayed with a MSB of 1, whilst the
bottom 256 data bytes have a MSB of 0. The Y-output D/A converter will then
bias the first 256 data bytes by 1/2 scale. Of course the data bytes must first be
logic shifted once right (divided by two) before the MSB is tampered with.
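The 7-bit packing can be sketched in C as follows. The function names are hypothetical; each takes an 8-bit sample, halves it with a logical shift right and then forces the MSB to 1 for the top trace or leaves it at 0 for the bottom trace.

```c
#include <stdint.h>

/* Top trace: halve the sample, then set the MSB to lift it by 1/2 scale. */
uint8_t top_trace(uint8_t sample)
{
    return (uint8_t)((sample >> 1) | 0x80u);
}

/* Bottom trace: halve the sample; the MSB is left at 0. */
uint8_t bottom_trace(uint8_t sample)
{
    return (uint8_t)(sample >> 1);
}
```

The top trace then occupies codes 80h to FFh and the bottom trace 00h to 7Fh, so the two never overlap on the screen.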
As the 512 data points need to be displayed in the time previously required
for 256 points, the processor will have to work twice as hard. If the software is
coded in C then it is doubtful if there is sufficient reserve. Renovating the circuit
to use a 2 MHz 68B09 (with faster EPROM) or a 12.5 MHz 68000 MPU will provide
the additional horsepower if this is the case. However, this is probably a good
case for using an assembly-level coding. It does work!
Using a bidirectional X-sweep would slightly reduce the scan time. By display-
ing, say, the bottom trace from left to right and then returning along the top trace
right to left, the flyback and Z-blank delays are eliminated. Using a triangular in-
stead of sawtooth timebase is a standard technique implemented by printers. It is
feasible to use this bidirectional scan for the single trace of the basic project, but
as the traces will be superimposed, the oscilloscope must not exhibit appreciable
hysteresis, in my experience a problem with low-cost oscilloscopes.
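A triangular timebase for the two-trace display might be generated along the following lines. This is a sketch with hypothetical function names: the 512 display steps are numbered 0 to 511, the first 256 sweeping the bottom trace left to right and the remainder returning along the top trace.

```c
#include <stdint.h>

/* X position for display step 0..511: ramp up, then ramp back down. */
uint8_t triangular_x(unsigned step)
{
    return (uint8_t)(step < 256u ? step : 511u - step);
}

/* Returns 1 while the return (top) trace is being drawn, else 0. */
int on_top_trace(unsigned step)
{
    return step >= 256u;
}
```

The X output is 255 at both step 255 and step 256, so the beam never has to fly back across the screen.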
Continuing with this theme, the freeze facility can be applied to only one of
these traces, say, the upper, whilst the lower continues on as normal.
Another approach to displaying additional data is to use hidden pages with
only one or two on-screen traces. Thus, for example, we could store all data
for the last 32 minutes in a 32 kbyte RAM, but only display the last two minutes
worth. However any of the 2-minute pages from time past could be displayed
as commanded using the setting of the switch port. Furthermore a hard-copy
routine could be written to dump the entire 32-minutes worth sequentially to a
chart recorder or graphics printer, or even uploaded to a PC for further analysis.
The option of invisibly acquiring data as this process is in progress, by using
a shadow RAM, is useful. Indeed this option is also a possibility for the freeze
process in the main project. Thus the freeze command makes the display static,
but data continues to be acquired `behind the scenes'.
Depending on the quality of the analog data, it may be desirable to apply a sim-
ple digital smoothing routine at the output (e.g. see the 3-point filter of page 246).
Regardless of such processes, an 8-bit quantized system will look rather granular
in hard copy. With a 12-bit A/D converter, the number of quantization levels in-
creases from 256 to 4096. Unfortunately this will require a similar enhancement
of RAM capacity from 256 byte-sized elements to 4096 word-sized elements for
each 2-minute slot, a 32-fold increase!
Displaying a 4096-element data array in 20 ms is well beyond the capabilities of
any of the processors used in this text. Rather, every 16th element could be
used for a conventional 8-bit oscilloscope display and the full resolution reserved
for the hard-copy or uploaded version, where time is not an issue. Even so, the
extra overhead of a ×16 interrupt rate, reading and writing 12-bit quantities over
an 8-bit bus (e.g. see Fig. 6.1) and extracting one in sixteen bytes is onerous.
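The one-in-sixteen extraction can be sketched in C as below. The function name is hypothetical, and I have assumed that the display byte is formed from the top eight bits of each 12-bit sample, which the text does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy every 16th 12-bit sample to the display buffer, keeping only its
   upper eight bits (an assumption). Returns the number of bytes written. */
size_t decimate_for_display(const uint16_t *full, size_t n_full,
                            uint8_t *display)
{
    size_t i, n = 0;
    for (i = 0; i < n_full; i += 16) {
        display[n++] = (uint8_t)((full[i] & 0x0FFFu) >> 4);
    }
    return n;
}
```

A 4096-element array reduces in this way to the 256 bytes needed for one screen, while the full-resolution data remains untouched for hard copy or uploading.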
With the processor pretty well spending its entire time displaying the wave-
form, there is no spare capacity available to analyze the data. A second processor
running in parallel with the display processor would enable both functions to be
carried out in real time. Data acquired by the master processor could be sent to
the slave by writing the latest sample to an output port, then interrupting the
slave which reads this as an input port. Of course the option of uploading, say,
via the serial link, is a viable alternative if the data rate is not too high.
Analysis tasks include detecting waveform peaks and calculating beat rates
and beat-to-beat variations. A separate display of the appropriate data could be
maintained by the slave or multiplexed on to the primary display. This display
device need not be CRO-based; a liquid crystal panel is a viable alternative, and
will probably contain its own microprocessor.
Appendix A
A Accumulator_A
A/D Analog to Digital converter
B Accumulator_B
BIOS Basic Input/Output System
CCR Condition Code Register
D Accumulator_D
D/A Digital to Analog converter
DMA Direct Memory Access
ea Effective Address
ECG Electro-CardioGram trace
EKG See ECG
EPROM Erasable Programmable Read-Only Memory
ICE In-Circuit Emulator
I/O Input/Output
ISR Interrupt Service Routine
K 1024 = 2^10
LSB Least Significant Bit or Byte
Op-code Operation code
OS Operating System
PC Personal Computer
PC Program Counter
PIA Peripheral Interface Adapter
PI/T Parallel Interface Timer
M 1,048,576 = 2^20
MDS Microprocessor Development System
MSB Most Significant Bit or Byte
MSDOS MicroSoft Disk Operating System
RAM Random Access Memory
ROM Read-Only Memory
S System Stack Pointer register
SBC Single-Board Computer
S/N Signal to Noise ratio
S/H Sample and Hold
SP Stack Pointer register