Architecture: TMS320C54x


EMU1/OFF   Tristate   Emulator interrupt 1 pin/disable all outputs

11.4.2 Architecture of TMS320C54x Processors
The TMS320C54x processors employ an advanced, modified Harvard architecture that maximizes processing power by providing four pairs of separate bus structures: three pairs for data memory and one pair for program memory. The four pairs of internal buses of TMS320C54x processors are:
PB (Program Bus) and PAB (Program Address Bus): program memory bus to read opcode and immediate operand.
CB (C Bus), CAB (C Address Bus), DB (D Bus) and DAB (D Address Bus): two independent data memory buses to read two data simultaneously from memory.
EB (E Bus) and EAB (E Address Bus): data memory bus to write data in data memory.
In TMS320C54x processors, the separate program and data memory spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, two read operations and one write operation can be performed in a single cycle. Special instructions with parallel load/store and multiply-accumulate fully utilize this architecture. In addition, data can be transferred between data and program memory spaces. Such parallelism supports a powerful set of arithmetic, logic and bit-manipulation operations that can be performed in a single machine cycle. The TMS320C54x processors also include control mechanisms to manage interrupts, repeated operations and function calls.

The simplified internal architecture of the TMS320C54x processor is shown in fig 11.14. The architecture can be broadly divided into three major areas: the CPU (Central Processing Unit), the on-chip memory unit and the on-chip peripherals.
The functional units of the CPU are a 40-bit ALU (Arithmetic Logic Unit), two 40-bit accumulators (ACCA and ACCB), a barrel shifter, a 17 × 17-bit multiplier, a 40-bit adder, the CSSU (Compare, Select and Store Unit), an exponent encoder, status registers, a data address generation unit, a program address generation unit and a system control interface.
Chapter 11 -Digital Signal Processors

• Barrel shifter
• 17 × 17-bit multiplier
• 40-bit adder
• Compare, Select and Store Unit (CSSU)
• Exponent encoder
• Data address generation unit
• Program address generation unit

Arithmetic Logic Unit (ALU)


The 40-bit ALU can perform a wide range of arithmetic and logical functions in a single clock cycle. After the ALU operation, the destination of the result is either an accumulator or memory.
For ALU operations involving two data, one of the data is from the accumulator/memory/T-register and the other data is from the barrel shifter/memory. The barrel shifter and the accumulators supply 40-bit data to the ALU. When data is fed from memory to the ALU, two 16-bit data are loaded to bits 0 to 15 and bits 16 to 31, with bits 32 to 39 either filled with zeros or sign-extended.
The ALU can function as two 16-bit ALUs and perform two 16-bit operations simultaneously when the C16 bit in status register 1 (ST1) is set.
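The operand-loading rule and the dual 16-bit mode described above can be modelled in a few lines of Python. This is a minimal behavioural sketch, not TI's implementation. The function names are hypothetical, and the `sxm` flag stands in for the sign-extension mode bit (SXM) of ST1. `dual16_add` assumes that in dual 16-bit mode no carry passes from bit 15 into bit 16, and it ignores the guard bits entirely.

```python
MASK40 = (1 << 40) - 1  # a 40-bit accumulator holds bits 39..0


def load_long_operand(high, low, sxm):
    """Place two 16-bit memory words in bits 31-16 and 15-0 of a 40-bit
    accumulator; bits 39-32 are sign-extended if sxm is True, else zero."""
    value = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
    if sxm and value & 0x8000_0000:
        value -= 1 << 32  # copy bit 31 into the guard bits
    return value & MASK40  # raw 40-bit pattern


def dual16_add(a, b):
    """Simplified C16 = 1 model: two independent 16-bit additions with
    no carry from bit 15 into bit 16 (guard bits are not modelled)."""
    low = ((a & 0xFFFF) + (b & 0xFFFF)) & 0xFFFF
    high = (((a >> 16) & 0xFFFF) + ((b >> 16) & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


print(hex(load_long_operand(0x8000, 0x0001, sxm=True)))   # 0xff80000001
print(hex(load_long_operand(0x8000, 0x0001, sxm=False)))  # 0x80000001
print(hex(dual16_add(0x0001FFFF, 0x00010001)))           # 0x20000
```

In the dual-mode example, the low halves (FFFFh + 0001h) wrap to 0000h without disturbing the high halves, which add to 0002h.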

Accumulators
The CPU has two 40-bit accumulators, referred to as accumulator A (ACCA) and accumulator B (ACCB). The accumulators can act as source/destination for the ALU and the multiplier/adder. Also, either accumulator can be used as temporary storage for the other.

The accumulators are divided into three parts:
• Guard bits (bits 32-39)
• A high-order word (bits 16-31)
• A low-order word (bits 0-15)
The guard bits are used as a head margin for computations, which prevents overflow in iterative computations like convolution/correlation. The instruction set of the TMS320C54x processor includes instructions for storing the guard bits, the high-order and the low-order accumulator words in data memory, and for manipulating 32-bit accumulator words in or out of data memory.
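The three-part split of an accumulator, and the headroom provided by the eight guard bits, can be checked with plain Python integers. The helper names are hypothetical. The sketch assumes the accumulator holds a signed two's complement value.

```python
MASK40 = (1 << 40) - 1


def split_accumulator(acc):
    """Return (guard bits 39-32, high word 31-16, low word 15-0) of a signed 40-bit value."""
    raw = acc & MASK40  # two's complement bit pattern
    return (raw >> 32) & 0xFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


def join_accumulator(guard, high, low):
    """Rebuild the signed 40-bit value from its three parts."""
    raw = ((guard & 0xFF) << 32) | ((high & 0xFFFF) << 16) | (low & 0xFFFF)
    return raw - (1 << 40) if raw & (1 << 39) else raw


# Headroom: 256 full-scale positive 32-bit values overflow a 32-bit register,
# but their sum still fits in the signed 40-bit range of an accumulator.
total = sum(0x7FFF_FFFF for _ in range(256))
print(total > 0x7FFF_FFFF, total < (1 << 39))            # True True
print([hex(part) for part in split_accumulator(total)])  # ['0x7f', '0xffff', '0xff00']
```

The growth of the sum lands in the guard bits (7Fh here) rather than corrupting the sign of the 32-bit result.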

Barrel Shifter
The 40-bit barrel shifter can perform a left shift of 0 to 31 bits or a right shift of 0 to 16 bits. Together with the exponent encoder, it can normalize the accumulator content. The shift count is specified in the shift count field of the instruction, in the shift count field (ASM) of status register 1, or in the T-register. The shift and normalize operations of the barrel shifter can be used to realize the following operations.
• Prescaling of the memory/accumulator operand before an ALU operation
• Logical or arithmetic shifting of the accumulator value
• Normalizing the accumulator
• Postscaling the accumulator before storing in memory
Digital Signal Processing
The shifter can handle 16-bit/32-bit/40-bit operands, which are input from the data buses (DB and CB) or from the accumulators. The output of the shifter can be loaded into the ALU or onto a bus.
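The shift range described above can be sketched as a function on a signed 40-bit value. Positive counts shift left, negative counts shift right, and bits shifted past bit 39 are lost. This is a simplified model that assumes an arithmetic (sign-preserving) right shift. On the device, the handling of the upper bits also depends on the instruction and on mode bits such as SXM, which the sketch does not model. The function names are hypothetical.

```python
def to_signed40(raw):
    """Interpret the low 40 bits of raw as a two's complement number."""
    raw &= (1 << 40) - 1
    return raw - (1 << 40) if raw & (1 << 39) else raw


def barrel_shift(acc, count):
    """Shift a signed 40-bit value: count > 0 shifts left, count < 0 right."""
    if not -16 <= count <= 31:
        raise ValueError("shift count must be in the range -16..31")
    if count >= 0:
        return to_signed40(acc << count)  # bits pushed past bit 39 are lost
    return acc >> -count  # >> on a Python int is an arithmetic shift


# An ASM field of 1Ch read as a 5-bit two's complement number is -4.
asm_field = 0x1C
count = asm_field - 32 if asm_field & 0x10 else asm_field
print(count, barrel_shift(-256, count))  # -4 -16
```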

Multiplier/Adder
The multiplier/adder unit consists of a 17 × 17-bit multiplier, a 40-bit adder, signed/unsigned input control logic, fractional control logic, a zero detector, a rounder, overflow/saturation logic and the T-register.
One of the inputs to the multiplier can be supplied from the T-register/data memory/accumulator, and the other input can be supplied from data memory/program memory/accumulator. The multiplier/adder unit can perform a 17 × 17-bit two's complement multiplication and a 40-bit addition in parallel in a single instruction cycle. As a result, it can perform a MAC (multiply-accumulate) operation in a single instruction cycle. In addition, the multiplier and ALU together can be used to perform a MAC operation and an ALU operation in parallel in a single instruction cycle. These parallel operations are used for efficient implementation of DSP computations like convolution, correlation and filtering.
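The multiply-accumulate loop that these parallel operations accelerate can be sketched in Python for Q15 data, as in an FIR filter. The sketch makes two assumptions. First, fractional mode (the FRCT bit of ST1) shifts each product left by one bit to remove the duplicate sign bit of a Q15 × Q15 product. Second, overflow mode (OVM) clamps the result to the 32-bit range. For simplicity, saturation is applied only to the final sum. The function names are hypothetical, and cycle timing is not modelled.

```python
def to_signed40(raw):
    """Interpret the low 40 bits of raw as a two's complement number."""
    raw &= (1 << 40) - 1
    return raw - (1 << 40) if raw & (1 << 39) else raw


def saturate32(acc):
    """Clamp to the 32-bit signed range (assumed OVM = 1 behaviour)."""
    return max(-(1 << 31), min((1 << 31) - 1, acc))


def fir_mac(samples, coeffs, frct=True, ovm=False):
    """Sum of products of 16-bit Q15 operands in a 40-bit accumulator."""
    acc = 0
    for x, h in zip(samples, coeffs):
        product = x * h  # 16 x 16-bit signed product (fits in 32 bits)
        if frct:
            product <<= 1  # drop the extra sign bit of a Q15 x Q15 product
        acc = to_signed40(acc + product)  # 40-bit adder; guard bits absorb growth
    return saturate32(acc) if ovm else acc


half = 0x4000  # 0.5 in Q15
print(hex(fir_mac([half] * 4, [half] * 4)))            # 0x80000000 (1.0 in Q31)
print(hex(fir_mac([half] * 4, [half] * 4, ovm=True)))  # 0x7fffffff
```

The first result does not fit in a 32-bit signed register, but the guard bits hold it. With saturation enabled, it is clamped to the largest positive 32-bit value.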
Compare, Select and Store Unit (CSSU)
The CSSU is an application-specific hardware unit dedicated to performing add/compare/select operations. It supports the various Viterbi butterfly algorithms used in equalizers and channel decoders.

The inputs to the CSSU for comparison come from the accumulator, and the output is stored in data memory. The status of the comparison is also stored in the LSB of the TRN register and in the TC bit of status register 0.

The instruction "CMPS src, Smem" uses the CSSU to compare the low and high words of the specified source accumulator, select the larger of the two words and store it in the specified data memory location. If the high word of the accumulator is greater, then 0 is stored in the LSB of TRN and in TC. If the low word is greater, then 1 is stored in the LSB of TRN and in TC.
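CMPS can be modelled as a function that returns the stored word together with the new TRN and TC values. Two details are assumptions taken from TI's description of the instruction rather than from the text above. First, TRN is shifted left by one bit before the new decision enters its LSB, so successive decisions accumulate. Second, a tie between the two words is treated like the low-word case. The comparison is on signed 16-bit values, and the names are hypothetical.

```python
def to_signed16(word):
    """Interpret the low 16 bits of word as a two's complement number."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def cmps(acc, trn):
    """Return (word stored to Smem, new TRN, new TC) for CMPS src, Smem."""
    high = to_signed16(acc >> 16)  # bits 31-16 of the source accumulator
    low = to_signed16(acc)         # bits 15-0
    trn = (trn << 1) & 0xFFFF      # assumed: TRN shifts left to make room
    if high > low:
        return high, trn, 0        # high word kept: 0 into TRN(0) and TC
    return low, trn | 1, 1         # low word kept (also on a tie): 1 into TRN(0) and TC


# Two compare-select steps; each accumulator packs two candidate path metrics.
trn = 0
for acc in ((7 << 16) | 3, (2 << 16) | 9):
    kept, trn, tc = cmps(acc, trn)
    print(kept, tc)  # 7 0, then 9 1
print(bin(trn))      # 0b1
```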

Exponent Encoder
Implementing floating-point arithmetic in fixed-point processors like the TMS320C54x requires separating the exponent and mantissa of the floating-point data. The exponent encoder is an application-specific hardware device dedicated to extracting the exponent value from floating-point data in the accumulators and storing it in the T-register. The "EXP src" instruction is used to extract the exponent and save it in the T-register. The "NORM src, dst" instruction is used to normalize the accumulator, using the exponent in the T-register as the count value.
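The EXP/NORM pair can be modelled in Python. The sketch follows TI's description of EXP for the C54x, which the paragraph above does not spell out, so treat it as an assumption. The exponent is the number of redundant sign bits of the 40-bit accumulator minus 8 (range -8 to 31, with 0 for a zero accumulator). As a result, NORM leaves the value normalized with its sign in bit 31. The function names are hypothetical.

```python
def exp_encoder(acc):
    """Model of EXP src: (redundant sign bits of the 40-bit value) - 8."""
    if acc == 0:
        return 0  # special case: a zero accumulator gives T = 0
    magnitude_bits = (acc if acc >= 0 else ~acc).bit_length()
    redundant_sign_bits = 39 - magnitude_bits  # left shifts until bit 38 != bit 39
    return redundant_sign_bits - 8             # in the range -8..31


def norm(acc, t):
    """Model of NORM: shift by T (left if positive, arithmetic right if negative)."""
    if t >= 0:
        raw = (acc << t) & ((1 << 40) - 1)
        return raw - (1 << 40) if raw & (1 << 39) else raw
    return acc >> -t


t = exp_encoder(0x4000)
print(t, hex(norm(0x4000, t)))  # 16 0x40000000
```

After normalization, bit 31 holds the sign and bit 30 holds the first significant bit, which is the usual Q31 mantissa position.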
Data Address Generation Unit

The data address generation unit consists of two Auxiliary Register Arithmetic Units (ARAU0, ARAU1), eight Auxiliary Registers (AR0-AR7), a 16-bit circular buffer size register (BK) and a 16-bit Stack Pointer (SP).
The auxiliary registers are used to hold the data-memory address in indirect addressing mode. The 3-bit ARP (Auxiliary Register Pointer) field of status register 0 indicates the current AR used for indirect addressing. Auxiliary register 0 is also used as an index register for modifying the content of the other auxiliary registers.
The ARAUs perform arithmetic operations related to address generation for indirect addressing mode, such as increment, decrement, indexing, bit-reversed address generation and circular address generation. The two independent ARAUs can operate on two ARs at any time to generate two data-memory addresses simultaneously.
The 9-bit DP (Data-page Pointer) of status register 0 is used as the upper 9 bits of the data-memory address (page address) in direct addressing. The circular buffer size register is loaded with the circular buffer size. This size is used along with the AR specified in the instruction to generate the start and end addresses of the circular memory. The stack pointer is used to implement the LIFO stack for memory operands that use stack addressing. The stack pointer always holds the address of the top of the stack.
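The way BK and an auxiliary register cooperate in circular addressing can be sketched in Python. The rules below come from TI's description of C54x circular addressing, not from the paragraph above, so treat them as assumptions:
• the buffer starts at an address whose N low bits are zero, where N is the smallest integer with 2^N > BK;
• the low N bits of the AR act as an index into the buffer;
• the index wraps modulo BK.
The function name is hypothetical.

```python
def circular_update(ar, step, bk):
    """Return the new AR value after adding step inside a circular buffer of bk words."""
    if bk <= 0 or abs(step) > bk:
        raise ValueError("need BK > 0 and |step| <= BK")
    n = bk.bit_length()             # smallest N with 2**N > BK
    low_mask = (1 << n) - 1
    base = ar & 0xFFFF & ~low_mask  # start address of the buffer
    index = (ar & low_mask) + step
    if index >= bk:
        index -= bk                 # wrapped past the end of the buffer
    elif index < 0:
        index += bk                 # wrapped before the start of the buffer
    return base | index


# Walk a 5-word buffer starting at 1000h with a step of +1.
ar = 0x1000
visited = []
for _ in range(7):
    visited.append(hex(ar))
    ar = circular_update(ar, 1, 5)
print(visited)  # ['0x1000', '0x1001', '0x1002', '0x1003', '0x1004', '0x1000', '0x1001']
```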

Program Address Generation Unit


The program address generation unit consists of five registers: the Program Counter (PC), Repeat Counter (RC), Block-Repeat Counter (BRC), Block-Repeat Start Address register (RSA) and Block-Repeat End Address register (REA). Some versions of TMS320C54x processors have an additional register, the program counter extension register (XPC), to support addressing of virtual memory.
The program counter (PC) is a 16-bit register that holds the address of the program code. An instruction is fetched from program memory by loading the content of the PC (address) on the program address bus (PAB) and then reading the code from the program bus (PB). When the memory is read, the PC is incremented for the next fetch. So, when an instruction word is read, the PC holds the address of the next word of the same instruction or of the next instruction. The XPC is a 7-bit register that selects the extended page of program memory in the processors that support virtual addressing.

When a single instruction has to be executed repeatedly, the 16-bit RC register holds the count value. When a block of instructions has to be repeated, the BRC holds the count value. The registers RSA and REA hold the start and end addresses, respectively, of the block to be repeated.
Status Registers (ST0 and ST1)
The TMS320C54x processors have two 16-bit status registers, ST0 and ST1. They hold the status of the ALU result, pointers for indirect addressing and various bits for interrupt control, hold mode, arithmetic mode and accumulator shift value. The formats of the status registers are shown in fig 11.15 and 11.16. The functions of the various bits of the status registers are listed in tables 11.18 and 11.19.
The status registers can be stored into data memory and loaded from data memory, which allows the processor status to be saved and restored for subroutines. The individual bits of the ST0 and ST1 registers can be set or cleared with the SSBX and RSBX instructions.
The ARP, DP and ASM bit fields can be loaded using the LD instruction with a short-immediate operand. The ASM and DP fields can also be loaded with data-memory values by using the LD instruction.

Table 11.18: Functions of various bits of ST0 of TMS320C54x processors

BIT     NAME   FUNCTION
15-13   ARP    Auxiliary register pointer, used to select the AR for indirect addressing
12      TC     Test/control flag bit, which stores the results of ALU test bit operations
11      C      Carry bit, which indicates a carry or borrow in an ALU operation
10      OVA    Overflow flag for ACCA. Indicates an overflow in an ALU operation with ACCA as destination
9       OVB    Overflow flag for ACCB. Indicates an overflow in an ALU operation with ACCB as destination
8-0     DP     Data-memory page pointer, to specify the current data memory page

Bit:    15-13   12   11   10    9     8-0
Field:  ARP     TC   C    OVA   OVB   DP
Fig 11.15: Format of status register 0 (ST0) of TMS320C54x processor

Table 11.19: Functions of various bits of ST1 of TMS320C54x processors

BIT    NAME   FUNCTION
15     BRAF   Block-repeat active flag. Indicates whether a block repeat is currently active
14     CPL    Compiler mode bit. Indicates which pointer is used in relative direct addressing
13     XF     Status of external flag pin
12     HM     Hold mode bit
11     INTM   Interrupt mode bit. Enables/disables all interrupts
10     -      Always 0
9      OVM    Overflow mode bit. Enables/disables the saturation mode in ALU
8      SXM    Sign-extension mode bit. Enables/disables sign extension of an arithmetic operation
7      C16    Dual 16-bit/double-precision arithmetic mode selection bit
6      FRCT   Fractional mode bit
5      CMPT   Compatibility mode bit. Determines the compatibility mode for the ARP
4-0    ASM    Accumulator shift value in the range -16 to 15 (decimal)

Bit:    15     14    13   12   11     10   9     8     7     6      5      4-0
Field:  BRAF   CPL   XF   HM   INTM   0    OVM   SXM   C16   FRCT   CMPT   ASM
Fig 11.16: Format of status register 1 (ST1) of TMS320C54x processor
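Every field occupies a fixed bit range, so Tables 11.18 and 11.19 translate directly into a small decoder. This Python sketch takes its bit positions from those tables. The helper names are hypothetical. `set_bit` and `clear_bit` only mimic the effect of SSBX and RSBX on a register value, not the instructions themselves. ASM is read as a 5-bit two's complement number, which gives the -16 to 15 range in the table.

```python
ST0_FIELDS = {"ARP": (15, 13), "TC": (12, 12), "C": (11, 11),
              "OVA": (10, 10), "OVB": (9, 9), "DP": (8, 0)}
ST1_FIELDS = {"BRAF": (15, 15), "CPL": (14, 14), "XF": (13, 13),
              "HM": (12, 12), "INTM": (11, 11), "OVM": (9, 9),
              "SXM": (8, 8), "C16": (7, 7), "FRCT": (6, 6),
              "CMPT": (5, 5), "ASM": (4, 0)}


def decode(value, fields):
    """Extract each (msb, lsb) bit range of a 16-bit status register."""
    return {name: (value >> lsb) & ((1 << (msb - lsb + 1)) - 1)
            for name, (msb, lsb) in fields.items()}


def decode_st1(value):
    bits = decode(value, ST1_FIELDS)
    asm = bits["ASM"]
    bits["ASM"] = asm - 32 if asm & 0x10 else asm  # 5-bit two's complement
    return bits


def set_bit(value, bit):    # effect of SSBX on one status bit
    return (value | (1 << bit)) & 0xFFFF


def clear_bit(value, bit):  # effect of RSBX on one status bit
    return value & ~(1 << bit) & 0xFFFF


print(decode(0x732A, ST0_FIELDS))
# {'ARP': 3, 'TC': 1, 'C': 0, 'OVA': 0, 'OVB': 1, 'DP': 298}
print(decode_st1(set_bit(0x001C, 8))["SXM"], decode_st1(0x001C)["ASM"])  # 1 -4
```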

CPU Memory Mapped Registers


The TMS320C54x has thirty-two 16-bit CPU registers that are mapped into page 0 of the data memory space. These memory-mapped registers include registers for data and program memory address generation, various status and control registers for the CPU, and the accumulators. The memory-mapped registers and their memory addresses are listed in table 11.20.
Table 11.20: CPU Memory-Mapped Registers of TMS320C54x processors

Address         Name      Description
Dec     Hex
0       0       IMR       Interrupt mask register
1       1       IFR       Interrupt flag register
2-5     2-5     -         Reserved for testing
6       6       ST0       Status register 0
7       7       ST1       Status register 1
8       8       AL        Accumulator A low word (bits 15-0)
9       9       AH        Accumulator A high word (bits 31-16)
10      A       AG        Accumulator A guard bits (bits 39-32)
11      B       BL        Accumulator B low word (bits 15-0)
12      C       BH        Accumulator B high word (bits 31-16)
13      D       BG        Accumulator B guard bits (bits 39-32)
14      E       T         Temporary register
15      F       TRN       Transition register
16-23   10-17   AR0-AR7   Auxiliary register 0 - Auxiliary register 7
24      18      SP        Stack pointer
25      19      BK        Circular-buffer size register
26      1A      BRC       Block-repeat counter
27      1B      RSA       Block-repeat start address
28      1C      REA       Block-repeat end address
29      1D      PMST      Processor mode status register
30      1E      XPC       Program counter extension register
31      1F      -         Reserved

11.4.4 On-Chip Memory in TMS320C54x Processors
The TMS320C54x family of processors has three different types of on-chip memory: mask-programmable ROM, Single-Access RAM (SARAM) and Dual-Access RAM (DARAM). The various members of the TMS320C54x family have different on-chip memory capacities, which are listed in table 11.21.
Table 11.21: On-chip Memory in TMS320C54x Processors

                                   TMS320C54x Family of Processors
Memory Type                    C541   C542   C543   C545   C546   C548   C549
ROM   Program ROM (PROM)       20k    2k     2k     32k    32k    2k     16k
ROM   Program/Data ROM         8k     -      -      16k    16k    -      16k
RAM   DARAM                    5k     10k    10k    6k     6k     8k     8k
RAM   SARAM                    -      -      -      -      -      24k    24k

On-chip ROM
The various models of TMS320C54x processors have internal mask-programmable ROM of 2k to 48k words. In the majority of the processors, the on-chip ROM is mapped to program-memory space. In some processors, a part of the ROM can be mapped to data-memory space. The processor has an option for including or excluding the on-chip ROM addresses in the processor program memory address space.
The main purpose of internal ROM is to permanently store the program code and data for a specific application during manufacturing of the chip itself. The processor has an option of boot loading the content of on-chip ROM to internal/external RAM during power-ON reset. The content of the on-chip ROM can be protected so that external devices cannot access the program code. This feature provides security for proprietary algorithms.
On-chip DARAM
The TMS320C54x family of processors has 5k to 10k words of on-chip DARAM, organized into blocks as shown below.
• TMS320C541: 5k words organized as 5 blocks of 1k words each
• TMS320C542/543: 10k words organized as 5 blocks of 2k words each
• TMS320C545/546: 6k words organized as 3 blocks of 2k words each
• TMS320C548/549: 8k words organized as 4 blocks of 2k words each
The DARAM blocks can be accessed twice per machine cycle. Upon reset, the DARAM is mapped to data memory address space. After reset, the processor has provision to map the DARAM into program memory space.
On-chip SARAM

The TMS320C548/549 processors have 24k words of on-chip SARAM, organized as three blocks of 8k words. Upon reset, the SARAM is mapped to data memory space. After reset, the processor has provision to map the SARAM into program memory space.
11.4.5 On-Chip Peripherals of TMS320C54x Processors
The various on-chip peripherals available in the TMS320C54x family of processors are:
• Software-programmable wait-state generator
• Programmable bank switching
• Parallel I/O ports
• DMA controller
• Host Port Interface (HPI)
• Serial ports (Standard, TDM, BSP and McBSP)
• General-purpose I/O pins
• Timer
• Clock generator and Phase Locked Loop (PLL)


Software-programmable wait-state generator
The software-programmable wait-state generator can insert wait states in external bus cycles for interfacing with slow-speed external memory and I/O devices. The wait-state generator can extend the external bus cycles up to seven machine cycles. When all external accesses are configured to zero wait states, the internal clock to the wait-state generator is shut off to reduce power consumption.
Programmable Bank Switching
The programmable bank-switching logic can be used to insert one cycle automatically when the memory data access switches from data memory space to program memory space or vice versa. This extra cycle allows one memory to release the bus before the other memory starts driving the bus, thereby avoiding bus contention.
Parallel I/O ports
The TMS320C54x family of processors has a 64k I/O address space, which can be used as 64k I/O ports. The I/O ports can be addressed by the PORTR and PORTW instructions for data transfer between ports and data memory. The processor generates a signal, IS, during I/O access to indicate a port read or port write operation. The processor can be easily interfaced to external I/O devices through the I/O ports with minimal external address decoding circuits.

DMA (Direct Memory Access) Controller

The internal DMA controller in TMS320C54x processors can perform data transfers between various internal and external memory spaces without the intervention of the CPU. The DMA has six independent programmable channels, allowing six different contexts for DMA operation. The DMA has higher priority than the CPU for both internal and external accesses. The DMA can perform single-word or double-word transfers. A DMA transfer from/to external memory to/from internal memory requires 5 cycles.
Host Port Interface (HPI)
The HPI is an 8-bit parallel port that provides an interface to a host processor for information exchange between the Digital Signal Processor (DSP) and the host processor. The information exchange takes place via on-chip memory that is accessible to both the DSP and the host. The TMS320C54x family of processors has 2k words of internal DARAM mapped in data memory space 1000h to 17FFh as HPI memory.
Serial Ports
The TMS320C54x processors have the following four types of serial ports.
• Synchronous serial port
• Time Division Multiplexed (TDM) serial port
• Buffered serial port
• Multichannel Buffered Serial Port (McBSP)
The synchronous serial ports are high-speed, full-duplex serial ports that provide direct communication with serial devices such as codecs, serial ADCs, etc. These ports can operate at up to one-fourth of the machine cycle rate. The transmitter and receiver are double-buffered, and data is framed either as bytes or as words.

The TDM serial port employs the time-division multiplexing technique for serial communication with multiple devices that have TDM ports. Time-division multiplexing is the process of dividing the time interval into a number of subintervals, with each subinterval representing a communication channel. One TMS320C54x processor can communicate with up to seven devices/processors with TDM serial ports via a pair of data lines and a pair of address lines. Like the synchronous serial port, the TDM port is double-buffered for both transmit and receive data.

The buffered serial port consists of a full-duplex double-buffered serial-port interface and an auto-buffering unit. The serial-port unit is connected to processor internal memory by a dedicated bus through the auto-buffering unit. As a result, the buffered port can directly read/write processor internal memory without the intervention of the CPU. This results in minimal overhead for serial port transactions and faster data rates.

The multichannel buffered serial port (McBSP) is an enhanced buffered serial port that can support multichannel transmit and receive of up to 128 channels. The advanced features of the McBSP are wide data sizes from 8-bit to 32-bit, μ-law and A-law companding, and programmable internal clock and frame synchronization.
General-Purpose I/O Pins
The TMS320C54x family of processors has two general-purpose I/O pins: the branch control input pin, BIO, and the external flag output pin, XF.

The BIO pin can be used to monitor the status of peripheral devices. A branch instruction can be conditionally executed depending upon the state of the BIO input. The BIO pin is an alternative to an interrupt when the interrupts are dedicated to time-critical applications.
The XF pin can be used to signal external devices. The XF pin is controlled using software. At reset, the XF pin is set high. The SSBX instruction is used to set the XF pin, and the RSBX instruction is used to reset it.
Timer
The on-chip timer in TMS320C54x processors is a 16-bit timer with a 4-bit prescaler. The timer can be used to initiate any time-based event through an interrupt. The timer has a count register, which is loaded with a count value. At every clock cycle, the timer count is decremented by 1, and at the end of the count an interrupt is generated. The timer has a control register to control operations like start, stop, restart and disable.
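The down-counting behaviour can be simulated to see how the 4-bit prescaler stretches the interrupt period. With a prescaler reload of zero, the count register decrements every clock cycle, as described above. The register names PRD (period), TDDR (prescaler reload) and PSC (prescaler counter) come from TI's timer documentation, not from the paragraph above. This behavioural model is an assumption: it reproduces the documented interrupt period of (TDDR + 1) × (PRD + 1) clock cycles, not exact cycle-level timing. The function name is hypothetical.

```python
def timer_interrupt_cycles(prd, tddr, count):
    """Clock cycles at which the first `count` timer interrupts occur.

    psc models the 4-bit prescaler counter (reloaded from tddr) and tim
    the 16-bit count register (reloaded from prd)."""
    if not (0 <= tddr <= 0xF and 0 <= prd <= 0xFFFF):
        raise ValueError("tddr is 4 bits and prd is 16 bits")
    psc, tim, cycle, events = tddr, prd, 0, []
    while len(events) < count:
        cycle += 1
        if psc > 0:
            psc -= 1       # prescaler counts down every clock cycle
            continue
        psc = tddr         # prescaler borrow: reload it and clock the count register
        if tim > 0:
            tim -= 1
        else:
            tim = prd      # end of count: reload and raise an interrupt
            events.append(cycle)
    return events


print(timer_interrupt_cycles(prd=9, tddr=3, count=3))  # [40, 80, 120]
```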

Clock Generator and PLL (Phase Locked Loop)

There are two methods of clock generation in TMS320C54x processors. In the first method, the internal oscillator connected to an external crystal generates a clock at the crystal frequency, which is then divided by 1, 2 or 4 and used for the CPU. In the second method, a low-frequency external clock is supplied to an internal PLL circuit, which generates the CPU clock at a multiple of the external clock frequency. This method reduces system power consumption and clock-generated EMI, and facilitates the use of a low-cost crystal.
11.4.6 Addressing Modes of TMS320C54x Processors
The addressing mode refers to the method of specifying the operand or the data to be operated on by the instruction. The TMS320C54x processors support the following seven addressing modes.
1. Immediate addressing
2. Absolute addressing
3. Accumulator addressing
4. Direct addressing
5. Indirect addressing
6. Memory-mapped register addressing
7. Stack addressing

Immediate Addressing

In immediate addressing, the data is specified as a part of the instruction. In this addressing mode, the instruction carries a 3-bit/5-bit/8-bit/9-bit/16-bit constant, which is the data to be operated on by the instruction. The immediate constant is specified with the # symbol. In the instructions listed in table 11.22, the syntax used for immediate addressing is #k3, #k5, #k9, #K and #lk.

Example
LD #1Ch, ASM       ; Load the immediate 5-bit constant (1Ch) in the ASM field of status register 1
LD #12Ah, DP       ; Load the immediate 9-bit constant (12Ah) in the DP field of status register 0
LD #37A5h, 16, A   ; Shift the long immediate (16-bit) constant by 16 bits and load it in accumulator A
Absolute Addressing
In absolute addressing, the 16-bit address of the operand is directly specified in the instruction. This addressing mode can address an operand in all three address spaces of the processor, i.e., an operand in program memory, data memory or I/O ports. In the instructions listed in table 11.22, the syntax used for absolute addressing is pmad, dmad and PA. In assembly language programs, the 16-bit address is specified as a 16-bit constant without the # symbol.

Example
MVKD 5F3Bh, *AR2    ; Move the data from data memory addressed by the instruction (address = 5F3Bh)
                    ; to another data memory location addressed by AR2
MVPD 3FCAh, *AR4    ; Move the data from program memory addressed by the instruction
                    ; (address = 3FCAh) to the data memory location addressed by AR4
PORTR 7C20h, *AR1   ; Move the data from the I/O port addressed by the instruction (address = 7C20h)
                    ; to the data memory location addressed by AR1

Accumulator Addressing
In accumulator addressing, the content of the accumulator is the address of the operand/data in program memory.

Example:
READA *AR3   ; Read the content of program memory addressed by accumulator A and store it
             ; in data memory addressed by AR3
WRITA *AR4   ; Write the content of data memory addressed by AR4 in program memory
             ; addressed by accumulator A
Direct Addressing
In the direct addressing mode, the lower 7 bits of the data memory address are specified in the instruction itself. The 16-bit data memory address is formed by using either the 9 bits of DP (Data Pointer) in status register 0 or the 16 bits of SP (Stack Pointer).
When DP is used, the 9 bits of DP are the upper 9 bits of the 16-bit address, and the lower 7 bits are the address directly specified by the instruction.
When SP is used, the (16-bit) content of SP is added to the 7 bits specified in the instruction to form the 16-bit address.
In the instructions listed in table 11.22, the syntax used to represent direct addressing is Smem. In assembly language programs, the 7-bit address is specified as a 7-bit constant without the # symbol.

Example:
ADD 6Ch, A   ; Add the content of memory directly addressed by the instruction (address = 6Ch)
             ; to accumulator A
SUB 57h, B   ; Subtract the content of memory directly addressed by the instruction (address = 57h)
             ; from accumulator B
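The address formation described above can be written as a small function. The choice between DP and SP is assumed here to follow the CPL bit of ST1 (CPL = 0 selects DP, CPL = 1 selects SP). This matches TI's documentation but is not spelled out above. The function name and parameters are hypothetical. The usage line combines the DP value 12Ah from the immediate-addressing example with the offset 6Ch from ADD 6Ch, A.

```python
def direct_address(offset7, dp=0, sp=0, cpl=0):
    """Form the 16-bit data-memory address used by direct (Smem) addressing."""
    offset7 &= 0x7F                            # 7 bits come from the instruction
    if cpl == 0:
        return ((dp & 0x1FF) << 7) | offset7   # DP supplies the upper 9 bits
    return (sp + offset7) & 0xFFFF             # SP-relative: 16-bit addition


print(hex(direct_address(0x6C, dp=0x12A)))           # 0x956c
print(hex(direct_address(0x10, sp=0x2FF0, cpl=1)))  # 0x3000
```

The DP form concatenates bits, so a page always spans 128 words. The SP form adds the offset, so the addressed block can start anywhere.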
Indirect Addressing
In the indirect addressing mode, the data memory address is specified by the content of one of the eight auxiliary registers, AR0-AR7. The AR (Auxiliary Register) currently used for accessing the data is denoted by the 3-bit ARP (Auxiliary Register Pointer) field of status register 0.

In this addressing mode, the content of the AR can be updated automatically either after or before the operand is fetched. The syntax used for modifying the content of the AR is listed in table 11.21.

In the instruction set listed in table 11.22, the syntax used to represent indirect addressing is Smem or Xmem/Ymem. In assembly language programs, the syntax listed in table 11.21 is used.

Table 11.22: Syntax Used in Indirect Addressing for Modifying AR

SYNTAX        MODIFICATION OF AR
*ARx          AR unaltered
*ARx-         AR decremented by 1 after data access
*ARx+         AR incremented by 1 after data access
*+ARx         AR incremented by 1 before data access
*ARx-0        AR decremented by the content of index register (AR0)
*ARx+0        AR incremented by the content of index register (AR0)
*ARx-0B       AR decremented for bit-reversed addressing using index register (AR0)
*ARx+0B       AR incremented for bit-reversed addressing using index register (AR0)
*ARx-%        AR decremented for circular addressing
*ARx+%        AR incremented for circular addressing
*ARx-0%       AR decremented for circular addressing using index register (AR0)
*ARx+0%       AR incremented for circular addressing using index register (AR0)
*ARx(lk)      Data address = Base + Offset, where Base = ARx and Offset = lk; ARx is not altered
*+ARx(lk)     Same as above, but ARx is modified by the long immediate offset
*+ARx(lk)%    Same as above, but the address is modified for circular addressing
*(lk)         Absolute addressing

Example
LD *AR3, A     ; Load the content of memory addressed by AR3 in accumulator A
LD *AR3-, A    ; Same as above, but after loading decrement AR3
LD *AR3+, A    ; Same as above, but after loading increment AR3
LD *AR3-0, A   ; Same as above, but after loading decrement AR3 using AR0
LD *AR3+0, A   ; Same as above, but after loading increment AR3 using AR0
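The bit-reversed forms (*ARx+0B, *ARx-0B) are the least intuitive entries in the table above. A common way to model *ARx+0B is a reverse-carry addition, in which the carry propagates from the most significant bit toward the least significant one. With AR0 set to half the FFT size, the AR then steps through the buffer in bit-reversed order. This Python sketch is a model under two assumptions: AR0 = N/2, and the buffer starts at an address whose low log2(N) bits are zero. The function names are hypothetical.

```python
def bit_reverse16(value):
    """Reverse the order of the 16 bits of value."""
    return int(format(value & 0xFFFF, "016b")[::-1], 2)


def reverse_carry_add(ar, ar0):
    """Model of *ARx+0B: add AR0 to ARx with the carry propagated in reverse."""
    total = (bit_reverse16(ar) + bit_reverse16(ar0)) & 0xFFFF
    return bit_reverse16(total)


def bit_reversed_addresses(base, n):
    """Addresses visited by n successive *ARx+0B accesses with AR0 = n // 2."""
    ar, ar0, out = base, n // 2, []
    for _ in range(n):
        out.append(ar)
        ar = reverse_carry_add(ar, ar0)
    return out


print([a - 0x1000 for a in bit_reversed_addresses(0x1000, 8)])
# [0, 4, 2, 6, 1, 5, 3, 7]
```

The offsets 0, 4, 2, 6, 1, 5, 3, 7 are the 3-bit bit reversals of 0 to 7, which is the reordering needed by an 8-point radix-2 FFT.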
Memory-Mapped Register Addressing
In memory-mapped register addressing, the address of the memory-mapped register is specified as a direct or indirect address in the instruction.
The memory-mapped registers are mapped to page 0 of the data memory address space, so they can be accessed by using only a 7-bit address. In direct addressing, the 7 bits are directly specified in the instruction as a 7-bit constant without the # symbol. In indirect addressing, the lower 7 bits of the auxiliary register form the address of the memory-mapped register. In this addressing mode, the memory-mapped registers are accessed without affecting the content of DP (Data Pointer) or SP (Stack Pointer).

Example
LDM 06h, A    ; Load the content of the MMR directly addressed by the instruction (address = 06h)
              ; in accumulator A
STLM A, 1Eh   ; Store the content of accumulator A in the MMR directly addressed by the instruction
              ; (address = 1Eh)

Stack Addressing
In stack addressing mode, the data memory address is the content of the Stack Pointer (SP). The push and pop instructions access the stack memory using the stack addressing mode. The call, interrupt and return instructions also use the stack pointer address for automatic storage/retrieval of information to/from the stack.
Note: Stack memory is a portion of data memory reserved by the user/system designer for stack operations.
Example:
PSHM 1Ch   ; Decrement SP by 1 and push the content of the MMR addressed by the instruction
           ; (address = 1Ch) to the stack memory addressed by SP
POPM 1Ch   ; Pop the top of stack pointed by SP to the MMR addressed by the instruction
           ; (address = 1Ch), then increment SP by 1

11.4.7 Instruction Pipelining in TMS320C54x Processors
The execution of TMS320C54x processor instructions involves six levels/phases of pipelining. The six phases of pipelining are program prefetch, program fetch, decode, access, read and execute. The functions performed in the six phases are given below.
Program prefetch : The Program Address Bus (PAB) is loaded with the address of the next instruction to be fetched.
Program fetch    : The opcode (instruction word) is fetched from the Program Bus (PB) and loaded into the Instruction Register (IR).
Decode           : The opcode is decoded to determine the type of memory access operation and the control sequence at the data address generation unit and the CPU.
Access           : The operand address is loaded on the Data Address Bus (DAB). If a second operand is required, then another address is loaded on CAB.
Read             : The operands are read from buses DB and CB.
Execute          : The task specified by the instruction is performed.
The six phases of the pipeline are independent of each other, which permits overlapping of instruction execution. During any clock cycle, one to six instructions can be in different phases of execution. Therefore, the average execution time of a one-word instruction is one clock cycle. When some instructions are executed, the phases of the pipeline are not fully utilized, so the average execution time of such instructions will be 2 to 6 clock cycles.
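The overlap of the six phases can be visualised with a short Python sketch that prints which instruction occupies each phase in each clock cycle. It assumes the ideal case described above: every one-word instruction spends exactly one cycle in each phase, and nothing stalls. Under that assumption, once the pipeline is full one instruction completes per cycle, so n instructions take n + 5 cycles. The function name is hypothetical.

```python
PHASES = ("prefetch", "fetch", "decode", "access", "read", "execute")


def pipeline_schedule(n_instructions):
    """Per-cycle map of phase -> instruction index (None when the phase is idle)."""
    n_cycles = n_instructions + len(PHASES) - 1
    table = []
    for cycle in range(n_cycles):
        row = {}
        for stage, phase in enumerate(PHASES):
            instr = cycle - stage  # instruction i is in stage s during cycle i + s
            row[phase] = instr if 0 <= instr < n_instructions else None
        table.append(row)
    return table


# One line per clock cycle; columns follow the order of PHASES.
for cycle, row in enumerate(pipeline_schedule(3), start=1):
    cells = ["I%d" % (i + 1) if i is not None else "--" for i in row.values()]
    print(cycle, " ".join(cells))
```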

11.4.8 Instructions of TMS320C54x Processors

The TMS320C54x instruction set consists of instructions for signal processing operations, high-speed computations and general-purpose applications. The instructions of the TMS320C54x can be classified into the following groups.
1. Arithmetic instructions      2. Logical instructions
3. Branch/control instructions  4. Load/store instructions
5. Move instructions
The instructions of TMS320C54x processors are classified into the above groups, arranged in alphabetical order and listed in table 11.22.
The size of TMS320C54x instructions is 1 to 3 words. When all the instructions and data reside in internal memory, most of the one-word instructions are executed in one clock cycle. The execution time for 2/3-word instructions and some data transfer, branch and MAC instructions will be 2 to 6 clock cycles.
