Retargeting A C Compiler For A DSP Processor: Henrik Antelius


Retargeting a C Compiler for a

DSP Processor

Master thesis performed in electronics systems


by

Henrik Antelius

LiTH-ISY-EX-3595-2004
Linköping 2004
Retargeting a C Compiler for a
DSP Processor

Master thesis in electronics systems


at Linköping Institute of Technology
by

Henrik Antelius

LiTH-ISY-EX-3595-2004

Supervisors: Thomas Johansson


Ulrik Lindblad
Patrik Thalin
Examiner: Kent Palmkvist
Linköping, 2004-10-05
Division, Department: Institutionen för systemteknik, 581 83 LINKÖPING
Date: 2004-10-05
Language: English
Report category: Examensarbete (Master thesis)
ISRN: LITH-ISY-EX-3595-2004
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2004/3595/
Title: Retargeting a C Compiler for a DSP Processor
Swedish title: Anpassning av en C-kompilator för kodgenerering till en
DSP-processor
Author: Henrik Antelius

Abstract
The purpose of this thesis is to retarget a C compiler for a DSP processor.

Developing a new compiler from scratch is a major task. Instead, modifying an existing
compiler so that it generates code for another target is a common way to develop compilers for
new processors. This is called retargeting.

This thesis describes how this was done with the LCC C compiler for the Motorola DSP56002
processor.

Keywords: retarget, compiler, LCC, DSP
Table of contents

1 Introduction
1.1 Background
1.2 Purpose and goal
1.3 The reader
1.4 Reading guidelines
2 DSP
2.1 Introduction
2.2 Motorola DSP56002
2.2.1 Data buses
2.2.2 Address buses
2.2.3 Data ALU
2.2.4 Address generation unit
2.2.5 Program control unit
2.3 Instruction set
2.4 Assembly
3 Compilers
3.1 Introduction
3.2 The analysis-synthesis model
3.3 Phases
3.4 Analysis
3.4.1 Lexical analysis
3.4.2 Syntax analysis
3.4.3 Semantic analysis
3.5 Synthesis
3.5.1 Intermediate code generation
3.5.2 Code optimization
3.5.3 Code generation
3.6 Symbol table
3.7 Error handler
3.8 Front and back end
3.9 Environment
3.9.1 Preprocessor
3.9.2 Assembler
3.9.3 Linker and loader
3.10 Compiler tools
4 LCC
4.1 Introduction
4.2 C
4.3 The compiler
4.3.1 Lexical analysis
4.3.2 Syntax analysis
4.3.3 Semantic analysis
4.3.4 Intermediate code generation
4.3.5 Back end
5 Implementation
5.1 Introduction
5.2 The compiler
5.2.1 Data types and sizes
5.2.2 Register usage
5.2.3 Memory usage
5.2.4 Frame layout
5.2.5 Calling convention
5.2.6 Naming convention
5.3 Retargeting
5.3.1 Configuration
5.3.2 Declarations
5.3.3 Rules
5.3.4 C code
5.4 Special features
5.5 Other changes to LCC
5.6 The environment
5.7 crt0
5.8 Problems
5.8.1 Register targeting
5.8.2 48-bit registers
5.8.3 Address registers
5.9 Improvements
6 Conclusions
6.1 Retargeting
6.2 Future work
References
Appendix A: Instructions
A.1 Arithmetic instructions
A.2 Logical instructions
A.3 Bit manipulation instructions
A.4 Loop instructions
A.5 Move instructions
A.6 Program control instructions
Appendix B: Sample code
B.1 sample.c
B.2 sample.asm
Appendix C: dsp56k.md
Index

1 Introduction

1.1 Background
The division of Electronics Systems (ES) at the department of Electrical
Engineering (ISY) at Linköping University (LiU) is currently running a
project aimed at developing a DSP processor. The goal of this project is
to make a DSP with a scalable structure that is instruction-level
compatible with the Motorola DSP56002 processor. Scalability here refers
to a variable data word length and the addition or removal of memories
and instructions. The purpose of the scalability is to reduce power
consumption.
The project is currently nearly finished. To increase the usability of
the DSP, a C compiler is needed.
It was decided that the best way to create a C compiler was to retarget
an existing one. Creating a compiler from scratch is a big undertaking
that requires a lot of work; retargeting is a relatively easy task
compared to developing an entire compiler.

1.2 Purpose and goal


The purpose of this thesis is to retarget a C compiler to the Motorola
DSP56002 processor. From one or more C source files, the resulting
compiler should produce an executable file that can run on the DSP.
The only requirement on the compiler is that it should generate code
that works correctly and functions as intended. There are no
requirements on the performance or the size of the generated code.
The compiler should also be compatible with Motorola’s C compiler
and tools for the DSP56002. This makes it possible to mix generated
code from the two compilers. It also means that the tools from
Motorola can be used with the new compiler.

1.3 The reader


It is assumed that the reader of this thesis has basic knowledge of
the C programming language and some knowledge of assembly language.
It is also assumed that the reader has a general knowledge of how
processors work and what a compiler does.

1.4 Reading guidelines


This is a brief description of the chapters:
• Chapter 1 contains an introduction and states the purpose of the
thesis.
• Chapter 2 describes how the DSP56002 processor works and
how it can be used.
• Chapter 3 contains general compiler theory that is needed to
understand how a compiler works.
• Chapter 4 describes the compiler LCC that was used in this
thesis.
• Chapter 5 describes the implementation and the modifications that
were made to LCC.
• Chapter 6 lists the conclusions that were drawn and suggests
further work.

2 DSP

This chapter describes how the Motorola DSP56002 processor works. The
information is collected from [4].

2.1 Introduction
Digital signal processing is, as the term suggests, the processing of
signals by digital means. The signal is normally an electrical signal
carried on a wire, but it can represent almost any kind of information
and it can be processed in a wide variety of ways. Examples of digital
signal processing include the following:
• Filtering of signals.
• Convolution, which is the mixing of two signals.
• Correlation, which is the comparison of two signals.
• Rectification, amplification and transformation of a signal.
All of these tasks were previously performed using analog circuits.
Nowadays integrated circuits have enough processing power to perform
these and many other functions. The devices performing these tasks are
called digital signal processors, or DSPs. They are specialised
microprocessors with architectures designed specifically for the types
of operations required in digital signal processing. Like
general-purpose microprocessors, DSPs are programmable devices with
their own native instruction sets.
DSPs can today be found in almost all areas of electronics, such as
mobile phones, personal computers, digital television decoders and
surround receivers. The advantages of using a DSP instead of analog
circuits are many: fewer components are generally needed, DSPs have
higher noise immunity, it is easy to change the behaviour of a filter,
and filters with closer tolerances can be built. Also, since the DSP
is a microcomputer, the same hardware design can be used in many
different areas by simply changing the software for the DSP.

2.2 Motorola DSP56002


The Motorola DSP56002 is a general-purpose DSP processor with a
triple-bus Harvard architecture, which can access multiple memories at
the same time. It uses fixed-point arithmetic and has three function
units: the data arithmetic and logic unit (data ALU), the address
generation unit (AGU) and the program control unit (PCU). It also has
three memories, two for data (X and Y) and one for the program (P). A
block diagram of the DSP56002 can be seen in Figure 2.1.

Figure 2.1: Block diagram of the Motorola DSP56002

This architecture with multiple memories and buses makes it possible
to perform one computation in the data ALU and access both the X and Y
memories during a single instruction cycle.

2.2.1 Data buses


The data buses consist of four 24-bit wide buses called the Y data bus
(y_dbus), the X data bus (x_dbus), the program data bus (p_dbus) and
the global data bus (g_dbus). They are used for moving data between
the function units and the memories. Data transfers between the data
ALU and the X and Y memories occur over the X and Y data buses,
respectively. Instruction fetches occur over the program data bus, and
all other data movements occur over the global data bus.

2.2.2 Address buses


Addresses for the X data memory and the Y data memory are specified
over the X address bus (x_abus) and the Y address bus (y_abus).
Addresses for the program memory are specified over the P address
bus (p_abus). All address buses are 16 bits wide.

2.2.3 Data ALU


The data ALU performs all of the arithmetic and logical operations on
the data. It uses a register set that consists of four 24-bit input
registers, two 48-bit accumulator registers and two 8-bit accumulator
extension registers.
The input registers are called X0, X1, Y0 and Y1. They can also be
combined into two 48-bit registers called X and Y. The two accumulators
are called A and B and are 56 bits wide including their extension
registers. Each consists of three concatenated registers, A2:A1:A0 and
B2:B1:B0. A2 and B2 are the 8-bit accumulator extension registers,
which are used when more than 48-bit accuracy is needed.
The input registers hold operands for the instructions, and the
accumulator registers hold both operands and results.
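As an illustration of the register layout described above, the following
C sketch models one 56-bit accumulator as the concatenation A2:A1:A0
stored in a 64-bit integer. The type and function names are hypothetical
and do not come from any Motorola tool; the sketch only shows how the
three parts line up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one accumulator (A or B): an 8-bit extension
 * register, a 24-bit high word and a 24-bit low word. */
typedef struct {
    uint8_t  ext;   /* A2: bits 55..48 */
    uint32_t high;  /* A1: bits 47..24, only the low 24 bits are used */
    uint32_t low;   /* A0: bits 23..0,  only the low 24 bits are used */
} accumulator;

/* Concatenate A2:A1:A0 into a single 56-bit value. */
uint64_t acc_pack(accumulator a)
{
    assert(a.high <= 0xFFFFFFu && a.low <= 0xFFFFFFu);
    return ((uint64_t)a.ext << 48) | ((uint64_t)a.high << 24) | (uint64_t)a.low;
}

/* Split a 56-bit value back into its three parts. */
accumulator acc_unpack(uint64_t value)
{
    accumulator a;
    a.ext  = (uint8_t)((value >> 48) & 0xFFu);
    a.high = (uint32_t)((value >> 24) & 0xFFFFFFu);
    a.low  = (uint32_t)(value & 0xFFFFFFu);
    return a;
}
```

A C compiler for such a target has to decide which of these parts a C
value occupies, since no standard C integer type is 24, 48 or 56 bits
wide on most hosts.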

2.2.4 Address generation unit


The AGU performs all of the address storage and address calculations
necessary to access the data in the memories. The AGU is divided into
two identical halves, each of which has an address arithmetic unit
that can generate one address each instruction cycle. The AGU has
three sets of eight registers: the address registers R0 – R7, the
offset registers N0 – N7 and the modifier registers M0 – M7. The
R-registers store the addresses that are used to access the memories.
The N- and M-registers are used to update the R-registers in various
ways. The registers are tied together by number, so, for example,
only N1 and M1 can be used to update R1.

2.2.5 Program control unit


The PCU performs instruction prefetch, instruction decoding,
hardware loop control and interrupt processing. It contains a 15-level
system stack that is 32 bits wide and the following six registers:
program counter (PC), loop address (LA), loop counter (LC), status
register (SR), operating mode register (OMR) and stack pointer (SP).

2.3 Instruction set


The instruction set can be seen in Appendix A. About half of the
available instructions allow the use of parallel data moves.

2.4 Assembly
The instruction syntax is organized in four columns: opcode, operands
and two parallel move fields. An example of a typical assembly
instruction can be seen here:
Opcode   Operands   XDB          YDB
MAC      X0,Y0,A    X:(R0)+,X0   Y:(R4)+,Y0
The opcode column specifies the operation that should be performed.
The operands column specifies which operands the opcode should
use. The XDB and YDB columns specify optional data transfers over
the X data bus and the Y data bus. The address space qualifiers X: and
Y: indicate which memory is being referenced.
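To make the effect of the example instruction concrete, the following C
sketch models what one MAC with two parallel moves does. It is a
simplified illustration under stated assumptions: plain integers stand
in for the fractional fixed-point registers, scaling, saturation and the
56-bit accumulator width are ignored, and the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified machine state. */
typedef struct {
    int64_t a;        /* accumulator A */
    int32_t x0, y0;   /* input registers X0 and Y0 */
    unsigned r0, r4;  /* address registers R0 and R4 */
} dsp_state;

/* Models: MAC X0,Y0,A  X:(R0)+,X0  Y:(R4)+,Y0 */
void mac_with_moves(dsp_state *s, const int32_t *xmem, const int32_t *ymem)
{
    assert(s != 0 && xmem != 0 && ymem != 0);
    s->a += (int64_t)s->x0 * s->y0;  /* multiply-accumulate on the old X0, Y0 */
    s->x0 = xmem[s->r0++];           /* X:(R0)+,X0: load X0, post-increment R0 */
    s->y0 = ymem[s->r4++];           /* Y:(R4)+,Y0: load Y0, post-increment R4 */
}
```

Calling the function in a loop computes a sum of products while the next
pair of operands is fetched, which is the pattern the parallel move
fields are intended for.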
This is an example of a small assembly program:
        ORG     Y:
var_a   dc      42
var_b   dc      48

        ORG     P:$40
        MOVE    Y:var_a,X0
        MOVE    Y:var_b,A
        ADD     X0,A
        MOVE    A,Y:var_a
This program simply adds the variables var_a and var_b and stores
the result in var_a.
This is a list of some of the features of the assembler that is used in
this thesis:
• Labels: If the first character on a line is not a space or a tab, the
line starts with a label. Labels are used for variables and jump
destinations. A colon is often used to end the label to increase the
readability of the assembly code.
• ORG: The ORG directive is used to indicate which memory the
following statements belong to. It is also used for other
memory-related settings.
• OPT: The OPT directive is used to assign options to the assembler.
• Variables: Variables are declared with a label and the DC directive,
which defines a constant.
• GLOBAL: The GLOBAL keyword is used to tell the assembler that a
variable is global.
• Comments: A semicolon starts a comment. All characters to the right
of the semicolon are ignored.

3 Compilers

This chapter contains general compiler theory. Most of the information
is collected from [1].

3.1 Introduction
A compiler is a program that reads a program written in one language
and translates it into an equivalent program in another language. An
important part of this process is to report the presence of errors in the
source program to the user.
There exist thousands of different compilers for different source
languages and target languages, and there also exist many different
types of compilers. However, the basic principles of how compilers
work are the same. This chapter discusses these basic principles.

3.2 The analysis-synthesis model


There are two parts to compilation: analysis and synthesis. The
analysis part breaks up the source program into constituent pieces and
creates an intermediate representation of the source program. The
synthesis part constructs the desired target program from the
intermediate representation.
During analysis the operations stated in the source program are
determined and recorded in a hierarchical structure called a tree.
Often a special kind of tree called a syntax tree is used.
In the synthesis part of the compilation the output is generated from
the contents of the syntax tree. There is often also some sort of
optimization of the generated code in this part.

3.3 Phases
A compiler operates in phases, each of which transforms the source
program from one representation to another. A typical decomposition
of a compiler is shown in Figure 3.1. The following sections discuss
the different phases and how they are connected.

Figure 3.1: Phases of a compiler


3.4 Analysis
The analysis consists of three phases: lexical analysis, syntax analysis
and semantic analysis.

3.4.1 Lexical analysis


Lexical analysis, sometimes called scanning, is where the stream of
characters that make up the source program is scanned left-to-right
and transformed into groups of characters called tokens. For example,
the characters in the statement
result = start + rate * 60
would be transformed into the following tokens:
1. The identifier result.
2. The assignment symbol =.
3. The identifier start.
4. The plus sign.
5. The identifier rate.
6. The multiplication sign.
7. The number 60.
The white space is normally eliminated during lexical analysis.
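The scanning step above can be sketched in a few lines of C. This is a
minimal illustration, not the scanner of any particular compiler: the
token kinds, the fixed 32-byte text buffer and the function name
next_token are assumptions made for the example, and only identifiers,
decimal numbers and single-character symbols are recognised.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_SYMBOL, TOK_END } token_kind;

typedef struct {
    token_kind kind;
    char text[32];   /* the characters that make up the token */
} token;

/* Read the next token starting at *src and advance *src past it.
 * Tokens longer than the buffer are split, which a real scanner
 * would have to handle properly. */
token next_token(const char **src)
{
    token t = { TOK_END, "" };
    const char *p;
    size_t n = 0;

    assert(src != NULL && *src != NULL);
    p = *src;
    while (isspace((unsigned char)*p))       /* white space is eliminated */
        p++;
    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;
        while ((isalnum((unsigned char)*p) || *p == '_') && n < sizeof t.text - 1)
            t.text[n++] = *p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p) && n < sizeof t.text - 1)
            t.text[n++] = *p++;
    } else if (*p != '\0') {
        t.kind = TOK_SYMBOL;                 /* single character such as = + * */
        t.text[n++] = *p++;
    }
    t.text[n] = '\0';
    *src = p;
    return t;
}
```

Calling next_token repeatedly on the statement from the example yields
the seven tokens listed above, followed by an end marker.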

3.4.2 Syntax analysis


Syntax analysis, or parsing, is where the tokens of the source program
are grouped into grammatical phrases. Usually the phrases of the
source program are represented by a parse tree. An example of a parse
tree can be seen in Figure 3.2.

Figure 3.2: Parse tree for the statement result=start+rate*60

The phrase rate*60 is grouped together because the rules of arithmetic
expressions state that multiplication is performed before addition.

Context free grammars


The rules for the syntax analysis are often expressed by context free
grammars. A grammar gives a precise and easy to understand
specification of the syntax of the programming language. It is also
possible to construct a parser from a grammar by using automated tools.
For example, an if-else statement in C has the form:
if ( expression ) statement else statement
The statement is the concatenation of the keyword if, an opening
parenthesis, an expression, a closing parenthesis, a statement, the
keyword else, and another statement. Using the variable expr for
expression and stmt for statement, this rule can be expressed as:
stmt → if ( expr ) stmt else stmt
The arrow may be read as “can have the form”. This kind of rule is
called a production. In a production, lexical elements like the keyword
if and the parentheses are called tokens. Variables like expr and stmt
represent sequences of tokens and are called nonterminals.
A context free grammar has four components:
1. A set of tokens, known as terminal symbols.
2. A set of nonterminals.
3. A set of productions where each production consists of a
nonterminal, an arrow, and a sequence of tokens and/or nonterminals.
4. A designation of one of the nonterminals as the start symbol.
The following is an example of a simple grammar that can parse the
right hand side of the assignment statement in Figure 3.2:
expr → identifier
expr → number
expr → expr + expr | expr * expr
The symbol | is used to separate multiple productions on one line and
can be read as “or”. By using expr as the start symbol the derivation of
the right hand side of the assignment statement could look like this:
expr → expr + expr
     → identifier + expr
     → identifier + expr * expr
     → identifier + identifier * expr
     → identifier + identifier * number
A grammar derives strings by beginning with the start symbol and
repeatedly replacing a nonterminal by the right side of a production
for that nonterminal. The set of token strings that can be derived from
the start symbol forms the language defined by the grammar.

Syntax tree
A more common internal representation of the syntactic structure is
the syntax tree. It is a compressed representation of the parse tree
where the operators appear as the nodes, and the operands of an
operator are the children of that node. An example of a syntax tree is
seen in Figure 3.3.

Figure 3.3: Syntax tree for the statement result=start+rate*60
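A syntax tree like the one in Figure 3.3 can be built by a small
recursive descent parser, sketched here in C. The sketch makes two
assumptions that are not in the text: the expression grammar is split
into the levels expr, term and factor so that multiplication binds
tighter than addition, and the tree is printed in a prefix notation
chosen for the example. All names are hypothetical, and memory is never
freed to keep the sketch short.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A syntax tree node: an operator with two children, or a leaf. */
typedef struct node {
    char op;                 /* '+' or '*' for operators, 0 for a leaf */
    char name[16];           /* identifier or number text of a leaf */
    struct node *left, *right;
} node;

static const char *cur;      /* current position in the input */

static node *new_node(char op, node *l, node *r)
{
    node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->op = op; n->left = l; n->right = r;
    return n;
}

static void skip_space(void) { while (isspace((unsigned char)*cur)) cur++; }

/* factor → identifier | number */
static node *factor(void)
{
    node *n = new_node(0, NULL, NULL);
    size_t len = 0;
    skip_space();
    while (isalnum((unsigned char)*cur) && len < sizeof n->name - 1)
        n->name[len++] = *cur++;
    assert(len > 0);         /* the sketch does no real error handling */
    return n;
}

/* term → factor { * factor } */
static node *term(void)
{
    node *n = factor();
    for (skip_space(); *cur == '*'; skip_space()) {
        cur++;
        n = new_node('*', n, factor());
    }
    return n;
}

/* expr → term { + term } */
static node *expr(void)
{
    node *n = term();
    for (skip_space(); *cur == '+'; skip_space()) {
        cur++;
        n = new_node('+', n, term());
    }
    return n;
}

node *parse(const char *text) { cur = text; return expr(); }

/* Append the tree to out in prefix form; out must be a large enough,
 * NUL-terminated buffer. */
void print_tree(const node *n, char *out)
{
    if (n->op == 0) { strcat(out, n->name); return; }
    sprintf(out + strlen(out), "(%c ", n->op);
    print_tree(n->left, out);
    strcat(out, " ");
    print_tree(n->right, out);
    strcat(out, ")");
}
```

For the input start + rate * 60 the multiplication ends up as the right
child of the addition, matching the grouping described for the parse
tree.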


3.4.3 Semantic analysis


The semantic analysis phase checks the source program for semantic
errors and gathers type information for the code generation phase. It
uses the hierarchical structure generated in the syntax analysis phase
to identify the operators and operands of expressions and statements.
This checking ensures that certain kinds of programming errors will
be detected and reported. Examples of semantic checks are:
• Type checks: The compiler should report an error if an operator
is applied to an incompatible operand, for example if an integer
variable is added to a function. It can also check that
parameters to functions are correct in type and number.
• Flow-of-control checks: Statements that cause the flow of control
to leave a construct must have some place to which to transfer
the flow of control. For example, a break statement in C causes
the flow of control to leave the enclosing while, for or switch
statement. If break is used outside of one of those, an error is
generated.
• Uniqueness checks: Sometimes an object can only be defined
once. For example, the case labels in a switch statement in C
must be unique, and variables with the same name in the same
scope are not permitted.
• Name related checks: Sometimes the same name must appear at
multiple locations. For example, in Ada a loop or block may
have a name that appears at the beginning and at the end of the
construct.
Depending on the language, many other kinds of checks may be needed.
In C, for example, functions and variables must be declared before
they are used, something that is not necessary in some languages.
Type checking does not always have to result in an error. For
example, a type mismatch can sometimes be resolved by converting
the operand. If a in the statement
a = a * 2;
is a floating point number, the integer 2 must be converted to a
floating point number before the multiplication can take place. This is
accomplished by inserting a new node in the syntax tree that explicitly
converts an integer to a floating point number.
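The insertion of a conversion node can be sketched as follows. The node
and type names are hypothetical, only the two types int and float are
modelled, and only multiplication is checked; a real compiler would
apply the full conversion rules of the language.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { TYPE_INT, TYPE_FLOAT } type_kind;
typedef enum { N_LEAF, N_MUL, N_INT_TO_FLOAT } node_kind;

typedef struct tnode {
    node_kind kind;
    type_kind type;              /* type of the value this node produces */
    struct tnode *left, *right;
} tnode;

tnode *make_node(node_kind kind, type_kind type, tnode *l, tnode *r)
{
    tnode *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = kind; n->type = type; n->left = l; n->right = r;
    return n;
}

/* Wrap an int operand in an explicit conversion node. */
static tnode *to_float(tnode *n)
{
    if (n->type == TYPE_FLOAT)
        return n;
    return make_node(N_INT_TO_FLOAT, TYPE_FLOAT, n, NULL);
}

/* Type-check a multiplication and insert conversions where needed. */
tnode *check_mul(tnode *lhs, tnode *rhs)
{
    type_kind result = TYPE_INT;
    if (lhs->type == TYPE_FLOAT || rhs->type == TYPE_FLOAT) {
        result = TYPE_FLOAT;
        lhs = to_float(lhs);
        rhs = to_float(rhs);
    }
    return make_node(N_MUL, result, lhs, rhs);
}
```

For a * 2 with a floating point a, the resulting multiplication node has
the original leaf for a on the left and a conversion node wrapping the
constant 2 on the right.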

Since programming languages and the semantic checks they need are so
different, there is no systematic way to perform the semantic checks.
They are usually done either by traversing the tree and examining the
nodes, or during the syntax analysis phase.

3.5 Synthesis
The synthesis consists of three phases: intermediate code generation,
code optimization and code generation. It is responsible for
transforming the source, which is now in the form of a syntax tree,
into the output language.

3.5.1 Intermediate code generation


After the syntax and semantic analysis some compilers generate a
machine independent intermediate form of the source program.
Although the source program can be translated directly to the target
language from the syntax tree, there are some benefits of using an
intermediate form:
• A machine independent optimizer can be used on the intermediate
representation.
• Retargeting is made easier. Creating a compiler for a different
machine can be done by replacing a smaller part of the compiler
than would have otherwise been necessary.
The intermediate representation should have two important
properties: it should be easy to generate and it should be easy to
transform into the target program. A common way to achieve this is to
use so-called three-address code. It is very similar to an assembly
language where each memory location can be used as a register. The
code consists of a sequence of instructions, each of which can have
at most three operands. For example, the assignment statement from
Figure 3.2 might look like this:
temp1 = rate * 60
temp2 = temp1 + start
result = temp2
There are also statements for conditional and unconditional jumps,
procedure calls, return statements, indexed assignment to be used on
arrays, and address and pointer assignments.
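Generating three-address code from a syntax tree amounts to a
post-order walk that hands out a new temporary for each operator. The C
sketch below illustrates this for expressions with + and * only. The
names expr, gen and gen_assign are hypothetical, leaf names are assumed
to be shorter than 16 characters, and the output is appended to a
caller-supplied string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct expr {
    char op;                       /* '+' or '*', or 0 for a leaf */
    const char *name;              /* leaf: identifier or constant text */
    const struct expr *left, *right;
} expr;

static int temp_count;             /* number of temporaries handed out */

/* Emit code for e into out; write the name holding its value to place. */
static void gen(const expr *e, char *out, char *place, size_t place_size)
{
    char l[16], r[16];
    if (e->op == 0) {              /* leaves need no code, just their name */
        snprintf(place, place_size, "%s", e->name);
        return;
    }
    gen(e->left, out, l, sizeof l);
    gen(e->right, out, r, sizeof r);
    snprintf(place, place_size, "temp%d", ++temp_count);
    sprintf(out + strlen(out), "%s = %s %c %s\n", place, l, e->op, r);
}

/* Translate "target = e" into three-address statements appended to out. */
void gen_assign(const char *target, const expr *e, char *out)
{
    char place[16];
    assert(target != NULL && e != NULL && out != NULL);
    temp_count = 0;
    gen(e, out, place, sizeof place);
    sprintf(out + strlen(out), "%s = %s\n", target, place);
}
```

Applied to the tree for result = start + rate * 60, the sketch produces
the same three statements as the example, except that the operands of
the addition appear in source order.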
The instruction set of a three-address code must be large enough to
implement the operations in the source language, but a smaller
instruction set is easier to implement and retarget. However, if it is
too small the intermediate code generator can be forced to generate
long sequences of statements for some source language operations. It
will then be more difficult for the optimizer and the code generator to
produce good code.

3.5.2 Code optimization


The code optimizer attempts to improve the intermediate code so that
faster running machine code is generated. It can sometimes also be of
interest to make the code smaller. For DSP processors code with lower
power consumption is sometimes preferred.
There are two types of optimizations: machine independent and machine
dependent. Machine independent optimizations are typically done on the
intermediate form and do not consider any details of the target
architecture when making optimization decisions. They are often very
general in nature. Machine dependent optimizations can be done both on
the intermediate form and on the generated code. These optimizations
consider the target architecture specifically and use special
instructions such as hardware loops.
There are a number of common optimization techniques.

Constant propagation
Constant propagation is simply the replacement of variable references
with constant references when possible. For example, the statements
a = 3;
function_call(a + 42);
becomes
function_call(3 + 42);

Constant folding
Expressions with constant operands can be calculated at compile time.
The example above would be transformed to
function_call(45);
Programmers usually do not write expressions such as 3+42 directly,
but these expressions are quite common after macro expansion and
other optimizations such as constant propagation.
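Constant folding can be sketched as a bottom-up walk over an expression
tree. In the C example below the node layout and the function name fold
are hypothetical, only + and * are handled, and overflow is ignored; a
real compiler must fold with the arithmetic of the target machine.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cexpr {
    char op;                 /* '+' or '*', 'c' for a constant, 'v' for a variable */
    int value;               /* valid when op == 'c' */
    struct cexpr *left, *right;
} cexpr;

/* Fold constant subexpressions in place, bottom-up. */
void fold(cexpr *e)
{
    assert(e != NULL);
    if (e->op == 'c' || e->op == 'v')
        return;
    fold(e->left);
    fold(e->right);
    if (e->left->op == 'c' && e->right->op == 'c') {
        int l = e->left->value, r = e->right->value;
        e->value = (e->op == '+') ? l + r : l * r;
        e->op = 'c';         /* the operator node becomes a constant leaf */
        e->left = e->right = NULL;
    }
}
```

Folding the tree for 3 + 42 turns the addition node into the constant
45, while a node with a variable operand is left unchanged.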

Common subexpression elimination


A common subexpression, or CSE, is created when two or more
expressions compute the same value. The expression is calculated
once to a temporary variable that is used instead of the CSE. For
example, the statement
array1[i + 1] = array2[i + 1];
will be transformed to
temp1 = i + 1;
array1[temp1] = array2[temp1];

Dead code elimination


Code that is never reached or that does not affect the program can be
eliminated. For example, this code fragment
int global;
void foo(void){
    int k = 1;
    global = 1;
    global = 2;
}
is transformed into the following:
int global;
void foo(void){
    global = 2;
}

Expression simplification
Some expressions can be simplified by replacing them with a more
efficient expression. For example, i+0 will be replaced by i, i*0 and
i-i by 0, and so on.

Code motion
Expressions in a loop that give the same result each time the loop is
iterated can be moved outside the loop and calculated only once
before entering the loop.
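The effect of code motion can be shown at source level with two
equivalent C functions. The function names are hypothetical, and an
optimizer would normally perform this transformation on the
intermediate code rather than on the C source.

```c
#include <assert.h>
#include <stddef.h>

/* Before: scale * offset is recomputed on every iteration. */
void fill_before(int *dst, size_t n, int scale, int offset)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = (int)i + scale * offset;
}

/* After: the loop-invariant product is computed once before the loop. */
void fill_after(int *dst, size_t n, int scale, int offset)
{
    int invariant = scale * offset;
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = (int)i + invariant;
}
```

Both functions store the same values; the second one performs a single
multiplication regardless of the number of iterations.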

Strength reduction
Strength reduction replaces expensive instructions with less expensive
instructions. For instance, a popular strength reduction is to replace a
multiplication by a constant power of two with a left shift.
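The power-of-two case can be sketched in C as follows. The helper names
are hypothetical, and unsigned arithmetic is used so that the shift and
the multiplication give identical results even when the value wraps
around.

```c
#include <assert.h>
#include <limits.h>

/* Return k if c == 2^k for some k >= 0, otherwise -1. */
int log2_if_power_of_two(unsigned c)
{
    int k = 0;
    if (c == 0 || (c & (c - 1)) != 0)
        return -1;
    while ((c >>= 1) != 0)
        k++;
    return k;
}

/* Multiply x by the constant c, using a shift when c is a power of two. */
unsigned mul_by_constant(unsigned x, unsigned c)
{
    int k = log2_if_power_of_two(c);
    if (k < 0)
        return x * c;                       /* general case: keep the multiply */
    assert(k < (int)(sizeof x * CHAR_BIT)); /* shift count must stay in range */
    return x << k;
}
```

In a compiler the test on c is made at compile time, so the generated
code contains either the shift or the multiplication, never both.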

3.5.3 Code generation


The final phase of the compiler is the generation of target code. The
target code is usually relocatable machine code or assembly code.
Memory locations are selected for each of the variables used in the
source program and the intermediate instructions are translated into
one or more assembly level instructions that perform the same task. A
vital part of code generation is the assignment of registers to variables,
since that can greatly affect the performance of the generated code.
Using the example from the previous sections, the generated code
might look like this:
MOVE rate, R1
MUL #60, R1
MOVE start, R2
ADD R1, R2
MOVE R2, result

3.6 Symbol table


An essential part of the compiler is to keep track of the identifiers used
in the source program and to collect information about various
attributes of each identifier. These attributes contain information
about the name and type of the identifier, its size, scope and so on. For
functions and procedures they also contain the number and types of the
arguments and the return type. More complex data types, like arrays
and structures, are handled in a similar way.
The symbol table is a data structure that contains a record for each
identifier and fields for the attributes of the identifier. The data structure
makes it possible to search for identifiers, to add new identifiers,
and to add or retrieve their attributes.
When an identifier is found in the lexical analysis its name is added to
the symbol table if it is not already there. The index in the symbol table
is then passed along in the token and that index is used to refer to the
identifier from there on. In the later phases of compilation, information
about the type and other attributes is added and used in various
ways.
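
The behaviour described above can be sketched as a small array-based table. This is a minimal illustration only: the record layout, the fixed sizes and the names sym_install() and sym_set_type() are assumptions made for this example, and a real compiler would normally use a hash table with scope handling.

```c
#include <string.h>

#define MAX_SYMBOLS 64
#define MAX_NAME    32

/* One record per identifier. The attribute fields are filled in by
   later phases of the compilation. */
struct symbol {
    char name[MAX_NAME];
    int  type;   /* hypothetical type code, 0 = not yet known */
    int  size;   /* size in words, 0 = not yet known */
};

static struct symbol table[MAX_SYMBOLS];
static int nsymbols;

/* Return the index of name, adding a new record if it is not already
   in the table. Returns -1 if the table is full or the name is too long. */
int sym_install(const char *name)
{
    for (int i = 0; i < nsymbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    if (nsymbols == MAX_SYMBOLS || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(table[nsymbols].name, name);
    table[nsymbols].type = 0;
    table[nsymbols].size = 0;
    return nsymbols++;
}

/* Later phases add attributes through the index carried in the token. */
void sym_set_type(int index, int type, int size)
{
    table[index].type = type;
    table[index].size = size;
}
```

The lexical analyzer would call sym_install() for every identifier it finds and store the returned index in the token.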

3.7 Error handler


It is important that the compiler can detect errors and deal with them
in a reasonable way. When an error is encountered the compiler emits
an error message containing the location of the error in the source
program and the type of error. It then tries to continue with the
compilation. It can sometimes be difficult for the
compiler to know what to do to continue when an error has been
detected. One way is to, for example, skip all input until the next
semicolon to get to the next statement.
As soon as the error count is greater than zero a flag is set and the
compiler will stop execution after the semantic analyzer phase. There
is no point in generating the target program when there are errors
in the source program.
The compiler can also detect minor errors that will not stop the compi-
lation and emit warnings about these errors instead.
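
The recovery strategy of skipping to the next semicolon can be sketched as follows. The function name and the use of a plain character string as input are assumptions for this example; a real compiler would skip tokens rather than characters.

```c
/* Skip input up to and including the next semicolon. Returns a pointer
   to the first character after the semicolon, or to the terminating
   '\0' if no semicolon is found. */
const char *skip_to_semicolon(const char *p)
{
    while (*p != '\0' && *p != ';')
        p++;
    if (*p == ';')
        p++;
    return p;
}
```

After an error in one statement, parsing can then resume at the returned position, so that errors in later statements are also reported in the same compilation.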

3.8 Front and back end


Often the phases are collected into a front end and a back end. The
front end consists of the phases that depend on source language and
are largely independent of the target machine. These normally include
lexical analysis, syntax analysis, semantic analysis and the generation
of intermediate code. The machine independent optimizations can
also be done in the front end. The creation of the symbol table and
most of the error handling is also done in the front end.
The back end includes the parts of the compiler that are dependent on
the target machine. These parts usually do not depend on the
source language, only on the intermediate code. The back end therefore
consists of the code optimizer and the code generator. It also uses the
symbol table and error handler.
This division of the design makes it easy to take the front end of a
compiler and combine it with a new back end to produce a compiler
for the same source language to a different target machine. This is
called retargeting.


3.9 Environment
In addition to the compiler, several other programs are required if an
executable program is to be created. See Figure 3.4.

Figure 3.4: A compiler system

3.9.1 Preprocessor
The preprocessor produces the input to the compiler. It often performs
different kinds of text processing, for example macro processing and
file inclusion. In C, for example, all lines beginning with a # are
instructions to the preprocessor. #define FAIL -1 causes all occurrences
of FAIL to be replaced by -1, and #include <file.h> will
include the file file.h in the source program. In C the preprocessor
also removes the comments from the source program.
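
A short sketch of what this means for a piece of source code. The function check() is made up for this example.

```c
#include <stdio.h>   /* file inclusion: the contents of stdio.h are inserted here */
#define FAIL -1      /* macro definition */

int check(int value)
{
    /* After preprocessing, the next line reads
       return value < 0 ? -1 : 0;
       and this comment has been removed. */
    return value < 0 ? FAIL : 0;
}
```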

3.9.2 Assembler
Some compilers produce assembly code, and that must be passed to
an assembler for further processing. The assembler works much like a
compiler and translates the assembly source to relocatable machine
code.


3.9.3 Linker and loader


The linker makes it possible to combine several relocatable machine
code files into a single program. The different machine code files can
be the result from several compilations, and some may be library files.
The linker resolves external references in the input files so that data
and functions from the different files can be used by each other.
When all the external references are resolved the loader takes the relo-
catable machine code and alters all relocatable addresses to real
addresses and places the code and the data in its proper locations and
creates the output file.

3.10 Compiler tools


Since most compilers use the same structure and function in the same
way, specialized tools have been developed that help implement the
various components of the compiler. These tools use specialized
languages for specifying and implementing the components, and many
use algorithms that are quite sophisticated. The following is a list of
some compiler construction tools:
• Scanner generators: These automatically generate lexical
analyzers, usually from a specification based on regular
expressions. Examples include flex and lex.
• Parser generators: These produce syntax analyzers from
specifications that are normally based on a context free grammar. Before
the parser generators appeared, the parser was the most time
consuming part to implement. Now it is considered to be one of
the easiest to implement thanks to the parser generators.
Examples of parser generators are yacc and bison.
• Syntax-directed translation engines: These produce routines
that walk the parse tree and generate intermediate code.
• Automatic code generators: These tools generate routines that
translate the intermediate language into the machine language
for the target machine with the help of a collection of rules. The
basic technique is template matching. The intermediate code
statements are replaced by templates that represent sequences of
machine instructions.

4 LCC

This chapter describes how the compiler LCC works. Most of this
information is collected from [2] and [3].

4.1 Introduction
LCC is a free ANSI C compiler that is designed to be retargetable. The
source code is available for download from the internet [6] under a
license [7] that imposes almost no restrictions at all.
This compiler was chosen because it is very small and simple. It is
designed in a way so that it is easy to retarget it to generate code for
other processors. There is also excellent documentation of LCC in the
form of a book that describes every detail of the implementation of the
entire compiler. It is called “A Retargetable C Compiler: Design and
Implementation” and it was used extensively during this thesis. The
thesis could probably not have been completed without the book.
Another compiler candidate was the GNU C Compiler, or GCC, from
the GNU Compiler Collection, which is an open source C compiler.
GCC would probably have generated better and faster code, but it
was not chosen because it is much bigger and more complex than
LCC. Also, the same kind of documentation that was available for
LCC was not available for GCC.


4.2 C
C is a general purpose programming language that was developed
in the early 1970s by Dennis Ritchie at Bell Labs, and it is still
widely used today. It is a relatively low level language where the basic
data types in the language correspond to real data types found in the
hardware. The language provides no operations to deal directly with
composite data types, such as strings, arrays, lists and so on. There are
no input/output facilities and no file access facilities. All these higher
level operations must be provided by library functions. These, and
several other limitations, have some advantages. They make the language
small and relatively easy to learn. They also mean that compilers for
the language will be smaller and easier to construct.
C has become very popular and there exist compilers for many
different processors and operating systems. Although it is far from an ideal
language for DSP processors, it is still extensively used for them. That
is probably because it is such a simple and low level language, which
makes it easier to construct a compiler that generates efficient code for
the DSP processors.
Over the years the C programming language has evolved and been
standardized a couple of times. The first version, called K&R C (from
Kernighan and Ritchie), is derived from the reference manual in the
first edition of the book “The C programming language” by Brian
Kernighan and Dennis Ritchie. In 1989 ANSI standardized the lan-
guage, and it is commonly referred to as ANSI C or C89. ISO has
released two standards for C, and they are called ISO C90 and ISO
C99.

4.3 The compiler


The following sections will describe the different phases of the com-
piler and how they work.

4.3.1 Lexical analysis


The lexical analyzer reads source text and produces tokens. For each
token the lexical analyzer returns its token code and zero or more
associated values. The token codes for single character tokens, for
example = and +, are the characters themselves. For tokens that can
consist of one or more characters, for example identifiers and con-
stants, defined constants are used. For example, the expression
ptr = 42 results in the following token stream


ID    "ptr"   symbol table entry for ptr
'='
ICON  "42"    symbol table entry for 42
The token code for the operator = is the numeric value of =, and it
does not have any associated values. The token code for the identifier
ptr is the value of the constant ID, and the associated values are the
identifier string itself and a pointer to the symbol table entry for the
identifier. The integer constant 42 returns the token ICON and the
associated values "42" and a pointer to the symbol table.
Keywords, such as for and switch, have their own token codes to
distinguish them from identifiers.
The lexical analyzer also tracks the source coordinates for each token.
These coordinates contain the file name, line number and position on
the line of the first character of the token. The coordinates are used to
locate errors when they are found.

Recognizing tokens
The lexical analyzer in LCC is written by hand; it is not generated by a
tool. This is due to the fact that the lexical structure in C is simple and
that generated analyzers tend to be large and slow.
The lexical analyzer is used by calling the function gettok(), which
returns the next token. The gettok() function recognizes a token by
using a switch statement on the first character in the token to classify
it. It then consumes the following characters that make up the token.
The following is a small sample of the code
...
switch (*rcp++) {
...
case '<':
    if (*rcp == '=') return cp++, LEQ;
    if (*rcp == '<') return cp++, LSHIFT;
    return '<';
...
rcp and cp are pointers to the next character in the input file. The
code for identifying most of the tokens looks very similar to the exam-
ple, but identifying numbers, strings and identifiers is a bit harder.
However, it works in the same way by looking ahead in the input
stream.
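
To make the technique concrete, the following is a much reduced, self-contained lexical analyzer in the same style. It is not LCC's code: the function name next_token(), the token code values and the use of a string as input are assumptions for this sketch, and it only knows identifiers, decimal constants, the operators <= and <<, and single character tokens.

```c
#include <ctype.h>

/* Token codes for multi-character tokens. They lie above the character
   range so that single character tokens can be their own codes. The
   values are arbitrary in this sketch. */
enum { TOK_EOF = 0, TOK_ID = 256, TOK_ICON, TOK_LEQ, TOK_LSHIFT };

static const char *cp;   /* next character to read */

int next_token(void)
{
    while (*cp == ' ' || *cp == '\t' || *cp == '\n')
        cp++;
    if (*cp == '\0')
        return TOK_EOF;
    if (isalpha((unsigned char)*cp) || *cp == '_') {
        /* Identifier: consume letters, digits and underscores. */
        while (isalnum((unsigned char)*cp) || *cp == '_')
            cp++;
        return TOK_ID;
    }
    if (isdigit((unsigned char)*cp)) {
        /* Integer constant: consume all following digits. */
        while (isdigit((unsigned char)*cp))
            cp++;
        return TOK_ICON;
    }
    switch (*cp++) {
    case '<':
        /* Look ahead one character to tell <, <= and << apart. */
        if (*cp == '=') { cp++; return TOK_LEQ; }
        if (*cp == '<') { cp++; return TOK_LSHIFT; }
        return '<';
    default:
        /* Any other character is its own token code. */
        return (unsigned char)cp[-1];
    }
}
```

With cp pointing at the string "ptr = 42", three calls return TOK_ID, '=' and TOK_ICON, which corresponds to the token stream shown earlier without the associated values.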


4.3.2 Syntax analysis


The syntax analyzer, or parser, uses the stream of tokens from the lexi-
cal analyzer and confirms that it follows the syntax of the language. It
also builds an internal representation of the input that is used by the
rest of the compiler.
The parser for LCC is also written by hand. The reason for this is the
same as for the lexical analyzer: C is a simple language and the code
generated by tools is slow and big.

Grammar
LCC uses a context free grammar written in EBNF form to define the
rules for the parser. The parser is constructed by writing a parsing
function for each nonterminal. The idea is to write a function X() for
each nonterminal X, using the productions for X as a guide to writing
the code for X(). For example, the parsing function for the following
production
expr → term { + term }
will look like
void expr(void) {
    term();
    while (t == '+') {
        t = gettok();
        term();
    }
}
The { and } in the production are an EBNF notation that means “zero or
more”.
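
A complete, runnable variant of this parsing function is sketched below. To keep it self-contained it makes several assumptions that are not part of LCC: a term is a single decimal digit, tokens are single characters read from a string, and the parser computes the value of the sum instead of building a tree. The names advance() and parse_sum() are made up for the example.

```c
#include <ctype.h>

static const char *input;  /* remaining input */
static int t;              /* current token: a character, or 0 at end of input */
static int error;          /* set when the input does not follow the grammar */

/* Read the next token, skipping spaces. */
static void advance(void)
{
    while (*input == ' ')
        input++;
    t = *input ? (unsigned char)*input++ : 0;
}

/* term -> digit */
static int term(void)
{
    if (!isdigit(t)) {
        error = 1;
        return 0;
    }
    int value = t - '0';
    advance();
    return value;
}

/* expr -> term { + term } */
static int expr(void)
{
    int value = term();
    while (t == '+') {
        advance();
        value += term();
    }
    return value;
}

/* Parse s as an expr. Returns the sum, or -1 on a syntax error. */
int parse_sum(const char *s)
{
    input = s;
    error = 0;
    advance();
    int value = expr();
    if (t != 0)          /* input left over after the expression */
        error = 1;
    return error ? -1 : value;
}
```

The while loop in expr() corresponds directly to the { + term } part of the production.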

Abstract syntax tree


When parsing the program the compiler also generates an intermedi-
ate representation of the program. This is done in the form of abstract
syntax trees, or simply trees. Abstract syntax trees are parse trees
without the nodes for nonterminals and nodes for useless terminals.
For example, the tree for the expression (a+b)*c can be seen in Fig-
ure 4.1.


Figure 4.1: Tree for the expression (a+b)*c

There are no nodes for the nonterminals used when parsing this
expression, and there are no nodes for the tokens ( and ). The tokens
+ and * are contained in the nodes ADD+I and MUL+I. The nodes with
the operator ADDRG+P compute the address of the operand and
INDIR+I fetches integers at the address given by their operand.
The names of the nodes are constructed from an operator and a type
suffix that denotes the type that the operator operates on. For example,
the node ADD+I states that the node uses integer addition. Table 4.1
lists the different type suffixes available.

Type suffix Meaning


F Floating point
I Integer
U Unsigned
P Pointer
V Void
B Structure
Table 4.1: Type suffixes

The trees can contain operators that do not appear in the source pro-
gram. For example, the INDIR+I node fetches integers at an address,
but there is no fetch operator in C. A list of operators that can appear
in the trees is seen in Table 4.2. In addition to these, there are six more
operators that are used in trees listed in Table 4.3.


Operator Type suffix Operation


ADDRF ...P.. Address of a parameter
ADDRG ...P.. Address of a global
ADDRL ...P.. Address of a local
CNST FIUP.. Constant
BCOM .IU... Bitwise complement
CVF FI.... Convert from float
CVI FIU... Convert from signed integer
CVP ..U... Convert from pointer
CVU .IUP.. Convert from unsigned integer
INDIR FIUP.B Fetch
NEG FI.... Negation
ADD FIUP.. Addition
BAND .IU... Bitwise AND
BOR .IU... Bitwise inclusive OR
BXOR .IU... Bitwise exclusive OR
DIV FIU... Division
LSH .IU... Left shift
MOD .IU... Modulus
MUL FIU... Multiplication
RSH .IU... Right shift
SUB FIUP.. Subtraction
ASGN FIUP.B Assignment
EQ FIU... Jump if equal
GE FIU... Jump if greater than or equal
GT FIU... Jump if greater than
LE FIU... Jump if less than or equal
LT FIU... Jump if less than
NE FIU... Jump if not equal
ARG FIUP.B Argument
CALL FIUPVB Function call
RET FIUPV. Function return
JUMP ....V. Unconditional jump
LABEL ....V. Label definition
Table 4.2: Node operators


Operator Operation
AND Logical AND
OR Logical OR
NOT Logical NOT
COND Conditional expression
RIGHT Composition
FIELD Bit-field access
Table 4.3: Tree operators

4.3.3 Semantic analysis


The semantic analysis of the source program is done while the parser
recognizes the input, so there is no explicit phase in the
compilation where this is done. Each parsing function detects and handles
the semantic errors according to the semantics of each construct.
For example, when a type conversion is needed an extra convert node
is inserted in the abstract syntax tree, and the expression x = 6
generates an error if x is not defined. Many other semantic
checks are also performed.

4.3.4 Intermediate code generation


During this stage the compiler produces directed acyclic graphs, or
dags, from the trees. The compiler also eliminates common subexpres-
sions. For example, in the expression (a+b)+b*(a+b) the value of
a+b is computed twice. The dag for this expression can be seen in Fig-
ure 4.2. The multiplication node (MULI4) uses the already computed
values for a+b and b instead of computing them again.
The names of the nodes in dags are made up of a generic operator, a
type suffix and a size indicator. The + is omitted to distinguish dags
from trees. For example, ADDI4 denotes a 4-byte (32-bit) integer addi-
tion.

Figure 4.2: The dag for (a+b)+b*(a+b)
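
The sharing of common subexpressions can be sketched with a node constructor that first searches for an existing node with the same operator and operands. This is a simplified illustration and not LCC's implementation: the node layout, the fixed size node pool and the name dag_node() are assumptions, variables are represented directly as leaves instead of ADDRG and INDIR nodes, and assignments that would invalidate a shared value are not handled.

```c
#include <stddef.h>

enum op { LEAF, ADD, MUL };

struct node {
    enum op op;
    char name;                   /* variable name, used by LEAF nodes only */
    const struct node *kids[2];  /* operands, NULL for LEAF nodes */
};

static struct node nodes[32];
static int nnodes;

/* Return an existing node with the same operator and operands, or
   create a new one. Returns NULL if the node pool is full. Comparing
   operand pointers is enough because the operands are already shared. */
const struct node *dag_node(enum op op, char name,
                            const struct node *l, const struct node *r)
{
    for (int i = 0; i < nnodes; i++)
        if (nodes[i].op == op && nodes[i].name == name &&
            nodes[i].kids[0] == l && nodes[i].kids[1] == r)
            return &nodes[i];
    if (nnodes == 32)
        return NULL;
    nodes[nnodes] = (struct node){ op, name, { l, r } };
    return &nodes[nnodes++];
}
```

Building (a+b)+b*(a+b) bottom-up with this function requests the node for a+b twice, but the second request returns the node created by the first, so the finished dag has five nodes instead of the seven nodes and leaves a tree would need.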


Trees contain operators that are not allowed for dags. The available
operators for dags are seen in Table 4.2. When the dags are con-
structed the operators that are not allowed are replaced by other oper-
ators instead. For example, the operator AND is replaced by a
comparison and jumps and labels.
Before the dags are passed to the back end they may be converted to
trees again. Some back ends want trees and some want dags. All
back ends that are included in the LCC distribution want trees. When
the conversion is done, nodes that are referenced multiple times
because of the common subexpression optimization are changed. The
result of the common subexpression is stored in a temporary variable
that is used instead. The resulting tree still uses the same data
structures and representation as the dags, though.

4.3.5 Back end


LCC's back end is divided into a machine independent part and a
machine dependent part. The front end communicates with the back
end by calling a number of interface functions.
In a C program, all program code is contained in functions. To gener-
ate code for a function the front end calls the interface function
function(). function() uses two functions to generate code,
gencode() and emitcode(). gencode() selects and orders
instructions and allocates registers. emitcode() emits the assembler
code for the function and also removes unnecessary register to register
copies. These register to register copies are left over from earlier opti-
mizations and it is easier to remove them here.

Selecting instructions
The instruction selection is done in the function gencode(). The
instruction selectors used by LCC are generated automatically from a
specification by a program called lburg. lburg is a code generator
generator and it emits a tree parser written in C.
The core of an lburg specification is a tree grammar, which is a list of
rules where each rule has a nonterminal on the left and a pattern of
terminals and nonterminals on the right. For example, the rule
addr: ADDI4(reg, con)
matches a tree at an ADDI4 node if the node’s first child recursively
matches the nonterminal reg and the second child recursively matches
the nonterminal con. In Figure 4.3 the tree with the selected rules for
the statement i = c + 2 can be seen.

Figure 4.3: Tree with rules

Tree grammars are usually ambiguous, which means that there can be
more than one selection of instructions that do the same thing. For
example, increasing a register by one can be done by adding one to the
register directly or by loading one into another register and adding
the two registers. The cheapest implementation is preferred, so a cost
is assigned to each rule and the parse tree with the lowest total cost is
selected.

Specifications
lburg specifications use the following format
%{
configuration
%}
declarations
%%
rules
%%
C code
The configuration part is C code and is optional. It is copied
directly into the generated file. The same applies to the C code part.
The declarations part contains the start symbol and a list of all the
terminals. The rules part contains tree patterns. Each rule has an
assembler code template, which is a quoted string that specifies what
to emit when the rule is used. Rules end with an optional cost. The fol-
lowing is an example of a simple specification


%start stmt
%term ADDI4=309 ADDRLP1=295 ASGNI4=53
%term CNSTI4=21 INDIRI4=67
%%
con:  CNSTI4              "1"
addr: ADDRLP1             "2"
addr: ADDI4(reg, con)     "3"
rc:   con                 "4"
rc:   reg                 "5"
reg:  ADDI4(reg, rc)      "6" 1
reg:  addr                "7" 1
stmt: ASGNI4(addr, reg)   "8" 1
In this example the assembler code templates are simply rule
numbers. Rule 1 states that con matches constants. Rules 2 and 3 state that
addr matches trees that can be computed by address calculations, like
an ADDRLP1 or the sum of a register and a constant. Rules 4 and 5 state
that rc matches a constant or a reg, and reg matches any tree that can
be computed into a register. Rule 6 describes an add instruction. The first operand must be
in a register and the second operand can be a register or a constant.
The result is stored in a register. Rule 7 describes an instruction that
loads an address into a register. Rule 8 describes an instruction that
stores a register at an address.
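
How the costs lead to a choice of rules can be sketched with a hand-written labeller for the eight rules above. This is only an illustration of the principle and not the code that lburg generates: the data structures and function names are assumptions, rules without an explicit cost are taken to cost zero, and the INDIRI4 terminal is left out because no rule in the example uses it.

```c
#include <limits.h>
#include <stddef.h>

enum term { CNSTI4, ADDRLP1, ADDI4, ASGNI4 };
enum nonterm { CON, ADDR, RC, REG, STMT, NNONTERM };

#define INF (INT_MAX / 4)   /* "cannot be derived"; sums of two stay in range */

struct tree {
    enum term op;
    struct tree *kids[2];
    int cost[NNONTERM];  /* lowest cost of deriving each nonterminal here */
    int rule[NNONTERM];  /* number of the rule that gives that cost */
};

/* Remember rule r for nonterminal nt if it is cheaper than what is known. */
static void record(struct tree *t, enum nonterm nt, int cost, int r)
{
    if (cost < t->cost[nt]) {
        t->cost[nt] = cost;
        t->rule[nt] = r;
    }
}

/* Label the tree bottom-up with the cheapest rule for each nonterminal. */
void label(struct tree *t)
{
    struct tree *l = t->kids[0], *r = t->kids[1];
    for (int i = 0; i < NNONTERM; i++) {
        t->cost[i] = INF;
        t->rule[i] = 0;
    }
    if (l) label(l);
    if (r) label(r);
    switch (t->op) {
    case CNSTI4:  record(t, CON, 0, 1);  break;
    case ADDRLP1: record(t, ADDR, 0, 2); break;
    case ADDI4:
        record(t, ADDR, l->cost[REG] + r->cost[CON], 3);
        record(t, REG, l->cost[REG] + r->cost[RC] + 1, 6);
        break;
    case ASGNI4:
        record(t, STMT, l->cost[ADDR] + r->cost[REG] + 1, 8);
        break;
    }
    /* Chain rules: reg: addr (7, cost 1), rc: con (4), rc: reg (5). */
    record(t, REG, t->cost[ADDR] + 1, 7);
    record(t, RC, t->cost[CON], 4);
    record(t, RC, t->cost[REG], 5);
}
```

For the tree ASGNI4(ADDRLP1, ADDI4(ADDRLP1, CNSTI4)) the labeller finds a total cost of 3 for stmt: one for loading the address into a register (rule 7), one for the addition (rule 6) and one for the store (rule 8).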

The emitter
The emitter in the function emitcode() is what outputs the assem-
bler code from the assembler templates. Each rule has one assembler
template. If the template ends with a newline character, lburg
assumes that it is an instruction, otherwise it is assumed to be a piece
of an instruction.
When the emitter emits the template it treats some characters differ-
ently. %digit tells the emitter to emit the digit-th nonterminal from
the pattern. %c emits the nonterminal on the left side of the produc-
tion. For example, the rule
areg: ADDI4(reg, rc) "add %c,%0,%1"
might be emitted as
add a1,r1,#60
If the template begins with #, emit2() is called to emit the instruc-
tion. This is needed to deal with tricky features in some assemblers.
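
The expansion of a template can be sketched as a simple string substitution. The function below is not LCC's emitter: its name, its parameters and the idea of passing the already chosen operand names as strings are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Expand an assembler template into out, which has room for outsize
   characters. %c is replaced by lhs, the name chosen for the left side
   of the rule, and %0 .. %9 by the name of the corresponding
   nonterminal in the pattern. Output that does not fit is dropped. */
void expand_template(const char *tmpl, const char *lhs,
                     const char *kids[], int nkids,
                     char *out, size_t outsize)
{
    size_t n = 0;
    if (outsize == 0)
        return;
    for (; *tmpl != '\0'; tmpl++) {
        char one[2] = { *tmpl, '\0' };
        const char *piece = one;          /* default: copy the character */
        if (tmpl[0] == '%' && tmpl[1] == 'c') {
            piece = lhs;
            tmpl++;
        } else if (tmpl[0] == '%' && tmpl[1] >= '0' && tmpl[1] <= '9' &&
                   tmpl[1] - '0' < nkids) {
            piece = kids[tmpl[1] - '0'];
            tmpl++;
        }
        size_t len = strlen(piece);
        if (n + len >= outsize)
            break;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
}
```

Called with the template "add %c,%0,%1", the left side name "a1" and the operand names "r1" and "#60", the function produces the line add a1,r1,#60 from the example above.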

5 Implementation

5.1 Introduction
The main goal of this thesis was the design and implementation of a
new back end to the LCC compiler for the DSP56002 processor. One
other goal was to maintain compatibility with Motorola’s C compiler,
so that the generated code would behave in the same way. This means
that the two compilers use the registers in the same way, use the
same memory layout, use the same calling convention, and so on. By
doing this, code generated by Motorola’s compiler can use code
compiled by this compiler, libraries for example, and vice versa.
LCC is designed so that retargeting should be as easy as possible, and
the included back ends only consist of about 1000 lines of code each.
This chapter will describe how the back end was constructed and why
it looks and behaves as it does.

5.2 The compiler


The DSP56002 digital signal processor is designed to execute DSP ori-
ented calculations as fast as possible. As a consequence, it has an
architecture that is somewhat unconventional for the C language.
Because of this there are characteristics of the compiler and the gener-
ated code that are a bit unusual and will be documented here. Since
this compiler should be compatible with Motorola’s compiler this sec-
tion is based on information from [5].


5.2.1 Data types and sizes


Because of the word orientation of the DSP56002, all data types are
aligned on word boundaries. One word is 24 bits wide.

Integer data types


The sizes and ranges of the integer data types are defined in Table 5.1.

Data type            Size (words)  Min value          Max value
char                 1             –8388608           8388607
unsigned char        1             0                  0xFFFFFF
short                1             –8388608           8388607
unsigned short       1             0                  0xFFFFFF
int                  1             –8388608           8388607
unsigned int         1             0                  0xFFFFFF
long                 2             –140737488355328   140737488355327
unsigned long        2             0                  0xFFFFFFFFFFFF
long long            2             –140737488355328   140737488355327
unsigned long long   2             0                  0xFFFFFFFFFFFF
Table 5.1: Integer data type sizes and ranges
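
The limits in the table follow directly from the word size, assuming two's complement representation. The helper functions below are written for this example only and are meant to run on an ordinary host computer with 64-bit integers, not on the DSP56002.

```c
#include <stdint.h>

/* Limits of a two's complement integer that is bits wide.
   Valid for 1 <= bits <= 62. */
int64_t signed_min(int bits)
{
    return -((int64_t)1 << (bits - 1));
}

int64_t signed_max(int bits)
{
    return ((int64_t)1 << (bits - 1)) - 1;
}

uint64_t unsigned_max(int bits)
{
    return ((uint64_t)1 << bits) - 1;
}
```

For one word (24 bits) this gives -2^23 = -8388608 and 2^23 - 1 = 8388607, and for two words (48 bits) it gives -2^47 = -140737488355328 and 2^47 - 1 = 140737488355327.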

Floating point types


The C data types float and double are implemented as fractional
numbers as used by the DSP56002 processor. The precision and range
can be seen in Table 5.2.
This is not consistent with the Motorola compiler, which uses single
precision floating point arithmetic for both float and double. However,
since the DSP56002 cannot do floating point arithmetic in hardware,
all of those operations are performed by calls to an external library.
The choice to implement floating point numbers as fractional numbers
was made because that is what the hardware supports. Emulating
“real” floating point numbers defeats the purpose of using a DSP
processor in the first place since it will then be slower than a normal
processor. It was also easy to implement since the only difference
between integers and fractional numbers is the way multiplication
and division are handled. There is one problem though. Since the
hardware does not support 48-bit multiplication and division, the
compiler uses the integer version of those operations for the double
data type. This pretty much makes this data type useless, but it is
implemented anyway.

Data type Precision (bits) Range


float 24 – 1.0 ≤ x < 1.0
double 48 – 1.0 ≤ x < 1.0
Table 5.2: Floating point data type precision and ranges
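
The difference between integer and fractional multiplication can be illustrated on a host computer. The sketch assumes that a 24-bit fractional number x is stored as the integer x · 2^23, which matches the range –1.0 ≤ x < 1.0 in the table. The function name and the saturation of the one result that does not fit are assumptions for this example and do not describe the generated code.

```c
#include <stdint.h>

#define FRAC_ONE (1 << 23)   /* the raw value 2^23 would represent 1.0 */

/* Multiply two 24-bit fractional numbers given as raw integers in the
   range -2^23 .. 2^23 - 1. The integer product is scaled by 2^46, so it
   is divided by 2^23 to get back to the fractional scale. */
int32_t mul_frac(int32_t a, int32_t b)
{
    int64_t p = ((int64_t)a * b) / FRAC_ONE;
    if (p > FRAC_ONE - 1)    /* only -1.0 * -1.0: the result 1.0 does not fit */
        p = FRAC_ONE - 1;
    return (int32_t)p;
}
```

For example, 0.5 is stored as 4194304. An integer multiplication of 0.5 by 0.5 would give 4194304 · 4194304 = 2^44, which is far outside the range, while mul_frac() gives 2097152, which is the representation of 0.25.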

Pointer types
All pointers are 16-bit. When computing addresses with integer arith-
metic only the least significant 16 bits are relevant. See Table 5.3.

Data type Size (words) Min value Max value


pointers 1 0 0xFFFF
Table 5.3: Pointer size and range

5.2.2 Register usage


The compiler uses all of the registers in the DSP56000 processor except
the M-registers. The register usage can be seen in Table 5.4.

Register          Usage
R0                Frame pointer (16-bit)
R6                Stack pointer (16-bit)
R1 – R5, R7       Address registers used for pointers and structures. (16-bit)
N0 – N7           Compiler temporary. Used when updating the R-registers. (16-bit)
M0 – M7           Unused. Must be kept as 0xFFFF. (16-bit)
A                 48-bit general purpose register and 48-bit function return value.
A1                24-bit general purpose register and 24-bit and 16-bit function return value.
B                 48-bit general purpose register.
B1                24-bit general purpose register.
X, Y              48-bit general purpose registers.
X0, X1, Y0, Y1    24-bit general purpose registers.
Table 5.4: Register usage

5.2.3 Memory usage


Due to the architecture of the DSP56002, program and data memory
are separate. The program resides in the P-memory and all data are
stored in the Y-memory. Figure 5.1 illustrates the default program and
data memory layout.

Figure 5.1: Program and data memory layout

The bottom of the program memory contains the interrupt table and it
is filled with jumps to the subroutine Fabort, except at the first posi-
tion that contains a jump to the subroutine F__start. This is because
the DSP processor starts execution here. The F__start function takes
care of initialization and calls the Fmain subroutine, which is the com-
piled main() function. The rest of the program memory is used to
store the compiled functions.
The data memory is split into three parts. The bottom part contains data
defined in the crt0 file. The next part is used for global and static data,
and the final part is used for the run-time stack and the heap.


5.2.4 Frame layout


An activation record, or frame, holds all the state information needed
for one invocation of a function. This includes local and temporary
variables, saved registers and return address. The stack stores one
frame for each active function. When a function is called it puts a new
copy of the frame on the stack, and when the function returns it
removes the frame. The frame pointer points into the currently active
frame. It is used to access data inside the frame. The stack pointer
points to the first free space on the stack, and the stack grows upwards.
Figure 5.2 shows the layout of a frame.

Figure 5.2: Frame layout

An example of what the stack looks like during a function call can be
seen in Figure 5.3. The code that is executed looks like this:
void main(void) {
func();
}
When the execution point reaches the call to func() there is a frame
on the stack for the function main(). A new frame is built and
pushed onto the stack and the frame pointer is updated. When the
function returns, its frame is removed from the stack and the frame
pointer is restored.

Figure 5.3: Frame example

5.2.5 Calling convention


Whenever a function is called, a strict calling convention is followed.
The calling sequence is divided into three parts: the caller sequence, the
callee sequence and the return sequence.

Caller sequence
The caller part of the calling sequence consists of:
1. Pushing the arguments onto the stack in reverse order.
2. Calling the function.
3. Adjusting the stack pointer.

Callee sequence
During the initial part of the calling sequence, the called function is
responsible for:
1. Saving the return address and the old frame pointer.
2. Updating the frame and stack pointers.
3. Saving the following registers if they are used by the function:
B0, B1, X0, X1, Y0, Y1, R1 – R5 and R7.

Return sequence
During the final part of the calling sequence, the called function is
responsible for:
1. Placing the return value in register A.
2. Testing the return value. This feature is not used by this
compiler, but it is needed to maintain compatibility with the calling
convention used by Motorola.

5.2.6 Naming convention


The compiler uses a special naming format when generating assembly
code. This can be seen in Table 5.5.

Label              Purpose
L#                 Local labels. Used for targets of jumps. # is a unique
                   number.
F<identifier>      Global variables and functions. <identifier> is the
                   variable or function name.
F__<identifier>#   Variables static to a function.
<filename_c>       Section names. The contents of each assembly file
                   generated by the compiler are contained in a unique
                   section. <filename_c> is the file currently being compiled
                   where the '.' is replaced by '_'.
Table 5.5: Naming convention
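
The naming convention can be expressed as a few formatting helpers. These functions are written for this illustration only and are not taken from dsp56k.md; the names, the parameters and the use of snprintf() are assumptions. Each function expects an output buffer of at least one character.

```c
#include <stdio.h>

/* L#: local label used as a jump target. */
void local_label(int number, char *out, size_t n)
{
    snprintf(out, n, "L%d", number);
}

/* F<identifier>: global variable or function. */
void global_label(const char *id, char *out, size_t n)
{
    snprintf(out, n, "F%s", id);
}

/* F__<identifier>#: variable that is static to a function. */
void static_label(const char *id, int number, char *out, size_t n)
{
    snprintf(out, n, "F__%s%d", id, number);
}

/* <filename_c>: the file name with every '.' replaced by '_'. */
void section_name(const char *filename, char *out, size_t n)
{
    snprintf(out, n, "%s", filename);
    for (char *p = out; *p != '\0'; p++)
        if (*p == '.')
            *p = '_';
}
```

With this scheme the function main() gets the label Fmain, which is the subroutine called by F__start as described in section 5.2.3, and the file test.c is placed in the section test_c.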

5.3 Retargeting
The back end in LCC is split into two parts, a machine independent and
a machine dependent part. All back ends use the same machine
independent part. This simplifies the construction of new back ends since
it does not require as much new code.
The different back ends in LCC are stored in .md files (md stands for
machine description). They contain everything that is needed for the
back ends. To create a new back end a new file is created. When LCC is
being compiled the files for the back ends are fed through the program
lburg. lburg generates C code from the .md files, and that is then
compiled together with the rest of the source code for LCC to create
the compiler.
The format of the .md file is the same as of the specification mentioned
in section 4.3.5 on page 30. The complete file for the new back end is
called dsp56k.md and can be seen in Appendix C on page 61.
The following sections will give a more detailed description of the file.
The section names correspond to the names used in section 4.3.5.


5.3.1 Configuration
This part contains declarations and function definitions. They are
copied to the top of the generated C file and are almost identical for all back
ends. It is only the global variables that differ somewhat.
The arrays ireg[32], ireg2[32] etc. are used to hold information
about the registers. The variables iregw, iregw2 etc. are used to hold
an entire register set, also called a wildcard. cseg is used to hold the
current segment and retstruct is a flag that is set if the function
being compiled returns a structure.

5.3.2 Declarations
This part contains all the terminals that are used for the rules. They are
generated by a program called ops that is included in the LCC distri-
bution. The command line given to the program was the following:
ops c=1 s=1 i=1 l=2 h=2 f=1 d=2 x=2 p=1
The letter c stands for char, s for short, and so on, covering all the
data types in C. x means long double and p means pointers. The
number indicates how many bytes (or words) each data type uses. The
output from the program is a list with all the terminals that can appear
with the given data type sizes and it is copied directly into the .md
file.

5.3.3 Rules
This is the core of the back end, and the rules were the most time
consuming part to write. They required a lot of testing and fine tuning.
The rules were constructed incrementally. It started with a few basic
rules that were only able to compile empty programs. Then new rules
were gradually added to handle more operations, but only for one
data type (1-word integers). When all operators were covered,
additional rules to take care of all data types were added. While writing
the rules, the function emit2() was also constructed to take care of
the cases where the instruction templates were not enough. This
function is documented further down.
The other back ends were used as an inspiration when writing this,
but a lot is different because of the nature of the DSP56002.

5.3.4 C code
This is the actual interface to the back end that the front end uses. It is
made up of a number of interface functions and a structure called the
interface record that contains configuration data. The front end calls
the functions to inform the back end of events during the compilation.
It also calls the functions to let the back end emit the actual assembly
code.
This part was created in the same way and at the same time as the
rules were written. At first only a few functions are needed to compile
an empty program. As more and more rules were created the func-
tions needed to have more and more features.
The following is a description of the functions and the interface record
in the .md file. They are listed in the same order as they appear in the
file.

progbeg()
The front end calls progbeg() during initialization to set up varia-
bles and initialize data and to emit the boilerplate at the beginning of
the generated assembly file. It is also responsible for checking and tak-
ing care of the command line arguments passed to the back end.

progend()
When the compilation ends the front end calls progend() to give the
back end a chance to clean up and finalize its output.

rmap()
This function is used to tell the front end which register class the
operators should use. P and B are pointers and structures; they use the
R-registers. I, U and F are integers, unsigned integers and floating point
numbers. They use the same registers: X and Y if they are 48-bit (2
words) and X0, X1, Y0 and Y1 if they are 24-bit (1 word).
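The mapping described above can be sketched as a small switch on the type suffix and size of an operator. This is only an illustration: it assumes that the type suffix sits in the low four bits of the operator number and the size in the bits from bit 10 upwards (consistent with the %term values in Appendix C), and it returns a made-up enum where the real rmap() would return one of the wildcard symbols.

```c
/* Register classes of this back end (hypothetical names). */
enum RegClass { CLASS_NONE, CLASS_R, CLASS_XY_1WORD, CLASS_XY_2WORD };

/* Type suffix codes used in the terminal numbers. */
enum { TYPE_F = 1, TYPE_I = 5, TYPE_U = 6, TYPE_P = 7, TYPE_B = 9 };

/* Pick a register class from an operator number such as ADDI1 = 1333. */
enum RegClass rmap_sketch(int op)
{
    int type = op & 15;   /* low four bits: type suffix      */
    int size = op >> 10;  /* bits 10 and up: size in words   */

    switch (type) {
    case TYPE_P:
    case TYPE_B:
        return CLASS_R;                       /* R-registers      */
    case TYPE_I:
    case TYPE_U:
    case TYPE_F:
        return size == 2 ? CLASS_XY_2WORD     /* X, Y             */
                         : CLASS_XY_1WORD;    /* X0, X1, Y0, Y1   */
    default:
        return CLASS_NONE;                    /* e.g. void        */
    }
}
```

With the numbers from Appendix C, rmap_sketch(1335) (ADDP1) selects the R-registers, while rmap_sketch(2517) (MULI2) selects the 2-word class.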

segment()
The front end tells the back end which segment it should use, CODE,
BSS, DATA or LIT. CODE is used for code, BSS for uninitialized varia-
bles, DATA for initialized variables and LIT for constants. For this
implementation CODE uses the P memory and all the data uses the Y
memory.
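As a rough sketch of this mapping, the function below returns the org directive for each segment. The directives org p: and org y: are the ones visible in the generated file in Appendix B; the enum and function names are assumptions made for the example, and the real segment() also has to remember the current segment in cseg.

```c
#include <stddef.h>

/* The four segments the front end can ask for (hypothetical names). */
enum Segment { SEG_CODE = 1, SEG_BSS, SEG_DATA, SEG_LIT };

/* Return the assembler directive that selects the memory for a segment. */
const char *segment_directive(enum Segment seg)
{
    switch (seg) {
    case SEG_CODE:
        return "org p:";   /* code goes to the P memory     */
    case SEG_BSS:
    case SEG_DATA:
    case SEG_LIT:
        return "org y:";   /* all data goes to the Y memory */
    }
    return NULL;           /* unknown segment               */
}
```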

target()
This function is used because some instructions must have their operands
in certain registers or leave their result in a specific register. In
this implementation a lot of instructions need to be targeted
because of the hardware. Most instructions in the DSP56002
use the X and Y registers as input registers and the accumulator registers
A and B as output registers. Therefore almost all instructions must
be forced to use specific registers.
The function rtarget() is used to specify a register for the operand
and setreg() is used to set the register that will contain the result
from an instruction.

clobber()
Some instructions destroy the value in a register. This function tells
the front end to insert instructions to save and restore the register
before and after an instruction that destroys it.

emit2()
Some instructions are too complicated to be emitted using the instruction
templates. They are emitted by this function instead. There are
many different reasons why an instruction must be emitted by
emit2(). For example, reg: CNSTI2 loads a constant into a
register, but the hardware cannot move a 48-bit constant to a register
using one instruction, so the constant must be split in two parts and
moved to the register using two move instructions. This cannot be
done with the templates, so emit2() emits it instead.
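The split of a 48-bit constant can be illustrated as follows. This is a sketch under assumptions, not the code in emit2(): the function names are invented, and the choice of X1 for the high word, X0 for the low word and the hexadecimal immediate syntax is only one plausible way to write the two moves.

```c
#include <stdint.h>
#include <stdio.h>

/* The two 24-bit halves of a 48-bit value. */
struct Halves48 { uint32_t high, low; };

struct Halves48 split48(int64_t value)
{
    uint64_t bits = (uint64_t)value & 0xFFFFFFFFFFFFULL; /* keep 48 bits */
    struct Halves48 h;
    h.high = (uint32_t)(bits >> 24);
    h.low  = (uint32_t)(bits & 0xFFFFFFu);
    return h;
}

/* Write two move instructions that load the halves into the registers
   named by high_reg and low_reg; returns what snprintf returns. */
int format_load48(char *buf, size_t size, int64_t value,
                  const char *high_reg, const char *low_reg)
{
    struct Halves48 h = split48(value);
    return snprintf(buf, size,
                    "\tmove\t#>$%06lx,%s\n\tmove\t#>$%06lx,%s\n",
                    (unsigned long)h.high, high_reg,
                    (unsigned long)h.low, low_reg);
}
```

Calling format_load48() with the value 0x123456ABCDEF and the register names "x1" and "x0" produces one move of $123456 to x1 and one move of $abcdef to x0.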

doarg()
doarg() is called for each argument before a function call. It is used
to compute the register or stack cell assigned to the next argument.
For this implementation all function arguments are put on the stack.

blkfetch(), blkstore() and blkloop()
These functions can be used to emit code that copies blocks of data.
This can, for example, be used by structure assignments. If they are
empty the compiler will generate other code instead.

local()
This function is used by the front end to announce local variables to
the back end. It is used to set the stack offset for the variables.

function()
function() is used by the front end to generate and emit code for a
function. It is usually divided into three parts. In the first part
initialization is done and offsets for the argument variables are calculated.
After that gencode() is called to generate the code for the function.
In the second part the size of the frame and the registers that need saving
are known. The function prologue is emitted to save the old frame
pointer, set up a new stack pointer and save the necessary registers.
After that emitcode() is called to emit the actual assembly code
for the function. In the third part the function epilogue is emitted to
restore registers and the frame and stack pointers.

defsymbol()
defsymbol() is called by the front end whenever a new symbol is
defined. A symbol is an internal data type in the compiler that repre-
sents variables, constants, labels and types. This function sets up the
name that the back end uses for the symbols.

address()
This function is used to initialize a symbol that represents an address
of the form x+n, where x is a symbol name and n is a number. This is
used so that addresses can be calculated by the assembler instead of at
run time.
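The name of such a symbol is simply an expression that the assembler evaluates. A minimal sketch of building it, assuming a plain character buffer instead of LCC's symbol structure (format_address is a name invented for this example):

```c
#include <stdio.h>

/* Build the assembler expression for "symbol plus constant offset". */
int format_address(char *buf, size_t size, const char *symbol, long offset)
{
    if (offset == 0)
        return snprintf(buf, size, "%s", symbol);
    /* %+ld prints an explicit sign, giving e.g. "Fres+2" or "Fres-1". */
    return snprintf(buf, size, "%s%+ld", symbol, offset);
}
```

The resulting string can be used wherever a plain symbol name is accepted, for example as the operand of a move instruction.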

defconst()
This function is used to emit assembly code for constants.

defaddress()
This function emits assembly for pointer constants.

defstring()
This function is used to emit assembly code to initialize a string.
There is a special case that needs to be taken care of here. Internally
the compiler treats all arrays with the data type “1-byte integer” as
strings. This causes a problem since char, short and int are all of
this data type in this implementation. Therefore the compiler believes
that all arrays of these data types are strings and tries to emit them as
strings. A change in the front end was made so that if the variable n
(the length of the string) is -1, the string pointer *str contains the
actual value instead of pointing to a string that should be emitted.

export() and import()
These functions are used to emit assembly directives to import and
export symbols so that variables and functions can be reached across
different source files.

global()
The front end calls this function to emit assembly to make a variable
global.

space()
This function is used to emit code that creates a block of words set to
zero.

Interface record
The interface record is used to configure the back end. It consists of a
structure that is assigned to the global variable IR. This is done when the
compiler is initialized, and IR is set to the structure of the back end that
the compiler chooses. The front end can then call the interface functions
in the back end in the form (*IR->progbeg)(arg1,arg2)
and access the interface variables in the form IR->wants_dag.
The first part of the interface record sets up the size and alignment of
the data types in C. After that come some flags that set up specific features.
The rest is used to assign the functions documented above.
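The calling convention through IR can be shown with a heavily cut-down record. The sketch below assumes a structure with one flag and two function pointers; struct MiniInterface, the dsp_ functions and the counters are invented for the example, and the real interface record in LCC has many more fields.

```c
/* A cut-down interface record: one configuration flag, two functions. */
struct MiniInterface {
    unsigned wants_dag : 1;
    void (*progbeg)(int argc, char **argv);
    void (*progend)(void);
};

static int programs_started;   /* counts calls, for demonstration only */
static int programs_ended;

static void dsp_progbeg(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    programs_started++;
}

static void dsp_progend(void)
{
    programs_ended++;
}

/* The record of one back end. */
static struct MiniInterface dsp_backend = { 0, dsp_progbeg, dsp_progend };

/* The front end reaches the chosen back end through this pointer. */
struct MiniInterface *IR = &dsp_backend;

/* What the front end does around a compilation; returns the flag. */
int compile_nothing(int argc, char **argv)
{
    (*IR->progbeg)(argc, argv);
    (*IR->progend)();
    return IR->wants_dag;
}
```

Because the front end only goes through IR, choosing another back end amounts to pointing IR at a different record.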

5.4 Special features
There are some special features in the compiler that need to be mentioned.
The C standard specifies that floating point constants default to double
if they do not end with an f. For example, 1.0 is a double
and 1.0f is a float. But since the double data type is not fully
implemented this was changed so that floating point numbers default
to float and double is used if the number ends with a d. This violates
the C standard, but it makes the compiler more usable.
1.0 is not a valid number for float and double since the range for
these data types is −1.0 ≤ x < 1.0. But it is still possible to use it and it
will be converted to the largest number allowed for that data type.
This avoids statements like
float x = 0.999999999999;
to get the largest possible number.
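The conversion of 1.0 to the largest representable value can be sketched for a 24-bit fraction as follows. This is an illustration under assumptions: it treats the value as a two's complement fraction scaled by 2^23 and truncates toward zero, which may differ from the rounding the compiler actually uses, and to_fract24 is a name made up for the example.

```c
#include <stdint.h>

/* Convert a value to a 24-bit two's complement fraction (scale 2^23),
   clamping values outside -1.0 <= x < 1.0 to the nearest limit. */
int32_t to_fract24(double x)
{
    const double scale = 8388608.0;          /* 2^23 */
    double scaled = x * scale;

    if (scaled >= 8388607.0)
        return 0x7FFFFF;                     /* largest fraction      */
    if (scaled <= -8388608.0)
        return -0x800000;                    /* exactly -1.0          */
    return (int32_t)scaled;                  /* truncates toward zero */
}
```

With this sketch, both 1.0 and any larger value map to 0x7FFFFF, while 0.5 maps to 0x400000.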


5.5 Other changes to LCC


One “internal” goal when implementing the compiler was to avoid
any changes to the front end. This could however not be accomplished.
The biggest change that needed to be made involved the byte size.
LCC assumes that a byte is 8 bits wide, and the DSP56002 processor
uses a byte size (or word size) of 24 bits. LCC is hard coded with this,
so the expression 8*something appeared in a lot of places in the
source code. This was changed by replacing the 8 with a global variable
holding the byte size, which is set during initialization when the back
end is chosen.
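The idea behind this change can be sketched in a few lines. The variable and function names below are hypothetical and are not the ones used in the modified LCC; the sketch only shows a hard-coded 8 being replaced by a value that is set once a back end has been chosen.

```c
/* Bits per addressable unit; 8 unless the back end says otherwise.
   (Hypothetical name, not the variable used in the modified LCC.) */
int bits_per_unit = 8;

/* Called when a back end is chosen. */
void select_unit_width(int bits)
{
    bits_per_unit = bits;
}

/* Replaces expressions of the form 8*size in the front end. */
int width_in_bits(int size_in_units)
{
    return bits_per_unit * size_in_units;
}
```

After select_unit_width(24), a data type of size 2 is 48 bits wide, while the other back ends keep the default of 8 bits per unit.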
Other things that needed changes in the front end were the change of
the default floating point type from double to float and the string-array
problem mentioned for defstring(). A new command line
option (-x) was added that turns on debug messages in the back end,
and some minor fixes were made to deal with 48-bit constants (LCC’s
maximum is 32-bit) and the fractional numbers.
LCC uses strength reduction for optimization and replaces multiplications
and divisions by a power of two with left and right shifts. This
was disabled because the DSP56002 can only shift one step at a time.
Also, the multiplication and the shift operation have the same execution
time.
All of these changes were made so that they only affected this back
end, and not the other back ends, so that they would still work. One
reason for this was that the other back ends were used to test and com-
pare this back end so it was crucial that they still worked.

5.6 The environment


The actual compiler executable, called rcc, is used together with a
number of other programs to form a complete compiler environment.
rcc only compiles preprocessed C code to assembly code. The main
executable for LCC is called lcc and is a so-called driver. It drives the
compilation by taking care of file names and starting the different programs
with the correct command line parameters. The following is a
list of the programs used:
• Preprocessor: The C preprocessor from the GNU Compiler Collection
is used. The actual executable is called cpp.
• Assembler: The assembler from Motorola is used. It is called
asm56000 and it is run with the following command line
options: -B -OIL,NOW. This makes the assembler output a relocatable
object file and disables both output to the terminal and
warning messages.
• Linker: The linker from Motorola is also used. It is called
dsplnk and is run with the command line options -d and -B.
-B causes the linker to output an absolute executable file. -d is
an undocumented switch that is also used by the Motorola compiler
when it invokes the linker. It instructs the linker to define
the variable DSIZE to the amount of used memory in the Y
memory. It is used to initialize the stack pointer. The switch may
have other effects, but they are unknown.

5.7 crt0
The crt0 file is linked with all programs when they are compiled. It
contains the C bootstrap code and provides the environment to exe-
cute a C program. The bootstrap code is put first in the resulting exe-
cutable and is always the code that gets executed first when a program
runs. It defines some global variables, initializes the memories, stack
and frame pointers and registers. It then jumps to the main() func-
tion in the C program.

5.8 Problems
There have been some problems during the development of the back
end. Most of them originate from the fact that LCC is intended for
normal processors and not DSP processors.

5.8.1 Register targeting


The most severe problem has to do with register targeting. In LCC registers
can be grouped together in something that is called wildcards.
Each data type in LCC (I1, I2, U1 and so on) is assigned to use registers
from one wildcard. This is done in the function rmap(). When
registers are selected for the instructions they are all assigned from the
same wildcard since they are the same data type. This is not suitable
for the DSP56002 since it uses input registers and accumulator registers.
Most of the instructions must use the input registers for input
and produce the output in the accumulator registers.
The solution to this problem is to use the target() function, which
forces the register allocator to select a specific register for an instruction.
The input registers are allocated from the wildcard and target()
sets the output register to one of the accumulator registers.
This results in another problem that is caused by the optimization
being done in the register allocator. When compiling expressions like
a*b*c the compiler generates temporary variables to store the inter-
mediate result between the two multiplications. This intermediate
result would be calculated to the accumulator register, and the register
allocator uses this as an input register to the next multiplication. This
is not allowed on the DSP56002 processor. For example, the expression
a*b*c would compile to
mpy y0,y1,b
mpy b,x1,b
The solution to this is to target the input registers for some instructions
as well. The result of this is that the register allocator cannot
choose between the free registers when targeting these instructions,
but has to use a specific register that may already be in use. This causes
the compiler to emit extra instructions to save that register to memory
and also a lot of register to register moves.
There is still a problem with the register allocator. This could possibly
be caused by a bug in the front end. When compiling the expression
a*a the register allocator will use the same input register for both
operands. This is not allowed on the DSP56002. The following code is
generated
mpy y0,y0,b
even though the input registers are targeted to Y0 and Y1. This prob-
lem is handled in the function emit2().
There is one final problem with the register allocator that could not be
solved. In the function progbeg() the variables tmask[] and
vmask[] are used to configure which registers can be used for temporary
values and which registers can be used for variables. When compiling
larger programs the register allocator will crash and complain
that it cannot find any free registers unless vmask[] is set to zero
(meaning no registers). This means that variables will always be
fetched and stored directly to and from memory when they are used,
instead of being stored in a register between calculations. For example,
the statements
x = a + b;
y = a + c;
will cause the value of a to be fetched two times instead of being
fetched once and then reused.
The reason why the register allocator crashes is unknown, but if the
problem could be fixed the performance of the generated code would
increase a lot.

5.8.2 48-bit registers


The emit2() function is quite big as a result of another problem. The
DSP56002 does not handle 48-bit numbers very well. The and, or, eor
and not instructions can only operate on 24-bit numbers, so 48-bit
logical operations need to be done in two steps. The mpy, div and mod
instructions only support 24-bit input, so special algorithms are emitted
for them to handle 48-bit input.
Moving 48-bit registers to and from memory needs special treatment.
The DSP56002 can only transfer 24 bits at a time over the data bus to
the memory, so that is done in two steps.
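The arithmetic behind these two-step operations can be sketched in C. The example treats the operands as plain unsigned 48-bit integers, which is an assumption made for clarity; it does not reproduce the instruction sequences emitted by emit2(), and the function names are invented.

```c
#include <stdint.h>

#define MASK24 0xFFFFFFu
#define MASK48 0xFFFFFFFFFFFFULL

/* 48-bit AND done as two independent 24-bit ANDs. */
uint64_t and48_by_halves(uint64_t a, uint64_t b)
{
    uint32_t high = (uint32_t)(a >> 24) & (uint32_t)(b >> 24) & MASK24;
    uint32_t low  = (uint32_t)a & (uint32_t)b & MASK24;
    return ((uint64_t)high << 24) | low;
}

/* Low 48 bits of an unsigned product, using only 24x24-bit multiplies. */
uint64_t mul48_by_halves(uint64_t a, uint64_t b)
{
    uint64_t a_high = (a >> 24) & MASK24, a_low = a & MASK24;
    uint64_t b_high = (b >> 24) & MASK24, b_low = b & MASK24;

    uint64_t low_product = a_low * b_low;              /* up to 48 bits */
    uint64_t cross = a_high * b_low + a_low * b_high;  /* up to 49 bits */

    /* Only the low 24 bits of the cross terms land inside bits 24..47;
       a_high * b_high lies entirely above bit 47 and is dropped. */
    return (low_product + ((cross & MASK24) << 24)) & MASK48;
}
```

The multiplication needs three 24-bit multiplies and two additions for one 48-bit result, which gives an idea of why the emitted code for 48-bit operations becomes long.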

5.8.3 Address registers


The DSP56002 can only address the memory by using the R-registers
together with the N- and M-registers. This causes some problems
because LCC often accesses the memory by adding a constant offset to
an address in a register. The local variables are accessed by adding a
constant to the stack pointer and the function arguments are accessed
by adding a constant to the frame pointer. To do this on the DSP56002
the constant must be moved to one of the N-registers before the mem-
ory can be accessed. For example, something that could have looked
like this
move Y:(r0)-3,a
will instead look like this
move #-3,n0
lua (r0)+n0,r7
move Y:(r7),a
The assembler will also insert two nop instructions because the new
values in the R- and N-registers are not available until one instruction
has passed. This is due to pipeline delays in the processor. So, a total
of five instructions are required to access any local or argument variable.
This is very costly since it is used for both local variables and
function arguments.
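The three-instruction sequence shown above can be generated from a base register number and an offset. The sketch below is only an illustration of that pattern: the function name is made up, the scratch register r7 is taken from the example, and the nop instructions that the assembler inserts are not included.

```c
#include <stdio.h>

/* Format the three-instruction sequence that reads Y memory at
   (r<base> + offset) into dst, using r7 as scratch as in the example. */
int format_offset_load(char *buf, size_t size, int base, long offset,
                       const char *dst)
{
    return snprintf(buf, size,
                    "\tmove\t#%ld,n%d\n"
                    "\tlua\t(r%d)+n%d,r7\n"
                    "\tmove\tY:(r7),%s\n",
                    offset, base, base, base, dst);
}
```

Calling format_offset_load() with base 0, offset -3 and destination "a" reproduces the sequence from the text.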

5.9 Improvements
There are some improvements that could be made to the compiler.
One obvious thing to improve is the register allocator. The first thing
to do is to fix the crashing so it can promote variables to registers.
Another thing that can be done is to introduce new register classes for the
input and accumulator registers so it can take full advantage of all
available registers.
The DSP56002 has some hardware features that could be used. Some
of them may be a bit tricky to implement:
• The DSP can do hardware looping, but that would be difficult to
use since it has a lot of restrictions.
• The instruction MAC, which means multiply and accumulate,
could be used to speed up the generated code. That would be
relatively easy to implement since it could be realised with a rule
that looks like this:
reg: ADDI4(MULI4(reg, reg), reg)
It would match two nodes in the tree and use fewer instructions.
There are other instructions that could be used in the same way,
for example INC, which increases a register by one.
• The DSP can execute one instruction and move data to and from
the memories at the same time. This can be utilized to speed up
the code, but it would be difficult to implement. The Motorola
compiler uses this by running the generated assembly code
through an external program called alo. It analyses the code and
concatenates MOVE instructions with other instructions. For
example, the code
move x0,Y:(r1)
add x1,a
would be transformed to
add x1,a x0,Y:(r1)
This kind of optimization would increase the performance of the code
a lot.
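The parallel move transformation in the last item can be illustrated on the text level. The following is a deliberately naive sketch with an invented function name; it is not how alo works, and it leaves out the checks a real tool needs (whether the instruction allows a parallel move and whether the two instructions depend on each other).

```c
#include <stdio.h>
#include <string.h>

/* Fold "move src,dst" into the following instruction as a parallel
   move. Returns 1 and writes the packed line on success, 0 otherwise. */
int pack_parallel_move(char *out, size_t size,
                       const char *move_line, const char *alu_line)
{
    static const char prefix[] = "move ";
    size_t prefix_len = sizeof prefix - 1;
    int written;

    if (strncmp(move_line, prefix, prefix_len) != 0)
        return 0;                       /* first line is not a move */

    /* NOTE: a real tool must also check that the ALU instruction allows
       a parallel move and that the two do not conflict; omitted here. */
    written = snprintf(out, size, "%s\t%s", alu_line,
                       move_line + prefix_len);
    return written >= 0 && (size_t)written < size;
}
```

Given the lines "move x0,Y:(r1)" and "add x1,a", the function writes the add instruction followed by a tab and the operands x0,Y:(r1).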

6
Conclusions

6.1 Retargeting
The goal of this thesis was to retarget a C compiler for the Motorola
DSP56002 processor. The resulting compiler should generate working
code that functions as intended. That goal was fulfilled. A working
compiler was constructed. The performance of the code that the com-
piler generates is far from optimal, but since that was not a require-
ment, only a limited amount of time was spent to improve it.
The second goal was to make the compiler compatible with the com-
piler from Motorola. This was also accomplished and the compiler can
be used with the assembler and linker from Motorola to form a com-
plete environment that compiles the C code.
The code that the compiler generates was tested in a simulator for the
DSP56002 processor to verify that it works. This was very useful and a
lot of bugs were found using the simulator. There are probably still a
few bugs left in the compiler since there was not enough time to test
everything.
An example of a compilation of a simple program can be seen in
Appendix B on page 59.

6.2 Future work


The compiler works, but it generates code that does not perform very
well. Much of this has to do with the way the compiler selects and
uses registers. As mentioned in section 5.9, this could be fixed by
changing how the register allocator works.
Hardware features can be used to speed up the code. Some of the
hardware features may be very difficult to use, such as hardware loop-
ing, but others may be implemented with very little difficulty.
And, of course, more testing to find and fix bugs can always be done.

References

Books and papers


[1] A. V. Aho, R. Sethi and J. D. Ullman, Compilers - Principles, Techniques, and Tools, Addison-Wesley, 1985
[2] C. W. Fraser and D. R. Hanson, A Retargetable C Compiler: Design and Implementation, Addison-Wesley, 1995
[3] C. W. Fraser and D. R. Hanson, The lcc 4.x Code-Generation Interface, Microsoft Research, 2003
[4] Motorola, DSP56000 24-bit Digital Signal Processor Family Manual, Motorola Inc., 1994
[5] Motorola, Motorola DSP56000 Family Optimizing C Compiler User’s Manual, Release 6.3, Motorola Inc., 1999

Internet
[6] LCC home page:
http://www.cs.princeton.edu/software/lcc/
[7] LCC license:
http://www.cs.princeton.edu/software/lcc/4.1/CPYRIGHT

A
Instructions

This appendix lists all instructions available for the Motorola
DSP56002. The instructions are arranged in groups according to their
functionality. The * indicates that a parallel data move is allowed for
the instruction.

A.1 Arithmetic instructions

Instruction Description
ABS Absolute value *
ADC Add long with carry *
ADD Add *
ADDL Shift left then add *
ADDR Shift right then add *
ASL Arithmetic shift accumulator left *
ASR Arithmetic shift accumulator right *
CLR Clear accumulator *
CMP Compare *
CMPM Compare magnitude *
DEC Decrement accumulator
DIV Divide iteration
INC Increment accumulator
MAC Signed multiply-accumulate *
MACR Signed multiply-accumulate and round *
MPY Signed multiply *
MPYR Signed multiply and round *
NEG Negate accumulator *
NORM Normalize accumulator iteration
RND Round accumulator *
SBC Subtract long with carry *
SUB Subtract *
SUBL Shift left then subtract *
SUBR Shift right then subtract *
Tcc Transfer Conditionally
TFR Transfer data ALU register *
TST Test accumulator *
Table A.1: Arithmetic instructions

A.2 Logical instructions

Instruction Description
AND Logical AND *
ANDI AND immediate with control register
EOR Logical exclusive OR *
LSL Logical shift accumulator left *
LSR Logical shift accumulator right *
NOT Logical complement on accumulator *
OR Logical inclusive OR *
ORI OR immediate with control register
ROL Rotate accumulator left *
ROR Rotate accumulator right *
Table A.2: Logical instructions


A.3 Bit manipulation instructions

Instruction Description
BCHG Bit test and change
BCLR Bit test and clear
BSET Bit test and set
BTST Bit test on memory
Table A.3: Bit manipulation instructions

A.4 Loop instructions

Instruction Description
DO Start hardware loop
ENDDO Exit from hardware loop
Table A.4: Loop instructions

A.5 Move instructions

Instruction Description
LUA Load updated address
MOVE Move data
MOVEC Move control register
MOVEM Move program memory
MOVEP Move peripheral data
Table A.5: Move instructions

A.6 Program control instructions

Instruction Description
DEBUG Enter debug mode
DEBUGcc Enter debug mode conditionally
ILLEGAL Illegal instruction interrupt
Jcc Jump conditionally
JCLR Jump if bit clear
JMP Jump
JScc Jump to subroutine conditionally
JSCLR Jump to subroutine if bit clear
JSET Jump if bit set
JSSET Jump to subroutine if bit set
JSR Jump to subroutine
NOP No operation
REP Repeat next instruction
RESET Reset on-chip peripheral devices
RTI Return from interrupt
RTS Return from subroutine
STOP Stop processing (low power standby)
SWI Software interrupt
WAIT Wait for interrupt (low power standby)
Table A.6: Program control instructions

B
Sample code

This appendix contains the resulting assembly file of a compilation of
a small test program called sample.c.

B.1 sample.c
int res = 1;
int x = 10;

void main(void) {
    if(res == 1)
        res = res + x;
    else
        res = x;
}

B.2 sample.asm

; boilerplate
        section sample_c
        opt     so,nomd,rp
; initialized data
        org     y:
        global  Fres
Fres
        dc      $1
        global  Fx
Fx
        dc      $a
        org     p:

;;;; Function main starts

        global  Fmain
; entry sequence / function prologue
Fmain:
        move    r0,Y:(r6)+
        lua     (r6)+,r0
        move    ssh,Y:(r6)+
        move    y1,Y:(r6)+
; body of main()
        move    Y:Fres,a
        move    #>1,y1
        cmp     y1,a
        jne     L2
        move    Y:Fres,a
        move    Y:Fx,y1
        add     y1,a
        move    a,Y:Fres
        jmp     L3
L2:
        move    Y:Fx,y1
        move    y1,Y:Fres
L3:
; exit sequence / function epilogue
L1:
        move    Y:-(r6),y1
        move    Y:-(r6),ssh
        tst     a       Y:-(r6),r0
        rts
;;;; Function main ends

; boilerplate
        endsec
        end

C
dsp56k.md

This appendix contains the entire contents of the file dsp56k.md. It is
used to create the back end for the Motorola DSP56002.
%{
#include "c.h"
#include "time.h"

#define debug2(x) (void)(xflag&&((x),0))

#define NODEPTR_TYPE Node
#define OP_LABEL(p) ((p)->op)
#define LEFT_CHILD(p) ((p)->kids[0])
#define RIGHT_CHILD(p) ((p)->kids[1])
#define STATE_LABEL(p) ((p)->x.state)

static void address(Symbol, Symbol, long);
static void blkfetch(int, int, int, int);
static void blkloop(int, int, int, int, int, int[]);
static void blkstore(int, int, int, int);
static void defaddress(Symbol);
static void defconst(int, int, Value);
static void defstring(int, char *);
static void defsymbol(Symbol);
static void doarg(Node);
static void emit2(Node);
static void export(Symbol);
static void clobber(Node);
static void function(Symbol, Symbol [], Symbol [], int);
static void global(Symbol);
static void import(Symbol);
static void local(Symbol);
static void progbeg(int, char **);
static void progend(void);
static void segment(int);
static void space(int);
static void target(Node);

static int equal(Node p, int a);

static Symbol ireg[32], ireg2[32], areg[32], areg2[32];
static Symbol rreg[32];
static Symbol iregw, ireg2w, aregw, areg2w;
static Symbol rregw;

static int cseg;
static int retstruct;

%}
%start stmt

%term CNSTF1=1041 CNSTF2=2065


%term CNSTI1=1045 CNSTI2=2069
%term CNSTP1=1047

%term CNSTU1=1046 CNSTU2=2070

%term ARGB=41
%term ARGF1=1057 ARGF2=2081
%term ARGI1=1061 ARGI2=2085
%term ARGP1=1063
%term ARGU1=1062 ARGU2=2086

%term ASGNB=57
%term ASGNF1=1073 ASGNF2=2097
%term ASGNI1=1077 ASGNI2=2101
%term ASGNP1=1079
%term ASGNU1=1078 ASGNU2=2102

%term INDIRB=73
%term INDIRF1=1089 INDIRF2=2113
%term INDIRI1=1093 INDIRI2=2117
%term INDIRP1=1095
%term INDIRU1=1094 INDIRU2=2118

%term CVFF1=1137 CVFF2=2161


%term CVFI1=1141 CVFI2=2165

%term CVIF1=1153 CVIF2=2177


%term CVII1=1157 CVII2=2181
%term CVIU1=1158 CVIU2=2182

%term CVPU1=1174

%term CVUI1=1205 CVUI2=2229


%term CVUP1=1207
%term CVUU1=1206 CVUU2=2230

%term NEGF1=1217 NEGF2=2241


%term NEGI1=1221 NEGI2=2245

%term CALLB=217
%term CALLF1=1233 CALLF2=2257
%term CALLI1=1237 CALLI2=2261
%term CALLP1=1239
%term CALLU1=1238 CALLU2=2262
%term CALLV=216

%term RETF1=1265 RETF2=2289


%term RETI1=1269 RETI2=2293
%term RETP1=1271
%term RETU1=1270 RETU2=2294
%term RETV=248

%term ADDRGP1=1287
%term ADDRFP1=1303
%term ADDRLP1=1319

%term ADDF1=1329 ADDF2=2353


%term ADDI1=1333 ADDI2=2357
%term ADDP1=1335
%term ADDU1=1334 ADDU2=2358

%term SUBF1=1345 SUBF2=2369


%term SUBI1=1349 SUBI2=2373
%term SUBP1=1351
%term SUBU1=1350 SUBU2=2374

%term LSHI1=1365 LSHI2=2389


%term LSHU1=1366 LSHU2=2390

%term MODI1=1381 MODI2=2405


%term MODU1=1382 MODU2=2406

%term RSHI1=1397 RSHI2=2421


%term RSHU1=1398 RSHU2=2422

%term BANDI1=1413 BANDI2=2437


%term BANDU1=1414 BANDU2=2438

%term BCOMI1=1429 BCOMI2=2453


%term BCOMU1=1430 BCOMU2=2454

%term BORI1=1445 BORI2=2469


%term BORU1=1446 BORU2=2470

%term BXORI1=1461 BXORI2=2485


%term BXORU1=1462 BXORU2=2486

%term DIVF1=1473 DIVF2=2497


%term DIVI1=1477 DIVI2=2501
%term DIVU1=1478 DIVU2=2502

%term MULF1=1489 MULF2=2513


%term MULI1=1493 MULI2=2517
%term MULU1=1494 MULU2=2518


%term EQF1=1505 EQF2=2529


%term EQI1=1509 EQI2=2533
%term EQU1=1510 EQU2=2534

%term GEF1=1521 GEF2=2545


%term GEI1=1525 GEI2=2549
%term GEU1=1526 GEU2=2550

%term GTF1=1537 GTF2=2561


%term GTI1=1541 GTI2=2565
%term GTU1=1542 GTU2=2566

%term LEF1=1553 LEF2=2577


%term LEI1=1557 LEI2=2581
%term LEU1=1558 LEU2=2582

%term LTF1=1569 LTF2=2593


%term LTI1=1573 LTI2=2597
%term LTU1=1574 LTU2=2598

%term NEF1=1585 NEF2=2609


%term NEI1=1589 NEI2=2613
%term NEU1=1590 NEU2=2614

%term JUMPV=584

%term LABELV=600

%term VREGP=711

%term LOADI1=1253
%term LOADI2=2277
%term LOADU1=1254
%term LOADU2=2278
%term LOADB=233
%term LOADF1=1249
%term LOADF2=2273
%term LOADP1=1255

%%

reg: INDIRI1(VREGP) "# read register\n"
reg: INDIRI2(VREGP) "# read register\n"
reg: INDIRU1(VREGP) "# read register\n"
reg: INDIRU2(VREGP) "# read register\n"
reg: INDIRF1(VREGP) "# read register\n"
reg: INDIRF2(VREGP) "# read register\n"
rreg: INDIRP1(VREGP) "# read register\n"

stmt: ASGNI1(VREGP, reg) "# write register\n"
stmt: ASGNI2(VREGP, reg) "# write register\n"
stmt: ASGNU1(VREGP, reg) "# write register\n"
stmt: ASGNU2(VREGP, reg) "# write register\n"
stmt: ASGNF1(VREGP, reg) "# write register\n"
stmt: ASGNF2(VREGP, reg) "# write register\n"
stmt: ASGNP1(VREGP, rreg) "# write register\n"

con: CNSTI1 "%a"
reg: CNSTI2 "# \tmove \t#>%a,%c\n" 2+2
con: CNSTU1 "%a"
reg: CNSTU2 "# \tmove \t#>%a,%c\n" 2+2
con: CNSTP1 "%a"
reg: CNSTF1 "# \tmove \t#>%a,%c\n" 2
reg: CNSTF2 "# \tmove \t#>%a,%c\n" 2+2

reg: LOADI1(reg) "\tmove \t%0,%c\n" 2
reg: LOADU1(reg) "\tmove \t%0,%c\n" 2
reg: LOADU1(rreg) "\tmove \t%0,%c\n" 2
reg: LOADI2(reg) "# \tmove \t%0,%c\n" 2
reg: LOADU2(reg) "# \tmove \t%0,%c\n" 2
reg: LOADF1(reg) "\tmove \t%0,%c\n" 2
reg: LOADF2(reg) "# \tmove \t%0,%c\n" 2
rreg: LOADP1(rreg) "\tmove \t%0,%c\n" 2
rreg: LOADP1(reg) "\tmove \t%0,%c\n" 2

reg: con "\tmove \t#>%0,%c\n" 2
rreg: con "\tmove \t#%0,%c\n" 2

rreg: ADDRGP1 "\tmove \t#%a,%c\n" 2
rreg: ADDRFP1 "\tmove \t#%a,n0\n\tlua \t(r0)+n0,%c\n" 2+4
rreg: ADDRLP1 "\tmove \t#%a,n6\n\tlua \t(r6)+n6,%c\n" 2+4

stmt: reg ""
stmt: LABELV "%a:\n"

addr: rreg "Y:(%0)"
addr: ADDRGP1 "Y:%a"

stmt: ASGNI1(addr, reg) "\tmove \t%1,%0\n" 2
stmt: ASGNI2(addr, reg) "# \tmove \t%1,%0\n" 2
stmt: ASGNU1(addr, reg) "\tmove \t%1,%0\n" 2
stmt: ASGNU2(addr, reg) "# \tmove \t%1,%0\n" 2
stmt: ASGNF1(addr, reg) "\tmove \t%1,%0\n" 2
stmt: ASGNF2(addr, reg) "# \tmove \t%1,%0\n" 2
stmt: ASGNP1(addr, rreg) "\tmove \t%1,%0\n" 2
stmt: ASGNB(rreg, INDIRB(rreg)) "# ASGNB\n"

reg: INDIRI1(addr) "\tmove \t%0,%c\n" 2
reg: INDIRI2(addr) "# \tmove \t%0,%c\n" 2
reg: INDIRU1(addr) "\tmove \t%0,%c\n" 2
reg: INDIRU2(addr) "# \tmove \t%0,%c\n" 2
reg: INDIRF1(addr) "\tmove \t%0,%c\n" 2
reg: INDIRF2(addr) "# \tmove \t%0,%c\n" 2
rreg: INDIRP1(addr) "\tmove \t%0,%c\n" 2

reg: ADDI1(reg, reg) "\tadd \t%1,%0\n" 2
reg: ADDI2(reg, reg) "\tadd \t%1,%0\n" 2
reg: ADDU1(reg, reg) "\tadd \t%1,%0\n" 2
reg: ADDU2(reg, reg) "\tadd \t%1,%0\n" 2
reg: ADDF1(reg, reg) "\tadd \t%1,%0\n" 2
reg: ADDF2(reg, reg) "\tadd \t%1,%0\n" 2

reg: SUBI1(reg, reg) "\tsub \t%1,%0\n" 2
reg: SUBI2(reg, reg) "\tsub \t%1,%0\n" 2
reg: SUBU1(reg, reg) "\tsub \t%1,%0\n" 2
reg: SUBU2(reg, reg) "\tsub \t%1,%0\n" 2
reg: SUBF1(reg, reg) "\tsub \t%1,%0\n" 2
reg: SUBF2(reg, reg) "\tsub \t%1,%0\n" 2

rc: reg "%0"
rc: con "%#%0"
rreg: ADDP1(rc, rreg) "# \tmove \t%0,n8\n\tlua \t(%1)+n8,%c\n" 2+4
rreg: ADDP1(rreg, rc) "# \tmove \t%1,n8\n\tlua \t(%0)+n8,%c\n" 2+4

rreg: SUBP1(rc, rreg) "# \tmove \t%0,n8\n\tlua \t(%1)-n8,%c\n" 2+4
rreg: SUBP1(rreg, rc) "# \tmove \t%1,n8\n\tlua \t(%0)-n8,%c\n" 2+4

reg: MULI1(reg, reg) "# \tmpy \t%0,%1,%c\n\tasr \t%c\n\tmove \t%c0,%c\n" 2+2+2
reg: MULI2(reg, reg) "# \tmpy \t%0,%1,%c\n" 2
reg: MULU1(reg, reg) "# \tmpy \t%0,%1,%c\n\tasr \t%c\n\tmove \t%c0,%c\n" 2+2+2
reg: MULU2(reg, reg) "# \tmpy \t%0,%1,%c\n" 2
reg: MULF1(reg, reg) "# \tmpy \t%0,%1,%c\n\tmove \t%c0,%c\n" 2+2
reg: MULF2(reg, reg) "# \tmpy \t%0,%1,%c\n" 2

reg: DIVI1(reg, reg) "# \tdiv \t%0,%1\n" 2
reg: DIVI2(reg, reg) "# \tdiv \t%0,%1\n" 2
reg: DIVU1(reg, reg) "# \tdiv \t%0,%1\n" 2
reg: DIVU2(reg, reg) "# \tdiv \t%0,%1\n" 2
reg: DIVF1(reg, reg) "# \tdiv \t%0,%1\n" 2
reg: DIVF2(reg, reg) "# \tdiv \t%0,%1\n" 2

reg: MODI1(reg, reg) "# \tmod \t%0,%1\n" 2
reg: MODI2(reg, reg) "# \tmod \t%0,%1\n" 2
reg: MODU1(reg, reg) "# \tmod \t%0,%1\n" 2
reg: MODU2(reg, reg) "# \tmod \t%0,%1\n" 2

reg: CVII1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVII2(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVIU1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVIU2(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVIF1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVIF2(reg) "# \tmove \t%0,%c\n" move(a)

reg: CVUI1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVUI2(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVUU1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVUU2(reg) "# \tmove \t%0,%c\n" move(a)
rreg: CVUP1(reg) "\tmove \t%0,%c\n" move(a)

reg: CVPU1(rreg) "\tmove \t%0,%c\n" move(a)

reg: CVFF1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVFF2(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVFI1(reg) "# \tmove \t%0,%c\n" move(a)
reg: CVFI2(reg) "# \tmove \t%0,%c\n" move(a)

stmt: JUMPV(jaddr) “\tjmp \t%0\n” 4

reg: CALLB(jaddr, rreg) “\tmove \t%1,%c\n\tjsr \t%0\n” 2+4

reg: CALLI1(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLI1(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
reg: CALLI1(jaddr) “\tjsr \t%0\n” equal(a, 0)

reg: CALLI2(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLI2(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
reg: CALLI2(jaddr) “\tjsr \t%0\n” equal(a, 0)

reg: CALLP1(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLP1(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
reg: CALLP1(jaddr) “\tjsr \t%0\n” equal(a, 0)

reg: CALLU1(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLU1(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)


reg: CALLU1(jaddr) “\tjsr \t%0\n” equal(a, 0)

reg: CALLU2(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLU2(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
reg: CALLU2(jaddr) “\tjsr \t%0\n” equal(a, 0)

reg: CALLF1(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLF1(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
reg: CALLF1(jaddr) “\tjsr \t%0\n” equal(a, 0)

reg: CALLF2(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


reg: CALLF2(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
reg: CALLF2(jaddr) “\tjsr \t%0\n” equal(a, 0)

stmt: CALLV(jaddr) “\tjsr \t%0\n\tmove \t#%a,n6\n\tmove \t(r6)-n6\n” 4+2


stmt: CALLV(jaddr) “\tjsr \t%0\n\tmove \t(r6)-\n” equal(a, 1)
stmt: CALLV(jaddr) “\tjsr \t%0\n” equal(a, 0)

jaddr: ADDRGP1 “%a”


jaddr: CNSTP1 “%a”
jaddr: rreg “(%0)”

stmt: ARGI1(reg) “\tmove \t%0,Y:(r6)+\n” 2


stmt: ARGI2(reg) “# \tmove \t%0,Y:(r6)+\n” 2
stmt: ARGU1(reg) “\tmove \t%0,Y:(r6)+\n” 2
stmt: ARGU2(reg) “# \tmove \t%0,Y:(r6)+\n” 2
stmt: ARGF1(reg) “\tmove \t%0,Y:(r6)+\n” 2
stmt: ARGF2(reg) “# \tmove \t%0,Y:(r6)+\n” 2
stmt: ARGP1(rreg) “\tmove \t%0,Y:(r6)+\n” 2
stmt: ARGB(INDIRB(rreg)) “# ARGB\n” 8

stmt: RETI1(reg) “# rts\n” 4


stmt: RETI2(reg) “# rts\n” 4
stmt: RETU1(reg) “# rts\n” 4
stmt: RETU2(reg) “# rts\n” 4
stmt: RETF1(reg) “# rts\n” 4
stmt: RETF2(reg) “# rts\n” 4
stmt: RETP1(rreg) “# rts\n” 4

stmt: EQI1(reg, reg) “\tcmp \t%1,%0\n\tjeq \t%a\n” 2+4


stmt: EQI2(reg, reg) “\tcmp \t%1,%0\n\tjeq \t%a\n” 2+4
stmt: EQU1(reg, reg) “\tcmp \t%1,%0\n\tjeq \t%a\n” 2+4
stmt: EQU2(reg, reg) “\tcmp \t%1,%0\n\tjeq \t%a\n” 2+4
stmt: EQF1(reg, reg) “\tcmp \t%1,%0\n\tjeq \t%a\n” 2+4
stmt: EQF2(reg, reg) “\tcmp \t%1,%0\n\tjeq \t%a\n” 2+4

stmt: GEI1(reg, reg) “\tcmp \t%1,%0\n\tjge \t%a\n” 2+4


stmt: GEI2(reg, reg) “\tcmp \t%1,%0\n\tjge \t%a\n” 2+4
stmt: GEU1(reg, reg) “\tcmp \t%1,%0\n\tjge \t%a\n” 2+4
stmt: GEU2(reg, reg) “\tcmp \t%1,%0\n\tjge \t%a\n” 2+4
stmt: GEF1(reg, reg) “\tcmp \t%1,%0\n\tjge \t%a\n” 2+4
stmt: GEF2(reg, reg) “\tcmp \t%1,%0\n\tjge \t%a\n” 2+4

stmt: GTI1(reg, reg) “\tcmp \t%1,%0\n\tjgt \t%a\n” 2+4


stmt: GTI2(reg, reg) “\tcmp \t%1,%0\n\tjgt \t%a\n” 2+4
stmt: GTU1(reg, reg) “\tcmp \t%1,%0\n\tjgt \t%a\n” 2+4
stmt: GTU2(reg, reg) “\tcmp \t%1,%0\n\tjgt \t%a\n” 2+4
stmt: GTF1(reg, reg) “\tcmp \t%1,%0\n\tjgt \t%a\n” 2+4
stmt: GTF2(reg, reg) “\tcmp \t%1,%0\n\tjgt \t%a\n” 2+4

stmt: LEI1(reg, reg) “\tcmp \t%1,%0\n\tjle \t%a\n” 2+4


stmt: LEI2(reg, reg) “\tcmp \t%1,%0\n\tjle \t%a\n” 2+4
stmt: LEU1(reg, reg) “\tcmp \t%1,%0\n\tjle \t%a\n” 2+4
stmt: LEU2(reg, reg) “\tcmp \t%1,%0\n\tjle \t%a\n” 2+4
stmt: LEF1(reg, reg) “\tcmp \t%1,%0\n\tjle \t%a\n” 2+4
stmt: LEF2(reg, reg) “\tcmp \t%1,%0\n\tjle \t%a\n” 2+4

stmt: LTI1(reg, reg) “\tcmp \t%1,%0\n\tjlt \t%a\n” 2+4


stmt: LTI2(reg, reg) “\tcmp \t%1,%0\n\tjlt \t%a\n” 2+4
stmt: LTU1(reg, reg) “\tcmp \t%1,%0\n\tjlt \t%a\n” 2+4
stmt: LTU2(reg, reg) “\tcmp \t%1,%0\n\tjlt \t%a\n” 2+4
stmt: LTF1(reg, reg) “\tcmp \t%1,%0\n\tjlt \t%a\n” 2+4
stmt: LTF2(reg, reg) “\tcmp \t%1,%0\n\tjlt \t%a\n” 2+4

stmt: NEI1(reg, reg) “\tcmp \t%1,%0\n\tjne \t%a\n” 2+4


stmt: NEI2(reg, reg) “\tcmp \t%1,%0\n\tjne \t%a\n” 2+4
stmt: NEU1(reg, reg) “\tcmp \t%1,%0\n\tjne \t%a\n” 2+4
stmt: NEU2(reg, reg) “\tcmp \t%1,%0\n\tjne \t%a\n” 2+4
stmt: NEF1(reg, reg) “\tcmp \t%1,%0\n\tjne \t%a\n” 2+4
stmt: NEF2(reg, reg) “\tcmp \t%1,%0\n\tjne \t%a\n” 2+4

reg: NEGI1(reg) “\tneg \t%0\n” 2


reg: NEGI2(reg) “\tneg \t%0\n” 2
reg: NEGF1(reg) “\tneg \t%0\n” 2
reg: NEGF2(reg) “\tneg \t%0\n” 2

con1: CNSTI1 “%a” range(a,1,1)

reg: LSHI1(reg, rc) “\trep \t%1\n\tasl \t%0\n” 4+2


reg: LSHI1(reg, con1) “\tasl \t%0\n” 4
reg: LSHI2(reg, rc) “\trep \t%1\n\tasl \t%0\n” 4+2
reg: LSHI2(reg, con1) “\tasl \t%0\n” 4

reg: LSHU1(reg, rc) “\trep \t%1\n\tlsl \t%0\n” 4+2
reg: LSHU1(reg, con1) “\tlsl \t%0\n” 4
reg: LSHU2(reg, rc) “\trep \t%1\n\tasl \t%0\n” 4+2
reg: LSHU2(reg, con1) “\tasl \t%0\n” 4

reg: RSHI1(reg, rc) “\trep \t%1\n\tasr \t%0\n” 4+2


reg: RSHI1(reg, con1) “\tasr \t%0\n” 4
reg: RSHI2(reg, rc) “\trep \t%1\n\tasr \t%0\n” 4+2
reg: RSHI2(reg, con1) “\tasr \t%0\n” 4
reg: RSHU1(reg, rc) “\trep \t%1\n\tlsr \t%0\n” 4+2
reg: RSHU1(reg, con1) “\tlsr \t%0\n” 4
reg: RSHU2(reg, rc) “\tmove \t#0,%02\n\trep \t%1\n\tasr \t%0\n” 4+2
reg: RSHU2(reg, con1) “\tmove \t#0,%02\n\tasr \t%0\n” 4

reg: BANDI1(reg, reg) “\tand \t%0,%1\n” 2


reg: BANDI2(reg, reg) “# \tand \t%0,%1\n” 2
reg: BANDU1(reg, reg) “\tand \t%0,%1\n” 2
reg: BANDU2(reg, reg) “# \tand \t%0,%1\n” 2

reg: BORI1(reg, reg) “\tor \t%0,%1\n” 2


reg: BORI2(reg, reg) “# \tor \t%0,%1\n” 2
reg: BORU1(reg, reg) “\tor \t%0,%1\n” 2
reg: BORU2(reg, reg) “# \tor \t%0,%1\n” 2

reg: BXORI1(reg, reg) “\teor \t%0,%1\n” 2


reg: BXORI2(reg, reg) “# \teor \t%0,%1\n” 2
reg: BXORU1(reg, reg) “\teor \t%0,%1\n” 2
reg: BXORU2(reg, reg) “# \teor \t%0,%1\n” 2

reg: BCOMI1(reg) “\tnot \t%0\n” 2


reg: BCOMI2(reg) “# \tnot \t%0\n” 2
reg: BCOMU1(reg) “\tnot \t%0\n” 2
reg: BCOMU2(reg) “# \tnot \t%0\n” 2

%%

static int equal(Node p, int a){


if(p->syms[0]->u.c.v.i == a){
return 0;
} else {
return LBURG_MAX;
}
}

static void progbeg(int argc, char **argv) {


int i, n;
char section[256];

time_t res = time(NULL);


print(“;;;; LCC-DSP56k - compiled on %s\n”, asctime(localtime(&res)));

parseflags(argc, argv);

ireg[0] = mkreg(“x0”, 0, 1, IREG);


ireg[1] = mkreg(“x1”, 1, 1, IREG);
ireg[2] = mkreg(“y0”, 2, 1, IREG);
ireg[3] = mkreg(“y1”, 3, 1, IREG);

ireg2[0] = mkreg(“x”, 0, 1, IREG);


ireg2[1] = mkreg(“y”, 2, 1, IREG);
ireg2[0]->x.regnode->mask = 3; // 00000011
ireg2[1]->x.regnode->mask = 12; // 00001100

areg[0] = mkreg(“a0”, 4, 1, IREG);


areg[1] = mkreg(“a1”, 5, 1, IREG);
areg[2] = mkreg(“b0”, 6, 1, IREG);
areg[3] = mkreg(“b1”, 7, 1, IREG);

areg2[0] = mkreg(“a”, 4, 1, IREG);


areg2[1] = mkreg(“b”, 6, 1, IREG);
areg2[0]->x.regnode->mask = 48; // 00110000
areg2[1]->x.regnode->mask = 192; // 11000000

for(i = 0; i < 8; i++) {


rreg[i] = mkreg(stringf(“r%d”, i), i+8, 1, IREG);
}

iregw = mkwildcard(ireg);
ireg2w = mkwildcard(ireg2);
aregw = mkwildcard(areg);
areg2w = mkwildcard(areg2);
rregw = mkwildcard(rreg);

tmask[IREG] = 0x0000BEFF; // R7,5-1 Y,X Y1,Y0,X1,X0


tmask[FREG] = 0x00000000;
vmask[IREG] = 0x00000000;
vmask[FREG] = 0x00000000;

// convert . to _ and remove the path in filename


// (../path/to/file.c -> file_c)
if(firstfile) {
for(i = n = 0; firstfile[i] && i < 255; i++){


char ch = firstfile[i] == ‘.’ ? ‘_’ : firstfile[i];


if(ch == ‘/’) {
n = 0;
} else {
section[n++] = ch;
}
}
section[n]=0;
print(“\tsection %s\n”, section);
}

// so - write symbols
// nomd - do not write macro defs
// rp - generate nop insn to accommodate pipeline delay
print(“\topt \tso,nomd,rp\n”);
}

static void progend(void) {


print(“\n\tendsec\n”);
print(“\tend\n”);
}

static Symbol rmap(int opk) {


debug2(fprint(stderr, “Inside %s: %d\n”, “rmap”, opk));
switch(optype(opk)) {
case P: case B:
return rregw;
break;
case I: case U: case F:
if(opsize(opk) == 1) {
return iregw;
} else {
return ireg2w;
}
break;
default:
return 0;
}
}

static void segment(int n) {


debug2(fprint(stderr, “Inside %s: %d\n”, “segment”, n));
if(n == cseg) {
return;
}
cseg = n;
if(n == CODE){
print(“\torg p:\n”);
} else {
print(“\torg y:\n”);
}
}

static void target(Node p) {


debug2(fprint(stderr, “Inside %s: %x\n”, “target”, p));
switch(specific(p->op)) {
case CALL+B:
setreg(p, areg2[0]); //a
//rtarget(p, 1, areg2[0]); //a
break;
case ADD+I: case SUB+I: case ADD+U: case SUB+U:
case ADD+F: case SUB+F:
setreg(p, areg2[0]); //a
rtarget(p, 0, areg2[0]); //a
break;
case MUL+I: case MUL+U: case MUL+F:
if(opsize(p->op) == 1) {
rtarget(p, 0, ireg[2]); //y0 This is bad but prevents this
rtarget(p, 1, ireg[3]); //y1 error: a*b*c -> mpy y0,y1,b
setreg(p, areg2[1]); //b mpy b,x1,b
} else {
rtarget(p, 0, areg2[0]); //a
rtarget(p, 1, areg2[1]); //b
setreg(p, areg2[1]); //b
}
break;
case DIV+I: case DIV+U: case DIV+F:
if(opsize(p->op) == 1) {
rtarget(p, 0, areg2[0]); //a
rtarget(p, 1, ireg[0]); //x0
setreg(p, areg2[0]); //a
} else {
rtarget(p, 0, areg2[0]); //a
rtarget(p, 1, areg2[1]); //b
setreg(p, areg2[0]); //a
}
break;
case MOD+I: case MOD+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, areg2[0]); //a
rtarget(p, 1, ireg[0]); //x0

setreg(p, areg2[0]); //a
} else {
rtarget(p, 0, areg2[0]); //a
rtarget(p, 1, areg2[1]); //b
setreg(p, areg2[0]); //a
}
break;
case LSH+I: case RSH+I: case LSH+U: case RSH+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, areg2[0]); //a
setreg(p, areg[1]); //a1
} else {
rtarget(p, 0, areg2[0]); //a
setreg(p, areg2[0]); //a
}
break;
case NEG+I: case NEG+F:
rtarget(p, 0, areg2[0]); //a
setreg(p, areg2[0]); //a
break;
case BAND+I: case BOR+I: case BAND+U: case BOR+U:
case BXOR+I: case BXOR+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, ireg[3]); //y1
rtarget(p, 1, areg[1]); //a1
setreg(p, areg[1]); //a1
} else {
rtarget(p, 1, areg2[0]); //a
setreg(p, areg2[0]); //a
}
break;
case BCOM+I: case BCOM+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, areg[1]); //a1
setreg(p, areg[1]); //a1
} else {
rtarget(p, 0, areg2[0]); //a
setreg(p, areg2[0]); //a
}
break;
case EQ+I: case GE+I: case GT+I: case LE+I: case LT+I: case NE+I:
case EQ+U: case GE+U: case GT+U: case LE+U: case LT+U: case NE+U:
case EQ+F: case GE+F: case GT+F: case LE+F: case LT+F: case NE+F:
rtarget(p, 0, areg2[0]); //a
if(opsize(p->op) == 2) {
rtarget(p, 1, areg2[1]); //b
}
break;
case CALL+I: case CALL+U: case CALL+F: case CALL+P: case CALL+V:
setreg(p, areg2[0]); //a
break;
case RET+I: case RET+U: case RET+F:
rtarget(p, 0, areg2[0]); //a
break;
case RET+P:
rtarget(p, 0, areg[1]); //a1
break;
case CVI+I: case CVI+U: case CVI+F: case CVU+I: case CVU+U:
case CVF+I: case CVF+F:
if(opsize(p->op) == 1){
setreg(p, areg2[0]); //a
} else {
setreg(p, areg2[0]); //a
}
break;
case LOAD+I: case LOAD+U: case LOAD+F:
if(p->kids[0]->x.inst == _rreg_NT){
break;
}
if(opsize(p->op) == 1){
rtarget(p, 0, areg2[0]); //a
} else {
rtarget(p, 0, areg2[0]); //a
}
break;
default:
break;
}
}

static void clobber(Node p) {


debug2(fprint(stderr, “Inside %s: %x\n”, “clobber”, p));
assert(p);
}

static char *reg_name(int reg){


if(reg >= 0 && reg <= 3){
return ireg[reg]->x.name;
} else if(reg == 4 || reg == 6){
return areg2[reg/2 - 2]->x.name;
} else if(reg == 5 || reg == 7){
return areg[reg - 4]->x.name;


} else {
assert(0);
}
}

static void emit2(Node p) {


debug2(fprint(stderr, “Inside %s: %x\n”, “emit2”, p));
switch(specific(p->op)) {
case CNST+F:
assert(0);
break;
case CNST+I: case CNST+U: {
assert(opsize(p->op) == 2);
//reg: CNSTI2 “# \tmove \t%a,%c\n” 0
int reg = getregnum(p);
print(“\tmove \t#>%d,%s\n”,
(int)(p->syms[0]->u.c.v.i & 0xFFFFFF), reg_name(reg));
print(“\tmove \t#>%d,%s\n”,
(int)(p->syms[0]->u.c.v.i >> 24), reg_name(reg + 1));
break;
}
case INDIR+I: case INDIR+U: case INDIR+F: {
if(p->kids[0]->op != VREG+P){
if(opsize(p->op) == 2){
int dst = getregnum(p);
//reg: INDIRI2(addr) “\tmove \t%0,%c\n”
print(“\tmove \t”);
emitasm(p->kids[0], _addr_NT);
if(p->kids[0]->x.inst == _rreg_NT){
print(“+,%s\n”, reg_name(dst));
} else {
print(“,%s\n”, reg_name(dst));
}

print(“\tmove \t”);
emitasm(p->kids[0], _addr_NT);
if(p->kids[0]->x.inst == _rreg_NT){
print(“-,%s\n”, reg_name(dst + 1));
} else {
print(“+1,%s\n”, reg_name(dst + 1));
}
}
}
break;
}
case ASGN+I: case ASGN+U: case ASGN+F: {
if(p->kids[0]->op != VREG+P){
if(opsize(p->op) == 2){
int src = getregnum(p->kids[1]);
//stmt: ASGNI2(addr, reg) “\tmove \t%1,%0\n”
print(“\tmove \t%s,”, reg_name(src));
emitasm(p->kids[0], _addr_NT);
if(p->kids[0]->x.inst == _rreg_NT){
print(“+\n”);
} else {
print(“\n”);
}

print(“\tmove \t%s,”, reg_name(src+1));


emitasm(p->kids[0], _addr_NT);
if(p->kids[0]->x.inst == _rreg_NT){
print(“-\n”);
} else {
print(“+1\n”);
}
}
}
break;
}
case ASGN+B: {
int src = getregnum(p->x.kids[1]) - 8;
int dst = getregnum(p->x.kids[0]) - 8;
int lab = genlabel(1);
print(“\tmove \tr%d,n%d x1,Y:(r6)\n”, src, src);
print(“\tmove \tr%d,n%d\n”, dst, dst);
print(“\tdo \t#%d,L%d\n”, (int)p->syms[0]->u.c.v.i, lab);
print(“\tmove \tY:(r%d)+,x0\n”, src);
print(“\tmove \tx0,Y:(r%d)+\n”, dst);
print(“L%d:\n”, lab);
print(“\tmove \tn%d,r%d\n”, src, src);
print(“\tmove \tn%d,r%d Y:(r6),x1\n”, dst, dst);
break;
}
case ADD+P:
//rreg: ADDP1(rreg, rc) “# \tmove \t%1,n8\n\tlua \t(%0)+n8,%c\n”
//rreg: ADDP1(rc, rreg) “# \tmove \t%0,n8\n\tlua \t(%1)+n8,%c\n”
if(p->kids[0]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[0]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[1], _rc_NT);
print(“,n%d\n\tlua \t(r%d)+n%d,r%d\n”, reg, reg, reg, dst);

} else if(p->kids[1]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[1]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[0], _rc_NT);
print(“,n%d\n\tlua \t(r%d)+n%d,r%d\n”, reg, reg, reg, dst);
} else
assert(0);
break;
case SUB+P:
//rreg: SUBP1(rc, rreg) “# \tmove \t%0,n8\n\tlua \t(%1)-n8,%c\n”
//rreg: SUBP1(rreg, rc) “# \tmove \t%1,n8\n\tlua \t(%0)-n8,%c\n”
if(p->kids[0]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[0]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[1], _rc_NT);
print(“,n%d\n\tlua \t(r%d)-n%d,r%d\n”, reg, reg, reg, dst);
} else if(p->kids[1]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[1]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[0], _rc_NT);
print(“,n%d\n\tlua \t(r%d)-n%d,r%d\n”, reg, reg, reg, dst);
} else
assert(0);
break;
case MUL+I: case MUL+U: case MUL+F: {
//reg: MULI1(reg, reg) “\tmpy \t%0,%1,%c\n\tasr \t%c\n
// \tmove \t%c0,%c\n”
if(opsize(p->op) == 1) {
int s1 = getregnum(p->x.kids[0]);
int s2 = getregnum(p->x.kids[1]);
int d = getregnum(p);
char *src1 = p->x.kids[0]->syms[RX]->x.name;
char *src2 = p->x.kids[1]->syms[RX]->x.name;
if(s1 == s2){
//temp_reg = x1, x0 if x1 is taken
char *temp_reg = (s1 == 0) ? (ireg[1]->x.name) : (ireg[0]->x.name);
print(“\tmove \t%s,Y:(r6)\n”, temp_reg);
print(“\tmove \t%s,%s\n”, src2, temp_reg);
print(“\tmpy \t%s,%s,%s\n”,src1, temp_reg, p->syms[RX]->x.name);
if(optype(p->op) == I || optype(p->op) == U){
print(“\tasr \t%s\n”, p->syms[RX]->x.name);
print(“\tmove \t%s0,%s Y:(r6),%s\n”, p->syms[RX]->x.name,
p->syms[RX]->x.name, temp_reg);
}
} else {
print(“\tmpy \t%s,%s,%s\n”,src1, src2, p->syms[RX]->x.name);
if(optype(p->op) == I || optype(p->op) == U){
print(“\tasr \t%s\n”, p->syms[RX]->x.name);
print(“\tmove \t%s0,%s\n”, p->syms[RX]->x.name, p->syms[RX]->x.name);
}
}
} else {
// long mpy - copied from Motorola
// a and b are the input registers and b is the output register
int lab1 = genlabel(1);
int lab2 = genlabel(1);
if(optype(p->op) == F){
warning(“double multiply is not implemented; using long multiply instead\n”);
}
print(“\t;; begin long multiply\n”);
print(“\tmove\tn0,y:(r6)+\n”);
print(“\tmove\ta0,y:(r6)+\n”);
print(“\tmove\ta1,y:(r6)+\n”);
print(“\tmove\tx0,y:(r6)+\n”);
print(“\tmove\tx1,y:(r6)+\n”);
print(“\tmove\ty0,y:(r6)+\n”);
print(“\tmove\t#$0,n0\n”);
print(“\tmove\ta1,x0\n”);
print(“\teor\tx0,b b1,x1\n”);
print(“\tjpl\tL%d\n”, lab1);
print(“\tmove\t#$1,n0\n”);
print(“L%d:\n”, lab1);
print(“\tabs\ta x1,b1\n”);
print(“\tabs\tb y1,y:(r6)+\n”);
print(“\tmove\tb0,y:(r6)+\n”);
print(“\tmove\tb1,y:(r6)\n”);
print(“\tclr\tb b1,x0\n”);
print(“\tmove\tx0,b0\n”);
print(“\tasl\tb\n”);
print(“\tasl\tb\n”);
print(“\tclr\tb b1,x0\n”);
print(“\tmove\ta1,b0\n”);
print(“\tasl\tb\n”);
print(“\tasl\tb #$7fffff,y1\n”);
print(“\tmove\tb1,x1\n”);
print(“\tmove\ty:(r6)-,b\n”);
print(“\tmove\ty:(r6),b0\n”);
print(“\tmove\tx0,y:(r6)+\n”);
print(“\tmove\tx1,y:(r6)\n”);
print(“\tasl\tb\n”);
print(“\tand\ty1,b\n”);
print(“\tmove\tb1,x0\n”);
print(“\tasr\tb\n”);
print(“\tmove\tb0,b\n”);
print(“\tand\ty1,b\n”);
print(“\tmove\tb1,x1\n”);
print(“\tasl\ta\n”);
print(“\tand\ty1,a y1,b1\n”);
print(“\tmove\ta1,y0\n”);
print(“\tasr\ta\n”);
print(“\tmove\ta0,y1\n”);
print(“\tand\ty1,b\n”);
print(“\tmove\tb1,y1\n”);
print(“\tmpy\tx1,y1,b\n”);
print(“\tmpy\tx1,y0,a\n”);
print(“\tmac\tx0,y1,a\n”);
print(“\tmove\ta1,a2\n”);
print(“\tmove\ta0,a1\n”);
print(“\tmove\t#$0,a0\n”);
print(“\tasr\ta\n”);
print(“\tadd\ta,b\n”);
print(“\tmpy\tx0,y0,a\n”);
print(“\tmove\ty:(r6)-,y0\n”);
print(“\tmove\ty:(r6)-,x0\n”);
print(“\tmac\ty0,x1,a\n”);
print(“\tmac\tx0,y1,a\n”);
print(“\tclr\ta a0,x0\n”);
print(“\tmove\tx0,a2\n”);
print(“\tasr\ta y:(r6)-,y1\n”);
print(“\tasr\ta y:(r6)-,y0\n”);
print(“\tadd\ta,b y:(r6)-,x1\n”);
print(“\tasr\tb y:(r6)-,x0\n”);
print(“\tmove\tn0,a\n”);
print(“\ttst\ta y:(r6)-,a\n”);
print(“\tjeq\tL%d\n”, lab2);
print(“\tneg\tb\n”);
print(“L%d:\n”, lab2);
print(“\ttst\tb y:(r6)-,a0\n”);
print(“\tmove\ty:(r6),n0\n”);
print(“\t;; end long multiply\n”);
}
break;
}
case DIV+I: case DIV+U: case DIV+F:{
if(opsize(p->op) == 1){
char *src = ireg[getregnum(p->x.kids[1])]->x.name;
char *dst = p->syms[RX]->x.name;
int lab = genlabel(1);
print(“\tmove \tb,Y:(r6)+\n”);
print(“\tabs \t%s %s,b\n”, dst, dst); //b
if(optype(p->op) == I || optype(p->op) == U){
print(“\tclr \t%s %s1,Y:(r6)\n”, dst, dst);
print(“\tmove \tY:(r6),%s0\n”, dst);
print(“\tasl \t%s\n”, dst);
}
print(“\trep \t#24\n”);
print(“\tdiv \t%s,%s\n”, src, dst);
print(“\teor \t%s,b\n”, src); //b
print(“\tjpl \tL%d\n”, lab);
print(“\tneg \t%s\n”, dst);
print(“L%d:\tmove \t%s0,%s\n”, lab, dst, dst);
print(“\tmove \tY:-(r6),b\n”);
} else {
// long div - copied from Motorola
// a and b are the input registers and a is the output register
int lab[11];
int i;
for(i = 0; i < 11; i++){
lab[i] = genlabel(1);
}
if(optype(p->op) == F){
warning(“double division is not implemented; using long division instead\n”);
}
print(“\t;;-- begin long division\n”);
print(“move n0,y:(r6)+\n”);
print(“move b0,y:(r6)+\n”);
print(“move b1,y:(r6)+\n”);
print(“move x0,y:(r6)+\n”);
print(“move x1,y:(r6)+\n”);
print(“move y0,y:(r6)+\n”);
print(“move y1,y:(r6)+\n”);
print(“\n”);
print(“move #$0,n0\n”);
print(“move b1,y1\n”);
print(“eor y1,a a1,y0\n”);
print(“jpl L%d\n”, lab[2]);
print(“move #$1,n0\n”);
print(“L%d:\n”, lab[2]);
print(“abs b y0,a1\n”);
print(“abs a #$0,x1\n”);
print(“\n”);

print(“move b1,y1\n”);
print(“clr b b0,y0\n”);
print(“move b1,y:(r6)\n”);
print(“\n”);
print(“ori #$04,mr\n”);
print(“asl a #>$1,x0\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[3]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[3]);
print(“andi #$f3,mr\n”);
print(“div x1,b\n”);
print(“\n”);
print(“do #$2f,L%d\n”, lab[4]);
print(“btst #23,y:(r6)\n”);
print(“jcs L%d\n”, lab[5]);
print(“add x,a\n”);
print(“move #$0,a2\n”);
print(“ori #$04,mr\n”);
print(“asl a\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[6]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[6]);
print(“andi #$f3,mr\n”);
print(“div x1,b\n”);
print(“sub y,b\n”);
print(“jmp L%d\n”, lab[7]);
print(“L%d:\n”, lab[5]);
print(“move #$0,a2\n”);
print(“ori #$04,mr\n”);
print(“asl a\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[8]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[8]);
print(“andi #$f3,mr\n”);
print(“div x1,b\n”);
print(“add y,b\n”);
print(“L%d:\n”, lab[7]);
print(“move a1,x1\n”);
print(“move b1,a1\n”);
print(“eor y1,a\n”);
print(“move b1,y:(r6)\n”);
print(“move x1,a1\n”);
print(“move #$0,x1\n”);
print(“L%d:\n”, lab[4]);
print(“btst #23,y:(r6)\n”);
print(“jcs L%d\n”, lab[9]);
print(“add x,a\n”);
print(“L%d:\n”, lab[9]);
print(“move #$80,x1\n”);
print(“eor x1,a (r6)-\n”);
print(“move #$0,a2\n”);
print(“\n”);
print(“move n0,b\n”);
print(“tst b\n”);
print(“jeq L%d\n”, lab[10]);
print(“neg a\n”);
print(“L%d:\n”, lab[10]);
print(“move y:(r6)-,y1\n”);
print(“move y:(r6)-,y0\n”);
print(“move y:(r6)-,x1\n”);
print(“move y:(r6)-,x0\n”);
print(“move y:(r6)-,b\n”);
print(“tst a y:(r6)-,b0\n”);
print(“move y:(r6),n0\n”);
print(“\t;;-- end long division\n”);
}
break;
}
case MOD+I: case MOD+U: {
if(opsize(p->op) == 1){
char *src = ireg[getregnum(p->x.kids[1])]->x.name;
char *dst = p->syms[RX]->x.name;
int lab = genlabel(1);
print(“\tmove \tb,Y:(r6)+\n”);
print(“\tabs \t%s %s,b\n”, dst, dst); //b
print(“\tclr \t%s %s1,Y:(r6)\n”, dst, dst);
print(“\tmove \tY:(r6),%s0\n”, dst);
print(“\tasl \t%s\n”, dst);
print(“\trep \t#24\n”);
print(“\tdiv \t%s,%s\n”, src, dst);

print(“\tmove \t%s1,Y:(r6)\n”, dst);


print(“\tmove \t%s,%s\n”, src, dst);
print(“\tabs \t%s Y:(r6),%s\n”, dst, src);//destroy x0 (src)
print(“\tadd \t%s,a\n”, src);
print(“\tasr \t%s\n”, dst);
print(“\ttst \tb\n”); //b

print(“\tjge \tL%d\n”, lab);


print(“\tneg \t%s\n”, dst);


print(“L%d:\n”, lab);
print(“\tmove \tY:-(r6),b\n”);
} else {
// long modulo - copied from Motorola
// a and b are the input registers and a is the output register
int lab[14];
int i;
for(i = 0; i < 14; i++){
lab[i] = genlabel(1);
}
print(“\t;;-- begin long modulo\n”);
print(“move n0,y:(r6)+\n”);
print(“move b0,y:(r6)+\n”);
print(“move b1,y:(r6)+\n”);
print(“move x0,y:(r6)+\n”);
print(“move x1,y:(r6)+\n”);
print(“move y0,y:(r6)+\n”);
print(“move y1,y:(r6)+\n”);
print(“move r0,y:(r6)+\n”);
print(“move r1,y:(r6)+\n”);
print(“move #$0,n0\n”);
print(“tst a\n”);
print(“jpl L%d\n”, lab[12]);
print(“abs a\n”);
print(“move #$1,n0\n”);
print(“L%d:\n”, lab[12]);
print(“move b1,y1\n”);
print(“tfr a,b b0,y0\n”);
print(“\n”);
print(“clr a #>$1,x0\n”);
print(“tst b #$0,x1\n”);
print(“jpl L%d\n”, lab[1]);
print(“sub x,a\n”);
print(“L%d:\n”, lab[1]);
print(“move b1,x1\n”);
print(“move a1,b1\n”);
print(“eor y1,b r6,r0\n”);
print(“move b1,y:(r6)+\n”);
print(“move r6,r1\n”);
print(“move b1,y:(r6)+\n”);
print(“move x1,b1\n”);
print(“move #$0,b2\n”);
print(“ori #$04,mr\n”);
print(“asl b #$0,x1\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[2]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[2]);
print(“andi #$f3,mr\n”);
print(“div x1,a\n”);
print(“\n”);
print(“do #$2f,L%d\n”, lab[3]);
print(“btst #23,y:(r1)\n”);
print(“jcs L%d\n”, lab[4]);
print(“add x,b\n”);
print(“move #$0,b2\n”);
print(“ori #$04,mr\n”);
print(“asl b\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[5]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[5]);
print(“andi #$f3,mr\n”);
print(“div x1,a\n”);
print(“sub y,a\n”);
print(“jmp L%d\n”, lab[6]);
print(“L%d:\n”, lab[4]);
print(“move #$0,b2\n”);
print(“ori #$04,mr\n”);
print(“asl b\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[7]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[7]);
print(“andi #$f3,mr\n”);
print(“div x1,a\n”);
print(“add y,a\n”);
print(“L%d:\n”, lab[6]);
print(“move b1,x1\n”);
print(“move a1,b1\n”);
print(“eor y1,b\n”);
print(“move b1,y:(r1)\n”);
print(“move x1,b1\n”);
print(“move #$0,x1\n”);
print(“L%d:\n”, lab[3]);
print(“btst #23,y:(r1)\n”);
print(“jcs L%d\n”, lab[8]);
print(“add x,b\n”);
print(“L%d:\n”, lab[8]);
print(“move #$80,x1\n”);
print(“eor x1,b\n”);

print(“move #$0,x1\n”);
print(“btst #23,y:(r0)\n”);
print(“jcc L%d\n”, lab[9]);
print(“add x,b\n”);
print(“L%d:\n”, lab[9]);
print(“move b1,x1\n”);
print(“move y:(r0),b1\n”);
print(“move y:(r1),x0\n”);
print(“eor x0,b\n”);
print(“move x1,b1\n”);
print(“jpl L%d\n”, lab[10]);
print(“btst #23,y:(r0)\n”);
print(“jcc L%d\n”, lab[11]);
print(“sub y,a\n”);
print(“jmp L%d\n”, lab[10]);
print(“L%d:\n”, lab[11]);
print(“add y,a\n”);
print(“L%d:\n”, lab[10]);
print(“move n0,b\n”);
print(“tst b\n”);
print(“jeq L%d\n”, lab[13]);
print(“neg a\n”);
print(“L%d:\n”, lab[13]);
print(“move (r6)-\n”);
print(“move (r6)-\n”);
print(“move (r6)-\n”);
print(“move y:(r6)-,r1\n”);
print(“move y:(r6)-,r0\n”);
print(“move y:(r6)-,y1\n”);
print(“move y:(r6)-,y0\n”);
print(“move y:(r6)-,x1\n”);
print(“move y:(r6)-,x0\n”);
print(“move y:(r6)-,b\n”);
print(“tst a y:(r6)-,b0\n”);
print(“move y:(r6),n0\n”);
print(“\t;;-- end long modulo\n”);
}
break;
}
case ARG+I: case ARG+U: case ARG+F:{
if(opsize(p->op) == 2){
int src = getregnum(p->x.kids[0]);
//stmt: ARGI2(reg) “# \tmove \t%0,Y:(r6)+\n”
print(“\tmove \t%s,Y:(r6)+\n”, reg_name(src));
print(“\tmove \t%s,Y:(r6)+\n”, reg_name(src + 1));
}
break;
}
case ARG+B: {
int src = getregnum(p->x.kids[0]) - 8;
int label = genlabel(1);
int size = p->syms[0]->u.c.v.i;
print(“\tmove \t#%d,n6\n”, size);
print(“\tmove \tr%d,n%d\n”, src, src);
print(“\tmove \tx0,Y:(r6+n6)\n”);
if(size > 1) {
print(“\tdo \t#%d,L%d\n”, size , label);
}
print(“\tmove \tY:(r%d)+,x0\n”, src);
print(“\tmove \tx0,Y:(r6)+\n”);
if(size > 1) {
print(“L%d:\n”, label);
}
print(“\tmove \tn%d,r%d\n”, src, src);
print(“\tmove \tY:(r6),x0\n”);
break;
}
case BAND+I: case BAND+U:
case BOR+I: case BOR+U:
case BXOR+I: case BXOR+U: {
//reg: BANDI2(reg, reg) “\tand \t%0,%1\n”
int src = getregnum(p->x.kids[0]);
int dst = getregnum(p->x.kids[1]);
print(“\tand \t%s,”, reg_name(src + 1));
emitasm(p->kids[1], _reg_NT);
print(“\t%s,Y:(r6)\n”, reg_name(dst));
print(“\tmove \t%s,%s\n”, reg_name(dst + 1), reg_name(dst));
print(“\tmove \tY:(r6),%s\n”, reg_name(dst + 1));

switch(generic(p->op)){
case BAND:
print(“\tand \t%s,”, reg_name(src));
break;
case BOR:
print(“\tor \t%s,”, reg_name(src));
break;
case BXOR:
print(“\teor \t%s,”, reg_name(src));
break;
}

emitasm(p->kids[1], _reg_NT);


print(“\n\tmove \t%s,Y:(r6)\n”, reg_name(dst + 1));


print(“\tmove \t%s,”, reg_name(dst));
emitasm(p->kids[1], _reg_NT);
print(“\n\tmove \tY:(r6),%s\n”, reg_name(dst));
break;
}
case BCOM+I: case BCOM+U: {
//reg: BCOMI2(reg) “\tnot \t%0\n”
int src = getregnum(p->x.kids[0]);
print(“\tnot \t%s\n”, reg_name(src + 1));
print(“\tmove \t%s,Y:(r6)\n”, reg_name(src + 1));
print(“\tmove \t%s,%s\n”, reg_name(src), reg_name(src + 1));
print(“\tnot \t%s\n”, reg_name(src + 1));
print(“\tmove \t%s,%s\n”, reg_name(src + 1), reg_name(src));
print(“\tmove \tY:(r6),%s\n”, reg_name(src + 1));
break;
}
case RET+I: case RET+U: case RET+F:
break;
case CVI+I: case CVI+U: case CVI+F: case CVU+I: case CVU+U:
case CVF+I: case CVF+F: {
char *src = p->kids[0]->syms[RX]->x.name;
char *dst = p->syms[RX]->x.name;
if(src == areg2[1]->x.name){
print(“\ttfr \t%s,%s\n”, src, dst);
} else {
print(“\tmove \t%s,%s\n”, src, dst);
}
break;
}
case LOAD+I: case LOAD+U: case LOAD+F:
assert(opsize(p->op) == 2);
if(opsize(p->x.kids[0]->op) == 2){
int src = getregnum(p->x.kids[0]);
int dst = getregnum(p);
print(“\tmove \t%s,%s\n”, reg_name(src), reg_name(dst));
print(“\tmove \t%s,%s\n”, reg_name(src+1), reg_name(dst+1));
} else {
print(“\tmove \t%s,%s\n”, p->kids[0]->syms[RX]->x.name, p->syms[RX]->x.name);
assert(0);
}
break;
default:
break;
}
}

static void doarg(Node p) {


debug2(fprint(stderr, “Inside %s: %x\n”, “doarg”, p));
mkactual(1, p->syms[0]->u.c.v.i);
}

static void blkfetch(int k, int off, int reg, int tmp) {


debug2(fprint(stderr, “Inside %s\n”, “blkfetch”));
}

static void blkstore(int k, int off, int reg, int tmp) {


debug2(fprint(stderr, “Inside %s\n”, “blkstore”));
}

static void blkloop(int dreg, int doff, int sreg, int soff,
int size, int tmps[]) {
debug2(fprint(stderr, “Inside %s\n”, “blkloop”));
}

static void local(Symbol p) {


debug2(fprint(stderr, “Inside %s: %s, retstruct: %d\n”, “local”, p->name, retstruct));
if (retstruct) {
assert(p == retv);
p->sclass = REGISTER;
if(askregvar(p, rreg[7]) == 0) {
assert(0);
}
retstruct = -1;
return;
}
if(askregvar(p, rmap(ttob(p->type))) == 0) {
mkauto(p);
}
}

static void function(Symbol f, Symbol caller[], Symbol callee[], int n) {


int i;

debug2(fprint(stderr, “Inside %s: %s, %d\n”, “function”, f->name, n));

usedmask[0] = usedmask[1] = 0;
freemask[0] = freemask[1] = ~(unsigned)0;

offset = -3;
for(i = 0; callee[i]; i++) {

Symbol p = callee[i];
Symbol q = caller[i];
assert(q);
if(q->type->size == 2) { //long, double
p->x.offset = q->x.offset = offset - 1;
p->x.name = q->x.name = stringf(“%d”, offset - 1);
} else { //char, short, int, float
p->x.offset = q->x.offset = offset;
p->x.name = q->x.name = stringf(“%d”, offset);
}
p->sclass = q->sclass = AUTO;
offset -= q->type->size;
}
assert(caller[i] == 0);
offset = maxoffset = 0;

retstruct = isstruct(freturn(f->type));

debug2(fprint(stderr, “gencode(%s):\n”, f->name));


gencode(caller, callee);

print(“\n ;;;; Function %s starts\n”, f->name);


print(“\tglobal\t%s\n”, f->x.name);
print(“%s:\n”, f->x.name);
print(“\tmove \tr0,Y:(r6)+ \t;; save frame pointer\n”);
print(“\tlua \t(r6)+,r0 \t;; set new frame pointer\n”);
print(“\tmove \tssh,Y:(r6)+ \t;; save return address\n”);

debug2(fprint(stderr, “usedmask[IREG]: %x, usedmask[FREG]: %x\n”,


usedmask[IREG], usedmask[FREG]));

for(i = 0; i < 4; i++) {


if(usedmask[IREG] & (1 << i)) {
print(“\tmove \t%s,Y:(r6)+ \t;; save ireg\n”,
ireg[i]->x.name);
}
}
for(i = 6; i < 8; i++) {
if(usedmask[IREG] & (1 << i)) {
print(“\tmove \t%s,Y:(r6)+ \t;; save areg\n”,
areg[i-4]->x.name);
}
}
for(i = 8; i < 16; i++) {
if(usedmask[IREG] & (1 << i)) {
print(“\tmove \t%s,Y:(r6)+ \t;; save rreg\n”,
rreg[i-8]->x.name);
}
}

debug2(fprint(stderr, “offset: %d, maxoffset: %d, argoffset: %d, maxargoffset: %d\n”, offset, maxoffset,
argoffset, maxargoffset));
if(maxoffset > 0) {
print(“\tmove \t#%d,n6 \t\t;; update stack with local and temp offset\n”, maxoffset);
print(“\tmove \t(r6)+n6 \t;;\n”);
}

if(retstruct == -1){
print(“\tmove \ta,r7\n”);
retstruct = 0;
}

debug2(fprint(stderr, “emitcode(%s):\n”, f->name));


emitcode();

if(maxoffset > 0) {
print(“\tmove \t#%d,n6 \t\t;; restore stack\n”, maxoffset);
print(“\tmove \t(r6)-n6 \t;;\n”);
}

for(i = 15; i >= 8; i--) {


if(usedmask[IREG] & (1 << i)) {
print(“\tmove \tY:-(r6),%s \t;; restore rreg\n”,
rreg[i-8]->x.name);
}
}

if(usedmask[IREG] & (1 << 7)) { //FIX b1.b0 ->b.b0


print(“\tmove \tY:-(r6),%s \t;; restore areg\n”,
areg2[1]->x.name);
}
if(usedmask[IREG] & (1 << 6)) {
print(“\tmove \tY:-(r6),%s \t;; restore areg\n”,
areg[2]->x.name);
}

for(i = 3; i >= 0; i--) {


if(usedmask[IREG] & (1 << i)) {
print(“\tmove \tY:-(r6),%s \t;; restore ireg\n”,
ireg[i]->x.name);
}
}


print(“\tmove \tY:-(r6),ssh \t;; restore return address\n”);


print(“\ttst \ta Y:-(r6),r0 \t;; restore frame pointer and test a\n”);
print(“\trts \n”);
print(“ ;;;; Function %s ends\n\n”, f->name);
}

static void defsymbol(Symbol p) {


debug2(fprint(stderr, “Inside %s: %s -> “, “defsymbol”, p->name));
if(p->scope >= LOCAL && p->sclass == STATIC) {
p->x.name = stringf(“F__%s%d”, cfunc->name, genlabel(1));
} else if(p->generated) {
p->x.name = stringf(“L%s”, p->name);
} else if(p->scope == CONSTANTS
&& (isint(p->type) || isptr(p->type))) {
if(p->name[0] == ‘0’ && p->name[1] == ‘x’) {
p->x.name = stringf(“$%s”, &p->name[2]);
} else {
p->x.name = p->name;
}
} else {
p->x.name = stringf(“F%s”, p->name);
}
debug2(fprint(stderr, “%s\n”, p->x.name));
}

static void address(Symbol q, Symbol p, long n) {


debug2(fprint(stderr, “Inside %s: %s, %s, %d\n”, “address”, q->name, p->name, n));
if (p->scope == GLOBAL || p->sclass == STATIC || p->sclass == EXTERN)
q->x.name = stringf(“%s%s%D”, p->x.name, n >= 0 ? “+” : “”, n);
else {
assert(n <= INT_MAX && n >= INT_MIN);
q->x.offset = p->x.offset + n;
q->x.name = stringd(q->x.offset);
}
}

static void defconst(int suffix, int size, Value v) {
    debug2(fprint(stderr, "Inside %s: %d, %d, %D \n", "defconst", suffix, size, v.i));
    if (suffix == F) {
        if(size == 1) {
            unsigned u;
            if(v.d == 1.0){
                u = 0x7FFFFF;   /* +1.0 is not representable, use the largest fraction */
            } else {
                /* go through a signed type: negative double to unsigned is undefined */
                u = (unsigned)(long)(v.d * 8388608) & 0xFFFFFF;
            }
            print("\tdc\t$%x\n", u);
        } else {
            long long u;
            if(v.d == 1.0){
                u = 0x7FFFFFFFFFFFLL;
            } else {
                u = (long long)(v.d * 16777216*8388608) & 0xFFFFFFFFFFFFLL;
            }
            print("\tdc\t$%x\n", (unsigned)(u & 0xFFFFFF));
            print("\tdc\t$%x\n", (unsigned)(u >> 24));
        }
    } else if (suffix == P) {
        print("\tdc\t$%x\n", (unsigned)v.p);
    } else if (size == 1) {
        print("\tdc\t$%x\n", (unsigned)(suffix == I ? v.i : v.u));
    } else if (size == 2) {
        /* %x expects an unsigned argument, so cast to unsigned as above */
        print("\tdc\t$%x\n", (unsigned)((suffix == I ? v.i : v.u) & 0xFFFFFF));
        print("\tdc\t$%x\n", (unsigned)((suffix == I ? v.i : v.u) >> 24));
    }
}

static void defaddress(Symbol p) {
    debug2(fprint(stderr, "Inside %s: %s\n", "defaddress", p->name));
    print("\tdc\t%s\n", p->x.name);
}

static void defstring(int n, char *str) {
    int i;
    debug2(fprint(stderr, "Inside %s: ", "defstring"));
    if(n == -1){
        print("\tdc\t%d\n", *(int *)str);
    } else {
        for(i = 0; i < n; i++) {
            debug2(fprint(stderr, "%d - %c, ", str[i], str[i]));
            print("\tdc\t%d\n", str[i]);
        }
    }
    debug2(fprint(stderr, ", %d\n", n));
}

static void export(Symbol p) {
    debug2(fprint(stderr, "Inside %s: %s\n", "export", p->name));
}

static void import(Symbol p) {
    debug2(fprint(stderr, "Inside %s: %s\n", "import", p->name));
}

static void global(Symbol p) {
    debug2(fprint(stderr, "Inside %s: %s\n", "global", p->name));
    print("\tglobal\t%s\n%s\n", p->x.name, p->x.name);
}

static void space(int n) {
    debug2(fprint(stderr, "Inside %s: %d\n", "space", n));
    print("\tbsc\t%d\n", n);
}

Interface dsp56kIR = {
    1, 1, 0, /* char */
    1, 1, 0, /* short */
    1, 1, 0, /* int */
    2, 2, 0, /* long */
    2, 2, 0, /* long long */
    1, 1, 1, /* float */
    2, 2, 1, /* double */
    2, 2, 1, /* long double */
    1, 1, 0, /* T * */
    0, 1, 0, /* struct */
    1, /* little_endian */
    0, /* mulops_calls */
    1, /* wants_callb */
    1, /* wants_argb */
    0, /* left_to_right */
    0, /* wants_dag */
    0, /* unsigned_char */
    address,
    blockbeg,
    blockend,
    defaddress,
    defconst,
    defstring,
    defsymbol,
    emit,
    export,
    function,
    gen,
    global,
    import,
    local,
    progbeg,
    progend,
    segment,
    space,
    0, 0, 0, 0, 0, 0, 0,
    {
        1, /* max_unaligned_load */
        rmap,
        blkfetch, blkstore, blkloop,
        _label,
        _rule,
        _nts,
        _kids,
        _string,
        _templates,
        _isinstruction,
        _ntname,
        emit2,
        doarg,
        target,
        clobber,
    }
};

static char rcsid[] = "$Id: dsp56k.md,v 1.24 2004/08/19 13:42:43 henan702 Exp $";
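
The fractional encoding that defconst applies to floating-point constants in the listing above can be illustrated in isolation. The following is a minimal sketch, assuming a value in the range [-1.0, 1.0] that is scaled by 2^23 (one word) or 2^47 (two words) and emitted as 24-bit words with the low word first. The names encode_frac24, encode_frac48 and frac48_words are hypothetical and not part of the thesis source. The sketch converts through a signed integer before masking, since converting a negative double directly to an unsigned type is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of dsp56k.md: encode d as a 24-bit
   two's-complement fraction with 23 fractional bits. */
static uint32_t encode_frac24(double d) {
    int32_t scaled;
    assert(d >= -1.0 && d <= 1.0);
    if (d == 1.0)
        return 0x7FFFFF;                 /* +1.0 saturates to the largest fraction */
    scaled = (int32_t)(d * 8388608.0);   /* scale by 2^23, truncating toward zero */
    return (uint32_t)scaled & 0xFFFFFF;  /* keep the low 24 bits */
}

/* Hypothetical container for a 48-bit fraction split into two 24-bit words. */
struct frac48_words {
    uint32_t lo;   /* least significant 24 bits, emitted first */
    uint32_t hi;   /* most significant 24 bits, emitted second */
};

/* Hypothetical helper: encode d as a 48-bit fraction with 47 fractional bits. */
static struct frac48_words encode_frac48(double d) {
    struct frac48_words w;
    int64_t scaled;
    uint64_t bits;
    assert(d >= -1.0 && d <= 1.0);
    if (d == 1.0)
        scaled = 0x7FFFFFFFFFFFLL;                   /* saturate at +1.0 */
    else
        scaled = (int64_t)(d * 140737488355328.0);   /* scale by 2^47 */
    /* Mask as unsigned so the shift below is well defined for negative values. */
    bits = (uint64_t)scaled & 0xFFFFFFFFFFFFULL;
    w.lo = (uint32_t)(bits & 0xFFFFFF);
    w.hi = (uint32_t)(bits >> 24);
    return w;
}
```

With these helpers, encode_frac24(0.5) yields $400000 and encode_frac24(-0.5) yields $C00000, which matches the two's-complement fractional format described for the processor.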

Index

A
abstract syntax tree, 26
accumulator register, 5
activation record, 37
address register, 5
ANSI C, 24

B
boilerplate, 41
bootstrap, 46

C
C89, 24
C90, 24
C99, 24

D
dag, 29
DC, 7
derivation, 13
directed acyclic graph, 29

E
EBNF, 26

F
fixed-point, 4
fractional number, 34

G
GCC, 23
GLOBAL, 7
global data bus, 5
GNU C Compiler, 23
GNU Compiler Collection, 23

H
Harvard architecture, 4
heap, 36

I
input register, 5

K
K&R C, 24

L
lburg, 30, 39

M
modifier register, 6

N
nonterminal, 12

O
offset register, 5
ops, 40
OPT, 7
ORG, 7

P
P address bus, 5
parse tree, 11, 26
parser, 26
parsing, 11
production, 12
program data bus, 5

R
retarget, 19

S
scanning, 11
stack, 36
start symbol, 13
syntax tree, 13

T
template, 31, 32
terminal, 12
three-address code, 15
token, 11, 24
tree grammar, 30
tree parser, 30

W
wildcard, 40, 46

X
X address bus, 5
X data bus, 5

Y
Y address bus, 5
Y data bus, 5
In Swedish (translated)

This document is kept available on the Internet, or its future replacement,
for an extended period from the date of publication, provided that no
extraordinary circumstances arise.
Access to the document implies permission for anyone to read, to download,
to print single copies for personal use and to use it unchanged for
non-commercial research and for teaching. A later transfer of the copyright
cannot revoke this permission. All other use of the document requires the
consent of the copyright holder. Technical and administrative solutions
exist to guarantee authenticity, security and accessibility.
The author's moral rights include the right to be named as the author, to
the extent required by good practice, when the document is used as described
above, as well as protection against the document being altered or presented
in a form or context that is offensive to the author's literary or artistic
reputation or distinctive character.
For further information about Linköping University Electronic Press, see
the publisher's home page http://www.ep.liu.se/

In English

The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission
for anyone to read, to download, to print out single copies for your own use
and to use it unchanged for any non-commercial research and educational
purpose. Subsequent transfers of copyright cannot revoke this permission.
All other uses of the document are conditional on the consent of the
copyright owner. The publisher has taken technical and administrative measures
to assure authenticity, security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be
protected against infringement.
For additional information about the Linköping University Electronic
Press and its procedures for publication and for assurance of document
integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Henrik Antelius
