Retargeting A C Compiler For A DSP Processor: Henrik Antelius
Retargeting A C Compiler For A DSP Processor: Henrik Antelius
Retargeting A C Compiler For A DSP Processor: Henrik Antelius
DSP Processor
Henrik Antelius
LiTH-ISY-EX-3595-2004
Linköping 2004
Retargeting a C Compiler for a
DSP Processor
Henrik Antelius
LiTH-ISY-EX-3595-2004
Sammanfattning
Abstract
The purpose of this thesis is to retarget a C compiler for a DSP processor.
Developing a new compiler from scratch is a major task. Instead, modifying an existing
compiler so that it generates code for another target is a common way to develop compilers for
new processors. This is called retargeting.
This thesis describes how this was done with the LCC C compiler for the Motorola DSP56002
processor.
Nyckelord
Keyword
retarget, compiler, LCC, DSP
Abstract
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Purpose and goal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 The reader. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Reading guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 DSP 3
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Motorola DSP56002. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2.1 Data buses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Address buses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.3 Data ALU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.4 Address generation unit . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.5 Program control unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Instruction set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4 Assembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3 Compilers 9
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 The analysis-synthesis model . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3 Phases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.1 Lexical analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.2 Syntax analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4.3 Semantic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.5 Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5.1 Intermediate code generation . . . . . . . . . . . . . . . . . . . 15
3.5.2 Code optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5.3 Code generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6 Symbol table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.7 Error handler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.8 Front and back end . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.9 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
ix
3.9.1 Preprocessor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.9.2 Assembler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.9.3 Linker and loader. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.10 Compiler tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4 LCC 23
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3 The compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3.1 Lexical analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.3.2 Syntax analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3.3 Semantic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.4 Intermediate code generation . . . . . . . . . . . . . . . . . . . 29
4.3.5 Back end . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5 Implementation 33
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2 The compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.2.1 Data types and sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2.2 Register usage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.2.3 Memory usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.4 Frame layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.2.5 Calling convention. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2.6 Naming convention . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3 Retargeting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3.2 Declarations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3.3 Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.3.4 C code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.4 Special features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.5 Other changes to LCC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.6 The environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.7 crt0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.8 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.8.1 Register targeting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.8.2 48-bit registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.8.3 Address registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.9 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6 Conclusions 51
6.1 Retargeting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
References 53
x
Table of contents
Appendix A: Instructions 55
A.1 Arithmetic instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
A.2 Logical instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
A.3 Bit manipulation instructions . . . . . . . . . . . . . . . . . . . . . . . 57
A.4 Loop instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
A.5 Move instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
A.6 Program control instructions . . . . . . . . . . . . . . . . . . . . . . . 58
Appendix B: Sample code 59
B.1 sample.c . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
B.2 sample.asm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Appendix C: dsp56k.md 61
Index 79
xi
xii
1
Introduction
1.1 Background
The division of Electronics Systems (ES) at the department of Electri-
cal Engineering (ISY) at Linköping University (LiU) is currently run-
ning a project aiming at developing a DSP processor. The goal of this
project is to make a DSP with a scalable structure that is instruction
level compatible with the Motorola DSP56002 processor. The scalabil-
ity refers to variable data word length and addition or removal of
memories and instructions. The goal with scalability is to reduce the
power consumption.
Currently this project is nearly finished. In order to increase the usabil-
ity of the DSP a C compiler is needed.
It was decided that the best way to create a C compiler was to retarget
an existing C compiler. Creating a compiler from scratch is a big
undertaking that requires a lot of work. Retargeting a compiler is a rel-
atively easy task compared to developing an entire compiler.
1
1.3 – The reader
2
2
DSP
2.1 Introduction
Digital signal processing is, as the term suggests, the processing of sig-
nals by digital means. The signal is normally an electrical signal car-
ried on a wire, but it can represent almost any kind of information and
it can be processed in a wide variety of ways. Examples of digital sig-
nal processing include the following:
• Filtering of signals.
• Convolution, which is the mixing of two signals.
• Correlation, which is the comparison of two signals.
• Rectification, amplification and transformation of a signal.
All of these tasks have earlier been performed by using analog cir-
cuits. Nowadays integrated circuits have enough processing power to
perform these and many other functions. The devices performing
these tasks are called digital signal processors, or DSPs. They are spe-
cialised microprocessors with architectures designed specifically for
the types of operations required in digital signal processing. Like gen-
eral-purpose microprocessors, the DSPs are programmable devices
with its own native instruction set.
3
2.2 – Motorola DSP56002
DSPs can today be found in almost all electronic areas, such as mobile
phones, personal computers, digital television decoders, surround
receivers, and so on. The advantages of using a DSP instead of analog
circuits are many. Generally, fewer components are needed, DSPs
have higher noise immunity, it is easy to change the behaviour of a fil-
ter, filters with closer tolerances can be built, and so on. Also, since the
DSP is a microcomputer, the same hardware design can be used in
many different areas by simply changing the software for the DSP.
4
Chapter 2 – DSP
5
2.3 – Instruction set
2.4 Assembly
The instruction syntax is organized in four columns; opcode, oper-
ands and two parallel move fields. An example of a typical assembly
instruction can be seen here:
Opcode Operands XDB YDB
MAC X0,Y0,A X:(R0)+,X0 Y:(R4)+,Y0
The opcode column specifies the operation that should be performed.
The operands column specifies which operands the opcode should
use. The XDB and YDB columns specify optional data transfers over
the X data bus and the Y data bus. The address space qualifiers X: and
Y: indicate which memory is being referenced.
This is an example of a small assembly program:
ORG Y:
var_a dc 42
var_b dc 48
ORG P:$40
MOVE Y:var_a,X0
MOVE Y:var_b,A
6
Chapter 2 – DSP
ADD X0,A
MOVE A,Y:var_a
This program simply adds the variables var_a and var_b and stores
the result in var_a.
This is a list of some of the features of the assembler that is used in this
thesis:
• Labels: If the first character on a line is not a space or a tab it is a
label. Labels are used for variables and jump destinations. A
colon is often used to end the label to increase readability of the
assembly.
• ORG: The ORG directive is used to indicate which memory the
following statements belong to. It is also used for a lot of other
memory related things.
• OPT: The OPT directive is used to assign options to the assem-
bler.
• Variables: Variables are declared with a label and the DC direc-
tive to define a constant.
• GLOBAL: The GLOBAL keyword is used to instruct the assem-
bler that a variable is global.
• Comments: Semicolon is used as a comment specifier. All char-
acters to the right of the semicolon are ignored.
7
2.4 – Assembly
8
3
Compilers
3.1 Introduction
A compiler is a program that reads a program written in one language
and translates it into an equivalent program in another language. An
important part of this process is to report the presence of errors in the
source program to the user.
There exists thousands of different compilers for different source lan-
guages and target languages, and there also exists many different
types of compilers. However, the basic principles of how the compil-
ers work are the same. This chapter will discuss these basic principles.
9
3.3 – Phases
During analysis the operations stated in the source program are deter-
mined and recorded in a hierarchical structure called a tree. Often a
special kind of tree called a syntax tree is used.
In the synthesis part of the compilation the output is generated from
the contents of the syntax tree. There is often also some sort of optimi-
zation of the generated source in this part.
3.3 Phases
A compiler operates in phases, each of which transforms the source
program from one representation to another. A typical decomposition
of a compiler is shown in Figure 3.1. The following sections will dis-
cuss the different phases and how they are connected.
10
Chapter 3 – Compilers
3.4 Analysis
The analysis consists of three phases: lexical analysis, syntax analysis
and semantic analysis.
11
3.4 – Analysis
12
Chapter 3 – Compilers
Syntax tree
A more common internal representation of the syntactic structure is
the syntax tree. It is a compressed representation of the parse tree
where the operators appear as the nodes, and the operands of an oper-
ator are the children for that node. An example of a syntax tree is seen
in Figure 3.3.
13
3.4 – Analysis
14
Chapter 3 – Compilers
3.5 Synthesis
The synthesis consists of the three phases intermediate code genera-
tion, code optimizer and code generator. It is responsible for trans-
forming the source that is now in the form of a syntax tree to the
output language.
15
3.5 – Synthesis
Constant propagation
Constant propagation is simply the replacement of variable references
with constant references when possible. For example, the statement
a = 3;
function_call(a + 42);
becomes
function_call(3 + 42);
Constant folding
Expressions with constant operands can be calculated at compile time.
The example above would be transformed to
function_call(45);
Programmers usually do not write expressions such as 3+42 directly,
but these expressions are quite common after macro expansion and
other optimizations such as constant propagation.
16
Chapter 3 – Compilers
Expression simplification
Some expressions can be simplified by replacing them with a more
efficient expression. For example, i+0 will be replaced by i, i*0 and
i-i by 0, and so on.
Code motion
Expressions in a loop that gives the same result each time the loop is
iterated can be moved outside the loop and calculated only once
before entering the loop.
17
3.6 – Symbol table
Strength reduction
Strength reduction replaces expensive instructions with less expensive
instructions. For instance, a popular strength reduction is to replace a
multiplication by a constant power of two with a left shift.
18
Chapter 3 – Compilers
about the type and other attributes are added and is used in various
ways.
19
3.9 – Environment
3.9 Environment
In addition to the compiler, several other programs are required if an
executable program is to be created. See Figure 3.4.
3.9.1 Preprocessor
The preprocessor produce the input to the compiler. It often performs
different kinds of text processing, for example macro processing and
file inclusion. In C for example, all lines beginning with a # is an
instruction to the preprocessor. #define FAIL -1 causes all occur-
rences of FAIL to be replaced by -1, and #include <file.h> will
include the file file.h in the source program. In C the preprocessor
also removes the comments from the source program.
3.9.2 Assembler
Some compilers produce assembly code, and that must be passed to
an assembler for further processing. The assembler works much like a
compiler and translates the assembly source to relocatable machine
code.
20
Chapter 3 – Compilers
21
3.10 – Compiler tools
22
4
LCC
This chapter describes how the compiler LCC works. Most of this
information is collected from [2] and [3].
4.1 Introduction
LCC is a free ANSI C compiler that is designed to be retargetable. The
source code is available for download from the internet [6] under a
license [7] that imposes almost no restrictions at all.
This compiler was chosen because it is very small and simple. It is
designed in a way so that it is easy to retarget it to generate code for
other processors. There is also excellent documentation of LCC in the
form of a book that describes every detail of the implementation of the
entire compiler. It is called “A Retargetable C Compiler: Design and
Implementation” and it was used extensively during this thesis. The
thesis could probably not have been completed without the book.
Another compiler candidate was the GNU C Compiler, or GCC, from
the GNU Compiler Collection, which is an open source C compiler.
GCC would probably have generated better and faster code, but it
was not chosen because it is much bigger and more complex than
LCC. Also, the same kind of documentation that was available for
LCC was not available for GCC.
23
4.2 – C
4.2 C
C is a general purpose programming language that was developed
during the 1970’s by Brian Kernighan and Dennis Ritchie and it is still
widely used today. It is a relatively low level language where the basic
data types in the language correspond to real data types found in the
hardware. The language provides no operations to deal directly with
composite data types, such as strings, arrays, lists and so on. There are
no input/output facilities and no file access facilities. All these higher
level operations must be provided by library functions. This, and sev-
eral other limitations, has some advantages. It makes the language
small and relatively easy to learn. It does also mean that compilers for
the language will be smaller and easier to construct.
C has become very popular and there exists compilers for many differ-
ent processors and operating systems. Although it is far from an ideal
language for DSP processors, it is still extensively used for them. That
is probably because it is such a simple and low level language, which
makes it easier to construct a compiler that generates efficient code for
the DSP processors.
Over the years the C programming language has evolved and been
standardized a couple of times. The first version, called K&R C (from
Kernighan and Ritchie), is derived from the reference manual in the
first edition of the book “The C programming language” by Brian
Kernighan and Dennis Ritchie. In 1989 ANSI standardized the lan-
guage, and it is commonly referred to as ANSI C or C89. ISO has
released two standards for C, and they are called ISO C90 and ISO
C99.
24
Chapter 4 – LCC
Recognizing tokens
The lexical analyzer in LCC is written by hand, it is not generated by a
tool. This is due to the fact that the lexical structure in C is simple and
that generated analyzers tend to be large and slow.
The lexical analyzer is used by calling the function gettok(), which
returns the next token. The gettok() function recognizes a token by
using a switch statement on the first character in the token to classify
it. It then consumes the following characters that make up the token.
The following is a small sample of the code
...
switch (*rcp++) {
...
case '<':
if (*rcp == '=') return cp++, LEQ;
if (*rcp == '<') return cp++, LSHIFT;
return '<';
...
rcp and cp are pointers to the next character in the input file. The
code for identifying most of the tokens looks very similar to the exam-
ple, but identifying numbers, strings and identifiers is a bit harder.
However, it works in the same way by looking ahead in the input
stream.
25
4.3 – The compiler
Grammar
LCC uses a context free grammar written in EBNF form to define the
rules for the parser. The parser is constructed by writing a parsing
function for each nonterminal. The idea is to write a function X() for
each nonterminal X, using the productions for X as a guide to writing
the code for X(). For example, the parsing function for the following
production
expr → term { + term }
will look like
void expr(void){
term();
while(t == '+'){
t = gettok();
term();
}
}
The { and } in the production is an EBNF feature that means “zero or
more”.
26
Chapter 4 – LCC
There are no nodes for the nonterminals used when parsing this
expression, and there are no nodes for the tokens ( and ). The tokens
+ and * are contained in the nodes ADD+I and MUL+I. The nodes with
the operator ADDRG+P compute the address of the operand and
INDIR+I fetches integers at the address given by their operand.
The name of the nodes are constructed by an operator and a type suf-
fix that denotes the type that the operator operates on. For example,
the node ADD+I states that the node uses integer addition. Table 4.1
lists the different type suffixes available.
The trees can contain operators that do not appear in the source pro-
gram. For example, the INDIR+I node fetches integers at an address,
but there is no fetch operator in C. A list of operators that can appear
in the trees is seen in Table 4.2. In addition to these, there are six more
operators that are used in trees listed in Table 4.3.
27
4.3 – The compiler
28
Chapter 4 – LCC
Operator Operation
AND Logical AND
OR Logical OR
NOT Logical NOT
COND Conditional expression
RIGHT Composition
FIELD Bit-field access
Table 4.3: Tree operators
29
4.3 – The compiler
Trees contain operators that are not allowed for dags. The available
operators for dags are seen in Table 4.2. When the dags are con-
structed the operators that are not allowed are replaced by other oper-
ators instead. For example, the operator AND is replaced by a
comparison and jumps and labels.
Before the dags are passed to the back end they may be converted to
trees again. Some back ends wants trees and some wants dags. All
back ends that are included in the LCC distribution wants trees. When
the conversion is done nodes that are referenced multiple times
because of the common sub expression optimization are changed. The
result of the common subexpression is stored in a temporary variable
that is used instead. The resulting tree is still using the same data
structures and representation as the dags though.
Selecting instructions
The instruction selection is done in the function gencode(). The
instruction selectors used by LCC are generated automatically from a
specification by a program called lburg. lburg is a code generator
generator and it emits a tree parser written in C.
The core of an lburg specification is a tree grammar, which is a list of
rules where each rule has a nonterminal on the left and a pattern of
terminals and nonterminals on the right. For example, the rule
addr: ADDI4(reg, con)
matches a tree at an ADDI4 node if the node’s first child recursively
matches the nonterminal reg and the second child recursively matches
30
Chapter 4 – LCC
the nonterminal con. In Figure 4.3 the tree with the selected rules for
the statement i = c + 2 can be seen.
Tree grammars are usually ambiguous, which means that there can be
more than one selection of instructions that do the same thing. For
example, increasing a register by one can be done by adding one to the
register directly or by loading one into another register and adding
the two registers. The cheapest implementation is preferred, so a cost
is assigned to each rule and the parse tree with the lowest total cost is
selected.
Specifications
lburg specifications uses the following format
%{
configuration
%}
declarations
%%
rules
%%
C code
The configuration part is C code and is optional. It is copied
directly into the generated file. The same applies to the C code part.
The declarations part contains the start symbol and a list of all the
terminals. The rules part contains tree patterns. Each rule has an
assembler code template, which is a quoted string that specifies what
to emit when the rule is used. Rules end with an optional cost. The fol-
lowing is an example of a simple specification
31
4.3 – The compiler
%start stmt
%term ADDI4=309 ADDRLP1=295 ASGNI4=53
%term CNSTI4=21 INDIRI4=67
%%
con: CNSTI4 "1"
addr: ADDRLP1 "2"
addr: ADDI4(reg, con) "3"
rc: con "4"
rc: reg "5"
reg: ADDI4(reg, rc) "6" 1
reg: addr "7" 1
stmt: ASGNI4(addr, reg) "8" 1
In this example the assembler code templates are simply rule num-
bers. Rule 1 states that con matches constants. Rule 2 and 3 states that
addr matches trees that can be computed by address calculations, like
an ADDRLP1 or the sum of a register and a constant. rc matches a con-
stant or a reg, and reg matches any tree that can be computed into a
register. Rule 6 describes an add instruction. The first operand must be
in a register and the second operand can be a register or a constant.
The result is stored in a register. Rule 7 describes an instruction that
loads an address into a register. Rule 8 describes an instruction that
stores a register at an address.
The emitter
The emitter in the function emitcode() is what outputs the assem-
bler code from the assembler templates. Each rule has one assembler
template. If the template ends with a newline character, lburg
assumes that it is an instruction, otherwise it is assumed to be a piece
of an instruction.
When the emitter emits the template it treats some characters differ-
ently. %digit tells the emitter to emit the digit-th nonterminal from
the pattern. %c emits the nonterminal on the left side of the produc-
tion. For example, the rule
areg: ADDI4(reg, rc) "add %c,%0,%1"
might be emitted as
add a1,r1,#60
If the template begins with #, emit2() is called to emit the instruc-
tion. This is needed to deal with tricky features in some assemblers.
32
5
Implementation
5.1 Introduction
The main goal of this thesis was the design and implementation of a
new back end to the LCC compiler for the DSP56002 processor. One
other goal was to maintain compatibility with Motorola’s C compiler,
so that the generated code would behave in the same way. This means
that the two compilers use the registers in the same way, uses the
same memory layout, uses the same calling convention, and so on. By
doing this, code generated by Motorola’s compiler can use code com-
piled by this compiler, libraries for example, and vice versa.
LCC is designed so that retargeting should be as easy as possible, and
the included backs ends only consist of about 1000 lines of code each.
This chapter will describe how the back end was constructed and why
it looks and behaves as it does.
33
5.2 – The compiler
34
Chapter 5 – Implementation
data type. This pretty much makes this data type useless, but it is
implemented anyway.
Pointer types
All pointers are 16-bit. When computing addresses with integer arith-
metic only the least significant 16 bits are relevant. See Table 5.3.
Register Usage
R0 Frame pointer (16-bit)
R6 Stack pointer (16-bit)
Address registers used for pointers and structures. (16-
R1 – R5, R7
bit)
Compiler temporary. Used when updating the R-regis-
N0 – N7
ters. (16-bit)
M0 – M7 Unused. Must be kept as 0xFFFF. (16-bit)
48-bit general purpose register and 48-bit function
A
return value.
24-bit general purpose register and 24-bit and 16-bit
A1
function return value.
B 48-bit general purpose register.
B1 24-bit general purpose register.
Table 5.4: Register usage
35
5.2 – The compiler
Register Usage
X, Y 48-bit general purpose registers.
X0, X1, Y0, Y1 24-bit general purpose registers.
Table 5.4: Register usage
The bottom of the program memory contains the interrupt table and it
is filled with jumps to the subroutine Fabort, except at the first posi-
tion that contains a jump to the subroutine F__start. This is because
the DSP processor starts execution here. The F__start function takes
care of initialization and calls the Fmain subroutine, which is the com-
piled main() function. The rest of the program memory is used to
store the compiled functions.
The data memory is split in three parts. The bottom part contains data
defined in the crt0 file. The next part is used for global and static data,
and the final part is used for the runt-time stack and the heap.
36
Chapter 5 – Implementation
An example of what the stack looks like during a function call can be
seen in Figure 5.3. The code that is executed looks like this:
void main(void) {
func();
}
When the execution point reaches the call to func() there is a frame
on the stack for the function main(). A new frame is built and
37
5.2 – The compiler
pushed onto the stack and the frame pointer is updated. When the
function returns, it’s frame is removed from the stack and the frame
pointer is restored.
Caller sequence
The caller part of the calling sequence consists of:
1. Pushing the arguments onto the stack in reverse order.
2. Calling the function.
3. Adjust the stack pointer.
Callee sequence
During the initial part of the calling sequence, the called function is
responsible for:
1. Saving the return address and the old frame pointer.
2. Updating the frame and stack pointers.
3. Saving the following registers if they are used by the function:
B0, B1, X0, X1, Y0, Y1, R1 – R5 and R7.
Return sequence
During the final part of the calling sequence, the called function is
responsible for:
1. Placing the return value in register A.
38
Chapter 5 – Implementation
2. Testing the return value. This feature is not used by this com-
piler, but it is needed to maintain compatibility with the calling
convention used by Motorola.
Label Purpose
Local labels. Used for targets of jumps. # is a unique
L#
number.
Global variables and functions. <identifier> is the var-
F<identifier>
iable or function name.
F__<identifier># Variables static to a function.
Section names. The contents of each assembly file gen-
erated by the compiler are contained in a unique sec-
<filename_c>
tion. <filename_c> is the file currently being compiled
where the '.' is replaced by '_'.
Table 5.5: Naming convention
5.3 Retargeting
The back end in LCC is split in two parts, a machine independent and
a machine dependant part. All back ends use the same machine inde-
pendent part. This simplifies the construction of new back ends since
it does not require as much new code.
The different back ends in LCC are stored in .md files (md stands for
machine description). They contain everything that is needed for the
back ends. To create a new back end a new file is created. When LCC is
being compiled the files for the back ends are fed through the program
lburg. lburg generates C code from the .md files, and that is then
compiled together with the rest of the source code for LCC to create
the compiler.
The format of the .md file is the same as of the specification mentioned
in section 4.3.5 on page 30. The complete file for the new back end is
called dsp56k.md and can be seen in Appendix C on page 61.
The following sections will give a more detailed description of the file.
The section names correspond to the names used in section 4.3.5.
39
5.3 – Retargeting
5.3.1 Configuration
This part contains declarations and function definitions. They are cop-
ied to the top of the generated C file and is almost identical for all back
ends. It is only the global variables that differ somewhat.
The arrays ireg[32], ireg2[32] etc. are used to hold information
about the registers. The variables iregw, iregw2 etc. are used to hold
an entire register set, also called a wildcard. cseg is used to hold the
current segment and retstruct is a flag that is set if the function
being compiled returns a structure.
5.3.2 Declarations
This part contains all the terminals that are used for the rules. They are
generated by a program called ops that is included in the LCC distri-
bution. The command line given to the program was the following:
ops c=1 s=1 i=1 l=2 h=2 f=1 d=2 x=2 p=1
The letter c stands for char, s for short, and so on, covering all the
data types in C. x means long double and p means pointers. The
number indicates how many bytes (or words) each data type use. The
output from the program is a list with all the terminals that can appear
with the given data type sizes and it is copied directly into the .md
file.
5.3.3 Rules
This is the core of the back end, and the rules were the most time con-
suming part to write. They required a lot of testing and fine tuning.
The rules were constructed incrementally. It started with a few basic
rules that was only able to compile empty programs. Then new rules
were gradually added to handle more operations, but only for one
data type (1-word integers). When all operators were covered addi-
tional rules to take care of all data types were added. While writing
the rules, the function emit2() was also constructed to take care of
the cases where the instruction templates were not enough. This func-
tion is documented further down.
The other back ends were used as an inspiration when writing this,
but a lot is different because of the nature of the DSP56002.
5.3.4 C code
This is the actual interface to the back end that the front end uses. It is
made up of a number of interface functions and a structure called the
40
Chapter 5 – Implementation
interface record that contains configuration data. The front end calls
the functions to inform the back end of things during the compilation.
It also calls the functions to let the back end emit the actual assembly
code.
This part was created in the same way and at the same time as the
rules were written. At first only a few functions are needed to compile
an empty program. As more and more rules were created the func-
tions needed to have more and more features.
The following is a description of the functions and the interface record
in the .md file. They are listed in the same order as they appear in the
file.
progbeg()
The front end calls progbeg() during initialization to set up varia-
bles and initialize data and to emit the boilerplate at the beginning of
the generated assembly file. It is also responsible for checking and tak-
ing care of the command line arguments passed to the back end.
progend()
When the compilation ends the front end calls progend to give the
back end a chance to clean up and finalize its output.
rmap()
This function is used to tell the front end which register class the oper-
ators should use. P and B is pointers and structures, they use the R-
registers. I, U and F are integers, unsigned integers and floating point
numbers. They use the same registers; X and Y if they are 48-bit (2
words) and X0, X1, Y0 and Y1 if they are 24-bit (1 word).
segment()
The front end tells the back end which segment it should use, CODE,
BSS, DATA or LIT. CODE is used for code, BSS for uninitialized varia-
bles, DATA for initialized variables and LIT for constants. For this
implementation CODE uses the P memory and all the data uses the Y
memory.
target()
This function is used because some instructions must have their oper-
ands in certain registers or they leave the result in a specific register. In
this implementation there are a lot of instructions that needs to be tar-
41
5.3 – Retargeting
clobber()
Some instructions destroy the value in a register. This function tells
the front end to insert instructions to save and restore the register
before and after an instruction that destroys it.
emit2()
Some instructions are too complicated to be emitted using the instruc-
tion templates. They are emitted by this function instead. There are a
lot of different reasons to why an instruction must be emitted by
emit2() instead. For example, reg: CNSTI2 loads a constant into a
register, but the hardware can not move a 48-bit constant to a register
using one instruction, so the constant must be split in two parts and
moved to the register using two move instructions. This can not be
done with the templates so emit2() emits it instead.
doarg()
doarg() is called for each argument before a function call. It is used
to compute the register or stack cell assigned to the next argument.
For this implementation all function arguments are put on the stack.
local()
This function is used by the front end to announce local variables to
the back end. It is used to set the stack offset for the variables.
function()
function() is used by the front end to generate and emit code for a
function. It is usually divided in three parts. In the first part initializa-
tion is being done. Offsets for the argument variables are calculated.
42
Chapter 5 – Implementation
After that gencode() is called to generate the code for the function.
In the second part the size of the frame and the registers that need sav-
ing are known. The function prologue is emitted to save the old frame
pointer and to set up a new stack pointer and save the necessary regis-
ters. After that emitcode() is called to emit the actual assembly code
for the function. In the third part the function epilogue is emitted to
restore registers and the frame and stack pointers.
defsymbol()
defsymbol() is called by the front end whenever a new symbol is
defined. A symbol is an internal data type in the compiler that repre-
sents variables, constants, labels and types. This function sets up the
name that the back end uses for the symbols.
address()
This function is used to initialize a symbol that represents an address
on the form x+n, where x is a symbol name and n is a number. This is
used so that addresses can be calculated by the assembler instead of at
run time.
defconst()
This function is used to emit assembly code for constants.
defaddress()
This function emits assembly for pointer constants.
defstring()
This function is used to emit assembly code to initialize a string.
There is a special case that needs to be taken care of here. Internally
the compiler treats all arrays with the data type “1-byte integer” as
strings. This causes a problem since char, short and int are all of
this data type in this implementation. Therefore the compiler believes
that all arrays of these data types are strings and tries to emit them as
strings. A change in the front end was made so that if the variable n
(the length of the string) is -1, the string pointer *str contains the
actual value instead of pointing to a string that should be emitted.
43
5.4 – Special features
global()
The front end calls this function to emit assembly to make a variable
global.
space()
This function is used to emit code that creates a block of words set to
zero.
Interface record
The interface record is used to configure the back end. It consists of a
structure that is assigned to the global variable IR. It is done when the
compiler is initialized and it is set to the structure of the back end that
the compiler chooses. The front end can then call the interface func-
tions in the back end on the form (*IR->progbeg)(arg1,arg2)
and access the interface variables on the form IR->wants_dag.
The first part of the interface record sets up the size and alignment of
the data types in C. After that are some flags to set up specific features.
The rest is used to assign the functions documented above.
44
Chapter 5 – Implementation
45
5.7 – crt0
5.7 crt0
The crt0 file is linked with all programs when they are compiled. It
contains the C bootstrap code and provides the environment to exe-
cute a C program. The bootstrap code is put first in the resulting exe-
cutable and is always the code that gets executed first when a program
runs. It defines some global variables, initializes the memories, stack
and frame pointers and registers. It then jumps to the main() func-
tion in the C program.
5.8 Problems
There have been some problems during the development of the back
end. Most of them originates from the fact that LCC is intended for
normal processors and not DSP processors.
46
Chapter 5 – Implementation
47
5.8 – Problems
x = a + b;
y = a + c;
will cause the value of a to be fetched two times instead of being
fetched and then reused.
The reason to why the register allocator crashes is unknown, but if the
problem could be fixed the performance of the generated code would
increase a lot.
48
Chapter 5 – Implementation
ble. This is very costly since it is used for both local variables and
function arguments.
5.9 Improvements
There are some improvements that could be made to the compiler.
One obvious thing to improve is the register allocator. The first thing
to do is to fix the crashing so it can promote variables to registers. One
other thing that can be done is to introduce new register classes for the
input and accumulator registers so it can take full advantage of all
available registers.
The DSP56002 has some hardware features that could be used. Some
of them may be a bit tricky to implement:
• The DSP can do hardware looping, but that would be difficult to
use since it has a lot of restrictions.
• The instruction MAC, which means multiply and accumulate,
could be used to speed up the generated code. That would rela-
tively easy to implement since it could be realised with a rule
that looks like this:
reg: ADDI4(MULI4(reg, reg), reg)
It would match two nodes in the tree and use fewer instructions.
There are other instructions that could be used in the same way,
for example INC, which increases a register by one.
• The DSP can execute one instruction and move data to and from
the memories at the same time. This can be utilized to speed up
the code, but it would be difficult to implement. The Motorola
compiler uses this by running the generated assembly code
trough an external program called alo. It analyses the code and
concatenates MOVE instructions with other instructions. For
example, the code
move x0,Y:(r1)
add x1,a
would be transformed to
add x1,a x0,Y:(r1)
This kind of optimization would increase the performance of the code
a lot.
49
5.9 – Improvements
50
6
Conclusions
6.1 Retargeting
The goal of this thesis was to retarget a C compiler for the Motorola
DSP56002 processor. The resulting compiler should generate working
code that functions as intended. That goal was fulfilled. A working
compiler was constructed. The performance of the code that the com-
piler generates is far from optimal, but since that was not a require-
ment, only a limited amount of time was spent to improve it.
The second goal was to make the compiler compatible with the com-
piler from Motorola. This was also accomplished and the compiler can
be used with the assembler and linker from Motorola to form a com-
plete environment that compiles the C code.
The code that the compiler generates was tested in a simulator for the
DSP56002 processor to verify that it works. This was very useful and a
lot of bugs were found using the simulator. There is probably still a
few bugs left in the compiler since there was not enough time to test
everything.
An example of a compilation of a simple program can be seen in
Appendix B on page 59.
51
6.2 – Future work
52
References
Internet
[6] LCC home page:
http://www.cs.princeton.edu/software/lcc/
[7] LCC license:
http://www.cs.princeton.edu/software/lcc/4.1/CPYRIGHT
53
54
A
Instructions
Instruction Description
ABS Absolute value *
ADC Add long with carry *
ADD Add *
ADDL Shift left then add *
ADDR Shift right then add *
ASL Arithmetic shift accumulator left *
ASR Arithmetic shift accumulator right *
CLR Clear accumulator *
CMP Compare *
CMPM Compare magnitude *
DEC Decrement accumulator
Table A.1: Arithmetic instructions
55
Instruction Description
DIV Divide iteration
INC Increment accumulator
MAC Signed multiply-accumulate *
MACR Signed multiply-accumulate and round *
MPY Signed multiply *
MPYR Signed multiply and round *
NEG Negate accumulator *
NORM Normalize accumulator iteration
RND Round accumulator *
SBC Subtract long with carry *
SUB Subtract *
SUBL Shift left then subtract *
SUBR Shift right then subtract *
Tcc Transfer Conditionally
TFR Transfer data ALU register *
TST Test accumulator *
Table A.1: Arithmetic instructions
Instruction Description
AND Logical AND *
ANDI AND immediate with control register
EOR Logical exclusive OR *
LSL Logical shift accumulator left *
LSR Logical shift accumulator right *
NOT Logical complement on accumulator *
OR Logical inclusive OR *
ORI OR immediate with control register
ROL Rotate accumulator left *
ROR Rotate accumulator right *
Table A.2: Logical instructions
56
Appendix A – Instructions
Instruction Description
BCHG Bit test and change
BCLR Bit test and clear
BSET Bit test and set
BTST Bit test on memory
Table A.3: Bit manipulation instructions
Instruction Description
DO Start hardware loop
ENDDO Exit from hardware loop
Table A.4: Loop instructions
Instruction Description
LUA Load updated address
MOVE Move data
MOVEC Move control register
MOVEM Move program memory
MOVEP Move peripheral data
Table A.5: Move instructions
57
A.6 Program control instructions
Instruction Description
DEBUG Enter debug mode
DEBUGcc Enter debug mode conditionally
ILLEGAL Illegal instruction interrupt
Jcc Jump conditionally
JCLR Jump if bit clear
JMP Jump
JScc Jump to subroutine conditionally
JSCLR Jump to subroutine if bit clear
JSET Jump if bit set
JSSET Jump to subroutine if bit set
JSR Jump to subroutine
NOP No operation
REP Repeat next instruction
RESET Reset on-chip peripheral devices
RTI Return from interrupt
RTS Return from subroutine
STOP Stop processing (low power standby)
SWI Software interrupt
WAIT Wait for interrupt (low power standby)
Table A.6: Program control instructions
58
B
Sample code
B.1 sample.c
int res = 1;
int x = 10;
void main(void) {
if(res == 1)
res = res + x;
else
res = x;
}
59
B.2 sample.asm
section sample_c
boilerplate
opt so,nomd,rp
org y:
global Fres
Fres
dc $1
initialized data
global Fx
Fx
dc $a
org p:
endsec
end boilerplate
60
C
dsp56k.md
%}
%start stmt
61
%term CNSTU1=1046 CNSTU2=2070
%term ARGB=41
%term ARGF1=1057 ARGF2=2081
%term ARGI1=1061 ARGI2=2085
%term ARGP1=1063
%term ARGU1=1062 ARGU2=2086
%term ASGNB=57
%term ASGNF1=1073 ASGNF2=2097
%term ASGNI1=1077 ASGNI2=2101
%term ASGNP1=1079
%term ASGNU1=1078 ASGNU2=2102
%term INDIRB=73
%term INDIRF1=1089 INDIRF2=2113
%term INDIRI1=1093 INDIRI2=2117
%term INDIRP1=1095
%term INDIRU1=1094 INDIRU2=2118
%term CVPU1=1174
%term CALLB=217
%term CALLF1=1233 CALLF2=2257
%term CALLI1=1237 CALLI2=2261
%term CALLP1=1239
%term CALLU1=1238 CALLU2=2262
%term CALLV=216
%term ADDRGP1=1287
%term ADDRFP1=1303
%term ADDRLP1=1319
62
Appendix C – dsp56k.md
%term JUMPV=584
%term LABELV=600
%term VREGP=711
%term LOADI1=1253
%term LOADI2=2277
%term LOADU1=1254
%term LOADU2=2278
%term LOADB=233
%term LOADF1=1249
%term LOADF2=2273
%term LOADP1=1255
%%
stmt: reg ““
stmt: LABELV “%a:\n”
63
stmt: ASGNF1(addr, reg) “\tmove \t%1,%0\n” 2
stmt: ASGNF2(addr, reg) “# \tmove \t%1,%0\n” 2
stmt: ASGNP1(addr, rreg) “\tmove \t%1,%0\n” 2
stmt: ASGNB(rreg, INDIRB(rreg)) “# ASGNB\n”
64
Appendix C – dsp56k.md
65
reg: LSHU1(reg, rc) “\trep \t%1\n\tlsl \t%0\n” 4+2
reg: LSHU1(reg, con1) “\tlsl \t%0\n” 4
reg: LSHU2(reg, rc) “\trep \t%1\n\tasl \t%0\n” 4+2
reg: LSHU2(reg, con1) “\tasl \t%0\n” 4
%%
parseflags(argc, argv);
iregw = mkwildcard(ireg);
ireg2w = mkwildcard(ireg2);
aregw = mkwildcard(areg);
areg2w = mkwildcard(areg2);
rregw = mkwildcard(rreg);
66
Appendix C – dsp56k.md
// so - write symbols
// nomd - do not write macro defs
// rp - generate nop insn to accomodate pipeline delay
print(“\topt \tso,nomd,rp\n”);
67
setreg(p, areg2[0]); //a
} else {
rtarget(p, 0, areg2[0]); //a
rtarget(p, 1, areg2[1]); //b
setreg(p, areg2[0]); //a
}
break;
case LSH+I: case RSH+I: case LSH+U: case RSH+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, areg2[0]); //a
setreg(p, areg[1]); //a1
} else {
rtarget(p, 0, areg2[0]); //a
setreg(p, areg2[0]); //a1
}
break;
case NEG+I: case NEG+F:
rtarget(p, 0, areg2[0]); //a
setreg(p, areg2[0]); //a
break;
case BAND+I: case BOR+I: case BAND+U: case BOR+U:
case BXOR+I: case BXOR+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, ireg[3]); //y1
rtarget(p, 1, areg[1]); //a1
setreg(p, areg[1]); //a1
} else {
rtarget(p, 1, areg2[0]); //a
setreg(p, areg2[0]); //a
}
break;
case BCOM+I: case BCOM+U:
if(opsize(p->op) == 1) {
rtarget(p, 0, areg[1]); //a1
setreg(p, areg[1]); //a1
} else {
rtarget(p, 0, areg2[0]); //a
setreg(p, areg2[0]); //a
}
break;
case EQ+I: case GE+I: case GT+I: case LE+I: case LT+I: case NE+I:
case EQ+U: case GE+U: case GT+U: case LE+U: case LT+U: case NE+U:
case EQ+F: case GE+F: case GT+F: case LE+F: case LT+F: case NE+F:
rtarget(p, 0, areg2[0]); //a
if(opsize(p->op) == 2) {
rtarget(p, 1, areg2[1]); //b
}
break;
case CALL+I: case CALL+U: case CALL+F: case CALL+P: case CALL+V:
setreg(p, areg2[0]); //a
break;
case RET+I: case RET+U: case RET+F:
rtarget(p, 0, areg2[0]); //a
break;
case RET+P:
rtarget(p, 0, areg[1]); //a1
break;
case CVI+I: case CVI+U: case CVI+F: case CVU+I: case CVU+U:
case CVF+I: case CVF+F:
if(opsize(p->op) == 1){
setreg(p, areg2[0]); //a
} else {
setreg(p, areg2[0]); //a
}
break;
case LOAD+I: case LOAD+U: case LOAD+F:
if(p->kids[0]->x.inst == _rreg_NT){
break;
}
if(opsize(p->op) == 1){
rtarget(p, 0, areg2[0]); //a
} else {
rtarget(p, 0, areg2[0]); //a
}
break;
default:
break;
}
}
68
Appendix C – dsp56k.md
} else {
assert(0);
}
}
print(“\tmove \t”);
emitasm(p->kids[0], _addr_NT);
if(p->kids[0]->x.inst == _rreg_NT){
print(“-,%s\n”, reg_name(dst + 1));
} else {
print(“+1,%s\n”, reg_name(dst + 1));
}
}
}
break;
}
case ASGN+I: case ASGN+U: case ASGN+F: {
if(p->kids[0]->op != VREG+P){
if(opsize(p->op) == 2){
int src = getregnum(p->kids[1]);
//stmt: ASGNI2(addr, reg) “\tmove \t%1,%0\n”
print(“\tmove \t%s,”, reg_name(src));
emitasm(p->kids[0], _addr_NT);
if(p->kids[0]->x.inst == _rreg_NT){
print(“+\n”);
} else {
print(“\n”);
}
69
} else if(p->kids[1]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[1]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[0], _rc_NT);
print(“,n%d\n\tlua \t(r%d)+n%d,r%d\n”, reg, reg, reg, dst);
} else
assert(0);
break;
case SUB+P:
//rreg: SUBP1(rc, rreg) “# \tmove \t%0,n8\n\tlua \t(%1)-n8,%c\n”
//rreg: SUBP1(rreg, rc) “# \tmove \t%1,n8\n\tlua \t(%0)-n8,%c\n”
if(p->kids[0]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[0]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[1], _rc_NT);
print(“,n%d\n\tlua \t(r%d)-n%d,r%d\n”, reg, reg, reg, dst);
} else if(p->kids[1]->x.inst == _rreg_NT){
int reg = getregnum(p->x.kids[1]) - 8;
int dst = getregnum(p) - 8;
print(“\tmove \t”);
emitasm(p->kids[0], _rc_NT);
print(“,n%d\n\tlua \t(r%d)-n%d,r%d\n”, reg, reg, reg, dst);
} else
assert(0);
break;
case MUL+I: case MUL+U: case MUL+F: {
//reg: MULI1(reg, reg) “\tmpy \t%0,%1,%c\n\tasr \t%c\n
// \tmove \t%c0,%c\n”
if(opsize(p->op) == 1) {
int s1 = getregnum(p->x.kids[0]);
int s2 = getregnum(p->x.kids[1]);
int d = getregnum(p);
char *src1 = p->x.kids[0]->syms[RX]->x.name;
char *src2 = p->x.kids[1]->syms[RX]->x.name;
if(s1 == s2){
//temp_reg = x1, x0 if x1 is taken
char *temp_reg = (s1 == 0) ? (ireg[1]->x.name) : (ireg[0]->x.name);
print(“\tmove \t%s,Y:(r6)\n”, temp_reg);
print(“\tmove \t%s,%s\n”, src2, temp_reg);
print(“\tmpy \t%s,%s,%s\n”,src1, temp_reg, p->syms[RX]->x.name);
if(optype(p->op) == I || optype(p->op) == U){
print(“\tasr \t%s\n”, p->syms[RX]->x.name);
print(“\tmove \t%s0,%s Y:(r6),%s\n”, p->syms[RX]->x.name,
p->syms[RX]->x.name, temp_reg);
}
} else {
print(“\tmpy \t%s,%s,%s\n”,src1, src2, p->syms[RX]->x.name);
if(optype(p->op) == I || optype(p->op) == U){
print(“\tasr \t%s\n”, p->syms[RX]->x.name);
print(“\tmove \t%s0,%s\n”, p->syms[RX]->x.name, p->syms[RX]->x.name);
}
}
} else {
// long mpy - copied from motorola
// a and b is input regs and b is output reg
int lab1 = genlabel(1);
int lab2 = genlabel(1);
if(optype(p->op) == F){
warning(“double multiply is not implemented; using long multiply instead\n”);
}
print(“\t;; begin long multiply\n”);
print(“\tmoven0,y:(r6)+\n”);
print(“\tmovea0,y:(r6)+\n”);
print(“\tmovea1,y:(r6)+\n”);
print(“\tmovex0,y:(r6)+\n”);
print(“\tmovex1,y:(r6)+\n”);
print(“\tmovey0,y:(r6)+\n”);
print(“\tmove#$0,n0\n”);
print(“\tmovea1,x0\n”);
print(“\teorx0,bb1,x1\n”);
print(“\tjplL%d\n”, lab1);
print(“\tmove#$1,n0\n”);
print(“L%d:\n”, lab1);
print(“\tabsa x1,b1\n”);
print(“\tabsb y1,y:(r6)+\n”);
print(“\tmoveb0,y:(r6)+\n”);
print(“\tmoveb1,y:(r6)\n”);
print(“\tclrb b1,x0\n”);
print(“\tmovex0,b0\n”);
print(“\taslb\n”);
print(“\taslb\n”);
print(“\tclrb b1,x0\n”);
print(“\tmovea1,b0\n”);
print(“\taslb\n”);
print(“\taslb #$7fffff,y1\n”);
print(“\tmoveb1,x1\n”);
print(“\tmovey:(r6)-,b\n”);
print(“\tmovey:(r6),b0\n”);
print(“\tmovex0,y:(r6)+\n”);
print(“\tmovex1,y:(r6)\n”);
70
Appendix C – dsp56k.md
print(“\taslb\n”);
print(“\tandy1,b\n”);
print(“\tmoveb1,x0\n”);
print(“\tasrb\n”);
print(“\tmoveb0,b\n”);
print(“\tandy1,b\n”);
print(“\tmoveb1,x1\n”);
print(“\tasla\n”);
print(“\tandy1,ay1,b1\n”);
print(“\tmovea1,y0\n”);
print(“\tasra\n”);
print(“\tmovea0,y1\n”);
print(“\tandy1,b\n”);
print(“\tmoveb1,y1\n”);
print(“\tmpyx1,y1,b\n”);
print(“\tmpyx1,y0,a\n”);
print(“\tmacx0,y1,a\n”);
print(“\tmovea1,a2\n”);
print(“\tmovea0,a1\n”);
print(“\tmove#$0,a0\n”);
print(“\tasra\n”);
print(“\tadda,b\n”);
print(“\tmpyx0,y0,a\n”);
print(“\tmovey:(r6)-,y0\n”);
print(“\tmovey:(r6)-,x0\n”);
print(“\tmacy0,x1,a\n”);
print(“\tmacx0,y1,a\n”);
print(“\tclra a0,x0\n”);
print(“\tmovex0,a2\n”);
print(“\tasra y:(r6)-,y1\n”);
print(“\tasra y:(r6)-,y0\n”);
print(“\tadda,by:(r6)-,x1\n”);
print(“\tasrb y:(r6)-,x0\n”);
print(“\tmoven0,a\n”);
print(“\ttsta y:(r6)-,a\n”);
print(“\tjeqL%d\n”, lab2);
print(“\tnegb\n”);
print(“L%d:\n”, lab2);
print(“\ttstb y:(r6)-,a0\n”);
print(“\tmovey:(r6),n0\n”);
print(“\t;; end long multiply\n”);
}
break;
}
case DIV+I: case DIV+U: case DIV+F:{
if(opsize(p->op) == 1){
char *src = ireg[getregnum(p->x.kids[1])]->x.name;
char *dst = p->syms[RX]->x.name;
int lab = genlabel(1);
print(“\tmove \tb,Y:(r6)+\n”);
print(“\tabs \t%s %s,b\n”, dst, dst); //b
if(optype(p->op) == I || optype(p->op) == U){
print(“\tclr \t%s %s1,Y:(r6)\n”, dst, dst);
print(“\tmove \tY:(r6),%s0\n”, dst);
print(“\tasl \t%s\n”, dst);
}
print(“\trep \t#24\n”);
print(“\tdiv \t%s,%s\n”, src, dst);
print(“\teor \t%s,b\n”, src); //b
print(“\tjpl \tL%d\n”, lab);
print(“\tneg \t%s\n”, dst);
print(“L%d:\tmove \t%s0,%s\n”, lab, dst, dst);
print(“\tmove \tY:-(r6),b\n”);
} else {
// long div - copied from motorola
// a and b is input regs and a is output reg
int lab[11];
int i;
for(i = 0; i < 11; i++){
lab[i] = genlabel(1);
}
if(optype(p->op) == F){
warning(“double division is not implemented; using long division instead\n”);
}
print(“\t;;-- begin long division\n”);
print(“move n0,y:(r6)+\n”);
print(“move b0,y:(r6)+\n”);
print(“move b1,y:(r6)+\n”);
print(“move x0,y:(r6)+\n”);
print(“move x1,y:(r6)+\n”);
print(“move y0,y:(r6)+\n”);
print(“move y1,y:(r6)+\n”);
print(“\n”);
print(“move #$0,n0\n”);
print(“move b1,y1\n”);
print(“eor y1,a a1,y0\n”);
print(“jpl L%d\n”, lab[2]);
print(“move #$1,n0\n”);
print(“L%d:\n”, lab[2]);
print(“abs b y0,a1\n”);
print(“abs a #$0,x1\n”);
print(“\n”);
71
print(“move b1,y1\n”);
print(“clr b b0,y0\n”);
print(“move b1,y:(r6)\n”);
print(“\n”);
print(“ori #$04,mr\n”);
print(“asl a #>$1,x0\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[3]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[3]);
print(“andi #$f3,mr\n”);
print(“div x1,b\n”);
print(“\n”);
print(“do #$2f,L%d\n”, lab[4]);
print(“btst #23,y:(r6)\n”);
print(“jcs L%d\n”, lab[5]);
print(“add x,a\n”);
print(“move #$0,a2\n”);
print(“ori #$04,mr\n”);
print(“asl a\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[6]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[6]);
print(“andi #$f3,mr\n”);
print(“div x1,b\n”);
print(“sub y,b\n”);
print(“jmp L%d\n”, lab[7]);
print(“L%d:\n”, lab[5]);
print(“move #$0,a2\n”);
print(“ori #$04,mr\n”);
print(“asl a\n”);
print(“andi #$fe,ccr\n”);
print(“jec L%d\n”, lab[8]);
print(“ori #$01,ccr\n”);
print(“L%d:\n”, lab[8]);
print(“andi #$f3,mr\n”);
print(“div x1,b\n”);
print(“add y,b\n”);
print(“L%d:\n”, lab[7]);
print(“move a1,x1\n”);
print(“move b1,a1\n”);
print(“eor y1,a\n”);
print(“move b1,y:(r6)\n”);
print(“move x1,a1\n”);
print(“move #$0,x1\n”);
print(“L%d:\n”, lab[4]);
print(“btst #23,y:(r6)\n”);
print(“jcs L%d\n”, lab[9]);
print(“add x,a\n”);
print(“L%d:\n”, lab[9]);
print(“move #$80,x1\n”);
print(“eor x1,a (r6)-\n”);
print(“move #$0,a2\n”);
print(“\n”);
print(“move n0,b\n”);
print(“tst b\n”);
print(“jeq L%d\n”, lab[10]);
print(“neg a\n”);
print(“L%d:\n”, lab[10]);
print(“move y:(r6)-,y1\n”);
print(“move y:(r6)-,y0\n”);
print(“move y:(r6)-,x1\n”);
print(“move y:(r6)-,x0\n”);
print(“move y:(r6)-,b\n”);
print(“tst a y:(r6)-,b0\n”);
print(“move y:(r6),n0\n”);
print(“\t;;-- end long division\n”);
}
break;
}
case MOD+I: case MOD+U: {
if(opsize(p->op) == 1){
char *src = ireg[getregnum(p->x.kids[1])]->x.name;
char *dst = p->syms[RX]->x.name;
int lab = genlabel(1);
print(“\tmove \tb,Y:(r6)+\n”);
print(“\tabs \t%s %s,b\n”, dst, dst); //b
print(“\tclr \t%s %s1,Y:(r6)\n”, dst, dst);
print(“\tmove \tY:(r6),%s0\n”, dst);
print(“\tasl \t%s\n”, dst);
print(“\trep \t#24\n”);
print(“\tdiv \t%s,%s\n”, src, dst);
72
Appendix C – dsp56k.md
73
print(“move #$0,x1\n”);
print(“btst #23,y:(r0)\n”);
print(“jcc L%d\n”, lab[9]);
print(“add x,b\n”);
print(“L%d:\n”, lab[9]);
print(“move b1,x1\n”);
print(“move y:(r0),b1\n”);
print(“move y:(r1),x0\n”);
print(“eor x0,b\n”);
print(“move x1,b1\n”);
print(“jpl L%d\n”, lab[10]);
print(“btst #23,y:(r0)\n”);
print(“jcc L%d\n”, lab[11]);
print(“sub y,a\n”);
print(“jmp L%d\n”, lab[10]);
print(“L%d:\n”, lab[11]);
print(“add y,a\n”);
print(“L%d:\n”, lab[10]);
print(“move n0,b\n”);
print(“tst b\n”);
print(“jeq L%d\n”, lab[13]);
print(“neg a\n”);
print(“L%d:\n”, lab[13]);
print(“move (r6)-\n”);
print(“move (r6)-\n”);
print(“move (r6)-\n”);
print(“move y:(r6)-,r1\n”);
print(“move y:(r6)-,r0\n”);
print(“move y:(r6)-,y1\n”);
print(“move y:(r6)-,y0\n”);
print(“move y:(r6)-,x1\n”);
print(“move y:(r6)-,x0\n”);
print(“move y:(r6)-,b\n”);
print(“tst a y:(r6)-,b0\n”);
print(“move y:(r6),n0\n”);
print(“\t;;-- end long modulo\n”);
}
break;
}
case ARG+I: case ARG+U: case ARG+F:{
if(opsize(p->op) == 2){
int src = getregnum(p->x.kids[0]);
//stmt: ARGI2(reg) “# \tmove \t%0,Y:(r6)+\n”
print(“\tmove \t%s,Y:(r6)+\n”, reg_name(src));
print(“\tmove \t%s,Y:(r6)+\n”, reg_name(src + 1));
}
break;
}
case ARG+B: {
int src = getregnum(p->x.kids[0]) - 8;
int label = genlabel(1);
int size = p->syms[0]->u.c.v.i;
print(“\tmove \t#%d,n6\n”, size);
print(“\tmove \tr%d,n%d\n”, src, src);
print(“\tmove \tx0,Y:(r6+n6)\n”);
if(size > 1) {
print(“\tdo \t#%d,L%d\n”, size , label);
}
print(“\tmove \tY:(r%d)+,x0\n”, src);
print(“\tmove \tx0,Y:(r6)+\n”);
if(size > 1) {
print(“L%d:\n”, label);
}
print(“\tmove \tn%d,r%d\n”, src, src);
print(“\tmove \tY:(r6),x0\n”);
break;
}
case BAND+I: case BAND+U:
case BOR+I: case BOR+U:
case BXOR+I: case BXOR+U: {
//reg: BANDI2(reg, reg) “\tand \t%0,%1\n”
int src = getregnum(p->x.kids[0]);
int dst = getregnum(p->x.kids[1]);
print(“\tand \t%s,”, reg_name(src + 1));
emitasm(p->kids[1], _reg_NT);
print(“\t%s,Y:(r6)\n”, reg_name(dst));
print(“\tmove \t%s,%s\n”, reg_name(dst + 1), reg_name(dst));
print(“\tmove \tY:(r6),%s\n”, reg_name(dst + 1));
switch(generic(p->op)){
case BAND:
print(“\tand \t%s,”, reg_name(src));
break;
case BOR:
print(“\tor \t%s,”, reg_name(src));
break;
case BXOR:
print(“\teor \t%s,”, reg_name(src));
break;
}
emitasm(p->kids[1], _reg_NT);
74
Appendix C – dsp56k.md
static void blkloop(int dreg, int doff, int sreg, int soff,
int size, int tmps[]) {
debug2(fprint(stderr, “Inside %s\n”, “blkloop”));
}
usedmask[0] = usedmask[1] = 0;
freemask[0] = freemask[1] = ~(unsigned)0;
offset = -3;
for(i = 0; callee[i]; i++) {
75
Symbol p = callee[i];
Symbol q = caller[i];
assert(q);
if(q->type->size == 2) { //long, double
p->x.offset = q->x.offset = offset - 1;
p->x.name = q->x.name = stringf(“%d”, offset - 1);
} else { //char, short, int, float
p->x.offset = q->x.offset = offset;
p->x.name = q->x.name = stringf(“%d”, offset);
}
p->sclass = q->sclass = AUTO;
offset -= q->type->size;
}
assert(caller[i] == 0);
offset = maxoffset = 0;
retstruct = isstruct(freturn(f->type));
debug2(fprint(stderr, “offset: %d, maxoffset: %d, argoffset: %d, maxargoffset: %d\n”, offset, maxoffset,
argoffset, maxargoffset));
if(maxoffset > 0) {
print(“\tmove \t#%d,n6 \t\t;; update stack with local and temp offset\n”, maxoffset);
print(“\tmove \t(r6)+n6 \t;;\n”);
}
if(retstruct == -1){
print(“\tmove \ta,r7\n”);
retstruct = 0;
}
if(maxoffset > 0) {
print(“\tmove \t#%d,n6 \t\t;; restore stack\n”, maxoffset);
print(“\tmove \t(r6)-n6 \t;;\n”);
}
76
Appendix C – dsp56k.md
77
debug2(fprint(stderr, “Inside %s: %s\n”, “import”, p->name));
}
Interface dsp56kIR = {
1, 1, 0, /* char */
1, 1, 0, /* short */
1, 1, 0, /* int */
2, 2, 0, /* long */
2, 2, 0, /* long long */
1, 1, 1, /* float */
2, 2, 1, /* double */
2, 2, 1, /* long double */
1, 1, 0, /* T * */
0, 1, 0, /* struct */
1, /* little_endian */
0, /* mulops_calls */
1, /* wants_callb */
1, /* wants_argb */
0, /* left_to_right */
0, /* wants_dag */
0, /* unsigned_char */
address,
blockbeg,
blockend,
defaddress,
defconst,
defstring,
defsymbol,
emit,
export,
function,
gen,
global,
import,
local,
progbeg,
progend,
segment,
space,
0, 0, 0, 0, 0, 0, 0,
{
1, /* max_unaligned_load */
rmap,
blkfetch, blkstore, blkloop,
_label,
_rule,
_nts,
_kids,
_string,
_templates,
_isinstruction,
_ntname,
emit2,
doarg,
target,
clobber,
}
};
static char rcsid[] = “$Id: dsp56k.md,v 1.24 2004/08/19 13:42:43 henan702 Exp $”;
78
Index
A H
abstract syntax tree . . . . . . . . 26 Harvard architecture . . . . . . . .4
accumulator register . . . . . . . .5 heap . . . . . . . . . . . . . . . . . . . 36
activation record . . . . . . . . . 37
address register . . . . . . . . . . .5 I
ANSI C . . . . . . . . . . . . . . . . 24 input register . . . . . . . . . . . . .5
B K
boilerplate . . . . . . . . . . . . . . 41 K&R C . . . . . . . . . . . . . . . . . 24
bootstrap . . . . . . . . . . . . . . . 46
L
C lburg . . . . . . . . . . . . . . . 30, 39
C89 . . . . . . . . . . . . . . . . . . . 24
C90 . . . . . . . . . . . . . . . . . . . 24 M
C99 . . . . . . . . . . . . . . . . . . . 24 modifier register . . . . . . . . . . .6
D N
dag . . . . . . . . . . . . . . . . . . . 29 nonterminal . . . . . . . . . . . . . 12
DC . . . . . . . . . . . . . . . . . . . .7
derivation . . . . . . . . . . . . . . 13 O
directed acyclic graph . . . . . . 29 offset register . . . . . . . . . . . . . 5
ops . . . . . . . . . . . . . . . . . . . . 40
E OPT . . . . . . . . . . . . . . . . . . . . 7
EBNF . . . . . . . . . . . . . . . . . 26 ORG . . . . . . . . . . . . . . . . . . . . 7
F P
fixed-point . . . . . . . . . . . . . . .4 P address bus . . . . . . . . . . . . . 5
fractional number . . . . . . . . . 34 parse tree . . . . . . . . . . . . 11, 26
parser . . . . . . . . . . . . . . . . . . 26
G parsing . . . . . . . . . . . . . . . . . 11
GCC . . . . . . . . . . . . . . . . . . 23 production . . . . . . . . . . . . . . 12
GLOBAL . . . . . . . . . . . . . . . . 7 program data bus . . . . . . . . . . 5
global data bus . . . . . . . . . . . . 5
GNU C Compiler . . . . . . . . . 23 R
GNU Compiler Collection . . . 23 retarget . . . . . . . . . . . . . . . . . 19
79
S
scanning . . . . . . . . . . . . . . . 11
stack . . . . . . . . . . . . . . . . . . 36
start symbol . . . . . . . . . . . . . 13
syntax tree . . . . . . . . . . . . . . 13
T
template . . . . . . . . . . . . 31, 32
terminal . . . . . . . . . . . . . . . . 12
three-address code . . . . . . . . 15
token . . . . . . . . . . . . . . . 11, 24
tree grammar . . . . . . . . . . . . 30
tree parser . . . . . . . . . . . . . . 30
W
wildcard . . . . . . . . . . . . 40, 46
X
X address bus . . . . . . . . . . . . 5
X data bus . . . . . . . . . . . . . . . 5
Y
Y address bus . . . . . . . . . . . . 5
Y data bus . . . . . . . . . . . . . . . 5
80
På svenska
In English
The publishers will keep this document online on the Internet - or its possible
replacement - for a considerable time from the date of publication barring
exceptional circumstances.
The online availability of the document implies a permanent permission
for anyone to read, to download, to print out single copies for your own use
and to use it unchanged for any non-commercial research and educational
purpose. Subsequent transfers of copyright cannot revoke this permission.
All other uses of the document are conditional on the consent of the copy-
right owner. The publisher has taken technical and administrative measures
to assure authenticity, security and accessibility.
According to intellectual property law the author has the right to be
mentioned when his/her work is accessed as described above and to be pro-
tected against infringement.
For additional information about the Linköping University Electronic
Press and its procedures for publication and for assurance of document integ-
rity, please refer to its WWW home page: http://www.ep.liu.se/
© Henrik Antelius