C Compiler Design For A Network Processor
C Compiler Design For A Network Processor
C Compiler Design For A Network Processor
net/publication/3224767
Article in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems · December 2001
DOI: 10.1109/43.959859 · Source: IEEE Xplore
CITATIONS READS
29 1,237
2 authors, including:
Jens Wagner
Hochschule für Technik, Wirtschaft und Kultur Leipzig
36 PUBLICATIONS 112 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Jens Wagner on 17 September 2015.
Abstract—One important problem in code generation for em- have already been developed. These include code generation
bedded processors is the design of efficient compilers for target for irregular data paths [3]–[7], address code optimization
machines with application-specific architectures. This paper out- for DSPs [8]–[11], and exploitation of multimedia instruction
lines the design of a C compiler for an industrial application-spe-
cific instruction-set processor (ASIP) for telecom applications. The sets [12]–[14]. It has been shown experimentally, that such
target ASIP is a network processor with special instructions for highly machine-specific techniques are a promising approach
bit-level access to data registers, which is required for packet-ori- to generate high-quality machine code, whose quality often
ented communication protocol processing. From a practical view- comes close to handwritten assembly code. Naturally, this has
point, we describe the main challenges in exploiting these applica- to be paid with increased compilation times in many cases.
tion-specific features in a C compiler and we show how a compiler
backend has been designed that accommodates these features by While partially impressive results have been achieved in
means of compiler intrinsics and a dedicated register allocator. The code optimization for ASIPs in the DSP area, less emphasis has
compiler is fully operational and first experimental results indicate been so far on a new and important class of ASIPs for bit-serial
that C-level programming of the ASIP leads to good code quality protocol processing, which are called network processors
without the need for time-consuming assembly programming.
(NPs). The design of NPs has been motivated by the growing
Index Terms—Bit-level addressing, code generation, compilers, need for new high bandwidth communication equipment in net-
network processors. works (e.g., Internet routers and Ethernet adapters) as well as in
telecommunication (e.g., ISDN and xDSL). The corresponding
I. INTRODUCTION communication protocols mostly employ bit-stream-oriented
data formats. The bit streams consist of packets of different
A. Code Selection
The code selector uses the widespread technique of tree pat-
tern matching with dynamic programming [23] for mapping
data flow trees (DFTs) into assembly code. The basic idea in
this approach is to represent the target machine instruction set
in the form of a cost-attributed tree grammar, and parsing each
given DFT with respect to that grammar. As a result, an optimum
derivation for the given cost metric, and hence an optimal code
selection, are obtained. The runtime complexity is only linear in
the DFT size.
For the implementation, we used the OLIVE tool (an exten-
sion of IBURG [24] contained in the SPAM compiler [25]),
that generates code selector C source code for a given instruc- Fig. 9. Construction of the SIG. Virtual register sets fR1; R2g and
fR3; R4; R5g represent two register arrays, while R6 refers to a scalar
tion set represented by a tree grammar. Specifying the instruc- variable.
tion set with OLIVE is convenient, since the tool permits to at-
tach action functions to the instruction patterns, which facilitates
book-keeping and assembly code emission. some register pointer are assumed to be live while is in use.
The LANCE frontend splits each C function into a set of basic Liveness of is determined by inspecting the pointer initializa-
blocks, each of which contains an ordered list of DFTs. The tions in the calls to CKF INIT (see Fig. 8).
DFTs, which are directly generated in the format required for Due to the allocation constraints imposed by register arrays,
OLIVE, are passed to the generated code selector and are trans- the mapping of virtual registers to physical registers is based
lated into assembly code sequences one after another. During on a special multilevel graph coloring algorithm. Physical reg-
this phase, also the calls to CKFs are detected and are directly isters are assigned to those virtual registers first that belong to
transformed into the corresponding NP assembly instructions. register arrays. This is necessary since register arrays present
This step is rather straightforward, since CKFs are simply iden- higher pressure for the register allocator than scalar registers.
tified by their name. However, the code selector, in cooperation First, any node set in the original interference graph that be-
with the register allocator, is still responsible for a correct reg- longs to a certain register array is merged into a supernode.
ister mapping, since CKFs are called with symbolic C variables Then, the interference graph is transformed into a superinter-
instead of register names. The result of the code selection phase ference graph (SIG), while deleting all edges internal to each
is symbolic assembly code with references to virtual registers. supernode and all scalar virtual register nodes and their incident
This code is passed to the register allocator described in the fol- edges (Fig. 9).
lowing. Next, a weight is assigned to each supernode , which is equal
to the number of internal virtual registers of plus the maximum
number of internal virtual registers of ’s neighbors in the SIG.
B. Register Allocation
The supernodes are mapped to physical registers according to
Although the NP shows a RISC-like basic architecture, the descending weights. This heuristic is motivated by the fact that
classical graph coloring approach to global register allocation supernodes of a lower weight are generally easier to allocate be-
[26] cannot be directly used. The reason is the need to handle cause they cause less lifetime conflicts. Furthermore, in case of
register arrays. As explained in Section III (see also Figs. 6 a conflict, it is cheaper to spill/reload a smaller array instead of a
and 8), register arrays arise from indirect addressing in C pro- larger one. For any supernode with internal virtual registers,
grams, where unaligned bit packets are traversed within loops. a contiguous range in the register file is assigned. Since there
As a consequence, virtual registers containing (fragments of) may be multiple such windows available at a certain point of
bit-packet arrays have to be assigned to contiguous windows in time, the selection of this range is based on the best fit strategy
the physical register file. in order to ensure a tight packing of register arrays in the reg-
In order to achieve this, the register allocator maintains two ister file, i.e., in order to avoid too many spills.
sets of virtual registers: one for scalar values and one for reg- In our approach, any element of a register array can be ac-
ister arrays. All virtual registers are indexed by a unique number, cessed in two different ways: first by direct addressing (e.g.,
where each register array gets a dedicated, unique, and con- ) or indirectly by the use of a bit-packet pointer. In case of
tiguous index range. As usual, register allocation starts with a insufficient physical registers using indirect access, spill code is
lifetime analysis of virtual registers. Potential conflicts in the generated for all virtual registers within a register array; other-
form of overlapping life ranges are represented in an interfer- wise, only the particular virtual register is spilled. After register
ence graph, where each node represents a virtual register, and allocation, all symbolic addresses for bit packets have to be re-
each edge denotes a lifetime overlap. The lifetime analysis is calculated because now they specify a physical register within
based on a standard def-use analysis of virtual registers [27]. the register file instead of the name of a virtual register.
During lifetime analysis, special attention has to be paid to bit The permissible size of bit-packet arrays is constrained by
packets indirectly addressed via register pointers, whose values the size of the physical register file. Therefore, no register al-
might not be known at compile time. In order to ensure program location can be done for register arrays that are larger than the
correctness, all register array elements potentially pointed to by register file and the compiler needs to reject such code with cor-
WAGNER AND LEUPERS: C COMPILER DESIGN FOR A NETWORK PROCESSOR 1307
[4] G. Araujo and S. Malik, “Optimal code generation for embedded [24] C. W. Fraser, D. R. Hanson, and T. A. Proebsting, “Engineering a simple,
memory nonhomogeneous register architectures,” in Proc. 8th Int. efficient code generator generator,” ACM Lett. Program. Lang. Syst., vol.
Symp. System Synthesis, Sept. 1995, pp. 36–41. 1, no. 3, pp. 213–226, 1992.
[5] S. Liao, S. Devadas, K. Keutzer, S. Tjiang, and A. Wang, “Code opti- [25] A. Sudarsanam, “Code optimization libraries for retargetable compila-
mization techniques for embedded DSP microprocessors,” in Proc. 32nd tion for embedded digital signal processors,” Ph.D. dissertation, Dept.
Design Automation Conf., June 1995, pp. 599–604. Elect. Eng., Princeton Univ., Princeton, NJ, 1998.
[6] A. Timmer, M. Strik, J. van Meerbergen, and J. Jess, “Conflict modeling [26] P. Briggs, “Register allocation via graph coloring,” Ph.D. dissertation,
and instruction scheduling in code generation for in-house DSP cores,” Dept. Comput. Sci., Rice Univ., Houston, TX, 1992.
in Proc. 32nd Design Automation Conf., Dec. 1995, pp. 593–598. [27] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers—Principles, Tech-
[7] S. Bashford and R. Leupers, “Constraint driven code selection for fixed- niques, and Tools. Reading: Addison-Wesley, MA 1986.
point DSPs,” in Proc. 36th Design Automation Conf., June 1999, pp. [28] R. Leupers, Code Optimization Techniques for Embedded Proces-
817–822. sors. Norwell, MA: Kluwer, 2000.
[8] D. H. Bartley, “Optimizing stack frame accesses for processors with re- [29] D. Callahan, S. Carr, and K. Kennedy, “Improving register allocation
stricted addressing modes,” Software—Practice and Experience, vol. 22, for subscripted variables,” in Proc. ACM SIGPLAN Conf. Programming
no. 2, pp. 101–110, Feb. 1992. Language Design and Implementation, June 1990, pp. 53–65.
[9] S. Liao, S. Devadas, K. Keutzer, S. Tjiang, and A. Wang, “Storage
assignment to decrease code size,” in Proc. ACM SIGPLAN Conf.
Programming Language Design and Implementation, June 1995, pp.
186–195.
[10] R. Leupers and F. David, “A uniform optimization technique for offset
J. Wagner received the B.S. degree in electrical
assignment problems,” in Proc. 11th Int. System Synthesis Symp., Dec.
engineering from the Engineer School of Eisleben,
1998, pp. 3–8.
Eisleben, Germany, in 1993, the B.S. degree in
[11] E. Eckstein and A. Krall, “Minimizing cost of local variables access
computer science from the Bolton Institute, Bolton,
for DSP processors,” in Proc. ACM SIGPLAN Workshop Languages,
U.K., in 1996, and the Diploma degree in computing
Compilers, and Tools for Embedded Systems, May 1999, pp. 20–27.
from Leipzig University of Applied Sciences,
[12] R. J. Fisher and H. G. Dietz, “Compiling for SIMD within a register,”
Leipzig, Germany, in 1997. He is currently working
in Proc. 11th Annu. Workshop Languages and Compilers for Parallel
toward the Ph.D. degree in computer science at the
Computing, Aug. 1998, pp. 290–304.
University of Dortmund, Dortmund, Germany.
[13] R. Leupers, “Code selection for media processors with SIMD instruc-
He has been working in electronics and the IT in-
tions,” in Proc. Design Automation and Test Eur., 2000, pp. 4–8.
dustry for six years and joined the Embedded Sys-
[14] S. Larsen and S. Amarasinghe, “Exploiting superword level parallelism
tems Group at the University of Dortmund in 1999 as a Ph.D. student. He is
with multimedia instruction sets,” in Proc. ACM SIGPLAN Conf.
currently with the technology transfer company Informatik Centrum Dortmund
Programming Language Design and Implementation, June 2000, pp.
(ICD), Dortmund, Germany, where he designs customized compilers for appli-
145–156.
cation-specific processors. His current research interests include assembly code
[15] P. Paulin, “Network processors: A perspective on market requirements,
optimization techniques, retargetable compilers, and hardware–software code-
processors architectures, and embedded S/W tools,” in Proc. Design Au-
sign.
tomation and Test Eur., 2001, pp. 420–427.
[16] D. Brooks and M. Martonosi, “Dynamically exploiting narrow width
operands to improve processor power and performance,” in Proc. 5th
Int. Conf. High-Performance Computer Architecture, Jan. 1999.
[17] M. Stephenson, J. Babb, and S. Amarasinghe, “Bitwidth analysis with
application to silicon compilation,” in Proc. ACM SIGPLAN Conf. Pro- R. Leupers received the Dipl.-Inform. and Ph.D.
gram Language Design and Implementation, June 2000, pp. 108–120. degrees in computer science with distinction from
[18] S. C. Goldstein, H. Schmidt, M. Moe, M. Budiu, S. Cadambi, R. R. the University of Dortmund, Dortmund, Germany,
Taylor, and R. Laufer, “PipeRench: A coprocessor for streaming mul- in 1992 and 1997, respectively.
timedia acceleration,” in Proc. 26th Annu. Int. Symp. Computer Archi- He is currently a Senior Researcher and Lecturer
tecture, May 1999, pp. 28–39. with the Computer Science Department of the
[19] S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R. University of Dortmund. Since 1993, he has been a
Taylor, “PipeRench: A reconfigurable architecture and compiler,” IEEE Member of the Embedded Systems Group at Dort-
Comput., vol. 33, pp. 70–77, Apr. 2000. mund, where he is responsible for research projects
[20] X. Nie, L. Gazsi, F. Engel, and G. Fettweis, “A new network processor in the area of retargetable and optimizing compilers
architecture for high-speed communications,” in Proc. IEEE Workshop for embedded processors. In addition to his research
Signal Processing Systems, 1999, pp. 548–557. and teaching activities, he also heads the embedded software tools development
[21] C. Liem, Retargetable Compilers for Embedded Core Proces- team at the technology transfer company Informatik Centrum Dortmund
sors. Norwell, MA: Kluwer, 1997. (ICD), Dortmund, Germany. He has been a Co-Organizer of the Software
[22] A. Inoue, H. Tomiyama, H. Okuma, H. Kanbara, and H. Yasuura, “Lan- and Compilers for Embedded Systems workshop series and he serves on the
guage and compiler for optimizing datapath widths of embedded sys- program committees of several design automation, digital signal processor, and
tems,” IEICE Trans. Fundam., vol. E81-A, no. 12, pp. 2595–2604, Dec. compiler conferences. He authored Retargetable Code Generation for Digital
1998. Signal Processors (Norwell, MA: Kluwer, 1997) and Code Optimization
[23] A. V. Aho, M. Ganapathi, and S. W. K. Tjiang, “Code generation using Techniques for Embedded Processors (Norwell, MA: Kluwer, 2000).
tree matching and dynamic programming,” ACM Trans. Program. Lang. Dr. Leupers received a Best Paper Award at the 2000 Design Automation and
Syst., vol. 11, no. 4, pp. 491–516, 1989. Test in Europe Conference.