Sarma 2015 Ijca 907609
Sarma 2015 Ijca 907609
Sarma 2015 Ijca 907609
net/publication/336771654
CITATIONS READS
3 3,638
1 author:
SEE PROFILE
All content following this page was uploaded by Anjan Kumar Sarma on 03 April 2020.
27
International Journal of Computer Applications (0975 – 8887)
Volume 131 – No.16, December2015
(b) Local reduction in strength, that is, obviously true for embedded systems, where memory usage is
replacing a more expensive operator particularly crucial.
by a cheaper one.
Some of the new optimization techniques used for code size
(c) Constant Folding reduction are discussed here
(iv) Reordering statements that do not depend on one
another. 2.2.1 Reverse-Inlining (or Procedural
Abstraction)
2.1.2 Global Optimization (Intra-Procedural This code optimization technique attempts to reduce the size
Methods) of the source code by replacing all common code patterns in a
Global optimization techniques act on whole functions. In program with function calls. This technique is in fact opposite
global optimization, improvement takes into account what of the “Inlining of Code”, where the call to a function is
happens across basic blocks. replaced by a copy of the function body. In this technique, all
common code patterns in a program are placed into compiler-
Most global optimization techniques are based on data-flow generated functions. Every occurrence of these patterns in the
analysis. The results of data-flow analysis all have the same program is replaced by a call to the corresponding function.
form: for each instruction in the program, they specify some This technique is particularly useful for optimizing programs
property that must hold every time that instruction is in which more occurrences of a given pattern can be found.
executed.
Procedural abstraction involves the following steps:
Some of the global optimization techniques are:
1. The source code is partitioned into basic blocks.
(a) Eliminating global common sub-expressions.
2. A callgraph is created from these basic blocks. The
(b) Copy propagation. following procedure is used to create the callgraph:
(c) Dead-code elimination. a. All the basic blocks except the one where program
(d) Code Motion. execution starts are considered unused.
(e) To find induction variables in loops and optimize b. The blocks which are actually entered by the control
their computation. flow of the program are considered and marked as
used.
(f) Constant folding.
c. The blocks which are not included in the completely
2.1.3 Inter-Procedural Optimization: generated call graph are removed.
Inter-procedural optimization (IPO) techniques are applied to
3. A fingerprint is computed from each basic block.
programs that contain many frequently used functions. The
technique is called inter-procedural because it analyses the This fingerprint is used to find candidate blocks for
entire program, whereas other optimizations such as local outsourcing. The algorithm for creating the
optimization or global optimization look at only a single fingerprint is quite simple and it works as follows:A
function, or even a single block of code. 64-bit value from each basic block is created by
Inter-procedural optimization techniques can be categorized looking at the first 16 opcodes of the block, and
as- each of these opcodes is assigned a 4-bit code in the
fingerprint.
(a) Address-token analysis
4. The compiler now checks all candidates for
(b) Array-dimension padding
outsourcing based on the fingerprint, and if possible
(c) Alias analysis swaps them out into functions.
(d) Automatic array transposition The average code size reduction using this approach is up to
(e) Automatic memory pool formation 30%. However, use of Object-oriented programming
languages like C++, Java and C# further increases the
(f) Common block variable coalescing opportunities for code compaction.
(g) Common block splitting Example of Procedural abstraction
(h) Constant propagation Load r1,$5200 Load r1,$5200 Load r1,$5300
add r1,r2 add r1,r2 add r1,r2
(i) Dead code detection rot r1,$2 rot r1,$2 rot r1,$2
(j) Formal parameter alignment analysis mul r1,r1 mul r1,r1 mul r1,r1
ret
The global code optimization technique is based on inter- (a)Original Code
procedural information only.
call f call f load r1,$5300 f: Load r1,$5200
2.2 New Optimization Techniques add r1,r2 add r1,r2
In general, code optimization is performed for reducing the rot r1,$2 rot r1,$2
compiled code size and execution time .However, reducing mul r1,r1 mul r1,r1
the code size has become low-priority due to steadily ret
decreasing cost of memory. Still, it is favorable to spend effort
on the code size reduction instead of wasting memory. This is (b) Procedural Abstraction Variant I
28
International Journal of Computer Applications (0975 – 8887)
Volume 131 – No.16, December2015
29
International Journal of Computer Applications (0975 – 8887)
Volume 131 – No.16, December2015
as a frame pointer (FP) which is used to address local function into a leaf function, an accumulator variable need to
variables on the stack. There are three local variables A, B, be added in the function.
and C, on the stack. Suppose, the sequence S=(A,C,A,B)
specifies the order in which these local variables will be For the special treatment of leaf functions, some other
accessed. Also, the auto increment range is suppose R=[-1,1]. conditions must also be met; for example, the registers must
be used for their own variables and temporaries.
In figure: (a), the variables are assigned to stack locations n,
n+1 and n+2 in the order A-B-C and FP is initially pointing to The GCC Compiler performs register numbering before it
variable A at address n. According to the sequence S, the next knows whether the function is suitable for converting to a leaf
access goes to variable C located at address n+2. Thus, FP function. It therefore needs to remember the registers in order
needs to be modified by the value +2. But, since the auto to output a leaf function. The GCC uses the following macros
increment range is restricted to R=[+1,-1] and +2 does not fall to accomplish this.
in this range, the request cannot be served. The other two (i) Macro: LEAF REGISTERS
accesses of the variables B and C according to the sequence S
require FP modifications by the values (-2) and (+1), of which (ii) Macro: LEAF-REG-REMAP
only the last FP modification (+1) can be implemented by
auto-increment. In order to implement the FP modifications
2.2.5 Type-Conversion Optimization:
The type conversion optimization attempts to reduce the code
+2 and -2, two extra instructions would be required. But, if the
size by reducing the size of the data itself.
variables are reassigned to memory locations in the order B-
A-C, these two extra instructions would not be required, since Most of the compilers insert many implicit type conversions
the access sequence now requires FP modifications by the to support a variety of data types. For example, In C
values +1, -1 and -1. All of these values fall in the auto- programming language arithmetic operators operate on
increment range R and thus all FP modifications can be integer-sized values (typically 32- bit) only. The compiler will
implemented by auto-increment. always select the most efficient integer size if we declare an
int, without specifying the size. Integers of smaller sizes (char,
It is therefore a good choice to rearrange the variable layout in
short int) will be converted to types to integers of the default
memory to get better code. The goal of address code
size when doing calculations, and only the lower 8 or 16 bits
optimization techniques is to compute such good variable
of the result will be used. We can assume that the type
layouts.
conversion takes zero or one clock cycle. In a 64-bit system,
2.2.4 Leaf Function Optimization: there is only a minimal difference between the efficiency of
A function that calls no other function is called a leaf 32-bit integers and 64-bit integers.
function. These functions form the leaves of the call graph in Consider the following C language statement:
the call graph representation. A leaf function is expected to
run more efficiently if it does not make its own register char a,b,c;
window. Therefore the set of registers that would normally be
………
reserved for making function calls and parameter passing can
be used for general computation in a leaf function. In leaf- c=a+b;
function optimization, non-leaf functions are transformed into
leaf functions that can be represented as the leaves of a call ………
graph. Before doing the addition, the C compiler will promote char
There are several advantages of converting functions into leaf (say signed 8-bit) variables a and b to integer types of the
functions. Since leaf functions can be inlined easily, there is default size. The result of this addition is of integer type and it
no need for function entry and exit code, which saves code must be demoted to char type before being stored in the char
size. Register constraints are removed which also saves code variable c. If a and b are stored in memory, four instructions
size. Besides, since the body of the inlined function becomes would be performed: two loads for loading a and b, one add
part of the parent function, it becomes easier to further for the addition and one store for storing back the result in c.
optimize these functions as well. However, if a and b are stored in registers, the compiler must
generate code that sign extends a and b before doing the
Leaf functions can be derived from the original source code, addition. The compiler will generate two instructions for each
but sometimes they are the results of procedural abstraction. sign extension: a left-shift to move the char sign bit into the
Some non-leaf functions can even be treated as leaf functions. int sign bit, and then a right shift to restore the value and
propagate the sign bit into the upper 24-bits of the register.
For example, consider the factorial function
However, if it is known in advance that the result of the
int fact (int n, int acc) addition will be converted to a char type then there is no need
{ to sign extend a and b before addition. The addition can be
if(n==0) performed with 8-bit precision. Thus a careful analysis before
return acc; performing the addition can eliminate the sign extension
else return fact(n-1, acc*n); operations.
The type conversion optimization techniques use basic
} principles type analysis to remove redundant type conversions
that can produce useful reductions in code size.
This function has a recursive call at the end of the function.
This recursive call can be transformed into a loop by 2.2.6 Multiple Memory Access optimization
implementing it as a jump to the beginning of the function. If The instructions which load or store two or more registers
implemented as a loop, the function behaves as a leaf simultaneously are called Multiple Memory Access (MMA)
function. However, in order to transform this non-leaf instructions. Many microprocessors use MMA instructions for
30
International Journal of Computer Applications (0975 – 8887)
Volume 131 – No.16, December2015
reducing code size. MMA instructions include multiple loads electronic gadgets including digital alarm, digital radio and
and multiple stores where a set of registers are loaded from, or television, car fuel indicator, micro-wave ovens, remote-
stored to, successive instruction words in memory. For controlled devices etc. use embedded processors. Extensive
example, the ARM7 has LDM and STM instructions for growth of these embedded systems demands more efficient
Load-multiple registers and Store-multiple registers. There is software to operate these systems. A crucial aspect in coding
a significant compactness in code size by expressing a large for these systems is that these systems use limited memory.
set of registers with few instruction word bits. For example, in Thus, code size reduction is particularly important for these
a ARM7 processor, the LDM instruction uses only sixteen bits systems. Continuous researches in the embedded systems
to encode up to sixteen register loads (512 bits without the use results in the development of even smaller and smaller
of the LDM instruction). ). Effective use of LDM and STM devices which use very little memory. Thus, code
instructions in a ARM7 processor can save up to 480 bits optimization for reducing size is a major challenge in case of
opcode space. the embedded processors. Also, since these systems run on
battery power, developing code that consumes less power is
2.2.7 Combined Code Motion and Register another challenge for these systems.
Allocation
This technique combines two traditionally antagonistic 4. CONCLUSION AND FUTURE SCOPE
compiler phases: code motion and register allocation. Code This research paper tries to introduce the current trends in
motion aims to place instructions in less frequently executed code optimization. Since, most of the compiler researchers are
basic blocks, while instruction scheduling within blocks or conducting their researches on code optimization which
regions arranges instructions such that independent results in the development of newer techniques for code
computations can be performed in parallel. The Register optimization on a daily basis, it is not possible to cover all the
Allocation and Code Motion (RACM) algorithm aims to new techniques in this research paper. This paper includes a
reduce register pressure firstly by moving code (Code few of these techniques and the basic principles of these
Motion), secondly by Live-range splitting (Code-Cloning), techniques. Since code optimization is a field of broad
and thirdly by spilling. This optimization is applied to the research, it is not possible to cover all aspects of code
VSDG intermediate code, which greatly simplifies the task of optimization in this paper.
code motion. Data dependencies are explicit within the graph,
The future scope of this research is to develop some new
and so moving an operation node within the graph ensures
techniques based on these existing ones. This research paper
that all relationships with dependent nodes are maintained.
will help the programmers to do smart coding by identifying
Also, it is trivial to compute the live range of variables (edges)
the redundancies in their code and by applying these
within the graph; computing register requirements at any
optimization techniques to their programs. This will also help
given point (called a cut) within the graph is a matter of
the compiler designers to provide more optimization
enumerating all of the edges (live variables) that are
techniques in their compilers.
intersected by that cut.
31
International Journal of Computer Applications (0975 – 8887)
Volume 131 – No.16, December2015
[9] Neil Johnson and Alan Mycroft, “Using Multiple [14] Prajakta Gotarane, Sumedh Pundkar, (2015) “Smart
Memory Access Instructions for Reducing Code Size”, Coding using New Code Optimization Techniques in
University of Cambridge Java to Reduce Runtime Overhead of Java Compiler”,
International Journal of Computer Applications (0975 –
[10] Johnson, N., and Mycroft, A., (2003) “Combined Code 8887), Volume 125 – No.15, September 2015
Motion and Register Allocation using the Value State
Dependence Graph.” In Proc. 12th International [15] Keith D. Cooper 1 and L. Taylor Simpson, “Live Range
Conference on Compiler Construction (CC'03) (April Splitting in a Graph Coloring Register Allocator”, Rice
2003), vol. 2622 of LNCS (Springer-Verlag) University, Houston, Texas, USA,
[11] Gergö Barany, “Integrated Code Motion and Register [16] Preston Briggs, Keith D. Coope,Linda Torczon,,
Allocation”, Thesis for the Degree of Doctor, Vienna “Aggressive Live Range Splitting”, Houston University,
University of Technology Texas, USA
[12] Qingfeng Zhuge, Bin Xiao, Edwin H.-M. Sha,“ [17] Michael Burke, Linda Torczon, “Inter-procedural
Performance optimization of Multiple Memory Optimization: Eliminating Unnecessary Recompilation”,
Architectures for DSP” IBM Research, Rice University
[13] Josef Weidendorfer, “Analysis and Optimization of the
Memory Access Behavior of Applications”
IJCATM : www.ijcaonline.org 32