A COMPLEXITY MEASURE
BASED ON NESTING LEVEL
Warren A. Harrison
Kenneth I. Magel
Computer Science Department
University of Missouri-Rolla
For the past several years an accepted method of determining
the complexity of computer programs has involved developing a
directed graph, G=(V,E), which represents the flow of control of
the program. The directed graph G consists of a set of nodes, V,
which represent "blocks" or groups of code, and a set of edges, E,
which correspond to the flow of control among the various nodes.
The graph is usually restricted to having one initial node which
is always executed first. In addition, each block has two
properties:
(1) No transfer occurs into the interior of the block from
    outside.
(2) If the first statement of the block is executed, all the
    statements of the block are executed.
McCabe [3] has published one method which has been widely
accepted. His method requires the calculation of the number of
basic paths within the program, that is, the smallest set of
paths that, when taken in combination, may serve to generate
every possible path in the graph. This is calculated as the
cyclomatic number, V(G), using the following formula:

     V(G) = e - n + 2

where e represents the number of edges, and n represents the
number of nodes in the control flow graph.
The calculation of the cyclomatic number proves to be an
effective complexity measure. However, because the cyclomatic
measure only counts the number of basic paths, it is incapable of
recognizing the effects of two major factors which can be
intuitively seen to increase program complexity. These two
items are the complexity of the individual blocks within the
program, which we shall refer to as "program magnitude", and the
level of nesting within the program.
Halstead [1] has proposed several methods of measuring program
magnitude. However, a measure of program magnitude alone
does not satisfy the need to account for the level of nesting
within control structures. For example, assuming that each node
or block, Si, of the following code segments is of comparable
complexity, the first segment of code is obviously more complex
by virtue of the nesting of the if-then-else's.
     if P1 then
        if P2 then S1;
        else S2;
     else S3;
     S4;

     if P1 then S1;
     else S2;
     if P2 then S3;
     S4;
However, neither the McCabe nor the Halstead metrics take into
account the additional complexity of the nesting. Using McCabe's
measure, the first segment has a V(G) of 3. The second code
segment also has a V(G) of 3.
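These two values can be checked mechanically. The following is a minimal Python sketch that applies V(G) = e - n + 2 to hand-built control flow graphs of the two segments. The adjacency-list encoding and the names NESTED, SEQUENTIAL and cyclomatic_number are illustrative assumptions, not part of McCabe's presentation.

```python
# Control flow graphs as adjacency lists: node -> list of successor nodes.
# Hypothetical encoding of the two code segments; S4 is the common exit.
NESTED = {
    "P1": ["P2", "S3"],
    "P2": ["S1", "S2"],
    "S1": ["S4"],
    "S2": ["S4"],
    "S3": ["S4"],
    "S4": [],
}
SEQUENTIAL = {
    "P1": ["S1", "S2"],
    "S1": ["P2"],
    "S2": ["P2"],
    "P2": ["S3", "S4"],
    "S3": ["S4"],
    "S4": [],
}


def cyclomatic_number(graph):
    """Return V(G) = e - n + 2 for a single-entry, single-exit flow graph."""
    n = len(graph)                                   # number of nodes
    e = sum(len(succ) for succ in graph.values())    # number of edges
    return e - n + 2


print(cyclomatic_number(NESTED))      # 3
print(cyclomatic_number(SEQUENTIAL))  # 3
```

Both graphs have six nodes and seven edges, so the formula cannot tell the nested segment from the sequential one.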
Because Halstead's measure simply determines the number of
operators, operands and the total use of each, and since both
segments contain two selection statements and three blocks of
instructions, the complexities as determined by Halstead's
measure would also be indistinguishable.
However, by examining the control flow graph of a computer
program we can determine the nesting of each block or node. This
can be done by using some concepts from the study of lattices.
It is important to recognize, however, that the control flow graph
of a computer program is not a lattice, since the relation defined
by the set of edges is not necessarily a partial ordering [2].
We may say that a node x precedes a node y if there is a
path from node x to node y. We may write this as x<y. If
there is an edge from node x to node y, then we may say x
immediately precedes y, written x<<y.
Now consider a subgraph G' of a control flow graph G. An
element m in G is said to be an upper bound for G' if it precedes
all nodes in G'. An element n in G is said to be a lower bound
of G' if it succeeds every node in G'. If there exists an upper
bound of G', m', which succeeds all other upper bounds of G',
then m' is the least upper bound of G'. Likewise, if there exists
a lower bound of G', n', which precedes all other lower bounds
of G', then n' is the greatest lower bound of G'.
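The lower-bound definitions can be sketched in Python as follows. This is an illustration under stated assumptions rather than a procedure given in the paper: the graph is acyclic, "succeeds" is read reflexively (a node bounds itself, as in a partial order), and the function names and the example graph are hypothetical.

```python
def successors_closure(graph, start):
    """Return all nodes reachable from `start` along one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def lower_bounds(graph, subset):
    """Nodes that succeed (or equal) every node of `subset`."""
    reach = {x: successors_closure(graph, x) | {x} for x in subset}
    return {n for n in graph if all(n in reach[x] for x in subset)}


def greatest_lower_bound(graph, subset):
    """The lower bound that precedes all other lower bounds, or None."""
    bounds = lower_bounds(graph, subset)
    for candidate in bounds:
        if bounds - {candidate} <= successors_closure(graph, candidate):
            return candidate
    return None


# Hypothetical graph: two if-statements in sequence (P1, then P2).
G = {
    "P1": ["S1", "S2"],
    "S1": ["P2"],
    "S2": ["P2"],
    "P2": ["S3", "S4"],
    "S3": ["S4"],
    "S4": [],
}

print(sorted(lower_bounds(G, {"S1", "S2"})))    # ['P2', 'S3', 'S4']
print(greatest_lower_bound(G, {"S1", "S2"}))    # P2
```

Here P2, S3 and S4 all succeed both S1 and S2, but P2 precedes the other two, so it is the greatest lower bound.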
Often, a node will exist in G with an outdegree of two or
greater. We shall refer to such a node as a selection node.
Even though we cannot determine every possible path from
a selection node, since there may be an infinite number if we
allow backward branches, we can determine the nodes that lie upon
any possible path from a selection node using the above concepts.
If we accept the fact that the complexity of a control structure
such as an if-then-else is dependent upon the statements within
its range, then knowing what nodes lie upon the paths within its
range can be useful.
We may determine which nodes lie within the selection node's
range by forming a subgraph of all the nodes which lie between
the selection node itself, and the greatest lower bound of the
subgraph formed of all nodes which immediately succeed the
selection node.
This subgraph, G', will contain every node which
lies within the range of the selection node.
For example:

     if P1 then
        if P2 then
           S1;
        else
           S2;
     else
        S3;
     S4;
It can be said that all nodes, except for S4, lie within the
range of the node P1 (i.e., the outer if-then-else construct).
The greatest lower bound for this construct is S4. Likewise,
S1 and S2 also lie within the range of node P2. For P2, S4 is
also the greatest lower bound. The subgraph G' for P1
contains the nodes S3, P2, S1 and S2. The subgraph G' for
P2 contains the nodes S1 and S2.
We may utilize this technique to compute the complexity of
a computer program by assigning each node a "raw" complexity
value which would consist of the Halstead measure for that node.
In addition, each node would possess an additional complexity
measure which we shall refer to as that node's "adjusted"
complexity.
The adjusted complexity of a given node may be calculated
using the following procedure:
First, the subgraph G' of each selection node (i.e., outdegree
> 1) should be determined.
After the subgraph G' is formed (note that it does not
contain the greatest lower bound), the adjusted complexity for
the selection node is computed by summing the raw complexity of
every node within the subgraph G', and adding the raw complexity
of the selection node itself.
For all other nodes (i.e., outdegree ≤ 1) the adjusted
complexity is set equal to the node's raw complexity.
The sum of the adjusted complexities of all the nodes in
the control flow graph is then used as a measure of the
program's total complexity.
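The procedure can be sketched in Python under several assumptions that the text does not spell out: the graph is acyclic, "succeeds" is read reflexively when forming lower bounds, the range holds the nodes strictly between the selection node and the greatest lower bound, and a selection node with no greatest lower bound is given an empty range. All function and variable names are hypothetical, and every raw complexity is set to 1 instead of a Halstead value.

```python
def reachable(graph, start):
    """Return all nodes reachable from `start` along one or more edges."""
    seen, stack = set(), list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def greatest_lower_bound(graph, subset):
    """Lower bound of `subset` that precedes all other lower bounds."""
    reach = {x: reachable(graph, x) | {x} for x in subset}
    bounds = {n for n in graph if all(n in reach[x] for x in subset)}
    for candidate in bounds:
        if bounds - {candidate} <= reachable(graph, candidate):
            return candidate
    return None


def selection_range(graph, node):
    """Subgraph G': nodes strictly between `node` and the GLB of its successors."""
    glb = greatest_lower_bound(graph, set(graph[node]))
    if glb is None:          # assumption: no GLB means an empty range
        return set()
    return {v for v in reachable(graph, node)
            if v not in (node, glb) and glb in reachable(graph, v)}


def total_complexity(graph, raw):
    """Sum of adjusted complexities over all nodes."""
    total = 0
    for node, succ in graph.items():
        adjusted = raw[node]
        if len(succ) > 1:    # selection node: add raw complexity of its range
            adjusted += sum(raw[v] for v in selection_range(graph, node))
        total += adjusted
    return total


NESTED = {"P1": ["P2", "S3"], "P2": ["S1", "S2"],
          "S1": ["S4"], "S2": ["S4"], "S3": ["S4"], "S4": []}
SEQUENTIAL = {"P1": ["S1", "S2"], "S1": ["P2"], "S2": ["P2"],
              "P2": ["S3", "S4"], "S3": ["S4"], "S4": []}

print(sorted(selection_range(NESTED, "P1")))                   # ['P2', 'S1', 'S2', 'S3']
print(total_complexity(NESTED, {n: 1 for n in NESTED}))         # 12
print(total_complexity(SEQUENTIAL, {n: 1 for n in SEQUENTIAL})) # 9
```

With unit raw complexities the nested if-then-else scores 12 and the two sequential ifs score 9, so under these assumptions the measure separates two graphs that share a cyclomatic number of 3.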
This complexity measure is capable of recognizing the effects
of both the level of nesting within control structures and the
program magnitude. The level of nesting is determined via the
use of the control flow graph, and the program magnitude is
taken into account by using the Halstead measure for each
component node (i.e., the node's raw complexity).
This lends itself to a more intuitively satisfying complexity
metric than either McCabe's or Halstead's metrics by themselves.
While both McCabe's metric and the one presented here make use
of control flow graphs, it should be noted that the similarity
ends there. McCabe is actually interested only in the basic
paths, while the current complexity metric uses the control paths
of the graph only to identify which nodes contribute to other
nodes' complexities.
On the following pages are some comparisons of the measures
calculated by both McCabe's measure and the method presented
in this paper. The flow graphs used are taken from examples
given in McCabe's original paper.
While the raw complexity of each node would ordinarily
reflect that node's Halstead measure, raw complexities of 1
have been assigned to every node to allow comparisons between
the two techniques (since McCabe does not take into account the
complexity of individual nodes).
[Figures: control flow graphs A through L, taken from the examples
in McCabe's original paper.]
                  Complexity Calculations

     Graph    McCabe's Measure    New Complexity Measure*
       A             2                      3
       B             3                     15
       C             5                     24
       D             6                     26
       E             8                     28
       F             8                     59
       G             9                     78
       H            10                     81
       I            10                     53
       J            11                    186
       K            10                     39
       L            19                    126

* Whenever a graph contains two consecutive nodes where executing
the first node implies executing the second and vice versa, these
nodes have been merged for determination of the new complexity
measure.

                          Table 1

             Ranking of Programs by Complexity

     Graph    McCabe's Measure    New Complexity Measure
       A             1                      1
       B             2                      2
       C             3                      3
       D             4                      4
       E             5                      5
       F             5                      8***
       G             7                      9***
       H             8                     10***
       I             8                      7***
       J            11                     12***
       K             8                      6***
       L            12                     11***

*** Graphs where ranking differs between the two methods

                          Table 2
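The rankings in Table 2 follow directly from the values in Table 1. The short Python sketch below reproduces them, assuming tied values share the lowest applicable rank (so two graphs tied for fifth are both ranked 5 and the next graph is ranked 7). The dictionary and function names are illustrative.

```python
# Values transcribed from Table 1.
MCCABE = {"A": 2, "B": 3, "C": 5, "D": 6, "E": 8, "F": 8,
          "G": 9, "H": 10, "I": 10, "J": 11, "K": 10, "L": 19}
NEW = {"A": 3, "B": 15, "C": 24, "D": 26, "E": 28, "F": 59,
       "G": 78, "H": 81, "I": 53, "J": 186, "K": 39, "L": 126}


def competition_rank(values):
    """Rank 1 is the smallest value; tied values share the lowest rank."""
    ordered = sorted(values.values())
    return {graph: ordered.index(v) + 1 for graph, v in values.items()}


mccabe_rank = competition_rank(MCCABE)
new_rank = competition_rank(NEW)
differing = [g for g in sorted(MCCABE) if mccabe_rank[g] != new_rank[g]]

print(mccabe_rank["F"], new_rank["F"])  # 5 8
print(mccabe_rank["K"], new_rank["K"])  # 8 6
print(differing)                        # ['F', 'G', 'H', 'I', 'J', 'K', 'L']
```

The list of differing graphs matches the entries marked *** in Table 2.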
The actual values obtained for the two complexity measures
are not very significant since we assume the complexity of each
node was one.
What is significant is the ranking of the programs
in order of increasing complexity presented in Table 2. For the
simple graphs A through E, the two measures agree on rankings.
For the more complex graphs, the two measures rank the graphs in
different orders.
The rankings of graphs F and K are particularly
different.
McCabe's complexity measure and the one discussed in this
paper capture different notions of complexity.
McCabe attempts
to determine the number of execution paths in a program.
This
type of measure provides an indication of the difficulty which
would be encountered in debugging a program using testing.
The
new measure attempts to determine how difficult the static program text would be to understand.
This type of measure provides
an indication of the difficulty which would be encountered in
proving a program correct.
Consider graph F. The McCabe measure ranks this graph as
simpler than any of G through L. The new measure, by contrast,
considers K and I to be simpler than F (Table 2). The following
programs exhibit the same control structure as F, I and K
respectively.
Program Corresponding to F

     if p1
        then S3
             goto S13
        else
           if p2
              then goto S3
              else if p4
                 then case p6 of
                    (1): goto S3
                    (2): S7
                         goto S9
                    (3): S8
                         goto S9
                    esac
                 else if p5
                    then goto p6
     S9
     if p10
        then S11
     S12
     S13
Program Corresponding to I

     CASE p1 of
     (1): if p2
             then S3
                  if p4
                     then S5
                          S22
                          goto S23
                     else S6
     (2): if p7
             then S8
                  if p9
                     then S10
                          S12
                     else S11
                  S13
     (3): if p14
             then S15
                  if p16
                     then
                        if p17
                           then S19
                           else;
                     else S18
                  S20
     esac
     S21
     S23
Program Corresponding to K

     S1
     S2
     DO S3 while(condition)
     DO S4 while(condition)
     DO S5 while(condition)
     S6
     S7
     DO S8 while(condition)
     S9
     DO S10 while(condition)
     S11
     DO S12 while(condition)
     DO S13 while(condition)
     if p14
        then S15
        else S16
     if p17
        then goto S16
        else goto S7

Where the conditions for repetition of statements S3, S4,
S5, S8, S10, S12, and S13 were not indicated in the original
graph.
The program corresponding to K is the easiest to read.
Except for the branch back to S7 at the end, it can be read
from top to bottom. The program corresponding to I can be read
as a Case statement followed by two simple statements. The
three cases within the case statement each involve two levels
of nested IFs. The first also has a goto to outside of the
Case statement. The program corresponding to F has three levels
of nested IF's with a case statement within the lowest level
IF. Further, there are six goto's which make the program very
difficult to read.
REFERENCES

1. Halstead, Maurice H., Elements of Software Science,
   Amsterdam, The Netherlands: North-Holland, 1977.

2. Kildall, Gary A., "A Unified Approach to Global Program
   Optimization", First ACM Conference on Principles of
   Programming Languages, Boston, October, 1973, pp. 194-206.

3. McCabe, Thomas J., "A Complexity Measure", IEEE Transactions
   on Software Engineering, Vol. SE-2, No. 4 (Dec 1976).