The paper proposes a new complexity measure for computer programs based on their nesting levels, addressing limitations in existing metrics such as McCabe's cyclomatic complexity and Halstead's program magnitude. By examining control flow graphs, the proposed measure incorporates the complexity of nested structures, thus providing a more holistic assessment of program complexity that reflects readability and maintenance challenges.

A COMPLEXITY MEASURE BASED ON NESTING LEVEL

Warren A. Harrison
Kenneth I. Magel
Computer Science Department
University of Missouri-Rolla

For the past several years an accepted method of determining the complexity of computer programs has involved developing a directed graph, G=(V,E), which represents the flow of control of the program. The directed graph G consists of a set of nodes, V, which represent "blocks" or groups of code, and a set of edges, E, which corresponds to the flow of control among the various nodes. The graph is usually restricted to having one initial node which is always executed first. In addition, each block has two properties:

(1) No transfer occurs from outside into the interior of the block.
(2) If the first statement of the block is executed, all the statements of the block are executed.

McCabe [3] has published one method which has been widely accepted. His method requires the calculation of the number of basic paths within the program, that is, the smallest set of paths that, when taken in combination, may serve to generate every possible path in the graph. This is calculated as the cyclomatic number, V(G), using the following formula:

    V(G) = e - n + 2

where e represents the number of edges and n represents the number of nodes in the control flow graph.

The calculation of the cyclomatic number proves to be an effective complexity measure. However, because the cyclomatic measure only counts the number of basic paths, it is incapable of recognizing the effects of two major complexity factors which can be intuitively seen to increase program complexity. These two items are the complexity of the individual blocks within the program, which we shall refer to as "program magnitude", and the level of nesting within the program. Halstead [1] has proposed several methods of measuring program magnitude. However, a measure of program magnitude alone does not satisfy the need to account for the level of nesting within control structures. For example, assuming that each node or block, Si,
of the following code segments is of comparable complexity, the first segment of code is obviously more complex by virtue of the nesting of the if-then-else's.

    if P1 then
        if P2 then S1;
        else S2;
    else S3;
    S4;

    if P1 then S1;
    else S2;
    if P2 then S3;
    S4;

However, neither the McCabe nor the Halstead metric takes into account the additional complexity of the nesting. Using McCabe's measure, the first segment has a V(G) of 3. The second code segment also has a V(G) of 3. Because Halstead's measure simply determines the number of operators and operands and the total use of each, and since both segments contain two selection statements and three blocks of instructions, the complexities as determined by Halstead's measure would also be indistinguishable.

However, by examining the control flow graph of a computer program we can determine the nesting of each block or node. This can be done by using some concepts from the study of lattices. It is important to recognize, though, that the control flow graph of a computer program is not a lattice, since the relation defined by the set of edges does not necessarily form a partial order [2].

We may say that a node x precedes a node y if there is a path from node x to node y. We may write this as x<y. If there is an edge from node x to node y, then we may say x immediately precedes y, written x<<y.

Now consider a subgraph G' of a control flow graph G. An element m in G is said to be an upper bound for G' if it precedes all nodes in G'. An element n in G is said to be a lower bound of G' if it succeeds every node in G'. If there exists an upper bound of G', m', which succeeds all other upper bounds of G', then m' is the least upper bound of G'. Likewise, if there exists a lower bound of G', n', which precedes all other lower bounds of G', then n' is the greatest lower bound of G'.

Often, a node will exist in G with an outdegree of two or greater. We shall refer to such a node as a selection node.
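The V(G) values quoted above for the two code segments, and the selection nodes just defined, can be checked with a short sketch. The edge lists below are one plausible encoding of the two segments as control flow graphs; they are an assumption of this example (the paper does not draw these graphs), as are the function names.

```python
# Assumed control flow graphs of the two code segments, as (from, to) edges.
# Nested:     if P1 then (if P2 then S1 else S2) else S3; S4
NESTED = [("P1", "P2"), ("P1", "S3"), ("P2", "S1"), ("P2", "S2"),
          ("S1", "S4"), ("S2", "S4"), ("S3", "S4")]
# Sequential: if P1 then S1 else S2; if P2 then S3; S4
SEQUENTIAL = [("P1", "S1"), ("P1", "S2"), ("S1", "P2"), ("S2", "P2"),
              ("P2", "S3"), ("P2", "S4"), ("S3", "S4")]


def nodes_of(edges):
    """Collect every node that appears as an endpoint of some edge."""
    return {node for edge in edges for node in edge}


def cyclomatic_number(edges):
    """V(G) = e - n + 2, with e edges and n nodes."""
    return len(edges) - len(nodes_of(edges)) + 2


def selection_nodes(edges):
    """Nodes with an outdegree of two or greater."""
    outdegree = {}
    for src, _dst in edges:
        outdegree[src] = outdegree.get(src, 0) + 1
    return {node for node, degree in outdegree.items() if degree >= 2}


for name, graph in (("nested", NESTED), ("sequential", SEQUENTIAL)):
    print(name, cyclomatic_number(graph), sorted(selection_nodes(graph)))
```

Under this encoding both graphs have seven edges and six nodes, so V(G) is 3 in each case, and both contain the same two selection nodes, P1 and P2. Nothing in either quantity records that P2 sits inside P1 in the first segment only.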
Even though we cannot determine every path possible from a selection node, since there may be an infinite number if we allow backward branches, we can determine the nodes that lie upon any possible path from a selection node using the above concepts. If we accept the fact that the complexity of a control structure such as an if-then-else is dependent upon the statements within its range, then knowing what nodes lie upon the paths within its range can be useful.

We may determine which nodes lie within the selection node's range by forming a subgraph of all the nodes which lie between the selection node itself and the greatest lower bound of the subgraph formed of all nodes which immediately succeed the selection node. This subgraph, G', will contain every node which lies within the range of the selection node. For example:

    if P1 then
        if P2 then S1;
        else S2;
    else S3;
    S4;

It can be said that all nodes except for S4 lie within the range of the node P1 (i.e., the outer if-then-else construct). The greatest lower bound for this construct is S4. Likewise, S1 and S2 lie within the range of node P2. For P2 also, S4 is the greatest lower bound. The subgraph G' for P1 contains the nodes S3, P2, S1 and S2. The subgraph G' for P2 contains the nodes S1 and S2.

We may utilize this technique to compute the complexity of a computer program by assigning each node a "raw" complexity value which would consist of the Halstead measure for that node. In addition, each node would possess an additional complexity measure which we shall refer to as that node's "adjusted" complexity.
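The range construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the edge list, the function names, and the decision to let a node act as a lower bound of a set that contains it (so that an if-then without an else is bounded by its join node) are all assumptions of this example. It is intended for graphs without backward branches, where the greatest lower bound is unambiguous.

```python
# Assumed flow graph of: if P1 then (if P2 then S1 else S2) else S3; S4
EDGES = [("P1", "P2"), ("P1", "S3"), ("P2", "S1"), ("P2", "S2"),
         ("S1", "S4"), ("S2", "S4"), ("S3", "S4")]


def build_successors(edges):
    """Map every node to the set of nodes it immediately precedes."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
        succ.setdefault(dst, set())
    return succ


def reachable(succ, start):
    """Nodes y with start < y, i.e. reachable by a path of one or more edges."""
    seen, stack = set(), list(succ[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(succ[node])
    return seen


def greatest_lower_bound(succ, subset):
    """Lower bound of `subset` that precedes every other lower bound.

    Assumption: a member of `subset` may serve as its own bound.
    Returns None when no such node exists.
    """
    at_or_after = {n: reachable(succ, n) | {n} for n in succ}
    bounds = {n for n in succ if all(n in at_or_after[m] for m in subset)}
    for candidate in bounds:
        if bounds <= at_or_after[candidate]:
            return candidate
    return None


def selection_range(succ, node):
    """Return (glb, G'): the greatest lower bound of the immediate successors
    of `node`, and the nodes lying strictly between `node` and that bound."""
    glb = greatest_lower_bound(succ, succ[node])
    if glb is None:
        return None, set()
    between = {n for n in reachable(succ, node) if glb in reachable(succ, n)}
    return glb, between - {node, glb}


SUCC = build_successors(EDGES)
for selector in ("P1", "P2"):
    bound, inside = selection_range(SUCC, selector)
    print(selector, bound, sorted(inside))
```

For this graph the sketch reports S4 as the greatest lower bound for both P1 and P2, with P2, S1, S2 and S3 in the range of P1 and S1 and S2 in the range of P2, in agreement with the example in the text.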
The adjusted complexity of a given node may be calculated using the following procedure:

First, the subgraph G' of each selection node (i.e., outdegree > 1) should be determined.

After the subgraph G' is formed (note that it does not contain the greatest lower bound), the adjusted complexity for the selection node is computed by summing the raw complexity of every node within the subgraph G' and adding the raw complexity of the selection node itself.

For all other nodes (i.e., outdegree <= 1) the adjusted complexity is set equal to the node's raw complexity.

The sum of the adjusted complexities of all the nodes in the control flow graph is then used as a measure of the total program's complexity.

This complexity measure is capable of recognizing the effects of both the level of nesting within control structures and the program magnitude. The level of nesting is determined via the use of the control flow graph, and the program magnitude is taken into account by using the Halstead measure for each component node (i.e., the node's raw complexity). This lends itself to a more intuitively satisfying complexity metric than either McCabe's or Halstead's metrics by themselves.

While both McCabe's metric and the one presented here make use of control flow graphs, it should be noted that the similarity ends there. McCabe is actually interested only in the basic paths, while the current complexity metric uses the control paths of the graph only to identify which nodes contribute to other nodes' complexities.

On the following pages are some comparisons of the measures calculated by both McCabe's measure and the method presented in this paper. The flow graphs used are taken from examples given in McCabe's original paper. While the raw complexity of each node would ordinarily reflect that node's Halstead measure, raw complexities of 1 have been assigned to every node to allow comparisons between the two techniques (since McCabe does not take into account the complexity of individual nodes).
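The procedure above reduces to a few lines once the range G' of each selection node is known. The sketch below is illustrative only; the function names and data layout are assumptions of this example. It uses raw complexities of 1, as in the comparisons that follow. The ranges for the nested segment are those stated earlier in the text; the ranges for the sequential segment are not given in the paper and were derived here by applying the same construction, treating the join node S4 as the greatest lower bound of the if-then without an else.

```python
def adjusted_complexities(raw, ranges):
    """Compute the adjusted complexity of every node.

    raw:    node -> raw complexity
    ranges: selection node -> set of nodes in its subgraph G'
            (the greatest lower bound is not part of G')
    Nodes absent from `ranges` keep their raw complexity.
    """
    adjusted = {}
    for node, value in raw.items():
        inside = ranges.get(node, ())
        adjusted[node] = value + sum(raw[n] for n in inside)
    return adjusted


def program_complexity(raw, ranges):
    """Sum of the adjusted complexities of all nodes."""
    return sum(adjusted_complexities(raw, ranges).values())


# Unit raw complexities for the six nodes of either segment.
UNIT_RAW = {node: 1 for node in ("P1", "P2", "S1", "S2", "S3", "S4")}

# if P1 then (if P2 then S1 else S2) else S3; S4   (ranges from the text)
NESTED_RANGES = {"P1": {"P2", "S1", "S2", "S3"}, "P2": {"S1", "S2"}}

# if P1 then S1 else S2; if P2 then S3; S4         (ranges derived here)
SEQUENTIAL_RANGES = {"P1": {"S1", "S2"}, "P2": {"S3"}}

print(adjusted_complexities(UNIT_RAW, NESTED_RANGES))
print(program_complexity(UNIT_RAW, NESTED_RANGES))
print(program_complexity(UNIT_RAW, SEQUENTIAL_RANGES))
```

Under these assumptions the nested segment totals 12 (5 for P1, 3 for P2, and 1 for each of the four remaining nodes) and the sequential segment totals 9, whereas V(G) is 3 for both. These totals are computed for this illustration and do not appear in the paper.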
[Figures: control flow graphs A through L, taken from McCabe's original paper.]

Table 1. Complexity Calculations

    Graph    McCabe's Measure    New Complexity Measure*
    A               2                      3
    B               3                     15
    C               5                     24
    D               6                     26
    E               8                     28
    F               8                     59
    G               9                     78
    H              10                     81
    I              10                     53
    J              11                    186
    K              10                     39
    L              19                    126

* Whenever a graph contains two consecutive nodes where executing the first node implies executing the second and vice versa, these nodes have been merged for determination of the new complexity measure.

Table 2. Ranking of Programs by Complexity

    Graph    McCabe's Measure    New Complexity Measure
    A               1                      1
    B               2                      2
    C               3                      3
    D               4                      4
    E               5                      5
    F               5                      8***
    G               7                      9***
    H               8                     10***
    I               8                      7***
    J              11                     12***
    K               8                      6***
    L              12                     11***

*** Graphs where ranking differs between the two methods.

The actual values obtained for the two complexity measures are not very significant since we assume the complexity of each node was one. What is significant is the ranking of the programs in order of increasing complexity presented in Table 2. For the simple graphs A through E, the two measures agree on rankings. For the more complex graphs, the two measures rank the graphs in different orders. The rankings of graphs F and K are particularly different.

McCabe's complexity measure and the one discussed in this paper capture different notions of complexity. McCabe attempts to determine the number of execution paths in a program. This type of measure provides an indication of the difficulty which would be encountered in debugging a program using testing. The new measure attempts to determine how difficult the static program text would be to understand. This type of measure provides an indication of the difficulty which would be encountered in proving a program correct.

Consider graph F. The McCabe measure ranks this graph as simpler than any of G through L. The new measure considers K and I to be less complex than F.
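The rankings in Table 2 follow mechanically from the values in Table 1. The sketch below is a reader's check rather than part of the paper: it assumes that tied graphs share the lowest rank of the tie (so that E and F both rank 5 and the next graph ranks 7), which appears consistent with the published table. The function name is hypothetical.

```python
# Values transcribed from Table 1.
MCCABE = {"A": 2, "B": 3, "C": 5, "D": 6, "E": 8, "F": 8,
          "G": 9, "H": 10, "I": 10, "J": 11, "K": 10, "L": 19}
NEW = {"A": 3, "B": 15, "C": 24, "D": 26, "E": 28, "F": 59,
       "G": 78, "H": 81, "I": 53, "J": 186, "K": 39, "L": 126}


def competition_rank(scores):
    """Rank 1 is the least complex; tied graphs share the lowest rank."""
    ordered = sorted(scores.values())
    # list.index returns the first position of a value, so ties share a rank.
    return {graph: ordered.index(value) + 1 for graph, value in scores.items()}


mccabe_rank = competition_rank(MCCABE)
new_rank = competition_rank(NEW)
differing = sorted(g for g in MCCABE if mccabe_rank[g] != new_rank[g])

for graph in sorted(MCCABE):
    print(graph, mccabe_rank[graph], new_rank[graph])
print("rank differs for:", differing)
```

With this tie rule the sketch yields the ranks shown in Table 2 and flags graphs F through L as the ones on which the two measures disagree. In particular F moves from rank 5 under McCabe's measure to rank 8 under the new measure, behind K (rank 6) and I (rank 7).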
The following programs exhibit the same control structure as F, I and K respectively.

Program Corresponding to F

    if p1 then
        S3
        goto S13
    else if p2 then
        goto S3
    else if p4 then
        case p6 of
            (1): goto S3
            (2): S7
                 goto S9
            (3): S8
                 goto S9
        esac
    else if p5 then
        goto p6
    S9
    if p10 then S11
    S12
    S13

Program Corresponding to I

    case p1 of
    (1): if p2 then S3
         if p4 then S5 S22 goto S23
         else S6
    (2): if p7 then S8
         if p9 then S10 S12
         else S11 S13
    (3): if p14 then S15
         if p16 then
             if p17 then S19
             else;
         else S18
         S20
    esac
    S21
    S23

Program Corresponding to K

    S1
    S2
    DO S3 while(condition)
    DO S4 while(condition)
    DO S5 while(condition)
    S6
    S7
    DO S8 while(condition)
    S9
    DO S10 while(condition)
    S11
    DO S12 while(condition)
    DO S13 while(condition)
    if p14 then S15
    else S16
    if p17 then goto S16
    else goto S7

where the conditions for repetition of statements S3, S4, S5, S8, S10, S12, and S13 were not indicated in the original graph.

The program corresponding to K is the easiest to read. Except for the branch back to S7 at the end, it can be read from top to bottom. The program corresponding to I can be read as a case statement followed by two simple statements. The three cases within the case statement each involve two levels of nested IFs. The first also has a goto to outside of the case statement. The program corresponding to F has three levels of nested IFs with a case statement within the lowest level IF. Further, there are six gotos which make the program very difficult to read.

REFERENCES

1. Halstead, Maurice H., Elements of Software Science, Amsterdam, The Netherlands: North-Holland, 1977.

2. Kildall, Gary A., "A Unified Approach to Global Program Optimization", First ACM Conference on Principles of Programming Languages, Boston, October, 1973, pp. 194-206.