Taxonomy Dimensions of Complexity Metrics
Bouchaib Falah1, Kenneth Magel2
1 Al Akhawayn University, Ifrane, Morocco
2 North Dakota State University, Fargo, ND, USA
1 B.Falah@aui.ma, 2 Kenneth.magel@ndsu.edu
Abstract
Over the last several years, software engineers have devoted great effort to measuring the complexity of computer programs, and many software metrics have been introduced. These metrics were invented to identify and evaluate the characteristics of computer programs, but most of them have been defined and then tested only in a limited environment. Researchers have proposed sets of complexity metrics that address many principles of object-oriented software production in order to enhance and improve software development and maintenance. The aim of this paper is to present a taxonomy of complexity metrics that separately evaluate structural and dynamic characteristics of size, control flow, and data. While most existing metrics apply only to the method and class levels of complexity, our approach uses metrics at each of three levels: class, method, and statement.
Keywords: Complexity Metrics; Software Testing; Effectiveness; Data Flow; Data Usage; Taxonomy; Cohesion.
1. Introduction
Measurement makes interesting characteristics of products more visible and understandable [1, 2]. Appropriate measurement can identify useful patterns present in the product being measured [3], giving us a better understanding of the relationships among activities and entities. Measurement is not only useful but necessary: at a minimum, it is needed to assess the status of our applications, projects, products, and systems. Measurement not only helps us understand what is happening during the development and maintenance of our projects; it also allows us to control the interaction between the components of our project and encourages us to improve our projects and products.
A multitude of software metrics have been developed since the pioneering work of Halstead [4], and several taxonomies have been used to describe these metrics.
Nowadays, software is expected to have an extended lifespan, which makes evaluating its complexity at the early stages critical for upcoming maintenance. Indeed, complexity grows as software evolves. Software metrics were introduced as tools that allow us to obtain an objective measurement of the complexity of software, hence enabling software engineers to assess and manage that complexity. Reducing software costs is one of the major concerns of software engineering, which creates an increasing need for new methodologies and techniques to control those costs. Software complexity metrics can help us do so. In this paper, we provide a taxonomy of complexity metrics that can serve to reduce software costs. These metrics are used on each of the three levels: class, method, and statement.
2. Related Work
Many metrics have been invented, but most have been defined and then tested only in a limited environment. The most commonly used metrics for software are the number of lines of source code (LOC, a rough measure of size) and cyclomatic complexity (a rough measure of control flow).
Halstead's software science metrics [4] are other common object-oriented metrics used in the coding phase. Maurice Halstead's approach relies on mathematical relationships among the number of variables. His metrics, commonly referred to as 'software science' [4], were proposed as a means of determining quantitative measures directly from the operators and operands in the program. Halstead metrics
are used during the development phase with the goal of
assessing the code of the program. Halstead’s metrics are
at the statement level, although they can be aggregated to
form method and class level metrics.
Chidamber and Kemerer [5] proposed a set of complexity
metrics that address many principles of object oriented
software production to enhance and improve software
development and maintenance. However, their metrics applied only to the method and class levels of complexity.
They were evaluated against a wide range of complexity
metrics proposed by other software researchers and
experienced object oriented software developers. When
these metrics are evaluated, small experiments are done to
determine whether or not the metrics are effective
predictors of how much time would be required to
perform some task, such as documentation or answering
questions about the software. Results have been mixed.
Nevertheless, industry has adopted these metrics and
others because they are better than nothing.
In recent years, much attention has been directed
toward reducing software cost. To this end, software
engineers have attempted to find relationships between the
characteristics of programs and the complexity of doing
programming tasks or achieving desirable properties in
the resulting product such as traceability or security. The
aim has been to create measures of software complexity to
guide efforts to reduce software costs.
Our work applies a comprehensive suite of complexity metrics that can help maximize the effectiveness of software testing.
3. Software Complexity Metrics
This paper uses software complexity metrics for
object-oriented applications. Metrics for code that is not
object oriented are not discussed in this research paper.
A metric is a measurement. Any measurement can be a
useful metric. There are several reasons to use metrics in
measuring the complexity of software, for instance:
• Prediction: metrics form the basis of any method for predicting schedule, resource needs, performance, or reliability.
• Evaluation: metrics form the basis of determining how well we have done.
• Targeting: metrics form the basis for deciding how much effort to assign to which part of a task.
• Prioritization: metrics can form the basis for deciding what to do next.
Several researchers have proposed a wide variety of
software complexity metrics. Each metric examines only
one characteristic of software. This characteristic is one
of:
• Size: how large the software is.
• Control flow: how varied the possible flow is, how deeply nested it is, or how long it is.
• Data usage: how many data items are defined in the software, how many data items are related, or how many values an attribute's value depends upon.
3.1. Size Metrics
One of the basic measures of a system is its size.
Measures of software size include length, functionality,
and complexity.
The oldest and most widely used size metric is the number of lines of code, a common object-oriented metric used in the coding phase. There are two major ways to count the lines of code, depending on what we count: physical lines of code (LOC) and logical lines of code (LLOC). While the common definition of LOC is the count of lines in the text of the program's source code, including comment lines, LLOC is defined as the number of statements.
For example, consider the following Java fragment:

    for (int i = 0; i < 100; i++) System.out.println("hello"); // this is a line of code example.

In this example, LOC = 1 and LLOC = 2.
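To make the two counts concrete, the following sketch approximates LOC and LLOC for a Java fragment. The class name, the statement-terminator heuristic, and the keyword list are our own illustrative choices; a real counter would use a parser rather than string scanning.

    public class LineCounter {

        // Physical lines of code: every line of source text, comments included.
        static long countLOC(String source) {
            return source.lines().count();
        }

        // Logical lines of code, crudely approximated as semicolons outside
        // parentheses plus one for each loop/branch header. This heuristic
        // ignores strings and comments; a real counter would parse the code.
        static long countLLOC(String source) {
            long lloc = 0;
            int parenDepth = 0;
            for (char c : source.toCharArray()) {
                if (c == '(') parenDepth++;
                else if (c == ')') parenDepth--;
                else if (c == ';' && parenDepth == 0) lloc++;
            }
            for (String kw : new String[] {"for", "while", "if", "switch"}) {
                lloc += source.split("\\b" + kw + "\\s*\\(", -1).length - 1;
            }
            return lloc;
        }

        public static void main(String[] args) {
            String fragment =
                "for (int i = 0; i < 100; i++) System.out.println(\"hi\"); // comment";
            System.out.println("LOC = " + countLOC(fragment));   // prints 1
            System.out.println("LLOC = " + countLLOC(fragment)); // prints 2
        }
    }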
Other common OO metrics used in the coding phase were provided by Halstead's software science [4]. Halstead's approach is based on the assumption that a program should be viewed as an expression of language. Halstead believed that the complexities of languages are an essential part of the reason a programmer might find complexity in program code. Therefore, he based his approach on the mathematical relationships among the number of variables, the complexity of the code, and the type of programming language statements.
Because our research is concerned with object-oriented Java applications, we will adopt the Halstead metrics to calculate the number of operators contained in each statement of a Java program. We will then extend this metric to compute the total and the maximum number of operators over all statements within each method, and, furthermore, the total and the maximum number of operators over all methods within the class. That means we will use the number of operators at all three levels: class, method, and statement.
3.2. Control Flow Metrics
Another object-oriented metric used in the coding phase is the McCabe cyclomatic metric [6, 7]. Thomas McCabe developed his complexity metric in 1976. His approach was based on the assumption that the complexity
of software is related to the number of control paths generated by the code [6]. In other words, code complexity is determined by the number of control paths the code creates. This means that, to compute a code's complexity, the number of decisions (if/then/else) and control statements (do-while, while, for) in the code is the sole criterion and therefore must be determined. For example, a simple function with no conditionals has only one path; a function with one conditional has two paths. This metric is based on the logic that programs with simple conditionals are easier to understand and hence less complex, while those with multiple conditionals are harder to understand and hence more complex.
The control flow graph, G, of any given program can be drawn. Each node of the graph G corresponds to a block of code and each arc corresponds to a branch of decision in the program. The McCabe cyclomatic metric [8] of such a graph is defined as:

CC(G) = E - N + 2P    (1)

where:
• E is the number of edges of G.
• N is the number of nodes of G.
• P is the number of connected components of G.

Formula (1) can also be written as:

CC(G) = D + 1    (2)

where:
• D is the number of decisions inside the code.

Even if this information supplies only a portion of the complex picture, McCabe [7] tried to extend his metric into architectural design and developed a testing methodology that integrates the notion of design complexity with the testing requirement.
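Formula (2) lends itself to a simple approximation: count the decision points and add one. The sketch below does this by keyword counting over Java source; the keyword list is our own simplification (boolean operators and ternaries are ignored), so it is illustrative rather than a faithful McCabe implementation.

    public class CyclomaticCounter {

        // Decision-introducing keywords considered by this simplified count.
        private static final String[] DECISIONS = {"if", "for", "while", "case", "catch"};

        // Approximates CC(G) = D + 1 by counting decision keywords.
        // A production tool would build the control flow graph instead.
        static int cyclomaticComplexity(String source) {
            int decisions = 0;
            for (String kw : DECISIONS) {
                // \b keeps us from matching substrings such as "classify".
                decisions += source.split("\\b" + kw + "\\b", -1).length - 1;
            }
            return decisions + 1;
        }

        public static void main(String[] args) {
            String method = "int max(int a, int b) { if (a > b) return a; return b; }";
            System.out.println(cyclomaticComplexity(method)); // 1 decision -> prints 2
        }
    }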
3.3. Data Metrics

Data complexity metrics can be divided into two different aspects: data flow and data usage. Data flow is the number of formal parameters of activities and the mappings between activities' data [9]. We define the data usage of a statement to be the number of variable values used in that statement plus the number of variables assigned new values in that statement.

The development of test cases by many researchers has been based on the program unit's variables; the emphasis of test cases was on data and data flow, or the Data-Usage Path [10]. Chidamber and Kemerer metrics [5], also known as C&K metrics, were among the first family of related metrics that address many concerns of OO designers, including relationships such as coupling, cohesion, inheritance, and class size [11]. The notion of cohesion and the various complexity metrics associated with it are also related to data variables. In OO, the most widely used C&K metric, when cohesion is related to instance variables, is Lack of Cohesion in Methods (LCOM) [12, 13].

Chidamber and Kemerer proposed a set of metrics that cover not just the data aspect but also other aspects. The C&K metrics are computed for each class in an application. Most of the metrics are at the class level while a few are at the method level. Figure 1, for example, illustrates how the C&K metrics would be apportioned among taxonomy dimensions.

Figure 1. Taxonomy Dimensions of C&K Metrics.

While C&K metrics are used only at the class and method levels, our approach uses metrics on each of the three levels: class, method, and statement. Figure 2 illustrates how our suite of complexity metrics would be apportioned among our taxonomy dimensions.

Figure 2. Taxonomy Dimensions of Our Approach.
4. Comprehensive Taxonomy of Metrics
Software engineers use measurement throughout the
entire life cycle. Software developers measure the
characteristics of software to get some sense of whether
the requirements are consistent and complete, whether the
design is of high quality, and whether the code is ready to
be tested. Project Managers measure attributes of the
product to be able to tell when the software will be ready
for delivery and whether the budget will be exceeded.
Customers measure aspects of the final product to
determine if it meets the requirements and if its quality is
sufficient. And maintainers must be able to assess and
evaluate the product to see what should be upgraded and
improved.
Software metrics are usually considered in one or two of four categories:
• Product (e.g. lines of code)
• Process (e.g. test cases produced)
• People (e.g. inspections participated in)
• Value to the customer (e.g. requirements completed)
In our work, we will concentrate on product metrics as
selectors for test cases. Previous work using metrics
almost always considered only a small set of metrics
which measured only one or two aspects of product
complexity.
Our work starts with the development of a
comprehensive taxonomy of product metrics. We will
base this taxonomy on two dimensions: (1) the level of the
product to which the metric applies; and (2) the
characteristic of product complexity that the metric
measures.
In future work, we hope to produce a comprehensive taxonomy covering the other kinds of metrics.
The scope of consideration dimension includes the
following values:
(1) the product’s context including other software
and hardware with which the product interacts
(2) the entire product
(3) a single subsystem or layer
(4) a single component
(5) a class
(6) a method
(7) a statement
For the initial uses of this taxonomy reported in this paper, we will use only (5), (6), and (7), since they appear to be the most relevant scopes for unit testing. Future work may add (3) and (4) as we consider integration testing. Values (1) and (2) may be used for system testing.
The complexity kind dimension includes the following values:
1) size
2) control flow
3) data
Each of these values in turn has sub-values.
For size, the sub-values are:
a) number of units (e.g. statements)
b) number of interactions (e.g. number of method calls)
For control flow, the sub-values are:
a) number of decisions
b) depth of decisions
For data, the sub-values are:
a) data usage
b) data flow
4.1. Metrics at Statement Level
4.1.1. Data Complexity. In our research, we consider
two separate aspects, data flow and data usage. Data flow
is based on the idea that changing the value of any
variable will affect the values of the variables depending
upon that variable's value. Data usage, in contrast, is based on the number of data items defined in the unit being considered or the number of data items related to that unit. We will define the data usage of a statement to be the number of variable values used in that statement plus the number of variables assigned new values in that statement.
Data flow complexity measures the structural complexity of the program: it measures the behavior of the data as it interacts with the program. It is a criterion based on the flow of data through the program, developed to detect errors in data usage and to concentrate on the interactions between variable definition and reference.
Several testers have chosen testing with data flow
because data flow is closely related to Object Oriented
cohesion [12, 14]. One measure of class cohesion is how
methods are related through common data variables.
Data flow testing is a white box testing technique that
can be used to detect inappropriate usage of data values
due to coding errors [15]. For instance, a programmer
might use a variable without defining it or might define a
variable without initializing it (e.g. int a; if (a==1) {…}).
A program written in an OO language, such as Java,
contains variables. Variables are defined by assigning
values to them and are used in expressions. An assignment
statement such as x = y + z; defines the variable x. This statement also makes use of the variables y and z. In this case, the variable x is called a definition, while the variables y and z are called uses. A declaration statement such as int x, y, z; defines the three variables x, y, and z; all three are assumed to be definitions.
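A minimal sketch of this definition/use split for simple assignment statements follows. It assumes statements of the form lhs = expression; and treats every identifier to the right of '=' as a use, which is enough for the x = y + z example but not for general Java.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DefUse {

        private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_]\\w*");

        // Splits an assignment into its definition (left of '=')
        // and its uses (identifiers right of '=').
        static void classify(String assignment) {
            String[] sides = assignment.replace(";", "").split("=", 2);
            String definition = sides[0].trim();
            List<String> uses = new ArrayList<>();
            Matcher m = IDENTIFIER.matcher(sides[1]);
            while (m.find()) uses.add(m.group());
            System.out.println("definition: " + definition + ", uses: " + uses);
        }

        public static void main(String[] args) {
            classify("x = y + z;"); // definition: x, uses: [y, z]
        }
    }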
In our research, data flow will be estimated for each
statement in a method by counting how many active data
values there are when the method executes. Active data
values will be counted by determining which variable
assignments could still be active when this statement
begins execution plus the number of assignments done in
this statement. As an example, let us consider the
following Java class:
Figure 3. Java Code – Scope Metric Example.
In the first statement of this code, the variable is a definition. The same variable is a use in the second statement; thus, the data flow of this statement is 1. In the second statement, the variable is a definition and is assigned a value. The variable is a use in the third assignment; thus the data flow value of the second statement is 2. In the third statement, the variable is a definition and is assigned a new value. The variable is no longer active before the method executes; thus the data flow value of this third statement is 1.

On the other hand, as an example of data usage, let us consider the assignment statement x = y + z;. The variables y and z are used, and the variable x is assigned a new value in the statement. Thus the data usage of this statement is 3.

4.1.2. Control Flow Complexity. In our research, we will use one control flow measure, the scope metric [16]. For each statement, we will count how many control constructs (do-while, if-else, for, while, ...) contain this statement.

For example, assume that Figure 3 illustrates a statement fragment code of a return method named method C within the class "class C". The construct-level statements in this code are the statements numbered (6), (11), and (14). Table 1 shows the scope metric value of each statement in the code of Figure 3.

Table 1. Scope Metric Values of Statements of Figure 3.

Statement      | Construct levels containing the statement | Scope metric value
(4), (5)       | None                                       | 0
(8), (9), (10) | (6)                                        | 1
(13)           | (6), (11)                                  | 2
(15)           | (6), (11), (14)                            | 3
(16)           | (6), (11)                                  | 2
(19)           | None                                       | 0
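The scope metric of Table 1 can be approximated by tracking how many control constructs are open at each line. The sketch below assumes one statement per line and that each construct opens a brace on its header line and closes it on a line of its own; the sample fragment is our own, not the code of Figure 3.

    public class ScopeMetric {

        // Prints, for each statement line, how many control constructs contain it.
        // Assumes a braces-per-line style; a robust version would parse the source.
        static void scopeMetrics(String[] lines) {
            int depth = 0;
            for (String line : lines) {
                String t = line.trim();
                if (t.equals("}")) { depth--; continue; }
                System.out.println(depth + " : " + t);
                // A loop or branch header opens one more construct level.
                if (t.matches("(if|for|while|do|switch)\\b.*\\{")) depth++;
            }
        }

        public static void main(String[] args) {
            scopeMetrics(new String[] {
                "int total = 0;",                  // scope metric 0
                "for (int i = 0; i < 10; i++) {",
                "    if (i % 2 == 0) {",
                "        total += i;",             // scope metric 2
                "    }",
                "}"
            });
        }
    }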
4.1.3. Size Complexity. Our size metrics rely on Halstead's software science definitions. We will use a simplified version of Halstead's operator count, discussed previously. Halstead's software science is one traditional code complexity measure that approaches the topic of code complexity from a unique perspective. Halstead counted traditional operators, such as + and ||, and punctuation, such as semicolons and parentheses, where a parentheses pair counted as a single operator. In our work, we will count just the traditional operators, for simplicity, by counting the number of operators used in each statement of the code.
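A sketch of this simplified operator count appears below. Matching longer operators first keeps '||' from being counted as two '|' and '==' from being counted as two '='; the operator table is illustrative, not Halstead's full list.

    public class OperatorCounter {

        // Longest operators first, so "||" is matched before "|", "==" before "=".
        private static final String[] OPERATORS = {
            "||", "&&", "==", "!=", "<=", ">=", "++", "--",
            "+", "-", "*", "/", "%", "=", "<", ">", "!",
        };

        // Counts traditional operators in one statement (no punctuation counted).
        static int countOperators(String statement) {
            int count = 0;
            int i = 0;
            outer:
            while (i < statement.length()) {
                for (String op : OPERATORS) {
                    if (statement.startsWith(op, i)) {
                        count++;
                        i += op.length();
                        continue outer;
                    }
                }
                i++;
            }
            return count;
        }

        public static void main(String[] args) {
            System.out.println(countOperators("x = y + z;"));           // 2: '=' and '+'
            System.out.println(countOperators("valid = a <= b || c;")); // 3: '=', '<=', '||'
        }
    }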
Figure 4 shows the metrics used in this research at the
statement level. These four metrics will be used as roots to
derive other complexity metrics that will be used at the
method level and class level.
[Figure 4 groups the four statement-level metrics by taxonomy dimension: data complexity (data flow, data usage), control flow complexity (number of levels), and size complexity (number of operators).]

Figure 4. Complexity Perspectives and Metrics at Statement Level.
4.2. Metrics at Method Level
Since a method consists of different statements, we will use both the sum of each metric over the statements within that method and the maximum value of each metric over those statements. In addition to the sum and the maximum of these metrics, we will use another single-level metric that counts the number of other methods within the same module (class) that call this method.
An example of this additional metric is shown in Figure 5.

Figure 5. Example of Other Methods that Call a Method within the Same Class.

For each method within the class "ClassA", the number of other methods within the same class that call that method is shown in Table 2.

Table 2. Metric Results of the Code in Figure 5.

Method  | Other methods that call this method | Metric value
method1 | method2, method3                    | 2
method2 | method1, method3                    | 2
method3 | None                                | 0
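The caller count of Table 2 can be derived by inverting the class's internal call map. The sketch below encodes the call relationships implied by Table 2; the map itself is our stand-in for the code of Figure 5, which is not reproduced here.

    import java.util.List;
    import java.util.Map;

    public class CallerCount {

        // For each method, counts how many OTHER methods in the class call it.
        static void printCallerCounts(Map<String, List<String>> calls) {
            for (String callee : calls.keySet()) {
                long callers = calls.entrySet().stream()
                    .filter(e -> !e.getKey().equals(callee))    // other methods only
                    .filter(e -> e.getValue().contains(callee)) // ...that call it
                    .count();
                System.out.println(callee + " is called by " + callers + " methods");
            }
        }

        public static void main(String[] args) {
            // method -> methods it calls, consistent with Table 2 above.
            Map<String, List<String>> calls = Map.of(
                "method1", List.of("method2"),
                "method2", List.of("method1"),
                "method3", List.of("method1", "method2"));
            printCallerCounts(calls); // method1: 2, method2: 2, method3: 0
        }
    }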
Figure 6 illustrates the nine metrics that will be used to
measure the complexity of a method. Eight of these nine
metrics are derived from the four metrics defined at
statement level.
[Figure 6 lists the nine method-level metrics: total and maximum number of levels, total and maximum number of operators, total and maximum data flow, total and maximum data usage, and the number of other methods that call the method.]

Figure 6. Complexity Metrics at Method Level.
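The total/max aggregation that lifts a statement-level metric to the method level is a one-liner each way, sketched below with arbitrary sample values. Applying the pair to each of the four statement metrics yields eight of the nine method metrics, the ninth being the caller count above.

    import java.util.Arrays;

    public class MethodLevelMetrics {

        // Total of one statement-level metric over a method's statements.
        static int total(int[] statementValues) {
            return Arrays.stream(statementValues).sum();
        }

        // Maximum of one statement-level metric over a method's statements.
        static int max(int[] statementValues) {
            return Arrays.stream(statementValues).max().orElse(0);
        }

        public static void main(String[] args) {
            int[] dataUsage = {1, 2, 1, 3}; // sample per-statement data-usage values
            System.out.println("total DU = " + total(dataUsage)); // 7
            System.out.println("max DU   = " + max(dataUsage));   // 3
        }
    }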
4.3. Metrics at Class Level
At the class level, we will use both the sum of each metric
for the methods within the class and the maximum value
of each metric for the methods within the class. We will
then add two additional metrics: the in-out degree of that
class, which is the number of methods outside of that class
that are called by at least one method in that class, and the
number of public members within the class. The public
members within a class are defined as the public fields
and the public methods defined in that class.
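Of the two class-level additions, the number of public members is the easier to make concrete. A sketch using reflection follows; counting only members declared directly in the class (not inherited ones) is our reading of the definition above.

    import java.lang.reflect.Modifier;
    import java.util.Arrays;

    public class PublicMemberCount {

        // Public members = public fields + public methods declared in the class.
        static long countPublicMembers(Class<?> cls) {
            long publicFields = Arrays.stream(cls.getDeclaredFields())
                    .filter(f -> Modifier.isPublic(f.getModifiers()))
                    .count();
            long publicMethods = Arrays.stream(cls.getDeclaredMethods())
                    .filter(m -> Modifier.isPublic(m.getModifiers()))
                    .count();
            return publicFields + publicMethods;
        }

        // A small class to measure: one public field, one public method,
        // and one private helper that does not count.
        static class Sample {
            public int state;
            public void doWork() {}
            private void helper() {}
        }

        public static void main(String[] args) {
            System.out.println(countPublicMembers(Sample.class)); // prints 2
        }
    }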
As a summary of the comprehensive taxonomy of metrics that will be used in our research, for each executable statement within a method we will have four metrics that emerge from three complexity dimensions:
• Data dimension: active data values (data flow) and data usage values.
• Control dimension: scope metric.
• Size dimension: number of operators.
For each method, we will have nine metrics: the total and the maximum of each of the four statement-level metrics over the statements within that method, plus the number of other methods that call that method.
For each class, we will have twenty metrics: the total and the maximum of each of the nine method-level metrics over the methods within that class, plus two more metrics: the number of methods outside of that class that are called by at least one method in that class (the in-out degree), and the number of public members within the class.
5. Conclusion
This paper aims at developing a comprehensive taxonomy of product metrics that can be used to target test cases. This taxonomy is based on the scope dimension (product level) and the kind dimension (product complexity characteristic). We used the scope dimension values of class, method, and statement, and the kind dimension values of size, control flow, and data. The three kind dimension values of product complexity have sub-categories: size has the number of units and the number of interactions; control flow has the number of decisions and the depth of decisions; data has data flow and data usage.
In our work, we used at least one sub-category from each complexity kind dimension value. For size, we used the number of units and the number of interactions. For control flow, we used only the number of decisions. For data, we used data flow and data usage.
Another contribution of this research was the use of summation and maximum to build larger scope metrics from smaller scope metrics.
6. References
[1] B. Falah and K. Magel, "Test Case Selection Based on a Spectrum of Complexity Metrics," Proceedings of the 2012 International Conference on Information Technology and Software Engineering (ITSE), Lecture Notes in Electrical Engineering, Vol. 212, 2013, pp. 223-235.
[2] B. Falah, K. Magel, and O. El Ariss, "A Complex Based Regression Test Selection Strategy," Computer Science & Engineering: An International Journal (CSEIJ), Vol. 2, No. 5, October 2012.
[3] B. Falah, "An Approach to Regression Test Selection Based on Complexity Metrics," Scholar's Press, ISBN-13: 978-3639518689, 136 pages, October 2013.
[4] M. H. Halstead, "Elements of Software Science," Operating and Programming Systems Series, New York: Elsevier North-Holland, 1977.
[5] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, Vol. 20, No. 6, June 1994, pp. 476-493.
[6] T. J. McCabe and C. Butler, "Design Complexity Measurement and Testing," Communications of the ACM, Vol. 32, Issue 12, December 1989.
[7] T. J. McCabe, "A Complexity Measure," IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976.
[8] M. Clark, B. Salesky, C. Urmson, and D. Brenneman, "Measuring Software Complexity to Target Risky Modules in Autonomous Vehicle Systems," AUVSI Unmanned Systems North America, June 2008.
[9] J. Cardoso, "Control-Flow Complexity Measurement of Processes and Weyuker's Properties," World Academy of Science, Engineering and Technology, August 2005.
[10] S. Rapps and E. J. Weyuker, "Selecting Software Test Data Using Data Flow Information," IEEE Transactions on Software Engineering, Vol. SE-11, No. 4, April 1985, pp. 367-375.
[11] R. Harrison, S. J. Counsell, and R. V. Nithi, "An Investigation into the Applicability and Validity of Object-Oriented Design Metrics," Empirical Software Engineering, Vol. 3, Issue 3, September 1998.
[12] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, Vol. 20, No. 6, June 1994, pp. 476-493.
[13] S. R. Chidamber and C. F. Kemerer, "Towards a Metrics Suite for Object Oriented Design," Proceedings of the ACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Vol. 26, Issue 11, November 1991.
[14] F. Damerau, "A Technique for Computer Detection and Correction of Spelling Errors," Communications of the ACM, Vol. 7, Issue 3, March 1964.
[15] J. P. Myers, "The Complexity of Software Testing," Software Engineering Journal, January 1992, pp. 13-24.
[16] H. F. Li and W. K. Cheung, "An Empirical Study of Software Metrics," IEEE Transactions on Software Engineering, Vol. 13, Issue 6, June 1987.