McCabe's Cyclomatic complexity metric interpreted.
McCabe's Cyclomatic complexity description
McCabe developed a metric that allows developers to identify modules that are difficult to test or maintain.
As a result, he developed a software metric that equates complexity to the number of decisions in a program. Developers can use this measure to determine which modules of a program are overly-complex and need to be re-coded. The metric can also be used to define the minimal number of test cases that are needed to adequately test the program's paths.
McCabe bases the metric on graph theory. Because the metric is discussed in graph theory terms, fundamental graph concepts are presented here.
The flow of a program can be represented by a program control graph. The elements of the graph are nodes (or vertices) and edges (or arcs). A node is a sequential block of code. Control flow may only enter the first statement of the block and may only branch from the last statement. The nodes represent the executable statements of a program and the edges correspond to the flow of control between nodes. The top node of the graph, from which other nodes stem, is the entry node. The bottom node defines the end of the block of code, as illustrated in the figure on the side.
The metric can be calculated either from evaluating the graphs of the program or from evaluating the program's statements.
The general formula for calculating the Cyclomatic Complexity is as follows:
$
\begin{equation}
v\left(G\right) = e - n + 2\times p
\end{equation}
$
where “v(G)” is the Cyclomatic Complexity, “e” is the number of edges, "n" is the number of nodes, and "p" is the number of modules. A module can always be represented by a connected graph. A connected graph is one in which any two nodes of the graph are connected to each other by a chain of nodes and edges. However, this formula is based on a theorem that applies to strongly-connected graphs.
At the simplest level, where a graph has one module, the formula reduces to the following:
$
\begin{equation}
v\left(G\right) = e - n + 2
\end{equation}
$
This reduced equation has been used extensively in industry and is often referred to as “McCabe's Complexity Metric”, although his general equation actually includes the variable “p”. Using the reduced form can lead to incorrect results if the developer is not careful. If the totals of each variable for several modules are used in the equation, the result will underestimate the real Cyclomatic Complexity because the number of modules variable, p, is not present in this shortened equation.
The Cyclomatic Complexity of a module also gives the maximum number of linearly independent paths through it. A path is a route from the entry node to the exit node. For a path to be linearly independent, it must not be a linear combination of any other paths in the graph.
The Cyclomatic Complexity measure can be used on both structured and unstructured programs. Additionally, it can be automated, as evidenced by the tools that exist on the market.
- The Cyclomatic Complexity can also be computed by counting the branch conditions in a module:
-
$
\begin{equation}
v\left(G\right) = \pi + 1
\end{equation}
$
$
\mbox{where “}\pi \mbox{” is the number of simple branch conditions.}
$
- $\mbox{The “}\pi+ 1\mbox{” formula}$ is convenient because it allows developers to calculate the Cyclomatic Complexity of a program without having to use graph analysis. This Cyclomatic Complexity equation only applies to individual modules.
- The module must contain only single-entry and single-exit, structured, blocks of code for this formula to be precise. In single-entry, single-exit blocks of code, any called subroutines must pass the control flow back to the calling module. If the code branches out of or into a loop, or branches out of or into a decision, $\pi+ 1$ will not accurately reflect v(G) because the relationship of nodes to edges is upset. Hence, $\pi+ 1$ will no longer accurately represent $\mbox{the “}e - n + 2p\mbox{” formula}$ from which it is derived.
Besides measuring complexity, the Cyclomatic Complexity Measure indicates the maximum number of test cases that must be generated to achieve branch coverage. For this type of test coverage, a basis set of paths constitutes a set of test cases that ensures branch coverage. However, "the number of paths in a basis can greatly exceed the minimum number of paths required to achieve branch coverage." (Prather 1983). Hence, this method could lead to over-testing. Furthermore, as previously mentioned, the basis set is not uniquely determined.
General Analysis of McCabe's Cyclomatic Complexity Metric
In order to produce repeatable, well-behaved measures, McCabe's metric, like Halstead's metrics, requires that precise counting rules be defined and implemented. These rules will differ depending on the language's constructs.
To use McCabe's metric properly, efforts must be made to ensure the following:
- The source code is error-free (e.g., it has compiled correctly).
- Unconditional branching does not contribute to the count.
- Every condition in a statement is considered.
- The formulas are applied properly. To use the "e - n + 2p" formula, treat each subgraph separately. Do no attempt to join the separate modules with the main module by adding edges in the
control graph. Instead, obtain the edge and node counts for each module and sum them to get an overall count of the program's Cyclomatic Complexity. To use the "$\pi$+ 1" formula, apply the formula to each module.
- All edges and nodes are properly counted for the "e - n + 2p" formula. Similarly, in the "$\pi$+ 1" formula, ensure that all forms of conditions are anticipated by the counting rules and
thus are counted.
McCabe's metric is flexible. The complexity of a module can be determined by counting the number of conditions in a program or by counting nodes and edges in a control graph. The former allows the process to be easily automated. Many commercial tools are available that tally a given language's constructs. When the "e - n + 2p" equation is used, the code is graphically represented. The graphs allow the programmer to see the structure of individual modules, and the graphs can be used to diagnose overly-complex structures.
Although McCabe defines three types of nodes (collecting, function, and predicate), he never clearly defines what statements each should contain. Hence, the implementor must judge what constitutes a node and must clearly define it. The measure will yield inconsistent results if implementors define nodes differently. For a particular application, however, results will not be inconsistent if the nodes have been defined precisely.
The "π+ 1" equation has several potential implementation problems. The main problem with using this equation stems from the fact that it only applies to structured code. Most real applications contain a portion of unstructured code. So although industry fully embraces this form of the Cyclomatic Complexity Measure because it can be calculated solely from code, its results may be questionable. Further, assuming the code is very structured, the measure may still be implemented incorrectly on multiple modules, as noted earlier.
The formula "e - n + 2p" will yield the most consistent, repeatable results. Because this formula is additive, it can be applied to modules or the entire application. Although graph analysis is
required, the analysis can be automated. Again, nodes must be defined precisely. The CE may request that the developer submit definitions of the nodes to ensure that the measure has been properly interpreted.
McCabe's Cyclomatic Complexity Metric claims to measure the complexity of a module by counting the number of decisions in a program. The metric measures the complexity within a module. It does not assess the complexity that might exist in the connections between modules. Further, McCabe's metric measures only the control structures within a module. Although it may prove to indicate a module's complexity in general, because of the metric's narrow focus, there may be exceptions.
For example, Module A contains a simple sequence control structure in which v(G) is 1. Module B also has a v(G) of 1. When the code is inspected, Module A is found to contain 1000 lines of sequential source code; Module B is found to contain one line. According to the Cyclomatic Complexity metric, these two modules have the same complexity. Obviously, the one containing 1000 lines is more complex, even at the most cursory level of reading the module. This situation is atypical, however, since a module containing 1000 lines of code would more likely contain complex control structures as well. In general, studies show a high correlation between McCabe's Complexity Metric and a module's complexity and error-proneness (Myers 1977).
Since 1982, McCabe's Cyclomatic Complexity Metric has received widespread support, both in industry and the government. McCabe's measure provides an upper threshold of complexity for modules. Industry and government have supported his notion that a module containing a v(G) greater than 10 is too complex. Project leaders have used this number to determine the complexity of modules, and to identify complex code in applications.
SQM Based on McCabe's Software Metric
McCabe's SQM, like Halstead's, is narrowly focused. McCabe's metrics look at the control flow to determine the structural complexity of a module. The metrics are not very context-sensitive. They only address conditional statements.
Kitchenham (1981) casts some doubts on the generality of McCabe's metrics. She failed to find strong correlations between complexity and McCabe's and Halstead's metrics. Kitchenham concedes that more research is required to identify the particular conditions under which these metrics can be applied.
The narrow focus means the v(G)-Complexity SQM provides only a partial assessment of the complexity of a module. Other SQM pairs, such as v(G)-Simplicity and v(G)-Understandability provide a partial assessment for the same reason. While they are viable SQM, they should be used as a measure of how control-flow affects each of the software qualities.
McCabe's Cyclomatic Complexity Metric fares better with respect to the following SQM:
- v(G) - Maintainability
- v(G) - Modularity
- v(G) - Testability
Industry supports these SQM as valid indicators of the extent to which a program is maintainable,
modular, and easily tested. The v(G)-Testability SQM continues to receive the most attention in
private industry and the government.
McCabe's Cyclomatic Complexity Measure addresses the structural coverage requirements for assessing
test coverage. However, although it determines the maximum number of test cases required to
generate branch coverage, it does not provide a number that will indicate path coverage.
Hence, if this SQM is used to calculate the number of test cases, it should not be used to
calculate the critical modules' test cases, which require path coverage.