In this excerpt, we take advantage of the huge amount of open source software available to study the relationships between different size and complexity metrics. To avoid drowning in the myriad of available attributes and metrics, we focus on a single programming language: C, a “classic” in software development that remains one of the most popular languages. We measure a grab-bag of metrics, ranging from the simplest and most commonly cited (lines of code) to some rather sophisticated syntactic metrics, for a set of about 300,000 files. From this we determined which metrics are independent from a statistical point of view, that is, whether traditional complexity metrics actually provide more information than the simple lines-of-code approach.
The results suggest that for non-header files written in C, all the complexity metrics are highly correlated with lines of code; the more sophisticated metrics therefore provide no information that could not be obtained simply by counting lines of code.
However, these results must be accepted with some caution. Header files show poor correlation between cyclomatic complexity and the rest of the metrics. We argue that this is due to the nature of such files: header files contain only specifications, not implementations. We are trying to measure the complexity of source code in terms of program comprehension, and programmers must of course read and comprehend header files, so header files can contribute to complexity to a certain extent. However, even though cyclomatic complexity is poorly correlated with lines of code in this case, that does not mean it is a good complexity metric for header files. On the contrary, the poor correlation is due only to the lack of control structures in header files. These files contain no loops, branches, and so on, so their cyclomatic complexity will always be minimal, regardless of their size.
For non-header files, all the metrics show a high degree of correlation with lines of code. We accounted for the confounding effect of size, showing that the high correlation coefficients persist across different size ranges.
In our opinion, there is a clear lesson from this study: syntactic complexity metrics cannot capture the whole picture of software complexity. Complexity metrics based exclusively on the structure of the program or on properties of the text (for example, redundancy, as Halstead’s metrics are) provide no information about the effort needed to comprehend a piece of code, or at least no more information than lines of code do. This has implications for how these metrics are used. In particular, defect prediction, development and maintenance effort models, and statistical models in general cannot benefit from these metrics, and lines of code should always be considered the first and only metric for such models.
The problem of code complexity versus comprehension complexity has been addressed by the research community before. In particular, a semantic entropy metric has been proposed, based on how obscure the identifiers used in a program (for instance, variable names) are. Interestingly, that kind of measurement is a good defect predictor.
This does not mean there are no useful lessons to take from traditional complexity metrics. First, cyclomatic complexity is a great indicator of the number of paths that need to be tested in a program. Halstead’s Software Science metrics also offer an interesting lesson: there are always several ways of doing the same thing in a program, so if you choose one way and use it consistently throughout the program, your code becomes more redundant, and in turn more readable and less complex, in spite of what other statistics might say.
Learn more about this topic from Making Software.
Many claims are made about how certain tools, technologies, and practices improve software development. But which claims are verifiable, and which are merely wishful thinking? In this book, leading thinkers such as Steve McConnell, Barry Boehm, and Barbara Kitchenham offer essays that uncover the truth and unmask myths commonly held among the software development community. Their insights may surprise you.