Java: Developing On The Streets: Source Code Quality Estimation is Hard

As you might have guessed I don't believe in metrics that estimate software quality by measuring source code. In the most general case they are bounded by this paradox. Even if not taken too generally, there are still considerable difficulties related to such metrics.

In order to avoid confusion and misinterpretation let me say that (a) the items on this list reflect current knowledge. Future research may find solutions to the problems therein; (b) I am not claiming here that every metric is bad nor that all reasons hold for all metrics.

1. Source code quality metrics will never be able to detect infinite loops.

In other words, you can get a good score for a program that does stupid things at run time.

2. Lack of normalization

Most metrics are not normalized: they do not have a scale by which scores can be compared. With a non-normalized metric the distance between 15 and 20 may completely different than the distance between 25 and 30. In other words, if 100 is the perfect score, then it may be that getting from 99 to 100 is a million times more difficult than from 98 to 99.

3. Indirectness: metrics do not measure the "real thing"

The real thing is software quality, which is a vague notion . For the sake of the discussion let's define quality as "number of bugs". Source code metrics do not measure number of bugs directly. They measure artifacts which are only remotely related to number of bugs.

Think about a 100-meter dash. Instead of determining the winner by directly measuring the running times of the contestants, you measure the number of practices they had in the last four weeks.

4. Blind spots

Source code metrics are mostly blind to DSLs, to the universal design pattern and a bunch of other useful techniques.

5. Detecting the negative

Many metrics work by by collecting negative points. E.g., the Cyclomatic complexity metric gives you negative points for branches in your code. The problem is that Spotting "bad" things is not equivalent for spotting "good" things.

The first implication is that such a metric can only help you when you are doing something wrong. It does not tell you that your are doing something right. In a way, it leads to mediocrity.

Moreover, let's see what happens when you use a metric that detects the negative and also has blind spots (which, sadly, most of the metrics are so) . Such a metric will encourage you to make the blind spots as bigger as possible. As the blind spot grows, the amount of negative things that can be detected lessens. The metric simply beats its own purpose.

6. Focus on things that can be measured

By definition, metrics ignore things that cannot be measured. Focusing only on the measurable makes sense as much estimating the quality of a novel by calculating the number of sentences per chapter.