Jump to content

Best Practices for Developer Code Review

+ 1
  andyo's Photo
Posted Nov 29 2010 08:22 AM

There are a million ways to do a code review, but there’s one component that’s common to all—a single developer critically examining code. So it makes sense to ask what the data tells us about developers diving into code by themselves, without distraction and without discussion.

What are the best practices during quiet contemplation?

Focus Fatigue

How much time should you spend on a code review in one sitting?

Figure 18-1 maps how efficient we are at finding defects over time [Dunsmore 2000]. The horizontal axis plots time passing; the vertical is the average number of defects a person found at this time index.

Figure 18-1. After 60‒90 minutes, our ability to find defects drops off precipitously

Attached Image


Looking early on in the code review, there’s a clear linear relationship between time and defects: every 10 minutes another defect is found (on average). This is encouraging—the more time spent in the review, the more defects are found.

That’s really efficient if you stop and think about it. What other software development processes do you know in which a new bug is found (and often fixed) every 10 minutes? Certainly not in the usual development/QA loop by the time you count the effort by both the tester and the developer.

But around the 60-minute mark, there’s a sudden drop-off. Now another 10 minutes doesn’t necessarily turn up another defect. The efficiency of finding defects drops sharply.

The usual theory explaining this effect is that around one hour we get mentally fatigued; it’s just a long time to be concentrating on a technical task. But whatever the cause, it’s clear that we shouldn’t review code for more than an hour at a time.

Speed Kills

It’s logical that the more time you spend on a piece of code, the deeper and more nuanced your analysis will be and the more bugs you’ll find. It’s similarly logical that if you race through a review, you’ll find only the shallowest errors.

Where is the balance between spending enough time to root out important problems but not wasting time dwelling on decent code?

Figure 18-2 shows the effect of speed on ability to find defects in a study of 2,500 reviews [Cohen 2006].

On the horizontal axis we have the speed of the review measured in lines of code (LOC) per hour. The vertical axis shows defect density, which is the number of defects found per 1000 LOC (or kLOC). “Density” is used rather than “number of defects” because reviews with more lines of code will naturally have more defects, so this normalizes the results across reviews of different sizes.

Figure 18-2. Defect detection drops at 400‒500 lines of code per hour

Attached Image


Between zero and about 400 LOC/hour, you can see the defect density is evenly distributed between zero and 100 defects/kLOC. This distribution is expected. To see why, consider a review consisting of adding documentation to an interface. Three hundred lines of code might contain few or no defects—a low defect density. Now consider a review of brain-twisting, multithreaded, high-performance code in a core module that the entire application depends on. Perhaps just four lines of code were changed, but because of the complexity of the problem and the necessity of getting it right, several reviewers criticized the code and nitpicked the smallest of problems, and two defects were found—a high defect density.

In short, reviews will naturally have a range defect densities, so that spread is normal.

The problem comes with reviews faster than 400‒500 LOC/hour. Suddenly we rarely see a review with a high defect density. Of course that’s not because there are no defects, but because the reviewer is going too fast to find those defects.

Size Kills

Let’s put the previous two results together. If we should spent at most one hour in review, and if we can review at most 400 LOC/hour, then we conclude that we shouldn’t review more than 400 LOC at one time.

A sensible theory; let’s put it to the test.

Figure 18-3. As the size of the code under review increases, our ability to find all the defects decreases. Don’t review more than 400 lines of code at a time.

Attached Image


In Figure 18-3, we see how defect density is affected by the amount of code under review [Cohen 2006]. As in the last section, we expect to see a defect density spread between zero and 100 defects/kLOC, and we do see that with the smaller reviews. But as the review size increases, defect density drops off. As predicted, at around 300‒500 LOC, it’s clear that the reviews aren’t being effective.

The fact that this data matches the expected result so well lends credibility to all three results.
Cover of Making Software
Learn more about this topic from Making Software. 

Many claims are made about how certain tools, technologies, and practices improve software development. But which claims are verifiable, and which are merely wishful thinking? In this book, leading thinkers such as Steve McConnell, Barry Boehm, and Barbara Kitchenham offer essays that uncover the truth and unmask myths commonly held among the software development community. Their insights may surprise you.

Learn More Read Now on Safari


Tags:
2 Subscribe


2 Replies

0
  AlexF1's Photo
Posted Nov 30 2010 02:01 PM

It appears this post is based on the work of Jason Cohen, founder of Smart Bear Software, and contributor to the Making Software book, but it wasn't made clear.

A Webinar with several of the book's authors is scheduled for December 16 at 1 PM EST. Details will be made available on Twitter via @smartbear.

Thank you.

Alex
+ 1
  andy-lester's Photo
Posted Dec 01 2010 07:42 AM

I suggest that the first graph, showing a plateau of code review effectiveness at the 60 minute mark, also applies to any meeting. An hour is a long time to be doing serious brainwork of any sort. (And if your meeting is not serious brainwork, why are you holding it?)

Any meeting I'm chairing, there's a five-minute break at an hour whether we think we need it or not.