Measuring developer productivity
By Rick Andrews
Developer productivity is a big issue at Microsoft. Highly skilled developers are a precious and expensive resource, and Microsoft wants to make sure all its developers are productive.
For nearly two years on the Microsoft Six Sigma team, I studied developer productivity and how to best measure it. It didn’t take long to figure out that we really had no good way to measure developer productivity.
Software engineering is unique in this respect. When you look at practically every other engineering discipline – whether it’s manufacturing widgets, building bridges, or some other type of construction or manufacturing – they all have at least one clear way to measure how well their engineers are doing. At Microsoft, we have ways to measure software developers, but they’re all controversial and poorly adopted.
The most common way to rate developer productivity at Microsoft has always been to count the number of bugs per 1,000 lines of code a developer writes – the infamous bugs per KLOC measurement.
With almost every team I visited at Microsoft, as soon as I mentioned bugs per KLOC, I’d hear groans. Developers didn’t want to be measured by it. They didn’t feel it was a fair metric. I knew there had to be a better method than counting bugs per KLOC, which is particularly disliked by developers here and has many detractors (see accompanying article). But what could replace this common metric, which is so easy to calculate?
I decided the solution was to create a new tool that measured something other than bugs, which are a notoriously unreliable way of evaluating developer productivity. After much research and discussion with teams across Microsoft, I decided to look at how many lines of code a developer writes that later have to be changed.
The reasoning behind this approach is simple. Each time a line of code is changed, it’s changed for a reason. I think it’s safe to assume that the reason is that the code wasn’t written correctly the first time, whether because of developer error, design changes, planned changes, or something else. The Maximizer breaks these changes down into categories based on input from the developer checking in the changes, so metrics can be calculated for each category. Every line of code that has to be changed later is a defect of sorts, since it’s inefficient to have to go back and make changes. It saps productivity.
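To make the idea of per-category metrics concrete, here is a minimal sketch in Python. It is an illustration only, not the Maximizer's actual code: the function name, the record format, and the sample numbers are all assumptions, and the category labels are simply the ones mentioned above.

```python
from collections import Counter

def changed_lines_by_category(checkins):
    """Total the changed lines reported for each change category.

    `checkins` is a hypothetical list of (category, lines_changed) pairs,
    where the category is whatever reason the developer gave at check-in.
    """
    totals = Counter()
    for category, lines_changed in checkins:
        totals[category] += lines_changed
    return dict(totals)

# Made-up sample data, for illustration only.
sample = [
    ("dev error", 12),
    ("design change", 30),
    ("dev error", 3),
    ("planned change", 5),
]
print(changed_lines_by_category(sample))
```

Keeping the categories separate means a team could, for example, look only at changes attributed to developer error and leave planned changes out of the picture.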
Some people have disagreed with me that counting lines of changed code is a good way to measure productivity. But most of the development teams I worked with understood the usefulness of this metric after they tried measuring it.
I worked with Maxim Stepin, a developer on the Microsoft Productivity Tools team, to co-develop a new tool that measures lines of changed code. We finished the first version of the tool in September 2002 and called it the Maximizer, because it has the potential to help development teams maximize their productivity (and Maxim gets a tool named after himself).
Both Maxim and I have since moved on to new jobs inside Microsoft, but we still maintain the tool and feel strongly about the need to study and improve developer productivity.
Every time a developer checks in his or her code, the Maximizer counts the lines of code. It uses Source Depot to track the lines that are later changed (including lines added, deleted, or moved), and calculates the fraction or percentage of code changed.
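The calculation can be sketched roughly as follows. This is not the Maximizer's algorithm: it is a simplified stand-in that uses Python's standard difflib module on two versions of a file, and the function name is hypothetical. It counts only the originally checked-in lines that were later replaced or deleted; a moved line shows up as a deletion, and newly added lines are not counted in this simplification.

```python
import difflib

def changed_line_fraction(original_lines, revised_lines):
    """Return the fraction of originally checked-in lines that were
    later replaced or deleted (a simplified stand-in for the metric)."""
    if not original_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(
        a=original_lines, b=revised_lines, autojunk=False
    )
    changed = 0
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "replace" and "delete" both mean original lines i1..i2 did not survive.
        if tag in ("replace", "delete"):
            changed += i2 - i1
    return changed / len(original_lines)

# Four lines checked in; one is later rewritten and one new line is appended.
original = ["a", "b", "c", "d"]
revised = ["a", "x", "c", "d", "e"]
print(changed_line_fraction(original, revised))  # prints 0.25
```

A real tool would also have to follow a line across many check-ins and handle whitespace-only edits and moves more carefully.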
We spent six months working on algorithms to make sure the Maximizer accurately counts lines of code changed. I explain the Maximizer in detail on my site, which is devoted to code quality and developer productivity issues. (There is a link to download the tool at the end of the description.)
Understanding the numbers
We are gathering data from the first teams using the Maximizer. Our goal is to establish a baseline of data so we can understand what the numbers mean. We want to learn what percentage of lines of code changed can be considered good, and what is poor. The key is to get good baseline numbers from as many teams as possible. For the first two teams testing the Maximizer, we are finding that an average of 2.4 of every 100 lines of their code are later changed.
If developers are unhappy having their productivity measured by any metric based on bugs, then managers will have trouble using these metrics to drive improved productivity. We hope our tool eventually can give developers confidence that their productivity is being measured in a meaningful, accurate way.
We know our method of measuring isn’t foolproof. For example, what if the reason a line of code is changed has nothing to do with a mistake? What if the code was planned from the beginning to be changed later? We have tried to address most of these problems with the Maximizer.
There are still ways that a developer could “game” his numbers to look good to the Maximizer. For example, if a developer knows he is being judged by the number of lines of code changed, he may decide to change fewer lines, even if he thinks the code could benefit from more changes. But we believe developers are less likely to game their numbers with our tool than with methods that count lines of code or bugs. In addition, this type of gaming would be easily caught in code reviews, leaving the developer accountable.
Good developers will never lower their standards to make a productivity metric look good. They will always fix or report bugs they find, write compact code, and follow other good coding practices.
We didn’t design our tool as a way to evaluate individual developers. And we don’t recommend that any managers reward their developers with higher review scores, promotions, or bonuses based on the results of our tool. That’s not our goal. A tool can’t take the place of a manager’s judgment when evaluating people.
But if we are successful, more teams will be able to start taking meaningful measurements of the quality and productivity of their work. And we think we can make progress in at least one area: Helping to speed the demise of the practice of counting bugs per KLOC.