Split Loop

Refactoring contributed by Martin Fowler

You have a loop that is doing two things

Duplicate the loop


void printValues() {
	double averageAge = 0;
	double totalSalary = 0;
	for (int i = 0; i < people.length; i++) {
		averageAge += people[i].age;
		totalSalary += people[i].salary;
	}
	averageAge = averageAge / people.length;
	System.out.println(averageAge);
	System.out.println(totalSalary);
}


void printValues() {
	double totalSalary = 0;
	for (int i = 0; i < people.length; i++) {
		totalSalary += people[i].salary;
	}

	double averageAge = 0;
	for (int i = 0; i < people.length; i++) {
		averageAge += people[i].age;
	}
	averageAge = averageAge / people.length;

	System.out.println(averageAge);
	System.out.println(totalSalary);
}

Motivation

You often see loops that do two different things at once, simply because both can be done in one pass. Indeed, most programmers would feel very uncomfortable with this refactoring, as it forces you to execute the loop twice, which is double the work.

But like so many optimizations, doing two different things in one loop is less clear than doing them separately. It also introduces temps that get in the way of further refactoring. So while refactoring, don't be afraid to get rid of the combined loop. When you optimize, if the loop is slow, that will show up, and it would be right to slam the loops back together at that point. You may be surprised at how often the loop isn't a bottleneck, or how the later refactorings open up another, more powerful, optimization.
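As a rough sketch of the kind of further refactoring this opens up: once each loop does one thing, each can be pulled out into its own method and the temps disappear from the caller. The class name SplitLoopSketch, the minimal Person type, and the method names below are assumptions made for illustration; the original only shows the age and salary fields.

```java
public class SplitLoopSketch {
    // Hypothetical minimal Person type; only age and salary appear in the original.
    static class Person {
        final int age;
        final double salary;

        Person(int age, double salary) {
            this.age = age;
            this.salary = salary;
        }
    }

    // One loop, one job: sum the salaries.
    static double totalSalary(Person[] people) {
        double total = 0;
        for (Person p : people) {
            total += p.salary;
        }
        return total;
    }

    // One loop, one job: sum the ages, then divide by the count.
    // As in the original, an empty array yields NaN.
    static double averageAge(Person[] people) {
        double total = 0;
        for (Person p : people) {
            total += p.age;
        }
        return total / people.length;
    }

    // The caller no longer needs any accumulator temps.
    static void printValues(Person[] people) {
        System.out.println(averageAge(people));
        System.out.println(totalSalary(people));
    }

    public static void main(String[] args) {
        Person[] people = { new Person(30, 1000.0), new Person(40, 2000.0) };
        printValues(people);
    }
}
```

With the single combined loop, neither calculation could be extracted on its own without dragging the other along.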

Mechanics

Example

Additional Comments

The split loop is an often used performance optimisation in data intensive applications. When you are accessing two separate arrays in the same loop, you can get hit badly by cache misses. That is, if the arrays are large enough, you lose locality of reference and the cache fails to do its job. Put enough code in the loop and every operation will miss the cache and have to go back out to main memory.

By splitting loops up into small separate components that each act on only one array, you can get significant performance increases. In rendering code I've seen up to an order of magnitude difference. This is particularly beneficial if the data is primitive type based, and less so if it is class reference or pointer based, where you need an additional dereference off into a random memory location. This technique is also very similar to loop unrolling for performance gains (although that is something I doubt would ever appear in a refactoring/patterns/OO type book :)
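A minimal sketch of the shape being described, using two primitive arrays. The class name ArrayLoopSketch, the method names, and the scale/offset operations are made up for illustration, and both arrays are assumed to have the same length. Whether the split version is actually faster depends on array sizes, the hardware, and the JIT, so it needs to be measured rather than assumed.

```java
public class ArrayLoopSketch {
    // Combined loop: every iteration touches both arrays.
    // Assumes xs and ys have the same length.
    static void updateCombined(float[] xs, float[] ys, float scale, float offset) {
        for (int i = 0; i < xs.length; i++) {
            xs[i] *= scale;
            ys[i] += offset;
        }
    }

    // Split loops: each pass walks a single array from start to end.
    static void updateSplit(float[] xs, float[] ys, float scale, float offset) {
        for (int i = 0; i < xs.length; i++) {
            xs[i] *= scale;
        }
        for (int i = 0; i < ys.length; i++) {
            ys[i] += offset;
        }
    }

    public static void main(String[] args) {
        float[] xs = { 1f, 2f, 3f };
        float[] ys = { 1f, 1f, 1f };
        updateSplit(xs, ys, 2f, 0.5f);
        System.out.println(java.util.Arrays.toString(xs));
        System.out.println(java.util.Arrays.toString(ys));
    }
}
```

Both methods leave the arrays in the same final state; only the memory access pattern differs.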

--Justin Couch

Thanks to Bob Bowman for spotting and fixing an error.
