Guidance Cards - Refactor

From School of CS Student Wiki
Revision as of 07:35, 29 July 2014 by gravatar Mbax9mb5 [userPHRhYmxlIGNsYXNzPSJ0d3BvcHVwIj48dHI+PHRkIGNsYXNzPSJ0d3BvcHVwLWVudHJ5dGl0bGUiPkdyb3Vwczo8L3RkPjx0ZD51c2VyPGJyIC8+PC90ZD48L3RyPjwvdGFibGU+] (talk | contribs) (Refactor Card 1)
Jump to navigation Jump to search

Part of Red-Green-Go!

Refactor Card 1

Requires Review/Re-wording

Use helper methods with descriptive names to wrap low-level code, so that your code reads as much like natural language as possible.

Refactor 1.jpg

One of the things we look for in our code during the refactoring step is whether or not it clearly and explicitly expresses our intentions. Good code should read fluently, as a description of what the code is doing and why, without the need for comments.

For example, take a look at this method definition, before refactoring for readability. It is taken from an implementation of the well known "Snake" game, and is part of the code that makes the snake move East once.


public int moveSnakeHeadEast() {
     if (snakeHeadColumn < fieldSize-1 && 
                     !(snakeHeadColumn+ONE_MOVE == snakeBodyColumn && 
                       snakeHeadRow == snakeBodyRow))
          snakeHeadColumn += ONE_MOVE;
}

This code is not very readable. We have lots of low level code checks, and we have to read each one in detail, and think hard about the domain and the context, to be able to figure out what is going on.

Until relatively recently, most developers would have agreed that the solution to this readability problem is to add a comment, like this:

public int moveSnakeHeadEast() {
     // Check whether a move to the east is legal
     if (snakeHeadColumn < fieldSize-1 && 
                     !(snakeHeadColumn+ONE_MOVE == snakeBodyColumn && 
                       snakeHeadRow == snakeBodyRow))
          // Move the snake head one cell to the East
          snakeHeadColumn += ONE_MOVE;
}

This is better. At least the person reading the code can get an idea of what is going on, without having to read the detail of the code. But, our comments have just made the code longer. They haven't actually improved the code or made it more readable; they have just patched over the problems in a somewhat clumsy way. What is worse, we have found that programmers are pretty hopeless at keeping comments up to date when the code changes. So, we don't even know whether these comments are to be trusted or not.

Nowadays, we prefer to make the code itself readable, rather than embedding comments. (In fact, the presence of more than just a bare handful of comments in code is now considered to be a code smell, indicative of a design failing.)

One way we can increase the readability of the code is to define helper methods, with names that explicitly describe what they do. For example, we can refactor the above code with a couple of helper methods as follows:

public int moveSnakeHeadEast() {
    if (moveEastIsLegal())
          moveSnakeHeadOneColumnToRight();
}

private boolean moveEastIsLegal() {
     return snakeHeadColumn < fieldSize-1 &&
               !(snakeHeadColumn+ONE_MOVE == snakeBodyColumn && 
                  snakeHeadRow == snakeBodyRow);
}

private void moveSnakeHeadOneColumnToRight() {
     snakeHeadColumn += ONE_MOVE;
}

See how the refactored code now reads almost like a sentence describing what the code does? This is the benefit of this approach to maintaining readability: it raises the abstraction level of our code, bringing it closer to the level of the domain concepts understood by the customer.


So What Does the Mean in Practice?

So, if you feel yourself wanting to add comments to your code, because it feels too obscure and low level, consider writing yourself a helper method instead.

When refactoring, try reading the code out loud to your partner. Whenever you stumble, or the code sounds awkward, take a moment to think whether a helper method, or some better names for the methods you have, might help the code to read more fluently.


I think I've Got It. Any Other Examples?

Here's another example, taken from code written for one of our research projects (the IQBot data quality management project). Here's the original code, which is using Spring HttpInvoker to ask a remote component (an IQBot) to store a new "watch task" for us. (You don't need to know what an IQBot or a watch task is to get the point of the example.)

	public void addWatchTask(DateTime watchTaskStartTime,
                                           ReadablePeriod watchTaskPeriod) {
	     ApplicationContext applicationContext = new ClassPathXmlApplicationContext(
                                                                            new String[]{"httpinvoker-client.xml"});
	     IQBot iqBotProxy = (IQBot) applicationContext.getBean("iqBotServiceProxy");
	
     iqBotProxy.addWatchTask(watchTaskStartTime, watchTaskPeriod);
	}

I'm sure you'll agree that this code is not very readable. It calls some HttpInvoker specific classes and methods, and has literals embedded directly into the code. Unless we already know HttpInvoker well, we are not going to be able to understand this code at first sight.

So, during a refactoring step, the developer of the code decided to add a helper method, to make the intent of the code clearer (as well as making the literals constants with helpful names). Here is the result. What do you think?

public void addWatchTask(DateTime watchTaskStartTime, 
                                           ReadablePeriod watchTaskPeriod) {
     IQBot iqBotProxy = getRemoteIQBotProxy();
     iqBotProxy.addWatchTask(watchTaskStartTime, watchTaskPeriod);
}
	
private static IQBot getRemoteIQBotProxy() {
     ApplicationContext applicationContext = new ClassPathXmlApplicationContext(
                                            new String[]{HTTPINVOKER_CLIENT_CONFIG_FILE});
     return (IQBot) applicationContext.getBean(IQ_BOT_SERVICE_PROXY_NAME);
}

In the new code, even if we don't understand how HttpInvoker makes the remote call in detail, the name of the helper method at least gives us a clue as to what it is intended to do. We can ignore the details of the HttpInvoker calls, and still understand what is happening in the method.


Further Resources

There is a great discussion of readability in code (and code quality in general) in this StackExchange discussion:

What does it mean to write good code?


http://goo.gl/qvPwE4 | QR Code

Refactor Card 2

Requires Review/Re-wording

Replace if-statements describing how to process 0, 1, 2 and 3 items with a loop showing how to process N items.

Refactor 2.jpg


When refactoring, we should always be on the lookout for "patterns" within the code that suggest that a more general implementation may be possible. One such pattern that you can expect to encounter frequently on your TDD adventures is when a sequence of if-statements suggests that a looping behaviour is in fact being specified by the test cases.

Let's take a simple example, and watch the pattern being created. We're going to write tests for code that adds 1 to each item in an array of integers. (This is a very stupid example, but it will help us to show the key points, without more complex domain behaviours getting in the way.)

After several TDD cycles, we have the following set of test cases:

int[] anEmptyCollection = new int[]; int[] anotherEmptyCollection = new int[]; assertThat(theIncrementedVersionOf(anEmptyCollection), is(anotherEmptyCollection));

int[] aSingletonCollection = { 1 }; int[] anIncrementedSingletonCollected = { 2 }; assertThat(theIncrementedVersionOf(aSingletonCollection),

                                                                             is(anIncrementedSingletonCollection));

int[] aCollectionOfSize2 = { 1, 1 }; int[] anIncrementedCollectionOfSize2 = { 2, 2 }; assertThat(theIncrementedVersionOf(aCollectionOfSize2),

                                                                             is(anIncrementedCollectionOfSize2));

int[] aCollectionOfSize3 = { 1, 1, 1 }; int[] anIncrementedCollectionOfSize3 = { 2, 2, 2 }; assertThat(theIncrementedVersionOf(aCollectionOfSize3),

                                                                             is(anIncrementedCollectionOfSize3));


and our production code looks like this:

public int[] theIncrementedVersionOf(int[] collection) {

   if (collection.length == 0}
        return new int[];
   if (collection.length == 1) {
        int[] incrementedCollection = { 2 };
        return incrementedCollection;
    }
    if (collection.length == 2) {
         int[] incrementedCollection = { 2, 2 };
         return incrementedCollection;
    }
    int[] incrementedCollection = { 2, 2, 2 };
    return incrementedCollection;

}

Here, we can see a clear pattern emerging in the code. In fact, what we can see is an "unfolded" version of a loop. If we take a normal while loop, and write out the statements that are executed when the loop iterates 0 times, once, twice, three times, etc., this is the sort of pattern that we would see.

In this refactoring, we are going to reverse the transformation to refold these if-statements up into the loop behaviour that inspired them. The loop variable is clearly related to the length of the collection being process (as can clearly be seen in the conditions of our if-statements). And the body of the loop has the job of adding the number 2 into the collection, on each iteration.

So, after refactoring, here is our new method definition:

public int[] theIncrementedVersionOf(int[] collection) {

    int[] incrementedCollection = new int[collection.length];
    for (int i = 0; i < collection.length; i++) {
         incrementedCollection[i] = 2;
    }
   return incrementedCollection;

}

Of course, this is a very simple example, and most programmers would probably right the correct code from scratch without difficulty. This kind of loop re-folding is most useful for more complex conditions, where you are not confident that you can get the loop condition right, or highly critical code, where it is vitally important that the condition is executed correctly, even when relatively simple seeming.


But What About Actually Incrementing the Numbers?

We've ignored the question of how the individual items in the array are being incremented in this code so far, while we concentrated on writing and passing tests describing how the items in the collection are being iterated over. We can write some tests that force us to address the question of how the numbers will be incremented next, now that we have pushed out the collection management behaviour.

This is an example of how TDD allows us to focus on one part of the behaviour at a time, and to ignore the rest. And how, by doing so, it allows us to keep the intellectual complexity of programming under control.



http://goo.gl/myM3wE | QR Code

Refactor Card 3

Requires Review/Re-wording

Embedding literal values in your production code is a form of technical debt. Define key literals as constants to make their meaning explicit.

Refactor 3.jpg

We have known for a long time about the dangers of embedding literals directly into our code. Embedded literals are hard to read, often obscure, can quickly become duplicated in different parts of the code, and are all too easy to overlook when changes must be made. In short, they're a maintenance nightmare...

There have been a number of well-documented incidents of embedded literals causing real financial problems for companies. In the late 1990s, for example, many companies fell foul of an embedded constant in the COBOL code that ran their mission-critical information systems. For many years, it had been the practice amongst COBOL developers to ask users to key in the special code 9999 as the first field in a data entry form, to indicate that no more records needed to be entered. Unfortunately, many of these data entry forms began with a date field. All was well until the date 9th September 1999 approached (coded with the string "9/9/99" in those days) when many companies found they could not enter any records for that date --- no sales, no records of income received or salaries paid, no records of debits or credits on that particular day. When the date "9/9/99" was entered, the software immediately exited the data entry form. The programmers developing these systems had not thought it possible that their code would last for so many decades, and would still be being used as we approached the new Millennium. So, they hard coded the special "exit" number into their code. Turns out they were wrong!

So What Should I Do with my Embedded Literals?

Of course, we need some way for the code to know about the literal values that are significant to the domain we are coding for. What we don't want is for the literal values to be left scattered around the code, where they cannot easily be found and modified.

The usual solution is to define the literals of interest as constants, in some meaningful place, and with some meaningful name.

In Java, the convention is for constants to be defined as final, static members of a class, with names given in upper case only, and with words separated by underscores. For example:

static final String JDBC_DRIVER = "com.mysql.jdbc.Driver";
static final String DB_URL = "jdbc:mysql://happy.mycom.co.uk/maindb";

These constants describe values needed to connect to a database using JDBC.

When you are refactoring, you should look for embedded literals in the code, and consider redefining them as constants, with very descriptive names. This small and simple change can radically improve the readability of your code, as well as its resilience to change in the future.


But I Can't be Bothered to Do All That Typing

Well, the good news is: you don't have to. All modern IDEs have great refactoring support these days, including a refactoring option to convert a literal into a constant definition.

Check out your preferred IDEs refactoring menu, and look for the command that will create the constant for you. Normally, you'll have to select the literal you want to transform in your IDE, before selecting the refactoring menu item. Then you'll hae to interact with a short form, asking you for the name of the new constant, and for other information about where and how it should be initialised, etc.


So What is this Technical Debt Thing?

The concept of technical debt allows us to make pragmatic decisions about the level of code quality we can afford to have at any point in time. Although your programming lecturers would probably like you to write beautiful and elegant code all the time, in practice, this is just not possible. Good code takes time to write. And in a real industrial setting, there is usually only so much time to go around. We have to use it wisely, focussing on the parts of the code quality that really deliver benefits for us, and living with less-than-perfect code when we have to.

This less-than-perfect code is what we mean by "technical debt". By leaving something problematic in our code now, we buy ourselves some time to work on other, more immediately valuable activities. The term "technical debt" reminds us that we do not get this windfall of time for free. The problematic code will come back to bite us in the future, when it causes a bug, or slows down the addition of some new functionality, or just makes the code harder for a new employee to get to grips with. We will at some point in the future have to pay back the debt, we just don't know when and how much it will cost us.

Therefore, careful coders keep their technical debt under careful management. And they take any chance they can to pay back some of the technical debt that they see, while working on other things, as they know that this will pay for itself later, in avoiding a bigger problem. Defining literal values as constants, with really good, descriptive names, is one way to avoid the technical debt caused by having literals embedded in your code.


Further Resources

If you are new to the concept of magic strings/numbers in code, you may find these links enlightening.



Magic String

Eliminating Magic Numbers: When is it time to say “No”?

Avoid using magic numbers and string literals in your code


http://goo.gl/M66MS5 | QR Code

Refactor Card 4

Requires Review/Re-wording

Remember that test code is code, too, and should also be refactored to keep the design clean.
Refactor 4.jpg

In the excitement of getting tests to pass, and of writing beautifully readable code that gets tests to pass, it is all too easy to forget that our test code needs refactoring, too. Test code is likely to be the first code that new members of the team look at, as well as being the first target when requirements change or when bugs are found. Keeping test code up to date when the design of the system changes is another major task, that can eat up resources if test code quality slips. Our test suites are our specification for the system we are building, and so it is important that they should be as readable, as elegant and as concise a description of that specification as we can make them in the time available.

So, when you've finished refactoring the production code, take a minute to look at your test code, and see whether your tests are as readable and maintainable as they could be. All the refactoring tips described in the other refactoring cards apply to test code (with the exception, perhaps, of the advice regarding the identification of patterns in production code).

Look also to see whether all the tests are actually needed, or whether they could be simplified in some way. Check the names of the test cases and the test classes. And check whether the test fixture is appropriate for each test case. You may want to create some new test classes, and move the tests about, into other test classes that might be a better home for them.


Further Resources

If you want to explore this topic further, you might find this Stack Exchange conversation, on the topic of test suite maintenance, interesting:
How do people maintain their test suite?


http://goo.gl/eZM6uu | QR Code

Refactor Card 6

Requires Review/Re-wording

It's mandatory to look for refactoring opportunities in each red-green-green cycle. It is not mandatory to actually refactor. Do so only when the benefits are clear.
Refactor 6.jpg

When should you refactor?
When is it time to refactor?
When to refactor?


http://goo.gl/2rWgw0 | QR Code

Refactor Card 7

Requires Review/Re-wording

It's mandatory to look for refactoring opportunities in each red-green-green cycle. It is not mandatory to actually refactor. Do so only when the benefits are clear.
Refactor 7.jpg

Sometimes, especially in the early cycles of the construction of a new unit of code, there may not be any clearly worthwhile refactoring options in either the production code or the test code. Or, you may see opportunities for refactoring, but be unsure of which of several alternatives to take.

In these cases, it is perfectly acceptable to pass through the refactoring step without having made any changes to your code. As long as the tests all pass, you can move on to writing the next test.


So Is it Okay if I Never Refactor? (Well, Hardly Ever!)

If you are repeatedly moving through the refactoring step without making any changes, then you might want to pause and think about your technique as a pair (or as an individual developer). If you are really implementing the most simple and direct code that will make the tests pass in the coding step of the cycle, then you really ought to be generating plenty of refactoring material as you go. The refactoring step is where we do the design work on our system. If you are not seeing opportunities to improve the design as you gain more information about what functionality the system must provide, then that suggests a problem with your approach. You're unlikely to succeed with TDD unless you can build up your refactoring muscles quickly. Refactoring isn't just a chance to make the code prettier, bolted onto the end of the TDD process; it is a core element of the whole approach.

You should consider, for example, whether you might be unconsciously avoiding the code changes that refactoring requires, perhaps because you know from experience how difficult code change can be on medium to large scale systems. Thanks to the high coverage test suites, we can make extensive changes to code developed using TDD, with confidence that the tests will trap any regressions that we cause, as we cause them. (Well, most of them, anyway!)


So, don't panic if you can't think of anything to do to improve the code in some of your red-green-green cycles. But, as the development progresses, you should expect to be finding a great many things that you'd like to change, for the majority of the system increments. If that isn't happening, it might be time to do a little self-examination, or to pair with someone with more TDD experience for a while, if you can.


Further Resources

You may find these online conversations between professional programmers interesting. They are all discussions of the question of when it is a good idea to refactor, and when not.

When should you refactor?
When is it time to refactor?
When to refactor?


Refactor Card 8

Refactor 8.jpg

Requires Review/Re-wording

If you're not sure whether a particular refactoring will improve things, then leave it. Later test cases may make the benefits clearer.

Sometimes, especially in the early stages of the development of a new unit of code, a number of refactoring opportunities may present themselves. It can be unclear as to which is the best way to go.

This happens to everyone, beginning TDDers and TDD gurus alike. The indecision is often not caused by inexperience, but instead by the fact that, in the beginning, we have very little information about the functionality we are developing. This can make it difficult to make decisions trading off different design features, no matter how expert a TDDer you are. At this stage, we just don't have the information we need to decide.

The solution to this problem is not to feel pressured to refactor. If you're unsure of whether a particular refactoring is appropriate or not, wait for a few more test cases to be written, to see whether the new information they provide gives us any grounds for accepting or rejecting the refactoring option.

It is definitely possible to refactor too early. In the first few cycles, too much of a focus on code quality can be a form of gold-plating, where code is made beautiful in the first few cycles, only to be deleted and replaced with other (messy) code in the next few, when we understand better what is needed. Another problem with premature refactoring is that it can obscure the patterns that are building up in the production code, following you (or your partners) careful and accurate work in the coding step.

So, there is a judgment call to be made for each refactoring opportunity. Not all the opportunities we think we see will bear fruit once we have implemented them.


What if I'm Still Not Sure, Three or Four Tests Down the Line?

If you put off a refactoring for a few more test cases, but you don't learn anything that can tell you whether it will be worthwhile, you have another option open to you.

You can try the refactoring as an experiment, and see how it goes.

Because our refactorings should all be small and focussed code changes, they are also fairly easy to reverse. With the support of a version control system, plus our safety net of test cases, experimenting with refactorings, to see what they look like, can be a relatively low cost way to get to the answer we need regarding them.


Refactor Card 9

Refactor 9.jpg

Requires Review/Re-wording

Duplicated code can indicate a missing design concept. Keep code readable and flexible by wrapping and removing duplicated code using helper methods.

It is a long standing principle in software design that duplicated code is a "bad thing". The reasons for this are easy to see. When code that is duplicated across several parts of a software system must change, we rely on the developers knowing about (or being able to find) all the copies of the code, and changing them all consistently. This (it turns out) is quite a big ask! And if some copies are changed while others are not, the behaviour of the software system becomes erratic and hard to predict, with all kinds of expensive consequences.

Therefore, developers today work hard to make sure that all required behaviour is implemented in one and only one place in the code. That way, when it changes, we only have to update one piece of code. All other code that needs this behaviour will invoke the changed (single) version, and the system's behaviour will be consistent.

In TDD, duplication of code has another, slightly different role. In TDD, we don't worry about avoiding duplication up front so much. In fact, during the coding step, we are allowed to duplicate code, if this means we get back to our green test suite quickly. Then, in the refactoring step, we look for these instances of duplication to tell us when domain concepts are hiding in the code, waiting to be made explicit. In this way, we follow the trail of duplication to help us identify these concepts, and then we make them visible. This can be done by defining a brand new class, or a new instance of a class, a new field or method on a class.


The maxim on this guidance card refers to a particular technique for removing duplication, which is to wrap the duplicated code within a helper method, and to replace it with calls to that method.

This is a straightforward process in most modern IDEs. Check the refactoring method for a "Create/Make method" option. You'll need to start by selecting the code you want to wrap within the helper method. Then select the refactoring option, and compete the short form that follows.

The name of the helper method should be chosen to be as readable as possible. Usually, this will mean giving the helper method a name that describes the domain concept that it corresponds to.


Can you Give an Example?

Sure. Let's see this technique in action. Suppose we have some code that supports an e-commerce system. The code to add a product to the basket first checks to see whether the customer is a member of the loyalty card programme, and if so whether he or she might qualify for a discount on this product. The same check is made when the quantity of the product ordered is updated. And a similar check is made when a customer is removed from the loyalty card scheme, to find out which (if any) open orders might need to have discounts removed.

While refactoring, we spot that each of these methods is calling more or less the same code to make this for discount eligibility. We use our IDEs refactoring menu to extract a method from one of the duplicated cases, and create the new method. We have to think about what to call it and where to put it. We decide to put the new method on the Customer class, and to give it the following signature:

List<Discount> applicableDiscounts(Product product)

When we are happy that no tests are broken with this first replacement, we go on to replace the code in the other two methods, with the new method (retesting the complete test suite, of course, to check for any unexpected regressions).

Performing this refactoring made us see that we needed a new object class (the Discount class), and that it should link customers and products. In removing the duplication, our understanding of the domain has grown just a tiny bit, moving us one step closer to our finished system.


Do We Need to Worry About Duplication in Test Cases?

Absolutely. Test code is code too, as the maxim goes. Duplication in test case code has one very specific negative effect, that we want to avoid if at all possible: it makes tests brittle. This means that small changes to the production code interface and design have the effect of breaking large numbers of tests at time. When we remove duplicated test code, however, the changes that are needed are localised to the new helper method. Many tests will fail, still, but we can diagnose the source of the problem as being somewhere within the helper method. One quick fix there and all the tests should run again.


Authors

  • gravatar Mbasssme [userPHRhYmxlIGNsYXNzPSJ0d3BvcHVwIj48dHI+PHRkIGNsYXNzPSJ0d3BvcHVwLWVudHJ5dGl0bGUiPkdyb3Vwczo8L3RkPjx0ZD51c2VyPGJyIC8+PC90ZD48L3RyPjwvdGFibGU+] ·
  • gravatar Mbax9mb5 [userPHRhYmxlIGNsYXNzPSJ0d3BvcHVwIj48dHI+PHRkIGNsYXNzPSJ0d3BvcHVwLWVudHJ5dGl0bGUiPkdyb3Vwczo8L3RkPjx0ZD51c2VyPGJyIC8+PC90ZD48L3RyPjwvdGFibGU+]