Unitiliy

Visualising good code: car manufacturing driven development

This is the first in a series of articles answering the question:

What is good code?

Put simply, good code can be changed quickly and easily. For me, that's all there is to it. So the question becomes: how can we structure our code so that it is easy to work with?

We write code like we build cars. When we build a car[1], each part is built in a separate factory and tested to ensure it works. We know the efficiency of the engine long before it goes anywhere near the factory floor. Similarly, the engine parts are built and tested in a different factory. This continues right down to the screws, the strength of which is known before they are put in a box.

We should develop our code in exactly the same way. In the past I have written a program and then run it. This is the equivalent of getting a load of raw materials in a big warehouse, building a car from scratch, and then driving straight onto a busy motorway to see if it works.

Good code is broken down into units, each of which we test in isolation. This works (in both the car and software industries) because it's such a natural thing to do. Humans are great at solving big problems by breaking them down into little ones. Whatever influences your programming, be it object orientation, the SOLID principles, functional programming, or anything else, you are doing it so that you can compose lots of small pieces of functionality (solutions to smaller problems) to solve a big problem.

We can visualise an individual unit as a block with its inputs flowing in at the top from the left. The data flows down through the unit and the output is returned to the left at the bottom. For a unit with two inputs and a single output we draw:

For a single unit, a picture doesn't help much, but as we piece units together to make programs we'll see how insightful these visualisations can be.

Enough talking, show me an example!

Everything in this article can be applied irrespective of the language. However, I wanted to have example code, so I needed to choose a language. I picked Java simply because it's fairly widely used and relatively simple. Don't get hung up on this choice: we can use these techniques in whatever language you program in. And please don't worry about the Java conventions I've followed or the ones I haven't.

So let's get started. We have an application that finds the unique first names in a database, and writes a report to disk. This is the longest code example in any of these articles. We reuse it again and again, so if you get through this, it's easy sailing from here on. The code looks like this:

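import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.io.FileUtils; // assuming FileUtils is the Apache Commons IO class

// DatabaseClient and DatabaseClientFactory are assumed to be the application's own helpers.
public class App {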
    public static void main(String[] args) throws IOException {
        new App().run(args[0]);
    }

    private void run(String reportTitle) throws IOException {
        DatabaseClient databaseClient = DatabaseClientFactory.newclient(
            "jdbc:sqlserver://db1:2000;databaseName=prodDb");

        Collection<String> firstNames = databaseClient.runQuery(
                "select firstname from users");

        Map<String, Integer> counts = new HashMap<>();

        for (String firstName : firstNames) {
            String capitalisedFirstName = Character.toUpperCase(firstName.charAt(0)) +
                    firstName.substring(1);

            Integer currentCount = counts.get(capitalisedFirstName);
            counts.put(capitalisedFirstName, currentCount == null ? 1 : currentCount + 1);
        }

        StringBuilder report = new StringBuilder();
        report.append(reportTitle);
        report.append(System.lineSeparator());

        for (Map.Entry<String, Integer> nameAndCounts : counts.entrySet()) {
            report.append(
                    String.format(
                            "%s:%d%s",
                            nameAndCounts.getKey(),
                            nameAndCounts.getValue(),
                            System.lineSeparator()
                    )
            );
        }

        FileUtils.writeStringToFile(
                new File("/home/reports/firstNames.txt"),
                report.toString(),
                StandardCharsets.UTF_8
        );
    }
}

Now that probably works (I can see a couple of potential bugs), but it's a good start. I'd argue that it's pretty readable, there's no duplication, and the variables are well named. But how can we tell if it works? There's only one way, I'm afraid: run it. For this reason I would describe this as bad code. Each time we add or change something, we would need to manually test every bit of functionality to make sure it still works. So our changes will never be easy or fast.

Another big problem is that we only have one unit of code. If we look at a report and see that it doesn't look like we expect, anything could be broken:

  • Are we looking in the right database?
  • Does the counting logic work?
  • Is the name capitalisation changing more than it should?
  • Am I looking at the report generated from the previous run?

The problem is obvious when we look at the visualisation of the program.

It is a single unit which has one input (the title of the report). It doesn't have any outputs, but it does cause a side effect (it writes a file to disk). We can't reason about (visualise) anything more than this. As we improve the code base, we will see the visualisation become much more insightful.

Our first sub unit

Let's suppose we run the code, and the report looks a bit odd. We suspect the code that generates the file contents is to blame, so let's extract it into a separate unit and test it in isolation.

We shall create a ReportCreator unit. It's going to be just like a part of the car engine that is created and tested in a different factory from the rest of the car: we create it separately from the rest of the application. It has no idea about databases or file writing; all it knows is how to generate a report from a list of first names.

The public interface of this unit is a single function that takes a String (the report title) and a collection of Strings (the first names) and returns a String (the report content):

public interface ReportCreator {
    String createReport(String reportTitle, Collection<String> firstNames);
}

and our implementation is just the relevant code extracted from the previous example:

public class ReportCreatorImpl implements ReportCreator {

    @Override
    public String createReport(String reportTitle, Collection<String> firstNames) {
        Map<String, Integer> counts = new HashMap<>();

        for (String firstName : firstNames) {
            String capitalisedFirstName = Character.toUpperCase(firstName.charAt(0)) +
                    firstName.substring(1);

            Integer currentCount = counts.get(capitalisedFirstName);
            counts.put(capitalisedFirstName, currentCount == null ? 1 : currentCount + 1);
        }

        StringBuilder report = new StringBuilder();
        report.append(reportTitle);
        report.append(System.lineSeparator());

        for (Map.Entry<String, Integer> nameAndCounts : counts.entrySet()) {
            report.append(
                    String.format(
                            "%s:%d%s",
                            nameAndCounts.getKey(),
                            nameAndCounts.getValue(),
                            System.lineSeparator()
                    )
            );
        }

        return report.toString();
    }
}

The new unit is visualised like:

It is a single unit of code that takes two inputs (a title and a collection of first names) and has one output (a String containing all the first names, and the number of people who share that name).

Now we can test this unit and confidently use it anywhere we wish. It's quite simple: we write a second piece of code that executes the createReport function for different inputs. Each time we execute it, we check that the result is what we expected:

public class ReportCreatorTest {

    public static void main(String[] args) {
        ReportCreatorImpl reportCreator = new ReportCreatorImpl();

        // Test 1
        assertEquals(
                reportCreator.createReport("an interesting title",
                        Arrays.asList("mike", "steve")),
                "an interesting title\n" +
                "Mike:1\n" +
                "Steve:1\n");

        // Test 2
        assertEquals(
                reportCreator.createReport("an interesting title",
                        Arrays.asList("mike", "steve", "steve")),
                "an interesting title\n" +
                "Mike:1\n" +
                "Steve:2\n");

        // Test 3
        assertEquals(
                reportCreator.createReport("an interesting title",
                        Arrays.asList("mike", "steve", "mike", "steve", "steve")),
                "an interesting title\n" +
                "Mike:2\n" +
                "Steve:3\n");
    }

    private static void assertEquals(String actualOutput, String expectedOutput) {
        if (!expectedOutput.equals(actualOutput)) {
            throw new RuntimeException("Report creator is broken");
        }
    }
}

There's nothing radical about unit testing. You'd normally use a library (JUnit, NUnit, HUnit...), but the above snippet gives you the gist of it. How many tests you write is up to you; with these tests I'd be pretty confident the ReportCreator unit works. If I were writing code to land the space shuttle, I'd probably test a few more edge cases.

The main point is that we are testing the unit in isolation. The unit tests define what the unit does. If the tests pass, we know it works and that we can use it wherever we wish. Just like when we know some screws are strong enough, we can use them to build a car.
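As a rough idea of what "a few more edge cases" might look like, here is a minimal sketch. To keep it runnable on its own, the report logic is condensed from ReportCreatorImpl into a static method; the class name EdgeCaseSketch and the check helper are hypothetical. The expected strings are built with System.lineSeparator(), since that is what the report code appends, and each case produces at most one name so nothing depends on HashMap iteration order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch: the report logic is condensed from
// ReportCreatorImpl so the edge cases can be run on their own.
class EdgeCaseSketch {

    static String createReport(String reportTitle, Collection<String> firstNames) {
        Map<String, Integer> counts = new HashMap<>();
        for (String firstName : firstNames) {
            String capitalised = Character.toUpperCase(firstName.charAt(0))
                    + firstName.substring(1);
            counts.merge(capitalised, 1, Integer::sum);
        }
        StringBuilder report = new StringBuilder(reportTitle).append(System.lineSeparator());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            report.append(entry.getKey()).append(':').append(entry.getValue())
                    .append(System.lineSeparator());
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String nl = System.lineSeparator();

        // No names at all: the report is just the title line.
        check(createReport("title", Collections.emptyList()).equals("title" + nl));

        // Names differing only in the case of the first letter are counted together.
        check(createReport("title", Arrays.asList("mike", "Mike"))
                .equals("title" + nl + "Mike:2" + nl));

        // An empty first name is not handled: charAt(0) throws.
        try {
            createReport("title", Arrays.asList(""));
            check(false);
        } catch (StringIndexOutOfBoundsException expected) {
            // This documents the current behaviour, not necessarily the desired one.
        }
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new RuntimeException("Edge case failed");
        }
    }
}
```

Whether an empty name should throw, be skipped, or be reported is a design decision; the point is that a test pins the behaviour down either way.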

We can visualise the tested unit like this.

For a given input we expect a specific output. This tells us what the code does, but doesn't tell us how. Interestingly, the better the test coverage, the less important the actual implementation is. Often when reviewing pull requests, if I see the tests first, I find myself thinking: I don't actually care how this is implemented, it does what it needs to do.

So now that we have our first tested unit, it's time to use it in our application. In languages like Java we usually do this by injecting the unit into another unit's constructor:
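To illustrate the point about implementations being interchangeable, here is a sketch of a different implementation that satisfies the same kind of test. StreamReportCreator and the wrapping class StreamReportSketch are hypothetical names, and the interface is repeated inside the class only so the snippet stands alone. It uses a LinkedHashMap so names appear in first-seen order, and it compares against System.lineSeparator() rather than "\n".

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class StreamReportSketch {

    // Same shape as the article's ReportCreator interface, nested to keep the sketch self-contained.
    interface ReportCreator {
        String createReport(String reportTitle, Collection<String> firstNames);
    }

    // Hypothetical alternative implementation built on streams.
    static class StreamReportCreator implements ReportCreator {
        @Override
        public String createReport(String reportTitle, Collection<String> firstNames) {
            // Capitalise each name, then count occurrences, keeping first-seen order.
            Map<String, Long> counts = firstNames.stream()
                    .map(name -> Character.toUpperCase(name.charAt(0)) + name.substring(1))
                    .collect(Collectors.groupingBy(
                            name -> name, LinkedHashMap::new, Collectors.counting()));

            // One "Name:count" line per distinct name.
            String lines = counts.entrySet().stream()
                    .map(entry -> entry.getKey() + ":" + entry.getValue() + System.lineSeparator())
                    .collect(Collectors.joining());

            return reportTitle + System.lineSeparator() + lines;
        }
    }

    public static void main(String[] args) {
        ReportCreator reportCreator = new StreamReportCreator();
        String nl = System.lineSeparator();
        String expected = "an interesting title" + nl + "Mike:2" + nl + "Steve:3" + nl;
        String actual = reportCreator.createReport("an interesting title",
                Arrays.asList("mike", "steve", "mike", "steve", "steve"));
        if (!expected.equals(actual)) {
            throw new RuntimeException("Report creator is broken");
        }
    }
}
```

From the test's point of view nothing has changed: the same inputs still produce the same report.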

public class App {
    private final ReportCreator reportCreator;

    public static void main(String[] args) throws IOException {
        new App(new ReportCreatorImpl()).run(args[0]);
    }

    public App(ReportCreator reportCreator) {
        this.reportCreator = reportCreator;
    }

    private void run(String reportTitle) throws IOException {
        DatabaseClient databaseClient = DatabaseClientFactory.newclient(
            "jdbc:sqlserver://db1:2000;databaseName=prodDb");

        Collection<String> firstNames = databaseClient.runQuery(
                "select firstname from users");

        String report = reportCreator.createReport(reportTitle, firstNames);

        FileUtils.writeStringToFile(
                new File("/home/reports/firstNames.txt"),
                report,
                StandardCharsets.UTF_8
        );
    }
}

We can visualise the main program unit: it has a single dependency, the ReportCreator unit. The root unit delegates the responsibility of creating the report to its dependency.

Decoupling from the runtime

We've extracted one unit, but all our code should be in self-contained units in exactly the same way. Currently the root unit contains the Java main function. If we extract the logic from it, we can test that logic in isolation, as well as use it in different situations, e.g. we can deploy it to a serverless platform like AWS Lambda or Azure Functions.

So let's extract a FirstNameReportWriter unit:

public class FirstNameReportWriter {
    private final ReportCreator reportCreator;

    public FirstNameReportWriter(ReportCreator reportCreator) {
        this.reportCreator = reportCreator;
    }

    public void writeReport(String reportTitle) throws IOException {
        DatabaseClient databaseClient = DatabaseClientFactory.newclient(
            "jdbc:sqlserver://db1:2000;databaseName=prodDb");

        Collection<String> firstNames = databaseClient.runQuery(
                "select firstname from users");

        String report = reportCreator.createReport(reportTitle, firstNames);

        FileUtils.writeStringToFile(
                new File("/home/reports/firstNames.txt"),
                report,
                StandardCharsets.UTF_8
        );
    }
}

We can now instantiate our new FirstNameReportWriter unit in the main method of our app:

public class App {
    public static void main(String[] args) throws IOException {
        FirstNameReportWriter firstNameReportWriter =
                new FirstNameReportWriter(new ReportCreatorImpl());

        firstNameReportWriter.writeReport(args[0]);
    }
}

This is a bigger deal than it may appear at first glance. We have decoupled our code from this particular runtime environment. Now we can run it easily anywhere, including from our tests.

We now visualise our application like:

The black unit represents a runtime environment; in this case it is App, the Java main method. We can also see that it passes its argument straight to its dependency, FirstNameReportWriter.

External dependencies

At this point we could write tests for the FirstNameReportWriter unit. We would need to create one, call the writeReport method, read in the file it created, and finally assert that it is as expected.

However, the code queries a database, and this causes problems:

  1. It only works for the data in the database at the time we ran the test. If the data in the database changes, the test will fail.
  2. The test won't cover edge cases that are not in the database right now but probably will be in the future.
  3. There are significant security and performance implications from hitting the production database every time we run our tests.

To get around these issues, we will extract the database reading code, and in our tests we will substitute it with a unit that acts the same but doesn't need a database.

The new unit has a public interface with a single function:

public interface FirstNameDataStore {
    Collection<String> getAllFirstNames();
}

and the database implementation looks like this:

public class Database implements FirstNameDataStore {
    private final String url;

    public Database(String url) {
        this.url = url;
    }

    @Override
    public Collection<String> getAllFirstNames() {

        DatabaseClient databaseClient = DatabaseClientFactory.newclient(url);

        return databaseClient.runQuery(
                "select firstname from users");
    }
}

Writing an automated test for this unit isn't straightforward. You will need a non-production database, and you must either populate it with data before each test or keep it in a consistent state (neither of which is trivial). If you are lucky, you may be able to spin up an embedded version of the database, ensuring a clean slate each time. There are other ways you could deal with it, such as relying solely on manual testing. However you test this unit, once you are convinced it is working, you can inject it into our FirstNameReportWriter unit:

public class FirstNameReportWriter {
    private final ReportCreator reportCreator;
    private final FirstNameDataStore firstNameDataStore;

    public FirstNameReportWriter(ReportCreator reportCreator, FirstNameDataStore firstNameDataStore) {
        this.reportCreator = reportCreator;
        this.firstNameDataStore = firstNameDataStore;
    }

    public void writeReport(String reportTitle) throws IOException {
        Collection<String> firstNames = firstNameDataStore.getAllFirstNames();

        String report = reportCreator.createReport(reportTitle, firstNames);

        FileUtils.writeStringToFile(
                new File("/home/reports/firstNames.txt"),
                report,
                StandardCharsets.UTF_8
        );
    }
}

which now looks like:

We can now test the FirstNameReportWriter unit without using the database. We create a mock database unit, which looks and acts just like our database implementation but returns anything we choose. Notice how the FirstNameReportWriter unit doesn't know whether its dependency is the real database implementation or the mock one; it only knows about its public interface.

The mock datastore looks like this:

public class MockDatabase implements FirstNameDataStore {
    private final Collection<String> firstNamesToReturn;

    public MockDatabase(Collection<String> firstNamesToReturn) {
        this.firstNamesToReturn = firstNamesToReturn;
    }

    @Override
    public Collection<String> getAllFirstNames() {
        return firstNamesToReturn;
    }
}

In our tests we simply create the FirstNameReportWriter unit with an instance of the mock unit. Then we can test with any data we choose, without needing a real database running:

public class FirstNameReportWriterTest {
    public static void main(String[] args) throws IOException {
        FirstNameReportWriter firstNameReportWriter;
        File outputFile = new File("/home/reports/firstNames.txt");


        // Test 1
        firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(), new MockDatabase(Arrays.asList("mike", "steve"))
        );
        firstNameReportWriter.writeReport("an interesting title");
        String actualReport = FileUtils.readFileToString(outputFile, StandardCharsets.UTF_8);
        assertEquals(actualReport,
                "an interesting title\n" +
                "Mike:1\n" +
                "Steve:1\n");


        // Test 2
        firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(), new MockDatabase(Arrays.asList("mike", "steve", "steve"))
        );
        firstNameReportWriter.writeReport("an interesting title");
        actualReport = FileUtils.readFileToString(outputFile, StandardCharsets.UTF_8);
        assertEquals(actualReport,
                "an interesting title\n" +
                "Mike:1\n" +
                "Steve:2\n");


        // Test 3
        firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(),
                new MockDatabase(Arrays.asList("mike", "steve", "mike", "steve", "steve"))
        );
        firstNameReportWriter.writeReport("an interesting title");
        actualReport = FileUtils.readFileToString(outputFile, StandardCharsets.UTF_8);
        assertEquals(actualReport,
                "an interesting title\n" +
                "Mike:2\n" +
                "Steve:3\n");
    }

    private static void assertEquals(String actualOutput, String expectedOutput) {
        if (!expectedOutput.equals(actualOutput)) {
            throw new RuntimeException("FirstNameReportWriter is broken");
        }
    }
}

You should use a library to provide the mock functionality rather than rolling your own, but this illustrates the point.
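Short of pulling in a mocking library, one lightweight option in Java is worth sketching: because FirstNameDataStore has a single method, a lambda can stand in for it. In the snippet below the interface is repeated only so the example stands alone, and LambdaStubSketch and countNames are hypothetical names used purely for illustration.

```java
import java.util.Arrays;
import java.util.Collection;

class LambdaStubSketch {

    // Same shape as the article's FirstNameDataStore, nested so the sketch stands alone.
    interface FirstNameDataStore {
        Collection<String> getAllFirstNames();
    }

    // Hypothetical consumer standing in for any unit that depends on the data store.
    static int countNames(FirstNameDataStore dataStore) {
        return dataStore.getAllFirstNames().size();
    }

    public static void main(String[] args) {
        // The lambda implements the interface's only method, so no MockDatabase class is needed.
        FirstNameDataStore stub = () -> Arrays.asList("mike", "steve", "steve");

        if (countNames(stub) != 3) {
            throw new RuntimeException("Stub did not return the canned names");
        }
    }
}
```

This only works for interfaces with exactly one abstract method; for anything richer, a hand-written mock or a library is the more practical route.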

The final extraction

We've come a long way, but there's still one unit that would be useful to extract. Every time we run our tests we write the report to the file system. This is another external dependency that may cause problems: for example, we can't run multiple tests in parallel because they would all write to the same file. It also means there's a lot more logic (FileUtils.readFileToString) than we would like in the FirstNameReportWriter unit's tests. Therefore we will create our final unit.

Its public interface will be:

public interface ReportWriter {
    void writeReport(String report) throws IOException;
}

and its implementation is:

public class FileWriter implements ReportWriter {
    @Override
    public void writeReport(String report) throws IOException {
        FileUtils.writeStringToFile(
                new File("/home/reports/firstNames.txt"),
                report,
                StandardCharsets.UTF_8
        );
    }
}

There isn't a lot to test in this unit. Given a String (any String) it should write it to disk. So let's write that test:

public class FileWriterTest {

    public static void main(String[] args) throws IOException {

        // Test 1
        FileWriter fileWriter = new FileWriter();
        fileWriter.writeReport("a great report");

        String result = FileUtils.readFileToString(
                new File("/home/reports/firstNames.txt"), StandardCharsets.UTF_8);
        assertEquals(result, "a great report");

    }

    private static void assertEquals(String actualOutput, String expectedOutput) {
        if (!expectedOutput.equals(actualOutput)) {
            throw new RuntimeException("FileWriter is broken");
        }
    }
}
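The test above still writes to the fixed path /home/reports/firstNames.txt. One possible variation, sketched here under the assumption that the target path is passed in through the constructor, is to write to a fresh temporary file for each test. PathReportWriter, TempFileWriterSketch and roundTrip are hypothetical names, and the sketch uses java.nio.file.Files from the standard library in place of FileUtils.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TempFileWriterSketch {

    // Hypothetical variant of FileWriter: the target path is a constructor argument.
    static class PathReportWriter {
        private final Path target;

        PathReportWriter(Path target) {
            this.target = target;
        }

        void writeReport(String report) throws IOException {
            Files.write(target, report.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Writes a report to a fresh temporary file, reads it back, then cleans up.
    static String roundTrip(String report) throws IOException {
        Path tempFile = Files.createTempFile("firstNames", ".txt");
        try {
            new PathReportWriter(tempFile).writeReport(report);
            return new String(Files.readAllBytes(tempFile), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tempFile);
        }
    }

    public static void main(String[] args) throws IOException {
        if (!"a great report".equals(roundTrip("a great report"))) {
            throw new RuntimeException("PathReportWriter is broken");
        }
    }
}
```

Because every run gets its own file, tests like this can run side by side. The rest of the article carries on with the FileWriter shown earlier.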

We can inject this dependency into the FirstNameReportWriter unit:

public class FirstNameReportWriter {
    private final ReportCreator reportCreator;
    private final FirstNameDataStore firstNameDataStore;
    private final ReportWriter reportWriter;

    public FirstNameReportWriter(
            ReportCreator reportCreator,
            FirstNameDataStore firstNameDataStore,
            ReportWriter reportWriter
    ) {
        this.reportCreator = reportCreator;
        this.firstNameDataStore = firstNameDataStore;
        this.reportWriter = reportWriter;
    }

    public void writeReport(String reportTitle) throws IOException {
        Collection<String> firstNames = firstNameDataStore.getAllFirstNames();

        String report = reportCreator.createReport(reportTitle, firstNames);

        reportWriter.writeReport(report);
    }
}

Interestingly, this unit now has no logic; its sole responsibility is to compose the functionality of its dependencies. This means we can greatly improve its visualisation.

Its tests can now be greatly improved: we no longer need to repeat the file reading logic, and we no longer need an external dependency, i.e. we no longer need to write to the file system. In the tests we will inject a mock of the ReportWriter unit that simply captures its argument:

public class ArgumentCatchingReportWriter implements ReportWriter {

    private String capturedArgument;

    @Override
    public void writeReport(String report) throws IOException {
        this.capturedArgument = report;
    }

    public String getCapturedArgument() {
        return capturedArgument;
    }
}

and our tests now look like this:

public class FirstNameReportWriterTest {
    public static void main(String[] args) throws IOException {
        FirstNameReportWriter firstNameReportWriter;
        ArgumentCatchingReportWriter reportWriter;


        // Test 1
        reportWriter = new ArgumentCatchingReportWriter();
        firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(),
                new MockDatabase(Arrays.asList("mike", "steve")),
                reportWriter
        );
        firstNameReportWriter.writeReport("an interesting title");
        assertEquals(reportWriter.getCapturedArgument(),
                "an interesting title\n" +
                "Mike:1\n" +
                "Steve:1\n");


        // Test 2
        reportWriter = new ArgumentCatchingReportWriter();
        firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(),
                new MockDatabase(Arrays.asList("mike", "steve", "steve")),
                reportWriter
        );
        firstNameReportWriter.writeReport("an interesting title");
        assertEquals(reportWriter.getCapturedArgument(),
                "an interesting title\n" +
                "Mike:1\n" +
                "Steve:2\n");


        // Test 3
        reportWriter = new ArgumentCatchingReportWriter();
        firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(),
                new MockDatabase(Arrays.asList("mike", "steve", "mike", "steve", "steve")),
                reportWriter
        );
        firstNameReportWriter.writeReport("an interesting title");
        assertEquals(reportWriter.getCapturedArgument(),
                "an interesting title\n" +
                "Mike:2\n" +
                "Steve:3\n");
    }

    private static void assertEquals(String actualOutput, String expectedOutput) {
        if (!expectedOutput.equals(actualOutput)) {
            throw new RuntimeException("FirstNameReportWriter is broken");
        }
    }
}

which can be visualised as:

The greyed-out units show that we are mocking them out for the test, so that they return a predefined output for a specific input.

Finally, for the application code, we new up the FirstNameReportWriter unit with the FileWriter unit as one of its dependencies:

public class App {
    public static void main(String[] args) throws IOException {
        FirstNameReportWriter firstNameReportWriter = new FirstNameReportWriter(
                new ReportCreatorImpl(),
                new Database("jdbc:sqlserver://db1:2000;databaseName=prodDb"),
                new FileWriter()
        );

        firstNameReportWriter.writeReport(args[0]);
    }
}

The end result is a very insightful visualisation of our code.

We can see that all of our units fall into one of three categories.

1. Pure units

These may have inputs and outputs, but they don't depend on internal state and cause no side effects. Given the same inputs, they produce the same outputs every time. ReportCreatorImpl is a pure unit. Our goal is to have as much of our code as possible made from pure units, because they are the easiest to test and work with.
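As a small illustration, the name capitalisation inside ReportCreatorImpl could itself be a pure unit. The sketch below is hypothetical: PureUnitSketch and capitalise are not part of the article's code, and the guard for an empty name is an added assumption.

```java
class PureUnitSketch {

    // Hypothetical pure unit: the result depends only on the argument,
    // and nothing outside the method is read or changed.
    static String capitalise(String firstName) {
        if (firstName.isEmpty()) {
            return firstName; // assumption: an empty name is returned unchanged
        }
        return Character.toUpperCase(firstName.charAt(0)) + firstName.substring(1);
    }

    public static void main(String[] args) {
        check("Mike".equals(capitalise("mike")));
        check("Mike".equals(capitalise("Mike")));
        check("".equals(capitalise("")));

        // Same input, same output, however many times we call it.
        check(capitalise("steve").equals(capitalise("steve")));
    }

    private static void check(boolean condition) {
        if (!condition) {
            throw new RuntimeException("capitalise is broken");
        }
    }
}
```

No mocks, files, or databases are involved, which is why units like this are the cheapest to test.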

2. Side effect units

These communicate with the outside world, so given the same inputs they may produce different outputs. FileWriter and Database (our FirstNameDataStore implementation) are side effect units. Our goal is to make these units as small as possible by moving as much logic as possible into pure units. We want to minimise the code in these units because they can never be tested exactly as they are used in production.

3. Workflow units

These simply compose units together to create bigger units. FirstNameReportWriter is a workflow unit.

In the upcoming articles we'll talk more about why it's important to try to restrict our code to these three types of unit. For now we can see that it allows us to visualise our programs and test each part in isolation, just like we do when building cars.

[1] I don't personally build cars, so please forgive me if this isn't entirely accurate, which it probably isn't.