My last post covered the basics of how to get a node.js application running in Docker. This post is the first in a short series about automated testing in node.js web applications with Docker.

It boils down to running npm test in a Docker container, which may not seem like it should require multiple blog posts! However, as an application gets more complicated and requires more kinds of testing, such as frontend testing and end-to-end testing across multiple services, getting to npm test can be nontrivial. Fortunately, Docker and Docker Compose provide tools that can help.

To illustrate, we’ll build and test an example application: a small TODO list manager. Here’s what it will look like when finished, at the end of the series:

[Animated demo: create some todos, search, and complete them]

In this post, we’ll start with the backend, which is a small node.js service that provides a RESTful API for managing the task list. In particular, we’ll:

  • cover some (opinionated) background on web application testing,
  • set up a node.js service and a database for development with Docker Compose,
  • write shell scripts to automate some repetitive docker-compose commands,
  • see how to set up and connect to separate databases for development and test, and
  • finally, run npm test in the container!

Subsequent posts will extend the approach developed here to include frontend and end-to-end testing, and then to multiple services. The Compose setup in this post is pretty standard and provides the foundation from which we’ll build up to using some more advanced features as the application grows.

The code is available on GitHub, and each post in the series has a git tag that marks the corresponding code.

The TODO List Manager Backend

Let’s start with a quick tour of the service we’ll be developing and testing. It uses PostgreSQL for the datastore and Express for the web server. For convenient database access, it uses knex.js with Objection.js as the object-relational mapping layer.

It has just one model: a Task on the TODO list. Each task has an identifier and a description, which can’t be null and must have a sensible length. Here’s the code:

src/task.js

const { Model } = require('objection')

require('./knex') // ensure database connections are set up

class Task extends Model {
  static get tableName() {
    return 'tasks'
  }

  static get jsonSchema() {
    return {
      type: 'object',
      required: ['description'],

      properties: {
        id: { type: 'integer' },
        description: { type: 'string', minLength: 1, maxLength: 255 }
      }
    }
  }
}

module.exports = Task

The service exposes a RESTful API for managing the tasks, which is implemented in the usual way:

src/app.js

const bodyParser = require('body-parser')
const express = require('express')

const Task = require('./task')

const app = express()

app.use(bodyParser.json())

// Is the service up?
app.get('/status', (req, res) => res.sendStatus(204))

// List tasks.
app.get('/api/tasks', async (req, res, next) => {
  try {
    const tasks = await Task.query().orderBy('id')
    res.json({ tasks })
  } catch (error) {
    next(error)
  }
})

// Create a new task.
app.post('/api/tasks', async (req, res, next) => {
  try {
    const task = await Task.query().insert({
      description: req.body.description
    })
    res.json({ task })
  } catch (error) {
    if (error instanceof Task.ValidationError) {
      res.status(400).json({ error: { message: error.message } })
      return
    }
    next(error)
  }
})

// Check the id route param looks like a valid id.
app.param('id', (req, res, next, id) => {
  if (/^\d+$/.test(req.params.id)) return next()
  res.sendStatus(404)
})

// Complete a task (by deleting it from the task list).
app.delete('/api/tasks/:id', async (req, res, next) => {
  try {
    await Task.query().deleteById(req.params.id)
    res.sendStatus(204)
  } catch (error) {
    next(error)
  }
})

module.exports = app

The API endpoints are relatively thin wrappers around the ORM, because there’s not much logic in a TODO list app, but we’ll still find some worthwhile things to test.

Behind the scenes, there is also some boilerplate for database connection strings, database access, database migrations, and running the server.
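
For a concrete idea of what that boilerplate does, here is a sketch of the database setup that src/task.js pulls in with require('./knex'). The exact contents of the file in the repo may differ, but the idea is to look up the connection settings for the current NODE_ENV in the knexfile (shown later in this post) and bind Objection’s models to that knex instance:

src/knex.js (sketch)

// A sketch, not necessarily the file in the repo: pick the knex configuration
// for the current environment and wire Objection up to it.
const Knex = require('knex')
const { Model } = require('objection')

const knexConfig = require('../knexfile') // assumes knexfile.js at the repo root

const environment = process.env.NODE_ENV || 'development'
const knex = Knex(knexConfig[environment])

// Bind all Objection models to this knex instance.
Model.knex(knex)

module.exports = knex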

Dockerfile and Compose for Development

Next let’s see how to get the service running in development. I’ve followed the approach in my previous post to set up a Dockerfile that handles getting node and its dependencies installed, so here let’s focus on the Compose file:

docker-compose.yml

version: '3.7'

services:
  todo:
    build:
      context: .
      target: development
    command: npx nodemon server.js
    depends_on:
      - postgres
    environment:
      PORT: 8080
    ports:
      - '8080:8080'
    volumes:
      - .:/srv/todo
      - todo_node_modules:/srv/todo/node_modules

  postgres:
    image: postgres:12
    environment:
      POSTGRES_HOST_AUTH_METHOD: trust

volumes:
  todo_node_modules:

It’s short but fairly dense. Let’s break it down:

  • The Compose file defines two services, our todo API service and its database, postgres.
  • The todo service is built from the current directory, using the development stage of the multi-stage Dockerfile, like in my last post. (There’s a sketch of such a Dockerfile just after this list.)
  • The command runs the service in the container under nodemon, so it will restart automatically when the code changes in development.
  • The todo service depends_on the postgres database; this just ensures that the database is started whenever the todo service starts.
  • The todo service uses the PORT environment variable to decide what port to listen on in the container, here 8080. The ports key then tells compose to expose port 8080 in the container on port 8080 on the host, so we can access the service on http://localhost:8080.
  • The volumes are set up to allow us to bind the service’s source files on the host into the container for fast edit-reload-test cycles, like in my last post.
  • There’s not much to the postgres service, because we’re using it as it comes. It’s worth noting that the image is fixed to version 12, which is the latest at the time of writing. It’s a good idea to fix a version (at least a major version) to avoid accidental upgrades.
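
For reference, the development stage of such a multi-stage Dockerfile might look roughly like the sketch below. It’s based on the approach from the previous post rather than copied from the repo, so the base image version is an assumption; the /srv/todo path matches the bind mount in the Compose file above:

Dockerfile (sketch)

# A sketch of the development stage only; the repo's Dockerfile may differ.
FROM node:12 AS development

WORKDIR /srv/todo

# Install dependencies first, so this layer is cached independently of the source.
COPY package.json package-lock.json ./
RUN npm install

# Copy in the rest of the source (bind mounted over in development anyway).
COPY . .

# The default command; docker-compose.yml overrides it with nodemon.
CMD ["node", "server.js"]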

Development Helper Scripts

Now that Compose is set up, let’s see how to use it. While not strictly speaking required, it is often helpful to write some shell scripts to automate common tasks. This saves on typing and helps with consistency. The most interesting script is the bin/up script, which handles the initial setup and can also be safely re-run to make sure you are up to date after pulling in remote changes:

bin/up

#!/usr/bin/env bash

set -e

docker-compose up -d postgres

WAIT_FOR_PG_ISREADY="while ! pg_isready --quiet; do sleep 1; done;"
docker-compose exec postgres bash -c "$WAIT_FOR_PG_ISREADY"

for ENV in development test
do
  # Create database for this environment if it doesn't already exist.
  docker-compose exec postgres \
    su - postgres -c "psql $ENV -c '' || createdb $ENV"

  # Run migrations in this environment.
  docker-compose run --rm -e NODE_ENV=$ENV todo npx knex migrate:latest
done

docker-compose up -d

Taking it from the top:

  • The set -e tells the script to abort on the first error, instead of continuing and getting into even deeper trouble. All shell scripts should start with this.

  • It starts up the database and then runs the built-in postgres pg_isready utility in a loop, waiting until postgres finishes starting up (which is usually pretty quick, but not instant). If we didn’t do this, subsequent commands that need the database might fail sporadically.

  • Then it creates two databases, one called development and one called test, and runs the database migrations in each one; a sketch of such a migration follows this list. The application uses the connection string for the appropriate database depending on what NODE_ENV it is running in. More on this later.

  • Finally it brings up the rest of the application with up -d, which runs it detached, in the background.
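
The migrations themselves are just standard knex migrations. For illustration, the one that creates the tasks table would look something like this sketch; the filename timestamp is made up, and the real migration in the repo may differ in the details:

// migrations/20200101000000_create_tasks.js (hypothetical filename)
exports.up = knex =>
  knex.schema.createTable('tasks', table => {
    table.increments('id') // auto-incrementing integer primary key
    table.string('description', 255).notNullable() // matches the model's jsonSchema
  })

exports.down = knex => knex.schema.dropTable('tasks')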

The application is just an API at this point, so here’s what it looks like when exercised with curl on localhost:8080 — an interface that only a developer could love:

# Create a task 'foo'.
$ curl --header 'Content-Type: application/json' \
  --data '{"description": "foo"}' \
  http://localhost:8080/api/tasks
{"task":{"description":"foo","id":1}}

# List the tasks, which now include 'foo'.
$ curl http://localhost:8080/api/tasks
{"tasks":[{"id":1,"description":"foo"}]}

# Complete task 'foo' by its ID.
$ curl -X DELETE http://localhost:8080/api/tasks/1

The Tests

Now that we’ve seen the service, let’s look at the tests. I’ve chosen two kinds of tests for the example service, model tests and integration tests. These terms are borrowed from Ruby on Rails, which I think encourages an approach to testing that is sound for many kinds of web applications.

Model tests test the ‘model’ layer of the Model-View-Controller (MVC) architecture, which most web applications follow to at least some degree. The model layer contains the application’s core business logic. The controllers are responsible for translating between the model layer and the view layer, which comprises, for most web applications, HTML or JSON responses to HTTP requests. Integration tests test that the models, controllers and views work together. In diagram form:

[Diagram: from left to right, a database, then a service comprising models, controllers and views. Model tests encompass the database and the models; integration tests encompass the database, models, controllers and views.]

Note that both model and integration tests have access to the database; the database is not mocked, because it is an essential part of the application. There are certainly cases where mocks are the right tool for the job, but I think the primary database is rarely one of them. I’ve included an appendix with some further discussion on this point.

Model Tests

So, let’s see some examples of model tests. The example app doesn’t have much in the way of core business logic, so these particular model tests border on trivial. However, they illustrate at least one thing that model tests often do: they exercise the Task model to check that some valid data can be inserted and some invalid data can’t be:

test/model/task.test.js

const assert = require('assert')

const cleanup = require('../support/cleanup')

const Task = require('../../src/task')

describe('Task', function() {
  beforeEach(cleanup.database)

  it('can be created with a valid description', async function() {
    const description = 'a'.repeat(255)
    const task = await Task.query().insert({ description })
    assert.strictEqual(task.description, description)
  })

  it('must have a description', async function() {
    try {
      await Task.query().insert({
        description: ''
      })
      assert.fail()
    } catch (error) {
      assert(error instanceof Task.ValidationError)
      assert(/should NOT be shorter than 1 characters/.test(error.message))
    }
  })

  it('must not have an overly long description', async function() {
    try {
      await Task.query().insert({
        description: 'a'.repeat(1000)
      })
      assert.fail()
    } catch (error) {
      assert(error instanceof Task.ValidationError)
      assert(/should NOT be longer than 255 characters/.test(error.message))
    }
  })
})

A few remarks:

  • The tests are written with mocha and the built-in node assertions. (It is usually worthwhile to use a library such as chai for more kinds of assertions, but I didn’t want to overload the example app with tons of libraries, and node’s assert does the job.)

  • The cleanup.database hook at the top is a helper function I wrote to clear out the database before each test; there’s a sketch of it just after this list. Compared with cleaning up more selectively, this is a brute force approach, but it does help avoid flaky tests by making sure each test starts with a clean slate.

  • These model tests are for the Task model, which is a model in the ORM sense of the word. However, model tests can test other kinds of code that aren’t coupled to the ORM and database, too. If your application affords some pure functions (woo hoo!) or plain objects (also good!), you can still test those in model tests. Just omit the cleanup.database hook.
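
As for the cleanup helper itself, here’s a sketch of what it might look like. The actual helper in the repo may be more general, for example truncating every table, but with a single model it only needs to clear the tasks table:

test/support/cleanup.js (sketch)

// A sketch, not necessarily the helper in the repo.
const Task = require('../../src/task')

module.exports = {
  // Delete all tasks so each test starts from a clean slate.
  database: async function () {
    await Task.query().delete()
  }
}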

Technically, the main thing that distinguishes a model test from an integration test is that it doesn’t require spinning up or interacting with the express app. The model layer in MVC should be independent from the controller and view layers where possible — it should not care if it’s running in a background job or a service using websockets instead of plain HTTP. If it does, parts of it probably belong in the controller or view layer, where they will be easier to test with integration tests.

Integration Tests

That brings us to integration tests, in which we do start up the express app and test it primarily by making HTTP requests. I wrote another test helper to start the express application in a global mocha before hook, so it starts once for the whole test suite. The helper also puts a testClient object on mocha’s this, with convenience methods for making requests against the app, such as this.testClient.post.
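
That helper isn’t the focus of this post, but a minimal sketch of it might look something like the following. The use of node-fetch and the exact shape of testClient are assumptions on my part; the helper in the repo may well differ:

test/support/test-server.js (sketch)

// A sketch, not necessarily the helper in the repo.
const fetch = require('node-fetch') // assumed HTTP client

const app = require('../../src/app')

let server

// Start the app once, before the whole suite, on an ephemeral port.
before(function (done) {
  server = app.listen(0, () => {
    const baseUrl = `http://localhost:${server.address().port}`

    // Convenience methods for making requests against the app.
    this.testClient = {
      get: path => fetch(baseUrl + path),
      delete: path => fetch(baseUrl + path, { method: 'DELETE' }),
      post: (path, body) =>
        fetch(baseUrl + path, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify(body)
        })
    }
    done()
  })
})

// Shut the server down after the suite so mocha can exit cleanly.
after(function (done) {
  server.close(done)
})

With that helper in place, here are some integration tests: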

test/integration/todo.test.js

const assert = require('assert')

const cleanup = require('../support/cleanup')

const Task = require('../../src/task')

// Ensure the global test server is started, for this.testClient.
require('../support/test-server')

describe('todo', function() {
  beforeEach(cleanup.database)

  describe('with existing tasks', function() {
    let exampleTasks
    beforeEach(async function() {
      exampleTasks = await Task.query().insert(
        ['foo', 'bar'].map(description => ({ description }))
      )
    })

    it('lists the tasks', async function() {
      const response = await this.testClient.get('/api/tasks')
      assert(response.ok)
      const body = await response.json()
      assert.strictEqual(body.tasks.length, 2)
      assert.strictEqual(body.tasks[0].description, 'foo')
    })

    it('completes a task', async function() {
      const response = await this.testClient.delete(
        `/api/tasks/${exampleTasks[0].id}`
      )
      assert.strictEqual(response.status, 204)

      const remainingTasks = await Task.query()
      assert.strictEqual(remainingTasks.length, 1)
      assert.strictEqual(remainingTasks[0].id, exampleTasks[1].id)
    })
  })

  it('creates a task', async function() {
    const response = await this.testClient.post('/api/tasks', {
      description: 'foo'
    })
    const body = await response.json()
    assert.strictEqual(body.task.description, 'foo')
  })

  it('handles a validation error on create', async function() {
    const response = await this.testClient.post('/api/tasks', {})
    assert(!response.ok)
    assert.strictEqual(response.status, 400)
    const body = await response.json()
    assert.strictEqual(
      body.error.message,
      'description: is a required property'
    )
  })

  it('handles an invalid task ID', async function() {
    const response = await this.testClient.delete(`/api/tasks/foo`)
    assert.strictEqual(response.status, 404)
  })
})

A few talking points:

  • The tests generally follow a state verification pattern, in which we put the system into an initial state, provide some input to the system, and then verify the output or the final state, or both. For example, the setup for the describe('with existing tasks', ...) block creates two tasks in the database, and then the it('completes a task', ...) test makes a DELETE request and verifies that the service (1) produces the correct response code, 204, and (2) puts the database into the expected state, in which there is only one uncompleted task left.

  • The tests are gray box tests, in that they are allowed to reach into the database (ideally through the model layer) to affect and inspect the state of the system. Here the API is complete enough that we could write these tests as black box tests using only the public API, but that is not always the case. Having access to the model layer in integration tests gives a lot of useful flexibility.

  • The integration tests aim to cover all the success and error handling cases in the app’s controllers, but they don’t exhaustively test all of the possible causes of errors in the model layer, because those are covered in the model tests. For example, the it('handles a validation error on create', ...) test checks what happens if the description is missing, but there isn’t an integration test for the case where the description is too long, because there is already a model test for that.

    This effect is a major contributor to the often-discussed test pyramid, in which we have more model tests than integration tests. In this example, the model layer is too simple for that pattern to emerge, but the test pyramid is a good ideal to strive for in a large application. Integration tests generally take longer to run and require more effort to write than model tests, because there is much more going on: making requests, receiving and deserializing responses, and so on. Other things being equal, it’s usually best to test at the lowest level you can.

Running the Tests

With model and integration tests in hand, let’s see how to run them. We’ve seen above that the bin/up script creates separate development and test databases, so we have to set up the service to use them. This happens mainly in the knexfile:

knexfile.js

const common = {
  client: 'postgresql'
}

module.exports = {
  development: {
    ...common,
    connection: 'postgres://postgres:postgres@postgres/development'
  },
  test: {
    ...common,
    connection: 'postgres://postgres:postgres@postgres/test'
  },
  production: {
    ...common,
    connection: process.env.DATABASE_URL
  }
}

The development and test connection strings tell the application how to connect to the database as the default postgres user, on the host postgres, as declared in our Compose file. (It is a bit like that buffalo buffalo sentence.) The password in the connection string is effectively a placeholder, because the Compose file sets POSTGRES_HOST_AUTH_METHOD to trust. In production, we assume that the service will be provided with a DATABASE_URL environment variable, because hard coding production credentials here would be a bad idea.

Then in the service’s package.json we can set up the test scripts to run with NODE_ENV=test, which is how the service knows to connect to the test database:

"scripts": {
  "test": "NODE_ENV=test mocha 'test/**/*.test.js'",
  "test:model": "NODE_ENV=test mocha 'test/model/**/*.test.js'"
},

The test:model script runs only the model tests, which can be useful if you just want to run some model tests without the overhead of starting the express app in the global before hook, as mentioned above.
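
For example, to run only the model tests inside the container, a command along these lines should do it:

$ docker-compose run --rm todo npm run test:model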

So, we are now ready to run npm test in a Docker container:

$ docker-compose run --rm todo npm test
Starting todo_postgres_1 ... done

> todo@1.0.0 test /srv/todo
> NODE_ENV=test mocha 'test/**/*.test.js'



  todo
    ✓ creates a task (123ms)
    ✓ handles a validation error on create
    ✓ handles an invalid task ID
    with existing tasks
      ✓ lists the tasks
      ✓ completes a task

  Task
    ✓ can be created with a valid description
    ✓ must have a description
    ✓ must not have an overly long description


  8 passing (349ms)

which is the name of the game for this blog post. (Or we can use the bin/test helper script, which does the same thing.)

Finally, it’s worth noting that mocha’s --grep flag can be used to run a subset of the tests, provided it comes after a -- delimiter so that npm passes it through to mocha. For example,

$ bin/test -- --grep Task

runs only the tests with names containing Task.
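
For reference, the bin/test script can be as simple as this sketch, which just forwards its arguments to npm test in the container (the script in the repo may differ slightly):

bin/test (sketch)

#!/usr/bin/env bash

set -e

# Run the test suite in the todo container, forwarding any extra arguments,
# e.g. `bin/test -- --grep Task`.
docker-compose run --rm todo npm test "$@"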

Conclusion

We’ve seen how to write and run model and integration tests for a simple node.js web service with Docker and Docker Compose. Model tests exercise the core business logic in the model layer, and integration tests check that the core business logic is correctly wired up to the controller and view layers. Both types of tests benefit from being able to access the database, which is not mocked.

The Docker Compose setup for this project was pretty simple: just one container for the application service, plus its database. We’ll see some more advanced Docker Compose usage in subsequent posts, together with scripts like bin/up to help drive them.

Next time, we’ll add a frontend so we can manage our TODO list without curl, and of course we will add some frontend tests. See you then!

Onward to part 2!

If you’ve read this far, you should follow me on twitter, or maybe even apply to work at Overleaf. :)

Appendix: Views on Testing

This series of posts makes some assumptions about the kinds of automated tests that we’re trying to write, so I should say what those assumptions are. I’ll start with the big picture and work back to the practical.

So, why do we test? We test to estimate correctness. Good testing lets us iterate quickly to improve correctness by providing accurate and cheap estimates.

I say ‘estimate’ here because we can in principle measure ‘ground truth’ correctness by letting the system loose in the real world and seeing what happens. If we can release to production quickly and get feedback quickly through great monitoring, and if the cost of system failure is low, we might not need to estimate. For example, if we deliver pictures of cats at scale, we might just ship to production and measure; if we make antilock braking systems, not so much. In most domains, it is worth investing in testing so we can accurately predict and improve correctness before we go to production.

The main way to achieve high prediction accuracy is through high test fidelity. As they say at NASA, test as you fly, fly as you test. For high fidelity, the system under test should closely resemble the one in production, and it should be tested in a way that closely resembles how it is used in production. However, fidelity usually comes at a cost.

There are two main costs to testing: the effort to create and maintain the tests, and the time to run the tests. Both are important. Software systems and their requirements change frequently, which requires developers to spend time adding and updating tests. And those tests run many, many times, which leaves developers twiddling their thumbs while they wait for test results.

Testing effectively requires finding the right tradeoff between fidelity and cost for your domain. This dynamic drives many decisions in testing. One example in this post is the decision to write both model tests and integration tests. We could just test everything with integration tests, which would be high fidelity but also high cost. Model tests are lower fidelity, in that they only test a part of the system, but they are generally easier to write and faster to run than integration tests, and hence lower cost.

The use of fakes (test doubles / mocks / stubs / etc.) is another important example; it can reduce test runtimes at the expense of lower fidelity and more effort to create and maintain the fakes. If replacing a component with a fake makes tests for other components much easier to write or faster to run without too much loss of fidelity, and it is not too much work to create and maintain the fake, it can be a good tradeoff.

Generally it makes sense to fake a component when two conditions hold: it is slow or unwieldy, and its coupling with the rest of the system is low. For example, if you want to test that your system sends an email when a user registers, you don’t need to stand up a full SMTP server and email client in your test environment; it is better to just fake the code that sends the email. Third party APIs called over HTTP often fall into this category, too, especially because there are good tools, such as vcr and polly.js, that make it easy to fake them.

One component I think it seldom makes sense to fake is the database. In most applications, coupling with the database is high, simply because some business logic is best handled by the database. The humble uniqueness constraint, for example, is very difficult to achieve without race conditions in application code but trivial with help from the database. And if you are fortunate enough to have a database that can do joins and enforce some basic data integrity constraints, you should definitely let it do that instead of endlessly rewriting that logic in application code. Given this coupling, including the database in the system under test increases fidelity and reduces costs to write and maintain tests, at the expense of longer test runtimes.
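
To make the uniqueness example concrete, here’s a hypothetical knex migration for a users table (no such table exists in the example app; this is purely for illustration). The unique index lets the database reject duplicates atomically, whereas a check-then-insert in application code can race against concurrent requests:

// A hypothetical migration, purely for illustration.
exports.up = knex =>
  knex.schema.createTable('users', table => {
    table.increments('id')
    // The database enforces uniqueness atomically, even under concurrent inserts.
    table.string('email', 255).notNullable().unique()
  })

exports.down = knex => knex.schema.dropTable('users')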

Fortunately, there are other ways to decrease test runtimes besides using fakes. A key property of testing is that it is embarrassingly parallel. Test cases are independent by design, so it is straightforward (though not necessarily trivial) to run them in parallel. Some frameworks, such as Rails, can do this out of the box, and scalable Continuous Integration services make it easy to bring a lot of compute power to bear on your tests at relatively low cost. Of course, massive parallelism doesn’t help so much if you are running the tests on your laptop, but in most cases you don’t have to rerun the whole test suite to make progress on a single feature; usually only a subset of the tests will be relevant. Then you can push the code to the beefy CI boxes to get a final check that the change hasn’t broken something in an unexpected part of the application.

So, to sum up, in this series of blog posts I’m advocating for high fidelity (don’t fake too much) and low costs to write and maintain tests (don’t spend too much time tending fakes), with high investment in testing infrastructure to offset longer test runtimes. This does certainly make test environments more complex, but Docker and Docker Compose provide many useful tools for managing this complexity, which is one of the motivations for this series of blog posts.