How to test inner loops

In this blog post we will look at something that has kept me busy for a couple of days: writing tests for code with inner loops. Loops are very common and, although it is often easy to understand how they are intended to work, they can be surprisingly difficult to test. Fortunately, QuickCheck has a very cool block/unblock feature that lets you split a loop into smaller, more manageable chunks that can be tested in a straightforward way.

To illustrate how the block and unblock feature works I will implement a simple TCP/IP time server that lets users connect to the port it is listening on and then replies with a text message telling them what time it is. It is simple, yet just complicated enough for the purposes of this blog post.

A first (loop-less) version of the time server

Let us start with a very simple, loop-less version of the time server that only allows a single user to connect to it. This will serve as an introduction to how to write QuickCheck component tests and will set the stage for the next section, where we add an inner loop. If you have written QuickCheck tests before, the material in this section will probably be familiar to you.

So, let us get started! The following is the code for the server.

%% Starts the time server on the provided port.
start(TCP, Port) ->
  {ok, Socket} =
    TCP:listen(Port,
      [binary, {reuseaddr, true}, {packet, 4}, {active, false}]),
  {ok, ClientSocket} = TCP:accept(Socket),
  handle_client(TCP, ClientSocket),
  TCP:close(Socket).

%% Handles the connected client.
handle_client(TCP, ClientSocket) ->
  {H, M, S} = time(),
  NowStr =
    io_lib:format("The time is ~2..0b:~2..0b:~2..0b.\n", [H, M, S]),
  ok = TCP:send(ClientSocket, NowStr),
  TCP:close(ClientSocket).

The code is pretty straightforward, except for the argument TCP, which specifies which module should be used for the TCP/IP functions. The reason for having this argument is that it makes it easier to mock the TCP/IP functions later on in the test. (Since QuickCheck itself uses the gen_tcp module, we should avoid mocking that particular module.)
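
To make the role of the TCP argument a bit more concrete, here is a minimal sketch (not part of the test code) of how the server could be used with the real gen_tcp module; the port number 1037 and the hostname are arbitrary choices.

%% Hypothetical production usage. The server blocks until a client
%% connects, so we start it in its own process.
spawn(fun() -> time_server:start(gen_tcp, 1037) end),
timer:sleep(100),  %% crude: give the server time to start listening

%% A client asking for the time; the socket options mirror those used
%% by the server.
{ok, Sock} = gen_tcp:connect("localhost", 1037,
               [binary, {packet, 4}, {active, false}]),
{ok, Msg} = gen_tcp:recv(Sock, 0),   %% e.g. <<"The time is 13:37:00.\n">>
ok = gen_tcp:close(Sock).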

Setting up the test

Before writing the test commands for the server we need to get some setup code out of the way. We start by creating a new module for the tests, time_server_eqc, that includes the necessary QuickCheck headers, tells the compiler to export all functions (which makes it easier to test the code during development) and defines the TCP macro.

-module(time_server_eqc).

-include_lib("eqc/include/eqc.hrl").
-include_lib("eqc/include/eqc_component.hrl").

-compile(export_all).

-define(TCP, time_server_eqc_TCP).

Keeping track of the test state

When testing the server we want to generate unique identifiers to keep track of the sockets that are created. To do so, we let the test state contain a counter that we increment each time a socket is allocated in our model (by calling the model function allocate_socket).

-record(state, { socket_counter = 0 }).


%% ----- Initial state
initial_state() ->
  #state{ }.


%% ----- Utility functions
allocate_socket_return(S, _Args) ->
  {socket, S#state.socket_counter}.

allocate_socket_next(S, {socket, Counter}, _) ->
  S#state{ socket_counter = Counter + 1 }.

We also include a postcondition, shared by all commands, that says that the result of a command should be equal to the return value specified for the command by the model.

%% ----- Common postcondition
postcondition_common(S, Call, Result) ->
  eq(Result, return_value(S, Call)).

The test commands

We are now ready to implement the actual test commands for our server. Since the server only has one function in its API, start, and it only allows for a single user to connect, our test will only have a single command, start. We let its argument be a randomly generated port number and make it call time_server:start with the name of our mocked TCP/IP module and the generated port number.

start_args(_) ->
  [nat()].

start(Port) ->
  time_server:start(?TCP, Port).

The meat of the test is in the callout function for start, which specifies which functions the server is expected to call, in which order, with what arguments, and what they should return.

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?MATCH(ClientSocket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, accept, [Socket], {ok, ClientSocket}),
  ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
  ?CALLOUT(?TCP, close, [ClientSocket], ok),
  ?CALLOUT(?TCP, close, [Socket], ok).

The callouts closely follow the server: we start by allocating a new socket and use it as the return value of listen. The server should now call accept on the listening socket to wait for a connection and get a new socket for that connection back. We specify this by allocating a second socket (by calling allocate_socket again) and saying that the server should call accept with the listening socket as its argument and receive the new client socket in return. After this the server should send some data to the client (ideally we would check that it sends the correct data, but to keep things simple we skip that here) and then close both the client socket and the listening socket.

To complete the test we define our mocking specification and add a top-level property that generates and runs sequences of our test command.

%% ----- The mocking specification
api_spec() ->
  #api_spec{
    modules = [
      #api_module{
        name = ?TCP,
        functions = [
          #api_fun{ name = listen, arity = 2 },
          #api_fun{ name = accept, arity = 1 },
          #api_fun{ name = send, arity = 2 },
          #api_fun{ name = close, arity = 1 }
        ]
      }
    ]
  }.


%% ----- Properties
prop_time_server() ->
  ?SETUP(
     fun() ->
       eqc_mocking:start_mocking(api_spec()),
       fun() -> ok end
     end,
     ?FORALL(Cmds, commands(?MODULE),
       begin
         {H, S, Res} = run_commands(?MODULE, Cmds),
         aggregate(command_names(Cmds),
           pretty_commands(?MODULE, Cmds, {H, S, Res}, eq(Res, ok)))
       end)).

We can now run the test with QuickCheck and check that the server works as intended, which it seems to do!

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

100.0% {time_server_eqc,start,1}
true

The time server – now with loops!

With the simple version of the server out of the way, we are ready to step it up a notch and make it handle multiple users. After opening the listening socket, it will now enter a loop where it waits for a user to connect, sends the current time and closes the connection before reentering the loop.

The following is the modified server code.

%% Starts the time server on the provided port.
start(TCP, Port) ->
  {ok, Socket} =
    TCP:listen(Port,
      [binary, {reuseaddr, true}, {packet, 4}, {active, false}]),
  server_loop(TCP, Socket).

server_loop(TCP, Socket) ->
  % Wait for a connection.
  {ok, ClientSocket} = TCP:accept(Socket),
  handle_client(TCP, ClientSocket),
  server_loop(TCP, Socket).

How do we write the callouts for this loop? Why not just create a callout function that directly follows the structure of the server loop and let the callouts for start call it?

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?APPLY(server_loop, [Socket]).

server_loop_callouts(_S, [Socket]) ->
  ?MATCH(ClientSocket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, accept, [Socket], {ok, ClientSocket}),
  ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
  ?CALLOUT(?TCP, close, [ClientSocket], ok),
  ?APPLY(server_loop, [Socket]).

This looks pretty good but, unfortunately, it does not work, because when QuickCheck encounters a call to a command it fully evaluates its callout structure. In this particular case, it starts by calling start_callouts and evaluating the returned callouts. Since the last term in these callouts is a call to the model function server_loop, it then calls server_loop_callouts and continues evaluating its returned callouts. This is where things go wrong: the newly returned callouts contain a recursive call to server_loop, so QuickCheck makes yet another call to server_loop_callouts and thus enters an infinite loop that never terminates.

If you attempt to run this, the Erlang session will eventually run out of heap space and crash.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 2733560184 bytes of memory
             (of type "heap", thread 1).
Aborted (core dumped)

To prevent this from happening, we need to somehow stop the callout loop from recursing uncontrollably and instead unroll the loop one iteration at a time. With QuickCheck we can do exactly that by using ?BLOCK and ?UNBLOCK.

Instead of specifying the exact return value of the call to accept in the server loop callouts, we can say that the return value is ?BLOCK(server_loop_accept), which means that the call to accept will block and wait for another command in the test to give it the return value (the atom server_loop_accept serves as a unique identifier in case there are multiple blocked callouts). When we get the return value we bind it to ClientSocket and handle the client as before.

server_loop_callouts(_S, [Socket]) ->
  ?MATCH({ok, ClientSocket}, ?CALLOUT(?TCP, accept, [Socket],
                             ?BLOCK(server_loop_accept))),
  ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
  ?CALLOUT(?TCP, close, [ClientSocket], ok),
  ?APPLY(server_loop, [Socket]).

When QuickCheck sees that a command gets to a blocking callout, it will suspend execution of that command and its callouts and continue running other commands. In a way this is similar to the command and its callouts being evaluated lazily, driven by the other commands in the test. In our case, this means that we want to have a second command, connect, that simulates that a client is connecting by unblocking the call to accept in the server loop. This command allocates a new socket and uses that as the unblocked return value.

connect_args(_) ->
  [].

connect() ->
  ok.

connect_callouts(_S, []) ->
  ?MATCH(ClientSocket, ?APPLY(allocate_socket, [])),
  ?UNBLOCK(server_loop_accept, {ok, ClientSocket}).

This is pretty neat and separates the test code for the server loop from the test code for the connecting clients, which is a first step towards being able to test the server with multiple clients connecting concurrently, should we want to do so.
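
To get some intuition for how the blocked accept drives the test, here is a rough sketch (not actual QuickCheck output) of how a generated command sequence plays out with the callouts above; the port is just whatever nat() happens to generate.

start(3)     %% listen returns {socket, 0}, then accept blocks on
             %% server_loop_accept
connect()    %% unblocks accept with {ok, {socket, 1}}: the time is sent,
             %% {socket, 1} is closed and accept blocks again
connect()    %% unblocks accept with {ok, {socket, 2}}, and so on

Note that, so far, nothing stops QuickCheck from generating a connect before any start has been run, which is exactly what the next section addresses.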

Keeping track of the server started state

We are, however, not fully done yet. We only want the connect command to run when the server is started (and we do not want to start the server if it is already running), so we add a flag to the test state that tells whether the server has been started, together with a model function to set it.

-record(state, { started = false,
                 socket_counter = 0 }).

set_started_next(S, _, [Started]) ->
  S#state{ started = Started }.

The start command now gets a precondition that checks that the server is not already started, and we make sure that the started flag is set when the command is run.

start_pre(S) ->
  not S#state.started.

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?APPLY(set_started, [true]),
  ?APPLY(server_loop, [Socket]).

Similarly, we add a precondition to the connect command that checks that the server is started.

connect_pre(S) ->
  S#state.started.

With this in place we run the test again and get the following result. It shows both that the server works and that the connect command is run a lot more often than the start command, which is expected since we can only start the server once in every command sequence but can connect any number of clients.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

93.2% {time_server_eqc,connect,0}
6.8% {time_server_eqc,start,1}
true

It would also be interesting to know how long the generated command sequences are. We can find that out by using measure in our top-level property in the following way.

prop_time_server() ->
  ?SETUP(  
     fun() ->
       eqc_mocking:start_mocking(api_spec()),
       fun() -> ok end
     end,
     ?FORALL(Cmds, commands(?MODULE),
       begin
         {H, S, Res} = run_commands(?MODULE, Cmds),
         measure('Commands', length(Cmds),
           aggregate(command_names(Cmds),
             pretty_commands(?MODULE, Cmds, {H, S, Res},
                             eq(Res, ok))))
       end)).

When running the test we now get the following output, where we can see that, on average, the property generates command sequences consisting of one call to start and around 14 calls to connect.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

93.0% {time_server_eqc,connect,0}
7.0% {time_server_eqc,start,1}

Commands:    Count: 100   Min: 1   Max: 65   Avg: 14.670 
             Total: 1467
true

Serving multiple clients in parallel

A limitation of our server is that it can only serve one client at a time, which means that you could easily attack it by connecting to it and refusing to receive the data it sends. This will cause the server to block, since it will not accept any more incoming connections until it is done serving the currently connected client.

To address this we will change the server to handle each connected client in a spawned process and immediately go back to accepting more incoming connections. This is a common way of implementing systems in Erlang: a loop waits for something to happen and spawns new processes to handle whatever happened.

server_loop(TCP, Socket) ->
  % Wait for a connection.
  {ok, ClientSocket} = TCP:accept(Socket),
  spawn(fun() -> handle_client(TCP, ClientSocket) end),
  server_loop(TCP, Socket).

If we make this change to the server and run the test again it will, however, fail with the following output.

Reason:
  Callout mismatch:
    {unexpected, time_server_eqc_TCP:accept({socket, 0}), expected,
       time_server_eqc_TCP:send({socket, 1}, '_')}

This happens because there is now a race between the calls to handle_client and server_loop, but our callouts specify that the server must handle the connected client before it can accept any new connections. To bring the callouts in line with the updated server we use ?PAR to say that the calls from handle_client are performed in parallel with the calls from the recursive call of the server loop.

The new callouts look as follows.

server_loop_callouts(_S, [Socket]) ->
  ?MATCH({ok, ClientSocket}, ?CALLOUT(?TCP, accept, [Socket],
                             ?BLOCK(server_loop_accept))),
  ?PAR(
    ?CALLOUTS(
      ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
      ?CALLOUT(?TCP, close, [ClientSocket], ok)),
    ?APPLY(server_loop, [Socket])).

This is pretty cool and I like how we are able to specify the callouts in a way that closely follows the structure of the server loop implementation.

We can now run the test again which will pass!

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

93.1% {time_server_eqc,connect,0}
6.9% {time_server_eqc,start,1}

Commands:    Count: 100   Min: 1   Max: 57   Avg: 14.460
             Total: 1446
true

Fault injection

A neat thing about using ?BLOCK and ?UNBLOCK is that it allows us to add fault injection in a modular way, by having a separate command that makes the call to accept fail. Before adding the new command we do, however, need to modify the callouts for the server loop so that they know how to deal with the failure.

Instead of pattern matching on the return value of accept inside the enclosing ?MATCH, we will bind the return value and have the callouts do different things depending on its value.

If we get {ok, ClientSocket} it means that the call to accept succeeded, and we have the same callouts as before, except that we propagate the return value of the recursive call to the server loop (in case the loop terminates because of a failure). The return value is propagated by pattern matching against the return value of the ?PAR statement and returning the second element of that list (the first element is the return value of the send and close callouts).

If the call to accept fails, we specify that the server should throw an exception, since it cannot match the value returned by accept, and we express this using ?RET and ?EXCEPTION.

server_loop_callouts(_S, [Socket]) ->
  ?MATCH(Result, ?CALLOUT(?TCP, accept, [Socket],
                 ?BLOCK(server_loop_accept))),
  case Result of
    {ok, ClientSocket} ->
      ?MATCH([_, Ret],
        ?PAR(
          ?CALLOUTS(
            ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
            ?CALLOUT(?TCP, close, [ClientSocket], ok)),
          ?APPLY(server_loop, [Socket]))),
      ?RET(Ret);
    _ -> ?RET(?EXCEPTION({badmatch, Result}))
  end.

We also need to update start_callouts so that it propagates the return value of the server loop and clears the started flag in the state to reflect that the server is no longer running.

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?APPLY(set_started, [true]),
  ?MATCH(Ret, ?APPLY(server_loop, [Socket])),
  ?APPLY(set_started, [false]),
  ?RET(Ret).

With these changes in place it is trivial to add a command that injects failures. All we need to do is to unblock the server loop with an error return value in the following way.

%% ----- Command: connect_fail
connect_fail_pre(S) ->
  S#state.started.

connect_fail_args(_) ->
  [].

connect_fail() ->
  ok.

connect_fail_callouts(_S, []) ->
  ?UNBLOCK(server_loop_accept, {error, enodev}).

When running the test again we see that the three different commands are generated with roughly the same probability.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

37.9% {time_server_eqc,start,1}
32.8% {time_server_eqc,connect_fail,0}
29.4% {time_server_eqc,connect,0}

Commands:    Count: 100   Min: 1   Max: 64   Avg: 14.370
             Total: 1437
true

Usually, I want to focus more on the positive testing and would like to generate more calls to connect and fewer calls to connect_fail. We can fine-tune the probabilities of the commands by giving them different weights in the following way, where we give connect a weight five times higher than the other commands.

%% ----- The command weights
weight(_, connect) -> 5;
weight(_, _) -> 1.

If we run the test again with this weight function we get a better distribution of the commands; connect is now chosen roughly five times as often as connect_fail, which matches the weights we gave.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

68.6% {time_server_eqc,connect,0}
18.7% {time_server_eqc,start,1}
12.7% {time_server_eqc,connect_fail,0}

Commands:    Count: 100   Min: 1   Max: 70   Avg: 14.900
             Total: 1490
true

Summary

I hope that this blog post has been useful and has shown how you can use ?BLOCK and ?UNBLOCK together with ?PAR and fault injection to test code with inner loops. Using these techniques you can test much more complicated loop structures, where you have chains of blocked commands and where the unblocks provide not only a return value but also meta-data used to instruct the unblocked callouts. I do, however, want to keep this blog post simple and focused, so that is better left for a future blog post.

Happy testing!