Testing C++ STL set using QuickCheck
http://www.quviq.com/testing-c-stl-set-using-quickcheck/ (Thu, 21 Sep 2017)

Introduction

In QuickCheck version 1.42.1 we released eqc_cpp, a library that helps test C++ code with Quviq’s QuickCheck.

QuickCheck already had support for automatic generation of tests for the C language. Functions and types defined in C automatically got counterparts in Erlang, so that QuickCheck could call these functions with the right arguments.

This has now been extended to C++. With eqc_cpp, C++ is handled by auto-generating C wrapper code and then using the existing C binding in QuickCheck. Since it is infeasible to write and maintain a C++ parser, we have based the C++ support on the SWIG tool, a widely used tool for creating wrapper code for C/C++ (although using it to access C++ from C is most likely not the normal use case). (Note: for template instantiation to work you need SWIG >= 3.0.12.)

STL set

As a running example we are going to test some of the functionality of the Standard Template Library (STL) set implementation. This example is complicated enough to illustrate the concept, while not requiring us to write and explain lots of C++ code. We decided to test the following functions in STL set:

  • bool empty() const; – Check whether the set is empty.
  • size_t size() const; – Return the size of (number of elements in) the set.
  • iterator find(const Elem &x); – Look for a particular value in the set. If the element is present an iterator pointing to the element is returned, otherwise an iterator pointing to set<Elem>::end() is returned.
  • pair<iterator, bool> insert(const Elem &x); – Insert an element into the set. The result is a pair. The first component of the pair is an iterator pointing to the inserted element (or to the already existing element if it was already present). The second component is true if the element was inserted, and false if it was already present.
  • size_t erase(const Elem &x); – Remove an element from the set, returns 1 if the element was removed, and 0 otherwise.
  • iterator begin(); – Returns an iterator referring to the first element in the set container (or the past-the-end element if the set is empty).
  • iterator end(); – Returns an iterator referring to the past-the-end element in the set container.

Calling C++ from Erlang

For eqc_cpp to generate wrapper code for STL set we need to produce a SWIG-file that contains the necessary definitions. A first attempt could look like this (where we name an instance of a set of integers set_int):

// Module name is not used by eqc_cpp, but SWIG dictates it should be present
%module unused

%{
  #include <set>
%}

namespace std {

  template <class T, class U> class pair{};

  template <class Elem>
  class set {
    public:

      class iterator;

      set();
      bool empty() const;
      size_t size() const;
      void clear();
      iterator begin();
      iterator end();
      iterator find(const Elem &x);
      pair<iterator, bool> insert(const Elem &x);
      size_t erase(const Elem &x);
  };
}

%template(set_int) std::set<int>;

Note that we need to include a declaration of the pair template, since it is used in the return type of insert. With the SWIG-file you define what part of the C++ API you want to create wrappers for. It should completely describe this API.

This SWIG-file can be given to the Erlang module eqc_cpp (for example in an Erlang shell):

284> eqc_cpp:start(set, [{swig_file, "stl_set.swg"}, verbose]).

+++ Called swig                                                  [   136ms ]
SWIG Parse result:
  - Class instances:
      set_int              ==> std::set<int>
      set_int_iterator     ==> std::set<int>::iterator
      pair                 ==> std::pair<std::set<int>::iterator, bool>  (auto instantiated)

+++ Called eqc_c:start:
  eqc_c:start(set,
              [{c_src, "/tmp/__eqc_tmp1504519839758850_wrap.h"},{additional_files, "/tmp/__eqc_tmp1504519839759607_wrap.o"},{cflags, "-lstdc++"},
               {rename, []}, verbose]).
< ... >
ok

And this results in the creation of an Erlang module set with Erlang functions for the defined API:

289> set:module_info(exports).
[{pair_new,0},
 {set_int_clear,1},
 {set_int_delete,1},
 {set_int_empty,1},
 {set_int_begin,1},
 {set_int_end,1},
 {set_int_iterator_delete,1},
 {set_int_iterator_new,0},
 {module_info,0},
 {module_info,1},
 {set_int_new,0},
 {set_int_size,1},
 {set_int_erase,2},
 {set_int_find,2},
 {set_int_insert,2}]

Some quick experiments in the shell show that the wrapping seems to work, but also reveal a problem:

292> S = set:set_int_new().
{ptr,"set_int",140682019733552}
293> set:set_int_size(S).
0
295> set:set_int_insert(S, 7).
{ptr,"pair",140682020782096}
296> set:set_int_find(S, 7).
{ptr,"set_int_iterator",140682019733600}

Since we cannot directly access objects in C the wrapping translates all functions that return an object (or a reference to an object) into a function that returns a pointer to something opaque. Thus there is not really much we can do with the result from insert or from find, since all we have is a pointer to something that represents an iterator or a pair respectively. The solution is to provide more detail to the SWIG-file, and thus reveal more details to eqc_cpp. We expand the definitions of iterator and pair into:

template <class T, class U>
class pair {
  public:
    T first;
    U second;
};

class iterator {
  public:
    bool operator ==(const iterator &i) const;
    iterator operator ++() const;
    Elem operator *() const;
};

The result is a set of functions that can be used to destruct a pair (pair_get_first and pair_get_second) and functions that can do something sensible with iterators (set_int_iterator==, set_int_iterator++, and set_int_iterator*).
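
With these functions in place we can continue the shell experiment and actually take the results apart; roughly like this (a sketch using a fresh set, with concrete pointer values omitted):

S2 = set:set_int_new(),
P  = set:set_int_insert(S2, 7),                        %% pointer to a pair
true = set:pair_get_second(P),                         %% second component: 7 was new
I  = set:set_int_find(S2, 7),                          %% iterator pointing at 7
7  = set:'set_int_iterator*'(I),                       %% dereference the iterator
false = set:'set_int_iterator=='(I, set:set_int_end(S2)).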

Now we have created a link between Erlang and C++ such that from Erlang we can call C++ functions and inspect the results.

Modelling sets

We now have everything that we need to write a simple QuickCheck state machine model for STL set. QuickCheck models are written in Erlang and are used to automatically generate test cases.

In order to keep the model simple, each generated test case makes use of a single set. The state is a simple list representation of a set. The tested property is a pretty standard eqc_statem property: eqc_cpp is started in ?SETUP, the C binding is restarted before each test, and lastly the set is created and passed to run_commands.

-module(stl_set_eqc).

-compile([export_all, nowarn_export_all]).

-include_lib("eqc/include/eqc.hrl").
-include_lib("eqc/include/eqc_statem.hrl").

-define(SWG_FILE, "./stl_set.swg").

%% -- State ------------------------------------------------------------------

initial_state() -> [].

%% -- Property ---------------------------------------------------------------

%% The property.
prop_ok() ->
  ?SETUP(fun() -> start(), fun() -> ok end end,
  ?FORALL(Cmds, commands(?MODULE),
  begin
    eqc_c:restart(),
    S = set:set_int_new(),
    HSR={_, _, Res} = run_commands(Cmds, [{set, S}]),
    pretty_commands(?MODULE, Cmds, HSR,
    check_command_names(?MODULE, Cmds,
      Res == ok))
  end)).

start() ->
  eqc_cpp:start(set, [{swig_file, ?SWG_FILE}, verbose]).

The only thing missing now is to model each operation. The easiest operation to model is size. It takes a single parameter (the set), does not affect the model state (there is no _next callback), and its postcondition just compares the size of the set with the length of our model state list:

%% --- size ---

size_args(_) -> [{var, set}].

size(S) -> set:set_int_size(S).

size_post(Xs, [_S], V) ->
  eq(V, length(Xs)).

Of course it is quite boring to measure the size of the empty set, so we had better insert something into the set. Calls to insert take the set and a random (small) integer. The model state is updated by adding the possibly new number – since we use lists:umerge we keep an ordered list (we come back to this in a short while) without duplicates. The postcondition we check is that the model and the system under test (SUT) agree on whether the inserted number was new or not.

%% --- insert ---

insert_args(_) -> [{var, set}, int()].

insert(S, N) ->
  P = set:set_int_insert(S, N),
  set:pair_get_second(P).

insert_next(Xs, _V, [_S, X]) ->
  lists:umerge([X], Xs).

insert_post(Xs, [_S, X], V) ->
  eq(V, not lists:member(X, Xs)).

Running the property with these two operations works well:

372> eqc:quickcheck(stl_set_eqc:prop_ok()).
....................................................................................................
OK, passed 100 tests

50.8% {eqc_cpp_stl_set_eqc,size,1}
49.2% {eqc_cpp_stl_set_eqc,insert,2}
true

Modelling the remaining operations, find and erase, proceeds similarly. The most interesting bit is getting the postcondition for find right – we need to compare the result of set:set_int_find with set:set_int_end using the overloaded iterator== operator. We choose to perform these operations in the find call rather than the postcondition to make test cases easier to read.

%% --- find ---

find_args(_) -> [{var, set}, int()].

find(S, X) ->
  not set:'set_int_iterator=='(set:set_int_find(S, X), set:set_int_end(S)).

find_post(Xs, [_Set, X], V) ->
  eq(V, lists:member(X, Xs)).

%% --- erase ---

erase_args(_) -> [{var, set}, int()].

erase(S, X) -> set:set_int_erase(S, X).

erase_next(Xs, _V, [_S, X]) ->
  Xs -- [X].

erase_post(Xs, [_S, X], V) ->
  eq(V, length([ Y || Y <- Xs, X == Y ])).

As mentioned above, we keep the model list sorted. There is a good reason for this: if we read the manual for STL set we find the following information: “Internally, the elements in a set are always sorted following a specific strict weak ordering criterion…”. To test whether this is really true we need to compare the model state (which should be sorted correctly) with the result of iterating through the C++ set. We model this as the operation to_list, which iterates from set_int_begin to set_int_end, checking that the set is indeed sorted.

%% --- to_list ---

to_list_args(_) -> [{var, set}].

to_list(S) -> iterate(set:set_int_begin(S), set:set_int_end(S)).

to_list_post(Xs, [_S], V) ->
  eq(V, Xs).

iterate(I, End) -> iterate(I, End, []).

iterate(I, End, Acc) ->
  case set:'set_int_iterator=='(I, End) of
    true  -> lists:reverse(Acc);
    false ->
      X = set:'set_int_iterator*'(I),
      iterate(set:'set_int_iterator++'(I), End, [X | Acc])
  end.

At times it is rather awkward to perform C/C++ operations on the Erlang side. In these cases one can use the SWIG extend functionality, which makes it possible to write bits of code that are included in the wrapping as if they were part of the C++ code. For example, we saw that it was a bit cumbersome to compare the result of find with end to conclude whether the number was found in the set or not. A better approach could be to add an is_member function “to” the set object. This can be achieved by adding the following to the SWIG file (inside the set class):

%extend{
  bool is_member(const Elem &x){
    return $self->find(x) != $self->end();
  }
}

$self is the SWIG syntax for the object itself (the C++ this pointer); we simply do the comparison on the C++ side (which is trivial because of the overloaded ==) and end up with a function that just returns a boolean. Internally, eqc_cpp handles this by adding a C++ function with this functionality, and then proceeds to wrap the added function just like the other member functions. (Note: this is not a real member function, thus there is no access to private class variables etc.)
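
After regenerating the wrappers, the extended function shows up as set_int_is_member and can be tried directly in the shell (a sketch; the pointer value will differ):

1> S = set:set_int_new().
{ptr,"set_int",...}
2> set:set_int_is_member(S, 7).
false
3> set:pair_get_second(set:set_int_insert(S, 7)).
true
4> set:set_int_is_member(S, 7).
true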

If we now add is_member to the model we could remove the find function from it if we like. We are still able to test the find function through the simpler-to-write is_member function:

%% --- is_member ---

is_member_args(_) -> [{var, set}, int()].

is_member(S, X) ->
  set:set_int_is_member(S, X).

is_member_post(Xs, [_Set, X], V) ->
  eq(V, lists:member(X, Xs)).

Tests generated by the resulting model run just fine; after all, the STL set functionality is rather well tested and we did not really expect to find any issues.

381> eqc:quickcheck(stl_set_eqc:prop_ok()).
....................................................................................................
OK, passed 100 tests

17.9% {eqc_cpp_stl_set_eqc,erase,2}
17.5% {eqc_cpp_stl_set_eqc,is_member,2}
16.9% {eqc_cpp_stl_set_eqc,find,2}
16.2% {eqc_cpp_stl_set_eqc,insert,2}
15.8% {eqc_cpp_stl_set_eqc,size,1}
15.7% {eqc_cpp_stl_set_eqc,to_list,1}
true

To make sure we could find issues if there were any, we can plant a bug in the specification:

to_list_post(Xs, [_S], V) ->
  eq(V, lists:reverse(Xs)).

Instead of requiring that to_list return the elements in sorted order we specify that they should be returned in reverse sorted order. Now running the tests quickly finds a counterexample:

381> eqc:quickcheck(stl_set_eqc:prop_ok()).
....Failed! After 4 tests.
< ... >
Shrinking xxxxxxx..xx...xxx.xxxxx(6 times)
[{model,eqc_cpp_stl_set_eqc},
 {set,{var,1},{call,eqc_cpp_stl_set_eqc,insert,[{var,set},0]}},
 {set,{var,2},{call,eqc_cpp_stl_set_eqc,insert,[{var,set},1]}},
 {set,{var,3},{call,eqc_cpp_stl_set_eqc,to_list,[{var,set}]}}]

eqc_cpp_stl_set_eqc:insert(Set, 0) -> true
eqc_cpp_stl_set_eqc:insert(Set, 1) -> true
eqc_cpp_stl_set_eqc:to_list(Set) -> [0, 1]

Reason:
  Post-condition failed:
  [0, 1] /= [1, 0]
false

Final version

Below are the final versions of the SWIG-file and the QuickCheck state machine model (without the bug).

// Module name is not used by eqc_cpp, but SWIG dictates it should be present
%module unused

%{
  #include <set>
%}

namespace std {

  template <class T, class U>
  class pair {
    public:
      T first;
      U second;
  };

  template <class Elem>
  class set {
    public:
      class iterator {
        public:
          bool operator ==(const iterator &i) const;
          iterator operator ++() const;
          Elem operator *() const;
      };

      set();
      bool empty() const;
      size_t size() const;
      void clear();
      iterator begin();
      iterator end();
      iterator find(const Elem &x);
      pair<iterator, bool> insert(const Elem &x);
      size_t erase(const Elem &x);

      %extend{
        bool is_member(const Elem &x){
          return $self->find(x) != $self->end();
        }
      }
  };

}

%template(set_int) std::set<int>;

%%% File        : stl_set_eqc.erl
%%% Author      : Ulf Norell
%%% Description : Testing STL set implementation
%%% Created     : 28 Aug 2017 by Ulf Norell
-module(stl_set_eqc).

-compile([export_all, nowarn_export_all]).


-include_lib("eqc/include/eqc.hrl").
-include_lib("eqc/include/eqc_statem.hrl").

-define(SWG_FILE, "../examples/eqc_cpp/stl_set/stl_set.swg").

%% -- State ------------------------------------------------------------------

initial_state() -> [].

%% -- Operations -------------------------------------------------------------

%% --- insert ---

insert_args(_) -> [{var, set}, int()].

insert(S, N) ->
  P = set:set_int_insert(S, N),
  set:pair_get_second(P).

insert_next(Xs, _V, [_S, X]) ->
  lists:umerge([X], Xs).

insert_post(Xs, [_S, X], V) ->
  eq(V, not lists:member(X, Xs)).

%% --- find ---

find_args(_) -> [{var, set}, int()].

find(S, X) ->
  not set:'set_int_iterator=='(set:set_int_find(S, X), set:set_int_end(S)).

find_post(Xs, [_Set, X], V) ->
  eq(V, lists:member(X, Xs)).

%% --- is_member ---

is_member_args(_) -> [{var, set}, int()].

is_member(S, X) ->
  set:set_int_is_member(S, X).

is_member_post(Xs, [_Set, X], V) ->
  eq(V, lists:member(X, Xs)).

%% --- erase ---

erase_args(_) -> [{var, set}, int()].

erase(S, X) -> set:set_int_erase(S, X).

erase_next(Xs, _V, [_S, X]) ->
  Xs -- [X].

erase_post(Xs, [_S, X], V) ->
  eq(V, length([ Y || Y <- Xs, X == Y ])).

%% --- size ---

size_args(_) -> [{var, set}].

size(S) -> set:set_int_size(S).

size_post(Xs, [_S], V) ->
  eq(V, length(Xs)).

%% --- to_list ---

to_list_args(_) -> [{var, set}].

to_list(S) -> iterate(set:set_int_begin(S), set:set_int_end(S)).

to_list_post(Xs, [_S], V) ->
  eq(V, Xs).

%% -- Common -----------------------------------------------------------------

iterate(I, End) -> iterate(I, End, []).

iterate(I, End, Acc) ->
  case set:'set_int_iterator=='(I, End) of
    true  -> lists:reverse(Acc);
    false ->
      X = set:'set_int_iterator*'(I),
      iterate(set:'set_int_iterator++'(I), End, [X | Acc])
  end.

%% -- Property ---------------------------------------------------------------

%% The property.
prop_ok() ->
  ?SETUP(fun() -> start(), fun() -> ok end end,
  ?FORALL(Cmds, commands(?MODULE),
  begin
    eqc_c:restart(),
    S = set:set_int_new(),
    HSR={_, _, Res} = run_commands(?MODULE, Cmds, [{set, S}]),
    pretty_commands(?MODULE, Cmds, HSR,
    check_command_names(?MODULE, Cmds,
      Res == ok))
  end)).

start() ->
  start([]).

start(ExtraOpts) ->
  eqc_cpp:start(set, [{swig_file, ?SWG_FILE}] ++ ExtraOpts).
Testing TLS with QuickCheck
http://www.quviq.com/tls-testing-with-quickcheck/ (Thu, 09 Feb 2017)

Lately we have been experimenting with modelling TLS (v1.2) using QuickCheck.

We took the approach where QuickCheck takes the ‘client’ role and tested a couple of available implementations running as a server. For the experiment we ran against the built-in Erlang SSL/TLS implementation, S2N from Amazon Security Labs, OpenSSL, and mbedTLS (previously PolarSSL). For a complete description of TLS v1.2 we refer to RFC 5246; in short, it is a protocol where a series of messages is exchanged in order to set up a shared secret, so that an efficient symmetric cipher can be used for a communication session.

TLS has lots of additions and extensions defined. Here we describe the basics, without such extensions. Also, this is not directly a test of the underlying cryptographic primitives. We are using the ‘crypto’ library of Erlang, which in turn utilizes OpenSSL. We would, of course, notice if the tested implementation deviated from this ‘standard’ implementation, but otherwise we assume it to be correct. After all, for S2N we know that considerable effort has been put into verifying the crypto part.

Purpose

Quviq has on several occasions been asked to do work in the spirit of ‘does an implementation adhere to its specification?’, i.e. acceptance testing of a well-specified application. For example, we have done this for the AUTOSAR specification (for example the CAN and LIN communication stacks), and we have found it an interesting and challenging problem. The question is important because of interoperability: will two implementations that supposedly follow the same standard work together? And when they don’t work together, who is at fault?

How we tested

The TLS protocol is initiated by the client sending a CLIENT_HELLO message. This message contains the security parameters that the client supports; we use QuickCheck to generate a random list of configurations (skewed towards configurations that the server will accept). Depending on the selected configuration (the server makes the final selection from the list sent by the client), the protocol continues with messages containing certificates, possibly additional ciphers, etc. In each stage we check that the response is the expected one and generate the partly random data necessary to respond in compliance with the protocol. The TLS protocol is terminated once a successful handshake is completed (or aborted if we happened to generate a negative test).
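
We do not show the full model here, but the flavour of the configuration generator can be sketched roughly like this (the cipher-suite names and weights are purely illustrative, not the ones used in the experiment):

%% Illustrative sketch only - not the model used in the experiment.
cipher_suite() ->
  frequency([{4, elements(['TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA',
                           'TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256'])},
             {1, elements(['TLS_RSA_WITH_3DES_EDE_CBC_SHA'])}]).

client_hello_suites() ->
  %% A non-empty list of proposed suites, skewed towards commonly accepted ones.
  ?LET(Suites, non_empty(list(cipher_suite())), lists:usort(Suites)).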

Findings

Saving the most interesting finding for last, we start with two observations regarding the S2N TLS implementation: (1) S2N does not follow the specification regarding when to send explicit ABORT messages in the protocol; and (2) S2N uses a strange internal prioritization order of ciphers. As far as we could tell, (1) is an intentional deviation: there are certain cryptographic attacks on TLS that can benefit from the information given by the ABORT, so the Amazon engineers have left them out on purpose. Such a deviation is, of course, good to make – but from an interoperability perspective it can still cause a problem, so exposing the deviation is important. (Also, we had to read this between the lines in the S2N documentation; it would not hurt to be more explicit!) The second observation was that S2N selects the crypto configuration (from the configurations suggested by the client) in a strange order. For ephemeral elliptic curve Diffie-Hellman (ECDHE) with 128-bit AES, SHA1 is preferred over SHA256 as the MAC algorithm, but for RSA with 128-bit AES, SHA256 is preferred over SHA1.

For the Erlang SSL/TLS implementation we actually found a real bug. We found a CLIENT_HELLO message that was incorrectly rejected by the server. Upon closer inspection it turned out that whenever the lower 16 bits of the GMT time were zero *and* the first two bytes of the client random material that goes into the message (read as a 16-bit unsigned integer) matched the length of the rest of the message, the message was rejected. The reason is that the message is then, incorrectly, interpreted as an SSLv2 message; SSLv2 is deprecated, so such messages are rejected. This bug has been reported to the Erlang OTP team. A patch is already in place and will be included in a coming release.
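
The triggering coincidence can be written down compactly; roughly (a sketch, not the actual OTP code):

%% Sketch, not the actual OTP code: in a CLIENT_HELLO a 32-bit gmt_unix_time is
%% followed by 28 random bytes. The message was misread as SSLv2 when the low
%% 16 bits of the time were zero and the first two random bytes, read as a
%% 16-bit unsigned integer, equalled the length of the rest of the message.
triggers_sslv2_confusion(GmtUnixTime, <<R0:16, _/binary>>, RestLen) ->
  GmtUnixTime band 16#FFFF =:= 0 andalso R0 =:= RestLen.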

Conclusion

Modelling only the basic functionality of the protocol, we did not expect to find any major bugs or deviations. The tested (server-side) implementations can all be considered very mature, and since they are all widely used, any error in the commonly used core would have been exposed long ago. Still, we did find a real error and a couple of smaller deviations/documentation issues – which goes to show that this is an effective way to test an implementation.

We generate random tests in such a way that dependencies that cannot be random, but depend on the state we are in, are given sensible values. In this way we quickly get the implementation under test into interesting states. We expect QuickCheck testing to find even more interesting things if we start implementing a number of extensions to this protocol.

Shrinking fallback servers to primary servers
http://www.quviq.com/shrinking-fallback-servers-to-primary-servers/ (Sun, 11 Oct 2015)

Imagine you are running a large system with some primary and some fallback servers. You have at least one and up to three primary servers, and zero to four fallback servers. You run tests on a random configuration to make sure your system behaves as expected. One of those tests fails and you want to shrink it to a minimal counterexample.

A minimal counterexample often takes the configuration into account. Years ago we showed how to test elevator control software using QuickCheck. The property first selected a random number of floors and elevators and then ran a sequence of random commands. As soon as such a sequence failed, it would shrink the configuration to the minimal number of floors and elevators needed to provoke this error. The idea is that analyzing a failure in a small configuration or a default (i.e., well-known) configuration is easier than analyzing it in a large, tricky configuration. Moreover, if we cannot shrink the configuration to, for example, 2 floors, then we know that we need at least 3 floors for the error to happen, giving us additional information about the possible causes.

At Basho they test their software against many random configurations of primary and fallback servers. The primary servers typically have ids 1, 2 and 3 and the fallback servers typically have ids larger than 3. Thus the list [1, 3, 6, 8] represents a configuration with two primary servers (1 and 3) and two fallback servers (6 and 8).
A failing test case is easier to analyze if there are fewer servers involved, and in particular if it can be provoked using only primary servers. We want a generator for such configurations that has good shrinking behaviour. That is, it may shrink [1, 3, 6, 8] to [1, 3, 2]: clearly fewer nodes, and only primary nodes.

Posting a challenge

At the first QuickCheck User workshop we had a free 30-minute slot and I posed the above-sketched problem as a QuickCheck generator design challenge. Create a generator primary_fallback(N, M) for M >= N. The idea is that N is the maximum number of primary nodes and M is the maximum node number. The generator has to fulfil the following conditions (a property sketch covering conditions 1–3 follows the list):

  1. It generates a list of numbers between 1 and M
  2. The list contains at least one primary node, i.e., it contains an element less than or equal to N.
  3. Each number occurs at most once
  4. For shrunk minimal counter examples we have:
    • Removing an element from the list should result in a passing test
    • Replacing a fallback by a primary node should result in a passing test
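
Conditions 1–3 can themselves be phrased as an ordinary property; a sketch (condition 4 is about shrinking and is checked by observing the shrinking behaviour instead, as described below):

%% Sketch of a sanity property for conditions 1-3.
prop_valid_config() ->
  N = 3, M = 7,
  ?FORALL(Nodes, primary_fallback(N, M),
          lists:all(fun(X) -> X >= 1 andalso X =< M end, Nodes)      %% (1)
          andalso lists:any(fun(X) -> X =< N end, Nodes)             %% (2)
          andalso length(Nodes) == length(lists:usort(Nodes))).      %% (3)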

In order to verify that we shrink correctly, I proposed to use the QuickCheck primitives eqc:watch_shrinking() and eqc_gen:sampleshrink(Gen). The correct way, as @ulfnorell showed during the workshop, is to create a property over shrinking paths, but we only had 30 minutes for this challenge.

If we want to watch shrinking, we had better have a failing property. I came up with a rather simplistic property that would show some shrinking.

prop_pf() ->
   ?FORALL(Nodes, primary_fallback(3, 7),
           lists:sum(Nodes) < 6).

Thus, if we have at most 3 primary nodes, always carrying node id 1, 2, or 3 and we have some fallback nodes with node id 4, 5, 6, or 7, then when we pick any set of nodes, the sum of the node ids is less than 6.

A clearly wrong solution would be to implement the primary_fallback generator as a subset of the list [1..M]:

primary_fallback(_, M) ->
   sublist(lists:seq(1,M)).

When testing with this generator, one would, for example, see shrinking to a list without primary nodes, as well as to lists that keep the fallback nodes instead of replacing them by primary nodes. The first kind of shrinking is wrong, since we need a primary node in each failing test case. The second kind is somewhat correct, since replacing the fallback node 4 with a primary node makes the property pass. However, given the starting point [1,2,3,4] we might have liked to get the failing counterexample [1,2,3] as the minimal result:

4> eqc:quickcheck(prefs_eqc:prop_pf()).
...........Failed! After 12 tests.
[1,2,3,6]
Shrinking xxx..xx(2 times)
[6]
false
5> eqc:quickcheck(prefs_eqc:prop_pf()).
.................Failed! After 18 tests.
[1,2,3,4]
Shrinking xxx.xxx.xx(2 times)
[2,4]
false

By checking this property one observes the final shrinking result, but not the intermediate results. In order to see those, one uses watch_shrinking after having found a failing test case:

6> eqc:watch_shrinking().              
Failed!
[1,2,3,4]

OK!
[]

OK!
[1]

OK!
[1,2]

Failed!
[3,4]

OK!
[4]

OK!
[3]

OK!
[1,4]

Failed!
[2,4]

OK!
[2,3]

OK!
[2]
ok

Shrinking never considered the case [1, 2, 3] as an alternative, smaller test case.
Your challenge: create a better generator.

A different property

The first thing that happens is that @rjmh comes up with a different property, quickly thereafter improved by @LeHoff. Instead of a fixed property, take an arbitrary property that fails for a specific given input. The property should fail sporadically in order to get interesting failures.

prop_randomfailure() ->
  ?FORALL({Nodes, P}, {primary_fallback(3, 7),
                       function1(weighted_default({70,true}, {10,false}))},
          P(Nodes)).

However, this property has the disadvantage that you may have very few valid shrinking alternatives and that you do not know where a test case should shrink to. Therefore, we disregard this solution and stick to the prop_pf defined above.

Simple and reasonably effective

I present one generator that does a reasonable job. Compare it to one of your own solutions and see where your solution performs better. There is a price to pay for making a super-elegant solution compared to a working solution.

The first thing we noticed with our naive approach is that it can generate a list of nodes that contains no primary node. We want at least one primary node! We can choose one first, using elements(lists:seq(1, N)), and then choose a sublist of other nodes with sublist(lists:seq(1, M)). The generated sublist may contain the node chosen first, and therefore we eliminate duplicates from the concatenated list.

primary_fallback(N, M) ->
  ?LET({Primary, Fallback}, 
       {[elements(lists:seq(1, N))], sublist(lists:seq(1, M))},
       shuffle(lists:usort(Primary ++ Fallback))).

This generator produces lists of nodes that always contain a primary node.

7> eqc_gen:sample(prefs_eqc:primary_fallback(3,7)).
[1,2,3,6,7]
[1,2,3,4]
[4,5,7]
[5]
[4,6,7]
[2]
[2,3,5,6,7]
[1,2,3,4,5,6]
[4]
[1,2,3,5,6]
[2,3,4,5,6]
ok

We should also observe the shrinking behaviour, and indeed, similarly to before, we shrink the length of the list eagerly.

8> eqc_gen:sampleshrink(prefs_eqc:primary_fallback(3,7)).
[2,5,7]
--> [[],[2],[2,5],[7],[5,7],[2,7],[1,5,7],[1,2,7],[2,3,7],[2,4,7],[1,2,5],[2,3,5],[2,5,6]]
ok

Using this generator for our property results in two failing cases, [1,2,3] and [1,5]. Both are correct failing cases in the sense that they cannot be made smaller by shortening the list or replacing a fallback by a primary; when we replace 5 by 4, the property passes.

However, it is a bit unsatisfactory to get [1,5] so many times as a result of shrinking:

Failed! After 1 tests.

[3,1,6,2,7]
Shrinking .xxx...x.x(5 times)
[1,5]
false

We probably want to add a shrinking alternative that takes care of the cases in which the primary nodes alone could have caused the property to fail. We can do so using ?SHRINK.

primary_fallback(N, M) ->
  ?LET({Primary, Fallback}, 
       {[elements(lists:seq(1, N))], sublist(lists:seq(1, M))},
       ?SHRINK(shuffle(lists:usort(Primary ++ Fallback)),
              [ [ E || E<-Primary++Fallback, E=<N] ])).

This adds only one shrinking alternative to the solution, viz. keeping only the primary nodes (filtering out all fallback nodes). One could add additional shrinking alternatives with subsets of the primary nodes, but I decided not to complicate this generator further.
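
For completeness, such an extension could look roughly like this (hypothetical, not part of the solution above): besides keeping only the primary nodes, also offer each occurring primary node on its own as a shrinking alternative.

primary_fallback(N, M) ->
  ?LET({Primary, Fallback},
       {[elements(lists:seq(1, N))], sublist(lists:seq(1, M))},
       ?SHRINK(shuffle(lists:usort(Primary ++ Fallback)),
               %% keep only the primary nodes ...
               [ [ E || E <- Primary ++ Fallback, E =< N ] ] ++
               %% ... or try each occurring primary node on its own
               [ [P] || P <- lists:usort(Primary ++ Fallback), P =< N ])).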

How to test inner loops
http://www.quviq.com/testing-inner-loops/ (Mon, 17 Nov 2014)

In this blog post we will look at something that has kept me busy for a couple of days, writing tests for code with inner loops. Loops are very common and although it is often easy to understand how they are intended to work, it can be surprisingly difficult to test them. Fortunately, QuickCheck has a very cool block/unblock feature that allows you to split a loop into smaller, more manageable chunks that can be tested in a straightforward way.

To illustrate how the block and unblock feature works I will implement a simple TCP/IP time server that lets users connect to the port it is listening on and then replies with a text message telling them what time it is. It is simple, but complicated enough for this blog post.

A first (loop-less) version of the time server

Let us start with a very simple version of the time server without loops that only allows for a single user to connect to it. This will serve as an introduction to how to write QuickCheck component tests and will set the stage for the next section where we add an inner loop. If you have already written QuickCheck tests, the material in this section will probably already be familiar to you.

So, let us get started! The following is the code for the server.

%% Starts the time server on the provided port.
start(TCP, Port) ->
  {ok, Socket} =
    TCP:listen(Port,
      [binary, {reuseaddr, true}, {packet, 4}, {active, false}]),
  {ok, ClientSocket} = TCP:accept(Socket),
  handle_client(TCP, ClientSocket),
  TCP:close(Socket).

%% Handles the connected client.
handle_client(TCP, ClientSocket) ->
  {H, M, S} = time(),
  NowStr =
    io_lib:format("The time is ~2..0b:~2..0b:~2..1b.\n", [H, M, S]),
  ok = TCP:send(ClientSocket, NowStr),
  TCP:close(ClientSocket).

The code is pretty straightforward, except for the argument TCP, which is used to specify which module should be used for the TCP/IP functions. The reason for having this argument is that it makes it easier to mock the TCP/IP functions later on in the test. (Since QuickCheck uses the gen_tcp module itself, we should avoid mocking this particular module.)
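
Outside the tests, the real module is simply passed in instead; for example (the port number here is just an illustration):

%% Production use: pass the real gen_tcp module instead of the mock.
time_server:start(gen_tcp, 4711).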

Setting up the test

Before writing the test commands for the server we need to get some setup code out of the way and start by creating a new module for the tests, time_server_eqc, that includes the necessary QuickCheck libraries, tells the compiler to export all functions (which makes it easier to test the code during development) and defines the TCP macro.

-module(time_server_eqc).

-include_lib("eqc/include/eqc.hrl").
-include_lib("eqc/include/eqc_component.hrl").

-compile(export_all).

-define(TCP, time_server_eqc_TCP).

Keeping track of the test state

When testing the server, we want to generate unique identifiers to keep track of the sockets that are created and in order to do so we let the test state contain a counter that we increment each time a socket is allocated in our model (by calling the model function allocate_socket).

-record(state, { socket_counter = 0 }).


%% ----- Initial state
initial_state() ->
  #state{ }.


%% ----- Utility functions
allocate_socket_return(S, _Args) ->
  {socket, S#state.socket_counter}.

allocate_socket_next(S, {socket, Counter}, _) ->
  S#state{ socket_counter = Counter + 1 }.

We also include a postcondition, shared by all commands, that says that the result of a command should be equal to the return value specified for the command by the model.

%% ----- Common postcondition
postcondition_common(S, Call, Result) ->
  eq(Result, return_value(S, Call)).

The test commands

We are now ready to implement the actual test commands for our server. Since the server only has one function in its API, start, and it only allows for a single user to connect, our test will only have a single command, start. We let its argument be a randomly generated port number and make it call time_server:start with the name of our mocked TCP/IP module and the generated port number.

start_args(_) ->
 [nat()].

start(Port) ->
  time_server:start(?TCP, Port).

The meat of the test is in the callout function for start that specifies which functions the server is expected to call, in which order, with what arguments and what they should return.

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?MATCH(ClientSocket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, accept, [Socket], {ok, ClientSocket}),
  ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
  ?CALLOUT(?TCP, close, [ClientSocket], ok),
  ?CALLOUT(?TCP, close, [Socket], ok).

The callouts closely follow the server: we start by allocating a new socket and use it as the return value of listen. The server should now call accept on the listening socket to wait for a connection and return a new socket for the connection. We specify this by first allocating a new socket (by calling allocate_socket again) and saying that the server should call accept with the listening socket as an argument and get the new client socket back. After this the server should send some data to the client (ideally, we would check that it sends the correct data, but to keep things simple we skip this here) and then close both the client socket and the listening socket.

To complete the test we define our mocking specification and add a top-level property that generates and runs sequences of our test command.

%% ----- The mocking specification
api_spec() ->
  #api_spec{
    modules = [
      #api_module{
        name = ?TCP,
        functions = [
          #api_fun{ name = listen, arity = 2 },
          #api_fun{ name = accept, arity = 1 },
          #api_fun{ name = send, arity = 2 },
          #api_fun{ name = close, arity = 1 }
        ]
      }
    ]
  }.


%% ----- Properties
prop_time_server() ->
  ?SETUP(
     fun() ->
       eqc_mocking:start_mocking(api_spec()),
       fun() -> ok end
     end,
     ?FORALL(Cmds, commands(?MODULE),
       begin
         {H, S, Res} = run_commands(?MODULE, Cmds),
         aggregate(command_names(Cmds),
           pretty_commands(?MODULE, Cmds, {H, S, Res}, eq(Res, ok)))
       end)).

We can now run the test with QuickCheck and check that the server works as intended, which it seems to do!

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

100.0% {time_server_eqc,start,1}
true

The time server – now with loops!

With the simple version of the server out of the way, we are ready to step it up a notch and make it handle multiple users. After opening the listening socket, it will now enter a loop where it waits for a user to connect, sends the current time and closes the connection before reentering the loop.

The following is the modified server code.

%% Starts the time server on the provided port.
start(TCP, Port) ->
  {ok, Socket} =
    TCP:listen(Port,
      [binary, {reuseaddr, true}, {packet, 4}, {active, false}]),
  server_loop(TCP, Socket).

server_loop(TCP, Socket) ->
  % Wait for a connection.
  {ok, ClientSocket} = TCP:accept(Socket),
  handle_client(TCP, ClientSocket),
  server_loop(TCP, Socket).

How do we write the callouts for this loop? Why not just create a callout function that directly follows the structure of the server loop and let the callouts for start call it?

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?APPLY(server_loop, [Socket]).

server_loop_callouts(_S, [Socket]) ->
  ?MATCH(ClientSocket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, accept, [Socket], {ok, ClientSocket}),
  ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
  ?CALLOUT(?TCP, close, [ClientSocket], ok),
  ?APPLY(server_loop, [Socket]).

This looks pretty good but, unfortunately, it does not work because when QuickCheck encounters a call to a command it will fully evaluate its callout structure. In this particular case, it will start by calling start_callouts and evaluate the returned callouts. Since the last term in these callouts is a call to the model function server_loop, it will call server_loop_callouts and continue evaluating its returned callouts. This is where it goes wrong because the newly returned callouts contain a recursive call to server_loop so QuickCheck will now make another call to server_loop_callouts and, thus, enter an infinite loop and never terminate.

If you attempt to run this, the Erlang session will eventually run out of heap space and crash.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 2733560184 bytes of memory
             (of type "heap", thread 1).
Aborted (core dumped)

To prevent this from happening, we need to somehow stop the callouts loop from recursing uncontrollably and instead unroll the loop one iteration at a time. With QuickCheck we can do exactly that by using ?BLOCK and ?UNBLOCK.

Instead of specifying the exact return value of the call to accept in the server loop callouts, we can say that the return value is ?BLOCK(server_loop_accept), which means that the call to accept will block and wait for another command in the test to give it the return value (the atom server_loop_accept serves as a unique identifier in case there are multiple blocked callouts). When we get the return value we bind it to ClientSocket and handle the client as before.

server_loop_callouts(_S, [Socket]) ->
  ?MATCH({ok, ClientSocket}, ?CALLOUT(?TCP, accept, [Socket],
                             ?BLOCK(server_loop_accept))),
  ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
  ?CALLOUT(?TCP, close, [ClientSocket], ok),
  ?APPLY(server_loop, [Socket]).

When QuickCheck sees that a command gets to a blocking callout, it will suspend execution of that command and its callouts and continue running other commands. In a way this is similar to the command and its callouts being evaluated lazily, driven by the other commands in the test. In our case, this means that we want to have a second command, connect, that simulates that a client is connecting by unblocking the call to accept in the server loop. This command allocates a new socket and uses that as the unblocked return value.

connect_args(_) ->
  [].

connect() ->
  ok.

connect_callouts(_S, []) ->
  ?MATCH(ClientSocket, ?APPLY(allocate_socket, [])),
  ?UNBLOCK(server_loop_accept, {ok, ClientSocket}).

This is pretty neat and separates the test code for the server loop from that of the connecting clients, which is a first step towards being able to test the server with multiple clients connecting concurrently, if we wanted to do so.

Keeping track of the server started state

We are, however, not fully done yet. We only want the connect command to run when the server is started (and we do not want to start the server if it is already running), so we add a flag to the test state that tells whether the server has been started, and add a model function to set it.

-record(state, { started = false,
                 socket_counter = 0 }).

set_started_next(S, _, [Started]) ->
  S#state{ started = Started }.

The start command will now have a precondition that checks that the server is not already started and we make sure that the started flag is set when the command is run.

start_pre(S) ->
  not S#state.started.

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?APPLY(set_started, [true]),
  ?APPLY(server_loop, [Socket]).

Similarly, we add a precondition to the connect command that checks that the server is started.

connect_pre(S) ->
  S#state.started.

With this in place we run the test again and get the following result, which shows both that the server works and that we run the connect command much more often than the start command. This is expected, since we can only start the server once in every command sequence, but can connect any number of clients.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

93.2% {time_server_eqc,connect,0}
6.8% {time_server_eqc,start,1}
true

It would also be interesting to know how long the generated command sequences are. We can find that out by measuring the command sequence length using measure in our top-level property, in the following way.

prop_time_server() ->
  ?SETUP(  
     fun() ->
       eqc_mocking:start_mocking(api_spec()),
       fun() -> ok end
     end,
     ?FORALL(Cmds, commands(?MODULE),
       begin
         {H, S, Res} = run_commands(?MODULE, Cmds),
         measure('Commands', length(Cmds),
           aggregate(command_names(Cmds),
             pretty_commands(?MODULE, Cmds, {H, S, Res},
                             eq(Res, ok))))
       end)).

When running the test we now get the following output, where we can see that, on average, the property generates command sequences consisting of one call to start and roughly 14 calls to connect.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

93.0% {time_server_eqc,connect,0}
7.0% {time_server_eqc,start,1}

Commands:    Count: 100   Min: 1   Max: 65   Avg: 14.670 
             Total: 1467
true

Serving multiple clients in parallel

A limitation of our server is that it can only serve one client at a time, which means that you could easily attack it by connecting to it and refusing to receive the data it sends. This will cause the server to block, since it will not accept any more incoming connections until it is done serving the currently connected client.

To address this we will change the server to handle each connected client in a spawned process and immediately go back to accepting more incoming connections. This is a common way of implementing systems in Erlang: you have a loop waiting for something to happen and spawn new processes to handle what happened.

server_loop(TCP, Socket) ->
  % Wait for a connection.
  {ok, ClientSocket} = TCP:accept(Socket),
  spawn(fun() -> handle_client(TCP, ClientSocket) end),
  server_loop(TCP, Socket).

If we make this change to the server and run the test again it will, however, fail with the following output.

Reason:
  Callout mismatch:
    {unexpected, time_server_eqc_TCP:accept({socket, 0}), expected,
       time_server_eqc_TCP:send({socket, 1}, '_')}

This happens because there is now a race condition between the calls to handle_client and server_loop, but our callouts specify that the server must handle the connected client before it can accept any new connections. To bring the callouts in line with the updated server, we use ?PAR to say that the calls from handle_client are performed in parallel with the calls from the recursive call of the server loop.

The new callouts look as follows.

server_loop_callouts(_S, [Socket]) ->
  ?MATCH({ok, ClientSocket}, ?CALLOUT(?TCP, accept, [Socket],
                             ?BLOCK(server_loop_accept))),
  ?PAR(
    ?CALLOUTS(
      ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
      ?CALLOUT(?TCP, close, [ClientSocket], ok)),
    ?APPLY(server_loop, [Socket])).

This is pretty cool and I like how we are able to specify the callouts in a way that closely follows the structure of the server loop implementation.

We can now run the test again which will pass!

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

93.1% {time_server_eqc,connect,0}
6.9% {time_server_eqc,start,1}

Commands:    Count: 100   Min: 1   Max: 57   Avg: 14.460
             Total: 1446
true

Fault injection

A neat thing with using ?BLOCK and ?UNBLOCK is that it allows us to add fault injection in a modular way by having a separate command for making the call to accept fail. Before adding the new command we do, however, need to modify the callouts for the server loop so it knows how to deal with the failure.

Instead of pattern matching on the return value of accept inside the enclosing ?MATCH we will bind the return value and have the callouts do different things depending on what value it is.

If we get {ok, ClientSocket} it means that the call to accept succeeded and we have the same callouts as before, except that we propagate the return value of the recursive call to the server loop (in case the loop terminates because of a failure). The return value is propagated by pattern matching against the return value of the ?PAR statement and returning the second element of that list (the first element is the return value of the send and close callouts).

If the call to accept fails, we specify that the server should throw an exception because it cannot match the value returned by accept and we do so by using ?RET and ?EXCEPTION.

server_loop_callouts(_S, [Socket]) ->
  ?MATCH(Result, ?CALLOUT(?TCP, accept, [Socket],
                 ?BLOCK(server_loop_accept))),
  case Result of
    {ok, ClientSocket} ->
      ?MATCH([_, Ret],
        ?PAR(
          ?CALLOUTS(
            ?CALLOUT(?TCP, send, [ClientSocket, ?WILDCARD], ok),
            ?CALLOUT(?TCP, close, [ClientSocket], ok)),
          ?APPLY(server_loop, [Socket]))),
      ?RET(Ret);
    _ -> ?RET(?EXCEPTION({badmatch, Result}))
  end.

We also need to update start_callouts to make sure that it propagates the return value of the server loop and changes the server started flag in the state to reflect that it is no longer running.

start_callouts(_S, [Port]) ->
  ?MATCH(Socket, ?APPLY(allocate_socket, [])),
  ?CALLOUT(?TCP, listen, [Port, ?WILDCARD], {ok, Socket}),
  ?APPLY(set_started, [true]),
  ?MATCH(Ret, ?APPLY(server_loop, [Socket])),
  ?APPLY(set_started, [false]),
  ?RET(Ret).

With these changes in place it is trivial to add a command that injects failures. All we need to do is to unblock the server loop with an error return value in the following way.

%% ----- Command: connect_fail
connect_fail_pre(S) ->
  S#state.started.

connect_fail_args(_) ->
  [].

connect_fail() ->
  ok.

connect_fail_callouts(_S, []) ->
  ?UNBLOCK(server_loop_accept, {error, enodev}).

When running the test again we see that the three different commands are generated with roughly the same probability.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

37.9% {time_server_eqc,start,1}
32.8% {time_server_eqc,connect_fail,0}
29.4% {time_server_eqc,connect,0}

Commands:    Count: 100   Min: 1   Max: 64   Avg: 14.370
             Total: 1437
true

Usually, I want to focus more on the positive testing and would like to generate more calls to connect and fewer calls to connect_fail. We can fine-tune the probabilities of the commands by giving them different weights in the following way, where we say that connect should have a weight five times higher than the other commands.

%% ----- The command weights
weight(_, connect) -> 5;
weight(_, _) -> 1.

If we run the test again with this weight function we get a better distribution of the commands.

1> eqc:quickcheck(time_server_eqc:prop_time_server()).
......................................................
..............................................
OK, passed 100 tests

68.6% {time_server_eqc,connect,0}
18.7% {time_server_eqc,start,1}
12.7% {time_server_eqc,connect_fail,0}

Commands:    Count: 100   Min: 1   Max: 70   Avg: 14.900
             Total: 1490
true

Summary

I hope that this blog post has been useful and shown how you can use ?BLOCK and ?UNBLOCK together with ?PAR and fault injection to test code with inner loops. Using these techniques you can test much more complicated loop structures where you have chains of blocked commands, and not only use unblocks to provide a return value but also to pass meta-data used to instruct the unblocked callouts. I do, however, want to keep this blog post simple and focused, so that is better left for a future post.

Happy testing!

Generating Fixed Test Suites with QuickCheck
http://www.quviq.com/generating-fixed-test-suites-with-quickcheck/ (Thu, 16 Oct 2014)

The great strength of QuickCheck is that we never run out of tests to run: QuickCheck can always find more and more combinations of things to test, and so rare bugs that depend on several features interacting can eventually be found. Nevertheless, sometimes you just want to test fast, and be sure that you did indeed test everything at least once. QuickCheck is able to generate and run saved test suites with a variety of properties, which can be used as a fast sanity check in between longer test runs using random generation of test cases. In this blog post, we’ll show how to use recent additions to QuickCheck to generate better test suites.

The example we’re going to use is a simple specification of the Erlang process registry, which does both positive and negative testing, and moreover kills processes randomly to test the registry’s behaviour when processes crash. You can find the source code we’re starting from here. It’s a state machine specification, with a property instrumented to collect the proportions of each command in the generated tests.

prop_registry() ->
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
              aggregate(command_names(Cmds),
                        Res == ok))
          end).

The property generates a list of commands (line 2), makes sure the registry is clean at the start of each test (line 4), runs the commands (line 5), pretty-prints test results on failure (line 6), displays statistics on the occurrences of each command name (line 7), and checks that the test passed (line 8).

Running QuickCheck generates output like this:

OK, passed 100 tests

25.2% {registry_eqc,spawn,0}
22.3% {registry_eqc,whereis,1}
20.9% {registry_eqc,unregister,1}
16.2% {registry_eqc,register,2}
15.3% {registry_eqc,kill,1}

from which we can see that register and kill are generated a little less often than the other commands (because we can’t call them until after we have spawned at least one pid), but nevertheless each command is called often enough for testing to look reasonably thorough. Now, you can read about how we improved the distribution of these tests in another blog post, “Getting Better Statistics from State Machine Tests”, but this article focusses on generating a suite of fixed tests.

Generating a Suite of Tests

The module we use to generate fixed test suites is eqc_suite. Now, we can use eqc_suite to generate random test suites, but it’s more interesting to generate test suites with a particular goal. We do that using “feature-based testing”, in which QuickCheck runs tests randomly, and saves test cases that exhibit interesting features. Let’s try it:

6> eqc_suite:feature_based(registry_eqc:prop_registry()).
Generating feature based test suite...
No features found.
{feature_based,[]}

We called eqc_suite:feature_based/1, giving it the property we want a test suite for—but it failed to find any interesting test cases! The result returned is an empty test suite: not at all what we wanted.

Why did we find no interesting tests? Because we haven’t yet told QuickCheck what we’re interested in! Let’s see how we can do so.

Declaring Features of Interest

To use feature-based testing, we need to tell QuickCheck which features of interest each test case exhibits. We do this by instrumenting the property under test, adding a call to features/2, rather like we collect statistics by adding calls to aggregate/2. For example, let’s suppose that we are interested in finding test cases that call each function under test. Then we might declare the “features of interest” of each test case to be a list of the functions that it calls, like this:

prop_registry() ->
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
              aggregate(command_names(Cmds),
                features(command_names(Cmds),
                        Res == ok)))
          end).

Line 8 is the addition, which just declares the features to be the same command names that we are aggregating.

Repeating our test suite generation, this time we find 5 test cases:

11> S1 = eqc_suite:feature_based(registry_eqc:prop_registry()).
Generating feature based test suite...
[{registry_eqc,whereis,1}]
[{registry_eqc,spawn,0}]
[{registry_eqc,kill,1}]
[{registry_eqc,unregister,1}]
[{registry_eqc,register,2}]

5 test cases generated.
{feature_based,[{[{registry_eqc,whereis,1}]...

Lines 3–7 tell us that QuickCheck has found a test case for each of those functions; the test cases are returned together in the final test suite, which we save here as S1 (it’s easy to save the test suite in a file, too). Of course, we need to be able to run the test suite, which we can do as follows:

12> eqc_suite:run(registry_eqc:prop_registry(),S1).
5 tests completed
[]

All the tests passed.

Inspecting the tests

So far, of course, we haven’t even seen the generated tests. The best way to do that is to run the test suite in “verbose” mode:

20> eqc_suite:run(eqc:batch(registry_eqc:prop_registry()),S1,verbose).
%%% Testing [{registry_eqc,whereis,1}]

OK, passed the test.
[{set,{var,1},{call,registry_eqc,whereis,[a]}}]

registry_eqc:whereis(a) -> undefined
...

For each test case, we see why it’s been included in the suite (in this case, to test whereis), whether it passed or not, QuickCheck’s representation of the test case (line 5), and then finally a pretty-printing of what the test actually did (line 7). If we extract the pretty-printed results from each of the five tests, they look like this:

registry_eqc:whereis(a) -> undefined

registry_eqc:spawn() -> <0.11483.1>

registry_eqc:spawn() -> <0.11486.1>
registry_eqc:kill(<0.11486.1>) -> ok

registry_eqc:unregister(a) ->
  {'EXIT',
     {badarg,
        [{erlang, unregister, [a], []},
         {registry_eqc, unregister, 1,
            [{file, "registry_eqc.erl"}, {line, 92}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

registry_eqc:spawn() -> <0.11491.1>
registry_eqc:register(a, <0.11491.1>) -> true

Sure enough, we have a test here for every operation, but the tests are hardly very complete. Can we do better?

Using Call Features as Test Features

We just generated a rather uninteresting test suite, because we specified rather uninteresting features—just whether or not each possible function was called. But suppose we specify that we are also interested in how each function was called? Nowadays QuickCheck allows us to specify the “features” of each call in a state machine test, and the model we’re using actually does so. For example, the features of register are specified in the model as follows:

register_features(S,[Name,Pid],Result) ->
  [success || Result==true] ++
    [name_already_registered || lists:keymember(Name,1,S#state.regs)] ++
    [pid_already_registered || lists:keymember(Pid,2,S#state.regs)] ++
    [pid_is_dead || lists:member(Pid,S#state.dead)].

register_features is used by QuickCheck to compute a list of features for each call of register, which in this case will contain an atom telling us whether the call succeeded, or how it failed. The model contains similar callbacks for whereis and unregister (the other functions which can succeed or fail). We can extract a list of the features of all calls in a test case using call_features/1, which lets us tell eqc_suite to use call features as test case features just by replacing the line

features(command_names(Cmds),

in our property by

features(call_features(H),
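
For reference, the whole property then reads roughly as follows; this is a sketch, identical to the earlier listing apart from that one line:

prop_registry() ->
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
              aggregate(command_names(Cmds),
                features(call_features(H),
                         Res == ok)))
          end).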

After doing so, eqc_suite generates a much more interesting set of seven test cases. Running the tests, we see three tests from the old test suite

registry_eqc:unregister(a) ->
  {'EXIT',
     {badarg,
        [{erlang, unregister, [a], []},
         {registry_eqc, unregister, 1,
            [{file, "registry_eqc.erl"}, {line, 92}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

registry_eqc:whereis(a) -> undefined

registry_eqc:spawn() -> <0.13401.1>
registry_eqc:register(a, <0.13401.1>) -> true

which test unregister, whereis and register in the simplest possible way, but we also see four new tests. There is a test in which unregister succeeds,

registry_eqc:spawn() -> <0.13407.1>
registry_eqc:register(a, <0.13407.1>) -> true
registry_eqc:unregister(a) -> true

a test in which whereis succeeds,

registry_eqc:spawn() -> <0.13413.1>
registry_eqc:register(d, <0.13413.1>) -> true
registry_eqc:whereis(d) -> <0.13413.1>

and two tests in which register fails:

registry_eqc:spawn() -> <0.13410.1>
registry_eqc:kill(<0.13410.1>) -> ok
registry_eqc:register(a, <0.13410.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [a, <0.13410.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

registry_eqc:spawn() -> <0.13404.1>
registry_eqc:register(a, <0.13404.1>) -> true
registry_eqc:register(a, <0.13404.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [a, <0.13404.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

The first of these fails because the pid is dead, and the second fails for two reasons, because the name is already registered, and because the pid is already registered. So now that we have declared our interest in all these possible failure modes, we get test cases in our suite that cover them. Nice!

Refining the call features

Of course, we might feel that QuickCheck "cheated" a little in the last test case above, by testing two features in one test! Perhaps we would prefer to include two tests in our test suite, one for each failure mode. Doing so is easy: we just change the features of register as follows:

register_features(S,[Name,Pid],Result) ->
  [[success || Result==true] ++
     [name_already_registered || lists:keymember(Name,1,S#state.regs)] ++
     [pid_already_registered || lists:keymember(Pid,2,S#state.regs)] ++
     [pid_is_dead || lists:member(Pid,S#state.dead)]].

You may be forgiven if you don’t immediately notice the change! All we have done is add another pair of list brackets around the right hand side, so that instead of returning a list of features such as success, name_already_registered or pid_already_registered, the function will return a list of features which are themselves lists: [success], [name_already_registered], or [pid_already_registered]—or, of course, in the test case above, [name_already_registered,pid_already_registered], a list with two elements! By turning call features into lists, we enable eqc_suite to track calls that exhibit several features at the same time, and distinguish them from a combination of calls that exhibit the same features together. With this slight change, we get three new test cases: one where register fails because the pid is already registered,

registry_eqc:spawn() -> <0.16788.1>
registry_eqc:register(a, <0.16788.1>) -> true
registry_eqc:register(b, <0.16788.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [b, <0.16788.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

one where it fails because the name is already registered,

registry_eqc:spawn() -> <0.16791.1>
registry_eqc:register(b, <0.16791.1>) -> true
registry_eqc:spawn() -> <0.16792.1>
registry_eqc:register(b, <0.16792.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [b, <0.16792.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

and one—which I did not think of in advance—which fails for two reasons simultaneously:

registry_eqc:spawn() -> <0.16795.1>
registry_eqc:register(b, <0.16795.1>) -> true
registry_eqc:spawn() -> <0.16796.1>
registry_eqc:kill(<0.16796.1>) -> ok
registry_eqc:register(b, <0.16796.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [b, <0.16796.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

In this test, the last call to register fails both because the name is already registered, and because the pid is dead! This is a good example of how QuickCheck can find—and eqc_suite can keep—tests that a human tester would be unlikely to think of.

Once we're happy, we can save the test suite in a file

eqc_suite:write(registry_tests,S3)

and add code to our model to run the test suite:

run_registry_tests() ->
  eqc_suite:run(registry_eqc:prop_registry(),registry_tests).

This saves the suite in a file called "registry_tests.suite", and all we need to do to run the tests is pass eqc_suite:run/2 the file name as an atom, instead of the test suite itself.

Getting Better Statistics from State Machine Tests http://www.quviq.com/getting-better-statistics-from-state-machine-tests/ Wed, 15 Oct 2014 16:55:33 +0000 http://www.quviq.com/?p=1192 Getting Better Statistics from State Machine Tests

One of the risks of property based testing is that, if your tests pass, then you never actually get to see your test data. You could potentially be running a large number of trivial tests, and ending up with a false sense of security as a result. To counter this, it’s important to collect statistics from your tests, and check that the distribution of tests is reasonable. This blog post shows you how to use one of the new features of QuickCheck state machines to get better statistics than you could before.

A Simple Example

As our example, we’re going to use a QuickCheck model of the Erlang process registry, which you can find here. This model combines positive and negative testing of register, unregister, and whereis, and moreover kills processes randomly to test the registry’s behaviour in the presence of crashes.

Here is the property that tests the model:

prop_registry() ->
  eqc:numtests(1000,
  eqc_statem:show_states(
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
                            aggregate(command_names(Cmds),
                                      ?IMPLIES(Res/=precondition_failed,
                                               Res == ok)))
          end))).

It specifies that we should run 1,000 tests, and that the model states should be pretty-printed when a test fails. It generates test cases from the model in this module, cleans up the registry before running the test, runs the generated commands, and then pretty-prints the result in the case of a test failure. The next line specifies that we want to collect statistics on the names of the commands in the test case. The final two lines just check the result of the test.

We’re going to focus here on the statistics, though, collected by this line of code:

aggregate(command_names(Cmds),

When we test this property, we see that it passes, and we see the distribution of command names in the tests:

OK, passed 1000 tests

22.53% {registry_eqc,unregister,1}
22.18% {registry_eqc,whereis,1}
21.88% {registry_eqc,spawn,0}
16.75% {registry_eqc,register,2}
16.65% {registry_eqc,kill,1}

From this we can see that unregister, whereis and spawn (which is used to create processes that we can use as test data) were called equally often, while register and kill were called a little less often—because neither can be called until we have spawned a process to pass to them.

Using Features

So, we called all of the functions in our test reasonably often—but we know nothing about how often we tested positive versus negative cases. For example, register can fail under a variety of circumstances. How often did that happen? A new extension to QuickCheck makes it easy to find out. We can now associate “features” with each call in a test case—in this case, we might associate the features success and failure with calls of register. Then we can collect statistics on the features seen during testing, instead of or as well as the statistics we already collect.

To associate features with calls of register, we just add a definition of the register_features callback to our code:

register_features(_S,_Args,Result) ->
  case Result of
    true ->
      [success];
    {'EXIT',_} ->
      [failure]
  end.

This callback takes the same parameters as a postcondition, such as register_post, and returns a list of features that we assign to the call. So features can depend on the model state, the arguments, or the result of a call. In this case, we just check to see whether register raised an exception, or returned true (indicating success). Adding this callback is enough to make QuickCheck collect these features as tests are run.

Then we replace the line in our property that collects statistics by this one:

aggregate(call_features(H),

telling QuickCheck to collect call features rather than command names. Notice that the call features are extracted from the test case history H, rather than from the generated list of commands, because obviously they depend on the dynamic behaviour during the test. When we run the tests, then instead of the distribution of command names, we see

OK, passed 1000 tests

60.82% {{registry_eqc,register,2},failure}
39.18% {{registry_eqc,register,2},success}

We can see at once that only 40% of the calls to register succeeded—which is quite OK. Notice that QuickCheck automatically tells us which function a feature refers to—we don’t need to track that explicitly.

Let’s collect the success rate of unregister and whereis too:

unregister_features(_,_,Result) ->
  case Result of
    true ->
      [success];
    {'EXIT',_} ->
      [failure]
  end.

whereis fails in a slightly different way: instead of raising an exception, it returns undefined:

whereis_features(_,_,Result) ->
  [Result /= undefined].

Now when we run our tests, we see:

OK, passed 1000 tests

33.46% {{registry_eqc,unregister,1},failure}
33.12% {{registry_eqc,whereis,1},false}
16.27% {{registry_eqc,register,2},failure}
11.09% {{registry_eqc,register,2},success}
3.13% {{registry_eqc,unregister,1},success}
2.93% {{registry_eqc,whereis,1},true}

Now we can see that most calls to unregister and whereis fail! In fact, only around 10% of the calls to one of these operations succeeds. This is not necessarily a problem: in 1,000 tests, each consisting of many calls, we perform so many calls that we will certainly have called both unregister and whereis successfully many times. But these statistics do raise the question of whether we are using our test effort as effectively as we could. The point here, though, is that by assigning features to calls we can easily discover problems in the distribution of tests that would otherwise be quite hard to observe.

Improving the distribution

Let’s see how we can increase the proportion of successful calls to these two functions. First we must understand why so many calls fail. We’re choosing the name to unregister or look up uniformly from a list of five possibilities:

-define(names,[a,b,c,d,e]).

name() ->
  elements(?names).

So if only 10% of the calls are succeeding, then the registry must contain on average only 0.5 entries when we make these calls! We can easily confirm this by adding another measurement to the property we are testing: we add

measure(registered,length(S#state.regs),
        ...)

around the last two lines of the property. Here S is the state of our model at the end of the test, and S#state.regs is our model of the expected registry contents—a list of pairs of registered names and pids. This call measures the number of registered processes at the end of each test, which gives us a good approximation to the average number of processes in the registry overall. Rerunning our tests, we see

registered:    Count: 1000   Min: 0   Max: 4   Avg: 0.4350   Total: 435

which tells us that we measured “registered” 1,000 times, once per test, and the average number of processes we found in the registry was 0.435. No wonder calls to unregister and whereis usually fail!
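
For reference, here is a sketch of how the whole property might look at this point, with the call features aggregated and the measure call wrapped around the last two lines; only those lines differ from the original listing above:

prop_registry() ->
  eqc:numtests(1000,
  eqc_statem:show_states(
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
                            aggregate(call_features(H),
                                      measure(registered,length(S#state.regs),
                                              ?IMPLIES(Res/=precondition_failed,
                                                       Res == ok))))
          end))).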

Registering processes more often

The most obvious way to increase the average number of processes in the registry is to call register more often! We can specify a weight for each call via the weight callback, as follows:

weight(_,register) ->
  20;
weight(_,_) ->
  10.

This specifies that calls to register should be generated with a weight of 20, twice as often as any other function. (Weights must be integers, so we use 10 rather than 1 as the default so that we can later decrease the weights of other functions easily.) When we rerun our tests, we now see

OK, passed 1000 tests

28.72% {{registry_eqc,register,2},failure}
26.03% {{registry_eqc,whereis,1},false}
24.97% {{registry_eqc,unregister,1},failure}
13.61% {{registry_eqc,register,2},success}
3.40% {{registry_eqc,whereis,1},true}
3.27% {{registry_eqc,unregister,1},success}

registered:    Count: 1000   Min: 0   Max: 4   Avg: 0.5900   Total: 590

We increased the average number of registered processes at the end of a test to almost 0.6: an improvement, but a small one. Unsurprisingly, the proportion of successful calls to unregister and whereis did not change much.

Why didn’t this work? Well, looking at the collected call features, we see that most calls to register failed! To make this even clearer, we can add another line to our property to display statistics just for calls to register:

aggregate(call_features(register,H),
          ...)

call_features/2 lets us collect statistics about the calls to just one function, so we can easily see what proportion of register calls succeed, for example. Of course, we can aggregate call features several times, to analyze several functions independently, as well as or instead of using call_features/1 to see the overall picture.

The resulting output tells its sorry tale very clearly:

68.33% {{registry_eqc,register,2},failure}
31.67% {{registry_eqc,register,2},success}

Before we adjusted its weight, 60% of calls to register failed. Now, almost 70% do! Our problem is not just that we call register too rarely—it is that most calls fail.

Why do calls to register fail?

Time to collect some more information. Looking into the model, we find a function that predicts whether or not a call to register is expected to fail:

register_ok(S,[Name,Pid]) ->
  not lists:keymember(Name,1,S#state.regs)
    andalso not lists:keymember(Pid,2,S#state.regs)
    andalso not lists:member(Pid,S#state.dead).

There are three possible reasons for failure: because the name is already registered, because the pid is already registered, or because the pid is dead. We can investigate which of these is happening by redefining the features of a register call to tell us the reason for failure:

register_features(S,[Name,Pid],Result) ->
  [success || Result==true] ++
    [name_already_registered || lists:keymember(Name,1,S#state.regs)] ++
    [pid_already_registered || lists:keymember(Pid,2,S#state.regs)] ++
    [pid_is_dead || lists:member(Pid,S#state.dead)].

(The list comprehensions with only a condition just include the given element in the resulting list when the condition is true). Running our tests again, we see

36.45% {{registry_eqc,register,2},pid_is_dead}
28.43% {{registry_eqc,register,2},success}
21.41% {{registry_eqc,register,2},pid_already_registered}
13.71% {{registry_eqc,register,2},name_already_registered}

So most calls to register fail because the pid we try to register is dead! The next most common reason for failure is that the pid is already registered, and the least frequent problem is that the name is already registered.

So how can we make register succeed more often? Simple! We shall kill processes less often, so that fewer pids become dead, and we shall spawn processes more often, so that register is more likely to choose a fresh pid when it is called. Revising our weights like this:

weight(_,spawn)    -> 50;
weight(_,register) -> 20;
weight(_,kill)     -> 5;
weight(_,_)        -> 10.

we get the following results:

OK, passed 1000 tests

25.90% {{registry_eqc,register,2},success}
20.06% {{registry_eqc,unregister,1},failure}
18.90% {{registry_eqc,whereis,1},false}
11.61% {{registry_eqc,register,2},name_already_registered}
7.53% {{registry_eqc,register,2},pid_already_registered}
6.57% {{registry_eqc,whereis,1},true}
5.68% {{registry_eqc,unregister,1},success}
3.76% {{registry_eqc,register,2},pid_is_dead}

53.08% {{registry_eqc,register,2},success}
23.80% {{registry_eqc,register,2},name_already_registered}
15.43% {{registry_eqc,register,2},pid_already_registered}
7.70% {{registry_eqc,register,2},pid_is_dead}

registered:    Count: 1000   Min: 0   Max: 5   Avg: 1.1600   Total: 1160

We increased the average number of pids in the registry to over 1.1; we made over 50% of calls to register succeed; we increased the successful calls of unregister and whereis to more than one in five. Progress—but we’re not there yet!

Unregistering less often

Given that we are using five different names as test data, we might like the average number of processes in the registry to be around 2.5. We are still some way away from that–and one reason is that, as soon as a process is in the registry, we start trying to unregister it again! Somewhat paradoxically, if we want unregister to succeed more often, then we need to call it less often! So we shall reduce the weight of unregister:

weight(_,spawn)      -> 50;
weight(_,register)   -> 20;
weight(_,unregister) -> 1;
weight(_,kill)       -> 5;
weight(_,_)          -> 10.

If we also add two more calls to aggregate to collect the features of calls to unregister and whereis separately, then we see:

OK, passed 1000 tests

29.47% {{registry_eqc,register,2},success}
23.72% {{registry_eqc,whereis,1},false}
18.25% {{registry_eqc,register,2},name_already_registered}
10.38% {{registry_eqc,register,2},pid_already_registered}
9.24% {{registry_eqc,whereis,1},true}
5.28% {{registry_eqc,register,2},pid_is_dead}
2.56% {{registry_eqc,unregister,1},failure}
1.10% {{registry_eqc,unregister,1},success}

46.49% {{registry_eqc,register,2},success}
28.80% {{registry_eqc,register,2},name_already_registered}
16.38% {{registry_eqc,register,2},pid_already_registered}
8.33% {{registry_eqc,register,2},pid_is_dead}

69.9% {{registry_eqc,unregister,1},failure}
30.1% {{registry_eqc,unregister,1},success}

72.0% {{registry_eqc,whereis,1},false}
28.0% {{registry_eqc,whereis,1},true}

registered:    Count: 1000   Min: 0   Max: 5   Avg: 1.4310   Total: 1431

We’re up to 1.4 processes in the registry on average, and a 30% success rate for calls to unregister and whereis. As one final refinement, we’ll try generating longer test cases—it may be that we’re now generating so many calls to spawn that we don’t have time in one test to fill the registry up. We replace line 3 of the property by

  ?FORALL(Cmds, more_commands(3,commands(?MODULE)),

which generates test sequences three times as long as the default, and run our tests again:

OK, passed 1000 tests

26.34% {{registry_eqc,register,2},name_already_registered}
23.99% {{registry_eqc,register,2},success}
17.58% {{registry_eqc,whereis,1},false}
13.47% {{registry_eqc,whereis,1},true}
10.26% {{registry_eqc,register,2},pid_already_registered}
5.03% {{registry_eqc,register,2},pid_is_dead}
1.90% {{registry_eqc,unregister,1},failure}
1.43% {{registry_eqc,unregister,1},success}

40.14% {{registry_eqc,register,2},name_already_registered}
36.55% {{registry_eqc,register,2},success}
15.64% {{registry_eqc,register,2},pid_already_registered}
7.67% {{registry_eqc,register,2},pid_is_dead}

57.1% {{registry_eqc,unregister,1},failure}
42.9% {{registry_eqc,unregister,1},success}

56.60% {{registry_eqc,whereis,1},false}
43.40% {{registry_eqc,whereis,1},true}

registered:    Count: 1000   Min: 0   Max: 5   Avg: 2.0080   Total: 2008

Now we have on average two processes in the registry at the end of a test, and 40-50% of calls to unregister and whereis succeed—we’re calling unregister less often, but more calls are succeeding (in absolute, not just relative terms). This is a pretty good result. We could go further, and try to reduce the “wasted” calls to register, perhaps by reducing its weight if the registry is “nearly full”, or perhaps by generating the names we use more carefully—but this is enough for today.
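
(If you do want to try the first of those ideas, note that the weight callback also receives the model state, so a state-dependent weight could be sketched as follows; the threshold and the numbers are purely illustrative, not something we have tuned or measured:)

weight(S,register) when length(S#state.regs) >= 4 -> 5;
weight(_,spawn)      -> 50;
weight(_,register)   -> 20;
weight(_,unregister) -> 1;
weight(_,kill)       -> 5;
weight(_,_)          -> 10.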

The lesson is: assigning features to calls, and gathering statistics about them, can give us much more information about our tests, make problems obvious, and help us get better value from the machine time we spend on testing.

Adding a GitHub repository to QuickCheck-CI http://www.quviq.com/adding-a-github-repository-to-quickcheck-ci/ Fri, 10 Oct 2014 08:08:37 +0000 http://www.quviq.com/?p=1166 Adding a GitHub repository to QuickCheck-CI

In this blog article we are going to explain how you can add a repository on GitHub to QuickCheck Continuous Integration. In contrast to other continuous integration systems our CI system is specialised in property based testing. After the code has been built, we run QuickCheck on the properties that are present in the repository. Using these properties we can thoroughly test the software and provide detailed feedback in case a property fails. In addition we generate coverage information, which can be used to spot dead code or to find the location of a bug.

Unfortunately many projects on GitHub haven't defined properties yet. Many projects, however, do have unit tests. A property abstracts away from a single unit test, and covers many unit tests. We are going to show how to define a property based on a unit test.

Apache CouchDB

We are going to use Apache CouchDB (with git revision 6fc1124) as an example project. Apache CouchDB, commonly referred to as CouchDB, is an open source database that focuses on ease of use. It is a NoSQL database that uses JSON to store data, JavaScript as its query language using MapReduce, and HTTP for an API. The project consists of many Erlang modules and contains two test suites: one based on ETAP and another one based on Grunt. If we run the latter we get some unexpected behaviour:

    ...
      NavBar
        adding links
          ✓ Should add link to navlinks
          ✓ Should add link to bottom links
          ✓ Should add link to footer links
        removing links
          ◦ should remove links from list: Error loading resource file:///app/templates/layouts/myTemplate.html (203). Details: Error opening /app/templates/layouts/myTemplate.html: No such file or directory
    Error loading resource file:///app/templates/layouts/myTemplate.html (203). Details: Error opening /app/templates/layouts/myTemplate.html: No such file or directory
    Error loading resource file:///Users/alex/Repositories/quviq/couchdb/src/ (301). Details: Protocol "file" is unknown
          ✓ should remove links from list
          ✓ Should call render after removing links

      TabMenu
        on change polling rate
          ✓ Should set polling rate
          ✓ Should clearInterval
          ✓ Should trigger update:poll event
          ✓ should set the correct active tab
        on request by type
          ✓ should set the filter 'database' for the main-view

      DataSection
        #setPolling
          ✓ Should set polling interval

      Replication Addon
        Replication View
          ✓ should render

      83 passing (396 ms)

    Done, without errors.

Although some individual tests clearly fail, due to missing input files, the overall test run is still regarded as successful. If we added such a test suite to a continuous integration server, these kinds of errors would not be detected. It would be an improvement if the test suite returned an error when one of the tests it contains fails. We do not investigate this test suite further; instead, in the remainder of this blog we look at the ETAP test suite.

Abstracting unit tests to properties

There are about 50 tests defined in the ETAP test suite. A test consists of a sequence of assertions, where an assertion is a function call applied to some given arguments together with the expected result. Consider an excerpt from the 010-file-basics.t ETAP test file:

    {ok, Fd} = couch_file:open(filename() ++ ".0", [create, overwrite]),
    etap:ok(is_pid(Fd),
        "Returned file descriptor is a Pid"),

    etap:is({ok, 0}, couch_file:bytes(Fd),
        "Newly created files have 0 bytes."),

    ?etap_match(couch_file:append_term(Fd, foo), {ok, 0, _},
        "Appending a term returns the previous end of file position."),

    {ok, Size} = couch_file:bytes(Fd),
    etap:is_greater(Size, 0,
        "Writing a term increased the file size."),

    ?etap_match(couch_file:append_binary(Fd, <<"fancy!">>), {ok, Size, _},
        "Appending a binary returns the current file size."),

    etap:is({ok, foo}, couch_file:pread_term(Fd, 0),
        "Reading the first term returns what we wrote: foo"),

    etap:is({ok, <<"fancy!">>}, couch_file:pread_binary(Fd, Size),
        "Reading back the binary returns what we wrote: <<\"fancy\">>."),

    etap:is({ok, couch_compress:compress(foo, snappy)},
        couch_file:pread_binary(Fd, 0),
        "Reading a binary at a term position returns the term as binary."
    ),

    {ok, BinPos, _} = couch_file:append_binary(Fd, <<131,100,0,3,102,111,111>>),
    etap:is({ok, foo}, couch_file:pread_term(Fd, BinPos),
        "Reading a term from a written binary term representation succeeds."),

This particular instance tests the couch_file module. This is a low-level module that handles file interactions, such as reading and writing. The setup of this particular test is very common and straightforward. In essence, it opens a file, writes something in the opened file, and then tries to read from that file. The result of reading from the file should be equal to what was written. The test for appending a term to the file is, however, only performed for a single term, namely foo. After a successful test run we may conclude that it is possible to append a term to a file, but one might ask whether we tested the append_term function, which writes a term to a file, thoroughly enough. Are we sure we can handle large terms as well? What happens if we write multiple entries or interleave the function call with other operations, such as reading? Does the order of the assertions have any influence on the test results–for example, if we append some binary data before we append a term? If we want to test these kinds of interactions we need to write many unit tests, which is tedious and error-prone work.

Instead of writing ever more unit tests, we propose property based testing using QuickCheck. With QuickCheck you can define properties that your software should have. For example, the append_term function should not only work for foo, but also for any other term. A property that captures this characteristic of append_term can be expressed as follows:

    prop_append_term() ->
      ?FORALL(Term, term(),
              begin
                {ok, Fd} = couch_file:open(?TESTFILE, [create, overwrite]),
                ok(is_pid(Fd), "Returned file descriptor is a Pid"),
                is({ok, 0}, couch_file:bytes(Fd), "Newly created files have 0 bytes."),
                case couch_file:append_term(Fd, Term) of
                  {ok, 0, _} -> true;
                  _ -> exit("Appending a term returns the previous end of file position.")
                end,
                {ok, Size} = couch_file:bytes(Fd),
                couch_file:close(Fd),
                Size > 0 % Writing a term increased the file size
              end).

The ?FORALL macro takes a generator as second argument, in this case a term generator, and binds the generated term to the variable Term. This variable is used in the third argument, the actual property, which uses the generated value to determine if the function (or system) under test behaves as expected. In this case we check the same sequence of commands as is present in the ETAP test. We open a file, check if the first index is zero, append the term to the file, and finally check if the size of the file has increased. We can validate this property as follows:

    1> eqc:quickcheck(prop_append_term()).
    ....................................................................................................
    OK, passed 100 tests
    true

We call the quickcheck function from the eqc module, and conclude that we can successfully call the append function a hundred times with randomly generated terms. Note that we could make this property stronger if we could predict what the size value is going to be. This is relatively hard without knowing how the function is implemented. We could, however, determine whether the Index is correct after every call to append_term, because we merely need to add up the index and term size returned by the previous append call. This implies that we need to maintain state information, namely the return values of the append calls.

EQC state machine

QuviQ QuickCheck offers a library called eqc_statem to handle operations that have side-effects, which are specified via a state machine. The following may seem a bit too general for the problem at hand, but it will save time in the long run.

A minimal QuickCheck state machine with only one call (to append_term) is specified as follows:

    -define(TESTFILE, "couch_eqc_test_file").
    -record(state, {file_desc, indexes}).

    initial_state(Fd) ->
      #state{file_desc = Fd, indexes = []}.

    %% -- Append call
    append_term_args(S) ->
      [S#state.file_desc, term()].

    append_term_pre(_S) ->
      true.

    append_term(Fd, Term) ->
      couch_file:append_term(Fd, Term).

    append_term_next(S, Result, _Args) ->
      S#state{indexes = [Result | S#state.indexes]}.

    append_term_post(S, _Args, {ok, NextIndex, _}) ->
      CalcIndex = case S#state.indexes of
                    []                          -> 0;
                    [{ok, Index, TermSize} | _] -> Index + TermSize
                  end,
      NextIndex == CalcIndex. 

    %% -- Property
    prop_append(Fd) ->
      ?FORALL(Cmds, commands(?MODULE, initial_state(Fd)),
              begin
                couch_file:truncate(Fd, 0),
                {H, S, Res} = run_commands(?MODULE,Cmds),
                pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok)
              end).

The above code snippet shows how to define an operation for a state machine. The append_term operation is defined by five functions. The function with the _args postfix gets the state as an argument and returns the (generated) arguments for the operation in a list. We can put a precondition on an operation using the postfix _pre function. In this example the precondition function just returns true; in other words, it is always valid to perform the append operation (and actually, we could have omitted this definition, since a true precondition is the default). The append_term function receives two arguments: a file descriptor, which is stored in the state record, and a generated term. The function term() returns a QuickCheck generator for Erlang terms; we will later explain this in more detail. The append_term function performs the actual operation by calling the appropriate function from the couch_file module. The function with the postfix _next updates the state. We want to use the result of a previous call to append to calculate the expected size, so we store the results of the calls in the indexes field of the state. The stored result is used in the fifth function, with the _post postfix, which checks the postcondition for the append call. The postcondition checks whether the sum of the previous index and term size is equal to the next index. In the case of the first call to append, that is, when the indexes list is empty, the calculated index is zero.

To test the append operation we need to define a state machine property and an initial state. The latter is defined by providing an implementation for the initial_state/1 function. We use a state record to store the file descriptor and a list of indexes. The property for the state machine is a general one, and basically states that we generate a list of commands, execute them, and check that no postcondition has been violated. Note that we empty the file before each test run using the truncate function, so that we start in a known state. More information about QuickCheck state machines and generators can be found in the QuickCheck documentation.

Just as with the more simple property we can validate the state machine property by running the quickcheck function:

    2> eqc:quickcheck(prop_append(Fd)).
    ....................................................................................................
    OK, passed 100 tests
    true

Again, we can run a hundred tests without any problems. But what did we actually test? How many append calls did we do? It would be interesting to see the number of consecutive calls to append we made. Let us adapt the property to include some statistics gathering. We add the function collect/2 to the property to collect the length of the generated sequence of commands (Cmds), for now a list of append calls.

    prop_append(Fd) ->
      ?FORALL(Cmds, commands(?MODULE, initial_state(Fd)),
              collect(length(Cmds),
              begin
                couch_file:truncate(Fd, 0),
                {H, S, Res} = run_commands(?MODULE,Cmds),
                pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok)
              end)).

With this modification we get the following result when running quickcheck:

    3> eqc:quickcheck(prop_append(Fd)).
    ....................................................................................................
    OK, passed 100 tests

    12% 1
    8% 4
    8% 0
    5% 6
    5% 3
    5% 2
    4% 33
    4% 10
    4% 7
    3% 41
    3% 20
    3% 11
    3% 9
    3% 8
    2% 57
    2% 24
    2% 22
    2% 18
    2% 15
    2% 14
    2% 13
    2% 12
    2% 5
    1% 74
    1% 72
    1% 69
    1% 47
    1% 42
    1% 40
    1% 39
    1% 36
    1% 35
    1% 29
    1% 21
    1% 16
    true

We observe that we indeed call the append operation many times. This does not give us much confidence in the append implementation, though: we do not check whether the terms are stored properly and whether we can read them back. We are going to add the read operation, as well as many other operations, to the state machine specification. But first we explain a bit about QuickCheck generators, then we show how to add a GitHub repository to QuickCheck CI, and then we will come back to the state machine to extend the set of operations.

Generators

QuickCheck generators are used to generate random test data for QuickCheck properties. A generator specifies three things at the same time:

  • a set of values that can be generated,
  • a probability distribution on that set,
  • a way of shrinking generated values to similar, smaller values

The last point is very important because the generated values can be quite large and may contain irrelevant parts. After a test fails QuickCheck searches for a similar, but simpler failing test case by shrinking the value. The shrinking process cuts away the irrelevant parts. QuickCheck generators are composable and can be combined to quickly specify new generators. The code below shows the generator for Erlang terms that we have used before:

    literal() ->
      oneof([atom(), bool(), char(), int()]).

    atom() ->
      ?LET(S, list(char()), list_to_atom(S)).

    tuple(Gen) ->
      ?LET(Xs, list(Gen), list_to_tuple(Xs)).

    term(0) ->
      literal();
    term(Size) ->
      N = Size div 3,
      oneof([literal(), list(term(N)), tuple(term(N))]).

    term() ->
      ?SIZED(Size, term(Size)).

We start by defining a generator for literals (literal/0), such as integers and characters. The bool/0, char/0, and int/0 generators are predefined literal generators provided by the QuickCheck generator library. The list/1 generator generates random lists with elements from a given generator. In the atom/0 generator we use the list generator to create a generator of lists of characters. Furthermore, this generator uses the ?LET macro, which creates a new generator that binds a value produced by a supplied generator and uses that value to build new generated values. In this case we convert the generated list of characters (bound to the variable S) to an Erlang atom.

QuickCheck also lets you define your own generators for recursive data structures, such as the term/1 generator. When using recursive generators we need to be careful with the size of the generated values. To keep the size of recursive data structures under control we divide the size of the recursive calls by three, and use the ?SIZED macro to let QuickCheck take care of the actual size.

There are many more generator combinators and predefined generators that can be used to build QuickCheck generators.
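
If you want a feel for the values a generator produces before using it in a property, you can also sample it from the shell. A small sketch, assuming the definitions above are compiled into the model module (called couch_eqc later in this post), using eqc_gen:sample/1, which prints a handful of randomly generated values (the output is random, so it is not shown here):

    1> eqc_gen:sample(couch_eqc:term()).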

QuickCheck Continuous Integration

QuickCheck CI is a continuous integration server that runs QuviQ QuickCheck on a GitHub project. Open source developers can use QuickCheck CI to get free access to QuviQ QuickCheck for their quality assurance. Before you can register your project with QuickCheck CI, you need to add a licence file to the project, which states that we are allowed to build the project and execute tests. In addition, you need to supply a configuration file that explains how to build your project. After that you can register the GitHub URL with our CI server. More information about adding a project can be found here.

When registered, we can queue a build and run QuickCheck on the stated properties in the project. The CI system gives a nice overview of whether or not a project could be built and shows the results of all properties along with coverage information.

Since we do not have access to the Apache CouchDB GitHub repository, we have, for this example, forked the repository and added the licence file (note that the licence of Apache CouchDB allows for this). We also added the following configuration file:

{build, "./configure ; make ; cd test/eqc/ebin/ ; erl -make "}.
{test_path, "test/eqc/ebin"}.
{deps, "src/couch/ebin"}.
{deps, "src/couch_stats/ebin"}.
{deps, "src/snappy/ebin"}.
{deps, "src/ioq/ebin"}.

We can now add the project to the CI server and queue a build for the forked project. After we have verified that we can successfully build the project, we can continue to add a module with the QuickCheck property, which we defined above, and start testing with QuickCheck.
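
To give an idea of what such a module can look like, here is a minimal sketch. The file location, the use of plain pattern matches instead of the etap-style helpers, and the exact module layout are our own choices; the term() generator and its helpers from the previous section are assumed to live in the same module.

    %% a sketch of test/eqc/couch_eqc.erl (the test_path in the configuration below
    %% points at test/eqc/ebin, where the compiled module ends up)
    -module(couch_eqc).
    -include_lib("eqc/include/eqc.hrl").
    -compile(export_all).

    -define(TESTFILE, "couch_eqc_test_file").

    prop_append_term() ->
      ?FORALL(Term, term(),
              begin
                {ok, Fd}   = couch_file:open(?TESTFILE, [create, overwrite]),
                true       = is_pid(Fd),
                {ok, 0}    = couch_file:bytes(Fd),
                {ok, 0, _} = couch_file:append_term(Fd, Term),
                {ok, Size} = couch_file:bytes(Fd),
                couch_file:close(Fd),
                Size > 0
              end).

    %% term/0 and the other generators from the previous section go here.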

Complete model

So far we have only added the append operation to our model of couch_file. As mentioned before this gives little confidence in this particular operation, as well as the overall module. We therefore complete the model with the other operations:

  • open
  • append_binary
  • pread_term / pread_binary
  • truncate
  • bytes
  • write_header / read_header
  • sync
  • close

We omit the explanation of most of these functions because the names give a good indication of the semantics. Indeed, the names are in some cases the only documentation available! So we have used QuickCheck to gain an understanding of how the API behaves: we learned that a file contains any number of terms or binaries, each of which has a location in the file, and one “header” which is read and written by separate calls. We extended our state record to model this, keeping track of the written contents paired with their index, and we added a field that stores the header. Note that we can adapt the distribution of the operations, and we made sure that we call the append... and pread... operations more often than, for example, the close operation. The source code of the model can be inspected here.
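
As a rough indication of the shape of this extended state (the field names below are our own illustration; the linked source is authoritative):

    -record(state, {file_desc,       % pid returned by couch_file:open/2
                    contents = [],   % written terms/binaries paired with their index
                    header}).        % the header we currently expect read_header to return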

We now discuss some peculiarities we encountered during the development of the model.

Deviations from the model

In our first implementation of the model we expected the bytes operation, which measures the total size of the file, to return the sum of the index and the size of the last term to be appended. We store these indices in the state record. However, when we run QuickCheck on the model we get the following counterexample:

    couch_eqc_found_bytes_inconsistency:open() -> {ok, <0.10858.0>}
    couch_eqc_found_bytes_inconsistency:write_header({ok, <0.10858.0>}, []) -> ok
    couch_eqc_found_bytes_inconsistency:bytes({ok, <0.10858.0>}) -> {ok, 23}

    Reason: {postcondition, {0, '/=', 23}}

This counterexample shows us that we need to take the size of the header into account. Our impression had been that the header is treated differently from the other appended contents. But the bytes operation does not make this distinction. This is fair enough–we found a misunderstanding in our model. After we adjusted our model to reflect this, we got the following counterexample:

    couch_eqc_truncate_removes_header:open()
      -> {ok, <0.13038.0>}
    couch_eqc_truncate_removes_header:append({ok, <0.13038.0>}, {list_bin, [<<>>]})
      -> {ok, 0, 5}
    couch_eqc_truncate_removes_header:write_header({ok, <0.13038.0>}, [])
      -> ok
    couch_eqc_truncate_removes_header:truncate({ok, <0.13038.0>}, {ok, 0, 5})
      -> ok
    couch_eqc_truncate_removes_header:read_header({ok, <0.13038.0>})
      -> no_valid_header

    Reason: {postcondition, {no_valid_header, '/=', {ok, []}}}

In this counterexample, we expect the truncate to remove the term that we wrote, but the results show that we also removed the header written afterwards. This makes us suspect that a header is written just like any other term or binary, and not treated specially. Not knowing the specification we cannot determine whether or not this is the intended behaviour. We again adapt the model: we remove the header if it is written after an index that we truncate.

With this modified model we run into the next related counterexample:

    couch_eqc:open() -> {ok, <0.350.0>}
    couch_eqc:write_header({ok, <0.350.0>}, []) -> ok
    couch_eqc:append({ok, <0.350.0>}, {term, []}) -> {ok, 23, 6}
    couch_eqc:write_header({ok, <0.350.0>}, []) -> ok
    couch_eqc:truncate({ok, <0.350.0>}, {ok, 23, 6}) -> ok
    couch_eqc:read_header({ok, <0.350.0>}) -> {ok, []}

    Reason: {postcondition, {{ok, []}, '/=', no_valid_header}}

What happens here is that we write a header twice, do a truncate, and then try to read the header. The truncate is done with an index that lies in between the two write_header operations. After the previous counterexample, we had adapted our model to reset the header if we truncate to a position before the point at which the header was written. We therefore expected the couch_file implementation to return the no_valid_header value, as we have just removed the header. Instead, we get the header that was written the first time. We can conclude that a header is not overwritten if we call the write_header operation more than once. So if you are going to use the header functionality together with the truncate function, you need to do some bookkeeping to determine which header you can expect–previously overwritten headers seem to "return from the dead"!

Only later did we find out that there is an additional ETAP test file that contains unit tests for the header functionality. It became clear that the displayed behaviour is actually intended. This came as a bit of a surprise; it seems that a header is not associated with the file as a whole, as the name suggests, but instead is used as a kind of marker for a block of entries. We again adapted the model and are now able to run thousands of tests without any problems.

This example shows that QuickCheck finds inconsistencies quickly, using generated test cases that are hard to come up with yourself. The couch_file module is a very low-level module of the CouchDB project, but it was interesting to pin down its specification with QuickCheck. A possible continuation would be to test a more high-level module of the CouchDB project.

Checksum property for AUTOSAR http://www.quviq.com/checksum-property-for-autosar/ Mon, 21 Jan 2013 07:15:39 +0000 http://www.quviq.com/?p=1154 Checksum property for AUTOSAR

Property based testing is based upon the idea that one writes a property about the software under test, and from that property the tool QuickCheck automatically generates test cases to test your software.

Properties can be difficult to formulate, since a lot of software has grown over time, the once clear ideas have been polluted, and clear properties have been obfuscated. That is, if the software has not been developed using property based development. However, there is a simple property that comes in handy very often: one version of the software should, for a certain input domain, behave the same as a different version. This comparison with other versions or even other implementations can often reveal errors introduced by optimizations, or compatibility issues between competing implementations.

In this post we take a simple example of a table based implementation and a computation based implementation of a checksum algorithm. The example was motivated by the implementation of the CRC library in the AUTOSAR standard. Two differently configured C programs should give equal results on the same input. This is a commonly occurring phenomenon, but the solution presented here is kept simple and does not take stateful systems into account; this extension is possible and is the subject of a later post.
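
The general shape of such a comparison property is simple: generate an arbitrary input, feed it to both versions, and require equal results. A minimal sketch, assuming the usual eqc.hrl include; the two function names are placeholders for the versions being compared, not functions from any real library:

%% crc8_table_based/1 and crc8_computed/1 stand for the two implementations
%% under comparison; they are placeholders, not part of the AUTOSAR library.
prop_same_result() ->
  ?FORALL(Data, list(byte()),
          crc8_table_based(Data) == crc8_computed(Data)).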

Modelling CRC routines

Computing checksums is based on mathematical theory in which CRC algorithms are related to polynomials with coefficients in the finite field of two elements. The Hamming distance, i.e., the number of bits that can be detected to have changed, is determined by the generator polynomial chosen. The AUTOSAR specification explains these algorithms in general terms and provides some optimizations for specific values.

It is important to realize that the AUTOSAR specification specifies a family of CRC functions that are parametrised in several ways. The specification furthermore provides allowed subsets of the parameter space. Different End-to-End implementations are required to use different CRC functions, or the same function with different parameters. We describe a QuickCheck model for the Crc_CalculateCRC8 function. This function fixes the CRC result to 8 bits, a polynomial with value 0x1D, an initial value of 0xFF, no input reflection nor result data reflection, and an XOR value of 0xFF.

The CRC algorithms have some mathematical properties that can be used as weak validators of implementations of the algorithms. These properties provide a simple mechanism to see whether the implementation computes a correct value in a number of data points. The first 'property' is Check, stating that if the implementation is given the ASCII values ”123456789” then it should return the value 0x4B. The second property is Magic Check, which is a check for any data sequence. Given a number of bytes, compute the checksum, compute the checksum over data plus checksum, and XOR that with 0xFF. The result for Crc_CalculateCRC8 should always be 0xC4.
Clearly, Check is just one test case, whereas Magic Check is a property that describes a large number of possible test cases that all should result in the same value 0xC4.

Mathematical properties in Erlang and QuickCheck

Now assume that we have a C implementation of the function CRC_CalculateCRC8 with prototype as specified in the AUTOSAR standard:

uint8 Crc_CalculateCRC8(const uint8* Crc_DataPtr, uint32 Crc_Length,
                        uint8 Crc_StartValue8, boolean Crc_IsFirstCall)

The first argument is a pointer to the data, the second argument the number of bytes in the data block, the third argument is a start value which is only used if the calculation uses multiple calls to this function, in which case the fourth argument should be set to false.

Performing the Check property in Erlang and QuickCheck is done as follows. First one links the C implementation to the test framework by calling the function eqc_c:start('Crc'), given that the name of the C module is "Crc.c". The above mentioned Crc_CalculateCRC8 function is now accessible and can be called from the framework.

The test case is constructed in Erlang by first allocating some memory in the C thread that the C program is running in, putting the data ”123456789” in that memory location, and calling the function with the arguments: data 9 bytes long, the initial value 0xFF as defined by the standard, and 1 (TRUE) for the last argument. The test case looks as follows in Erlang and QuickCheck:

check() ->
  Ptr   = eqc_c:create_array("uint8","123456789"),
  16#4B = Crc:Crc_CalculateCRC8(Ptr,9,16#FF,1).

The first assignment in this Erlang program binds the variable Ptr to the value of the C pointer in which the array of bytes is stored. The second assignment is an assertion that only succeeds if the called function indeed returns the hexadecimal value 4B.

In this test case, we both use the initial value and indicate that we compute the CRC result in a single call. If the last argument of this function is TRUE, then the function should always start from the initial value, ignoring Crc_StartValue8. In other words, the same test should hold for any 8-bit value of Crc_StartValue8. This is where QuickCheck comes in. We can write a property over arbitrary values:

prop_check() ->
  ?FORALL(Crc_StartValue8, byte(),
          begin
            Ptr = eqc_c:create_array("uint8","123456789"),
            16#4B == Crc:Crc_CalculateCRC8(Ptr,9,Crc_StartValue8,1)
          end).

A QuickCheck property is represented as an Erlang macro ?FORALL with three arguments. This corresponds to logical quantification, with the second argument the domain of the quantification, in this case byte(). The generator byte() is a QuickCheck data generator that during testing will generate random values from 0 to 255. This generated value is bound to the first argument of the ?FORALL macro, the binding variable Crc_StartValue8. The third and last argument is a logical statement in which the randomly generated value is used. Note that instead of the assertion, Erlang's equality operator (a double ==) is used. Running QuickCheck will result in a hundred tests with a random value selected from the interval 0 to 255. Clearly, it would be possible to just check all possible values in this case, but in general enumerating all possible values for all parameters is impossible within reasonable time limits.
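
Since the start value is just a single byte, checking all 256 values exhaustively is also feasible here; a sketch of what that could look like, reusing the C binding set up above:

check_all_start_values() ->
  Ptr = eqc_c:create_array("uint8","123456789"),
  lists:all(fun(StartValue) ->
              %% Crc_IsFirstCall is TRUE, so the start value must be ignored
              16#4B == Crc:Crc_CalculateCRC8(Ptr,9,StartValue,1)
            end,
            lists:seq(0,255)).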

Now we would also like to vary the input data and send other blocks than the ASCII values from 1 to 9. However, if we were to do that, then we would need to know the expected outcome. Clearly the outcome is not always 4B! One could re-implement the function in Erlang and compare the Erlang result with the C function result. The disadvantages are that one has to implement the same function twice and that one can make the same mistake twice, thereby potentially not detecting a bug. Instead, we use the mathematical property Magic Check, which can be checked independently of the data provided.

The AUTOSAR specification provides an example of Magic Check for the function Crc_CalculateCRC8 as follows:

Magic check calculation of SAE-J1850 CRC8 over data bytes 00h 00h 00h 00h:

  • CRC generation: CRC over 00h 00h 00h 00h, start value FFh:
    CRC-result = 59h
  • CRC check: CRC over 00h 00h 00h 00h 59h, start value FFh:
    CRC-result = 3Bh
  • Magic check = CRC-result XORed with XOR value:
    C4h = 3Bh xor FFh

That is, for each data array, we compute the checksum, add the checksum to the data and compute the checksum over this extended array. After that, the checksum is XOR-ed with 0xFF and the outcome should then be 0xC4 irrespective of the initial data.

In order to write a QuickCheck property for this, we use a generator for arbitrary sequences of data, list(byte()). We represent an array in Erlang as a list of bytes and use the append operator ++ to add an element to the end of the list. Finally we use Erlang's built-in operator for XOR on numbers, called bxor (the xor operator is reserved for boolean values).

prop_magic_check() ->
  ?FORALL(Data, list(byte()),
          begin
            Ptr1 = eqc_c:create_array("uint8",Data),
            N    = length(Data),
            Res1 = Crc:Crc_CalculateCRC8(Ptr1,N,16#FF,1),
            Ptr2 = eqc_c:create_array("uint8",Data++[Res1]),
            Res2 = Crc:Crc_CalculateCRC8(Ptr2,N+1,16#FF,1),
            16#C4 == Res2 bxor 16#FF
          end).

With this property we test with arbitrary blocks of data. Clearly, it quickly becomes impossible to enumerate all these values: it would take far too long to check even all data blocks of up to 10 bytes, with each byte having 256 different values (the 10-byte blocks alone already give 256^10, roughly 1.2 × 10^24, possibilities). Randomly selecting arbitrarily large blocks of data turns out to be very effective in detecting errors.
