Generating Fixed Test Suites with QuickCheck

The great strength of QuickCheck is that we never run out of tests to run: QuickCheck can always find more and more combinations of things to test, and so rare bugs that depend on several features interacting can eventually be found. Nevertheless, sometimes you just want to test fast, and be sure that you did indeed test everything at least once. QuickCheck is able to generate and run saved test suites with a variety of properties, which can be used as a fast sanity check in between longer test runs using random generation of test cases. In this blog post, we’ll show how to use recent additions to QuickCheck to generate better test suites.

The example we’re going to use is a simple specification of the Erlang process registry, which does both positive and negative testing, and moreover kills processes randomly to test the registry’s behaviour when processes crash. You can find the source code we’re starting from here. It’s a state machine specification, with a property instrumented to collect the proportions of each command in the generated tests.

prop_registry() ->
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
              aggregate(command_names(Cmds),
                        Res == ok))
          end).

The property generates a list of commands (the ?FORALL over commands(?MODULE)), makes sure the registry is clean at the start of each test (the list comprehension that unregisters every name in ?names), runs the commands (run_commands/2), pretty-prints test results on failure (pretty_commands/4), displays statistics on the occurrences of each command name (aggregate/2), and checks that the test passed (Res == ok).

Running QuickCheck generates output like this:

OK, passed 100 tests

25.2% {registry_eqc,spawn,0}
22.3% {registry_eqc,whereis,1}
20.9% {registry_eqc,unregister,1}
16.2% {registry_eqc,register,2}
15.3% {registry_eqc,kill,1}

from which we can see that register and kill are generated a little less often than the other commands (because we can’t call them until after we have spawned at least one pid), but nevertheless each command is called often enough for testing to look reasonably thorough. Now, you can read about how we improved the distribution of these tests in another blog post, “Getting Better Statistics from State Machine Tests”, but this article focusses on generating a suite of fixed tests.
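The percentages above are each command's share of all the calls made across the 100 tests. As a rough illustration of that arithmetic, and not QuickCheck's own code, the hypothetical helper below computes the same kind of table from a list of per-test command-name lists, using only the Erlang standard library:

```erlang
-module(aggregate_demo).
-export([proportions/1]).

%% Hypothetical helper (not part of QuickCheck): given the command-name
%% list of each test, return every name's share of all recorded calls as
%% a percentage, most frequent first. It imitates the shape of the table
%% aggregate/2 prints; it is not QuickCheck's implementation.
proportions(NameLists) ->
    All = lists:append(NameLists),
    case length(All) of
        0 ->
            %% No calls at all: nothing to report.
            [];
        Total ->
            %% Count the occurrences of each name in a map.
            Counts = lists:foldl(
                       fun(Name, Acc) ->
                               maps:update_with(Name, fun(C) -> C + 1 end, 1, Acc)
                       end, #{}, All),
            %% Sort by descending count.
            Sorted = lists:sort(fun({_, A}, {_, B}) -> A >= B end,
                                maps:to_list(Counts)),
            [{Name, 100 * Count / Total} || {Name, Count} <- Sorted]
    end.
```

For example, proportions([[spawn, register], [spawn, kill, whereis]]) puts spawn first with 40.0, since it accounts for two of the five calls.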

Generating a Suite of Tests

The module we use to generate fixed test suites is eqc_suite. Now, we can use eqc_suite to generate random test suites, but it’s more interesting to generate test suites with a particular goal. We do that using “feature-based testing”, in which QuickCheck runs tests randomly, and saves test cases that exhibit interesting features. Let’s try it:

6> eqc_suite:feature_based(registry_eqc:prop_registry()).
Generating feature based test suite...
No features found.
{feature_based,[]}

We called eqc_suite:feature_based/1, giving it the property we want a test suite for—but it failed to find any interesting test cases! The result returned is an empty test suite: not at all what we wanted.

Why did we find no interesting tests? Because we haven’t yet told QuickCheck what we’re interested in! Let’s see how we can do so.

Declaring Features of Interest

To use feature-based testing, we need to tell QuickCheck which features of interest each test case exhibits. We do this by instrumenting the property under test, adding a call to features/2, rather like we collect statistics by adding calls to aggregate/2. For example, let’s suppose that we are interested in finding test cases that call each function under test. Then we might declare the “features of interest” of each test case to be a list of the functions that it calls, like this:

prop_registry() ->
  ?FORALL(Cmds, commands(?MODULE),
          begin
            [catch unregister(N) || N <- ?names],
            {H, S, Res} = run_commands(?MODULE,Cmds),
            pretty_commands(?MODULE, Cmds, {H, S, Res},
              aggregate(command_names(Cmds),
                features(command_names(Cmds),
                        Res == ok)))
          end).

The only addition is the call to features/2, which simply declares the features of each test case to be the same command names that we are aggregating.
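It may help to picture how per-test feature lists like these could be turned into a suite. The module below is a minimal sketch under a simplifying assumption: each candidate test arrives as a {TestCase, Features} pair, and we keep a test only if it exhibits at least one feature not seen before. It is a hypothetical illustration of the idea, not eqc_suite's actual algorithm.

```erlang
-module(feature_select).
-export([select/1]).

%% Hypothetical sketch of feature-based selection (not eqc_suite's code).
%% Candidates is a list of {TestCase, Features} pairs. A test case is kept
%% only if it exhibits a feature that no earlier kept test case covered;
%% each kept test is labelled with the features it was first to cover.
select(Candidates) ->
    {Suite, _Seen} =
        lists:foldl(fun keep_if_new/2, {[], sets:new()}, Candidates),
    lists:reverse(Suite).

keep_if_new({Test, Features}, {Suite, Seen}) ->
    %% Features of this test that are not yet covered.
    New = [F || F <- lists:usort(Features), not sets:is_element(F, Seen)],
    case New of
        [] ->
            %% Nothing new: drop the test.
            {Suite, Seen};
        _ ->
            %% Keep the test and mark its new features as covered.
            {[{New, Test} | Suite],
             lists:foldl(fun sets:add_element/2, Seen, New)}
    end.
```

Given [{t1, [spawn]}, {t2, [spawn, kill]}, {t3, [kill]}, {t4, [whereis]}], it keeps t1, t2 and t4, and drops t3, which adds no new feature.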

Repeating our test suite generation, this time we find 5 test cases:

11> S1 = eqc_suite:feature_based(registry_eqc:prop_registry()).
Generating feature based test suite...
[{registry_eqc,whereis,1}]
[{registry_eqc,spawn,0}]
[{registry_eqc,kill,1}]
[{registry_eqc,unregister,1}]
[{registry_eqc,register,2}]

5 test cases generated.
{feature_based,[{[{registry_eqc,whereis,1}]...

The five feature lists printed after the first message tell us that QuickCheck has found a test case for each of those functions; the test cases are returned together in the final test suite, which we save here as S1 (it’s easy to save the test suite in a file, too). Of course, we also need to be able to run the test suite, which we can do as follows:

12> eqc_suite:run(registry_eqc:prop_registry(),S1).
5 tests completed
[]

All the tests passed.

Inspecting the tests

So far, of course, we haven’t even seen the generated tests. The best way to do that is to run the test suite in “verbose” mode:

20> eqc_suite:run(eqc:batch(registry_eqc:prop_registry()),S1,verbose).
%%% Testing [{registry_eqc,whereis,1}]

OK, passed the test.
[{set,{var,1},{call,registry_eqc,whereis,[a]}}]

registry_eqc:whereis(a) -> undefined
...
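The [{set, ...}] line is ordinary Erlang data: a list of symbolic commands, each binding a variable {var, N} to the result of a {call, Module, Function, Args} term. To show how directly it maps onto the pretty-printed trace, here is a hypothetical helper (not part of QuickCheck) that renders one such command as a Module:Function(Args) string. It prints arguments with ~p, so symbolic variables appear as {var,N} rather than as the pids they stand for.

```erlang
-module(symbolic_demo).
-export([describe/1, describe_all/1]).

%% Hypothetical helper, not part of QuickCheck: render one symbolic
%% command of the form {set, {var, N}, {call, M, F, Args}} as a string.
%% The variable binding itself is ignored.
describe({set, {var, _N}, {call, M, F, Args}}) ->
    ArgStrs = [io_lib:format("~p", [A]) || A <- Args],
    lists:flatten(io_lib:format("~p:~p(~s)",
                                [M, F, lists:join(", ", ArgStrs)])).

%% Render every command of a symbolic test case.
describe_all(Cmds) ->
    [describe(C) || C <- Cmds].
```

For the test case shown above, describe({set, {var, 1}, {call, registry_eqc, whereis, [a]}}) gives "registry_eqc:whereis(a)".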

For each test case, we see why it’s been included in the suite (in this case, to test whereis), whether it passed or not, QuickCheck’s representation of the test case (the line beginning [{set,...), and finally a pretty-printed trace of what the test actually did (the registry_eqc:whereis(a) -> undefined line). If we extract the pretty-printed results from each of the five tests, they look like this:

registry_eqc:whereis(a) -> undefined

registry_eqc:spawn() -> <0.11483.1>

registry_eqc:spawn() -> <0.11486.1>
registry_eqc:kill(<0.11486.1>) -> ok

registry_eqc:unregister(a) ->
  {'EXIT',
     {badarg,
        [{erlang, unregister, [a], []},
         {registry_eqc, unregister, 1,
            [{file, "registry_eqc.erl"}, {line, 92}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

registry_eqc:spawn() -> <0.11491.1>
registry_eqc:register(a, <0.11491.1>) -> true

Sure enough, we have a test here for every operation, but the tests are hardly very complete. Can we do better?

Using Call Features as Test Features

We just generated a rather uninteresting test suite, because we specified rather uninteresting features—just whether or not each possible function was called. But suppose we specify that we are also interested in how each function was called? Nowadays QuickCheck allows us to specify the “features” of each call in a state machine test, and the model we’re using actually does so. For example, the features of register are specified in the model as follows:

register_features(S,[Name,Pid],Result) ->
  [success || Result==true] ++
    [name_already_registered || lists:keymember(Name,1,S#state.regs)] ++
    [pid_already_registered || lists:keymember(Pid,2,S#state.regs)] ++
    [pid_is_dead || lists:member(Pid,S#state.dead)].

register_features is used by QuickCheck to compute a list of features for each call of register, which in this case will contain an atom telling us whether the call succeeded, or how it failed. The model contains similar callbacks for whereis and unregister (the other functions which can succeed or fail). We can extract a list of the features of all calls in a test case using call_features/1, which lets us tell eqc_suite to use call features as test case features just by replacing the line

features(command_names(Cmds),

in our property by

features(call_features(H),

After doing so, eqc_suite generates a much more interesting set of seven test cases. Running the tests, we see three tests from the old test suite

registry_eqc:unregister(a) ->
  {'EXIT',
     {badarg,
        [{erlang, unregister, [a], []},
         {registry_eqc, unregister, 1,
            [{file, "registry_eqc.erl"}, {line, 92}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

registry_eqc:whereis(a) -> undefined

registry_eqc:spawn() -> <0.13401.1>
registry_eqc:register(a, <0.13401.1>) -> true

which test unregister, whereis and register in the simplest possible way, but we also see four new tests. There is a test in which unregister succeeds,

registry_eqc:spawn() -> <0.13407.1>
registry_eqc:register(a, <0.13407.1>) -> true
registry_eqc:unregister(a) -> true

a test in which whereis succeeds,

registry_eqc:spawn() -> <0.13413.1>
registry_eqc:register(d, <0.13413.1>) -> true
registry_eqc:whereis(d) -> <0.13413.1>

and two tests in which register fails:

registry_eqc:spawn() -> <0.13410.1>
registry_eqc:kill(<0.13410.1>) -> ok
registry_eqc:register(a, <0.13410.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [a, <0.13410.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

registry_eqc:spawn() -> <0.13404.1>
registry_eqc:register(a, <0.13404.1>) -> true
registry_eqc:register(a, <0.13404.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [a, <0.13404.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

The first of these fails because the pid is dead; the second fails for two reasons at once: the name is already registered, and the pid is already registered. Now that we have declared our interest in all these possible failure modes, our suite contains test cases that cover them. Nice!
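Because register_features/3 is an ordinary function of the model state, we can check these two explanations by calling it directly. The sketch below copies the callback into a standalone module. The #state record is an assumption (the post only tells us the state has a regs field of {Name, Pid} pairs and a dead field of dead pids), self() stands in for a spawned pid, and {'EXIT', badarg} is a simplified stand-in for the real exception result.

```erlang
-module(register_features_demo).
-export([register_features/3, double_failure/0, dead_pid/0]).

%% Assumed state record: only the regs ({Name, Pid} pairs) and dead
%% (dead pids) fields are known from the model; other fields are omitted.
-record(state, {regs = [], dead = []}).

%% register_features/3 as written in the model (flat feature list).
register_features(S, [Name, Pid], Result) ->
  [success || Result == true] ++
    [name_already_registered || lists:keymember(Name, 1, S#state.regs)] ++
    [pid_already_registered || lists:keymember(Pid, 2, S#state.regs)] ++
    [pid_is_dead || lists:member(Pid, S#state.dead)].

%% State before the second register(a, Pid): a is already registered to
%% Pid. self() stands in for the spawned pid; {'EXIT', badarg} is a
%% simplified stand-in for the exception the real call returns.
double_failure() ->
  Pid = self(),
  register_features(#state{regs = [{a, Pid}]}, [a, Pid], {'EXIT', badarg}).

%% State after spawn and kill: Pid is dead and nothing is registered.
dead_pid() ->
  Pid = self(),
  register_features(#state{dead = [Pid]}, [a, Pid], {'EXIT', badarg}).
```

double_failure() returns [name_already_registered, pid_already_registered], matching the second failing test, while dead_pid() returns [pid_is_dead], matching the first.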

Refining the call features

Of course, we might feel that QuickCheck “cheated” a little in the last test case above, by testing two features in one test! Perhaps we would prefer to include two tests in our test suite, one for each failure mode. Doing so is easy: we just change the features of register as follows:

register_features(S,[Name,Pid],Result) ->
  [[success || Result==true] ++
     [name_already_registered || lists:keymember(Name,1,S#state.regs)] ++
     [pid_already_registered || lists:keymember(Pid,2,S#state.regs)] ++
     [pid_is_dead || lists:member(Pid,S#state.dead)]].

You may be forgiven if you don’t immediately notice the change! All we have done is add another pair of list brackets around the right hand side, so that instead of returning a list of features such as success, name_already_registered or pid_already_registered, the function will return a list of features which are themselves lists: [success], [name_already_registered], or [pid_already_registered]—or, of course, in the test case above, [name_already_registered,pid_already_registered], a list with two elements! By turning call features into lists, we enable eqc_suite to track calls that exhibit several features at the same time, and distinguish them from a combination of calls that exhibit the same features together. With this slight change, we get three new test cases: one where register fails because the pid is already registered,

registry_eqc:spawn() -> <0.16788.1>
registry_eqc:register(a, <0.16788.1>) -> true
registry_eqc:register(b, <0.16788.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [b, <0.16788.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

one where it fails because the name is already registered,

registry_eqc:spawn() -> <0.16791.1>
registry_eqc:register(b, <0.16791.1>) -> true
registry_eqc:spawn() -> <0.16792.1>
registry_eqc:register(b, <0.16792.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [b, <0.16792.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

and one—which I did not think of in advance—which fails for two reasons simultaneously:

registry_eqc:spawn() -> <0.16795.1>
registry_eqc:register(b, <0.16795.1>) -> true
registry_eqc:spawn() -> <0.16796.1>
registry_eqc:kill(<0.16796.1>) -> ok
registry_eqc:register(b, <0.16796.1>) ->
  {'EXIT',
     {badarg,
        [{erlang, register, [b, <0.16796.1>], []},
         {registry_eqc, register, 2,
            [{file, "registry_eqc.erl"}, {line, 45}]},
         {eqc_statem, run_commands, 2,
            [{file, "../src/eqc_statem.erl"}, {line, 790}]},
         {registry_eqc, '-prop_registry/0-fun-0-', 1,
            [{file, "registry_eqc.erl"}, {line, 136}]}]}}

In this test, the last call to register fails both because the name is already registered, and because the pid is dead! This is a good example of how QuickCheck can find—and eqc_suite can keep—tests that a human tester would be unlikely to think of.
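To see concretely why the extra brackets make a difference, the sketch below compares the two versions of the callback on a pair of calls: one register that fails for both reasons, followed by one that fails only because the pid is already registered. The new_features/2 helper is a hypothetical stand-in for whatever coverage check eqc_suite really performs, and the #state record is an assumption about the model's state (a regs list of {Name, Pid} pairs and a dead list of pids).

```erlang
-module(nested_features_demo).
-export([flat/3, nested/3, new_features/2, demo/0]).

%% Assumed state record: regs holds {Name, Pid} pairs, dead holds dead pids.
-record(state, {regs = [], dead = []}).

%% Flat features: one atom per reason the call succeeded or failed.
flat(S, [Name, Pid], Result) ->
  [success || Result == true] ++
    [name_already_registered || lists:keymember(Name, 1, S#state.regs)] ++
    [pid_already_registered || lists:keymember(Pid, 2, S#state.regs)] ++
    [pid_is_dead || lists:member(Pid, S#state.dead)].

%% Nested features: the same list wrapped once more, so each call
%% contributes exactly one feature, namely the whole combination.
nested(S, Args, Result) ->
  [flat(S, Args, Result)].

%% Hypothetical coverage check: the features of a call not yet in Seen.
new_features(Features, Seen) ->
  [F || F <- Features, not lists:member(F, Seen)].

%% Call A: register(a, Pid) fails because a and Pid are both registered.
%% Call B: register(b, Pid) fails only because Pid is registered.
%% Returns {NewFlat, NewNested}: what call B adds once call A is covered.
demo() ->
  Pid = self(),
  S = #state{regs = [{a, Pid}]},
  Bad = {'EXIT', badarg},
  SeenFlat = flat(S, [a, Pid], Bad),
  SeenNested = nested(S, [a, Pid], Bad),
  {new_features(flat(S, [b, Pid], Bad), SeenFlat),
   new_features(nested(S, [b, Pid], Bad), SeenNested)}.
```

demo() returns {[], [[pid_already_registered]]}: with flat features, the single-cause failure covers nothing new once the double failure has been kept, whereas with nested features it counts as a feature of its own. That is consistent with the separate single-cause tests eqc_suite kept above.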

Once we’re happy, we can save the test suite (here bound to S3) to a file

eqc_suite:write(registry_tests,S3)

and add code to our model to run the test suite:

run_registry_tests() ->
  eqc_suite:run(registry_eqc:prop_registry(),registry_tests).

The call to eqc_suite:write/2 saves the suite in a file called “registry_tests.suite”, and to run the saved tests, all we need to do is pass eqc_suite:run/2 the file name as an atom, instead of the test suite itself.