Adding a GitHub repository to QuickCheck-CI

In this blog article we are going to explain how you can add a repository on GitHub to QuickCheck Continuous Integration. In contrast to other continuous integration systems our CI system is specialised in property based testing. After the code has been built, we run QuickCheck on the properties that are present in the repository. Using these properties we can thoroughly test the software and provide detailed feedback in case a property fails. In addition we generate coverage information, which can be used to spot dead code or to find the location of a bug.

Unfortunately many projects on GitHub haven’t defined properties yet. Many projects, however, do have unit tests. A property abstracts away from a single unit tests, and covers many unit tests. We are going to show how to define a property based on a unit test.

Apache Couchdb

We are going to use Apache CouchDB (with git revision 6fc1124) as an example project. Apache CouchDB, commonly referred to as CouchDB, is an open source database that focuses on ease of use. It is a NoSQL database that uses JSON to store data, JavaScript as its query language using MapReduce, and HTTP for an API. The project consists of many Erlang modules and contains two test suites: one based on ETAP and another one based on Grunt. If we run the latter we get some unexpected behaviour:

    ...
      NavBar
        adding links
          ✓ Should add link to navlinks
          ✓ Should add link to bottom links
          ✓ Should add link to footer links
        removing links
          ◦ should remove links from list: Error loading resource file:///app/templates/layouts/myTemplate.html (203). Details: Error opening /app/templates/layouts/myTemplate.html: No such file or directory
    Error loading resource file:///app/templates/layouts/myTemplate.html (203). Details: Error opening /app/templates/layouts/myTemplate.html: No such file or directory
    Error loading resource file:///Users/alex/Repositories/quviq/couchdb/src/ (301). Details: Protocol "file" is unknown
          ✓ should remove links from list
          ✓ Should call render after removing links

      TabMenu
        on change polling rate
          ✓ Should set polling rate
          ✓ Should clearInterval
          ✓ Should trigger update:poll event
          ✓ should set the correct active tab
        on request by type
          ✓ should set the filter 'database' for the main-view

      DataSection
        #setPolling
          ✓ Should set polling interval

      Replication Addon
        Replication View
          ✓ should render

      83 passing (396 ms)

    Done, without errors.

Although there are clearly some individual tests failing, due to missing input files, the overall test run is still regarded successful. If we would add such a test suite to a continuous integration server, these kind of errors will not be detected. It would be an improvement if the test suite returns an error if one of the containing test fails. We do not investigate this test suite further, instead, in the remainder of this blog we look at the ETAP test suite.

Abstracting unit tests to properties

There are about 50 tests defined in the ETAP test suite. A test consists of a sequence of assertions, where an assertion is a function call applied to some given arguments together with the expected result. Consider an excerpt from the 010-file-basics.t ETAP test file:

    {ok, Fd} = couch_file:open(filename() ++ ".0", [create, overwrite]),
    etap:ok(is_pid(Fd),
        "Returned file descriptor is a Pid"),

    etap:is({ok, 0}, couch_file:bytes(Fd),
        "Newly created files have 0 bytes."),

    ?etap_match(couch_file:append_term(Fd, foo), {ok, 0, _},
        "Appending a term returns the previous end of file position."),

    {ok, Size} = couch_file:bytes(Fd),
    etap:is_greater(Size, 0,
        "Writing a term increased the file size."),

    ?etap_match(couch_file:append_binary(Fd, <<"fancy!">>), {ok, Size, _},
        "Appending a binary returns the current file size."),

    etap:is({ok, foo}, couch_file:pread_term(Fd, 0),
        "Reading the first term returns what we wrote: foo"),

    etap:is({ok, <<"fancy!">>}, couch_file:pread_binary(Fd, Size),
        "Reading back the binary returns what we wrote: <<\"fancy\">>."),

    etap:is({ok, couch_compress:compress(foo, snappy)},
        couch_file:pread_binary(Fd, 0),
        "Reading a binary at a term position returns the term as binary."
    ),

    {ok, BinPos, _} = couch_file:append_binary(Fd, <<131,100,0,3,102,111,111>>),
    etap:is({ok, foo}, couch_file:pread_term(Fd, BinPos),
        "Reading a term from a written binary term representation succeeds."),

This particular instance tests the couch_file module. This is a low-level module that handles file interactions, such as reading and writing. The setup of this particular test is very common and straightforward. In essence, it opens a file, writes something in the opened file, and then tries to read from that file. The result of reading from file should be equal to what was written. The test for appending a term to the file is however only performed for a single term, namely foo. After a successful test run we may conclude that it is possible to append a term to a file, but one might ask if we tested the append_term function, which writes a term to a file, thoroughly enough. Are we sure we can handle large terms as well? What happens if we write multiple entries or interleave the function call with other operations, such as reading? Do the order of the assertions have any influence on the test results–for example, if we append some binary data before we append a term? If we want to test these kind of interactions we need to write many unit tests, which is tedious and error prone work.

Instead of testing unit tests, we propose to use property based testing using QuickCheck. With QuickCheck you can define properties that your software should have. For example, the append_term function should not only work for foo, but also for any other term. A property that captures this characteristic of append_term can be expressed as follows:

    prop_append_term() ->
      ?FORALL(Term, term(),
              begin
                {ok, Fd} = couch_file:open(?TESTFILE, [create, overwrite]),
                ok(is_pid(Fd), "Returned file descriptor is a Pid"),
                is({ok, 0}, couch_file:bytes(Fd), "Newly created files have 0 bytes."),
                case couch_file:append_term(Fd, Term) of
                  {ok, 0, _} -> true;
                  _ -> exit("Appending a term returns the previous end of file position.")
                end,
                {ok, Size} = couch_file:bytes(Fd),
                couch_file:close(Fd),
                Size > 0 % Writing a term increased the file size
              end).

The ?FORALL macro takes a generator as second argument, in this case a term generator, and binds the generated term to the variable Term. This variable is used in the third argument, the actual property, which uses the generated value to determine if the function (or system) under test behaves as expected. In this case we check the same sequence of commands as is present in the ETAP test. We open a file, check if the first index is zero, append the term to the file, and finally check if the size of the file has increased. We can validate this property as follows:

    1> eqc:quickcheck(prop_append_term(Fd)).
    ....................................................................................................
    OK, passed 100 tests
    true

We call the quickcheck function from the eqc module, and conclude that we can successfully call the append function a hundred times with randomly generated terms. Note that we could make this property stronger if we can predict what the size value is going to be. This is relatively hard without knowing how this function is implemented. We could however determine if the Index is correct after every call to append_term, because we merely need to add the previous term size and index that we get returned from each append call. This implies that we need to maintain state information, namely the return values of the append calls.

EQC state machine

QuviQ QuickCheck offers a library called eqc_statem to handle operations that have side-effects, which are specified via a state machine. The following may seem a bit too general for the problem at hand, but it will save time in the long run.

A minimal QuickCheck state machine with only one call (to append_term) is specified as follows:

    -define(TESTFILE, "couch_eqc_test_file").
    -record(state, {file_desc, indexes}).

    initial_state(Fd) ->
      #state{file_desc = Fd, indexes = []}.

    %% -- Append call
    append_term_args(S) ->
      [S#state.file_desc, term()].

    append_pre(_S) ->
      true.

    append_term(Fd, Term) ->
      couch_file:append_term(Fd, Term).

    append_term_next(S, Result, _Args) ->
      S#state{indexes = [Result | S#state.indexes]}.

    append_term_post(S, _Args, {ok, NextIndex, _}) ->
      CalcIndex = case S#state.indexes of
                    []                          -> 0;
                    [{ok, Index, TermSize} | _] -> Index + TermSize
                  end,
      NextIndex == CalcIndex. 

    %% -- Property
    prop_append(Fd) ->
      ?FORALL(Cmds, commands(?MODULE, initial_state(Fd)),
              begin
                couch_file:truncate(S#state.file_desc, 0),
                {H, S, Res} = run_commands(?MODULE,Cmds),
                pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok)
              end).

The above code snippet shows how to define an operation for a state machine. The append_term operation is defined by five functions. The function with the _args postfix get the state as an argument and returns the (generated) arguments for the operation in a list. We can put a precondition on an operation using the postfix _pre function. In this example the precondition function just returns true; in other words, it is always valid to perform the append operation (and actually, we could have omitted this definition, since a true precondition is the default). The append_term receives two arguments: a file descriptor, which is stored in the state record, and a generated term. The function term() returns a QuickCheck generator for Erlang terms; we will later explain this in more detail. The append_term performs the actual operation by calling the appropriate function from the couch_file module. The function with the postfix _next updates the state. We want to use the results of a previous call to append to calculate the expected size. We therefore store the results of a call in the indexes field of the state. The stored result is used in the fifth function, with the _post postfix, which checks the post condition for the append call. The post condition checks if the addition of the previous index and term size is equal to the next index. In case of the first call to append, that is the indexes list is empty, the calculated index is zero.

To test the append operation we need to define a state machine property and an initial state. The latter is defined by providing an implementation for the initial_state/1 function. We use a state record to store the file descriptor and a list of indexes. The property for the state machine is a general one, and basically states to generate a list of commands, execute those, and check if no post condition has been violated. Note that we empty the file before each test run using the truncate function, so that we start in a known state. More information about QuickCheck state machines and generators can be read in the QuickCheck documentation.

Just as with the more simple property we can validate the state machine property by running the quickcheck function:

    2> eqc:quickcheck(prop_append()).
    ....................................................................................................
    OK, passed 100 tests
    true

Again, we can run a hundred tests without any problems. But what did we actually test? How many append calls did we do? It would be interesting to see number of consecutive calls to append we did. Let us adapt the property to include some statistics gathering. We add the function collect/2 to the property to collect the length of the generated sequence of commands (Cmds), for now a list of append calls.

    prop_append(Fd) ->
      ?FORALL(Cmds, commands(?MODULE, initial_state(Fd)),
              collect(length(Cmds),
              begin
                couch_file:truncate(S#state.file_desc, 0),
                {H, S, Res} = run_commands(?MODULE,Cmds),
                pretty_commands(?MODULE, Cmds, {H, S, Res}, Res == ok)
              end)).

With this modification we get the following result when running quickcheck:

    3> eqc:quickcheck(prop_append()).
    ....................................................................................................
    OK, passed 100 tests

    12% 1
    8% 4
    8% 0
    5% 6
    5% 3
    5% 2
    4% 33
    4% 10
    4% 7
    3% 41
    3% 20
    3% 11
    3% 9
    3% 8
    2% 57
    2% 24
    2% 22
    2% 18
    2% 15
    2% 14
    2% 13
    2% 12
    2% 5
    1% 74
    1% 72
    1% 69
    1% 47
    1% 42
    1% 40
    1% 39
    1% 36
    1% 35
    1% 29
    1% 21
    1% 16
    true

We observe that we indeed call the append operation many times. This does not give us much confidence in the append implementation though. We do not check if the terms are stored properly and that we can read them back. We are going to add the read operation, as well as many other operations, to the state machine specification. But first we explain a bit about QuickCheck generators, then we show how to add a github repository to QuickCheck CI, and then we will come back to the state machine to extend the set of operations.

Generators

QuickCheck generators are used to generate random test data for QuickCheck properties. A generator specifies three things at the same time:

a set of values that can be generated,
a probability distribution on that set,
a way of shrinking generated values to similar, smaller values

The last point is very important because the generated values can be quite large and may contain irrelevant parts. After a test fails QuickCheck searches for a similar, but simpler failing test case by shrinking the value. The shrinking process cuts away the irrelevant parts. QuickCheck generators are composable and can be combined to quickly specify new generators. The code below shows the generator for Erlang terms that we have used before:

    literal() ->
      oneof([atom(), bool(), char(), int()]).

    atom() ->
      ?LET(S, list(char()), list_to_atom(S)).

    tuple(Gen) ->
      ?LET(Xs, list(Gen), list_to_tuple(Xs)).

    term(0) ->
      literal();
    term(Size) ->
      N = Size div 3,
      oneof([literal(), list(term(N)), tuple(term(N))]).

    term() ->
      ?SIZED(Size, term(Size)).

We start with defining a generator for literals (literals/1), such as integers and characters. The bool/1, char/1, and int/1 are predefined literal generators provided by the QuickCheck generator library. The list/1 generator generates random lists with elements of a given generator. In the atom/1 generator we use the list generator to create a generator of lists of characters. Furthermore, this generator uses the ?LET macro, which is used to create a new generator that binds values of a supplied generator and uses these values to generate values. In this case we convert the generated list of characters (bound to the variable S) to an Erlang atom.

QuickCheck lets you also define own generators for recursive data structures, such as the term/1 generator. When using recursive generators we need to be careful with the size of the generated value. To keep the size of recursive data structures under control we divide the size of the recursive call by three, and use the ?SIZED macro to let QuickCheck take care of the actual size.

There are many more generator combinators and predefined generators that can be used to build QuickCheck generators.

QuickCheck Continuous Integration

QuickCheck CI is a continuous integration server that runs QuviQ QuickCheck on a GitHub project. Open Source Developers can use QuickCheck CI to get free access to QuviQ QuickCheck for their quality assurance. Before you can register your project to QuickCheck CI, you need to add a licence file to the project, which states that we are allowed to build the project and execute tests. In addition, you need to supply a configuration file that explains how to build your project. After that you can register the GitHub URL to our CI server. More information about adding a project can be found here.

When registered, we can queue a build and run QuickCheck on the stated properties in the project. The CI system gives a nice overview of whether or not a project could be built and shows the results of all properties along with coverage information.

Since we do not have access to the Apache CouchDB GitHub repository, we have, for this example, forked the repository and added the licence file (note that the licence of Apache CouchDB allows for this). We also added the following configuration file:

{build, "./configure ; make ; cd test/eqc/ebin/ ; erl -make "}.
{test_path, "test/eqc/ebin"}.
{deps, "src/couch/ebin"}.
{deps, "src/couch_stats/ebin"}.
{deps, "src/snappy/ebin"}.
{deps, "src/ioq/ebin"}.

We can now add the project to the CI server and queue a build for the forked project. After we have verified that we can successfully build the project, we can continue to add a module with the QuickCheck property, which we defined above, and start testing with QuickCheck.

Complete model

So far we have only added the append operation to our model of couch_file. As mentioned before this gives little confidence in
this particular operation, as well as the overall module. We therefore complete the model with the other operations:

open
append_binary
pread_binary / pread_binary
truncate
bytes
write_header / read_header
sync
close

We omit the explanation of most of these functions because the names give a good indication of the semantics. Indeed, the names are in some cases the only documentation available! So we have used QuickCheck to gain an understanding of how the API behaves: we learned that a file contains any number of terms or binaries, each of which has a location in the file, and one “header” which is read and written by separate calls. We extended our state record to model this, keeping track of the written contents paired with their index, and we added a field that stores the header. Note that we can adapt the distribution of the operations, and we made sure that we call the append... and pread... operations more often than, for example, the close operation. The source code of the model can be inspected here.

We now discuss some peculiarities we encountered during the development of the model.

Deviations from the model

In our first implementation of the model we expected the bytes operation, which measures the total size of the file, to return the sum of the index and the size of the last term to be appended. We store these indices in the state record. However, when we run QuickCheck on the model we get the following counterexample:

    couch_eqc_found_bytes_inconsistency:open() -> {ok, <0.10858.0>}
    couch_eqc_found_bytes_inconsistency:write_header({ok, <0.10858.0>}, []) -> ok
    couch_eqc_found_bytes_inconsistency:bytes({ok, <0.10858.0>}) -> {ok, 23}

    Reason: {postcondition, {0, '/=', 23}}

This counterexample shows us that we need to take the size of the header into account. Our impression had been that the header is treated differently from the other appended contents. But the bytes operation does not make this distinction. This is fair enough–we found a misunderstanding in our model. After we adjusted our model to reflect this, we got the following counterexample:

    couch_eqc_truncate_removes_header:open()
      -> {ok, <0.13038.0>}
    couch_eqc_truncate_removes_header:append({ok, <0.13038.0>}, {list_bin, [<<>>]})
      -> {ok, 0, 5}
    couch_eqc_truncate_removes_header:write_header({ok, <0.13038.0>}, [])
      -> ok
    couch_eqc_truncate_removes_header:truncate({ok, <0.13038.0>}, {ok, 0, 5})
      -> ok
    couch_eqc_truncate_removes_header:read_header({ok, <0.13038.0>})
      -> no_valid_header

    Reason: {postcondition, {no_valid_header, '/=', {ok, []}}}

In this counterexample, we expect the truncate to remove the term that we wrote, but the results show that we also removed the header written afterwards. This makes us suspect that a header is written just like any other term or binary, and not treated specially. Not knowing the specification we cannot determine whether or not this is the intended behaviour. We again adapt the model: we remove the header if it is written after an index that we truncate.

With this modified model we run into the next related counterexample:

    couch_eqc:open() -> {ok, <0.350.0>}
    couch_eqc:write_header({ok, <0.350.0>}, []) -> ok
    couch_eqc:append({ok, <0.350.0>}, {term, []}) -> {ok, 23, 6}
    couch_eqc:write_header({ok, <0.350.0>}, []) -> ok
    couch_eqc:truncate({ok, <0.350.0>}, {ok, 23, 6}) -> ok
    couch_eqc:read_header({ok, <0.350.0>}) -> {ok, []}

    Reason: {postcondition, {{ok, []}, '/=', no_valid_header}}

What happens here is that we write a header twice, do a truncate, and then try to read the header. The truncate is done with an index that is in between the two write_header operations. After the previous counterexample, we adapted our model to reset the header if we do a truncate before the header is written. We therefore expected that the couch_file implementation would return the no_valid_header value, as we have just removed the header. Instead, we get the header that was written the first time. We can conclude that a header is not overwritten if we call the write_header operation more than once. So if you are going to use the header functionality together with the truncate function, you need to do some bookkeeping to determine which header you can expect–previously overwritten headers seem to “return from the dead”!

Only later we found out that there is an additional ETAP test file that contains unit tests for the header functionality. It became clear that the displayed behaviour is actually intended. This came as a bit of a surprise; it seems that a header is not associated with a file, as the name suggests, but instead it is used as a kind of marker for a block of entries. We again adapted the model and are now able to run thousands of tests without any problems.

This example shows that QuickCheck finds inconsistencies quickly, using generated test cases that are hard to come up with yourself. The couch_file module is a very low-level module of the CouchDB project, but it was interesting to pin down its specification with QuickCheck. A possible continuation would be to test a more high-level module of the CouchDB project.