Testing TLS with QuickCheck

Lately we have been experimenting with modelling TLS (v1.2) using QuickCheck.

We took the approach where QuickCheck takes the ‘client’ role and tested a couple of available implementations running as a server. For the experiment we
ran against the built-in Erlang SSL/TLS implementation, S2N from Amazon Security Labs, OpenSSL, and mbedTLS (previously PolarSSL). For a complete description of TLS v1.2 we refer to RFC5246; in short it is a protocol where a series of messages is interchanged in order to setup a shared secret, such that an efficient symmetric cipher can be used for a communication session.

TLS has lots of additions and extensions defined. Here we describe the basics, without such extensions. Also, this is not directly a test of the underlying cryptographic primitives. We are using the ‘crypto’ library of Erlang, which in term utilizes OpenSSL. We would, of course, notice if the tested implementation deviates from this ‘standard’ implementation, but otherwise we assume it to be correct. After all, for S2N we know that considerable effort has been put in verifying the crypto part

Purpose

Quviq has on several occasions been asked to do work in the spirit of ‘does an implementation adhere to its specification’? I.e. acceptance testing of a well specified application. For example we have done this for the AUTOSAR specification (for example the CAN and LIN communication stacks), and we have found it an interesting and challenging problem. The question is important because of interoperability, will two implementations that supposedly follow the same standard work together? And when they don’t work together, who is at fault?

How we tested

The TLS protocol is initialized by the client sending a CLIENT_HELLO message. This message contains the security parameters that the client supports, we use
QuickCheck to generate a random list of configurations (skewed against configurations that the server will accept). Depending on the selected configurations (the server makes the final selection from the list sent by the client). The protocol continues with messages containing certificates, possibly additional ciphers, etc. In each stage we check that the response is the expected and generate the partly random data necessary to respond in compliance to the protocol. The TLS protocol is terminated once a successful handshake is completed (or aborted if we happened to generate a negative test).

Findings

Saving the most interesting finding to last we start by two observations regarding the S2N TLS implementation. (1) S2N does not follow the specification regarding when to send explicit ABORT messages in the protocol; and (2) S2N uses a strange internal prioritization order of ciphers. (1) is an intentional deviation as far as we could tell, there are certain cryptographic attacks towards TLS that can benefit from the information given by the ABORT, thus Amazon engineers have on purpose left them out. Such a deviation is, of course, good to make – but from an interoperability perspective it can still cause a problem so exposing the deviation is important. (Also, we had to read this
between the lines in the S2N documentation, it would not hurt to be more explicit!) The second observation was that S2N will select crypto configuration
(from the configurations suggested by the client), in a strange order. For ephemeral elliptic curve Diffie-Hellman (ECDHE) with 128 bit AES then SHA1 is
preferred over SHA256 as the MAC algorithm. But for RSA with 128 AES then SHA256 is preferred over SHA1.

For the Erlang SSL/TLS implementation we actually found a real bug. We found a CLIENT_HELLO message that was incorrectly rejected by the server. Upon closer inspection it turned out that whenever the lower 16-bits of the GMT time was zero *and* the first two bytes of the client random material that goes in the message (read as a 16 bit UINT) match the length of the rest of the message it is rejected. The reason is that the message is then, incorrectly, interpreted as a SSLv2 message, and SSLv2 is deprecated and such messages will be rejected. This bug is reported to the Erlang OTP team. A patch is already in place and will be included in a coming release.

Conclusion

Modelling only the basic functionality of the protocol we did not expect to find any major bugs or deviations. The tested (server-) implementations can all be considered very mature and since they are all well used any error in the commonly used core would have been exposed long ago. Still we did find a real error and a couple of smaller deviations/documentation issues – which goes to show that it is an effective way to test an implementation.

We generate random tests in such a way that dependencies that cannot be random, but depend on the state we are in, are given sensible values. In this way we quickly get the implementation under test in interesting states. We expect QuickCheck testing to find even more interesting things if we start implementing a number of extensions of this protocol.