Wednesday, June 29, 2016

How to detect fake tests - Introduction to Mutation Testing

In the last posts (1, 2, 3) I showed various ways of producing fake tests. Of course, good developers won't fake their tests, and the chance of encountering a test suite made purely of fake tests in real life is rather low. Nevertheless, in certain environments it may occasionally happen that metrics are polished for various reasons. It's more likely, though, that the quality of a test suite deteriorates over time for various reasons, e.g. project pressure, sloppy moments during coding, wrong assumptions, etc. And typically we rely on metrics to determine whether our project is in good shape.

My intention for the last three posts was to show how easily the common metrics - test count, line and condition coverage - can be tricked, and that they are of very low value without the proper context. They are as good for determining the health of a software project as lines of code are. They might be a weak indicator, but nothing more.

The main question is: how can we determine the actual value of our tests and test suites? How would others do it? Fire brigades test their procedures and techniques on a real fire. The military holds maneuvers, martial arts fighters test their skills in championships, and Netflix lets the Chaos Monkey terminate instances to detect holes in its recovery procedures.

What is the main reason to have automated tests? To detect bugs that slipped into existing code unintentionally. It doesn't matter whether you wrote the tests beforehand by practicing Uncle Bob style TDD or afterwards to create a safepoint for your code. The base assumption is that once you've written your code and your tests, the code is free of errors. But it's called software for a reason: it may change over time. The once written, error-free code will eventually be changed. To ensure it is still functional, the test suites are run, and if everything is green, nothing was broken. But how can you be sure of that?

The only way to verify that your test suite is capable of detecting bugs is to introduce bugs into your code.

The technique of altering your code and re-running your test suite to verify that the test suite detects the code change is called Mutation Testing. The concept has been known for quite a while, but it was mostly a subject of academic research, with tools that were somewhat theoretical and less practical to use. With the arrival of Pitest (pitest.org), a practical, stable and well-integrated tool has become available that should be in every developer's toolbox.

Pitest mutates bytecode and runs highly parallel, making it the fastest mutation testing tool for the JVM. Pitest offers a set of Mutation Operators that modify bytecode according to a defined ruleset and thus create a modified version of the code, a Mutation. The test suite is run again, and if at least one test fails, the Mutation is killed. In the end, the Mutation Score is calculated as the ratio of killed mutations to the total number of mutations.
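To make the terms concrete, here is a minimal hand-rolled sketch of the idea in plain Java. It does not use Pitest and does not touch bytecode; the mutants are written by hand, and all names (MutationSketch, WEAK_TEST, STRONG_TEST, ...) are made up for illustration. It only shows what "killed" and "Mutation Score" mean.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Hand-rolled illustration of mutation testing; all names are hypothetical.
class MutationSketch {

    // The original code under test: a simple addition.
    static final IntBinaryOperator ORIGINAL = (a, b) -> a + b;

    // Hand-written mutants, each differing from the original in one operator.
    static final List<IntBinaryOperator> MUTANTS = List.of(
            (a, b) -> a - b,   // '+' replaced by '-'
            (a, b) -> a * b);  // '+' replaced by '*'

    // A weak test: 2 + 2 equals 2 * 2, so it cannot tell '+' from '*'.
    static final Predicate<IntBinaryOperator> WEAK_TEST =
            op -> op.applyAsInt(2, 2) == 4;

    // A stronger test: on these inputs all three variants give different results.
    static final Predicate<IntBinaryOperator> STRONG_TEST =
            op -> op.applyAsInt(2, 3) == 5;

    // A mutant is killed if at least one test of the suite fails against it.
    static boolean isKilled(IntBinaryOperator mutant,
                            List<Predicate<IntBinaryOperator>> suite) {
        return suite.stream().anyMatch(test -> !test.test(mutant));
    }

    // Mutation score = killed mutants / total mutants.
    static double mutationScore(List<Predicate<IntBinaryOperator>> suite) {
        long killed = MUTANTS.stream().filter(m -> isKilled(m, suite)).count();
        return (double) killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.println(mutationScore(List.of(WEAK_TEST)));              // prints 0.5
        System.out.println(mutationScore(List.of(WEAK_TEST, STRONG_TEST))); // prints 1.0
    }
}
```

Both tests are green against the original code, yet the weak test alone lets the '*' mutant survive. A surviving mutant is exactly the kind of gap that line or branch coverage cannot show.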

Unlike line or branch coverage, which can be determined with a single test suite execution, Pitest requires one test suite execution per mutation. With larger code bases the execution time grows rapidly, because both the number of mutations and the number of tests to run against each of them increase with the amount of code. Although Pitest offers a variety of settings and options to limit execution time - e.g. delta execution, selection of mutation operators, exclusion of classes, to name a few - it requires some thorough planning of how this technique should be incorporated into the CI/CD pipeline. The value it delivers comes at a price.

In the next post of this series, I will show how to set up and run Pitest with practical examples, so stay tuned.

Wednesday, June 22, 2016

How to fake tests (Part 3)

In this 3rd part of the series I want to show how assertions can be faked, so that not only lines and branches get covered but the tests themselves also contain some assertions.
Faking assertions only makes sense if a metric such as "assertions/test" is computed at all. Otherwise you may skip that part, because every proper code review would reveal your test as fake.
Test libraries such as JUnit or TestNG contain various means for expressing assertions. In addition, some frameworks exist for that sole purpose, e.g. Hamcrest or Truth, to name a few. The basic approach for all of them is to invoke the system under test (generating coverage information) and to verify the outcomes against assertions.
But for the test to succeed, the values being asserted don’t have to be related to the outcome of that invocation. So all of the following assertions might do the trick:

assertTrue(true);
assertNotNull(new Object()); // a real-life example I’ve encountered during a code review
assertEquals("2","2");
…
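Why such assertions are worthless can be shown with a small sketch in plain Java. It uses no test library: assertEquals is a hand-written stand-in for a library assertion, and brokenCompute is a hypothetical, deliberately wrong implementation, both assumed only for this illustration.

```java
// Plain-Java sketch; assertEquals is a hypothetical stand-in for a test library's assertion.
class FakeAssertionSketch {

    // Deliberately broken implementation: it is supposed to return "right" for "left".
    static Object brokenCompute(Object input) {
        return "wrong";
    }

    // Minimal assertion helper that fails with an AssertionError on mismatch.
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    // Fake test: invokes the code (for coverage) but asserts a tautology.
    static boolean fakeTestPasses() {
        try {
            brokenCompute("left");
            assertEquals("2", "2");
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    // Real test: the assertion is made on the actual outcome.
    static boolean realTestPasses() {
        try {
            assertEquals("right", brokenCompute("left"));
            return true;
        } catch (AssertionError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fake test passes: " + fakeTestPasses()); // prints true
        System.out.println("real test passes: " + realTestPasses()); // prints false
    }
}
```

The fake test stays green although the code is wrong, while it still contributes one assertion and some coverage to the metrics.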

After having applied fake assertions and fake coverage, our test suite satisfies the following criteria:

Big, lots of tests for all the methods
100% Line Coverage
100% Condition Coverage
Tests contain assertions (maybe 1 assertion/method as a metric)

This would make every project manager happy, because the quality of the product is so good and there is proof of that...

Not!

With minimum effort, you’ve probably produced the most sophisticated test suite with the best quality ratings that has no value at all (Achievement unlocked).

In the next post I'll show how all the fakes described in this and the earlier posts can be revealed as such - and, more importantly, how the effectiveness of a test suite can be determined and how gaps in a sensible test suite can be found. So stay tuned.

Thursday, June 9, 2016

How to fake tests (Part 2)

In the last post I described how to write fake tests to satisfy a number-of-tests KPI. Obviously this is not a good practice for software craftsmen. Unfortunately, some organisations value KPIs more than good craftsmanship and may simply be tricked by fake tests. So in today's post I'd like to show you how to fake line and condition coverage of tests. This is a call to action for decision makers who base their decisions on such numbers: don't trust them. And for developers: if you encounter things like the following (or like in the last post), fix them. So let's start with line coverage.

Faking Line Coverage

Line Coverage is a metric that measures how many and which lines have been covered during execution. There are various tools to measure coverage.

JaCoCo – measures on bytecode level, which has the advantage that you can test your actual artifacts, but bytecode can differ from its source at times.
ECobertura, Clover – measure on source code level, which is more precise than bytecode measuring but injects additional code before compilation, ending up in different artifacts than the ones you want to deliver.

When running your tests with line coverage enabled, all lines touched are recorded to produce the metric. If you have 0% line coverage, you didn’t run any code. So let’s extend our test to get some coverage:


@Test
public void test() {
  subject.invokeSomeMethod();
}

Obviously this test is broken because it cannot fail – unless the code itself throws an exception. But with tests like these you can quite easily achieve high line coverage and a stable test suite.
But typical programs are rarely linear and contain some sort of loop or branch constructs, so it’s unlikely that you reach 100% line coverage this way. So we have to fake branch coverage, too.

Faking Condition Coverage

Let’s assume our simple program consists of the following code:


Object compute(Object input) {
  if ("left".equals(input)) {
    return "right";
  }
  return "left";
}

It has one condition with two branches. With a single test, you may get 66% line coverage and 50% condition coverage. I’ve experienced several times that branch coverage is perceived as “better” or of “more value” because it’s harder to achieve. If “harder” means “more code”, that’s certainly true, but branch coverage suffers from the same basic problem as line coverage: it’s just a measure of which code is executed, not of how good your tests are. Which of the two is harder to achieve also depends on the code base. If the happy flow you test covers only a minor part of the code, you may have 50% branch coverage but only 10% line coverage. Given the above example, assume the “left”-branch contains 10 lines of code, but you only test for the “right”-branch.
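The 66% and 50% figures can be reproduced with a hand-instrumented copy of compute. This is only a sketch of the counting: real coverage tools instrument the code automatically, and the bookkeeping names used here (LINES_HIT, BRANCHES_HIT, the line labels) are made up for the illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hand-instrumented version of compute(); the bookkeeping is hypothetical.
class CoverageSketch {

    static final Set<String> LINES_HIT = new LinkedHashSet<>();
    static final Set<String> BRANCHES_HIT = new LinkedHashSet<>();
    static final int TOTAL_LINES = 3;     // the if and the two return statements
    static final int TOTAL_BRANCHES = 2;  // condition is true, condition is false

    static Object compute(Object input) {
        LINES_HIT.add("if");
        if ("left".equals(input)) {
            BRANCHES_HIT.add("true");
            LINES_HIT.add("return right");
            return "right";
        }
        BRANCHES_HIT.add("false");
        LINES_HIT.add("return left");
        return "left";
    }

    // Integer percentages, rounded down.
    static int lineCoveragePercent() {
        return 100 * LINES_HIT.size() / TOTAL_LINES;
    }

    static int branchCoveragePercent() {
        return 100 * BRANCHES_HIT.size() / TOTAL_BRANCHES;
    }

    // Forget everything recorded so far.
    static void reset() {
        LINES_HIT.clear();
        BRANCHES_HIT.clear();
    }

    public static void main(String[] args) {
        reset();
        compute("left");   // a single "test" without any check
        System.out.println(lineCoveragePercent() + "% lines, "
                + branchCoveragePercent() + "% branches");   // prints 66% lines, 50% branches
        compute("right");  // the second "test"
        System.out.println(lineCoveragePercent() + "% lines, "
                + branchCoveragePercent() + "% branches");   // prints 100% lines, 100% branches
    }
}
```

Note that the numbers are produced without a single check of the returned value: coverage only records what was executed.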

But as we are developers who want to make managers happy, let’s fake branch coverage!
Given that we only test a single flow in a single test, we need two tests:


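@Test
public void testLeft() {
  // compute() returns Object; the result is stored but never checked
  Object output = compute("left");
}
@Test
public void testRight() {
  // second flow, again without any check of the result
  Object output = compute("right");
}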

These tests produce 100% branch and line coverage and are very unlikely to fail, ever.
But again: they’re worthless, because we don’t check any output of the operation. The operation may return anything without failing the tests. Still, in terms of KPI metrics we achieved:

2 tests for 1 method (great ratio!)
100% line coverage
100% condition coverage
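That the operation may return anything can be illustrated with a plain-Java sketch that runs the same two invocations against a correct and a broken implementation. The names (NoAssertionSketch, CORRECT, BROKEN and the two helper methods) are assumptions made for this example only.

```java
import java.util.function.UnaryOperator;

// Plain-Java sketch; the implementations and helper names are hypothetical.
class NoAssertionSketch {

    // Behaves like compute() from the example above.
    static final UnaryOperator<Object> CORRECT =
            input -> "left".equals(input) ? "right" : "left";

    // Broken variant that returns null for every input.
    static final UnaryOperator<Object> BROKEN = input -> null;

    // Mirrors testLeft/testRight: invoke the operation and ignore the result.
    static boolean assertionFreeTestsPass(UnaryOperator<Object> compute) {
        try {
            compute.apply("left");
            compute.apply("right");
            return true;
        } catch (RuntimeException e) {
            return false; // the only way these tests can fail
        }
    }

    // Same invocations, but the outcome is compared with the expectation.
    static boolean assertingTestsPass(UnaryOperator<Object> compute) {
        return "right".equals(compute.apply("left"))
                && "left".equals(compute.apply("right"));
    }

    public static void main(String[] args) {
        System.out.println(assertionFreeTestsPass(CORRECT)); // prints true
        System.out.println(assertionFreeTestsPass(BROKEN));  // prints true
        System.out.println(assertingTestsPass(CORRECT));     // prints true
        System.out.println(assertingTestsPass(BROKEN));      // prints false
    }
}
```

Only the variant that compares the outcome with an expectation notices the broken implementation.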

What we are missing is an assertion. Assertions postulate expected outcomes of an operation. If the actual outcome differs from the expected one, the test fails. Theoretically it would be possible to count assertions per test using static code analysis. I’ve never seen such a metric, although its value would be similar to that of line or condition coverage. Nevertheless: we can fake it!

So in the next post, I'll show you how to fake assertions.

Tuesday, May 31, 2016

How to fake tests (Part 1)

In most projects, metrics play an important role in determining the status, health, quality etc. of the project. Quite often the common metrics for quality have been:

Number of Unit Tests (Total, Failed, Successful)
Line Coverage
Branch Coverage

Usually those “KPIs” (Key Performance Indicators) were used by “managers” to steer the project to success. The problem with these metrics is: they are totally useless if taken out of context - and the context is usually not that well defined in terms of metrics, but often requires knowledge of and insight into the system that’s being measured.

This post shows how to game the system and life-hack those KPIs to fake good quality. It’s NOT a best practice but a heads-up to those who make decisions based on those metrics to look behind the values.

Faking Number of Unit Tests

Most (if not all?) frameworks count the number of tests executed, failed and succeeded. A high number of tests is usually perceived as a good indicator of quality. The increase in the number of tests should correlate with the increase in lines of code (another false-friend KPI). But what is counted as a test?

Let’s look at JUnit, which is the de-facto standard for developing and executing Java-based unit tests; other frameworks such as TestNG follow similar concepts.

In JUnit 3 it was every parameterless public void method starting with “test” in a class extending TestCase. Since JUnit 4 every method annotated with @Test counts as a test.

That’s it. Just a naming convention or an annotation and you have your test, so let’s fake it!
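How little it takes to be counted can be sketched with a few lines of reflection. This is not JUnit's actual discovery code: FakeTest is a hypothetical stand-in for the @Test annotation, and the JUnit 3 requirement of extending TestCase is left out to keep the example free of dependencies.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Simplified sketch of how a framework might count tests; not JUnit's real code.
class TestCountSketch {

    // Hypothetical stand-in for JUnit 4's @Test annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface FakeTest {}

    // Sample class: three empty "tests" and one helper method.
    static class EmptyTests {
        public void testNothing() {}
        public void testStillNothing() {}
        @FakeTest public void nothingAtAll() {}
        public void helper() {}
    }

    // JUnit 3 style: public, void, no parameters, name starts with "test".
    static long countByNamingConvention(Class<?> type) {
        long count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())
                    && m.getReturnType() == void.class
                    && m.getParameterCount() == 0
                    && m.getName().startsWith("test")) {
                count++;
            }
        }
        return count;
    }

    // JUnit 4 style: the annotation alone decides.
    static long countByAnnotation(Class<?> type) {
        long count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(FakeTest.class)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countByNamingConvention(EmptyTests.class)); // prints 2
        System.out.println(countByAnnotation(EmptyTests.class));       // prints 1
    }
}
```

Neither check looks at what the method body does, so empty methods count just like real tests.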

@Test
public void test() {

}

This is pure gold: a stable and ever succeeding Unit Test!

Copy and paste or even generate those and you produce a test suite satisfying the criteria:

Big, tons of tests, probably even more than you have LoC
Stable, none of these tests is failing. Ever.
Fast, you have feedback about the success within seconds.

The downside: it’s worthless (surprise, surprise!). There are basically two primary reasons why it’s worthless:

It doesn’t run any code
It doesn’t pose any assertion about the outcome

Good indicators to check the first one are line or condition coverage analysis. The latter is more difficult to check.

In the upcoming posts we'll have a look into both.

Saturday, December 19, 2015

Scribble 0.3.0

I am proud to announce a new version of the Scribble testing library! The biggest changes are the new modularization and documentation. For every functional aspect there is now a separate module, so that you don't have to include a whole load of unused dependencies in your project if you only require a single functional aspect. In addition, the entire project documentation is now kept in the source and can be generated using Maven's site support. This includes the wiki documentation as well, although the publishing process is not yet part of the release build jobs.
As new features for testing, I introduce an HTTP server as a TestRule that can be set up in various ways to serve static content. It's still rather limited, but will be continuously improved in future releases. Further features are the possibility to create temporary zip files, to record System.out and System.err via a TestRule, and to capture and restore System Properties - a simple rule that helps keep the test environment clean - and finally a matcher for matching date strings against a date format.

For more information, have a look at the wiki or find the source code on GitHub.

Task

[SCRIB-55] - Modularize Scribble

Story

[SCRIB-35] - Embedd static HTTP content as a rule
[SCRIB-43] - Build documentation as part of the release
[SCRIB-49] - Create zipped temp file from resources
[SCRIB-50] - Date Format Matcher
[SCRIB-52] - Rule for capturing System.out and System.err
[SCRIB-53] - Rule for setting and restoring System Properties

Bug

[SCRIB-39] - ConfigPropertyInjection#isMatching sets default value
[SCRIB-51] - TemporaryFile not usable as ClassRule
[SCRIB-57] - ApacheDS all prevents exclusion of modules
[SCRIB-58] - Remove SLF4J Binding dependencies
[SCRIB-59] - DirectoryServer/DirectoryService not working as ClassRule

Wednesday, July 8, 2015

Scribble Release 0.2.0

I am proud to announce a new version of the Scribble testing library! The new version has support for an embedded LDAP server, which allows writing tests against an LDAP server without having to rely on existing infrastructure. Further, the JCR support has been improved: it's now possible to pre-initialize a JCR repository with content from a descriptor file and to create a security-enabled in-memory repository. Some additional improvements have been made to the CDI injection support, and the matchers have been extended with availability checks for URLs.

For more information, have a look at the wiki or find the source code on GitHub.

Release Notes - Scribble - Version 0.2.0

Bug

[SCRIB-31] - Primitive types not support for ConfigProperty injection
[SCRIB-32] - String to Number conversion of default values in ConfigProperty injection fails
[SCRIB-41] - LDAP Rules are not properly applied
[SCRIB-42] - ResourceAvailabilityMatcher is not compatible with URL
[SCRIB-48] - Directory Rules can not be used as ClassRules

Story

[SCRIB-1] - Builder support for LDAP Server and Service
[SCRIB-2] - Make LDAP Port configurable
[SCRIB-5] - Matchers for availability of an URL
[SCRIB-10] - Support for prepared JCR Content
[SCRIB-12] - Support security enabled content repositories
[SCRIB-14] - Add Convenience method for admin login
[SCRIB-33] - Convenience Methods for Directory creation
[SCRIB-34] - Convenience Method for anonymous login
[SCRIB-38] - Supply package-info.java

Friday, May 29, 2015

Multi-Module Integration Test Coverage with Jacoco and Sonar

Yesterday I struggled to capture IT coverage results in a multi-module project setup, a problem I eventually solved.

So let's assume I have the following setup:

rootModule
+Module1
+Module2
| +SubModule2-1
| +SubModule2-1-1
| +SubModule2-2
+ITModule
+ITModule1

The ITModule contains only integration tests, where ITModule1 is a special scenario that requires a single module. Module2 consists of nested submodules. There are several examples out there that use a path like ../target/jacoco-it.exec, but that's obviously not working if you have more than one nesting level.

To know how to solve it, you must understand how Sonar does the analysis. When analyzing the coverage information, Sonar checks the code of each module against the coverage file that is specified in the sonar.jacoco.itReportPath property, which defaults to target/jacoco-it.exec. So when analyzing Module1, it checks for coverage info in Module1/target/jacoco-it.exec. But as the coverage data is captured in the ITModule, respectively ITModule1, I have to point Sonar to the file generated in the IT module.
So the best location to gather the coverage data is the rootModule, i.e. rootModule/target/jacoco-it.exec, and to append the results of all IT tests to that file.

I use the following plugin configuration, which uses separate files for unit-test coverage and the central file for IT coverage. Don't forget the append flag, otherwise the overall coverage will be incorrect.

<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.7.4.201502262128</version>
  <executions>
    <execution>
      <id>prepare-agent</id>
      <goals>
        <goal>prepare-agent</goal>
      </goals>
      <configuration>
        <destFile>target/jacoco.exec</destFile>
        <append>true</append>
        <propertyName>surefireArgLine</propertyName>
      </configuration>
    </execution>
    <execution>
      <id>prepare-it-agent</id>
      <phase>pre-integration-test</phase>
      <goals>
        <goal>prepare-agent</goal>
      </goals>
      <configuration>
        <destFile>${session.executionRootDirectory}/target/jacoco-it.exec</destFile>
        <append>true</append>
        <propertyName>failsafeArgLine</propertyName>
      </configuration>
    </execution>
  </executions>
 </plugin>

The ${session.executionRootDirectory} property is the root of the execution: when I build the entire project, it points to the rootModule. So this is the best path to use when you have a multi-module project with more than one level of nesting. Note that the propertyName settings make the plugin store the agent arguments in the properties surefireArgLine and failsafeArgLine, so the Surefire and Failsafe plugins have to be configured to use these properties as their argLine.

For the analysis, I need to point Sonar to that file when it analyzes IT coverage, so I have to set the sonar.jacoco.itReportPath property to that file. Unfortunately, this does not work with the session.executionRootDirectory property, so I have to set the absolute path to the file manually. I do not recommend specifying the absolute path in the pom.xml, as this path is specific to the build environment. So either set the path in Sonar or as a system property of your build environment. I set it directly in the Sonar project settings (Java > Jacoco), for example /opt/buildroot/myProject/target/jacoco-it.exec. Now Sonar will check that file for the IT coverage analysis of each module.

whoopdicity
