
Thursday, June 9, 2016

How to fake tests (Part 2)

In the last post I described how to write fake tests to satisfy a number-of-tests KPI. Obviously this is not good practice for software craftsmen. Unfortunately, some organisations value KPIs more than good craftsmanship and can be tricked by fake tests quite easily. So in today's post I'd like to show you how to fake the line and condition coverage of tests. This is a call to action for decision makers who base their decisions on such numbers: don't trust them. And for developers: if you encounter things like the following (or like in the last post), fix them. So let's start with line coverage.

Faking Line Coverage

Line coverage is a metric that measures how many and which lines have been executed while running the tests. There are various tools to measure coverage:
  • JaCoCo – measures on bytecode level, which has the advantage that you can test your actual artifacts, but bytecode can differ from its source at times.
  • eCobertura, Clover – measure on source-code level, which is more precise than bytecode measuring but injects additional code before compilation, ending up in artifacts different from those you want to deliver.
When running your tests with line coverage enabled, all lines touched are recorded to produce the metric. If you have 0% line coverage, you didn’t run any code. So let’s extend our test to get some coverage:

@Test
public void test() {
  subject.invokeSomeMethod();
}

Obviously this test is broken because it cannot fail – unless the code itself throws an exception. But with tests like these you can quite easily achieve high line coverage and a stable test suite.
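To make the example self-contained, here is what the class under test might look like – a minimal sketch in which the Subject class and its method are invented for illustration:

public class Subject {

  public String invokeSomeMethod() {
    String result = "done"; // recorded as covered once the test runs...
    return result;          // ...even though no test ever checks the returned value
  }
}

Every line executed by invokeSomeMethod() counts towards the metric, regardless of whether its effects are ever verified.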
But typical programs are rarely linear; they contain some sort of loop or branch constructs, so it's unlikely that you reach 100% line coverage this way. Hence we have to fake branch coverage, too.

Faking Condition Coverage

Let's assume our simple program consists of the following code:

Object compute(Object input) {
  if ("left".equals(input)) {
    return "right";
  }
  return "left";
}

It has one condition with two branches. With a single test, you may get 66% line coverage (two of the three executable lines) and 50% condition coverage (one of the two branches). I’ve experienced several times that branch coverage is perceived as “better” or of “more value” because it’s harder to achieve. If “harder” means “more code”, that’s certainly true, but branch coverage suffers from the same basic problem as line coverage: it’s just a measure of which code is executed, not of how good your tests are. Which of the two is harder to achieve also depends on the code base. If the happy flow you test covers only a minor part of the code, you may have 50% branch coverage but only 10% line coverage. Given the above example, assume the “left” branch contains 10 lines of code, but you only test for the “right” branch.

But as we are developers who want to make managers happy, let’s fake branch coverage!
Given that we only exercise a single flow per test, we need two tests:

@Test
public void testLeft() {
  Object output = compute("left"); // result is never checked
}

@Test
public void testRight() {
  Object output = compute("right"); // result is never checked
}

These tests will produce 100% branch and line coverage and are very unlikely to ever fail.
But again: they are worthless, because we don’t check any output of the operation. The operation may return anything without failing the tests. Still, in terms of KPI metrics we achieved:
  • 2 tests for 1 method (great ratio!)
  • 100% line coverage
  • 100% condition coverage
What we’re missing is an assertion. Assertions postulate the expected outcome of an operation; if the actual outcome differs from the expected one, the test fails. Theoretically it would be possible to count assertions per test via static code analysis, but I’ve never seen such a metric, although its value would be similar to line or condition coverage. Nevertheless: we can fake it!
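For contrast, here is what tests with real assertions on the compute() method might look like – a minimal sketch using JUnit 4's org.junit.Assert:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class ComputeTest {

  @Test
  public void testLeft() {
    // fails as soon as compute() returns anything other than "right"
    assertEquals("right", compute("left"));
  }

  @Test
  public void testRight() {
    assertEquals("left", compute("right"));
  }

  // compute() as defined above, repeated so the sketch is self-contained
  Object compute(Object input) {
    if ("left".equals(input)) {
      return "right";
    }
    return "left";
  }
}

These two tests still yield 100% line and branch coverage, but now they actually pin down the behaviour of the operation.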

So in the next post, I'll show you how to fake assertions.

Tuesday, May 31, 2016

How to fake tests (Part 1)

In most projects, metrics play an important role in determining the status, health, quality etc. of the project. More often than not, the common metrics for quality are:
  • Number of Unit Tests (Total, Failed, Successful)
  • Line Coverage
  • Branch Coverage
Usually those “KPIs” (Key Performance Indicators) are used by “managers” to steer the project to success. The problem with these metrics is that they are totally useless when taken out of context – and the context is usually not that well defined in terms of metrics, but often requires knowledge of and insight into the system being measured.

This post sets out to show how to game the system and life-hack those KPIs to fake good quality. It’s NOT a best practice, but a heads-up to those who make decisions based on these metrics: look behind the values.

Faking Number of Unit Tests

Most (if not all?) test frameworks count the number of tests executed, failed and succeeded. A high number of tests is usually perceived as a good indicator of quality, and the increase in the number of tests is expected to correlate with the increase in lines of code (another false-friend KPI). But what is counted as a test?

Let’s look at JUnit, which is the de-facto standard for developing and executing Java-based unit tests; other frameworks such as TestNG follow similar concepts.

In JUnit 3, a test was every parameterless public void method whose name starts with “test” in a class extending TestCase. Since JUnit 4, every method annotated with @Test counts as a test.
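Side by side, the two conventions look like this (the class and method names are invented for illustration):

// LegacyTest.java – JUnit 3: a parameterless public void method whose
// name starts with "test", in a class extending TestCase
import junit.framework.TestCase;

public class LegacyTest extends TestCase {
  public void testSomething() {
  }
}

// ModernTest.java – JUnit 4: the @Test annotation alone marks the test;
// the method name no longer matters
import org.junit.Test;

public class ModernTest {
  @Test
  public void anyNameWorks() {
  }
}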

That’s it. Just a naming convention or an annotation and you have your test, so let’s fake it!

@Test
public void test() {

}

This is pure gold: a stable and ever succeeding Unit Test!

Copy and paste or even generate those (see the sketch below) and you produce a test suite satisfying these criteria:
  • Big – tons of tests, probably even more than you have LoCs
  • Stable – none of these tests is failing. Ever.
  • Fast – you get feedback about the success within seconds.
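Generating them is just as trivial. Here is a minimal sketch of what such a generator might look like – the class name, output path and test count are all made up for illustration:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical generator for fake tests, purely to show how cheap the
// number-of-tests KPI is to game.
public class FakeTestGenerator {

  public static void main(String[] args) throws IOException {
    Path testDir = Paths.get("src/test/java");
    Files.createDirectories(testDir);
    for (int i = 0; i < 1000; i++) {
      String className = "Generated" + i + "Test";
      String source = "import org.junit.Test;\n"
          + "public class " + className + " {\n"
          + "  @Test public void test() { }\n" // empty body: always green
          + "}\n";
      Files.write(testDir.resolve(className + ".java"), source.getBytes());
    }
  }
}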
The downside: it’s worthless (surprise, surprise!). There are two primary reasons why:
  • It doesn’t run any code
  • It doesn’t pose any assertion about the outcome
Line or condition coverage analysis is a good indicator for detecting the first one. The latter is more difficult to check.

In the upcoming posts we'll have a look into both.