Software Testing

I’ve spent a fair amount of time working on a range of software testing, and have formed some opinions along the way. I’ve been on teams where both my feelings and empirical evidence suggested that tests were providing a lot of value, and on other teams where large amounts of effort were spent on tests of questionable value.

Likely as a result of agile software development there seems to be widespread adoption of testing (which is good), but much of that testing focuses far too much on hitting some metric or other by testing what the system is expected to do, rather than exercising and exorcising the dark corners of undesired behavior. Overreliance on some testing methods can also lead to tests that depend on advanced testing frameworks, which remove the system under test from a representative environment. That power can also allow developers to leave code needlessly complex rather than thinking through how to design it to be simpler and more inherently testable.

System Openness

A first step when considering testing strategies is to match the test approach to how the system will be used. There are at least two general classes of software: that which operates in a closed system and that which is open.

Closed Systems

An example of a closed system is the code behind this Web site; if the Web site is generated properly then any code within the site is evidently functioning properly. In the past I’ve built plenty of internal tools which also fall into this category: there’s some smallish amount of trusted input which is discretely handled, and any issue would result in a detectable failure which could be addressed as it arose at minimal cost. Such systems may need little to no testing if they remain simple enough that encountered errors would not be mysterious.

A strategy I’ve used for such systems with a relatively wide internal user base (or sometimes only myself) is a form of blackbox testing that I personally call I/O testing, since I don’t know of a better name (there likely is one). It is basically end-to-end snapshot testing: define pairs of inputs and expected outputs and use some basic tooling (bash and diff do the trick) to verify that each input produces the desired output. The granularity of such tests matches that of the system as a whole, in that the value of the system can be entirely defined in terms of the evident output produced from inputs. The suite can include a range of torture tests that work through edges and corners, and it provides a very straightforward way to reproduce and address any issues that may arise. It can also lend itself to some basic performance testing.
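As a rough illustration of the input/expected-output pairing described above, here is a minimal sketch in Python rather than bash and diff. The `run_io_cases` name and the `*.in` / `*.out` file naming are assumptions made for this example, not a standard convention; the idea is only that each input file is fed to the program on stdin and its stdout is diffed against the stored expectation.

```python
import difflib
import subprocess
from pathlib import Path


def run_io_cases(command, cases_dir):
    """Run `command` once per *.in file and diff stdout against the matching *.out file.

    Hypothetical helper for illustration. Returns a dict mapping each case
    name to a unified diff string; an empty string means the case passed.
    """
    results = {}
    for input_path in sorted(Path(cases_dir).glob("*.in")):
        expected_path = input_path.with_suffix(".out")
        expected = expected_path.read_text()
        # Feed the input file to the program on stdin and capture its stdout.
        with input_path.open() as stdin:
            completed = subprocess.run(
                command, stdin=stdin, capture_output=True, text=True
            )
        diff = difflib.unified_diff(
            expected.splitlines(keepends=True),
            completed.stdout.splitlines(keepends=True),
            fromfile=str(expected_path),
            tofile="actual",
        )
        results[input_path.stem] = "".join(diff)
    return results
```

A call such as `run_io_cases(["./mytool"], "cases")` (both names are placeholders) would return one entry per case, and any non-empty diff shows exactly where the output drifted from the snapshot. Adding a new torture test is then just a matter of dropping another pair of files into the directory.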

Open Systems

While closed systems may allow for looser testing, software that pays the bills is almost certainly open. Such software exists to support whatever inputs users want to throw at it, and it should be expected to do something appropriate regardless of what that input looks like. Acquiring confidence in such systems requires understanding not just how the expected cases are handled but how any possible scenario would be handled (or prevented). These systems are likely to benefit from an arsenal of different types of tests that exercise the system and its integrations as completely as possible. This is crucial not only to contain the issues that are encountered but also to curb the far more insidious issues which aren’t: those lurking undetected, waiting to manifest as knock-on issues or an eroded user experience.
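One inexpensive way to probe "whatever inputs users want to throw at it" is to generate arbitrary input and check only that the system stays within its contract. The sketch below is an illustration under stated assumptions: `parse_quantity` is a hypothetical entry point invented for this example, and its contract (return an int from 1 to 1000 or raise `ValueError`) is likewise assumed. It uses only the standard library; a dedicated property-based testing library would be a reasonable substitute.

```python
import random
import string


def parse_quantity(text):
    """Hypothetical open-system entry point: parse a user-supplied order quantity.

    Contract assumed for this sketch: return an int in 1..1000 or raise ValueError.
    """
    cleaned = text.strip()
    if not cleaned.isascii() or not cleaned.isdigit():
        raise ValueError(f"not a whole number: {text!r}")
    value = int(cleaned)
    if not 1 <= value <= 1000:
        raise ValueError(f"out of range: {value}")
    return value


def fuzz_parse_quantity(iterations=2000, seed=0):
    """Throw arbitrary short strings at parse_quantity and collect contract violations."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    # Printable ASCII plus a few awkward characters: a superscript digit,
    # an Arabic-Indic digit, a zero-width space, and a NUL byte.
    alphabet = string.printable + "\u00b2\u0663\u200b\x00"
    violations = []
    for _ in range(iterations):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 12)))
        try:
            result = parse_quantity(text)
        except ValueError:
            continue  # rejecting bad input is acceptable behavior
        except Exception as exc:  # any other exception breaks the contract
            violations.append((text, repr(exc)))
            continue
        if not (isinstance(result, int) and 1 <= result <= 1000):
            violations.append((text, result))
    return violations
```

The test does not assert what any particular input should produce. It asserts only that nothing outside the contract ever happens, which is the kind of question the happy-path cases never ask.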

The intent of this distinction is largely to explain the limited testing for closed systems, but the gaps around testing open systems are a far more serious issue, as alluded to in the introduction. I’ve witnessed cases where open systems are treated as closed systems, which yields a mentality where appropriate behavior at the moment is deemed to indicate correctness over time (such as considering a snapshot of production data representative of the full range of expected inputs and treating any unexpected cases as one-off anomalies). More commonly, systems are tested with fairly simplistic and idealized cases which do little to cover the range of interactions and environmental impacts the system may endure in the wild. Any such deficiencies could likely be remedied by honestly appraising how well the tests predict the system’s behavior across the different ways and environments in which it may be used, rather than under narrower contrived use.

A caveat to this position is that it could be interpreted as suggesting that tests should target more representative environments. That approach is likely to be far more cumbersome, and often impractical for scenarios such as those that are guarded against but remain possible. Strategies such as the test pyramid should still be preferred to provide test coverage while containing the cost of the tests, so this is more a matter of which cases are fit into such approaches than an alternative approach.