In this article, we discuss how to test code that depends on external infrastructure, such as cloud services, and how to decide when tests should use real infrastructure, emulators, or test doubles.
To better understand the matter at hand, imagine a background service or job (a system under test) that consumes messages containing JSON documents from a queue and subsequently stores these documents in cloud object storage such as Azure Blob Storage or AWS S3. Is there a way to test such a service end-to-end that scales both with the number of tests and the number of team members? Can anybody on the team easily run such a test on a local workstation?
To test such a service, typically one of the following strategies is used:
Let’s start the discussion by asking whether real infrastructure dependencies such as SQL databases, Azure’s Storage Accounts, Key Vaults, Service Bus, Event Hubs, or AWS’s S3, DynamoDB, Kinesis, or SES should be used during testing.
Based on our observations (to be taken with a grain of salt), there are two opposing camps of engineers with strong opinions on this topic.
On the more traditional side, there are engineers who intentionally design both production and test code so that external dependencies are used only in the latest stages of the testing pipeline (such as acceptance or system tests). These engineers are often willing to go to great lengths to abstract away access to all the external dependencies in the code and use mocks or other test doubles heavily. They attempt to cover as much of the codebase as possible with unit tests or other kinds of tests that do not require external dependencies.
On the opposite side, some engineers find almost zero value in any tests that do not run against a fully integrated system and disapprove of introducing abstractions into the code only to decouple logic from infrastructure access.
Obviously, both approaches have issues, and they both have merit. The optimal testing strategy will be a mixture of the two. However, selecting the optimal testing strategy in a given context is not a simple unidimensional decision; it requires evaluating the specific context of the given system and the team.
Now, let’s look into specific challenges we face when choosing one approach over the other.
When striving to use real infrastructure as little as possible (shifting right), most problems are caused by the introduced abstractions and the need for test doubles:
On the other hand, when using the real infrastructure as much as possible (shifting left), problems arise with provisioning the infrastructure and complexity of the test code:
Now, imagine, just for a moment, that all the problems with the real infrastructure in the test code disappear:
In such a hypothetical situation, would there be any reason not to use real infrastructure in all the tests? In our opinion, the answer is no, with one exception. When a test case needs to verify how the system under test responds to infrastructure faults or delays, using real infrastructure is, in most cases, not feasible. Otherwise:
Of course, this is not the reality we live in. However, with modern programming languages, infrastructure tools, cloud services, and testing frameworks, we are much closer to this ultimate state now (2024) than we were just a few years ago. This is especially true when compared to 10-20 years ago, when many best practices around software testing were devised. Despite these advancements, many people continue to dogmatically follow those older practices today.
We believe the advances in the tooling mentioned above will continue at the same or a higher pace, and thus, it is worth betting on this when designing a testing strategy. However, we are still far from the ideal situation, so using real infrastructure in tests should be carefully considered. This leads to the several principles explained below.
As with software architecture, the right testing strategy depends on many factors that continuously change. Factors to consider include:
For example, a `SELECT` query in a relational database is an operation that is both trivial to execute and easy to emulate without real infrastructure, thanks to existing tools in the area such as SQLite or Entity Framework. Manipulating blocks in Azure Storage blobs, on the other hand, is not so straightforward, and existing emulation tools are lacking.
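To illustrate the first case, the following sketch runs a real SQL engine entirely in memory via the EF Core SQLite provider; the `AppDbContext` and `Document` types are illustrative, not from the original text:

```csharp
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

public class Document
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Document> Documents => Set<Document>();
}

public static class SqliteEmulationExample
{
    public static async Task Run()
    {
        // Keep the connection open so the in-memory database lives
        // for the duration of the test.
        await using var connection = new SqliteConnection("DataSource=:memory:");
        await connection.OpenAsync();

        var options = new DbContextOptionsBuilder<AppDbContext>()
            .UseSqlite(connection)
            .Options;

        await using var dbContext = new AppDbContext(options);
        await dbContext.Database.EnsureCreatedAsync();

        dbContext.Documents.Add(new Document { Id = 1, Name = "test" });
        await dbContext.SaveChangesAsync();

        // The SELECT below is executed by a real SQL engine, with no
        // database server required.
        var loaded = await dbContext.Documents.SingleAsync(d => d.Id == 1);
    }
}
```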
This rephrasing of Albert Einstein's quote, "Everything should be made as simple as possible, but not simpler", is the main principle when deciding on a testing strategy. We should minimize the number of abstractions and test doubles (for the reasons explained above), but not beyond the threshold where various qualities of the tests would suffer. The test qualities that might be compromised include:
Designing for testability is a sound principle, battle-tested not only in software but also in hardware and other kinds of engineering. In software engineering, it is known not just for making testing more efficient but also for making the tested code itself better by:
Unfortunately, attempting to design code for testability also often leads to adding more layers of abstraction in a quest to make code that uses external infrastructure testable without the actual infrastructure. Such abstractions serve no function other than hiding access to the infrastructure; they have no meaning related to the purpose of the actual application.
We are convinced that such abstractions are pure incidental complexity and should be eliminated if possible. However, this does not mean giving up on the separation of concerns or breaking the layering/modularization rules imposed by our architecture of choice.
When using DDD, we are not advocating accessing infrastructure directly in the domain layer instead of using repositories. Repositories, event publishers, and the like have a special, well-understood function in DDD that goes beyond simply hiding infrastructure access. In the case of repositories, they are responsible for fetching domain model aggregates, which includes constructing and deconstructing domain objects and might involve orchestrating access to more than one external dependency.
In general, DDD’s domain layer is designed to be highly testable without reliance on real infrastructure. However, when it comes to the application or infrastructure layer, we advocate implementing at least a few tests that utilize real infrastructure or a reliable emulator. It is also crucial to avoid introducing abstractions, beyond those required by DDD, that serve only to hide access to infrastructure.
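To make the distinction concrete, here is a sketch of a repository whose job goes beyond hiding infrastructure access: it reconstructs an aggregate by orchestrating two external dependencies. All types and the schema are illustrative:

```csharp
using Azure.Storage.Blobs;
using Microsoft.Data.Sqlite;

// Illustrative aggregate; not from the original text.
public record Order(Guid Id, string CustomerName, IReadOnlyList<string> AttachmentNames);

// The repository constructs the aggregate by orchestrating two
// external dependencies: a relational database for the order data
// and blob storage for its attachments.
public class OrderRepository
{
    private readonly SqliteConnection _db;
    private readonly BlobContainerClient _attachments;

    public OrderRepository(SqliteConnection db, BlobContainerClient attachments)
    {
        _db = db;
        _attachments = attachments;
    }

    public async Task<Order?> GetAsync(Guid orderId)
    {
        // 1. Fetch the scalar data from the relational database.
        await using var command = _db.CreateCommand();
        command.CommandText = "SELECT customer_name FROM orders WHERE id = $id";
        command.Parameters.AddWithValue("$id", orderId.ToString());

        if (await command.ExecuteScalarAsync() is not string customerName)
        {
            return null;
        }

        // 2. List the attachments stored in blob storage.
        var attachmentNames = new List<string>();
        await foreach (var blob in _attachments.GetBlobsAsync(prefix: $"{orderId}/"))
        {
            attachmentNames.Add(blob.Name);
        }

        // 3. Reconstruct the domain object from both sources.
        return new Order(orderId, customerName, attachmentNames);
    }
}
```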
In software testing, many dogmas or “best practices” might have been valid only in the past or only in some specific contexts, but they are, even today, considered the only correct way to test code. We think it is very important to challenge these dogmas and find new ways that are adapted to a given system, team and state-of-the-art tools. Examples of such dogmas include:
Building on the previous example of a dogma, here is a sample code snippet in C# using ASP.NET, the Spotflow In-Memory Azure Test SDK, and the FluentAssertions library. This code demonstrates how straightforward it can be to test a web API that depends on Azure Blob Storage.
First, let’s introduce the production code that bootstraps and starts the web application. The application accepts HTTP PUT requests on the `/upload/{name}` route:
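A minimal sketch of such a bootstrap follows; the `BuildApp` helper, the route handler, and the environment-variable configuration are illustrative assumptions:

```csharp
using Azure.Storage.Blobs;

public static class Program
{
    public static void Main(string[] args)
    {
        // In production, the container client points to a real Azure
        // Storage account.
        var containerClient = new BlobContainerClient(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"),
            blobContainerName: "uploads");

        BuildApp(args, containerClient).Run();
    }

    // Accepting the client as a parameter lets tests pass in an
    // in-memory implementation instead.
    public static WebApplication BuildApp(string[] args, BlobContainerClient containerClient)
    {
        var builder = WebApplication.CreateBuilder(args);

        builder.Services.AddSingleton(containerClient);
        builder.Services.AddSingleton<UploadService>();

        var app = builder.Build();

        app.MapPut("/upload/{name}", async (string name, HttpRequest request, UploadService uploadService) =>
        {
            await uploadService.UploadAsync(name, request.Body);
            return Results.Ok();
        });

        return app;
    }
}
```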
The application makes use of an `UploadService` class for uploading the incoming data to Azure Blob Storage. This service internally uses a `BlobContainerClient` from the official Azure SDK to perform the actual infrastructure operations:
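A minimal sketch of what such a service might look like:

```csharp
using Azure.Storage.Blobs;

public class UploadService
{
    private readonly BlobContainerClient _containerClient;

    public UploadService(BlobContainerClient containerClient)
    {
        _containerClient = containerClient;
    }

    public async Task UploadAsync(string name, Stream content)
    {
        // Upload the request body as a blob with the given name.
        var blobClient = _containerClient.GetBlobClient(name);
        await blobClient.UploadAsync(content, overwrite: true);
    }
}
```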
Finally, let’s see what the test code looks like:
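Given the bootstrap and service sketches above, such a test might look as follows. The in-memory types come from the Spotflow In-Memory Azure Test SDK; the API shown here (`InMemoryStorageProvider`, `InMemoryBlobContainerClient.FromAccount`) follows the SDK's documented usage but may differ between versions, and xUnit is an assumption:

```csharp
using Azure.Storage.Blobs;
using FluentAssertions;
using Spotflow.InMemory.Azure.Storage;
using Spotflow.InMemory.Azure.Storage.Blobs;
using Xunit;

public class UploadTests
{
    [Fact]
    public async Task Put_Request_Should_Store_Data_In_Blob_Storage()
    {
        // Arrange: an in-memory storage account instead of a real one.
        var storageAccount = new InMemoryStorageProvider().AddAccount();
        var containerClient = InMemoryBlobContainerClient.FromAccount(storageAccount, "uploads");
        containerClient.Create();

        // Start the real web application on a random local port.
        var app = Program.BuildApp(new[] { "--urls", "http://127.0.0.1:0" }, containerClient);
        await app.StartAsync();

        using var httpClient = new HttpClient { BaseAddress = new Uri(app.Urls.Single()) };

        // Act: exercise the application through a real HTTP request.
        var response = await httpClient.PutAsync("/upload/test-blob", new StringContent("test-data"));

        // Assert: the data ended up in the (in-memory) blob storage.
        response.IsSuccessStatusCode.Should().BeTrue();

        var blobContent = await containerClient.GetBlobClient("test-blob").DownloadContentAsync();
        blobContent.Value.Content.ToString().Should().Be("test-data");

        await app.StopAsync();
    }
}
```

Because `InMemoryBlobContainerClient` inherits from the official `BlobContainerClient`, it can be passed anywhere the real client is expected, with no extra abstractions in the production code.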
We believe that this is a great example of how easily, with modern tools, an application that requires two distinct pieces of infrastructure (Azure Storage and a web server) can be tested.
This might be a more controversial point, but we firmly believe fakes are superior to mocks on many levels when test doubles are needed.
For further discussion, consider the following example: a `MessageProcessor` class that requires an `AppendBlobClient` for appending data to an Azure Storage append blob. `MessageProcessor` internally calls the `AppendBlock(Stream content)` method on the `AppendBlobClient`. This client also has a `DownloadContent` method that allows reading all appended blocks at once in the form of a `Stream`.

We want to test scenarios where one or more messages are sent to the `MessageProcessor`, which should in turn append them to a blob so that all appended messages can be read at once.
With mocking, engineers typically craft mocks of the dependencies of the code being tested and then assert that specific methods were called by the tested code with specific parameters, sometimes also in a specific order (so-called behavior verification). Such mocks are crafted for each specific test case. In the context of the example above, the engineer would create a mock object for the `AppendBlobClient`, for example with a library like Moq or NSubstitute in .NET, and then assert that the `AppendBlock` method was called with specific content.
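For illustration, a mock-based test with Moq and xUnit might look like the following sketch. The `MessageProcessor` constructor and `Process` method are assumed for the example, and the mocked `AppendBlock` overload matches Azure.Storage.Blobs 12.x; adjust the `It.IsAny` arguments to the overload in your SDK version:

```csharp
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;
using Moq;
using Xunit;

public class MessageProcessorMockTests
{
    [Fact]
    public void Message_Should_Be_Appended_To_Blob()
    {
        // Azure SDK clients expose virtual methods and protected
        // constructors specifically so that they can be mocked.
        var blobClientMock = new Mock<AppendBlobClient>();

        var processor = new MessageProcessor(blobClientMock.Object);

        processor.Process("test-message");

        // Behavior verification: the test asserts *how* the dependency
        // was used, not what state resulted from it.
        blobClientMock.Verify(
            c => c.AppendBlock(
                It.IsAny<Stream>(),
                It.IsAny<byte[]>(),
                It.IsAny<AppendBlobRequestConditions>(),
                It.IsAny<IProgress<long>>(),
                It.IsAny<CancellationToken>()),
            Times.Once);
    }
}
```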
With fakes, engineers implement a working version of the dependency that mimics the real behavior as closely as necessary but is as simplified as possible (e.g., storing data in memory only). This simplified version is then used in the tested code. To verify the correctness of the system under test, various properties of the fake are asserted after the test case is executed (so-called state verification). In the example above, the engineer could create a class that inherits from `AppendBlobClient`, e.g., `InMemoryAppendBlobClient : AppendBlobClient`, provide a simple in-memory implementation of the blob, and then assert that the blob has the expected content by calling the `DownloadContent` method.
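A hand-written fake along these lines might look as follows; as above, the exact `AppendBlock`/`DownloadContent` signatures and the `BlobsModelFactory.BlobDownloadResult` helper depend on the Azure SDK version, so treat this as a sketch:

```csharp
using Azure;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs.Specialized;

public class InMemoryAppendBlobClient : AppendBlobClient
{
    private readonly List<byte[]> _blocks = new();

    public override Response<BlobAppendInfo> AppendBlock(
        Stream content,
        byte[] transactionalContentHash = default,
        AppendBlobRequestConditions conditions = default,
        IProgress<long> progressHandler = default,
        CancellationToken cancellationToken = default)
    {
        // Store the appended block in memory.
        using var buffer = new MemoryStream();
        content.CopyTo(buffer);
        _blocks.Add(buffer.ToArray());

        // The response value is not used by the tested code in this sketch.
        return null!;
    }

    public override Response<BlobDownloadResult> DownloadContent()
    {
        // Return all appended blocks concatenated, as the real client does.
        var allBytes = _blocks.SelectMany(b => b).ToArray();
        var result = BlobsModelFactory.BlobDownloadResult(new BinaryData(allBytes));
        return Response.FromValue(result, response: null!);
    }
}
```

A test then exercises the real logic and asserts on the resulting state (using FluentAssertions):

```csharp
var blobClient = new InMemoryAppendBlobClient();
var processor = new MessageProcessor(blobClient);

processor.Process("message-1");
processor.Process("message-2");

// State verification: only the resulting blob content matters, not
// how many times or in what way AppendBlock was called.
blobClient.DownloadContent().Value.Content.ToString()
    .Should().Be("message-1message-2");
```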
This might not seem like a big difference at first, but the implications are significant:

- With mocks, the tests verify implementation details, i.e., that the `AppendBlock` method was called. With fakes, the tests verify the outcome: that the blob ends up with the expected content.
- Crafting mocks for every test case might be bearable for methods with a `void`/`Unit` return type, but it gets really tedious otherwise. With a fake, only one implementation is needed that might be reused for multiple test cases, tested, versioned, shared externally, and overall treated as any other code.
- What happens when `MessageProcessor` is updated to buffer multiple messages and call `AppendBlock` only once? With mocks, the existing tests will start failing. With fakes, the existing tests remain intact.

So far, we have considered fakes to be implemented in the same language and used in the same process as the tested code. However, fakes can also be implemented in a language-agnostic way, most frequently as a Docker container that can be run locally. The tested code simply connects to it via the local network, as it would connect to the real dependency, e.g., over the Internet.
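For example, Azurite (Microsoft's open-source Azure Storage emulator) can serve as such a containerized fake. The sketch below starts it with the Testcontainers for .NET library and points a regular `BlobContainerClient` at it; the image name and the well-known Azurite development credentials are standard, but treat the exact Testcontainers API as an assumption:

```csharp
using Azure.Storage.Blobs;
using DotNet.Testcontainers.Builders;

// Start Azurite, a language-agnostic fake of Azure Storage, in a
// local Docker container with the blob port mapped to a random
// host port.
var azurite = new ContainerBuilder()
    .WithImage("mcr.microsoft.com/azure-storage/azurite")
    .WithPortBinding(10000, assignRandomHostPort: true)
    .Build();

await azurite.StartAsync();

// Azurite accepts the well-known development account name and key.
var connectionString =
    "DefaultEndpointsProtocol=http;" +
    "AccountName=devstoreaccount1;" +
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;" +
    $"BlobEndpoint=http://{azurite.Hostname}:{azurite.GetMappedPublicPort(10000)}/devstoreaccount1;";

// The tested code connects over the local network, exactly as it
// would connect to the real dependency over the Internet.
var containerClient = new BlobContainerClient(connectionString, "uploads");
await containerClient.CreateIfNotExistsAsync();
```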
Moreover, when opting for fakes, chances are that fake implementations already exist for many dependencies. Some examples:
We recommend an excellent article by Martin Fowler on the topic for an in-depth discussion of the differences between fakes, mocks, and other kinds of test doubles, or more generally, between state verification and behavior verification.
When it comes to testing code that requires external infrastructure, there are two main approaches: abstracting access to the external infrastructure via patterns such as Gateway and Repository, or using real infrastructure during testing. Both approaches come with specific challenges, such as incidental code complexity when using abstraction or test setup complexity when using real infrastructure.
For most non-trivial systems, both of these approaches need to be combined to achieve the desired levels of code testability as well as test complexity, repeatability and speed. Although there is no simple recipe that anyone can use to determine the testing strategy that would fit any specific system, team, and available tooling, we provide a framework for everybody to make their own decisions, alongside a few widely applicable principles. These principles can be summarized in the following points:
The author wishes to thank Michal Zatloukal, Jakub Švehla, and David Nepožitek for their valuable insights that contributed to this article.