
In this article, we discuss how to test code that depends on external infrastructure, and when it is worth testing against the real infrastructure (or a faithful fake of it) instead of abstracting it away behind test doubles.
To better understand the matter at hand, imagine a background service or job (a system under test) that consumes messages containing JSON documents from a queue and subsequently stores these documents in cloud object storage such as Azure Blob Storage or AWS S3. Is there a way to test such a service end-to-end that scales both with the number of tests and the number of team members? Can such a test be run easily on a local workstation by anybody on the team?
Architecture of the example service
To test such a service, typically one of the following strategies is used:
Abstraction: External infrastructure is abstracted from the code with patterns such as Gateway or Repository. These patterns are frequently paired with test doubles such as mocks, stubs, or fakes (a minimal sketch follows this list).
Real infrastructure: Code is tested against real infrastructure, either deployed in a dedicated testing instance (frequently shared by multiple engineers or teams) or, in some cases, the production infrastructure itself.
The code is not tested at all: This is an easy way out, but it is clearly not sustainable in the long term.
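To make the first strategy concrete, here is a minimal, hypothetical sketch of such an abstraction in C#; the interface and class names are illustrative and not taken from the example service, and using directives are omitted as in the other snippets in this article:
// Gateway-style abstraction hiding the object storage (hypothetical names).
public interface IDocumentGateway
{
    Task SaveAsync(string name, BinaryData document);
}

// Production implementation backed by Azure Blob Storage via the official SDK.
public class BlobDocumentGateway(BlobContainerClient container) : IDocumentGateway
{
    public Task SaveAsync(string name, BinaryData document)
        => container.UploadBlobAsync(name, document.ToStream());
}

// In-memory test double that replaces the real storage in tests.
public class InMemoryDocumentGateway : IDocumentGateway
{
    public Dictionary<string, BinaryData> Documents { get; } = new();

    public Task SaveAsync(string name, BinaryData document)
    {
        Documents[name] = document;
        return Task.CompletedTask;
    }
}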
Testing with real infrastructure
Let’s start the discussion by asking whether real infrastructure dependencies such as SQL databases, Azure’s Storage Accounts, Key Vaults, Service Bus, or Event Hubs, or AWS’s S3, DynamoDB, Kinesis, or SES should be used during testing.
Based on our observations, and with a slight grain of salt, there are two opposing camps of engineers with strong opinions on this topic.
On the more traditional side, there are engineers who intentionally design both production and test code so that external dependencies are used only in the final stages of the testing pipeline (such as acceptance or system tests). These engineers are often willing to go to great lengths to abstract away access to all the external dependencies in the code and use mocks or other test doubles heavily. They attempt to cover as much of the codebase as possible with unit tests or other kinds of tests that do not require external dependencies.
On the opposite side, some engineers find almost zero value in any tests that are not running against a fully integrated system and disapprove of introducing abstractions in the code solely to decouple logic from infrastructure access.
Obviously, both approaches have issues, and they both have merit. The optimal testing strategy will be a mixture of the two. However, selecting the optimal testing strategy in a given context is not a simple unidimensional decision; it requires evaluating the specific context of the given system and the team.
Now, let’s look into specific challenges we face when choosing one approach over the other.
When striving to use real infrastructure as little as possible (shifting right), most problems are caused by the introduced abstractions and the need for test doubles:
Incidental complexity: Abstracting all infrastructure access leads to additional code and non-essential (incidental) complexity. Navigating multiple layers of indirection when debugging a simple HTTP call is neither pleasant nor effective.
Skill transfer: Engineers who previously worked with a specific kind of infrastructure and used standard, widely known SDKs to access it cannot reuse their existing knowledge. Instead, they must learn a new set of application-specific abstractions and interfaces.
Effort spent on test doubles: The complexity of writing test doubles (mocks, stubs, fakes) might easily get out of hand. Especially with mocks, a large amount of non-reusable code is typically produced.
Coverage: Some parts of the codebase (namely, the infrastructure-access code) are tested only by tests that run less frequently, are more expensive to create, and are painful to keep up to date. This makes those parts of the codebase more difficult to maintain and more error-prone.
On the other hand, when using real infrastructure as much as possible (shifting left), problems arise with provisioning the infrastructure and with the complexity of the test code:
Setup: It is not straightforward to ensure that each test runs on infrastructure in the state expected by the test. For a database, this means creating and cleaning up relevant tables, making sure that tests use independent sets of rows/tables, or recreating the database for each test. Any of these approaches increases the complexity of the test code.
Time: Tests take much more time. Creating and cleaning up the infrastructure is not instant; for example, creating a new Azure Storage Account might take 30 seconds. Test execution is also slower: a typical HTTP call takes around 100 ms, while a method call on a mock takes around 100 ns, a difference of roughly six orders of magnitude (1 ms = 1 000 000 ns).
Need for isolation: Special effort must be made to isolate the test runs. Compared to tests using in-memory test doubles (which have exactly zero chance of interfering with other engineers’ test runs), tests using external infrastructure might conflict and thus be flaky or non-deterministic.
Costs: Tests might get expensive. Real infrastructure is not free, and the additional cost must be taken into account, especially when pay-as-you-go/serverless infrastructure cannot be used.
Fault simulation: When using real infrastructure, it is almost impossible to simulate specific faults, such as requests timing out or failing, to which the system under test should be resilient. With in-memory test doubles, simulating such faults is easy, as the sketch below illustrates.
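For illustration, here is a minimal sketch of such a fault-simulating test double; it derives from the real BlobContainerClient, whose members are virtual precisely to enable this kind of test double, and the class name is hypothetical:
// Hand-written test double that simulates a server-side failure, which is
// hard to trigger on demand against real infrastructure (hypothetical class).
public class FailingBlobContainerClient : BlobContainerClient
{
    public override Response<BlobContentInfo> UploadBlob(
        string blobName, Stream content, CancellationToken cancellationToken = default)
        => throw new RequestFailedException(500, "Simulated storage outage.");
}
A resilience test can then pass this client to the system under test and assert that the failure is retried or surfaced as expected.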
Now, imagine, just for a moment, that all the problems with real infrastructure in test code described above (setup, time, isolation, costs) disappear.
In such a hypothetical situation, would there be any reason not to use real infrastructure in all the tests? In our opinion, the answer is no, with one exception: when a test case needs to verify how the system under test responds to infrastructure faults or delays, using real infrastructure is, in most cases, not feasible.
Of course, this is not the reality we live in. However, with modern programming languages, infrastructure tools, cloud services, and testing frameworks, we are much closer to this ultimate state now (2024) than we were just a few years ago. This is especially true when compared to 10-20 years ago, when many best practices around software testing were devised. Despite these advancements, many people continue to dogmatically follow those older practices today.
We believe the advances in the tooling mentioned above will continue at the same or a higher pace, and thus it is worth betting on this when designing a testing strategy. However, we are still far from the ideal situation, so using real infrastructure in tests should be carefully considered. This leads to several principles, explained in the rest of this article.
As with software architecture, the correct testing strategy depends on many factors that continuously change. Factors to consider include:
How many engineers are on the development team? A system developed by a single indie developer should probably be tested differently than a system developed by a startup with 8 engineers, and such a startup has different needs than a department of 10 teams. The indie developer can successfully spin up one testing instance of a database and use it for all the tests. In contrast, an entire department using a single database instance might run into problems with a large number of tests running concurrently, differing requirements on seed data, and so on.
How many tests are there and how long do they take? Small and fast test suites can typically be run sequentially. In contrast, larger suites or suites containing long-running tests typically need to be run in parallel and thus designed accordingly.
Is the infrastructure access central to the functionality of the tested system? Consider two systems. First, a business application whose essential complexity is in evaluating business rules in memory over tabular data that can be fetched via ODBC from any relational database. Second, a data-intensive application that carefully manipulates blocks of Azure Storage blobs as part of its core functionality. Testing with real infrastructure surely brings more value to the second application than to the first. A SELECT query against a relational database is an operation that is both trivial to execute and easy to emulate without real infrastructure, thanks to existing tools such as SQLite or Entity Framework. Manipulating blocks of Azure Storage blobs, on the other hand, is not so straightforward, and existing emulation tools are lacking.
How sophisticated is the provisioning of the infrastructure? When infrastructure provisioning is fast and simple, it shifts the balance toward creating more tests relying on real infrastructure. For example, if an engineer is able to provision an AWS S3 bucket directly in test code via an API call, the cost-value ratio of writing a test that needs real infrastructure improves dramatically compared to a situation where creating the infrastructure requires manual steps or an infeasible amount of time.
What are the capabilities of the toolchain being used? Not all programming languages, ecosystems of libraries, and build tools are equal. This aspect is especially relevant for mocking or similar techniques that heavily rely on reflection, code emitting, or source generation. We might also find interesting differences between language ecosystems regarding time abstraction. In some ecosystems, the time abstraction is deeply integrated (such as .NET and its TimeProvider), while others need third-party libraries or custom abstractions (see the sketch below).
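As an illustration of the last point, here is a minimal sketch of a test using .NET’s built-in TimeProvider together with FakeTimeProvider from the Microsoft.Extensions.TimeProvider.Testing package; the TokenCache class is hypothetical:
// Production code depends on the built-in time abstraction instead of DateTime.UtcNow.
public class TokenCache(TimeProvider timeProvider)
{
    private DateTimeOffset _expiresAt;

    public void Store(TimeSpan lifetime) => _expiresAt = timeProvider.GetUtcNow() + lifetime;

    public bool IsExpired => timeProvider.GetUtcNow() >= _expiresAt;
}

[TestMethod]
public void Token_Should_Expire_After_Lifetime()
{
    var time = new FakeTimeProvider();
    var cache = new TokenCache(time);

    cache.Store(TimeSpan.FromMinutes(5));
    time.Advance(TimeSpan.FromMinutes(6)); // no real waiting needed

    cache.IsExpired.Should().BeTrue();
}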
Our main principle when deciding on a testing strategy is a rephrasing of Albert Einstein’s quote, “Everything should be made as simple as possible, but not simpler”: minimize the number of abstractions and test doubles (for the reasons explained above), but not beyond the threshold where qualities of the tests, such as their complexity, repeatability, and speed, would suffer.
Designing for testability is a sound principle, battle-tested not only in software but also in hardware and other kinds of engineering. In software engineering, it is known not just for making testing more efficient but also for making the tested code itself better.
Unfortunately, designing code for testability also often leads to adding more layers of abstraction in the quest to make code that uses external infrastructure testable without the actual infrastructure. Such abstractions serve no purpose other than hiding access to the infrastructure; they have no meaning related to the purpose of the application itself.
We are convinced that such abstractions are purely incidental complexity and should be eliminated where possible. However, this does not mean giving up on separation of concerns or breaking the layering/modularization rules imposed by our architecture of choice.
When using DDD, we are not advocating accessing infrastructure directly in the domain layer instead of using repositories. Repositories, event publishers, and the like have a special, well-understood function in DDD that goes beyond simply hiding infrastructure access. Repositories, for instance, are responsible for fetching domain model aggregates, which includes constructing and deconstructing domain objects and might involve orchestrating access to more than one external dependency.
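To make this concrete, here is a hypothetical sketch of a repository that orchestrates two external dependencies while assembling an aggregate; the Order record and the SQL schema are illustrative, and Dapper is assumed for the database access:
public record Order(Guid Id, string CustomerName, BinaryData Attachment);

// Hypothetical repository: it assembles the aggregate from two stores rather
// than merely hiding a single SDK call.
public class OrderRepository(IDbConnection db, BlobContainerClient attachments)
{
    public async Task<Order> GetAsync(Guid orderId)
    {
        // Order header from a relational database (Dapper assumed)...
        var customerName = await db.QuerySingleAsync<string>(
            "SELECT CustomerName FROM Orders WHERE Id = @orderId", new { orderId });

        // ...large attachment from blob storage.
        var blob = attachments.GetBlobClient($"orders/{orderId}/attachment");
        var attachment = (await blob.DownloadContentAsync()).Value.Content;

        return new Order(orderId, customerName, attachment);
    }
}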
In general, DDD’s domain layer is designed to be highly testable without reliance on real infrastructure. However, for the application and infrastructure layers, we advocate implementing at least a few tests that use real infrastructure or a reliable emulator. It is also crucial to avoid introducing abstractions, beyond those required by DDD, that serve only to hide access to infrastructure.
In software testing, many dogmas or “best practices” might have been valid only in the past or only in some specific contexts, but they are, even today, considered the only correct way to test code. We think it is very important to challenge these dogmas and find new ways that are adapted to a given system, team and state-of-the-art tools. Examples of such dogmas include:
“Unit tests cannot have any external dependencies”: If my code unit needs an external dependency to perform its core functionality and I am able to provide that dependency within the test with reasonable complexity, why should I use a mock or another test double instead?
“Each test must be executed on isolated infrastructure with the same initial state”: This practice definitely has its place in larger teams or for tests that validate code behavior under very specific, intricate conditions, such as concurrent access or particular kinds of stress. However, many, if not most, tests validate logical operations that should, by design, behave exactly the same under any circumstances. In many situations, such tests can run on shared infrastructure without issues.
“Test kinds other than unit tests require complex tooling and setup”: This might have been true in the past, when, for example, web servers required separate heavyweight processes and obtaining a new database took hours or days. These days, it is much easier: we can spin up a web server directly within the test process, connect to an in-memory database or a database created in the cloud, and finally use a standard HTTP client to make a request and verify the response, all within a single process.
Building on the last dogma, here is a sample code snippet in C# using ASP.NET Core, the Spotflow In-Memory Azure Test SDK, and the FluentAssertions library. It demonstrates how straightforward it can be to test a web API that depends on Azure Blob Storage.
First, let’s introduce the production code that bootstraps and starts the web application. The application accepts HTTP PUT requests on the /upload/{name} route:
static class ExampleApplication
{
    public static async Task<WebApplication> StartAsync(BlobContainerClient client)
    {
        var builder = WebApplication.CreateBuilder();

        // Register the storage client and the upload service for dependency injection.
        builder.Services.AddSingleton(client);
        builder.Services.AddSingleton<UploadService>();

        var app = builder.Build();

        app.MapPut("/upload/{name}", HandleRequest);

        await app.StartAsync();

        return app;
    }

    static async Task<IResult> HandleRequest(string name, HttpContext context, UploadService service)
    {
        await service.UploadAsync(name, context.Request.Body);
        return Results.Ok();
    }
}
The application makes use of the UploadService class for uploading the incoming data to Azure Blob Storage. This service internally uses BlobContainerClient from the official Azure SDK to perform the actual infrastructure operations:
class UploadService(BlobContainerClient client)
{
    public Task UploadAsync(string name, Stream stream) => client.UploadBlobAsync(name, stream);
}
Finally, let’s see what the test code looks like:
[TestMethod]
public async Task Example_Service_Should_Upload_Blob()
{
    // Arrange: create in-memory storage account, start the app and prepare HTTP client
    var storage = new InMemoryStorageProvider();
    var account = storage.AddAccount();
    var conn = account.CreateConnectionString();

    var containerClient = new InMemoryBlobContainerClient(conn, "data", storage);

    using var app = await ExampleApplication.StartAsync(containerClient);
    using var httpClient = new HttpClient { BaseAddress = new Uri(app.Urls.First()) };

    // Act: send HTTP request
    var data = BinaryData.FromString("test-content").ToStream();
    var content = new StreamContent(data);
    var response = await httpClient.PutAsync("/upload/test-name", content);

    // Assert: check response code and existing blobs in the storage
    response.Should().HaveStatusCode(HttpStatusCode.OK);

    containerClient.GetBlobs().Should().HaveCount(1);
    containerClient.GetBlobClient("test-name").Exists().Value.Should().BeTrue();
}
We believe that this is a great example of how easily, with modern tools, an application that requires two distinct pieces of infrastructure (Azure Storage and a web server) can be tested.
This might be a more controversial point, but we firmly believe fakes are superior to mocks on many levels when test doubles are needed.
For further discussion, consider the following example: a class MessageProcessor that requires the AppendBlobClient class for appending data to an Azure Storage append blob. MessageProcessor internally calls the AppendBlock(Stream content) method on the AppendBlobClient. The client also has a DownloadContent method that allows reading all appended blocks at once in the form of a Stream.
// Simplified view of the relevant members of AppendBlobClient (bodies elided)
class AppendBlobClient
{
    public virtual void AppendBlock(Stream content) { /* ... */ }
    public virtual Stream DownloadContent() { /* ... */ }
}
We want to test scenarios where one or more messages are sent to MessageProcessor, which should in turn append them to a blob so that all appended messages can be read at once.
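For the comparison below, assume a simplified MessageProcessor along these lines (a hypothetical sketch, not the real implementation):
// Simplified, hypothetical shape of the processor used in the comparison below.
class MessageProcessor(AppendBlobClient client)
{
    public void Process(string message)
        => client.AppendBlock(BinaryData.FromString(message + "\n").ToStream());
}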
With mocking, engineers typically craft mocks of the dependencies of the code being tested and then assert that specific methods with specific parameters were called by the tested code, sometimes also in a specific order (so-called behavior verification). Such mocks are crafted for each specific test case. In the context of the example above, the engineer would create a mock object for the AppendBlobClient, for example with a library like Moq or NSubstitute in .NET, and then assert that the AppendBlock method was called with the expected content.
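With Moq, such a behavior-verification test might look roughly like this (a sketch against the simplified classes above; the real AppendBlobClient can be mocked the same way because its members are virtual):
[TestMethod]
public void Processor_Should_Append_Message_Mock_Variant()
{
    // Arrange: a mock crafted specifically for this test case.
    var clientMock = new Mock<AppendBlobClient>();
    var processor = new MessageProcessor(clientMock.Object);

    // Act
    processor.Process("hello");

    // Assert: behavior verification - only the interaction is checked.
    clientMock.Verify(c => c.AppendBlock(It.IsAny<Stream>()), Times.Once());
}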
With fakes, engineers implement a working version of the dependency that mimics the real behavior as closely as necessary but is as simplified as possible (e.g., storing data in memory only). This simplified version is then used in the tested code. To verify the correctness of the system under test, various properties of the fake are asserted after the test case is executed (so-called state verification). In the example above, the engineer could create a class that inherits AppendBlobClient, e.g., InMemoryAppendBlobClient : AppendBlobClient, provide a simple in-memory implementation of the blob, and then assert that the blob has the expected content by calling the DownloadContent method.
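A corresponding fake and its state-verification test might look roughly like this (again a sketch against the simplified classes above, not against the real Azure SDK types):
// Reusable fake: a working, in-memory implementation of the dependency.
class InMemoryAppendBlobClient : AppendBlobClient
{
    private readonly MemoryStream _blob = new();

    public override void AppendBlock(Stream content) => content.CopyTo(_blob);

    public override Stream DownloadContent() => new MemoryStream(_blob.ToArray());
}

[TestMethod]
public void Processor_Should_Append_Message_Fake_Variant()
{
    // Arrange: the same fake can be reused across many test cases.
    var client = new InMemoryAppendBlobClient();
    var processor = new MessageProcessor(client);

    // Act
    processor.Process("hello");

    // Assert: state verification - the resulting blob content is checked.
    new StreamReader(client.DownloadContent()).ReadToEnd().Should().Be("hello\n");
}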
This might not seem like a big difference at first, but the implications are significant:
With mocks, the test typically verifies only that the AppendBlock method was called; with a fake, the test verifies the actual content that ends up in the blob.
Setting up mocks is tolerable for methods with a void/Unit return type, but it gets really tedious otherwise. With a fake, only one implementation is needed that might be reused for multiple test cases, tested, versioned, shared externally, and overall treated as any other code.
What happens when MessageProcessor is updated to buffer multiple messages and call AppendBlock only once? With mocks, the existing tests will start failing. With fakes, the existing tests remain intact.
So far, we have considered fakes to be implemented in the same language and used in the same process as the tested code. However, fakes can also be implemented in a language-agnostic way, most frequently as a Docker container that can be run locally. The tested code simply connects to it via the local network as it would connect to the real dependency, e.g., over the Internet.
Testing with in-memory test doubles/fakes
Testing with a fake implementation running as a Docker container on the local network
Moreover, with fakes, chances are that there are already existing fake implementations for many dependencies.
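One well-known example is Azurite, the official local emulator for Azure Storage, which can run as a Docker container; the tested code keeps using the regular Azure SDK client, and only the connection string points to the locally running fake (a sketch, assuming Azurite’s default ports and development connection string):
// Start the emulator locally, e.g.:
//   docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite
//
// The production code keeps using the regular SDK client; only the connection
// string points at the locally running fake instead of the real service.
var containerClient = new BlobContainerClient("UseDevelopmentStorage=true", "data");
await containerClient.CreateIfNotExistsAsync();
await containerClient.UploadBlobAsync("test-name", BinaryData.FromString("test-content").ToStream());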
We recommend an excellent article by Martin Fowler on the topic for an in-depth discussion of the differences between fakes, mocks, and other kinds of test doubles, or more generally, between state verification and behavior verification.
When it comes to testing code that requires external infrastructure, there are two main approaches: abstracting access to the external infrastructure via patterns such as Gateway and Repository, or using real infrastructure during testing. Both approaches come with specific challenges, such as incidental code complexity when using abstraction or test setup complexity when using real infrastructure.
For most non-trivial systems, both of these approaches need to be combined to achieve the desired levels of code testability as well as test complexity, repeatability, and speed. Although there is no simple recipe for determining the testing strategy that fits a specific system, team, and available tooling, we have provided a framework for making that decision, alongside a few widely applicable principles. These principles can be summarized in the following points:
Use real infrastructure (or faithful fakes of it) in tests wherever it is feasible with reasonable effort and cost.
Minimize abstractions and test doubles, but not beyond the point where qualities of the tests such as complexity, repeatability, and speed suffer.
Design for testability, but avoid abstractions whose only purpose is to hide infrastructure access.
Challenge testing dogmas and adapt the strategy to the given system, team, and state-of-the-art tooling.
When test doubles are needed, prefer fakes over mocks.
The author wishes to thank Michal Zatloukal, Jakub Švehla, and David Nepožitek for their valuable insights that contributed to this article.