In the previous article, we discussed the essentials of monitoring and observability in IoT. Mainly, we presented how to leverage logs, metrics, traces, and structured events to enhance the observability of your IoT systems.
It is not unusual to operate tens of thousands of IoT devices. At that scale, an observability solution can quickly run into insufficient performance and unbearable infrastructure costs. Thus, this article focuses on handling large scale.
We’ll discuss a few techniques that can help you balance the trade-offs that come with large scale:
Okay, we know what to collect. Now we just dump all the data into our MySQL database and we’re ready to observe, right? Well, not so fast (pun intended). This might not be the best idea for several reasons. We’ll look at our requirements for the database and then suggest storage that serves our needs better.
First, let’s review a few characteristics of storing IoT observability data:
There’s definitely more to it, but this small set of characteristics will be enough to make our point.
We’re probably all familiar with SQL databases, so it’s natural to consider them as a place to store our observability data. However, several technical aspects make SQL databases unsuitable for storing large-scale observability data.
Traditional row-oriented databases, like MySQL or PostgreSQL, struggle to efficiently handle queries on tables with many dimensions when only a subset of columns is required. Another issue of high dimensionality is the difficulty of implementing efficient indexing. We can’t create database indices for a subset of columns beforehand, because we don’t know which dimensions will be important during troubleshooting. So, we would either need to index all columns (which would be quite expensive), or the queries would be slow when filtering based on the unindexed columns.
Also, without explicit time-based data partitioning, there is usually no efficient way of discarding old data. Time partitioning allows large chunks of data to be deleted efficiently once they become stale.
If you have good reasons to use a traditional SQL database for observability data, you might want to consider Timescale. It is a PostgreSQL extension that addresses some of the challenges mentioned above with time partitioning and better compression while still using the row-based SQL model. Spotflow provides seamless integration with PostgreSQL and Timescale via the SQL egress sink. Our platform transforms the JSON messages from your devices into database rows according to the mapping that you specify.
The categorization of observability signals into metrics, logs, and traces has led to the development of specialized storages tailored to each signal type. For example, there is Mimir for metrics, Loki for logs, and Tempo/Jaeger for traces. Each of these storages is made with the specific signal type in mind, which makes them effective for monitoring use cases within the specific signal. However, it might be cumbersome to query data across these storages.
Additionally, certain storages have specific limitations. For instance, traditional time series databases (TSDBs, such as Mimir) struggle with high-cardinality data. TSDBs store a separate time series for each unique set of attributes. This approach can be very efficient with a limited number of dimensions and low cardinality, as writing and querying within a single time series is very performant. However, with high cardinality, the database frequently encounters a new unique combination of attributes and has to create a new series each time. As a result, when retrieving aggregate values, the database needs to read through a very large number of time series, making the operation inefficient. This issue is particularly problematic within the IoT sector, where using high-cardinality labels such as device ID and sensor ID would seem appropriate.
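To get a feeling for how quickly the number of series grows, here is a minimal sketch that computes the upper bound on distinct series as the product of the label cardinalities. The label names and values are hypothetical and chosen only for illustration; real TSDBs create series only for combinations that actually occur.

```python
def count_series(label_values: dict[str, list[str]]) -> int:
    """Upper bound on distinct series: one per unique combination of label values."""
    total = 1
    for values in label_values.values():
        total *= len(values)
    return total


# Hypothetical label sets, for illustration only.
low_cardinality = {"region": ["eu", "us"], "firmware": ["1.0", "1.1", "2.0"]}
high_cardinality = {
    **low_cardinality,
    "device_id": [f"dev-{i}" for i in range(10_000)],
    "sensor_id": [f"s-{i}" for i in range(8)],
}

print(count_series(low_cardinality))   # 6
print(count_series(high_cardinality))  # 480000
```

Adding just the device and sensor identifiers turns a handful of series into hundreds of thousands for a single metric.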
With Spotflow, you can leverage the strength of the Grafana stack and other storages that support the OpenTelemetry protocol (OTLP) using the OpenTelemetry egress sink. It allows you to send messages in the OTLP format into the Spotflow IoT platform, which will route these messages to your preferred observability backend.
With the increasing demand for analytical workloads similar to ours (as described above), a new wave of databases emerged. They employ columnar storage, which makes read operations more efficient as they only touch the columns required for the particular query. Thanks to time partitioning, the database can restrict read operations to a narrow range of data, making the queries even more efficient. The combination of these design choices also helps compression, as the algorithm operates on single columns bounded by a time range. Notable examples of such storages include InfluxDB, QuestDB, and ClickHouse.
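The following toy store illustrates the two ideas in plain Python. It is only a sketch and not how any of the databases above are implemented: the `ColumnarStore` class, its methods, and the sample rows are hypothetical, partitions are keyed by ISO day strings, and every row is assumed to contain the same columns.

```python
from collections import defaultdict


class ColumnarStore:
    """Toy store: one list per column, grouped into per-day partitions."""

    def __init__(self) -> None:
        # partition key (ISO day string) -> column name -> list of values
        self.partitions = defaultdict(lambda: defaultdict(list))

    def insert(self, day: str, row: dict) -> None:
        partition = self.partitions[day]
        for column, value in row.items():
            partition[column].append(value)

    def column_values(self, column: str, first_day: str, last_day: str) -> list:
        """Read a single column, touching only partitions inside the day range."""
        values = []
        for day in sorted(self.partitions):
            # ISO dates compare correctly as plain strings.
            if first_day <= day <= last_day:
                values.extend(self.partitions[day].get(column, []))
        return values


store = ColumnarStore()
store.insert("2024-05-01", {"device_id": "dev-1", "temperature": 21.5, "rssi": -70})
store.insert("2024-05-02", {"device_id": "dev-2", "temperature": 23.0, "rssi": -65})
store.insert("2024-05-03", {"device_id": "dev-1", "temperature": 22.0, "rssi": -72})

# Only the "temperature" lists of two partitions are read.
temperatures = store.column_values("temperature", "2024-05-02", "2024-05-03")
print(temperatures)  # [23.0, 22.0]
```

The query never looks at `device_id` or `rssi`, and it skips the partition for the first day entirely, which is where the savings come from.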
At a certain scale, it becomes impractical to collect and store every observability signal that your devices produce. Thankfully, this is usually unnecessary, as you can successfully debug issues with only a fraction of the observability data. For example, the events describing successful scenarios are often not as important as the ones describing failures. This is why we can discard most of these events and store only a few examples that are representative enough to reconstruct the particular historical situation.
Various sampling strategies exist to ensure that only a limited number of events are collected while still preserving sufficient detail. It's essential to choose a sampling approach that aligns with your specific needs. Instrumentation libraries, such as OpenTelemetry SDKs, often provide implementations of such sampling strategies. This makes sampling a relatively easy way to reduce storage and processing costs.
In the context of tracing, we distinguish two kinds of sampling based on the point where the sampling decisions are made: head and tail sampling. Head sampling decides whether a span/trace will be sampled right at the device, while tail sampling makes this decision later once all the spans of the particular trace are collected. The main advantages of head sampling are simplicity and cost efficiency. It reduces network traffic, which can be constrained in IoT environments, and avoids storing and processing unsampled data in observability backends. However, tail sampling becomes necessary if you prefer to make sampling decisions based on the entire trace. This approach is useful if you want to sample traces with errors differently than the successful ones.
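The difference between the two decision points can be sketched in a few lines of library-free Python. This is an illustration under simplifying assumptions, not the OpenTelemetry SDK implementation: the function names, the span dictionaries, and the `status` and `trace_id` fields are hypothetical.

```python
import hashlib


def head_sample(trace_id: str, ratio: float) -> bool:
    """Decide on the device, from the trace ID alone, whether to keep a trace."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < ratio


def tail_sample(spans: list[dict], success_ratio: float) -> bool:
    """Decide once the whole trace is collected: keep every trace that
    contains an error and only a fraction of the fully successful ones."""
    if not spans:
        return False
    if any(span.get("status") == "error" for span in spans):
        return True
    return head_sample(spans[0]["trace_id"], success_ratio)


failed_trace = [
    {"trace_id": "t-1", "name": "read-sensor", "status": "ok"},
    {"trace_id": "t-1", "name": "upload", "status": "error"},
]
successful_trace = [{"trace_id": "t-2", "name": "read-sensor", "status": "ok"}]

print(tail_sample(failed_trace, success_ratio=0.0))      # True
print(tail_sample(successful_trace, success_ratio=0.0))  # False
```

Hashing the trace ID makes the head decision deterministic, so every component that sees the same trace reaches the same verdict without coordination. The tail decision needs all spans in one place first, which is why it costs more network traffic and buffering.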
Observability data tends to lose its value quickly over time. The telemetry received today is usually much more valuable than data from last year. This gives us another way to significantly trim the storage costs. Retention policies allow the automatic removal of data beyond a specified timeframe. Time-based partitioning simplifies the implementation of retention policies, which is why many modern databases support them out of the box.
Another strategy is tiered storage, that is, storing older data in low-cost object storage such as Amazon S3 or Azure Blob Storage. Although queries against object storage might have higher latency than queries against local disks, this approach lets you retain the data longer while still reducing storage costs.
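A retention policy on top of daily partitions can be as simple as the following sketch. The `apply_retention` function and the in-memory `partitions` dictionary are hypothetical stand-ins for what a database does internally when it drops whole partitions.

```python
from datetime import date, timedelta


def apply_retention(partitions: dict[date, list], today: date, keep_days: int) -> list[date]:
    """Drop every daily partition older than the retention window.

    Returns the keys of the dropped partitions.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = sorted(day for day in partitions if day < cutoff)
    for day in expired:
        del partitions[day]  # a whole chunk goes at once, no row-by-row scan
    return expired


partitions = {
    date(2024, 1, 1): ["old events"],
    date(2024, 3, 1): ["events exactly at the cutoff"],
    date(2024, 3, 30): ["recent events"],
}
dropped = apply_retention(partitions, today=date(2024, 3, 31), keep_days=30)
print(dropped)  # [datetime.date(2024, 1, 1)]
```

Because the data is already grouped by time, expiring it is a matter of removing a few partition keys rather than scanning and deleting individual rows.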
Lastly, it is possible to reduce the resolution of historical data further. One approach is to perform a secondary round of downsampling on older data. An alternative approach is to explicitly create aggregates of historical data while discarding the original raw records.
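The aggregation approach could look like the sketch below, which collapses raw points into fixed-width time buckets and keeps only summary statistics. The `downsample` function, the choice of statistics, and the sample points are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def downsample(points: list[tuple[int, float]], bucket_seconds: int) -> list[dict]:
    """Aggregate (unix_timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        {
            "start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for start, values in sorted(buckets.items())
    ]


raw_points = [(0, 1.0), (30, 3.0), (60, 5.0), (119, 7.0), (120, 9.0)]
aggregates = downsample(raw_points, bucket_seconds=60)
print([(a["start"], a["mean"]) for a in aggregates])
# [(0, 2.0), (60, 6.0), (120, 9.0)]
```

Five raw points become three aggregate rows here. Keeping `min`, `max`, and `count` next to the mean preserves some information about spikes that a plain average would hide.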
When setting up an IoT observability stack, you must decide where to store the data and select an appropriate observability backend. In this article, we have described various aspects to consider when making this decision to optimize cost-efficiency and scalability. The main points to remember are the following: