Remote management of IoT devices

TOMAS PAJUREK, CTO & Head of Engineering at Spotflow

Published on Thu Oct 31 2024

In this article, you will learn about the several kinds of IoT cloud-to-device communication, illustrated with examples. Each kind has a specific purpose and place within an IoT solution, and selecting a suitable one is the first step toward a reliable, production-grade IoT solution.

What is the role of the IoT platform in the remote management of IoT devices?

One of the essential parts of most IoT solutions is the connection of devices to a cloud service or server. Phrases like “connecting devices to the cloud” are therefore used frequently. However, the phrase can mean very different things in different contexts. The primary difference lies in the direction of the connection: is the device sending data to the cloud (device-to-cloud communication) or receiving data and commands from the cloud (cloud-to-device communication)? In this article, we explore the less well-understood of the two, cloud-to-device communication.

Good real-world examples of cloud-to-device communication might be:

Restarting an IoT device over a network.
Updating configuration parameters of an entire fleet of devices already deployed in the field.
Fetching diagnostics information from an IoT device in real-time.

There are multiple kinds of cloud-to-device communication, and it is crucial for designers of IoT solutions to pick the kinds that suit the given requirements. Failing to do so often leads to unreliability, poor performance, and unnecessarily high costs. However, for production-grade IoT solutions, knowing which kind suits the task at hand is usually not enough; additional challenges must be addressed.

Kinds of cloud-to-device communication

Based on expected guarantees for delivery, durability, and latency, we can divide cloud-to-device communication into three kinds: Device Twins, Messaging, and Remote Procedure Calls (RPC).

Device Twins

A Device Twin (sometimes referred to as a Digital Twin) is a virtual counterpart of a device that lives in the cloud. It represents the current or desired state of the device and can be read or modified by operators or other services. It can be accessed at any time, regardless of whether the device is currently online or offline.

If an operator updates the device twin while the physical device is online, the update (or the entire updated twin snapshot) is immediately delivered to the device, and the device can act accordingly (e.g., change its configuration). If the device is offline, the update is persisted in the cloud and delivered to the device as soon as it comes back online. If the device twin is updated multiple times during the offline period, the device receives just one merged update (or the entire updated twin snapshot) when it comes online. This is an important differentiator of device twins from the other kinds of cloud-to-device communication: individual updates and the history of a device twin are unimportant; we only care about the resulting state, which is kept for the entire lifetime of the device (it does not expire).
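The merging behaviour can be illustrated with a minimal Python sketch of the cloud side of a single device twin. It assumes JSON Merge Patch semantics (RFC 7386: nested objects merge recursively, a null value deletes a property) and delivery of the whole updated snapshot on reconnect; the article does not prescribe either choice, and TwinStore with its methods is a hypothetical name used only for illustration. The link to the device is simulated by a list.

```python
import copy
from typing import Any, Dict, List

JsonObj = Dict[str, Any]


def merge_patch(target: JsonObj, patch: JsonObj) -> JsonObj:
    """Return a new dict with `patch` applied to `target` (JSON Merge Patch style)."""
    result = copy.deepcopy(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the property
        elif isinstance(value, dict):
            base = result.get(key)
            # Nested objects merge recursively; a non-object is replaced by {} first.
            result[key] = merge_patch(base if isinstance(base, dict) else {}, value)
        else:
            result[key] = copy.deepcopy(value)
    return result


class TwinStore:
    """Hypothetical cloud-side holder of one device's desired properties."""

    def __init__(self) -> None:
        self.desired: JsonObj = {}
        self.version = 0            # bumped on every operator update
        self.delivered_version = 0  # last version the device has received
        self.online = False
        self.sent: List[JsonObj] = []  # snapshots pushed to the device (simulated link)

    def update(self, patch: JsonObj) -> None:
        """Operator-side write; works whether or not the device is connected."""
        self.desired = merge_patch(self.desired, patch)
        self.version += 1
        if self.online:
            self._deliver()

    def connect(self) -> None:
        self.online = True
        if self.delivered_version < self.version:
            self._deliver()  # one snapshot, however many updates happened while offline

    def disconnect(self) -> None:
        self.online = False

    def _deliver(self) -> None:
        self.sent.append(copy.deepcopy(self.desired))
        self.delivered_version = self.version


twin = TwinStore()  # the device starts offline
twin.update({"maxSpeed": 10})
twin.update({"maxSpeed": 20, "mode": "eco"})
twin.update({"mode": None})
twin.connect()
print(twin.sent)  # [{'maxSpeed': 20}]
```

However many updates the operator makes while the device is offline, the device receives a single snapshot containing only the resulting state.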

Example

In practical terms, the device twin is a document (e.g., a JSON document) stored in the cloud that both the device and operators can read and write. The prominent use case for device twins is configuration.

Let’s look at the following example: assume an imaginary autonomously guided device moving inside a warehouse. We want an easy way to adjust the device’s maximum movement speed without physical access to it. To implement this functionality, we use a device twin with one numeric property, maxSpeed. To change the maximum speed, an operator simply sets this property to a specific value from anywhere via a cloud API. The underlying device twin mechanism makes sure that this value is delivered to the device (perhaps not immediately, but eventually). Once the device receives the device twin update, it adjusts its maximum speed, and the action is complete.

{
  "maxSpeed": 20
}

Desired vs. reported state

In an ideal situation, there should not be a difference between the desired state of the device (set by an operator) and the reported (actual, real) state. However, in reality, there are several situations in which this might not be true:

The device is offline at the time when the twin update is posted.
The device is online and receives the twin update but cannot act on it, e.g., due to mechanical faults or limitations.

Frequently, we need to monitor this discrepancy and/or receive feedback from devices on device twin updates. For this reason, the device twin is typically represented as two independent parts (e.g., two JSON documents):

Desired properties represent the desired state of the device, which is not necessarily its actual state at the time. They are written by operators and read by devices.
Reported properties represent the actual state of the device. They are written by devices and read by operators.

If everything works smoothly, these two parts should be identical apart from short periods between an operator posting a twin update and the device processing it. However, as mentioned, there are situations in which differences arise that we might want to monitor and act upon.
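Monitoring the discrepancy can be as simple as diffing the two parts. The sketch below assumes both parts are plain JSON objects and ignores properties that exist only in the reported part; twin_delta is a hypothetical helper, not part of any particular platform's API.

```python
from typing import Any, Dict, Tuple


def twin_delta(
    desired: Dict[str, Any], reported: Dict[str, Any], prefix: str = ""
) -> Dict[str, Tuple[Any, Any]]:
    """Map each desired property the device has not (yet) matched to (desired, reported).

    Properties that appear only in `reported` are ignored; nested objects are
    compared recursively and keyed by dotted paths such as "motor.maxRpm".
    """
    delta: Dict[str, Tuple[Any, Any]] = {}
    for key, want in desired.items():
        path = f"{prefix}{key}"
        have = reported.get(key)  # None if the device never reported this property
        if isinstance(want, dict) and isinstance(have, dict):
            delta.update(twin_delta(want, have, prefix=f"{path}."))
        elif want != have:
            delta[path] = (want, have)
    return delta


twin = {"desired": {"maxSpeed": 30}, "reported": {"maxSpeed": 20}}
print(twin_delta(twin["desired"], twin["reported"]))  # {'maxSpeed': (30, 20)}
```

Because a delta is expected for a short while after every update, an alert would sensibly fire only when a delta persists beyond some grace period; the length of that period is a design decision this sketch leaves open.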

Example continued

Following up on the previous example of the autonomously guided device and its maximum movement speed, the operator might set the maxSpeed property to X while the specific device’s real maximum speed is X-10 (e.g., because the device has a less powerful engine than the operator anticipated). In such a situation, the device can either increase its speed to its real maximum or refuse to make any change at all. Either way, it can report the actually configured maximum speed via the reported properties of its device twin so that the operator can act accordingly.

{
  "desired": {
    "maxSpeed": 30
  },
  "reported": {
    "maxSpeed": 20
  }
}
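On the device side, producing the reported properties shown above could look like the following sketch. GuidedDevice and its fields are hypothetical; it implements the clamping option (go up to the real hardware maximum) and keeps the current configuration when the desired value is missing or not a positive number.

```python
from typing import Any, Dict


class GuidedDevice:
    """Hypothetical device-side logic for the warehouse example."""

    def __init__(self, hardware_max_speed: float, max_speed: float) -> None:
        self.hardware_max_speed = hardware_max_speed  # what the engine can really do
        self.max_speed = max_speed                    # currently configured limit

    def on_desired_update(self, desired: Dict[str, Any]) -> Dict[str, Any]:
        """Apply the desired properties and return the reported properties to send back."""
        requested = desired.get("maxSpeed")
        is_number = isinstance(requested, (int, float)) and not isinstance(requested, bool)
        if is_number and requested > 0:
            # Go as high as requested, but never above the real hardware maximum.
            self.max_speed = min(requested, self.hardware_max_speed)
        # Missing or invalid value: keep the current configuration unchanged.
        return {"maxSpeed": self.max_speed}


device = GuidedDevice(hardware_max_speed=20, max_speed=15)
reported = device.on_desired_update({"maxSpeed": 30})
print({"desired": {"maxSpeed": 30}, "reported": reported})
# {'desired': {'maxSpeed': 30}, 'reported': {'maxSpeed': 20}}
```

Implementing the refusal option instead would mean leaving max_speed untouched whenever the requested value exceeds hardware_max_speed; in both cases the reported value tells the operator what is actually in effect.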

Messaging

Messaging is another kind of cloud-to-device communication. It is similar to device twins in some aspects but quite different in others, and it is easier to describe.

With messaging, an operator simply sends a message (sometimes called a command) to a specific device and expects it to act accordingly. If the device is offline at the time of sending, the message is queued in the cloud together with any other messages sent in the meantime, and all of them are delivered to the device when it comes back online. Queued messages also typically have a time-to-live (TTL): if a message stays in the queue longer than its TTL without being consumed by the device, it is automatically deleted. These are the crucial differences from device twins: each individual message is delivered to the device, and messages can expire.
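The queueing and TTL rules can be sketched in Python as a per-device queue on the cloud side. This is a simplified model under stated assumptions: DeviceMessageQueue and the UPLOAD_LOGS command are hypothetical, the clock is injectable so the demo can simulate elapsed time, and a message is removed as soon as it is handed to the device (a production service would more likely wait for an acknowledgement).

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque, Dict, List


@dataclass
class QueuedMessage:
    body: Dict[str, Any]
    expires_at: float  # absolute time on the queue's clock


class DeviceMessageQueue:
    """Hypothetical cloud-side queue of cloud-to-device messages for one device."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._pending: Deque[QueuedMessage] = deque()

    def send(self, body: Dict[str, Any], ttl_seconds: float) -> None:
        """Operator side: enqueue a message that stays deliverable for ttl_seconds."""
        self._pending.append(QueuedMessage(body, self._clock() + ttl_seconds))

    def drain(self) -> List[Dict[str, Any]]:
        """Device comes online: return every unexpired message, oldest first.

        Expired messages are discarded; delivered ones are removed immediately
        (no acknowledgement step in this simplified model).
        """
        now = self._clock()
        live = [m.body for m in self._pending if m.expires_at > now]
        self._pending.clear()
        return live


fake_now = [0.0]  # controllable clock for the demo (seconds)
queue = DeviceMessageQueue(clock=lambda: fake_now[0])
queue.send({"command": "RESTART"}, ttl_seconds=30 * 60)
queue.send({"command": "UPLOAD_LOGS"}, ttl_seconds=2 * 60 * 60)  # hypothetical command
fake_now[0] = 45 * 60  # the device reconnects after 45 minutes
delivered = queue.drain()
print(delivered)  # [{'command': 'UPLOAD_LOGS'}]
```

Every unexpired message is delivered individually and in order, unlike the single merged twin snapshot; in the demo, the RESTART message outlived its 30-minute TTL and was dropped.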

Example

Each message is addressed to a specific device and is represented by, e.g., a JSON document or any other serializable format. A typical use of cloud-to-device messaging is invoking a one-off action such as a restart: the operator wants to restart a device without physical access to it. The device does not need to restart right away, but the operator is willing to wait at most 30 minutes. In such a scenario, the operator could send a message containing, e.g., the string RESTART and set its TTL to 30 minutes. If the device does not process the message within 30 minutes, the message is automatically deleted, and the operator can retry or move on to other tasks without worrying about the device restarting at some inappropriate moment in the future.

{
  "command": "RESTART"
}

Message Feedback

In some situations, it is helpful for an operator to receive feedback for the sent messages (e.g., whether it was successfully processed, rejected, or expired).
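One way to model such feedback, as a sketch: each outcome becomes a record that the operator can query, for example to find devices that should be sent a message again. The Outcome values mirror the three outcomes mentioned above; Feedback, devices_to_retry, and the device and message IDs are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Outcome(Enum):
    COMPLETED = "completed"  # device processed the message successfully
    REJECTED = "rejected"    # device refused or failed to process it
    EXPIRED = "expired"      # TTL elapsed before the device consumed it


@dataclass(frozen=True)
class Feedback:
    device_id: str
    message_id: str
    outcome: Outcome


def devices_to_retry(feedback: Iterable[Feedback], message_ids: Iterable[str]) -> List[str]:
    """Sorted IDs of devices whose message (one of message_ids) was rejected or expired."""
    wanted = set(message_ids)
    return sorted(
        f.device_id
        for f in feedback
        if f.message_id in wanted and f.outcome is not Outcome.COMPLETED
    )


records = [
    Feedback("agv-1", "m-1", Outcome.COMPLETED),
    Feedback("agv-2", "m-2", Outcome.EXPIRED),
    Feedback("agv-3", "m-3", Outcome.REJECTED),
]
print(devices_to_retry(records, ["m-1", "m-2", "m-3"]))  # ['agv-2', 'agv-3']
```

Devices whose messages are still queued have produced no feedback yet and therefore do not show up; tracking them as well would require the list of sent messages.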

Remote procedure call

The last discussed category of cloud-to-device communication is remote procedure calls (RPC). RPCs are pretty similar to messaging, with two crucial differences:

It is not possible to successfully invoke RPC if the device is offline.
The device has an opportunity to provide immediate feedback to the caller.

In other words, RPC is a way to interact with a device as if it exposed methods/functions, in a programming language of choice, that we can invoke. Either the invocation succeeds and we receive a return value, or an exception is thrown; either way, the invocation finishes within a few seconds at most. If the device is offline at the time of invocation, the invocation fails immediately, nothing is persisted in the cloud, and the invocation is not continued or retried when the device comes back online. It is up to the operator to retry the invocation if needed.
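These semantics (fail fast when offline, return a value or raise, bounded waiting) can be sketched from the caller's perspective. In this hypothetical RpcGateway, a plain Python function stands in for the live connection to an online device, and the 5-second default timeout is an assumption standing in for "a few seconds at most".

```python
import concurrent.futures
from typing import Any, Callable, Dict

Handler = Callable[[str, Any], Any]  # (method, payload) -> return value


class DeviceOfflineError(Exception):
    """Raised immediately when the target device is not connected."""


class RpcTimeoutError(Exception):
    """Raised when the device does not answer within the timeout."""


class RpcGateway:
    """Hypothetical cloud-side RPC entry point. Nothing is queued for offline devices."""

    def __init__(self, timeout_seconds: float = 5.0) -> None:
        self.timeout_seconds = timeout_seconds
        self._connections: Dict[str, Handler] = {}  # stands in for live device connections
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def connect(self, device_id: str, handler: Handler) -> None:
        self._connections[device_id] = handler

    def disconnect(self, device_id: str) -> None:
        self._connections.pop(device_id, None)

    def invoke(self, device_id: str, method: str, payload: Any = None) -> Any:
        handler = self._connections.get(device_id)
        if handler is None:
            # Offline: fail right away; nothing is persisted or retried later.
            raise DeviceOfflineError(device_id)
        future = self._pool.submit(handler, method, payload)
        try:
            # Either the return value, or the device-side exception re-raised here.
            return future.result(timeout=self.timeout_seconds)
        except concurrent.futures.TimeoutError:
            raise RpcTimeoutError(f"{device_id}.{method}") from None


def agv_handler(method: str, payload: Any) -> Any:
    """Stand-in for the device side of the call."""
    if method == "restart":
        return {"status": "restarting"}
    raise ValueError(f"unknown method {method!r}")


gateway = RpcGateway(timeout_seconds=2.0)
gateway.connect("agv-1", agv_handler)
print(gateway.invoke("agv-1", "restart"))  # {'status': 'restarting'}
try:
    gateway.invoke("agv-2", "restart")
except DeviceOfflineError as err:
    print("offline:", err)  # offline: agv-2
```

Note that invoke never stores the call anywhere: when DeviceOfflineError is raised, retrying is entirely the operator's responsibility.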

Example

Let’s continue with the example of an operator who needs to restart a device. With cloud-to-device messaging, the operator can just send the restart command, which is eventually delivered to the device. This asynchronicity might be desirable in some cases (the operator wants to send the command and move on to other tasks without waiting for the result), but it can be inconvenient in others. If the restart needs to happen immediately, cloud-to-device messaging might not be the best choice, and a remote procedure call could be used instead.

An even more compelling example involves end users/consumers of devices. For example, a UI application might let users query data that is stored only on the devices and is not sent to or stored in the cloud. This can be the case when most of the data is expected never to be used and is relevant only within a limited timeframe, so it is not economical to collect all of it. In such cases, a remote procedure call that returns the data in question is ideal.
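The device-side half of such a query might look like the sketch below: readings are kept only in a bounded in-memory buffer on the device, and an RPC method returns those within a requested time window as the call's return value. LocalHistory, its capacity, and the reading format are assumptions for illustration; wiring query to an actual RPC mechanism is not shown.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class LocalHistory:
    """Hypothetical on-device buffer of recent readings that are never uploaded."""

    def __init__(self, capacity: int) -> None:
        # Bounded memory: once full, the oldest reading is dropped automatically.
        self._readings: Deque[Tuple[float, float]] = deque(maxlen=capacity)

    def add(self, timestamp: float, value: float) -> None:
        self._readings.append((timestamp, value))

    def query(self, since: float, until: float) -> List[Dict[str, float]]:
        """Body of an RPC method: readings with since <= t <= until, as JSON-friendly dicts."""
        return [{"t": t, "value": v} for t, v in self._readings if since <= t <= until]


history = LocalHistory(capacity=3)
for t, v in [(1, 0.5), (2, 0.7), (3, 0.9), (4, 1.1)]:
    history.add(t, v)
print(history.query(since=2, until=3))  # [{'t': 2, 'value': 0.7}, {'t': 3, 'value': 0.9}]
```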

How the RPC can be implemented

There are two fundamentally different ways in which devices can respond to remote procedure calls:

Polling-based

Devices actively connect to the cloud, poll for active remote procedure calls, and respond to them. With bidirectional protocols widely used in IoT, such as AMQP and MQTT, polling can be very efficient and does not carry the negative connotations historically associated with polling over HTTP. The key takeaway is that the device plays an active role in initiating the RPC.

A practical implementation of this approach involves an SDK for the programming language used in the device’s embedded software. This SDK handles connecting to the cloud and polling. An embedded software engineer can register a custom function/method to be called every time there is a new active RPC. If needed, multiple “RPC-invocable” methods/functions can be registered for different use cases. In that case, the RPC caller must specify the method name, and the embedded software engineer must use the same name when registering the custom method/function.
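A minimal sketch of the registration part of such an SDK, assuming a decorator-based API: RpcDispatcher, its methods, and the HTTP-like status codes are hypothetical and do not correspond to any specific vendor SDK. The fetch and send callables stand in for the SDK's actual cloud connection (e.g., an MQTT or AMQP session).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Handler = Callable[[Any], Any]


@dataclass
class RpcRequest:
    request_id: str
    method: str
    payload: Any


@dataclass
class RpcResponse:
    request_id: str
    status: int  # HTTP-like codes, an assumption of this sketch
    payload: Any


class RpcDispatcher:
    """Hypothetical device-side SDK component mapping method names to user functions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator: the caller must use exactly this `name` to invoke the function."""
        def decorator(func: Handler) -> Handler:
            self._handlers[name] = func
            return func
        return decorator

    def handle(self, request: RpcRequest) -> RpcResponse:
        handler = self._handlers.get(request.method)
        if handler is None:
            return RpcResponse(request.request_id, 404, {"error": f"unknown method {request.method!r}"})
        try:
            return RpcResponse(request.request_id, 200, handler(request.payload))
        except Exception as exc:  # report device-side failures back to the caller
            return RpcResponse(request.request_id, 500, {"error": str(exc)})

    def poll_once(self, fetch: Callable[[], Iterable[RpcRequest]],
                  send: Callable[[RpcResponse], None]) -> None:
        """One polling round; `fetch` and `send` stand in for the SDK's cloud connection."""
        for request in fetch():
            send(self.handle(request))


sdk = RpcDispatcher()


@sdk.register("getMaxSpeed")
def get_max_speed(_payload: Any) -> Dict[str, int]:
    return {"maxSpeed": 20}


pending = [RpcRequest("r1", "getMaxSpeed", None), RpcRequest("r2", "reboot", None)]
responses: List[RpcResponse] = []
sdk.poll_once(lambda: pending, responses.append)
print([(r.request_id, r.status) for r in responses])  # [('r1', 200), ('r2', 404)]
```

A request for a method name that nobody registered (reboot in the demo) is answered with an error instead of being silently ignored, so the caller learns immediately that the names do not match.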

Proxy-based

The device can have a more passive role in the RPC initiation if needed. The device can expose an API (e.g., REST API) and let something else invoke it. This might be more convenient for some devices and/or embedded software engineers because there is no need for special SDKs. On the other hand, a proxy-based approach typically requires additional components, such as IoT gateways that handle the connection to the cloud and poll for active RPCs.

Theoretically, the device’s API could be invoked directly from the cloud, but this is impractical in most cases because the devices would need to expose a public API, or a VPN/VNET would have to be set up.
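The gateway's forwarding step in the proxy-based approach can be sketched with Python's standard library alone. The device API is simulated by a local HTTP server; the POST /rpc/<method> endpoint layout and the forward_rpc helper are assumptions for illustration, and the part where the gateway polls the cloud for active RPCs is omitted.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any


class DeviceApi(BaseHTTPRequestHandler):
    """Stand-in for the device's local REST API; POST /rpc/<method> echoes its input."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"null")
        method = self.path.rsplit("/", 1)[-1]
        body = json.dumps({"method": method, "echo": payload}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence per-request logging in the demo


# Gateway side: local traffic only, so ignore any HTTP proxy set in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def forward_rpc(device_base_url: str, method: str, payload: Any, timeout: float = 5.0) -> Any:
    """Turn an RPC the gateway picked up from the cloud into a call on the device's API."""
    request = urllib.request.Request(
        f"{device_base_url}/rpc/{method}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=timeout) as response:
        return json.loads(response.read())


server = HTTPServer(("127.0.0.1", 0), DeviceApi)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = forward_rpc(f"http://127.0.0.1:{server.server_address[1]}", "getStatus", {"verbose": True})
server.shutdown()
server.server_close()
print(result)  # {'method': 'getStatus', 'echo': {'verbose': True}}
```

Because only the gateway talks to the cloud, the device's API never has to be reachable from the internet, which avoids the public-API and VPN/VNET problems mentioned above.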

Key takeaways

To wrap up this first part of the article, here is a summary of the key points:

What is cloud-to-device communication: interacting with IoT devices remotely (e.g., configuring them or sending commands).
There are three kinds of cloud-to-device communication.
It is essential to select the suitable kind for a given task. Examples of typical usage:
• Device Twins - configuration
• Messaging - asynchronous commands
• RPC - real-time (synchronous) interaction

As mentioned at the beginning, using cloud-to-device communication in production-grade IoT solutions brings additional problems. Handling hundreds or more devices, ensuring everybody can use the solution seamlessly, and making the solution extensible enough to support various business use cases is not easy. There are IoT platform products that can help significantly. You can find more information on these topics in the second part of this article: Managing IoT devices at scale.

Tomas Pajurek

CTO at Spotflow

Tomas is a software engineer at heart with a proven track record in architecting data-intensive systems, mainly in the agritech, manufacturing, and biotech sectors, and in leading the engineering teams that build them. He is deeply interested in the design of IIoT, stateful stream processing & distributed systems, software & platform architecture, resilience, cloud, and security. Together with his team of talented engineers, he applies this knowledge daily to ensure Spotflow is a product that customers can fully rely on and enjoy using.
