One of the essential parts of most IoT solutions is connecting devices to a cloud service or server. Therefore, phrases like “Connecting devices to the cloud” are used frequently. However, the term can mean very different things in different contexts. The primary difference is the direction of the connection. Is the device sending data to the cloud (device-to-cloud communication) or receiving data/commands from the cloud (cloud-to-device communication)? In this article, we will explore the less well-understood cloud-to-device communication.
Good real-world examples of cloud-to-device communication might be:
There are multiple kinds of cloud-to-device communication, and for designers of IoT solutions, it is crucial to pick the kinds that suit the given requirements. Failure to do so often leads to unreliability, poor performance, and unnecessarily high costs. However, understanding which type of cloud-to-device communication suits the task at hand is usually insufficient for production-grade IoT solutions, and additional challenges must be addressed.
This article explains the differences among kinds of cloud-to-device communication and the challenges of using cloud-to-device communication in production solutions.
The article is split into two parts with the following content:
This article uses the term “cloud,” but IoT solutions do not necessarily need to be implemented using cloud technologies. In the following text, the term “cloud” is used for simplicity, but it could be easily substituted with a more generic “service side” or “backend.”
Based on expected guarantees for delivery, durability, and latency, we can divide cloud-to-device communication into three kinds: Device Twins, Messaging, and Remote Procedure Calls (RPC).
A Device Twin (sometimes referred to as Digital Twin) is the virtual counterpart of a device that lives in the cloud. It represents the current or desired state of the device, which can be read or modified by operators or other services. It can be accessed at any time, regardless of whether the device is currently online or offline.
If an operator updates the device twin while the physical device is online, the update (or the entire updated twin snapshot) is immediately delivered to the device, and the device can act accordingly (e.g., change its configuration). If the device is offline, the update is persisted in the cloud and delivered to the device as soon as it comes back online. If there are multiple updates to the device twin during the offline period, the device receives just one merged update (or the entire updated twin snapshot) when it comes online. This is an important differentiator of device twins from other categories of cloud-to-device communication → individual updates/history of a device twin are unimportant; we only care about the resulting state, which is kept for the entire lifetime of the device (it does not expire).
In practical terms, the device twin is a document (e.g., a JSON document) stored in the cloud that both the device and operators can read from and write to. The prominent use case for device twins is configuration.
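To make the merge behavior concrete, here is a minimal in-memory sketch of the cloud side of a device twin. The `DeviceTwin` class and its methods are hypothetical, not any platform's API; updates are merged loosely following JSON Merge Patch (RFC 7396), where a null (`None`) value deletes a property, and the device receives a single snapshot of the resulting state when it comes back online.

```python
import copy
from typing import Any


def merge_patch(target: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `target` with `patch` merged in; a None value deletes the key."""
    result = copy.deepcopy(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


class DeviceTwin:
    """Hypothetical cloud-side twin: stores desired state, delivers snapshots to the device."""

    def __init__(self) -> None:
        self.desired: dict[str, Any] = {}
        self.online = False
        self.device_inbox: list[dict[str, Any]] = []  # snapshots received by the device
        self._changed_while_offline = False

    def update_desired(self, patch: dict[str, Any]) -> None:
        self.desired = merge_patch(self.desired, patch)
        if self.online:
            # Online: deliver the updated snapshot right away.
            self.device_inbox.append(copy.deepcopy(self.desired))
        else:
            # Offline: only persist; intermediate states are never delivered.
            self._changed_while_offline = True

    def set_online(self, online: bool) -> None:
        self.online = online
        if online and self._changed_while_offline:
            # Reconnect: one snapshot with the merged result of all offline updates.
            self.device_inbox.append(copy.deepcopy(self.desired))
            self._changed_while_offline = False


twin = DeviceTwin()
twin.update_desired({"maxSpeed": 5})
twin.update_desired({"maxSpeed": 7, "lights": "on"})
twin.update_desired({"lights": None})
twin.set_online(True)
print(twin.device_inbox)  # [{'maxSpeed': 7}]
```

Three updates made while the device was offline collapse into one delivered state; the intermediate values `5` and `"on"` never reach the device, which is exactly the "only the resulting state matters" property described above.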
Let’s look at the following example: assume an imaginary autonomously guided device is moving inside a warehouse. We want an easy way to adjust the maximum movement speed of the device without physical access to it. To implement this functionality, we use a device twin with one numeric property, maxSpeed. To change the desired maximum speed, an operator can simply set this property to a specific value from anywhere via a cloud API. The underlying mechanism of device twins ensures that this value is delivered to the device (it might not be delivered immediately, but it will eventually get there). Once the device receives the device twin update, it can adjust its max speed, and the action is successfully finalized.
In an ideal situation, there should not be a difference between the desired state of the device (set by an operator) and the reported (actual, real) state. However, in reality, there are several situations in which this might not be true:
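Frequently, we need to monitor this discrepancy and/or receive feedback from devices for device twin updates. For this reason, the device twin is typically represented as two independent parts (e.g., two JSON documents): desired properties, set by operators, and reported properties, which reflect the actual state reported by the device.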
If everything works smoothly, these two parts should be identical except for the short periods between an operator posting a twin update and the device processing it. However, as mentioned, there are situations in which differences arise that we might want to monitor and act upon.
Following up on the previous example of the autonomously guided device and its maximum movement speed, it could happen that the operator sets the maxSpeed property to X, but the specific device’s maximum speed is X-10 (e.g., because the device has a less powerful engine than the operator anticipated). In such a situation, the device can either set its max speed to its real maximum or refuse to make any change at all. In either case, it can report the actually configured max speed via the reported properties of its device twin so that the operator can act accordingly.
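The desired/reported split and the maxSpeed scenario can be sketched as two small functions, assuming the device chooses to cap the value at its real maximum rather than refuse the change. `HARDWARE_MAX_SPEED`, `apply_desired`, and `discrepancies` are hypothetical names, and plain dictionaries stand in for the two JSON documents.

```python
from typing import Any

HARDWARE_MAX_SPEED = 15  # hypothetical limit of this particular device (the "X-10")


def apply_desired(desired: dict[str, Any], reported: dict[str, Any]) -> dict[str, Any]:
    """Device side: apply desired maxSpeed, capped at the hardware limit; return new reported state."""
    new_reported = dict(reported)
    if "maxSpeed" in desired:
        new_reported["maxSpeed"] = min(desired["maxSpeed"], HARDWARE_MAX_SPEED)
    return new_reported


def discrepancies(desired: dict[str, Any], reported: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Operator side: map each property whose reported value differs to (desired, reported)."""
    return {
        key: (value, reported.get(key))
        for key, value in desired.items()
        if reported.get(key) != value
    }


desired = {"maxSpeed": 25}  # the operator's "X"
reported = apply_desired(desired, reported={"maxSpeed": 10})
print(reported)                          # {'maxSpeed': 15}
print(discrepancies(desired, reported))  # {'maxSpeed': (25, 15)}
```

A non-empty result from `discrepancies` is the signal an operator (or an automated monitor) could act upon; when the device fully honors the desired value, the result is empty.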
Messaging is another category of cloud-to-device communication that is similar to device twins in some respects but quite different in others. It is also simpler to describe.
With messaging, an operator simply sends a message (sometimes called a command) to a specific device and expects it to act accordingly. If the device is offline at the time of sending, one or more messages are queued in the cloud, and all of them are delivered to the device when it comes back online. The queued messages also typically have a time-to-live (TTL). If a message stays queued longer than its TTL without being consumed by the device, it is automatically deleted. These are the crucial differences from device twins → each individual message is delivered to the device, and messages can expire.
Each message is addressed to a specific device and is represented by, e.g., a JSON document or any other serializable format. A typical use of cloud-to-device messaging is invoking one-off actions such as restarts. Suppose the operator wants to restart a device without physical access to it. The device does not need to restart right away, but the operator is willing to wait no longer than 30 minutes. In such a scenario, the operator could send a message containing, e.g., the string RESTART with a TTL of 30 minutes. If the device does not process the message within 30 minutes, the message is automatically deleted, and the operator can retry or move on to other tasks without worrying that the device will restart at some inappropriate moment in the future.
In some situations, it is helpful for an operator to receive feedback for the sent messages (e.g., whether it was successfully processed, rejected, or expired).
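A minimal sketch of per-device messaging with TTL and operator feedback, built around the restart example. `DeviceMessageQueue`, the `UPLOAD_LOGS` command, and the `Feedback` values are hypothetical; time is passed in explicitly so the example is deterministic, whereas a real service would use its own clock and durable storage.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class Feedback(Enum):
    COMPLETED = "completed"
    REJECTED = "rejected"
    EXPIRED = "expired"


@dataclass
class Message:
    id: int
    body: str
    expires_at: float  # absolute time in seconds


class DeviceMessageQueue:
    """Hypothetical cloud-side queue for one device; times are passed in explicitly."""

    def __init__(self) -> None:
        self._pending: list[Message] = []
        self._ids = itertools.count(1)
        self.feedback: dict[int, Feedback] = {}

    def send(self, body: str, ttl: float, now: float) -> int:
        message = Message(next(self._ids), body, expires_at=now + ttl)
        self._pending.append(message)
        return message.id

    def deliver_all(self, now: float) -> list[Message]:
        """Device reconnected: drop expired messages, hand over the rest in order."""
        deliverable = []
        for message in self._pending:
            if now >= message.expires_at:
                self.feedback[message.id] = Feedback.EXPIRED
            else:
                deliverable.append(message)
        self._pending = []
        return deliverable

    def report(self, message_id: int, outcome: Feedback) -> None:
        """Device-side acknowledgement, exposed to the operator as feedback."""
        self.feedback[message_id] = outcome


queue = DeviceMessageQueue()
restart_id = queue.send("RESTART", ttl=30 * 60, now=0)
logs_id = queue.send("UPLOAD_LOGS", ttl=2 * 60 * 60, now=0)

# The device comes back online after 45 minutes: RESTART has expired, UPLOAD_LOGS has not.
for message in queue.deliver_all(now=45 * 60):
    queue.report(message.id, Feedback.COMPLETED)

print(queue.feedback[restart_id], queue.feedback[logs_id])
# Feedback.EXPIRED Feedback.COMPLETED
```

Unlike the device twin sketch, every unexpired message is delivered individually, and the expired RESTART is never executed late.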
The last discussed category of cloud-to-device communication is remote procedure calls (RPC). RPCs are pretty similar to messaging, with two crucial differences:
In other words, RPC is a way to interact with a device as if it exposed methods/functions in a programming language of choice that we can invoke. Either the invocation succeeds and we receive a return value, or an exception is thrown. In either case, the invocation finishes within a few seconds at most. If the device is offline at the time of invocation, the invocation fails immediately, nothing is persisted in the cloud, and the invocation is not continued or retried when the device comes back online; it is up to the operator to retry the invocation if needed.
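The synchronous, non-persistent semantics can be sketched as a small dispatcher. `RpcService`, `DeviceOfflineError`, and `RpcTimeoutError` are hypothetical names, and an in-process callable stands in for the live connection to a device; the point is only the contract: fail immediately when offline, return the result or raise, and give up after a short timeout.

```python
import concurrent.futures
from typing import Any, Callable

Handler = Callable[[str, dict], Any]


class DeviceOfflineError(Exception):
    """The target device is not connected; nothing was queued."""


class RpcTimeoutError(Exception):
    """The device did not answer within the timeout."""


class RpcService:
    """Hypothetical cloud-side RPC dispatcher: synchronous, never persisted or retried."""

    def __init__(self, timeout: float = 5.0) -> None:
        self.timeout = timeout
        self._connected: dict[str, Handler] = {}  # device id -> live connection
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def connect(self, device_id: str, handler: Handler) -> None:
        self._connected[device_id] = handler

    def disconnect(self, device_id: str) -> None:
        self._connected.pop(device_id, None)

    def invoke(self, device_id: str, method: str, payload: dict) -> Any:
        handler = self._connected.get(device_id)
        if handler is None:
            # Offline: fail immediately; retrying later is the caller's decision.
            raise DeviceOfflineError(device_id)
        future = self._pool.submit(handler, method, payload)
        try:
            # Returns the device's result or re-raises the exception it raised.
            return future.result(timeout=self.timeout)
        except concurrent.futures.TimeoutError:
            raise RpcTimeoutError(f"{device_id}.{method}") from None


rpc = RpcService(timeout=2.0)
try:
    rpc.invoke("agv-7", "restart", {})
except DeviceOfflineError:
    print("agv-7 is offline; nothing was queued")

rpc.connect("agv-7", lambda method, payload: {"status": "ok", "method": method})
print(rpc.invoke("agv-7", "restart", {}))  # {'status': 'ok', 'method': 'restart'}
```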
Let’s continue with the example of an operator who needs to restart a device. With cloud-to-device messaging, the operator can simply send the restart command, which is eventually delivered to the device. This asynchrony might be desirable in some cases (the operator wants to send the command and move on to other tasks without waiting for the result), but it can be inconvenient in others. If the restart must happen immediately, cloud-to-device messaging might not be the best choice, and a remote procedure call could be used instead.
An even more compelling example is a scenario that involves end users/consumers of devices. For example, there might be a UI application that lets users query data stored only on the devices and never sent to or stored in the cloud (e.g., because most of the data is expected never to be used and is relevant only for a limited timeframe, so collecting it all would not be economical). In such cases, a remote procedure call that returns the requested data as its return value is ideal.
There are two fundamentally different ways in which devices respond to remote procedure calls:
Devices actively connect to the cloud, poll for active remote procedure calls, and respond to them. With bidirectional protocols widely used in IoT, such as AMQP and MQTT, polling can be very efficient and does not carry the negative connotations historically associated with polling over HTTP. The key takeaway is that the device must play an active role in RPC initiation.
In practice, this approach involves an SDK in the programming language used for the device’s embedded software. The SDK handles connecting to the cloud and polling. An embedded software engineer can register a custom function/method to be called every time there is a new active RPC. Multiple “RPC-invocable” methods/functions can be registered for different use cases if needed. In that case, the RPC caller must specify the method name, and the embedded software engineer must use the same name when registering the custom method/function.
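The name-based registration can be sketched as a small device-side registry. `DeviceRpcSdk`, its `register` decorator, and the HTTP-style status codes are hypothetical rather than any vendor SDK's API, and the polling transport (e.g., over MQTT or AMQP) is left out: `dispatch` handles one call as the SDK's polling loop might hand it over. The registered method returns data kept only on the device, as in the UI scenario above.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]


class DeviceRpcSdk:
    """Hypothetical device-side SDK: maps RPC method names to registered functions."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, name: str) -> Callable[[Handler], Handler]:
        """Decorator: the caller must use the same `name` when invoking the RPC."""
        def decorator(func: Handler) -> Handler:
            self._handlers[name] = func
            return func
        return decorator

    def dispatch(self, call: dict[str, Any]) -> dict[str, Any]:
        """Handle one active RPC obtained by polling, e.g. {"method": ..., "payload": ...}."""
        handler = self._handlers.get(call["method"])
        if handler is None:
            return {"status": 404, "error": f"unknown method {call['method']!r}"}
        try:
            return {"status": 200, "result": handler(call.get("payload", {}))}
        except Exception as exc:  # report failures back to the caller instead of crashing
            return {"status": 500, "error": str(exc)}


sdk = DeviceRpcSdk()
local_readings = [21.5, 21.7, 22.1]  # data kept only on the device


@sdk.register("getRecentReadings")
def get_recent_readings(payload: dict[str, Any]) -> list[float]:
    count = payload.get("count", 1)
    return local_readings[max(0, len(local_readings) - count):]


print(sdk.dispatch({"method": "getRecentReadings", "payload": {"count": 2}}))
# {'status': 200, 'result': [21.7, 22.1]}
print(sdk.dispatch({"method": "reboot"})["status"])  # 404
```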
The device can have a more passive role in the RPC initiation if needed. The device can expose an API (e.g., REST API) and let something else invoke it. This might be more convenient for some devices and/or embedded software engineers because there is no need for special SDKs. On the other hand, a proxy-based approach typically requires additional components, such as IoT gateways that handle the connection to the cloud and poll for active RPCs.
Theoretically, the device’s API could be invoked directly from the cloud, but this is impractical in most cases because the devices would need a public API, or a VPN/VNET would be required.
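The proxy-based approach can be sketched with the Python standard library alone: the device serves a small local HTTP API, and a gateway, which would hold the actual cloud connection, forwards each RPC to it. The `/rpc/<method>` URL scheme, the `getStatus` method, and `forward_rpc` are hypothetical; how the gateway receives RPCs from the cloud is left out, and both sides run on localhost here only to keep the example self-contained.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DeviceApi(BaseHTTPRequestHandler):
    """Device side: a plain local REST endpoint, no IoT SDK required (hypothetical paths)."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.path == "/rpc/getStatus":
            body, code = {"uptime_s": 1234, "echo": payload}, 200
        else:
            body, code = {"error": "unknown method"}, 404
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):  # keep the example's output quiet
        pass


def forward_rpc(device_base_url: str, method: str, payload: dict, timeout: float = 5.0) -> dict:
    """Gateway side: forward an RPC received from the cloud to the device's local API."""
    request = urllib.request.Request(
        f"{device_base_url}/rpc/{method}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A non-2xx answer raises urllib.error.HTTPError, i.e. a failed invocation.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


server = ThreadingHTTPServer(("127.0.0.1", 0), DeviceApi)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
device_url = f"http://127.0.0.1:{server.server_address[1]}"

status = forward_rpc(device_url, "getStatus", {"verbose": True})
print(status)  # {'uptime_s': 1234, 'echo': {'verbose': True}}

server.shutdown()
server.server_close()
```

The device code only needs an HTTP server; everything cloud-specific lives in the gateway, which is the trade-off the proxy-based approach makes.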
To wrap up this first part of the article, this is the summary of key points:
As touched upon at the beginning, using cloud-to-device communication in production-grade IoT solutions brings additional challenges. Handling hundreds or more devices, ensuring that everybody can use the solution seamlessly, and making the solution extensible enough to support various business use cases are all far from easy. IoT platform products can help significantly here. You can find more information on these topics in the second part of this article: Managing IoT devices at scale.