When most people think about IoT, they think about Internet-connected devices that collect data. These include security systems, manufacturing systems, electrical grids, distribution systems, and Internet-connected cars, among many others. There are quite literally billions of devices connected to the Internet. Each device can generate thousands of messages, meaning that there can be hundreds of billions of messages sent by devices. Managing the devices, the messaging, and the data they generate has created an umbrella that covers various disciplines, including the devices themselves, device management, big data, data presentation, and data consumption.
Azure offers different solutions for each of these disciplines, each suited to what that discipline manages.
Devices are the “things” in IoT. The devices connect to the Internet either directly or indirectly. Through direct communication, a device will reach out over a network and connect to its respective management suite through any available protocols, including HTTP, MQTT, AMQP, and others. Indirectly, a device may communicate with a broker through some proprietary protocol that transforms and relays the information over the Internet or through an IoT Edge appliance installed near the device. Devices can use more generic means of communication through APIs provided by IoT suites, or they can implement an SDK on the device and leverage the SDK’s abstractions for an IoT implementation.
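To make the indirect path more concrete, here is a minimal sketch of a broker translating a proprietary binary frame into a JSON message that could be relayed over the Internet. The frame layout, the field names, and the `frame_to_json` helper are all assumptions invented for illustration; they do not come from any real device or Azure API.

```python
import json
import struct

# Hypothetical proprietary frame layout (an assumption for this sketch):
# device number (uint16), temperature in hundredths of a degree Celsius
# (int16), battery percentage (uint8), all big-endian.
FRAME_FORMAT = ">HhB"


def frame_to_json(frame: bytes) -> str:
    """Translate one binary frame into a JSON telemetry message."""
    device_number, temp_centi, battery = struct.unpack(FRAME_FORMAT, frame)
    message = {
        "deviceId": f"sensor-{device_number}",
        "temperatureC": temp_centi / 100,
        "batteryPercent": battery,
    }
    return json.dumps(message, sort_keys=True)


# A device would emit this frame; the broker relays the JSON form upstream.
example_frame = struct.pack(FRAME_FORMAT, 42, 2150, 87)
example_json = frame_to_json(example_frame)
print(example_json)
```

The device side stays small and protocol-specific, while everything upstream of the broker only ever sees a generic, self-describing message.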
Azure provides several services:
- Azure Sphere – a custom PCB and operating system from Microsoft that offers a platform for building IoT devices and connecting those to Azure.
- Azure Percept – Azure Percept is an AI-enhanced device that integrates with Azure IoT Hub and Azure ML. Models can be trained and pushed to the device to run on the device’s AI neural chips.
- Azure IoT SDKs – Azure provides IoT SDKs for many different languages. These enable developers and device makers to use their own PCBs and devices with the language of their choice.
An edge device in the IoT space is a device that sits on the “edge” of a network, similar to a firewall or proxy server. It brokers communication between the cloud and devices. It can also bring some of the functionality typically performed in the cloud closer to the devices that connect to the edge appliance.
Microsoft provides a point solution for the edge with Azure IoT Edge. Azure IoT Edge is a free runtime that can be installed on a virtual or physical appliance. It brings the cloud closer to the devices by running many of the same services found in the cloud on the appliance, such as Stream Analytics and Azure ML. Azure IoT Edge can also act as a proxy for messages from devices so that messages can be stored and forwarded in the event of network outages and the like. IoT Edge is extensible through Docker containers that run on the edge. These containers can process, filter, and transform commands and telemetry for devices connected to Azure IoT Edge.
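The store-and-forward behavior described above can be sketched in a few lines. This is an illustrative model only, not how Azure IoT Edge is implemented: the `StoreAndForwardBuffer` class, the `fake_send` function, and the `link` flag are hypothetical names used to simulate an upstream connection that goes down and comes back.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForwardBuffer:
    """Queue messages locally and forward them in order when the link is up."""

    def __init__(self, send: Callable[[str], bool], max_messages: int = 1000):
        self._send = send  # returns True when the upstream accepted the message
        # With maxlen set, the oldest message is dropped once the buffer is full.
        self._pending: Deque[str] = deque(maxlen=max_messages)

    def submit(self, message: str) -> None:
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Try to forward pending messages; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._pending)


# Simulated upstream link (hypothetical stand-in for a cloud connection).
link = {"online": False}
delivered: List[str] = []


def fake_send(message: str) -> bool:
    if link["online"]:
        delivered.append(message)
        return True
    return False


buffer = StoreAndForwardBuffer(fake_send)
buffer.submit("reading-1")
buffer.submit("reading-2")
pending_during_outage = buffer.pending  # both messages are held locally

link["online"] = True
buffer.submit("reading-3")  # the backlog drains in order, then the new message
```

Messages are only removed from the buffer after the upstream confirms them, so an outage delays delivery rather than losing data (up to the buffer's capacity).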
In addition to the IoT Edge, Azure provides Data Box Edge. Data Box Edge provides a data collection endpoint for devices and some AI-enhanced data processing capabilities on the Edge. Data Box Edge is an appliance that can provide two-way communication for data in and out of Azure and data transformations.
IoT management provides three essential functions for IoT deployments: messaging, device management, and security. Messaging handles device-to-cloud (D2C) and cloud-to-device (C2D) messaging. The two basic kinds of messaging are commands and telemetry. Commands tell a device to do something, while telemetry is some kind of data, such as a periodic sensor reading or some event from a device. These message types are brokered through IoT management, which is integrated with the next essential function, device management. Messaging brokered by an IoT platform can route messages to external systems.
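As a rough illustration of brokered messaging, the sketch below tags each message as telemetry or a command and routes it to named endpoints based on simple conditions. The `Message` and `MessageRouter` classes and the endpoint names are hypothetical; a real IoT platform expresses routes through its own configuration rather than Python lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Message:
    device_id: str
    kind: str  # "telemetry" (device-to-cloud) or "command" (cloud-to-device)
    body: dict


class MessageRouter:
    """Send each message to every endpoint whose condition matches."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Callable[[Message], bool]]] = []

    def add_route(self, endpoint: str, condition: Callable[[Message], bool]) -> None:
        self._routes.append((endpoint, condition))

    def route(self, message: Message) -> List[str]:
        return [endpoint for endpoint, condition in self._routes if condition(message)]


router = MessageRouter()
router.add_route("cold-storage", lambda m: m.kind == "telemetry")
router.add_route(
    "alerts",
    lambda m: m.kind == "telemetry" and m.body.get("temperatureC", 0) > 30,
)
router.add_route("device-queue", lambda m: m.kind == "command")

hot_reading = Message("sensor-42", "telemetry", {"temperatureC": 35.0})
normal_reading = Message("sensor-42", "telemetry", {"temperatureC": 21.0})
reboot = Message("sensor-42", "command", {"name": "reboot"})
```

Here `router.route(hot_reading)` matches both the storage and alert routes, while `router.route(reboot)` matches only the device queue.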
Device management encapsulates device onboarding and device configuration. Device onboarding happens when a device is turned on for the first time after it is deployed in the target environment. The device contacts an endpoint preconfigured at the factory. Based on the credentials provided on the device, the device will be attached to the appropriate security providers and messaging platforms. Additionally, management provides device configuration. Device configuration is, in essence, the ability to manage the settings on a device from the cloud. The settings are stored cloud side in a “device twin” that enables cloud-side services to read device settings without querying the deployed device. Likewise, it serves as a backup if a device loses its configuration.
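A device twin can be pictured as two sets of settings kept in the cloud: what the cloud wants the device to use and what the device last said it was using. The sketch below is a simplified model under that assumption; the `DeviceTwin` class and the setting names are hypothetical and this is not the IoT Hub twin API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

_MISSING = object()  # distinguishes "never reported" from a reported None


@dataclass
class DeviceTwin:
    """Cloud-side copy of a device's settings."""

    desired: Dict[str, Any] = field(default_factory=dict)   # set from the cloud
    reported: Dict[str, Any] = field(default_factory=dict)  # last sent by the device

    def pending_changes(self) -> Dict[str, Any]:
        """Settings the device has not yet confirmed."""
        return {
            key: value
            for key, value in self.desired.items()
            if self.reported.get(key, _MISSING) != value
        }

    def apply_report(self, settings: Dict[str, Any]) -> None:
        """Record settings the device reports after applying them."""
        self.reported.update(settings)


twin = DeviceTwin(
    desired={"sampleIntervalSeconds": 30, "led": "off"},
    reported={"sampleIntervalSeconds": 60, "led": "off"},
)
changes = twin.pending_changes()  # only the interval differs
twin.apply_report(changes)        # the device applies and confirms it
```

Cloud-side services can read `twin.reported` at any time without contacting the device, and `twin.desired` doubles as the backup a device can restore from.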
Security is two-sided, covering both the devices and the cloud. Devices typically use an authentication scheme, based on either a key or certificates, that enables them to authenticate with the IoT messaging platform. IoT management handles the infrastructure needed to issue and rotate certificates for deployed devices. The initial certificates or keys are issued to a device through onboarding.
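For key-based authentication, a device typically signs a short-lived token with its key rather than sending the key itself. The sketch below follows the general shape of a shared access signature as used by Azure IoT Hub (an HMAC-SHA256 signature over the resource URI and an expiry time). The hub name, device name, and key are made up, and the exact token format should be checked against the official documentation before relying on it.

```python
import base64
import hashlib
import hmac
import urllib.parse


def generate_sas_token(resource_uri: str, key_b64: str, expiry: int) -> str:
    """Build a signed, expiring token from a base64-encoded symmetric key."""
    encoded_uri = urllib.parse.quote(resource_uri, safe="")
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    key = base64.b64decode(key_b64)
    digest = hmac.new(key, string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}"


# Hypothetical values for illustration only; this is not a real key or hub.
example_key = base64.b64encode(b"not-a-real-device-key").decode("ascii")
token = generate_sas_token(
    "example-hub.azure-devices.net/devices/sensor-42",
    example_key,
    expiry=1700000000,  # Unix timestamp after which the token is rejected
)
```

Because the expiry is part of the signed string, a captured token cannot be extended, and rotating the device key invalidates every token signed with the old one.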
Azure offers many solutions for managing devices:
- Azure IoT Hub – Azure IoT Hub provides the basic infrastructure for messaging, management, and security for device deployments. The messaging platform creates queues for sending messages to and receiving messages from a device, and it easily integrates with Azure IoT Edge, Azure Sphere, and Azure IoT SDKs.
- Azure Device Provisioning Service (DPS) – DPS provides a solution for onboarding devices onto an Azure IoT Hub. This service allows device manufacturers and device users to set up devices at scale. It can recognize new devices, apply the appropriate credentials, and connect the device to the appropriate IoT Hub.
- Azure Digital Twins – Azure Digital Twins offers enhancements over the basic twinning offered in IoT Hub by creating a virtual environment that represents physical environments for simulations and data parity for devices and their relationships (such as people and places) through a queryable graph.
- Azure IoT Central – Azure IoT Central is a SaaS offering from Azure that provides tenancy for applications interacting with Azure-managed IoT devices. These applications apply logic through rules and APIs that interact with external systems, such as webhooks, messaging platforms, and storage solutions.
A data layer in an IoT platform processes the commands, events, and telemetry from devices at scale. Data processing can be described in terms of three different paths: hot, cold, and warm.
Hot path data is data processed in real-time or near real-time. Data processing in this context is typically triggered by the arrival of new messages or runs over tiny windows of data covering a short, fixed amount of time. The demarcation between hot paths and cold paths is that hot paths attempt to make data available as soon as possible. The output from a hot path can be a cold storage platform, like a database, but the intent is that the data is immediately available, such as through an API. Therefore, the operations in that data context are more transactional. Alternatively, the data from a hot path can be forwarded as messages and can be consumed downstream by message consumers, such as live dashboards or other event-driven systems.
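The idea of processing data in tiny windows can be sketched as a tumbling window average: each reading falls into exactly one fixed-length window, and one result is produced per window. The function name and sample readings below are assumptions for illustration, and the sketch processes a list for brevity, whereas a real hot path would update the windows as messages arrive.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_averages(
    readings: Iterable[Tuple[int, float]], window_seconds: int
) -> Dict[int, float]:
    """Average (timestamp, value) readings per fixed, non-overlapping window.

    Timestamps are whole seconds; each result is keyed by its window start.
    """
    totals = defaultdict(lambda: [0.0, 0])  # window start -> [sum, count]
    for timestamp, value in readings:
        window_start = timestamp - (timestamp % window_seconds)
        bucket = totals[window_start]
        bucket[0] += value
        bucket[1] += 1
    return {start: total / count for start, (total, count) in sorted(totals.items())}


# Hypothetical temperature readings as (seconds, degrees Celsius).
readings = [(0, 20.0), (4, 22.0), (10, 30.0), (14, 32.0), (21, 25.0)]
averages = tumbling_window_averages(readings, window_seconds=10)
print(averages)
```

Smaller windows make results available sooner at the cost of noisier aggregates, which is the basic trade-off when tuning a hot path.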
Azure provides a few services for hot path data:
- Azure Event Hubs – Azure Event Hubs provides a massively scalable messaging platform that can deliver millions of messages per second. It often serves as an output from Azure IoT Hub and an input for Azure Stream Analytics. It is primarily intended for streaming workloads and retains messages only for a limited period, but it can persist messages in Blob Storage for longer-term retention.
- Azure Stream Analytics – Azure Stream Analytics provides a SQL-like language that aggregates, filters, and enriches data from streaming and static sources. The output of Stream Analytics can be sent to any number of destinations, including cold storage options like SQL Server or messaging platforms like Azure Service Bus.
- Azure Functions – Azure Functions provides an event-driven, serverless programming model that enables developers to create applications that respond to different kinds of triggers, including outputs from Stream Analytics, Event Hubs, and Service Bus. Because developers bring their own code, Functions can be used for various purposes. Output bindings enable results from Functions to be stored or handed off to another step in a pipeline.
- Azure Databricks – Azure Databricks is a partnership between Databricks and Microsoft to provide Apache Spark as a service. Spark is a Swiss Army knife-like tool for data processing and supports numerous programming languages and data sources. Depending on the need, it can perform real-time processing and batch processing. Databricks is more of a premium service and adds value-added services that enable analysts and data scientists to work together.
- Azure Synapse Analytics – Azure Synapse Analytics provides Apache Spark integrated into Azure Synapse. Synapse is aimed more at data warehousing and therefore is more intended for analytics and reporting services, so it’s more on-demand.
- Azure HDInsight – HDInsight provides many tools for data processing as a service, including Apache Spark and Kafka. Kafka is the purpose-built tool for performing stream analytics in the Hadoop ecosystem. HDInsight primarily aims to provide an always-on configuration for running big data workloads.
Cold path is virtually synonymous with batch processing, and that is its primary distinction from hot path, which aims at real-time or near real-time processing. Cold path processing allows data to accumulate over a period of time, then processes the data en masse rather than transactionally as hot paths typically do. Cold path workloads almost always run on a timer. The intervals vary widely, with some jobs running as often as every minute and others running daily, weekly, or even monthly. Cold path pipelines perform ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) operations. Extraction means pulling data from cold storage, such as a database, data lake, or storage account. Transformations consist of aggregates, joins, and enrichments of the data. Loads consist of storing the data in another cold storage system, ready for consumption.
As with hot path data, Azure offers a myriad of options for processing data through a cold path:
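A small batch job makes the extract, transform, and load steps easier to see. The sketch below uses an in-memory SQLite database as a stand-in for cold storage; the table names, columns, and sample rows are hypothetical, and a production pipeline would use one of the managed services rather than hand-written code like this.

```python
import sqlite3


def run_daily_etl(conn: sqlite3.Connection) -> int:
    """Summarize raw readings into one row per device per day."""
    # Extract: pull the accumulated raw data from cold storage.
    rows = conn.execute(
        "SELECT device_id, recorded_at, temperature FROM raw_readings"
    ).fetchall()

    # Transform: aggregate readings by device and calendar day.
    totals = {}
    for device_id, recorded_at, temperature in rows:
        key = (device_id, recorded_at[:10])  # ISO timestamp -> YYYY-MM-DD
        total, count = totals.get(key, (0.0, 0))
        totals[key] = (total + temperature, count + 1)
    summary = [
        (device_id, day, total / count, count)
        for (device_id, day), (total, count) in sorted(totals.items())
    ]

    # Load: store the results where consumers can query them.
    conn.executemany("INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?, ?)", summary)
    conn.commit()
    return len(summary)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_readings (device_id TEXT, recorded_at TEXT, temperature REAL)"
)
conn.execute(
    "CREATE TABLE daily_summary (device_id TEXT, day TEXT, avg_temperature REAL, "
    "reading_count INTEGER, PRIMARY KEY (device_id, day))"
)
conn.executemany(
    "INSERT INTO raw_readings VALUES (?, ?, ?)",
    [
        ("sensor-1", "2024-01-01T00:00:00", 20.0),
        ("sensor-1", "2024-01-01T12:00:00", 22.0),
        ("sensor-1", "2024-01-02T00:00:00", 18.0),
        ("sensor-2", "2024-01-01T06:00:00", 30.0),
    ],
)
rows_loaded = run_daily_etl(conn)
```

Because the load step replaces rows by primary key, running the job again on the same data produces the same summary, which is a useful property for jobs triggered on a timer.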
- Azure Data Factory – Azure Data Factory provides a no-code abstraction on top of Apache Spark that enables users to develop ETL and ELT pipelines using a visual designer. Data Factory uses a serverless billing model, which bills jobs based on CPU and bandwidth consumption. Data Factory provides input and output connectors for virtually every data service on Azure and many off Azure, integrating well with existing Azure services.
- Azure Databricks – Azure Databricks is a partnership between Databricks and Microsoft to provide Apache Spark as a service. Spark is a Swiss Army knife-like tool for data processing and supports numerous programming languages and data sources. It can perform real-time processing and batch processing, depending on the need. Databricks is more of a premium service and adds value-added services that enable analysts and data scientists to work together.
- Azure Synapse SQL – Azure Synapse brings together many different tools for data processing, particularly aimed at data warehousing. Azure Synapse SQL is the successor to Azure SQL Data Warehouse and provides a SQL interface for performing ETL and ELT workloads within the context of Azure Synapse.
- HDInsight – HDInsight provides several different open-source tools in the Hadoop ecosystem, including Spark. HDInsight primarily aims to provide an always-on configuration for running big data workloads.
- Azure Batch – Azure Batch provides an orchestrator for marshaling virtual machines on Azure for High-Performance Compute (HPC) workloads. Azure Batch interacts with storage accounts and can also run Docker containers using Batch Shipyard. Batch creates VM pools, spins up a set of VMs, processes data, then shuts down the VMs after a job completes. This solution enables developers to write their processing in virtually any language for Linux or Windows.
- Azure Data Lake – Azure Data Lake provides cold storage for numerous compute solutions listed above. Data Lake does not provide any OLTP services for the data, but rather just a massively scalable data storage solution. Scale is realized through data partitioning and a distributed storage system. Data Lake is abstracted through Azure Storage Accounts. It provides a Hadoop Distributed File System (HDFS) compatible abstraction that enables many of the aforementioned tools and many tools off Azure to read and write data.
- DBaaS – Azure offers numerous Database as a Service (DBaaS) offerings in multiple paradigms that integrate well with many data processing services. More traditional OLTP RDBMSs include Azure SQL, MariaDB, MySQL, and PostgreSQL. Azure also provides Azure Cosmos DB, a multi-paradigm NoSQL database that supports document databases through its own SQL API, a Mongo API, an Apache Cassandra API for columnar data, a Gremlin-compatible API for graph databases, and a key/value store through the Table API.
Warm path data sits between hot path data and cold path data. Functionally, warm path data behaves more like cold path data in that it performs batch processing. The primary difference here is that warm path processes smaller amounts of data more often than those handled in cold path batch processing. Practically, a warm path behaves more like hot path data in that it attempts to provide processing on the latest data coming from IoT devices.
Because warm path data is like both hot and cold paths, it shares several offerings with each, including Azure Functions, Data Factory, Data Lake, DBaaS, and Spark-based offerings, particularly Databricks. Additionally, a few services are better optimized for warm path data:
- Azure Time Series Insights – Azure Time Series Insights is a point solution designed for IoT workloads that need warm path style processing. Time Series Insights enables users to query, transform, and enrich data for a given window of time as determined by the needs of the application. The data is retained so that it can be processed under a variety of time-slicing schemes. Time Series Insights integrates well with Azure Event Hubs and IoT Hub for data processing.
- Azure Functions – Functions in the context of warm path data offer a way to process data in small batches without marshaling compute clusters.
- Data Lake – Data Lake partitioning schemes enable data to be stored in a way that facilitates quick access to data based on time slices. Choosing the time resolution on data increments then makes processing data in this manner simple using compute triggered on a timer.
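Time-based partitioning can be sketched as a function that maps each message to a folder for its hour, plus a helper that a timer-triggered job could use to find the slice it should process. The folder layout, the `telemetry` root, and both function names are assumptions chosen for illustration rather than a required Data Lake convention.

```python
from datetime import datetime, timedelta, timezone


def _hour_prefix(moment: datetime, root: str) -> str:
    """Folder prefix for the UTC hour containing a timezone-aware datetime."""
    utc = moment.astimezone(timezone.utc)
    return f"{root}/year={utc:%Y}/month={utc:%m}/day={utc:%d}/hour={utc:%H}/"


def partition_path(device_id: str, recorded_at: datetime, root: str = "telemetry") -> str:
    """Where a message recorded at `recorded_at` would be stored."""
    return _hour_prefix(recorded_at, root) + f"{device_id}.jsonl"


def previous_hour_prefix(now: datetime, root: str = "telemetry") -> str:
    """The completed hourly slice a timer-triggered job would read."""
    return _hour_prefix(now - timedelta(hours=1), root)


recorded = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
path = partition_path("sensor-42", recorded)

job_start = datetime(2024, 3, 5, 0, 10, tzinfo=timezone.utc)
prefix = previous_hour_prefix(job_start)
```

With an hourly resolution, a job that fires shortly after each hour only lists one folder instead of scanning the whole store; a finer or coarser resolution changes only the prefix format.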
Data presentation entails providing data to external consumers. Therefore, the data presentation not only brokers the data but also provides security controls for accessing the data, and in some cases, may push the data to external consumers rather than have the consumers pull the data. Data in these cases can be exposed through APIs or integration services.
In many ways, the data presentation drives what the data paths build. When looking to develop a data path, one considers what the data presentation will look like, what shape the data sources are in, and then what it will take to transform the data from the data sources into something consumable through the data presentation.
Azure offers several tools for creating and managing APIs and exposing data sources for external consumers.
- Azure App Service – Azure App Service is the canonical platform for hosting web-based applications, including APIs. The hosting platform can host APIs with database backends and offers some features for adding security, including data encryption, authentication, and authorization.
- Azure Functions – Azure Functions enables serverless applications, which can also expose APIs through HTTP triggers and outputs. Azure Functions uses an on-demand billing model and can scale rapidly without manual configuration of scaling.
- Azure API Management (APIM) – APIM provides an abstraction layer for APIs and provides some level of request and response transformation. It also supports API management through API key management, throttling, and other features that improve performance and security.
- Logic Apps – Logic Apps are a no-code solution for building workflows and can be used for lightweight data integrations and processing and endpoints for some data access components.
- Azure SignalR Service – SignalR Service provides an abstraction that enables a server to push messages to clients using WebSockets or long polling should WebSockets not be available. SignalR originally was written for .NET, but it works with numerous backend platforms. SignalR has clients for mobile applications and web applications.
- Azure Service Bus – Azure Service Bus is a general-purpose messaging platform with topics and queues with configurable security models and retention policies for messages. It also supports many different protocols, including HTTP and AMQP.
- Azure Storage – Azure Storage provides an economical way to host datasets in the cloud. Through APIs, clients can authenticate and retrieve blob data for external consumption. Azure Storage also supports other protocols, including standard web requests and beta support for SFTP.
- Azure DBaaS – Azure DBaaS offerings often serve data directly to external consumers. Azure offers a robust set of security services to ensure that data is safely delivered to external consumers.
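Two of the controls mentioned in this list, API keys and throttling, can be sketched as a small gate in front of a data API. This is a toy fixed-window rate limiter under assumed names (`ApiGate`, `check`, the sample key); it only illustrates the concept and says nothing about how a managed gateway implements it.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class ApiGate:
    """Check an API key, then allow at most `limit` calls per time window."""

    def __init__(
        self,
        valid_keys: Iterable[str],
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._valid_keys = set(valid_keys)
        self._limit = limit
        self._window_seconds = window_seconds
        self._clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (start, count)

    def check(self, api_key: str) -> int:
        """Return an HTTP-style status: 200 allowed, 401 unknown key, 429 throttled."""
        if api_key not in self._valid_keys:
            return 401
        now = self._clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self._window_seconds:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self._limit:
            self._windows[api_key] = (start, count)
            return 429
        self._windows[api_key] = (start, count + 1)
        return 200


# A controllable clock keeps the example deterministic.
fake_time = [0.0]
gate = ApiGate(["key-a"], limit=2, window_seconds=60, clock=lambda: fake_time[0])

statuses = [gate.check("key-a") for _ in range(3)]  # third call is throttled
unknown_status = gate.check("not-a-key")
fake_time[0] = 61.0
status_after_window = gate.check("key-a")  # a new window has started
```

Keeping the check separate from the data source means the same rules apply whether the data is pulled through an API or pushed out through an integration service.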
Data consumers are external applications that are not explicitly a part of an IoT platform but rely on the data provided by an IoT platform. Such consumers could be static reports emailed to a consumer, web and mobile applications, reporting services, business intelligence platforms, and external systems that ingest the IoT platform’s data. Here are a few services worth noting that facilitate consumption.
- SendGrid – SendGrid is a third-party service owned by Twilio for delivering email messages. Twilio offers a robust infrastructure for message delivery and APIs for integration points. Numerous services on Azure natively integrate with SendGrid for sending email messages.
- Power BI – Power BI is a business intelligence platform used for data analysis and reporting. It integrates with virtually all data sources on Azure and many off Azure. The tooling provides a rich set of connectors and a powerful analytics language for creating reports and data visualizations that can be filtered and sliced within the context of a report.
Azure offers many services, and creating a complete, end-to-end IoT solution requires careful consideration of them. No single service covers everything, but a complete solution will usually include at least one service from each discipline.