Accelerating Home Automation at SmartThings with Rule Engine
By Vlad Shtibin

How long does it take for a light switch to turn on a light bulb? How about for the garage opener to start raising the door? How often does the switch not work? Does it still work when the internet is down?

The average “not smart” home is fast and reliable. This is the challenge we took on when thinking through building our next generation of automations. In order to create a smart home that is able to meet these goals, we need to take advantage of both cloud services and hubs on the edge.

This blog post describes the architecture behind the SmartThings Rules Engine that is used by the cloud and hub to power automations on the SmartThings platform.

Device Communications

The following diagram conveys a high-level summary of what the SmartThings platform “does” with devices. Events, in the form of protocol-specific messages (Zigbee, Z-Wave, and more), are converted into a common format (which we refer to as capabilities) and used by SmartThings services. Commands from SmartThings to a device follow the same pattern, just in reverse.

Cloud vs. Local Devices

Device communications can happen on the cloud or hub. This is mainly driven by the integration type: Wi-Fi integrations execute on the cloud with cloud-connected and direct-connected devices, while Zigbee, Z-Wave, and LAN integrations execute on the hub with Edge Drivers. We also need to provide the flexibility for unique situations, like SmartApp-connected cloud devices or hub-connected devices that do not follow published specifications. In these cases, we use the cloud for parsing.

Thought Experiment with Philips Hue

Our partnership with Philips Hue is a great example of an integration seamlessly executing across cloud and hub.
For a bit of background: there are multiple ways to connect Philips Hue products to the SmartThings platform, creating a broad topology.

To summarize the topology, all input and output (I/O) depends on where the integration is deployed, while all rule evaluation logic is independent of the deployment location.

With the above use cases in mind, we needed to cover the two main integration points:

- Event delivery: ability to consume events and get device states.
- Command dispatch: ability to send commands to devices, set location modes, and more.

Looking at the architecture, we had additional requirements for high reliability, memory efficiency, and low overhead for delivery. We decided to use a Rust-based solution because it is memory efficient, reduces code duplication across surfaces, and integrates easily with our existing experience and release engineering infrastructure.

The following table summarizes the requirements in a bit more detail:

Requirement: High reliability
Description: Ensures all events are picked up and evaluated and automations execute, regardless of origin on the hub or cloud.
Benefits of Rust: The type system and ownership model mitigate a large class of potential bugs, ensuring memory and thread safety.

Requirement: Memory efficiency
Description: This is extremely important for embedded devices (aka hubs) that run the rules engine.
Benefits of Rust: Small footprint and processor demand make it easy to run on hubs.

Requirement: Delivery overhead
Description: We needed to balance the quick delivery of features and availability of the cloud platform versus the long firmware cycles for embedded devices.
Benefits of Rust: Reduces feature implementation time; flexible with multiple deployment targets.

Requirement: SmartThings engineering compatible
Description: Release engineering at SmartThings is standardized; new features must work with existing tooling.
Benefits of Rust: Our Rust SMEs were able to leverage the rich ecosystem for integration with our CI/CD processes for deployment.

Application Architecture

With the above requirements, topology, and architecture in mind, we created Rule Engine. The resulting application architecture is essentially as follows:

Hive

The brains of the Rule Engine, Hive contains and is responsible for the majority of the rule execution code. Hive has two main functions: exposing an API to execute rules, and providing an interface for parent services to interact with the platform. To reduce overhead, Hive is dedicated to executing rules, trusting the parent service to provide the contextual data needed for rule execution, such as device states or location modes. For example, when a parent service receives a device state change event, it invokes Hive to evaluate the rule (for example, is the equals condition true?) and execute it.

Swarm

Swarm is the cloud container for Hive and manages I/O functionality for cloud execution. This service is deployed to the cloud and listens to events from the SmartThings event pipeline. When events are consumed, Swarm invokes Hive to execute rules. Swarm implements Hive’s interface as a set of HTTP clients that interact with the SmartThings API. For example, when Hive requires a device state to evaluate whether an equals condition is true, Swarm dispatches a GET request to the device API and forwards the state to Hive. Similarly, when Hive needs to send a device command, Swarm dispatches a POST request to the device API.

Drone

Drone is the container for Hive embedded on edge devices (aka hubs). It invokes Hive to execute rules when events are consumed locally on the hub. To implement Hive’s interface, Drone uses a Rust API provided by services on the hub (for example, hubcore). Today, this application is bundled with SmartThings hub firmware and listens to events from a dispatcher on the hub. For example, when Hive requires a device state to evaluate whether an equals condition is true, Drone calls into hubcore to retrieve the device state. Similarly, when Hive needs to send a device command, Drone calls a senddevicecommand() function on hubcore’s Rust API.
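The split between Hive and its parent services described above can be illustrated with a small Rust sketch. This is not the actual Rule Engine code: the `Platform` trait, the `Rule` and `Hive` types, `execute_rules`, and the `InMemoryPlatform` stand-in are all hypothetical names chosen for illustration. The assumption is simply that Hive evaluates rules against an interface, and that Swarm (HTTP clients) or Drone (hub API calls) would supply the implementation.

```rust
use std::collections::HashMap;

/// Hypothetical interface that a parent service (Swarm or Drone) implements for Hive.
trait Platform {
    fn device_state(&self, device_id: &str) -> Option<String>;
    fn send_device_command(&mut self, device_id: &str, command: &str);
}

/// Hypothetical rule shape: if `if_device` equals `equals`, send `command` to `then_device`.
struct Rule {
    if_device: String,
    equals: String,
    then_device: String,
    command: String,
}

/// Stand-in for Hive: it only evaluates rules and delegates all I/O to the platform.
struct Hive {
    rules: Vec<Rule>,
}

impl Hive {
    /// Evaluates every rule that references the changed device.
    /// Returns the number of rules whose action was dispatched.
    fn execute_rules<P: Platform>(&self, platform: &mut P, changed_device: &str) -> usize {
        let mut fired = 0;
        for rule in self.rules.iter().filter(|r| r.if_device == changed_device) {
            // Ask the parent service for the state needed by the equals condition.
            let state = platform.device_state(&rule.if_device);
            if state.as_deref() == Some(rule.equals.as_str()) {
                platform.send_device_command(&rule.then_device, &rule.command);
                fired += 1;
            }
        }
        fired
    }
}

/// In-memory platform used here in place of HTTP clients (cloud) or a hub API (edge).
#[derive(Default)]
struct InMemoryPlatform {
    states: HashMap<String, String>,
    sent: Vec<(String, String)>,
}

impl Platform for InMemoryPlatform {
    fn device_state(&self, device_id: &str) -> Option<String> {
        self.states.get(device_id).cloned()
    }

    fn send_device_command(&mut self, device_id: &str, command: &str) {
        self.sent.push((device_id.to_string(), command.to_string()));
    }
}

/// Runs "if switch-a is on, set switch-b to on" and returns the commands sent.
fn run_example() -> Vec<(String, String)> {
    let hive = Hive {
        rules: vec![Rule {
            if_device: "switch-a".to_string(),
            equals: "on".to_string(),
            then_device: "switch-b".to_string(),
            command: "on".to_string(),
        }],
    };
    let mut platform = InMemoryPlatform::default();
    platform.states.insert("switch-a".to_string(), "on".to_string());
    hive.execute_rules(&mut platform, "switch-a");
    platform.sent
}

fn main() {
    for (device, command) in run_example() {
        println!("send device command to {}: {}", device, command);
    }
}
```

Because the evaluation logic depends only on the trait, the same `execute_rules` code could in principle run against a cloud-backed or hub-backed implementation, which is the property the architecture above relies on.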
Rule Engine

End-to-end, the process looks something like this:

Rule: if Switch A is on, set Switch B to on

1. User turns Switch A on
2. Swarm (cloud) or Drone (hub) receives the on event
3. Swarm/Drone tells Hive to execute rules with "if Switch A is on"
4. Hive evaluates the rule conditions and determines them to be true
5. Hive then evaluates actions: set Switch B to on
6. Hive says, “send device command to Switch B”
7. Swarm/Drone receives “send device command to Switch B” and executes the request

Conclusion

Our team has been very impressed with the capabilities of the Rust programming language and supporting libraries. In the cloud, a small cluster of Swarm applications is able to execute hundreds of millions of rules per day, resulting in drastically reduced cost of SmartThings-hosted automations. We are then able to deploy virtually the same codebase on an embedded device, with a negligible footprint, and execute consistent, low-latency, and reliable automations in users’ homes.

Since the initial launch, we have released new locally executing features with each new firmware update. Looking ahead, we are regularly adding new features to Rule Engine that can bring you more complex automations.

To learn more, check out our Rules API documentation and sign up for SDC21 to participate in SmartThings sessions.