In today’s business world, big data is generating a big buzz. Beyond searching, storing, and scaling, one capability clearly stands out: stream processing. That’s where Apache Kafka comes in.
At a high level, Kafka can be described as a publish/subscribe messaging system. Like any other messaging system, Kafka maintains feeds of messages in categories called topics. Producers write data to topics, and consumers read data from those topics. For the sake of simplicity, I have linked to the Kafka documentation here.
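To make the producer/consumer relationship concrete, here is a toy, in-memory sketch of the topic model. This is not real Kafka (a real deployment talks to a broker over the network via a client library); it only illustrates the ideas above: an append-only feed per topic, with each consumer tracking its own read position.

```python
from collections import defaultdict, deque

# Toy in-memory broker -- an illustration of Kafka's topic model,
# not the Kafka API itself.
class Broker:
    def __init__(self):
        self.topics = defaultdict(deque)   # topic name -> append-only feed
        self.offsets = defaultdict(int)    # (topic, consumer) -> read position

    def produce(self, topic, message):
        self.topics[topic].append(message)

    def consume(self, topic, consumer):
        pos = self.offsets[(topic, consumer)]
        feed = self.topics[topic]
        if pos >= len(feed):
            return None                    # nothing new for this consumer
        self.offsets[(topic, consumer)] = pos + 1
        return feed[pos]

broker = Broker()
broker.produce("colors", "red")
broker.produce("colors", "blue")
print(broker.consume("colors", "c1"))  # red
print(broker.consume("colors", "c1"))  # blue
print(broker.consume("colors", "c2"))  # red (each consumer has its own offset)
```

Note that consuming a message does not remove it from the topic: a second consumer reads the same feed from the beginning, which is the key difference from a traditional queue.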
In the first post in this series, we talked about the challenges of integrating the Internet of Things into the enterprise. In the next few blog posts, we are going to build a simple IoT application that illustrates all the major aspects of working with SnapLogic and hardware. In this post, we’re going to skip device details, but at a high level we’ll have:
A sensor somewhere (on-premises, from an API, etc.) that produces data that includes a “color” payload;
An LED on-premises, attached to our local network, conveniently hooked up to look like a REST endpoint;
Two pipelines, one on-premises, one in the cloud.
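As a stand-in for the LED device, here is a minimal sketch of what "hooked up to look like a REST endpoint" might mean. The route, the `{"color": ...}` payload shape, and the port are illustrative assumptions; a real device's API may differ.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A minimal stand-in for the on-premises LED's REST endpoint.
# The /led route and {"color": ...} payload are assumptions for illustration.
class LedHandler(BaseHTTPRequestHandler):
    led_color = "off"  # class-level state standing in for the physical LED

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        type(self).led_color = payload.get("color", "off")
        body = json.dumps({"color": type(self).led_color}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence request logging in this sketch
        pass

# Drive it the way a pipeline's REST Snap would: POST a color payload.
server = HTTPServer(("127.0.0.1", 0), LedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/led",
    data=json.dumps({"color": "green"}).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read()))  # {'color': 'green'}
server.shutdown()
```

The on-premises pipeline plays the role of the client shown at the bottom: it receives the sensor's "color" payload and forwards it to this endpoint.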
Some IoT hardware is designed to be cloud-native, and will generally have a publish/subscribe relationship with a cloud server, typically over a protocol such as MQTT. This is very easy to work with from a security standpoint, since the output of these devices is accessible from anywhere.
Other devices instead communicate on their local network. Assuming your local network isn’t internet accessible, this makes it difficult to reach the device from outside. Fortunately, the SnapLogic Control Plane (depicted, in a manner of speaking, as the rightmost rectangle below) comes to our rescue here.
[Update: check out what’s new in our Spring 2016 release; the Metadata Snaps are also useful for Lifecycle Management requirements.]
One of the areas where our integrated data services team and partners spend time with customers early in a SnapLogic Elastic Integration Platform deployment is moving from one project phase to the next (Dev -> QA -> Prod). There are a number of different configuration options; in this post, I’ll describe one. First, a few assumptions:
The enterprise Lifecycle Management feature is not implemented in this example
The phases that are in use are Development, QA and Production
Each phase in use is being managed as a separate project within a single Organization setup
The users have the necessary permissions to perform the operations described in this post
The enhanced account encryption feature is not in use in the current SnapLogic Org
In my final post in this series on SnapLogic Ultra Pipelines, I’m going to cover the three pillars of successful implementation and data pipeline management: performance, scaling, and high availability.
Performance: The performance of an Ultra Pipeline largely depends on the response times of the end-system applications that the task connects to. An Ultra Pipeline containing a large number of high-latency endpoint Snaps can experience congestion, with documents backing up through the upstream Snaps to the FeedMaster until the FeedMaster queue can no longer hold them. This can be avoided either by creating multiple instances of the Ultra Pipeline task or by using the Router Snap to distribute the document load. Multiple instances of an Ultra Pipeline ensure that even if one instance is slow, others are available to consume documents and keep the FeedMaster queue flowing. Likewise, a Router Snap can be used in each instance of the pipeline to distribute documents across multiple endpoint Snaps, improving performance and adding parallel processing capability to an instance. This is in addition to the built-in parallelism of a pipeline, in which each Snap can be processing a different document at any given time.
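The Router Snap's fan-out can be pictured as a simple round-robin over endpoint branches, so that no single slow branch sees the whole document load. The sketch below is an analogy only (branch names are illustrative), not SnapLogic code:

```python
import itertools

# Toy analogue of fanning documents out with a Router Snap:
# round-robin documents over N endpoint "branches".
def route(documents, branches):
    assignment = {b: [] for b in branches}
    cycle = itertools.cycle(branches)
    for doc in documents:
        assignment[next(cycle)].append(doc)
    return assignment

docs = [{"id": i} for i in range(9)]
out = route(docs, ["endpoint-1", "endpoint-2", "endpoint-3"])
print({b: len(v) for b, v in out.items()})
# {'endpoint-1': 3, 'endpoint-2': 3, 'endpoint-3': 3}
```

With three branches, each endpoint Snap handles a third of the documents, so one slow endpoint delays only its share of the load rather than the whole queue.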
Scaling: Scaling can be achieved by increasing the number of instances in an Ultra Pipeline task. The total number of instances required is a direct function of the expected response time, the resource utilization of the node when a single instance of the task is running, and the load on the Snaplex from other pipeline runs. When the execution nodes are highly utilized, adding more execution nodes allows the instances of the task to be distributed horizontally and scaled out across the Snaplex.
High availability: To avoid service disruption and provide high availability, the recommended minimum architecture for an Ultra Pipeline setup is a load balancer in front of two FeedMasters and two execution nodes. This architecture also avoids a single point of failure at either a FeedMaster or an execution node.
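The reason the minimum layout has two of each component can be pictured with a trivial health-checked selection: as long as one node of each kind passes its health check, traffic keeps flowing. Node names and the health-check predicate below are illustrative, not SnapLogic internals:

```python
# Toy picture of load-balancer failover: forward only to nodes that
# pass a health check, so losing one FeedMaster (or execution node)
# does not stop traffic. Names are illustrative.
def pick_node(nodes, is_healthy):
    for node in nodes:
        if is_healthy(node):
            return node
    raise RuntimeError("no healthy node available")

nodes = ["feedmaster-1", "feedmaster-2"]
down = {"feedmaster-1"}
print(pick_node(nodes, lambda n: n not in down))  # feedmaster-2
```

With a single FeedMaster, the same failure would raise the "no healthy node" case, i.e., a full service outage.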
In my first post on SnapLogic Ultra Pipelines, I began to review aspects to consider when designing these low-latency pipelines. Once you’ve determined the right number of views, you need to determine the type of views. The unconnected views in an Ultra Pipeline act as the gatekeepers of the task, receiving documents from and returning documents to external applications.
Ultra Pipeline tasks are used to implement real-time web service integrations that require sub-second response times. In this first series of posts, I’ll outline some of the key aspects of designing Ultra Pipelines; in a second series, I’ll focus on monitoring these low-latency tasks.
Because Ultra Pipelines follow a web service request/response architecture, the following aspects should be considered when designing them in SnapLogic.