Machine Learning for the Enterprise, Part 3: Building the Pipeline

In the last post we went into some detail about anomaly detectors, and showed how some simple models would work. Now we are going to build a pipeline to do streaming anomaly detection.

We are going to use a triggered pipeline for this task. A triggered pipeline is instantiated whenever a request comes in. The instantiation can take a couple of seconds, so it is not recommended for low-latency or high-traffic situations. If we’re getting data more frequently than that, or want lower latency, we should use an Ultra pipeline. An Ultra pipeline stays running, so the input-to-output latency is significantly lower.

For the purpose of this post, we’re going to assume we have an Anomaly-Detector-as-a-Service Snap.  In the next post, we’ll show how to create that Snap using Azure ML. Our pipeline will look like this:

Final Pipeline


Building an IoT Application in SnapLogic, Part II: Speeding Through the Last Mile

The last post in this ongoing IoT series detailed the creation of a cloud-based Ultra Pipeline to do the bulk of the work for our IoT application. We described the following application:

  • A sensor somewhere (on-premises, from an API, etc.) that produces data that includes a “color” payload;
  • An LED on-premises, attached to our local network, conveniently hooked up to look like a REST endpoint;
  • Two pipelines, one on-premises, one in the cloud.


Building an IoT Application in SnapLogic: Figuring out Pipelines and Tasks

In the first post in this series, we talked about the challenges of integrating the Internet of Things into the enterprise. In the next few blog posts, we are going to build a simple IoT application that illustrates all the major aspects of working with SnapLogic and hardware.  In this post, we’re going to skip device details, but at a high level we’ll have:

  • A sensor somewhere (on-premises, from an API, etc.) that produces data that includes a “color” payload;
  • An LED on-premises, attached to our local network, conveniently hooked up to look like a REST endpoint;
  • Two pipelines, one on-premises, one in the cloud.

Hardware Considerations

Some IoT hardware is designed to be cloud-native, and will generally have a publish/subscribe relationship with a cloud server (using a protocol such as MQTT).  This is very easy to work with from a security standpoint, since the output of these devices is accessible from anywhere.

Other devices instead communicate on their local network.  Assuming your local network isn’t internet accessible, this can create problems in talking to the device.  Usefully, the SnapLogic Control Plane (depicted, in a manner of speaking, as the rightmost rectangle below) comes to our rescue here.

Control-Data Plane Diagram
A “graphical depiction” of the control plane (right) communicating with various data planes, including a Hadooplex at the bottom. We see the artist is somewhat defensive about his rendering of the pachyderm likeness.


REST GET and the SnapLogic Public APIs for Pipeline Executions

As part of a wider analytics project I’m working on, analyzing runtime information from the SnapLogic platform, I chose to use functionality exposed to all customers: the Public API for Pipeline Monitoring and the REST API. This post combines the two. I started by reading the documentation (of course), which shows the format of the request and response. So I created a new pipeline and dropped a REST GET Snap on the canvas:

snaplogic_REST_pipeline
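
For readers who want to try the same kind of call outside the Designer, here is a minimal Python sketch of an authenticated GET request, roughly what a REST GET Snap would send. The URL, organization name, and credentials are hypothetical placeholders, and Basic authentication is an assumption on my part; the Public API documentation has the real path, parameters, and response format.

```python
import base64
import urllib.request


def build_runtime_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that uses HTTP Basic authentication."""
    # Basic auth is "user:password", base64-encoded, in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical placeholder URL and credentials, for illustration only.
request = build_runtime_request(
    "https://example.com/api/runtime/my-org", "user@example.com", "secret"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that because the endpoint above is not real.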

Two-way SSL with SnapLogic’s REST Snap

There are lots of ways for a client to authenticate itself against a server, including basic authentication, form-based authentication, and OAuth.

In these cases, the client communicates with the server over HTTPS, and the server’s identity is confirmed by validating its public certificate. The server doesn’t care who the client is, just as long as they have the correct credentials.

ETL and EAI: One SnapLogic Pipeline, Multiple Integration Solutions

“This is not your father’s ETL. This is not your mother’s message bus. This is not your uncle’s application integration”

– Rich Dill commenting on the need to think differently about application and data integration with SnapLogic

This week I sat down with Rich Dill, one of SnapLogic’s data integration gurus, who has over twenty years of experience in the data management and data warehousing world. Rich talked about the leap forward in terms of ease-of-use and user enablement that an integration platform as a service (iPaaS) can provide. He also commented on the latency differences when moving data from one cloud to another or from a cloud to a data center, compared to moving data from one application to a data warehouse in a data center.

“When using a new technology people tend to use old approaches. And without training what happens is they are not able to take advantage of the features and capabilities of the new technology. It’s the old adage of putting the square peg in a round hole.”

To highlight the power of how SnapLogic brings together multiple styles of integration in a single platform, Rich put together this demonstration, where he creates a data flow, called a pipeline, that is focused on a classic extract, transform and load (ETL) use case and goes much further. Here’s a summary of SnapLogic consuming, transforming and delivering data.

Part 1: ETL as a Service

  1. Rich selects data from two databases, explaining how you can preview data and view it in multiple formats as it flows through the platform.
  2. He reviews how SnapLogic processes JSON documents, which gives the platform the ability to loosely couple the structure of the integration job to the target, and goes on to perform inner and outer Joins before formatting the output and writing the joined data to a File Writer.
  3. Then he goes back and adds a SQL Server Lookup to get additional information.
  4. He runs the pipeline and creates a version of it.
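
To make the join step in Part 1 concrete, here is a minimal Python sketch of inner and left outer joins over JSON-like documents. It is not how the Join Snap is implemented; the function name, field names, and sample data are all made up for illustration. Note that the extra "region" field on one document passes straight through, which is the loose coupling Rich describes.

```python
from typing import Any

Document = dict[str, Any]


def join_documents(
    left: list[Document], right: list[Document], key: str, how: str = "inner"
) -> list[Document]:
    """Join two lists of documents on `key`.

    how="inner" keeps only left documents that have a match on the right;
    how="left" also keeps unmatched left documents (a left outer join).
    """
    # Index the right-hand documents by join key for quick lookup.
    index: dict[Any, list[Document]] = {}
    for doc in right:
        index.setdefault(doc[key], []).append(doc)

    joined: list[Document] = []
    for doc in left:
        matches = index.get(doc[key], [])
        if matches:
            # Merge fields; right-hand values win on a name clash.
            joined.extend({**doc, **match} for match in matches)
        elif how == "left":
            joined.append(dict(doc))
    return joined


# Hypothetical sample data.
customers = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex", "region": "EMEA"},  # extra field, no schema needed
]
orders = [{"id": 1, "total": 250.0}]

print(join_documents(customers, orders, "id"))
print(join_documents(customers, orders, "id", how="left"))
```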

Part 2: Managing Change (Ever get asked to add a few more columns in a database and then have to change your data integration task?)

  1. He goes in and modifies the underlying SQL table.
  2. He re-runs the SnapLogic pipeline and shows the new results, without having to make a change. This highlights the flexibility and adaptability of the SnapLogic Elastic Integration Platform.

Part 3: Salesforce Data Loading

  1. He brings in the Data Mapper Snap to map data to what’s in Salesforce.
  2. He drags and drops the Salesforce Upsert Snap and determines if he should use the REST or Bulk API.
  3. He uses SmartLink to do a fuzzy search and map the input and output fields.
  4. He reviews the Expression Editor to highlight the kinds of data transformations that are possible.
  5. He shows how data is now inserted into Salesforce and saves this version of the pipeline.
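
The fuzzy field matching in step 3 can be approximated in a few lines of Python. This is only a sketch of the general idea using the standard library's `difflib`; it is not SmartLink's actual algorithm, and the field names and the 0.6 cutoff are assumptions chosen for the example.

```python
import difflib
from typing import Optional


def suggest_mapping(
    source_fields: list[str], target_fields: list[str], cutoff: float = 0.6
) -> dict[str, Optional[str]]:
    """Suggest a target field for each source field by fuzzy name matching."""
    # Compare case-insensitively, but return the target's original spelling.
    lowered = {name.lower(): name for name in target_fields}
    mapping: dict[str, Optional[str]] = {}
    for field in source_fields:
        candidate = field.lower().replace("_", "")
        matches = difflib.get_close_matches(candidate, list(lowered), n=1, cutoff=cutoff)
        mapping[field] = lowered[matches[0]] if matches else None
    return mapping


# Hypothetical database columns and Salesforce-style field names.
source = ["first_name", "last_name", "email_addr", "fax"]
target = ["FirstName", "LastName", "Email", "Phone"]

for src, dst in suggest_mapping(source, target).items():
    print(f"{src} -> {dst}")
```

Fields with no close match map to `None`, which leaves them for a person to map by hand.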

Part 4: RESTful Pipelines

  1. He removes the Data Mapper Snap because the output will be different and he brings in the JSON Formatter.
  2. Here, Rich takes a minute to review how SnapLogic is not only built on a loosely coupled document model, but is also 100% REST-based. This means that each pipeline is abstracted and, as he puts it, “addressable, usable, consumable, trigger-able, schedule-able as a REST call.”
  3. He goes to Manager > Tasks and creates a new Task and sets it to Trigger.
  4. He executes the Task to demonstrate calling the REST-based pipeline, and then shows how a mobile device can do a REST GET to bring in the data.
  5. He wraps up the demonstration by changing the pipeline from a JSON document output to an XML document.
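
The last step, swapping a JSON output for an XML one, works because the documents themselves don't change; only the formatter does. Here is a small Python sketch of that idea. The function names and the "records"/"record" tag names are my own choices, not anything from the JSON Formatter or XML Formatter Snaps, and it assumes flat documents whose keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def documents_to_json(documents: list[dict[str, Any]]) -> str:
    """Serialize the documents as a JSON array."""
    return json.dumps(documents)


def documents_to_xml(
    documents: list[dict[str, Any]], root_tag: str = "records", item_tag: str = "record"
) -> str:
    """Serialize the same documents as XML, one element per field."""
    root = ET.Element(root_tag)
    for doc in documents:
        item = ET.SubElement(root, item_tag)
        for key, value in doc.items():
            # Assumes flat documents with keys that are valid XML names.
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


docs = [{"id": 1, "name": "Acme"}]
print(documents_to_json(docs))
print(documents_to_xml(docs))
```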

I’ve embedded the video below. Be sure to check out our Resource Center for other information about SnapLogic’s Elastic Integration Platform. Thanks for a great demo, Rich!

SnapLogic Tips and Tricks: REST Snap Compression Capabilities

This article is brought to you by our Senior Director of Product Management, Craig Stewart.

In the Fall 2014 release, SnapLogic added a number of new features across the broad range of Snaps. Among those was the ability for a REST GET operation to accept gzip-encoded data. When combined with a triggered pipeline in another Snaplex, this can significantly improve performance and reliability (the less time you spend moving data over the wire, the fewer total packets are moved, the less scope there is for network errors, and the less time it should take).

As an example, I created a simple pipeline which outputs a set of data, in this case just an Oracle database query returning 101,000 rows of data:

Oracle Select

For this, I created a task so I could call it using the REST GET Snap in the other pipeline:

task

To call it, I created a pipeline using the REST GET snap, which would call this URL:

rest-get

As the URLs for triggered pipelines require authentication, I created a Basic Auth account with my credentials and associated it with the REST GET Snap.  The URL is copied and pasted from the task created previously. This was all possible in earlier versions of SnapLogic. The change in this version is the ability to add HTTP headers stating that the Snap accepts gzip-encoded content:

rest-get-headers

Now, if the Snap receives data in gzip format, it will automatically decompress and process it (even when the data does not come from a SnapLogic triggered pipeline). No additional Snaps are required. The clever bit is that the triggered pipeline will also note that the caller can accept gzip, so it will automatically send the data in that format.
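
The same negotiation can be sketched in plain Python, for anyone calling a triggered task from outside SnapLogic. In standard HTTP the client advertises gzip support with the `Accept-Encoding` request header and the server labels a compressed body with `Content-Encoding: gzip`; the exact header settings used in the Snap are not reproduced here. The task URL and function names are hypothetical, and authentication is left out to keep the example short.

```python
import gzip
import urllib.request
from typing import Optional


def build_gzip_request(url: str) -> urllib.request.Request:
    """Build a GET request that tells the server we can accept gzip."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only if the server says it is gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


# Hypothetical task URL, for illustration only (the request is never sent).
request = build_gzip_request("https://example.com/api/feed/my-task")

# Simulate a compressed response instead of calling a real server.
payload = b'{"row": 1}\n' * 1000
compressed = gzip.compress(payload)
print(len(payload), "bytes uncompressed,", len(compressed), "bytes on the wire")

restored = decode_body(compressed, "gzip")
print("round trip ok:", restored == payload)
```

With a real endpoint, the `Content-Encoding` value would come from the response headers returned by `urllib.request.urlopen`.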

In summary, you just need to add the HTTP headers to the REST GET Snap.

As an aside, the Task Execute Snap will do this compression automatically, to be covered in a future post. For more SnapLogic Integration Cloud best practices and tips and tricks, be sure to check out our TechTalk webinars and recordings.