SnapLogic on the Radar with MWD Advisors

SnapLogic was recently reviewed by the advisory firm MWD Advisors for our efforts to reinvent integration platform technology by creating one unified platform that can address many different kinds of application and data integration use cases.

A few highlights from the report:

  • An in-depth look at our multi-tenant, AWS-hosted platform which includes the SnapLogic Designer, Manager and Dashboard
  • The Snaplex execution environment; namely, the Cloudplex, Groundplex and Hadooplex
  • Three competitive differentiators: deployment flexibility, a unified approach across multiple integration scenario types, and both scalability and adaptability

Read the full review here, or take a look below. You can also check out other SnapLogic reviews on our website.

MWD Advisors is a specialist advisory firm providing practical industry insights to business leaders and technology professionals working to drive change with the help of digital technology.

Cloud Integration Requirements: Native JSON and REST

One of the primary requirements of an integration platform as a service (iPaaS), and a reason why integration heritage matters in the cloud, is that the technology is built on modern web standards, specifically JSON and REST.

JavaScript Object Notation (JSON)

“It’s easy to put rows and columns into a document, but vice-versa doesn’t work.”

– Craig Stewart, Director of SnapLogic Product Management, in this recorded demonstration

SnapLogic deals with documents, rather than just the rows and columns of traditional ETL-style tools. In his post, Technical Advantages of JSON-centric iPaaS, SnapLogic’s Chief Scientist Greg Benson reviews five data processing and end-user benefits of using documents as our native data type:

  1. Documents are a better match to modern web services.
  2. Documents result in more succinct Pipelines.
  3. A document model allows Pipelines to be loosely coupled. (Watch this customer video to hear about the benefits of “schema-less integration”)
  4. A document model allows for greater Pipeline reuse.
  5. Documents are a superset of records.

When it comes to SnapLogic’s foundational support for modern web standards, he writes:

“Our support for documents allows our Snap endpoints to directly consume hierarchical data in native format and send it on to downstream Snaps in a Pipeline. This means that there is no requirement to flatten data into records or to turn a JSON document into a string or BLOB type.”

When it comes to the benefits of a JSON-centric approach supporting structured and unstructured data seamlessly, he concludes, “This native support for documents is one of the many architectural innovations we have developed to help businesses connect both web services and traditional data stores.”

Here is a JSON and Table view of a Twitter Query Snap in the SnapLogic Elastic Integration Platform Designer:

Figure: the native JSON view of the Twitter Query Snap output.

Figure: the rows-and-columns (table) view of the same output.
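
To make the document-versus-records contrast concrete, here is a minimal sketch in Python. The tweet fields and the flattening helper are purely illustrative assumptions, not SnapLogic APIs; the point is only that a hierarchical document can be passed along intact, while a tabular tool has to flatten it first:

```python
import json

# A hypothetical, simplified tweet document: hierarchical, with nested
# objects and arrays, as a JSON-centric platform can pass it downstream as-is.
tweet = {
    "id": 123456789,
    "text": "Integration heritage matters in the cloud",
    "user": {"screen_name": "snaplogic", "followers": 42000},
    "hashtags": ["iPaaS", "JSON"],
}

def flatten(doc, prefix=""):
    """Flatten a nested document into dotted column names, the way a
    rows-and-columns tool would have to before it could process it."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        elif isinstance(value, list):
            # Arrays do not fit a single cell; a tabular tool must explode
            # them into extra rows or collapse them into one string.
            row[name] = ",".join(map(str, value))
        else:
            row[name] = value
    return row

print(json.dumps(tweet, indent=2))   # document (JSON) view
print(flatten(tweet))                # table (rows and columns) view
```

Going from the flattened row back to the original hierarchy is lossy, which is exactly the point of the quote above: rows and columns fit easily into a document, but not the other way around.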

Representational State Transfer (REST)

As we wrote in the Why Buses Don’t Fly whitepaper, “Mobile and enterprise APIs are primarily exposed over the REST protocol with the data encoded via JSON. From an integration platform perspective, REST and JSON together are increasingly replacing SOAP and XML, making ESBs less relevant in today’s enterprise SMAC architecture.”

SnapLogic’s architecture is entirely REST-based. We have REST Snaps, and REST is how the control plane communicates with the data plane (see the post on Software Defined Integration and the high-level architecture diagram below). As we outlined back in 2011, in the early days of building our new Elastic Integration Platform:

“REST lets you publish your data and have others – regardless of where they might be – work with it. Just looking at the URI gives you an indication of how to proceed. Yet despite all of these advantages, basing SnapLogic on REST gave us the same security and massive scalability as the overall Web itself.”

Figure: the high-level SnapLogic architecture, showing how the control plane communicates with the Snaplex data plane over REST.
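
As a rough illustration of why REST plus JSON is such a natural fit for integration, here is a minimal sketch of consuming a REST resource and working with the JSON document it returns. The endpoint URL and fields are hypothetical, not a SnapLogic API:

```python
import requests

# Hypothetical REST endpoint; any JSON-over-HTTP API follows the same shape.
URL = "https://api.example.com/v1/orders/42"

# The URI alone tells you which resource you are addressing; standard HTTP
# verbs (GET, POST, PUT, DELETE) tell the server what to do with it.
response = requests.get(URL, headers={"Accept": "application/json"}, timeout=10)
response.raise_for_status()

order = response.json()  # the payload arrives as a hierarchical JSON document
print(order.get("status"), order.get("customer", {}).get("name"))
```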

In the next modern integration requirements post, we’ll cover connectivity in more detail.

Reliability in the SnapLogic iPaaS

Dependable system operation is a requirement for any serious integration platform as a service (iPaaS). Reliability or fault tolerance is often listed as a feature, but it is hard to get a sense of what this means in practical terms. For a data integration project, reliability can be challenging because the platform must connect disparate external services, each of which can fail on its own. In a previous blog post, we discussed how SnapLogic Integration Cloud pipelines can be constructed to manage endpoint failures with our guaranteed delivery mechanism. In this post, we are going to look at some of the techniques we use to ensure the reliable execution of the services we control.

We broadly divide the SnapLogic architecture into two categories: the data plane and the control plane. The data plane is encapsulated within a Snaplex, and the control plane is a set of replicated, distributed servers. This design separation is useful both for data isolation and for reliability, because we can easily employ different approaches to fault tolerance in the two planes.

Data Plane: Snaplex and Pipeline Redundancy
The Snaplex is a cluster of one or more pipeline execution nodes. A Snaplex can reside either in the SnapLogic Integration Cloud or on-premises. The Snaplex is designed to support autoscaling in the presence of increased pipeline load. In addition, the Monitoring Dashboard monitors the health of all Snaplex nodes. In this way, Snaplex node failure can be detected early so that future pipelines are not scheduled on the faulty node. For cloud-based Snaplexes, also known as Cloudplexes, node failures are detected automatically and replacement nodes are made available seamlessly. For on-premises Snaplexes, also known as Groundplexes, admin users are notified of the faulty node so that a replacement can be made.

If a Snaplex node fails during a pipeline execution, the pipeline will be marked as failed. Developers can choose to retry failed pipelines, or in some cases, such as recurring scheduled pipelines, the failed run may simply be ignored. Dividing long-running pipelines into several shorter pipelines can limit exposure to node failure. For highly critical integrations, it is possible to build and run replicated pipelines concurrently, so that a single failed replica won’t interrupt the integration. As an alternative, a pipeline can be constructed to stage data in the SnapLogic File System (SLFS) or in an alternate data store such as AWS S3. Staging data can eliminate the need to re-acquire data from a slow data source such as AWS Glacier. It also helps when a data source has higher transfer costs or transfer limits that would make it prohibitive to request the data multiple times in the presence of failures at the upstream endpoint in a pipeline.
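
For pipelines that are retried programmatically, the underlying pattern is simply a retry loop with backoff around the pipeline trigger. The sketch below is generic: the trigger URL is hypothetical and the retry policy is illustrative, not SnapLogic’s built-in behavior:

```python
import time
import requests

# Hypothetical triggered-task URL; in practice this would be whatever
# mechanism your platform exposes for starting a pipeline run.
TRIGGER_URL = "https://example.com/api/run/my-pipeline"

def run_pipeline_with_retry(max_attempts=3, backoff_seconds=30):
    """Try to run the pipeline, retrying on failure with increasing backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.post(TRIGGER_URL, timeout=600)
            response.raise_for_status()
            return response.json()           # success: return the run result
        except requests.RequestException as error:
            print(f"Attempt {attempt} failed: {error}")
            if attempt == max_attempts:
                raise                        # give up and surface the failure
            time.sleep(backoff_seconds * attempt)

if __name__ == "__main__":
    run_pipeline_with_retry()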

Control Plane: Service Reliability
SnapLogic’s “control plane” resides in the SnapLogic Integration Cloud, which is hosted in AWS. By decoupling control from data processing, we can provide differentiated approaches to reliability. All control plane services are replicated for both scalability and reliability. All REST-based front-end servers sit behind the AWS Elastic Load Balancing (ELB) service, so if any control plane service fails, there is always a pool of replicated services available to handle client and internal requests. This is an example of redundancy helping with both reliability and scalability.

We employ ZooKeeper to implement our reliable scheduling service. An important aspect of the SnapLogic iPaaS is the ability to create scheduled integrations, and it is important that these scheduled tasks are initiated at the specified time or at the required intervals. We implement the scheduling service as a collection of servers. All of the servers can accept incoming CRUD requests on tasks, but only one server is elected as the leader, using a ZooKeeper-based leader election algorithm. In this way, if the leader fails, a new leader is elected immediately and resumes scheduling tasks on time, so no scheduled task is missed. In addition to using ZooKeeper for leader election, we also use it to allow the follower schedulers to notify the leader of task updates.
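
As a rough illustration of the leader-election pattern, here is a minimal sketch using the open-source kazoo ZooKeeper client. The znode path, node identifier, and scheduling loop are assumptions for the example; this is not SnapLogic’s scheduler code:

```python
from kazoo.client import KazooClient

def lead():
    """Runs only on the elected leader; followers block until leadership passes to them."""
    print("I am the scheduler leader: dispatching scheduled tasks...")
    # ... run the scheduling loop here ...

# Connect to the ZooKeeper ensemble (addresses are illustrative).
client = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
client.start()

# Each scheduler node joins the same election znode; ZooKeeper guarantees
# exactly one leader, and a new one is elected as soon as the current leader dies.
election = client.Election("/schedulers/election", identifier="scheduler-node-1")
election.run(lead)  # blocks; invokes lead() if and when this node becomes leader
```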

We also utilize a suite of replicated data storage technologies to ensure that control data and metadata are stored reliably. We currently use MongoDB clusters for metadata and encrypted AWS S3 buckets for implementing SLFS. We don’t expose S3 directly, but rather provide a virtual hierarchical view of the data. This allows us to track and properly authorize access to the SLFS data.
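
One simple way to picture a virtual hierarchical view over flat S3 object keys is sketched below with boto3. The bucket name, key scheme, and access-control table are hypothetical assumptions for the example; a real system would authorize against its own metadata store, and this is not SnapLogic’s SLFS implementation:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-slfs-bucket"          # hypothetical bucket name

# Hypothetical per-project ACL; a real system would keep this in its metadata store.
PROJECT_READERS = {("acme", "sales-sync"): {"alice", "bob"}}

def slfs_key(org: str, project: str, filename: str) -> str:
    """Map a hierarchical SLFS-style path onto a flat S3 object key."""
    return f"{org}/{project}/{filename}"

def read_file(org: str, project: str, filename: str, user: str) -> bytes:
    """Authorize against the virtual hierarchy, then fetch the flat S3 object."""
    if user not in PROJECT_READERS.get((org, project), set()):
        raise PermissionError(f"{user} may not read {org}/{project}")
    obj = s3.get_object(Bucket=BUCKET, Key=slfs_key(org, project, filename))
    return obj["Body"].read()
```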

For MongoDB, we have developed a reliable read-modify-write strategy to handle metadata updates in a non-blocking manner using findAndModify. Our approach results in highly efficient, non-conflicting updates, but is safe in the presence of a write conflict. In a future post we will provide a technical description of how this works.
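
The general shape of such an optimistic read-modify-write loop is sketched below using pymongo’s find_one_and_update, which wraps findAndModify. The collection name and version field are assumptions for the example, not SnapLogic’s actual implementation:

```python
from pymongo import MongoClient, ReturnDocument

client = MongoClient("mongodb://localhost:27017")      # illustrative connection
metadata = client["example_db"]["pipeline_metadata"]   # hypothetical collection

def update_description(doc_id, new_description, max_attempts=5):
    """Optimistic read-modify-write: apply the update only if the document has
    not changed since we read it, as detected by a version counter."""
    for _ in range(max_attempts):
        current = metadata.find_one({"_id": doc_id})
        if current is None:
            raise KeyError(doc_id)
        updated = metadata.find_one_and_update(
            {"_id": doc_id, "version": current["version"]},   # unchanged since read
            {"$set": {"description": new_description},
             "$inc": {"version": 1}},
            return_document=ReturnDocument.AFTER,
        )
        if updated is not None:      # the atomic findAndModify succeeded
            return updated
        # Another writer won the race; re-read and try again.
    raise RuntimeError("too many concurrent updates, giving up")
```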

The Benefits of Software-Defined Integration
By dividing the SnapLogic elastic iPaaS architecture into a data plane and a control plane, we can employ effective but different reliability strategies for each. In the data plane, we both identify and correct Snaplex server failures and allow users to implement highly reliable pipelines as needed. In the control plane, we use a combination of server replication, load balancing and ZooKeeper to ensure reliable system execution. Our “one size does not fit all” approach allows us to modularize reliability and employ targeted testing strategies. Reliability is not a product feature, but an intrinsic design property of every aspect of the SnapLogic Integration Cloud.

The SnapLogic Integration Cloud: Using the SOAP Request Snap

This next training video provides a quick overview of using the SOAP Request Snap. In this video, you will:

  • Learn how field-level suggest provides data after a WSDL is supplied.
  • Learn how to pass data to a parameter.
  • See a custom envelope generated by SnapLogic based on the data.


Look for more training videos on our SnapLogic video site later this week.

The SnapLogic Integration Cloud: Using the Monitoring Dashboard

Next in our series of training videos is an overview of the SnapLogic Integration Cloud Monitoring Dashboard. The Dashboard provides visibility into the health of your integrations through system performance graphs organized into tabs.

The tabs you will learn about in this training video are:

  • Health tab – provides a visual view of the overall health of your Snaplex
  • Pipeline tab – displays your pipeline run history including run status, run-time and duration
  • Snaplex tab – displays graphs for active pipelines, executed pipelines, active nodes and pipeline distribution

This video also shows how you can mouse over graphs for specific information at a given point in time, and drag the slider bars to expand the timeframe being viewed. Stay tuned for more training videos next week!

Using the SnapLogic Integration Cloud Manager as an Administrator

Yesterday we showed how to use the SnapLogic Integration Cloud Manager as an Integrator for projects and tasks. Today’s training video covers using the Manager as an administrator, including the ability to access groups. A group is a collection of users, which makes it easier to assign users to projects.

In this video, administrators using the SnapLogic Integration Cloud learn how to:

  • Create new users
  • Access and manage groups, which includes assigning users to specific projects

Stay tuned for the rest of our series of training videos and in the meantime, download our technical whitepaper for additional details of the SnapLogic Integration Cloud.

Using the SnapLogic Integration Cloud Manager for Projects & Tasks

This is the second training video for the SnapLogic Integration Cloud user interface, specifically covering project access and management using the SnapLogic Manager. Projects are logical groupings of pipelines, files, accounts and tasks (tasks being an alternative way to execute your pipelines).

In this video, integrators using the SnapLogic Integration Cloud learn how to:

  • Create a new project
  • Delete a pipeline, move a pipeline to a different project, or make a copy of a pipeline
  • Schedule tasks, configuring when and how often they will run
  • Set up a notification for when a task has started, completed or failed

Stay tuned for more on the administration of users, groups and organizations which will be covered in additional training videos. And download our technical whitepaper for additional details of the SnapLogic Integration Cloud.