Collaborations in Building Hybrid Cloud Computing and Data Integrations

Post first published by Ravi Dharnikota on LinkedIn.

It’s one thing to create application and data integrations; it’s an even bigger challenge to collaborate with other teams in the enterprise to reuse, repurpose, and standardize on what has already been built.

The need for seamless content collaboration is a key ingredient for overall success in app and data integrations, just as it is in app development and delivery. A platform that allows for easy sharing of information between employees is the difference between a platform’s adoption throughout the enterprise and its becoming shelf-ware.

Two-way SSL with SnapLogic’s REST Snap

There are lots of ways for a client to authenticate itself against a server, including basic authentication, form-based authentication, and OAuth.

In these cases, the client communicates with the server over HTTPS, and the server’s identity is confirmed by validating its public certificate. The server doesn’t care who the client is, as long as it has the correct credentials.
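As a rough illustration of the difference, here is a minimal Python sketch using the standard `ssl` module. It is not how the REST Snap is implemented; the function name and the certificate paths are hypothetical. With no arguments the context only verifies the server, which is the one-way case described above. Passing a client certificate adds the second direction.

```python
import ssl


def make_client_context(client_cert=None, client_key=None):
    """Build a TLS context for an HTTPS client (illustrative helper)."""
    # The default context verifies the server's certificate chain and
    # hostname: the one-way check described above.
    context = ssl.create_default_context()
    if client_cert is not None:
        # Two-way SSL: also present a client certificate to the server.
        # The paths passed in here are placeholders supplied by the caller.
        context.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return context


# One-way: only the server proves its identity.
context = make_client_context()
```

For the two-way case you would call something like `make_client_context("client.pem", "client.key")`, where both file names are placeholders for your own certificate and key.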

SnapLogic Is Hiring – Engineering, PM, DevOps, QA & More

SnapLogic is currently on the hunt for engineers, marketers, QA specialists and more. We’re building the team in our San Mateo, CA headquarters, continuing to expand our Boulder, CO office, and seeking out talent all around the country for both in-office and remote positions. Are you currently in Boston or New York? We have offices there too, and we are expanding internationally in 2016. Some of the teams that are currently hiring include:

  • Product Management
  • Marketing
  • Sales Development
  • Human Resources

And in engineering we are hiring for many important roles, including:

  • Big Data Developer
  • DevOps Engineer
  • Java Application Engineers
  • Tech Support Engineer
  • Solutions Engineer

You can check out all of our open positions here.

As more and more enterprise organizations recognize the importance of data and application integration to cloud and big data success, we have grown significantly in the past year with some important hires and the opening of additional offices. Learn more about the perks and benefits of working at SnapLogic.

A little bit more about SnapLogic and the company’s history:

SnapLogic is the industry’s first unified data and application integration platform as a service (iPaaS). Our hybrid cloud architecture is powered by 300+ Snaps, which are pre-built integration components that simplify and automate complex enterprise integration patterns and processes. Funded by leading venture investors, including Andreessen Horowitz and Ignition Partners, and co-founded by Gaurav Dhillon, the SnapLogic Elastic Integration Platform enables leading enterprises to connect faster and gain a better return on their cloud application and big data investments.

Connecting SaaS providers with SnapLogic’s OAuth-enabled REST Snaps

OAuth is an open standard for authorization. OAuth provides client applications a ‘secure delegated access’ to server resources on behalf of a resource owner. It specifies a process for resource owners to authorize third-party access to their server resources without sharing their credentials.

Wikipedia

SnapLogic has many Snaps that utilize OAuth, including Box, Concur, Eloqua, LinkedIn, Facebook, and Google Analytics. We also support it in a generic way with our REST Snaps that can be used to connect with providers we have yet to build a Snap for, so it’s useful to understand what OAuth is and how it works.

While it is not necessary to have any prior knowledge of OAuth to continue reading, if you wish to understand the OAuth standard at a deeper level, oauth.net provides a good starting point.

Let’s dive in with a common use case - you (the user) wish to use SnapLogic (the app) to connect to your Google Drive (the server). In this example, your Google Account is the Owner, the Server is Google’s Identity Platform, and the Client is SnapLogic’s REST Snap.

We will use SnapLogic’s REST Snaps to send data to and receive data from Google’s Drive API, but the Snap needs to be configured first. Because we need to access content from Google, the Snap needs a way of proving to Google that it has been authorized by the user to interact with their Google Drive, while also allowing the user to revoke that access directly from their account (Google provides an “Apps connected to your account” settings page where users can easily review and remove apps).

Our first step is to log into the Google Developers Console and create a new Project:

Create SnapLogic Google Drive Project

Once the Project has been created, we must enable Drive API integration:

Enable Drive API integration

Next, we customize the OAuth consent screen by providing a Product name and, optionally, a logo:

Provide product name and logo to the OAuth consent screen

Finally, we configure a new “OAuth 2.0 client ID” credential to identify our Snap to Google when we ask the user for authorization. We use “https://elastic.snaplogic.com/api/1/rest/admin/oauth2callback/rest” URL as the authorized redirect URI.

Create OAuth 2.0 Client ID for Web Application App

Take note of the generated client ID and secret:

Client ID and Client Secret

We can now create a pipeline, add the REST Get Snap, and configure it to request authorization from the user to list their Google Drive files:

Create new pipeline with REST Get Snap, add new OAuth2 account

When creating the REST OAuth2 Account, we use the client ID and secret provided earlier, and configure the remaining fields with the values specified by the Google OAuth for Web Server Apps documentation:

Configure OAuth2 account

The “Header authenticated” checkbox instructs the REST Snap to include an “Authorization” HTTP Header with every request, whose value is the soon-to-be-acquired access token as a Bearer token. Alternatively, you may choose not to enable this setting and instead include an “access_token” query parameter in each request, whose value is the special expression “$account.access_token”, which becomes available after a successful authorization.
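To make the two options concrete, here is a small Python sketch of what each request looks like on the wire. It only builds the requests and does not send them. The helper name and the token value are made up for illustration, and this is not the Snap's own code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, access_token, header_authenticated=True):
    """Attach an access token to a request in one of the two ways above."""
    if header_authenticated:
        # Option 1: send the token as a Bearer token in the Authorization header.
        return Request(url, headers={"Authorization": "Bearer " + access_token})
    # Option 2: send the token as an "access_token" query parameter.
    separator = "&" if "?" in url else "?"
    return Request(url + separator + urlencode({"access_token": access_token}))


DRIVE_FILES_URL = "https://www.googleapis.com/drive/v2/files"
with_header = build_request(DRIVE_FILES_URL, "example-token")
with_query = build_request(DRIVE_FILES_URL, "example-token", header_authenticated=False)
```

The header form keeps the token out of URLs, which tend to end up in logs, so it is usually the safer default when the provider supports it.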

The “redirect_uri” parameter must be provided in both the auth and token endpoint configs, and the value must match the authorized redirect URI configured for the OAuth 2.0 client ID credential created previously. The “response_type” authentication parameter must have a value of “code” (defined by the OAuth specification), and the “scope” parameter defines the Google Drive capabilities being requested (you may wish to modify the scope to what is appropriate for your use case).

The Google-specific “access_type” and “approval_prompt” parameters are also included in the auth endpoint config. An “access_type” value of “offline” requests Google to return a refresh token after the user’s first successful authorization. This allows the Snap to refresh access to the user’s Google Drive without the user being present. An “approval_prompt” value of “auto” instructs Google to provide the refresh token only on the very first occasion the user gives offline consent. A value of “force” will prompt the user to re-consent to offline access to acquire a new refresh token.
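Put together, the auth endpoint parameters described above produce a URL along the following lines. This is a sketch of the request shape, not the exact URL the SnapLogic Platform generates (the platform adds further parameters of its own). The client ID is a placeholder, and the read-only Drive scope is an assumption chosen for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint shown in the consent window; redirect URI as configured for the credential.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/auth"
REDIRECT_URI = "https://elastic.snaplogic.com/api/1/rest/admin/oauth2callback/rest"


def build_authorization_url(client_id, scope, approval_prompt="auto"):
    """Assemble the URL the user is sent to in order to give consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,        # must match the authorized redirect URI
        "response_type": "code",             # required by the OAuth specification
        "scope": scope,                      # capabilities being requested
        "access_type": "offline",            # ask Google for a refresh token
        "approval_prompt": approval_prompt,  # "auto" or "force"
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


# Placeholder client ID and an assumed read-only Drive scope.
url = build_authorization_url(
    "example-client-id",
    "https://www.googleapis.com/auth/drive.readonly",
)
query = parse_qs(urlsplit(url).query)
```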

Clicking the “Authorize” button will start the OAuth Dance. Depending on whether the User is already logged into their Google Account, or is logged into multiple Google Accounts, they may need to log in or choose which Account to use. Either way, as long as the user has not already authorized the app, the user will eventually be prompted to allow the REST Snap to access their Google Drive data:

Snap Authorization consent window

These permissions correspond to the “scopes” that were defined previously. You’ll notice that this is a google.com website and the URL (https://accounts.google.com/o/oauth2/auth) starts with the same value as the one entered for the “OAuth2 Endpoint” field above. The Snap has also appended some of the other fields, and the SnapLogic Platform has added some extra security parameters.

Assuming the User gives consent by clicking the Allow button, the next couple of steps happen behind the scenes within the SnapLogic Platform and are mostly concerned with checking that neither SnapLogic nor Google is being tricked by the other party.

Google will return an Authorization Code to the SnapLogic Platform, which will immediately send a request to the “OAuth2 Token” URL (also entered above) with the authorization code, client ID, client secret and redirect URI parameters. On successful receipt of that request, Google will respond to SnapLogic, this time including an access token, the token’s expiry, plus a refresh token.
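For readers curious about what that behind-the-scenes exchange looks like, here is a hedged Python sketch of the token request body and of reading the response. The field names (`grant_type`, `code`, `access_token`, `expires_in`, `refresh_token`) come from the OAuth 2.0 specification. The helper names and all values, including the sample response, are invented for illustration and are not real Google output.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_token_request_body(code, client_id, client_secret, redirect_uri):
    """Form-encoded body POSTed to the token URL to redeem the code."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    })


def parse_token_response(raw_json):
    """Pull out the three values the account form ends up displaying."""
    payload = json.loads(raw_json)
    return (
        payload["access_token"],
        payload.get("refresh_token"),  # only present when offline access was granted
        payload.get("expires_in"),     # lifetime of the access token in seconds
    )


body = build_token_request_body(
    "example-code", "example-client-id", "example-secret",
    "https://elastic.snaplogic.com/api/1/rest/admin/oauth2callback/rest",
)

# Made-up response used only to exercise the parser.
sample = '{"access_token": "abc", "token_type": "Bearer", "expires_in": 3600, "refresh_token": "xyz"}'
tokens = parse_token_response(sample)
```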

If all goes well, the User will be shown the SnapLogic Designer view with the REST OAuth Account form again visible, except now with values for the access and refresh tokens:

OAuth2 Account with acquired access and refresh tokens

The “Refresh” button is now also visible (due to a refresh token having been acquired), allowing the user to manually acquire a new access token when the existing one expires. The user may also choose to enable the “Auto-refresh token” setting to permit the SnapLogic Platform to automatically refresh any expiring access tokens, enabling a true offline mode.

Automatically refresh access tokens by enabling the Auto-refresh token setting
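Conceptually, auto-refresh comes down to two steps: notice that the access token is about to expire, then trade the refresh token for a new one. The sketch below shows both under stated assumptions. The 60-second safety margin and the helper names are hypothetical choices, not SnapLogic's actual behavior. The `grant_type=refresh_token` request shape is from the OAuth 2.0 specification.

```python
import time
from urllib.parse import urlencode, parse_qs


def needs_refresh(expires_at, now=None, margin_seconds=60):
    """True when the token expires within the safety margin (epoch seconds)."""
    now = time.time() if now is None else now
    return now >= expires_at - margin_seconds


def build_refresh_request_body(refresh_token, client_id, client_secret):
    """Form-encoded body POSTed to the token URL to get a new access token."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


# Placeholder values for illustration.
refresh_body = build_refresh_request_body("xyz", "example-client-id", "example-secret")
```

Refreshing slightly before the expiry, rather than after a request fails, avoids a failed call in the middle of a pipeline run.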

We can click the “Apply” button to associate the authorized OAuth2 Account with the REST Snap. The user can now begin querying the Google Drive API to list their Google Drive files.

The Google Drive API Reference details the full capabilities of what our integration can interact with. For example, we could list the files whose title contains “Target Customers”. To do this, the “Service URL” is updated to https://www.googleapis.com/drive/v2/files, and we add a “q” query parameter with the search parameter value “title contains 'Target Customers'”:

REST Get search and list GDrive files
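Outside of the Designer, the same search corresponds to a URL like the one built below. This sketch only shows how the “q” parameter is encoded. The Snap handles the encoding for you, and the variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SERVICE_URL = "https://www.googleapis.com/drive/v2/files"

# The search expression from the example above, passed as the "q" parameter.
search = "title contains 'Target Customers'"
search_url = SERVICE_URL + "?" + urlencode({"q": search})

# Decoding the query string again recovers the original expression.
decoded = parse_qs(urlsplit(search_url).query)["q"][0]
```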

Save and close the settings dialog to validate the pipeline and preview the results:

REST Get preview GDrive API results

Et voilà, we have successfully completed an OAuth 2.0 Authorization Dance and used the acquired access token to connect with Google Drive! The full power of the Google Drive API is now accessible within SnapLogic.

March 2015 Snap Release for the SnapLogic Elastic Integration Platform

This month, we are planning the delivery of our latest Snap Release.

New Snaps

NetSuite Update will be added to the NetSuite Snap Pack. This Snap provides the ability to update the records of an object in NetSuite.

The SAP HANA Snap Pack will be expanded with the addition of a Stored Procedure Snap. Use this Snap to execute a stored procedure in the database and write any output to the output view.

This release also introduces the Splunk Search Snap, which executes a search query using Splunk’s REST API.

Updated Snaps

As with all releases, we make continual improvements to our Snaps. Changes in this release focus on Database Snaps, NetSuite Snap Pack, Script Snap, and others. See the Release Notes for more information.

2014 in Review: From Elastic Integration to Enterprise iPaaS

Season's Greetings

What a year 2014 has been! The SnapLogic team hosted regular webinars and TechTalks, attended many industry and partner events, spoke at conferences, delivered frequent updates to our elastic integration platform as a service (iPaaS), announced new financing, expanded our team… and had a lot of fun along the way! We also spent a lot of time working closely with our customers and partners ensuring they’re realizing the maximum potential from our unified platform that connects enterprise data, applications and APIs.

Looking forward to 2015, we recently came out with some tech predictions from our co-founder and CEO Gaurav Dhillon as well as an infographic based on recent survey results with TechValidate. In this video, Gaurav shares his predictions for cloud computing and big data in 2015, covering the topics of “cloudication” and self-service. And in our infographic, we highlight cloud integration drivers and requirements in 2015 including speed, time to value, modern and scalable architecture, support for hybrid deployments and more.

So before we launch into what promises to be an exciting 2015 as iPaaS and big data top enterprise IT priorities, here’s a look back on the past year, including some of the events we attended, product releases and noteworthy announcements. Don’t forget to check out our Resource library for a full collection of our webinars, whitepapers and more, and subscribe to our blog for regular updates.

Happy New Year from the SnapLogic Team!

The Most Integrated Man at #GartnerSym2014

Events:

  • Salesforce1 Tour: We participated in this event in a few cities to get the word out about integrating data, apps and APIs…and took some selfies in the process!
  • Knowledge14: This show was a great opportunity to talk to customers about using cloud services for enterprise IT service automation.
  • InformationWeek Conference: Gaurav Dhillon spoke on a panel to discuss digital disruption and the pace of business innovation.
  • Dreamforce: We announced our collaboration with the new Salesforce Analytics Cloud ecosystem to deliver big data integration for Wave.
  • Gartner Symposium: Here we were mentioned as part of the iPaaS Magic Quadrant and were able to talk to customers and prospects about the citizen integrator, elastic iPaaS and the changing world of enterprise IT.

2014 Platform Releases and Launches:

SnapLogic in the News in 2014

Be sure to check out all of our coverage here, and take a look at some of our recent photos. You can also follow along on our social media channels: Twitter, LinkedIn, Facebook and Google+. See you in 2015!

SnapLogic Big Data Processing Platforms

One of our goals at SnapLogic is to match data flow execution requirements with an appropriate execution platform. Different data platforms have different benefits. The goal of this post is to explain the nature of data flow pipelines and how to choose an appropriate data platform. In addition to categorizing pipelines, I will explain our current supported execution targets and our planned support for Apache Spark.

First, some preliminaries. All data processed by SnapLogic pipelines is handled natively in an internal JSON format. We call this document-oriented processing. Even flat, record-oriented data is converted into JSON for internal processing. This lets us handle both flat and hierarchical data seamlessly. Pipelines are constructed from Snaps. Each Snap encapsulates specific application or technology functionality. The Snaps are connected together to carry out a data flow process. Pipelines are constructed with our visual Designer. Some Snaps provide connectivity, such as connecting to databases or cloud applications. Some Snaps allow for data transformation such as filtering out documents, adding or removing fields or modifying fields. We also have Snaps that perform more complex operations such as sort, join and aggregate.
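To give a feel for document-oriented processing, here is a small Python sketch. It is only an analogy: the actual internal JSON format is not shown in this post, and the function names and sample data are made up. The point is that once flat records become documents, the same transformation works on flat and hierarchical data alike.

```python
import csv
import io
import json


def records_to_documents(csv_text):
    """Turn flat, record-oriented rows into JSON-style documents (dicts)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def add_field(document, name, value):
    """A tiny transformation that works on any document, flat or nested."""
    updated = dict(document)
    updated[name] = value
    return updated


flat_docs = records_to_documents("id,name\n1,Acme\n2,Globex\n")
nested_doc = {"id": "3", "name": "Initech", "contacts": [{"email": "a@example.com"}]}

# The same step handles both kinds of document without special cases.
tagged = [add_field(doc, "source", "demo") for doc in flat_docs + [nested_doc]]
as_json = json.dumps(tagged)
```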

Given this setup, we can categorize pipelines into two types: streaming and accumulating. In a streaming pipeline, documents can flow independently. The processing of one document is not dependent on another document as they flow through the pipeline. Such streaming pipelines have low memory requirements because documents can exit the pipeline once they have reached the last Snap. In contrast, an accumulating pipeline requires that all documents from the input source must be collected before result documents can be emitted from a pipeline. Pipelines with sort, join and aggregate are accumulating pipelines. In some cases, a pipeline can be partially accumulating. Such accumulating pipelines can have high memory requirements depending on the number of documents coming in from an input source.
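The distinction maps neatly onto Python generators, which makes for a compact illustration. In this sketch (the function names are invented and are not real Snaps), the filter passes each document along as soon as it arrives, while the sort has to collect every document before it can emit the first one.

```python
def filter_snap(documents, predicate):
    """Streaming: each document is handled on its own and passed along."""
    for doc in documents:
        if predicate(doc):
            yield doc


def sort_snap(documents, key):
    """Accumulating: every input document is collected before any output."""
    collected = list(documents)  # memory grows with the size of the input
    collected.sort(key=key)
    yield from collected


source = [
    {"id": 1, "amount": 30},
    {"id": 2, "amount": 10},
    {"id": 3, "amount": 20},
]

# A partially accumulating pipeline: a streaming filter feeding a sort.
pipeline = sort_snap(
    filter_snap(source, lambda doc: doc["amount"] >= 20),
    key=lambda doc: doc["amount"],
)
result = list(pipeline)
```

Placing the filter ahead of the sort keeps the accumulated set smaller, which matters when the accumulated data has to fit in a node's memory.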

Now let’s turn to execution platforms. SnapLogic has an internal data processing platform called a Snaplex. Think of a Snaplex as a collection of processing nodes or containers that can execute SnapLogic pipelines. We have a few flavors of Snaplexes:

  • A Cloudplex is a Snaplex that we host in the cloud and it can autoscale as pipeline load increases.
  • A Groundplex is a fixed set of nodes that are installed on-premises or in a customer VPC. With a Groundplex, customers can do all of their data processing behind their firewall so that data does not leave their infrastructure.

We are also expanding our support for external data platforms. We have recently released our Hadooplex technology that allows SnapLogic customers to use Hadoop as an execution target for SnapLogic pipelines. A Hadooplex leverages YARN to schedule Snaplex containers on Hadoop nodes in order to execute pipelines. In this way, we can autoscale inside a Hadoop cluster. Recently we introduced SnapReduce 2.0, which enables a Hadooplex to translate SnapLogic pipelines into MapReduce jobs. A user builds a designated SnapReduce pipeline and specifies HDFS files as input and output. These pipelines are compiled to MapReduce jobs to execute on very large data sets that live in HDFS. (Check out the demonstration in our recent cloud and big data analytics webinar.)

Finally, as we announced last week as part of Cloudera’s real-time streaming announcement, we’ve begun work on our support for Spark as a target big data platform. A Sparkplex will be able to utilize SnapLogic’s extensive connectivity to bring data into and out of Spark RDDs (Resilient Distributed Datasets). In addition, similar to SnapReduce, we will allow users to compile SnapLogic pipelines into Spark code so the pipelines can run as Spark jobs. We will support both streaming and batch Spark jobs. By including Spark in our data platform support, we will give our customers a comprehensive set of options for pipeline execution.

Choosing the right big data platform will depend on many factors: data size, latency requirements, connectivity and pipeline type (streaming versus accumulating). Here are some guidelines for choosing a particular big data integration platform:

Cloudplex

  • Cloud-to-cloud data flow
  • Streaming unlimited documents
  • Accumulating pipelines in which accumulated data can fit into node memory

Groundplex

  • Ground-to-ground, ground-to-cloud and cloud-to-ground data flow
  • Streaming unlimited documents
  • Accumulating pipelines in which accumulated data can fit into node memory

Hadooplex

  • Ground-to-ground, ground-to-cloud and cloud-to-ground data flow
  • Streaming unlimited documents
  • Accumulating pipelines can operate on arbitrary data sizes via MapReduce

Sparkplex

  • Ground-to-ground, ground-to-cloud and cloud-to-ground data flow
  • Allow for Spark connectivity to all SnapLogic accounts
  • Streaming unlimited documents
  • Accumulating pipelines can operate on data sizes that can fit in Spark cluster memory
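The guidelines above can be read as a simple lookup, sketched here in Python. This is only one way to encode the lists and not a SnapLogic tool. The size labels ("node", "cluster", "any") and the function name are hypothetical, and the Sparkplex entry reflects planned rather than released support.

```python
GROUND_FLOWS = {"ground-to-ground", "ground-to-cloud", "cloud-to-ground"}

# Platform -> (supported data flows, largest accumulated data it can handle).
PLATFORMS = {
    "Cloudplex": ({"cloud-to-cloud"}, "node"),  # must fit in node memory
    "Groundplex": (GROUND_FLOWS, "node"),       # must fit in node memory
    "Hadooplex": (GROUND_FLOWS, "any"),         # arbitrary sizes via MapReduce
    "Sparkplex": (GROUND_FLOWS, "cluster"),     # must fit in Spark cluster memory
}
SIZE_ORDER = {"node": 0, "cluster": 1, "any": 2}


def candidate_platforms(flow, accumulating=False, data_size="node"):
    """List the platforms whose guidelines match the pipeline's needs."""
    matches = []
    for name, (flows, limit) in PLATFORMS.items():
        if flow not in flows:
            continue
        # Streaming pipelines are unlimited everywhere; only accumulating
        # pipelines are constrained by how much data must be held at once.
        if accumulating and SIZE_ORDER[data_size] > SIZE_ORDER[limit]:
            continue
        matches.append(name)
    return matches


cloud_only = candidate_platforms("cloud-to-cloud")
big_sort = candidate_platforms("ground-to-cloud", accumulating=True, data_size="any")
```

Latency requirements and connectivity, which the guidelines also mention as factors, are left out of this sketch.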

Note that recent work in the Spark community has increased support for out-of-core computations, such as sorting. This means that accumulating pipelines that are currently only suitable for MapReduce execution may be supported in Spark as out-of-core Spark support becomes more general. The Hadooplex and Sparkplex have added reliable execution benefits so that long-running pipelines are guaranteed to complete.

At SnapLogic, our goal is to allow customers to create and execute arbitrary data flow pipelines on the most appropriate data platform. In addition, we provide a simple and consistent graphical UI for developing pipelines which can then execute on any supported platform. Our platform-agnostic approach decouples data processing specification from data processing execution. As your data volume increases or latency requirements change, the same pipeline can execute on larger data and at a faster rate just by changing the target data platform. Ultimately, SnapLogic allows you to adapt to your data requirements and doesn’t lock you into a specific big data platform.