Defined by some as the “collection, processing and analysis of information from sensors on a large number of disparate devices,” the Internet of Things (IoT) presents challenges to IT organizations that go beyond simply dealing with more available data: challenges of collection, management and processing.

In order to learn how (or if) enterprises were taking on the IoT challenge, Enterprise Management Associates (EMA) recently surveyed IT professionals about their plans regarding big data strategies, including IoT projects. SnapLogic sponsored this research and we have provided an infographic with some of the survey findings.

IoT is more than just a buzzword. Survey results show that enterprises have realized that the data available from things provides a rich source of business insights. 50% of respondents indicated that IoT is “essential” or “important” to their business strategies.

Given the nature of this data, processing latency is critical. Respondents needed real-time (32%) or intra-hour (23%) data processing to realize their goals. Respondents also found that traditional ETL was not sufficient for IoT data; instead, these companies were moving beyond batch toward streaming technologies for integration.

To help enterprises deal with IoT data quickly and efficiently, the SnapLogic Spring ’15 release extended our cloud and big data integration capabilities to the Internet of Things with support for Message Queuing Telemetry Transport (MQTT).
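For context on what such device data looks like in practice, here is a minimal sketch of the kind of JSON reading a sensor might publish to an MQTT topic for an integration pipeline to consume. The topic name and field names below are illustrative assumptions, not part of the SnapLogic MQTT Snap.

```python
import json
import time

# Illustrative topic a device might publish to; not a SnapLogic-defined name.
TOPIC = "factory/line1/temperature"

def make_reading(device_id, metric, value):
    """Build the JSON payload a sensor might publish over MQTT.
    All field names here are hypothetical examples."""
    return json.dumps({
        "deviceId": device_id,
        "metric": metric,
        "value": value,
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
    })

payload = make_reading("sensor-042", "temperature_c", 21.7)
print(TOPIC, payload)
```

A pipeline subscribed to the topic would receive a steady stream of such documents, which is exactly the kind of low-latency, document-oriented traffic that batch ETL handles poorly.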

Learn more about the current state of IoT in the enterprise.

It’s been said we have entered “the API Economy.” As our partner 3scale describes it: “As web-enabled software becomes the standard for business processes, the ways organizations, partners and customers interface with it have become a critical differentiator in the marketplace.”

In this post I’ll summarize SnapLogic’s native API management capabilities and introduce the best-in-class partnerships we announced today with 3scale and Restlet to extend the cloud and big data integration capabilities that SnapLogic’s Elastic Integration Platform delivers. In the next set of posts, we’ll provide a deeper overview of our API Management partners.

Exposing SnapLogic Pipelines as APIs
SnapLogic’s unified integration platform as a service (iPaaS) allows citizen integrators and developers to build multi-point integration pipelines that connect cloud and on-premises applications as well as disparate enterprise data sources for big data analytics, and expose them as RESTful APIs. These APIs can be invoked by any authorized user, application, web backend or mobile app through a simple and standardized HTTP call, in order to trigger the execution of the SnapLogic pipeline. Here are two options for exposing SnapLogic pipelines as Tasks:

  • Triggered Tasks: Each Task exposes a cloud URL and, if it is to be executed on an on-premises Snaplex (aka Groundplex), it may also have an on-premises URL. These dataflow pipelines may be invoked using REST GET/POST, passing parameters and optionally a payload in and out. These pipelines use SnapLogic’s standard HTTP basic auth scheme.
  • Ultra Pipelines: These Tasks are “always on”: the pipeline is memory-resident, ready to process incoming requests with millisecond-level invocation overhead. Ultra Pipelines are ideal for real-time integration processing. For authentication, an Ultra Pipeline may be assigned a “bearer token,” an arbitrary string that must be passed in the headers of the invoking HTTPS request. The token is optional, allowing customers to invoke a pipeline with no authentication, which may be appropriate on internal trusted networks.
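As a rough illustration of the two invocation styles above, the sketch below builds (but does not send) the HTTP requests a client might issue. The URLs, credentials and token are made-up placeholders; real values come from each Task’s settings in SnapLogic.

```python
import base64
import json
import urllib.request

# Hypothetical Task URLs -- real ones are shown in the Task's settings.
TRIGGERED_URL = "https://elastic.example.com/api/1/rest/feed/ExampleTriggeredTask"
ULTRA_URL = "https://elastic.example.com/api/1/rest/feed/ExampleUltraTask"

def triggered_request(url, user, password, payload):
    """Triggered Tasks use standard HTTP basic auth; parameters and an
    optional payload can be passed in on the request."""
    creds = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Basic " + creds,
                 "Content-Type": "application/json"},
        method="POST",
    )

def ultra_request(url, bearer_token, payload):
    """Ultra Pipelines accept an optional bearer token, an arbitrary
    string passed in the headers of the invoking request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer " + bearer_token,
                 "Content-Type": "application/json"},
        method="POST",
    )

req = triggered_request(TRIGGERED_URL, "user@example.com", "secret", {"id": 7})
# urllib.request.urlopen(req) would actually fire the pipeline; omitted here.
```

The only difference a caller sees between the two styles is the `Authorization` header and the latency characteristics of the endpoint behind the URL.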

SnapLogic Platform APIs
SnapLogic also provides an expanding set of APIs for customer systems to interact with our elastic iPaaS. Examples include:

  • User and Group APIs: Programmatically manage and automate the creation of SnapLogic users and groups.
  • Pipeline Monitoring APIs: Determine pipeline execution status. A common use case is when you are using external enterprise schedulers to monitor your pipelines.
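In the external-scheduler use case, the scheduler ultimately has to map whatever status document the monitoring API returns onto a go/no-go decision. The helper below sketches that mapping; the `state` field and its values are illustrative assumptions, not the exact SnapLogic response schema.

```python
def run_outcome(status_doc):
    """Classify a pipeline-run status document fetched from a monitoring
    API. The 'state' field and its values are hypothetical examples of
    what such a response might carry."""
    state = str(status_doc.get("state", "")).lower()
    if state in ("completed", "succeeded"):
        return "success"
    if state in ("failed", "stopped"):
        return "failure"
    return "running"  # anything else: keep polling

# A scheduler would poll until the run leaves the 'running' state:
print(run_outcome({"state": "Completed"}))  # success
```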

SnapLogic Snaps Consuming Application APIs
SnapLogic Snaps are the intelligent, introspecting, dynamic connectors that provide the building blocks for pipelines. Snaps interface with applications through the APIs those applications provide. As an application’s APIs evolve, SnapLogic takes care of keeping the Snaps up to date, allowing our customers to take care of the business of their business, not the business of API-level integration.

SnapLogic provides a comprehensive set of 300+ pre-built Snaps, but no collection of connectors will ever be complete. To deal with the expanding cloud and on-premises application and data integration needs of our customers, SnapLogic provides an SDK for customers and partners to be able to create their own Snaps, addressing custom endpoints, or custom transformations.

Best in Class API Management Partnerships
From API authoring and authentication to governance and reporting to protocol translation, our enterprise ISV partners extend SnapLogic’s cloud and big data integration capabilities with best-in-class API Management capabilities. From our partners:

“Making it easy to share digital assets is at the core of what we do. Being able to offer our customers this seamless way to expose their data and application and integration pipelines as RESTful APIs is another way to make that happen.”

– Manfred Bortenschlager, API Market Development Director at 3scale


“With over 300 pre-built connectors available in SnapLogic’s Elastic Integration Platform, this partnership enables virtually any data sources, from legacy to cloud, big data and social platforms, to be exposed as RESTful APIs.”

– Jerome Louvel, Chief Geek and founder at Restlet

Read the press releases to learn more about our API Management partners and Contact Us for more information:

Did I mention we update and introduce new Snaps frequently? Here’s a summary of our most recent Snap update. And here’s what’s coming next weekend (June 27th):

  • A new HBase Snap Pack for big data integration, which includes Get and Put Snaps for use in Standard Mode pipelines.

Updates to the following Snaps:

  • Applications: Anaplan, ServiceNow
  • Databases and Analytics: AWS Redshift, Oracle RDBMS, Microsoft SQL Server, PostgreSQL, SAP HANA
  • Transforms: Sort, XML Formatter, JSON Splitter, Excel Parser
  • Technologies: JMS, JDBC, SOAP, REST
  • Core Snaps: Flow, Email

For more information on SnapLogic Snaps Contact Us.

We recently posted a recap of last week’s Hadoop Summit here on the blog and now have a great highlight video to go along with it. The event proved to be a great couple of days for the SnapLogic team to talk to customers and prospects about their Hadoop adoption plans, big data integration use cases, and how to connect faster with the SnapLogic Elastic Integration Platform.

The team also attended keynotes and sessions, had great discussions about the potential of the data lake, and hosted a lunch featuring SnapLogic Chief Scientist and big data expert Greg Benson, who covered big data processing engines, SnapReduce, and how enterprise customers are ingesting, preparing and delivering big data with SnapLogic’s modern data integration platform.

Take a look at the video below, as well as a round-up of some of the social buzz during the event:

There was a lot of social buzz at this year’s Hadoop Summit, with everything from giveaways at our booth to technical sessions covering all things Hadoop:

Congratulations to our partner Hortonworks on a fantastic Hadoop Summit 2015 this week in San Jose, CA. As they outlined on the official blog, the three day conference featured 160+ presentations, community meetups, keynotes (including a great market overview from Geoffrey Moore, who also presented in 2012), and a vibrant ecosystem with lots of activity in the partner expo. While there were lots of deep-dive technical sessions on Spark, Flink, Zeppelin, and other new open source projects, the overall theme of this year’s conference leaned towards enterprise Hadoop adoption, big data use cases and business outcomes. Governance and security were front and center, as was the data lake and how it will both complement and replace elements of your existing analytics infrastructure. I’m pretty sure I even heard a few people talk about metadata.

I’ve seen some great posts about the event already and the Hadoop Summit tweet stream is full of interesting observations and content. Here are some of the better conference reviews I’ve seen so far:

You can watch the keynotes (except Geoffrey Moore, TrueCar and the customer panel, which I think were the best) here. A few of my favorite tweets:

The SnapLogic team was out in full force at the conference talking about our big data integration capabilities and how we Connect Faster (and how we’re hiring).

Here are a few resources that summarize SnapReduce, the Hadooplex and our thoughts on the Data Lake:

Stay tuned for more updates on Hadoop Summit 2015 and check out our upcoming events to connect with us live.

We’ve written a lot about the requirements and drivers for enterprise integration platform as a service (iPaaS) on the SnapLogic blog. Between recent industry analyst reports and a steady stream of media coverage of the importance of integration to cloud and big data success like this one: 5 Reasons Integration is Changing, I think it’s fair to say that integration is being “re-imagined” in the enterprise.

I recently wrote an article that first appeared on NetworkWorld comparing iPaaS to legacy approaches to cloud and data integration, such as traditional ESB systems. I’ve reposted the article here for comment. As always, I’d appreciate your thoughts.


iPaaS: A New Approach to Cloud Integration

Application integration has often been an exercise in frustration: long delays, high costs and overpromises by vendors. How many ERP projects have you heard of that were canceled or shelved due to complex customization and integration challenges?

Integration, though, is entering a new phase. Cloud technologies and open APIs are helping enterprises merge on-premises and off-premises systems without considerable coding and re-architecting. Instead of requiring specialists in SOA, enterprise service bus (ESB), extract, transform and load (ETL) and data warehousing, organizations are hoping the concept of Integration Platform as a Service (iPaaS) can be used to integrate systems in half the time using technically savvy generalists and increased involvement from lines of business.

As defined by Gartner, Forrester, Ovum and other analyst firms, iPaaS represents a new approach for enterprise IT organizations that are undergoing a rapid transition to the cloud and have big data plans.

Behind the move to more flexible, cloud integration platforms are two core trends: “cloudification,” the race to transform organizations to cloud-based architectures; and the need for agility, as business users’ expectations for rapid delivery of new web, social and mobile services continue to grow.

In our recent TechValidate survey we asked about the drivers for adopting a cloud integration platform, and the top response was “speed and time to value.” Respondents also took issue with using legacy, on-premises tools to address cloud integration requirements, with 43% noting the high expense of hardware and software purchases and configurations. More than a third (35%) said change management is painful in legacy tools, where endpoint changes mean integration rework.

Let’s look at the legacy approaches to the problem:

The Enterprise Service Bus

  • ESB is a middleware architecture that was designed to manage access to applications and services and present a single and consistent interface to end-users. ESB incorporates the features required to implement a service-oriented architecture (SOA), and was appealing to enterprise IT organizations struggling with constantly changing application versions and upgrades.
  • The thought was that “loose coupling” would bring far more flexibility to application lifecycle management. Unfortunately for most, implementing the SOA and ESB vision was too expensive and unwieldy. IT organizations needed to install three environments (development, test and production), leading to delays.
  • Second, ESBs were not very flexible in managing change, such as adding a new field to an endpoint, because of the inflexible underlying technologies. Third, ESB projects required high-priced, specialized integration experts. As a result, many IT organizations have continued to use the same old point-to-point enterprise application integration (EAI) methods of the past, which do not bode well for integrating the more dynamic cloud applications businesses now prefer, such as Salesforce and Workday.

ETL or Batch Data Integration
ETL is typically used for getting data in and out of a repository (data mart, data warehouse) for analytical purposes, and often addresses data cleansing and quality as well as master data management (MDM) requirements. With the onset of Hadoop to cost-effectively address the collection and storage of structured and unstructured data, however, the relevance of traditional rows-and-columns-centric ETL approaches is now in question.

XML-based Integration
Many application integration tools are XML-based, which over time has resulted in some technical shortcomings. A few of those include the fact that XML encoding tags are significant and can result in bloated payloads and the expensive overhead of repeatedly marshaling the data into and out of the document object model (DOM). What’s more, unlike JavaScript Object Notation (JSON), XML is not ideal for supporting the poly-structured information that’s becoming more common in today’s enterprise. XML-based tools were designed to handle smaller data sets at low latencies, causing problems when companies attempt to use such tools for high-volume, high-speed cloud application integration projects.
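To make the payload-bloat point concrete, the snippet below serializes the same small record both ways. Because every XML field name appears twice (opening and closing tag), the XML form comes out noticeably larger even at this tiny scale; the record itself is an arbitrary example.

```python
import json
import xml.etree.ElementTree as ET

record = {"id": 1042, "customer": "Acme", "total": 99.95}

# JSON: each key appears once per field.
as_json = json.dumps(record)

# XML: each field name appears twice, as an open and a close tag.
root = ET.Element("order")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), len(as_xml))  # the XML form is larger
```

The gap only grows with nesting depth and record count, and it compounds with the DOM marshaling overhead described above.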

iPaaS attempts to solve many of the problems that legacy systems could not address cost-effectively or within the faster cadence of agile development. iPaaS is a set of cloud-based services that enables both IT organizations and lines of business to develop, deploy, manage, govern and integrate applications and business systems.

Vendors provide the software and hardware infrastructure, as well as the tools for building/testing/deploying/monitoring and orchestrating integration flows. Solutions include pre-built connectors to support a variety of modern and legacy data sources and systems. While still early in enterprise IT adoption, iPaaS solutions are developed to meet the new cloud expectations of the business, built on modern lightweight and more flexible standards like JSON and REST, with the ability to scale in and out elastically when needed.

Because iPaaS abstracts away the complexity, and in this case the code, there is a perceived loss of functionality or flexibility for IT, but the trade-off is a gain in productivity.

What to Consider
In large organizations, moving to an iPaaS solution is often a step-by-step approach and many companies will retain ESB and other older architectures for a period of time as they modernize their application and data infrastructure. Here are some thoughts on how to approach iPaaS.

First, evaluate iPaaS providers for the following technical requirements:

  • Metadata-driven integrations vs. programmatic approaches
  • Drag-and-drop user experience that allows for some degree of self-service
  • Pre-built connectivity (minimizing coding)
  • Cloud-based management and monitoring, including comprehensive error management, transactional support, data transformation and other operations
  • A hybrid deployment model that respects data gravity and allows processing to run close to the applications, regardless of where they reside.

In addition, if your organization is transforming to a cloud-based enterprise where agility is valued, be sure to dig into the platform aspects of iPaaS and ensure you don’t get locked into a technology that’s only suited for one style of integration. An iPaaS solution must be built to expose and consume micro-services and be able to handle real-time application integration, as well as the new big data integration requirements that are driving predictive analytics, digital marketing and customer-centricity initiatives in the modern enterprise.

To handle the new social, mobile, analytics, cloud and the Internet of Things (SMACT) data and API requirements seamlessly, an iPaaS needs to expand and contract compute capacity to handle variable workloads while streaming data into and out of a Hadoop-based analytics infrastructure.

While integration is certainly not a new enterprise IT challenge, there’s still a lot of old thinking and even older technology that must be reconsidered as cloud business application adoption grows and new approaches emerge to handle big data requirements. The good news is there is once again tremendous innovation happening in the integration market. As a result, iPaaS is also gaining enterprise IT acceptance and adoption.

With Hadoop Summit this week in San Jose and so many opinions (and survey results) being shared about whether the data lake and Hadoop are half full or half empty, I thought I’d repost an article I wrote that was first published on the Datanami site a few weeks ago. But first, a few of the half-full/half-empty posts I’m referring to:

I’ll be at the Hadoop Summit this week with the SnapLogic Team (details here) and would love to explore these and other big data topics. Here’s my Datanami post: What Lies Beneath the Data Lake. Please let me know if you have feedback.


Hadoop and the data lake represent a potential business breakthrough for enterprise big data goals, yet beneath the surface is the murky reality of data chaos.

In big data circles, the “data lake” is one of the top buzzwords today. The premise: companies can collect and store massive volumes of data from the Web, sensors, devices and traditional systems, and easily ingest it in one place for analysis.

The data lake is a strategy from which business-changing big data projects can begin, revealing potential for new types of real-time analyses which have long been a mere fantasy. From connecting more meaningfully with customers while they’re on your site to optimizing pricing and inventory mix on-the-fly to designing smart products, executives are tapping their feet waiting for IT to deliver on the promise.

Until recently, though, even large companies couldn’t afford to continue investing in traditional data warehouse technologies to keep pace with the growing surge of data from across the Web. Maintaining a massive repository for cost-effectively holding terabytes of raw data from machines and websites as well as traditional structured data was technologically and economically impossible until Hadoop came along.

Hadoop, in its many iterations, has become a way to at last manage and merge these unlimited data types, unhindered by the rigid confines of relational database technology. The feasibility of an enterprise data lake has swiftly improved, thanks to Hadoop’s massive community of developers and vendor partners that are working valiantly to make it more enterprise friendly and secure.

Yet with the relative affordability and flexibility of this data lake come a host of other problems: an environment where data is not organized or easily manageable, rife with quality problems and unable to quickly deliver business value. The worst-case scenario is that all that comes from the big data movement is data hoarding – companies will have stored petabytes of data, never to be used, eventually forgotten and someday deleted. This outcome is doubtful, given the growing investment in data discovery, visualization, predictive analytics and data scientists.

For now, there are several issues to be resolved to make the data lake clear and beautiful—rather than a polluted place where no one wants to swim.

Poor Data Quality
This one’s been debated for a while, and of course, it’s not a big data problem alone. Yet it’s one glaring reason why many enterprises are still buying and maintaining Oracle and Teradata systems, even alongside their Hadoop deployments. Relational databases are superb for maintaining data in structures that allow for rapid reporting, protection, and auditing. DBAs can ensure data is in good shape before it gets into the system. And, since such systems typically deal only with structured data in the first place, the challenge for data quality is not as vast.

In Hadoop, however, it’s a free-for-all: typically no one is monitoring anything in a standard way, and data is being ingested raw and ad hoc from log files, devices, sensors and social media feeds, among other unconventional sources. Duplicate and conflicting data sets are not uncommon in Hadoop. There’s been some effort by new vendors to develop tools that incorporate machine learning for improved filtering and data preparation. Yet companies also need a foundation of people (skilled Hadoop technicians) and process to attack the data quality challenge.

Lack of Governance
Closely related to the quality issue is data governance. Hadoop’s flexible file system is also its downside: you can import endless data types into it, but making sense of the data later on isn’t easy. There have also been plenty of concerns about securing data (specifically access) within Hadoop. Another challenge is that there are no standard toolsets yet for importing data into Hadoop and extracting it later. This is a Wild West environment, which can lead to compliance problems as well as slow business impact.

To address the problem, industry initiatives have appeared, including the Hortonworks-sponsored Data Governance Initiative. The goal of DGI is to create a centralized approach to data governance by offering “fast, flexible and powerful metadata services, deep audit store and an advanced policy rules engine.” These efforts among others will help bring maturity to big data platforms and enable companies to experiment with new analytics programs.

Skills Gaps
In a recent survey of enterprise IT leaders conducted by TechValidate and SnapLogic, the top barrier to big data ROI indicated by participants was a lack of skills and resources. Still today, there are a relatively small number of specialists skilled in Hadoop. This means that while the data lake can be a treasure chest, it’s one that is still somewhat under lock and key. Companies will need to invest in training and hiring of individuals who can serve as so-called “data lake administrators.” These data management experts have experience managing and working with Hadoop files and possess in-depth knowledge of the business and its various systems and data sources that will interact with Hadoop.

Transforming the data lake into a business strategy that benefits customers, revenue growth and innovation is going to be a long journey. Aside from adding process and management tools, as discussed above, companies will need to determine how to integrate old and new technologies. More than half of the IT leaders surveyed by TechValidate indicated that they weren’t sure how they were going to integrate big data investments with their existing data management infrastructure in the next few years. Participants also noted that the top big data investments they would be making in the near term are analytics and integration tools.

We’re confident that innovation will continue rapidly for new Big Data-friendly integration and management platforms, but there’s also need to apply a different lens to the data lake. It’s time to think about how to apply processes, controls and management tools to this new environment, yet without weakening what makes the data lake such a powerful and flexible tool for exploration and delivering novel business insights.


For more information about SnapLogic big data integration visit www.snaplogic.com/bigdata. Please be sure to also take a minute to complete the Hadoop Maturity Survey for a chance to win an Amazon Gift Card.