Integrate through the big data insights gap

By Bill Creekbaum

Whether you’re an analyst, data scientist, CxO, or just a “plain ol’ business user,” having access to more data is an opportunity to make better business decisions, identify new and innovative opportunities, and respond to hard-to-detect threats … the possibilities abound.

More data – from IoT, machine logs, streaming social media, cloud-native applications, and more – is coming at you with diverse structures and in massive volumes at high velocity. Traditional analytic and integration platforms were never designed to handle these types of workloads.

This data is often associated with big data and tends to be accessible only to a small, highly technical audience (e.g., data scientists), which limits the business utility of having more data. The result is a big data insights gap that keeps a much broader population of business users and analysts from realizing big data’s benefits. Our industry’s goal should be to help business users and analysts operationalize insights from big data. In fact, Forbes has declared 2017 the year big data goes mainstream.

There are two critical elements needed to close this big data insights gap:

  • A scalable data platform: handles big data while remaining compatible with “traditional” analytic platforms
  • An integration platform: acquires large volumes of high-velocity, diverse data without IT dependency

To address the first element, Amazon has released Amazon Redshift Spectrum as part of its growing family of AWS big data services. Optimized for massive data stores (petabytes, even exabytes) that live in S3, and delivered with the scalable performance of Amazon Redshift, Redshift Spectrum makes the above scenarios possible from an operational, accessibility, and economic perspective:

  • Operational: Amazon Redshift Spectrum allows for interaction with data volumes and diversity not possible with traditional OLAP technology.
  • Accessibility: A SQL interface lets business users and analysts apply traditional analytic tools and skills to these extreme data sets.
  • Economic: Amazon Redshift Spectrum shifts the majority of big data storage costs to the S3 service, which is far more economical than storing the entire data set in Redshift.

Clearly, Amazon has delivered a platform that can democratize the delivery of extremely large volumes of diverse business data to business users and analysts, allowing them to use the tools they already employ, such as Tableau, Power BI, QuickSight, Looker, and other SQL-enabled applications.
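To make this concrete, here is a minimal sketch of what the Spectrum workflow looks like from a SQL client. Every identifier here is hypothetical for illustration (the cluster endpoint, IAM role, bucket, and table names): it declares an external table over Parquet files that never leave S3, then queries it with ordinary SQL via Python’s psycopg2 driver.

```python
# Minimal sketch: querying S3-resident data through Redshift Spectrum.
# All identifiers (endpoint, role ARN, bucket, table) are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="analyst", password="...",
)
conn.autocommit = True  # external-table DDL cannot run inside a transaction
cur = conn.cursor()

# Register an external schema backed by the AWS Glue/Athena data catalog.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS events_ext
    FROM DATA CATALOG DATABASE 'events'
    IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# Declare an external table over Parquet files that stay in S3;
# Redshift stores only the metadata, not the data itself.
cur.execute("""
    CREATE EXTERNAL TABLE events_ext.clickstream (
        user_id VARCHAR(64),
        url     VARCHAR(2048),
        ts      TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://example-company-logs/clickstream/';
""")

# Plain SQL from here on: any SQL-enabled BI tool could run this query.
cur.execute("""
    SELECT DATE_TRUNC('day', ts) AS day, COUNT(*) AS clicks
    FROM events_ext.clickstream
    GROUP BY 1
    ORDER BY 1;
""")
for day, clicks in cur.fetchall():
    print(day, clicks)

cur.close()
conn.close()
```

The economics bullet above shows up in the LOCATION clause: the raw clickstream stays in S3, and only the query’s working set flows through the cluster.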

However, unless those large volumes of high-velocity, diverse data can be captured, loaded into S3, and made available via Redshift Spectrum, none of the above benefits will be realized and the big data insights gap will remain.

The key challenges of acquiring and integrating large volumes of high-velocity, diverse data are:

  • On-prem in a Cloud-Native World: Many integration platforms were designed long ago to operate on-premises and to load data to an OLAP environment in batches. While some have been updated to operate in the cloud, many will fail with streaming workloads and collapse under the high volume of diverse data required today.
  • Integration is an “IT Task”: Typical integration platforms are intended to be used by IT organizations or systems integrators. Not only does this severely limit who can perform the integration work, it also tends to force that work into a long project queue, delaying answers to critical business questions.

To address the second element in closing the big data insights gap, business users and analysts themselves must be able to capture the “big data” so that business questions can be answered in a timely manner. If it takes a long and complex IT project to capture the data, the business opportunity may be lost.

To close the big data insights gap for business users and analysts, the integration platform must:

  • Handle large volumes of high-velocity, diverse data
  • Focus on integration flow development (not complex code development)
  • Comply with IT standards and infrastructure

With this approach to integration, those asking the business questions and seeking insights from more data can leverage the powerful capabilities of Amazon Redshift Spectrum and respond to business opportunities while it still matters.

Amazon’s Redshift Spectrum and the SnapLogic Enterprise Integration Cloud represent a powerful combination to close the big data insights gap for business users and analysts. In upcoming blog posts, we’ll look at actual use cases and learn how to turn these concepts into reality.

Interested in how SnapLogic empowers cloud warehouse users with up to a 10x improvement in the speed and ease of data integration for Redshift deployments? Check out the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”

Bill Creekbaum is Senior Director, Product Management at SnapLogic. Follow him on Twitter @wcreekba.

Data integration: The key to operationalizing innovation

By Craig Stewart

It’s not just a tongue twister. Operationalizing innovation has proven to be one of the most elusive management objectives of the new millennium. Consider this sound bite from an executive who’d just participated in an innovation conference in 2005:

The real discussion at the meeting was about … how to operationalize innovation. All roads of discussion led back to that place. How do you make your company into a systemic innovator? There is no common denominator out there, no shared understanding on how to do that.[1]

The good news is that, in the 12 years since, cloud computing has exploded, and a common denominator clearly emerged: data. Specifically, putting the power of data – big data, enterprise data, and data from external sources – and analytics into users’ hands. More good news: An entirely new class of big data analysis tools[2] has emerged that allows business users to become “citizen data analysts.”

The bad news: There hasn’t been a fast, easy way to perform the necessary integrations between data sources, in the cloud – an essential first step that is the foundation of citizen data analytics, today’s hottest source of innovation.

Until now.

The SnapLogic Enterprise Integration Cloud is a mature, full-featured Integration Platform-as-a-Service (iPaaS) built in the cloud, for the cloud. Through its visual, automated approach to integration, the SnapLogic Enterprise Integration Cloud uniquely empowers both business and IT users, accelerating analytics initiatives on Amazon Redshift and other cloud data warehouses.

Unlike on-premises ETL or immature cloud tools, SnapLogic combines ease of use, streaming scalability, on-premises and cloud integration, and managed connectors called Snaps. Together, these capabilities present a 10x improvement over legacy ETL solutions like Informatica or other “cloud-washed” solutions originally designed for on-premises use, accelerating integrations from months to days.

By enabling “citizen integrators” to more quickly build, deploy and efficiently manage multiple high-volume, data-intensive integration projects, SnapLogic uniquely delivers:

  • Ease of use for business and IT users through a graphical approach to integration
  • A solution built for scale, offering bulk data movement and streaming data integration
  • Ideal capabilities for hybrid environments, with over 400 Snaps to handle relational, document, unstructured, and legacy data sources
  • Cloud data warehouse-readiness with native support for Amazon Redshift and other popular cloud data warehouses
  • Built-in data governance* by synchronizing data in Redshift at any time interval desired, from real-time to overnight batch.

* Why data governance matters

Analytics performed on top of incorrect data yield incorrect results – a detriment, certainly, in the quest to operationalize innovation. Data governance is a significant topic, and a major concern of IT organizations charged with maintaining the consistency of data routinely accessed by citizen data scientist and citizen integrator populations. Gartner estimates that only 10% of self-service BI initiatives are governed[3] to prevent inconsistencies that adversely affect the business.

Data discovery initiatives using desktop analytics tools risk creating inconsistent silos of data. Cloud data warehouses afford increased governance and data centralization. SnapLogic helps to ensure strong data governance by replicating source tables into Redshift clusters, where the data can be periodically synchronized at any time interval desired, from real-time to overnight batch. In this way, data drift is eliminated, allowing all users who access data, whether in Redshift or other enterprise systems, to be confident in its accuracy.
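As an illustration of the synchronization pattern being described (SnapLogic builds this as a visual pipeline; the hand-written version below is only a sketch, with hypothetical connection details and table names), a replica can be kept from drifting with the classic staging-table upsert: load the latest extract, delete superseded rows, insert the fresh ones.

```python
# Sketch of the staging-table upsert pattern commonly used to keep a
# Redshift replica synchronized with a source table. Names are hypothetical.
import psycopg2

def sync_customers(conn) -> None:
    """Merge the latest source extract into the replica so it never drifts."""
    with conn.cursor() as cur:
        # 1. Land the newest extract from S3 in a staging table.
        cur.execute("TRUNCATE staging.customers;")
        cur.execute("""
            COPY staging.customers
            FROM 's3://example-bucket/exports/customers/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/LoadRole'
            FORMAT AS PARQUET;
        """)
        # 2. Remove replica rows that have a newer version in staging ...
        cur.execute("""
            DELETE FROM public.customers
            USING staging.customers s
            WHERE public.customers.customer_id = s.customer_id;
        """)
        # 3. ... then insert the fresh versions, completing the upsert.
        cur.execute("INSERT INTO public.customers SELECT * FROM staging.customers;")
    conn.commit()

if __name__ == "__main__":
    # Schedule at whatever cadence governance requires, from frequent
    # micro-batches to an overnight run (e.g., via cron).
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="etl", password="...",
    )
    sync_customers(conn)
    conn.close()
```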

To find out more about how SnapLogic empowers citizen data scientists, and how a global pharmaceutical company is using SnapLogic to operationalize innovation, get the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”

Craig Stewart is Vice President, Product Management at SnapLogic.

[1] “Operationalizing Innovation–THE hot topic,” Bruce Nussbaum, Bloomberg, September 28, 2005. https://www.bloomberg.com/news/articles/2005-09-28/operationalizing-innovation-the-hot-topic

[2] “The 18 Best Analytics Tools Every Business Manager Should Know,” Bernard Marr, Forbes, February 4, 2016. https://www.forbes.com/sites/bernardmarr/2016/02/04/the-18-best-analytics-tools-every-business-manager-should-know/#825e6115d397

[3] “Predicts 2017: Analytics Strategy and Technology,” Kurt Schlegel, et al., Gartner, November 30, 2016. ID: G00316349

Iris – Can you build an integration pipeline for me?

The promise of Artificial Intelligence technology is flourishing. From Amazon shopping recommendations and Facebook image recognition to personal assistants like Siri, Cortana, and Alexa, AI is becoming part of our everyday lives, whether we know it or not. These apps use information collected from your past requests to make predictions and deliver results that are tailored to your preferences.

The importance of AI in today’s world is not lost on us at SnapLogic. We are always trying to keep up with the latest innovations and technologies, and making our software fast, efficient, and automated for our customers has always been our goal. With the Spring release, SnapLogic launched the SnapLogic Integration Assistant, a recommendation engine that uses artificial intelligence and machine learning to predict the next step in building a data pipeline. Powered by SnapLogic’s Iris AI technology, the Integration Assistant uses advanced algorithms to collect information from millions of metadata elements and billions of data flows to make predictions and deliver results tailored to the customer’s needs.
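SnapLogic hasn’t published Iris’s internals, so treat the following as a toy illustration of the general technique rather than the product’s actual algorithm: the simplest next-step recommender is a transition-frequency model that counts, across historical pipelines, how often each Snap has followed the current one, then ranks the candidates. The pipeline history below is invented.

```python
# Toy next-Snap recommender: rank candidates by how often each Snap has
# historically followed the current one. An illustration of the general
# technique only, not SnapLogic's actual Iris algorithm.
from collections import Counter, defaultdict

def train(pipelines: list[list[str]]) -> dict[str, Counter]:
    """Count Snap-to-Snap transitions across historical pipelines."""
    transitions: dict[str, Counter] = defaultdict(Counter)
    for snaps in pipelines:
        for current, nxt in zip(snaps, snaps[1:]):
            transitions[current][nxt] += 1
    return transitions

def recommend(transitions: dict[str, Counter], current: str, k: int = 3) -> list[str]:
    """Return the k Snaps that most often followed `current`."""
    return [snap for snap, _ in transitions[current].most_common(k)]

# Invented history: each inner list is one previously built pipeline.
history = [
    ["File Reader", "CSV Parser", "Mapper", "Redshift Insert"],
    ["File Reader", "CSV Parser", "Filter", "Redshift Insert"],
    ["REST Get", "JSON Parser", "Mapper", "Redshift Insert"],
]
model = train(history)
print(recommend(model, "CSV Parser"))  # ['Mapper', 'Filter']
```

At SnapLogic’s scale (millions of metadata elements, billions of data flows) the production system would draw on far richer signals, but the ranking intuition is the same.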

Currently, customers build pipelines by searching and selecting from over 400 Snaps in the SnapLogic catalog and dragging and dropping them onto the canvas. Repeating this step for every single Snap, although easy, can make pipeline building tedious and time-consuming. With the Integration Assistant, the right next step is suggested at each stage of the build, making pipeline construction easy and efficient. Besides efficiency and speed, the “self-driving” software shortens the learning curve for line-of-business users managing their data flows while freeing technology staff for higher-value software development. See how it works in this video.

In the next few steps, learn how to enable this feature and start building interactive pipelines yourself.

Right now, we have two ways of building pipelines:

  • Choose a Snap from the SnapLogic catalog
  • Use the Integration Assistant for recommending the right Snaps

How to enable the Integration Assistant feature

By default, the Integration Assistant option is turned off, allowing you to continue building pipelines by selecting Snaps from the SnapLogic Catalog. To utilize the Integration Assistant, just head to the Settings icon and check the Integration Assistant option.

Once the Integration Assistant is enabled, you’ll immediately see the benefits of the self-guided user interface. Drag the first Snap onto the canvas and the Integration Assistant instantly kicks in, highlighting the next suitable Snap. At the same time, it opens a panel on the right-hand side of the canvas that lists suggested Snaps. These AI-driven Snap recommendations are based on the historical metadata from your previous workflows.

Next, you can click the highlighted Snap or pick from the recommended list by dragging the suitable Snap onto the canvas. This process continues until you select a Snap with a closed output. At that point, the Integration Assistant stops suggesting Snaps and the pipeline is ready for execution.

As you can see, the Integration Assistant improves your pipeline building experience by suggesting Snaps that are the best fit for your organization based on the historical metadata flows.

Interested in learning more? Watch a quick demo on our YouTube channel – “SnapLogic Spring 2017: Integration Assistant.”

Namita Prabhu is Senior QA Manager at SnapLogic.

Gaurav Dhillon on the BBC: “Entrepreneurship, Innovation & Passion”

Growing up in India, SnapLogic CEO Gaurav Dhillon was an avid listener of the BBC World Service, the preeminent international radio broadcasting giant based out of London. He wasn’t alone – since 1939, the BBC World Service has ruled the global airwaves, reaching hundreds of millions of people every week with its award-winning 24-hour news, reporting, and analysis.

So it came full circle for Gaurav when, on a trip to London last week, he was invited by the BBC World Service to appear as a guest on its live global news radio program, ‘Business Matters.’ Hosted by Roger Hearing, the hour-long show reviews the day’s global headlines and business news, with commentary throughout from live guests – this time including Gaurav and Marketplace reporter Nancy Marshall-Genzer.

After the day’s news, the discussion turned to Gaurav and his storied career in technology. On the table: Gaurav’s upbringing in India, the early influences that shaped his life’s ambitions, how he helped build Informatica from an idea into a billion-dollar public company, why his burning desire to innovate and build great products led to his departure from Informatica and the founding of SnapLogic, and his advice to aspiring entrepreneurs who want to make their mark and have an impact on our world.

You can listen to the full broadcast here. The focused 12-minute discussion with Gaurav starts around the 26:30 mark.

SnapLogic Live: Demos and Q&A with Our Team of Experts

For the past few months, we’ve been hosting a series of bi-weekly live demo sessions with our team of integration experts at SnapLogic. Each session focuses on one particular type of integration and, via a live demo, shows how customers can solve different integration challenges using the SnapLogic Elastic Integration Platform. So far, we’ve covered everything from app integration to big data integration for customers using Hadoop.

In addition to the demo, we also hold a Q&A session based on questions we receive from customers and attendees.

If you’re interested in ServiceNow integration, register here for next week’s SnapLogic Live session, which will take place on Thursday, October 29th at 10:00am PST / 1:00pm EST. In the meantime, recordings of past sessions are available on our video site. One of our most popular SnapLogic Live demos, featuring Salesforce integration, can also be viewed below:

Webinar: It’s the 21st Century – Why Isn’t Your Data Integration Loosely Coupled?

“The problem with traditional connectors is that they are tightly coupled – any change in the data format or interface requirements for either end of any interaction would require an update of the connector, at the risk of a failed interaction.”

– Jason Bloomberg, President, Intellyx

Join us next Tuesday, May 19th for an interactive webinar with digital transformation and SOA thought leader Jason Bloomberg. We’ll hear from Jason about how connectors have been a staple of enterprise application integration (EAI) since the dawn of EAI in the 1990s, and how the rise of SOA and Web Services was intended in part to resolve the limitations of such traditional connectors but often fell short. Additional topics covered will include:

  • A discussion of the age-old problem of implementing loosely coupled data integration
  • An architectural approach to solving this difficult problem
  • A demonstration of SnapLogic’s approach to solving the data integration challenge in a scalable and cloud-friendly manner that aligns with modern application architectures

Before joining the webinar next week, you can also review last week’s Spring 2015 release and learn a little more about Jason Bloomberg here:

Jason Bloomberg is the leading industry analyst and expert on achieving agile digital transformation by architecting business agility in the enterprise. He writes for Forbes, Wired, and his biweekly newsletter, the Cortex. As president of Intellyx, he advises business executives on their digital transformation initiatives, trains architecture teams on Agile Architecture, and helps technology vendors and service providers communicate their agility stories. His latest book is The Agile Architecture Revolution.

A few of our past blog posts also address some of the topics we’ll be diving into next week. Check them out:

Register for the webinar here – we look forward to next week’s interactive discussion.

Getting a Better Return on your Big Data Investment

What happens when you’re faced with the challenge of maintaining a legacy data warehouse while dealing with ever-increasing volumes, varieties and velocities of data?

While powerhouses in the days of structured data, legacy data warehouses commonly consisted of RDBMS technologies from the likes of Oracle, IBM, Microsoft, and Teradata. Data was extracted, transformed, and loaded into data marts or the enterprise data warehouse with traditional ETL tools built to handle batch-oriented use cases, running on expensive, multi-core servers. In the era of self-service and big data, enterprise IT is rethinking these technologies and approaches.

In response to these challenges, companies are steadfastly moving to modern data management solutions: NoSQL databases like MongoDB, Hadoop distributions from vendors like Cloudera and Hortonworks, cloud-based systems like Amazon Redshift, and data visualization from Tableau and others. Along this big data and cloud data warehouse journey, many people I speak with have realized that it’s vital not only to modernize their data warehouse implementations, but also to future-proof how they collect and drive data into their new analytics infrastructures. That requires an agile, multi-point data integration solution that is seamless and can handle structured and unstructured data, whether streaming real-time or batch-oriented. As companies reposition IT and the analytics infrastructure from back-end cost centers to end-to-end partners of the business, service models become an integral part of both the IT and business roadmaps.

The Flexibility, Power and Agility Required for the New Data Warehouse

In most enterprises today, IT’s focus is moving away from spending valuable resources on undifferentiated heavy lifting and toward delivering differentiating business value. By using SnapLogic to move big data integration management closer to the edge, resources can be freed up for more value-based projects and tasks while streamlining and accelerating the entire data-to-insights process. SnapLogic’s drag-and-drop pipeline builder and streaming integration platform remove the burden and complexity of data ingestion into systems like Hadoop, transforming data integration from a rigid, time-consuming process into one that end users can manage and control.

Productivity Gains with Faster Data Integration

A faster approach to data integration not only boosts productivity but in many cases yields substantive cost savings. In one case, a SnapLogic customer with over 200 integration interfaces, managed and supported by a team of 12, reduced its integration management footprint to less than 2 FTEs, realizing an annual hard cost savings of more than 62%, an 8:1 FTE improvement, infrastructure savings of over 50%, and a 35% improvement in its dev-ops release schedule. The same customer also realized a net productivity gain and increased speed to market by transferring ownership of its Hadoop data ingest process to data scientists and marketers. This shift made the company more responsive while significantly streamlining its data-to-insights process for faster, cheaper, and better decision-making across the enterprise.

A More Agile Company

Increased integration agility means having the ability to make faster, better and cheaper moves and changes. SnapLogic’s modular design allows data scientists and marketers to be light on their feet, making adds, moves and changes in a snap with the assurance they require as new ideas arise and new flavors of data sources enter the picture.

By integrating with Hadoop through the SnapLogic Enterprise Integration Platform, with fast, modern, multi-point data integration, customers can seamlessly connect to and stream data from virtually any endpoint, whether cloud-based, ground-based, legacy, structured, or unstructured. And by simplifying data integration, SnapLogic frees customers from spending valuable IT resources managing and maintaining data pipelines, letting IT contribute in areas of greater business value.

Randy Hamilton is a Silicon Valley entrepreneur and technologist who writes periodically about industry topics including the cloud, big data, and IoT. Randy has been an instructor (Open Distributed Systems) at UC Santa Cruz, has held positions at Basho (Riak NoSQL database), Sun Microsystems, and Outlook Ventures, and was one of the founding members and VP Engineering at Match.com.

Next Steps: