Data integration: The key to operationalizing innovation

By Craig Stewart

It’s not just a tongue twister. Operationalizing innovation has proven to be one of the most elusive management objectives of the new millennium. Consider this sound bite from an executive who’d just participated in an innovation conference in 2005:

The real discussion at the meeting was about … how to operationalize innovation. All roads of discussion led back to that place. How do you make your company into a systemic innovator? There is no common denominator out there, no shared understanding on how to do that.[1]

The good news is that, in the 12 years since, cloud computing has exploded, and a common denominator clearly emerged: data. Specifically, putting the power of data – big data, enterprise data, and data from external sources – and analytics into users’ hands. More good news: An entirely new class of big data analysis tools[2] has emerged that allows business users to become “citizen data analysts.”

The bad news: There hasn’t been a fast, easy way to perform the necessary integrations between data sources in the cloud – an essential first step and the foundation of citizen data analytics, today’s hottest source of innovation.

Until now.

The SnapLogic Enterprise Integration Cloud is a mature, full-featured Integration Platform-as-a-Service (iPaaS) built in the cloud, for the cloud. Through its visual, automated approach to integration, the SnapLogic Enterprise Integration Cloud uniquely empowers both business and IT users, accelerating analytics initiatives on Amazon Redshift and other cloud data warehouses.

Unlike on-premises ETL or immature cloud tools, SnapLogic combines ease of use, streaming scalability, on-premises and cloud integration, and managed connectors called Snaps. Together, these capabilities present a 10x improvement over legacy ETL solutions like Informatica or other “cloud-washed” solutions originally designed for on-premises use, accelerating integrations from months to days.

By enabling “citizen integrators” to more quickly build, deploy and efficiently manage multiple high-volume, data-intensive integration projects, SnapLogic uniquely delivers:

  • Ease of use for business and IT users through a graphical approach to integration
  • A solution built for scale, offering bulk data movement and streaming data integration
  • Ideal capabilities for hybrid environments, with over 400 Snaps to handle relational, document, unstructured, and legacy data sources
  • Cloud data warehouse-readiness with native support for Amazon Redshift and other popular cloud data warehouses
  • Built-in data governance* by synchronizing data in Redshift at any time interval desired, from real-time to overnight batch.

* Why data governance matters

Analytics performed on top of incorrect data yield incorrect results – a detriment, certainly, in the quest to operationalize innovation. Data governance is a significant topic, and a major concern of IT organizations charged with maintaining the consistency of data routinely accessed by citizen data scientist and citizen integrator populations. Gartner estimates that only 10% of self-service BI initiatives are governed[3] to prevent inconsistencies that adversely affect the business.

Data discovery initiatives using desktop analytics tools risk creating inconsistent silos of data. Cloud data warehouses afford increased governance and data centralization. SnapLogic helps to ensure strong data governance by replicating source tables into Redshift clusters, where the data can be periodically synchronized at any time interval desired, from real-time to overnight batch. In this way, data drift is eliminated, allowing all users who access data, whether in Redshift or other enterprise systems, to be confident in its accuracy.
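To make that synchronization pattern concrete, here is a minimal sketch – not SnapLogic’s implementation – of a scheduled refresh of one replicated table in Redshift, assuming the source extract has already landed in S3. The connection string, table, bucket, and IAM role names are hypothetical placeholders.

```python
# Minimal sketch of a periodic Redshift table refresh (placeholder names throughout).
import psycopg2

REDSHIFT_DSN = "host=<cluster-endpoint> port=5439 dbname=analytics user=etl password=<secret>"
COPY_ROLE = "arn:aws:iam::<account-id>:role/<redshift-copy-role>"

def refresh_table(table: str, s3_prefix: str) -> None:
    """Reload a replicated source table from its latest S3 extract."""
    with psycopg2.connect(REDSHIFT_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(f"TRUNCATE {table};")
            cur.execute(
                f"COPY {table} FROM '{s3_prefix}' "
                f"IAM_ROLE '{COPY_ROLE}' FORMAT AS CSV IGNOREHEADER 1;"
            )

# Run on whatever schedule the business needs, from every few minutes to nightly.
refresh_table("crm.accounts", "s3://<extract-bucket>/crm/accounts/")
```

Scheduled at the desired interval, a reload like this keeps the Redshift copy aligned with its source, so downstream analysts are not working from stale or divergent data.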

To find out more about how SnapLogic empowers citizen data scientists, and how a global pharmaceutical company is using SnapLogic to operationalize innovation, get the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”

Craig Stewart is Vice President, Product Management at SnapLogic.

[1] “Operationalizing Innovation–THE hot topic,” Bruce Nussbaum, Bloomberg, September 28, 2005. https://www.bloomberg.com/news/articles/2005-09-28/operationalizing-innovation-the-hot-topic

[2] “The 18 Best Analytics Tools Every Business Manager Should Know,” Bernard Marr, Forbes, February 4, 2016. https://www.forbes.com/sites/bernardmarr/2016/02/04/the-18-best-analytics-tools-every-business-manager-should-know/#825e6115d397

[3] “Predicts 2017: Analytics Strategy and Technology,” Kurt Schlegel, et al., Gartner, November 30, 2016. ID: G00316349

How to get valuable insights on data stored in Azure Data Lake Store

In a previous blog post, I discussed major trends in the data integration space, including customers moving from on-premises systems to the cloud. Here I’d like to focus on one of those trends: moving data from on-premises or cloud data analytics platforms to a Data Lake technology such as Azure Data Lake.

What is a Data Lake?

The Data Lake is a term coined for storing large amounts of data in its raw, native form – structured and unstructured – in one location. This data can come from various sources, and the Data Lake can act as a single source of truth for an organization. From the architecture standpoint, data is first landed in a data acquisition zone (sometimes called the data swamp), then cleansed and transformed in a data transformation zone, and later published to gain business insights.

[Diagram: Data Lake architecture]

As seen in the diagram above, enterprises have multiple systems such as ERP, CRM, RDBMS, NoSQL, IoT sensors, and more. Because this data is scattered across disparate systems, it is difficult to pull together. A Data Lake brings all the data under one roof (data acquisition) using one of the following services:

  • Azure Blob
  • Azure Data Lake Store
  • Amazon S3
  • HDFS
  • Others

Data stored in one of these services can then be transformed in the following ways (a brief sketch follows the list):

  • Aggregate
  • Sort
  • Join
  • Merge
  • Other
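For illustration only, here is a brief sketch of that transformation stage using PySpark, one common engine for this step; the store name, paths, and column names are hypothetical.

```python
# Illustrative PySpark transformations over raw Data Lake files (hypothetical paths/columns).
# Assumes the cluster (e.g., HDInsight) is already configured with access to the store.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-transform").getOrCreate()

orders = spark.read.json("adl://<store>.azuredatalakestore.net/raw/orders/")
customers = spark.read.csv("adl://<store>.azuredatalakestore.net/raw/customers/",
                           header=True, inferSchema=True)

# Join, aggregate, and sort before publishing to the data access zone.
daily_revenue = (orders.join(customers, on="customer_id", how="inner")
                 .groupBy("order_date", "region")
                 .agg(F.sum("amount").alias("revenue"))
                 .orderBy("order_date"))

daily_revenue.write.mode("overwrite").parquet(
    "adl://<store>.azuredatalakestore.net/publish/daily_revenue/")
```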

The transformed data is then moved to the data publish/data access layer (which could use the same services as data acquisition), where users can utilize the following tools to query the data (see the example after this list):

  • Microsoft’s U-SQL
  • Amazon Athena
  • Hive
  • Presto
  • Others
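As a small example of this query step, the sketch below submits a SQL query to Amazon Athena with boto3; the database, table, and results bucket are hypothetical placeholders.

```python
# Query data published to S3 using Amazon Athena (hypothetical database/table/bucket).
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString=(
        "SELECT region, SUM(revenue) AS revenue "
        "FROM daily_revenue GROUP BY region ORDER BY revenue DESC"
    ),
    QueryExecutionContext={"Database": "analytics_lake"},
    ResultConfiguration={"OutputLocation": "s3://<query-results-bucket>/athena/"},
)
print("Query started:", response["QueryExecutionId"])
```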

The bottom line is that a Data Lake can serve as a platform to run analytics in order to provide better customer experience, recommendations, and more.

Azure Data Lake is one such Data Lake from Microsoft, and the repository used to store all the data is Azure Data Lake Store. Users can run Azure Data Lake Analytics or HDInsight on top of this data store, or use U-SQL – a big data query language – to gain better business insights.

[Diagram: Azure Data Lake Store (ADLS). Source: Microsoft]

Azure Data Lake Store (ADLS) can store any data in its native format. One of the goals of this data store is to bring together data from disparate sources. The SnapLogic Enterprise Integration Cloud, with its pre-built connectors called Snaps, helps by moving data from different systems into the data store quickly.

ADLS provides a complex API that applications use to store data. SnapLogic has abstracted these complexities via Snaps, so users can easily move data from various systems into ADLS without needing to know anything about the underlying APIs.
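For a sense of what the Snaps abstract away, here is a minimal sketch that writes a file to ADLS directly with the azure-datalake-store Python SDK (ADLS Gen1); the tenant, client, store, and path values are placeholders.

```python
# Upload a local extract into ADLS using the azure-datalake-store SDK (placeholder values).
from azure.datalake.store import core, lib, multithread

# Authenticate with a service principal.
token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<client-id>",
                 client_secret="<client-secret>")

# Connect to the ADLS account and upload a local file into the raw zone.
adls = core.AzureDLFileSystem(token, store_name="<adls-account-name>")
multithread.ADLUploader(adls,
                        lpath="orders.csv",
                        rpath="/raw/orders/orders.csv",
                        nthreads=4,
                        overwrite=True)

# Confirm what landed in the raw zone.
print(adls.ls("/raw/orders"))
```

Every source system adds its own authentication, formats, and retry behavior on top of this, which is exactly the plumbing a pre-built Snap hides from the user.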

Use case

A business needs to track and analyze content to better recommend products or services to its customers. Its data – from various sources such as Oracle, files, Twitter, etc. – needs to be stored in a central repository such as ADLS so that business users can run analytics on top to measure customer buying behavior, their interests, and products purchased.

Here’s a sample pipeline that can address this use case using Snaps:

Using the File Writer Snap and choosing the Azure Data Lake account as shown below, one can store the data merged from various systems into Azure Data Lake with ease.

All in all, the Data Lake can serve as a one-stop shop for storing any data, giving users more ways to derive insights from multiple data sources. And SnapLogic makes it quick and easy for users to move their data into the Data Lake – in this case, an Azure Data Lake Store.

Pavan Venkatesh is Senior Product Manager at SnapLogic. Follow him on Twitter @pavankv.

Will the Cloud Save Big Data?

This article was originally published on ITProPortal.

Employees up and down the value chain are eager to dive into big data solutions, hunting for golden nuggets of intelligence to help them make smarter decisions, grow customer relationships and improve business efficiency. To do this, they’ve been faced with a dizzying array of technologies – from open source projects to commercial software products – as they try to wrestle big data to the ground.

Today, a lot of the headlines and momentum center on some combination of Hadoop, Spark and Redshift – all of which can be springboards for big data work. It’s important to step back, though, and look at where we are in big data’s evolution.

In many ways, big data is in the midst of transition. Hadoop is hitting its pre-teen years, having launched in April 2006 as an official Apache project – and then taking the software world by storm as a framework for distributed storage and processing of data, based on commodity hardware. Apache Spark is now hitting its stride as a “lightning fast” streaming engine for large-scale data processing. And various cloud data warehousing and analytics platforms are emerging, from big names (Amazon Redshift, Microsoft Azure HDInsight and Google BigQuery) to upstart players like Snowflake, Qubole and Confluent.

The challenge is that most big data progress over the past decade has been limited to big companies with big engineering and data science teams. The systems are often complex, immature, hard to manage and change frequently – which might be fine if you’re in Silicon Valley, but doesn’t play well in the rest of the world. What if you’re a consumer goods company like Clorox, or a midsize bank in the Midwest, or a large telco in Australia? Can this be done without deploying 100 Java engineers who know the technology inside and out?

At the end of the day, most companies just want better data and faster answers – they don’t want the technology headaches that come along with it. Fortunately, the “mega trend” of big data is now colliding with another mega trend: cloud computing. While Hadoop and other big data platforms have been maturing slowly, the cloud ecosystem has been maturing more quickly – and the cloud can now help fix a lot of what has hindered big data’s progress.

The problems customers have encountered with on-premises Hadoop are often the same problems that were faced with on-premises legacy systems: there simply aren’t enough of the right people to get everything done. Companies want cutting-edge capabilities, but they don’t want to deal with bugs, broken integrations and rapidly changing versions. Plus, consumption models are changing – we want to consume data, storage and compute on demand. We don’t want to overbuy. We want access to infrastructure when and how we want it, with just as much as we need and no more.

Big Data’s Tipping Point is in the Cloud

In short, the tipping point for big data is about to happen – and it will happen via the cloud. The first wave of “big data via the cloud” was simple: companies like Cloudera put their software on Amazon. But what’s “truly cloud” is not having to manage Hadoop or Spark – moving the complexity back into a hosted infrastructure, so someone else manages it for you. To that end, Amazon, Microsoft and Google now deliver “managed Hadoop” and “managed Spark” – you just worry about the data you have, the questions you have and the answers you want. No need to spin up a cluster, research new products or worry about version management. Just load your data and start processing.
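As one illustration of “just load your data and start processing,” the sketch below asks AWS for a managed Spark cluster and hands it a single job via boto3 and Amazon EMR; the bucket, script, release label, and instance choices are hypothetical.

```python
# Launch a transient managed-Spark (EMR) cluster that runs one job and shuts down.
# All names, sizes, and the script location are hypothetical.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

cluster = emr.run_job_flow(
    Name="managed-spark-demo",
    ReleaseLabel="emr-5.12.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": False,   # terminate when the step finishes
    },
    Steps=[{
        "Name": "process-events",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://<code-bucket>/jobs/process_events.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster requested:", cluster["JobFlowId"])
```

There is no Hadoop installation, patching, or version management on your side; the provider runs the cluster and you pay only while the job executes.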

There are three significant and not always obvious benefits to managing big data via the cloud: 1) Predictability – the infrastructure and management burden shifts to cloud providers, and you simply consume services that you can scale up or down as needed; 2) Economics – unlike on-premises Hadoop, where compute and storage were intermingled, the cloud separates compute and storage so you can provision accordingly and benefit from commodity economics; and 3) Innovation – new software, infrastructure and best practices will be deployed continuously by cloud providers, so you can take full advantage without all the upfront time and cost.

Of course, there’s still plenty of hard work to do, but it’s more focused on the data and the business, and not the infrastructure. The great news for mainstream customers (well beyond Silicon Valley) is that another mega-trend is kicking in to revolutionize data integration and data consumption – and that’s the move to self-service. Thanks to new tools and platforms, “self-service integration” is making it fast and easy to create automated data pipelines with no coding, and “self-service analytics” is making it easy for analysts and business users to manipulate data without IT intervention.

All told, these trends are driving a democratization of data that’s very exciting – and will drive significant impact across horizontal functions and vertical industries. Data is thus becoming a more fluid, dynamic and accessible resource for all organizations. IT no longer holds the keys to the kingdom – and developers no longer control the workflow. Just in the nick of time, too, as the volume and velocity of data from digital and social media, mobile tools and edge devices threaten to overwhelm us all. Once the full promise of the Internet of Things, Artificial Intelligence and Machine Learning begins to take hold, the data overflow will be truly inundating.

The only remaining question: What do you want to do with your data?

Ravi Dharnikota is the Chief Enterprise Architect at SnapLogic. 

VIDEO: SnapLogic Discusses Big Data on #theCUBE from Strata+Hadoop World San Jose

It’s Big Data Week here in Silicon Valley with data experts from around the globe convening at Strata+Hadoop World San Jose for a packed week of keynotes, education, networking and more – and SnapLogic was front and center for all the action.

SnapLogic stopped by theCUBE, the popular video-interview show that live-streams from top tech events, and joined hosts Jeff Frick and George Gilbert for a spirited and wide-ranging discussion of all things Big Data.

First up was SnapLogic CEO Gaurav Dhillon, who discussed SnapLogic’s record-growth year in 2016, the acceleration of Big Data moving to the cloud, SnapLogic’s strong momentum working with AWS Redshift and Microsoft Azure platforms, the emerging applications and benefits of ML and AI, customers increasingly ditching legacy technology in favor of modern, cloud-first, self-service solutions, and more. You can watch Gaurav’s full video below, and here:

Next up was SnapLogic Chief Enterprise Architect Ravi Dharnikota, together with our customer, Katharine Matsumoto, Data Scientist at eero. A fast-growing Silicon Valley startup, eero makes a smart wireless networking system that intelligently routes data traffic on your wireless network in a way that reduces buffering and gets rid of dead zones in your home. Katharine leads a small data and analytics team and discussed how, with SnapLogic’s self-service cloud integration platform, she’s able to easily connect a myriad of ever-growing apps and systems and make important data accessible to as many as 15 different line-of-business teams, thereby empowering business users and enabling faster business outcomes. The pair also discussed ML and IoT integration, which is helping eero consistently deliver an increasingly smart and powerful product to customers. You can watch Ravi and Katharine’s full video below, and here:

 

Finally viable: Best-of-breed enterprise environments

It’s one of the oldest, most contentious rivalries in the enterprise application arena: What’s better, best-of-breed environments or single-vendor suites? Since the turn of the century, suite vendors have argued that their approach avoids the steep data integration challenges that can be inherent with best-of-breed. On the flip side, point solution vendors say that enterprise suites pack in a lot of “dead wood” but don’t offer the real functionality, or customization potential, that is needed.

However, unlike religion and politics, this is one argument that is headed toward extinction. The biggest barrier to best-of-breed strategies — data integration — is, hands down, easier by an order of magnitude today, thanks to built-for-the-cloud app integration solutions that eliminate previous barriers. As a result, best-of-breed application environments aren’t just viable, they’re readily attainable.

Two dimensions of data integration

There are two ways in which data integration has dramatically improved with native cloud solutions: on the back end, between the applications themselves, and on the front end, from the user experience perspective.

On the back end, one of the first-order implications of a robust data model is the number of connectors a data integration solution provides. SnapLogic has hundreds of Snaps (connectors), and that’s not coincidental. Our library of Snaps proves that suitability: it’s an order of magnitude easier to build and support a SnapLogic connector than an Informatica connector – the integration tool of choice for last-century best-of-breed environments – because our data model fits the modern world.

As a result, customers are up and running with SnapLogic in a day or two. In minutes, we can show customers what SnapLogic is capable of doing. By comparison, with Informatica and other legacy integration technologies, developers or consultants can work for weeks or months on the same integration project and still have nothing to show for it. They can’t deliver quickly because of the limitations of the underlying technology.

The ease of big data integration with SnapLogic has profound implications for the user experience. Instead of having to beg analysts to run ETL (extract, transform, and load) jobs to pull the data set they need, SnapLogic users can get whatever data they want, themselves. They can then analyze it and get answers far faster than under previous best-of-breed regimes.

These are not subtle differences.

The economics of cloud-based integration

The subscription-based pricing model of cloud-based integration services further democratizes data access. Instead of putting the burden on IT to buy and implement an integrated application suite — which can cost upwards of $100 million in a large enterprise — cloud-based integration technology can be acquired at a nominal per-user fee, charged to a corporate credit card. Lines of business have taken advantage of this ease of access, making their own cloud big data technology moves with the full knowledge and support of IT.

For IT organizations that have embraced their new mission of enablement, the appeal of cloud-based data integration is clear. In addition to allowing business users to work the way they want to, the cloud-based solution is infinitely easier to customize, deploy, and support globally. And it offers an obvious answer to the question, “Do I want to continue feeling the pain of using integrated app suites, or do I want to join the new century?”

Find out more about how and why SnapLogic puts best-of-breed integration within every organization’s grasp. Register for this upcoming webinar featuring a conversation between me, industry analyst and data integration expert David Linthicum, and Gaurav Dhillon, SnapLogic’s CEO and a fellow Informatica alumnus: “We left Informatica. Now you can, too.”


James Markarian is CTO at SnapLogic. Follow him on Twitter @jamesmarkarian.

James Markarian: Was the Election a Referendum on Predictive Analytics?

In his decades working in the data and analytics industry, SnapLogic CTO James Markarian has witnessed few mainstream events that have sparked as much discussion and elicited as many questions – around the value and accuracy of predictive analytics tools – as our recent election.

In a new blog post on Forbes, James examines where the nation’s top pollsters (who across the board predicted a different election outcome) possibly went wrong, why some predictions succeed and others fail, what businesses who have invested in data analytics can learn from the election, and how new technologies such as integration platform as a service (iPaaS) can help them make sense of all their data to make better predictions.

Be sure to read James’s blog, titled “What The Election Taught Us About Predictive Analytics”, on Forbes here.

7 Big Data Predictions for 2017

As data increasingly becomes the means by which businesses compete, companies are restructuring operations to build systems and processes that liberate data access, integration and analysis up and down the value chain. Effective data management has become so important that the position of Chief Data Officer is projected to become a standard senior board-level role by 2020, with 92 percent of CIOs stating that a CDO is the best person to determine data strategy.

With this in mind as you evaluate your data strategy for 2017, here are seven predictions to contemplate as you build a solid framework for data management and optimization.

  1.  Self-Service Data Integration Will Take Off
    Eschewing the IT bottleneck designation and committed to being a strategic partner to the business, IT is transforming its mindset. Rather than be providers of data, IT will enable users to achieve data optimization on a self-service basis. IT will increasingly decentralize app and data integration – via distributed Centers of Excellence based on shared infrastructure, frameworks and best practices – thereby enabling line-of-business heads to gather, integrate and analyze data themselves to discern and quickly act upon insightful trends and patterns of import to their roles and responsibilities. Rather than fish for your data, IT will teach you how to bait the hook. The payoff for IT: satisfying business user demand for fast and easy integrations and accelerated time to value; preserving data integrity, security and governance on a common infrastructure across the enterprise; and freeing up finite IT resources to focus on other strategic initiatives.
  2. Big Data Moves to the Cloud
    As the year takes shape, expect more enterprises to migrate storage and analysis of their big data from traditional on-premise data stores and warehouses to the cloud. For the better part of the last decade, Hadoop’s distributed computing and processing power has made it the standard open source platform for big data infrastructures. But Hadoop is far from perfect. Common user gripes include complexity and instability – not all that surprising given all the software developers regularly contributing their improvements to the platform. Cloud environments are more stable, flexible, elastic and better-suited to handling big data, hence the predicted migration.
  3. Spark Usage Outside of Hadoop Will Surge
    This is the year we will also see more Spark use cases outside of Hadoop environments. While Hadoop limps along, Spark is picking up the pace. Hadoop is still more likely to be used in testing rather than production environments. But users are finding Spark to be more flexible, adaptable and better suited for certain workloads – machine learning and real-time streaming analytics, as examples. Once relegated to Hadoop sidekick, Spark will break free and stand on its own two feet this year. I’m not alone in asking the question: Hadoop needs Spark but does Spark need Hadoop?
  4. A Big Fish Acquires a Hadoop Distro Vendor?
    Hadoop distribution vendors like Cloudera and Hortonworks paved the way with promising technology and game-changing innovation. But this past year saw growing frustration among customers lamenting increased complexity, instability and, ultimately, too many failed projects that never left the labs. As Hadoop distro vendors work through some growing pains (not to mention limited funds), could it be that a bigger, deeper-pocketed established player – say Teradata, Oracle, Microsoft or IBM – might swoop in to buy their sought after technology and marry it with a more mature organization? I’m not counting it out.
  5. AI and ML Get a Bit More Mainstream
    Off the shelf AI (artificial intelligence) and ML (machine learning) platforms are loved for their simplicity, low barrier to entry and low cost. In 2017, off the shelf AI and ML libraries from Microsoft, Google, Amazon and other vendors will be embedded in enterprise solutions, including mobile varieties. Tasks that have until now been manual and time-consuming will become automated and accelerated, extending into the world of data integration.

  6. Yes, IoT is Coming, Just Not This Year
    Connecting billions and billions of sensor-embedded devices and objects over the internet is inevitable, but don’t yet swallow all the hype. Yes, there is a lot being done to harness IoT for specific aims, but the pace toward the development of a general-purpose IoT platform is closer to a canter than a gallop. Today’s IoT solutions are too bespoke and purpose-built to solve broad, commonplace problems – the market is still nascent, with standards gradually evolving – so a general-purpose, mass-adopted IoT platform to collect, integrate and report on data in real time will take, well, more time. Like any other transformation movement in the history of enterprise technology, brilliant bits and pieces need to come together as a whole. It’s coming, just not in 2017.

  7. APIs Are Not All They’re Cracked Up to Be
    APIs have long been the glue connecting apps and services, but customers will continue to question their value vs investment in 2017. Few would dispute that APIs are useful in building apps and, in many cases, may be the right choice in this regard. But in situations where the integration of apps and/or data is needed and sought, there are better ways. Case in point is iPaaS (integration platform as a service), which allows you to quickly and easily connect any combination of cloud and on-premise technologies. Expect greater migration this year toward cloud-based enterprise integration platforms – compared to APIs, iPaaS solutions are more agile, better equipped to handle the vagaries of data, more adaptable to changes, easier to maintain and far more productive.

I could go on and on, if for no other reason than that predictions are informed “best guesses” about the future. If I’m wrong on two or three of my expectations, my peers will forgive me. In the rapidly changing world of technology, batting .400 is a pretty good average.