Data management takes center stage at Rutberg 2017 conference

Each year, research-centric investment bank Rutberg & Company gathers top business leaders and technology experts for an intimate, two-day forum where they discuss and debate the technology, ideas, and trends driving global business. The annual Rutberg 2017 conference took place last week in Half Moon Bay, California, and data management was front and center.

SnapLogic CEO Gaurav Dhillon joined Mesosphere CEO Florian Leibert and Segment CEO Peter Reinhardt for a spirited panel discussion on the growing data management opportunities and challenges facing enterprises today. The panel was moderated by Fortune reporter Jonathan Vanian.

A number of important data management and integration trends emerged, including:

  • LOB’s influence grows: Gaurav noted that, more and more, “innovation is coming from the LOB,” whether in Sales, Marketing, Finance, HR, or elsewhere in the organization. These LOB leaders are tech-savvy, responsible for their own P&Ls, and know that speed and agility will determine tomorrow’s winners. So they’re constantly on the hunt for the latest tech solutions that will drive innovation, spur growth, and help them beat the competition.
  • Data fragmentation on the rise: With individual LOBs procuring a flurry of new cloud applications and technologies, the result is often business silos and a disconnected enterprise. “The average enterprise has 10x more SaaS apps than a CIO thinks,” said Gaurav of the increasing SaaS sprawl, which is requiring CIOs to think differently about how they integrate and manage disparate apps and data sources across the enterprise.
  • Self-service integration is here to stay: As a company grows – with more apps, more endpoints, more data types, more fragmentation – there will never be enough humans to manage the required integration in a timely manner, explained Gaurav. Enter new, modern, self-service integration platforms. “The holy grail of integration is self-service and ease-of-use … we have to bring integration out of the dungeon and into the light,” Gaurav continued. That means getting integration into the hands of the LOB, and making it fast and easy. The days of command-and-control by IT are over: “Trying to put the genie back in the bottle is wrong; instead you need to give the LOBs a self-service capability to wire this up on their own,” noted Gaurav.
  • AI will be a game-changer: Artificial intelligence (AI) and machine learning (ML) are already making apps, platforms, and people smarter. As with Google auto-complete or shopping on Amazon, we’re already becoming accustomed to assistance from, and recommendations by, machines. “Software without AI will be like Microsoft Word or email without spell-check,” said Gaurav – it will be jarring not to have it. AI is already being applied to complex tasks like app and data integration; it’s not a future state, he said – “self-driving integration is happening today.”
  • The enterprise is a retrofit job: For all the latest advances – new cloud apps, AI and ML technologies, self-service integration platforms – the enterprise remains a “retrofit job,” where the new must work with the old. Large, global enterprises aren’t about to throw out decades of technology investment all at once, particularly if it is working just fine or well-suited to handle certain business processes. So, new cloud technologies will need to work with older on-premise solutions, once again cementing integration platforms as a critical piece of an enterprise technology strategy. “It will be a hybrid world for a long, long time,” concluded Gaurav.

Without question, data has become any organization’s most valuable asset, and those that are able to integrate, manage, and analyze data effectively will be the winners of tomorrow.

Spring 2017 Release: Self-driving integration, field cryptography, MS Dynamics 365 CRM and more

Today the Spring 2017 release is available, featuring artificial intelligence (AI) technology that promises to dramatically reduce the time and cost of cloud, analytics, and digital transformation initiatives. The first AI-powered feature is SnapLogic Integration Assistant, a recommendation engine that delivers expert step-by-step guidance to improve the speed and quality of building a data pipeline — with up to 90% accuracy. Integration Assistant is available to all of our customers starting today at no charge.

SnapLogic’s Spring 2017 release also introduces new and enhanced software that helps our customers jumpstart their CRM, HR, and cloud data warehouse software projects. Finally, the release has added platform-wide features designed to save time and increase productivity.

New or enhanced capabilities in the Spring 2017 release include:

  • SnapLogic Integration Assistant: The SnapLogic Integration Assistant is a recommendation engine that uses machine learning to predict the next step in building a data pipeline for the cloud, analytics, and digital initiatives – with up to 90% accuracy. It is part of SnapLogic’s “Iris” technology – an industry first in applying artificial intelligence for enterprise integration. See how it works in this video.
  • Microsoft Dynamics CRM integration: A new Snap Pack is available to help users create, read, and update records in the cloud and on-premises versions of Microsoft Dynamics. Users can also delete records by account ID and search with various filter options.
  • Workday integration: The Workday Read Snap now supports page number and page size and delivers significant performance improvements. The Workday Write Snap now supports bulk operations for many objects, cutting operations that previously took minutes or hours down to seconds.
  • Confluent integration: The updated Confluent Acknowledge Snap greatly reduces potential data losses, eliminates duplicates, and externally acknowledges each message.
  • Enhanced Snaps for Amazon Redshift, Anaplan, Tableau, and Apache Hive with Kerberos.
  • Field encryption and decryption: The Transform Snap Pack now includes new Snaps to encrypt and decrypt field values for sensitive data, or entire documents, providing a greater level of data security (a conceptual sketch of field-level encryption follows this list).
  • Platform enhancements: These updates save time and enhance productivity, including Snaplex restart; enhanced Asset Search with Snap labels; network statistics display to aid troubleshooting and performance optimization; and parameterizable accounts that make it easier to automate or dynamically assign environments in a development lifecycle.
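SnapLogic doesn’t publish the internals of these encryption Snaps, so the sketch below is purely illustrative of field-level encryption in general: it assumes the third-party Python cryptography package, a locally generated key, and hypothetical field names, and it is not the Snap Pack’s actual mechanism.

```python
# Conceptual sketch of field-level encryption (not SnapLogic's implementation).
# Assumes the third-party "cryptography" package; field names are hypothetical.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, a key would come from a secrets manager
fernet = Fernet(key)

def encrypt_fields(record: dict, sensitive_fields: list) -> dict:
    """Return a copy of the record with the named fields encrypted."""
    out = dict(record)
    for field in sensitive_fields:
        if field in out:
            out[field] = fernet.encrypt(str(out[field]).encode()).decode()
    return out

def decrypt_fields(record: dict, sensitive_fields: list) -> dict:
    """Reverse encrypt_fields for the named fields."""
    out = dict(record)
    for field in sensitive_fields:
        if field in out:
            out[field] = fernet.decrypt(out[field].encode()).decode()
    return out

doc = {"name": "Jane Doe", "ssn": "123-45-6789", "region": "West"}
protected = encrypt_fields(doc, ["ssn"])       # only the sensitive field is encrypted
restored = decrypt_fields(protected, ["ssn"])  # round-trips back to the original value
```

Encrypting only the sensitive fields, rather than the whole document, lets downstream steps keep filtering and transforming on the remaining fields.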

SnapLogic is committed to continuous innovation, and the features in the Spring 2017 release for the SnapLogic Enterprise Integration Cloud are examples of how we continue to make integrating data warehouses, applications, IoT, and big data streams faster, easier, and more powerful. To learn more, go to www.snaplogic.com/spring2017.

Applying machine learning tools to data integration

By Gregory D. Benson

Few tasks are more personally rewarding than working with brilliant graduate students on research problems that have practical applications. This is exactly what I get to do as both a Professor of Computer Science at the University of San Francisco and as Chief Scientist at SnapLogic. Each semester, SnapLogic sponsors student research and development projects for USF CS project classes, and I am given the freedom to work with these students on new technology and exploratory projects that we believe will eventually impact the SnapLogic Enterprise Integration Cloud Platform. Iris and the Integration Assistant, which apply machine learning to the creation of data integration pipelines, represent one such research project – one that pushes the boundaries of self-service data integration.

For the past seven years, these research projects have provided SnapLogic Labs with bright minds and at the same time given USF students exposure to problems found in real-world commercial software. I have been able to leverage my past 19 years of research and teaching at USF in parallel and distributed computing to help formulate research areas that enable students to bridge their academic experience with problems found in large-scale software that runs in the cloud. Project successes include Predictive Field Linking, the first SnapLogic MapReduce implementation called SnapReduce, and the Document Model for data integration. It is a mutually beneficial relationship.

During the research phase of Labs projects, the students have access to the SnapLogic engineering team, and can ask questions and get feedback. This collaboration allows the students to ramp up quickly with our codebase and gets the engineering team familiar with the students. Once we have prototyped and demonstrated the potential for a research project we transition the code to production. But the relationship doesn’t end there – students who did the research work are usually hired on to help with transitioning the prototype to production code.

The SnapLogic Philosophy
Iris technology was born to help an increasing number of business users design and implement data integration tasks that previously required extensive programming skills. Most companies must manage an increasing number of data sources and cloud applications, as well as ever-growing data volumes, and it is data integration platforms that help businesses connect and transform all of this disparate data. The SnapLogic philosophy has always been to truly provide self-service integration through visual programming. Iris and the Integration Assistant further advance this philosophy by learning from the successes and failures of thousands of pipelines and billions of executions on the SnapLogic platform.

The Project
Two years ago, I led a project that refined our metadata architecture, and last year I proposed a machine learning project for USF students. At the time, I could offer only some vague ideas about what we might achieve. The plan was to spend the first part of the project doing data science on the SnapLogic metadata to see what patterns we could find and where machine learning could usefully be applied.

One of the USF graduate students working on the project, Thanawut “Jump” Anapiriyakul, discovered that we could learn from past pipeline definitions in our metadata to help recommend likely next Snaps during pipeline creation. Jump experimented with several machine learning algorithms to find the ones that give the best recommendation accuracy. We later combined the pipeline definition with Snap execution history to further improve recommendation accuracy. The end result: Pipeline creation is now much faster with the Integration Assistant.
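The post doesn’t spell out which algorithms Jump ultimately settled on, but the core intuition – learn from historical pipeline definitions which Snap tends to follow which – can be illustrated with a toy frequency model. Everything below (the pipeline lists, Snap names, and function names) is hypothetical; it is a sketch of the idea, not the Iris implementation.

```python
# Toy sketch of next-Snap recommendation from historical pipelines.
# Learns "what usually follows what" as simple bigram counts over past pipelines.
from collections import Counter, defaultdict

# Hypothetical training data: each pipeline as an ordered list of Snap names.
historical_pipelines = [
    ["REST Get", "JSON Parser", "Mapper", "Redshift Insert"],
    ["REST Get", "JSON Parser", "Filter", "Mapper", "Redshift Insert"],
    ["File Reader", "CSV Parser", "Mapper", "Tableau Write"],
]

# Count how often each Snap is followed by each other Snap.
transitions = defaultdict(Counter)
for pipeline in historical_pipelines:
    for current_snap, next_snap in zip(pipeline, pipeline[1:]):
        transitions[current_snap][next_snap] += 1

def recommend_next(current_snap, k=3):
    """Return the k Snaps most frequently seen right after current_snap."""
    return [snap for snap, _ in transitions[current_snap].most_common(k)]

print(recommend_next("JSON Parser"))  # e.g. ['Mapper', 'Filter']
```

As the paragraph above notes, the real system also folds in Snap execution history; in a sketch like this, that would amount to weighting the counts toward transitions that actually executed successfully.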

The exciting thing about the Iris technology is that we have created an internal metadata architecture that supports not only the Integration Assistant but also the data science needed to further leverage historical user activity and pipeline executions, powering future applications of machine learning in the SnapLogic Enterprise Integration Cloud. In my view, true self-service in data integration will only be possible through the application of machine learning and artificial intelligence, as we are doing at SnapLogic.

As for the students who work on SnapLogic projects, most are usually offered internships and many eventually become full-time software engineers at SnapLogic. It is very rewarding to continue to work with my students after they graduate. After ceremonies this May at USF, Jump will join SnapLogic full-time this summer, working with the team on extending Iris and its capabilities.

I look forward to writing more about Iris and our recent technology advances in the weeks to come. In the meantime, you can check out my past posts on JSON-centric iPaaS and Hybrid Batch and Streaming Architecture for Data Integration.

Gregory D. Benson is a Professor in the Department of Computer Science at the University of San Francisco and Chief Scientist at SnapLogic. Follow him on Twitter @gregorydbenson.

VIDEO: SnapLogic Discusses Big Data on #theCUBE from Strata+Hadoop World San Jose

It’s Big Data Week here in Silicon Valley with data experts from around the globe convening at Strata+Hadoop World San Jose for a packed week of keynotes, education, networking and more - and SnapLogic was front-and-center for all the action.

SnapLogic stopped by theCUBE, the popular video-interview show that live-streams from top tech events, and joined hosts Jeff Frick and George Gilbert for a spirited and wide-ranging discussion of all things Big Data.

First up was SnapLogic CEO Gaurav Dhillon, who discussed SnapLogic’s record-growth year in 2016, the acceleration of Big Data moving to the cloud, and SnapLogic’s strong momentum working with the AWS Redshift and Microsoft Azure platforms. He also covered the emerging applications and benefits of ML and AI, and how customers are increasingly ditching legacy technology in favor of modern, cloud-first, self-service solutions. You can watch Gaurav’s full video below, and here:

Next up was SnapLogic Chief Enterprise Architect Ravi Dharnikota, together with our customer, Katharine Matsumoto, Data Scientist at eero. A fast-growing Silicon Valley startup, eero makes a smart wireless networking system that intelligently routes data traffic on your wireless network in a way that reduces buffering and gets rid of dead zones in your home. Katharine leads a small data and analytics team and discussed how, with SnapLogic’s self-service cloud integration platform, she’s able to easily connect a myriad of ever-growing apps and systems and make important data accessible to as many as 15 different line-of-business teams, thereby empowering business users and enabling faster business outcomes. The pair also discussed ML and IoT integration, which is helping eero consistently deliver an increasingly smart and powerful product to customers. You can watch Ravi and Katharine’s full video below, and here:

 

7 Big Data Predictions for 2017

As data increasingly becomes the means by which businesses compete, companies are restructuring operations to build systems and processes that liberate data access, integration, and analysis up and down the value chain. Effective data management has become so important that the position of Chief Data Officer is projected to become a standard senior board-level role by 2020, with 92 percent of CIOs stating that a CDO is the best person to determine data strategy.

With this in mind as you evaluate your data strategy for 2017, here are seven predictions to contemplate to build a solid framework for data management and optimization.

  1.  Self-Service Data Integration Will Take Off
    Eschewing the IT bottleneck designation and committed to being a strategic partner to the business, IT is transforming its mindset. Rather than be providers of data, IT will enable users to achieve data optimization on a self-service basis. IT will increasingly decentralize app and data integration – via distributed Centers of Excellence based on shared infrastructure, frameworks and best practices – thereby enabling line-of-business heads to gather, integrate and analyze data themselves to discern and quickly act upon insightful trends and patterns of import to their roles and responsibilities. Rather than fish for your data, IT will teach you how to bait the hook. The payoff for IT: satisfying business user demand for fast and easy integrations and accelerated time to value; preserving data integrity, security and governance on a common infrastructure across the enterprise; and freeing up finite IT resources to focus on other strategic initiatives.
  2. Big Data Moves to the Cloud
    As the year takes shape, expect more enterprises to migrate storage and analysis of their big data from traditional on-premise data stores and warehouses to the cloud. For the better part of the last decade, Hadoop’s distributed computing and processing power has made it the standard open source platform for big data infrastructures. But Hadoop is far from perfect. Common user gripes include complexity and instability – not all that surprising given all the software developers regularly contributing their improvements to the platform. Cloud environments are more stable, flexible, elastic and better-suited to handling big data, hence the predicted migration.
  3. Spark Usage Outside of Hadoop Will Surge
    This is the year we will also see more Spark use cases outside of Hadoop environments. While Hadoop limps along, Spark is picking up the pace. Hadoop is still more likely to be used in testing rather than production environments. But users are finding Spark to be more flexible, adaptable and better suited for certain workloads – machine learning and real-time streaming analytics, as examples. Once relegated to Hadoop sidekick, Spark will break free and stand on its own two feet this year. I’m not alone in asking the question: Hadoop needs Spark but does Spark need Hadoop?
  4. A Big Fish Acquires a Hadoop Distro Vendor?
    Hadoop distribution vendors like Cloudera and Hortonworks paved the way with promising technology and game-changing innovation. But this past year saw growing frustration among customers lamenting increased complexity, instability and, ultimately, too many failed projects that never left the labs. As Hadoop distro vendors work through some growing pains (not to mention limited funds), could it be that a bigger, deeper-pocketed established player – say Teradata, Oracle, Microsoft or IBM – might swoop in to buy their sought-after technology and marry it with a more mature organization? I’m not counting it out.
  5. AI and ML Get a Bit More Mainstream
    Off-the-shelf AI (artificial intelligence) and ML (machine learning) platforms are loved for their simplicity, low barrier to entry, and low cost. In 2017, off-the-shelf AI and ML libraries from Microsoft, Google, Amazon, and other vendors will be embedded in enterprise solutions, including mobile varieties. Tasks that have until now been manual and time-consuming will become automated and accelerated, extending into the world of data integration.

  6. Yes, IoT is Coming, Just Not This Year
    Connecting billions and billions of sensor-embedded devices and objects over the internet is inevitable, but don’t yet swallow all the hype. Yes, there is a lot being done to harness IoT for specific aims, but the pace toward a general-purpose IoT platform is closer to a canter than a gallop. Today’s IoT solutions are too bespoke and purpose-built to solve broad, commonplace problems – the market is still nascent, with standards gradually evolving – so a general-purpose, mass-adopted IoT platform to collect, integrate, and report on data in real time will take, well, more time. Like any other transformation movement in the history of enterprise technology, brilliant bits and pieces need to come together as a whole. It’s coming, just not in 2017.

  7. APIs Are Not All They’re Cracked Up to Be
    APIs have long been the glue connecting apps and services, but customers will continue to question their value vs investment in 2017. Few would dispute that APIs are useful in building apps and, in many cases, may be the right choice in this regard. But in situations where the integration of apps and/or data is needed and sought, there are better ways. Case in point is iPaaS (integration platform as a service), which allows you to quickly and easily connect any combination of cloud and on-premise technologies. Expect greater migration this year toward cloud-based enterprise integration platforms – compared to APIs, iPaaS solutions are more agile, better equipped to handle the vagaries of data, more adaptable to changes, easier to maintain and far more productive.

I could go on and on, if for no other reason than that predictions are informed “best guesses” about the future. If I’m wrong on two or three of my expectations, my peers will forgive me. In the rapidly changing world of technology, batting .400 is a pretty good statistic.

Enterprise IoT: Watching Cat Videos Without Getting Caught (or, How I Learned to Stop Looking Over My Shoulder and Trust the CEO Proximity Alert)

We have a slight problem at SnapLogic. While we spend a vanishingly small percent of the day watching adorable cat videos on the Internet, it seems our CEO always shows up behind our desks while doing so. If only we knew when our CEO was nearby and could get an alert when he was.

Continue reading “Enterprise IoT: Watching Cat Videos Without Getting Caught (or, How I Learned to Stop Looking Over My Shoulder and Trust the CEO Proximity Alert)”

Machine Learning for the Enterprise, Part 3: Building the Pipeline

In the last post we went into some detail about anomaly detectors, and showed how some simple models would work. Now we are going to build a pipeline to do streaming anomaly detection.
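As a refresher, a streaming detector in the same simple spirit can be sketched in a few lines; the rolling z-score model below is only an illustration, and its window size, warm-up length, and threshold are arbitrary choices rather than values from the previous post.

```python
# Illustrative streaming anomaly detector: a rolling z-score over recent values.
# Window size, warm-up length, and threshold are arbitrary, for demonstration only.
from collections import deque
import math

class RollingZScoreDetector:
    def __init__(self, window=100, threshold=3.0):
        self.values = deque(maxlen=window)   # recent history of readings
        self.threshold = threshold           # how many std devs counts as anomalous

    def is_anomaly(self, x):
        """Flag x if it lies more than `threshold` std devs from the rolling mean."""
        anomalous = False
        if len(self.values) >= 5:            # wait for a small warm-up history
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(x - mean) / std > self.threshold
        self.values.append(x)
        return anomalous

detector = RollingZScoreDetector()
for reading in [10.1, 10.3, 9.8, 10.0, 10.2, 55.0]:
    print(reading, detector.is_anomaly(reading))   # only 55.0 should be flagged
```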

We are going to use a triggered pipeline for this task. A triggered pipeline is instantiated whenever a request comes in. The instantiation can take a couple of seconds, so it is not recommended for low-latency or high-traffic situations. If we’re getting data more frequently than every few seconds, or want lower latency, we should use an Ultra pipeline. An Ultra pipeline stays running, so the input-to-output latency is significantly less.
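To make the triggered style concrete, invoking such a pipeline amounts to an HTTPS request against its task URL. The sketch below is hypothetical: the URL, token, and response shape are placeholders, and the real values would come from your own SnapLogic environment.

```python
# Hypothetical invocation of a triggered pipeline over HTTPS.
# The URL and token are placeholders; the response shape depends on the pipeline.
import requests

TRIGGER_URL = "https://example.invalid/trigger/anomaly_pipeline_task"  # placeholder
TOKEN = "replace-with-real-token"                                      # placeholder

reading = {"sensor_id": "pump-7", "value": 55.0, "timestamp": "2017-03-01T12:00:00Z"}

response = requests.post(
    TRIGGER_URL,
    json=[reading],                              # send the document(s) as JSON
    headers={"Authorization": "Bearer " + TOKEN},
    timeout=30,                                  # allow for the few seconds of instantiation
)
response.raise_for_status()
print(response.json())                           # e.g. a document carrying an anomaly flag
```

Because each request pays the instantiation cost, this style suits occasional or batched readings; an Ultra pipeline avoids that per-request startup delay, which is why it is the better fit for a steady stream.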

For the purpose of this post, we’re going to assume we have an Anomaly-Detector-as-a-Service Snap.  In the next post, we’ll show how to create that Snap using Azure ML. Our pipeline will look like this:

[Figure: Final pipeline]

Continue reading “Machine Learning for the Enterprise, Part 3: Building the Pipeline”