Practical Machine Learning for the Enterprise, Part I

“Speak English!” said the Eaglet. “I don’t know the meaning of half those long words, and, what’s more, I don’t believe you do either!” – Alice in Wonderland

Machine learning (and its subset, deep learning) has been hailed as The Next Big Thing: credited with making autonomous cars possible and upending business models, and widely seen as demanding a massive investment in human and financial capital from any business that wants to stay competitive. The hype has drowned out the ‘how’ and particularly the ‘why’. While we at SnapLogic are bullish on the promise of machine learning (ML) in the enterprise, we think the first question is not “how do I implement it?” but “what is it I want to know?”

A Crash Course in ML

At their core, many ML algorithms rest on something you may have done in high school: drawing a line through a bunch of points. In fact, if you’ve ever run a regression in Excel, you’ve done a simple form of machine learning. So what’s the big deal now?
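“Running a regression” in a spreadsheet usually means ordinary least squares: choosing the slope and intercept that minimize the squared vertical distances between the line and the points. A minimal sketch in plain Python (the function name fit_line and the sample points are illustrative, not taken from any library):

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance of x and y, and unnormalized variance of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying roughly on y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # prints: y = 1.97x + 1.09
```

The “learning” here is just estimating two numbers from data; many larger models follow the same pattern with far more parameters and far more data.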

Basically, data volume and the compute power to process it. Regression is an example of supervised learning, which is a formal way of saying that you already know the correct answers for your data and want to see how well a model can learn to predict them. (Or, as a data scientist would say, you have “labeled data.”) The other major class of problem is “unsupervised learning,” where you have a mass of data with no answers attached (“unlabeled data”) and the hope that an algorithm can find some structure in it. Since you probably have a lot more unlabeled data than labeled, this can require a lot of computational power to process.
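Clustering is one common kind of unsupervised learning: the algorithm is given only points, no answers, and groups them on its own. A minimal k-means sketch (Lloyd’s algorithm) in plain Python; the kmeans and squared_distance functions, their parameters, and the sample points are illustrative assumptions, and k-means is only one of many unsupervised techniques:

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kmeans(points, k, iterations=20, seed=0):
    """Group unlabeled 2-D points into k clusters (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, clusters

# Hypothetical unlabeled data: no answers attached, just points
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # two centers near (1.5, 1.5) and (8.5, 8.5), in some order
```

A fixed iteration count keeps the sketch short; a fuller version would stop once the assignments stop changing, and on real enterprise data each pass touches every record, which is where the computational cost comes from.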

One other important category to note is anomaly detection (AD). There are many ways to implement AD, but the basic idea is: given a bunch of data, identify the points that look ‘wrong’, meaning they don’t fit the pattern of the rest. If you have a credit card, you’ve probably at some point gotten a call from the issuing company because their AD algorithm flagged a transaction as potentially fraudulent. Besides fraud detection, AD is useful for quality control, predictive maintenance, and security, among other applications.
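One of the simplest ways to implement AD is a z-score rule: flag any value that sits unusually many standard deviations from the mean. A minimal sketch using Python’s standard statistics module; the flag_anomalies function, the threshold, and the transaction amounts are hypothetical, and production fraud systems typically use richer models and many more features:

```python
import statistics

def flag_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

# Hypothetical card transactions in dollars; one is far outside the usual range
amounts = [42.0, 18.5, 60.0, 35.25, 27.0, 51.0, 22.75, 1450.0, 39.0, 44.5]
print(flag_anomalies(amounts, threshold=2.5))  # prints: [7]
```

The lower threshold is deliberate: with the population standard deviation, no single value in a sample of n can have a z-score above sqrt(n - 1), which is exactly 3 for ten values, so the outlier itself inflates the spread it is measured against. Robust alternatives, such as distances from the median, avoid that effect.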

Actually Using Machine Learning in the Enterprise

If your organization has data scientists, ask them where they spend the majority of their time. A New York Times article claimed that “50 percent to 80 percent of their time [is spent] mired in [the] mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.” Some organizations hire “Big Data Engineers” to offload this data plumbing work from their data scientists. Either way, if those figures hold, somewhere between one and four times as many person-hours go into getting your data into a usable form as go into actually extracting value from it.
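The ratio behind the quoted range is simple arithmetic: if a fraction p of the time goes to data preparation, the remaining 1 - p goes to everything else, so preparation takes p / (1 - p) times as many hours. A quick check in Python (the function name is illustrative):

```python
def prep_to_analysis_ratio(prep_fraction):
    """Hours spent preparing data per hour spent on everything else."""
    return prep_fraction / (1 - prep_fraction)

# The two ends of the range quoted from the New York Times article
for p in (0.5, 0.8):
    print(f"{p:.0%} of time on prep -> {prep_to_analysis_ratio(p):.1f}x")
```

At 50 percent the two kinds of work are equal (1.0x); at 80 percent preparation outweighs everything else four to one (4.0x).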

At SnapLogic, data plumbing, data munging, data janitorial work, data transformation – whatever you want to call it – is our specialty. Recall that the modern interest in machine learning is driven by the volume of data available and the computing power to process it. As we recently said, “SnapLogic brings all your data together, at incredible speeds and with an ease never known before. Data, applications, and APIs—from any source, anywhere.”

Getting to Yes

This was the first part of a multi-part series on machine learning in the enterprise. Future posts will cover what we should ask of ML, what data and infrastructure are needed to get those answers, and how SnapLogic and its partners enable these solutions. Besides this series, you may wish to check out our IoT blog series, our YouTube channel, or reach out to us for a demo.