We have a slight problem at SnapLogic. While we spend a vanishingly small percent of the day watching adorable cat videos on the Internet, it seems our CEO always shows up behind our desks while doing so. If only we knew when our CEO was nearby and could get an alert when he was.
In the last post we went into some detail about anomaly detectors, and showed how some simple models would work. Now we are going to build a pipeline to do streaming anomaly detection.
We are going to use a triggered pipeline for this task. A triggered pipeline is instantiated whenever a request comes in. The instantiation can take a couple of seconds, so it is not recommended for low latency or high-traffic situations. If we’re getting data more frequently than that, or want less latency, we should use an Ultra pipeline. An Ultra pipeline stays running, so the input-to-output latency is significantly less.
For the purpose of this post, we’re going to assume we have an Anomaly-Detector-as-a-Service Snap. In the next post, we’ll show how to create that Snap using Azure ML. Our pipeline will look like this:
This article originally appeared as a slide slow on ITBusinessEdge: Data Lakes – 8 Data Management Requirements.
2016 is the year of the data lake. It will surround, and in some cases drown the data warehouse and we’ll see significant technology innovations, methodologies and reference architectures that turn the promise of broader data access and big data insights into a reality. But big data solutions must mature and go beyond the role of being primarily developer tools for highly skilled programmers. The enterprise data lake will allow organizations to track, manage and leverage data they’ve never had access to in the past. New data management strategies are already leading to more predictive and prescriptive analytics that are driving improved customer service experiences, cost savings and an overall competitive advantage when there is the right alignment with key business initiatives. Continue reading “Eight Data Management Requirements for the Enterprise Data Lake”
Last time we talked about figuring out what we want machine learning to do to be more important than how to do it. So before we jump into how to build a machine learning pipeline in the SnapLogic Elastic Integration Platform, let’s talk about what we are doing. Continue reading “Machine Learning in the Enterprise, Part II: Intro to Anomaly Detection”