Microsoft Azure HDInsight is a cloud-hosted Apache Hadoop distribution. Internally, HDInsight is built on the Hortonworks Data Platform. It supports a large set of Apache big data projects, such as Spark, Hive, HBase, Storm, Tez, Sqoop, and Oozie, and the suite of HDInsight projects can be administered through Apache Ambari.
In the last post, we discussed anomaly detectors in some detail and showed how a few simple models work. Now we are going to build a pipeline that performs streaming anomaly detection.
We are going to use a triggered pipeline for this task. A triggered pipeline is instantiated each time a request comes in. Instantiation can take a couple of seconds, so triggered pipelines are not recommended for low-latency or high-traffic workloads. If data arrives more often than once every few seconds, or if we need lower latency, we should use an Ultra pipeline instead. An Ultra pipeline stays running between requests, so it does not pay the startup cost on every request and its input-to-output latency is significantly lower.
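To make the trade-off concrete, here is a back-of-the-envelope latency model in Python. The startup and processing times are assumed values chosen for illustration, not measured SnapLogic figures, and the model simplifies things by treating requests as handled one at a time. `PipelineCost` is a hypothetical helper, not part of any SnapLogic API.

```python
from dataclasses import dataclass


@dataclass
class PipelineCost:
    """Hypothetical latency model for one pipeline execution mode (seconds)."""
    startup_s: float     # time spent instantiating the pipeline for a request
    processing_s: float  # time spent actually processing one request

    def latency(self) -> float:
        """Input-to-output latency seen by a single request."""
        return self.startup_s + self.processing_s

    def max_rate_per_s(self) -> float:
        """Rough upper bound on requests per second if handled one at a time."""
        return 1.0 / self.latency()


# Assumed numbers for illustration only, not measured SnapLogic figures.
triggered = PipelineCost(startup_s=2.0, processing_s=0.05)  # instantiated per request
ultra = PipelineCost(startup_s=0.0, processing_s=0.05)      # already running

for name, mode in [("triggered", triggered), ("ultra", ultra)]:
    print(f"{name}: latency={mode.latency():.2f}s, "
          f"max ~{mode.max_rate_per_s():.1f} req/s")
```

Under these assumptions, the per-request startup cost dominates the triggered pipeline's latency, which is why a continuously running pipeline is the better fit once requests arrive faster than that startup time.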
For the purposes of this post, we’re going to assume we have an Anomaly-Detector-as-a-Service Snap. In the next post, we’ll show how to create that Snap using Azure ML. Our pipeline will look like this:
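Since the real Snap is built in the next post, the read, score, and route shape of the pipeline can be sketched in plain Python as a placeholder. `RollingZScoreDetector` and `pipeline` are hypothetical names; the detector is a simple running z-score model (Welford's algorithm) swapped in for the Azure ML service, not that service's actual API.

```python
import math
from typing import Dict, Iterable, Iterator, Tuple


class RollingZScoreDetector:
    """Hypothetical stand-in for the Anomaly-Detector-as-a-Service Snap.

    Keeps a running mean and variance (Welford's algorithm) and flags a value
    whose z-score against the values seen so far exceeds `threshold`.
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold
        self.warmup = max(warmup, 2)  # need >= 2 values for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def score(self, x: float) -> bool:
        """Return True if x looks anomalous, then fold x into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford update: incorporate the new value into mean and m2.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


def pipeline(records: Iterable[Dict[str, float]],
             detector: RollingZScoreDetector) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Read each record, ask the detector about it, and route it to an output."""
    for record in records:
        route = "anomalies" if detector.score(record["value"]) else "normal"
        yield route, record


stream = [{"value": v} for v in [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]]
for route, record in pipeline(stream, RollingZScoreDetector()):
    print(route, record)
```

In this sketch the flagged value is still folded into the running statistics; a real detector might exclude anomalies from its baseline instead, which is one of the design choices the hosted service would make for us.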