Executing Spark Pipelines on HDInsight

Microsoft Azure HDInsight is an Apache Hadoop distribution powered by the cloud. Internally, HDInsight leverages the Hortonworks Data Platform. HDInsight supports a large set of Apache big data projects, including Spark, Hive, HBase, Storm, Tez, Sqoop, and Oozie, among many others. The suite of HDInsight projects can be administered via Apache Ambari.

This post lists the steps involved in spinning up an HDInsight cluster, setting up SnapLogic’s Hadooplex on HDInsight, and building and executing a Spark data flow pipeline on HDInsight. We start by spinning up an HDInsight cluster from the Microsoft Azure Portal.