Data is pouring into the data lake from numerous sources, ranging from web server logs to data relayed from IoT devices. Enterprises face the challenge of processing this huge volume of data efficiently to derive insights for purposes such as product recommendation, advertising, customer acquisition and engagement, fraud detection, and cost optimization. SnapLogic’s Spark offering lets you harness the power of Spark to transform these large volumes of data with high performance.
A Hadooplex is the data processing engine of SnapLogic’s Elastic Integration Platform deployed on a Hadoop cluster; in other words, it is a Snaplex installed on a Hadoop cluster. A Hadooplex can execute both Standard mode and Spark mode pipelines.
At its core, a Hadooplex consists of a YARN Application Master process, the Hadooplex Master. The Hadooplex Master is responsible for negotiating resources from the YARN Resource Manager and for communicating with the NodeManager(s) to execute the containers and monitor their resource consumption.
A Hadooplex can be configured to enable Spark support, which allows Spark pipelines to be created and executed on the Spark engine. Spark pipelines let SnapLogic users build dataflow logic using the Snap and pipeline paradigm and have it execute as a Spark program. When a Spark pipeline is executed on a Hadooplex, the Hadooplex asks the YARN Resource Manager to schedule a SnapLogic Spark driver for execution. Each SnapLogic Spark driver instance handles one Spark pipeline execution.
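To make the scheduling relationship concrete, here is a minimal toy model of it in Python. It is only an illustrative sketch, not SnapLogic or YARN code: the class names `ResourceManager` and `HadooplexMaster`, their methods, and the pipeline names are all hypothetical, and a real YARN Resource Manager would typically queue a request it cannot satisfy rather than reject it outright. The one point it tries to capture is that each Spark pipeline execution gets its own driver container.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional


@dataclass
class ResourceManager:
    """Hypothetical stand-in for the YARN Resource Manager."""
    free_containers: int
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def allocate(self) -> Optional[str]:
        # Hand out a container id while capacity remains (simplification:
        # a real Resource Manager would queue the request instead).
        if self.free_containers == 0:
            return None
        self.free_containers -= 1
        return f"container_{next(self._ids)}"

    def release(self) -> None:
        # Return one container's worth of capacity to the pool.
        self.free_containers += 1


@dataclass
class HadooplexMaster:
    """Hypothetical stand-in for the Hadooplex Master (Application Master)."""
    rm: ResourceManager
    drivers: Dict[str, str] = field(default_factory=dict)  # pipeline -> container

    def run_spark_pipeline(self, pipeline: str) -> bool:
        # One Spark driver, in its own container, per pipeline execution.
        container = self.rm.allocate()
        if container is None:
            return False
        self.drivers[pipeline] = container
        return True

    def finish(self, pipeline: str) -> None:
        # Driver exits: forget it and give the container back.
        del self.drivers[pipeline]
        self.rm.release()


rm = ResourceManager(free_containers=2)
master = HadooplexMaster(rm)
results = [
    master.run_spark_pipeline(name)
    for name in ("ingest_logs", "score_fraud", "recommend")
]
print(results)         # [True, True, False]
print(master.drivers)  # two pipelines, each with its own driver container
```

With capacity for two containers, the third pipeline has to wait until one of the running drivers finishes and releases its container.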
Details on the high-level architecture, prerequisites, and launching of a Spark-enabled Hadooplex are available on SnapLogic’s documentation page.
The next blog post in this series will take you through the various Snaps available for building Spark pipelines. In the meantime, learn how to build and execute Spark pipelines on HDInsight, watch a demo of building Spark pipelines on SnapLogic’s Elastic Integration Platform, or contact us if you would like more information on SnapLogic’s solutions for Spark.