In this video, we will explore how SnapLogic eXtreme helps businesses obtain a higher ROI from their big data investments. SnapLogic eXtreme easily processes large amounts of data and performs complex data transformations with zero code. In this demo, we show how to manage your cloud data lake on AWS, but the same approach could easily apply to Microsoft Azure or Google Cloud Platform.
Finally, eXtreme lessens the need for users with specialized skills, who are expensive and hard to find, while reducing the time and cost associated with managing cloud data lakes.
The first step to using eXtreme is to set up the AWS account that will be used to initiate, elastically scale and terminate EMR clusters through the SnapLogic Manager.
Once my AWS account is set up, I am ready to configure eXtreme with the necessary definitions.
eXtreme performs elastic scale processing on big data as a service (BDaaS) clusters through a new Snaplex called ‘eXtremeplex.’ I will perform a one-time configuration that defines the characteristics of the EMR cluster I want to spin up.
To reduce the cost of clusters, I can choose between the on-demand market and the spot market. If I select the spot market, I can see the spot price as a percentage of the on-demand cost.
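To make the percentage concrete, here is a minimal sketch of the arithmetic, assuming the percentage is applied to the on-demand hourly rate. The function name and the prices are hypothetical and are not taken from eXtreme or from AWS pricing.

```python
def spot_price_from_percentage(on_demand_hourly_usd: float,
                               percent_of_on_demand: float) -> float:
    """Return the hourly spot price implied by a percentage of the on-demand rate."""
    if not 0 < percent_of_on_demand <= 100:
        raise ValueError("percentage must be in the range (0, 100]")
    # Round to four decimals, which is enough for an hourly USD price.
    return round(on_demand_hourly_usd * percent_of_on_demand / 100, 4)


# Hypothetical example: on-demand rate of $0.20/hour, spot set to 50% of it.
example_spot_price = spot_price_from_percentage(0.20, 50)
print(f"${example_spot_price:.4f} per hour")
```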
EMR has the capability to write all the application log files to an S3 bucket. eXtreme utilizes this feature to log not only the EMR application logs, but also eXtreme-related logs.
To execute an eXtreme pipeline, certain artifacts are needed, such as the Snap Pack and the pipeline definition. These need to be stored in an S3 bucket so that the EMR cluster has access to them.
eXtreme needs relevant permissions to successfully initiate an EMR cluster. I recommend using the Amazon-supplied default roles, which are pre-populated.
eXtreme helps save valuable operational expenses by providing the option of auto-terminating inactive clusters.
I can also choose to use Auto Scaling. Selecting the checkbox reveals the auto scaling options.
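For viewers who want to see what these settings look like on the AWS side, the sketch below assembles them into a dictionary shaped like the parameters of the EMR `RunJobFlow` API (as exposed by boto3's `run_job_flow`). This is an illustration under assumptions, not what eXtreme actually sends: the helper name, bucket, instance type, and release label are hypothetical placeholders, and EMR's own idle-timeout and managed scaling settings are used here as stand-ins for the auto-terminate and Auto Scaling options in the form.

```python
def build_emr_request(name, log_uri, instance_type, core_count,
                      use_spot=False, spot_bid_usd=None,
                      idle_timeout_seconds=None,
                      min_units=None, max_units=None):
    """Build a RunJobFlow-style request dict (hypothetical helper, no AWS call)."""
    master_group = {
        "Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
        "InstanceType": instance_type, "InstanceCount": 1,
    }
    core_group = {
        "Name": "core", "InstanceRole": "CORE",
        "Market": "SPOT" if use_spot else "ON_DEMAND",
        "InstanceType": instance_type, "InstanceCount": core_count,
    }
    if use_spot and spot_bid_usd is not None:
        # The EMR API expects the bid price as a string in USD.
        core_group["BidPrice"] = f"{spot_bid_usd:.4f}"

    request = {
        "Name": name,
        "LogUri": log_uri,                    # S3 location for cluster logs
        "ReleaseLabel": "emr-6.15.0",         # placeholder release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [master_group, core_group],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # Amazon-supplied default roles
        "ServiceRole": "EMR_DefaultRole",
    }
    if idle_timeout_seconds is not None:
        # Assumption: EMR's idle timeout stands in for eXtreme's auto-terminate.
        request["AutoTerminationPolicy"] = {"IdleTimeout": idle_timeout_seconds}
    if min_units is not None and max_units is not None:
        # Assumption: managed scaling stands in for the Auto Scaling checkbox.
        request["ManagedScalingPolicy"] = {"ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        }}
    return request


request = build_emr_request(
    "example-extremeplex", "s3://example-bucket/logs/", "m5.xlarge", 2,
    use_spot=True, spot_bid_usd=0.10,
    idle_timeout_seconds=3600, min_units=2, max_units=10,
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("emr").run_job_flow(**request)`. In the demo itself, none of this is written by hand, since the eXtremeplex form collects the same information.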
We are ready to create the eXtremeplex now.
I will start dragging Snaps onto the canvas to build my pipeline. I use the Reader and Writer Snaps to read files from and write files to S3. Once the pipeline is complete, I execute it, and the eXtremeplex triggers the EMR cluster to be initialized.
Then, I log into the Amazon EMR console and look for the cluster that eXtreme has initiated. Once I expand the view to see the eXtremeplex cluster details, I see that the nodes are in the provisioning state and will progress to the bootstrapping stage.
Going back to the Dashboard, I can see that the Acmeportsystems eXtremeplex is up and operational. I can see that all nodes are running and the status of the pipeline is completed.
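The same progression can be watched from a script instead of the console. The sketch below polls the cluster-level state, which EMR reports as `STARTING` and then `BOOTSTRAPPING` before the cluster becomes `RUNNING` or `WAITING`. The helper name is hypothetical, and `emr_client` is assumed to be a boto3 EMR client, or any object with a compatible `describe_cluster` method.

```python
import time


def wait_for_cluster_ready(emr_client, cluster_id,
                           poll_seconds=30, max_polls=60):
    """Poll describe_cluster until the cluster is usable (hypothetical helper)."""
    ready = {"RUNNING", "WAITING"}
    failed = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}
    for _ in range(max_polls):
        response = emr_client.describe_cluster(ClusterId=cluster_id)
        state = response["Cluster"]["Status"]["State"]
        if state in ready:
            return state
        if state in failed:
            raise RuntimeError(f"cluster {cluster_id} ended in state {state}")
        # Still STARTING or BOOTSTRAPPING: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"cluster {cluster_id} not ready after {max_polls} polls")
```

The cluster ID is the `j-...` identifier shown in the EMR console for the cluster that eXtreme initiated.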
Here I provide a link to the S3 bucket that contains the application logs as well as the Spark step logs. Keep in mind that the AWS logging service pushes the logs only every 5 minutes. Clicking on the log link will open up a new tab with access to the S3 bucket via the Amazon console.
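If you prefer to locate the step logs yourself, EMR stores them under the cluster's log URI, in a folder named after the cluster ID, with one subfolder per step. The sketch below builds that S3 prefix. The helper name, bucket, and IDs are hypothetical placeholders, and where eXtreme places its own logs within the bucket is not covered here.

```python
from urllib.parse import urlparse


def step_log_location(log_uri, cluster_id, step_id):
    """Return (bucket, key_prefix) for an EMR step's logs (hypothetical helper)."""
    parsed = urlparse(log_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {log_uri!r}")
    base = parsed.path.strip("/")
    # Layout: <log prefix>/<cluster id>/steps/<step id>/
    parts = [p for p in (base, cluster_id, "steps", step_id) if p]
    return parsed.netloc, "/".join(parts) + "/"


bucket, prefix = step_log_location(
    "s3://example-bucket/logs/", "j-EXAMPLECLUSTER", "s-EXAMPLESTEP"
)
print(f"s3://{bucket}/{prefix}")
```

Because logs are pushed in 5-minute intervals, a prefix for a step that has just finished may be empty for a few minutes.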
I can find the Acmeportsystems eXtremeplex on the left-hand side. By clicking the down arrow, I can manually terminate the cluster.
Then I go to the EMR console and see that the termination command was received by EMR and the cluster has started the termination process, so I am done.
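For reference, the equivalent request at the AWS API level is `TerminateJobFlows`. The sketch below sends it and then reads back the cluster state, which is what I checked in the EMR console. The helper name is hypothetical, and `emr_client` is assumed to be a boto3 EMR client or a compatible object. Whether eXtreme issues exactly this call is an assumption.

```python
def terminate_cluster(emr_client, cluster_id):
    """Request termination and return the state EMR reports afterwards."""
    # TerminateJobFlows accepts a list of cluster (job flow) IDs.
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
    response = emr_client.describe_cluster(ClusterId=cluster_id)
    # Typically TERMINATING right after the request is accepted.
    return response["Cluster"]["Status"]["State"]
```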
In this video, I showed how business and IT users can reduce the cost of processing large amounts of data to obtain insights without the need for specialized skills. With eXtreme, users can focus on supporting business goals and using data as a strategic asset.
For more information about eXtreme, please contact SnapLogic or request a demo today.