Big data processing now made easier with SnapLogic eXtreme

As enterprises continue their digital transformation journey and learn the benefits of big data architectures, they’re looking to migrate their data lakes to the cloud for lower costs, greater processing capacity, and easier scaling. However, connecting cloud-based data environments and creating Apache Spark pipelines requires extensive technical knowledge and resources.

With SnapLogic eXtreme, our new big data solution, SnapLogic is making cloud-based big data processing viable for enterprises for the first time by offering flexibility, scalability, and reduced operating expenses (OpEx), while also lessening the need for specialized skills to manage big data clusters. In doing so, SnapLogic eXtreme helps enterprises become truly data-driven and see a return on their big data investments.

Big data processing: A brief history

Discovering business insights using big data processing has had mixed success, with many enterprises unable to show a compelling ROI. In the early days, enterprises processed large volumes of data by building an on-premises Hadoop cluster using a distribution such as Cloudera, Hortonworks, or MapR. The data analyzed was mostly structured, and the cluster required a large upfront capital expenditure (CapEx) to purchase the necessary hardware. Hadoop is also complex to manage and monitor: it requires a specialized skill set, and people with those skills are scarce.

As companies see increased business benefits from big data, they are building new big data architectures in the cloud, or migrating existing ones there, to take advantage of lower operational costs, nearly limitless processing power, and the on-demand scaling the cloud provides.

Many enterprises are going through this “lift and shift,” moving their on-premises clusters to the cloud. This avoids the large capital expenditure of buying hardware to spin up a cluster. However, because the enterprise still manages and monitors the cluster, this strategy does nothing to reduce OpEx or close the skills gap. As a result, these enterprises are still waiting for the promised benefits: lower OpEx, faster time-to-value (TTV), and ROI.

For most enterprises, managing and monitoring Hadoop environments does not add to their competitive advantage, so they are looking for a better way to transform data at scale. Big data as a service (BDaaS) provides such an environment. Because BDaaS is a managed service, enterprises can dramatically reduce the time spent managing and monitoring clusters and focus on their core competitive advantages. However, connecting cloud-based big data environments to diverse data sources, and creating Apache Spark pipelines to transform that data, still requires highly technical knowledge and continuous coding effort from data engineers and core IT groups. The result is prohibitive operational costs and a longer time-to-value.

Enter SnapLogic eXtreme

SnapLogic eXtreme makes cloud-based big data viable for enterprises by offering flexibility, scalability, and decreased OpEx. Data engineers can use it to lower the prohibitive cost and resource requirements many companies face when building and operating big data architectures in the cloud. As a result, data engineers, business analysts, and others can focus on obtaining timelier insights from big data, driving better decision-making and faster time-to-market.

A Customer 360 example

Most enterprises want a better understanding of their customers, and many have an initiative to build a 360-degree view of each one. One obstacle to that broad view is that customer data is held in silos. A complete view of a customer requires combining and enriching customer data from multiple sources. The first step is to ingest that data: customer records from a cloud-based CRM such as Salesforce, clickstream logs from the company website, customer care logs from the customer service application, and social media feeds such as Twitter. These sources contain both structured and semi-structured data.
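
To make the combine-and-enrich step concrete, here is a minimal sketch in plain Python. It is not how SnapLogic or Spark implements this step; it only illustrates merging per-source records into one profile per customer. It assumes every source already carries a shared customer_id key (real identity resolution across CRM, web, and social data is usually harder), and the function name, field names, and sample records are hypothetical.

```python
import json

# Hypothetical sample inputs. The shared "customer_id" key and all field
# names are illustrative assumptions, not a real Salesforce or log schema.
crm_rows = [
    {"customer_id": "c1", "name": "Ada", "segment": "enterprise"},
    {"customer_id": "c2", "name": "Lin", "segment": "smb"},
]
clickstream_lines = [  # semi-structured: one JSON document per log line
    '{"customer_id": "c1", "page": "/pricing"}',
    '{"customer_id": "c1", "page": "/docs"}',
    '{"customer_id": "c2", "page": "/pricing"}',
]
care_tickets = [{"customer_id": "c2", "status": "open"}]


def build_customer_360(crm, clicks, tickets):
    """Merge per-source records into one profile per customer_id (hypothetical helper)."""
    # Start from the structured CRM record and add enrichment counters.
    profiles = {
        row["customer_id"]: {**row, "page_views": 0, "open_tickets": 0}
        for row in crm
    }
    # Enrich with clickstream activity, parsing each JSON log line.
    for line in clicks:
        event = json.loads(line)
        profile = profiles.get(event["customer_id"])
        if profile is not None:
            profile["page_views"] += 1
    # Enrich with customer care data, counting only open tickets.
    for ticket in tickets:
        profile = profiles.get(ticket["customer_id"])
        if profile is not None and ticket["status"] == "open":
            profile["open_tickets"] += 1
    return profiles


if __name__ == "__main__":
    for cid, profile in build_customer_360(crm_rows, clickstream_lines, care_tickets).items():
        print(cid, profile)
```

Events whose customer_id has no CRM match are simply skipped here; a production pipeline would more likely route them to a matching or review step.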

Using SnapLogic’s graphical user interface, data engineers can choose from more than 450 pre-built intelligent connectors, called Snaps, to build data pipelines in just a few clicks. These pipelines capture structured data from on-premises systems such as relational databases and from cloud applications such as Salesforce, as well as semi-structured data such as Twitter feeds and website clickstream data. All of this data is captured in its raw format and lands in a cloud-based data lake storage service such as Amazon S3 or Azure Data Lake Store.
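
Landing data in its raw format typically means writing records unchanged and organizing them by source and ingestion date, so later Spark jobs can read only the partitions they need. The sketch below assumes a Hive-style key layout (raw/<source>/ingest_date=YYYY-MM-DD/<batch>.jsonl) and uses a local directory as a stand-in for an Amazon S3 bucket or Azure Data Lake Store container. The layout and the helper names raw_object_key and land_raw are hypothetical, not something SnapLogic prescribes.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def raw_object_key(source: str, ingest_date: date, batch_id: str) -> str:
    """Build a Hive-style partitioned key (hypothetical layout)."""
    return f"raw/{source}/ingest_date={ingest_date.isoformat()}/{batch_id}.jsonl"


def land_raw(lake_root: Path, source: str, records: list,
             ingest_date: date, batch_id: str) -> Path:
    """Write records unchanged, one JSON document per line, under the partitioned key."""
    target = lake_root / raw_object_key(source, ingest_date, batch_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return target


if __name__ == "__main__":
    # A temporary local directory stands in for the cloud data lake.
    with tempfile.TemporaryDirectory() as lake:
        path = land_raw(Path(lake), "twitter", [{"id": 1, "text": "hello"}],
                        date(2024, 1, 31), "batch-001")
        print(path.relative_to(lake).as_posix())
```

Keeping the raw zone append-only and untransformed means any later transformation can be rerun from the original data if the logic changes.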

Using the same graphical user interface, engineers can then quickly create Apache Spark transformation pipelines that use SnapLogic’s ephemeral plex capabilities to process the large volumes of data from these sources. Executing the first Spark pipeline launches an ephemeral Amazon EMR cluster, configured according to the settings specified in the UI. Subsequent pipeline executions reuse the existing cluster. The transformed data is written back to the data lake, typically in a columnar format such as Parquet. Once all processing has completed and the cluster has been idle for a set period, the cluster is terminated, saving valuable OpEx.
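
The ephemeral cluster behavior described above (launch on first use, reuse while active, terminate after an idle timeout) can be modeled as a small state machine. The sketch below is a simplified simulation, not SnapLogic’s or Amazon EMR’s implementation: the class names, the ClusterConfig fields, and the default timeout are hypothetical, and the launch and terminate steps are placeholders where real calls to a cluster API would go. Time is passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterConfig:
    # Hypothetical settings standing in for what a user specifies in the UI.
    instance_type: str = "m5.xlarge"
    worker_count: int = 3
    idle_timeout_s: float = 900.0


@dataclass
class EphemeralClusterManager:
    config: ClusterConfig
    cluster_id: Optional[str] = None
    last_activity: float = 0.0
    launches: int = 0
    running_jobs: int = 0

    def submit(self, pipeline: str, now: float) -> str:
        """Run a pipeline, launching a cluster only if none is running."""
        if self.cluster_id is None:
            self.launches += 1
            # Placeholder for a real "launch cluster with self.config" call.
            self.cluster_id = f"cluster-{self.launches}"
        self.running_jobs += 1
        self.last_activity = now
        return self.cluster_id

    def finish(self, now: float) -> None:
        """Record that a pipeline finished; the idle clock starts from here."""
        if self.running_jobs == 0:
            raise ValueError("no running pipeline to finish")
        self.running_jobs -= 1
        self.last_activity = now

    def tick(self, now: float) -> bool:
        """Terminate the cluster if idle past the timeout; return True if terminated."""
        if self.cluster_id is None or self.running_jobs > 0:
            return False
        if now - self.last_activity >= self.config.idle_timeout_s:
            # Placeholder for a real "terminate cluster" call.
            self.cluster_id = None
            return True
        return False


if __name__ == "__main__":
    mgr = EphemeralClusterManager(ClusterConfig(idle_timeout_s=600))
    first = mgr.submit("ingest_clicks", now=0)
    mgr.finish(now=100)
    second = mgr.submit("enrich_profiles", now=200)  # reuses the same cluster
    mgr.finish(now=300)
    print(first, second, mgr.tick(now=900))
```

A real controller would call tick on a schedule; keeping the idle rule separate from the clock makes the reuse and termination behavior deterministic.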

Finally, the data is delivered from the cloud-based data lake to end systems, which can include cloud data warehouses such as Snowflake or BI tools such as Tableau.

A unified platform

SnapLogic eXtreme is part of SnapLogic’s leading self-service integration platform, the Enterprise Integration Cloud (EIC), and can be used to build and submit Spark transformations through the platform’s visual programming interface. Together, the EIC and SnapLogic eXtreme reduce the time, cost, and complexity of cloud big data integrations. With a fully managed data architecture in the cloud, customers benefit from no CapEx, lower OpEx, and no skills gap. Complex big data integrations that used to take weeks or months can now be done in days. What’s not to like?