Today SnapLogic announced SnapReduce 2.0, which aims to make big data elastic.
How we got here:
Although Hadoop promises tremendous value in terms of insights into previously unharvested business data, not all organizations have been able to get to the promised land because:
Most customers are unable to access and incorporate all the data they need for comprehensive analytical insights. With Hadoop’s massively parallel processing capabilities, the more data you incorporate, the smarter your algorithms and the better your insights. Developer tools like Sqoop are difficult to use and can become cumbersome to maintain, with little re-use. Also, these tools are good at funneling data from on-premises systems, but getting data from cloud applications and sources such as Salesforce and Workday, as well as from custom applications running on public clouds such as Amazon Web Services (AWS) and Microsoft Azure, remains difficult.
MapReduce, the main data processing engine in Hadoop, is also a very developer-centric tool. In order to run analytics, data scientists need to write MapReduce jobs, so they are now expected to be top-class coders in addition to knowing their data and statistics. Putting a developer resource at the data scientist’s disposal only slows him or her down, as every little change becomes a change request.
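To give a sense of the coding burden described above, here is a minimal sketch of the map, shuffle, and reduce steps behind a simple word count. It is plain Python that runs in a single process, not code for the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Even this toy version asks the analyst to think in key-value pairs and grouping steps before getting to the actual question. A real Hadoop job adds job configuration, serialization, and cluster submission on top.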
The Hadoop community has recognized these gaps and has made Hadoop more extensible by introducing Yet Another Resource Negotiator (YARN). YARN, sometimes referred to as MapReduce 2.0, allows Hadoop users to run arbitrary jobs in the Hadoop framework.
With SnapReduce 2.0, SnapLogic makes its entire Elastic Integration Platform available to Hadoop users. Because the platform is YARN-managed, users can now make their big data elastic. Here’s how:
Elastic Scale: Users can run their elastic integrations at Hadoop scale by running SnapLogic integrations natively on Hadoop as YARN-managed resources. This is possible because of SnapLogic’s Software-defined Integration architecture where the data planes are purely executors of instructions dispatched from the control plane. These data planes, called Snaplexes, can run on any platform that supports Java, of which Hadoop is one.
Richer Data in Hadoop: Beyond basic data integration, data scientists can also incorporate data from over 160 data sources, both on-premises and in the cloud, without any coding. SnapLogic’s rich drag-and-drop user interface requires no specialized integration skills for the data scientists. Additionally, SnapLogic’s schemaless integration pipelines (weakly-typed vs strongly-typed) become highly resilient and re-usable in the world of big data.
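The resilience claim for weakly-typed pipelines can be illustrated with a small sketch. This is not SnapLogic’s implementation. It only contrasts a step that insists on a fixed schema with one that passes unknown fields through untouched. The field names (`id`, `amount`, `quantity`, `region`) and both function names are hypothetical.

```python
EXPECTED_FIELDS = {"id", "amount", "quantity"}


def strict_step(record):
    """Strongly-typed style: reject any record whose fields differ from the schema."""
    if set(record) != EXPECTED_FIELDS:
        raise ValueError(f"unexpected schema: {sorted(record)}")
    return {**record, "total": record["amount"] * record["quantity"]}


def tolerant_step(record):
    """Weakly-typed style: keep every incoming field and enrich only when possible."""
    out = dict(record)  # unknown fields are carried along unchanged
    if "amount" in record and "quantity" in record:
        out["total"] = record["amount"] * record["quantity"]
    return out
```

If an upstream source starts sending an extra `region` field, `strict_step` fails until someone updates the schema, while `tolerant_step` keeps working and forwards the new field downstream. That is the sense in which a schemaless step is easier to re-use across sources.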
Data Preparation: With SnapReduce and SnapLogic, data scientists can graphically transform and enrich data without any coding. Common yet cumbersome tasks such as unions and joins become a matter of drag, drop, and configure.
Elastic Delivery: SnapReduce 2.0 also allows data scientists to elastically deliver their analytics results via multiple delivery channels: files, APIs, or visualization tools such as Tableau. They can build pipelines that read their result sets from Hadoop and deliver them to business analysts as CSV files or Tableau-ready data format files, or to traditional data warehouses and BI tools in relational format for a highly optimized analytical experience.
With storage, compute, and now integration of big data consolidated into the Hadoop platform, Hadoop becomes a single data management platform for customers. This helps them rationalize their data management stack onto Hadoop and streamline their big data investments. Existing SnapLogic customers are relying on us to help them with their big data initiatives. These are typically customers who have already trusted the SnapLogic Elastic Integration Platform for their application, API, or classic ETL/ELT integration needs and would rather extend the capabilities of SnapLogic into big data and analytics in order to standardize on a single platform for all their enterprise integration needs.
Here’s what our partner Cloudera had to say about today’s announcement:
“Our customers are recognizing the value of building an enterprise data hub, and modern data collection and transformation technologies are essential for delivering maximum operational and analytical benefits. We’re pleased to be working with SnapLogic as they bring SnapReduce 2.0 to market, enabling customers to leverage Cloudera Enterprise’s massively parallel processing capabilities for their big data integration initiatives.”