SnapLogic provides a big data integration platform as a service (iPaaS) that lets business customers process data in a simple, intuitive and powerful way. SnapLogic provides a number of different modules called Snaps. Each Snap corresponds to a specific data operation and provides a convenient way to get, manipulate or output data. To create a data pipeline, all the customer needs to do is drag the relevant Snaps together and configure them. Customers then execute pipelines to handle specific data integration flows.
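To make the "Snaps snapped together into a pipeline" idea concrete, here is a minimal sketch under stated assumptions. It is a toy model, not SnapLogic's API or internals. A "Snap" is assumed to be any function that takes a stream of records and returns a stream of records. The helper names `read_snap`, `filter_snap`, `map_snap` and `pipeline` are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical model (NOT SnapLogic's API): a "Snap" consumes a stream of
# records (dicts) and produces a stream of records.
Record = dict
Snap = Callable[[Iterable[Record]], Iterator[Record]]


def read_snap(rows: list[Record]) -> Snap:
    """Hypothetical source Snap: ignores its input and emits fixed records."""
    def run(_: Iterable[Record]) -> Iterator[Record]:
        yield from rows
    return run


def filter_snap(predicate: Callable[[Record], bool]) -> Snap:
    """Hypothetical Snap that keeps only records matching the predicate."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if predicate(r))
    return run


def map_snap(fn: Callable[[Record], Record]) -> Snap:
    """Hypothetical Snap that transforms each record independently."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (fn(r) for r in records)
    return run


def pipeline(*snaps: Snap) -> list[Record]:
    """Connect Snaps in order: each Snap's output feeds the next Snap."""
    stream: Iterable[Record] = iter(())
    for snap in snaps:
        stream = snap(stream)
    return list(stream)


result = pipeline(
    read_snap([{"name": "a", "amount": 5}, {"name": "b", "amount": 50}]),
    filter_snap(lambda r: r["amount"] > 10),
    map_snap(lambda r: {**r, "amount_cents": r["amount"] * 100}),
)
print(result)  # [{'name': 'b', 'amount': 50, 'amount_cents': 5000}]
```

Because every step shares the same record-stream interface, steps can be reordered or swapped without changing the others. This is the property the drag-and-drop metaphor relies on.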
The suite of Snaps available in Spark mode enables us to ingest data from, and land data into, a Hadoop ecosystem. It also lets us transform that data with parallel operations such as map, filter, reduce or join on a Resilient Distributed Dataset (RDD), a fault-tolerant collection of elements that can be operated on in parallel.
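The following sketch shows what map, filter, reduce and join compute on a partitioned collection. It is written in plain Python and is not Spark code. The "dataset" is assumed to be a list of partitions. `ThreadPoolExecutor` only mimics the per-partition structure, while Spark actually distributes partitions across executors. The helpers `per_partition` and `join` are hypothetical. The join follows the documented semantics of `RDD.join`, where pairs `(K, V)` and `(K, W)` produce `(K, (V, W))`.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Illustrative stand-in for an RDD: a list of partitions (NOT the Spark API).
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]]


def per_partition(fn, parts):
    """Apply fn to each partition independently (hypothetical helper)."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fn, parts))


# map and filter touch each partition on its own; no data moves between them.
squared = per_partition(lambda p: [x * x for x in p], partitions)
evens = per_partition(lambda p: [x for x in p if x % 2 == 0], squared)

# reduce: combine within each partition first, then merge the partial results.
partial_sums = per_partition(sum, evens)
total = reduce(lambda a, b: a + b, partial_sums, 0)


def join(left, right):
    """Inner join of (key, value) pairs, yielding (key, (v, w)) like RDD.join."""
    index = {}
    for k, w in right:
        index.setdefault(k, []).append(w)
    return [(k, (v, w)) for k, v in left for w in index.get(k, [])]


orders = [("alice", 30), ("bob", 12), ("alice", 5)]
cities = [("alice", "Paris"), ("carol", "Oslo")]
joined = join(orders, cities)

print(total)   # 220
print(joined)  # [('alice', (30, 'Paris')), ('alice', (5, 'Paris'))]
```

Map and filter need no coordination between partitions. A reduce needs only a cheap final merge. A join has to bring matching keys together, which in a real cluster means moving data between nodes.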
There are various formats available for data storage in HDFS. These file formats support one or more compression formats, which affect the size of the data stored in the HDFS file system. The choice of file format and compression depends on several factors, such as the read or write performance a specific use case requires and the desired compression level for the stored data. Continue reading “Ingestion, Transformation and Data Flow Snaps in Spark”
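As a rough illustration of the size and speed trade-off, this sketch compresses the same synthetic records with the general-purpose codecs in Python's standard library. The records are an assumption made up for illustration, and real HDFS data would typically sit in a container format such as Avro or Parquet. Hadoop-side codecs such as Snappy or LZO are not in the standard library, so they are not shown here. Exact byte counts and timings vary by machine and data, so the sketch prints them rather than asserting particular values.

```python
import bz2
import gzip
import json
import lzma
import time

# Synthetic, repetitive records (an assumption, standing in for landed data).
records = [{"id": i, "region": "us-west", "status": "active"} for i in range(10_000)]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")

# Candidate codecs and compression levels to compare.
codecs = {
    "gzip (level 1)": lambda b: gzip.compress(b, compresslevel=1),
    "gzip (level 9)": lambda b: gzip.compress(b, compresslevel=9),
    "bzip2": bz2.compress,
    "xz/lzma": lzma.compress,
}

results = {}
for name, compress in codecs.items():
    start = time.perf_counter()
    compressed = compress(raw)
    elapsed = time.perf_counter() - start
    results[name] = (len(compressed), elapsed)

print(f"uncompressed: {len(raw)} bytes")
for name, (size, elapsed) in results.items():
    print(f"{name}: {size} bytes ({size / len(raw):.1%}), {elapsed * 1000:.1f} ms")
```

Higher compression levels and heavier codecs usually save more storage at the cost of CPU time on write and often on read. That trade-off is what the choice of format and codec has to balance for a given use case.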
The SnapLogic Elastic Integration Platform connects your enterprise data, applications, and APIs through drag-and-drop data pipelines. Each pipeline is made up of Snaps, intelligent connectors that users drag onto a canvas and “snap” together like puzzle pieces.
These pipelines are executed on a Snaplex, an application that runs on a multitude of platforms: on a customer’s infrastructure, on the SnapLogic cloud, and most recently on Hadoop. A Snaplex that runs on Hadoop can execute pipelines natively in Spark.
The SnapLogic platform is known for its easy-to-use, self-service interface, made possible by our team of dedicated engineers (we’re hiring!). We work to apply the industry’s best practices so that our clients get the best possible end product — and testing is fundamental. Continue reading “Testing… Testing… 1, 2, 3: How SnapLogic tests Snaps on the Apache Spark Platform”
– Greg Benson, Chief Scientist at SnapLogic
Hadoop. Hive. MapReduce. Spark. As the organizing principles of managing big (and small) data are being rewritten, there is still a lot of confusion in the market. What role will traditional extraction, transformation and loading (ETL) tools play in big data analytics? Greg Benson, our in-house expert on all things big data, Hadoop, MapReduce and more, will join the SnapLogic team of data integration experts next week, on Friday, November 7th, for a webinar. He will discuss why the same old tools won’t cut it in the new world of big data and its varied integration needs.
In the webinar we’ll talk about what’s new, what’s hot and what’s happening in accessing, preparing and delivering big data for a wide variety of use cases. We’ll also cover market trends, what’s changing and the impact of Spark, including our new capabilities using the Sparklex. Lastly, we’ll review some of the use cases we’re seeing for a more flexible integration platform as a service (iPaaS) that can handle multiple styles of ingesting, synchronizing and transforming big data sources, and we’ll walk through our latest platform demonstrations.
We recommend this webinar if you’re an Information Architect, Big Data Practitioner, Data Warehouse Architect, Business Analyst or BI Practitioner. Register here, and we look forward to discussing big data integration with you next week!