Hadoop / Big Data in Enterprise
The Big Data ecosystem emerged to handle the massive volumes of data generated by web and online activity. Once the major components of the ecosystem matured, enterprise organizations quickly adopted these new tools and technologies for their own use cases. Today, enterprises pour all kinds of structured, semi-structured, and unstructured data into their data lakes, but the major challenge they face is making sense of that massive data. The SnapLogic Elastic iPaaS (Integration Platform as a Service), which is based on the visual programming paradigm, lets customers address this challenge with ease. It provides a powerful web-based Designer and hundreds of prebuilt connectors (Snaps) that customers can drag and drop to build data pipelines that cleanse and reshape data into the required format, at big data scale and in a big data environment. Continue reading “Leveraging Big Data Security with SnapLogic iPaaS”
In the previous post, we discussed what SnapLogic’s Hadooplex can offer with Spark. Now let’s continue the conversation by seeing what Snaps are available to build Spark Pipelines.
The suite of Snaps available in Spark mode enables us to ingest and land data in a Hadoop ecosystem and to transform that data by leveraging parallel operations such as map, filter, reduce, or join on Resilient Distributed Datasets (RDDs), fault-tolerant collections of elements that can be operated on in parallel.
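The semantics of these four operations can be sketched in plain Python (a minimal, serial illustration of the concepts only, not the Spark API; in PySpark the equivalents would be `rdd.map`, `rdd.filter`, `rdd.reduce`, and `rdd.join` running in parallel across partitions):

```python
from functools import reduce

# Serial sketch of the RDD-style operations named above.
# In Spark, each would run in parallel across RDD partitions;
# here they run over an ordinary Python list.
data = [1, 2, 3, 4, 5]

squared = list(map(lambda x: x * x, data))           # map: transform each element
evens = list(filter(lambda x: x % 2 == 0, squared))  # filter: keep matching elements
total = reduce(lambda a, b: a + b, evens)            # reduce: combine to one value

# join: pair elements of two keyed datasets by key, as rdd.join would
left = [("a", 1), ("b", 2)]
right = [("a", 10), ("b", 20)]
joined = [(k, (v1, v2)) for k, v1 in left for k2, v2 in right if k == k2]

print(squared)  # [1, 4, 9, 16, 25]
print(total)    # 20  (4 + 16)
print(joined)   # [('a', (1, 10)), ('b', (2, 20))]
```

The same lambda-per-operation shape is what a Spark pipeline expresses, whether built in code or assembled visually from Snaps.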
There are various formats available for data storage in HDFS. These file formats support one or more compression codecs, which affect the size of the data stored in the HDFS file system. The choice of file format and compression depends on factors such as the read or write performance required by the specific use case and the desired compression level for storing the data. Continue reading “Ingestion, Transformation and Data Flow Snaps in Spark”
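The size-versus-CPU trade-off behind that choice can be illustrated with plain Python and gzip (a generic sketch, not an HDFS-specific API; HDFS file formats expose analogous codec and level choices such as Snappy, gzip, or bzip2):

```python
import gzip

# Illustrate how the compression setting affects stored size:
# a higher level spends more CPU to produce fewer bytes.
payload = b"some repetitive enterprise data, " * 1000

fast = gzip.compress(payload, compresslevel=1)  # faster, larger output
best = gzip.compress(payload, compresslevel=9)  # slower, smaller output

print(len(payload), len(fast), len(best))
assert len(best) <= len(fast) < len(payload)
assert gzip.decompress(best) == payload  # lossless either way
```

The same reasoning applies at big data scale: a splittable, well-compressed format reduces storage and I/O, at the cost of extra CPU during writes.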
To connect your data, apps, APIs, and Internet of Things (IoT) faster, unleash the power of Spark on SnapLogic’s Hadooplex.
Continue reading “Spark the Spark with SnapLogic’s Hadooplex”
SnapLogic and Amazon Web Services are hosting a series of exclusive live seminars starting this week in Dallas. Next week we’ll be in Chicago and New York, followed by Palo Alto later in the month. The seminar series is focused on the future of data warehouse solutions and analytics in the modern enterprise. A key question that we’ll address is: Is the Data Warehouse Dead? Continue reading “Is the Data Warehouse Dead?”
Microsoft Azure HDInsight is an Apache Hadoop distribution powered by the cloud. Internally, HDInsight leverages the Hortonworks Data Platform. HDInsight supports a large set of Apache big data projects, including Spark, Hive, HBase, Storm, Tez, Sqoop, Oozie, and many more, and the suite of HDInsight projects can be administered via Apache Ambari.
This post lists the steps involved in spinning up an HDInsight cluster, setting up SnapLogic’s Hadooplex on HDInsight, and building and executing a Spark data flow pipeline on HDInsight. We start by spinning up an HDInsight cluster from the Microsoft Azure Portal. Continue reading “Executing Spark Pipelines on HDInsight”
The SnapLogic team is going to London! Our team of hybrid cloud and big data integration experts will be in town this week for Strata + Hadoop World EU to talk about big data integration, moving to a data lake, and how to quickly ingest, prepare, and deliver big data in Hadoop or Spark environments, regardless of the data’s velocity, variety, and volume.
Continue reading “The SnapLogic Team Goes to London for #StrataHadoop World”