What is Apache Hive? Hive provides a mechanism to query, create, and manage large datasets stored on Hadoop using SQL-like statements. It also enables adding structure to existing data that resides on HDFS. In this post I’ll describe a practical approach to ingesting data into Hive with the SnapLogic Elastic Integration Platform, without the need to write code.
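As a concrete sketch of "adding structure to existing data," the HiveQL statement below (the table name, columns, and HDFS path are hypothetical) projects a schema onto files already in HDFS by defining an external table, so the raw data is never moved or rewritten:

```sql
-- Hypothetical example: define a schema over raw CSV files that
-- already exist in HDFS; Hive reads them in place at query time.
CREATE EXTERNAL TABLE web_logs (
  ip     STRING,
  ts     TIMESTAMP,
  url    STRING,
  status INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/web_logs';
```

Because the table is `EXTERNAL`, dropping it removes only the metadata; the underlying HDFS files are left untouched.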
In my previous post I described the various approaches and patterns to consider when ingesting data from relational data sources into a Hadoop-based data lake. In this post I’ll describe a practical approach to applying these patterns with the SnapLogic Elastic Integration Platform, without the need to write code. The big data ingestion patterns described here take into account the design considerations and best practices for effective ingestion of data into the Hadoop data lake, and they are used by many enterprise organizations today to move large amounts of data. Continue reading “Big Data Ingestion Patterns: Ingest into the Data Lake”
A common pattern that many companies use to populate a Hadoop-based data lake is to pull data from pre-existing relational databases and data warehouses. When planning to ingest data into the data lake, one of the key considerations is how to organize the data and enable consumers to access it. Hive and Impala provide a data warehouse infrastructure on top of Hadoop – commonly referred to as SQL on Hadoop – that gives structure to the data and the ability to query it using a SQL-like language. Continue reading “Big Data Ingestion Patterns”
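To illustrate the SQL-on-Hadoop consumption side of this pattern, the hypothetical HiveQL below (table, column, and partition names are assumptions, not from the original posts) sketches a table populated from a relational source and partitioned by load date, which consumers can then query with ordinary SQL-like statements:

```sql
-- Hypothetical example: a table fed from a relational source,
-- partitioned by load date so queries can prune whole partitions.
CREATE TABLE customers (
  customer_id BIGINT,
  name        STRING,
  city        STRING
)
PARTITIONED BY (load_date STRING)
STORED AS PARQUET;

-- Consumers access the ingested data with a familiar SQL-like query:
SELECT city, COUNT(*) AS customer_count
FROM customers
WHERE load_date = '2016-01-01'
GROUP BY city;
```

Partitioning by the ingestion date is one common way to organize the lake: each load lands in its own partition, and the `WHERE load_date = …` predicate limits a query to a single load.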