What is Apache Hive? Hive provides a mechanism to query, create and manage large datasets that are stored on Hadoop, using SQL like statements. It also enables adding a structure to existing data that resides on HDFS. In this post I’ll describe a practical approach on how to ingest data into Hive, with the SnapLogic Elastic Integration Platform, without the need to write code.
A common pattern that a lot of companies use to populate a Hadoop-based data lake is to get data from pre-existing relational databases and data warehouses. When planning to ingest data into the data lake, one of the key considerations is to determine how to organize data and enable consumers to access the data. Hive and Impala provide a data infrastructure on top of Hadoop – commonly referred to as SQL on Hadoop – that provide a structure to the data and the ability to query the data using a SQL-like language. Continue reading “Big Data Ingestion Patterns”