Big Data Ingestion Patterns: Ingesting Data from Cloud & Ground Sources into Hive

What is Apache Hive? Hive provides a mechanism to query, create and manage large datasets that are stored on Hadoop, using SQL like statements. It also enables adding a structure to existing data that resides on HDFS. In this post I’ll describe a practical approach on how to ingest data into Hive, with the SnapLogic Elastic Integration Platform, without the need to write code.

Continue reading “Big Data Ingestion Patterns: Ingesting Data from Cloud & Ground Sources into Hive”

Big Data Ingestion Patterns

Big Data Integration

A common pattern that a lot of companies use to populate a Hadoop-based data lake is to get data from pre-existing relational databases and data warehouses. When planning to ingest data into the data lake, one of the key considerations is to determine how to organize data and enable consumers to access the data. Hive and Impala provide a data warehouse infrastructure on top of Hadoop – commonly referred to as SQL on Hadoop – that provide a structure to the data and the ability to query the data using a SQL-like language. Continue reading “Big Data Ingestion Patterns”