Big Data Ingestion – Definition & Overview

What is big data ingestion?

Big data ingestion collects data and brings it into a data processing system where it can be stored, analyzed, and accessed. Data processing systems include data lakes, databases, and search engines. The ingested data is usually unstructured, comes from multiple sources, and arrives in diverse formats.

Depending on the source and destination of the data, it can be ingested in real time, in batches, or both (an approach known as lambda architecture). Data that is streamed in real time is imported as it is emitted by the source. Data that is ingested in batches is imported in distinct groups at regular intervals.
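
The difference between the two modes is easiest to see side by side. The sketch below simulates a source and ingests its records both ways; the source, record shape, and batch size are illustrative assumptions, not a reference to any particular tool.

```python
# A minimal sketch contrasting streaming and batch ingestion.
# The simulated source, record format, and batch size are hypothetical.
from typing import Iterable, Iterator


def source_events() -> Iterator[dict]:
    """Simulate a source that emits one record at a time."""
    for i in range(10):
        yield {"id": i, "value": i * 10}


def ingest_streaming(events: Iterable[dict]) -> None:
    """Import each record as it is emitted by the source."""
    for event in events:
        print(f"streamed: {event}")


def ingest_batch(events: Iterable[dict], batch_size: int = 4) -> None:
    """Collect records into distinct groups and import them together."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            print(f"batch loaded: {batch}")
            batch = []
    if batch:  # flush the final partial batch
        print(f"batch loaded: {batch}")


ingest_streaming(source_events())
ingest_batch(source_events())
```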

In many situations, the source and destination do not share the same format, protocol, or data timing. To make the data usable to the destination system, it must undergo some type of transformation or conversion.
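
As a simple illustration, the sketch below converts a comma-separated source record into a JSON document for the destination system. The field names and target schema are hypothetical assumptions chosen only to show the shape of a transformation step.

```python
# A minimal sketch of a transformation step between source and destination.
# The field names and target schema are hypothetical.
import json
from datetime import datetime, timezone


def transform(raw: str) -> str:
    """Convert a comma-separated source record into the JSON document
    the destination system expects."""
    user_id, amount, ts = raw.split(",")
    document = {
        "userId": int(user_id),
        "amountUsd": float(amount),
        # Normalize the epoch timestamp to ISO 8601 in UTC.
        "eventTime": datetime.fromtimestamp(int(ts), tz=timezone.utc).isoformat(),
    }
    return json.dumps(document)


print(transform("42,19.99,1700000000"))
```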

An effective data ingestion process begins with the data ingestion layer, which processes incoming data, prioritizes sources, validates individual files, and routes data to the correct destination. It ends with the data visualization layer, which presents the data to the user.
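
The sketch below shows what the validation and routing responsibilities of an ingestion layer might look like in miniature; the source names, validation rule, and destinations are hypothetical.

```python
# A minimal sketch of an ingestion layer: validate each record,
# then route it to a destination based on its source.
# The sources, validation rule, and destinations are hypothetical.
from typing import Optional

ROUTES = {
    "clickstream": "data_lake",
    "billing": "warehouse",
}


def validate(record: dict) -> bool:
    """Reject records that are missing required fields."""
    return "source" in record and "payload" in record


def route(record: dict) -> Optional[str]:
    """Return the destination for a valid record, or None if it is dropped."""
    if not validate(record):
        return None
    # Unknown sources are sent to a quarantine area for inspection.
    return ROUTES.get(record["source"], "quarantine")


print(route({"source": "clickstream", "payload": {"page": "/home"}}))  # data_lake
print(route({"source": "billing", "payload": {"amount": 10}}))         # warehouse
print(route({"payload": {}}))                                          # None (invalid)
```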

SnapLogic helps organizations improve data management in their data lakes, including moving and processing large volumes of data from various sources. SnapLogic eXtreme manages big data clusters and makes cloud-based big data processing viable for enterprises by offering scalability, flexibility, and reduced operational expenses (OpEx).

Learn more about big data ingestion pipeline patterns and data pipeline architecture.