Big Data Integration: Understanding Streaming Computation

In the first post in this series, I introduced The Case for a Hybrid Batch and Streaming Architecture for Data Integration and focused on understanding batch computation. In this post I’ll focus on understanding streaming computation. In the final post I’ll review the benefits of unifying batch and streaming for data integration.

Understanding Streaming Computation
Streaming data processing engines have varying functionality and implementation strategies. In one view, a streaming engine can process data as it arrives, in contrast to a batch system, which must have all the data present before starting a computation. The goal of the streaming computation may be to filter out unneeded data or transform incoming data before sending the result on to its final destination. If each piece of streaming data can be acted on independently, then the memory requirements of the stream processing nodes can be constrained, as long as the streaming computation can keep up with the incoming data. Also, it is often not necessary or desirable to persist incoming stream data to disk. Continue reading “Big Data Integration: Understanding Streaming Computation”
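To make that per-record model concrete, here is a minimal sketch in Java; the class and method names are hypothetical, not part of any particular engine. Because each record is filtered and transformed independently as it arrives, the node holds at most one record in memory and never writes the stream to disk:

```java
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical per-record stream processor: each record is handled
// independently, so memory stays bounded as long as processing keeps
// pace with the incoming data, and nothing is persisted to disk.
public class StreamProcessor<T, R> {
    private final Predicate<T> filter;      // drops unneeded records
    private final Function<T, R> transform; // reshapes surviving records
    private final Consumer<R> sink;         // forwards to the final destination

    public StreamProcessor(Predicate<T> filter,
                           Function<T, R> transform,
                           Consumer<R> sink) {
        this.filter = filter;
        this.transform = transform;
        this.sink = sink;
    }

    // Invoked once for each arriving record; no buffering of the stream.
    public void onRecord(T record) {
        if (filter.test(record)) {
            sink.accept(transform.apply(record));
        }
    }
}
```

For example, new StreamProcessor<String, String>(s -> !s.isEmpty(), String::toUpperCase, System.out::println) filters out empty lines and forwards the rest, touching each record exactly once.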

Webinar: 5 Critical Things to Understand About Modern Data Integration

Data integration is not optional. It is a fundamental technology that binds systems and data together to drive the business. The importance of data integration is self-evident. However, in the changing world of IT, the path to effective data integration approaches and technology seems to be out of reach for even the most innovative and well-funded enterprises. The gap seems to be more about understanding than capabilities. Let’s fix that problem.

Continue reading “Webinar: 5 Critical Things to Understand About Modern Data Integration”

The Case for a Hybrid Batch and Streaming Architecture for Data Integration

Modern data integration requires both reliable batch and reliable streaming computation to support essential business processes. Traditionally, in the enterprise software space, batch ETL (Extract, Transform and Load) and streaming CEP (Complex Event Processing) were two completely different products with different means of formulating computations. Until recently, in the open source software space for big data, batch and streaming were addressed separately, such as MapReduce for batch and Storm for streams. Now we are seeing more data processing engines that attempt to provide models for both batch and streaming, such as Apache Spark and Apache Flink. In this series of posts I’ll explain the need for a unified programming model and underlying hybrid data processing architecture that accommodates both batch and streaming computation for data integration. However, for data integration, this model must be at a level that abstracts specific data processing engines. Continue reading “The Case for a Hybrid Batch and Streaming Architecture for Data Integration”
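As a rough illustration of what abstracting the engine might mean (a hypothetical sketch, not SnapLogic’s actual model), a pipeline could be defined once as engine-neutral steps and bound to a batch or streaming runner only at execution time:

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical engine-neutral pipeline: a list of record-level steps plus
// a source and sink, with no commitment to a particular processing engine.
interface Runner {
    void run(List<UnaryOperator<String>> steps, String source, String sink);
}

// A batch runner would read the entire bounded dataset before applying
// the steps (e.g. by translating them to MapReduce or Spark jobs).
class BatchRunner implements Runner {
    public void run(List<UnaryOperator<String>> steps, String source, String sink) {
        /* load all data, apply steps, write results */
    }
}

// A streaming runner would subscribe to the unbounded source and apply
// the same steps to each record as it arrives (e.g. via Storm or Flink).
class StreamingRunner implements Runner {
    public void run(List<UnaryOperator<String>> steps, String source, String sink) {
        /* subscribe, apply steps per record, forward downstream */
    }
}
```

The same list of steps runs unchanged on either runner; only the execution strategy (a bounded read versus a continuous subscription) differs.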

SnapLogic Wins CODiE Award for Best Data Integration Solution

SnapLogic wins 2016 SIIA CODiE Award for Best Data Integration Solution

I’m pleased to announce that the SnapLogic Elastic Integration Platform was named the Best Data Integration Solution as part of the 2016 SIIA CODiE Awards. A CODiE Award win is a prestigious honor, as each award winner was reviewed by a field of industry experts, whose evaluations determined the finalists. SIIA members then reviewed the finalists, and their votes were combined with the industry experts’ scores to select this year’s CODiE Award winners. Continue reading “SnapLogic Wins CODiE Award for Best Data Integration Solution”

New With the Spring 2016 Release: Data Ingest-Prep-Deliver for Microsoft HDInsight

SnapLogic continues to build on its momentum in cloud-based data management with new support for HDInsight, Microsoft’s big-data-as-a-service on Azure. This follows our other recent announcements regarding support for the Microsoft Azure and Cortana ecosystem, including availability in the Azure Marketplace. Continue reading “New With the Spring 2016 Release: Data Ingest-Prep-Deliver for Microsoft HDInsight”

SnapLogic Kafka Snaps in Action

Apache Kafka

In today’s business world, big data is generating a big buzz. Beyond the searching, storing, and scaling, one thing clearly stands out: stream processing. That’s where Apache Kafka comes in.

At a high level, Kafka can be described as a publish-and-subscribe messaging system. Like any other messaging system, Kafka maintains feeds of messages in categories called topics. Producers write data into topics and consumers read data out of them. For the sake of simplicity, I have linked to the Kafka documentation here.
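Before diving into the pipeline, here is a minimal stand-alone sketch of these two roles using the plain Apache Kafka Java client rather than SnapLogic Snaps; the broker address and the topic name demo-topic are assumptions for illustration:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaDemo {
    public static void main(String[] args) {
        // Producer: writes messages into a topic.
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("demo-topic", "key1", "hello, kafka"));
        }

        // Consumer: subscribes to the same topic and reads messages out.
        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "demo-group");
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(1000); // ms (pre-2.0 API)
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
            }
        }
    }
}
```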

In this blog post, I will demonstrate a simple use case in which Twitter feeds are published to a Kafka topic and the data is then written to Hadoop. Below are detailed instructions on how users can build this pipeline using the SnapLogic Elastic Integration Platform.
Continue reading “SnapLogic Kafka Snaps in Action”
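For readers curious what the delivery stage looks like outside of SnapLogic, here is a rough hand-coded sketch of a consumer that drains a Kafka topic and appends each record to a file in HDFS via the Hadoop FileSystem API. The topic name twitter_feed, the NameNode address, and the output path are all assumptions; the post itself builds this flow as a SnapLogic pipeline instead:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TweetsToHdfs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker
        props.put("group.id", "tweets-to-hdfs");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");   // assumed NameNode

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/tweets/part-00000"))) {
            consumer.subscribe(Collections.singletonList("twitter_feed")); // assumed topic
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(1000)) {
                    out.writeBytes(r.value() + "\n"); // one tweet per line
                }
                out.hflush(); // make appended data visible to HDFS readers
            }
        }
    }
}
```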

SnapLogic’s Latest Release: Spring 2016 has Sprung…

…and it’s looking Kafka-esque. So to speak.

Today SnapLogic announced our Spring 2016 platform and Snap release. Overall, we believe this release will help our customers focus on data insights, not data engineering. It takes many of the repetitive, time-consuming activities around data ingest-preparation-delivery and makes them reusable and simple. We also believe that this release will help our customers stay abreast of the ever-changing big data technology ecosystem and choose the right tools and frameworks for each job. Continue reading “SnapLogic’s Latest Release: Spring 2016 has Sprung…”