The Case for a Hybrid Batch and Streaming Architecture for Data Integration

Modern data integration requires both reliable batch and reliable streaming computation to support essential business processes. Traditionally, in the enterprise software space, batch ETL (Extract, Transform, and Load) and streaming CEP (Complex Event Processing) were two completely different products with different ways of formulating computations. Until recently, the open source big data space also addressed batch and streaming separately, with MapReduce for batch and Storm for streams. Now we are seeing more data processing engines, such as Apache Spark and Apache Flink, that attempt to provide models for both batch and streaming. In a series of posts, I'll explain the need for a unified programming model and an underlying hybrid data processing architecture that accommodates both batch and streaming computation for data integration. For data integration, however, this model must sit at a level that abstracts away specific data processing engines.

SnapLogic Kafka Integration Snaps in Action

Apache Kafka

In today's business world, big data is generating a big buzz. Beyond searching, storing, and scaling, one capability clearly stands out: stream processing. That's where Apache Kafka comes in.

At a high level, Kafka can be described as a publish/subscribe messaging system. Like other messaging systems, Kafka organizes feeds of messages into topics: producers write data to topics, and consumers read data from those topics. For the sake of simplicity, I'll keep the overview brief and refer you to the Kafka documentation for the details.
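To make the publish/subscribe model concrete, here is a toy in-memory sketch in plain Python. This is not Kafka's actual client API, and names like `ToyBroker` are illustrative only; the point is the core idea that messages are appended to a topic's log, and each consumer tracks its own read offset, so multiple consumers can read the same topic independently.

```python
from collections import defaultdict


class ToyBroker:
    """A minimal in-memory stand-in for a Kafka-style broker (illustrative only)."""

    def __init__(self):
        self.topics = defaultdict(list)  # topic name -> ordered message log
        self.offsets = {}                # (consumer, topic) -> next index to read

    def produce(self, topic, message):
        """Append a message to the end of a topic's log."""
        self.topics[topic].append(message)

    def consume(self, consumer, topic):
        """Return all messages this consumer has not yet seen, then advance its offset."""
        start = self.offsets.get((consumer, topic), 0)
        messages = self.topics[topic][start:]
        self.offsets[(consumer, topic)] = len(self.topics[topic])
        return messages


broker = ToyBroker()
broker.produce("tweets", "big data is generating a big buzz")
broker.produce("tweets", "stream processing stands out")

print(broker.consume("hadoop-writer", "tweets"))  # both messages
print(broker.consume("hadoop-writer", "tweets"))  # [] -- nothing new yet
print(broker.consume("dashboard", "tweets"))      # an independent consumer sees both
```

Because consumption only moves a per-consumer offset rather than deleting messages, any number of consumers can independently replay the same topic, which is the property that makes the pattern useful for fan-out to systems like Hadoop.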

In this blog post, I will demonstrate a simple use case in which Twitter feeds are streamed to a Kafka topic and the data is then written to Hadoop. Below are detailed instructions for building such pipelines with the SnapLogic Elastic Integration Platform.

June 2014 Snap Release for the SnapLogic Elastic Integration Platform

We are pleased to announce the addition of the following Snaps for the SnapLogic Elastic Integration Platform:

Google Directory Snap Pack

With this Snap Pack, you can add and modify users, user photos, groups and org units in your Google Directory. For example:

  • Query users to find all email addresses in use
  • List the group membership of a user
  • Create a new user, group, or org unit
  • Update an existing user’s name and add them to an org unit
  • Delete user photos or an org unit

LinkedIn Snap Pack

This Snap Pack lets you gather information from LinkedIn and post or modify updates, such as:

  • Fetch a LinkedIn user’s profile
  • Search for people
  • Update or share a status message, and optionally post it to Twitter
  • Join or leave groups
  • Post, like, and follow group updates

The latest Snap to join the Twitter Snap Pack, the Twitter Streaming Search Snap, streams tweets based on a keyword.

Additionally, the following Snaps are available as a Beta release:

  • Data Validator Snap: This Snap validates incoming documents and their attributes against constraints you define.
  • Hadoop Snap Pack (Reader, Writer): This Snap Pack lets you read data from or write data to a Hadoop File System.

With the June 2014 Snap Release, we are also delivering minor updates and fixes for the following Snaps: Google DFA Reports, JIRA Search, and JMS. See the Release Notes for more information on these Snaps. The Snap update will occur this evening PDT, with no downtime required.

For more information about SnapLogic Snaps, be sure to check out our documentation as well as this post.