How to get valuable insights on data stored in Azure Data Lake Store

In a previous blog post, I discussed major trends in the data integration space and customers moving from on-prem to cloud. I’d like to focus on one trend which involves moving data from on-premises or cloud data sources to a Data Lake technology such as Azure Data Lake.

What is a Data Lake?

Data Lake is a term coined for a repository that stores large amounts of data in its raw, native form, both structured and unstructured, in one location. This data can come from various sources, and the Data Lake can act as a single source of truth for any organization. From an architecture standpoint, the data is first stored in a data acquisition zone (sometimes called the data swamp), then cleansed and transformed as part of data transformation, and later published to gain business insights.

Data Lake

As seen in the diagram above, enterprises have multiple systems such as ERP, CRM, RDBMS, NoSQL, IoT sensors, etc. Because this disparate data is stored in different systems, it is difficult to pull together. A Data Lake brings all the data under one roof (data acquisition) using one of the following services:

  • Azure Blob
  • Azure Data Lake Store
  • Amazon S3
  • HDFS
  • Others

Data stored in one of these services can then be transformed in the following ways:

  • Aggregate
  • Sort
  • Join
  • Merge
  • Other
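To make these operations concrete, here is a minimal sketch in plain Python. The record fields (`customer`, `amount`, `region`) and the sample values are made up for illustration; a real Data Lake would apply the same kinds of operations at far larger scale with a dedicated engine.

```python
from collections import defaultdict

# Hypothetical raw records, as they might land in the acquisition zone.
orders = [
    {"customer": "c2", "amount": 40.0},
    {"customer": "c1", "amount": 25.0},
    {"customer": "c1", "amount": 10.0},
]
web_orders = [{"customer": "c3", "amount": 5.0}]
customers = {"c1": "EMEA", "c2": "APAC", "c3": "AMER"}


def merge(*sources):
    """Merge: concatenate records from several sources into one list."""
    return [row for source in sources for row in source]


def aggregate(rows):
    """Aggregate: total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return [{"customer": c, "total": t} for c, t in totals.items()]


def join(rows, regions):
    """Join: attach each customer's region (inner join on customer id)."""
    return [
        {**row, "region": regions[row["customer"]]}
        for row in rows
        if row["customer"] in regions
    ]


def sort(rows):
    """Sort: highest total first."""
    return sorted(rows, key=lambda row: row["total"], reverse=True)


# Chain the four steps to produce the data that would be published.
published = sort(join(aggregate(merge(orders, web_orders)), customers))
```

Chaining the steps this way mirrors the acquisition, transformation, and publish flow described earlier: each function takes records in and hands cleaner records to the next stage.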

The transformed data is then moved to the data publish/data access section (which can use the same services as data acquisition), where users can query the data with the following tools:

  • Microsoft’s U-SQL
  • Amazon Athena
  • Hive
  • Presto
  • Others

The bottom line is that a Data Lake can serve as a platform to run analytics in order to provide better customer experience, recommendations, and more.

Azure Data Lake is one such Data Lake from Microsoft, and the repository it uses to store all the data is Azure Data Lake Store. On top of this data store, users can run the Azure Data Lake Analytics service or HDInsight, or use U-SQL, a big data query language, to gain better business insights.

ADLS (Source: Microsoft)

Azure Data Lake Store (ADLS) can store any data in its native format. One of the goals of this data store is to bring together data from disparate sources. The SnapLogic Enterprise Integration Cloud, with its pre-built connectors called Snaps, helps by quickly moving data from different systems into the data store.

ADLS provides a complex API, which applications use to store data in ADLS. SnapLogic has abstracted this complexity away via Snaps, so users can easily move data from various systems to ADLS without needing to know the details of these APIs.
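For a sense of what a Snap hides, the sketch below builds (but does not send) the kind of HTTP request a file upload involves. It assumes the WebHDFS-compatible REST endpoint that Microsoft documents for Data Lake Store; the account name, file path, and token are placeholders, and obtaining a real OAuth token from Azure Active Directory is a separate step not shown here.

```python
import urllib.parse
import urllib.request


def build_upload_request(account, path, data, token):
    """Build a PUT request that would create a file in Data Lake Store.

    Assumes the WebHDFS-style URL layout; check the current Microsoft
    documentation before relying on it.
    """
    query = urllib.parse.urlencode({"op": "CREATE", "overwrite": "true"})
    url = (
        f"https://{account}.azuredatalakestore.net"
        f"/webhdfs/v1/{urllib.parse.quote(path.lstrip('/'))}?{query}"
    )
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values for illustration only; nothing is sent over the network.
request = build_upload_request(
    account="exampleaccount",
    path="/raw/orders/sample.csv",
    data=b"customer,amount\nc1,25.0\n",
    token="PLACEHOLDER_TOKEN",
)
```

Even this simplified version leaves out authentication, redirects, chunked appends for large files, and error handling, which is the kind of detail a pre-built connector takes care of.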

Use case

A business needs to track and analyze content to better recommend products or services to its customers. Its data – from various sources such as Oracle, files, Twitter, etc. – needs to be stored in a central repository such as ADLS so that business users can run analytics on top to measure customer buying behavior, their interests, and products purchased.
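To illustrate the consolidation this use case calls for, the sketch below maps records from each source to a common shape, merges them, and serializes the result for storage. The source field names and the `to_common` helper are invented for illustration and say nothing about how SnapLogic implements this.

```python
import json

# Hypothetical rows as they might arrive from each source.
oracle_rows = [{"CUST_ID": 1, "PRODUCT": "laptop"}]
file_rows = [{"customer": "2", "item": "phone"}]
tweets = [{"user_id": 3, "text": "Loving my new tablet"}]


def to_common(source, customer_id, detail):
    """Map a source-specific record to one shared schema."""
    return {"source": source, "customer_id": str(customer_id), "detail": detail}


def merge_sources(oracle, files, twitter):
    """Normalize each source and combine everything into one list."""
    merged = [to_common("oracle", r["CUST_ID"], r["PRODUCT"]) for r in oracle]
    merged += [to_common("file", r["customer"], r["item"]) for r in files]
    merged += [to_common("twitter", r["user_id"], r["text"]) for r in twitter]
    return merged


def to_json_lines(records):
    """Serialize records as newline-delimited JSON, ready to be written."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


payload = to_json_lines(merge_sources(oracle_rows, file_rows, tweets))
```

Once every record shares one schema, analytics over buying behavior or interests can treat the three sources as a single dataset.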

Here’s a sample pipeline that can address this use case using Snaps:

Using the File Writer Snap and choosing the Azure Data Lake account as shown below, one can store the data merged from various systems into Azure Data Lake with ease.

All in all, the Data Lake can be a one-stop shop of storage for any data, giving users more ways to derive insights from multiple data sources. And SnapLogic makes it quick and easy for users to move their data into the Data Lake (in this case, an Azure Data Lake Store).

Pavan Venkatesh is Senior Product Manager at SnapLogic. Follow him on Twitter @pavankv.

Big Data Ingestion Patterns: Ingesting Data from Cloud & Ground Sources into Hive

What is Apache Hive? Hive provides a mechanism to query, create and manage large datasets that are stored on Hadoop, using SQL-like statements. It also enables adding structure to existing data that resides on HDFS. In this post I’ll describe a practical approach to ingesting data into Hive with the SnapLogic Elastic Integration Platform, without the need to write code.

SnapLogic CTO James Markarian on DisrupTV

SnapLogic CTO James Markarian recently appeared as a guest on DisrupTV, a weekly live-interview web-series produced by analyst firm Constellation Research and hosted by R “Ray” Wang and Vala Afshar. The trio discussed a variety of enterprise topics including modern data management, data lake strategy considerations and big data analytics.

SnapLogic CEO Gaurav Dhillon on Andreessen Horowitz Podcast

SnapLogic co-founder and CEO Gaurav Dhillon sat down recently with Scott Kupor, managing partner at Andreessen Horowitz, for a wide-ranging podcast discussion of all-things-data.

The two discussed how the data management landscape has changed in recent years, the rise of advanced analytics, the move from data warehouses to data lakes, and other changes which are enabling organizations to “take back their enterprise.”

SnapLogic CTO James Markarian Discusses the Evolving Big Data Landscape on theCUBE

SnapLogic was in New York this week for Strata + Hadoop World NYC, and our CTO James Markarian took the opportunity to sit down with Dave Vellante and George Gilbert, hosts of theCUBE, for a wide-ranging discussion on the shifting big data landscape.

SnapLogic Introduces Intelligent Connectors for Microsoft Azure Data Lake Store

SnapLogic announced the availability of new pre-built intelligent connectors – called Snaps – for Microsoft Azure Data Lake Store. The new Snaps provide fast, self-service data ingestion and transformation from virtually any source – whether on-premises, in the cloud or in hybrid environments – to Microsoft’s highly-scalable, cloud-based repository for big data analytics workloads. This latest integration between SnapLogic and Microsoft Azure helps enterprise customers gain new insights and unlock business value from their cloud-based big data initiatives.

Big Data Game-Changers at Strata + Hadoop World NYC

Next week our team of integration experts will be in New York for Strata + Hadoop World to demonstrate how our big data integration platform as a service (iPaaS) allows customers to quickly ingest, prepare and deliver data to other systems within their IT ecosystems. We are also hosting a networking event for big data game-changers on demystifying data lakes, Hadoop and hybrid architecture.
