Azure Data Platform: Reading and writing data to Azure Blob Storage and Azure Data Lake Store

By Prasad Kona

Organizations are increasingly adopting cloud data and analytics platforms like Microsoft Azure. In this first post in a series on the Azure Data Platform, I’ll get you on your way to easier cloud adoption and data integration.

In this post, I focus on ingesting data into the Azure Cloud Data Platform and demonstrate how to read and write data to Microsoft Azure Storage using SnapLogic.

For those who want to dive right in, my 4-minute step-by-step video “Building a simple pipeline to read and write data to Azure Blob storage” shows how to do exactly that, without writing any code.

What is Azure Storage?

Azure Storage enables you to store terabytes of data, supporting use cases from small workloads to big data analytics. It is highly scalable, highly available, and can handle millions of requests per second on average. Azure Blob Storage is one of the services provided by Azure Storage.

Azure provides two key types of storage for unstructured data: Azure Blob Storage and Azure Data Lake Store.

Azure Blob Storage

Azure Blob Storage stores unstructured object data. A blob can be any type of text or binary data, such as a document or media file. Blob storage is also referred to as object storage.
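
For readers who like to see the moving parts, here is roughly what an upload and download look like in code: a minimal sketch using the azure-storage-blob Python SDK, in which the connection string, container name, and blob name are placeholder assumptions. The rest of this post shows the no-code SnapLogic approach.

    # Minimal sketch: upload and download a blob with the azure-storage-blob SDK.
    # The connection string, container name, and blob name are placeholders.
    from azure.storage.blob import BlobServiceClient

    connection_string = "<your-storage-account-connection-string>"  # from the Azure Portal
    service = BlobServiceClient.from_connection_string(connection_string)

    # A blob can hold any text or binary payload, e.g. a small CSV document.
    blob = service.get_blob_client(container="demo-container", blob="customers.csv")
    blob.upload_blob(b"id,name\n1,Alice\n2,Bob\n", overwrite=True)

    # Read the same blob back as bytes.
    data = blob.download_blob().readall()
    print(data.decode("utf-8"))

The same pattern applies to any binary object, such as images or Parquet files.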

Azure Data Lake Store

Azure Data Lake Store provides the capabilities enterprises look for in storage today. It:

  • Provides additional enterprise-grade security features, such as encryption, and uses Azure Active Directory for authentication and authorization.
  • Is compatible with the Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem, including Azure HDInsight.
  • Lets Azure HDInsight clusters be provisioned and configured to access data stored in Data Lake Store directly.
  • Allows data stored in Data Lake Store to be analyzed with Hadoop analytic frameworks such as MapReduce, Spark, or Hive (see the sketch after this list).
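
Because Data Lake Store exposes an HDFS-compatible interface, analytic engines can address it like any other Hadoop file system. The PySpark sketch below is one way this might look, assuming a cluster with the hadoop-azure-datalake (ADLS Gen1) connector available; the store name, file path, and OAuth property names are assumptions and can vary by connector version.

    # Minimal PySpark sketch: read a CSV file from Azure Data Lake Store over its
    # HDFS-compatible adl:// scheme. All names and credentials are placeholders,
    # and the fs.adl.* property names assume the hadoop-azure-datalake connector.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("adls-read-sketch")
        .config("spark.hadoop.fs.adl.oauth2.access.token.provider.type", "ClientCredential")
        .config("spark.hadoop.fs.adl.oauth2.client.id", "<access-id>")
        .config("spark.hadoop.fs.adl.oauth2.credential", "<secret-key>")
        .config("spark.hadoop.fs.adl.oauth2.refresh.url",
                "https://login.microsoftonline.com/<tenant-id>/oauth2/token")
        .getOrCreate()
    )

    # Read directly from the Data Lake Store path and inspect the result.
    df = spark.read.csv(
        "adl://<store-name>.azuredatalakestore.net/data/customers.csv",
        header=True,
    )
    df.show()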

How do I move my data to the Azure Data Platform?

Let’s look at how you can read from and write to the Azure Data Platform using SnapLogic.

SnapLogic Snaps that support Azure accounts let you choose between an Azure Storage Account and an Azure Data Lake Store account:

[Image: Azure Data Platform 1 – selecting the Azure account type]

Configuring the Azure Storage Account in SnapLogic, shown below, uses the Azure storage account name and access key you get from the Azure Portal:

[Image: Azure Data Platform 2 – Azure Storage Account settings]
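
If you want to sanity-check that account name and access key outside SnapLogic, a quick test with the azure-storage-blob Python SDK might look like the sketch below; both values are placeholders copied from the Azure Portal.

    # Sketch: verify an Azure storage account name and access key by listing containers.
    # Both values are placeholders; in practice, keep the key out of source code.
    from azure.storage.blob import BlobServiceClient

    account_name = "<storage-account-name>"
    account_key = "<storage-account-access-key>"

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=account_key,
    )

    for container in service.list_containers():
        print(container.name)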

Configuring the Azure Data Lake Store Account in SnapLogic, shown below, uses the Azure Tenant ID, Access ID, and Secret Key you get from the Azure Portal:

[Image: Azure Data Platform 3 – Azure Data Lake Store Account settings]
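
The same Tenant ID, Access ID, and Secret Key can be sanity-checked with the azure-datalake-store Python package. The sketch below uses placeholder values; it simply authenticates and lists the root of the store.

    # Sketch: verify Azure Data Lake Store (Gen1) credentials by listing the root folder.
    # Tenant ID, access (client) ID, secret key, and store name are placeholders.
    from azure.datalake.store import core, lib

    token = lib.auth(
        tenant_id="<azure-tenant-id>",
        client_id="<access-id>",
        client_secret="<secret-key>",
    )

    adls = core.AzureDLFileSystem(token, store_name="<data-lake-store-name>")
    print(adls.ls("/"))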

Put together, you have a simple pipeline that reads from and writes to Azure Blob Storage:

[Image: Azure Data Platform 4 – pipeline reading from and writing to Azure Blob Storage]

Here’s the step-by-step video again: “Building a simple pipeline to read and write data to Azure Blob storage.”

In my next blog post, I will describe the approaches to move data from your on-prem databases to Azure SQL Database.

Prasad Kona is an Enterprise Architect at SnapLogic. You can follow him on LinkedIn or Twitter @prasadkona.


SnapLogic CTO James Markarian on DisrupTV

SnapLogic CTO James Markarian recently appeared as a guest on DisrupTV, a weekly live-interview web series produced by analyst firm Constellation Research and hosted by R “Ray” Wang and Vala Afshar. The trio discussed a variety of enterprise topics, including modern data management, data lake strategy considerations, and big data analytics.


SnapLogic CEO Gaurav Dhillon on Andreessen Horowitz Podcast

SnapLogic co-founder and CEO Gaurav Dhillon sat down recently with Scott Kupor, managing partner at Andreessen Horowitz, for a wide-ranging podcast discussion of all-things-data.

The two discussed how the data management landscape has changed in recent years, the rise of advanced analytics, the move from data warehouses to data lakes, and other changes which are enabling organizations to “take back their enterprise.”


A Hadoop Data Lake For Banking: A SnapLogic Story

Last week, part of the SnapLogic team was in New York City for the Strata/Hadoop World conference. It’s one of the largest big data events in the U.S. and has grown steadily larger over recent years. The agenda has shifted a bit as well – from largely academic discussions and how-to presentations by open source committers to real-world case studies by non-ISV enterprises.

With that in mind, I’d like to share a story from one of our enterprise customers. In fact, this customer is a 100+ year-old financial institution. Perhaps not a company that you would associate with the cutting edge of data management technologies… Due to the nature of their industry, I can’t share their name.

Like those of many established companies, this bank’s data processing and storage systems were acquired or added over the years based on the most pressing needs and compliance requirements at the time. The bank ultimately found itself trying to manage an unwieldy mix of 240+ interfaces and applications.

SnapLogic CTO James Markarian Discusses the Evolving Big Data Landscape on theCUBE

SnapLogic was in New York this week for Strata + Hadoop World NYC, and our CTO James Markarian took the opportunity to sit down with Dave Vellante and George Gilbert, hosts of theCUBE, for a wide-ranging discussion on the shifting big data landscape.


SnapLogic Introduces Intelligent Connectors for Microsoft Azure Data Lake Store

SnapLogic announced the availability of new pre-built intelligent connectors – called Snaps – for Microsoft Azure Data Lake Store. The new Snaps provide fast, self-service data ingestion and transformation from virtually any source – whether on-premises, in the cloud, or in hybrid environments – to Microsoft’s highly scalable, cloud-based repository for big data analytics workloads. This latest integration between SnapLogic and Microsoft Azure helps enterprise customers gain new insights and unlock business value from their cloud-based big data initiatives.


New Podcast Episode: Navigating the Data Lake – Tips From a Practitioner

In this episode of the SnapTalk podcast series, enterprise architect Ravi Dharnikota talks with Rakesh Raghavan, Director of Snap Engineering at SnapLogic. Rakesh comes to SnapLogic having designed, developed and managed data lakes for several leading online retailers and consumer-facing websites. He has successfully navigated enterprise data lakes using open source tools and manual techniques, and in this episode shares his first-hand experiences.

Ravi and Rakesh discuss the pitfalls of jumping into a data lake without a clear architecture, the challenges of supporting both traditional reporting and ad hoc data exploration use cases in the same environment, and the often-overlooked, often manual data engineering tasks involved in data lake implementation.

Subscribe to the series: https://soundcloud.com/snaplogic/sets/snaptalk