Azure Data Platform: Reading and writing data to Azure Blob Storage and Azure Data Lake Store

By Prasad Kona

Organizations are increasingly adopting cloud data and analytics platforms such as Microsoft Azure. In this first post in a series on the Azure Data Platform, I’ll help you get started on making cloud adoption and data integration easier.

In this post, I focus on ingesting data into the Azure Data Platform and demonstrate how to read data from and write data to Microsoft Azure Storage using SnapLogic.

For those who want to dive right in, my 4-minute step-by-step video “Building a simple pipeline to read and write data to Azure Blob storage” shows how to build the pipeline without writing any code.

What is Azure Storage?

Azure Storage lets you store terabytes of data for use cases ranging from small workloads to big data. It is highly scalable and highly available, and it can handle millions of requests per second. Azure Blob Storage is one of the services that Azure Storage provides.

Azure provides two key types of storage for unstructured data: Azure Blob Storage and Azure Data Lake Store.

Azure Blob Storage

Azure Blob Storage stores unstructured object data. A blob can be any type of text or binary data, such as a document or media file. Blob storage is also referred to as object storage.
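Every blob is addressed by three parts: the storage account, a container, and the blob name. As a minimal sketch, assuming the public-cloud endpoint format `https://<account>.blob.core.windows.net/<container>/<blob>`, the Python snippet below splits a blob URL into those parts. The names `parse_blob_url` and `BlobAddress` are hypothetical and not part of any Azure SDK.

```python
from typing import NamedTuple
from urllib.parse import unquote, urlparse


class BlobAddress(NamedTuple):
    account: str
    container: str
    blob: str


def parse_blob_url(url: str) -> BlobAddress:
    """Split a public-cloud Blob Storage URL into account, container, and blob name."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    suffix = ".blob.core.windows.net"  # assumption: public Azure cloud endpoint
    if parsed.scheme not in ("http", "https") or not host.endswith(suffix):
        raise ValueError(f"not an Azure Blob Storage URL: {url}")
    account = host[: -len(suffix)]
    # The first path segment is the container; everything after it is the blob name.
    container, _, blob = parsed.path.lstrip("/").partition("/")
    if not account or not container or not blob:
        raise ValueError(f"URL must include an account, a container, and a blob name: {url}")
    return BlobAddress(account, container, unquote(blob))
```

In a standard Blob Storage account the namespace is flat, so `parse_blob_url("https://myaccount.blob.core.windows.net/raw/sales/2017/orders.csv")` treats `sales/2017/orders.csv` as a single blob name; the slashes only act as virtual folders.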

Azure Data Lake Store

Azure Data Lake Store offers features that enterprises look for in storage today. It:

  • Provides additional enterprise-grade security features, such as encryption, and uses Azure Active Directory for authentication and authorization.
  • Is compatible with Hadoop Distributed File System (HDFS) and works with the Hadoop ecosystem including Azure HDInsight.
  • Supports Azure HDInsight clusters that are provisioned and configured to access data stored in Data Lake Store directly.
  • Allows data stored in Data Lake Store to be easily analyzed using Hadoop analytic frameworks such as MapReduce, Spark, or Hive.
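Because Data Lake Store is HDFS-compatible, Hadoop tools address it with `adl://` URIs, and the service also exposes a WebHDFS-compatible REST API. The sketch below builds both forms for a given path, assuming the public-cloud host suffix `azuredatalakestore.net`. `adls_uris` is a hypothetical helper, and an actual REST call would also need an Azure Active Directory bearer token.

```python
from typing import Dict
from urllib.parse import quote, urlencode


def adls_uris(account: str, path: str, op: str = "LISTSTATUS") -> Dict[str, str]:
    """Build the Hadoop (adl://) URI and the WebHDFS-style REST URL for a Data Lake Store path."""
    host = f"{account}.azuredatalakestore.net"  # assumption: public Azure cloud
    # Normalize to a single leading slash, then percent-encode while keeping "/" separators.
    encoded = quote("/" + path.strip("/"), safe="/")
    return {
        "hadoop": f"adl://{host}{encoded}",
        "webhdfs": f"https://{host}/webhdfs/v1{encoded}?{urlencode({'op': op})}",
    }
```

For example, `adls_uris("mydatalake", "/clickstream/2017/events.json", "OPEN")` yields a URI that Spark or Hive can read directly and a REST URL for the same file.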

How do I move my data to the Azure Data Platform?

Let’s look at how you can read from and write to the Azure Data Platform using SnapLogic.

For SnapLogic Snaps that support Azure accounts, you can choose either an Azure Storage Account or an Azure Data Lake Store account:

[Screenshot: choosing an Azure Storage Account or Azure Data Lake Store account in a Snap]

To configure an Azure Storage Account in SnapLogic, enter the storage account name and access key from the Azure Portal, as shown below:
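SnapLogic collects these two values in its account form. Outside SnapLogic, the same pair is commonly combined into an Azure Storage connection string, which Azure tools and SDKs accept (for example, `BlobServiceClient.from_connection_string` in the `azure-storage-blob` Python package). A minimal sketch, assuming the public Azure endpoint suffix `core.windows.net`; `storage_connection_string` is a hypothetical helper:

```python
import base64
import binascii
import re


def storage_connection_string(account_name: str, access_key: str,
                              endpoint_suffix: str = "core.windows.net") -> str:
    """Assemble an Azure Storage connection string from an account name and access key."""
    # Storage account names are 3-24 characters of lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]{3,24}", account_name):
        raise ValueError("invalid storage account name")
    # Access keys from the portal are Base64-encoded; reject empty or non-Base64 input.
    if not access_key:
        raise ValueError("access key is empty")
    try:
        base64.b64decode(access_key, validate=True)
    except binascii.Error as exc:
        raise ValueError("access key is not valid Base64") from exc
    parts = {
        "DefaultEndpointsProtocol": "https",
        "AccountName": account_name,
        "AccountKey": access_key,
        "EndpointSuffix": endpoint_suffix,  # assumption: public Azure cloud
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

The access key grants full access to the storage account, so keep the resulting string in a secret store rather than in source code.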

[Screenshot: Azure Storage Account settings in SnapLogic]

To configure an Azure Data Lake Store account in SnapLogic, enter the Azure Tenant ID, Access ID, and Secret Key from the Azure Portal, as shown below:
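Behind the scenes, these three values drive the standard OAuth 2.0 client-credentials flow against Azure Active Directory. My reading of the field names is that Access ID is the application's client ID and Secret Key is its client secret. The sketch below builds that token request with Python's standard library without sending it, assuming the Azure AD v1 token endpoint and the Data Lake Store resource URI `https://datalake.azure.net/`; `adls_token_request` is a hypothetical helper.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: resource URI for Data Lake Store with the Azure AD v1 token endpoint.
ADLS_RESOURCE = "https://datalake.azure.net/"


def adls_token_request(tenant_id: str, client_id: str,
                       client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an Azure AD client-credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # assumed to be SnapLogic's "Access ID"
        "client_secret": client_secret,  # assumed to be SnapLogic's "Secret Key"
        "resource": ADLS_RESOURCE,
    }).encode("ascii")
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen`; the JSON response carries an `access_token` that goes into an `Authorization: Bearer` header on Data Lake Store REST calls.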

[Screenshot: Azure Data Lake Store account settings in SnapLogic]

Putting these together gives you a simple pipeline that reads from and writes to Azure Blob Storage:
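SnapLogic builds this pipeline without code. For comparison, a rough Python equivalent of the same read, transform, and write steps might look like the sketch below. It assumes the `azure-storage-blob` (v12) client methods `get_blob_client`, `download_blob`, and `upload_blob`, and it is duck-typed so it accepts any object with those methods. `copy_with_transform` is a hypothetical name, and because it reads the whole blob into memory it suits small files only.

```python
from typing import Any, Callable


def copy_with_transform(service: Any, src_container: str, src_blob: str,
                        dst_container: str, dst_blob: str,
                        transform: Callable[[bytes], bytes]) -> int:
    """Read one blob, apply a transform, write the result to another blob.

    `service` is expected to behave like azure.storage.blob.BlobServiceClient.
    Returns the number of bytes written.
    """
    # Read step: download the entire source blob into memory.
    source = service.get_blob_client(container=src_container, blob=src_blob)
    data = source.download_blob().readall()
    # Transform step: stands in for whatever Snaps sit between the reader and writer.
    result = transform(data)
    # Write step: create the target blob, overwriting it if it already exists.
    target = service.get_blob_client(container=dst_container, blob=dst_blob)
    target.upload_blob(result, overwrite=True)
    return len(result)
```

With a real client, you would create `service` with `BlobServiceClient.from_connection_string(...)` and call, for example, `copy_with_transform(service, "raw", "orders.csv", "curated", "orders.csv", bytes.upper)`.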

[Screenshot: pipeline that reads from and writes to Azure Blob Storage]

Here’s the step-by-step video again: “Building a simple pipeline to read and write data to Azure Blob storage”

In my next blog post, I will describe approaches for moving data from your on-premises databases to Azure SQL Database.

Prasad Kona is an Enterprise Architect at SnapLogic. You can follow him on LinkedIn or Twitter @prasadkona.