Get your game plan on: Data warehouse migration to the cloud

By Ravi Dharnikota

You’ve decided to move your data warehouse to the cloud, and want to get started. Great! It’s easy to see why – in addition to the core benefits I wrote about in my last blog post, there are many more benefits associated with cloud data warehousing: incredibly fast processing, speedy deployment, built-in fault tolerance and disaster recovery, and, depending on your cloud provider, strong security and governance.

A six-step reality check

But before you get too excited, it’s time for a reality check: moving an existing data warehouse to the cloud is not quick, and it isn’t easy. It is definitely not as simple as exporting data from one platform and loading it into another. Data is only one of the six warehouse components to be migrated.

Tactically and technically, data warehouse migration is an iterative process and needs many steps to migrate all of the components, as illustrated below. Here’s everything you need to consider in migrating your data warehouse to the cloud.

1) Migrating schema: Before moving warehouse data, you’ll need to migrate table structures and specifications. You may need to make structural changes as part of the migration; indexing and partitioning strategies, in particular, may need to be rethought for the new platform.

[Figure: Data Warehouse Migration Process]
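
To make step 1 concrete, here is a minimal Python sketch of schema translation. The type map is purely illustrative (it assumes an Oracle-style source); a real migration would lean on the target platform’s documentation or a conversion utility:

```python
# Minimal sketch: translate legacy column types to a cloud warehouse dialect.
# The type map is illustrative only; consult your target platform's docs.

LEGACY_TO_CLOUD_TYPES = {
    "NUMBER": "NUMERIC",
    "VARCHAR2": "VARCHAR",
    "DATE": "TIMESTAMP",
    "CLOB": "TEXT",
}

def translate_column(name: str, legacy_type: str) -> str:
    """Map one column definition to the target dialect."""
    cloud_type = LEGACY_TO_CLOUD_TYPES.get(legacy_type.upper(), legacy_type)
    return f"{name} {cloud_type}"

def translate_table(table: str, columns: list) -> str:
    """Emit a CREATE TABLE statement for the cloud platform. Indexes and
    partitions are deliberately omitted -- they usually need to be
    rethought on the new platform, not copied."""
    cols = ",\n  ".join(translate_column(n, t) for n, t in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

print(translate_table("sales_fact",
                      [("sale_id", "NUMBER"), ("sold_at", "DATE")]))
```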

2) Migrating data: Moving very large volumes of data is process intensive, network intensive, and time-consuming. You’ll need to map out how long the migration will take and whether you can accelerate the process. If you restructure tables as part of schema migration, will you need to transform data as part of the data migration? Can you transform in-stream, or should you pre-process and then migrate?
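
One way to tackle both the timing question and the in-stream transformation question is to export in parallelizable chunks. A minimal sketch, assuming a standard DB-API cursor and an invented file-naming scheme (your bulk loader will dictate the real staging format):

```python
# Minimal sketch of a chunked, in-stream migration: export a large table in
# batches, optionally transform rows mid-flight, and stage compressed CSV
# files for the cloud warehouse's bulk loader.

import csv
import gzip

CHUNK_ROWS = 100_000

def migrate_table(cursor, table: str, transform=None):
    """Stream one table out in CHUNK_ROWS batches to gzipped CSV files."""
    cursor.execute(f"SELECT * FROM {table}")
    part = 0
    while True:
        rows = cursor.fetchmany(CHUNK_ROWS)
        if not rows:
            break
        if transform is not None:        # transform in-stream, if needed
            rows = [transform(row) for row in rows]
        with gzip.open(f"{table}.part{part:05d}.csv.gz", "wt", newline="") as f:
            csv.writer(f).writerows(rows)
        part += 1                        # each part can be loaded in parallel
```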

3) Migrating ETL: Moving data may be the easy part compared to migrating ETL processes. You may need to change the code base to optimize for platform performance, and change data transformations to sync with data restructuring. You’ll need to determine whether data flows should remain intact or be reorganized. As part of the migration, you may need to reduce data latency and deliver near real-time data. If that’s the case, would it make sense to migrate ETL processing to the cloud as well? Is there a utility to convert your ETL code?

4) Rebuilding data pipelines: With any substantive change to data flow or data transformation, rebuilding data pipelines may be a better choice than migrating existing ETL. You may be able to isolate individual data transformations and package them as executable modules. You’ll need to understand the dependencies among data transformations to construct an optimal workflow, and to weigh the advantages you may gain – performance, agility, reusability, and maintainability – by rebuilding ETL as modular data pipelines using modern, cloud-friendly technology.
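
Here is one way the “executable modules” idea can look in practice: each transformation is a small, testable function, and the workflow order is derived from an explicit dependency graph rather than hand-wired. The step names and shared context are invented for illustration:

```python
# Minimal sketch of ETL rebuilt as modular steps with explicit dependencies;
# graphlib (stdlib, Python 3.9+) derives the execution order.

from graphlib import TopologicalSorter

# Each step reads and writes a shared context instead of living inside one
# monolithic ETL job, so steps are reusable and testable in isolation.
def extract_orders(ctx): ctx["orders"] = [{"id": 1, "amount": 120.0}]
def clean_orders(ctx):   ctx["orders"] = [o for o in ctx["orders"] if o["amount"] > 0]
def load_orders(ctx):    print(f"loading {len(ctx['orders'])} orders")

# Declare what each step depends on; the workflow order is computed, not guessed.
DEPS = {clean_orders: {extract_orders}, load_orders: {clean_orders}}

ctx = {}
for step in TopologicalSorter(DEPS).static_order():
    step(ctx)
```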

5) Migrating metadata: Source-to-target metadata is a crucial part of managing a data warehouse; knowing data lineage is critical for tracing and troubleshooting when problems occur. How readily will this metadata transfer to a new cloud platform? Are all of the mappings, transform logic, dataflows, and workflows locked in proprietary tools or buried in SQL code? You’ll need to determine whether you can export and import the metadata directly, or whether you’ll have to reverse engineer it or rebuild it from scratch.
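
One pragmatic hedge against locked-in lineage is to keep source-to-target mappings in a plain, portable format that any platform can read. A minimal sketch, with illustrative field names:

```python
# Minimal sketch: store source-to-target mappings as portable JSON records
# so lineage survives a platform change. Field names are illustrative.

import json

mappings = [
    {
        "source": "legacy_dw.sales.sale_amt",
        "target": "cloud_dw.sales_fact.amount",
        "transform": "CAST(sale_amt AS NUMERIC(12,2))",
        "workflow": "nightly_sales_load",
    },
]

with open("lineage.json", "w") as f:
    json.dump(mappings, f, indent=2)

# Tracing a problem field back to its source becomes a simple lookup.
by_target = {m["target"]: m for m in mappings}
print(by_target["cloud_dw.sales_fact.amount"]["source"])
```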

6) Migrating users and applications: The final step in the process is migrating users and applications to the new cloud data warehouse without interrupting business operations. Security and access authorizations may need to be created or changed, and BI and analytics tools will need to be reconnected. Plan the communication this requires: who needs to know what, and when?

Don’t try to do everything at once

A typical enterprise data warehouse contains a large amount of data describing many business subject areas. Migrating an entire data warehouse in a single pass is usually not realistic, so incremental migration is the smart approach when a “big bang” migration isn’t practical. It becomes a must when you’re undertaking significant design changes as part of the effort.

However, incremental migration brings new considerations. Data location should be transparent from the user’s point of view throughout the period when some data resides in the legacy data warehouse and some in the new cloud data warehouse. Consider a virtual layer as a single point of access that decouples queries from data storage location.
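
A minimal sketch of such a virtual layer: a routing map decides, per table, whether a query runs against the legacy warehouse or the cloud one, so callers never need to know where the data currently lives. The connection objects and table names are placeholders:

```python
# Minimal sketch of a virtual access layer during incremental migration:
# route each query to whichever platform currently holds the table.

MIGRATED = {"sales_fact", "customer_dim"}   # tables already moved to the cloud

def run_query(sql: str, table: str, legacy_conn, cloud_conn):
    """Execute sql on the platform that currently owns the table."""
    conn = cloud_conn if table in MIGRATED else legacy_conn
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()
```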

A hybrid strategy is another viable option. With a hybrid approach, your on-premises data warehouse keeps running while the cloud data warehouse comes online. During this transition phase, you’ll need to synchronize the data between the old on-premises data warehouse and the new one in the cloud.
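
A common way to handle that synchronization is incremental, high-water-mark replication. A minimal sketch, assuming each table carries an updated_at column (a real job would MERGE/upsert rather than blindly INSERT, and the %s paramstyle depends on your database driver):

```python
# Minimal sketch of hybrid-phase sync: push only the rows changed since the
# last run from the on-premises warehouse to its cloud copy.

def sync_table(table: str, legacy_conn, cloud_conn, last_sync):
    """Incrementally copy rows changed since last_sync to the cloud."""
    src = legacy_conn.cursor()
    src.execute(f"SELECT * FROM {table} WHERE updated_at > %s", (last_sync,))
    rows = src.fetchall()
    if rows:
        dst = cloud_conn.cursor()
        marks = ", ".join(["%s"] * len(rows[0]))
        dst.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        cloud_conn.commit()
    return len(rows)   # rows synced this cycle
```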

Cloud migration tools to the rescue

The good news is that there are many tools and services that can be invaluable when migrating your legacy data warehouse to the cloud. In my next post, the third and final in this series, I’ll explore tools for data integration, data warehouse automation, and data virtualization, as well as system integrator resources that can speed up and de-risk the process.

Learn more at SnapLogic’s upcoming webcast, “Traditional Data Warehousing is Dead: How digital enterprises are scaling their data to infinity and beyond in the Cloud,” on Wednesday, August 16 at 9:00am PT. I’ll be presenting with Dave Wells, Data Management Practice Lead, Eckerson Group, and highlighting tangible business benefits that your organization can achieve by moving your data to the cloud. You’ll learn:

      • Practical best practices, key technologies to consider, and case studies to get you started
      • The potential pitfalls of “cloud-washed” legacy data integration solutions
      • Cloud data warehousing market trends
      • How SnapLogic’s Enterprise Integration Cloud delivers up to a 10X improvement in the speed and ease of data integration

Sign up today!

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1

The commoditization of integration

By Dinesh Chandrasekhar

Eight years ago, dozens of integration vendors were offering scores of solutions, all with what seemed to be the same capabilities. Pick any ESB or ETL tool and each seemed to perform the same functions as their competitors. RFPs were no longer a viable way to weed out the inferior vendors as each solution checked all the boxes across the board. Plus, all vendors were ready to lower their prices at the drop of a hat to win your business. It was at this time that the integration market had truly reached a level of commoditization. Consumers could easily pick and choose any solution as there were no true differentiators amongst them.

But, several factors have changed the landscape since then:

  • NoESB – The NoESB architecture had started gaining interest, pushing the idea that an ESB is irrelevant for many integration scenarios. Yet an API gateway was not the right alternative either.
  • Cloudification – The cloudification of pretty much all your favorite on-premises enterprise applications began around the same time. Enterprises that were thinking of a digital transformation couldn’t get too far without a definitive cloud strategy in place.
  • Convergence of ESB and ETL – The lines between application integration and data integration were blurring. CIOs and IT managers didn’t want to deal with two different sets of integration tools. With the onset of mobile and IoT, data volumes were exploding daily, and even data warehouses moved to the cloud. Traditional/legacy ESB and ETL tools were simply unfit to serve such big data needs.
  • Agile Integrations – Finally, the DevOps and Agile movements impacted enterprise integration initiatives as well. They gave rise to new user personas in the enterprise – citizen integrators, or citizen developers. These are the LOB managers or other non-IT personnel who needed quick integrations within their applications to render their data in different views. Reliance on IT to deliver solutions to the business was becoming a major hindrance.

All these factors have influenced the iPaaS (Integration Platform as a Service) market. Now, thousands of companies are already leveraging iPaaS solutions to integrate their cloud and on-premises solutions. iPaaS solutions break away from legacy approaches to integration: they are cloud-native, intuitive, fast, and self-starting, support hybrid architectures, and offer connectors to a wide range of on-premises and cloud applications.

Now comes the big question – “Will iPaaS solutions be commoditized, too?” At the moment, the answer is a definite NO, and there are multiple reasons why. Beyond scale, latency, tenancy, SLAs, the number of connectors, and so on, one of the key areas that will differentiate iPaaS solutions is the developer experience. The user interface of the solution will determine the adoption rate and the value it brings to the enterprise. For a citizen integrator to actually use the system, the interface should be intuitive enough to guide them in building their integration flows quickly, effectively, and, most importantly, without the assistance of IT. This alone will make or break the system’s adoption.

iPaaS vendors are trying to enhance this developer experience with features like drag-and-drop connectors, pipeline snippets, a templates library, a starter kit, and mapping enhancements. However, very few vendors offer AI-driven tooling that can intelligently predict the next step in your integration flow, based on learnings from hundreds of other users. AI assist is a great benefit for citizen integrators, who may be non-technical, and even technically savvy developers welcome a significant boost in their productivity. With innovations like this happening, the iPaaS space is quite far from being commoditized. However, enterprises still need to be wary of cloud-washing iPaaS vendors that offer “1000+” connectors, a thick-client IDE, or an ESB wrapped in a cloud blanket. And that is a post for a different day!
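
This is not how any particular vendor implements it, but the core idea behind that kind of AI assist can be sketched in a few lines: mine historical flows from many users and suggest the most frequent next step. The step names below are invented:

```python
# Toy sketch of next-step prediction for integration flows: count which
# step most often follows the current one across historical pipelines.

from collections import Counter, defaultdict

historical_flows = [
    ["salesforce_read", "filter", "mapper", "redshift_write"],
    ["salesforce_read", "mapper", "redshift_write"],
    ["oracle_read", "mapper", "s3_write"],
]

next_steps = defaultdict(Counter)
for flow in historical_flows:
    for current, following in zip(flow, flow[1:]):
        next_steps[current][following] += 1

def suggest(current_step: str, k: int = 2):
    """Return the k most common next steps seen after current_step."""
    return [step for step, _ in next_steps[current_step].most_common(k)]

print(suggest("salesforce_read"))   # e.g. ['filter', 'mapper']
```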

Dinesh Chandrasekhar is Director of Product Marketing at SnapLogic. Follow him on Twitter @AppInt4All.

Webinar: Introduction to iPaaS – Drivers, Requirements and Use Cases

If you have heard the term “iPaaS” but still aren’t quite sure what it means, join us tomorrow, Wednesday, July 20th at 10am PST, for a webinar in partnership with Synerzip to hear more about this increasingly recognized term and why you might be ready to adopt an integration platform as a service (iPaaS) solution.


Webinar: 5 Critical Things to Understand About Modern Data Integration

Data integration is not optional. It is a fundamental technology that binds systems and data together to drive the business. The importance of data integration is self-evident. However, in the changing world of IT, the path to effective data integration approaches and technology seems to be out of reach for even the most innovative and well-funded enterprises. The gap seems to be more about understanding than capabilities. Let’s fix that problem.


The 3 A’s of Enterprise Integration

This post originally appeared on Data Informed.

As organizations look to increase their agility, IT and lines of business need to connect faster. Companies need to adopt cloud applications more quickly and they need to be able to access and analyze all their data, whether from a legacy data warehouse, a new SaaS application, or an unstructured data source such as social media. In short, a unified integration platform has become a critical requirement for most enterprises.

According to Gartner, “unnecessarily segregated application and data integration efforts lead to counterproductive practices and escalating deployment costs.”

Don’t let your organization get caught in that trap. Whether you are evaluating what you already have or shopping for something completely new, you should measure any platform by how well it addresses the “three A’s” of integration: Anything, Anytime, Anywhere.

Collaborations in Building Hybrid Cloud Computing and Data Integrations

Post first published by Ravi Dharnikota on LinkedIn.

It’s one thing to create application and data integrations; it’s an even bigger challenge to collaborate with other teams in the enterprise to reuse, repurpose, and standardize on what has already been built.

The need for seamless content collaboration is a key ingredient for overall success in app and data integrations, just as it is in app development and delivery. A platform that allows employees to share information easily is the difference between a platform being adopted throughout the enterprise and becoming shelf-ware.

Webinar: Get to the Cloud and Big Data Faster with Modern Data Integration

Has your legacy data extraction, transformation and loading (ETL) technology become a barrier to cloud and big data adoption?

If so, join the data integration experts from SnapLogic and INTRICITY next week on Thursday, January 21st for an interactive webinar focused on why it’s time to re-think your integration layer and how to get to the cloud and big data faster with a modern platform. This webinar will introduce our Integration Modernization Assessment, which is specifically designed to help enterprise IT organizations achieve greater agility by moving away from legacy data and application integration technologies.