The bigger picture: Strategizing your data warehouse migration

By Ravi Dharnikota

If your organization is moving its data warehouse to the cloud, you can be confident you’re in good company. And if you read my last blog post about the six-step migration process, you can be even more confident that the move will go smoothly. However, don’t pull the trigger just yet. You’ve got a bit more planning to do, this time at a more strategic level.

First, let’s recap the data warehouse migration process itself, which I covered in my last post. In that post, I broke down all the components of this diagram:

Data Warehouse Migration Process

Now, as you can see in the diagram below, the data warehouse migration process itself is part of a bigger picture of migration planning and strategy. Let’s take a look at the important pre-migration steps you can take to help ensure the migration itself succeeds.

Migration Strategy and Planning

Step 1: Define Goals and Business Case. Start the planning process with a clear picture of the business reasons for migrating your data warehouse to the cloud. Common goals include:

  • Agility in terms of both the business and the IT organization’s data warehousing projects.
  • Performance on the back end, to ensure timeliness and availability of data, and on the front end, for fast end-user query response times.
  • Growth and headroom to ease capacity planning, a chronic problem that the elastic scalability of cloud resources mitigates.
  • Cost savings on hardware, software, services, space, and utilities.
  • Labor savings from reduced needs for database administration, systems administration, scheduling and operations, and maintenance and support.

Step 2: Assess the current data warehouse architecture. If the current architecture is sound, you can plan to migrate to the cloud without redesign or restructuring. If it is architecturally sufficient for BI but limited for advanced analytics and big data integration, review and refine data models and processes as part of the migration effort. If the current architecture struggles to meet even current BI requirements, plan to redesign it as you migrate to the cloud.

Step 3: Define the migration strategy. A “lift and shift” approach is tempting, but it rarely succeeds. Changes are typically needed to adapt data structures, improve processing, and ensure compatibility with the chosen cloud platform. Incremental migration is more common and usually more successful.

As I mentioned in my last blog post, a hybrid strategy is another viable option. Here, your on-premises data warehouse can remain operating as the cloud data warehouse comes online. During this transition phase, you’ll need to synchronize the data between the old on-premises data warehouse and the new one that’s in the cloud.

Step 4: Select the technology, including the cloud platform you’ll migrate to and the tools you’ll need for the migration. Many types of tools and services can be valuable:

  • Data integration tools are used to build or rebuild ETL processes to populate the data warehouse. Integration platform as a service (iPaaS) technology is especially well suited for ETL migration.
  • Data warehouse automation tools like WhereScape can be used to deconstruct legacy ETL, reverse engineer and redesign ETL processes, and regenerate ETL processes without the need to reconstruct data mappings and transformation logic.
  • Data virtualization tools such as Denodo provide a virtual layer of data views to support queries that are independent of storage location and adaptable to changing data structures.
  • System integrators and service providers like Atmosera can be helpful when manual effort is needed to extract data mappings and transformation logic that is buried in code.

Used individually or in combination, these tools and services can make a remarkable difference, speeding and de-risking the migration process.

Step 5: Migrate and operationalize; start by defining test and acceptance criteria. Plan the testing, then execute the migration process to move schema, data, and processing. Execute the test plan and, when successful, operationalize the cloud data warehouse and migrate users and applications.
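To make the test and acceptance criteria concrete, here is a minimal sketch of one kind of check you might automate: comparing row counts and a simple business-level aggregate between the legacy and cloud warehouses. The cursors, tables, and columns are placeholders, not part of any specific product.

```python
# Minimal sketch of one acceptance check: compare row counts and a simple
# business-level aggregate between the legacy and cloud warehouses.
# The cursors, tables, and columns are placeholders for your own schema and drivers.

CHECKS = {
    "sales_fact":   "SELECT COUNT(*), SUM(amount_usd) FROM sales_fact",
    "customer_dim": "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM customer_dim",
}

def run_acceptance_checks(legacy_cur, cloud_cur):
    """Return the list of tables whose checks disagree between the two systems."""
    failures = []
    for table, sql in CHECKS.items():
        legacy_cur.execute(sql)
        cloud_cur.execute(sql)
        if legacy_cur.fetchone() != cloud_cur.fetchone():
            failures.append(table)
    return failures   # an empty list means this acceptance check passed
```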

Learn more at SnapLogic’s upcoming webinar

To get the full story on data warehouse cloud migration, join me for an informative SnapLogic webinar, “Traditional Data Warehousing is Dead: How digital enterprises are scaling their data to infinity and beyond in the Cloud,” on Wednesday, August 16 at 9:00am PT. I’ll be presenting with Dave Wells, Leader of Data Management Practice, Eckerson Group, and highlighting tangible business benefits that your organization can achieve by moving your data to the cloud. You’ll learn:

  • Practical best practices, key technologies to consider, and case studies to get you started
  • The potential pitfalls of “cloud-washed” legacy data integration solutions
  • Cloud data warehousing market trends
  • How SnapLogic’s Enterprise Integration Cloud delivers up to a 10X improvement in the speed and ease of data integration

Register today!

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1

Get your game plan on: Data warehouse migration to the cloud

By Ravi Dharnikota

You’ve decided to move your data warehouse to the cloud, and want to get started. Great! It’s easy to see why – in addition to the core benefits I wrote about in my last blog post, there are many more benefits associated with cloud data warehousing: incredibly fast processing, speedy deployment, built-in fault tolerance and disaster recovery, and, depending on your cloud provider, strong security and governance.

A six-step reality check

But before you get too excited, it’s time to take a reality check; moving an existing data warehouse to the cloud is not quick, and it isn’t easy. It is definitely not as simple as exporting data from one platform and loading to another. Data is only one of the six warehouse components to be migrated.

Tactically and technically, data warehouse migration is an iterative process and needs many steps to migrate all of the components, as illustrated below. Here’s everything you need to consider in migrating your data warehouse to the cloud.

1) Migrating schema: Before moving warehouse data, you’ll need to migrate table structures and specifications. You may also need to make structural changes as part of the migration; for example, do indexing and partitioning schemes need to be rethought for the new platform?

Data Warehouse Migration Process
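As a concrete illustration of schema migration, here is a minimal sketch that regenerates a table definition for a cloud target, replacing legacy index and partition choices with the distribution and sort keys used by Redshift-style warehouses. The table, columns, and key choices are hypothetical.

```python
# Minimal sketch: regenerate a fact-table definition for a cloud target, replacing
# legacy indexes/partitions with distribution and sort keys (Redshift-style syntax
# shown for illustration). Table, columns, and key choices are hypothetical.

SOURCE_COLUMNS = [            # (column name, portable type)
    ("sale_id",    "BIGINT"),
    ("store_id",   "INTEGER"),
    ("sold_at",    "TIMESTAMP"),
    ("amount_usd", "DECIMAL(12,2)"),
]

def cloud_ddl(table, columns, dist_key, sort_key):
    """Emit a CREATE TABLE statement with the keys that replace legacy tuning choices."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n    {cols}\n)\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({sort_key});"
    )

print(cloud_ddl("sales_fact", SOURCE_COLUMNS, "store_id", "sold_at"))
```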

2) Migrating data: Moving very large volumes of data is process intensive, network intensive, and time-consuming. You’ll need to map out how long the migration will take and whether you can accelerate the process. Will you need to restructure as part of schema migration and transform data as part of the data migration? Alternatively, can you transform in-stream, or should you pre-process the data and then migrate it?
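One common pattern for the bulk move – sketched below under the assumption of an S3 staging area and a COPY-style bulk load – is to extract data in compressed chunks, stage them to object storage, and let the warehouse load them in parallel. The bucket, IAM role, and table names are placeholders.

```python
# Minimal sketch: stage extracted rows to S3 as compressed chunks, then issue a
# bulk COPY so the warehouse loads them in parallel. The bucket, IAM role, and
# table names are placeholders.
import csv, gzip, io
import boto3

BUCKET = "my-migration-staging"                              # hypothetical bucket
IAM_ROLE = "arn:aws:iam::123456789012:role/warehouse-copy"   # placeholder role

def stage_chunk(rows, key):
    """Write one chunk of rows to S3 as gzipped CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    body = gzip.compress(buf.getvalue().encode("utf-8"))
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=body)

def copy_statement(table, prefix):
    """Redshift-style COPY that pulls every staged chunk under the prefix."""
    return (
        f"COPY {table} FROM 's3://{BUCKET}/{prefix}' "
        f"IAM_ROLE '{IAM_ROLE}' FORMAT AS CSV GZIP;"
    )

# Usage: call stage_chunk(batch, f"sales/part-{n:04d}.csv.gz") for each extracted
# batch, then run copy_statement("sales_fact", "sales/") against the warehouse.
```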

3) Migrating ETL: Moving data may be the easy part when compared to migrating ETL processes. You may need to change the code base to optimize for platform performance and change data transformations to sync with data restructuring. You’ll need to determine if data flows should remain intact or be reorganized. As part of the migration, you may need to reduce data latency and deliver near real-time data. If that’s the case, would it make sense to migrate ETL processing to the cloud, as well? Is there a utility to convert your ETL code?

4) Rebuilding data pipelines: With any substantive change to data flow or data transformation, rebuilding data pipelines may be a better choice than migrating existing ETL. You may be able to isolate individual data transformations and package them as executable modules. You’ll need to understand the dependencies among data transformations to construct the optimum workflow, and to weigh the advantages you may gain – performance, agility, reusability, and maintainability – by rebuilding ETL as modular data pipelines using modern, cloud-friendly technology.
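To show what packaging transformations as executable modules can look like in practice, here is a minimal sketch of small, reusable transformation steps composed into a pipeline. The step and field names are purely illustrative.

```python
# Minimal sketch: individual transformations packaged as small, reusable steps and
# composed into a pipeline. Step names and field names are purely illustrative.
from functools import reduce

def clean_currency(row):
    row["amount_usd"] = round(float(row["amount_usd"]), 2)
    return row

def add_order_year(row):
    row["order_year"] = int(row["sold_at"][:4])   # assumes ISO-8601 timestamps
    return row

def build_pipeline(*steps):
    """Compose steps left to right; each step takes and returns a row dict."""
    return lambda row: reduce(lambda acc, step: step(acc), steps, row)

pipeline = build_pipeline(clean_currency, add_order_year)
print(pipeline({"amount_usd": "19.991", "sold_at": "2017-08-16T09:00:00"}))
# {'amount_usd': 19.99, 'sold_at': '2017-08-16T09:00:00', 'order_year': 2017}
```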

5) Migrating metadata: Source-to-target metadata is a crucial part of managing a data warehouse; it is what lets you know data lineage and trace and troubleshoot when problems occur. How readily will this metadata transfer to a new cloud platform? Are all of the mappings, transform logic, dataflow, and workflow locked in proprietary tools or buried in SQL code? You’ll need to determine whether you can export and import this metadata directly, or whether you’ll have to reverse engineer it or rebuild it from scratch.
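As a small illustration, source-to-target mappings can be kept as plain, portable metadata so they survive the move and stay queryable for lineage questions. The format and field names below are assumptions, not a standard.

```python
# Minimal sketch: keep source-to-target mappings as plain, portable metadata
# (here JSON) rather than locked in a proprietary tool. All names are illustrative.
import json

mappings = [
    {
        "source": "legacy_dw.ORDERS.ORD_AMT",
        "target": "cloud_dw.sales_fact.amount_usd",
        "transform": "CAST(ORD_AMT AS DECIMAL(12,2))",
    },
    {
        "source": "legacy_dw.ORDERS.ORD_TS",
        "target": "cloud_dw.sales_fact.sold_at",
        "transform": "ORD_TS converted to UTC",
    },
]

with open("source_to_target_mappings.json", "w") as fh:
    json.dump(mappings, fh, indent=2)

# A lineage question ("where does amount_usd come from?") becomes a simple lookup:
print([m["source"] for m in mappings if m["target"].endswith("amount_usd")])
```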

6) Migrating users and applications: The final step in the process is migrating users and applications to the new cloud data warehouse without interrupting business operations. Security and access authorizations may need to be created or changed, and BI and analytics tools will need to be reconnected. You’ll also need to determine what communication is needed, and with whom, to make the cutover smooth.

Don’t try to do everything at once

A typical enterprise data warehouse contains a large amount of data describing many business subject areas. Migrating an entire data warehouse in a single pass is usually not realistic. Incremental migration is the smart approach when “big bang” migration isn’t practical. Migrating incrementally is a must when undertaking significant design changes as part of the effort.

However, incremental migration brings new considerations. Data location should be transparent from a user point of view throughout the period when some data resides in the legacy data warehouse and some in the new cloud data warehouse. Consider a virtual layer as a point of access to decouple queries from data storage location.
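A virtual access layer can be as elaborate as a data virtualization product or as simple as a routing rule that knows which subject areas have already moved. The sketch below shows the idea; the connection strings and subject areas are placeholders.

```python
# Minimal sketch: a thin access layer that routes each query to whichever warehouse
# currently holds the subject area, so users never reference a physical location.
# The connection strings and subject areas are placeholders.

MIGRATED_SUBJECTS = {"sales", "inventory"}   # subject areas already moved to the cloud

LEGACY_DSN = "postgresql://legacy-dw.internal/edw"     # placeholder
CLOUD_DSN  = "postgresql://cloud-dw.example.com/edw"   # placeholder

def target_for(subject_area):
    """Pick the warehouse that owns this subject area during the transition."""
    return CLOUD_DSN if subject_area in MIGRATED_SUBJECTS else LEGACY_DSN

def run_query(subject_area, sql):
    dsn = target_for(subject_area)
    print(f"routing '{subject_area}' query to {dsn}")
    # execute sql against dsn with the database driver of your choice
    return dsn, sql

run_query("sales",   "SELECT SUM(amount_usd) FROM sales_fact;")
run_query("finance", "SELECT COUNT(*) FROM gl_journal;")
```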

A hybrid strategy is another viable option. With a hybrid approach, your on-premises data warehouse can remain operating as the cloud data warehouse comes online. During this transition phase, you’ll need to synchronize the data between the old on-premises data warehouse and the new one that’s in the cloud.
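During a hybrid transition, synchronization is often done incrementally against a change timestamp rather than by re-copying whole tables. Here is a minimal sketch assuming the source tables carry an updated_at column; the cursor and loader are stand-ins for your actual drivers.

```python
# Minimal sketch: keep the cloud warehouse in step with the on-premises one by
# copying only rows changed since the last high-water mark. The updated_at column,
# source cursor, and loader function are assumptions, not features of any product.
from datetime import datetime, timezone

def sync_table(source_cursor, load_into_cloud, table, since):
    """Pull rows modified after `since` and hand them to a cloud-side loader."""
    new_watermark = datetime.now(timezone.utc)   # captured before extraction starts
    source_cursor.execute(
        f"SELECT * FROM {table} WHERE updated_at > %s ORDER BY updated_at",
        (since,),
    )
    rows = source_cursor.fetchall()
    if rows:
        load_into_cloud(table, rows)   # e.g. stage to S3 and COPY, or upsert row by row
    return new_watermark               # persist between runs; schedule hourly or nightly
```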

Cloud migration tools to the rescue

The good news is, there are many tools and services that can be invaluable when migrating your legacy data warehouse to the cloud. In my next post, the third and final in this series, I’ll explore the tools for data integration, data warehouse automation, and data virtualization, and system integrator resources that can speed and de-risk the process.

Learn more at SnapLogic’s upcoming webcast, “Traditional Data Warehousing is Dead: How digital enterprises are scaling their data to infinity and beyond in the Cloud,” on Wednesday, August 16 at 9:00am PT. I’ll be presenting with Dave Wells, Data Management Practice Lead, Eckerson Group, and highlighting tangible business benefits that your organization can achieve by moving your data to the cloud. You’ll learn:

      • Practical best practices, key technologies to consider, and case studies to get you started
      • The potential pitfalls of “cloud-washed” legacy data integration solutions
      • Cloud data warehousing market trends
      • How SnapLogic’s Enterprise Integration Cloud delivers up to a 10X improvement in the speed and ease of data integration

Sign up today!

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1

Data and analytics – behind and after an acquisition

By Karen He

Now more than ever, organizations need to move beyond innovation to grow their business, stay competitive, and remain relevant. And it’s the combination of data and analytics that gives businesses the insights to move in the right direction, and beyond innovation. Amazon and Walmart have done just that. Their relentless drive to lead the extremely competitive retail world has set them apart from many. A shared tactic – acquiring smaller competitors in their space – is something both companies have done well, and data is at its core.

In late 2016, Walmart strengthened its e-commerce strategy by acquiring Jet.com. It continued into 2017 by acquiring five more online retailers: Moosejaw, ModCloth, Bonobos, Shoebuy.com, and Hayneedle.com.

Amazon, on the other hand, has expanded its brick-and-mortar strategy just this year, from its first physical store in New York City to its pending purchase of Whole Foods.

Acquisitions are easier said than done. The teams behind retail conglomerates like Walmart and Amazon do not buy companies by merely following their instincts. Instead, they rely heavily on data from multiple sources to develop a strategy for growing their business. In acquisitions, data shows retail leaders whether a deal is strategically feasible for the business.

The data behind the decision

Nowadays, the amount of data available to organizations is both a blessing and a curse. Businesses are surrounded by deep pockets of their own data, residing in different cloud-based and on-premises applications and databases. For the most part, this volume of enterprise data is extremely hard to retrieve without technical assistance. As a result, businesses continuously face the challenge of spending an immense amount of time and effort simply rounding up and compiling data.

Retail leaders thus gather and analyze data to make an informed decision on whether or not to purchase another company. But most retailers are not in a position to make such decisions because that insightful yet elusive data resides in multiple places. They pivot from one tool to another, sifting through thousands of data sets before even getting to the analysis. These cumbersome, manual processes keep retailers from gaining real-time insights, potentially preventing them from taking first-mover initiatives and leapfrogging their competitors. Until now.

Companies that can pull data and derive insights in real-time are empowered to transform and grow their business. Retailers need to pull data on-demand to be able to visualize complete insights to make a sound business decision. In Walmart and Amazon‘s cases, they tapped into extensive data sources to understand whether they would gain more value by growing organically or by dropping millions of dollars to acquire established companies. Of course, we know what they’ve acquired, but not necessarily what they didn’t acquire or why.

Post-acquisition alignment

Beyond all the pre-acquisition data and number crunching, both Amazon and Walmart are aware of the many M&A processes needed post-acquisition. Once a company acquires another, the parent company must realign the business by consolidating virtually all the departments, operations, and processes between the two companies. In Amazon and Walmart’s case, consolidating business operations and supply chains involves complex data migrations. For conglomerates like Amazon and Walmart to have a seamless flow of information, subsidiaries need to migrate the data from all their systems and applications into their parent company’s.

Without realignment, business users across functions can become unproductive due to the lack of data or the inefficient manual labor needed to connect data files from disparate systems and applications. A marketing department alone may have at least half a dozen marketing applications, including CRM, marketing automation, web analytics, marketing intelligence, predictive analytics, content management, and social media management systems. Gaps in marketing reports and insights, for example, emerge when marketers use duplicate data from different marketing applications, and can result in lower business performance. To fold in existing operations and processes, companies need a smarter way to connect systems together or migrate data from one system to another.

Amazon and Walmart are proof points of how companies must innovate and grow in the competitive retail market. Businesses across industries should also look into their data to unearth growth opportunities. Complete, real-time data and analytics empower business professionals to expand their business and stay competitive in the market.

Learn more about connecting systems and applications to fuel rich data and analytics in this recorded webcast.

Karen He is Product Marketing Manager at SnapLogic. Follow her on Twitter @KarenHeee.

 

Integrate through the big data insights gap

By Bill Creekbaum

Whether you’re an analyst, data scientist, CxO, or just a “plain ol’ business user,” having access to more data represents an opportunity to make better business decisions, identify new and innovative opportunities, respond to hard-to-identify threats … the opportunities abound.

More data – from IoT, machine logs, streaming social media, cloud-native applications, and more – is coming at you with diverse structures and in massive volumes at high velocity. Traditional analytic and integration platforms were never designed to handle these types of workloads.

Data like this is often lumped under “big data” and tends to be accessible only to a very limited audience with deep technical skill and experience (e.g., data scientists), which limits the business utility of having more data. This creates a big data insights gap and keeps the much broader population of business users and analysts from realizing big data’s benefits. Our industry’s goal should be to help business users and analysts operationalize insights from big data. In fact, Forbes has declared that 2017 is the year that big data goes mainstream.

There are two critical elements needed to close this big data insights gap:

  • A scalable data platform: Handles big data while remaining compatible with “traditional” analytic platforms
  • An integration platform: Acquires large volumes of high-velocity diverse data without IT dependency

To address the first element, Amazon has released Amazon Redshift Spectrum as part of its growing family of AWS big data services. Optimized for massive data sets (petabytes and even exabytes) stored in S3 and delivered with the scalable performance of Amazon Redshift, Redshift Spectrum makes the above scenarios possible from an operational, accessibility, and economic perspective:

  • Operational: Amazon Redshift Spectrum allows for interaction with data volumes and diversity not possible with traditional OLAP technology.
  • Accessibility: SQL interface allows business users and analysts to use traditional analytic tools and skills to leverage these extreme data sets.
  • Economic: Amazon Redshift Spectrum shifts the majority of big data costs to the S3 service, which is far more economical than storing the entire data set in Redshift.

Clearly, Amazon has delivered a platform that can democratize the delivery of extremely large volumes of diverse business data to business users and analysts, allowing them to use the tools they currently employ, such as Tableau, PowerBI, QuickSight, Looker, and other SQL-enabled applications.
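To make this concrete, here is a minimal sketch of how S3-resident data might be exposed through Redshift Spectrum and then joined to a local Redshift table with plain SQL. The schema, bucket, IAM role, and connection details are placeholders, not a recommended configuration.

```python
# Minimal sketch: expose S3-resident event data through Redshift Spectrum and join
# it to a table that lives in Redshift itself, using ordinary SQL. The schema,
# bucket, IAM role, and connection details are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dw", user="analyst", password="...")
conn.autocommit = True   # external DDL is typically run outside a transaction block
cur = conn.cursor()

# Register the S3 data once; afterwards it behaves like any other schema.
cur.execute("""
    CREATE EXTERNAL SCHEMA events_ext
    FROM DATA CATALOG DATABASE 'events_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/spectrum-role'
    CREATE EXTERNAL DATABASE IF NOT EXISTS
""")
cur.execute("""
    CREATE EXTERNAL TABLE events_ext.clickstream (
        user_id BIGINT,
        url     VARCHAR(2048),
        ts      TIMESTAMP
    )
    STORED AS PARQUET
    LOCATION 's3://my-event-archive/clickstream/'
""")

# Analysts then query it with plain SQL, joining S3-resident events to a
# customer dimension stored in Redshift.
cur.execute("""
    SELECT c.segment, COUNT(*) AS clicks
    FROM events_ext.clickstream e
    JOIN customers c ON c.user_id = e.user_id
    GROUP BY c.segment
    ORDER BY clicks DESC
    LIMIT 10
""")
print(cur.fetchall())
```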

However, unless the large volumes of high velocity and diverse data can be captured, loaded to S3, and made available via Redshift Spectrum, none of the above benefits will be realized and the big data insights gap will remain.

There are two key challenges in acquiring and integrating large volumes of high-velocity, diverse data:

  • On-prem in a Cloud-Native World: Many integration platforms were designed long ago to operate on-premises and to load data to an OLAP environment in batches. While some have been updated to operate in the cloud, many will fail with streaming workloads and collapse under the high volume of diverse data required today.
  • Integration is an “IT Task”: Typical integration platforms are intended to be used by IT organizations or systems integrators. Not only does this severely limit who can perform the integration work, it also tends to force that work into a long project queue, delaying answers to critical business questions.

To address the second element in closing the big data insights gap, business users and analysts themselves must be able to capture the “big data” so that business questions can be answered in a timely manner. If it takes a long and complex IT project to capture the data, the business opportunity may be lost.

To close the big data insights gap for business users and analysts, the integration platform must:

  • Handle large volumes of high velocity and diverse data
  • Focus on integration flow development (not complex code development)
  • Comply with IT standards and infrastructure

With this approach to integration, those asking the business questions and seeking insights from more data can leverage the powerful capabilities of Amazon Redshift Spectrum and respond to business opportunities while it still matters.

Amazon’s Redshift Spectrum and the SnapLogic Enterprise Integration Cloud represent a powerful combination to close the big data insights gap for business users and analysts. In upcoming blog posts, we’ll look at actual use cases and learn how to turn these concepts into reality.

Interested in how SnapLogic empowers cloud warehouse users with up to a 10x improvement in the speed and ease of data integration for Redshift deployments? Check out the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”

Bill Creekbaum is Senior Director, Product Management at SnapLogic. Follow him on Twitter @wcreekba.

Why citizen integrators are today’s architects of customer experience

By Nada daVeiga

Lately, I’ve been thinking a lot about customer experience (CX) and the most direct, most effective ways for companies to transform it. As I recently blogged, data is the centerpiece – the metaphorical cake, as it were, compared to the martech frosting – of creating winning customer experiences.

That being said, which internal organization could possibly be better positioned than marketing to shape customer experience?

Nearly every enterprise function shapes CX

As it turns out, there are many teams within the modern enterprise that serve as CX architects. Think of all the different groups that contribute to customer engagement, acquisition, retention, and satisfaction: marketing, sales, service, and support are the most obvious, but what about product development, finance, manufacturing, logistics, and shipping? All of these functions impact the customer experience, directly or indirectly, and thus should be empowered to improve it through unbridled data access.

This point of view is reflected in SnapLogic’s new white paper, “Integration in the age of the customer: The five keys to connecting and elevating customer experience.” From it, a key thought:

[W]ho should corral the data? The best outcomes from customer initiatives happen when the business takes control and leads the initiative. The closer the integrators are to the customer, the better they can put themselves in their customers’ shoes and understand their needs. Often, they have a clear handle on metrics, the business processes, the data, and real-world customer experiences, whether they’re in marketing, sales, or service, and are the first to see how the changes they’re making are improving customer experience — or not.

Democratizing data integration

Because most departmental leaders in sales, service, and marketing are typically not familiar with programming, they look for integration solutions that provide click-not-code graphical user interfaces (GUIs) that enable a visual, intuitive process to democratize customer data integration. SnapLogic believes that GUI-driven, democratic data integration is an essential first step in empowering today’s CX architects to gain the analytic insight they need to improve customer experience.

In short, we believe that “citizen integrator” is really just another name for “citizen innovator;” fast, easy, seamless data integration shatters stubborn barriers to CX innovation by igniting exploration and problem-solving creativity.

To learn how to design your integration strategy to improve customer experience across the organization, download the white paper, “Integration in the age of the customer: The five keys to connecting and elevating customer experience.” In it, you’ll find actionable insights on how to optimize your organization’s data integration strategy to unlock CX innovation, including:

  • Why you need to ensure your organization’s integration strategy is customer-focused
  • How to plan around the entire customer lifecycle
  • Which five integration strategies help speed customer analytics and experience initiatives
  • How to put the odds of customer success in your favor

Nada daVeiga is VP Worldwide Pre-Sales, Customer Success, and Professional Services at SnapLogic. Follow her on Twitter @nrdaveiga.

Data integration: The key to operationalizing innovation

By Craig Stewart

It’s not just a tongue twister. Operationalizing innovation has proven to be one of the most elusive management objectives of the new millennium. Consider this sound bite from an executive who’d just participated in an innovation conference in 2005:

The real discussion at the meeting was about … how to operationalize innovation. All roads of discussion led back to that place. How do you make your company into a systemic innovator? There is no common denominator out there, no shared understanding on how to do that.[1]

The good news is that, in the 12 years since, cloud computing has exploded, and a common denominator clearly emerged: data. Specifically, putting the power of data – big data, enterprise data, and data from external sources – and analytics into users’ hands. More good news: An entirely new class of big data analysis tools[2] has emerged that allows business users to become “citizen data analysts.”

The bad news: There hasn’t been a fast, easy way to perform the necessary integrations between data sources in the cloud – an essential first step and the foundation of citizen data analytics, today’s hottest source of innovation.

Until now.

The SnapLogic Enterprise Integration Cloud is a mature, full-featured Integration Platform-as-a-Service (iPaaS) built in the cloud, for the cloud. Through its visual, automated approach to integration, the SnapLogic Enterprise Integration Cloud uniquely empowers both business and IT users, accelerating analytics initiatives on Amazon Redshift and other cloud data warehouses.

Unlike on-premises ETL or immature cloud tools, SnapLogic combines ease of use, streaming scalability, on-premises and cloud integration, and managed connectors called Snaps. Together, these capabilities present a 10x improvement over legacy ETL solutions like Informatica or other “cloud-washed” solutions originally designed for on-premises use, accelerating integrations from months to days.

By enabling “citizen integrators” to more quickly build, deploy and efficiently manage multiple high-volume, data-intensive integration projects, SnapLogic uniquely delivers:

  • Ease of use for business and IT users through a graphical approach to integration
  • A solution built for scale, offering bulk data movement and streaming data integration
  • Ideal capabilities for hybrid environments, with over 400 Snaps to handle relational, document, unstructured, and legacy data sources
  • Cloud data warehouse-readiness with native support for Amazon Redshift and other popular cloud data warehouses
  • Built-in data governance* by synchronizing data in Redshift at any time interval desired, from real-time to overnight batch.

* Why data governance matters

Analytics performed on top of incorrect data yield incorrect results – a detriment, certainly, in the quest to operationalize innovation. Data governance is a significant topic, and a major concern of IT organizations charged with maintaining the consistency of data routinely accessed by citizen data scientist and citizen integrator populations. Gartner estimates that only 10% of self-service BI initiatives are governed[3] to prevent inconsistencies that adversely affect the business.

Data discovery initiatives using desktop analytics tools risk creating inconsistent silos of data. Cloud data warehouses afford increased governance and data centralization. SnapLogic helps to ensure strong data governance by replicating source tables into Redshift clusters, where the data can be periodically synchronized at any time interval desired, from real-time to overnight batch. In this way, data drift is eliminated, allowing all users who access data, whether in Redshift or other enterprise systems, to be confident in its accuracy.

To find out more about how SnapLogic empowers citizen data scientists, and how a global pharmaceutical company is using SnapLogic to operationalize innovation, get the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”

Craig Stewart is Vice President, Product Management at SnapLogic.

[1] “Operationalizing Innovation–THE hot topic,” Bruce Nussbaum, Bloomberg, September 28, 2005. https://www.bloomberg.com/news/articles/2005-09-28/operationalizing-innovation-the-hot-topic

[2] “The 18 Best Analytics Tools Every Business Manager Should Know,” Bernard Marr, Forbes, February 4, 2016. https://www.forbes.com/sites/bernardmarr/2016/02/04/the-18-best-analytics-tools-every-business-manager-should-know/#825e6115d397

[3] “Predicts 2017: Analytics Strategy and Technology,” Kurt Schlegel, et. al., Gartner, November 30, 2016. ID: G00316349

James Markarian: Was the Election a Referendum on Predictive Analytics?

In his decades working in the data and analytics industry, SnapLogic CTO James Markarian has witnessed few mainstream events that have sparked as much discussion and elicited as many questions – around the value and accuracy of predictive analytics tools – as our recent election.

In a new blog post on Forbes, James examines where the nation’s top pollsters (who across the board predicted a different election outcome) may have gone wrong, why some predictions succeed and others fail, what businesses that have invested in data analytics can learn from the election, and how new technologies such as integration platform as a service (iPaaS) can help them make sense of all their data to make better predictions.

Be sure to read James’s blog, titled “What The Election Taught Us About Predictive Analytics”, on Forbes here.