Get your game plan on: Data warehouse migration to the cloud

By Ravi Dharnikota

You’ve decided to move your data warehouse to the cloud, and want to get started. Great! It’s easy to see why – in addition to the core benefits I wrote about in my last blog post, there are many more benefits associated with cloud data warehousing: incredibly fast processing, speedy deployment, built-in fault tolerance and disaster recovery, and, depending on your cloud provider, strong security and governance.

A six-step reality check

But before you get too excited, it’s time for a reality check: moving an existing data warehouse to the cloud is not quick, and it isn’t easy. It is definitely not as simple as exporting data from one platform and loading it into another. Data is only one of the six warehouse components to be migrated.

Tactically and technically, data warehouse migration is an iterative process that requires many steps to migrate all of the components, as illustrated below. Here’s everything you need to consider when migrating your data warehouse to the cloud.

1) Migrating schema: Before moving warehouse data, you’ll need to migrate table structures and specifications. You may also need to make structural changes as part of the migration: do indexing and partitioning schemes need to be rethought for the new platform?

Data Warehouse Migration Process
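As a rough illustration of the schema step, here’s a minimal sketch of automated type translation, assuming a purely hypothetical mapping between a legacy platform’s column types and a cloud warehouse’s; real migrations lean on vendor tooling and handle far more edge cases.

```python
# Hypothetical sketch: translate legacy column types into a cloud
# warehouse's types. The type map and table definition are illustrative,
# not any vendor's actual mapping.
LEGACY_TO_CLOUD_TYPES = {
    "NUMBER": "NUMERIC",
    "VARCHAR2": "VARCHAR",
    "DATE": "TIMESTAMP",
}

def translate_column(name, legacy_type):
    """Map one column to the target platform's closest equivalent type."""
    return f"{name} {LEGACY_TO_CLOUD_TYPES.get(legacy_type, legacy_type)}"

def build_create_table(table, columns):
    """Emit target DDL. Legacy indexes and partitions are deliberately
    omitted here; decide per table whether the new platform needs them."""
    cols = ",\n  ".join(translate_column(n, t) for n, t in columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

print(build_create_table("sales", [("id", "NUMBER"), ("sold_at", "DATE")]))
```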

2) Migrating data: Moving very large volumes of data is process intensive, network intensive, and time-consuming. You’ll need to map out how long the migration will take and whether you can accelerate it. Will you need to restructure data to match the schema changes and transform data as part of the data migration? If so, can you transform in-stream, or should you pre-process the data and then migrate it?
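To illustrate the in-stream option, the sketch below migrates data in chunks and applies a transformation on the way through. The in-memory “warehouses” are stand-ins for the real unload and bulk-copy utilities of your source and target platforms.

```python
# Illustrative sketch: chunked migration with an optional in-stream
# transform. The in-memory lists stand in for the legacy and cloud
# warehouses; real code would call bulk unload/copy utilities.
LEGACY_ROWS = [{"id": i, "amount": i * 10} for i in range(250)]
CLOUD_ROWS = []

def extract_chunk(offset, size):
    return LEGACY_ROWS[offset:offset + size]

def load_chunk(rows):
    CLOUD_ROWS.extend(rows)

def migrate(chunk_size=100, transform=None):
    offset = 0
    while True:
        rows = extract_chunk(offset, chunk_size)
        if not rows:
            break
        if transform:  # restructure in-stream, if the new schema requires it
            rows = [transform(r) for r in rows]
        load_chunk(rows)
        offset += chunk_size

# Example: rename a column while the data is in flight.
migrate(transform=lambda r: {"id": r["id"], "amount_usd": r["amount"]})
assert len(CLOUD_ROWS) == len(LEGACY_ROWS)
```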

3) Migrating ETL: Moving data may be the easy part when compared to migrating ETL processes. You may need to change the code base to optimize for platform performance and change data transformations to sync with data restructuring. You’ll need to determine if data flows should remain intact or be reorganized. As part of the migration, you may need to reduce data latency and deliver near real-time data. If that’s the case, would it make sense to migrate ETL processing to the cloud, as well? Is there a utility to convert your ETL code?
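Whether porting or rewriting makes more sense usually depends on what the existing ETL actually does, so an inventory is a sensible first step. Here’s a hedged sketch, assuming the legacy jobs can be exported to some machine-readable form; the step types and effort buckets are invented for illustration.

```python
# Hypothetical sketch: classify exported legacy ETL steps to size the
# migration effort. Step structures vary widely by tool.
from collections import Counter

def classify_step(step):
    """Bucket a step by how hard it is likely to be to migrate."""
    if step.get("type") in {"copy", "filter", "projection"}:
        return "port as-is"
    if step.get("type") in {"lookup", "aggregate"}:
        return "rework for platform performance"
    return "redesign (scripts, stored procedures, custom code)"

def summarize(jobs):
    return Counter(classify_step(s) for job in jobs for s in job)

jobs = [[{"type": "copy"}, {"type": "aggregate"}, {"type": "script"}]]
print(summarize(jobs))  # a rough effort profile for the migration plan
```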

4) Rebuilding data pipelines: With any substantive change to data flow or data transformation, rebuilding data pipelines may be a better choice than migrating existing ETL. You may be able to isolate individual data transformations and package them as executable modules. You’ll need to understand the dependencies among data transformations to construct an optimal workflow, and to weigh the advantages you may gain – performance, agility, reusability, and maintainability – by rebuilding ETL as modular data pipelines using modern, cloud-friendly technology.
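One way to reason about those dependencies is to model each transformation as a node in a directed graph and derive the execution order from the graph. A minimal sketch, with hypothetical transformation names:

```python
# Minimal sketch: modular transforms plus a dependency graph, executed
# in topological order. The transform names are hypothetical.
from graphlib import TopologicalSorter  # Python 3.9+

def clean_customers(): print("cleaning customers")
def clean_orders(): print("cleaning orders")
def join_customer_orders(): print("joining customers and orders")
def build_revenue_mart(): print("building revenue mart")

# Each key runs only after the transforms it depends on.
DEPENDENCIES = {
    join_customer_orders: {clean_customers, clean_orders},
    build_revenue_mart: {join_customer_orders},
}

for step in TopologicalSorter(DEPENDENCIES).static_order():
    step()
```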

5) Migrating metadata: Source-to-target metadata is a crucial part of managing a data warehouse; knowing data lineage is critical for tracing and troubleshooting when problems occur. How readily will this metadata transfer to a new cloud platform? Are all of the mappings, transform logic, dataflows, and workflows locked in proprietary tools or buried in SQL code? You’ll need to determine whether you can export and import this metadata directly, or whether you’ll have to reverse engineer it or rebuild it from scratch.
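If the metadata is locked in proprietary tools, one escape route is a portable, tool-neutral representation of the source-to-target mappings. Here’s a sketch of what such an export might look like; the field names are assumptions, not any tool’s actual format.

```python
# Illustrative sketch: a tool-neutral record of source-to-target
# mappings that can be exported as JSON and re-imported (or used to
# rebuild lineage) on the new platform. Field names are assumptions.
import json
from dataclasses import dataclass, asdict

@dataclass
class Mapping:
    source: str     # e.g., "legacy_db.orders.amt"
    target: str     # e.g., "cloud_dw.orders.amount_usd"
    transform: str  # the transform logic, captured as text
    job: str        # the workflow that applies it

mappings = [
    Mapping("legacy_db.orders.amt", "cloud_dw.orders.amount_usd",
            "ROUND(amt * fx_rate, 2)", "nightly_orders_load"),
]
print(json.dumps([asdict(m) for m in mappings], indent=2))
```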

6) Migrating users and applications: The final step in the process is migrating users and applications to the new cloud data warehouse without interrupting business operations. Security and access authorizations may need to be created or changed, and BI and analytics tools will need to be reconnected. Plan what communication is needed, and with whom, before the cutover.

Don’t try to do everything at once

A typical enterprise data warehouse contains a large amount of data describing many business subject areas. Migrating an entire data warehouse in a single pass is usually not realistic. Incremental migration is the smart approach when “big bang” migration isn’t practical. Migrating incrementally is a must when undertaking significant design changes as part of the effort.

However, incremental migration brings new considerations. Data location should be transparent from a user point of view throughout the period when some data resides in the legacy data warehouse and some in the new cloud data warehouse. Consider a virtual layer as a point of access to decouple queries from data storage location.
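A data virtualization product would provide that layer for you, but the underlying idea is simple enough to sketch: keep a catalog of where each table currently lives and route queries accordingly. The table names and connection stubs below are hypothetical.

```python
# Minimal sketch of location-transparent access during an incremental
# migration: a catalog maps each table to whichever warehouse currently
# holds it, so users and applications never hard-code a location.
class StubConnection:
    def __init__(self, name): self.name = name
    def execute(self, sql): return f"[{self.name}] {sql}"

legacy_conn = StubConnection("legacy")
cloud_conn = StubConnection("cloud")

TABLE_LOCATION = {
    "customers": "cloud",   # already migrated
    "orders": "legacy",     # not yet migrated
}

def run_query(table, sql):
    conn = cloud_conn if TABLE_LOCATION[table] == "cloud" else legacy_conn
    return conn.execute(sql)

print(run_query("orders", "SELECT COUNT(*) FROM orders"))
# Flipping TABLE_LOCATION["orders"] to "cloud" after that table migrates
# redirects every user at once, with no query changes on their side.
```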

A hybrid strategy is another viable option. With a hybrid approach, your on-premises data warehouse can remain operating as the cloud data warehouse comes online. During this transition phase, you’ll need to synchronize the data between the old on-premises data warehouse and the new one that’s in the cloud.
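A common pattern for that synchronization is incremental: track a high-watermark per table and copy only the rows that changed since the last sync. Below is a simplified sketch, assuming every table carries a reliable last-modified timestamp; deletes and schema drift need extra handling in practice.

```python
# Simplified sketch of high-watermark synchronization for the hybrid
# phase. Assumes a trustworthy updated_at column on every table.
import datetime

watermarks = {}  # table name -> timestamp of last synced change

def sync_table(table, fetch_changes, apply_changes):
    """fetch_changes returns rows with updated_at > since;
    apply_changes upserts them into the cloud warehouse."""
    since = watermarks.get(table, datetime.datetime.min)
    changes = fetch_changes(table, since)
    if changes:
        apply_changes(table, changes)
        watermarks[table] = max(row["updated_at"] for row in changes)
```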

Cloud migration tools to the rescue

The good news is that there are many tools and services that can be invaluable when migrating your legacy data warehouse to the cloud. In my next post, the third and final in this series, I’ll explore tools for data integration, data warehouse automation, and data virtualization, as well as system integrator resources that can speed up and de-risk the process.

Learn more at SnapLogic’s upcoming webcast, “Traditional Data Warehousing is Dead: How digital enterprises are scaling their data to infinity and beyond in the Cloud,” on Wednesday, August 16 at 9:00am PT. I’ll be presenting with Dave Wells, Data Management Practice Lead, Eckerson Group, and highlighting tangible business benefits that your organization can achieve by moving your data to the cloud. You’ll learn:

      • Best practices, key technologies to consider, and case studies to get you started
      • The potential pitfalls of “cloud-washed” legacy data integration solutions
      • Cloud data warehousing market trends
      • How SnapLogic’s Enterprise Integration Cloud delivers up to a 10X improvement in the speed and ease of data integration

Sign up today!

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1

Moving your data warehouse to the cloud: Look before you jump

By Ravi Dharnikota

Where’s your data warehouse? Is it still on-premises? If so, you’re not alone. Way back in 2011, in its IT predictions for 2012 and beyond, Gartner said, “At year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud.”

While it’s hard to find an exact statistic on how many enterprise data warehouses have migrated, cloud warehousing is increasingly popular as companies struggle with growing data volumes, service-level expectations, and the need to integrate structured warehouse data with unstructured data in a data lake.

Cloud data warehousing provides many benefits but getting there isn’t easy. Migrating an existing data warehouse to the cloud is a complex process of moving schema, data, and ETL. The complexity increases when restructuring of database schema or rebuilding of data pipelines is needed.

This post is the first in a “look before you leap” three-part series on how to jump-start your migration of an existing data warehouse to the cloud. As part of that, I’ll also cover how cloud-based data integration solutions can significantly speed your time to value.

Beyond basic: The benefits of cloud data warehousing

Cloud data warehousing is a Data Warehouse as a Service (DWaaS) approach that simplifies time-consuming and costly management, administration, and tuning activities that are typical of on-premises data warehouses. But beyond the obvious – data warehouses being stored in the cloud – there’s more. Processing is also cloud-based, and all major solution providers charge separately for storage and compute resources, both of which are highly scalable.
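Because storage and compute are billed separately, you can reason about each independently, including running compute only when workloads need it. Here’s a toy example with purely illustrative rates; real pricing varies by provider and region.

```python
# Toy cost model with made-up rates, just to show how separating
# storage and compute changes the arithmetic. These are not any
# provider's actual prices.
STORAGE_PER_TB_MONTH = 25.0    # illustrative $/TB/month
COMPUTE_PER_NODE_HOUR = 2.0    # illustrative $/node-hour

storage_tb = 40
nodes, hours_per_day = 4, 6    # compute runs only when workloads do

monthly_cost = (storage_tb * STORAGE_PER_TB_MONTH
                + nodes * hours_per_day * 30 * COMPUTE_PER_NODE_HOUR)
print(f"Estimated monthly cost: ${monthly_cost:,.2f}")
```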

All of which leads us to a more detailed list of key advantages:

  • Scale up (and down): The volume of data in a warehouse typically grows at a steady pace as time passes and history is collected. Sudden upticks in data volume occur with events such as mergers and acquisitions, and when new subjects are added. The inherent scalability of a cloud data warehouse allows you to adapt to growth, adding resources incrementally (via automated or manual processes) as data and workload increase. The elasticity of cloud resources allows the data warehouse to quickly expand and contract data and processing capacity as needed, with no impact on infrastructure availability, stability, performance, or security.
  • Scale out: Adding more concurrent users requires the cloud data warehouse to scale out. You will be able to add more resources – either more nodes to an existing cluster or an entirely new cluster, depending on the situation – as the number of concurrent users rises, allowing more users to access the same data without query performance degradation.
  • Managed infrastructure: Eliminating the overhead of data center management and operations for the data warehouse frees up resources to focus where value is produced: using the data warehouse to deliver information and insight.
  • Cost savings: On-premises data centers are extremely expensive to build and operate, requiring staff, servers and other hardware, networking, floor space, power, and cooling. (This comparison site provides hard dollar data on many data center elements.) When your data warehouse lives in the cloud, the operating expense in each of these areas is eliminated or substantially reduced.
  • Simplicity: Cloud data warehouse resources can be accessed through a browser and activated with a payment card. Fast self-service removes IT middlemen and democratizes access to enterprise data.

In my next post, I’ll do a quick review of additional benefits and then dive into data migration. If you’d like to read all the details about the benefits, techniques, and challenges of migrating your data warehouse to the cloud, download the Eckerson Group white paper, “Jump-Start Your Cloud Data Warehouse: Meeting the Challenges of Migrating to the Cloud.”

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1

The commoditization of integration

By Dinesh Chandrasekhar

Eight years ago, dozens of integration vendors were offering scores of solutions, all with what seemed to be the same capabilities. Pick any ESB or ETL tool and it seemed to perform the same functions as its competitors. RFPs were no longer a viable way to weed out the inferior vendors, as each solution checked all the boxes across the board. Plus, all vendors were ready to lower their prices at the drop of a hat to win your business. It was at this time that the integration market truly reached a level of commoditization. Consumers could easily pick and choose any solution, as there were no true differentiators among them.

But, several factors have changed the landscape since then:

  • NoESB – The NoESB architecture started gaining interest, pushing the idea that an ESB is irrelevant for many integration scenarios. Yet an API Gateway was not the right alternative either.
  • Cloudification – The cloudification of pretty much all your favorite on-premises enterprise applications began around the same time. Enterprises that were thinking of a digital transformation couldn’t get too far without a definitive cloud strategy in place.
  • Convergence of ESB and ETL – The lines between application integration and data integration were blurring. CIOs and IT managers didn’t want to deal with two different sets of integration tools. With the onset of mobile and IoT, data volumes were exploding daily. As a result, even data warehouses moved to the cloud. Traditional, legacy ESB and ETL tools were ill-equipped to serve such big data needs.
  • Agile Integrations – Finally, the DevOps and Agile movements impacted enterprise integration initiatives as well, giving rise to new user personas in the enterprise: Citizen Integrators or Citizen Developers. These are the LOB managers or other non-IT personnel who need quick integrations within their applications to render their data in different views. For them, reliance on IT to deliver solutions to the business was becoming a major hindrance.

All these factors have influenced the iPaaS (Integration Platform as a Service) market. Now, thousands of companies are already leveraging iPaaS solutions to integrate their cloud and on-premises solutions. iPaaS solutions break away from legacy approaches to integration: they are cloud-native, intuitive, fast, self-starting, support hybrid architectures, and offer connectors to a wide range of on-premises and cloud applications.

Now comes the big question: “Will iPaaS solutions be commoditized, too?” At the moment, the answer is a definite NO, and there are multiple reasons why. Beyond scale, latency, tenancy, SLAs, the number of connectors, and so on, one of the key areas that will differentiate iPaaS solutions is the developer experience. The user interface of the solution will determine the adoption rate and the value it brings to the enterprise. For a citizen integrator to actually use the system, the interface should be intuitive enough to guide them in building their integration flows quickly, effectively, and, most importantly, without the assistance of IT. This alone will make or break adoption of the system.

iPaaS vendors are trying to enhance this developer experience with features like drag-and-drop connectors, pipeline snippets, a templates library, a starter kit, mapping enhancements, etc. However, very few vendors are offering AI-driven tooling that enables intelligent ways to predict next steps – based on learnings from hundreds of other users – for your integration flow. AI-assist is truly a great benefit for citizen integrators, who may be non-technical. Even technically savvy developers welcome a significant boost in their productivity. With innovations like this happening, the iPaaS space is quite far away from being commoditized. However, enterprises still need to be wary of cloud-washing iPaaS vendors that offer “1000+” connectors, a thick-client IDE, or an ESB wrapped in a cloud blanket. And, that is a post for a different day!
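To make the AI-assist idea concrete, here is a deliberately tiny sketch of next-step prediction: learn which connector most often follows the current one across historical flows, and suggest it. The flow data is invented, and production systems would use far richer signals than simple successor counts.

```python
# Tiny sketch of next-step suggestion for an integration flow: count
# which step historically follows each step across many users' flows,
# then recommend the most frequent successor. Flow data is invented.
from collections import Counter, defaultdict

historical_flows = [
    ["salesforce_read", "field_mapper", "redshift_write"],
    ["salesforce_read", "field_mapper", "s3_write"],
    ["oracle_read", "field_mapper", "redshift_write"],
]

successors = defaultdict(Counter)
for flow in historical_flows:
    for current, nxt in zip(flow, flow[1:]):
        successors[current][nxt] += 1

def suggest_next(step):
    ranked = successors[step].most_common(1)
    return ranked[0][0] if ranked else None

print(suggest_next("salesforce_read"))  # -> field_mapper
```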

Dinesh Chandrasekhar is Director of Product Marketing at SnapLogic. Follow him on Twitter @AppInt4All.

Data management takes center stage at Rutberg 2017 conference

Each year, research-centric investment bank Rutberg & Company gathers top business leaders and technology experts for an intimate, two-day forum where they discuss and debate the technology, ideas, and trends driving global business. The annual Rutberg 2017 conference took place last week in Half Moon Bay, California, and data management was front and center.

SnapLogic CEO Gaurav Dhillon joined Mesosphere CEO Florian Leibert and Segment CEO Peter Reinhardt for a spirited panel discussion on the growing data management opportunities and challenges facing enterprises today. The panel was moderated by Fortune reporter Jonathan Vanian.

A number of important data management and integration trends emerged, including:

  • LOB’s influence grows: Gaurav noted that more and more, “innovation is coming from the LOB,” whether in Sales, Marketing, Finance, HR, or elsewhere in the organization. These LOB leaders are tech-savvy, are responsible for their own P&Ls, and know that speed and agility will determine tomorrow’s winners. So they’re constantly on the hunt for the latest tech solutions that will drive innovation, spur growth, and help them beat the competition.
  • Data fragmentation on the rise: With individual LOBs procuring a flurry of new cloud applications and technologies, the result is often business silos and a disconnected enterprise. “The average enterprise has 10x more SaaS apps than a CIO thinks,” said Gaurav of the increasing SaaS sprawl, which is requiring CIOs to think differently about how they integrate and manage disparate apps and data sources across the enterprise.
  • Self-service integration is here to stay: The bigger a company gets – with more apps, more endpoints, more data types, more fragmentation – the clearer it becomes that there will never be enough humans to manage the required integration in a timely manner, explained Gaurav. Enter new, modern, self-service integration platforms. “The holy grail of integration is self-service and ease-of-use … we have to bring integration out of the dungeon and into the light,” Gaurav continued. And this means getting integration into the hands of the LOB, and making it fast and easy. The days of command-and-control by IT are over: “Trying to put the genie back in the bottle is wrong; instead you need to give the LOBs a self-service capability to wire this up on their own,” noted Gaurav.
  • AI will be a game-changer: Artificial intelligence (AI) and machine learning (ML) are already making apps, platforms, and people smarter. As with Google auto-complete or shopping on Amazon, we’re already becoming accustomed to assistance from, and recommendations by, machines. “Software without AI will be like Microsoft Word or email without spell-check” – it will be jarring not to have it, said Gaurav. AI is already being applied to complex tasks like app and data integration; it’s not a future state, he said – the start of “self-driving integration” is happening today.
  • The enterprise is a retrofit job: For all the latest advances – new cloud apps, AI and ML technologies, self-service integration platforms – the enterprise remains a “retrofit job,” where the new must work with the old. Large, global enterprises aren’t about to throw out decades of technology investment all at once, particularly if it is working just fine or well-suited to handle certain business processes. So, new cloud technologies will need to work with older on-premises solutions, once again cementing integration platforms as a critical piece of an enterprise technology strategy. “It will be a hybrid world for a long, long time,” concluded Gaurav.

Without question, data has become any organization’s most valuable asset, and those that are able to integrate, manage, and analyze data effectively will be the winners of tomorrow.

Will the Cloud Save Big Data?

This article was originally published on ITProPortal.

Employees up and down the value chain are eager to dive into big data solutions, hunting for golden nuggets of intelligence to help them make smarter decisions, grow customer relationships and improve business efficiency. To do this, they’ve been faced with a dizzying array of technologies – from open source projects to commercial software products – as they try to wrestle big data to the ground.

Today, a lot of the headlines and momentum center on some combination of Hadoop, Spark, and Redshift – all of which can be springboards for big data work. It’s important to step back, though, and look at where we are in big data’s evolution.

In many ways, big data is in the midst of transition. Hadoop is hitting its pre-teen years, having launched in April 2006 as an official Apache project – and then taking the software world by storm as a framework for distributed storage and processing of data, based on commodity hardware. Apache Spark is now hitting its stride as a “lightning fast” engine for large-scale data processing. And various cloud data warehousing and analytics platforms are emerging, from big names (Amazon Redshift, Microsoft Azure HDInsight, and Google BigQuery) to upstart players like Snowflake, Qubole, and Confluent.

The challenge is that most big data progress over the past decade has been limited to big companies with big engineering and data science teams. The systems are often complex, immature, hard to manage and change frequently – which might be fine if you’re in Silicon Valley, but doesn’t play well in the rest of the world. What if you’re a consumer goods company like Clorox, or a midsize bank in the Midwest, or a large telco in Australia? Can this be done without deploying 100 Java engineers who know the technology inside and out?

At the end of the day, most companies just want better data and faster answers – they don’t want the technology headaches that come along with it. Fortunately, the “mega trend” of big data is now colliding with another mega trend: cloud computing. While Hadoop and other big data platforms have been maturing slowly, the cloud ecosystem has been maturing more quickly – and the cloud can now help fix a lot of what has hindered big data’s progress.

The problems customers have encountered with on-premises Hadoop are often the same problems they faced with on-premises legacy systems: there simply aren’t enough of the right people to get everything done. Companies want cutting-edge capabilities, but they don’t want to deal with bugs, broken integrations, and rapidly changing versions. Plus, consumption models are changing – we want to consume data, storage, and compute on demand. We don’t want to overbuy. We want access to infrastructure when and how we want it, with just as much as we need and no more.

Big Data’s Tipping Point is in the Cloud

In short, the tipping point for big data is about to happen – and it will happen via the cloud. The first wave of “big data via the cloud” was simple: companies like Cloudera put their software on Amazon. But what’s “truly cloud” is not having to manage Hadoop or Spark – moving the complexity back into a hosted infrastructure, so someone else manages it for you. To that end, Amazon, Microsoft and Google now deliver “managed Hadoop” and “managed Spark” – you just worry about the data you have, the questions you have and the answers you want. No need to spin up a cluster, research new products or worry about version management. Just load your data and start processing.
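“Just load your data and start processing” looks roughly like the sketch below on a managed Spark service, where the cluster is provisioned and versioned by the provider. The bucket path is a placeholder, and this assumes PySpark with object-storage access already configured.

```python
# Minimal PySpark sketch of the managed-cloud experience: no cluster
# management in the code, just read from object storage and process.
# The s3:// path is a placeholder.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-summary").getOrCreate()

events = spark.read.json("s3://your-bucket/clickstream/*.json")
daily_counts = (events
                .groupBy(F.to_date("timestamp").alias("day"))
                .count()
                .orderBy("day"))
daily_counts.show()
```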

There are three significant and not always obvious benefits to managing big data via the cloud: 1) Predictability – the infrastructure and management burden shifts to cloud providers, and you simply consume services that you can scale up or down as needed; 2) Economics – unlike on-premises Hadoop, where compute and storage were intermingled, the cloud separates compute and storage so you can provision accordingly and benefit from commodity economics; and 3) Innovation – new software, infrastructure and best practices will be deployed continuously by cloud providers, so you can take full advantage without all the upfront time and cost.

Of course, there’s still plenty of hard work to do, but it’s more focused on the data and the business, and not the infrastructure. The great news for mainstream customers (well beyond Silicon Valley) is that another mega-trend is kicking in to revolutionize data integration and data consumption – and that’s the move to self-service. Thanks to new tools and platforms, “self-service integration” is making it fast and easy to create automated data pipelines with no coding, and “self-service analytics” is making it easy for analysts and business users to manipulate data without IT intervention.

All told, these trends are driving a democratization of data that’s very exciting – and will drive significant impact across horizontal functions and vertical industries. Data is thus becoming a more fluid, dynamic and accessible resource for all organizations. IT no longer holds the keys to the kingdom – and developers no longer control the workflow. Just in the nick of time, too, as the volume and velocity of data from digital and social media, mobile tools and edge devices threaten to overwhelm us all. Once the full promise of the Internet of Things, Artificial Intelligence and Machine Learning begins to take hold, the data overflow will be truly inundating.

The only remaining question: What do you want to do with your data?

Ravi Dharnikota is the Chief Enterprise Architect at SnapLogic. 

Podcast: James Markarian and David Linthicum on New Approaches to Cloud Integration

SnapLogic CTO James Markarian recently joined cloud expert David Linthicum as a guest on the Doppler Cloud Podcast. The two discussed the mass movement to the cloud and how this is changing how companies approach both application and data integration.

In this 20-minute podcast, “Data Integration from Different Perspectives,” the pair discuss how to navigate the new realities of hybrid app integration, data and analytics moving to the cloud, user demand for self-service technologies, the emerging impact of AI and ML, and more.

You can listen to the full podcast here.

The Internet of Things and Wearable Tech: Our Interconnected Future

The first Bluetooth headset was sold in 2000. Nearly a decade and a half later, 2014 was declared “the year of the wearable” by tech publications and industry enthusiasts. 2014, after all, was when tech fans were first informed about the pending arrival of what is now the most famous wearable in the world – the Apple Watch.

But if 2014 was the year of the wearable, you wouldn’t have known it if you were a guest at that year’s Consumer Electronic Show. The 2014 CES was dominated not by Apple Watch anticipation, but unbridled excitement over the Internet of Things (IoT).

Now in 2015, it is hard to talk about one without talking about the other. Wearables and IoT are on a collision course, and the merger is already triggering an entirely new technological revolution: the Internet of Me.

Wearables: Following the Path of the Smartphone

The Internet of Things refers to the widespread use of Wi-Fi to animate and connect “dumb” machines and objects, such as toothbrushes, to make them “smart.” Once enlightened, these smart devices can communicate not only with each other, but with their human masters.

A smart toothbrush gathers data about your brushing habits and sends it directly to your dentist to analyze before your next visit. IoT, which has been creeping forward for years, is now poised for mainstream saturation before the end of the decade. But the introduction of wearables is speeding up – and altering – the onset of IoT.

Wearables are evolving along a path similar to the one taken by smartphones. Smartphones didn’t truly hit their stride until Apple launched the App Store, which enabled users to integrate their entire digital lives – from their daily planner to Google Drive to their ecommerce landing pages to their iTunes music library – all in one place.

Like pre-App Store smartphones, wearables are just another ecosystem of devices. Wearables can’t revolutionize the way humans interact with technology until they are stitched together with the other crucial components of our digital lives. The arrival of IoT is providing just that stitching.

Ford Cars, Android Wear and Connected Wearables

Ford is leading the charge to integrate IoT with the wearables that people access while they are driving. If a diabetic driver has a medical bracelet or watch, it could relay information about the driver’s blood-glucose level to the car’s on-board multimedia system, which could then relay that information to physicians or family members, if need be. If a baby were sleeping in the back, a wearable could monitor its vitals and relay the information to the vehicle, to the parents’ wearables, or both.

One article, “The Internet of YOU: When Wearable Tech and the Internet of Things Collide,” describes the phenomenon of IoT plus wearables – the Internet of You – as “having the potential to build our technology so that it works for us, not the other way around.” One example is Android Wear, which was built by Google. Google recently purchased Nest, a maker of smart household devices. When Android Wear connects to the Nest thermostat, for example, the thermostat wouldn’t need to be programmed. Instead, Wear could “tell” the thermostat that the wearer is getting too warm or cool, and the thermostat could then adjust the temperature in the room.

The Internet of You combines the personalization of wearables with the ubiquity of the Internet of Things. Like smartphones, wearables unite the scattered elements of the user’s personal and digital life. If wearables existed in a vacuum, they would be another cool novelty gadget – a toy for people with disposable income. But with IoT acting as the glue that bonds wearables to all of the increasingly “smart” devices that surround us in our daily lives, wearables have the potential to rival – or replace – smartphones as the single most important devices we own. Just as IoT will affect the rise of wearables, wearables have the potential to act as the unifying force that bonds the billions of devices that will make up the Internet of Things.

Together, they are the Internet of You.

Nick Rojas is a business consultant and writer who lives in Los Angeles and Chicago. He has consulted for small and medium-sized enterprises for over twenty years. He has contributed articles to Visual.ly, Entrepreneur, and TechCrunch. You can follow him on Twitter @NickARojas, or you can reach him at NickAndrewRojas@gmail.com.