Moving your data warehouse to the cloud: Look before you jump

By Ravi Dharnikota

Where’s your data warehouse? Is it still on-premises? If so, you’re not alone. Way back in 2011, in its IT predictions for 2012 and beyond, Gartner said, “At year-end 2016, more than 50 percent of Global 1000 companies will have stored customer-sensitive data in the public cloud.”

While it’s hard to find an exact statistic on how many enterprise data warehouses have migrated, cloud warehousing is increasingly popular as companies struggle with growing data volumes, service-level expectations, and the need to integrate structured warehouse data with unstructured data in a data lake.

Cloud data warehousing provides many benefits, but getting there isn't easy. Migrating an existing data warehouse to the cloud is a complex process of moving schema, data, and ETL. The complexity increases when the database schema must be restructured or data pipelines rebuilt.

This post is the first in a “look before you leap” three-part series on how to jump-start your migration of an existing data warehouse to the cloud. As part of that, I’ll also cover how cloud-based data integration solutions can significantly speed your time to value.

Beyond basic: The benefits of cloud data warehousing

Cloud data warehousing is a Data Warehouse as a Service (DWaaS) approach that simplifies the time-consuming and costly management, administration, and tuning activities typical of on-premises data warehouses. But beyond the obvious fact that the data warehouse is stored in the cloud, there's more. Processing is also cloud-based, and most major solution providers charge separately for storage and compute resources, both of which are highly scalable.

All of which leads us to a more detailed list of key advantages:

  • Scale up (and down): The volume of data in a warehouse typically grows at a steady pace as history accumulates, with sudden upticks when events such as mergers and acquisitions occur or new subject areas are added. The inherent scalability of a cloud data warehouse allows you to adapt to growth, adding resources incrementally (via automated or manual processes) as data and workload increase. The elasticity of cloud resources allows the data warehouse to quickly expand and contract data and processing capacity as needed, with no impact on availability, stability, performance, or security.
  • Scale out: Adding more concurrent users requires the cloud data warehouse to scale out. You can add more resources (either more nodes to an existing cluster or an entirely new cluster, depending on the situation) as the number of concurrent users rises, allowing more users to access the same data without query performance degrading. A minimal resize sketch follows this list.
  • Managed infrastructure: Eliminating the overhead of data center management and operations for the data warehouse frees up resources to focus where value is produced: using the data warehouse to deliver information and insight.
  • Cost savings: On-premises data centers are extremely expensive to build and operate, requiring staff, servers and networking hardware, floor space, power, and cooling. When your data warehouse lives in the cloud, the operating expense in each of these areas is eliminated or substantially reduced.
  • Simplicity: Cloud data warehouse resources can be accessed through a browser and activated with a payment card. Fast self-service removes IT middlemen and democratizes access to enterprise data.
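
To make the scale-up and scale-out points above concrete, here is a minimal sketch using the AWS SDK for Python (boto3) to resize an Amazon Redshift cluster; other platforms such as Snowflake expose similar controls. The cluster identifier and node counts are hypothetical placeholders, not a prescription.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Scale out ahead of a heavy workload window: grow the cluster
# (hypothetical cluster name and node counts).
redshift.resize_cluster(
    ClusterIdentifier="analytics-cluster",
    NumberOfNodes=8,
    Classic=False,  # elastic resize, where the node type supports it
)

# Scale back down once the workload subsides, to cut cost.
redshift.resize_cluster(
    ClusterIdentifier="analytics-cluster",
    NumberOfNodes=4,
    Classic=False,
)
```

In practice, resizes like this can be run on a schedule or triggered by workload metrics, which is what makes the elasticity described above usable day to day.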

In my next post, I'll do a quick review of additional benefits and then dive into data migration. If you'd like to read all the details about the benefits, techniques, and challenges of migrating your data warehouse to the cloud, download the Eckerson Group white paper, "Jump-Start Your Cloud Data Warehouse: Meeting the Challenges of Migrating to the Cloud."

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter at @rdharn1.

Future Data Movement Trends with SnapLogic

Data volumes are increasing exponentially, and many organizations are starting to realize the complexity of their growing data movement and data management solutions. Data exists in various systems, and getting meaningful value out of it has become a major challenge for many companies. Much of this data is stored in relational systems such as MySQL, PostgreSQL, and Oracle, the mainstream databases used primarily for OLTP purposes. NoSQL systems such as Cassandra, MongoDB, and DynamoDB have also emerged, offering tunable consistency models for storing some of this mission-critical data. Customers then typically move this data to much larger OLAP systems such as Teradata and Hadoop that can store large amounts of data, so they can run analytics, reporting, or complex queries against it. More recently, some of this data is moving to the cloud, especially to Amazon Redshift or Snowflake, and also to Azure HDInsight or Azure SQL Data Warehouse.
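
As a rough illustration of that movement pattern, here is a minimal sketch that extracts rows from a PostgreSQL OLTP source, stages them in Amazon S3, and bulk-loads them into Amazon Redshift with a COPY command. Every connection string, bucket, table, and IAM role below is a hypothetical placeholder; a production pipeline would add batching, error handling, and incremental extraction.

```python
import csv
import io

import boto3
import psycopg2  # PostgreSQL driver; Redshift speaks the same wire protocol

# 1. Extract from the OLTP source (hypothetical DSN and table).
src = psycopg2.connect("host=oltp-db dbname=sales user=etl password=secret")
buf = io.StringIO()
with src, src.cursor() as cur:
    cur.execute("SELECT order_id, customer_id, amount, created_at FROM orders")
    csv.writer(buf).writerows(cur)  # cursor yields row tuples

# 2. Stage the extract in S3 (hypothetical bucket and key).
boto3.client("s3").put_object(
    Bucket="my-staging-bucket",
    Key="extracts/orders.csv",
    Body=buf.getvalue().encode("utf-8"),
)

# 3. Bulk-load into the warehouse via COPY (hypothetical cluster and role).
tgt = psycopg2.connect(
    "host=example.redshift.amazonaws.com port=5439 "
    "dbname=analytics user=etl password=secret"
)
with tgt, tgt.cursor() as cur:
    cur.execute(
        "COPY analytics.orders "
        "FROM 's3://my-staging-bucket/extracts/orders.csv' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopy' "
        "FORMAT AS CSV"
    )
```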

Continue reading “Future Data Movement Trends with SnapLogic”

SnapLogic CEO Gaurav Dhillon on Andreessen Horowitz Podcast

SnapLogic co-founder and CEO Gaurav Dhillon sat down recently with Scott Kupor, managing partner at Andreessen Horowitz, for a wide-ranging podcast discussion of all-things-data.

The two discussed how the data management landscape has changed in recent years, the rise of advanced analytics, the move from data warehouses to data lakes, and other changes that are enabling organizations to “take back their enterprise.”

Continue reading “SnapLogic CEO Gaurav Dhillon on Andreessen Horowitz Podcast”

The Data Lake Data Integration Challenge

The future of big data processing lies in the adoption of commercial Hadoop distributions and the deployments they support. The macro use case for big data is the data lake: massive amounts of structured and unstructured data that do not carry the same restrictions as a traditional data warehouse. Data lakes store everything, including every type of data, at any volume and any scope, that may be used by enterprise data users for any reason.
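
To illustrate the "store everything" idea, here is a minimal sketch that lands raw files of any type in an S3-backed data lake under source- and date-partitioned paths, deferring all interpretation to later schema-on-read processing. The bucket name and paths are hypothetical.

```python
import datetime

import boto3

s3 = boto3.client("s3")

def land_raw(source: str, filename: str, payload: bytes) -> str:
    """Land a raw object under lake/<source>/dt=<arrival-date>/<filename>."""
    key = f"lake/{source}/dt={datetime.date.today().isoformat()}/{filename}"
    s3.put_object(Bucket="my-data-lake", Key=key, Body=payload)
    return key

# Structured and unstructured data land side by side, unmodified;
# schema is applied only when the data is read.
land_raw("crm", "accounts.csv", b"id,name\n1,Acme\n")
land_raw("weblogs", "clicks.json", b'{"user": 1, "page": "/home"}')
```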

Despite the power and potential of data lakes, many enterprises continue to approach this technology with the same data integration methods and mechanisms they’ve used in the past, none of which work well. How can we tap into the power of the data lake? Continue reading “The Data Lake Data Integration Challenge”

Is the Data Warehouse Dead?


SnapLogic and Amazon Web Services are hosting a series of exclusive live seminars starting this week in Dallas. Next week we’ll be in Chicago and New York, followed by Palo Alto later in the month. The seminar series is focused on the future of data warehouse solutions and analytics in the modern enterprise. A key question that we’ll address is: Is the Data Warehouse Dead? Continue reading “Is the Data Warehouse Dead?”

Thought Leadership Webinar: Are You Battling an Illogical Data Warehouse?

Big data is evolving as a practice, and we are quickly approaching a point at which data will be treated as a single source, which will require a different type of architecture. According to John Myers of Enterprise Management Associates (EMA), this architecture will need to be one that is focused beyond a single platform, where operational and analytical workloads work together. This architecture is called a Hybrid Data Ecosystem.

Join us on Wednesday, April 29th, for a live webinar with John, Managing Research Director for EMA’s Business Intelligence practice. This webinar will review the drivers associated with big data implementations, evolving technical requirements for big data environments, and how a robust information management layer is important to big data projects.

During the webinar, we’ll also review how recent EMA research describes the following:

  • Use cases that drive big data and the importance of the Internet of Things and streaming applications in big data
  • The impact of cloud implementation avenues for big data projects
  • How the EMA Hybrid Data Ecosystem Information Management Layer coordinates integration between disparate platforms and data types

Register now and join John Myers and the SnapLogic team for this exciting webinar to learn about what constitutes the Hybrid Data Ecosystem – and why it’s a necessity for modern data integration.

5 Best Practices for Attaining Excellence in Big Data Integration

“Our research uncovers best practices that innovative organizations use not only to prepare and integrate big data but also more tightly unify it with analytics and operations across enterprise and cloud computing environments.”

– Mark Smith, CEO & Chief Research Officer, Ventana Research

Our latest webinar, featuring industry expert Mark Smith, focused on integration as a way to make full use of big data coming into the enterprise from a variety of sources and in incompatible formats. Most organizations, however, lack the technology to automate this process and manage this daunting challenge. Rather than relying on existing tools not specifically designed for this purpose, Ventana Research recommends that businesses use technology designed specifically to handle big data integration, since adopting it can significantly affect their ability to succeed in the world of nonstop data.

During the interactive discussion, Mark covered five best practices for attaining excellence in big data integration, which are:

  1. Evaluate efficiency of processes: Organizations need to increase agility rather than waste significant amounts of time on data integration-related tasks; integration capabilities need to be flexible enough to deliver cycles of processing to satisfy an array of different needs.
  2. Examine new approaches: Only one-third of organizations are satisfied with their current technology, and more than half say their current infrastructure is not fast or flexible enough; almost half said the technology is simply inadequate.
  3. Evaluate technology needs: Research shows that what matters most in selecting big data integration technology are its usability and reliability; the top three factors driving big data integration are business improvement, analytics and BI initiatives, and improvement in the quality of business processes.
  4. Investigate dedicated technology: Using dedicated data integration technology improves integration processes; however, currently only 12 percent of organizations use dedicated technology for this purpose.
  5. Gain benefits that outweigh costs: Organizations need to gain value from data and pinpoint the areas of business in which investment can help, allowing acquisition and deployment to address an organization’s needs.

Mark also talked about how the cloud is playing an increasing role as data can be accessed anywhere; according to the research, 35 percent of organizations are integrating cloud-based systems with those on-premises. For the full content of the webinar, take a look at the presentation slides or watch the recording here; you can also check out our infographic that addresses enterprise IT drivers, questions, and uncertainties around big data integration. Ventana Research has also published an ebook, available now on our website, covering the same topics as the webinar. Download it here.

Finally, we live-tweeted Wednesday’s webinar with the hashtag #BDI. Check out the full roundup below with some great insights from Mark Smith: