The bigger picture: Strategizing your data warehouse migration

By Ravi Dharnikota

If your organization is moving its data warehouse to the cloud, you can be confident you’re in good company. And if you read my last blog post about the six-step migration process, you can be even more confident that the move will go smoothly. However, don’t pull the trigger just yet. You’ve got a bit more planning to do, this time at a more strategic level.

First, let’s recap the data warehouse migration process I covered in my last post. In that post, I broke down all the components of this diagram:

Data Warehouse Migration Process

Now, as you can see in the diagram below, the data warehouse migration process itself is part of a bigger picture of migration planning and strategy. Let’s take a look at the important pre-migration steps you can take to help ensure success with the migration itself.

Migration Strategy and Planning

Step 1: Define Goals and Business Case. Start the planning process with a clear picture of the business reasons for migrating your data warehouse to the cloud. Common goals include:

  • Agility for both the business and the IT organization’s data warehousing projects.
  • Performance on the back end, to ensure timeliness and availability of data, and on the front end, for fast end-user query response times.
  • Growth and headroom to ease capacity planning; the elastic scalability of cloud resources largely removes that burden.
  • Cost savings on hardware, software, services, space, and utilities.
  • Labor savings from reduced needs for database administration, systems administration, scheduling and operations, and maintenance and support.

Step 2: Assess the current data warehouse architecture. If the current architecture is sound, you can plan to migrate to the cloud without redesign or restructuring. If it is sufficient for BI but limited for advanced analytics and big data integration, review and refine data models and processes as part of the migration effort. If the current architecture struggles to meet even today’s BI requirements, plan to redesign it as you migrate to the cloud.

Step 3: Define the migration strategy. A “lift and shift” approach is tempting, but it rarely succeeds. Changes are typically needed to adapt data structures, improve processing, and ensure compatibility with the chosen cloud platform. Incremental migration is more common and usually more successful.

As I mentioned in my last blog post, a hybrid strategy is another viable option. Here, your on-premises data warehouse remains in operation while the cloud data warehouse comes online. During this transition phase, you’ll need to synchronize data between the old on-premises data warehouse and the new one in the cloud.
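To make that synchronization step concrete, here is a minimal sketch of a watermark-based incremental sync. The connection strings, table name, and updated_at column are illustrative assumptions, and in practice an iPaaS or replication tool would usually handle this rather than a hand-rolled script:

```python
# Minimal sketch of watermark-based incremental sync during the hybrid phase.
# Connection strings, the table name, and the updated_at column are
# illustrative assumptions, not details from this post.
import psycopg2  # assumes both warehouses speak the PostgreSQL wire protocol

def sync_increment(onprem_dsn, cloud_dsn, table="sales_fact"):
    with psycopg2.connect(onprem_dsn) as src, psycopg2.connect(cloud_dsn) as dst:
        with src.cursor() as s, dst.cursor() as d:
            # 1. Find the high-water mark already loaded into the cloud copy.
            d.execute(f"SELECT COALESCE(MAX(updated_at), '1970-01-01') FROM {table}")
            watermark = d.fetchone()[0]
            # 2. Pull only the rows that changed on-premises since that watermark.
            s.execute(f"SELECT * FROM {table} WHERE updated_at > %s", (watermark,))
            # 3. Append the delta to the cloud table.
            for row in s.fetchall():
                placeholders = ", ".join(["%s"] * len(row))
                d.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
        dst.commit()
```

A production pipeline would also propagate updates and deletes (for example, via staged upserts) and batch the loads, but the watermark pattern above is the core of keeping the two warehouses in step until cutover.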

Step 4: Select the technology, including the cloud platform you’ll migrate to and the tools you’ll need for the migration. Many types of tools and services can be valuable:

  • Data integration tools are used to build or rebuild ETL processes to populate the data warehouse. Integration platform as a service (iPaaS) technology is especially well suited for ETL migration.
  • Data warehouse automation tools like WhereScape can be used to deconstruct legacy ETL, reverse engineer and redesign ETL processes, and regenerate ETL processes without the need to reconstruct data mappings and transformation logic.
  • Data virtualization tools such as Denodo provide a virtual layer of data views to support queries that are independent of storage location and adaptable to changing data structures.
  • System integrators and service providers like Atmosera can be helpful when manual effort is needed to extract data mappings and transformation logic that is buried in code.

Using these tools and services, individually or in combination, can make a remarkable difference, speeding up and de-risking the migration process.
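To illustrate the virtual-layer idea in the data virtualization bullet above, here is a plain-SQL sketch wrapped in Python. This is not Denodo’s own modeling language and is a simplification of what a dedicated virtualization tool does across heterogeneous sources; all schema and table names are hypothetical:

```python
# Illustration only: a view gives users and BI tools a stable name to query
# while the underlying table moves from the legacy warehouse to the cloud.
# Schema/table names and the DSN are hypothetical.
import psycopg2

def repoint_view(dsn, source_table):
    """Re-point the stable view at whichever table currently holds the data."""
    ddl = f"""
        CREATE OR REPLACE VIEW analytics.orders_v AS
        SELECT order_id, customer_id, amount
        FROM {source_table};
    """
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(ddl)

# Consumers always query analytics.orders_v; only the view definition changes
# as the data's physical home changes during the migration.
repoint_view("dbname=analytics", "legacy_dw.orders")  # before cutover
repoint_view("dbname=analytics", "cloud_dw.orders")   # after cutover
```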

Step 5: Migrate and operationalize. Start by defining test and acceptance criteria and plan the testing. Then execute the migration process to move schema, data, and processing. Execute the test plan and, when it passes, operationalize the cloud data warehouse and migrate users and applications.
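As a sketch of what executable test and acceptance criteria might look like, the snippet below compares row counts and simple aggregates between the legacy and cloud warehouses; the tables, columns, and connection strings are assumptions for illustration:

```python
# Hedged sketch: basic acceptance checks comparing the legacy and cloud copies.
# Table names, columns, and connection strings are illustrative assumptions.
import psycopg2

CHECKS = {
    "sales_fact": "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM sales_fact",
    "customer_dim": "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM customer_dim",
}

def run_acceptance_checks(legacy_dsn, cloud_dsn):
    failures = []
    with psycopg2.connect(legacy_dsn) as legacy, psycopg2.connect(cloud_dsn) as cloud:
        for table, query in CHECKS.items():
            with legacy.cursor() as l, cloud.cursor() as c:
                l.execute(query)
                c.execute(query)
                expected, actual = l.fetchone(), c.fetchone()
                if expected != actual:
                    failures.append((table, expected, actual))
    return failures  # an empty list means these acceptance criteria passed
```

Real acceptance suites typically add per-partition checksums, sampled row comparisons, and query performance thresholds, but failing fast on counts and totals catches many load errors early.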

Learn more at SnapLogic’s upcoming webinar

To get the full story on data warehouse cloud migration, join me for an informative SnapLogic webinar, “Traditional Data Warehousing is Dead: How digital enterprises are scaling their data to infinity and beyond in the Cloud,” on Wednesday, August 16 at 9:00am PT. I’ll be presenting with Dave Wells, Leader of Data Management Practice, Eckerson Group, and highlighting tangible business benefits that your organization can achieve by moving your data to the cloud. You’ll learn:

  • Practical best practices, key technologies to consider, and case studies to get you started
  • The potential pitfalls of “cloud-washed” legacy data integration solutions
  • Cloud data warehousing market trends
  • How SnapLogic’s Enterprise Integration Cloud delivers up to a 10X improvement in the speed and ease of data integration

Register today!

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1

Integrate through the big data insights gap

By Bill Creekbaum

Whether you’re an analyst, data scientist, CxO, or just a “plain ol’ business user,” having access to more data represents an opportunity to make better business decisions, identify new and innovative opportunities, respond to hard-to-identify threats … the opportunities abound.

More data – from IoT, machine logs, streaming social media, cloud-native applications, and more – is coming at you with diverse structures and in massive volumes at high velocity. Traditional analytic and integration platforms were never designed to handle these types of workloads.

This kind of data is often associated with big data and tends to be accessible only to a very limited audience with deep technical skills and experience (e.g., data scientists), which limits the business utility of having more data. The result is a big data insights gap that keeps a much broader population of business users and analysts from realizing big data’s benefits. Our industry’s goal should be to help business users and analysts operationalize insights from big data. In fact, Forbes has declared that 2017 is the year that big data goes mainstream.

There are two critical elements needed to close this big data insights gap:

  • A scalable data platform: Handles big data while remaining compatible with “traditional” analytic platforms
  • An integration platform: Acquires large volumes of high-velocity, diverse data without IT dependency

To address the first element, Amazon has released Amazon Redshift Spectrum as part of its growing family of AWS big data services. Optimized for massive data sets (petabytes and even exabytes) stored in S3, and delivered with the scalable performance of Amazon Redshift, Redshift Spectrum makes the above scenarios possible from an operational, accessibility, and economic perspective:

  • Operational: Amazon Redshift Spectrum allows for interaction with data volumes and diversity not possible with traditional OLAP technology.
  • Accessibility: SQL interface allows business users and analysts to use traditional analytic tools and skills to leverage these extreme data sets.
  • Economic: Amazon Redshift Spectrum shifts the majority of big data costs to the S3 service, which is far more economical than storing the entire data set in Redshift.

Clearly, Amazon has delivered a platform that can democratize the delivery of extremely large volumes of diverse business data to business users and analysts, allowing them to use the tools they currently employ, such as Tableau, PowerBI, QuickSight, Looker, and other SQL-enabled applications.
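As a rough sketch of what this enables, an external table registered with Redshift Spectrum can be joined with a table stored in Redshift itself using nothing but SQL. All names, the bucket, and the IAM role ARN below are placeholders, and the exact DDL options should be checked against the AWS documentation:

```python
# Sketch: query S3-resident data through Redshift Spectrum alongside a local
# Redshift table. All names, the bucket, and the IAM role ARN are placeholders.
import psycopg2

CREATE_EXTERNAL_SCHEMA = """
CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG DATABASE 'clickstream'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
"""

CREATE_EXTERNAL_TABLE = """
CREATE EXTERNAL TABLE spectrum.events (
    event_id VARCHAR(64),
    user_id  VARCHAR(64),
    event_ts TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://my-bucket/events/';
"""

JOIN_QUERY = """
SELECT c.segment, COUNT(*) AS events
FROM spectrum.events e                         -- data stays in S3, scanned by Spectrum
JOIN customer_dim c ON c.user_id = e.user_id   -- dimension table stored in Redshift
GROUP BY c.segment;
"""

conn = psycopg2.connect(host="my-cluster.example.redshift.amazonaws.com",
                        port=5439, dbname="analytics", user="analyst")  # password via PGPASSWORD
conn.autocommit = True  # external DDL cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute(CREATE_EXTERNAL_SCHEMA)
    cur.execute(CREATE_EXTERNAL_TABLE)
    cur.execute(JOIN_QUERY)
    print(cur.fetchall())
conn.close()
```

This is the accessibility point above in action: analysts keep using plain SQL and their existing SQL-enabled BI tools while the bulk of the data stays in lower-cost S3 storage.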

However, unless the large volumes of high velocity and diverse data can be captured, loaded to S3, and made available via Redshift Spectrum, none of the above benefits will be realized and the big data insights gap will remain.

The key challenges of acquiring and integrating large volumes of high-velocity, diverse data are:

  • On-prem in a Cloud-Native World: Many integration platforms were designed long ago to operate on-premises and to load data to an OLAP environment in batches. While some have been updated to operate in the cloud, many will fail with streaming workloads and collapse under the high volume of diverse data required today.
  • Integration is an “IT Task”: Typical integration platforms are intended to be used by IT organizations or systems integrators. Not only does this severely limit who can perform the integration work, it also tends to force the integration into a long project queue, delaying answers to critical business questions.

To address the second element in closing the big data insights gap, business users and analysts themselves must be able to capture the “big data” so that business questions can be answered in a timely manner. If it takes a long and complex IT project to capture the data, the business opportunity may be lost.

To close the big data insights gap for business users and analysts, the integration platform must:

  • Handle large volumes of high velocity and diverse data
  • Focus on integration flow development (not complex code development)
  • Comply with IT standards and infrastructure

With this approach to integration, those asking the business questions and seeking insights from more data can leverage the powerful capabilities of Amazon Redshift Spectrum and respond to business opportunities while it still matters.

Amazon’s Redshift Spectrum and the SnapLogic Enterprise Integration Cloud represent a powerful combination to close the big data insights gap for business users and analysts. In upcoming blog posts, we’ll look at actual use cases and learn how to turn these concepts into reality.

Interested in how SnapLogic empowers cloud warehouse users with up to a 10x improvement in the speed and ease of data integration for Redshift deployments? Check out the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”

Bill Creekbaum is Senior Director, Product Management at SnapLogic. Follow him on Twitter @wcreekba.

Gartner Names SnapLogic a Leader in the 2017 Enterprise iPaaS Magic Quadrant

For the second year in a row, SnapLogic has been named a Leader in Gartner’s Magic Quadrant for Enterprise Integration Platform as a Service (iPaaS).

Gartner evaluated iPaaS vendors on “completeness of vision” and “ability to execute.” Those named to the Leaders quadrant, as Gartner noted in the report, “have a solid reputation, with notable market presence and a proven track record in enabling … their platforms are well-proven and functionally rich, with regular releases to rapidly address this fast-evolving market.”

In a press release issued today, SnapLogic CTO James Markarian said of the recognition: “Since our inception, we have been laser-focused on delivering a modern enterprise integration platform that is specifically designed to manage the data and application integration demands of today’s hybrid enterprise technology environments. Our Enterprise Integration Cloud eliminates the complexity of legacy integrations, providing a platform that supports fast and easy self-service integration.”

The Enterprise iPaaS Magic Quadrant is embedded below. We’d encourage you to download the complete report as it provides a comprehensive review of all the vendors and the growing market.

Gartner 2017 iPaaS MQ

Thanks to all of SnapLogic’s customers, partners, and employees for the ongoing support and for making SnapLogic’s Enterprise Integration Cloud a leading self-service integration platform connecting applications, data, and things.

VIDEO: SnapLogic Discusses Big Data on #theCUBE from Strata+Hadoop World San Jose

It’s Big Data Week here in Silicon Valley with data experts from around the globe convening at Strata+Hadoop World San Jose for a packed week of keynotes, education, networking and more - and SnapLogic was front-and-center for all the action.

SnapLogic stopped by theCUBE, the popular video-interview show that live-streams from top tech events, and joined hosts Jeff Frick and George Gilbert for a spirited and wide-ranging discussion of all things Big Data.

First up was SnapLogic CEO Gaurav Dhillon, who discussed SnapLogic’s record-growth year in 2016, the acceleration of Big Data moving to the cloud, SnapLogic’s strong momentum working with AWS Redshift and Microsoft Azure platforms, the emerging applications and benefits of ML and AI, customers increasingly ditching legacy technology in favor of modern, cloud-first, self-service solutions, and more. You can watch Gaurav’s full video below, and here:

Next up was SnapLogic Chief Enterprise Architect Ravi Dharnikota, together with our customer, Katharine Matsumoto, Data Scientist at eero. A fast-growing Silicon Valley startup, eero makes a smart wireless networking system that intelligently routes data traffic on your wireless network in a way that reduces buffering and gets rid of dead zones in your home. Katharine leads a small data and analytics team and discussed how, with SnapLogic’s self-service cloud integration platform, she’s able to easily connect a myriad of ever-growing apps and systems and make important data accessible to as many as 15 different line-of-business teams, thereby empowering business users and enabling faster business outcomes. The pair also discussed ML and IoT integration which is helping eero consistently deliver an increasingly smart and powerful product to customers. You can watch Ravi and Katharine’s full video below, and here:

 

7 Big Data Predictions for 2017

As data increasingly becomes the means by which businesses compete, companies are restructuring operations to build systems and processes liberating data access, integration and analysis up and down the value chain. Effective data management has become so important that the position of Chief Data Officer is projected to become a standard senior board level role by 2020, with 92 percent of CIOs stating that a CDO is the best person to determine data strategy.

With this in mind as you evaluate your data strategy for 2017, here are seven predictions to contemplate to build a solid framework for data management and optimization.

  1.  Self-Service Data Integration Will Take Off
    Eschewing the IT bottleneck designation and committed to being a strategic partner to the business, IT is transforming its mindset. Rather than be providers of data, IT will enable users to achieve data optimization on a self-service basis. IT will increasingly decentralize app and data integration – via distributed Centers of Excellence based on shared infrastructure, frameworks and best practices – thereby enabling line-of-business heads to gather, integrate and analyze data themselves to discern and quickly act upon insightful trends and patterns of import to their roles and responsibilities. Rather than fish for your data, IT will teach you how to bait the hook. The payoff for IT: satisfying business user demand for fast and easy integrations and accelerated time to value; preserving data integrity, security and governance on a common infrastructure across the enterprise; and freeing up finite IT resources to focus on other strategic initiatives.
  2. Big Data Moves to the Cloud
    As the year takes shape, expect more enterprises to migrate storage and analysis of their big data from traditional on-premise data stores and warehouses to the cloud. For the better part of the last decade, Hadoop’s distributed computing and processing power has made it the standard open source platform for big data infrastructures. But Hadoop is far from perfect. Common user gripes include complexity and instability – not all that surprising given all the software developers regularly contributing their improvements to the platform. Cloud environments are more stable, flexible, elastic and better-suited to handling big data, hence the predicted migration.
  3. Spark Usage Outside of Hadoop Will Surge
    This is the year we will also see more Spark use cases outside of Hadoop environments. While Hadoop limps along, Spark is picking up the pace. Hadoop is still more likely to be used in testing rather than production environments. But users are finding Spark to be more flexible, adaptable and better suited for certain workloads – machine learning and real-time streaming analytics, as examples. Once relegated to Hadoop sidekick, Spark will break free and stand on its own two feet this year. I’m not alone in asking the question: Hadoop needs Spark but does Spark need Hadoop?
  4. A Big Fish Acquires a Hadoop Distro Vendor?
    Hadoop distribution vendors like Cloudera and Hortonworks paved the way with promising technology and game-changing innovation. But this past year saw growing frustration among customers lamenting increased complexity, instability and, ultimately, too many failed projects that never left the labs. As Hadoop distro vendors work through some growing pains (not to mention limited funds), could it be that a bigger, deeper-pocketed established player – say Teradata, Oracle, Microsoft or IBM – might swoop in to buy their sought after technology and marry it with a more mature organization? I’m not counting it out.
  5. AI and ML Get a Bit More Mainstream
    Off the shelf AI (artificial intelligence) and ML (machine learning) platforms are loved for their simplicity, low barrier to entry and low cost. In 2017, off the shelf AI and ML libraries from Microsoft, Google, Amazon and other vendors will be embedded in enterprise solutions, including mobile varieties. Tasks that have until now been manual and time-consuming will become automated and accelerated, extending into the world of data integration.

  6. Yes, IoT is Coming, Just Not This Year
    Connecting billions and billions of sensor-embedded devices and objects over the internet is inevitable, but don’t yet swallow all the hype. Yes, there is a lot being done to harness IoT for specific aims, but the pace toward a general-purpose IoT platform is closer to a canter than a gallop. IoT solutions today are so bespoke and purpose-built (the market is still nascent, with standards gradually evolving) that a general-purpose, mass-adopted IoT platform to collect, integrate, and report on data in real time will take, well, more time. Like any other transformative movement in the history of enterprise technology, brilliant bits and pieces need to come together as a whole. It’s coming, just not in 2017.

  7. APIs Are Not All They’re Cracked Up to Be
    APIs have long been the glue connecting apps and services, but customers will continue to question their value versus investment in 2017. Few would dispute that APIs are useful in building apps and, in many cases, may be the right choice in this regard. But in situations where the integration of apps and/or data is needed, there are better ways. A case in point is iPaaS (integration platform as a service), which allows you to quickly and easily connect any combination of cloud and on-premises technologies. Expect greater migration this year toward cloud-based enterprise integration platforms: compared to APIs, iPaaS solutions are more agile, better equipped to handle the vagaries of data, more adaptable to changes, easier to maintain, and far more productive.

I could go on and on, if for no other reason than that predictions are informed “best guesses” about the future. If I’m wrong on two or three of my expectations, my peers will forgive me. In the rapidly changing world of technology, batting .400 is a pretty good statistic.

Fall is Here, and So is Our Fall 2016 Release!

Fall is the time to move your clocks back, get a pumpkin latte and slow down with the approaching cold weather. But not for SnapLogic! We continue to deliver system integration solutions with full force. After our Summer 2016 release, which was feature-packed, the Fall 2016 release takes the SnapLogic platform to a whole new level by extending support for Teradata, introducing new Snap Packs for Snowflake and Azure Data Lake Store, adding more capabilities to Spark mode, and delivering several enhancements for security, performance and governance. As our VP of engineering, Vaikom Krishnan, aptly said, “We continue to make it easier and faster for organizations to connect any and all data sources – whether on premises, in the cloud, or in hybrid environments.”

Continue reading “Fall is Here, and So is Our Fall 2016 Release!”

Big Data Ingestion Patterns: Ingesting Data from Cloud & Ground Sources into Hive

What is Apache Hive? Hive provides a mechanism to query, create, and manage large datasets stored on Hadoop, using SQL-like statements. It also enables adding structure to existing data that resides on HDFS. In this post, I’ll describe a practical approach to ingesting data into Hive with the SnapLogic Elastic Integration Platform, without the need to write code.
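Before getting to the no-code approach, here is a minimal PyHive sketch of what “adding structure to existing HDFS data” and querying it with SQL-like statements looks like; the host, table, and HDFS path are assumptions for illustration:

```python
# Hedged sketch: lay a Hive table definition over files already on HDFS and
# query them with SQL-like statements. Host, table, and path are assumptions.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000, database="default")
cur = conn.cursor()

# External table: Hive adds structure to existing HDFS files without moving them.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ts      STRING,
        user_id STRING,
        url     STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/data/raw/web_logs'
""")

# Query the newly structured data.
cur.execute("SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url ORDER BY hits DESC LIMIT 10")
for row in cur.fetchall():
    print(row)
```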

Continue reading “Big Data Ingestion Patterns: Ingesting Data from Cloud & Ground Sources into Hive”