The bigger picture: Strategizing your data warehouse migration

By Ravi Dharnikota

If your organization is moving its data warehouse to the cloud, you can be confident you’re in good company. And if you read my last blog post about the six-step migration process, you can be even more confident that the move will go smoothly. However, don’t pull the trigger just yet. You’ve got a bit more planning to do, this time at a more strategic level.

First, let’s recap the data warehouse migration process itself, which I covered in my last post. There, I broke down all the components of this diagram:

Data Warehouse Migration Process

Now, as you can see in the diagram below, the data warehouse migration process itself is part of a bigger picture of migration planning and strategy. Let’s take a look at the important pre-migration steps you can take to help ensure success with the migration itself.

Migration Strategy and Planning

Step 1: Define Goals and Business Case. Start the planning process with a clear picture of the business reasons for migrating your data warehouse to the cloud. Common goals include:

  • Agility in terms of both the business and the IT organization’s data warehousing projects.
  • Performance on the back end, to ensure timeliness and availability of data, and on the front end, for fast end-user query response times.
  • Growth and headroom to ease capacity planning; the elastic scalability of cloud resources takes much of the guesswork out of sizing.
  • Cost savings on hardware, software, services, space, and utilities.
  • Labor savings from reduced needs for database administration, systems administration, scheduling and operations, and maintenance and support.

Step 2: Assess the current data warehouse architecture. If the current architecture is sound, you can plan to migrate to the cloud without redesign and restructuring. If architecturally sufficient for BI but limited for advanced analytics and big data integration, you should review and refine data models and processes as part of the migration effort. If the current architecture struggles to meet current BI requirements, plan to redesign it as you migrate to the cloud.

Step 3: Define the migration strategy. A “lift and shift” approach is tempting, but it rarely succeeds. Changes are typically needed to adapt data structures, improve processing, and ensure compatibility with the chosen cloud platform. Incremental migration is more common and usually more successful.

As I mentioned in my last blog post, a hybrid strategy is another viable option. Here, your on-premises data warehouse remains in operation as the cloud data warehouse comes online. During this transition phase, you’ll need to synchronize the data between the old on-premises data warehouse and the new one in the cloud.
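To make that synchronization step concrete, here is a minimal sketch of timestamp-based incremental sync in Python. It assumes each table carries a last_modified column, and it uses SQLite in-memory databases as stand-ins for the two warehouses; in practice you would swap in the DB-API drivers for your actual on-premises and cloud platforms. All names here are illustrative assumptions, not details from the post.

```python
# Minimal sketch: copy rows changed since the last sync cursor from the
# on-premises warehouse (src) to the cloud warehouse (dst).
import sqlite3

def sync_table(src, dst, table, cursor_ts):
    """Copy rows modified after cursor_ts from src to dst; return the new cursor."""
    rows = src.execute(
        f"SELECT id, payload, last_modified FROM {table} "
        "WHERE last_modified > ? ORDER BY last_modified",
        (cursor_ts,),
    ).fetchall()
    # Upsert so that re-running after a partial failure is safe (idempotent).
    dst.executemany(
        f"INSERT INTO {table} (id, payload, last_modified) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload, "
        "last_modified = excluded.last_modified",
        rows,
    )
    dst.commit()
    return max((r[2] for r in rows), default=cursor_ts)

# Demo: in-memory databases stand in for the two warehouses.
src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, last_modified TEXT)")
src.execute("INSERT INTO orders VALUES (1, 'order-1', '2017-08-01T10:00:00')")
src.commit()
cursor = sync_table(src, dst, "orders", "1970-01-01T00:00:00")
print(dst.execute("SELECT * FROM orders").fetchall())
```

A real deployment would also handle deletes and schema drift; dedicated change-data-capture tooling covers those cases more robustly.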

Step 4: Select the technology, including the cloud platform you’ll migrate to and the tools you’ll need for the migration. Many types of tools and services can be valuable:

  • Data integration tools are used to build or rebuild ETL processes to populate the data warehouse. Integration platform as a service (iPaaS) technology is especially well suited for ETL migration.
  • Data warehouse automation tools like WhereScape can be used to deconstruct legacy ETL, reverse engineer and redesign ETL processes, and regenerate ETL processes without the need to reconstruct data mappings and transformation logic.
  • Data virtualization tools such as Denodo provide a virtual layer of data views to support queries that are independent of storage location and adaptable to changing data structures.
  • System integrators and service providers like Atmosera can be helpful when manual effort is needed to extract data mappings and transformation logic that is buried in code.

Using these tools and services, individually or in combination, can make a remarkable difference, speeding up and de-risking the migration process.

Step 5: Migrate and operationalize. Start by defining test and acceptance criteria and planning the testing. Then execute the migration process to move schema, data, and processing. Execute the test plan and, when successful, operationalize the cloud data warehouse and migrate users and applications.
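To give a flavor of what automated acceptance criteria can look like, here is a minimal sketch that compares a migrated table against its source on row count and a simple checksum. The connections, table, and key column are illustrative assumptions; real test plans typically add data-type checks and sample-value comparisons.

```python
# Minimal acceptance check: the migrated table must match the source on row
# count and on an order-independent checksum over a numeric business key.
# sqlite3 is a stand-in; use the DB-API drivers for your actual warehouses.
import sqlite3

def table_fingerprint(conn, table, key_column="id"):
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    checksum = conn.execute(f"SELECT TOTAL({key_column}) FROM {table}").fetchone()[0]
    return count, checksum

def assert_migrated(old_conn, new_conn, table):
    old_fp = table_fingerprint(old_conn, table)
    new_fp = table_fingerprint(new_conn, table)
    assert old_fp == new_fp, f"{table}: source {old_fp} != target {new_fp}"
```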

Learn more at SnapLogic’s upcoming webinar

To get the full story on data warehouse cloud migration, join me for an informative SnapLogic webinar, “Traditional Data Warehousing is Dead: How digital enterprises are scaling their data to infinity and beyond in the Cloud,” on Wednesday, August 16 at 9:00am PT. I’ll be presenting with Dave Wells, Leader of Data Management Practice, Eckerson Group, and highlighting tangible business benefits that your organization can achieve by moving your data to the cloud. You’ll learn:

  • Practical best practices, key technologies to consider, and case studies to get you started
  • The potential pitfalls of “cloud-washed” legacy data integration solutions
  • Cloud data warehousing market trends
  • How SnapLogic’s Enterprise Integration Cloud delivers up to a 10X improvement in the speed and ease of data integration

Register today!

Ravi Dharnikota is Chief Enterprise Architect at SnapLogic. Follow him on Twitter @rdharn1.

Data integration and best practices in the age of the digital customer

By Nada daVeiga

Organizations are competing more than ever based on how they engage with customers. It’s become a vital part of the enterprise digital transformation agenda. Yet integration, a foundational element, is often overlooked in the rush to deploy new digital customer applications and experiences. McKinsey recently observed that “Integrating new processes with legacy systems in a cost-efficient way is a challenge most companies face when they digitize their customer journeys.” [1]

Why does it matter, and why is it such a big obstacle anyway?

The problem is that a lack of integration quickly becomes apparent to customers. In retail, weak integration between an e-commerce system and the CRM or ERP can result in website ordering, pricing, or shopping cart issues that aren’t visible to customer service, often leading to customer frustration or a lost sale. In B2B, poor integration between the CRM and ERP can lead to incorrectly rekeyed customer or order information, resulting in downstream invoicing issues.

But why is it so hard to pull together more integrated customer processes? Because there are just so many applications within the enterprise that manage a part of the customer process.

For example, a recent study by Ventana Research on customer analytics found that 40 percent of respondents worked with 14 different types of data across at least 6 different systems to derive customer insight. [2]

Five key strategies to connect and elevate your customer experience

With integration being the biggest barrier, let’s look at five strategies key to connecting and elevating the customer experience.

  1. Start with analytics, grow to experience

Why this sequencing? Simply, we have to start somewhere in order to measure key metrics, since only things measured can be improved. Getting a clear 360-degree view of the customer – with metrics around customer satisfaction, engagement, churn, and acquisition – provides the blueprint for targeting the best opportunities to upgrade customer experience.

  2. Put customer experts in control

Who better than the sales or service team to put themselves in the customer’s shoes? Analytics projects can quickly become IT-led. While IT has an incredibly important role to play in governance and in ensuring the efficient use of technology, experts in the lines of business should be enabled to connect the dots themselves.

  3. Customer experience is a team sport – get collaborative

Chances are, one of your customer process steps will depend on another team’s app, or the data needed for your analytics project will be under another team’s control. With so much cross-departmental integration, ensure different teams are using the same integration platform to maximize reuse.

  4. Plan to keep pace with customer touchpoint variety

Having to perform hand-coded API integrations or costly custom integrations just to keep pace is a sure way to drain budgets. Ensure your integration platform connects with your current apps, whether you’re running Salesforce, NetSuite, SAP, Oracle, or any other app, as well as the ones you plan to use in the future, without having to build the connectivity yourself.

  5. Customer data is your fastest growing asset – prepare to scale

There’s often no faster-growing asset in the enterprise than customer data. And it’s not just the data: the sheer number of workflows around customer experience is set to skyrocket. Choose an integration platform that’ll keep pace, because being forced to switch customer integration platforms later can quickly put the brakes on a customer experience initiative.

Set the foundation for customer experience success

To learn how to design your integration strategy to enable success with your customer initiatives, watch our webcast, “Data integration best practices in the age of the digital customer experience,” featuring Michele Goetz, Principal Analyst, Forrester Research Inc, and Ravi Dharnikota, Chief Enterprise Architect, SnapLogic. You’ll take away actionable insights for ensuring your organization’s data integration strategy is optimized for the digital customer. Register today!

Nada daVeiga is VP Worldwide Pre-Sales, Customer Success, and Professional Services at SnapLogic. Follow her on Twitter @nrdaveiga.


[1] “Digitizing customer journeys and processes: Stories from the front lines,” McKinsey, May 2017.

[2] “The Next Generation of Customer Analytics,” Ventana Research, February 2014.


Gaurav Dhillon on Nathan Latka’s “The Top” Podcast

Popular podcast host Nathan Latka has built a large following by getting top CEOs, founders, and entrepreneurs to share the strategies and tactics that set them up for business success. A data industry veteran and self-described “company-builder,” SnapLogic founder and CEO Gaurav Dhillon was recently invited by Nathan to appear as a featured guest on “The Top.”

Nathan is known for his rapid-fire, straight-to-the-point questioning, and Gaurav was more than up to the challenge. In this episode, the two looked back at Gaurav’s founding of Informatica in the ’90s; how he took that company public and helped it grow into a billion-plus-dollar business; why he stepped away from Informatica and decided to start SnapLogic; how data integration fuels digital business and why customers are demanding modern solutions like SnapLogic’s that are easy to use and built for the cloud; and how he’s building a fast-growing, innovative business that also has its feet on the ground.

The two also kept it fun, with Gaurav fielding Nathan’s “Famous Five” show-closing questions, including favorite book, most admired CEO, advice to your 20-year-old self, and more.

You can listen to the full podcast above.

How to set up Stream processing for Twitter using Snaps

By Sharath Punreddy

As you probably know, SnapLogic data pipelines use Streams: a continuous flow of data from a source to a target. By processing Streaming data and extracting valuable insights from it, a user or system can make decisions more quickly than with traditional batch processing, gaining near real-time, if not real-time, analytics.

In this data-driven age, the timing of data analytics and insights has become a key differentiator. In some cases, data becomes less relevant, if not obsolete, as it ages. Analyzing data as it flows in is crucial for use cases such as sentiment analysis for new product launches in retail, fraudulent transaction detection in the financial industry, preventing machine failures in manufacturing, sensor data processing for weather forecasting, and tracking disease outbreaks in healthcare. Stream processing enables analysis in near real-time, if not real-time, allowing the user or system to draw insights from the very latest data. Along with traditional APIs, companies now provide Streaming APIs that render data in real time as it is generated. Unlike traditional REST/SOAP APIs, Streaming APIs establish a connection to the server and continuously stream data for the desired amount of time; once the time has elapsed, the connection is terminated. Apache Spark with Apache Kafka as the streaming platform has become a de facto industry standard for stream processing.
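As a concrete illustration of that Spark-plus-Kafka pattern, the sketch below uses PySpark Structured Streaming to consume a Kafka topic of Tweets and keep a running count per language. The broker address, topic name, and JSON field names are assumptions, and the spark-sql-kafka connector package must be on the classpath (for example, via --packages).

```python
# Minimal PySpark Structured Streaming sketch: read Tweets from Kafka and
# maintain a running count per language.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("tweet-stream").getOrCreate()

# Only the fields we care about; real Tweets carry much more metadata.
schema = StructType().add("lang", StringType()).add("text", StringType())

tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # assumption
          .option("subscribe", "tweets")                        # assumption
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("t")))

counts = tweets.groupBy(col("t.lang")).count()  # running count per language

query = (counts.writeStream
         .outputMode("complete")  # emit the full updated counts each trigger
         .format("console")
         .start())
query.awaitTermination()
```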

In this blog post, I’ll walk through the steps for building a simple pipeline to retrieve and process Tweets. You can also jump to the how-to video here.

Twitter Streams
Twitter has become a primary data source for sentiment analysis. The Twitter Streaming APIs provide access to global Tweets, which can be consumed in real time as people are tweeting. SnapLogic’s “Twitter Streaming Query” Snap enables users to retrieve Tweets based on a keyword in the text of the Tweet. The Tweets can then be processed using Snaps such as the Filter, Mapper, or Aggregate Snaps, for filtering, transforming, and aggregating, respectively. SnapLogic also provides a “Spark Script” Snap through which an existing Python program can be executed on incoming Tweets. Tweets can also be routed to different destinations based on a condition, or copied to multiple destinations (RDBMS, HDFS, S3, etc.) for storage and further analysis.
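For the curious, here is a rough Python sketch of what a streaming query does under the hood: open a long-lived HTTP connection with OAuth1 credentials and read newline-delimited JSON as it arrives. The endpoint shown is Twitter’s v1.1 statuses/filter endpoint as it existed when this post was written, and the credential values are placeholders.

```python
# Sketch of consuming a streaming API directly; the Snap handles all of this
# (plus retries, backoff, and account management) for you.
import json
import requests
from requests_oauthlib import OAuth1

auth = OAuth1("CONSUMER_KEY", "CONSUMER_SECRET",      # placeholders
              "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET")
url = "https://stream.twitter.com/1.1/statuses/filter.json"

with requests.post(url, auth=auth, data={"track": "snaplogic"}, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # keep-alive heartbeats arrive as empty lines
            tweet = json.loads(line)
            print(tweet.get("lang"), tweet.get("text", "")[:80])
```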

Getting Started
Below is a simple pipeline for retrieving Tweets, filtering them based on the language, and publishing to a Kafka cluster.

  1. Using the Snaps tab in the left frame, search for the “Twitter Streaming Query” Snap. Drag and drop it onto the Designer canvas (the white space on the right).

a. Click on the Snap to open the Snap Settings form.

Note: The “Twitter Streaming Query” Snap requires a Twitter account, which can be created through Designer while building the pipeline or through Manager before building it.

b. Click on the “Account” tab.

c. Click on the “Add Account” button.

Note: Twitter provides two ways to authenticate applications to a Twitter account. “Twitter Dynamic OAuth1” is for application-only authentication, while “Twitter OAuth1” is for user authentication, where the user authorizes the application by signing into Twitter. In this case, we are using the user authentication mechanism.

d. Choose an appropriate option based on the accessibility of the Account:
i. For the location of the account: “shared” makes the account accessible to the entire organization, “projects/shared” makes it accessible to all users in the project, and “project/” makes it accessible only to the individual user.
ii. For Account Type: Choose the “Twitter OAuth1” option to grant access to the Twitter account of the individual user.
iii. Click “OK.”

e. Enter meaningful text for the “Label,” such as [Twitter_of_], and click the “Authorize” button.

Note: If the user is logged into Twitter with an active session, they will be taken to the “Authorize” page of the Twitter website to grant the application access. If the user is not logged in or does not have an active session, they will be taken to the Twitter sign-in page first.

f. Click on the “Authorize app” button.

Note: The “OAuth token” and “OAuth token secret” values shown are not active and are for example only.

g. At this point, the “OAuth token” and the “OAuth token secret” should have been populated. Click “Apply.”

2. Once the account is successfully set up, click on the “Settings” tab to provide the search keyword and time.

Note: The Twitter Snap retrieves Tweets for a designated time duration. For continuous retrieval, you can provide a value of “0” for “Timeout in seconds.”

a. Enter a keyword and a time duration in seconds.


3. Save by clicking the disk icon at the top right. This triggers validation, and the icon becomes a check mark if validation is successful.


4. Click on the list icon to preview the data.

5. This confirms that the “Twitter Streaming Query” Snap has successfully established a connection to the Twitter account and is fetching Tweets.

6. The “Filter” Snap is used to filter the Tweets. Search for “Filter” using the Snaps tab in the left frame, then drag and drop the “Filter” Snap onto the canvas.

a. Click on the “Filter” Snap to open the Settings form.

b. Provide a meaningful name for the “Label,” such as “Filter By Language,” and a filter condition for the “Filter Expression” (for example, an expression on the Tweet’s lang field such as $lang == "en"). You can use the drop-down to choose the filter attribute.

7. Click on the disk icon to save, which again triggers validation. You’ve now successfully configured the “Filter” Snap.

8. Search for the “Confluent Kafka Producer” Snap using the Snaps tab in the left frame. Drag and drop the Snap onto the canvas.

Note: Confluent is an Apache Kafka distribution geared toward enterprises.

a. The “Confluent Kafka Producer” Snap requires an account to connect to the Kafka cluster. Choose appropriate values based on the location and type of the account.

b. Provide meaningful text for the “Label” and enter the bootstrap server(s). For multiple bootstrap servers, separate them with commas, each with its port.

c. The “Schema registry URL” is optional, but it is required if Kafka needs to parse messages based on a schema.

d. Other optional Kafka properties can be passed to Kafka using the “Advanced Kafka Properties.” Click on “Validate.”

e. If validation is successful, you should see the message “Account validation successful” at the top. Click “Apply.”

9. Once the account is set up and chosen, click on the “Settings” tab to provide the Kafka topic and message.



a. You can choose from the list of available topics by clicking the bubble icon next to the “Topic” field. Leave the other fields at their defaults. The other required field is “Message value”; enter “$” to send the entire Tweet and its metadata. Save by clicking the disk icon.

10. You now have a fully validated pipeline to fetch Tweets and load them into Kafka.

11. At this point, the pipeline is all set to receive Tweets and push them into the Kafka topic. Run the pipeline by clicking the play button in the top right corner, and view progress by clicking the display button.
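For comparison, here is a rough code equivalent of what the three Snaps accomplish, using the confluent-kafka Python client. The get_tweets() generator is a hypothetical stand-in for the Twitter Streaming Query Snap, and the broker address and topic name are assumptions.

```python
# Illustrative equivalent of the pipeline: source -> filter by language -> Kafka.
import json
from confluent_kafka import Producer  # pip install confluent-kafka

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumption

def get_tweets():
    """Hypothetical tweet source; the Twitter Streaming Query Snap fills this role."""
    yield {"lang": "en", "text": "hello"}
    yield {"lang": "fr", "text": "bonjour"}

for tweet in get_tweets():
    if tweet.get("lang") == "en":                      # the Filter Snap's role
        producer.produce("tweets", json.dumps(tweet))  # the Kafka Producer Snap's role

producer.flush()  # block until all buffered messages are delivered
```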

As you can see, the pipeline can be built in less than 15 minutes without requiring any deep technical knowledge. This tutorial and video provide a basic example of what can be achieved with these Snaps. There are several other Snaps that can act on the data: filtering, copying, aggregating, triggering events, sending out emails, and more. SnapLogic takes pride in bringing complex technology to citizen integrators. I hope you found this useful!

Sharath Punreddy is Enterprise Solution Architect at SnapLogic. Follow him on Twitter @srpunreddy.

Making Workday Faster for Vassar College

Last week we attended Workday Rising in Chicago, where we talked to attendees about integrating Workday with the rest of their IT ecosystems. The real stars of the show, however, were our customers from Vassar College, who gave a brief presentation at our booth to discuss their journey from finding the need for an integration vendor, to assessing different platforms, to ultimately choosing SnapLogic’s elastic integration platform as a service (iPaaS).

Continue reading “Making Workday Faster for Vassar College”

Testing… Testing… 1, 2, 3: How SnapLogic tests Snaps on the Apache Spark Platform

The SnapLogic Elastic Integration Platform connects your enterprise data, applications, and APIs via drag-and-drop data pipelines. Each pipeline is made up of Snaps: intelligent connectors that users drag onto a canvas and “snap” together like puzzle pieces.

A SnapLogic pipeline being built and configured

These pipelines are executed on a Snaplex, an application that runs on a multitude of platforms: on a customer’s infrastructure, on the SnapLogic cloud, and most recently on Hadoop. A Snaplex that runs on Hadoop can execute pipelines natively in Spark.

The SnapLogic data management platform is known for its easy-to-use, self-service interface, made possible by our team of dedicated engineers (we’re hiring!). We work to apply the industry’s best practices so that our clients get the best possible end product — and testing is fundamental. Continue reading “Testing… Testing… 1, 2, 3: How SnapLogic tests Snaps on the Apache Spark Platform”
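The full post covers our approach in depth; as a taste of what unit-testing pipeline logic on Spark can look like, here is a minimal pytest sketch. The fixture and the toy transformation are illustrative, not SnapLogic’s actual test suite.

```python
# Minimal pytest sketch for testing a Spark transformation locally.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="module")
def spark():
    # Small local session so tests run without a cluster.
    return SparkSession.builder.master("local[2]").appName("snap-tests").getOrCreate()

def filter_english(df):
    """Toy stand-in for a Snap's transformation: keep English-language rows."""
    return df.filter(df.lang == "en")

def test_filter_english(spark):
    df = spark.createDataFrame([("en", "hi"), ("fr", "salut")], ["lang", "text"])
    assert filter_english(df).count() == 1
```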

New Podcast Series: SnapTalk

We are pleased to announce our new podcast series, SnapTalk. The series will feature short, 10-15 minute episodes on topics relevant to big data, data management, and app and data integration. Our host for the series is Ravi Dharnikota, SnapLogic’s head of enterprise architecture. Each episode features a special guest in conversation with Ravi, such as SnapLogic’s chief scientist, Greg Benson.

This project grew out of the great conversations we have at Snappy Hour. Eating lunch as a group at least a couple of times a week and our weekly happy hour (called Snappy Hour) are big parts of the SnapLogic culture. And, invariably, the conversations at these gatherings range from the lightweight, such as the latest episode of Game of Thrones, to the complex, such as the future of Spark and what makes streaming data streaming. This podcast series is intended to capture the essence of those ad hoc discussions, get people thinking, and hopefully inspire additional discussions.

The first episodes are posted now and cover topics such as Spark, streaming data, and Kafka. Stay tuned to this space for the next episode. The SnapTalk playlist is here and our new SoundCloud channel is here. I hope you’ll subscribe, and we welcome your feedback.