Igniting data discovery: SnapLogic delivers a “quanta” in improvement over Informatica

In my previous blog post, I talked about how a pharmaceutical company uses SnapLogic and Amazon Redshift to capitalize on market and environmental fluctuations, driving sales for its asthma relief medication. In this post, I’ll describe the path the company took to get there. Hint: It wasn’t a straight one.

An IT organization abandons Informatica

Several months prior to launching its current environment, with data flows powered by SnapLogic, the pharmaceutical company tried, unsuccessfully, to integrate its data warehouses in Redshift using Informatica PowerCenter and Informatica Cloud. The IT team’s original plan was to move data from Salesforce, Veeva, and third-party sources into Amazon Simple Storage Service (S3), and then integrate the data into Redshift for sales and marketing analytics.

However, the project stalled due to difficulty with Informatica PowerCenter, the IT team’s initial choice for data integration. PowerCenter, which Informatica describes as a “metadata-driven integration platform,” is a data extract, transform, and load (ETL) product rooted in mid-1990s enterprise architecture. The team found PowerCenter complicated to use and slow to deliver the urgently needed integrations.

Looking for faster results, the pharmaceutical company then attempted to use Informatica Cloud, Informatica’s cloud-based integration solution. The data integration initiative was again derailed, this time by the solution’s lack of maturity and functionality. The pharmaceutical company’s data was forced back on-premises, jeopardizing the entire cloud data warehouse initiative.

Data integration aligned with the cloud

But the IT team kept searching for the right data integration solution. “Cloud was instrumental to our plans, and we needed data integration that aligned with where we were headed,” said the senior business capability manager in charge of the integration project. The pharmaceutical company chose the SnapLogic Enterprise Integration Cloud.

After a self-evaluation, the IT team was able to quickly build data integrations with SnapLogic; no specialized resources or consultants were required. To accomplish the integrations in Redshift, the pharmaceutical company used:

  • Salesforce Snap
  • Redshift Snap
  • Various RDBMS Snaps
  • REST/SOAP Snaps
  • Transformation Snaps
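
As a point of reference for what the Redshift Snap abstracts away, a hand-coded S3-to-Redshift load typically comes down to Redshift’s COPY command. A minimal Python sketch of that manual route follows; the cluster endpoint, database, table, bucket, and IAM role are all hypothetical placeholders:

    import psycopg2  # assumes the psycopg2 driver; Redshift speaks the PostgreSQL wire protocol

    # Hypothetical connection and load details, for illustration only
    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="loader",
        password="<password>",
    )
    with conn, conn.cursor() as cur:
        # Bulk-load files staged in S3 into a Redshift table
        cur.execute("""
            COPY sales.pharmacy_daily
            FROM 's3://example-bucket/staged/sales/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-load'
            FORMAT AS CSV
            IGNOREHEADER 1;
        """)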

With the data integration accomplished in a matter of days, the IT organization was assured that current skill sets could support the company’s future global BI architecture. In addition, the IT team found the SnapLogic Enterprise Integration Cloud easy enough for business users, such as the marketing team, to integrate new data into Redshift.

With Redshift’s nearly infinite supply of low-cost data storage and compute resources, the analytic possibilities are equally limitless – igniting the marketing team’s discovery of new strategies to drive new insights, revenues, and operational efficiencies.

SnapLogic delivers a “quanta” in improvement 

What is quanta? It’s the plural of the word “quantum,” a physics term that describes “a discrete quantity of energy proportional in magnitude to the frequency of the radiation it represents.” If you’re not a physicist, your closest association is probably “quantum leap” – basically a gigantic leap forward.

Which is exactly what SnapLogic delivers. With regard to Informatica, Gaurav Dhillon, founder and CEO of SnapLogic, says:

“Fundamentally, I believe that SnapLogic is 10 times better than Informatica. That’s a design goal, and it’s also a necessary and sufficient condition for success. If a startup is going to survive, it’s got to have some 10x factor, some quanta of a value proposition.

“The quanta over the state of the art – the best-of-the-best of the incumbents – is vital. SnapLogic can fluently solve enterprise data problems almost as they are happening. That has a ‘wow’ factor people experience when they harness the power of our data integration technology.”

The SnapLogic Enterprise Integration Cloud is a mature, full-featured Integration Platform-as-a-Service (iPaaS) built in the cloud, for the cloud. Through its visual, automated approach to integration, the SnapLogic Enterprise Integration Cloud uniquely empowers both business and IT users, accelerating cloud data warehouse and analytics initiatives on Redshift and other cloud data warehouses.

Unlike on-premises ETL or immature cloud tools, SnapLogic combines ease of use, streaming scalability, on-premises and cloud integration, and managed connectors. Together, these capabilities present an improvement of up to 10 times over legacy ETL solutions such as Informatica or other “cloud-washed” solutions originally designed for on-premises use, accelerating cloud data warehouse integrations from months to days.

To learn more about how SnapLogic allows citizen data scientists to be productive with Amazon Redshift in days, not months, register for the webcast “Supercharge your Cloud Data Warehouse: 7 ways to achieve 10x improvement in speed and ease of Redshift integration.”

Craig Stewart is Vice President, Product Management at SnapLogic.

Discovery in overdrive: SnapLogic and Amazon Redshift power today’s pharma marketing

At its most fundamental, pharmaceutical marketing is based on straightforward market sizing and analytic baselines:

“The global market is composed of many submarkets [aka therapeutic categories] (TCs), whose number is given and equal to nTC. Each TC has a different number of patients (PatTC) in need of treatment for a specific disease, which determines the potential demand for drugs in each submarket. This number is set at the beginning of each simulation by drawing from a normal distribution [PatTC~N(μp,σp)] truncated in 0 to avoid negative values, and it is known by firms. Patients of each TC are grouped according to their willingness to buy drugs characterised by different qualities.”*
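
To make the quoted setup concrete, here is a minimal Python sketch of that market-sizing draw: patient counts per therapeutic category sampled from a normal distribution and truncated at zero, as the model describes. The parameter values are illustrative, not taken from the paper:

    import numpy as np

    rng = np.random.default_rng(42)
    n_tc = 10                       # number of therapeutic categories (illustrative)
    mu_p, sigma_p = 50_000, 20_000  # illustrative mean and std of patients per TC

    # Draw PatTC ~ N(mu_p, sigma_p) for each TC, truncated at 0 by resampling
    patients = rng.normal(mu_p, sigma_p, size=n_tc)
    while (patients <= 0).any():
        redraw = patients <= 0
        patients[redraw] = rng.normal(mu_p, sigma_p, size=redraw.sum())

    print(patients.round())  # potential demand per therapeutic category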

Yet capturing market share in today’s competitive environment is anything but easy. In the recent past, an army of sales reps would market directly to doctors, their efforts loosely coupled with consumer advertising placed across demographically compatible digital and traditional media.

This “spray and pray” approach to promotional spending, while extremely common, made it difficult to pinpoint the specific tactics that drove individual product revenues. Projections and sales data factored heavily into the campaign planning stage and into reports that summarized weekly, monthly, and quarterly results, but the insights gleaned were nearly always backward-looking and lacked a predictive element.

A pharmaceutical company pinpoints opportunity

Today, sophisticated pharmaceutical marketers have a much firmer grasp of how to use data to drive sales in a predictive manner – by deploying resources with pinpoint precision. A case in point: To maximize the market share of a prescription asthma medication, a leading pharmaceutical company uses SnapLogic and Amazon Redshift to analyze and correlate enormous volumes of data on a daily basis, capitalizing on even the smallest market and environmental fluctuations.

  • Each night, the marketing team takes in pharmacy data from around the US to monitor sales in each region and learn how many units of the asthma medication were sold the previous day. These numbers are processed, analyzed, and reported back to the sales team the following morning, allowing reps to closely monitor progress against their sales objectives.
  • With this data, the pharmaceutical marketing team can monitor, at aggregate and territory levels, the gross impact of many variables including:
    • Consumer advertising campaigns
    • Rep incentive programs
    • News coverage of air quality and asthma
  • However, the pharmaceutical marketing team takes its exploration much deeper. On top of the core sales data, the marketing team layers and correlates weather data from the National Weather Service (NWS) and multiple data sets from the US Environmental Protection Agency (EPA), such as current air quality, historic air quality, and air quality over time. Like the sales data, the weather and EPA data cover the entire US.

By correlating these multiple data sets, the marketing team can extract extraordinary insights that improve tactical decisions and inform longer-term strategy. At a very granular, local level, the team can see:

  • How optimal timing and placement of advertising across digital and traditional media drives demand
  • Which regional weather conditions stimulate the most sales in specific locales
  • The impact of rep incentive programs on sales
  • How news coverage of air quality and asthma influences demand
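
As a sketch of the kind of correlation involved, the following minimal Python example joins daily sales with air-quality readings by region and date, then computes their correlation. The file and column names are hypothetical stand-ins for the pharmacy and EPA feeds:

    import pandas as pd

    # Hypothetical daily extracts, keyed by region and date
    sales = pd.read_csv("sales_daily.csv", parse_dates=["date"])    # columns: region, date, units_sold
    air = pd.read_csv("epa_air_quality.csv", parse_dates=["date"])  # columns: region, date, aqi

    # Join the feeds on region and day, then correlate per region
    merged = sales.merge(air, on=["region", "date"])
    by_region = merged.groupby("region")[["aqi", "units_sold"]].corr()
    print(by_region)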

Ultimately, the pharmaceutical marketing team can identify, with uncanny precision, the markets in which to concentrate spending on local and regional media, a mix that can change constantly. In this way, prospective consumers are targeted with laser-like accuracy, raising their awareness of the pharmaceutical company’s asthma medication at the time they need it most.

The results of the targeted marketing strategy are clear: The pharmaceutical company has enjoyed significant market share growth with its asthma relief medication, while reducing advertising costs due to more effective targeting.

Tools to empower business users

The pharmaceutical industry example illustrates perhaps the biggest trend in recent business history: massive demand for massive amounts of data, to provide insight and drive informed decision-making. But five years after data scientist was named “the sexiest job of the 21st century,” it’s not data scientists who are gathering, correlating, and analyzing all this data; at the most advanced companies, it’s business users. At the pharmaceutical company and countless others like it, the analytics explosion is ignited by “citizen data scientists” using SnapLogic and Redshift.

In my next blog post, the second of this two-part series, I’ll talk about how SnapLogic turned around a failing initial integration effort at the pharmaceutical company, replacing Informatica PowerCenter and Informatica Cloud.

To find out more on how to use SnapLogic with Amazon Redshift to ignite discovery within your organization, register for the webcast “Supercharge your Cloud Data Warehouse: 7 ways to achieve 10x improvement in speed and ease of Redshift integration.”

Craig Stewart is Vice President, Product Management at SnapLogic.

* JASSS, A Simulation Model of the Evolution of the Pharmaceutical Industry: A History-Friendly Model, October 2013

Applying machine learning tools to data integration

By Gregory D. Benson

Few tasks are more personally rewarding than working with brilliant graduate students on research problems that have practical applications. This is exactly what I get to do as both a Professor of Computer Science at the University of San Francisco and as Chief Scientist at SnapLogic. Each semester, SnapLogic sponsors student research and development projects for USF CS project classes, and I am given the freedom to work with these students on new technology and exploratory projects that we believe will eventually impact the SnapLogic Enterprise Integration Cloud Platform. Iris and the Integration Assistant, which applies machine learning to the creation of data integration pipelines, represents one of these research projects that pushes the boundaries of self-service data integration.

For the past seven years, these research projects have provided SnapLogic Labs with bright minds and at the same time given USF students exposure to problems found in real-world commercial software. I have been able to leverage my past 19 years of research and teaching at USF in parallel and distributed computing to help formulate research areas that enable students to bridge their academic experience with problems found in large-scale software that runs in the cloud. Project successes include Predictive Field Linking, the first SnapLogic MapReduce implementation called SnapReduce, and the Document Model for data integration. It is a mutually beneficial relationship.

During the research phase of Labs projects, the students have access to the SnapLogic engineering team, and can ask questions and get feedback. This collaboration allows the students to ramp up quickly with our codebase and gets the engineering team familiar with the students. Once we have prototyped and demonstrated the potential for a research project, we transition the code to production. But the relationship doesn’t end there – students who did the research work are usually hired on to help with transitioning the prototype to production code.

The SnapLogic Philosophy
Iris technology was born to help an increasing number of business users design and implement data integration tasks that previously required extensive programming skills. Most companies must manage a growing number of data sources and cloud applications, as well as ever-increasing data volumes. Data integration platforms are what help businesses connect and transform all of this disparate data. The SnapLogic philosophy has always been to provide truly self-service integration through visual programming. Iris and the Integration Assistant further advance this philosophy by learning from the successes and failures of thousands of pipelines and billions of executions on the SnapLogic platform.

The Project
Two years ago, I led a project that refined our metadata architecture, and last year I proposed a machine learning project for USF students. At the time, I had only some vague ideas about what we could achieve. The plan was to spend the first part of the project doing data science on the SnapLogic metadata to see what patterns we could find and what opportunities existed for applying machine learning.

One of the USF graduate students working on the project, Thanawut “Jump” Anapiriyakul, discovered that we could learn from past pipeline definitions in our metadata to help recommend likely next Snaps during pipeline creation. Jump experimented with several machine learning algorithms to find the ones that give the best recommendation accuracy. We later combined the pipeline definition with Snap execution history to further improve recommendation accuracy. The end result: Pipeline creation is now much faster with the Integration Assistant.
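
SnapLogic hasn’t published the exact algorithms here, but the core idea, learning which Snap tends to follow which from historical pipeline definitions, can be sketched as a simple transition-frequency model. A minimal, hypothetical Python illustration (the Snap names and pipelines are invented):

    from collections import Counter, defaultdict

    # Hypothetical training data: each pipeline as an ordered list of Snap types
    pipelines = [
        ["Salesforce Read", "Mapper", "Redshift Bulk Load"],
        ["Salesforce Read", "Filter", "Mapper", "Redshift Bulk Load"],
        ["REST Get", "Mapper", "Redshift Bulk Load"],
    ]

    # Count how often each Snap is immediately followed by each other Snap
    transitions = defaultdict(Counter)
    for pipeline in pipelines:
        for current, nxt in zip(pipeline, pipeline[1:]):
            transitions[current][nxt] += 1

    def recommend(current_snap, k=3):
        """Return the k most frequently observed next Snaps."""
        return [snap for snap, _ in transitions[current_snap].most_common(k)]

    print(recommend("Mapper"))  # ['Redshift Bulk Load']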

The exciting thing about the Iris technology is that we have created an internal metadata architecture that supports not only the Integration Assistant but also the data science needed to further leverage historical user activity and pipeline executions to power future applications of machine learning in the SnapLogic Enterprise Cloud. In my view, true self-service in data integration will only be possible through the application of machine learning and artificial intelligence as we are doing at SnapLogic.

As for the students who work on SnapLogic projects, most are usually offered internships and many eventually become full-time software engineers at SnapLogic. It is very rewarding to continue to work with my students after they graduate. After ceremonies this May at USF, Jump will join SnapLogic full-time this summer, working with the team on extending Iris and its capabilities.

I look forward to writing more about Iris and our recent technology advances in the weeks to come. In the meantime, you can check out my past posts on JSON-centric iPaaS and Hybrid Batch and Streaming Architecture for Data Integration.

Gregory D. Benson is a Professor in the Department of Computer Science at the University of San Francisco and Chief Scientist at SnapLogic. Follow him on Twitter @gregorydbenson.

Onboarding contingent workers: Can your HR processes handle it?

It’s not just Lyft drivers and TaskRabbit taskers. Contingent workers — workers who are part-time, temporary, or often both — form a rapidly growing proportion of the overall US workforce:

  • 17% of the US workforce in 1989[1]
  • 36% in 2015[2]
  • Projected 43% by 2020[3]

Contingent workers, the cornerstone of the Gig Economy, have been the subject of countless headlines and conversations for several years. But while picking up a few bucks on the weekends driving for Uber is now mainstream, the companies that hire contingent workers still struggle with how to on- and off-board them efficiently. In an age when every moment of productivity counts, onboarding processes that are slowed by legacy systems present a significant drag on revenues and profitability.

Catching the productivity thief: poor integration

Take, for example, a venerable national restaurant chain on a hot streak of growth. To handle increased demand, more cooks needed to be hired — in the midst of a national shortage of cooks. In the face of this dire shortage, every moment of productivity counted at the restaurant chain. To onboard new cooks in moments, not hours or days, the restaurant chain’s CIO embarked on a cloud-first strategy, intent on automating key HR processes and reducing costs. The company deployed Workday Human Capital Management (HCM) and ServiceNow, and set out to integrate the cloud applications with legacy on-premises Oracle applications using Informatica PowerCenter.

Several months into a proof of concept (POC) exercise, the project stalled due to the complexity, expense, and slowness of the integration process.

SnapLogic speeds integration — and business results

Recognizing that a radically different integration solution was needed, the restaurant chain chose the SnapLogic Enterprise Integration Cloud to accelerate its transition. Here, the company’s goal was twofold:

  • Automate critical current and future workflows, such as employee on- and off-boarding, across Workday, Oracle applications and multiple point solutions. This comprised:
    • Onboarding cooks quickly and getting these new employees productive as fast as possible, by enrolling them in applicable benefits programs, provisioning uniforms, and automating other new-employee actions.
    • Off-boarding departing employees as quickly as possible to reduce risk to the company.
  • Reduce the complexity and cost of the initial integrations, as well as their maintenance.

Using SnapLogic, the restaurant chain achieved all of its goals. It completed the complex Workday integration project in a matter of days, a dramatic contrast to its protracted, unsuccessful Informatica PowerCenter POC initiative.

With SnapLogic integration Snaps in place, and new Snaps easily added, the restaurant chain has a flexible foundation to handle future data and process integrations with speed and ease. And, because the Workday integration was executed at a fraction of the time and cost of using Informatica PowerCenter, SnapLogic was a direct catalyst to the restaurant chain achieving its digital transformation goals.

The SnapLogic Enterprise Integration Cloud, a self-service integration platform, makes it fast and easy to connect data, applications and devices. In doing so, SnapLogic eliminates business silos and technology bottlenecks, helping companies of all kinds to more efficiently manage their contingent workforces.

Bonus: reduce the cost of Workday integrations by up to 90% 

Find out how SnapLogic can accelerate the integration of Workday applications into enterprise environments, reducing associated time and costs by up to 90%. Register today for the webcast “How rapid Workday integration drives digital transformation.”

 

Nada daVeiga is VP Worldwide Pre-Sales, Customer Success, and Professional Services at SnapLogic. Follow her on Twitter @nrdaveiga.

 

[1] Source: U.S. Bureau of Labor Statistics, via Intuit

[2] Source: U.S. Bureau of Labor Statistics and U.S. Census, via Intuit

[3] Source: Intuit Contingent Workforce Forecast 2015

How to set up Stream processing for Twitter using Snaps

By Sharath Punreddy

As you probably know, SnapLogic data pipelines use Streams, a continuous flow of data from a source to a target. By processing and extracting valuable insights out of streaming data, a user or system can make decisions more quickly than with traditional batch processing. Streaming analytics now deliver insights in near real time, if not real time.

In this data-driven age, the timing of data analytics and insights has become a key differentiator. In some cases, the data becomes less relevant - if not obsolete - as it ages. Analyzing the data as it flows in is crucial for use cases such as sentiment analysis for new product launches in retail, fraudulent transaction detection in the financial industry, preventing machine failures in manufacturing, sensor data processing for weather forecasts, disease outbreak detection in healthcare, and more. Stream processing enables processing in near real time, if not real time, allowing the user or system to draw insights from the very latest data. Along with traditional APIs, companies are providing streaming APIs for rendering data in real time as it is being generated. Unlike traditional REST/SOAP APIs, streaming APIs establish a connection to the server and continuously stream data for the desired amount of time. Once the time has elapsed, the connection is terminated. Apache Spark with Apache Kafka as a streaming platform has become a de facto industry standard for stream processing.
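
To make the connection model concrete, here is a minimal Python sketch of consuming a line-delimited HTTP streaming endpoint: the client holds one long-lived connection open and processes records as they arrive. The URL and token are placeholders, not a real service:

    import json
    import requests  # assumes the requests library is installed

    # Placeholder endpoint and credentials, not a real streaming API
    url = "https://stream.example.com/v1/events"
    headers = {"Authorization": "Bearer <token>"}

    # stream=True keeps one long-lived connection open; records arrive line by line
    with requests.get(url, headers=headers, stream=True, timeout=90) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:  # skip keep-alive newlines
                event = json.loads(line)
                print(event)  # hand off to downstream processing here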

In this blog post, I’ll walk through the steps for building a simple pipeline to retrieve and process Tweets. You can also jump to the how-to video here.

Twitter Streams
Twitter has become a primary data source for sentiment analysis. The Twitter Streaming APIs provide access to global Tweets, which can be accessed in real time as people are tweeting. SnapLogic’s “Twitter Streaming Query” Snap enables users to retrieve Tweets based on a keyword in the text of the Tweet. The Tweets can then be processed using Snaps such as the Filter Snap, Mapper Snap, or Aggregate Snap, for filtering, transforming, and aggregating, respectively. SnapLogic also provides a “Spark Script” Snap where an existing Python program can be executed on incoming Tweets. Tweets can also be routed to different destinations based on a condition, or copied to multiple destinations (RDBMS, HDFS, S3, etc.) for storage and further analysis.

Getting Started
Below is a simple pipeline for retrieving Tweets, filtering them based on the language, and publishing to a Kafka cluster.

1. Using the Snaps tab on the left frame, search for the “Twitter Streaming Query” Snap. Drag and drop the Snap onto the Designer canvas (the white space on the right).

a. Click on the Snap to open the Snap Settings form.

Note: The “Twitter Streaming Query” Snap requires a Twitter account, which can be created through Designer while building the pipeline or through Manager before building the pipeline.

b. Click on the “Account” tab.

c. Click on the “Add Account” button.

Note: Twitter provides two ways to authenticate an application to a Twitter account. “Twitter Dynamic OAuth1” is for application-only authentication, while “Twitter OAuth1” is for user authentication, where the user authorizes the application by signing into Twitter. In this case, we are using the user authentication mechanism.

d. Choose an appropriate option based on the accessibility of the Account:
i. For the location of the account: “Shared” makes the account accessible to the entire organization, “projects/shared” makes it accessible to all users in the project, and “project/” makes it accessible only to the individual user.
ii. For Account Type: Choose the “Twitter OAuth1” option to grant access to the Twitter account of the individual user.
iii. Click “OK.”

e. Enter meaningful text for the “Label,” such as [Twitter_of_], and click the “Authorize” button.

Note: If the user is logged into Twitter with an active session, they will be taken to the “Authorize” page of the Twitter website to grant the application access. If the user is not logged in or does not have an active session, they will be taken to the Twitter sign-in page to sign in first.

f. Click on the “Authorize app” button.

Note: The “OAuth token” and “OAuth token secret” values shown are not active and are provided as examples only.

g. At this point, the “OAuth token” and the “OAuth token secret” should have been populated. Click “Apply.”
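
For the curious, what the account wizard automates is the standard three-legged OAuth 1.0a flow. A rough Python sketch of the same flow using the requests_oauthlib library is below; the consumer key and secret are placeholders you would get from a Twitter developer account:

    from requests_oauthlib import OAuth1Session  # pip install requests-oauthlib

    # Placeholder application credentials from a Twitter developer account
    oauth = OAuth1Session("<consumer_key>", client_secret="<consumer_secret>")

    # 1. Obtain a temporary request token
    oauth.fetch_request_token("https://api.twitter.com/oauth/request_token")

    # 2. Direct the user to Twitter's authorization page
    print("Authorize at:", oauth.authorization_url("https://api.twitter.com/oauth/authorize"))

    # 3. After the user grants access, exchange the verifier for access tokens
    verifier = input("Verifier/PIN: ")
    tokens = oauth.fetch_access_token(
        "https://api.twitter.com/oauth/access_token", verifier=verifier
    )
    print(tokens["oauth_token"], tokens["oauth_token_secret"])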

2. Once the account is successfully set up, click on the “Settings” tab to provide the search keyword and time.

Note: The Twitter Snap retrieves Tweets for a designated time duration. For continuous retrieval, you can provide a value of “0” for “Timeout in seconds.”

a. Enter a keyword and a time duration in seconds.

3. Save by clicking the disk icon at the top right. This triggers validation; the icon becomes a check mark if validation is successful.

4. Click on the list (preview) icon to preview the data.

5. This confirms that the “Twitter Streaming Query” Snap has successfully established a connection to the Twitter account and is fetching Tweets.

6. The “Filter” Snap is used for filtering Tweets. Search for “Filter” using the Snaps tab on the left frame. Drag and drop the “Filter” Snap onto the canvas.

a. Click on the “Filter” Snap to open the Settings form.

b. Provide a meaningful name, such as “Filter By Language,” for the “Label” and a filter condition for the “Filter Expression.” You can use the drop-down to choose the filter attribute.
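
For instance, Tweets include a lang field in their payload, so a filter expression along these lines (illustrative; confirm the field name against your previewed data) would keep only English-language Tweets:

    $lang == "en"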

7. Click on the disk icon to save, which again triggers validation. You’ve now successfully configured the “Filter” Snap.

8. Search for the “Confluent Kafka Producer” Snap using the Snaps tab on the left frame. Drag and drop the Snap onto the canvas.

Note: Confluent is an Apache Kafka distribution geared toward enterprises.

a. The “Confluent Kafka Producer” requires an account to connect to the Kafka cluster. Choose appropriate values based on the location and type of the account.

b. Provide meaningful text for the “Label” and enter the bootstrap server(s), including the port. If there are multiple bootstrap servers, separate them with commas.

c. The “Schema registry URL” is optional, but it is required if Kafka must parse messages based on a schema.

d. Other optional Kafka properties can be passed to Kafka using the “Advanced Kafka Properties.” Click on “Validate.”

e. If validation is successful, you should see the message “Account validation successful” at the top. Click “Apply.”

9. Once the account is set up and chosen, click on the “Settings” tab to provide the Kafka topic and message.

a. You can choose from the list of available topics by clicking the bubble icon next to the “Topic” field. Leave the other fields at their defaults. The other required field is “Message value”: enter “$” to send the entire Tweet and its metadata. Save by clicking the disk icon.
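
Conceptually, this configuration does what a hand-written Kafka producer would do. A minimal Python sketch using Confluent’s client, with placeholder broker and topic names:

    import json
    from confluent_kafka import Producer  # pip install confluent-kafka

    # Placeholder broker address and topic name
    producer = Producer({"bootstrap.servers": "broker1.example.com:9092"})

    tweet = {"lang": "en", "text": "example tweet"}  # stand-in for a streamed Tweet document

    # Publish the whole document, which is what the "$" message value expresses in the Snap
    producer.produce("tweets", value=json.dumps(tweet).encode("utf-8"))
    producer.flush()  # block until outstanding messages are delivered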

10. The above is a fully validated pipeline that fetches Tweets and loads them into Kafka.

11. At this point, the pipeline is all set to receive Tweets and push them into the Kafka topic. Run the pipeline by clicking the play button in the top right corner. View progress by clicking the display button.

As you can see, the pipeline can be built in less than 15 minutes without requiring any deep technical knowledge. This tutorial and video provide a basic example of what can be achieved with these Snaps. There are several other Snaps that can act on the data: filtering, copying, aggregating, triggering events, sending emails, and more. SnapLogic takes pride in bringing complex technology to the citizen integrator. I hope you found this useful!

Sharath Punreddy is Enterprise Solution Architect at SnapLogic. Follow him on Twitter @srpunreddy.

Gartner Names SnapLogic a Leader in the 2017 Enterprise iPaaS Magic Quadrant

For the second year in a row, SnapLogic has been named a Leader in Gartner’s Magic Quadrant for Enterprise Integration Platform as a Service (iPaaS).

Gartner evaluated iPaaS vendors on “completeness of vision” and “ability to execute.” Those named to the Leaders quadrant, as Gartner noted in the report, “have a solid reputation, with notable market presence and a proven track record in enabling … their platforms are well-proven and functionally rich, with regular releases to rapidly address this fast-evolving market.”

In a press release issued today, SnapLogic CTO James Markarian said of the recognition: “Since our inception, we have been laser-focused on delivering a modern enterprise integration platform that is specifically designed to manage the data and application integration demands of today’s hybrid enterprise technology environments. Our Enterprise Integration Cloud eliminates the complexity of legacy integrations, providing a platform that supports fast and easy self-service integration.”

The Enterprise iPaaS Magic Quadrant is embedded below. We’d encourage you to download the complete report as it provides a comprehensive review of all the vendors and the growing market.

Gartner 2017 iPaaS MQ

Thanks to all of SnapLogic’s customers, partners, and employees for the ongoing support and for making SnapLogic’s Enterprise Integration Cloud a leading self-service integration platform connecting applications, data, and things.

Podcast: James Markarian and David Linthicum on New Approaches to Cloud Integration

SnapLogic CTO James Markarian recently joined cloud expert David Linthicum as a guest on the Doppler Cloud Podcast. The two discussed the mass movement to the cloud and how it is changing the way companies approach both application and data integration.

In this 20-minute podcast, “Data Integration from Different Perspectives,” the pair discuss how to navigate the new realities of hybrid app integration, data and analytics moving to the cloud, user demand for self-service technologies, the emerging impact of AI and ML, and more.

You can listen to the full podcast here.