
Last week we attended Strata + Hadoop World, put on by O’Reilly Media and SnapLogic partner Cloudera. We enjoyed talking to attendees, customers and Big Data experts about swimming in the data lake vs. sinking in a data warehouse, big data integration and the next generation of ETL tools. We kept our booth lively by inviting attendees to jump into the data lake with us in our awesome photo booth, complete with props! Full album of everyone who participated here. Our Chief Scientist, Greg Benson, also had a speaking opportunity. Stay tuned for a recording of the session; in the meantime, take a look at his presentation slides: Jump into the Data Lake with Hadoop-Scale Data Integration.

Check out a recap of event highlights below and learn more about SnapLogic big data integration here. We’re also hosting an upcoming webinar featuring Ventana Research on how to attain excellence in big data integration.

“The problem is, big data does not play well with traditional tools or analytic theories. Traditional tools choke on the volume, velocity and variety of data. To get the most from big data, companies should invest in new technologies, real-time automation, visualization and machine learning.”

- Evan Hovorka, Group Manager, Digital Marketing at a Fortune 500 Retailer

SnapLogic: Can you describe your current role?

EH: My team provides solutions that drive digital marketing ROI and great customer experiences. Some recent examples would be a cross-channel analytic solution, several big data projects and web service automation for substantial marketing channels. It’s a unique role which evolved over the years as the lines between marketing and IT faded. The business relies on key partners to drive traffic and sales to our sites. Not surprisingly, these Internet-first companies demand fast and flexible solutions. An example is item, price and promo data. This data needs to be distributed to dozens of partners with many custom business rules in play. Using marketing automation tools we empower partners like Google, shop comparison sites, social sites and affiliates to iterate and innovate quickly on our behalf. Speed equals relevancy in digital marketing, so a foundation built on flexibility is required.

Deploying advanced technology for ad reporting, attribution and media performance is also part of the role. Pulling in metrics from ad servers, paid search programs, email, social and video into a Hadoop cluster that marries with more traditional databases allows us to see performance across channels. This secure network of data pipes allows us to make smart decisions quickly and safely.

SnapLogic: Marketing has become so technology-based in the past few years. This has brought about innovation but where are the challenges or downsides?

EH: After obtaining a degree in Computer Information Systems I worked as a programmer in the smart card engineering field. It was interesting work, but even more fascinating was all the opportunity for improvement in marketing systems. This is a space that rewards responsiveness over flawless design. Being able to build and automate quickly is required but not something traditional IT shops were set up to deal with then. Another challenge is that marketers historically have not held computer science degrees and lack the skills needed to take advantage of new tech. Connecting and distributing data and applications has become a core strategy for a progressive marketing shop. This is all possible with deeper IT partnerships, smarter marketers and tools that simplify technology so more people can find success. The challenge is finding the right skills to staff both IT and marketing.

SnapLogic: If you’re in front of a room of marketers and use the words Big Data – do eyes glaze?

EH: Not anymore. Progressive minds are really excited about the opportunity. In theory, big data tools connect disparate sales channels, offline and online activity, data from IoT sensors and provide views into real-time inventory, allowing leaders to adjust plans accordingly. Hadoop and its tools are maturing and increasingly easy to use, so the skills threshold is dropping. People should be extremely excited about co-locating their valuable data assets into a data lake/Hadoop cluster, which is the definition of democratized data. Big data at big companies introduces some concerns, though. I can’t stress enough the importance of security, data lineage and the education of users.

SnapLogic: Is big data improving the bottom line at large companies yet?

EH: Expense reduction makes Hadoop an easy sell but I’m not sure how many established companies have actually shut off their old, expensive databases once Hadoop showed up. That is acceptable in my book, because the true value is not in shutting off older systems or porting ETL load but rather in the previously impossible ideas and innovations that evolve from these new systems. Companies need to hire data scientists, which can be expensive. So even though Hadoop is affordable, one still needs those specialists to make it really shine.

While that transition happens, companies can enable their senior analysts to discover and build in the big data lake, using tools that mitigate some of the mystery and risk. SnapLogic is one such tool, allowing regular business teams to tap into data, connect disparate sources and get to insights without hiring an army of data scientists and ETL experts. One tool alone may not produce the next Netflix recommendation engine, but most companies have much walking to do before they run in Hadoop.

SnapLogic: Marketing analytics is a hot space, but what are some of the technological gaps that remain?

EH: Big data is here to stay; we can collect and expose high volumes of low-level information that has not traditionally been mined or co-located. An analyst would typically look at store sales, demographic data and regional data. Big data tools could add to that with many more sources of important information, such as log files from IoT, weather patterns, vast amounts of historic data, time series data and much more. The problem is, big data does not play well with traditional tools or analytic theories. Traditional tools choke on the volume, velocity and variety of data. To get the most from big data, companies should invest in new technologies, real-time automation, visualization and machine learning.

Another challenge can be the various sources of truth. Reconciling silos is not easy, but it is doable. Ideally an analyst could quickly and reliably move and transform data from place A to place B, analyze its value and take actions to improve commonly-agreed-upon KPIs. This often means moving data from place B to a plethora of partners and downstream systems each with their own format, cadence and security, which is something that traditional ETL tools struggle to accomplish.
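To make the "one source, many downstream formats" idea concrete, here is a minimal Python sketch. It is an illustration only: the record fields, the partner names and the formats each partner expects are all assumptions made up for this example, not a description of any retailer's or vendor's actual feeds.

```python
import csv
import io
import json

# Hypothetical item record; the field names are illustrative only.
ITEM = {"sku": "A100", "title": "Blue Mug", "price_cents": 1299, "promo": "10% off"}


def to_partner_json(item):
    """Assumed partner A: JSON with the price as a decimal string."""
    return json.dumps(
        {
            "id": item["sku"],
            "name": item["title"],
            "price": f"{item['price_cents'] / 100:.2f}",
        },
        sort_keys=True,
    )


def to_partner_csv(item):
    """Assumed partner B: a CSV row with the price in cents and no promo text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([item["sku"], item["title"], item["price_cents"]])
    return buf.getvalue()


# One source record fans out to a formatter per (hypothetical) partner.
FORMATTERS = {"partner_a": to_partner_json, "partner_b": to_partner_csv}


def build_feeds(item):
    """Return the item rendered once per partner format."""
    return {name: fmt(item) for name, fmt in FORMATTERS.items()}


if __name__ == "__main__":
    for partner, payload in build_feeds(ITEM).items():
        print(partner, payload.strip())
```

Keeping each partner's rules in its own small function means a new partner is one more entry in `FORMATTERS`; cadence and security, which the interview also mentions, would need handling beyond this sketch.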

SnapLogic: So how can people get started with big data and the tools needed to use it?

EH: Mining value from these new data solutions is an opportunity for smart people to shine. Big data is not about incrementally improving ETL jobs or pushing data to faster, cheaper systems. That’s a conservative view. Big data is about big ideas. Ideally, people will put a moon-shot lens over their data assets and invent game changing opportunities. Ideas that may not even align to current corporate initiatives should be one of the goals. Moving from incremental improvements into a whole new business model is what excites me about the big data/computer science space. Humans harnessing new machines in an effort to build something amazing! Who wouldn’t want to work under that imperative?

Evan Hovorka is a Group Manager in digital marketing and is passionate about using new technology to drive business goals and empower people to do great things. He leads prototype, automation and big data proof of concepts (POCs) for a large US retailer. His career includes 15 years of strategy leadership in the CRM, digital media and data-driven marketing fields.

While there’s no shortage of big data hype in the market, according to a recent survey we ran with TechValidate at the end of 2014, there is still a great deal of enterprise IT uncertainty when it comes to which Hadoop distributions and tools to use. We surveyed over 100 companies in the US with revenues greater than $500M and found that IT leaders are excited about big data’s ability to power sharper analytics and other modern applications, but they continue to struggle with limited skills and resources. We also found that big data integration technologies were right up there with analytics as the top priority investment.

You can check out Roger Chan’s latest infographic below, read the press release here and visit www.snaplogic.com/techvalidate to see the complete survey results.

Big-Data_infographic_FINAL

The SnapLogic team will be at Strata + Hadoop World this week in San Jose and TDWI Las Vegas next week talking about (and demonstrating) the benefits of our elastic integration platform as a service (iPaaS) for big data integration.

Global Information Storage Capacity

Image courtesy of Wikipedia

These days, it’s impossible to pick up a newspaper or magazine without reading about the tremendous changes that are about to be wrought upon our society. We’re not talking about the climate change or the surveillance state, but something that promises to improve our lives: let’s talk about Big Data.

First, a bit about what exactly we’re referring to when we talk about Big Data. We’re talking about the modern usage of extremely powerful supercomputers to store and analyze huge amounts of data on a macro scale. Though humans have almost always collected data, up until recently, we lacked the tools to analyze this data without a tremendous expenditure of effort. Data had to be combed through by hand, calculations worked out on paper, and finally compiled together, all in a coordinated effort so complicated as to be effectively impossible. Even after the invention of computers, it took decades before they were powerful enough and could store enough information to make analysis of these massive datasets possible. Even after sufficiently powerful computers were developed, they were out of reach for everyone save those with the deepest pockets. They were even called “supercomputers” to differentiate them from those used by everyone else.

But we’re in the midst of a sea change: for the first time in history, analysis of these vast data stores is not only possible, but accessible to and feasible for the average business. This access promises to change everything about the way we live. Welcome to the age of Big Data.

Big Data & Hockey

Image courtesy of Flickr

One of Big Data’s Earliest Success Stories Is in Sports

Keeping in mind the obsession that sports fans have with statistics of all kinds, it seems natural that sports would be one of the first places that Big Data was applied. It has all the ingredients: the data is there and easily accessible, there is plenty of interest in the subject and there’s even a lot of money in it. And eventually, it happened in baseball (later, the story became the basis for the acclaimed film Moneyball).

But baseball isn’t the only place that Big Data can find applications. Indeed, hockey is another sport where it has begun to make a big impact. Scientists like Aaron Clauset have turned many erstwhile sports “rules” on their heads. In a paper published as an open access article in EPJ Data Science, Clauset debunked the established idea of “momentum,” the concept that players and teams have “hot streaks” during which many points are scored in quick succession. Clauset found this to be little more than random chance: people seeing patterns in the tea leaves.

But that doesn’t mean Big Data can’t be used productively by coaches. A 2013 seminar on the applications of Big Data to hockey in Toronto found several ways that coaches can make use of their player data to enhance their game. By using a complicated analysis, players can be given different ratings for particular characteristics they have, such as their ability to perform in a given position. This can help coaches make choices about where (and when) to deploy a player to maximize his or her chances of scoring. Though most sports have been more resistant to the Big Data Revolution than baseball was, it’s only a matter of time before all organized sports adopt these methods of data analysis.

Data: Better than a Gut Feeling About Someone

One of the strengths of Big Data is its ability to determine relationships between seemingly unrelated factors. Another place that data has the chance to really revolutionize the way business is conducted is in the recruiting industry. The recruiting industry is concerned with all kinds of success metrics, from talent sourcing to employee retention. A useful example is that of Gate Gourmet, a caterer for the airline industry. One of their human resources analysts noticed that their Chicago O’Hare Airport staff was experiencing a 50 percent turnover rate, and set out to determine the reason. In-depth analysis found a strong connection between employee turnover and time spent commuting to work; Gate Gourmet was able to adjust their hiring strategies to find workers closer to the airport and cut their turnover rate almost in half.
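The kind of comparison behind a finding like this can be sketched in a few lines of Python. The records below are synthetic and invented purely for illustration (they are not Gate Gourmet data), and the 30-minute threshold is an arbitrary assumption; the actual analysis is not described in enough detail here to reproduce.

```python
from collections import defaultdict

# Synthetic records: (commute_minutes, left_within_year). Not real data.
EMPLOYEES = [
    (10, False), (15, False), (20, True), (25, False),
    (50, True), (55, True), (60, False), (70, True),
]


def turnover_by_commute(records, threshold_minutes=30):
    """Return the turnover rate for short vs. long commutes.

    The threshold is an assumed cut-off chosen for this example.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [leavers, total]
    for minutes, left in records:
        bucket = "long" if minutes > threshold_minutes else "short"
        counts[bucket][0] += int(left)
        counts[bucket][1] += 1
    return {bucket: leavers / total for bucket, (leavers, total) in counts.items()}


if __name__ == "__main__":
    print(turnover_by_commute(EMPLOYEES))
```

A gap between the two rates is only a starting point; a real analysis would also need to rule out other factors such as shift, pay grade or tenure.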

Seeing the Forest for the Trees

In some ways, forest management, with its focus on long-term trends and relationships, predicted Big Data. But in those times, scientists were forced to deal with abstractions, and predictions had to be tailored to specific regions, or even specific forests. Now, foresters can get insights based on general tools just by inputting their own forest’s data. Software can account for types of trees, different growth rates, climates, weather and many other variables, and can predict outcomes over years, decades, or even centuries. This analysis has proved its value through its ability to identify high-risk areas for dangerous forest fires or even ecological collapse. That’s right: Big Data can even save lives.

A Healthy Impact

It will come as no surprise, then, that one of the most important areas Big Data will influence in the future is healthcare. Finally, scientists have access to anonymous statistics about millions of people, providing (with due analysis) an unparalleled look into the details of people’s lives. And the ways that this can help are multitudinous. One recent example was published in the Journal of the American Medical Informatics Association, which analyzed people’s Google searches to identify a previously unknown drug interaction. They found that people who searched for both paroxetine (an anti-depressant) and pravastatin (a cholesterol drug) were more likely to search for symptoms of hyperglycemia, a link that was later established in a study.
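As a rough illustration of the idea, the Python sketch below compares how often a symptom search appears among users who searched for both drugs versus users who searched for only one of them. The search logs are synthetic and the function is a simplification invented for this example; it is not the method used in the published study.

```python
# Synthetic, anonymized search logs: user id -> set of search terms. Not real data.
LOGS = {
    "u1": {"paroxetine", "pravastatin", "hyperglycemia symptoms"},
    "u2": {"paroxetine", "pravastatin"},
    "u3": {"paroxetine"},
    "u4": {"pravastatin", "hyperglycemia symptoms"},
    "u5": {"paroxetine", "pravastatin", "hyperglycemia symptoms"},
    "u6": {"pravastatin"},
}


def symptom_rate(logs, required, symptom):
    """Share of users who searched every term in `required` and also searched `symptom`."""
    cohort = [terms for terms in logs.values() if required <= terms]
    if not cohort:
        return 0.0
    return sum(symptom in terms for terms in cohort) / len(cohort)


if __name__ == "__main__":
    both = symptom_rate(LOGS, {"paroxetine", "pravastatin"}, "hyperglycemia symptoms")
    one = symptom_rate(LOGS, {"paroxetine"}, "hyperglycemia symptoms")
    print(f"both drugs: {both:.2f}, paroxetine only required: {one:.2f}")
```

A higher rate in the "both drugs" cohort would only suggest a signal worth investigating, which matches the article's point that the link was later established in a separate study.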

Big Data: Big Conclusions

These four examples are only the tip of the iceberg. Truly, wherever you look, there are processes that can be improved by application of data, whether they be free business loan calculators or advanced hockey statistics. Here, we’ve only covered a few areas where its impact will be felt, but the real question might turn out to be: where won’t Big Data change everything?


Nick Rojas is a business consultant and writer who lives in Los Angeles and Chicago. He has consulted for small and medium-sized enterprises for over twenty years. He has contributed articles to Visual.ly, Entrepreneur and TechCrunch. You can follow him on Twitter @NickARojas, or you can reach him at NickAndrewRojas@gmail.com.

If you’re interested in big data and big data integration, next week will be a great opportunity to learn about all things big data and Hadoop at Strata + Hadoop World in San Jose, California. In addition to some great sessions and keynotes, including a talk on “Hadoop’s Impact on the Future of Data Management” from Cloudera co-founder and CTO Amr Awadallah, SnapLogic will be sponsoring the event and giving away over $1,000 worth of prizes to winners of our photo booth / social media contest. Be sure to stop by and get your photo taken as you strive to “swim in the data lake,” and check out the full conference schedule here.

We’ll also have a brief speaking slot of our own in the Solutions Showcase Theater from our Chief Data Scientist and in-house Hadoop expert, Greg Benson, whose session will be: “Jump into the Data Lake with Hadoop-Scale Data Integration.”

A few other event details below:

  • Event will take place at the San Jose Convention Center; the SnapLogic team can be found at booth #1608
  • Use the code SnapLogic20 to get 20% off your registration
  • Event dates are Wednesday, February 18th through Friday, February 20th

In partnership with Cloudera, SnapLogic is also sponsoring the Data Dash, a charity run on Thursday morning. Meet in Discovery Meadow Park near West San Carlos Street at 6:30 for a scenic 2.6-mile run, a free swag bag, and the opportunity to make a donation to the Innocence Project Florida. All details, including registration, can be found here.

And finally, if you’re just getting into the world of big data, check out this whitepaper from GigaOM analyst David Linthicum on the old vs. new when it comes to data, data integration, traditional ETL tools and what’s to come. We look forward to seeing you next week in San Jose!

While many believe that he never actually said it, one of my favorite quotes about modern innovation is the Henry Ford line: “If I had asked people what they wanted, they would have said faster horses.” (Note that there is similar disagreement about a famous Steve Jobs quote about listening to customers.)

When it comes to re-thinking the integration plumbing that will be required for the next wave of big data and microservices in the modern enterprise, David Linthicum had this to say in our webinar last week:

“Those who don’t understand the strategic value that new approaches to data integration will have in the emerging world will end up being caught without the technology they need to be successful. I get a call a week from people who are trying to take the existing approaches, patterns and technologies that they leveraged 10, 15, 20 years ago and re-apply them into big data systems, into Data Lakes and new versions of how we’re dealing with cloud-based systems and petabyte-scale databases, and they’re falling short, and the reason they’re falling short is because they were engineered to solve a particular static problem.”

He went on to say, “The Data Lake concept and the ability to move in that direction, I think is going to provide a ton more value within the enterprises than we saw in the past.” The lively discussion with SnapLogic co-founder and CEO, Gaurav Dhillon, covered ETL, ESBs, the future of data warehousing and the Internet of Things. Be sure to check out the recording or listen to the podcast on iTunes.

As Ventana Research recently noted in their benchmark research, Big Data Requires Integration Technology. But trying to get your old ETL tool to solve your new big data integration challenges is like hoping your horse will run faster when what you really need is a Tesla. Back to David Linthicum, who has authored 13 books on integration:

“We have to change the way in which these systems exchange information and that is something either you can resist and try to take your existing technology and throw it at the problem and yell at your vendor for not getting the things into their technology they need to get in, or you take this as an opportunity to reinvent the way in which you do data integration, the way in which you approach the problem and ultimately bring more value into your enterprise by automating access to the information that, quite frankly, you need to run your business and will change your business, if used effectively.”

Here’s an overview of how SnapLogic is approaching the need for a modern approach to cloud and big data integration, with SnapReduce, Hadooplex and our investment in building a JSON-centric iPaaS from our Chief Scientist, Greg Benson:

Today SnapLogic announced the availability of Ultra Pipelines, which continuously consume documents from external sources that require low-latency processing of their inputs or that are not compatible with event-based triggers. Ultra Pipelines receive input from a website or application and return data to the requester at speeds up to 10x faster than before for some use cases, satisfying the needs of line-of-business managers who need instant access to company or customer data. According to Niraj Nagrani, vice president of engineering at SnapLogic, “the frictionless, real-time processing that Ultra Pipelines deliver is what companies need to take full advantage of the cloud and big data.”

This idea of frictionless, real-time processing without the complexity and overhead of implementing and managing legacy enterprise service bus (ESB) technology was a primary topic of discussion in our webinar last week with industry analyst, author and integration practitioner David Linthicum. The webinar was called The Death of Traditional Data Integration – How the Changing Nature of IT Mandates New Approaches and Technologies, and the number of attendees and questions certainly indicates that many enterprise IT organizations are in the midst of re-thinking their integration layer in the era of Social, Mobile, Analytics, Cloud and the Internet of Things (SMACT).

Picking up on the theme of why buses don’t fly in the cloud, a topic we also discussed on a webinar with Forrester in 2014, we asked attendees: “What is the future of the ESB at your company?” 67% said they’re looking to move to a more flexible / agile integration platform. Here’s what David Linthicum had to say about re-thinking your integration layer and the ESB:

“The fact of the matter is that the cloud integration trend, the utilization of data lakes, the utilization of unstructured information, the big data systems that we’re seeing out there, the complex data analytics and the ability to consume and deal with petabytes of information in one single scoop is something where traditional integration can’t keep up with the speed and the size and the complexity of the information as it’s moving from place to place. It’s just too much for traditional ETL systems, traditional enterprise EAI systems. Things that I’ve built in the past and even the ESBs that were built around the SOA movement.

Understand that we have to continuously rethink the way in which we’re approaching technology. Integration is no different and ultimately the existing approaches and the existing technology are going to fall short so we have to rethink, reinvent, re-innovate the way in which we’re approaching integration.”

This week I’ll be posting excerpts from the discussion. David also has published a whitepaper on the topic that you can download here and you can listen to a podcast of the webinar on our iTunes channel.

To learn more about SnapLogic Ultra Pipelines, which are available today to all of our customers, visit: http://www.snaplogic.com/ultra-pipelines