I guess you could say that Big Data is now such a mainstream topic that there’s a band named after it. Coming off of the successful Hortonworks IPO this week, The Wall St. Journal published this article: The Joys and Hype of Software Called Hadoop – Big Data Is Hot in Silicon Valley, and Hadoop Underpins Craze. To get some context, a bit of history, and a few insights into what’s coming next, I sat down with SnapLogic’s co-founder and CEO Gaurav Dhillon to discuss all things data, what’s different and what’s the role of data integration in the era of social, mobile, analytics, cloud and the Internet of Things (SMACT).
Here’s the podcast:
And here’s the transcript of a few key parts of the discussion:
What’s different now? Why are some of these becoming mainstream topics? What’s changed?
Statisticians have been around, maybe not as long as mathematicians, but shortly thereafter they arrived. They have been using traditional technologies. Just because somebody has a sharper tool doesn’t make them a better craftsman necessarily. I agree, there’s some element of, “Oh, well sprinkle data on it and it will be magic.” Right? This magic sprinkling. I would first of all concede that statisticians have been using forms of these technologies, which we now call data science, in a way that certain companies have been doing for a long time….
What’s different is that the volume of information that you have, and the tools that you have, take this out of a high priesthood, into more of a common man perspective. Traditionally, we democratized reporting. We democratized how people get information. There’s a potential, if this industry lives up to the promise and is responsible about all the resources available to it, there’s a potential to democratize the benefit of what would be a very highly-funded, very niche, almost a national government level of effort, for many people. There’s an opportunity to democratize that using open-source technologies, falling prices, better products, and smarter graduates from college.
Analytics is getting more than its share of attention in the market, but the feeds for these analytical systems still don’t seem to be getting the same attention. A lot of companies are stuck using what they used to use, or trying to use what they used to use to solve some of the newer problems. What’s changing on the plumbing side of things that is getting you excited?
You’re right, plumbing, like Rodney Dangerfield, doesn’t get you respect in this business. It’s always about the “gee whiz”, the graphics. “Oh, look at this thing we found.” So on and so on. What’s changing is that the plumbing is letting you have twice as many data scientists. There’s a scarcity of a job type in the world, it is data scientists. The fact that you can have twice as many chefs, like giving the sous chef business to somebody else, is hugely important to everybody. I don’t care how rich you are, because there’s always somebody as rich as you, or richer than you, competing with you. The fact that you can almost double the energy you can put into this very important area is a huge deal, and it’s causing an increase in importance in plumbing. New words like data wrangling are springing up, to really show you how the sous-chef end of preparing this wonderful result is becoming more important. How do you profile this data? How do you a priori make sure it arrives? How do you engage with the data? How do you combine and transform? These sorts of things are, I think, rising in importance because the straightforward payback of having twice the benefit from it is very, very clear.
I’ve heard you use the term “so-so integration”. Same old, same old. Why wouldn’t I just use the tools that I’ve got? I’ve spent a lot of money on these. Why wouldn’t I use those to solve some of these newer data challenges that you talk about?
The question is not, can you or can’t you? Some of our customers are some of the best-funded, biggest corporations in the world. You could. The question is, should you? In this business, it’s always been, can you or should you? What I’ve found in 2 decades of doing this is that you really shouldn’t. With the R&D investments that we have made, and a whole 50 million dollars of capital that has been put into building a platform that is enormously capable, we can advance the results of that investment across all our customers. On top of that, there are certain attributes to the platform that give you the opportunity to tackle how you move to the Cloud, as well as the data element. The change in data gravity, the change in thinking from same old, same old data warehousing, to the modern architectures of the new Hadoop world that we’re seeing. It’s not can you, the question is should you?
What’s the impact of Cloud computing on the data world? What impact is that having and will that have in the next 3 to 5 years?
I think the immediate apparent impact of Cloud computing is that you are able to light up a large number of seats of people using these products with that huge amount of skill sets and plumbing required…
In the marketing area, every marketing department has a dozen or more Cloud/SaaS applications, often without the CIO or anybody else in the business knowing about them. They don’t think of them as applications – they think of them as websites. That’s just for a small company. Multiply that times a Fortune 500 company, and there are probably hundreds and hundreds of things coming into the company, and that asteroid belt that these companies are sailing through is causing more data to be produced, more endpoints to be created, more engagement, and a need to cross-tabulate or to be able to combine that information. This is causing a growth in the need for plumbing in a way that hasn’t been there before. In the 90’s you had 4 or 5 systems: SAP and Siebel and PeopleSoft, and maybe 1 other you would own. The fact that you have this asteroid belt of websites that run a business today is causing the problem of integration and plumbing to rise to the fore like never before.
What’s different about SnapLogic, what should people know about SnapLogic that they might not already know?
We built SnapLogic to change how people engage with the modern enterprise… We built this company from the ground up, based on the experiences that we had in the 90’s, to provide something that is very simple to use, scales across the largest set of problems somebody can throw at us, and is extremely well connected – has Snaps for a variety of endpoints and data points. It is a single platform to help you move to the Cloud and use big data, use Hadoop, use data science to solve the analytics set of equations that you face as your business goes through big change.
The data structures are changing, the endpoints are many, many more. We think of them as massively multi-point data-types. In addition, you have a very broad population of users that you didn’t have in the 90’s where integration was more of a back office operation in the dungeon. The engine room of the ship, not quite at the bridge. The passengers didn’t even know it existed, unless it stopped working.