James Markarian, CTO of SnapLogic, sat down with Jeff Frick from theCUBE to discuss the evolving big data landscape.
More videos from SnapLogic Innovation Day 2018:
- Gaurav Dhillon on the future of enterprise integration.
- Greg Benson on how AI is accelerating app and data integration.
- Craig Stewart on the role of APIs and new integration with Apigee Edge.
- Diletta D’Onofrio on the role of integration in digital transformation success.
- Omar Nawaz on his approach to digital transformation at Quantum.
>> Announcer: From San Mateo, California, it’s theCUBE! Covering SnapLogic Innovation Day 2018. Brought to you by SnapLogic.
>> Hey, welcome back, everybody. Jeff Frick here with theCUBE. We’re in San Mateo at what they call the crossroads, 92 and 101. If you’re coming by and probably sitting in traffic, look up and you’ll see SnapLogic. It’s their new offices. We’re really excited to be here for Innovation Day, and we’re excited to have the CTO, James Markarian. James, great to see you. I guess the last time we talked was a couple of years ago in New York City.
>> Yeah, that’s right. And why was I there? It was, like, a big data show.
>> That’s right.
>> And we are two years later talking about big data.
>> Big data is fading a little bit, because now big data is really an engine that’s powering this new thing that’s so exciting, which is all about analytics and machine learning. And we’re eventually going to stop saying artificial intelligence and say augmented intelligence, ’cause there’s really nothing artificial about it.
>> Yeah, and we might stop saying big data and just talk about data, because it’s becoming so ubiquitous.
>> Jeff: Right.
>> I know big data is not necessarily going away, but how we think about handling it has evolved over time, especially in the last couple of years.
>> That’s what we’re kind of seeing from our customers.
>> ’Cause it’s kind of an ingredient now, right? It’s no longer this shiny new object. It’s just part of the infrastructure that helps you get everything else done.
>> Yeah, and when you think about it from an enterprise point of view, that shift is going from experimentation to operationalizing. In experimentation, there’s one set of things you’re looking for: proving out the overall value, maybe regardless of cost and uptime and other things. As you operationalize, you start thinking about the other considerations that enterprise IT obviously has to think about.
>> Right. So if you think back to Hadoop Summit and Hadoop World, when they were first cutting their teeth around 2010, one of the big discussions that always came up (and that was before the rise of public cloud, which has really taken off over the last several years) was this ongoing debate: do you move the data to the compute, or do you move the compute to the data? There was always this monster data gravity issue, which was almost insurmountable, and many would say, oh, you’re never going to get all your data into the cloud. It’s just way too hard and way too expensive. But now Amazon has Snowball, and Snowball isn’t big enough. They actually have a diesel truck that’ll come and help you move your data; Amazon rolled that thing across the stage a couple of years ago. The data gravity thing seems to be less of an issue. And if you think of a world with infinite compute, infinite storage, and infinite networking, with costs asymptotically approaching zero (not necessarily good news for some vendors out there), that’s a world we’re eventually getting to, and it changes the way you organize all this stuff.
>> Yeah, I think so, and so much has changed. I was fortunate to be one of the early speakers; I used to do Hadoop World and everything, and I was adamantly proclaiming the destiny of Hadoop as bright and shiny. So there’s this question about what really happened. I think a few different variables shifted at the same time. One, of course, is this glut of computing in the cloud. There are so many variables moving at once. It’s like, how much time do you have, Jeff?
>> Ask them to get a couple more drinks for us.
>> You’re seeing our lovely new headquarters here, and one of the things is that there is no big data center. We have a little closet with some of the servers we keep around, but mostly everything we do is on Amazon. You’re even seeing things like commercial real estate change, because I don’t need all the cooling and the power and the space for my data center that I once had.
>> Jeff: Right, right.
>> I’ve become a lot more space-efficient than I used to be, so the cloud is really changing everything. On the data side, you mentioned this interesting philosophical shift, going from I couldn’t possibly do it in the cloud to why in the world would we not do things in the cloud. Maybe the one holdout there is some fear about security. Obviously there have been a lot of breaches. I think there’s still a lot of introspection everyone needs to do about whether their on-premises systems are actually more secure than some of these cloud providers. It’s really not clear that we know the answer to that. In fact, we suspect some of the cloud providers are actually more secure, because they are professionals about it and they have the best practices.
>> And a whole lot of money.
>> The other thing that happened that you didn’t mention, which is approaching infinity though we’re not quite there yet, is interconnect speeds. It used to be the case that I had a bunch of mainframes and a Teradata system, and a high-speed interconnect that put the two together. Now, with fiber networks and just in general, you can run a super-high-speed WAN, especially if you don’t care quite as much about latency. If 500-millisecond latency is still okay with you, you can do a heck of a lot and move a lot to the cloud. In fact, it’s so good that we went from worrying whether we could do this in the cloud at all to, well, why wouldn’t I do some things in Amazon and some things in Microsoft and some things in Google, even if it meant replicating my data across all these environments? The backdrop for some of that is that we had a lot of customers, and I was thinking people would approach it this way: they would install on-premises Hadoop, whether it’s Apache or Cloudera or the other vendors, hire a bunch of administrators, retire Teradata, and put all their ETL jobs on there, etc. It turned out to be a great theory, and the practice is real for some folks, but it meant moving a lot of things onto shifting sands, because Hadoop was still evolving at the time. A lot of customers were putting a lot of operational pressure on it, again moving from the experimentation phase to the operational phase.
>> Jeff: Right, right.
>> When you don’t have the uptime guarantee, and you can’t just hire somebody off the street to administer this (it has to be a very sharp, knowledgeable person who’s very expensive), people start saying, what am I really getting from this? Can I just dump it all in S3, apply a bunch of technology there, and let Amazon worry about keeping this thing up and running? People start to say, I used to reject that idea, and now it’s sounding like a very smart idea.
>> It’s so funny; we talk about people, process, and tech all the time, right? But they call them tech shows; they don’t call them people and process shows.
>> At least not the ones we go to. But time and time again, I remember talking to people about the Hadoop situation, and there were just no Hadoop people. You can have technology all day long, but there just aren’t enough people with the skills to actually implement it. It’s probably changed now, but I remember that was such a big problem. It’s funny you talk about security and cloud security. At AWS, on Tuesday night of re:Invent, they have a special technical keynote, and James Hamilton would speak. I just remember one talk he gave on their cabling across the ocean, and the amount of resources he can bring to bear, relative to any individual company, is so different, much less a mid-tier or small company. They can bring so much more resources, expertise, and knowledge.
>> Yeah, the economies of scale are just there.
>> They’re just crazy.
>> That’s right, and that’s why you sort of assume the cloud eventually eats everything.
>> Right, right.
>> So there’s no reason to believe this won’t be one of those cases.
>> So you guys are getting Extreme. What is SnapLogic Extreme?
>> Well, SnapLogic Extreme is a response to this trend of data moving from on-premises to the cloud, and there are some interesting dynamics to that movement. First of all, you need to get data into the cloud, and we’ve been doing that for years: connect to everything, dump it in S3, ADLS, etc. No problem. With cloud computing, we’re seeing another interesting shift. It’s not only kind of mess for less, letting Amazon manage all this, and I probably refer to Amazon more than other vendors would appreciate.
>> Right, right. They’re the leaders, so let’s call a spade a spade.
>> Certainly Google and Microsoft are out there as well, so those are the top three, and we’ve acknowledged that.
>> One of the interesting things about it, which you couldn’t really achieve adequately on premises, is the burstiness of your compute. I run at a steady state where I need, you know, 10 servers or 100 servers, but every once in a while I need 1,000 or 10,000 servers to apply to something. So what’s the on-premises model? Rack and stack 10,000 machines, and it’s like waiting for the Great Pumpkin: waiting months and months for that workload to come, and maybe it never comes, but I’ve been paying for it. I paid for a software license for the thing I need to run there. I’m paying for the cabling and the racking and everything, and for the person administering it, making sure the disks are all operating in case it gets used. Now, all of a sudden, you’ve got Amazon saying, hey, pay us for what you’re using. You can use reserved pricing and pay a lower rate for the things you care about on a consistent basis, but then I’m going to allow you to spike, and I’ll just run the meter. So this has caused software vendors like us to look at the way we charge and the way we deploy our resources and say, hey, that’s a very good model. We want to follow it. So we introduced SnapLogic Extreme, which has a few different components. Basically, it enables us to operate in these elastic environments and shift our thinking on pricing, so that we don’t think about node-based or, god forbid, core-based pricing, and instead say, hey, basically pay us for what you do with your data and don’t worry about how many servers it’s running on. Let SnapLogic worry about spinning these machines up and down, because a lot of these workloads are data integration or application workloads that we know a lot about.
>> So first of all, we manage what we call ephemeral or elastic clusters. Second, the way we currently distribute the workload is by generating Spark code. You use the same graphical environment you use for everything else, but instead of running on our engines, we spit out Spark code at the end that takes advantage of the massive scale-out potential of these ephemeral environments.
>> We’ve also built this in such a way that it’s Spark today, but it could be native or some other engine, like Flink or other things that come up. We really don’t care what the back-end engine actually is, as long as it can run certain types of data-oriented jobs. It’s actually lots of things in one. We combine our data acquisition and distribution capability with this massive elastic scale-out capability.
>> Yeah, it’s unbelievable how you can spin that up, and then of course, most people forget you need to spin it down after the event.
>> James: Yeah, that’s right.
>> We talked to a great vendor who said, you know, my customer spends no money with me on the weekend. Zero.
>> James: Right.
>> And I’m thrilled, because they’re not using me. When they do use me, they’re buying stuff. I think what’s really interesting is how that also changes your relationship with your customer. If you have a recurring revenue model, you have to continue to deliver value. You have to stay close to your customer. You have to stay engaged, because it’s not a one-time pop and then you send them the 15% or 20% maintenance bill. It’s really this ongoing relationship, and they’re actually gaining value from your products each and every time they use them. It’s a very different way.
>> Yeah, that’s right. I think it creates better relationships, because you feel like what we get is proportional to what they do, and vice versa, so it has this fundamental fairness about it, if you will.
>> Right, it’s a good relationship. But I want to go down another path we talked about before we turned the cameras on: the constant race between the need for compute and the compute itself. It used to be personified best by Microsoft and Intel. Intel would come out with a new chip, then the Microsoft OS would eat up all the extra capacity, then they’d come out with a new chip, and it was an ongoing thing. You made an interesting comment that, especially in the cloud world, where the scale of these things is much, much bigger, we’re in a world now where compute and storage have kind of outpaced the applications, if you will, and there’s an opportunity for the applications to catch up. Oh, and by the way, we have this cool new thing called machine learning and augmented intelligence. I wonder if you could comment: is that what’s going to fill, or kind of rebalance, the consumption pattern?
>> Yeah, it seems that way. I always think about compute and software spiraling around each other like a helix.
>> At one point, one is leading the other, then the other eventually surpasses it, and you need innovation on the first side. If you turn the clock way back to when the Pentium was introduced, everyone was like, how are we ever going to use all of this compute power?
>> Windows 95, whoo!
>> You know, the power of the Pentium. Do I really need to run my spreadsheets 100% faster? There’s no business value whatsoever in transacting faster, or in graphical user interfaces, or in rendering web pages faster. Then you start seeing this new glut, often led by researchers first: software applications coming up that use all of this power, because in academia you can start saying, what if I did have infinite compute? What would I do differently? You see things like VR and advanced gaming come up on the consumer side. And I think the real answer on the business side is AI and ML. The general trend I think of is something I used to talk about back in the old days, which is the shift to having machines work for us instead of us working for machines. The only way we’re ever going to get there is by having higher and higher intelligence on the application side, so that it intuits more based on what it’s seen before and what it knows about you, etc., in terms of the task that needs to get done. Then there’s this whole new breed of person you need in order to wield all that power, because, like Hadoop, it’s not just natural. You don’t have people floating around saying, hey, I’m going to be an Oozie expert or a YARN expert. You don’t run into people every day who say, oh yeah, I know neural nets well, I’m a gradient descent expert, or whatever your model is. It’s really going to drive lots of changes, I think.
>> Right, well, hopefully it does, especially, like we were talking about earlier, within core curriculums at schools. We were at Grace Hopper, and Brenda Wilkerson, the new head of the Anita Borg organization, was at the Chicago Public Schools district, where they’re actually starting to make CS a requirement, along with biology, physics, chemistry, and some of these other things.
>> So we do have a huge dearth of that. But I want to close out on one last concept before I let you go, and you guys are way on top of this. Greg talked about what you just talked about, which is making the computers work for us versus the other way around. That’s the democratization of that power. We heard a lot about the democratization of big data and the tools, and now you guys are talking about the democratization of integration, especially when you have a bunch of cloud-based applications that everybody has access to and maybe needs to stitch together in a different way. When you look at this whole concept of democratizing that power, how do you see it playing out over the next several years?
>> Yeah, that’s a very big-
>> Sorry I didn’t bring you a couple of beers before I brought that up.
>> Oh no, I got you covered. So it’s a very big, interesting question, because, first of all, god knows we can’t predict with a lot of accuracy exactly how it’s going to look, since we’re sort of juxtaposing two things. One is that part of the initial move to the cloud was the failure to properly democratize data inside the enterprise; for whatever reason, we didn’t do it. Now we have the compute resources and central, web-based access to everything. Great. But now we have Cambridge Analytica and Facebook, and people are really thinking about data privacy and the fact that we want ubiquitous, safe access. I think we know how to make things ubiquitous. The question is, do we know how to make them safe and fair, so that the right people are using the right data in the right way? It’s a little bit like, you know, there are all these cautionary tales out there, like beware of AI and robotics and everything, and nobody really thinks about the danger of the data that’s there. It’s a much more immediate problem, and yet it’s sort of the silent killer until some scandal comes up. We start thinking about the different ways we can tackle it. Obviously there are great solutions for tokenization and encryption and everything at the data level, but even if you control access to it, the question is how you contain the wildfire that could happen as soon as the horse leaves the barn. Maybe not in its current form, but when you look at things like blockchain, there have been a lot of predictions about how blockchain can be used around data. With this privacy, curation, and tracking of who has the data, who has access to it, and whether we can control it, I think you’re looking at even more centralized and guarded access to this private data.
>> Great, interesting times.
>> Yeah, yeah Jeff, for sure.
>> All right, James, well, thanks for taking a couple of minutes with us. I really enjoyed the conversation.
>> Yeah, it’s always great. Thanks for having me, Jeff.
>> He’s James, I’m Jeff, and you’re watching theCUBE. We’re at the SnapLogic headquarters in San Mateo, California. Thanks for watching. (electronic music)