Streaming Data and Data Lakes at #StrataHadoop World

SnapLogic’s big data expert and Head of Enterprise Architecture, Ravi Dharnikota, was featured on Information Management recounting his observations at last month’s Strata+Hadoop World in San Jose. The main takeaway was that the attendees and sessions were primarily focused on streaming data, data lakes, and Apache Spark for analytics. He noted: “While the continuous innovation and change in the big data industry provides fast, frequent improvements to the technology, it is tough to keep up with in an organization where there are competing priorities and projects.”

You can read the full Q&A below. 

Information Management: What are the most common themes that you heard among Strata+Hadoop World conference attendees and how do those themes align with what you expected?

Ravi Dharnikota: Compared to the 2015 event, this year shifted a bit away from academic discussions of the latest Apache project and towards real use cases. This year I heard quite a bit about:

  • Streaming — Streaming data ingestion, processing and analytics.
  • Data lake — How to do the lake right; ingestion; governance; data prep.
  • Spark — A huge shift towards support for technologies to run on Spark as a platform.

IM: What are the most common data challenges that attendees are facing?
RD: One of the most common challenges with data management is simply the pervasiveness of data. It’s everywhere in the organization. Organizations need some way of bringing it all together in one place, making it searchable and consumable by everyone, with “guardrails” in place.

The other challenge is that the big data ecosystem is constantly changing and can be quite noisy, with overlapping messages from vendors and open source die-hards. Organizations that just want to get stuff done to drive business practices need help from end-to-end frameworks.

IM: What are the most surprising things that you heard from attendees?
RD: None of these are truly surprising, but they are worth noting:

Customers are realizing that no matter how open and flexible the vision of a data lake is, there has to be some governance with proper access controls, auditing and data sensitivity considerations. Also, data needs to be easily searchable for anyone looking for it in the lake.

The data lake is not just Hadoop. It could be in the cloud from Amazon, Microsoft or Google.

A lot of organizations have both Hortonworks and Cloudera in their data hub cluster.

IM: What does your company view as the top data issues or challenges in 2016?
RD: Organizations outside the heavy tech industry need guidance and help in democratizing data.

There is a lack of an industry-defined “best practice” for doing data management well in the modern big data context.

Lack of big data skill sets will continue to require self-service platforms and tools that abstract the technology and make it easy to use.

While the continuous innovation and change in the big data industry provides fast, frequent improvements to the technology, it is tough to keep up with in an organization where there are competing priorities and projects.

IM: How do these themes and challenges relate to your company’s market strategy this year?
RD: SnapLogic’s big data strategy is focused on making it easy to keep up with changes in the big data ecosystem for organizations that are not able to pour resources into creating and tinkering with their own systems for moving, managing and consuming data.

Our strategy revolves around looking at the data lake as a whole and at what an enterprise needs to achieve its data management initiatives. This could include security, streaming, storage formats, governance, metadata, etc.


