VIDEO: SnapLogic Discusses Big Data on #theCUBE from Strata+Hadoop World San Jose

It’s Big Data Week here in Silicon Valley with data experts from around the globe convening at Strata+Hadoop World San Jose for a packed week of keynotes, education, networking and more - and SnapLogic was front-and-center for all the action.

SnapLogic stopped by theCUBE, the popular video-interview show that live-streams from top tech events, and joined hosts Jeff Frick and George Gilbert for a spirited and wide-ranging discussion of all things Big Data.

First up was SnapLogic CEO Gaurav Dhillon, who discussed SnapLogic’s record-growth year in 2016, the acceleration of Big Data’s move to the cloud, SnapLogic’s strong momentum with the AWS Redshift and Microsoft Azure platforms, the emerging applications and benefits of ML and AI, customers increasingly ditching legacy technology in favor of modern, cloud-first, self-service solutions, and more. You can watch Gaurav’s full video below, and here:

Next up was SnapLogic Chief Enterprise Architect Ravi Dharnikota, together with our customer Katharine Matsumoto, Data Scientist at eero. A fast-growing Silicon Valley startup, eero makes a smart wireless networking system that intelligently routes data traffic on your wireless network in a way that reduces buffering and gets rid of dead zones in your home. Katharine leads a small data and analytics team. She discussed how, with SnapLogic’s self-service cloud integration platform, she can easily connect a myriad of ever-growing apps and systems and make important data accessible to as many as 15 different line-of-business teams, empowering business users and enabling faster business outcomes. The pair also discussed the ML and IoT integration that is helping eero consistently deliver an increasingly smart and powerful product to customers. You can watch Ravi and Katharine’s full video below, and here:

 

Deep Dive into SnapLogic Winter 2017 Snaps Release

By Pavan Venkatesh

Data streams with Confluent and migration to Hadoop: In my previous blog post, I explained where future data movement trends are headed. In this post, I’ll dig into some of the exciting things we announced as part of the Winter 2017 (4.8) Snaps release, which also addresses those trends for customers who want to move data from different systems to the cloud or migrate to Hadoop.

Major highlights in the Winter 2017 (4.8) release include:

  • Support for Confluent Kafka – a distributed messaging system for streaming data
  • Teradata to Hadoop – a quick and easy way to migrate data
  • Enhancements to the Teradata Snap Pack – on the TPT front, customers can quickly load, update, or delete data in Teradata
  • The Redshift Multi-Execute Snap – allows multiple statements to be executed sequentially, so customers can maintain business logic
  • Enhancements to the MongoDB Snap Pack (Delete and Update) and the DynamoDB Snap Pack (Delete and Delete-item)
  • Workday Read output enhancements – output is now easier for downstream systems to consume
  • NetSuite Snap Pack improvements – users can now submit asynchronous operations
  • Security enhancements – including SSL for the MongoDB Snap Pack and invalidation of database connection pools when account properties are modified
  • Major performance improvements when writing to an S3 bucket with the S3 File Writer – users can now configure a buffer size in the Snap so larger blocks are sent to S3 quickly

Confluent Kafka Snap Pack

Kafka is a distributed messaging system based on a publish/subscribe model, offering high throughput and scalability. It is mainly used to ingest data from multiple sources and deliver it to multiple downstream systems. Use cases include website activity tracking, fraud analytics, log aggregation, sales analytics, and others. Confluent is the company that provides an enterprise offering and support for open source Kafka.

Here at SnapLogic, we have built Kafka Producer and Consumer Snaps as part of the Confluent Snap Pack. Before getting into the Snap Pack and pipeline details, a quick look at Kafka’s architecture and how it works is a useful segue.

[Image: Kafka cluster architecture]

Kafka consists of one or more Producers, which publish messages from one or more upstream systems, and one or more Consumers, which read messages on behalf of downstream systems. A Kafka cluster is made up of one or more servers called Brokers. Messages (a key and a value, or just a value) are written to a higher-level abstraction called a Topic. Each Topic can hold messages from different Producers, and users can define new Topics for new categories of messages. Producers write messages to Topics, and Consumers consume from one or more Topics. Topics are partitioned, replicated, and persisted across Brokers. Messages within a partition are ordered, and each message is assigned a sequential ID number called an offset. These offsets have traditionally been maintained in ZooKeeper, which Confluent describes as the coordination kernel.

Kafka also lets you configure a Consumer group, in which multiple Consumers jointly consume from a Topic.
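To make those moving parts concrete, here is a minimal sketch of a Producer and a Consumer written directly against Kafka, assuming the confluent-kafka Python client and a broker running at localhost:9092; the topic name and consumer group are hypothetical. The Snaps described below wrap this kind of logic behind configuration.

```python
from confluent_kafka import Producer, Consumer

BROKERS = "localhost:9092"   # assumed local broker
TOPIC = "sales-leads"        # hypothetical topic name

# Producer: write keyed messages to the Topic.
producer = Producer({"bootstrap.servers": BROKERS})
for i, lead in enumerate(["alice@example.com", "bob@example.com"]):
    # The key influences which partition a message lands in; the value is the payload.
    producer.produce(TOPIC, key=str(i), value=lead)
producer.flush()  # block until all messages are delivered to the Brokers

# Consumer: read from the Topic as part of a Consumer group.
consumer = Consumer({
    "bootstrap.servers": BROKERS,
    "group.id": "lead-readers",        # Consumers sharing this id split the partitions
    "auto.offset.reset": "earliest",   # start from the beginning if no committed offset exists
})
consumer.subscribe([TOPIC])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        # Each message reports its partition and its sequential offset within that partition.
        print(msg.partition(), msg.offset(), msg.key(), msg.value())
finally:
    consumer.close()
```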

With over 400 Snaps supporting a variety of on-premises endpoints (relational databases, files, NoSQL databases, and others) and cloud products (NetSuite, Salesforce, Workday, Redshift, Anaplan, and others), the SnapLogic Elastic Integration Cloud combined with the Confluent Kafka Snap Pack is a powerful way to move data between systems quickly and in a streaming fashion, helping customers realize benefits and business outcomes faster.

With respect to the Confluent Kafka Snap Pack, we support Confluent version 3.0.1 (Kafka v0.9). These Snaps abstract away the complexities; users only have to provide configuration details to build a pipeline that moves data easily. One thing to note: when multiple Consumer Snaps in a pipeline are configured with the same consumer group, each Consumer Snap is assigned a different subset of the partitions in the Topic.
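That partition-assignment behavior is plain Kafka semantics, and you can observe it outside of SnapLogic with a rough sketch like the one below (again assuming the confluent-kafka Python client; the topic and group names are hypothetical): two Consumers sharing a group.id are each handed a disjoint subset of the Topic’s partitions.

```python
from confluent_kafka import Consumer

BROKERS = "localhost:9092"   # assumed local broker
TOPIC = "sales-leads"        # hypothetical topic with several partitions
GROUP = "lead-processors"    # shared consumer group

def make_consumer(name):
    consumer = Consumer({
        "bootstrap.servers": BROKERS,
        "group.id": GROUP,
        "auto.offset.reset": "earliest",
    })
    # on_assign fires after the group rebalances and partitions are handed out.
    consumer.subscribe(
        [TOPIC],
        on_assign=lambda c, parts: print(name, "assigned partitions",
                                         [p.partition for p in parts]),
    )
    return consumer

consumers = [make_consumer("consumer-a"), make_consumer("consumer-b")]

# Poll both consumers; once the rebalance completes, each prints a different,
# non-overlapping subset of the Topic's partitions.
for _ in range(20):
    for consumer in consumers:
        consumer.poll(0.5)

for consumer in consumers:
    consumer.close()
```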

[Image: Confluent Kafka Producer Snap configuration]

[Image: Confluent Kafka Consumer Snap configuration]

[Image: Pipeline moving sales leads from local files and MySQL through Confluent Kafka to Redshift and Tableau]

In the above example, I built a pipeline where sales leads (messages) stored in local files and in MySQL are sent to a Topic in Confluent Kafka via Confluent Kafka Producer Snaps. The downstream system, Redshift, consumes these messages from that Topic via the Confluent Kafka Consumer Snap and bulk loads them into Redshift for historical or auditing needs. The same messages are also sent to Tableau, as another Consumer, to run analytics on how many leads were generated this year, so customers can compare against last year.
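Outside the SnapLogic designer, the producer side of that pipeline (the fan-in from a local file and MySQL into the Topic) could be sketched roughly as follows, assuming the confluent-kafka and PyMySQL packages; the file path, table, columns, and credentials are all hypothetical stand-ins.

```python
import csv
import json

import pymysql
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker
TOPIC = "sales-leads"  # hypothetical topic

def publish(lead):
    # Serialize each lead as JSON; the lead id becomes the message key.
    producer.produce(TOPIC, key=str(lead["id"]), value=json.dumps(lead))

# 1) Leads stored in a local CSV file (assumed to have id, name, and email columns).
with open("leads.csv", newline="") as f:
    for row in csv.DictReader(f):
        publish(row)

# 2) Leads stored in MySQL (hypothetical connection details and table).
conn = pymysql.connect(host="localhost", user="app", password="secret",
                       database="crm", cursorclass=pymysql.cursors.DictCursor)
with conn.cursor() as cur:
    cur.execute("SELECT id, name, email, created_at FROM leads")
    for row in cur.fetchall():
        row["created_at"] = str(row["created_at"])  # make datetimes JSON-serializable
        publish(row)
conn.close()

# Downstream, separate Consumers (a Redshift bulk loader, a Tableau feed) read the same Topic.
producer.flush()
```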

Easy migrations from Teradata to Hadoop

There has been a major shift as customers move from expensive Teradata solutions to Hadoop or other data warehouses. Until now, there has not been an easy way to transfer large amounts of data from Teradata to Hadoop. With this release, we have developed a Teradata Export to HDFS Snap with two goals in mind: 1) ease of use and 2) high performance. This Snap uses the Teradata Connector for Hadoop (TDCH v1.5.1). Customers just have to download this connector from the Teradata website, in addition to the regular JDBC JARs; no installation is required on either the Teradata or Hadoop nodes.

TDCH uses MapReduce (MR) as its execution engine: queries are submitted to the MapReduce framework, and the distributed processes it launches make JDBC connections to the Teradata database. The fetched data is loaded directly into the specified HDFS location. The degree of parallelism for these TDCH jobs is defined by the number of mappers (a Snap configuration) used by the MapReduce job, and the number of mappers also determines the number of files created in the HDFS location.
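As a rough illustration of that mapper model (this is not TDCH itself, just a sketch of the split-by-hash idea), the snippet below divides an export query into N slices and has each worker write one output file, so the worker count determines both the degree of parallelism and the number of files produced. It assumes the teradatasql DB-API driver and writes to a local directory as a stand-in for the HDFS target; the table, columns, and connection details are hypothetical.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor

import teradatasql  # assumed Teradata DB-API driver

NUM_MAPPERS = 4                      # degree of parallelism, like the Snap's mapper setting
OUTPUT_DIR = "/tmp/teradata_export"  # stand-in for the target HDFS directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

def export_split(split_id):
    """Each 'mapper' pulls one hash-based slice of the table and writes one file."""
    with teradatasql.connect(host="td-host", user="dbc", password="secret") as conn:  # hypothetical
        cur = conn.cursor()
        # Distribute rows across workers by hashing the key column, one slice per worker.
        cur.execute(
            "SELECT id, name, amount FROM sales.leads "
            "WHERE HASHBUCKET(HASHROW(id)) MOD ? = ?",
            (NUM_MAPPERS, split_id),
        )
        path = os.path.join(OUTPUT_DIR, f"part-{split_id:05d}.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(cur.fetchall())
    return path

# N workers run in parallel and produce N files, mirroring how the number of
# mappers controls both parallelism and file count in a TDCH job.
with ThreadPoolExecutor(max_workers=NUM_MAPPERS) as pool:
    for path in pool.map(export_split, range(NUM_MAPPERS)):
        print("wrote", path)
```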

The Snap account details, along with a sample query that extracts data from Teradata and loads it into HDFS, are shown below.

[Image: Teradata Export to HDFS Snap account settings]

[Image: Teradata Export to HDFS Snap with a sample query]

 

The pipeline to this effect is as follows:

[Image: Pipeline using the Teradata Export to HDFS Snap]

As you can see above, just one Snap is needed to export data from Teradata and load it into HDFS. Customers can later use the HDFS Reader Snap to read the exported files.

The Winter 2017 release equips customers with a host of benefits, from data streaming and easy migrations to enhanced security and better performance. More information on the SnapLogic Winter 2017 (4.8) release can be found in the release notes.

Pavan Venkatesh is Senior Product Manager at SnapLogic. Follow him on Twitter @pavankv.

Why enterprises must meet millennials’ expectations


Millennials’ attitudes in the workplace have gotten a bad rap, the roots of which are explored in this extremely popular video by author and speaker Simon Sinek. But this blog isn’t a slam on millennials’ expectations for job fulfillment. It’s about meeting their expectations of how easy it should be to use enterprise technology — and that’s a good thing.

A very vocal majority

Since 2015, millennials have been the largest demographic group in the US workforce, numbering 53.5 million. They are now mainstream enterprise tech consumers, and there’s a thing or two we can learn. For example, millennials came of age using smartphones. In fact, 97% of millennials aged 25-34 own a smartphone. And I doubt that a single one would want to give up their smartphone for a separate flip phone, music player and camera.

The reality is, we live in an age where people expect multiple utility from technology, and that expectation is a driving force in innovation. How about a washer that also dries your clothes? That’s pretty rad. Or the motorcycle helmet that puts an entire dashboard of information right in front of your eyes? That’s radder still.

Expectations for multiple utility are similarly all over the workplace, and millennials are approaching the data consumption challenge with a clean slate. They say it should be easy, like a smartphone, and be self-service. Once again, millennials are clamoring for multiple utility.

[Image: U.S. labor force by generation]

SnapLogic meets millennial expectations of modern business

This is an area where SnapLogic trumps legacy technologies. On its best day, the 25-year-old data integration technology offered by Informatica creates ETL (extract, transform, load) jobs and has some other capabilities added on. But at its core, Informatica was designed to deal with batch, relational, ETL-like kinds of problems. Unfortunately, no one in the working world, not even retiring Boomers, lives in batch mode. Business change happens in real time, and our data and analytics need to support that.

From day one, SnapLogic has been designed to solve all kinds of data-in-flight problems in the enterprise. These include what we called, in the last century, application integration problems, like connecting Salesforce with SAP, and data integration problems, like providing information feeds to answer modern analytics questions. We can also use SnapLogic with technologies that weren’t widely available in the last century, like predictive analytics, machine learning, and wiring up large industrial enterprises with IoT sensors, to open new profit pools and help build better products.

That’s the kind of multiple utility that people expect from their technology — it’s not about feeds and speeds, it’s about having a smartphone versus having a separate phone, camera and music player. That’s just so 1992, you know?

This is the “match point” that SnapLogic can defend into eternity. Hundreds of our customers around the globe testify to that. Almost all of these companies had some flavor of Informatica or its competitor, and they have made the choice to move to SnapLogic. Some have moved completely, in a big bang, and others have side-by-side projects and will migrate completely to SnapLogic over time.

Want to learn more about meeting today’s lofty expectations for enterprise tech? Read SnapLogic’s new whitepaper that captures my conversation with James Markarian, SnapLogic’s CTO and also an Informatica alumnus: “We left Informatica. Now you can, too.”


The need for speed: Why I left Informatica (and you should, too)

Informatica is one of the biggest, oldest names in enterprise technology. It’s a company I co-founded in 1992 and left over 10 years ago. Although the reasons why I left can be most easily summarized as “disagreements with the board over the direction of the company,” it all boils down to this: aging enterprise technology doesn’t move fast enough to keep up with the speed of today’s business.

About a year after I left, I founded SnapLogic, a company that has re-invented data integration for the modern enterprise — an enterprise that is increasingly living, working and innovating in the cloud. The pace at which enterprises are shifting operations to the cloud is reflected in stats like this: According to Forrester Research, the global public cloud market will top $146 billion in 2017, up from $87 billion in 2015.

Should you ride a horse to the office?

Given the tidal wave of movement to the cloud, why would a company stick with Informatica? Often, it’s based on decisions made in the last century, when CIOs made strategic commitments to this legacy platform. If you’re the CIO of that shop today, you may or may not have been the person who made that decision, but here you are, running Informatica.

Going forward, does it make sense to keep running the company on Informatica? The truthful answer is that you can, just as you can run a modern company on a mainframe. You can also ride a horse to the office. But is it something you should do? That’s where I say “no.” The direct path between a problem and a solution is to use appropriate technologies that are in sync with the problems being solved, within the time and budget available today. That is really the crux of the Informatica inheritance versus the SnapLogic future.

It’s true that the core guts of what is still Informatica — the underlying engine, the metadata, the user interface and so on — have to some extent been refreshed. But they are fundamentally still fixed in the past. It’s like a mainframe: you can go from water cooling to air cooling, but fundamentally it’s still a mainframe.

The high price of opportunity cost

IT and business people always think about sunk costs, and they don’t want to give up on them. Informatica shops have invested heavily in the application, and in the people, processes, iron and data centers required to run it; these are sunk costs.

But IT and business leaders need to think about sunk opportunity, and the high price their companies pay for missing out because their antiquated infrastructure — of which Informatica is emblematic — doesn’t allow them to move fast enough to seize opportunity when they see it.

Today, most enterprises are making a conscious decision to stop throwing good money after bad on their application portfolios. They recognize they can’t lose out on more opportunities. They are switching to cloud computing and modern enterprise SaaS. As a result, there’s been a huge shift toward solutions like Salesforce, Workday and ServiceNow; companies that swore they would never give up on-premises software are moving their application computing to the cloud.

Game, set, match point

In light of that, in a world that offers new, ultra-modern technology at commodity prices, you start to realize, “We ought to modernize. We should give up on the sunk costs and instead think of the sunk opportunity of persisting with clunky old technology.”

This is the “match point” that SnapLogic can defend into eternity. Hundreds of our customers around the globe testify to that. Almost all of these companies had some flavor of Informatica or its competitor, and they have made the choice to move to SnapLogic. Some have moved completely, in a big bang, and others have side-by-side projects and will migrate completely to SnapLogic over time.

Need more reasons to move fast? Read SnapLogic’s new whitepaper that captures my conversation with James Markarian, SnapLogic’s CTO and also an Informatica alumnus: “We left Informatica. Now you can, too.”


SnapLogic Sits Down with theCUBE at AWS re:Invent to Talk Self-Service Cloud Analytics

SnapLogic was front-and-center at AWS re:Invent last week in Las Vegas, with our team busier than ever meeting with customers and prospects, showcasing our solutions at the booth, and networking into the evening with event-goers interested in all things Cloud, AWS integration and SnapLogic.

Ravi Dharnikota, SnapLogic’s Head of Enterprise Architecture and Big Data Practice, took time out to stop by and visit with John Furrier, co-founder of the live video interview show theCUBE.  Ravi was joined by Matt Glickman, VP of Products with our partner Snowflake Computing, for a wide-ranging discussion on the changing customer requirements for effective data integration, SaaS integration, warehousing and analytics in the cloud.  

The roundtable all agreed — organizations need fast and easy access to all data, no matter the source, format or location — and legacy solutions built for a bygone era simply aren’t cutting it.  Enter SnapLogic and Snowflake, each with a modern solution designed from the ground-up to be cloud-first, self-service, fully scalable and capable of handling all data. Customers using these solutions together — like Kraft Group, owners of the New England Patriots and Gillette Stadium — enjoy dramatic acceleration in time-to-value at a fraction of the cost by eliminating manual configuration, coding and tuning while bringing together diverse data and taking full advantage of the flexibility and scalability of the cloud.

To make it even easier for customers, SnapLogic and Snowflake recently announced tighter technology integration and joint go-to-market programs to help organizations harness all data for new insights, smarter decisions and better business outcomes.

To watch the full video interview on theCUBE, click here.

Don’t Let Cloud Be Another Silo: Accelerate Your AWS integration

Gone are the days when enterprises had all of their apps and data sources on-premises. Today is the era of big data, cloud and hybrid deployments. More and more enterprises are rapidly adopting different SaaS applications and hosting their solutions in public clouds, including Amazon Web Services and Microsoft Azure. But enterprises soon realize that their SaaS applications and on-premises data sources are not integrated with their public cloud footprint, and the integration itself becomes an expensive and time-consuming undertaking.

Continue reading “Don’t Let Cloud Be Another Silo: Accelerate Your AWS integration”

New Podcast Episode: The Lifecycle of Data

Next up in our ongoing podcast series: an episode on the “lifecycle of data” featuring our guest, Enterprise Solution Architect Rich Dill. The series is hosted by our own head of enterprise architecture, Ravi Dharnikota.

In this episode, Ravi Dharnikota and Rich Dill discuss the lifecycle of data, including the transition of data storage and processing to the cloud, the implications of distributed data, a “multi-tiered data lifecycle,” and the evolution of the data lake.

You can view and subscribe to the entire series here.