Deep Dive into SnapLogic Winter 2017 Snaps Release

By Pavan Venkatesh

Data streams with Confluent and migration to Hadoop: In my previous blog post, I outlined where data movement trends are headed. In this post, I’ll dig into some of the exciting things we announced as part of the Winter 2017 (4.8) Snaps release, and how they address those trends for customers who want to move data to the cloud from different systems or migrate to Hadoop.

Major highlights in 2017 Winter release (4.8) include:

  • Support for Confluent Kafka – a distributed messaging system for streaming data
  • Teradata to Hadoop – a quick and easy way to migrate data
  • Enhancements to the Teradata Snap Pack – on the TPT front, customers can quickly load, update, or delete data in Teradata
  • The Redshift Multi-Execute Snap – allows multiple statements to be executed sequentially, so customers can maintain business logic
  • Enhancements to the MongoDB Snap Pack (Delete and Update) and the DynamoDB Snap Pack (Delete and Delete-item)
  • Workday Read output enhancements – now easier for downstream systems to consume
  • NetSuite Snap Pack improvements – users can now submit asynchronous operations
  • Security feature enhancements – including SSL for the MongoDB Snap Pack and invalidation of database connection pools when account properties are modified
  • Major performance improvement when writing to an S3 bucket using the S3 File Writer – users can now configure a buffer size in the Snap so larger blocks are sent to S3 quickly
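The buffered-write idea behind that last highlight can be illustrated in a few lines. This is a generic sketch of buffered uploads, not the Snap’s actual implementation; the function name and sizes are illustrative:

```python
def buffered_blocks(stream, buffer_size):
    """Accumulate incoming bytes until the buffer fills, then emit one large
    block. Fewer, larger writes generally move data to object stores like S3
    faster than many small ones."""
    buf = b""
    for chunk in stream:
        buf += chunk
        while len(buf) >= buffer_size:
            yield buf[:buffer_size]      # ship a full block
            buf = buf[buffer_size:]
    if buf:
        yield buf                        # flush the final partial block

# Five 7-byte chunks coalesce into three full 10-byte blocks plus a remainder:
blocks = list(buffered_blocks([b"x" * 7] * 5, buffer_size=10))
assert [len(b) for b in blocks] == [10, 10, 10, 5]
```

A larger `buffer_size` trades a little memory for fewer round trips to the object store, which is where the performance gain comes from.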

Confluent Kafka Snap Pack

Kafka is a distributed messaging system based on a publish/subscribe model, offering high throughput and scalability. It is mainly used to ingest data from multiple sources and then deliver it to multiple downstream systems. Use cases include website activity tracking, fraud analytics, log aggregation, sales analytics, and others. Confluent is the company that provides an enterprise-ready offering of open source Kafka.

Here at SnapLogic, we have built Kafka Producer and Consumer Snaps as part of the Confluent Snap Pack. Before getting into the Snap Pack and pipeline details, a quick look at Kafka’s architecture and how it works is a useful segue.

[Image: Kafka cluster architecture]

Kafka consists of one or more Producers, which produce messages from one or more upstream systems, and one or more Consumers, which consume messages as part of downstream systems. A Kafka cluster comprises one or more servers called Brokers. Messages (a key and a value, or just a value) are fed into a higher-level abstraction called a Topic. Each Topic can hold messages from different Producers, and users can define new Topics for new categories of messages. Producers write messages to Topics, and Consumers consume from one or more Topics. Topics are partitioned, replicated, and persisted across Brokers. Messages in a Topic are ordered within a partition, and each message has a sequential ID number called an offset. ZooKeeper usually maintains these offsets; Confluent refers to it as the coordination kernel.
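To make the Topic/partition/offset relationship concrete, here is a minimal in-memory sketch. This is plain Python for illustration only, not the Kafka client or a SnapLogic Snap, and the class and method names are invented:

```python
class Topic:
    """A toy model of a Kafka Topic: N partitions, each an append-only log."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Like Kafka's default keyed partitioner: hash the key to pick a
        # partition (keyless messages would be spread round-robin instead).
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1  # sequential ID within the partition
        return p, offset

leads = Topic("sales-leads", num_partitions=3)
p1, o1 = leads.produce("west-region", {"lead": "Acme"})
p2, o2 = leads.produce("west-region", {"lead": "Globex"})
assert p1 == p2            # same key -> same partition, so ordering is preserved
assert (o1, o2) == (0, 1)  # offsets are sequential within a partition
```

The sketch shows why ordering is guaranteed only within a partition: each partition is its own log with its own offset sequence.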

Kafka also allows multiple Consumers to be organized into a Consumer group when consuming from a Topic.

With over 400 Snaps supporting various on-premises systems (relational databases, files, NoSQL databases, and others) and cloud products (NetSuite, Salesforce, Workday, Redshift, Anaplan, and others), the SnapLogic Elastic Integration Cloud combined with the Confluent Kafka Snap Pack is a powerful way to move data between systems in a fast, streaming manner, helping customers realize benefits and generate business outcomes quickly.

With respect to the Confluent Kafka Snap Pack, we support Confluent Version 3.0.1 (Kafka v0.9). These Snaps abstract away the complexities; users only have to provide configuration details to build a pipeline that moves data easily. One thing to note: when multiple Consumer Snaps in a pipeline are configured with the same consumer group, each Consumer Snap is assigned a different subset of the partitions in the Topic.
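That consumer-group behavior – each member of the group owning a disjoint subset of partitions – can be sketched with a simple round-robin assignment. This is illustrative only, not SnapLogic’s or Kafka’s actual assignment code:

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: spread partitions across the consumers in a
    group so that every partition is owned by exactly one consumer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# A 6-partition Topic consumed by a group of two Consumer Snaps:
a = assign_partitions(list(range(6)), ["consumer-A", "consumer-B"])
assert a["consumer-A"] == [0, 2, 4]
assert a["consumer-B"] == [1, 3, 5]
# The subsets are disjoint and together cover every partition:
assert set(a["consumer-A"]).isdisjoint(a["consumer-B"])
```

Because each partition has exactly one owner within the group, messages are load-balanced across the Consumer Snaps without being processed twice.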

[Image: Confluent Kafka Producer Snap settings]

[Image: Confluent Kafka Consumer Snap settings]

[Image: Example pipeline with Confluent Kafka Producer and Consumer Snaps]

In the above example, I built a pipeline where sales leads (messages) stored in local files and MySQL are sent to a Topic in Confluent Kafka via Confluent Kafka Producer Snaps. The downstream system, Redshift, consumes these messages from that Topic via the Confluent Kafka Consumer Snap, which bulk loads them into Redshift for historical or auditing needs. These messages are also sent to Tableau, as another Consumer, to run analytics on how many leads were generated this year, so customers can compare against last year.

Easy migrations from Teradata to Hadoop

There has been a major shift in which customers are moving from expensive Teradata solutions to Hadoop or other data warehouses. Until now, there has not been an easy solution for transferring large amounts of data from Teradata to Hadoop. With this release we have developed a Teradata Export to HDFS Snap with two goals in mind: 1) ease of use and 2) high performance. This Snap uses the Teradata Connector for Hadoop (TDCH v1.5.1); customers just have to download this connector from the Teradata website, in addition to the regular JDBC JARs. No installation is required on either the Teradata or Hadoop nodes.

TDCH utilizes MapReduce (MR) as its execution engine: queries are submitted to this framework, and the distributed processes launched by MapReduce make JDBC connections to the Teradata database. The data fetched is loaded directly into the defined HDFS location. The degree of parallelism for these TDCH jobs is defined by the number of mappers (a Snap configuration) used by the MapReduce job; the number of mappers also determines the number of files created in the HDFS location.
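As a rough model of how the mapper count drives both parallelism and the number of output files – a sketch of the idea, not TDCH itself – each mapper pulls a share of the rows over its own JDBC connection and writes its own `part-m-*` file (the standard MapReduce output naming) under the target HDFS directory:

```python
def split_for_mappers(rows, num_mappers):
    """Split a result set into one chunk per mapper, mirroring how each TDCH
    mapper writes its own part file under the target HDFS directory."""
    files = {f"part-m-{i:05d}": [] for i in range(num_mappers)}
    for n, row in enumerate(rows):
        # Each mapper fetches its share of the rows in parallel over JDBC.
        files[f"part-m-{n % num_mappers:05d}"].append(row)
    return files

rows = [{"id": i} for i in range(10)]
out = split_for_mappers(rows, num_mappers=4)
assert len(out) == 4                             # 4 mappers -> 4 files in HDFS
assert sum(len(v) for v in out.values()) == 10   # no rows lost or duplicated
```

Raising the mapper count in the Snap configuration therefore increases parallel throughput, at the cost of producing more (smaller) files in HDFS.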

The Snap account details, with a sample query to extract data from Teradata and load it into HDFS, are shown below.

[Image: Teradata account settings]

[Image: Teradata Export to HDFS Snap settings]

The pipeline to this effect is as follows:

[Image: Teradata Export to HDFS pipeline]

As you can see above, just one Snap exports data from Teradata and loads it into HDFS. Customers can later use the HDFS Reader Snap to read the exported files.

The Winter 2017 release equips customers with many benefits, from data streaming and easy migrations to security and performance enhancements. More information on the SnapLogic Winter 2017 (4.8) release can be found in the release notes.

Pavan Venkatesh is Senior Product Manager at SnapLogic. Follow him on Twitter @pavankv.

From helicopter to enabler: The new face of enterprise IT

Can an IT organization effectively run a 2017 business on 25-year-old technology? As someone who played a large hand in developing the data integration technology in question — at Informatica, where I was CTO for nearly two decades — I can tell you that the answer is simple: “No.”

A vastly different primordial landscape

That said, I know that when Informatica was created, it was the best technology for data integration at the time. The world was a lot simpler in 1992: there were five databases that mattered, and they were all pretty similar. There were just a few ERP systems: Oracle, SAP and a young PeopleSoft. Informatica was ideally suited to that software baseline, and the scale-up UNIX platforms of that era. The web, obviously, was not in the picture.

IT organizations were also a lot simpler in 1992. If any business person wanted new tech functionality — a new workstation added to a network, or a new report from a client/server system — they put their request into the IT queue, because that was the only way to get it.

IT is still important; it’s just different

Fast-forward 25 years to 2017. Almost everything about that primordial technology landscape, when Informatica roamed the world, is different. For example, now there’s the web, the cloud, NoSQL databases, and best of breed application strategies that are actually viable. None of these existed when Informatica started. Every assumption from that time — the compute platform, scale-up/scale-out, data types, data volumes and data formats — is different.

IT organizations are radically different, too. The command-and-control IT organization of the past has transformed into a critical enablement function. IT still enables core operations by securing the enterprise and establishing a multitude of technology governance frameworks. But the actual procurement of end-user technology, such as analyzing data aggregated from across systems and across the enterprise, is increasingly in the hands of business users.

In other words, the role of IT is changing, but the importance of IT isn’t. It’s like parenting: as your kids grow, your role changes. It’s less about helicoptering and more about enabling. Parents don’t become less important, but how we deliver value evolves.

This is a good analog to the changes in enterprise IT. The IT organization wants to enable users because it’s pretty impossible to keep up with the blistering pace of business growth and change. If the IT organization tries to control too much, at some point it starts holding the business back.

Smart IT organizations have realized their role in the modern enterprise is to help their business partners become more successful. SnapLogic delivers a vital piece of required technology; we help IT organizations to give their users the self-service data integration services they need, instead of waiting for analysts to run an ETL through Informatica to pull the requested data together. By enabling self-service, SnapLogic is helping lines of business — most companies’ biggest growth drivers — to reach their full potential. If you’re a parent reading this, I know it will sound familiar.

Here’s another way to find out more about why IT organizations are embracing SnapLogic as a critical enabler: read SnapLogic’s new whitepaper that captures my conversation with Gaurav Dhillon, SnapLogic’s CEO and also an Informatica alumnus: “We left Informatica. Now you can, too.”


Why enterprises must meet millennials’ expectations


Millennials’ attitudes in the workplace have gotten a bad rap, the roots of which are explored in this extremely popular video by author and speaker Simon Sinek. But this blog isn’t a slam on millennials’ expectations for job fulfillment. It’s about meeting their expectations of how easy it should be to use enterprise technology — and that’s a good thing.

A very vocal majority

Since 2015, millennials have been the largest demographic group in the US workforce, numbering 53.5 million. They are now mainstream enterprise tech consumers, and there’s a thing or two we can learn. For example, millennials came of age using smartphones. In fact, 97% of millennials aged 25-34 own a smartphone. And I doubt that a single one would want to give up their smartphone for a separate flip phone, music player and camera.

The reality is, we live in an age where people expect multiple utility from technology, and that expectation is a driving force in innovation. How about a washer that also dries your clothes? That’s pretty rad. Or a motorcycle helmet that puts an entire dashboard of information right in front of your eyes? That’s radder still.

Expectations for multiple utility are similarly all over the workplace, and millennials are approaching the data consumption challenge with a clean slate. They say it should be easy, like a smartphone, and be self-service. Once again, millennials are clamoring for multiple utility.

[Chart: Millennials are the largest generation in the US labor force]

SnapLogic meets millennial expectations of modern business

This is an area where SnapLogic trumps legacy technologies. On its best day, the 25-year-old data integration technology offered by Informatica creates ETL (extract, transform, load) jobs and has some other capabilities added on. But at its core, Informatica was designed to deal with batch, relational, ETL-like kinds of problems. Unfortunately, no one in the working world, not even retiring Boomers, lives in batch mode. Business change happens in real time, and our data and analytics need to support that.

From day one, SnapLogic’s integration platform has been designed to solve all kinds of data-in-flight problems in the enterprise. These include, as we called them in the last century, application integration problems, like connecting Salesforce with SAP, and data integration problems, like providing information feeds to answer modern analytic questions. We can use SnapLogic’s application integration architecture to solve problems with technologies that weren’t widely available in the last century – predictive analytics, machine learning, or wiring up large industrial enterprises with IoT sensors – to open new profit pools and help build better products.

That’s the kind of multiple utility that people expect from their technology — it’s not about feeds or speeds, it’s about having a smartphone versus having a separate phone, camera and music player. That’s just so 1992, you know?

This is the “match point” that SnapLogic can defend into eternity. Hundreds of our customers around the globe testify to that. Almost all of these companies had some flavor of Informatica or its competitor, and they have made the choice to move to SnapLogic. Some have moved completely, in a big bang, and others have side-by-side projects and will migrate completely to SnapLogic over time.

Want to learn more about meeting today’s lofty expectations for enterprise tech? Read SnapLogic’s new whitepaper that captures my conversation with James Markarian, SnapLogic’s CTO and also an Informatica alumnus: “We left Informatica. Now you can, too.”


Winter 2017 Release Is Now Available

As enterprises grow and adopt best of breed solutions based in the cloud, on-premises and/or hybrid, integrating data between varied applications, databases and data warehouses (used by the enterprise) continues to be a challenge. New solutions are rapidly adopted, and technical and non-technical users alike need help to meet the challenge of quickly integrating the data from multiple sources into one view to make decisions at the speed of business.

The release includes several new Snaps and Snap updates that make it faster and easier to integrate Workday, NetSuite and Amazon Redshift with other applications and data sources across the enterprise. All three systems are increasingly popular as businesses embrace the cloud to run their business, a “cloud shift” that Gartner says will drive more than $1 trillion in technology spending by 2020.

Here is a brief overview of new and enhanced Snaps:

  • Confluent Kafka: The need for streaming data is becoming more important, and today about one-third of the Fortune 500 uses Kafka. SnapLogic is pleased to introduce a new Snap for Confluent’s distribution of Apache Kafka™, an enterprise-ready solution that connects data sources, applications and IoT devices in real time.
  • Teradata: Several new Snaps have been added to the Teradata Snap Pack, expanding support with the Teradata TPT Load Snap, the TPT Update Snap, and the Teradata Export to HDFS Snap, which allows customers to easily export data from Teradata to an HDFS cluster without any additional installation or complex configuration.
  • Workday: Workday Read Snap has been enhanced to provide a simplified Workday output format making it even easier to be consumed by downstream systems.
  • NetSuite: Asynchronous operations support for NetSuite enables more efficient use of NetSuite’s capabilities through new Snaps, including Async Upsert, Async Search, Async Delete List, Async Get List, Check Async Status, and Get Async Result.
  • Amazon Redshift: Our customers connect multiple on-premises data sources and applications to Redshift without any coding. The Winter 2017 release introduces a new Snap that executes multiple Redshift commands in one Snap, making Redshift data integration pipelines even easier to create and manage.
  • Amazon S3: The Winter 2017 release brings additional streaming performance improvements when writing to an Amazon S3 bucket.

Continued Enterprise Focus: Introducing Asset Search Functionality

SnapLogic continues to be the best platform for enterprise IT and LOB teams to integrate applications and data sources without any coding. Enterprises often have thousands of pipelines, files and accounts and it’s hard to search for a given asset. The Winter 2017 release allows customers to quickly search for assets and also filter search outputs.

Security and Performance Enhancements

Security and performance continue to be focus areas for SnapLogic. To further strengthen user passwords, the Winter 2017 release enforces enhanced password complexity requirements. Customers can also configure session timeout and idle timeout parameters. In addition, the MongoDB Snap Pack has been extended to support SSL.

SnapLogic is committed to supporting the growing enterprise’s needs. We hope you will find the new Confluent Kafka Snap Pack, the expanded support for Workday, NetSuite, and Amazon Redshift, and the enhanced search and security features useful. Customers can start using the capabilities described in the Winter 2017 release right away. For more information on the Winter 2017 release, including demo videos, see www.snaplogic.com/winter2017.

The need for speed: Why I left Informatica (and you should, too)

Informatica is one of the biggest, oldest names in enterprise technology. It’s a company I co-founded in 1992 and left over 10 years ago. Although the reasons why I left can be most easily summarized as “disagreements with the board over the direction of the company,” it all boils down to this: aging enterprise technology doesn’t move fast enough to keep up with the speed of today’s business.

About a year after I left, I founded SnapLogic, a company that has re-invented data integration for the modern enterprise — an enterprise that is increasingly living, working and innovating in the cloud. The pace at which enterprises are shifting operations to the cloud is reflected in stats like this: According to Forrester Research, the global public cloud market will top $146 billion in 2017, up from $87 billion in 2015.

Should you ride a horse to the office?

Given the tidal wave of movement to the cloud, why would a company stick with Informatica? Often, it’s based on decisions made in the last century, when CIOs made strategic commitments to this legacy platform. If you’re the CIO of that shop today, you may or may not have been the person who made that decision, but here you are, running Informatica.

Going forward, does it make sense to keep running the company on Informatica? The truthful answer is that it can work, just as you can run a modern company on a mainframe. You can also ride a horse to the office. But is it something you should do? That’s where I say “no.” The direct path between a problem and a solution is to use appropriate technologies that are in sync with the problems being solved, in the time and on the budget available today. That is really the crux of the Informatica inheritance versus the SnapLogic future.

It’s true that the core guts of what is still Informatica — the underlying engine, the metadata, the user interface and so on — have to some extent been replenished. But they are fundamentally still fixed in the past. It’s like a mainframe; you can go from water cooling to air cooling, but fundamentally it’s still a mainframe.

The high price of opportunity cost

IT and business people always think about sunk costs, and they don’t want to give up on sunk costs. Informatica shops have invested heavily in the application, and the people, processes, iron and data centers required to run it; these are sunk costs.

But IT and business leaders need to think about sunk opportunity, and the high price their companies pay for missing out because their antiquated infrastructure — of which Informatica is emblematic — doesn’t allow them to move fast enough to seize opportunity when they see it.

Today, most enterprises are making a conscious decision to stop throwing good money after bad on their application portfolios. They recognize they can’t lose out on more opportunities. They are switching to cloud computing and modern enterprise SaaS. As a result, there’s been a huge shift toward solutions like Salesforce, Workday and ServiceNow; companies that swore they would never give up on-premise software are moving their application computing to the cloud.

Game, set, match point

In light of that, in a world that offers new, ultra-modern technology at commodity prices, you start to realize, “We ought to modernize. We should give up on the sunk costs and instead think of the sunk opportunity of persisting with clunky old technology.”

This is the “match point” that SnapLogic can defend into eternity. Hundreds of our customers around the globe testify to that. Almost all of these companies had some flavor of Informatica or its competitor, and they have made the choice to move to SnapLogic. Some have moved completely, in a big bang, and others have side-by-side projects and will migrate completely to SnapLogic over time.

Need more reasons to move fast? Read SnapLogic’s new whitepaper that captures my conversation with James Markarian, SnapLogic’s CTO and also an Informatica alumnus: “We left Informatica. Now you can, too.”


7 Data Predictions for 2017

As data increasingly becomes the means by which businesses compete, companies are restructuring operations to build systems and processes liberating data access, integration and analysis up and down the value chain. Effective data management has become so important that the position of Chief Data Officer is projected to become a standard senior board level role by 2020, with 92 percent of CIOs stating that a CDO is the best person to determine data strategy.

With this in mind as you evaluate your data strategy for 2017, here are seven predictions to contemplate to build a solid framework for data management and optimization.

  1.  Self-Service Data Integration Will Take Off
    Eschewing the IT bottleneck designation and committed to being a strategic partner to the business, IT is transforming its mindset. Rather than be providers of data, IT will enable users to achieve data optimization on a self-service basis. IT will increasingly decentralize app and data integration – via distributed Centers of Excellence based on shared infrastructure, frameworks and best practices – thereby enabling line-of-business heads to gather, integrate and analyze data themselves to discern and quickly act upon insightful trends and patterns of import to their roles and responsibilities. Rather than fish for your data, IT will teach you how to bait the hook. The payoff for IT: satisfying business user demand for fast and easy integrations and accelerated time to value; preserving data integrity, security and governance on a common infrastructure across the enterprise; and freeing up finite IT resources to focus on other strategic initiatives.
  2. Big Data Moves to the Cloud
    As the year takes shape, expect more enterprises to migrate storage and analysis of their big data from traditional on-premise data stores and warehouses to the cloud. For the better part of the last decade, Hadoop’s distributed computing and processing power has made it the standard open source platform for big data infrastructures. But Hadoop is far from perfect. Common user gripes include complexity and instability – not all that surprising given all the software developers regularly contributing their improvements to the platform. Cloud environments are more stable, flexible, elastic and better-suited to handling big data, hence the predicted migration.
  3. Spark Usage Outside of Hadoop Will Surge
    This is the year we will also see more Spark use cases outside of Hadoop environments. While Hadoop limps along, Spark is picking up the pace. Hadoop is still more likely to be used in testing rather than production environments. But users are finding Spark to be more flexible, adaptable and better suited for certain workloads – machine learning and real-time streaming analytics, as examples. Once relegated to Hadoop sidekick, Spark will break free and stand on its own two feet this year. I’m not alone in asking the question: Hadoop needs Spark but does Spark need Hadoop?
  4. A Big Fish Acquires a Hadoop Distro Vendor?
    Hadoop distribution vendors like Cloudera and Hortonworks paved the way with promising technology and game-changing innovation. But this past year saw growing frustration among customers lamenting increased complexity, instability and, ultimately, too many failed projects that never left the labs. As Hadoop distro vendors work through some growing pains (not to mention limited funds), could it be that a bigger, deeper-pocketed established player – say Teradata, Oracle, Microsoft or IBM – might swoop in to buy their sought after technology and marry it with a more mature organization? I’m not counting it out.
  5. AI and ML Get a Bit More Mainstream
    Off the shelf AI (artificial intelligence) and ML (machine learning) platforms are loved for their simplicity, low barrier to entry and low cost. In 2017, off the shelf AI and ML libraries from Microsoft, Google, Amazon and other vendors will be embedded in enterprise solutions, including mobile varieties. Tasks that have until now been manual and time-consuming will become automated and accelerated, extending into the world of data integration.

  6. Yes, IoT is Coming, Just Not This Year
    Connecting billions and billions of sensor-embedded devices and objects over the internet is inevitable, but don’t yet swallow all the hype. Yes, a lot is being done to harness IoT for specific aims, but the pace toward a general-purpose IoT platform is closer to a canter than a gallop. IoT solutions today are still so bespoke and purpose-built – the market nascent, with standards gradually evolving – that a general-purpose, mass-adopted IoT platform to collect, integrate and report on data in real time will take, well, more time. Like any other transformational movement in the history of enterprise technology, brilliant bits and pieces need to come together as a whole. It’s coming, just not in 2017.

  7. APIs Are Not All They’re Cracked Up to Be
    APIs have long been the glue connecting apps and services, but customers will continue to question their value vs investment in 2017. Few would dispute that APIs are useful in building apps and, in many cases, may be the right choice in this regard. But in situations where the integration of apps and/or data is needed and sought, there are better ways. Case in point is iPaaS (integration platform as a service), which allows you to quickly and easily connect any combination of cloud and on-premise technologies. Expect greater migration this year toward cloud-based enterprise integration platforms – compared to APIs, iPaaS solutions are more agile, better equipped to handle the vagaries of data, more adaptable to changes, easier to maintain and far more productive.

I could go on and on, if for no other reason than that predictions are informed “best guesses” about the future. If I’m wrong on two or three of my expectations, my peers will forgive me. In the rapidly changing world of technology, batting .400 is a pretty good average.

Future Data Movement Trends with SnapLogic

Data volumes are increasing exponentially, and many organizations are starting to realize the complexity of their growing data movement and data management solutions. Data exists in various systems, and getting meaningful value out of it has become a major challenge for many companies. Most of this data is stored in relational systems like MySQL, PostgreSQL and Oracle, the mainstream databases primarily used for OLTP purposes. NoSQL systems like Cassandra, MongoDB and DynamoDB have also emerged, with tunable consistency models, to store some of this mission-critical data. Customers then typically move this data to much bigger systems like Teradata and Hadoop (OLAP) that can store large amounts of data, so they can run analytics, reporting or complex queries against it. There is also a recent trend of moving some of this data to the cloud, especially to Amazon Redshift or Snowflake, and also to HDInsight or Azure SQL Data Warehouse.

Continue reading “Future Data Movement Trends with SnapLogic”