Will the Cloud Save Big Data?

This article was originally published on ITProPortal.

Employees up and down the value chain are eager to dive into big data, hunting for golden nuggets of intelligence to help them make smarter decisions, grow customer relationships and improve business efficiency. To do this, they’ve been faced with a dizzying array of technologies – from open source projects to commercial software products – as they try to wrestle big data to the ground.

Today, a lot of the headlines and momentum focus around some combination of Hadoop, Spark and Redshift – all of which can be springboards for big data work. It’s important to step back, though, and look at where we are in big data’s evolution.

In many ways, big data is in the midst of transition. Hadoop is hitting its pre-teen years, having launched in April 2006 as an official Apache project – and then taking the software world by storm as a framework for distributed storage and processing of data, based on commodity hardware. Apache Spark is now hitting its stride as a “lightning fast” engine for large-scale data processing. And various cloud data warehousing and analytics platforms are emerging, from big names (Amazon Redshift, Microsoft Azure HDInsight and Google BigQuery) to upstart players like Snowflake, Qubole and Confluent.

The challenge is that most big data progress over the past decade has been limited to big companies with big engineering and data science teams. The systems are often complex, immature, hard to manage and change frequently – which might be fine if you’re in Silicon Valley, but doesn’t play well in the rest of the world. What if you’re a consumer goods company like Clorox, or a midsize bank in the Midwest, or a large telco in Australia? Can this be done without deploying 100 Java engineers who know the technology inside and out?

At the end of the day, most companies just want better data and faster answers – they don’t want the technology headaches that come along with it. Fortunately, the “mega trend” of big data is now colliding with another mega trend: cloud computing. While Hadoop and other big data platforms have been maturing slowly, the cloud ecosystem has been maturing more quickly – and the cloud can now help fix a lot of what has hindered big data’s progress.

The problems customers have encountered with on-premises Hadoop are often the same problems that were faced with on-premises legacy systems: there simply aren’t enough of the right people to get everything done. Companies want cutting-edge capabilities, but they don’t want to deal with bugs and broken integrations and rapidly changing versions. Plus, consumption models are changing – we want to consume data, storage and compute on demand. We don’t want to overbuy. We want access to infrastructure when and how we want it, with just as much as we need but no more.

Big Data’s Tipping Point is in the Cloud

In short, the tipping point for big data is about to happen – and it will happen via the cloud. The first wave of “big data via the cloud” was simple: companies like Cloudera put their software on Amazon. But what’s “truly cloud” is not having to manage Hadoop or Spark – moving the complexity back into a hosted infrastructure, so someone else manages it for you. To that end, Amazon, Microsoft and Google now deliver “managed Hadoop” and “managed Spark” – you just worry about the data you have, the questions you have and the answers you want. No need to spin up a cluster, research new products or worry about version management. Just load your data and start processing.

There are three significant and not always obvious benefits to managing big data via the cloud: 1) Predictability – the infrastructure and management burden shifts to cloud providers, and you simply consume services that you can scale up or down as needed; 2) Economics – unlike on-premises Hadoop, where compute and storage were intermingled, the cloud separates compute and storage so you can provision accordingly and benefit from commodity economics; and 3) Innovation – new software, infrastructure and best practices will be deployed continuously by cloud providers, so you can take full advantage without all the upfront time and cost.

Of course, there’s still plenty of hard work to do, but it’s more focused on the data and the business, and not the infrastructure. The great news for mainstream customers (well beyond Silicon Valley) is that another mega-trend is kicking in to revolutionize data integration and data consumption – and that’s the move to self-service. Thanks to new tools and platforms, “self-service integration” is making it fast and easy to create automated data pipelines with no coding, and “self-service analytics” is making it easy for analysts and business users to manipulate data without IT intervention.

All told, these trends are driving a democratization of data that’s very exciting – and will drive significant impact across horizontal functions and vertical industries. Data is thus becoming a more fluid, dynamic and accessible resource for all organizations. IT no longer holds the keys to the kingdom – and developers no longer control the workflow. Just in the nick of time, too, as the volume and velocity of data from digital and social media, mobile tools and edge devices threaten to overwhelm us all. Once the full promise of the Internet of Things, Artificial Intelligence and Machine Learning begins to take hold, the data overflow will be truly inundating.

The only remaining question: What do you want to do with your data?

Ravi Dharnikota is the Chief Enterprise Architect at SnapLogic. 

Podcast: James Markarian and David Linthicum on New Approaches to Cloud Integration

SnapLogic CTO James Markarian recently joined cloud expert David Linthicum as a guest on the Doppler Cloud Podcast. The two discussed the mass movement to the cloud and how this is changing how companies approach both application and data integration.

In this 20-minute podcast, “Data Integration from Different Perspectives,” the pair discuss how to navigate the new realities of hybrid app integration, data and analytics moving to the cloud, user demand for self-service technologies, the emerging impact of AI and ML, and more.

You can listen to the full podcast here, and below:

 

The Internet of Things and Wearable Tech: Our Interconnected Future

The first Bluetooth headset was sold in 2000. Nearly a decade and a half later, 2014 was declared “the year of the wearable” by tech publications and industry enthusiasts. 2014, after all, was when tech fans were first informed about the pending arrival of what is now the most famous wearable in the world – the Apple Watch.

But if 2014 was the year of the wearable, you wouldn’t have known it if you were a guest at that year’s Consumer Electronics Show. The 2014 CES was dominated not by Apple Watch anticipation, but by unbridled excitement over the Internet of Things (IoT).

Now in 2015, it is hard to talk about one without talking about the other. Wearables and IoT are on a collision course, and the merger is already triggering an entirely new technological revolution: the Internet of Me.

Wearables: Following the Path of the Smartphone

The Internet of Things refers to the widespread use of Wi-Fi to animate and connect “dumb” machines and objects, such as toothbrushes, making them “smart.” Once enlightened, these smart devices can communicate not only with each other, but with their human masters.

A smart toothbrush gathers data about your brushing habits and sends it directly to your dentist to analyze before your next visit. IoT, which has been creeping forward for years, is now poised for mainstream saturation before the end of the decade. But the introduction of wearables is speeding up - and altering - the onset of IoT.

Wearables are evolving along a path similar to the one taken by smartphones. Smartphones didn’t truly hit their stride until Apple launched the App Store, which enabled users to integrate their entire digital lives - from their daily planner to Google Drive to their ecommerce landing pages to their iTunes music library - all in one place.

Like pre-App Store smartphones, wearables are just another ecosystem of devices. Wearables can’t revolutionize the way humans interact with technology until they are stitched together with the other crucial components of our digital lives. The arrival of IoT is providing just that stitching.

Ford Cars, Android Wear and Connected Wearables

Ford is leading the charge to integrate IoT with the wearables that people access while they are driving. If a diabetic driver has a medical bracelet or watch, it could relay information about the driver’s blood-glucose level to the car’s on-board multimedia system, which could then relay that information to physicians or family members, if need be. If a baby were sleeping in the back, a wearable could monitor its vitals and relay the information to the vehicle, to the parents’ wearables, or both.

One article, “The Internet of YOU: When Wearable Tech and the Internet of Things Collide,” describes the phenomenon of IoT plus wearables - the Internet of You - as “having the potential to build our technology so that it works for us, not the other way around.” One example is Android Wear, which was built by Google. Google recently purchased Nest, a maker of smart household devices. When Android Wear connects to the Nest thermostat, for example, the thermostat wouldn’t need to be programmed. Instead, Wear could “tell” the thermostat that the wearer is getting too warm or cool, and the thermostat could then adjust the temperature in the room.

The Internet of You combines the personalization of wearables with the ubiquity of the Internet of Things. Like smartphones, wearables unite the scattered elements of the user’s personal and digital life. If wearables existed in a vacuum, they would be another cool novelty gadget - a toy for people with disposable income. But with IoT acting as the glue that bonds wearables to all of the increasingly “smart” devices that surround us in our daily lives, wearables have the potential to rival - or replace - smartphones as the single most important devices we own. Just as IoT will affect the rise of wearables, wearables have the potential to act as the unifying force that bonds the billions of devices that will make up the Internet of Things.

Together, they are the Internet of You.

Nick Rojas is a business consultant and writer who lives in Los Angeles and Chicago. He has consulted for small and medium-sized enterprises for over twenty years. He has contributed articles to Visual.ly, Entrepreneur and TechCrunch. You can follow him on Twitter @NickARojas, or you can reach him at NickAndrewRojas@gmail.com.

November 2014 Release for the SnapLogic Elastic Integration Platform

We are pleased to announce that our November 2014 release went live this past weekend. Some of the updates available in this release include:

Security

Enhanced Account Encryption lets you encrypt account credentials used to access endpoints from SnapLogic using a private key/public key model.


Additionally, initial user passwords are now system-generated and the ability to reset a forgotten password has been added.

 

Patterns

Projects can now be saved to show up on the Patterns tab of the catalog, letting you use any pipelines within them as templates.


SnapReduce

SnapLogic’s support for big data integrations using Hadoop’s framework to process large amounts of data across large clusters has come out of Beta. Support for Kerberos has been introduced as a Beta feature.

 

Lifecycle Management (Beta)

This subscribable feature lets you manage pipeline development by promoting pipelines through phases on their way to production.


UI Enhancements

A few changes have been implemented to make pipeline building easier, including:

  • Unique Snap names when placing multiples of the same Snap on the canvas.
  • Copying and pasting of configured Snaps.
  • Multi-select of Snaps.
  • The ability to turn off the auto-validation that occurs with each save.

 

Pipeline Documentation

With one click, you can download or print a document that describes your pipeline, listing all Snaps, their configurations and the pipeline properties.


New and Enhanced Snaps

This release introduces the Email Delete Snap, SumoLogic Snap Pack (Beta), and SAP HANA Snap Pack (Beta). In addition, enhancements have been made to SAP Execute, Directory Browser, Google Analytics, File Writer, Aggregate, and database writer Snaps.


See the release notes for detailed information.

Fall 2014 Release for the SnapLogic Elastic Integration Platform

We will be updating the SnapLogic Elastic Integration Platform on October 4th, 2014 for the Fall 2014 Release. The update will occur at 10 AM PST and an email will be sent out to customers with details beforehand.

As previewed in this week’s release webinar, below are some highlights of this release. In addition to a recording of the webinar, we have also provided some resources in this post containing additional information about these latest features and enhancements.

Mapper Snap

The Mapper Snap, formerly known as the Data Snap, has been enhanced to support structural mapping, drag and drop mapping, highlighted mapping path, data preview of input and output data and the mapping root option for arrays.


 

In addition, SmartLink has been enhanced to support historical matching beyond the initial string matching of field names.

Pipeline Versions

Pipeline versions let you easily replace an existing pipeline with a newer one or roll back to a previous version.


 

Password Expiration Policy

You can now force user passwords to expire either immediately or at the end of a specified time period.


SnapReduce (Beta)

SnapReduce is SnapLogic’s support for big data integrations using Hadoop’s framework to process large amounts of data across large clusters. With this feature, pipelines can be built to run on Hadoop. Be sure to join the pre-release program here and talk to your SnapLogic account manager for more details.

Additional UI Enhancements

In this release, you can also find performance enhancements, importing and exporting of projects (Beta), and visibility of child pipelines in the Dashboard. See the release notes for more information.

Snaps

Enhancements were made to several of our Snaps in this release, including ServiceNow, JMS, Oracle Stored Procedure, XML Formatter and the Table List Snaps. This post provides a useful overview of SnapLogic Snaps.

Take a look at the resources below for additional details of this release and let us know in the comments section if you have any questions:

What’s Next for the Cloud in 2014?

This time of the year is always a time for technology predictions and insights into what to expect in the year to come. SnapLogic cloud integration expert and Sr. Director of Product Marketing Maneesh Joshi made some great predictions recently that made their way into VMblog. Predictions covered everything from iPaaS capabilities to API management to the citizen developer, with the roles of Social, Mobile, Analytics and Cloud in the enterprise at the forefront of each prediction.


To learn more about what’s new and what’s next for the Cloud, check out our predictions below:

1. iPaaS makes ESBs obsolete
ESBs have had a great 10-year run, during which they were regarded as the platform of choice for creating a service abstraction layer to build loosely coupled integrations across on-premises applications. Although the vision was great, the enterprise service bus is now facing extinction because of its inability to adapt to the new world of cloud and SaaS applications. The major complication was that a big chunk of endpoints (SaaS) are now accessed over the public internet, whose unpredictability and unreliability ESBs were unable to negotiate. For what it’s worth, it wasn’t entirely the ESB’s fault; SOA had a lot to do with it, which leads me to the next prediction.

Watch the webinar: Is ETL Dead in the API Economy

2. API management and iPaaS jointly displace Service Oriented Architecture (SOA) in enterprise IT
API management and integration will join forces to eliminate SOA from the vocabulary of the enterprise IT professional. In my honest opinion, the building principles of SOA are timeless and stand valid even today. However, SOA lost its way because it hitched its wagon to the heavy and unwieldy Simple Object Access Protocol (SOAP). Mobile application developers, who are the primary consumers of services, demand a lightweight and flexible protocol that makes them hyper-productive. This mismatch of expectations caused Representational State Transfer (REST) to gain popularity. The same APIs that expose data and business services to mobile developers will also double as integration APIs. iPaaS will emerge as the de facto choice for integrating and orchestrating across these application and data service APIs.

3. IT dinosaurs face extinction; the citizen developer emerges
In 2014, we will be one year closer to Gartner’s prediction that CMOs will have a larger technology budget than CIOs by 2017. In my conversations with CIOs, it is clear that the more dynamic and forward-looking ones are already taking appropriate steps to ensure that they don’t become extinct in the process. However, it’s the dinosaurs trying to solve new-age problems (social, mobile, cloud, and big data analytics) with last-generation tools that need to wake up. The new generation of business analysts (a.k.a. citizen developers) are quite tech-savvy and have already begun solving problems with agile, new-generation tools without IT’s permission. Not only should the dinosaurs embrace new technologies, but they also need to give analysts easy, self-service access to data and information. This trend will result in a shift of power towards the business users.

Learn more about the user experience of the SnapLogic Integration Cloud.

4. Digital Marketing platforms take over the world
Tech-savvy digital marketers are plotting world domination. 2013 saw leading-edge marketers building digital marketing platforms that essentially combine integration technologies and analytics engines. With such a platform they can easily aggregate and analyze customer data in real time and, more importantly, respond to it with targeted offers. The most effective integration layer in such platforms needs to handle structured and unstructured data, handle bulk and real-time integrations, and be able to orchestrate across multiple applications to make this a reality. 2014 will be the year when such robust platforms go mainstream.

5. The rise of the cloud data warehouse
Amazon Redshift will severely disrupt the EDW market in 2014. 2013 already saw some strong adoption, as customers look for a data warehouse in the cloud with a pay-per-use model. 2014 will see full-scale adoption of cloud data warehousing. Cloud and desktop visualization tools such as Tableau Desktop and Tableau Online will follow suit. ETL technologies that were purpose-built for on-premises data warehousing will become less relevant in this new world. At my own company, SnapLogic, we’re seeing high demand among our customers to load data into and between Redshift and Tableau - and that demand is poised to grow.

To see our deck featuring these predictions, check us out on Slideshare. And let us know in the Comments section below if you have similar predictions, and what you’re looking forward to most in 2014!

Engineering – What We Value at SnapLogic

At SnapLogic we have a high quality bar for the software we create. Our products must be top-notch to compete in the enterprise integration space, so we can’t compromise when it comes to engineering best practices. Below are a few of the things we do to ensure our bar stays high. Nothing here is too revolutionary, just good common sense.

Code Revision System
Like many other companies we’ve adopted git to track changes in our code. To avoid the operational overhead of maintaining our own central repo we use GitHub. The service is fantastic and we can sleep at night knowing our code is safe and secure in the cloud.

Peer Code Reviews
Not a single line of code is checked in without at least two engineers looking at it. This greatly reduces the number of defects introduced into the code and is a natural way for engineers to cross-train on different parts of the code base. As human beings we all make mistakes, but with the help of our colleagues we have a better chance of catching them. For more insight into this aspect of how we operate, SmartBear Software has an excellent whitepaper discussing the merits of good code reviews.

Coding Style Guide
Code artifacts are the main deliverables of our daily activities. Very much like works of art, we want our artifacts not only to be fully functional, but to also be beautiful objects of which we can be proud. Beyond beauty there are other benefits of a consistent code base:

  • Well-written and well-documented code is easier to upgrade and maintain
  • Bugs are easier to find and fix, earlier in the development lifecycle
  • More re-use of existing artifacts

Additionally, a consistent coding style ensures that people will be most able to work in different areas of the products without having to spend an inordinate amount of time having to familiarize themselves with unfamiliar idioms and constructs. We’ve primarily settled on PEP-8 for Python, Oracle’s style guide for Java, and Google’s style guide for JavaScript.

Continuous Integration Environment
Every engineer’s best friend is a jolly fellow named Jenkins who answers the age-old question “will this code break the build?” Jenkins happily checks out your code, builds it, runs tests, pushes artifacts to Artifactory or Nexus repos, and finally deploys it. No more surprises at the end of the release cycle!

Hermetic Builds
Every binary that is compiled in our continuous integration environment is deterministic, meaning we can build a bit-wise identical copy of any artifact from source. This requires a static build-tools chain with versioned OS, compiler (with flags), and dependencies (.jar, .so, .egg, etc). Hermetic builds guarantee we can re-create any existing artifact just from a git commit hash. Gone are the days when the binary running in production was built on some random engineer’s laptop.
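A quick way to sanity-check hermeticity is to build the same commit twice and compare digests of the resulting artifacts. Here’s a minimal Python sketch (the artifact file names are hypothetical, and real pipelines would wire this into CI):

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def builds_match(artifact_a, artifact_b):
    """Hermetic builds of the same commit should be bit-wise identical."""
    return sha256_of(artifact_a) == sha256_of(artifact_b)
```

If two builds of the same commit hash differ, something non-deterministic (a timestamp, an unpinned dependency) has crept into the build.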

Automated Testing
Unit, smoke, regression, performance, lemon, negative, positive, stress, load, penetration, and error tests. What do they all have in common? A machine can do them! Manual testing of software is time-consuming and inconsistent. Save your QA team’s precious time by having them test those tricky, non-deterministic or UX cases that just can’t be handled by a machine. Don’t make your QA just press buttons. They’ll thank you for it and your users will get a better product.

Code Coverage Metrics
Peter Drucker, an often-quoted management consultant, famously said “If you can’t measure it, you can’t improve it.” This quote aptly applies to code coverage; if you don’t measure how good your unit tests are, you can’t have any real confidence in the quality of your code. There are plenty of fantastic tools out there that report coverage - Eclipse and IntelliJ both have coverage baked right into the IDE. This helps engineers write comprehensive tests up-front, and tools like Sonar can tell you if those tests add value.
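For intuition about what these tools measure, line coverage can be sketched in a few lines of stdlib Python with sys.settrace. This is only an illustration - real projects should use a dedicated tool like coverage.py or the IDE reporting mentioned above:

```python
import dis
import sys

def line_coverage(func, *args):
    """Run func with args and return the fraction of its lines that executed."""
    code = func.__code__
    # Every source line that has bytecode associated with it.
    all_lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    executed = set()

    def tracer(frame, event, arg):
        # Record line events only for the function under measurement.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return len(executed & all_lines) / len(all_lines)

def classify(x):
    # A branchy function: a single call can never cover both return lines.
    if x > 0:
        return "positive"
    return "non-positive"
```

A single call such as line_coverage(classify, 1) exercises only one branch, so it reports partial coverage - exactly the gap a coverage report exposes.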

Repeatable Release Process
Each push to production should be audited, repeatable, and easy. We currently use a one-button push via Jenkins that reliably pushes new binaries to our production machines via puppet. It then systematically restarts each service and sends the push event to Graphite. This process takes one mouse click and any of our release engineers can do it. No manual steps, no cut-n-paste deploy commands, and no bespoke push scripts. This makes rollbacks simple too!

“Leave this world a little better than you found it.”
Last but not least, we operate under the philosophy made famous by Lord Baden-Powell, founder of the Boy Scouts. Whatever code we touch, we strive to leave in better shape than we found it.

If you too value the above, get in touch. We’re always looking for like-minded engineers to join the team! You can get in touch at SnapLogic.com/jobs.