“As the workload increases, the application master requests the YARN ResourceManager to spin up more Hadooplex nodes one at a time as shown in the diagram below. This scale out occurs dynamically until either the workload starts decreasing or a maximum number of Hadooplex nodes allowed has been met.
As the workload decreases, the nodes start spinning down. This is how SnapLogic achieves elastic scaling based on the workload volumes within a Hadoop cluster utilizing the YARN ResourceManager. This is possible only if an application is a native YARN application.”
I wanted to take this further by showing what this looks like in a SnapLogic Elastic Integration Platform demonstration. In this demo, you can see how the Hadooplex, which is the run-time execution engine, elastically scales depending on the workload.
You can read more about SnapLogic big data processing platforms in this paper and check out more SnapLogic demonstrations here. Be sure to also check out our upcoming webinar with Mark Madsen, which will focus on the new reference architecture for the enterprise data lake.
“Without focusing on the data architecture of the lake, you will build a swamp with nice code.”
— Mark Madsen, President and Analyst, Third Nature
Join us for a live webinar on Tuesday, December 8th to hear from industry analyst and big data thought leader Mark Madsen about the future of big data and the importance of the new Enterprise Data Lake reference architecture. Mark will be joined by Craig Stewart, SnapLogic’s Senior Director of Product Management, and Erin Curtis, SnapLogic’s Senior Director of Product Marketing.
As Mark says, “Building a data lake requires thinking about the capabilities needed in the system. This is a bigger problem than just installing and using open source projects on top of Hadoop. Just as data integration is the foundation of the data warehouse, an end-to-end data processing capability is the core of the data lake. The new environment needs a new workhorse.”
This webinar will cover:
What’s important when building a modern, multi-use data infrastructure and why the field of dreams approach doesn’t work
The difference between a Hadoop application and data lake infrastructure
An enterprise data lake reference architecture to get you started
Craig and Erin will also discuss how SnapLogic’s Elastic Integration Platform powers the new enterprise data lake reference architecture and some of the benefits of a modern data integration solution.
Who should join:
Data warehouse managers
Chief data officers
Business intelligence practitioners
Data integration managers
Anyone building or considering the new hybrid data architecture
Register here - we look forward to seeing you online on Tuesday, December 8th!
About Mark Madsen
Mark Madsen, president and founder of Third Nature, is a well-known consultant and industry analyst. Mark frequently speaks at conferences and seminars in the US and Europe and writes for a number of leading industry publications. Mark is a former CTO and CIO with experience working for both IT and vendors, including a stint at a company used as a Harvard Business School case study. Over the past decade Mark has received awards for his work in analytics, business intelligence and data strategy from the American Productivity and Quality Center, the Smithsonian Institute and TDWI. He is co-author of several books and lectures and writes frequently on analytics and data topics.
Adding on to our Fall 2015 release, tonight our library of 350+ pre-built intelligent connectors, called Snaps, is being updated with our November 2015 release. Updates to the SnapLogic Elastic Integration Platform are quarterly and Snap updates are monthly. Some of the updates in this Snap release include:
Select Snap and Update Snap added to our core JDBC Snap Pack, allowing you to fetch data and update tables respectively.
SnapLogic is currently on the hunt for engineers, marketers, QA specialists and more. We’re building the team in our San Mateo, CA headquarters, continuing to expand our Boulder, CO office, and seeking out talent all around the country for both in-office and remote positions. Are you currently in Boston or New York? We have offices there too, and we are expanding internationally in 2016. Some of the teams that are currently hiring include:
And in engineering we are hiring for many important roles, including:
As more and more enterprise organizations recognize the importance of data and application integration to cloud and big data success, we have grown significantly in the past year with some important hires and the opening of additional offices. Learn more about the perks and benefits of working at SnapLogic.
SnapLogic is the industry’s first unified data and application integration platform as a service (iPaaS). Our hybrid cloud architecture is powered by 300+ Snaps, which are pre-built integration components that simplify and automate complex enterprise integration patterns and processes. Funded by leading venture investors, including Andreessen Horowitz and Ignition Partners, and co-founded by Gaurav Dhillon, the SnapLogic Elastic Integration Platform enables leading enterprises to connect faster and gain a better return on their cloud application and big data investments.
YARN, a major advancement in Hadoop 2.0, is a resource manager that separates out the execution and processing management from the resource management capabilities of MapReduce. Like an operating system on a server, YARN is designed to allow multiple, diverse user applications to run on a multi-tenant platform.
Developers are no longer limited to writing multi-pass MapReduce programs, with disadvantages like high latency, when a better option can be modeled using a directed acyclic graph (DAG) approach.
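To make the DAG idea concrete, here is a minimal sketch in Python. The stage names and the pipeline itself are hypothetical and are not taken from SnapLogic or YARN; the sketch only shows how a job described as a graph of dependent stages can be ordered so that each stage runs after the stages it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical integration job: each key is a stage, each value is the set of
# stages whose output it needs. The names are illustrative assumptions only.
pipeline = {
    "read_orders": set(),
    "read_customers": set(),
    "join": {"read_orders", "read_customers"},
    "aggregate": {"join"},
    "write": {"aggregate"},
}


def execution_order(dag):
    """Return one valid order in which every stage runs after its inputs."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

The two read stages have no dependency on each other, so a scheduler would be free to run them in parallel. That freedom is what a fixed sequence of map and reduce passes does not express directly.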
Any application, including the likes of Spark, can be deployed onto an existing Hadoop cluster, and take advantage of YARN for scheduling and resource allocation. This is also the basic ingredient of a Hadooplex in SnapLogic – to achieve elastic scale out and scale in for integration jobs.
The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor tasks.
SnapLogic’s application master is responsible for negotiating resources with the ResourceManager. The control plane in SnapLogic is the brain (read this post on software defined integration), which holds all critical information and helps make logical decisions for scale out and scale in. The Hadooplex is the actual application itself that runs the workload.
In this diagram you can see that the Hadooplex reports its workload information to the control plane at regular intervals. The application master gets the load information from the control plane, also at regular intervals.
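The reporting flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not SnapLogic's implementation: the ControlPlane class, its method names, and the use of a simple average as the load figure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the control plane: keeps the latest load per node."""

    loads: dict = field(default_factory=dict)

    def report(self, node_id: str, load: float) -> None:
        # Called by each Hadooplex node on its reporting interval;
        # a newer report replaces the node's previous value.
        self.loads[node_id] = load

    def average_load(self) -> float:
        # Polled by the application master on its own interval.
        # Averaging is an assumption; the source does not say how load is summarized.
        if not self.loads:
            return 0.0
        return sum(self.loads.values()) / len(self.loads)


cp = ControlPlane()
cp.report("node-1", 0.5)
cp.report("node-2", 1.0)
avg = cp.average_load()
print(avg)
```

Keeping only the most recent report per node means the application master always sees current load rather than a history, which is enough for a scale-out or scale-in decision.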
As the workload increases, the application master requests the YARN ResourceManager to spin up more Hadooplex nodes one at a time as shown in the diagram below. This scale out occurs dynamically until either the workload starts decreasing or a maximum number of Hadooplex nodes allowed has been met.
As the workload decreases, the nodes start spinning down. This is how SnapLogic achieves elastic scaling based on the workload volumes within a Hadoop cluster utilizing the YARN ResourceManager. This is possible only if an application is a native YARN application. (Read about the importance of YARN-native here.)
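A rough sketch of this scale-out and scale-in behavior, written in Python, might look like the following. The thresholds (0.8 and 0.3), the minimum of one node, the one-node-at-a-time scale-in, and the function names are assumptions made for illustration. The source only states that nodes are added one at a time up to a maximum and spin down as the workload decreases.

```python
def next_node_count(current, load, max_nodes, min_nodes=1, high=0.8, low=0.3):
    """Decide the node count for the next interval.

    `high` and `low` are hypothetical load thresholds, not SnapLogic settings.
    """
    if load > high and current < max_nodes:
        return current + 1  # scale out by a single node, never past the maximum
    if load < low and current > min_nodes:
        return current - 1  # scale in (one node per interval is an assumption)
    return current


def simulate(loads, max_nodes):
    """Apply the decision once per reported load value, starting from one node."""
    nodes = 1
    history = []
    for load in loads:
        nodes = next_node_count(nodes, load, max_nodes)
        history.append(nodes)
    return history


history = simulate([0.9, 0.9, 0.9, 0.9, 0.5, 0.2, 0.2, 0.2, 0.2], max_nodes=3)
print(history)  # [2, 3, 3, 3, 3, 2, 1, 1, 1]
```

In a native YARN application, the step that changes the node count would correspond to the application master asking the ResourceManager for a container or releasing one. That call is deliberately left out here so the sketch stays self-contained.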
Learn more here about how customers are using SnapLogic for big data integration with Hadoop.
The SnapLogic team is charging full steam ahead this month talking big data and what it means for the future of enterprise IT, with some recent and upcoming speaking opportunities led by our fearless leader, Gaurav Dhillon. This week Gaurav spoke on a panel at the Andreessen Horowitz a16z Tech Summit SF, titled “Own Your Data, Don’t Be Owned By It”. The summit examined technologies that are changing the future of business, and Gaurav’s panel, which included leaders from Cazena, Databricks and Mixpanel, specifically addressed big data and whether CIOs and CMOs are using it or just talking about it. Gaurav will also be speaking at the a16z Tech Summit in London next week. He was also at the Variety Big Data Summit on a panel with executives from Microsoft and Google. Here’s a picture of the panel from Twitter.
We’re also sending a team to a few upcoming events including the CDM Media Big Data Summit, an event in Phoenix next week for C-level executives involved in data storage, management and analysis to discuss how companies can effectively manage, protect and leverage the growing amounts of data in the enterprise.
SnapLogic specifically will be sponsoring the Mobile Data Quality Thought Leadership portion of the Summit. Data quality has long been one of the most challenging issues that IT organizations and the enterprises that are home to them have had to deal with. Everyone knows that these data quality issues exist, but the cost and complication of addressing them has pushed them to the back burner. With a focus on best practices, the event will allow attendees to explore strategies and technologies surrounding real-time data processing, data protection and privacy, meeting industry regulations and compliance, and data storage.
And finally, catch us in December for the Gartner Application Architecture, Development & Integration Summit in Las Vegas. We enjoy going to this event every year to talk to attendees about modern integration challenges and how, as mobile, cloud and IoT transform the enterprise, application and data integration strategies must evolve to seize new opportunities. These strategies usually include the need for a hybrid integration platform that is able to handle streaming, event-based application integration and real-time and batch-oriented data integration. We’ll be talking about SnapLogic’s Elastic Integration Platform as a service (iPaaS), of course.
Read more from Gartner here on why businesses should begin converging application and data integration.
In the movie The Shawshank Redemption, Andy Dufresne is wrongly imprisoned and spends the next 27 years tunneling out with the only tool he has available: a simple rock hammer. 27 years! What if Andy had a more modern and powerful tool?
When it comes to today’s enterprise IT organization, most companies are also “Innocent Prisoners” to their legacy integration tools. Leading tools in the traditional ETL and ESB marketplace are now almost 27 years old – built in the 1990’s. Yet companies are still saddled with the “legacy technical debt” of choices made 5, 10 or sometimes even 20 years ago. The good news is that more and more companies are finally realizing that their technology vendors can’t just iterate out old technology by releasing a few new features, or coming out with a watered down “cloudified” version of their software.
In 2015, over 50% of your data is coming from new sources created in the last 3- years as your data and applications are moving to the cloud and new big data sources emerge. How many software-as-a-service (SaaS) applications are you running today? How many will you have in place next year? At SnapLogic alone we have over 30 SaaS applications and we are not a $1B company (not yet).
HR and Marketing departments seem to buy a new SaaS application every month without involving IT. And while C-level executives and industry analyst firms like Gartner may mandate that you move your apps and analytics to the cloud, most IT and line of business organizations continue to be hampered by their existing application and data integration tools.
Your applications and your data are moving to the cloud, so why isn’t your integration platform? I have met few people who debate this fact, but many continue to be paralyzed because they are chained to legacy integration tools that are slow, complicated and were built before SaaS, JSON, REST and Big Data were even considerations.
So you agree, but how do you get started?
Find a small project and get started. With a cloud-to-cloud integration scenario, you don’t even need to install software; it’s all in the cloud. You could be integrating Salesforce and Eloqua or pulling data out of Workday and loading it into Amazon Redshift before you leave the office tonight!
Don’t be afraid to fail fast. I have several customers who can attempt BI projects in a day or two just to see if they will be successful. Most successful teams are willing to try new things.
So the bottom line? Don’t be like Andy. Find a modern project like SaaS-to-SaaS integration and get started today. There are modern integration tools out there to help.
This post originally appeared on LinkedIn. Louis Hausle has worked with Integration tools since 1999, working at companies such as Informatica, IBM, Acta and SnapLogic.