iPaaS: The Cooler, Hipper ETL and ESB at GameStop

SnapLogic customer GameStop was featured in a great write-up on the integration platform as a service (iPaaS) market today in NetworkWorld: iPaaS: What this cloud technology is and why it’s important. The article highlights how, “iPaaS vendor SnapLogic shows how many cloud-based apps, from Salesforce or Workday, Hadoop and IoT can be integrated into an iPaaS and then streamed into on-premises systems.”

Mark Patton, vice president of architecture at GameStop, a global retailer that operates more than 4,000 stores in the U.S. and another 2,000 abroad, commented in the article that the company needed to be able to rapidly integrate many new and legacy cloud and on-premises applications for real estate management, HR, ERP systems, financial platforms and more: “We needed a way to glue these things together. You need to be able to get data into and out of apps quickly.”

When asked why not use the data extraction, transformation and loading (ETL) tools that come with SaaS application providers like Salesforce, the article noted that, “he’d rather develop skills in a platform that can integrate myriad sources together, instead of being beholden to whatever integrations Salesforce develops for its SaaS.”

Mark Patton’s comment: “My view is the cloud environment will remain extremely fragmented, so there will always be a need for someone to play Switzerland, for some platform to integrate across these vendors.”

The NetworkWorld article also quoted Gartner industry analyst Massimo Pezzini, who noted that the iPaaS market will grow at a 55% clip to reach a $1 billion market within three years, and included last June’s IDC estimate that the worldwide data integration and access market (which includes ETL and iPaaS) will reach $5 billion, growing 8% annually.

Here’s a video of GameStop’s Mark Patton discussing his deployment of the SnapLogic Elastic Integration Platform, in which he points out: “We’ve been able to decrease the time it takes to implement a well-defined integration by 83%.”

Streaming Data and Data Lakes at #StrataHadoop World

SnapLogic’s big data expert and Head of Enterprise Architecture, Ravi Dharnikota, was featured on Information Management recounting his observations at last month’s Strata+Hadoop World in San Jose. The main takeaway was that the attendees and sessions were primarily focused on streaming data, data lakes, and Apache Spark for analytics. He noted: “While the continuous innovation and change in the big data industry provides fast, frequent improvements to the technology, it is tough to keep up with in an organization where there are competing priorities and projects.”

You can read the full Q&A below. Continue reading “Streaming Data and Data Lakes at #StrataHadoop World”

SnapLogic Named to CRN Big Data 100

For the second year in a row, SnapLogic has been named to the Big Data 100 list by CRN, which highlights vendors that have demonstrated an ability to innovate in bringing to market products and services that help businesses work with big data.

According to Robert Faletra, CEO of The Channel Company, “Big data is becoming critical for many businesses. Organizations are faced with managing information streams of unprecedented volume and complexity, and are always in need of more powerful and efficient tools for capturing, storing, organizing, securing and analyzing data to gain business insights.” By being included on the Big Data 100 list, the SnapLogic team gets to further prove its ingenuity and creative problem-solving in helping customers keep up with the rapidly evolving demands of data management. As our VP of marketing, Darren Cunningham, said, “SnapLogic’s goal is to ensure that integration is an on-ramp to big data and cloud adoption in the enterprise, not a roadblock.” You can read the full press release here. Also be sure to take a look at 30 of the “Coolest Data Management Vendors” from the Big Data 100 list.

If your organization is currently thinking about the exploding volume, speed and variety of information you’re dealing with on a daily basis, learn more about integration for big data, and register for our upcoming SnapLogic Live session featuring “Big Data in Motion” on Thursday, May 12th.

Eight Data Management Requirements for the Enterprise Data Lake

This article originally appeared as a slide show on ITBusinessEdge: Data Lakes – 8 Data Management Requirements.

2016 is the year of the data lake. It will surround, and in some cases drown, the data warehouse, and we’ll see significant technology innovations, methodologies and reference architectures that turn the promise of broader data access and big data insights into a reality. But big data solutions must mature and go beyond their current role as primarily developer tools for highly skilled programmers. The enterprise data lake will allow organizations to track, manage and leverage data they’ve never had access to in the past. New data management strategies are already leading to more predictive and prescriptive analytics that drive improved customer service experiences, cost savings and an overall competitive advantage when there is the right alignment with key business initiatives.

So whether your enterprise data warehouse is on life support or moving into maintenance mode, it will most likely continue to do what it’s good at for the time being: operational and historical reporting and analysis (aka rear view mirror). As you consider adopting an enterprise data lake strategy to manage more dynamic, poly-structured data, your data integration strategy must also evolve to handle the new requirements. Thinking that you can simply hire more developers to write code or rely on your legacy rows-and-columns-centric tools is a recipe to sink in a data swamp instead of swimming in a data lake. Here are eight enterprise data management requirements that must be addressed in order to get maximum value from your big data technology investments.

1) Storage and Data Formats

Traditional data warehousing focused on relational databases as the primary data and storage format. A key concept of the data lake is the ability to reliably store a large amount of data. Such data volumes are typically much larger than what can be handled in traditional relational databases, or much larger than what can be handled in a cost-effective manner. To this end, the underlying data storage must be scalable and reliable. The Hadoop Distributed File System (HDFS) has matured and is now the leading data storage technology that enables the reliable persistence of large-scale data. However, other storage technologies can also provide the data store backend for the data lake. Open source systems such as Cassandra, HBase, and MongoDB can provide reliable storage for the data lake. Alternatively, cloud-based storage services can also be used as a data store backend. Such services include Amazon S3, Google Cloud Storage, and the Microsoft Azure Blob Store.

Unlike relational databases, big data storage does not usually dictate a data storage format. That is, big data storage supports arbitrary data formats that are understood by the applications that use the data. For example, data may be stored in CSV, RCFile, ORC, or Parquet to name a few. In addition, various compression techniques — such as GZip, LZO, and Snappy — can be applied to data files to improve space and network bandwidth utilization. This makes data lake storage much more flexible. Multiple formats and compression techniques can be used in the same data lake to best support specific data and query requirements. Continue reading “Eight Data Management Requirements for the Enterprise Data Lake”
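As a toy illustration of this flexibility, the same records can be written side by side in two different formats under the same compression codec using nothing beyond Python’s standard library. (A real data lake would typically use columnar formats such as Parquet or ORC via libraries like pyarrow; this sketch uses CSV and JSON Lines with gzip only because they require no extra dependencies.)

```python
import csv
import gzip
import io
import json

# The same records, persisted in two formats in the same "lake":
records = [
    {"store_id": 1, "sku": "A100", "units": 3},
    {"store_id": 2, "sku": "B200", "units": 7},
]

# Format 1: gzip-compressed CSV
csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["store_id", "sku", "units"])
writer.writeheader()
writer.writerows(records)
csv_gz = gzip.compress(csv_buf.getvalue().encode("utf-8"))

# Format 2: gzip-compressed JSON Lines
jsonl = "\n".join(json.dumps(r) for r in records)
jsonl_gz = gzip.compress(jsonl.encode("utf-8"))

# The store itself is format-agnostic: any consumer that understands
# the format can decompress and parse either file.
restored = [
    json.loads(line)
    for line in gzip.decompress(jsonl_gz).decode("utf-8").splitlines()
]
print(restored == records)  # True
```

The point is the one made above: the storage layer does not dictate the format, so different datasets in the same lake can pick the format and codec that best fit their query patterns.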

Big Data Management: Doug Henschen Dives into the Data Lake

Yesterday SnapLogic hosted a webinar featuring Doug Henschen from Constellation Research called Democratizing the Data Lake: The State of Big Data Management in the Enterprise. Doug kicked things off by walking through where we were and where we are today with some compelling examples from The Second Machine Age, by Erik Brynjolfsson and Andrew McAfee. When it comes to the power of modern computing, for instance: in 1996 the U.S. ASCI Red supercomputer at Sandia Labs cost $55M, occupied 1,600 square feet and delivered 1.8 teraflops of computing power. In 2006 the Sony PlayStation 3 sold for $499, measured 4 x 12 x 10 inches and could handle the same 1.8 teraflops. Amazing! Doug went on to discuss the impact of distributed computing and how software has evolved (think: Kasparov vs. Big Blue compared to the chess game on your laptop today).

Sure, some of these factoids are often discussed, and there’s no shortage of stats about the impact of big data on every industry as well our everyday lives, but what I really liked about Doug’s message was the importance of focusing on what actually drives business value. Use big data to improve analytical insight, but remember that “big data is only part of the digital disruption trend.”

Doug’s presentation reviewed the Hadoop market today, noting that the fastest growing segment is the shift to the cloud. Hadoop has been accepted as the platform standard with growing adoption in the enterprise, but Spark is definitely the accelerator. On the topic of the data lake, Doug made a number of important points:

  • It doesn’t just consist of all new data types. It’s often data that organizations just couldn’t afford to retain or practically analyze in the past.
  • It’s not a replacement for an enterprise data warehouse – there is still a need for what he calls “industrialized queries against known data.”
  • It’s about integrating new data, with proactive and predictive analytics as a common driver.
  • A cluster can turn into a swamp without a well-ordered infrastructure.

Before diving into the nuts and bolts of the enterprise data lake and reviewing vendors in each category, the conversation focused on specific big data use cases by industry. Specific examples of case studies he’s worked on were shared, from campaign analysis and optimization in digital marketing and advertising, to archiving, to money laundering in financial services, to supply chain optimization in retail, to customer churn initiatives in telecommunications, to claims fraud analysis in insurance.

I encourage you to watch the entire presentation here. Similar to some of the data lake architecture examples and whitepapers we’ve recently shared on the SnapLogic blog, there are a number of solid conclusions about how to think about the data lake relative to your existing data infrastructure. The bottom line? As Doug wrote about on his blog, Hadoop is 10 years old, and as all parents know, it’s important to spend time with your kids, try to mitigate risks and set appropriate limits as they mature. The same is true with your data: know your data, your users and your risks and set the appropriate limits along your maturity curve.

As an industry, we need to democratize Hadoop and simplify efforts to create data lakes. We’re moving to a more cognitive era and data monetization is a hot trend, but the “journey to digital cannot be accomplished without connectedness.” When it comes to your data integration strategy, ensure that it is cloud, services and data enrichment savvy. But think bigger – how will a data lake drive new businesses and data models?

I’d like to thank Doug Henschen and Constellation Research for a great market overview and discussion. There’s a lot more that I didn’t cover in the full presentation, which is available on the SnapLogic website. I’ll leave you with this slide, which summarizes what Hadoop users are saying today:

Constellation Research: Common Data Lake Challenges. What Hadoop users say…

Building an IoT Application in SnapLogic, Part II: Speeding Through the Last Mile

The last post in this ongoing IoT series detailed the creation of a cloud-based Ultra Pipeline to do the bulk of the work for our IoT application. We described the following application:

  • A sensor somewhere (on-premises, from an API, etc.) that produces data that includes a “color” payload;
  • An LED on-premises, attached to our local network, conveniently hooked up to look like a REST endpoint;
  • Two pipelines, one on-premises, one in the cloud.

Our remaining task is to create the on-premises pipeline from the last bullet point. This is a short, simple pipeline with one slight wrinkle – using pipeline parameters.
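To make the pipeline-parameter idea concrete, here is a hedged Python sketch of what the on-premises pipeline conceptually does: read the LED’s address from a parameter rather than hard-coding it, and forward the incoming “color” document to that address as a REST call. Note that this is not SnapLogic’s actual API; the parameter name, endpoint URL and payload shape are all invented for illustration.

```python
import json
import urllib.request

def build_led_request(pipeline_params, document):
    """Turn an incoming document into a POST against the LED endpoint.

    The endpoint URL comes from a pipeline parameter rather than being
    hard-coded, so the same pipeline definition works against any LED
    on the local network.
    """
    url = pipeline_params["led_endpoint"]
    body = json.dumps({"color": document["color"]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example: the cloud pipeline hands us a document with a color payload,
# and the parameter points at a (hypothetical) LED on the LAN.
req = build_led_request(
    {"led_endpoint": "http://192.168.1.50/led"},
    {"color": "green"},
)
print(req.get_full_url())  # http://192.168.1.50/led
```

The design point is the same one the post makes: by externalizing the endpoint as a parameter, the pipeline stays generic and only the deployment configuration changes from site to site.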

Continue reading “Building an IoT Application in SnapLogic, Part II: Speeding Through the Last Mile”

SnapLogic Now in the Microsoft Azure Marketplace

You may have seen our recent announcement about the availability of a SnapLogic Snaplex for Azure in the Microsoft Azure Marketplace. This is the latest development in our growing partnership with Microsoft, and a huge step forward in our support for Microsoft customers adopting a cloud-centric or hybrid cloud infrastructure.

Why is SnapLogic a good fit for Microsoft cloud services? The SnapLogic Elastic Integration Platform architecture is naturally well-suited for consuming and moving large data sets to and from the cloud and for cloud-to-ground data flows, so it’s the ideal data integration solution for Microsoft Azure data stores, HDInsight (Spark and Hadoop clusters in the cloud) and Cortana Intelligence solutions.

SnapLogic and Microsoft Azure

We also support Microsoft customers with Snaps – our pre-built intelligent connectors – for Azure SQL Data Warehouse, SQL Database and Blob Storage, plus on-premises SQL Server, Microsoft Dynamics, Active Directory and more. So we’re able to support Microsoft customers whether they are operating completely on-premises, in the cloud, or a combination of the two.

As noted in the announcement, we are pleased to partner with Microsoft to help democratize business analytics with fast, self-service connectivity between data sources and Microsoft cloud solutions. The combination of SnapLogic, Microsoft Azure and the Cortana Intelligence Suite (formerly Cortana Analytics) delivers seamless ‘analytics-as-a-service’ for the enterprise.

Coming soon: more details about our support for Microsoft HDInsight. Stay tuned…