By Bill Creekbaum
Whether you’re an analyst, data scientist, CxO, or just a “plain ol’ business user,” having access to more data represents an opportunity to make better business decisions, identify new and innovative opportunities, respond to hard-to-identify threats … the opportunities abound.
More data – from IoT, machine logs, streaming social media, cloud-native applications, and more – is coming at you with diverse structures and in massive volumes at high velocity. Traditional analytic and integration platforms were never designed to handle these types of workloads.
This kind of data is usually labeled big data, and it tends to be accessible only to a small audience with a great deal of technical skill and experience (e.g., data scientists), which limits the business utility of having more data. The result is a big data insights gap that keeps a much broader population of business users and analysts from benefiting from big data. Our industry’s goal should be to help business users and analysts operationalize insights from big data. In fact, Forbes has declared that 2017 is the year that big data goes mainstream.
Closing the big data insights gap requires two elements:
- A scalable data platform: Handles big data and remains compatible with “traditional” analytic platforms
- An integration platform: Acquires large volumes of high-velocity diverse data without IT dependency
To address the first element, Amazon has released Amazon Redshift Spectrum as part of its growing family of AWS big data services. Redshift Spectrum is optimized for massive data storage (e.g., petabytes and exabytes) on S3 and delivers the scalable performance of Amazon Redshift. With it, AWS is making the above scenarios possible from an operational, accessibility, and economic perspective:
- Operational: Amazon Redshift Spectrum allows for interaction with data volumes and diversity not possible with traditional OLAP technology.
- Accessibility: The SQL interface allows business users and analysts to use traditional analytic tools and skills to leverage these extreme data sets.
- Economic: Amazon Redshift Spectrum shifts the majority of big data costs to the S3 service, which is far more economical than storing the entire data set in Redshift.
Clearly, Amazon has delivered a platform that can democratize the delivery of extremely large volumes of diverse business data to business users and analysts, allowing them to use the tools they currently employ, such as Tableau, PowerBI, QuickSight, Looker, and other SQL-enabled applications.
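To make the SQL interface concrete, here is a minimal sketch of the statements that expose S3 data to Redshift Spectrum, built as plain strings in Python. The schema name, catalog database, IAM role ARN, bucket path, table, and columns are all hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS or SnapLogic library. The sketch only composes the SQL; it assumes you would run the statements through whatever SQL client you already use with Redshift.

```python
from typing import List, Tuple


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build the statement that registers an external schema in Redshift."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_db}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_sql(
    schema: str, table: str, columns: List[Tuple[str, str]], s3_location: str
) -> str:
    """Build the statement that maps Parquet files in S3 to an external table."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} ({cols}) "
        f"STORED AS PARQUET LOCATION '{s3_location}';"
    )


# Hypothetical names throughout: replace with your own role, bucket, and columns.
SCHEMA_SQL = external_schema_sql(
    schema="spectrum",
    catalog_db="iot_catalog",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
)

TABLE_SQL = external_table_sql(
    schema="spectrum",
    table="sensor_readings",
    columns=[
        ("device_id", "VARCHAR(64)"),
        ("reading_time", "TIMESTAMP"),
        ("temperature", "DOUBLE PRECISION"),
    ],
    s3_location="s3://example-bucket/sensor-readings/",
)

# Once the table exists, analysts query it with ordinary SQL; the data stays in S3.
QUERY_SQL = (
    "SELECT device_id, AVG(temperature) AS avg_temp "
    "FROM spectrum.sensor_readings "
    "GROUP BY device_id;"
)

if __name__ == "__main__":
    for statement in (SCHEMA_SQL, TABLE_SQL, QUERY_SQL):
        print(statement)
```

The point of the sketch is that the final query is the same kind of SQL a BI tool already generates, so nothing changes for the analyst except the size of the data behind the table.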
However, unless the large volumes of high-velocity, diverse data can be captured, loaded to S3, and made available via Redshift Spectrum, none of the above benefits will be realized and the big data insights gap will remain.
Two key challenges stand in the way of acquiring and integrating large volumes of high-velocity, diverse data:
- On-prem in a Cloud-Native World: Many integration platforms were designed long ago to operate on-premises and to load data to an OLAP environment in batches. While some have been updated to operate in the cloud, many will fail with streaming workloads and collapse under the high volume of diverse data required today.
- Integration is an “IT Task”: Typical integration platforms are intended to be used by IT organizations or systems integrators. Not only does this severely limit who can perform the integration work, it will also likely push the integration into a long project queue, delaying answers to critical business questions.
To address the second element in closing the big data insights gap, business users and analysts themselves must be able to capture the “big data” so that business questions can be answered in a timely manner. If it takes a long and complex IT project to capture the data, the business opportunity may be lost.
To close the big data insights gap for business users and analysts, the integration platform must:
- Handle large volumes of high velocity and diverse data
- Focus on integration flow development (not complex code development)
- Comply with IT standards and infrastructure
With the above approach to integration, the practical benefit is that those asking the business questions and seeking insights from more data can leverage the powerful capabilities of Amazon Redshift Spectrum and respond to business opportunities while it still matters.
Amazon’s Redshift Spectrum and the SnapLogic Enterprise Integration Cloud represent a powerful combination to close the big data insights gap for business users and analysts. In upcoming blog posts, we’ll look at actual use cases and learn how to turn these concepts into reality.
Interested in how SnapLogic empowers cloud warehouse users with up to a 10x improvement in the speed and ease of data integration for Redshift deployments? Check out the white paper, “Igniting discovery: How built-for-the-cloud data integration kicks Amazon Redshift into high gear.”
Bill Creekbaum is Senior Director, Product Management at SnapLogic. Follow him on Twitter @wcreekba.