Why the cloud can save big data
This article originally appeared on computable.nl.
Many companies want to use big data to make better decisions, strengthen customer relationships, and work more efficiently. They face a dizzying array of technologies, from open source projects to commercial software, that promise a better grip on large amounts of data. Technologies such as Hadoop, Spark, and Redshift, for example, can serve as a foundation for working with big data.
Ultimately, most companies simply want better data and faster answers, not the hassle that comes with applying different technologies. While Hadoop and other big data platforms have matured slowly, the cloud has advanced much faster. As a result, the cloud can now solve many of the problems that previously held big data back.
In recent years, the promise of big data has mainly been fulfilled by large companies with extensive engineering and data science departments. The systems they used were complex, difficult to manage, and constantly changing. That is workable for large enterprises in Silicon Valley, but the average Dutch company cannot afford such systems. An average company wants the best data in the right place as quickly as possible, without having to hire dozens of Java engineers who know the technology from A to Z.
The problems customers run into with on-premises Hadoop are often the same ones they experienced with local legacy systems: there are simply not enough qualified staff to get everything done. Companies want advanced capabilities, but they do not want to deal with bugs, failed integrations, and version upgrades. Moreover, consumption models are changing: we want to consume, store, and process data at all times. We do not want to pay for excess capacity, yet we want access to the infrastructure at any time and in any way, and we always want a little more than we need.
In short, big data can only be used to its full potential in the cloud. The first wave of “big data via the cloud” was simple: companies like Cloudera put their software on Amazon. But “real cloud” means that companies do not have to manage Hadoop or Spark themselves. Instead, they move the complexity to a hosted infrastructure, where someone else takes care of the management. To this end, Amazon, Microsoft, and Google now offer “managed Hadoop” and “managed Spark.” Companies only need to think about the data they have, the questions they want to ask, and the answers they are looking for. There is no need to run a cluster, research new products, or worry about version management. It is a matter of loading data and starting to process it.
Reasons to manage big data in the cloud
There are three important – perhaps not always obvious – reasons for managing big data in the cloud:
- Predictability: Responsibility for the infrastructure and its management lies with the cloud provider. As a result, companies can scale as they see fit and as needed, without being confronted with (financial) surprises.
- Cost efficiency: Unlike on-premises Hadoop, where computing power and storage are tied together and must be scaled together, the cloud separates them. Companies can provision each independently as needed and benefit from lower costs.
- Innovation: Cloud providers continuously roll out the latest software, infrastructure, and best practices. As a result, companies can make optimum use of the benefits of the cloud without investing their own time and money in keeping up.
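The cost-efficiency point can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names, node sizes, and prices are all invented for the example and do not come from any real provider. It compares a cluster in which every node bundles a fixed amount of storage and compute with a setup in which the two are bought separately at the same unit prices.

```python
import math

# All figures below are hypothetical and chosen only to illustrate the idea.

def coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Nodes needed when each node bundles fixed storage and compute.

    The cluster must be sized for whichever resource demands more nodes.
    """
    nodes_for_storage = math.ceil(storage_tb / tb_per_node)
    nodes_for_compute = math.ceil(cores / cores_per_node)
    return max(nodes_for_storage, nodes_for_compute)

def coupled_cost(storage_tb, cores, tb_per_node, cores_per_node, price_per_node):
    """Monthly cost when storage and compute can only be bought together."""
    nodes = coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node)
    return nodes * price_per_node

def decoupled_cost(storage_tb, cores, price_per_tb, price_per_core):
    """Monthly cost when storage and compute are billed independently."""
    return storage_tb * price_per_tb + cores * price_per_core

if __name__ == "__main__":
    # A storage-heavy workload: lots of data, modest processing needs.
    storage_tb, cores = 500, 64
    # Assumed node: 10 TB + 16 cores for 400 per month,
    # i.e. 20 per TB and 12.5 per core at the same unit prices.
    print(coupled_nodes(storage_tb, cores, 10, 16))        # 50 nodes
    print(coupled_cost(storage_tb, cores, 10, 16, 400))    # 20000
    print(decoupled_cost(storage_tb, cores, 20, 12.5))     # 10800.0
```

With these invented numbers, the coupled cluster needs 50 nodes to hold the data, even though 4 nodes would cover the compute, so most of the processing power sits idle but is still paid for. How large the difference is in practice depends entirely on the workload and on actual pricing.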
Of course there is still a lot of work to be done, but that work focuses on data and operations rather than on infrastructure. The good news for businesses is that there is a “new” trend in data integration and use: the transition to self-service. Thanks to new tools and platforms, “self-service integration” makes it possible to quickly and easily build automated data pipelines without writing code. “Self-service analytics” makes it easier for analysts and business users to work with data without the intervention of IT.
All in all, these trends are driving the democratization of data, and that is promising. It will have a significant impact on horizontal functions and vertical industries alike. Data becomes a more fluid, dynamic, and accessible resource for all organizations. IT no longer holds the keys to the kingdom, and developers no longer dictate the workflow. That comes just in time, because the volume and speed of data from digital and social media, mobile tools, and edge devices threaten to overwhelm us. Once the promise of the Internet of Things, AI, and machine learning is truly fulfilled, the amounts of data will only grow larger.