The future of big data processing lies in the adoption of commercial Hadoop distributions and their supported deployments. The macro use case for big data is the data lake: a massive store of structured and unstructured data that does not carry the same restrictions as a traditional data warehouse. Data lakes store everything, in every format, at any volume and scope, for use by enterprise data users for any purpose.
Despite the power and potential of data lakes, many enterprises continue to approach this technology with the same data integration approaches and mechanisms they’ve used in the past, none of which work well.
Read this David Linthicum white paper to learn why data lakes require data integration solutions that can handle both structured and unstructured data and that meet several additional requirements:
- The need for schema-less data storage
- The ability to process streams of data in real time (illustrated in the sketch after this list)
- An entirely different approach to data integration, one built on newer integration technology
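
As a minimal illustration of the first two requirements, the sketch below (Python, standard library only; the landing-zone path `datalake/landing/events` and the event shapes are hypothetical, not from the white paper) appends heterogeneous records from a simulated stream to a date-partitioned JSON Lines file. No schema is enforced at write time; structure is interpreted only when the data is read, the schema-on-read pattern that distinguishes data lakes from traditional warehouses.

```python
import json
import time
from datetime import datetime, timezone
from pathlib import Path

LAKE_ROOT = Path("datalake/landing/events")  # hypothetical landing-zone path


def ingest(record: dict) -> Path:
    """Append a record to a date-partitioned JSON Lines file without
    enforcing any schema (schema-on-read: structure is applied when
    the data is queried, not when it is stored)."""
    partition = LAKE_ROOT / datetime.now(timezone.utc).strftime("%Y/%m/%d")
    partition.mkdir(parents=True, exist_ok=True)
    target = partition / "events.jsonl"
    with target.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return target


def simulated_stream():
    """Stand-in for a real-time feed; note each event has a different shape."""
    yield {"type": "click", "user": 42, "page": "/home"}
    yield {"type": "sensor", "device": "t-17", "celsius": 21.4}
    yield {"type": "log", "level": "WARN", "msg": "disk 80% full"}


if __name__ == "__main__":
    for event in simulated_stream():
        path = ingest(event)
        print(f"stored {event['type']!r} event in {path}")
        time.sleep(0.1)  # pacing for the demo; a real consumer runs continuously
```

The date-based partitioning here is one common lake layout convention; production pipelines would typically land the same schema-less records via a streaming framework rather than a local loop.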