Data Integration – Definition & Overview

What is data integration?

Data integration is a foundational part of data science and analysis. The volume of data spread across sources can be overwhelming, leaving too much to sort through to make timely, effective business decisions. Data integration sorts through large structured and unstructured data sets, selects the relevant ones, and structures the data to provide targeted insights. It can combine data from big data platforms, IoT devices, applications, and a variety of locations. In that way, data integration cuts down on the chaos and noise of too much unsorted data and allows data to be cross-referenced. Merged data can provide a unified picture, such as a single view of operations, or be organized to make it more usable. The sources can be any number of entirely different systems and apps, or specific file formats such as spreadsheets.

Data integration cleans and sorts information for a variety of ends, including design and development, data merging, data migration and replication, data warehousing, data cleansing, data modeling, and third-party interfaces. Data integration also allows for the governance of that data across its sources. Datasets extracted from a larger data warehouse can go into specific, unified reports. For example, combining marketing and sales datasets can provide a more comprehensive report than either dataset offers alone.
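
As a rough illustration of the marketing and sales example, the sketch below joins two small in-memory datasets on a shared key and produces one unified report. The field names (campaign_id, spend, revenue), the sample values, and the integrate function are all hypothetical and chosen only for illustration; a real integration would read from the actual source systems and would likely use a dedicated tool or library.

```python
from collections import defaultdict

# Hypothetical sample records standing in for two separate source systems.
marketing = [
    {"campaign_id": "C1", "channel": "email", "spend": 500.0},
    {"campaign_id": "C2", "channel": "search", "spend": 1200.0},
    {"campaign_id": "C3", "channel": "social", "spend": 300.0},
]
sales = [
    {"order_id": 1, "campaign_id": "C1", "revenue": 900.0},
    {"order_id": 2, "campaign_id": "C1", "revenue": 450.0},
    {"order_id": 3, "campaign_id": "C2", "revenue": 1000.0},
]


def integrate(marketing_rows, sales_rows):
    """Left-join campaigns to total sales revenue on campaign_id."""
    # Aggregate sales so each campaign has a single revenue figure.
    revenue_by_campaign = defaultdict(float)
    for row in sales_rows:
        revenue_by_campaign[row["campaign_id"]] += row["revenue"]

    # Keep every campaign, even those with no recorded sales.
    report = []
    for row in marketing_rows:
        revenue = revenue_by_campaign.get(row["campaign_id"], 0.0)
        report.append({
            "campaign_id": row["campaign_id"],
            "channel": row["channel"],
            "spend": row["spend"],
            "revenue": revenue,
            "net": revenue - row["spend"],
        })
    return report


report = integrate(marketing, sales)
for line in report:
    print(line)
```

Each row of the resulting report carries fields from both sources, so spend and revenue can be compared per campaign. Campaigns without any sales are kept with a revenue of zero rather than dropped, which is one possible design choice among several.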

Methods of data integration

  • Batch integration
  • Real-time integration
  • Cloud-based integration