Data Integration: Beyond the boundaries of the individual enterprise
Gartner has released their 2009 Cool Vendors in Data Management and Integration report, and here at SnapLogic we’re thrilled to be named as one of the innovative vendors.
In the report, Gartner notes the expansion of data integration requirements to "encompass data beyond the boundaries of individual enterprises". One of their recommendations is to:
Investigate data integration technologies that can unify data residing in applications both inside and outside an enterprise’s firewall, in order to address alternative models for adopting business applications (such as SaaS) with flexible, yet cost-effective, approaches to data management.
Data integration has always been a complex problem. There are some good reasons why, including identifying and mapping data, defining the business rules behind transformations, managing data and metadata, and the fundamental nature of a process that crosses organizational and functional boundaries.
Changing requirements have always contributed to integration complexity. However, the current wave of industry changes is more fundamental; not only are we crossing organizational and functional boundaries, we now are crossing the enterprise boundary as well. Once you cross the enterprise boundary, you’re in the realm of cloud computing, with all its opportunities and challenges.
The cloud computing movement has put the core IT landscape itself in flux. These changes to the IT landscape are having a direct impact on data integration in three ways:
Expanded data sources
Data integration was once a purely on-premises function using internal data sources. We now have web data sources and data from SaaS applications to contend with as well, which adds to the complexity.
Data integration has traditionally focused on databases and files, since that is where the bulk of the data lived. Applications were built on databases, so as long as you could ‘get to the database layer through the back door’, integration was possible.
SaaS applications change that assumption. Although they are still built on databases, the database ‘back door’ is no longer an option. Access is through defined APIs (usually based on SOAP Web Services), replacing generic database connectivity with a need for an interface that understands the application and its internal structure.
Web data sources add another variation. Direct database access is usually not an option (and even if it is, the database may not be relational). The interface is typically REST, which still requires some knowledge of the application and its internal structure to use.
The last twist is the sheer number of SaaS and web applications to connect to. As of today, SaaS Showplace lists 950 applications, and Programmable Web lists 1,200 APIs, which is a large set of possible data sources that might need to be integrated.
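To make the contrast concrete, here is a minimal Python sketch of the two access styles. It is an illustration under assumptions, not any vendor's actual API: the function names, the `accounts` resource, the `items` payload key, and the field names are all hypothetical, and `fake_fetch` stands in for a real HTTP call so the example runs without a network.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterator

Record = Dict[str, object]


def read_from_database(conn: sqlite3.Connection, table: str) -> Iterator[Record]:
    """'Back door' access: generic SQL works for any table we can reach.

    The table name is assumed to come from trusted configuration.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def read_from_api(
    fetch: Callable[[str], str],
    resource: str,
    field_map: Dict[str, str],
) -> Iterator[Record]:
    """API access: we must know the resource name, the payload layout,
    and the application's own field names (all hypothetical here)."""
    payload = json.loads(fetch(resource))
    for item in payload["items"]:
        # Translate application-specific field names to our own schema.
        yield {ours: item[theirs] for theirs, ours in field_map.items()}


# On-premises source: an in-memory SQLite database stands in for it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")


def fake_fetch(resource: str) -> str:
    """Stand-in for an HTTP request to a SaaS or web API (assumption)."""
    return json.dumps({"items": [{"AccountId": 2, "AccountName": "Globex"}]})


local_rows = list(read_from_database(conn, "customers"))
remote_rows = list(
    read_from_api(fake_fetch, "accounts", {"AccountId": "id", "AccountName": "name"})
)
```

The database reader needs nothing beyond a connection and a table name, while the API reader only works once someone has supplied the resource name and a field mapping. That per-application knowledge is the extra cost that multiplies as the number of sources grows.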
New consumers for data
As web and mobile technology influences the enterprise, the way data is generated and consumed is also changing.
In data integration, the traditional target for data has been a persistent database, usually a data warehouse or operational data store. That data has then been used for business intelligence, reporting, and analysis.
Although this model will continue, the need to provide data on demand, in the form of data services, is now a requirement. Consuming external data as a service, from free and commercial sources, is already common in the form of syndicated feeds. The concepts behind mashup applications, which combine data from multiple sources, are being used in lightweight and situational applications. These same concepts of multiple data sources and data services are now being applied to internal data as well.
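As a rough sketch of the mashup idea applied to internal data, the Python below combines an internal order list with an external profile feed and returns the result on demand rather than loading it into a warehouse. The data, the field names, and the `customer_summary` function are invented for illustration only.

```python
from typing import Dict, List

# Internal source: order records, e.g. from an on-premises system (sample data).
internal_orders: List[Dict[str, object]] = [
    {"customer_id": 1, "total": 120.0},
    {"customer_id": 1, "total": 80.0},
    {"customer_id": 2, "total": 45.5},
]

# External source: company profiles, e.g. from a syndicated feed (sample data).
external_profiles: Dict[int, Dict[str, str]] = {
    1: {"company": "Acme", "industry": "Manufacturing"},
    2: {"company": "Globex", "industry": "Energy"},
}


def customer_summary(customer_id: int) -> Dict[str, object]:
    """Data service: combine both sources at request time for one customer."""
    orders = [o for o in internal_orders if o["customer_id"] == customer_id]
    profile = external_profiles.get(customer_id, {})
    return {
        "customer_id": customer_id,
        "company": profile.get("company"),
        "order_count": len(orders),
        "total_spend": sum(o["total"] for o in orders),
    }


summary = customer_summary(1)
```

In a real deployment the function would sit behind a service endpoint and the two sources would be live systems, but the shape is the same: the consumer asks for a combined view and never sees where each field came from.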
New deployment models
Data integration has traditionally dealt with on-premises data, with some external reference data mixed in. The effect was an integration model influenced by the assumptions that most data would be local, and integration would be centralized.
The cloud movement challenges these assumptions for integration. The data sources are becoming a mixture of on-premises and remote data, with SaaS and web in the mix. The delivery is no longer to a fixed target; it can now be a mixture of SaaS, local targets, and data service endpoints. It’s likely that a mixture of these will be required in most organizations.
The shifts in data sources and data consumers have a direct impact on the way integration software is deployed. Integration software may need to be local, to enable access to on-premises applications and databases. It may need to be remote, to provide a data services layer for one or more applications without service interfaces. It may need to be hosted, either in a data center or on a cloud platform, to take advantage of less expensive resources. The actual deployment model will be determined by the capabilities needed, not constrained by invalid assumptions.
The data integration landscape is changing, and that change is driven by fundamental shifts in the locations and types of data, and how the data is used. The core functions of data integration haven’t changed much, but the neighborhood we live in sure has.