Data is an essential asset that every enterprise needs to compete effectively in today’s economy. Yet the value of data assets can only be realized when they are used strategically, operationally, consistently, and accurately across the business, which has historically been challenging. Today’s enterprises are turning to data-as-a-service (DaaS) as part of their cloud data strategy to meet the service-level agreements (SLAs), data governance, accuracy, and high availability demanded by customers and business strategy. To fully leverage a DaaS approach, they need a modern data architecture in place.
What is data architecture?
Designed by data architects, data architecture standardizes how enterprises collect, store, transform, distribute, and use data, so that data analysts and others in the organization can make better decisions based on real-time business intelligence. Data architecture is the foundation for data modeling and information architecture, both of which make data usable and useful across the organization.
While data architecture is not new, modern enterprise data architecture (or modern data architecture) is, and it has evolved as businesses increasingly move to the cloud. Only the cloud enables the speed, scalability, and ease of use needed to make modern data architecture effective. As businesses move to cloud-based infrastructures, their data architecture is being transformed as well.
What is modern data architecture?
A modern data architecture focuses on aligning data to the capabilities powered by the cloud. Traditional data architecture was built on on-premises data models that consumed a great deal of time for data processing and data management. With the infrastructure abstracted away by the cloud, modern data architecture focuses on making data as easy to access and as useful to the business and customer as possible. It facilitates ease, speed, collaboration, real-time analysis, and consistency.
A modern data architecture is:
- Built for end users to consume. The cloud enables end users to determine what data they need for business decisions, and data architects to design data access that delivers it.
- Automated with data pipelines and data flows. No one has time to wait for slow data processing. With the cloud and data integration, enterprises can automate the entire data management process so data flows smoothly and freely everywhere it needs to go in the organization, while still maintaining data governance. Data integration is key to making sure that every part of the whole connects.
- Curated by AI/ML. Modern enterprise data architecture harnesses the power of artificial intelligence (AI) and machine learning (ML) to automate data processing, recognize new data types, cleanse data, fix data quality issues, perform data mining, ensure data standards are maintained, and surface data analytics and insights. AI/ML is key to the speed and accuracy behind automation.
- Scalable to meet unpredictable demands. Data is generated and consumed at extraordinary rates, and as businesses deal with fluctuations in consumer demands, they need to be able to scale data up and down, automatically and affordably.
- Shareable for trusted collaboration. Shared data is critical to ensure that everyone works from the same data source of truth. Shared data also helps to break down departmental silos and foster easier, more trusted collaboration.
- Secure by design. For most enterprises, data is their most valuable asset. Modern data architecture takes into account data security with controlled data access and authorization, as well as compliance with data privacy laws and regulations such as GDPR and HIPAA.
If you’re creating a data architecture from scratch in the cloud, then building these characteristics in is easy. But most enterprises are straddling legacy on-premises infrastructure and the cloud, often multiple clouds. Their data resides in multiple places and is typically heavily siloed, so data migrations to the cloud and data integrations are a priority.
In addition to the six characteristics of a modern data architecture, you’ll also want to ensure that your design facilitates the following:
- Supports a move to self-service and multiple types of users (integrators, data scientists, line of business leaders, stakeholders)
- Enables a hyper-connected enterprise (think of data as the nerves connecting every part of the body, seamlessly transmitting information as needed)
- Shifts reporting to predictive and prescriptive analytics for real-time insights, AI-driven recommendations, and in-the-moment decision making
- Future-proofs for new data sources, downstream applications, and use cases
3 stages of the modern data enterprise journey
Because enterprises are digitally transforming and shifting toward the cloud, they typically undergo a phased journey to achieve a modern data architecture.
This can be broken down into three main stages:
Stage 1 — On-Premises
Most enterprises have on-premises systems, with the tools to store and process big data sets and perform complex transformations. This environment is challenging for the following reasons:
- It requires a large capital investment up front to get started and a large investment in operating expenses (OpEx) for the necessary personnel
- It needs a specialized, dedicated skill set to manage the big data tools
- It results in a slow response time, including the lead time for purchasing, shipping, and installing the data environment
Enterprises have operated like this for many decades, and typically have heavy investments in on-premises models. Beyond the financial investment, the risk of losing data or disconnecting customized integrations can be too great for a complete cloud migration. Many enterprises have data they feel needs to remain within the purview of their own servers, and so they take a hybrid cloud approach.
Stage 2 — Cloud: Virtual Private Cloud (VPC)
As they adopt the cloud, the second stage in the journey is “lift and shift,” where enterprises simply move on-premises clusters to a cloud provider, run them in a virtual private cloud network, and take advantage of infrastructure-as-a-service (IaaS) benefits such as lower cost. Forrester reports that organizations deploying in the cloud save 20 to 60 percent over on-premises infrastructure cost, since most overprovision their on-premises servers and storage and then need to manage these environments.
However, this stage still has some major challenges, as it:
- Does nothing to address the challenges of managing and maintaining the environment
- Has high OpEx
- Does not address the skill-set gap, since specialized skills are still required to manage the services running in the VPC
- Has a slow response time
- Does not support native cloud storage services
Managing on-premises systems and private clouds is complex, which often leads enterprises to look for a better way to manage the cloud environment. That search leads them to managed cloud services.
Stage 3 — Cloud: Big Data as a Service
At this stage, enterprises have recognized the challenges and are addressing them by moving to managed cloud services from providers such as IBM, Microsoft, and Google. These managed services free the enterprise from the complexity of managing and maintaining at-scale processing environments, and they lower OpEx spend.
Other advantages include:
- On-demand capabilities that utilize storage and computing resources only when needed, thus reducing OpEx
- A much simpler way to scale up and down to terabyte or petabyte volumes
- Faster response times for business needs
Additionally, cloud-managed big data platforms are designed with cloud storage services. They have native integration with the cloud storage, so you can use the cloud storage as a distributed storage component suitable for data lake storage.
Let’s talk a bit about data storage.
Modern data architecture needs data lakes
A data warehouse stores structured data (e.g., from transactional systems). It’s optimized to analyze relational data, not semi-structured or unstructured data. So, before data is written from the source to the data warehouse, its structure needs to be defined, and the data needs to be cleaned and transformed. This takes time and makes it more difficult to get usable data at the speed an enterprise needs. Also, with so much new data available, the cost of data warehousing can be prohibitive.
Data lakes support modern data architecture.
Unlike a data warehouse, a data lake is a collection of all data types: structured, semi-structured, and unstructured. Data is stored in its raw format without the need for any structure or schema. In fact, you don’t need to define the data structure when it’s captured, only when it’s read. Because data lakes are highly scalable, they support larger volumes of data at a lower cost. And, with a data lake, you can store data from relational sources (like relational databases) and from non-relational sources (IoT devices and machines, social media, etc.) without first running ETL (extract, transform, load), which makes data available for analysis much faster.
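To make the idea of defining structure only when data is read more concrete, here is a minimal sketch in Python. It assumes raw records arrive as one JSON document per line; the sample records, the `TEMPERATURE_SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names invented for illustration, not part of any particular data lake product.

```python
import json

# Raw events as they might land in a data lake: one JSON document per line,
# with no schema enforced at write time (fields vary between records).
RAW_LINES = [
    '{"device_id": "d1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00Z"}',
    '{"device_id": "d2", "temp_c": 19, "firmware": "1.2"}',
    '{"user": "alice", "post": "hello"}',
]

# Hypothetical schema applied only at read time: field name -> converter.
TEMPERATURE_SCHEMA = {"device_id": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Parse raw lines and keep only records that fit the requested schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if not all(name in record for name in schema):
            continue  # belongs to a different use case; it stays in the lake
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


temperature_rows = read_with_schema(RAW_LINES, TEMPERATURE_SCHEMA)
print(temperature_rows)
```

The social media record is stored alongside the device readings without complaint; it is simply skipped when a temperature analysis reads the data, and a different schema could pick it up later.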
4 features of a modern enterprise data architecture
There are four primary features of a modern enterprise data architecture: 1) the data cycle, 2) data storage, 3) an integration platform, and 4) data delivery.
The data cycle starts with capture. Enterprises constantly encounter new data sources and need to capture data before they know the eventual use case. Captured data is extracted to populate known use cases and is also held for future, undefined use cases. The inbound data then needs to be conformed to corporate standards to ensure governance, quality, consistency, regulatory compliance, and accuracy for downstream consumers, regardless of their business need, skill set, or understanding of data architecture. Once the data has been captured and conformed, refinement services prepare it for its eventual downstream applications and use cases.
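The capture, conform, and refine steps can be sketched as three small Python functions. This is only an illustration under stated assumptions: the sign-up records, the email and country rules standing in for "corporate standards," and the function names are all made up for the example.

```python
from datetime import datetime, timezone


def capture(raw_record, source):
    """Capture: keep the payload untouched and attach minimal provenance."""
    return {
        "source": source,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "payload": raw_record,
    }


def conform(captured):
    """Conform: apply (assumed) corporate standards to the payload."""
    payload = captured["payload"]
    email = str(payload.get("email", "")).strip().lower()
    if "@" not in email:
        return None  # fails the quality rule; not passed downstream
    country = str(payload.get("country", "")).strip().upper() or "UNKNOWN"
    return {"source": captured["source"], "email": email, "country": country}


def refine(conformed_records):
    """Refine: shape conformed data for one use case (sign-ups per country)."""
    counts = {}
    for record in conformed_records:
        counts[record["country"]] = counts.get(record["country"], 0) + 1
    return counts


raw = [
    {"email": " Ann@Example.com ", "country": "us"},
    {"email": "not-an-email", "country": "de"},
    {"email": "bo@example.org"},
]
captured = [capture(r, source="signup_form") for r in raw]
conformed = [c for c in (conform(r) for r in captured) if c is not None]
report = refine(conformed)
print(report)
```

Note that capture keeps every record, including the invalid one, so it remains available for use cases that have not been defined yet; only the conform step filters.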
Data is stored in the data lake. Think of the data lake as a modern data factory, and within the lake are “containers” for various stages of data processing. The first container is the landing container, where inbound raw data is received regardless of its form, transport, or source. This is where uncleansed data goes. Decisions about what raw data to keep are made here. Data that is kept is moved to the conformed container.
The conformed container is where raw data is cleansed and data quality is ensured. The conformed container ensures that the enterprise is working with a consistent data set that is compliant with standards.
Next, we have the refined container that prepares data for its eventual delivery target, and there may be subsets of refineries depending on the use cases. Once the data is refined, it’s staged for delivery to its destination. After delivery, it may be moved to a working area for data scientists to use, archived for long-term storage, or deleted.
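As a rough sketch of how data moves through the landing, conformed, and refined containers, the toy Python class below keeps each container as an in-memory list. A real data lake would use cloud storage rather than lists; the `DataLake` class, its method names, and the SKU records are hypothetical and exist only to show the flow.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy in-memory lake with one list per processing stage."""
    landing: list = field(default_factory=list)
    conformed: list = field(default_factory=list)
    refined: list = field(default_factory=list)

    def land(self, record):
        # Landing accepts anything, regardless of form or source.
        self.landing.append(record)

    def promote_to_conformed(self, keep, cleanse):
        # Decide what raw data to keep, then cleanse it on the way in.
        for record in self.landing:
            if keep(record):
                self.conformed.append(cleanse(record))

    def promote_to_refined(self, prepare):
        # Prepare conformed data for a specific delivery target.
        self.refined = [prepare(record) for record in self.conformed]


lake = DataLake()
lake.land({"sku": " a-1 ", "qty": "3"})
lake.land({"sku": None, "qty": "7"})  # unusable: no SKU
lake.land({"sku": "B-2", "qty": "5"})

lake.promote_to_conformed(
    keep=lambda r: bool(r.get("sku")),
    cleanse=lambda r: {"sku": r["sku"].strip().upper(), "qty": int(r["qty"])},
)
# Here the delivery target is assumed to want simple CSV-style lines.
lake.promote_to_refined(prepare=lambda r: f'{r["sku"]},{r["qty"]}')
print(lake.refined)
```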
Data integration platform
The integration platform takes data from different sources and combines it to provide a unified view. In a modern data architecture, the integration platform needs to be flexible enough to support all the required data sources and targets, as well as the data services through each stage of the data cycle. It needs to be able to support data with and without schema and manage metadata. Additionally, it needs to be able to handle the integration and processing required for:
- High velocity, variety, and volume data capture
- Low latency application integration
- High volume data conformance processing
- Data integration from delivery to target
- API consumption (essential for B2B ecosystems)
Further, the scenarios above need to be accessible to a broad user community, ranging from highly skilled IT professionals to business users who need to accelerate a line-of-business project in response to a quickly changing business environment. In the modern enterprise, analysts and data scientists are called upon to answer strategic questions and unlock innovation at an unprecedented pace, and they cannot afford to depend on an IT organization to make critical information available. Self-service is no longer a luxury or convenience but a mission-critical requirement. Being able to build data pipelines quickly is essential to keep the business moving at the speed a digital age demands.
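To illustrate what combining different sources into a unified view can look like at its simplest, here is a small Python sketch. It assumes two hypothetical sources, a CRM export in CSV and a billing feed in JSON, that identify customers under different field names; the data and the `unified_view` function are invented for the example and do not represent any integration platform's API.

```python
import csv
import io
import json

# Two hypothetical sources describing the same customers in different shapes.
CRM_CSV = "customer_id,name\n1,Ann\n2,Bo\n"
BILLING_JSON = '[{"cust": 1, "balance": 120.5}, {"cust": 3, "balance": 40.0}]'


def unified_view(crm_csv, billing_json):
    """Merge both sources into one record per customer ID."""
    view = {}
    for row in csv.DictReader(io.StringIO(crm_csv)):
        cid = int(row["customer_id"])
        view.setdefault(cid, {"customer_id": cid})["name"] = row["name"]
    for item in json.loads(billing_json):
        cid = int(item["cust"])
        view.setdefault(cid, {"customer_id": cid})["balance"] = item["balance"]
    return [view[cid] for cid in sorted(view)]


customers = unified_view(CRM_CSV, BILLING_JSON)
for customer in customers:
    print(customer)
```

Customers that appear in only one source still show up in the view with the fields that are known, which is one reasonable design choice; a stricter pipeline might flag them for review instead.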
Lastly, data needs to be delivered to its appropriate targets. Secure data accessibility is integral to modern data architecture. Governance, security, role-based access control (RBAC), SLAs, throttling, and usage analytics are all critical to delivering data to its intended users, whether internal employees or external partners.
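Three of these delivery controls, RBAC, throttling, and usage analytics, can be sketched together in a few lines of Python. This is a simplified illustration, not a production design: the roles, dataset names, `TokenBucket` class, and `deliver` function are hypothetical, and the throttle uses a token bucket, which is one common approach among several.

```python
import time
from collections import Counter

# Hypothetical role-to-dataset grants.
ROLE_GRANTS = {
    "analyst": {"sales_mart"},
    "partner": {"catalog_api"},
}


class TokenBucket:
    """Token-bucket throttle: `rate` requests per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        refill = (now - self.last) * self.rate
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


usage = Counter()  # usage analytics: successful deliveries per (role, dataset)


def deliver(role, dataset, bucket):
    """Return a status string after RBAC and throttling checks."""
    if dataset not in ROLE_GRANTS.get(role, set()):
        return "denied"
    if not bucket.allow():
        return "throttled"
    usage[(role, dataset)] += 1
    return "ok"


# A fixed clock keeps the example deterministic: no tokens are refilled.
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: 0.0)
results = [
    deliver("analyst", "sales_mart", bucket),
    deliver("analyst", "catalog_api", bucket),
    deliver("analyst", "sales_mart", bucket),
    deliver("analyst", "sales_mart", bucket),
]
print(results)
```

The access check runs before the throttle, so a denied request does not consume a token from the caller's allowance.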
Enterprises that take a data-as-a-service delivery approach ensure the highest levels of availability, accessibility, and customer experience without the expense of constant IT fire drills and without compromising security or internal intellectual property. Data gets delivered to its final destinations, which may include data marts, applications, files, data ponds, data science workbenches, AI-enabled solutions, and API ecosystems.
Build a robust modern data architecture
A robust modern enterprise data architecture will ensure that enterprises have the accessibility, speed, flexibility, and reliability to optimize every data source and use it to make better business decisions. SnapLogic provides data integration through its intelligent integration platform as a service, helping enterprises build modern data architectures to future-proof their data needs.