Defining, Designing, and Implementing SOA-Based Data Services

Leverage a comprehensive data integration platform that can deliver sophisticated data services, enable an SOA to address its data-centric challenges, and finally realize its full potential.

David S. Linthicum

Introduction

Data services are a combination of application behavior and information, which are the core

building blocks of IT architecture (including service-oriented architecture, or SOA). Data

services deliver data as a standards-based service and provide controlled and governed

access to back-end systems and data. Given the heterogeneity of back-end data sources

and the complexity involved in effectively leveraging these data sources in an enterprise,

data services are the single most important component in an SOA approach. Considering

the importance of data services, logically there should be a focus on the technology, design,

implementation, and deployment of data services.

The larger, more strategic benefit of data services is increased business agility, or getting

access to information faster and in real time. SOA is all about business agility. With a

comprehensive data services technology, IT organizations can finally realize the full potential

of SOA.

First, be aware that not all data services are created equal. There are different types of

data services that are built to meet the needs of the architecture, and thus the business.

There are data services that are more transactional in nature, thus exposing more behavior

than data. There are data services that lack behavior and are little more than direct

database APIs. And then there are more sophisticated data services that go across several

heterogeneous data sources to enable data access, data quality, data transformation, and data

delivery, using any modality and any protocol.

To distinguish these data services from more primitive data services that simply provide

physical data access, I will refer to these more sophisticated data services as SOA-based data

services throughout this paper.

The trick is to determine your requirements—to enable a single-view of “X” for your

applications or to extend a data warehouse to include transactional data for operational

business intelligence, for example—and then design and deploy SOA-based data services

that will meet the needs of the architecture.

white paper


For example, let’s consider a typical enterprise that has accumulated multiple, disparate operational data sources to support all of its applications. As is common when many databases are in use, data is often duplicated and needs to be reconciled across these systems. Thus, there are many versions of the same database attributes, such as those relating to customers, sales, and other classifications.

Understanding the issues that data services address should be the first consideration when moving the enterprise to a more data-driven approach, that is, working from the data to the processes. However, many efforts to “reinvent” enterprise architecture around data do not work. Why? There is a lack of understanding of the data, a lack of externalization of the data using standards (such as data services), and a lack of data governance and data quality as part of the infrastructure.

SOA-based data services offer the opportunity to get data under control and leverage an

enterprise architecture that’s more data driven. SOA-based data services require a model-

driven approach to architecture that facilitates the rapid location and understanding of the

data, improves the ability to quickly build trusted data services once and deploy them for all

applications, and ensures consistent enforcement of data policies (e.g., data quality, freshness,

privacy). These capabilities are typically missing from the traditional enterprise architecture,

and the lack of focus on the data, and underlying core data services features, hampers

productivity and the potential of the IT infrastructure.

There are a few core use cases to consider, including single-view of data for applications and

virtual data warehouse for operational business intelligence.

The single view of data for applications is the ability for SOA-based data services to provide

one abstracted view of many different databases with many different native structures while

insulating the consuming applications from underlying data changes. These physical structures

are externalized as a single view of the data through an SOA-based data service with a

unified schema that’s more logical for business and thus more productive when consumed

by any number of applications. For instance, sales data that physically exists within three different databases in three different systems (such as inventory, CRM, and general ledger systems) can be externalized through SOA-based data services using one unified schema. To facilitate reuse, SOA-based data services must be provisionable in multiple ways (e.g., SQL, Web services, messaging, ETL) without having to be rebuilt for each consuming application, and they must include data quality rules to ensure the data can be trusted. Furthermore, a single view of data must be resilient to data source changes and serve as a trusted source of data. Because it is not always possible to cleanse the source data, SOA-based data services must be able to cleanse and match data

on the fly (we’ll discuss this in more detail below).
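
The single view described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three row lists stand in for the CRM, inventory, and general ledger systems, and every name here (`CRM_ROWS`, `match_key`, `SalesView`, `single_view`) is hypothetical rather than part of any product. The matching rule is deliberately naive; it simply shows cleansing and matching happening at read time instead of in the source systems.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rows as they might come back from three separate systems.
CRM_ROWS = [{"cust_name": "  Acme Corp. ", "cust_id": "C-1"}]
INVENTORY_ROWS = [{"customer": "ACME CORP", "sku": "W-100", "qty": 5}]
LEDGER_ROWS = [{"acct_name": "Acme Corp", "amount": 1250.0}]


def match_key(name: str) -> str:
    """Cleanse a customer name on the fly so the three sources can be matched."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


@dataclass
class SalesView:
    """The unified, business-facing schema exposed to consuming applications."""
    customer: str
    crm_id: Optional[str] = None
    units_shipped: int = 0
    revenue: float = 0.0


def single_view() -> dict:
    """Build one logical record per customer from all three sources."""
    view = {}

    def entry(raw_name: str) -> SalesView:
        # Records whose cleansed names agree are treated as the same customer.
        key = match_key(raw_name)
        return view.setdefault(key, SalesView(customer=raw_name.strip()))

    for row in CRM_ROWS:
        entry(row["cust_name"]).crm_id = row["cust_id"]
    for row in INVENTORY_ROWS:
        entry(row["customer"]).units_shipped += row["qty"]
    for row in LEDGER_ROWS:
        entry(row["acct_name"]).revenue += row["amount"]
    return view
```

Consuming applications would see only `SalesView`; the three differently spelled customer names and the three physical layouts stay hidden behind the service.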


A virtual data warehouse for operational business intelligence, as related to the use of SOA-based data services, combines historical and operational data. It leverages disparate data from any number of operational data stores by abstracting the data, in contrast to the traditional approach of extracting the data, translating the data, and loading the data into a separate database. The difference is in data freshness (i.e., latency), data

quality, simplicity, and rapid implementation. The data can be viewed in real time or near real

time with data quality rules applied on the fly, leveraging current operational data, versus a

traditional data warehouse where the data can be weeks, perhaps months old. The ability

for a traditional data warehouse to absorb new data sources can take months to implement,

whereas a virtual data warehouse that augments an existing data warehouse or data mart

with new data sources can be implemented in days. For instance, when a manufacturing

company needs to understand the current state of plant operations, it can abstract the data

from any number of operational data stores and provide a true “dashboard” of the current

state of the plant, leveraging data that is minutes old. Moreover, users are able to look at

near real-time data within the context of historical data. For example, they could look at the

existing state of plant operations in the context of the same data gathered during the past

100 days, and perhaps the past several years. This ability allows management to make critical

business decisions using only the latest data, while considering the knowledge of the past

(we’ll discuss this in more detail below).
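
As a rough sketch of the idea, the following Python example uses the standard `sqlite3` module to query a historical store and a separate operational store in place, without copying one into the other. The two in-memory databases, the table names, and the sample values are all assumptions made for illustration; a real deployment would sit in front of very different systems.

```python
import sqlite3

# The main database stands in for the existing (historical) data warehouse.
conn = sqlite3.connect(":memory:")
# A second, separately attached database stands in for a live operational store.
conn.execute("ATTACH DATABASE ':memory:' AS ops")

conn.execute("CREATE TABLE main.plant_history (day TEXT, units INTEGER)")
conn.execute("CREATE TABLE ops.plant_current (day TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO main.plant_history VALUES (?, ?)",
    [("2009-10-17", 90), ("2009-10-18", 110)],
)
conn.execute("INSERT INTO ops.plant_current VALUES ('2009-10-19', 130)")


def plant_dashboard(connection):
    """Query both stores in place; nothing is loaded into the warehouse."""
    rows = connection.execute(
        """
        SELECT day, units, 'historical' AS source FROM main.plant_history
        UNION ALL
        SELECT day, units, 'operational' AS source FROM ops.plant_current
        ORDER BY day
        """
    ).fetchall()
    history = [units for _, units, source in rows if source == "historical"]
    current = [units for _, units, source in rows if source == "operational"]
    return {
        "rows": rows,
        "historical_avg": sum(history) / len(history),
        "current_units": current[-1],
    }
```

Calling `plant_dashboard(conn)` returns the current reading alongside the historical average, which is the "current state in the context of the past" view the plant example calls for.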

Let’s drill down a bit deeper into the general use cases we’ve presented above. Let’s say you

have multiple disparate data sources for all of your applications, and data is often duplicated

and needs to be reconciled across these systems. What you need is a single consistent view

of the data, no matter where it resides, and no matter what structure is employed. Core to

this problem is the lack of a single information system that can identify all the current and

accurate data that relates to, say, a specific business entity (such as a customer or a product),

regardless of the physical system in which that data resides.

In many firms, customer data remains in silos, such as specific divisions of the organization,

typically scattered across many different databases. Therefore, information is often

inconsistent or incomplete. The inability to create a single view of the customer, or to

synchronize customer data in a timely fashion across multiple operational systems, leads to a

few issues:

• The customer may be frustrated by lack of convenient access to information.

• The business may provide an inconsistent customer experience across product groups or channels because of the lack of integrated customer data.

• Without a holistic view of a customer’s relationship, it is very difficult to make intelligent, relevant marketing offers to the customer to sell more products.

The solution to these problems is to provide downstream applications with access to a

logical master data object, without regard to where it physically resides or how it changes.

This is the notion of data abstraction or, in this case, the use of SOA-based data services to

realize and manage that abstraction, which is the core topic of this paper.


A good example of the value of this approach is an organization that has a

limited view of its customers, making it difficult for the company to generate new revenues

through up-selling, cross-selling, or otherwise leveraging core customer data that is physically

scattered throughout the organization. How many times have you gotten multiple calls from

the same company, with none of the callers aware that someone else in the company is

contacting you? This situation sours the relationship between company and customer.

When it comes to virtual data warehousing, let’s look at the concept of operational business intelligence (operational BI reports, tools, and dashboards), which lets an organization react faster to business needs and anticipate business problems before they become major issues. When considering this type of BI, you need near real-time access to the data,

typically with zero or low latency.

A case in point would be a financial services company that purchases another financial

services company. An operational BI portal needs unified data from both companies. This

single view of data from many physical data stores allows the new company to make core

decisions about the business, perhaps even looking at real-time operational data in context

with historical data, to understand the current state of the combined business in light of its

history. This approach allows the new company to look at critical data without having to

combine and aggregate the data into the physical data warehouse, which could take months

and delay realizing the value of the acquisition. With SOA-based data services, the same

logic used to virtualize the data for the operational BI portal can later be reused to move

the data from the new company into the appropriate physical data stores if desired.

Considering the examples above, we can then further define the value proposition of

SOA-based data services, including the ability to leverage a data abstraction layer that allows

the physical data sources to better represent the business entities. Also, provisioning data

for any application will allow a single platform to become the foundation for reuse of data

integration logic for any use within the enterprise. Finally, including data quality rules and

data freshness policies within data services allows IT to manage and support SLAs to define

performance, as well as to monitor ongoing data quality issues.

Moving to a Data-Driven Enterprise

The key issue here is the lack of a single information system that can identify all the current

and accurate data that relates to, say, a specific business entity (e.g., customer or product),

regardless of the system in which that data resides. A single view of the data enables

the user to see and understand the current state of the data, using the structure that best

represents the business, regardless of the existing physical structures.

In the case of customer data, the distribution of disparate data causes several issues:

• The lack of a single view means the customer may be frustrated by lack of convenient access to timely, trusted information.

• The business may provide an inconsistent customer experience across product groups or channels because of the lack of integrated customer data.

• Without a holistic view of a customer’s relationship, it is very difficult to make intelligent, relevant marketing offers to the customer to sell more products.


• Relevant to the use of a single data view is the fact that within most data warehouses you can’t get a single real-time view of the data. Thus, when looking to leverage concepts such as operational business intelligence, you can’t see real-time or near real-time transactional or transient data, and you are not able to make decisions on that data using the operational BI tools. Typically, the environment needs to support access for thousands of users and must also ensure rapid development and integration time. What is required is the ability to augment the data warehouse with support for direct querying of source systems without consolidating data.

In this paper, we will take the mystery out of what a comprehensive SOA-based data services technology should look like. We’ll examine the types of data services, or core data service patterns, that we’re seeing in modern IT architecture. Also, we’ll move beyond the definitions and look at the best practices around data services design and development.

The Need for Data Services

As defined in this paper, data services are a fundamental building block of an SOA and IT

architecture in general. Indeed, the ability to define, design, create, and deploy data services

is critical to deliver the right information to the services, processes, applications, and people

who need the data.

Typically, data services are suited for the following situations:

• Business needs are changing or dynamic, or the enterprise is in an industry where change is a constant (e.g., mergers and acquisitions), such as the financial or high-technology verticals.

• Time to data is a defined business need, such as the need to get data as it happens in support of a business process or key business intelligence applications.

• There is a need for better operational visibility and faster decision making, such as in the manufacturing plant use case presented above. The ability to see operational data is a core strategic value for the manufacturing company.

• Historical data needs to be combined with transactional or other transient data, such as when the current state of the data must be understood in the context of past data to better interpret its meaning.

• There are complex data integration challenges and needs, such as high data volumes and many heterogeneous data sources.

• The validity of data is transient, where creating a data warehouse is time and cost prohibitive for a particular business.

• Replicating information is not allowed by organizational policy, for instance, when working with private data in the health care vertical.

• There is a need for prototyping to see what data looks like before moving it to the data warehouse or other physical store.

• Data quality needs to be enforced proactively at the point of entry and consistently across applications and projects.


Much of the inefficiency within IT stems from data issues, typically poorly normalized, ill-designed, or unstructured data and poor data quality (errors, omissions, and duplicates), which limit IT’s ability to get the right information, at the right time, to support a core business process. Moreover, the static nature of existing data, such as in traditional systems where the data is tightly coupled to the application, means that developers who

don’t leverage data services are forced to change applications anytime the underlying

database schema is altered. This reduces agility, or the ability to quickly align IT with the

changing needs of the business.

The use of data services is not a panacea; you need to incorporate the right design

approach. A model-driven approach built upon active, logical data objects as the foundation

for data service design is a best practice. The logical data object represents the business,

and defining the business entities before mapping them back to physical databases or new

data structures is much more effective than working up from the existing physical databases,

or from a physical schema design. Associating comprehensive data services to the logical

data object enables a single point of management to rapidly provision data services for all

applications—this is what is meant by active logical data objects. The logical data objects are

not merely static models of the data; they are executable and extendable in nature.

Finally, the incorporation of data services governance ensures that the use of the data service

will be in line with the requirements of the business. Data services governance enforces

standardization and consistency to meet data quality expectations, data freshness SLAs, and

data privacy regulations. Those who neglect the use of good data governance practices,

and underlying enabling technology, won’t provide the long-term value for the applications,

services, or processes that leverage those data services.

Here are a few key recommendations:

• Understand the core needs of those who will leverage the data services, including governance, data quality, performance, and security, before defining and designing the data service.

• Focus on the logical design of the data service before you consider the physical mappings. This will allow you to consider the business holistically versus focusing on a physical schema or schemas.

• Profile the data through the lens of the logical data object to discover data quality issues early on so that work can accurately be scoped, minimizing the risk of any project delays.

• Consider data services governance as systemic to the data services lifecycle, including how data services will be managed in an operational state.

The value of SOA-based data services is clear, but the path to data services requires a bit of

groundwork. With the right approaches, the right steps, and the right technology, you will

enhance your chances of success in leveraging and managing your existing or new enterprise

data assets.


Defining Data Services

Services define the basic building blocks of an SOA. The services provide:

• Behavior

• Data

• Interoperability

First, you should note that there are vast differences between traditional data services that

focus solely on data access and SOA-based data services that provide data abstraction,

data access, and the ability to surround the data with predefined behavior. SOA-based data

services are even more valuable to the architecture because single and intelligent views of

the data may be combined with any number of other services or exist within any number

of applications. They enable reuse, in that the SOA-based data services may be leveraged

by any number of IT assets, and they provide agility, considering that they abstract any

application, process, or service from the underlying physical database that may frequently

change.

Services come in all different forms, in relation to the specific purpose of the services.

However, generally speaking, they are typically transactional in nature, providing more

behavior than data, or they are data oriented in nature, providing more data than behavior.

Most services are data oriented.

Services that focus on the production and consumption of data are known as data services

and make up about 95 percent of the services under management within a typical SOA.

These services supply access to physical data that exists either within a database or

application and re-represent that data using a structure that’s native to the data service (e.g.,

customer data), provide that data in the context of some type of behavior (e.g., update,

add, delete, edit), as well as furnish a standard interface, such as a Web services interface,

to interact with other applications or services without requiring close coordination around

development.

Data services are pervasive within an SOA because most of what applications, processes,

and other services do is process data. Furthermore, designing and building data services

can be especially challenging when the data is distributed across multiple sources and the

semantics and data quality are not clearly understood. Thus the need to define, create,

and implement data services is critical to the success of an SOA. The approach you take

to building data services and the technology you leverage to design, build, and deploy data

services are critical as well.


Core Services

There are many types of data services and/or approaches required to support data

management within the context of an SOA. We can break them up into three main

categories:

• Information catalog services

• Data provisioning services

• Data service governance (see Figure 1)

Features & Capabilities         High-Level Benefits

Information Catalog Services    Discover & understand all enterprise data assets

Data Provisioning Services      Deliver data to any application, any mode, using any protocol, & insulate applications from underlying data changes through encapsulation

Data Service Governance         Enable on-the-fly data cleansing & administration of business rules, user authorization & policies

Figure 1: SOA-Based Data Services

Information catalog services locate any information regardless of source or type. For example,

a service that provides a directory of metadata to determine the location of a particular

piece of data you’re looking at in the context of an application or the holistic architecture

is an information catalog service. A sample application for this would be the ability to find

customer data for a specific use. Also, you should consider integrated data profiling so

you can quickly understand the semantics of the data and uncover data quality issues.

Data profiling allows those who maintain the data service to accurately scope out the

implementation of the data service.
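
To make the catalog and profiling ideas concrete, here is a minimal Python sketch. It assumes a metadata directory that is nothing more than a dictionary from logical attributes to physical locations; the systems, tables, and columns listed are invented for the example, and the profile is intentionally tiny.

```python
# Hypothetical metadata directory: logical attribute -> physical locations.
CATALOG = {
    "customer.email": [
        {"system": "crm", "table": "contacts", "column": "email_addr"},
        {"system": "billing", "table": "accounts", "column": "contact_email"},
    ],
    "customer.name": [
        {"system": "crm", "table": "contacts", "column": "full_name"},
    ],
}


def locate(attribute):
    """Return every physical location registered for a logical attribute."""
    return CATALOG.get(attribute, [])


def profile(values):
    """Very small data profile: row count, missing values, and duplicates."""
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "duplicates": len(present) - len(set(present)),
    }
```

A lookup such as `locate("customer.email")` tells a designer where the data lives, and `profile(...)` over a sample of that column gives an early signal of how much cleansing work to scope.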

Data provisioning services provide optimized access to information over any protocol

or modality. In essence, data provisioning services facilitate access to the data using a

mechanism that’s best for the specific applications. This includes SQL, Web services, JMS,

or other native interfaces. This type of service should provide access in real time, near real time,

and batch, as well as have the capability of being leveraged from within any application or

service. Using a model-driven approach to building data services insulates applications from

underlying data changes in the source systems.
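
The point about provisioning one service in several ways can be sketched as follows. This is an illustrative assumption, not a description of any platform: `customer_service` is a hypothetical logical service, and JSON and CSV stand in for a Web service response and a batch-style delivery. The service logic is written once and only the delivery wrapper changes.

```python
import csv
import io
import json


def customer_service():
    """The single logical data service; hypothetical rows for illustration."""
    return [
        {"id": 1, "name": "Acme Corp", "region": "East"},
        {"id": 2, "name": "Globex", "region": "West"},
    ]


def as_json(rows):
    """Payload a Web service endpoint might return."""
    return json.dumps(rows)


def as_csv(rows):
    """Batch-style delivery, e.g. for a file-based or ETL consumer."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Delivery modes are registered separately from the service logic.
PROVISIONERS = {"json": as_json, "csv": as_csv}


def provision(mode):
    """Deliver the same service output in the mode the consumer asks for."""
    return PROVISIONERS[mode](customer_service())
```

Adding another modality in this sketch means adding one entry to `PROVISIONERS`; the service itself is not rebuilt for the new consumer.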

Data services governance supplies a single point of authorization and privacy for all data,

allowing those charged with governing the SOA to place policies and rules around the

quality, freshness, and utilization of the data. The concept here is to provide a single point

of control to ensure that the data under management, and the data services that access

the physical data, are indeed doing what they should be doing in support of the business

requirements and architecture.


Designing Data Services

There are several core steps to design and deploy data services. These steps include:

• Define the core purpose of the data service

• Create a logical data object and abstraction layer

• Define use and behavior of the data service

• Test the data service

• Support data services governance

Defining the core purpose of the data service is about determining the characteristics of the

service, including what functions the service should perform, applications and other services

that will leverage that data service, as well as security, privacy, and governance requirements

for that service. In addition, you need to consider performance requirements, including the

minimum amount of time for information to be consumed into the service for delivery to a

back-end database, and the consumption of the data into another application or service.

Core to the data service design process is the method of creating a logical data object and

abstraction layer to back-end databases and applications. Typically, this is accomplished by

linking the “as is” physical schema (or schemas, if many data sources are involved) to the “to

be” schema, or how the structure will be represented within the data service. The purpose

of doing this is to take complex data and, in many instances, data that does not provide a

good representation of the business entities (e.g., customer, sales, inventory) and allow access

to that physical data using a well-defined “to be” schema that represents the business entities

in a much more logical and meaningful way.
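
One simple way to picture the link between the “as is” and “to be” schemas is a declarative mapping, sketched below in Python. The physical column names and the helper functions are hypothetical and chosen only to show the idea; real mappings typically also involve type conversion and joins across sources.

```python
# "To be" business-facing field names mapped to hypothetical "as is" columns.
CUSTOMER_MAPPING = {
    "customer_id": "CUST_NO",
    "name": "CUST_NM_TXT",
    "credit_limit": "CR_LMT_AMT",
}


def to_logical(physical_row, mapping=CUSTOMER_MAPPING):
    """Present one physical row through the logical ("to be") schema."""
    return {logical: physical_row[physical] for logical, physical in mapping.items()}


def to_physical(logical_row, mapping=CUSTOMER_MAPPING):
    """Translate a logical update back to the physical column names."""
    return {mapping[field]: value for field, value in logical_row.items()}
```

If a physical column is renamed, only the mapping entry changes in this sketch; applications that consume the logical fields are left untouched, and physical columns that are not mapped are never exposed.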

There are a few core patterns to consider here, including schema combining and abstraction,

recasting, and physical re-representation. Data and schema combining is the ability to take

data from very different sources and combine them into one logical schema that provides

a single combined abstract schema for several different data sources, sometimes leveraging

very different types of technology (see Figure 2). This is useful when there is a need to view

data from multiple databases or applications within a single data service representation, using

a single virtual data schema. This scenario is a requirement of many business intelligence and

custom applications that need to look at any number of physical databases to externalize

the required business decision data. The logical data object and abstraction layer method

provides the maximum amount of reuse and encapsulation.


Recasting is the complete re-representation of a single underlying physical schema. Typically,

this step involves taking a single poorly designed database and placing a better schema on

that database. This is accomplished through the use of a data service that provides the

applications that leverage the data services with a much more productive representation of

the physical data.

Physical re-representation recasts the physical structure of an existing back-end database

within the context of the data service, typically without modification of the underlying

schema. This provides a standard interface into the data, but does not alter the way the data

is represented from that database.

Defining the use and behavior of the data service is about building functional logic around the data,

or the behavior defined for accessing, profiling, cleansing, transforming, and delivering the

data. Keep in mind that data services are more than mere interfaces into the data. They

can be created to provide any number of reusable functions around the externalization and

consumption of information.

When testing the data service, it should be checked for form, function, and use, including

how well it lives up to the predefined purpose of the service. Does it deliver the quality

of information required for the consuming application? Does it perform as required by the

consuming applications in terms of performance, data freshness (e.g., update frequencies),

and data quality?
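
The questions above can be turned into automated checks. The sketch below, in Python, assumes each row returned by the service carries an `updated_at` timestamp and that the consuming application has stated its required fields and maximum acceptable age; those inputs and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_service_output(rows, required_fields, max_age, now=None):
    """Return a list of problems found; an empty list means the checks passed."""
    now = now or datetime.now(timezone.utc)
    problems = []
    for index, row in enumerate(rows):
        # Data quality: every required field must be present and non-empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
        # Data freshness: the row must have been updated recently enough.
        if now - row["updated_at"] > max_age:
            problems.append(f"row {index}: stale")
    return problems
```

Running such a check against each consuming application's stated requirements gives a repeatable answer to whether the service lives up to its predefined purpose.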

Finally, data services governance is the ability to place policies, rules, and logic around the

data service to control access and data quality along with operational policies to tune

performance, specify caching rules for data freshness, and define data privacy rules. This

ensures that the use of the data service is limited to only those who are authorized to

leverage those services, that the utilization of the data service will be restricted to specific

types of use, and that governance policies are enforced consistently across all applications

and projects. Data services governance is a critical success factor for leveraging data services

for any type of SOA.
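
As a minimal sketch of two of these policies, the Python class below wraps a data service with an authorization rule and a caching rule for data freshness. The class name, the role model, and the injectable clock are assumptions made for the example; real governance tooling would cover far more, including data quality and privacy rules.

```python
import time


class GovernedService:
    """Wraps a data-fetching function with access and freshness policies."""

    def __init__(self, fetch, allowed_roles, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch
        self._allowed_roles = set(allowed_roles)
        self._max_age = max_age_seconds
        self._clock = clock
        self._cached = None
        self._cached_at = None

    def read(self, role):
        # Access policy: only authorized roles may use the service.
        if role not in self._allowed_roles:
            raise PermissionError(f"role {role!r} may not read this service")
        # Freshness policy: reuse the cached result until it is too old.
        now = self._clock()
        if self._cached_at is None or now - self._cached_at > self._max_age:
            self._cached = self._fetch()
            self._cached_at = now
        return self._cached
```

Because every consumer goes through `read`, the same policies apply to all applications without being reimplemented in each one.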

[Figure 2 diagram: Packaged Application, Database, Warehouse, Unstructured Data, Internet & Cloud, Flat Files, and Operational Store feeding a single Virtual Data Schema]

Figure 2: Leveraging data services allows you to combine data from very different data sources into a logical data object and data abstraction.


Call to Action

The use of data services is critical to the success of an SOA, as well as critical to the success

of IT in general. The ability to access complex data models using mechanisms that allow

you to represent the data in a more logical business context, along with business rules

and behavior, provides a huge strategic advantage. SOA-based data services enable IT

organizations to be more agile and responsive to the business and fully leverage data assets

for optimal business performance.

However, SOA-based data services are not possible without the right design and

architecture processes that come into play, and the right enabling technology platform to

create and deploy data services. In this paper, we provided the basics around the concept

of data services, with the call to action being your IT organization actively looking at data

services, the process of building data services, and the value they can bring to your existing

IT infrastructure.

About the Author

David Linthicum (Dave) is an internationally known enterprise application integration (EAI),

service-oriented architecture (SOA), and cloud computing expert. In his career, Dave has

formed or enhanced many of the ideas behind modern distributed computing, including EAI,

B2B application integration, and SOA, approaches and technologies in wide use today.

Currently, Dave is the founder of David S. Linthicum, LLC, a consulting organization dedicated

to excellence in SOA product development, SOA implementation, corporate SOA strategy,

and leveraging cloud computing. Dave is the former CEO of BRIDGEWERX and former

CTO of Mercator Software and has held key technology management roles with a number

of organizations, including CTO of SAGA Software, Mobil Oil, EDS, AT&T, and Ernst and

Young. Dave is on the board of directors serving Bondmart.com and provides advisory

services for several venture capital organizations and key technology companies.

In addition, Dave was an associate professor of computer science for eight years and

continues to lecture at major technical colleges and universities, including the University of

Virginia, Arizona State University, and the University of Wisconsin. Dave keynotes at many

leading technology conferences on application integration, SOA, Web 2.0, cloud computing,

and enterprise architecture and has appeared on a number of TV and radio shows as a

computing expert.


© 2009 David S. Linthicum, LLC 7042 (10/20/2009)