
HHSC Enterprise Information Technology

    Proof of Concept Assessment Report for

    Master Data Management

    Date: February 2, 2010

    Prepared by: Enterprise Data Warehouse project


    Table of Contents

1. Problem Definition
   Data Collection
   Data Cleansing
   Data Matching
   Data Standardization and Ongoing Maintenance
2. Master Data Management (MDM)
   Definition
   How MDM aligns with HHS Initiatives
3. Proof of Concept (PoC)
   Scope of IBM Master Data Management (MDM) Proof of Concept (PoC)
   IBM MDM Product Suite
   Proof of Concept (PoC) Approach
4. Results and Observations
   Observations of Source Data Quality
   Data Matching Process
   Use Case Results
5. Recommendations and Conclusion
   Recommendations
   Conclusion
Appendices
   Appendix A Glossary


    1. Problem Definition

The Health and Human Services (HHS) system uses various mission-critical applications to support and maintain its day-to-day operations for providing client services while supporting decision making within its executive, management, and operational activities. As a result of program growth across HHS agencies and the need to adapt to various laws, legislation, policies, and procedures over the years, the IT systems that support these operations have become complex, difficult to maintain, and difficult to change from a decentralized, program-centric design of service delivery to a client-centric one. This makes it difficult to align with current federal and state philosophies that favor a client-centric service delivery view and to support interoperability initiatives such as the Medicaid Information Technology Architecture (MITA).

For example, the lack of contextually consistent identification mechanisms, definitions, and standards for tracking key business entities, such as clients, providers, and services, makes it a significant challenge for executive management and operational staff to get a holistic view of an entity across programs. The ability to match and link these entities across different programs and systems with a high degree of trust is a foundational issue that could directly or indirectly impact the successful implementation of upcoming initiatives on the HHS roadmap, such as the Enterprise Data Warehouse, MITA, and Health Information Technology / Health Information Exchange (HIT/HIE). Establishing a robust and reusable solution that gives programs and applications a trustworthy enterprise view of the client is critical for moving forward with future initiatives at HHS. Several inherent system designs, operational practices, and technical issues currently prevent HHS from creating an enterprise view of a client or a provider.

    Data Collection

Each HHS agency uses independent operational systems to support its various programs. Although key components of a given data set are often similar across the enterprise (e.g., client data: name, date of birth, social security number, address, etc.), the data collected by each agency resides in silo-centric systems in different formats with varying operational business rules. Linking data from different sources across the enterprise is difficult due to program-specific system designs, inconsistent data formats, and a lack of data sharing agreements.

    Data Cleansing

To assist executive management in making informed decisions and to satisfy HHS analytical and reporting needs, several partial or unsuccessful attempts have been made to consolidate data into one central location from various context-specific data sources across a subset of HHS agencies and programs, rather than from the perspective of establishing a complete client-centric view of services received. Often, these data collection and cleansing processes are extremely resource-intensive and, in some cases, do not accurately consolidate the large amount of data in a contextually meaningful way.


    Data Matching

Client data collected in the various mission-critical applications is operation-specific, and each application's design ignores the existence of the same client data being captured in one or more other HHS systems. Data collected in each system has separate business rules, formats, and attributes that make it challenging to match client data from the various agencies and prevent establishing a single view of a client. This operational practice of capturing client data for different contexts, without contextual validation across systems during the operational process, makes matching and reconciling cross-system data a significant challenge during downstream operations and strategic analysis activities.

    Data Standardization and Ongoing Maintenance

There are limited enterprise-level standards and guidelines defining HHS master data entities (client, provider, claim, etc.) or the relationships between similar entities (client, patient, person). This has contributed to redundant data across disparate operational and analytical systems and to the potential duplication of services provided by HHS agencies to clients. See the example in Diagram 1 below.

Diagram 1: Ways of storing Client Data in various HHS systems

Client Data Source      Client Data Attribute
Data Source (DS) 1      INDV_ID
Data Source (DS) 2      XXX_MEMBER_NO
Data Source (DS) 3      PCN_NBR
Data Source (DS) 4      RECORD_KEY
Data Source (DS) 5      PERSON_ID

Client data is stored with different names, formats, and values, so there is no single view of a client. Problem: Client data cannot be joined across the various systems without extensive data analysis and transformation rules.

There is no single, accurate, comprehensive, and reusable framework to link client data across the various systems as a proactive and foundational basis for decision making and for managing operations from a cross-functional client view. Operational practices that attempt to build such context-specific views are therefore reactive in nature; they are often resource intensive, involve significant manual intervention, and take a significant amount of time to design and to perform the necessary cross-system analysis. In addition, these reactive solutions are often situation-driven and context-specific, and they offer limited opportunity for expansion into reusable, robust, enterprise-focused, long-term solutions that efficiently leverage and capture cross-program subject matter knowledge.

A potential solution to these data issues is to proactively recognize this pattern of problems across the enterprise and establish a unified view of commonly used entities (client, provider, claim, etc.), making entity-centric structures available at appropriate levels of detail for reuse by agency-specific analytical and operational activities. Program-level entity-matching operations could then refocus their resources from siloed, resource-intensive entity-matching activities to using a service from a centralized repository that maintains the necessary business intelligence and data for a dynamic, entity-centric view of key business entities.

This foundational strategy of centralized data management, while potentially resource intensive from an organizational support and automation standpoint, could be the basis for cost-effective investments. The return on these investments could be demonstrated for existing program functions and systems as well as for future client-centric data management initiatives on the HHS roadmap, such as the Enterprise Data Warehouse (EDW), the Medicaid Information Technology Architecture (MITA), and Health Information Technology / Health Information Exchange (HIT/HIE).

    2. Master Data Management (MDM)

Recent industry trends in managing client data entities and tracking other key data entities across systems advocate the use of Master Data Management (MDM) as a solution, and implementing MDM at the enterprise level to maximize its benefits is a current industry trend. This document presents the results of a proof of concept exercise performed to assess the capabilities of MDM in providing centralized data entity management and entity linking across systems.

    Definition

Master Data Management (MDM) comprises a set of processes and tools that consistently define and manage the data entities of an organization. MDM has the objective of providing processes for collecting, aggregating, matching, consolidating, quality-assuring, persisting, and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information. MDM is about two critical components -- the data itself and the functionality to ensure the data is contextually accurate and timely. While data is the foundation of a Master Data Management solution, it cannot be effective without a secondary component -- functionality to govern the data. Data on its own has no ability to maintain data readiness -- or, more simply, the accuracy of the data in the context of a specific purpose. Master data must be actively managed by appropriately selected data stewards within an organization.


    How MDM aligns with HHS Initiatives

    An MDM solution can be used:

• Collaboratively, to create and define master data
• Operationally, for real-time data access, and
• Analytically, for data analysis.

    An Enterprise MDM strategy, when properly implemented, can be beneficial across the HHS enterprise for cross agency data sharing and synchronization, Health Information Exchange (HIE) efforts, Medicaid Information Technology Architecture (MITA), and the creation of a true Enterprise Data Warehouse (EDW). A true Enterprise MDM implementation can manage changes, event triggers and notifications across all applications enterprise-wide. In addition, MDM must do more than simply house the data; it must manage its use in processes across the enterprise using different implementation strategies.

    MDM in the context of Cross-Agency Data Sharing and Synchronization Initiatives

A Collaborative MDM manages the process of creating, defining, and synchronizing master data across systems. Once the master data is defined, it can be synchronized with operational and analytical systems and applications. Collaborative MDM provides a platform to aggregate, enrich, and publish definitional data and requires workflow and advanced security capabilities. The MDM solution executes all critical data changes and event notifications, from simple to complex -- everything from resolving a duplicate record to determining which systems receive specific updates. For example, address changes made in one source system can be sent to MDM as part of real-time updates or a daily batch feed to update the master record. MDM can then identify that the same client exists in other systems within HHSC and send a critical data change notification to those systems as well.
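To make the flow described above concrete, the sketch below illustrates in simplified Python how an address change arriving from one source system could update the master record and fan out notifications to the other systems known to hold the same client. The data structures and function names are illustrative assumptions, not the IBM MDM API.

```python
# Illustrative sketch of the collaborative MDM change propagation described above.
# The record layout and functions are hypothetical, not the IBM MDM product API.

# Master record: one client, cross-referenced to the source systems that hold them
master_record = {
    "mci_key": "MCI-000123",
    "address": "123 MAIN ST, AUSTIN, TX 78701",
    "source_keys": {"DS1": "INDV_ID=987", "DS3": "PCN_NBR=456"},
}

def notify(system, mci_key, changes):
    # In a real deployment this would be a service call or a queued message per system.
    print(f"notify {system}: client {mci_key} changed {changes}")

def apply_address_change(master, originating_system, new_address):
    """Update the master record, then notify every other system holding this client."""
    master["address"] = new_address
    for system in master["source_keys"]:
        if system != originating_system:
            notify(system, master["mci_key"], {"address": new_address})

# Example: DS1 sends an address update (real-time or via the daily batch feed)
apply_address_change(master_record, "DS1", "500 OAK AVE, AUSTIN, TX 78702")
```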

    MDM in the context of upcoming Health Information Exchange (HIE) and Medicaid Information Technology Architecture (MITA) Initiatives

In an Operational MDM, use and maintenance of master data occurs within operational processes and applications, and the master data is leveraged by other systems through these services. Operational MDM can leverage, and become a significant part of, a service-oriented architecture to support a variety of application needs. In the HHS environment, HIE- and MITA-related initiatives will establish systemic processes that could benefit from Operational MDM. The MDM implementation requires the performance to handle high transaction levels and should integrate openly with operational applications. An Operational MDM solution is modeled on a service-oriented architecture (SOA), should be flexible and scalable, and should provide a predefined set of out-of-the-box business services to support the management and integrity of data. Operational MDM systems also have the flexibility to extend functionality to support new or additional business processes.

    MDM in the context of Business Intelligence (BI) through Enterprise Data Warehouse (EDW) Initiatives

An Analytical MDM provides accurate, consistent, and up-to-date master data to an Enterprise Data Warehouse (EDW) and feeds business intelligence insights back into collaborative and operational MDM. For example, a client's change of address (city/county/region) recorded through MDM can indicate that he or she is now eligible under a program that was previously not available to the client. MDM can then send a notification to the EDW that triggers a Business Intelligence (BI) event alerting the case worker to contact the client about the additional eligibility available.

    3. Proof of Concept (PoC)

    Scope of IBM Master Data Management (MDM) Proof of Concept (PoC)

    To evaluate the viability and capabilities of master data management (MDM) using client data across different program data sets, HHSC entered into an agreement with IBM to perform a proof of concept (PoC) exercise. The IBM MDM solution was chosen for the evaluation for two key reasons:

    1. IBM had been previously identified as one of three visionary industry leaders in the customer data integration solution space during a Gartner Research Study in May 2009.

    2. IBM agreed to commit resources and make available the necessary software and hardware infrastructure to perform the proof of concept exercise in accordance with HHSC policies and procedures.

The IBM InfoSphere product suite was used to assess whether the Master Data Management (MDM) technical solution could help HHSC define a single view of a client, called the Master Client Index (MCI). The PoC was designed to demonstrate the viability of IBM MDM software products to build a unified, standardized, and integrated repository of the clients served by the various benefits programs offered by HHS. The PoC was intended to:

• Prove the benefits of utilizing a Master Data Management solution within HHS using business use cases.
• Validate the role of MDM in enabling strategic and operational analytic applications.
• Validate an MDM solution across structured and unstructured data stores.
• Identify any supporting operational roles, standards, processes, and other key dependencies that would have to be established for implementing MDM.

    The scope of the PoC was intended to demonstrate the functional and technical capabilities of an MDM solution by accomplishing the following:

• Determine the attributes that should be used to match client records across the various source systems (SSN, name, address, etc.)
• Identify individual clients processed by multiple source systems
• Resolve the same client's records across multiple systems into a single record based on matching attributes
• Assign a single, integrated Master Client Index key for each individual
• Create a Master record which associates all source system keys for an identified client (see the sketch after this list)
• Use the Master record for integrated reporting on client information across source systems.
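The following minimal sketch, referenced in the list above, illustrates the cross-referencing idea behind the Master record: a single generated MCI key associated with every source-system key identified for the same individual. The structures and key formats are hypothetical, not the PoC implementation.

```python
# Minimal sketch of a Master Client Index cross-reference: one MCI key per resolved
# client, associated with every source-system key for that client. Hypothetical only.
import itertools

_mci_counter = itertools.count(1)
master_index = {}     # mci_key -> {"attributes": {...}, "source_keys": {source: key}}
source_to_mci = {}    # (source_system, source_key) -> mci_key

def register(source_system, source_key, attributes, matched_mci=None):
    """Add a source record: link it to an existing master record when matching found
    one, otherwise create a new MCI entry."""
    mci_key = matched_mci or f"MCI-{next(_mci_counter):06d}"
    entry = master_index.setdefault(mci_key, {"attributes": attributes, "source_keys": {}})
    entry["source_keys"][source_system] = source_key
    source_to_mci[(source_system, source_key)] = mci_key
    return mci_key

# The same person known under different keys in DS 1 and DS 5 resolves to one MCI key
mci = register("DS1", "INDV_ID=987", {"last_name": "IRWIN", "dob": "1980-01-01"})
register("DS5", "PERSON_ID=555", {"last_name": "IRWIN", "dob": "1980-01-01"}, matched_mci=mci)
```

Integrated reporting across source systems then reduces to looking up records by MCI key rather than by each system's native identifier.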

The PoC assessed whether the various HHS agencies and systems could take full advantage of a Master Client Index capability maintained at the enterprise level. An MCI could potentially enable various HHS applications to link data with other systems to answer analytical and operational questions accurately and to assist business operations and executives in making informed decisions. Assessing the performance capabilities of the various hardware and software tools used in the PoC was not in scope. In addition, the focus of the PoC was to evaluate the general capabilities and maturity of available MDM tools rather than to perform a technology evaluation of the IBM MDM solution relative to other technologies.


    IBM MDM Product Suite

The IBM MDM tool set and its high-level functionality are summarized in this section.

[Diagram: IBM Information Server and IBM Master Data Management. IBM Information Server provides Understand (discover, model, and govern information structure and content), Cleanse (standardize, merge, and correct information), Transform (combine and restructure information for new uses), and Deliver (synchronize, virtualize, and move information for in-line delivery) capabilities -- supported by parallel processing, rich connectivity to applications, data, and content, unified deployment, and unified metadata management -- through the Information Analyzer, QualityStage, and DataStage tools. IBM Master Data Management provides Collaborate (define, create, and synchronize master information), Operationalize (deliver master information as a service for business operations), and Analyze (drive real-time business insight) capabilities across the Product, Partner, Customer, Supplier, and Location data domains, using industry models and assets, configured from a multi-form master data management system that exploits Information Server.]


Product Suite: IBM InfoSphere Information Server

Description: IBM InfoSphere Information Server enables businesses to perform five key integration functions:

Function and Toolset:
1) Understand the data. IBM InfoSphere Information Analyzer can help companies automatically discover, model, define, and govern information content and structure, as well as understand and analyze the meaning, relationships, and lineage of information.
2) Cleanse the data. IBM InfoSphere QualityStage supports information quality and consistency by standardizing, validating, matching, and merging data.
3) Transform data into information. IBM InfoSphere DataStage helps transform and enrich information to help ensure that it is in the proper context for new uses. It also provides high-volume, complex data transformation and movement functionality that can be used for stand-alone ETL scenarios or as a real-time data processing engine for applications or processes.
4) Deliver the right information at the right time. IBM InfoSphere DataStage provides the ability to virtualize, synchronize, or move information to the people, processes, or applications that need it. It also supports critical Service Oriented Architectures (SOAs) by allowing transformation rules to be deployed and reused as services across multiple enterprise applications.
5) Perform unified metadata management. IBM InfoSphere Information Server is built on a unified metadata infrastructure that enables shared understanding between the different user roles involved in a data integration project, including the business, operational, and technical domains.

Product Suite: IBM Master Data Management

Description: IBM Multiform Master Data Management (MDM) addresses the challenges of effective and complete management of master data with a proven framework designed to help organizations across the enterprise. The fundamental principle of MDM is that master data is decoupled from operational, transactional, and analytical systems into a centralized, independent repository or hub. This centralized information is then provided to Service Oriented Architecture (SOA) business services so data is managed independently of any single line of business, system, or application. This strategy enables enterprises to identify common functionality for all systems and applications and then support efficient, consistent use of business information and processes. Per IBM, their MDM platform moves beyond previous attempts at centralizing control of data by allowing users to fully manage data with multiple domains and multiple styles of data usage.

Function and Toolset:
• Master Data Repository. InfoSphere Master Data Management Server maintains master data for multiple domains including customer, account, and product, as well as other data types such as location and privacy preferences.
• MDM Business Services. Through business services, InfoSphere Master Data Management Server facilitates integration with all applications and business processes that consume master data.
• The MDM Integrity layer of InfoSphere Master Data Management Server provides data quality management capabilities around party matching, data validation, data standardization, and external reference identifiers.
• The MDM Intelligence layer of InfoSphere Master Data Management Server contains business rule and event detection functionality that is fully integrated with the MDM Business Services.
• MDM Data Governance Services allow transaction- and data-attribute-based authorization.
• SOA Service Interfaces allow multiple systems and applications to integrate with the MDM Business Services.
• The MDM Data Stewardship user interface provides an intuitive graphical interface for managing various collaborative data processes such as managing groups, duplicate suspect processing, and hierarchies.
• The MDM Event Management client provides the ability to trigger events and schedule processing at a party level.
• MDM Batch Job Manager. This client application is designed to manage batch processing by providing capabilities such as pacing, logging, and multithreading.

[Diagram: InfoSphere Master Data Management Server. MDM Business Services, Data Quality Management, Business Intelligence Logic, and Data Governance Knowledge are layered over the MDM domains (party, account, etc.) and serve a wide audience of master data users: operational applications, middleware and business processes, data warehouses and analytics, data stewards and MDM users, and UI applications.]


    Proof of Concept (PoC) Approach

To perform the PoC, HHSC EIT collected client data from various agencies and systems. Due to the limited hardware capacity available for a PoC, only a small subset of client data was used to assess the functionality of IBM's MDM product suite. Specifically, the subset of clients whose last name began with "I" was chosen for this exercise.

    To prove Master Client Index (MCI) integration between data sources, analytical use cases were defined to merge MCI data with claims data from one system and lab data from another. These use cases were designed to prove how an MCI could be used for data warehousing and analytical application integration.

The diagram below shows the architecture, number of data sources, and use cases involved in the IBM MDM PoC.


    Diagram 2: MDM PoC Architecture Diagram


    The diagram and tables below present the various data sets that were used for the PoC including the subset of data loaded from each source into the consolidated data environment on the MDM Server.

    Diagram 3: MDM PoC Data Flow Diagram


    The following table describes the data source details of the data files used to load MDM.

Source / Data Details

DS 1    Data from the source system's master client index table (all clients). Only clients whose last names started with "I" were loaded into the PoC. Client addresses not in this source system were obtained via a data extract from the appropriate source system.

DS 2    Monthly client extract (all clients obtained). Only data for clients whose last name started with "I" was loaded.

DS 3    Monthly extract file as of August 2009; clients and claims data. Only data for clients whose last name started with "I" was loaded.

DS 4    Two years (2008 and 2009) of client and lab data from this system. Statistics quoted are only for clients whose last name starts with "I".

    4. Results and Observations

Observations in this section apply specifically to the sample data extracts used in the PoC. Until further validation has been performed with subject matter experts, there is no clear indication that the types of issues identified are valid issues or that they currently exist in the source systems. In addition, conclusions about the extent of certain types of data patterns or problems should not be inferred across an entire data set or system.

In some cases, HHSC was aware of the observations (e.g., the inclusion of historical client records in the data sets provided). Since the client data from MDM that was joined with the claims data to produce analytical reports for this PoC was a small subset of the data obtained from source systems, some of the observations may be skewed or misrepresented due to the subset selected. It is therefore important that the results of this PoC be used to reach conclusions on the capabilities of an MDM solution rather than generalizations about the quality of the data itself.

    The following results and observations are intended to provide insight into:

    (1) the redundant data issues identified from the source sample data prior to the creation of the Master Client Index, and

    (2) the types of data issues encountered with each system that had to be addressed by the Master Data Management solution.


    Observations of Source Data Quality

    The table below summarizes the steps facilitated by the MDM software and the associated counts as data was loaded from the different sources into the MDM environment and the different software capabilities were used:

Data Load Description / Record Count
Raw client data starting with "I" for all 4 data sources (clients loaded into the consolidated table): 69,057
Duplicates within a data source dropped (sum for all 4 data sources): 5,620
Records dropped due to data issues within each data source (total of 4 data sources): 3,563
Records duplicated across data sources (updated existing MDM record with additional data source identifier): 7,298
Client records resulting from standardization & matching (clients loaded into master table): 52,576
Rows dropped due to invalid last name: 749
Total clients loaded into MDM: 51,827
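These counts reconcile arithmetically: 69,057 - 5,620 - 3,563 - 7,298 = 52,576 client records after standardization and matching, and 52,576 - 749 = 51,827 total clients loaded into MDM.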

Issues Addressed by Data Standardization

The MDM solution's data transformation step included data standardization. This data standardization process was required to address the following types of issues encountered with source system data prior to the creation of the MCI:

DS 1
• 15% of the records had a blank value in the Social Security Number field.
• 30% of the records contained filler information in the address fields (e.g., "Hurricane Ike", "Homeless", "Same as above").
• 75% of the addresses were blank in the data extracts utilized.
• Non-standard data entries within the address-related fields (i.e., address values spread across multiple columns).
• 11% of clients had multiple records within the same data source.

DS 2
• 1% non-standard address structures.
• 13% blank SSN.
• Clients with multiple records within the source.
• 2% contained clients over 18 years old (pregnant women).

DS 3
• Suspected invalid age data (client ages greater than 107 years).
• 28% of the records had non-standard address values.
• Records containing case numbers with zero values.
• 2% of clients had multiple records within the same data source.

DS 4
• Multiple date formats in date-related fields (mm/dd, mm/dd/yyyy, mm/dd/yy).
• 98% of the records had a blank value in the Social Security Number field.
• 91% of the addresses in the data extract used were blank.
• 21% of clients had multiple records within the same data source.

Across Source Systems
• Inconsistent formats for birthdates.
• Inconsistent formats for addresses (missing or incomplete address data components).

The tables below provide additional details on the number and percentage of data standardization and duplication issues encountered in the subset of data used for the PoC.

Invalid Address Structures (percent of records within each source)
DS 1: 16,628 (29.52%)   DS 2: 38 (1.24%)   DS 3: 1,000 (28.4%)   DS 4: 5,683 (92.46%)

Invalid Social Security Numbers (percent of records within each source)
DS 1: 9,186 (16.3%)   DS 2: 410 (13.42%)   DS 3: 67 (1.9%)   DS 4: 6,007 (97.73%)

Invalid Zip Codes (percent of records within each source)
DS 1: 41,832 (74.52%)   DS 2: 0 (0%)   DS 3: 1,702 (48.35%)   DS 4: 77 (1.25%)

Duplicate Records (percent of all 69,057 records loaded)
DS 1: 11,696 (16.93%)   DS 2: 284 (0.41%)   DS 3: 133 (0.19%)   DS 4: 4,367 (6.32%)

Duplicates Within Source (percent of records within each source)
DS 1: 6,916 (12.28%)   DS 2: 65 (2.13%)   DS 3: 60 (1.7%)   DS 4: 1,542 (25.09%)

Once the final MCI had been created, further analysis of the data using the software led to the following observations:

• Clients were identified as matching from 2 or 3 data sources; however, no single client was found in all 4 data sources used.
• 71% of clients from the DS 2 data set existed in the DS 1 data set (this doesn't imply that these clients were receiving benefits from both systems simultaneously; additional cross-referencing would be necessary).


    Data Matching Process

Matching of the client data sets involved the following steps:

• The client data set was grouped based upon predefined criteria.
• The grouped data was then matched against attributes to produce a statistical score on the likelihood that the records matched.
• Any data whose score was not sufficient to instill confidence that the records matched was retained for use in the next data matching iteration.

Below are the 3 groupings utilized in this PoC:

Grouping    Attributes Used
1st         First Name, Last Name, Street Name
2nd         First Name, Last Name, DOB
3rd         SSN

Data evaluated for each grouping used the following attributes in the data matching process: SSN, Last Name, First Name, Middle Name, DOB, Gender, Address, City, State, Zip, County, Region, Phone, and Source System Key.

As issues were identified in the data matching process, the MDM tool set allowed additional data matching rules to be defined. The overall observation was that any MDM solution implemented would require a flexible tool set that could be customized to address data matching needs.
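To illustrate the grouping-and-scoring process described above, the sketch below shows one matching iteration in simplified Python: records are blocked on a grouping's attributes, candidate pairs are scored with per-attribute weights, and records that never reach the threshold are carried into the next iteration. The weights, threshold, and field names are hypothetical assumptions, not the configuration used in the PoC.

```python
# Simplified sketch of one grouping/matching pass as described above. The weights,
# threshold, and field names are hypothetical, not the IBM tool configuration.
from collections import defaultdict
from itertools import combinations

WEIGHTS = {"ssn": 5.0, "last_name": 2.0, "first_name": 2.0, "dob": 3.0, "zip": 1.0}
MATCH_THRESHOLD = 7.0  # scores at or above this are treated as the same client

def blocking_key(rec, attrs):
    """Group records that agree on the grouping attributes (e.g. first name, last name, DOB)."""
    return tuple((rec.get(a) or "").strip().upper() for a in attrs)

def score(a, b):
    """Sum the weights of attributes on which two records agree and are non-blank."""
    return sum(w for attr, w in WEIGHTS.items() if a.get(attr) and a.get(attr) == b.get(attr))

def match_pass(records, grouping_attrs):
    """One iteration: block, score candidate pairs, return matches and leftover records."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec, grouping_attrs)].append(rec)

    matched_pairs, matched_keys = [], set()
    for group in blocks.values():
        for a, b in combinations(group, 2):
            if score(a, b) >= MATCH_THRESHOLD:
                matched_pairs.append((a["source_key"], b["source_key"]))
                matched_keys.update([a["source_key"], b["source_key"]])
    leftovers = [r for r in records if r["source_key"] not in matched_keys]
    return matched_pairs, leftovers

# Hypothetical example: the same person under different keys in two source systems
records = [
    {"source_key": "DS1:INDV_ID=987", "first_name": "ANA", "last_name": "IRWIN",
     "street": "MAIN ST", "dob": "1980-01-01", "ssn": "123456789", "zip": "78701"},
    {"source_key": "DS4:RECORD_KEY=42", "first_name": "ANA", "last_name": "IRWIN",
     "street": "MAIN ST", "dob": "1980-01-01", "ssn": "", "zip": "78701"},
]
for attrs in [("first_name", "last_name", "street"), ("first_name", "last_name", "dob"), ("ssn",)]:
    pairs, records = match_pass(records, attrs)   # unmatched records roll into the next pass
```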


    Use Case Results

The use cases identified used a sample (clients whose last names start with "I") of HHS data from all 4 data sources. Both operational and analytical use cases were utilized.

    Operational MDM Use Cases (Enterprise Master Client Index (MCI)):

• Identify and report on contradictory and/or overlapping attribute values per identified individual, and general data profiling information discovered in the analysis -- Successfully demonstrated.

• Demonstrate the ability to identify a suspected duplicate individual during an operational add of a new individual to the Master Data Management repository -- Successfully demonstrated.

• Demonstrate the potential capability to enable HHS applications to search, access, and update individual client information with service calls to the Master Data Management repository -- Successfully demonstrated.

• Single View of Client and Claims for auditing purposes -- Successfully demonstrated.

    Analytical MDM Use Case:

Show aggregated costs across both coverage programs at different levels of aggregation (aggregate by Plans, Services, Population demographics, etc.), including:

• Costs for the 100 Most Costly Medicaid Clients -- Successfully demonstrated. Five claim data files were manually loaded to a claim fact table (2 acute care and 3 CMS claims). The report was generated by joining the claim fact table with the customer table, which was a dimension out of MDM (see the sketch after this list).

• Determine Diabetic Clients Overdue for a Medical Screening -- Not performed.
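As a sketch of the claim-fact join referenced above, the report can be thought of as a merge on the MCI key followed by an aggregation. The table and column names below are hypothetical, not the PoC schema.

```python
# Minimal pandas sketch: join a claim fact table to the MDM client dimension on the
# Master Client Index key, then report the most costly clients. Hypothetical schema.
import pandas as pd

claim_fact = pd.DataFrame({                      # stand-in for the five loaded claim files
    "mci_key": ["MCI-1", "MCI-1", "MCI-2"],
    "claim_amount": [1200.0, 300.0, 95.0],
})
client_dim = pd.DataFrame({                      # client dimension sourced from MDM
    "mci_key": ["MCI-1", "MCI-2"],
    "last_name": ["IRWIN", "IBARRA"],
})

top_clients = (
    claim_fact.merge(client_dim, on="mci_key")   # MCI links claims to the single client view
              .groupby(["mci_key", "last_name"], as_index=False)["claim_amount"].sum()
              .nlargest(100, "claim_amount")     # "100 Most Costly Clients" style report
)
print(top_clients)
```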

    Collaborative MDM Use Case:

No collaborative use cases had been identified for this PoC at the onset of planning this exercise. Future PoC and assessment activities will need to validate these capabilities.


    5. Recommendations and Conclusion

Although the MDM Proof of Concept (PoC) was initially undertaken to show the viability of accurately matching or linking records across different data sources to establish a unified and contextually accurate view of an entity (client, patient, provider, etc.), it quickly became evident during the PoC that a number of other HHS initiatives could benefit from an Enterprise Master Data Management (EMDM) solution. It was determined that Master Data Management, combined with an Enterprise Data Warehouse or a data bank, might be utilized in the development of an enterprise-level information repository that could be considered for use on Health Information Exchange (HIE) initiatives. A single enterprise-level MDM system that handles the cleansing, standardization, and linking of client records for consistent data exchange with other nodes in the HIE network could prevent data mismatches during that process. This could effectively eliminate the need for individual agencies and/or departments to develop multiple data matching solutions (and matching algorithms) and interfaces with various trading partners (providers, physicians, the RHIE, and/or the National Health Information Network (NHIN)), and could avoid risks related to lack of data integrity and data corruption in the exchange processes.

It is important to note that, to date, several limited silo-centric MDM solutions or processes have been identified as currently being used within HHS. While some areas identified the need to enhance or upgrade these solutions and were interested in contributing requirements to an enterprise-level solution, other areas believed that their solutions or processes sufficed from their individual operational point of view. MDM solutions currently in use within HHS included:

• Informatica SSA
• Sun GlassFish
• An older version of Informatica SSA combined with custom code
• SPSS and Python based solution
• Custom code for matching clients between systems at an HHS agency.

The purpose of this PoC was not to facilitate the selection of a recommended technology or tool; that is recommended as a future next step. Rather, the assessment was to verify the availability of a comprehensive solution that could provide a complete spectrum of capabilities reflecting the needs in place today, while at the same time being scalable and using more current matching algorithms and techniques for future initiatives. It is important that a solution with this wide range of capabilities be assessed for the following reasons:

1. Current solutions and tools implemented were often chosen for a narrower, operation-specific set of requirements (e.g., batch processing only with no data stewardship) and were often driven by limited financial resources.


2. The different platforms available in the market, including those in use within HHS, use different matching mechanisms and logic that do not produce the same results across platforms. For example, the set of matching clients produced by one technology is not the same as that produced by another (although most may overlap). This does not represent a fully effective and consistent solution, as client matching mechanisms for an enterprise-level view of a client will yield different results and mismatches as data is pulled together from different sources, thus repeating the problem that master data management was supposed to solve.

For this reason, a solution that allows for flexibility in automation versus manual decision making through data stewardship, with the flexibility to centralize or decentralize data governance decision making processes, is important. This allows the owners of record at various levels of the organization to participate in the data management and provisioning processes. While these capabilities may exist in vendor tools and technology offerings (through additional modules), the current implementations of MDM tools in the enterprise do not reflect such sophistication and are therefore limited.


Recommendations

Conducting this PoC resulted in the following recommendations for implementing an Enterprise Master Data Management (EMDM) solution:

1. Identify requirements for a robust, comprehensive, and enterprise-level MDM solution.

The MDM solution selected needs to include a robust and comprehensive toolset representing the current and future needs of the enterprise. The current environment of multiple, limited solutions and implementations presents a barrier to enterprise-level master data management and, in turn, to an enterprise-level view of a client. That comprehensive toolset should have the ability to:

• Customize data standardization rules that could be applied similarly to both batch processes and real-time processes
• Analyze data sets and identify data quality concerns and inconsistencies
• Match data using an easily customized set of rules and weight factors
• Delegate data stewardship, with a user-friendly interface for review and processing of suspected duplicates identified by the data matching process that require human intervention for final determination
• Capture end-to-end metadata (or data about data) to show data lineage (where data comes from) and impact analysis (how adding or changing data will affect existing data)
• Interact with standard, authenticated data sources, like USPS (US Postal Service) to verify addresses and SSA (Social Security Administration) to check death records
• Provide capabilities to efficiently create standardized data sets that will be used downstream to exchange data with external entities, e.g., to adapt to various electronic data exchange standards, including X-12, HL7, etc.

Implementing an MDM solution with a robust toolset providing the functionality described above decreases the amount of manual record matching needed and, when configured effectively, reduces mismatched records. Whether a mismatch results in incorrectly merging records for what were in fact unique clients or in a failure to merge records that were duplicates, the impact can be costly:

• Not merging duplicate records can result in clients receiving benefits to which they are not entitled, which could lead to another client receiving fewer benefits due to insufficient funding.

• Incorrectly merging client records could result in the inappropriate disclosure of sensitive/confidential data.

• When an incorrectly merged client record is split, the process is complex because multiple transactions for two people may be recorded as one person, and it may not be clear which transaction was entered for which person.


    2. Data Governance

Organizational infrastructure and processes need to be committed to data governance. An MDM solution cannot replace all components of master data management; however, it can minimize the resources currently being used to perform MDM-related or MDM-like activities. Before designing and implementing an MDM solution, a data governance team needs to be in place to:

• Identify and prioritize the data elements / attributes to be captured and maintained within the MDM repository.
• Identify the data matching attributes and qualitative scoring that will be used to determine unique client criteria.
• Clearly define system-of-record precedence when matching and merging records (which system has the best source of data for each data element).
• Define data standardization rules to be enforced via automation, such as changing all instances of "Street" to "ST" to standardize an address so that the USPS can validate it (see the sketch after this list).
• Identify and establish the process and data steward team that will have the authority to handle suspect processing and the criteria for where data should be corrected (i.e., when client discrepancies exist, decide whether to automate the data correction process with source systems or manage the corrected data in MDM with a notification to source systems that the client exists).
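As referenced in the standardization item above, such rules can be expressed as a small, centrally governed mapping. The sketch below is a simplified illustration only, not a USPS-certified address standardizer.

```python
# Illustrative address standardization rules of the kind a data governance team would
# define centrally (e.g. "Street" -> "ST"). Not a USPS-certified standardizer.
import re

SUFFIX_RULES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}

def standardize_address(address: str) -> str:
    tokens = re.split(r"\s+", address.strip().upper())
    return " ".join(SUFFIX_RULES.get(tok.rstrip("."), tok.rstrip(".")) for tok in tokens)

print(standardize_address("123 Main Street"))   # -> "123 MAIN ST"
```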

A strong data governance structure is needed to ensure the accuracy of the data. An MDM solution provides the technical means by which data can be managed to facilitate client matching across the enterprise; it does not address the data ownership and decision-making structure required to accurately process and consolidate enterprise data. A team composed of data experts from each agency is needed to work together to develop enterprise data-related rules (such as using USPS standards for entering addresses), to take ownership of the data and ensure those rules are enforced, to address issues and concerns, and to govern the process as new elements are identified for inclusion in the MDM repository.

    3. Perform cost/benefit analysis (CBA) to determine the true implementation cost.

This PoC did not include a cost/benefit analysis (CBA) component. As this PoC dealt only with a small subset of HHS data, further analysis is required to determine the true cost of implementing an enterprise MDM solution (i.e., license costs, staffing costs, and hardware infrastructure).


4. Perform an additional proof of concept or pilot projects for MDM.

An additional PoC could be conducted with another recognized MDM solution provider to assess the viability of the technology for collaborative MDM and other use cases not verified during the first proof of concept project. This recommendation suggests performing a PoC on a larger set of data with more data sources to gain a better understanding of the data issues that might be encountered with a full implementation. In addition, to properly assess the viability and maturity of the MDM concept, the PoC should be conducted with a different tool or technology. To maximize the value of the PoC by working with industry-leading vendors recommended by independent research, it should be performed using technologies such as Initiate Systems or Oracle, as recommended by Gartner Research. However, a constraining factor may be that other vendors in the MDM market may not have the financial resources to perform such a PoC, so there is a risk that proving a solution on a full set of data may not be feasible. Such an effort will also require organizational support and staff resources to facilitate oversight of the PoC, which may also be a constraining factor. In order to obtain meaningful results to support a purchasing decision, it may be necessary to allocate funds to participate in an actual pilot rather than a PoC.

    5. Educate and present results of the MDM PoC to various user communities

A significant number of users and technical operations continue to maintain that MDM solutions should be implemented at a local level with cheaper solutions. Often, this approach is recommended due to a lack of understanding or awareness of the overall enterprise need, or a lack of understanding of downstream processes by areas that may be consumers of their data. In addition, issues of control, limited budget, and speed of execution drive the decision to choose a local, siloed implementation. This recommendation advocates presenting the MDM PoC results and educating the user community to achieve a broader vision of how MDM can positively impact agency-wide operations and could be cost-effectively implemented across the enterprise through cost sharing.


Conclusion

From a technical standpoint, MDM was assessed to be a viable solution for the problem of matching and linking clients/patients across different programs and systems with a high degree of trust. An MDM solution implemented at the enterprise level could potentially play an integral part in the success of other HHS initiatives, including the EDW initiative and HIE/HIT, and could support and facilitate enterprise-level data governance operations. The MDM concept needs to be assessed in more detail from the standpoint of collaborative analytics. A successful Enterprise MDM implementation will require substantial planning and investment, not only in the software/hardware environment but also in establishing a supporting governance structure. To establish the viability of implementation, further research needs to be done on the financial viability of an enterprise-level solution. In addition, the performance capabilities of MDM solutions need to be researched through case studies of MDM implementations in other enterprises. An additional PoC and/or pilot should be undertaken before a final MDM solution is selected, so that tool set capabilities can be compared and the magnitude of effort required when working with a larger data set can be assessed.


    Appendices


    Appendix A Glossary

    Term / Acronym Definition

    BI Business Intelligence

    CBA Cost Benefit Analysis

    DOB Date of Birth

    EDW Enterprise Data Warehouse

    EIT Enterprise Information Technology

    EMDM Enterprise Master Data Management

    FTE Full-Time Employee

    HHS Health and Human Services

    HHSC Health and Human Services Commission

    HIE Health Information Exchange

    HIT Health Information Technology

    MCI Master Client Index

    MDM Master Data Management

    MITA Medicaid Information Technology Architecture

    MOU Memorandum of Understanding

    NHIN National Health Information Network

    PoC Proof of Concept

    RHIE Regional Health Information Exchange

    SME Subject Matter Expert

    SOA Service-Oriented Architecture

    SSN Social Security Number
