ATHABASCA UNIVERSITY
Applying Fuzzy Logic for Data Governance
BY
XiaoHai Lu
A project submitted in partial fulfillment
Of the requirements for the degree of
MASTER OF SCIENCE in INFORMATION SYSTEMS
Athabasca, Alberta
November, 2014
© XiaoHai Lu, 2014
DEDICATION
This essay is dedicated to my supportive wife Winnie and my boys Andrew and Michale.
ABSTRACT
Every day, as we browse the internet, we consume big data from the various search
engines and social networks that we visit. Like individuals, enterprises also confront
a vast stream of information from individuals, communities, corporations, and
governments. Vast volumes of information, long retention cycles, and high-velocity
decision-making have the potential to derail the usefulness of information and do
more damage than good to enterprises. The axiom 'better data means better
decisions' becomes critical. Without solid data governance in place, data can be
inaccurate and unfit for use.
This essay will describe the history and future of data governance. It will also
explain the current process of data governance before demonstrating a prototype of
a data governance application in the banking industry.
Data governance processes such as matching and linking related records require
mathematical support in the decision-making process. Fuzzy logic, an approach to
computing based on varying degrees of truth, was found to be a good solution to this
issue. As such, this essay successfully applies fuzzy logic to streamline these
processes, reduce human intervention, and improve the quality of governed data.
ACKNOWLEDGMENTS
I thank all who were involved in the support and review process of this essay. Without
their support, the essay could not have been satisfactorily completed.
Thanks go to all those who provided their insightful and constructive comments, in
particular, to Professor Richard Huntrods of Athabasca University, who provided
priceless suggestions and feedback on my essay.
Table of Contents
DEDICATION...........................................................................................................................................2
ABSTRACT...............................................................................................................................................3
ACKNOWLEDGMENTS.........................................................................................................................4
CHAPTER 1 – INTRODUCTION............................................................................................................7
Data Governance: The History..............................................................................................................7
Data Governance: The current literature on the topic...........................................................................8
Data Governance: The Future...............................................................................................................9
CHAPTER 2 – DATA GOVERNANCE PROCESS................................................................11
Data Governance Process....................................................................................................................11
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS.....................................................................43
The Potential Overlay Task:................................................................................................................43
Match Duplicate Suspects to Create a New Master Record:...............................................................44
Link Related Records from Multiple Sources:....................................................................................45
CHAPTER 4 – FUZZY LOGIC...............................................................................................48
Traditional Logic:................................................................................................................................48
Fuzzy Logic History............................................................................................................................51
The Basic Concept of Fuzzy Logic ....................................................................................................52
A Fuzzy Implementation:....................................................................................................................52
Brief Discussion:.................................................................................................................................57
CHAPTER 5 - CONCLUSIONS.............................................................................................................57
References................................................................................................................................................58
List of Figures
Figure 1: Data Governance Process.........................................................................................................11
Figure 2: MDM Process...........................................................................................................................20
Figure 3: MDM Initial Load Process.......................................................................................................24
Figure 4: MDM Delta Load Process........................................................................................................26
Figure 5: Quality Stage Initial Load Process...........................................................................................29
Figure 6: Quality Stage Delta Load Process............................................................................................29
Figure 7: Case 5.......................................................................................................................................43
Figure 8: Case 3.......................................................................................................................................45
Figure 9: Case 2.......................................................................................................................................46
Figure 10: Cases ......................................................................................................................................47
Figure 11: Training Set.............................................................................................................................49
Figure 12: Traditional Decision Tree.......................................................................................................51
Figure 13: Fuzzy MF................................................................................................................................52
Figure 14: Traditional Decision Tree.......................................................................................................55
Figure 15: Decision Matrix......................................................................................................................56
CHAPTER 1 – INTRODUCTION
Data Governance: The History
Data governance is an emerging discipline with an ever evolving definition. The
discipline embodies a convergence of data quality, data management, data policies, business
process management, and risk management surrounding the handling of data in an
organization.1 The central point of this definition of data governance is related to data quality.
From the point of view of businesses, data governance needs to be able to provide qualified
information. The data governance process is the practice of transforming data into qualified
information, which can be used by businesses. Incidentally, the concept of data governance
has been around since the beginning of relational databases. Data is stored across
referenced tables. Businesses can retrieve information by joining the data through cross
referencing those tables. With the growth of information technology, databases gradually
became a central part of information systems. In order to insert qualified data into databases,
data governance extended from the database into a set of processes known as extract,
transform, and load (ETL), which provide databases with clean, accurate, and timely data
feeds. New terms such as metadata, data source, target, and staging emerged with the ETL
approach. There are numerous ETL tools available on the market, such as Informatica and
Ab Initio. However, the motivation for ETL comes from an
information technology (IT) perspective and focuses on IT techniques. In 2004, IBM began
to introduce data governance as a discipline for treating data as an enterprise asset.3 Data
has to be treated like other financial assets, just as one would treat plant and equipment. A
data inventory is required for enterprises with existing data, in much
the same way as inventories are needed for physical assets. Preventing unauthorized
changes to critical data should also be considered, since such changes can affect the
integrity of financial reporting, as well as the quality and reliability of daily business
decisions.3 Protecting sensitive data and intellectual property from both internal and
external threats is another element that falls under data governance. Since data is a
business asset, the question of how to maximize its value also falls under the umbrella of
data governance.
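The ETL processes described above can be sketched in a few lines; the example below is purely illustrative (the record layout and cleansing rules are invented for the sketch, not taken from Informatica or any other tool):

```python
# Minimal ETL sketch: extract raw records, transform (cleanse), load into a target.
raw_rows = [
    "  alice smith ,416-555-0101",   # extract: raw, inconsistently formatted source data
    "BOB JONES,(416) 555-0102",
]

def transform(row):
    """Standardize a 'name,phone' record: trim, title-case the name, keep digits only."""
    name, phone = row.split(",", 1)
    return {
        "name": " ".join(name.split()).title(),
        "phone": "".join(ch for ch in phone if ch.isdigit()),
    }

# The "load" step would write these rows to a staging table or target database.
target = [transform(r) for r in raw_rows]
```

The point of the sketch is only the shape of the pipeline: dirty source rows in, standardized records out, ready for loading into a target system.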
Data Governance: The current literature on the topic
As an emerging technology, data governance has been supported mainly by business
vendors rather than academic research. For example, a query on the subject
“data governance” in the ACM Digital Library (Association for Computing Machinery) yields
only 2824 results (queried on Aug 27, 2014). In contrast, the same query on Google yields
36,200,000 results (queried on Aug 27, 2014). The technologies
pushed by business vendors share common challenges, such as broad fundamental
concepts whose aspects are emphasized differently by each vendor. For example, Oracle
does not buy into the unified processes introduced in the IBM white paper. In addition, the
concepts and practices of data governance are still overshadowed by their predecessors,
such as ETL, data warehouse, and ERP products. “MDM is effectively Data
Warehousing branded with ERP market rhetoric and contains an added repository of 'master
data'. We see MDM as another attempt at data integration due to the failure of previous Data
Warehousing, ERP and ERPII/BI initiatives.”17 Although many companies prefer specialized
MDM solutions, the three main players in the MDM market are IBM, Oracle and SAP.
Data Governance: The Future
Data governance is constantly evolving and morphing into new forms. This evolution has
produced a next generation of data that is beginning to enter companies.
Unlike traditional data, next-generation data will be part of companies' daily routine.
For example, when we make a cellphone call, the relationship data (which includes the
caller's name, phone number, and location) will have been collected. Likewise, the
transactional data (which includes the time and duration of the call) will have
been collected as well. Such big data is not limited to mobile data; it also includes GPS
coordinates, location-awareness data, and social interactions on networks such as LinkedIn
and Facebook. The way that next-generation data is captured through the cloud will definitely
change the way we deal with traditional data. It's one thing to be flooded with big data; it's
another thing to be able to make sense of it and then be able to act on it or make
recommendations for a human or another system to act on it.6 Big data by itself is merely
unstructured data, as we have to analyze the data in order to understand it. MDM and data
governance processes will make the analysis more efficient. Through data governance's
identity resolution, we can have a single view of an entire company's data. With data
governance, we will not be drawn by next generation big data; however, we can understand
their relationship and react on it quickly.
Big data and the cloud, which generate and deliver real-time data, will require us to react in
real time, while next-generation data governance will help us with understanding and reacting
to real-time data.
In addition, unlike traditional data, big data may be owned by a number of brokers or third
parties. The next-generation data governance process should also have the ability to accept
different protocols.
CHAPTER 2 – DATA GOVERNANCE PROCESS
Data Governance Process
Below is a diagram detailing the process of data governance by IBM: 6
Figure 1: Data Governance Process
Note. Adapted from “The IBM Data Governance Unified Process” by Sunil Soares, 2010, p. 8. Copyright 2010 by MC Press
Online, LLC. Adapted with permission.
1) Define the business problem
The main reason for the failure of data governance programs is that they do not identify a
tangible business problem. It is imperative that the organization defines the initial scope of the
data governance program around a specific business problem, such as a failed audit, a data
breach, or the need for improved data quality for risk-management purposes. Once the data
governance program begins to tackle the identified business problems, it will receive support
from the business functions to extend its scope to additional areas.
2) Obtain executive sponsorship
It is important to establish sponsorship from key IT and business executives for the data
governance program. The best way to obtain this sponsorship is to establish value in terms of
a business case and quick hits. For example, the business case might be focused on
householding and name-matching in order to improve the quality of data to support a
customer-centricity program.
3) Conduct a maturity assessment
Every organization needs to conduct an assessment of its data governance maturity,
preferably on an annual basis. The IBM Data Governance Council has developed a maturity
model based on 11 categories (discussed in Chapter 5), such as Data Risk Management and
Compliance, Value Creation, and Stewardship. The data governance organization needs to
assess the company’s current level of maturity (current state) and the desired future level of
maturity (future state). The company's future state is usually projected at a time frame
spanning 12 to 18 months ahead. This duration must be long enough to produce results.
However, at the same time, it must be short enough to ensure the continued buy-in from key
stakeholders.
4) Build a road map
The data governance organization needs to develop a roadmap to bridge the gap between
the current state and the desired future state for the eleven categories of data governance
maturity. For example, the data governance organization might review the maturity gap for
stewardship and determine that the enterprise needs to appoint data stewards who will focus
on targeted subject areas such as the customer, vendor, and product. The data governance
program also needs to include quick hit areas where the initiative can drive near-term
business value.
5) Establish an organizational blueprint
The data governance organization needs to build a charter to govern its operations, and to
ensure that it has enough authority to act as a tiebreaker in critical situations. Data
governance organizations operate best in a three-tier format. The top tier is the data
governance council, which consists of the key functional business leaders who rely on data
as an enterprise asset. The middle tier is the data governance working group, which consists
of middle managers. The final tier consists of the data stewardship community, which is
responsible for the quality of the data on a day-to-day basis.
6) Build a data dictionary
The effective management of business terms can help ensure that the same descriptive
language applies throughout the organization. A data dictionary or business glossary is a
repository with definitions of key terms. It is used to gain consistency and agreement between
the technical and business sides of an organization. For example, what is the definition of a
“customer”? Is a customer someone who has made a purchase, or someone who is
considering a purchase? Is a former employee still categorized as an “employee”? Are the
terms “partner” and “reseller” synonymous? These questions can be answered by building a
common data dictionary. Once implemented, the data dictionary can span the organization to
ensure that business terms are tied via metadata to technical terms and that the organization
has a single, common understanding.
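At its simplest, a data dictionary is a shared lookup of agreed definitions for business terms. The following sketch illustrates the idea; the definitions shown are examples of the decisions an organization might record, not prescriptions:

```python
# A data dictionary as a simple shared repository of business-term definitions.
data_dictionary = {
    "customer": "A party who has completed at least one purchase.",
    "employee": "A party with an active employment contract; former employees excluded.",
    "partner":  "See 'reseller' (the terms are synonymous in this organization).",
}

def define(term):
    """Look up the single, agreed definition of a business term."""
    return data_dictionary.get(term.lower(),
                               "UNDEFINED: raise with the data governance council")
```

Because every application consults the same repository, the question "is a former employee still an employee?" is answered once, in one place, for the whole organization.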
7) Understand data
Someone once said, “You cannot govern what you do not first understand.” Few applications
stand alone today. Rather, they are made up of systems, and “systems of systems”, with
applications and databases spread across the enterprise yet integrated, or at least
interrelated. The relational database model worsens the situation by fragmenting business
entities for storage. So how is everything related? The data governance team needs to
discover the critical data relationships across the enterprise. Data discovery may include
simple and hard-to-find relationships, as well as the locations of sensitive data within the
enterprise’s IT systems.
8) Create a metadata repository
Metadata is data that describes other data. It is information
regarding the characteristics of any data artifact, such as its technical name, business name,
location, perceived importance, and relationships to other data artifacts in the enterprise. The
data governance program will generate a lot of business metadata from the data dictionary
and a lot of technical metadata during the discovery phase. This metadata needs to be stored
in a repository so that it can be shared and leveraged across multiple projects.
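A metadata repository entry can be sketched as a small record carrying exactly the characteristics listed above (technical name, business name, location, importance, and relationships). The artifact names below are invented for illustration:

```python
from dataclasses import dataclass, field

# One metadata record per data artifact, with the characteristics described above.
@dataclass
class MetadataRecord:
    technical_name: str
    business_name: str
    location: str
    importance: str = "medium"
    related_to: list = field(default_factory=list)   # links to other artifacts

repository = {}   # the shared metadata repository, keyed by technical name

def register(record):
    """Store a record so it can be shared and leveraged across projects."""
    repository[record.technical_name] = record

register(MetadataRecord("CUST_MSTR", "Customer Master", "MDM DB",
                        "high", ["ACCT_MSTR"]))
```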
9) Define metrics:
Data governance needs to have robust metrics to measure and track progress. The data
governance team must recognize that when something is measured, performance improves.
As a result, the data governance team must pick a few key performance indicators (KPIs) to
measure the ongoing performance of the program. For example, a bank will want to assess
the overall credit exposure by industry. In that case, the data governance program might
select a percentage of null Standard Industry Classification (SIC) codes as a KPI, to track the
quality of risk management information.
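The null-SIC-code KPI described here reduces to a simple percentage. A sketch with invented sample records:

```python
# KPI sketch: percentage of customer records with a missing (null) SIC code.
customers = [
    {"id": 1, "sic_code": "6021"},
    {"id": 2, "sic_code": None},
    {"id": 3, "sic_code": "6022"},
    {"id": 4, "sic_code": None},
]

def null_sic_pct(records):
    """Share of records whose SIC code is null, as a percentage."""
    missing = sum(1 for r in records if r["sic_code"] is None)
    return 100.0 * missing / len(records)

kpi = null_sic_pct(customers)   # track this figure month over month
```

Reporting this single number to stakeholders each month is exactly the kind of measurable, ongoing indicator the step calls for.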
10) Govern master data
The most valuable information within an enterprise, which is critical data about customers,
products, materials, vendors, and accounts, is commonly known as master data. Despite its
importance, master data is often replicated and scattered across business processes,
systems, and applications throughout the enterprise. Governing master data is an ongoing
practice, whereby business leaders define the principles, policies, processes, business rules,
and metrics for achieving business objectives, by managing the quality of their master data.
Challenges regarding master data tend to bedevil most organizations, but it is not always
easy to get the right level of business sponsorship to fix the root cause of the issues. As a
result, it is important to justify an investment in a master data initiative. For example, consider
an organization such as a bank, which sends multiple pieces of mail to the same household.
The bank can establish a quick return on investments by cleansing its customer data to create
a single view of the “household.” The bottom line is that the vast majority of data governance
programs deal with issues around data stewardship, data quality, master data, and
compliance.
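The householding example can be sketched as grouping customer records by a normalized address, so the bank sends one mailing per household instead of one per record. The normalization below is deliberately crude and the records are invented:

```python
from collections import defaultdict

# Householding sketch: group customers whose addresses normalize to the same key.
customers = [
    {"name": "A. Wong",  "address": "12 Main St, Toronto"},
    {"name": "B. Wong",  "address": "12 MAIN ST., TORONTO"},
    {"name": "C. Patel", "address": "99 King Rd, Ottawa"},
]

def normalize(addr):
    """Crude normalization: lowercase, strip punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in addr.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

households = defaultdict(list)
for c in customers:
    households[normalize(c["address"])].append(c["name"])
```

In practice address standardization is far more involved (abbreviations, postal validation), but even this toy version collapses the two Wong records into a single household view.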
11) Govern analytics
Enterprises have invested huge sums of money to build data warehouses to gain competitive
insight. However, these investments have not always yielded results. As a consequence,
businesses are increasingly scrutinizing their investments. We define the “analytics
governance” track as the setting of policies and procedures to better align business users
with the investments in analytic infrastructure. Data governance organizations need to ask the
following questions:
❏ How many users do we have for our data, by business area?
❏ How many reports do we create, by business area?
❏ Do the users derive value from these reports?
❏ How many report executions do we have per month?
❏ How long does it take to produce a new report?
❏ What is the cost of producing a new report?
❏ Can we train the users to produce their own reports?
Many organizations will want to set up a Business Intelligence Competency Centre (BICC) to
educate users, increase business intelligence, and develop reports.
12) Manage security and privacy
Data governance leaders, especially those who report to the chief
information security officer, often have to deal with issues around data security and privacy.
Some of the common data security and privacy challenges include:
❏ Where is our sensitive data?
❏ Has the organization masked its sensitive data in non-production
environments (for example, in development, testing, and training) to comply with privacy
regulations?
❏ Are database audit controls in place to prevent privileged users, such as DBAs, from
accessing private data, such as employees' salaries and customer lists?
13) Govern the information lifecycle
Unstructured content makes up more than 80 percent of the data within
the typical enterprise. As organizations move from data governance to
information governance, they start to consider the governance of this
unstructured content.
The lifecycle of information starts with data creation and ends with
its removal from production. Data governance organizations have to deal with the following
issues regarding the lifecycle of information:
❏ What is our policy regarding digitizing paper documents?
❏ What is our records management policy for paper documents,
electronic documents, and email? (In other words, which documents do
we maintain as records and for how long?)
❏ How do we archive structured data to reduce storage costs and improve performance?
❏ How do we bring structured and unstructured data together under a
common framework of policies and management?
14) Measure the results:
Data governance organizations must ensure continuous improvement by constantly
monitoring metrics. In step nine, the data governance team sets up the metrics. In this step,
the data governance team reports to senior stakeholders on the progress of those metrics
from IT and the business.
Data Governance Business Application
Today, banking systems establish and maintain line of business (LoB) specific customer
views with associated accounts and product holdings, either in product systems or in LoB-
specific Customer Information Files (CIFs). Thus, the customer, account, and product
relationship information resides in information silo applications. This limits the ability to
understand the customer holistically (across LoBs) and does not provide an enterprise view of
the customer.
The Master Data Management (MDM) initiative enables a complete 360-degree operational
view of customers across the bank (enterprise goal). At the target state, the key capabilities of
MDM are to:
Provide consistent and accurate data about essential business entities derived from a
single trusted source.
Uniquely identify a customer and all the associated relationships/holdings with the
bank, based on the customer's privacy preferences
To achieve the target state objective, the MDM solution will integrate/interface between the
numerous LoB specific applications, consolidate the data, and create a single golden master
record.
Below is a typical data governance business (Master Data Management) application diagram:
Figure 2: MDM Process
The solution overview diagram clearly depicts the various sub-systems in the solution. At a
high level, the entire solution is classified into the following layers:
Presentation Layer
OCIF Sub-system
Data Integration and Quality Layer
Application Layer and
Database Layer
Presentation Layer
The presentation layer of the solution comprises the user interface applications. The
following user interface applications are included:
Reporting User Interface
Data Stewardship User Interface
Business Administration User Interface
The Reporting user interface will generate business and stewardship reports on the data
available in MDM. The Data Stewardship user interface will provide various options for
working with customer information, along with searching for and handling duplicate or
potentially duplicate customers. The Business Administration user interface will manage
reference data and other metadata in the MDM database.
OCIF Sub-System
The OCIF is an existing authoritative operational source of customer information used by
multiple systems. This sub-system is presently considered the ‘Book of Record’ in the
enterprise. The key objective of this system is to create and maintain standardized and
consistent customer information across systems, to reduce potential duplicate customers,
and to significantly improve customer data integrity so that it can be treated as a single
source of truth. In the current solution context, this system is the only source system from
which customer information will be loaded into the MDM database. Based on the solution
overview diagram, there will be two approaches to data synchronization between the
systems. These are:
The initial load – the entire content of the database
The delta load – the difference in content between the previous day and the current day
To populate data into MDM from OCIF, an OCIF component/utility is required to extract the
required data. This new component will be responsible for providing extracts on a daily
basis, which will be the input for downstream sub-systems to transform and load into MDM,
thus synchronizing the two systems.
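The delta load described above (the difference between the previous day's extract and the current day's) can be sketched as a comparison of two snapshots. The customer keys and field layout below are invented for the example, not taken from the actual OCIF extract format:

```python
# Delta-load sketch: the delta is the set of records that are new or changed
# between yesterday's extract and today's extract, keyed by customer id.
yesterday = {"C1": "Alice Smith|Toronto", "C2": "Bob Jones|Ottawa"}
today     = {"C1": "Alice Smith|Toronto",      # unchanged: not part of the delta
             "C2": "Bob Jones|Hamilton",       # changed:   part of the delta
             "C3": "Eve Lin|Calgary"}          # new:       part of the delta

delta = {k: v for k, v in today.items() if yesterday.get(k) != v}
```

Only the delta is pushed downstream, which is why the daily synchronization is far cheaper than repeating the initial load.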
Information Integration Layer
Information Integration Layer – Data Stage
The information integration layer is a key component that is responsible for integrating OCIF
and the MDM server application. The data format provided by OCIF is not compatible with the
MDM server and hence is not directly consumable. DataStage, as part of the integration
layer, is responsible for transforming OCIF extracts into an MDM-specific format.
The key objectives of this layer are to:
Read extracts provided by the source system
Transform the extract in the required format based on a synchronization
mechanism
Transform the extract file in the format required by the data quality component
for standardization during the initial load
Transform reference values to MDM-specific codes depending on the source
system reference value
Load the transformed data into a database/file
The IIS DataStage component is responsible for reading extracts from the source system,
transforming them into SIF format and pushing data into MDM in two different ways:
Directly into the database during the initial load
Writing into files (ExSIF) for the delta load
The following sections detail the approaches to be followed in the ETL layer.
The diagram below describes the high level steps to be performed in DataStage during the
initial data load.
1. A custom DataStage extract job will be developed to read extract files from the ETL
receiving zone and parse each record, based on the record type and sub-type, into
individual records of the SIF format, which is a pipe-delimited standard interface file.
2. Validation jobs will be responsible for data standardization. They will also perform SIN
validation and phone number validation. Any failed record information will be logged
into an error log file through error handling jobs.
3. An ETL job will be invoked to populate a separate file for standardization which will be
Figure 3: MDM Initial Load Process
used by QualityStage. The above steps will generate the SIF files for consumption of
the BIL jobs.
4. The BIL import job imports the SIF file for processing.
5. A validation job validates the code column value and invokes error handling framework
jobs in the case of failure. In such scenarios, the records that caused the issue are
dropped from the requested SIF file. Based on the strategy of the initial load, the
dropping of records is minimized so that MDM is synchronized with the source system
to the highest degree possible.
6. The party referential integrity validation job ensures every party has either a valid
PersonName or OrgName record and also verifies that a valid party record exists for
the “Provided By” Source System Key (SSK).
7. The BIL consists of one job for each Record Type or Sub Type (RT/ST) that performs
key assignment and database loading. For example, the Contact key assignment job
assigns CONT_ID, PERSON_ID, ORG_ID and CONTEQUIV_ID to CONTACT,
PERSON, ORG and CONTEQUIV records respectively and inserts them into the MDM
database. Before loading the records into MDM, an MDM Involved Party ID will be
generated within ETL jobs. At a high level, the new MDM Involved Party ID will be of
an 18 character length where the first 2 characters will imply the version of the BIL and
the last 16 characters will be a random number.
8. The data quality error consolidation process reads the data quality error files created
during the import SIF, validation, and referential integrity validation phases and drops
any records associated with the records in the error file.
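Two of the steps above lend themselves to a short sketch: parsing a pipe-delimited SIF record into named fields (step 1), and generating the 18-character MDM Involved Party ID, a 2-character BIL version followed by 16 random digits (step 7). The field names below are invented for illustration; the real SIF layout is defined by the MDM product:

```python
import random

def parse_sif(line, fields):
    """Split one pipe-delimited SIF record into named fields (illustrative layout)."""
    return dict(zip(fields, line.split("|")))

def new_party_id(bil_version="01", rng=random):
    """18 characters: 2-character BIL version + 16 random digits."""
    return bil_version + "".join(str(rng.randint(0, 9)) for _ in range(16))

record = parse_sif("PERSON|C100|William|Chen",
                   ["record_type", "ssk", "first", "last"])
party_id = new_party_id()
```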
The diagram below describes the high level steps to be performed in DataStage during the
delta data load.
1. The custom DataStage extract jobs from the initial load process will be re-used to
read extract files from the ETL receiving zone and to parse each record based on the
record type or sub-type.
2. Data validation jobs are responsible for CII data standardization. They will also perform
SIN validation and phone number validation. Any failed records will be logged into the
Figure 4: MDM Delta Load Process
log file through an error handling mechanism. The above two steps essentially
generate the SIF files for consumption of the BIL asset.
3. The DataStage import job imports the SIF file for processing.
4. Applicable business transformation rules are invoked using DataStage transformation
jobs which are responsible for generating extended SIF files for MDM to consume.
Errors are logged using DataStage's out-of-the-box error handling mechanism for further
analysis and action.
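The SIN validation mentioned in step 2 is commonly implemented with the Luhn checksum, which valid Canadian SINs satisfy. A minimal sketch of the check (not the actual DataStage job logic):

```python
def luhn_valid(number):
    """Luhn checksum, commonly used to validate 9-digit Canadian SINs."""
    digits = [int(d) for d in str(number)]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9          # equivalent to summing the two digits
        total += d
    return total % 10 == 0
```

A record whose SIN fails this check would be routed to the error log file rather than loaded into MDM.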
Data Quality Management – Quality Stage
The master data hub solution provides complete, accurate, standardized information
about the customers stored in the MDM system. Even though OCIF maintains its own data
quality, customer attributes need further standardization before they are stored in MDM, as
MDM will be the single version of truth on customer data throughout the enterprise. The
QualityStage component is primarily responsible for data standardization, improving the
overall quality of the enterprise data asset, and identifying duplicate or potentially
duplicate customers. The current solution gives QualityStage the following objectives:
Standardize name and address related attributes
Validate and correct customers' addresses with the Canada post address
repository implemented through SERP
Perform probabilistic matching to identify potential duplicate customers
The IIS QualityStage component is responsible for maintaining data quality stored in MDM.
The key objectives of QualityStage are:
Name and address standardization
Identifying duplicate/potentially duplicate customers
Matching
The diagram below describes the high level steps to be performed during the initial load.
The diagram below describes the high level steps to be performed during the delta load.
Individual Customer Name Standardization
This standardization procedure will receive an individual name from MDM before processing
the individual name through the MNNAME rule set. The MNNAME rule set will parse the
individual name into separate name elements and create an analysis value or phonetic
representation value for the first and last name of the individual.
Figure 5: Quality Stage Initial Load Process (flow components: Source Extract File, Source-to-SIF DS jobs, QS Stan jobs, MDM code conformation, MDM DB)
Figure 6: Quality Stage Delta Load Process (flow components: Source Extract File, Source-to-SIF DS jobs, QS Stan jobs with and without code value conformation, MDM code conformation, MDM DB, MDM)
For example:
If an individual by the name of “Mr William Chen” was passed to the individual
standardization procedure, this would be the standardization result.
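The MNNAME rule set itself is internal to QualityStage, so its exact output cannot be reproduced here. As a rough stand-in, the sketch below strips a title, splits the name into first and last elements, and derives a classic Soundex code as the phonetic representation; the title list and the choice of Soundex (rather than QualityStage's own phonetic algorithm) are assumptions.

```python
# Illustrative stand-in for name standardization; the real MNNAME parsing
# and phonetic rules are internal to QualityStage.
TITLES = {"MR", "MRS", "MS", "DR", "MISS"}

def soundex(name: str) -> str:
    """Classic Soundex code: first letter plus three digits."""
    codes = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3",
             "L": "4", "MN": "5", "R": "6"}
    name = name.upper()
    result = name[0]
    last = next((d for k, d in codes.items() if name[0] in k), "")
    for ch in name[1:]:
        digit = next((d for k, d in codes.items() if ch in k), "")
        if digit and digit != last:
            result += digit
        if ch not in "HW":      # H and W do not reset the previous code
            last = digit
    return (result + "000")[:4]

def standardize_individual(raw: str) -> dict:
    words = [w for w in raw.upper().split() if w.strip(".") not in TITLES]
    first, last = words[0], words[-1]
    return {"first_name": first, "last_name": last,
            "first_phonetic": soundex(first), "last_phonetic": soundex(last)}

print(standardize_individual("Mr William Chen"))
# {'first_name': 'WILLIAM', 'last_name': 'CHEN',
#  'first_phonetic': 'W450', 'last_phonetic': 'C500'}
```

Note how the phonetic codes make near-homophones comparable: "Smith" and "Smyth" both map to S530, which is exactly the property the matching process later relies on.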
Organizational Customer Name Standardization
This standardization procedure will receive an organization name from MDM and process the
organization name through the MNNAME rule set. The MNNAME rule set will parse the
organization name into separate word elements and create an analysis value or phonetic
representation value for word1 and word2 of the organization name.
For Example:
If the organization name of “Bank of Example” was passed to this organization
standardization procedure, this would be the standardization result.
The important thing to note is that the original name fed into QualityStage from MDM will be
passed back to MDM. QualityStage does not change or enhance the organization name in
any way; it parses the name into smaller elements for matching purposes only. MDM will
receive the original name, the phonetic representation of the organization name, and the
standardized name.
Address Standardization
This standardization procedure will receive an address from MDM and process the address
through the MDMCAADDR and MDMCAAREA rule sets. The MDMCAADDR rule set will
parse the address into separate address elements and create an analysis value or phonetic
representation value for the street name. The MDMCAAREA rule set will parse the city,
province, and postal code into separate address elements and create an analysis value or
phonetic representation value for the city name.
For example:
If the address of “123 Maple Street Unit 5 ” was passed to this address standardization
procedure, this would be the standardization result.
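A toy version of this parsing step can be written with a regular expression. The pattern and field names below are assumptions for illustration; the real MDMCAADDR rule set handles far more address shapes (directions, street types, PO boxes, and so on).

```python
import re

# Hypothetical tokenizer for a simple Canadian civic address line.
ADDR = re.compile(
    r"^(?P<street_number>\d+)\s+"
    r"(?P<street_name>.+?)"
    r"(?:\s+(?:Unit|Apt|Suite)\s+(?P<unit>\w+))?\s*$",
    re.IGNORECASE,
)

def parse_address(line: str) -> dict:
    m = ADDR.match(line.strip())
    return m.groupdict() if m else {}

print(parse_address("123 Maple Street Unit 5"))
# {'street_number': '123', 'street_name': 'Maple Street', 'unit': '5'}
```

The non-greedy street name group lets the optional unit suffix claim its tokens; when no unit is present, the whole remainder becomes the street name.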
Matching
In order to maintain data quality, adding or updating a customer will trigger the matching
process.
Individual and organizational customers will be processed by different match specifications in
QualityStage, which consist of blocking parameters and scoring specifications for the
different passes.
The MDM service will provide QualityStage (QS) with a set of candidates by searching the
MDM database through blocking parameters for different passes. The QS matching process
will compare and score each candidate and return the match result to the MDM.
In order to implement the match specification and respond to MDM requests, ISD jobs and
shared containers are created for the interface.
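That candidate-retrieval-and-scoring flow can be sketched as follows. The blocking key, field weights, and threshold here are invented for illustration; the real values live in the QualityStage match specification.

```python
# Simplified blocking + scoring pass; weights and threshold are assumptions.
WEIGHTS = {"last_name": 4.0, "first_name": 3.0, "postal_code": 2.0, "phone": 2.0}
THRESHOLD = 7.0

def block_key(rec: dict) -> tuple:
    """Blocking: only records sharing this key are compared at all."""
    return (rec["last_name"][:1], rec["postal_code"][:3])

def score(a: dict, b: dict) -> float:
    """Add a field's weight when both records agree on a non-empty value."""
    return sum(w for f, w in WEIGHTS.items() if a.get(f) and a.get(f) == b.get(f))

def match_candidates(new_rec, mdm_records):
    candidates = [r for r in mdm_records if block_key(r) == block_key(new_rec)]
    return [(r, score(new_rec, r)) for r in candidates
            if score(new_rec, r) >= THRESHOLD]

mdm = [{"last_name": "SMITH", "first_name": "JOHN",
        "postal_code": "X1X1X1", "phone": "4165497061"}]
new = {"last_name": "SMITH", "first_name": "JOHN",
       "postal_code": "X1X1X1", "phone": ""}
print(match_candidates(new, mdm))   # one candidate, score 9.0
```

Blocking keeps the comparison count manageable: only records in the same block are ever scored, which is why the MDM service retrieves candidates by blocking parameters before QS scores them.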
Application Layer
The MDM server is the application component of the solution that interfaces with the data
source where the master data is stored. It is also responsible for providing various
features for managing and maintaining master data, keeping the data source the single
version of truth. The application is responsible for:
Interfacing with the master data source through various protocols
Managing master data through the exposed interfaces with other sub-systems and
external sources
Controlling access in terms of data visibility and enhancing data security
Identifying and providing a candidate list of potentially duplicate customers, to assist
QualityStage in determining detailed information on customer duplication, and
storing it in the data source
Merging two or more customers to enforce the MDM data source as a single view of
the customer and a single version of truth
Providing a user interface to merge and maintain customer information when
duplicates are potential and not guaranteed
Providing a user interface to manage and configure metadata
Database Layer
The database layer in the solution is responsible for storing all the master data. It also stores
the history data, audit data, and meta-data required for the MDM application to execute.
During the initial load, the database is populated directly by the information integration layer.
Once the initial population is successfully completed, daily extracts from source systems will
be loaded into the MDM database through the MDM batch framework and maintenance
services.
Apart from business data, the database layer also contains the meta-data required by the
MDM application. This meta-data is a key set of configuration information that controls the
behaviour and functionality of the MDM application.
Data Quality Management in Detail
For example, suppose we have the following input file:

File Name                          Profile ID  Name                 Address                                           Phone Number  Party Type
B2B Personal Cardholders (B2BPC)   1           John Smith           123 Main Street, Toronto, Ontario, Canada X1X1X1  416-549-7061  <Blank>
B2B Personal Cardholders (B2BPC)   2           ABC Limited          456 King Avenue, Calgary, Alberta, Y2Y2Y2         416-549-7061  <Blank>
B2B Personal Cardholders (B2BPC)   3           John and Jane Smith  123 Main Street, Toronto, Ontario, Canada X1X1X1  <Blank>       <Blank>
B2B Personal Cardholders (B2BPC)   4           A                    <Blank>                                           <Blank>       <Blank>
There are several baseline data quality requirements that need to be followed in order to
maintain data quality:
Requirement: Name Formatting and Standardization
1. If a free form name (i.e. one received unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the name into the common format required for processing. E.g.: John Smith may need to be tokenized into First Name = John and Last Name = Smith.

Requirement: Address Formatting and Standardization
1. If a free form address (i.e. one received unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the address into the common format required for processing, for both Canadian and US addresses. E.g.: 123 Main Street may need to be tokenized into Street Number = 123, Street Name = Main Street. If the country code / name is missing in the incoming files, the Canadian address standardization rules will be applied as a default.

Requirement: Address Validation and Correction
1. All addresses received in the input files should be validated and corrected based on checks with Canada Post. In case of an address correction, the address as provided by Canada Post will be applied.

Requirement: Phone Number Formatting and Standardization
1. If a free form phone number (i.e. one received unparsed as a single string) is received as an input, the MDM matching solution should parse or tokenize the phone number into the common format required for processing. E.g.: 416-549-7061 may need to be tokenized into Area Code = 416, Number = 549-7061.

Requirement: Name Patterns
1. The MDM matching solution should develop data processing rules to handle the following patterns that may occur in the 'name' fields.
For individuals, the connectors that will identify such patterns are:
a. Space And Space (e.g.: John And Jane Smith)
b. Space and Space (e.g.: John and Jane Smith)
c. & (e.g.: John&Jane Smith)
d. Space & Space (e.g.: John & Jane Smith)
e. / (e.g.: John/Jane Smith)
f. Space / Space (e.g.: John / Jane Smith)
g. \ (e.g.: John\Jane Smith)
h. Space \ Space (e.g.: John \ Jane Smith)
For organizations, the connectors that will identify such patterns are:
i. / (e.g.: John/ABC Limited)
j. Space / Space (e.g.: John / ABC Limited)
k. \ (e.g.: John\ABC Limited)
l. Space \ Space (e.g.: John \ ABC Limited)
2. If a name pattern has either the 'And', 'and' or '&' connector, the following requirements should be developed:
a. A lookup against the organization name directory should be performed.
b. If the name pattern matches an organization name from the directory, the record should not be split into discrete records.
c. If the name pattern does not match an organization name from the directory, the record should be split into discrete records.
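The connector handling and directory lookup above can be sketched as follows; the directory contents, the connector regex, and the shared-family-name heuristic are simplified assumptions for illustration, not the production rules.

```python
import re

# Hypothetical organization directory used for the lookup in rule 2a.
ORG_DIRECTORY = {"JOHNSON AND JOHNSON", "PROCTER & GAMBLE"}

# Connectors from the requirement: And/and/& plus the slash variants.
CONNECTOR = re.compile(r"\s*(?:\bAnd\b|\band\b|&|/|\\)\s*")

def split_name(name: str) -> list:
    """Split 'John and Jane Smith' into discrete person records,
    unless the whole string matches a known organization name."""
    if name.upper() in ORG_DIRECTORY:
        return [name]                       # rule 2b: keep as one record
    parts = CONNECTOR.split(name, maxsplit=1)
    if len(parts) == 1:
        return [name]                       # no connector found
    first, rest = parts
    last = rest.split()[-1] if " " in rest.strip() else ""
    # Give the first person the shared family name: "John" -> "John Smith"
    return [f"{first} {last}".strip(), rest.strip()]

print(split_name("John and Jane Smith"))    # ['John Smith', 'Jane Smith']
print(split_name("Johnson and Johnson"))    # ['Johnson and Johnson']
```

The directory check runs first, so a legitimate organization name containing "and" survives intact (rule 2b), while a genuine two-person pattern is split into discrete records (rule 2c).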
Matching Process Requirements:

FR 2.3.1 Rules - Overview
1. The MDM matching rules should be designed and developed to match incoming
records across all input files, i.e. to match all input files with each other.
2. The MDM matching rules should be designed and developed to match the incoming
records with all records stored in the MDM.

FR 2.3.2 Rules - List
The following matching rules should be designed and developed in the
MDM environment:
1. Rule 1: Individual Matching - Individual Full Name and Full Address
2. Rule 2: Organizational Matching - Organization Name and Full Address
3. Rule 3: Household Matching - Individual Last Name and Full Address
4. Rule 4: Address Matching - Full Address
5. Rule 5: Phone Number Matching - Full Phone Number
NOTE: Each of the above matching rules should generate independent
match IDs/keys.

FR 2.3.3 Rules - Data Elements
1. Full Name – When the matching rules are based on Full Names, the
following discrete data elements should be used:
a. First Name a.k.a Given Name
b. Last Name
c. Name Suffix
d. Organization Name (as applicable)
2. Full Address – When the matching rules are based on Full Address, the
following discrete data elements should be used:
a. Apartment / Unit Number
b. Street Number
c. Street Name
d. Street Type
e. City
f. Province
g. Postal Code
h. Country
i. Non Civic Address Info (as applicable)
3. Full Phone Number – When the matching rules are based on Full
Phone Number, the following discrete data elements should be used:
a. Country Code
b. Area Code
c. Number
NOTE: The Individual Matching Process uses ‘Residential Primary’ or
equivalent addresses only, while Organizational Matching Process uses
‘Business Primary’ or equivalent addresses only
NOTE: Phonetic representations of first name, last name, and street name
are used by the current MDM matching process.

FR 2.3.4 Rules - Guidelines
1. The corrected postal address should be used by the MDM matching
process.
2. Each record from each input file should undergo each of the 4 rules
stated above.
NOTE: For example, a record identified as ‘Individual’ should undergo the
Organizational match rule as well.
3. Wherever applicable, the Match IDs/keys as generated by the individual
matching rules should be cross referenced in the output files. E.g.: A
record could have an Individual Match Key as 123 and a Household
Match Key as 456.
4. A separate match ID / key should be generated for records within the
MDM that do not have a match with records in the input files.

FR 2.3.5 Rules - Weights, Thresholds and Categories
1. The MDM matching solution should be designed and developed for
'looser' matching rules.
2. The weights and thresholds that are currently assigned in the MDM
environment should be used as a starting point for the design and
development.
3. The match categories that are currently identified in the MDM
environment should be used as a starting point for the design.
FR 2.3.6 Rules - Error Condition
1. In case the incoming record was unable to be processed by the MDM
matching solution, it should be highlighted in the output file.
2. A description of the reason why the record could not undergo the
matching process should be included in the output file.
NOTE: These error descriptions should be as provided by the MDM
matching solution with no new requirements.
Output file:
(Columns: Input File Name, Input Profile ID, Input Name, Input Address, Phone Number, Input Party Type, Sequence Number, Split Name, Address Validation or Correction Indicator, Address Validation or Correction Description, Corrected Address or Address from MDM, Match Process.)

Record 1. Input: B2B Personal Cardholders (B2BPC), Profile ID 1, Name: John Smith, Address: 123 Main Street, Toronto, Ontario, X1X1X1, Phone: 416-549-7061, Party Type: <Blank>. Output: Sequence Number <Blank>; Split Name: John Smith; Address Validation/Correction: Corrected (Street name not found); Corrected Address: 123 Main Street, Toronto, Ontario, X1X1X1; Match Process: Success.
Record 2. Input: B2B Personal Cardholders (B2BPC), Profile ID 2, Name: ABC Limited, Address: 456 King Avenue, Calgary, Alberta, Y2Y2Y2, Phone: 416-549-7061, Party Type: <Blank>. Output: Sequence Number 1; Split Name: ABC Limited; Address Validation/Correction: Corrected (Street name not found); Corrected Address: 456 King Avenue, Calgary, Alberta, Y5M2Y2; Match Process: Success.
Record 3. Input: B2B Personal Cardholders (B2BPC), Profile ID 2, Name: ABC Limited, Address: 456 King Avenue, Calgary, Alberta, Y2Y2Y2, Phone: 416-549-7061, Party Type: <Blank>. Output: Sequence Number 2; Split Name: ABC Limited; Address Validation/Correction: Corrected (Postal Code Incorrect); Corrected Address: 456 King Avenue, Calgary, Alberta, Y5M2Y2; Match Process: Success.
Record 4. Input: B2B Personal Cardholders (B2BPC), Profile ID 3, Name: John and Jane Smith, Address: 123 Main Street, Toronto, Ontario, X1X1X1, Phone: <Blank>, Party Type: <Blank>. Output: Sequence Number 3; Split Name: John Smith; Address Validation/Correction: Valid (Accurate); Corrected Address: 123 Main Street, Toronto, Ontario, X1X1X1; Match Process: Success.
Record 5. Input: B2B Personal Cardholders (B2BPC), Profile ID 3, Name: John and Jane Smith, Address: 123 Main Street, Toronto, Ontario, X1X1X1, Phone: <Blank>, Party Type: <Blank>. Output: Sequence Number 4; Split Name: Jane Smith; Address Validation/Correction: Valid (Accurate); Corrected Address: 123 Main Street, Toronto, Ontario, X1X1X1; Match Process: Success.
Record 6. Input: B2B Personal Cardholders (B2BPC), Profile ID 4, Name: A, Address: <Blank>, Phone: <Blank>, Party Type: <Blank>. Output: Sequence Number <Blank>; Split Name: <Blank>; Address Validation/Correction: <Blank> (Insufficient (or blank) address information); Corrected Address: <Blank>; Match Process: Fail.
Record 7. Input: all fields <Blank> (record exists only in MDM). Output: Sequence Number 5; Split Name: David Johnson; Address from MDM: 789 Poplar Road, Ottawa, Ontario, A6A6A6; Match Process: <Blank>.
CHAPTER 3 – ISSUES, CHALLENGES AND TRENDS
Unfortunately, during the MDM matching process, there are still processes that need human
intervention, such as the following tasks:
The Potential Overlay Task:
A potential overlay occurs when a record is updated with information that is radically different
from the data already in the record. For example, consider the situation illustrated below:

Figure 7: Case 5
Case  party id  Family Name  Given Name  Gender  Date of Birth  Address                                            Phone         Last Modified
5     ###       LEWIS        JANE        F       26-Jun-71      100 Kumar Avenue, Markham, Ontario, Canada A2B2C2  416-549-7070  08/24/06
      ###       XIANG        LINDA       F       13-Jan-78      456 King Avenue, calgary, Alberta, Y2Y2Y2          416-549-7070  02/28/98

The data steward will mark the record as a potential overlay record because the ID field of
both records is the same. However, when we look closely at these two records, we can see
that Linda Xiang and Jane Lewis are clearly not the same person. The ID
388293023980000000 was created on Feb 28, 1998 and belongs to Linda Xiang. Somehow,
on Aug 24, 2006, the record was updated; it now appears to belong to a woman named Jane
Lewis. This may have been caused by a common typographical data entry mistake in which
Linda Xiang's record was open on the screen when the customer service representative
started typing, not realizing that he or she was typing over someone else's data.
There are also situations in which this scenario would be perfectly valid. In cases of
events such as marriage, divorce, a move, or a phone-number change, a person's data could
change significantly enough for the data steward application to flag a potential overlay task.
Using data mining and fuzzy logic, such potential overlay tasks can be resolved automatically.
Match Duplicate Suspects to Create a New Master Record:
As part of the solution for data warehouse applications, data governance will match records
from multiple lines of business (LOBs). There are situations where customers from multiple
LOBs may have similar names, addresses, and telephone numbers, and may have fields that
are blank or not available (N/A). For instance, see below:
Figure 8: Case 3
Case  party id  Family Name  Given Name  Date of Birth  Address                                           Phone
3     ###       VERKIN       SMITH       5-Aug-60       987 Village Ave, Toronto, Ontario, Canada T2T1C1  416-222-3333
      ###       VERKI        SMITH       5-Aug-60       987 Village Ave, Toronto, Ontario, Canada T2T1C1  416-222-3333

The two records above have the same Given Name, Date of Birth, Address, and Phone
Number. However, the Family Name is slightly different. Are these two records the same
customer? Data governance applications currently available on the market will stop here and
wait for human intervention to decide. Through the application of data mining and fuzzy logic,
we would be able to identify such cases without human intervention and generate a single
customer profile with the best data from all sources.

Link Related Records from Multiple Sources:
With overlays, the task verifies the existing records in the system. With duplicate suspects,
the task gets rid of extra records. For this task, it links records between systems. The current
data steward applications available on the market may not be able to automatically link such
records due to not having enough data in common, as in the example below:
Figure 9: Case 2
Case  party id  Name                        Date of Birth  Address                                               Phone         Source
2     ###       GUGGENHEIM REAL ESTATE LLC  10-May-98      123 Main Street, Toronto, Ontario, Canada X1X1X1      647-123-4567  Market
      ###       GUGGENHEIM REAL ESTATE LLC  10-May-98      123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1  647-123-2352  Auto

When you first look at these two records, they appear to be the same company. However, if
one looks closely, one can determine some differences. First, the addresses are different:
"Unit 10" is displayed in only one of the records' address fields. Second, the phone numbers
are different: one is "647-123-4567" and the other is "647-123-2352". Data mining and fuzzy
logic can automatically verify that these two records are the same company and link them
together.
CHAPTER 4 – FUZZY LOGIC
Traditional Logic:
Let's suppose that we generate the following training set based on the Data Steward
application output, including the potential overlay task, duplicate suspects, and related
records from multiple sources. We would have:
Figure 10: Cases
(Columns: party id, Family Name, Given Name, Date of Birth, Address, Phone, Source, Class.)
Case 1 (Class N):
  ###  VERKIN  YOUSSOU  5-Aug-60  10 Main Street, Markham, Ontario, Canada X2Y1X1  915-123-4213  Mortgage
  ###  VERKIN  JANE     5-Aug-60  10 Main Street, Markham, Ontario, Canada X2Y1X1  915-123-4213  Auto
Case 2 (Class Y):
  ###  GUGGENHEIM REAL ESTATE LLC  10-May-98  123 Main Street, Toronto, Ontario, Canada X1X1X1      647-123-4567  Market
  ###  GUGGENHEIM REAL ESTATE LLC  10-May-98  123 Main St Unit 10, Toronto, Ontario, Canada X1X1X1  647-123-2352  Auto
Case 3 (Class Y):
  ###  VERKIN  SMITH  5-Aug-60  987 Village Ave, Toronto, Ontario, Canada T2T1C1  416-222-3333  Life
  ###  V.      SMITH  5-Aug-60  987 Village Ave, Toronto, Ontario, Canada T2T1C1  416-222-3333  Auto
Case 4 (Class Y):
  ###  CREATIVE LEADERSHIM GROUM LTD  5-Aug-60  456 King Avenue, calgary, Alberta, Y2Y2Y2  416-549-7070
  ###  CREATIVE LEADERSHIM GROUM LTD  5-Aug-60  456 King Avenue, calgary, Alberta, Y2Y2Y2  416-549-7070
Case 5 (Class N):
  ###  LEWIS  JANE   26-Jun-71  100 Kumar Avenue, Markham, Ontario, Canada A2B2C2  416-549-7070  (last modified 24/08/2006)
  ###  XIANG  LINDA  13-Jan-78  456 King Avenue, calgary, Alberta, Y2Y2Y2          416-549-7070  (last modified 28/02/1998)
Traditional logic is the idea that an outcome can only be either true or false, 1 or 0, right or
wrong. This form of logic dates back to ancient Greece and is perfectly adequate for
answering simple questions in single dimensions. For example, if A is 1 and B is 0, what is
A AND B? It can be extended, as is done in Boolean algebra, to more complex questions, as
long as all the parts can be described using the same restricted alphabet of two symbols.
Such logic is a deductive way of understanding consequences and is a highly valuable
intellectual technique. [12]
If we use the above traditional logic, we will get the following training set:
Figure 11: Training Set
case  Family Name  Given Name  Date of Birth  Address  phone  class
1     T            F           T              T        T      N
2     T            T           T              F        F      Y
3     F            T           T              T        T      Y
4     T            T           T              T        T      Y
5     F            F           F              F        T      N

Applying the information gain measure to the above training set, we get the following
information gain for each attribute:
Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)

Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)

Gain(A) = Info(D) - Info_A(D)

First, the entropy of the whole training set (two N and three Y out of five):

Info(D) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.97

Then, for each attribute:

Info_{FamilyName}(D) = \tfrac{3}{5}\left(-\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3}\right) + \tfrac{2}{5}\left(-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 0.95

Info_{GivenName}(D) = \tfrac{3}{5}\left(-\tfrac{3}{3}\log_2\tfrac{3}{3} - \tfrac{0}{3}\log_2\tfrac{0}{3}\right) + \tfrac{2}{5}\left(-\tfrac{2}{2}\log_2\tfrac{2}{2} - \tfrac{0}{2}\log_2\tfrac{0}{2}\right) = 0

Info_{DateOfBirth}(D) = \tfrac{4}{5}\left(-\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4}\right) + \tfrac{1}{5}\left(-\tfrac{1}{1}\log_2\tfrac{1}{1} - \tfrac{0}{1}\log_2\tfrac{0}{1}\right) = 0.65

Info_{Address}(D) = \tfrac{3}{5}\left(-\tfrac{2}{3}\log_2\tfrac{2}{3} - \tfrac{1}{3}\log_2\tfrac{1}{3}\right) + \tfrac{2}{5}\left(-\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 0.95

Info_{Phone}(D) = \tfrac{4}{5}\left(-\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4}\right) + \tfrac{1}{5}\left(-\tfrac{0}{1}\log_2\tfrac{0}{1} - \tfrac{1}{1}\log_2\tfrac{1}{1}\right) = 0.8

Hence, the gain in information from such a partitioning would be:

Gain(FamilyName) = Info(D) - Info_{FamilyName}(D) = 0.97 - 0.95 = 0.02
Gain(GivenName) = Info(D) - Info_{GivenName}(D) = 0.97 - 0 = 0.97
Gain(DateOfBirth) = Info(D) - Info_{DateOfBirth}(D) = 0.97 - 0.65 = 0.32
Gain(Address) = Info(D) - Info_{Address}(D) = 0.97 - 0.95 = 0.02
Gain(Phone) = Info(D) - Info_{Phone}(D) = 0.97 - 0.8 = 0.17
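The information-gain computation can be reproduced directly from the training set in Figure 11. The sketch below recomputes each attribute's gain and confirms that GivenName dominates:

```python
from math import log2
from collections import Counter

# Figure 11 training set: (FamilyName, GivenName, DateOfBirth, Address, Phone, class)
DATA = [
    ("T", "F", "T", "T", "T", "N"),
    ("T", "T", "T", "F", "F", "Y"),
    ("F", "T", "T", "T", "T", "Y"),
    ("T", "T", "T", "T", "T", "Y"),
    ("F", "F", "F", "F", "T", "N"),
]
ATTRS = ["FamilyName", "GivenName", "DateOfBirth", "Address", "Phone"]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(col):
    """Information gain of splitting DATA on column `col`."""
    base = entropy([row[-1] for row in DATA])
    total = 0.0
    for value in {row[col] for row in DATA}:
        subset = [row[-1] for row in DATA if row[col] == value]
        total += len(subset) / len(DATA) * entropy(subset)
    return base - total

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(i):.2f}")
# GivenName has the highest gain (0.97), so it becomes the splitting attribute
```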
Since GivenName has the highest information gain among the attributes, it is selected as the
splitting attribute, and we get the following decision tree:
One of the issues with the above decision tree is the uncertainty of the attributes. For
example, is the name "John Smith" the same as "J. Smith"? The above model provides only
two states for each attribute: either the Given Name is the same or it is not. Below, I illustrate
how to tackle the uncertainty associated with the description of knowledge by using fuzzy
logic.
Figure 12: Traditional Decision Tree
Fuzzy Logic History
The term "fuzzy logic" was introduced with the 1965 proposal of fuzzy set theory by Lotfi
A. Zadeh. [2][3] Fuzzy logic has been applied to many fields, from control theory to artificial
intelligence. Infinite-valued logics had, however, been studied since the 1920s, notably by
Łukasiewicz and Tarski. [4]
The Basic Concept of Fuzzy Logic
Fuzzy mathematics forms a branch of mathematics related to fuzzy set theory and fuzzy
logic. It started in 1965 after the publication of Lotfi Asker Zadeh's seminal work "Fuzzy
sets". [1] A fuzzy subset A of a set X is a function A: X → L, where L is the interval [0,1]. This
function is also called a membership function. A membership function is a generalization of a
characteristic function, or indicator function, of a subset, defined for L = {0,1}. More generally,
one can use a complete lattice L in the definition of a fuzzy subset A. [9]
A Fuzzy Implementation:
For each input and output variable selected, I define two or more membership functions
(MFs), one per qualitative category, for example: true or false. The shape of these functions
can be diverse, but I will work with a triangle, which needs three points to define one MF of
one variable. Below is the MF definition for the variable GivenName.
If we take GivenName as a variable, 'true' as the triangle, and 'false' as the trapezoid (see
Figure 13),
– the MF 'true' will be defined by three points (x0, x1, x2), where x0 is any negative
value;
– the MF 'false' will be defined by four points (x1, x2, x3, x4), where x4 is any positive
value greater than x3; this means that 'false' stays at 1 from x3 onward.
We have the following MF for Given Name:
y(triangle)
true ( x; x0 ,x1 , x2 )=max(min( x−x0
x1−x0
,x2−xx2−x1
),0)y(trapezoid)
false(x ; x1, x2 , x3 , x4)=max (mix ( x−x1
x2−x1
,1 ,x4−x
x4−x3) ,0)
Figure 13: Fuzzy MF (triangular 'true' and trapezoidal 'false' membership functions over the points x0, x1, x2, x3, x4)

For the Given Name variable, I use the Levenshtein distance to calculate the value of x: if the
distance is between 0 and 5, then the two values are somewhat similar; if the distance is
greater than 5, then the two values are not the same at all.
After the above specification, we can fuzzify a real value for GivenName. For example, for
"kitten" and "sitting", with Levenshtein distance 3 and breakpoints x_1 = 0, x_2 = 5 (with
x_0 \to -\infty and x_4 \to +\infty), we get the fuzzified values:

y_{true} = \max\left(\min\left(\frac{3 - (-\infty)}{0 - (-\infty)}, \frac{5 - 3}{5 - 0}\right), 0\right) = 0.4

y_{false} = \max\left(\min\left(\frac{3 - 0}{5 - 0}, 1, \frac{\infty - 3}{\infty - x_3}\right), 0\right) = 0.6
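The fuzzification above can be reproduced with a standard Levenshtein implementation and the two membership functions. With x0 tending to negative infinity and x4 to positive infinity, 'true' reduces to its falling edge and 'false' to its rising edge over the interval [0, 5]:

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

X1, X2 = 0.0, 5.0   # the finite MF breakpoints from the text

def mf_true(x: float) -> float:
    # With x0 -> -inf the rising edge is always 1; only the falling edge matters.
    return max(min(1.0, (X2 - x) / (X2 - X1)), 0.0)

def mf_false(x: float) -> float:
    # With x4 -> +inf the falling edge is always 1; only the rising edge matters.
    return max(min((x - X1) / (X2 - X1), 1.0), 0.0)

d = levenshtein("kitten", "sitting")
print(d, mf_true(d), mf_false(d))   # 3 0.4 0.6
```

The two degrees sum to 1 here because the triangle's falling edge and the trapezoid's rising edge are complementary over [0, 5].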
Decision Tree definition:
Now let's reconsider the decision tree we introduced before (Figure 14: Traditional Decision
Tree). For this simple case, we have the following rule based on the decision tree:
IF GivenName is equal (T), THEN the two records are equal.
Next, we compute the degree of membership to the MFs (true, false) of the output (the THEN
part). Once a variable such as the Given Name is fuzzified, it takes a value between 0 and 1,
indicating the degree of membership to a given MF of the specific variable. The degrees of
membership of the input variables have to be combined to get the degree of membership of
the output. For a single input variable, as in the rule specified above, we can for example
have fuzzy rules as shown below:
IF GivenName is equal (T), THEN the two records are equal;
IF GivenName is not equal (F), THEN the two records are not equal;
According to these rules, if we suppose that the degree of membership of GivenName to the
MF 'false' is 0.6, then the degree to which the two records are not equal is 0.6, too.
When we have more than one input variable, the degree of membership for the output value
will be the minimum of the degrees of membership for the different inputs. For example,
suppose we have two input variables (GivenName X and FamilyName Y) and the decision
matrix below:
Figure 15: Decision Matrix
                      FamilyName equal   FamilyName not equal
GivenName equal       equal              not equal
GivenName not equal   not equal          not equal

If we calculated the attributes as having the following fuzzified values:
y_{GivenName, equal} = 0.8
y_{FamilyName, not equal} = 0.9
then the following rule is satisfied:
IF GivenName is equal (degree of 0.8) and FamilyName is not equal (degree of 0.9), THEN
the two records are not equal (degree of 0.8).
With the values
y_{GivenName, not equal} = 0.8
y_{FamilyName, equal} = 0.2
the following rule would also be satisfied:
IF GivenName is not equal (degree of 0.8) and FamilyName is equal (degree of 0.2), THEN
the two records are not equal (degree of 0.2).
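The min-combination of rule antecedents can be sketched as below. The rule table mirrors Figure 15; the complementary membership degrees (0.2 for GivenName 'not equal' and 0.2 for FamilyName 'equal') are assumed for the example.

```python
# Decision matrix from Figure 15: (GivenName state, FamilyName state) -> output
RULES = {
    ("equal", "equal"): "equal",
    ("equal", "not equal"): "not equal",
    ("not equal", "equal"): "not equal",
    ("not equal", "not equal"): "not equal",
}

def fire_rules(given: dict, family: dict) -> dict:
    """Degree of each output = max over rules of min(input degrees)."""
    out = {"equal": 0.0, "not equal": 0.0}
    for (g_state, f_state), result in RULES.items():
        degree = min(given[g_state], family[f_state])
        out[result] = max(out[result], degree)
    return out

# Membership degrees from the example in the text (complements assumed).
given = {"equal": 0.8, "not equal": 0.2}
family = {"equal": 0.2, "not equal": 0.9}
print(fire_rules(given, family))
# {'equal': 0.2, 'not equal': 0.8}
```

Three of the four rules feed the 'not equal' output, so its degree is the maximum of their min-combined antecedents, which is how the strongest applicable rule wins.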
Brief Discussion:
In applying fuzzy logic to the data governance process, we get a more accurate decision
tree, which enhances the decision-making process. With the above example, the traditional
decision tree model has to reduce the question of whether FamilyName and GivenName are
only slightly different to a strict yes or no: if FamilyName and GivenName are different, the
conclusion may be drawn that the two records belong to different persons. However, when we
apply fuzzy logic, we may say that the FamilyName values are not equal to some extent (say,
20% not equal) and that the GivenName values are somewhat equal, at a degree of 0.3. In
that case, the records could still be considered to belong to the same person based on the
fuzzified logic. Therefore, a more accurate result is gained.
CHAPTER 5 - CONCLUSIONS
In this essay, the history of data governance was discussed, as well as the current literature
and the future of this process. The data governance process itself was then explained, and it
was found that the central concern of data governance is data quality. In order to improve the
data quality of the master data repository, fuzzy logic was applied to the data governance
process. With data governance constantly evolving, we need to guarantee the quality of the
data governance process itself. Applying fuzzy logic helps to improve the quality of data
governance: it improves not only the data quality process but also the degree of process
automation.
REFERENCES
1. Data Governance (November 7, 2013). In Wikipedia, the free encyclopedia. Retrieved December 5, 2013, from http://en.wikipedia.org/wiki/Data_governance
2. A Brief History of Data Quality (March 25, 2009). Data Governance Insider: Covering the world of big data and data governance. Retrieved from http://data-governance.blogspot.ca/2009/03/brief-history-of-data-quality.html
3. Nigel Turner (November 15, 2013). Kindling the Flames: The Future of Data Governance. Retrieved December 11, 2013, from http://smartdatacollective.com/dat-mai/167531/kindling-flames-future-data-governance
4. Rick Sherman (2011). A Must to Avoid: Worst Practices in Enterprise Data Governance. Retrieved from http://searchdatamanagement.techtarget.com/feature/A-must-to-avoid-Worst-practices-in-enterprise-data-governance
5. Marketing Data Governance in the Era of "Big Data". Retrieved from http://www.kbmg.com/wp-content/uploads/2013/07/Winterberry-Group-White-Paper-Marketing-Data-Governance-July-2013.pdf
6. Sunil Soares (September 2010). The IBM Data Governance Unified Process. Ketchum, USA: MC Press Online, LLC.
7. Julie Langenkamp-Muenkel (October 2013). MDM and Next-Generation Data Sources. Information Management.
8. Huey-Li Chen, Long-Hui Chen and Chien-Yu Huang (2009). Fuzzy Goal Programming Approach to Solve the Equipment-Purchasing Problem of an FMC. International Journal of Industrial Engineering, 16(4), 270-281.
9. Fuzzy Mathematics (November 28, 2013). In Wikipedia, the free encyclopedia. Retrieved February 2, 2014, from http://en.wikipedia.org/wiki/Fuzzy_mathematics
10. A Fuzzy Implementation. Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/Fuzzy_implementation_070423.pdf
11. Risk Analysis (April 2007). Retrieved November 15, 2014, from http://apps.ensic.inpl-nancy.fr/benchmarkWWTP/RiskAnalysis/RiskWeb/RiskModule_070423_fichiers/
12. Fuzzy Multidimensional Logic (March 2004). Retrieved February 18, 2014, from http://www.calresco.org/lucas/fuzzy.htm
13. Levenshtein Distance (February 2014). Retrieved February 19, 2014, from http://en.wikipedia.org/wiki/Levenshtein_distance
14. Adler. Big Data Governance Maturity (March 2012). Retrieved February 23, 2014, from https://www.ibm.com/developerworks/community/blogs/adler/entry/big_data_governance_maturity?lang=en
15. DataFlux Data Management. The Intersection of Big Data, Data Governance and MDM. Retrieved February 23, 2014, from http://digital.info-mgmt.com/info-mgmt/DataFlux_SAS2012#pg1
16. Sammon, D. and Adam, F. "Making Sense of the Master Data Management (MDM) Concept: Old Wine in New Bottles or New Wine in Old Bottles?" Proceedings of the 2010 Conference on Bridging the Socio-technical Gap in Decision Support Systems: Challenges for the Next Decade, pages 175-186.