1er Simposio Latinoamericano
Data Quality Fundamentals
Miguel Angel Granados Troncoso
Agenda
• Scenarios
• Definitions, Processes and Standards
• Data Quality Services (DQS)
• DQS Solutions
[Slide graphic: three pillars (MISSION CRITICAL CONFIDENCE, BREAKTHROUGH INSIGHT, CLOUD ON YOUR TERMS) covering twelve numbered capabilities: Required 9s & Protection; Blazing-Fast Performance; Organizational Compliance; Peace of Mind; Rapid Data Exploration; Managed Self-Service BI; Credible, Consistent Data; Scalable Analytics & DW; Scale on Demand; Fast Time to Solution; Optimized Productivity; Extend Any Data, Anywhere]
Credible, Consistent Data (#7)
Companies with accurate data perform better¹
Performer group | % of master data complete & accurate | Hrs spent per employee each week searching for info
Top 20% Performers | 91% | 1.2 hrs
Middle 50% Performers | 68% | 2.8 hrs
Bottom 30% Performers | Under 50% | 6 hrs
Delivered with Master Data Services, Data Quality Services, and a Single BI Semantic Model
¹Source: “Turning Pain into Productivity with Master Data Management,” Aberdeen Group, Feb 2011
Why is Data Quality Important?
Data quality problems cost U.S. businesses more than $600 billion a year.
Source: Data Warehousing Institute (TDWI)
Costs associated with bad data include:
• Excess inventory
• Higher supply chain costs
• Higher direct marketing costs
• Billing
• And more…
Common Data Quality Issues
Data Quality Issue | Question | Sample Data Problem
Format | Do values follow consistent formatting standards? | Telephone number formats: xxxxxxxxxx, (xxx) xxx-xxxx, 1.xxx.xxx.xxxx, etc.
Standard | Are data elements consistently defined and understood? | ‘Gender code’ = M, F, U vs. ‘Gender code’ = 0, 1, 2
Consistent | Do values represent the same meaning? | How is revenue presented? Dollars, Euro, both?
Complete | Is all necessary data present? | 20% of customers’ last names are blank, 50% of zip codes are 99999
Accurate | Does the data accurately represent reality or a verifiable source? | A supplier is listed as ‘Active’ but went out of business six years ago
Valid | Do data values fall within acceptable ranges? | Salary values should be between 60,000 and 120,000
Duplicates | Does data appear several times? | Both John Ryan and Jack Ryan appear in the system. Are they the same person?
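Several of the issue types listed above can be expressed as simple rule checks. The following is a minimal Python sketch, not part of DQS; the sample records, the chosen phone standard, the placeholder zip code set, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical customer records illustrating the issue types listed above.
records = [
    {"name": "John Ryan", "phone": "(555) 123-4567", "zip": "90210", "salary": 75000},
    {"name": "", "phone": "5551234567", "zip": "99999", "salary": 130000},
    {"name": "John Ryan", "phone": "1.555.123.4567", "zip": "10001", "salary": 61000},
]

PHONE_PATTERN = re.compile(r"\(\d{3}\) \d{3}-\d{4}")  # assumed standard: (xxx) xxx-xxxx
PLACEHOLDER_ZIPS = {"99999"}                          # assumed "unknown" marker
SALARY_RANGE = (60_000, 120_000)                      # range taken from the Valid row


def check_record(rec):
    """Return the list of data quality issues found in one record."""
    issues = []
    if not PHONE_PATTERN.fullmatch(rec["phone"]):
        issues.append("format: phone")
    if not rec["name"].strip():
        issues.append("complete: name")
    if rec["zip"] in PLACEHOLDER_ZIPS:
        issues.append("complete: zip")
    low, high = SALARY_RANGE
    if not low <= rec["salary"] <= high:
        issues.append("valid: salary")
    return issues


def find_duplicate_names(recs):
    """Return names (case-insensitive) that appear in more than one record."""
    counts = Counter(r["name"].strip().lower() for r in recs if r["name"].strip())
    return sorted(name for name, count in counts.items() if count > 1)


for rec in records:
    print(rec["name"] or "<blank>", check_record(rec))
print("duplicates:", find_duplicate_names(records))
```

Exact-match duplicate detection like this would not catch the John Ryan / Jack Ryan case; that requires fuzzy matching.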
Agenda: Definitions, Processes and Standards
Data Governance
[Diagram: layers running from strategic to tactical: IT Governance, Data Governance, Data Management, Data Quality, Data Correctness]
Data Management
Content
• Subject details
• Attribute identification
• Subject names
• Definitions
• Values representation
• Standard formats
Relationship
• Identity part (similar attributes)
• Group (Rules/Logic)
• Hierarchy (Parent/Child)
• Relationship Rules/Scenarios
Access
• Access and sharing policies (internal/external)
• Data provider
• Metadata (use, lineage, etc.)
• Regulations/Security
• External data sources
Change Management
• Data quality and acceptability
• Measurement and monitoring
• Error detection and correction
• Centralized change tracking
• Jurisdiction over data
Data Standardization
Data Management
Master Data Management
Data Quality
• Data quality consists of verifying whether data is suitable for its intended use in operations, decision making and planning.
Domain Management
Knowledge Discovery
Discovery Value
Management
Quality Control Efforts
• Know the context of the data
• Profile the required data
• Create and maintain quality standards
• Track data quality
Requirements for a Data Quality Solution
• Cleansing: Amend, remove or enrich data that is incorrect or incomplete. This includes correction, standardization and enrichment.
• Matching: Identify, link and remove duplicates within or across sets of data.
• Profiling: Analyze the data source to provide insight into the quality of the data and identify data quality issues.
• Monitoring: Track and monitor the state of data quality activities and the quality of data.
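As a rough illustration of the profiling requirement described above, the sketch below summarizes completeness and value distribution for one column. It is a generic Python example under assumed names (`profile_column`, `gender_codes`), not the profiling that DQS itself performs.

```python
from collections import Counter


def profile_column(values):
    """Summarize completeness and value distribution for one column."""
    total = len(values)
    # Treat None and blank strings as missing.
    non_empty = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(non_empty)
    return {
        "total": total,
        "completeness": len(non_empty) / total if total else 0.0,
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical 'Gender code' column mixing two coding standards plus gaps.
gender_codes = ["M", "F", "U", "M", "", None, "0", "M"]
profile = profile_column(gender_codes)
print(profile)
```

A low completeness ratio or an unexpected distinct value (here the stray "0") points to Complete and Standard issues that cleansing rules can then address.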
How to Manage Data Quality?
Data quality management entails the establishment and deployment of:
– Roles
– Responsibilities
– Policies
– Procedures
– Technology
Data Quality Standards
ISO 8000
• Data quality principles
• Characteristics that define data quality
• Processes that ensure data quality
ISO 22745
• Defines open technical dictionaries
• Applying dictionaries to master data
International Association for Information and Data Quality
http://www.iaidq.org/
Agenda: Data Quality Services (DQS)
What is Data Quality Services?
Data Quality Services (DQS) is a Knowledge-Driven data quality solution, enabling IT Pros and data stewards to easily improve the quality of their data
DQS Solution Concepts
• Knowledge-Driven: Based on a Data Quality Knowledge Base (DQKB) that is reusable for a variety of data quality improvements
• Knowledge Discovery: Acquire additional knowledge through data samples and user feedback
• Semantics: Data is mapped into Data Domains, which capture its semantics
• Open and Extensible: Supports the use of user-generated knowledge and IP from 3rd party reference data providers
• Easy to Use: Compelling user experience designed for increased productivity
Data Quality Knowledge Base (DQKB)
[Diagram: a DQKB contains Domains (Values, Domain Rules, Term-based Relations, Reference Data Services), Composite Domains (Composite Domain Rules, Value Relations, Reference Data Services), and a Matching Policy (Matching Rules)]
• Repository of knowledge about data:
– Domains define values and rules for each field
– Matching policies define rules for identifying duplicate records
DQS Knowledge Sources
• Windows Azure Marketplace™ DataMarket: Cleanse and enrich data with Reference Data Services from DataMarket
• DQS Data Store: Website that contains DQS knowledge available for downloading
• 3rd Party Reference Data Providers: Open integration with external 3rd party reference data providers
• Organization Data: Create domains from your own data sources
• Out of the Box Knowledge: A set of data domains that come out of the box with DQS
What is a Domain?
[Diagram: a Domain holds Values, Reference Data, and Rules and Relationships]
• Domains are specific to a data field
• Domains contain the rules for the data
• Domains can be individual or composite
[Diagram: a knowledge base (KB) with a composite domain, Name, made up of the First Name and Family Name domains]
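The domain concept above (values, rules, and individual versus composite domains) can be modeled as a small data structure. This is a conceptual Python sketch only; the class names, fields, and methods are hypothetical and do not reflect how DQS stores or exposes a knowledge base.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Domain:
    """Knowledge about one data field: valid values, synonyms, and rules."""
    name: str
    valid_values: Set[str] = field(default_factory=set)      # empty set = any value
    synonyms: Dict[str, str] = field(default_factory=dict)   # variant -> leading value
    rules: List[Callable[[str], bool]] = field(default_factory=list)

    def cleanse(self, value: str) -> str:
        """Map a known variant to its leading value; otherwise return it unchanged."""
        return self.synonyms.get(value, value)

    def is_valid(self, value: str) -> bool:
        """Check the cleansed value against the value list and every rule."""
        value = self.cleanse(value)
        in_values = not self.valid_values or value in self.valid_values
        return in_values and all(rule(value) for rule in self.rules)


@dataclass
class CompositeDomain:
    """A domain made of several single domains, validated part by part."""
    name: str
    parts: List[Domain]

    def is_valid(self, values: List[str]) -> bool:
        return len(values) == len(self.parts) and all(
            domain.is_valid(v) for domain, v in zip(self.parts, values)
        )


# Illustrative knowledge: the synonym and rules are invented for this example.
first_name = Domain("First Name", synonyms={"Jon": "John"}, rules=[str.isalpha])
family_name = Domain("Family Name", rules=[lambda v: v.strip() != ""])
name = CompositeDomain("Name", [first_name, family_name])

print(first_name.cleanse("Jon"))
print(name.is_valid(["Jon", "Ryan"]))
print(name.is_valid(["J0hn", "Ryan"]))
```

Keeping the knowledge (values, synonyms, rules) separate from the data being checked is what makes a knowledge base reusable across projects.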
What is a Reference Data Service?
Address
• The Azure Marketplace hosts specialist data cleansing providers
– Set up an account
– Subscribe to a reference service
– Map your domain to the reference service
DQS Architecture Overview
[Architecture diagram, three tiers:]
• DQS Clients: DQS Client (Knowledge Discovery and Management, Interactive DQ Projects, Administration), SSIS DQS Cleansing Component, other DQS clients. Future clients: Excel, SharePoint, MDS…
• DQS Server: DQS Engine (Knowledge Discovery, Data Profiling, Exploration, Matching, Cleansing, Reference Data), Reference Data API (Browse, Get, Update…), Common Knowledge Store, DQ Projects Store (DQ Active Projects), Published KBs
• DQS Cloud Services: DataMarket (categorized reference data), Reference Data Services, 3rd Party Reference Data, DQS Store (KB, Domains), Reference Data API (Browse, Set, Validate…)
© 2010 Microsoft Corporation. Microsoft Materials - Confidential. All rights reserved.
Agenda: DQS Solutions
DQS process
[Diagram: Build (Knowledge Management) and Use (DQ Projects) around a Knowledge Base, fed by Enterprise Data, Reference Data, and Cloud Services, with Integrated Profiling, Progress, Notifications, and Status]
Interactive Cleansing – DQS Project
• Analyzes the quality of source data
• Automatically corrects and enriches the data
• Manual approval/rejection of suggestions provided by the cleansing algorithm/reference data services
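Batch Cleansing - Using SSIS
[Diagram: source data passes through SSIS to the DQS server, which applies the Knowledge Base (Values/Rules, Reference Data Definition, Matching Policy) and Reference Data Services, and labels each output value as Correct, Corrected, Suggested, Invalid, or New]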
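To make the cleansing statuses named above (Correct, Corrected, Suggested, Invalid, New) concrete, here is a minimal Python sketch of how a value might be classified against a list of known values. The similarity measure (Python's `difflib.SequenceMatcher`), both thresholds, and the function name are assumptions for illustration; the slides do not describe the algorithm DQS actually uses.

```python
from difflib import SequenceMatcher

# Assumed thresholds for this sketch, not documented DQS settings.
AUTO_CORRECT_MIN = 0.8   # at or above: replace the value automatically
SUGGEST_MIN = 0.7        # at or above: propose a value for manual approval


def cleanse_value(value, known_values, rule=lambda v: True):
    """Return (status, output_value) for one source value."""
    if not rule(value):
        return "Invalid", value          # fails a domain rule
    if value in known_values:
        return "Correct", value          # already a known value
    # Find the most similar known value.
    best, score = None, 0.0
    for candidate in known_values:
        s = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if s > score:
            best, score = candidate, s
    if score >= AUTO_CORRECT_MIN:
        return "Corrected", best
    if score >= SUGGEST_MIN:
        return "Suggested", best
    return "New", value                  # unknown value with no close match


countries = ["Brazil", "Canada", "Mexico"]   # hypothetical domain values
for raw in ["Mexico", "Mexco", "Brasl", "Japan", "12345"]:
    print(raw, cleanse_value(raw, countries, rule=str.isalpha))
```

In an interactive project a data steward would approve or reject the Suggested and New values; in a batch run they would typically be routed to a separate output for later review.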
Matching – DQS Project
Why Match?
• Identify duplicates within the data source
• Create a consolidated view of data
DQS Matching
• Build a matching policy
• Train the matching policy
• Create a matching project
• Choose survivors
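The matching steps above (define a policy, score record pairs, group matches, choose survivors) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the field weights, the minimum score, the greedy clustering, and the survivorship rule are stand-ins, not the DQS matching algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical matching policy: per-field weights (summing to 1.0) and a minimum score.
WEIGHTS = {"name": 0.6, "city": 0.4}
MIN_MATCH_SCORE = 0.8


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1, r2):
    """Weighted similarity of two records across the policy fields."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


def cluster(records):
    """Greedy clustering: a record joins the first cluster whose pivot it matches."""
    clusters = []
    for rec in records:
        for members in clusters:
            if match_score(members[0], rec) >= MIN_MATCH_SCORE:
                members.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def choose_survivor(members):
    """Assumed survivorship rule: keep the record with the most non-empty fields."""
    return max(members, key=lambda r: sum(1 for v in r.values() if v))


records = [
    {"name": "John Ryan", "city": "Boston", "phone": ""},
    {"name": "Jon Ryan", "city": "Boston", "phone": "555-0100"},
    {"name": "Mary Smith", "city": "Denver", "phone": "555-0199"},
]
clusters = cluster(records)
survivors = [choose_survivor(members) for members in clusters]
print(survivors)
```

Tuning the weights and the minimum score against sample data plays the role of the training step: thresholds that are too low merge distinct people, while thresholds that are too high leave duplicates unmatched.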
Q&A
Miguel Ángel Granados Troncoso
@[email protected]
Personal Blog: http://www.granadostroncoso.com.mx
PASS Mexico City Chapter: http://mexico.sqlpass.org @PASSMXDF
SolidQ Journal: http://www.solidq.com/sqj/Pages/Home.aspx
Microsoft: http://www.microsoft.com/sqlserver/en/us/solutions-technologies/SQL-Server-2012-business-intelligence.aspx