

8/6/2019 60.Data Quality Strategy a Step-By-Step Approach

http://slidepdf.com/reader/full/60data-quality-strategy-a-step-by-step-approach 1/28

WHITE PAPER

Why Have a Data Quality Strategy?

Coursing through the electronic veins of organizations around the globe are critical pieces of information, whether they be about customers, products, inventories, or transactions. While the vast majority of enterprises spend months and even years determining which computer hardware, networking, and enterprise software solutions will help them grow their businesses, few pay attention to the data that will support their investments in these systems. In fact, Gartner contends, “By 2005, Fortune 1000 enterprises will lose more money in operational inefficiency due to data quality issues than they will spend on data warehouse and customer relationship management (CRM) initiatives (0.9 probability).” (Gartner Inc., T. Friedman, April 2004).

In its 2002 readership survey conducted by the Gantry Group LLC, DM Review asked, “What are the three biggest challenges of implementing a business intelligence/data warehousing (BI/DW) project within your organization?”

Of the 688 people who responded, the number-one answer (35% of respondents) was budget constraints. Tied with budget constraints, the other number-one answer was data quality. In addition, an equal number of respondents (35%) cited data quality as more important than budget constraints.

Put simply, to realize the full benefits of their investments in enterprise computing systems, organizations must have a detailed understanding of the quality of their data: how to clean it, and how to keep it clean. And those organizations that approach this issue strategically are those that will be successful. But what goes into a data quality strategy? This paper from Business Objects, an SAP company, explores strategy in the context of data quality.

Data Quality Strategy: A Step-by-Step Approach

CONTENTS
Why Have a Data Quality Strategy?
Definitions of Strategy
Building a Data Quality Strategy
Data Quality Goals
The Six Factors of Data Quality
  Factor 1: Context
  Factor 2: Storage
  Factor 3: Data Flow
  Factor 4: Workflow
  Factor 5: Stewardship
  Factor 6: Continuous Monitoring
Tying It All Together
Implementation and Project Management
Appendix A: Data Quality Strategy Checklist
About Business Objects


Definitions of Strategy

Many definitions of strategy can be found in management literature. Most fall into one of four categories centered on planning, positioning, evolution, and viewpoint. There are even different schools of thought on how to categorize strategy; a few examples include corporate strategies, competitive strategies, and growth strategies. Rather than pick any one in particular, claiming it to be the right one, this paper avoids the debate of which definition is best, and picks the one that fits the management of data. This is not to say other definitions do not fit data. However, the definition this paper uses is, “Strategy is the implementation of a series of tactical steps.” More specifically, the definition used in this paper is:

“Strategy is a cluster of decisions centered on goals that determine what actions to take and how to apply resources.”

Certainly a cluster of decisions, in this case concerning six specific factors, needs to be made to effectively improve the data. Corporate goals determine how the data is used and the level of quality needed. Actions are the processes improved and invoked to manage the data. Resources are the people, systems, financing, and data itself. We therefore apply the selected definition in the context of data, and arrive at the definition of data quality strategy:

“A cluster of decisions centered on organizational data quality goals that determine the data processes to improve, solutions to implement, and people to engage.”

Business Objects. Data Quality Strategy: A Step-by-Step Approach


Building a Data Quality Strategy

This paper discusses:

• Goals that drive a data quality strategy
• Six factors that should be considered when building a strategy: context, storage, data flow, workflow, stewardship, and continuous monitoring
• Decisions within each factor
• Actions stemming from those decisions
• Resources affected by the decisions and needed to support the actions

You will see how the six factors of data quality, combined in different ways, show why people, process, and technology are the integral and fundamental elements of information quality.

The paper concludes with a discussion on the transition from data quality strategy development to implementation via data quality project management. Finally, the appendix presents a strategy outline to help your business and IT managers develop a data quality strategy.


Data Quality Goals

Goals drive strategy. Your data quality goals must support ongoing functional operations, data management processes, or other initiatives, such as the implementation of a new data warehouse, CRM application, or loan processing system. Contained within these initiatives are specific operational goals. Examples of operational goals include:

• Reducing the time it takes you to process quarterly customer updates
• Cleansing and combining 295 source systems into one master customer information file
• Complying with the U.S. Patriot Act and other governmental or regulatory requirements to identify customers
• Determining if a vendor data file is fit for loading into an enterprise resource planning (ERP) system

In turn, an enterprise-level initiative is driven by strategic goals of the organization. For example, a strategic goal to increase revenue by 5% through cross-selling and up-selling to current customers would drive the initiative to cleanse and combine 295 source systems into one master customer information file. The link between the goal and the initiative is a single view of the customer versus 295 separate views. This single view allows you to have a complete profile of the customer and identify opportunities otherwise unseen. At first inspection, strategic goals may be so high-level that they seem to provide little immediate support for data quality. Eventually, however, strategic goals are achieved by enterprise initiatives that create demands on information in the form of data quality goals.

For example, a nonprofit organization establishes the objective of supporting a larger number of orphaned children. To do so, it needs to increase donations, which is considered a strategic goal for the charity. The charity determines that to increase donations it needs to identify its top donors. A look at the donor files causes immediate concern: there are numerous duplicates, missing first names, incomplete addresses, and a less-than-rigorous segmentation between donor and prospect files, leading to overlap between the two groups. In short, the organization cannot reliably identify its top donors. At this point, the data quality goals become apparent: a) cleanse and standardize both donor and prospect files, b) find all duplicates in both files and consolidate the duplicates into “best-of” records, and c) find all duplicates across the donor and prospect files, and move prospects to the prospect file, and donors to the donor file.

As this example illustrates, every strategic goal of an organization is eventually supported by data. The ability of an organization to attain its strategic goals is, in part, determined by the level of quality of the data it collects, stores, and manages on a daily basis.
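The three data quality goals in the charity example can be sketched in a few lines of Python. This is a minimal illustration rather than a production matching engine. The record fields (`first`, `last`, `address`), the match key, the "best-of" rule (keep the longest value per field), and the rule that anyone in the donor file counts as a donor are all assumptions chosen for brevity.

```python
from collections import defaultdict

def standardize(rec):
    """Goal (a): collapse whitespace and normalize case in every field."""
    return {k: " ".join(str(v).split()).title() for k, v in rec.items()}

def match_key(rec):
    """Hypothetical match rule: same last name and same address."""
    return (rec["last"].lower(), rec["address"].lower())

def consolidate(records):
    """Goal (b): merge duplicates into one "best-of" record per key."""
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[match_key(rec)].append(rec)
    best = {}
    for key, dupes in groups.items():
        # Assumed best-of rule: keep the longest value seen for each field.
        best[key] = {f: max((d.get(f, "") for d in dupes), key=len)
                     for f in dupes[0]}
    return best

def separate(donors, prospects):
    """Goal (c): anyone already in the donor file leaves the prospect file."""
    donor_best = consolidate(donors)
    prospect_best = consolidate(prospects)
    prospects_only = {k: v for k, v in prospect_best.items()
                      if k not in donor_best}
    return list(donor_best.values()), list(prospects_only.values())

donors = [
    {"first": "", "last": "SMITH", "address": "12 oak  st"},
    {"first": "Ann", "last": "Smith", "address": "12 Oak St"},
]
prospects = [
    {"first": "ann", "last": "smith", "address": "12 Oak St"},
    {"first": "Raj", "last": "Patel", "address": "9 Elm Ave"},
]
clean_donors, clean_prospects = separate(donors, prospects)
```

In this sample, the two donor rows collapse into a single record with the first name filled in, and the matching prospect row is dropped from the prospect file. Real donor files would need fuzzier matching than an exact key.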


The Six Factors of Data Quality

When creating a data quality strategy, there are six factors, or aspects, of an organization’s operations that must be considered. The six factors are:

1. Context: the type of data being cleansed and the purposes for which it is used
2. Storage: where the data resides
3. Data flow: how the data enters and moves through the organization
4. Workflow: how work activities interact with and use the data
5. Stewardship: people responsible for managing the data
6. Continuous monitoring: processes for regularly validating the data

Figure 1 depicts the six factors centered on the goals of a data quality initiative. Each factor requires that decisions be made, actions carried out, and resources allocated.

Figure 1: Data Quality Factors
[Diagram: Goals at the center, surrounded by the six factors (Context, Storage, Data Flow, Workflow, Stewardship, and Continuous Monitoring), each comprising Decisions, Actions, and Resources.]

Each data quality factor is an element of the operational data environment. It can also be considered as a view or perspective of that environment. In this representation (Figure 1), a factor is a collection of decisions, actions, and resources centered on an element of the operational data environment. The arrows extending from the core goals of the initiative depict the connection between goals and factors, and illustrate that goals determine how each factor will be considered.


Factor 1: Context

Context defines the type of data and how the data is used. Ultimately, the context of your data determines the necessary types of cleansing algorithms and functions needed to raise the level of quality. Examples of context and the types of data found in each context are:

• Customer data: names, addresses, phone numbers, social security numbers, and so on
• Financial data: dates, loan values, balances, titles, account numbers, and types of account (revocable or joint trusts, and so on)
• Supply chain data: part numbers, descriptions, quantities, supplier codes, and the like
• Telemetry data: for example, height, speed, direction, time, and measurement type

Context can be matched against the appropriate type of cleansing algorithms. For example, “title” is a subset of a customer name. In the customer name column, embedded within the first name or last name or by itself, are a variety of titles: VP, President, Pres, Gnl Manager, and Shoe Shiner. It takes a specialized data-cleansing algorithm to “know” the complete domain set of values for title, and then be configurable for the valid domain range that is a subset. You may need a title-cleansing function to correct Gneral Manager to General Manager, to standardize Pres to President, and, depending on the business rules, to either eliminate Shoe Shiner or flag the entire record as out of domain.
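A title-cleansing function of the kind described above might look like the following sketch. The lookup table, the valid subset, and the `eliminate_invalid` switch are hypothetical stand-ins for the much larger reference data and configurable business rules a real cleansing product would provide.

```python
# Hypothetical reference data: the "complete domain set" of known titles,
# mapping each raw variant to its standardized form.
TITLE_DOMAIN = {
    "vp": "VP",
    "president": "President",
    "pres": "President",
    "gnl manager": "General Manager",
    "gneral manager": "General Manager",
    "general manager": "General Manager",
    "shoe shiner": "Shoe Shiner",
}

# The configurable "valid domain range": the subset this business accepts.
VALID_TITLES = {"VP", "President", "General Manager"}

def cleanse_title(raw, eliminate_invalid=False):
    """Return (title, status) for one raw title value."""
    key = " ".join(raw.lower().replace(".", "").split())
    title = TITLE_DOMAIN.get(key)
    if title is None:
        return raw, "unknown"            # not in the full domain set
    if title not in VALID_TITLES:
        # Business rule: either drop the title or flag the record.
        if eliminate_invalid:
            return None, "eliminated"
        return title, "out_of_domain"
    return title, "ok"
```

For instance, `cleanse_title("Gneral Manager")` yields the standardized title with status `"ok"`, while `cleanse_title("Shoe Shiner")` is either flagged or eliminated depending on the rule chosen.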

Factor 2: Storage

Every data quality strategy must consider where data physically resides. Considering storage as a data quality factor ensures the physical storage medium is included in the overall strategy. System architecture issues, such as whether data is distributed or centralized, homogeneous or heterogeneous, are important. If the data resides in an enterprise application, the type of application (CRM, ERP, and so on), vendor, and platform will dictate connectivity options to the data. Connectivity options between the data and data quality function generally fall into the following three categories:

• Data extraction
• Embedded procedures
• Integrated functionality


Data Extraction

Data extraction occurs when the data is copied from the host system. It is then cleansed, typically in a batch operation, and reloaded into the host. Extraction is used for a variety of reasons, not the least of which is that native, direct access to the host system is either impractical or impossible. For example, an IT project manager may attempt to cleanse data in VSAM files on an overloaded mainframe, where the approval process to load a new application (a cleansing application, in this case) on the mainframe takes two months, if approved at all. Extracting the data from the VSAM files to an intermediate location (for cleansing, in this case) is the only viable option. Extraction is also a preferable method if the data is being moved as part of a one-time legacy migration or a regular load process to a data warehouse.
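The extract, cleanse, and reload cycle can be sketched as follows. An in-memory SQLite database stands in for the host system purely for illustration (the paper's example host is a mainframe with VSAM files), and the `customer` table and `cleanse` function are hypothetical.

```python
import sqlite3

def cleanse(name):
    """Stand-in for a real cleansing function: collapse spaces, fix case."""
    return " ".join(name.split()).title()

# Stand-in host system; in practice this would be the production store.
host = sqlite3.connect(":memory:")
host.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
host.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "  jane   DOE"), (2, "john smith ")])

# 1. Extract: copy the rows out of the host to an intermediate location.
staged = host.execute("SELECT id, name FROM customer").fetchall()

# 2. Cleanse: run the batch away from the host system.
cleansed = [(cleanse(name), row_id) for row_id, name in staged]

# 3. Reload: write the cleansed values back in a single transaction.
with host:
    host.executemany("UPDATE customer SET name = ? WHERE id = ?", cleansed)

result = host.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

Because the cleansing step touches only the staged copy, the host is involved only for the initial read and the final bulk update.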

Embedded Procedures

Embedded procedures are the opposite of extractions. Here, data quality functions are embedded, perhaps compiled, into the host system. Custom-coded, stored procedure programming calls invoke the data quality functions, typically in a transactional manner. Embedded procedures are used when the strategy dictates the utmost customization, control, and tightest integration into the operational environment. A homegrown CRM system is a likely candidate for this type of connectivity.

Integrated Functionality

Integrated functionality lies between data extraction and embedded procedures. Through the use of specialized, vendor-supplied links, data quality functions are integrated into enterprise information systems. A link allows for a quick, standard integration with seamless operation, and can function in either a transactional or batch mode. Owners of CRM, ERP, or other enterprise application software packages often choose this type of connectivity option. Links are a specific technology deployment option, and are discussed in additional detail below, in the workflow factor. Deployment options are the technological solutions and alternatives that facilitate a chosen connectivity strategy.

Data model analysis or schema design review also falls under the storage factor. The existing data model must be assessed for its ability to support the project. Is the model scalable and extensible? What adjustments to the model are needed? For instance, field overuse is one common problem encountered in a data quality initiative that requires a model change. This can happen with personal names, for example, where pre-names (Mr., Mrs.), titles (president, director), and certifications (CPA, PhD) may need to be separated from the name field into their own fields for better customer identification.
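To make the field-overuse example concrete, the sketch below splits an overloaded name field into separate fields. The word lists and the output field names are assumptions for illustration; a real parser would rely on much richer reference data and handle multi-word titles.

```python
# Assumed, deliberately tiny reference lists.
PRENAMES = {"mr", "mrs", "ms", "dr"}
TITLES = {"president", "director", "vp"}
CERTIFICATIONS = {"cpa", "phd", "md"}

def split_name_field(raw):
    """Split an overused name field into prename, name, title, certification."""
    out = {"prename": [], "name": [], "title": [], "certification": []}
    for token in raw.replace(",", " ").split():
        key = token.lower().replace(".", "")
        if key in PRENAMES:
            out["prename"].append(token)
        elif key in TITLES:
            out["title"].append(token)
        elif key in CERTIFICATIONS:
            out["certification"].append(token)
        else:
            out["name"].append(token)   # anything unrecognized stays in name
    return {field: " ".join(tokens) for field, tokens in out.items()}

parsed = split_name_field("Mrs. Maria Lopez, CPA, Director")
```

Each output key corresponds to a column the revised data model would need in order to hold the separated values.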


Factor 3: Data Flow

Each of the six strategy factors builds a different view of the operational data environment. With context (type of data) and storage (physical location) identified, the next step in developing a data quality strategy is to focus on data flow, the movement of data.

Data does not stay in one place. Even with a central data warehouse, data moves in and out just like any other form of inventory. The migration of data can present a moving target for a data quality strategy. Hitting that target is simplified by mapping the data flow. Once mapped, staging areas provide a “freeze frame” of the moving target. A data flow will indicate where the data is manipulated, and if the usage of the data changes context. Certainly the storage location will change, but knowing the locations in advance makes the strategy more effective as the best location can be chosen given the specific goals. Work evaluating data flow will provide iterative refinement of the results compiled in both the storage and context factors.

Data flow is important because it depicts access options to the data, and catalogs the locations in a networked environment where the data is staged and manipulated. Data flow answers the question: Within operational constraints, what are the opportunities to cleanse the data? In general, such opportunities fall into the following categories:

• Transactional updates
• Operational feeds
• Purchased data
• Legacy migration
• Regular maintenance

Figure 2 shows where these opportunities can occur in an information supply chain. In this case, a marketing lead generation workflow is used with its accompanying data flow. The five cleansing opportunities are discussed in the subsequent sections.


Figure 2: Lead Generation Workflow
[Diagram: a trade show lead generation workflow (collect leads, store leads, qualify leads, distribute leads, engage prospects) and its data flow, annotated with the points where transactional updates, operational feeds, purchased lists, legacy migration, and maintenance offer cleansing opportunities.]

Transactional Updates

An inherent value of the data flow factor is that it invites a proactive approach to data cleansing. The entry points of information into the organization (in this case, transactions) can be seen, depicting where the exposure to flawed data may occur. When a transaction is created or captured, there is an opportunity to validate the individual data packet before it is saved to the operational data store (ODS). Transactional updates offer the chance to validate data as it is created or captured in a data packet, rich with contextual information. Any defects encountered can immediately be returned to the creator or originator for confirmation of change. This contextual setting is lost as the data moves further in the workflow and away from the point of entry.

The difference between a created and captured transaction is subtle, but important. A created transaction is one where the creator (owner of the data) directly enters the data into the electronic system as a transaction. A good example is a new subscriber to a magazine who logs onto the magazine’s Web site and fills out an order for a subscription. The transaction is created, validated, and processed automatically without human intervention.


Alternatively, a captured transaction is where the bulk of data collection takes place offline and is later entered into the system by someone other than the owner of the data. A good example is a new car purchase where the buyer fills out multiple paper forms, and several downstream operators enter the information (such as registration, insurance, loan application, and vehicle configuration data) into separate systems. Created and captured data workflows are substantially different from each other. Correcting the data with owner feedback is substantially easier and less complex at the point of creation than in the steps removed during capture.
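Validating a transaction at the point of entry, and returning defects to the originator before anything is saved, can be sketched as below. The field names, the validation rules, and the use of a plain list as the operational data store are illustrative assumptions only.

```python
import re

def validate_transaction(tx):
    """Return a list of defects for one transaction; empty means accept."""
    defects = []
    if not tx.get("name", "").strip():
        defects.append("name is missing")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", tx.get("email", "")):
        defects.append("email is not well formed")
    if not re.fullmatch(r"\d{5}(-\d{4})?", tx.get("zip", "")):
        defects.append("ZIP code must be 5 or 9 digits")
    return defects

def submit(tx, store):
    """Save clean transactions; hand defects straight back to the originator."""
    defects = validate_transaction(tx)
    if defects:
        return {"accepted": False, "defects": defects}
    store.append(tx)                    # only validated data reaches the store
    return {"accepted": True, "defects": []}

ods = []  # stand-in for the operational data store
ok = submit({"name": "Ann Smith", "email": "ann@example.com", "zip": "54601"}, ods)
bad = submit({"name": "", "email": "ann@example", "zip": "5460"}, ods)
```

The second submission is rejected with its defects listed, so the person who entered it can correct the data while the context is still available.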

Operational Feeds

The second opportunity to cleanse data is operational feeds. These are regular (monthly, weekly, or nightly) updates supplied from distributed sites to a central data store. A weekly upload from a subsidiary’s CRM system to the corporation’s data warehouse is an example. Regular operational feeds collect the data into batches that allow implementation of scheduled batch-oriented data validation functions in the path of the data stream. Transactional updates, instead of being cleansed individually (which implies slower processing and wider implementation footprint), can be batched together if immediate feedback to the transaction originator is either not possible or not necessary. Transaction-oriented cleansing in this manner is implemented as an operational data feed. Essentially, transaction cleansing validates data entering an ODS, such as a back-end database for a Web site, whereas operational-feed validation cleanses data leaving an ODS, passing to the next system, typically a data warehouse, ERP, or CRM application.

Purchased Data

A third opportunity to cleanse is when the data is purchased. Purchased data is a special situation. Many organizations erroneously consider data to be clean when purchased. This is not necessarily the case. Data vendors suffer from the same aging, context-mismatch, field overuse, and other issues that all other organizations suffer. If a purchased list is not validated upon receipt, the purchasing organization essentially abdicates its data quality standards to those of the vendor.


Validating purchased data extends beyond verifying that each column of data is correct. Validation must also match the purchased data against the existing data set. The merging of two clean data sets is not the equivalent of two clean rivers joining into one; rather, it is like pouring a gallon of red paint into blue. In the case of a merge, 1 + 1 does not always equal 2, and may actually be 1.5, with the remainder being lost because of duplication. To ensure continuity, the merged data sets must be matched and consolidated as one new, entirely different set. A hidden danger with purchased data is that it enters the organization in an ad hoc event, which implies no regular process exists to incorporate the data into the existing systems. The lack of established cleansing and matching processes written exclusively for the purchased data raises the possibility that cleansing will be overlooked.
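The "1 + 1 may be 1.5" effect is easy to see in a small sketch that matches a purchased list against the existing data set before loading it. The match key (normalized name plus postal code) and the consolidation rule (fill gaps, never overwrite) are assumptions made for illustration.

```python
def key(rec):
    """Assumed match key: normalized name plus postal code."""
    return (" ".join(rec["name"].lower().split()), rec["zip"])

def merge_lists(existing, purchased):
    """Match purchased records against the existing set before loading."""
    merged = {key(r): dict(r) for r in existing}
    duplicates = 0
    for rec in purchased:
        k = key(rec)
        if k in merged:
            duplicates += 1
            # Consolidate: fill gaps in the existing record, never overwrite.
            for field, value in rec.items():
                if not merged[k].get(field):
                    merged[k][field] = value
        else:
            merged[k] = dict(rec)
    return list(merged.values()), duplicates

existing = [{"name": "Ann Smith", "zip": "54601", "phone": ""},
            {"name": "Raj Patel", "zip": "10001", "phone": "555-0100"}]
purchased = [{"name": "ann  smith", "zip": "54601", "phone": "555-0199"},
             {"name": "Li Wei", "zip": "94105", "phone": "555-0142"}]
merged, duplicates = merge_lists(existing, purchased)
```

Here two lists of two records each merge into three records, not four, and the surviving record for the duplicate gains the phone number that the existing data lacked.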

Legacy Migration

A fourth opportunity to cleanse data is during a legacy migration. When you export data from an existing system to a new system, old problems from the previous system can infect the new system unless the data is robustly checked and validated. For example, a manufacturing company discovers during a data quality assessment that it has three types of addresses (site location, billing address, and corporate headquarters) but only one address record per account. To capture all three addresses, the account staff was duplicating account records. To correct the problem, the account record structure model of the new target system is modified to hold three separate addresses, before the migration occurs. Account records that are duplicated because of different addresses can then be consolidated during the migration operation.
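The consolidation step in the manufacturing example could be sketched as follows. The legacy row layout (an account name, one address, and an explicit address type) is a hypothetical simplification; in real legacy data the address type would often have to be inferred.

```python
from collections import defaultdict

# The three address slots in the revised target record structure.
ADDRESS_TYPES = ("site", "billing", "headquarters")

def consolidate_accounts(legacy_rows):
    """Fold duplicated legacy accounts into one record with three addresses."""
    accounts = defaultdict(lambda: dict.fromkeys(ADDRESS_TYPES))
    for row in legacy_rows:
        # Assumed match rule: account names are equal after trimming and
        # upper-casing. Real migrations usually need fuzzier matching.
        name = row["account"].strip().upper()
        accounts[name][row["type"]] = row["address"]
    return dict(accounts)

legacy = [
    {"account": "Acme Corp", "type": "site", "address": "1 Plant Rd"},
    {"account": "ACME CORP ", "type": "billing", "address": "PO Box 9"},
    {"account": "Acme Corp", "type": "headquarters", "address": "500 Main St"},
]
migrated = consolidate_accounts(legacy)
```

Three duplicated legacy account rows become one target account whose structure holds all three addresses.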

A question often arises at this point: The account managers were well aware of what they were doing, but why was the duplication of accounts not taken into consideration during the early design of the target system? The answer lies in the people involved in the design of the new system: what users were interviewed, and how closely the existing workflow practices were observed. Both of these topics are covered in the workflow and data stewardship factors discussed later in this paper.


Regular Maintenance

The fifth and final opportunity to cleanse data is during regular maintenance. Even if a data set is defect-free today (highly unlikely), tomorrow it will be flawed. Data ages. For example, each year, 17% of U.S. households move, and 60% of phone records change in some way. Moreover, every day people get married, divorced, have children, have birthdays, get new jobs, get promoted, and change titles. Companies start up, go bankrupt, merge, acquire, rename, and spin off. To account for this irrevocable aging process, organizations must implement regular data cleansing processes, be it nightly, weekly, or monthly. The longer the interval between regular cleansing activities, the lower the overall value of the data.

Regular maintenance planning is closely tied to the sixth strategy factor, Continuous Monitoring. Both require your organization to assess the volatility of its data, the frequency of user access, the schedule of operations that use the data, and the importance of the data, and hence the minimum level of quality required. Keeping all of this in mind, your organization can establish the periodicity of cleansing. The storage factor will have identified the location of the data and preferred connectivity option.
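One rough way to reason about cleansing periodicity is to estimate how much of a data set goes stale between runs. The sketch below assumes every record changes independently at a constant annual rate, which is a simplification introduced here for illustration and not a model taken from this paper.

```python
def stale_share(annual_change_rate, days_between_cleansings):
    """Expected share of records that changed since the last cleansing.

    Assumes each record changes independently at a constant annual rate,
    a simplifying assumption for illustration only.
    """
    years = days_between_cleansings / 365
    return 1 - (1 - annual_change_rate) ** years

# 17% is the annual household move rate quoted in the text above.
for label, days in [("nightly", 1), ("weekly", 7), ("monthly", 30)]:
    share = stale_share(0.17, days)
    print(f"{label:8s} {share:.3%} of addresses out of date")
```

Comparing the estimated stale share against the minimum quality level the data must meet gives a starting point for choosing between nightly, weekly, or monthly cleansing.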


Factor 4: Workflow

Workflow is the sequence of physical tasks necessary to accomplish a given operation. In an automobile factory, a workflow can be seen as a car moving along an assembly line, each workstation responsible for a specific set of assembly tasks. In an IT or business environment, the workflow is no less discrete, just less visually stimulating. When an account manager places a service call to a client, the account manager is performing a workflow task in the same process-oriented fashion as an engine bolted into a car.

Figure 3 shows a workflow for a lead generation function where a prospect visits a booth at a tradeshow and supplies contact information to the booth personnel. From there, the workflow takes over and collects, enters, qualifies, matches, consolidates, and distributes the lead to the appropriate sales person, who then adds new information back to the new account record.

Figure 3: Workflow Touch Points and Data Quality Deployment Options
[Diagram: a lead moves from the tradeshow through collection, entry, qualification, and sales notification to sales contact. Touch points include point of capture, data entry, information extraction, and matching and consolidation; deployment options include real-time, enterprise application plug-in, custom application, and manual or automated batch.]

In Figure 3 above, two different concepts are indicated. Workflow touch points, shown in red, are the locations in the workflow where data is manipulated. You can consider these as the locations where the workflow intersects the data flow.


Some of these locations, like “Point of Capture,” actually spawn a data flow. Data quality deployment options, shown in purple, are a specific type of software implementation that allows connectivity for use of data quality functionality at the point needed. In regard to workflow, data quality operations fall into the following areas:

• Front-office transaction: real-time cleansing
• Back-office transaction: staged cleansing
• Back-office batch cleansing
• Cross-office enterprise application cleansing
• Continuous monitoring and reporting

Each area broadly encompasses work activities that are either customer-facing or not, or both, and the type of cleansing typically needed to support them. Specific types of cleansing deployment options help facilitate these areas. Deployment options are not to be confused with the connectivity options discussed in the storage factor. Connectivity options are the three general methods for accessing the data: extraction, embedded procedures, and integrated functionality. Deployment options are forms of cleansing technology implementations that support a particular connectivity strategy. The deployment option list below identifies the types of options:

• Low-level application program interface (API) software libraries: high-control custom applications
• High-level API software libraries: quick, low-control custom applications
• Web-enabled applications: real-time e-commerce operations
• Enterprise application plug-ins: ERP, CRM, and extraction, transformation, and load (ETL) integrations
• Graphical user interface (GUI) interactive applications: data profiling
• Batch applications: auto or manual start
• Web services and application service provider (ASP) connections: access to external or outsourced functions

Each option incorporates data quality functions that measure, analyze, identify, standardize, correct, enhance, match, and consolidate the data.


In a workfow, i a data touch point is not protected with validation unctions,deective data is captured, created, or propagated per the nature o the touch point.An important action in the workfow actor is listing the various touch points to identiylocations where deective data can leak into your inormation stream. Superimposing

the list on a workfow diagram gives planners the ability to visually map cleansingtactics, and logically cascade one data quality unction to eed another.

I a “leaky” area exists in the inormation pipeline, the map helps to positionredundant checks around the leak to contain the contamination. When building thelist and map, concentrate on the data dened by the goals. A workfow may havenumerous data touch points, but a subset will interact with specied data elements.

For example, a teleprospecting department needs to have all of the telephone area codes for its contact records updated because account managers, rather than making calls, are spending an increasing amount of time researching wrong phone numbers stemming from area code changes. The data touch points for just the area code data are far fewer than those for an entire contact record. By focusing on the three touch points for area codes, the project manager is able to identify two sources of phone number data to be cleansed, and limit the project scope to just those touch points and data sources. With the project scope narrowly defined, operational impact and costs are reduced, and expectations of disruption are lowered. The net result is that it is easier to obtain approval for the project.
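The area code example can be sketched in code. The following is a minimal illustration, not a prescribed implementation: the `TouchPoint` structure, the touch point names, and the data source names are all hypothetical, chosen only to mirror the three touch points and two data sources described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TouchPoint:
    name: str            # place in the workflow where data is captured, created, or propagated
    source: str          # data source behind this touch point
    elements: frozenset  # data elements handled at this touch point
    validated: bool      # True if a validation function already protects it


def touch_points_for(touch_points, element):
    """Return only the touch points that interact with the given data element."""
    return [tp for tp in touch_points if element in tp.elements]


def unprotected(touch_points):
    """Return the touch points where defective data could leak in."""
    return [tp for tp in touch_points if not tp.validated]


# Hypothetical inventory for a contact-management workflow.
TOUCH_POINTS = [
    TouchPoint("web lead form", "web_leads", frozenset({"name", "email", "area_code"}), False),
    TouchPoint("call center entry", "crm_contacts", frozenset({"name", "area_code", "phone"}), True),
    TouchPoint("nightly list import", "crm_contacts", frozenset({"area_code", "phone", "address"}), False),
    TouchPoint("billing update", "billing", frozenset({"address", "payment_terms"}), True),
    TouchPoint("newsletter signup", "web_leads", frozenset({"email"}), False),
]

# Scope the project to the one data element named in the goal.
area_code_points = touch_points_for(TOUCH_POINTS, "area_code")
sources_in_scope = sorted({tp.source for tp in area_code_points})
leaks = [tp.name for tp in unprotected(area_code_points)]

print(f"{len(area_code_points)} touch points, sources: {sources_in_scope}")
print(f"Unprotected touch points: {leaks}")
```

Filtering the inventory by data element first, and only then looking for unprotected touch points, keeps the project scope as narrow as the goal requires.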

Factor 5: Stewardship

No strategy is complete without the evaluation of the human factor and its effect on operations. Workflows and data flows are initiated by people. Data itself has no value except to fulfill purposes set forth by people. The people who manage data processes are, in the current data warehouse vernacular, called data stewards. A plain, nonspecialized steward is defined in the dictionary as, “One who manages another’s property, finances, or other affairs.” Extending that definition for our purposes, a data steward is a person who manages information and activities that encompass data creation, capture, maintenance, decisions, reporting, distribution, and deletion. Therefore, a person performing any of these functions on a set of data is a data steward.

Much can be said about each of these activities, not to mention the principles of how to manage, provide incentives for, assign accountability to, and structure responsibilities for data stewards. A discussion of organizational structures for data stewards could easily occupy a chapter in a book on data quality.

In the definition of steward, one phrase deserves emphasis: “One who manages another’s property …” Project managers often complain that they cannot move a project past a certain point because the stakeholders can’t agree on who owns the data. This is squarely a stewardship issue. No steward owns the data. The data is owned by the organization, just as surely as the organization owns its name, trademarks, cash, and purchased equipment. The debate on ownership is not really about ownership, but usually centers on who has the authority to approve a change to the data. The answer is the data stewardship team.

An action in the stewardship factor is to identify the stakeholders (stewardship team) of the source data. Inform them of the plans, ask each one about their specific needs, and collect their feedback. If there are many stakeholders, selecting a representative from each user function is highly encouraged. To do less will surely result in one of three conditions:

• A change is made that alienates half of the users and the change is rolled back

• Half of the users are alienated and they quit using the system

• Half of the users are alienated, but are forced to use the system, and grumble and complain at every opportunity

Most would agree that none of these three outcomes is good for future working relationships!

Some organizations have progressed to the point where a formal data stewardship team is appointed. In this case, someone has already identified the stakeholders and selected them as representatives on the team. This definitely makes strategy development a quicker process, as data stewards don’t have to be located.

When evaluating the data stewardship factor for a new project, the following tasks need to be performed:

• Answer questions such as: Who are the stakeholders of the data? Who are the predominant user groups, and can a representative of each be identified? Who is responsible for the creation, capture, maintenance, reporting, distribution, and deletion of the data? If any one of these people is missed, their actions will fall out of sync as the project progresses, and one of those “You never told me you were going to do that!” moments will occur.

• Carefully organize requirements-collecting sessions with the stakeholders. Tell these representatives any plans that can be shared, assure them that nothing is final yet, and gather their input. Let these people know that they are critical stakeholders. If strong political divisions exist between stakeholders, meet with them separately and arbitrate the disagreements. Do not set up a situation where feuds can erupt.

• Once a near-final set of requirements and a preliminary project plan are ready, reacquaint the stakeholders with the plan. Expect changes.

• Plan to provide training and education for any new processes, data model changes, and updated data definitions.

• Consider the impact of new processes or changed data sets on organizational structure. Usually a data quality project is focused on an existing system, and current personnel reporting structures can absorb the new processes or model changes. Occasionally, however, the existing system may need to be replaced or migrated to a new system, and large changes in information infrastructure are frequently accompanied by personnel shifts.

Data quality projects usually involve some changes to existing processes. The goal of half of all data quality projects is, after all, workflow improvement. For example, a marketing department in one organization sets a goal of reducing the processing time for new leads from two weeks to one day. The existing process consists of manually checking each new lead for duplicates against its CRM system. The department decides to implement an automated match and consolidation operation. The resulting workflow improvement not only saves labor time and money, but also results in more accurate prospect data. With improvement comes change (sometimes major, sometimes minor) in the roles and responsibilities of the personnel involved. Know what those changes will be.

A plan to compile and advertise the benefits (return on investment) of a data quality project deserves strategic consideration. This falls in the stewardship factor because it is the data stewards and project managers who are tasked with justification. Their managers may deliver the justification to senior management, but it’s often the data stewards who are required to collect, measure, and assert the “payoff” for the organization. Once the message is crafted, do not underestimate the need for and value of repeatedly advertising how the improved data will specifically benefit the organization. Give your organization the details as a component of an internal public relations or employee relations campaign. Success comes from continually reinforcing the benefits to the organization. This builds momentum while keeping expectations realistic. That momentum will see the project through budget planning, when it is compared against competing projects.

Factor 6: Continuous Monitoring

The final factor in a data quality strategy is continuous monitoring. Adhering to the principles of Total Quality Management (TQM), continuous monitoring is measuring, analyzing, and then improving a system in a continuous manner. Continuous monitoring is crucial for the effective use of data, as data immediately ages after capture, and future capture processes can generate errors.

Consider the volatility of data representing attributes of people. As stated earlier, in the United States, 17% of the population moves annually, which means the addresses of 980,000 people change each week. A supplier of phone numbers reports that 7% of non-wireless U.S. phone numbers change each month, equating to approximately 3.5 million phone numbers changing each week. In the United States, 5.8 million people have a birthday each week, and an additional 77,000 are born each week. These sample statistics reflect the transience of data. Each week, mergers and acquisitions change the titles, salaries, and employment status of thousands of workers. The only way to effectively validate dynamic data for use in daily operations is to continuously monitor and evaluate it using a set of quality measurements appropriate to the data.
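The arithmetic behind the address figure can be reproduced with a short sketch. The population value below is an assumption (the paper does not state the figure it used), and the contact file size and cleansing interval are hypothetical, included only to show how an annual change rate translates into stale records.

```python
WEEKS_PER_YEAR = 52


def weekly_changes(population, annual_rate):
    """Average number of records that change per week at a given annual rate."""
    return population * annual_rate / WEEKS_PER_YEAR


def expected_stale_records(records, annual_rate, weeks):
    """Linear estimate of records gone stale after `weeks` without cleansing.

    Counts every change as a separate record, so it slightly overstates
    staleness over long intervals.
    """
    return records * annual_rate * weeks / WEEKS_PER_YEAR


# Assumption: a U.S. population of roughly 300 million (not stated in the paper).
ASSUMED_US_POPULATION = 300_000_000
MOVE_RATE = 0.17  # 17% of the population moves annually (from the text)

movers_per_week = weekly_changes(ASSUMED_US_POPULATION, MOVE_RATE)
print(f"Address changes per week: {movers_per_week:,.0f}")

# Hypothetical contact file of 100,000 records left uncleansed for 4 weeks.
stale = expected_stale_records(100_000, MOVE_RATE, 4)
print(f"Stale addresses after 4 weeks: {stale:,.0f}")
```

Under the assumed population, the weekly figure comes to roughly 980,000, consistent with the number quoted above. The same function applied to a single contact file gives a first estimate of how quickly it degrades between cleansing runs.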

A common question in this regard is, “How often should I profile my data?” Periodicity of monitoring is determined by four considerations:

1. How often the data is used: for example, hourly, daily, weekly, or monthly.

2. The importance of the operation using the data: mission critical, life dependent, routine operations, end-of-month reporting, and so on.

3. The cost of monitoring the data. After the initial expense of establishing the monitoring system and process, the primary costs are labor and CPU cycles. The better the monitoring technology, the lower the labor costs.

4. Operational impact of monitoring the data. There are two aspects to consider: the impact of assessing operational (production) data during live operations, and the impact of the process on personnel. Is the assessment process highly manual, partially automatic, or fully automatic?

The weight of these considerations varies with the importance of the operation. The greater the importance, the less meaningful the cost and operational impact of monitoring will be. The challenge comes when an operation is of moderate importance, and cost and operational impact are at the same level. Fortunately, data is supported by technology. As that technology improves, it lowers both the costs and the operational impacts of monitoring.

Data stored in electronic media, and even data stored in nonrelational files, can be accessed via sophisticated data profiling software. It is with this software that fully automated and low-cost monitoring solutions can be implemented, thereby reducing the final consideration of continuous monitoring to “how often” it should be done. When purchased or built, a data profiling solution could be rationalized as “expensive,” but when the cost of the solution is amortized over the trillions of measurements taken each year or perhaps each month, the cost per measurement quickly nears zero. Another factor that reduces the importance of cost is the ultimate value of continuous monitoring: finding defects and preventing them from propagating, thereby eliminating crisis events in which the organization is impacted by those defects.

As the previous data-churn statistics show, data cleansing cannot be a one-time activity. If data is cleansed today, tomorrow it will have aged. A continuous monitoring process allows an organization to measure and gauge the data deterioration so it can tailor the periodicity of cleansing. Monitoring is also the only way to detect spurious events, such as corrupt data feeds, that are unexpected and insidious in nature. A complete continuous monitoring plan should address each of the following areas.

• Identify measurements and metrics. Start with project goals. The goals determine the first data quality strategy factor: the context. In the context factor, it’s determined what data supports the goals. The measurements focus on this data. Various attributes (format, range, domain, and so on) of the data elements can be measured. The measurements can be rolled up or aggregated (each having its own weight) into metrics that combine two or more measurements. A metric of many measurements can be used as a single data quality score at the divisional, business unit, or corporate level. A group of measurements and metrics can form a data quality dashboard for a CRM system. The number of defective addresses, invalid phone numbers, incorrectly formatted email addresses, and nonstandard personnel titles can all be measured and rolled up into one metric that represents the quality of just the contact data. Then, if the quality score of the contact data does not exceed a threshold defined by the organization, a decision can be made to postpone a planned marketing campaign until cleansing operations raise the score above the threshold.
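The roll-up of weighted measurements into a single contact-data score can be sketched as follows. This is one possible scoring scheme, a weighted average of pass rates, chosen for illustration; the paper does not prescribe a formula. The defect counts, weights, and the threshold of 90 are all hypothetical.

```python
def pass_rate(defects, total):
    """Share of records without the defect, from 0.0 to 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1.0 - defects / total


def rolled_up_score(measurements, weights):
    """Weighted average of measurement pass rates, scaled to 0-100."""
    total_weight = sum(weights[name] for name in measurements)
    weighted = sum(rate * weights[name] for name, rate in measurements.items())
    return 100.0 * weighted / total_weight


# Hypothetical defect counts from profiling 10,000 contact records.
TOTAL_CONTACTS = 10_000
measurements = {
    "address": pass_rate(1_200, TOTAL_CONTACTS),  # defective addresses
    "phone": pass_rate(800, TOTAL_CONTACTS),      # invalid phone numbers
    "email": pass_rate(500, TOTAL_CONTACTS),      # badly formatted email addresses
    "title": pass_rate(2_500, TOTAL_CONTACTS),    # nonstandard personnel titles
}

# Hypothetical weights: how much each measurement matters to the campaign.
weights = {"address": 3, "phone": 3, "email": 2, "title": 1}

THRESHOLD = 90.0  # hypothetical threshold defined by the organization

score = rolled_up_score(measurements, weights)
campaign_can_proceed = score > THRESHOLD  # the score must exceed the threshold

print(f"Contact data quality score: {score:.1f}")
print("Proceed with campaign" if campaign_can_proceed else "Postpone campaign until cleansing raises the score")
```

With these sample numbers the score falls just short of the threshold, so the sketch reports that the campaign should be postponed. Changing the weights changes which defects drive that decision, which is why the weights belong to the stakeholders rather than to the tool.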

• Identify when and where to monitor. The storage, data flow, and workflow factors provide the information for this step. The storage factor tells what data systems house the data that needs to be monitored. The workflow factor tells how often the data is used in a given operation and will provide an indication as to how often it should be monitored. The data flow factor tells how the data moves, and how it has been manipulated just prior to the proposed point of measure. A decision that continuous monitoring will face is whether to measure the data before or after a given operation. Is continuous monitoring testing the validity of the operation, testing the validity of the data that fuels the operation, or both?

One pragmatic approach is to put a monitoring process in place to evaluate a few core tables in the data warehouse on a weekly basis. This identifies defects inserted by processes feeding the data warehouse, and defects caused by aging during the monitoring interval. It may not identify the source of the defects if multiple inputs are accepted. To isolate changes from multiple events, the monitoring operation would need to be moved further upstream or timed to occur after each specific update.

Organizations should be aware that although this simple approach does not optimally fit an organization’s goals, it suffices for an initial implementation. An enhancement to the simple plan is to also monitor the data at the upstream operational data store (ODS) or staging areas. Monitoring at the ODS identifies defects in isolation from the data warehouse, and captures them closer to the processes that caused them. The data in the ODS is more dynamic, and therefore monitoring may need to be performed with greater frequency: for example, nightly instead of weekly.

• Implement monitoring process. This involves configuring a data profiling software solution to test specific data elements against specific criteria or business rules, and save the results of the analysis to a metadata repository. Once this is established, deciding when to monitor and where to implement the process is relatively straightforward. Most data profiling packages can directly access relational data sources identified in the storage factor. More sophisticated solutions are available to monitor nonrelational data sources, such as mainframe data and open systems flat files.

Configuring the data profiling software involves establishing specific business rules to test. For example, a part number column may have two allowed formats: ###A### and ###-###, where # is any valid numeric character, and A is any character in the set A, B, C, and E. The user would enter the two valid formats into the data profiling software, where the rules are stored in a metadata repository. The user can then run the rules as ad hoc queries or as tasks in a regularly scheduled, automated monitoring test set.
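The part number rule above can be expressed as a pair of regular expressions. The sketch below is a stand-in for what a data profiling package would do, not the behavior of any particular product; the function names and sample values are hypothetical, and the dictionary of compiled patterns merely plays the role of the metadata repository.

```python
import re

# The two allowed formats from the text: "#" is a digit, and "A" is one of A, B, C, or E.
PART_NUMBER_RULES = {
    "###A###": re.compile(r"[0-9]{3}[ABCE][0-9]{3}"),
    "###-###": re.compile(r"[0-9]{3}-[0-9]{3}"),
}


def matching_format(value):
    """Return the label of the format the value satisfies, or None if it is defective."""
    for label, pattern in PART_NUMBER_RULES.items():
        if pattern.fullmatch(value):  # the whole value must match, not just a prefix
            return label
    return None


def profile_column(values):
    """Count values per allowed format and collect the values that match neither."""
    counts = {label: 0 for label in PART_NUMBER_RULES}
    defects = []
    for value in values:
        label = matching_format(value)
        if label is None:
            defects.append(value)
        else:
            counts[label] += 1
    return counts, defects


# Hypothetical sample of a part number column.
sample = ["123A456", "987-654", "123D456", "12-3456", "555E010", ""]
counts, defects = profile_column(sample)
print(counts)
print(defects)
```

Note that "123D456" is reported as a defect because D is not in the allowed set. Using `fullmatch` rather than `match` matters here: a value with trailing characters would otherwise pass the test.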

• Run a baseline assessment. A baseline assessment is the first set of tests conducted, to which subsequent assessments in the continuous monitoring program will be compared. Identifying the business rules and configuring the data profiling software for the first assessment is where the majority of work is required in a continuous monitoring program. Building the baseline assessment serves as a prototyping evolution for the continuous monitoring program. First iterations of tests or recorded business rules often need to be changed because they do not effectively evaluate criteria that are meaningful to the people reviewing the reports. Other rules and the data will change over time as more elements are added or the element attributes evolve. The initial setup work for a baseline assessment is leveraged when the final set of analysis tasks and business rules runs on a regular basis.
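Comparing a later assessment to the baseline can be sketched as a simple per-rule difference in defect counts. The rule names and counts below are hypothetical, and a real profiling tool would read both result sets from its metadata repository rather than from literals.

```python
def compare_to_baseline(baseline, current):
    """Report, for each baseline rule, how the defect count has moved.

    Returns {rule: (status, delta)} where delta is current minus baseline.
    Rules missing from the current run are flagged so gaps are visible.
    """
    report = {}
    for rule, base_count in baseline.items():
        if rule not in current:
            report[rule] = ("not run", None)
            continue
        delta = current[rule] - base_count
        if delta > 0:
            status = "worse"
        elif delta < 0:
            status = "improved"
        else:
            status = "unchanged"
        report[rule] = (status, delta)
    return report


# Hypothetical defect counts per business rule.
baseline = {"part_number_format": 120, "null_visibility_code": 340, "phone_format": 75}
current = {"part_number_format": 95, "null_visibility_code": 410, "phone_format": 75}

report = compare_to_baseline(baseline, current)
for rule, (status, delta) in report.items():
    print(f"{rule}: {status} ({delta})")
```

Keeping the baseline fixed, instead of comparing each run only to the previous one, makes slow deterioration visible even when week-to-week changes look small.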

• Post monitoring reports. A common failing of a continuous monitoring program is poor distribution or availability of the analysis results. A key purpose of the program is to provide both information and impetus to correct flawed data. Restricting access to the assessment results is counterproductive. Having a data profiling solution that can post daily, weekly, or monthly reports automatically, after each run, to a corporate intranet is an effective communication device and productivity tool. The reports should be carefully selected. The higher the level of manager reviewing the reports, the more aggregated (summarized) the report data should be.

The report example in Figure 4 offers two different measurements superimposed on the same chart. In this case, a previous business rule for the data stipulated there should be no NULL values. When numerous NULL values were indeed found, another test was implemented to track how effective the organization was at changing the NULLs to the valid values of N or P.

Figure 4: Report Example

This level of reporting is appropriate for field-level analysts and managers who have to cure a specific process problem, but is too low level for a senior manager. For a director-level or higher position, a single aggregate score of all quality measurements in a set of data is more appropriate.

• Schedule data steward team meetings to review monitoring trends. Review meetings can be large or small, but they should occur regularly. Theoretically, they could occur as often as the battery of monitoring tests. If the tests are run nightly, meeting daily as a team may be a burden. A single person could be assigned to review the test runs, and call the team together as test results warrant. However, a typical failing of continuous monitoring programs is lack of follow-through. The information gained is not acted upon. While tremendous value can be derived from just knowing what data is defective and avoiding those defects, the greatest value comes from fixing the defects early in the trend. This cannot be done unless the stewardship team, either as individuals or as a team, implements a remediation action to both cleanse the data and cure the process that caused the defects.

[Figure 4 chart: “Dev Area Visibility Codes N and P.” Vertical axis: Count (0 to 140). Horizontal axis: January through May 2003. Series: N-QRY and P-QRY.]

In summary, continuous monitoring alerts managers to deterioration in data quality early in the trend. It identifies which actions are or are not altering the data quality conditions. It quantifies the effectiveness of data improvement actions, allowing the actions to be tuned. Last, and most importantly, it continually reinforces the end users’ confidence in the usability of the data.

The irony is that many systems fall into disuse because of defective data, and stay unused even after strenuous exertions by IT to cleanse and enhance the data. The reason is perception. The system is perceived by the users, not IT, to still be suspect. A few well-placed and ill-timed defects can destroy overnight the reliability of a data system. To regain the trust and confidence of users, a steady stream of progress reports and data scores needs to be published. These come from a continuous monitoring system that shows and convinces users over time that the data is indeed improving.

Tying It All Together

In order for any strategy framework to be useful and effective, it must be scalable. The strategy framework provided here is scalable from a simple one-field update, such as validating gender codes of male and female, to an enterprise-wide initiative, where 97 ERP systems need to be cleansed and consolidated into one system. To ensure the success of the strategy, and hence the project, each of the six factors must be evaluated. The size (number of records/rows) and scope (number of databases, tables, and columns) determine the depth to which each factor is evaluated.

Taken all together or in smaller groups, the six factors act as operands in data quality strategy formulas:

• Context by itself = The type of cleansing algorithms needed

• Context + Storage + Data Flow + Workflow = The types of cleansing and monitoring technology implementations needed

• Stewardship + Workflow = Near-term personnel impacts

• Stewardship + Workflow + Continuous Monitoring = Long-term personnel impacts

• Data Flow + Workflow + Continuous Monitoring = Changes to processes

It is as a result of using these formulas that people come to understand that information quality truly is the integration of people, process, and technology in the pursuit of deriving value from information assets.
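One way to read the five formulas in this section is as a lookup: once a given set of factors has been evaluated, certain planning questions can be answered. The sketch below encodes that reading. Treating "+" as "all of these factors have been evaluated" is an interpretation made for illustration, and the function name is hypothetical.

```python
# Each entry pairs the factors a formula combines with the result it yields.
FORMULAS = [
    (frozenset({"Context"}),
     "The type of cleansing algorithms needed"),
    (frozenset({"Context", "Storage", "Data Flow", "Workflow"}),
     "The types of cleansing and monitoring technology implementations needed"),
    (frozenset({"Stewardship", "Workflow"}),
     "Near-term personnel impacts"),
    (frozenset({"Stewardship", "Workflow", "Continuous Monitoring"}),
     "Long-term personnel impacts"),
    (frozenset({"Data Flow", "Workflow", "Continuous Monitoring"}),
     "Changes to processes"),
]


def answers_available(evaluated_factors):
    """List the results whose operand factors have all been evaluated."""
    done = set(evaluated_factors)
    return [result for operands, result in FORMULAS if operands <= done]


# Hypothetical project state: three of the six factors evaluated so far.
for result in answers_available(["Context", "Stewardship", "Workflow"]):
    print(result)
```

Workflow appears in four of the five formulas, which suggests that evaluating it early unlocks the most planning answers.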

Implementation and Project Management

Where the data quality strategy formulation process ends, data quality project management takes over. In truth, much, if not all, of the work resolving the six factors can be considered data quality project planning. Strategy formulation often encompasses a greater scope than a single project and can support the goals of an entire enterprise, numerous programs, and many individual projects. Sooner or later, strategy must be implemented through a series of tactics and actions, which fall in the realm of project management. While the purpose of this paper is not to cover the deep subject of data quality project management, it does set the stage for a clear transition from strategy formulation to the detailed management of the tasks and actions that ensure its success.

Once a strategy document is created, whether big or small, comprehensive or narrowly focused, it can be handed to the project manager, and everything he or she needs to know to plan the project should be in that document. This is not to say all the work has been done. While the goals have been documented and the data sets established, the project manager must build the project requirements from the goals. The project manager should adhere to the sound project management principles and concepts that apply to any project, such as task formulation, estimation, resource assignments, scheduling, risk analysis, mitigation, and project monitoring against critical success factors. Few of these tactical issues are covered in a strategy-level plan.

Another facet of a successful data quality strategy is consideration of the skills, abilities, and culture of the organization. If the concept of data quality is new to your organization, a simple strategy is best. Simple strategies fit pilot projects. A typical pilot project may involve one column of data (phone numbers, for example) in one table, impacting one or two users, and involved in one or two processes. A simple strategy for this project, encompassing all six factors, can fit on one page of paper.

However, the more challenging the goals of a data quality strategy, the greater the returns. An organization must accept that with greater returns come greater risks. Data quality project risks can be mitigated by a more comprehensive strategy. Be aware that the initial strategy is a first iteration. Strategy plans are “living” work products. A complex project can be subdivided into mini-projects, or pilots. Each successful pilot builds momentum. And therein lies a strategy in itself: divide and conquer. Successful pilots will drive future initiatives. Thus an initial strategy planning process is part of a larger recurring cycle. True quality management is, after all, a repeatable process.

Appendix A: Data Quality Strategy Checklist

To help the practitioner employ the data quality strategy methodology, the core practices have been extracted from the factors and listed here.

• A statement of the goals driving the project

• A list of data sets and elements that support the goal

• A list of data types and categories to be cleansed¹

• A catalog, schema, or map of where the data resides²

• A discussion of cleansing solutions per category of data³

• Data flow diagrams of applicable existing data flows

• Workflow diagrams of applicable existing workflows

• A plan for when and where the data is accessed for cleansing⁴

• A discussion of how the data flow will change after project implementation

• A discussion of how the workflow will change after project implementation

• A list of stakeholders affected by the project

• A plan for educating stakeholders as to the benefits of the project

• A plan for training operators and users

• A list of data quality measurements and metrics to monitor

• A plan for when and where to monitor⁵

• A plan for initial and then regularly scheduled cleansing

¹ Examples of type are text, date, or time, and examples of category are street address, part number, contact name, and so on.

² This can include the name of the LAN, server, database, and so on.

³ This should include possible and desired deployment options for the cleansing solution. See the section entitled Workflow for specific deployment options.

⁴ This covers the when (during what steps in the data flow and workflow will the cleansing operation be inserted) and the where (on what data systems will the cleansing operation be employed) of the cleansing portion of the project.

⁵ This includes running a baseline assessment, and then selecting tests from the baseline to run on a regular basis. Reports from the recurring monitoring will need to be posted, and regular review of the reports scheduled for the data stewardship team.

About Business Objects

Business Objects, an SAP company, has been a pioneer in business intelligence (BI) since the dawn of the category. Today, as the world’s leading BI software company, Business Objects transforms the way the world works through intelligent information. The company helps illuminate understanding and decision-making at more than 44,000 organizations around the globe. Through a combination of innovative technology, global consulting and education services, and the industry’s strongest and most diverse partner network, Business Objects enables companies of all sizes to make transformative business decisions based on intelligent, accurate, and timely information.

More information about Business Objects can be found at www.businessobjects.com.

© 2008 Business Objects. All rights reserved. Business Objects owns the following U.S. patents, which may cover products that are offered and licensed by Business Objects: 5,555,403; 5,857,205;

businessobjects.com