Data Cleansing and Matching

Welcome

Data Cleansing and Matching Workshop

The Agenda

An Introduction to VisionWare

The Fundamental Elements of Data Cleansing and Matching

Case Study: Clackmannanshire Council

Open Discussion

The Benefits

About VisionWare plc

Public Sector Pedigree
- 60+ Established Public Sector Clients
- Customer References
- CRM/SSA/ICS/Citizen Account/Smart Card Initiatives

Rapid, Deep Integration
- Broad and Deep Integration Capability
- Back-end Legacy & Front-end CRM
- Systems/Applications/Data/Functions/Services

Thought Leadership
- Products: MultiVue/relate/E-Forms
- Non-Prescriptive Trusted Data
- Customisable Framework

Strategic Alliances
- Microsoft, ITNET, Parity, Solidsoft, Deloittes, Capita

A Selection of VisionWare Public Sector Customers

HEALTH TRUSTS
• Ayrshire & Arran
• Fife Primary Care
• Highlands Acute Hospitals
• NHS Highland
• Inverclyde Hospitals
• Yorkhill
• Lanarkshire Primary Care
• Fife Acute
• Peterborough
• Royal Wolverhampton
• West Suffolk Hospitals
• Weston Area Healthcare
• United Bristol
• Great Ormond Street Hospital
• North Cheshire
• Croydon PCT

LOCAL GOVERNMENT
• Aberdeen City
• East Renfrewshire
• Glasgow City
• Moray
• North Ayrshire
• North Lanarkshire
• West Lothian
• South Lanarkshire
• Renfrewshire
• West Dunbartonshire
• East Lothian
• Clackmannanshire
• Wansbeck
• Leicestershire
• Sutton
• Midlothian
• Merton
• Newham
• Croydon
• Luton
• Tower Hamlets
• North Tyneside
• Windsor & Maidenhead
• Wiltshire
• Blackburn with Darwen
• Calderdale
• East Sussex
• Inverclyde
• Cambridgeshire
• Bedfordshire Consortium

The Fundamental Elements of Data Cleansing and Matching

Evaluate the Quality and Quantity of Data

Cleanse the Data

Match the Data

Maintain and Synchronise the Data

The Operational Challenge

The use and administration of data within Public Sector organisations has grown:
- Electronic Service Delivery
- Modernising Government Initiatives

Each vertical departmental system stores demographic data and information relating to its functional area:
- This creates silos of information across the organisation

We need to deliver services designed around the citizen, NOT around the departmental function.

We must therefore Join-Up Data to deliver Joined-Up Services.

What underpins these initiatives?
- Information Sharing
- Trusted Source of Unified Data

Evaluate the Quality and Quantity of Data

Identity information is held within each of the organisation's line of business applications.

Each identity will vary in terms of:
- Quality
- Accuracy
- Quantity

Need to be able to:
- Report on the variance of both data quality and data quantity across the departmental systems
- Match and rationalise the information
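As a rough illustration of reporting on the variance of quality and quantity across departmental systems, the sketch below counts the records held by each system and the share of each field that is actually populated. The system names, field names and sample records are hypothetical and are not taken from the presentation or from any VisionWare product; completeness is used here only as a simple proxy for quality.

```python
# Hypothetical extracts from two departmental systems (illustrative only).
SYSTEMS = {
    "council_tax": [
        {"forename": "Jane", "surname": "Smith", "dob": "1975-03-02"},
        {"forename": "John", "surname": "Jones", "dob": ""},
    ],
    "housing": [
        {"forename": "Jane", "surname": "Smith", "dob": ""},
        {"forename": "", "surname": "Jones", "dob": ""},
        {"forename": "Ann", "surname": "Brown", "dob": "1974-11-20"},
    ],
}

FIELDS = ("forename", "surname", "dob")


def profile(records):
    """Return the record count (quantity) and per-field completeness (a quality proxy)."""
    total = len(records)
    completeness = {}
    for name in FIELDS:
        filled = sum(1 for r in records if r.get(name, "").strip())
        completeness[name] = filled / total if total else 0.0
    return {"records": total, "completeness": completeness}


report = {system: profile(records) for system, records in SYSTEMS.items()}

for system, stats in report.items():
    print(system, stats["records"], stats["completeness"])
```

Comparing the per-system figures side by side shows which systems hold more records and which hold better-populated ones, which is the kind of variance the slide asks to report on.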

Evaluate the Quality and Quantity of Data: On a National Scale

Scotland: population 5,057,400

Annual demographic change factors:
- 52,395 birth registrations
- 58,326 death registrations
- 30,651 marriages recorded
- 10,484 divorces recorded
- 125,000 annual address changes
- Unquantifiable job and circumstance changes

With these demographic changes on a yearly basis, how can we ensure the quality of our data?

[Pie chart: Births, Deaths, Marriages, Divorces, Address Changes, Unchanged]

• These changes represent over 5% of the population each year.
• In 2 years, at least 10% of data could be out of date
• In 5 years, at least 30% of data could be out of date
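The arithmetic behind these percentages can be checked with a short sketch. It uses only the figures quoted on the slide and assumes, as a simplification, that every event affects a different record and that the rate stays constant from year to year. On that straight-line basis the quantified events come to roughly 5.5% of the population per year, about 11% over two years and about 27% over five; the slide's five-year figure of 30% presumably also allows for the unquantified job and circumstance changes.

```python
# Figures as quoted on the slide for Scotland.
POPULATION = 5_057_400
ANNUAL_EVENTS = {
    "births": 52_395,
    "deaths": 58_326,
    "marriages": 30_651,
    "divorces": 10_484,
    "address_changes": 125_000,
}

total_events = sum(ANNUAL_EVENTS.values())
annual_share = total_events / POPULATION


def share_after(years):
    """Straight-line estimate: assumes each event touches a different record."""
    return min(1.0, annual_share * years)


print(f"events per year: {total_events:,}")
print(f"share of population per year: {annual_share:.1%}")
for years in (2, 5):
    print(f"after {years} years: {share_after(years):.1%}")
```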

Evaluate the Quality and Quantity of Data: Modernising Government Initiatives

[Diagram: initiative datasets (SmartCard Dataset, SSA Dataset, ICS Dataset, CRM Dataset, LLPG Dataset) with the labels Leisure, Library, Travel, School; Vulnerable, Elderly; Children, Carers; People, Property, Places, Spaces, Objects, Assets]

- Each initiative generates its own dataset
- Existing LOB applications do not participate and ADD to the problem
- Between 150 and 250 LOB applications contain customer data elements

Data Cleansing and Matching

Integration of various identities will invariably lead to a series of data contentions:
- Multiple names, multiple addresses, inconsistent dates of birth, incorrect (false) demographic information and duplicate information

This needs to be resolved before we can provide a unified view of trusted data relating to either a person or property.

Need to be able to:
- Resolve data contention issues
- Aggregate all non-contentious information
- Provide a composition that retains information of the highest quality and quantity by:
  - Matching the records
  - Merging the information
  - Managing the duplicated data
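The distinction between contentious and non-contentious information can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used by any particular product: the function name `merge`, the field names and the sample records are all hypothetical, and the records are assumed to have already been matched to one person.

```python
from collections import defaultdict


def merge(records):
    """Merge matched records: keep agreed values, flag contentious ones."""
    values = defaultdict(set)
    for record in records:
        for field_name, value in record.items():
            if value:  # ignore empty values so gaps are filled, not contested
                values[field_name].add(value)

    merged, contentions = {}, {}
    for field_name, seen in values.items():
        if len(seen) == 1:
            merged[field_name] = next(iter(seen))   # non-contentious: aggregate
        else:
            contentions[field_name] = sorted(seen)  # contentious: needs resolving
    return merged, contentions


# Hypothetical matched records for one person from three systems.
matched = [
    {"surname": "Smith", "dob": "1975-03-02", "phone": ""},
    {"surname": "Smith", "dob": "1975-02-03", "phone": "0131 496 0000"},
    {"surname": "Smith", "dob": "", "phone": "0131 496 0000"},
]

merged, contentions = merge(matched)
print("merged:", merged)
print("to resolve:", contentions)
```

Here the surname and phone number are aggregated into the composite record, while the two conflicting dates of birth are held back for resolution rather than silently overwritten.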

The Real Challenge: A Plethora of Systems, Silos of Information

[Diagram: NEWHAM APPLICATION SYSTEMS MAP (Geoff Connell, CICT Development, 11/09/2003). The map shows line-of-business systems grouped under CEX & Corporate, Registrar of BD&M, Leisure, Education, Environment, Housing and Social Services, running on platforms including VME/IDMS, Unix/Oracle, Unix/Progress, Unix/Pick, NT/MS SQL, Lotus Notes, Access and flat files, joined by live and planned interfaces.]

Key
Interface type: 1. Payment Details; 2. Direct Debits; 3. Reconciliation; 4. Account Balances
Interface frequency: D. Daily; W. Weekly; M. Monthly; A. Ad-Hoc; O. On-Line; P. Pending

- Identification of £1.5m of Benefit Fraud
- 580,000 records relating to 200,000 people
- 3:1 Ratio of Duplicates

Data Matching: Look at it this way…

Antonia Marie Pilaski
Alias: Toni Pilaski, Marie Pilaski
Address: 33 2 Prince Regent Street EDINBURGH

Antonia Marie Pilaski
Alias: Toni Marie
Address: 45 Dunfermline Av EDINBURGH

Mark Baker
Address: 24 6 Montgomery Street EDINBURGH

Mark Ritchie
Address: 24 6 Montgomery Street EDINBURGH

Antonia Ritchie
Address: 24 6 Montgomery Street EDINBURGH

WHO THEN IS ANTONIA RITCHIE?

- Lives with parents
- Moves to own flat
- Gets engaged
- Fiancé changes name
- Gets married & moves in with husband
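One way to see why the last record is hard to match is to model an identity as a history of names and addresses rather than a single current value. The sketch below is an illustration only: the `Identity` class, its methods and the matching rule are assumptions made for this example, using the names and addresses from the slide. The record for Antonia Ritchie cannot be tied to Antonia Marie Pilaski until the marriage (new surname, new address) has been recorded against her identity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identity:
    """One person plus every name and address they have been known by."""
    names: List[str] = field(default_factory=list)
    addresses: List[str] = field(default_factory=list)

    def record_event(self, name=None, address=None):
        """Apply a validated life event (a move, a change of name, or both)."""
        if name and name not in self.names:
            self.names.append(name)
        if address and address not in self.addresses:
            self.addresses.append(address)

    def matches(self, name, address):
        # Assumed rule: a record belongs to this identity only if both its
        # name and its address appear somewhere in the identity's history.
        return name in self.names and address in self.addresses


antonia = Identity()
antonia.record_event("Antonia Marie Pilaski", "33 2 Prince Regent Street EDINBURGH")
antonia.record_event(name="Toni Pilaski")
antonia.record_event(address="45 Dunfermline Av EDINBURGH")

query = ("Antonia Ritchie", "24 6 Montgomery Street EDINBURGH")
before = antonia.matches(*query)  # marriage not yet recorded

# Marriage: new surname and a move to the husband's address.
antonia.record_event("Antonia Ritchie", "24 6 Montgomery Street EDINBURGH")
after = antonia.matches(*query)

print(before, after)
```

Exact matching on history is deliberately strict; in practice it would sit alongside looser comparison rules for aliases and misspellings.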

Maintenance and Synchronisation

The maintenance of identity information is:
- Time consuming
- An inefficient manual process

Potential risk involved in latency of updates.

Possible inconsistencies within the datasets.

Need to be able to:
- Implement a mechanism that enables information to be passed and shared between the departmental systems
- Notify each connected application of any validated changes

The Benefits
- Ensures a consistent view of an individual
- Level of data latency can be controlled
- The risk of utilising redundant information is managed
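The mechanism described above, in which each connected application is notified of validated changes, resembles a publish/subscribe hub. The following is a minimal in-memory sketch under that assumption; the `SyncHub` class, the system names and the address values are hypothetical, and a real deployment would use middleware rather than direct callbacks.

```python
class SyncHub:
    """Passes validated changes to every connected departmental system."""

    def __init__(self):
        self._subscribers = {}

    def connect(self, system_name, callback):
        self._subscribers[system_name] = callback

    def publish(self, change, source):
        """Notify every connected system except the one that raised the change."""
        notified = []
        for system_name, callback in self._subscribers.items():
            if system_name != source:
                callback(change)
                notified.append(system_name)
        return notified


# Hypothetical local copies of one citizen's address in three systems.
stores = {name: {"address": "45 Dunfermline Av"}
          for name in ("council_tax", "housing", "education")}

hub = SyncHub()
for name, store in stores.items():
    # Bind the store as a default argument so each callback updates its own copy.
    hub.connect(name, lambda change, store=store: store.update(change))

stores["housing"]["address"] = "24 6 Montgomery Street"  # validated at source
notified = hub.publish({"address": "24 6 Montgomery Street"}, source="housing")

print(notified)
print(stores)
```

Because the change is pushed as soon as it is validated, the latency between systems is governed by the hub rather than by manual re-keying.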

The Fundamental Elements of Data Cleansing and Matching

[Diagram: customer contact channels (One Number, Contact Centre) feed a Community Relationship Management (CRM) system (front-end integration); middleware (back-end systems integration) provides mediated access over a shared infrastructure to the Council Tax, Housing, Planning, Social Work and Education systems and to Central Government, Central Health Systems, the Benefits Agency and the Police. Other labels: Synchronisation; Portals of Functionality; Trusted Data Scaling Across the Line of Business Applications; Systems; Data Volumes; Data Value.]

Case Study Presentation

Brian Forbes

Modernising Government Strategy Manager

Clackmannanshire Council

Open Discussion

VisionWare plc

Willie Clinton, Director

Campbell McNeill, Consultant

Clackmannanshire Council

Brian Forbes, Modernising Government Strategy Manager

Alexis Easton, Head of IT Services

Topics for Discussion

- How do you change the culture to ensure that staff maintain quality data?
- How do you measure data quality?
- Do we need to define national standards for the Public Sector?
- What are the difficulties matching citizen data with limited information?
- What resources are required to match data?

How do you change the culture to ensure that staff maintain quality data?

The Structure of the Organisation:
- Silos of information exist across departmental systems
- Each departmental system holds demographic information about entities (person, property, assets)
- Should each department manage its own data?
- Should the organisation have a corporate-wide strategy?
- Should we consider a Centralised Repository of Information, for example, The Citizen Account?
- Data quality has to be improved by changing business processes and working practices

How do you measure data quality?

Data Quality
- Data Quality = How accurate is the information?
- Data Latency = How up-to-date is the information?
- Data Quantity = multiple systems, silos of information
- Other areas to consider:
  - Information Audit
  - Technology
  - Public Enquiry, at worst
- Some systems have more valuable data than others
- How can these systems support the "weaker" systems?
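Two of these measures, accuracy and latency, can be turned into simple numbers. The sketch below is an assumption-laden illustration: the plausibility rule, the 365-day staleness threshold, the field names and the sample records are all hypothetical, and a real accuracy measure would compare records against a trusted source rather than rely on plausibility checks alone.

```python
from datetime import date


def latency_days(last_updated, today):
    """Data latency: days since the record was last confirmed or updated."""
    return (today - last_updated).days


def is_plausible(record, today):
    """A crude accuracy check: surname present and date of birth not in the future."""
    return bool(record["surname"].strip()) and record["dob"] <= today


# Hypothetical records; a fixed reference date keeps the example repeatable.
TODAY = date(2004, 6, 1)
records = [
    {"surname": "Smith", "dob": date(1970, 1, 15), "last_updated": date(2004, 5, 1)},
    {"surname": "", "dob": date(1982, 7, 9), "last_updated": date(2001, 6, 1)},
    {"surname": "Jones", "dob": date(2010, 1, 1), "last_updated": date(2003, 6, 1)},
]

# Share of records passing the plausibility check.
accuracy = sum(is_plausible(r, TODAY) for r in records) / len(records)

# Records not touched for more than a year (threshold is an assumption).
stale = [r for r in records if latency_days(r["last_updated"], TODAY) > 365]

print(f"plausible: {accuracy:.0%}, stale: {len(stale)} of {len(records)}")
```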

Do we need to define data standards?

We have existing standards:
- eGIF
- Citizen Account Dataset
- BS8766 (Name)
- BS7666 (Addressing)
- BS7799 (Security)
- Data Protection

How do you stop standards from stifling innovation or impacting on, for example, Data Protection?

What are the difficulties matching citizen data with limited information?

Limited Information
- Does the organisation know what information it holds?
- Are forename, surname and address a limited dataset?
- Does limited data come from the imposition of Data Protection and Information Sharing?

Leverage the best of what we have
- The process has got to be evolutionary, not revolutionary
- Dependent upon the Quality, Quantity and Latency of Information

What resources are required to match data?

People and Technology

Manual Process
- Build a level of trust in the data

Automatic Process
- Probabilistic matching to deterministic matching
- Parameters set by the organisation
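The difference between deterministic and probabilistic matching, and the role of organisation-set parameters, can be sketched as below. This is an illustration under stated assumptions and not a description of MultiVue: the threshold values, the field names and the use of Python's standard `difflib.SequenceMatcher` as the similarity measure are all choices made for this example. Scores between the two thresholds are routed to the manual process.

```python
from difflib import SequenceMatcher

# Thresholds are assumptions; the slide says such parameters are set by the organisation.
AUTO_MATCH = 0.90
REVIEW = 0.70

FIELDS = ("name", "address")


def normalise(text):
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(text.lower().split())


def deterministic_match(a, b):
    """Exact agreement on every compared field after normalisation."""
    return all(normalise(a[k]) == normalise(b[k]) for k in FIELDS)


def probabilistic_score(a, b):
    """Average string similarity of the compared fields, between 0 and 1."""
    scores = [
        SequenceMatcher(None, normalise(a[k]), normalise(b[k])).ratio()
        for k in FIELDS
    ]
    return sum(scores) / len(scores)


def classify(a, b):
    if deterministic_match(a, b):
        return "match"
    score = probabilistic_score(a, b)
    if score >= AUTO_MATCH:
        return "match"
    if score >= REVIEW:
        return "manual review"
    return "no match"


base = {"name": "Mark Ritchie", "address": "24 6 Montgomery Street EDINBURGH"}
same = {"name": "mark  ritchie", "address": "24 6 montgomery street edinburgh"}
typo = {"name": "Mark Richie", "address": "24 6 Montgomery Street EDINBURGH"}

print(classify(base, same))
print(classify(base, typo), round(probabilistic_score(base, typo), 3))
```

Raising `AUTO_MATCH` sends more pairs to manual review and lowers the risk of false merges; lowering it does the opposite, which is the trade-off the organisation has to set.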

The Benefits

Data Management

Trusted Data Source

Joined-Up Services

Multi-Agency Working

Some Case Study Examples

Customer CRM Citizen Account

Data Management

Master Address Database

Children Elderly

Clackmannanshire

East Lothian

Fife Constabulary

Glasgow City Council

Inverclyde

Midlothian

North Ayrshire

Renfrewshire

South Lanarkshire

West Dunbartonshire

West Lothian

MultiVue is the Key to Joined-Up Data

VisionWare specialises in the provision of trusted data with MultiVue Identification Server, an enterprise-wide data integration tool.

Public Sector departments and multiple agencies can now share accurate and reliable information on every citizen.

Thank you!

Willie Clinton

Director

VisionWare plc

0141 285 7150

[email protected]

www.visionwareplc.com