Proposal of a collaboration to improve the ethnicity classification of patient registers

Pablo Mateos, UCL - CASA, 25th May 2005


Contents

1. Aims of proposal

2. Mutual Benefits & Justification

3. Members

4. Data Sharing

5. Data Protection

6. Intellectual Property

7. Project Name


1- Aims

The purposes of the group are threefold:

• To facilitate access to the Names-to-CEL directory developed by CASA
• To develop and to share access to knowledge relating to:
  - Effective use of the Names-to-CEL directory in public health
  - Data mining of birthplace information in the ‘Exeter’ register
• To improve the quality and accuracy of the directory by contributing anonymised data from operational files


2- Mutual Benefits

Benefits for the model

• Wider population base per surname
  – More ethnic groups better represented
  – Better Firstname or Surname matches
• More extensive birthplace name alias tables

Benefits for the PCTs

• Birthplace information correctly classified
• Ethnic group classification provided
• Richer ethnic classification:
  – Beyond 16+
  – At individual level
• Know-how already built


One PCT is not enough

[Figure: Frequency Distribution of Camden PCT Surnames or Forenames >40 people. x-axis: Nr. of Names (1 to 651); y-axis: Nr. of People / Name (0 to 3000); series: Surnames, Forenames.]
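The point of this slide can be sketched in code: names that are too rare in one register to be processed may clear a minimum count once several registers are pooled. The Python sketch below is only an illustration. It assumes per-PCT name counts held in `Counter` objects, and the PCT labels, placeholder names, counts and `min_count` value are all invented, not taken from the Camden data.

```python
from collections import Counter


def pool_counts(registers):
    """Sum per-name counts across several registers (one Counter per PCT)."""
    pooled = Counter()
    for counts in registers:
        pooled.update(counts)
    return pooled


def names_above(counts, min_count):
    """Return the set of names held by more than min_count people."""
    return {name for name, n in counts.items() if n > min_count}


# Hypothetical counts for two PCTs; not real data.
pct_a = Counter({"SURNAME A": 60, "SURNAME B": 25, "SURNAME C": 18})
pct_b = Counter({"SURNAME A": 55, "SURNAME B": 20, "SURNAME C": 30})

alone = names_above(pct_a, min_count=40)
pooled = names_above(pool_counts([pct_a, pct_b]), min_count=40)
```

With these invented numbers, only one name clears the count in the first register alone, while all three clear it once the two registers are pooled.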


London ‘non-16+ ethnic groups’ (1.2 million people stated ‘other’ ethnic identities in London 2001 Census)

Ethnic Group                                             Population
Other white European, European Mixed                        185,690
Other white, white unspecified                              171,744
English                                                     154,203
Sri Lankan                                                   53,307
Black British                                                46,348
Turkish                                                      37,827
Italian                                                      35,252
Other Mixed, Mixed unspecified                               35,027
Any other group                                              29,469
Greek Cypriot                                                23,340
Middle Eastern (excluding Israeli, Iranian and 'Arab')       20,537
Arab                                                         20,256
Filipino                                                     19,669
Japanese                                                     19,415
Other mixed white                                            19,239
Other Asian, Asian unspecified                               18,334
Greek                                                        17,888
Iranian                                                      16,494
Multi-ethnic islands                                         15,952
Polish                                                       15,928
South and Central American                                   15,607
British Asian                                                14,625
Turkish Cypriot                                              14,074
Vietnamese                                                   11,719
Commonwealth of (Russian) Independent States                 11,606
North African                                                11,218
Kurdish                                                       9,659
Latin American                                                9,188
Mixed Black                                                   9,001
Jewish                                                        8,912
Other Black, Black unspecified                                8,344
Cypriot (part not stated)                                     7,360
Mixed: Irish and other white                                  7,071
Scottish                                                      7,020
Kosovan                                                       6,896
Welsh                                                         6,895
Somali                                                        6,172
East African Asian                                            5,328
Chinese and White                                             4,871
Tamil                                                         4,758
Black and White                                               4,226
Moroccan                                                      4,133
Caribbean Asian                                               4,070
Black and Asian                                               3,946
Malaysian                                                     3,384
Albanian                                                      3,226
Sikh                                                          2,814

Source: 2001 Census GLA commissioned tables (.../...)


CEL Group

1 ANGLOPHONE
2 ANGLOPHONE: CARIBBEAN
3 BLACK AFRICAN: CONGOLESE
4 BLACK AFRICAN: ETHIOPIAN
5 BLACK AFRICAN: GAMBIAN
6 BLACK AFRICAN: GHANAIAN
7 BLACK AFRICAN: KENYAN
8 BLACK AFRICAN: LIBERIAN
9 BLACK AFRICAN: NIGERIAN
10 BLACK AFRICAN: SIERRA LEONEAN
11 BLACK AFRICAN: SOUTH AFRICAN
12 BLACK AFRICAN: UGANDAN
13 BLACK AFRICAN: UNCLASSIFIED
14 EAST ASIAN: CHINESE
15 EAST ASIAN: INDOCHINA
16 EAST ASIAN: JAPANESE
17 EAST ASIAN: KOREAN
18 EAST ASIAN: VIETNAMESE
19 EUROPEAN: BALKAN
20 EUROPEAN: BRITISH: UNCLASSIFIED
21 EUROPEAN: DANISH
22 EUROPEAN: DUTCH
23 EUROPEAN: DUTCH_WORLD
24 EUROPEAN: EASTERN EUROPE
25 EUROPEAN: FINNISH
26 EUROPEAN: FRENCH
27 EUROPEAN: FRENCH_WORLD
28 EUROPEAN: GERMAN
29 EUROPEAN: GERMAN OR DUTCH
30 EUROPEAN: GREEK / GREEK CYPRIOT
31 EUROPEAN: HUNGARIAN
32 EUROPEAN: IRISH: UNCLASSIFIED
33 EUROPEAN: ITALIAN
34 EUROPEAN: NORDIC
35 EUROPEAN: OTHER
36 EUROPEAN: POLISH
37 EUROPEAN: ROMANIAN
38 EUROPEAN: SLAVIC
39 EUROPEAN: SWEDISH
40 HISPANIC: BRAZILIAN
41 HISPANIC: CATALAN
42 HISPANIC: LATIN AMERICAN
43 HISPANIC: PORTUGUESE
44 HISPANIC: PORTUGUESE_WORLD
45 HISPANIC: SPANISH
46 HISPANIC: SPANISH_WORLD
47 HISPANIC: UNCLASSIFIED
48 JEWISH
49 MUSLIM: AFGHAN
50 MUSLIM: ARAB
51 MUSLIM: ARMENIAN
52 MUSLIM: BALKANS
53 MUSLIM: BANGLADESHI
54 MUSLIM: BLACK AFRICAN OTHER
55 MUSLIM: EGYPTIAN
56 MUSLIM: ERITREAN
57 MUSLIM: EURASIA
58 MUSLIM: IRANIAN
59 MUSLIM: IRAQI
60 MUSLIM: LEBANESE
61 MUSLIM: MIDDLE EASTERN
62 MUSLIM: NORTH AFRICAN
63 MUSLIM: PAKISTANI
64 MUSLIM: PERSONAL NAME
65 MUSLIM: SOMALI
66 MUSLIM: SOUTHEAST ASIA
67 MUSLIM: SUDANESE
68 MUSLIM: TURKISH
69 MUSLIM: UNCLASSIFIED
70 OTHER NON-BRITISH
71 OTHER SOUTH ASIAN: HINDI
72 OTHER SOUTH ASIAN: HINDI OR SIKH
73 OTHER SOUTH ASIAN: NEPALESE
74 OTHER SOUTH ASIAN: NORTH INDIAN
75 OTHER SOUTH ASIAN: SIKH
76 OTHER SOUTH ASIAN: SOUTH INDIAN & SRI LANKAN
77 OTHER SOUTH ASIAN: UNCLASSIFIED


3- Members

• Primarily aimed at PCTs and health institutions working on improving ethnicity classification
• Open to any institution interested in benefiting from and contributing to the ethnicity classification model
• Each member must already hold ‘operational names data’ at individual level


4- Data Sharing

• UCL will distribute each update of the Names-to-CEL directories to members:
  - Surname-to-CEL
  - Forename-to-CEL
• Members will provide 2 separate files:
  - Surname-Birthplace aggregation
  - Forename-Birthplace aggregation
• There will be no way to link these two files together
• Only 1 common version of the Names-to-CEL directory will be maintained
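One way a member might prepare the two files is sketched below in Python. It assumes individual records with the hypothetical field names `surname`, `forename` and `birthplace`; the real register layout and file format are not specified in this proposal. Each output counts people per (name, birthplace) pair, and neither output carries a key that would allow it to be joined to the other.

```python
from collections import Counter


def aggregate(records, name_field):
    """Count people per (name, birthplace) pair for one name field."""
    counts = Counter()
    for rec in records:
        key = (rec[name_field].strip().upper(), rec["birthplace"].strip().upper())
        counts[key] += 1
    return counts


# Hypothetical individual-level records; placeholder values only.
records = [
    {"surname": "Surname X", "forename": "Forename A", "birthplace": "Country 1"},
    {"surname": "Surname X", "forename": "Forename B", "birthplace": "Country 1"},
    {"surname": "Surname Y", "forename": "Forename A", "birthplace": "Country 2"},
]

# Two independent aggregations, built without any shared record identifier.
surname_file = aggregate(records, "surname")
forename_file = aggregate(records, "forename")
```

Because each aggregation is built from a single name field, a row in one file says nothing about which forenames or surnames it was paired with in the register.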


Proposed Data Flow (1)

Example provided here for Surnames. An exact parallel process applies for First Names.

[Flow diagram; actors: PCTs, UCL. Labels in slide order:]

• Surnames
• Input & Processing Module (Highly restricted access)
  - BirthPlace Geocoder
  - Records aggregated by surname:

    SURNAME      TOTAL   COB1   COB2   COB3   COB4   ETC
    SURNAME X    ZZ      37%    23%    12%    8%     ETC

  - Check > threshold (current threshold = over 10 persons / surname)
    N: Leave surname until more records arrive
    Y: Output surname
• Output Module
• Surname-to-CEL
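The aggregation and threshold check on this slide could look roughly like the Python sketch below. It assumes the input is a mapping from (surname, country of birth) to a count, treats "over 10 persons" as a strict `>` comparison, and returns held-back surnames separately. The function and variable names are illustrative and are not part of the project's tools.

```python
from collections import defaultdict

THRESHOLD = 10  # slide: "Over 10 persons / surname"; strict '>' is an assumption


def process(pair_counts, threshold=THRESHOLD):
    """Split surnames into released COB profiles and held-back names."""
    by_surname = defaultdict(dict)
    for (surname, cob), n in pair_counts.items():
        by_surname[surname][cob] = by_surname[surname].get(cob, 0) + n

    released, held = {}, []
    for surname, cobs in by_surname.items():
        total = sum(cobs.values())
        if total > threshold:
            # Percentage of the surname's bearers born in each country.
            shares = {cob: round(100 * n / total) for cob, n in cobs.items()}
            released[surname] = {"total": total, "shares": shares}
        else:
            held.append(surname)  # leave until more records arrive
    return released, held


# Hypothetical aggregated input; placeholder names and counts.
released, held = process({
    ("SURNAME X", "COB1"): 15,
    ("SURNAME X", "COB2"): 5,
    ("SURNAME Y", "COB1"): 4,
})
```

Here the first surname has 20 records and is released with its percentage breakdown by country of birth, while the second has only 4 and is held back.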


Proposed Data Flow (2)

[Flow diagram. Labels in slide order:]

• Input & Processing Module
• Output Module
  - CEL=COBN
    Y: Surname-to-CEL Assigned
  - CEL=Group of COBs
    N: Manual Review
  - Visual Inspection
• Surname-to-CEL Directory
• Updates Distributed
• Surname-to-CEL
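The decision in the output module might be sketched as follows. This is an assumption-heavy Python illustration: the slide does not state how a country of birth (COB) is mapped to a CEL group or what share counts as decisive, so `COB_TO_CEL`, `MIN_SHARE` and `assign_cel` are all hypothetical. A surname whose leading COB is decisive receives that COB's CEL; everything else is sent to manual review.

```python
COB_TO_CEL = {"COB1": "CEL GROUP 1", "COB2": "CEL GROUP 2"}  # hypothetical mapping
MIN_SHARE = 50  # hypothetical cutoff, in percent


def assign_cel(shares, cob_to_cel=COB_TO_CEL, min_share=MIN_SHARE):
    """Return (cel, 'assigned') or (None, 'manual review') for one surname.

    shares maps each country of birth to its percentage of the surname's bearers.
    """
    if not shares:
        return None, "manual review"
    top_cob, top_share = max(shares.items(), key=lambda item: item[1])
    cel = cob_to_cel.get(top_cob)
    if cel is not None and top_share >= min_share:
        return cel, "assigned"
    return None, "manual review"


clear_case = assign_cel({"COB1": 75, "COB2": 25})
mixed_case = assign_cel({"COB1": 37, "COB2": 23, "COB3": 12, "COB4": 8})
```

Under these assumed settings, a surname with 75% of its bearers born in one country is assigned automatically, while a profile like the 37% / 23% / 12% / 8% example from the previous slide would go to manual review.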


5- Data Protection

• There will be no way to link the 2 files (surname and forename) together
• Records in the files will identify aggregations of either a surname or a forename, not individuals
• A minimum threshold of 10 persons per name will be applied before a name is processed and released to the output module
• A detailed data sharing framework document, to be signed by members, is being developed


6- Intellectual Property

• Intellectual Property of the Names-to-CEL directory is held by University College London
• Access to this directory, and to the methods and tools developed in the project, will be granted free of charge to contributing members
• A fee will be charged to non-contributing members, as per future arrangements
• Contributing members are those who provide data to improve the Names-to-CEL allocation


7- Project Name

GEONom

Geographic & Ethnic Origin of Names

www.casa.ucl.ac.uk/geonom


8- Open Discussion

8.1. Data Sharing and Data Protection

8.2. Methodology

8.3. Applications