Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me? Hercules Dalianis Clinical Text Mining Group Department of Computer and Systems Sciences (DSV) [email protected]

Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?

Hercules Dalianis

Clinical Text Mining Group

Department of Computer and Systems Sciences (DSV)

[email protected]

Background

• Starting 2007

• Karolinska University Hospital, Stockholm

• Greater Stockholm (City Council) 2 million inhabitants

• 1800 beds/inpatients

• 550 clinical units

Hercules Dalianis, MEDINFO 2013 2

TakeCare EPR system

• Swedish electronic patient record system, now owned by CompuGroup Medical

• Centralized, text file based

• Built on APL programming language

• Data transferred to MySQL database to make it manageable (Intelligence)

Ethical permission

• What type of research will be carried out

• How will it be carried out

• No social security number

• No personal names

• Safeguarding of data

Encryption and safeguards

• Encrypted server

• Password protected

• Locked into an alarmed room

• Server locked to a rack

• No Internet connection

• Few people have access to this server, and each of them has to sign a security agreement

=> Probably safer than at the hospital

Trust, Trust and more Trust

• Good contacts with hospital management

• They decide for the whole hospital/all clinical units

• No psychiatric or venereal diseases, no undocumented refugees

Stockholm EPR Corpus

• We obtained 1 million patient records from 550 clinical units, covering the years 2006-2010

• Delivered in several extracts, and the extracts continue

• Each patient has a unique social security number, from birth to death; it was replaced by a serial number

• All patient names removed

• The rest, including sensitive text, is present
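
Replacing an identifier with a serial number can be sketched as below. This is a minimal illustration, not the procedure actually used for the corpus: the `SerialMapper` class, the `<ID:n>` output format and the regular expression for Swedish personal identity numbers (YYMMDD-XXXX or YYYYMMDD-XXXX) are all assumptions made for the example.

```python
import itertools
import re

# Assumed surface forms of a Swedish personal identity number:
# YYMMDD-XXXX, YYMMDDXXXX, YYYYMMDD-XXXX or YYYYMMDDXXXX.
PNR = re.compile(r"\b(?:\d{2})?\d{6}[-+]?\d{4}\b")


class SerialMapper:
    """Hypothetical helper: maps each identifier to a stable serial number."""

    def __init__(self):
        self._next = itertools.count(1)
        self._seen = {}

    def serial_for(self, pnr):
        # Normalise to the last ten digits so that 19440102-1234 and
        # 440102-1234 are treated as the same person.
        key = re.sub(r"\D", "", pnr)[-10:]
        if key not in self._seen:
            self._seen[key] = next(self._next)
        return self._seen[key]

    def replace_in(self, text):
        # Swap every matched identifier for its serial number.
        return PNR.sub(lambda m: f"<ID:{self.serial_for(m.group())}>", text)
```

The same person always receives the same serial number, so records can still be linked per patient after the replacement.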

DEID work

• Yes, we did it, partly to get an overview of the problems that may occur

• We followed HIPAA*) but adapted it for Swedish conditions

*) Health Insurance Portability and Accountability Act

The Stockholm EPR PHI*) corpus

• 100 electronic patient records (EPRs) in Swedish

• Five clinics: Neurology, Orthopaedics, Infection, Dental Surgery and Nutrition

• 20 patients from each clinic, 50% men, 50% women

• 380 000 tokens

• Three annotators annotated the whole corpus

*) Protected Health Information

28 PHI classes

• Account_Number, Age, Age_Over_89, Biometric_Identifier, Date_Part, Full_Date, Year, First_Name, Last_Name, Patient_First_Name, Patient_Last_Name, Relative_First_Name, Relative_Last_Name, Clinician_First_Name, Clinician_Last_Name, Location, Country, Municipality, Organization, Street_Address, Town, Health_Care_Unit, Device_Identifier_and_Serial_Number, Ethnicity, Fax_Number, Phone_Number, Relation, Uncertain

Consensus: eight annotation classes

• Age

• Date_Part

• Full_Date

• First_Name

• Last_Name

• Health_Care_Unit

• Location

• Phone_Number
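
To make the class labels concrete, here is a small rule-based sketch that tags two of the eight classes, Full_Date and Phone_Number, and masks them with the class name. It is not the method used in this work, which relied on manual annotation and a trained CRF; the two regular expressions and the function names `tag_phi` and `mask` are assumptions for illustration, and the patterns are assumed not to overlap.

```python
import re

# Hypothetical surface patterns for two of the eight consensus classes.
PATTERNS = {
    "Full_Date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),      # e.g. 2008-08-21
    "Phone_Number": re.compile(r"\b0\d{1,3}-\d{5,8}\b"),    # e.g. 08-1234567
}


def tag_phi(text):
    """Return (start, end, label, surface) tuples sorted by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), label, m.group()))
    return sorted(hits)


def mask(text):
    """Replace every tagged span with its class label in angle brackets."""
    out, last = [], 0
    for start, end, label, _ in tag_phi(text):
        out.append(text[last:start])
        out.append(f"<{label}>")
        last = end
    out.append(text[last:])
    return "".join(out)
```

Names, locations and health care units are the hard part: they cannot be captured by patterns like these, which is why a statistical tagger was trained on the annotated corpus.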

Annotation classes and instances

• Age 56

• Full date 710

• Date part 500

• First name 923

• Last name 928

• Location 1 021

• Health care unit 148

• Phone number 135

Sum: 4 421

• 380 000 tokens

• 4 421 sensitive instances

• ~ 1 percent sensitive information
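
The "~ 1 percent" figure can be checked from the per-class counts given in the slides. The short sketch below only redoes that arithmetic; note that it divides instances by tokens, so it is an approximation if some instances span more than one token.

```python
# Instance counts per annotation class, as listed in the slides.
counts = {
    "Age": 56,
    "Full date": 710,
    "Date part": 500,
    "First name": 923,
    "Last name": 928,
    "Location": 1021,
    "Health care unit": 148,
    "Phone number": 135,
}

TOKENS = 380_000  # corpus size in tokens

total_instances = sum(counts.values())
share_percent = 100 * total_instances / TOKENS

print(f"{total_instances} instances, {share_percent:.2f} % of tokens")
```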

Training and testing on the eight annotation classes using the Stanford NER CRF

Conditional Random Fields à la Stanford NER

• 0.95-0.74 precision

• 0.83-0.36 recall

• 0.90-0.49 F-score

• The 8 annotation classes and the words

• The rest is a black box:

– Window width

– Distance between words, etc.
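
For readers less familiar with the measures, the F-score is the harmonic mean of precision and recall. The sketch below shows the standard formula; the function name `f_score` is an assumption, and pairing the end points of the reported ranges is only illustrative, since the slides do not say which precision value belongs with which recall value.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative pairing of the range end points from the slide.
best = f_score(0.95, 0.83)
worst = f_score(0.74, 0.36)
print(f"{best:.3f} {worst:.3f}")
```

A low recall pulls the F-score down sharply, and for de-identification recall is the critical figure: every missed instance is identifying information left in the text.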

Research on Stockholm EPR Corpus

• DEID and Resynthesis

• Factuality level detection of diagnoses

• Negation detection

• Detecting the number of hospital-acquired infections (HAI)

• Detection of adverse drug events

• Comorbidities

Conclusion

• Preferable to work on the original data

• Too costly and difficult to de-identify data

• Not safe enough

• De-identification makes the data too noisy.

References

• Velupillai, S., H. Dalianis, M. Hassel and G. H. Nilsson. 2009. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. International Journal of Medical Informatics (2009), doi:10.1016/j.ijmedinf.2009.04.005

• Dalianis, H. and S. Velupillai. 2010. De-identifying Swedish Clinical Text - Refinement of a Gold Standard and Experiments with Conditional Random Fields. Journal of Biomedical Semantics 2010, 1:6 (12 April 2010)

• Alfalahi, A., S. Brissman and H. Dalianis. 2012. Pseudonymisation of person names and other PHIs in an annotated clinical Swedish corpus. In Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), held in conjunction with LREC 2012, May 26, Istanbul, pp. 49-54

Comorbidities in Comorbidity-view

• Which ICD-10 codes co-occur with which other ones
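
Counting which ICD-10 codes co-occur can be sketched as follows. This is a minimal illustration, not the implementation behind the Comorbidity View: the function name `cooccurrence`, the input layout (a mapping from patient serial number to assigned codes) and the toy data are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(patient_codes):
    """Count, per unordered code pair, the patients that have both codes."""
    pairs = Counter()
    for codes in patient_codes.values():
        # Deduplicate and sort so each pair is counted once per patient
        # and always appears in the same order.
        for a, b in combinations(sorted(set(codes)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up toy data: patient serial number -> ICD-10 codes.
toy = {
    "p1": ["I110", "I509"],
    "p2": ["I509", "I110", "J189"],
    "p3": ["J189"],
}
result = cooccurrence(toy)
print(result.most_common(1))
```

Because the codes are structured fields keyed by serial number, this kind of analysis does not need the free text at all.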

Comorbidity View

Example record (anonymized manually)

123 H - IVA 322916614D 2007-08-21 9:12 1944 Kvinna Anamnesis

Kvinna med hjrtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.

23 H - IVA 322916614D 2008-08-21 10:54 1944 Kvinna Bedömning

Grav hjärtsvikt efter hjärtinfarkt x 2 inklusive eoisod med asystoli och HLR. EF 20-25%. Neurologisk påverkan med hösidig svaghet. Blodprov. Odlingar tas i blod och urin. Remiss skickas pulm-rtg enl dr Svenssons anteckning. Atelektaser. Pneumoni, I110. Hjärtinsufficiens, ospecificerad, I509

(English translation)

123 H - IVA 322916614D 2008-08-21 9:12 1944 Woman Anamnesis

Woman with heart failure, atrial fibrillation, and angina pectoris. Single widow. Previous CVL with sequelae: right-sided hemiparesis and aphasia. Previously treated in hospital for seizures, suspected to be apoplectic. Arrives at hospital after being found on a chair, where she had probably been sitting overnight. Arrives for further investigation and care. Accompanied by her son Johan.

 

123 H - IVA 322916614D 2008-08-21 10:54 1944 Woman Assessment/Plan

Severe heart failure after two myocardial infarctions, including an episode of asystole and CPR. Ejection fraction (EF) 20-25%. Neurological symptoms with right-sided hemiparesis. Blood samples. Cultures taken from blood and urine. Referral sent for pulmonary X-ray according to Dr Svensson's notes. Atelectases. Pneumonia, I110. Heart failure, unspecified, I509.