Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?

Hercules Dalianis

Clinical Text Mining Group

Department of Computer and Systems Sciences (DSV)

hercules@dsv.su.se

Background

• Starting 2007

• Karolinska University Hospital, Stockholm

• Greater Stockholm (County Council): 2 million inhabitants

• 1800 beds/inpatients

• 550 clinical units

Hercules Dalianis, MEDINFO 2013 2

TakeCare EPR system

• Swedish electronic patient record system, now owned by CompuGroup Medical

• Centralized, text file based

• Built on APL programming language

• Data transferred to MySQL database to make it manageable (Intelligence)


Ethical permission

• What type of research will be carried out

• How will it be carried out

• No social security number

• No personal names

• Safeguarding of data


Encryption and safeguards

• Encrypted server

• Password protected

• Locked into an alarmed room

• Server locked to a rack

• No Internet connection

• Few people have access to the server, and each must sign a security agreement

=> Probably safer than at the hospital


Trust, Trust and more Trust

• Good contacts with hospital management

• They decide for the whole hospital/all clinical units

• No psychiatric or venereal diseases, no undocumented refugees


Stockholm EPR Corpus

• We obtained 1 million patient records from 550 clinical units, covering the years 2006-2010

• Delivered in several extracts, which are still continuing

• Each patient has a unique social security number, from birth to death, which has been replaced by a serial number

• All patient names removed

• The rest, including sensitive text, is present
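Replacing each social security number with a serial number can be sketched as below. This is only an illustration of the idea, not the procedure actually used for the corpus; the identifiers `ID-A` and `ID-B` and the function name `make_pseudonymiser` are hypothetical.

```python
from itertools import count


def make_pseudonymiser():
    """Return a function that maps each personal identifier to a stable serial number."""
    serials = {}            # identifier -> serial number (must itself be kept secret or discarded)
    next_serial = count(1)  # serial numbers start at 1

    def pseudonymise(person_id: str) -> int:
        # Reuse the serial for an identifier seen before, so that all
        # records belonging to the same patient remain linked.
        if person_id not in serials:
            serials[person_id] = next(next_serial)
        return serials[person_id]

    return pseudonymise


# Hypothetical identifiers and notes, for illustration only.
pseudonymise = make_pseudonymiser()
records = [("ID-A", "note 1"), ("ID-B", "note 2"), ("ID-A", "note 3")]
deidentified = [(pseudonymise(pid), text) for pid, text in records]
print(deidentified)
```

The same patient always receives the same serial number, so a patient's records can still be followed over time without exposing the original identifier.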

DEID work

• Yes, we did it, partly to obtain an overview of the problems that may occur

• We followed HIPAA*) but adapted it for Swedish conditions

*) Health Insurance Portability and Accountability Act


The Stockholm EPR PHI*) corpus

• 100 electronic patient records (EPRs) in Swedish

• Five clinics: Neurology, Orthopaedics, Infection, Dental Surgery and Nutrition

• 20 patients from each clinic, 50% men, 50% women

• 380 000 tokens

• Three annotators annotated the whole corpus

*) Protected Health Information


28 PHI-classes

• Account_Number, Age, Age_Over_89, Biometric_Identifier,

Date_Part, Full_Date, Year,

First_Name, Last_Name, Patient_First_Name,

Patient_Last_Name, Relative_First_Name, Relative_Last_Name,

Clinician_First_Name, Clinician_Last_Name, Location, Country,

Municipality, Organization, Street_Address, Town,

Health_Care_Unit, Device_Identifier_and_Serial_Number,

Ethnicity, Fax_Number, Phone_Number, Relation, Uncertain


Consensus eight annotation classes

• Age

• Date_Part

• Full_Date

• First_Name

• Last_Name

• Health_Care_Unit

• Location

• Phone_Number


Annotation classes and instances

• Age 56

• Full date 710

• Date part 500

• First name 923

• Last name 928

• Location 1 021

• Health care unit 148

• Phone number 135

Sum: 4 421


• 380 000 tokens

• 4 421 sensitive instances

• ~ 1 percent sensitive information
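The "~ 1 percent" figure can be checked from the per-class counts on the previous slide. The sketch below simply sums those counts and divides by the token total; note that one annotated instance may span more than one token, so the ratio is only a rough estimate of the share of sensitive tokens.

```python
# Instance counts per annotation class, as listed on the slide.
counts = {
    "Age": 56,
    "Full_Date": 710,
    "Date_Part": 500,
    "First_Name": 923,
    "Last_Name": 928,
    "Location": 1021,
    "Health_Care_Unit": 148,
    "Phone_Number": 135,
}

total_instances = sum(counts.values())
total_tokens = 380_000

# Share of annotated instances relative to all tokens in the corpus.
share = total_instances / total_tokens

print(total_instances)   # 4421
print(f"{share:.2%}")    # 1.16%
```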


Eight annotation classes training and test using Stanford NER-CRF
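Stanford NER's CRF classifier is commonly trained from tab-separated files with one token and its label per line, using `O` for tokens outside any class. The slides do not state the exact file layout used, so the sketch below is an assumption about the data preparation step; the helper name `to_tsv` is hypothetical.

```python
def to_tsv(tokens, labels, background="O"):
    """Render tokens and their PHI labels as token<TAB>label lines.

    `labels` holds a class name for PHI tokens and None for all other tokens.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    lines = [f"{tok}\t{lab or background}" for tok, lab in zip(tokens, labels)]
    return "\n".join(lines) + "\n"


# Tokens taken from the example record in this deck ("Sonen Johan är med.").
tokens = ["Sonen", "Johan", "är", "med", "."]
labels = [None, "First_Name", None, None, None]
training_text = to_tsv(tokens, labels)
print(training_text)
```

A file in this shape would typically be paired with a properties file that maps column 0 to the word and column 1 to the answer label.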


Conditional Random Fields à la Stanford NER

• 0.95-0.74 precision

• 0.83-0.36 recall

• 0.90-0.49 F-score

• Input: the 8 annotation classes and the words

• The rest is a black box

– Window breadth

– Distance between words, etc.
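The precision, recall and F-score figures reported for the CRF experiments follow the standard definitions, which can be sketched as below. The counts in the example are hypothetical and are not taken from the experiments.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall and balanced F-score from raw counts.

    tp: PHI instances correctly found
    fp: tokens wrongly marked as PHI
    fn: PHI instances that were missed
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for one annotation class.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=40)
print(round(p, 2), round(r, 2), round(f, 2))
```

For de-identification, recall is the critical number: every missed instance (a false negative) is a piece of sensitive information left in the text.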

Research on Stockholm EPR Corpus

• DEID and Resynthesis

• Factuality level detection of diagnoses

• Negation detection

• Detecting the amount of hospital-acquired infections (HAI)

• Detection of adverse drug events

• Comorbidities


Conclusion

• Preferable to work on original data

• Too costly and difficult to de-identify data

• Not safe enough

• De-identification makes the data too noisy.


References

• Velupillai, S., H. Dalianis, M. Hassel and G. H. Nilsson. 2009. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. International Journal of Medical Informatics, doi:10.1016/j.ijmedinf.2009.04.005

• Dalianis, H. and S. Velupillai. 2010. De-identifying Swedish Clinical Text - Refinement of a Gold Standard and Experiments with Conditional Random Fields. Journal of Biomedical Semantics 1:6 (12 April 2010)

• Alfalahi, A., S. Brissman and H. Dalianis. 2012. Pseudonymisation of person names and other PHIs in an annotated clinical Swedish corpus. In Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), held in conjunction with LREC 2012, May 26, Istanbul, pp. 49-54


Comorbidities in Comorbidity-view

• Which ICD-10 codes co-occur with which other ones
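Counting which ICD-10 codes co-occur can be sketched as a pair count over each patient's set of codes. The slides do not describe how Comorbidity View is implemented, so this is only an illustration of the underlying idea; the patient data and the function name `cooccurrence` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(codes_per_patient):
    """Count how many patients have each unordered pair of ICD-10 codes."""
    pair_counts = Counter()
    for codes in codes_per_patient:
        # Deduplicate and sort so each pair is counted once per patient
        # and always appears in the same order.
        for a, b in combinations(sorted(set(codes)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical patients, each with a list of assigned ICD-10 codes.
patients = [
    ["I110", "I509"],
    ["I509", "I110", "I489"],
    ["I489"],
]
pairs = cooccurrence(patients)
print(pairs.most_common())
```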


Comorbidity View


Example record (anonymized manually)

123 H - IVA 322916614D 2007-08-21 9:12 1944 Kvinna Anamnesis

Kvinna med hjrtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.

23 H - IVA 322916614D 2008-08-21 10:54 1944 Kvinna Bedömning

Grav hjärtsvikt efter hjärtinfarkt x 2 inklusive eoisod med asystoli och HLR. EF 20-25%. Neurologisk påverkan med hösidig svaghet. Blodprov. Odlingar tas i blod och urin. Remiss skickas pulm-rtg enl dr Svenssons anteckning. Atelektaser. Pneumoni, I110. Hjärtinsufficiens, ospecificerad, I509


(English translation)

123 H - IVA 322916614D 2008-08-21 9:12 1944 Woman Anamnesis

Woman with hert failures, atrial fibrillation, and angina pectoris. Single widow. Former CVL with sequele, rght hemiparesis and aphasia. Prior hospital care for seizures, suspected to be apoepeleptic. Arrive to hospital after being found in a chair and probably been sitting there over night. Arrive for further investigation and care. Accompanied by her son Johan.

123 H - IVA 322916614D 2008-08-21 10:54 1944 Woman Assessment/Plan

Severe heart failure after heart infarction x 2, including an episode of asystole and CPR. Ejection fraction (EF) 20-25%. Neurological symptoms with right sided hemiparesis. Blood samples. Culture for blood and urine. Referral for pulmonary x-ray according to dr Svensson’s notes. Atelectases. Pneumonia, I110. Heart failure, unspecified, I509.
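Once PHI spans in a record like this have been annotated, replacing them with their class names can be sketched as below. This assumes the spans are already known (from manual annotation or a classifier) and do not overlap; the function name `mask_phi` and the span format are hypothetical.

```python
def mask_phi(text, spans):
    """Replace each annotated span with its class name in angle brackets.

    spans: non-overlapping (start, end, label) tuples, with `end` exclusive.
    """
    parts = []
    pos = 0
    for start, end, label in sorted(spans):
        parts.append(text[pos:start])   # keep the text before the span
        parts.append(f"<{label}>")      # substitute the PHI with its class
        pos = end
    parts.append(text[pos:])            # keep the remainder of the text
    return "".join(parts)


# Sentence taken from the translated example record.
text = "Accompanied by her son Johan."
start = text.index("Johan")
spans = [(start, start + len("Johan"), "First_Name")]
masked = mask_phi(text, spans)
print(masked)  # Accompanied by her son <First_Name>.
```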

 
