Lars Juhl Jensen The pragmatic text miner From literature to electronic health records

The pragmatic text miner: From literature to electronic health records

Lars Juhl Jensen

The pragmatic text miner

From literature to electronic health records

why text mining?

data mining

guilt by association

structured data

unstructured text

biomedical literature

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary-based approach

identification required

dictionary

cyclin dependent kinase 1

CDC2

expansion rules

CDC2

hCdc2
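
The step from CDC2 to hCdc2 suggests that each dictionary name is expanded into orthographic variants before matching. The sketch below assumes two hypothetical rules (case variants and an organism prefix such as "h" for human); the rule set actually used is not given in the slides.

```python
def expand_name(name, organism_prefixes=("h",)):
    """Generate orthographic variants of a dictionary name.

    The rules here are illustrative assumptions, not the rules from the talk.
    """
    # Case variants: as given, all upper, all lower, first letter capitalised.
    variants = {name, name.upper(), name.lower(), name.capitalize()}
    # Organism-prefix variants, e.g. "h" for human: CDC2 -> hCdc2 and hCDC2.
    for prefix in organism_prefixes:
        variants.add(prefix + name.capitalize())
        variants.add(prefix + name.upper())
    return variants


variants = expand_name("CDC2")
print(sorted(variants))
```

Every variant would map back to the same dictionary entry, so that a mention of hCdc2 is identified as the same protein as CDC2.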

flexible matching

cyclin dependent kinase 1

cyclin-dependent kinase 1

“black list”

SDS
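
Flexible matching and the black list can be combined in a small tagger. This is a minimal sketch, assuming that "flexible" means ignoring case and treating spaces and hyphens as interchangeable, and that the black list simply suppresses names that are too ambiguous to tag (SDS usually appears in "SDS-PAGE" rather than as a gene name). The function names and the identifiers in the example dictionary are hypothetical.

```python
import re


def compile_flexible(name):
    """Build a regex that ignores case and treats spaces and hyphens alike."""
    tokens = re.split(r"[\s\-]+", name.strip())
    # Allow an optional space or hyphen between consecutive tokens.
    pattern = r"[\s\-]?".join(re.escape(token) for token in tokens)
    # Require non-alphanumeric characters (or text boundaries) on both sides.
    return re.compile(
        r"(?<![A-Za-z0-9])" + pattern + r"(?![A-Za-z0-9])", re.IGNORECASE
    )


def tag(text, dictionary, blacklist=frozenset()):
    """Return sorted (start, end, matched_text, identifier) tuples.

    dictionary maps a name to an identifier; names on the blacklist
    (compared case-insensitively) are never tagged.
    """
    blocked = {name.lower() for name in blacklist}
    hits = []
    for name, identifier in dictionary.items():
        if name.lower() in blocked:
            continue
        for match in compile_flexible(name).finditer(text):
            hits.append((match.start(), match.end(), match.group(0), identifier))
    return sorted(hits)


# Hypothetical dictionary; the identifiers are placeholders.
dictionary = {
    "cyclin dependent kinase 1": "ID:1",
    "CDC2": "ID:1",
    "SDS": "ID:2",
}
text = "Cyclin-dependent kinase 1 (Cdc2) was analysed by SDS-PAGE."
for hit in tag(text, dictionary, blacklist={"SDS"}):
    print(hit)
```

Without the black list, the same call would also tag "SDS" in "SDS-PAGE", which is the kind of false positive the black list is meant to remove.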

>10 km, <10 hours

the formal way

benchmark

manually annotated corpus

automatic tagging

compare

quality metrics

precision

recall

F-score
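
The formal benchmark compares the automatic tags with the manually annotated corpus and summarises the overlap in three numbers. A minimal sketch, assuming annotations are represented as sets of tuples and that "F-score" means the balanced F1 (the harmonic mean of precision and recall); the example annotations are made up.

```python
def evaluate(gold, predicted):
    """Compare automatic tags against a manually annotated gold standard.

    Both arguments are sets of hashable annotations, for example
    (document_id, start, end, entity_id) tuples.
    """
    true_positives = len(gold & predicted)
    false_positives = len(predicted - gold)
    false_negatives = len(gold - predicted)

    tagged = true_positives + false_positives
    annotated = true_positives + false_negatives
    # Precision: fraction of automatic tags that are correct.
    precision = true_positives / tagged if tagged else 0.0
    # Recall: fraction of manual annotations that were found.
    recall = true_positives / annotated if annotated else 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up annotations for illustration only.
gold = {("d1", 0, 25, "ID:1"), ("d1", 27, 31, "ID:1"), ("d2", 5, 9, "ID:3")}
predicted = {("d1", 0, 25, "ID:1"), ("d2", 5, 9, "ID:3"), ("d2", 40, 43, "ID:2")}
print(evaluate(gold, predicted))
```

In this example two of three tags are correct and two of three annotations are found, so precision, recall and F-score are all 2/3.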

manually annotated corpus

use existing corpus

not new

make new corpus

hard work

natural language processing

part-of-speech tagging

semantic tagging

sentence parsing

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

handle negations

directionality

high precision

poor recall

highly domain specific

the pragmatic way

benchmark light™

requires fewer calories

non-annotated corpus

automatic tagging

random sampling

manual inspection

precision

no recall

relative recall
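
The lightweight benchmark can be sketched as follows: tag a non-annotated corpus, draw a random sample of the matches, inspect them by hand, and estimate precision from the judgements. Recall cannot be measured this way, because matches that were never made are never sampled. The sketch assumes one possible reading of "relative recall": the ratio of estimated correct matches between two methods run on the same corpus. All function names and numbers are hypothetical.

```python
import math
import random


def sample_matches(matches, n, seed=0):
    """Draw a reproducible random sample of tagged matches for manual inspection."""
    rng = random.Random(seed)
    matches = list(matches)
    return rng.sample(matches, min(n, len(matches)))


def estimate_precision(judgements):
    """Estimate precision from manual judgements (True means a correct match).

    Returns the estimate with an approximate 95% interval
    (normal approximation; judgements must not be empty).
    """
    n = len(judgements)
    p = sum(judgements) / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def relative_recall(n_matches_a, precision_a, n_matches_b, precision_b):
    """Estimated correct matches of method A divided by those of method B."""
    return (n_matches_a * precision_a) / (n_matches_b * precision_b)


# Made-up judgements: 90 of 100 sampled matches were judged correct.
judgements = [True] * 90 + [False] * 10
print(estimate_precision(judgements))
print(relative_recall(1000, 0.9, 1500, 0.8))
```

With these made-up numbers, method A finds an estimated 900 correct matches against 1200 for method B, so its recall relative to B is 0.75 even though neither absolute recall is known.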

compare methods

co-mentioning

within documents

within paragraphs

within sentences

weighted score
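
A weighted co-mentioning score can be sketched by counting entity pairs at each of the three levels and giving closer co-occurrence a higher weight. The weights below are hypothetical placeholders, and a realistic scheme would probably also normalise for how often each entity is mentioned overall, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: tighter co-occurrence counts for more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def comention_scores(documents, weights=WEIGHTS):
    """Score entity pairs by weighted co-mentioning.

    documents: list of documents; each document is a list of paragraphs;
    each paragraph is a list of sentences; each sentence is a set of
    entity identifiers produced by the tagger.
    """
    scores = defaultdict(float)

    def add_pairs(entities, weight):
        # Sort so that each pair has a single canonical key.
        for a, b in combinations(sorted(entities), 2):
            scores[(a, b)] += weight

    for document in documents:
        document_entities = set()
        for paragraph in document:
            paragraph_entities = set()
            for sentence in paragraph:
                add_pairs(sentence, weights["sentence"])
                paragraph_entities |= set(sentence)
            add_pairs(paragraph_entities, weights["paragraph"])
            document_entities |= paragraph_entities
        add_pairs(document_entities, weights["document"])
    return dict(scores)


# One made-up document with two paragraphs.
documents = [
    [
        [{"CYC1", "HAP1"}, {"CYC7"}],
        [{"CDK1"}],
    ]
]
for pair, score in sorted(comention_scores(documents).items()):
    print(pair, score)
```

A pair mentioned in the same sentence also shares a paragraph and a document, so it collects all three weights; a pair that only shares the document collects just the document weight.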

benchmark

associations good?

tagging good enough

unifying text & data

web resources

text mining

curated knowledge

experimental data

computational predictions

common identifiers

quality scores
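
Unifying text mining with curated knowledge, experimental data and computational predictions needs two things: names mapped to common identifiers, and scores that are comparable across sources. The sketch below assumes each source reports a score between 0 and 1 and combines them as if the sources were independent (1 minus the product of the complements). That combination rule is an assumption of this example rather than something stated in the slides, and the names, identifiers and scores are placeholders.

```python
from collections import defaultdict


def combine_evidence(records, id_map):
    """Merge scored associations from several sources on common identifiers.

    records: iterable of (source, name_a, name_b, score), scores in [0, 1].
    id_map: maps source-specific names to common identifiers.
    Scores for the same pair are combined as 1 - prod(1 - score), which
    treats the sources as independent (an assumption of this sketch).
    """
    remaining = defaultdict(lambda: 1.0)
    for _source, name_a, name_b, score in records:
        if name_a not in id_map or name_b not in id_map:
            continue  # skip records that cannot be mapped to identifiers
        pair = tuple(sorted((id_map[name_a], id_map[name_b])))
        remaining[pair] *= 1.0 - score
    return {pair: 1.0 - value for pair, value in remaining.items()}


# Placeholder identifiers and made-up scores.
id_map = {
    "CDC2": "ID_A",
    "cyclin dependent kinase 1": "ID_A",
    "name B": "ID_B",
    "alias of B": "ID_B",
}
records = [
    ("text mining", "CDC2", "alias of B", 0.5),
    ("experimental data", "cyclin dependent kinase 1", "name B", 0.8),
]
print(combine_evidence(records, id_map))
```

Because both records resolve to the same identifier pair, their evidence is merged into one association with a combined score of about 0.9 instead of being listed as two unrelated findings.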

proteins

STRING

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

small molecules

Kuhn et al., Nucleic Acids Research, 2012

compartments

compartments.jensenlab.org

tissues

tissues.jensenlab.org

diseases

environments

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

structured data

Jensen et al., Nature Reviews Genetics, 2012

unstructured data

clinical narrative

comorbidity

Jensen et al., Nature Reviews Genetics, 2012

Roque et al., PLoS Computational Biology, 2011

in Danish

by busy doctors

confounding factors

age and gender

reporting bias

temporal correlation
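
Of the confounding factors listed above, age and gender lend themselves to a simple illustration: compare the observed number of patients with both diagnoses to the number expected by chance, computing the expectation within each age and gender stratum instead of over the pooled population. This is a minimal sketch of that idea, not the method of the cited studies; the strata, diagnosis codes and patients are made up.

```python
from collections import Counter


def stratified_comorbidity(patients, disease_a, disease_b):
    """Observed versus expected co-occurrence of two diagnoses.

    patients: iterable of (stratum, diagnoses), where stratum is for
    example (age_group, gender) and diagnoses is a set of codes.
    The expected count assumes independence within each stratum.
    """
    n_total = Counter()
    n_a = Counter()
    n_b = Counter()
    observed = 0
    for stratum, diagnoses in patients:
        has_a = disease_a in diagnoses
        has_b = disease_b in diagnoses
        n_total[stratum] += 1
        n_a[stratum] += has_a
        n_b[stratum] += has_b
        observed += has_a and has_b
    expected = sum(n_a[s] * n_b[s] / n_total[s] for s in n_total)
    return observed, expected


# Made-up patients: both diagnoses are common in one stratum, rare in the other.
old = ("60+", "F")
young = ("<40", "M")
patients = [
    (old, {"A", "B"}), (old, {"A", "B"}), (old, {"A"}), (old, {"B"}),
    (young, set()), (young, set()), (young, {"A"}), (young, set()),
]
print(stratified_comorbidity(patients, "A", "B"))
```

In this toy data the pooled expectation would be 4 × 3 / 8 = 1.5 against 2 observed, which looks like an enrichment; the stratified expectation is 2.25, so the apparent comorbidity is explained by both diagnoses being more common in the same stratum.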

diagnosis trajectories

Jensen et al., in preparation, 2013

pharmacovigilance

adverse drug reactions

Eriksson et al., in preparation, 2013

ADR profiles

Eriksson et al., in preparation, 2013

ADR frequencies

Eriksson et al., in preparation, 2013

Acknowledgments

STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork

Text mining: Sune Frankild, Evangelos Pafilis, Alberto Santos, Kalliopi Tsafou, Janos Binder, Lucia Fanini, Sarah Faulwetter, Christina Pavloudi, Julia Schnetzer, Aikaterini Vasileiadou, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhard Schneider, Sean O’Donoghue

EHR mining: Robert Eriksson, Peter Bjødstrup Jensen, Anders Boeck Jensen, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak
