Pragmatic text mining: From literature to electronic health records


Lars Juhl Jensen

Pragmatic text mining: From literature to electronic health records

why text mining?

data mining

guilt by association

structured data

unstructured text

biomedical literature

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

dictionary-based approach

identification required

dictionary

cyclin dependent kinase 1

CDC2

expansion rules

CDC2

hCdc2

flexible matching

hyphens and spaces

“black list”

SDS

efficient tagger

Pafilis et al., PLOS ONE, 2013
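The dictionary-based steps above (dictionary, expansion rules, flexible matching, black list) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the efficient tagger cited above: the identifiers, the `h` prefix rule, the regex-based matching, and all function names are hypothetical choices made for the example.

```python
import re

# Hypothetical toy dictionary: name -> identifier (identifiers are placeholders).
DICTIONARY = {
    "cyclin dependent kinase 1": "GENE:CDK1",
    "CDC2": "GENE:CDK1",
    "SDS": "GENE:SDS",
}

# Strings that are too ambiguous to tag (SDS usually means the detergent).
BLACK_LIST = {"sds"}


def expand(name):
    """Assumed expansion rule: also accept the name with an 'h' (human) prefix."""
    return {name, "h" + name}


def to_pattern(name):
    """Flexible matching: ignore case; hyphens and spaces are interchangeable and optional."""
    tokens = re.split(r"[-\s]+", name)
    body = r"[-\s]?".join(re.escape(t) for t in tokens)
    # Require non-alphanumeric boundaries so CDC2 does not match inside hCdc2.
    return re.compile(r"(?<![A-Za-z0-9])" + body + r"(?![A-Za-z0-9])", re.IGNORECASE)


def build_matchers(dictionary):
    """Compile one pattern per expanded name variant."""
    return [
        (to_pattern(variant), identifier)
        for name, identifier in dictionary.items()
        for variant in expand(name)
    ]


def tag(text, matchers, black_list):
    """Return sorted (start, end, matched text, identifier) tuples."""
    hits = set()
    for pattern, identifier in matchers:
        for m in pattern.finditer(text):
            if m.group(0).lower() in black_list:
                continue  # skip black-listed strings
            hits.add((m.start(), m.end(), m.group(0), identifier))
    return sorted(hits)


text = "hCdc2 (cyclin-dependent kinase 1) was run on an SDS gel."
hits = tag(text, build_matchers(DICTIONARY), BLACK_LIST)
```

Here `hCdc2` is found through the expansion rule, `cyclin-dependent kinase 1` through hyphen/space-insensitive matching, and `SDS` is suppressed by the black list.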

the formal way

benchmark

manually annotated corpus

automatic tagging

precision

recall
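As a small worked example of the formal benchmark, precision and recall can be computed by comparing the automatic tags with the manual annotations. The sketch below assumes both are represented as sets of comparable tuples; the tuple layout and the function name are illustrative assumptions.

```python
def precision_recall(predicted, gold):
    """Compare automatic tags with manual annotations (both are sets)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: (document, start, end, identifier)
gold = {("d1", 0, 5, "A"), ("d1", 9, 12, "B"), ("d2", 3, 7, "A"),
        ("d2", 20, 24, "C"), ("d3", 1, 4, "B"), ("d3", 8, 11, "D")}
predicted = {("d1", 0, 5, "A"), ("d1", 9, 12, "B"), ("d2", 3, 7, "A"),
             ("d3", 15, 18, "E")}

precision, recall = precision_recall(predicted, gold)
```

With three of four predicted tags correct and three of six gold annotations found, precision is 0.75 and recall is 0.5.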

natural language processing

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

hard work

the pragmatic way

“benchmark light”

requires fewer calories

non-annotated corpus

automatic tagging

random inspection

precision

no recall

relative recall
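The "benchmark light" idea can be sketched as follows: draw a random sample of automatic tags, judge each by hand, and estimate precision from the judgments. The slides do not define relative recall; the formula below (estimated correct tags of one tagger configuration divided by those of another, on the same corpus) is only one possible reading and is an assumption, as are all function names.

```python
import random


def sample_for_inspection(tags, k, seed=0):
    """Draw up to k tags at random for manual inspection."""
    rng = random.Random(seed)
    return rng.sample(tags, min(k, len(tags)))


def estimated_precision(judgments):
    """judgments: list of booleans from manual inspection (True = correct tag)."""
    return sum(judgments) / len(judgments)


def relative_recall(n_tags_a, precision_a, n_tags_b, precision_b):
    """Assumed definition: estimated correct tags of A relative to B."""
    return (n_tags_a * precision_a) / (n_tags_b * precision_b)


sample = sample_for_inspection(list(range(100)), 10)
p = estimated_precision([True] * 9 + [False])
rr = relative_recall(1000, 0.9, 1500, 0.8)
```

No absolute recall is available without an annotated corpus, but under this reading two configurations can still be compared against each other.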

co-mentioning

within documents

within paragraphs

within sentences

weighted score
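A weighted co-mention score can be illustrated like this: each pair of entities receives credit for every document in which both occur, with extra credit when they also share a paragraph or a sentence. The weights, the input layout, and the simple additive scheme are assumptions for the example, not the scoring function used in the resources mentioned later.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights per co-mention level.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def pairs(entities):
    """All unordered entity pairs, as sorted tuples."""
    return {tuple(sorted(p)) for p in combinations(set(entities), 2)}


def comention_scores(documents, weights=WEIGHTS):
    """documents: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of tagged entities."""
    scores = defaultdict(float)
    for doc in documents:
        doc_entities, par_pairs, sent_pairs = set(), set(), set()
        for paragraph in doc:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs |= pairs(sentence)
                par_entities |= set(sentence)
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        for pair in pairs(doc_entities):
            score = weights["document"]
            if pair in par_pairs:
                score += weights["paragraph"]
            if pair in sent_pairs:
                score += weights["sentence"]
            scores[pair] += score
    return dict(scores)


documents = [[[{"A", "B"}, {"C"}], [{"D"}]]]
scores = comention_scores(documents)
```

In this toy document, A and B share a sentence, A and C only a paragraph, and A and D only the document, so their scores decrease in that order.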

unifying text & data

web resources

text mining

curated knowledge

Letunic & Bork, Trends in Biochemical Sciences, 2008

experimental data

von Mering et al., Nucleic Acids Research, 2005

computational predictions

common identifiers

quality scores

proteins

Szklarczyk, Franceschini et al., Nucleic Acids Research, 2011

small molecules

Kuhn et al., Nucleic Acids Research, 2012

compartments

compartments.jensenlab.org

tissues

tissues.jensenlab.org

diseases

electronic health records

Jensen et al., Nature Reviews Genetics, 2012

structured data

Jensen et al., Nature Reviews Genetics, 2012

unstructured data

clinical narrative

Danish

busy doctors

psychiatric patients

pharmacovigilance

structured data

medication

text mining

drug indications

adverse drug events

temporal correlation

complex filtering

Eriksson et al., submitted, 2013
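The combination of structured medication data with text-mined events can be illustrated with a deliberately simple temporal filter. This is not the method of the cited work, whose details the slides do not give; it assumes that a term mentioned during drug exposure but never before it is a candidate adverse drug event, while terms already mentioned before the drug was started are treated as possible indications or pre-existing conditions. All names and the filtering rule are hypothetical.

```python
from datetime import date, timedelta


def candidate_ades(drug_start, drug_stop, mentions, grace_days=0):
    """mentions: list of (date, term) pairs text-mined from the clinical narrative."""
    # Terms seen before exposure: possible indications or pre-existing conditions.
    before = {term for d, term in mentions if d < drug_start}
    window_end = drug_stop + timedelta(days=grace_days)
    # Terms seen during exposure (plus an optional grace period).
    during = {term for d, term in mentions if drug_start <= d <= window_end}
    return sorted(during - before)


mentions = [
    (date(2010, 1, 5), "insomnia"),
    (date(2010, 1, 20), "insomnia"),
    (date(2010, 1, 25), "nystagmus"),
    (date(2010, 3, 1), "nausea"),
]
candidates = candidate_ades(date(2010, 1, 10), date(2010, 2, 10), mentions)
```

A real analysis would need considerably more filtering and a statistical test across many patients before reporting drug and event pairs with p-values.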

Eriksson et al., submitted, 2013

Drug substance        ADE                  p-value
Chlordiazepoxide      Nystagmus            4.0e-8
Simvastatin           Personality changes  8.4e-8
Dipyridamole          Visual impairment    4.4e-4
Citalopram            Psychosis            8.8e-4
Bendroflumethiazide   Apoplexy             8.5e-3

Acknowledgments

Protein networks: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Jean Muller, Tobias Doerks, Alexander Roth, Milan Simonovic, Berend Snel, Martijn Huynen, Peer Bork

Localization and disease: Sune Frankild, Alberto Santos, Kalliopi Tsafou, Janos Binder, Reinhard Schneider, Sean O’Donoghue

Electronic health records: Robert Eriksson, Thomas Werge, Søren Brunak
