Text mining for organism and environment names
Lars Juhl Jensen
Text mining for organism and environment names
who am I?
sequence analysis
protein networks
string-db.org
chemical networks
stitch-db.org
group leader
proteomics
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
medical informatics
Jensen et al., Nature Reviews Genetics, 2012
cofounder
me
why text mining?
data mining
unstructured text
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
organisms
environments
expansion rules
plural and adjective forms
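A minimal sketch of what an expansion rule for plural forms might look like, assuming a lexicon stored as a mapping from identifier to names. The identifiers (`ORG:1`, `ENV:1`), the function names, and the naive English pluralization are all illustrative assumptions, not the rules used in the actual tagger. Adjective forms would need further rules that are not shown.

```python
def expand_name(name):
    """Return the name plus a naive English plural (hypothetical rule set)."""
    variants = {name}
    if name.endswith("y") and len(name) > 1 and name[-2] not in "aeiou":
        # consonant + y -> ies, e.g. "estuary" -> "estuaries"
        variants.add(name[:-1] + "ies")
    elif name.endswith(("s", "x", "ch", "sh")):
        variants.add(name + "es")
    else:
        variants.add(name + "s")
    return variants


def expand_lexicon(lexicon):
    """Map every lower-cased variant to the identifier of its lexicon entry."""
    expanded = {}
    for identifier, names in lexicon.items():
        for name in names:
            for variant in expand_name(name.lower()):
                # First entry wins if two entries produce the same variant.
                expanded.setdefault(variant, identifier)
    return expanded


# Illustrative entries only; the identifiers are made up for this sketch.
lexicon = {
    "ORG:1": ["zebrafish", "Danio rerio"],
    "ENV:1": ["hot spring", "estuary"],
}
expanded = expand_lexicon(lexicon)
```

Irregular plurals are not handled here; a real lexicon would likely list those explicitly.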
flexible matching
hyphens and spaces
“black list”
a
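The matching steps above can be sketched as a small dictionary-based tagger. This is a toy illustration under stated assumptions, not the actual tagger: `name_to_pattern`, `tag`, the identifiers, and the example entries are hypothetical, and "flexible matching" is assumed here to mean that a space or hyphen inside a name may appear as a space, a hyphen, or nothing. The black list is assumed to be a set of lower-cased names that must never be tagged.

```python
import re


def name_to_pattern(name):
    """Compile a case-insensitive regex for a lexicon name in which each
    space or hyphen may match a space, a hyphen, or nothing."""
    parts = [re.escape(p) for p in re.split(r"[\s-]+", name) if p]
    return re.compile(r"\b" + r"[\s-]?".join(parts) + r"\b", re.IGNORECASE)


def tag(text, lexicon, blacklist):
    """Return sorted (start, end, matched_text, identifier) tuples."""
    matches = []
    for identifier, names in lexicon.items():
        for name in names:
            if name.lower() in blacklist:
                continue  # black-listed names are never tagged
            for m in name_to_pattern(name).finditer(text):
                matches.append((m.start(), m.end(), m.group(0), identifier))
    return sorted(matches)


# Illustrative data: "Cancer" is a genus of crabs, but the word is far more
# often used for the disease, so it is black-listed in this sketch.
lexicon = {
    "ENV:1": ["hot spring"],
    "ENV:2": ["salt marsh"],
    "ORG:1": ["Cancer"],
}
blacklist = {"cancer"}
text = "Samples from a hot-spring and a salt marsh; cancer incidence was not studied."
result = tag(text, lexicon, blacklist)
```

Compiling one regex per name keeps the sketch short. A production tagger working with a comprehensive lexicon would need a much faster lookup structure.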
execution modes
C++ batch tagger
Python API
web service
questions?