Text Mining Full Text for Molecular Targets with George Jiang, Ph.D., M.B.A.
Text Mining Full Text for Molecular Targets
George Jiang, PhD, MBA
Product Manager, Text Mining
March 31, 2015
George Jiang
Product Manager, Text Mining
Trained scientist with several years of experience in text analytics, data integration, and
scientific software development
• Currently, Product Manager with Elsevier working on text mining projects and
semantic search products, based out of Rockville, MD
• Previously worked at the US National Center for Biotechnology Information (NCBI)
on the Discovery Initiative to understand users' needs, and to crosslink and expose
data to make research information more discoverable
Text Mining Full Text for Molecular Targets – March 31, 2015 – George Jiang
World Leader in Digital Information Solutions
• Published over 330,000 articles in 2013
• Founded over 130 years ago
• Works with over 30 million scientists, students, and health & information professionals
• Received over 1 million submissions in 2013
• Over 53 million items indexed by Scopus
SOLUTIONS
• Elsevier R+D Solutions: Helps corporate researchers, R+D professionals, and engineers improve how they interact with, share, and apply information to solve problems using our digital workflow tools, analytics, and data.
• Elsevier Research Intelligence: Provides universities, governments, and research institutions with the resources and insights to improve institutional research strategy, management, and performance.
• Elsevier Clinical Solutions: Helps medical professionals apply trusted data and sophisticated tools to make better clinical decisions, deliver better care, and produce better healthcare outcomes.
• Elsevier Education: Helps educate highly-skilled, effective healthcare professionals, using the most advanced pedagogical tools and reference works.
CONTENT, CAPABILITIES, PLATFORMS
• Publishes over 2,200 online journals & over 26,000 books (e + print)
• Elsevier eBooks, Online Journals, Databases
Working With Text is A Big Data Challenge
Text is everywhere! We’ve already covered 100s of terms in this presentation.
Twitter - 58M tweets/day x 14.98 words/tweet => 868M words/day => 6B
Average journal article = about 10, 150, and 6,000 words in the title, abstract, and full text, respectively
abstracts – 2.4B words (24M abstracts @ PubMed x 100 words/abstract)
full text – 144B words (if comparable set from PubMed, 24M x 6,000 words/article)
The information deluge of scientific content and how to manage
and/or leverage this information is a big data challenge
Information seeking challenges can be addressed with automation assistance and text mining for greater insight
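The back-of-the-envelope figures on this slide can be reproduced in a few lines of Python. This is only a sketch using the slide's own round numbers. The variable names are illustrative, and the full-text estimate assumes the same 24M article count as the abstract estimate, which is what yields the quoted 144B words.

```python
# Rough word-count estimates using the round numbers quoted on the slide.
tweets_per_day = 58_000_000
words_per_tweet = 14.98
twitter_words_per_day = tweets_per_day * words_per_tweet

pubmed_abstracts = 24_000_000   # abstracts at PubMed, per the slide
words_per_abstract = 100        # figure used in the slide's corpus estimate
words_per_full_text = 6_000     # average full-text length, per the slide

abstract_words = pubmed_abstracts * words_per_abstract
# Assumption: a full-text set of the same size as the abstract set.
full_text_words = pubmed_abstracts * words_per_full_text

print(f"Twitter:   {twitter_words_per_day / 1e6:.0f}M words/day")
print(f"Abstracts: {abstract_words / 1e9:.1f}B words")
print(f"Full text: {full_text_words / 1e9:.0f}B words")
print(f"Full text is {full_text_words // abstract_words}x the abstract corpus")
```

On these assumptions a comparable full-text corpus holds 60 times as many words as the abstracts alone.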
Summary
• Text mining can help to sift through large amounts of scientific literature and other
textual content
• Text mining can help to increase project team efficiency to find precise statements
and relationships
• Full text articles provide richer result sets that can be useful in finding additional
insights that cannot be garnered just using abstracts
• Several hurdles to implementing text mining remain, but the value can outweigh the costs
Text mining full text can be used to help find molecular targets of
interest quickly that may be missed if relying on abstracts and
keyword searching
Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
What is Text Mining?
Text Mining
• Refers to the process of deriving high-quality structured information (facts) from documents
[Figure: documents -> facts, e.g. "A Does B", "X Inhibits Y", "G Stops D", "I Drink T"]
Why Text Mining?
• Text Mining can yield better results, and increase team efficiency
• The application of text mining techniques can be used to solve
business problems
Example of Getting Structured Information (Facts)
documents -> paragraph -> sentence -> tokens -> fact(s)
Paragraph:
Tumour cells show greater dependency on glycolysis so providing a sufficient and rapid energy supply for fast growth. In many breast cancers, estrogen, progesterone and
epidermal growth factor receptor-positive cells proliferate in response to growth factors and growth factor antagonists are a mainstay of treatment. However, triple negative
breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition. Downstream of growth factor receptors,
signal transduction proceeds via phosphatidylinositol 3-kinase (PI3k), Akt and FOXO3a inhibition, the latter being partly responsible for coordinated increases in glycolysis
and apoptosis resistance. FOXO3a may be an attractive therapeutic target for TNBC. Therefore we have undertaken a systematic review of FOXO3a as a target for breast
cancer therapeutics.
Excerpt from Taylor et al. Evaluating the evidence for targeting FOXO3a in breast cancer: a systematic review.
Sentence:
Triple negative breast cancer (TNBC) cells lack receptor expression, are frequently
more aggressive and are resistant to growth factor inhibition
Tokens:
[Figure: word cloud of tokens, plotted with Wordle.net]
Fact(s):
• TNBC cells lack receptor expression
• TNBC cells are more aggressive
• TNBC cells resist growth factor inhibition
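As a rough illustration of the sentence-to-facts step, the sketch below distributes one subject over a list of coordinated predicates. It is a deliberately naive, regex-based stand-in and not the method used by any particular text mining product. Real systems rely on linguistic parsing and ontologies. The function name `split_into_facts` is hypothetical, and the subject/predicate split is supplied by hand.

```python
import re

def split_into_facts(subject, predicate_text):
    """Distribute one subject over a comma/'and'-separated list of predicates."""
    parts = re.split(r",\s*|\s+and\s+", predicate_text)
    return [f"{subject} {part.strip()}" for part in parts if part.strip()]

# Subject and predicates taken by hand from the example sentence on the slide.
sentence_subject = "TNBC cells"
sentence_predicates = (
    "lack receptor expression, are frequently more aggressive "
    "and are resistant to growth factor inhibition"
)

facts = split_into_facts(sentence_subject, sentence_predicates)
for fact in facts:
    print(fact)
```

The three printed statements correspond to the three facts listed on the slide, although a production system would also normalize their wording.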
What is Text Mining Being Used For?
Text mining can be used to support several research and development areas
Use cases include:
• Target identification and prioritization
• Biomarker discovery
• Drug repurposing
• Drug safety and finding adverse events
• Clinical study design and site selection
• Competitive intelligence
R&D stages (Basic Research to Applied Research): DISCOVERY, PRE-CLINICAL, CLINICAL, POST-LAUNCH
• Information retrieval and analysis of biomedical literature for target identification, systematic reviews, etc.
• Identifying relevant items for meta-analysis of specific research results
• Searching clinical trial data or electronic health records to find signals in patient populations
• Triage of news and papers for literature curation and regulatory reporting
• Text mining article submissions for curation assistance in publishing
• Text analytics and visualizations
How to Text Mine?
• Content
• Ontology
• Software solution(s)
• Expertise
Several pieces and steps are often needed to get results from text mining
• PDF -> XML
• XML quality differs
• XML uniformity, e.g. dealing with sources, types, etc.
• Default or custom ontology
• Text mining the corpus
• Balancing expectations of precision and recall
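Precision and recall can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved, recall = relevant retrieved / all relevant). The document IDs are hypothetical placeholders, not results from any real search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved vs. truly relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs, for illustration only.
retrieved_docs = ["doc1", "doc2", "doc3", "doc4"]
relevant_docs = ["doc2", "doc3", "doc5"]

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tightening a query usually raises precision at the cost of recall, and loosening it does the reverse, which is the balance the slide refers to.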
1. Aggregate
2. Structure
3. Normalize
4. Integrate
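The four steps can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not a description of any vendor's implementation. The function names, the two-entry `ONTOLOGY` dictionary, and the naive sentence splitter are all hypothetical stand-ins, and the sample sentences are adapted from the breast cancer excerpt shown earlier in the deck.

```python
import re

# Hypothetical mini-ontology mapping lowercase synonyms to preferred terms.
ONTOLOGY = {"tnbc": "triple negative breast cancer", "foxo3a": "FOXO3a"}

def aggregate(sources):
    """1. Aggregate: collect documents from several sources into one corpus."""
    return [doc for source in sources for doc in source]

def structure(corpus):
    """2. Structure: split documents into sentences (a stand-in for PDF -> XML)."""
    sentences = []
    for doc in corpus:
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip())
    return sentences

def normalize(sentences, ontology):
    """3. Normalize: map ontology synonyms in each sentence to preferred terms."""
    records = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z0-9]+", sentence.lower())
        concepts = sorted({ontology[t] for t in tokens if t in ontology})
        records.append({"sentence": sentence, "concepts": concepts})
    return records

def integrate(records):
    """4. Integrate: build a concept -> sentences index for downstream use."""
    index = {}
    for record in records:
        for concept in record["concepts"]:
            index.setdefault(concept, []).append(record["sentence"])
    return index

journal_docs = ["FOXO3a may be an attractive therapeutic target for TNBC."]
internal_docs = ["TNBC cells lack receptor expression."]

index = integrate(normalize(structure(aggregate([journal_docs, internal_docs])), ONTOLOGY))
for concept, sentences in index.items():
    print(concept, "->", len(sentences), "sentence(s)")
```

Keeping each step as a separate function mirrors the slide's point that content, ontology, and software are distinct pieces that can each be swapped or tuned.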
Elsevier Offers Several Text Mining Solutions
Software solutions and Professional Services available for text mining and
semantic searching
• Content sources: Journals and Books, Internal content, Patents, Public data sources, Other
• User Questions
• Text Mining solutions & Professional Services: Software solution (UI / API)
1. Aggregate
2. Structure
3. Normalize
4. Integrate
• Facts and data out, to support downstream applications and activities
Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
Abstracts vs Full Text
Summary of main differences
Abstracts
• Concise summaries
• Readily accessible
• Relatively uniform
Full Text
• Complete documents
• May not be as accessible
• Information within can vary
Benefits of Using Full Text
• Distribution of keywords, facts and relations – more keywords, facts
and relations are found in full text
• Concept under-representation in abstracts – specific entities may not
be mentioned in abstracts but primarily in full text sections e.g. biological functions
• Missing Negative data – often negative results or non-significant data
are missing from abstracts
• Citations per article – full text sections are more cited vs abstracts
• Timeliness – Relevant facts and relationships can be found in full text
first before any mentions in abstracts as researchers surmise in
Full text provides richer result sets
Additional Reading
• Information extraction from full text scientific articles: where are the keywords? BMC Bioinformatics. 2003 May 29;4:20. Epub 2003 May 29.
• Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles. J Biomed Inform. 2010
Apr;43(2):173-89. doi: 10.1016/j.jbi.2009.11.001. Epub 2009 Nov 10.
• Do Peers See More in a Paper Than Its Authors? Adv Bioinformatics. 2012;2012:750214. doi: 10.1155/2012/750214. Epub 2012 Nov 27.
• Is searching full text more effective than searching abstracts? BMC Bioinformatics. 2009 Feb 3;10:46. doi: 10.1186/1471-2105-10-46.
• Challenges for automatically extracting molecular interactions from full-text articles. BMC Bioinformatics. 2009 Sep 24;10:311. doi:
10.1186/1471-2105-10-311.
• Semi-Automatic Indexing of Full Text Biomedical Articles. AMIA Annu Symp Proc. 2005:271-5.
• Discovering implicit associations between genes and hereditary diseases. Pac Symp Biocomput. 2007:316-27.
• The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics. 2010
Sep 29;11:492. doi: 10.1186/1471-2105-11-492.
• Abstracts in high profile journals often fail to report harm. BMC Med Res Methodol. 2008 Mar 27;8:14. doi: 10.1186/1471-2288-8-14.
• Quality of abstracts of original research articles in CMAJ in 1989. CMAJ. 1991 Feb 15;144(4):449-53.
• Accuracy of data in abstracts of published research articles. JAMA. 1999 Mar 24-31;281(12):1110-1.
Articles highlighting the differences between abstracts and full text
Abstract vs Full Text Example
Concise abstracts cannot contain all details whereas full text will
contain all the relevant information
Abstract
Significant advances have been made in the treatment of human immunodeficiency virus (HIV) infection over the past two
decades. Improved therapy has prolonged survival and improved clinical outcome for HIV-infected children and adults.
Sixteen antiretroviral (ART) medications have been approved for use in pediatric HIV infection. The Department of Health
and Human Services (DHHS) has issued “Guidelines for the Use of Antiretroviral Agents in Pediatric HIV Infection”, which
provide detailed information on currently recommended antiretroviral therapies (ART). However, consultation with an HIV
specialist is recommended as the current therapy of pediatric HIV therapy is complex and rapidly evolving.
Full Text
Elvitegravir is a once daily integrase inhibitor being studied in adults.
Children with treatment failure should be evaluated for medication adherence, drug intolerance, and possible drug
interactions which may lessen the efficacy of the therapeutic regimen.
Challenges
• Sifting through more information!
• Finding the right results
Agenda
• Introduction to Text Mining
• The Value of Full Text Articles
• Illustration of Text Mining Full Text Articles
• Recap
• Q&A
Methods
• Use Elsevier Text Mining solution to search against corpus of biomedical literature
  • Abstracts: MEDLINE/PubMed (24M)
  • Full text: PubMed Central, Elsevier and partner publishers (4M)
• Refine results corpus, redefine search / text mining output
• Review and analyze data
• Create visual data reports using other tools available
Text Mining Abstracts vs Full Text
Search against scientific literature corpus for sentences related to efficacy
Word clouds suggest insight differences between abstracts and full text
[Figure: word clouds, Abstracts Only vs. Full Text]
If looking for details, one really needs to look at the full text results
Finding Molecular Targets in Full Text
Word clouds illustrating differences in point mutations mentioned
• Abstracts Only: No mutations mentioned in abstracts of comparable document set.
• Full Text: Gives insight into the mutations implicated for changes in efficacy.
Full text provides insights into the specific mutations implicated in differential enzymatic efficacy of
a particular drug class
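A simple way to see this abstract versus full text gap is to scan both for point-mutation mentions in the common wild-type/position/mutant notation (for example K103N). The sketch below is an assumption-laden illustration. The regular expression covers only single-letter amino acid substitutions, the two sample texts are invented placeholders rather than real article content, and `find_point_mutations` is a hypothetical helper name.

```python
import re

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Matches substitutions written as <wild-type><position><mutant>, e.g. K103N.
POINT_MUTATION = re.compile(rf"\b[{AMINO_ACIDS}]\d+[{AMINO_ACIDS}]\b")

def find_point_mutations(text):
    """Return the set of point-mutation mentions found in the text."""
    return set(POINT_MUTATION.findall(text))

# Invented placeholder texts, for illustration only.
abstract_text = "Resistance to the drug class reduced efficacy in several patients."
full_text = (
    "Resistance to the drug class reduced efficacy in several patients. "
    "Isolates carrying K103N or M184V showed the largest change."
)

abstract_hits = find_point_mutations(abstract_text)
full_text_hits = find_point_mutations(full_text)
only_in_full_text = full_text_hits - abstract_hits
print(sorted(only_in_full_text))
```

A pattern like this will miss mutations written out in words or in three-letter notation, so it should be read as a lower bound on what an ontology-backed tool would find.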
Finding Molecular Targets in Full Text
Example searching for cancer immunity checkpoint proteins
[Figure: word clouds, Abstracts Only vs. Full Text]
Full text provides insights into additional protein targets that may be of interest for cancer
immunology research in cancer checkpoints
Text Mining Results Can Then Be Used For Analyses
• Review results. Not just keyword matching anymore:
  • Identifying more relevant documents for review
  • Identifying relationships and precise statements
  • Identifying other targets/content of interest
• Link data to other items of interest
• Analytics, visualization and system/network analysis e.g. Pathway Studio,
Cytoscape
• Integrate text mining data and process into different workflows for project
quality and efficiency
Text mining results can be used to improve scientific research and can be
used to address business problems
Text Mining Finds Answers Faster & Increases Efficiency
An Example Project Comparison
Project: Writing comprehensive state of the science review article on the chemical toxicity of a particular
substance
Keyword searching:
• Finds 1,408 articles, many of them not relevant
• 176 person-days to review @ 20 min/article
VS
Text Mining:
• Identifies 142 relevant articles
• 5 person-days to review @ 20 min/article
Savings:
• Text mining robustly identifies the relevant articles
• Savings of 171 person-days per project
• Allows more projects/higher quality with same staff
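The headline savings follow directly from the slide's figures. The short sketch below takes the person-day numbers as given, without re-deriving them from the 20 min/article rate because the slide does not state the length of a working day, and adds the relative reduction in articles to review. Variable names are illustrative.

```python
# Figures as quoted on the slide.
keyword_articles = 1408
text_mining_articles = 142
keyword_person_days = 176
text_mining_person_days = 5

saved_person_days = keyword_person_days - text_mining_person_days
fraction_fewer_articles = 1 - text_mining_articles / keyword_articles

print(f"Person-days saved per project: {saved_person_days}")
print(f"Articles to review reduced by {fraction_fewer_articles:.0%}")
```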
Example of Visual Insights of Text Mining Results
Intersecting adverse events between two anti-TNF drugs
[Figure: relationship map built from Elsevier Text Mining (NLP) results, visualized in Cytoscape]
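One common route from extracted relations to a Cytoscape network is Cytoscape's Simple Interaction Format (SIF), in which each line holds a source node, an interaction type, and a target node. The sketch below shows that general idea only. It does not describe how the Elsevier tool exports data, the `to_sif` helper is hypothetical, and the relations reuse the TNBC facts from earlier in the deck rather than real adverse-event data.

```python
def to_sif(relations):
    """Serialize (source, interaction, target) triples as tab-delimited SIF lines."""
    return "\n".join(f"{source}\t{interaction}\t{target}"
                     for source, interaction, target in relations)

# Illustrative triples based on the facts extracted earlier in the deck.
relations = [
    ("TNBC cells", "lack", "receptor expression"),
    ("TNBC cells", "resist", "growth factor inhibition"),
]

sif_text = to_sif(relations)
print(sif_text)
```

Using tabs as the delimiter lets node names contain spaces. The resulting text can be saved with a `.sif` extension and imported as a network.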
Summary
• Text mining can help to sift through large amounts of scientific literature and other
textual content
• Text mining can help to increase project team efficiency to find precise statements
and relationships
• Full text articles provide richer result sets that can be useful in finding additional
insights that cannot be garnered just using abstracts
• Several hurdles to implementing text mining remain, but the value can outweigh the costs
Text mining full text can be used to help find molecular targets of
interest quickly that may be missed if relying on abstracts and
keyword searching
Thank you for joining our webinar today:
Text Mining Full Text for Molecular Targets with George Jiang, Ph.D., M.B.A.
If you have any questions for our speaker, please type them into
the CHAT window.
If you would like more information you can contact:
George Jiang