Developing a Semantic Search Application: A Pharma Case Study

Tom Reamy
Chief Knowledge Architect, KAPS Group
http://www.kapsgroup.com
Program Chair – Text Analytics World

Taxonomy Boot Camp: Washington DC, 2013

KAPS Group: General
Knowledge Architecture Professional Services – Network of Consultants
Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
– Attensity, Clarabridge, Lexalytics
Strategy – IM & KM – Text Analytics, Social Media, Integration
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Fast Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Presentations, Articles, White Papers – http://www.kapsgroup.com

Project
Agile Methodology
Goal – evaluate the ability of semantic technologies to:
– Replace manual annotation of scientific documents – automated or semi-automated
– Discover new entities and relationships
– Provide users with self-service capabilities
Goal – feasibility and effort level

Components – Technology, Resources
Cambridge Semantics, Linguamatics, SAS Enterprise Content Categorization
– Initial integration – passing results as XML
Content – scientific journal articles
Taxonomy – MeSH – select small subset
Access to a “customer” – critical for success

Three rounds – Iterations
Visualization – faceted search, sort by date, author, journal
– Cambridge Semantics
Round 1 – PDF from their database
– Needed to create additional structure and metadata
– No such thing as unstructured content
Round 2 & 3 – XML with full metadata from PubMed
Entity Recognition – Species, Document Type, Study Type, Drug Names, Disease Names, Adverse Events
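The later rules address fields such as title, abstract, and MeSH keywords, so the XML first has to be split into those fields. A minimal sketch of that step, assuming records follow the standard PubMed element names (ArticleTitle, AbstractText, DescriptorName); the sample record and the function name extract_fields are made up for illustration and are not from the project:

```python
import xml.etree.ElementTree as ET

# Made-up sample record using standard PubMed element names.
SAMPLE = """<PubmedArticle>
  <MedlineCitation>
    <Article>
      <ArticleTitle>Clonidine in hypertension</ArticleTitle>
      <Abstract><AbstractText>Clonidine lowered blood pressure.</AbstractText></Abstract>
    </Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>Clonidine</DescriptorName></MeshHeading>
      <MeshHeading><DescriptorName>Humans</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</PubmedArticle>"""


def extract_fields(xml_text):
    """Return the title, abstract and MeSH descriptors of one record."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//ArticleTitle", default="")
    # An abstract may have several AbstractText parts; join them into one string.
    abstract = " ".join("".join(node.itertext()) for node in root.iter("AbstractText"))
    mesh = [node.text for node in root.iter("DescriptorName")]
    return {"title": title, "abstract": abstract, "mesh": mesh}


fields = extract_fields(SAMPLE)
print(fields["title"], fields["mesh"])
```

Keeping the fields separate is what lets a rule treat a mention in the title differently from a mention in the abstract.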

Components & Approach
Rules or sample documents?
– Need more precision and granularity than documents can do
– Training sets – not as easy as thought
First Rules – text indicators to define sections of the document
– Objectives, Abstract, Purpose, Aim – all the “same” section
Separate logic of the rules from the text
– Stable rules, changing text
Scores – relevancy with thresholds
– Not just frequency of words
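One way to picture "stable rules, changing text" is to keep the indicator words in a data table and let a fixed piece of logic consume it. The sketch below is only an illustration under that assumption: the term lists, the section names, and the helper names are hypothetical, and it assumes section headings sit on their own line.

```python
import re

# Text layer: indicator words, editable without touching the logic below.
# These lists are illustrative assumptions, not the project's vocabulary.
SECTION_INDICATORS = {
    "objective": ["objective", "objectives", "abstract", "purpose", "aim", "aims"],
    "methods": ["methods", "materials and methods"],
    "results": ["results", "findings"],
}


def build_section_pattern(indicators):
    """Logic layer: compile one heading matcher from whatever terms are supplied."""
    lookup = {term: name for name, terms in indicators.items() for term in terms}
    # Longest terms first so "materials and methods" wins over "methods".
    ordered = sorted(lookup, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    pattern = re.compile(
        rf"^[ \t]*({alternatives})[ \t]*:?[ \t]*$", re.IGNORECASE | re.MULTILINE
    )
    return pattern, lookup


def split_sections(text, indicators=SECTION_INDICATORS):
    """Group text under canonical section names; synonym headings are merged."""
    pattern, lookup = build_section_pattern(indicators)
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = lookup[match.group(1).lower()]
        body = text[match.end():end].strip()
        sections[name] = (sections.get(name, "") + " " + body).strip()
    return sections


doc = "Purpose\nTo assess clonidine.\nMethods\nRandomized trial.\nAim\nReduce pressure."
print(split_sections(doc))
```

Here "Purpose" and "Aim" both land in the same canonical section, and adding a new synonym means editing only the table.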

Document Type Rules

(START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]", _/article:"[Objective]", _/article:"[Results]", _/article:"[Discussion]", (OR, _/article:"clinical trial*", _/article:"humans", (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),

Clinical Trial Rule: If the article has sections like Abstract or Methods AND has phrases around “clinical trials / Humans” and not words like “animals” within 5 words of “clinical trial” words – count it and add up a relevancy score
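The plain-language reading of the rule can be approximated in a few lines of Python. This is a rough sketch, not the SAS rule engine: it assumes START_2000 limits matching to the first 2000 tokens, that DIST_5 means "within 5 tokens", and that each surviving trial phrase adds 1 to the score. The function and constant names are hypothetical.

```python
import re

SECTION_MARKERS = ["abstract", "methods", "objective", "results", "discussion"]
TRIAL_TERMS = ["clinical trial", "clinical trials", "humans"]
EXCLUDE_TERMS = ["approved", "safe", "use", "animals"]  # word list taken from the rule


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens, phrase):
    """Return the start positions where the phrase occurs as consecutive tokens."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def clinical_trial_score(text, window=2000, distance=5):
    tokens = tokenize(text)[:window]  # assumption: START_2000 = first 2000 tokens
    # The AND clause: at least one section marker must be present.
    if not any(find_phrase(tokens, marker) for marker in SECTION_MARKERS):
        return 0
    excluded = [i for term in EXCLUDE_TERMS for i in find_phrase(tokens, term)]
    score = 0
    for term in TRIAL_TERMS:
        for pos in find_phrase(tokens, term):
            # The NOT/DIST_5 clause: skip hits that sit near an excluded word.
            if all(abs(pos - ex) > distance for ex in excluded):
                score += 1
    return score


print(clinical_trial_score(
    "Abstract. This clinical trial enrolled humans with hypertension. "
    "Methods: a randomized design."
))
```

A threshold on the returned score would then decide whether the document is tagged as a clinical trial.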

Rules for Drug Names and Diseases
Primary issue – major mentions, not every mention
– Combination of noun phrase extraction and categorization
– Results – virtually 100%
Taxonomy of drug names and diseases
Capture general diseases like thrombosis and specific types like deep vein, cerebral, and cardiac
Combine text about arthritis and synonyms with text like “Journal of Rheumatology”

(OR, _/article/title:"[clonidine]", (AND, _/article/mesh:"[clonidine]", _/article/abstract:"[clonidine]"), (MINOC_2, _/article/abstract:"[clonidine]"), (START_500, (MINOC_2, "[clonidine]")))

Means:
– Any variation of drug name in title – high score
– Any variation in MeSH Keywords AND in abstract – high score
– Any variation in Abstract at least 2x – good score
– Any variation in first 500 words at least 2x – suspect
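The four clauses can be read as a cascade that stops at the first one that fires. The sketch below mirrors that reading; the article layout (a dict with title, mesh, abstract, and body strings), the variant list, and the function names are assumptions made for illustration, and the labels are just the words used on the slide.

```python
import re


def count_mentions(text, variants):
    """Count whole-word, case-insensitive occurrences of any name variant."""
    pattern = r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def major_mention(article, variants):
    """Label how strongly a drug is mentioned; article is a hypothetical dict."""
    # Clause 1: any variant in the title.
    if count_mentions(article.get("title", ""), variants) >= 1:
        return "high"
    # Clause 2: in the MeSH keywords AND in the abstract.
    in_mesh = count_mentions(article.get("mesh", ""), variants) >= 1
    abstract_hits = count_mentions(article.get("abstract", ""), variants)
    if in_mesh and abstract_hits >= 1:
        return "high"
    # Clause 3: at least twice in the abstract.
    if abstract_hits >= 2:
        return "good"
    # Clause 4: at least twice in the first 500 words of the body.
    first_500 = " ".join(article.get("body", "").split()[:500])
    if count_mentions(first_500, variants) >= 2:
        return "suspect"
    return None


variants = ["clonidine", "Catapres"]  # illustrative variant list
print(major_mention({"title": "Clonidine for hypertension"}, variants))
```

Ordering the clauses from strongest to weakest evidence is what separates a major mention from a passing one.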

Results:
– Wide range by type – 70-100% recall and precision
Focus mostly on precision – difficult to test recall
One deep dive area indicated that 90%+ scores for both precision and recall could be built with moderate level of effort
Not linear effort – 30% accuracy does not mean 1/3 done
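For reference, precision and recall can be computed from the set of documents a rule tagged and the set a reviewer judged correct. The sketch uses the standard definitions; the document IDs are invented for illustration. Recall needs the full set of relevant documents, which is why it is harder to test than precision.

```python
def precision_recall(predicted, relevant):
    """Standard definitions: precision = TP / predicted, recall = TP / relevant."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: the rule tagged four documents, five are truly relevant.
tagged = {"d1", "d2", "d3", "d4"}
truly_relevant = {"d1", "d2", "d3", "d5", "d6"}
print(precision_recall(tagged, truly_relevant))
```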

Iteration 3
Complete treatment of disease state:
– Indication (disease you want to treat)
– Concomitant disease
– Adverse or side effects
Use XML metadata – some variant of “adverse”
Any combination of words associated with a disease (depression) and any of the words that indicated an adverse event or effect
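The co-occurrence idea can be sketched as: flag a sentence when it contains both a disease word and an adverse-event indicator. Matching at sentence level, the word lists, and the function name are all assumptions made for illustration; the slide does not say what scope the actual rules used.

```python
import re

# Illustrative word lists, not the project's vocabulary.
DISEASE_TERMS = {"depression": ["depression", "depressive", "depressed"]}
ADVERSE_INDICATORS = ["adverse", "side effect", "side effects"]


def adverse_event_mentions(text):
    """Return (disease, sentence) pairs where both kinds of words co-occur."""
    hits = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if not any(indicator in lowered for indicator in ADVERSE_INDICATORS):
            continue
        for disease, words in DISEASE_TERMS.items():
            if any(re.search(rf"\b{re.escape(word)}\b", lowered) for word in words):
                hits.append((disease, sentence.strip()))
    return hits


sample = (
    "Patients were treated for hypertension. "
    "Depression was reported as an adverse event in 4% of patients. "
    "Depression scores were not measured at baseline."
)
print(adverse_event_mentions(sample))
```

Only the second sentence is flagged: the third mentions depression without any adverse-event wording, so it is left for the indication or concomitant-disease rules.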

Conclusion
Project was a success!
Useful results – as defined by the customer
Reasonable and doable effort level – both for initial development and maintenance
Essential Success Factors
– Rules not documents, training sets (starting point)
– Full platform for disambiguation of noun phrase extraction, major-minor mention
– Separation of logic and text
Semantic Search works!
– If you do it smart!

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
www.TextAnalyticsWorld.com March 17-19, San Francisco
