Prepared By: , 2013 |
Text Analytics World, Boston, MA | October 1, 2013
Christine Connors, Jeff Catteau, Mike Vogel
Semantics and Semantics: Integrating Text Analytics and Ontologies
Monday, September 30, 13
We Leverage Traditional and Novel Technologies to Improve Scientific and Business Outcomes
Baselining
Signal Detection
Case Management
Reporting
Risk Management
Process Management
Structured Authoring
Analytics
Information Interchange
Case Study: Improved Signal Detection for Pharmacovigilance Leveraging Network Analysis, Pattern Matching and Ontological Logic Frameworks
Semantic Support
Semantics Lens
Semantics: the study and application of meaning
You give me a headache. Hapitol is the antidepressant I use.
I have been on Hapitol for two years and only recently began to experience headaches at night
Semantic technologies reduce barriers between people and information
How we think about: the world, our work, information, a task, a problem, an analytic
How information represents and supports: the world, our work, information, a task, a solution, an analytic
Lens of Meaning
With the right approach, semantic processing can provide a reusable framework for authorship, analytics and search
In Scientific Search, we are making it easier for scientists and information technology to collaborate through semantic and algorithmic techniques
The core processing engine has additional applications in research and Big Data
And fits within an overall knowledge architecture that can drive internal and external collaboration
To achieve differentiated search, you need to decouple the information capabilities
Content Processing
Indexing and Alignment
Search Application
Functional Extensions
Structured Accessors
Unstructured Accessors
Connectivity to Information Assets
Semantic Extraction, Embedding and Pattern Detection
Map Concepts to Client Business Definitions
Orchestrated Query Processor
Use-Case Specific Filters, Analytics and Visualizations
We use hybridized NLP capabilities to:
Extract meaningful entities from text (people, places, products, etc.)
Process language to determine meaning and intent (context, clarification, disambiguation)
Match things to each other (entities, patterns, moods, sentiment)
Identify things that do or don’t match expectations (signal detection)
Organize our information aligned to our thoughts (autoclassification and tagging, streaming to process, translation)
What do the semantic processors do?
More Meaningful
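As a rough illustration of the first capability (extracting entities from text), the sketch below tags words by looking them up in a small controlled vocabulary. The vocabulary, the entity types and `extract_entities` are hypothetical; the deck does not show how its engines perform extraction, and production extractors also handle multi-word terms, inflection and context.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str         # surface form found in the input
    entity_type: str  # e.g. Drug, AdverseEvent
    label: str        # preferred label from the vocabulary
    start: int        # character offset in the input


# Hypothetical controlled vocabulary: lower-cased surface form -> (type, preferred label).
# A real system would load these terms from a managed ontology.
VOCABULARY = {
    "hapitol": ("Drug", "Hapitol"),
    "antidepressant": ("DrugClass", "Antidepressant"),
    "headache": ("AdverseEvent", "Headache"),
    "headaches": ("AdverseEvent", "Headache"),
}


def extract_entities(text, vocabulary=VOCABULARY):
    """Tag single-word vocabulary terms in text (no multi-word or context handling)."""
    entities = []
    for match in re.finditer(r"[A-Za-z]+", text):
        entry = vocabulary.get(match.group().lower())
        if entry is not None:
            entity_type, label = entry
            entities.append(Entity(match.group(), entity_type, label, match.start()))
    return entities
```

Run on the earlier sentence "I have been on Hapitol for two years and only recently began to experience headaches at night", it returns a Drug entity for "Hapitol" and an AdverseEvent entity for "headaches". It also tags the figurative "headache" in "You give me a headache" as an adverse event, which is why lookup alone is not enough and extraction has to be paired with context and disambiguation.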
It helps to align with the content processing that is already in the enterprise (or should be)
Domain & Enterprise Semantics
Knowledge Pipeline
Search-Related Processing and Enrichment
Data Capture
Normalizer
Data Capture, Normalization, Quality Control, Distribution
Authoring & Editorial Interfaces
Content Monitoring & Alerting
Coding
Manual Coding Interface
Manual Coding Queue
Entity Extraction
Categorizer
Rules-based Coding
Expansion/Validation
Semantic Processor
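The knowledge pipeline can be approximated as a chain of small stages: normalize, extract entities, categorize with rules, and route anything the rules cannot code to a manual coding queue. The sketch below is a hypothetical minimal version, assuming plain-text documents, a set of lower-cased vocabulary terms, and rules of the form "all of these terms must be present"; none of the names come from the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    entities: list = field(default_factory=list)
    categories: set = field(default_factory=set)
    needs_manual_coding: bool = False


def normalize(doc):
    # Normalization stage: collapse runs of whitespace.
    doc.text = " ".join(doc.text.split())
    return doc


def extract(doc, terms):
    # Entity extraction stage: naive substring lookup of vocabulary terms.
    lowered = doc.text.lower()
    doc.entities = sorted(t for t in terms if t in lowered)
    return doc


def categorize(doc, rules):
    # Rules-based coding: a category applies when all of its required terms were found.
    found = set(doc.entities)
    for category, required in rules.items():
        if required <= found:
            doc.categories.add(category)
    doc.needs_manual_coding = not doc.categories
    return doc


def run_pipeline(docs, terms, rules):
    """Return (auto-coded documents, manual coding queue)."""
    coded, manual_queue = [], []
    for doc in docs:
        doc = categorize(extract(normalize(doc), terms), rules)
        (manual_queue if doc.needs_manual_coding else coded).append(doc)
    return coded, manual_queue
```

Keeping each stage as a separate function is what lets stages be swapped or added (for example, a second extraction engine) without touching the rest of the chain.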
Leveraging the semantic core, we can create rich user experiences in a plug-and-play architecture
We can add processing power for scalability, richness and faster processing of content
Tools alone can’t solve the information challenge
People conceptualize information based on how they want to use it – differently even for shared data
It isn’t about search – it’s about exploration
Why does search fail in the enterprise?
Demo Overview: Data Sources
• Data Sources:
– Pages about drugs, chemicals, etc. from Wikipedia
– Pages with chemical data from PubChem covering many of the drugs, chemicals, etc. from Wikipedia
– Extract from Drugs@FDA
– Extract from ClinicalTrials.gov
Overview: Main Features
– Search across all the data sources
– Searches can be structured based on:
• A variety of facets
• Molecular Similarity (Details covered later…)
– Facets are created through Ontology-based tagging via both SmartLogic and GATE
– Search Query entry supplemented through Ontology lookups via SmartLogic
– Search Results are post-processed via R to provide representative analytic graphics, e.g., heat-map
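Supplementing query entry through ontology lookups can be illustrated by expanding a user's term with its synonyms and narrower concepts before it reaches the search engine. The `ONTOLOGY` fragment, `expand_query` and `to_boolean_query` below are hypothetical stand-ins; they do not reflect the SmartLogic API, and the generic OR syntax is an assumption about the target engine.

```python
from collections import deque

# Hypothetical ontology fragment: concept -> synonyms and narrower concepts.
ONTOLOGY = {
    "analgesic": {"synonyms": ["painkiller", "pain reliever"], "narrower": ["aspirin", "ibuprofen"]},
    "aspirin": {"synonyms": ["acetylsalicylic acid"], "narrower": []},
    "ibuprofen": {"synonyms": [], "narrower": []},
}


def expand_query(term, ontology=ONTOLOGY, include_narrower=True):
    """Breadth-first expansion of a term with synonyms and (optionally) narrower concepts."""
    term = term.lower()
    expanded, seen = [term], {term}
    to_visit = deque([term])
    while to_visit:
        entry = ontology.get(to_visit.popleft())
        if entry is None:
            continue  # unknown terms are passed through unchanged
        for synonym in entry["synonyms"]:
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
        if include_narrower:
            for child in entry["narrower"]:
                if child not in seen:
                    seen.add(child)
                    expanded.append(child)
                    to_visit.append(child)
    return expanded


def to_boolean_query(terms):
    """Join terms with OR, quoting multi-word phrases."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)
```

Expanding "Analgesic" yields the term itself, its two synonyms, the narrower concepts aspirin and ibuprofen, and aspirin's synonym "acetylsalicylic acid". Whether to follow narrower concepts is a precision/recall trade-off, hence the `include_narrower` switch.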
Ontologies loaded in SmartLogic
• Demo Ontologies taken from: nci.nih.gov
– Medical - NICHD-Pediatric-Terminology
– Structure - FDA/ndfrt/StructuralClass
– Mechanism of Action - FDA/ndfrt/MechanismOfAction
Approaches to Similarity Searching
Google: Similarity Searching on the Web
Crawls the internet to evaluate Web pages:
• Frequency on page • Location on page • Age of page • Links to the page
Google uses PageRank criteria to evaluate Web pages (or indexes to pages)
User enters keyword(s) to get suggested links
Google ranks search results, which determines the order in which they appear on its search engine results page, by assigning each Web page a relevancy score. That score depends on factors like:
• The frequency and location of keywords within the Web page
• How long the Web page has existed
• The number of other Web pages that link to the page (the link-based signal measured by Google's PageRank algorithm)
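Of these factors, the link-based one is what PageRank itself measures: a page is important if important pages link to it. A minimal sketch of the original published formulation follows, computed by power iteration with the commonly cited damping factor of 0.85. It is illustrative only and says nothing about how Google combines PageRank with keyword or age signals; the `pagerank` function and the example link graph are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical four-page Web: C receives the most links, D receives none.
example = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]})
```

In this toy graph C ends up with the highest score, while D, which nothing links to, keeps only the baseline (1 - 0.85) / 4 = 0.0375; the scores always sum to 1.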
Pandora: Similarity Searching for Music
Music Genome Project
Evaluates standard attributes: Melody, Harmony, Rhythm, Form, Composition, Lyrics
Uses defined criteria for that song to evaluate music
User-selected song/artist to suggest a similar song
Pandora is an internet radio service that tailors the selection of music it plays to your interests.
Unlike other services that may match on popularity or on similarity to a particular artist, Pandora uses the Music Genome Project.
The Music Genome Project defined 400 musical attributes, such as melody, harmony, rhythm, form, composition and lyrics.
Pandora evaluates each song for these attributes and delivers similarly attributed music to the listener.
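The Music Genome Project's real scoring scale and matching method are not described here, so the sketch below simply assumes each song is scored 0 to 5 on the six attributes named on the slide and treats the song at the smallest Euclidean distance as the most similar. The catalog, the scores and `most_similar` are made up for illustration.

```python
import math

# Attribute order used for every score vector below.
ATTRIBUTES = ("melody", "harmony", "rhythm", "form", "composition", "lyrics")

# Hypothetical catalog: each song scored 0-5 on every attribute, in ATTRIBUTES order.
CATALOG = {
    "Song A": (4, 3, 5, 2, 3, 4),
    "Song B": (4, 3, 4, 2, 3, 5),
    "Song C": (1, 5, 1, 4, 5, 0),
}


def most_similar(seed, catalog=CATALOG, k=1):
    """Return the k songs whose attribute vectors lie closest to the seed song."""
    seed_vector = catalog[seed]
    scored = sorted(
        (math.dist(seed_vector, vector), title)  # Euclidean distance (Python 3.8+)
        for title, vector in catalog.items()
        if title != seed
    )
    return [title for _, title in scored[:k]]
```

`most_similar("Song A")` picks Song B, which differs by only one point on rhythm and lyrics, while Song C is far away on almost every attribute.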
Knowledgent: Similarity Searching in Chemistry
Similar molecules are those with a “similar” spatial arrangement of functional groups. Matches tend to have similar physical and biological properties. Similarity is found by:
• Defining a set of features: one-dimensional descriptors, field-based descriptors, fragment-based descriptors, sub-shape descriptors
• Identifying which features exist in each molecule
• Calculating similarity based on the overlap and differences of features
Example: Salicylsalicylic acid and Aspirin are Similar
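The three steps above (define features, record which ones each molecule has, score the overlap) are commonly implemented with the Tanimoto (Jaccard) coefficient, |A ∩ B| / |A ∪ B|. The feature sets below are hand-picked functional-group labels, not real fingerprints of the kind cheminformatics toolkits compute, and the 0.5 threshold is an assumption; the deck does not say which similarity metric was used.

```python
# Hand-picked, illustrative functional-group features (not a real fingerprint).
ASPIRIN = {"benzene ring", "carboxylic acid", "ester", "acetyl group"}
SALSALATE = {"benzene ring", "carboxylic acid", "ester", "phenol"}  # salicylsalicylic acid
ETHANOL = {"alcohol"}


def tanimoto(features_a, features_b):
    """Shared features divided by all distinct features (Tanimoto/Jaccard)."""
    union = features_a | features_b
    if not union:
        return 0.0
    return len(features_a & features_b) / len(union)


def is_similar(features_a, features_b, threshold=0.5):
    """Hypothetical cut-off: call two molecules similar at or above the threshold."""
    return tanimoto(features_a, features_b) >= threshold
```

Aspirin and salicylsalicylic acid share three of five distinct features, giving 3 / 5 = 0.6, so they are flagged as similar; aspirin and ethanol share none and score 0.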
Knowledgent: Scientific Search with Molecular Similarity
External Data Sources and Internal Data Sources (XML) Indexed Into Unstructured Store(s)
Search and Analytic Engine Produces: Matching Docs, Similar Compounds, Ontology Relations
Displayed As: • Search Results • Navigation Facets • Visualizations
Combining Search and Similarity Analytics is Broadly Applicable
Life Science: Molecular Similarity, Project Similarity, Employee Similarity, Pathway Similarity
Financial Services: Fraud Detection (Account Similarity)
Human Resources: Hiring (Finding Similar Candidates), Employees (Skills Similarity)
Beyond similarity, we can build custom relevancy measures as needed.
Original and New Architectures
Original Architecture
New Architecture
Examples of Additional Features made possible by New Architecture
• Entity extraction of persons, organizations, and locations using multiple engines, e.g., adding OpenNLP.
• Identify similar documents via clustering using the K-Means algorithm from Mahout (Machine Learning).
• Statistically Interesting Phrases, a.k.a. collocations, to discover words or terms that co-occur more often than would be expected by chance. Also done via Mahout.
• Document similarity based on a cosine similarity algorithm to measure the level of similarity between two documents.
• These give us more data for analytics and graphics via R, etc.
• Analysis of query and click logs:
– Most Popular Queries
– Most Popular Query Terms
– Mean Reciprocal Rank
– Queries with Fewer Than 100 Results
– Number of Distinct Queries
– Total Queries
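Two of the measures in this list are easy to show at small scale: cosine similarity between documents, here over raw term counts (no TF-IDF weighting or stemming), and Mean Reciprocal Rank over a query log, assuming the log records the 1-based rank of the first clicked result for each query. Mahout's distributed implementations work differently; this pure-Python sketch, with hypothetical function names, only shows the arithmetic.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words counts for lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(doc_a, doc_b):
    """Dot product of the two term vectors divided by the product of their lengths."""
    a, b = term_vector(doc_a), term_vector(doc_b)
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def mean_reciprocal_rank(first_click_ranks):
    """Average of 1/rank over queries; queries with no click (None) contribute 0."""
    if not first_click_ranks:
        return 0.0
    return sum(1.0 / r for r in first_click_ranks if r) / len(first_click_ranks)
```

`cosine_similarity("aspirin relieves pain", "aspirin reduces fever")` is 1/3, since the documents share one of their three terms; `mean_reciprocal_rank([1, 2, None, 4])` is (1 + 0.5 + 0 + 0.25) / 4 = 0.4375.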
Thank You!
Christine Connors
@cjmconnors | M: 910.874.8486
Contact