Semantics and Semantics - Text Analytics World...Knowledgent: Scientific Search with Molecular Similarity • Search Results • Navigation Facets • Visualizations Unstructured Store(s)


Page 1

Prepared By: , 2013

Text Analytics World | Boston, MA | October 1, 2013

Christine Connors, Jeff Catteau, Mike Vogel

Semantics and Semantics: Integrating Text Analytics and Ontologies

Monday, September 30, 13

Page 2

We Leverage Traditional and Novel Technologies to Improve Scientific and Business Outcomes

Baselining

Signal Detection

Case Management

Reporting

Risk Management

Process Management

Structured Authoring

Analytics

Information Interchange

Case Study: Improved Signal Detection for Pharmacovigilance Leveraging Network Analysis, Pattern Matching and Ontological Logic Frameworks

Semantic Support

Semantics Lens

Page 3

Semantics: the study and application of meaning

You give me a headache. Hapitol is the antidepressant I use.

I have been on Hapitol for two years and only recently began to experience headaches at night
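The two sentences above show why meaning matters: both mention Hapitol and a headache, but only the second is a plausible adverse-event report. A minimal sketch of the difference, assuming tiny hypothetical lexicons and a single hand-written idiom rule (real disambiguation would rely on far richer context than one regex):

```python
import re

# Hypothetical mini-lexicons for illustration; a real system would draw these
# from a drug ontology and an adverse-event terminology.
DRUGS = {"hapitol"}
SYMPTOMS = {"headache", "headaches"}

# Assumed idiom rule: "give(s) me a headache" is usually figurative.
FIGURATIVE = re.compile(r"\bgives?\s+me\s+a\s+headache\b", re.IGNORECASE)


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def naive_match(text: str) -> bool:
    """Keyword co-occurrence: any drug and any symptom anywhere in the text."""
    w = words(text)
    return bool(w & DRUGS) and bool(w & SYMPTOMS)


def contextual_match(text: str) -> bool:
    """Require drug and symptom in the same sentence, ignoring figurative idioms."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        w = words(FIGURATIVE.sub("", sentence))
        if w & DRUGS and w & SYMPTOMS:
            return True
    return False


figurative = "You give me a headache. Hapitol is the antidepressant I use."
literal = ("I have been on Hapitol for two years and only recently began "
           "to experience headaches at night")
print(naive_match(figurative), contextual_match(figurative))  # True False
print(naive_match(literal), contextual_match(literal))        # True True
```

Keyword co-occurrence flags both texts; even one crude contextual rule separates the figurative use from the literal report.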

Page 4

Semantic technologies reduce barriers between people and information

How we think about: the world, our work, information, a task, a problem, an analytic

How information represents and supports: the world, our work, information, a task, a solution, an analytic

Lens of Meaning

Page 5

With the right approach, semantic processing can provide a reusable framework for authorship, analytics and search

In Scientific Search, we are making it easier for scientists and information technology to collaborate through semantic and algorithmic techniques

The core processing engine has additional applications in research and Big Data, and it fits within an overall knowledge architecture that can drive internal and external collaboration

Page 6

To achieve differentiated search, you need to de-couple the information capabilities

Content Processing

Indexing and Alignment

Search Application

Functional Extensions

Structured Accessors

Unstructured Accessors

Connectivity to Information Assets

Semantic Extraction, Embedding and Pattern Detection

Map Concepts to Client Business Definitions

Orchestrated Query Processor

Use-Case Specific Filters, Analytics and Visualizations
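One way to read "de-couple the information capabilities" in code is to let the search application depend only on narrow interfaces for each layer. A sketch under that assumption: the protocol, class and method names are hypothetical stand-ins for the Content Processing and Indexing and Alignment layers named on the slide, and the toy implementations exist only to show that either layer can be swapped.

```python
from dataclasses import dataclass, field
from typing import Protocol


class ContentProcessor(Protocol):
    """Content Processing layer: text in, concepts grouped by type out."""
    def process(self, text: str) -> dict[str, list[str]]: ...


class ConceptIndex(Protocol):
    """Indexing and Alignment layer: stores and retrieves concepts per document."""
    def add(self, doc_id: str, concepts: dict[str, list[str]]) -> None: ...
    def lookup(self, concept: str) -> set[str]: ...


class CapitalizedTermProcessor:
    """Toy processor (hypothetical): treats capitalized tokens as entities."""
    def process(self, text: str) -> dict[str, list[str]]:
        return {"entity": [w.strip(".,;:") for w in text.split() if w[:1].isupper()]}


@dataclass
class InMemoryIndex:
    """Toy index (hypothetical): case-insensitive concept -> document ids."""
    postings: dict[str, set[str]] = field(default_factory=dict)

    def add(self, doc_id: str, concepts: dict[str, list[str]]) -> None:
        for values in concepts.values():
            for value in values:
                self.postings.setdefault(value.lower(), set()).add(doc_id)

    def lookup(self, concept: str) -> set[str]:
        return self.postings.get(concept.lower(), set())


class SearchApplication:
    """Depends only on the two protocols, so either layer can be replaced."""
    def __init__(self, processor: ContentProcessor, index: ConceptIndex) -> None:
        self.processor = processor
        self.index = index

    def ingest(self, doc_id: str, text: str) -> None:
        self.index.add(doc_id, self.processor.process(text))

    def search(self, concept: str) -> set[str]:
        return self.index.lookup(concept)


app = SearchApplication(CapitalizedTermProcessor(), InMemoryIndex())
app.ingest("doc-1", "Aspirin and Ibuprofen are NSAIDs.")
print(app.search("ibuprofen"))  # {'doc-1'}
```

Replacing `CapitalizedTermProcessor` with an ontology-backed extractor, or `InMemoryIndex` with a search server client, would not require changes to `SearchApplication`.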

Page 7

We use hybridized NLP capabilities to:

Extract meaningful entities from text (people, places, products, etc.)

Process language to determine meaning and intent (context, clarification, disambiguation)

Match things to each other (entities, patterns, moods, sentiment)

Identify things that do or don’t match expectations (signal detection)

Organize our information aligned to our thoughts (autoclassification and tagging, streaming to process, translation)

What do the semantic processors do?

More Meaningful
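Entity extraction is the first capability listed. The deck does not say how its hybrid engines work internally; as one simple ingredient that is often combined with statistical models, here is a sketch of dictionary (gazetteer) lookup with greedy longest match. The gazetteer entries and canonical IDs are hypothetical.

```python
import re

# Hypothetical gazetteer: token tuple -> (entity type, canonical id).
GAZETTEER = {
    ("acetylsalicylic", "acid"): ("Product", "aspirin"),
    ("aspirin",): ("Product", "aspirin"),
    ("boston",): ("Place", "boston_ma"),
    ("christine", "connors"): ("Person", "christine_connors"),
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def extract_entities(text: str) -> list[tuple[str, str, str]]:
    """Greedy longest-match lookup; returns (matched text, type, canonical id)."""
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                etype, canonical = GAZETTEER[key]
                found.append((" ".join(tokens[i:i + n]), etype, canonical))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found


print(extract_entities("Christine Connors spoke in Boston about acetylsalicylic acid."))
```

Mapping both "aspirin" and "acetylsalicylic acid" to one canonical ID is what lets later matching and signal-detection steps treat them as the same thing.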

Page 8

It helps to align to content processing already in the enterprise (or that should be)

Domain & Enterprise Semantics

Knowledge Pipeline

• Search-Related Processing and Enrichment
• Data: Capture, Normalizer (Data Capture, Normalization, Quality Control, Distribution)
• Authoring & Editorial Interfaces
• Content Monitoring & Alerting
• Coding: Manual Coding Interface, Manual Coding Queue, Entity Extraction, Categorizer, Rules-based Coding, Expansion/Validation

Semantic Processor
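The Coding stage pairs automatic rules-based coding with a manual coding queue. A minimal sketch of that hand-off, assuming hypothetical regex rules and code names, a simple normalization step, and the policy that any document no rule can code goes to manual review:

```python
import re
import unicodedata
from collections import deque

# Hypothetical coding rules: pattern -> code. A real deployment would manage
# these alongside the enterprise taxonomy or ontology.
RULES = [
    (re.compile(r"\bheadaches?\b"), "AE:HEADACHE"),
    (re.compile(r"\bnausea\b"), "AE:NAUSEA"),
]

# Documents waiting for a human coder (the Manual Coding Queue).
manual_queue: deque[str] = deque()


def normalize(text: str) -> str:
    """Normalization step: Unicode NFKC, lower-case, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def code_document(text: str) -> list[str]:
    """Apply rules-based coding; if no rule fires, route to manual coding."""
    clean = normalize(text)
    codes = sorted({code for pattern, code in RULES if pattern.search(clean)})
    if not codes:
        manual_queue.append(clean)
    return codes


print(code_document("Reported  HEADACHES and nausea"))  # ['AE:HEADACHE', 'AE:NAUSEA']
print(code_document("Patient felt dizzy"))              # []
print(list(manual_queue))                               # ['patient felt dizzy']
```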

Page 9

Leveraging the semantic core, we can create rich user experiences in a plug-and-play architecture

Page 10

We can add processing power to improve the scalability, richness and speed of content processing

Page 11

Tools alone can’t solve the information challenge

People conceptualize information based on how they want to use it – differently even for shared data

It isn’t about search – it’s about exploration

Why does search fail in the enterprise?

Page 12

Demo Overview: Data Sources

• Data Sources:
  – Pages about drugs, chemicals, etc. from Wikipedia
  – Pages with chemical data from PubChem covering many of the drugs, chemicals, etc. from Wikipedia
  – Extract from Drugs@FDA
  – Extract from ClinicalTrials.gov

Page 13

Overview: Main Features

– Search across all the data sources
– Searches can be structured based on:
  • A variety of facets
  • Molecular Similarity (details covered later…)

– Facets are created through Ontology-based tagging via both SmartLogic and GATE

– Search Query entry supplemented through Ontology lookups via SmartLogic

– Search Results are post-processed via R to provide representative analytic graphics, e.g., heat-map
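Ontology lookups can serve both features above: expanding a user's query with synonyms and tagging documents with facet values. The sketch uses a hypothetical in-memory ontology fragment; it does not use or imitate the SmartLogic or GATE APIs.

```python
import re

# Hypothetical ontology fragment: concept id -> preferred label, synonyms, facet.
ONTOLOGY = {
    "aspirin": {"label": "Aspirin", "synonyms": ["acetylsalicylic acid", "ASA"], "facet": "NSAID"},
    "ibuprofen": {"label": "Ibuprofen", "synonyms": ["Advil", "Motrin"], "facet": "NSAID"},
}

# Reverse lookup from every lower-cased surface form to its concept id.
SURFACE = {
    form.lower(): cid
    for cid, concept in ONTOLOGY.items()
    for form in [concept["label"], *concept["synonyms"]]
}


def expand_query(query: str) -> list[str]:
    """Return the preferred label and synonyms for a known concept, else the raw query."""
    cid = SURFACE.get(query.strip().lower())
    if cid is None:
        return [query]
    concept = ONTOLOGY[cid]
    return [concept["label"], *concept["synonyms"]]


def facet_tags(text: str) -> dict[str, set[str]]:
    """Map each facet to the preferred labels of concepts mentioned in the text."""
    lowered = text.lower()
    tags: dict[str, set[str]] = {}
    for form, cid in SURFACE.items():
        # Word boundaries keep short forms like "asa" from matching inside words.
        if re.search(r"\b" + re.escape(form) + r"\b", lowered):
            concept = ONTOLOGY[cid]
            tags.setdefault(concept["facet"], set()).add(concept["label"])
    return tags


print(expand_query("asa"))  # ['Aspirin', 'acetylsalicylic acid', 'ASA']
print(facet_tags("Patients switched from Advil to aspirin."))
```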

Page 15

Approaches to Similarity Searching

Page 16

Google: Similarity Searching on the Web

Crawls the internet to evaluate Web pages

Google uses ranking criteria to evaluate Web pages (or indexes to pages):
• Frequency on page
• Location on page
• Age of page
• Links to the page

User enters keyword(s); Google suggests links

Page 21

Google ranks search results, which in turn determines the order in which Google displays them on its search engine results page. One input is its PageRank algorithm, which assigns each Web page an importance score based on the links pointing to it. A Web page's overall ranking depends on factors like:

The frequency and location of keywords within the Web page

How long the Web page has existed

The number of other Web pages that link to the page (the link-based factor that PageRank itself captures)
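PageRank is defined over link structure alone: a page scores highly when many highly scored pages link to it. A minimal power-iteration sketch, assuming a small hypothetical link graph and the commonly used damping factor of 0.85; it is not Google's production ranking, which blends many other signals such as the keyword and age factors above.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power iteration over an adjacency list {page: [pages it links to]}.

    Pages with no outgoing links spread their score evenly over all pages,
    so the scores always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus the redistributed dangling mass.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank


web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B']
```

Page C ranks highest because it receives links from both A and B, even though no keyword or page-age information is involved.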

Page 22

Pandora: Similarity Searching for Music

Music Genome Project evaluates standard attributes: Melody, Harmony, Rhythm, Form, Composition, Lyrics

Uses defined criteria for that song to evaluate music

User selects a song/artist; Pandora suggests a similar song

Page 26

Pandora is an internet radio service that tailors the selection of music it plays to your interests

Unlike other services that may match on popularity or on similarity to a particular artist, Pandora uses the Music Genome Project.

The Music Genome Project defined 400 musical attributes, such as melody, harmony, rhythm, form, composition and lyrics

Pandora evaluates each song for these attributes and delivers similarly attributed music to the listener
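Attribute-based matching can be sketched by treating each song as a vector of attribute scores and recommending its nearest neighbours. This assumes hypothetical song IDs and 0 to 5 scores on just the six attributes named on the slide (the source cites 400); Euclidean distance is an illustrative choice, not Pandora's documented method.

```python
import math

# Hypothetical 0-5 scores on the six attributes named on the slide, in this
# order: melody, harmony, rhythm, form, composition, lyrics.
SONGS = {
    "song_a": [4, 3, 5, 2, 3, 1],
    "song_b": [4, 3, 4, 2, 3, 1],
    "song_c": [1, 5, 0, 4, 5, 5],
}


def most_similar(seed: str, k: int = 1) -> list[str]:
    """Return the k songs whose attribute vectors are closest (Euclidean) to the seed's."""
    others = [song for song in SONGS if song != seed]
    others.sort(key=lambda song: math.dist(SONGS[seed], SONGS[song]))
    return others[:k]


print(most_similar("song_a"))  # ['song_b']
```

Because similarity comes from the attributes themselves, a new or obscure song can be recommended as soon as it has been scored, without any popularity data.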

Page 27

Knowledgent: Similarity Searching in Chemistry

Similar molecules are those with a “similar” spatial arrangement of functional groups. Matches tend to have similar physical and biological properties. Similarity is found by:

• Defining a set of features:
  – One-Dimensional Descriptors
  – Field-Based Descriptors
  – Fragment-Based Descriptors
  – Sub-Shape Descriptors
• Identifying which features exist in each molecule
• Calculating similarity based on the overlap and differences of features

Example: Salicylsalicylic acid and Aspirin are Similar
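The last step, scoring the overlap and differences of features, is commonly done with the Tanimoto (Jaccard) coefficient: shared features divided by all features present in either molecule. A sketch, assuming hand-picked, simplified functional-group features rather than a real structural fingerprint from a cheminformatics toolkit:

```python
# Simplified, hand-picked functional-group features (illustrative only, not a
# real structural fingerprint).
ASPIRIN = frozenset({"benzene ring", "carboxylic acid", "ester", "acetyl group"})
SALSALATE = frozenset({"benzene ring", "carboxylic acid", "ester", "phenol"})  # salicylsalicylic acid
ETHANOL = frozenset({"primary alcohol"})


def tanimoto(a: frozenset[str], b: frozenset[str]) -> float:
    """Shared features divided by all features present in either molecule."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tanimoto_bits(a: int, b: int) -> float:
    """Same coefficient on integer bit fingerprints (one bit per feature)."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


print(tanimoto(ASPIRIN, SALSALATE))  # 0.6
print(tanimoto(ASPIRIN, ETHANOL))    # 0.0
```

The bit-vector form is the one usually applied to real fingerprints, where each bit records whether a descriptor is present.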

Page 28

Knowledgent: Scientific Search with Molecular Similarity

Diagram: External Data Sources and Internal Data Sources (XML) are Indexed Into Unstructured Store(s). The Search and Analytic Engine Produces Matching Docs, Similar Compounds and Ontology Relations, Displayed As:
• Search Results
• Navigation Facets
• Visualizations

Page 29

Life Science: Molecular Similarity, Project Similarity, Employee Similarity, Pathway Similarity

Financial Services: Fraud Detection (Account Similarity)

Human Resources: Hiring (Finding Similar Candidates), Employees (Skills Similarity)

Combining Search and Similarity Analytics is Broadly Applicable

Beyond Similarity, we can build Custom Relevancy Measures As Needed.

Page 30

Original and New Architectures

Page 31

Original Architecture

Page 32

New Architecture

Page 33

Examples of Additional Features made possible by New Architecture

• Entity extraction of persons, organizations, and locations using multiple engines, e.g., adding OpenNLP.  

• Identify similar documents via clustering using the K-Means algorithm from Mahout (Machine Learning).

• Statistically Interesting Phrases, a.k.a. collocations, to discover words or terms that co-occur more often than would be expected by chance. Also done via Mahout.

• Document similarity based on a cosine similarity algorithm to measure the level of similarity between two documents.

• This gives us more data for analytics and graphics via R, etc.

• Analysis of query and click logs:
  – Most Popular Queries
  – Most Popular Query Terms
  – Mean Reciprocal Rank
  – Queries with Fewer Than 100 Results
  – Number of Distinct Queries
  – Total Queries
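Cosine document similarity represents each document as a term-frequency vector and scores a pair by the cosine of the angle between them (1.0 for identical term distributions, 0.0 for no shared terms). A sketch with raw term counts and an assumed simple tokenizer; a production system would more likely weight terms with TF-IDF.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Dot product of the two term vectors divided by the product of their lengths."""
    a, b = term_vector(doc_a), term_vector(doc_b)
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


print(cosine_similarity("aspirin reduces fever", "aspirin reduces pain"))  # ≈ 0.667
```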
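For Statistically Interesting Phrases, Mahout's collocation tooling scores candidate n-grams with Dunning's log-likelihood ratio (LLR), which compares how often two words appear together against how often they would co-occur by chance. A from-scratch sketch of that statistic for bigrams (not Mahout's API); the token list in the usage line is hypothetical.

```python
import math


def _x_log_x(x: int) -> float:
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts: int) -> float:
    """Unnormalized Shannon entropy of a set of counts (nats times the total)."""
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 for a 2x2 table: k11 = "w1 w2", k12 = w1 then not w2,
    k21 = not w1 then w2, k22 = all other bigrams."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp round-off below zero


def bigram_llr(tokens: list[str], w1: str, w2: str) -> float:
    """Build the contingency table for the bigram (w1, w2) and score it."""
    bigrams = list(zip(tokens, tokens[1:]))
    k11 = sum(1 for a, b in bigrams if a == w1 and b == w2)
    k12 = sum(1 for a, b in bigrams if a == w1 and b != w2)
    k21 = sum(1 for a, b in bigrams if a != w1 and b == w2)
    k22 = len(bigrams) - k11 - k12 - k21
    return log_likelihood_ratio(k11, k12, k21, k22)


tokens = "new york is big new york is old".split()
print(bigram_llr(tokens, "new", "york") > bigram_llr(tokens, "is", "big"))  # True
```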
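The query and click log metrics listed above are simple aggregations. A sketch, assuming a hypothetical log record holding the query text, its result count, and the 1-based rank of the first clicked result; treating that first click as the first relevant result for Mean Reciprocal Rank is itself an assumption.

```python
from collections import Counter
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """Hypothetical query-log record; field names are illustrative."""
    query: str
    num_results: int
    first_click_rank: Optional[int]  # 1-based rank of first click, None if no click


def summarize(log: list[LogEntry], top_n: int = 3) -> dict:
    """Compute the query-log metrics listed on the slide."""
    queries = [e.query.strip().lower() for e in log]
    terms = Counter(t for q in queries for t in q.split())
    # Reciprocal rank is 0 for queries with no click.
    rr = [1.0 / e.first_click_rank if e.first_click_rank else 0.0 for e in log]
    return {
        "most_popular_queries": Counter(queries).most_common(top_n),
        "most_popular_terms": terms.most_common(top_n),
        "mean_reciprocal_rank": sum(rr) / len(rr) if rr else 0.0,
        "queries_under_100_results": sorted({q for q, e in zip(queries, log) if e.num_results < 100}),
        "distinct_queries": len(set(queries)),
        "total_queries": len(log),
    }


log = [
    LogEntry("Aspirin", 250, 1),
    LogEntry("aspirin ", 250, 2),
    LogEntry("salsalate", 12, None),
    LogEntry("aspirin dosage", 80, 4),
]
print(summarize(log)["mean_reciprocal_rank"])  # 0.4375
```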

Page 34

Thank You!

Christine Connors

@cjmconnors

M: 910.874.8486

Contact
