Redefining Search Technology Solutions for Better Information Access ASIDIC Spring Meeting Eric Bregand, Chief Executive Officer TEMIS Tampa (FL) – March 09


Redefining Search Technology Solutions for Better Information Access

ASIDIC Spring Meeting
Eric Bregand, Chief Executive Officer, TEMIS
Tampa (FL) – March 09

Copyright © 2009 TEMIS – All rights reserved 2

What Have We Learned So Far?

Users back at the heart of solutions
• Engage, Empower, Ease of Use & Trust (E3T)

Information accuracy is key
• #1 criterion for churn

Web 2.0/3.0 as backbone for information delivery
• Semantic Search
• Digital desktop

“Give users what they want before they know they want it”


Where are we?

Agenda

1. Introduction to Text Mining

2. Text Mining for information consumers

3. Text Mining for information producers

4. Moving forward >> Text Mining Web Services

5. Summary and Q&A


What is Text Mining? Example!

[Figure: the sentence “Trimilax 500 mg makes me feel dizzy after ingestion” analyzed at four levels: Term, Entity, Fact, Knowledge]
• Term level: part-of-speech tags (Prop., Num., Abbrev., Verb/3rd, Pron., Verb, Adj., Prep., Noun)
• Entity and Fact levels: Drug, Symptom, Condition; Product, Dosing; Action, Target; State, Event, Action
• Knowledge level: Potential Adverse Effect (Drug = Trimilax, Dosing = 500 mg, Symptom = Tiredness, When = After administration)
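The layering on this slide (terms, then entities, then a fact) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the TEMIS implementation: the `DRUGS` and `SYMPTOMS` lexicons and the helper names `tokenize`, `extract_entities` and `extract_adverse_effect` are hypothetical.

```python
import re

# Hypothetical mini-lexicons; a real system would use curated dictionaries.
DRUGS = {"trimilax"}
SYMPTOMS = {"dizzy", "tired", "nauseous"}

def tokenize(text):
    """Term level: split the text into word and number tokens."""
    return re.findall(r"[A-Za-z]+|\d+", text)

def extract_entities(tokens):
    """Entity level: label tokens as Drug, Dosing, or Symptom."""
    entities = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in DRUGS:
            entities.append(("Drug", tok))
        elif tok.isdigit() and i + 1 < len(tokens) and tokens[i + 1].lower() == "mg":
            entities.append(("Dosing", f"{tok} mg"))
        elif low in SYMPTOMS:
            entities.append(("Symptom", tok))
    return entities

def extract_adverse_effect(entities):
    """Fact level: link a drug and a symptom found in the same sentence."""
    found = dict(entities)
    if "Drug" in found and "Symptom" in found:
        return {"fact": "Potential Adverse Effect", **found}
    return None

sentence = "Trimilax 500 mg makes me feel dizzy after ingestion"
tokens = tokenize(sentence)
entities = extract_entities(tokens)
fact = extract_adverse_effect(entities)
print(fact)
```

The point of the sketch is the hand-off between levels: each step consumes the output of the previous one, so a richer lexicon or rule set can be swapped in at one level without touching the others.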


Text Mining? Understand!

Title: Google gives drivers a hand at the gas pumps
Source: InformationWeek
Author: Antone Gonsalves
Date: November 7, 2007

[Figure: the article annotated at three levels: Metadata, Entities, Facts]


Text Mining? Understand!

[Figure: entities extracted from the article, grouped by type]
• Locations: United States, Atlanta
• Organizations: National Association of Conveni…
• Persons: Lucy Sackett, Sackett
• Technologies: Linux, Open-source …, Internet
• Companies: Google, T-Mobile, HTC, Qualcomm, Motorola, Gilbarco Veeder-Root, Gilbarco, InformationWeek
• Product: New Service, Google Service


Text Mining? Understand!

[Figure: facts extracted from the article, each linking two or more entities]
• Announcement: Who: Gilbarco; Whom: unknown; What: New Service; When: unknown
• Launch: Who: Gilbarco; What: Google Service; When: early next week
• Function: Who: Sackett; Company: Gilbarco; Function: spokeswoman
• Partnership: Who: Gilbarco; With whom: Google; When: unknown; State: Negative
• Alliance: Who: Google; With whom: T-Mobile, HTC, Qualcomm, Motorola; When: unknown
• Announcement: Who: Sackett; Whom: InformationWeek; When: unknown; What: unknown
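Once facts are extracted, they can be stored as typed records and queried by entity. The sketch below is one possible representation, assuming a hypothetical `Fact` record and `facts_about` helper; the role names mirror the slide, and "unknown" roles are simply left out.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    kind: str                                   # e.g. "Launch", "Alliance"
    roles: dict = field(default_factory=dict)   # role name -> entity or list of entities

FACTS = [
    Fact("Announcement", {"who": "Gilbarco", "what": "New Service"}),
    Fact("Launch", {"who": "Gilbarco", "what": "Google Service", "when": "early next week"}),
    Fact("Function", {"who": "Sackett", "company": "Gilbarco", "function": "spokeswoman"}),
    Fact("Partnership", {"who": "Gilbarco", "with_whom": "Google", "state": "Negative"}),
    Fact("Alliance", {"who": "Google", "with_whom": ["T-Mobile", "HTC", "Qualcomm", "Motorola"]}),
    Fact("Announcement", {"who": "Sackett", "whom": "InformationWeek"}),
]

def facts_about(entity, facts=FACTS):
    """Return the facts in which `entity` fills any role (exact match)."""
    hits = []
    for f in facts:
        for value in f.roles.values():
            values = value if isinstance(value, list) else [value]
            if entity in values:
                hits.append(f)
                break
    return hits

google_facts = [f.kind for f in facts_about("Google")]
print(google_facts)
```

Matching is exact, so the product "Google Service" does not count as a mention of the company "Google"; that distinction is what separates an entity query from a keyword query.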


Text Mining? Create Knowledge!


What is Text Mining?

• Text Mining is an information access technology…
• Text Mining generates Knowledge
• Text Mining serves information consumers & producers

[Diagram: Text Mining Back-End, Data Repository, Text Mining Front-End (Text Analytics)]

Agenda

1. Introduction to Text Mining

2. Text Mining for Information Consumers

3. Text Mining for Information Producers

4. Moving forward >> Text Mining Web Services

5. Summary and Q&A


Search Engines Today

• Scalable (billions of docs)
• Pervasive (any sources)
• Live (any time)
• Dynamic?
• Fast (millisecond queries)
• Simple (list of documents)
• Relevant?
• Informative?

[Diagram: Document Processing, Index, Query]


Text Analytics Today

[Diagram: a Text Mining Platform builds an Index of Entities & Concepts, Events & Facts, and Occurrences & Position, used to Search, Discover and Analyze]

• Scalable (100k docs)
• Domain-centric
• Live (any time)
• Pertinent
• Collaborative


Enhanced Searches with Text Mining

Enrich the Search Index with more, and more relevant, extracted information

[Diagram: Document Processing, Text Mining Platform with Business Centric Annotators, Index, Query]

Gains
• Pertinent searches
• Richer indexes
• More relevant information

Limits
• Just better searches
• No analysis
• No discovery


Beyond Search ! Discover & Analyze

[Diagram: Document Processing, Text Mining Platform, Index of Entities & Concepts, Events & Facts, and Occurrences & Position, feeding Query, Discover and Analyze]

• Informative
• Easy reading with highlighting
• Knowledge Discovery within info links
• Pertinent searches
• Richer indexes
• More relevant information


Pertinence Gains – Beyond Terms…

[Figure: pertinence rises from Term (Average) to Entity (Good) to Concept (Excellent)]
• Term: Administration, Federal, Drug, Agency, Swiss, Regulatory
• Entity: Federal Drug Administration, Swiss Regulation Agency
• Concept: Regulation Agency

“Search Regulation Agency” better than “Search FDA or Federal…”


Pertinence Gains – Beyond Doc’ts

[Figure: pertinence rises from Co-Occurrence (Average) to Proximity (Good) to Facts (Excellent)]
• Co-Occurrence (Document): identify entities nearby in the document
• Proximity (Paragraph): identify entities nearby in the paragraph
• Facts (Sentence): identify entities linked by semantic sense

Example (labels on the slide: Proximity, Buy)

It was discovered by San Francisco-based Sugen, a biotechnology company that was purchased by pharmaceutical company Pharmacia Corp.

….

Five months later, Pfizer bid for Pharmacia. The maker of the popular arthritis drug Celebrex and hair-loss treatment Rogaine…
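The three scopes on this slide differ only in the unit of text inside which two entities must appear together. A minimal sketch, assuming plain substring matching and a naive sentence splitter (both are simplifications, and the helper names `units` and `cooccur` are hypothetical). Sentence-level co-occurrence is only a proxy here; a real fact also requires a semantic link such as "purchased by".

```python
import re

def units(text, level):
    """Split text into one document, its paragraphs, or its sentences."""
    if level == "document":
        return [text]
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    if level == "paragraph":
        return paragraphs
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    return [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s]

def cooccur(text, a, b, level):
    """True if entities a and b appear together in one unit at this level."""
    return any(a in u and b in u for u in units(text, level))

# The example from the slide, lightly shortened.
TEXT = (
    "It was discovered by San Francisco-based Sugen, a biotechnology company "
    "that was purchased by pharmaceutical company Pharmacia Corp.\n\n"
    "Five months later, Pfizer bid for Pharmacia. The maker of the popular "
    "arthritis drug Celebrex and hair-loss treatment Rogaine..."
)

for level in ("document", "paragraph", "sentence"):
    print(level, cooccur(TEXT, "Sugen", "Pfizer", level),
          cooccur(TEXT, "Sugen", "Pharmacia", level))
```

Sugen and Pfizer co-occur only at document level, which is weak evidence of a relationship; Sugen and Pharmacia share a sentence, which is where the "Buy" fact lives.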


Pertinence Gains – Benchmarks

[Chart: relevance (Average, Good, Excellent) by technique, comparing “Text Mining & Search Engine” with “Standalone Search engine”]
• Average: Term, Co-Occurrence
• Good: Entity, Proximity
• Excellent: Concept, Facts


Key Feature Benefits

Combined Text Analytics & Search
• Stay fast & scalable
• But also become more pertinent & collaborative

End-user benefits = powerful search & discovery
1. Enhanced search
2. Guided navigation
3. Assisted document reading
4. Standardized data analysis and reporting
5. Information discovery
6. Collaborative platform


1. Enhanced Search Experience

Simple recognition of words…

From standard keyword search….


1. Enhanced Search Experience… to Entity & Fact search!

End-User Benefits
• Make comprehensive and precise searches
• Get more relevant documents
• Find what you don’t know!


2. Faceted Navigation

From “narrow your search”….


2. Faceted Navigation

… to multi-dimensional faceted navigation
• Self-adjusting filters to refine the search
• Ability to combine several filters at once (and/or)
• Point & Click filtering

End-User Benefits
• Get a quick vision of document content
• Navigate within context-relevant information
• Rapidly focus on targeted documents
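Faceted navigation can be sketched as filters over annotated documents, with facet counts recomputed on the current result set (the "self-adjusting" part). The documents, facet names and helpers below are hypothetical illustrations, not a description of the product.

```python
from collections import Counter

# Hypothetical annotated documents: each facet maps to the entities found.
DOCS = [
    {"id": 1, "Companies": {"Google", "Gilbarco"}, "Locations": {"Atlanta"}},
    {"id": 2, "Companies": {"Google", "Motorola"}, "Locations": {"United States"}},
    {"id": 3, "Companies": {"Pfizer"}, "Locations": {"United States"}},
]

def matches(doc, filters, mode="and"):
    """Check a document against (facet, value) filters combined with and/or."""
    tests = [value in doc.get(facet, set()) for facet, value in filters]
    return all(tests) if mode == "and" else any(tests)

def search(docs, filters, mode="and"):
    """Keep only the documents that pass the combined filters."""
    return [d for d in docs if matches(d, filters, mode)]

def facet_counts(docs, facet):
    """Self-adjusting filter: count facet values in the current result set."""
    return Counter(v for d in docs for v in d.get(facet, set()))

filters = [("Companies", "Google"), ("Locations", "United States")]
hits = search(DOCS, filters)               # both filters must hold
either = search(DOCS, filters, mode="or")  # at least one filter must hold
remaining = facet_counts(hits, "Companies")
print([d["id"] for d in hits], [d["id"] for d in either], dict(remaining))
```

Recomputing `facet_counts` after every click is what keeps the offered filters consistent with the documents still in play.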


3. Assisted Document Reading

From raw data display…


3. Assisted Document Reading

… to targeted information viewing
• Instant access to relevant information
• Text Highlighting

End-User Benefits
• Instant spotting of relevant information
• Guided reading
• Get additional context (“Smart Link”)


From bug view ….

4. Data Analysis and Reporting


4. Data Analysis and Reporting

… to bird’s-eye view!

End-User Benefits
• Visualize key Entities & Facts (pie/bar charts)
• Detect Entities & Facts dependencies (matrix charts)
• Zoom in & out by drilling anywhere


5. Information Discovery

From flat list of documents ….


5. Information Discovery… to information network

[Screenshot labels: Search Panel, Entities, Facts, Discovery Tools, Proofs]

End-User Benefits
• Search in knowledge, not in documents
• Get a graphical representation of knowledge
• Discover information by navigating within Facts


6. Collaborative Platform

From information consumer… to information producer!

User Enriched Content
• Join 2 entities
  Ex: BASF = BASF Plant Sciences
• Re-assign entity
  Ex: Carl Zeiss = Company (instead of person)
• Remove entity
  Ex: BUT is not a company (although there is a French company by that name)
• Add entity
  Ex: XyyyZ is a protein

End-User Benefits
• Increase information sharing
• Capitalize on knowledge
• Improve indexing quality
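The four user corrections on this slide (join, re-assign, remove, add) map naturally onto a small store of entity types and aliases. The `EntityStore` class below is a hypothetical sketch of that idea; the slides do not describe how the platform stores user feedback.

```python
class EntityStore:
    """Toy store of entity -> type, with user-supplied corrections."""

    def __init__(self, entities):
        self.types = dict(entities)   # entity name -> entity type
        self.aliases = {}             # alias -> canonical name

    def join(self, alias, canonical):
        """Treat `alias` as another name for `canonical`."""
        self.aliases[alias] = canonical
        self.types.pop(alias, None)

    def reassign(self, name, new_type):
        """Change the type of an existing entity."""
        self.types[name] = new_type

    def remove(self, name):
        """Drop a wrongly detected entity."""
        self.types.pop(name, None)

    def add(self, name, entity_type):
        """Register an entity the annotators missed."""
        self.types[name] = entity_type

    def resolve(self, name):
        """Return (canonical name, type), or None if unknown."""
        canonical = self.aliases.get(name, name)
        if canonical in self.types:
            return canonical, self.types[canonical]
        return None

# Initial (partly wrong) annotations, then the four corrections from the slide.
store = EntityStore({
    "BASF": "Company",
    "BASF Plant Sciences": "Company",
    "Carl Zeiss": "Person",
    "BUT": "Company",
})
store.join("BASF Plant Sciences", "BASF")
store.reassign("Carl Zeiss", "Company")
store.remove("BUT")
store.add("XyyyZ", "Protein")
print(store.resolve("BASF Plant Sciences"), store.resolve("BUT"))
```

Feeding such corrections back into indexing is what turns an information consumer into a producer: every later search benefits from the fix.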

Agenda

1. Introduction to Text Mining

2. Text Mining for Information Consumers

3. Text Mining for Information Producers

4. Moving forward >> Text Mining Web Services

5. Summary and Q&A


Text Mining as Core Component

[Diagram: Text Mining at the core of Content Enrichment]
• Inputs: Original Content, Journal Scans, Expert Interviews, Event Reports
• Text Mining functions: Related Topics Extraction, Smart Linking, Sentiment Analysis, Trends Analysis & Charting, Similarity Detection, Content Annotation, Metadata Extraction, Taxonomy Management, Automatic Categorization, Entity & Facts Extraction
• Surrounding functions and actors: Product Management, Web Content Management, Editorial & Content Management, Content Editors, Visitors & customers


Text Mining Value Proposition

1. Enhance editorial productivity
• Reduce cost of creating information products
• Increase product quality and consistency
• Improve editorial team satisfaction & productivity

2. Enrich content for agile publishing
• Increase revenue & maximize content monetization
• Improve customer experience & loyalty
• Provide agility in creating faster, smarter products

Text Mining reduces production costs and accelerates the delivery of information products


1. Enhancing Editorial Productivity

Content categorization & alerts
• Content is automatically categorized according to editors’ preferences and expertise
Reduce time in integrating content

Extraction & normalization
• References, citations and metadata are automatically extracted and normalized
Ensure information consistency

Semantic and topical tagging
• Semantic tags and topics are suggested for editors’ review and approval
Speed up the editorial process


2. Enrich Content for Agile Publishing

Semantic content linking – navigate!
• Provide more relevant content in context by suggesting similar documents
Create more engaging, longer lasting user visits

Richer content tagging – find!
• Leverage the powerful content enrichment to better describe the content and then power accurate searches
Richer user experience through accurate answers & facets

Information Analytics – understand!
• Powerful analytics to slice & dice your content
Quickly assess the feasibility of new product ideas
Reach out to new audiences with smarter products


Better Search Capabilities! Example

[Screenshot: a “News Today” portal page with a SEARCH box, annotated with the Text Mining function behind each panel]

Relevant Topic Extraction
Peshawar, President Bush, Islamic union, Boycott, Benazir Bhutto, Pervez Musharraf, John McCain, Islamabad

Automatic Categorization
Politics, Local, Washington, International
Business News, Product Launch
Finance, M&A, Stock, Dow/Nasdaq
Deals, People
On the move, Interviews

Article shown
Pakistan polls boycott would help Musharraf: Bhutto
2 days ago
PESHAWAR, Pakistan (AFP) - Former Pakistan prime minister Benazir Bhutto said Sunday an opposition boycott of upcoming polls would only help President Pervez Musharraf legitimise his imposition of emergency rule.
Bhutto said she would meet early next week with former rival Nawaz Sharif, who has called for a boycott of the January 8 election, to discuss the issue.
"If we all boycott elections, then it will give Musharraf a two-thirds majority in the parliament to validate his provisional constitutional order," she told a press conference in northwestern city of Peshawar, an Islamic political stronghold.
"That is why we are saying that we will take part in elections under protest, but we will also leave the door open (to talks on a boycott)."
"I am getting conflicting signals from Nawaz Sharif and Qazi Hussain Ahmad about (an) election boycott as they have filed nomination papers and if someone does that it means he is taking part in election," Bhutto told reporters.

Similarity
Related documents
• Musharraf to hold early election
• Talibans positions move
• Administration reiterates support
• Benazir calls for resignation
• (more)

Entity Extraction
In this document
• People: President Bush, Benazir Bhutto
• Organizations: White House
• Locations: Peshawar, Washington
• (more)

Smart Linking
Pervez Musharraf: Google, Wikipedia, LinkedIn


Selected Business Cases

• Editorial
• Agile Publishing


Editorial – Current BIODATA

Objectives
• Automate primary content acquisition (scientific literature, patents, business wires, sites, …)
• Automate primary content indexing (protein, genes, diseases, company, people, etc.)

Solution
• Web harvesting with QL2
• Information extraction, categorization and alerting with Luxid® and packaged Annotators (BER, MER, CI)

Benefits
• Significant cost savings on data gathering and analysis
• Highly scalable framework covering multiple topics and thousands of sources


Editorial – LexisNexis

Objectives
• Automatic categorization & indexing using a legal controlled vocabulary
• Centralized Knowledge
• Easier access to Content

Solution
• Mondeca as Legal Ontology
• Luxid® with legal Annotator (custom made)

Benefits
• More efficient asset management and update
• Improved content quality and consistency
• More efficient search/navigation based on semantics


Editorial – Search enhancement

Objective
• Increase search and retrieval quality with better part-of-speech tagging in German

Solution
• TEMIS XeLDA® to improve the indexing process
• Integration with Verity K2

Benefits
• Increase customer satisfaction by providing more accurate and comprehensive search results


Editorial – AFP

Objective
• Build the new AFP cross-media platform for information access (B2B “Image Forum” platform)

Solution
• Luxid® with People, Location, Organization, Company and IPTC codes annotators
• Integration with an ontology management tool and a search engine

Benefits
• Uniform access to any AFP content (text, audio, video…)
• Make information access easier on 10M+ articles in 6 different languages, 10M+ images, and 2 to 3 million news articles per year


Agile Publishing – Elsevier

Objective
• Develop a revolutionary database indexing the last 28 years of chemistry patents
• Provide an exceptional user experience by using “smart content”

Results
• ~20 million chemistry patent documents
• Searchable by chemical reactions, solvents, reactants directly extracted from the documents
• Released by Elsevier-MDL in Nov. 2004

Currently
• TEMIS distributes the Chemical Entities Relationships Annotator in partnership with Elsevier


Agile Publishing – Thomson

Objective = Rescue lost data
• 49 bound volumes of Biological Abstracts® for 1926 to 1968 digitized using offshore resources
• Required to make the data searchable with the BIOSIS

Approach
• Use Luxid® entity extraction to obtain candidate terms from the titles and abstracts
• Map the extracted entities to the BIOSIS vocabulary
• Output the resulting indexing as XML for loading to the Content Management System
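The three-step approach (extract candidate terms, map them to a controlled vocabulary, output XML) can be sketched as follows. Everything here is a stand-in: the `VOCABULARY` entries, the dictionary lookup used in place of Luxid® extraction, the record id and the XML element names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: lowercase variant -> preferred term.
VOCABULARY = {
    "e. coli": "Escherichia coli",
    "escherichia coli": "Escherichia coli",
    "photosynthesis": "Photosynthesis",
}

def extract_candidates(text):
    """Stand-in for entity extraction: find known variants in the text."""
    low = text.lower()
    return [variant for variant in VOCABULARY if variant in low]

def map_to_vocabulary(candidates):
    """Map candidates to preferred terms, dropping duplicates, keeping order."""
    seen = []
    for c in candidates:
        term = VOCABULARY[c]
        if term not in seen:
            seen.append(term)
    return seen

def to_xml(record_id, terms):
    """Serialize the indexing of one record as a small XML fragment."""
    record = ET.Element("record", id=record_id)
    for term in terms:
        ET.SubElement(record, "term").text = term
    return ET.tostring(record, encoding="unicode")

abstract = "Photosynthesis rates and growth of E. coli cultures."
terms = map_to_vocabulary(extract_candidates(abstract))
xml_out = to_xml("BA-0001", terms)
print(xml_out)
```

Mapping to preferred terms before serialization means the content management system only ever sees vocabulary entries, never raw surface forms.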


Agile Publishing – Springer

Objective
• Mapping of meaningful words and phrases in journal articles to encyclopedia entries
• Identification of related documents in a pool of over three million journal articles

Solution
• Indexing of incoming journal articles to link them with the related encyclopedia entry
• Creation of a semantic fingerprint for each journal article to allow the search engine to calculate the degree of relationship
• Integration with Springer’s search engine

Benefits
• Increased product sales by improving content linking
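The slide does not say how a semantic fingerprint is built or compared. One common approach, assumed here purely for illustration, is to represent each article as a bag of extracted concepts and to score the "degree of relationship" with cosine similarity; the `fingerprint` and `cosine` helpers and the sample concepts are hypothetical.

```python
import math
from collections import Counter

def fingerprint(concepts):
    """Represent a document as a bag of its extracted concepts."""
    return Counter(concepts)

def cosine(fp_a, fp_b):
    """Cosine similarity between two fingerprints (0.0 to 1.0)."""
    dot = sum(fp_a[c] * fp_b[c] for c in fp_a.keys() & fp_b.keys())
    norm_a = math.sqrt(sum(v * v for v in fp_a.values()))
    norm_b = math.sqrt(sum(v * v for v in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

article_a = fingerprint(["protein", "binding", "kinase", "protein"])
article_b = fingerprint(["protein", "kinase", "inhibitor"])
article_c = fingerprint(["election", "boycott"])

print(round(cosine(article_a, article_b), 3), cosine(article_a, article_c))
```

Because fingerprints are computed once per article at indexing time, the search engine only has to compare small vectors at query time.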

Agenda

1. Introduction to Text Mining

2. Text Mining for Information Consumers

3. Text Mining for Information Producers

4. Moving forward >> Text Mining Web Services

5. Summary and Q&A


Market Expectations

On-the-fly annotation services
• Federated platform (web2.0/3.0)
• Serving all user/IT tools (browser, office, search, content management, …)

Text Mining Anywhere
• Highly scalable
• Anytime (24/7)
• Any documents
• Any languages (US, European, Asian, Arabic, …)


Market Expectations

On-the-fly annotation services
High-quality & accuracy
• Generic entities (people, company, …)
• Market-specific entities (drug, patient, court cases, …)
• Generic facts (acquisition, announcements, events, …)
• Market-specific facts (binding, activation, lawsuit, …)
• Disambiguation (Orange! Telco company? Location? Fruit?)
• Normalization (IBM Corp = IBM = I.B.M)
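Normalization and disambiguation, the last two items on this slide, can be illustrated with two small functions. Both are deliberately simple sketches: the suffix list in `normalize_company` and the cue words in `SENSES` are hypothetical, and production systems rely on much richer context models.

```python
import re

def normalize_company(name):
    """Reduce surface variants to one key: drop dots and legal suffixes."""
    key = name.replace(".", "").strip()
    key = re.sub(r"\s+(Corp|Corporation|Inc|Ltd)$", "", key, flags=re.IGNORECASE)
    return key.upper()

# Hypothetical context cues for each sense of an ambiguous name.
SENSES = {
    "Orange": {
        "Company": {"telco", "mobile", "subscribers", "network"},
        "Location": {"county", "city", "california"},
        "Fruit": {"juice", "peel", "citrus"},
    }
}

def disambiguate(name, context):
    """Pick the sense whose cue words overlap most with the context."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES[name].items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

variants = ["IBM Corp", "IBM", "I.B.M"]
print({v: normalize_company(v) for v in variants})
print(disambiguate("Orange", "Orange added mobile subscribers on its network"))
```

Returning `None` when no cue matches keeps the annotator from guessing, which matters when accuracy is the stated priority.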


Market Expectations

On-the-fly annotation services
High-quality & accuracy
More than just annotations
• Content enrichment with additional data
  GPS coordinates for locations, chemical structure for drugs, …
• Information linking
  Content is about hyper-linking
• Semantic mash-up
  Wikipedia for named entities (people, location, events, …)
  Google maps for geolocation
  Patents database for scientific literature
  …


Text Mining Web Services

[Diagram: clients send documents to three kinds of web services, each backed by Text Mining Services]
• Content Annotation Web Services: receive annotated documents
• Content Enrichment Web Services: receive annotated, enriched documents
• Content Hyper-linking Web Services: receive annotated, enriched & linked documents
• Data behind the services: Persistent Content Repositories, Customer Data, Public Data


TEMIS – Luxid® Web Services

[Architecture diagram, hosted by TEMIS, Inc.]
• Production Environment and Back-up Environment, each containing:
  • Workflow Engine (Data Source Reader, Data Output Generator, …)
  • Luxid® Information Mart
  • Luxid® Annotation Factory, exposed as an API/Web service, running Annotation Plans 1 to N, each built from Skill Cartridges
  • Admin
• Access channels: HTTPS web services, HTTPS browser, secured FTP
• On-Demand Annotation: triggered by manual intervention
• On-the-Fly Annotation: triggered by automatic call
• Luxid® Knowledge Studio and Luxid® Administration Console (remote administration, monitoring & administration):
  • Create/Update Annotation Plans
  • Create/Update Annotation Workflows
  • Create/Update Skill Cartridges™
  • Create/Update Classification Plans
  • Install/Upload Skill Cartridges™


TEMIS – Company Background

TEMIS = TExt MIning Solutions
• Software company created in 2000
• Dual Headquarters in Philadelphia & Paris
• Acquisition of Xerox Linguistics (20 years of R&D)

Leader in Publishing and Life Sciences Text Mining
• Over 200 clients in Pharma and B-to-B publishing
• Founding member of UIMA’s OASIS committee

Flagship software product
• Top-20 most innovative products across Europe

Enable organizations to better interact with their environment by extracting knowledge and making sense of content

Agenda

1. Introduction to Text Mining

2. Text Mining empowers Search Engines

3. Text Mining for Publishers

4. Moving forward – Text Mining Web Services

5. Summary and Q&A


Summary

Content Enrichment is critical
• For End-Users
• For Publishers
• For any information consumers and producers


Summary

Content Enrichment is critical
More than Content Enrichment is expected
• Content is about linking (Hyper-linking)
• Semantic mash-up


Summary

Content Enrichment is critical
More than content enrichment is expected
Text-Mining plays an important role
• Proven technology
• Key component in information access technology stack
• Wide range of services (from basic tagging to semantic linking)


Summary

Content Enrichment is critical
More than content enrichment is expected
Text-Mining plays an important role
Key business benefits
• Reduce cost of creating information products
• Increase revenue & maximize content monetization


Summary

Content Enrichment is critical
More than content enrichment is expected
Text-Mining plays an important role
Key business benefits
Immediate impacts
• Improve editorial team satisfaction & productivity
• Enhance product quality and consistency
• Increase customer experience & loyalty

Questions? Thank you!
