Snapshot of Semantic Web Commercial State of the Art (presented at Science on the Semantic Web, Rutgers, October 2002) Amit Sheth CTO, Semagix Inc. Large Scale Distributed Information Systems (LSDIS) Lab University Of Georgia; http://lsdis.cs.uga.edu October 24, 2002 © Amit Sheth Based on Keynote CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002


Page 1

Snapshot of Semantic Web Commercial State of the Art

(presented at Science on the Semantic Web, Rutgers, October 2002)

Amit Sheth

CTO, Semagix Inc. Large Scale Distributed Information Systems (LSDIS) Lab

University Of Georgia; http://lsdis.cs.uga.edu

October 24, 2002 © Amit Sheth

Based on Keynote

CONTENT- AND SEMANTIC-BASED INFORMATION RETRIEVAL @ SCI 2002

Page 2

I am not selling any product here.

It is interesting to note that SW = Software has moved to SW = Semantic Web.

Page 3

Fundamental Issue

• Ontology creation and maintenance
  – Human consensus + automatic KB (assertion) extraction
• Automatic semantic annotation
• Extremely fast computations exploiting semantic metadata
  – Especially named relationships

Page 4

Central Role of Metadata

• Produce / Aggregate: Where is the content? Whose is it?
• Catalog / Index: What is this content about?
• Integrate / Syndicate: What other content is it related to?
• Personalize: What is the right content for this user?
• Interactive Marketing: What is the best way to monetize this interaction?

Broadcast, Wireline, Wireless, Interactive TV

Diagram labels: Semantic Metadata, Applications, Back End

"A Web content repository without metadata is like a library without an index." - Jack Jia, IWOV

"Metadata increases content value in each step of content value chain." - Amit Sheth

Page 5

A Metadata Classification

Data (Heterogeneous Types/Media)

Content Independent Metadata (creation-date, location, type-of-sensor...)

Content Dependent Metadata (size, max colors, rows, columns...)

Direct Content Based Metadata (inverted lists, document vectors, LSI)

Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure...)

Domain Specific Metadata (area, population (Census); land-cover, relief (GIS); metadata concept descriptions from ontologies)

Ontologies, Classifications, Domain Models

User

More semantics for relevance to tackle information overload!!

Page 6

Semantic Metadata Extraction, Semantic Annotation

Sources feeding the METADATA EXTRACTORS:

• WWW, Enterprise Repositories
• Nexis, UPI, AP
• Feeds/Documents
• Digital Maps
• Digital Audios
• Data Stores
• Digital Videos
• Digital Images
• . . .

Key challenge: Create/extract as much (semantic) metadata automatically as possible

Page 7

Page 8

Semantic Content Organization and Retrieval Engine (SCORE) technology

• Automatically aggregates and extracts information from disparate sources and multiple formats
• Automatically tags/annotates and categorizes content
• Automatically creates relevant associations
  - Maps content topics and their relationships
• Semantic query engine relates information and knowledge both internal and external to the organization into a single view

Page 9

Semagix Freedom Product Components

Page 10

Knowledge Sources Used

Market Guide (MG)
ZDNet (ZD)
Hoover’s (H)
Data supplied from NASA (DPL)
Federation of American Scientists (FAS)
Central Intelligence Agency (CIA)
The Interdisciplinary Center (ICT)
Federal Bureau of Investigation (FBI)
Capital Advantage (CA)
Office of Foreign Assets Control (OFAC)

Entity Classes and Relationships populated by these knowledge sources:

PERSON (OFAC, FBI, DPL)
  -politician (OFAC, FBI, CIA, CA)
    politician associated with politicalOrganization
    politician held politicalOffice
    politician associated with politicalOffice
  -terrorist (OFAC, FBI, DPL)
    terrorist memberOf organization
    terrorist appears on watchList
  -companyExecutive (MG)
    companyExecutive holdsOffice companyPosition
  person has permanent address address (OFAC, FBI)
  person has dob (date of birth) (OFAC, FBI)
  person has pob (place of birth) (OFAC, FBI)

THING
  -event (ICT)
    terroristOrganization participated in terroristSponsoredEvent (ICT)
  -politicalOffice (CIA, CA)
    politicalOffice office(s) within govtOrganization
    politicalOffice associated with organization
  -watchList (OFAC, FBI, DPL)
    terroristOrganization appears on watchList (OFAC, FBI, DPL)
  -organization (OFAC, FBI, FAS, ICT, CA, CIA)
    organization appears on watchList
    organization memberOf suborganization
  -company
    company manufactures product (ZD)
    company identifiedBy tickerSymbol (H)
    companyPosition position in company (MG)
    company memberOf industry (H)
  -tickerSymbol (H)
    tickerSymbol memberOf exchange (H)

PLACE
  -organization located in place (H, OFAC)
  -religiousAffiliation practiced in place (CIA)
  -company headquarters in city (H)
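The class and relationship listing above can be read as a schema of named relationships, each annotated with the knowledge sources that populate it. The sketch below is one minimal way to hold such a schema in code and ask which relationships a given source feeds. The `Relationship` type, the `SCHEMA` list and the `relationships_from` helper are illustrative names, not part of any Semagix product; only relationships that carry an explicit source annotation on the slide are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One named relationship between two entity classes, with provenance."""
    subject: str
    name: str
    obj: str
    sources: frozenset  # knowledge-source abbreviations from the slide


# Relationships transcribed from the slide (only those with explicit sources).
SCHEMA = [
    Relationship("terroristOrganization", "participated in",
                 "terroristSponsoredEvent", frozenset({"ICT"})),
    Relationship("terroristOrganization", "appears on", "watchList",
                 frozenset({"OFAC", "FBI", "DPL"})),
    Relationship("company", "manufactures", "product", frozenset({"ZD"})),
    Relationship("company", "identifiedBy", "tickerSymbol", frozenset({"H"})),
    Relationship("company", "memberOf", "industry", frozenset({"H"})),
    Relationship("tickerSymbol", "memberOf", "exchange", frozenset({"H"})),
    Relationship("organization", "located in", "place",
                 frozenset({"H", "OFAC"})),
    Relationship("company", "headquarters in", "city", frozenset({"H"})),
]


def relationships_from(source, schema=SCHEMA):
    """Return the relationships populated (at least in part) by `source`."""
    return [r for r in schema if source in r.sources]


if __name__ == "__main__":
    for rel in relationships_from("H"):
        print(f"{rel.subject} {rel.name} {rel.obj}")
```

Keeping the source set on each relationship makes it straightforward to see what would be lost if one knowledge source were dropped.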

JIVA

Page 11

Automatic Categorization & Metadata Tagging (unstructured text)

Video with Editorialized Text on the Web

Auto Categorization

Semantic Metadata

Page 12

Extraction Agent

Enhanced Metadata Asset

Semantic Metadata Extraction/Annotation: Semi-structured source

Web Page

Page 13

Semantic Metadata

Syntax Metadata

Semantic Content Enhancement Workflow

Page 14

Content Asset Index Evolution

Enabling powerful linking of actionable information and facilitating important semantic applications such as knowledge discovery and link analysis

(user’s task of manually retrieving all the information he needs to know is greatly minimized; he can spend more time making effective decisions)

Classification (Classification Committee: Knowledge-base, Machine Learning & Statistical Techniques)

Content Tags
  Semantic Metadata
    Classification: Channel Partners, E-Business Solutions

Semantic Metadata Extraction (also syntactic)

Content Tags
  Semantic Metadata
    Classification: Channel Partners, E-Business Solutions
    Company: Cisco Systems, Inc.
  Syntactic Metadata
    Producer: BusinessWire
    Source: Bloomberg
    Date: Sept. 10 2001
    Location: San Jose, CA
    URL: http://bloomberg.com/1.htm
    Media: Text

Semantic Enhancement (uniquely exploiting real-world semantic associations in the right context)

Content Tags
  Semantic Metadata
    Company: Cisco Systems, Inc.
    Classification: Channel Partners, E-Business Solutions
    Channel Partner: Siemens Network
    Channel Partner: Voyager Network
    Channel Partner: Wipro Group
    E-Business Solution: CIS-1270 Security
    E-Business Solution: CIS-320 Learning
    E-Business Solution: CIS-6250 Finance
    E-Business Solution: CIS-1005 e-Market
    Ticker: CSCO
    Industry: Telecommunication, . . .
    Sector: Computer Hardware
    Executive: John Chambers
    Competition: Nortel Networks
  Syntactic Metadata
    Producer: BusinessWire
    Source: Bloomberg
    Date: Sept. 10 2001
    Location: San Jose, CA
    URL: http://bloomberg.com/1.htm
    Media: Text

XML content item with enriched semantic tagging, ready to be queried

E-Business Solution Ontology (diagram)
  Entities: Cisco Systems; Voyager Network, Siemens Network, Wipro Group, Ulysys Group; CIS-1270 Security, CIS-320 Learning, CIS-6250 Finance, CIS-1005 e-Market
  Node types: Channel Partner, Ticker, Industry, Competition, Executives, Sector
  Relationship labels: belongs to, represented by, channel partner of, competes with, provider of, works for
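The slide describes the end result as an "XML content item with enriched semantic tagging, ready to be queried" but does not show the XML itself. The sketch below is one plausible shape for such an item, built with Python's standard `xml.etree.ElementTree`. The element names (`contentItem`, `syntacticMetadata`, `semanticMetadata`, `tag`) and the helper functions are assumptions made for illustration; the tag names and values are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

# Tag values transcribed from the slide.
SYNTACTIC = {
    "Producer": "BusinessWire",
    "Source": "Bloomberg",
    "Date": "Sept. 10 2001",
    "Location": "San Jose, CA",
    "URL": "http://bloomberg.com/1.htm",
    "Media": "Text",
}
SEMANTIC = [
    ("Company", "Cisco Systems, Inc."),
    ("Channel Partner", "Siemens Network"),
    ("Channel Partner", "Voyager Network"),
    ("Channel Partner", "Wipro Group"),
    ("Ticker", "CSCO"),
    ("Sector", "Computer Hardware"),
    ("Executive", "John Chambers"),
    ("Competition", "Nortel Networks"),
]


def build_content_item(syntactic, semantic):
    """Build a content item; the element layout is hypothetical."""
    item = ET.Element("contentItem")
    syn = ET.SubElement(item, "syntacticMetadata")
    for name, value in syntactic.items():
        ET.SubElement(syn, "tag", name=name).text = value
    sem = ET.SubElement(item, "semanticMetadata")
    for name, value in semantic:
        ET.SubElement(sem, "tag", name=name).text = value
    return item


def semantic_values(item, name):
    """Return every semantic tag value recorded under `name`."""
    path = f"./semanticMetadata/tag[@name='{name}']"
    return [tag.text for tag in item.findall(path)]


item = build_content_item(SYNTACTIC, SEMANTIC)
channel_partners = semantic_values(item, "Channel Partner")

if __name__ == "__main__":
    print(ET.tostring(item, encoding="unicode"))
    print(channel_partners)
```

Because semantic tags such as Channel Partner can repeat, they are kept as a list of pairs rather than a dictionary.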

Page 15

Semantic Application Example – Analyst Workbench

• Focused relevant content organized by topic (semantic categorization)
• Automatic content aggregation from multiple content providers and feeds
• Related relevant content not explicitly asked for (semantic associations)
• Competitive research inferred automatically
• Automatic 3rd party content integration

Page 16

Semantic Web – Intelligent Content

Diagram nodes: COMPANY, Related Stock News, Industry News, Technology Products, Competition, SEC, EPA, Regulations

Association labels:

• COMPANIES in Same or Related INDUSTRY
• COMPANIES in INDUSTRY with Competing PRODUCTS
• Impacting INDUSTRY or Filed By COMPANY
• Important to INDUSTRY or COMPANY

Intelligent Content = What You Asked for + What you need to know!

Page 17

Syntax Metadata

Semantic Metadata

led by

Same entity

Human-assisted inference

Knowledge-based & Manual Associations

Page 18

Blended Semantic Browsing and Querying (Intelligence Analyst Workbench)

Page 19

Innovations that affect User Experience

• BSBQ: Blended Semantic Browsing and Querying
  – Ability to query and browse relevant desired content in a highly contextual manner
• Seamless access/processing of Content, Metadata and Knowledge
  – Ability to retrieve relevant content, view related metadata, access relevant knowledge and switch between all the above, allowing user to follow his train of thought
• dACE: dynamic Automatic Content Enhancement
  – Ability to provide enhanced annotation features, allowing the user to retrieve relevant knowledge about significant pieces of content during content consumption
• Semantic Engine APIs with XML output
  – Ability to create customized APIs for the Semantic Engine involving Semantic Associations with XML output to cater to any user application

Page 20

Diagram labels:

• Visionics, AcSys, Security Portal
• Check-in, Interrogation, Boarding Gate, Airport, Airspace
• Semagix Ontology, Metabase
• Threat Scoring
• Gov’t Watchlists, News Media, Web Info
• LexisNexis, RiskWise
• Passenger Records, Reservation Data
• Airline Data, Airport Data
• Airline and Airport Data
• Future and Current Risks
• Airport LEO
• ARC AvSec Manager
• Data Management, Data Mining
• IPG

Page 21

Sources Used

Knowledge Sources:

FBI - Most Wanted Terrorists

Denied Persons Lists

Terrorism Files

ICT

Office of Foreign Asset Control (OFAC)

Hamas terrorists

CNN Locations

FAA_Airport_Codes

About.com

Comtex_International

Hindustan Times

JerusalemPost

CNN

Newstrove_Hamas

Content Sources:

Africa News Service

AFX News – Asia/UK/Europe

AP Worldstream

Asia Pulse

BusinessWire

ComputerWire (CTW)

EFE News Services

FWN Select

Itar-TASS

Knight Ridder News (Open)

Knight-Ridder Open

M2 - International

M2 Airline Industry Information

New World Publishing

PR Newswire

PRLine (PRL)

Resource News International

RosBusiness

United Press International

UPI Spotlights

Page 22

Interrogation Kiosk – Unique Advantages of Semagix

Semagix’s Semantic Technology enables flight authorities to:

- take a quick look at the passenger’s history
- check quickly if the passenger is on any official watchlist
- interpret and understand passenger’s links to other organizations (possibly terrorist)
- verify if the passenger has boarded the flight from a “high risk” region
- verify if the passenger originally belongs to a “high risk” region
- check if the passenger’s name has been mentioned in any news article along with the name of a known bad guy

SmithJohn

Page 23

SmithJohn

Threat Score Components

LEXIS NEXIS ANNOTATION

Action: Information about or related to the passenger returned by Lexis Nexis is enhanced by linking important entities to Semagix’s rich ontology

Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text and further automatically correlate them with other data in the ontology to present a clear picture of the passenger to the flight official

LINK ANALYSIS

Flight Country Check    45    0.15
Person Country Check    25    0.15
Nested Organizations Check    75    0.8
Aggregate Link Analysis Score: 17.7

Action: Semantic analysis of the various components (watchlist, Lexis Nexis, ontology search, metabase search, etc.) to come up with an aggregate threat score for the passenger

Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, recognize entities in a piece of text, automatically correlate them with other data in the ontology, and search for relevant content to present an overall idea of the threat level of the passenger, allowing the flight official to take quick action

appearsOn watchList: FBI

ONTOLOGY SEARCH

Action: Semagix’s rich ontology is searched for this name, and associated information such as position, aliases, and relationships (past or present) of this name to other organizations, watchlists, country, etc. is retrieved

Ability Proven: Ability to automatically aggregate relevant rich domain knowledge about a passenger and automatically correlate it with other data in the ontology to present a visual association picture to the flight official

METABASE SEARCH

Action: Semagix’s rich metabase is searched for this name and associated content stories mentioning the passenger’s name are retrieved

Ability Proven: Ability to automatically aggregate and retrieve relevant content stories, field reports, etc. about the passenger that can be used by flight officials to determine if the passenger has any connections with known bad people or organizations

WATCHLIST ANALYSIS

Action: Semagix’s rich ontology is automatically searched for the possible appearance of this name on any of the watchlists

Ability Proven: Ability to automatically aggregate relevant rich domain knowledge, automatically correlate it, and rank the threat factors to indicate the threat level of the passenger on the watchlist front
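The watchlist and link-analysis components above amount to following named relationships outward from a passenger: direct watchlist membership first, then the passenger's organizations, then organizations nested inside those. The sketch below illustrates that idea as a breadth-first traversal over a toy in-memory graph. The entity identifiers, the `MEMBER_OF` and `APPEARS_ON` tables and the `watchlist_hits` function are all hypothetical; the slides do not describe how the Semagix engine implements this.

```python
from collections import deque

# Hypothetical toy data: memberOf edges and watchlist appearances.
MEMBER_OF = {
    "person:P1": ["org:A"],
    "org:A": ["org:B"],
    "org:B": [],
}
APPEARS_ON = {
    "org:B": ["watchlist:W1"],
}


def watchlist_hits(entity, max_depth=3):
    """Breadth-first search over memberOf edges starting at `entity`.

    Returns (node, watchlist, depth) tuples, where depth 0 is a direct
    match, 1 is a match through the entity's own organization, and 2 or
    more is a match through a nested organization.
    """
    hits = []
    seen = {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        for watchlist in APPEARS_ON.get(node, []):
            hits.append((node, watchlist, depth))
        if depth < max_depth:
            for nxt in MEMBER_OF.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return hits


if __name__ == "__main__":
    print(watchlist_hits("person:P1"))
```

The `seen` set guards against cycles in the membership graph, and `max_depth` bounds how far nested organizations are followed.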

Page 24

What it will take RDBMS to support flight security application

Link Analysis Component | # Queries (Voquette) | # Queries (RDBMS) | Time (Voquette) | Time (RDBMS)

Direct Watchlist Match (person name)
  lookup person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve person's relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec

Organization Watchlist Match (person name, organization name)
  lookup person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve person's relationships to organizations | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  look up organization entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec

Nested Organization Watchlist Match (person name, organization name)
  look up organization entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve the organization's relationships to organizations | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  retrieve the organizations' relationships to watchlists | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec

Flight Origin (country name)
  retrieve country entity | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  see if country is on a list containing "high-risk" countries | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec

Person Origin (person name)
  lookup person entity | 1 CACS Request | 5-10 SQL Queries | .05 sec | 5-10 sec
  retrieve person's home country | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  retrieve the organization's relationships to lists containing "high-risk" countries | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec

Field Report Search (person name)
  perform SSE query for field reports that mention this person | 1 SSE Request | 2 SQL Queries | .03 sec | 5-30 sec
  retrieve a list of people associated with these field reports | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec
  determine which people are on watchlists, terrorists, etc… | 1 SQL Query | 1 SQL Query | .005 sec | .005 sec

Total | 18 requests | 39-64 SQL Queries | .33 sec | 30-80 sec
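The totals row can be cross-checked from the per-step figures. The table has three kinds of step: five entity lookups (CACS requests), one field-report search (SSE request), and twelve plain SQL queries. The short script below sums them; the grouping into `STEP_KINDS` and the function name `totals` are conveniences introduced here, while the numbers are taken from the table. The per-step times sum to .34 sec on the Voquette side, against the .33 sec shown on the slide, so the slide's total is presumably rounded or measured separately.

```python
# Per-step figures from the comparison table, grouped by kind of step.
# q = RDBMS query count (low, high); slow = RDBMS time in seconds (low, high).
STEP_KINDS = {
    "CACS entity lookup": dict(count=5, fast=0.05, q=(5, 10), slow=(5.0, 10.0)),
    "SSE field-report query": dict(count=1, fast=0.03, q=(2, 2), slow=(5.0, 30.0)),
    "plain SQL query": dict(count=12, fast=0.005, q=(1, 1), slow=(0.005, 0.005)),
}


def totals(kinds):
    """Sum request counts, query counts and times across all step kinds."""
    steps = kinds.values()
    return {
        "requests": sum(k["count"] for k in steps),
        "voquette_sec": round(sum(k["count"] * k["fast"] for k in steps), 2),
        "rdbms_queries": (
            sum(k["count"] * k["q"][0] for k in steps),
            sum(k["count"] * k["q"][1] for k in steps),
        ),
        "rdbms_sec": (
            round(sum(k["count"] * k["slow"][0] for k in steps), 2),
            round(sum(k["count"] * k["slow"][1] for k in steps), 2),
        ),
    }


summary = totals(STEP_KINDS)

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Nearly all of the RDBMS time comes from the six non-trivial steps (entity lookups and the field-report search); the twelve plain SQL queries contribute only about .06 sec on either side.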

Query Comparison: Semagix vs. RDBMS

Page 25

Performance

Population/update rate in an ontology with 1 million entities/relationships: > 10,000 entities/relationships per hr.

Incremental index update frequency: 1 minute (near real-time)

Query response time (64 concurrent users): 65 ms

Query response time (light load): 1 - 10 ms

Queries per server per hour: > 1,980,000
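As a rough sanity check on these figures, the hourly query rate can be converted to a per-second rate and compared with what 64 concurrent users at 65 ms per query could generate. The second line assumes each user issues a new query as soon as the previous one returns (no think time), which the slide does not state.

```latex
\[
\frac{1{,}980{,}000\ \text{queries/hour}}{3600\ \text{s/hour}} = 550\ \text{queries/s}
\]
\[
\frac{64\ \text{concurrent users}}{0.065\ \text{s per query}} \approx 985\ \text{queries/s}
\]
```

Under that assumption the 64-user figure implies a ceiling of roughly 985 queries per second, so the stated "> 1,980,000 per hour" (550 per second) is consistent with the response-time numbers.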

Page 26

More at www.semagix.com and http://lsdis.cs.uga.edu/lib/presentations.html