Integrating Language Understanding agents
into the Semantic Web
Akshay Java, Tim Finin, Sergei Nirenburg 11/04/2005
Outline
• Motivation: Language Understanding Agents
• Ontological Semantics
• Bridging the Knowledge Gap
• Preliminary Evaluation
• SemNews: An Application Testbed
• Conclusion
• Q&A
Motivation
• Intelligent agents need knowledge and information.
• The majority of content on the web remains in natural language (NL) text.
• The Semantic Web (SW) can benefit NLP tools in their language understanding task.
[Figure: the WWW as a web of documents (text, images, audio, video; natural language) and a web of data (ontologies, instances, triples; RDF/OWL), connected through NLP tools and the Semantic Web, with links labelled "Facts from NL" and "structured information".]
Motivation
Language Understanding Agents
Provides RDF version of the news.
Ontological Semantics
OntoSem is a natural language processing system that processes text and converts it into facts.
It is supported by a constructed world model encoded in a rich ontology.
Ontological Semantics
[Figure: OntoSem architecture. Input text passes through the preprocessor, the syntactic analyzer and the semantic analyzer to yield a Text Meaning Representation (TMR). Static knowledge resources: grammar (ecology, morphology, syntax), lexicon and onomasticon, ontology and fact repository.]
Mapping OntoSem to web based KR
• The OntoSem ontology is a frame-based representation:
  ONTOLOGY ::= CONCEPT+
  CONCEPT ::= ROOT | OBJECT-OR-EVENT | PROPERTY
  SLOT ::= PROPERTY | FACET | FILLER
• Translating the OntoSem ontology means mapping its semantics into a corresponding OWL representation.
• OntoSem's supporting fact repositories are also mapped to OWL.
• The text meaning representation of each sentence is then converted to OWL.
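The frame structure above can be sketched as a small data structure. This is only an illustration, assuming each slot is stored as property, then facet, then a list of fillers; the `Frame` class and its method names are hypothetical and are not part of OntoSem.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One concept frame, stored as property -> facet -> list of fillers."""
    name: str
    slots: dict = field(default_factory=dict)

    def add(self, prop, facet, *fillers):
        # A slot pairs a PROPERTY with a FACET and one or more FILLERs.
        self.slots.setdefault(prop, {}).setdefault(facet, []).extend(fillers)

    def fillers(self, prop, facet):
        # Return an empty list when the property or facet is absent.
        return self.slots.get(prop, {}).get(facet, [])


# The "patent" concept used in the class-mapping slides.
patent = Frame("patent")
patent.add("is-a", "value", "intangible-asset", "legal-right")
patent.add("definition", "value",
           "the exclusive right to make, use or sell an invention, "
           "which is granted to the inventor")

print(patent.fillers("is-a", "value"))
```

Keeping facets as an explicit level makes it easy to treat VALUE, SEM and the other facets differently during translation.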
Mapping OntoSem to web based KR
[Figure: NL text is processed by OntoSem, using its lexicon, ontology and fact repository, to produce TMRs. OntoSem2OWL translates the ontology into an OWL ontology and the TMRs into TMRs in OWL.]
Mapping Rules for Classes
OntoSem LISP version:
(make-frame patent
(definition (value (common "the exclusive right to make, use or sell an invention, which is granted to the inventor")))
(is-a (value (common intangible-asset legal-right))))
OWL Version:
<owl:Class rdf:about="&ontosem;patent">
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;intangible-asset">
    </owl:Class>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Class rdf:about="&ontosem;legal-right">
    </owl:Class>
  </rdfs:subClassOf>
  <rdfs:comment>the exclusive right to make, use or sell an invention, which is granted to the inventor</rdfs:comment>
</owl:Class>
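The class rule can be sketched as a short function. This is a minimal illustration and not the OntoSem2OWL tool itself (which the slides say was built with the Jena API); the function name `frame_to_owl_class` is hypothetical, and it assumes the definition is emitted as an `rdfs:comment`.

```python
from xml.sax.saxutils import escape


def frame_to_owl_class(name, parents, definition):
    """Render one concept frame as an OWL class in RDF/XML.

    Each is-a filler becomes an rdfs:subClassOf link; the definition
    becomes an rdfs:comment (an assumption for this sketch).
    """
    lines = [f'<owl:Class rdf:about="&ontosem;{name}">']
    for parent in parents:
        lines.append("  <rdfs:subClassOf>")
        lines.append(f'    <owl:Class rdf:about="&ontosem;{parent}"/>')
        lines.append("  </rdfs:subClassOf>")
    # Escape &, < and > so the definition text stays well-formed XML.
    lines.append(f"  <rdfs:comment>{escape(definition)}</rdfs:comment>")
    lines.append("</owl:Class>")
    return "\n".join(lines)


owl_text = frame_to_owl_class(
    "patent",
    ["intangible-asset", "legal-right"],
    "the exclusive right to make, use or sell an invention, "
    "which is granted to the inventor",
)
print(owl_text)
```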
Mapping Rules for Properties
• Properties can be
  • ObjectProperty, mapped to owl:ObjectProperty
  • Datatype Property, mapped to owl:DatatypeProperty
• Property hierarchy is defined by rdfs:subPropertyOf
• Domain maps to rdfs:domain
• Range maps to rdfs:range
• Restrictions are handled using owl:Restriction
• Numeric datatypes are handled using XSD
Mapping Rules for Properties…
(make-frame controls
  (domain (sem (common physical-event physical-object social-event social-role)))
  (range (sem (common actualize artifact natural-object social-role)))
  (is-a (value (common relation)))
  (inverse (value (common controlled-by)))
  (definition (value (common "A relation which relates concepts to what they can control"))))
Mapping Rules for Properties…
<owl:ObjectProperty rdf:ID="controls">
  <rdfs:domain>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#physical-event"/>
        <owl:Class rdf:about="#physical-object"/>
        <owl:Class rdf:about="#social-event"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:domain>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#actualize"/>
        <owl:Class rdf:about="#artifact"/>
        <owl:Class rdf:about="#natural-object"/>
        <owl:Class rdf:about="#social-role"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:subPropertyOf>
    <owl:ObjectProperty rdf:about="#relation"/>
  </rdfs:subPropertyOf>
  <owl:inverseOf rdf:resource="#controlled-by"/>
  <rdfs:label> "A relation which relates concepts to what they can control" </rdfs:label>
</owl:ObjectProperty>
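The property rule can be sketched with Python's standard `xml.etree.ElementTree`. This is a hedged illustration, not the Jena-based translator: the helper names `q`, `class_union` and `property_to_owl` are hypothetical, and the sketch covers only domain, range, is-a and inverse.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def q(ns, local):
    """Build a namespace-qualified name in ElementTree's {uri}local form."""
    return f"{{{ns}}}{local}"


def class_union(parent, tag, concepts):
    """Attach <tag><owl:Class><owl:unionOf> with one owl:Class per concept."""
    wrapper = ET.SubElement(ET.SubElement(parent, tag), q(OWL, "Class"))
    union = ET.SubElement(wrapper, q(OWL, "unionOf"),
                          {q(RDF, "parseType"): "Collection"})
    for concept in concepts:
        ET.SubElement(union, q(OWL, "Class"), {q(RDF, "about"): f"#{concept}"})


def property_to_owl(name, domain, range_, parent, inverse):
    """Map domain, range, is-a and inverse slots of a property frame to OWL."""
    prop = ET.Element(q(OWL, "ObjectProperty"), {q(RDF, "ID"): name})
    class_union(prop, q(RDFS, "domain"), domain)
    class_union(prop, q(RDFS, "range"), range_)
    sub = ET.SubElement(prop, q(RDFS, "subPropertyOf"))
    ET.SubElement(sub, q(OWL, "ObjectProperty"), {q(RDF, "about"): f"#{parent}"})
    ET.SubElement(prop, q(OWL, "inverseOf"), {q(RDF, "resource"): f"#{inverse}"})
    return prop


controls = property_to_owl(
    "controls",
    ["physical-event", "physical-object", "social-event", "social-role"],
    ["actualize", "artifact", "natural-object", "social-role"],
    "relation",
    "controlled-by",
)
xml_text = ET.tostring(controls, encoding="unicode")
print(xml_text)
```

Multiple domain or range fillers go into an `owl:unionOf` because several separate `rdfs:domain` statements would be read as an intersection rather than a choice.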
Mapping Rules for Facets
Facets are a way of restricting the fillers that can be used for a particular slot.
• SEM and VALUE
  • Mapped using owl:Restriction on a particular property.
• RELAXABLE-TO
  • Added to the classes present in the owl:Restriction, with this information recorded in an annotation.
• DEFAULT
  • Not handled: there is no clear way to represent non-monotonic reasoning and closed world assumptions in the Semantic Web.
• DEFAULT-MEASURE
  • Similar to the DEFAULT facet; not handled.
  • DEFAULT and DEFAULT-MEASURE are used relatively infrequently.
• NOT
  • Can be handled using owl:disjointWith.
• INV
  • Need not be handled, since the inverse slot is already mapped to owl:inverseOf.
Mapping Rules: Property Related Constructs
   Case                     Frequency   Mapped Using
1  domain                   617         rdfs:domain
2  domain with not facet    16          owl:disjointWith
3  range                    406         rdfs:range
4  range with not facet     5           owl:disjointWith
5  inverse                  260         owl:inverseOf
Mapping Rules: Facet Related Constructs
   Case               Frequency   Mapped Using
1  value              18217       owl:Restriction
2  sem                5686        owl:Restriction
3  relaxable-to       95          annotation
4  default            350         Not handled
5  default-measure    612         Not handled
6  not                134         owl:disjointWith
7  inv                1941        Not required
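As a quick check on how much is lost by skipping DEFAULT and DEFAULT-MEASURE, the frequencies in the facet table can be totalled. The sketch below only does arithmetic on the numbers from the slide; the dictionary layout is an illustrative choice.

```python
# Facet frequencies and mappings copied from the facet table above.
# None marks facets that the translation does not handle.
facets = {
    "value": (18217, "owl:Restriction"),
    "sem": (5686, "owl:Restriction"),
    "relaxable-to": (95, "annotation"),
    "default": (350, None),
    "default-measure": (612, None),
    "not": (134, "owl:disjointWith"),
    "inv": (1941, "not required"),
}

total = sum(count for count, _ in facets.values())
unhandled = sum(count for count, mapping in facets.values() if mapping is None)
share = unhandled / total

print(f"{unhandled} of {total} facet occurrences ({share:.1%}) are not handled")
# prints "962 of 27035 facet occurrences (3.6%) are not handled"
```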
Translating TMR2OWL
Translating TMRs involves instantiating the concepts mapped in OWL.
Example:
(COME-1740
  (TIME (VALUE (COMMON (FIND-ANCHOR-TIME))))
  (DESTINATION (VALUE (COMMON CITY-1740)))
  (AGENT (VALUE (COMMON POLITICIAN-1740)))
  (ROOT-WORDS (VALUE (COMMON (ARRIVE))))
  (WORD-NUM (VALUE (COMMON 2)))
  (INSTANCE-OF (VALUE (COMMON COME))))
OWL version:
<ontosem:come rdf:about="COME-1740">
  <ontosem:destination rdf:resource="#CITY-1740"/>
  <ontosem:agent rdf:resource="#POLITICIAN-1740"/>
</ontosem:come>
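The instantiation step can be sketched as follows. This is an illustration under stated assumptions: the function name `tmr_to_rdf` is hypothetical, and the set of skipped slots is inferred from the slide's output, which keeps only DESTINATION and AGENT.

```python
from xml.sax.saxutils import quoteattr

# Slots left out of the RDF output. The slide's OWL example omits them,
# but the exact list used by the real translator is an assumption here.
SKIPPED = {"TIME", "ROOT-WORDS", "WORD-NUM", "INSTANCE-OF"}


def tmr_to_rdf(instance_id, concept, slots):
    """Render one TMR frame as an instance of its OWL class."""
    tag = f"ontosem:{concept.lower()}"
    lines = [f"<{tag} rdf:about={quoteattr(instance_id)}>"]
    for slot, filler in slots.items():
        if slot in SKIPPED:
            continue
        # Each remaining slot points at another TMR instance.
        lines.append(
            f"  <ontosem:{slot.lower()} rdf:resource={quoteattr('#' + filler)}/>"
        )
    lines.append(f"</{tag}>")
    return "\n".join(lines)


come = {
    "TIME": "FIND-ANCHOR-TIME",
    "DESTINATION": "CITY-1740",
    "AGENT": "POLITICIAN-1740",
    "ROOT-WORDS": "ARRIVE",
    "WORD-NUM": "2",
    "INSTANCE-OF": "COME",
}
rdf_text = tmr_to_rdf("COME-1740", come["INSTANCE-OF"], come)
print(rdf_text)
```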
Evaluation
Validation tools: http://w3c.org/RDF/Validator/, Swoop, Pellet, Wonderweb
• Built ontology translation tool using the Jena API
• Total triples generated: ~102189 (including bnodes)
• Time to build the model: ~10-40 sec
• Time to do RDFS inference: ~10 sec
• Time to do OWL Micro inference: ~40 sec
• Time to do OWL Full inference: ????
After translation (OWL Full):
DL Expressivity: ELUIH
  EL - Conjunction and Full Existential Quantification
  U - Union
  H - Role Hierarchy
  I - Role Inverse
Total Number of Classes: 7747 (Defined: 7747, Imported: 0)
Total Number of Datatype Properties: 0 (Defined: 0, Imported: 0)
Total Number of Object Properties: 604 (Defined: 604, Imported: 0)
Total Number of Annotation Properties: 1 (Defined: 1, Imported: 0)
Total Number of Individuals: 0 (Defined: 0, Imported: 0)
NOTE: This is using no Restrictions
Evaluation
• Syntactic Correctness: checked using OWL/RDF validators.
• Semantic Validation: full semantic validation is difficult, even for subsets of OWL.
• Meaning Preservation: some native representation features, such as DEFAULTS, modality and case roles, may be underrepresented or not handled.
• Feature Minimization: complex features can be difficult for reasoners to handle, so the translation can be performed at each of the levels: OWL Lite, OWL DL, OWL Full.
• Translation Complexity: OntoSem is a large and extensive ontology (~8000 concepts). The translation itself is done syntactically, but in general translation might require reasoning, which could be an issue.
Reasoning Capabilities
Buildfile: build.xml
init:
compile:
dist:
  [jar] Building jar: /home/aks1/software/eclipse/workspace/ontojena/dist/lib/ontojena.jar
run:
  [java] MODEL OK
  [java] Resource: http://ontosem.org/#fire-engine
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#fire-engine)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#all)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#physical-object)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#inanimate)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-vehicle)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#engine-propelled-vehicle)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#wheeled-engine-vehicle)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#artifact)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#object)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#land-vehicle)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#vehicle)
  [java] - (http://ontosem.org/#fire-engine rdfs:subClassOf http://ontosem.org/#truck)
  [java] - (http://ontosem.org/#fire-engine rdfs:label ' "a truck with equipment for fighting fires"')
  [java] - (http://ontosem.org/#fire-engine rdf:type owl:Class)
  [java] fire-engine recognized as subclas of vehicle
BUILD SUCCESSFUL
Total time: 10 seconds
real 0m11.144s
user 0m9.530s
sys 0m0.190s
[aks1@trishuli ontojena]$
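Finding Transitive Closures (RDFS reasoning)
[Figure: subclass hierarchy from fire-engine up through truck, wheeled-engine-vehicle, engine-propelled-vehicle and wheeled-vehicle, and land-vehicle to vehicle. The links beyond the direct parent are the inferred triples.]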
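The transitive closure that RDFS reasoning computes over rdfs:subClassOf can be sketched in a few lines. The direct subclass edges below are an assumption read off the diagram, not the actual ontology content, and `superclasses` is a hypothetical helper rather than a Jena call.

```python
def superclasses(cls, direct):
    """Return every class reachable from cls over rdfs:subClassOf links."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in direct.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Direct (asserted) subclass links, assumed from the slide's diagram.
direct = {
    "fire-engine": ["truck"],
    "truck": ["wheeled-engine-vehicle"],
    "wheeled-engine-vehicle": ["engine-propelled-vehicle", "wheeled-vehicle"],
    "engine-propelled-vehicle": ["land-vehicle"],
    "wheeled-vehicle": ["land-vehicle"],
    "land-vehicle": ["vehicle"],
}

all_supers = superclasses("fire-engine", direct)
# Inferred triples are the superclasses that were not asserted directly.
inferred = all_supers - set(direct["fire-engine"])
print(sorted(inferred))
```

With these edges, fire-engine is recognized as a subclass of vehicle even though only its link to truck is asserted.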
An Application Testbed: SemNews
• SemNews: semantically search and browse news.
• Aggregators collect RSS news descriptions from various sources.
• The sentences are processed by OntoSem and converted into Text Meaning Representations (TMRs).
• Provides intelligent agents with the latest news in a machine-readable format.
http://semnews.umbc.edu
[Figure: SemNews architecture, with data flow steps numbered 1-15. Components shown: news feeds, RSS aggregator (data aggregators), OntoSem (language processing) producing TMRs and the fact repository (FR), the Dekade editor (knowledge editor environment), OntoSem2OWL, and Semantic Web tools holding the OntoSem ontology (OWL), TMRs and inferred triples. Interfaces: fact repository interface, ontology and instance browser, text search, RDQL query, Swoogle index, semantic RSS.]
Agent understandable news
Provides RDF version of the news.
Semanticizing RSS
View a structured representation of the RSS news story.
Future versions would enable editing the facts and provide provenance information.
News stories are ontologically linked
Find news stories by browsing through the OntoSem ontology.
Tracking Named Entities
Find stories about a specific named entity.
Browsing Facts
Fact repository explorer for named entity 'Mexico' shows that it has a relation 'nationality-of' with CITIZEN-235.
Fact repository explorer for instance CITIZEN-235 shows that the citizen is an agent of ESCAPE-EVENT.
Querying the semanticized RSS
RDQL Queries
Provides structured querying over text converted into RDF representation.
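The idea of structured querying over facts extracted from text can be sketched with a tiny triple-pattern matcher. This is not RDQL and not the SemNews implementation; it only illustrates joining patterns over triples. The two facts come from the fact repository example for 'Mexico', while the predicate name `agent-of` and the functions `match` and `query` are assumptions made for this sketch.

```python
# Facts from the fact repository example; "agent-of" is an assumed predicate name.
triples = [
    ("Mexico", "nationality-of", "CITIZEN-235"),
    ("CITIZEN-235", "agent-of", "ESCAPE-EVENT"),
]


def match(pattern, triples, binding=None):
    """Yield variable bindings for one (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        env = dict(binding or {})
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if env.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield env


def query(patterns, triples):
    """Join several patterns, carrying bindings from one to the next."""
    envs = [{}]
    for pattern in patterns:
        envs = [new for env in envs for new in match(pattern, triples, env)]
    return envs


# Which events have an agent whose nationality is linked to Mexico?
results = query(
    [("Mexico", "nationality-of", "?who"), ("?who", "agent-of", "?event")],
    triples,
)
print(results)
```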
Semantic Alerts
Alerts can be specified as ontological concepts, keywords or RDQL queries.
Subscribe to results of structured queries.
Conclusions
• Language processing agents integrated into the SW can publish SW annotations and documents that capture a text's meaning.
• Migrating from a native, non-web-based representation to a SW representation may be lossy, but is still useful for many applications.
• The SemNews application testbed demonstrates some scenarios that can benefit from language understanding agents.
Q&A
Thank you.
http://ebiquity.umbc.edu
http://semnews.umbc.edu
References
Software Used
[1] OntoSem http://ilit.umbc.edu/
[2] RDF Validation service http://w3c.org/RDF/Validator
[3] Jena Toolkit http://jena.sourceforge.net/
[4] Swoop Ontology Viewer http://www.mindswap.org/2004/SWOOP/
[5] Pellet OWL DL Reasoner http://www.mindswap.org/2003/pellet/
[6] Wonder Web OWL Validator http://phoebus.cs.man.ac.uk:9999/OWL/Validator
Papers
[1] Sergei Nirenburg and Victor Raskin, Ontological Semantics, Formal Ontology and Ambiguity
[2] Sergei Nirenburg and Victor Raskin, Ontological Semantics, MIT Press, Forthcoming
[3] Sergei Nirenburg, Ontological Semantics: Overview, Presentation CLSP JHU, Spring 2003
[4] Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics
[5] P.J. Beltran-Ferruz, P.A. Gonzalez-Calero, P. Gervas, Converting Mikrokosmos frames into Description Logics
[6] Sergei Nirenburg, Ontology Tutorial, ILIT UMBC
Mailing Lists
[1] Jena Developers [email protected]
[2] Pellet users [email protected]
[3] Semantic web [email protected]
[4] W3C RDF Interest [email protected]
[5] W3C Semantic web [email protected]
Backup slides
Static Knowledge Sources
• Ontology: 8000 concepts
• Avg 16 properties each
• English Lexicon: 45000 entries
• Spanish Lexicon: 40000 entries
• Chinese Lexicon: 3000 entries
• Fact repository: 20000 facts
[Sergei Nirenburg, Ontological Semantics: Overview, Presentation CLSP JHU, Spring 2003]
Text Meaning Representation (TMR)
He asked the UN to authorize the war.
REQUEST-ACTION-69
  AGENT HUMAN-72
  THEME ACCEPT-70
  BENEFICIARY ORGANIZATION-71
  SOURCE-ROOT-WORD ask
  TIME (< (FIND-ANCHOR-TIME))
ACCEPT-70
  THEME WAR-73
  THEME-OF REQUEST-ACTION-69
  SOURCE-ROOT-WORD authorize
ORGANIZATION-71
  HAS-NAME United-Nations
  BENEFICIARY-OF REQUEST-ACTION-69
  SOURCE-ROOT-WORD UN
HUMAN-72
  HAS-NAME Colin Powell
  AGENT-OF REQUEST-ACTION-69
  SOURCE-ROOT-WORD he ; reference resolution has been carried out
WAR-73
  THEME-OF ACCEPT-70
  SOURCE-ROOT-WORD war
Example from [Marjorie McShane, Sergei Nirenburg, Stephen Beale, Margalit Zabludowski, The Cross Lingual Reuse and Extension of Knowledge Resources in Ontological Semantics]
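In this TMR every relation appears together with its inverse (AGENT with AGENT-OF, THEME with THEME-OF, and so on). A small check of that pairing can be sketched as below. The dictionary keeps only the relational slots of the example, and both the "-OF" suffix convention as a general rule and the function `missing_inverses` are assumptions made for illustration.

```python
# Relational slots of the example TMR (names and source words omitted).
tmr = {
    "REQUEST-ACTION-69": {
        "AGENT": "HUMAN-72",
        "THEME": "ACCEPT-70",
        "BENEFICIARY": "ORGANIZATION-71",
    },
    "ACCEPT-70": {"THEME": "WAR-73", "THEME-OF": "REQUEST-ACTION-69"},
    "ORGANIZATION-71": {"BENEFICIARY-OF": "REQUEST-ACTION-69"},
    "HUMAN-72": {"AGENT-OF": "REQUEST-ACTION-69"},
    "WAR-73": {"THEME-OF": "ACCEPT-70"},
}


def missing_inverses(tmr):
    """List (frame, slot, filler) links whose filler lacks the matching X-OF slot."""
    missing = []
    for frame, slots in tmr.items():
        for slot, filler in slots.items():
            if slot.endswith("-OF"):
                continue  # only check the forward direction
            if tmr.get(filler, {}).get(slot + "-OF") != frame:
                missing.append((frame, slot, filler))
    return missing


print(missing_inverses(tmr))
```

An empty result means every forward link in the example has its inverse recorded on the filler frame.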
The OntoSem Ontology
[Figure: an OntoSem ontology frame, with labels PROPERTY, FACET and FILLER.]