Page 1: LarKC Tutorial at ISWC 2009 - Second Hands-on Scenario

Copyright 2007

Cyc-GATE workflow

Blaz Fortuna, Luka Bradesko

Cycorp Europe, Slovenia


Goal

• Demonstrate reasoning over unstructured input data

• Learn how to correctly annotate a new plug-in

• Learn how to add a new plug-in to the platform


External tools used

• GATE
  – Information Extraction framework
  – Used here for extraction of named entities from articles

• ResearchCyc
  – Common-sense knowledge base
    • ~300,000 concepts, 1.3M assertions
  – Reasoning engine


Pipeline diagram

[Diagram: Query → Identify → Transform → Select → Reason → Result, with the Internet, GATE and ResearchCyc shown as external resources]


Example


Query

PREFIX cyc: <http://www.cycfoundation.org/concepts/>

SELECT ?company WHERE

{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .

?company cyc:isa cyc:PubliclyHeldCorporation }


Identify

• Find links to HTML documents and retrieve them using the ArticleIdentifier plug-in.
  – Returns a text document:

http://shodan.ijs.si:8080/GateServer/news.txt
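The slides do not show how ArticleIdentifier locates the document link. A minimal sketch of one way to do it, assuming the article URL appears as a quoted literal in the SPARQL query: the class name ArticleUrlFinder and its method are hypothetical and not part of LarKC, and fetching the document itself is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not LarKC code: finds article URLs quoted in a SPARQL query. */
final class ArticleUrlFinder {

    // Matches a double-quoted literal that starts with http:// or https://.
    // Whitespace just inside the quotes is tolerated and not captured.
    private static final Pattern QUOTED_URL =
            Pattern.compile("\"\\s*(https?://[^\"\\s]+)\\s*\"");

    /** Returns every quoted URL in the query, in order of appearance. */
    static List<String> findUrls(String sparql) {
        List<String> urls = new ArrayList<>();
        Matcher m = QUOTED_URL.matcher(sparql);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String query =
                "PREFIX cyc: <http://www.cycfoundation.org/concepts/>\n"
              + "SELECT ?company WHERE\n"
              + "{ ?company cyc:mentionedInArticle "
              + "\"http://shodan.ijs.si:8080/GateServer/news.txt\" .\n"
              + "  ?company cyc:isa cyc:PubliclyHeldCorporation }";
        // The PREFIX IRI is in angle brackets, so only the quoted literal is reported.
        System.out.println(findUrls(query));
    }
}
```

The namespace IRI in the PREFIX line is ignored because it is not a quoted literal, so only the article link is passed on.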


Transform

• Use GATE to extract organizations
  – Returns a SetOfStatements of the form:

article-0 urn:hasUrl "http://shodan.ijs.si:8080/GateServer/news.txt"

company-0 urn:nameString "Microsoft"

company-0 urn:mentionedInArticle article-0

company-1 urn:nameString "Ford"

company-1 urn:mentionedInArticle article-0

Query:

?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt"
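A minimal sketch of how a list of extracted organization names could be turned into statements of the form shown on this slide. It does not call GATE and does not use LarKC's SetOfStatements type: the class OrganizationStatements and its plain-string statements are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not the GateTransformer source. */
final class OrganizationStatements {

    /** Renders one subject-predicate-object statement as a plain string. */
    static String statement(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object;
    }

    /** Builds the article statement plus two statements per organization. */
    static List<String> build(String articleUrl, List<String> organizations) {
        List<String> out = new ArrayList<>();
        String article = "article-0";
        out.add(statement(article, "urn:hasUrl", "\"" + articleUrl + "\""));
        for (int i = 0; i < organizations.size(); i++) {
            String company = "company-" + i;
            out.add(statement(company, "urn:nameString",
                    "\"" + organizations.get(i) + "\""));
            out.add(statement(company, "urn:mentionedInArticle", article));
        }
        return out;
    }

    public static void main(String[] args) {
        // The two names are the ones used on the slide.
        List<String> names = Arrays.asList("Microsoft", "Ford");
        for (String s : build("http://shodan.ijs.si:8080/GateServer/news.txt", names)) {
            System.out.println(s);
        }
    }
}
```

Each organization gets a generated identifier (company-0, company-1, ...) so that later steps can attach further statements to it.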


Select

• Select only the companies with a corresponding concept in the ResearchCyc KB
  company-0 → #$MicrosoftInc
  company-1 → #$FordMotors

• Replace URIs with Cyc concepts
  cyc:mentionedInArticle → #$mentionedInArticle

• Output:

#$MicrosoftInc #$mentionedInArticle #$article-0

#$FordMotors #$mentionedInArticle #$article-0
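The selection step can be sketched as a lookup followed by a filter. In this sketch a plain map stands in for the ResearchCyc concept lookup, which is an assumption: the real CycSelecter queries the knowledge base, and the class CycConceptSelection is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not the CycSelecter source. */
final class CycConceptSelection {

    /**
     * Keeps only companies whose name maps to a known concept and emits
     * one "concept #$mentionedInArticle #$article" line for each of them.
     */
    static List<String> select(Map<String, String> companyNames,
                               Map<String, String> nameToConcept,
                               String articleId) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : companyNames.entrySet()) {
            String concept = nameToConcept.get(e.getValue());
            if (concept == null) {
                continue; // no corresponding concept: drop this company
            }
            out.add(concept + " #$mentionedInArticle #$" + articleId);
        }
        return out;
    }

    public static void main(String[] args) {
        // Identifier -> extracted name; "Acme" is an invented extra entry.
        Map<String, String> names = new LinkedHashMap<>();
        names.put("company-0", "Microsoft");
        names.put("company-1", "Ford");
        names.put("company-2", "Acme");

        // Stand-in for the knowledge-base lookup (mappings taken from the slide).
        Map<String, String> kb = new HashMap<>();
        kb.put("Microsoft", "#$MicrosoftInc");
        kb.put("Ford", "#$FordMotors");

        for (String line : select(names, kb, "article-0")) {
            System.out.println(line);
        }
    }
}
```

The invented third company has no entry in the lookup table, so it is dropped and only the two lines from the slide remain.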


Reason

• Reason
  – Load the triples with Cyc concept names into the ResearchCyc KB
  – Transform the SPARQL query to a Cyc query
  – Execute and retrieve results
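The slides do not show how CycReasoner rewrites the query. A minimal sketch of the idea, assuming each SPARQL triple pattern becomes one CycL sentence in prefix notation and all sentences are joined with #$and: the class SparqlToCycSketch is hypothetical, and it handles only cyc:-prefixed terms, variables and literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch, not the CycReasoner source. */
final class SparqlToCycSketch {

    /** Maps one SPARQL term to a CycL term. */
    static String term(String t) {
        if (t.startsWith("?")) {
            // CycL variables are conventionally written in upper case.
            return t.toUpperCase(Locale.ROOT);
        }
        if (t.startsWith("cyc:")) {
            return "#$" + t.substring(4); // cyc:isa -> #$isa
        }
        return t; // literals pass through unchanged
    }

    /** Turns (subject, predicate, object) patterns into one conjunctive query. */
    static String toCycQuery(List<String[]> patterns) {
        StringBuilder sb = new StringBuilder("(#$and");
        for (String[] p : patterns) {
            // CycL puts the predicate first: (predicate subject object).
            sb.append(" (").append(term(p[1]))
              .append(' ').append(term(p[0]))
              .append(' ').append(term(p[2])).append(')');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String[]> patterns = new ArrayList<>();
        patterns.add(new String[] {"?company", "cyc:mentionedInArticle",
                "\"http://shodan.ijs.si:8080/GateServer/news.txt\""});
        patterns.add(new String[] {"?company", "cyc:isa",
                "cyc:PubliclyHeldCorporation"});
        System.out.println(toCycQuery(patterns));
    }
}
```

A real translation would also have to deal with other prefixes, blank nodes and the selected variables. The sketch only illustrates the term-by-term mapping.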


Run the workflow on your computer!

Main class: eu.larkc.core.Larkc

VM arguments: -Xmx512m


Run SPARQL client

• In Windows: double-click SPARQLClient.jar

• In Linux: java -jar SPARQLClient.jar


Run example query

• Execute query in SPARQL Client

• Walk through the output of the program

• Go through the plug-ins’ .java files


Other interesting queries

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:isa cyc:PubliclyHeldCorporation }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:isa cyc:SoftwareVendor }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?company cyc:isa cyc:SoftwareVendor }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?company cyc:isa cyc:Business }


Other interesting queries

PREFIX cyc: <http://www.cycfoundation.org/concepts/>

SELECT ?company WHERE

{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .

?company cyc:makesProductType cyc:CellularTelephone }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>

SELECT ?company WHERE

{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .

?company cyc:makesProductType cyc:CellularTelephone .

?company cyc:stockTickerSymbol ?ticker }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>

SELECT ?company WHERE

{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .

?program cyc:programAuthor ?company }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>

SELECT ?company WHERE

{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .

?competitor cyc:competitors ?company .

?competitor cyc:makesProductType cyc:CellularTelephone }


Plug-in SAWSDL description

<wsdl:description>

<!-- COMMON TO ALL IDENTIFIERS -->

<wsdl:interface name="identifier"

sawsdl:modelReference="http://larkc.eu/plugin#Identifier">

</wsdl:interface>

<wsdl:binding name="larkcbinding" type="http://larkc.eu/wsdl-binding" />

<!-- SPECIFIC TO THIS IDENTIFIER -->

<wsdl:service

name="urn:eu.larkc.plugin.identify.article.ArticleIdentifier"

interface="identifier"

sawsdl:modelReference="http://larkc.eu/plugin#ArticleIdentifier" >

<wsdl:endpoint

location="java:eu.larkc.plugin.identify.article.ArticleIdentifier" />

</wsdl:service>

</wsdl:description>


Plug-in ontology

@prefix larkc: <http://larkc.eu/plugin#> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

larkc:ArticleIdentifier

rdf:type rdfs:Class ;

rdfs:subClassOf larkc:Identifier ;

larkc:hasInputType larkc:SPARQLQuery ;

larkc:hasOutputType larkc:NaturalLanguageDocument .


Scripted decider

// Assemble the plug-ins in workflow order: Identify, Transform, Select, Reason.
Pipeline pipeline = new Pipeline();
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.identify.article.ArticleIdentifier"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.transform.gate.GateTransformer"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.select.cycselecter.CycSelecter"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.reason.cycreasoner.CycReasoner"));

try {
    // Start the pipeline with the incoming SPARQL query.
    pipeline.start(theQuery);
} catch (Exception e) {
    // error
}

// Take the result from the end of the pipeline.
return (VariableBinding) pipeline.take();
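The scripted decider above fixes the order in which the plug-ins run. The underlying idea, that each stage consumes the previous stage's output, can be sketched with a toy class. ToyPipeline is hypothetical and is not LarKC's Pipeline: it runs stages one after another on strings, whereas the start/take calls above suggest the real class may work differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in that chains stages in the order they were added. */
final class ToyPipeline {

    private final List<Function<String, String>> stages = new ArrayList<>();

    /** Appends a stage and returns this pipeline so calls can be chained. */
    ToyPipeline addStage(Function<String, String> stage) {
        stages.add(stage);
        return this;
    }

    /** Feeds the input through every stage in order and returns the last output. */
    String run(String input) {
        String current = input;
        for (Function<String, String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Each lambda only records that its stage ran; real stages would do work.
        ToyPipeline pipeline = new ToyPipeline()
                .addStage(s -> s + " > identify")
                .addStage(s -> s + " > transform")
                .addStage(s -> s + " > select")
                .addStage(s -> s + " > reason");
        System.out.println(pipeline.run("query"));
    }
}
```

Adding a new plug-in to such a chain amounts to inserting one more stage at the right position, which is what updating the decider does in the exercise on the next slide.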


Write a new plug-in

• Create new project
  – New Folder
  – Link bin directory
  – Make source directory
  – Add libraries

• Prepare code:
  – Copy and paste GateTransformer.java
  – Rename it to SimpleNamedEntitiyExtractor
  – Insert code available in SimpleNamedEntitiyExtractor.txt

• Prepare/update meta-data files
  – SimpleNamedEntitiyExtractor.wsdl
  – SimpleNamedEntitiyExtractor.rdf

• Update CycGateDecider

• Clean, Build and Run!