LarKC Tutorial at ISWC 2009 - Second Hands-on Scenario


The aim of the EU FP7 Large-Scale Integrating Project LarKC is to develop the Large Knowledge Collider (LarKC for short, pronounced “lark”), a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. The LarKC platform is available at larkc.sourceforge.net. This is the second of two hands-on sessions that introduce participants to working directly with LarKC code.


Copyright 2007

Cyc-GATE workflow

Blaz Fortuna, Luka Bradesko
Cycorp Europe, Slovenia

Goal

• Demonstrate reasoning over unstructured input data

• Learn how to correctly annotate a new plug-in

• Learn how to add a new plug-in to the platform

External tools used

• GATE
  – Information extraction framework
  – Used here to extract named entities from articles
• ResearchCyc
  – Common-sense knowledge base (~300,000 concepts, 1.3M assertions)
  – Reasoning engine

Pipeline diagram

[Figure: Query → Identify → Transform → Select → Reason → Result. Identify draws on the Internet, Transform on GATE, and Select and Reason on ResearchCyc.]

Example

Query

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:isa cyc:PubliclyHeldCorporation }

Identify

• Find links to HTML documents and retrieve them using the ArticleIdentifier plug-in
  – Returns a text document: http://shodan.ijs.si:8080/GateServer/news.txt
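The article URLs the identifier works from appear as quoted literals in the query. A minimal sketch of that first step, assuming the URLs are the only quoted `http(s)` literals in the query; `ArticleUrlExtractor` and `extractUrls` are hypothetical names, not the ArticleIdentifier API, and the actual download is left out so the example runs offline:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls article URLs out of a SPARQL query string. */
public class ArticleUrlExtractor {

    // A double-quoted literal starting with http:// or https://,
    // tolerating stray whitespace just inside the quotes.
    private static final Pattern QUOTED_URL =
            Pattern.compile("\"\\s*(https?://[^\"\\s]+)\\s*\"");

    /** Returns the URLs in the order they appear, without duplicates. */
    public static List<String> extractUrls(String sparql) {
        List<String> urls = new ArrayList<>();
        Matcher m = QUOTED_URL.matcher(sparql);
        while (m.find()) {
            String url = m.group(1);
            if (!urls.contains(url)) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String query = "PREFIX cyc: <http://www.cycfoundation.org/concepts/>\n"
                + "SELECT ?company WHERE\n"
                + "{ ?company cyc:mentionedInArticle \"http://shodan.ijs.si:8080/GateServer/news.txt\" .\n"
                + "  ?company cyc:isa cyc:PubliclyHeldCorporation }";
        // The PREFIX IRI is in angle brackets, so only the article literal is found.
        System.out.println(extractUrls(query));
    }
}
```

Stray spaces inside the quotes are trimmed, and each URL is reported once even if several patterns mention it.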

Transform

• Use GATE to extract organizations
  – Returns a SetOfStatements of the form:

article-0 urn:hasUrl "http://shodan.ijs.si:8080/GateServer/news.txt"
company-0 urn:nameString "Microsoft"
company-0 urn:mentionedInArticle article-0
company-1 urn:nameString "Ford"
company-1 urn:mentionedInArticle article-0

Query:

?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt"
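The statement layout above is regular enough to generate mechanically. The sketch below assumes GATE has already returned the organization names and rebuilds the same rows; `Statement` and `toStatements` are hypothetical stand-ins, not LarKC's SetOfStatements API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Transform output: one row per RDF-style statement. */
public class GateStatements {

    /** Minimal stand-in for a single subject/predicate/object triple. */
    public record Statement(String subject, String predicate, String object) {
        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    /** Builds the article node plus a name and a mention statement per organization. */
    public static List<Statement> toStatements(String articleUrl, List<String> organizations) {
        List<Statement> out = new ArrayList<>();
        String article = "article-0";
        out.add(new Statement(article, "urn:hasUrl", "\"" + articleUrl + "\""));
        for (int i = 0; i < organizations.size(); i++) {
            String company = "company-" + i;
            out.add(new Statement(company, "urn:nameString", "\"" + organizations.get(i) + "\""));
            out.add(new Statement(company, "urn:mentionedInArticle", article));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Statement> statements = toStatements(
                "http://shodan.ijs.si:8080/GateServer/news.txt", List.of("Microsoft", "Ford"));
        statements.forEach(System.out::println);
    }
}
```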

Select

• Select only the companies that have a corresponding concept in the ResearchCyc KB
  – company-0 → #$MicrosoftInc
  – company-1 → #$FordMotors
• Replace URIs with Cyc concepts
  – cyc:mentionedInArticle → #$mentionedInArticle
• Output:

#$MicrosoftInc #$mentionedInArticle #$article-0
#$FordMotors #$mentionedInArticle #$article-0
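Selection amounts to a lookup followed by a filter. A minimal sketch, assuming a hard-coded name-to-constant table in place of a real ResearchCyc lookup (`KNOWN_CONCEPTS`, `CycSelectSketch` and `select` are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the Select step: keep only entities known to the KB. */
public class CycSelectSketch {

    // Stand-in for a ResearchCyc lookup; a real selecter would query the KB.
    static final Map<String, String> KNOWN_CONCEPTS = Map.of(
            "Microsoft", "#$MicrosoftInc",
            "Ford", "#$FordMotors");

    /**
     * @param namesById entity id (e.g. "company-0") mapped to the extracted name string
     * @return one "#$Concept #$mentionedInArticle #$article" line per known entity
     */
    static List<String> select(Map<String, String> namesById, String articleId) {
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, String> e : namesById.entrySet()) {
            String concept = KNOWN_CONCEPTS.get(e.getValue());
            if (concept == null) {
                continue; // no corresponding Cyc concept: drop the entity
            }
            output.add(concept + " #$mentionedInArticle #$" + articleId);
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("company-0", "Microsoft");
        names.put("company-1", "Ford");
        names.put("company-2", "Acme Widgets"); // made-up name with no concept
        select(names, "article-0").forEach(System.out::println);
    }
}
```

Entities with no matching concept, such as the made-up "Acme Widgets" in `main`, are dropped, which matches the filtering described on the slide.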

Reason

• Reason
  – Load the triples with Cyc concept names into the ResearchCyc KB
  – Transform the SPARQL query into a Cyc query
  – Execute the query and retrieve the results
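One way to picture the query translation is to turn each SPARQL triple pattern into a CycL literal, predicate first, and conjoin the literals with `#$and`. This is a hypothetical sketch, assuming the article literal has already been replaced by the `#$article-0` node and that every `cyc:` name maps one-to-one onto a Cyc constant; the CycReasoner's real translation may differ.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical sketch: SPARQL triple patterns to a single CycL conjunction. */
public class SparqlToCycl {

    /** Maps ?var to an upper-case CycL variable and cyc:Name to #$Name; other terms pass through. */
    static String term(String t) {
        if (t.startsWith("?")) {
            return "?" + t.substring(1).toUpperCase(Locale.ROOT);
        }
        if (t.startsWith("cyc:")) {
            return "#$" + t.substring("cyc:".length());
        }
        return t;
    }

    /** Each pattern is {subject, predicate, object}; CycL puts the predicate first. */
    static String toCycl(List<String[]> patterns) {
        String literals = patterns.stream()
                .map(p -> "(" + term(p[1]) + " " + term(p[0]) + " " + term(p[2]) + ")")
                .collect(Collectors.joining(" "));
        return patterns.size() == 1 ? literals : "(#$and " + literals + ")";
    }

    public static void main(String[] args) {
        List<String[]> patterns = List.of(
                new String[] {"?company", "cyc:mentionedInArticle", "#$article-0"},
                new String[] {"?company", "cyc:isa", "cyc:PubliclyHeldCorporation"});
        System.out.println(toCycl(patterns));
    }
}
```

For the example query, `main` prints `(#$and (#$mentionedInArticle ?COMPANY #$article-0) (#$isa ?COMPANY #$PubliclyHeldCorporation))`.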

Run the workflow on your computer!

Main class: eu.larkc.core.LarkcVM
VM arguments: -Xmx512m

Run SPARQL client

• On Windows: double-click SPARQLClient.jar

• On Linux: java -jar SPARQLClient.jar

Run example query

• Execute query in SPARQL Client

• Walk through the output of the program

• Go through the plug-ins’ .java files

Other interesting queries

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:isa cyc:PubliclyHeldCorporation }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:isa cyc:SoftwareVendor }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?company cyc:isa cyc:SoftwareVendor }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
  ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?company cyc:isa cyc:Business }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?company cyc:makesProductType cyc:CellularTelephone }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?company cyc:makesProductType cyc:CellularTelephone .
  ?company cyc:stockTickerSymbol ?ticker }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?program cyc:programAuthor ?company }

PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
  ?competitor cyc:competitors ?company .
  ?competitor cyc:makesProductType cyc:CellularTelephone }
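All of these queries share one shape: a `cyc:mentionedInArticle` pattern per article plus extra constraints on `?company`. A small, hypothetical helper for generating them while experimenting (`CycQueryBuilder` and `build` are made-up names):

```java
import java.util.List;

/** Hypothetical helper for composing tutorial-style SPARQL queries. */
public class CycQueryBuilder {

    static final String PREFIX = "PREFIX cyc: <http://www.cycfoundation.org/concepts/>";

    /**
     * @param articleUrls   documents the company must be mentioned in
     * @param extraPatterns additional triple patterns, written without the trailing dot
     */
    static String build(List<String> articleUrls, List<String> extraPatterns) {
        StringBuilder sb = new StringBuilder(PREFIX).append('\n')
                .append("SELECT ?company WHERE {\n");
        for (String url : articleUrls) {
            sb.append("  ?company cyc:mentionedInArticle \"").append(url).append("\" .\n");
        }
        for (String pattern : extraPatterns) {
            sb.append("  ").append(pattern).append(" .\n");
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
                List.of("http://shodan.ijs.si:8080/GateServer/news2.txt"),
                List.of("?company cyc:makesProductType cyc:CellularTelephone")));
    }
}
```

The trailing ` .` before the closing brace is permitted by SPARQL, which keeps the loop simple.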

Plug-in SAWSDL description

<wsdl:description xmlns:wsdl="http://www.w3.org/ns/wsdl"
                  xmlns:sawsdl="http://www.w3.org/ns/sawsdl">

  <!-- COMMON TO ALL IDENTIFIERS -->
  <wsdl:interface name="identifier"
                  sawsdl:modelReference="http://larkc.eu/plugin#Identifier">
  </wsdl:interface>
  <wsdl:binding name="larkcbinding" type="http://larkc.eu/wsdl-binding" />

  <!-- SPECIFIC TO THIS IDENTIFIER -->
  <wsdl:service name="urn:eu.larkc.plugin.identify.article.ArticleIdentifier"
                interface="identifier"
                sawsdl:modelReference="http://larkc.eu/plugin#ArticleIdentifier">
    <wsdl:endpoint location="java:eu.larkc.plugin.identify.article.ArticleIdentifier" />
  </wsdl:service>

</wsdl:description>
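To see what a loader has to pull out of this file, the sketch below reads the service name (the plug-in URI the decider refers to), the `sawsdl:modelReference` and the Java class behind the `java:` endpoint using the JDK's namespace-aware DOM parser. It is a hypothetical illustration, not LarKC's own loader, and it assumes the standard WSDL 2.0 and SAWSDL namespace URIs are declared on the root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: read plug-in URI, model reference and Java class from a SAWSDL file. */
public class SawsdlReader {

    static final String WSDL_NS = "http://www.w3.org/ns/wsdl";
    static final String SAWSDL_NS = "http://www.w3.org/ns/sawsdl";

    static final String SAMPLE = """
            <wsdl:description xmlns:wsdl="http://www.w3.org/ns/wsdl"
                              xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
              <wsdl:service name="urn:eu.larkc.plugin.identify.article.ArticleIdentifier"
                            interface="identifier"
                            sawsdl:modelReference="http://larkc.eu/plugin#ArticleIdentifier">
                <wsdl:endpoint location="java:eu.larkc.plugin.identify.article.ArticleIdentifier"/>
              </wsdl:service>
            </wsdl:description>
            """;

    /** Returns {service name, model reference, Java class name} of the first service. */
    static String[] readService(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element service = (Element) doc.getElementsByTagNameNS(WSDL_NS, "service").item(0);
            Element endpoint = (Element) service.getElementsByTagNameNS(WSDL_NS, "endpoint").item(0);
            String location = endpoint.getAttribute("location");
            String className = location.startsWith("java:")
                    ? location.substring("java:".length()) : location;
            return new String[] {
                service.getAttribute("name"),
                service.getAttributeNS(SAWSDL_NS, "modelReference"),
                className
            };
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SAWSDL description", e);
        }
    }

    public static void main(String[] args) {
        for (String value : readService(SAMPLE)) {
            System.out.println(value);
        }
    }
}
```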

Plug-in ontology

@prefix larkc: <http://larkc.eu/plugin#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

larkc:ArticleIdentifier
    rdf:type rdfs:Class ;
    rdfs:subClassOf larkc:Identifier ;
    larkc:hasInputType larkc:SPARQLQuery ;
    larkc:hasOutputType larkc:NaturalLanguageDocument .
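The `larkc:hasInputType` and `larkc:hasOutputType` annotations are what allow plug-ins to be chained. As a hypothetical illustration, not the platform's matching logic, the sketch checks that each plug-in's output type equals the next plug-in's input type. Only the ArticleIdentifier types come from the file above; the GateTransformer types are assumed, and `rdfs:subClassOf` reasoning is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: check that adjacent plug-ins agree on their annotated types. */
public class PipelineTypeCheck {

    /** Minimal stand-in for a plug-in's hasInputType / hasOutputType annotations. */
    record PluginType(String name, String inputType, String outputType) {}

    /** Returns one message per mismatched neighbour pair; empty means the chain fits. */
    static List<String> mismatches(List<PluginType> chain) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i + 1 < chain.size(); i++) {
            PluginType from = chain.get(i);
            PluginType to = chain.get(i + 1);
            if (!from.outputType().equals(to.inputType())) { // exact match only
                problems.add(from.name() + " outputs " + from.outputType()
                        + " but " + to.name() + " expects " + to.inputType());
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<PluginType> chain = List.of(
                new PluginType("ArticleIdentifier", "larkc:SPARQLQuery", "larkc:NaturalLanguageDocument"),
                // Assumed annotation, for illustration only.
                new PluginType("GateTransformer", "larkc:NaturalLanguageDocument", "larkc:SetOfStatements"));
        List<String> problems = mismatches(chain);
        System.out.println(problems.isEmpty() ? "chain fits" : String.join("\n", problems));
    }
}
```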

Scripted decider

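// Fixed plug-in order: Identify -> Transform -> Select -> Reason
Pipeline pipeline = new Pipeline();
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.identify.article.ArticleIdentifier"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.transform.gate.GateTransformer"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.select.cycselecter.CycSelecter"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.reason.cycreasoner.CycReasoner"));

try {
    // Hand the user's SPARQL query to the first plug-in
    pipeline.start(theQuery);
} catch (Exception e) {
    // Fail fast instead of asking a pipeline that never started for results
    throw new RuntimeException("Could not start the Cyc-GATE pipeline", e);
}

// Retrieve the variable bindings produced by the last plug-in (the reasoner)
return (VariableBinding) pipeline.take();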
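The decider treats the pipeline as a producer: `start` hands the query to the first plug-in and `take` returns what the last one produces. The self-contained sketch below mimics that shape with plain functions and a `CompletableFuture`; `MiniPipeline` and its string-wrapping stages are hypothetical and do not reproduce LarKC's `Pipeline` class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical stand-in for a fixed-order plug-in pipeline with start/take calls. */
public class MiniPipeline {

    private final List<Function<Object, Object>> stages = new ArrayList<>();
    private CompletableFuture<Object> result;

    /** Appends a stage; stages run in the order in which they were added. */
    public MiniPipeline addStage(Function<Object, Object> stage) {
        stages.add(stage);
        return this;
    }

    /** Runs all stages asynchronously, feeding each stage's output to the next. */
    public void start(Object input) {
        List<Function<Object, Object>> snapshot = List.copyOf(stages);
        result = CompletableFuture.supplyAsync(() -> {
            Object value = input;
            for (Function<Object, Object> stage : snapshot) {
                value = stage.apply(value);
            }
            return value;
        });
    }

    /** Waits for the last stage; a failing stage surfaces as a CompletionException. */
    public Object take() {
        if (result == null) {
            throw new IllegalStateException("start() has not been called");
        }
        return result.join();
    }

    public static void main(String[] args) {
        MiniPipeline pipeline = new MiniPipeline()
                .addStage(q -> "document(" + q + ")")    // Identify
                .addStage(d -> "statements(" + d + ")")  // Transform
                .addStage(s -> "cycTriples(" + s + ")")  // Select
                .addStage(t -> "bindings(" + t + ")");   // Reason
        pipeline.start("query");
        System.out.println(pipeline.take());
    }
}
```

`main` prints `bindings(cycTriples(statements(document(query))))`, which shows the stage order. Because a failing stage completes the future exceptionally, `take` reports the error instead of waiting forever.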

Write a new plug-in

• Create a new project
  – New folder
  – Link the bin directory
  – Make a source directory
  – Add libraries

• Prepare the code:
  – Copy and paste GateTransformer.java
  – Rename it to SimpleNamedEntitiyExtractor
  – Insert the code available in SimpleNamedEntitiyExtractor.txt

• Prepare/update the metadata files
  – SimpleNamedEntitiyExtractor.wsdl
  – SimpleNamedEntitiyExtractor.rdf

• Update CycGateDecider

• Clean, build and run!
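The contents of SimpleNamedEntitiyExtractor.txt are not shown in these slides. Purely as a hypothetical example of a "simple" extractor that avoids GATE, the sketch below finds organization names from a small gazetteer in the article text; the class, method and gazetteer are assumptions, not the tutorial's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical gazetteer-based extractor; not the code from SimpleNamedEntitiyExtractor.txt. */
public class GazetteerExtractor {

    private final Pattern pattern;

    /** @param names non-empty list of organization names to look for */
    public GazetteerExtractor(List<String> names) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("gazetteer must not be empty");
        }
        StringBuilder alternation = new StringBuilder();
        for (String name : names) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name)); // treat each name literally
        }
        // Word boundaries keep "Ford" from matching inside "Fordham"
        pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    /** Returns each name found in the text once, in order of first occurrence. */
    public List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            if (!found.contains(m.group())) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        GazetteerExtractor extractor = new GazetteerExtractor(List.of("Microsoft", "Ford", "Nokia"));
        System.out.println(extractor.extract("Microsoft and Ford reported results; Microsoft led."));
    }
}
```

A real plug-in would still need to wrap the names as statements, as the Transform step does, and ship with its own .wsdl and .rdf annotations.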