117
INDEXING AND SEARCHING RDF DATASETS Improving Performance of Semantic Web Applications with Lucene, SIREn and RDF Mike Hugo Entagen, LLC

Improving RDF Search Performance with Lucene and SIREN

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Improving RDF Search Performance with Lucene and SIREN

INDEXING AND SEARCHING RDF DATASETS

Improving Performance of Semantic Web Applications with Lucene, SIREn and RDF

Mike HugoEntagen, LLC

Page 2: Improving RDF Search Performance with Lucene and SIREN

slides and sample code can be found at https://github.com/mjhugo/rdf-lucene-siren-presentation

Page 3: Improving RDF Search Performance with Lucene and SIREN
Page 4: Improving RDF Search Performance with Lucene and SIREN

ACCELERATING INSIGHTENTAGEN

Page 5: Improving RDF Search Performance with Lucene and SIREN

17

Page 6: Improving RDF Search Performance with Lucene and SIREN
Page 7: Improving RDF Search Performance with Lucene and SIREN
Page 8: Improving RDF Search Performance with Lucene and SIREN

AGENDA

Page 9: Improving RDF Search Performance with Lucene and SIREN
Page 10: Improving RDF Search Performance with Lucene and SIREN

SPARQL

Page 11: Improving RDF Search Performance with Lucene and SIREN

SPARQL

LUCENE

Page 12: Improving RDF Search Performance with Lucene and SIREN

SPARQL

LUCENE

SIREN

Page 13: Improving RDF Search Performance with Lucene and SIREN

SPARQL

LUCENE

SIREN

TripleMap.com

Page 14: Improving RDF Search Performance with Lucene and SIREN
Page 15: Improving RDF Search Performance with Lucene and SIREN

LINKING OPEN DATA

Page 16: Improving RDF Search Performance with Lucene and SIREN

LIFE SCIENCE LINKED DATA

Page 17: Improving RDF Search Performance with Lucene and SIREN

WHAT’S A TRIPLE?

Subject

Object

Predicate

Page 18: Improving RDF Search Performance with Lucene and SIREN

WHAT’S A TRIPLE?

<Mike>

“Mike Hugo”

<name>

Page 19: Improving RDF Search Performance with Lucene and SIREN

WHAT’S A TRIPLE?

“Mike Hugo”

<name>

“Minneapolis”

<lives_in_city>

<Mike>

Page 20: Improving RDF Search Performance with Lucene and SIREN

WHAT’S A TRIPLE?

“Mike Hugo”

<name>

“Minneapolis”<lives_in_city><Mike>

<Lydia>

<daughter>

Page 21: Improving RDF Search Performance with Lucene and SIREN

WHAT’S A TRIPLE?

“Mike Hugo”

<name>

“Minneapolis”<lives_in_city><Mike>

<daughter>

<Lydia> <name> “Lydia Hugo”

Page 22: Improving RDF Search Performance with Lucene and SIREN
Page 23: Improving RDF Search Performance with Lucene and SIREN
Page 24: Improving RDF Search Performance with Lucene and SIREN
Page 25: Improving RDF Search Performance with Lucene and SIREN
Page 26: Improving RDF Search Performance with Lucene and SIREN

Ready, GO!

Page 27: Improving RDF Search Performance with Lucene and SIREN

select id, label from targetswhere label = ‘${queryValue}’

Page 28: Improving RDF Search Performance with Lucene and SIREN

select id, label from targetswhere label ilike ‘%${queryValue}%’

Page 29: Improving RDF Search Performance with Lucene and SIREN

SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER (?label = '${params.query}')} LIMIT 10

Page 30: Improving RDF Search Performance with Lucene and SIREN

SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER regex(?label, '\\\\Q${params.query}\\\\E', 'i')} LIMIT 10

Page 31: Improving RDF Search Performance with Lucene and SIREN

SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER regex(?label, '\\\\Q${params.query}\\\\E', 'i')} LIMIT 10

query as literal valuecase insensitive

Page 32: Improving RDF Search Performance with Lucene and SIREN

DEMOBaseline SPARQL Query Performance

Page 33: Improving RDF Search Performance with Lucene and SIREN

FASTER!

Page 34: Improving RDF Search Performance with Lucene and SIREN
Page 35: Improving RDF Search Performance with Lucene and SIREN

Java API

Indexing and Searching Text

Page 37: Improving RDF Search Performance with Lucene and SIREN

ind

ex

ing

stora

ge

Page 38: Improving RDF Search Performance with Lucene and SIREN

Document

Page 39: Improving RDF Search Performance with Lucene and SIREN

Document

field valueID 2

name “Mike Hugo”company “Entagen”

bio“lorem ipsum

dolor sum etc...”

Page 40: Improving RDF Search Performance with Lucene and SIREN

Index

field value

name “mike hugo”

company “Entagen”

bio “lorem ipsum dolor sum etc...”

field value

name “mike hugo”

company “Entagen”

bio “lorem ipsum dolor sum etc...”

field value

name “mike hugo”

company “Entagen”

bio “lorem ipsum dolor sum etc...”

field value

name “mike hugo”

company “Entagen”

bio “lorem ipsum dolor sum etc...”

field value

name “mike hugo”

company “Entagen”

bio “lorem ipsum dolor sum etc...”

field value

name “mike hugo”

company “Entagen”

bio “lorem ipsum dolor sum etc...”

field valueid 2

name “Mike Hugo”company “Entagen”

bio“lorem ipsum

dolor sum etc...”

Indexednot

Stored

Page 41: Improving RDF Search Performance with Lucene and SIREN

name: mikeQuery:

Page 42: Improving RDF Search Performance with Lucene and SIREN

name: mike

field valueid 2field value

id 2field valueid 2field value

id 2

Query:

MatchingDocuments:

Page 43: Improving RDF Search Performance with Lucene and SIREN

field valueid 2

Page 44: Improving RDF Search Performance with Lucene and SIREN

field valueid 2

Page 45: Improving RDF Search Performance with Lucene and SIREN

field valueid 2

field valueID 2

name “Mike Hugo”company “Entagen”

bio “lorem ipsum dolor sum etc...”

Page 46: Improving RDF Search Performance with Lucene and SIREN

Simplest Solution

Page 47: Improving RDF Search Performance with Lucene and SIREN

Lucene index of rdfs:label

Page 48: Improving RDF Search Performance with Lucene and SIREN

Build the Index

Page 49: Improving RDF Search Performance with Lucene and SIREN

try { String queryLabels = """ SELECT ?uri ?label WHERE { ?uri rdfs:label ?label . } """ sparqlQueryService.executeForEach(repository, queryLabels) { def doc = new Document() String uri = it.uri.stringValue() String label = it.label.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED)) doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) } } finally { writer.close() // Close index }

Build a SPARQL query to find all the rdfs:label properties

Page 50: Improving RDF Search Performance with Lucene and SIREN

try { String queryLabels = """ SELECT ?uri ?label WHERE { ?uri rdfs:label ?label . } """ sparqlQueryService.executeForEach (repository, queryLabels) { String uri = it.uri.stringValue() String label = it.label.stringValue()

def doc = new Document() doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED)) doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) } } finally { writer.close() // Close index }

Execute the SPARQL query

Page 51: Improving RDF Search Performance with Lucene and SIREN

try { String queryLabels = """ SELECT ?uri ?label WHERE { ?uri rdfs:label ?label . } """ sparqlQueryService.executeForEach(repository, queryLabels) { String uri = it.uri.stringValue() String label = it.label.stringValue() Document doc = new Document() doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED)) doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) } } finally { writer.close() // Close index }

Instantiate a new Lucene Document

Page 52: Improving RDF Search Performance with Lucene and SIREN

try { String queryLabels = """ SELECT ?uri ?label WHERE { ?uri rdfs:label ?label . } """ sparqlQueryService.executeForEach(repository, queryLabels) { String uri = it.uri.stringValue() String label = it.label.stringValue() Document doc = new Document() doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED)) doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) } } finally { writer.close() // Close index }

Add the Subject URI to the Document

key

value

Page 53: Improving RDF Search Performance with Lucene and SIREN

try { String queryLabels = """ SELECT ?uri ?label WHERE { ?uri rdfs:label ?label . } """ sparqlQueryService.executeForEach(repository, queryLabels) { String uri = it.uri.stringValue() String label = it.label.stringValue() Document doc = new Document() doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED)) doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) } } finally { writer.close() // Close index }

Add the Label field to the document

(but don’t store it)

valuekey

Page 54: Improving RDF Search Performance with Lucene and SIREN

try { String queryLabels = """ SELECT ?uri ?label WHERE { ?uri rdfs:label ?label . } """ sparqlQueryService.executeForEach(repository, queryLabels) { def doc = new Document() String uri = it.uri.stringValue() String label = it.label.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED)) doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) } } finally { writer.close() // Close index }

Add the document to the Index

Page 55: Improving RDF Search Performance with Lucene and SIREN

Query the Index

Page 56: Improving RDF Search Performance with Lucene and SIREN

def query = { Query query = new QueryParser( Version.LUCENE_CURRENT, LABEL_FIELD, new StandardAnalyzer()) .parse(params.query); def s = new Date().time List results = executeQuery(query) def e = new Date().time render(view: 'index', model: [results: results, time: e - s]) } public List executeQuery(Query query) { IndexSearcher searcher = luceneSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection) results << [uri: uri, type: labelAndType.type, label: labelAndType.label] } connection.close() return results }

Create a Lucene Query from user

input

query this field

for this value

Page 57: Improving RDF Search Performance with Lucene and SIREN

def query = { Query query = new QueryParser( Version.LUCENE_CURRENT, LABEL_FIELD, new StandardAnalyzer()) .parse(params.query); def s = new Date().time List results = executeQuery(query) def e = new Date().time render(view: 'index', model: [results: results, time: e - s]) } public List executeQuery(Query query) { IndexSearcher searcher = luceneSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection) results << [uri: uri, type: labelAndType.type, label: labelAndType.label] } connection.close() return results }

Search the index (limit 10) for

matching documents

Page 58: Improving RDF Search Performance with Lucene and SIREN

def query = { Query query = new QueryParser( Version.LUCENE_CURRENT, LABEL_FIELD, new StandardAnalyzer()) .parse(params.query); def s = new Date().time List results = executeQuery(query) def e = new Date().time render(view: 'index', model: [results: results, time: e - s]) } public List executeQuery(Query query) { IndexSearcher searcher = luceneSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType =

sparqlQueryService. getLabelAndType(uri, connection)

results.add([ uri: uri,

type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

For each matching document, get the

doc and extract the Subject URI

Page 59: Improving RDF Search Performance with Lucene and SIREN

def query = { Query query = new QueryParser( Version.LUCENE_CURRENT, LABEL_FIELD, new StandardAnalyzer()) .parse(params.query); def s = new Date().time List results = executeQuery(query) def e = new Date().time render(view: 'index', model: [results: results, time: e - s]) } public List executeQuery(Query query) { IndexSearcher searcher = luceneSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType =

sparqlQueryService. getLabelAndType(uri, connection)

results.add([ uri: uri,

type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

Using the Subject URI, load properties from the triplestore

Page 60: Improving RDF Search Performance with Lucene and SIREN

def query = { Query query = new QueryParser( Version.LUCENE_CURRENT, LABEL_FIELD, new StandardAnalyzer()) .parse(params.query); def s = new Date().time List results = executeQuery(query) def e = new Date().time render(view: 'index', model: [results: results, time: e - s]) } public List executeQuery(Query query) { IndexSearcher searcher = luceneSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType =

sparqlQueryService. getLabelAndType(uri, connection)

results.add([ uri: uri,

type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

return results containing Subject

URI, Type, and Label

Page 61: Improving RDF Search Performance with Lucene and SIREN

DEMOLucene Index of Searchable Labels

Page 62: Improving RDF Search Performance with Lucene and SIREN

WHAT ABOUT ENTITY RELATIONSHIPS?

Page 63: Improving RDF Search Performance with Lucene and SIREN

WHAT ABOUT OTHER PROPERTIES?

Page 64: Improving RDF Search Performance with Lucene and SIREN

Indexing and SearchingSemi-Structured Data

Lucene Extension

Page 65: Improving RDF Search Performance with Lucene and SIREN

Document

Page 66: Improving RDF Search Performance with Lucene and SIREN

Document

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

Page 67: Improving RDF Search Performance with Lucene and SIREN

Build the Index

Page 68: Improving RDF Search Performance with Lucene and SIREN

RepositoryConnection connection = repository.connection try { String subjectUris = """ SELECT distinct ?uri WHERE { ?uri ?p ?o . } """ sparqlQueryService.executeForEach(repository, subjectUris) { def doc = new Document() String subjectUri = it.uri.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, subjectUri, Field.Store.YES, Field.Index.ANALYZED)) StringWriter triplesStringWriter = new StringWriter() NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter) connection.exportStatements( new URIImpl(subjectUri), null, null, false, nTriplesWriter) doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(), Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) }

Select all Subject URIs from the

triplestore

Page 69: Improving RDF Search Performance with Lucene and SIREN

RepositoryConnection connection = repository.connection try { String subjectUris = """ SELECT distinct ?uri WHERE { ?uri ?p ?o . } """ sparqlQueryService.executeForEach( repository, subjectUris) { def doc = new Document() String subjectUri = it.uri.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, subjectUri, Field.Store.YES, Field.Index.ANALYZED)) StringWriter triplesStringWriter = new StringWriter() NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter) connection.exportStatements( new URIImpl(subjectUri), null, null, false, nTriplesWriter) doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(), Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) }

Execute the Sparql Query For each URI, create a

new Document

Page 70: Improving RDF Search Performance with Lucene and SIREN

RepositoryConnection connection = repository.connection try { String subjectUris = """ SELECT distinct ?uri WHERE { ?uri ?p ?o . } """ sparqlQueryService.executeForEach( repository, subjectUris) { def doc = new Document() String subjectUri = it.uri.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, subjectUri, Field.Store.YES, Field.Index.ANALYZED)) StringWriter triplesStringWriter = new StringWriter() NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter) connection.exportStatements( new URIImpl(subjectUri), null, null, false, nTriplesWriter) doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(), Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) }

Add the Subject URI to the Document

Page 71: Improving RDF Search Performance with Lucene and SIREN

RepositoryConnection connection = repository.connection try { String subjectUris = """ SELECT distinct ?uri WHERE { ?uri ?p ?o . } """ sparqlQueryService.executeForEach( repository, subjectUris) { def doc = new Document() String subjectUri = it.uri.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, subjectUri, Field.Store.YES, Field.Index.ANALYZED)) StringWriter triplesStringWriter = new StringWriter() NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter) connection.exportStatements( new URIImpl(subjectUri), null, null, false, nTriplesWriter) doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(), Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) }

Get an NTriples string from the

triplestore

Page 72: Improving RDF Search Performance with Lucene and SIREN

RepositoryConnection connection = repository.connection try { String subjectUris = """ SELECT distinct ?uri WHERE { ?uri ?p ?o . } """ sparqlQueryService.executeForEach( repository, subjectUris) { def doc = new Document() String subjectUri = it.uri.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, subjectUri, Field.Store.YES, Field.Index.ANALYZED)) StringWriter triplesStringWriter = new StringWriter() NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter) connection.exportStatements( new URIImpl(subjectUri), null, null, false, nTriplesWriter) doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(), Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) }

Add the NTriples string to the

document

Page 73: Improving RDF Search Performance with Lucene and SIREN

RepositoryConnection connection = repository.connection try { String subjectUris = """ SELECT distinct ?uri WHERE { ?uri ?p ?o . } """ sparqlQueryService.executeForEach( repository, subjectUris) { def doc = new Document() String subjectUri = it.uri.stringValue() doc.add(new Field(SUBJECT_URI_FIELD, subjectUri, Field.Store.YES, Field.Index.ANALYZED)) StringWriter triplesStringWriter = new StringWriter() NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter) connection.exportStatements( new URIImpl(subjectUri), null, null, false, nTriplesWriter) doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(), Field.Store.NO, Field.Index.ANALYZED)) writer.addDocument(doc) }

Add the document to the index

Page 74: Improving RDF Search Performance with Lucene and SIREN

Query the Index

Page 75: Improving RDF Search Performance with Lucene and SIREN

SirenCellQuery predicate = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue()))); predicate.constraint = PREDICATE_CELL SirenCellQuery object = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, params.query.toLowerCase()))) object.constraint = OBJECT_CELL Query query = new SirenTupleQuery() query.add(predicate, SirenTupleClause.Occur.MUST) query.add(object, SirenTupleClause.Occur.MUST) List results = executeQuery(query)

query the Triples field

Page 76: Improving RDF Search Performance with Lucene and SIREN

SirenCellQuery predicate = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue()))); predicate.constraint = PREDICATE_CELL SirenCellQuery object = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, params.query.toLowerCase()))) object.constraint = OBJECT_CELL Query query = new SirenTupleQuery() query.add(predicate, SirenTupleClause.Occur.MUST) query.add(object, SirenTupleClause.Occur.MUST) List results = executeQuery(query)

for a predicate

Page 77: Improving RDF Search Performance with Lucene and SIREN

SirenCellQuery predicate = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue()))); predicate.constraint = PREDICATE_CELL SirenCellQuery object = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, params.query.toLowerCase()))) object.constraint = OBJECT_CELL Query query = new SirenTupleQuery() query.add(predicate, SirenTupleClause.Occur.MUST) query.add(object, SirenTupleClause.Occur.MUST) List results = executeQuery(query)

of rdfs:label *

* note: could be any predicate!

Page 78: Improving RDF Search Performance with Lucene and SIREN

SirenCellQuery predicate = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue()))); predicate.constraint = PREDICATE_CELL SirenCellQuery object = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, params.query.toLowerCase()))) object.constraint = OBJECT_CELL Query query = new SirenTupleQuery() query.add(predicate, SirenTupleClause.Occur.MUST) query.add(object, SirenTupleClause.Occur.MUST) List results = executeQuery(query)

query the Triples field

Page 79: Improving RDF Search Performance with Lucene and SIREN

SirenCellQuery predicate = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue()))); predicate.constraint = PREDICATE_CELL SirenCellQuery object = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, params.query.toLowerCase()))) object.constraint = OBJECT_CELL Query query = new SirenTupleQuery() query.add(predicate, SirenTupleClause.Occur.MUST) query.add(object, SirenTupleClause.Occur.MUST) List results = executeQuery(query)

for an object

Page 80: Improving RDF Search Performance with Lucene and SIREN

SirenCellQuery predicate = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue()))); predicate.constraint = PREDICATE_CELL SirenCellQuery object = new SirenCellQuery( new SirenTermQuery( new Term(TRIPLES_FIELD, params.query.toLowerCase()))) object.constraint = OBJECT_CELL Query query = new SirenTupleQuery() query.add(predicate, SirenTupleClause.Occur.MUST) query.add(object, SirenTupleClause.Occur.MUST) List results = executeQuery(query)

matching the user input

Page 81: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

“imatinib”Query:

Page 82: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

triples field

Query:

Page 83: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

predicate = rdfs:label

Query:

Page 84: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

predicate = rdfs:label

Query:

object = “imatinib”

Page 85: Improving RDF Search Performance with Lucene and SIREN

public List executeQuery(Query query) { IndexSearcher searcher = sirenSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType = sparqlQueryService. getLabelAndType(uri, connection) results.add([ uri: uri, type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

Search the index (limit 10) for

matching documents

Page 86: Improving RDF Search Performance with Lucene and SIREN

public List executeQuery(Query query) { IndexSearcher searcher = sirenSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType = sparqlQueryService. getLabelAndType(uri, connection) results.add([ uri: uri, type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

For each matching document, get the

doc and extract the Subject URI

Page 87: Improving RDF Search Performance with Lucene and SIREN

public List executeQuery(Query query) { IndexSearcher searcher = sirenSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType = sparqlQueryService. getLabelAndType(uri, connection) results.add([ uri: uri, type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

Using the Subject URI, load properties from the triplestore

Page 88: Improving RDF Search Performance with Lucene and SIREN

public List executeQuery(Query query) { IndexSearcher searcher = sirenSearcherManager.get() ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs List results = [] def connection = repository.connection scoreDocs.each { Document doc = searcher.doc(it.doc) String uri = doc[SUBJECT_URI_FIELD] Map labelAndType = sparqlQueryService. getLabelAndType(uri, connection) results.add([ uri: uri, type: labelAndType.type, label: labelAndType.label]) } connection.close() return results }

return results containing Subject

URI, Type, and Label

Page 89: Improving RDF Search Performance with Lucene and SIREN

DEMOSIREn Index of RDF Entities

Page 90: Improving RDF Search Performance with Lucene and SIREN

FLEXIBILITY

Page 91: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

predicate = rdfs:label

Query:

object = “imatinib”

Page 92: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

Query:

object = “imatinib”

Page 93: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

Query:

object = “imatinib”

object = “gleevec”OR

Page 94: Improving RDF Search Performance with Lucene and SIREN

MORE THAN LITERALS

Page 95: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

Query:

predicate = brandName

Page 96: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

Query:

predicate = target

Page 97: Improving RDF Search Performance with Lucene and SIREN

RELATIONSHIPS

Page 98: Improving RDF Search Performance with Lucene and SIREN
Page 99: Improving RDF Search Performance with Lucene and SIREN
Page 100: Improving RDF Search Performance with Lucene and SIREN
Page 101: Improving RDF Search Performance with Lucene and SIREN
Page 102: Improving RDF Search Performance with Lucene and SIREN

field value

URI <DB00619>

triples

<DB00619> rdfs:label "Imatinib" .<DB00619> rdf:type <drugbank:drugs> .<DB00619> drugbank:brandName "Gleevec" .<DB00619> drugbank:target <targets/1588> .

Query:

object = <targets/1588>

Page 103: Improving RDF Search Performance with Lucene and SIREN

DEMOSearching SIREn Index for Relationships

Page 104: Improving RDF Search Performance with Lucene and SIREN

DistributedIndexing and SearchingSemi-Structured Data

Page 105: Improving RDF Search Performance with Lucene and SIREN
Page 106: Improving RDF Search Performance with Lucene and SIREN

Replication

Page 107: Improving RDF Search Performance with Lucene and SIREN
Page 108: Improving RDF Search Performance with Lucene and SIREN
Page 109: Improving RDF Search Performance with Lucene and SIREN
Page 110: Improving RDF Search Performance with Lucene and SIREN
Page 111: Improving RDF Search Performance with Lucene and SIREN

400 Million Documents> 12 Billion Triples

Page 112: Improving RDF Search Performance with Lucene and SIREN

Query Parser

Page 113: Improving RDF Search Performance with Lucene and SIREN

Query Parser

subject

predicate object

Page 114: Improving RDF Search Performance with Lucene and SIREN

DEMOSIREn in action on TripleMap.com

Page 115: Improving RDF Search Performance with Lucene and SIREN

DEMOSIREn in action on TripleMap.com

Page 116: Improving RDF Search Performance with Lucene and SIREN

SPARQL

LUCENE

SIREN

TripleMap.com