Keyword Search over RDF Graphs
Shady Elbassuoni* and Roi Blanco**
* Max-Planck Institute for Informatics
** Yahoo! Research, Barcelona
RDF Datasets
Shady Elbassuoni, Keyword Search over RDF Graphs, CIKM 2011
subject predicate object
Traffic hasWonPrize Academy_Award
Innerspace hasWonPrize Academy_Award
Innerspace hasGenre Comedy
Joe_Dante directed Innerspace
Toy_Story hasWonPrize Academy_Award
Road_Trip hasGenre Comedy
Toy_Story hasGenre Comedy
Tom_Hanks actedIn Toy_Story
Diner hasWonPrize Academy_Award
Diner type Comedy_films
Steve_Guttenberg actedIn Diner
The_Pink_Panther type Criminal_comedy_films
The_Pink_Panther hasWonPrize Academy_Award
Police_Academy type Comedy_films
Steve_Guttenberg actedIn Police_Academy
The_Darwin_Awards type Comedy_films
Searching RDF Data
Structured triple-pattern queries (SPARQL)
Example: comedies that have won an Academy Award
SELECT ?m WHERE {
  ?m hasGenre Comedy .
  ?m hasWonPrize Academy_Award
}
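To make the semantics of the two triple patterns concrete, here is a minimal sketch of how such a query could be evaluated over a small in-memory triple list. It is not a SPARQL engine: the names `match` and `evaluate`, and the chosen subset of triples, are illustrative assumptions.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# A small subset of the example dataset.
TRIPLES: List[Triple] = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasGenre", "Comedy"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Toy_Story", "hasGenre", "Comedy"),
]


def match(pattern: Triple, bindings: Bindings,
          triples: List[Triple]) -> Iterator[Bindings]:
    """Yield extended variable bindings for every triple matching the pattern."""
    for triple in triples:
        new = dict(bindings)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Bindings]:
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        solutions = [b for s in solutions for b in match(pattern, s, triples)]
    return solutions


QUERY = [("?m", "hasGenre", "Comedy"), ("?m", "hasWonPrize", "Academy_Award")]
movies = sorted(s["?m"] for s in evaluate(QUERY, TRIPLES))
print(movies)
```

Because `?m` appears in both patterns, only movies satisfying both conditions survive the second step.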
Searching RDF Data
Triple-pattern queries are very expressive but not very usable
Most users and search APIs prefer keyword queries
Goal: support keyword search over RDF graphs
Keyword Search over RDF Data
How to process keyword queries?
  Translate keyword queries into SPARQL
  Directly process the queries over the RDF graph
What are the results to a keyword query?
  Resources
  Triples
  Tuples of triples (subgraphs)
Processing Keyword Queries
Construct a document D(t) for each triple t
D(t) contains all literals in t and any text associated with the URIs in t
Example:
t: Innerspace hasGenre Comedy
D(t): innerspace USA 1987 science fiction comedy film Joe Dante Michael Finnell Dennis Quaid Martin Short Meg Ryan academy award best visual effects …
We can now create triple-term indexes
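A minimal sketch of document construction and a triple-term index, assuming a simple inverted index from terms to the triples whose documents contain them. The `URI_TEXT` mapping is hypothetical text invented for illustration, and the fallback of splitting the URI name into words is also an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical text attached to each URI (not taken from the dataset).
URI_TEXT: Dict[str, str] = {
    "Innerspace": "innerspace science fiction comedy film",
    "Toy_Story": "toy story animated film",
    "Comedy": "comedy",
    "Academy_Award": "academy award",
}

TRIPLES: List[Triple] = [
    ("Innerspace", "hasGenre", "Comedy"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_document(triple: Triple) -> List[str]:
    """D(t): the terms of all text associated with the URIs in t."""
    tokens: List[str] = []
    for uri in triple:
        # Assumption: fall back to the URI name itself when no text is known.
        text = URI_TEXT.get(uri, uri.replace("_", " "))
        tokens.extend(tokenize(text))
    return tokens


def build_index(triples: List[Triple]) -> Dict[str, Set[Triple]]:
    """Inverted index: term -> set of triples whose document contains it."""
    index: Dict[str, Set[Triple]] = defaultdict(set)
    for triple in triples:
        for term in build_document(triple):
            index[term].add(triple)
    return index


INDEX = build_index(TRIPLES)
print(sorted(INDEX["award"]))
```

Note that a triple is indexed under every term in its document, so `Innerspace hasWonPrize Academy_Award` is also reachable through "comedy" via the text attached to `Innerspace`.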
Retrieving Query Results
Retrieve a list of triples matching each query keyword
Join the triples from different lists based on their URIs

Query: comedy award

Triples matching "comedy":
Innerspace hasGenre Comedy
Road_Trip hasGenre Comedy
Toy_Story hasGenre Comedy
Diner type Comedy_films
Police_Academy type Comedy_films
The_Darwin_Awards type Comedy_films
...

Triples matching "award":
Traffic hasWonPrize Academy_Award
Innerspace hasWonPrize Academy_Award
Toy_Story hasWonPrize Academy_Award
Diner hasWonPrize Academy_Award
The_Darwin_Awards type Comedy_films
...

Joined results:
T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
T: Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
T: Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films

Result ranking is crucial!
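The join step can be sketched as a nested loop over the two keyword lists, keeping pairs of distinct triples that share a URI. Joining only on subject and object (not on predicates) is an assumption made here, as are the names `shares_uri` and `join`; the lists are a reduced subset of the slide's example.

```python
from itertools import product
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Reduced per-keyword lists from the "comedy award" example.
COMEDY: List[Triple] = [
    ("Innerspace", "hasGenre", "Comedy"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Toy_Story", "hasGenre", "Comedy"),
    ("Police_Academy", "type", "Comedy_films"),
]
AWARD: List[Triple] = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("The_Darwin_Awards", "type", "Comedy_films"),
]


def shares_uri(a: Triple, b: Triple) -> bool:
    """True if the two triples have a subject or object in common."""
    # Assumption: predicates are ignored when looking for a join URI.
    return bool({a[0], a[2]} & {b[0], b[2]})


def join(left: List[Triple], right: List[Triple]) -> List[Tuple[Triple, Triple]]:
    """Pair up distinct triples from the two lists that share a URI."""
    return [(a, b) for a, b in product(left, right)
            if a != b and shares_uri(a, b)]


RESULTS = join(COMEDY, AWARD)
for a, b in RESULTS:
    print(" ".join(a), ".", " ".join(b))
```

The third pair (`Police_Academy` with `The_Darwin_Awards`) joins on `Comedy_films` and matches both keywords textually, yet it is not about award-winning comedies, which is why ranking matters.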
Language Models for Triples
t: Innerspace hasGenre Comedy

Word distribution P(w|D(t)), estimated from D(t):

w            P(w|D(t))
innerspace   0.234
1987         0.123
science      0.012
fiction      0.020
comedy       0.111
film         0.179
classic      0.111
meg          0.019
ryan         0.019
oscar        0.148
. . .        . . .
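A minimal sketch of estimating P(w|D(t)) from term counts. The slides do not say how the estimate is smoothed; the Jelinek-Mercer mixture with a collection model shown here, the weight `lam`, and the toy token lists are all assumptions.

```python
from collections import Counter
from typing import Dict, List


def mle(tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood estimate: relative frequency of each term."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed(word: str, doc: List[str], collection: List[str],
             lam: float = 0.8) -> float:
    """Jelinek-Mercer smoothing (an assumption, not stated on the slides)."""
    p_doc = mle(doc).get(word, 0.0)
    p_col = mle(collection).get(word, 0.0)
    return lam * p_doc + (1.0 - lam) * p_col


# Toy document for t and a toy collection (hypothetical text).
DOC = "innerspace 1987 science fiction comedy film comedy".split()
COLLECTION = DOC + "toy story animated film oscar".split()

P_DOC = mle(DOC)
print(P_DOC["comedy"])
print(smoothed("oscar", DOC, COLLECTION))
```

Smoothing gives a small non-zero probability to terms such as "oscar" that occur in the collection but not in D(t).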
Ranking Model
comedy award
T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
but we treat triples as bags of words!
Structure-aware component: the probability that the structure of triple t is relevant to keyword w
Estimating Structural Relevance
For each keyword, construct a probability distribution over predicates
Example: award
r P(r|w)
hasWonPrize 0.459
wasNominatedFor 0.387
type 0.112
directed 0.020
actedIn 0.021
producedIn 0.025
bornIn 0.008
. . . . . .
P(Innerspace hasWonPrize Academy_Award|award) = P(hasWonPrize|award)
P(r|w) is estimated from the whole dataset
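One plausible way to estimate P(r|w) from the whole dataset is to count the predicates of all triples whose documents contain the keyword and normalize. This counting scheme is an assumption, not necessarily the estimator used in the talk, and the `DOCUMENTS` mapping holds hypothetical term sets.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical term sets D(t) for a handful of triples.
DOCUMENTS: Dict[Triple, Set[str]] = {
    ("Innerspace", "hasWonPrize", "Academy_Award"): {"innerspace", "academy", "award"},
    ("Toy_Story", "hasWonPrize", "Academy_Award"): {"toy", "story", "academy", "award"},
    ("Traffic", "hasWonPrize", "Academy_Award"): {"traffic", "academy", "award"},
    ("The_Darwin_Awards", "type", "Comedy_films"): {"darwin", "award", "comedy", "films"},
    ("Innerspace", "hasGenre", "Comedy"): {"innerspace", "comedy"},
}


def predicate_distribution(keyword: str,
                           documents: Dict[Triple, Set[str]]) -> Dict[str, float]:
    """P(r|w): share of keyword-matching triples that use predicate r."""
    counts = Counter(t[1] for t, terms in documents.items() if keyword in terms)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {r: c / total for r, c in counts.items()}


P_AWARD = predicate_distribution("award", DOCUMENTS)
print(P_AWARD)
```

On this toy data, three of the four triples matching "award" use `hasWonPrize`, so that predicate dominates the distribution, in the same spirit as the table above.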
Example Ranked Query Results
Query: comedy award

Bag of Words
Combat_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award

Structure Aware
Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
Shrek hasWonPrize Academy_Award_Best_Animated_Feature . Shrek hasGenre Comedy
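The difference between the two rankings can be illustrated with a small scoring sketch. It assumes each keyword is scored against the triple it matched and that the structure-aware score simply multiplies the text probability P(w|D(t)) by P(r|w); this combination is an assumption for illustration, not the talk's exact ranking formula. The text probabilities and the "comedy" predicate probabilities are hypothetical; the "award" values come from the table above.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Match = Tuple[str, Triple]   # (keyword, triple that matched it)
Result = Tuple[Match, ...]   # one joined result

IG: Triple = ("Innerspace", "hasGenre", "Comedy")
IW: Triple = ("Innerspace", "hasWonPrize", "Academy_Award")
PA: Triple = ("Police_Academy", "type", "Comedy_films")
DA: Triple = ("The_Darwin_Awards", "type", "Comedy_films")

# Hypothetical text probabilities P(w|D(t)).
P_TEXT: Dict[Match, float] = {
    ("comedy", PA): 0.3, ("award", DA): 0.3,
    ("comedy", IG): 0.2, ("award", IW): 0.2,
}
# P(r|w): "award" rows from the slide, "comedy" rows hypothetical.
P_PRED: Dict[Tuple[str, str], float] = {
    ("hasWonPrize", "award"): 0.459, ("type", "award"): 0.112,
    ("hasGenre", "comedy"): 0.5, ("type", "comedy"): 0.4,
}


def bag_of_words_score(result: Result) -> float:
    """Product of text probabilities only."""
    score = 1.0
    for keyword, triple in result:
        score *= P_TEXT.get((keyword, triple), 0.0)
    return score


def structure_aware_score(result: Result) -> float:
    """Text probability times predicate relevance (assumed combination)."""
    score = 1.0
    for keyword, triple in result:
        text = P_TEXT.get((keyword, triple), 0.0)
        structure = P_PRED.get((triple[1], keyword), 0.0)
        score *= text * structure
    return score


def rank(results: List[Result], scorer: Callable[[Result], float]) -> List[Result]:
    """Sort results by descending score."""
    return sorted(results, key=scorer, reverse=True)


DARWIN: Result = (("comedy", PA), ("award", DA))
INNERSPACE: Result = (("comedy", IG), ("award", IW))
RESULTS = [DARWIN, INNERSPACE]

print(rank(RESULTS, bag_of_words_score)[0] is DARWIN)
print(rank(RESULTS, structure_aware_score)[0] is INNERSPACE)
```

With these numbers the text-only score prefers the `The_Darwin_Awards` result, while the structure-aware score promotes the `Innerspace` result because `hasWonPrize` is far more likely than `type` for the keyword "award".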
Experimental Setup
User study over two RDF datasets:
  Movies from IMDB
  Books from LibraryThing
Models compared:
  Structure Aware Approach
  Bag of Words Approach
  Language-model-based Object Retrieval
  BANKS (keyword search over databases)
30 evaluation queries
Gathered relevance assessments for the top-50 results retrieved by each model
Experimental Results
P-value < 0.05
Conclusion
Keyword search over RDF data is crucial
To support keyword search over RDF data:
  Combine structured triples with text
    Construct a document for each triple
  Retrieve meaningful query results
    Tuples of joined triples
    Can be extended to larger subgraphs of the RDF graph
  Rank the retrieved results
    A language model approach that uses both text and structure