Monkey with the Semantic Web

Semantic Searchmonkey


DESCRIPTION

Semantic Search + SearchMonkey talk given at the Yahoo! HackU event.

http://developer.yahoo.com/hacku
http://developer.yahoo.com/searchmonkey




SearchMonkey

Presentation by:

Paul Tarjan, Chief Technical Monkey


Online at:

http://www.slideshare.net/ptarjan/semantic-searchmonkey


The web was / is fragmented

University event page

Friend’s website

Cool bookmarks

Super secret military site

Funny pictures


So we added search to find stuff

University event page

Friend’s website

Cool bookmarks

Super secret military site

Funny pictures

Google, Yahoo


But there are many similar sites

Facebook Events, Evite Events, Upcoming Events

YouTube, Metacafe, Vimeo

Digg, Reddit, Technorati

Let’s treat these as “views” onto “objects”


Wouldn’t it be cool if you could do:

• object:video creator:"Paul Tarjan" length<=60s


Wouldn’t it be cool if you could do:

• object:video creator:http://paulisageek.com/ length<=60s


Wouldn’t it be cool if you could do:

• object:game name:"Desktop Tower Defense" version:1.5 publishdate:"May 2 2005"


Wouldn’t it be cool if you could do:

• object:video author:"The Escapist" game:"Left 4 Dead"
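None of these field queries is a real search syntax, but a small sketch shows what they would ask of an engine once pages expose objects instead of a bucket of words. It assumes objects are plain Python dicts with hypothetical fields (object, creator, length in seconds, site) and a made-up grammar of field:value terms plus numeric <=, >=, < and > comparisons:

```python
import operator
import re
import shlex

# Hypothetical structured records: the same kind of object ("video") exposed by different sites.
OBJECTS = [
    {"object": "video", "creator": "Paul Tarjan", "length": 45, "site": "youtube.com"},
    {"object": "video", "creator": "Paul Tarjan", "length": 300, "site": "vimeo.com"},
    {"object": "video", "creator": "Someone Else", "length": 30, "site": "metacafe.com"},
]

# A term is field, operator, value; "<=" and ">=" are listed before "<" and ">" so they win.
TERM = re.compile(r"^(\w+)(<=|>=|<|>|:)(.+)$")
NUMBER = re.compile(r"\d+(?:\.\d+)?")
COMPARE = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}


def parse_query(query):
    """Split 'creator:"Paul Tarjan" length<=60s' into (field, op, value) terms."""
    terms = []
    for token in shlex.split(query):  # shlex keeps quoted values such as "Paul Tarjan" together
        match = TERM.match(token)
        if match is None:
            raise ValueError(f"cannot parse term: {token!r}")
        terms.append(match.groups())
    return terms


def matches(obj, terms):
    for field, op, value in terms:
        if field not in obj:
            return False
        if op == ":":
            # Exact, case-insensitive match on the field value
            if str(obj[field]).lower() != value.lower():
                return False
        else:
            # Numeric comparison; a unit suffix such as "s" is ignored (assumed to match the stored unit)
            number = NUMBER.match(value)
            if number is None:
                raise ValueError(f"expected a number in {value!r}")
            if not COMPARE[op](obj[field], float(number.group())):
                return False
    return True


def search(query, objects=OBJECTS):
    terms = parse_query(query)
    return [obj for obj in objects if matches(obj, terms)]


print([o["site"] for o in search('object:video creator:"Paul Tarjan" length<=60s')])
```

Running it prints ['youtube.com']: the 300-second video fails length<=60s and the third video has a different creator. Note that the curly quotes of a copy-pasted query would defeat shlex, so plain ASCII quotes matter here.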


It gets even cooler


Aggregation:

• object:review type:camera make:nikon model:D40


Aggregation:

• object:event date:"May 16, 2008" type:party price<$5


Aggregation:

• object:photo person:"Paul Tarjan"


Aggregation:

• object:photo person:http://paulisageek.com
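The last two queries differ only in how the person is named, and that difference is what makes aggregation work: a display string like "Paul Tarjan" can belong to several people, while a URL picks out one. A minimal sketch that groups photos harvested from two made-up sites under one key per person URL; the records, the site URLs and the trailing-slash normalization are all assumptions of this sketch:

```python
from collections import defaultdict

# Hypothetical photo records harvested from two different (made-up) photo sites.
# Each one tags the person with a URL rather than a display name.
PHOTOS = [
    {"photo": "http://photos-a.example/1", "person": "http://paulisageek.com/"},
    {"photo": "http://photos-b.example/2", "person": "http://paulisageek.com"},
    {"photo": "http://photos-a.example/3", "person": "http://example.org/bob"},
]


def normalize(url):
    # Assumption for this sketch: a trailing slash does not change which person the URL names.
    return url.rstrip("/")


def photos_by_person(photos):
    """Aggregate photos from every site under one key per person URL."""
    grouped = defaultdict(list)
    for record in photos:
        grouped[normalize(record["person"])].append(record["photo"])
    return dict(grouped)


print(photos_by_person(PHOTOS)["http://paulisageek.com"])
```

Both photos of the same person come back together even though they were found on different sites.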


The Semantic What?

• Web pages are views of data for people to read
• Search engines are a hack
• They treat pages as a bucket of words
• Let’s turn the web into a database
• APIs are good, but there is no “web” of APIs
• If you figure out a good way of doing that, let me know


OK, I want to do it. Now what?


Recommendation: µF

• If there is a microformat for your data, use it:
  – hCard
  – hReview
  – hResume
  – hCalendar
  – rel-tag
  – rel-license
  – XFN
  – hAtom
  – geo


µF in a nutshell

• Change each @class to a name the microformat defines. This plain markup:

    <div>
      <span class="name">Paul Tarjan</span>
      <span class="email">[email protected]</span>
    </div>

• BECOMES an hCard:

    <div class="vcard">
      <span class="fn">Paul Tarjan</span>
      <span class="email">[email protected]</span>
    </div>
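Because the class names are now predictable, a parser can find the card without knowing anything about the site. A minimal sketch using Python's standard html.parser; it is a simplified hCard reader, not a full one, and assumes only the fn and email properties, no nested cards, and well-formed markup. The address someone@example.com is a placeholder:

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Simplified hCard reader: collects fn and email text inside class="vcard" elements."""

    PROPERTIES = {"fn", "email"}

    def __init__(self):
        super().__init__()
        self.cards = []
        self.current = None  # the card being filled, or None outside any vcard
        self.stack = []  # one (is_vcard, property name or None) entry per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        is_vcard = "vcard" in classes
        if is_vcard:
            self.current = {}
            self.cards.append(self.current)
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self.stack.append((is_vcard, prop))

    def handle_endtag(self, tag):
        if self.stack:
            is_vcard, _ = self.stack.pop()
            if is_vcard:
                self.current = None

    def handle_data(self, data):
        if self.current is None:
            return
        # Attribute the text to the nearest enclosing element that carries a property class
        for _, prop in reversed(self.stack):
            if prop:
                self.current[prop] = self.current.get(prop, "") + data.strip()
                break


parser = HCardParser()
parser.feed(
    '<div class="vcard">'
    '<span class="fn">Paul Tarjan</span> '
    '<span class="email">someone@example.com</span>'
    "</div>"
)
print(parser.cards)
```

Feeding it the "before" version (class="name" and no vcard) yields no cards at all, which is the point of switching to the agreed-upon class names.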


Recommendation: RDFa

• If you have data that doesn’t really fit in a µF
• Examples:
  – API documentation markup (YUI, Javadoc, etc.)
  – Media (audio, videos, games, presentations)
  – Job postings


RDFa in a nutshell

• Declare a namespace for your vocabulary (xmlns:prefix="URL")
• Use @property, @rel and @resource
• For DATA: @property makes the element’s contents the value
• For URLs: @rel makes the @resource the value


Normal HTML

    <html> …
    <div class="private">
      private static String <strong>_createCookieHash</strong> (hash) …


RDFa: example

    <html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#"> …
    <div class="private" rel="yui:method" resource="#method__createCookieHash">
      private static String <strong property="yui:name">_createCookieHash</strong> (hash) …


That’s it!

• Automatically picked up by semantic parsers / crawlers
• You can build a SearchMonkey app on it
• Mashups become much easier than with screen scraping
• You can get the data from Yahoo! BOSS


What is SearchMonkey?

An open platform for using structured data to build more useful and relevant search results

(Screenshots: a search result before and after)


Enhanced Result: Zagat

(Screenshot callouts: image, key/value pairs or abstract, links)


Infobar: Wikipedia Preview

(Screenshot callout: summary blob)


Part of the puzzle

SearchMonkey

Semantic markup on web pages

Semantic vocabularies


Vocabularies

• Need to speak the same language
• I like to see girls of that... caliber.
• English, French, Spanish, Esperanto?
• URLs to the rescue
  – Dublin Core (http://purl.org/dc/elements/1.1/)
  – Friend of a Friend (http://xmlns.com/foaf/0.1/)
  – XHTML Friends Network (http://gmpg.org/xfn/11/)
  – … (many more)


Syntax

• Nouns, verbs, and adjectives, oh my!
• Every phrase becomes a set of triples
  – (Subject, Verb / Adj. / Prep. / etc., Object)
• Key / value pairs ++
  – Everything is a URL or a string
  – The subject doesn’t have to be the document


Syntax 2

• Key / value pairs
  – Title = Awesome SearchMonkey Presentation
  – Homepage = http://search.yahoo.com/searchmonkey
• Triples
  – (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”)
  – (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
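Turning the key/value pairs into triples only adds an explicit subject to each pair. A short sketch, using the slide's shorthand property URLs as a lookup table; the table, the function name and the use of "self" for the current document are illustrative choices, and the shorthand URLs are not the full vocabulary URLs:

```python
# The property URLs are the slide's own shorthand, not full vocabulary URLs.
PROPERTY_URLS = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def pairs_to_triples(subject, pairs):
    """Each key/value pair becomes one (subject, property URL, value) triple."""
    return [(subject, PROPERTY_URLS[key], value) for key, value in pairs.items()]


page = {
    "Title": "Awesome SearchMonkey Presentation",
    "Homepage": "http://search.yahoo.com/searchmonkey",
}
for triple in pairs_to_triples("self", page):
    print(triple)
```

Passing some other URL as the subject is what lets the same kind of pair describe something other than the current page.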


Decompose to triples

• My friend “Bob” is an idiot.
  – (self, http://xmlns.com/foaf/0.1/knows, genid:Ui__152310312_366)
  – (genid:Ui__152310312_366, http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”)
  – (genid:Ui__152310312_366, http://example.org/ptarjan/isInstanceOf, http://example.org/ptarjan/idiot)
• Unnamed nodes are O.K.


Writing URLs takes a lot of work!

• xmlns:foaf=http://xmlns.com/foaf/0.1/
• xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0#
• xmlns:junk=http://example.org/ptarjan/
• My friend “Bob” is an idiot.
  – (self, foaf:knows, genid:Ui__152310312_366)
  – (genid:Ui__152310312_366, vcard:fn, “Bob”)
  – (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot)
• Unnamed nodes are O.K.
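The prefix trick is plain string substitution, and an unnamed node is just a fresh local label. A sketch that writes the Bob triples in compact form and expands them back to the full URLs given for the same sentence earlier; the _:bN blank-node labels and the Literal marker class are assumptions of this sketch (the genid:Ui__152310312_366 label came from some other tool):

```python
import itertools

# Prefixes from the slide; "junk" is the author's example namespace.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}

_blank_ids = itertools.count(1)


class Literal(str):
    """Marks a plain string value so it is never mistaken for a prefixed name."""


def new_blank_node():
    # Hypothetical "_:bN" labels; any locally unique name works for an unnamed node.
    return f"_:b{next(_blank_ids)}"


def expand(term):
    """Expand 'prefix:local' into a full URL when the prefix is known; otherwise return it unchanged."""
    if isinstance(term, Literal):
        return term
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


# My friend "Bob" is an idiot.
bob = new_blank_node()
compact = [
    ("self", "foaf:knows", bob),
    (bob, "vcard:fn", Literal("Bob")),
    (bob, "junk:isInstanceOf", "junk:idiot"),
]
expanded = [tuple(expand(term) for term in triple) for triple in compact]
for triple in expanded:
    print(triple)
```

The blank node survives expansion untouched because "_" is not a declared prefix, and the Literal wrapper keeps a value such as "Bob" from ever being treated as a prefixed name.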


RDFa

    <html xmlns:foaf="http://xmlns.com/foaf/0.1/"
          xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
          xmlns:junk="http://example.org/ptarjan/">
      <div rel="foaf:knows">
        <span property="vcard:fn">Bob</span>
        <span rel="junk:isInstanceOf" resource="junk:idiot" />
      </div>
    </html>
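A parser reading markup like this applies the two rules (@property takes the element's text, @rel takes @resource) and turns a @rel with no @resource into an unnamed node that the child elements go on to describe. The sketch below is a deliberately small subset of RDFa written with Python's html.parser: it ignores @about, @typeof, @href and @src, does not resolve relative URLs, assumes one @rel value and well-formed markup, and emits the link to a blank node immediately instead of waiting for a child to complete it. Treat it as an illustration, not a conforming processor:

```python
import itertools
from html.parser import HTMLParser


class MiniRDFaParser(HTMLParser):
    """Tiny RDFa subset: xmlns:* prefixes, @property, @rel and @resource only."""

    def __init__(self, document_uri="self"):
        super().__init__()
        self.triples = []
        self._bnodes = itertools.count(1)
        # One frame per open element: (prefix map, subject for child elements, pending @property or None)
        self._stack = [({}, document_uri, None)]

    @staticmethod
    def _expand(prefixes, name):
        prefix, sep, local = name.partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + local
        return name  # unknown prefix or fragment such as "#method__x": keep as written

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited, subject, _ = self._stack[-1]
        prefixes = dict(inherited)  # prefix declarations are scoped to this element
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                prefixes[name[len("xmlns:"):]] = value
        child_subject = subject
        if attrs.get("rel"):
            if attrs.get("resource"):
                # URL value: @resource becomes the object
                obj = self._expand(prefixes, attrs["resource"])
            else:
                # "Hanging" @rel: the object is a new unnamed (blank) node
                obj = f"_:b{next(self._bnodes)}"
            self.triples.append((subject, self._expand(prefixes, attrs["rel"]), obj))
            child_subject = obj  # children describe the object
        pending = None
        if attrs.get("property"):
            # DATA value: the element's text becomes the literal, emitted at the end tag
            pending = (subject, self._expand(prefixes, attrs["property"]), [])
        self._stack.append((prefixes, child_subject, pending))

    def handle_data(self, data):
        # Text counts toward every open @property element that contains it
        for _, _, pending in self._stack:
            if pending is not None:
                pending[2].append(data)

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            _, _, pending = self._stack.pop()
            if pending is not None:
                subject, predicate, chunks = pending
                self.triples.append((subject, predicate, "".join(chunks)))


SLIDE_MARKUP = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
      xmlns:junk="http://example.org/ptarjan/">
  <div rel="foaf:knows">
    <span property="vcard:fn">Bob</span>
    <span rel="junk:isInstanceOf" resource="junk:idiot" />
  </div>
</html>"""

parser = MiniRDFaParser()
parser.feed(SLIDE_MARKUP)
for triple in parser.triples:
    print(triple)
```

It prints the same three triples as the decomposition of the Bob sentence, with _:b1 standing in for genid:Ui__152310312_366. Note that this sketch accepts a bare CURIE in @resource, as RDFa 1.1 does; RDFa 1.0 expected the bracketed safe-CURIE form there.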


•  </SemanticWeb>

• Questions?


Innards of SearchMonkey

• You build a web service inside our framework
• When a search page renders:
  – We check which SearchMonkey (SM) apps are enabled
  – We call them, with a 50 ms limit for in-page results and a much longer limit for AJAX
  – They return data in our template
  – We render the results (and cache them)


Prototyping with XSLT

• What if I don’t have structured data?
  – I don’t own the site
  – I do own the site, but I want to prototype first
• Build an XSLT custom data service first
  – Write some XSLT to extract the data and transform it into DataRSS
  – It is mostly about finding the right XPath (use Firebug or XPather)
  – Quick to implement, but brittle
  – Can’t do a good Enhanced Result
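The XPath half of that workflow can be tried without XSLT. This sketch uses the limited XPath support in Python's xml.etree.ElementTree on a made-up, well-formed page and returns a plain dictionary; it does not produce real DataRSS, and the page, class names and field paths are all hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment for a restaurant listing.
PAGE = """<html><body>
  <h1 class="name">Example Bistro</h1>
  <div class="info">
    <span class="rating">4.5</span>
    <span class="price">$$</span>
  </div>
</body></html>"""

# One XPath per field: finding these is most of the work, and they break when the layout changes.
FIELD_PATHS = {
    "name": ".//h1[@class='name']",
    "rating": ".//div[@class='info']/span[@class='rating']",
    "price": ".//div[@class='info']/span[@class='price']",
}


def extract(page, field_paths=FIELD_PATHS):
    """Return {field: text} for every path that matches an element with text."""
    root = ET.fromstring(page)
    data = {}
    for field, path in field_paths.items():
        node = root.find(path)
        if node is not None and node.text:
            data[field] = node.text.strip()
    return data


print(extract(PAGE))
```

If the site renames class="rating", find returns None and the field silently disappears, which is exactly the brittleness the slide warns about. Real-world HTML is rarely well-formed XML, so a prototype against live pages would need an HTML-tolerant parser first.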


Do it for real

• Demo


Examples

• Rubik’s Cube
• VTA Bus
• API Monkey
• BugMeNot
• RetailMeNot
• Amazon


Questions?