APIs, Freebase, and the Collaborative Semantic Web


A presentation about the state of the collaborative semantic web, including: - What? - Why? - Where do we stand? - A case study on Metaweb's Freebase project


Diana Tamabayeva & Dan Delany

Dan Delany

dan.delany@gmail.com

cognitiveharmony.net

970‐309‐8598

Object‐Oriented Representation

String of characters ‐ useful to people:

“John Smith is a five‐foot‐tall male who weighs one‐hundred eighty pounds. He’s forty‐two years old and he loves dogs.”

Structured data in an object ‐ useful to computers:

sortByHeight(users);

isADude("John Smith");

Object‐Oriented Representation

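# Mammal and go are assumed to be defined elsewhere.
class Dog < Mammal
  attr_accessor :home, :age, :human

  def goHome
    go(@home)
  end
end

class Human < Mammal
  attr_accessor :home, :age, :pets, :spouse, :friends

  def divorce
    @spouse = nil
  end
end

class Place
  attr_accessor :latlong, :elevation, :address, :name

  def isBelowSeaLevel
    @elevation < 0
  end
end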

Diagram labels: Home, Human, Pet, Spouse, Friends

O‐O instance variables and methods represent structured semantic relationships between pieces of information.

Structured Semantic Relationships Among Objects

We visualize this with a semantic graph.
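As a rough sketch of that idea, the snippet below turns instance variables into labelled graph edges. The `Node` base class, the `edges` helper, and the dog's name are hypothetical and exist only for this illustration.

```ruby
# Hypothetical classes for illustration; only the idea (instance variables
# read as labelled edges) comes from the slides.
class Node
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Return [subject, predicate, object] triples for every instance variable
  # (other than @name) that points at another Node or a list of Nodes.
  def edges
    instance_variables.reject { |ivar| ivar == :@name }.flat_map do |ivar|
      targets = Array(instance_variable_get(ivar)).grep(Node)
      targets.map { |target| [name, ivar.to_s.delete_prefix("@"), target.name] }
    end
  end
end

class Place < Node; end

class Human < Node
  attr_accessor :home, :spouse, :pets

  def initialize(name)
    super
    @pets = []
  end
end

class Dog < Node
  attr_accessor :home, :human
end

house = Place.new("Home")
dan   = Human.new("Dan")
rex   = Dog.new("Rex")

dan.home = house
dan.pets << rex
rex.home  = house
rex.human = dan

# Collect the edges of every object into one flat semantic graph.
graph = [dan, rex].flat_map(&:edges)
graph.each { |subject, predicate, object| puts "#{subject} --#{predicate}--> #{object}" }
```

Each printed line is one edge of the graph, so the object model and the picture carry the same information.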

The (hypertext) Web Today

Diagram labels: User Computer, WWW, Web Servers (LAMP), Servers

The (semantic) Web Tomorrow?

Diagram labels: User Computer, Servers, Data Aggregator/Visualizer

Legend: Structured Data (OWL/RDF); Document Data (HTML/CSS); Hyperlinks (old WWW links)

A Cloud that Talks To Itself.

Why? Searchability.

With semantic graphs, you can perform semantic searches by traversing the graph:

“Where does the woman who lives at 2408 Walnut work?”
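A minimal sketch of such a traversal, assuming the graph is stored as a flat list of subject/predicate/object triples. The people, predicates, and workplaces are invented for the example.

```ruby
# Hypothetical sample facts as [subject, predicate, object] triples.
TRIPLES = [
  ["Alice", "gender",   "female"],
  ["Alice", "lives_at", "2408 Walnut"],
  ["Alice", "works_at", "City Library"],
  ["Bob",   "gender",   "male"],
  ["Bob",   "lives_at", "2408 Walnut"],
  ["Bob",   "works_at", "Corner Bakery"],
  ["Carol", "gender",   "female"],
  ["Carol", "lives_at", "17 Pine"],
  ["Carol", "works_at", "Town Hall"]
].freeze

# All subjects that have the given predicate/object edge.
def subjects_with(predicate, object)
  TRIPLES.select { |_subj, pred, obj| pred == predicate && obj == object }.map(&:first)
end

# All objects reachable from a subject along the given predicate.
def objects_of(subject, predicate)
  TRIPLES.select { |subj, pred, _obj| subj == subject && pred == predicate }.map(&:last)
end

# "Where does the woman who lives at 2408 Walnut work?"
residents  = subjects_with("lives_at", "2408 Walnut")
women      = residents & subjects_with("gender", "female")
workplaces = women.flat_map { |person| objects_of(person, "works_at") }

puts workplaces.inspect
```

The question becomes three hops: address to residents, residents filtered by gender, then person to workplace.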

Example: Image Analysis

Why? Context‐Aware AI

Today’s AI is limited by the domain‐specificity of its input data

Context‐Unaware: Blob Tracking

Diagram labels: Blob 1, Blob 2

Useful in its domain (in this case, touch detection), but ultimately has no ‘intelligence.’


Context‐Aware: Mine the semantic graph for heuristics and clues

Image labels: Dan, Rachel, Taifur, Trees, Sky

You’re in the mountains near Aspen, CO. It is the fall. Your friends Taifur and Rachel are at the same location. Etc.
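One way to picture "mining the graph for clues" is the sketch below. The check-in table, place names, friend list, and photo date are all hypothetical; a real system would pull them from the semantic graph.

```ruby
require "date"

# Hypothetical facts a semantic graph might hold about people and places.
CHECK_INS = {
  "Dan"    => "Trailhead",
  "Rachel" => "Trailhead",
  "Taifur" => "Trailhead",
  "Sam"    => "Downtown"
}.freeze

PLACES = {
  "Trailhead" => { terrain: "mountains", near: "Aspen, CO" },
  "Downtown"  => { terrain: "city",      near: "Denver, CO" }
}.freeze

FRIENDS = { "Dan" => %w[Rachel Taifur Sam] }.freeze

# Northern-hemisphere season by calendar month.
def season(date)
  case date.month
  when 3..5  then "spring"
  when 6..8  then "summer"
  when 9..11 then "fall"
  else            "winter"
  end
end

# Combine place, date, and friend facts into a plain-language description.
def describe_context(user, date)
  place  = CHECK_INS.fetch(user)
  info   = PLACES.fetch(place)
  nearby = FRIENDS.fetch(user, []).select { |friend| CHECK_INS[friend] == place }
  "You're in the #{info[:terrain]} near #{info[:near]}. It is the #{season(date)}. " \
    "Friends at the same location: #{nearby.join(' and ')}."
end

context = describe_context("Dan", Date.new(2008, 10, 4))
puts context
```

None of these facts come from the pixels; they come from joining the photo's metadata against other nodes in the graph.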

Why? Context‐Aware AI

I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web ‐ the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day‐to‐day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

Tim Berners‐Lee, 1999


Challenges ‐ Motivation/Critical Mass

Value comes from ubiquity, and ubiquity comes from value. How do we encourage adoption of technology that does not yet provide value?

More importantly, how will it provide value to content producers? How is giving away your data’s meaning (your secret sauce) instead of presenting it alongside ads valuable?

Should semantic feeds be monetized? “Information As A Service”

Challenges ‐ Privacy/Security

• Reduced anonymity on the Web
• Increased invasion of privacy
• Solution: access privileges must be controlled by infrastructure‐level security

The State of the Semantic Web

Top‐Down: Content Structuring

• Semantic search engines generate and leverage an internal semantic database
• Data still returned as HTML, no API: not helping create the semantic web!

Top‐Down: Dapper

Diagram labels: WWW, RSS, CSV, OWL, RDF, to semantic web

• Dapper ‐ UI for setting up rules to scrape semi‐structured data (CSV, RSS, XML) from any set of HTML documents!
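To make the idea of scraping rules concrete, here is a toy sketch. It is not how Dapper works internally. The HTML fragment, the rule format, and the `scrape` helper are assumptions made for illustration.

```ruby
require "json"

# Hypothetical HTML fragment; a real page would be fetched over HTTP.
HTML = <<~PAGE
  <ul>
    <li class="track"><span class="name">One Love</span> <span class="artist">Massive Attack</span></li>
    <li class="track"><span class="name">Oh My Lover</span> <span class="artist">PJ Harvey</span></li>
  </ul>
PAGE

# A "rule" maps a field name to a pattern whose first capture group is the value.
RULES = {
  "name"   => /<span class="name">(.*?)<\/span>/,
  "artist" => /<span class="artist">(.*?)<\/span>/
}.freeze

# Split the page into records, then apply every rule to each record.
def scrape(html, record_pattern, rules)
  html.scan(record_pattern).map do |(record)|
    rules.to_h { |field, pattern| [field, record[pattern, 1]] }
  end
end

rows = scrape(HTML, /<li class="track">(.*?)<\/li>/, RULES)
puts JSON.pretty_generate(rows)
```

Regular expressions are brittle against real-world HTML. They are used here only to keep the sketch dependency-free.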

Top‐Down: Yahoo! Pipes

• Yahoo! Pipes ‐ UI for transforming, reformatting, and combining data feeds into more useful data feeds.

Diagram labels: RSS, JSON, CSV, OWL, RDF, to semantic web
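The pipe metaphor can be sketched as a chain of small list-to-list stages. The feed items and stage names below are hypothetical and do not reflect the Yahoo! Pipes implementation.

```ruby
require "time"

# Hypothetical feed items; a real pipe would fetch RSS or JSON over HTTP.
FEED_A = [
  { "title" => "Semantic web primer", "published" => "2008-11-02T09:00:00Z" },
  { "title" => "Gardening tips",      "published" => "2008-11-01T12:00:00Z" }
].freeze

FEED_B = [
  { "title" => "Querying the semantic graph", "published" => "2008-11-03T15:30:00Z" }
].freeze

# Each stage takes a list of items and returns a new list, so stages chain.
def union(*feeds)
  feeds.flatten(1)
end

def filter_by_keyword(items, keyword)
  items.select { |item| item["title"].downcase.include?(keyword.downcase) }
end

def newest_first(items)
  items.sort_by { |item| Time.iso8601(item["published"]) }.reverse
end

piped = newest_first(filter_by_keyword(union(FEED_A, FEED_B), "semantic"))
piped.each { |item| puts "#{item['published']}  #{item['title']}" }
```

Because every stage has the same shape, stages can be reordered or swapped without touching the others.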

Bottom‐Up: Publishing Standards

RDF and OWL standards

• W3C‐sponsored standards for defining semantic relationships and resources
• Powerful, but complex and hard for humans to read/create
• No motivation for developers to create
• No W3C‐sponsored universal ontologies

Diagram labels: RDF Data Node, OWL Ontology; Dan’s Car, Car, Vehicle, Color; isA, hasA, Has Multiple (4); make, year, color; 1999

<rdf:Description rdf:about="http://.../DansCar">
  <car:color>Red</car:color>
  <car:make>Honda</car:make>
  <car:year>1999</car:year>
</rdf:Description>

<#Car> a owl:Class ;
  rdfs:subClassOf <#Vehicle> ;
  rdfs:subClassOf [ a owl:Restriction ;
    owl:onProperty <#Wheel> ;
    owl:cardinality "4"^^xsd:nonNegativeInteger ] .
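The split between data and ontology can be sketched in plain Ruby. This is a toy model, not an RDF or OWL library. The wheel identifiers and helper names are hypothetical, while color, make, year, and the four-wheel restriction come from the slide.

```ruby
# Ontology (class-level knowledge): Car is a Vehicle and has exactly 4 wheels.
SUBCLASS_OF = { "Car" => "Vehicle" }.freeze
CARDINALITY = { "Car" => { "wheel" => 4 } }.freeze

# Data node (instance-level facts) as [subject, predicate, object] triples.
# The wheel identifiers are hypothetical.
DATA = [
  ["DansCar", "type",  "Car"],
  ["DansCar", "color", "Red"],
  ["DansCar", "make",  "Honda"],
  ["DansCar", "year",  "1999"],
  *%w[FL FR RL RR].map { |pos| ["DansCar", "wheel", "Wheel#{pos}"] }
].freeze

# All objects attached to a subject through the given predicate.
def values(subject, predicate)
  DATA.select { |subj, pred, _obj| subj == subject && pred == predicate }.map(&:last)
end

# Walk the subClassOf chain to answer "is X a Y?"
def kind_of_class?(subject, klass)
  current = values(subject, "type").first
  while current
    return true if current == klass
    current = SUBCLASS_OF[current]
  end
  false
end

# Check every cardinality restriction declared for the subject's class.
def satisfies_restrictions?(subject)
  klass = values(subject, "type").first
  CARDINALITY.fetch(klass, {}).all? do |property, count|
    values(subject, property).length == count
  end
end

puts kind_of_class?("DansCar", "Vehicle")
puts satisfies_restrictions?("DansCar")
```

The data says nothing about vehicles. That fact is inferred by following the ontology's subclass edge.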

APIs: I dream of a RESTful tomorrow

Thousands of content creators are already sharing structured data with APIs!

Bottom‐Up: Knowledge Bases

• Collaborative attempt to build factual knowledge base
• “Object‐Oriented Wikipedia”

Diagram labels: Domain Experts

Bottom‐Up: Knowledge Bases

• Content contributed by experts manually, or by dataset owners automatically.

Diagram labels: Domain Experts, Semantic Analysis

Freebase: The Everything Graph

Metaweb Freebase Statistics

• Freebase Launch, March 2007
• LLC Founded, July 2005

• 25,379 users today.
• Growing by 600‐800/month

• 5.3 million topics today
• Growing by ~15,000/month
• Pulled from public data
• By comparison, Wikipedia has 2.64 million English articles

The Freebase Approach

Diagram labels: Creative Commons Attribution License; Casual Collaborators; Node Editor GUI; App Developers; Data Modelers; Expert Users & Dataset Owners; Open API (MQL); RDF API; WWW; Existing Datasets; Content Publishers

Freebase Data Policy

Creative Commons Attribution License

• Requires Attribution of Source

Diagram labels: Open API (MQL), RDF API

MQL Query:

[{
  "album" : {
    "artist" : [],
    "name" : null,
    "release_date" : null
  },
  "limit" : 25,
  "name" : null,
  "name~=" : "Love*",
  "type" : "/music/track"
}]

Returns:

[{
  "album" : {
    "artist" : ["Massive Attack"],
    "name" : "Blue Lines",
    "release_date" : "1991-04-08"
  },
  "name" : "One Love",
  "type" : "/music/track"
},
{
  "album" : {
    "artist" : ["Squirrel Nut Zippers"],
    "name" : "The Inevitable",
    "release_date" : "1995-03-17"
  },
  "name" : "Anything But Love",
  "type" : "/music/track"
},
{
  "album" : {
    "artist" : ["PJ Harvey"],
    "name" : "Dry",
    "release_date" : "1992-02-11"
  },
  "name" : "Oh My Lover",
  "type" : "/music/track"
},
  ...
]
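The query-by-example style can be imitated over an in-memory list, as in the toy sketch below. This is not the MQL engine. The matching rules (null and [] mean "fill in", "~=" means a word-prefix pattern) are inferred from the slide, and the helper names and the non-matching record are hypothetical.

```ruby
# Toy records: two from the results on the slide, one hypothetical non-match.
TRACKS = [
  { "type" => "/music/track", "name" => "One Love",
    "album" => { "name" => "Blue Lines", "artist" => ["Massive Attack"], "release_date" => "1991-04-08" } },
  { "type" => "/music/track", "name" => "Oh My Lover",
    "album" => { "name" => "Dry", "artist" => ["PJ Harvey"], "release_date" => "1992-02-11" } },
  { "type" => "/music/track", "name" => "Example Song",
    "album" => { "name" => "Example Album", "artist" => ["Example Artist"], "release_date" => nil } }
].freeze

QUERY = {
  "album"  => { "artist" => [], "name" => nil, "release_date" => nil },
  "limit"  => 25,
  "name"   => nil,
  "name~=" => "Love*",
  "type"   => "/music/track"
}.freeze

# Turn a glob such as "Love*" into a word-anchored, case-insensitive regexp.
# This approximates the "~=" operator; its exact semantics are assumed.
def pattern_to_regexp(pattern)
  body = Regexp.escape(pattern).gsub("\\*") { "\\w*" }
  Regexp.new("\\b#{body}\\b", Regexp::IGNORECASE)
end

# Return the filled-in result for one record, or nil when a constraint fails.
def match_query(query, record)
  result = {}
  query.each do |key, wanted|
    next if key == "limit"
    if key.end_with?("~=")
      field = key.delete_suffix("~=")
      return nil unless record[field].to_s.match?(pattern_to_regexp(wanted))
    elsif wanted.nil? || wanted == []
      result[key] = record[key]                 # blank slot: fill it in
    elsif wanted.is_a?(Hash)
      nested = match_query(wanted, record[key] || {})
      return nil if nested.nil?
      result[key] = nested
    else
      return nil unless record[key] == wanted   # literal: must match exactly
      result[key] = wanted
    end
  end
  result
end

def run_query(query, records)
  records.filter_map { |record| match_query(query, record) }
         .first(query.fetch("limit", records.length))
end

results = run_query(QUERY, TRACKS)
results.each { |track| puts "#{track['name']} (#{track['album']['name']})" }
```

The query has the same shape as the answer, so a client can read results back with the same keys it asked for.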

Data Freedom!


Freebase Community

Diagram labels: Casual Collaborators; App Developers; Data Modelers; Expert Users & Dataset Owners; App Developers List; Data Modelers List; IRC Chat (open to all); Discussion Threads on Individual Topics

• No Central Forum
• No Backlog of Mailing List
• No Friends
• No Private Messages

“As a programmer, I feel that I'm most effective when I'm contributing large datasets… facilities built into Freebase that let users upload lists of topics are limited to specific situations.” ‐ Shawn Simister

Freebase Community Tools

Diagram labels: Casual Collaborators; App Developers; Data Modelers; Expert Users & Dataset Owners; Employees

“Acre, the Freebase application development platform, lets anyone mash up Freebase data using Javascript and have it hosted for free.” ‐ Shawn Simister, developer

“We have been working hard recently to provide bulk import tools for Freebase. While such tools exist internally, the reconciliation process has thus far been too complicated for public release.” ‐ Brian Culbertson, Metaweb Engineer

“[Data Modeling is hard because new schemas are a slow process, and they can break users’ code. Let’s have a “Sloppy Freebase” that allows users to enter unstructured data until new schemas are defined.]” ‐ Jack Alves, former Metaweb Director of Engineering

“Sloppy” Data Modeling


Currently, Freebase users cannot submit data if there is not already a data structure + ontology built for that data type.

“Sloppy” data creation allows users to create their own data types, which will later be cleaned and standardized.

Casual Collaborators

User‐Generated, Semi‐structured “sloppy” data:

• Bob Jones ‐ major: English
• Eric Bradley ‐ studying: CS
• John Green ‐ focus: Sociology

Automated Data Cleaner‐Upper

Clean, structured data:

• Bob Jones ‐ area of study: English
• Eric Bradley ‐ area of study: CS
• John Green ‐ area of study: Sociology

Diagram labels: OWL, RDF, to semantic web
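A minimal sketch of what such a cleaner might do, assuming a hand-maintained synonym table. The table and helper names are hypothetical, and a real cleaner would need to infer or crowdsource these mappings.

```ruby
# Hypothetical synonym table: canonical property => user-invented variants.
SYNONYMS = {
  "area of study" => ["major", "studying", "focus"]
}.freeze

# Invert the table once so each sloppy key maps straight to its canonical name.
CANONICAL = SYNONYMS.flat_map { |canon, variants| variants.map { |v| [v, canon] } }.to_h.freeze

# The sloppy records from the slide.
SLOPPY = [
  { "name" => "Bob Jones",    "major"    => "English" },
  { "name" => "Eric Bradley", "studying" => "CS" },
  { "name" => "John Green",   "focus"    => "Sociology" }
].freeze

# Rename known variants; leave unrecognized keys untouched for later review.
def clean(record)
  record.transform_keys { |key| CANONICAL.fetch(key.downcase.strip, key) }
end

cleaned = SLOPPY.map { |record| clean(record) }
cleaned.each { |record| puts "#{record['name']} - area of study: #{record['area of study']}" }
```

Keys that the table does not recognize pass through unchanged, so users can keep entering new properties before a schema exists.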

Conclusion

We are here.

• Progress
  • RDF/OWL
  • Freebase ‐ 5.3 mil. topics!
  • Dapper, Yahoo! Pipes, other data abstractors
  • APIs out the wazoo

• Issues
  • Adoption
  • Privacy/Security
  • Intellectual Property
  • We’re good at making content, but we suck at mining it and describing it.