Why Metadata Matters: From a Search Engine Perspective. Schema 101 By: Barbara Starr Twitter: @BarbaraStarr Email: [email protected]


Schema 101, Why the New Metadata Matters. "From a Search Engine Perspective" SMX East 2012


Page 1: Smxeastbarbarastarr2012

Why Metadata Matters: From a Search Engine Perspective.

Schema 101

By: Barbara Starr
Twitter: @BarbaraStarr
Email: [email protected]

Page 2: Smxeastbarbarastarr2012

• Pursued a doctorate in Artificial Intelligence in South Africa in the '80s.

• Recruited to build intelligent/predictive trading systems on Wall Street

• Migrated to government-based contracts, several of which turned into real-world products like
  – SIRI (PAL from DARPA)
  – WATSON (Acquaint; IBM Watson Labs was a team member)

• From the vantage of a semantic technologist, I keenly watched the evolution of the Semantic Web.

• “Shocked into the real world” when working as a consultant @ Overstock

• Today: Educator, Consultant, Developer.

Meta Information: ME

By: Barbara Starr
Twitter: @BarbaraStarr
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/barbarastarr

My favorite author: Isaac Asimov

Favorite book: I, Robot

Favorite character: MULTIVAC

Page 3: Smxeastbarbarastarr2012

Additional Meta Information

For the purpose of this talk:

same-as

MY ROBOT or Artificially Intelligent Entity or Search Engine

Page 4: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

How can I exploit metadata or

“semantic search”?

Page 5: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

RICH SNIPPETS 2009

tiles

Searchmonkey 2008

I can directly extract information to enhance SERP displays

Page 6: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

I can search directly on consumed metadata!

Page 7: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

I can provide direct answers to queries by

searching on consumed, verified and validated information

Page 8: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

I can even aggregate answers or deduce them (like a timeline of events)

Page 9: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

I can even use it in conjunction with machine learning techniques, e.g. to train other components.

I can detect relevancy signals, i.e. what content to show to what audience.

I can use it to assist in interpreting a user query.

Penn Treebank tagset

?

Page 10: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Really interesting in terms of exposing long-tail content too. It makes things findable for me when pages are published with structured markup!

I meant the beer brewer in Arizona

Page 11: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

I’m a Search Engine Robot

I could really use this stuff. And it is like the Tower of Babel out there!

Syntax: Microdata, Microformats, RDFa

Ontology: Vocabulary or lexicon

Multiple conflicting vocabularies that I will have to align internally, and multiple syntax formats as well.

Prior to Schema.org

Goodrelations for e-commerce
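
What "aligning vocabularies internally" could look like, as a minimal sketch rather than how any search engine actually does it: the property names (`name`, `telephone`, `gr:name`, `fn`, `tel`) are real terms from schema.org, GoodRelations and hCard, while the `ALIGNMENT` table, the `align` helper, the internal field names and the sample records are all hypothetical.

```python
# Hypothetical alignment table: maps (vocabulary, property) pairs to one
# internal field name. Only the vocabulary property names are real; the
# internal names and the table itself are illustrative assumptions.
ALIGNMENT = {
    ("schema.org", "name"): "name",
    ("goodrelations", "gr:name"): "name",
    ("hcard", "fn"): "name",
    ("schema.org", "telephone"): "phone",
    ("hcard", "tel"): "phone",
}


def align(vocabulary, properties):
    """Translate one extracted record into internal field names.

    Properties with no known mapping are dropped.
    """
    aligned = {}
    for prop, value in properties.items():
        internal = ALIGNMENT.get((vocabulary, prop))
        if internal is not None:
            aligned[internal] = value
    return aligned


# The same business described in two different vocabularies (made-up data).
records = [
    ("schema.org", {"name": "Example Brewery", "telephone": "555-0100"}),
    ("hcard", {"fn": "Example Brewery", "tel": "555-0100"}),
]
aligned_records = [align(vocab, props) for vocab, props in records]
```

After alignment both records carry the same internal fields, which is the work a single mandated vocabulary would make unnecessary.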

Page 12: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Time to get Serious!

Page 13: Smxeastbarbarastarr2012

What has been the history?

Percentage of URLs with embedded metadata in various formats

Five-fold increase between March, 2009 and October, 2010

Another five-fold increase between October 2010 and January, 2012

RDFa exploded in 2012 (source: Peter Mika, Yahoo)

Page 14: Smxeastbarbarastarr2012

Current state of metadata on the Web

• 31% of webpages, 5% of domains contain some metadata
  – Analysis of the Bing Crawl (US crawl, January, 2012)
  – RDFa is most common format
    • By URL: 25% RDFa, 7% microdata, 9% microformat
    • By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat
  – Adoption is stronger among large publishers
    • Especially for RDFa and microdata

• See also
  – P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
  – H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012

Page 15: Smxeastbarbarastarr2012

What’s been the History

Linked Open Data exploded from 2007 thru 2010

Oct 2007

Nov 2007

Page 16: Smxeastbarbarastarr2012

What’s been the History

Sept 2008

March 2009

Linked Open Data exploded from 2007 thru 2010

Page 17: Smxeastbarbarastarr2012

What’s been the History

Linked Open Data exploded from 2007 thru 2010

LOD Cloud

Sept 2010

Page 19: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Align and consume many vocabularies that may not be of interest to search engines?

Rather mandate vocabulary and syntax: microdata

A Search Engine alliance has the power to MANDATE vocabulary and syntax!

Page 20: Smxeastbarbarastarr2012

Sample portion

Page 21: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

On the other hand: not wise to ignore standards bodies like W3C

No mandate on syntax

Page 22: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Did I tell you I don’t like spam?

Page 23: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Make sure you are not cloaking by feeding one set of information to me and another to human users!

Ensure your data feeds match information with the structured markup or “metadata” on your web pages.
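
One way to check that a data feed agrees with the markup on a page, offered as a rough sketch and not any search engine's actual validation: the `ItempropCollector` class, the `mismatches` helper, the sample `PAGE` and the sample `FEED` record are all hypothetical, and the collector only handles flat microdata whose values sit in element text or in a `content` attribute.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collects microdata itemprop values from element text or `content`."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None or "itemscope" in attrs:
            return  # not a property, or a nested item rather than a value
        if "content" in attrs:
            self.props[name] = attrs["content"]
        else:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def mismatches(feed_record, html):
    """Return {field: (feed_value, page_value)} for fields that disagree."""
    collector = ItempropCollector()
    collector.feed(html)
    found = {key: value.strip() for key, value in collector.props.items()}
    return {
        field: (expected, found.get(field))
        for field, expected in feed_record.items()
        if found.get(field) != expected
    }


# Made-up page markup and feed record, for illustration only.
PAGE = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Pale Ale 6-pack</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">9.99</span>
    <meta itemprop="priceCurrency" content="USD">
  </div>
</div>
"""
FEED = {"name": "Pale Ale 6-pack", "price": "9.99", "priceCurrency": "USD"}

problems = mismatches(FEED, PAGE)
```

An empty result means the feed and the page agree on every field in the feed; a stale price in the feed would show up as a `(feed_value, page_value)` pair under `price`.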

Page 24: Smxeastbarbarastarr2012

Your Logo

SEARCH ENGINE POINT OF VIEW

Serving RELEVANT ANSWERS is IMPERATIVE!

& central to my very being!

Page 25: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

ELSE I AM

Page 26: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

X

Page 27: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Adding context in search verticals really helps me serve up relevant information (seriously increases my recall), as does geospatial information.

Consumed information - Structured Data Dashboard

Google’s “Search Verticals”

Notice any correlations? I would advise you to!

Page 28: Smxeastbarbarastarr2012

OH! And be sure to check out Moore's law

SEARCH ENGINE POINT OF VIEW

I also have a pretty good understanding of big data and web intelligence so I can leverage them!

SIRI

“Amazing fact: same amount of computing to answer one Google Search query as all the computing done -- in flight and on the ground -- for the entire Apollo program!”

Page 29: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

I can leverage metadata for better image search

SIRI

I can combine it with computer vision techniques.

I can enhance user’s shopping experience.

Page 30: Smxeastbarbarastarr2012
Page 31: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Know rather than Recognize?

INTRODUCING THE KNOWLEDGE GRAPH

Symbolic reasoning vs. stochastic reasoning (the latter is more like NLP or PageRank)

Page 32: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Talk of increase in screen real estate and CTR?

And if you thought the knowledge graph was cool, check out the knowledge carousel!

Page 33: Smxeastbarbarastarr2012

SEARCH ENGINE POINT OF VIEW

Thank you for your time!

And just a by-the-by, this technology is still in its nascent stages. Can you imagine what I will be able to do soon?

Barbara Starr
Email: [email protected]
Twitter: @BarbaraStarr

Resources to help you! Make sure to use them wisely!

Page 34: Smxeastbarbarastarr2012

Resources at this point in time

Caveat: Some training may be required for some of the tools

Programming Languages:
JavaScript: Microdatajs Live microdata
PHP: Microdataphp
Ruby: RDF Microdata RDF Lib plugin Perl
Ruby: RDF Microdata Gem Mida
Java: Sindice any23 library

Publishing

Form Based tools:
Schema Creator
Microdata generator

Standalone tools:
Web.instadata

Editors:
Topbraid Composer
Protege

Platforms:
Drupal
Joomla
Wordpress (about 7 of them)
Virtuoso
Topbraid Composer

Validators, Testers and More:
Check.rdfa.info
Sindice Inspector
Rich Snippets Testing Tool
Bing Validator
Structured data Linter
Online Parser/viewer and RSS generator
Validator.nu
Google Structured Data Tester

Page 35: Smxeastbarbarastarr2012

Resources at this point in time

Goodrelations: Resources, generators, validators, more, ….

Page 37: Smxeastbarbarastarr2012

Franz new tool

Soon to be released for SEO

Page 38: Smxeastbarbarastarr2012

Other Semantic Web Resources

OpenCalais – Can extract information about people, places and things
AlchemyAPI – named entity extraction, topic recognition, keyword tagging, more ….
Cogito – Expert System
Franz Inc. – Gruff
Many More….

For more info contact:
Barbara Starr
Twitter: @BarbaraStarr
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/barbarastarr

Caveat: Some training may be required for some of the tools