This presentation covers BOSS and Content Analysis, along with Dapper.
BOSS around the web
Saurabh Sahni, YDN Developer, Hacker, Evangelist
Souri Datta, Structured Data Extraction Team
http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
BOSS stands for Build your Own Search Service
http://developer.yahoo.com/search/boss/
Provides APIs to our search database to build your own powerful search applications
BOSS allows you to search over
Web, images, news & Blogs
What can be done on top of BOSS?
• Blend and re-rank search results
• Your own look and feel
• Mix it with other APIs
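As a rough illustration of "blend and re-rank", here is a minimal Python sketch. It assumes each result is a plain dict with `url` and `title` keys (a hypothetical shape, not the BOSS response format) and shows one simple way to interleave two result lists and then boost results by title keywords.

```python
from itertools import zip_longest


def blend(*result_lists):
    """Interleave several ranked result lists, dropping duplicate URLs."""
    seen = set()
    blended = []
    for tier in zip_longest(*result_lists):
        for item in tier:
            # zip_longest pads shorter lists with None
            if item is None or item["url"] in seen:
                continue
            seen.add(item["url"])
            blended.append(item)
    return blended


def rerank(results, boost_terms):
    """Stable re-rank: titles mentioning more boost terms move up."""
    def score(item):
        title = item["title"].lower()
        return sum(term.lower() in title for term in boost_terms)
    return sorted(results, key=score, reverse=True)


# Hypothetical results standing in for a web search and a news search
web = [{"url": "a", "title": "Dark Knight review"},
       {"url": "b", "title": "Batman news"}]
news = [{"url": "b", "title": "Batman news"},
        {"url": "c", "title": "Trailer"}]
mixed = rerank(blend(web, news), ["trailer"])
```

Because the sort is stable, results with equal scores keep the order the blend step gave them.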
BOSS Pricing
Free for building your hacks!!
BOSS uses OAuth for security
Code: https://github.com/sourind/hacku/
Get a FREE consumer key and secret
http://hackyourworld.org/hacku/
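For readers curious what the consumer key and secret are used for, the following is a minimal sketch of an OAuth 1.0 HMAC-SHA1 signature as described in RFC 5849, using only the Python standard library. It is not the code from the repository above, and the URL and parameter values are placeholders; the slides do not say which parameters BOSS itself expects.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode as OAuth 1.0 requires (only unreserved chars stay)."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return the base64 HMAC-SHA1 signature for a request."""
    # Encode names and values, sort them, and join as name=value pairs
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    # Signature base string: METHOD & encoded URL & encoded parameters
    base_string = "&".join([method.upper(), pct(url), pct(normalized)])
    # Signing key: consumer secret & token secret (empty when no token)
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder values for illustration only
signature = sign_request(
    "GET",
    "http://example.com/search",  # hypothetical URL
    {"q": "The Dark Knight Rises", "oauth_consumer_key": "my-key",
     "oauth_nonce": "abc123", "oauth_timestamp": "1300000000",
     "oauth_signature_method": "HMAC-SHA1", "oauth_version": "1.0"},
    consumer_secret="my-secret",
)
```

When queries go through YQL with `ck` and `secret` in the statement, as in the examples that follow, this signing step is not written by hand.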
http://developer.yahoo.com/yql/console/
1. Select YQL query
2. Select output format
3. Copy this URL
Finding images of “The Dark Knight Rises”
select * from boss.search where q="The Dark Knight Rises" and service="images"
and ck="..." and secret="..."
Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com
select * from boss.search where q="The Dark Knight Rises" and
sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."
Spell Check and Correction
select * from boss.search where q="The Dirk Knight Rises" and service="spelling" and
ck="..." and secret="..."
Finding news on “The Dark Knight Rises”
select * from boss.search where q="The Dark Knight Rises" and service="news" and
ck="..." and secret="..."
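The queries above all follow one pattern, so they can be assembled programmatically. The sketch below builds the same `boss.search` statements and wraps them in a request URL. The endpoint `http://query.yahooapis.com/v1/public/yql`, the `format` parameter, and the backslash escaping of quotes are assumptions here, not something the slides state; in practice, copy the URL from the YQL console as described earlier.

```python
from urllib.parse import urlencode

# Assumed public YQL endpoint; prefer the URL copied from the console
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def literal(value):
    """Render a value as a YQL literal: bare integers, quoted strings."""
    if isinstance(value, int):
        return str(value)
    # Assumed escaping: backslash before backslashes and double quotes
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def boss_query(q, ck, secret, **filters):
    """Build a boss.search statement like the ones shown above."""
    clauses = {"q": q, **filters, "ck": ck, "secret": secret}
    where = " and ".join(f"{name}={literal(v)}" for name, v in clauses.items())
    return f"select * from boss.search where {where}"


def yql_url(statement, fmt="json"):
    """Wrap a YQL statement in a request URL."""
    return YQL_ENDPOINT + "?" + urlencode({"q": statement, "format": fmt})


# "..." stands for your own consumer key and secret
images = boss_query("The Dark Knight Rises", "...", "...", service="images")
sites = boss_query("The Dark Knight Rises", "...", "...",
                   sites="imdb.com,movies.yahoo.com")
url = yql_url(images)
```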
Finding interesting objects: Content Analysis
select * from contentanalysis.analyze where text="Sachin Tendulkar is batting very well"
Content Analysis from a URL
select * from contentanalysis.analyze where url="http://www.cnn.com/"
Let's see it in action!
Query Cheatsheet
• Find images of “The Dark Knight Rises”
  select * from boss.search where q="The Dark Knight Rises" and service="images" and ck="..." and secret="..."
• Find reviews of “The Dark Knight Rises”
  select * from boss.search where q="reviews intitle:The Dark Knight Rises" and service="web" and ck="..." and secret="..."
• Search for Avatar but not the movie
  select * from boss.search where q="Avatar -movie" and ck="..." and secret="..."
• Search PDFs of “The Dark Knight Rises”
  select * from boss.search where q="The Dark Knight Rises" and type="pdf" and ck="..." and secret="..."
Query Cheatsheet (continued)
• Find all the news on “The Dark Knight Rises”
  select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..." and secret="..."
• Get long abstracts in the results
  select * from boss.search where q="The Dark Knight Rises" and abstract="long" and ck="..." and secret="..."
• Retrieve results 51-100 of the query
  select * from boss.search where q="The Dark Knight Rises" and start=51 and ck="..." and secret="..."
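To walk through several pages of results, the `start` value can be generated in a loop. This sketch assumes a page size of 50 (inferred from the "51-100" example) and takes the first offset as a parameter, since the slides do not state whether the service counts from 0 or 1. It also assumes the search text contains no double quotes.

```python
def page_queries(q, ck, secret, pages, page_size=50, first=1):
    """Yield one boss.search statement per page of results."""
    for page in range(pages):
        # first=1, page_size=50 gives start values 1, 51, 101, ...
        start = first + page * page_size
        yield (f'select * from boss.search where q="{q}" and start={start} '
               f'and ck="{ck}" and secret="{secret}"')


# "..." stands for your own consumer key and secret
for statement in page_queries("The Dark Knight Rises", "...", "...", pages=3):
    print(statement)
```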
EXAMPLES
duckduckgo.com
Data Extraction
Why is extraction difficult?
• The Internet has a lot of information
• Not all of it can be processed by machines
– Unstructured data
– E.g. DiscountedPrice and ReducedPrice of a product (both mean the same)
• The ultimate aim is to publish data in a structured format
• Simplest way: XML, JSON
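The DiscountedPrice / ReducedPrice point can be made concrete with a small sketch: records scraped from two sites use different labels for the same field, and a synonym table maps both to one canonical name before the data is published as JSON. The alias table and the canonical name `sale_price` are made up for illustration.

```python
import json

# Hypothetical synonym table: different sites, same meaning
FIELD_ALIASES = {
    "discountedprice": "sale_price",
    "reducedprice": "sale_price",
}


def normalize(record):
    """Rename known synonym fields to one canonical name."""
    out = {}
    for key, value in record.items():
        lookup = key.lower().replace("_", "").replace(" ", "")
        out[FIELD_ALIASES.get(lookup, key)] = value
    return out


site_a = {"Title": "Camera", "DiscountedPrice": 199}
site_b = {"Title": "Camera", "ReducedPrice": 199}

# Both records come out identical once normalized
print(json.dumps(normalize(site_a), sort_keys=True))
print(json.dumps(normalize(site_b), sort_keys=True))
```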
Web Scraping
• Demo: Dapper
More Resources
• Yahoo! BOSS: http://developer.yahoo.com/boss
• BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/
• Content Analysis: http://developer.yahoo.com/contentanalysis/
• OAuth sample code: https://github.com/sourind/hacku/
Questions??
http://www.flickr.com/photos/reem_unique/4119729692/
• http://slideshare.net/souridatta
• https://github.com/sourind/
Thanks!!