pradeep-varadaraja-banavara
MAKING SENSE OUT OF THINGS ON THE WEB
@pradeepbv
We have been accumulating a lot of information
http://en.wikipedia.org/wiki/File:Jingangjing.jpg
http://en.wikipedia.org/wiki/File:Printer_in_1568-ce.png
http://en.wikipedia.org/wiki/File:BuxheimStChristopher.jpg
http://en.wikipedia.org/wiki/Odhecaton
http://upload.wikimedia.org/wikipedia/commons/f/f1/The_First_Telegraph.jpg
What hath God wrought
http://en.wikipedia.org/wiki/File:1891_Telegraph_Lines.jpg
1891 Telegraph Lines
Mr. Watson, come here. I want to see you.
http://www.boerner.net/jboerner/?p=9396
Radio
http://www.elon.edu/e-web/predictions/150/1930.xhtml
WWW
http://en.wikipedia.org/wiki/File:NCSA_Mosaic.PNG
The Internet had an estimated 16 million users by 1995
http://en.wikipedia.org/wiki/Venture_capital
People from all over the world started sharing their interests,
hopes and dreams online
http://electrokami.com/wp-content/uploads/2010/09/the-internet-in-real-life.jpg
The number of devices connected to IP networks will be nearly three times as high as the global population in 2016
The Zettabyte Era
http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/VNI_Hyperconnectivity_WP.html
kilo, mega, giga, tera, peta, exa, zetta, yotta
1 zebibyte = 9,444,732,965,739,290,427,392 bits (1024 exbibytes)
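A quick way to check the big number on this slide: it is the bit count of 1024 exbibytes, that is 2**70 bytes. The sketch below just redoes that arithmetic; the variable names are made up for illustration.

```python
# 1 exbibyte = 2**60 bytes, and the slide's figure is 1024 of them, in bits.
BITS_PER_BYTE = 8
exbibyte_bytes = 2 ** 60
zebibyte_bytes = 1024 * exbibyte_bytes  # 2**70 bytes

zebibyte_bits = zebibyte_bytes * BITS_PER_BYTE
print(f"{zebibyte_bits:,} bits")  # 9,444,732,965,739,290,427,392 bits
```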
“Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know.”
Donald Rumsfeld, US Defense Secretary, at a press conference at NATO Headquarters, Brussels, Belgium, June 6, 2002
Image: planetization.org
Nicholas Carr worries that the flood of digital information is changing not only our habits, but even our mental capacities: Forced to scan and skim to keep up, we are losing our abilities to pay sustained attention, reflect deeply, or remember what we’ve learned.
http://blogs.tusc.k12.al.us/bhslibrary/files/2012/01/Information_overload.jpg
Information overload?
http://www.teachersdiary.com/.a/6a0115703931fc970c0128765537ba970c-800wi
DO YOU KNOW WHAT YOU ARE LOOKING FOR?
http://www.flickr.com/photos/special/1597251/
DO YOU KNOW WHERE TO FIND WHAT YOU WANT?
http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
REGULAR SEARCH #FAIL?
http://www.flickr.com/photos/sumrow/1267682594/sizes/l/
IS THERE A SUPERHERO WHO CAN HELP?
BUILD YOUR OWN SEARCH SERVICE
Yes, you are the superhero
BOSS IS BUILD YOUR OWN SEARCH SERVICE
http://developer.yahoo.com/search/boss/
BOSS PROVIDES APIS
TO OUR SEARCH DATA STORES
TO BUILD YOUR OWN POWERFUL
SEARCH APPLICATIONS
BOSS allows you to search over
Web, images, news & blogs
You can even monetize your applications using Search Ads from BOSS and get support.
What can be done on top of BOSS?
• Blend and re-rank search results
• Your own look and feel
• Mix it with other APIs
BOSS Pricing
Free for building your hacks!!
Where do I start?
RESTful XML and JSON API
Web
Image
Spelling
News
Search Ads
What’s in it?
http://www.flickr.com/photos/joeshlabotnik/419914250/sizes/o/in/photostream/.jpg
OAuth-based Authentication
http://www.flickr.com/photos/friarsbalsam/5736126308/sizes/o/in/photostream/.jpg
What else do I get?
Web and Limited Web results
Image attributes like height, width, etc.
Time span filtering for News Search
Document type filtering
Extended abstracts
http://www.flickr.com/photos/acidpix/6021203584/sizes/o/in/photostream/.jpg
BOSS + YQL
• Table Name: boss.search
• e.g. select * from boss.search where ck=… and secret=… and q='openhackindia'
Parameter         Name     Example
Consumer Key      ck       -
Consumer Secret   secret   -
Query Term        q        'iitd'
Searching “The Dark Knight”
Finding images of “The Dark Knight Rises”
select * from boss.search where q="The Dark Knight Rises" and service="images" and
ck="..." and secret="..."
Finding “The Dark Knight Rises” in IMDB, movies.yahoo.com
select * from boss.search where q="The Dark Knight Rises" and
sites="imdb.com,movies.yahoo.com" and ck="..." and secret="..."
Spell Check and Correction
select * from boss.search where q="The Dark Knight Rises" and service="spelling" and
ck="..." and secret="..."
Finding news on “The Dark Knight Rises”
select * from boss.search where q="The Dark Knight Rises" and service="news" and ck="..."
and secret="..."
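The four statements above differ only in the `service` and `sites` clauses, so they can be generated from one helper. This is a minimal sketch: `boss_search_yql` is a hypothetical name, and since the slides do not say how YQL escapes quotes, the helper refuses values that contain a double quote instead of guessing.

```python
def boss_search_yql(q, ck, secret, service=None, sites=None):
    """Build a `select * from boss.search ...` statement like the ones on the slides."""
    clauses = [("q", q)]
    if service:
        clauses.append(("service", service))
    if sites:
        clauses.append(("sites", ",".join(sites)))
    clauses += [("ck", ck), ("secret", secret)]

    for name, value in clauses:
        # Escaping rules are unknown here, so reject embedded quotes outright.
        if '"' in value:
            raise ValueError(f"{name} must not contain a double quote")

    where = " and ".join(f'{name}="{value}"' for name, value in clauses)
    return f"select * from boss.search where {where}"


# "..." stands in for real credentials, exactly as on the slides.
print(boss_search_yql("The Dark Knight Rises", "...", "...", service="images"))
print(boss_search_yql("The Dark Knight Rises", "...", "...",
                      sites=["imdb.com", "movies.yahoo.com"]))
```

The first call reproduces the image query from the slide character for character.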
And through the BOSS API
Getting multiple data sets
/ysearch/web,images,news?q=anna
/ysearch/web,images,news?web.q=anna&images.q=anna&news.q=lokpal
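The two request paths above follow one pattern: services are comma-joined in the path, and a query is either shared (`q`) or prefixed with the service name (`web.q`, `news.q`). Here is a rough sketch of that pattern; `multi_service_path` is a made-up helper and only the path shapes come from the slide.

```python
from urllib.parse import quote, urlencode


def multi_service_path(services, shared_query=None, per_service=None):
    """Build a /ysearch/<services> path with a shared q or service-prefixed q values."""
    params = {}
    if shared_query is not None:
        params["q"] = shared_query
    for service, query in (per_service or {}).items():
        if service not in services:
            raise ValueError(f"{service!r} is not among the requested services")
        params[f"{service}.q"] = query
    return f"/ysearch/{','.join(services)}?{urlencode(params, quote_via=quote)}"


services = ["web", "images", "news"]
print(multi_service_path(services, shared_query="anna"))
print(multi_service_path(
    services, per_service={"web": "anna", "images": "anna", "news": "lokpal"}))
```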
Searching through sites
A Simple Movie Search
/ysearch/web?q="Dark Knight"&sites=movies.yahoo.com,netflix.com,imdb.com
AND/OR operators
/ysearch/web?q="steve jobs"AND((ipad)OR(iphone))&sites=bestbuy.com,newegg.com
Important: use parentheses or quotes
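Getting the parentheses and quotes right by hand is error-prone, so it may help to compose the expression from small pieces. A sketch only: `phrase`, `any_of` and `all_of` are hypothetical helpers, and percent-encoding the `q` value is an assumption about how the request should be sent, not something the slide states.

```python
from urllib.parse import quote


def phrase(text: str) -> str:
    """Quote a multi-word term so it is treated as one phrase."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Join terms with OR, wrapping each term and the whole group in parentheses."""
    return "(" + "OR".join(f"({term})" for term in terms) + ")"


def all_of(*parts: str) -> str:
    """Join already-grouped parts with AND."""
    return "AND".join(parts)


query = all_of(phrase("steve jobs"), any_of("ipad", "iphone"))
print(query)  # "steve jobs"AND((ipad)OR(iphone))

# Assumption: the query is percent-encoded before it goes into the URL.
path = f"/ysearch/web?q={quote(query, safe='')}&sites=bestbuy.com,newegg.com"
print(path)
```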
Unary Operators
Search for Batman but not "Dark Knight"
q=(batman -"Dark Knight")
Find pages with "Heath Ledger" but not "Dark Knight"
q=+"heath ledger" -"Dark Knight"&sites=movies.yahoo.com
Force auto-spelling off
q=+"drk knight"
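Slide software tends to turn `-` and `"` into typographic dashes and curly quotes, which a search API is unlikely to accept, so generating the `+`/`-` terms in code avoids that trap. A small sketch, with `build_query` as an illustrative name:

```python
def build_query(terms=(), must=(), must_not=()):
    """Combine plain terms with +"required" and -"excluded" phrases."""
    parts = list(terms)
    parts += [f'+"{text}"' for text in must]       # ASCII plus sign
    parts += [f'-"{text}"' for text in must_not]   # ASCII hyphen-minus
    return " ".join(parts)


print(build_query(terms=["batman"], must_not=["Dark Knight"]))
print(build_query(must=["heath ledger"], must_not=["Dark Knight"]))
print(build_query(must=["drk knight"]))
```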
AND OR
Searching in body and in title
Searching for Dark Knight in the Title on Yahoo movies
q=reviews intitle:"dark knight"&sites=movies.yahoo.com
Searching for Dark Knight in the Title in Yahoo movies containing Christian Bale
q=reviews intitle:"dark knight" inbody:"christian bale"&sites=movies.yahoo.com
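The `intitle:` and `inbody:` restrictions are just extra tokens appended to the query text. A minimal sketch of that, where `field_query` is a hypothetical helper and only the two operators shown on the slide are covered:

```python
def field_query(terms, intitle=None, inbody=None):
    """Append intitle:/inbody: phrase restrictions to a plain query."""
    parts = [terms]
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if inbody:
        parts.append(f'inbody:"{inbody}"')
    return " ".join(parts)


print(field_query("reviews", intitle="dark knight"))
print(field_query("reviews", intitle="dark knight", inbody="christian bale"))
```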
Market and document specific Filters
Search for "Dark Knight" in India specific sites
q="Dark Knight"&market=en-in
Search for PDFs containing "Dark Knight"
q="Dark Knight"&type=pdf
Search for MS Office documents (except PPTs) containing "Dark Knight"
q="Dark Knight"&type=msoffice,-ppt
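Unlike the operators earlier, `market` and `type` are separate request parameters, and an excluded document type carries a leading minus. The sketch below assembles them with the standard library; `web_search_path` is a made-up name, and percent-encoding `q` while leaving commas as-is is an assumption.

```python
from urllib.parse import quote, urlencode


def web_search_path(q, market=None, include_types=(), exclude_types=()):
    """Build a /ysearch/web path with optional market and type filters."""
    params = {"q": q}
    if market:
        params["market"] = market
    # Excluded types are written with a leading minus, as in type=msoffice,-ppt.
    types = list(include_types) + [f"-{name}" for name in exclude_types]
    if types:
        params["type"] = ",".join(types)
    return "/ysearch/web?" + urlencode(params, quote_via=quote, safe=",")


print(web_search_path('"Dark Knight"', market="en-in"))
print(web_search_path('"Dark Knight"', include_types=["pdf"]))
print(web_search_path('"Dark Knight"', include_types=["msoffice"], exclude_types=["ppt"]))
```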
Output
Image search parameters
Search for images that are not offensive
/ysearch/images?q="san francisco"&filter=yes
Search for images that are wallpaper size
/ysearch/images?q="san francisco"&dimensions=wallpaper
Search for an image from a certain referrer URL
/ysearch/images?q=yahoo&refererurl=http://www.flickr.com
• Interesting Output Fields
format, file size, height, width, title, total result count
News search parameters
Search news that is less than 7 days old
/ysearch/news?q=lokpal&age=7d
Search news that is between 20 hours and 2 days old
/ysearch/news?q=lokpal&age=20h2d
Re-rank news results by date
/ysearch/news?q=lokpal&ranking=true
Interesting Output Fields
Source, Date, Source URL
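To make the `age` values above easier to read, here is a sketch that splits them into durations. The grammar is inferred only from the two examples on the slide (`7d` and `20h2d`), so treat it as an assumption; `parse_age` is a hypothetical helper, not part of BOSS.

```python
import re
from datetime import timedelta

_UNITS = {"h": "hours", "d": "days"}


def parse_age(age):
    """Split an age value such as '7d' or '20h2d' into a list of timedeltas."""
    # Assumed grammar: one or two <number><h|d> tokens, nothing else.
    if not re.fullmatch(r"(?:\d+[hd]){1,2}", age):
        raise ValueError(f"unrecognised age value: {age!r}")
    return [timedelta(**{_UNITS[unit]: int(number)})
            for number, unit in re.findall(r"(\d+)([hd])", age)]


print(parse_age("7d"))      # one bound: 7 days
print(parse_age("20h2d"))   # two bounds: 20 hours and 2 days
```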
EXAMPLE HACKS
DuckDuckGo.com
Interceder
Ask-boss (v1)
Hack: http://ask-boss.appspot.com
Code: https://github.com/saurabhsahni/Hacks/tree/master/askBOSS
webmeme.in
http://hackyourworld.org/~iitb_pacman/search/
I used BOSS and got data. Now how do I extract information out of it
and make sense of it?
Content Analysis
select * from contentanalysis.analyze where text="Yahoo! kicks off hackday"
Content Analysis from a URL
select * from contentanalysis.analyze where url="http://www.cnn.com/"
Term Extraction
select * from search.termextract where context in (select description from rss where url='')
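A YQL statement like the ones above has to be URL-encoded before it can travel in a request. The sketch below shows only that encoding step. The endpoint is a placeholder, and the `q` and `format` parameter names are assumptions, since the slides give no request URL for YQL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host, NOT a real service; substitute the endpoint you actually use.
YQL_ENDPOINT = "https://yql.example.invalid/v1/public/yql"


def yql_request_url(statement, endpoint=YQL_ENDPOINT, fmt="json"):
    """Attach a YQL statement to an endpoint as an encoded query string."""
    return f"{endpoint}?{urlencode({'q': statement, 'format': fmt})}"


statement = 'select * from contentanalysis.analyze where url="http://www.cnn.com/"'
url = yql_request_url(statement)
print(url)

# Decoding the query string gives back the statement unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == statement)
```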
More resources
Yahoo! BOSS: http://developer.yahoo.com/boss
BOSS Technical Documentation: http://developer.yahoo.com/search/boss/boss_api_guide/
YQL: http://developer.yahoo.com/yql
Amazon Web Services: http://aws.amazon.com
OAuth: http://oauth.net/
Open Data: http://theinfo.org
Alt Search Engines: http://www.altsearchengines.com/
Happy hacking!