Search and the ‘Net @ 2013
Michael Hunter, Reference Librarian, Hobart and William Smith Colleges
For Rochester Regional Library Council Member Libraries’ Staff
Sponsored by the Rochester Regional Library Council
Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library, 2012



For today . . .

The Searchscape
Entity-based Search
New Services and Tools
The Social Web
Bing, Blekko, DuckDuckGo, Exalead
News from Google
A Privacy Primer
Trends and Future Directions
Linklist: http://people.hws.edu/hunter/searchnet13links.htm

America at the Digital Turning Point
Center for the Digital Future – USC Annenberg School for Communication
www.digitalcenter.org/pdf/CDF_10_year_digital_turning_point.pdf

Longitudinal study over 10 years
Over 2,000 US households surveyed each year
“…online behavior changes relentlessly.”
“…constant social connection, unlimited access to information, and unprecedented abilities to purchase.”
“…online technology creates extraordinary demands on our time, major concerns about privacy, and fundamental questions about the proliferation of the digital realm…”

America at the Digital Turning Point
Selected highlights

Americans view the Internet as an important information source, yet many Internet users do not trust much of the information (there)
Our privacy is lost.
Most printed daily newspapers will be gone in about five years.
The sheer overwhelming nature of technology may be reaching a critical point.
Because of online technology, work is increasingly a 24/7 experience.

America at the Digital Turning Point
Time spent face-to-face with family in the household since the Internet

The Web Worldwide
Data from the International Telecommunication Union, 2011

Total population – ca. 7 b.
Connected to the Web – ca. 2 b.
Mobile subscriptions – ca. 6 b.
Mobile subscriptions forecast for 2017 – 9 b., with 5 b. mobile broadband connections

Mobile subscriptions
GLOBAL               5,981,000,000
Developed nations    1,461,000,000
Developing nations   4,520,000,000
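The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: the variable names are made up here, and the "ca." values are the rounded numbers from the slide, so the derived shares are approximate.

```python
# Figures as shown on the slide (ITU, 2011); "ca." values are rounded.
population = 7_000_000_000
connected = 2_000_000_000
mobile_developed = 1_461_000_000
mobile_developing = 4_520_000_000

# The two regional rows should add up to the GLOBAL row.
mobile_global = mobile_developed + mobile_developing

# Rough shares derived from the rounded figures.
share_connected = connected / population
share_developing = mobile_developing / mobile_global

print(f"Global mobile subscriptions: {mobile_global:,}")
print(f"Share of population online: {share_connected:.1%}")
print(f"Developing-nation share of subscriptions: {share_developing:.1%}")
```

Because the population and connection counts are rounded to the nearest billion, the shares should be read as ballpark values only.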

World Internet Project
www.worldinternetproject.net
By using the Internet, people like you can better understand politics – 2009 reporting countries

New Top Level Domains (ICANN 1/11/12)

.com domains almost exhausted for new website names
  “Someone got there first”
New businesses must pay domain brokers for an address or register a new one with unnatural, insignificant words
Now possible to purchase a unique TLD (.mycompany or .ourtrademark or .ourbrand)
Fee: $185,000, with a waiting period of 2 years

Domain Registration

Currently unrestricted: .com .info .net .org

Currently require proof of eligibility: .edu .coop .mil .gov .int .museum .xxx .aero .asia

Search engines and satisfaction
mdgadvertising.com (data from Pew Research)
How often do you actually find the information you’re looking for with search engines?

Entity-based Search:
Google’s Knowledge Graph
Bing’s Satori

Entity-based search
The back end: how search engines worked until now

Matched query terms to terms in their crawler-created database
Results refined by
  Linkage patterns
  Popularity
  Personalization
  Other (?????)
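The term-matching step described on this slide can be sketched as a tiny inverted index. This is a minimal illustration, not any engine's actual implementation: the page URLs and texts are invented, and the refinement signals (linkage, popularity, personalization) are left out.

```python
from collections import defaultdict

def build_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return pages containing every query term (plain text matching)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical crawl data for illustration only.
pages = {
    "example.org/cats": "jaguar big cat habitat",
    "example.org/cars": "jaguar car dealer",
    "example.org/nasa": "apollo moon mission",
}
index = build_index(pages)
print(search(index, "jaguar"))      # both "jaguar" pages match
print(search(index, "jaguar car"))  # only the car page matches
```

A query such as "jaguar" returns both the animal and the car page, which is exactly the ambiguity that pure term matching cannot resolve.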

Ambiguous terms abound
“kings” “jaguar” “Apollo”

Can a system know????

“Charles Dickens”
  This searcher wants information about and books by him
“Frank Lloyd Wright”
  This searcher wants information about and pictures of buildings designed by him

The basics….

Entity database seeded with a large “bag of nouns” and supplemented with nouns from web crawls identified through natural language processing
These nouns are mapped to another database of information related and/or relevant to those nouns through natural language processing, beyond simple text matches
Results can be customized based on click responses from previous anonymous searches for that query
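As a rough sketch of the idea above, an entity lookup can be pictured as a dictionary from known nouns to the kinds of related content a searcher probably wants. Everything here is an assumption made for illustration: the `ENTITIES` table, its fields, and the function names do not come from Google or Bing, and a real system would use natural language processing rather than an exact match.

```python
# Hypothetical entity database: noun -> type and related content categories.
ENTITIES = {
    "charles dickens": {"type": "author",
                        "related": ["biography", "books"]},
    "frank lloyd wright": {"type": "architect",
                           "related": ["biography", "building photos"]},
}

def find_entity(query):
    """Look up a query in the entity table after normalizing case and spacing."""
    normalized = " ".join(query.lower().split())
    return ENTITIES.get(normalized)

def related_content(query):
    """Return the related content categories, or an empty list if no entity matches."""
    entity = find_entity(query)
    if entity is None:
        return []  # a real engine would fall back to plain term matching
    return entity["related"]

print(related_content("Charles Dickens"))
print(related_content("volcanic eruptions in the 18th century"))
```

The second query returns nothing because it does not correspond to a single known entity, which mirrors the "long tail" limitation discussed later in the deck.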

Yahoo Research paper - 2009
http://research.yahoo.com/files/pods09-woc.pdf

Extract structured data (addresses, prices, item #, etc.) from web documents and associate it with an entity
Link relationships between entities
  An actor to his films and other actors he has worked with
Discover categorizing information in the document’s content
  Subject headings
  Reviews ( : or ) :
  Type of food served
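The first step on this slide, pulling structured values out of page text and attaching them to an entity, can be sketched with regular expressions. This is a simplified illustration and not the method of the Yahoo paper: the patterns, the `extract_attributes` helper, and the restaurant text are all hypothetical.

```python
import re

# Simplified patterns (assumptions): US-style prices and 5-digit ZIP codes.
PRICE = re.compile(r"\$\d+(?:\.\d{2})?")
ZIP_CODE = re.compile(r"\b\d{5}\b")

def extract_attributes(entity_name, text):
    """Collect structured values from free text and tie them to one entity."""
    return {
        "entity": entity_name,
        "prices": PRICE.findall(text),
        "zip_codes": ZIP_CODE.findall(text),
    }

# Invented page text for illustration only.
page = ("Luigi's Trattoria, 12 Main St, Geneva, NY 14456. "
        "Lunch special $9.95, dinner from $18.")
record = extract_attributes("Luigi's Trattoria", page)
print(record)
```

Real extraction has to cope with many more formats than these two patterns, so the sketch only shows the shape of the resulting entity record.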

The front end

Google’s Knowledge Graph:
  Focused on questions and answers
  Contextual box for ambiguous terms with short descriptions
Bing’s Satori:
  Focused on potential “actions” associated with the entity
  Searchers for a rock band usually want to buy a recording, find lyrics or get tickets
  “Snapshot” panel – entity-based results from the social web (yours and others’)

Benefits of entity-based search

Greater predictability of searcher satisfaction
Discovers related information that does not contain the search term(s)
Disambiguates many terms
Collocates related information from across the Web in a variety of filetypes

The Long Tail
http://searchengineland.com/search-illustrated-b2b-long-tail-seo-13237

Future challenges: the “long tail”

Entities are now limited to the most popular topics
Currently no way to map complex queries to an entity or entity group
  “volcanic eruptions in the 18th century”
  “Lady Gaga concerts in a warm location”
Currently limited to English only
Including more entities in English and other languages will greatly increase processing and impact response time

New Services and Tools
Realtime, Metas and Collaboration

http://marketingland.com/new-social-discovery-engine-bottlenose-aims-to-take-over-real-time-exploration-17024

Bottlenose: A realtime meta

Launched 8/12 (public beta)
Homepage access via login to your social network (gives Bottlenose access also)
Click into Social Search tab and search a category with no login (11/27/12)
Searches all the major social networks
Events, trending topics and people
Tabs to sort, organize and display
Mobile apps available

Terrier – www.smartfp7.eu

Open source
Research project of the EU based at U. of Glasgow
Real-time information about the “real world”
  Current traffic conditions at a specific intersection
  My friends’ favorite bar right now
“Smart Cities” concept
  Physical spaces covered in an array of intelligent sensors which communicate and can be searched for information

Zuula: a multi-meta

Web search includes Google, Bing, Yahoo, Gigablast, Exalead, Alexa, EntireWeb, Mahalo, Mojeek

Unique sources and settings available for each type of search:
  Web, News, Images, Tags, Blogs, Jobs

Tab through results from each source engine

Polymeta

Web search includes Google, Bing, Ask, Yahoo, Exalead

Source selection available for each search type
  Web, News, Images, Videos, Twitter, Blogs
Twitter search is limited to top 50 containing your search terms
Faceted and graphed results available
Related results from other search types appear to the right
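To make the metasearch idea concrete, the sketch below merges ranked result lists from several source engines into one list. How Zuula or Polymeta actually combine results is not stated in these slides; summing reciprocal ranks is a swapped-in technique chosen for illustration, and the engine names and URLs are placeholders.

```python
def merge_results(results_by_engine):
    """Merge ranked URL lists; a URL ranked highly by several engines scores best."""
    scores = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            # Assumed scoring rule: each engine contributes 1/rank for a URL.
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

# Placeholder result lists for illustration only.
sample = {
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["b.example", "a.example"],
    "engine_c": ["b.example", "d.example"],
}
merged = merge_results(sample)
print(merged)
```

A tabbed interface like Zuula's skips this merging step entirely and simply shows each engine's list on its own tab.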

Searchteam.com

Search engine with wiki-like, real-time collaborative work spaces

“Collective knowledge from your trusted social network circles”

Web sites
Videos (YouTube)
Images
Reference (Wikipedia)
Educational
Books and Articles (Amazon)

Faceted results and suggested searches
  Related main topics
  Subtopics
  Related searches (suggested)

Searchteam.com

SearchSpaces
  Organize and share links
  Online forum for collaborative searching with friends
Small database
Educational tab not inclusive of all .edu domains
Results counts unreliable

GapVis
nrabinowitz.github.com/gapvis/index.html

Maps occurrences of geographic places in texts
Currently includes public domain texts of Graeco-Roman literature
Project of classical scholars and visualization designers in the US and UK
In beta

The Shape of Today’s Social Web

Why search the social web???

Public responses/attitudes/primary sources
Breaking news
Trending topics and people
Latest product reviews
Companies and competition
Security, technology topics (latest virus, etc.)
Locate individuals and their networks
  Who they follow, who follows them
  People interested in a topic/hobby
Monitor collaborations

Social Networks in the Egyptian Revolution

1/25/11-2/11/11
Enabling protesters to become citizen journalists

Mining Today’s Social Web: The trust factors

People you don’t know
  Wikipedia
  Human-created databases, directories
    “I need a few good sites on solar energy”
    Mahalo, Ipl2.org
  Q&A Services
    “How do I repair my garage door opener?”
    Yahoo Answers, Answers.com, Mahalo Answers

Mining Today’s Social Web: The trust factors

People you follow
  Twitter: human-created Tweets
    “What’s the buzz on Beyonce?”
People you know
  Post a question to friends and family
    “What type of Mac should I buy?”
  Facebook, LinkedIn, Google+, Bing (login via Facebook)

Tumblr

Microblogging platform; requires free account

Allows users to post multimedia and other content to a “tumblog”

Search options
  www.tumblr.com – posts searchable by author-supplied tags only; no keyword search
  Tumblow.com – offers keyword search
  Google site search – more comprehensive than Tumblow
    site:www.tumblr.com +search term(s)
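The Google site search described above can also be built as a URL. The sketch below is a minimal illustration: the `site_search_url` helper is a made-up name, and it assumes Google's standard `/search?q=` address format.

```python
from urllib.parse import urlencode

def site_search_url(site, terms):
    """Build a Google search URL that limits results to one site."""
    query = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})

url = site_search_url("www.tumblr.com", "solar energy")
print(url)
```

The same helper works for any other site that lacks good internal search, for example by passing `plus.google.com` as the site.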

Twittermining
Some tweets are more “authoritative” than others…

Access to unfiltered, real-time perspective on what people are thinking and doing
Authority (and usefulness) of a tweet depends on
  Who sent it
  The number and “authority” of their followers
  When it was sent
  Documents/sites it refers to
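One way to picture how these factors might combine is a simple score. The formula below is purely an assumption for illustration: the weights, the logarithmic follower scale, the one-day freshness decay, and the link bonus are invented here and are not how Twitter or any tool in this deck ranks tweets.

```python
import math
from datetime import datetime, timedelta

def authority_score(followers, sent_at, has_link, now):
    """Toy score: more followers, a newer tweet, and a cited link all raise it."""
    reach = math.log10(followers + 1)            # follower count, dampened
    age_hours = max((now - sent_at).total_seconds() / 3600, 0.0)
    freshness = 1.0 / (1.0 + age_hours / 24)     # halves after one day
    link_bonus = 0.5 if has_link else 0.0        # refers to a document or site
    return reach * freshness + link_bonus

now = datetime(2012, 11, 27, 12, 0)
fresh = authority_score(9999, now, True, now)
day_old = authority_score(9999, now - timedelta(days=1), False, now)
print(fresh, day_old)
```

The sketch leaves out the hardest factor on the slide, the "authority" of the followers themselves, which would need data about each follower.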

Twittermining Tools

Twitter.com
  Requires a (free) account
  Only the latest 2 weeks available
  Searchable by hashtag (#)
    Author-designated keyword or significant term or phrase
    #rochester #jobs #marketing

Twittermining Tools

Discover Tab (access via your account)
  Launched 5/12
  Offers
    Personalized content based on your Twitter activity
    Favorites, follows, retweets, and more by people you follow
    Who to follow – Twitter accounts suggested for you based on who you follow
    Browse categories (<25) and people/organizations heavily associated with the categories

Twittermining Tools

https://twitter.com/search-advanced
  No account required
  Only the latest 2 weeks available
  Advanced search features
    Booleans
    Hashtag
    Language limit
    Author search (tweets from or to)
    “Near this place”
    Attitude – positive, negative, question
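The advanced search form essentially assembles a query string from operators. The sketch below shows that assembly under stated assumptions: the operator spellings (`OR`, `from:`, `to:`, `near:`, `lang:`, and the `:)`, `:(`, `?` attitude markers) follow Twitter's commonly documented search syntax, while the `build_twitter_query` helper and its parameter names are hypothetical.

```python
# Attitude filters expressed as the emoticon/question operators (assumed syntax).
ATTITUDES = {"positive": ":)", "negative": ":(", "question": "?"}

def build_twitter_query(words="", any_of=(), hashtag="", from_user="",
                        to_user="", near="", lang="", attitude=""):
    """Assemble a Twitter search string from the advanced-search fields."""
    parts = []
    if words:
        parts.append(words)
    if any_of:
        parts.append(" OR ".join(any_of))           # Boolean OR
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if to_user:
        parts.append("to:" + to_user.lstrip("@"))
    if near:
        parts.append(f'near:"{near}"')
    if lang:
        parts.append("lang:" + lang)
    if attitude:
        parts.append(ATTITUDES[attitude])
    return " ".join(parts)

query = build_twitter_query(words="library", hashtag="rochester",
                            near="Rochester, NY", attitude="question")
print(query)
```

Typing the resulting string into the ordinary search box should give roughly the same results as filling in the form, provided the operators are still supported.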


Twittermining Tools

Storify.com
  Users build social stories, bringing together media scattered across the Web into a coherent narrative
  Access material shared with and by you and public posts
  Postings, status updates, photos, videos, podcasts from Twitter, Facebook, YouTube, Flickr, Instagram and more
  Discover others with similar interests
  Requires (free) account, via Facebook or Twitter

What does/could searching the social web provide your library’s users?

Established Services:
Bing, Blekko, DuckDuckGo, Exalead

The Fallacy of the Superior Search Engine

Conrad Saam*

Is there a difference in the quality of search results from Google and Bing?
Data set of 100 difficult queries
  “clean crayon off an led t.v. screen”
  “Who was Kim Jong Un’s mother?”
  “wii new release rumors”

*http://searchengineland.com/google-fails-to-trounce-bing-again-the-fallacy-of-the-superior-search-engine-revisited-107238

The Fallacy of the Superior Search Engine

Evaluative factors
  Timeliness
  One-click access to information
  Volume of content
  Lack of spam
  Authoritative sites appear in first 3 results
The winner???
  Google 296, Bing 274
“Bing needs to be a much better search engine than Google to make it worth the switch”

October, 2012

Microsoft’s Bing

Redesigned 6/8/12
Social search results now located in the new Social Sidebar (Facebook-based)
When logged in through Facebook
  Ask friends
  Friends who might know
  People who know
  Feed of questions you’ve asked your FB friends through Bing
Without a FB login
  Sidebar results come from public posts

What Bing is NOW

Travel – Price Predictor
Video – Hover and get a preview
Music: Artists – All content related to the artist (entity-based search)
Events – FanSnap (meta for ticket purchasing)
Shopping – Hottest deals on the web right now
Maps – Malls and Airports added
Everywhere – Xbox, Mobile, iPad

Curating the web with Blekko
http://blekko.com (still in beta!)

Human/crawler service
Blekko (human) editors create “topic” and “built-in” slashtags used to label content in the Blekko crawler database.
Registered users can create their own tags for any site in the Blekko database for a personal, searchable web
Slashtags help refine results and eliminate spam
Small but well curated database
“AdSpam” algorithm blocked 1.5 m. sites in the first 6 months

Blekko: Under the hood

3 search options
  Web results
  Slashtags (human/expert curation)
  Likes (Facebook friends’ curation)

Adding a slashtag limits the search to those sites so tagged

Note: adding multiple Blekko “topic” slashtags limits the search to sites which have ALL the tags
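The effect of combining slashtags can be sketched as a set intersection: each tag names a set of sites, and several tags together keep only the sites carrying all of them. The tag names, site names, and the `sites_for` helper below are hypothetical, and nothing here reflects Blekko's internal implementation.

```python
# Hypothetical slashtag -> tagged sites mapping for illustration only.
SLASHTAGS = {
    "health": {"nih.example", "clinic.example", "fitness.example"},
    "science": {"nih.example", "journal.example"},
}

def sites_for(tags):
    """Return the sites carrying ALL given tags, or None when no tag restricts the search."""
    sets = [SLASHTAGS.get(tag, set()) for tag in tags]
    if not sets:
        return None  # no slashtag: search the whole database
    return set.intersection(*sets)

print(sites_for(["health"]))             # every site tagged /health
print(sites_for(["health", "science"]))  # only sites with both tags
```

Because each extra tag can only shrink the set, stacking several topic slashtags quickly narrows a search to very few sites.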

Blekko this year

Slashtags now automatically added to searches in 500 broad categories based on aggregated anonymous search behavior.

For suggested slashtags: search term/

Adding /monte gives you results from 3 engines; sources revealed only after you select the most relevant results set

Received substantial investment from major Russian search engine Yandex

DuckDuckGo – http://ddg.gg

Home and search results pages redesigned

Related “Search Suggestions” on results pages

“Goodies” – user-supplied questions with answers in 20 broad categories
  Entertainment, Programming, Food & Drink, Sysadmin, Travel, Web Design

Exalead – http://exalead.com/search

Enterprise search company based in France with free web search as product demo

Advanced search options appear as questions

Database well maintained
Faceted search results
Used by several of the major metaengines

The Year at Google

Personalization and Social Networks in Google Results: A Timeline

2005 – Sites you visited given a boost (Opt-in via Google account)
2009 – Sites your IP address visited given a boost by default (Opt-out possible)
2009 – Sites mentioned by your personal social network given a boost, but separated from main results (Opt-in)
2011 – Social network results blended with main results (Opt-in)

Personalization and Social Networks: 2012 – Search Plus Your World

Boosts in results ranking
  Based on IP search behavior (Opt-out)
  Based on personal search behavior (Opt-in)
  Based on your social networks (Opt-in)
  Based on Google+ public posts (Default; multiple steps needed to opt-out)
  Based on your private Google+ network posts (Opt-in)

IP-based personalization

To permanently opt-out go to Search Settings

To opt-out on a per-search basis use the toggle (top right)

Personalization based on your personal search behavior is still opt-in

Google+ plus.google.com

Google’s social network (requires a Google account)

Launched 9/19/11 (access to Twitter ended 7/2/11)

Currently over 400 m. users, 100 m. active on a monthly basis
  Facebook currently over 1.01 b. active users
Offers “hangouts” – video chat rooms within the social network
Businesses and organizations allowed

Google+

“Google+1” allows a Google+ member to give a site a vote of approval

Web search results include +1 votes, sometimes location-based

Best access to content is through Google: site:plus.google.com search term(s)

Social Networks and Results: Users Respond
A distraction and concerns about privacy

Search Lesson Plans and Common Core Standards

Part of Google’s search education initiative
5 main topics with beginner, intermediate and advanced levels
  Picking the right search terms
  Understanding search results
  Narrowing a search to get the best results
  Searching for evidence for research tasks
  Evaluating credibility of sources

google.com/insidesearch/searcheducation/lessons.html

Search Lesson Plans

Focus is using Google, but adaptable to other sources

Each plan lists Common Core Standards addressed

Include illustrative slides and suggested assessments of student work

“A Google-a-day challenge” questions with answers

Good strategies for deep web searching in Advanced Level of Lesson #1

AAP Lawsuit settled

2005 – Association of American Publishers and McGraw-Hill, Pearson, Penguin, John Wiley, Simon & Schuster allege copyright violation in the Library scanning project

2012 – Google settles with publishers, who may now remove their books or journals from the Library project

Authors Guild suit remains unsettled

Content Removal Requests 1/12 – 6/12

Top 6 countries

Country    Total Requests
US         4167
UK         3193
Brazil     2310
Turkey     2084
Germany    1903
France     1250

SHARING USER INFORMATION HAS BECOME THE INDUSTRY NORM

A Privacy Primer

Search engines and privacy

Google’s policy for its account-based services
  New unified privacy policy in effect 3/1/12
  User profiles and individual search behavior will be shared among all Google services that require a login
  Account holders cannot opt-out of this sharing

Separate privacy policies still in effect for Google Books and Chrome

Google’s policy for services not requiring an account

Covers Search, YouTube
IP-based personalization in effect since 2009
“We will not combine DoubleClick cookie information with personally identifiable information unless we have your opt-in consent”

Remarketing or retargeting in the Google ad network
  Company and other websites tag visitors with an IP-based (personally anonymous) cookie
  When you visit other sites in Google’s ad network you will see ads from sites you have visited before based on these cookies
How to opt-out of remarketing/retargeting in your browser
  Turn off Web history
  Clear/Remove Web history
  Accept no cookies

Bing’s privacy policy

For MS services that require a Windows Live ID
  “…information collected through one MS service may be combined with information obtained through other Microsoft services.”
  Signing into one service may automatically sign you into other Microsoft services
To opt-out
  Use separate browsers for each MS service you access
  Sign in and out of your accounts throughout the day to de-couple specific activities

DuckDuckGo

Does not collect or share personal information

No browser cookies stored
No personally identifiable or IP-based search histories stored
No IP addresses stored
Very comprehensive with high-quality search results

Current Trends and Future Directions

Search Engine Trends in 2012

Reversal in transparency at the major services
Increasing personalization as the norm
Explosion of social network influence
Stronger anti-competitive allegations
Modest Bing marketshare gains

“The nature of the Internet is undergoing a paradigm shift” – Matthew Berk (Zyxt Labs)
http://zyxt.com/post/26851542949/study-of-1-3-billion-urls-22-of-web-pages-reference

2012 study of 1.3 billion URLs
22% of web pages contain Facebook URLs
Among 500 m. hardcoded links to Facebook only 3.5 m. are unique
URLs from Common Crawl (open repository of web crawl data that can be accessed and analyzed by everyone)
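The scale of these findings is easier to see with the arithmetic spelled out. The sketch below uses only the rounded figures from the slide, so the results are approximate, and the variable names are chosen here for illustration.

```python
# Rounded figures as reported on the slide.
total_urls = 1_300_000_000
facebook_share_percent = 22
hardcoded_links = 500_000_000
unique_links = 3_500_000

# About how many crawled pages reference Facebook.
pages_with_facebook = total_urls * facebook_share_percent // 100

# How concentrated the links are on a small set of Facebook URLs.
unique_share = unique_links / hardcoded_links
links_per_unique = hardcoded_links / unique_links

print(f"Pages containing Facebook URLs: {pages_with_facebook:,}")
print(f"Unique share of hardcoded links: {unique_share:.1%}")
print(f"Average links per unique Facebook URL: {links_per_unique:.0f}")
```

In other words, fewer than one in a hundred of the hardcoded links points somewhere distinct, so a small set of Facebook addresses is linked to over and over.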

“The Internet is shifting….” – M. Berk

from unstructured to structured content
  Structured content can be parsed and formatted into any other type of content
  Unstructured content – static html
from websites to entities
  Nodes in social and other networks that contain or link to websites and other content
from links to connection
  Growth of business and personal presence on the social web

In the future ---

Mobile search will continue to grow rapidly
Entity-based search will continue to develop
Personalization will grow but more slowly as users better understand the consequences
Social networks will continue as powerful tools for grassroots political movements
Web access and web search will attract more government scrutiny worldwide

Thank You and Enjoy Your Searching!

Michael Hunter
Reference Librarian
Hobart and William Smith Colleges
Geneva, NY 14456

(315) 781-3014 [email protected]