Page 1: When Google Isn’t Enough! Finding Information on the Invisible Web Yaacov Taube yankee@infoserve.co.il

When Google Isn’t Enough!

Finding Information on the Invisible Web

Yaacov Taube yankee@infoserve.co.il

What is the Visible (Surface) Web?

“It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.”

Sherman and Price.

What is the Visible (Surface) Web?

•A collection of webpages
•Searchable with “search engines”
•What you and I think of as the “Internet” is actually only a small portion of the Internet

What is the Visible (Surface) Web?

•High volume

•Mass appeal

•High value

•Small percentage of web content
–Exception: Google Books and Google Scholar

What is the Invisible Web?

•What search engines do not search
•Searchable Databases
–Tens of thousands
–Accessible and searchable via the Internet
–Results often dynamically generated in specific response to your request (eBay, MapQuest, etc.)

What is the Invisible Web?

•Excluded Pages
–Excluded per search engine
–Excluded per webpage by the owner of the site
•Typically databases
–Businesses
–Governments
–Schools
–Libraries
–Associations

What is the Invisible Web?

•Academic
•Never been indexed or linked
•Uniquely generated pages
•Proprietary
•Confidential
•Protected by username & password
•Constitutes the majority of the webpages on the Internet

Visible vs. Invisible Web

•The Invisible Web is about 550 times larger than the visible Web and is growing much faster
•The deep Web consists of about 91,000 terabytes
•The surface Web is only about 167 terabytes [1]
•The Library of Congress contains about 11 terabytes
•Quality content is 1,000 to 2,000 times greater than on the surface Web
•95% of the deep Web is accessible to the public (no fees or subscription required)

[1] Based on extrapolations from a study done at the University of California, Berkeley
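
As a quick sanity check, the "about 550 times" figure can be compared with the terabyte estimates quoted on the same slide. The short Python sketch below only divides the slide's own numbers; the constant and function names are illustrative and do not come from the presentation.

```python
# Size estimates quoted on the slide, in terabytes.
DEEP_WEB_TB = 91_000
SURFACE_WEB_TB = 167
LIBRARY_OF_CONGRESS_TB = 11


def ratio(larger: float, smaller: float) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


deep_vs_surface = ratio(DEEP_WEB_TB, SURFACE_WEB_TB)
surface_vs_loc = ratio(SURFACE_WEB_TB, LIBRARY_OF_CONGRESS_TB)

print(f"Deep Web vs. surface Web: about {deep_vs_surface:.0f} times")
print(f"Surface Web vs. Library of Congress: about {surface_vs_loc:.0f} times")
```

The first ratio comes out at roughly 545, which is consistent with the rounded "about 550 times" on the slide.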
What is on the Invisible Web

• Opaque Web
• Private Web
• Proprietary Web
• Pay per click

Why can’t Google find it?

• Requires payment
• Requires registration
• Dynamically generated
• Very new
• Website specifically stops spiders

Opaque Web

• Fixed content that could be indexed, but is not
• Deemed not important enough
• Too new and therefore not linked
• Never makes the maximum results cutoff
• No one ever linked to or submitted the URL

Private Web

• Deliberately excluded
– Password
– Special coding in website stops spiders
• Only for select individuals
– Employees
– Students
– Researchers
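
The slide does not say which "special coding" it has in mind. One widely used mechanism is a robots.txt file, so the Python sketch below assumes that case. It uses the standard-library `urllib.robotparser` module to show how a well-behaved spider would decide what it may fetch. The robots.txt content, the bot name, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (not from the presentation).
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff/
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before fetching each URL.
public_ok = parser.can_fetch("ExampleBot", "https://www.example.com/about.html")
staff_ok = parser.can_fetch("ExampleBot", "https://www.example.com/staff/payroll.html")

print("about.html may be crawled:", public_ok)
print("staff/payroll.html may be crawled:", staff_ok)
```

Pages under a disallowed path stay out of the index even though no password protects them, which is one way content ends up on the Private Web.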

Proprietary Web

• Protected
– Password
– Registration (N.Y. Times, eBay, banks, etc.)
– Terms of Use
• Anyone can access it if they
– Pay
– Register
– Agree to the terms

Pay per click

Search Engine Marketing tools
Ex: overture.com, FindWhat.com

When do I use a …

• Portal or Directory?

• Search Engine?

• Invisible Web?

Portal or Directory

• You have a general topic
• You know little about the subject
• You do not know keywords
• You want someone or something to have sorted out the junk
• You need an exploratory overview

Search Engine

• You are looking for something specific
• You have keywords
• You are pretty sure the information is
– advertised or
– otherwise generally disseminated

Tips for search engines

• Use a toolbar
• Determine the key words/phrases most likely to be in your document and nowhere else
• Learn and use Boolean Operators
• Scan results
• Question the results
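
To make the Boolean operators tip concrete, here is a minimal Python sketch of how AND, OR, and NOT narrow or widen a result set. It models each keyword as the set of documents that contain it. The four-document collection and the `matching` helper are invented for illustration, and real search engines use their own query syntax.

```python
# Tiny hypothetical document collection: id -> text.
DOCS = {
    1: "invisible web databases for patent research",
    2: "google search tips and tricks",
    3: "patent search on the invisible web",
    4: "travel databases and flight search",
}


def matching(term: str) -> set:
    """Return the ids of documents that contain `term` as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


# AND: both words must appear (set intersection).
and_result = matching("patent") & matching("databases")

# OR: either word may appear (set union).
or_result = matching("patent") | matching("travel")

# NOT: the first word must appear, the second must not (set difference).
not_result = matching("search") - matching("google")

print("patent AND databases:", sorted(and_result))
print("patent OR travel:", sorted(or_result))
print("search NOT google:", sorted(not_result))
```

AND shrinks the result list, OR grows it, and NOT removes a known source of noise, which is why combining them helps isolate the key words "most likely to be in your document and nowhere else".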

Invisible Web

• You are pretty sure the information is in a specific database
• Need something authoritative
• Speed
• The information is dynamically generated
• You are familiar with the database
– Search techniques
– Protocols
– Access requirements

Searching the Invisible Web

• Directories – subject guides compiled by human editors
• Specialized Search Engines
– http://library.albany.edu/internet/choose.html
• Special Databases (Library of Congress, LookSmart’s Find Articles, National Science Digital Library, Singing Fish)

Special Databases

• Library of Congress
– http://catalog.loc.gov
• LookSmart’s Find Articles (over 900 publications)
– http://www.findarticles.com
• National Science Digital Library
– http://www.nsdl.org
• Singing Fish – audio and video
– http://www.singingfish.com

Types of Databases

Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query.

Examples:
• Phone books, people finders
• Patents, laws
• Items for sale in a Web store or Web-based auctions
• Digital exhibits
• Multimedia and graphical files
• Stock and bond prices
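
The phrase "accessible only by query" can be illustrated with a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for the database products named above. The table, its rows, and the `search` helper are hypothetical. The point is only that a result list does not exist as a page until someone submits a query.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create an in-memory store catalog (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT, category TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("Antique map", "maps", 120.0),
            ("Road atlas", "maps", 15.5),
            ("Desk lamp", "home", 30.0),
        ],
    )
    return conn


def search(conn: sqlite3.Connection, category: str, max_price: float) -> list:
    """Build the result list on demand, as a web store does per request."""
    rows = conn.execute(
        "SELECT title, price FROM items "
        "WHERE category = ? AND price <= ? ORDER BY price",
        (category, max_price),
    )
    return rows.fetchall()


conn = build_catalog()
results = search(conn, "maps", 50.0)
print(results)
```

A spider that only follows links never runs `search`, so it never sees this result list. That is the sense in which database content stays invisible.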

Types of Hidden Info

• Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
• Pages requiring login or registration: social sites, New York Times, web-based applications, calendars, Google Docs, etc.
• Government publications or databases: ERIC, usa.gov
• Online databases: Gale Research
• PDF files, audio, video, any new format

More hidden stuff

• Dictionaries and thesauri
• Sites that require forms to be filled out (ex: travel directions, job hunting)
• Product catalogs and library catalogs
• Newspaper and magazine archives
• Dynamic web pages (ex: airline flight checkers, MapQuest)
• Interactive tools (ex: calculators & measurement converters)
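
Form-driven pages are hidden because the address of each result depends on what the visitor types. The Python sketch below, using the standard-library `urllib.parse.urlencode`, shows how a filled-in form typically turns into a request URL. The example.com endpoint and the field names `from` and `to` are hypothetical and are not taken from any real site.

```python
from urllib.parse import urlencode

# Hypothetical travel-directions form endpoint (illustration only).
FORM_ACTION = "https://www.example.com/directions"


def form_to_url(action: str, fields: dict) -> str:
    """Encode form fields into the query string of a GET request."""
    return f"{action}?{urlencode(fields)}"


url = form_to_url(FORM_ACTION, {"from": "Tel Aviv", "to": "Haifa"})
print(url)
```

Every combination of field values yields a different URL, and no ordinary link points to most of them, so a spider following links has no way to discover these pages.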

Access to invisible web is improving …

Google Books http://books.google.com/

Google Scholar http://scholar.google.co.il/

Maybe Consider …

• Specialized Databases such as Dialog, LexisNexis, Factiva, etc. (not cheap)

• Use an Information Professional www.aiip.org

To Conclude …

Focus on what you do best and what you have been trained for, and let an Information Professional find the information you need.

An Information Professional is trained to do it faster, more effectively, and more efficiently than you or one of your employees (www.aiip.org).