The Invisible Web Definition Searching

The Invisible Web Definition Searching. The Invisible Web Also called: deep content hidden internet dark matter

The Invisible Web

Definition
Searching

The Invisible Web

Also called: deep content, hidden internet, dark matter

The Invisible Web

The vast number of pages that search engines cannot or will not index

Restricted: protected by a login or password (such as intranets and databases; private or proprietary content)
Sites not linked from anywhere (undiscovered)
Sites that use a robots.txt file to keep files off limits to spiders
Unsearchable or unindexable file formats
Non-static: searchable databases that produce results only dynamically, in response to a specific search request (such as CGI, ASP, CFM)
Real-time data: changes rapidly, too “fresh”
Sites that are too “deep”
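One of the causes listed above, a robots.txt file, can be checked programmatically. The sketch below uses Python's standard-library `urllib.robotparser` to test whether a spider may fetch a URL. The rules, the user-agent name `ExampleBot`, and the paths are hypothetical, chosen only to illustrate how a site keeps part of its content off limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep all spiders out of /private/
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def can_crawl(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(can_crawl("http://www.website.com/index.html"))           # True
print(can_crawl("http://www.website.com/private/report.html"))  # False
```

A well-behaved spider runs a check like this before each fetch, which is why pages under a disallowed path never reach the search engine's index.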

The Invisible Web

Search engines often avoid indexing web pages that are delivered dynamically, such as via database programs:

Often, the search engine may not accept the URL used to retrieve the document. Many dynamic delivery mechanisms use the ? symbol.

For example, a page may be retrieved with a URL like this: http://www.website.com/cgi-bin/getpage.cgi?name=sitemap Most search engines will not read past the ? in that URL.
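To see what a crawler faces with such a URL, the sketch below splits the example URL into its path and its query string with Python's `urllib.parse`, then applies a hypothetical crawler rule that skips any URL carrying a query string. The rule (`hypothetical_crawler_accepts`) only mimics the behavior described above; it is not any particular search engine's policy.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.website.com/cgi-bin/getpage.cgi?name=sitemap"

parts = urlsplit(url)
print(parts.path)             # /cgi-bin/getpage.cgi
print(parse_qs(parts.query))  # {'name': ['sitemap']}

def hypothetical_crawler_accepts(url: str) -> bool:
    """Hypothetical rule: ignore any URL that carries a query string after '?'."""
    return urlsplit(url).query == ""

print(hypothetical_crawler_accepts(url))                                    # False
print(hypothetical_crawler_accepts("http://www.website.com/sitemap.html"))  # True
```

Everything after the ? (`name=sitemap`) is the request sent to the database program; a crawler that stops at the ? never sees the page that request produces.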

The Invisible Web

Invisible Web sources tend to be:
More current
More comprehensive
Searchable (however, not by search engines)
More specific/targeted
Greater in depth and breadth
Often of better quality

The Invisible Web

Top types of “invisible” information:
News
RSS
Blogs
Public company filings, stock prices
Customized maps and directions
Clinical trials
Telephone numbers and addresses, postal codes
Definitions
Job postings
Grant information
Statistics
Weather
Museum, gallery, and library holdings

Finding the “Dark Matter”

Search Engines
Specialized Search Engines
Directories
Vortals

Traditional Search Engines

Traditional search engines’ incorporation of “invisible” databases:
Weather
Maps
Phone directories
Catalogs
Stock prices

Traditional Search Engines

Unless specially programmed, though, spiders can’t find all the valuable resources available.

Specialized Search Engines

Search deeper into sites: go beyond the top page, or homepage
Choose sources to spider: topical sites only
“Smart” ranking and indexing based on knowledge of the specific subject

Specialized Search Engines

There are hundreds of specialized search engines covering almost every topic:

Search Engine Guide Specialty Search Engines

Directories

Collections of pre-screened websites organized into categories based on a controlled ontology

Ontology: classification of human knowledge into topics, similar to traditional library catalogs

Directories

Closed Model: paid editors; quality control (LookSmart, Yahoo)

Open Model: volunteer editors (Open Directory Project, Google)

Directories

Easier access to relevant results
Faster
Access to materials not always indexed by search engines: content in databases or file types not searched by spiders

Directories

Issues with directories:
Inherently small
Unseen editorial policies
May charge for listing
Lopsided coverage
Timeliness: harder to keep updated

Vortals

Vortals: vertical portals. Instead of being a horizontal, all-inclusive entry point into the Web, they are vertical, specialized entry points.

Comprehensive sites focusing on gathering and providing links to the best resources in a specific topic.

Usually combine subject-specific search engines and subject-specific directories

Also called “focused crawlers,” metasites, guru sites, authority sites, industry guides, or subject directory sites
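A focused crawler can be sketched as a breadth-first crawl that only follows links from pages judged relevant to the chosen topic. The sketch below runs over a small in-memory link graph instead of the live Web; the page names, their texts, the `TOPIC_TERMS` keyword set, and the simple keyword-count relevance test are all hypothetical stand-ins for the richer relevance models a real vortal would likely use.

```python
from collections import deque

# Hypothetical in-memory "web": page -> (text, outgoing links)
PAGES = {
    "home":      ("chemistry portal and news", ["chem-db", "sports"]),
    "chem-db":   ("chemical database of melting points", ["compounds"]),
    "compounds": ("compound properties chemical data", []),
    "sports":    ("football scores", ["tickets"]),
    "tickets":   ("buy match tickets", []),
}

# Hypothetical topic vocabulary for a chemistry vortal
TOPIC_TERMS = {"chemistry", "chemical", "compound", "melting"}

def is_relevant(text: str, min_hits: int = 1) -> bool:
    """Hypothetical relevance test: count topic terms appearing in the text."""
    words = set(text.lower().split())
    return len(words & TOPIC_TERMS) >= min_hits

def focused_crawl(seed: str) -> list[str]:
    """Breadth-first crawl that keeps relevant pages and only expands their links."""
    visited, kept = {seed}, []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        text, links = PAGES[page]
        if not is_relevant(text):
            continue  # off-topic: don't keep it and don't follow its links
        kept.append(page)
        for link in links:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return kept

print(focused_crawl("home"))  # ['home', 'chem-db', 'compounds']
```

Because off-topic pages are pruned early, the crawler spends its effort going deeper into on-topic sites rather than wider across the whole Web.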

Vortals

Advantages: the best of directories and subject-specific search engines

More up to date: crawl subject-specific pages more often

Deeper crawl: gets more of the content on each server

More precision, less recall
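Precision and recall are standard retrieval measures: precision is the share of retrieved results that are relevant, and recall is the share of all relevant documents that were retrieved. The sketch below computes both for two hypothetical result sets to illustrate the trade-off in the line above; the document IDs and set sizes are invented for illustration only.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / |retrieved|; recall = hits / |relevant|, where hits = |retrieved & relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection: 10 documents are relevant to the query
relevant = {f"d{i}" for i in range(10)}

# Broad general engine: 16 results, half of them off-topic
general = {f"d{i}" for i in range(8)} | {f"x{i}" for i in range(8)}
# Vortal: 5 results, all on-topic
vortal = {f"d{i}" for i in range(5)}

print(precision_recall(general, relevant))  # (0.5, 0.8)
print(precision_recall(vortal, relevant))   # (1.0, 0.5)
```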

Searching the Invisible Web

How do you find these sites? Use known directories to find invisible web searching and browsing tools:
Librarians’ Index to the Internet
Open Directory
Google Directory
Teoma works well, too.

Searching the Invisible Web

Rethink your search: think about key terms versus specific details (macro vs. micro). Example: you want to find the melting point of hydrogen peroxide. On the general web, you’d enter the keywords melting, point, and “hydrogen peroxide.” On the invisible web, you look for a chemical database that includes melting points as one of its features; once in the database, you’d search for hydrogen peroxide.

Searching the Invisible Web

Remember that some concepts are assumed: do not use the tool’s own subject as a search term. Example: if you are looking for information on gender inequity in math education, exclude terms like “education” from your search in AskERIC, an education-specific search tool.

Mining the Invisible Web

Tips: Certain kinds of sites can prove to be clearinghouses of information:
Government: statistics of all kinds
Professional organizations: archives of relevant research and statistics
Media sites (TV and radio): transcripts and speeches
College and university professor sites: lectures and personal publications

Mining the Invisible Web

Look for library guides and commercial portals for more guidance in finding the hidden, valuable content available for free on the Web (more on this in the next lesson):

My Ready Reference on the Web Resource