www.sharon-it.com
Contents
• What is the Invisible Web?
• How big is the Invisible Web?
• Why is there an Invisible Web (and what's in it)?
• Case study: patent search.
• How to find Invisible Web resources?
What is the Invisible Web?
• Also called the “Deep Web”, in contrast to the “Surface Web”, which is the Visible Web.
• The term “Invisible Web” refers to the content of pages that are available and accessible on the Web but are not reached or indexed by the regular search engines (SEs). It mostly includes:
– searchable databases
– excluded pages
• These pages do not appear in SE search results.
• Information on the Invisible Web can be found by accessing it directly or by using specialized SEs.
• The Invisible Web is larger than the Visible Web.
How Big is the Invisible Web?
According to a BrightPlanet study (2000):
• The deep web (Invisible Web) is 500 times larger than the surface web.
– Number of search utilities:
• 45,000 search engines on the surface web.
• 200,000 searchable databases within the deep web.
– Number of documents:
• 1 billion documents on the surface web.
• 550 billion documents within the deep web.
• The quality of deep web content is 1,000 times greater.
• 95% of deep web information is publicly available.
http://www.brightplanet.com/technology/deepweb.asp
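A quick check of the ratios implied by the counts on this slide (using only the numbers quoted above):

```latex
\frac{550 \times 10^{9}\ \text{documents}}{1 \times 10^{9}\ \text{documents}} = 550 \approx 500
\qquad
\frac{200{,}000\ \text{databases}}{45{,}000\ \text{search engines}} \approx 4.4
```

So the "500 times larger" headline refers to document volume; by number of search utilities, the deep web has only about 4.4 times as many.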
Invisible Web?!
[Illustration: The Invisible Web]
Is it Really that Big?
• Some argue that the Invisible Web is actually only 50–80 times bigger than the Visible Web.
Why is there an Invisible Web?
1. Specialized searchable databases:
– Dynamic pages
– Require parameters and user judgment
– Require a username and password
2. Script-based pages:
– Include “?” in their URL
– A hazard to SEs (crawler traps)
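To illustrate why “?” URLs are treated as a hazard, here is a minimal Python sketch of a crawler rule that simply refuses any URL carrying a query string. The rule, the `should_crawl` name and the example.com calendar URLs are hypothetical; real search engines apply more nuanced policies.

```python
from urllib.parse import urlsplit


def should_crawl(url: str) -> bool:
    """Hypothetical crawler rule: skip script-based URLs with a query string.

    Each parameter combination can produce a "new" page, so a script such
    as a calendar can generate an endless URL space (a crawler trap).
    """
    parts = urlsplit(url)
    if parts.query:  # e.g. "date=2005-01-01"
        return False
    return parts.scheme in ("http", "https")


# A calendar script is a classic trap: every "next day" link is a new URL.
trap = [f"http://example.com/calendar.php?date=2005-01-{d:02d}" for d in range(1, 4)]
static = "http://example.com/about.html"

print([should_crawl(u) for u in trap])  # [False, False, False]
print(should_crawl(static))             # True
```

A blanket rule like this keeps the crawler safe, but it is also exactly why database-driven pages end up in the Invisible Web.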
Why is there an Invisible Web? (2)
3. Real-time, constantly changing content
4. Very large websites are only partially indexed
5. Private/secret websites:
– Internal company portals
– Excluded from SEs (using robots.txt or similar)
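Exclusion via robots.txt can be checked with Python's standard `urllib.robotparser` module. The robots.txt content and the example.com paths below are made up for illustration; a well-behaved crawler runs a check like this before fetching, which is why excluded pages never reach the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a company site (invented paths).
ROBOTS_TXT = """\
User-agent: *
Disallow: /intranet/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of fetching

for url in ("http://example.com/products.html",
            "http://example.com/intranet/staff.html",
            "http://example.com/search?q=chairs"):
    # can_fetch() applies the rules for the given user agent ("*" = any crawler)
    print(url, parser.can_fetch("*", url))
# Prints True for products.html and False for the other two URLs.
```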
Why is there an Invisible Web? (3)
6. Multimedia (and other format) files:
– Special formats
– Examples: PDF, DOC, PPT, GIF, Flash
– Some SEs can't, or won't, index them.
Why is there an Invisible Web? (4)
7. Additional reasons (mostly spam-related or for saving resources):
– File size
– Number of words in the page
– Pages requiring cookies
– Other spam characteristics
– Multimedia files
– Files and URLs with special characters
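The reasons above amount to filtering rules applied before indexing. A minimal sketch, assuming made-up thresholds and a hypothetical `Page` record (real search engines do not publish their limits), might look like this:

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
MAX_BYTES = 500_000
MIN_WORDS = 20
# Anything outside the usual URL character set counts as "special" here.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")


@dataclass
class Page:
    url: str
    body: str
    needs_cookies: bool = False


def skip_reasons(page: Page) -> list[str]:
    """Return the (hypothetical) reasons an indexer might drop this page."""
    reasons = []
    if len(page.body.encode("utf-8")) > MAX_BYTES:
        reasons.append("file too large")
    if len(page.body.split()) < MIN_WORDS:
        reasons.append("too few words")
    if page.needs_cookies:
        reasons.append("requires cookies")
    if SPECIAL_CHARS.search(page.url):
        reasons.append("special characters in URL")
    return reasons


page = Page("http://example.com/a b|c.html", "short text", needs_cookies=True)
print(skip_reasons(page))
# ['too few words', 'requires cookies', 'special characters in URL']
```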
Why is there an Invisible Web? (5)
8. Non-linked pages (no incoming/inbound links)
9. Pages on servers with dynamic IP addresses
• For example, 5% of the Internet is not connected! The Internet is partitioned, and there is “dark address space”: prefixes that are unreachable from one provider but reachable from other providers, for long periods of time.
• This covers 5% of the total number of prefixes in the Internet, or tens of millions of end hosts.
Source: Arbor Networks: http://www.arbornetworks.com/downloads/research38/dark_address_space.pdf
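Reason 8 (non-linked pages) follows from how crawlers discover pages: by following links. The sketch below runs a breadth-first crawl over a hypothetical link graph; the page names are invented, and the point is only that a page with no inbound links is never reached from the home page.

```python
from collections import deque

# Hypothetical site: page -> pages it links to.
LINKS = {
    "/": ["/products.html", "/about.html"],
    "/products.html": ["/"],
    "/about.html": ["/contact.html"],
    "/contact.html": [],
    "/old-report.html": ["/"],  # links out, but nothing links to it
}


def crawl(start: str) -> set[str]:
    """Breadth-first crawl that can only reach pages through links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = crawl("/")
print(sorted(set(LINKS) - found))  # ['/old-report.html']
```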
Example: Invisible Flash Website
• Example: a search for “Leo baby Pot” fails in all search engines.
• It exists on the website www.tamooz.com
• Reason: it is a Flash website!
Leo baby Pot
The Invisible Web is Mostly Topic Databases
• SEs know about many databases, but not their content.
• SE crawlers are blocked from entering many of them.
• Searching these databases requires entering through the site's own user interface, and often also registration/password/cookies.
Specialized Interface vs. [comparison screenshots]
• The database’s user interface is specialized and designed to get the best results.
Will the Invisible Web Become Visible?
Definitely!
• Using intelligent SEs.
• Over time, SEs that can access the Invisible Web pick up new information and make it visible.
Example: Patent/Trademark Search
• Searching Google for “Google patents or trademarks” gives a mix of results:
– Patent databases, patent disputes, adware, etc.
• The USPTO or a patent database gives an organized list.
Is the Invisible Web Invisible?
[Illustration: Google and Yahoo!, marked with an X]
How to Find Invisible Web Resources?
Search for sources, not information!
1. Two-step searching in general search engines
– Learn how to phrase queries well (vocabulary, anchor words, etc.)
2. Invisible Web search engines
3. Pathfinders: directories and guides
4. Expand from one website to others
– Using link:, related:, directories, pearl culturing
1. Two-Step Searching
• Use a general search engine (such as Google) to find a good database, then search for the information with that database's or website's own search engine.
• Example anchor words to add to the query (see next slide):
– database
– association
– portal
– encyclopedia
– product review
Anchor Words
• Use anchor words to find key websites and directories:
– Directory of
– Center of
– Industry portal
– Guide
– Database
– Resource
– Bibliography
– Reference
– Working group
• Examples:
– “professional publications/journals”
– <term> “industry portal” (or just portal)
– <term> “metasite resource”
– <term> pathfinder
– allintitle: <term> “directory of”
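The anchor-word patterns above can be generated mechanically for any topic. `anchor_queries` is a hypothetical helper, not a feature of any search engine; it only builds query strings to paste into a general search engine (the first step of two-step searching).

```python
# Anchor words from the slide above.
ANCHORS = ["directory of", "center of", "industry portal", "guide",
           "database", "resource", "bibliography", "reference",
           "working group"]


def anchor_queries(term: str) -> list[str]:
    """Build candidate 'find the database' queries for a topic."""
    queries = [f'{term} "{anchor}"' for anchor in ANCHORS]
    # Two more patterns from the examples list.
    queries.append(f'allintitle: {term} "directory of"')
    queries.append(f"{term} pathfinder")
    return queries


for q in anchor_queries("patents")[:3]:
    print(q)
# patents "directory of"
# patents "center of"
# patents "industry portal"
```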
Two-Step Searching: Example 1
• Find videos of a man blowing a shofar.
– Google search: video search
– Finds a list of video search engines:
• AltaVista Video Search www.altavista.com/video/default
• Yahoo Video Search video.search.yahoo.com
• Singingfish www.singingfish.com
• Google Video video.google.com
• Blinkx Video Search www.blinkx.tv
• Etc.
– Search for shofar
AltaVista Video: Shofar
Two-Step Searching: Example 2
• Who is the Itzik addressed in the song “Itzik, keep your songs from shadow”?
– Google search: songs
• Shironet website (second option)
– Search from the menu using words from the song
– Result: “Itzik Manger”
– Quote:
“Itzik, keep your songs from shadow,
Don't be a fool, and listen:
Put a drop of wine in the song,
But keep it from tears.”
2. Invisible Web Search Engines
• Sherman-Price Invisible-Web Directory http://www.invisible-web.net/ (temporarily out of service)
• CompletePlanet http://www.completeplanet.com
• Beaucoup http://www.beaucoup.com
• Turbo10 http://turbo10.com/
Sherman-Price Invisible-Web directory
Invisible-Web Directory: People Search (cont.)
CompletePlanet
Beaucoup
• Over 2,500 engines
• The engines listed on the main site are "free information" sites: a *lot* of information.
• Annotated subject directory
Turbo10
Turbo10 – Edit Collections
3. Pathfinders
• Librarians’ Index to the Internet http://www.lii.org
• MeL Michigan eLibrary http://www.mel.org/
• Internet Scout Project http://scout.wisc.edu/
• Infomine http://infomine.ucr.edu/
• More: http://www.calvin.edu/library/searreso/internet/webdirec.stm
Quiz
• Can you find any information on the World Wide Web if you use a big enough search engine?
Exercises
1. Search for a keyword in a big website (example: "acid rain" site:epa.gov).
– Use several search engines.
– How many results do you get?
2. What was the exchange rate of the Canadian dollar (in US dollars) on 20 Sep 1991?
3. What was the value of Berkshire stock (BRK.A) on Nov. 12, 1996?
4. Who wrote this (hint: a book or paper): "Israeli Arabs will undoubtedly benefit from peace agreements between Israel and its neighbors"?
5. What is the zip code of 22 Hamaayan in Givataim?
6. According to the newspaper "HaMagid", what did Binyamin Ze'ev Herzl say at a preliminary assembly before the First Zionist Congress?
7. Search for armchairs on the ikea.co.il website (try using site:).
8. Where would you search for people?
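For exercises 1 and 7, a site-restricted query can also be built directly as a search URL. This sketch assumes Google's `https://www.google.com/search?q=...` URL pattern; `site_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str) -> str:
    """Build a search URL that restricts `query` to one site via site:."""
    # urlencode() escapes spaces as "+" and ':' / '"' as %3A / %22.
    return "https://www.google.com/search?" + urlencode({"q": f"{query} site:{site}"})


print(site_search_url('"acid rain"', "epa.gov"))
# https://www.google.com/search?q=%22acid+rain%22+site%3Aepa.gov
print(site_search_url("armchairs", "ikea.co.il"))
# https://www.google.com/search?q=armchairs+site%3Aikea.co.il
```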
References
• http://websearch.about.com/od/invisibleweb/
• http://www.shelton.cc.al.us/library/lbs102/lbs102session12.html
• http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
• http://www.campus-technology.com/article.asp?id=7477
• http://www.searchengineoptimising.com/optimisation
• http://www.press.umich.edu/jep/07-01/bergman.html
• http://www.oliveglobal.com/wsw.html