Search Tips, or Competing with Search Robots
Inspired by Mary Ellen Bates’ workshop “Tips From a Super Searcher: Getting the Most From the Web and Online Sources”, Prague, 2003.
Toshka Borisova
AUBG Freedom Forum Journalism Library Coordinator
19 and 26 June 2003 Toshka Borisova 2
Search Tips
The World Wide Web contains more information than any other single resource in existence today. Finding the information you are looking for among the billions of web pages on the web can be tough. This guide of search tips will have you on the road to finding information quickly and effectively.
- Web search tips
- The invisible web
Online Search Strategies
What are you looking for?
- Full text or abstracts?
- Current material or 10 years back?
- Basic or advanced material?
- Short or in-depth articles?
- Any "validating" sources?
- Exact match or something close?
- Leads to identify experts to call?
- White papers (official sets of proposals in specific policy areas), statistics and other info more likely to be on web sites?
Online Search Tips
- Use the "advanced search" option http://www.aubg.bg/library/text.php?i=68
0 Google Well known as the "king of search," this engine
has one of the largest databases of web pages in the world. Fast, accurate results are common here and chances are good that if you can't find it in Google, it's not meant to be found.
Online Search Tips
- Plan on two separate search sessions
- Be sure to value your time
– White Paper on the true cost of searching the open web vs. the professional online services: www.factiva.com/infopro/BusIntellletter.pdf
- Assume you will find something
- We have higher relevance expectations than our patrons
- Watch for what's not online
Online Search Tips
- Watch for references to "grey literature": "That which is produced on all levels of government, academics, business and industry in print and electronic formats, but which is not controlled by commercial publishers."
- Include www or http in your search strategies to find mentions of web sites
- Always use several tools for the same search
- Watch for alternate spellings and phrasings
- Use the same words in a different order
Web Search Tips
- Use tools, not search engines
- There is absolutely no pattern
- Wayback Machine: http://www.archive.org/
- Purge your "assumptions cache" regularly
- Keep a trail of where you have been
- Be sure to value your time
Web Search Tips
- When exploring a site, use the Site Map or Site Index
- Use the [Search This Site] feature to find hidden pages
- Know the "power tools" of each search engine
– Field searches
– File-type searches
– Limits by date, language, site
– Truncation
– Boolean
Search Tips
- Keyword Search: many search engines offer a keyword search by default
- Phrase Search
- Boolean Operators: named after mathematician George Boole, Boolean logic involves the operators AND, OR, NOT, and occasionally NEAR
Online Search Tips
Keyword Search
- Use KWIC (Key Word In Context)
- Try to find synonyms, acronyms
– http://www.keyworddensity.com/
– http://www.wordtracker.com/
- Search for key words in title
- Use the "at least X times" feature
– DJI/Factiva, LexisNexis, Dialog
Web Search Tips
Phrase Searching
Requires the terms to appear in the exact order in which they are typed. Most systems that allow phrase searching have the user enter the phrase in quotes.
- Example: "national endowment for the arts"
- Phrase searching: supported by all
– Google: phrases may not be on page
– Teoma: "Not always exact matches" (FIXED)
- Openfind: debuting in beta form on July 5, 2002, Openfind is a new, large, independently built search engine, initially claiming 3.5 billion pages. It is based on research in Taiwan and has a Chinese version as well. None available now
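The difference between a keyword match and a phrase match can be sketched in a few lines of Python. This is only an illustration: the function names and the sample sentence are made up for this example, and real search engines tokenize and rank pages in far more elaborate ways.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_match(query, text):
    """True if every query word occurs somewhere in the text, in any order."""
    words = set(tokenize(text))
    return all(term in words for term in tokenize(query))

def phrase_match(phrase, text):
    """True if the query words occur next to each other, in the typed order."""
    terms, words = tokenize(phrase), tokenize(text)
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))

# Made-up sample page: it has all the words, but not the exact phrase.
page = "The arts endowment is a national program for the arts."
print(keyword_match("national endowment for the arts", page))  # True
print(phrase_match("national endowment for the arts", page))   # False
```

The same page satisfies the keyword search but fails the phrase search, which is why quoting a phrase usually cuts down the number of results.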
Web Search Tips
Boolean operators: just use them wisely
– Simple ANDs, ORs
– Narrows results
Boolean NOT ( - )
– Exclude meaning
– Exclude domains
Boolean OR
– Crucial synonyms
– Need more pages
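One way to picture the three operators is as set operations on the pages that contain each term. The sketch below uses a tiny made-up collection of pages (the `PAGES` data and the `pages_with` helper are assumptions for illustration, not how any particular engine stores its index).

```python
# Toy collection: page id -> page text (made-up sample data).
PAGES = {
    1: "jaguar car dealer",
    2: "jaguar habitat rainforest",
    3: "leopard habitat africa",
    4: "car dealer prices",
}

def pages_with(term):
    """Return the set of page ids whose text contains the term as a word."""
    return {pid for pid, text in PAGES.items() if term in text.split()}

# AND narrows results: both terms must be present.
both = pages_with("jaguar") & pages_with("habitat")
# OR widens results: useful for synonyms or related terms.
either = pages_with("jaguar") | pages_with("leopard")
# NOT ( - ) excludes an unwanted meaning, here the car rather than the animal.
animal_only = pages_with("jaguar") - pages_with("car")

print(sorted(both))         # [2]
print(sorted(either))       # [1, 2, 3]
print(sorted(animal_only))  # [2]
```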
Web Search Tips
To OR or not to OR
- Google: OR in CAPS, advanced
– Does not always work right
– yellowstone bison OR buffalo
- AlltheWeb: use ( ) or Advanced Boolean Box
– yellowstone (bison buffalo)
- AltaVista: normal
– yellowstone AND (bison OR buffalo)
- Gigablast: Use + (but not the same)
– +yellowstone bison buffalo
- Teoma
– yellowstone bison OR buffalo
– Becomes (yellowstone AND bison) OR buffalo
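Why the grouping matters can be shown with sets. In this sketch the page collection is made up, and the two expressions simply contrast the intended reading with the "(yellowstone AND bison) OR buffalo" reading described above; it does not reproduce any engine's actual parser.

```python
# Toy collection: page id -> page text (made-up sample data).
PAGES = {
    1: "yellowstone bison herd",
    2: "yellowstone buffalo photos",
    3: "buffalo new york hotels",
    4: "yellowstone geysers",
}

def pages_with(term):
    """Return the set of page ids whose text contains the term as a word."""
    return {pid for pid, text in PAGES.items() if term in text.split()}

yellowstone = pages_with("yellowstone")
bison = pages_with("bison")
buffalo = pages_with("buffalo")

# Intended meaning: yellowstone AND (bison OR buffalo)
intended = yellowstone & (bison | buffalo)
# Other reading: (yellowstone AND bison) OR buffalo
regrouped = (yellowstone & bison) | buffalo

print(sorted(intended))   # [1, 2]
print(sorted(regrouped))  # [1, 2, 3]
```

The regrouped query also returns the page about hotels in Buffalo, New York, which has nothing to do with Yellowstone.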
Web Search Tips
Proximity
– Text matching
– Citation hunt
– Plagiarism check
– Q&A
NEAR and Other Proximity
– AltaVista only
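A proximity operator such as NEAR can be approximated as "both words occur within a certain number of words of each other". The sketch below is a simplified illustration; the function name and the default window of 10 words are assumptions, not a documented setting of any engine.

```python
import re

def near(text, word_a, word_b, max_gap=10):
    """True if word_a and word_b occur within max_gap words of each other."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

# Made-up sample sentence: "bison" is word 0, "yellowstone" is word 6.
sentence = "Bison graze near the river in Yellowstone"
print(near(sentence, "bison", "yellowstone"))             # True
print(near(sentence, "bison", "yellowstone", max_gap=3))  # False
```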
Web Search Tips
Truncation
Searches for variants of a word by using a symbol to represent one or more characters. The most common symbols are * (asterisk), ? (question mark), and ! (exclamation mark). If truncation is not supported by the search engine, use the Boolean operator OR to combine like terms.
– AltaVista: Truncation
– HotBot & MSN: Truncation
- Another term: "stemming". MSN (e.g., finds "movies" if your search word is "movie")
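When an engine does not support truncation, the OR workaround mentioned above can be written out by hand or generated. A minimal sketch, assuming the engine accepts parenthesized OR groups (the helper name `or_expand` is hypothetical):

```python
def or_expand(variants):
    """Join word variants into one parenthesized OR group."""
    # Multi-word variants are quoted so they stay together as phrases.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return "(" + " OR ".join(quoted) + ")"

print(or_expand(["library", "libraries", "librarian"]))
# (library OR libraries OR librarian)
```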
Web Search Tips
Case Sensitive (alaskan pipeline - with the incorrect lowercase "a")
– AltaVista Advanced or Quoted Simple
– MIT vs. mit or IT vs. it
Web Search Tips
Wild Card Word in Phrase
Wild Card characters represent undefined letters or numerals in a search term. Wild Card characters allow for retrieval of:
- Singular and plural word forms
- Spelling variations (e.g., British/American spellings)
- Word stems with prefixes and suffixes
* - Represents zero to any number of characters at the beginning or end of a term.
*GROW* - Possible retrievals: GROW, GROWS, OUTGROWTH
? - Represents exactly one character within a term...
T??TH - Possible retrievals: TEETH, TOOTH, TRUTH
...or one character at the end of a term.
AMIN? - Possible retrievals: AMINE, AMINO
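The behavior of * and ? described above happens to match the wildcard rules of Python's standard `fnmatch` module, so the examples can be tried out directly. This is only a sketch: the word list is made up, and `fnmatch` also treats square brackets specially, which the search systems discussed here may not.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, vocabulary):
    """Return the words that match a pattern containing * and ? wild cards."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]

# Made-up vocabulary for the demonstration.
WORDS = ["GROW", "GROWS", "OUTGROWTH", "TEETH", "TOOTH", "TRUTH",
         "TRUTHS", "AMINE", "AMINO", "AMIN"]

print(wildcard_hits("*GROW*", WORDS))  # ['GROW', 'GROWS', 'OUTGROWTH']
print(wildcard_hits("T??TH", WORDS))   # ['TEETH', 'TOOTH', 'TRUTH']
print(wildcard_hits("AMIN?", WORDS))   # ['AMINE', 'AMINO']
```

Note that AMIN? does not retrieve AMIN, because ? stands for exactly one character.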
Web Search Tips
Field Searching
Field searching allows the searcher to designate where a specific search term will appear. Rather than searching for words anywhere on a Web page, fields define specific structural units of a document. The title, the URL, an image tag, or a hypertext link are common fields on a Web page.
How search engines work
- Spidering program - Collect links
- Indexing program - Include metatags
- Search/retrieval program - Sort results
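The three programs can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the in-memory `WEB` dictionary stands in for real pages, only title words and keyword metatags are indexed, and alphabetical sorting stands in for real relevance ranking.

```python
from html.parser import HTMLParser

# A made-up, in-memory "web" so the sketch runs without network access.
WEB = {
    "a.html": '<title>Bison facts</title>'
              '<meta name="keywords" content="bison buffalo">'
              '<a href="b.html">parks</a>',
    "b.html": '<title>Yellowstone park</title><a href="a.html">bison</a>',
}

class PageParser(HTMLParser):
    """Collects outgoing links, title words, and meta keywords from one page."""
    def __init__(self):
        super().__init__()
        self.links, self.words, self._in_title = [], [], False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "keywords":
            self.words += (attrs.get("content") or "").lower().split()
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.words += data.lower().split()

def spider_and_index(start):
    """Spidering: follow links from start. Indexing: map each word to its URLs."""
    index, queue, seen = {}, [start], set()
    while queue:
        url = queue.pop()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        parser = PageParser()
        parser.feed(WEB[url])
        queue.extend(parser.links)          # collect links for later visits
        for word in parser.words:           # title words and metatag keywords
            index.setdefault(word, set()).add(url)
    return index

def search(index, term):
    """Retrieval: return matching URLs, sorted (a stand-in for ranking)."""
    return sorted(index.get(term.lower(), set()))

index = spider_and_index("a.html")
print(search(index, "bison"))        # ['a.html']
print(search(index, "yellowstone"))  # ['b.html']
```

Page b.html is found only because the spider followed the link from a.html, and "buffalo" is searchable only because the indexer read the keywords metatag.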
Web Search Tips
Link Searching
Pages include a link to the specified URL.
- Link updates, impact analysis
- Best at AltaVista, AlltheWeb
– Can have different results for http://www.name.org/
Example: http://www.freedomforum.org/ - finds pages with links to this site
title:searching will look for the word 'searching' in the title of a Web page. Hits have the term(s) in the HTML title element.
Example: title: "search engines"
Web Search Tips
Field Searching
- IP: Page is in the specified IP range. Incomplete numbers are truncated. ip:216.32.120 finds any computer in 216.32.120.*
- Site: Results are only from the specified site. site:nasa.gov - finds pages at NASA's Web site
- Suburl: Pages have the term(s) somewhere in the URL (host name, path, or filename). suburl:searchenginewatch
- URL: Result must be exactly this URL and nothing else. url: www.slashdot.com/index.html
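The four operators above differ mainly in which part of a page's address they test. The sketch below mimics that logic for a single page; the `matches` function and the sample addresses are hypothetical, and actual engines may apply these operators differently.

```python
from urllib.parse import urlsplit

def matches(operator, value, url, ip=""):
    """Apply one field operator to a page described by its URL and IP address."""
    host = urlsplit(url).hostname or ""
    if operator == "site":    # host is the site itself or one of its subdomains
        return host == value or host.endswith("." + value)
    if operator == "suburl":  # term anywhere in host name, path, or filename
        return value.lower() in url.lower()
    if operator == "url":     # exactly this URL and nothing else
        return url == value
    if operator == "ip":      # an incomplete address acts as a truncated prefix
        return ip == value or ip.startswith(value + ".")
    raise ValueError(f"unknown operator: {operator}")

# Made-up sample page.
page_url = "http://www.nasa.gov/missions/index.html"
print(matches("site", "nasa.gov", page_url))                  # True
print(matches("suburl", "missions", page_url))                # True
print(matches("url", "http://www.nasa.gov/", page_url))       # False
print(matches("ip", "216.32.120", page_url, "216.32.120.7"))  # True
```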
Web Search Tips
Field Searching
title: AltaVista, AlltheWeb, HotBot, Lycos, Gigablast
intitle: Google, Teoma
url: AltaVista, AlltheWeb, Lycos, Gigablast
inurl: Google, Teoma
site: AlltheWeb, Gigablast, Google, Teoma
link: AltaVista, Google, AlltheWeb, HotBot, Gigablast
anchor: AltaVista
image: AltaVista
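Because the operator names differ from engine to engine, it can help to keep the chart as a lookup table. The sketch below encodes the chart as read from this slide (2003 data, so it is not current), and the helper names are hypothetical.

```python
# Field operator -> engines that support it, as listed on this slide (2003).
FIELD_SUPPORT = {
    "title":   ["AltaVista", "AlltheWeb", "HotBot", "Lycos", "Gigablast"],
    "intitle": ["Google", "Teoma"],
    "url":     ["AltaVista", "AlltheWeb", "Lycos", "Gigablast"],
    "inurl":   ["Google", "Teoma"],
    "site":    ["AlltheWeb", "Gigablast", "Google", "Teoma"],
    "link":    ["AltaVista", "Google", "AlltheWeb", "HotBot", "Gigablast"],
    "anchor":  ["AltaVista"],
    "image":   ["AltaVista"],
}

def title_query(engine, term):
    """Build a title-field query with whichever prefix the engine supports."""
    for operator in ("title", "intitle"):
        if engine in FIELD_SUPPORT[operator]:
            return f"{operator}:{term}"
    return term  # no title operator listed: fall back to a plain keyword

print(title_query("Google", "searching"))     # intitle:searching
print(title_query("AltaVista", "searching"))  # title:searching
```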
Web Search Tips
Selected Limits
Usually on advanced search form
- Language: At most, languages vary
- Date: AlltheWeb, AltaVista, Google, Inktomi
– Cut out old material, focus search
– Or to find old information
- File Type: AlltheWeb, AltaVista, Google, Inktomi
– PDFs at all, Flash at AlltheWeb
- Media Type: HotBot, MSN, AlltheWeb
- Page Size: AlltheWeb
- IP Range: AlltheWeb
standard
Web Search Tips
Diacritics: é
Does e find é? - Sometimes
Not at Google
– Exact match on diacritics only
At other search engines
– e usually finds e OR é
– é usually finds only é
Use English equivalents for special letters and omit diacritics
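Omitting diacritics can be done mechanically with Python's standard `unicodedata` module. This is a minimal sketch; the function names are made up for the example, and special letters that are not an accent on a base letter (such as ø or ß) are left unchanged, so those still need an English equivalent chosen by hand.

```python
import unicodedata

def strip_diacritics(text):
    """Return text with combining accents removed, so 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def query_variants(term):
    """Return the term plus its unaccented form, for searching both ways."""
    plain = strip_diacritics(term)
    return [term] if plain == term else [term, plain]

print(strip_diacritics("café"))  # cafe
print(query_variants("Dvořák"))  # ['Dvořák', 'Dvorak']
```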
Web Search Tips
Counting Complexities
- Search Engines Can’t Count
- Only the big search engines count, top 10 search engines
- Numbers constantly change
– From one page of results to the next
– From one minute to the next
- Try reloading for more
Web Search Tips
Feature Inconsistencies
- Database Changes
– Constant
– If they don’t . . .
• They get old, out-of-date, dead links
– Size Changes Often Sudden
– Database Reversions
– Searching Failures And Other Unexpected Results
On the Fly Analysis
- Always Question Results
- Evaluate and Compare
- Find one unique, low-posted term
– Use for search engine comparisons
– Evaluate change over time
“On-the-Fly Search Engine Analysis.” ONLINE 23(5):63-66, Sept. 1999. onlinemag.net/OL1999/net9.html
Web Search Tips
SEO - Search Engine Optimization
- SearchEngineShowdown.com
– More on Advanced Features
– Feature Chart
– Detailed Reviews
- Search Engine Watch
– http://www.searchenginewatch.com/facts/ataglance.html
Inconsistencies
Low Recall or "I am not finding any sites on my topic!!"
- Have I chosen the correct database?
- Have I been too specific in formulating the search?
- Have I included all possible terms and word forms? Should I use truncation?
- Was Boolean logic used correctly?
- Did I make a technical error, e.g., spelling or command syntax?
Low Precision or "I found hundreds of citations and many are not on my topic!!"
- Delete less specific synonyms and ambiguous terms
- Search fewer fields, e.g., just the title field or URL
- Add additional facets with AND or NOT
- Add restrictions, e.g., date of publication
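Recall and precision have standard definitions in information retrieval: recall is the share of all relevant items that the search retrieved, and precision is the share of retrieved items that are relevant. A small worked sketch, with made-up page ids (in practice the full set of relevant pages is rarely known, so recall can only be estimated):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of results and a set of relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: 4 results, 2 of them on topic, out of 8 relevant pages overall.
retrieved = ["p1", "p2", "p3", "p4"]
relevant = ["p1", "p2", "p5", "p6", "p7", "p8", "p9", "p10"]
print(precision_recall(retrieved, relevant))  # (0.5, 0.25)
```

Broadening a search (more synonyms, truncation, OR) tends to raise recall at the cost of precision, while narrowing it (AND, NOT, field limits) does the opposite.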
The Invisible Web
What is it?
It consists of searchable information resources whose contents cannot be indexed by traditional search engines.
- Content in databases
- Professional online services
- Non-ASCII files
- Sites that require log-in or registration
- Real-time information
- Dynamically-created web pages
- Discussion forums and BBSs
Searching the Invisible Web
- Much "invisible" content has a "visible web" front
- Some databases are opening up
- Google searches PDF, XLS, RTF, DOC files
Searching the Invisible Web
- Use directories and portals
– Open Directory Project http://www.dmoz.org is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community of volunteer editors.
– Librarian’s Index to the Internet http://www.lii.org
– Subject-specific directories http://www.econ.bg
- Experts and info pros watch for this material
– Experts.com www.experts.com: a reliable and diverse source of experts, many of whom are outside the academic arena.
- Yahoo - http://groups.yahoo.com/
- Search for database or forum along with subject terms
Searching the Invisible Web
- Use meta-search engines
– DogPile.com
– MetaCrawler.com
- Use Teoma.com's "Experts' Links"
- Scan the libraries of relevant discussion groups
- Lurk on lists
Searching the Invisible Web
- Use reverse link look-up to find "more like this"
- Google and Alta Vista:
– link:www.BatesInfo.com
- HotBot: http://www.hotbot.com/
– link:www.aubg.bg/fforum - use [Links to this URL]
The Invisible Web
Invisible Web Directories
- http://www.invisibleweb.com/
– The InvisibleWeb Catalog™ contains over 10,000 databases and searchable sources that have been frequently overlooked by traditional searching.
- CompletePlanet.com
– Contains 103 searchable databases
- DirectSearch
– Difficult to use but extensive
- http://www.internets.com/
– They have assembled the largest filtered collection of useful search engines and newswires anywhere on the World Wide Web.
There are 1-2 billion documents on the "surface web". The deep web is estimated to be approximately 500 billion documents.
Good hierarchy of databases
Web Search Tips
Set aside one afternoon every two weeks for your web reading!!!
More info: http://www.BatesInfo.com