Things You Just Have to Know About Search Engines

Ran Hock, Online Strategies. May 14, 2002. InfoToday 2002.

1 - No Search Engine Covers Everything

2 - Different Engines "Miss" and Find Different Things

3 - Large Numbers Aren’t Necessarily Bad Searches

4 - All Search Engines Have Techniques That Allow You to Improve Results

5 - Metasearch engines are not "search engines"

6 - Google is great, but not the only one you should use.

7 - Some Things Change, Some Don't

1 - No Search Engine Covers Everything

There are pages no engine covers:

Invisible pages: unlinked pages, database pages, password-protected sites, “deep” pages, etc.

Different engines "miss" and find different things (Point #2)

2 - Different Engines Find and Miss Different Things

Each engine may find something others missed.

Even “2nd tier” engines find things missed by the top 3

Consider the results of the following search on: “erris head” sailing

Of the 20 different records retrieved by all the engines, Google found (only) 14 (70%)

Google missed 6 (30%)

If you had searched Google, then just one more engine, your retrieval would have increased by 15%

Even HotBot found 2 that the other three engines missed.
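The percentages above are simple set arithmetic, and the same calculation can be repeated for any engine. The sketch below is illustrative only: the slide reports counts, not the records themselves, so the record IDs are placeholders and `coverage` is a hypothetical helper name.

```python
def coverage(found: set, pool: set) -> float:
    """Fraction of the pooled records that one engine retrieved."""
    return len(found & pool) / len(pool)

# Placeholder record IDs: the slide gives only the counts (20 pooled, 14 from Google).
pool = set(range(1, 21))      # 20 distinct records found by all engines combined
google = set(range(1, 15))    # the 14 records Google retrieved

google_share = coverage(google, pool)
missed = pool - google        # records some other engine found but Google did not

print(f"found {len(google)} ({google_share:.0%}), "
      f"missed {len(missed)} ({len(missed) / len(pool):.0%})")
# prints: found 14 (70%), missed 6 (30%)
```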

2 - Different Engines Find and Miss Different Things - Why?

Indexing "policies": what words and other items get indexed; how those things are "parsed"

Crawling differences: starting points; depth / breadth of crawling; etc.

Spam policies

Ranking

3 - Large Numbers Aren’t Necessarily Bad Searches

Most common complaint

You’re not “obligated”

All use some form of relevance ranking

Relevance ranking does, to some degree at least, the same things we do to find the best items

Relevance ranking uses some combination of:

Popularity

Frequency of terms

Weighting by field (e.g., Title counts more than Summary)

Proximity of terms

Weighting by size of the type

Weighting according to the order in which the searcher entered terms

Etc.
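To make two of these factors concrete, here is a toy scorer that combines frequency of terms with weighting by field. It is a sketch under stated assumptions, not any engine's actual formula: the function name `toy_score`, the weight of 3.0 for title hits, and the two sample documents are all invented for illustration.

```python
def toy_score(query_terms, title, summary, title_weight=3.0):
    """Hypothetical score: count term occurrences, weighting title hits above summary hits."""
    title_words = title.lower().split()
    summary_words = summary.lower().split()
    score = 0.0
    for term in query_terms:
        t = term.lower()
        score += title_weight * title_words.count(t) + summary_words.count(t)
    return score

# Invented sample documents: (title, summary)
docs = {
    "a": ("Sailing near Erris Head", "Notes on sailing routes"),
    "b": ("Coastal walks", "A walk that passes a sailing club"),
}

# Highest score first: "a" has the term in its title, "b" only in its summary.
ranked = sorted(docs, key=lambda d: toy_score(["sailing"], *docs[d]), reverse=True)
print(ranked)
```

A real engine would fold in the other factors on the list (popularity, proximity, and so on) in ways that are not public.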

Most search engines automatically “enhance” your search

Automatic phrase identification

Word variants (and/or truncation)

Case sensitivity

Analysis of documents in the database (links, term association, associative networks, cluster analysis, co-occurrence, etc.)

Etc.

Automatic Re-Write - AllTheWeb

4 - All Search Engines Provide Options for You to Enhance Your Search

Field searching: title, URL, date, language, etc.

Boolean (yes, “Boolean,” which is neither difficult nor bad)
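Boolean and field searching both reduce to set operations over the pages that contain a term. The sketch below runs on a made-up three-page collection; `pages` and `matches` are hypothetical names, and real engines each have their own query syntax for these operations.

```python
# Hypothetical three-page collection; each page has a title field and body text.
pages = {
    1: {"title": "sailing guide", "body": "sailing around erris head"},
    2: {"title": "erris head walks", "body": "hiking and sailing notes"},
    3: {"title": "weather", "body": "forecast for mayo"},
}

def matches(term, field=None):
    """Set of page IDs containing term, in one named field or in any field."""
    fields = [field] if field else ["title", "body"]
    return {pid for pid, page in pages.items()
            if any(term in page[f].split() for f in fields)}

all_ids = set(pages)
and_hits = matches("sailing") & matches("erris")      # sailing AND erris
or_hits = matches("sailing") | matches("weather")     # sailing OR weather
not_hits = all_ids - matches("sailing")               # NOT sailing
title_hits = matches("sailing", field="title")        # field search: title only

print(and_hits, or_hits, not_hits, title_hits)
```

AND narrows a result set, OR widens it, NOT excludes, and a field restriction narrows it further, which is all that "Boolean" amounts to.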

How do you know about these options?

Use the Advanced Search page

Read the documentation

5 - Metasearch engines are not “search engines”

Consider the following example of a search done in individual engines, then in metasearch engines

Engine      Done directly  via vivisimo  via DogPile  via MetaCrawler  via Search.com  via ixquick
AllTheWeb              52            10            0                0               9            0
Google                 39             0            0                0               0            0
WiseNut                15             0            0                0              10            0
AltaVista              10             0            0                9              10            0
HotBot                  9             0            0                0              10            0
Excite                  6             0            0                0               0            1
TOTAL                   -            48           15               16              61           16

Search done for “geologic resources” worcester

Most don’t search all of the largest engines

Most don’t give you more than 10 or 20 records from each engine

Most don’t convey your full query syntax to the target engines

Most give “paid sites” first

“Client-side” metasearch programs, e.g., Copernic and Bulls-Eye, do NOT have the above problems.

Even online metasearch engines have occasional socially redeeming features (vivisimo’s clustering).
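The per-engine record limit alone explains much of the shortfall. The sketch below assumes a metasearch engine that takes only the first 10 records from each target engine and merges them; the function `metasearch`, the engine names, and the result lists are placeholders, not data from the search shown earlier.

```python
def metasearch(results_by_engine, per_engine_cap=10):
    """Merge the first per_engine_cap results from each engine, dropping duplicates."""
    merged, seen = [], set()
    for results in results_by_engine.values():
        for url in results[:per_engine_cap]:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Placeholder result lists: what two engines would return if searched directly.
direct = {
    "engine_a": [f"a{i}" for i in range(50)],
    "engine_b": [f"b{i}" for i in range(30)],
}

merged = metasearch(direct)
total_direct = sum(len(r) for r in direct.values())
print(f"direct: {total_direct} records, via metasearch: {len(merged)} records")
# prints: direct: 80 records, via metasearch: 20 records
```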

6 - Google is Great, But Not the Only One You Should Use

Points 1 and 2 - No search engine finds everything and different engines find different things

Great because of:

Size

Popularity-based ranking

Unique content: newsgroups, PDFs and other file types, largest image collection

Dandy little features like addresses, definitions, etc.

Pretty good search options

But doesn’t have:

Everything

Truncation and NEAR that AltaVista has

As much news coverage as AllTheWeb

As much currentness as AllTheWeb (maybe)

Etc.

7 - Search Engines Change

In some ways a lot, in other ways very little

Areas of little change

For most engines: how they do basic things such as phrases, Boolean, truncation, field searching, etc.

Areas of frequent/considerable change

Some come, some go

Gone: Go/InfoSeek et al. Arrived: WiseNut, Teoma

How things are arranged on the home page (esp. AltaVista)

Partners (which directory they use, featured partners and tools, etc.)

Added content, esp. content types (PDFs, newsgroups, etc. in Google)

In Summary

1 - No Search Engine Covers Everything

2 - Different Engines "Miss" and Find Different Things

3 - Large Numbers Aren’t Necessarily Bad Searches

4 - All Search Engines Have Techniques That Allow You to Improve Results

5 - Metasearch engines are not "search engines"

6 - Google is great, but not the only one you should use.

7 - Some Things Change, Some Don't

Ran Hock

Online Strategies

1-800-871-4033

www.onstrat.com

ran@onstrat.com