Google OR Ask OR Gigablast OR Dogpile: A Comparison of Web-search Engines

Robert F. Musco

Southern Connecticut State University

Introduction

Web search engines are currently the primary method used to find information available on

the web, but few people are aware of how they work, or which are more suited to different needs.

The purpose of this paper is to compare the search tools of four different search engines, and to

conduct a sample search, analyzing each site’s results in terms of quantity, overlap, and

relevance. The first three search engines discussed, Google, Ask, and Gigablast, were chosen

because they are popular tools that each use their own proprietary software. The fourth, Dogpile,

was selected because it is a popular example of a metasearch engine, which compiles

information from four other search engines: Google, Ask, MSN Live Search, and

Yahoo.

As the first step in understanding how these search engines operate, the documentation found

at each web site was evaluated. The following section gives an overview of the search tools

offered by each engine, and these findings are schematized in the comparative table in Appendix

2. It should be noted that Google and Ask’s documentation was more complete than that of

Gigablast, while very little documentation was found on Dogpile. As a result, this overview is

based on the information each company provided about its search functions. If a function was not

mentioned, however, one cannot assume it is not available, since spot tests carried out

throughout the investigation showed that some important functions not listed were indeed

operational. The instances in which this occurred are mentioned in the paper. The actual

operation of each engine in test searches will be discussed in the third section.

Search Engine Overview: Comparison of Search Tools

Although Google is often viewed as the leader of “user-friendly” search engines, finding and

compiling basic information about how its search function works required visiting several

sections of the website. Beginning in the Advanced Search, one sees that queries can be limited

by “all these words, one or more of these words, this exact word or phrase, (none) of these

unwanted words”, language, format, site name, exact phrases, date created, usage rights, location

of key words, region, and numeric range (Google, 2009a). Two additional limiters give results

that are “similar to the page” specified, and “link(ed) to the page” specified.

More specific information is found in the Basic Search Help, such as advice about search

strategies, and basic tips of how the engine operates (Google, 2009c). For example, Google

ignores most punctuation, is case insensitive, and generally counts all words, though it may

ignore a word if it considers it irrelevant. The “More Search Help” area is the most complete

guide to searching terms (Google, 2009d). Boolean operators are permitted in the main search

box, and a full list of additional operators, such as wildcard and synonym symbols, is given.

Google results can also be limited to a series of subject headings, or performed over the entire

web.

Google’s main search box uses the AND operator by default when a space is left between

two terms, though a quick test shows that if the AND is actually used, the number of hits may

change, particularly if the search terms are common words, such as [cat AND dog]. These results

indicate that unknown criteria come into play when using Boolean operators. Perhaps this is an

example of Google’s intelligent search overriding some operator commands in an attempt to get

at the user’s “obvious” intent, as mentioned in the documentation.

Google’s “Technology Overview” page, reached through its “Corporate Information”

section, gives a clear, though simple explanation of the theory behind its search technology

(Google, 2009b). Since the explanation does not enter into technical detail, it would be

inadequate for a searcher with a great deal of technical expertise, but its discussion of strategies

for optimizing searches and using special operators is sufficient for the general user. Google

explains that its robot, which crawls the web on a regular basis, is fully automated, meaning that

the company cannot adjust page rankings. In fact, Google states that it does not accept payment

for inclusion or placement in its ranking.

The overview also helps to explain why search results may not exactly match the search terms.

Google uses its “PageRank™ algorithm”, a proprietary ranking methodology, to weigh a number

of factors, including the appearance and position of text on the page, in determining the relative

relevance of web pages. The algorithm calculates a page’s importance relative to similar pages

by counting the number of linked pages that point to it. What is innovative about this method is

that the “pointer” pages which link to the retrieved page are themselves weighed for their

relevance by the number of pages that point to them in turn. Pages being evaluated are penalized

if they contain links to “link farms”, which are sites created purely for the purpose of raising

other pages’ relevance. Sometimes search results may include pages that do not actually contain

the search terms, but are reached through links in other pages described by text that contains the

terms.
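
The link-counting idea described above can be illustrated with a short sketch. It is not Google's implementation, which is proprietary and weighs many additional factors; it is only the simplified iterative form of link-based ranking, in which each page passes a share of its own score to the pages it points to. The toy link graph, the damping factor of 0.85, and the function name are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified link-based ranking.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its current score among the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages point to "hub", and "hub" points to "a".
toy_links = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(toy_links)
```

In this toy graph "hub" scores highest because three pages point to it, and "a" outranks "b" and "c" because its single pointer page is itself highly ranked, which is the weighting of "pointer" pages described above.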

Google is undeniably one of the leaders in enhanced web search features that operate with

shortcuts directly from the search box. These “bells and whistles” are small applications that

give real-time information just by entering limited search terms, like automatic stock quotes by

entering the stock symbol and FedEx tracking information by entering the tracking number

(Google, 2009f).

Ask was founded as AskJeeves.com in 1996. Like Google, it allows searching from various

subject categories, and has features similar to Google’s in its Advanced Search page, which

allows a query to be defined by “all the words, at least one of the words, the exact phrase, none

of the words”, language, specific domain, exact phrases, date modified, location of key words,

and region (Ask, 2009a).

Searches are case-insensitive, word order matters, and spelling is automatically corrected

(Ask, 2009c). Though some of the operators listed in Google are not specifically listed in Ask’s

Advanced Search Tips, Ask has developed even more Boolean-like operators than Google,

which make it possible to limit searches not only to specific URLs, but also to specify date

ranges or pages with the search terms in the titles or in hyper-linked text (Ask, 2009b). Oddly, it

is never mentioned that AND is the default operator in Ask, though spot-testing shows that it is.

Like Google, Ask produces variable results when a blank space between terms is compared to

use of the AND operator, especially when the two terms are very common words.

The Site Features section in Ask also lists a number of enhanced shortcut features, though

their operation can be confusing, since some are reached through menu categories, while others

are activated with a keyword in the search terms (Ask, 2009e).

Ask also uses a proprietary algorithm, here called Expertrank™, that relies on a “clustering

concept of subject-specific popularity”, which ranks hits based on the number of pages which

link to a site, weighing which of those pointer pages are more authoritative (Ask, 2009d). The

method is not explained in more detail, though the description sounds very similar to Google’s

PageRank™ technology.

Gigablast offers a stripped-down search engine whose appearance is less commercial than

Google’s. Gigablast does not include any of the automatic shortcut functions for weather, stock

market, etc., found on Google or Ask, but it does offer subject directories, though they were not

functional during the end of February and beginning of March while this paper was being

researched.

Gigablast’s Advanced Search can handle searches restricting queries to “all these words, any

of these words, this exact phrase, none of these words” (Gigablast, 2009a). Searches can also be

limited to a specific URL, a specific site, pages linked to a specific URL, and one can choose to

enable site clustering.

The “Query Syntax” section of Gigablast explains the Boolean operators permitted, which

are mostly comparable to those in Google and Ask (Gigablast, 2009b). The AND operator is the

default, but it is applied in a very specific way. For example, with two terms, preference is given

to instances of both terms next to each other. If one wants to avoid giving preference to both

terms together, the operator [term .. term] can be used.

An OR operator actually gives preference to hits with both terms, which appears to be a

strategy to increase relevance. Parentheses are said to be optional, and indeed, a test shows that

both AND and a blank when used without parentheses will nest the two AND terms before

applying another operator, such as OR, afterwards. For example, a search with [soup AND shoes

OR train] yielded results whose top ten were devoted to national train companies, with no

instances of “soup” or “shoe”, since the algorithm obviously gave greater weight to the OR term

it considered most important. When parentheses are used, however, they define the order in

which the operators are applied, instead of the default left-to-right “AND-first” logic that appears

to be used here. Thus, a sample of the hits from [soup AND (shoes OR train)] included pages

with “soup” and “shoes”, or “soup” and “train”, and even some hits with only the word “soup”.
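
The difference between the two groupings can be sketched with strict Boolean logic. The four pages and their word sets below are hypothetical, and Gigablast's relevance ranking clearly goes beyond strict set logic (a strict reading of the parenthesized query would not return pages containing only “soup”), so this is a minimal illustration of operator grouping rather than a model of the engine.

```python
# Hypothetical mini-collection: each page is reduced to the set of words it contains.
pages = {
    "rail-timetable": {"train", "tickets"},
    "soup-recipes": {"soup", "kitchen"},
    "dining-car-menu": {"soup", "train"},
    "chef-footwear": {"soup", "shoes"},
}


def and_first(page_words):
    # [soup AND shoes OR train] read as (soup AND shoes) OR train
    return ("soup" in page_words and "shoes" in page_words) or "train" in page_words


def with_parentheses(page_words):
    # [soup AND (shoes OR train)]
    return "soup" in page_words and ("shoes" in page_words or "train" in page_words)


and_first_hits = sorted(name for name, words in pages.items() if and_first(words))
parenthesized_hits = sorted(name for name, words in pages.items() if with_parentheses(words))

print(and_first_hits)      # ['chef-footwear', 'dining-car-menu', 'rail-timetable']
print(parenthesized_hits)  # ['chef-footwear', 'dining-car-menu']
```

Under the “AND-first” reading, a page that mentions only “train” satisfies the query, which is consistent with the train-company pages that dominated the unparenthesized search.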

Gigablast does not provide operators for limiting documents by date as Ask does, but one can

restrict hits by formats, such as .doc or .xml. There is one unusual operator worthy of mention,

which searches first by a primary term, then ranks all hits by a second term. Gigablast offers no

explanation of how its algorithm functions.

Dogpile is a metasearch engine that links results from Google, Yahoo, MSN Live Search,

and Ask. Dogpile includes sponsored links mixed among the results, though each is labeled as

such. Dogpile’s Advanced Search shows search boxes which handle the operators “all these

words, any of these words, the exact phrase, none of these words”, in a specific domain, and

specific language (Dogpile, 2009a). The main search page provides tabbed categories for

limiting searches to specific formats (images or music), or subjects (news, yellow pages, and

white pages).
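
Dogpile does not document how it combines the results of its source engines, so the following is only one plausible sketch of what a metasearch engine must do: interleave several ranked lists and drop duplicate URLs. The round-robin strategy, the function name, and the example URLs are all assumptions, not a description of Dogpile's software.

```python
from itertools import zip_longest


def merge_round_robin(result_lists):
    """Interleave ranked lists from several engines, dropping duplicate URLs."""
    merged, seen = [], set()
    # Take the first hit of every engine, then the second of every engine, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results from two source engines.
engine_a = ["example.org/1", "example.org/2", "example.org/3"]
engine_b = ["example.org/2", "example.org/4"]
merged = merge_round_robin([engine_a, engine_b])

print(merged)  # ['example.org/1', 'example.org/2', 'example.org/4', 'example.org/3']
```

Because duplicates are collapsed, a merged list is shorter than the sum of its sources whenever the engines agree on a result.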

The “Metasearch 101” section explains the rationale for a metasearch engine (Dogpile,

2009c). A link can be found to a self-study, carried out in collaboration with the University of

Pittsburgh and the Pennsylvania State University in 2007, that found that less than one percent of

first-page results on a given search query overlapped among the four major search engines

(Dogpile, 2008). The implication is that with so little overlap, any claims of highly relevant

results by competitors are suspect. The “InfoSpace” section mentions Dogpile’s InfoSpace

proprietary technology as the software behind the search, but does not explain how the search

engine works, or how it ranks results (Dogpile, 2009b). A quick test shows that the AND

function is at least partly a default, but in contrast to the other engines discussed here, Dogpile

does not appear to allow Boolean operators, and only provides refined searches through its

advanced search page. Indeed, using AND as an explicit operator with two terms gives

inconsistent results, since the AND appears bolded in the results, suggesting that it is interpreted

not as an operator, but at least sometimes as an actual search term. Using OR in a search with

two terms actually reduces hits, suggesting that OR is not a Boolean operator. Performing a

search from the advanced search box, however, it was possible to see that when a search is done

with an exact phrase, the phrase then appears in the general search box with apostrophes placed

around it, and a minus sign before a “none of these words” term. Subsequent testing shows that

apostrophes and minus signs are indeed active Boolean operators.

Dogpile is not consistently case-insensitive, in contrast to the other three engines, and does

not always ignore “stop words”. The most notable difference seen when using Dogpile is that the

total number of hits in a search is not indicated on the page, and that the number of hits returned

can frequently be several orders of magnitude smaller than that returned by products like Google.

Comparison of Search Results among Search Engines

To analyze how the different search engines handle queries, a fairly limited topic

containing multiple terms was chosen. The goal of the query was to find out if research shows

that there is a correlation between playing violent video games and student achievement in high

school students. The search was begun with general terms, and refined in four steps. Each

separate search is indicated by a title of the search query, which is set off by brackets to frame

exactly what was entered in the search boxes, not to be confused with nesting parentheses.

Search Query: [student AND achievement]

Searching was begun with general terms, to give an idea of how each engine handled the

AND operator [student AND achievement]. Google produced more than 38,000,000 hits,

covering broad areas such as technology and student achievement, analysis of student

achievement, and public policy related to student achievement. Of the top 20 hits, almost all

were relevant in that they discussed the general issue of student achievement. Several documents

even addressed the specific issue of factors related to student achievement, such as teacher

quality, poverty, class size, library use. Most of the sampled results were from the domains .org,

.gov, and .edu though a few were from .coms, such as journal databases. Hits included pages

with the terms as a single phrase or separated in the document, and included the plural

“students”, showing that alternate forms of the terms will be returned.

Searching for the same two terms without the AND raised the number to 200,000,000, but

at the same time seemed to favor documents that had incidences of the terms as a phrase. One

possible explanation for the roughly five-fold increase in hits is that the search default without the written

AND is actually acting partially as an OR, thus including many pages with only one of the terms.

Due to the volume of hits, it was difficult to confirm this theory.

The [student AND achievement] search yielded fewer hits on Ask, roughly 10,000,000.

The sponsored results, which appeared at the top, were irrelevant, but did not affect precision,

because they were not counted in the number of hits retrieved. Since all the sampled hits had to

do with the general topic of student achievement, they were relevant. The hits included the

search terms singly or as a phrase, and when the search was repeated without the AND operator,

the number of results was unchanged, though the ranking shifted a bit.

Gigablast retrieved 505,000 hits, far fewer than the other two search engines, and

included the search terms separately and together. A search without the AND operator increased

the hits slightly, about 7%, and seemed more likely to rank instances of the full phrase first. Most

results could be considered relevant.

Dogpile gave 71 hits, a huge reduction from all other search engines. Because sponsored

results are counted as hits, and these were largely irrelevant, the precision rate dropped to 65%

(of the first 20 surveyed). AND was not tested as an operator, since it was already determined

that Dogpile does not support Boolean searches.

Search Query: [“student achievement” AND “high school students”]

In an effort to reduce the hits to an order of thousands, the next search query narrowed the

focus using two exact phrases framed in quotes, separated by an AND operator. Google returned

260,000 hits, and all of the first 20 appeared relevant to the query. Ask, by contrast, returned

almost 60,900 hits, whose first 20 could also be considered relevant in light of the search terms.

Though Ask appeared to interpret the exact phrase operator correctly, single words were

highlighted in the results, and looking at some individual hits showed that exact phrases may not

actually be part of the document body, but may be taken from surrounding text or larger page

headings or topics.

Gigablast, by contrast, returned 1,100,000 hits, far more even than the previous more

general search query, implying that either exact phrases are not always respected, or that OR is

somehow in effect. Once again, due to the sheer number of returns, the exact reason for this

anomaly could not be determined.

Carrying out the search in Dogpile required some adjustments in order to use two separate

exact phrases, because the Advanced Search box does not allow more than one exact phrase, and

Boolean operators seem not to work in the general search box. However, it was observed that

when an exact phrase search was done in the Advanced Search box, the terms appeared in the

general search box in apostrophes. A trial was conducted in the general search box using the two

separate exact phrases, each enclosed in apostrophes (without the AND operator), which seemed

to work correctly. It returned 64 hits. Once again, the number of irrelevant sponsored results

clogged the page and lowered the precision rate, but the non-sponsored results surveyed were

appropriate to the query. Further testing showed that quotes appeared to function as a Boolean

operator to create exact phrases as well.

This search brought to light a problem with all the search engines’ Advanced Search

functions. Though all four companies provide boxes in their Advanced Search window to allow

queries with the equivalent of AND, OR, and exact phrase, a single exact phrase box limits

queries to only one exact phrase at a time (except for Gigablast, which has two spaces). In

attempting to find a fix for this issue, the previous search [“student achievement” AND “high

school students”] was repeated in Google. From this results page, the Advanced Search was

opened to show how those terms had been placed. The second phrase appeared in the “all these

words” box, separated by hyphens, and the first phrase appeared in the “this exact wording” box.

The search was then retried in the general search box using these operators (second phrase with

hyphens, and the first with quotes), as in [“student achievement” AND high-school-students],

with and without nesting parentheses. The search produced roughly double the number of hits, 500,000.

Many of the sampled hits were relevant, though the first results were inconsistent in terms of

respecting the two separate phrases. This outcome shows that words joined by hyphens are not a

useful substitute for quotes as an exact phrase operator in Google. This could lead one to

conclude that more than one exact phrase cannot be used in the Advanced Search area, and that

unless Boolean formulas can be used to AND or OR multiple exact phrases in the general search

box, such a search cannot be done. Additionally, if Boolean operators are used in the general

search box, it is usually not possible to combine them with additional limiters, such as date, or

language, from within the Advanced Search box, because searching first from the general search

box, and then repeating the search using the terms as they were automatically placed in the

Advanced Search section produced inconsistent results.

At this point, it seemed advisable to investigate how other search engines handle the shift

between a search in the general search box, and a search using terms the way they automatically

appear in the Advanced Search box after the search is performed from the general search box. In

Ask, the query in the general search box [“student achievement” AND “high school students”]

appeared in Advanced Search with the Boolean operators and the terms completely intact within

the ALL THE WORDS box. Toggling back to the general search preserved the search results.

In Gigablast, the Advanced Search area is not accessible from the results page, and repeating the

identical search from the Advanced Search page within the “all of these words” box produced a

blank page. Dogpile, as previously mentioned, will reproduce both quotes as well as the

apostrophe operators in the “all of these” box in the Advanced Search Page after an exact phrase

search is done from the general search page.

Returning to Google to test whether the same operators work within the Advanced Search

page, exact phrases with quotes were tested inside the “all these words” box, which gave the

same results as the original search from the general search box, though toggling back a second

time to Advanced Search caused the results to double again. It should be noted that the other

three search engines’ results shifted very slightly when toggling back and forth from general to

Advanced searches.

Search Query: [research AND “student achievement” AND “high school students” AND “violent video games”]

Some additional terms were added to limit the results to research, or references to research,

about the relationship of playing violent video games to student achievement in high school

students. Google returned 228 hits, the first of which were appropriate to the search terms given.

Ask returned 1,420 hits, but not all of the top ten respected the exact phrases. Some terms were

found in titles or tabs unrelated to the body text, and others were not found on the page at all,

which suggested that these pages were returned because the phrases were found in linking

documents. Gigablast returned 97 hits, with results similar to those of Ask in terms of its

inconsistency in respecting exact phrases. Gigablast had several broken links in the top 10 alone,

which indicates that it is updated less frequently than other engines. Dogpile returned 47 hits,

and actually seemed to do the best job of respecting exact phrases, when compared to Ask and

Gigablast.

Viewing a sampling of the hits in their entirety in the four search engines showed another

effect of the search using multiple terms. The hits returned were often pages with several articles

listed or abstracted on a single page, each one with one or more of the exact phrases. This makes

it possible for a result to be relevant from the point of view of a search engine, but useless to the

researcher, since the search terms are not always related in a single document.

It was also seen that many “relevant” hits tended to focus on the relationship between video

games and aggressive behavior (which may be the predominant connection in the literature) and

not video games and student achievement. This suggested that terms related to aggression should

be excluded to further limit the field. The risk, of course, is that documents that address the topic

of violent video games as a factor influencing school achievement may also mention aggression,

and some useful documents may be missed, but it seemed a useful experiment.

Search Query: [research AND “student achievement” AND “high school students” AND “violent video games” -aggression]

Excluding the search term in Google returned 118 hits. Analyzing the first 30 by viewing

each page individually, no more than 4 out of 30 hits could be considered truly relevant in

answering the research query, for a precision rate of 13%. The reason for this is perhaps at the

core of why searching the web is very limited for some purposes. Almost all the pages were

related in some way to education, but they invariably met the search conditions of matching

multiple exact phrases by finding pages in which the phrases appeared in unrelated areas. The

most common kind of page returned was a collection of short news articles, or abstracts and lists

of articles and documents, in which each abstract contained at least one, but rarely all, of the

terms. These hits generally did not contain any information relevant to the intent of the query,

though the pages retrieved met the conditions of the search.
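
The effect described above, in which a listing page satisfies every exact-phrase condition although no single article on it does, can be shown with a small sketch. The three abstracts are invented for illustration, and the simple substring test is an assumption standing in for whatever matching the engines actually perform.

```python
PHRASES = ["student achievement", "high school students", "violent video games"]

# Hypothetical listing page: three unrelated abstracts on one web page.
abstracts = [
    "Class size and student achievement in urban districts.",
    "A survey of part-time work among high school students.",
    "Sales trends for violent video games.",
]


def contains_all(text, phrases):
    """True when every exact phrase occurs somewhere in the text."""
    lowered = text.lower()
    return all(phrase in lowered for phrase in phrases)


# Matching against the whole page, as a web search engine does.
page_matches = contains_all(" ".join(abstracts), PHRASES)

# Matching against each abstract separately, as a researcher would want.
abstract_matches = [contains_all(abstract, PHRASES) for abstract in abstracts]

print(page_matches)      # True
print(abstract_matches)  # [False, False, False]
```

The page as a whole is a valid hit, yet none of its component documents connects the three concepts.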

Ask returned 35 hits. Of the first 30, there was a 63% overlap with Google. Viewing each

of the first 30 pages gave a relevance of approximately 10%. When analyzing the sample pages

for relevance individually, several odd results were seen. For example, the abstract of one

particular page seemed to be relevant, since it mentioned the following phrase: “the frequency

and type of video games played appears to parallel risky drug and alcohol use”. When the link

was opened, however, neither the abstract phrase nor the original search terms could be found on

the page itself. Where the text actually came from is a mystery, and may have been included in

the page metadata, or perhaps a link from a different page.

Gigablast returned only 7 hits, 4 of which were also returned by Google; with so small a sample,

an overlap rate is difficult to calculate reliably. The relevance rate was approximately 1 out of 7,

or 14%.

Dogpile returned 28 hits, with a 68% overlap with Google, a rate similar to Ask’s. Looking

at each page shows a precision rate of 14%, not very different from the other products.

Conclusion

While all four search engines were able to handle a multi-termed query adequately with

AND, NOT, and exact phrase operators, the “intelligent” searching algorithms did not always

result in the best resources being retrieved. Chief among the reasons may be the fact that these

products often are not absolute in respecting exact phrases. The ability to look beyond the strict

exact phrase is a valuable feature, especially when it comes to finding logical alternate forms of

words within exact phrases, as in “high school students” vs. “high school student”, and “violent

video games” vs. “violence in video games”. Without such capability, perhaps no or very few

hits would be found exactly as queried, and the searcher would be required to know in

advance what combinations of search terms actually exist in documents. On the other hand, the

way in which the algorithms return unpredictable blends of the requested terms sometimes leads

to lower precision. Alternatively, perhaps the exact phrases were not respected because

there were very few pages in which they all occurred, and the search engines were

making the next best choice.

It is also likely that a real flaw in this exercise lay in the search query itself, which

contained a few implicit assumptions. The first was that some of the available literature would

express an association between student achievement and violent video games, which may not be

the case. The second was that removing “aggression” from the query might lead to more accurate

hits. The results suggest that many of the current sources address students playing violent video

games in the context of aggressive behavior, with or without any

mention of student achievement. It is entirely possible that excluding this term eliminated useful

material.

A second possible failing is the relationship between the kind of query chosen and the

intrinsic nature of much of the information found by web search engines. As previously

mentioned, web search engines find matching or related terms on a page, in metadata, and in

page links, and do not seem reliable in distinguishing between lists of unrelated articles or

abstracts on a page, and a single document in which all the search conditions apply, in spite of

the highly touted algorithms, which purport to carry out an instant linguistic analysis. While one

of the engines does allow one to restrict search terms within a certain distance of each other, that

option was not available on all engines. One conclusion that can be drawn is that this particular

search was not appropriate to a web search engine, and would have been better suited to an

article database or an OPAC, both of which generally match terms with single sources. Perhaps

the most important lesson learned here is a cautionary one for any researcher who may believe

that the best information is always found on the web.

Since the abstracts retrieved generally did not give enough information to make a judgment

about the relevance of hits obtained, each of the top 30 pages had to be viewed individually.

Google showed an approximate relevance rate of 13%, Ask of 10%, Gigablast of 14% (from a

very small sample), and Dogpile of 14%. These relative rates, however, are possibly inaccurate.

For example, a particular page from the Center on Media and Child Health, returned by Ask, was

not relevant because it did not connect the concepts in the search query, but following a link on

that page for “violent video games” brought up a page of extremely relevant studies discussing

the relationship to school performance. Finding the link seemed more fortuitous than an actual

instance of a relevant result from the search engine, though the general site was a promising

source.
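The relevance rates quoted above are simple proportions: the number of hits judged relevant divided by the number of hits examined. As a minimal sketch in Python, the counts below are those recorded for search query 4 in Appendix 1; the function name precision and the dictionary layout are illustrative choices only.

```python
def precision(relevant, examined):
    """Share of examined hits that were judged relevant."""
    return relevant / examined


# (relevant hits, hits examined) for search query 4, as recorded in Appendix 1.
counts = {
    "Google": (4, 30),
    "Ask": (3, 30),
    "Gigablast": (1, 7),
    "Dogpile": (4, 28),
}

rates = {engine: f"{precision(rel, n):.0%}" for engine, (rel, n) in counts.items()}
for engine, rate in rates.items():
    print(engine, rate)
```

With samples this small, a single additional relevant hit changes a rate by several percentage points (by about 14 points in the case of Gigablast), which is one more reason to treat the relative rates with caution.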

The overlap in results returned among the products is as follows: Google and Ask, 63%;

Google and Gigablast, 57% (unscientifically extrapolated from 4 out of the total 7 results);

Google and Dogpile, 68%; Ask and Gigablast, 43% (extrapolated from 3 out of 7 results); Ask and

Dogpile, 64%; and Gigablast and Dogpile, 43% (extrapolated from 3 out of 7 results).
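An overlap figure of this kind can be computed as the number of results two engines share, divided by the size of the smaller result list. The sketch below, in Python, rests on two assumptions: the choice of the smaller list as denominator is inferred from the counts quoted in the text (for example, 4 out of 7), and the URLs are hypothetical placeholders rather than the pages actually retrieved. It also assumes both lists are non-empty.

```python
def overlap(results_a, results_b):
    """Return (shared results, size of the smaller list) for two result lists."""
    set_a, set_b = set(results_a), set(results_b)
    shared = len(set_a & set_b)
    smaller = min(len(set_a), len(set_b))
    return shared, smaller


# Hypothetical placeholder URLs: 30 results from one engine, 7 from another.
engine_a = [f"http://example.org/page{i}" for i in range(1, 31)]
engine_b = [f"http://example.org/page{i}" for i in (2, 5, 9, 14, 40, 41, 42)]

shared, smaller = overlap(engine_a, engine_b)
print(f"{shared}/{smaller} = {shared / smaller:.0%}")
```

In this placeholder data four of the seven results in the shorter list also appear in the longer one, giving the same proportion as the Google and Gigablast comparison.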

This particular exercise does not make it possible to designate one search engine as

preferable to another, since the precision was rather low overall. What does seem clear to this

researcher is that all the engines tested use complicated strategies for determining relevance,

making it difficult to decipher the exact functioning of the algorithm from these “black box”

results. Doing so would require analyzing in detail the entire contents of each page returned. The

fairly high rates of overlap between Google, Ask, and Gigablast, and their comparable precision

suggest that these products have access to many of the same resources. Google is somewhat

faster to respond than the other engines tested, especially if Google’s Chrome browser is used.

Dogpile’s inability to use many Boolean operators sets it a bit outside the mainstream in terms of

ease of use, while Gigablast’s behavior with exact phrase terms makes it quirky and somewhat

unpredictable.

References

Ask. (2009a). Advanced search. Retrieved 11 March, 2009, at

http://www.ask.com/?o=0&l=dir

Ask. (2009b). Advanced search tips. Retrieved 11 March, 2009, at

http://about.ask.com/en/docs/about/adv_search_tips.shtml

Ask. (2009c). Ask.com search tips. Retrieved 11 March, 2009, at

http://about.ask.com/en/docs/about/search_tips.shtml

Ask. (2009d). Ask search technology. Retrieved 11 March, 2009, at

http://about.ask.com/en/docs/about/webmasters.shtml

Ask. (2009e). Site Features. Retrieved 11 March, 2009, at

http://about.ask.com/en/docs/about/site_features_a11.shtml

Dogpile. (2009a). Advanced search. Retrieved 11 March, 2009, at

www.dogpile.com

Dogpile. (2008). Different Engines, Different Results: Web Searchers Not Always Finding What

They’re Looking for Online: A Research Study by Dogpile.com. Retrieved 28 February, 2009, at

http://www.infospaceinc.com/onlineprod/Overlap-DifferentEnginesDifferentResults.pdf

Dogpile. (2009c). Infospace. Retrieved 11 March, 2009, at

http://www.infospaceinc.com/ourstory/default.aspx

Dogpile. (2009d). Metasearch 101. Retrieved 11 March, 2009, at

http://www.dogpile.com/rescuefctb/ws/metasearch/_iceUrlFlag=11?_IceUrl=true

Gigablast. (2009a). Advanced search. Retrieved 11 March, 2009, at

http://gigablast.com/adv.html

Gigablast. (2009b). Query syntax. Retrieved 11 March, 2009, at

http://gigablast.com/help.html

Google. (2009a). Advanced Search. Retrieved 11 March, 2009, at

http://www.google.com/advanced_search?hl=en

Google. (2009b). Corporate information: Technology overview. Retrieved 11 March, 2009, at

http://www.google.com/corporate/tech.html

Google. (2009c). Google search basics: Basic search help. Retrieved 11 March, 2009, at

http://www.google.com/support/websearch/bin/answer.py?answer=134479&topic=351

Google. (2009d). Google search basics: More search help. Retrieved 11 March, 2009, at

http://www.google.com/support/websearch/bin/answer.py?hl=en&answer=136861

Google. (2009f). Search Features. Retrieved 11 March, 2009, at

http://www.google.com/intl/en/help/features.htm

Appendix 1

Comparison of Search Results of Four Search Engines

Search query 1: [student AND achievement]

Number of hits
Google.com: 38,700,000
Ask.com: 9,480,000
Gigablast.com: 429,404
Dogpile.com: 71

Notes
Google.com: Without AND operator, up to 200,000,000 hits; favors whole phrase; plural form found.
Ask.com: Without AND operator, little difference.
Gigablast.com: Without AND operator, up to 538,000 hits; favors whole phrase.
Dogpile.com: AND not used (not an operator); sponsored hits irrelevant and included in count.

Search query 2: [“student achievement” AND “high school students”]

Number of hits
Google.com: 260,000
Ask.com: 60,900
Gigablast.com: 9,586
Dogpile.com: 64

Notes
Google.com: Quotes work correctly for exact phrase; without AND operator, little difference.
Ask.com: Quotes work correctly, but single words are also highlighted, and phrases may not be part of document body.
Gigablast.com: Quotes work correctly (though some attempts a little buggy), but single words are also highlighted.
Dogpile.com: AND not used; used apostrophe to include two exact phrases because only one exact phrase box in advanced search. Search results not exact (changed plurals); sponsored results lower precision.

Search query 3: [research AND “student achievement” AND “high school students” AND “violent video games”]

Number of hits
Google.com: 228
Ask.com: 1,420
Gigablast.com: 97
Dogpile.com: 47

Notes
Google.com: Noted that link is made between student achievement and aggression.
Ask.com: Quotes work correctly; exact phrase not respected in all results; single words are also highlighted, and phrases may not be part of document body. Tends to show more exact phrases in abstracts.
Gigablast.com: Quotes work correctly, and phrases may not be part of document body; single words are also highlighted. Some results do not have all phrases; several links are broken.
Dogpile.com: AND not used; used apostrophe to include two exact phrases (see previous search). Appeared more consistent in respecting exact phrases. Sponsored results lower precision.

Search query 4: [research AND “student achievement” AND “high school students” AND “violent video games” -aggression]

Number of hits
Google.com: 117
Ask.com: 35
Gigablast.com: 7
Dogpile.com: 28

Relevance of top 30 hits
Google.com: 4/30 (13%)
Ask.com: 3/30 (10%)
Gigablast.com: 1/7 (14%)
Dogpile.com: 4/28 (14%)

Notes
Google.com: Many hits are not single documents, but rather lists of articles, abstracts, blog headings, etc.
Ask.com: Many hits are not single documents, but rather lists of articles, abstracts, blog headings, etc.
Gigablast.com: Many hits are not single documents, but rather lists of articles, abstracts, blog headings, etc. MANY fewer results than other products.
Dogpile.com: Quotes used, no AND. In general search, when the minus sign is used to exclude the term “aggression”, it does not appear as excluded in the Advanced Search; but when the entire query is done from within the Advanced Search with quotes for exact phrases and no AND operator, toggling back to general search shows the excluded term with a minus sign.

Overlap
Google/Ask 17/30 (63%)
Google/Gigablast 4/7
Google/Dogpile 19/28 (68%)
Ask/Gigablast 3/7
Ask/Dogpile 18/28 (64%)
Gigablast/Dogpile 3/7

Appendix 2

Comparison of Functionality of Four Search Engines

The information in the chart below was culled from the four search engines’ documentation. Each specific function has not been tested

to determine if it works, except for the search queries carried out and analyzed in the body of this paper.

Legend for chart:

Normal text indicates that the feature is explicitly listed as operable.

(Grayed-out) Indicates that the feature is not explicitly listed as being operable, and has not been tested.

* Indicates that the feature is not explicitly listed as being operable, but has been determined to be operable through a trial.

[ ] Brackets surround exact search query as entered in search box.

___ Blanks indicate the search query terms (operands) used with Boolean operators.

1. Default is AND, but preference given to both terms next to each other. To avoid giving preference to both terms together,

separate operands with [term .. term]. Both AND and a blank when used without parentheses will nest the two AND terms

before applying another operator afterwards, like OR. Says parentheses are optional, but that is confusing in some cases.

2. OR operator gives preference to hits with both terms

3. Default is AND, but if the operator AND is actually used, number of search results may change.

Search engine→ Google.com Ask.com Gigablast.com Dogpile.com

Features

documented ↓

Search categories

Google.com: Includes categories to limit searches (e.g., Web, Video, Image, Groups, Products, Blogs, News, and MANY more)
Ask.com: Includes categories to limit searches (e.g., Web, Video, Image, Groups, Products, Blogs, News, and MANY more)
Gigablast.com: Includes categories to limit searches (Arts, Games, Kids and Teens, etc.)
Dogpile.com: Includes categories to limit searches (e.g., Web, Video, Image, Music, News, Yellow Pages, White Pages)

Boolean search?

Google.com:
Allows special operators (see below)
AND operator by default

Ask.com:
-Allows special operators (see below)
-AND operator by default*

Gigablast.com:
-Allows special operators (see below)
-AND operator by default (note 1)

Dogpile.com:
Does not appear to consistently allow special operators (note 5)
-AND operator by default*

Special query operators allowed

Google.com:
[ ___ AND ___ ] to show both terms (note 3)
[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
[ -___ ] to exclude words (no space, precede by space)
[ OR ___ ] to show at least one of two separated terms (include space)
[~__ ] synonym
[__*__ ] wildcard, whole words only
[ “ __”] only exact word (same as +), in phrase, exact words in exact order
[ ___ site:(site name)] term appears in specific site

Ask.com:
-[ ___ AND ___ ] to show both terms* (note 3)
-[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
-[ -___ ] to exclude words (no space, precede by space)
-[ OR ___ ] to show at least one of two separated terms (include space)
[~__ ] synonym
[__*__ ] wildcard, whole words only
-[ “ __”] only exact word (same as +), in phrase, exact words in exact order
-[ ___ site:(site name)] term appears in specific site
-[___ intitle:(title name)] term must appear in page title
-[___ inurl:(url name)] term must appear in name of the URL
[___ last:(period of time)] term appears in pages during the specified time period (subject to limitations)
[___ afterdate:yyyymmdd] term appears in pages after specified date
[___ beforedate:yyyymmdd] term appears in pages before specified date
[___ betweendate:yyyymmdd, yyyymmdd] term appears in pages between specified dates
[___ inlink:(link address)] term must appear in anchor link

Gigablast.com:
-[ ___ AND ___ ] to show both terms (note 1)
[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
-[- ___ ] to exclude words (no space, precede by space)
[AND NOT ___ ] to exclude words
-[ OR ___ ] to show at least one of two separated terms (include space) (note 2)
[~__ ] synonym
[__*__ ] wildcard, whole words only
-[ “ __”] only exact word (same as +), in phrase, exact words in exact order
-[ ___ site:(site name)] term appears in specific site
-[ ___ title:(title name)] term must appear in page title
-[ ___ suburl:(url name)] term must appear in name of the URL
[___ last:(period of time)] term appears in pages during the specified time period (subject to limitations)
[___ afterdate:yyyymmdd] term appears in pages after specified date
[___ beforedate:yyyymmdd] term appears in pages before specified date
[___ betweendate:yyyymmdd, yyyymmdd] term appears in pages between specified dates
-[___ link:(link address)] search results link to webpage
-[___ type:(doc, xls, etc.)] search results must be in given format
-[___ | ___] searches for first term, then for second, ranks according to second
-[ ___ ip:___] searches for term in numerical IP address

Dogpile.com:
[ ___ AND ___ ] to show both terms (note 3)
[+__ ] to include commonly ignored words, and to search word precisely (no space, precede by space)
related:(name of website)
[ -___ ] to exclude words (no space, precede by space)*
[ OR ___ ] to show at least one of two separated terms (include space)
[~__ ] synonym
[__*__ ] wildcard, whole words only
[ “ __”] exact words in order*
[ ‘ __’] exact words in order*
[ ___ site:(site name)] term appears in specific site

Advanced search form

Google.com: Allows search to be defined by AND, OR, NOT, language, format, site name, or exact phrases, date created, usage rights, location of key words, region, and numeric range. Also “similar to the page” and “link to the page”.
Ask.com: -Allows search to be defined by AND, OR, NOT, language, specific domain, or exact phrases, date modified, location of key words, region.
Gigablast.com: -Allows search to be defined by AND, OR, NOT, site name, exact phrase, a URL, pages linked to URL, site clustering.
Dogpile.com: Allows search to be defined by AND, OR, NOT, exact phrase, specific domain, specific language.

Method of treating words (truncating, case, plurals, spelling, stop words, etc.)

Google.com:
All words count, though some terms ignored if results deemed relevant
Case Insensitive
Punctuation ignored with certain exceptions; signs with “obvious” meaning in term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely-related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
Stop words ignored (a, for, the, etc.) sometimes (logic applied)
Will offer corrected spelling

Ask.com:
All words count, though some terms ignored if results deemed relevant
-Case Insensitive
Punctuation ignored with certain exceptions; signs with “obvious” meaning in term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely-related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
Stop words ignored (a, for, the, etc.) sometimes (logic applied)
-Will offer corrected spelling
-Word order matters (should follow natural language)
“Natural language technology” offers suggestions

Gigablast.com:
All words count, though some terms ignored if results deemed relevant
Case Insensitive
Punctuation ignored with certain exceptions; signs with “obvious” meaning in term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely-related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
Stop words ignored (a, for, the, etc.) sometimes (logic applied)
-Will offer corrected spelling

Dogpile.com:
-All words count, though some terms ignored if results deemed relevant*
-Case sensitive (in some cases)*
Punctuation ignored with certain exceptions; signs with “obvious” meaning in term, such as:
[$__ ] dollar sign indicates price
[__-__ ] hyphen joins two closely-related words
[__ _ __ ] underscore can connect two words
Uses synonyms automatically
-Stop words NOT always ignored (a, for, the, etc.) sometimes (logic applied)
-Will offer corrected spelling

Additional features (Bells and Whistles)

Google.com:
Spell-checking
Current weather
Current stock-quotes
Current time
Current scores
Book search
Unit conversion
Synonyms
Definitions
Business by location
Movies by location
Real estate by location
Flight info
Currency conversion
Maps
Package tracking
Patent numbers
Area codes location
Etc.

Ask.com:
-Spell-checking*
-Current weather
-Current stock-quotes
-Current time*
-Current scores*
Book search
-Unit conversion
-Synonyms
-Definitions
-Business by location
-Movies by location
-Real estate by location*
Flight info
-Currency conversion
-Maps
-Package tracking*
Patent numbers
Area codes
Natural language questions.
-TV listings
-local events
Etc.

Gigablast.com:
-Spell-checking
Current weather
Current stock-quotes
Current time
Current scores
Book search
Unit conversion
Synonyms
Definitions
Business by location
Movies by location
Real estate by location
Flight info
Currency conversion
Maps
Package tracking
Patent numbers
Area codes location
Etc.

Dogpile.com:
-Spell-checking
Current weather
Current stock-quotes
Current time
Current scores
Book search
Unit conversion
Synonyms
Definitions
Business by location
Movies by location
Real estate by location
Flight info
Currency conversion
Maps
Package tracking
Patent numbers
Area codes location
Etc.

Search Engine advice

Google.com:
Keep it simple, start with fewer words
Use words you guess most likely to appear on page
Choose more descriptive words for specific needs
“Search is rarely absolute … a variety of techniques is used”

Ask.com:
Use words you guess most likely to appear on page
Choose more descriptive words for specific needs
Use natural word order
Search one question at a time
Use spaces between words (test if unsure)

Gigablast.com:
Use the most direct words possible.
Refine searches (use refined categories offered)

Dogpile.com:
Use natural word order
Use spaces between words (test if unsure)
Use search categories next to search box.