Tutorial 3 Searching the Web


Objectives

• Determine whether a research question is specific or exploratory

• Learn how to formulate an effective Web search strategy to answer research questions

• Learn how to use Web search engines, Web directories, and Web metasearch engines effectively

New Perspectives on The Internet, Seventh Edition 2

Objectives

• Use Boolean logic and filtering techniques to improve your Web searches

• Use advanced search options in Web search engines

• Assess the validity and quality of Web research resources

• Learn about the future of Web search tools

Types of Search Questions

• Specific question: a question you can phrase easily and whose answer you will recognize when you find it

• Exploratory question: an open-ended question that can be harder to phrase; it is also difficult to determine when you have found a good answer

Specific Question

Exploratory Question

Web Search Process

Web Search Strategy

• Repeating the search process

– You may need to reformulate, or more clearly state, your question

– Try to think of synonyms for each word

– Identify unique phrases that relate to your topic or question

Using Search Engines

• Four broad categories of Web search tools:

– Search engines

– Directories

– Metasearch engines

– Other Web resources, such as Web bibliographies

Understanding Search Engines

• Search engine: Web site (or part of a Web site) that finds other Web pages that match a word or phrase you enter

• Search expression or query: word or phrase you enter in a search engine

• A search expression might also include instructions that tell the search engine how to search

• A search engine does not search the Web to find a match; it searches only its own database of information about Web pages that it has collected, indexed, and stored

Understanding Search Engines

• Hit: a Web page indexed in the search engine’s database that contains text that matches your search expression

• Most search engines report the number of hits they find

• Results pages: Web pages generated by a search engine that contain hyperlinks to the Web pages whose text matches your search expression
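The point that a search engine searches only its own database, and reports hits from it, can be sketched in a few lines. This is a minimal illustration, not how any particular search engine works: the page URLs and text in PAGES are invented, and the functions build_index and search are hypothetical names.

```python
from collections import defaultdict

# Hypothetical pages the engine has already collected (URL -> page text).
PAGES = {
    "http://example.com/cars": "used car prices and car reviews",
    "http://example.com/boats": "boat prices and sailing reviews",
    "http://example.com/trains": "train schedules",
}

def build_index(pages):
    """Map each word to the set of URLs whose stored text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the hits: URLs whose indexed text contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

index = build_index(PAGES)
hits = search(index, "car reviews")
print(len(hits), "hit(s):", sorted(hits))
```

The search never touches the live Web; a page that is not in PAGES cannot be a hit, no matter how well it matches.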

Understanding Search Engines

• Web robot (bot or spider): a program that automatically searches the Web to find new Web sites and update information about old Web sites already in the database

• Most search engines allow Web page creators to submit the URLs of their pages to search engine databases

• Search engine operators often sell advertising space on the search engine Web page and on the results pages
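As a rough sketch of what a Web robot does, the example below follows links breadth-first through a tiny made-up "Web" held in memory. The URLs in WEB and the function name crawl are assumptions for illustration; a real robot would download each page over the network and pass it to the indexer.

```python
from collections import deque

# Hypothetical miniature "Web": each URL maps to the URLs it links to.
WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://a.example/", "http://d.example/"],
    "http://d.example/": [],
}

def crawl(start_url, web, max_pages=100):
    """Breadth-first walk that records each reachable page once."""
    seen = {start_url}
    queue = deque([start_url])
    visited_order = []
    while queue and len(visited_order) < max_pages:
        url = queue.popleft()
        visited_order.append(url)  # a real robot would fetch and index here
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited_order

print(crawl("http://a.example/", WEB))
```

The seen set is what keeps the robot from looping forever when pages link back to each other.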

Understanding Search Engines

• Sponsored links: paid placement links on results pages

• Banner ad: sponsored link that appears in a box on the page (usually at the top, but sometimes along the side or bottom of the page)

• Revenue from sponsored links and banner ads covers the costs of maintaining the computer hardware and software required to search the Web and to create and search the database; what remains is profit

Understanding Search Engines

Figure: Google search results for the search term “car”

Using More Than One Search Engine

• Each search engine includes different Web pages in its database

• Each search engine uses different rules to evaluate search expressions

• The best way to determine how a specific search engine interprets search expressions is to read the Help pages on the search engine Web site

• Search engines change the way they interpret search expressions from time to time, so you should read the Help pages regularly

Understanding Search Engine Databases

• Search engine databases store different collections of information about the pages that exist on the Web at any given time

• Each search engine database indexes the information it has collected from the Web differently

• Search engine robots may collect information from a Web page’s title, description, keywords, or HTML tags, or they may read a certain number of words from each Web page

Understanding Search Engine Databases

• Meta tag: HTML code that a Web page creator places in the page header for the specific purpose of informing Web robots about the content of the page

Figure: META tags in a Web page
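To make the idea of meta tags concrete, here is a small sketch of how a robot might read them, using Python's standard html.parser module. The sample page in SAMPLE_PAGE and the class name MetaTagParser are invented for illustration; real robots decide for themselves which tags to read and how much weight to give them.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

# Hypothetical page header containing description and keywords meta tags.
SAMPLE_PAGE = """<html><head>
<title>Used Car Guide</title>
<meta name="description" content="Prices and reviews for used cars">
<meta name="keywords" content="cars, used cars, prices">
</head><body><p>Welcome.</p></body></html>"""

parser = MetaTagParser()
parser.feed(SAMPLE_PAGE)
print(parser.meta)
```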

Understanding Search Engine Databases

• Full text indexing: storing the entire content of every Web page that the search engine indexes

• Stop words: common words, such as and, the, it, and by, that many search engines omit from their databases

• Many search engines include information about their search engines, robots, and databases on their Help or About pages
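The effect of stop words can be shown with a short sketch. The stop-word set below contains only the four examples named above; actual search engines use their own, usually longer, lists, and index_terms is a hypothetical helper name.

```python
STOP_WORDS = {"and", "the", "it", "by"}  # only the examples named above

def index_terms(text, stop_words=STOP_WORDS):
    """Return the words a stop-word-filtering engine would keep, in order."""
    return [word for word in text.lower().split() if word not in stop_words]

print(index_terms("The car and the boat sold by it"))
```

A consequence worth remembering: on an engine that drops stop words, adding them to a search expression changes nothing.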

Search Engine Features

• Page ranking: a way of grading Web pages by the number of other Web pages that link to them

• URLs of Web pages with high rankings are presented first on search results pages

• Stemming: the use of the root form of a word to find results containing the root word and its variations, which are created by adding standard endings to the root word

• Natural language query interface: allows users to enter a question exactly as they would ask a person that question

• Parsing: the procedure of converting a natural language question into a search expression
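Page ranking and stemming, as defined above, can both be sketched briefly. The link graph in LINKS is made up, the ranking simply counts inbound links (production ranking methods are considerably more elaborate), and the suffix list is deliberately crude; none of this reflects a specific engine's implementation.

```python
from collections import Counter

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["c", "b"],
}

def rank_by_inbound_links(links):
    """Order pages by how many other pages link to them, most-linked first."""
    inbound = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # a page linking to itself does not count
                inbound[target] += 1
    return sorted(inbound, key=lambda page: (-inbound[page], page))

SUFFIXES = ("ing", "ed", "es", "s")  # a crude list of standard endings

def stem(word):
    """Strip one common ending so that variations share a root (very naive)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print(rank_by_inbound_links(LINKS))
print({stem(w) for w in ["search", "searches", "searched", "searching"]})
```

With stemming, a query for any one of the four variations can match pages containing the others, because all four reduce to the same root.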

Search Engine Features

Figure: Natural language query on Ask.com

Using Directories and Hybrid Search Engine Directories

• Web directory: a listing of hyperlinks to Web pages that is organized into hierarchical categories

• The difference between a search engine and a Web directory is that people select the Web pages to include in a Web directory

• Many directories allow a Web page to be indexed in several different categories

• The main weakness of a Web directory is that you must know which category is likely to yield the information you desire

• Yahoo! is one of the oldest and most respected directories on the Web

Using Directories and Hybrid Search Engine Directories

Figure: Yahoo! Web directory

Using Directories and Hybrid Search Engine Directories

• Hybrid search engine directory: the combination of a search engine and a directory

• Using a hybrid search engine directory can help you identify which category in the directory is likely to contain the information you need

• After you enter a category, the search engine is useful for narrowing a search even further: you can enter a search expression and limit the search to that category
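Limiting a search to one directory category might look like the sketch below. The category names and page titles in DIRECTORY are invented, and search_in_category is a hypothetical function; it only illustrates the idea of searching within a category that people have already curated.

```python
# Hypothetical directory: category path -> titles of pages filed there.
DIRECTORY = {
    "Recreation/Autos": ["Used car prices", "Classic car clubs"],
    "Recreation/Boating": ["Sailing schools", "Used boat prices"],
}

def search_in_category(directory, category, expression):
    """Return titles in one category that contain every word of the expression."""
    words = expression.lower().split()
    return [
        title
        for title in directory.get(category, [])
        if all(word in title.lower().split() for word in words)
    ]

print(search_in_category(DIRECTORY, "Recreation/Autos", "used"))
```

The same expression run against "Recreation/Boating" returns the boat listing instead, which is the narrowing effect described above.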

Using Metasearch Engines

• Metasearch engine

– Tool that lets you search several engines at the same time

– Does not have its own database of Web information

– Accepts a search expression and transmits it to several search engines, which run the search expression against their databases and return results to the metasearch engine; the metasearch engine then reports consolidated results from all of the search engines it queried

• Mamma.com was one of the first metasearch engines on the Web
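The transmit-and-consolidate behavior of a metasearch engine can be sketched as follows. The two "engines" are stand-in functions that return fixed, invented result lists; a real metasearch engine would send the expression to remote search engines and would likely merge rankings in a more refined way.

```python
# Hypothetical stand-ins for separate search engines: each takes a query
# and returns its own ranked list of result URLs.
def engine_one(query):
    return ["http://a.example/", "http://b.example/"]

def engine_two(query):
    return ["http://b.example/", "http://c.example/"]

def metasearch(query, engines):
    """Send the query to every engine and merge results, dropping duplicates."""
    consolidated = []
    seen = set()
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                consolidated.append(url)
    return consolidated

print(metasearch("car", [engine_one, engine_two]))
```

Note that metasearch holds no page data of its own; everything it reports comes from the engines it queries.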

Using Metasearch Engines

Figure: Mamma.com was one of the first metasearch engines on the Web

Using Metasearch Engines

• In the KartOO metasearch engine, hits are shown as images; each image is clustered around words that appear in the results pages

• When the pointer is moved over a word, the links appear as lines between the word and the images

• To refine a search, click a word to add it to the search expression

Using Other Web Resources

• Web bibliographies: Web search tools that are similar to print bibliographies, but instead of listing books or journal articles, they contain lists of hyperlinks to Web pages

• Many of these resources include summaries or reviews of Web pages

• Also called:

– Resource lists

– Subject guides

– Clearinghouses

– Virtual libraries

Using Other Web Resources

• Web bibliographies are sometimes confusingly called “Web directories”

– Usually more focused on specific subjects than Web directories

– Usually do not include a tool for searching within their categories

• Web bibliographies can be very useful when you want to obtain a broad overview or a basic understanding of a complex subject area

• Some Web bibliographies are general references, but most are more focused

• Many Web bibliographies are created by librarians at university and public libraries

Boolean Logic and Filtering Techniques

• The most important factor in obtaining good results in a Web search is careful selection of search terms

• You can usually choose one or two words that will work well when the object of your search is straightforward

• More complex search questions require more complex queries, in which you can use Boolean logic, search expression operators, or filtering techniques to broaden or narrow your search expression

Boolean Operators

• Boolean algebra was developed by George Boole, a nineteenth century British mathematician

• Boolean operators, or logical operators, specify the logical relationship between the elements they join

• Three basic Boolean operators (AND, OR, and NOT) are recognized by most search engines

• You can use these operators in many search engines by including them with search terms

Boolean Operators

Search Expression | Search Returns Pages that Include
exports AND France AND Japan | All of the three search terms
exports OR France OR Japan | Any of the three search terms
exports NOT France NOT Japan | Exports, but not if the page also includes the terms France or Japan
exports AND France NOT Japan | Exports and France, but not Japan
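The four search expressions in the table map directly onto set operations. In this sketch the four pages and their word sets are invented, and pages_with is a hypothetical helper; AND becomes set intersection, OR becomes union, and NOT becomes set difference.

```python
# Hypothetical indexed pages: page name -> set of words on the page.
PAGES = {
    "p1": {"exports", "france", "japan"},
    "p2": {"exports", "france"},
    "p3": {"exports"},
    "p4": {"japan"},
}

def pages_with(term):
    """Return the set of pages that contain the term."""
    return {page for page, words in PAGES.items() if term.lower() in words}

exports = pages_with("exports")
france = pages_with("France")
japan = pages_with("Japan")

all_three = exports & france & japan         # exports AND France AND Japan
any_term = exports | france | japan          # exports OR France OR Japan
exports_only = exports - france - japan      # exports NOT France NOT Japan
exports_france = (exports & france) - japan  # exports AND France NOT Japan

print(sorted(all_three), sorted(any_term), sorted(exports_only), sorted(exports_france))
```

AND narrows the result, OR broadens it, and NOT removes pages that would otherwise qualify.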

Other Search Expression Operators

• A precedence operator, also called an inclusion operator or a grouping operator, clarifies the grouping within a complex expression and is usually indicated by parentheses

• A location operator, or proximity operator, lets you search for terms that appear close to each other in the text of a Web page. The most common location operator offered in Web search engines is the NEAR operator
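The sketch below illustrates why grouping matters and what a NEAR test might check. The page sets are invented, and near is a hypothetical function that treats "close" as within max_distance words; search engines that offer NEAR each define their own distance.

```python
# Hypothetical sets of pages containing each term.
exports = {"p1", "p2"}
imports = {"p3"}
france = {"p2", "p3", "p4"}

# The same three terms, grouped two different ways with parentheses.
grouped_left = (exports | imports) & france   # (exports OR imports) AND France
grouped_right = exports | (imports & france)  # exports OR (imports AND France)

def near(text, first, second, max_distance=3):
    """True if the two terms occur within max_distance words of each other."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )

sentence = "Exports from France to Japan rose"
print(sorted(grouped_left), sorted(grouped_right))
print(near(sentence, "France", "Japan"), near(sentence, "exports", "Japan"))
```

The two groupings return different pages from the same terms, which is exactly the ambiguity a precedence operator removes.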

Wildcard Characters

• Wildcard character:

– Allows you to omit part of a search term

– Most search engines support some use of a wildcard character in their search expressions

– Many search engines recognize the asterisk (*) as the wildcard character
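An asterisk wildcard can be illustrated with Python's standard fnmatch module, which happens to use the same symbol. The word list in INDEXED_WORDS and the function expand_wildcard are assumptions for this sketch; individual search engines place their own limits on where a wildcard may appear.

```python
from fnmatch import fnmatchcase

# Hypothetical words found in an engine's index.
INDEXED_WORDS = ["export", "exports", "exporter", "exporting", "expert", "import"]

def expand_wildcard(pattern, words):
    """Return the indexed words that match a pattern containing *."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

print(expand_wildcard("export*", INDEXED_WORDS))
```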

Search Filters

• Search filter:

– Eliminates Web pages from a search

– Filter criteria can include such Web page attributes as language, date, domain, host, or page component

– Many search engines allow you to restrict your search by using filters
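Filtering by language, date, and domain could be sketched like this. The Page records, their dates, and the apply_filters function are all invented for illustration; the filter names do not correspond to any particular search engine's options.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse

@dataclass
class Page:
    url: str
    language: str
    updated: date

# Hypothetical search results before filtering.
PAGES = [
    Page("http://www.example.edu/trade", "en", date(2007, 5, 1)),
    Page("http://www.example.com/trade", "en", date(2003, 1, 15)),
    Page("http://www.example.fr/commerce", "fr", date(2007, 8, 20)),
]

def apply_filters(pages, language=None, updated_after=None, domain_suffix=None):
    """Keep only the pages that satisfy every filter that was supplied."""
    results = []
    for page in pages:
        host = urlparse(page.url).hostname or ""
        if language and page.language != language:
            continue
        if updated_after and page.updated <= updated_after:
            continue
        if domain_suffix and not host.endswith(domain_suffix):
            continue
        results.append(page)
    return results

recent_english = apply_filters(PAGES, language="en", updated_after=date(2005, 1, 1))
print([page.url for page in recent_english])
```

Each filter only removes pages, so combining filters can never return more results than using one alone.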

Complex Searches

• Most search engines implement many of the operators and filtering techniques you have learned about

• Some search engines provide separate advanced search pages for these techniques

• Some search engines allow you to use advanced techniques such as Boolean operators on their simple search pages

Using AltaVista Advanced Search

• Open the AltaVista search engine in your Web browser

• Select the Advanced Search option

• Formulate the Boolean search

• Enter the search terms in the query builder in accordance with the Boolean logic rules

• Click the Find button

• Evaluate the results and, if necessary, revise your search expression

Using AltaVista Advanced Search

Figure: Complex search in AltaVista

Filtered Search in Ask.com

• Open the Ask.com search engine page in your Web browser

• Click the Advanced link

• Formulate and enter a suitable search expression

• Set any filters you want to use for the search

• Click the Advanced Search button

• Evaluate the results and, if necessary, revise your search expression

Filtered Search in Ask.com

Figure: Advanced search page in Ask.com

Filtered Search in Google

• Open the Google search engine page in your Web browser

• Click the Advanced Search link

• Formulate and enter suitable search expression elements

• Formulate and set appropriate search filters

• Click the Google Search button

• Evaluate the results and, if necessary, revise your search expression

Filtered Search in Google

Figure: Advanced search page in Google

Search Engines with Clustering Features

• Clusty is a search engine that uses advanced technology to group its results into clusters

• The clustering of results provides a filtering effect

• The filtering is done automatically by the search engine after it runs the search
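To give a feel for how clustering acts as an automatic filter, the sketch below groups result titles by the words they share. This is a toy approach and makes no claim about the technology Clusty actually uses; the titles in RESULTS and the function cluster_results are invented.

```python
from collections import defaultdict

# Hypothetical result titles returned for the query "jaguar".
RESULTS = [
    "Jaguar cars official site",
    "Jaguar animal facts",
    "Classic Jaguar cars for sale",
    "Jaguar habitat and animal behavior",
]

def cluster_results(titles, query, ignore=("and", "for", "the")):
    """Group titles under each non-query word shared by at least two titles."""
    groups = defaultdict(list)
    skip = set(query.lower().split()) | set(ignore)
    for title in titles:
        for word in set(title.lower().split()) - skip:
            groups[word].append(title)
    return {word: members for word, members in groups.items() if len(members) > 1}

clusters = cluster_results(RESULTS, "jaguar")
for label in sorted(clusters):
    print(label, "->", clusters[label])
```

Choosing the "cars" cluster hides the animal pages without the user adding any term to the search expression, which is the filtering effect described above.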

Obtaining Clustered Search Results Using Clusty

• Open the Clusty search engine page in your browser

• Formulate and enter a suitable search expression

• Click the Search button

• Evaluate the results and, if necessary, revise your search expression

Obtaining Clustered Search Results Using Clusty

Future of Web Search Tools

• Most search engines search only static Web pages

– Static Web page: an HTML file that exists on a Web server

– Dynamic Web page: a Web page generated as a result of a user’s query

– Dynamic Web pages are not stored permanently on a Web server and cannot be found by search engine robots

– Much of the content on dynamic Web pages is accessible only by logged-in users

• Several researchers have explored the difficulties that search engine robots face when trying to include information contained in the databases that some Web sites use to generate their dynamic pages

Using People to Enhance Web Directories

• About.com hires people with expertise in specific subject areas to create and manage their Web directory entries in those areas

• The Open Directory Project uses the services of more than 40,000 volunteer editors who maintain listings in their individual areas of interest

– Offers the information in its Web directory to other Web directories and search engines at no charge

– Many major Web directories, search engines, and metasearch engines regularly download and store the Project’s information in their databases

Evaluating Web Research Resources

• Information on the Web is seldom subjected to the review and editing processes that have become a standard practice in print publishing

• The risks of obtaining and relying on inaccurate or unreliable information can be significant

• Reduce your risk by carefully evaluating the quality of any Web resource on which you plan to rely for information related to an important judgment or decision

• Evaluate the Web page’s authorship, content, and appearance

Author Identity and Objectivity

• Web pages should identify the author and present the author’s background information and credentials

• Check secondary sources for corroborating information

• Author contact information should be provided

• Examine the domain identifier in the URL

• Consider whether the qualifications presented by the author pertain to the material that appears on the Web site

• Information about the author’s affiliations should be provided
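Examining the domain identifier in a URL can be done by eye, but a short sketch shows exactly which part of the URL is meant. It uses Python's standard urllib.parse module; the function name domain_identifier and the example URLs are assumptions. The identifier is only one clue about who publishes a page and does not by itself establish quality.

```python
from urllib.parse import urlparse

def domain_identifier(url):
    """Return the last label of the host name, such as 'edu' or 'com'."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""

print(domain_identifier("http://www.example.edu/faculty/page.html"))
print(domain_identifier("http://www.example.com/"))
```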

Content

• Determine timeliness of the content by checking the publication date

• Read the content critically and evaluate whether the included topics are relevant to the research question at hand

• Determine whether important topics or considerations were omitted

• Assess the depth of treatment the author gives to the subject

Form and Appearance

• Many pages that contain low-quality or incorrect information are poorly designed and not well edited

• A Web page that contains spelling errors might indicate a low-quality resource

• Loud colors, graphics that serve no purpose, and flashing text are all Web page design elements that often suggest a low-quality resource

Evaluating the Quality of a Web Site

• Open the Web page in your Web browser

• Identify the author, if possible. If you can identify the author, evaluate his or her credentials and objectivity

• Examine the content of the Web site

• Evaluate the site’s form and appearance

• Draw a conclusion about the site’s overall quality

Wikipedia

• Wikipedia is a Web site that hosts a community-edited set of online encyclopedias in more than a dozen different languages

• The concept behind Wikipedia is similar to that behind the Open Directory Project

• Wikipedia’s content is only as good as its contributors, and consequently, some of the information on the site is inaccurate, incomplete, or biased

Summary

• In this tutorial, you learned:

– How to formulate specific and exploratory research questions

– How to use a structured Web search process to find information on the Web

– How to develop search expressions and use them in search engines, Web directories, and metasearch engines

– What Boolean operators, precedence operators, and location operators are and how they work in several major search engines

Summary

• In this tutorial, you learned (continued):

– How to use wildcards in search expressions

– How to use several types of filtering techniques to narrow your search results

– How to evaluate the validity and reliability of a Web page by using information about author identity and objectivity

– How to evaluate the validity and reliability of a Web page by evaluating content, form, and appearance