Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009




How Did the Web Come About?

• First came the Internet, with text-based files and indexes for finding information in those files
  – Static; no graphics or multimedia
  – No point-and-click navigation with a mouse
  – No GUI (Graphical User Interface)
  – Menu-driven, with subject categories for topics arranged hierarchically


How Did the Web Come About?

• Tim Berners-Lee
  – Developed HTTP (Hypertext Transfer Protocol), starting in the late 1980s
  – Hypertext links connect files and documents (text, sound, images, videos, etc.) stored on different Internet host servers in a seamless way

• Beginning of the World Wide Web (WWW)

• The WWW is part of the Internet


How Did the Web Come About?

• Graphical Web browsers were developed for navigating Web content

• Mosaic
  – First widely popular graphical Web browser
  – Appeared in 1993
  – Revolutionized access to information
  – Made the Web much easier to use

• Other browsers appeared


Searching the Web

• Search engines (general and subject-driven)

• Directories

• Meta-search engines

• Meta-directories


Search Engines

• Search engines are computer programs designed to search the Web

• Components
  – Crawlers or spiders
  – Database
  – Search engine software
  – Search algorithms


Crawlers or Spiders

• Traverse the Web, visiting web pages that are not blocked

• Read the pages visited

• Follow links from pages to additional pages

• Return to the pages frequently to check for updates
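The crawler behavior above can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not how any particular engine works: it walks a small hypothetical in-memory link graph (`LINKS`) instead of the live Web, and a hypothetical `BLOCKED` set stands in for pages that are blocked to crawlers. Return visits for updates are left out.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example/home": ["a.example/about", "b.example/news"],
    "a.example/about": ["a.example/home"],
    "b.example/news": ["b.example/private", "c.example/blog"],
    "b.example/private": [],
    "c.example/blog": ["a.example/home"],
}

# Hypothetical pages whose owners block crawlers.
BLOCKED = {"b.example/private"}


def crawl(start: str) -> list[str]:
    """Visit pages breadth-first, following links and skipping blocked pages."""
    visited: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in BLOCKED:
            continue  # respect the block: do not read the page or follow its links
        visited.append(url)  # "read" the page
        for link in LINKS.get(url, []):
            if link not in seen:  # queue each newly discovered page only once
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("a.example/home"))
```

The `seen` set keeps the crawler from looping forever on pages that link back to each other, such as `home` and `about` here.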


Database Component

• Stores copies of the web pages the crawlers or spiders visited

• The database is organized according to a preset scheme

• Fields in each document or web page are identified (e.g., URL, page title, header or section title, metadata supplied by the page’s author), and the pages are indexed
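To make the indexing step concrete, here is a minimal sketch of a field-aware inverted index, assuming each stored page is a dictionary with hypothetical `url`, `title`, and `body` fields. Each term maps to the pages, and the fields within them, where it occurs; real engines use far richer schemes.

```python
import re
from collections import defaultdict

# Hypothetical stored copies of two crawled pages, with their fields identified.
PAGES = [
    {"url": "http://a.example/jaguar-cars", "title": "Jaguar Cars",
     "body": "Luxury cars and sports cars."},
    {"url": "http://b.example/big-cats", "title": "Big Cats",
     "body": "The jaguar is a big cat."},
]
FIELDS = ("url", "title", "body")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: list[dict]) -> dict:
    """Map each term to {page_id: set of fields in which the term appears}."""
    index: dict = defaultdict(lambda: defaultdict(set))
    for page_id, page in enumerate(pages):
        for field in FIELDS:
            for term in tokenize(page[field]):
                index[term][page_id].add(field)
    return index


index = build_index(PAGES)
print({pid: sorted(fields) for pid, fields in index["jaguar"].items()})
# {0: ['title', 'url'], 1: ['body']}
```

Recording the field, not just the page, is what later lets a ranking step favor a word found in the title or URL over one found only in the body.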


Search Engine Software

• Program that sorts through the pages stored in the database

• Takes the query a user enters into the search engine

• Matches the words in the query, along with any search criteria it contains, to the web pages stored in the database
  – Matches each word and accounts for the operators appearing in the query (+, -, “ ”)
  – The + sign is assumed when no operators are used
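The operator handling described above can be sketched as follows. This is a simplified model, not any engine’s actual parser: it treats unmarked words as required (the implied +), `-word` as excluded, and a quoted string as a phrase that must appear word for word. The function names and page texts are hypothetical.

```python
import re


def parse_query(query: str) -> tuple[list[str], list[str], list[str]]:
    """Split a query into required words, excluded words, and quoted phrases."""
    phrases = [" ".join(p.lower().split()) for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]+"', " ", query)  # drop phrases before reading single words
    required: list[str] = []
    excluded: list[str] = []
    for token in rest.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])  # -word: the page must NOT contain it
        else:
            word = token.lstrip("+")  # +word or bare word: the page must contain it
            if word:
                required.append(word)
    return required, excluded, phrases


def matches(page_text: str, query: str) -> bool:
    """True if the page has every required word and phrase and no excluded word."""
    words = page_text.lower().split()
    padded = " " + " ".join(words) + " "  # pad so phrases match whole words only
    required, excluded, phrases = parse_query(query)
    return (
        all(w in words for w in required)
        and not any(w in words for w in excluded)
        and all(f" {p} " in padded for p in phrases)
    )


pages = {
    "p1": "jaguar cars for sale in the united kingdom",
    "p2": "the jaguar is a big cat of the americas",
}
print([pid for pid, text in pages.items() if matches(text, "jaguar -cat")])  # ['p1']
print([pid for pid, text in pages.items() if matches(text, '"big cat" jaguar')])  # ['p2']
```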


Search Engine Software

• Matching is performed by algorithms (computational rules)

• Relevance of what was matched is calculated using sophisticated algorithms

• The relevance ranking of pages returned to a user is based on rules set by each search engine company


Search Engine Relevance Ranking

• Some criteria
  – Word frequency
  – Location of a word in the web page or document (page title, page URL, first heading, second heading, first sentence in a heading, etc.)
  – Number of links to a page from other pages
  – Number of clicks on a page when it appears in search results
  – Meta-tags (metadata)
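The criteria above can be combined into a single score. The sketch below is purely illustrative: the weights (`FIELD_WEIGHTS`, `LINK_WEIGHT`, `CLICK_WEIGHT`), the page fields, and the sample pages are hypothetical, and actual engines keep their formulas private and use many more signals.

```python
import math

# Hypothetical weights: a query word found in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "url": 2.0, "heading": 2.0, "meta": 1.5, "body": 1.0}
LINK_WEIGHT = 0.5   # weight for links to the page from other pages
CLICK_WEIGHT = 0.2  # weight for clicks the page received when shown in results


def score(page: dict, query_words: list[str]) -> float:
    """Combine word frequency and location, inbound links, and clicks into one score."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        for q in query_words:
            total += weight * words.count(q.lower())  # frequency, weighted by location
    # log1p dampens link and click counts so popularity does not swamp the word match
    total += LINK_WEIGHT * math.log1p(page.get("inlinks", 0))
    total += CLICK_WEIGHT * math.log1p(page.get("clicks", 0))
    return total


pages = [
    {"id": "A", "title": "search tips", "body": "tips for web search", "inlinks": 3, "clicks": 10},
    {"id": "B", "title": "cooking", "body": "search for recipes search", "inlinks": 50, "clicks": 5},
]
ranked = sorted(pages, key=lambda p: score(p, ["search"]), reverse=True)
print([p["id"] for p in ranked])
```

Page B mentions the word more often and has more links, yet page A ranks first here because its title match carries more weight; changing the weights changes the order, which is why different engines return different rankings for the same query.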


Basic Search Strategy

• Identify the information need

• Extract basic concepts (broad ideas) from the information need

• Choose possible keywords or terms related to the concepts
  – Think of broader, narrower, or related terms

• Determine the search logic and techniques most suitable for formulating a search using the keywords or terms
  – Boolean? Proximity? Combination of both? Nesting?

• Select an appropriate engine, directory, meta-engine, or meta-directory based on the topic
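Once concepts and their related terms have been chosen, one common way to combine them (assumed here, not prescribed by the slides) is to OR the terms within each concept, AND the concepts together, and nest each OR group in parentheses. The `build_query` helper and its example terms are hypothetical; check each engine’s Help or Search Tips for the syntax it actually supports.

```python
def build_query(concepts: list[list[str]]) -> str:
    """OR the terms within each concept, then AND the concept groups together."""
    groups = []
    for terms in concepts:
        # Quote multi-word terms so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        group = " OR ".join(quoted)
        # Parentheses (nesting) keep each OR group together under the ANDs.
        groups.append(f"({group})" if len(quoted) > 1 else group)
    return " AND ".join(groups)


query = build_query([["children", "kids"], ["web search", "internet searching"], ["behavior"]])
print(query)
# (children OR kids) AND ("web search" OR "internet searching") AND behavior
```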


Basic Search Strategy

• Explore the features of the engine or directory if you’re unfamiliar with them
  – Visit the Advanced Search options, Help file, and Search Tips, as applicable

• Conduct the search

• Examine the first page of returned results and visit the top five or more
  – The search engine ranks results not by the context of your topic but by its own matching and ranking criteria
  – System relevance

• Identify the pages or documents that are most relevant to your topic
  – User relevance judgment (also called pertinence)

• Use the most relevant document or page and explore the keywords, headings, phrases, etc. that you can use to find additional relevant pages or documents
  – “Seed” document or “pearl growing”
  – Follow the “Cited by” links, as applicable, to find additional documents relevant to the topic

• Revise your search if needed

• Try your search in another engine, specialized engine, meta-engine, directory, etc.
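Pearl growing can be partly supported by listing the most frequent words in a seed document that are not already in the query. This is a rough sketch under simple assumptions: the `candidate_terms` helper is hypothetical, the `STOPWORDS` list is a small illustrative sample, and frequency is only one way to spot useful terms; headings and phrases still call for human judgment.

```python
import re
from collections import Counter

# Small illustrative stopword list (hypothetical; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "for", "to", "is", "are", "with"}


def candidate_terms(seed_text: str, query: str, n: int = 5) -> list[str]:
    """Return up to n frequent words from the seed document not already in the query."""
    query_words = set(query.lower().split())
    words = re.findall(r"[a-z]+", seed_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and w not in query_words)
    # most_common breaks ties by first appearance in the document
    return [word for word, _ in counts.most_common(n)]


seed = ("Children use search engines and web directories. Children often "
        "browse directories rather than type keywords into search engines.")
print(candidate_terms(seed, "children search", n=2))  # ['engines', 'directories']
```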


The Question of Quality

• Criteria for evaluating information quality
  – Source domain (.com, .edu, .gov, etc.)
  – Authority
  – Purpose or motivation
  – Quality of writing
  – Balanced views
  – Currency of information
  – Sources cited


The Question of Quality

• Accuracy

• Factual information (check against two or more authoritative sources)

• Use additional sources for evaluating the quality of information on the Internet:
  – http://www.virtualchase.com/quality
  – http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Evaluate.html


The Invisible Web

• Search engines don’t index all web pages

• Reasons:
  – Information stored in databases that require a subscription
  – Pages or websites that are password-protected
  – Pages that are not linked from any other pages
  – Pages that are blocked to spiders or crawlers
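One common way a site blocks pages from crawlers is a robots.txt file. Python’s standard `urllib.robotparser` module can check whether a given crawler may fetch a URL. The rules, the `ExampleBot` user-agent name, and the URLs below are a made-up example, parsed from a string so that no network access is needed; the other reasons listed above (subscriptions, passwords, unlinked pages) are not modeled here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from lines instead of fetching them

for url in ["http://www.example.com/about.html",
            "http://www.example.com/members/reports.html"]:
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "->", "crawlable" if allowed else "blocked to crawlers")
```

A well-behaved spider runs a check like this before each request, so pages under `/members/` never reach the engine’s database and stay part of the invisible Web.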


Search Logic: Boolean Operators

[Figure: Boolean operators. Source: Google Images]


Boolean and Search Engines

• AND (+)

• OR

• NOT (-)
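Boolean operators behave like set operations on the lists of pages that contain each word: AND is intersection, OR is union, and NOT is difference. The postings below are hypothetical page IDs chosen only to illustrate the operators.

```python
# Hypothetical postings: the IDs of the pages that contain each word.
POSTINGS = {
    "jaguar": {1, 2, 3, 5, 6},
    "car": {1, 4, 5},
    "cat": {2, 3},
}
jaguar, car, cat = POSTINGS["jaguar"], POSTINGS["car"], POSTINGS["cat"]

and_hits = jaguar & car              # jaguar AND car (jaguar +car): pages with both words
or_hits = car | cat                  # car OR cat: pages with either word
not_hits = jaguar - cat              # jaguar NOT cat (jaguar -cat): jaguar pages without "cat"
nested_hits = jaguar & (car | cat)   # jaguar AND (car OR cat): the nested OR is evaluated first

print(sorted(and_hits))     # [1, 5]
print(sorted(or_hits))      # [1, 2, 3, 4, 5]
print(sorted(not_hits))     # [1, 5, 6]
print(sorted(nested_hits))  # [1, 2, 3, 5]
```

AND and NOT shrink the result set (more precise), while OR enlarges it (broader), which is why OR is typically used for synonyms within one concept.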


Phrase Searching

• A form of proximity searching

• Quotation marks (“ ”) are used to enter phrases in search engines

• Provides more precise results

• Limits the results to pages in which the words appear next to each other, in the order entered
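The difference between phrase searching and looser proximity searching can be shown with word positions. In this sketch, `phrase_match` requires the words to be adjacent and in order, while `within` accepts two words up to `k` positions apart in either order; both function names and the sample sentence are hypothetical, and engines differ in which forms of proximity they support.

```python
def positions(text: str) -> dict[str, list[int]]:
    """Map each lowercase word to the positions where it occurs."""
    pos: dict[str, list[int]] = {}
    for i, word in enumerate(text.lower().split()):
        pos.setdefault(word, []).append(i)
    return pos


def phrase_match(text: str, phrase: str) -> bool:
    """True if the phrase's words appear next to each other, in order."""
    words = phrase.lower().split()
    if not words:
        return False
    pos = positions(text)
    return any(
        all(start + offset in pos.get(w, []) for offset, w in enumerate(words))
        for start in pos.get(words[0], [])
    )


def within(text: str, a: str, b: str, k: int) -> bool:
    """True if words a and b occur no more than k words apart, in either order."""
    pos = positions(text)
    return any(abs(i - j) <= k for i in pos.get(a, []) for j in pos.get(b, []))


doc = "search engines rank pages for web search users"
print(phrase_match(doc, "web search"))     # True
print(phrase_match(doc, "search web"))     # False
print(within(doc, "engines", "pages", 2))  # True
```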


Demos

• Google Features
  – Basic
  – Advanced
  – I’m Feeling Lucky
  – Google Directory
  – About Google
  – More (from the menu option)
  – Show options/Hide options (from the results page)


Yahoo Demo

• Basic

• Advanced

• Directory

• Yahoo Answers

• Ask Earl

• Other features