WHAT HAVE WE DONE SO FAR?

Weeks 1 – 8: various components of an information retrieval system

Now: look at various examples of information retrieval systems
• Internet
• Digital library
• OPAC
• Bibliographical systems

WMES3103: INFORMATION RETRIEVAL
WEEK 12

SEARCHING THE WEB

INTERNET

Different types of information on the Internet
• Journals, magazines, newspapers
• Databases
• Software
• Multimedia
• Organisational information

Different types of Web sites
• Entertainment
• Business & marketing
• Reference/information
• News
• Personal web sites

Information sources (http://www.clearinghouse.net)
• Dictionaries & lists of acronyms
• Telephone & email directories
• Encyclopedias
• Thesauri
• Dictionaries of other languages
• Articles
• E-journals
• Contents pages of journals
• Directories
• TV & radio
• Newspapers
• etc.

WEB

• The Web is a portion of the Internet
• It uses hypertext
• 3 methods of searching for information on the Web:
  1. Use a search engine
  2. Use a Web directory that classifies sites by subject
  3. Follow hyperlinks
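Method 3 (following hyperlinks) amounts to walking a graph of pages and links. A minimal sketch, assuming a hypothetical in-memory dictionary `LINKS` that stands in for real pages and their outgoing links; no network access is involved, and `browse` and `max_depth` are illustrative names only.

```python
from collections import deque

# Hypothetical toy link graph: each page maps to the pages it links to.
LINKS = {
    "home.html": ["news.html", "library.html"],
    "news.html": ["home.html"],
    "library.html": ["opac.html", "ejournals.html"],
    "opac.html": [],
    "ejournals.html": ["library.html"],
}

def browse(start, graph, max_depth=2):
    """Return pages reachable from `start` within `max_depth` clicks,
    in breadth-first order; each page is visited only once."""
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links any deeper
        for target in graph.get(page, []):
            if target not in visited:
                visited.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order

print(browse("home.html", LINKS))
# ['home.html', 'news.html', 'library.html', 'opac.html', 'ejournals.html']
```

Limiting the depth mirrors the practical limit of browsing by hand: each extra click multiplies the number of pages to look at.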

PROBLEMS WITH THE WEB

Data
a. Distributed data
b. High percentage of volatile data
c. Large volume
d. Unstructured, redundant data
e. Quality of data
f. Heterogeneous data
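Redundant data (item d) can be seen in a small sketch: pages whose text is identical after light normalisation share the same fingerprint. This assumes hypothetical URLs and page texts, and it only catches exact duplicates; near-duplicate pages would need more elaborate techniques.

```python
import hashlib
from collections import defaultdict

def fingerprint(text):
    """SHA-256 hash of the text after lower-casing and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def group_duplicates(pages):
    """Group URLs whose normalised content is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Keep only fingerprints shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

# Hypothetical mirror sites carrying the same page.
pages = {
    "http://a.example/faq": "Library opening hours:  9am - 5pm",
    "http://b.example/faq": "library opening hours: 9am - 5pm",
    "http://c.example/news": "New e-journals added this week",
}
print(group_duplicates(pages))
# [['http://a.example/faq', 'http://b.example/faq']]
```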

PROBLEMS WITH THE WEB

User’s interaction with the IRS (information retrieval system)
a. How to specify a query?
b. How to interpret the answer provided by the system?

SEARCH ENGINES

• Single: use crawlers to find and retrieve information, take descriptors from their own index database, and apply their own ranking (AltaVista, Infoseek, Excite, Google, DirectHit, HotBot)
• Specialised: search for specific information only (Thomas, SOSIG, ERIC)
• Meta: use other search engines concurrently (MetaCrawler, SavvySearch)
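A single search engine answers queries from its own index database rather than from the live pages. A minimal sketch of such an index, assuming a hypothetical three-page collection and a plain term-frequency score as a stand-in for the engines' own (unpublished) ranking formulas.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(index, query, k=10):
    """Score each document by the summed frequency of the query terms
    and return the top k doc ids, best first."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common(k)]

docs = {
    "d1": "Digital library and OPAC search",
    "d2": "Search engines crawl the web; web search is fast",
    "d3": "Library opening hours",
}
index = build_index(docs)
print(search(index, "web search"))  # ['d2', 'd1']
```

Because `search` only reads `index`, the pages themselves are never consulted at query time, which is why a result can point to a page that has since changed.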

USER INTERFACE

Query interface
• Basic: a box where the user can type in one or more words
• Complex: uses a command language with Boolean operators, phrase searching, proximity searching and wildcards; examples: HotBot and Northern Light
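The complex query operators can be illustrated on a tiny collection. A minimal sketch, assuming hypothetical documents: set operations stand in for Boolean AND/OR/NOT, a consecutive-position check for phrase searching, a word-window check for proximity, and Python's standard `fnmatch` module for `*` wildcards. Each real engine had its own query syntax, which is not reproduced here.

```python
import fnmatch
from collections import defaultdict

DOCS = {
    1: "information retrieval on the web",
    2: "web directory of library information",
    3: "retrieval of digital library records",
}

# Positional index: term -> {doc_id: [word positions]}.
INDEX = defaultdict(lambda: defaultdict(list))
for doc_id, text in DOCS.items():
    for pos, term in enumerate(text.split()):
        INDEX[term][doc_id].append(pos)

def docs_with(term):
    """Documents containing `term`; a '*' in the term acts as a wildcard."""
    if "*" in term:
        matches = fnmatch.filter(INDEX.keys(), term)
        return set().union(*(set(INDEX[t]) for t in matches))
    return set(INDEX[term]) if term in INDEX else set()

def phrase(*terms):
    """Documents where the plain words appear consecutively, in order."""
    result = set()
    for doc_id in set.intersection(*(docs_with(t) for t in terms)):
        for start in INDEX[terms[0]][doc_id]:
            if all(start + i in INDEX[t][doc_id] for i, t in enumerate(terms)):
                result.add(doc_id)
    return result

def near(a, b, window=3):
    """Documents where plain words a and b occur within `window` words."""
    result = set()
    for doc_id in docs_with(a) & docs_with(b):
        if any(abs(p - q) <= window
               for p in INDEX[a][doc_id] for q in INDEX[b][doc_id]):
            result.add(doc_id)
    return result

print(docs_with("web") & docs_with("information"))  # AND: {1, 2}
print(docs_with("digital") | docs_with("web"))      # OR: {1, 2, 3}
print(docs_with("library") - docs_with("web"))      # NOT: {3}
print(phrase("information", "retrieval"))           # phrase: {1}
print(near("retrieval", "library", window=3))       # proximity: {3}
print(docs_with("retriev*"))                        # wildcard: {1, 3}
```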

ANSWER INTERFACE

• Lists the 10 most relevant sites by ranking
• Ranking is based on the index, not on the text of the pages
• Information shown: URL, size, date the page was indexed, title, and a few lines from the document, its descriptors, or a sentence
• Examples: AltaVista, HotBot, Northern Light, Excite
• Arranged by ranking and relevance
• Too many hits: resubmit the query
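The answer page can be modelled as a ranked list of result records cut into pages of ten. A minimal sketch: the `Hit` fields mirror the information listed on the slide, while the sample data, the field names and the 150-character snippet cut-off are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Hit:
    url: str
    title: str
    size_kb: int
    indexed_on: date   # date the page was indexed, not when it last changed
    snippet: str       # a few lines from the document
    score: float

def answer_page(hits, page=1, per_page=10):
    """Return one page of hits, highest score first (page 1 = top 10)."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    start = (page - 1) * per_page
    return ranked[start:start + per_page]

def render(hit, rank):
    """Format one result the way an answer page might list it."""
    return (f"{rank}. {hit.title}\n   {hit.url} ({hit.size_kb} KB, "
            f"indexed {hit.indexed_on.isoformat()})\n   {hit.snippet[:150]}")

# Hypothetical hits: 25 results, so the answer spans three pages.
hits = [Hit(f"http://site{i}.example/", f"Page {i}", 10 + i,
            date(2001, 1, 1), "A few lines from the document...", float(i))
        for i in range(25)]
for rank, hit in enumerate(answer_page(hits)[:3], start=1):
    print(render(hit, rank))
```

Paging through later pages is possible, but when the hit count is large the slide's advice still applies: resubmit a narrower query.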

WEB DIRECTORY

• Numerous search engines provide a categorization of subjects
• Also known as catalogs, yellow pages or subject directories
• Web sites are submitted to the Web directory for checking; if accepted, a site is classified and added to the directory
• Example: Yahoo
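A subject directory is essentially a hand-maintained category tree. A minimal sketch, assuming hypothetical category names and an `accepted` flag that stands in for the editors' review; it does not reproduce Yahoo's actual structure.

```python
class Category:
    def __init__(self, name):
        self.name = name
        self.children = {}   # sub-category name -> Category
        self.sites = []      # sites classified directly under this category

    def child(self, name):
        """Return the named sub-category, creating it if needed."""
        if name not in self.children:
            self.children[name] = Category(name)
        return self.children[name]

def submit(root, path, url, accepted):
    """File `url` under the category path only if the review accepted it."""
    if not accepted:
        return False
    node = root
    for name in path:
        node = node.child(name)
    node.sites.append(url)
    return True

def list_sites(root, path):
    """Follow a category path from the top and return that category's sites."""
    node = root
    for name in path:
        node = node.children[name]
    return node.sites

root = Category("Top")
submit(root, ["Reference", "Libraries"], "http://library.example/", accepted=True)
submit(root, ["Reference", "Libraries"], "http://spam.example/", accepted=False)
print(list_sites(root, ["Reference", "Libraries"]))  # ['http://library.example/']
```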

USER PROBLEMS

• Unable to search for words
• Unable to find suitable words because they do not understand how the system looks for the selected words
• Do not understand the proper use of Boolean operators

EVALUATION OF WEB SITES: CRITERIA

• Accuracy
• Authority
• Objectivity
• Currency
• Coverage
(www.gvsu.edu/library/)

TEN C’s FOR EVALUATING INTERNET SOURCES

• Content
• Credibility
• Critical thinking
• Copyright
• Citation
• Continuity
• Censorship
• Connectivity
• Comparability
• Context
(www.uwec.edu/library/guides/tencs.html)

METASEARCHERS

• Web servers that send a query to several search engines, Web directories and other databases, then collect and collate the answers
• Examples: MetaCrawler, SavvySearch, Copernic
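Collating answers from several engines means querying them concurrently, merging the ranked lists and removing duplicate URLs. A minimal sketch, assuming hypothetical engine functions that return fixed URL lists, and using reciprocal-rank fusion (each URL scores the sum of 1/(k + rank) over the lists it appears in) as the merging rule; the metasearchers named above each used their own methods.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote engines: each returns a ranked URL list.
def engine_a(query):
    return ["http://a.example/", "http://b.example/", "http://c.example/"]

def engine_b(query):
    return ["http://b.example/", "http://d.example/"]

def engine_c(query):
    return ["http://b.example/", "http://a.example/"]

def metasearch(query, engines, k=60):
    """Send the query to every engine concurrently, then fuse the ranked
    lists with reciprocal-rank fusion; duplicate URLs are merged."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

print(metasearch("digital library", [engine_a, engine_b, engine_c]))
# ['http://b.example/', 'http://a.example/', 'http://d.example/', 'http://c.example/']
```

A URL returned by several engines rises to the top, which is the intended effect of collating rather than simply concatenating the answers.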

INTERNET

• The Internet is an information retrieval system:
  • it has input, process and output
  • it has a relevance feedback cycle
• Components:
  • Retrieval evaluation
  • Query language/operations
  • Text operations
  • Indexing & searching
  • User interface
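Retrieval evaluation is usually expressed as precision (the share of retrieved pages that are relevant) and recall (the share of relevant pages that were retrieved). A minimal sketch with hypothetical relevance judgements; on the Web the full set of relevant pages is unknown, so recall can only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of
    pages judged relevant."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def precision_at(retrieved, relevant, n=10):
    """Precision over the first n results, e.g. one answer page of ten."""
    return sum(url in relevant for url in retrieved[:n]) / n

# Hypothetical judgements: 4 pages retrieved, 5 pages actually relevant.
retrieved = ["p1", "p2", "p3", "p4"]
relevant = {"p1", "p3", "p7", "p9", "p10"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)
print(precision_at(retrieved, relevant))      # 0.2
```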
