
Group 8-Bibliotechs
Assignment 2
4/13/09

Group Members:
Student A
Susan Edwards
Student C
Student D
Student E
Student F

PART A—DATABASE STRUCTURE

1. Database Structure

Textbase Structure

Textbase Information

Textbase: C:\Documents and Settings\Pablo\My Documents\SLIS\LIBR 202\Group8\Assignment2\Group8_Assignment2
Created: 3/11/2009 8:59:04 PM
Modified: 3/12/2009 9:48:53 PM

Field Summary:
1. DocNo: Automatic Number (next avail=15, increm=1), Term
2. Author: Text, Term & Word
3. Title: Text, Term & Word
4. JnlCite: Text, Term & Word
5. Abstract: Text, Term & Word
6. Postco: Text, Term
   Validation: valid-list
7. Preco: Text, Term
   Validation: valid-list

Log file enabled, showing 'DocNo'
Leading articles: a an the
Stop words: a an and by for from in of the to
XML Match Fields:
1. DocNo

Textbase Defaults:
Default indexing mode: SHARED IMMEDIATE
Default sort order: <none>

Textbase passwords:
Master password = ''
0 Access passwords
No Silent password
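As a reading aid, the field summary above can be modelled as a small schema. The sketch below is only an illustration in Python: the `Field` class and the `rejected_entries` helper are hypothetical names introduced here and are not part of the textbase software. It assumes that a field with "Validation: valid-list" accepts only entries that appear on its validation list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One line of the field summary (hypothetical model, not the software's API)."""
    name: str
    kind: str            # "Automatic Number" or "Text"
    term_indexed: bool   # "Term" in the field summary
    word_indexed: bool   # "Word" in the field summary
    valid_list: bool = False


FIELDS = [
    Field("DocNo", "Automatic Number", True, False),
    Field("Author", "Text", True, True),
    Field("Title", "Text", True, True),
    Field("JnlCite", "Text", True, True),
    Field("Abstract", "Text", True, True),
    Field("Postco", "Text", True, False, valid_list=True),
    Field("Preco", "Text", True, False, valid_list=True),
]


def rejected_entries(field, entries, valid_terms):
    """Return the entries that are not on the validation list.

    Assumption: fields without a validation list accept anything.
    """
    if not field.valid_list:
        return []
    return [entry for entry in entries if entry not in valid_terms]
```

For example, checking the Postco field against a one-term validation list flags any entry that is not on that list, while the same call on the Title field flags nothing.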

2. Validation List

Pre Coordinate Terms
Term index for field 'Preco', textbase 'Group8_Assignment2'

1 Archives Management--Digital Documents--Standards--Object-oriented Model
1 Archives Management--Digital Documents--Standards--Relational Database Model
1 Controlled Vocabulary--Thesaurus--Standards
1 Information Organization--Metadata--Tagging--Community tagging
1 Information Organization--Methods--Aggregation
1 Information Retrieval and Storage--Methods--Controlled Vocabulary--Thesaurus
1 Information Retrieval and Storage--Methods--Natural Language Processing
1 Information Retrieval--Content based--3-D object
1 Information Retrieval--Content based--audio
1 Information Retrieval--Content based--image
1 Information Retrieval--Content based--video
1 Information Retrieval--Evaluation--Precision
1 Information Retrieval--Evaluation--Recall
3 Information Retrieval--Information Retrieval Systems
2 Information Retrieval--Methods--Browsing
3 Information Retrieval--Methods--End-user
2 Information Retrieval--Methods--Manual Searching
4 Information Retrieval--Methods--Search
1 Information Retrieval--Methods--Search--Directed Search
1 Information Retrieval--Methods--Search Engines
1 Information Retrieval--Methods--Search--Full Text
1 Information Retrieval--Systems
2 Information Retrieval Systems--Design--User Interface--User Behavior
1 Information Science--Bibliometrics--Citation Analysis
4 Information Science--Information Organization
1 Information Science--Information Retrieval
1 Information Science--Information Storage
1 Information Science--Metadata--Standards--Archives
1 Information Science--Metadata--Standards--History
1 Information Science--Metadata--Standards--Libraries
1 Information Science--Metadata--Standards--Museums
1 Information Science--Professional Ethics
1 Information Science--Theory--Boolean Logic
1 Information Scientists--Claude Shannon
1 Information Scientists--George Boole
1 Information Scientists--Mortimer Taube
1 Information Storage--Archives Management--Digital Documents--Decay
1 Information Storage--Archives Management--Digital Documents--Encoding
1 Information Storage--Archives Management--Digital Documents--Migration
2 Information Storage--Archives Management--Digital Documents--Preservation
1 Information Storage--Archives Management--Digital Documents--Software Emulation
1 Information Storage--Preservation--Ephemera
1 Information Storage--Preservation--Websites
1 Internet--Evolution--Development
1 Libraries--Collections--Development--Digital--Internet
1 Libraries--Collections--Development--Digital--Storage
1 Libraries--Collections--Development--Print
1 Libraries--Digital
1 Philosophy--Knowledge representation
2 Search Engines--Evaluation
1 Search Engines--Metasearch
1 Search Engines--Navigation--Interfaces
2 Search Engines--Users--Behavior
3 Search Engines--Users--Usability
1 Search Enginges--Users--Information--Needs

Total number of keys: 55

Post Coordinate Terms
Term index for field 'Postco', textbase 'Group8_Assignment2'

1 Aggregated search
1 Aggregation
1 Archival content
1 Archives
1 Boolean Algebra
1 Boolean Logic
1 Browsing
1 Circuits
1 Claude Shannon
1 Collection management
1 Community tagging
1 Compass Interface
1 Content based information
1 Controlled Vocabulary
1 Data Migration
1 Database standards
1 Database structures
2 Descriptors
3 Digital media
1 Digital Migration
2 Digital Preservation
2 Document relevance
2 Evolving Search Strategies
1 Full text retrieval
1 Full text search
1 George Boole
1 Global Search Engines
2 Information Evolution
3 Information organization
3 Information retrieval
2 Information science
1 Information Storage
5 Information systems
2 Internet
2 Libraries
1 Library History
1 Manual indexed term search
1 Medical thesaurus
2 Metadata
2 Methodology
1 Mortimer Taube
1 Museums
1 Precision
1 Preservation
1 Professional values
1 Public Health
1 Recall
2 Retrieval effectiveness
2 Search Engines
4 Searching methodologies
1 Standardization
1 Switches
1 Thesauri
1 Thesaurus Development
3 Usability
1 User behavior
2 User studies
1 User Trail
1 Web 2.0

Total number of keys: 59


Records

DocNo: 1
Author: Marcia J. Bates
Title: The Invisible Substrate of Information Science
JnlCite: Journal of the American Society for Information Science. 50(12):1043–1050
Abstract: The explicit, above-the-water-line paradigm of information science is well known and widely discussed. Every disciplinary paradigm, however, contains elements that are less conscious and explicit in the thinking of its practitioners. The purpose of this article is to elucidate key elements of the below-the-water-line portion of the information science paradigm. Particular emphasis is given to information science’s role as a meta-science, conducting research and developing theory around the documentary products of other disciplines and activities. The mental activities of the professional practice of the field are seen to center around representation and organization of information rather than knowing information. It is argued that such representation engages fundamentally different talents and skills from those required in other professions and intellectual disciplines. Methodological approaches and values of information science are also considered.
Postco: Information systems; Information science; Libraries; Methodology; Professional values; Information organization
Preco: Information Science--Information Organization; Information Science--Professional Ethics; Philosophy--Knowledge representation

DocNo: 2
Author: Bernard J. Jansen; Amanda Spink; Sherry Koshman
Title: Web Searcher Interaction With the Dogpile.com Metasearch Engine
JnlCite: Journal of the American Society for Information Science and Technology, 58(5):744–755
Abstract: Metasearch engines are an intuitive method for improving the performance of Web search by increasing coverage, returning large numbers of results with a focus on relevance, and presenting alternative views of information needs. However, the use of metasearch engines in an operational environment is not well understood. In this study, we investigate the usage of Dogpile.com, a major Web metasearch engine, with the aim of discovering how Web searchers interact with metasearch engines. We report results examining 2,465,145 interactions from 534,507 users of Dogpile.com on May 6, 2005 and compare these results with findings from other Web searching studies. We collect data on geographical location of searchers, use of system feedback, content selection, sessions, queries, and term usage. Findings show that Dogpile.com searchers are mainly from the USA (84% of searchers), use about 3 terms per query (mean 2.85), implement system feedback moderately (8.4% of users), and generally (56% of users) spend less than one minute interacting with the Web search engine. Overall, metasearchers seem to have higher degrees of interaction than searchers on non-metasearch engines, but their sessions are for a shorter period of time. These aspects of metasearching may be what define the differences from other forms of Web searching. We discuss the implications of our findings in relation to metasearch for Web searchers, search engines, and content providers.
Postco: Internet; Aggregated search; User behavior; Usability; Searching methodologies; Search Engines
Preco: Information Retrieval--Information Retrieval Systems; Information Retrieval--Methods--Search Engines; Search Engines--Evaluation; Search Engines--Metasearch; Search Engines--Users--Usability

DocNo: 3
Author: Stephanie W. Hass; Debbie A. Travers
Title: Issues in the Development of a Thesaurus for Patients’ Chief Complaints in the Emergency Department
JnlCite: Proceedings of the 67th ASIS&T Annual Meeting, vol. 41, pp. 411-417
Abstract: In the emergency department (ED), the reason the patient is seeking care is recorded as the Chief Complaint (CC). Beyond its role in the patient’s care, there is interest in the CC for secondary uses. Clinicians and epidemiologists can use CC for research. ED clinicians and administrators incorporate CC data into quality monitoring and improvement efforts. Public health officials can use it as data for health surveillance. But there is no controlled vocabulary for recording CC, or standard for a CC component in the patient record. Travers (2003) completed a crucial first step toward the creation of a thesaurus for CC by analyzing a corpus of CCs to determine the nature of the language used by triage nurses, and the concepts that were expressed. Her analysis also illuminated many issues concerning the content and structure of a CC thesaurus that must be discussed before the thesaurus can be developed. Using Cimino’s 1998 article, “Desiderata for Controlled Medical Vocabularies in the Twenty-First Century”, as a framework, we discuss these issues and the resulting decisions that the thesaurus development team, along with other stakeholders, will encounter.
Postco: Information systems; Information organization; Database structures; Thesauri; Public Health; Controlled Vocabulary; Thesaurus Development; Standardization; Medical thesaurus
Preco: Controlled Vocabulary--Thesaurus--Standards; Information Retrieval and Storage--Methods--Controlled Vocabulary--Thesaurus; Information Retrieval and Storage--Methods--Natural Language Processing; Information Retrieval Systems--Design--User Interface--User Behavior; Information Science--Information Organization; Information Science--Information Storage

DocNo: 4
Author: Gary Wan; Zao Liu
Title: Content-Based Information Retrieval and Digital Libraries
JnlCite: (2008) Information Technology and Libraries, 27(1), p. 41-7
Abstract: This paper discusses the applications and importance of content-based information retrieval technology in digital libraries. It generalizes the process and analyzes current examples in four areas of the technology. Content-based information retrieval has been shown to be an effective way to search for the type of multimedia documents that are increasingly stored in digital libraries. As a good complement to traditional text-based information retrieval technology, content-based information retrieval will be a significant trend for the development of digital libraries.
Postco: Content-based information; Information retrieval; Digital media; Methodology; Document relevance; Information systems
Preco: Information Retrieval--Content-based--3-D object; Information Retrieval--Content-based--audio; Information Retrieval--Content-based--image; Information Retrieval--Content-based--video; Libraries--Digital; Information Retrieval--Systems


DocNo: 5
Author: Mary W. Elings; Gunter Waibel
Title: Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums
JnlCite: FirstMonday, 12. Retrieved August 1, 2007, from www.firstmonday.org/issues/issue12_3/elings/index.html
Abstract: Integrating digital content from libraries, archives and museums represents a persistent challenge. While the history of standards development is rife with examples of cross-community experimentation, in the end, libraries, archives and museums have developed parallel descriptive strategies for cataloguing the materials in their custody. Applying in particular data content standards by material type, and not by community affiliation, could lead to greater data interoperability within the cultural heritage community. In making this argument, the article demystifies metadata by defining and categorizing types of standards, provides a brief historical overview of the rise of descriptive standards in museums, libraries and archives, and considers the current tensions and ambitions in making descriptive practice more economic.
Postco: Database standards; Descriptors; Libraries; Archives; Museums; Library History; Metadata; Archival content
Preco: Information Science--Information Retrieval; Information Science--Metadata--Standards--Archives; Information Science--Metadata--Standards--History; Information Science--Metadata--Standards--Libraries; Information Science--Metadata--Standards--Museums

DocNo: 6
Author: Scott A. Golder; Bernardo A. Huberman
Title: The Structure of Collaborative Tagging Systems
JnlCite: Golder, S., & Huberman, B. (2005). The structure of collaborative tagging systems. arXiv.org. Retrieved March 5, 2009, from http://arxiv.org/ftp/cs/papers/0508/0508082.pdf
Abstract: Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Recently, collaborative tagging has grown in popularity on the web, on sites that allow users to tag bookmarks, photographs and other content. In this paper we analyze the structure of collaborative tagging systems as well as their dynamical aspects. Specifically, we discovered regularities in user activity, tag frequencies, kinds of tags used, bursts of popularity in bookmarking and a remarkable stability in the relative proportions of tags within a given url. We also present a dynamical model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge.
Postco: Information organization; Metadata; Community tagging; Web 2.0; Aggregation
Preco: Information Organization--Metadata--Tagging--Community tagging; Information Organization--Methods--Aggregation; Information Science--Information Organization; Internet--Evolution--Development

DocNo: 7
Author: Stephen P. Harter; Yung-Rang Cheng
Title: Colinked Descriptors: Improving Vocabulary Selection for End-User Searching
JnlCite: Harter, S., & Cheng, Y. (1996). Colinked descriptors: Improving vocabulary selection for end-user searching. Journal of the American Society of Information Science, 47, no. 4, 311-325.
Abstract: This article introduces a new concept and technique for information retrieval called colinked descriptors. Borrowed from an analogous idea in bibliometrics (cocited references), colinked descriptors provide a theory and method for identifying search terms that, by hypothesis, will be superior to those entered initially by a searcher. The theory suggests a means of moving automatically from two or more initial search terms, to other terms that should be superior in retrieval performance to the two original terms. A research project designed to test this colinked descriptor hypothesis is reported. The results suggest that the approach is effective, although methodological problems in testing the idea are reported. Algorithms to generate colinked descriptors can be incorporated easily into system interfaces, front-end or pre-search systems, or help software, in any database that employs a thesaurus. The potential use of colinked descriptors is a strong argument for building richer and more complex thesauri that reflect as many legitimate links among descriptors as possible.
Postco: Descriptors; Information retrieval; Searching methodologies; User studies; Retrieval effectiveness
Preco: Information Retrieval--Methods--Search; Information Science--Information Organization; Information Science--Bibliometrics--Citation Analysis; Information Retrieval--Methods--End-user

DocNo: 8
Author: Elizabeth Smith
Title: On the Shoulders of Giants: From Boole to Shannon to Taube: The Origins and Development of Computerized Information from the Mid-19th Century to the Present
JnlCite: Smith, E. (1993, June). On the Shoulders of Giants: From Boole to Shannon to Taube: The Origins and Development of Computerized Information from the Mid-19th Century to the Present. Information Technology and Libraries, 12, no. 2, 217-226.
Abstract: This article describes the evolvement of computerized information storage and retrieval, from its beginnings in the theoretical works on logic by George Boole in the mid-nineteenth century, to the application of Boole’s logic to switching circuits by Claude Shannon in the late 1930s, and the development of coordinate indexing by Mortimer Taube in the late 1940s and early 1950s. Thus, electronic storage and retrieval of information, as we know it today, was the result of two major achievements: the advancement of computer technology initiated to a large extent by the work of Shannon, and the development of coordinate indexing and retrieval by the work of Taube. Both these achievements are based on and are the application of theoretical works of George Boole.
Postco: George Boole; Claude Shannon; Mortimer Taube; Boolean Algebra; Boolean Logic; Circuits; Switches; Information Evolution; Evolving Search Strategies
Preco: Information Science--Theory--Boolean Logic; Information Scientists--Claude Shannon; Information Scientists--George Boole; Information Scientists--Mortimer Taube; Information Retrieval--Information Retrieval Systems

DocNo: 9
Author: Karen Schmidt; Wendy Allen Shelburne; David Steven Vess
Title: Approaches to Selection, Access, and Collection Development in the Web World: A Case Study with Fugitive Literature.
JnlCite: Schmidt, K., Shelburne, W. A., & Vess, D. S. (2008). Approaches to Selection, Access, and Collection Development in the Web World: A Case Study with Fugitive Literature. Library Resources & Technical Services. 52(3) 184-91.
Abstract: Academic and research libraries are well-versed in collecting materials from the print world. The present and future collections that are being produced on the Web require urgent attention to acquire, preserve, and provide access to them for future research. Many of the skills that librarians have honed through years of collecting in the print-based world are applicable to digital collection development, but will require ramping up technical skills and actively embracing digital content in current and future collection-development work. This paper reports on an exploratory project that aims to apply existing skills and knowledge to collect materials from the Internet and lay the groundwork for collection development in the future.
Postco: Digital Preservation; Preservation; Collection management; Information Storage; Information systems; Digital media; Internet; Digital Migration; Data Migration
Preco: Information Storage--Archives Management--Digital Documents--Preservation; Information Storage--Preservation--Ephemera; Information Storage--Preservation--Websites; Libraries--Collections--Development--Digital--Internet; Libraries--Collections--Development--Print; Libraries--Collections--Development--Digital--Storage

DocNo: 10
Author: David Bodoff
Title: Relevance for searching, relevance for browsing
JnlCite: Bodoff, David (2006). Relevance for browsing, relevance for searching. Journal of the American Society for Information Science and Technology 57, 69-86
Abstract: The concept of relevance has received a great deal of theoretical attention. Separately, the relationship between focused search and browsing has also received extensive theoretical attention. This article aims to integrate these two literatures with a model and an empirical study that relate relevance in focused searching to relevance in browsing. Some factors affect both kinds of relevance in the same direction; others affect them in different ways. In our empirical study, we find that the latter factors dominate, so that there is actually a negative correlation between the probability of a document’s relevance to a browsing user and its probability of relevance to a focused searcher.
Postco: Browsing; Searching methodologies; Document relevance; Information science; Usability
Preco: Information Retrieval--Methods--Browsing; Information Retrieval--Methods--End-user; Information Retrieval--Methods--Manual Searching; Information Retrieval--Methods--Search; Information Retrieval--Methods--Search--Directed Search; Search Engines--Users--Behavior; Search Enginges--Users--Information Needs

DocNo: 11
Author: Jeff Rothenberg
Title: Ensuring the Longevity of Digital Documents
JnlCite: Rothenberg, J. (1995, January). Ensuring the Longevity of Digital Documents. 1999 rev. of original from Scientific American, 272(1), 42-47.
Abstract: Digital documents are replacing paper in the most dramatic record-keeping revolution since the invention of printing. Is the current generation of these documents doomed to be lost forever?
Postco: Digital Preservation; Digital media; Information Evolution
Preco: Archives Management--Digital Documents--Standards--Object-oriented Model; Archives Management--Digital Documents--Standards--Relational Database Model; Information Storage--Archives Management--Digital Documents--Decay; Information Storage--Archives Management--Digital Documents--Encoding; Information Storage--Archives Management--Digital Documents--Migration; Information Storage--Archives Management--Digital Documents--Preservation; Information Storage--Archives Management--Digital Documents--Software Emulation

DocNo: 12
Author: Mazlita Mat-Hassan; Mark Levene
Title: Can navigational assistance improve search experience?
JnlCite: Mat-Hassan, Mazlita and Levene, Mark (2001). Can navigational assistance improve search experience? A user study. FirstMonday 6. Retrieved March 3, 2009, from http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/883/792
Abstract: Providing navigational aids to assist users in finding information in hypertext systems has been an ongoing research problem for well over a decade. Despite this, the incorporation of navigational aids into Web search tools has been slow. While search engines have become very efficient in producing high quality rankings, support for the navigational process is still far from satisfactory. To deal with this shortcoming of search tools, we have developed a site specific search and navigation engine that incorporates several recommended navigational aids into its novel user interface, based on the concept of a user trail. Herein, we report on a usability study whose aim was to ascertain whether adding semi-automated navigational aids to a search tool improves users' experience when "surfing" the Web. The results we obtained from the study revealed that users of the navigation engine performed better in solving the question set posed than users of a conventional search engine. Moreover, users of the navigation engine provided more accurate answers in less time and with less clicks. Our results indicate that adding navigational aids to search tools will enhance Web usability and take us a step further towards resolving the problem of "getting lost in hyperspace".
Postco: User Trail; Compass Interface; Global Search Engines; User studies; Usability; Search Engines
Preco: Search Engines--Users--Behavior; Search Engines--Evaluation; Search Engines--Users--Usability; Search Engines--Navigation--Interfaces

DocNo: 13
Author: Marcia Bates
Title: The Design of browsing and berrypicking techniques for the online search interface.
JnlCite: Bates, Marcia J. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5), 407-424.
Abstract: First, a new model of searching in online and other information systems, called “berry picking”, is discussed. This model, it is argued, is much closer to the real behavior of information searchers than the traditional model of information retrieval is, and, consequently, will guide our thinking better in the design of effective interfaces. Second, the research literature of manual information seeking behavior is drawn on for suggestions of capabilities that users might like to have in online systems. Third, based on the new model and research on information seeking, suggestions are made for how new search capabilities could be incorporated into the design of search interfaces. Particular attention is given to the nature and types of browsing that can be facilitated.
Postco: Information retrieval; Information systems; Searching methodologies; Evolving Search Strategies
Preco: Information Retrieval Systems--Design--User Interface--User Behavior; Information Retrieval--Methods--Browsing; Information Retrieval--Methods--Manual Searching; Information Retrieval--Methods--Search

DocNo: 14
Author: David C Blair; M.E. Maron
Title: An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System
JnlCite: Maron, M. & Blair, D. (1985, March). An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, Communications of the ACM, 28(3), 289-299.
Abstract: An evaluation of a large, operational full-text document-retrieval system (containing roughly 350,000 pages of text) shows the system to be retrieving less than 20 percent of the documents relevant to a particular search. The findings are discussed in terms of the theory and practice of full-text document retrieval.
Postco: Full-text retrieval; Full-text search; Manual indexed term search; Retrieval effectiveness; Precision; Recall
Preco: Information Retrieval--Methods--Search--Full Text; Information Retrieval--Information Retrieval Systems; Information Retrieval--Methods--End-user; Information Retrieval--Methods--Search; Search Engines--Users--Usability; Information Retrieval--Evaluation--Precision; Information Retrieval--Evaluation--Recall

PART B—USER GUIDE

Introduction

This database catalogues articles presenting influential research and concepts in the field of information retrieval from the mid-1980s onward. It is intended as a resource for graduate students studying Library and Information Science, as well as for librarians and information professionals working in the field. The articles present key research findings, usability studies, and theoretical models that have had a significant impact on information retrieval.

Audience

The database is meant primarily for graduate students in library science who are working toward their MLIS degree. Others who may find it useful include anyone interested in library science, information science, or information retrieval. MLIS students can use the database to find articles by topic and to locate research in the library science field.

Effective Subject Searches

Users should have a general idea of the concept or topic they want information about. They should also be familiar with some basic keywords and subject matter in library science, and have a basic understanding of information storage and retrieval concepts. For example, terms that would be helpful to know before performing a search include “database,” “standards,” “tagging,” and “Boolean.” The documents in the database are not meant to teach these basic information science concepts, but to support work on these topics with peer-reviewed research.

To find articles by subject, we recommend beginning a search within one of the controlled vocabulary fields, which primarily describe major topics of information description and retrieval that are familiar to most information professionals. Users who wish to search for specific tools and applications (e.g., specific brands of search engines and online tools) should begin a search within the title and abstract fields, as such specific tools are not named in the controlled vocabularies. Some proper names of individuals significant to the field (e.g., George Boole) are included within the controlled vocabularies. Searches for most other proper names, however, should be limited to the title, author, and abstract fields.

Information about Database Fields

Users can search for text in the following bibliographic fields: author, title, citation, and

abstract. In addition, each article is indexed using two fields for two controlled vocabularies—

one for precoordinate terms and one for postcoordinate terms.

Here is a breakdown by field:

Author: follows "first name last name" format, can contain multiple authors

Title: article title from original publication

JnlCite: full citation -- contains author, title, publication source (journal, website, etc.),

volume and issue (if applicable) and where article is located within publication (URL,

page no., etc.)

Abstract: this is copied verbatim from the original publication's abstract

Postcoordinate terms: a set of controlled vocabularies that describes the content of

articles

Precoordinate terms: another set of similar controlled vocabularies, organized in a

hierarchical structure

If an article has multiple authors, all authors are included in the author field. Titles

include the full title of the article. Citation fields include the title of the journal as well as volume

and issue numbers, and date of publication. Abstract fields can also be searched.

Users can search by keyword within these fields. Titles can be found by searching exact

title. The two vocabulary fields can contain more than one term. The subject heading/search term

may have more than one subdivision depending on the article.

Precoordinate and Postcoordinate Languages

This database has two separate controlled vocabularies. The two controlled vocabularies

are similar in topic, but structured differently—one uses precoordination, and the other

postcoordination.

Postcoordinate terms are independent and intended to be combined through Boolean

operators (AND, OR, NOT). Here is an example of how Boolean operators work with the

postcoordinate terms “digital libraries” and “archives”:

Example 1. OR operator:

digital libraries OR archives

The user types “digital libraries” in the first field, clicks “OR,” and finally enters

the last term, “archives.” This instructs the query to find documents containing either

term, which broadens the search.

Example 2. AND operator:

digital libraries AND archives

With these terms and the AND operator, the user retrieves only articles indexed with

both “digital libraries” and “archives,” which narrows the search.

Example 3. NOT operator:

archives NOT digital libraries

This combination retrieves documents on “archives” but

excludes the articles indexed with “digital libraries.”
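
The three operators behave like set operations on the lists of documents indexed under each term. The following is a minimal sketch of that idea, not a description of how the textbase software is implemented; the index contents and document numbers are invented for illustration.

```python
# Hypothetical mini-index: postcoordinate term -> set of DocNo values.
# The document numbers below are invented for illustration only.
postco_index = {
    "digital libraries": {1, 2, 5},
    "archives": {2, 3, 7},
}


def boolean_search(index, left, operator, right):
    """Combine the document sets of two terms with AND, OR, or NOT."""
    a = index.get(left, set())
    b = index.get(right, set())
    if operator == "OR":
        return a | b   # union: either term, broadens the search
    if operator == "AND":
        return a & b   # intersection: both terms, narrows the search
    if operator == "NOT":
        return a - b   # difference: left term without the right term
    raise ValueError(f"unknown operator: {operator}")


print(sorted(boolean_search(postco_index, "digital libraries", "OR", "archives")))   # [1, 2, 3, 5, 7]
print(sorted(boolean_search(postco_index, "digital libraries", "AND", "archives")))  # [2]
print(sorted(boolean_search(postco_index, "archives", "NOT", "digital libraries")))  # [3, 7]
```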

Precoordinate terms are hierarchical, containing several layers of specificity. The user

needs to know that the precoordinate terms are aggregated for headings and sub-headings in an

increasing level of specificity, allowing the user to select the heading or sub-heading relevant to

their search. Here is a demonstration of how precoordinate terms work:

Example. Keyword: Metadata

The user is interested in finding out more on metadata and decides to search under the

heading of “information science.” The hierarchy of the precoordinate terms for metadata

is structured below:

Information Science—Metadata

With a specific topic in mind, like usage standards for metadata, the user can narrow

the search even more by increasing the level of hierarchy as follows:

Information Science—Metadata—Standards
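
One way to picture this narrowing is to treat each precoordinate heading as a list of levels and match a query against the leading levels. The sketch below assumes headings are written with a "--" separator (as in Figure 1) and that a search on a heading should also return its sub-headings; both the matching rule and the document numbers are assumptions made for illustration, not the behavior of the textbase software.

```python
SEP = "--"  # level separator, assumed from the headings listed in Figure 1

# Hypothetical records: DocNo -> precoordinate heading (numbers are invented).
preco_headings = {
    1: "information science--metadata",
    2: "information science--metadata--standards",
    3: "information science--information storage",
}


def search_heading(headings, heading):
    """Return DocNos whose heading equals the query or sits beneath it."""
    query = heading.lower().split(SEP)
    hits = []
    for doc_no, value in headings.items():
        levels = value.lower().split(SEP)
        # A record matches when its leading levels are exactly the query levels.
        if levels[:len(query)] == query:
            hits.append(doc_no)
    return sorted(hits)


print(search_heading(preco_headings, "Information Science--Metadata"))             # [1, 2]
print(search_heading(preco_headings, "Information Science--Metadata--Standards"))  # [2]
```

Each added level can only keep or shrink the result set, which is the narrowing effect described above.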

Other Searching Tips

Besides combining terms with Boolean operators or typing keywords into the fields in a

hit-or-miss manner (see the previous sections), users can view the controlled vocabularies for

the pre- and postcoordinate terms by clicking on either field and then pressing the F3 key. This

method gives users a good overview of the concepts covered by the articles in the database

and helps them search effectively.

Rules within the Database

Within the database, plural terms are used in the vocabularies to help with stemming

issues, provide consistency, eliminate confusion and make searching easier. Proper nouns are

also included. The terms used within the precoordinate field all describe topics within the field of

information retrieval.

PART C--EVALUATION

1. Evaluation Criteria and Scenarios for Testing

This evaluation will test the recall, precision, and utility of search results retrieved from a small

database of articles on the topic of information retrieval.

RECALL

The evaluation will test how well searches on the database can recall relevant articles about

subjects in the field of information retrieval. Recall will be defined as the percentage of all

relevant articles in the database that are retrieved using a given search term (Meadow, 2007, p.

260).

Rationale: to test how well articles in the database are indexed for aggregation

PRECISION

The evaluation will test precision, or how well articles retrieved by searches on the database

match the search criteria. Precision will be defined as the percentage of relevant articles among

those retrieved (Meadow, 2007, p. 260).

Rationale: to test how well articles in the database are indexed for discrimination
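
The two definitions above can be written as set arithmetic over document numbers. This is a minimal sketch of the standard definitions only; the function names and the example document numbers are ours and are invented for illustration.

```python
def recall(retrieved, relevant):
    """Share of all relevant documents that the search retrieved."""
    if not relevant:
        return None  # undefined when the database holds no relevant documents
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    if not retrieved:
        return None  # undefined when the search retrieves nothing
    return len(retrieved & relevant) / len(retrieved)


# Invented example: six relevant documents, three of them retrieved.
relevant_docs = {1, 2, 3, 4, 5, 6}
retrieved_docs = {1, 2, 3}
print(recall(retrieved_docs, relevant_docs))     # 0.5
print(precision(retrieved_docs, relevant_docs))  # 1.0
```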

UTILITY

The evaluation will test the perceived utility of output from searches on the database. Utility will

be defined broadly as the user’s ability to make use of the retrieved documents (Meadow, 2007,

p. 322). It will be determined by each user subjectively and rated on a 7-point Likert scale.

Rationale: to compare the ability of subject indexing versus other fields, such as title and author, to

retrieve usable results

TESTING SCENARIOS

1.  MLIS student needs more information on "information retrieval using search engines"

2.  MLIS student needs more information on "computers and the future of libraries"

2. Evaluation of Subject Access Using Our Specific Questions

2a. Testing Description

After our database was created, we needed to find a way to test how accurate and

well-constructed it was. Our first step was deciding which criteria would give us that answer.

We pondered topical relevance, efficiency, utility, precision and recall, and eventually narrowed

it down to the latter three. We decided upon a rubric scale of 1-7 to grade the utility of each

search based on the averages of the scores. For precision, we divided the number of retrieved,

relevant articles by the total number of articles retrieved. For recall, we decided how many

articles in the entire database should have been retrieved (x) and then divided the actual number

retrieved by x. We created a spreadsheet, and each team member did two searches. Here is an

example of a single search:

Keywords                        Fields    # Retrieved  # Relevant overall  Recall  # Of retrieved relevant  Precision  Utility
Search Engines Users Usability  Preco     3            6                   50%     3                        100%       7
study                           Postco    0            6                   0%      0                        0%         5
study                           Title     1            6                   17%     0                        0%         2
study                           Abstract  1            6                   17%     0                        0%         1
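
The arithmetic behind each spreadsheet row can be sketched as follows, using the counts from the sample rows above. The function name is ours. As described in this section, recall here divides the total number retrieved by x, which coincides with the definition given under RECALL only when every retrieved article is relevant. Where nothing is retrieved, the sketch returns None for precision, the case that Figure 1 reports as N/A.

```python
def score_search(retrieved, relevant_overall, relevant_retrieved):
    """Return (recall %, precision %) for one spreadsheet row.

    retrieved          -- number of articles the search returned
    relevant_overall   -- x, the number that should have been retrieved
    relevant_retrieved -- number of returned articles judged relevant
    """
    recall_pct = round(retrieved / relevant_overall * 100)
    if retrieved == 0:
        precision_pct = None  # no retrieved articles to judge
    else:
        precision_pct = round(relevant_retrieved / retrieved * 100)
    return recall_pct, precision_pct


print(score_search(3, 6, 3))  # Preco row  -> (50, 100)
print(score_search(1, 6, 0))  # Title row  -> (17, 0)
print(score_search(0, 6, 0))  # Postco row -> (0, None)
```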

When all the team members had finished their testing and grading, we compiled this data

into one large chart in order to make it easier to evaluate as shown in Figure 1.

2b. Analysis of Database

Each field in our database had its own advantages and problems. Our

natural language fields, “title” and “abstract,” were word indexed. The advantage of these fields

being word indexed was that if the keyword was anywhere in the title or abstract, our users

would find that article with their search. This is reflected in the data (Table 1), where we had a

very high percentage of precision for searches on those two fields (average of 88% for title field,

and 91% for abstract field).
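
The difference between word indexing and term indexing can be illustrated with a small sketch. It assumes a word index stores every non-stop word of a field separately, while a term index stores each field entry whole; the stop words are those listed in the textbase structure in Part A. The record, its DocNo, and the pairing of title with preco term are invented for illustration, and this is not a description of the textbase software's internals.

```python
import re
from collections import defaultdict

# Stop words as listed in the textbase structure (Part A).
STOP_WORDS = {"a", "an", "and", "by", "for", "from", "in", "of", "the", "to"}

# Hypothetical record; the DocNo and the title/term pairing are invented.
records = {
    1: {
        "Title": "Content-Based Information Retrieval and Digital Libraries",
        "Preco": ["information retrieval--methods--search"],
    },
}


def build_word_index(records, field):
    """Index each non-stop word of a text field separately."""
    index = defaultdict(set)
    for doc_no, record in records.items():
        for word in re.findall(r"[a-z0-9]+", record[field].lower()):
            if word not in STOP_WORDS:
                index[word].add(doc_no)
    return index


def build_term_index(records, field):
    """Index each entry of a controlled-vocabulary field as one whole term."""
    index = defaultdict(set)
    for doc_no, record in records.items():
        for term in record[field]:
            index[term.lower()].add(doc_no)
    return index


word_index = build_word_index(records, "Title")
term_index = build_term_index(records, "Preco")

print(word_index.get("retrieval"))                               # {1}
print(term_index.get("retrieval"))                               # None
print(term_index.get("information retrieval--methods--search"))  # {1}
```

A single word finds the record through the word index, but the term index answers only to the complete term.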

Table 1. Averages for each testing criterion (see Figure 1 for complete data)

                Recall  Precision  Utility
Precoordinate   31%     100%       6
Postcoordinate  27%     97%        4
Title           9%      88%        2
Abstract        17%     91%        3
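
The precision averages can be recomputed from the per-search values in Figure 1. The sketch below assumes that N/A entries (searches that retrieved nothing) are left out of the mean rather than counted as zero; under that assumption the postcoordinate precision column from Figure 1 gives the 97% shown above. The function name is ours.

```python
def mean_percent(values):
    """Average the percentages, skipping N/A entries (given as None)."""
    scored = [v for v in values if v is not None]
    if not scored:
        return None
    return round(sum(scored) / len(scored))


# Postcoordinate precision column from Figure 1, in row order; None = N/A.
postco_precision = [
    100, None, 66, None, None, 100, 100, 100, 100, 100,
    None, 100, 100, None, None, 100, 100, 100, 100,
]

print(mean_percent(postco_precision))  # 97

# Counting N/A as 0% instead would give a much lower figure.
print(round(sum(v or 0 for v in postco_precision) / len(postco_precision)))  # 67
```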

A problem with searching these fields is that the user would have to be lucky enough

to search with the exact term that appears in them, which raises the possibility of retrieving

no articles at all. For example, if our user was searching for “digital library,” he or she would not

have retrieved the article by Wan and Liu, “Content-Based Information Retrieval and Digital

Libraries,” because “digital libraries” appears there in the plural. This is borne out in our data (see

Table 2), where we see that in searches on both title and abstract fields, at least half of the

searches retrieved zero articles. There is also a potential problem of retrieving irrelevant articles,

because even if a title or abstract includes a given keyword, this is no guarantee that the article

is about that topic. We believe this database was too small to show this effect.
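
The singular/plural mismatch can be reproduced in a few lines. The sketch below compares an exact phrase match with a match after very naive plural folding; the folding rule is a deliberately crude assumption for illustration, not a real stemmer and not a feature of the textbase software.

```python
import re


def normalize(word):
    """Very naive plural folding: libraries -> library, engines -> engine."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def tokens(text, fold_plurals=False):
    """Lowercase the text and split it into words, optionally folding plurals."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [normalize(w) if fold_plurals else w for w in words]


def phrase_in(query, text, fold_plurals=False):
    """True if the query words appear in the text as a consecutive phrase."""
    q = tokens(query, fold_plurals)
    t = tokens(text, fold_plurals)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


title = "Content-Based Information Retrieval and Digital Libraries"
print(phrase_in("digital library", title))                     # False
print(phrase_in("digital library", title, fold_plurals=True))  # True
```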

Table 2. Testing results for Title and Abstract fields (see Figure 1 for complete data)

Our controlled vocabulary fields, “preco” and “postco,” were term indexed. An

advantage of this is that it allowed the user to dip into the world of the indexer. When the user

retrieved a relevant record from the natural language fields, they were able to see the preco and

postco terms associated with that article. Having access to these fields allowed the user to get a

better idea of what to search for. For example, our user may be searching for “metadata.” In a

relevant article, they may see a preco term of “Information Retrieval—Metadata—History.”

This may open up a new path of research for the user that they may not have thought of. Our data

shows that in searches on both preco and postco fields, the precision was very high (see Table 1)

and also there was a much lower occurrence of zero articles retrieved than on searches of the

title and abstract fields (see Figure 1). Thus, we can conclude that preco and postco indexing

produces slightly better retrieval on precision.

A problem with the controlled vocabulary terms is that they may be too specific. If the

terms are too specific, the user may rule out too many relevant articles, lowering their recall.

This is supported in our testing where searches on preco and postco terms had very high

precision (averaging 100% for precoordinate term searches, 97% for postcoordinate), but

significantly lower recall (averaging 31% for precoordinate and 27% for postcoordinate).

The title and abstract fields are a “crap shoot” when it comes to information retrieval.

You may or may not find relevant articles based on whether or not that specific keyword is

located in these fields. However, with a broad keyword, you may access a large number of

articles. This will give you a larger chance of finding at least one relevant article to help further

your research. With that relevant article, you can find the preco and postco terms, which

contribute greatly to information retrieval. Using the preco and postco terms will give you access

to more specific articles relevant to your research. Comparing title/abstract to preco/postco, the

data seems to indicate that controlled vocabularies have a clear advantage over natural language

searching when it comes to recall. Looking at our averages (Table 1), preco and postco terms

achieved 31% and 27% recall, respectively. While this is not spectacular, it is much improved

from recall on natural language searches of title and abstract fields, which are 9% and 17%,

respectively. Thus, the controlled vocabularies result in recall that is improved more than 100%

(comparing preco to title field search, the improvement is threefold).

One interesting result of this study is the observation that precision doesn’t seem to be a

very useful metric for evaluating subject indexing since any search on a given term will

necessarily retrieve relevant articles. Whether those articles are useful to the searcher, and

whether the search is comprehensive are not measured by this metric. This brings us to our

analysis of utility. Looking at the averages in Table 1, across all fields, a higher percentage of

recall is correlated with a higher utility score, but a higher percentage of precision does not

correlate with a higher utility score. As noted above, precision is always high. So, the ability to

retrieve search results that match a given search term doesn’t necessarily mean that you’ve

retrieved useful information.

2c. Ideas for Improvement and Further Development of Database

Two fields, “title” and “abstract,” are preset by the author. Therefore, improvements to

vocabulary for the other fields, containing precoordinate terms and postcoordinate terms, are

needed. One method is adding more general terms in the postcoordinate list. The addition of

more general terms in the vocabulary will in turn broaden the search and increase aggregation.

With improved aggregation, the recall will be better and will allow users to determine the

relevance of the retrieved documents (usability). Exclusion of jargon is another consideration to

improve our database’s controlled vocabulary terms and it would be extremely beneficial in

regard to the title and abstract fields, where authors tend to use it.

The hierarchy of precoordinate terms could be changed, as well. Although the hierarchy

“structure may … assist the user in thinking of the problem and discovering ramifications and

new aspects” (Soergel, 1994), difficulties arise in retrieval if all the terms are not included in the

search. Simplifying the hierarchy of precoordinate terms will reduce the problem of hierarchy

structure being too narrow or too specific.

The three modifications: generalizing terms for postcoordinate list, exclusion of jargon, and

simplifying hierarchy of precoordinate vocabulary, are first steps our team would take to

maintain our database. We would also like to remove the abstract and put in its place a synopsis

of the abstract/article, written without jargon and with generalized terms. This modification

is motivated by the low recall of searches on the abstract field. The

changes in these three fields will put recall and precision of our database on a more equal footing.

REFERENCES

Meadow, C.T., Boyce, B.R., Kraft, D.H. & Barry, C. (2007). Text information retrieval

systems. Third edition. Oxford: Elsevier Inc.

Soergel, D. (1994). Indexing and retrieval performance: The logical evidence. Journal of

the American Society for Information Science, 45(8), 589-599.

Appendix: Team Analysis of Strengths and Weaknesses of Working as a Team

Working as a team was challenging but we also got quite a lot of work done as a group. A

strength of the team was that many different viewpoints were voiced and examined; some were

agreed upon and some rejected. This was very important since there were moments of uncertainty

about whether we were heading in the right direction. Bouncing questions off one another helped

clarify some points, especially for those who were unsure. Different understandings of concepts

from each of us helped move the group forward, especially during a few tough spots. However,

the flip side of this may be a weakness, in that sometimes too many views can lead to confusion

and chaos. Having so many people create precoordinate and postcoordinate terms and do actual

testing was also really efficient. Each of us created terms for two or three articles and did just a

few searches and evaluated them, and this resulted in a lot of data. We also had one member who

had difficulty getting the database program to work. Figuring out an alternative plan for her to

participate in the testing was helpful, which is one of the strengths of group work.

A weakness that we all felt was the difficulty in getting six people to meet to discuss the

assignment. Due to other team members’ work schedules and availability, sometimes the

decisions were made by just a few members. Getting all team members to meet various work

deadlines was also a difficult task due to our schedules. The desire to get everyone’s input and

consensus hindered the group a little bit, too. Making decisions about how to structure the

precoordinate and postcoordinate terms was also very difficult to do by committee since we each

had strong feelings about what terms we needed to include and we did not all agree.

Figure Caption

Figure 1. Chart of Evaluation Results

Each field column lists Recall / Precision / Utility.
Testers: Susan, Student A, Student C, Student D, Student E, Student F

Keyword | Preco Keyword | Precoordinate | Postcoordinate | Title | Abstract
recall | information retrieval--recall | 0% / N/A / 0 | 20% / 100% / 7 | 0% / N/A / 0 | 0% / N/A / 0
digital documents | libraries--collections--development--digital--internet | 25% / 100% / 6 | 0% / N/A / 0 | 25% / 100% / 7 | 25% / 100% / 7
usability | search engines--users--usability | 50% / 100% / 7 | 50% / 66% / 5 | 0% / N/A / 0 | 17% / 100% / 6
testing | search engines--users--usability | 50% / 100% / 7 | 0% / N/A / 0 | 0% / N/A / 0 | 17% / N/A / 1
study | search engines--users--usability | 50% / 100% / 7 | 0% / N/A / 0 | 17% / 0% / 2 | 17% / 0% / 1
methodology | information retrieval--methods--search | 67% / 100% / 7 | 33% / 100% / 7 | 0% / N/A / 0 | 0% / N/A / 0
digital preservation | information storage--archives management--digital documents--preservation | 33% / 100% / 7 | 33% / 100% / 7 | 0% / N/A / 0 | 0% / N/A / 0
search engines | information retrieval--methods--search engines | 20% / 100% / 5 | 40% / 100% / 6 | 0% / N/A / 0 | 40% / 100% / 6
browsing | information retrieval--methods--browsing | 40% / 100% / 6 | 20% / 100% / 7 | 40% / 100% / 6 | 40% / 100% / 6
metadata | information science--metadata--standards--libraries | 20% / 100% / 7 | 40% / 100% / 6 | 20% / 100% / 7 | 40% / 100% / 6
standards | information science--information retrieval | 20% / 100% / 6 | 0% / N/A / 0 | 20% / 100% / 6 | 20% / 100% / 6
information retrieval | information retrieval--systems | 9% / 100% / 6 | 27% / 100% / 3 | 9% / 100% / 6 | 27% / 100% / 6
internet | information storage--preservation--websites | 14% / 100% / 6 | 29% / 100% / 6 | 0% / N/A / 0 | 14% / 100% / 7
librarian | libraries--digital | 25% / 100% / 5 | 0% / N/A / 0 | 0% / N/A / 0 | 0% / N/A / 0
end user | information retrieval--methods--end user | 30% / 100% / 6 | 0% / N/A / 0 | 10% / 100% / 7 | 0% / N/A / 0
archives | information storage--archives | 0% / N/A / 0 | 33% / 100% / 6 | 33% / 100% / 6 | 33% / 100% / 6
information storage | information science--information storage | 33% / 100% / 6 | 33% / 100% / 6 | 0% / N/A / 0 | 33% / 100% / 6
information organization | information science--information organization | 67% / 100% / 6 | 100% / 100% / 5 | 0% / N/A / 0 | 0% / N/A / 0
aggregation | information organization--methods--aggregation | 100% / 100% / 5 | 50% / 100% / 3 | 0% / N/A / 0 | 0% / N/A / 0
AVERAGE | | 34% / 100% / 6 | 27% / 97% / 4 | 9% / 88% / 2 | 17% / 91% / 3
