Upload
kory-greer
View
212
Download
0
Embed Size (px)
Citation preview
The Structure ofInformation Retrieval Systems
LBSC 708A/CMSC 838L
Douglas W. Oard and Philip Resnik
Session 1: September 4, 2001
Agenda
• 5:30-6:00 Introductions
• 6:00-6:15 What is “Information Retrieval?”
• 6:15-6:40 Applications
6:40-6:55 (Break)
• 6:55-7:20 Applications
• 7:20-7:50 System design
7:50-7:55 (Stretch break)
• 7:55-8:15 Course outline
Introductions
• Pair up with a partner from another department
• Get to know each other in 5 minutes (total)
• Tell us about your partner in 30 seconds– Name– Degree program (department, Master’s/Ph.D./Visitor)– Information retrieval experience– One thing they would like to learn
What do We Mean by “Information?”
• How is it different from “data”?– Information is data in context
• Databases contain data and produce information
• IR systems contain and provide information
• How is it different from “knowledge”?– Knowledge is a basis for making decisions
• Many “knowledge bases” contain decision rules
What Do We Mean by “Retrieval?”
• Find something that you want– The information need may or may not be explicit
• Known item search– Find the class home page
• Answer seeking– Is Lexington or Louisville the capital of Kentucky?
• Directed exploration– Who makes videoconferencing systems?
Source: Global Reach
EnglishEnglish
2000 2005
Global Internet User Population
Chinese
IR is More Than Web Searching!
• Form into four groups to discuss:
1: A system to search a collection of oral history interviews
2: Construction of a personalized newspaper
3: Software to find music recordings in an online CD store
4: Searching all the Xerox copies ever made at an office
What To Do
• Form 2-3 person subgroups to discuss: (10 min)– How would you describe what to search for?
• What makes one object “better” than another?
– How would you recognize when you have found it?• How would you explain the way you made a choice?
– What kind of technology might be helpful?• Speech recognition, optical character recognition, …
• Get together to discuss what you learned (10 min)– Build a 5 minute Powerpoint presentation
Supporting the Search Process
SourceSelection
Search
Query
Selection
Ranked List
Examination
Document
Delivery
Document
QueryFormulation
IR System
Query Reformulation and
Relevance Feedback
SourceReselection
Nominate ChoosePredict
Supporting the Search Process
SourceSelection
Search
Query
Selection
Ranked List
Examination
Document
Delivery
Document
QueryFormulation
IR System
Indexing Index
Acquisition Collection
Design Strategies
• Foster human-machine synergy– Exploit complementary strengths– Accommodate shared weaknesses
• Divide-and-conquer – Divide task into stages with well-defined interfaces– Continue dividing until problems are easily solved
• Co-design related components– Iterative process of joint optimization
Human-Machine Synergy
• Machines are good at:– Doing simple things accurately and quickly– Scaling to larger collections in sublinear time
• People are better at:– Accurately recognizing what they are looking for– Evaluating intangibles such as “quality”
• Both are pretty bad at:– Mapping consistently between words and concepts
Divide and Conquer
• Strategy: use encapsulation to limit complexity
• Approach:– Define interfaces (input and output) for each component
• Query interface: input terms, output representation
– Define the functions performed by each component• Remove common words, weight rare terms higher, …
– Repeat the process within components as needed
• Result: a hierarchical decomposition
Co-design
• Design of one component may affect another– Effect may be direct or indirect
• Approach:– Develop alternatives for each interacting component– Assess the effects of each practical combination
• on efficiency, effectiveness, and usability
– Repeat the process until a suitable combination is found
Some Examples of Codesign in IR
SourceSelection
Search
Query
Selection
Ranked List
Examination
Document
Delivery
Document
QueryFormulation
IR System
Indexing Index
Acquisition Collection
Course Goals
• Appreciate IR system capabilities and limitations
• Understand IR system design & implementation– For a broad range of applications and media
• Evaluate IR system performance
• Identify current IR research problems
Course Design
• Text/readings provide background and detail– At least one recommended reading is required
• Class provides organization and direction– We will not cover every important detail
• Assignments and project provide experience– The TA can help CLIS students with the project
• Final exam helps focus your effort
Grading
• Assignments (30% total)– Mastery of concepts and experience using tools– 708A: “homework,” 838L: “programming”
• Term project (30%)– 3 options, described on course Web page
• Final exam (40%)– Two different in-class exams
Handy Things to Know
• Classes will be videotaped– Available in the CLIS library if you miss class
• Office hours are by appointment– Send an email or ask after class
• Everything is on the web– At http://www.clis.umd.edu/courses/708a/
• We are most easily reached by email– [email protected], [email protected]
Do This Week
• Do the reading before class– Don’t fall behind!
• Start on assignment 1– Due in 2 weeks!
• Explore the Web site– Start thinking about the term project