sobhan-dasari
8/12/2019 Chapter 4 Ql
1/29
Chapter 4 : Query Languages
Baeza-Yates, 1999
Modern Information Retrieval
Outline
Keyword-Based Querying
Pattern Matching
Structural Queries
Query Protocols
Trends and Research Issues
Keyword-Based Querying
A query is the formulation of a user information need
Keyword-based queries are popular
1. Single-Word Queries
2. Context Queries
3. Boolean Queries
4. Natural Language
Data Retrieval
Information Retrieval
Single-Word Queries
A query is formulated as a single word
A document is a long sequence of words
A word is a sequence of letters surrounded by separators
What are letters and separators? e.g., on-line
The division of the text into words is not arbitrary
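The question of what counts as a letter or a separator can be made concrete with a small sketch. The function name `tokenize` and its `extra_letters` parameter are illustrative assumptions, not part of the slides; the point is only that the same text yields different words depending on how the hyphen is classified.

```python
import re

def tokenize(text, extra_letters=""):
    """Split text into lowercase words.

    Characters in `extra_letters` are treated as letters;
    everything else outside A-Z, a-z, 0-9 is a separator.
    """
    letter_class = "[A-Za-z0-9" + re.escape(extra_letters) + "]+"
    return [w.lower() for w in re.findall(letter_class, text)]

text = "On-line retrieval is popular"
print(tokenize(text))       # hyphen as separator: "on" and "line" are two words
print(tokenize(text, "-"))  # hyphen as letter: "on-line" is one word
```

A single-word query for "line" would match the document under the first setting but not under the second.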
Context Queries
Definition
- Search words in a given context
Types
Phrase
> a sequence of single-word queries
> e.g., enhance retrieval
Proximity
> a sequence of single words or phrases, with a maximum allowed distance between them
> e.g., within distance (enhance, retrieval, 4) will match "enhance the power of retrieval"
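Phrase and proximity queries can be sketched over a list of word positions. This is a minimal illustration, assuming that distance is counted in word positions and that the second word must follow the first; the slides do not fix either choice, and the function names are hypothetical.

```python
def positions(words, term):
    """Return every position at which `term` occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]

def phrase_match(words, phrase):
    """True if the words of `phrase` occur consecutively in `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

def within_distance(words, first, second, max_dist):
    """True if `second` occurs at most `max_dist` positions after `first`."""
    return any(0 < j - i <= max_dist
               for i in positions(words, first)
               for j in positions(words, second))

text = "enhance the power of retrieval".split()
print(within_distance(text, "enhance", "retrieval", 4))  # True
print(phrase_match(text, ["enhance", "retrieval"]))      # False
```

Under these assumptions the example from the slide matches with distance 4 but would fail with distance 3, and the phrase query fails because the two words are not adjacent.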
Boolean Queries
Definition
A syntax composed of atoms that retrieve documents, and of Boolean operators which work on their operands
e.g., translation AND syntax OR syntactic
Fuzzy Boolean
Retrieve documents in which only some of the operands appear (the AND may require a document to appear in more operands than the OR)
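A Boolean query can be sketched as set operations over an inverted index: each atom retrieves a set of documents and the operators combine those sets. The toy collection and the grouping translation AND (syntax OR syntactic) are assumptions made for the example, since the slide does not state operator precedence.

```python
# Toy collection (hypothetical documents, for illustration only).
docs = {
    1: "translation of syntax trees",
    2: "syntactic analysis",
    3: "machine translation",
    4: "syntactic translation rules",
}

# Inverted index: word -> set of document ids containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def atom(word):
    """An atom retrieves the set of documents containing the word."""
    return index.get(word, set())

# AND is set intersection, OR is set union.
result = atom("translation") & (atom("syntax") | atom("syntactic"))
print(sorted(result))  # [1, 4]
```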
Natural Language
Generalization of fuzzy Boolean
A query is an enumeration of words and context queries
All the documents matching a portion of the user query are retrieved
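The idea that any document matching a portion of the query is retrieved can be sketched as follows. Ordering the results by the number of matched query words is an assumption added for the example; the slide only says that partial matches are retrieved.

```python
def rank(docs, query_words):
    """Retrieve documents containing at least one query word.

    Results are ordered by the number of distinct query words matched
    (an illustrative choice), ties broken by document id.
    """
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        hits = sum(1 for q in query_words if q in words)
        if hits > 0:
            scored.append((hits, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

docs = {
    1: "enhance retrieval power",
    2: "retrieval models",
    3: "cooking recipes",
}
print(rank(docs, ["enhance", "retrieval"]))  # [1, 2]
```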
Pattern Matching
Data retrieval
A pattern is a set of syntactic features that must occur in a text segment
Types
Words
Prefixes
e.g., comput -> computer, computation, computing, etc.
Suffixes
e.g., ters -> computers, testers, painters, etc.
Substrings
e.g., tal -> coastal, talk, metallic, etc.
Ranges
e.g., between held and hold -> hoax and hissing
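The pattern types listed above can be sketched as simple filters over a vocabulary. The word list and function names are made up for the example, and ranges are assumed to use plain lexicographic order.

```python
vocabulary = ["coastal", "computation", "computer", "hissing", "hoax",
              "metallic", "painters", "talk", "testers", "zebra"]

def by_prefix(words, prefix):
    return [w for w in words if w.startswith(prefix)]

def by_suffix(words, suffix):
    return [w for w in words if w.endswith(suffix)]

def by_substring(words, sub):
    return [w for w in words if sub in w]

def by_range(words, low, high):
    """Words lying between `low` and `high` in lexicographic order."""
    return [w for w in words if low <= w <= high]

print(by_prefix(vocabulary, "comput"))      # ['computation', 'computer']
print(by_suffix(vocabulary, "ters"))        # ['painters', 'testers']
print(by_substring(vocabulary, "tal"))      # ['coastal', 'metallic', 'talk']
print(by_range(vocabulary, "held", "hold")) # ['hissing', 'hoax']
```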
Allowing errors
Retrieve all text words which are similar to the given word
edit distance:
the minimum number of character insertions, deletions, and replacements needed to make two strings equal, e.g., flower and flo wer
maximum allowed edit distance:
the query specifies the maximum number of allowed errors for a word to match the pattern
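The edit distance defined above can be computed with the standard dynamic-programming recurrence. The sketch below is one common way to do it, not necessarily the method the chapter has in mind; `matches` is a hypothetical helper that applies the maximum allowed number of errors.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and replacements turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]

def matches(word, pattern, max_errors):
    """True if `word` is within `max_errors` edits of `pattern`."""
    return edit_distance(word, pattern) <= max_errors

print(edit_distance("flower", "flo wer"))  # 1 (one inserted space)
print(matches("flour", "flower", 1))       # False (distance is 2)
```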
Regular expressions
union: if e1 and e2 are regular expressions, then (e1|e2) matches what e1 or e2 matches
concatenation: if e1 and e2 are regular expressions, the occurrences of (e1e2) are formed by the occurrences of e1 immediately followed by those of e2
repetition: if e is a regular expression, then (e*) matches a sequence of zero or more contiguous occurrences of e
e.g., pro(blem|tein)(s|)(0|1|2)* -> problem2 and proteins
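The example expression can be checked with Python's `re` module, whose syntax for union, concatenation and repetition happens to coincide with the notation on the slide. The helper name `matches` is an assumption for the example; `fullmatch` is used so that the whole word must match the pattern.

```python
import re

# union: (blem|tein), (s|), (0|1|2); repetition: (...)*;
# concatenation: the parts written one after another.
pattern = re.compile(r"pro(blem|tein)(s|)(0|1|2)*")

def matches(word):
    """True if the entire word is an occurrence of the pattern."""
    return pattern.fullmatch(word) is not None

for word in ["problem2", "proteins", "problems02", "protein3", "program"]:
    print(word, matches(word))
```

The first three words match; "protein3" fails because 3 is not among the allowed digits, and "program" fails because neither "blem" nor "tein" follows "pro".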
Structural Queries
Mixing contents and structure in queries
- contents: words, phrases, or patterns
- structural constraints: containment, proximity, or other restrictions on structural elements
Three main structures
- Fixed structure
- Hypertext structure
- Hierarchical structure
Fixed Structure
Document: a fixed set of fields
Ex.: a mail has a sender, a receiver, a date, a subject and a body field
Search for the mails sent to a given person with "football" in the Subject field
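The mail example can be sketched with documents represented as dictionaries that all share the same fields. The addresses, dates and the `search` function are invented for illustration.

```python
# Every document has the same fixed set of fields (hypothetical data).
mails = [
    {"sender": "ann@example.org", "receiver": "bob@example.org",
     "date": "1999-05-01", "subject": "Football on Sunday",
     "body": "Are you coming?"},
    {"sender": "carl@example.org", "receiver": "bob@example.org",
     "date": "1999-05-02", "subject": "Meeting notes",
     "body": "We talked about football."},
    {"sender": "ann@example.org", "receiver": "dora@example.org",
     "date": "1999-05-03", "subject": "Football tickets",
     "body": "Two left."},
]

def search(mails, receiver, subject_word):
    """Mails sent to `receiver` whose Subject field contains `subject_word`."""
    word = subject_word.lower()
    return [m for m in mails
            if m["receiver"] == receiver
            and word in m["subject"].lower().split()]

for mail in search(mails, "bob@example.org", "football"):
    print(mail["sender"], "|", mail["subject"])
```

Only the first mail is returned: the second mentions football in the body rather than the subject, and the third goes to a different receiver.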
Hypertext
A hypertext is a directed graph where nodes hold some text (text contents)
The links represent connections between nodes or between positions inside nodes (structural connectivity)
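A hypertext as a directed graph can be sketched with one dictionary for node contents and one for links. The neighborhood search below, which mixes a content condition with a limit on link distance, is only an illustrative way of combining text contents and structural connectivity; the node names and the function are hypothetical.

```python
from collections import deque

# Nodes hold text; links are directed edges between nodes (toy data).
nodes = {
    "A": "query languages overview",
    "B": "pattern matching with errors",
    "C": "structural queries",
    "D": "pattern languages",
}
links = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}

def search_neighborhood(start, word, max_hops):
    """Nodes containing `word` that are reachable from `start` in at most `max_hops` links."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        node, hops = queue.popleft()
        if word in nodes[node].split():
            hits.append(node)
        if hops < max_hops:
            for target in links.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append((target, hops + 1))
    return hits

print(search_neighborhood("A", "pattern", 1))  # ['B']
print(search_neighborhood("A", "pattern", 2))  # ['B', 'D']
```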
Hypertext : WebGlimpse
WebGlimpse: combines browsing and searching on the Web
Hierarchical Structure
PAT Expressions
Overlapped Lists
Lists of References
Proximal Nodes
Tree Matching
Query Protocols
Z39.50
WAIS (Wide Area Information Service)
Z39.50
American National Standard Information Retrieval Application Service Definition
Can be implemented on any platform
Query bibliographical information using a standard interface between the client and the host database manager
Z39.50 protocol is part of WAIS
Z39.50 Brief history
Z39.50-1988 (version 1)
Z39.50-1992 (version 2)
Z39.50-1995 (version 3)
Version 4, development began in Autumn 1995
Using Z39.50 over the WWW
[Figure: WWW Client, WWW / Z39.50, Z39.50 Client, Z39.50 Server, Repository, Digital library]
WAIS (Wide Area Information Service)
Introduced at the beginning of the 1990s
Query databases through the Internet
Trends and Research Issues
Relationship between types of queries and models
Model           Queries allowed
Boolean         word, set operations
Vector          words
Probabilistic   words
BBN             words
Query Language Taxonomy
The types of queries covered and how they are structured
Overlapped Lists
The model allows the areas of a region to overlap, but not to nest
It is not clear whether overlapping is good or not for capturing the structural properties
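The difference between overlapping and nesting can be made concrete by treating each area as a pair of text positions. This is a minimal sketch, assuming inclusive (start, end) intervals; the function names are hypothetical and the slides do not prescribe a representation.

```python
def nests(outer, inner):
    """True if `inner` lies completely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if a and b share text positions but neither contains the other."""
    share = a[0] <= b[1] and b[0] <= a[1]
    return share and not nests(a, b) and not nests(b, a)

print(overlaps((0, 10), (5, 15)))  # True: partial overlap
print(nests((0, 10), (2, 5)))      # True: nested
print(overlaps((0, 3), (5, 8)))    # False: disjoint
```

Under this reading, a model that allows overlap but not nesting would accept the first pair and reject the second.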
Lists of References
Overlap and nesting are not allowed
All elements must be of the same type, e.g., only sections, or only paragraphs
A reference is a pointer to a region of the database
Tree Matching
The leaves of the query can be not only structural elements but also text patterns, meaning that the ancestor of the leaf must contain that pattern.
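A query tree whose leaves are either structural elements or text patterns can be sketched as follows. This is a simplified illustration, not the tree matching model itself: the `Node` class, the tuple encoding of queries, and the rule that a structural leaf may match any descendant are all assumptions made for the example.

```python
class Node:
    """A structural element of the document with its own text and children."""
    def __init__(self, name, text="", children=()):
        self.name = name
        self.text = text
        self.children = list(children)

    def full_text(self):
        """Text of this node together with the text of all its descendants."""
        return " ".join([self.text] + [c.full_text() for c in self.children])

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

def matches(doc, query):
    """query is (name, leaves); a leaf is a text pattern (str) or a nested query."""
    name, leaves = query
    if doc.name != name:
        return False
    for leaf in leaves:
        if isinstance(leaf, str):
            # Text pattern: the node above the leaf must contain it.
            if leaf not in doc.full_text():
                return False
        elif not any(matches(d, leaf) for d in doc.descendants()):
            # Structural leaf: some descendant must match the sub-query.
            return False
    return True

doc = Node("chapter", children=[
    Node("section", "query languages"),
    Node("section", children=[Node("figure", "taxonomy diagram")]),
])
print(matches(doc, ("chapter", [("figure", ["taxonomy"])])))  # True
print(matches(doc, ("chapter", [("figure", ["protocol"])])))  # False
```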