DYNAMIC ELEMENT RETRIEVAL IN A STRUCTURED ENVIRONMENT

MAYURI UMRANIKAR

CONTENTS

Introduction

Retrieval Environment

- The Vector Space Model

- INEX Environment

- Flexible Retrieval System

Method Used for Retrieval

- Document Tree – Construction

- Ranking of Elements

- Output

Experiments

Conclusions

INTRODUCTION

- Extensible Markup Language (XML) is the preferred format for representing documents; as the number of XML documents grows, the issue of element retrieval arises

- Focus on retrieving relevant elements rather than entire documents

- INEX: INitiative for the Evaluation of XML Retrieval

- Flexible mechanisms

- Different approaches

- Term weighting

RETRIEVAL ENVIRONMENT

- Two factors: the issues that arise when the focus moves from documents to components, and Salton’s Vector Space Model

- Vector Space Model: a term's weight reflects the number of times it occurs in the document

- Fox’s Extended Vector Space Model: incorporates objective identifiers

- A document vector consists of subvectors

- Subvectors contain text that is independently indexed, weighted, searched and retrieved

- Term weighting: weighting within subjective vectors

- Smart experimental retrieval system
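
As a rough illustration of the vector space model outlined on this slide, the sketch below builds term-frequency vectors and compares a document with a query by cosine similarity. The whitespace tokenizer and the names `tf_vector` and `cosine` are assumptions made for this example; they are not taken from the Smart system.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Term-frequency vector: each term is weighted by its occurrence count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample text, used only to exercise the functions.
doc = tf_vector("xml element retrieval ranks xml elements")
query = tf_vector("xml retrieval")
print(doc["xml"])                    # number of times "xml" occurs
print(round(cosine(doc, query), 3))  # similarity of document and query
```

Raw counts are the simplest possible weights; the Lnu and ltu schemes discussed later replace them with length-normalized values.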

INEX ENVIRONMENT

- Content Only (CO) queries: ignore document structure; like typical queries, they specify only the content of the search

- Content and Structure (CAS) queries: explicitly refer to document structure; exhaustive and specific

- CO query results go directly to the user; CAS queries need additional filtering and a search of the body portion

- CAS returns a rank-ordered list of elements

- INEX-EVAL: uses measures of recall and precision (fig.: exhaustivity and specificity are mapped to a single relevance value)

- Results are ranked

FLEXIBLE RETRIEVAL SYSTEM

- Smart format: documents and topics are translated into Smart format and indexed as extended vectors

- Subjective vectors: contain content-bearing terms

- Objective vectors: serve as filters on the results returned by CAS queries

- Extended vector: subjective vector, with the terms of a paragraph in the body subvector

- Lnu-ltu weighting

- Dynamic flexible retrieval: tree representation; rank-ordered list by Lnu weights
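
One way to picture an extended vector is as a record holding subjective subvectors (content terms) alongside objective identifiers that can filter a result list. The sketch below is only an illustration: the class `ExtendedVector`, the function `apply_objective_filter`, the field names and the `year` identifier are all hypothetical and do not come from the Smart system.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ExtendedVector:
    """Hypothetical extended vector: subjective subvectors plus objective identifiers."""
    element_id: str
    subjective: dict[str, Counter] = field(default_factory=dict)  # e.g. {"body": Counter(...)}
    objective: dict[str, str] = field(default_factory=dict)       # e.g. {"year": "2003"}


def apply_objective_filter(results, required):
    """Keep only results whose objective identifiers match every required value."""
    return [
        vec for vec in results
        if all(vec.objective.get(key) == value for key, value in required.items())
    ]


# Hypothetical result list, as if returned by a content search.
results = [
    ExtendedVector("d1/p1", {"body": Counter(xml=2, retrieval=1)}, {"year": "2003"}),
    ExtendedVector("d2/p4", {"body": Counter(xml=1)}, {"year": "1999"}),
]
filtered = apply_objective_filter(results, {"year": "2003"})
print([vec.element_id for vec in filtered])
```

Keeping the objective identifiers apart from the content terms means the filter never affects term weights; it only prunes the ranked results.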

METHOD FOR FLEXIBLE RETRIEVAL

- Input: given a query Q and the paragraph index, retrieve a rank-ordered list of terminal nodes

- The n top-ranked paragraphs are selected as input

- The set of paragraphs is used to identify documents; elements are generated and returned as output

- Document tree: needs information about the document structure

- Terminal nodes

- Pre-order traversal

- Terminal nodes are found in the paragraph index
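
The pre-order traversal that locates terminal nodes might be sketched as follows, using Python's standard `xml.etree.ElementTree`. The sample document, the choice of `p` as the paragraph tag and the function name `terminal_paragraphs` are assumptions for illustration; the paths produced here omit sibling positions, which a real index would need in order to tell paragraphs apart.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real INEX articles are far larger.
SAMPLE = """
<article>
  <title>Dynamic retrieval</title>
  <body>
    <sec><p>first paragraph</p><p>second paragraph</p></sec>
    <sec><p>third paragraph</p></sec>
  </body>
</article>
"""


def terminal_paragraphs(node, path=""):
    """Pre-order traversal yielding (path, text) for every <p> node."""
    here = f"{path}/{node.tag}"
    if node.tag == "p":
        yield here, (node.text or "").strip()
    # Visit the node first, then its children from left to right.
    for child in node:
        yield from terminal_paragraphs(child, here)


root = ET.fromstring(SAMPLE)
leaves = list(terminal_paragraphs(root))
for path, text in leaves:
    print(path, "->", text)
```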

SIMPLE XML DOCUMENT AND ITS SCHEMA

CONSTRUCTION OF DOCUMENT TREE

- For query Q, the n top-ranked paragraphs are used to build trees

- Leaf elements (terminal nodes) are paragraph nodes

- Each leaf is represented by a term-frequency weighted vector

- Step 1: gather all leaf nodes; this completes the terminal nodes

- Step 2: merge the children's vectors to form each parent's vector

- The document schema determines the merging

- Parent: contains the unique terms of its children; the term-frequency weighted parent vector holds the content of the children

- The process is carried out recursively
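
The bottom-up merge can be sketched recursively. The example below assumes that a parent's term frequency is the sum of its children's frequencies, which is one plausible reading of the slide; the `Node` class and the function `build_vectors` are hypothetical names, and the schema-driven choice of which children to merge is left out.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: leaves carry text, parents carry children."""
    name: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)
    vector: Counter = field(default_factory=Counter)


def build_vectors(node: Node) -> Counter:
    """Fill in term-frequency vectors bottom-up and return this node's vector."""
    if not node.children:
        # Leaf (paragraph) node: weight each term by its frequency.
        node.vector = Counter(node.text.lower().split())
    else:
        # Parent node: merge the children's vectors by summing frequencies
        # (an assumption of this sketch).
        node.vector = Counter()
        for child in node.children:
            node.vector.update(build_vectors(child))
    return node.vector


sec = Node("sec", children=[Node("p", "xml retrieval"), Node("p", "xml elements")])
doc = Node("article", children=[sec, Node("p", "retrieval")])
build_vectors(doc)
print(sec.vector)
print(doc.vector)
```

Because each parent is built only from its children, a single pass from the leaves upward fills in every element of the tree.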

RANKING OF ELEMENTS

- The set of elements of the document tree is generated

- Problem in structured retrieval: producing a rank-ordered list of elements

- Method used: all-element index (a separate representation for each element of each document, plus weighting information)

- Lnu weights: suit elements of variable length; do not require global frequency information

- Normalization and length: failing to normalize for length results in biased values

- Pivot: the document length at which the probability of relevance equals the probability of retrieval

- Slope: the amount of tilting

- Pivoted normalization: reduces the difference between the probability of retrieval and the probability of relevance

- Lnu term weights:

((1 + log(term_freq)) / (1 + log(avg_term_freq))) / ((1 - slope) + slope * (no_unique_terms / pivot))

- ltu weighting (N: collection size; nk: number of elements containing the term):

((1 + log(term_freq)) * log(N / nk)) / ((1 - slope) + slope * (no_unique_terms / pivot))

- N and nk are element dependent and should be known through indexing

- As we move up the tree: N is obtained by counting the elements of each type

- nk: obtained from the inverted file entry in the paragraph index, together with the mapping between identifiers and XPaths (given)
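
The two weights can be sketched as small Python functions. This is an illustrative sketch only: it assumes natural logarithms (the slides do not state the base), it applies the log(N/nk) factor as a multiplier in the usual SMART tf-idf manner, and the slope and pivot values in the usage lines are hypothetical rather than the tuned values from the experiments.

```python
import math


def pivoted_norm(no_unique_terms: int, slope: float, pivot: float) -> float:
    """Pivoted unique normalization shared by both weights."""
    return (1 - slope) + slope * (no_unique_terms / pivot)


def lnu_weight(term_freq, avg_term_freq, no_unique_terms, slope, pivot):
    """Lnu weight of a term in an element vector."""
    return ((1 + math.log(term_freq)) / (1 + math.log(avg_term_freq))) / pivoted_norm(
        no_unique_terms, slope, pivot
    )


def ltu_weight(term_freq, n_elements, nk, no_unique_terms, slope, pivot):
    """ltu weight of a query term; n_elements is N, nk the elements containing the term."""
    return ((1 + math.log(term_freq)) * math.log(n_elements / nk)) / pivoted_norm(
        no_unique_terms, slope, pivot
    )


# Hypothetical parameter values, chosen only to exercise the functions.
SLOPE, PIVOT = 0.25, 10.0
print(lnu_weight(term_freq=3, avg_term_freq=1.5, no_unique_terms=20, slope=SLOPE, pivot=PIVOT))
print(ltu_weight(term_freq=1, n_elements=100, nk=10, no_unique_terms=2, slope=SLOPE, pivot=PIVOT))
```

Note that the Lnu weight uses only statistics of the element itself, which is why it can be computed for elements generated at query time, while the ltu weight needs N and nk from the index.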

OUTPUT OF FLEXIBLE RETRIEVAL

- Select another leaf node, gather its siblings, construct the document tree, calculate Lnu term weights and correlate them with the ltu-weighted query; produce another rank-ordered list

- After the n top-ranked paragraphs are exhausted and the last list is produced, merge the lists

- Result: a single set of elements, rank-ordered by correlation with Q

- Comparison of flexible retrieval and the all-element index: results are identical when the set of n paragraphs input to flexible retrieval includes all paragraphs and the same values are used for Lnu-ltu
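
The final step, scoring each tree's elements against the query and merging the per-tree lists, might look like the sketch below. It assumes the correlation is an inner product of precomputed Lnu element weights and ltu query weights; the function names, element identifiers and weight values are hypothetical.

```python
import heapq


def inner_product(element_vec: dict, query_vec: dict) -> float:
    """Correlation of an Lnu-weighted element with the ltu-weighted query."""
    return sum(w * query_vec.get(term, 0.0) for term, w in element_vec.items())


def rank_tree(elements: dict, query_vec: dict) -> list:
    """Rank one tree's elements by correlation, highest first."""
    scored = [(inner_product(vec, query_vec), name) for name, vec in elements.items()]
    return sorted(scored, reverse=True)


def merge_ranked(lists: list) -> list:
    """Merge per-tree lists (each already sorted descending) into one ranking."""
    return list(heapq.merge(*lists, reverse=True))


# Hypothetical precomputed weights for two document trees and one query.
query = {"xml": 1.0, "retrieval": 0.5}
tree1 = {"d1/sec[1]": {"xml": 0.8, "retrieval": 0.4}, "d1/sec[1]/p[1]": {"xml": 0.6}}
tree2 = {"d2/p[3]": {"retrieval": 1.4}}

merged = merge_ranked([rank_tree(tree1, query), rank_tree(tree2, query)])
for score, name in merged:
    print(f"{score:.2f}  {name}")
```

Since every per-tree list is already sorted, a k-way merge yields the single rank-ordered list without re-sorting all elements.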

ALGORITHM

EXPERIMENTS

- Paragraph index: the result is a set of extended vectors, each representing a paragraph

- CO queries: each subvector represents a subjective portion; the body subvector is the important one, since the content of an element (and not its type) is contained in the body

- Tree representation

FACTORS OF INTEREST

- Slope and pivot for Lnu-ltu weighting

- Needed for effective structured retrieval

- Can be determined empirically and applied from one collection to another; generic

- n: the number of paragraphs input; sets an upper bound on the number of trees per query

- The actual number of trees depends on how many paragraphs belong to the same group or the same document

EXPERIMENTS DONE

- All-element and dynamic/flexible retrieval experiments and results: body-only retrieval

- The correlation between the element vector and the query vector is computed from the body elements only

Table 1

RESULTS

- Tables

- Results are equivalent

- Flexible retrieval is more efficient in file space

- The time required for indexing is half

- Dynamic retrieval costs more on a per-query basis, depending on n; the exact number of trees required is not specified

- Another factor: the value of nk

DISCUSSIONS AND CONCLUSIONS

- Flexible retrieval dynamically produces a rank-ordered list of elements from a single indexing at the level of the basic indexing node (paragraph)

- Basic functions: SMART; extended vector model

- Results: show the flexible capabilities

- Attempt to incorporate other subvectors and internal node weights

- INEX: exhaustivity and specificity; the results are exhaustive; research on specificity is ongoing; the results reflect this

- It is a better way of retrieval than all-element indexing

THANK YOU!!!
