Browsing by phrases: terminological information in interactive multilingual text retrieval

Anselmo Peñas, Julio Gonzalo and Felisa Verdejo

NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED)

Joint Conference on Digital Libraries 2001, Roanoke, VA

Page 2: Goals

to bridge the gap between users’ vocabulary and collection terminology

• even across languages

• without the need to build thesauri

robust and efficient integration of NLP resources and tools

• Semantic network: EuroWordNet

• Tokeniser

• Morphological analyser

• POS tagger

• Shallow parser

Page 3: Approach

Perform automatic terminology extraction to provide:

– At indexing time: criteria for adding a controlled set of phrases to the index

– At query time: term browsing, to navigate through the terminology and reach documents through complex (multi-word) terms

Page 4: Approach

The task: to retrieve terminology

– Lexical compounds are retrieved from mono-lexical terms

Requires:

– A phrase indexing level

– Query expansion

– Query translation

Phrasal information is used to reduce noise when expanding and translating (co-occurrence of words in the same phrase)

[Diagram: the Lemma, Phrase and Document levels]
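A minimal sketch of the phrase indexing level, assuming a simple in-memory layout: each lemma points to the phrases that contain it, and each phrase points to the documents it occurs in. Starting from single-word (mono-lexical) query terms, phrases in which more query lemmas co-occur are ranked first, which is one way to read the noise-reduction idea above. `PhraseIndex`, `phrases_for` and `docs_for` are hypothetical names, not the authors' implementation, and the example data is invented.

```python
from collections import defaultdict


class PhraseIndex:
    """Toy lemma -> phrase -> document index (hypothetical, for illustration only)."""

    def __init__(self):
        self.lemma_to_phrases = defaultdict(set)  # lemma -> phrases containing it
        self.phrase_to_docs = defaultdict(set)    # phrase -> ids of documents containing it

    def add(self, phrase_lemmas, doc_id):
        """Record that the phrase (a sequence of lemmas) occurs in doc_id."""
        phrase = tuple(phrase_lemmas)
        for lemma in phrase:
            self.lemma_to_phrases[lemma].add(phrase)
        self.phrase_to_docs[phrase].add(doc_id)

    def phrases_for(self, query_lemmas):
        """Phrases containing at least one query lemma; phrases in which more
        query lemmas co-occur come first, ties broken alphabetically."""
        query = set(query_lemmas)
        candidates = set()
        for lemma in query:
            candidates |= self.lemma_to_phrases.get(lemma, set())
        return sorted(candidates, key=lambda p: (-len(query & set(p)), p))

    def docs_for(self, phrase):
        """Documents reachable from a phrase (the final browsing step)."""
        return sorted(self.phrase_to_docs.get(tuple(phrase), set()))


# Invented example data.
index = PhraseIndex()
index.add(("nuclear", "test"), "d1")
index.add(("nuclear", "test", "ban", "treaty"), "d2")
index.add(("test", "ban"), "d2")
index.add(("blood", "test"), "d3")
print(index.phrases_for(["nuclear", "test"]))
print(index.docs_for(("nuclear", "test")))  # ['d1']
```

Here "blood test" is still returned, but below the phrases that contain both query lemmas, so co-occurrence acts as a ranking signal rather than a hard filter.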

Page 5: Terminology Extraction and Indexing

Processing: tokenising, lemmatising, tagging, and shallow parsing (syntactic pattern recognition)

Results: terminological phrases for each language, with:

• Term frequency

• Document frequency

• Component lemmas

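Patterns for Spanish and Catalan: N N; N A; N [A] Prep N [A]; N [A] Prep Art N [A]; N [A] Prep V; N [A] Prep V N [A]

Patterns for English: A N [N]; N N [N]; A A N; N A N; N Prep N

(N = noun, A = adjective, Prep = preposition, Art = article, V = verb; [ ] marks an optional element)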
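The extraction step can be sketched as matching tag sequences over lemmatised, POS-tagged text. This is an illustrative sketch, not the authors' shallow parser: it assumes the tagger output is a list of (lemma, tag) pairs using the tag names from the patterns, uses only the English patterns A N [N], N N [N] and N Prep N, and counts every match, including phrases nested inside longer ones. `expand` and `extract_terms` are hypothetical names.

```python
from collections import Counter
from itertools import product

# A subset of the English patterns above; "[N]" marks an optional noun.
ENGLISH_PATTERNS = ["A N [N]", "N N [N]", "N Prep N"]


def expand(pattern):
    """Expand optional slots: 'A N [N]' -> {('A', 'N'), ('A', 'N', 'N')}."""
    slots = []
    for tok in pattern.split():
        if tok.startswith("[") and tok.endswith("]"):
            slots.append([(), (tok[1:-1],)])  # optional slot: absent or present
        else:
            slots.append([(tok,)])             # mandatory slot
    return {sum(choice, ()) for choice in product(*slots)}


def extract_terms(docs, patterns=ENGLISH_PATTERNS):
    """docs: list of documents, each a list of (lemma, tag) pairs (assumed format).
    Returns {phrase: (term_frequency, document_frequency, component_lemmas)}."""
    tag_seqs = set().union(*(expand(p) for p in patterns))
    tf, df, components = Counter(), Counter(), {}
    for doc in docs:
        tags = [tag for _, tag in doc]
        seen = set()
        for i in range(len(doc)):
            for seq in tag_seqs:
                if tuple(tags[i:i + len(seq)]) == seq:
                    window = doc[i:i + len(seq)]
                    phrase = " ".join(lemma for lemma, _ in window)
                    tf[phrase] += 1  # nested and overlapping matches all count
                    seen.add(phrase)
                    # Component lemmas: content words only (prepositions dropped).
                    components[phrase] = sorted({lem for lem, t in window if t != "Prep"})
        df.update(seen)  # each phrase counts at most once per document
    return {p: (tf[p], df[p], components[p]) for p in tf}


# Invented tagged input.
docs = [
    [("nuclear", "A"), ("test", "N"), ("ban", "N"), ("be", "V"), ("sign", "V")],
    [("nuclear", "A"), ("test", "N"), ("in", "Prep"), ("desert", "N")],
]
terms = extract_terms(docs)
for phrase, stats in sorted(terms.items()):
    print(phrase, stats)
```

With this input, "nuclear test" gets term frequency 2 and document frequency 2, while "nuclear test ban", "test ban" and "test in desert" each occur once. A real system would also need a policy for nested matches and frequency thresholds for deciding which phrases enter the index.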

Page 6: Query Expansion and Translation

Each Spanish query word is expanded with related words (Expansion), and the expanded set is translated into English (Translation):

– Prohibición
  Expansion: embargo, entredicho, interdicción, interdicto, proscripción
  Translation: ban, interdiction, prohibition, proscription

– Pruebas
  Expansion: cata, catadura, degustación, ensayo, escandallo, experimento, gustación, muestreo, tanteo
  Translation: demonstrate, establish, exhibit, experiment, experimentation, fall, fitting, indicate, point, present, proof, prove, run, sample, sampling, shew, show, taste, test, trial, try

– Nucleares
  Expansion: nuclear
  Translation: nuclear

– Tratados
  Expansion: acuerdo, capitulación, concertación, convenio, cuidar, pacto, manejar, procesar
  Translation: accord, discourse, handle, manage, pact, process, treat, treatise, treaty

Combining these candidates word by word yields noisy queries such as "Nuclear fitting interdiction manage?" or "Nuclear taste proscription process?"
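The noise problem is easy to quantify: with the translation lists on this slide, word-by-word translation of the four content words gives 4 × 21 × 1 × 9 = 756 candidate English queries. The sketch below counts these combinations and then applies a simple phrase co-occurrence filter, keeping a translation only if it appears in an indexed English phrase together with a translation of another query word. The toy phrase list, the pairwise filter, the fallback rule and the function names are assumptions for illustration, not the system's actual procedure.

```python
from itertools import combinations, product
from math import prod

# English translation candidates for each Spanish query word, as listed on the slide.
TRANSLATIONS = {
    "prohibición": ["ban", "interdiction", "prohibition", "proscription"],
    "pruebas": ["demonstrate", "establish", "exhibit", "experiment", "experimentation",
                "fall", "fitting", "indicate", "point", "present", "proof", "prove",
                "run", "sample", "sampling", "shew", "show", "taste", "test", "trial", "try"],
    "nucleares": ["nuclear"],
    "tratados": ["accord", "discourse", "handle", "manage", "pact", "process",
                 "treat", "treatise", "treaty"],
}

# Hypothetical English phrase index (toy data, not from the presentation).
ENGLISH_PHRASES = [
    ("nuclear", "test"),
    ("test", "ban", "treaty"),
    ("nuclear", "test", "ban", "treaty"),
    ("blood", "test"),
]


def count_combinations(translations):
    """Number of distinct word-by-word translations of the whole query."""
    return prod(len(c) for c in translations.values())


def filter_by_phrases(translations, phrases):
    """Keep a candidate only if it shares an indexed phrase with a candidate
    for a different query word. Falls back to the unfiltered list when nothing
    co-occurs (an assumed rule, so that no query word is dropped)."""
    kept = {word: set() for word in translations}
    for (w1, c1), (w2, c2) in combinations(translations.items(), 2):
        for a, b in product(c1, c2):
            if any(a in p and b in p for p in phrases):
                kept[w1].add(a)
                kept[w2].add(b)
    return {w: sorted(kept[w]) or list(c) for w, c in translations.items()}


print(count_combinations(TRANSLATIONS))  # 756
print(filter_by_phrases(TRANSLATIONS, ENGLISH_PHRASES))
```

With the toy phrase list, each query word keeps a single translation (ban, test, nuclear, treaty), so the 756 combinations collapse to one. With a real phrase index the reduction would depend entirely on the collection.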

Page 7

[Screenshot: a query in Spanish, the resulting hierarchy of terms grouped by language (Catalan, English, Spanish), and the ranking of documents]
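The screenshot does not show how the hierarchy of terms is built. One plausible organisation, assumed here purely for illustration, nests each phrase under the longest shorter phrase it contains (as a contiguous run of lemmas), separately for each language. `is_subphrase` and `build_hierarchy` are hypothetical names, and the example phrases are invented.

```python
def is_subphrase(short, longer):
    """True if `short` occurs as a contiguous run of lemmas inside the longer phrase."""
    n = len(short)
    return n < len(longer) and any(
        longer[i:i + n] == short for i in range(len(longer) - n + 1)
    )


def build_hierarchy(phrases_by_language):
    """phrases_by_language: {language: [phrase tuples]}.
    Returns {language: {parent: [children]}}, placing each phrase under the
    longest other phrase it contains; parent None marks the top level."""
    tree = {}
    for language, phrases in phrases_by_language.items():
        children = {}
        for phrase in sorted(phrases, key=len):
            parents = [q for q in phrases if is_subphrase(q, phrase)]
            parent = max(parents, key=len) if parents else None
            children.setdefault(parent, []).append(phrase)
        tree[language] = children
    return tree


# Invented example phrases (lemmatised).
tree = build_hierarchy({
    "English": [("nuclear", "test"), ("nuclear", "test", "ban"), ("test", "ban", "treaty")],
    "Spanish": [("prueba", "nuclear"), ("tratado", "de", "prohibición", "de", "prueba", "nuclear")],
})
for language, children in tree.items():
    print(language, children)
```

In this sketch "nuclear test ban" is listed under "nuclear test", while "test ban treaty" stays at the top level because none of the other English phrases occurs inside it.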

Page 8

[Screenshot of the interface annotated with the user actions QUERY, RECONSULT WITH PHRASE, EXPLORE PHRASE and EXPLORE DOCUMENT]

Page 9: Evaluation

                                     All queries   1-word queries   >1-word queries
First action after QUERY
  DOC                                40.70%        45.49%           37.30%
  PHRASE                             51.14%        45.65%           55.05%
  RECONSULT                          8.141%        8.846%           7.640%
Last action before finishing
the session with explore DOC
  QUERY                              48.74%        53.38%           45.15%
  PHRASE                             42.95%        40.85%           44.57%
  RECONSULT                          8.306%        5.764%           10.27%

• 1523 sessions with interaction

• an average of 5.11 actions per session

• explore phrase is used in 65.13%
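The percentages in the table could be derived from interaction logs along these lines. This sketch assumes each session is logged as a list of action codes (QUERY, PHRASE, RECONSULT, DOC), reads "last action before finishing the session with explore DOC" as the last non-DOC action before a session's final run of DOC actions, and reads the 65.13% figure as the share of sessions containing at least one PHRASE action. The log format, these readings and the function names are assumptions, and the toy log is made up for illustration.

```python
from collections import Counter

NEXT_ACTIONS = ("DOC", "PHRASE", "RECONSULT")  # assumed action codes


def shares(counter):
    """Convert counts into percentages rounded to two decimals."""
    total = sum(counter.values())
    return {k: round(100 * v / total, 2) for k, v in counter.items()} if total else {}


def first_action_after_query(sessions):
    """Distribution of the action that immediately follows each QUERY."""
    counts = Counter(
        nxt
        for s in sessions
        for prev, nxt in zip(s, s[1:])
        if prev == "QUERY" and nxt in NEXT_ACTIONS
    )
    return shares(counts)


def last_action_before_final_doc(sessions):
    """For sessions ending in DOC, the last non-DOC action before that final run."""
    counts = Counter()
    for s in sessions:
        i = len(s) - 1
        if i < 0 or s[i] != "DOC":
            continue
        while i >= 0 and s[i] == "DOC":
            i -= 1
        if i >= 0:
            counts[s[i]] += 1
    return shares(counts)


sessions = [  # hypothetical toy log, one list of actions per session
    ["QUERY", "PHRASE", "DOC"],
    ["QUERY", "DOC", "DOC"],
    ["QUERY", "PHRASE", "RECONSULT", "DOC"],
    ["QUERY", "PHRASE"],
]
print(first_action_after_query(sessions))  # {'PHRASE': 75.0, 'DOC': 25.0}
print(last_action_before_final_doc(sessions))
print(sum(map(len, sessions)) / len(sessions))  # 3.0 actions per session
print(100 * sum("PHRASE" in s for s in sessions) / len(sessions))  # 75.0
```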

Page 10: Conclusions

Development of a search engine based on terminology extraction

– Using terminological phrases as an intermediate option between free searching and thesaurus-guided searching

– Without the need to build a thesaurus

– Bridging the distance between the terms used in the query and the terminology used in the collection (even across languages)

Users appreciate phrasal information for document selection

– Phrases give higher expectations of relevance than Google’s ranking

– WTB phrasal information can substantially complement the document ranking provided by search engines