Joint Conference on Digital Libraries 2001, Roanoke, VA. Browsing by phrases: terminological information in interactive multilingual text retrieval. Anselmo Peñas, Julio Gonzalo and Felisa Verdejo. NLP Group, Dpto. Lenguajes y Sistemas Informáticos, Distance Learning University of Spain (UNED).
Browsing by phrases: terminological information in interactive multilingual text retrieval
Anselmo Peñas, Julio Gonzalo and Felisa Verdejo
NLP Group, Dpto. Lenguajes y Sistemas Informáticos,
Distance Learning University of Spain (UNED)
Joint Conference on Digital Libraries 2001 Roanoke, VA
Goals
To bridge the gap between users’ vocabulary and collection terminology
• even cross-language
• without the need for thesaurus construction
Robust and efficient integration of NLP resources and tools
• Semantic network: EuroWordNet
• Tokeniser
• Morphological analyser
• POS tagger
• Shallow parser
Approach
Perform Automatic Terminology Extraction to provide:
– At indexing time:
Criteria for adding a controlled set of phrases to the index
– At query time:
Term browsing, to navigate through the terminology and access the documents from complex terms
Approach
The task: to retrieve terminology
– Lexical compounds are retrieved from mono-lexical terms
Requires
– A phrase indexing level
– Query expansion
– Query translation
Phrasal information is used to reduce noise when expanding and translating (co-occurrence of words in the same phrase)
[Diagram: Lemma, Phrase, Document]
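The idea of a phrase indexing level between lemmas and documents can be sketched as a small in-memory structure. This is only an illustration under assumptions: the class name `PhraseIndex`, its methods, and the sample phrases and document ids are hypothetical and do not come from the presentation.

```python
from collections import defaultdict


class PhraseIndex:
    """Toy index with three levels: lemma -> phrase -> document."""

    def __init__(self):
        # lemma -> set of phrases (tuples of lemmas) that contain it
        self.lemma_to_phrases = defaultdict(set)
        # phrase -> set of document ids where the phrase occurs
        self.phrase_to_docs = defaultdict(set)

    def add_phrase(self, phrase_lemmas, doc_id):
        """Register one occurrence of a phrase in a document."""
        phrase = tuple(phrase_lemmas)
        self.phrase_to_docs[phrase].add(doc_id)
        for lemma in phrase:
            self.lemma_to_phrases[lemma].add(phrase)

    def phrases_for(self, lemma):
        """Lexical compounds retrieved from a mono-lexical term."""
        return sorted(self.lemma_to_phrases.get(lemma, ()))

    def docs_for(self, phrase_lemmas):
        """Documents reached from a complex term."""
        return sorted(self.phrase_to_docs.get(tuple(phrase_lemmas), ()))


# Hypothetical sample data
index = PhraseIndex()
index.add_phrase(["nuclear", "test"], "doc1")
index.add_phrase(["nuclear", "test", "ban"], "doc2")
index.add_phrase(["test", "ban", "treaty"], "doc2")
index.add_phrase(["nuclear", "test"], "doc3")

nuclear_phrases = index.phrases_for("nuclear")
nuclear_test_docs = index.docs_for(["nuclear", "test"])
```

Looking up a single lemma returns the compounds it takes part in, and each compound leads to its documents, which is the navigation path that term browsing relies on.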
Terminology Extraction and Indexing
Processing: tokenising, lemmatising, tagging, shallow parsing (syntactic pattern recognition)
Results: terminological phrases for each language
• Term frequency
• Document frequency
• Component lemmas
Patterns for Spanish and Catalan
N N
N A
N [A] Prep N [A]
N [A] Prep Art N [A]
N [A] Prep V
N [A] Prep V N [A]
Patterns for English
A N [N]
N N [N]
A A N
N A N
N Prep N
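Syntactic pattern recognition over POS-tagged lemmas can be illustrated with the English patterns above. The sketch below is not the authors' shallow parser: it assumes a one-letter tagset (A adjective, N noun, P preposition, anything else ignored), and the function names `extract_candidates` and `term_statistics` are hypothetical. It reports every matching window, including nested ones, without any further filtering.

```python
import re
from collections import Counter

# English patterns over one-letter tags; "N?" encodes the optional [N].
ENGLISH_PATTERNS = ["ANN?", "NNN?", "AAN", "NAN", "NPN"]
PATTERN_RE = re.compile("|".join(f"(?:{p})" for p in ENGLISH_PATTERNS))


def extract_candidates(tagged):
    """Return each 2- or 3-lemma window whose tag sequence matches a pattern."""
    tags = "".join(tag for _, tag in tagged)
    found = []
    for start in range(len(tags)):
        # Every pattern above spans two or three tokens.
        for end in range(start + 2, min(start + 3, len(tags)) + 1):
            if PATTERN_RE.fullmatch(tags[start:end]):
                found.append(tuple(lemma for lemma, _ in tagged[start:end]))
    return found


def term_statistics(tagged_docs):
    """Term frequency and document frequency of the candidate phrases."""
    tf, df = Counter(), Counter()
    for tagged in tagged_docs:
        candidates = extract_candidates(tagged)
        tf.update(candidates)       # counts every occurrence
        df.update(set(candidates))  # counts each document once
    return tf, df


# Hypothetical tagged documents: (lemma, tag) pairs
doc_a = [("the", "D"), ("nuclear", "A"), ("test", "N"), ("ban", "N"), ("treaty", "N")]
doc_b = [("test", "N"), ("ban", "N"), ("of", "P"), ("weapon", "N")]

tf, df = term_statistics([doc_a, doc_b])
```

The Spanish and Catalan patterns could be encoded the same way once tags for Art and V are added to the tagset. Each extracted tuple already holds the component lemmas of the phrase.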
Query Expansion and Translation
Tratados
– Expansion: acuerdo, capitulación, concertación, convenio, cuidar, pacto, manejar, procesar
– Translation: accord, discourse, handle, manage, pact, process, treat, treatise, treaty
de
Prohibición
– Expansion: embargo, entredicho, interdicción, interdicto, proscripción
– Translation: ban, interdiction, prohibition, proscription
de
Pruebas
– Expansion: cata, catadura, degustación, ensayo, escandallo, experimento, gustación, muestreo, tanteo
– Translation: demonstrate, establish, exhibit, experiment, experimentation, fall, fitting, indicate, point, present, proof, prove, run, sample, sampling, shew, show, taste, test, trial, try
Nucleares
– Expansion: nuclear
– Translation: nuclear
Nuclear fitting interdiction manage? Nuclear taste proscription process?
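Word-by-word translation produces noisy combinations such as "nuclear fitting" or "nuclear taste". One way to picture how co-occurrence in the same phrase can reduce that noise is the sketch below. It is an assumption-laden illustration, not the system's actual algorithm: the function name `phrases_matching` is hypothetical, the candidate sets are subsets of the translations listed on the slide, and the indexed phrases are invented.

```python
def phrases_matching(candidates_per_word, indexed_phrases):
    """Keep indexed phrases supported by candidates of several query words.

    A phrase is kept when at least two of its words are candidates and
    those candidates cover at least two different query words.
    """
    kept = []
    for phrase in indexed_phrases:
        words = set(phrase)
        covered = {i for i, cands in enumerate(candidates_per_word) if words & cands}
        matched = {w for w in words if any(w in cands for cands in candidates_per_word)}
        if len(covered) >= 2 and len(matched) >= 2:
            kept.append(phrase)
    return kept


# English candidates for each content word of the Spanish query
candidates = [
    {"accord", "handle", "manage", "pact", "process", "treat", "treaty"},  # Tratados
    {"ban", "interdiction", "prohibition", "proscription"},                # Prohibición
    {"fitting", "proof", "sample", "taste", "test", "trial"},              # Pruebas
    {"nuclear"},                                                           # Nucleares
]

# Hypothetical phrases found in the collection index
indexed = [
    ("nuclear", "test"),
    ("test", "ban", "treaty"),
    ("wine", "taste"),
    ("manage", "risk"),
]

supported = phrases_matching(candidates, indexed)
```

Combinations that never occur as a phrase in the collection are simply never proposed, and phrases that match only one query word ("wine taste", "manage risk") are dropped.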
[Screenshot of the search interface: query in Spanish; hierarchy of terms in Catalan, English and Spanish; ranking of documents. User actions: QUERY, RECONSULT WITH PHRASE, EXPLORE PHRASE, EXPLORE DOCUMENT]
Evaluation
                                All queries   1 word queries   >1 word queries
First action after QUERY
  DOC                           40.70%        45.49%           37.30%
  PHRASE                        51.14%        45.65%           55.05%
  RECONSULT                     8.141%        8.846%           7.640%
Last action before finishing the session with explore DOC
  QUERY                         48.74%        53.38%           45.15%
  PHRASE                        42.95%        40.85%           44.57%
  RECONSULT                     8.306%        5.764%           10.27%
• 1523 sessions with interaction
• an average of 5.11 actions per session
• explore phrase is used in 65.13%
Conclusions
Development of a search engine based on terminology extraction
– Using terminological phrases in an intermediate way between free-searching and thesaurus-guided searching
– Without the need for thesaurus construction
– Bridging the distance between the terms used in the query and the terminology used in the collection (even in different languages)
Users appreciate phrasal information for document selection
– Phrases give higher expectations of relevance than Google’s ranking
– WTB phrasal information can substantially complement the document ranking provided by the search engines