Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment
RANLP, Borovets, 2007
Eelco Mossel, University of Hamburg
2
Framework
• EU project LT4eL: Language Technology for eLearning (www.lt4el.eu)
• Goal: use Language Technology to improve the effectiveness of Learning Management Systems
• Multilingual setting: 8 languages
• 12 European partner universities/institutes
• Crosslingual search: joint work with:
– Cristina Vertan, Stefanie Reimers (University of Hamburg)
– Kiril Simov and his team (Bulgarian Academy of Sciences, Sofia)
– Alex Killing (ETH Zürich (Eidgenössische Technische Hochschule))
3
Overview
• Goals of semantic search
• Resources for search function
• Functionality and architecture
• Further work
4
Crosslingual semantic search
Goals of the approach
1. Improved retrieval of documents
– Find documents that a simple text search would miss (simple text search requires the exact search word to occur in the text)
– Example: a search for “screen” retrieves a document that contains “monitor” but not “screen”.
2. Multilinguality
– One implementation for all languages in the project
3. Crosslinguality
– Find documents in languages different from the search/interface language
• No need to translate the search query
• Search is possible with passive foreign-language knowledge
5
Overview of resources
• A multilingual document collection
• An ontology including a domain ontology on the domain of the documents
• Concept lexicalisations in different languages
• Annotation of concepts in the documents
6
Overview of resources (graphical)
[Figure: Lexicons (Term → Concept) for BG, CS, DE, EN, MT, NL, PL, PT, RO; Ontology; LOs in BG, CS, DE, EN, MT, NL, PL, PT, RO]
7
Search procedure
[Figure: Search-Terms (multiple languages) → Search-Concepts → Retrieved Documents. Resources shown: Lexicons (contain term-concept mappings), Ontology (contains concepts), Document Database. A visualisation step lets the user select concepts.]
8
Search with ILIAS
9
10
Internal components
Search functionality comprises:
1. Find terms in lexicons that reflect search query.
2. Find corresponding concepts for derived terms.
3. Find relevant documents for concepts.
4. Create ranking for set of found documents.
5. Create ontology fragment containing necessary information to present concept neighbourhood
6. Find “shared concepts”
11
Architecture
[Figure: architecture diagram. Labels shown: LMS / ILIAS / other system using the search functionality; CrosslingualSearch; Lexicon Lookup Component; Ontology Management System; Ontology; Search Engine; Lexicon; Ontology; Lucene; Database]
12
1: Query Terms
• Why start with a free-text query?
– User wants results fast (as in Google)
– Compete with fulltext search and keyword search
– Find a starting point for ontology browsing
• Querying the lexicon: strategies adopted/implemented
– Case- and diacritic-insensitive matching
– Create combinations for multiword terms
Example: Text Editor
• text-editor
• texteditor
• text editor
• text
• editor
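The two query strategies on this slide can be sketched as follows. This is a minimal illustration, not the LT4eL implementation: the function names `normalise` and `query_variants` are assumptions, and the ordering of the variants simply follows the "Text Editor" example.

```python
import unicodedata


def normalise(text: str) -> str:
    """Case-fold the text and strip diacritics (e.g. 'Zürich' -> 'zurich')."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.casefold()


def query_variants(query: str) -> list[str]:
    """Return lexicon lookup candidates for a query, most specific first."""
    words = normalise(query).split()
    if not words:
        return []
    # Hyphenated, concatenated and spaced forms, then the single words.
    candidates = ["-".join(words), "".join(words), " ".join(words), *words]
    # A one-word query produces the same string several times; keep the
    # first occurrence of each candidate and preserve the order.
    return list(dict.fromkeys(candidates))


print(query_variants("Text Editor"))
# ['text-editor', 'texteditor', 'text editor', 'text', 'editor']
```

Each candidate would then be looked up in a lexicon whose entries were normalised the same way.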
13
1: Query Terms (continued)
• Other ideas to improve recognition of the query:
– Lemmatisation of search terms
– Expansion of the lexicon with word forms
– Match substrings
– Match similar strings
– Insertion of function words, e.g. Portuguese: “provedor acesso” → “provedor de acesso”
– Dynamic list of available terms that contain the input so far (involves a change of GUI)
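Three of these ideas (function-word insertion, substring matching, similar-string matching) could be combined into a fallback chain, as in the sketch below. The lexicon entries, the function-word list, the order of the fallbacks and the similarity cutoff of 0.8 are all assumptions made for illustration; the slides do not say which of these ideas were implemented or how.

```python
import difflib

# Hypothetical lexicon entries and function words, for illustration only.
LEXICON = ["provedor de acesso", "text editor", "monitor", "screen"]
FUNCTION_WORDS = ["de", "do", "da"]


def with_function_words(query: str) -> list[str]:
    """Insert each function word between adjacent query words."""
    words = query.split()
    variants = []
    for i in range(1, len(words)):
        for fw in FUNCTION_WORDS:
            variants.append(" ".join(words[:i] + [fw] + words[i:]))
    return variants


def lookup(query: str) -> list[str]:
    """Try exact, function-word, substring, then similar-string matching."""
    if query in LEXICON:
        return [query]
    hits = [v for v in with_function_words(query) if v in LEXICON]
    if hits:
        return hits
    hits = [term for term in LEXICON if query in term]
    if hits:
        return hits
    # difflib ranks candidates by a similarity ratio between 0 and 1.
    return difflib.get_close_matches(query, LEXICON, n=3, cutoff=0.8)


print(lookup("provedor acesso"))  # found via function-word insertion
print(lookup("edit"))             # found via substring match
print(lookup("moniter"))          # found via similar-string match
```

Looser matching finds more terms but also more wrong ones, which is one reason to try the strict strategies first.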
14
2: Term → Concept
Not always a 1:1 mapping.
• Corresponding concept is missing from the ontology
– LT4eL: not in lexicon
• Unique result: term is the lexicalisation of one concept
• Multiple concepts from one domain, e.g.:
– Key (from keyboard)
– Key (in database)
• Concepts from more than one domain:
– Window (graphical representation on monitor)
– Window (part of a building)
• Different concepts for different languages:
– “Kind” (English: sort/type)
– “Kind” (German: child)
Let the user choose: present multiple browsing units
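The cases on this slide (no concept, one concept, several concepts, different concepts per language) fall out naturally if the lexicon maps a term to a list of readings. The sketch below assumes a flat list of lexicon rows; the `ex:` concept identifiers and the helper names are invented for illustration and are not taken from the LT4eL ontology.

```python
from collections import defaultdict

# Hypothetical lexicon rows: (language, term, concept id).
LEXICON_ROWS = [
    ("en", "key", "ex:KeyboardKey"),
    ("en", "key", "ex:DatabaseKey"),
    ("en", "kind", "ex:Type"),
    ("de", "kind", "ex:Child"),
    ("en", "monitor", "ex:Screen"),
    ("en", "screen", "ex:Screen"),
]


def build_index(rows):
    """Map each case-folded term to all of its (language, concept) readings."""
    index = defaultdict(list)
    for lang, term, concept in rows:
        index[term.casefold()].append((lang, concept))
    return index


def concepts_for(term, index):
    """Return every reading of the term; an empty list if it is not in the lexicon."""
    return index.get(term.casefold(), [])


index = build_index(LEXICON_ROWS)
print(concepts_for("Kind", index))     # two readings, one per language
print(concepts_for("monitor", index))  # unique result
print(concepts_for("mouse", index))    # not in this toy lexicon
```

When more than one reading comes back, all of them can be offered to the user as separate browsing units instead of guessing one.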
15
3: Concept → Documents
• Simplest:
– Disjunctive search with ranking
• For each concept, each document that is annotated with it is returned
• Documents with more search concepts are ranked higher
– DISADVANTAGES:
• (too) many results
• slower
• Use super/subconcepts
• Further possibilities
– Conjunctive search:
• Combination of concepts must occur in a document
• Is taken into account by current ranking
– DISADVANTAGES:
• For automatic concept search: concept set might be larger than expected, thus restricting search results too much
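The difference between the two search modes can be shown on a toy annotation index. The index contents, concept identifiers and function names below are assumptions for illustration; the real system looks documents up in its own document store.

```python
# Hypothetical annotation index: concept id -> ids of documents annotated with it.
ANNOTATIONS = {
    "ex:Screen": {"doc1", "doc2"},
    "ex:TextEditor": {"doc2", "doc3"},
}


def disjunctive(concepts):
    """Documents annotated with at least one search concept.

    Documents that contain more of the search concepts come first;
    ties are broken by document id to keep the order stable.
    """
    counts = {}
    for concept in concepts:
        for doc in ANNOTATIONS.get(concept, set()):
            counts[doc] = counts.get(doc, 0) + 1
    return sorted(counts, key=lambda doc: (-counts[doc], doc))


def conjunctive(concepts):
    """Documents annotated with every search concept."""
    sets = [ANNOTATIONS.get(concept, set()) for concept in concepts]
    return sorted(set.intersection(*sets)) if sets else []


query = ["ex:Screen", "ex:TextEditor"]
print(disjunctive(query))  # ['doc2', 'doc1', 'doc3']
print(conjunctive(query))  # ['doc2']
```

Ranking the disjunctive result by the number of matched concepts already puts the conjunctive hits at the top, which is one reading of "is taken into account by current ranking".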
16
3: Concept → Documents (continued)
• How useful is it to find documents that treat a superconcept?
– Negative example: lt4el:Subroutine → lt4el:Software.
– Positive example: lt4el:WebPortal → lt4el:Website.
• How useful is it to find documents that treat a subconcept?
– lt4el:Program has 93 subconcepts, e.g.:
• ApplicationProgram
• Computervirus
• Driver
• Unzip
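One way to keep subconcept expansion under control is to limit how many levels of the hierarchy are followed. The sketch below assumes a simple parent-to-children table; the four children of lt4el:Program are the examples named on this slide (treated here as direct subconcepts, which is an assumption), and the `ex:TextEditor` link is invented to show a second level.

```python
# Hypothetical subconcept table: concept -> direct subconcepts.
SUBCONCEPTS = {
    "lt4el:Program": [
        "lt4el:ApplicationProgram",
        "lt4el:Computervirus",
        "lt4el:Driver",
        "lt4el:Unzip",
    ],
    "lt4el:ApplicationProgram": ["ex:TextEditor"],  # invented link
}


def expand(concept, max_depth=1):
    """Return {subconcept: depth} for all subconcepts up to max_depth levels down."""
    found = {}
    frontier = [concept]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for current in frontier:
            for child in SUBCONCEPTS.get(current, []):
                if child != concept and child not in found:
                    found[child] = depth
                    next_frontier.append(child)
        frontier = next_frontier
    return found


print(len(expand("lt4el:Program")))               # 4
print(expand("lt4el:Program", max_depth=2)["ex:TextEditor"])  # 2
```

The recorded depth could later be used to give more distant subconcepts a lower weight.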
17
4: Ranking
• Number of different search concepts
• Annotation frequency: number of times search concepts are annotated in the document
– Normalise: divide by document length
• Superconcepts and subconcepts of search concepts have lower weight
– A factor determines their weight
• Language of document:
– Sort per language? (currently)
– Sort by ranking throughout (independent of languages)?
– Make language a factor in ranking?
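The slide lists the ranking factors but not how they are combined. The sketch below shows one possible combination, purely as an assumption: the number of distinct search concepts is added to the length-normalised annotation frequency, and annotations of super/subconcepts count with a reduced weight. The value 0.5 and the function name `score` are hypothetical.

```python
RELATED_WEIGHT = 0.5  # assumed factor for super/subconcept annotations


def score(doc_length, direct_counts, related_counts):
    """Score one document for a query.

    doc_length:     document length, e.g. in tokens
    direct_counts:  search concept -> number of annotations in the document
    related_counts: super/subconcept -> number of annotations in the document
    """
    if doc_length <= 0:
        return 0.0
    distinct = sum(1 for n in direct_counts.values() if n > 0)
    weighted = sum(direct_counts.values())
    weighted += RELATED_WEIGHT * sum(related_counts.values())
    # Normalise the frequency part by document length.
    return distinct + weighted / doc_length


# Two search concepts annotated 3 and 2 times, one subconcept 10 times,
# in a document of length 100: 2 + (5 + 0.5 * 10) / 100
print(score(100, {"a": 3, "b": 2}, {"c": 10}))
```

With this combination the number of distinct concepts dominates and the frequency term mostly breaks ties; sorting per language would be applied on top, as a separate step.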
18
Evaluation
• Does semantic search return correct results (appropriate documents)?
• How easy is it to use semantic search?
• Are the results better (precision/recall) than with keyword search or fulltext search (also available in ILIAS)?
– Relevant for monolingual scenario
• Is the learning process improved?
– Depends on quality of ontology and annotation
– In multilingual case: depends on domain knowledge and language knowledge of multilingual test persons
19
Future work
• Display a document fragment for search results, in addition to the title.
– Choose contexts where search concepts occur close together
– More on this Thursday 18:30 at the BIS-21++ information session.
• Integrate a faster document lookup component
• Improve: search term → lexicon entry
• Make use of more relations than super/subconcepts
• Possibly other changes, like:
– Sort differently than per language
20
Thank you