Invited talk at Processing ROmanian in Multilingual, Interoperational and Scalable Environments (PROMISE 2010) on how to port the QALL-ME framework to a new language
Porting the QALL-ME framework to Romanian
Constantin Orasan
Research Group in Computational Linguistics
Research Institute in Information and Language Processing
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/
29th March 2010
Structure of the presentation
1 Introduction
2 The QALL-ME project
3 Multilingual information access in QALL-ME
4 Conclusions
Need to access information
• as a result of the development of the Internet, more and more information becomes available
• this information is in many languages
• fields from computational linguistics such as automatic summarisation, question answering, text mining, etc. can help people deal with information
Question answering (QA)
• Question answering aims at identifying the answer to a question in a large collection of documents
• the information provided by QA is more focused than that provided by information retrieval
• the output can be the exact answer or a text snippet which contains the answer
• the field took off after the introduction of the QA track at TREC, and cross-lingual QA after its introduction at CLEF
Types of QA systems
• open-domain QA systems: can answer any question from any collection
  + can potentially answer any question
  - very low accuracy (especially in cross-lingual settings)
• canned QA systems: rely on a very large repository of questions for which the answer is known
  + very little processing necessary
  - limited to the answers in the database
• closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them
  + very high accuracy
  - can require extensive language processing and are limited to one domain
Purpose of the presentation
• briefly present the QALL-ME project
• show how it was adapted to answer questions in Romanian about movies
The QALL-ME project
• QALL-ME = Question Answering Learning technologies in a multiLingual and Multimodal Environment
• EU-funded project, part of FP6
• 7 partners:
  • FBK-irst, Italy
  • University of Wolverhampton, UK
  • University of Alicante, Spain
  • DFKI, Germany
  • Comdata, Italy
  • UbiEST, Italy
  • WayCom, Italy
• Web page: http://qallme.fbk.eu
The QALL-ME project
• aimed at establishing a shared infrastructure for multilingual and multimodal QA in the domain of tourism
• In the QALL-ME system
  • users ask natural language questions in several languages (in both textual and speech modality) using a variety of input devices (e.g. mobile phones), and
  • the system returns a list of specific answers formatted in the most appropriate modality, ranging from small texts to maps, videos, and pictures.
[Figure: QALL-ME system architecture. Labelled components: QALLME central QA planner; English, German, Italian and Spanish Answer Extractors; Service Provider; Question Type ontology; Answer Type ontology; Dialog Models; Local Information Sources; Semantic representation; Speech Recognizers.]
Main outputs of the project
• an ontology for the domain of tourism
• entailment based QA framework
• the QALL-ME benchmark
• an entailment framework
(all accessible from the project’s web page: http://qallme.fbk.eu)
The ontology
• A domain-specific ontology for the tourism domain was developed and shared among all the partners.
• The ontology was used to serve as:
  • a bridge between different languages
  • a communication language between different components of the system
• The ontology was linked to domain-independent ontologies such as MultiWordNet and SUMO
• For more information see (Ou et al., 2008)
Design of the ontology
• Analysis of data from content providers
• Analysis of user requirements
• Inspired by similar ontologies:
  • Harmonise and eTourism: focus on static information (e.g. accommodation and events/activities)
  • Similar to eTourism, as it is written in OWL rather than RDFS
  • but with wider coverage
• Introspection
The ontology
• Main classes: Country, Destination, Site (i.e. Accommodation, Attraction, Gastro, and Infrastructure), Transportation, EventContent and Event
• Element classes: Facility, Room, PersonOrganization, Language, and Currency
• Attribute classes: Contact, Location, Period and Price.
• Element and attribute classes cannot exist independently and have to be attached to other main or element classes
[Figure: fragment of the ontology covering cinemas and movies. Class labels: MovieShow, Cinema, Movie, CinemaRoom, TicketPrice, SitePrice, Currency, DateTimePeriod, DatePeriod, TimePeriod, Event, EventContent, Period, Director, Star, Producer, Writer, Contact, PostalAddress, GPSCoordinate, DirectionLocation, SiteFacility, RoomFacility. Property labels: isInSite, hasPrice, hasEventContent, hasPeriod, hasCurrency, hasTimePeriod, hasDatePeriod, hasSiteFacility, hasRoomFacility, hasContact, hasPostalAddress, hasGPSCoordinate, hasRoom, hasWriter, hasDirector, hasProducer, hasStar, subClassOf, name, description, genre, certificate, synposis, priceType, priceValue, startDate, endDate, startTime, endTime.]
The ontology
• Encoded using OWL DL, since it has more expressive power than OWL Lite and more efficient reasoning support than OWL Full
• Used Protege-OWL as the editor and RacerPro7 as the reasoner
• The ontology contains
  • 122 classes (concepts),
  • 55 datatype properties,
  • 52 object properties, which indicate the relationships among the 122 classes, and
  • 15 top-level classes.
• The class hierarchy has a maximum depth of 4.
The QALL-ME framework
• is an architecture skeleton for multilingual QA systems for closed domains
• designed in such a way that it allows fast development of closed-domain QA systems
• freely available from http://qallme.sourceforge.net/
• is based on a Service Oriented Architecture (SOA) which is realised using web services
• relies on textual entailment recognisers
Web services
1 Context providers: are used to anchor questions in space and time
2 Annotators: currently three types of annotators are available:
  • named entity annotators, which identify names of cinemas, movies, persons, etc.
  • term annotators, which identify hotel facilities, movie genres and other domain-specific terminology
  • temporal annotators, which recognise and normalise temporal expressions in user questions
3 Entailment engine: determines whether a user question entails a retrieval procedure
4 Query generator: relies on an entailment engine to generate a query to extract the answer
5 Answer pool: retrieves the answers from a database
Context providers
• are used to anchor a question in space and time
• return the current position and time
• used by the presentation module when maps are displayed
• used by the temporal annotator to normalise temporal expressions
• determine which services are used in a cross-lingual scenario
• can be static or determined from a mobile phone
Named entity and term annotators
• named entity recogniser = identifies names of hotels, movies, persons, etc.
• term annotator = identifies domain-specific terms such as hotel facilities, movie genres, etc.
• the entities and terms are known, so the task is reduced to a database look-up
• Gazetteers are the main source for determining the entities
• The annotation module needs to determine the canonical form of an entity
• matching uses character-based similarity, a modified TF*IDF and a greedy algorithm
• overlapping annotations are not allowed, and there are few ambiguities
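A minimal Python sketch of the gazetteer look-up outlined above. It keeps only the character-based similarity and the greedy, non-overlapping matching; the modified TF*IDF weighting is left out. The gazetteer entries, the 0.85 threshold and the function names are assumptions made for illustration and are not taken from the QALL-ME code.

```python
from difflib import SequenceMatcher

# Hypothetical gazetteer: canonical entity name -> entity type.
GAZETTEER = {
    "Becoming Jane": "MOVIE",
    "Saw III": "MOVIE",
    "Wolverhampton": "DESTINATION",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-based similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(question: str, threshold: float = 0.85):
    """Greedily annotate non-overlapping spans, trying longer spans first.

    A span is only compared with gazetteer entries that have the same
    number of tokens, so "in Wolverhampton" cannot match "Wolverhampton".
    Returns (surface form, canonical form, entity type) tuples.
    """
    tokens = question.rstrip("?").split()
    max_len = max(len(name.split()) for name in GAZETTEER)
    annotations, i = [], 0
    while i < len(tokens):
        best = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            for name, entity_type in GAZETTEER.items():
                if len(name.split()) != n:
                    continue
                score = similarity(span, name)
                if score >= threshold and (best is None or score > best[0]):
                    best = (score, span, name, entity_type, n)
            if best:
                break  # greedy: accept the longest span that matches
        if best:
            annotations.append((best[1], best[2], best[3]))
            i += best[4]  # skip the matched tokens, so spans never overlap
        else:
            i += 1
    return annotations


print(annotate("Where can I see becaming Jane in Wolverhampton?"))
```

The misspelt "becaming Jane" is mapped to the canonical form "Becoming Jane" because the two strings differ in a single character.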
Named entity and term annotators
• Annotates both standard and non-standard entities: cinema, movie, location, genre, certificate
• Needs to deal with noisy input:
  • misspelt words/input from ASR engines/SMS input, e.g. becaming Jane, becoming Jade
  • free word order (Will Smith / Smith, Will)
  • equivalent strings (saw III / three / 3; Smith, Will / Smith, W.)
• Needs to deal with questions in mixed languages
• Needs to deal with ambiguous entities
Temporal annotator
• questions from the domain of tourism contain a large number of temporal expressions
• we use a simplified version of the tagger implemented by Puscasu (2004)
• the simplification was done to reduce the processing time (Varga, Puscasu, and Orasan, 2009)
• identifies both self-contained temporal expressions (TEs) and indexical/under-specified TEs
• uses the TIMEX2 standard
• the output is used by the TIMEX2SPARQL service to restrict the extracted answers
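To illustrate how an indexical expression can be anchored to the date returned by a context provider, here is a small Python sketch. The lookup table and the function name are hypothetical, and the actual tagger covers far more expressions; the values follow the TIMEX2 convention of an ISO date with a part-of-day suffix such as TNI for night.

```python
from datetime import date, timedelta

# Hypothetical lookup: expression -> (day offset, TIMEX2 part-of-day suffix).
INDEXICAL = {
    "today": (0, ""),
    "tonight": (0, "TNI"),
    "tomorrow": (1, ""),
    "tomorrow morning": (1, "TMO"),
}


def normalise(expression: str, reference: date) -> str:
    """Resolve an indexical expression against the reference date."""
    key = " ".join(expression.lower().split())
    if key not in INDEXICAL:
        raise ValueError(f"unsupported temporal expression: {expression!r}")
    offset, suffix = INDEXICAL[key]
    return (reference + timedelta(days=offset)).isoformat() + suffix


print(normalise("tonight", date(2010, 3, 29)))
```

The same expression yields a different value on a different day, which is why the annotator needs the context provider.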
Entailment engine
• closed-domain QA systems often transform a question into a Prolog fact or an SQL query
• this solution often works only partially due to language variability
• in QALL-ME this problem is solved using textual entailment
• the entailment engine determines whether two questions have the same meaning and can therefore share the same retrieval procedure:
  • T is the input question
  • H is a textual pattern stored in a repository
  • textual patterns have SPARQL retrieval procedures
• we calculate the similarity between two sentences to determine whether there is an entailment relation between them
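The idea of selecting a pattern H for a question T by similarity can be sketched as follows. This is an illustration only: the slides do not say which similarity measure is used, so token-level Jaccard overlap stands in for it here, and the slot-compatibility filter, the 0.3 threshold and all function names are assumptions.

```python
import re

TOKEN = re.compile(r"\[\w+\]|\w+")  # slots such as [TIMEX], or plain words


def tokens(text: str) -> set:
    return set(TOKEN.findall(text.lower()))


def slots(text: str) -> set:
    return {t for t in tokens(text) if t.startswith("[")}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def best_pattern(question: str, patterns: list, threshold: float = 0.3):
    """Return the most similar pattern with the same slots, or None."""
    q_tokens, q_slots = tokens(question), slots(question)
    candidates = [
        (jaccard(q_tokens, tokens(p)), p)
        for p in patterns
        if slots(p) == q_slots  # a pattern must be fillable from the question
    ]
    if not candidates:
        return None
    score, pattern = max(candidates)
    return pattern if score >= threshold else None


PATTERNS = [
    "Who is the director of [MOVIE]?",
    "Where can I see [MOVIE] [TIMEX]?",
    "What movies are on in [DESTINATION] [TIMEX]?",
    "What is the address of [CINEMA]?",
]
print(best_pattern("What movie can I see [TIMEX] in [DESTINATION]?", PATTERNS))
```

Without the slot filter, plain word overlap would prefer "Where can I see [MOVIE] [TIMEX]?", which shows why a surface similarity measure alone can be misleading.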
Query generation service
• produces a SPARQL query that can be used to answer the question
• has a list of question templates with their associated SPARQL queries
• relies on the entailment engine to determine which of the question patterns has the same meaning as the user question
• fills in the slots of the question patterns
Example
User question (T): What movie can I see tonight in Wolverhampton? → What movie can I see [TIMEX] in [DESTINATION]?
List of patterns (H):
• Who is the director of [MOVIE]?
• Where can I see [MOVIE] [TIMEX]?
• What movies are on in [DESTINATION] [TIMEX]?
• What is the address of [CINEMA]?
• . . .
Select the retrieval pattern associated with the question: What movies are on in Wolverhampton tonight?
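Once a pattern has been selected, its retrieval procedure can be instantiated with the annotated values. The Python sketch below shows one way to do this with a SPARQL template. The `ex:` namespace, the property names `ex:onDate` and `ex:inDestination`, and the graph shape are hypothetical, loosely modelled on the class and property names of the ontology, and do not reproduce the project's actual queries.

```python
from string import Template

# Hypothetical repository: question pattern -> SPARQL template.
RETRIEVAL_PROCEDURES = {
    "What movies are on in [DESTINATION] [TIMEX]?": Template(
        """PREFIX ex: <http://example.org/tourism#>
SELECT ?title WHERE {
  ?show a ex:MovieShow ;
        ex:hasEventContent ?movie ;
        ex:isInSite ?cinema ;
        ex:onDate "$TIMEX" .
  ?movie ex:name ?title .
  ?cinema ex:inDestination "$DESTINATION" .
}"""
    ),
}


def generate_query(pattern: str, slot_values: dict) -> str:
    """Fill the slots of the SPARQL template attached to a pattern."""
    return RETRIEVAL_PROCEDURES[pattern].substitute(slot_values)


query = generate_query(
    "What movies are on in [DESTINATION] [TIMEX]?",
    {"DESTINATION": "Wolverhampton", "TIMEX": "2010-03-29"},
)
print(query)
```

Plain string substitution is enough for a sketch; a real service would need to escape the slot values before placing them in a query.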
Answer Pool service
• takes the SPARQL query generated by the query generator and extracts the answer
• SPARQL is a query language for accessing RDF graphs, developed by the W3C RDF Data Access Working Group
• SPARQL provides interoperability between languages
Cross-lingual QA
• the QALL-ME tourism prototype is designed to allow both monolingual and cross-lingual QA
• relevant web services are activated depending on the source and target language
• user scenario: a Romanian tourist in the UK who wants to find out more about the movies in Wolverhampton
Prototype for Romanian
• we wanted to find out how long it takes to develop a demo for Romanian
• components had to be adapted:
  • named entity and term annotators had to be trained on a different list of entities
  • a simple temporal annotator was implemented on the basis of the English one
  • the language-independent similarity entailment engine was used
  • the question patterns were translated to Romanian
  • the answer pool did not require any change
• the whole process took under one week
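The reason the answer pool needs no change can be illustrated with a small Python sketch: each language has its own pattern repository, but translated patterns point to the same language-independent retrieval procedure. The procedure identifiers are hypothetical, and the Romanian patterns are illustrative translations written for this example, not the ones used in the demo.

```python
# Hypothetical per-language repositories: question pattern -> procedure id.
PATTERNS = {
    "en": {
        "What movies are on in [DESTINATION] [TIMEX]?": "movies_in_destination",
        "Who is the director of [MOVIE]?": "director_of_movie",
    },
    "ro": {
        "Ce filme rulează în [DESTINATION] [TIMEX]?": "movies_in_destination",
        "Cine este regizorul filmului [MOVIE]?": "director_of_movie",
    },
}


def procedure_for(language: str, pattern: str) -> str:
    """Look up the retrieval procedure attached to a pattern."""
    return PATTERNS[language][pattern]


english = procedure_for("en", "What movies are on in [DESTINATION] [TIMEX]?")
romanian = procedure_for("ro", "Ce filme rulează în [DESTINATION] [TIMEX]?")
print(english == romanian)
```

Adding a language then amounts to adding one more repository of translated patterns, while the queries and the data stay as they are.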
Romanian demo
http://qallme.wlv.ac.uk:8080/QALL-ME-web-demo/index.jsp
Conclusions
• multilinguality is a very important issue for the QALL-ME project
• the ontology constitutes the bridge between languages
• the QALL-ME framework can be used to quickly develop prototypes for other languages
Thank you!
References
Ou, Shiyan, Viktor Pekar, Constantin Orasan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May 28 – 30.
Puscasu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26 – 28.
Varga, Andrea, Georgiana Puscasu, and Constantin Orasan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29 – 32, Cluj-Napoca, Romania, July 2 – 4.