Communicative evolution: from strings to words to expressions to concepts to intentions
Piek Vossen
©Irion Technologies
ICT Kenniscongres, April 11th, 2006
Irion Technologies
The company
• Founded in 2000 as a spin-off from TNO Multimedia Technology
• 5 investors: Parcom Ventures, FLV, Twinning, TNO, Van Dale
The people
• About 10 language technology and computer science specialists
• Collaboration with teams at Van Dale and TNO
The mission
• Equal access to the knowledge and information on the Internet to all people, regardless of language and background
• Develop systems that understand language
Product TwentyOne
[Architecture diagram; labels only]
Sources: Web documents, Paper documents, Word Processor Documents, Databases, AV Documents
Capture: Crawl, Copy, Convert, Split
XML
Conceptual Indexing (NLP): Concept extraction, Translation, Indexing, Classification, Summarization
Publishing Platform, Search, Cockpit, Dialogue, Match / Mining
[Diagram: Information Seeker and Information Provider]
Information Seeker: Concept, Expression in language, Query (Strings): "Words…."
Information Provider: Concept, Expression in language, Information (Strings): "….Words"
Index of Strings: ape, …, energy, …, mass, …, zebra
[Diagram: Information Seeker and Information Provider]
Information Seeker: Concept, Expression in language, Query (Strings): "my cell phone…."
Information Provider: Concept, Expression in language, Information (Strings): "….mobile"
Index of Strings: ape, …, mobile, …, zebra
Result: Conceptual match, Linguistic mismatch
[Diagram: Information Seeker and Information Provider]
Information Seeker: Concept, Expression in language, Query (Strings): "my cell phone…."
Information Provider: Concept, Expression in language, Information (Strings): "….nerve cells"
Index of Strings: ape, …, cell, …, zebra
Result: Conceptual mismatch, Linguistic match
[Diagram: Information Seeker and Information Provider]
Information Seeker: Concept, Expression in language, Query (Strings): "police cell…."
Information Provider: Concept, Expression in language, Information (Strings): "….nerve cells"
Index of Strings: ape, …, cell, …, zebra
Result: Conceptual mismatch, Linguistic match
[Diagram: Information Seeker and Information Provider]
Information Seeker: Concept, Expression in language, Query (Strings): "neuron…."
Information Provider: Concept, Expression in language, Information (Strings): "….nerve cells"
Index of Strings: ape, …, cell, …, zebra
Result: Conceptual match, Linguistic mismatch
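The four seeker/provider cases on the preceding slides can be replayed in a few lines of Python. This is only a minimal sketch under stated assumptions: the `LEXICON` table, its concept labels (`MOBILE_PHONE`, `NEURON`, `JAIL`) and the crude plural stripping are all hypothetical and stand in for a real semantic network and linguistic analysis, which the slides do not specify.

```python
# Hypothetical lexicon: maps an expression to a concept label (assumption, not from the slides).
LEXICON = {
    "my cell phone": "MOBILE_PHONE",
    "mobile": "MOBILE_PHONE",
    "nerve cells": "NEURON",
    "neuron": "NEURON",
    "police cell": "JAIL",
}


def tokens(text):
    """Lowercase, split on whitespace, and strip a trailing plural 's' (very crude stemming)."""
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in text.lower().split()}


def string_match(query, info):
    """Linguistic match: query and information share at least one string."""
    return bool(tokens(query) & tokens(info))


def concept_match(query, info):
    """Conceptual match: both expressions map to the same concept label."""
    q, d = LEXICON.get(query), LEXICON.get(info)
    return q is not None and q == d


# The four query/information pairs shown on the slides.
CASES = [
    ("my cell phone", "mobile"),
    ("my cell phone", "nerve cells"),
    ("police cell", "nerve cells"),
    ("neuron", "nerve cells"),
]

for query, info in CASES:
    print(f"{query!r} vs {info!r}: "
          f"linguistic={string_match(query, info)}, conceptual={concept_match(query, info)}")
```

In this toy setup an index of strings retrieves "nerve cells" for "my cell phone" and misses "mobile", while the concept lookup does the opposite, which is the contrast the slides illustrate.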
Recall & Precision
Query: “cell”
Search engine for a database with all documents: “cell phone”, “mobile phones”, “nerve cell”, “police cell”
[Venn diagram: found, intersection, relevant]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
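As a worked example of the two ratios, the sketch below runs the slide's query "cell" over the four document strings with a plain string match. It assumes, purely for illustration, that the seeker is looking for phones, so the relevant set is "cell phone" and "mobile phones"; the slide itself does not say which documents are relevant.

```python
def recall_precision(found, relevant):
    """Return (recall, precision) for two sets of documents."""
    intersection = found & relevant
    recall = len(intersection) / len(relevant) if relevant else 0.0
    precision = len(intersection) / len(found) if found else 0.0
    return recall, precision


documents = ["cell phone", "mobile phones", "nerve cell", "police cell"]
query = "cell"

# Assumption for this example: the seeker wants information about phones.
relevant = {"cell phone", "mobile phones"}

# A string-based engine finds every document that contains the query word.
found = {doc for doc in documents if query in doc.split()}

recall, precision = recall_precision(found, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.50 precision=0.33
```

The string match finds three documents but only one of them is relevant, and it misses "mobile phones" entirely, so both measures suffer at once.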
Language technology: a hole in one!
[Diagram; labels only]
Techniques: Linguistic analysis; Synonyms, thesaurus; Semantic network
Example terms: golfclubs, golfclub(s), golf sticks, clubs for golf, at the club, Tiger Woods
Information systems lack a communicative model
Language is an instrument for communication:
• Not fully descriptive
• Minimal & sufficient information for a communicative effect
• Speakers/writers make assumptions about the addressee:
    • Knowledge of the world
    • Knowledge of language
    • Knowledge about the communicative settings
Communicative models in a robust and scalable system
Index of concepts instead of strings
• Meaning of a word in context:
    • Domain of the document: Juventus => football
    • Topic of the paragraph: transfer scandal => business, crime
    • Phrase: linguistically-motivated combination of words: [wing player] (football player) in [police cell] (jail)
• Topic of the query: Can I order chicken wings? => food
    • Phrase: [chicken wings] (dish)
• Multilingual semantic networks in many languages to map words to concepts
• Concept matching calculus for comparing query phrases with phrases in documents
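The idea of indexing concepts rather than strings, with phrases and domain deciding the meaning of a word, might be sketched as follows. Everything in the tables is an assumption made for illustration: the phrase entries are taken from the slide's examples, but the `WORD_SENSES` table, the domain names and the two-word phrase window are hypothetical and are not the product's actual concept matching calculus.

```python
# Phrase examples from the slide, mapped to the concept shown next to them.
PHRASES = {
    ("wing", "player"): "football player",
    ("police", "cell"): "jail",
    ("chicken", "wings"): "dish",
}

# Hypothetical single-word senses, selected by the domain of the document or query.
WORD_SENSES = {
    "cell": {"telecom": "mobile phone", "biology": "nerve cell", "crime": "jail"},
}


def to_concepts(text, domain=None):
    """Map a text to a list of concepts: phrases first, then domain-specific word senses."""
    words = [w.strip("?.,!").lower() for w in text.split()]
    concepts = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            # A known two-word phrase fixes the meaning regardless of domain.
            concepts.append(PHRASES[pair])
            i += 2
            continue
        senses = WORD_SENSES.get(words[i])
        if senses and domain in senses:
            # An ambiguous single word is resolved by the domain, if one is known.
            concepts.append(senses[domain])
        i += 1
    return concepts


print(to_concepts("Can I order chicken wings?"))
print(to_concepts("wing player in police cell"))
print(to_concepts("my cell is broken", domain="telecom"))
```

Matching a query against a document then reduces to comparing the two concept lists instead of the raw strings; without a phrase or a domain, an ambiguous word such as "cell" is deliberately left unresolved here.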
Communicative models in a robust and scalable system
Dialogue system that cooperates with the user:
• Detect intention: complaint, buy, support, information
• Measure satisfaction: happy, emotions
• Avoid deadlocks:
    • Detect vagueness or ambiguity (which meaning of cell?)
    • Detect topic shifts
    • Handle negative information: “No phones, I want jails!”
    • Allow the user to change perspective
    • Ask the user for help, directions, confirmation and explication
Create more context than simple keywords and deliver more precision: answers instead of hits.
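Two of the deadlock-avoidance points, detecting ambiguity and handling negative information, can be illustrated with a small information state that tracks positives and negatives. This is a hedged sketch, not the system's implementation: the `SENSES` inventory, the `InformationState` class, the `next_move` function and the "No X, I want Y" parsing are all hypothetical simplifications.

```python
from dataclasses import dataclass, field

# Hypothetical sense inventory for the ambiguous word used on the slides.
SENSES = {"cell": {"phones", "jails", "neurons"}}


@dataclass
class InformationState:
    positives: set = field(default_factory=set)
    negatives: set = field(default_factory=set)

    def update(self, utterance):
        """Record rejected and requested terms from a 'No X, I want Y!' style utterance."""
        for clause in utterance.lower().replace("!", "").split(","):
            words = clause.split()
            if not words:
                continue
            if words[0] == "no":
                self.negatives.update(words[1:])
            elif "want" in words:
                self.positives.update(words[words.index("want") + 1:])

    def candidate_senses(self, word):
        """Drop rejected senses; prefer explicitly requested ones if any remain."""
        senses = SENSES.get(word, set()) - self.negatives
        return (senses & self.positives) or senses


def next_move(state, word):
    """Ask for clarification while the word is ambiguous, otherwise search."""
    senses = state.candidate_senses(word)
    if len(senses) > 1:
        return f"Which meaning of '{word}' do you mean: {', '.join(sorted(senses))}?"
    if senses:
        return f"Searching for '{word}' in the sense: {next(iter(senses))}"
    return f"I have no information about '{word}'. Can you rephrase?"


state = InformationState()
print(next_move(state, "cell"))           # ambiguous: the system asks the user
state.update("No phones, I want jails!")  # negative and positive information
print(next_move(state, "cell"))           # one sense left: the system searches
```

Keeping negatives in the state means a rejected reading is not offered again later in the dialogue, which is one simple way to avoid the loop the slide calls a deadlock.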
Dialogue system
[Architecture diagram; labels only]
Components: Classifier Engine, Dialogue Manager, Retrieval Engine, Utterance Typer
Data: Concepts, Phrases, Facts (Model, Price, In stock)
User Model: Intention, Satisfaction, Emotion
Information State: Positives, Negatives, Relations
Example concepts: cell phone, accessories, repair, support
Example dialogue:
• Can I help you?
• My head phone does not work?
• I want to buy a new one.
• Are you looking for support or products?
• Can you tell me more about the product?
• It is for my cell phone
• Can you give me more details?
• It is a Nokia 338
• I found the following accessories for you. Please have a look.
• That's not what I want
Research & Development
Starting point in 2000:
• Retrieval technology from TNO (10 years of research)
• Language resources from Van Dale (decades of work)
Research projects:
• MEANING (IST-2001-34460), 2002-2005
• PIDGIN (CIC-programme), 2002-2004
• Global Wordnet Association, 2000 - ongoing
• Aarhus (Provincie Gelderland ICT), 2005-2006
• Kenniswijk (Senter Novem), 2005-2006
• Gemeente Connect (STEVIN), 2005-2006
• Cornetto (STEVIN), 2006-2008
MEANING (IST-2001-34460)
• Funded by the EU, 2002-2005
• Conceptual index and conceptual matching
• Extended search engine (EN, NL, DE, FR, IT, ES) to cover also Basque and Catalan
• Doubled (!) production in end-user scenario of Spanish publisher EFE
• Extended in Aarhus project
PIDGIN
• Funded in the CIC-programme, 2002-2004
• Cross-lingual chat application English-Dutch
• Sign-post dialogue system that searches information:
    • In a collaborative task between user and machine
    • Without the need to build a model of the world
    • Can be applied to unlimited amounts of unstructured data
• Extension: Kenniswijk, GemeenteConnect
Global Wordnet Association: http://www.globalwordnet.org
• Stimulates the development and interlinking of semantic networks for all languages in the world
• World-wide Semantic Grid: mapping of all languages to a single set of concepts
• Currently 39 languages and extending
• Bi-annual conference: India (2002), Czech (2004), Korea (2006)
• Extended for Dutch in Cornetto