WSD for Applications
Bill Dolan
SenseEval 2004
Where is WSD useful?
  Lots of work in the field, but still no clear answer
  Where WSD = classical, dictionary-sense resolution
Intuitive Motivations
  Automates something we already do with dictionaries
  Many applications seem to require WSD
    Information Retrieval/Question Answering
    Cross-language information retrieval
    Information extraction
    Proofing tools, e.g. synonym replacement
    Translation
Pragmatic Motivations
  Splitting off WSD yields a pleasing division of the NLP problem space
    manageable in size
    clear success metrics
    readily available training data: annotated and unannotated
But where are the applications?
  Why is it so hard to find a convincing app?
  Hopeful answer: the quality bar just hasn’t been met yet
    But even experimentally, little/no evidence that WSD helps any application
  Alternatively: maybe we’re trying to automate the wrong task
    Then what is the right task?
An Application-centric view
  What do apps actually need?
    Information Retrieval/Question Answering
    Cross-language information retrieval
    Information extraction
    Proofing tools, e.g. synonym replacement
    Translation
  Not a sense, a cluster of related words, etc.
  Instead:
    The ability to map one string into another that’s superficially distinct
    Regardless of length or language
    Paraphrase
Question Answering
The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
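As a rough illustration of how little surface text these four reports share, here is a minimal Python sketch. It assumes lowercase word sets compared by Jaccard overlap as a crude stand-in for surface similarity. The names `tokens`, `jaccard` and `pairwise_overlap` are hypothetical, and the measure is not one proposed in the talk.

```python
import re
from itertools import combinations

# The four example sentences from the slide, copied verbatim.
SENTENCES = [
    "The genome of the fungal pathogen that causes Sudden Oak Death "
    "has been sequenced by US scientists",
    "Researchers announced Thursday they've completed the genetic blueprint "
    "of the blight-causing culprit responsible for sudden oak death",
    "Scientists have figured out the complete genetic code of a virulent "
    "pathogen that has killed tens of thousands of California native oaks",
    "The East Bay-based Joint Genome Institute said Thursday it has unraveled "
    "the genetic blueprint for the diseases that cause the sudden death of oak trees",
]


def tokens(text):
    """Lowercase word set: an assumed, deliberately crude view of surface form."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Words shared by both sentences divided by words used in either."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pairwise_overlap(sentences):
    """Jaccard overlap for every unordered pair, keyed by sentence indices."""
    return {
        (i, j): jaccard(sentences[i], sentences[j])
        for i, j in combinations(range(len(sentences)), 2)
    }


if __name__ == "__main__":
    for (i, j), score in pairwise_overlap(SENTENCES).items():
        print(f"sentence {i + 1} vs sentence {j + 1}: {score:.2f}")
```

Every pair shares well under half of its vocabulary, although all four sentences describe the same event. That gap between shared meaning and shared wording is what a paraphrase capability would have to bridge.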
Information Extraction
The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
Cross-lingual Information Retrieval
The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
Proofing: rewriting tool
The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
A different take on the problem
  What’s missing is a basic enabling technology
    Paraphrase identification/generation capability
  The applications for WSD that have been suggested over the years really need more general paraphrase identification/generation skills
    Resolving lexical associations is just one aspect of this
  Problem begins to look more like an MT problem
    Map one chunk of text to another, similar or not
    Not clear that explicit WSD is useful
Some Apps
  Machine Translation
    Data-driven techniques predominate, work pretty well
    No explicit WSD, just learned associations between bilingual pairings
    Lexical mappings learned through statistical association
      not perfect, but given the right data, pretty good
    Different language pairs require different sense breakdowns
    Paraphrase/MT are the same problem
  Cross-language IR
    What else but MT?
  Proofing tools, e.g. thesaurus-level replacements
    But often not terribly useful; as any writer knows, there’s usually no good synonym, and a complete rewrite is necessary
  Question Answering/IR
    Map a query to a piece of semantically similar but potentially formally distinct text
  For all of these apps, problem is less individual words than whole sequences
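The Machine Translation point above, that lexical mappings can be learned through statistical association without any explicit sense inventory, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the four English-French sentence pairs are invented for illustration, the Dice coefficient over sentence-level co-occurrence is one simple association measure chosen here (not necessarily what any MT system uses), and `train` and `dice` are hypothetical names.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus, invented for this illustration only.
PAIRS = [
    ("the bank approved the loan", "la banque a approuvé le prêt"),
    ("the bank raised interest rates", "la banque a relevé les taux d'intérêt"),
    ("we walked along the river bank", "nous avons marché le long de la rive du fleuve"),
    ("the river bank was muddy", "la rive du fleuve était boueuse"),
]


def train(pairs):
    """Count, per sentence pair, which source and target words occur together."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update(product(src_words, tgt_words))
    return src_count, tgt_count, joint


def dice(src_word, tgt_word, model):
    """Dice coefficient: 2 * co-occurrences / (source count + target count)."""
    src_count, tgt_count, joint = model
    total = src_count[src_word] + tgt_count[tgt_word]
    return 2 * joint[(src_word, tgt_word)] / total if total else 0.0


MODEL = train(PAIRS)

if __name__ == "__main__":
    for src, tgt in [("bank", "banque"), ("bank", "rive"), ("river", "fleuve")]:
        print(f"{src} -> {tgt}: {dice(src, tgt, MODEL):.2f}")
```

In this toy data "bank" associates equally with "banque" and "rive", so the single word cannot settle the mapping, while neighbouring words such as "river" or "loan" associate strongly with one side only. No dictionary senses are involved at any point, and the sketch also shows why whole sequences, rather than individual words, carry the needed information.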
Direction?
  The applications that have been suggested for WSD are all just aspects of the larger paraphrase problem
    Even MT is a paraphrase problem, though a bit more extreme than the monolingual case
  Focus on the broader paraphrase problem, rather than on individual words