GoogleDictionary
Paul Nepywoda
Alla Rozovskaya
Goal
Develop a tool for English that, given a word, will illustrate its usage
Who Will Benefit
• Learners of English
• Teachers of English
• Native speakers who wish to find common usages of a word
Similar Tools?
Dictionaries
BUT our tool
• focuses on the usage of words and not on defining their meanings
• ranks expressions based on frequency
• extracts examples straight from context
Similar Tools?
Google
BUT our tool
• focuses on finding high-frequency neighboring words instead of simply the documents that contain the target word
Data Resources
Corpus of newspaper articles (3.5 million words) [used for demo]
• Advantage: large amount of data
• Disadvantage: limited domain
Use a search engine to build a corpus of documents containing the target word
• Advantages: various domains, dynamic data source
• Disadvantage: time to download documents
Implementation (1)
Search a corpus to determine the most typical words: extract the words within a certain window of the target word and rank them by frequency
• compute the rank of single words and of pairs of words within the window
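The window-based extraction described above can be sketched in a few lines of Python. This is only an illustration under assumptions the slides do not state: the tokenizer, the function name `window_counts`, the default window size, and the choice to count unordered-in-position but order-preserving pairs drawn from the same window are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    # Assumed tokenizer: lowercase alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", sentence.lower())


def window_counts(sentences, target, window=3):
    """Count single context words and pairs of context words that occur
    within `window` tokens of each occurrence of `target`."""
    singles = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Context = every token in the window except the target position.
            context = [tokens[j] for j in range(lo, hi) if j != i]
            singles.update(context)
            # Pairs keep the left-to-right order of the sentence.
            pairs.update(combinations(context, 2))
    return singles, pairs


sentences = [
    "She took a course in physics.",
    "Of course he agreed.",
    "The course of action was clear.",
]
singles, pairs = window_counts(sentences, "course", window=2)
print(singles.most_common(3))
print(pairs[("of", "action")])
```

Sorting `singles` and `pairs` by count gives the raw frequency ranking; the weighting on the next slide refines it.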
Implementation (2)
Computing rank of expression
• Tf: raw count
• Idf of a word:

$$\mathrm{idf}(w) = \log \frac{\#\,\text{sentences in corpus}}{\#\,\text{sentences containing the word}}$$

• Position Normalization: reward context words closer to the target
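As a rough sketch of how these three factors might fit together, the Python below multiplies a position-weighted count by the sentence-level idf. The slides do not say how the factors are combined, so the product, the 1/distance weight, and the names `rank_context_words` and `use_idf` are assumptions made for illustration only.

```python
import math
import re
from collections import defaultdict


def tokenize(sentence):
    # Assumed tokenizer: lowercase alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", sentence.lower())


def idf(word, tokenized):
    # log(# sentences in corpus / # sentences containing the word)
    containing = sum(1 for toks in tokenized if word in toks)
    return math.log(len(tokenized) / containing)


def rank_context_words(sentences, target, window=3, use_idf=True):
    tokenized = [tokenize(s) for s in sentences]
    weighted_tf = defaultdict(float)
    for toks in tokenized:
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Assumed position normalization: weight 1/distance,
                    # so adjacent words count most.
                    weighted_tf[toks[j]] += 1.0 / abs(j - i)
    scores = {
        w: tf * (idf(w, tokenized) if use_idf else 1.0)
        for w, tf in weighted_tf.items()
    }
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


sentences = [
    "The course of action was clear.",
    "Of course the plan failed.",
    "The course of the river changed.",
    "The weather was fine.",
]
ranked = rank_context_words(sentences, "course", window=1)
no_idf = dict(rank_context_words(sentences, "course", window=1, use_idf=False))
print(ranked)
print(no_idf)
```

In this toy corpus "the" occurs in every sentence, so its idf is zero and it drops to the bottom of the ranking, while without idf it ties with "of". That is the kind of contrast the "with idf" and "without idf" demo examples are meant to show.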
Interface
Output ranked list of expressions with
example sentences via the Web
Examples:
course
information
notorious
come
come (without idf)
Further Improvements
• Use a search engine to build a corpus
• Allow phrase searching
• Provide an option to search for highly frequent phrases as opposed to idiomatic expressions
Conclusion
We have presented a tool that, given a word, will find typical usages of the word in natural language
The tool should be useful for
• learners of English
• native speakers