GoogleDictionary Paul Nepywoda Alla Rozovskaya

GoogleDictionary

Paul Nepywoda

Alla Rozovskaya

Goal

Develop a tool for English that, given a word, will illustrate its usage

Who Will Benefit

• Learners of English
• Teachers of English
• Native speakers who wish to find common usages of a word

Similar Tools?

Dictionaries

BUT our tool:
• focuses on the usage of words rather than on defining their meanings
• ranks expressions based on frequency
• extracts examples directly from context

Similar Tools?

Google

BUT our tool:
• focuses on finding high-frequency neighboring words rather than simply the documents that contain the target word

Data Resources

Corpus of newspaper articles (3.5 million words) [used for demo]
• Advantage: large amount of data
• Disadvantage: limited domain

Use a search engine to build a corpus of documents containing the target word
• Advantages: various domains, dynamic data source
• Disadvantage: time needed to download documents

Implementation (1)

Search a corpus to determine the most typical words: extract the words within a given window around the target word and rank them by frequency
• Compute the rank of single words and of pairs of words within the window
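The windowed counting described above can be sketched in Python. This is a minimal sketch, not the authors' implementation. The regex tokenizer, the function name `window_counts`, and the default window of 3 tokens on each side are assumptions. Pairs are counted as position-ordered combinations of the context words that fall in the same window.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    # Lowercased word tokens from a simple regex (an assumption, not the authors' tokenizer).
    return re.findall(r"[a-z']+", sentence.lower())


def window_counts(sentences, target, window=3):
    """Count single context words and context-word pairs within `window` tokens of `target`."""
    singles = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Every token in the window except the target occurrence itself.
            context = [t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i]
            singles.update(context)
            # Pairs of context words from the same window, kept in sentence order.
            pairs.update(combinations(context, 2))
    return singles, pairs
```

Calling `window_counts(sentences, "course")` and then `singles.most_common(10)` gives the top single-word neighbors. The same call on `pairs` gives the top two-word combinations.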

Implementation (2)

Computing the rank of an expression
• Tf: raw count
• Idf of a word:

$$\mathrm{idf}(w) = \log \frac{\#\ \text{sentences in corpus}}{\#\ \text{sentences containing } w}$$

• Position normalization: reward context words closer to the target word
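Here is a minimal Python sketch of this ranking. The slides do not say exactly how position normalization works or how the three factors are combined. This sketch assumes that each co-occurrence at distance d tokens from the target contributes 1/d to the word's count. That position-weighted count is then multiplied by the sentence-level idf above, using the natural log. The weighting scheme, the function name `rank_context_words`, and its `use_idf` switch are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    # Simple lowercase regex tokenizer (an assumption for this sketch).
    return re.findall(r"[a-z']+", sentence.lower())


def rank_context_words(sentences, target, window=3, use_idf=True):
    """Rank words near `target` by position-weighted tf, optionally times idf.

    Assumed (not from the slides): a co-occurrence at distance d adds 1/d
    to the word's count, and idf uses the natural logarithm.
    """
    tokenized = [tokenize(s) for s in sentences]
    n_sentences = len(tokenized)
    # Number of sentences containing each word (the idf denominator).
    sentence_freq = Counter(w for toks in tokenized for w in set(toks))

    weighted_tf = defaultdict(float)
    for toks in tokenized:
        for i, tok in enumerate(toks):
            if tok != target:
                continue
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    # Position normalization: closer words receive larger weight.
                    weighted_tf[toks[j]] += 1.0 / abs(j - i)

    scores = {}
    for word, tf in weighted_tf.items():
        idf = math.log(n_sentences / sentence_freq[word]) if use_idf else 1.0
        scores[word] = tf * idf
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A word that occurs in every sentence gets idf = log 1 = 0. As a result, very common function words such as "the" drop to the bottom when idf is on. With `use_idf=False`, they can end up near the top.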

Interface

Output: a ranked list of expressions with example sentences, via the Web

Examples:

course

information

notorious

come

come (without idf)
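Below is one way to pair each ranked expression with example sentences, sketched in Python. It produces plain-text output rather than a Web page. The helper names `examples_for` and `render`, the 3-token window, and the limit of two examples per word are illustrative assumptions. `render` accepts any list of `(word, score)` pairs, such as the output of a ranking step.

```python
import re


def tokenize(sentence):
    # Simple lowercase regex tokenizer (an assumption for this sketch).
    return re.findall(r"[a-z']+", sentence.lower())


def examples_for(sentences, target, context_word, window=3, limit=2):
    """Return up to `limit` sentences where `context_word` is within `window` tokens of `target`."""
    found = []
    for sentence in sentences:
        toks = tokenize(sentence)
        targets = [i for i, t in enumerate(toks) if t == target]
        contexts = [j for j, t in enumerate(toks) if t == context_word]
        if any(0 < abs(i - j) <= window for i in targets for j in contexts):
            found.append(sentence)
            if len(found) == limit:
                break
    return found


def render(ranked, sentences, target, top_k=5, window=3):
    """Format the top-ranked context words, each followed by its example sentences."""
    lines = []
    for word, score in ranked[:top_k]:
        lines.append(f"{word} ({score:.2f})")
        for example in examples_for(sentences, target, word, window=window):
            lines.append("    " + example)
    return "\n".join(lines)
```

A Web front end could wrap the same lines in HTML. The selection logic would stay the same.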

Further Improvements

• Use a search engine to build a corpus
• Allow phrase searching
• Provide an option to search for highly frequent phrases as opposed to idiomatic expressions

Conclusion

We have presented a tool that, given a word, finds typical usages of that word in natural language

The tool should be useful for:
• learners of English
• native speakers