What it's like to do a Master's thesis with me (Ted Pedersen)
http://www.d.umn.edu/~tpederse
September 16, 2013

Some thoughts on what it's like to do a Master's thesis with me, including general ideas about research, my research interests, and a few suggestions as to what will lead to success

Outline

● What is research?
● What are my interests?
● What do you need to do to succeed?
● A little bit about previous students
● Comments on reading I've provided

Research

What is research?

Asking questions about the world where the answers are interesting, whether they are positive or negative

Interesting?

● Can I implement this algorithm?
– Important and interesting to you, but not that significant to the rest of us
● Can I improve this algorithm to run in linear time (rather than exponential)?
– Great if you succeed, but if you fail...?
● Can I show this problem is inherently exponential and can't be improved upon?
– Might be a winner, assuming that the answer is still unknown and the problem is of general interest

Interesting?

● My method is 67% accurate. Their method is 62% accurate.
– Hurrah! Yawn. Nice but incomplete.
– What do we now know about the world because of this?
● I've reimplemented Smith's method and added to it a new kind of feature. This has improved Smith's result by 5%.
– Plausible, assuming we can clearly show the improvement is due to the new feature

Interesting!

● Does knowing the part of speech of preceding words help us predict the meaning of a word?
– Yes. Tells us that syntax and semantics are connected, and that syntactic clues are important to semantics.
– No. Suggests that syntax and semantics are disconnected.
● Imagine that this is the feature we added to Smith's method
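
To test the question above, "the part of speech of preceding words" has to become a concrete feature. Here is a minimal Python sketch. It assumes the text is already tagged (the Penn Treebank-style tags below are assigned by hand), uses a window of two preceding words, and introduces a hypothetical function name, `preceding_pos_features`. It is not the feature extractor from Smith's method or from any of our packages.

```python
from typing import Dict, List, Tuple

# A tagged sentence is a list of (word, part-of-speech) pairs. The Penn
# Treebank-style tags below were assigned by hand for illustration.
TaggedSentence = List[Tuple[str, str]]


def preceding_pos_features(sentence: TaggedSentence, target_index: int,
                           window: int = 2) -> Dict[str, str]:
    """Return the POS tags of up to `window` words before the target word.

    Positions before the start of the sentence get the placeholder "<S>",
    so every instance has the same feature names.
    """
    features = {}
    for offset in range(1, window + 1):
        i = target_index - offset
        features[f"pos_-{offset}"] = sentence[i][1] if i >= 0 else "<S>"
    return features


fishing = [("The", "DT"), ("boy", "NN"), ("fishes", "VBZ"),
           ("from", "IN"), ("the", "DT"), ("bank", "NN")]
lending = [("The", "DT"), ("bank", "NN"), ("gave", "VBD"),
           ("me", "PRP"), ("a", "DT"), ("loan", "NN")]

print(preceding_pos_features(fishing, target_index=5))  # {'pos_-1': 'DT', 'pos_-2': 'IN'}
print(preceding_pos_features(lending, target_index=1))  # {'pos_-1': 'DT', 'pos_-2': '<S>'}
```

You can train a classifier with and without these features on the same data and compare the results. That comparison is one way to show that an improvement comes from the new feature and not from something else.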

What is research?

● We develop interesting questions to answer
– We call these hypotheses
● We then figure out the best way to answer those questions
– In our work, answers are found experimentally
– Just like in many sciences, except we use computers to conduct the experiments (and a lot of other sciences use computers to do experiments too)
– Could also be more theoretical, but that's not usually what we do

This is Science

● I'm a scientist
● We do some engineering to build systems to conduct experiments, but our goals are scientific
● We want to answer questions about the world, in particular about human language
● Any engineering is a means to an end
– The end is an answer to our question
– A nicely built system is not science; it's the laboratory in which you can begin to do your science
– The department is called Computer Science, and your degree will be a Master of Science

What is a Master's Thesis?

● It presents an interesting and original question (a hypothesis)
● It shouldn't matter whether the answer is positive or negative (otherwise you force the results one way or the other)
● You must persuade your audience that the question is indeed interesting and worth answering
● You must present an argument that supports your answer
● Our arguments are nearly always experimental
● They are based on a series of well-formed, clearly explained experiments that can be replicated by others
● Questions do not need to be incredibly difficult or time-consuming to pursue, but they should be interesting and, to some extent, unanswered or in need of confirmation

My interests

What questions interest me?

● Natural Language Processing: making computers better able to process human language (written form)
● Computational Linguistics: understanding the nature of language better by studying it with computational techniques

What kinds of language interest me?

● General text
– News articles, web search results
● Medical text
– Clinical records, patient-centered social networks
● Most often in English
– Sometimes other languages
– I don't work on translation

NLP: Word sense disambiguation (WSD)

● Assigning meanings to words based on the context in which they occur
– The boy fishes from the bank
– The bank gave me a loan
● Assume meanings are already defined, for example in a dictionary
● Many of our recent questions concern the role of semantic coherence in allowing us to determine meanings of words
● http://senserelate.sourceforge.net
● http://search.cpan.org/dist/UMLS-SenseRelate/
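
A very small dictionary-based method can separate the two "bank" sentences above. This minimal Python sketch uses a simplified Lesk-style gloss overlap: it picks the sense whose definition shares the most words with the context. It is much simpler than the SenseRelate packages linked above and does not reproduce them. The two glosses, the stopword list, and the crude suffix-stripping stemmer are all hand-written assumptions for illustration.

```python
import re
from typing import Dict, Set

# Hand-written glosses for two senses of "bank" (hypothetical; a real system
# would take definitions from a dictionary such as WordNet).
BANK_SENSES: Dict[str, str] = {
    "bank/river": "sloping land beside a body of water where people fish",
    "bank/finance": "a financial institution that accepts deposits and makes loans",
}

STOPWORDS = {"a", "the", "of", "from", "me", "where", "that", "and", "beside"}


def crude_stem(word: str) -> str:
    """Very rough suffix stripping so that 'fishes' matches 'fish'."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def content_words(text: str) -> Set[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords, and stem."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def simplified_lesk(context: str, senses: Dict[str, str]) -> str:
    """Return the sense whose gloss shares the most words with the context.

    Ties (including no overlap at all) go to the first sense listed.
    """
    context_set = content_words(context)
    return max(senses, key=lambda s: len(content_words(senses[s]) & context_set))


print(simplified_lesk("The boy fishes from the bank", BANK_SENSES))  # bank/river
print(simplified_lesk("The bank gave me a loan", BANK_SENSES))       # bank/finance
```

Because these glosses are so short, a single shared word decides each case: "fish" in the first sentence and "loan" in the second. A context that shares no words with either gloss falls back to the first sense listed.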

NLP: Word sense discrimination

● Assumes you don't know the possible meanings ahead of time
– Goal is to discover them
● Group occurrences of a word together based on contextual similarity
● Label the discovered groups (clusters) with a definition or description
● Many interesting questions about the role of surrounding context in determining and defining meaning
● http://senseclusters.sourceforge.net
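
The grouping step fits in a few lines of Python. This is a minimal illustration under stated assumptions, not SenseClusters. Each context is reduced to a set of content words, and contexts are compared by Jaccard overlap. Average-link agglomerative clustering then merges the most similar groups until k remain. The stopword list, the example sentences, and k = 2 are assumptions, and the sketch leaves out labeling the clusters.

```python
import re
from itertools import combinations
from typing import List, Set

STOPWORDS = {"the", "a", "at", "i", "we", "my", "was", "from", "after"}


def context_words(sentence: str, target: str) -> Set[str]:
    """Content words in the sentence, excluding the ambiguous target itself."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return {w for w in words if w not in STOPWORDS and w != target}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the overlap divided by size of the union (0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def discriminate(sentences: List[str], target: str, k: int) -> List[List[int]]:
    """Average-link agglomerative clustering of contexts into k groups.

    Starts with one cluster per sentence and repeatedly merges the two
    clusters with the highest average pairwise Jaccard similarity.
    Returns clusters as lists of sentence indices.
    """
    contexts = [context_words(s, target) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def average_similarity(c1: List[int], c2: List[int]) -> float:
        total = sum(jaccard(contexts[i], contexts[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: average_similarity(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations yields a < b, so index a is unaffected
    return clusters


sentences = [
    "We fished from the muddy river bank",
    "The river bank was muddy after the flood",
    "The bank approved my loan",
    "I paid the loan at the bank",
]
print(discriminate(sentences, target="bank", k=2))  # [[0, 1], [2, 3]]
```

The two muddy river contexts end up in one cluster and the two loan contexts in the other. The program is never told what the senses of "bank" are.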

NLP & CL: Collocation discovery

● Identify combinations of words (in large samples of text) that tend to occur together and carry some additional meaning
– Toaster oven, kick the bucket, card-carrying member
● Often use statistical measures of association or networks of word co-occurrences to identify them
● Necessary step in some approaches to word sense disambiguation and discrimination
● A frequent question is whether a particular technique can identify a certain kind of expression (and why or why not)
● http://ngram.sourceforge.net
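
Pointwise mutual information (PMI) is one example of a statistical measure of association. It compares how often two words appear side by side with how often they would if they were independent. This minimal Python sketch assumes whitespace-tokenized text and counts adjacent pairs only. The `min_count` cutoff and the example sentence are assumptions, and this is not code from the package linked above.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(tokens: List[str], min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """Pointwise mutual information for each pair of adjacent words.

    PMI(x, y) = log2( n(x,y) * N / (n(x,*) * n(*,y)) ), where N is the number
    of bigrams, n(x,*) counts bigrams whose first word is x and n(*,y) counts
    bigrams whose second word is y. Pairs seen fewer than min_count times are
    skipped, since PMI is unreliable for rare pairs.
    """
    bigrams = list(zip(tokens, tokens[1:]))
    n = len(bigrams)
    pair_counts = Counter(bigrams)
    first_counts = Counter(x for x, _ in bigrams)
    second_counts = Counter(y for _, y in bigrams)
    return {
        (x, y): math.log2(count * n / (first_counts[x] * second_counts[y]))
        for (x, y), count in pair_counts.items()
        if count >= min_count
    }


tokens = ("we bought a toaster oven and the toaster oven is in the kitchen "
          "and the kitchen is small").split()
scores = bigram_pmi(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(" ".join(pair), round(score, 2))  # "toaster oven" is printed first
```

In this tiny sample "toaster oven" ranks first, but the function-word pair "and the" also scores well. A high association score alone does not show that a pair carries additional meaning.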

CL: Semantic Similarity and Relatedness

● Ranking or comparing concepts based on their similarity or relatedness
● General English: http://wn-similarity.sourceforge.net
– Is a dog more like a cat or a house?
– Is corn more related to a farmer or an astronaut?
● Medicine: http://umls-similarity.sourceforge.net
– Is blood more like a tissue or a bone?
– Is aspirin more related to a headache or a vaccination?
● Many questions about how to use information from ontologies or corpora to replicate human performance, and the significance of this to other NLP tasks
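
A path-based measure gives a first answer to "Is a dog more like a cat or a house?": concepts that are closer together in an is-a hierarchy count as more similar. This minimal Python sketch uses a tiny hand-built hierarchy in place of WordNet or the UMLS. It scores similarity as 1 divided by the number of nodes on the shortest path between two concepts. It conveys the idea only and does not reproduce the measures in the packages linked above.

```python
from collections import deque
from typing import Dict, Iterator, Optional

# A tiny hand-built is-a hierarchy (child -> parent). It only mimics the shape
# of a real ontology such as WordNet; the entries are illustrative.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "organism": "entity",
    "artifact": "entity",
    "animal": "organism",
    "carnivore": "animal",
    "canine": "carnivore",
    "feline": "carnivore",
    "dog": "canine",
    "cat": "feline",
    "structure": "artifact",
    "building": "structure",
    "house": "building",
}


def neighbors(concept: str) -> Iterator[str]:
    """Concepts one is-a link away: the parent (if any) and all children."""
    parent = PARENT[concept]
    if parent is not None:
        yield parent
    yield from (child for child, p in PARENT.items() if p == concept)


def path_similarity(a: str, b: str) -> float:
    """1 / (number of nodes on the shortest is-a path from a to b)."""
    seen = {a}
    queue = deque([(a, 1)])  # (concept, nodes on the path so far)
    while queue:
        concept, nodes = queue.popleft()
        if concept == b:
            return 1.0 / nodes
        for nxt in neighbors(concept):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, nodes + 1))
    return 0.0  # unreachable in a connected hierarchy


print(path_similarity("dog", "cat"))    # 0.2
print(path_similarity("dog", "house"))  # 0.1
```

In this toy hierarchy, dog and cat lie on a 5-node path (similarity 0.2), while dog and house lie on a 10-node path (0.1).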

Experimental methods

● Statistical and data driven
– Clustering approaches, supervised learning
● Knowledge based
– WordNet: general English
– UMLS: medicine, biology, anatomy, etc.
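
On the supervised side, a classic approach to word sense disambiguation is a Naive Bayesian classifier trained on examples whose senses are already known. This minimal Python sketch assumes each training context has already been reduced to a bag of words with a sense label. The four training examples, the sense names, and add-one smoothing are illustrative choices, not a description of any particular system of ours.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set, Tuple

Example = Tuple[List[str], str]  # (context words, sense label)


def train(examples: List[Example]):
    """Count how often each sense occurs and how often each word occurs with it."""
    sense_counts: Counter = Counter()
    word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def classify(words: List[str], model) -> str:
    """Pick the sense maximizing log P(sense) + sum of log P(word | sense).

    Word probabilities use add-one (Laplace) smoothing so that a word never
    seen with a sense does not rule that sense out entirely.
    """
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def log_score(sense: str) -> float:
        denominator = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(sense_counts[sense] / total)
        for w in words:
            score += math.log((word_counts[sense][w] + 1) / denominator)
        return score

    return max(sense_counts, key=log_score)


training = [
    ("fish river water".split(), "river"),
    ("muddy river shore".split(), "river"),
    ("loan money deposit".split(), "finance"),
    ("money account loan".split(), "finance"),
]
model = train(training)
print(classify(["river", "fish"], model))   # river
print(classify(["money", "loan"], model))   # finance
```

The two senses have equal priors here, so the decision rests entirely on the smoothed word probabilities.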

What you need to do to succeed

Keys to success

● Desire to conduct science, not just engineering
● Enthusiasm for asking and answering interesting questions
– Going beyond just implementing things
– Results do matter: we'll form our questions so that we don't require a certain answer, but we must get concrete results that lead to an answer
● Ability to express technical ideas, questions, etc. in writing
● Mature work habits
– Willingness to stay involved and maintain a steady rate of work over 4 semesters
– Email as a key channel of communication
● Willingness to program and learn what you don't know
– Previous projects have used Perl, MySQL, and Java
– APIs are increasingly important

Key values

● Experimental research
– Ask and answer questions (hypotheses)
● Publish when we can
– A “good” Master's thesis should result in publishable work
● Open source
– Free and frequent distribution of code
– Allows for replication of results
● Documentation of code
– A user should be able to install, run, and understand results based on our documentation
– Allows for replication of results

My typical schedule

● Develop a very detailed proposal in the first semester (with concrete deadlines specified); typically there are 2-3 main research questions (hypotheses) that we will address
● During the second semester we develop baselines, based on known answers to our questions, that will be the basis for comparison
● During the third semester we conduct 1-2 experiments designed to answer 1-2 of our questions; we measure how well (or not) those answers worked out and report on that
● During the fourth semester we do one more set of experiments to answer our remaining question, again measuring how well (or not) that worked out and reporting on that
● I do not generally work much with students in the summer, due to other constraints and demands on my time

My expectations of you

● We write the thesis AS WE GO; we do not do all the writing at the end
● We release software and data AS WE GO
● We often build on previous students' work, so we need to be careful to separate your work from theirs, and also to leave behind a body of work that future students can build on
● We meet regularly (once every week or two) and communicate very regularly (sometimes daily or even more often) via email
● I do a lot of testing and verification of results; I also read and comment on documentation extensively
● This process needs to be iterative, and you need to be responsive to my concerns (not always agreeing, but at least acknowledging and discussing them, and I will do the same for yours)
● I ask that your thesis be treated as equal in priority to your class work (not higher, but not lower either)

A little bit about previous (successful) students

Former (successful) students

http://www.d.umn.edu/~tpederse/masters.html

● Supervised 16 MS students
– 6 earned PhDs: CMU (3), Utah, Toronto, UM-TC
– 2 are pursuing PhDs: CMU and Toronto
– 2 earned a second MS degree: Missouri and Pittsburgh
● Supervised 1 PhD
– UM-TC
● Topics?
– 5 in semantic similarity
– 5 in word sense disambiguation
– 3 in word sense discrimination
– 2 in collocation discovery
– 1 outside of NLP

Reading

● The paper I've suggested you read is from a highly competitive conference (ACL 2004), where it won the best paper award
● Since then it has had impact, both in terms of citations and in influencing the direction of NLP and CL
● I'm interested in how well you can understand this, and how interesting you find it. I would also like you to think about the hypotheses that likely motivated this work.

Thank you!

http://www.d.umn.edu/~tpederse