SOP-Statistcs_UMich.pdf


  • 7/27/2019 SOP-Statistcs_UMich.pdf


    Statement of purpose

    Name: Seong-Hwan Jun

    Objective

    My research interest is in machine learning. I am particularly interested in developing computationally efficient statistical inference methods to solve problems arising in large data settings. My objective for

    pursuing a PhD is to become an expert researcher in the field of machine learning, able to

    identify problems suitable for machine learning methods and to develop new methods (or extend existing

    methods) to solve them. By the end of the PhD, I will have become an ideal candidate for a research

    position in industry as well as in academia.

    Why pursue a PhD in statistics

    The current trend in machine learning appears to be the application of probabilistic and statistical modeling

    techniques for accurate inference. Efficient computation is also a challenge as many of the existing methods

    do not scale easily to larger data. I realize that to become an expert in machine learning, I will have to build a strong foundation in mathematics (specifically, probability theory) and computer science as well as

    statistics. Although it is more common to approach machine learning from computer science, I have decided to

    approach it from statistics for two reasons. First, statisticians use probabilistic models

    to capture uncertainty for accurate inference; hence, to become a statistician, one needs to be familiar with

    modern advances in probability theory. Second, in order to deal with the abundance of data, there is a growing

    emphasis on computation in modern statistics. The relatively new field of computational statistics is at the

    frontier of statistics research, enabling statistics to be applied in large data settings. Therefore, I concluded

    that the best way to approach machine learning is from statistics, as it is the only discipline that specifically

    focuses on the three components (probability theory, computation, and inference) necessary for becoming an

    expert researcher in machine learning.

    Research experience

    As part of the research curriculum during my Master's degree, I participated in a weekly machine learning

    reading group in which the members took turns reading a paper and presenting the main results to the group.

    The topics chosen included computational methods for large datasets, Bayesian computational methods, and

    non-parametric Bayesian modeling techniques, to name a few. We also read many applied research papers

    in which statistical methods are applied to problems in computational linguistics and phylogenetics. The

    reading group trained me to read research papers and extract their main points efficiently.

    I have one publication, titled Entangled Monte Carlo, which was published in the proceedings of the

    25th Conference on Neural Information Processing Systems (NIPS). I attended the conference and gave

    a spotlight presentation as well as a poster presentation. The paper proposes a method for efficiently

    distributing the computation of the popular Sequential Monte Carlo method over multiple computing nodes.

    The reading group helped me shape my research interests. Currently, I am interested in the problem of

    inferring evolutionary relationships between (natural) languages using statistical modeling and computational

    techniques. I intend to tackle many problems arising in this field of computational linguistics by applying

    statistical models and machine learning methods.

    Another interest of mine is non-parametric statistical methods. I gained an appreciation for this class of

    methods while attending the NIPS conference, where I noticed that many machine learning researchers apply

    non-parametric statistical methods, both Bayesian and frequentist, to solve a variety of problems. I was

    introduced to Bayesian non-parametric methods through the aforementioned reading group; however, I have never had much contact with recent developments in non-parametric (frequentist) statistics. Recently, I

    have often found myself wanting to learn about non-parametric methods and to extend them so that I can

    apply them to my research. This is one of my newer interests, which I intend to explore in the future.

    I gave a presentation on Sequential Monte Carlo and Entangled Monte Carlo methods at the SFU-UBC joint

    seminar in September 2012.

    I gave two presentations at the UBC Department of Statistics student seminar. In the first presentation,

    I introduced the basics of C/C++ programming and the GNU Scientific Library (GSL) to fellow students. In the second

    presentation, I gave a walk-through on how to use Amazon EC2 servers for free and provided tips for

    running computations on department servers.
