What are developers talking about?
AN ANALYSIS OF TOPICS AND TRENDS IN STACK OVERFLOW
DENNIS PORTENGEN
Authors
• Anton Barua (pursuing MSc. Computing Science)
• Stephen W. Thomas (PhD Computing Science)
• Dr. Ahmed E. Hassan (Business)
Goal of the paper
• “Uncovering the main discussion topics, their underlying dependencies, and trends over time.” (Barua et al., 2012)
• 4 RQs
  • What are the main discussion topics?
  • Does a question in one topic trigger answers in another?
  • How does developer interest change over time?
  • How does the interest in specific technologies change over time?
Main topics in article
• Topic modelling
  • Uses word frequencies and co-occurrence frequencies to build a model of related words
• LDA (Latent Dirichlet Allocation)
  • Statistical technique that discovers topics as sets of words that frequently occur together in documents
• Simple idea:
  • ‘Planet’, ‘Space’, ‘Star’, ‘Orbit’ indicate that the topic is related to astronomy
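The two counts that topic modelling builds on can be illustrated with a minimal sketch. This is not LDA itself; it only tallies word frequencies and document-level co-occurrence frequencies. The toy corpus and the function names `word_frequencies` and `cooccurrence_frequencies` are assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is already split into words.
docs = [
    ["planet", "space", "star", "orbit"],
    ["star", "orbit", "planet"],
    ["socket", "api", "compile", "error"],
    ["compile", "error", "socket"],
]

def word_frequencies(documents):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc)
    return counts

def cooccurrence_frequencies(documents):
    """Count in how many documents each unordered word pair appears together."""
    pairs = Counter()
    for doc in documents:
        # sorted(set(...)) gives each pair once per document, in a stable order.
        for a, b in combinations(sorted(set(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

freq = word_frequencies(docs)
cooc = cooccurrence_frequencies(docs)

print(freq["star"])               # occurrences of 'star'
print(cooc[("orbit", "planet")])  # documents containing both words
print(cooc[("planet", "socket")]) # words that never share a document
```

Words such as ‘planet’ and ‘orbit’ share documents while ‘planet’ and ‘socket’ never do, which is the kind of signal a topic model uses to group words into topics.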
Research Methodology
[Figure: methodology pipeline, shown in three phases (Phase 1, Phase 2, Phase 3): Stack Overflow Data Set → Post Extraction → Extracted Posts → Pre-processing → Pre-processed Posts → LDA → Topics and Topic Memberships → Post-processing → Results]
Example Result of pre-processing
• Before pre-processing:
<p> I’ve been having issues getting C sockets API to work properly in C++. Specifically, although I am including sys/socket.h, I still get compile time errors telling me that AF_INET is not defined. Am I missing something obvious, or could this be related to the fact that I’m doing this coding on z/OS and my problems are much more complicated? </p>
• After pre-processing:
Issu c socket api work properly c++ specif include sy socket.h compil time error af_inet defin miss obvious relat fact code z os problem complic
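A transformation of this kind can be sketched roughly as follows. The slide shows only the before and after text, so every step here is an assumption: the tiny `STOP_WORDS` set, the `crude_stem` suffix stripper (a stand-in, not a real stemming algorithm), and the function names are hypothetical, and the output will not match the slide word for word.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not the paper's list).
STOP_WORDS = {"i", "am", "the", "to", "in", "that", "is", "not",
              "a", "an", "or", "and", "this", "on", "my", "me", "of"}

# Crude suffix stripping as a stand-in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "s")

def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)

def crude_stem(word):
    """Drop one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(post):
    """Strip tags, lowercase, tokenize, drop stop words, then stem."""
    text = strip_html(post).lower()
    # Keep characters that occur in identifiers such as sys/socket.h or c++.
    tokens = re.findall(r"[a-z0-9_+.#/]+", text)
    tokens = [t.strip(".") for t in tokens]  # remove sentence-final dots
    return [crude_stem(t) for t in tokens if t and t not in STOP_WORDS]

print(preprocess("<p>I am including sys/socket.h in the code.</p>"))
```

The point of the reduction is that LDA then works on a smaller, more consistent vocabulary, so that different forms of the same word count as one term.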
Related Literature
• Categorized in 4 fields
  • The general study of Q&A websites
  • The study of Stack Overflow specifically
  • The study of other social platforms for developers
  • The use of LDA to study trends in software engineering data
• Difference with these studies
  • Aimed at the textual content generated by users instead of user activity
Opinion
STRONG POINTS
• Qualitative and quantitative techniques
• Large dataset
• Methodology applicable to other developer resources
WEAK POINTS
• Methodology does not incorporate a predictive model
• Experimentation with the value of K and the value of the threshold δ