Transcript
Page 1: The Perils of Doing Software Engineering Research using GitHub Data

What is in GitHub?

Daniel M. German

Page 4: The Perils of Doing Software Engineering Research using GitHub Data

Researcher states:

“40% of pull requests are not merged”

● Based on simply querying GHTorrent data
● But it ignores what really happens
● Many pull requests are merged without being marked as merged in GitHub

● GHTorrent data has many potential threats to validity
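
To illustrate why a plain query can overstate the unmerged share, here is a minimal sketch. It assumes hypothetical in-memory records (the `PullRequest` fields are illustrative, not the GHTorrent schema) and one possible heuristic: a pull request also counts as merged when any of its commits appears in the target project's main branch. It is not necessarily the method used in the talk.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class PullRequest:
    # Hypothetical record for illustration; not the GHTorrent schema.
    number: int
    merged_flag: bool  # what the hosting site reports
    commit_shas: List[str] = field(default_factory=list)


def is_effectively_merged(pr: PullRequest, main_branch_shas: Set[str]) -> bool:
    """Merged if flagged, or if any PR commit is present in the main branch."""
    return pr.merged_flag or any(sha in main_branch_shas for sha in pr.commit_shas)


def unmerged_ratio(prs: Iterable[PullRequest],
                   main_branch_shas: Set[str],
                   use_heuristic: bool) -> float:
    """Share of pull requests considered unmerged, with or without the heuristic."""
    prs = list(prs)
    if not prs:
        return 0.0
    if use_heuristic:
        unmerged = sum(not is_effectively_merged(pr, main_branch_shas) for pr in prs)
    else:
        unmerged = sum(not pr.merged_flag for pr in prs)
    return unmerged / len(prs)
```

Comparing `unmerged_ratio(prs, shas, use_heuristic=False)` with `use_heuristic=True` on the same data shows how far the two estimates diverge. Rebased or squashed commits get new hashes, so even this check would still miss some merges.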

Page 5: The Perils of Doing Software Engineering Research using GitHub Data

What is GitHub used for?

Page 6: The Perils of Doing Software Engineering Research using GitHub Data

"I store my presentations in github. I don't need USB stick anymore!"

Page 9: The Perils of Doing Software Engineering Research using GitHub Data

Are there potential threats to validity for studies that assume GitHub is about software engineering only?

Page 10: The Perils of Doing Software Engineering Research using GitHub Data

Methodology

● Reuse:
– Surveys
– Data analysis for other papers

● Mixed methods:
– Quantitative, and
– Qualitative

Page 13: The Perils of Doing Software Engineering Research using GitHub Data

Uses:

Page 14: The Perils of Doing Software Engineering Research using GitHub Data

Most projects are inactive

Page 15: The Perils of Doing Software Engineering Research using GitHub Data

Social?

67% of projects are personal repos

95% have 3 or fewer committers
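
Given how many repositories are inactive, personal, or have very few committers, a study might screen its sample before drawing conclusions. The sketch below assumes a hypothetical `Repo` record and illustrative thresholds (at least 4 committers, a commit within the last 180 days); none of these names or cutoffs come from the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass
class Repo:
    # Hypothetical record for illustration only.
    name: str
    committers: int
    last_commit: date
    is_personal: bool


def select_candidates(repos: Iterable[Repo],
                      today: date,
                      min_committers: int = 4,
                      max_idle_days: int = 180) -> List[Repo]:
    """Keep non-personal repos with enough committers and recent activity."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        r for r in repos
        if not r.is_personal
        and r.committers >= min_committers
        and r.last_commit >= cutoff
    ]
```

Reporting how many repositories each condition excludes makes the resulting sampling bias visible to readers.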

Page 16: The Perils of Doing Software Engineering Research using GitHub Data

Self contained?

“Any serious project would have to have some separate infrastructure - mailing lists, forums, irc channels and their archives, build farms, etc. [...] Thus while GitHub and all other project hosts are used for collaboration, they are not and can not be a complete solution.”

Page 17: The Perils of Doing Software Engineering Research using GitHub Data

But... what about the users?

Page 18: The Perils of Doing Software Engineering Research using GitHub Data

Switch to http://osrc.dfm.io/dmgerman

Page 19: The Perils of Doing Software Engineering Research using GitHub Data

Is it still worth exploring GitHub?

Definitely!

