How do Centralized and Distributed Version Control Systems Impact Software Changes?

Talk given at ICSE 2014 in Hyderabad, India.

Page 1: How do Centralized and Distributed Version Control Systems Impact Software Changes?

How Do Centralized and Distributed Version Control Systems Impact Software Changes?

Caius Brindescu, Mihai Codoban, Sergii Shmarkatiuk, Danny Dig

Page 2: How do Centralized and Distributed Version Control Systems Impact Software Changes?

GitHub is the main “forge” for OSS projects

SourceForge: 300K repos
GitHub: 4.6M repos

Page 3: How do Centralized and Distributed Version Control Systems Impact Software Changes?

What’s the difference?

                        Git                   SVN
History                 Local to every user   On the server
Commits                 Private, local        Centralized, public
Branching and merging   Cheap                 Expensive
History                 Modifiable            “Set in stone”

Page 15: How do Centralized and Distributed Version Control Systems Impact Software Changes?

What are we missing?

Developers: Are they using the tools to their full potential?

Managers: Is switching to Git good?

Researchers: How does this new paradigm affect mining software repositories?

Tool Builders: Are they building the right tools?

Page 17: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Survey

820 participants

85% from industry

56% have over 10 years of experience

51% work in teams of 6 or larger

Page 18: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Repository Analysis

132 repositories (52 SVN, 51 Git, 29 Hybrid)

358K commits

409M LOC

Page 20: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Git is the most used VCS

Chart (share of responses per VCS): Git 53%, SVN 20%, Hg 12%, MS TFS 9%, CVS 1%, Other 5%

Page 21: How do Centralized and Distributed Version Control Systems Impact Software Changes?

We identified 3 themes

1. Impact of VCS on developer’s behavior
RQ 1: Does the type of VCS affect the size of commits?
RQ 2: Do developers split their commits into logical units of change? How do they do it?
RQ 3: How often and why do developers squash their commits?
RQ 4: Why do developers prefer one Version Control System over another?
RQ 5: Does the VCS influence the frequency with which developers commit?

2. Impact of the team size on the VCS
RQ 6: Does team size affect the choice of VCS?
RQ 7: Are larger teams more likely to use Issue Tracking Systems (ITS)?
RQ 8: Does team size affect the size of commits?
RQ 9: Does team size influence commit squashing?

3. Impact of the VCS on the software process
RQ 10: Does the type of VCS influence the presence and the number of issue tracking labels (ITL)?
RQ 11: Is there a correlation between the number of ITLs in the commit message and the commit size?
RQ 12: How does the size of commits vary in time?

Page 23: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ1: Does the type of VCS affect commit size?

For Git and SVN the difference was statistically significant

Chart (mean commit size, LOC): SVN 40.06, Git 23.20

Page 24: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ1: Does the type of VCS affect commit size?

“Git promotes the idea that your commit space is not inflicting pain on anybody else […] it promotes small frequent commits […] rather than the 5pm commit”

Page 25: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ1: Does the type of VCS affect commit size?

For repositories that transitioned, there was no statistically significant difference

Chart (mean commit size, LOC), Hybrid-SVN vs. Hybrid-Git: bar values 25.72 and 23.02

Page 28: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ1: Does the type of VCS affect commit size?

Commits in Git repositories are 34% smaller than commits in SVN repositories, in terms of LOC.

One possible explanation is that each developer commits to their own local repo, with no need for synchronization or merging their changes.

Hybrid repos keep the same commit size because of existing policies.

Old habits die hard

Page 29: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Implications

Smaller commits make it easier to “bisect” the tree

Git offers better tools for splitting commits

Some repositories migrate from one paradigm to the other; this might bias the results

Changing the VCS is not enough
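
The link between commit size and bisecting can be sketched as a binary search over the history. This is an illustration only: `first_bad_commit`, the commit names, and the `is_bad` check are hypothetical and not part of the study. It assumes that once a commit is bad, every later commit is bad too. The search needs roughly log2(n) checks, and the smaller the commit it lands on, the fewer lines are left to inspect.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit via binary search.

    Assumes commits are ordered oldest to newest, the newest one is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]


# Hypothetical history of eight commits; the bug is introduced in "c5".
history = [f"c{i}" for i in range(8)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 5)
print(culprit)
```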

Page 31: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ2: Do developers split their changes?

Separating the changes to the working copy into multiple, separate commits

Commit 1: file1.txt
Commit 2: file2.txt, file3.txt

Page 33: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ2: Do developers split their changes?

Chart (survey responses):
Git: split their changes 81%, group their changes 13%, other 6%
SVN: split their changes 68%, group their changes 27%, other 6%

“[changes] should be logically separated to easily allow [the] commit message to drive [the] review”

Page 35: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ2: Do developers split their changes?

Chart (how developers split their changes):
Git: by implementation 37%, by issue 45%, policy 6%, other 12%
SVN: by implementation 22%, by issue 62%, policy 5%, other 11%

“[Git] gives useful tools for splitting or merging commits”

Page 38: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ2: Do developers split their changes?

76% of developers split their commits. The percentage is higher for Git (81.25%) than for SVN (67.89%).

We attribute this to an easier commit process.

Overall, developers choose to split their commits based on the issue they belong to.
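
Splitting by issue can be pictured as grouping the changed files of a working copy by the issue each change addresses, then making one commit per group. The sketch below is a minimal illustration under assumptions: `plan_commits`, the issue labels, and the mapping of files to issues are hypothetical and do not come from the survey data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_commits(changes: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (file path, issue id) pairs into one planned commit per issue."""
    commits: Dict[str, List[str]] = defaultdict(list)
    for path, issue in changes:
        commits[issue].append(path)
    return dict(commits)


# Hypothetical working copy: three changed files touching two issues.
working_copy = [
    ("file1.txt", "ISSUE-1"),
    ("file2.txt", "ISSUE-2"),
    ("file3.txt", "ISSUE-2"),
]
plan = plan_commits(working_copy)
for issue, files in plan.items():
    print(issue, files)
```

Real changes are often finer-grained than whole files, which is part of why splitting stays a manual task.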

Page 40: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ2: Do developers split their changes?

In Git, more users (37%) split changes based on implementation details than in SVN (22%).

“Each commit is one cohesive change […] (like ‘sphere class can now calculate its own volume’) - user level features usually take many commits.”

Page 42: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Implications

Splitting changes makes it easier to perform other operations such as cherry-picking.

For mining software repositories, Git might be better since it allows smaller atomic changes.

Splitting changes is a manual and tedious process. Tool builders could make their tools support this process.

Page 43: How do Centralized and Distributed Version Control Systems Impact Software Changes?

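RQ3: Why do developers prefer one VCS over another?

Chart (reasons for preferring a VCS, % of responses):
Git: killer features 46%, old habit 23%, ease of use 20%, personal pref. 2%, other 9%
SVN: killer features 11%, old habit 42%, ease of use 42%, personal pref. 1%, other 5%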

Page 51: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ3: Why do developers prefer one VCS over another?

“You get to commit to a local repository and make your changes public only when they are ready”

“I found the commit process very straightforward […]”

Page 52: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ3: Why do developers prefer one VCS over another?

Git is preferred because of its “killer features”

SVN is preferred because it is easier to use and because of familiarity

Page 53: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Implications

Tool builders should focus on features that complement the developer’s workflow.

While Git has a steep learning curve, it does allow for better ways to manage your changes.

Page 54: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ4: Do developers squash their commits?

What is squashing?
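
As a rough illustration of squashing: several commits are folded into a single commit that carries their combined effect, and the intermediate steps disappear from the history. The sketch below models this with a hypothetical `Commit` record (a message plus a map from file path to new content). It is not Git's actual data model, and the file names and messages are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Commit:
    message: str
    changes: Dict[str, str]  # file path -> content after this commit


def squash(commits: List[Commit], message: str) -> Commit:
    """Fold commits (oldest first) into one; later changes to a file win."""
    combined: Dict[str, str] = {}
    for commit in commits:
        combined.update(commit.changes)
    return Commit(message, combined)


# Hypothetical local work-in-progress history.
wip = [
    Commit("start feature", {"a.txt": "v1"}),
    Commit("fix typo", {"a.txt": "v2"}),
    Commit("add helper", {"b.txt": "v1"}),
]
squashed = squash(wip, "Add feature")
print(squashed.message, squashed.changes)
```

Only the final state of each file survives, so the "fix typo" step is no longer visible afterwards.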

Page 55: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ4: Do developers squash their commits?

Chart (Git users): Yes 37%, No 55%, N/A 8.62%

Page 56: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ4: Do developers squash their commits?

Why do they do it?

Developers using Git mention two different reasons:

(a) grouping several changes together

(b) they only care about the final solution, not the path they took to get there

Page 57: How do Centralized and Distributed Version Control Systems Impact Software Changes?

RQ4: Do developers squash their commits?

Over 1/3 of developers squash their commits

Large teams squash commits more often than small ones

Page 58: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Implications

Tool builders could allow for non-destructive history modifications, e.g., hierarchical commits.

Git allows users to change history before they make it public or available to others.

Page 59: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Threats

Squashing

Age bias

OSS vs. Proprietary software

Page 60: How do Centralized and Distributed Version Control Systems Impact Software Changes?

Conclusions

The commit size is smaller in Git than in SVN.

Developers split their changes more often in Git, using a finer granularity.

1/3 of developers use squashing to change the history.

Teams of all sizes predominantly prefer Git (71%).

cope.eecs.oregonstate.edu/VCStudy