activeGitHub by Gautham Nair

aboutActiveGitHub

DESCRIPTION

Short explanation of the motivation behind activegithub.com and the underlying model.

Page 1: aboutActiveGitHub

activeGitHub by Gautham Nair

Page 2: aboutActiveGitHub

Python wrappers for the GitHub API

Page 3: aboutActiveGitHub

[Figure: commit timeline, 2008 to 2013, with a gap marked. Future activity?]

Page 4: aboutActiveGitHub

[Figure: commit timeline, 2008 to 2013, with a gap marked. Future activity?]

Page 5: aboutActiveGitHub

activeGitHub: A quantitative estimator of future repository activity

Page 6: aboutActiveGitHub

[Figure: timeline, 2008 to 2014]

Estimated probability of any commits in the next six months

Page 7: aboutActiveGitHub

Training Data

[Diagram: 20,000 repos (repoA, repoB, ..., repoY) on a time axis running from each repo's birth to the Present.]

Page 8: aboutActiveGitHub

[Diagram: time axis for repoA, repoB, ..., repoY, with a marker at 6 months back.]

Page 9: aboutActiveGitHub

[Diagram: time axis for repoA, repoB, ..., repoY. Steps: compute features; active or not?]

Page 10: aboutActiveGitHub

[Diagram: time axis for repoA, repoB, ..., repoY. Steps: compute features; active or not?; Logistic Regression.]

Page 11: aboutActiveGitHub

[Diagram: time axis for repoA, repoB, ..., repoY. Steps: compute features; active or not?; Logistic Regression; train / test.]

AUC = 0.89
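The labeling and evaluation steps on these slides can be sketched roughly as follows. This is a minimal illustration, not the author's code: the 183-day horizon, the function names `split_at_cutoff` and `auc`, and the sample commit dates are all assumptions made for the example. The AUC here is computed by direct pairwise comparison, which is only practical for small test sets.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(days=183)  # assumed length of "six months"


def split_at_cutoff(commit_times, cutoff):
    """Return (history, label) for one repo.

    history: commits at or before the cutoff (input for feature computation).
    label:   True if any commit falls within HORIZON after the cutoff.
    """
    history = [t for t in commit_times if t <= cutoff]
    label = any(cutoff < t <= cutoff + HORIZON for t in commit_times)
    return history, label


def auc(labels, scores):
    """Probability that a random active repo scores higher than a random
    inactive one (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical repos, for illustration only.
present = datetime(2014, 1, 1)
cutoff = present - HORIZON
repo_a = [datetime(2013, 1, 5), datetime(2013, 6, 20), datetime(2013, 11, 2)]
repo_b = [datetime(2012, 3, 1), datetime(2012, 4, 15)]

history_a, active_a = split_at_cutoff(repo_a, cutoff)
history_b, active_b = split_at_cutoff(repo_b, cutoff)
```

Features are computed only from `history`, so nothing after the cutoff leaks into the model inputs; the commits after the cutoff are used solely to decide the label.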

Page 12: aboutActiveGitHub

any Repo you want

Page 13: aboutActiveGitHub

time

any Repo you want

Present

Page 14: aboutActiveGitHub

time

any Repo you want

compute features

Page 15: aboutActiveGitHub

time

any Repo you want

compute features active ?

predict

p = 0.73
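For a logistic regression model, the predict step amounts to a weighted sum of the features passed through the logistic function. The sketch below shows that arithmetic only; the feature names, coefficients, and intercept are hypothetical placeholders, not values from activeGitHub.

```python
import math


def predict_active(features, coefficients, intercept):
    """Logistic regression prediction: sigmoid(intercept + sum(coef * feature))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical coefficients and features, for illustration only.
example_coefficients = {"log10_days_since_last_commit": -1.5,
                        "commits_last_2w": 0.4}
example_features = {"log10_days_since_last_commit": 1.0,
                    "commits_last_2w": 3}
p_active = predict_active(example_features, example_coefficients, intercept=0.5)
```

With these made-up numbers, a negative coefficient on time since the last commit pulls the probability down, and a positive coefficient on recent commits pushes it up.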

Page 16: aboutActiveGitHub

[Chart: Regression Coefficient for each feature. Annotation: feature predicts ACTIVE.]

Page 17: aboutActiveGitHub

[Chart: Regression Coefficient for each feature. Annotation: feature predicts ACTIVE. Features shown: recent commit rate (2 w).]

Page 18: aboutActiveGitHub

[Chart: Regression Coefficient for each feature. Annotation: feature predicts ACTIVE. Features shown: recent commit rate (2 w); days since last commit.]

Page 19: aboutActiveGitHub

[Chart: Regression Coefficient for each feature. Annotation: feature predicts ACTIVE. Features shown: recent commit rate (2 w); days since last commit; contributor diversity.]

Page 20: aboutActiveGitHub

[Chart: Regression Coefficient for each feature. Annotation: feature predicts ACTIVE. Features shown: total commits; recent commit rate; contributor diversity; days since last commit; age. Time windows: 2 w, 1 m, 3 m, 6 m, 1 y.]
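A minimal sketch of how features like these could be computed from a commit log follows. The slides do not define the features precisely, so several choices here are assumptions: the window lengths in days, measuring contributor diversity as the number of distinct authors in a window, and the names `WINDOWS` and `compute_features`. The log-scaled variant mirrors the `log10(daysSinceLastCommit + 1)` axis used in the plots.

```python
from datetime import datetime, timedelta
import math

# Window lengths in days (assumed; the slides give only 2 w, 1 m, 3 m, 6 m, 1 y).
WINDOWS = {"2w": 14, "1m": 30, "3m": 91, "6m": 183, "1y": 365}


def compute_features(commits, now):
    """commits: non-empty list of (timestamp, author) tuples at or before `now`."""
    times = [t for t, _ in commits]
    features = {
        "total_commits": len(commits),
        "age_days": (now - min(times)).days,
        "days_since_last_commit": (now - max(times)).days,
    }
    for name, days in WINDOWS.items():
        start = now - timedelta(days=days)
        recent = [(t, a) for t, a in commits if t > start]
        # Commits per day within the window.
        features["commit_rate_" + name] = len(recent) / days
        # Distinct authors as a stand-in for "contributor diversity" (assumption).
        features["contributors_" + name] = len({a for _, a in recent})
    features["log10_days_since_last_commit"] = math.log10(
        features["days_since_last_commit"] + 1)
    return features


# Hypothetical commit log, for illustration only.
now = datetime(2014, 1, 1)
commits = [
    (datetime(2013, 12, 25), "alice"),
    (datetime(2013, 12, 20), "bob"),
    (datetime(2013, 6, 1), "alice"),
]
features = compute_features(commits, now)
```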

Page 21: aboutActiveGitHub
Page 22: aboutActiveGitHub

by Gautham Nair

Page 23: aboutActiveGitHub

[Figure: only labels survive extraction. Recoverable annotation terms: axon guidance; nuclear division; microtubule-based process; generation of a signal involved in cell-cell signaling; negative regulation of multicellular organismal process; actin cytoskeleton organization.]

by Gautham Nair

[Figure, panels A to E: age since EMS division (min, 0 to 150) for MS, E, C and D, comparing wild type (N2 at 20 °C and 25 °C) with div-1(g19) and div-1(or148) at 25 °C; number of cells N = 27 and N = 48; unc-120 RNA (0 to 1500).]

Systems biology

Physical Chemistry

Page 24: aboutActiveGitHub

[Chart: Regression Coefficient for each feature. Annotation: predicts ACTIVE. Features shown: total commits; recent commit rate; contributor diversity; days since last commit; age. Time windows: 2 w, 1 m, 3 m, 6 m, 1 y.]

Page 25: aboutActiveGitHub

[Plot: prob Alive (0.00 to 1.00) vs. log10(daysSinceLastCommit + 1) (0 to 3). Legend: count (250 to 1250).]

Page 26: aboutActiveGitHub

[Plot: density (0.0 to 0.6) vs. log10(daysSinceLastCommit + 1) (0 to 3), grouped by stargazers_count > 1000 (FALSE / TRUE).]

Page 27: aboutActiveGitHub

[Plot: cumulative distribution (0.00 to 1.00) of gap length in months (0 to 12); only gaps > 1 week.]

Page 28: aboutActiveGitHub

Logistic Regression with default regularization

              aliveOrDead
predicted     alive    dead
    alive      6167    1924
    dead       1673    9036
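Summary metrics can be derived from the counts on this slide, as in the sketch below. It assumes the first label of each pair is the prediction and the second is the actual outcome; the slide's layout suggests this but does not state it outright. Accuracy does not depend on that assumption, while precision and recall swap if the orientation is the other way around.

```python
# Counts from the slide, keyed as (predicted, actual); orientation is assumed.
confusion = {
    ("alive", "alive"): 6167,
    ("alive", "dead"): 1924,
    ("dead", "alive"): 1673,
    ("dead", "dead"): 9036,
}

total = sum(confusion.values())
correct = confusion[("alive", "alive")] + confusion[("dead", "dead")]
accuracy = correct / total

# Of the repos predicted alive, the fraction that really were alive.
predicted_alive = confusion[("alive", "alive")] + confusion[("alive", "dead")]
precision_alive = confusion[("alive", "alive")] / predicted_alive

# Of the repos that really were alive, the fraction predicted alive.
actual_alive = confusion[("alive", "alive")] + confusion[("dead", "alive")]
recall_alive = confusion[("alive", "alive")] / actual_alive

print(f"n={total} accuracy={accuracy:.3f} "
      f"precision={precision_alive:.3f} recall={recall_alive:.3f}")
```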

Page 29: aboutActiveGitHub

[Three histograms of count vs. a log-scaled duration, x axis 0 to 3 in each:
All gaps between commits: log10(gap_length_days + 1), count 0 to 40000.
Longest commits gap: log10(longest_gap_days + 1), count 0 to 750.
Time since last commit: log10(daysSinceLastCommit + 1), count 0 to 1500.]

Page 30: aboutActiveGitHub

any Repo you want

Page 31: aboutActiveGitHub

time

any Repo you want

Present

Page 32: aboutActiveGitHub

time

any Repo you want

compute features

Page 33: aboutActiveGitHub

time

repoA, repoB

repoZ

compute features active or not?

Page 34: aboutActiveGitHub
Page 35: aboutActiveGitHub
Page 36: aboutActiveGitHub
Page 37: aboutActiveGitHub
Page 38: aboutActiveGitHub

Finished: R repos with > 1 stars; Python repos with > 9 stars

Page 39: aboutActiveGitHub
Page 40: aboutActiveGitHub

false “Alive”

p_alive > 95%

Page 41: aboutActiveGitHub

false “Alive”

Page 42: aboutActiveGitHub
Page 43: aboutActiveGitHub

false “Alive”

Page 44: aboutActiveGitHub

false “Alive”

Page 45: aboutActiveGitHub

false “Alive”

Page 46: aboutActiveGitHub

p_alive < 5%

false “Dead”

Page 47: aboutActiveGitHub

false “Dead”

Page 48: aboutActiveGitHub

false “Dead”

Page 49: aboutActiveGitHub

false “Dead”