DESCRIPTION
Short explanation of the motivation behind activegithub.com and the underlying model.
activeGitHub
by Gautham Nair
Python wrappers for the GitHub API
[Figure: commit timeline, 2008 to 2013, with a gap in activity and the question "future activity?"]
activeGitHub
A quantitative estimator of future repository activity
Estimated probability of any commits in the next six months
[Figure: timeline, 2008 to 2014]
Training Data
20,000 repos (repoA, repoB, ... repoY), each on a timeline from birth to the Present
Step 6 months back from the Present
For each repo: compute features from the history up to that point, then label it by the six months that followed: active or not?
Logistic Regression
train / test
AUC = 0.89
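The train/test step on this slide can be sketched in a few lines. This is a minimal illustration, not the code behind the talk. The synthetic data, the single feature, and the helper names (`fit_logistic`, `predict_proba`, `auc`) are all assumptions, and plain gradient descent stands in for whatever solver was actually used.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, xi):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in for the real repos (assumption): one feature,
# log10(days since last commit + 1); longer-idle repos are less often active.
random.seed(0)
X, y = [], []
for _ in range(400):
    x = random.uniform(0.0, 3.0)
    X.append([x])
    y.append(1 if random.random() < sigmoid(3.0 - 2.5 * x) else 0)

# Hold out a quarter of the repos for testing.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w = fit_logistic(X_train, y_train)
test_scores = [predict_proba(w, xi) for xi in X_test]
test_auc = auc(test_scores, y_test)
print(f"coefficient = {w[1]:.2f}, test AUC = {test_auc:.2f}")
```

The AUC printed here describes only the synthetic data; it is not a reproduction of the 0.89 reported on the slide.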
any Repo you want
[Figure: the repo's timeline up to the Present]
compute features
active ?
predict
p = 0.73
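How a fitted logistic regression turns one repo's features into a probability can be sketched as below. The feature names, coefficient values, and intercept are hypothetical placeholders chosen for illustration; they are not the fitted values from the talk.

```python
import math

def predict_active(features, coefficients, intercept):
    """Logistic regression prediction: sigmoid of a weighted sum of features."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients (assumption, not from the slides).
coefficients = {"log_days_since_last_commit": -2.0, "recent_commit_rate": 0.8}
intercept = 1.0

# Hypothetical feature values for a single repo.
repo = {"log_days_since_last_commit": 0.5, "recent_commit_rate": 1.0}

p = predict_active(repo, coefficients, intercept)
print(f"p = {p:.2f}")
```

A negative coefficient lowers the probability as its feature grows, and a positive one raises it, which is how the coefficient chart in the deck can be read.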
[Figure: Regression Coefficient for each feature, with the axis direction labeled "predicts ACTIVE". Features: total commits, recent commit rate, contributor diversity, days since last commit, age. Time windows shown: 2 w, 1 m, 3 m, 6 m, 1 y.]
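The features named in the chart could be computed from a repo's commit history roughly as follows. This is a sketch under stated assumptions: the input format (a list of timestamp and author pairs), the function name `compute_features`, and the definition of contributor diversity as the number of distinct authors are all guesses, since the slides do not define them.

```python
from datetime import datetime, timedelta

def compute_features(commits, now, window_days=14):
    """commits: list of (timestamp, author) tuples, all at or before `now`."""
    times = sorted(t for t, _ in commits)
    window_start = now - timedelta(days=window_days)
    recent = [t for t in times if t > window_start]
    return {
        "total_commits": len(times),
        # Commits per day inside the window (14 days matches the "2 w" label).
        "recent_commit_rate": len(recent) / window_days,
        "days_since_last_commit": (now - times[-1]).days,
        # Assumption: diversity is taken as the count of distinct authors.
        "contributor_diversity": len({author for _, author in commits}),
        "age_days": (now - times[0]).days,
    }

# Made-up commit history for illustration.
now = datetime(2014, 1, 1)
commits = [
    (datetime(2013, 1, 1), "alice"),
    (datetime(2013, 6, 1), "bob"),
    (datetime(2013, 12, 25), "alice"),
    (datetime(2013, 12, 30), "alice"),
]
features = compute_features(commits, now)
print(features)
```

Calling `compute_features` with other `window_days` values would give rates for the longer windows on the chart.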
by Gautham Nair
[Figure: plot annotated with gene ontology terms, including axon guidance, nuclear division, microtubule-based process, generation of a signal involved in cell-cell signaling, negative regulation of multicellular organismal process, and actin cytoskeleton organization. Tick labels not recoverable.]
[Figure, panels A to E: unc-120 RNA (axis 0 to 1500) and age since EMS division (min, axis 0 to 150) for MS, E, C, and D. Conditions: wild type (N2 at 20 °C and 25 °C), div-1(g19) at 25 °C, div-1(or148) at 25 °C. Panels labeled "Num of cells = 27" and "Num of cells = 48".]
Systems biology
Physical Chemistry
[Figure: probability Alive (0 to 1) against log10(daysSinceLastCommit + 1), with point size showing count (250 to 1250).]
[Figure: density of log10(daysSinceLastCommit + 1), split by stargazers_count > 1000 (FALSE / TRUE).]
[Figure: cumulative distribution of gap length in months (0 to 12), only gaps > 1 week.]
Logistic Regression with default regularization
               predicted
aliveOrDead    alive   dead
  alive         6167   1924
  dead          1673   9036
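Summary metrics follow from these four counts by simple arithmetic. The sketch below assumes that rows are the true labels (aliveOrDead) and columns are the predictions, which is one reading of the table layout; the variable names are illustrative.

```python
# Counts from the slide. Assumption: rows are true labels, columns are predictions.
true_alive_pred_alive = 6167
true_alive_pred_dead = 1924
true_dead_pred_alive = 1673
true_dead_pred_dead = 9036

total = (true_alive_pred_alive + true_alive_pred_dead
         + true_dead_pred_alive + true_dead_pred_dead)

# Fraction of all repos classified correctly (independent of orientation).
accuracy = (true_alive_pred_alive + true_dead_pred_dead) / total

# Of the repos predicted alive, the fraction that really were alive.
precision_alive = true_alive_pred_alive / (true_alive_pred_alive + true_dead_pred_alive)

# Of the repos that really were alive, the fraction predicted alive.
recall_alive = true_alive_pred_alive / (true_alive_pred_alive + true_alive_pred_dead)

print(f"n = {total}")
print(f"accuracy = {accuracy:.3f}")
print(f"precision (alive) = {precision_alive:.3f}")
print(f"recall (alive) = {recall_alive:.3f}")
```

If the table is oriented the other way, precision and recall swap while accuracy stays the same.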
[Figure: three histograms of counts on a log10(days + 1) axis from 0 to 3.]
All gaps between commits: log10(gap_length_days + 1), counts up to 40000
Longest commit gap: log10(longest_gap_days + 1), counts up to 750
Time since last commit: log10(daysSinceLastCommit + 1), counts up to 1500
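The three quantities in these histograms can be derived from a list of commit dates, as in the sketch below. The helper names and the sample dates are assumptions made for illustration; only the log10(days + 1) transform and the "more than one week" filter come from the slides.

```python
import math
from datetime import date

def gap_lengths_days(commit_dates):
    """Days between each pair of consecutive commits."""
    ordered = sorted(commit_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]

def log_days(days):
    """The log10(days + 1) transform used on the histogram axes."""
    return math.log10(days + 1)

# Made-up commit dates for a single repo.
commit_dates = [date(2013, 1, 1), date(2013, 1, 3),
                date(2013, 4, 12), date(2013, 4, 13)]
present = date(2014, 1, 1)

gaps = gap_lengths_days(commit_dates)
longest_gap_days = max(gaps)
days_since_last_commit = (present - max(commit_dates)).days

# Keep only gaps longer than one week, as in the cumulative-distribution slide.
long_gaps = [g for g in gaps if g > 7]

print(gaps, longest_gap_days, days_since_last_commit)
print(round(log_days(longest_gap_days), 3))
```

Adding 1 before taking the logarithm keeps a gap of zero days finite on the log axis.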
Finished: R repos with > 1 star; Python repos with > 9 stars
p_alive > 95%: false "Alive" examples
p_alive < 5%: false "Dead" examples