Algebraic statistics for network models
Connecting statistics, combinatorics, and computational algebra
Part Two
Sonja Petrovic
(Statistics Department, Pennsylvania State University) ↓
Applied Mathematics Department, Illinois Institute of Technology
Summer School on Network Science, Columbia, SC
Tuesday, 21 May 2013
S. Petrovic ([email protected]) Algebraic Statistics for Network Models Tuesday, 21 May 2013 1 / 15
Motivating problem Tools: algebraic statistics, polyhedral geometry
Recap: Model Validation Problem
Problem
Given a candidate ERGM $\mathcal{P}$ and one observed network $x$, decide (with a high degree of confidence) whether $x$ can be regarded as a draw from some distribution $P_{\theta_0} \in \mathcal{P}$.
Maximum likelihood estimation problem: Use the observed data $x$ to produce an optimal estimate for $\theta_0$. (Faces of model polytope.)
Goodness-of-fit problem (and model selection): Can the MLE be considered as a satisfactory generative model for the data at hand? (Random walk on a fiber.)
Three fundamental problems... ... and a model for many types of interactions
The p1 model (Holland-Leinhardt ’81): a renowned model for directed networks
Assumes dyads are mutually independent draws from $\binom{n}{2}$ multinomial distributions of size 1 with class probabilities
$p_{ij} = (p_{ij}(0,0),\; p_{ij}(1,0),\; p_{ij}(0,1),\; p_{ij}(1,1)) \in \Delta_3 \subset \mathbb{R}^4, \quad i < j,$
corresponding to one of the four possible edge configurations:
$p_{ij}(0,0)$ = no edge
$p_{ij}(1,0)$ = edge from $i$ to $j$
$p_{ij}(0,1)$ = edge from $j$ to $i$
$p_{ij}(1,1)$ = bidirected edge between $i$ and $j$.
Definition
The p1 model $M_n$ is the image of the simplex under the polynomial map
$\varphi_n : \mathbb{C}[p_{ij}(*,*)] \to \mathbb{C}[\lambda_{ij}, \alpha_i, \beta_i, \theta, \rho_{ij}]$
$p_{ij}(0,0) \mapsto \lambda_{ij}$
$p_{ij}(1,0) \mapsto \lambda_{ij}\alpha_i\beta_j\theta$
$p_{ij}(0,1) \mapsto \lambda_{ij}\alpha_j\beta_i\theta$
$p_{ij}(1,1) \mapsto \lambda_{ij}\alpha_i\alpha_j\beta_i\beta_j\theta^2\rho_{ij}$.
(Embedded slide: "The p1 random graph model: algebraic perspective", Sonja Petrovic (PSU), A. Rinaldo and S.E. Fienberg (CMU), July 12, 2012.)
Different variants of the p1 model can be obtained by constraining the model parameters (no reciprocal effect; constant, or edge-dependent reciprocation).
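The parametrization above can be sketched numerically. The following is a minimal illustration, not the speaker's code: it assumes all parameters are given on the multiplicative (positive) scale and that $\lambda_{ij}$ acts as the normalizing constant that makes the four dyad probabilities sum to one. The function name and the parameter values are hypothetical.

```python
from itertools import combinations


def p1_dyad_probabilities(alpha, beta, theta, rho):
    """Return {(i, j): (p00, p10, p01, p11)} for every dyad i < j.

    alpha, beta: lists of positive node parameters (multiplicative scale).
    theta: positive density parameter.
    rho: dict mapping (i, j) to a positive reciprocation parameter.
    Assumption: lambda_ij is chosen so that the four values sum to 1.
    """
    n = len(alpha)
    probs = {}
    for i, j in combinations(range(n), 2):
        w00 = 1.0                                  # no edge
        w10 = alpha[i] * beta[j] * theta           # edge from i to j
        w01 = alpha[j] * beta[i] * theta           # edge from j to i
        w11 = (alpha[i] * alpha[j] * beta[i] * beta[j]
               * theta ** 2 * rho[(i, j)])         # bidirected edge
        lam = 1.0 / (w00 + w10 + w01 + w11)        # normalizing constant
        probs[(i, j)] = (lam * w00, lam * w10, lam * w01, lam * w11)
    return probs


# Hypothetical parameter values for a 3-node network, constant reciprocation.
alpha = [1.0, 2.0, 0.5]
beta = [1.0, 1.0, 2.0]
theta = 0.5
rho = {(0, 1): 2.0, (0, 2): 2.0, (1, 2): 2.0}
probs = p1_dyad_probabilities(alpha, beta, theta, rho)
for dyad, p in sorted(probs.items()):
    print(dyad, [round(v, 4) for v in p])
```

Setting every `rho[(i, j)]` to 1 gives the variant with no reciprocal effect; a single shared value gives constant reciprocation.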
Fundamental questions relevant for statistical analysis of network data
Model validation/selection problems
Standard asymptotics not applicable: complexity growth (the number of parameters increases with the number of nodes).
Sufficient statistics for the most general case of the p1 model are:
in-degrees of nodes
out-degrees of nodes
number of reciprocated edges
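As a small illustration of the three statistics listed above, the sketch below computes them from a 0/1 adjacency matrix. It assumes `adj[i][j] == 1` means an edge from node i to node j, and it counts each reciprocated pair once; the function name and the example network are hypothetical.

```python
def p1_sufficient_statistics(adj):
    """Return (in_degrees, out_degrees, reciprocated) for a directed graph.

    adj is a square 0/1 matrix (list of lists); diagonal entries are ignored.
    """
    n = len(adj)
    out_deg = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    in_deg = [sum(adj[i][j] for i in range(n) if i != j) for j in range(n)]
    # A dyad {i, j} is reciprocated when both i -> j and j -> i are present.
    reciprocated = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if adj[i][j] and adj[j][i]
    )
    return in_deg, out_deg, reciprocated


# Hypothetical network with edges 0->1, 0->2, 1->0, 2->1.
adj = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]
in_deg, out_deg, reciprocated = p1_sufficient_statistics(adj)
print(in_deg, out_deg, reciprocated)
```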
Questions
1 What parameter values ‘best explain’ the data? (Ideas in Lecture 1)
2 What’s wrong with goodness-of-fit?
3 Are there random walks? What are their properties? (Fixing values of sufficient statistics.)
Is the proposed model appropriate? ...and new theoretical challenges
Fitting network models – non-interpretable GoF output!
UCINET contains some routines for fitting the p1 model (Network>P1).
Assumes constant reciprocation; will give you parameter estimates for $\rho, \alpha_i, \beta_i, \theta$.
Bad news: UCINET returns a G-squared negative goodness-of-fit value, but “probabilities are not printed because the theoretical distribution governing these values has not yet been established.”
Good news: theoretical advance: algebraic statistics, non-asymptotic methods.
[Embedded figure: Figure 18.27. Results of P1 analysis of Knoke information network.]
Figure 18.27 shows the results of fitting the P1 model to the Knoke binary information network. The technical aspects of the estimation of the P1 model are complicated, and maximum likelihood methods are used. A G-square (likelihood ratio chi-square) badness of fit statistic is provided, but has no direct interpretation or significance test.
(Embedded slide: Elizabeth Gross, math.uic.edu/~lizgross, "Social Networks as Random Graphs".)
Is the proposed model appropriate? Non-asymptotic approach necessary
Random Walks for model validation
GF statistic: discrepancy(observed $x$, expected network), e.g., Pearson’s $\chi^2$ and the likelihood ratio statistic:
$\sum_{i<j} \sum_{k=1}^{4} \frac{\left(p_{ij}(k) - x_{ij}(k)\right)^2}{p_{ij}^2(k)}$ and $\sum_{i<j} \sum_{k=1}^{4} x_{ij}(k) \log\left(\frac{x_{ij}(k)}{p_{ij}(k)}\right)$.
Declare poor fit if GF is greater than a deterministic threshold (based on an asymptotic approximation: the asymptotic tests operate with known distributions of GF, but in sparse cases those may fail. Validity of $\chi^2$ for sparse data: Hutchinson ’79, Haberman ’88.)
Alternative (use exact conditional tests): compute the fraction of networks (from a relevant set $F(x)$) whose GF is larger!
$\alpha_x = \frac{|\{x' \in F(x) : GF(x') > GF(x)\}|}{|F(x)|}$
$\alpha_x$ statistically large, i.e. closer to 1 $\implies$ $p$ is closer to the observed network than most of the other relevant points $\implies$ the model fits really well.
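When the set $F(x)$ is small enough to list, $\alpha_x$ can be computed directly. The sketch below is only an illustration under stated assumptions: a network is stored as a dict from dyads (i, j), i < j, to a configuration index (0 = no edge, 1 = edge from i to j, 2 = edge from j to i, 3 = bidirected), GF is the likelihood ratio statistic (which reduces to a sum of $-\log p_{ij}(k)$ over the observed configurations, since each $x_{ij}(k)$ is 0 or 1), and `p_hat` holds made-up expected probabilities rather than a fitted p1 MLE. The two networks are the directed 3-cycles on three nodes, which share all in-degrees, out-degrees, and the reciprocated-edge count.

```python
import math
from fractions import Fraction


def likelihood_ratio_gf(x, p):
    """Sum of x_ij(k) * log(x_ij(k) / p_ij(k)) with 0/1 indicators x_ij(k).

    Only the observed configuration k = x[dyad] contributes: log(1 / p).
    """
    return sum(-math.log(p[dyad][k]) for dyad, k in x.items())


def exact_alpha(observed, fiber, gf):
    """Fraction of networks in the fiber whose GF exceeds the observed GF."""
    gf_observed = gf(observed)
    larger = sum(1 for y in fiber if gf(y) > gf_observed)
    return Fraction(larger, len(fiber))


dyads = [(0, 1), (0, 2), (1, 2)]
# Hypothetical expected probabilities (p00, p10, p01, p11), same for each dyad.
p_hat = {d: (0.4, 0.3, 0.2, 0.1) for d in dyads}

cycle_a = {(0, 1): 1, (0, 2): 2, (1, 2): 1}  # 0->1, 1->2, 2->0
cycle_b = {(0, 1): 2, (0, 2): 1, (1, 2): 2}  # 1->0, 2->1, 0->2
fiber = [cycle_a, cycle_b]


def gf(y):
    return likelihood_ratio_gf(y, p_hat)


alpha_a = exact_alpha(cycle_a, fiber, gf)
alpha_b = exact_alpha(cycle_b, fiber, gf)
print(alpha_a, alpha_b)
```

Exhaustive listing only works for very small fibers, which is why the slides turn to random walks next.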
Is the proposed model appropriate? Non-asymptotic approach necessary
Random Walks for model validation
A fiber $F(x)$ of an observed network is the set of all networks on the same number of nodes that have the same sufficient statistics.
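For very small networks the fiber can be listed by brute force, which makes the definition concrete. The sketch below assumes the sufficient statistics are in-degrees, out-degrees, and the number of reciprocated pairs, and it checks all $2^{n(n-1)}$ directed graphs without loops, so it is only feasible for a handful of nodes. Function names are hypothetical.

```python
from itertools import product


def sufficient_statistics(edges, n):
    """In-degrees, out-degrees and reciprocated-pair count of an edge set."""
    in_deg = [0] * n
    out_deg = [0] * n
    for i, j in edges:
        out_deg[i] += 1
        in_deg[j] += 1
    reciprocated = sum(1 for (i, j) in edges if i < j and (j, i) in edges)
    return tuple(in_deg), tuple(out_deg), reciprocated


def enumerate_fiber(observed_edges, n):
    """List every loop-free directed graph on n nodes with the same statistics."""
    target = sufficient_statistics(observed_edges, n)
    possible = [(i, j) for i in range(n) for j in range(n) if i != j]
    fiber = []
    for mask in product((0, 1), repeat=len(possible)):
        edges = frozenset(e for e, keep in zip(possible, mask) if keep)
        if sufficient_statistics(edges, n) == target:
            fiber.append(edges)
    return fiber


# Observed network: the directed 3-cycle 0 -> 1 -> 2 -> 0.
observed = frozenset({(0, 1), (1, 2), (2, 0)})
fiber = enumerate_fiber(observed, 3)
for graph in fiber:
    print(sorted(graph))
```

Here the fiber consists of the observed cycle and its reversal. The exponential cost of this enumeration is the reason for using Markov moves on larger networks.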
A Markov basis for the model is a set of moves that allows a walk from any network to any other network in the same fiber. (Markov chains for sampling from conditional distributions.)
Alternative, non-asymptotic approach to testing goodness of fit:
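set $k := 0$; set $x_{\text{old}} := x$; for $t = 1, \dots, K$ do:
1. randomly choose a Markov move $f$ and a number $\varepsilon \in \{-1, 1\}$;
2. if $x_{\text{old}} + \varepsilon f \in F(x)$, then $x_{\text{new}} := x_{\text{old}} + \varepsilon f$; else $x_{\text{new}} := x_{\text{old}}$;
3. if $GF(x_{\text{new}}) > GF(x)$, then $k := k + 1$;
4. set $x_{\text{old}} := x_{\text{new}}$.
For $K \gg 0$, $k/K$ is an accurate estimate of $\alpha_x$ [Diaconis-Sturmfels ’98].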
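The walk can be sketched in a setting where a Markov basis is easy to write down. The example below deliberately swaps in a two-way contingency table under the independence model instead of a network model: there the fiber is the set of nonnegative integer tables with the same row and column sums, and the basic moves (+1, -1 / -1, +1 on a 2x2 subtable) are known to connect it. GF is the standard Pearson statistic with expected counts from the margins, computed in exact rational arithmetic so that ties are handled exactly. All names and the example table are hypothetical.

```python
import random
from fractions import Fraction


def pearson_chi2(table):
    """Pearson chi-square against independence, as an exact Fraction."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = Fraction(0)
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = Fraction(r * c, n)  # assumes all margins are positive
            total += (table[i][j] - expected) ** 2 / expected
    return total


def estimate_alpha(observed, num_steps, seed=0):
    """Estimate alpha_x by a random walk on the fiber using basic moves."""
    rng = random.Random(seed)
    n_rows, n_cols = len(observed), len(observed[0])
    gf_observed = pearson_chi2(observed)
    current = [row[:] for row in observed]
    exceed = 0
    for _ in range(num_steps):
        # Choose a basic move f (two rows, two columns) and a sign eps.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        eps = rng.choice((-1, 1))
        proposal = [row[:] for row in current]
        proposal[i1][j1] += eps
        proposal[i2][j2] += eps
        proposal[i1][j2] -= eps
        proposal[i2][j1] -= eps
        # The move keeps all margins; stay in the fiber iff no entry is negative.
        if min(min(row) for row in proposal) >= 0:
            current = proposal
        if pearson_chi2(current) > gf_observed:
            exceed += 1
    return exceed / num_steps


observed = [[3, 1, 0], [0, 2, 2]]
alpha_hat = estimate_alpha(observed, num_steps=20000)
print(alpha_hat)
```

Because each move is proposed with the same probability as its reverse and rejected proposals leave the walk in place, the walk targets the uniform distribution on the fiber, which matches the definition of $\alpha_x$ as a plain fraction.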
Markov bases guarantee connected fibers for log-linear models.
Markov bases take-home message:
Well-defined for linear exponential families.
The only set of moves that guarantees that the random walk will give the real distribution. Ignoring subtleties and going with simple or heuristic moves will at best give an approximation, but we have no clue about its accuracy.
$\implies$ So, if the goal is to use fixed values for minimal sufficient statistics, then one must use Markov bases. Anything else is not guaranteed to be exact.
Direct tie: problems of sampling the conditional distributions of the space of graphs with fixed characteristics (e.g. degree sequence or distribution, in- and out-degrees for directed graphs, etc.).
Software (Macaulay2, 4ti2) can be used on small examples. Dynamic algorithms for any reasonably-sized example.
Algebraic geometry interlude!
INTERLUDE
Algebraic geometry interlude!
What is Algebraic Statistics?
Algebraic geometry and related fields applied to statistics
Fact (Guiding principle)
Many important statistical models correspond to algebraic or semi-algebraic sets of parameters.
The geometry of these parameter spaces determines the behavior of widely used statistical inference procedures.
Model geometry: "Shape" of a statistical model: intuitive notion of fundamental importance to statistical inference; reflected in its abstract geometric properties.
Ex: Is the likelihood function multimodal? Does the model have singularities (is it non-regular)? What is the nature of the underlying singularities?
When a model is algebraic, use tools from algebraic geometry and computational algebra software packages.
History
Diaconis, Sturmfels ’98: Gröbner bases for exact conditional tests.
Pistone, Wynn ’96: Use Gröbner bases to study confounding in design of experiments.
Pistone, Riccomagno, Wynn ’01: Algebraic statistics: computational commutative algebra in statistics.
Drton, Sturmfels, Sullivant ’09: Lectures in algebraic statistics.
Gibilisco, Riccomagno, Rogantin, Wynn ’09: Algebraic and Geometric methods in statistics.
Current theory and applications:
- contingency tables, sampling methods, graphical and latent class models, factor analysis, design of experiments, tropical geometry
- statistical disclosure limitation, computational biology
Introduction to algebraic geometry
Example: Hardy-Weinberg Equilibrium
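The slide body did not survive extraction, so the following is only a standard textbook illustration of the title, not a reconstruction of what was shown. It assumes the usual one-parameter Hardy-Weinberg parametrization of genotype probabilities by an allele frequency θ, and checks with exact rational arithmetic that every point of the model satisfies a single polynomial equation. The names `hardy_weinberg` and `implicit_equation` are hypothetical.

```python
from fractions import Fraction

def hardy_weinberg(theta):
    """Genotype probabilities (p_AA, p_Aa, p_aa) for allele frequency theta."""
    return (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)

def implicit_equation(p_AA, p_Aa, p_aa):
    """Polynomial that vanishes on the model: p_Aa^2 - 4 * p_AA * p_aa."""
    return p_Aa ** 2 - 4 * p_AA * p_aa

# Evaluate the polynomial on a grid of parameter values (exact arithmetic).
residuals = [implicit_equation(*hardy_weinberg(Fraction(t, 10))) for t in range(11)]

# A probability vector that is not in the model gives a nonzero value.
off_model = implicit_equation(Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
print(residuals, off_model)
```

The model is thus a curve inside the probability simplex, cut out by one equation: an instance of the guiding principle that statistical models correspond to (semi-)algebraic sets.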
The main point of this interlude
1. Many statistical models are described by (semi-)algebraic constraints on a natural parameter space.
2. Generators of the vanishing ideal can be useful for constructing algorithms or analyzing properties of a statistical model.
3. Two examples:
- Markov bases
- Identifiability of phylogenetic mixture models
End of interlude!
Markov bases guarantee connected fibers. Breaking news: algebra gives Markov bases
Algebraic geometry of statistical models
In algebraic statistics, we study statistical models whose parameter spaces correspond to real positive parts of algebraic varieties.
Log-linear models correspond to toric varieties.
Theorem (Diaconis-Sturmfels 1998)
A set of moves is a Markov basis if and only if the corresponding binomials generate the toric ideal of the model.
The toric ideal of the p1 model is the set of all polynomial (binomial) relations among the joint probabilities $p_{ij}(0,0)$, $p_{ij}(0,1)$, $p_{ij}(1,0)$, $p_{ij}(1,1)$.
These binomials are zero for any choice of model parameters. Nontrivial to derive.
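The correspondence between moves and binomials in the theorem can be illustrated on a model much smaller than p1. The sketch below assumes the independence model for a 2x2 table, whose sufficient statistics are the row and column sums; a move f is split into its positive and negative parts, which become the two monomials of the binomial. The helper names `apply_design` and `move_to_binomial` are hypothetical.

```python
def apply_design(A, f):
    """Return A @ f for a design matrix A (list of rows) and a vector f."""
    return [sum(a * b for a, b in zip(row, f)) for row in A]

def move_to_binomial(f, names):
    """Write f = f_plus - f_minus and return p^{f_plus} - p^{f_minus} as a string."""
    def monomial(exponents):
        parts = [n if e == 1 else f"{n}^{e}"
                 for n, e in zip(names, exponents) if e > 0]
        return "*".join(parts) if parts else "1"
    f_plus = [max(c, 0) for c in f]
    f_minus = [max(-c, 0) for c in f]
    return f"{monomial(f_plus)} - {monomial(f_minus)}"

# Assumed toy model: independence for a 2x2 table, cells ordered p11, p12, p21, p22.
names = ["p11", "p12", "p21", "p22"]
A = [
    [1, 1, 0, 0],  # row 1 total
    [0, 0, 1, 1],  # row 2 total
    [1, 0, 1, 0],  # column 1 total
    [0, 1, 0, 1],  # column 2 total
]
move = [1, -1, -1, 1]

# A move must leave the sufficient statistics unchanged, i.e. A f = 0.
unchanged = apply_design(A, move) == [0, 0, 0, 0]
binomial = move_to_binomial(move, names)
print(unchanged, binomial)
```

Under the parametrization p_ij = a_i * b_j both monomials equal a1*a2*b1*b2, so the binomial vanishes for every choice of parameters, which is the defining property of elements of the toric ideal.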
Algebraic statistics: tools that guarantee algorithms for GoF
The story of 80,000 Markov moves
Theorem (P.-Rinaldo-Fienberg, 2010)
The toric ideal of the p1 random graph model on n nodes is the multi-homogeneous piece of the ideal generated mainly by the defining equations for the edge subring of a bipartite graph.
Natural toric parametrization → 4ti2 computed a minimal generating set for the 4-node graph: 80,610 binomials.
Our parametrization → 77 minimal generators.
Theorem + multi-grading → 10 essential ‘pieces’ of generators.
Two easy examples of Markov moves for the p1 model
Recall: $p_{ij}(1,0) \mapsto \lambda_{ij}\alpha_i\beta_j\theta$, $p_{ij}(0,0) \mapsto \lambda_{ij}$,
$p_{ij}(0,1) \mapsto \lambda_{ij}\alpha_j\beta_i\theta$, $p_{ij}(1,1) \mapsto \lambda_{ij}\alpha_i\alpha_j\beta_i\beta_j\theta^2\rho_{ij}$.
The fiber of any network in the p1 model is the set of all other networks with the same in- and out-degrees and number of reciprocated edges.
Two moves, and the corresponding binomials:
$p_{12}(1,0)\,p_{23}(1,0)\,p_{13}(0,1) - p_{12}(0,1)\,p_{23}(0,1)\,p_{13}(1,0) = 0$,
$p_{12}(0,0)\,p_{14}(1,0)\,p_{23}(1,0)\,p_{24}(0,1)\,p_{34}(0,0) - p_{12}(1,0)\,p_{14}(0,0)\,p_{23}(0,0)\,p_{24}(1,0)\,p_{34}(0,1) = 0$.
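As a sanity check, the two binomials can be evaluated under the parametrization recalled above. The sketch below draws arbitrary positive rational parameter values (an assumption made only so that the arithmetic is exact) and confirms that both binomials evaluate to zero. The function names `p1_prob` and `term` are hypothetical.

```python
from fractions import Fraction
import random

rng = random.Random(0)

def rand_param():
    """Arbitrary positive rational value, used only to test the identities."""
    return Fraction(rng.randint(1, 9), rng.randint(1, 9))

nodes = [1, 2, 3, 4]
pairs = [(i, j) for i in nodes for j in nodes if i < j]
alpha = {i: rand_param() for i in nodes}
beta = {i: rand_param() for i in nodes}
lam = {pair: rand_param() for pair in pairs}
rho = {pair: rand_param() for pair in pairs}
theta = rand_param()

def p1_prob(i, j, a, b):
    """Monomial image of p_ij(a, b) under the recalled parametrization (i < j)."""
    value = lam[i, j]
    if a == 1:                      # factor alpha_i * beta_j * theta
        value *= alpha[i] * beta[j] * theta
    if b == 1:                      # factor alpha_j * beta_i * theta
        value *= alpha[j] * beta[i] * theta
    if a == 1 and b == 1:           # reciprocation factor rho_ij
        value *= rho[i, j]
    return value

def term(factors):
    """Product of p_ij(a, b) over a list of (i, j, a, b) tuples."""
    result = Fraction(1)
    for i, j, a, b in factors:
        result *= p1_prob(i, j, a, b)
    return result

binomial_1 = (term([(1, 2, 1, 0), (2, 3, 1, 0), (1, 3, 0, 1)])
              - term([(1, 2, 0, 1), (2, 3, 0, 1), (1, 3, 1, 0)]))
binomial_2 = (term([(1, 2, 0, 0), (1, 4, 1, 0), (2, 3, 1, 0), (2, 4, 0, 1), (3, 4, 0, 0)])
              - term([(1, 2, 1, 0), (1, 4, 0, 0), (2, 3, 0, 0), (2, 4, 1, 0), (3, 4, 0, 1)]))
print(binomial_1, binomial_2)
```

If $p_{ij}(1,0)$ is read as a directed edge from i to j, which the roles of $\alpha_i$ and $\beta_j$ suggest, the first binomial corresponds to reversing a directed 3-cycle on nodes 1, 2, 3, a move that leaves all in- and out-degrees and the number of reciprocated edges unchanged.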
Combinatorics of Markov bases - the parameter hypergraph -
Enter combinatorics: the parameter hypergraph
← Markov moves defined by color-balanced pictures →
General models: [P.-Stasi, 2012-13]
A d-uniform hypergraph ↔ a linear exponential family model: joint probabilities parametrized by monomials of degree d.
Color-balanced hypergraphs provide Markov moves.
Algebraic statistics asks for: (1) Squarefree Graver basis (2) minimal generators
Markov bases ↔ balanced hypergraphs
A Markov basis for the model M is thus described by bicolored monomial hypergraphs over the model hypergraph $H_M$ (P.-Stasi):
If E supports a binomial over $H_M$, then a Markov move on a fiber of the model corresponds to replacing the set of red edges in E by the set of blue edges in E.
Degree bounds for minimal generators → a bound for the Markov complexity (Markov width) of the model M.
Sometimes, the squarefree part of the Graver basis of $I_H$ is required for the full Markov basis (Hara-Takemura ’10).
[Figure: $H_M$ for the no 3-way interaction model]
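The red-to-blue replacement can be sketched as follows. The code assumes that "balanced" means every vertex (parameter) is covered equally often by red and by blue edges, and that the data are a multiset of hyperedges whose sufficient statistics are the vertex degrees; both are readings of the slide rather than quotations from it. The toy hypergraph is the one for 2x2 independence, with edges {a_i, b_j}, and the names `vertex_degrees`, `is_color_balanced`, and `apply_move` are hypothetical.

```python
from collections import Counter

def vertex_degrees(edges):
    """Count how often each vertex is covered by a multiset of hyperedges."""
    return Counter(v for edge in edges for v in edge)

def is_color_balanced(red, blue):
    """Balanced: every vertex lies in as many red edges as blue edges."""
    return vertex_degrees(red) == vertex_degrees(blue)

def apply_move(observed, red, blue):
    """Replace the red edges by the blue edges in the observed multiset.
    Returns None when the red edges are not all present (move not applicable)."""
    current = Counter(observed)
    needed = Counter(red)
    if any(current[edge] < count for edge, count in needed.items()):
        return None
    return current - needed + Counter(blue)

# Assumed toy model hypergraph: cell (i, j) of a 2x2 table is the edge (a_i, b_j).
red = [("a1", "b1"), ("a2", "b2")]
blue = [("a1", "b2"), ("a2", "b1")]
observed = [("a1", "b1"), ("a1", "b1"), ("a2", "b2")]

balanced = is_color_balanced(red, blue)
moved = apply_move(observed, red, blue)
print(balanced, moved)
```

Because the edge set is balanced, the move changes the data but not the vertex degrees, so the walk stays inside the fiber.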
So .... How about minimal generating sets?
Degree bounds (Markov complexity)
The graph case: a combinatorial criterion for $I_G$ to be generated by quadrics is the existence of special chords in G (Ohsugi-Hibi ’00, Villarreal ’01).
[Figure: example graphs on the vertices x1, ..., x7]
Theorem (Gross-P. ’12)
$I_H$ is generated in degree at most d, for any d ≥ 2, if and only if there exist appropriate splitting sets on balanced hypergraphs (binomial supports) on H.
Examples
Elizabeth Gross, math.uic.edu/~lizgross, Markov Complexity of Hypergraphs
Hypergraph with splitting set.
The criterion is based on decomposable monomial walks, separators, and splitting sets.
Non-existence of a splitting set =⇒ indispensable binomial in $I_H$.
Combinatorics and algorithms
Applicable Markov moves for general models [P.-Stasi ’13]
Color-balanced hypergraphs provide exactly the moves necessary for connecting the fiber in the presence of sampling constraints.
The moves need to be ‘primitive’ and ‘squarefree’.
Dual problem [P.-Stasi, ’13]
Detecting primitivity (squarefree) ↔ discrepancy problem in hypergraphs.
The hypergraph construction → a data-oriented dynamic algorithm for generating these moves for p1 [Gross-P.-Stasi 2013+].
A 10,000,000-move random walk
Dynamic exploration of a fiber of a p1 random network on 12 nodes, 16 edges.
A 1,000-move random walk
Dynamic exploration of a fiber of a p1 random network on 5,000 nodes, 100,000 edges.
summary
Summary: What does algebraic statistics machinery offer?
For any exponential family model, Markov bases are guaranteed to connect all graphs (or hypergraphs!) with fixed sufficient statistics.
The Fundamental Theorem of Markov Bases says that the Markov moves are equivalent to generators of the toric ideal of the model.
For many models of interest, the hypergraph representation in terms of algebraic statistics allows for a direct derivation of the moves.
* For example, for a fixed observed degree sequence of a graph, or fixed observed degree distribution, or fixed in- and out-degrees for directed graphs, the moves can be derived straight from known results in algebraic geometry and commutative algebra.
* Same goes for hypergraph degree sequences!
In particular, if you are interested in sampling/enumerating the space of (hyper)graphs with fixed linear properties, algebra offers the answer.