2013
Oriol Alarcón Heras
Albert Gordi Margalef
Supervisor: Dr. Christian T. Brownlees
Universitat Pompeu Fabra
IBE – Final Year Project
28/11/2013
Partial Correlation Network Analysis
Index

1. Executive summary
2. Network Analysis: theoretical setup
   2.1 Introduction
   2.2 Networks: Overview
       Graphs
       Characteristics of Graphs
       Adjacency Matrix (A)
       Notation
       Metrics
   2.3 Network Analysis of Multivariate Time Series
   2.4 Partial Correlation Network
       Types of Networks
       Setup and Definition
       Relation with Linear Regressions
       Characterizing the Partial Correlation Network
       Pros and Cons of Partial Correlation Network
   2.5 Sparse Network Estimation
       Sparsity of Networks: Definition and Assumption
       LASSO Estimation Technique
       Estimating the Partial Correlation Network (SPACE)
3. Illustrated Simulation
   3.1 Software R
       Extra Packages
   3.2 Procedure of the Illustrated Simulation
       True Network Definition
       Obtaining an Estimated Network
       Obtaining and Recording the Results
   3.3 Plotting the Results
       Lambda versus Sparsity
       ROC Curve
   3.4 Conclusions
4. Application with real data
   4.1 Introduction
   4.2 Explain the data
   4.3 Procedure
       Script
   4.4 Results
       Visual analysis
       Numerical analysis
   4.5 Conclusions
5. Conclusions
6. List of figures
7. Bibliography
8. Annexes
1. Executive summary

The final year project came to us as an opportunity to get involved in a topic that we found attractive while majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics. Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the Information Technologies boom of the last decades and the consequent exponential increase in the amount of data collected and stored day by day. Data analysts able to deal with Big Data and to extract useful results from it are in high demand and, in our understanding, the work they do, although sometimes controversial in ethical terms, is a clear source of added value both for private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument suited to the analysis of large datasets and directly related to computer science: Partial Correlation Networks.
The structure of the project has been determined by our objectives throughout its development. First, the characteristics of the studied instrument are explained, from the basic ideas up to the features of the model behind it, with the final goal of presenting the SPACE model as a tool for estimating interconnections between elements in large data sets. Afterwards, an illustrated simulation is performed in order to show the power and efficiency of the model presented. Finally, the model is put into practice by analyzing a relatively large data set of real-world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series. In short, our main goals are to present the model and to evaluate whether Partial Correlation Network Analysis is an effective, useful instrument that allows finding valuable results in Big Data.
As a result, the findings throughout this project suggest that the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) works well under the assumption of sparsity of the data. Moreover, partial correlation networks are shown to be a valid tool to represent cross-sectional interconnections between elements in large data sets.
The scope of this project is however limited, as there are some sections in which a deeper analysis would have been appropriate. Considering intertemporal connections between elements, the choice of the tuning parameter lambda, or a deeper analysis of the results in the real-data application are examples of aspects in which this project could be extended.
To sum up, the analyzed statistical tool has proved to be a very useful instrument for finding the relationships that connect the elements present in a large data set. After all, partial correlation networks allow the owner of such a set to observe and analyze existing linkages that could otherwise have been overlooked.
2. Network Analysis: theoretical setup

2.1 Introduction

As suggested by the journalist Alvin Toffler in the early 1970s in his book Future Shock1, a huge amount of information may end up hindering the decision making of an individual because of the difficulty of understanding a particular situation in the presence of too much available information. It is what he called ‘Information overload’, also known as ‘Infobesity’ or ‘Infoxication’. The idea has become more and more popular in the last decades, as the unstoppable progress of Information Technologies, globalization and the Internet phenomenon have fostered the exponential growth of the digital information available worldwide2.
These elements have made the exponential growth possible by dramatically reducing the costs of data collection in recent years. But there needs to be a reason why all this information is collected. The answer is probably the popular quote ‘information is power’.
Nowadays, contrary to the thought behind ‘Information overload’, not only individuals, but especially firms and organizations, tend to believe that the more information you own, the better: the better the interpretation of the situation, the better the decision-making process, and the better the outcome resulting from the information collected originally. This is why, given the mentioned reduction in data collection costs, the volume of digital information available has grown so dramatically in the last decades.
Storage, retrieval, transmission and manipulation of data are present these days in almost all fields of study, for instance biotechnology3, medical science4, telecommunications5 and, in particular for the scope of our project, the business and economics6 environment. Large datasets are collected day by day, with lots of empirical information about the functioning of the real world around us, at a cost much

1 (Toffler 1970)
2 (Cukier 2010)
3 (McBride 2012)
4 (Groves et al. 2013)
5 (Fox, van den Dam, and Shockley 2013)
6 (Einav and Levin 2013)
lower than a few decades ago. Leaving aside concerns regarding storage limitations and security issues with respect to personal information, it is important to bear in mind that stored information is not useful (profitable) per se. There needs to be a posterior information management process that creates value for the owner of the information. Big data7, the term used for these huge and complex data collections, challenges the capacity of human beings to handle such amounts of information with the existing tools and methods.
Stock market returns of S&P 100 companies, unemployment rates of European Union member states, growth rates in developing countries, inflation rates in top economies and so on are a few examples of the large set of economic data available to (almost) everyone nowadays. One can be interested, for instance, in the relationships between the different elements that make up one of the collections of data just mentioned. Dealing with and interpreting large datasets like these may end up being a hard task for experts, but especially for non-expert final users of this information.
This is why a field called Network Analysis has emerged in recent years. Network Analysis is a branch of statistics which helps interpret the hidden interconnections between the elements present in large datasets, like those classified under the name of big data. It uses available tools in the statistics literature to analyze large dimensional systems and, as a result, it may be used, for instance, to design graphically represented networks that synthesize the interconnections of large multivariate time series systems; interconnections that would have been hard to observe otherwise.
That way, statisticians and econometricians have found a method to beat information overload in this specific setup and turn it into a powerful tool: a tool that allows information users not just to read the results of the analysis, but to plot them in a graphical representation that eases the interpretation of the data obtained from real-world empirical observations. In other words, they have developed a procedure to construct simpler graphs representing the reality behind those complex large datasets.
7 (Arthur 2013)
2.2 Networks: Overview

Graphs

In Mathematics, the usual way of representing networks is using graphs, which can be generally defined as a collection of nodes connected by lines. However, depending on the purpose of the study, graphs can have alternative definitions. Since the notation for graphs can be quite extensive, we are going to focus only on those concepts useful for our project.

G = (V; ℰ)

A graph G is made out of two basic elements: the first one is the set of vertices (V), or nodes, that represent a set of elements; the second one is the set of edges (ℰ), or links, connecting vertices. The set of edges is defined as ℰ ⊆ V × V. Hence, the following condition holds:

(i, j) ∈ ℰ ⟺ i and j are connected by an edge.
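The definition above can be made concrete with a tiny sketch (illustrative Python; the variable names are ours, not from the project):

```python
# A graph G = (V, E): a set of vertices and a set of undirected edges.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({2, 3})}  # E is a subset of V x V

def connected(i, j, edges):
    """(i, j) in E  <=>  i and j are joined by an edge."""
    return frozenset({i, j}) in edges

print(connected(1, 2, E))  # True: an edge joins vertices 1 and 2
print(connected(1, 3, E))  # False: no edge between 1 and 3
```

Using `frozenset` for an edge makes (i, j) and (j, i) the same object, which matches the undirected graphs used throughout this project.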
Characteristics of Graphs

A graph representing a network can be classified according to its main features. Considering the scope of this project, the three main characteristics of a network graph to consider are directionality, weight, and color.

Directionality of the edges: Directed and undirected graphs differ in terms of the directionality of the edges. All the links between vertices in a directed network graph have a particular direction; therefore, an edge from i to j is different from an edge from j to i. On the other hand, in undirected network graphs edges do not have directionality, and an edge from i to j is the same as an edge from j to i. In mixed network graphs there exist both kinds of edges: some have direction and others do not. The direction of a link may represent several things, for instance Granger causality.
FIGURE 1 – UNDIRECTED AND DIRECTED NETWORK GRAPHS
Weight of the edges: The difference between weighted and unweighted graphs has to do with the relative weight of each edge. Weighted network graphs are constructed with edges that have different weights; this difference can be represented by assigning numbers to the edges or by drawing edges of different thickness. In unweighted graphs, all edges have the same weight. The weight of a link may have different interpretations, for example the intensity of the partial correlation.
FIGURE 2 – UNWEIGHTED AND WEIGHTED NETWORK GRAPHS
Color of the vertices/edges: Colored graphs are those which use different colors to represent vertices/edges in the network, that way classifying the nodes/links into different groups. On the other hand, all nodes and all links in an uncolored
graph are colored with the same color. Colors may be used to describe a difference in a particular feature between the vertices/edges.
FIGURE 3 – UNCOLORED AND COLORED NETWORK GRAPHS
As described later on, due to the specific conditions and resources of our project, undirected, unweighted (and, in the last section, colored) graphs are going to be used to represent the networks we deal with in the following sections. This means that, first, between two interrelated elements just one neutral connection without direction is represented; second, all interrelations found between two elements of the data set are represented with the same relevance (and third, the vertices/edges are colored so as to classify them according to a specific characteristic).
Adjacency Matrix (A)

The graphical representation of a network has its origin in what is known as the Adjacency matrix (A). It is the compact numerical representation of a network in the form of a matrix.

It can be defined as the |V| × |V| matrix such that, if vertices i and j are connected by an edge, then the element a_ij has a value different from zero8. In the case of an unweighted graph, all elements a_ij will be either one or zero, depending on the existence or not of an edge between vertices i and j, respectively. In weighted graphs, however, a_ij can take any value, as the network represented does not identify just the existence of a linkage, but also specifies its weight. Notice that, in all

8 Therefore, the elements in the diagonal of the Adjacency matrix are all zeros, as a node cannot have an edge with itself.
1
3
2 1
3
2
Uncolored Graph Colored Graph
cases in which a particular network graph is undirected, the corresponding Adjacency matrix is necessarily symmetric.

In order to get a clearer idea of how this works, an example of an undirected weighted graph is provided in Figure 4.
FIGURE 4 – UNDIRECTED-WEIGHTED NETWORK GRAPH AND ITS ADJACENCY MATRIX
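To make this concrete, the adjacency matrix of the weighted graph in Figure 4 can be assembled from its edge list (an illustrative Python sketch; the project itself works in R):

```python
# Edges of the Figure 4 graph as (vertex, vertex, weight) triples.
edges = [(1, 2, 15), (1, 3, 22), (2, 3, 12), (3, 4, 8), (3, 5, 13), (4, 5, 14)]

n = 5
A = [[0] * n for _ in range(n)]
for i, j, w in edges:
    A[i - 1][j - 1] = w
    A[j - 1][i - 1] = w  # undirected graph => symmetric matrix, zero diagonal

for row in A:
    print(row)
```

The resulting matrix is symmetric with a zero diagonal, exactly as described above.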
Notation

When describing graphical representations of networks, a couple of concepts are useful to define the structure of the network: path and component.

A path is a sequence of adjacent edges between two nodes, i.e. the set of edges that, starting from a particular vertex, need to be ‘gone through’ so as to reach another node in the network.

Components are sets of vertices such that all vertices in the set are connected by a path, i.e. each of the ‘separated’ groups of linked vertices which are not interconnected within a network. The figure below (Fig. 5) exhibits examples of both concepts.
The undirected weighted graph in Figure 4 has five vertices and edges 1–2 (weight 15), 1–3 (22), 2–3 (12), 3–4 (8), 3–5 (13) and 4–5 (14); its adjacency matrix, as printed by R, is:

     [,1] [,2] [,3] [,4] [,5]
[1,]    0   15   22    0    0
[2,]   15    0   12    0    0
[3,]   22   12    0    8   13
[4,]    0    0    8    0   14
[5,]    0    0   13   14    0
FIGURE 5 – PATH AND COMPONENTS IN A NETWORK GRAPH
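A component can be computed as everything reachable from a starting vertex by some path, for example with a breadth-first search. The snippet below is an illustrative Python sketch (the node labels are hypothetical, not those of Figure 5):

```python
from collections import deque

# A small undirected graph with two components: {A, B, C} and {D, E}.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}

def component(start):
    """All vertices reachable from `start` by some path, i.e. its component."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen

print(sorted(component("A")))  # ['A', 'B', 'C']
print(sorted(component("D")))  # ['D', 'E']
```

Since D and E are unreachable from A, the two sets form separate components of the network.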
Metrics

To analyze a particular network, there are some measurement tools that may be used to compare features of the network and of the elements forming it. They are called metrics.

Metrics can be global or local: global metrics refer to the whole network, while local metrics refer just to parts of it.

The three main metrics, and the ones used in the last section of this project, are centrality, clustering and similarity.
Centrality: measures of centrality indicate how central a vertex is in the network. The basic index of centrality is the degree of the vertex, i.e. the number of edges connecting the node. There are more complex measures of centrality, like the ones that internet search engines use to rank pages when showing the results of a specific search, for instance Google’s PageRank9, but they are beyond the scope of this project.

Clustering: measures of clustering indicate the degree of clustering between edges in a network. A simple metric of clustering for two specific nodes is the proportion of common adjacent nodes. An interesting feature of that metric is that networks in which the degree of clustering is very high are also known as small-world networks10. The name comes from the small-world phenomenon,

9 (Brin and Page 1998)
10 (Watts and Strogatz 1998)
Figure 5 depicts a network with vertices A to H, marking a path from A to G as a sequence of four edges, and two components: one containing six vertices and one containing two.
popularly known as six degrees of separation. The idea behind that phenomenon is that the maximum number of degrees of separation between any two nodes in a network is six11. It is mostly applied to social networks, positing that everyone in the world is connected by a friend-to-friend chain of at most six steps; in other words, anyone should be able to connect with Michael Jordan (or with José Gómez, former pétanque player12) through just six friends at most.
Similarity: measures of similarity are not as analytical as the two previous ones. They are based on observing characteristics of the elements in the network which are not represented by the edges, and subsequently comparing those features so as to find to what extent nodes are similar. The usual way to represent similarity in a graph is by the color, size or shape of the nodes. For instance, elements that are similar in a certain aspect can be colored with the same color, or the size and shape of the vertices may be set according to a feature of the elements.
There are other metrics and algorithms, like degree distribution, modularity or motifs, but they are beyond the scope of this project and are therefore not discussed in this section.
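The first two metrics can be sketched in a few lines (illustrative Python; the function names are ours): degree counts a node's edges, and a simple clustering measure is the share of common neighbours of two nodes.

```python
# Adjacency lists of a small undirected network.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}

def degree(v):
    """Basic centrality: the number of edges connecting node v."""
    return len(adj[v])

def common_neighbour_share(i, j):
    """Proportion of common adjacent nodes of i and j (a simple clustering metric)."""
    ni, nj = adj[i] - {j}, adj[j] - {i}
    return len(ni & nj) / len(ni | nj) if ni | nj else 0.0

print(degree(1))                     # 3: node 1 is the most connected
print(common_neighbour_share(2, 4))  # 1.0: nodes 2 and 4 share all their neighbours
```

In this toy network, nodes 2 and 4 are not linked to each other, yet their neighbourhoods coincide, giving the maximum clustering score of 1.0.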
2.3 Network Analysis of Multivariate Time Series

As briefly introduced in the previous section, the final aim of network analysis is to represent as a network a large data collection which would be hard to interpret otherwise. In particular, the type of data collection which is going to be used in this project to develop a network analysis is a multivariate time series.

A multivariate time series is the collection of values of a determined number of variables taken in successive periods of time, i.e. the collection of the data points of N variables over T periods of time.

11 (Watts and Strogatz 1998)
12 (Egea 2010)
Y_t = (y_{1,t}, …, y_{N,t})′   for t = 1, …, T

For instance, a multivariate time series may be the (quarterly) GDP growth rate of the European Union Member States over the last 20 years. In this case, N = 28 Member States, and T = 20 × 4 = 80 data points in time from 1993 to 2013.

In this example, the final output of the network analysis of this particular multivariate time series could be displayed in a graph in which all the countries are represented by nodes (vertices) and the interconnections between the growth rates of those elements are plotted as edges linking the vertices which are interconnected.
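In matrix form, the EU example is simply a T × N array: 80 quarterly observations of 28 series. A minimal sketch (illustrative Python with NumPy; random placeholder numbers rather than real Eurostat data):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 20 * 4, 28                # 80 quarters, 28 Member States
Y = rng.normal(size=(T, N))      # row t holds the N growth rates observed at time t

Sigma = np.cov(Y, rowvar=False)  # sample covariance across the N variables
print(Y.shape, Sigma.shape)      # (80, 28) (28, 28)
```

The N × N sample covariance matrix computed here is the starting point for the partial correlation analysis developed in the next section.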
For the purpose of this project, the concept of a white noise process also needs to be understood. In short, a white noise process is a series in which each observation of a variable “has an identical, independent, mean-zero distribution. Each period’s observation in a white-noise time series is a complete ‘surprise’: nothing in the previous history of the series gives us a clue whether the new value will be positive or negative, large or small”13.
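These properties are easy to check on simulated data. The sketch below (illustrative Python; the simulations later in this project use R) draws Gaussian white noise and verifies that the sample mean and the lag-one autocovariance are both close to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(loc=0.0, scale=1.0, size=100_000)  # Gaussian white noise

mean = eps.mean()
lag1 = np.mean(eps[1:] * eps[:-1])  # sample estimate of Cov(eps_t, eps_{t-1})

print(abs(mean) < 0.05, abs(lag1) < 0.05)  # True True
```

With 100,000 draws, the sampling error of both estimates is on the order of 0.003, so each one lands well within the tolerance above.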
The formal definition of a white noise process Y_t is:

E(Y_t) = 0            for all t
Var(Y_t) = σ²         for all t
Cov(Y_t, Y_{t−k}) = 0  for all k ≠ 0

2.4 Partial Correlation Network

Types of Networks

Apart from the different types of graphs used to represent networks, there are also
different kinds of networks; each of them is usually represented using a particular type of graph among those described in section 2.2.

Several sorts of networks have been studied in the statistics literature so far. A list of the main ones is provided below, together with the papers in which they have been presented and discussed:
13(A. Parker)
Partial Correlation Network
Dempster (1973), Lauritzen (1996), Meinshausen & Bühlmann (2006), Peng, Wang, Zhou & Zhu (2009)

Granger Causality Network
Billio, Getmansky, Lo & Pelizzon (2012), Diebold & Yilmaz (2011)

Long Run Partial Correlation Network
Davis, Zhang & Zheng (2012), Brownlees & Barigozzi (2012)

Tail Dependence Networks
Hautsch, Schaumburg & Schienle (2011)
As regards the scope of our project, we are going to focus on the first one, the Partial Correlation Network, as it is the basic model and establishes the basis for the development of the other models, which are rather more complex and applicable in situations in which the assumptions required for the Partial Correlation Network to work appropriately are not satisfied. Therefore, those other types of networks fall beyond the scope of this project.
Setup and Definition

In order to analyze the theoretical framework of the partial correlation network, we are going to consider a scenario in which Y_t is a multivariate white noise process as defined in the previous section. In this setup, the partial correlation network associated with this system can be displayed as the undirected, unweighted graph in which:

a) all the elements of Y_t are represented by nodes, i.e. the different vertices plotted in the graph; and

b) in the case of the existence of partial correlation between two given nodes (i.e., if y_i and y_j are partially correlated given all other elements), there is an edge linking those two vertices in the graph.
The reason why the plotted edges between two nodes which are partially correlated are neither directed nor weighted is that:

a. y_i being partially correlated with y_j is no different from y_j being partially correlated with y_i (in both cases, given all the other elements), because in this setup we are
not concerned about Granger causality relationships14. Therefore, the linkage of these two partially correlated elements has no direction from one to the other, i.e. the graph is undirected and the adjacency matrix is symmetric; and

b. the algorithm designed to graphically represent the partial correlation network only ‘cares’ about the mere existence (or not) of linear conditional dependence between two elements given all other elements, and this does not per se provide different values to assign weights to the edges connecting the partially correlated nodes. I.e. the graph is unweighted, as the elements in the adjacency matrix from which the graphical representation of the network is built are either one or zero, depending on the existence or not of partial correlation, respectively.
As a formal definition, the partial correlation ρ_ij is the measure of the cross-sectional linear conditional dependence between the variables y_i and y_j given all the other variables:

ρ_ij = Corr( y_i , y_j | { y_k : k ≠ i, j } )

As regards the graphical representation of the partial correlation network, it is defined by the collection of edges between those vertices whose partial correlation as defined above is not zero, i.e. the linkages between nodes which are partially correlated given all the other elements in the network:

ℰ = { (i, j) ∈ V × V : ρ_ij ≠ 0 }

Therefore, the graphs plotted using the partial correlation network method have a simple and easy-to-understand definition, hence transforming potentially complex and hard-to-read information (a large dataset) into a plain and clear representation of vertices and edges showing the existence or not of interconnections between the elements of such a big amount of data. However, interpretations of the plot of a network may still be hard to reason out, and the simplicity of the graph may sometimes be misleading. The useful and the not-so-good features of partial correlation networks are discussed in the section Pros and Cons of Partial Correlation Networks.
14 Granger causality has to do with forecasting future values of a given variable using past values of other variables (i.e., partial correlation BUT across time). In this setup we are only considering contemporaneous interconnections between elements. Intertemporal connections are beyond the scope of this project.
Relation with Linear Regressions

The relationship between partial correlation and the broadly known basic linear regression model is intuitive and can help to understand the less common concept of partial correlation.

Consider the usual cross-sectional linear regression model in which a single variable is regressed on all the other variables in the dataset for a given period, without any non-linear regressor. For instance, imagine a multivariate time series with only four (4) variables at a given time t:

y_1 = β_12 y_2 + β_13 y_3 + β_14 y_4 + ε_1
y_2 = β_21 y_1 + β_23 y_3 + β_24 y_4 + ε_2
y_3 = β_31 y_1 + β_32 y_2 + β_34 y_4 + ε_3
y_4 = β_41 y_1 + β_42 y_2 + β_43 y_3 + ε_4

Running the regression of each variable on all the other elements as shown above, y_i and y_j are partially correlated if and only if β_ij ≠ 0 (for i ≠ j).

In fact, the partial correlation of y_i and y_j is equivalent to the linear correlation between the residuals of y_i and y_j obtained from the regression of the two elements on all the other variables in the system, as in the equations above (i.e. Corr(ε_i, ε_j)).
Intuitively, when in a partial correlation network plot there exists an edge between two given nodes i and j, then elements y_i and y_j are partially correlated. It is also true the other way around: if two elements are partially correlated, then there exists an edge in the partial correlation network representation which connects the two nodes representing the elements.
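The residual-correlation equivalence stated above can be verified numerically. The sketch below (illustrative Python with NumPy; variable names are ours) computes the correlation between the residuals of two regressions and compares it with the partial correlation obtained from the inverse of the sample covariance matrix; the two routes coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 1000, 4
Y = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))  # four correlated variables
Y -= Y.mean(axis=0)                                    # demean each column

def residual(target, others):
    """Residual of the least-squares regression of `target` on `others`."""
    beta, *_ = np.linalg.lstsq(others, target, rcond=None)
    return target - others @ beta

i, j, rest = 0, 1, Y[:, 2:]
r = np.corrcoef(residual(Y[:, i], rest), residual(Y[:, j], rest))[0, 1]

K = np.linalg.inv(np.cov(Y, rowvar=False))   # concentration matrix
rho = -K[i, j] / np.sqrt(K[i, i] * K[j, j])  # partial correlation of y_i and y_j

print(abs(r - rho) < 1e-6)  # True: the two routes agree
```

The agreement is an exact algebraic identity for sample quantities, not just an asymptotic approximation, so it holds up to floating-point precision.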
Characterizing the Partial Correlation Network

So far it has been shown that the graphical representation of the partial correlation network is obtained from the elements of the Adjacency matrix (A). This matrix, which entirely determines the structure of the plot of the network, is in turn constructed from another matrix called the Concentration matrix (K).
The Concentration matrix (K) can be completely characterized as the inverse of the covariance matrix (Σ) of the multivariate time series of interest Y_t. Therefore, K is an N × N matrix, where N equals the number of variables of the multivariate time series:

K = Σ⁻¹

Being k_ij the (i, j) element of the Concentration matrix K, the following equality holds:

ρ_ij = − k_ij / √( k_ii · k_jj )

As a result:

a) the partial correlation between all the elements of the multivariate time series of interest is entirely defined by the Concentration matrix K, which in turn is completely defined by the Covariance matrix Σ of the multivariate time series;

b) the vertices i and j are connected by an edge if and only if the (i, j) element of the Concentration matrix is not zero (i.e., k_ij ≠ 0 ⟺ ρ_ij ≠ 0); and

c) the estimation of a partial correlation network can be based on the estimation of the Concentration matrix, as the Adjacency matrix (A) corresponding to the partial correlation network is built out of the Concentration matrix.
Notice that, as is going to be shown later on in this project, the Concentration matrix easily turns into the Adjacency matrix just by substituting all the non-zero elements of K by ones (1), and then replacing the ones (1) in the diagonal by zeros (0). As discussed before in this section, the Adjacency matrix is the final object from which the plot of the partial correlation network is created.
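Points b) and c) can be illustrated with a small numerical sketch (illustrative Python; the matrix values are made up): a chain-shaped concentration matrix is inverted to obtain a covariance matrix, and reading off the zero pattern of the recovered K yields the adjacency matrix.

```python
import numpy as np

# Concentration matrix of a 'chain' network 1-2-3: k_13 = 0, so no edge 1-3.
K_true = np.array([[ 2.0, -0.5,  0.0],
                   [-0.5,  2.0, -0.5],
                   [ 0.0, -0.5,  1.0]])
Sigma = np.linalg.inv(K_true)   # the covariance matrix itself is dense
K = np.linalg.inv(Sigma)        # recover the concentration matrix

d = np.sqrt(np.diag(K))
rho = -K / np.outer(d, d)       # partial correlations: rho_ij = -k_ij / sqrt(k_ii k_jj)
np.fill_diagonal(rho, 0.0)

A = (np.abs(rho) > 1e-9).astype(int)  # adjacency: 1 iff partial correlation nonzero
print(A)                              # edges 1-2 and 2-3 only
print(abs(Sigma[0, 2]) > 0.01)        # True: 1 and 3 are correlated, yet not partially
```

Note that Sigma has no zero entries even though K does: variables 1 and 3 are marginally correlated through variable 2, but conditionally on it they are unlinked, which is exactly the information the partial correlation network displays.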
Pros and Cons of Partial Correlation Networks

Partial correlation networks, like all kinds of networks, have some good features and some limitations. In this section the main pros and cons of this type of network are summarized.

The main advantage of network analysis in general is the capacity to ease interpretation by graphically representing the interconnections between the elements of a large set of data from a multivariate time series. This is not a specific characteristic of partial correlation networks but, as a network, it also complies
with that. These graphical representations may ease the interpretation of the results once the partial correlations between elements have been estimated, although sometimes the resulting plots may not be easy to interpret due to the extreme complexity of the original empirical data (i.e., if real-world data is extremely complex, the chances are that the simplest possible reliable representation of it looks quite complex too)15. The difficulty of the reasoning behind any interpretation of the data goes beyond the scope of this project, as it does not depend on what is done with the available information, but on the complexity of the real world per se. However, it cannot be denied that a graphical representation is very often helpful for building one’s own reasoning when interpreting most kinds of data.
A particularity of the partial correlation network is that when the data is Gaussian white noise (hence independent and identically distributed), the absence of partial correlation always implies conditional independence. This means that, in this scenario, given the lack of an edge between two nodes in the graph built using the partial correlation estimation model, those two elements are conditionally independent (provided the estimation of the edges is correct). As a result, in this situation the representation coming from the partial correlation network reliably reflects the dependence structure of the original data. However, when the original data is not normally distributed this condition does not hold, and the absence of a link between two nodes does not imply conditional independence.
In addition, the way partial correlation works has an important drawback. As the model is based on the interconnections between elements at the same period of time t only, partial correlation network analysis captures only the contemporaneous dependence between the elements of the network. Any serial dependence or spillover effect from series to series is omitted by the model and therefore does not appear in the graphical representation of the data. This feature may lead to mistakes in the graphical projection of the data and hence to misleading interpretations of the plots. This weakness can be addressed with more complex models such as the ones mentioned in the Types of Networks section, for instance the Granger
15 As discussed in the section about Sparsity of networks, the estimations used to build the partial correlation network help to simplify the data so as to ease this interpretation.
Causality network or the Long Run Partial Correlation network. As mentioned before,
those models are beyond the scope of this project and therefore not discussed here.
2.5 Sparse Network Estimation
In order to end up with the desired outcome of a network graph with all the advantages described in the previous section, the partial correlations between the elements of the dataset need to be estimated. How these partial correlation estimates are obtained is explained throughout this section.
Sparsity of Networks: Definition and Assumption
One specific assumption of the partial correlation network estimation model described in this project is sparsity. Sparsity refers to the fact that, in a given network, not every node (i.e., element) is connected to every other node in the dataset, and consequently the network is not complete.
In the matrices presented so far (the Adjacency and Concentration matrices), which characterize the network completely, sparsity shows up as a large number of zero entries. Recall that a zero entry in the matrix means the absence of an edge between two variables; a large number of zeros therefore implies a sparse, or incomplete, network.
In fact, previous research shows that several kinds of networks studied in the literature turn out to be sparse in general16. Citation networks, friendship networks, and most genetic networks17 are examples of sparse networks. It is therefore reasonable to assume that, in certain situations, the datasets we deal with can be considered sparse. This need not always be the case: depending on the field, a complete network may be appropriate. Nevertheless, in our project, and in general in the field of partial correlation network estimation, when models of network estimation try to represent the real world from large empirical datasets, sparsity is assumed to prevail and, as a consequence, estimation tools whose coefficients give rise to sparse networks may be used.
16 (Scholz 2013). 17 (Jeong et al. 2001); (Gardner et al. 2003); (Tegner et al. 2003).
In particular, in the next section of our project we present the LASSO estimation tool, an instrument used to obtain a number of non-zero partial correlation estimates low enough to end up with a sparse network.
LASSO Estimation Technique
In 1996, Robert Tibshirani presented an innovative regression model called LASSO. The LASSO estimation technique turns out to be a very useful and effective tool for obtaining estimates of the partial correlations between the elements of a multivariate time series, with exactly the features needed to end up with a sparse network of the original data.18
The LASSO estimation model is a version of the usual least squares estimation model whose main characteristic is that it simultaneously estimates the parameters of the regression while shrinking some of the estimated coefficients to exact zeros. LASSO stands for Least Absolute Shrinkage and Selection Operator, which summarizes the main feature of this estimation technique.
LASSO fits the partial correlation setup discussed in this project perfectly because it allows shrinking enough of the estimated coefficients to end up with a sparse network. As discussed in the previous section, sparsity is an assumption that, apart from being plausible, eases the understanding and interpretation of the final results and of the plot obtained from the partial correlation network estimation.
LASSO estimators are calculated in a way similar to ordinary least squares (OLS) estimators. The set of LASSO estimators is defined as the one satisfying the following expression:

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{N} x_{ij}\,\theta_j \Big)^2 + \lambda \sum_{j=1}^{N} |\theta_j|, \qquad \lambda \ge 0$$

Therefore, the LASSO estimators are the coefficients that minimize the squared error of the regression of each element on all the others (just as OLS), but taking into account a parameter called lambda. Lambda, also called the tuning parameter, controls the amount of shrinkage in the estimation procedure. Intuitively, it works as follows:
18 (Tibshirani 1996)
- If lambda takes a value equal to zero, the expression above becomes exactly the OLS objective, as the right-hand term (the lambda parameter multiplying the sum of the absolute values of all the LASSO estimators) vanishes. Therefore no shrinkage occurs and the estimators are exactly the same as in the ordinary least squares case.
- If lambda takes a very large value, the penalty term dominates the expression: any LASSO estimator different from zero makes the expression very large. Since the LASSO estimators are selected to minimize the expression, for a lambda big enough all of them are shrunk to exact zeros, and no estimator different from zero is found.
- If lambda takes neither zero nor a very large value, the LASSO estimators fall in between the OLS estimators and zero. Some of them are shrunk to exact zero, while the others remain different from zero.
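A minimal sketch of this behaviour, assuming simulated data and a plain coordinate-descent solver for the objective above (Python with numpy; this is an illustration, not the estimation code used in this project):

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; exact zero once |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=500):
    """Minimise sum((y - X @ theta)**2) + lam * sum(|theta|)
    by cyclic coordinate descent."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # residual ignoring feature j's current contribution
            r = y - X @ theta + X[:, j] * theta[j]
            theta[j] = soft_threshold(X[:, j] @ r, lam / 2) / (X[:, j] @ X[:, j])
    return theta

# simulated data: only features 0 and 2 truly matter
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
y = X @ np.array([3.0, 0.0, -2.0, 0.0]) + 0.1 * rng.standard_normal(100)
# lam = 0 reproduces OLS; a very large lam shrinks every coefficient to exactly zero
```

With an intermediate lambda, the two irrelevant coefficients are set to exact zeros while the two relevant ones survive (shrunk toward zero), which is precisely the selection behaviour described above.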
The great advantage of the LASSO estimation technique is that, by construction, it automatically selects the variables that best explain the regressed variable, shrinking the parameters of the less explanatory variables to exact zeros while keeping the parameters of the relevant variables different from zero. As a result, the outcome is a selection of the variables which are partially correlated with the regressed element, identifying the existence or absence of interconnections between the elements of the original multivariate time series.
This feature is exactly what is needed to construct a sparse network. LASSO simply determines whether two elements are partially correlated or not, so that, as discussed in the next section, it can be used to produce the final outcome of the partial correlation network estimation: the estimated sparse network and the corresponding plot of the partial correlations existing in a specific multivariate time series.
Moreover, the LASSO estimation technique has another feature which is very valuable in a specific scenario. In the case where the number of elements (N in the expression above) is larger than the number of data points (n in the expression above), under appropriate conditions19, the LASSO estimators still work appropriately. This scenario, with more variables than observations, runs counter to the traditional regression setting, in which a large number of observations for a limited number of variables is required to obtain significant results for the regression coefficients, for instance with the typical OLS estimation model.
In short, the LASSO estimator turns out to be an optimal instrument for estimating the existence of partial correlations in a setup such as the one discussed in this project: it shrinks non-useful coefficients of the regression model to exact zeros and selects the relevant ones, yielding estimators that can be used to represent sparse networks of multivariate time series, even when there are many more elements (variables, N) than data points (observations, n), the so-called high-dimension-low-sample-size scenario.
Estimating the Partial Correlation Network (SPACE)
In 2009, Peng, Wang, Zhou and Zhu presented a "novel method for detecting pairs of variables having nonzero partial correlations among a large number of random variables based on iid samples"20. They called the model SPACE, after Sparse PArtial Correlation Estimation, and it is currently one of the most computationally efficient approaches for estimating the existence of partial correlations in the setup described above. It estimates partial correlations by using joint sparse regression models.
SPACE, in line with the assumption argued to be plausible in the previous sections, "assumes the overall sparsity of the partial correlation matrix, and employs sparse regression techniques for model fitting" (Jie Peng et al., 2009, p.735), in particular the LASSO estimation technique described in the previous section, to estimate the non-zero entries of the inverse of the covariance matrix, also known as the Concentration matrix. As discussed in previous sections, a non-zero entry in the Concentration matrix implies the existence of dependency between the two corresponding elements conditional on the rest of the variables. Moreover, only under the normality
19 The scope of this project does not cover the discussion of the cases in which these conditions are fulfilled. 20 (Peng et al. 2009).
assumption (i.e., in a Gaussian world) does a zero entry in the Concentration matrix directly imply conditional independence between the two corresponding elements.
As with the LASSO estimator (on which this model is based), the method still works when the number of elements is much larger than the number of observations, what the authors call the high-dimension-low-sample-size setting. As
stated in the paper: “This is a typical scenario for many real life problems. For example,
high throughput genomic experiments usually result in datasets of thousands of genes
for tens or at most hundreds of samples. However, many high-dimensional problems
are intrinsically sparse. In the case of genetic regulatory networks, it is widely believed
that most gene pairs are not directly interacting with each other. Sparsity suggests that
even if the number of variables is much larger than the sample size, the effective
dimensionality of the problem might still be within a tractable range” (Jie Peng et al.,
2009, p.736).
The results of the simulation studies performed by the authors show that the proposed method "achieves good power in nonzero partial correlation selection as well as hub identification" (Jie Peng et al., 2009, p.735). Power (also known as sensitivity) is a measure of the quality of the estimation model21, and hubs are "vertices (variables) that are connected to (have nonzero partial correlations with) many other vertices (variables)" (Jie Peng et al., 2009, p.735). Hubs, like sparsity, have been shown to be present in many large networks such as the Internet, citation networks or protein interaction networks22.
Furthermore, as stated by Jie Peng et al. (2009, p.745), “by controlling the overall
sparsity of the partial correlation matrix, space is able to automatically adjust for
different neighborhood sizes and thus to use data more effectively” by selecting the
value of the tuning parameter lambda. “The proposed method also explicitly employs
the symmetry among the partial correlations, which also helps to improve efficiency23.
21 See section Features of the Estimation. 22 (Newman 2003). 23 For multivariate estimators, one estimator is said to be more efficient than another if the covariance matrix of the second minus the covariance matrix of the first is a positive semi-definite matrix. http://economics.about.com/cs/economicsglossary/g/efficiency.htm
Moreover, this joint model makes it easy to incorporate prior knowledge about network
structure.”
The model
Using the LASSO regression tool and taking advantage of its useful features, mentioned in the LASSO Estimation Technique section, Sparse PArtial Correlation Estimation (SPACE) is a model for directly estimating the partial correlations between the elements of a large dataset. In other words, what is estimated is the set of entries that form the Concentration matrix, which, as previously explained, completely characterizes the partial correlation network behind the dataset.
Consider the regression model:

$$y_i = \sum_{j} (1 - \delta_{ij})\, \beta_{ij}\, y_j + \epsilon_i \qquad \text{for } i = 1, 2, \dots, N$$

where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$. The regression coefficients $\beta_{ij}$ and residual variance $\mathrm{Var}(\epsilon_i)$ of the regression model above are related to the entries $\sigma^{ij}$ of the Concentration matrix in the following way:

$$\mathrm{Var}(\epsilon_i) = \frac{1}{\sigma^{ii}}, \qquad \beta_{ij} = -\frac{\sigma^{ij}}{\sigma^{ii}}$$

This relationship can be derived from the partial correlation network estimation model, in which the LASSO estimator of the Concentration matrix can be obtained by minimizing:

$$\mathcal{L}(\theta) = \frac{1}{2} \sum_{i=1}^{N} w_i \sum_{k=1}^{n} \Big( y_i^{(k)} - \sum_{j \neq i} \rho^{ij} \sqrt{\frac{\sigma^{jj}}{\sigma^{ii}}}\, y_j^{(k)} \Big)^2 + \lambda \sum_{i<j} |\rho^{ij}|$$

where $w_i$, $i = 1, \dots, N$, is a pre-estimator of the reciprocal of the residual variance of node $i$, and $\rho^{ij}$ is the partial correlation

$$\rho^{ij} = -\frac{\sigma^{ij}}{\sqrt{\sigma^{ii} \sigma^{jj}}}$$
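These relationships can be verified numerically; the sketch below (Python with numpy; the 3 × 3 covariance matrix is an arbitrary illustrative choice, not taken from the project) checks that the population regression coefficients and residual variance match the entries of the Concentration matrix:

```python
import numpy as np

# assumed 3x3 covariance matrix, chosen only for illustration
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.5, 0.4],
                  [0.2, 0.4, 1.0]])
K = np.linalg.inv(Sigma)           # concentration matrix, entries sigma^{ij}

i, others = 0, [1, 2]
# coefficients of the population regression of y_0 on (y_1, y_2)
beta = np.linalg.solve(Sigma[np.ix_(others, others)], Sigma[others, i])
# the same coefficients read off the concentration matrix: -sigma^{ij}/sigma^{ii}
beta_K = -K[i, others] / K[i, i]

# residual variance of that regression, which should equal 1/sigma^{ii}
resid_var = Sigma[i, i] - Sigma[i, others] @ beta

# partial correlation of y_0 and y_1: -sigma^{01}/sqrt(sigma^{00} sigma^{11})
rho_01 = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
```

Both identities hold exactly for any positive definite covariance matrix, which is what makes it possible to recover the Concentration matrix from a set of node-wise regressions.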
The outcome of applying this model is the matrix containing the estimates of the partial correlations between elements (i.e., the Concentration matrix), which is exactly what is needed to define the network we are interested in plotting in an easier-to-interpret graph. The correct functioning of this model, as it is based on the LASSO estimation technique, depends on the choice of the tuning parameter lambda.
Choosing Lambda
As discussed in the section where the LASSO estimator was introduced, fitting a network implies choosing a tuning parameter lambda, which determines the level of shrinkage of the parameters and, in the end, the number of edges present in the graphical representation resulting from the partial correlation network estimation.
Lambda can be seen as the penalty paid when constructing a sparse network from empirical data that is not sparse enough. In other words, lambda determines the balance of type I and type II errors that the final estimate carries due to the shrinkage needed to end up with a sparse network.
In the particular setup of this project, a type I error would be the detection of a non-existing linkage between two elements (a false positive), while a type II error would be the failure to detect a true existing linkage between two elements (a false negative)24.
As is shown in the following section with an illustrated simulation, the larger the value of lambda, the more type II error the estimation carries; conversely, the lower the value of lambda (with the extreme case of lambda equal to zero), the more type I error is found in the estimation.
In practice, networks are estimated for different values of the lambda parameter, and afterwards information criteria like the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) are used to determine the appropriate or optimal value of
24 Type I error is closely related to the False Discovery Rate (FDR); see Features of the Estimation for further explanation.
lambda.25 In general, BIC is preferred to AIC because the optimal lambda it proposes is larger; it therefore penalizes more, and the resulting network is sparser and hence easier to understand and interpret.
Features of the Estimation
In the following section, when implementing an illustrated simulation, as well as in the last part of this project dealing with real-world empirical data, the SPACE model is used to estimate the existence of partial correlations between the variables of the datasets. In order to evaluate and compare the results obtained in the illustrated simulation, where we are able to 'play' with the model, the following measurement ratios are considered:
- Sensitivity: the ratio of the number of correctly detected edges to the number of total true edges. It is also known as power, and equals one minus the type II error rate.
- Specificity: the ratio of the number of correctly detected edges to the number of total detected edges.
- False discovery rate (FDR): the ratio of the number of wrongly detected edges to the number of total detected edges; therefore FDR = 1 − specificity. It is also known as the type I error.
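Given a true and an estimated adjacency matrix, these three ratios can be computed as in the following sketch (Python; the two small example matrices are made up for illustration, and specificity follows the definition given above, i.e. correctly detected over total detected):

```python
def edge_metrics(true_adj, est_adj):
    """Sensitivity, specificity and FDR as defined above,
    counting each undirected edge once (upper triangle)."""
    n = len(true_adj)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            t, e = true_adj[i][j], est_adj[i][j]
            if t and e:
                tp += 1          # correctly detected edge
            elif e:
                fp += 1          # wrongly detected edge
            elif t:
                fn += 1          # missed true edge
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, 1.0 - specificity

# made-up 4-node example: 3 true edges, 3 detected edges, 2 of them correct
true_adj = [[0, 1, 1, 0],
            [1, 0, 0, 0],
            [1, 0, 0, 1],
            [0, 0, 1, 0]]
est_adj = [[0, 1, 0, 0],
           [1, 0, 0, 1],
           [0, 0, 0, 1],
           [0, 1, 1, 0]]
```

On this example the function returns a sensitivity of 2/3, a specificity of 2/3 and an FDR of 1/3.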
These ratios are useful for comparing the quality of the estimates obtained with the SPACE model. Depending on the number of observations (n) and the choice of the penalty or tuning parameter lambda (λ), we assess the fit of the estimates by comparing the estimated network with the true network we generate ourselves.
25 The criterion used to choose the optimal value of lambda is beyond the scope of our project.
3. Illustrated Simulation
Having explained theoretically how the partial correlation network estimation method works, an illustrated simulation is now carried out to show empirically how to use the joint sparse model to estimate partial correlation networks.
3.1 Software R
The software chosen to perform the simulation is R, which is both a programming language and an environment for statistical computing that can be downloaded for free. R's features and the wide variety of available commands and packages make it a popular language and software for data analysis. Thus, R suits our needs at this point of the project perfectly.
Extra Packages
The basic version of R comes with only a few packages, which are not enough for the simulation we need to carry out. Fortunately, additional packages performing specific tasks can also be downloaded for free, and some are crucial for the development of the simulation and the application of the model to the real data. These are space, igraph and tcltk2.
- space (Sparse PArtial Correlation Estimation): the package needed to perform sparse partial correlation estimation in R using the joint sparse regression model proposed by Peng et al. (2009).26
- igraph (Network analysis and visualization): a tool for designing and plotting graphs from the data loaded in R. igraph handles large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality indices and much more.27
- tcltk: this package provides access to the platform-independent Tcl (Tool Command Language) scripting language and Tk (Tool Kit) GUI elements. It can plot interactive graphs, which are useful when analyzing networks.28
26 (Peng et al. 2013). 27 ("Igraph: Network Analysis and Visualization" 2013). 28 ("Tcltk" 2013).
3.2 Procedure of the Illustrated Simulation
To check how the joint sparse model estimates partial correlation networks, an illustrated simulation is very effective. Since the objective is not the estimation of a real-world dataset, a network is defined to be used as the true network of the dataset.
True Network Definition
Based on the assumption of sparsity, the true network created has three (3) independent components, six (6) hubs with three or more edges, and a total of 19 edges. It has 20 elements; therefore the Concentration matrix of a 20-variable set is created (a 20 × 20 matrix). It does not represent a very big dataset, but it is enough to perform a practical simulation and obtain useful results.
Then, taking the Concentration matrix, replacing every value different from 0 with ones and placing zeros on the diagonal, the Adjacency matrix indicating the existence or absence of partial correlations is obtained. The plot of that matrix gives the graph representing the true network (Fig. 6), with which all simulations are compared.
FIGURE 6 – TRUE NETWORK
Obtaining an Estimated Network
From this point on, the simulation process begins. First, a command drawing random observations from a multivariate normal distribution is defined. The number of variables is 20, and the variance-covariance matrix used to generate the observations is the inverse of the Concentration matrix created in the first step. After that, the space.joint command, which performs the joint sparse regression, is used to generate a concentration matrix from the random observations obtained previously, taking λ as the value of the penalty. Finally, the procedure already used to obtain the true Adjacency matrix is repeated to obtain the estimated Adjacency matrix, replacing every value different from 0 with ones and placing zeros on the diagonal.
The last step consists of plotting the graphs corresponding to both the true and the
estimated adjacency matrices so as to get the true and the estimated networks
graphically represented.
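The generation step can be sketched as follows (in Python rather than the R script used in the project; numpy, the seed and the small 4-variable concentration matrix are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed sparse 4x4 concentration matrix (diagonally dominant, hence positive definite)
K = np.array([[2.0, 0.5, 0.0, 0.0],
              [0.5, 2.0, 0.5, 0.0],
              [0.0, 0.5, 2.0, 0.0],
              [0.0, 0.0, 0.0, 2.0]])
Sigma = np.linalg.inv(K)      # covariance used to draw the sample

# draw n = 1000 multivariate normal observations with that covariance
X = rng.multivariate_normal(np.zeros(4), Sigma, size=1000)

# with many observations, the inverse of the sample covariance approaches K;
# SPACE replaces this naive inversion with a penalised joint regression
K_hat = np.linalg.inv(np.cov(X, rowvar=False))
```

The naive inverse of the sample covariance recovers K only approximately and is never exactly sparse, which is precisely why the penalised space.joint estimation is used in the actual simulation.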
Obtaining and Recording the Results
The previous section described the basic procedure of the simulation. Nevertheless, for a deeper and more reliable analysis of the joint sparse regression model, several simulations must be performed while changing the two variables that affect the metrics of the final estimated network: the sample size of the multivariate random sampling and the penalty value of the space.joint regression. The objective is therefore to check how sparsity, sensitivity and specificity vary. Using the commands that generate a single estimated network as the basis, a more complex script is written that performs the simulation a thousand (1000) times and in which the values of n and λ can be chosen.
In our simulation, we study the results for four different sample sizes, n = {50; 100; 500; 1000}. The tuning parameter lambda, always proportional to the sample size29, takes values from λ = 0.00 × n to λ = 1.00 × n in steps of 0.05 × n, plus λ = 1.50 × n and λ = 2.00 × n, so as to complete the study of how it affects the metrics of the resulting network.
Since the simulation is a random process, it must be repeated many times in order to get accurate and unbiased results30. To that end, the script is prepared to repeat the
29 Determined to be proportional to the sample size because of the LASSO model (R. Tibshirani, 1996). 30 Although SPACE is presented by Peng et al. (2009) as powerful even in the high-dimension-low-sample-size scenario.
process of estimating a network 1000 times for each pair of lambda and sample size values, which results in an estimation process that generates (4 [sample sizes] × 1000 [iterations] × 22 [penalty values] =) 88,000 networks. Such a number of iterations and the corresponding results collected cause an average computer to take several minutes to run the entire script.
Finally, the script counts how many edges there are and how many of them are correctly detected (i.e., edges that exist in the true network), and averages over the 1,000 networks of each pair of n and λ values. The final outputs of the script are four 4 × 22 matrices with the four sample sizes as rows and the twenty-two penalty values as columns: colavgeest is the collection of the average edges estimated, colavgteest is the collection of the average true edges estimated, sens records the sensitivities computed by dividing colavgteest by the number of true edges (19), and spec records the specificities calculated by dividing colavgteest by colavgeest.
Once all the data needed have been recorded, the four matrices are exported as .csv (comma-separated values) files so as to become portable documents, and plots of the results are prepared. These graphs are the tools from which conclusions about the performance of the SPACE model are drawn. The complete script of the illustrated simulation can be found in Annex 1, and the tables with the results mentioned in the previous paragraph in Annexes 2 to 5.
3.3 Plotting the Results
To analyze the data obtained from the simulation, two different graphs are plotted so as to study the features of the joint sparse regression model. The first shows how changes in the penalty value affect the sparsity of the estimated network; the second is a ROC curve (Receiver Operating Characteristic curve) of the model comparing sensitivity and FDR for each sample size.
Lambda versus Sparsity
To analyze the effect that changes in the tuning parameter have on sparsity, the following graph (Fig. 7) shows how the number of edges estimated varies with lambda values from λ = 0.00 × n to λ = 1.00 × n, for each sample size.
FIGURE 7 – HOW DOES LAMBDA AFFECT SPARSITY
Going back to the definition of sparsity, it refers to the fact that not every node is connected to every other node in the dataset. Therefore, the smaller the total number of edges found, the sparser the estimated network. Consequently, the graph clearly shows a positive relation between the penalty value and sparsity: as the former increases, the total number of edges found decreases and the estimated network becomes sparser, moving from above the number of true edges for lower lambdas to below it when the penalization is too big.
It can also be pointed out that when the penalty value is zero, the number of edges found is 190 for every sample size, which is the maximum possible number of existing edges in a system of 20 elements31. In the same way, when the penalty reaches λ = 1.00 × n, sparsity converges again for all sample sizes at a level below the true number of edges in the network.
31 The maximum possible number of edges is N × (N − 1) / 2 = (20 × 19) / 2 = 190.
Comparing the results for the different sample sizes, it is observed that for large values of n the estimated networks become very sparse and the number of edges estimated falls close to the number of true edges (represented by the dashed black line at 19) as soon as λ rises slightly above zero, and further penalization does not significantly alter the number of edges over a wide interval of λ values. On the other hand, for small sample sizes the number of edges estimated is close to the number of true edges only for a few (proportionally big) values of λ. Therefore, when n is lower, the estimated networks become sparser along a less steep path until the true number of edges is reached, and from there the number of edges estimated falls below the optimal value faster, as lambda increases, than with large sample sizes.
Consequently, for proportionally equal penalty values, estimates from small n values are less sparse and fit the true network worse than those from big sample sizes. To check these results graphically, three different values of λ are chosen, and the estimated networks obtained with these tuning parameters for sample sizes of 50 and 1000 are shown in Figure 8.
Partial Correlation Network AnalysisOriol AlarcónAlbert Gordi
– 32 –
FIGURE 8 – NETWORK COMPARISON BETWEEN SAMPLE SIZES FOR DIFFERENT λ VALUES
ESTIMATED NETWORKS
From the networks plotted above, it can be concluded that for proportionally small values of lambda, big sample sizes fit the true network much better than small ones. However, that difference shrinks as lambda increases proportionally and the networks become sparser.
In conclusion, the effectiveness of LASSO as a sparse estimator is well demonstrated when penalization is applied: it works and fits better through the joint sparse model when the lambda value lies in the optimal range for each sample size, and performs more precisely when the number of observations is higher.
ROC Curve
Once sparsity has been reviewed, the next step is to analyze the efficacy of the joint sparse regression model. To that end, the Receiver Operating Characteristic (ROC) curve32 is a very useful tool, as it is a graph commonly used to illustrate the performance of binary classifiers. It is generated by plotting the sensitivity against the false discovery rate (FDR). Although the FDR ranges from zero to one, in the following graph (Fig. 9) it is shown from zero to 0.20 in order to focus on the interval where the results on the model's effectiveness can be compared most meaningfully across the sample sizes of the estimations.
32 (Fawcett 2006)
FIGURE 9 – ROC CURVE
At first sight, it can be observed that there is a positive relation between sensitivity and false discovery rate. This can be explained in terms of type I and type II errors. As described in the Choosing Lambda section, a type I error is a false positive, in this case the detection of edges that are not true, while a type II error is a false negative, i.e. not detecting true edges. It is therefore clear that when the type I error increases and more false edges are detected (specificity decreases), the type II error decreases as a higher number of true edges is detected.
The ROC curve makes the differences in the performance of the model across sample sizes crystal clear. While small sample sizes such as n = 50 'pay' high false discovery rates in order to achieve high degrees of sensitivity, for the large sample sizes the sensitivity is almost constant at 1 along the whole graph, and starts to decrease only when the false discovery rate is very close to zero and the number of total estimated edges falls below the number of total true edges. In short, the higher the n, the better the model fits the true network.
[Figure: ROC curve plotting sensitivity (inverse to type II error) against the false discovery rate (1 − specificity, type I error) over the interval 0–0.20, for n = 50, 100, 500 and 1000, with a reference line at FDR = 0.05.]
To look more deeply into that reasoning, the type I error is kept low by fixing the FDR at the 0.05 level, so that the sensitivity achieved by each sample size can be compared at that point. Starting from below, n = 50 achieves a sensitivity of just 0.32, n = 100 achieves 0.67, n = 500 achieves 0.98 and n = 1000 achieves the maximum sensitivity of 1.00. Thus, the lower the sample size, the higher the type II error present in the estimated network for a fixed level of type I error. The following plots (Fig. 10) illustrate the better fit with the true network obtained from estimations based on larger sample sizes while maintaining the FDR at 0.05.
FIGURE 10 – NETWORK COMPARISON BETWEEN SAMPLE SIZES KEEPING FDR AT 5% LEVEL
ESTIMATED NETWORKS
The previous graphs show that, in order to maintain a similar false discovery rate, the bigger the sample size, the smaller the penalty value in proportional terms. Therefore, it can be assumed that, in absolute terms, to obtain a better fit of the estimated network the tuning parameter must increase less than proportionally to the increase in the sample size.
3.4 Conclusions
The illustrated simulation confirms the successful functioning of the joint sparse regression model, and therefore of the LASSO estimator, when estimating sparse partial correlation networks under the right lambda values. Applied to relatively small data sets such as the one used in this section, perfectly fitted estimated networks can be obtained when dealing with substantial sample sizes, showing the accuracy of the model.
Further discussion of the performance of these models in different setups, such as the high-dimension-low-sample-size scenario, can be found in the original papers of Peng et al. (2009) and Tibshirani (1996).

4. Application with real data

4.1 Introduction
In this last section of the project, the joint sparse regression model is applied to a real dataset to generate the estimated partial correlation network of a real multivariate time series. The intention is not to analyze the resulting network in depth, which would involve forming hypotheses about the reason for each partial correlation found and trying to accept or reject them. Instead, this project has focused on how the model and its estimating tool work, both theoretically and practically.
Therefore, the goal of this application with real data is to exhibit how the model provides us with an estimated network of a real system from which, by observation and reasoning, the relations among the elements of a set can be understood in a much simpler and more comprehensible manner than by analyzing the numbers behind the graph. Notice that, as described in the previous sections, the final outcome of the application is an undirected-unweighted graph, like the prototype partial correlation network. But in
this case it is also a colored graph, so as to make it more understandable and easier to analyze.
In order to put into practice the methodology previously described, the chosen database needs to meet some requirements to suit the stated objective as well as possible. First, it must fulfill the characteristics required by partial correlation networks, which perform under multivariate white noise processes. Furthermore, in order to obtain more reliable results and to draw more conclusions from them, the data used needs to be normally distributed, which can be approximated by standardizing the original series.
Apart from the constraints already mentioned, there are also some practical bounds that must be taken into account. Despite the fact that the model is designed to work well with big data sets, it is not appropriate to deal with that kind of system in this case, due to the scope and length of our project. Facing a system with more than 100 variables would be an unfeasible challenge in this section.
Taking into account the objectives and constraints set out above, the change in the unemployment rate across the states of the US over the last decades suits the specified setting well. Moreover, in the big market that the US represents there is a high degree of labor mobility (compared to the European Union)33, and it comprises states with completely different roles inside the national economy, which could lead to an interesting estimated partial correlation network.
4.2 Explain the data
From the Bureau of Labor Statistics34, a website of the United States Department of Labor, the unemployment rate data of the 51 states (i.e. the 50 states and the federal District of Columbia) from January 1976 to August 2013 can be exported. Thus, the real dataset is composed of 51 variables containing 451 observations each (452 unemployment rates per state, used to calculate
33 (Nickell 1997)
34 (“Bureau of Labor Statistics” 2013)
451 changes in unemployment rate), a reasonable size for our estimations to yield significant results.
As mentioned before, working with Gaussian data eases the performance and analysis of the model. Hence, instead of the unemployment rate levels, the application deals with their monthly changes (exported directly from the same source), since this results in every variable having a mean close to zero. Then, to standardize the data, each variable is divided by the standard deviation of its observations.
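The standardization step described above can be sketched as follows (a minimal Python illustration with made-up numbers; the project performs this step in R, and the sample standard deviation is assumed here):

```python
from statistics import stdev

# Hypothetical toy data: monthly unemployment-rate changes for three states.
# In the project the real matrix has 451 rows (months) and 51 columns (states).
changes = {
    "CA": [0.1, -0.1, 0.2, 0.0],
    "NJ": [-0.2, 0.3, -0.1, 0.1],
    "MN": [0.0, 0.1, -0.1, 0.2],
}

# Standardize: divide each state's series by its sample standard deviation.
standardized = {
    state: [x / stdev(series) for x in series]
    for state, series in changes.items()
}

# Each standardized series now has unit sample standard deviation.
print({state: round(stdev(series), 6) for state, series in standardized.items()})
```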
4.3 Procedure
The procedure followed to reach the estimated network from the exported data has some points in common with the illustrated simulation, since it is based on calculating the adjacency matrix of a data set and plotting its network. The procedure is written down and performed by a script in R which, like that of the illustrated simulation, is attached in the Annexes section35 for further reference.
Script
First, the unemployment data collected in an Excel file is imported into R. Each variable is divided by its standard deviation and the 451 × 51 matrix called sunempUS is created, where each column is named with the abbreviation of a state and contains its 451 observations, and each row is named with the date (i.e. month and year) of the observation. From that point, the partial correlation matrix is generated by running the space.joint command. To do so, however, the penalty value must first be assigned, as the process of choosing lambda plays a crucial role in the performance of the application.
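Schematically, the step from an estimated partial correlation matrix to the network's adjacency matrix can be sketched as follows (a Python illustration with a hypothetical 4-state matrix; the project itself obtains the matrix from the space.joint command in R):

```python
# Hypothetical 4-state partial correlation matrix: an edge exists between
# states i and j whenever the off-diagonal entry is estimated to be nonzero.
pcor = [
    [1.00, 0.52, 0.00, 0.00],
    [0.52, 1.00, 0.36, 0.00],
    [0.00, 0.36, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]

n = len(pcor)
adjacency = [
    [1 if i != j and pcor[i][j] != 0 else 0 for j in range(n)]
    for i in range(n)
]
for row in adjacency:
    print(row)
```

The fourth state in this toy matrix has no nonzero partial correlations and therefore appears as an isolated node.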
As explained in the theoretical section of the project, there are quantitative methods, such as BIC and AIC, which make it possible to find the optimal lambda for the model. However, since those tools are beyond the scope of the project, the penalty value of the real data application has been chosen based on the results obtained from the illustrated simulation, namely that, when dealing with significantly big sample sizes, the optimal
35 See Annex 6: Script of the Real Application estimation.
lambda is the one around which small variations in either direction do not substantially modify the features of the resulting estimated network. Consequently, the process has been based on observing the networks obtained over a wide range of lambda values and choosing the one that looks most efficient. Again, recall that analyzing the causes behind the nature of the partial correlations is not a goal of the project, so the selection of the optimal lambda is not a crucial point as long as we work within a close interval around it.
Having decided the tuning parameter for the joint sparse regression, some features of the network need to be defined in order to obtain a final graph from which to draw visual conclusions, and to make it as visually understandable and pleasant as possible for its observers. First, the label of each node is defined as the abbreviated name of the state36 it represents. Second, as already mentioned, the estimated network is colored, both in its nodes and in its edges.
Then, the color of each node represents the region where its state is located. Among the various regional classification systems that exist for the US, the application uses the one employed by the government to collect the US population census data37. It divides the country into four regions (each further divided into divisions, which are not taken into account in the network). Region 1 is the Northeast, the smallest in extension, represented by a light green. It comprises the following states:
Division 1 (New England): Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut
Division 2 (Mid-Atlantic): New York, Pennsylvania, New Jersey
Region 2 is the Midwest, represented in a violet tone and comprising the following:
Division 3 (East North Central): Wisconsin, Michigan, Illinois, Indiana, Ohio
Division 4 (West North Central): Missouri, North Dakota, South Dakota, Nebraska, Kansas, Minnesota, Iowa
Region 3 is the South, characterized by a tan tone. It comprises three divisions:
Division 5 (South Atlantic): Delaware, Maryland, District of Columbia, Virginia, West Virginia, North Carolina, South Carolina, Georgia, Florida
36 (“State Abbreviations” 2013)
37 (“Census Regions and Divisions of the United States” 2013)
Division 6 (East South Central): Kentucky, Tennessee, Mississippi, Alabama
Division 7 (West South Central): Oklahoma, Texas, Arkansas, Louisiana
Finally, region 4, the West, is represented by the color blue in the network and is composed of the following states:
Division 8 (Mountain): Idaho, Montana, Wyoming, Nevada, Utah, Colorado, Arizona, New Mexico
Division 9 (Pacific): Alaska, Washington, Oregon, California, Hawaii
As another visual feature, the 51 states are sorted into 6 groups according to their population, and the size of each node is defined proportionally, resulting in 6 kinds of nodes in terms of size.
After that, the color of the edges to be found is defined. Taking the node colors as the reference, an edge connecting states from the same region has a tone similar to the nodes it connects, while edges linking nodes from different regions are painted grey. This eases the analysis of similarity regarding location, i.e. checking whether belonging to the same region bears any relation to the estimated partial correlations.
Furthermore, some additional features, such as the width of the edges and the font of the labels at each vertex, are adjusted to obtain a more attractive graph. The last step is simply to plot the final estimated partial correlation network using the interactive interface provided by the tcltk package and to analyze it.
4.4 Results
The analysis of the network obtained is divided into two parts. The first is the visual observation, intended to exhibit the power of the model to represent graphically the partial correlations among a set of variables and to support conclusions drawn from them; some hypotheses are therefore stated for the chosen data set, whose veracity is not discussed in the project. The second, the numerical analysis, is based on calculating some metrics of the network in order to analyze its structure and to state further hypotheses extracted from the numerical results (hypotheses which are not tested afterwards).
Visual analysis
Although the partial correlation network given by the joint sparse model is a very useful tool to ease the analysis of a data set, the resulting network might seem somewhat confusing at first sight. Drawing reasonable conclusions requires looking at it carefully and sorting through what the network exhibits so as to extract only the significant information. Since an interactive plot of the network requires R, the following figure is plotted (Fig. 11):
FIGURE 11 – UNEMPLOYMENT RATE CHANGES AMONG US STATES – ESTIMATED PARTIAL CORRELATION NETWORK USING SPACE
Node colors by region: Northeast, Midwest, South, West
From a general view, what is observed first is the significant sparsity of the network obtained. There are three main components and twelve isolated nodes. One component is composed of two states, another of five states, and the biggest one contains thirty-two nodes.
Regarding the isolated nodes, seven out of twelve are from the South region. Hence, if the model reliably reflects reality, it seems that the labor market in that region is somewhat fragmented, since around 40% of the region behaves in an independent manner. The reason behind that could be easier to explain in some states than in others. Florida, for example, could be explained by the hypothesis that its economy is based on tourism, since it is the top travel destination in the world38. It is also observed that three of those isolated nodes are from the Midwest, which could be interpreted as a sign of fragmentation there too. For the country as a whole, the graph shows that approximately 24% of the states behave as independent labor markets, an idea supported by the size of those isolated states, which, apart from Delaware (DE) and Oklahoma (OK), belong to the highly populated groups.
Looking at the components, the smallest one is made up of two states, Louisiana (LA) and Mississippi (MS), both located in the South. Those states share a border, and it could be assumed that their labor markets perform pegged to each other. The next component, which appears in the network in a kind of kite shape, is composed of five states: Missouri (MO), Iowa (IA), Kansas (KS) and South Dakota (SD) from the Midwest region, together with Utah (UT) from the West, which is connected to the rest of the component. As shown below, the four states sharing a region also share boundaries, so it is reasonable to think that they act as an independent entity in terms of labor mobility. Figure 12 shows the territorial proximity within those two components.
38 (“Tourism in Florida” 2013)
FIGURE 12 – LOCATION OF THE STATES COMPOSING THE TWO SMALL COMPONENTS 39
Then there is the main component of the network, formed by the 32 remaining states. This is the most complicated part of the analysis of the network, due to the high density of edges between those elements. Nevertheless, having the edges colored according to the location of the nodes they link greatly eases that task. The main observation here is that geographical location drives grouping. This constitutes the similarity analysis of the network, since we are checking whether the fact that nodes share additional characteristics, such as location, influences the partial correlations among them. That assumption derives from the substantial partial correlations observed connecting states in the same region. Therefore, proximity has an impact on those “regional labor markets”. It can also be appreciated that small states are always connected to one of the big ones, whether belonging to the same regional area or not. The big ones, on the other hand, appear to be very central elements of the component, since they are connected by a high number of edges. That leads to the idea that the big states in the component act as the central elements of that labor market, influencing the small ones. That also
39 (“National Atlas ”)
reinforces the idea that the isolated elements are able to work as independent labor markets due to their big size.
Furthermore, California (CA) appears highly interconnected and is therefore the most central element of the network. That makes a lot of sense, since it is one of the economic driving forces of the US and one of the top regions in the world in the development of information and communications technologies. Hence, it is linked to states belonging to all four regions of the country. Moreover, other big states such as Illinois (IL), Michigan (MI), Minnesota (MN) and Virginia (VA) appear as fairly central elements too.
Finally, although not very clearly exhibited, it seems that the Northeast region, despite having partial correlations with the other geographical areas, is somewhat less interconnected with the rest of the component. In the graph its states appear in a kind of peripheral area, but that hypothesis cannot be strongly supported by the visual analysis alone.
Some of the statements made above will be reinforced or refuted when the metrics of the network are analyzed in the next section. However, as a specific conclusion of the visual analysis, it can be stated that the network provided by the joint sparse model enables quite reasonable assumptions about the structure of the real system to be made.
Numerical analysis
The first thing to be confirmed by the analysis of the numerical results is the sparsity of the data, which is quite obvious at sight. To do so, the number of edges estimated by the implemented regression model is compared to the total number of possible edges in the case where every single state was partially correlated with every other.
The result is that only 98 out of the 1275 possible edges are estimated to exist, a clear sign of the sparsity of the data. In other words, out of 1275 possible partial correlations, only 7.69% appear to be true.
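The figures quoted above follow from the number of unordered pairs of nodes; a quick check (Python sketch):

```python
n_states = 51

# Maximum number of edges in an undirected network with no self-loops:
# one edge per unordered pair of nodes.
possible_edges = n_states * (n_states - 1) // 2
print(possible_edges)  # 1275

# Share of the possible edges actually estimated to exist.
estimated_edges = 98
print(round(100 * estimated_edges / possible_edges, 2))  # 7.69
```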
This information, although difficult to calculate exactly from the visual analysis because of the limited space of the plot, is something that could be observed directly in the
graphical representation of the network. However, there are other features or metrics
which are much more difficult to observe in the final plot, or even impossible in some
cases.
An example of the first case is measures of centrality. In this project we have chosen to measure centrality simply by counting the total number of estimated edges each state has (i.e. the degree of its vertex), because more complex metrics fall beyond the scope of our project. The results show that the states with the highest number of connecting edges are California (12), Virginia (11), Minnesota (10), New Jersey (9) and Illinois (9). All these states are classified as big-sized ones, which should not be surprising, as the bigger the state, the more influential it can be thought to be for the economy of the nation, and therefore the more interconnected it should be. However, as previously discussed in the general analysis, several big states appear to be independent in the network, so an alternative reasoning might be proposed. Finally, it is at least surprising that the four most central states each belong to a different one of the regions selected to classify the elements. See Figure 13.
FIGURE 13 – REGIONAL LOCATION OF THE FOUR MOST CENTRAL STATES OF THE NETWORK 40
40(“U.S. Regions: West, Midwest, South and Northeast”)
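The degree-based centrality measure used above can be sketched as follows (Python, with a hypothetical mini-network rather than the estimated one):

```python
# Degree centrality: the degree of a vertex is the number of edges incident
# to it. Hypothetical mini-network for illustration only.
edges = [
    ("CA", "VA"), ("CA", "MN"), ("CA", "NJ"),
    ("VA", "NJ"), ("MN", "IL"),
]

degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

# States sorted by degree, highest first.
print(sorted(degree.items(), key=lambda kv: -kv[1]))
```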
To complete the analysis of centrality, the average number of edges per state is almost 4 across the whole network, while the median is 3, given the many states, twelve in particular, that have been estimated to be conditionally independent. As mentioned, the maximum is 12, in California, and the minimum is obviously 0 in all the independent states.
On the other hand, an example of a metric which cannot be observed in the graph is the strength of the plotted partial correlations. One may be interested in knowing whether the edges in the graph represent a significant41 partial correlation or not, or, even more, whether the partial correlation found is positive or negative.
The analysis of the strength of the partial correlations estimated using SPACE shows that, of those estimated to be different from zero, one in three can be assumed to be large enough42, so that a clear interconnection can be assumed in those cases. Considering also the elements which are not partially correlated (i.e. partial correlations estimated to be zero), only 2.6% of all cases take values high enough to be considered significant. This last finding supports the idea of the sparsity of the data, as only 2.6% of the total possible edges are estimated to have a sufficiently important value under the SPACE model.
In particular, the highest value is found between two states that, as described in the previous section, form a component by themselves: Louisiana (LA) and Mississippi (MS). The reasoning behind this falls beyond the scope of our project, but it is clear that the change in the unemployment rate in those two states is highly interconnected, as the partial correlation is larger than 0.5. The next largest partial correlations connect Montana (MT) and Idaho (ID), Vermont (VT) and Maine (ME), and, in fourth position, Tennessee (TN) and Kentucky (KY), with values of 0.36, 0.35 and 0.34 respectively. All these pairs of states share boundaries (with the exception of Vermont and Maine, which very nearly do), confirming what could be guessed at first sight: geographical location drives interconnectivity in this setup.
41 Understanding significant as a synonym of important enough, i.e. large enough to be considered
42 Values above 0.1 for a partial correlation are assumed to be important enough
Another finding worth pointing out is the absence of negative partial correlations in the estimation: no state has been estimated to have a negative partial correlation with another. This seems economically logical, in the sense that all the states are part of the same economy, the United States of America, and although in some cases no relationship could be estimated, it makes sense to assume that shocks which affect the economic situation and make the unemployment rate fluctuate affect all the states similarly. More precisely, no case has been estimated in which a change in one direction in a particular state, for instance a reduction in the unemployment rate, is connected to a change in the opposite direction in another state.
The last analytical metric performed has to do with clustering. First we selected the most central state in each of the four regions, as found earlier in this section: California (CA) in the West, New Jersey (NJ) in the Northeast, Minnesota (MN) in the Midwest and Virginia (VA) in the South (although Virginia is not really in the geographical South). For these states, the clustering measure performed has been to calculate the number of nodes each possible pair of states has in common; that is, for instance, how many of the states to which California is connected by an edge are at the same time connected to Virginia.
The results show that all the possible pairs formed with these four states have at least two nodes in common, with the maximum found between California and New Jersey, which share six nodes.
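The common-neighbor count used as the clustering measure can be sketched as follows (Python, with a hypothetical mini-network rather than the estimated one):

```python
# Clustering measure used here: for each pair of states, count the neighbors
# they have in common. Hypothetical mini-network for illustration only.
edges = [
    ("CA", "IL"), ("CA", "MN"), ("CA", "VA"),
    ("NJ", "IL"), ("NJ", "MN"), ("NJ", "NY"),
]

# Build each node's neighbor set.
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def common_neighbors(x, y):
    return neighbors.get(x, set()) & neighbors.get(y, set())

print(sorted(common_neighbors("CA", "NJ")))  # ['IL', 'MN']
```

Here the two nodes are not directly linked yet share two neighbors, the same pattern described for CA and NJ in the estimated network.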
Afterwards, our interest has been directed at finding the most clustered pair of states in the whole nation. To do so we have calculated the number of nodes each state has in common with every other state. The complete results can be found in the symmetric matrix in Annex 7, but the main result is that the states sharing the most nodes are again California and New Jersey. This is surprising, as so far we had found that geographical location was quite important for the existence of a partial correlation between two states, and in this case the two states are located very far from each other. We could guess that the point here is that CA and NJ are not directly
connected, but since both of them (like the other big states located in the main component) are driving forces of the US economy, changes in either of these two states (or in more, if other big states are considered) affect a high number of states, which in addition coincide.
The final ratio calculated is the average number of nodes in common for a pair of elements in the network. The figure, 43%, suggests that almost half of the times two random states are picked from the set, they share a common vertex. To assess whether the estimated network found in this section could be considered small-world, further analysis of clustering coefficients should be carried out, using the tools proposed by Watts and Strogatz (1998)43.
4.5 Conclusions
The final purpose of this application with real data is not the analysis of the data behind the multivariate time series used, but to analyze how the estimation model presented in this project works in a real data setup. Nevertheless, a few ideas or hypotheses emerge from the output of the study of the change in unemployment rates across the US states in the last decades.
The main hypothesis we would put forward is that the high degree of labor mobility in the United States explains the absence of negative partial correlations in the data analyzed. In other words, we would suggest that labor demand and labor supply across the different states 'meet' at a national level after all, in the sense that changes in the unemployment rate could be caused by movements of the entire economy, with the labor force moving from states with a labor surplus to states where demand is high, reducing unemployment imbalances between them.
All in all, this section has been useful to check that SPACE is a powerful tool to represent the interconnections between the elements of a real multivariate time series.
43 (Watts and Strogatz 1998)
5. Conclusions
Throughout this project, the joint sparse regression model (i.e. SPACE) has been proposed to estimate partial correlation networks representing big data sets under the assumption of sparsity. Hence, after setting out the theoretical framework of the method and the sparse estimation tools it incorporates (i.e. LASSO), SPACE has been analyzed from two points of view: its estimating efficiency and its usefulness as an analytical tool for real data sets.
For the first, an illustrated simulation was carried out to analyze the performance and precision of the method in estimating a true network. This has been checked by estimating 88,000 networks while varying both the sample size (i.e. the number of observations per variable) and the penalty value lambda, which works as the tuning parameter of the method. From this section there have arisen strong foundations showing that the method works very well at delivering sparse estimators and that, maintaining a constant false discovery rate (i.e. type I error), the bigger the sample size, the higher the sensitivity of the estimation (i.e. the lower the type II error). However, to obtain almost perfectly fitting estimated networks, an optimal penalty value must be chosen. Hence, since the BIC and AIC criteria fall beyond the scope of this project, it has been empirically observed from the figures provided in the simulation that, for big enough sample sizes, the optimal value falls in an interval where small variations do not substantially modify the structure of the network, since sparsity stays more or less constant at the true level. That assumption has been used in the second part of the analysis when dealing with real data.
The second viewpoint used to test SPACE has been the application of the model to a real data set. After defining the constraints on the choice, coming both from the method and from the scope of the project, the selected multivariate system has been the changes in the unemployment rates of the US states from January 1976 to August 2013. It represents a data set composed of 51 elements with 451 observations each, big enough both in number of variables and in sample size to prove the usefulness of SPACE in analyzing it. From that point on, the method has been applied to the data and the final estimated partial correlation network has been obtained, using a penalty value chosen under the assumptions drawn from the
illustrated simulation regarding the optimal value. At first sight, the colored undirected-unweighted network appears as a very useful tool from which to draw conclusions about the data set. After a deeper visual and numerical analysis of the network structure, in which metrics such as centrality, clustering and similarity were checked, the idea tested during the application has been reaffirmed: the estimated partial correlation network resulting from SPACE is a powerful tool to analyze the interconnections between the elements of sparse real data sets.
To sum up, the proposal of the joint sparse regression model as an estimation method for partial correlation networks has been reaffirmed throughout the project by its effective performance as a sparse estimation model and its usefulness when dealing with appropriate real data sets.
6. List of figures
Figure 1 – Undirected and directed network graphs ....................................................... 7
Figure 2 – Unweighted and weighted network graphs .................................................... 7
Figure 3 – Uncolored and colored network graphs.......................................................... 8
Figure 4 – Undirected-weighted network graph and its adjacency matrix...................... 9
Figure 5 – Path and components in a network graph .................................................... 10
Figure 6 – True network ................................................................................................. 27
Figure 7 – How does lambda affect sparsity .................................................................. 30
Figure 8 – Network comparison between sample sizes for different lambda values ...... 32
Figure 9 – Roc curve ....................................................................................................... 34
Figure 10 – Network comparison between sample sizes keeping FDR at 5% level ....... 35
Figure 11 – Unemployment rate changes among US states – estimated partial correlation network using SPACE .................................................................................. 42
Figure 12 – Location of the states composing the two small components ................... 44
Figure 13 – Regional location of the four most central states of the network ............. 46
7. Bibliography
Parker, Jeffrey A. “Fundamental Concepts of Time-Series Econometrics.”
http://www.reed.edu/economics/parker/s13/312/tschapters/S13_Ch_1.pdf.
Arthur, Lisa. 2013. “What Is Big Data?” Forbes. http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/.
Barigozzi, Matteo, and Christian Brownlees. 2013. “Econometric Analysis of Networks.”
Brin, Sergey, and Lawrence Page. 1998. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Computer Networks and ISDN Systems 30 (1-7) (April): 107–117. doi:10.1016/S0169-7552(98)00110-X. http://linkinghub.elsevier.com/retrieve/pii/S016975529800110X.
“Bureau of Labor Statistics.” 2013. Accessed November 15. http://www.bls.gov/bls/exit_BLS.htm?url=http://www.dol.gov/.
“Census Regions and Divisions of the United States.” 2013. Accessed November 16. http://www.census.gov/geo/maps-data/maps/pdfs/reference/us_regdiv.pdf.
Cukier, Kenneth. 2010. “Data, Data Everywhere.” The Economist 12 (8). http://www.economist.com/specialreports/displayStory.cfm?story_id=15557443.
Egea, Andrés. 2010. “«Volveré a estudiar; de la petanca no se puede vivir»” [“I Will Go Back to Studying; You Cannot Make a Living from Pétanque”]. Laverdad.es. http://www.laverdad.es/murcia/v/20101014/deportes_murcia/mas-deporte/volvere-estudiar-petanca-puede-20101014.html.
Einav, Liran, and Jonathan Levin. 2013. “The Data Revolution and Economic Analysis.” Prepared for the NBER Innovation Policy and the Economy Conference: 1–29.
Fawcett, Tom. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters. doi:10.1016/j.patrec.2005.10.010.
Fox, Bob, Rob van den Dam, and Rebecca Shockley. 2013. “Analytics: Real-World Use of Big Data in Telecommunications.” http://www-935.ibm.com/services/multimedia/Real_world_use_of_big_data_in_telecommunications.pdf.
Gardner, Timothy S, Diego di Bernardo, David Lorenz, and James J Collins. 2003. “Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling.” Science (New York, N.Y.) 301 (5629): 102–105. doi:10.1126/science.1081900.
Groves, Peter, B Kayyali, David Knott, and S Van Kuiken. 2013. “The ‘Big Data’ Revolution in Healthcare.” McKinsey Quarterly (January). http://www.lateralpraxis.com/download/The_big_data_revolution_in_healthcare.pdf.
“Igraph: Network Analysis and Visualization.” 2013. Accessed November 10. http://cran.r-project.org/web/packages/igraph/index.html.
Jeong, H, S P Mason, A L Barabási, and Z N Oltvai. 2001. “Lethality and Centrality in Protein Networks.” Nature 411 (6833): 41–42.
McBride, Ryan. 2012. “10 Reasons Why Biotech Needs Big Data.” FierceBiotechIT.http://www.fiercebiotechit.com/special-reports/10-reasons-why-biotech-needs-big-data.
“National Atlas.” http://nationalatlas.gov/.
Newman, M. 2003. “The Structure and Function of Complex Networks.” ArXiv Preprint Cond-Mat. http://arxiv.org/abs/cond-mat/0303516.
Nickell, Stephen. 1997. “Unemployment and Labor Market Rigidities: Europe Versus North America.” Journal of Economic Perspectives. doi:10.1257/jep.11.3.55.
Peng, Jie, Pei Wang, Nengfeng Zhou, and Ji Zhu. 2013. “Space: Sparse PArtial Correlation Estimation.” Accessed November 10. http://cran.r-project.org/web/packages/space/index.html.
Peng, Jie, Pei Wang, Nengfeng Zhou, and Ji Zhu. 2009. “Partial Correlation Estimation by Joint Sparse Regression Models.” Journal of the American Statistical Association 104 (486) (June 1): 735–746. doi:10.1198/jasa.2009.0126. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2770199&tool=pmcentrez&rendertype=abstract.
Scholz, Matthias. 2013. “Getting Connected - The Highly Connected Society.” Accessed November 15. http://www.network-science.org/highly-connected-society-dense-social-complex-networks.html.
“State Abbreviations.” 2013. Accessed November 15. http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf.
“Tcltk.” 2013. Accessed November 10. http://cran.r-project.org/web/packages/tcltk/index.html.
Tegner, Jesper, M K Stephen Yeung, Jeff Hasty, and James J Collins. 2003. “Reverse Engineering Gene Networks: Integrating Genetic Perturbations with Dynamical Modeling.” Proceedings of the National Academy of Sciences of the United States of America 100 (10): 5944–5949.
Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–288. http://www.jstor.org/stable/2346178.
Toffler, Alvin. 1970. Future Shock. USA: Random House Inc.
“Tourism in Florida.” 2013. Accessed November 20. http://fcit.usf.edu/florida/lessons/tourism/tourism1.htm.
“U.S. Regions: West, Midwest, South and Northeast.”http://thomaslegion.net/uscensusbureauregionsthewestthemidwestthesouthandthenortheast.html.
Watts, D J, and S H Strogatz. 1998. “Collective Dynamics of ‘Small-World’ Networks.”Nature 393 (6684) (June 4): 440–2. doi:10.1038/30918.http://www.ncbi.nlm.nih.gov/pubmed/9623998.
8. Annexes
Annex 1 – Script Illustrated Simulation
Annex 2 – Table Results Illustrated Simulation Sensitivity
Annex 3 – Table Results Illustrated Simulation Specificity
Annex 4 – Table Results Illustrated Simulation Edges Estimated
Annex 5 – Table Results Illustrated Simulation True Edges Estimated
Annex 6 – Script Application with Real Data
Annex 7 – Matrix Representing Number of Nodes in Common Between Every Pair of States