2013
Oriol Alarcón Heras
Albert Gordi Margalef
Supervisor: Dr. Christian T. Brownlees
Universitat Pompeu Fabra
IBE – Final Year Project
28/11/2013
Partial Correlation Network Analysis
Index

1. Executive summary
2. Network Analysis: theoretical setup
   2.1 Introduction
   2.2 Networks: Overview
       Graphs
       Characteristics of Graphs
       Adjacency Matrix (A)
       Notation
       Metrics
   2.3 Network Analysis of Multivariate Time Series
   2.4 Partial Correlation Network
       Types of Networks
       Setup and Definition
       Relation with Linear Regressions
       Characterizing the Partial Correlation Network
       Pros and Cons of Partial Correlation Network
   2.5 Sparse Network Estimation
       Sparsity of Networks: Definition and Assumption
       LASSO Estimation Technique
       Estimating the Partial Correlation Network (SPACE)
3. Illustrated Simulation
   3.1 Software R
       Extra Packages
   3.2 Procedure of the Illustrated Simulation
       True Network Definition
       Obtaining an Estimated Network
       Obtaining and Recording the Results
   3.3 Plotting the Results
       Lambda versus Sparsity
       ROC Curve
   3.4 Conclusions
4. Application with real data
   4.1 Introduction
   4.2 Explain the data
   4.3 Procedure
       Script
   4.4 Results
       Visual analysis
       Numerical analysis
   4.5 Conclusions
5. Conclusions
6. List of figures
7. Bibliography
8. Annexes
1. Executive summary

The final year project came to us as an opportunity to get involved in a topic that we found attractive while majoring in economics: statistics and its application to the analysis of economic data, i.e. econometrics. Moreover, the combination of econometrics and computer science is a very hot topic nowadays, given the Information Technologies boom of the last decades and the consequent exponential increase in the amount of data collected and stored day by day. Data analysts able to deal with Big Data and to extract useful results from it are in high demand and, in our understanding, the work they do, although sometimes controversial in ethical terms, is a clear source of added value both for private corporations and the public sector. For these reasons, the essence of this project is the study of a statistical instrument suited to the analysis of large datasets and directly related to computer science: Partial Correlation Networks.
The structure of the project has been determined by our objectives throughout its development. First, the characteristics of the studied instrument are explained, from the basic ideas up to the features of the model behind it, with the final goal of presenting the SPACE model as a tool for estimating interconnections between elements in large data sets. Afterwards, an illustrated simulation is performed in order to show the power and efficiency of the model presented. Finally, the model is put into practice by analyzing a relatively large data set of real-world data, with the objective of assessing whether the proposed statistical instrument is valid and useful when applied to a real multivariate time series. In short, our main goals are to present the model and to evaluate whether Partial Correlation Network Analysis is an effective, useful instrument that allows finding valuable results in Big Data.
As a result, the findings throughout this project suggest that the Partial Correlation Estimation by Joint Sparse Regression Models approach presented by Peng et al. (2009) works well under the assumption of sparsity of the data. Moreover, partial correlation networks are shown to be a valid tool to represent cross-sectional interconnections between elements in large data sets.
The scope of this project is however limited, as there are some sections in which a deeper analysis would have been appropriate. Considering intertemporal connections between elements, the choice of the tuning parameter lambda, or a deeper analysis of the results in the real-data application are examples of aspects in which this project could be extended.
To sum up, the analyzed statistical tool has proved to be a very useful instrument for finding the relationships that connect the elements present in a large data set. After all, partial correlation networks allow the owner of such a set to observe and analyze existing linkages that could otherwise have been overlooked.
2. Network Analysis: theoretical setup

2.1 Introduction

As suggested by the journalist Alvin Toffler in the early 1970s in his book Future Shock1, a huge amount of information may end up hindering the decision making of an individual because of the difficulty of understanding a particular situation in the presence of too much available information. It is what he called ‘Information overload’, also known as ‘Infobesity’ or ‘Infoxication’. The idea has become more and more popular in the last decades, as the unstoppable progress of Information Technologies, globalization and the Internet phenomenon have fostered the exponential growth of the digital information available worldwide2.
These elements have made the exponential growth possible by dramatically reducing the costs of data collection in recent years. But there needs to be a reason why all this information is collected. The answer is probably the popular quote ‘information is power’.
Nowadays, contrary to the thought behind ‘Information overload’, not only individuals, but especially firms and organizations, tend to believe that the more information you own, the better: the better the interpretation of the situation, the better the decision-making process, and the better the outcome resulting from the information collected originally. This is why, given the mentioned reduction in data collection costs, the volume of digital information available has grown so dramatically in the last decades.
Storage, retrieval, transmission and manipulation of data are present these days in almost all fields of study, for instance biotechnology3, medical science4, telecommunications5 and, in particular for the scope of our project, the business and economics6 environment. Large datasets are collected day by day, with lots of empirical information about the functioning of the real world around us, at a cost much

1 (Toffler 1970)
2 (Cukier 2010)
3 (McBride 2012)
4 (Groves et al. 2013)
5 (Fox, van den Dam, and Shockley 2013)
6 (Einav and Levin 2013)
lower than a few decades ago. Leaving aside concerns regarding storage limitations and security issues with respect to personal information, it is important to bear in mind that stored information is not useful (profitable) per se. There needs to be a posterior information management process that creates value for the owner of the information. Big data7, the term used for these huge and complex data collections, challenges the capacity of human beings to handle such amounts of information with the existing tools and methods.
Stock market returns of S&P 100 companies, unemployment rates of European Union member states, growth rates in developing countries, inflation rates in top economies and so on are a few examples of the large set of economic data available to (almost) everyone nowadays. One can be interested, for instance, in the relationships between the different elements that make up one of the collections of data just mentioned. Dealing with and interpreting large datasets like these may end up being a hard task for experts, but especially for non-expert final users of this information.
This is why a field called Network Analysis has emerged in recent years. Network Analysis is a branch of statistics which helps interpret the hidden interconnections between the elements present in large datasets, like those classified under the name of big data. It uses available tools in the statistics literature to analyze large dimensional systems and, as a result, it may be used, for instance, to design graphically represented networks that synthesize the interconnections of large multivariate time series systems; interconnections that would have been hard to observe otherwise.
That way, statisticians and econometricians have found a method to beat information overload in this specific setup and turn it into a powerful tool: a tool that allows information users not just to read the results of the analysis, but to plot them in a graphical representation that eases the interpretation of the data obtained from real-world empirical observations. In other words, they have developed a procedure to construct simpler graphs representing the reality behind those complex large datasets.
7 (Arthur 2013)
2.2 Networks: Overview

Graphs

In Mathematics, the usual way of representing networks is using graphs, which can be generally defined as a collection of nodes connected by lines. However, depending on the purpose of the study, graphs can have alternative definitions. Since the notation for graphs can be quite extensive, we are going to focus only on those concepts useful for our project.

G = (V; ℰ)

A graph G is made out of two basic elements: the first one is the set of vertices (V), or nodes, that represent a set of elements; the second one is the set of edges (ℰ), or links, connecting vertices. The set of edges is defined as ℰ ⊆ V × V. Hence, the following condition holds:

(i, j) ∈ ℰ ⟺ i and j are connected by an edge.
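The definition above can be made concrete with a tiny sketch (illustrative Python; the variable names are ours, not from the project):

```python
# A graph G = (V, E): a set of vertices and a set of undirected edges.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({2, 3})}  # E is a subset of V x V

def connected(i, j, edges):
    """(i, j) in E  <=>  i and j are joined by an edge."""
    return frozenset({i, j}) in edges

print(connected(1, 2, E))  # True: an edge joins vertices 1 and 2
print(connected(1, 3, E))  # False: no edge between 1 and 3
```

Using `frozenset` for an edge makes (i, j) and (j, i) the same object, which matches the undirected graphs used throughout this project.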
Characteristics of Graphs

A graph representing a network can be classified according to its main features. Considering the scope of this project, the three main characteristics of a network graph to consider are directionality, weight, and color.

Directionality of the edges: Directed and undirected graphs differ in terms of the directionality of the edges. All the links between vertices in a directed network graph have a particular direction; therefore, an edge from i to j is different from an edge from j to i. On the other hand, in undirected network graphs edges do not have directionality, and an edge from i to j is the same as an edge from j to i. In mixed network graphs there exist both kinds of edges: some have direction and others do not. The direction of a link may represent several things, for instance Granger causality.
FIGURE 1 – UNDIRECTED AND DIRECTED NETWORK GRAPHS
Weight of the edges: The difference between weighted and unweighted graphs has to do with the relative weight of each edge. Weighted network graphs are constructed with edges that have different weights; this difference can be represented by assigning numbers to the edges or by drawing edges of different thickness. In unweighted graphs, all edges have the same weight. The weight of a link may have different interpretations, for example the intensity of the partial correlation.
FIGURE 2 – UNWEIGHTED AND WEIGHTED NETWORK GRAPHS
Color of the vertices/edges: Colored graphs are those which use different colors to represent vertices/edges in the network, that way classifying the nodes/links into different groups. On the other hand, all nodes and all links in an uncolored
graph are colored with the same color. Colors may be used to describe a difference in a particular feature between the vertices/edges.
FIGURE 3 – UNCOLORED AND COLORED NETWORK GRAPHS
As described later on, due to the specific conditions and resources of our project, undirected, unweighted (and, in the last section, colored) graphs are going to be used to represent the networks we deal with in the following sections. This means that, first, between two interrelated elements just one neutral connection without direction is represented; second, all interrelations found between two elements of the data set are represented with the same relevance (and third, the vertices/edges are colored so as to classify them according to a specific characteristic).
Adjacency Matrix (A)

The graphical representation of a network has its origin in what is known as the Adjacency matrix (A). It is the compact numerical representation of a network in the form of a matrix.

It can be defined as the |V| × |V| matrix such that, if vertices i and j are connected by an edge, then the element a_ij has a value different from zero8. In the case of an unweighted graph, all elements a_ij will be either one or zero, depending on the existence or not of an edge between vertices i and j, respectively. In weighted graphs, however, a_ij can take any value, as the network represented does not identify just the existence of a linkage, but also specifies its weight. Notice that, in all

8 Therefore, the elements in the diagonal of the Adjacency matrix are all zeros, as a node cannot have an edge with itself.
1
3
2 1
3
2
Uncolored Graph Colored Graph
cases in which a particular network graph is undirected, the corresponding Adjacency matrix is necessarily symmetric.

In order to get a clearer idea of how this works, an example of an undirected weighted graph is provided in Figure 4.
FIGURE 4 – UNDIRECTED-WEIGHTED NETWORK GRAPH AND ITS ADJACENCY MATRIX
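To make this concrete, the adjacency matrix of the weighted graph in Figure 4 can be assembled from its edge list (an illustrative Python sketch; the project itself works in R):

```python
# Edges of the Figure 4 graph as (vertex, vertex, weight) triples.
edges = [(1, 2, 15), (1, 3, 22), (2, 3, 12), (3, 4, 8), (3, 5, 13), (4, 5, 14)]

n = 5
A = [[0] * n for _ in range(n)]
for i, j, w in edges:
    A[i - 1][j - 1] = w
    A[j - 1][i - 1] = w  # undirected graph => symmetric matrix, zero diagonal

for row in A:
    print(row)
```

The resulting matrix is symmetric with a zero diagonal, exactly as described above.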
Notation

When describing graphical representations of networks, a couple of concepts are useful to define the structure of the network: path and component.

A path is a sequence of adjacent edges between two nodes, i.e. the set of edges that, starting from a particular vertex, need to be ‘gone through’ so as to reach another node in the network.

Components are sets of vertices such that all vertices in the set are connected by a path, i.e. each of the ‘separated’ groups of linked vertices which are not interconnected within a network. The figure below (Fig. 5) exhibits examples of both concepts.
The undirected weighted graph in Figure 4 has five vertices and edges 1–2 (weight 15), 1–3 (22), 2–3 (12), 3–4 (8), 3–5 (13) and 4–5 (14); its adjacency matrix, as printed by R, is:

     [,1] [,2] [,3] [,4] [,5]
[1,]    0   15   22    0    0
[2,]   15    0   12    0    0
[3,]   22   12    0    8   13
[4,]    0    0    8    0   14
[5,]    0    0   13   14    0
FIGURE 5 – PATH AND COMPONENTS IN A NETWORK GRAPH
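A component can be computed as everything reachable from a starting vertex by some path, for example with a breadth-first search. The snippet below is an illustrative Python sketch (the node labels are hypothetical, not those of Figure 5):

```python
from collections import deque

# A small undirected graph with two components: {A, B, C} and {D, E}.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}

def component(start):
    """All vertices reachable from `start` by some path, i.e. its component."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen

print(sorted(component("A")))  # ['A', 'B', 'C']
print(sorted(component("D")))  # ['D', 'E']
```

Since D and E are unreachable from A, the two sets form separate components of the network.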
Metrics

To analyze a particular network, there are some measurement tools that may be used to compare features of the network and of the elements forming it. They are called metrics.

Metrics can be global or local: global metrics refer to the whole network, while local metrics refer just to parts of it.

The three main metrics, and the ones used in the last section of this project, are centrality, clustering and similarity.
Centrality: measures of centrality indicate how central a vertex is in the network. The basic index of centrality is the degree of the vertex, i.e. the number of edges connecting the node. There are more complex measures of centrality, like the ones that internet search engines use to rank pages when showing the results of a specific search, for instance Google’s PageRank9, but they are beyond the scope of this project.

Clustering: measures of clustering indicate the degree of clustering between edges in a network. A simple metric of clustering for two specific nodes is the proportion of common adjacent nodes. An interesting feature of that metric is that networks in which the degree of clustering is very high are also known as small-world networks10. The name comes from the small-world phenomenon,

9 (Brin and Page 1998)
10 (Watts and Strogatz 1998)
Figure 5 depicts a network with vertices A to H, marking a path from A to G as a sequence of four edges, and two components: one containing six vertices and one containing two.
popularly known as six degrees of separation. The idea behind that phenomenon is that the maximum number of degrees of separation between any two nodes in a network is six11. It is mostly applied to social networks, positing that everyone in the world is connected by a friend-to-friend chain of at most six steps; in other words, anyone should be able to connect with Michael Jordan (or with José Gómez, former pétanque player12) through just six friends at most.
Similarity: measures of similarity are not as analytical as the two previous ones. They are based on observing characteristics of the elements in the network which are not represented by the edges, and subsequently comparing those features so as to find to what extent nodes are similar. The usual way to represent similarity in a graph is by the color, size or shape of the nodes. For instance, elements that are similar in a certain aspect can be colored with the same color, or the size and shape of the vertices may be set according to a feature of the elements.
There are other metrics and algorithms, like degree distribution, modularity or motifs, but they are beyond the scope of this project and are therefore not discussed in this section.
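The first two metrics can be sketched in a few lines (illustrative Python; the function names are ours): degree counts a node's edges, and a simple clustering measure is the share of common neighbours of two nodes.

```python
# Adjacency lists of a small undirected network.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}

def degree(v):
    """Basic centrality: the number of edges connecting node v."""
    return len(adj[v])

def common_neighbour_share(i, j):
    """Proportion of common adjacent nodes of i and j (a simple clustering metric)."""
    ni, nj = adj[i] - {j}, adj[j] - {i}
    return len(ni & nj) / len(ni | nj) if ni | nj else 0.0

print(degree(1))                     # 3: node 1 is the most connected
print(common_neighbour_share(2, 4))  # 1.0: nodes 2 and 4 share all their neighbours
```

In this toy network, nodes 2 and 4 are not linked to each other, yet their neighbourhoods coincide, giving the maximum clustering score of 1.0.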
2.3 Network Analysis of Multivariate Time Series

As briefly introduced in the previous section, the final aim of network analysis is to represent as a network a large data collection which would be hard to interpret otherwise. In particular, the type of data collection which is going to be used in this project to develop a network analysis is a multivariate time series.

A multivariate time series is the collection of values of a determined number of variables taken in successive periods of time, i.e. the collection of the data points of N variables over T periods of time.

11 (Watts and Strogatz 1998)
12 (Egea 2010)
Y_t = (y_{1,t}, …, y_{N,t})′   for t = 1, …, T

For instance, a multivariate time series may be the (quarterly) GDP growth rate of the European Union Member States over the last 20 years. In this case, N = 28 Member States, and T = 20 × 4 = 80 data points in time from 1993 to 2013.

In this example, the final output of the network analysis of this particular multivariate time series could be displayed in a graph in which all the countries are represented by nodes (vertices) and the interconnections between the growth rates of those elements are plotted as edges linking the vertices which are interconnected.
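In matrix form, the EU example is simply a T × N array: 80 quarterly observations of 28 series. A minimal sketch (illustrative Python with NumPy; random placeholder numbers rather than real Eurostat data):

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 20 * 4, 28                # 80 quarters, 28 Member States
Y = rng.normal(size=(T, N))      # row t holds the N growth rates observed at time t

Sigma = np.cov(Y, rowvar=False)  # sample covariance across the N variables
print(Y.shape, Sigma.shape)      # (80, 28) (28, 28)
```

The N × N sample covariance matrix computed here is the starting point for the partial correlation analysis developed in the next section.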
For the purpose of this project, the concept of a white noise process also needs to be understood. In short, a white noise process is a series in which each observation of a variable “has an identical, independent, mean-zero distribution. Each period’s observation in a white-noise time series is a complete ‘surprise’: nothing in the previous history of the series gives us a clue whether the new value will be positive or negative, large or small”13.
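These properties are easy to check on simulated data. The sketch below (illustrative Python; the simulations later in this project use R) draws Gaussian white noise and verifies that the sample mean and the lag-one autocovariance are both close to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(loc=0.0, scale=1.0, size=100_000)  # Gaussian white noise

mean = eps.mean()
lag1 = np.mean(eps[1:] * eps[:-1])  # sample estimate of Cov(eps_t, eps_{t-1})

print(abs(mean) < 0.05, abs(lag1) < 0.05)  # True True
```

With 100,000 draws, the sampling error of both estimates is on the order of 0.003, so each one lands well within the tolerance above.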
The formal definition of a white noise process Y_t is:

E(Y_t) = 0            for all t
Var(Y_t) = σ²         for all t
Cov(Y_t, Y_{t−k}) = 0  for all k ≠ 0

2.4 Partial Correlation Network

Types of Networks

Apart from the different types of graphs used to represent networks, there are also
different kinds of networks; each of them is usually represented using a particular type of graph among those described in section 2.2.

Several sorts of networks have been studied in the statistics literature so far. A list of the main ones is provided below, together with the papers in which they have been presented and discussed:
13(A. Parker)
Partial Correlation Network
Dempster (1973), Lauritzen (1996), Meinshausen & Bühlmann (2006), Peng, Wang, Zhou & Zhu (2009)

Granger Causality Network
Billio, Getmansky, Lo & Pelizzon (2012), Diebold & Yilmaz (2011)

Long Run Partial Correlation Network
Davis, Zhang & Zheng (2012), Brownlees & Barigozzi (2012)

Tail Dependence Networks
Hautsch, Schaumburg & Schienle (2011)
As regards the scope of our project, we are going to focus on the first one, the Partial Correlation Network, as it is the basic model and establishes the basis for the development of the other models, which are rather more complex and applicable in situations in which the assumptions required for the Partial Correlation Network to work appropriately are not satisfied. Therefore, those other types of networks fall beyond the scope of this project.
Setup and Definition

In order to analyze the theoretical framework of the partial correlation network, we are going to consider a scenario in which Y_t is a multivariate white noise process as defined in the previous section. In this setup, the partial correlation network associated with this system can be displayed as the undirected, unweighted graph in which:

a) all the elements of Y_t are represented by nodes, i.e. the different vertices plotted in the graph; and

b) in the case of the existence of partial correlation between two given nodes (i.e., if y_i and y_j are partially correlated given all other elements), there is an edge linking those two vertices in the graph.
The reason why the plotted edges between two nodes which are partially correlated are neither directed nor weighted is that:

a. y_i being partially correlated with y_j is no different from y_j being partially correlated with y_i (in both cases, given all the other elements), because in this setup we are
not concerned about Granger causality relationships14. Therefore, the linkage of these two partially correlated elements has no direction from one to the other, i.e. the graph is undirected and the adjacency matrix is symmetric; and

b. the algorithm designed to graphically represent the partial correlation network only ‘cares’ about the mere existence (or not) of linear conditional dependence between two elements given all other elements, and this does not per se provide different values to assign weights to the edges connecting the partially correlated nodes. I.e. the graph is unweighted, as the elements in the adjacency matrix from which the graphical representation of the network is built are either one or zero, depending on the existence or not of partial correlation, respectively.
As a formal definition, the partial correlation ρ_ij is the measure of the cross-sectional linear conditional dependence between the variables y_i and y_j given all the other variables:

ρ_ij = Corr( y_i , y_j | { y_k : k ≠ i, j } )

As regards the graphical representation of the partial correlation network, it is defined by the collection of edges between those vertices whose partial correlation as defined above is not zero, i.e. the linkages between nodes which are partially correlated given all the other elements in the network:

ℰ = { (i, j) ∈ V × V : ρ_ij ≠ 0 }

Therefore, the graphs plotted using the partial correlation network method have a simple and easy-to-understand definition, hence transforming potentially complex and hard-to-read information (a large dataset) into a plain and clear representation of vertices and edges showing the existence or not of interconnections between the elements of such a big amount of data. However, interpretations of the plot of a network may still be hard to reason out, and the simplicity of the graph may sometimes be misleading. The useful and the not-so-good features of partial correlation networks are discussed in the section Pros and Cons of Partial Correlation Networks.
14 Granger causality has to do with forecasting future values of a given variable using past values of other variables (i.e., partial correlation BUT across time). In this setup we are only considering contemporaneous interconnections between elements. Intertemporal connections are beyond the scope of this project.
Relation with Linear Regressions

The relationship between partial correlation and the broadly known basic linear regression model is intuitive and can help to understand the less common concept of partial correlation.

Consider the usual cross-sectional linear regression model in which a single variable is regressed on all the other variables in the dataset for a given period, without any non-linear regressor. For instance, imagine a multivariate time series with only four (4) variables at a given time t:

y_1 = β_12 y_2 + β_13 y_3 + β_14 y_4 + ε_1
y_2 = β_21 y_1 + β_23 y_3 + β_24 y_4 + ε_2
y_3 = β_31 y_1 + β_32 y_2 + β_34 y_4 + ε_3
y_4 = β_41 y_1 + β_42 y_2 + β_43 y_3 + ε_4

Running the regression of each variable on all the other elements as shown above, y_i and y_j are partially correlated if and only if β_ij ≠ 0 (for i ≠ j).

In fact, the partial correlation of y_i and y_j is equivalent to the linear correlation between the residuals of y_i and y_j obtained from the regression of the two elements on all the other variables in the system, as in the equations above (i.e. Corr(ε_i, ε_j)).
Intuitively, when in a partial correlation network plot there exists an edge between two given nodes i and j, then elements y_i and y_j are partially correlated. It is also true the other way around: if two elements are partially correlated, then there exists an edge in the partial correlation network representation which connects the two nodes representing the elements.
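The residual-correlation equivalence stated above can be verified numerically. The sketch below (illustrative Python with NumPy; variable names are ours) computes the correlation between the residuals of two regressions and compares it with the partial correlation obtained from the inverse of the sample covariance matrix; the two routes coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 1000, 4
Y = rng.normal(size=(T, N)) @ rng.normal(size=(N, N))  # four correlated variables
Y -= Y.mean(axis=0)                                    # demean each column

def residual(target, others):
    """Residual of the least-squares regression of `target` on `others`."""
    beta, *_ = np.linalg.lstsq(others, target, rcond=None)
    return target - others @ beta

i, j, rest = 0, 1, Y[:, 2:]
r = np.corrcoef(residual(Y[:, i], rest), residual(Y[:, j], rest))[0, 1]

K = np.linalg.inv(np.cov(Y, rowvar=False))   # concentration matrix
rho = -K[i, j] / np.sqrt(K[i, i] * K[j, j])  # partial correlation of y_i and y_j

print(abs(r - rho) < 1e-6)  # True: the two routes agree
```

The agreement is an exact algebraic identity for sample quantities, not just an asymptotic approximation, so it holds up to floating-point precision.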
Characterizing the Partial Correlation Network

So far it has been shown that the graphical representation of the partial correlation network is obtained from the elements of the Adjacency matrix (A). This matrix, which entirely determines the structure of the plot of the network, is in turn constructed from another matrix called the Concentration matrix (K).
The Concentration matrix (K) can be completely characterized as the inverse of the covariance matrix (Σ) of the multivariate time series of interest Y_t. Therefore, K is an N × N matrix, where N equals the number of variables of the multivariate time series:

K = Σ⁻¹

Being k_ij the (i, j) element of the Concentration matrix K, the following equality holds:

ρ_ij = − k_ij / √( k_ii · k_jj )

As a result:

a) the partial correlation between all the elements of the multivariate time series of interest is entirely defined by the Concentration matrix K, which in turn is completely defined by the Covariance matrix Σ of the multivariate time series;

b) the vertices i and j are connected by an edge if and only if the (i, j) element of the Concentration matrix is not zero (i.e., k_ij ≠ 0 ⟺ ρ_ij ≠ 0); and

c) the estimation of a partial correlation network can be based on the estimation of the Concentration matrix, as the Adjacency matrix (A) corresponding to the partial correlation network is built out of the Concentration matrix.
Notice that, as is going to be shown later on in this project, the Concentration matrix easily turns into the Adjacency matrix just by substituting all the non-zero elements of K by ones (1), and then replacing the ones (1) in the diagonal by zeros (0). As discussed before in this section, the Adjacency matrix is the final object from which the plot of the partial correlation network is created.
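Points b) and c) can be illustrated with a small numerical sketch (illustrative Python; the matrix values are made up): a chain-shaped concentration matrix is inverted to obtain a covariance matrix, and reading off the zero pattern of the recovered K yields the adjacency matrix.

```python
import numpy as np

# Concentration matrix of a 'chain' network 1-2-3: k_13 = 0, so no edge 1-3.
K_true = np.array([[ 2.0, -0.5,  0.0],
                   [-0.5,  2.0, -0.5],
                   [ 0.0, -0.5,  1.0]])
Sigma = np.linalg.inv(K_true)   # the covariance matrix itself is dense
K = np.linalg.inv(Sigma)        # recover the concentration matrix

d = np.sqrt(np.diag(K))
rho = -K / np.outer(d, d)       # partial correlations: rho_ij = -k_ij / sqrt(k_ii k_jj)
np.fill_diagonal(rho, 0.0)

A = (np.abs(rho) > 1e-9).astype(int)  # adjacency: 1 iff partial correlation nonzero
print(A)                              # edges 1-2 and 2-3 only
print(abs(Sigma[0, 2]) > 0.01)        # True: 1 and 3 are correlated, yet not partially
```

Note that Sigma has no zero entries even though K does: variables 1 and 3 are marginally correlated through variable 2, but conditionally on it they are unlinked, which is exactly the information the partial correlation network displays.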
Pros and Cons of Partial Correlation Networks

Partial correlation networks, like all kinds of networks, have some good features and some limitations. In this section the main pros and cons of this type of network are summarized.

The main advantage of network analysis in general is the capacity to ease interpretation by graphically representing the interconnections between the elements of a large set of data from a multivariate time series. This is not a specific characteristic of partial correlation networks but, as a network, it also complies
with that. These graphical representations may ease the interpretation of the results once the partial correlations between elements have been estimated, although sometimes the resulting plots may not be easy to interpret due to the extreme complexity of the original empirical data (i.e., if real-world data is extremely complex, the chances are that the simplest possible reliable representation of it looks quite complex too)15. The difficulty of the reasoning behind any interpretation of the data goes beyond the scope of this project, as it does not depend on what is done with the available information, but on the complexity of the real world per se. However, it cannot be denied that a graphical representation is very often helpful for building one’s own reasoning when interpreting most kinds of data.
A particularity of the partial correlation network is that when the data is Gaussian white noise (hence independent and identically distributed), the absence of partial correlation always implies conditional independence. This means that, in this scenario, given the lack of an edge between two nodes in the graph built using the partial correlation estimation model, those two elements are conditionally independent (provided the estimation of the edges is correct). As a result, in this situation the representation coming from the partial correlation network reliably reflects the dependence structure of the original data. However, when the original data is not normally distributed this condition does not hold, and the absence of a link between two nodes does not imply conditional independence.
In addition, the way partial correlation works has an important drawback. As the model is based on the interconnections between elements at the same period of time t only, partial correlation network analysis captures only the contemporaneous dependence between the elements of the network. Any serial dependence or spillover effect from series to series is omitted by the model and therefore does not appear in the graphical representation of the data. This feature may lead to mistakes in the graphical projection of the data and hence to misleading interpretations of the plots. This weakness can be addressed with more complex models such as the ones mentioned in the Types of Networks section, for instance the Granger
15 As discussed in the section about Sparsity of networks, the estimations used to build the partial correlation network help to simplify the data so as to ease this interpretation.
Causality network or the Long Run Partial Correlation network. As mentioned before,
those models are beyond the scope of this project and therefore not discussed here.
2.5 Sparse Network Estimation
In order to end up with the desired outcome of a network graph with all the advantages described in the previous section, the partial correlations between the elements of the dataset need to be estimated. How these partial correlation estimates are obtained is explained throughout this section.
Sparsity of Networks: Definition and Assumption
One specific assumption of the partial correlation network estimation model described in this project is sparsity. Sparsity refers to the fact that, in a given network, not every node (i.e., element) is connected to every other node in the dataset, and consequently the network is not complete.
In the matrices presented so far (the Adjacency and Concentration matrices), which characterize the network completely, sparsity shows up as a large number of zero entries. Recall that a zero entry in the matrix means the absence of an edge between two variables; a large number of zeros therefore implies a sparse, or incomplete, network.
In fact, previous research shows that several kinds of networks studied in the literature turn out to be sparse in general16. Citation networks, friendship networks, and most genetic networks17 are examples of sparse networks. It is therefore reasonable to assume that, in certain situations, the datasets we deal with can be considered sparse. This need not always be the case: depending on the field, a complete network may be appropriate. Nevertheless, in our project, and in general in the field of partial correlation network estimation, when models of network estimation try to represent the real world from large empirical datasets, sparsity is assumed to prevail and, as a consequence, estimation tools whose coefficients give rise to sparse networks may be used.
16 (Scholz 2013). 17 (Jeong et al. 2001); (Gardner et al. 2003); (Tegner et al. 2003).
In particular, in the next section of our project we present the LASSO estimation tool, an instrument used to obtain a number of non-zero partial correlation estimates low enough to end up with a sparse network.
LASSO Estimation Technique
In 1996, Robert Tibshirani presented an innovative regression model called LASSO. The LASSO estimation technique turns out to be a very useful and effective tool for obtaining estimates of the partial correlations between the elements of a multivariate time series, with exactly the features needed to end up with a sparse network of the original data.18
The LASSO estimation model is a version of the usual least squares estimation model whose main characteristic is that it simultaneously estimates the parameters of the regression while shrinking some of the estimated coefficients to exact zeros. LASSO stands for Least Absolute Shrinkage and Selection Operator, which summarizes the main feature of this estimation technique.
LASSO fits the partial correlation setup discussed in this project perfectly because it allows shrinking enough of the estimated coefficients to end up with a sparse network. As discussed in the previous section, sparsity is an assumption that, apart from being plausible, eases the understanding and interpretation of the final results and of the plot obtained from the partial correlation network estimation.
LASSO estimators are calculated in a way similar to ordinary least squares (OLS) estimators. The set of LASSO estimators is defined as the one satisfying the following expression:

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{N} x_{ij}\,\theta_j \Big)^2 + \lambda \sum_{j=1}^{N} |\theta_j|, \qquad \lambda \ge 0$$

Therefore, the LASSO estimators are the coefficients that minimize the squared error of the regression of each element on all the others (just as OLS), but taking into account a parameter called lambda. Lambda, also called the tuning parameter, controls the amount of shrinkage in the estimation procedure. Intuitively, it works as follows:
18 (Tibshirani 1996)
- If lambda takes a value equal to zero, the expression above becomes exactly the OLS objective, as the right-hand term (the lambda parameter multiplying the sum of the absolute values of all the LASSO estimators) vanishes. Therefore no shrinkage occurs and the estimators are exactly the same as in the ordinary least squares case.
- If lambda takes a very large value, the penalty term dominates the expression: any LASSO estimator different from zero makes the expression very large. Since the LASSO estimators are selected to minimize the expression, for a lambda big enough all of them are shrunk to exact zeros, and no estimator different from zero is found.
- If lambda takes neither zero nor a very large value, the LASSO estimators fall in between the OLS estimators and zero. Some of them are shrunk to exact zero, while the others remain different from zero.
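A minimal sketch of this behaviour, assuming simulated data and a plain coordinate-descent solver for the objective above (Python with numpy; this is an illustration, not the estimation code used in this project):

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; exact zero once |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=500):
    """Minimise sum((y - X @ theta)**2) + lam * sum(|theta|)
    by cyclic coordinate descent."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # residual ignoring feature j's current contribution
            r = y - X @ theta + X[:, j] * theta[j]
            theta[j] = soft_threshold(X[:, j] @ r, lam / 2) / (X[:, j] @ X[:, j])
    return theta

# simulated data: only features 0 and 2 truly matter
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
y = X @ np.array([3.0, 0.0, -2.0, 0.0]) + 0.1 * rng.standard_normal(100)
# lam = 0 reproduces OLS; a very large lam shrinks every coefficient to exactly zero
```

With an intermediate lambda, the two irrelevant coefficients are set to exact zeros while the two relevant ones survive (shrunk toward zero), which is precisely the selection behaviour described above.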
The great advantage of the LASSO estimation technique is that, by construction, it automatically selects the variables that best explain the regressed variable, shrinking the parameters of the less explanatory variables to exact zeros while keeping the parameters of the relevant variables different from zero. As a result, the outcome is a selection of the variables which are partially correlated with the regressed element, identifying the existence or absence of interconnections between the elements of the original multivariate time series.
This feature is exactly what is needed to construct a sparse network. LASSO simply determines whether two elements are partially correlated or not, so that, as discussed in the next section, it can be used to produce the final outcome of the partial correlation network estimation: the estimated sparse network and the corresponding plot of the partial correlations existing in a specific multivariate time series.
Moreover, the LASSO estimation technique has another feature which is very valuable in a specific scenario. In the case where the number of elements (N in the expression above) is larger than the number of data points (n in the expression above), under appropriate conditions19, the LASSO estimators still work appropriately. This scenario, with more variables than observations, runs counter to the traditional regression setting, in which a large number of observations for a limited number of variables is required to obtain significant results for the regression coefficients, for instance with the typical OLS estimation model.
In short, the LASSO estimator turns out to be an optimal instrument for estimating the existence of partial correlations in a setup such as the one discussed in this project: it shrinks non-useful coefficients of the regression model to exact zeros and selects the relevant ones, yielding estimators that can be used to represent sparse networks of multivariate time series, even when there are many more elements (variables, N) than data points (observations, n), the so-called high-dimension-low-sample-size scenario.
Estimating the Partial Correlation Network (SPACE)
In 2009, Peng, Wang, Zhou and Zhu presented a "novel method for detecting pairs of variables having nonzero partial correlations among a large number of random variables based on iid samples"20. They called the model SPACE, after Sparse PArtial Correlation Estimation, and it is currently one of the most computationally efficient approaches for estimating the existence of partial correlations in the setup described above. It estimates partial correlations by using joint sparse regression models.
SPACE, in line with the assumption argued to be plausible in the previous sections, "assumes the overall sparsity of the partial correlation matrix, and employs sparse regression techniques for model fitting" (Jie Peng et al., 2009, p.735), in particular the LASSO estimation technique described in the previous section, to estimate the non-zero entries of the inverse of the covariance matrix, also known as the Concentration matrix. As discussed in previous sections, a non-zero entry in the Concentration matrix implies the existence of dependency between the two corresponding elements conditional on the rest of the variables. Moreover, only under the normality
19 The scope of this project does not cover the discussion of the cases in which these conditions are fulfilled. 20 (Peng et al. 2009).
assumption (i.e., in a Gaussian world) does a zero entry in the Concentration matrix directly imply conditional independence between the two corresponding elements.
As with the LASSO estimator (on which this model is based), the method still works when the number of elements is much larger than the number of observations, what the authors call the high-dimension-low-sample-size setting. As
stated in the paper: “This is a typical scenario for many real life problems. For example,
high throughput genomic experiments usually result in datasets of thousands of genes
for tens or at most hundreds of samples. However, many high-dimensional problems
are intrinsically sparse. In the case of genetic regulatory networks, it is widely believed
that most gene pairs are not directly interacting with each other. Sparsity suggests that
even if the number of variables is much larger than the sample size, the effective
dimensionality of the problem might still be within a tractable range” (Jie Peng et al.,
2009, p.736).
The results of the simulation studies performed by the authors show that the proposed method "achieves good power in nonzero partial correlation selection as well as hub identification" (Jie Peng et al., 2009, p.735). Power (also known as sensitivity) is a measure of the quality of the estimation model21, and hubs are "vertices (variables) that are connected to (have nonzero partial correlations with) many other vertices (variables)" (Jie Peng et al., 2009, p.735). Hubs, like sparsity, have been shown to be present in many large networks such as the Internet, citation networks or protein interaction networks22.
Furthermore, as stated by Jie Peng et al. (2009, p.745), “by controlling the overall
sparsity of the partial correlation matrix, space is able to automatically adjust for
different neighborhood sizes and thus to use data more effectively” by selecting the
value of the tuning parameter lambda. “The proposed method also explicitly employs
the symmetry among the partial correlations, which also helps to improve efficiency23.
21 See section Features of the Estimation. 22 (Newman 2003). 23 For multivariate estimators, one estimator is said to be more efficient than another if the covariance matrix of the second minus the covariance matrix of the first is a positive semi-definite matrix. http://economics.about.com/cs/economicsglossary/g/efficiency.htm
Moreover, this joint model makes it easy to incorporate prior knowledge about network
structure.”
The model
Using the LASSO regression tool and taking advantage of its useful features, mentioned in the LASSO Estimation Technique section, Sparse PArtial Correlation Estimation (SPACE) is a model for directly estimating the partial correlations between the elements of a large dataset. In other words, what is estimated is the set of entries that form the Concentration matrix, which, as previously explained, completely characterizes the partial correlation network behind the dataset.
Consider the regression model:

$$y_i = \sum_{j} (1 - \delta_{ij})\, \beta_{ij}\, y_j + \epsilon_i \qquad \text{for } i = 1, 2, \dots, N$$

where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ii} = 1$. The regression coefficients $\beta_{ij}$ and residual variance $\mathrm{Var}(\epsilon_i)$ of the regression model above are related to the entries $\sigma^{ij}$ of the Concentration matrix in the following way:

$$\mathrm{Var}(\epsilon_i) = \frac{1}{\sigma^{ii}}, \qquad \beta_{ij} = -\frac{\sigma^{ij}}{\sigma^{ii}}$$

This relationship can be derived from the partial correlation network estimation model, in which the LASSO estimator of the Concentration matrix can be obtained by minimizing:

$$\mathcal{L}(\theta) = \frac{1}{2} \sum_{i=1}^{N} w_i \sum_{k=1}^{n} \Big( y_i^{(k)} - \sum_{j \neq i} \rho^{ij} \sqrt{\frac{\sigma^{jj}}{\sigma^{ii}}}\, y_j^{(k)} \Big)^2 + \lambda \sum_{i<j} |\rho^{ij}|$$

where $w_i$, $i = 1, \dots, N$, is a pre-estimator of the reciprocal of the residual variance of node $i$, and $\rho^{ij}$ is the partial correlation

$$\rho^{ij} = -\frac{\sigma^{ij}}{\sqrt{\sigma^{ii} \sigma^{jj}}}$$
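These relationships can be verified numerically; the sketch below (Python with numpy; the 3 × 3 covariance matrix is an arbitrary illustrative choice, not taken from the project) checks that the population regression coefficients and residual variance match the entries of the Concentration matrix:

```python
import numpy as np

# assumed 3x3 covariance matrix, chosen only for illustration
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.5, 0.4],
                  [0.2, 0.4, 1.0]])
K = np.linalg.inv(Sigma)           # concentration matrix, entries sigma^{ij}

i, others = 0, [1, 2]
# coefficients of the population regression of y_0 on (y_1, y_2)
beta = np.linalg.solve(Sigma[np.ix_(others, others)], Sigma[others, i])
# the same coefficients read off the concentration matrix: -sigma^{ij}/sigma^{ii}
beta_K = -K[i, others] / K[i, i]

# residual variance of that regression, which should equal 1/sigma^{ii}
resid_var = Sigma[i, i] - Sigma[i, others] @ beta

# partial correlation of y_0 and y_1: -sigma^{01}/sqrt(sigma^{00} sigma^{11})
rho_01 = -K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
```

Both identities hold exactly for any positive definite covariance matrix, which is what makes it possible to recover the Concentration matrix from a set of node-wise regressions.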
The outcome of applying this model is the matrix containing the estimates of the partial correlations between elements (i.e., the Concentration matrix), which is exactly what is needed to define the network we are interested in plotting in an easier-to-interpret graph. The correct functioning of this model, as it is based on the LASSO estimation technique, depends on the choice of the tuning parameter lambda.
Choosing Lambda
As discussed in the section where the LASSO estimator was introduced, fitting a network implies choosing a tuning parameter lambda, which determines the level of shrinkage of the parameters and, in the end, the number of edges present in the graphical representation resulting from the partial correlation network estimation.
Lambda can be seen as the penalty paid when constructing a sparse network from empirical data that is not sparse enough. In other words, lambda determines the balance of type I and type II errors that the final estimate carries due to the shrinkage needed to end up with a sparse network.
In the particular setup of this project, a type I error would be the detection of a non-existing linkage between two elements (a false positive), while a type II error would be the failure to detect a true existing linkage between two elements (a false negative)24.
As is shown in the following section with an illustrated simulation, the larger the value of lambda, the more type II error the estimation carries; conversely, the lower the value of lambda (with the extreme case of lambda equal to zero), the more type I error is found in the estimation.
In practice, networks are estimated for different values of the lambda parameter, and afterwards information criteria like the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) are used to determine the appropriate or optimal value of
24 Type I error is closely related to the False Discovery Rate (FDR); see Features of the Estimation for further explanation.
lambda.25 In general, BIC is preferred to AIC because the optimal lambda it proposes is larger; it therefore penalizes more, and the resulting network is sparser and hence easier to understand and interpret.
Features of the Estimation
In the following section, when implementing an illustrated simulation, as well as in the last part of this project dealing with real-world empirical data, the SPACE model is used to estimate the existence of partial correlations between the variables of the datasets. In order to evaluate and compare the results obtained in the illustrated simulation, where we are able to 'play' with the model, the following measurement ratios are considered:
- Sensitivity: the ratio of the number of correctly detected edges to the number of total true edges. It is also known as power, and equals one minus the type II error rate.
- Specificity: the ratio of the number of correctly detected edges to the number of total detected edges.
- False discovery rate (FDR): the ratio of the number of wrongly detected edges to the number of total detected edges; therefore FDR = 1 − specificity. It is also known as the type I error.
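Given a true and an estimated adjacency matrix, these three ratios can be computed as in the following sketch (Python; the two small example matrices are made up for illustration, and specificity follows the definition given above, i.e. correctly detected over total detected):

```python
def edge_metrics(true_adj, est_adj):
    """Sensitivity, specificity and FDR as defined above,
    counting each undirected edge once (upper triangle)."""
    n = len(true_adj)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            t, e = true_adj[i][j], est_adj[i][j]
            if t and e:
                tp += 1          # correctly detected edge
            elif e:
                fp += 1          # wrongly detected edge
            elif t:
                fn += 1          # missed true edge
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity, 1.0 - specificity

# made-up 4-node example: 3 true edges, 3 detected edges, 2 of them correct
true_adj = [[0, 1, 1, 0],
            [1, 0, 0, 0],
            [1, 0, 0, 1],
            [0, 0, 1, 0]]
est_adj = [[0, 1, 0, 0],
           [1, 0, 0, 1],
           [0, 0, 0, 1],
           [0, 1, 1, 0]]
```

On this example the function returns a sensitivity of 2/3, a specificity of 2/3 and an FDR of 1/3.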
These ratios are useful for comparing the quality of the estimates obtained with the SPACE model. Depending on the number of observations (n) and the choice of the penalty or tuning parameter lambda (λ), we assess the fit of the estimates by comparing the estimated network with the true network we generate ourselves.
25 The criterion used to choose the optimal value of lambda is beyond the scope of our project.
3. Illustrated Simulation
Having explained theoretically how the partial correlation network estimation method works, an illustrated simulation is now carried out to show empirically how to use the joint sparse model to estimate partial correlation networks.
3.1 Software R
The software chosen to perform the simulation is R, which is both a programming language and an environment for statistical computing that can be downloaded for free. R's features and the wide variety of available commands and packages make it a popular language and software for data analysis. Thus, R suits our needs at this point of the project perfectly.
Extra Packages
The basic version of R comes with only a few packages, which are not enough for the simulation we need to carry out. Fortunately, additional packages performing specific tasks can also be downloaded for free, and some are crucial for the development of the simulation and the application of the model to the real data. These are space, igraph and tcltk2.
- space (Sparse PArtial Correlation Estimation): the package needed to perform sparse partial correlation estimation in R using the joint sparse regression model proposed by Peng et al. (2009).26
- igraph (Network analysis and visualization): a tool for designing and plotting graphs from the data loaded in R. igraph handles large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality indices and much more.27
- tcltk: this package provides access to the platform-independent Tcl (Tool Command Language) scripting language and Tk (Tool Kit) GUI elements. It can plot interactive graphs, which are useful when analyzing networks.28
26 (Peng et al. 2013). 27 ("Igraph: Network Analysis and Visualization" 2013). 28 ("Tcltk" 2013).
3.2 Procedure of the Illustrated Simulation
To check how the joint sparse model estimates partial correlation networks, an illustrated simulation is very effective. Since the objective is not the estimation of a real-world dataset, a network is defined to be used as the true network of the dataset.
True Network Definition
Based on the assumption of sparsity, the true network created has three (3) independent components, six (6) hubs with three or more edges, and a total of 19 edges. It has 20 elements; therefore the Concentration matrix of a 20-variable set is created (a 20 × 20 matrix). It does not represent a very big dataset, but it is enough to perform a practical simulation and obtain useful results.
Then, taking the Concentration matrix, replacing every value different from 0 with ones and placing zeros on the diagonal, the Adjacency matrix indicating the existence or absence of partial correlations is obtained. The plot of that matrix gives the graph representing the true network (Fig. 6), with which all simulations are compared.
FIGURE 6 – TRUE NETWORK
Obtaining an Estimated Network
From this point on, the simulation process begins. First, a command drawing random observations from a multivariate normal distribution is defined. The number of variables is 20, and the variance-covariance matrix used to generate the observations is the inverse of the Concentration matrix created in the first step. After that, the space.joint command, which performs the joint sparse regression, is used to generate a concentration matrix from the random observations obtained previously, taking λ as the value of the penalty. Finally, the procedure already used to obtain the true Adjacency matrix is repeated to obtain the estimated Adjacency matrix, replacing every value different from 0 with ones and placing zeros on the diagonal.
The last step consists of plotting the graphs corresponding to both the true and the
estimated adjacency matrices so as to get the true and the estimated networks
graphically represented.
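The generation step can be sketched as follows (in Python rather than the R script used in the project; numpy, the seed and the small 4-variable concentration matrix are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed sparse 4x4 concentration matrix (diagonally dominant, hence positive definite)
K = np.array([[2.0, 0.5, 0.0, 0.0],
              [0.5, 2.0, 0.5, 0.0],
              [0.0, 0.5, 2.0, 0.0],
              [0.0, 0.0, 0.0, 2.0]])
Sigma = np.linalg.inv(K)      # covariance used to draw the sample

# draw n = 1000 multivariate normal observations with that covariance
X = rng.multivariate_normal(np.zeros(4), Sigma, size=1000)

# with many observations, the inverse of the sample covariance approaches K;
# SPACE replaces this naive inversion with a penalised joint regression
K_hat = np.linalg.inv(np.cov(X, rowvar=False))
```

The naive inverse of the sample covariance recovers K only approximately and is never exactly sparse, which is precisely why the penalised space.joint estimation is used in the actual simulation.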
Obtaining and Recording the Results
The previous section described the basic procedure of the simulation. Nevertheless, for a deeper and more reliable analysis of the joint sparse regression model, several simulations must be performed while changing the two variables that affect the metrics of the final estimated network: the sample size of the multivariate random sampling and the penalty value of the space.joint regression. The objective is therefore to check how sparsity, sensitivity and specificity vary. Using the commands that generate a single estimated network as the basis, a more complex script is written that performs the simulation a thousand (1000) times and in which the values of n and λ can be chosen.
In our simulation, we study the results for four different sample sizes, n = {50; 100; 500; 1000}. The tuning parameter lambda, always proportional to the sample size29, takes values from λ = 0.00 × n to λ = 1.00 × n in steps of 0.05 × n, plus λ = 1.50 × n and λ = 2.00 × n, so as to complete the study of how it affects the metrics of the resulting network.
Since the simulation is a random process, it must be repeated many times in order to get accurate and unbiased results30. To that end, the script is prepared to repeat the
29 Determined to be proportional to the sample size because of the LASSO model (R. Tibshirani, 1996). 30 Although SPACE is presented by Peng et al. (2009) as powerful even in the high-dimension-low-sample-size scenario.
process of estimating a network 1000 times for each pair of lambda and sample size values, which results in an estimation process that generates (4 [sample sizes] × 1000 [iterations] × 22 [penalty values] =) 88,000 networks. Such a number of iterations and the corresponding results collected cause an average computer to take several minutes to run the entire script.
Finally, the script counts how many edges there are and how many of them are correctly detected (i.e., edges that exist in the true network), and averages over the 1,000 networks of each pair of n and λ values. The final outputs of the script are four 4 × 22 matrices with the four sample sizes as rows and the twenty-two penalty values as columns: colavgeest is the collection of the average edges estimated, colavgteest is the collection of the average true edges estimated, sens records the sensitivities computed by dividing colavgteest by the number of true edges (19), and spec records the specificities calculated by dividing colavgteest by colavgeest.
Once all the data needed have been recorded, the four matrices are exported as .csv (comma-separated values) files so as to become portable documents, and plots of the results are prepared. These graphs are the tools from which conclusions about the performance of the SPACE model are drawn. The complete script of the illustrated simulation can be found in Annex 1, and the tables with the results mentioned in the previous paragraph in Annexes 2 to 5.
3.3 Plotting the Results
To analyze the data obtained from the simulation, two different graphs are plotted so as to study the features of the joint sparse regression model. The first shows how changes in the penalty value affect the sparsity of the estimated network; the second is a ROC curve (Receiver Operating Characteristic curve) of the model comparing sensitivity and FDR for each sample size.
Lambda versus Sparsity
To analyze the effect that changes in the tuning parameter have on sparsity, the following graph (Fig. 7) shows how the number of edges estimated varies with lambda values from λ = 0.00 × n to λ = 1.00 × n, for each sample size.
FIGURE 7 – HOW DOES LAMBDA AFFECT SPARSITY
Going back to the definition of sparsity, it refers to the fact that not every node is connected to every other node in the dataset. Therefore, the smaller the total number of edges found, the sparser the estimated network. Consequently, the graph clearly shows a positive relation between the penalty value and sparsity: as the former increases, the total number of edges found decreases and the estimated network becomes sparser, moving from above the number of true edges for lower lambdas to below it when the penalization is too big.
It can also be pointed out that when the penalty value is zero, the number of edges found is 190 for every sample size, which is the maximum possible number of existing edges in a system of 20 elements31. In the same way, when the penalty reaches λ = 1.00 × n, sparsity converges again for all sample sizes at a level below the true number of edges in the network.
31 The maximum possible number of edges is N × (N − 1) / 2 = (20 × 19) / 2 = 190.
Comparing the results for the different sample sizes, it is observed that for large values of n the estimated networks become very sparse and the number of edges estimated falls close to the number of true edges (represented by the dashed black line at 19) as soon as λ rises slightly above zero, and further penalization does not significantly alter the number of edges over a wide interval of λ values. On the other hand, for small sample sizes the number of edges estimated is close to the number of true edges only for a few (proportionally big) values of λ. Therefore, when n is lower, the estimated networks become sparser along a less steep path until the true number of edges is reached, and from there the number of edges estimated falls below the optimal value faster, as lambda increases, than with large sample sizes.
Consequently, for proportionally equal penalty values, estimates from small n values are less sparse and fit the true network worse than those from big sample sizes. To check these results graphically, three different values of λ are chosen, and the estimated networks obtained with these tuning parameters for sample sizes of 50 and 1000 are shown in Figure 8.
Partial Correlation Network AnalysisOriol AlarcónAlbert Gordi
– 32 –
FIGURE 8 – NETWORK COMPARISON BETWEEN SAMPLE SIZES FOR DIFFERENT λ VALUES
ESTIMATED NETWORKS
From the networks plotted above, it can be concluded that for proportionally small values of lambda, big sample sizes fit the true network much better than small ones. However, that difference shrinks as lambda increases proportionally and the networks become sparser.
In conclusion, the effectiveness of LASSO as a sparse estimator is well demonstrated when penalization is applied: it works and fits better through the joint sparse model when the lambda value lies in the optimal range for each sample size, and performs more precisely when the number of observations is higher.
ROC Curve
Once sparsity has been reviewed, the next step is to analyze the efficacy of the joint sparse regression model. To that end, the Receiver Operating Characteristic (ROC) curve32 is a very useful tool, as it is a graph commonly used to illustrate the performance of binary classifiers. It is generated by plotting the sensitivity against the false discovery rate (FDR). Although the FDR ranges from zero to one, in the following graph (Fig. 9) it is shown from zero to 0.20 in order to focus on the interval where the results on the model's effectiveness can be compared most meaningfully across the sample sizes of the estimations.
32 (Fawcett 2006)
FIGURE 9 – ROC CURVE
At first sight, it can be observed that there is a positive relation between sensitivity and false discovery rate. This can be explained in terms of type I and type II errors. As described in the Choosing Lambda section, a type I error is a false positive, in this case the detection of edges that are not true, while a type II error is a false negative, i.e. not detecting true edges. It is therefore clear that when the type I error increases and more false edges are detected (specificity decreases), the type II error decreases as a higher number of true edges is detected.
The ROC curve makes the differences in the performance of the model across sample sizes crystal clear. While small sample sizes such as n = 50 'pay' high false discovery rates in order to achieve high degrees of sensitivity, for the large sample sizes the sensitivity is almost constant at 1 along the whole graph, and starts to decrease only when the false discovery rate is very close to zero and the number of total estimated edges falls below the number of total true edges. In short, the higher the n, the better the model fits the true network.
[Figure: ROC curve plotting sensitivity (inverse to type II error) against the false discovery rate (1 − specificity, type I error) over the interval 0–0.20, for n = 50, 100, 500 and 1000, with a reference line at FDR = 0.05.]
To look more deeply into that reasoning, the type I error is kept low by fixing the FDR at the 0.05 level, so that the sensitivity achieved by each sample size can be compared at that point. Starting from below, n = 50 achieves a sensitivity of just 0.32, n = 100 achieves 0.67, n = 500 achieves 0.98 and n = 1000 achieves the maximum sensitivity of 1.00. Thus, the lower the sample size, the higher the type II error present in the estimated network for a fixed level of type I error. The following plots (Fig. 10) illustrate the better fit with the true network obtained from estimations based on larger sample sizes while maintaining the FDR at 0.05.
FIGURE 10 – NETWORK COMPARISON BETWEEN SAMPLE SIZES KEEPING FDR AT 5% LEVEL
ESTIMATED NETWORKS
The previous graphs show that, in order to maintain a similar false discovery rate, the bigger the sample size, the smaller the penalty value in proportional terms. Therefore, it can be assumed that, in absolute terms, to obtain a better fit of the estimated network the tuning parameter must increase less than proportionally to the increase in the sample size.
3.4 Conclusions
The illustrated simulation confirms the successful functioning of the joint sparse regression model, and therefore of the LASSO estimator, when estimating sparse partial correlation networks under the right lambda values. Applied to relatively small data sets such as the one used in this section, perfectly fitted estimated networks can be obtained when dealing with substantial sample sizes, showing the accuracy of the model.
Further discussion of the performance of these models in different setups, such as the high-dimension-low-sample-size scenario, can be found in the original papers of Peng et al. (2009) and Tibshirani (1996).

4. Application with real data

4.1 Introduction
In this last section of the project, the joint sparse regression model is applied to a real dataset to generate the estimated partial correlation network of a real multivariate time series. The intention is not to analyze the resulting network in depth, which would involve forming hypotheses about the reason for each partial correlation found and trying to accept or reject them. Instead, this project has focused on how the model and its estimating tool work, both theoretically and practically.
Therefore, the goal of this application with real data is to exhibit how the model provides us with an estimated network of a real system from which, by observation and reasoning, the relations among the elements of a set can be understood in a much simpler and more comprehensible manner than by analyzing the numbers behind the graph. Notice that, as described in the previous sections, the final outcome of the application is an undirected-unweighted graph, like the prototype partial correlation network. But in
this case it is also a colored graph, so as to make it more understandable and easier to analyze.
In order to put into practice the methodology previously described, the chosen database needs to meet some requirements to suit the stated objective as well as possible. First, it must fulfill the characteristics required by partial correlation networks, which perform under multivariate white noise processes. Furthermore, in order to obtain more reliable results and to draw more conclusions from them, the data used needs to be normally distributed, which can be approximated by standardizing the original series.
Apart from the constraints already mentioned, there are also some practical bounds that must be taken into account. Despite the fact that the model is designed to work well with big data sets, it is not appropriate to deal with that kind of system in this case, due to the scope and length of our project. Facing a system with more than 100 variables would be an unfeasible challenge in this section.
Taking into account the objectives and constraints set out above, the change in the unemployment rate across the states of the US over the last decades suits the specified setting well. Moreover, in the big market that the US represents there is a high degree of labor mobility (compared to the European Union)33, and it comprises states with completely different roles inside the national economy, which could lead to an interesting estimated partial correlation network.
4.2 Explain the data
From the Bureau of Labor Statistics34, a website of the United States Department of Labor, the unemployment rate data of the 51 states (i.e. the 50 states and the federal District of Columbia) from January 1976 to August 2013 can be exported. Thus, the real dataset is composed of 51 variables containing 451 observations each (452 unemployment rates per state, used to calculate
33 (Nickell 1997)
34 (“Bureau of Labor Statistics” 2013)
451 changes in unemployment rate), a reasonable size for our estimations to yield significant results.
As mentioned before, working with Gaussian data eases the performance and analysis of the model. Hence, instead of the unemployment rate levels, the application deals with their monthly changes (exported directly from the same source), since this results in every variable having a mean close to zero. Then, to standardize the data, each variable is divided by the standard deviation of its observations.
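The standardization step described above can be sketched as follows (a minimal Python illustration with made-up numbers; the project performs this step in R, and the sample standard deviation is assumed here):

```python
from statistics import stdev

# Hypothetical toy data: monthly unemployment-rate changes for three states.
# In the project the real matrix has 451 rows (months) and 51 columns (states).
changes = {
    "CA": [0.1, -0.1, 0.2, 0.0],
    "NJ": [-0.2, 0.3, -0.1, 0.1],
    "MN": [0.0, 0.1, -0.1, 0.2],
}

# Standardize: divide each state's series by its sample standard deviation.
standardized = {
    state: [x / stdev(series) for x in series]
    for state, series in changes.items()
}

# Each standardized series now has unit sample standard deviation.
print({state: round(stdev(series), 6) for state, series in standardized.items()})
```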
4.3 Procedure
The procedure followed to reach the estimated network from the exported data has some points in common with the illustrated simulation, since it is based on calculating the adjacency matrix of a data set and plotting its network. The procedure is written down and performed by a script in R which, like that of the illustrated simulation, is attached in the Annexes section35 for further reference.
Script
First, the unemployment data collected in an Excel file is imported into R. Each variable is divided by its standard deviation and the 451 × 51 matrix called sunempUS is created, where each column is named with the abbreviation of a state and contains its 451 observations, and each row is named with the date (i.e. month and year) of the observation. From that point, the partial correlation matrix is generated by running the space.joint command. To do so, however, the penalty value must first be assigned, as the process of choosing lambda plays a crucial role in the performance of the application.
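Schematically, the step from an estimated partial correlation matrix to the network's adjacency matrix can be sketched as follows (a Python illustration with a hypothetical 4-state matrix; the project itself obtains the matrix from the space.joint command in R):

```python
# Hypothetical 4-state partial correlation matrix: an edge exists between
# states i and j whenever the off-diagonal entry is estimated to be nonzero.
pcor = [
    [1.00, 0.52, 0.00, 0.00],
    [0.52, 1.00, 0.36, 0.00],
    [0.00, 0.36, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]

n = len(pcor)
adjacency = [
    [1 if i != j and pcor[i][j] != 0 else 0 for j in range(n)]
    for i in range(n)
]
for row in adjacency:
    print(row)
```

The fourth state in this toy matrix has no nonzero partial correlations and therefore appears as an isolated node.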
As explained in the theoretical section of the project, there are quantitative methods, such as BIC and AIC, which make it possible to find the optimal lambda for the model. However, since those tools are beyond the scope of the project, the penalty value of the real data application has been chosen based on the results obtained from the illustrated simulation, namely that, when dealing with significantly big sample sizes, the optimal
35 See Annex 6: Script of the Real Application estimation.
lambda is the one around which small variations in either direction do not substantially modify the features of the resulting estimated network. Consequently, the process has been based on observing the networks obtained over a wide range of lambda values and choosing the one that looks most efficient. Again, recall that analyzing the causes behind the nature of the partial correlations is not a goal of the project, so the selection of the optimal lambda is not a crucial point as long as we work within a close interval around it.
Having decided the tuning parameter for the joint sparse regression, some features of the network need to be defined in order to obtain a final graph from which to draw visual conclusions, and to make it as visually understandable and pleasant as possible for its observers. First, the label of each node is defined as the abbreviated name of the state36 it represents. Second, as already mentioned, the estimated network is colored, both in its nodes and in its edges.
Then, the color of each node represents the region where its state is located. Among the various regional classification systems that exist for the US, the application uses the one employed by the government to collect the US population census data37. It divides the country into four regions (each further divided into divisions, which are not taken into account in the network). Region 1 is the Northeast, the smallest in extension, represented by a light green. It comprises the following states:
Division 1 (New England): Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut
Division 2 (Mid-Atlantic): New York, Pennsylvania, New Jersey
Region 2 is the Midwest, represented in a violet tone and comprising the following:
Division 3 (East North Central): Wisconsin, Michigan, Illinois, Indiana, Ohio
Division 4 (West North Central): Missouri, North Dakota, South Dakota, Nebraska, Kansas, Minnesota, Iowa
Region 3 is the South, characterized by a tan tone. It comprises three divisions:
Division 5 (South Atlantic): Delaware, Maryland, District of Columbia, Virginia, West Virginia, North Carolina, South Carolina, Georgia, Florida
36 (“State Abbreviations” 2013)
37 (“Census Regions and Divisions of the United States” 2013)
Division 6 (East South Central): Kentucky, Tennessee, Mississippi, Alabama
Division 7 (West South Central): Oklahoma, Texas, Arkansas, Louisiana
Finally, region 4, the West, is represented by the color blue in the network and is composed of the following states:
Division 8 (Mountain): Idaho, Montana, Wyoming, Nevada, Utah, Colorado, Arizona, New Mexico
Division 9 (Pacific): Alaska, Washington, Oregon, California, Hawaii
As another visual feature, the 51 states are sorted into 6 groups according to their population, and the size of each node is defined proportionally, resulting in 6 kinds of nodes in terms of size.
After that, the color of the edges to be found is defined. Taking the node colors as the reference, an edge connecting states from the same region has a tone similar to the nodes it connects, while edges linking nodes from different regions are painted grey. This eases the analysis of similarity regarding location, i.e. checking whether belonging to the same region bears any relation to the estimated partial correlations.
Furthermore, some additional features, such as the width of the edges and the font of the labels at each vertex, are adjusted to obtain a more attractive graph. The last step is simply to plot the final estimated partial correlation network using the interactive interface provided by the tcltk package and to analyze it.
4.4 Results
The analysis of the network obtained is divided into two parts. The first is the visual observation, intended to exhibit the power of the model to represent graphically the partial correlations among a set of variables and to support conclusions drawn from them; some hypotheses are therefore stated for the chosen data set, whose veracity is not discussed in the project. The second, the numerical analysis, is based on calculating some metrics of the network in order to analyze its structure and to state further hypotheses extracted from the numerical results (hypotheses which are not tested afterwards).
Visual analysis
Although the partial correlation network given by the joint sparse model is a very useful tool to ease the analysis of a data set, the resulting network might seem somewhat confusing at first sight. Drawing reasonable conclusions requires looking at it carefully and sorting through what the network exhibits so as to extract only the significant information. Since an interactive plot of the network requires R, the following figure is plotted (Fig. 11):
FIGURE 11 – UNEMPLOYMENT RATE CHANGES AMONG US STATES – ESTIMATED PARTIAL CORRELATION NETWORK USING SPACE
Node colors by region: Northeast, Midwest, South, West
From a general view, what is observed first is the significant sparsity of the network obtained. There are three main components and twelve isolated nodes. One component is composed of two states, another of five states, and the biggest one contains thirty-two nodes.
Regarding the isolated nodes, seven out of twelve are from the South region. Hence, if the model reliably reflects reality, it seems that the labor market in that region is somewhat fragmented, since around 40% of the region behaves in an independent manner. The reason behind that could be easier to explain in some states than in others. Florida, for example, could be explained by the hypothesis that its economy is based on tourism, since it is the top travel destination in the world38. It is also observed that three of those isolated nodes are from the Midwest, which could be interpreted as a sign of fragmentation there too. For the country as a whole, the graph shows that approximately 24% of the states behave as independent labor markets, an idea supported by the size of those isolated states, which, apart from Delaware (DE) and Oklahoma (OK), belong to the highly populated groups.
Looking at the components, the smallest one is made up of two states, Louisiana (LA) and Mississippi (MS), both located in the South. Those states share a border, and it could be assumed that their labor markets perform pegged to each other. The next component, which appears in the network in a kind of kite shape, is composed of five states: Missouri (MO), Iowa (IA), Kansas (KS) and South Dakota (SD) from the Midwest region, together with Utah (UT) from the West, which is connected to the rest of the component. As shown below, the four states sharing a region also share boundaries, so it is reasonable to think that they act as an independent entity in terms of labor mobility. Figure 12 shows the territorial proximity within those two components.
38 (“Tourism in Florida” 2013)
FIGURE 12 – LOCATION OF THE STATES COMPOSING THE TWO SMALL COMPONENTS 39
Then there is the main component of the network, formed by the 32 remaining states. This is the most complicated part of the analysis of the network, due to the high density of edges between those elements. Nevertheless, having the edges colored according to the location of the nodes they link greatly eases that task. The main observation here is that geographical location drives grouping. This constitutes the similarity analysis of the network, since we are checking whether the fact that nodes share additional characteristics, such as location, influences the partial correlations among them. That assumption derives from the substantial partial correlations observed connecting states in the same region. Therefore, proximity has an impact on those “regional labor markets”. It can also be appreciated that small states are always connected to one of the big ones, whether belonging to the same regional area or not. The big ones, on the other hand, appear to be very central elements of the component, since they are connected by a high number of edges. That leads to the idea that the big states in the component act as the central elements of that labor market, influencing the small ones. That also
39 (“National Atlas ”)
reinforces the idea that the isolated elements are able to work as independent labor markets due to their big size.
Furthermore, California (CA) appears highly interconnected and is therefore the most central element of the network. That makes a lot of sense, since it is one of the economic driving forces of the US and one of the top regions in the world in the development of information and communications technologies. Hence, it is linked to states belonging to all four regions of the country. Moreover, other big states such as Illinois (IL), Michigan (MI), Minnesota (MN) and Virginia (VA) appear as fairly central elements too.
Finally, although not very clearly exhibited, it seems that the Northeast region, despite having partial correlations with the other geographical areas, is somewhat less interconnected with the rest of the component. In the graph its states appear in a kind of peripheral area, but that hypothesis cannot be strongly supported by the visual analysis alone.
Some of the statements made above will be reinforced or refuted when the metrics of the network are analyzed in the next section. However, as a specific conclusion of the visual analysis, it can be stated that the network provided by the joint sparse model enables quite reasonable assumptions about the structure of the real system to be made.
Numerical analysis
The first thing to be confirmed by the analysis of the numerical results is the sparsity of the data, which is quite obvious at sight. To do so, the number of edges estimated by the implemented regression model is compared to the total number of possible edges in the case where every single state was partially correlated with every other.
The result is that only 98 out of the 1275 possible edges are estimated to exist, a clear sign of the sparsity of the data. In other words, out of 1275 possible partial correlations, only 7.69% appear to be true.
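The figures quoted above follow from the number of unordered pairs of nodes; a quick check (Python sketch):

```python
n_states = 51

# Maximum number of edges in an undirected network with no self-loops:
# one edge per unordered pair of nodes.
possible_edges = n_states * (n_states - 1) // 2
print(possible_edges)  # 1275

# Share of the possible edges actually estimated to exist.
estimated_edges = 98
print(round(100 * estimated_edges / possible_edges, 2))  # 7.69
```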
This information, although difficult to calculate exactly from the visual analysis because of the limited space of the plot, is something that could be observed directly in the
graphical representation of the network. However, there are other features or metrics
which are much more difficult to observe in the final plot, or even impossible in some
cases.
An example of the first case is measures of centrality. In this project we have chosen to measure centrality simply by counting the total number of estimated edges each state has (i.e. the degree of its vertex), because more complex metrics fall beyond the scope of our project. The results show that the states with the highest number of connecting edges are California (12), Virginia (11), Minnesota (10), New Jersey (9) and Illinois (9). All these states are classified as big-sized ones, which should not be surprising, as the bigger the state, the more influential it can be thought to be for the economy of the nation, and therefore the more interconnected it should be. However, as previously discussed in the general analysis, several big states appear to be independent in the network, so an alternative reasoning might be proposed. Finally, it is at least surprising that the four most central states each belong to a different one of the regions selected to classify the elements. See Figure 13.
FIGURE 13 – REGIONAL LOCATION OF THE FOUR MOST CENTRAL STATES OF THE NETWORK 40
40(“U.S. Regions: West, Midwest, South and Northeast”)
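The degree-based centrality measure used above can be sketched as follows (Python, with a hypothetical mini-network rather than the estimated one):

```python
# Degree centrality: the degree of a vertex is the number of edges incident
# to it. Hypothetical mini-network for illustration only.
edges = [
    ("CA", "VA"), ("CA", "MN"), ("CA", "NJ"),
    ("VA", "NJ"), ("MN", "IL"),
]

degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

# States sorted by degree, highest first.
print(sorted(degree.items(), key=lambda kv: -kv[1]))
```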
To complete the analysis of centrality, the average number of edges per state is almost 4 across the whole network, while the median is 3, given the many states, twelve in particular, that have been estimated to be conditionally independent. As mentioned, the maximum is 12, in California, and the minimum is obviously 0 in all the independent states.
On the other hand, an example of a metric which cannot be observed in the graph is the strength of the plotted partial correlations. One may be interested in knowing whether the edges in the graph represent a significant41 partial correlation or not, or, even more, whether the partial correlation found is positive or negative.
The analysis of the strength of the partial correlations estimated using SPACE shows that, of those estimated to be different from zero, one in three can be assumed to be large enough42, so that a clear interconnection can be assumed in those cases. Considering also the elements which are not partially correlated (i.e. partial correlations estimated to be zero), only 2.6% of all cases take values high enough to be considered significant. This last finding supports the idea of the sparsity of the data, as only 2.6% of the total possible edges are estimated to have a sufficiently important value under the SPACE model.
In particular, the highest value is found between two states that, as described in the previous section, form a component by themselves: Louisiana (LA) and Mississippi (MS). The reasoning behind this falls beyond the scope of our project, but it is clear that the change in the unemployment rate in those two states is highly interconnected, as the partial correlation is larger than 0.5. The next largest partial correlations connect Montana (MT) and Idaho (ID), Vermont (VT) and Maine (ME), and, in fourth position, Tennessee (TN) and Kentucky (KY), with values of 0.36, 0.35 and 0.34 respectively. All these pairs of states share boundaries (with the exception of Vermont and Maine, which very nearly do), confirming what could be guessed at first sight: geographical location drives interconnectivity in this setup.
41 Understanding significant as a synonym of important enough, i.e. large enough to be considered
42 Values above 0.1 for a partial correlation are assumed to be important enough
Another finding worth pointing out is the absence of negative partial correlations in the estimation: no state has been estimated to have a negative partial correlation with another. This seems economically logical, in the sense that all the states are part of the same economy, the United States of America, and although in some cases no relationship could be estimated, it makes sense to assume that shocks which affect the economic situation and make the unemployment rate fluctuate affect all the states similarly. More precisely, no case has been estimated in which a change in one direction in a particular state, for instance a reduction in the unemployment rate, is connected to a change in the opposite direction in another state.
The last analytical metric performed has to do with clustering. First we selected the most central state in each of the four regions, as found earlier in this section: California (CA) in the West, New Jersey (NJ) in the Northeast, Minnesota (MN) in the Midwest and Virginia (VA) in the South (although Virginia is not really in the geographical South). For these states, the clustering measure performed has been to calculate the number of nodes each possible pair of states has in common; that is, for instance, how many of the states to which California is connected by an edge are at the same time connected to Virginia.
The results show that all the possible pairs formed with these four states have at least two nodes in common, with the maximum found between California and New Jersey, which share six nodes.
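The common-neighbor count used as the clustering measure can be sketched as follows (Python, with a hypothetical mini-network rather than the estimated one):

```python
# Clustering measure used here: for each pair of states, count the neighbors
# they have in common. Hypothetical mini-network for illustration only.
edges = [
    ("CA", "IL"), ("CA", "MN"), ("CA", "VA"),
    ("NJ", "IL"), ("NJ", "MN"), ("NJ", "NY"),
]

# Build each node's neighbor set.
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def common_neighbors(x, y):
    return neighbors.get(x, set()) & neighbors.get(y, set())

print(sorted(common_neighbors("CA", "NJ")))  # ['IL', 'MN']
```

Here the two nodes are not directly linked yet share two neighbors, the same pattern described for CA and NJ in the estimated network.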
Afterwards, our interest has been directed at finding the most clustered pair of states in the whole nation. To do so we have calculated the number of nodes each state has in common with every other state. The complete results can be found in the symmetric matrix in Annex 7, but the main result is that the states sharing the most nodes are again California and New Jersey. This is surprising, as so far we had found that geographical location was quite important for the existence of a partial correlation between two states, and in this case the two states are located very far from each other. We could guess that the point here is that CA and NJ are not directly
connected, but since both of them (like the other big states located in the main component) are driving forces of the US economy, changes in either of these two states (or in more, if other big states are considered) affect a high number of states, which in addition coincide.
The final ratio calculated is the average number of nodes in common for a pair of elements in the network. The figure, 43%, suggests that almost half of the times two random states are picked from the set, they share a common vertex. To assess whether the estimated network found in this section could be considered small-world, further analysis of clustering coefficients should be carried out, using the tools proposed by Watts and Strogatz (1998)43.
4.5 Conclusions
The final purpose of this application with real data is not the analysis of the data behind the multivariate time series used, but to analyze how the estimation model presented in this project works in a real data setup. Nevertheless, a few ideas or hypotheses emerge from the output of the study of the change in unemployment rates across the US states in the last decades.
The main hypothesis we would put forward is that the high degree of labor mobility in the United States explains the absence of negative partial correlations in the data analyzed. In other words, we would suggest that labor demand and labor supply across the different states 'meet' at a national level after all, in the sense that changes in the unemployment rate could be caused by movements of the entire economy, with the labor force moving from states with a labor surplus to states where demand is high, reducing unemployment imbalances between them.
All in all, this section has been useful to check that SPACE is a powerful tool to represent the interconnections between the elements of a real multivariate time series.
43 (Watts and Strogatz 1998)
5. Conclusions
Throughout this project, the joint sparse regression model (i.e. SPACE) has been proposed to estimate partial correlation networks representing big data sets under the assumption of sparsity. Hence, after setting out the theoretical framework of the method and the sparse estimation tools it incorporates (i.e. LASSO), SPACE has been analyzed from two points of view: its estimating efficiency and its usefulness as an analytical tool for real data sets.
For the first, an illustrated simulation was carried out to analyze the performance and precision of the method in estimating a true network. This has been checked by estimating 88,000 networks while varying both the sample size (i.e. the number of observations per variable) and the penalty value lambda, which works as the tuning parameter of the method. From this section there have arisen strong foundations showing that the method works very well at delivering sparse estimators and that, maintaining a constant false discovery rate (i.e. type I error), the bigger the sample size, the higher the sensitivity of the estimation (i.e. the lower the type II error). However, to obtain almost perfectly fitting estimated networks, an optimal penalty value must be chosen. Hence, since the BIC and AIC criteria fall beyond the scope of this project, it has been empirically observed from the figures provided in the simulation that, for big enough sample sizes, the optimal value falls in an interval where small variations do not substantially modify the structure of the network, since sparsity stays more or less constant at the true level. That assumption has been used in the second part of the analysis when dealing with real data.
The second viewpoint used to test SPACE has been the application of the model to a real data set. After defining the constraints on the choice, coming both from the method and from the scope of the project, the selected multivariate system has been the changes in the unemployment rates of the US states from January 1976 to August 2013. It represents a data set composed of 51 elements with 451 observations each, big enough both in number of variables and in sample size to prove the usefulness of SPACE in analyzing it. From that point on, the method has been applied to the data and the final estimated partial correlation network has been obtained, using a penalty value chosen under the assumptions drawn from the
illustrated simulation regarding the optimal value. At first sight, the colored undirected-unweighted network appears as a very useful tool from which to draw conclusions about the data set. After a deeper visual and numerical analysis of the network structure, in which metrics such as centrality, clustering and similarity were checked, the idea tested during the application has been reaffirmed: the estimated partial correlation network resulting from SPACE is a powerful tool to analyze the interconnections between the elements of sparse real data sets.
To sum up, the proposal of the joint sparse regression model as an estimation method for partial correlation networks has been reaffirmed throughout the project by its effective performance as a sparse estimation model and its usefulness when dealing with appropriate real data sets.
6. List of figures
Figure 1 – Undirected and directed network graphs ....................................................... 7
Figure 2 – Unweighted and weighted network graphs .................................................... 7
Figure 3 – Uncolored and colored network graphs.......................................................... 8
Figure 4 – Undirected-weighted network graph and its adjacency matrix...................... 9
Figure 5 – Path and components in a network graph .................................................... 10
Figure 6 – True network ................................................................................................. 27
Figure 7 – How does lambda affect sparsity .................................................................. 30
Figure 8 – Network comparison between sample sizes for different lambda values ...... 32
Figure 9 – Roc curve ....................................................................................................... 34
Figure 10 – Network comparison between sample sizes keeping FDR at 5% level ....... 35
Figure 11 – Unemployment rate changes among US states – estimated partial correlation network using SPACE .................................................................................. 42
Figure 12 – Location of the states composing the two small components ................... 44
Figure 13 – Regional location of the four most central states of the network ............. 46
7. Bibliography
Parker, Jeffrey A. “Fundamental Concepts of Time-Series Econometrics.”
http://www.reed.edu/economics/parker/s13/312/tschapters/S13_Ch_1.pdf.
Arthur, Lisa. 2013. “What Is Big Data?” Forbes. http://www.forbes.com/sites/lisaarthur/2013/08/15/what-is-big-data/.
Barigozzi, Matteo, and Christian Brownlees. 2013. “Econometric Analysis of Networks.”
Brin, Sergey, and Lawrence Page. 1998. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Computer Networks and ISDN Systems 30 (1-7) (April): 107–117. doi:10.1016/S0169-7552(98)00110-X. http://linkinghub.elsevier.com/retrieve/pii/S016975529800110X.
“Bureau of Labor Statistics.” 2013. Accessed November 15. http://www.bls.gov/bls/exit_BLS.htm?url=http://www.dol.gov/.
“Census Regions and Divisions of the United States.” 2013. Accessed November 16. http://www.census.gov/geo/maps-data/maps/pdfs/reference/us_regdiv.pdf.
Cukier, Kenneth. 2010. “Data, Data Everywhere.” The Economist 12 (8). http://www.economist.com/specialreports/displayStory.cfm?story_id=15557443.
Egea, Andrés. 2010. “«Volveré a estudiar; de la petanca no se puede vivir»” [“I Will Go Back to Studying; You Cannot Make a Living from Pétanque”]. Laverdad.es. http://www.laverdad.es/murcia/v/20101014/deportes_murcia/mas-deporte/volvere-estudiar-petanca-puede-20101014.html.
Einav, Liran, and Jonathan Levin. 2013. “The Data Revolution and Economic Analysis.” Prepared for the NBER Innovation Policy and the Economy Conference: 1–29.
Fawcett, Tom. 2006. “An Introduction to ROC Analysis.” Pattern Recognition Letters. doi:10.1016/j.patrec.2005.10.010.
Fox, Bob, Rob van den Dam, and Rebecca Shockley. 2013. “Analytics: Real-World Use of Big Data in Telecommunications.” http://www-935.ibm.com/services/multimedia/Real_world_use_of_big_data_in_telecommunications.pdf.
Gardner, Timothy S, Diego di Bernardo, David Lorenz, and James J Collins. 2003. “Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling.” Science (New York, N.Y.) 301 (5629): 102–105. doi:10.1126/science.1081900.
Groves, Peter, B Kayyali, David Knott, and S Van Kuiken. 2013. “The ‘Big Data’ Revolution in Healthcare.” McKinsey Quarterly (January). http://www.lateralpraxis.com/download/The_big_data_revolution_in_healthcare.pdf.
“Igraph: Network Analysis and Visualization.” 2013. Accessed November 10. http://cran.r-project.org/web/packages/igraph/index.html.
Jeong, H, S P Mason, A L Barabási, and Z N Oltvai. 2001. “Lethality and Centrality in Protein Networks.” Nature 411 (6833): 41–42.
McBride, Ryan. 2012. “10 Reasons Why Biotech Needs Big Data.” FierceBiotechIT.http://www.fiercebiotechit.com/special-reports/10-reasons-why-biotech-needs-big-data.
“National Atlas.” http://nationalatlas.gov/.
Newman, M. 2003. “The Structure and Function of Complex Networks.” ArXiv Preprint Cond-Mat. http://arxiv.org/abs/cond-mat/0303516.
Nickell, Stephen. 1997. “Unemployment and Labor Market Rigidities: Europe Versus North America.” Journal of Economic Perspectives. doi:10.1257/jep.11.3.55.
Peng, Jie, Pei Wang, Nengfeng Zhou, and Ji Zhu. 2013. “Space: Sparse PArtial Correlation Estimation.” Accessed November 10. http://cran.r-project.org/web/packages/space/index.html.
Peng, Jie, Pei Wang, Nengfeng Zhou, and Ji Zhu. 2009. “Partial Correlation Estimation by Joint Sparse Regression Models.” Journal of the American Statistical Association 104 (486) (June 1): 735–746. doi:10.1198/jasa.2009.0126. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2770199&tool=pmcentrez&rendertype=abstract.
Scholz, Matthias. 2013. “Getting Connected - The Highly Connected Society.” Accessed November 15. http://www.network-science.org/highly-connected-society-dense-social-complex-networks.html.
“State Abbreviations.” 2013. Accessed November 15. http://about.usps.com/who-we-are/postal-history/state-abbreviations.pdf.
“Tcltk.” 2013. Accessed November 10. http://cran.r-project.org/web/packages/tcltk/index.html.
Tegner, Jesper, M K Stephen Yeung, Jeff Hasty, and James J Collins. 2003. “Reverse Engineering Gene Networks: Integrating Genetic Perturbations with Dynamical Modeling.” Proceedings of the National Academy of Sciences of the United States of America 100 (10): 5944–5949.
Tibshirani, R. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society. Series B (Methodological) 58 (1): 267–288. http://www.jstor.org/stable/2346178.
Toffler, Alvin. 1970. Future Shock. USA: Random House Inc.
“Tourism in Florida.” 2013. Accessed November 20. http://fcit.usf.edu/florida/lessons/tourism/tourism1.htm.
“U.S. Regions: West, Midwest, South and Northeast.”http://thomaslegion.net/uscensusbureauregionsthewestthemidwestthesouthandthenortheast.html.
Watts, D J, and S H Strogatz. 1998. “Collective Dynamics of ‘Small-World’ Networks.”Nature 393 (6684) (June 4): 440–2. doi:10.1038/30918.http://www.ncbi.nlm.nih.gov/pubmed/9623998.
8. Annexes
Annex 1 – Script Illustrated Simulation
Annex 2 – Table Results Illustrated Simulation Sensitivity
Annex 3 – Table Results Illustrated Simulation Specificity
Annex 4 – Table Results Illustrated Simulation Edges Estimated
Annex 5 – Table Results Illustrated Simulation True Edges Estimated
Annex 6 – Script Application with Real Data
Annex 7 – Matrix Representing Number of Nodes in Common Between Every Pair of States