Social Network Analysis with Sylva. Juan Luis Suárez & Anabel Quan-Haase, Western University

Sylva workshop.gt that camp.2012


Page 1: Sylva workshop.gt that camp.2012

Social Network Analysis with Sylva

Juan Luis Suárez & Anabel Quan-Haase, Western University

Page 2: Sylva workshop.gt that camp.2012

Overview of Workshop

• General overview of the social network approach
• Key terminology
• Uniqueness of collecting and analyzing social network data
• Entering data into Sylva
• Importing/exporting data into Sylva
• Example I:
• Example II:
• Understanding limitations and problems
• Future Work and Gephi.org

Page 3: Sylva workshop.gt that camp.2012

What is SNA?

Social network analysis is focused on uncovering the patterning of people’s interaction. … Network analysts believe that how an individual lives depends in large part on how that individual is tied into the larger web of social connections. Many believe, moreover, that the success or failure of societies and organizations often depends on the patterning of their internal structure (Freeman, 1998, November 11).

Page 4: Sylva workshop.gt that camp.2012
Page 5: Sylva workshop.gt that camp.2012

What is Unique about SNA?

Social science research and theory tend to focus on social actors’:

• attributes
• attitudes
• opinions
• behavior

The focus is on the individual level of analysis, less on the network-structural level.

Page 6: Sylva workshop.gt that camp.2012

a whole is not simply the sum of its parts

Page 7: Sylva workshop.gt that camp.2012

Key Terminology

• 1. Social structure
• 2. Social network
• 3. Nodes
• 4. Linkages/relations
• 5. Additional terms of relevance:
– Nodes & edges
– Directed graphs vs. undirected graphs
– Ego
– Alter
– Homophily

Page 8: Sylva workshop.gt that camp.2012

1. Social Structure

• Sociological inquiry consists of understanding the constraining influence of social structure on social action

• But how do we study social structure?

Attributes Networks

Page 9: Sylva workshop.gt that camp.2012

2. Social Network

Figure 2: Social Structure as Social Network (diagram labels: Social Actors, Ties)

Page 10: Sylva workshop.gt that camp.2012

3. Nodes

• The actors considered in a social network are exclusively social (alternatively referred to as agents, nodes, or social entities).

• These include individuals, organizations, institutions, nations, or groups (Wasserman & Faust, 1994).

Page 11: Sylva workshop.gt that camp.2012

Blurred Nodes

• Social actors can therefore be distinguished from non-social actors – e.g., neurons comprising a neural network.

• On occasion, the distinction between a social and a non-social actor is not absolute. For example, computer networks represent a hybrid type of network.

Page 12: Sylva workshop.gt that camp.2012

Node Attributes

• Every single node can have one or more attributes.

• These attributes describe the nodes and allow researchers to conduct complex queries of the database.

• Node attributes can include the time of publication of a book, its length, the number of authors, etc.

Page 13: Sylva workshop.gt that camp.2012

One-mode vs. Two-mode

• Most social network analysis methods allow only one type of social actor (for instance, individuals or corporations) in their analysis; these are referred to as one-mode networks (Wasserman & Faust, 1994).

• However, methods exist which allow two different types of social actors in their analysis; these are referred to as two-mode networks. For instance, a study may simultaneously analyze corporations and their directors.

• Two-mode networks may also include social actors from distinct networks, for example, a network comprised of adults and a network comprised of children.

• Two-mode networks allow for comparison between different types and sets of social actors.
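
One common way to work with two-mode data is to project it onto a single mode. The sketch below is illustrative only: the director and corporation names are made up, and the function `project_onto_corporations` is a hypothetical helper, not part of Sylva. It ties two corporations together whenever they share a director and uses the number of shared directors as the tie weight.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: (director, corporation) board memberships.
board_seats = [
    ("Director A", "Corp X"),
    ("Director A", "Corp Y"),
    ("Director B", "Corp Y"),
    ("Director B", "Corp Z"),
    ("Director C", "Corp Z"),
]

def project_onto_corporations(memberships):
    """Return one-mode ties between corporations, weighted by shared directors."""
    boards_by_director = defaultdict(set)
    for director, corporation in memberships:
        boards_by_director[director].add(corporation)

    shared = defaultdict(int)
    for corporations in boards_by_director.values():
        # Every pair of boards a director sits on gains one shared director.
        for a, b in combinations(sorted(corporations), 2):
            shared[(a, b)] += 1
    return dict(shared)

corporate_ties = project_onto_corporations(board_seats)
```

The same function applied to swapped pairs (corporation, director) would instead give ties among directors who sit on the same board.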

Page 14: Sylva workshop.gt that camp.2012

4. Relationships

• Ties are links that connect social actors, and are the main focus of social network analysis. Ties are seen as “channels for transfer or ‘flow’ of resources (either material or nonmaterial)” (Wasserman & Faust, 1994, p. 4).

Page 15: Sylva workshop.gt that camp.2012

Simple Relationships

• Naturally occurring ties among social actors are inherently complex and consist of numerous different interaction activities.

• However, unlike ethnographers, network analysts do not focus on the complexity of interactions among individuals (Burt, 1983).

• Instead, social network analysts focus on the pattern of relations among individuals. To do so, they simplify the inherent complexity of social relationships by categorizing interactions into broad types. The types can be manifold: for example, a pair of social actors may have friendship, working, cooperation, or citation ties.

Page 16: Sylva workshop.gt that camp.2012

5. Additional Terms

• Directed graphs vs. undirected graphs
• Ego
• Alter
• Homophily

Page 17: Sylva workshop.gt that camp.2012

Types of Network Analysis

• Ego-centered/Socio-centered Social Networks
• Community-centered social networks

Page 18: Sylva workshop.gt that camp.2012

Ego-centered/Socio-centered Social Networks

Page 19: Sylva workshop.gt that camp.2012

Actor-Level Centrality

• Actor level degree centrality: Degree centrality measures the extent to which an actor is linked to all of the other actors in the network. Three different measures can be distinguished: nodal degree, indegree, and outdegree.

• Actor level closeness centrality: Closeness measures the distance that an actor has to all of the other actors in the network.

Page 20: Sylva workshop.gt that camp.2012

• Actor level betweenness centrality: Betweenness measures the extent to which an actor lies between two other actors and thus facilitates/controls the flow of information.
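
The three actor-level measures can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the routine used by Sylva or Gephi: the actor names and ties are invented, indegree and outdegree use the directed ties, and closeness and betweenness are computed on a symmetrized (undirected) copy of a connected network. Closeness is taken here as (n - 1) divided by the summed distances, which is one common normalization.

```python
from collections import deque

# Hypothetical directed ties: ("Ana", "Ben") means Ana names Ben as a contact.
ties = [("Ana", "Ben"), ("Ben", "Cal"), ("Cal", "Dee"), ("Ana", "Cal")]
actors = sorted({name for tie in ties for name in tie})

def degree_measures(ties, actors):
    """Return (indegree, outdegree) dictionaries for a directed tie list."""
    indegree = {a: 0 for a in actors}
    outdegree = {a: 0 for a in actors}
    for source, target in ties:
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree

def symmetrize(ties, actors):
    """Treat every tie as undirected and return each actor's neighbor set."""
    neighbors = {a: set() for a in actors}
    for source, target in ties:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return neighbors

def shortest_paths_from(start, neighbors):
    """Breadth-first search returning distances and shortest-path counts."""
    dist, count = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                count[nxt] = 0
                queue.append(nxt)
            if dist[nxt] == dist[current] + 1:
                count[nxt] += count[current]
    return dist, count

def closeness(neighbors):
    """(n - 1) / sum of distances to all other actors; assumes a connected network."""
    scores = {}
    for actor in neighbors:
        dist, _ = shortest_paths_from(actor, neighbors)
        total = sum(dist.values())
        scores[actor] = (len(neighbors) - 1) / total if total else 0.0
    return scores

def betweenness(neighbors):
    """For each actor, sum over pairs the share of shortest paths passing through it."""
    names = sorted(neighbors)
    paths = {a: shortest_paths_from(a, neighbors) for a in names}
    scores = {a: 0.0 for a in names}
    for i, s in enumerate(names):
        dist_s, count_s = paths[s]
        for t in names[i + 1:]:
            if t not in dist_s:
                continue
            for v in names:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, count_v = paths[v]
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    scores[v] += count_s[v] * count_v[t] / count_s[t]
    return scores

indegree, outdegree = degree_measures(ties, actors)
neighbors = symmetrize(ties, actors)
closeness_scores = closeness(neighbors)
betweenness_scores = betweenness(neighbors)
```

In this toy network Cal is the only actor who lies between others (Dee can reach Ana and Ben only through Cal), so Cal has the highest betweenness and closeness.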

Page 21: Sylva workshop.gt that camp.2012

Community-Centered Social Networks

[Figure: network diagram labeled “Face-to-face (1/week) CS”]

Page 22: Sylva workshop.gt that camp.2012

Network Level Centralization

• Cohesion Distance: measures the degree of separation between actors in a network. It indicates how many other people lie between two actors, that is, how many intermediaries separate an actor from the actor this person needs to talk to.

• Network Centralization: measures the number of actors that are connected to each actor in the network. The more connections among actors, the greater the network centrality.

• Density: measures the degree of connection that exists in a network. The more actors talk to each other, the higher the density.
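
Two of these network-level measures are easy to compute by hand. The sketch below is an illustration with invented actors, assuming an undirected, unweighted network: density is the number of observed ties divided by the number of possible ties, and distance is summarized as the average shortest-path length over all pairs that can reach each other.

```python
from collections import deque

# Hypothetical undirected network: who talks to whom.
edges = [("Ana", "Ben"), ("Ben", "Cal"), ("Cal", "Dee")]
actors = sorted({name for edge in edges for name in edge})

def density(edges, actors):
    """Observed ties divided by the number of possible ties among the actors."""
    n = len(actors)
    possible = n * (n - 1) / 2
    observed = len({frozenset(edge) for edge in edges})
    return observed / possible if possible else 0.0

def average_distance(edges, actors):
    """Mean shortest-path length over all pairs that can reach each other."""
    neighbors = {a: set() for a in actors}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    lengths = []
    for start in actors:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbors[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        # Count each pair once by keeping only partners that sort after start.
        lengths.extend(d for other, d in dist.items() if other > start)
    return sum(lengths) / len(lengths) if lengths else 0.0

network_density = density(edges, actors)
network_distance = average_distance(edges, actors)
```

Here four actors form a chain, so 3 of the 6 possible ties exist (density 0.5), and the two actors at the ends are three steps apart.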

Page 23: Sylva workshop.gt that camp.2012

Measures of Centrality and Assumptions

Measure                  Level    Data Type          Symmetry/Asymmetry
Nodal Degree Centrality  Actor    Dichotomized (>5)  Symmetric (Maximum)
Indegree Centrality      Actor    Valued             Asymmetric
Outdegree Centrality     Actor    Valued             Asymmetric
Closeness Centrality     Actor    Dichotomized (>5)  Symmetric (Maximum)
Betweenness Centrality   Actor    Dichotomized (>5)  Symmetric (Maximum)
Network Cohesion         Network  Valued             Asymmetric
Network Centrality       Network  Dichotomized (>5)  Asymmetric
Network Density          Network  Dichotomized (>5)  Symmetric (Maximum)

Page 24: Sylva workshop.gt that camp.2012

Uniqueness of Collecting and Analyzing Social Network Data

• Relational data
• Boundary specification and sampling
• Interdependence of data points
• Query search
• Complexity of data collection
– Manually-harvested
– Data set
– Behavioral
– Self-report

Page 25: Sylva workshop.gt that camp.2012

Internet Resources for Social Network Analysis

• Center for the Study of Group Processes
http://lime.weeg.uiowa.edu/~grpproc/

• INSNA, International Network for Social Network Analysis
http://www.heinz.cmu.edu/project/INSNA/

• Barry Wellman’s Homepage
http://www.chass.utoronto.ca/~wellman/index.html

• CulturePlex
http://cultureplex.ca/

• Gephi.org

• NodeXL
http://nodexl.codeplex.com/

Page 26: Sylva workshop.gt that camp.2012
Page 27: Sylva workshop.gt that camp.2012

Limitations of Social Network Analysis

• Boundary specification

• Data source

• Definition of social actors

• No distinct method

Page 28: Sylva workshop.gt that camp.2012

What is Sylva?

• A database management system
• Graph databases
• NoSQL database
• Built on top of Neo4j

Page 29: Sylva workshop.gt that camp.2012

Whose Needs Does Sylva Serve?

• Sylva requires no programming skills
• On-the-go modification of the schema
• Storing data in a graph form
• Work from the nodes or from the edges
• Collaborative platform
• Easy-to-use interface thanks to forms, autocomplete, …
• Multiple visualizations
• Search and Query Engines

Page 30: Sylva workshop.gt that camp.2012

The Interface

Page 31: Sylva workshop.gt that camp.2012

The Dashboard

Page 32: Sylva workshop.gt that camp.2012

Creating a Database (Graph)

Page 33: Sylva workshop.gt that camp.2012

Schema vs Data

Page 34: Sylva workshop.gt that camp.2012

My First Schema

Page 35: Sylva workshop.gt that camp.2012

Creating a Schema on Sylva (manually)

• New Type of Node (person)
• (2nd) New Type of Node (work)
• Relation
– Incoming or outgoing
– Allowed relationships
• (3rd) New Type of Node (institution)
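
Sylva builds the schema through forms, so no code is needed in the workshop. For readers who think in code, the idea of node types plus allowed, directed relationships can be sketched as a small data structure. Everything below is hypothetical: the `Schema` class and the relation names `wrote` and `works_at` are illustrative assumptions and do not reflect Sylva's internal model or API.

```python
from dataclasses import dataclass, field

@dataclass
class Schema:
    """Toy schema: a set of node types and the relationships allowed between them."""
    node_types: set = field(default_factory=set)
    # Each allowed relationship is (source type, relation name, target type).
    allowed: set = field(default_factory=set)

    def add_node_type(self, name):
        self.node_types.add(name)

    def allow(self, source, relation, target):
        if source not in self.node_types or target not in self.node_types:
            raise ValueError("both node types must exist before relating them")
        self.allowed.add((source, relation, target))

    def permits(self, source, relation, target):
        """Direction matters: (person, wrote, work) does not imply the reverse."""
        return (source, relation, target) in self.allowed

    def outgoing(self, node_type):
        return {(rel, tgt) for src, rel, tgt in self.allowed if src == node_type}

    def incoming(self, node_type):
        return {(rel, src) for src, rel, tgt in self.allowed if tgt == node_type}

# Follow the order of the slide: person, work, a relation, then institution.
schema = Schema()
schema.add_node_type("person")
schema.add_node_type("work")
schema.allow("person", "wrote", "work")           # hypothetical relation name
schema.add_node_type("institution")
schema.allow("person", "works_at", "institution")  # hypothetical relation name
```

Asking `schema.outgoing("person")` or `schema.incoming("work")` mirrors the choice between working from a relation's outgoing or incoming side.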

Page 36: Sylva workshop.gt that camp.2012
Page 37: Sylva workshop.gt that camp.2012

Properties of Objects

• Data objects have properties
• A property is an attribute that defines certain operations that can be performed on the object
• We need properties to enter our data

Page 38: Sylva workshop.gt that camp.2012
Page 39: Sylva workshop.gt that camp.2012

Properties of “Person”

Page 40: Sylva workshop.gt that camp.2012

Properties of “Person”

Page 41: Sylva workshop.gt that camp.2012

Entering Data (manually)

Page 42: Sylva workshop.gt that camp.2012

My First Graph

Page 43: Sylva workshop.gt that camp.2012

The Node Level: Selecting and Expanding

Page 44: Sylva workshop.gt that camp.2012

Collaboration in Sylva

Page 45: Sylva workshop.gt that camp.2012

Case of Collaboration

Page 46: Sylva workshop.gt that camp.2012

Searching

• Returns a list

Page 47: Sylva workshop.gt that camp.2012

Importing and Exporting

• Importing a Schema
• Exporting Data to Gephi
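
The slides do not state which file format Sylva uses when exporting to Gephi. As a hedged illustration of what graph data handed to Gephi can look like, the sketch below writes an edge table as CSV with `Source`, `Target`, and `Weight` columns, a layout Gephi's spreadsheet import accepts. The node names and the helper `edges_to_csv` are invented for this example.

```python
import csv
import io

# Hypothetical weighted edges: (source node, target node, weight).
edges = [
    ("Person 1", "Work 1", 1),
    ("Person 2", "Work 1", 2),
]

def edges_to_csv(edges):
    """Serialize edges as CSV text with a Source,Target,Weight header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()

csv_text = edges_to_csv(edges)
```

Saving `csv_text` to a file gives an edge list that can be loaded as an edges table in Gephi.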

Page 48: Sylva workshop.gt that camp.2012

Cuba’s Prominence: Modeling The Latin American Afro in Topic Maps

• Objectives:
– locating the various nodes of bibliographic production associated with the generation of an image of the Latin-American Afro
– evaluating the causes that make certain nodes, i.e., Cuba and various Cuban intellectuals, emerge as key nodes in the network of production of Afro-Latin American images

Page 49: Sylva workshop.gt that camp.2012

Cuba’s Prominence

• Methodology:
– a combination of traditional close-reading of texts (extraction of nodes and relations) with
– graph analysis of the emerging network with the PageRank algorithm

Page 50: Sylva workshop.gt that camp.2012
Page 51: Sylva workshop.gt that camp.2012

Measurements (Gephi)

• Closeness centrality: expresses how well connected an individual is to the whole network. A high value in this measurement indicates better connectivity and thus expresses the importance of the individual with respect to other elements in the network.

• Betweenness centrality: indicates how important the individual is as a connection and transference point within the network. A high value indicates that it is a topic that is passed through in the communications (relationships) between the other topics on the map.

• Modularity: a coefficient that enables us to group together those nodes which share connections and zones on the network, dividing the map into zones whose nodes are highly related to one another.

• Influence between nodes: an analysis which we shall carry out in the second part of the article. It is based on the PageRank algorithm, the basic algorithm on which the Google search engine was originally based for calculating the importance of the pages returned by a search, and which it used to order the results. Its basic idea is that a given node within a network becomes important based on the importance of the nodes that relate to it or point to it.
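
The idea that a node becomes important when important nodes point to it can be sketched as a short power iteration. This is a simplified illustration, not Gephi's implementation: the link list is invented, the damping factor of 0.85 is a conventional default assumed here, and nodes without outgoing links are assumed to spread their score evenly over all nodes.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively share each node's score among the nodes it points to."""
    nodes = sorted({node for link in links for node in link})
    out_links = {node: [] for node in nodes}
    for source, target in links:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node keeps a small baseline score regardless of its links.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # Assumption: a node with no outgoing links shares with everyone.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical topic map: A, B and D all point to C; C points back to A.
links = [("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
ranks = pagerank(links)
```

In this toy map C ranks highest because three nodes point to it, and A ranks above B and D because it is pointed to by the highly ranked C.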

Page 52: Sylva workshop.gt that camp.2012

Betweenness Centrality

Page 53: Sylva workshop.gt that camp.2012

Modularity

Page 54: Sylva workshop.gt that camp.2012

Some numerical results

Page 55: Sylva workshop.gt that camp.2012

Sustaining a Global Community

• Henrich et al. [1] have shown that the existence of norms that sustain fairness in exchanges among strangers is connected with the diffusion of institutions such as market integration and participation in world religions.

• Their research confirms the hypothesis that modern world religions may have contributed to the sustainability of large-scale societies and large-scale interactions. We propose that art is another institution that contributes to the rise and sustainability of large-scale societies.

• We use the case of the formation of an artistic network of paintings, schools, themes, genres, and artists whose development goes along with the expansion and colonization of the Hispanic Monarchy across America to show that this artistic network has a presence in all political territories encompassing most ethnicities and religions of indigenous origin.

Page 56: Sylva workshop.gt that camp.2012

Methodology

• The data set comprising the paintings from the Baroque period is organized and stored in a web-based PostgreSQL database.

• The data includes more than 100,000 total topics (11,443 of them are artworks). A distinctive feature of the information is that it is organized around both text fields and ad-hoc descriptors that follow the model of a formal ontology.

• For our study we have decided to model the data in one of the possible networks, a network created from common descriptors as weighted edges and artworks as nodes.

• Some pruning methods had to be applied in order to overcome some of the shortcomings resulting from the millions of edges and the large number of relational joins. We also split the dataset into 12 sections, each covering a 25-year period, from 1550 to 1850 [4].
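
Splitting 1550 to 1850 into 25-year periods gives (1850 - 1550) / 25 = 12 sections. A minimal sketch of that binning follows; the function name `period_for` is hypothetical, and the boundary rule (a work dated exactly 1575 falls in 1575-1600) is an assumption, since the slides do not say how boundary years were assigned.

```python
def period_for(year, start=1550, end=1850, width=25):
    """Return the (first_year, last_year_exclusive) period containing a year."""
    if not start <= year < end:
        return None  # outside the range covered by the study
    index = (year - start) // width
    lower = start + index * width
    return (lower, lower + width)

# The twelve periods covered by the study.
periods = [(1550 + 25 * i, 1550 + 25 * (i + 1)) for i in range(12)]
```

Grouping artworks by `period_for(year)` then yields one sub-network per period.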

Page 57: Sylva workshop.gt that camp.2012

Methodology

• Similarity Measure:
– S(Art1, Art2) = #{common descriptors of Art1 and Art2}

[Figure: Artwork 1 and Artwork 2 linked to Descriptors 1–7; the two artworks share two descriptors, so S = 2]
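
The similarity measure is the size of the intersection of two descriptor sets. The sketch below uses made-up descriptor assignments (the slide's figure does not survive in this transcript) and a hypothetical `min_weight` threshold as a stand-in for pruning; the pruning methods actually used in the study are not described here.

```python
from itertools import combinations

# Hypothetical descriptor sets; the assignments are invented for illustration.
descriptors = {
    "Artwork 1": {"Descriptor 1", "Descriptor 2", "Descriptor 3", "Descriptor 4"},
    "Artwork 2": {"Descriptor 3", "Descriptor 4", "Descriptor 5"},
    "Artwork 3": {"Descriptor 6", "Descriptor 7"},
}

def similarity(first, second):
    """S(Art1, Art2): number of descriptors the two artworks have in common."""
    return len(first & second)

def weighted_edges(descriptors, min_weight=1):
    """Build artwork-to-artwork edges weighted by shared descriptors."""
    edges = {}
    for a, b in combinations(sorted(descriptors), 2):
        weight = similarity(descriptors[a], descriptors[b])
        if weight >= min_weight:  # drop pairs below the (assumed) threshold
            edges[(a, b)] = weight
    return edges

edges = weighted_edges(descriptors)
```

Raising `min_weight` removes weak edges, which is one simple way to keep the number of edges manageable.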

Page 58: Sylva workshop.gt that camp.2012

Research Questions

• Our research addresses the issue of the sustainability of communities through the existence of a flow of shared information.

• This question is of the utmost importance to understand the formation and dynamics of cultural groups and cultural areas.

• As important as the latter is the study of the spatial and temporal dimensions of any given political and cultural community, as this will shed light on the cultural processes resulting from previous and current waves of globalization.

Page 59: Sylva workshop.gt that camp.2012

Baroque Paintings in the Hispanic World: A Network.

• The graph shows, for the first two periods of our study, the growth of the saints-related paintings (red cluster) as compared to the decrease of the cluster with virgins (blue). Portraits’ size (brown cluster) remains more or less the same, but they get more connected to saints’.

• [Photo]

Page 60: Sylva workshop.gt that camp.2012

Clustering & Visualizations: Raw Graphs

[Figure: raw graphs for the twelve 25-year periods, 1550-1575 through 1825-1850]

http://zoom.it/vJVw#full

Page 61: Sylva workshop.gt that camp.2012

Further Work with Sylva

• Visualization of Schema
• Two Visualizations of Data:
– Node-centered
– Community centered
• Query System:
– Pattern-matching
– Traversals
• Need for multi-disciplinary teams
• Complexity of analysis
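
The two query styles named above can be illustrated on a toy graph. This sketch is not Sylva's query system: the triples, the relation names, and the helpers `match` and `traverse` are all hypothetical. Pattern-matching selects relationships that fit a template, while a traversal walks outward from a starting node.

```python
from collections import deque

# Hypothetical graph stored as (source, relation, target) triples.
triples = [
    ("Person 1", "wrote", "Work 1"),
    ("Person 2", "wrote", "Work 1"),
    ("Person 2", "wrote", "Work 2"),
    ("Person 1", "works_at", "Institution 1"),
]

def match(triples, source=None, relation=None, target=None):
    """Pattern-matching: keep triples that agree with every field that is given."""
    return [
        triple for triple in triples
        if (source is None or triple[0] == source)
        and (relation is None or triple[1] == relation)
        and (target is None or triple[2] == target)
    ]

def traverse(triples, start, max_depth):
    """Traversal: nodes within max_depth steps of start, ignoring direction."""
    neighbors = {}
    for source, _, target in triples:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in neighbors.get(current, ()):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return depth

authors_of_work_1 = match(triples, relation="wrote", target="Work 1")
near_person_1 = traverse(triples, "Person 1", max_depth=2)
```

Here the pattern finds both authors of Work 1, and the traversal reaches Person 2 in two steps through the shared work.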

Page 62: Sylva workshop.gt that camp.2012

Thank you!

“With enough effort and perseverance: Anything is possible”