38
2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor: Qiaozhu Mei School of Information University of Michigan

2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

Embed Size (px)

Citation preview

Page 1: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan 1

Introductory Social Network Analysis

with Pajek

(Lecture for SI508 Networks: Theory and Application)

Sept 16, 2009

Instructor: Qiaozhu Mei

School of InformationUniversity of Michigan

Page 2: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Overview of Network Analysis Tools

Pajeknetwork analysis and visualization,menu driven, suitable for large networks

platforms: Windows (on linux via Wine) download

Netlogoagent based modelingrecently added network modeling capabilities

platforms: any (Java)download

 GUESSnetwork analysis and visualization,extensible, script-driven (jython)

platforms: any (Java)download

Other software tools that we will not be using but that you may find useful: visualization and analysis: UCInet - user friendly social network visualization and analysis software (suitable smaller networks)iGraph - if you are familiar with R, you can use iGraph as a module to analyze or create large networks, or you can directly use the C functions Jung - comprehensive Java library of network analysis, creation and visualization routinesGraph package for Matlab (untested?) - if Matlab is the environment you are most comfortable in, here are some basic routines SIENA - for p* models and longitudinal analysis SNA package for R - all sorts of analysis + heavy duty stats to boot NetworkX - python based free package for analysis of large graphsInfoVis Cyberinfrastructure - large agglomeration of network analysis tools/routines, partly menu driven visualization only:GraphViz - open source network visualization software (can handle large/specialized networks)TouchGraph - need to quickly create an interactive visualization for the web? yEd - free, graph visualization and editing software specialized:fast community finding algorithmmotif profilesCLAIR library - NLP and IR library (Perl Based) includes network analysis routines

finally: INSNA long list of SNA packages

Page 3: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Tools We’ll Use in this Course

• Pajek: extensive menu-driven functionality, including many, many network metrics and manipulations– but… not extensible

• Guess: extensible, scriptable tool of exploratory data analysis, but more limited selection of built-in methods compared to Pajek

• NetLogo: general agent based simulation platform with excellent network modeling support– many of the demos in this course were built with NetLogo

• NetDraw: network visualization tool associated with UCInet. UCInet is not free, but NetDraw is.

Page 4: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Other Tools: gephi

• http://gephi.org• primarily for visualization, has some nice

touches

Page 5: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Other Visualization Tools: Walrus

• developed at CAIDA available under the GNU GPL.

• “…best suited to visualizing moderately sized graphs that are nearly trees. A graph with a few hundred thousand nodes and only a slightly greater number of links is likely to be comfortable to work with.”

• Java-based• Implemented Features

– rendering at a guaranteed frame rate regardless of graph size– coloring nodes and links with a fixed color, or by RGB values

stored in attributes– labeling nodes– picking nodes to examine attribute values– generating subgraph: displaying a subset of nodes or links

based on a user-supplied boolean attribute– interactive pruning of the graph to temporarily reduce clutter and

occlusion– zooming in and out

Source: CAIDA, http://www.caida.org/tools/visualization/walrus/

Page 6: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Visualization Tool: GraphViz

• Takes descriptions of graphs in simple text languages• Outputs images in useful formats• Options for shapes and colors• Standalone or use as a library

• dot: hierarchical or layered drawings of directed graphs, by avoiding edge crossings and reducing edge length

• neato (Kamada-Kawai) and fdp (Fruchterman-Reinhold with heuristics to handle larger graphs)

• twopi – radial layout• circo – circular layout http://www.graphviz.org/

Page 7: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Dot (GraphViz)

Page 8: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Visualization Tools: YEd - JavaTM Graph Editorhttp://www.yworks.com/en/products_yed_about.htm

(good primarily for layouts, scales better, maybe free)

Page 9: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

yEd and 26,000 Nodes (Takes a Few Seconds)

Page 10: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Visualization Tools: Prefuse

• (free) user interface toolkit for interactive information visualization – built in Java using Java2D graphics library

– data structures and algorithms

– pipeline architecture featuring reusable, composable modules

– animation and rendering support

– architectural techniques for scalability

• requires knowledge of Java programming• website: http://prefuse.sourceforge.net/

– CHI paper http://guir.berkeley.edu/pubs/chi2005/prefuse.pdf

Page 11: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Simple Prefuse Visualizations

Source: Prefuse, http://prefuse.sourceforge.net/

Page 12: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Prefuse Application: Flow MapsA flow map of migration from California from 1995-2000, generated automatically by Prefuse system using edge routing but no layout adjustment.

http://graphics.stanford.edu/papers/flow_map_layout/

Page 13: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Prefuse Application: Vizster http://jheer.org/vizster/

Page 14: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Visualization Tool: Manyeyes

• http://manyeyes.alphaworks.ibm.com/manyeyes/

• Only for Visualization• Not just for networks,

but many other data type

• Web based, very easy to use

14

Page 15: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Outline

• In Pajek– visualization and layouts– degree– connected components– snowball sampling– one mode projections of bipartite graphs– thresholding weighted graphs

• Fetching you Facebook network– What is an egonetwork?

Page 16: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Using Pajek for Exploratory Social Network Analysis

• Pajek – (pronounced in Slovenian as Pah-yek) means ‘spider’

• website: http://vlado.fmf.uni-lj.si/pub/networks/pajek/ • wiki: http://pajek.imfm.si/doku.php

– download application (free)– tutorials– lectures– data sets

• Windows only (works on Linux via Wine, Mac via Darwine)

• helpful book: ‘Exploratory Social Network Analysis with Pajek’ by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj

Page 17: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Interface

Drop down list of networks opened or created with pajek. Active is displayed

Drop down list of network partitions by discrete variables, e.g. degree, mode, label

Drop down list of continuous node attributes, e.g. centrality, clustering coefficients

we’ll use today

can be used for clustering

Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download

Page 18: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Opening a Network Fileclick on folder icon

to open a file

Save changes to your network, network partitions, etc., if you’d like to keep them

Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download

Page 19: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Working with Network Files

• The active network, partition, etc is shown on top of the drop down list

Draw the network

Source: Pajek, Free for noncommercial use - http://pajek.imfm.si/doku.php?id=download

Page 20: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek data format

*Vertices 26

1 "Ada" 0.1646 0.2144 0.5000

2 "Cora" 0.0481 0.3869 0.5000

3 "Louise" 0.3472 0.1913 0.5000

..

*Arcs 1 3 2 c Black

1 2 1 c Black

2 1 1 c Black

..

*Edges 2 3 1 c Black

..

2

1

1

Ada

Cora

Louisenumber of vertices vertex x,y,z coordinates (optional)

directed edges

undirected edges

from Ada(1) to Louise(3) w/

eight “2” and color Black

between Ada(1) to Cora(2) w/

weight “1” and color Black

Page 21: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Let’s Get Started• Opening a network

– File Network Read

• Visualization– Draw Draw

• Essential measurements– …

Page 22: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Opening a File• A planar graph and layouts in Pajek• Download the file 'planarnet.net' from the Ctools website.

• Open it in Pajek by either clicking on the yellow folder icon under the word "Network" or by selecting FileNetworkRead from the main menu panel

• A report window should pop up confirming that the graph has been read and the filename and location will be displayed in the 'active' position of the network dropdown list

Page 23: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Visualization & Manual Positioning

• Visualize the network using Pajek's DrawDraw command from the main menu panel.

• This will bring up the 'draw' window with its own menu bar at the top

• Reposition the vertices by clicking on them and holding down the mouse button while dragging them to a new location. Continue doing this until you have shown that the graph is planar (no edges cross have to cross )

• (If you think this is really fun to do in your spare time, go to http://www.planarity.net)

Page 24: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Visualization & Layout Algorithms

• Now let Pajek do the work for you by selecting from the draw toolbar several layout algorithms under 'LayoutEnergy'.

• Why did you select the layout algorithm you did?• Did the layout leave any lines crossed? If you were to

do this assignment over, what order would you do it in?

Page 25: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

A Directed Network

2

1

1

2

1

2

1

2

1

2

21

1

2

1

2

1

2

12

1

2

1

2

1

2

1

21

2 1

2

1

2

12 1

2

1

2

12

1

2

12

1

2

1 2

12

Ada

Cora

Louise

Jean

Helen

Martha

Alice

Robin

Marion

Maxine

Lena

Hazel Hilda

Frances

Eva

RuthEdna

Adele

Jane

Anna

Mary

Betty

Ella

Ellen

Laura

Irene

• girls’ school dormitory dining-table partners (Moreno, The sociometry reader, 1960)

• first and second choices shown

Page 26: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Node Centrality: Degree

• Node network properties– from immediate connections

• indegreehow many directed edges (arcs) are incident on a node

• outdegreehow many directed edges (arcs) originate at a node

• degree (in or out)number of edges incident on a node

– labels

outdegree=2

indegree=3

degree=5

Page 27: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Centrality: Degree

• More on degree and other centrality measures in the next lecture…

• Degree: calculate it– Net Partitions Degree

• Visualize degree centrality– DrawDraw-Vector– If nodes are not the right size, use resize option

• Options Sizeof Vertices• Adjust the default size

Page 28: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Connected Components

• Strongly connected components– Any two nodes in the component

can be reached from each other by following directed edges B C D E A G H F

• Weakly connected components: every node can either reach or be reached from every other node by following directed edges

A

B

C

DE

F

GH

A B C D E G H F

• In undirected networks one talks simply about “connected components”

Page 29: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

The bowtie model of the WebBroder et al. (1999)

• SCC (strongly connected component):

– can reach all nodes from any other by following directed edges

• IN – can reach SCC from any node in

‘IN’ component by following directed edges

• OUT– can reach any node in ‘OUT’

component from SCC

• Tendrils and tubes– connect to IN and/or OUT

components but not SCC

• Disconnected – isolated components

Page 30: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Let’s try this on the dining table partners

• Suppose each girl will share her dish (and any dish that is shared with her), with only her 1st and 2nd choice– Find groups of girls who can all sample each other's dishes

• Net Components Strong

• Draw Draw-partition

– Which girls will get to sample no other dishes but their own?• Net Partitions Degree Input

– Display the network of strongly connected components• Operations Shrink Network Partition

• Draw Draw-partition

– Identify the bowtie structure • Net Partitions Bow-Tie

– What part of the bowtie is missing? What does this imply for the circulation of the different dishes?

Page 31: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Snowball Sampling

• Dining partners: You are a prince who just met an enchanting young lady at a ball, but she left at the stroke of midnight and left a shoe behind. Now you'd like to find the shoe's owner. All you know about her is that she lives in this particular girls' dorm. The headmistress won't let you talk to the girls, so the only way you can find your princess is to covertly ask the one girl you know, Ella, to introduce you to her two favorite friends. Once you know her friends, you can ask them to introduce you to their two favorite friends, etc. This is the snowball sampling technique we covered in class.

Highlight the vertices that you will reach using snowball sampling– Net K-Neighbors ...

• Which girls will you not find using snowball sampling starting with Ella (23)?

Page 32: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Bipartite networks

Page 33: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Going from a Bipartite to a One-mode Graph

• One mode projection– two nodes from the first

group are connected if they link to the same node in the second group

– some loss of information– naturally high

occurrence of cliques

Two-mode networkgroup 1

group 2

Page 34: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Let’s Try it on the Actors Network

• Open actorsandmovies.net • Create a 2-mode partition (newer versions of Pajek do

this automatically) – NetPartition2-Mode

• Draw the network – DrawDraw-Partition (the two classes of nodes should be

colored differently)– If labels are not shown, add them:

Options Mark Vertices UsingLabels

• Experiment with different layout algorithms– “Layout” menu in the draw window

Page 35: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Actors Network (continued)

• Transform the network into a one-mode network– NetTransform2-Mode to 1-ModeRows

• Draw the network – DrawDraw

• Qualitatively compare the structure of the 2-Mode to the 1-Mode network. Is there a loss of information?

• Show the weights on each edge– OptionsLinesMark Lineswith Values

• What do the values represent? • Compute the unweighted degree of each node

– NetPartitionsDegreeAll• Draw the network using

– DrawDraw-vector• How is the degree represented?

• Add the vector value to each vertex (it will be the degree/(max possible degree)) – OptionsMark Vertices UsingVector Values

• Who are the most important actors using this measure?

Page 36: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Actors Network (continued)

• How does the boundary of the network (i.e. who is included) affect who is found to be most central? – Load the file actorsandmoviesWithGere.net. It contains one

extra actor, Richard Gere. Repeat the above procedure.– In the 1-mode network of actors, is there a change in who is

most central?– What does this tell you about biases and boundaries in sample

selection?

• Remove all edges between actors who have costarred in fewer than 3 movies– NetTransformRemoveLines with valueLower than

• Which actors comprise the central core of this network

Page 37: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Grabbing Your Data from Facebook via Nexus

Page 38: 2009 © University of Michigan 1 Introductory Social Network Analysis with Pajek (Lecture for SI508 Networks: Theory and Application) Sept 16, 2009 Instructor:

2009 © University of Michigan

Pajek: Wrap Up

• Used frequently by sociologists – UCInet is comparable and arguably more user friendly (but not

free)

• Extensive functionality– But not extendable

• What we covered– visualization– node properties: degree– connected components– k-neighbors– converting two-mode networks to one-mode– thresholding the network