
Interactive visualization and exploration of network data with gephi

DESCRIPTION

Presentation for a workshop given at the Centre for Interdisciplinary Methodologies at Warwick University on May 9 2013. Focuses on conceptual and historical questions. Comments, references, and explanations are in the notes.

Page 1: Interactive visualization and exploration of network data with gephi

Interactive visualization and exploration of network data with gephi

Bernhard Rieder
Universiteit van Amsterdam
Mediastudies Department

and some conceptual context

Page 2: Interactive visualization and exploration of network data with gephi

Context

Terms like "big data", "computational social science", "digital humanities", "digital methods", etc. are receiving a lot of attention.

They point to a set of practices of knowledge production: data analysis, visualization, modeling, etc.

Instead of a totalizing search for a "logic" of data analysis, we could inquire into the vocabulary of concepts and analytical gestures that constitute the practice of data analysis.

A twofold approach to methods:

☉ Engagement, development, application => digital methods

☉ Conceptual, historical, and political analysis and critique => software studies

Page 3: Interactive visualization and exploration of network data with gephi

This workshop

How do we talk about data? How do we analyze them? What is our frame of thought? How do we go further in terms of imagination, expressivity?

☉ Introduction

☉ A bit of math

☉ Two kinds of mathematics

☉ Concepts and techniques from graph theory

☉ Working with gephi

Engage the theory of knowledge (epistemology) mobilized in data analysis through the actual techniques rather than through generalizing concepts.

Page 4: Interactive visualization and exploration of network data with gephi

Basic ideas

Why?

Why do network analysis and visualization? Which arguments are put forward?

☉ New media: technical and conceptual structures modeled as networks

☉ Calculative capacities: powerful techniques and tools

☉ Visualization: the network diagram, "visual analytics"

☉ Logistics: data and software are available

☉ Methodology: dissatisfaction with statistics (SNA)

☉ Society: diversification, problems with demographics / statistics / theory

Page 5: Interactive visualization and exploration of network data with gephi

Platforms like Twitter boost opportunities for connectivity between various types of actors.

Page 6: Interactive visualization and exploration of network data with gephi

At the same time, they produce detailed data traces that are highly centralized and searchable.

Many of these data can be analyzed as graphs.

Page 7: Interactive visualization and exploration of network data with gephi

What styles of reasoning?

Hacking (1991) builds the concept of "style of reasoning" on A. C. Crombie’s (1994) "styles of scientific thinking":

☉ postulation and deduction

☉ experiment and empirical research

☉ reasoning by analogy

☉ ordering by comparison and taxonomy

☉ statistical analysis of regularities and probabilities

☉ genetic development

What kind of reasoning are we mobilizing in data analysis?

Is it one type of reasoning or many?

Are we "positivists" when we do data analysis? Reductionists?

Page 8: Interactive visualization and exploration of network data with gephi

Quality / quantity

"One of my favorite fantasies is a dialogue between Mills and Lazarsfeld in which the former reads to the latter the first sentence of The Sociological Imagination: 'Nowadays men often feel that their private lives are a series of traps.' Lazarsfeld immediately replies: 'How many men, which men, how long have they felt this way, which aspects of their private lives bother them, do their public lives bother them, when do they feel free rather than trapped, what kinds of traps do they experience, etc., etc., etc.' If Mills succumbed, the two of them would have to apply to the National Institute of Mental Health for a million-dollar grant to check out and elaborate that first sentence. They would need a staff of hundreds, and when finished they would have written Americans View Their Mental Health rather than The Sociological Imagination, provided that they finished at all, and provided that either of them cared enough at the end to bother writing anything." (Maurice Stein, cit. in Gitlin 1978)

Theory vs. empiricism, macro vs. micro, qualitative vs. quantitative, inductive vs. deductive, associative vs. formalistic, etc.

The promise of data analysis tools, applied to exhaustive (and cheap) data, is to bridge the gap, to allow zooming, "quali-quanti" (Latour 2010).

Page 9: Interactive visualization and exploration of network data with gephi

Two kinds of mathematics

Can there be data analysis without math? No.

Does this imply epistemological commitments? Yes.

But there are choices, e.g. between:

☉ Confirmatory data analysis => deductive

☉ Exploratory data analysis (Tukey 1962) => inductive

There is a fast growing variety of formal analytical gestures relying on mathematical modeling and computation.

Page 10: Interactive visualization and exploration of network data with gephi

Two kinds of mathematics

Statistics

Observed: objects and properties

Inferred: social forces

Data representation: the table

Visual representation: quantity charts

Grouping: "class" (similar properties)

Graph theory

Observed: objects and relations

Inferred: structure

Data representation: the matrix

Visual representation: network diagrams

Grouping: "clique" (dense relations)
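
The contrast between the two kinds of grouping can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the five people, their "occupation" property, and their ties are invented, and the clique search is a brute-force check that only makes sense for very small examples.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: five people with one property each (the "table").
occupation = {
    "ana": "student",
    "ben": "student",
    "cem": "teacher",
    "dia": "teacher",
    "eli": "student",
}

# Hypothetical ties between the same people (the "relations").
ties = {("ana", "cem"), ("ana", "dia"), ("cem", "dia"), ("ben", "eli")}


def classes(properties):
    """Statistical grouping: people who share the same property value."""
    groups = defaultdict(set)
    for person, value in properties.items():
        groups[value].add(person)
    return dict(groups)


def is_clique(members, edges):
    """True if every pair of members is directly connected."""
    return all(
        (a, b) in edges or (b, a) in edges for a, b in combinations(members, 2)
    )


def largest_clique(nodes, edges):
    """Graph-theoretical grouping: brute-force search, largest size first."""
    for size in range(len(nodes), 0, -1):
        for candidate in combinations(sorted(nodes), size):
            if is_clique(candidate, edges):
                return set(candidate)
    return set()


by_class = classes(occupation)
by_clique = largest_clique(occupation.keys(), ties)
print(by_class)
print(by_clique)
```

In this toy case the two groupings cut across each other: the densest clique contains one student and two teachers, which no grouping by occupation would produce.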

Page 11: Interactive visualization and exploration of network data with gephi

Graph theory

Leonhard Euler, "Seven Bridges of Königsberg", 1735

Introducing the "point and line" model

Page 12: Interactive visualization and exploration of network data with gephi

Graph theory

Develops over the 20th century, in particular the second half.

Integrates branches of mathematics (topology, geometry, statistics, etc.).

Graph theory is "the mathematics of structure" (Harary 1965), "a mathematical model for any system involving a binary relation" (Harary 1969); it makes relational structure calculable.

"Perhaps even more than to the contact between mankind and nature, graph theory owes to the contact of human beings between each other." (König 1936)

Page 13: Interactive visualization and exploration of network data with gephi

Basic ideas

Moreno 1934

Graph theory developed in exchange with sociometry, small-group research and (later) social exchange theory.

Starting point:

"the sociometric test"

(experimental definition of "relation")

Page 14: Interactive visualization and exploration of network data with gephi

Basic ideas

Page 15: Interactive visualization and exploration of network data with gephi

Forsyth and Katz, 1946, "adjacency matrix"
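
A minimal sketch of how an adjacency matrix can be built from sociometric choices. The names and choices are invented, and the matrix is stored as a plain list of lists rather than in any particular published notation.

```python
# Hypothetical sociometric choices: (chooser, chosen). Choices are directed,
# so "ana chooses ben" does not imply "ben chooses ana".
choices = [("ana", "ben"), ("ben", "ana"), ("ana", "cem"), ("dia", "cem")]

people = sorted({name for pair in choices for name in pair})
index = {name: i for i, name in enumerate(people)}

# matrix[i][j] == 1 means person i chose person j.
matrix = [[0] * len(people) for _ in people]
for chooser, chosen in choices:
    matrix[index[chooser]][index[chosen]] = 1

# Row sums count choices made (outdegree); column sums count choices
# received (indegree).
choices_made = {name: sum(matrix[index[name]]) for name in people}
choices_received = {
    name: sum(row[index[name]] for row in matrix) for name in people
}

for name, row in zip(people, matrix):
    print(f"{name:>4}", row)
print("made:    ", choices_made)
print("received:", choices_received)
```

Once relations are in matrix form, questions about structure become arithmetic on rows and columns, which is what makes the representation calculable.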

Page 16: Interactive visualization and exploration of network data with gephi

Harary, Graph Theory, 1969

Page 17: Interactive visualization and exploration of network data with gephi

Basic ideas

The late 1990s

The network "singularity":

☉ The network imaginary, a "new science of networks" (Watts 2005)

☉ Computational capacities (memory, speed, interfaces, etc.)

☉ New platforms and datasets

☉ Packaged tools

Different traditions converge to form network analysis:

☉ Social network analysis and sociometrics

☉ Scientometrics / science and technology studies

☉ Mathematics / physics / computer science

☉ Information and data visualization

☉ Digital sociology / new media studies

Page 18: Interactive visualization and exploration of network data with gephi

Basic ideas

Adamic and Glance, "Divided They Blog", 2005

Page 19: Interactive visualization and exploration of network data with gephi

Formalization

"As we have seen, the basic terms of digraph theory are point and line. Thus, if an appropriate coordination is made so that each entity of an empirical system is identified with a point and each relationship is identified with a line, then for all true statements about structural properties of the obtained digraph there are corresponding true statements about structural properties of the empirical system." (Harary et al. 1965)

There is always an epistemological commitment!

=> What can "carry" the reductionism and formalization?

=> What types of analytical gestures?

Page 20: Interactive visualization and exploration of network data with gephi

Facebook Page "ElShaheeed", June 2010 – June 2011, (Poell / Rieder, forthcoming)

7K posts, 700K users, 3.6M comments, 10M likes (tool: netvizz), work in progress!

Page 21: Interactive visualization and exploration of network data with gephi

Facebook Page "ElShaheeed", June 2010 – June 2011: comment timescatter, log10 y scale, likes on comments

Page 22: Interactive visualization and exploration of network data with gephi

Facebook Page "ElShaheeed", June 2010 – June 2011: scatterplot comments / likes, per post type

Page 23: Interactive visualization and exploration of network data with gephi

Facebook Page "ElShaheeed"

700K nodes, 11M connections

Color: type

Page 24: Interactive visualization and exploration of network data with gephi

Facebook Page "ElShaheeed"

700K nodes, 11M connections

Color: outdegree

Page 25: Interactive visualization and exploration of network data with gephi

Basic ideas

What Kind of Phenomena/Data?

Interactive networks (Watts 2004): link encodes tangible interaction

☉ social networks

☉ citation networks

☉ hypertext networks

Symbolic networks (Watts 2004): link is conceptual

☉ co-presence (Tracker Tracker, IMDB, etc.)

☉ co-word

☉ any kind of "structure" that can be modeled as point and line

=> do all kinds of analysis (SNA, transportation, text mining, etc.)

=> analyze structure in various ways

Page 26: Interactive visualization and exploration of network data with gephi

Basic ideas

What is a graph?

An abstract representation of nodes connected by links.

Two ways of dealing with graphs:

☉ mathematical analysis (graph statistics, structural measures, etc.)

☉ visualization (network diagram, matrix, arc diagram, etc.)

Page 27: Interactive visualization and exploration of network data with gephi

Three different force-based layouts of my FB profile

OpenOrd, ForceAtlas, Fruchterman-Reingold

Page 28: Interactive visualization and exploration of network data with gephi

Non force-based layouts

Circle diagram, parallel bubble lines, arc diagram

Page 29: Interactive visualization and exploration of network data with gephi

Network statistics

betweenness centrality

degree

Relational elements of graphs can be represented as tables (nodes have properties) and analyzed through statistics.

Network statistics bridge the gap between individual units and the structural forms they are embedded in.

This is currently an extremely prolific field of research.
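
As a rough illustration of how relational structure turns into a table of node properties, the sketch below derives degree and betweenness centrality for a small undirected network. The five-node network is invented, and the betweenness function is a plain-Python version of Brandes' algorithm for unweighted graphs, not the code of any particular tool.

```python
from collections import deque

# Hypothetical undirected toy network, given as a list of ties.
ties = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e")]

adjacency = {}
for u, v in ties:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each pair of nodes was counted once from either end.
    return {v: s / 2 for v, s in score.items()}


degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
between = betweenness(adjacency)

# One row per node: the network has become a table that statistics can use.
table = [
    {"node": n, "degree": degree[n], "betweenness": between[n]}
    for n in sorted(adjacency)
]
for row in table:
    print(row)
```

Node "c" has both the highest degree and the highest betweenness here, while "b" has a modest degree but still sits on every shortest path that leads to "a".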

Page 30: Interactive visualization and exploration of network data with gephi

Basic ideas

Page 31: Interactive visualization and exploration of network data with gephi

Basic ideas

What is a graph?

Vertices and edges!

Nodes and lines!

Two main types:

☉ Directed (e.g. Twitter)

☉ Undirected (e.g. Facebook)

Properties of nodes: degree, centrality, etc.

Properties of edges: weight, direction, etc.

Properties of the graph: averages, diameter, communities, etc.
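
The distinction between directed and undirected graphs, and between properties of nodes and properties of the whole graph, can be sketched as follows. The "follows" and "friends" lists are invented examples, and the diameter calculation assumes that the undirected graph is connected.

```python
from collections import deque

# Hypothetical directed ties (Twitter-style "follows"): source -> target.
follows = [("ana", "ben"), ("cem", "ben"), ("ben", "ana"), ("dia", "cem")]

# Hypothetical undirected ties (Facebook-style "friends").
friends = [("ana", "ben"), ("ben", "cem"), ("cem", "dia")]


def in_out_degree(directed_edges):
    """Node properties of a directed graph: indegree and outdegree."""
    indegree, outdegree = {}, {}
    for source, target in directed_edges:
        outdegree[source] = outdegree.get(source, 0) + 1
        indegree[target] = indegree.get(target, 0) + 1
        # Make sure every node appears in both dictionaries.
        outdegree.setdefault(target, 0)
        indegree.setdefault(source, 0)
    return indegree, outdegree


def neighbours(undirected_edges):
    """Adjacency sets for an undirected graph: each tie counts both ways."""
    adj = {}
    for a, b in undirected_edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def eccentricity(adj, start):
    """Longest shortest-path distance from start (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in adj[node]:
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return max(dist.values())


indegree, outdegree = in_out_degree(follows)

adj = neighbours(friends)
# Graph properties: one number for the whole structure.
average_degree = sum(len(n) for n in adj.values()) / len(adj)
diameter = max(eccentricity(adj, node) for node in adj)

print("indegree: ", indegree)
print("outdegree:", outdegree)
print("average degree:", average_degree, "diameter:", diameter)
```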

Page 32: Interactive visualization and exploration of network data with gephi

Basic ideas

Page 33: Interactive visualization and exploration of network data with gephi

Basic ideas

Wikipedia: Glossary of graph theory

Tools are easy, concepts are hard

Page 34: Interactive visualization and exploration of network data with gephi

Basic ideas

Interactive visual analytics

Bringing structure to the surface (gephi panel: "layout")

☉ different spatializations (force, geometry, etc.)

Projecting variables into the diagram (gephi panel: "ranking")

☉ Size (nodes, edges, labels, etc.)

☉ Color (nodes, edges, labels, etc.)

Deriving measures (gephi panel: "statistics")

☉ Properties of nodes, edges, structure => new variables

Analysis: e.g. correlation between spatial layout and variables?
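
Projecting a variable into the diagram amounts to mapping a number onto a visual property. The sketch below uses linear interpolation between a minimum and a maximum node size. This is an assumption chosen for simplicity, not a description of how gephi implements its ranking panel, and the function name, size limits, and degree values are all hypothetical.

```python
def rank_to_size(values, min_size=10.0, max_size=50.0):
    """Map each node's value linearly onto the range [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # All values are equal, so there is nothing to rank.
        return {node: min_size for node in values}
    scale = (max_size - min_size) / (high - low)
    return {node: min_size + (v - low) * scale for node, v in values.items()}


# Hypothetical degree values for three nodes.
degree = {"ana": 1, "ben": 3, "cem": 5}
sizes = rank_to_size(degree)
print(sizes)
```

The same mapping could target a color scale instead of a size range. In both cases the choice of scale (linear here) shapes what the diagram makes visible.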

Page 35: Interactive visualization and exploration of network data with gephi

Basic ideas

http://courses.polsys.net/gephi/

Page 36: Interactive visualization and exploration of network data with gephi

Basic ideas

Page 37: Interactive visualization and exploration of network data with gephi

Basic ideas

Twitter #ows dataset, co-hashtag analysis

Strong topic clustering
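
A co-hashtag network treats hashtags as nodes and links two hashtags whenever they appear in the same tweet. The following is a minimal sketch under stated assumptions: the tweets are invented, hashtags are lowercased, and the `min_freq` parameter is a hypothetical name for a frequency cut-off on which hashtags are kept.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical tweets, invented for illustration.
tweets = [
    "Marching downtown #ows #occupy",
    "#ows #occupy #nypd blocking the bridge",
    "Livestream is up #ows #livestream",
    "No hashtags here",
    "#OWS again #occupy",
]

HASHTAG = re.compile(r"#(\w+)")


def co_hashtag_network(texts, min_freq=1):
    """Return (nodes, edges): hashtag frequencies and co-occurrence weights."""
    freq = Counter()
    weights = Counter()
    for text in texts:
        # Each hashtag counts once per tweet; sorting fixes the pair order.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        freq.update(tags)
        weights.update(combinations(tags, 2))
    kept = {tag for tag, n in freq.items() if n >= min_freq}
    nodes = {tag: freq[tag] for tag in kept}
    edges = {
        pair: w
        for pair, w in weights.items()
        if pair[0] in kept and pair[1] in kept
    }
    return nodes, edges


nodes, edges = co_hashtag_network(tweets, min_freq=2)
print(nodes)
print(edges)
```

Node frequency and edge weight are the kinds of variables that can then be projected into the diagram as size or color.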

Page 38: Interactive visualization and exploration of network data with gephi

Twitter 1% sample, co-hashtag analysis

227,029 unique hashtags, 1627 displayed (freq >= 50)

Size: frequency

Color: modularity
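
Modularity compares the share of edges that fall inside communities with the share expected if edges were placed at random while keeping node degrees fixed. The sketch below only scores a partition that is already given. Finding a good partition is a separate optimization step that is not shown, and the graph and the community assignment are invented.

```python
def modularity(edges, community):
    """Modularity of a given partition of an undirected, unweighted graph.

    Computed per community as (internal edges / m) - (degree sum / 2m) ** 2,
    where m is the total number of edges.
    """
    m = len(edges)
    internal = {}
    degree_sum = {}
    for a, b in edges:
        for node in (a, b):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[a] == community[b]:
            c = community[a]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical graph: two triangles joined by a single bridge (c, d).
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the graph along the bridge scores higher than lumping all nodes together, which scores zero by construction.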

Page 39: Interactive visualization and exploration of network data with gephi

Size: frequency

Color: user diversity

Twitter 1% sample, co-hashtag analysis

227,029 unique hashtags, 1627 displayed (freq >= 50)

Page 40: Interactive visualization and exploration of network data with gephi

Size: frequency

Color: degree

Twitter 1% sample, co-hashtag analysis

227,029 unique hashtags, 1627 displayed (freq >= 50)

Page 41: Interactive visualization and exploration of network data with gephi

Twitter 1% sample

Co-hashtag analysis

Degree vs. wordFrequency

Page 42: Interactive visualization and exploration of network data with gephi

Degree vs. userDiversity

Twitter 1% sample

Co-hashtag analysis

Page 43: Interactive visualization and exploration of network data with gephi

FB group "Islam is dangerous"

Friendship network, color: betweenness centrality

2,339 members

Average degree of 39.69

81.7% have at least one friend in the group

55.4% have five or more

37.2% have 20 or more

The founder and admin has 609 friends
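
Figures of this kind (average degree, share of members with at least k friends in the group) follow directly from the list of friendship ties. A minimal sketch with invented members and ties. The function name and the thresholds are hypothetical and do not reproduce the actual dataset.

```python
from collections import Counter


def degree_summary(members, friendships, thresholds=(1, 5, 20)):
    """Average degree and percentage of members with degree >= each threshold."""
    degree = Counter()
    for a, b in friendships:
        degree[a] += 1
        degree[b] += 1
    # Every tie adds one to the degree of two members.
    average = 2 * len(friendships) / len(members)
    shares = {
        k: 100 * sum(1 for person in members if degree[person] >= k) / len(members)
        for k in thresholds
    }
    return average, shares


# Hypothetical group of five members; "eve" has no friends in the group.
members = ["ann", "bob", "cat", "dan", "eve"]
friendships = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]

average, shares = degree_summary(members, friendships, thresholds=(1, 2, 3))
print(average, shares)
```

Members without any tie inside the group still count in the denominator, which is why such percentages describe the group as a whole rather than only its connected part.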

Page 44: Interactive visualization and exploration of network data with gephi

FB group "Islam is dangerous"

Friendship network, color: Interface language

en_us, de, en_uk, it dominate

Page 45: Interactive visualization and exploration of network data with gephi

Mapping European Extremism

Friendship relations of 18 extreme-right groups

Page 46: Interactive visualization and exploration of network data with gephi

FB page "Educate children about the evils of Islam"

Links have more comments, photos more likes.

Page 47: Interactive visualization and exploration of network data with gephi

FB page "Stop the Islamization of the World"

Number of posts and reactions

Page 48: Interactive visualization and exploration of network data with gephi

FB page "Stop the Islamization of the World"

Page 49: Interactive visualization and exploration of network data with gephi

Basic ideas

Interactive visual analytics

Bringing structure to the surface (gephi panel: "layout")

☉ different spatializations (force, geometry, etc.)

Projecting variables into the diagram (gephi panel: "ranking")

☉ Size (nodes, edges, labels, etc.)

☉ Color (nodes, edges, labels, etc.)

Deriving measures (gephi panel: "statistics")

☉ Properties of nodes, edges, structure => new variables

Analysis: e.g. correlation between spatial layout and variables?

Page 50: Interactive visualization and exploration of network data with gephi

Basic ideas

http://courses.polsys.net/gephi/

Page 51: Interactive visualization and exploration of network data with gephi

Nine measures of centrality (Freeman 1979)

Page 52: Interactive visualization and exploration of network data with gephi

Page 53: Interactive visualization and exploration of network data with gephi

Basic ideas

US Airports

Page 54: Interactive visualization and exploration of network data with gephi

Thank You

https://www.digitalmethods.net

http://thepoliticsofsystems.net

"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (Tukey 1962)