Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
CSE 494/598Lecture-9:Social Networks and their applications to WebLYDIA MANIKONDA HTTP://WWW.PUBLIC.ASU.EDU/~LMANIKON /
**Content adapted from last year’s slides
Announcements• Homework-2 soon will be released
• Surprise quiz today!
Today’s lecture• Social Networks – Network theory and concepts – General Introduction
• Hands-on Tutorial -- Crawling Twitter
Networks – Physical & Cyber
Typhoid Mary(Mary Mallon)
Patient Zero(Gaetan Dugas)
Applications of Network Theory• World Wide Web and hyperlink structure
• The Internet and router connectivity
• Collaborations among…– Movie actors
– Scientists and mathematicians
• Sexual interaction
• Cellular networks in biology
• Food webs in ecology
• Phone call patterns
• Word co-occurrence in text
• Neural network connectivity of flatworms
• Conformational states in protein folding
Web Applications of Social Networks• Analyzing Influence or Importance• Page Rank • Related to recursive in-degree computation
• Discovering Communities• Finding near-cliques
• Analyzing Trust • Propagating trust
• Using propagated trust to fight spam• In email
• In web page ranking
• Analyzing Information Diffusion• Modeling web interactions
People are represented as nodes.
Society as a graph
People are represented as nodes.
Relationships are represented as edges.
(Relationships may be acquaintanceship, friendship, co-authorship, etc.)
Society as a Graph
People are represented as nodes.
Relationships are represented as edges.
(Relationships may be acquaintanceship, friendship, co-authorship, etc.)
Allows analysis using tools of mathematical graph theory
Society as a Graph
Graphs – Sociograms (based on Hanneman, 2001)
Graphs – Network analysis uses one kind of graphic display that consists of points (nodes) to represent actors and lines (edges) to represent the ties or relations.
Sociograms – Called by Sociologists; Same as graphs
Strength of ties:
• Nominal – represents presence of absence of a tie
• Signed – represents a negative tie, positive tie or no tie
• Ordinal – represents whether a tie is the strongest, next strongest, etc.
• Valued – measured on an interval or ratio level
ConnectionsSize
◦ Number of nodes
Density ◦ Number of ties that are present/the amount of ties that could be present
Out-degree
◦ Sum of connections from an actor to others
In-degree
◦ Sum of connections to an actor
DistanceWalk
◦ A sequence of actors and relations that begins and ends with actors
Geodesic distance
◦ The number of relations in the shortest possible walk from one actor to another
Maximum flow
◦ The amount of different actors in the neighborhood of a source that lead to pathways to a target
Centrality
◦ This indicates the most important vertices within a graph
Degree◦ Sum of connections from or to an actor
◦ Transitive weighted degreeAuthority, hub, pagerank
Closeness centrality◦ Distance of one actor to all others in the network
Betweenness centrality◦ Number that represents how frequently an actor is between other actors’
geodesic paths
Some Measures of Power & Prestige(based on Hanneman, 2001)
Cliques and Social Roles (based on Hanneman, 2001)
Cliques ◦ Sub-set of actors
◦ More closely tied to each other than to actors who are not part of the sub-set
◦ (A lot of work on “trawling” for communities in the web-graph)
◦ Often, you first find the clique (or a densely connected subgraph) and then try to interpret what the clique is about
Social roles ◦ Defined by regularities in the patterns of relations among actors
Outline•Small Worlds
•Random Graphs
•Alpha and Beta
•Power Laws
•Searchable Networks
•Six Degrees of Separation
Small worlds
Trying to make friends
Kentaro
Trying to make friends
Kentaro
BashMicrosoft
Trying to make friends
Kentaro Ranjeet
BashMicrosoft Asha
Trying to make friends
Kentaro Ranjeet
Bash
Sharad
Microsoft Asha
New York CityYale
Ranjeet and I already had a friend in common!
I didn’t have to worry…
Kentaro
Bash
Karishma
Sharad
Maithreyi
Anandan
Venkie
Soumya
It’s a small world after all!
Kentaro Ranjeet
Bash
Karishma
Sharad
Maithreyi
Anandan Prof. Sastry
Venkie
PM Manmohan Singh
Prof. Balki
Pres. Kalam
Prof. Jhunjhunwala
Dr. Montek SinghAhluwalia
Ravi
Dr. Isher Judge Ahluwalia
Pawan
Aishwarya
Prof. McDermott
Ravi’sFather
AmitabhBachchan
Prof.Kannan
Prof. Prahalad
Soumya
NandanaSen
Prof. AmartyaSen
Prof. Veni
Rao
The Kevin Bacon GameInvented by Albright College students in 1994:
◦ Craig Fass, Brian Turtle, Mike Ginelly
Goal: Connect any actor to Kevin Bacon, by linking actors who have acted in the same movie.
Oracle of Bacon website uses Internet Movie Database (IMDB.com) to find shortest link between any two actors:
http://oracleofbacon.org/
Boxed version of theKevin Bacon Game
The Kevin Bacon Game
Kevin Bacon
An Example
Tim Robbins
Om Puri
Amitabh Bachchan
Yuva (2004)
Mystic River (2003)
Code 46 (2003)
Rani Mukherjee
Black (2005)
Perhaps the other path is deemed more diverse/ colorful…
…actually Bachchan has a Bacon number 3
Total # of actors in database: ~550,000
Average path length to Kevin: 2.79
Actor closest to “center”: Rod Steiger (2.53)
Rank of Kevin, in closeness to center: 876th
Most actors are within three links of each other! Center of Hollywood?
The Kevin Bacon Game
Erdős Number (Bacon game for Brainiacs )
Number of links required to connect scholars to Erdős, via co-authorship of papers
Erdős wrote 1500+ papers with 507 co-authors.
Jerry Grossman’s (Oakland Univ.) website allows mathematicians to compute their Erdosnumbers:
http://www.oakland.edu/enp/
Connecting path lengths, among mathematicians only:
◦ average is 4.65◦ maximum is 13
Paul Erdős (1913-1996)
Unlike Bacon, Erdos has better centrality in his network
Erdős NumberPaul Erdős
Dimitris Achlioptas
Bernard Schoelkopf
Kentaro Toyama
Andrew Blake
Mike Molloy
Toyama, K. and A. Blake (2002). Probabilistic tracking with exemplars in a metric space. International Journal of Computer Vision. 48(1):9-19.
Romdhani, S., P. Torr, B. Schoelkopf, and A. Blake (2001). Computationally efficient face detection. In Proc. Int’l. Conf. Computer Vision, pp. 695-700.
Achlioptas, D., F. McSherry and B. Schoelkopf. Sampling Techniques for Kernel Methods. NIPS 2001, pages 335-342.
Achlioptas, D. and M. Molloy (1999). Almost All Graphs with 2.522 n Edges are not 3-Colourable. Electronic J. Comb. (6), R29.
Alon, N., P. Erdos, D. Gunderson and M. Molloy (2002). On a Ramsey-type Problem. J. Graph Th. 40, 120-129.
Six Degrees of SeparationThe experiment:
Random people from Nebraska were to send a letter (via intermediaries) to a stock broker in Boston.
Could only send to someone with whom they were on a first-name basis.
Among the letters that found the target, the average number of links was six.
Milgram (1967)
Stanley Milgram (1933-1984)
Many “issues” with the experiment…
Some issues with Milgram’s setupA large fraction of his test subjects were stockbrokers
◦ So are likely to know how to reach the “goal” stockbroker
A large fraction of his test subjects were in Boston◦ As was the “goal” stockbroker
A large fraction of letters never reached ◦ Only 20% reached
Six Degrees of Separation
John Guare wrote a play called Six Degrees of Separation, based on this concept.
Milgram (1967)
“Everybody on this planet is separated by only six other people. Six degrees of
separation. Between us and everybody else on this planet. The president of the United
States. A gondolier in Venice… It’s not just the big names. It’s anyone. A native in a rain
forest. A Tierra del Fuegan. An Eskimo. I am bound to everyone on this planet by a trail
of six people…”
Today• PageRank algorithm
• Introduction to Social Networks
In-class exercise: Crawling WebUSING TWITTER API