Visualisation and Analysis of the Internet Movie Database∗
Adel Ahmed†
School of IT, University of Sydney
NICTA, Australia
Vladimir Batagelj‡
Discrete and Computational Mathematics
University of Ljubljana, Slovenia
Xiaoyan Fu§
NICTA, Australia
Seok-Hee Hong¶
School of IT, University of Sydney
NICTA, Australia
Damian Merrick‖
School of IT, University of Sydney
NICTA, Australia
Andrej Mrvar∗∗
Social Science Informatics
University of Ljubljana, Slovenia
ABSTRACT
In this paper, we present a case study for the visualisation and analysis of large and complex temporal multivariate networks derived from the Internet Movie DataBase (IMDB). Our approach is to integrate network analysis methods with visualisation in order to address scalability and complexity issues. In particular, we defined new analysis methods such as (p,q)-core and 4-ring to identify important dense subgraphs and short cycles from the huge bipartite graphs. We applied island analysis for a specific time slice in order to identify important and meaningful subgraphs. Further, a temporal Kevin Bacon graph and a temporal two mode network are extracted in order to provide insight and knowledge on the evolution.
Keywords: Large and Complex Networks, Case Study, Visualisation, Network Analysis, IMDB.
Index Terms: H.5.2 [Information Interfaces and Presentation]:User Interfaces—Algorithms; I.3.6 [Computer Graphics]: Method-ology and Techniques—
1 INTRODUCTION
Recent technological advances have led to the production of large amounts of data, and consequently to many large and complex network models across a number of domains. Examples include:
• Webgraphs: the entities are web pages and the relationships are hyperlinks. These are huge: the whole graph consists of billions of nodes.
• Social networks: these include telephone call graphs (used to trace terrorists), money movement networks (used to detect money laundering), and citation networks or collaboration networks. The size of the network can be medium to very large.
• Biological networks: protein-protein interaction (PPI) networks, metabolic pathways, gene regulatory networks and phylogenetic networks are used by biologists to analyse and engineer biochemical materials. In general, they are smaller, with thousands of nodes. However, the relationships in these networks are very complex.
∗This paper is based on the winning entry of the Graph Drawing Competition 2005 [7] and invited presentation at Sunbelt Viszard Session [9].
†e-mail: [email protected]
‡e-mail: [email protected]
§e-mail: [email protected]
¶e-mail: [email protected]
‖e-mail: [email protected]
∗∗e-mail: [email protected]
Understanding these networks is a key enabler for many applications. Good analysis methods are needed for these networks, and some are available. However, such methods are not useful unless the results are effectively communicated to humans. Visualisation can be an effective tool for the understanding of such networks. Good visualisation reveals the hidden structure of the networks and amplifies human understanding, thus leading to new insights, new findings and possible predictions for the future.
We can identify the following challenging research issues for analysis and visualisation of large and complex networks:
• Scalability: Webgraphs or telephone call graphs gathered by AT&T have billions of nodes. In some cases, it is impossible to visualise the whole graph, or even to load the whole graph into main memory. Hence, the design of new analysis and visualisation methods for huge networks is a key research challenge, from databases to computer graphics.
• Complexity: Relationships between actors in a social network, for example, can have a multitude of attributes (for example, observed behavior can be confirmed or unconfirmed, relationships can be directed or undirected, and weighted by probabilities). Also, biological networks are quite complex in nature; for example, metabolic pathways have only a few thousand nodes, but their relationships and interactions are very complex. The data may be given by nature, but some parts of the data may be unknown to human scientists. The design of analysis and visualisation methods to resolve these complexity issues is the second research challenge.
• Network Dynamics: Real world networks are always changing over time. Many social networks, such as webgraphs, evolve relatively slowly over time. In some cases, such as telephone call networks, the data arrive as a very fast graph stream. Effective and efficient modeling, analysis and visualisation for dynamic networks are challenging research topics.
One approach to addressing these challenges is to integrate analysis with visualisation and interaction. Analysis tools for networks are not useful without visualisation, and visualisation tools are not useful unless they are linked to analysis. Further, interaction is necessary to find out more details or insights from the visualisation.
In this paper, we present a case study for our approach to integrating analysis, visualisation and interaction using large and complex temporal multivariate networks derived from the IMDB (Internet Movie Data Base). In general, the IMDB is a huge and very rich data set with many attributes. Note that the IMDB data set has become a challenging data set for visualisation researchers [7, 9].
For example, a multi-scale approach for visualisation of small world networks was used for data sets from IMDB [3]. A visualisation approach for dynamic affiliation networks in which events are characterized by a set of descriptors was presented in [6]. A radial ripple metaphor was devised to display the passing of time and convey relations among the different constituents through appropriate layout. Note that the method is suitable for an egocentric perspective.
Asia-Pacific Symposium on Visualisation 2007, 5-7 February, Sydney, NSW, Australia. 1-4244-0809-1/07/$20.00 © 2007 IEEE
Figure 1: Arcs with multiplicity at least 8
As the first step of our approach, we integrate network analysis methods [5, 10] with visualisation. In particular, we defined new analysis methods such as (p,q)-core and 4-ring to identify important dense subgraphs and short cycles from the huge bipartite graphs. We applied island analysis for a specific time slice in order to identify important and meaningful subgraphs of the large and complex network. Further, a temporal Kevin Bacon graph and a temporal two mode network are extracted and visualised in order to provide insight and knowledge on the evolution of the IMDB data set.
This paper is organised as follows. In the next Section, we present a simple analysis of the IMDB data set. In Section 3, we present the integration of network analysis methods with visualisation for large bipartite graphs, including (p,q)-core, 4-ring and island. Section 4 presents visual analysis based on the Kevin Bacon number. Section 5 presents a galaxy metaphor visualisation of a temporal two mode actor-movie network, and a visual analysis of the two mode network with company attributes. Section 6 concludes.
2 BASIC CHARACTERISTICS OF IMDB
The source of the original data is the Internet Movie Database. We transformed the contest data into a temporal network with some additional vectors and partitions describing the properties of vertices. The IMDB network is bipartite (two mode) and has 1324748 = 428440 + 896308 vertices and 3792390 arcs. 9927 of the arcs in the network are multiple (parallel) arcs. The nature of the appearance of multiple arcs can be seen in Figure 1, where all arcs with multiplicity of at least 8 are displayed.
Note that in the analysis that follows, we treat multiple arcs as single. The IMDB network consists of 132714 weak components.
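The two preprocessing steps mentioned above, collapsing multiple arcs and counting weak components, can be illustrated with a minimal Python sketch. The toy arc list, the "a:"/"m:" name prefixes and the function names are hypothetical and chosen only for illustration; this is not the tooling used for the IMDB network.

```python
from collections import Counter, defaultdict

def simplify_arcs(arcs):
    """Return the distinct arcs and the multiplicity of every arc."""
    multiplicity = Counter(arcs)
    return sorted(multiplicity), multiplicity

def weak_components(arcs):
    """Count weak components, i.e. components when arc direction is ignored."""
    adj = defaultdict(set)
    for actor, movie in arcs:
        adj[actor].add(movie)
        adj[movie].add(actor)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1                      # a new component starts here
        seen.add(start)
        stack = [start]
        while stack:                    # iterative depth-first search
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count

# Hypothetical toy data: one arc occurs twice (a multiple arc).
toy_arcs = [
    ("a:A1", "m:M1"), ("a:A1", "m:M1"),
    ("a:A2", "m:M2"), ("a:A1", "m:M2"),
    ("a:A3", "m:M3"),
]
simple_arcs, multiplicity = simplify_arcs(toy_arcs)
n_components = weak_components(simple_arcs)
```

Note that vertices without any arc do not appear in an arc list, so a real data set would need them added separately before counting components.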
3 VISUALISATION AND ANALYSIS OF LARGE BIPARTITE NETWORKS
There are few direct specialized methods for analyzing bipartite networks, especially large ones. Because of the size of the IMDB network, the standard reduction of the entire network to one or the other derived 1-mode network was not an option. This motivated us to design and implement two new methods for analysis of bipartite networks:
• a bipartite version of cores: (p,q)-cores
• 4-ring weights on lines

Table 1: (p,q : n1,n2) for IMDB

 p    q:   n1   n2 |  p  q:   n1   n2 |  p  q: n1  n2
 1 1590: 1590    1 | 22 24: 1854 1153 | 43 14: 29  83
 2  516:  788    3 | 23 23:   47   56 | 44 14: 29  83
 3  212: 1705   18 | 24 23:   34   39 | 45 13: 30  95
 4  151: 4330  154 | 25 22:   42   53 | 46 13: 29  94
 5  131: 4282  209 | 26 22:   31   38 | 47 12: 29 101
 6  115: 3635  223 | 27 22:   31   38 | 48 12: 28 100
 7  101: 3224  244 | 28 20:   36   53 | 49 12: 26  95
 8   88: 2860  263 | 29 20:   35   52 | 50 11: 27 111
 9   77: 3467  393 | 30 19:   35   59 | 51 11: 26 110
10   69: 3150  428 | 31 19:   35   59 | 52 11: 16  79
11   63: 2442  382 | 32 19:   34   57 | 53 10: 35 162
12   56: 2479  454 | 33 18:   34   62 | 54 10: 35 162
13   50: 3330  716 | 34 18:   34   62 | 55 10: 34 162
14   46: 2460  596 | 35 18:   33   61 | 56 10: 34 162
15   42: 2663  739 | 36 17:   33   65 | 57  9: 35 187
16   39: 2173  678 | 37 16:   33   75 | 58  9: 33 180
17   35: 2791  995 | 38 16:   30   73 | 59  9: 33 180
18   32: 2684 1080 | 39 16:   29   70 | 60  9: 32 178
19   30: 2395 1063 | 40 15:   29   77 | 61  9: 31 177
20   28: 2216 1087 | 41 15:   28   76 | 62  9: 31 177
21   26: 1988 1087 | 42 15:   28   76 | 63  8: 31 202
3.1 (p,q)-core Analysis
The subset of vertices C ⊆ V is a (p,q)-core in a bipartite (2-mode) network N = (V1,V2;L), V = V1 ∪ V2, if and only if
a. in the induced subnetwork K = (C1,C2;L(C)), C1 = C ∩ V1, C2 = C ∩ V2, it holds that ∀v ∈ C1 : deg_K(v) ≥ p and ∀v ∈ C2 : deg_K(v) ≥ q;
b. C is the maximal subset of V satisfying condition a.
The basic properties of bipartite cores are:
• C(0,0) = V
• K(p,q) is not always connected
• (p1 ≤ p2) ∧ (q1 ≤ q2) ⇒ C(p2,q2) ⊆ C(p1,q1), that is, raising either threshold can only shrink the core
Using (p,q)-cores, we can identify important dense structures in large and complex networks. We designed a very efficient O(m) algorithm to find (p,q)-cores (m being the number of lines), and implemented it in Pajek.
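The definition above suggests a simple peeling procedure: repeatedly delete vertices of V1 with fewer than p remaining neighbours and vertices of V2 with fewer than q remaining neighbours. The following Python sketch illustrates this idea under our own assumptions (an edge list of (V1 vertex, V2 vertex) pairs, hypothetical function name); it is meant as an illustration of the definition, not as the Pajek implementation.

```python
from collections import defaultdict, deque

def pq_core(edges, p, q):
    """Return (C1, C2), the vertex sets of the (p,q)-core of a 2-mode network.

    edges is an iterable of (u, v) pairs with u in V1 and v in V2.
    """
    adj = {1: defaultdict(set), 2: defaultdict(set)}
    for u, v in edges:
        adj[1][u].add(v)
        adj[2][v].add(u)
    need = {1: p, 2: q}                               # degree threshold per mode
    deg = {s: {x: len(nb) for x, nb in adj[s].items()} for s in (1, 2)}
    alive = {s: set(adj[s]) for s in (1, 2)}
    queue = deque((s, x) for s in (1, 2) for x in alive[s]
                  if deg[s][x] < need[s])
    while queue:
        s, x = queue.popleft()
        if x not in alive[s]:
            continue                                  # already deleted earlier
        alive[s].remove(x)
        t = 3 - s                                     # the other mode
        for y in adj[s][x]:
            if y in alive[t]:
                deg[t][y] -= 1
                if deg[t][y] < need[t]:
                    queue.append((t, y))
    return alive[1], alive[2]

# Hypothetical toy network: K_{2,3} plus two pendant lines.
toy = [(m, a) for m in ("M1", "M2") for a in ("A", "B", "C")]
toy += [("M1", "D"), ("M3", "A")]
core_3_2 = pq_core(toy, 3, 2)
```

Every line causes at most one degree update in each direction, so the running time of this sketch is linear in the number of lines plus vertices.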
Since there are many (p,q)-cores, we must decide how to select the interesting ones among them. To help the user in these decisions, we implemented a table of the cores' characteristics: n1 = |C1(p,q)|, n2 = |C2(p,q)| and k, the number of components in K(p,q) (see Table 1). We look for (p,q)-cores where
• n1 + n2 ≤ selected threshold
• there are big jumps from C(p−1,q) and C(p,q−1) to C(p,q).
For example, we selected the (247,2)-core and the (27,22)-core. From the labels we can see that the corresponding topics are wrestling and pornography. See Figures 2 and 3.
3.2 4-ring Analysis
A k-ring is a simple closed chain of length k. Using k-rings we can define a weight of edges as w_k(e) = number of different k-rings containing the edge e ∈ E.
Since for a complete graph K_r, r ≥ k ≥ 3, every edge has weight w_k(K_r) = (r−2)!/(r−k)!, the edges belonging to cliques have large weights. Therefore, these weights can be used to identify the dense parts of a network. For example, all r-cliques of a network belong to the (r−2)-edge cut for the weight w_3.
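The clique formula can be checked by brute force on small complete graphs. The sketch below counts the k-rings through one edge by enumerating simple paths; the function names are hypothetical and the enumeration is exponential, so it is only intended as a sanity check on tiny graphs, not as a practical way to compute ring weights.

```python
from math import factorial

def count_rings_through_edge(adj, u, v, k):
    """Count the k-rings (simple cycles with k edges) containing edge u-v."""
    def walk(node, visited, edges_left):
        if edges_left == 1:
            # The last edge must close the ring by returning to u.
            return 1 if (node != v and u in adj[node]) else 0
        return sum(walk(nxt, visited | {nxt}, edges_left - 1)
                   for nxt in adj[node] if nxt not in visited)
    # A ring through u-v is a simple path v -> ... -> u with k-1 edges.
    return walk(v, {u, v}, k - 1)

def complete_graph(r):
    """Adjacency sets of the complete graph K_r on vertices 0..r-1."""
    return {i: set(range(r)) - {i} for i in range(r)}

r = 6
for k in range(3, r + 1):
    brute = count_rings_through_edge(complete_graph(r), 0, 1, k)
    formula = factorial(r - 2) // factorial(r - k)
    print(k, brute, formula)   # the two counts should agree
```

The agreement reflects the counting argument behind the formula: a ring through a fixed edge is determined by an ordered choice of k−2 further vertices out of the remaining r−2.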
Figure 2: (247,2)-core
Figure 3: (27,22)-core
The 3-ring weights were already available [8]. However, there are no 3-rings in the IMDB network, since a bipartite network contains no cycles of odd length. The densest substructures are complete bipartite subgraphs K_{p,q}, which contain many 4-rings. This motivated us to design a method to compute 4-ring weights, which we implemented in Pajek.
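To make the 4-ring weight concrete, the following Python sketch computes w_4 for every line of a small two-mode network. It rests on a simple observation: a 4-ring through the line u-v is u-v-x-y-u, where x is another neighbour of v and y is another common neighbour of u and x. The function name and the toy data are hypothetical, and this straightforward counting is only an illustration; it is not claimed to be the method implemented in Pajek.

```python
from collections import defaultdict

def four_ring_weights(edges):
    """Map each line (u, v), u in V1 and v in V2, to its number of 4-rings."""
    adj1, adj2 = defaultdict(set), defaultdict(set)
    for u, v in edges:
        adj1[u].add(v)
        adj2[v].add(u)
    weights = {}
    for u, v in set(edges):            # multiple lines are treated as single
        total = 0
        for x in adj2[v]:
            if x != u:
                # v is always a common neighbour of u and x, so exclude it.
                total += len(adj1[u] & adj1[x]) - 1
        weights[(u, v)] = total
    return weights

# In a complete bipartite graph K_{3,4} each line lies on (3-1)*(4-1) 4-rings.
k34 = [(f"m{i}", f"a{j}") for i in range(3) for j in range(4)]
w4 = four_ring_weights(k34)
```

In K_{p,q} every line receives the weight (p−1)(q−1), which is why complete bipartite subgraphs stand out under this weight.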
Table 2: Size distribution of the simple line islands for w_4 (Size: number of vertices in an island; Freq: number of islands of that size)

Size  Freq    Size  Freq    Size  Freq    Size  Freq
--------------------------------------------------------
   2  5512      20    19      38     4      59     2
   3  1978      21    18      39     3      61     1
   4  1639      22    15      40     2      64     1
   5   968      23     9      42     2      67     1
   6   666      24    13      43     3      70     1
   7   394      25    12      45     3      73     1
   8   257      26     6      46     4      76     1
   9   209      27     6      47     5      82     1
  10   148      28     5      48     1      86     1
  11   118      29     6      49     2     106     1
  12    87      30     3      50     2     122     1
  13    55      31     6      51     1     135     1
  14    62      32     5      52     2     144     1
  15    46      33     3      53     1     163     1
  16    39      34     1      54     2     269     1
  17    27      35     5      55     1     301     1
  18    28      36     4      57     1     332     2
  19    29      37     7      58     1     673     1
--------------------------------------------------------
Figure 4: Charlie Brown
To identify interesting substructures, we applied the simple islands procedure for the weight w_4. It takes around three minutes to compute the w_4 weights on a 1400 MHz computer with 1 GB of RAM, and 13 seconds to determine the islands. We obtained 12465 simple line islands on 56086 vertices. Table 2 gives their size distribution.
There are 94 islands of size at least 30, and only 10 of size over 100. The largest island corresponds to wrestling. Each island represents a special topic. We visualised only some of them; for example, see Figures 4, 5, 6, 7 and 8.
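The summary figures quoted above can be recomputed from the size/frequency pairs of Table 2. The short Python check below uses the values transcribed from that table; the variable names are ours.

```python
# Island size -> number of islands of that size, transcribed from Table 2.
size_freq = {
    2: 5512, 3: 1978, 4: 1639, 5: 968, 6: 666, 7: 394, 8: 257, 9: 209,
    10: 148, 11: 118, 12: 87, 13: 55, 14: 62, 15: 46, 16: 39, 17: 27,
    18: 28, 19: 29, 20: 19, 21: 18, 22: 15, 23: 9, 24: 13, 25: 12,
    26: 6, 27: 6, 28: 5, 29: 6, 30: 3, 31: 6, 32: 5, 33: 3, 34: 1,
    35: 5, 36: 4, 37: 7, 38: 4, 39: 3, 40: 2, 42: 2, 43: 3, 45: 3,
    46: 4, 47: 5, 48: 1, 49: 2, 50: 2, 51: 1, 52: 2, 53: 1, 54: 2,
    55: 1, 57: 1, 58: 1, 59: 2, 61: 1, 64: 1, 67: 1, 70: 1, 73: 1,
    76: 1, 82: 1, 86: 1, 106: 1, 122: 1, 135: 1, 144: 1, 163: 1,
    269: 1, 301: 1, 332: 2, 673: 1,
}

islands = sum(size_freq.values())                          # number of islands
vertices = sum(size * freq for size, freq in size_freq.items())
at_least_30 = sum(f for s, f in size_freq.items() if s >= 30)
over_100 = sum(f for s, f in size_freq.items() if s > 100)
print(islands, vertices, at_least_30, over_100)
```

The four printed values correspond to the number of islands, the number of vertices they cover, and the two counts stated in the text.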
3.3 Time slices and Island Analysis
By extracting a time slice from the complete network, we can identify the main groups in selected time periods. Islands can identify important subgraphs of large networks based on the value of attributes [4].
To illustrate this, we extracted the time slice 1935-1950. There are 223 simple islands [4] for w_4 on 1774 vertices. For example, we selected island 6, ’Dona Macabra’; see Figure 9.
4 TEMPORAL CO-STARRING NETWORK: KEVIN-BACON NETWORK
We extracted a small important subset of the actors in the IMDB network and constructed from it a dynamic visualisation of a 1-mode network showing the co-appearance of actors in films.
To define a sufficiently small important subgraph, we first considered only nodes in the network with a Kevin Bacon number of 1. The Kevin Bacon number of an actor is a similar concept to the Erdős number of a mathematician; it represents the length of the shortest path in the movie star collaboration network from the actor to Kevin Bacon.
Figure 5: Mower, Jack and Phelps, Lee
Figure 6: Adult
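Since the Kevin Bacon number is a shortest-path length in the co-starring network, it can be computed by a breadth-first search directly on the two-mode data, without building the 1-mode network first. The Python sketch below illustrates this; the cast dictionary, the actor labels and the function name are hypothetical placeholders rather than IMDB data.

```python
from collections import defaultdict, deque

def bacon_numbers(cast, source):
    """Return {actor: shortest co-starring distance from source}.

    cast maps each movie to the collection of actors appearing in it.
    Actors not connected to the source are absent from the result.
    """
    movies_of = defaultdict(set)
    for movie, actors in cast.items():
        for actor in actors:
            movies_of[actor].add(movie)
    dist = {source: 0}
    seen_movies = set()
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for movie in movies_of[actor]:
            if movie in seen_movies:
                continue               # its whole cast was already reached
            seen_movies.add(movie)
            for co_star in cast[movie]:
                if co_star not in dist:
                    dist[co_star] = dist[actor] + 1
                    queue.append(co_star)
    return dist

# Hypothetical toy cast lists.
toy_cast = {
    "Movie 1": ["Bacon", "Actor A"],
    "Movie 2": ["Actor A", "Actor B"],
    "Movie 3": ["Actor C"],
}
numbers = bacon_numbers(toy_cast, "Bacon")
kb1 = {a for a, d in numbers.items() if d == 1}   # actors with number 1
```

Each movie is expanded only once, so the search visits every actor-movie line a bounded number of times.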
The data set was divided into time slices of a decade in length (e.g. 1920s, 1930s, etc.), and the set of actors reduced in each decade to only those who had co-starred in at least 5 films with another actor with a Kevin Bacon number of 1. The sizes of the graphs for each of these time slices are given in Table 3.
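One possible reading of this filtering step is sketched below in Python: within a decade, count for every pair of actors with Kevin Bacon number 1 the films in which both appear, and keep the pairs that reach the threshold. The input format, the function name and the choice to keep only the qualifying pairs as edges are our assumptions; the text does not specify these details.

```python
from collections import Counter
from itertools import combinations

def decade_costar_graph(films, kb1, decade_start, min_films=5):
    """Return {(a, b): count} for pairs of kb1 actors who co-starred in at
    least min_films films released in [decade_start, decade_start + 10).

    films is an iterable of (year, cast) pairs; kb1 is a set of actors.
    """
    pair_counts = Counter()
    for year, cast in films:
        if decade_start <= year < decade_start + 10:
            members = sorted(set(cast) & kb1)      # canonical pair order
            pair_counts.update(combinations(members, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_films}

# Hypothetical toy data: A and B share five films in the 1980s, A and C four.
toy_films = [(1981 + i, ["A", "B", "X"]) for i in range(5)]
toy_films += [(1985 + i, ["A", "C"]) for i in range(4)]
toy_films += [(1991, ["A", "B"])]
edges_1980s = decade_costar_graph(toy_films, {"A", "B", "C"}, 1980)
nodes_1980s = {actor for pair in edges_1980s for actor in pair}
```

The edge counts returned here could also serve as the edge widths used in the visualisation.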
The 1-mode co-starring networks of these reduced sets of actors were constructed for each decade, and a three-dimensional layout was generated for each using the scale-free network layout [2] in GEOMI [1]. Nodes in the force-directed layout were restricted to lie on one of three concentric spheres, depending on the degree of the node [2]. The colouring of each node was also used to indicate the degree. The size of each node was dependent on the number of movies in which the corresponding actor starred in that particular decade. Similarly, the width of an edge was used to represent the number of co-appearances between two actors in a decade.
Figure 7: Shawqi, Farid and El-Meliguy, Mahmoud
Figure 8: Polizeiruf 110 and Starkes Team
To effectively illustrate the evolution of the co-starring network, we display smooth animations between the layouts of subsequent decades. The animations are broken into several parts shown one after the other in time, in order to aid retention of the mental map. First, nodes and edges not present in the second layout are faded out. Nodes present in both first and second layouts are then animated to their new positions in the second layout. Nodes new to the second layout burst out from the centre and come to rest in their calculated positions, and finally new edges are faded in to show the new collaborations in the second decade. The animation is downloadable from http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05-final.avi
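The staging of the animation amounts to a few set operations on two consecutive layouts. The Python sketch below lists the phases for one transition, assuming that the first phase removes the elements that are absent from the second layout; the phase names and the function are hypothetical and do not reflect the GEOMI API.

```python
def transition_phases(old_nodes, old_edges, new_nodes, new_edges):
    """Return the ordered animation phases between two decade layouts.

    Each phase is a (name, nodes, edges) triple naming the affected elements.
    """
    old_nodes, new_nodes = set(old_nodes), set(new_nodes)
    old_edges, new_edges = set(old_edges), set(new_edges)
    return [
        # 1. Elements that disappear are faded out.
        ("fade_out", old_nodes - new_nodes, old_edges - new_edges),
        # 2. Surviving nodes move to their positions in the second layout.
        ("move", old_nodes & new_nodes, set()),
        # 3. New nodes burst out from the centre.
        ("burst_in", new_nodes - old_nodes, set()),
        # 4. New edges are faded in last.
        ("fade_in_edges", set(), new_edges - old_edges),
    ]

# Hypothetical toy transition between two decades.
phases = transition_phases(
    {"a", "b"}, {("a", "b")},
    {"b", "c"}, {("b", "c")},
)
```

Showing the phases one after the other, rather than all at once, is what helps the viewer keep track of the surviving nodes.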
Figure 9: Dona Macabra
KB1                              V        E
Initial                    1324748  3792390
all decades, no filtering     2742   336060
1910s, ≥ 5 films                16       18
1920s, ≥ 5 films                 4        2
1930s, ≥ 5 films                25       53
1940s, ≥ 5 films                17       17
1950s, ≥ 5 films                19       18
1960s, ≥ 5 films                16       35
1970s, ≥ 5 films                79      411
1980s, ≥ 5 films                59       73
1990s, ≥ 5 films               207      425
2000s, ≥ 5 films               124      208

Table 3: Graph sizes per decade of co-starring network
This process was continued for all decade slices from 1911 through to 2004, and the result can be seen in the downloadable animation. Figures 10, 11, 12, 13 and 14 show snapshots of the animation from the 1960s through to the early 2000s.
The visualisation revealed a number of interesting facts. One unexpected finding was the substantial number of actors with a Kevin Bacon number of 1 in the early years of the twentieth century, some of whom could clearly not have co-starred in a film with Kevin Bacon. This revealed some problems in the collection of the movie data set. The years of some movies had been recorded incorrectly, while edges to other movies that possessed the same name as a movie of a prior decade were all recorded as belonging to the earlier movie.
In the 1960s (Figure 10), the visualisation shows a clique involving the US president John F. Kennedy. This is due to the assassination of Kennedy in 1963, and the subsequent barrage of documentaries that were produced detailing the event. The other actors in the clique (Jacqueline Kennedy, John and Nellie Connally, etc.) were all present at the assassination. They are present in this data set since the movie JFK, starring Kevin Bacon, included real archive footage of the assassination. The Kennedys continue through to later decades in the visualisation, illustrating the vast number of documentary films that were based on this event.
The 1970s, shown in Figure 11, see the first large connected group of Hollywood actors that continue as big names to this day. James Earl Jones, Robert Redford, Steve Martin and John Travolta all appear in this group.
Figure 10: The co-starring actors visualisation (1960s)
Figure 11: The co-starring actors visualisation (1970s)
The visualisation of the 1980s (Figure 12) highlights some particularly close-knit groups of actors. Comedy stars Chevy Chase, Dan Aykroyd and Bill Murray appear due to roles in Saturday Night Live, Caddyshack and Spies Like Us. Also present are Jim Cummings, Jack Angel and Rob Paulsen, who have quite high degrees due to their involvement as voice actors in many short cartoons and episodes.
These groups continue into the 1990s, where the groups of actors become much larger and more highly connected (Figure 13). More well-established modern actors like Whoopi Goldberg, Tom Hanks and Dennis Hopper become particularly prominent in this decade.
Finally, in the 2000s, we see some particularly interesting and unexpected phenomena (Figure 14). First, music stars such as Britney Spears, Beyonce Knowles and Sheryl Crow appear with very high degree and connectedness, due to their participation in numerous music award shows. Secondly, on the other side of the visualisation, popular actor Arnold Schwarzenegger links politicians to the movie stars and musicians in the rest of the co-starring network. This was primarily due to Schwarzenegger’s entry into politics, in becoming the governor of the US state of California. Following this event, he was in several political documentaries in which Bill Clinton also appeared. Bill Clinton, in turn, is linked through documentaries and archival footage to other famous politicians, such as Ronald Reagan, Richard Nixon and John F. Kennedy.
Figure 12: The co-starring actors visualisation (1980s)
Figure 13: The co-starring actors visualisation (1990s)
5 A GALAXY OF MOVIE STARS OF TEMPORAL ACTOR-MOVIE NETWORK
This section describes a galaxy of movie stars of the temporal actor-movie network with animation (in order to see the overview), and a visualisation of the network for a specific time slice (in order to see the details).
First we consider a “galaxy of stars” metaphor of the movie-actor network. The main idea is to map the “movie stars” to a movie (i.e. animation) of a galaxy of stars which displays actor-movie interactions.
Representing as much information as possible without introducing overwhelming visual complexity has always been a challenge when visualising large data sets. We define important subgraphs to reduce visual complexity as follows.
Figure 14: The co-starring actors visualisation (2000s)
We define the “stars” from the IMDB as follows:
• every star actor must have been in more than 12 movies over the whole time period
• every star movie must have more than 12 actors
• each star actor must have played in between three and six movies in each year
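The three conditions above can be read as a filter on the actor-movie lines of a given year. The Python sketch below shows one such reading: a line is kept if the actor has more than 12 movies overall, the movie has more than 12 actors, and the actor has between three and six movies in that year. The input format, the function name and the decision to count all of an actor's movies in the year (not only star movies) are our assumptions.

```python
from collections import defaultdict

def star_subgraph(appearances, year, min_total=12, per_year=(3, 6)):
    """Return the set of (actor, movie) lines of the given year that
    connect a star actor to a star movie.

    appearances is an iterable of (actor, movie, year) triples.
    """
    movies_of = defaultdict(set)       # actor -> all movies
    actors_of = defaultdict(set)       # movie -> all actors
    in_year = defaultdict(set)         # actor -> movies in the given year
    for actor, movie, y in appearances:
        movies_of[actor].add(movie)
        actors_of[movie].add(actor)
        if y == year:
            in_year[actor].add(movie)
    low, high = per_year
    return {
        (actor, movie)
        for actor, movies in in_year.items()
        if len(movies_of[actor]) > min_total and low <= len(movies) <= high
        for movie in movies
        if len(actors_of[movie]) > min_total
    }

# Hypothetical toy data: actor "S" has 13 movies, four of them in 1918;
# movie "m0" has 13 actors ("S" plus twelve extras).
toy = [("S", f"m{i}", 1918 if i < 4 else 1920 + i) for i in range(13)]
toy += [(f"e{j}", "m0", 1918) for j in range(12)]
stars_1918 = star_subgraph(toy, 1918)
```

Only the line between "S" and "m0" survives in this toy example, because the extras have too few movies and the other 1918 movies have too few actors.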
We again use a bipartite (2-mode) network model. There are twotypes of nodes: actor nodes and movie nodes. Actor nodes are dis-played as stars in the night sky, and edges are displayed as faintlines joining up “constellations” of actors (See Figure 15). Edgeswith bends are displayed between actor and movie nodes; however,movie nodes are hidden; in this manner, collaboration between ac-tors can easily be seen. In this case, the picture not only reduces thevisual complexity (especially for edges), but also represents actor-movie and actor-actor interactions at the same time.
To produce an overview of the temporal network dynamics, we computed a layout for each year from 1907 to 2004 and produced an animation. A two-dimensional force-directed layout was generated for each year’s subgraph using GEOMI [1]. The animation is performed between consecutive layouts, in a similar manner to the animation of the co-starring actors network in the previous section. The animation is available from http://www.it.usyd.edu.au/~dmerrick/gd05contest/gd05-final.avi
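One simple way to animate between two consecutive yearly layouts is to interpolate node positions linearly. The Python sketch below illustrates that idea only. The text does not specify the interpolation scheme used by GEOMI, so linear in-betweening is an assumption, as are the function name interpolate_layouts and the choice to hold a node in place when it appears in only one of the two layouts.

```python
def interpolate_layouts(pos_a, pos_b, steps):
    """Return `steps` frames moving from layout pos_a to layout pos_b.

    pos_a, pos_b: dict mapping node -> (x, y) for two consecutive years.
    The last frame equals pos_b for every node present in pos_b.
    """
    frames = []
    for i in range(1, steps + 1):
        t = i / steps  # interpolation parameter in (0, 1]
        frame = {}
        for node in pos_a.keys() | pos_b.keys():
            # A node missing from one layout is held at its only known
            # position (an assumption; fading in or out is another option).
            xa, ya = pos_a.get(node, pos_b.get(node))
            xb, yb = pos_b.get(node, pos_a.get(node))
            frame[node] = (xa + t * (xb - xa), ya + t * (yb - ya))
        frames.append(frame)
    return frames
```

Chaining the frames for every pair of consecutive years from 1907 to 2004 would give one continuous sequence of node positions to render.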
Having obtained an overview of the temporal network from the animation, we now focus on the details of specific years in order to observe some interesting patterns in specific time periods.
Figure 16 shows part of the layout for the year 1918. The three actors shown co-starred in five movies together and did not appear in any other movies. Only one of the movies includes actors from outside the group. This kind of pattern is common in the early years.
Figures 17 and 18 show a different pattern. They are both captured from the layout for the year 1983. In Figure 17, nineteen actors co-starred in a masterpiece. In Figure 18, the same group of people starred in a series of movies together, whilst also appearing in other movies with actors from outside the group. Compared to the pattern of the early years in Figure 16, one may gain some knowledge and insight about the trends of the movie industry from Figure 17.
Figure 15: A frame from the galaxy of stars animation
Figure 16: Actor collaboration pattern in early years.
Further insights can be discovered by combining company attributes with the visualisation, as Figures 19 to 22 show. There are two clusters in 1985. To assist with analysis, we display the movie nodes with their labels. The two clusters are normal movies and adult movies.
Figures 19 to 22 show some patterns in the evolution: before the 1990s, these two types of movies were clearly separated, meaning that they were produced by different companies with different actors. That is, the two groups seldom collaborated. However, the two groups later started to merge into one big group, as actors began to move between different companies for collaboration. For example, in the year 1994 it is difficult to separate the two groups in the picture. This may indicate a change in the movie industry, as well as in the social network of actors. This visualisation can be a useful supplement to formal analysis methods.
6 CONCLUSION
Integration of good analysis methods with proper visualisation methods is an effective approach to gain insight into large and complex networks. Our next step is to further integrate various analysis methods with visualisation on different data sets. A formal evaluation of the insights and knowledge derived then needs to be carried out.
Figure 17: Many actors co-starring in one movie.
Figure 18: Same group of people in several movies.
Ultimately, appropriate interaction methods need to be integrated in order to complete our visual analysis framework for large and complex networks.
REFERENCES
[1] A. Ahmed, T. Dwyer, M. Forster, X. Fu, J. Ho, S. Hong, D. Koschutzki, C. Murray, N. Nikolov, A. Tarassov, R. Taib and K. Xu, GEOMI: GEometry for Maximum Insight, Proc. of Graph Drawing 2006, pp. 468-479, 2006.
[2] A. Ahmed, T. Dwyer, S. Hong, C. Murray, L. Song and Y. Wu, Visualisation and Analysis of Large and Complex Scale-free Networks, Proc. of EuroVis 2005, pp. 18, 2005.
[3] D. Auber, Y. Chiricota, F. Jourdan and G. Melançon, Multiscale Visualization of Small World Networks, Proc. of InfoVis, pp. 75-81, 2003.
[4] V. Batagelj, Analysis of large networks - Islands, Dagstuhl seminar 03361: Algorithmic Aspects of Large and Complex Networks, 2003.
[5] U. Brandes and T. Erlebach, Network Analysis: methodological foundations, Springer, 2005.
[6] U. Brandes, M. Hoefer and C. Pich, Affiliation Dynamics with an Application to Movie-Actor Biographies, Proc. of EuroVis 2006, pp. 179-186, 2006.
[7] Graph Drawing 2005 Competition, http://gd2005.org/
[8] Pajek, http://vlado.fmf.uni-lj.si/pub/networks/pajek/
[9] Sunbelt XXVI 2006 Viszard Session.
[10] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, 1994.
Figure 19: Layout of 1985
Figure 20: Layout of 1988
Figure 21: Layout of 1991
Figure 22: Layout of 1994