Computing & Information Sciences, Kansas State University
Laboratory for Knowledge Discovery in Databases
PhD Research Proficiency Exam
Jing Xia
Laboratory for Knowledge Discovery in Databases
Department of Computing and Information Sciences
Kansas State University
http://www.kddresearch.org
http://www.cis.ksu.edu/~xiajing
Social Network Analysis using Link Mining
Outline
Social Network Introduction
Networks in Biological Systems
Mining on Social Networks
- Link Mining
- Multi-Relational Mining
Problem Specification
Proposed approach
Social Network Introduction
What is a Social Network?
- A social network is a heterogeneous and multirelational data set represented by a graph.
Characteristics of Social Networks
- “Natural” networks and universality
- Quantitative measures
Mining Social Networks
- Link Mining: tasks and challenges
Society
Nodes: individuals
Links: social relationships (family/work/friendship/etc.)
S. Milgram (1967): Six Degrees of Separation, which appears to be universal across “natural” networks
Society networks: many individuals with diverse social interactions between them.
Communication
The Earth is developing an electronic system: a network with diverse nodes and links, such as
- computers
- routers
- satellites
- phone lines
- TV cables
- EM waves
Communication networks: many non-identical components with diverse connections between them.
Epidemiology
Nodes: doctors, patients, geographical locations
Links: contact relationships (direct/indirect infectiousness)
Characteristics of Social Network
Consider many kinds of networks: social, technological, business, economic, content, …
These networks tend to share certain informal properties:
- Multi-relational interaction
- Temporal (time-evolving)
- Large scale; continual growth
- Distributed, organic growth: vertices “decide” who to link to
- Mixture of local and long-distance connections
- Abstract notions of distance: geographical, content, social, …
Social Network Theory
Do natural networks share more quantitative universals?
- What would these “universals” be?
- How can we make them precise and measure them?
- How can we explain their universality?
This is the domain of social network theory
- Sometimes also referred to as link analysis
Quantitative Measure
Connected components:
- how many, and how large?
Network diameter:
- maximum (worst-case) or average?
- exclude infinite distances? (disconnected components)
- the small-world phenomenon
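The two measures on this slide can be computed with breadth-first search. The sketch below is a minimal illustration and not part of the original slides: the function names and the toy graph are hypothetical, the graph is assumed to be undirected and unweighted, and infinite distances between disconnected components are excluded from the diameter statistics.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """Return the components of an undirected graph as a list of node sets."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


def diameter_stats(adj):
    """Return (maximum, average) shortest-path length over reachable pairs.

    Pairs in different components (infinite distance) are skipped.
    """
    lengths = [
        d
        for s in adj
        for t, d in bfs_distances(adj, s).items()
        if t != s
    ]
    if not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(lengths)


# Hypothetical toy graph: a path a-b-c-d plus a separate pair e-f.
toy = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"],
    "e": ["f"], "f": ["e"],
}
components = connected_components(toy)
max_len, avg_len = diameter_stats(toy)
```

Reporting both the maximum and the average over reachable pairs is one way to answer the "maximum or average?" and "exclude infinite distances?" questions; other conventions (for example, measuring only the largest component) are equally defensible.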
Quantitative Measure
Clustering:
- to what extent do links tend to cluster “locally”?
- what is the balance between local and long-distance connections?
- what roles do the two types of links play?
Degree distribution:
- what is the typical degree in the network?
- what is the overall distribution?
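Both questions can be made concrete with a few lines of code. The sketch below is illustrative only: it assumes an undirected graph stored as a dict of neighbor sets, it uses the local clustering coefficient (the fraction of a node's neighbor pairs that are themselves linked) as the notion of "local" clustering, and the toy graph and function names are hypothetical.

```python
from collections import Counter


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have degree k."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of node that are linked to each other."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in adj[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
dist = degree_distribution(toy)
avg_cc = average_clustering(toy)
```

In the toy graph the triangle contributes high local clustering while the pendant node contributes none, which is the local versus long-distance balance the slide asks about.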
Bio-Map
[Figure: layers of the bio-map. GENOME: protein-gene interactions. PROTEOME: protein-protein interactions. METABOLISM: bio-chemical reactions (example: citrate cycle)]
Protein-Protein Interaction Network
[Figure: PROTEOME layer, protein-protein interactions]
Protein-Protein Interaction Network
Nodes: proteins
Links: multi-relational
- physical interactions (binding)
- complex membership
- pathway
P. Uetz, et al. Nature 403, 623-7 (2000).
Link Mining
Traditional machine learning and data mining approaches assume:
- data is flat: a single table of independent instances
Typical real data set
- Instances in the data set form linked networks
Link Mining
- Newly emerging research area at the intersection of research in social network and link analysis, hypertext and web mining, graph mining, relational learning and inductive logic programming
Link Mining Tasks
Object-Related Tasks
- Link-based object ranking
- Link-based object classification
- Object clustering (group detection)
- Object identification (entity resolution)
Link-Related Tasks
- Link prediction
Graph-Related Tasks
- Subgraph discovery
- Graph classification
Multi-relational Link Mining
Traditional link mining assumes there is only one kind of relation in the network: the link is flat
There exist multiple, heterogeneous social networks, each representing a particular kind of relationship
- Multi-relational & heterogeneous
Multi-relational Network
Multi-relational & heterogeneous network
- Multiple object and link types
Example networks
- Medical network: patients, doctors, diseases, contacts, treatments
- Bibliographic network: years, publications, authors, venues
- Epidemic transmission network (involves temporal data; multi-relational: airborne, patients’ contacts)
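One possible in-memory representation of such a network keeps a type label per object and a separate adjacency map per link type. The sketch below is an assumption made for illustration, not a structure given in the slides: the class name, method names, relation names ("contact", "treats", "diagnosed_with") and the example objects are all hypothetical.

```python
from collections import defaultdict


class MultiRelationalNetwork:
    """Typed objects plus one undirected adjacency map per relation."""

    def __init__(self):
        self.node_types = {}  # object -> object type
        # relation name -> object -> set of linked objects
        self.relations = defaultdict(lambda: defaultdict(set))

    def add_node(self, node, node_type):
        self.node_types[node] = node_type

    def add_link(self, relation, u, v):
        """Add an undirected link of the given relation type."""
        self.relations[relation][u].add(v)
        self.relations[relation][v].add(u)

    def neighbors(self, node, relation):
        """Objects linked to node through one specific relation."""
        return set(self.relations.get(relation, {}).get(node, ()))

    def relations_between(self, u, v):
        """Names of all relations that link u and v."""
        return sorted(r for r, adj in self.relations.items() if v in adj.get(u, ()))

    def relation_network(self, relation):
        """Project onto one relation, giving a flat adjacency map."""
        return {u: set(vs) for u, vs in self.relations.get(relation, {}).items()}


# Hypothetical medical network with three link types.
net = MultiRelationalNetwork()
for node, kind in [("p1", "patient"), ("p2", "patient"),
                   ("d1", "doctor"), ("flu", "disease")]:
    net.add_node(node, kind)
net.add_link("contact", "p1", "p2")
net.add_link("treats", "d1", "p1")
net.add_link("treats", "d1", "p2")
net.add_link("diagnosed_with", "p1", "flu")
```

Keeping one adjacency map per relation makes it cheap to project the data onto a single relation, which recovers the flat, single-relation view that traditional link mining assumes.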
Problem Specification
Phenomenon: heterogeneity and multiple relationships exist in many real networks
Rationale: this might be useful for link mining
Problem
- Can we utilize multiple relationships to help link analysis?
- How to extract relations as a relation network (RN)?
- How to identify the relationship among relation networks? (correlated, independent, etc.)
- Is the RN time-evolving? Which relation plays an important role?
Problem Example 1
Application domain: epidemic disease
Pre-condition 1: given multiple relations: patients’ contact networks along a timeline
Pre-condition 2: sequential relationship among relations
Pre-condition 3: another medium of disease transmission
Problem: can we predict whether a person will be infected, based on mining these multi-relational networks?
Problem Example 2
Application domain: bibliographic network
Pre-condition 1: given multiple relations: the co-author relation networks of a conference over several years
Problem 1: what is the relationship among these relation networks?
Problem 2: how can we utilize the relationship to answer the user’s query?
Mining Hidden Community in Heterogeneous Social Networks, Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, and Jiawei Han, March, Report No. UIUCDCS-R-2005-2538, UILU-ENG-2005-1731
Problem Example 3
Application domain: bibliographic network
Pre-condition 1: given multiple relations: the co-author networks of a conference over several years
Pre-condition 2: topics of publications
Problem: can we predict whether two researchers will be co-authors in the future, based on the two types of networks?
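To make the prediction task concrete, the sketch below scores each pair of researchers who have not yet co-authored by mixing two signals: overlap of their past co-authors and overlap of their publication topics. This is a simple stand-in baseline, not the approach proposed in these slides. The Jaccard measure, the mixing weight alpha, the function names and the toy data are all assumptions.

```python
def jaccard(a, b):
    """Intersection size over union size (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coauthor_link_scores(coauthors, topics, alpha=0.5):
    """Rank not-yet-linked researcher pairs, highest score first.

    coauthors: researcher -> set of past co-authors (first network)
    topics:    researcher -> set of publication topics (second network)
    alpha:     hypothetical weight of the co-author signal versus topics
    """
    people = sorted(coauthors)
    scores = {}
    for i, u in enumerate(people):
        for v in people[i + 1:]:
            if v in coauthors[u]:
                continue  # already co-authors, nothing to predict
            structural = jaccard(coauthors[u], coauthors[v])
            topical = jaccard(topics.get(u, set()), topics.get(v, set()))
            scores[(u, v)] = alpha * structural + (1 - alpha) * topical
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data.
coauthors = {
    "ann": {"bob"},
    "bob": {"ann", "cat"},
    "cat": {"bob"},
    "dan": set(),
}
topics = {
    "ann": {"link mining", "graphs"},
    "bob": {"graphs"},
    "cat": {"link mining", "graphs"},
    "dan": {"databases"},
}
ranking = coauthor_link_scores(coauthors, topics)
```

Here ann and cat share a co-author and the same topics, so they rank first. How the two networks should really be combined is the open question this slide poses.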
Proposed approach
Random walk with restart
[Figure: example graph with 12 nodes; Node 4 is the query node and each node is colored by its RWR score]
RWR scores with respect to Node 4:
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02
More red, more relevant
Nearby nodes, higher scores
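A random walk with restart can be sketched as a power iteration: at each step the walker returns to the query node with some restart probability and otherwise moves to a uniformly chosen neighbor. The code below is a minimal sketch under assumptions, since the slides give no implementation: the restart probability of 0.15, the stopping rule, the function name and the toy graph are hypothetical, and the graph is not the 12-node example from the figure, whose edges are not recoverable here.

```python
def random_walk_with_restart(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walk restarting at seed.

    adj maps each node to a list of neighbors; the graph is assumed to be
    undirected and every node is assumed to have at least one neighbor.
    """
    scores = {v: 0.0 for v in adj}
    scores[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in adj}
        for u, mass in scores.items():
            # Spread the non-restarting share of u's mass over its neighbors.
            share = (1.0 - restart) * mass / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[seed] += restart  # the restarting share goes back to the seed
        change = sum(abs(nxt[v] - scores[v]) for v in adj)
        scores = nxt
        if change < tol:
            break
    return scores


# Hypothetical toy graph: triangle a-b-c and triangle d-e-f joined by link c-d.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = random_walk_with_restart(toy, "a")
ranking = sorted(scores, key=scores.get, reverse=True)
```

With the query node in the first triangle, every node of that triangle ends up with a higher score than any node of the far triangle, matching the "nearby nodes, higher scores" observation.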
Proposed approach
Basic idea
- RWR serves as a measure of proximity between two nodes in a network
- Model the relationship among multiple relations using RWR
Purpose
- Facilitate mining more interesting patterns
- Increase prediction accuracy
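The slides do not state a formula for the proximity measure, so the following is a commonly used formulation given as an assumption. The symbols are introduced here: \tilde{W} is the column-normalized adjacency matrix, \mathbf{e}_i is the indicator vector of the query node i, c is the restart probability with 0 < c <= 1, and entry j of \mathbf{r}_i is the proximity of node j to node i.

```latex
\mathbf{r}_i = (1 - c)\,\tilde{W}\,\mathbf{r}_i + c\,\mathbf{e}_i
\qquad\Longrightarrow\qquad
\mathbf{r}_i = c\,\bigl(I - (1 - c)\,\tilde{W}\bigr)^{-1}\mathbf{e}_i
```

The closed form follows by moving the (1 - c)\tilde{W}\mathbf{r}_i term to the left-hand side. The inverse exists because the spectral radius of (1 - c)\tilde{W} is at most 1 - c, which is below 1.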
Measure Relationship
Q: what is the most related conference to ICDM?
A: RWR!
[Figure: ICDM linked to KDD, SDM, ECML, PKDD, PAKDD, CIKM, DMKD, SIGMOD, ICML and ICDE, with RWR scores between 0.004 and 0.011]
Neighborhood Formulation [Sun ICDM2005]
Multi-Relational Model
Relations: ICDM author network, KDD author network, PKDD author network, ICML author network
[Figure: relation network whose nodes are the conferences ICDM, KDD, SDM, ECML, PKDD, PAKDD, CIKM, DMKD, SIGMOD, ICML and ICDE, with link weights between 0.004 and 0.011]
relation network
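As a rough illustration of turning several author networks into one relation network, the sketch below treats each conference's co-author network as a set of undirected links and weights each pair of conferences by the Jaccard overlap of their link sets. The Jaccard weight is a stand-in chosen for brevity; the slides propose RWR for this step. The function names and author names are hypothetical.

```python
def edge_set(pairs):
    """Normalize undirected links so (u, v) and (v, u) compare equal."""
    return {frozenset(p) for p in pairs}


def relation_network(relations):
    """Return {(relation_a, relation_b): weight} for every pair of relations.

    The weight is the Jaccard overlap of the two link sets, an assumption
    made for this sketch only.
    """
    names = sorted(relations)
    weights = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ea, eb = edge_set(relations[a]), edge_set(relations[b])
            union = ea | eb
            weights[(a, b)] = len(ea & eb) / len(union) if union else 0.0
    return weights


# Hypothetical co-author links per conference.
relations = {
    "ICDM": [("ann", "bob"), ("bob", "cat")],
    "KDD": [("bob", "ann"), ("cat", "dan")],
    "ICML": [("eve", "fay")],
}
weights = relation_network(relations)
```

In this toy data ICDM and KDD share one of three distinct co-author links, while ICML shares none with either, so only the ICDM-KDD pair gets a non-zero weight.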
Other Applications
Content-based Image Retrieval [He]
Personalized PageRank [Jeh], [Widom], [Haveliwala]
Anomaly Detection (for node; link) [Sun]
Link Prediction [Getoor], [Jensen]
Semi-supervised Learning [Zhu], [Zhou]…
Summary
Social Network Analysis
Link mining
Problem: multi-relational
Proposed approach
Thank you