Data Analysis in YouTube
Introduction
• Social network + video-sharing medium
– A potential environment for propagating influence.
• Friendship network and subscribers network
– Friendship network : undirected graph
– Subscribers network : directed graph
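The difference between the two networks can be sketched with plain adjacency sets. This is only a minimal illustration: the user names and the helpers `add_friendship` and `add_subscription` are hypothetical and not part of any YouTube interface.

```python
from collections import defaultdict

# Undirected friendship network: every edge is stored in both directions.
friends = defaultdict(set)

def add_friendship(u, v):
    friends[u].add(v)
    friends[v].add(u)

# Directed subscribers network: an edge u -> v means "u subscribes to v".
subscriptions = defaultdict(set)

def add_subscription(subscriber, channel):
    subscriptions[subscriber].add(channel)

# Hypothetical users, for illustration only.
add_friendship("alice", "bob")
add_subscription("alice", "carol")

# Friendship is symmetric; subscription is not.
print("alice" in friends["bob"])
print("alice" in subscriptions["carol"])
```

Storing each friendship twice keeps neighbor lookups symmetric, while a subscription is recorded only on the subscriber's side.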
Data collection
• Why crawl social media?
– The prospective graph is large and dynamic.
• Writing a crawling script vs. using the YouTube API
– Application programming interface (API): public web interface provided by Google
– Private groups cannot be crawled.
• Snow-ball sampling – a kind of BFS
– Focus on WCC (weakly connected component)
– Does not contain isolated nodes or nodes outside the large WCC
• This fraction is not large.
• Two hops are considered.
– Measurements show that after three hops, on average, videos propagate through other social media such as Facebook (the exact number depends on the application).
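Snowball sampling with a hop limit can be sketched as a breadth-first search. This is a minimal sketch under stated assumptions: `get_neighbors` is a hypothetical callback standing in for the crawler or API call, and it is assumed to return a user's neighbors regardless of edge direction, so that the sample stays inside one weakly connected component. The toy graph is invented for illustration.

```python
from collections import deque

def snowball_sample(get_neighbors, seeds, max_hops=2):
    """Breadth-first snowball sample.

    Returns a dict mapping each reached node to its hop distance
    from the nearest seed. Nodes at max_hops are kept but not expanded.
    """
    hops = {seed: 0 for seed in seeds}
    queue = deque(hops)
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # hop limit reached: do not crawl further from here
        for neighbor in get_neighbors(node):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Toy graph standing in for crawled data (hypothetical users).
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
    "z": [],  # isolated node: never reached from the seed
}

sample = snowball_sample(lambda u: toy_graph.get(u, []), seeds=["a"], max_hops=2)
print(sample)
```

With the seed `"a"` and two hops, node `"e"` (three hops away) and the isolated node `"z"` are left out of the sample, which mirrors the limitation noted above.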
Interaction based measurement
• Passive user
– Users who produce little content (e.g. comments or uploads) cannot be influential.
• This holds even for friends and friends of friends.
– They should be removed during sampling or modeled with a small weight.
• Weighted graph
– If a subscriber makes more comments, this indicates that the influence is stronger.
• Edge with a high weight (e.g. a function of mutual interaction)
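One way to turn these ideas into a weighted graph is sketched below. Everything here is an assumption made for illustration: the interaction log, the `MIN_ACTIVITY` threshold for a passive user, and the choice of `min` as the "function of mutual interaction" are not taken from measurements.

```python
from collections import Counter

# Hypothetical interaction log: (commenter, channel owner) pairs.
interactions = [
    ("alice", "bob"), ("alice", "bob"), ("bob", "alice"),
    ("carol", "bob"),
]
users = {"alice", "bob", "carol", "dave"}  # dave never comments

# Activity per user: number of comments written.
activity = Counter(src for src, _ in interactions)

MIN_ACTIVITY = 1  # assumed threshold below which a user counts as passive
active_users = {u for u in users if activity[u] >= MIN_ACTIVITY}

# Directed comment counts between active users only.
counts = Counter(
    (src, dst) for src, dst in interactions
    if src in active_users and dst in active_users
)

def mutual_weight(u, v):
    """Assumed edge weight: the smaller of the two directed comment counts,
    so that only interaction going both ways yields a high weight."""
    return min(counts[(u, v)], counts[(v, u)])

print(sorted(active_users))
print(mutual_weight("alice", "bob"), mutual_weight("carol", "bob"))
```

Dropping passive users corresponds to removal during sampling; assigning them a small positive weight instead would be the alternative mentioned above.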
Modeling
• 1- How can we fit the graph obtained through measurement to popular random network models?
– Power law network
– Scale-free network
• High-degree nodes tend to be connected to other high-degree nodes.
– Small-world network
• Small diameter and high clustering
• 2- We can propose our own ideas and test them on real data.
– E.g. is propagation or influence a function of a node's degree, the degree of its friends, or the degree of friends of friends (in-degree or out-degree)?
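The quantities behind both questions can be computed directly from a sampled graph. The sketch below assumes an undirected toy graph stored as adjacency sets; the data and the function names are hypothetical. A power-law check would then inspect the degree histogram on log-log axes, and a small-world check would compare the clustering values with those of a random graph of the same size.

```python
from collections import Counter
from itertools import combinations

# Toy undirected graph as adjacency sets (hypothetical data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def degree_histogram(g):
    """Map each degree to the number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in g.values())

def local_clustering(g, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    nbrs = g[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in g[u])
    return links / (k * (k - 1) / 2)

def mean_neighbor_degree(g, node):
    """Average degree of the node's friends (0.0 for an isolated node)."""
    nbrs = g[node]
    if not nbrs:
        return 0.0
    return sum(len(g[n]) for n in nbrs) / len(nbrs)

print(dict(degree_histogram(graph)))
print({n: round(local_clustering(graph, n), 3) for n in sorted(graph)})
print({n: round(mean_neighbor_degree(graph, n), 3) for n in sorted(graph)})
```

To test the second idea, these per-node values (own degree, mean degree of friends) could be compared against a measured influence score, which is not defined here and would have to come from the data.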