Social Web 2016 Lecture 4: How do we MINE, ANALYSE & VISUALISE the Social Web? Davide Ceolin (credits to: Lora Aroyo) The Network Institute VU University Amsterdam

VU University Amsterdam - The Social Web 2016 - Lecture 4

Social Web 2016

Lecture 4: How do we MINE, ANALYSE & VISUALISE the Social Web?

Davide Ceolin (credits to: Lora Aroyo)
The Network Institute

VU University Amsterdam

Announcements
• Results of Assignment 1 are out: well done
• Assignment 2 is out: due on 01/03!
• Next deadlines:
  • Wednesday 23:59: final project update
  • Friday 10:00: post your question
  • Friday 17:00: vote on the questions

Announcements
• This Thursday (lab session):
  • F153, S345, S329
  • Anca & Niels will be there in person
  • I will join via Hangouts
• Next Monday: guest lecture

• 200 billion tweets on Twitter in 2015, by 1.3 billion registered users

• 4.5 billion likes generated on Facebook in 2015, by 1.55 billion different users

• 300 hours of videos uploaded to YouTube every minute

• 60.7 million photos uploaded to Flickr per month

The Age of BIG Data

Social Web 2016, Davide Ceolin
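To put the yearly and per-minute figures above on a common scale, the short sketch below converts two of them to per-second rates. It assumes a 365-day year and activity spread evenly over time, which real traffic (bursty, with daily cycles) does not follow, so the results are averages only.

```python
# Back-of-the-envelope rates derived from the figures on the slide.
# Assumption: a 365-day year and uniform activity over time.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

tweets_per_year = 200e9                      # 200 billion tweets in 2015
tweets_per_second = tweets_per_year / SECONDS_PER_YEAR

youtube_hours_per_minute = 300               # hours of video uploaded each minute
# Seconds of video uploaded per wall-clock second.
youtube_video_seconds_per_second = youtube_hours_per_minute * 3600 / 60

print(f"~{tweets_per_second:,.0f} tweets per second")
print(f"~{youtube_video_seconds_per_second:,.0f} seconds of video uploaded per second")
```

In other words, roughly 6,300 tweets every second, and five hours of new YouTube video for every second that passes.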

Science with BIG Data

BIG Data Challenges

Big Data vs. Deep Data

• Social Web data often follow a long tail distribution

[Figure: long-tail distribution with regions labelled "Big" and "Deep"]
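One way to make "long tail" concrete is to measure how much of the total activity the most active users account for. The sketch below uses a hypothetical helper `head_share` and a hypothetical Zipf-like dataset (the user at rank r posts about 1000/r items); it is an illustration, not data from the lecture.

```python
def head_share(counts, head_fraction=0.1):
    """Fraction of all activity produced by the most active `head_fraction` of users."""
    ranked = sorted(counts, reverse=True)
    head_size = max(1, int(len(ranked) * head_fraction))
    total = sum(ranked)
    return sum(ranked[:head_size]) / total if total else 0.0


# Hypothetical Zipf-like activity: the user at rank r posts roughly 1000 // r items.
activity = [1000 // rank for rank in range(1, 1001)]
print(f"Top 10% of users produce {head_share(activity):.0%} of the posts")
```

With such a distribution a small "big" head dominates the totals, while most users sit in the long "deep" tail with only a handful of posts each.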

Enormous wealth of data = lots of insights
• insights into users’ daily lives and activities
• insights into history
• insights into politics
• insights into communities
• insights into trends
• insights into businesses & brands

Why?

Enormous wealth of data = lots of insights
• Who uploads/talks? (age, gender, nationality, community, etc.)
• What are the trending topics? When?
• What else do these users like? On which platform?
• Who are the most/least active users?
• …

Why?
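Two of the questions above ("who are the most/least active users?" and "what are the trending topics?") reduce to simple counting once posts are collected. The sketch below assumes a hypothetical list of `(user, hashtags)` posts and uses Python's `collections.Counter`; it ignores the "when?" dimension, which would need timestamps.

```python
from collections import Counter

# Hypothetical posts: (user, hashtags used in the post).
posts = [
    ("alice", ["#vu", "#socialweb"]),
    ("bob", ["#socialweb"]),
    ("alice", ["#bigdata"]),
    ("carol", ["#socialweb", "#bigdata"]),
    ("alice", ["#socialweb"]),
]

posts_per_user = Counter(user for user, _ in posts)
tag_counts = Counter(tag for _, tags in posts for tag in tags)

# Several users can tie for "least active", so collect all of them.
fewest = min(posts_per_user.values())
least_active = sorted(u for u, c in posts_per_user.items() if c == fewest)

print("Most active:", posts_per_user.most_common(1))   # [('alice', 3)]
print("Least active:", least_active)                    # ['bob', 'carol']
print("Trending:", tag_counts.most_common(2))           # [('#socialweb', 4), ('#bigdata', 2)]
```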

Web Source Criticism?
Source criticism checklist (https://en.wikipedia.org/wiki/Source_criticism):
• Who is the author and what are the qualifications of the author in regard to the topic that is discussed?
• When was the information published?
• What is the reputation of the publisher?
• Does the source show a particular cultural or political bias?
• Does the source contain a bibliography?
• Has the material been reviewed by a group of peers, or has it been edited?
• …

How does this apply to Web sources?

How about this?

Web of Trust

https://www.mywot.com/en/scorecard/pulse.seattlechildrens.org

Who uses it?

Politicians

Governmental institutions

Whole society

Whole society

Repurposing data

Danger of second-order effects

Whole society

Repurposing data

Discoveries & correlations

Web-Scale Pharmacovigilance: Listening to Signals from the Crowd, R. W. White et al. (2013)

Scientists

Bibliometrics

Culture, History

Culture, History

Culture

Bill Howe, University of Washington

Entertainment

You?

https://klout.com/#/measure

Companies

Who does it?

The Rise of the Data Scientist

Data Geeks Skills:
• Statistics & Math
• Data munging
• Visualisation

http://radar.oreilly.com/2010/06/what-is-data-science.html

The Rise of the Data Scientist

• Data Science enables the creation of data products

• Data products are applications that acquire their value from the data, and create more data as a result.

• Users are in a feedback loop: they constantly provide information about the products they use, which gets used in the data product.

Data Science

Data Science Venn Diagram

Drew Conway

Data Science Venn Diagram

Popular Data Products

Data Science is about building products, not just answering questions

Popular Data Products

Empower others to use the data

Empower others to do their own analysis

Popular Data Products

http://www.metacog.com/resources/banner3.jpg

(Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data Mining Conf. and Toon Calders’ slides)

Data mining is the exploration & analysis of large quantities of data in order to discover valid, novel, potentially useful, & ultimately understandable patterns in data

http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.jpg

Data Mining 101

Databases

Statistics / Numerical methods

Artificial Intelligence

Data Mining 101

• Data input & exploration
• Preprocessing
• Data mining algorithms
• Evaluation & Interpretation

• What data do I need to answer question X?

• What variables are in the data?

• Basic stats of my data?

Data Input & Exploration

“LikeMiner”
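A first pass at "basic stats of my data" can be done with Python's standard `statistics` module. The sample below (likes per page) is hypothetical; it is chosen to show how a single very popular page pulls the mean far above the median, which is typical of long-tailed Social Web data.

```python
import statistics

# Hypothetical sample: number of likes per page in a small crawl.
likes = [12, 7, 3, 150, 9, 4, 0, 22, 5, 8]

summary = {
    "n": len(likes),
    "min": min(likes),
    "max": max(likes),
    "mean": statistics.mean(likes),
    "median": statistics.median(likes),
    "stdev": statistics.stdev(likes),   # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {round(value, 1)}")
```

Here the mean is 22 while the median is 7.5; reporting only the mean would give a misleading picture of a "typical" page.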

• Cleanup!
• Choose a suitable data model
• What happens if you integrate data from multiple sources?
• Reformat your data

Preprocessing

“LikeMiner”
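The integration question above is where cleanup matters most: the same user may be spelled differently in each source, and merged records end up with gaps. The sketch below uses two hypothetical sources with different field names and a hypothetical `normalise` helper as the matching key; real integration usually needs far more careful entity matching than lowercasing.

```python
# Two hypothetical sources describing overlapping users with different conventions.
source_a = [{"user": "Alice ", "city": "amsterdam"}, {"user": "bob", "city": "Utrecht"}]
source_b = [{"username": "alice", "likes": 42}, {"username": "Carol", "likes": 7}]


def normalise(name):
    """Cleanup: trim whitespace and lowercase so the same user matches across sources."""
    return name.strip().lower()


# Reformat both sources into one data model: user key -> record.
merged = {}
for row in source_a:
    merged.setdefault(normalise(row["user"]), {})["city"] = row["city"].title()
for row in source_b:
    merged.setdefault(normalise(row["username"]), {})["likes"] = row["likes"]

for user, record in sorted(merged.items()):
    print(user, record)
```

Note that `bob` has no `likes` and `carol` has no `city` after merging: integration exposes missing values that the analysis must then handle explicitly.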

• Classification: Generalising a known structure and applying it to new data

• Association: Finding relationships between variables

• Clustering: Discovering groups and structures in data

Data Mining Algorithms
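To illustrate the association task, the sketch below computes support and confidence for pairwise rules over hypothetical "transactions" (the set of pages each user liked). It enumerates every pair by brute force; algorithms such as Apriori use the same two measures but prune candidate itemsets to scale to large data. The threshold `MIN_SUPPORT` is an arbitrary assumption.

```python
from itertools import combinations

# Hypothetical transactions: the set of pages each user liked.
liked_pages = [
    {"football", "beer"},
    {"football", "beer", "pizza"},
    {"football", "pizza"},
    {"beer", "pizza"},
    {"football", "beer"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in liked_pages) / len(liked_pages)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)


MIN_SUPPORT = 0.4  # hypothetical threshold
items = sorted(set().union(*liked_pages))
for a, b in combinations(items, 2):
    pair = {a, b}
    if support(pair) >= MIN_SUPPORT:
        print(f"{a} -> {b}: support={support(pair):.2f}, "
              f"confidence={confidence({a}, {b}):.2f}")
```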

• Filter users by interests

• Construct user graphs

• PageRank on graphs to mine representativeness

• Result: set of influential users

• Compare page topics to user interests to find pages most representative for topics

Mining in “LikeMiner”
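The "PageRank on graphs" step can be sketched with the standard power-iteration formulation. The slide does not say exactly how LikeMiner builds its user graphs, so the graph below is hypothetical (an edge u -> v meaning u endorses v), and the damping factor 0.85 is the conventional default rather than LikeMiner's setting.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; graph maps each node to the nodes it links to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the "random jump" share first.
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical user graph: u -> v means u endorses (follows/likes) v.
users = {
    "ann": ["bob", "cem"],
    "bob": ["cem"],
    "cem": ["ann"],
    "dan": ["cem"],
}
ranks = pagerank(users)
for user, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```

`cem`, endorsed by three of the four users, comes out as the most influential, while `dan`, whom nobody endorses, keeps only the random-jump share.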

Evaluation & Interpretation
What does the pattern I found mean?

• Pitfalls:
  • Meaningless discoveries
  • Implication ≠ causality (intensive care -> death)
  • Simpson’s paradox
  • Data dredging
  • Redundancy
  • No new information
  • Overfitting
  • Bad experimental setup
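Simpson's paradox is easiest to believe when you compute it. The counts below follow the frequently cited kidney-stone treatment example and are used purely as an illustration: treatment A has the higher success rate within each subgroup, yet B has the higher rate once the subgroups are pooled, because B was mostly applied to the easier (small-stone) cases.

```python
# (successes, total) per treatment and subgroup; counts follow the
# frequently cited kidney-stone example of Simpson's paradox.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, total):
    return successes / total


for group in ("small", "large"):
    a, b = rate(*results["A"][group]), rate(*results["B"][group])
    print(f"{group:>5} stones: A {a:.0%} vs B {b:.0%}")

# Pool the subgroups: sum successes and totals per treatment.
overall = {
    treatment: rate(sum(s for s, _ in groups.values()),
                    sum(n for _, n in groups.values()))
    for treatment, groups in results.items()
}
print(f"overall: A {overall['A']:.0%} vs B {overall['B']:.0%}")
```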

Data Mining is not easy

Popular ML – Deep learning

http://www.kdnuggets.com/wp-content/uploads/deep-learning-small-big-data.jpg

http://scyfer.nl/wp-content/uploads/2014/05/Deep_Neural_Network.png

Deep learning frameworks

https://code.facebook.com/posts/1687861518126048/facebook-to-open-source-ai-hardware-design/

Data Journalism

Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html
See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg

Single Person

http://www.brandrants.com/brandrants/obama/

Populations

Brand Sentiment via Twitter

http://flowingdata.com/2011/07/25/brand-sentiment-showdown/

Sentiment Analysis as a Service

http://www.crowdflower.com/type-sentiment-analysis

http://text-processing.com/demo/sentiment/
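The simplest form of sentiment analysis counts words from positive and negative word lists. The sketch below is such a lexicon-based classifier with tiny hypothetical lexicons; it is not how the services linked above work (those typically rely on much larger lexicons, trained models or crowd judgements).

```python
import re

# Tiny hypothetical lexicons; real systems use far larger ones or trained models.
POSITIVE = {"love", "great", "good", "awesome", "happy"}
NEGATIVE = {"hate", "bad", "awful", "terrible", "sad"}


def sentiment(text):
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love my new phone, the camera is great!",
    "Terrible battery life, I hate it",
    "Picked up the phone today",
]
for tweet in tweets:
    print(sentiment(tweet), "|", tweet)
```

This approach is easy to explain but brittle: "not good" counts as positive, and sarcasm is invisible to it, which is one reason brand-sentiment results should be interpreted with care.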

http://www.cs.cornell.edu/home/kleinber/networks-book/networks-book.pdf

Recommended Reading

http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg

Assignment 2: Semantic Markup

• Part I: enrich/create a Web page with semantic markup
  • Step 1: Mark up two different Web pages with the appropriate markup describing properties of at least people, relationships to other people, locations, some temporally related data and some multimedia. You can also try out tools such as Google Markup Helper.
  • Step 2: Validate your semantic markup using an existing validator.
  • Step 3: Explain why you chose particular markups. Compare the advantages and disadvantages of the different markups. Include screenshots from validators.
• Part II: analyse another team’s Web page markup, both as a consumer and as a publisher
  • Step 1: Perform an evaluation and report your findings (consider findability or content extraction).
  • Step 2: Support your critique with examples of how the semantic markup could be improved.
• In the introductory section, explain what semantic markup is, what it is for, what it looks like, etc.
• Support your choices and explanations with appropriate literature references.
• 5 pages (excluding screenshots).
• Put the details of the other group’s evaluation in an appendix.

• Deadline: 1 March 23:59
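As a rough idea of "what it looks like", the sketch below builds one possible markup for a person as schema.org JSON-LD (one syntax among several, next to Microdata and RDFa) and wraps it in the `<script>` element that embeds it in a page. The person, dates and URLs are hypothetical; the property names (`knows`, `homeLocation`, `birthDate`, `image`) are schema.org terms covering people, relationships, locations, temporal data and multimedia.

```python
import json

# Hypothetical person described with schema.org vocabulary.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "birthDate": "1990-05-17",                                # temporal data
    "homeLocation": {"@type": "Place", "name": "Amsterdam"},  # location
    "knows": [{"@type": "Person", "name": "John Smith"}],     # relationship
    "image": "https://example.org/jane.jpg",                  # multimedia
}

# JSON-LD is embedded in an HTML page inside a script element of this type.
snippet = ('<script type="application/ld+json">\n'
           + json.dumps(person, indent=2)
           + "\n</script>")
print(snippet)
```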

image source: http://www.flickr.com/photos/bionicteaching/1375254387/

Hands-on Teaser

• Build your own recommender system 101
• Recommend pages on del.icio.us
• Recommend pages to your Facebook friends
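As a warm-up for the hands-on session, the sketch below is a minimal user-based collaborative filter: it scores pages a user has not bookmarked by how similar the users who did bookmark them are. The bookmark data is hypothetical (standing in for del.icio.us bookmarks or Facebook likes), and cosine similarity over binary vectors is one common choice among several.

```python
from math import sqrt

# Hypothetical bookmarks: user -> set of bookmarked pages.
bookmarks = {
    "ann": {"python.org", "w3.org", "schema.org"},
    "bob": {"python.org", "w3.org", "kdnuggets.com"},
    "cem": {"schema.org", "w3.org", "flowingdata.com"},
    "dan": {"youtube.com"},
}


def similarity(a, b):
    """Cosine similarity between two users' binary bookmark vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, k=3):
    """Score unseen pages by the summed similarity of users who bookmarked them."""
    seen = bookmarks[user]
    scores = {}
    for other, pages in bookmarks.items():
        if other == user:
            continue
        sim = similarity(seen, pages)
        for page in pages - seen:
            scores[page] = scores.get(page, 0.0) + sim
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, score in ranked[:k] if score > 0]


print(recommend("ann"))
```

`dan` shares nothing with `ann`, so his bookmark gets a score of zero and is not recommended.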
