claude-theoret
Presentation to the Web Science summer school at UQAM, on the rise of the data scientist in the new economy
The opportunity for Social Data Scientists
@cgtheoret
Part 1 The Explosion
Every minute 8-10 months ago:
• 48 hours of video are uploaded to YouTube
• 320 new accounts and 98,000 tweets appear on Twitter
• 168,000,000 emails are sent
• 20,000 new posts on Tumblr
• 6,600 photos appear on Flickr
• Over 20% of all websites run on a CMS (WordPress, etc.)
Every minute today:
• 100 hours of video are uploaded to YouTube
• ??? new accounts and 236,000 tweets appear on Twitter
• 204,000,000 emails are sent
• 28,000 new posts on Tumblr
• 1,600 photos appear on Flickr!!! No shit!
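To make the two snapshots easier to compare, here is a minimal sketch that turns the per-minute figures above into percentage changes. The numbers are copied from the slides; reading the email counts as 168 and 204 million is an assumption, and new Twitter accounts are left out because today's figure is not given.

```python
# Per-minute figures from the two slides: (earlier snapshot, today).
# Assumption: the email counts are 168 million and 204 million.
per_minute = {
    "YouTube video hours": (48, 100),
    "Tweets": (98_000, 236_000),
    "Emails": (168_000_000, 204_000_000),
    "Tumblr posts": (20_000, 28_000),
    "Flickr photos": (6_600, 1_600),
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


changes = {name: percent_change(b, a) for name, (b, a) in per_minute.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

On these figures, everything grows except Flickr, which is the only service whose per-minute count shrinks.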
But…
• Facebook has lost 1.5 million users in Canada and 6 million in the United States
• Yahoo study: 50% of the content that is read and shared by humans is produced by only 20,000 accounts (0.05%)
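As a quick sanity check on the Yahoo figure, the sketch below works out how large the account base would have to be for 20,000 accounts to equal 0.05%. It assumes the 0.05% is the elite accounts' share of all accounts in the study, which the slide does not state explicitly.

```python
from fractions import Fraction

elite_accounts = 20_000               # from the slide
elite_share = Fraction("0.05") / 100  # 0.05% from the slide, kept exact

# Assumption: 0.05% is the share of all accounts studied.
implied_total_accounts = int(elite_accounts / elite_share)

print(f"{implied_total_accounts:,} accounts in total")
```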
Gartner is predicting an explosion in Social Media Analytics IT spending
In a lot of ways Social “Big Data” is like Oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest (and least useful) when it’s unrefined
In a lot of ways “Big Data” is like Oil…
• Can’t be used by consumers unless refined
• More expensive at every step of refinement
The market is producing a plethora of derived, higher-value data products
In a lot of ways “Big Data” is like Oil…
• Difficult and expensive to extract
• Difficult and expensive to store and distribute
• Cheapest in its unrefined form
• More expensive at every step of refinement
• Produces a plethora of derived products
• And it’s actually quite “dirty”!!!!
Part 2
Social Data is one of the reasons why IBM added a 4th V to the Big Data Definition
VERACITY
Social Data Analytics = Oil Refineries
6 factors affect Data Veracity…
1. Accuracy: Is it true?
2. Precision: If true, what is the error margin?
3. Reliability: Is it there all the time?
4. Provenance: Can you trace the source?
5. Fidelity: Did it change from the source?
6. Permission: Can you use it for the context?
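One way to picture the six factors is as a checklist applied to each piece of social data. The sketch below is purely illustrative: the `VeracityCheck` class, its yes/no fields, and the example record are hypothetical, and reducing each question (precision in particular) to a single boolean is a simplification.

```python
from dataclasses import dataclass, fields


@dataclass
class VeracityCheck:
    """One yes/no answer per veracity factor (hypothetical structure)."""
    accuracy: bool     # Is it true?
    precision: bool    # Is the error margin acceptable?
    reliability: bool  # Is it there all the time?
    provenance: bool   # Can you trace the source?
    fidelity: bool     # Is it unchanged from the source?
    permission: bool   # Can you use it for the context?

    def failed_factors(self):
        """Names of the factors answered 'no', in checklist order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical example: a tweet posted by an untraceable bot account.
bot_tweet = VeracityCheck(
    accuracy=False, precision=False, reliability=True,
    provenance=False, fidelity=True, permission=True,
)
print(bot_tweet.failed_factors())
```

Keeping the factors as separate fields, rather than one overall score, makes it visible which kind of "dirt" a given record carries.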
Black Hat SEO : Blogs
Twitter: 46% of brand followers are bots
Black Hat Social Marketing : Twitter
Or in some cases over 90%…
Disappearing Romney: FB as well…
And it is getting worse …
Trying to solve the Veracity problem …
The Big Guys are now doing Veracity …
Murali Krishnam <[email protected]>
Part 3 The Opportunity for Social Data Scientists
“McKinsey Global Institute estimated that by 2018 there will be 4 million big data related positions in the U.S. that require quantitative and analytical skills. However, there will be a potential shortfall of 1.5 million data-savvy managers and analysts to fill these positions”
@cgtheoret @fffady
Zeitgeist