• Automatic language translation • Syrian civil war • Unemployment • Happiness and alcohol
• The Revolution
• 1992-1995
• Given the proceedings of the Canadian parliament, 3 million sentences carefully translated into French and English, the Candide system automatically learns how English and French are related.
• Worked well, but never became popular • And it could not be improved further!
– Takes every translation it can find on the web.
– A trillion words, 95 billion English sentences
– Very unevenly translated!
– Applies no grammatical rules and no models, only statistical analysis.
– It works way better than anything else.
– Not because of better data quality. Just because of size.
– It got the size because it accepted bad, messy data that was not made for this purpose.
Trading quality for quantity
– It improves all the time.
Do we need models, when we have lots of data?
It is not always enough to crunch data!
MODEL-BASED STATISTICS
Syrian Civil War
Significance, April 2015 Megan Price, Anita Gohdes and Patrick Ball
• policy and military decisions • resource allocation • war crimes tribunals
4 groups produce 4 lists of people killed in Syria:
We can match the lists and compare reports.
IDEA: Comparing the size of the overlaps • If most of the cases on the lists overlap, the real
number of deaths is not much larger than the number of cases listed.
• If the overlap is small, the real number of deaths is considerably larger than the union of the reports.
N = true unknown number of deaths. The yellow list has A individuals; M of those are also in the blue list, which has B individuals in total.
The probability of being in a random list of size A from a population of size N is $\frac{A}{N}$.
The probability of being in a list of size B is $\frac{B}{N}$.
The probability of being in a list of size M is $\frac{M}{N}$.
If two organisations work independently, the probability of being in both the yellow and the blue list is the product of the individual probabilities: $\frac{A}{N} \cdot \frac{B}{N}$.
But “to be in both A and B” is the same as being in M, so it must be that $\frac{A}{N} \cdot \frac{B}{N} = \frac{M}{N}$, and therefore we estimate $N = \frac{A \cdot B}{M}$.
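The two-list estimate above can be tried out numerically. The following is a minimal Python sketch, not the authors' code: the function name and the list sizes (200, 150, 50) are made up for illustration, and it covers only two lists, whereas the Syrian data involve four. This two-list formula is commonly known as the Lincoln-Petersen estimator.

```python
def estimate_population(a: int, b: int, m: int) -> float:
    """Two-list estimate N = A * B / M (hypothetical helper, two lists only)."""
    if m <= 0:
        raise ValueError("the lists must overlap (M > 0) for the estimate to exist")
    if m > min(a, b):
        raise ValueError("the overlap M cannot exceed either list size")
    return a * b / m


# Made-up list sizes, for illustration only (not the Syrian data).
yellow = 200   # A: individuals on the yellow list
blue = 150     # B: individuals on the blue list
both = 50      # M: individuals found on both lists

documented = yellow + blue - both            # size of the union of the two lists
estimate = estimate_population(yellow, blue, both)

print(f"documented (union of lists): {documented}")
print(f"estimated total N:           {estimate:.0f}")
```

With a small overlap relative to the list sizes, the estimate lies well above the documented count, which is the point made by the overlap idea: little overlap signals many unrecorded cases.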
[Figure: population N, lists A and B, overlap M; reporting groups]
• The documented data suggest that deaths decreased slightly from one month to the next, while the estimates indicate that this is not true.
95% confidence interval
• 1554 documented casualties
December 2012 ----- March 2013
• The confidence interval suggests there may have been as many as 3793 deaths
MODEL-BASED STATISTICS
DELIVERS DEEPER UNDERSTANDING THAN JUST DATA SUMMARIES
TRACKING UNEMPLOYMENT USING MOBILE PHONE DATA
Toole, J. L., Lin, Y. R., Muehlegger, E., Shoag, D., González, M. C., & Lazer, D. Journal of The Royal Society Interface, 2015
• Real time estimate of changes in unemployment, at arbitrarily fine spatial scale, using mobile phone data already collected.
• Ahead of traditional indicators in European countries
Data - mobile phone calls: • caller -> receiver • location • time
Training: Case of a large factory closing down
• Compare individual signal before vs. after closure • Find special features of the signal when job is lost
Calibrating: A region with official unemployment estimates • Match “lost-job” mobile phone signal to unemployment rates
Predict: Current (and near future) unemployment
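The calibrate-then-predict steps above can be illustrated with a toy example. This is a minimal Python sketch under strong assumptions, not the method of Toole et al.: it assumes the “lost-job” signal has already been reduced to one number per month (the share of phones showing it), assumes a straight-line relation to the official rate, and uses invented numbers throughout.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Calibration region: invented monthly values, for illustration only.
signal_share = [0.01, 0.02, 0.03, 0.04]    # share of phones with the "lost-job" signal
official_rate = [5.0, 7.0, 9.0, 11.0]      # official unemployment rate, in percent

slope, intercept = fit_line(signal_share, official_rate)


def predict_rate(share: float) -> float:
    """Predict the unemployment rate (percent) from the current signal share."""
    return slope * share + intercept


# Prediction: apply the calibrated line to a newly observed signal share.
current_share = 0.025                       # hypothetical current observation
print(f"predicted unemployment rate: {predict_rate(current_share):.1f}%")
```

The linear form is chosen only for simplicity; any model relating the phone signal to official rates could take its place in the calibration step.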
[Figure: unemployment rates based on mobile phones compared with official rates, over the training and prediction periods]
University of Pittsburgh
EXPERIMENT:
• 720 female and male social drinkers, aged 21-28
• randomly assigned to groups of 3 strangers
• seated, offered three beverages over 36 minutes
• video recorded (35 million frames)
Alcohol groups: juice plus vodka
Placebo groups: told they were given alcohol, given juice with a hint of vodka
Smiles are infectious!
Explore the impact of alcohol and group gender composition on the likelihood that an initial smile will progress into a mutual smile, rather than remaining unreciprocated.
(Enjoyment) “Duchenne” smile
Social Display
[Figure: Smiling and Speech Behavior of Three Group Members for 10 Minutes of Interaction. Horizontal axis: time (coded every 1/30th sec); rows: speech, smiling.]
Group Gender Makeup    Alcohol % (N)    Placebo % (N)
All Males              50.4% (614)      37.4% (321)
2 Males 1 Female       48.8% (780)      44.4% (490)
1 Male 2 Female        49.8% (714)      49.1% (683)
All Females            49.4% (822)      48.2% (679)
Percentage of smiles leading to a mutual smile
The effect of alcohol on mutual smiles is larger in all-male groups than in all-female groups.
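The comparison can be checked directly from the percentages in the table. The short Python sketch below takes the raw difference, alcohol minus placebo, for each group composition. It is plain arithmetic on the reported percentages, not a significance test, and the variable names are illustrative.

```python
# Percentage of smiles leading to a mutual smile: (alcohol, placebo),
# copied from the table above.
mutual_smile_pct = {
    "All Males": (50.4, 37.4),
    "2 Males 1 Female": (48.8, 44.4),
    "1 Male 2 Female": (49.8, 49.1),
    "All Females": (49.4, 48.2),
}

# Raw difference in percentage points, rounded to one decimal.
differences = {
    group: round(alcohol - placebo, 1)
    for group, (alcohol, placebo) in mutual_smile_pct.items()
}

for group, diff in differences.items():
    print(f"{group:<18} alcohol - placebo = {diff:+.1f} percentage points")
```

The all-male groups show a difference of 13.0 percentage points, against 1.2 for the all-female groups.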
Personalised solutions
Forecasting the transient
The impact of
data rich information technologies
is deep.
• we work less
• The Internet-of-Things • Automation, sensors • Smart software
1. Jobs get replaced: brokers, estate agents, drivers, IT, …
2. More productive, less time searching for info…
Fewer working hours to do the same job.
• less difference between work & free time,
weakening the concept of salary
1. Freelance, not just one employer
2. Networking (after work) is part of the job
3. Ideas matter, and they do not come between 9 and 5
Is this work or my own free time? What shall a salary cover?
More trust and collaboration between employers and employees is necessary.
• less private property
1. Information and data are abundant and available for free (while markets exploit scarcity)
2. Re-use of data
3. Monopolies (which can play with prices) that capture data and sell it will fail, because much data is free.
The concepts of “cost” and “private property” are being shaken. There is less possibility for profit.
• the sharing economy, escaping the
market rules
1. Collaborative production of goods and services
2. Organisations are different: no managers, no contracts
3. Need to redefine tax systems
Flea market
• Product for all and for free
• Produced in collaboration
• no profit
• 208 employees
• 73,000 active contributors
• if commercial: 3 billion USD revenue/year
• but impossible for others to make a profit in this area any more
• networking people puts power in the hands of many
THE END OF
FREE MARKET
CAPITALISM?
My health records
“Can we use your data for a study on Alzheimer's?”
“Can we use your data for a study on myopia?”
1980
Megan Price, Anita Gohdes and Patrick Ball. Documents of war: Understanding the Syrian conflict. Significance, April 2015.