Big Data – so what’s the big deal?
Jevin D. West iSchool, University of Washington
What is big data?
Agenda
• Introductions
• Big data – why should you care?
• Introduction to big data
• Examples
• Nuances in big data
• Big data at UW and in Seattle
• An exercise in big data
• Concerns
• Big data skills
• Homework
• References
MGH 330D
[Figure: map of citation flow among scientific fields, ranging from Molecular & Cell Biology, Medicine, and Physics to Sociology, Law, and Education. Legend for two fields A and B: citation flow from A to B, citation flow from B to A, citation flow within field, citation flow out of field.]
Why should you care about big data?
A shortage of 1.5 million jobs!
Universities are going big
What is big data?
“Yes, some of the best theorizing comes after collecting data because then you become aware of another reality…”
Robert Shiller, Nobel Prize in Economics (2013)
Data Exhaust
Data Exhaust: by-product of human activity
Examples: cell phone locations, purchase transactions, social media
Predicting human behavior, spread of infectious disease
Barabasi et al., Nature (2008); Ginsberg et al., Nature (2009)
Why big data?
• Cheaper sensors (climate research, astronomy, high energy physics, high-throughput gene sequencing, cell phones)
• Cheaper storage (4 TB, $168)
• People willing to share their personal information (Facebook, social media)
• Faster communication (internet, cell phones)
• Other reasons?
The Four A’s
• Architecture • Acquisition • Analysis • Archiving
The Four V’s
• Volume • Velocity • Variety • Veracity
Big Data is messy
Correlation versus Causation
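A minimal sketch of the point, using synthetic data and made-up variable names (nothing here comes from a real dataset): two quantities that never influence each other still correlate strongly because both depend on a hidden third factor. Once that factor is accounted for, the correlation largely disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden common cause: daily temperature.
temperature = [rng.gauss(20, 5) for _ in range(5000)]
# Both outcomes depend on temperature, but not on each other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 3) for t in temperature]
sunburn_cases = [1.5 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson(ice_cream_sales, sunburn_cases)
adjusted = pearson(residuals(ice_cream_sales, temperature),
                   residuals(sunburn_cases, temperature))

print(f"raw correlation:               {raw:.2f}")
print(f"after removing temperature:    {adjusted:.2f}")
```

The raw correlation is high even though ice cream sales do not cause sunburns in this simulation. In real big data the hidden factor is usually not recorded, which is why a strong correlation alone says little about cause.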
Sampling
Big Data at UW
• LSST • CS (Farecast) • Libraries (digital content) • Oceanography • Neuroscience
Is there a secondary market for the data that companies are collecting?
Big Data in action
DJ Patil
If you had access to the personal calendars of 200 million people, what could you do with it? What products
could you create?
The power of metadata…
http://qz.com/140357/what-your-facebook-friends-list-reveals-about-your-love-life/
It’s only metadata…
http://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/
Betweenness centrality
Eigenvector centrality
http://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/
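Both centrality measures can be computed from nothing more than who belongs to which group. A minimal sketch, assuming a tiny made-up membership table (the names and clubs are hypothetical, not the data from the linked post): it links people who share a group, then computes eigenvector centrality by power iteration and betweenness centrality with Brandes' algorithm on the unweighted graph.

```python
from collections import deque
from itertools import combinations

# Hypothetical membership metadata: person -> groups they belong to.
MEMBERSHIPS = {
    "Ann": {"club_a"},
    "Bob": {"club_a", "club_b"},
    "Cat": {"club_b", "club_c"},
    "Dan": {"club_c"},
    "Eve": {"club_b"},
}

def co_membership_graph(memberships):
    """Link two people with weight = number of groups they share."""
    graph = {person: {} for person in memberships}
    for a, b in combinations(memberships, 2):
        shared = len(memberships[a] & memberships[b])
        if shared:
            graph[a][b] = shared
            graph[b][a] = shared
    return graph

def eigenvector_centrality(graph, iterations=200):
    """Power iteration on (A + I), rescaled so the top score is 1.

    Adding the identity keeps the same leading eigenvector but avoids
    oscillation on bipartite-like graphs.
    """
    score = {n: 1.0 for n in graph}
    for _ in range(iterations):
        new = {n: score[n] + sum(w * score[m] for m, w in graph[n].items())
               for n in graph}
        top = max(new.values()) or 1.0
        score = {n: v / top for n, v in new.items()}
    return score

def betweenness_centrality(graph):
    """Brandes' algorithm, ignoring edge weights (undirected graph)."""
    total = {n: 0.0 for n in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from source.
        order = []
        parents = {n: [] for n in graph}
        paths = {n: 0 for n in graph}
        dist = {n: -1 for n in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    parents[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {n: 0.0 for n in graph}
        for w in reversed(order):
            for v in parents[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                total[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {n: t / 2 for n, t in total.items()}

graph = co_membership_graph(MEMBERSHIPS)
eigen = eigenvector_centrality(graph)
betweenness = betweenness_centrality(graph)
for person in graph:
    print(f"{person}: eigenvector={eigen[person]:.2f}, "
          f"betweenness={betweenness[person]:.1f}")
```

In this toy table Bob and Cat tie for the top score on both measures, because every shortest path between the clubs runs through one of them. No message content is needed to find them, only the membership lists.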
Big data is about asking good questions
Concerns
• Privacy • Probabilistic Models • Correlation • The big players own the big data • NSA • Reproducibility • What else?
Enjoy the wave but be cautious…
‘The Data Scientist’
Communication skills
Ethical Reasoning
Information/Data Management
Personnel Management
Interdisciplinary
Adaptable
Big Data involves people
Homework
Find an example of big data. (1) Why do you think it is big data? (2) How does it involve people?
References
“Data is increasingly digital air: the oxygen we breathe and the carbon dioxide that we exhale. It can be a source of both sustenance and pollution.” -- danah boyd
d. boyd & K. Crawford (2011) Six Provocations for Big Data. SSRN
Why should you care about big data?
Jobs
Privacy