A look at the challenges involved in creating a big data product in the context of the Cascade Project (https://www.cascadeproject.com/)
Real Time-Big Data-Social Network-Data Science-Gamified!
a.k.a. The Cascade Project
(Okay … that last part of the title isn’t true)
Jason Capehart
12/12/12
1. Visualization
2. Data
3. Analysis
Show Me!
The Good, The Bad, The Ugly
Store        Examples
Key-Value    Hadoop, Memcached, Redis
Document     MongoDB, CouchDB
Graph        Neo4j, Giraph, Titan
Real Time    Storm, Impala
Surely, You Must Be Joking.
Citation: Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
ln p(x) = -α ln x + C
Citation: A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
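As a rough illustration of how the exponent α in the relation above can be estimated, the sketch below uses the maximum-likelihood estimator described by Clauset et al. rather than a straight-line fit on a log-log plot. The function names (`sample_power_law`, `fit_alpha`) are hypothetical, xmin is assumed to be known in advance, and the discrete case relies on the paper's xmin - 1/2 approximation, so this is not the code used for the talk.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n continuous power-law samples (p(x) ~ x^-alpha, x >= xmin)
    by inverse-transform sampling."""
    # rng.random() is in [0, 1), so 1 - u is in (0, 1] and the power is safe.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def fit_alpha(samples, xmin, discrete=False):
    """Approximate maximum-likelihood estimate of alpha for the tail x >= xmin.

    Continuous data: alpha = 1 + n / sum(ln(x_i / xmin)).
    Discrete data:   the same formula with xmin replaced by xmin - 1/2
                     (an approximation, not the exact discrete MLE).
    """
    tail = [x for x in samples if x >= xmin]
    if not tail:
        raise ValueError("no samples at or above xmin")
    shift = 0.5 if discrete else 0.0
    log_sum = sum(math.log(x / (xmin - shift)) for x in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    rng = random.Random(0)
    xs = sample_power_law(50_000, alpha=2.5, xmin=1.0, rng=rng)
    print("estimated alpha:", fit_alpha(xs, xmin=1.0))
```

In practice xmin also has to be estimated from the data (Clauset et al. choose it by minimizing a Kolmogorov-Smirnov distance), which this sketch leaves out.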
800,000,000 (that’s a lot of users)
(cost = 200k for fire hose)
Sampled
Not Sampled
Citation: Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.
# Pseudo Code
repeat until tired or rate limited:
    id_guess = randint(0, 10**9)
    user = api.get_user(id=id_guess)
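The pseudo code above could be fleshed out roughly as follows. This is only a sketch: `fetch_user` stands in for whatever API client call is actually used, and the `RateLimited` exception and the convention that a missing account raises `LookupError` are assumptions made for illustration, not part of any real client library.

```python
import random


class RateLimited(Exception):
    """Hypothetical signal that the API refuses further requests for now."""


def sample_user_ids(fetch_user, max_id=10**9, attempts=1000, seed=None):
    """Probe uniformly random user IDs and keep the ones that exist.

    fetch_user(id) is assumed to return a user record, raise LookupError
    when no account has that ID, and raise RateLimited when throttled.
    """
    rng = random.Random(seed)
    found = {}
    for _ in range(attempts):          # "until tired"
        id_guess = rng.randint(0, max_id)
        if id_guess in found:
            continue                   # already sampled this account
        try:
            found[id_guess] = fetch_user(id_guess)
        except LookupError:
            continue                   # unassigned ID, guess again
        except RateLimited:
            break                      # "or rate limited"
    return found
```

Because every ID is drawn uniformly, the accounts that come back are a uniform sample of existing users, at the cost of many wasted requests when the ID space is sparsely populated.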
Discrete Power Law vs. Lognormal
Loglikelihood Ratio       89.46
Vuong’s Test Statistic    7.14
p-val (1-sided)           > 0.99
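For readers unfamiliar with these quantities, here is a minimal sketch of how a log-likelihood ratio and Vuong's statistic can be computed from the per-observation log-likelihoods of two fitted models. The function name `vuong_test` is hypothetical, and the one-sided p-value is assumed to be the lower-tail normal probability Φ(V); that convention would be consistent with a value above 0.99 for V = 7.14, but the talk does not state which convention was used.

```python
import math


def vuong_test(loglik_a, loglik_b):
    """Compare two models from their pointwise log-likelihoods.

    Returns (log-likelihood ratio, Vuong statistic, lower-tail p-value).
    A large positive statistic favors model A over model B.
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must score the same observations")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two observations")
    ratio = sum(diffs)
    mean = ratio / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("pointwise differences have zero variance")
    stat = math.sqrt(n) * mean / math.sqrt(var)
    # Standard normal CDF at stat: Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    p_lower = 0.5 * math.erfc(-stat / math.sqrt(2))
    return ratio, stat, p_lower


if __name__ == "__main__":
    a = [-1.0, -2.0, -1.5, -1.0]   # toy log-likelihoods under model A
    b = [-2.0, -2.5, -3.0, -1.5]   # toy log-likelihoods under model B
    print(vuong_test(a, b))
```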
[Plot legend: Power Law (xmin = 281, α = 2.19); Lognormal]
[Plot legend: Power Law (xmin = 222, α = 2.33); Lognormal; Stretched Exponential]
• Conclusions = None!
  – All work is in progress
• Discussion
  – Cascade uses open source
  – Opportunities to give back?
References
1. A. Clauset, C.R. Shalizi, and M.E.J. Newman, "Power-law distributions in empirical data" SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)
   – Code: http://tuvalu.santafe.edu/~aaronc/powerlaws/
2. Newman, M. (2005, September-October). Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5), 323-351.
3. Kwak, H., Lee, C., Park, H., & Moon, S. (2010). What is Twitter, a Social Network or a News Media? Proceedings of the 19th International World Wide Web (WWW) Conference (pp. 591-600). Raleigh, NC: ACM.
4. Stumpf, M. P., Wiuf, C., & May, R. M. (2005). Subnets of scale-free networks are not scale-free: Sampling properties of networks. Proceedings of the National Academy of Sciences, 4221-4224.