DNC: Big Data and Data Mining in the 2012 US Election
Azamat Kamzin, Mandar Bhide
Overview
• Highlights of Narwhal
• System Organization
• Classification
• Associative patterns
• Predictive models
• References
Highlights
• Codename: Narwhal
• Budget: $100 million
• Lead Developer: Scott VanDenPlas
• Chief Analytics Officer: Dan Wagner
• Team: Approx. 200 members
• General objective:
  o Bring together information on voters, supporters, and donors in one place (unlike in 2008, when the information was split across 6 different servers/vendors)
• One of the 20 largest consumer/customer databases ever built
  o Size, per a VanDenPlas tweet: “4Gb/s, 10k requests per second, 2,000 nodes, 3 datacenters, 180TB and 8.5 billion requests...”
  o Service provider: Amazon Cloud
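To put the tweet's figures in proportion, here is a back-of-envelope calculation. It rests on assumptions the tweet does not state: the 180TB and 8.5 billion requests are totals, storage is spread evenly over the 2,000 nodes, TB/GB are decimal units, and the 10k requests per second is treated as if it were sustained.

```python
# Back-of-envelope reading of the figures quoted from VanDenPlas's tweet.
# ASSUMPTIONS (not stated in the tweet): totals, even spread across nodes,
# decimal units, and 10k requests/s treated as a sustained rate.
TOTAL_STORAGE_TB = 180
NODES = 2_000
TOTAL_REQUESTS = 8.5e9
REQUESTS_PER_SEC = 10_000
LINK_GBIT_PER_SEC = 4

storage_per_node_gb = TOTAL_STORAGE_TB * 1_000 / NODES   # TB -> GB, per node
seconds_at_rate = TOTAL_REQUESTS / REQUESTS_PER_SEC       # time to serve all requests
days_at_rate = seconds_at_rate / 86_400                   # 86,400 seconds per day
link_gbyte_per_sec = LINK_GBIT_PER_SEC / 8                # bits -> bytes

print(f"{storage_per_node_gb:.0f} GB per node")
print(f"{days_at_rate:.1f} days to serve 8.5 billion requests at 10k/s")
print(f"{link_gbyte_per_sec:.1f} GB/s of bandwidth")
```

Under these assumptions each node holds roughly 90 GB, and 8.5 billion requests correspond to about ten days of traffic at 10k requests per second.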
System Organization
• Data collection/enrichment
  o Automated call surveys: 1.2 million calls per day
  o Tracking visitors' online behavior using cookies
• Other data sources: 2008 voter databases; private/public databases
• Narwhal: single store bringing these sources together
• Dreamcatcher: estimates for each voter
  o Level of support for Obama
  o Likelihood to vote
  o Estimated donation amount
• Outputs used for
  o Calls/emails to motivate the voter
  o Best channel and time slot to advertise
  o Directing volunteers to the right door
  o Right email ad to the right person
Dreamcatcher: Voter Classification
• Voters were classified into 4 categories
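The slide does not name the four categories. Purely as a hypothetical illustration, one way to get four classes is to split the two Dreamcatcher scores listed under System Organization (level of support and likelihood to vote) at an assumed threshold, giving a 2×2 grid. The 0–1 score scale, the 0.5 cutoff, and the labels are all assumptions.

```python
# HYPOTHETICAL 2x2 classification: the real four categories are not given
# on the slide. Scores are assumed to be probabilities in [0, 1].
def classify(support: float, turnout: float, threshold: float = 0.5) -> str:
    """Return one of four labels from a support score and a turnout score."""
    support_label = "high support" if support >= threshold else "low support"
    turnout_label = "high turnout" if turnout >= threshold else "low turnout"
    return f"{support_label} / {turnout_label}"

# Made-up (support, turnout) scores for four example voters.
voters = {"A": (0.9, 0.2), "B": (0.8, 0.95), "C": (0.1, 0.9), "D": (0.3, 0.1)}
for name, (support, turnout) in voters.items():
    print(name, classify(support, turnout))
```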
Dreamcatcher: Association Patterns
• Output: detailed profile of each voter
• Inputs: attributes of each individual stored in Narwhal
  o Voting history
  o Social media likes and comments
  o Volunteering
  o Magazine subscriptions
  o Registered car
  o Insurance data
  o Private information on individuals from firms like Aristotle
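The slide does not say which algorithm Dreamcatcher used to find association patterns. As a minimal sketch of the general technique, the code below computes support and confidence (the measures used by Apriori-style rule mining) for single-item rules X → Y over made-up voter attribute sets. The attribute names, the data, and the thresholds are hypothetical.

```python
from itertools import combinations

# Toy, made-up profiles: each set holds the attributes recorded for one person.
profiles = [
    {"volunteered", "magazine_subscriber", "voted_2008"},
    {"volunteered", "voted_2008"},
    {"magazine_subscriber", "voted_2008"},
    {"volunteered", "magazine_subscriber", "voted_2008"},
    {"registered_car"},
]

def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)

def rules(rows, min_support=0.4, min_confidence=0.7):
    """Yield (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    items = set().union(*rows)
    for x, y in combinations(sorted(items), 2):
        for lhs, rhs in ((x, y), (y, x)):
            both = support({lhs, rhs}, rows)
            if both >= min_support:
                confidence = both / support({lhs}, rows)
                if confidence >= min_confidence:
                    yield lhs, rhs, both, confidence

found = list(rules(profiles))
for lhs, rhs, s, c in found:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Full Apriori extends this to larger itemsets by growing only those whose subsets already meet the support threshold.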
Predictive Models
• A/B testing: to determine which image or text draws the higher user response
  o Example: “Learn More” garnered 18.6 percent more signups per visitor than the default “Sign Up.”
• Time series analysis: to understand trends in approval and disapproval
• Regression
  o Used to estimate electoral votes (dependent variable) from top issues such as the economy and healthcare
  o Packages used: SAS, R and MATLAB
• Decision trees
  o We don’t believe they used decision trees, given the large number of attributes, which differ from one individual to the next
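The "Learn More" result above is a relative lift in signups per visitor. A minimal sketch of how such a result might be computed and checked, using a standard two-proportion z-test (not necessarily what the campaign used). The visitor and signup counts are hypothetical, chosen only so that the lift comes out at the slide's 18.6 percent; the real sample sizes are not given.

```python
from math import erf, sqrt

def ab_test(signups_a, visitors_a, signups_b, visitors_b):
    """Return B's relative lift over A, the z statistic, and a two-sided p-value."""
    rate_a = signups_a / visitors_a
    rate_b = signups_b / visitors_b
    lift = (rate_b - rate_a) / rate_a
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (signups_a + signups_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return lift, z, p_value

# HYPOTHETICAL counts: A = default "Sign Up", B = "Learn More".
lift, z, p = ab_test(signups_a=1_000, visitors_a=10_000,
                     signups_b=1_186, visitors_b=10_000)
print(f"lift={lift:.1%}, z={z:.2f}, p={p:.6f}")
```

With smaller samples, the same 18.6 percent lift could easily fail to reach significance, which is why the test statistic matters alongside the lift.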
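The slide does not say which time-series method was applied to the approval and disapproval data. One simple way to expose a trend is a trailing moving average, which smooths out week-to-week noise; the sketch below applies it to made-up weekly percentages, with the window size as an arbitrary assumption.

```python
def moving_average(series, window):
    """Trailing simple moving average; output starts at the first full window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    averages = [total / window]
    for i in range(window, len(series)):
        # Slide the window: add the newest value, drop the oldest.
        total += series[i] - series[i - window]
        averages.append(total / window)
    return averages

# HYPOTHETICAL weekly percentages, invented for illustration.
approval = [48, 50, 47, 49, 52, 51, 53]
disapproval = [46, 44, 47, 45, 43, 44, 42]
print("approval trend:   ", [round(x, 1) for x in moving_average(approval, 3)])
print("disapproval trend:", [round(x, 1) for x in moving_average(disapproval, 3)])
```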
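The regression bullet describes electoral votes as the dependent variable and top issues as predictors, fitted in SAS, R and MATLAB. A minimal pure-Python sketch of ordinary least squares with two predictors, solving the normal equations directly. The column meanings and the data are hypothetical, and the data are noise-free so the fit should recover the coefficients that generated them.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X is a list of predictor rows; an intercept column of 1s is prepended.
    Solved by Gauss-Jordan elimination with partial pivoting, which is
    adequate for a handful of predictors.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        # Swap in the row with the largest entry in this column as the pivot.
        pivot = max(range(col, k), key=lambda row_idx: abs(a[row_idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    # Each row now has a single non-zero coefficient, on the diagonal.
    return [a[i][k] / a[i][i] for i in range(k)]

# HYPOTHETICAL data generated from y = 100 + 5*economy + 3*healthcare.
economy_healthcare = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
electoral_votes = [111, 113, 127, 129, 140]
intercept, b_economy, b_healthcare = ols(economy_healthcare, electoral_votes)
print(f"intercept={intercept:.2f}, economy={b_economy:.2f}, healthcare={b_healthcare:.2f}")
```

With real, noisy data the coefficients would only approximate the underlying relationship, and a statistics package would also report standard errors.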
References
• Michael Scherer (November 8, 2012). “How Obama's data crunchers helped him win.” CNN. Retrieved from http://www.cnn.com/2012/11/07/tech/web/obama-campaign-tech-team
• Sasha Issenberg (December 19, 2012). “How President Obama’s campaign used big data to rally individual voters.” MIT Technology Review. Retrieved from http://www.technologyreview.com/featuredstory/509026/how-obamas-team-used-big-data-to-rally-voters/