Caching is Your Friend: Creating Tons of Maps on the Fly and Auto-Updating Some

Caching is Your Friend - FOSS4G NA 2015 is... · Caching is Your Friend ... Django Leaflet Celery. ... Materialized View Make a materialized view for districts that can be


Page 1

Caching is Your Friend
Creating Tons of Maps on the Fly and Auto-Updating Some

Page 2

Outline

● Empower Engine Overview
● Tons of Maps, Tons of Tiles
● Caching 1: Map Cache Tables
● Caching 2: Daily Updating Data
● Store stuff well: PostGIS tips & tricks

Page 3

Who We Are

Julie Goldberg
● 15+ years as a software engineer
● Democratic Politics & Campaign Software since 2003-2004 (Howard Dean’s presidential campaign)

Noah Glusenkamp
● Democratic Politics since Obama in Iowa in 2007
● Made many one-off maps for many campaigns he worked on
● UI and product design focused

Page 4

Campaigns Organize Geographically

Organizers each have their own turf

Local variation matters.

Page 5

Marriage Example

Scope = Washington State
Scope = Seattle

Page 6

Overall Problem

● Any Scope
● Any Data Layer
● Any Granularity

All Within the Request/Response Cycle

Page 7

DEMO….

Page 8

Our Stack

● Postgres/PostGIS
● Tilestache
● Mapnik
● S3
● Django
● Leaflet
● Celery

Page 9

Maps Empower & Motivate

● Maps tell a story. Spreadsheets don’t.
● Maps motivate organizers, volunteers and even candidates.
● Organizers don’t know what a shapefile is, and they shouldn’t need to.

Page 10

User Testimonial

With Empower Engine I can visually show vols where the densest part of my turf is and provide them with reasoning why I want them to go canvass in those areas. Even if means that they have to drive 20 mins or so. Also vols get so stoked when I provide them with inside SCIENCE on how we are going to win this election.
- Kathleen Austad, Field Organizer 2014

Page 11

Empower Engine: What it Does

● Very easy to create or edit choropleth maps.
● Campaign manager can make a set of important “distribution maps” and distribute them to each organizer at their turf.
● Rescales/reclassifies each map for each user to their turf.
● Some data layers are static.
● Most interesting ones update daily (doors in last week, universe density, early votes).

Page 12

Outline

● Empower Engine Overview
● Tons of Maps, Tons of Tiles
● Caching 1: Map Cache Tables
● Caching 2: Daily Updating Data
● Store stuff well: PostGIS tips & tricks

Page 13

Dynamic Maps & Tiles in Tilestache: The Problem

● Tilestache normally has a config file with one configuration per map.
● We can’t change the config file every time someone changes a map or makes a new one.

Page 14

Tilestache Out of the Box

● Tilestache expects you to pre-generate your tiles or generate them on demand using Mapnik and/or a custom provider.
● You can cache tiles in many ways, including S3.
● Both image tiles and UTFGrid tiles.

Page 15

Custom Tilestache Wrapper

https://gist.github.com/JulieGoldberg/6926274

● Specify parameter names on start-up.
● Passes parameters to each tile request.
● MapId tells our custom provider (based on Mapnik) what data layers, colors, etc. to use.
● Our custom cache inserts /MapId/#/Version/# into our S3 path. Editing a map changes the version, which invalidates the cached tiles.

Page 16

S3 Path and Tiles

/map_id/5086/version/0/district_value/11/329/714.png

/map_id/5086/version/1/district_value/11/329/714.png
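
The two paths above differ only in the version segment, so a tile cached under version 0 is never requested again once the map moves to version 1. A minimal Python sketch of building such a path follows. The function name `tile_cache_key` is hypothetical, and reading the trailing segments as layer/zoom/column/row is an assumption based on the sample paths, not the authors' actual cache code.

```python
def tile_cache_key(map_id, version, layer, zoom, column, row, extension="png"):
    """Build a tile cache path that embeds the map ID and map version."""
    # Assumed segment order: layer name, zoom level, tile column, tile row.
    return (
        f"/map_id/{map_id}/version/{version}"
        f"/{layer}/{zoom}/{column}/{row}.{extension}"
    )


# The same tile before and after the map is edited (version 0 -> 1).
before_edit = tile_cache_key(5086, 0, "district_value", 11, 329, 714)
after_edit = tile_cache_key(5086, 1, "district_value", 11, 329, 714)
```

Because the version is part of the key, nothing has to be deleted from S3 to invalidate a tile; old tiles are simply never looked up again.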

Page 17

Outline

● Empower Engine Overview
● Tons of Maps, Tons of Tiles
● Caching 1: Map Cache Tables
● Caching 2: Daily Updating Data
● Store stuff well: Miscellaneous PostGIS tips & tricks

Page 18

Why Cache?

● Space is cheap.
● Geometric queries are slow.
● Indexed lookups are fast. Queries with no JOINs or WHERE clauses are even faster.
● We want to make tiles on the fly.
● We want to generate new maps in the web request/response cycle.
● We want to update lots of data every day as quickly as possible.

Page 19

Size of our tables (just for WA)

Districts: 6,226,475
Hexagon, Precinct, County, Census Block, etc.

District Attribute Values: 27,246,617
# Marriage Votes in Precinct 12345, # Doors Attempted yesterday in Hexagon 54, etc.

Page 20

Only cache tables that can easily be recreated.

Page 21

Making Tiles Efficiently: Cache Table

● Mapnik does geospatial queries to find the relevant districts and associate the data.
● Both our district_shapes table and our values table are huge.
● The smaller the set of data, the better it performs.
● Don’t JOIN any tables inside Mapnik.
● Make a cache table per map with all data values and shapes.

Page 22

Sample Map Cache Table Definition

CREATE TABLE map_caches.map_2270 (
    district_id integer,
    attribute_15230 double precision,
    attribute_15231 double precision,
    attribute_15232 double precision,
    attribute_15233 double precision,
    geometry geometry(MultiPolygon,3857),
    short_name character varying(12),
    long_name character varying(100),
    sq_miles double precision,
    colorized_attribute_id integer, -- important for plurality maps
    colorized_value double precision,
    author_provided_id character varying(60)
);
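
Since there is one cache table per map, the definition presumably has to be generated from the map's ID and its attribute IDs. The sketch below shows one way to do that in Python. The helper name `map_cache_ddl` is hypothetical, and the fixed columns are simply copied from the sample definition; this is an illustration, not the authors' generator.

```python
def map_cache_ddl(map_id, attribute_ids):
    """Return a CREATE TABLE statement for one map's cache table."""
    columns = ["district_id integer"]
    # One double precision column per data attribute shown on the map.
    # int() keeps anything but integers out of the generated identifiers.
    columns += [f"attribute_{int(a)} double precision" for a in attribute_ids]
    columns += [
        "geometry geometry(MultiPolygon,3857)",
        "short_name character varying(12)",
        "long_name character varying(100)",
        "sq_miles double precision",
        "colorized_attribute_id integer",
        "colorized_value double precision",
        "author_provided_id character varying(60)",
    ]
    body = ",\n    ".join(columns)
    return f"CREATE TABLE map_caches.map_{int(map_id)} (\n    {body}\n);"


ddl = map_cache_ddl(2270, [15230, 15231, 15232, 15233])
```

With these inputs, `ddl` reproduces the column list of the sample table for map 2270.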

Page 23

Make Cache Tables Efficiently

● Don’t do any geometric queries to determine what data will be on a map.
● Pre-calculate all possible granularities with all possible scopes.
  ○ Medium Hexagons in Puyallup? Here’s the list.
  ○ 2012 Precincts in Washington’s 3rd CD? Here you go.
● When asked to pull any data layer in any scope at any granularity, we join on indexed integer IDs.
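
The idea of joining only on indexed integer IDs can be sketched as follows. Python's built-in `sqlite3` stands in for Postgres here, and the table names (`scope_members`, `district_values`), column names, and sample rows are all hypothetical. The point is only that the scope-granularity intersection is already stored, so no geometric predicate appears in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Pre-computed: which districts of a granularity fall inside a scope.
    CREATE TABLE scope_members (
        scope_id INTEGER, district_group_id INTEGER, district_id INTEGER);
    CREATE INDEX scope_members_idx
        ON scope_members (scope_id, district_group_id);

    CREATE TABLE district_values (
        district_id INTEGER, attribute_id INTEGER, value REAL);
    CREATE INDEX district_values_idx
        ON district_values (district_id, attribute_id);

    INSERT INTO scope_members VALUES (1, 10, 101), (1, 10, 102), (2, 10, 103);
    INSERT INTO district_values VALUES
        (101, 15230, 4.0), (102, 15230, 7.5), (103, 15230, 9.0);
""")


def values_for_map(scope_id, district_group_id, attribute_id):
    """Pull one data layer for one scope at one granularity, IDs only."""
    return conn.execute(
        """
        SELECT m.district_id, v.value
        FROM scope_members m
        JOIN district_values v ON v.district_id = m.district_id
        WHERE m.scope_id = ?
          AND m.district_group_id = ?
          AND v.attribute_id = ?
        ORDER BY m.district_id
        """,
        (scope_id, district_group_id, attribute_id),
    ).fetchall()


rows = values_for_map(1, 10, 15230)
```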

Page 24

Pre-Computing Takes Time

● It takes hours or sometimes days to pre-compute granularity-scope intersections when adding districts.
● This does not happen in the web request/response cycle.
● We have scripts that compute them and populate the cache when we add districts.
● We’ve optimized this some, but it’s a one-time cost whenever we add districts.

Page 25

Outline

● Empower Engine Overview
● Tons of Maps, Tons of Tiles
● Caching 1: Map Cache Tables
● Caching 2: Daily Updating Data
● Store stuff well: PostGIS tips & tricks

Page 26

What sort of data do we have?

Universe = People Campaign Currently Wants to Contact

● Universe Density
● Universe Penetration
  ○ What % of your universe in each area has the campaign attempted or contacted?
  ○ In specified day range or since specified date.
● Early voters in universe
● All Door/Phone Attempts/Contacts in specified day range

Page 27

Refreshing the Data Daily

● Download daily from the DNC’s data warehouse.
● Voters in campaign’s universes are likely to be voters contacted. The same voters may be in multiple universes.
● Download the voter id and geocode for all the voters we care about.
● Download all the data we want to aggregate per voter.

Page 28

Aggregate All Points to Polygons
Avoid Overplotting

Typical GOTV Day in Washington State:
● 3,894,333 Voters
● 692,744 Contacts or Attempts
● 6,305,129 People in All Universes
● 1,404,251 Penetration Into Universes

Page 29

Daily Person-District Lookups

● Need a person->district lookup for every district group (small hexagons, current precincts, etc.).
● Recreate these cache tables every day.
● All daily updated per-voter data can be joined to this lookup and grouped on district.

Page 30

Person-District Mappings Change (but not much each day)

● Jennifer Smith (person id 444) moves in with Amanda Jones (person id 987) at geocode (47.6717833,-122.3814195)
  ○ Medium Hexagon: 423, district 12345.
  ○ 2014 Precincts: SEA 36-1324, district 98765.

● If Amanda lived there already, we may know the districts.

● If they rented a new house and the prior resident was a non-citizen, we should district the new geocode once.

● They’re unlikely to move again this year.

Page 31

People Share Geocodes

● Couples, families, apartment buildings
● Lookup table of PersonID, Latitude, Longitude, DistrictGroupID -> DistrictID
  ○ What large hexagon is Jennifer in, now that she lives at (47.6717833,-122.3814195)?
● If it’s not there, look for the geocode without the PersonID before trying to compute it.
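
The lookup order described above (person plus geocode first, then geocode alone, then compute) might look like this in Python. The two dictionaries are a simplification: they stand in for a person-latitude-longitude index and a latitude-longitude index on the same lookup table. The function name, the district group ID `3`, and the `compute` callback are hypothetical.

```python
# (person_id, lat, lon, district_group_id) -> district_id
person_lookup = {}
# (lat, lon, district_group_id) -> district_id
geocode_lookup = {}


def find_district(person_id, lat, lon, group_id, compute=None):
    """Return the district ID, trying the cheapest lookup first."""
    person_key = (person_id, lat, lon, group_id)
    if person_key in person_lookup:
        return person_lookup[person_key]

    geocode_key = (lat, lon, group_id)
    district_id = geocode_lookup.get(geocode_key)
    if district_id is None and compute is not None:
        # Last resort: the expensive geometric computation.
        district_id = compute(lat, lon, group_id)
        if district_id is not None:
            geocode_lookup[geocode_key] = district_id

    if district_id is not None:
        # Remember the answer for this person for next time.
        person_lookup[person_key] = district_id
    return district_id


MEDIUM_HEXAGONS = 3  # hypothetical district group ID
HOME = (47.6717833, -122.3814195)
# Amanda already lived here, so the geocode is already districted.
geocode_lookup[(*HOME, MEDIUM_HEXAGONS)] = 12345

compute_calls = []


def compute(lat, lon, group_id):
    compute_calls.append((lat, lon, group_id))
    return -1


jennifer_district = find_district(444, *HOME, MEDIUM_HEXAGONS, compute)
```

Jennifer's district is found through the shared geocode, so `compute` is never called in this example.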

Page 32

Districting New Geocodes

● Pull DISTINCT list of geocodes to district.
  ○ Speeds things up by factor of 2 or 3 (only district Amanda & Jennifer’s new house once).
● Store a cache table of shapes for each district group.
● Having no JOINs or WHERE clauses improves performance.
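
A rough Python sketch of districting each distinct geocode once and fanning the result out to everyone who lives there. The `locate` callback is a hypothetical stand-in for the expensive point-in-polygon query, and the function name is an assumption made for illustration.

```python
from collections import defaultdict


def district_new_geocodes(people, locate):
    """Return (person_id, lat, lon, district_id) rows.

    people: iterable of (person_id, lat, lon) tuples
    locate: expensive function (lat, lon) -> district_id
    """
    # Group people by geocode so each distinct geocode is handled once.
    people_at = defaultdict(list)
    for person_id, lat, lon in people:
        people_at[(lat, lon)].append(person_id)

    rows = []
    for (lat, lon), person_ids in people_at.items():
        district_id = locate(lat, lon)  # one call per distinct geocode
        rows.extend((pid, lat, lon, district_id) for pid in person_ids)
    return rows


locate_calls = []


def fake_locate(lat, lon):
    locate_calls.append((lat, lon))
    return 12345


new_rows = district_new_geocodes(
    [(444, 47.6717833, -122.3814195), (987, 47.6717833, -122.3814195)],
    fake_locate,
)
```

Two people share the house, but the geocode is districted only once.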

Page 33

Keep Newly Districted Geocodes

● Add new person - geocode - district group -> district rows to our lookup for each person at the new geocode.
● Keep the new lookup rows.
● We’ll probably be making maps about the same voters tomorrow.

Page 34

Updating Data Layers

● Make our daily person-district cache tables from our person-geocode-districtGroup -> district lookup.
● For each data set, join to person-district lookup, GROUP BY district ID.
● We pre-compute and store square miles of all districts, so # people per square mile is as easy as # people.
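
The join-then-GROUP-BY step can be illustrated with a small example. Python's built-in `sqlite3` stands in for Postgres, and the table names, column names, and rows are hypothetical. Because `sq_miles` is stored per district, the density column costs nothing beyond the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Daily person -> district cache table for one district group.
    CREATE TABLE person_district (person_id INTEGER, district_id INTEGER);
    -- Square miles are pre-computed and stored with each district.
    CREATE TABLE districts (district_id INTEGER PRIMARY KEY, sq_miles REAL);
    -- One per-voter data set downloaded today.
    CREATE TABLE door_attempts (person_id INTEGER);

    INSERT INTO person_district VALUES (444, 12345), (987, 12345), (555, 67890);
    INSERT INTO districts VALUES (12345, 0.5), (67890, 2.0);
    INSERT INTO door_attempts VALUES (444), (987), (555);
""")

layer_rows = conn.execute("""
    SELECT pd.district_id,
           COUNT(*) AS people,
           COUNT(*) / d.sq_miles AS people_per_sq_mile
    FROM door_attempts a
    JOIN person_district pd ON pd.person_id = a.person_id
    JOIN districts d ON d.district_id = pd.district_id
    GROUP BY pd.district_id, d.sq_miles
    ORDER BY pd.district_id
""").fetchall()
```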

Page 35

Updating Maps

● Store last viewed on map.
● Store last updated on data layers.
● If the data is newer than the map:
  ○ Recreate the cache table.
  ○ Recompute bins.
  ○ Increment the version.
● New version invalidates the tile cache.
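
One possible reading of this staleness check, sketched in Python. It assumes "newer than the map" means a data layer's last-updated time is later than the map's last-viewed time. The class, field, and callback names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MapState:
    map_id: int
    version: int
    last_viewed: datetime


def view_map(map_state, layer_last_updated, now, rebuild):
    """Refresh the map if any of its data layers changed since the last view.

    layer_last_updated: last-updated timestamps of the map's data layers
    rebuild: callback that recreates the cache table and recomputes bins
    Returns True if the map was refreshed.
    """
    stale = any(ts > map_state.last_viewed for ts in layer_last_updated)
    if stale:
        rebuild(map_state.map_id)
        # A new version changes the tile cache path, so old tiles go unused.
        map_state.version += 1
    map_state.last_viewed = now
    return stale


rebuilt = []
state = MapState(map_id=5086, version=0, last_viewed=datetime(2015, 3, 9, 8, 0))

# The nightly update ran after the last view, so the first view refreshes.
first = view_map(state, [datetime(2015, 3, 10, 3, 0)],
                 datetime(2015, 3, 10, 9, 0), rebuilt.append)
# Nothing changed since then, so the second view reuses the cache.
second = view_map(state, [datetime(2015, 3, 10, 3, 0)],
                  datetime(2015, 3, 10, 9, 5), rebuilt.append)
```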

Page 36

Reminder: Overall Problem

● Any Scope
● Any Data Layer
● Any Granularity

All Within the Request/Response Cycle

Page 37

Any Scope

City of Puyallup

Legislative District 45

Page 38

Any Data Layer

% Early Voted

# People Contacted at their Door

Page 39

Any Granularity

2014 Primary Precincts

Medium Hexagon

Page 40

Outline

● Empower Engine Overview
● Tons of Maps, Tons of Tiles
● Caching 1: Map Cache Tables
● Caching 2: Daily Updating Data
● Store stuff well: PostGIS tips & tricks

Page 41

Shapes Table

SELECT *

FROM districts

LIMIT 1

Order of magnitude faster if there are no geometries in the table.

● Postgres knows the size of integer, varchar(12), etc.

● Postgres doesn’t know how big a geometry is.

● Geometry columns hold pointers to shapes.

● Create separate district_shapes table.

● Store geometries in shapes table.

Page 42

Indexes are Your Friend

● Indexes always speed up searches, but some are better than others.
  ○ Integer best
  ○ Float second choice
  ○ Geometry worst

● Use person-latitude-longitude index before latitude-longitude index

● Store latitude and longitude separately, because Postgres can index floats better than geometries.

Page 43

Searching Districts

● Auto-complete on name for selecting boundary scopes.
● Millions of granularity-only districts
  ○ precincts, hexagons, census blocks
● Thousands of boundary scopes
  ○ CDs, cities, states
● Washington State:
  ○ 768 boundary scopes
  ○ 6 million granularities

Page 44

ScopeableDistricts Materialized View

● Make a materialized view for districts that can be boundary scopes.

● View is refreshed whenever we add such districts (on back end).

● Could add a text index on the view if we need.

Page 45

Questions?

Contact: [email protected]