Open Addresses Symposium: Meeting the Challenges

Address inference from open data

John Murray


My slides from the ODI OpenAddresses Symposium, held on 8 August 2014. OpenAddresses aims to provide an open address gazetteer for the UK as an alternative to the Royal Mail Postcode Address File (PAF).


Open Addresses Symposium 2

Sources of Addresses

• Land Registry
• Companies House
• National Register of Social Housing (NROSH)
• NHS – GP surgeries, hospitals, etc.
• Lists of schools
• Government department asset lists
• Scottish gazetteers (are they really open?)

8 August 2014


Sources of Spatial Information

• Ordnance Survey:
– Code-Point Open
– OS Locator
– OS Gazetteer
– OS Street View
– Named places, settlement seeds, DLUA boundaries, parishes

• ONS:
– ONSPD (ONS Postcode Directory)
– Built-up areas
– Census boundaries

• Land Registry:
– Cadastral polygons (there is a dispute about whether they are open)

• DfT:
– National Public Transport Gazetteer (NPTG)


Proposal

• Build a street and places gazetteer to which address points (PAON and SAON, the primary and secondary addressable object names) can be attached.

• Use spatial data to verify the veracity of data loaded from open sources.

• Apply a confidence score to each record based on:
– Spatial integrity
– Frequency of appearance within and across sources

• Infer towns and localities by filling gaps.

• Street layout analysis:
– Position of buildings by pixel analysis
– Postcode-to-numbering patterns, e.g. odds and evens
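The slides do not give a formula for the confidence score. A minimal sketch, assuming a hypothetical 50/50 weighting between a spatial-integrity term (how close the address point lies to its street) and a frequency term (the share of sources containing the record); the `AddressRecord` type, the weights and the 50 m threshold are illustrative assumptions:

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class AddressRecord:
    # Hypothetical record type: one address point attached to a street.
    paon: str                # primary addressable object name, e.g. "12"
    saon: str                # secondary addressable object name, e.g. "Flat 1"
    street: str
    postcode: str
    easting: float           # grid coordinates in metres
    northing: float
    sources: frozenset[str]  # open sources in which this record appears


def distance_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to the street segment AB."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB, clamping to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def confidence(record, street_segment, total_sources,
               max_distance=50.0, spatial_weight=0.5):
    """Score in [0, 1] mixing spatial integrity and source frequency.

    Hypothetical weighting: the 50 m threshold and the 50/50 split are
    illustrative assumptions, not values from the slides.
    """
    (ax, ay), (bx, by) = street_segment
    d = distance_to_segment(record.easting, record.northing, ax, ay, bx, by)
    # Spatial integrity: 1 on the street, falling linearly to 0 at max_distance.
    spatial = max(0.0, 1.0 - d / max_distance)
    # Frequency: share of the available sources that contain the record.
    frequency = len(record.sources) / total_sources
    return spatial_weight * spatial + (1 - spatial_weight) * frequency
```

For a point 10 m from a 200 m street segment and found in two of four sources, `confidence` returns 0.5 × 0.8 + 0.5 × 0.5 = 0.65. Only frequency across sources is modelled here; counting repeat appearances within one source would need a per-source count rather than a set.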

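One way to read the "odds and evens" idea: on many streets one postcode covers the odd-numbered side and another the even side, so a house number missing from the sources can borrow the postcode of its nearest known neighbour of the same parity. A minimal sketch under that assumption; the function names and the nearest-neighbour rule are hypothetical, not the prototype's method:

```python
from collections import defaultdict


def parity_pattern(known):
    """Classify each postcode on one street as covering 'odd', 'even' or 'mixed' numbers.

    known: list of (house_number, postcode) pairs already attached to the street.
    """
    parities = defaultdict(set)
    for number, postcode in known:
        parities[postcode].add(number % 2)
    return {
        postcode: "mixed" if len(p) == 2 else ("odd" if 1 in p else "even")
        for postcode, p in parities.items()
    }


def infer_postcode(known, number):
    """Guess the postcode for a missing house number (hypothetical rule).

    Takes the postcode of the nearest known house on the same side of the
    street (same parity); ties go to the lower house number. Returns None
    if no house with that parity is known.
    """
    same_side = [(n, pc) for n, pc in known if n % 2 == number % 2]
    if not same_side:
        return None
    _, postcode = min(same_side, key=lambda item: (abs(item[0] - number), item[0]))
    return postcode
```

With odd numbers 1, 3 and 7 in AB1 1AA and even numbers 2, 4 and 8 in AB1 1AB (made-up postcodes), `infer_postcode` gives AB1 1AA for number 5 and AB1 1AB for number 6. A postcode that `parity_pattern` reports as "mixed" is a signal that the parity rule should not be trusted on that street.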

Pixel Analysis

• Overlay vector streets and postcode centroids on OS Street View.

• Use in conjunction with OS Locator for context and extent.

• Analyse pixel colour within a buffer either side of the road to estimate the extent of buildings.

• Can be used to:
– Check the veracity of other data
– Infill missing properties
– Assign streets to postcodes more accurately
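The buffer test can be sketched without a GIS library. A minimal sketch, assuming the StreetView tile is already loaded as a grid of RGB tuples (loading is not shown), buildings are drawn in a single known fill colour, and the road is one straight segment in pixel coordinates; `building_fraction`, the colour tolerance and the sampling step are hypothetical. It walks along the road and samples one pixel at a fixed perpendicular offset on each side:

```python
import math


def building_fraction(raster, start, end, offset, building_rgb, tolerance=10, step=1.0):
    """Fraction of sampled pixels on each side of a road matching the building colour.

    raster: list of rows of (r, g, b) tuples, indexed raster[y][x].
    start, end: road end points (x, y) in pixel coordinates.
    offset: perpendicular distance, in pixels, at which each side is sampled.
    Returns (fraction on one side, fraction on the other side).
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("road segment has zero length")
    ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector along the road
    nx, ny = -uy, ux                                  # unit normal to the road

    def matches(x, y):
        xi, yi = round(x), round(y)
        if not (0 <= yi < len(raster) and 0 <= xi < len(raster[yi])):
            return None  # outside the tile: do not count this sample
        pixel = raster[yi][xi]
        return all(abs(c - b) <= tolerance for c, b in zip(pixel, building_rgb))

    counts = {1: [0, 0], -1: [0, 0]}  # side -> [building hits, valid samples]
    for i in range(int(length // step) + 1):
        px, py = x0 + ux * i * step, y0 + uy * i * step
        for side in (1, -1):
            hit = matches(px + side * offset * nx, py + side * offset * ny)
            if hit is not None:
                counts[side][0] += int(hit)
                counts[side][1] += 1
    return tuple(hits / total if total else 0.0 for hits, total in counts.values())
```

A fraction near 1 on one side suggests a continuous building frontage there; a fraction near 0 suggests open ground or no properties on that side. Real streets are polylines, so in practice each segment would be tested in turn.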


[Figure: Pixel Analysis]


[Figure: Adding Land Registry]


Maximising Available Data

• Using ONSPD, correct postcodes where there is an unambiguous coordinate match from a terminated postcode to a new one.

• This accounts for 50% of retired postcodes.

• Correct misspellings by reference to dictionaries, using lexical analysis.

• Reference earlier versions of the data.
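The first correction can be sketched as a grid-reference lookup. ONSPD lists terminated as well as live postcodes, each with a grid reference, so a terminated postcode is mapped to a live one only when exactly one live postcode sits at the same coordinates. A minimal sketch, assuming the directory has been loaded into (postcode, easting, northing, is_live) tuples; that layout and `correct_retired` are hypothetical:

```python
from collections import defaultdict


def correct_retired(directory):
    """Map each terminated postcode to a live one with exactly the same grid reference.

    directory: iterable of (postcode, easting, northing, is_live) tuples
    (a hypothetical layout, not ONSPD's own column names).
    Terminated postcodes with no live match, or with several, are left out.
    """
    live_at = defaultdict(list)
    retired = []
    for postcode, easting, northing, is_live in directory:
        if is_live:
            live_at[(easting, northing)].append(postcode)
        else:
            retired.append((postcode, easting, northing))

    corrections = {}
    for postcode, easting, northing in retired:
        candidates = live_at.get((easting, northing), [])
        if len(candidates) == 1:  # unambiguous coordinate match only
            corrections[postcode] = candidates[0]
    return corrections
```

Retired postcodes that share coordinates with several live ones, or with none, stay unresolved and remain candidates for other methods.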

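The slides do not say which lexical technique is used to correct misspellings. As a stand-in, Python's standard `difflib.get_close_matches` ranks dictionary entries by a similarity ratio; the upper-case street dictionary, the 0.8 cutoff and `correct_spelling` are assumptions for illustration:

```python
import difflib


def correct_spelling(name, dictionary, cutoff=0.8):
    """Return the closest dictionary entry to `name`, or `name` unchanged if none is close.

    dictionary: list of upper-case reference names (hypothetical street list).
    Uses difflib's similarity ratio as a stand-in for lexical analysis;
    the 0.8 cutoff is an illustrative assumption.
    """
    # Normalise case and collapse repeated whitespace before comparing.
    normalised = " ".join(name.upper().split())
    if normalised in dictionary:
        return normalised
    matches = difflib.get_close_matches(normalised, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else name
```

For example, "Hgih Street" is corrected to "HIGH STREET" against a dictionary containing that name, while a string with no close entry is returned unchanged rather than forced onto a poor match.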

Source Audits

• Land Registry – Good quality, kept up to date, few errors. Covers England and Wales.

• Companies House – Data quality issues, particularly for older companies. Covers the UK.

• NROSH – Variable quality. Covers England.


Prototype

• Contains all current GB postcodes.
• Streets added where possible.
• Localities added where possible.
• Corrects retired postcodes where possible.
• Shows the nearest postcodes where a retired postcode cannot be corrected.
• Built from 4 sources, with gaps filled by inference.
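Listing the nearest postcodes is a k-nearest-neighbour query on grid coordinates. A brute-force sketch, assuming eastings and northings in metres and live postcodes held as (postcode, easting, northing) tuples; `nearest_postcodes` is hypothetical, and for a national postcode set a spatial index such as a k-d tree would be the practical choice:

```python
import heapq
import math


def nearest_postcodes(easting, northing, live, k=3):
    """Return the k live postcodes closest to a point as (postcode, metres), nearest first.

    live: iterable of (postcode, easting, northing) tuples for current postcodes.
    Brute force: every live postcode is measured once.
    """
    with_distance = (
        (math.hypot(e - easting, n - northing), postcode) for postcode, e, n in live
    )
    # nsmallest keeps only k candidates while scanning; ties break on postcode text.
    return [(postcode, d) for d, postcode in heapq.nsmallest(k, with_distance)]
```

Passing the grid reference of a retired postcode returns its closest live neighbours with straight-line distances, which is what a user would see when no unambiguous correction exists.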


Initial Results

OS_Locator  LandReg  Companies  NROSH      Count   Percent
0           0        0          1          5,042     0.29%
0           0        1          0         18,990     1.09%
0           0        1          1            606     0.03%
0           1        0          0        111,553     6.40%
0           1        0          1         25,514     1.46%
0           1        1          0         20,842     1.20%
0           1        1          1          5,574     0.32%
1           0        0          0        227,971    13.07%
1           0        0          1         41,773     2.40%
1           0        1          0        115,449     6.62%
1           0        1          1          7,065     0.41%
1           1        0          0        381,917    21.90%
1           1        0          1        166,608     9.55%
1           1        1          0        348,669    19.99%
1           1        1          1        122,299     7.01%
Unmatched                                144,218     8.27%
Total                                  1,744,090   100.00%

(1 = matched in that source; 0 = not matched.)
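Each row of the table counts the records matched by one combination of sources. A minimal sketch of how such a tally could be produced, assuming a hypothetical mapping from each record to the set of source names that matched it; the source names follow the column headings and `coverage_table` is illustrative:

```python
from collections import Counter

SOURCES = ("OS_Locator", "LandReg", "Companies", "NROSH")


def coverage_table(matches):
    """Count records by the combination of sources that matched them.

    matches: dict of record key -> set of source names (empty set = unmatched).
    Returns a sorted list of (flags, count, percent), with flags a 0/1 tuple
    in SOURCES order; the all-zero row corresponds to 'Unmatched'.
    """
    counts = Counter(
        tuple(int(source in found) for source in SOURCES) for found in matches.values()
    )
    total = sum(counts.values())
    return [
        (flags, count, round(100 * count / total, 2))
        for flags, count in sorted(counts.items())
    ]
```

The all-zero combination corresponds to the Unmatched row, and the percentages are rounded to two decimal places as in the table.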


Weaknesses

• Lack of addresses for Scotland
• Inference not always accurate due to:
– Non-vehicular streets
– Streets in close proximity
– Not all addresses have a street
– Address elements not unique at postcode sector level

• Questions about openness of some data


Conclusion

• More study is needed on veracity to:
– Understand issues in the data
– Ensure the integrity of the database
– Make more accurate assumptions

• Crowdsourcing:
– The same methods could be used to check veracity
– Could be offered as a free or low-cost service to SMEs

• Lobbying for more data to be made open.


Questions?

Test drive the prototype.