Testing the spatial adjacency match of the Intiendo address matching tool for geocoding of addresses with misleading suburb or place names by Serena Coetzee [email protected] and Magnus Rademeyer [email protected] presented at the ICC 2009, Santiago, Chile, November 2009


Testing the spatial adjacency match of the Intiendo address matching tool

for geocoding of addresses with misleading suburb or place names

by Serena Coetzee [email protected] and Magnus Rademeyer [email protected]

presented at the ICC 2009, Santiago, Chile, November 2009


Overview

• Why Geocode?

• The Address Lifecycle

• Problem statement

• Address matching with a spatial adjacency match

• Test runs

• Results

• Conclusion


Why Geocode?

We geocode addresses to link attribute data to physical positions for the purpose of logistics, governance (elections, rates and taxes), customer database analysis (risk, trade area analytics) and many more.


The Address Lifecycle


• Address Capturing

• Address Cleaning

• Address Geocoding & Verification

• Address Delivery & Analysis


Problem statement

Alphanumeric matching: 101 Rubida Street, Murrayfield incorrectly matched to 110 Rubida Street, Murrayfield


Problem statement

• Alphanumeric matching by itself can cause errors (previous slide)

• Potential solution: attribute relaxation (i.e. ignore suburb)

• Most common cause of errors (Goldberg et al. 2007)


With spatial adjacency match

• Intiendo = alphanumeric matching + spatial adjacency match

• Improves geocoding results

Alphanumeric match: propose matched address from reference dataset

Above threshold?

• Yes: proposed matched address is an acceptable result

• No: search for street number in radius around proposed address
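
The decision flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Intiendo's implementation. The `Address` record, the `resolve` function, the threshold and radius values, the planar coordinates, the "Far Suburb" record and the rule of picking the nearest hit are all hypothetical. It also assumes the radius search keeps the street name fixed and looks for the exact street number. The addresses are borrowed from the Voortrekker Road example later in the deck.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence


@dataclass(frozen=True)
class Address:
    street_number: str
    street_name: str
    suburb: str
    x: float  # hypothetical planar coordinates, in metres
    y: float


def distance(a: Address, b: Address) -> float:
    """Straight-line distance between two addresses."""
    return hypot(a.x - b.x, a.y - b.y)


def resolve(
    input_number: str,
    input_street: str,
    proposed: Address,
    score: float,
    reference: Sequence[Address],
    threshold: float = 0.95,   # assumed value, not from the slides
    radius_m: float = 1000.0,  # assumed value, not from the slides
) -> Optional[Address]:
    # Yes branch: the alphanumeric proposal scores high enough, accept it.
    if score >= threshold:
        return proposed
    # No branch: look for the input street number on the same street,
    # within a radius around the proposed address, ignoring the suburb.
    nearby = [
        c for c in reference
        if c.street_number == input_number
        and c.street_name == input_street
        and distance(c, proposed) <= radius_m
    ]
    # Assumption: when several candidates qualify, take the nearest one.
    return min(nearby, key=lambda c: distance(c, proposed)) if nearby else None


# Toy reference data with made-up coordinates.
proposed = Address("35", "Voortrekker Road", "New Redruth", 0.0, 0.0)
reference = [
    proposed,
    Address("16", "Voortrekker Road", "South Crest", 600.0, 300.0),
    Address("16", "Voortrekker Road", "Far Suburb", 50_000.0, 0.0),
]

result = resolve("16", "Voortrekker Road", proposed, score=0.90, reference=reference)
print(result)
```

With these toy values the low-scoring proposal (number 35) is replaced by the nearby number 16 in the neighbouring suburb, while the distant number 16 is ignored because it lies outside the radius.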


With spatial adjacency match


1. Geocode without SpatialAdjacencyMatch (Non-spatial run)

2. Geocode with SpatialAdjacencyMatch enabled (Spatial run)

Compare results


With spatial adjacency match


• Sample input address data

  • 14,670 address records

  • Test for misleading names

  • Therefore include only addresses for which province, suburb, street name and street number are populated

Province  Town          Suburb          Street Name      Street Number
Gauteng   Johannesburg  Saxonwold       Engelwold Road   19
Gauteng   Pretoria      Atteridgeville  Sekukuni Street  104
Gauteng   Midrand       Noordwyk        Sagewood Avenue  637
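
The selection rule for the sample (keep only records where province, suburb, street name and street number are populated) might be expressed as below. This is a sketch under assumptions: the dictionary keys, the `is_populated` and `is_complete` helpers and the fourth record with an empty suburb are hypothetical. The first three records are the sample rows shown above.

```python
REQUIRED_FIELDS = ("province", "suburb", "street_name", "street_number")


def is_populated(value) -> bool:
    """A field counts as populated if it is neither None nor blank."""
    return value is not None and str(value).strip() != ""


def is_complete(record: dict) -> bool:
    """True when every field required for the test is populated."""
    return all(is_populated(record.get(field)) for field in REQUIRED_FIELDS)


records = [
    {"province": "Gauteng", "town": "Johannesburg", "suburb": "Saxonwold",
     "street_name": "Engelwold Road", "street_number": "19"},
    {"province": "Gauteng", "town": "Pretoria", "suburb": "Atteridgeville",
     "street_name": "Sekukuni Street", "street_number": "104"},
    {"province": "Gauteng", "town": "Midrand", "suburb": "Noordwyk",
     "street_name": "Sagewood Avenue", "street_number": "637"},
    # Hypothetical record with a missing suburb: excluded from the sample.
    {"province": "Gauteng", "town": "Pretoria", "suburb": " ",
     "street_name": "Example Street", "street_number": "1"},
]

sample = [r for r in records if is_complete(r)]
print(len(sample), "of", len(records), "records kept")
```

Town is deliberately not in the required list, since the slide names only the other four fields.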


With spatial adjacency match


Intiendo hierarchy database

Reference dataset: AfriGIS address data


Test runs

Intiendo settings


Results


                             Spatial run    Non-spatial run
Customer address records     14,670         14,670
Matched address records      8,905 (61%)    8,514 (58%)
Non-matched address records  5,765 (39%)    6,156 (42%)

3% is low but improvement on bigger address sets can be significant (next slide), e.g. address on different sides of a highway
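
The percentages in the results table can be re-derived from the record counts. The short check below uses only the counts reported above. The variable names are illustrative, and reading "3%" as a percentage-point difference over all records is an interpretation rather than something the slide states.

```python
total = 14_670
matched = {"spatial": 8_905, "non_spatial": 8_514}

# Match rate per run, as a fraction of all customer address records.
rates = {run: count / total for run, count in matched.items()}
# Non-matched records are simply the remainder.
non_matched = {run: total - count for run, count in matched.items()}

# Extra records matched once the spatial adjacency match is enabled.
extra = matched["spatial"] - matched["non_spatial"]
# Difference between the two match rates, in percentage points.
gain_points = (rates["spatial"] - rates["non_spatial"]) * 100

for run in matched:
    print(f"{run}: {rates[run]:.1%} matched, {non_matched[run]} non-matched")
print(f"extra matches: {extra} ({gain_points:.2f} percentage points)")
```

The unrounded difference is a little under 3 percentage points. The slide's figure corresponds to the difference between the rounded rates of 61% and 58%.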


Results

Subsequent real-life implementations on bigger datasets have yielded significantly improved results.

In a dataset recently analysed for a major credit bureau, 21 million records were examined. Without spatial adjacency, 3.87 million were successfully geocoded automatically; with spatial adjacency on, an additional 0.95 million were geocoded, for a total of 4.82 million. Thus the spatial adjacency match yielded a 24.5% improvement.
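
The 24.5% figure appears to be a relative improvement, that is, the additional matches divided by the matches obtained without spatial adjacency. The arithmetic can be checked as follows. The variable names are illustrative, and the share-of-all-records line is an extra calculation that is not on the slide.

```python
examined = 21.0            # million records examined
non_spatial = 3.87         # million geocoded without spatial adjacency
additional = 0.95          # million more geocoded with it switched on

total_geocoded = non_spatial + additional
# Improvement relative to the non-spatial result.
relative_gain = additional / non_spatial * 100
# The same additional records as a share of everything examined.
share_of_examined = additional / examined * 100

print(f"total geocoded: {total_geocoded:.2f} million")
print(f"relative improvement: {relative_gain:.1f}%")
print(f"share of all records examined: {share_of_examined:.1f}%")
```

On the same relative basis, the sample test run (391 extra matches over 8,514) would come to roughly 4.6%, so the 3% and 24.5% figures are not measured in the same way.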


Results

Specific example

   Source  Province        Town             Suburb              Street Name              Street Number
1  Input   Gauteng         Alberton         New Redruth         Voortrekker Road         16
2  NSR     Gauteng (100%)  Alberton (100%)  New Redruth (100%)  Voortrekker Road (100%)  35 (96%)
3  SR      Gauteng (100%)  Alberton (100%)  South Crest (44%)   Voortrekker Road (100%)  16 (100%)

NSR = non-spatial run, SR = spatial run


Results

[Figure: 35 Voortrekker Road and 16 Voortrekker Road]


Results

If there are misleading suburb names in addresses, alphanumeric match by itself can cause errors.

• Intiendo = alphanumeric + spatial adjacency match

• More input addresses are matched more accurately

• Improves quality of results

• Sample test runs: 3% improvement

• Real life example: 24.5% improvement


Conclusion

• Intiendo address matching = alphanumeric string matching + spatial adjacency match

  • Improves quality of results

  • More addresses matched more accurately

• This work

  • Specific sample dataset showed improvement

• Future

  • More tests to understand average percentage improvement


Acknowledgements

Christopher Ueckermann from AfriGIS for running the geocoding tests with Intiendo