Testing the spatial adjacency match of the Intiendo address matching tool
for geocoding of addresses with misleading suburb or place names
by Serena Coetzee [email protected] and
Magnus Rademeyer [email protected] at the ICC 2009,
Santiago, Chile, November 2009
Overview
• Why Geocode?
• The Address Lifecycle
• Problem statement
• Address matching with a spatial adjacency match
• Test runs
• Results
• Conclusion
Testing the spatial adjacency match of the Intiendo address matching tool for geocoding of addresses with misleading suburb or place names, Serena Coetzee and Magnus Rademeyer, presented at the ICC 2009, Santiago, Chile, November 2009
Why Geocode?
We geocode addresses to link attribute data to physical
positions for the purpose of logistics, governance
(elections, rates and taxes), customer database analysis
(risk, trade area analytics) and many more.
The Address Lifecycle
• Address Capturing
• Address Cleaning
• Address Geocoding & Verification
• Address Delivery & Analysis
Problem statement
Alphanumeric matching: 101 Rubida Street, Murrayfield incorrectly matched to 110 Rubida Street, Murrayfield
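The Rubida Street mismatch can be reproduced with any character-based similarity score. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as a stand-in; this is an assumption for illustration, not the matcher Intiendo uses. The two addresses differ only in the order of two digits, so the score stays high even though the street numbers refer to different properties.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (illustrative stand-in only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


input_address = "101 Rubida Street, Murrayfield"
wrong_match = "110 Rubida Street, Murrayfield"

# The strings share almost every character, so the score is close to 1.
score = similarity(input_address, wrong_match)
print(f"similarity: {score:.2f}")
```

A purely textual threshold would therefore have little reason to reject this match, which is the gap a spatial check is meant to close.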
Problem statement
• Alphanumeric matching by itself can cause errors (previous slide)
• Potential solution: attribute relaxation (i.e. ignore suburb)
• Most common cause of errors (Goldberg et al. 2007)
With spatial adjacency match
• Intiendo = alphanumeric matching + spatial adjacency match
• Improves geocoding results
Alphanumeric match: propose matched address from reference dataset
Above threshold?
• Yes: proposed matched address is an acceptable result
• No: search for street number in radius around proposed address
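The decision flow above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Intiendo's implementation: the `RefAddress` type, the `adjacency_match` function, the threshold and radius values, the coordinates, the "Elsewhere" record, the same-street restriction, and the choice of the closest candidate are all hypothetical. Only the two branches (accept above the threshold, otherwise search for the street number within a radius of the proposed address) come from the slide.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional, Sequence


@dataclass(frozen=True)
class RefAddress:
    number: str
    street: str
    suburb: str
    x: float  # hypothetical projected coordinates, in metres
    y: float


def adjacency_match(
    input_number: str,
    proposed: RefAddress,
    score: float,
    reference: Sequence[RefAddress],
    threshold: float = 0.9,    # assumed value
    radius_m: float = 1000.0,  # assumed value
) -> Optional[RefAddress]:
    """Accept the proposal if it reaches the threshold; otherwise look for
    the input street number near the proposed address."""
    if score >= threshold:
        return proposed

    def dist(a: RefAddress) -> float:
        return hypot(a.x - proposed.x, a.y - proposed.y)

    nearby = [
        a for a in reference
        if a.number == input_number
        and a.street == proposed.street  # assumption: stay on the same street
        and dist(a) <= radius_m
    ]
    # Assumption: prefer the closest candidate; None if nothing is in range.
    return min(nearby, key=dist, default=None)


reference = [
    RefAddress("35", "Voortrekker Road", "New Redruth", 0.0, 0.0),
    RefAddress("16", "Voortrekker Road", "South Crest", 300.0, 0.0),
    RefAddress("16", "Voortrekker Road", "Elsewhere", 9000.0, 0.0),
]
proposed = reference[0]  # what the alphanumeric match proposed
result = adjacency_match("16", proposed, score=0.8, reference=reference)
print(result)
```

The suburb is deliberately ignored in the radius search, so a correct street number in a neighbouring suburb can still be found when the input suburb is misleading.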
With spatial adjacency match
1. Geocode without SpatialAdjacencyMatch (Non-spatial run)
2. Geocode with SpatialAdjacencyMatch enabled (Spatial run)
Compare results
• Sample input address data
• 14,670 address records
• Test for misleading names
• Therefore include only addresses for which province, suburb, street name and street number are populated
Province  Town          Suburb          Street Name      Street Number
Gauteng   Johannesburg  Saxonwold       Engelwold Road   19
Gauteng   Pretoria      Atteridgeville  Sekukuni Street  104
Gauteng   Midrand       Noordwyk        Sagewood Avenue  637
Intiendo hierarchy database
Reference dataset: AfriGIS address data
Intiendo settings
Test runs
Results
3% is low, but the improvement on bigger address sets can be significant (next slide),
e.g. addresses on different sides of a highway
                             Spatial run   Non-spatial run
Customer address records     14,670        14,670
Matched address records      8,905 (61%)   8,514 (58%)
Non-matched address records  5,765 (39%)   6,156 (42%)
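The percentages in the table follow directly from the record counts. The short calculation below is only a check of that arithmetic; the variable names are illustrative and the counts are the ones reported on the slide.

```python
total = 14_670
matched = {"spatial": 8_905, "non_spatial": 8_514}

# Match rate per run, as a percentage of all customer address records.
rates = {run: 100 * n / total for run, n in matched.items()}

extra_matches = matched["spatial"] - matched["non_spatial"]
gain_points = rates["spatial"] - rates["non_spatial"]

for run, rate in rates.items():
    print(f"{run}: {rate:.1f}% matched")
print(f"extra matches: {extra_matches}, gain: {gain_points:.1f} percentage points")
```

Read this way, the 3% on the slide is the difference between the two rounded match rates (61% and 58%), that is, a gain in percentage points.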
Results
Subsequent real-life implementations on bigger datasets have yielded significantly improved results.
In a dataset recently analysed for a major credit bureau, 21 million records were examined. Without spatial adjacency, 3.87 million were successfully geocoded automatically; with spatial adjacency on, an additional 0.95 million were
geocoded, for a total of 4.82 million. Thus the spatial adjacency match yielded a 24.5% improvement.
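The 24.5% figure is the additional records expressed relative to the records geocoded without spatial adjacency. The sketch below reproduces it and, for comparison, also expresses the same gain as a share of all 21 million records; the function names are illustrative, and the second measure is an added reading rather than a figure from the slides.

```python
def relative_gain(extra: float, baseline: float) -> float:
    """Extra matches as a percentage of the baseline matches."""
    return 100 * extra / baseline


def point_gain(extra: float, total: float) -> float:
    """Extra matches as a percentage of all records examined."""
    return 100 * extra / total


# Credit bureau dataset, in millions of records (values from the slide).
total_records = 21.0
without_spatial = 3.87
extra_with_spatial = 0.95

bureau_relative = relative_gain(extra_with_spatial, without_spatial)
bureau_points = point_gain(extra_with_spatial, total_records)

print(f"relative improvement: {bureau_relative:.1f}%")
print(f"gain in match rate:   {bureau_points:.1f} percentage points")
```

The two measures answer different questions, so it helps to state which baseline is meant when comparing this result with the 3% from the sample test runs.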
Results
Specific example
   Source  Province        Town             Suburb              Street Name              Street Number
1  Input   Gauteng         Alberton         New Redruth         Voortrekker Road         16
2  NSR     Gauteng (100%)  Alberton (100%)  New Redruth (100%)  Voortrekker Road (100%)  35 (96%)
3  SR      Gauteng (100%)  Alberton (100%)  South Crest (44%)   Voortrekker Road (100%)  16 (100%)
NSR = non-spatial run, SR = spatial run
[Map: locations of 35 Voortrekker Road and 16 Voortrekker Road]
Results
If there are misleading suburb names in
addresses, alphanumeric match by itself can
cause errors.
• Intiendo = alphanumeric + spatial adjacency match
• More input addresses are matched more accurately
• Improves quality of results
• Sample test runs: 3% improvement
• Real life example: 24.5% improvement
Conclusion
• Intiendo address matching = alphanumeric string matching + spatial adjacency match
  • Improves quality of results
  • More addresses matched more accurately
• This work
  • Specific sample dataset showed improvement
• Future
  • More tests to understand average percentage improvement
Acknowledgements
Christopher Ueckermann from AfriGIS
for running the geocoding tests with Intiendo