Data Quality and Neogeography

A review of the role played by User Generated Content in creating or augmenting spatial databases.

Page 1: Data Quality and Neogeography

blog.telemapics.com

Page 2: Data Quality and Neogeography

Before We Start

  • I am not here to persuade you about the usefulness or limitations of Neogeography or User Generated Content
  • I am here to share my views on issues relating to the topic of spatial data quality and neogeography
  • Disclaimer: in general, my observations derive from my familiarity with mapping, navigation and local search

Page 3: Data Quality and Neogeography

My Background

  • PhD in Geography, specializing in Cartography
  • Attended AutoCarto 1 in 1974 (and gave the keynote in 2008)
  • Associate Professor of mapping and geography at SUNY Albany (1972–1985)
  • Associate at Spad Systems
  • Chief Cartographer, Chief Technologist and VP of BizDev for Rand McNally (1986–1999)
  • CTO and EVP of Engineering for go2 Systems (YP over cell phones)
  • Now run a consulting business focused on geospatial, especially local search, mapping and navigation applications

Page 4: Data Quality and Neogeography

Data Quality and Neogeography

Dr. Mike Dobson

President

TeleMapics LLC

[email protected]@telemapics.com

Page 5: Data Quality and Neogeography

Spatial Data Quality?

  • Overall concern regarding the “fitness” of data for a particular use
  • Accuracy of position
      • Resolution
  • Accuracy of attribution
  • Logical consistency
  • Completeness
      • Including spatial coverage
  • Temporal relevance
  • Metadata

Page 6: Data Quality and Neogeography

Spatial Data’s Emerging Popularity

  • World of spatial data is exploding
  • Accessibility to spatial data increasing
  • Availability of spatial data increasing
  • Today’s online environment provides
      • Easy-to-use tools for collecting spatial data
      • Easy-to-use tools for analyzing spatial data
      • Easy-to-use tools for presenting spatial data

Page 7: Data Quality and Neogeography

Why Is This of Concern?

  • The quality of spatial data moderates the success of communicating spatial concepts
  • Could this explosive growth have an influence on the quality of spatial data?

Page 8: Data Quality and Neogeography

Why Data Quality Is Key

Page 9: Data Quality and Neogeography

No Integrity!

Page 10: Data Quality and Neogeography

Neogeography

  • Neogeography
      • “New” geography using non-traditional tools
  • Neogeographers
      • Want to communicate/share their interests in geography and are willing to do something about it

Page 11: Data Quality and Neogeography

NeoGeos

  • What roles do neogeographers play in the process of communicating spatial data?
      • Data collectors – database creators
      • Data analyzers
      • Data presenters
  • While all three roles impact or are influenced by “data quality”, today I will focus on neogeographers and data collection/database creation

Page 12: Data Quality and Neogeography

Spatial Data Quality and Neogeography

  • In order to help you understand my persuasion on data quality and neogeography, I would like to explore User Generated Content
  • UGC is one of the primary means that neogeographers use to express their interest in Geography
  • On this journey we will loop outside of geography and then fall back in through mapping and other uses of spatial data.

Page 13: Data Quality and Neogeography

User Generated Content?

  • Content that is produced by users of web sites and digital media
  • Contrasted with traditional media producers such as broadcasters, production companies, publishing companies and map database companies

Page 14: Data Quality and Neogeography

So What’s Important About UGC?

  • Equality of opportunity to publish
  • Coupled with one of the most significant demographic trends in the last century:
      • “It’s about me” (e.g. use of YouTube, MySpace, Facebook)
      • “Especially in respect to the streets, roads and trails I travel, as well as the POIs I frequent and the spatial topics of interest to me”

Page 15: Data Quality and Neogeography

Social Networking

Page 16: Data Quality and Neogeography

How Did This Happen?

  • Technology that allows you to be “connected”, as well as to communicate and collaborate on your own terms
      • Internet
      • Cellular telephony
  • Development of comprehensive spatial databases
      • Pushing geospatial into the mainstream: Neogeography

Page 17: Data Quality and Neogeography

How Did This Happen?

  • Networks provide for
      • Collective intelligence – the hive mentality or perhaps the Borg
      • Aggregated knowledge from decentralized sources (Wikipedia – Wikinomics)
      • Low cost collaboration

Page 18: Data Quality and Neogeography

UGC Potential Benefits

  • Linus’s law
      • With enough eyes all bugs (spatial errors) become trivial
  • Contributors exhibit
      • Self selection
      • Focus
      • Self benefit
  • Numerousness
      • There should be more interested spatial data contributors than professional map editors
  • Spatial distribution
      • The distribution of UGCers is more ubiquitous than that of professional map editors.
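
A rough way to see why numerousness matters for Linus’s law: as a sketch only, assume each contributor independently notices a given spatial error with the same probability p. Both the independence assumption and the function name `detection_probability` are hypothetical and introduced here for illustration; they are not from the talk. Under that assumption the chance that at least one of n contributors notices the error is 1 - (1 - p)^n.

```python
def detection_probability(p_single: float, n_contributors: int) -> float:
    """Chance that at least one of n contributors notices an error.

    Assumes (for illustration only) that contributors act independently
    and each notices the error with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_contributors < 0:
        raise ValueError("n_contributors must be non-negative")
    # Probability that every contributor misses the error, then the complement.
    all_miss = (1.0 - p_single) ** n_contributors
    return 1.0 - all_miss


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, round(detection_probability(0.05, n), 3))
```

Under these assumptions, even a 5% individual chance of spotting an error grows to roughly 92% with 50 contributors. Real contributors are neither independent nor evenly distributed, which is where the criticisms on the next slide come in.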

Page 19: Data Quality and Neogeography

Criticisms Of UGC

  • Some error situations are too complex to be understood in real time
  • Usability may be low
      • May require extensive error checking
  • User priorities may lead to unreliability
  • Prejudice in responses

Page 20: Data Quality and Neogeography

Lake What Road?

Page 21: Data Quality and Neogeography

Not enough Contributors - Data Points?

Page 22: Data Quality and Neogeography

User Priorities - Oooops

Page 23: Data Quality and Neogeography

Prejudice in Response?

Page 24: Data Quality and Neogeography

Prejudice in Response

Page 25: Data Quality and Neogeography

UGC And Spatial Databases

Page 26: Data Quality and Neogeography

Spatial Database Creation

Page 27: Data Quality and Neogeography

What’s Being Optimized In The Previous Process?

  • Spatial data quality
      • Accuracy of position
          • Resolution
      • Accuracy of attribution
      • Logical consistency
      • Completeness
          • Including spatial coverage
      • Temporal relevance
      • Metadata

Page 28: Data Quality and Neogeography

How Optimized?

  • Data quality is an integral part of the process
  • Initially
      • Data collected according to specifications
      • Bad data re-collected or placed in the update queue
  • Ongoing
      • Every year significant spatial changes are accommodated.
      • Areas of high change are identified and updated.
      • Other changes are found by systematically working research teams through the entire coverage over time
  • The overall assignment is designed to maximize the time value of money, while increasing the integrity of the database.

Page 29: Data Quality and Neogeography

Harmonization

  • It is this attempt to actively harmonize all data that distinguishes database building efforts.
  • Important issues
      • Who directs crowdsourced data from an editorial perspective?
      • Who sets standards for crowdsourced data?
      • Who quality-controls crowdsourced data?
      • What external guidance exists in crowdsourced systems?

Page 30: Data Quality and Neogeography

Three Categories of Spatial Data

  • Controlled data
      • OS, Navteq, TeleAtlas, INFOusa
  • Hybrid (a mix of controlled and uncontrolled data)
      • Google, Yahoo, MSN, TomTom
  • Crowdsourced (uncontrolled)
      • OSM, Flickr, etc.

Page 31: Data Quality and Neogeography

Issue

  • It is possible to manage controlled data quality to meet specific requirements
  • It is possible to manage hybrid data quality to meet specific requirements
  • But can you manage crowdsourced data quality to meet specific requirements on a reliable basis?
  • Let’s look at database compilation for some insights

Page 32: Data Quality and Neogeography

Compilation

Commercial
  • Training in compilation
  • Specialization
  • Staff size limited
  • Research limited
  • Sweat of the brow
      • But salaried sweat of the brow

Wiki
  • Self selection
  • Local experience
  • Staff size potentially unlimited
  • Research hours potentially unlimited
  • Avocation

Page 33: Data Quality and Neogeography

Compare and Contrast

Commercial
  • What are my coverage goals?
  • What are my accuracy goals?
  • How much can I spend on updating?
  • What size of capable staff can I afford?
  • How well can I pay them?
  • How can I otherwise incent them to create the best database possible?

WIKI
  • How many people will contribute?
  • How many are capable?
  • Where are they located?
  • Does this match areas of weak coverage?
  • How long will it take to get good results over large coverages?
  • How to motivate these collaborators over long periods?

Page 34: Data Quality and Neogeography

What Are The Potential Weaknesses of WIKI?

  • Common issues
      • Not enough data gatherers to validate the data
          • Or a method to redeploy them
      • Not enough coverage to meet the need (the distribution of the UGCers)
          • Or a method to redeploy them
      • Lack of standards
      • Lack of quality control
  • But all of these limitations can be accommodated

Page 35: Data Quality and Neogeography

Getting Around Some UGC Issues

Page 36: Data Quality and Neogeography

Are Other Types of Spatial Databases Superior?

  • Even with the benefits of Moolah ($), major navigation databases are
      • Out of date
      • Inaccurate
      • Non-comprehensive
      • Of variable quality
      • Too expensive to maintain
  • Navteq database extension and update costs in 2007 were over $300,000,000

Page 37: Data Quality and Neogeography

www.refnum.com/osm/gmaps/

Haywards Heath

Page 38: Data Quality and Neogeography

And That’s Why UGC and Neogeographers

  • Will become an integral part of building spatial databases
  • Hybrid data collection systems using UGC and controlled data are where geospatial is going
      • Let’s look

Page 39: Data Quality and Neogeography

Old Information Sharing

Page 40: Data Quality and Neogeography

New Information Sharing

Page 41: Data Quality and Neogeography

What’s The New Process

Page 42: Data Quality and Neogeography

Social Networking Tools Of Interest in Compilation

Page 43: Data Quality and Neogeography

Spatial Data Collection

  • Some UGC will be active
      • User connects to an app and enters relevant spatial data for updating or extending a spatial database
  • Some UGC will be passive
      • Device tracks and reports (anonymously) user paths, builds database by merging path information over time
      • Passive is particularly useful in building navigation databases
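
To make the passive case concrete, here is a minimal sketch of merging anonymous path reports over time. It is an assumption-laden illustration, not the method used by any vendor named in this talk: the grid-snapping approach, the function names `cell_of` and `merge_traces`, the 0.001-degree cell size and the `min_devices` threshold are all hypothetical. The idea is simply that a location is accepted as probably travelled only once several independent traces agree on it.

```python
from collections import Counter


def cell_of(lat, lon, cell_deg=0.001):
    """Snap a coordinate to a grid cell index (cell size is an assumed value)."""
    return (round(lat / cell_deg), round(lon / cell_deg))


def merge_traces(traces, cell_deg=0.001, min_devices=3):
    """Return the grid cells crossed by at least min_devices distinct traces.

    traces is a list of paths; each path is a list of (lat, lon) tuples
    reported anonymously by one device.
    """
    support = Counter()
    for trace in traces:
        # Count each trace once per cell so a single device cannot vote twice.
        cells = {cell_of(lat, lon, cell_deg) for lat, lon in trace}
        support.update(cells)
    return {cell for cell, count in support.items() if count >= min_devices}


if __name__ == "__main__":
    traces = [
        [(40.0, -75.0), (40.001, -75.0)],
        [(40.0001, -75.0001), (40.0011, -75.0)],
        [(40.0002, -75.0), (41.0, -75.0)],  # second point is an outlier
    ]
    print(merge_traces(traces, min_devices=3))
```

Requiring agreement between several devices is one simple way to filter out a stray report, which ties back to the earlier concern about having enough contributors to validate the data.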

Page 44: Data Quality and Neogeography

Relative Cost

[Chart: relative cost (axis 0 to 160) by data compilation technique: C, C + D, C + D + A, C + D + P, C + D + P + A]

Page 45: Data Quality and Neogeography

Relative Accuracy

[Chart: relative accuracy (axis 0 to 200) by data compilation technique: C, C + D, C + D + A, C + D + P, C + D + P + A]

Page 46: Data Quality and Neogeography

Summing Up

  • Data collection systems
      • Closed – commercial compilation efforts, no UGC
      • Open – WIKI approaches, no proprietary data
      • Hybrid – where geospatial is going
          • Improves spatial data accuracy by combining the best of both approaches.

Page 47: Data Quality and Neogeography

Raises These Questions

  • Will the winners be
      • Established commercial companies that capitalize on UGC to augment their data?
      • New competitors that commercialize UGC and augment these data to compete with established commercial systems?

Page 48: Data Quality and Neogeography

PND Data Flow – A Winner

Page 49: Data Quality and Neogeography

UGC Open Street Data Flow – No Medal

Page 50: Data Quality and Neogeography

Commercializing UGC

Page 51: Data Quality and Neogeography

Relative Benefits Of Types Of UGC By Device

Page 52: Data Quality and Neogeography

Why We Need UGC and Neogeographers

Page 53: Data Quality and Neogeography

Thanks