11
GEO-SPATIAL INFORMATION SCIENCE, 2017 VOL. 20, NO. 3, 241–251 https://doi.org/10.1080/10095020.2017.1368200 KEYWORDS Pokémon Go; VGI (Volunteered Geographic Information); Yelp; cluster; South Florida; Boston; HCI (human–computer interaction) ARTICLE HISTORY Received 26 March 2017 Accepted 1 August 2017 © 2017 Wuhan University. Published by Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. CONTACT Levente Juhász levente.juhasz@ufl.edu Where to catch ‘em all? – a geographic analysis of Pokémon Go locations Levente Juhász and Hartwig H. Hochmair Geomatics Program, University of Florida, Fort Lauderdale, FL, USA ABSTRACT In 2016, Niantic Labs released Pokémon Go, an augmented reality smartphone game that attracted millions of users worldwide. This game allows users to “catch” Pokémons through their mobile cameras in different geographic locations that often correspond to prominent places. This paper analyzes the distribution of PokéStops, Pokémon gyms, and spawnpoints in selected urban areas of South Florida and Boston. It identifies which socioeconomic variables and land- use categories affect the density of PokéStops, and how PokéStops and gyms cluster relative to each other. Using nearest neighbor analysis, this paper assesses also how actual PokéStop locations are reflected in Yelp’s “PokéStop nearby” attribute. Results show that black and Hispanic neighborhoods are disadvantaged when it comes to crowd-sourced data coverage, that PokéStops occur more frequently in commercial, recreational and touristic sites and around universities, and that PokéStops tend to cluster around gyms. The latter suggests that these point sets were generated by a similar location selection process. To mitigate geographically linked biases, future versions of augmented reality and geo-games should aim to make them equally accessible in all areas, for example by placing extra resources, such as points of interest, in neighborhoods that are currently underrepresented in data coverage. 1. Introduction Recent years experienced an increased popularity of augmented reality (AR) applications and location-based games embedded in different devices, such as smart- phones or tablets (Johnson et al. 2011; Neustaedter, Tang, and Judge 2013). AR is oſten used for educational purposes as it offers new learning opportunities through combining computer-assisted contextual layers with relevant real-world information (de Lucia et al. 2012; Wu et al. 2013). Pokémon Go, released in July 2016, is a location-based AR game that quickly became the most downloaded smartphone app on both Android and iOS in history (BBC 2016; Reisinger 2016). e game allows anyone with a smartphone to collect virtual Pokémon characters on the screen which appear to be positioned at the same location as the player. To do so, the player needs to navigate to certain points on a map. ese points are oſten placed at prominent places, such as landmark buildings or statues. Locations are based on the crowd- sourced data-set of an earlier AR game called Ingress which was mostly collected by male, tech-savvy players, leading to a concentration of virtual landmarks in com- mercial and downtown areas and fewer in non-white or residential areas (Akhtar 2016). Pokémon Go users can directly interact with three different point data-sets in the game, which are PokéStops, Pokémon gyms, and spawnpoints. PokéStops are points in the geographic space, associated with a landmark, such as a building or monument. Players need to visit these locations and perform some actions, such as flipping a coin. In return, players are rewarded with items, such as a Pokéball, which they need to capture Pokémons later. Gyms are virtual locations where Pokémons can be trained and that are also associated with landmarks. Pokémons can “pop-up” at distinct locations (spawnpoints) during the game for 30 min. During this time-window, users have the chance to visit these locations and catch the specific Pokémon, which, upon capturing, will appear in the inventory of the player. Since all these locations provide a player with opportunities to perform beneficial activities for their game, an area with a higher density of PokéStops, gyms, or spawnpoints means also more advantages to the player (Colley et al. 2017). Pokémon Go was in the center of media attention for an extended period of time with several media articles reporting issues and findings about the game. Residents of certain neighborhoods reported that they felt that there was a smaller number of PokéStops in their areas than in other areas of the same city. e lack of PokéStops in these disadvantaged neighborhoods eventually prevented these residents from effectively participating in the game. Figure 1 visually supports these inequalities of Pokémon-related point densities OPEN ACCESS Downloaded by [Universitat Salzburg] at 23:39 21 September 2017

Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

Geo-spatial information science, 2017Vol. 20, no. 3, 241–251https://doi.org/10.1080/10095020.2017.1368200

KEYWORDSpokémon Go; VGi (Volunteered Geographic information); Yelp; cluster; south florida; Boston; Hci (human–computer interaction)

ARTICLE HISTORYreceived 26 march 2017 accepted 1 august 2017

© 2017 Wuhan University. published by taylor & francis Group.this is an open access article distributed under the terms of the creative commons attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

CONTACT levente Juhász [email protected]

Where to catch ‘em all? – a geographic analysis of Pokémon Go locations

Levente Juhász  and Hartwig H. Hochmair

Geomatics program, University of florida, fort lauderdale, fl, Usa

ABSTRACTIn 2016, Niantic Labs released Pokémon Go, an augmented reality smartphone game that attracted millions of users worldwide. This game allows users to “catch” Pokémons through their mobile cameras in different geographic locations that often correspond to prominent places. This paper analyzes the distribution of PokéStops, Pokémon gyms, and spawnpoints in selected urban areas of South Florida and Boston. It identifies which socioeconomic variables and land-use categories affect the density of PokéStops, and how PokéStops and gyms cluster relative to each other. Using nearest neighbor analysis, this paper assesses also how actual PokéStop locations are reflected in Yelp’s “PokéStop nearby” attribute. Results show that black and Hispanic neighborhoods are disadvantaged when it comes to crowd-sourced data coverage, that PokéStops occur more frequently in commercial, recreational and touristic sites and around universities, and that PokéStops tend to cluster around gyms. The latter suggests that these point sets were generated by a similar location selection process. To mitigate geographically linked biases, future versions of augmented reality and geo-games should aim to make them equally accessible in all areas, for example by placing extra resources, such as points of interest, in neighborhoods that are currently underrepresented in data coverage.

1. Introduction

Recent years experienced an increased popularity of augmented reality (AR) applications and location-based games embedded in different devices, such as smart-phones or tablets (Johnson et al. 2011; Neustaedter, Tang, and Judge 2013). AR is often used for educational purposes as it offers new learning opportunities through combining computer-assisted contextual layers with relevant real-world information (de Lucia et al. 2012; Wu et al. 2013). Pokémon Go, released in July 2016, is a location-based AR game that quickly became the most downloaded smartphone app on both Android and iOS in history (BBC 2016; Reisinger 2016). The game allows anyone with a smartphone to collect virtual Pokémon characters on the screen which appear to be positioned at the same location as the player. To do so, the player needs to navigate to certain points on a map. These points are often placed at prominent places, such as landmark buildings or statues. Locations are based on the crowd-sourced data-set of an earlier AR game called Ingress which was mostly collected by male, tech-savvy players, leading to a concentration of virtual landmarks in com-mercial and downtown areas and fewer in non-white or residential areas (Akhtar 2016). Pokémon Go users can directly interact with three different point data-sets in the game, which are PokéStops, Pokémon gyms, and

spawnpoints. PokéStops are points in the geographic space, associated with a landmark, such as a building or monument. Players need to visit these locations and perform some actions, such as flipping a coin. In return, players are rewarded with items, such as a Pokéball, which they need to capture Pokémons later. Gyms are virtual locations where Pokémons can be trained and that are also associated with landmarks. Pokémons can “pop-up” at distinct locations (spawnpoints) during the game for 30 min. During this time-window, users have the chance to visit these locations and catch the specific Pokémon, which, upon capturing, will appear in the inventory of the player. Since all these locations provide a player with opportunities to perform beneficial activities for their game, an area with a higher density of PokéStops, gyms, or spawnpoints means also more advantages to the player (Colley et al. 2017).

Pokémon Go was in the center of media attention for an extended period of time with several media articles reporting issues and findings about the game. Residents of certain neighborhoods reported that they felt that there was a smaller number of PokéStops in their areas than in other areas of the same city. The lack of PokéStops in these disadvantaged neighborhoods eventually prevented these residents from effectively participating in the game. Figure 1 visually supports these inequalities of Pokémon-related point densities

OPEN ACCESS

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 2: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

242 L. JUHÁSZ AND H. H. HOCHMAIR

between metropolitan Miami downtown area (Figure 1(a)) and the nearby municipality of Hialeah, which has a significant Hispanic population (Figure 1(b)). The dif-ference becomes most obvious for PokéStops and gyms.

Using a set of 600 gyms, over 5000 PokéStops, and over 18,000 spawnpoint locations within parts of South Florida, including Miami, and Boston, one of the goals of this study is to analyze how these Pokémon-related point locations cluster in geographic space and, more specifi-cally, what kind of demographic or land-use bias persists. Quantifying the latter would help players to adjust their strategy of participating in a game by becoming more effective in finding the related point of interest (POI).

There is evidence that other services and local busi-nesses try to utilize the popularity of Pokémon Go by organizing Pokémon-related special offers or develop-ing some functionalities in their sites that would be of interest to players. For example, vape shops began to monetize Pokémon Go by announcing that their store was a PokéStop, or for giving away a prize for the best Pokémon caught in a shop (Kirkpatrick et al. 2017). Another example is Yelp, which is a location-based service that gives users the ability to rate, review, and browse businesses, such as restaurants and stores. In July 2016, Yelp added a functionality to its smartphone apps and website where users could mark businesses with a “PokéStop nearby” attribute. This attribute was incor-porated into the Yelp search functionality, so that users could find businesses in the proximity of PokéStops

and therefore combine two leisure activities, which are Pokémon hunting and eating out in a restaurant. This process of accessing crowd-sourced information from one platform (i.e. PokéStops in Pokémons Go) and transferring it to another (Yelp) has been referred to as “cross-viewing” (Juhász and Hochmair 2017). In this study, we assess the correctness of Yelp’s “PokéStop nearby” attribute by comparing the spatial proximity of PokéStop-labeled Yelp businesses to their nearest PokéStops with the spatial proximity of all Yelp busi-nesses to their nearest PokéStops. This approach will demonstrate if and how AR applications affect crowd-sourced data platforms.

Based on the previous considerations, the following four research objectives can be formulated:

• R1: Identify the effect of socioeconomic factors and land-use categories on the number of PokéStops in a census block group;

• R2: For PokéStops, gym locations, and spawn-points, quantify the point distributions across dif-ferent land-use categories;

• R3: For PokéStop and gym locations, compare their relative spatial clustering patterns;

• R4: Determine the quality of the “PokéStop nearby” attribute on Yelp.

The remainder of the paper is structured as follows. The next section reviews previous work on Pokémon Go user activities as well as the distribution of Pokémon

Figure 1. pokémon-related point data-sets (a) around Downtown miami and (b) in Hialeah.

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 3: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

GEO-SPATIAL INFORMATION SCIENCE 243

Go locations and crowd-sourced POIs. This is followed by a description of the study setup in Section 3, and a presentation and discussion of analysis results in Section 4. Section 5 summarizes the findings and provides direc-tions for future work.

2. Previous work

Several media sources reported a new phenomenon around the release date of Pokémon Go, showcasing masses of people in public places watching their smart-phones while walking around. Indeed, Pokémon Go was found to motivate people to go outside and become more physically active (McCartney 2016; Nigg, Mateo, and An 2017; Xu et al. 2017). According to Clark and Clark (2016), the quick update of the Pokémon Go app demonstrates that health promotion should include a social dimension. The authors note that academic research oftentimes lacks the fast pace of technological developments in mobile industry, and that collaborations are needed between academia and industry to develop future apps for key populations, such as patients with chronic diseases or poor mental health. Besides having positive effects on people’s health, Pokémon Go encour-ages players to explore new areas, which can even lead to the identification of new species (Nature 2016). A related example is the recent detection of a new species of the pygmy devil grasshopper through a photo on Facebook (Skejo and Caballero 2016). The fact that millions of users are participating in this game is also appealing for business owners and other services since they can gener-ate revenue from this. For example, Yelp allowed its users to tag businesses near PokéStops which, in turn, allows Yelp users to search for businesses to visit in the prox-imity of PokéStops. Kondamudi, Protono, and Alhoori (2017) conducted a multi-year comparison of the num-ber of Yelp reviews for businesses with and without the “PokéStop nearby” attribute, finding that restaurants with a PokéStop nearby experienced a slight decrease in the number of reviews after the release of Pokémon Go.

The spatial distribution of PokéStops was recently analyzed by Colley et al. (2017). Using Chicago and Detroit as case study areas, the authors found a higher Pokémon stop density in white non-Hispanic neighbor-hoods than in neighborhoods with large minority popu-lations. Another bias was observed along the urban–rural spectrum, indicating a dramatic decrease in PokéStop density in rural counties compared to more urban coun-ties. A distinction between advantaged areas (with better data coverage) and disadvantaged areas (with lower data coverage) based on location has already been observed for data in social media and crowd-sourcing platforms, including Twitter, Flickr, or OpenStreetMap (Alivand and Hochmair 2017; Antoniou, Morley, and Haklay 2010; Li, Goodchild, and Xu 2013; Zielstra, Hochmair, and Neis 2013).

As Pokémon points are largely based on a crowd-sourced data-set through its adoption from Ingress, PokéStops and gym locations fall within the Volunteered Geographic Information (VGI) domain (Goodchild 2007). Yelp businesses, which are incorporated in this study, have also a VGI component since Yelp allows users to add new business locations and to write reviews for Yelp businesses. Several studies assessed the quality of VGI in general and specifically for POIs. For example, McKenzie, Janowicz, and Adams (2014) explore various techniques to automatically conflate two crowd-sourced POI data-sets, namely those of Yelp and Foursquare. In their venue data the authors identified almost 75,000 FourSquare venues within 1  km of 200 randomly selected Yelp businesses in the continental United States, illustrating a high POI density in crowd-sourced data-sets. Another study analyzed the editing history of Geographic Names Information System (GNIS) point features after their import into OpenStreetMap (OSM) in 2009 (Hochmair and Zielstra 2013). It found that OSM mappers generally helped to improve the data quality of the imported POIs. Quality analysis of Flickr tags showed that precision and accuracy of user-gen-erated data appear to be high enough to describe city neighborhoods (Hollenstein and Purves 2010).

3. Study setup

3.1. Study areas

The focus of this research was on selected urban areas in South Florida and Boston, MA. Since visual inspec-tion suggested that the underlying geography affects the number of Pokémon points, we selected 17 hexa-gon-shaped study areas with different neighborhood characteristics, such as downtown areas, suburban, touristic, and rural/agricultural areas. Figure 2 shows the study areas in South Florida (Figure 2(a)) and Boston (Figure 2(b)).

3.2. Data extraction

3.2.1. Pokémon Go locationsPokémon Go is a commercial product that does not pro-vide an open application programming interface (API) for data access on its backend. However, a discussion started on reddit (https://www.reddit.com/r/pokemon-godev) resulted in open source software libraries that accessed the communication flow between the smart-phone apps and Niantic servers, enabling information extraction. We used the pgoapi (https://github.com/tejado/pgoapi) and PokemonGo-Map (https://github.com/scottstamp/PokemonGo-Map) libraries along with multiple accounts (player profiles) in different geographic areas (Figure 2), to log on to the game pro-grammatically and to obtain Pokémon, PokéStops, and gym locations. The method used from these libraries

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 4: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

244 L. JUHÁSZ AND H. H. HOCHMAIR

3.2.3. Supplemental data sourcesThe analysis for R1 was conducted at the US Census Block Group level. This aggregation level was chosen as it avoids too many zero-count areas with regard to Pokémon locations (as would have been the case with smaller census blocks) while still providing a sufficient sample size and a detailed enough spatial granularity to capture local variability in socioeconomic and land-use-related variables between analyzed areas (which would not have been the case with, for example, larger census tracts). For each block group, 2016 projections of per-centage (%) of African-American and percentage (%) of Hispanic population were obtained from the Business Analyst of Environmental Systems Research Institute, Inc. (Esri) (Esri 2016). The presence of parks and higher education institutions in block groups was extracted from the OSM OverpassAPI and coded as dummy var-iables. More specifically, parks were extracted through

simulates the behavior of a smartphone user who moves around the city. This process is illustrated in Figure 3 where cyan circles represent zones that were scanned in each hexagon-shaped area. Our agents (one for each hexagon) started in the middle of the hexagon and sys-tematically spiralled outwards through all zones. The entire scan process for a hexagon was optimized so that it was faster than the duration that a Pokémon is visible (30 min) at a spawnpoint. This increases the chance to record a Pokémon that popped up. Scanning an area once provides a snapshot of currently visible Pokémons, which is only a subset of all available spawnpoints. To overcome this limitation, our agents were continuously collecting data between late July and early August, 2016, and unique spawnpoint locations were extracted from a large set (137,917) of Pokémon encounters. PokéStops and gym locations are static, therefore a complete list can be obtained with a single scan. We stored all locations of PokéStops and gyms as well as Pokémon encounters in SQLite databases. The final data-set contains 600 gyms, 5017 PokéStops, and 18,257 spawnpoint locations spread across South Florida and Boston.

3.2.2. Yelp businessesYelp provides an API to access its services and data, which was used to extract Yelp business information within bounding box queries (Juhász, Rousell, and Arsanjani 2016), where the API returns up to 20 busi-nesses for each query request. To build a complete data-set of Yelp businesses within a chosen geographic area, our algorithm inserted locally refined bounding boxes whenever this download threshold was reached, resulting in a geographically nested sequence of queries (Figure 4). This means that areas with a higher density of businesses, such as strip malls, required a refined grid pattern of bounding boxes to obtain all business locations within the original bounding box. In an addi-tional step, we extracted all businesses tagged with the “PokéStop nearby” attribute using a query filter.

Figure 2. study areas in (a) south florida and (b) Boston, ma.

Figure 3. example of scanned area for extracting pokémon Go point data-sets in miami Beach.

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 5: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

GEO-SPATIAL INFORMATION SCIENCE 245

transportation category in Boston combines roads and infrastructure, so that all Pokémon Go features falling onto streets or transportation infrastructure will fall into the transportation and none in the road category. We decided to keep road and infrastructure land-use cate-gories separated for Miami (and not to aggregate them into one class) to obtain more refined information about the prevalence of Pokémon Go features in different land-use classes.

4. Result analysis

4.1. Analysis methods

Different analysis methods were applied for the different research objectives. The relationship between PokéStop counts and socioeconomic and land-use-based factors (R1) was explored using a negative binomial regression model. Prevalence of Pokémon Go features in different land-use categories (R2) was analyzed using a relative count index and subsequent chi-square tests of inde-pendence. Cross K-functions were used to analyze the relative clustering between PokéStops and gyms (R3), and nearest neighbor distances were used to analyze the clustering of PokéStops around Yelp businesses (R4).

4.2. R1 – Relationship between PokéStop counts and neighborhood variables

A negative binomial regression model (NBRM) that relates PokéStop counts in census block groups with eth-nicity, race, and land-use-based factors was developed through a manual stepwise approach where variables were added and removed in an exploratory manner to improve the model fit (measured by the Akaike infor-mation criterion) while avoiding multicollinearity between predictor variables. The number of Panoramio photos within a block group was used as a proxy for tourist activities, and the number of Yelp businesses per census block group was included as a proxy for eco-nomic activity. Block group area was used as a control variable. Other predictor variables, such as population count, median age, median household income, number of jobs, or overlap with central business district were also considered in the manual stepwise approach. However, these were non-significant and therefore not shown in the final results. It is possible that with a larger sample size of observations and a larger variation of attribute values of predictor variables some of the non-significant variables turn significant, e.g. by extending the analysis area. This could, however, not be tested due to a limited Pokémon Go data-set available for our analysis. Results of the final model estimation are listed in Table 1, where the p-values indicate significance of the corresponding coefficient at p < 0.001.

Table 1 shows that parks and universities are associ-ated with an increase in PokéStops. This is in-line with

a “leisure = park” query on OSM tags for nodes, ways and relations, and universities were extracted by find-ing features that matched any of the following queries on OSM tags: “amenity = university”, “amenity = col-lege”, “building = university”, “building = college”. The presence of any of these features in a block group was also coded as dummy variables. OSM data were used for these categories, since the official land-use layers from Miami-Dade County and Massachusetts did not con-tain parks and higher institutions as separate categories. OSM data quality in the analyzed areas (regarding com-pleteness and attribute accuracy) is high, which renders OSM data suitable for the proposed analysis. As a proxy for attractiveness for tourists, the number of Panoramio photos within each block group was extracted from the Panoramio API. Panoramio was chosen since it provides a better positional accuracy and coverage of outdoor images compared to Flickr (Zielstra and Hochmair 2013).For R2, which quantifies the relative abundance of Pokémon Go data-sets in different land-use catego-ries, the land-use layers from Miami-Dade County and Massachusetts were used. The four study sites in Collier and Broward Counties (shown near Naples and Fort Lauderdale in Figure 2) were excluded from this specific task due to the lack of an adequate land-use data-set. Original land-use classes were aggregated in both data-sets to obtain a more generalized land-use classification. The only semantic difference in the reclas-sifications between both regions (Miami-Dade County and Massachusetts) relates to the “roads” and “trans-portation” categories. In Miami-Dade, roads are shown separated from other transportation infrastructure (e.g. railroad terminals), and therefore some Pokémon Go points fell into the road class. As opposed to this, the

Figure 4.  scanning for Yelp businesses in part of miami-Dade county, fl.

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 6: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

246 L. JUHÁSZ AND H. H. HOCHMAIR

Regression residuals of block group polygons were not spatially autocorrelated at a 5% level of significance, which suggests that the specified model provides unbi-ased estimates and correct inference. This means also that explicit modeling of autocorrelation, e.g. through spatial eigenvector filtering (Helbich and Arsanjani 2015) or autoregressive models (de Smith, Goodchild, and Longley 2015) is not necessary in this case.

4.3. R2 – Counts of Pokémon Go points on land-use categories

This research question examines in detail how different land-use categories affect the abundance of PokéStops, gyms, and spawnpoints. Using study areas that are com-bined for the Miami-Dade County and Boston regions, point counts on different land-use categories were com-pared to count numbers that can be expected under complete spatial randomness (CSR). The expected count number for a land-use category is computed as the total number of points in a region multiplied by the propor-tion of area covered by the land-use category in ques-tion. Table 2 juxtaposes observed and expected point counts for the three point types of Pokémon Go points. Chi-squared tests of independence (lower portion of

newspaper articles reporting that Pokémon Go is played by college crowds (university) (Parry 2016) or for rec-reational purposes (parks) (Grande 2016; Khalid 2016). Areas with business opportunities and tourist activities are also found to be positively associated with PokéStop numbers, supporting earlier notions about a higher den-sity of crowd-sourced points for Ingress in commercial, downtown, and higher income areas (Akhtar 2016). Areas with a higher percentage of African-American and Hispanic population had a fewer PokéStops, supporting the notion of “redlining”, which describes a community being cut off from essential services based on its racial or ethnic group (Kooragayala and Srini 2016).

Table 2. pokémon Go-related point data-set counts on different land-use categories along with expected counts calculated based on area.

Land-use Area (km2) PokéStop obs. (exp) Spawnpoint obs. (exp) Gym obs. (exp)

(a) Miami-Dade County, FLagricultural 13.4 1 (96.1) 179 (881.2) 0 (13.4)commercial 5.5 169 (39.3) 1,216 (360.4) 19 (5.5)industrial 6.1 10 (43.9) 290 (402.6) 2 (6.1)natural 4.8 11 (34.4) 194 (315.4) 2 (4.8)other 14.3 22 (102.0) 178 (935.5) 2 (14.2)public 6.6 237 (47.3) 800 (400.7) 42 (6.6)recreational 5.6 124 (40.1) 801 (367.4) 23 (5.6)residential 39.9 102 (284.9) 4,169 (2,611.9) 10 (39.7)roads 24.8 325 (175.9) 1,161 (1,621.7) 41 (24.6)transportation 2.2 37 (15.5) 131 (142.3) 4 (2.2)Water 18.4 2 (131.0) 374 (1,200.9) 0 (18.2)Water (open) 4.2 2 (29.7) 52 (272.0) 0 (4.1)total obs. 1041 9545 145pearson’s chi-squared testX2 719.3 2779.5 114.5df 11 11 11p-value <2.20e–16 <2.20e–16 <2.20e–16

(b) Boston, MAagricultural 0.0 0 (0.9) 1 (2.6) 0 (0.1)commercial 6.8 589 (290.6) 1,249 (809.9) 35 (22.9)industrial 2.1 64 (87.4) 156 (243.6) 7 (6.9)natural 1.3 26 (55.0) 93 (153.4) 3 (4.3)other 0.5 22 (20.1) 50 (58.0) 2 (1.64)public 6.0 544 (255.6) 1,112 (712.1) 41 (20.1)recreational 2.2 178 (91.8) 362 (255.9) 21 (7.2)residential 17.2 305 (727.3) 1,806 (2,026.7) 25 (57.2)roads – – (–) – (–) – (–)transportation 3.1 104 (129.9) 122 (362.0) 11 (10.2)Water 1.4 8 (58.7) 62 (163.6) 1 (4.6)Water (open) 3.3 17 (139.0) 162 (387.2) 0 (10.9)total obs. 1,857 5,175 146pearson’s chi-squared testX2 557.3 504.6 42.7df 10 10 10p-value <2.20e−16 <2.20e−16 5.70e−06

Table 1. estimation results for the number of pokéstops.

Parameter CoefficientStandard

error p-valueintercept 0.946 0.098 <2e−16% of african-american −0.006 0.002 4.65e−04% of Hispanic −0.008 0.001 7.51e−08park 0.365 0.087 3.13e−05University 0.441 0.130 7.42e−04# of businesses 0.015 0.002 5.55e−14# panoramio pictures (tourism) 0.008 0.001 2.28e−08Area (km2) 0.396 0.109 2.85e−04N 373mcfadden’s pseudo R2 0.622

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 7: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

GEO-SPATIAL INFORMATION SCIENCE 247

agricultural, industrial, natural, residential, and water land-use categories in both cities, reflecting that these are areas with fewer PokéStops, gyms, and spawnpoints.

4.4. R3 – Spatial clustering of Pokémon Go point data-sets

Visual inspection of the study sites and results from the land-use analysis suggest that Pokémon Go point data-sets are spatially clustered. Since PokéStop and gym locations were allegedly generated from the same crowd-sourced data-set, it can be hypothesized that these two point groups are similarly clustered throughout both regions, meaning that there is no clustering of PokéStops relative to gym locations and the other way round. This is also suggested by similar relative count index patterns observed for PokéStops and gyms (see Figure 5).

To determine whether PokéStops and gyms cluster similarly the bivariate version of Ripley’s K-function, known as Cross K-function (Dixon 2002), can be used. The Cross K-function can be formulated as:

where f(r) is the number of type j events within a dis-tance r of a randomly chosen type i event; λ is the density of j events per areal unit. Under random labeling, Kii(r) = Kij(r) = Kji(r) = Kjj(r), where in the context of this paper i and j stand for PokéStop and gym, or the other way around. Statistical inference of the difference between the observed Cross K-function and a Cross K-function generated by random labeling can be achieved through Monte Carlo simulation. We analyze the cluster behavior of gyms around PokéStops, using gyms as event type j and PokéStops as event type i in Equation (2). Within each of the 999 permutations of the Monte Carlo simu-lation, events were randomly labeled as either PokéStop

(2)Kij(r) = �−1E[f (r)]

Table 2) were performed to examine the association between land-use and point count, which was found to be significant (p < 0.0001) for PokéStops, gyms, and spawnpoints in both analyzed regions. This means that point counts differ significantly from expected counts on different land-use categories.

To illustrate how Pokémon Go points are over- or underrepresented in different land-use categories, a rela-tive count index (Equation (1)) was calculated as follows:

where c is the relative count index; O is the number of observed points falling in a land-use category; E is the expected number of points falling in that land-use cat-egory. The relative count index values range between +1 and −1 (exclusive) where a positive c means over-rep-resentation (i.e. more observed points than expected under CSR) and a negative c means the opposite. Figure 5 shows the relative count index for PokéStops, gyms, and spawnpoints for the 12 aggregated land-use categories in both study regions. Patterns of over-/under- representa-tion are similar in the three examined point categories and in both analyzed regions.

Three of the 12 land-use categories (commercial, pub-lic, and recreational) are overrepresented for all point sets both in Miami-Dade and Boston. For PokéStops, this overrepresentation resembles some of the signif-icant positive coefficients of the NBRM estimation (Section 4.2., # of businesses = commercial, Park = rec-reational, University  =  public), suggesting that these land-use categories are indeed the ones where Pokémon Go can be played most effectively. As opposed to this, most Pokémon Go points are underrepresented in

(1)c(O,E) =

{

0 , if O + E = 0o−E

o+E, if O + E > 0

Figure 5. relative count index for different land-use categories.

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 8: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

248 L. JUHÁSZ AND H. H. HOCHMAIR

of nearest neighbor distances, namely those measured between businesses tagged with the PokéStop attribute and their nearest PokéStops, and between all Yelp busi-nesses and their nearest PokéStops. For this analysis, only PokéStops within the study site hexagons were con-sidered as potential nearest neighbors. Descriptive statis-tics for both sets of distance measurements are shown in Table 3, and Figure 7 plots the corresponding histograms for the two distance sets.

Both Table 3 and Figure 7 suggest that businesses tagged with the “PokéStop nearby” attribute are indeed situated closer to a PokéStop than all Yelp businesses. A Mood’s median test was performed to determine whether differences between medians were significant between both distance groups, and results confirmed this at a high level of significance (p < 2.2E–16). This means that the tagging behavior of Yelp users on this attribute is not random, but that the data contributors tend to annotate this information correctly. A higher tagging intensity of Yelp businesses can be observed in metropolitan areas (Miami Beach: 12.0%, downtown Boston: 12.5%, downtown Miami: 9.7%) than in sub-urban areas (Hialeah, FL: 1.2%, Homestead, FL: 5.4%, suburban Boston: 2.3%). Two rural areas (Redlands, FL and Immokalee, FL) and two suburban areas (Davie, FL and Brownsville, FL) did not have any businesses tagged with this attribute. Two intertwined explana-tions for this discrepancy in tagging completeness rates

or gym (retaining their observed proportions) and the Cross K-function was calculated. This established and upper and lower simulation envelope for random labeling at a 99.9% confidence level. The Monte Carlo simulation was run for 15 study sites in South Florida and Boston (2 rural sites did not have PokéStops) for distances up to 5000 m. Results show that the observed Cross K-functions fall within the simulation envelope for the whole distance range for most study sites. This implies that PokéStops and gyms are similarly clus-tered around each other, that no attraction or repulsion between both point types are present, and that these point groups were indeed generated by similar spatial processes.

There are, however a few study sites where the observed Cross K-function falls slightly below the lower simulation envelope (Figure 6). In these areas, gyms and PokéStops are further apart from each other than it would be expected under random labeling at a 0.001 level of significance. For a Pokémon Go player, this means that slightly longer trips may be necessary to cover activities that involve both PokéStops and gyms, compared to other areas that contain the same number of PokéStops and gyms but do not show this effect.

4.5. R4 – Pokémon-related user tagging of Yelp businesses

This analysis examines to which extent Yelp users tagged businesses with the “PokéStop nearby” attribute, and how reliable this crowd-sourced information is. Within all analyzed study polygons areas, Yelp users tagged 1392 businesses with the “PokéStop nearby” attrib-ute out of 21,606 total Yelp businesses, revealing that Yelp’s strategy to attract potential customers through targeting Pokémon Go players seemed to work up to a certain extent. To determine whether user tagging of businesses is correct or not, we computed two sets

Figure 6. computed cross K-function (black line) with simulation mean (red dashed line) and confidence envelopes (gray area) for random labeling using a monte carlo simulation with 999 permutations.

Table 3. Descriptive statistics of nearest neighbor distances between Yelp businesses and pokéstops.

Distance measurement Mean (m) Median (m) SD (m) Ntagged Yelp

businesses – pokéstops

52.8 38.1 58.1 896

all Yelp businesses – pokéstops

102.5 57.8 129.7 10,970

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 9: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

GEO-SPATIAL INFORMATION SCIENCE 249

Our study also analyzed the interplay of augmented reality gaming and VGI, suggesting that the Pokémon Go user community participates in crowd-sourcing activities, namely adding information to the “PokéStop nearby” attribute on the Yelp business platform. Nearest neighbor analysis suggests that this tagged information tends to be correct, and that it can be used by visitors of the Yelp website to identify businesses that are located near a PokéStop.

The presented research supports earlier findings of a strong geographic and socioeconomic bias in the Pokémon Go data-set. As this bias can affect the user experience in location-based games negatively, future developments and improvements of location-based AR games should address this issue and provide equal access to interactive platforms such as Pokémon Go to all user communities.

Notes on contributors

Levente Juhász is a PhD candidate at the University of Florida where he focuses his research efforts on VGI. He holds a master’s degree in Geography from the University of Szeged, Hungary. He is especially interested in contribution patterns of VGI and social media users as well as in how user-gen-erated data are used across different platforms. Previously, he worked as a GIS developer in Hungary, then was a short-term visiting scientist at the Joint Research Centre in Ispra, Italy, before starting his doctoral studies at the University of Florida. He also acts as a data scientist for a geospatial startup, Mapillary, and is an avid contributor of OpenStreetMap and other open data projects.

Hartwig H. Hochmair is an associate professor of Geomatics at the University of Florida where he teaches courses in GIS, Digital Mapping, Adjustment Computations, and Geodesy. He focuses on quality assessment of crowd-sourced data, route planning, wayfinding, and the analysis of transportation net-works and travel behavior with a focus on bicycle and public transportation. As part of his interdisciplinary research, he analyzes the spread of invasive species in Southeast Florida, including termites and tegus. His educational background includes Geodesy and Geoinformation, and he obtained his PhD degree from the Technical University of Vienna, Austria.

could be (a) the lower density of PokéStops in rural areas (even with businesses present), and (b) the lack of active crowd-sourcing communities in rural areas. The latter might have contributed to the prior during earlier crowd-sourcing data collection efforts since these were primarily focusing on urban and metro-politan regions.

5. Conclusions

This paper analyzed point data-sets extracted from Pokémon Go, which is a location-based augmented reality game for smartphones engaging millions of users worldwide. The study confirms the anecdotal experi-ence of players in certain neighborhoods who reported a lack of Pokémon Go point features compared to other neighborhoods. Whereas earlier research already iden-tified that areas with a high percentage of minorities are disadvantaged regarding the access to PokéStops (Colley et al. 2017), the estimated regression model of this paper presents an extended list of factors which were found to be significantly associated with an increase in PokéStop counts per block group, including presence of parks, higher education institutes, or businesses, or being tourist sites.

Further analysis showed that some land-use cate-gories have an overrepresentation of Pokémon Go fea-tures. Users will have the highest chance to encounter a Pokémon, or to run into a PokéStop or gym, if play-ing Pokémon Go in commercial areas (e.g. shopping centers), public spaces (e.g. university campuses), or in recreational areas (e.g. parks). As opposed to this, natural, agricultural, residential, and industrial areas, as well as lakes, rivers, and open water have a lower point density and make it therefore more difficult to succeed in this game. Using a Cross K-function, gym locations were in most study areas found to cluster similarly to PokéStops, suggesting that those locations were gener-ated with the same methods and derived from the same crowd-sourced data-set.

Figure 7.  Histogram of nearest neighbor distances in m (log scale) along with the means (red vertical lines) and medians (blue vertical lines).

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 10: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

250 L. JUHÁSZ AND H. H. HOCHMAIR

on Geographic Information Science, Wageningen, The Netherlands, May 9–12.

Juhász, L., A. Rousell, and J. Arsanjani. 2016. “Technical Guidelines to Extract and Analyze VGI from Different Platforms.” Data 1 (3): 15. doi:10.3390/data1030015.

Khalid, A. 2016. “Crowds Rush to Catch Rare Pokémon Vaporeon in Central Park.” The Daily Dot. https://www.dailydrot.com/unclick/crowds-capture-rare-pokemon-vaporeon-central-park/.

Kirkpatrick, M. G., T. B. Cruz, N. I. Goldenson, J.-P. Allem, K.-H. Chu, M. A. Pentz, and J. B. Unger. 2017. “Electronic Cigarette Retailers Use Pokémon Go to Market Products.” Tobacco Control (Online published). doi:10.1136/tobaccocontrol-2016-053369.

Kondamudi, P. R., B. Protono, and H. Alhoori. 2017. “Pokémon Go: Impact on Yelp Restaurant Reviews.” The 9th International ACM Web Science Conference, Troy, NY, USA, June 25–28. doi: 10.1145/3091478.3098861.

Kooragayala, S., and T. Srini. 2016. “Pokémon Go is Changing How Cities Use Public Space, But Could It Be More Inclusive?” https://www.urban.org/urban-wire/pokemon-go-changing-how-cities-use-public-space-could-it-be-more-inclusive.

Li, L., M. F. Goodchild, and B. Xu. 2013. “Spatial, Temporal, and Socioeconomic Patterns in the Use of Twitter and Flickr.” Cartography and Geographic Information Science 40 (2): 61–77. doi:10.1080/15230406.2013.777139.

de Lucia, A., R. Francese, I. Passero, and G. Tortora. 2012. “A Collaborative Augmented Campus based on Location-aware Mobile Technology.” International Journal of Distance Education Technologies 10 (1): 55–73. doi:10.4018/jdet.2012010104.

McCartney, M. 2016. “Game on for Pokémon Go.” British Medical Journal 354: i4306. doi:10.1136/bmj.i4306.

McKenzie, G., K. Janowicz, and B. Adams. 2014. “A Weighted Multi-attribute Method for Matching User-generated Points of Interest.” Cartography and Geographic Information Science 41 (2): 125–137. doi:10.1080/15230406.2014.880327.

Nature. 2016. “Gotta Name Them All: How Pokémon can Transform Taxonomy (Editorial).” Nature 535: 323–324. doi:10.1038/535323b.

Neustaedter, C., A. Tang, and T. K. Judge. 2013. “Creating Scalable Location-based Games: Lessons from Geocaching.” Personal and Ubiquitous Computing 17 (2): 335–349. doi:10.1007/s00779-011-0497-7.

Nigg, C. R., D. J. Mateo, and J. An. 2017. “Pokémon Go May Increase Physical Activity and Decrease Sedentary Behaviors.” American Journal of Public Health 107 (1): 37–38. doi:10.2105/AJPH.2016.303532.

Parry, H. 2016. “They’ve Gotta Catch ‘Em All! Hoards of Pokemon Go Players Overrun College Campus in Search of Squirtle.” Daily Mail. https://www.dailymail.co.uk/news/article-3688999/They-ve-gotta-catch-em-Hoards-Pokemon-players-overrun-college-campus-search-Squirtle.html.

Reisinger, D. 2016. “Pokémon Go is Far More Popular Than Netflix.” Fortune, July 14. https://fortune.com/2016/07/14/pokemon-go-netflix/.

Skejo, J., and J. H. S. Caballero. 2016. “A Hidden Pygmy Devil From the Philippines: Arulenus Miae Sp. Nov. – A New Species Serendipitously Discovered in an Amateur Facebook Post (Tetrigidae: Discotettiginae).” Zootaxa 4067 (3): 383–393. doi:10.11646/zootaxa.4067.3.7.

de Smith, M. J., M. F. Goodchild, and P. A. Longley. 2015. Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools. 5th ed. Leicester: Matador.

ORCIDLevente Juhász   http://orcid.org/0000-0003-3393-4021

ReferencesAkhtar, A. 2016. “Is Pokémon Go Racist? How the App

May Be Redlining Communities of Color.” USA TODAY. https://www.usatoday.com/story/tech/news/2016/08/09/pokemon-go-racist-app-redlining-communities-color-racist-pokestops-gyms/87732734/.

Alivand, M., and H. H. Hochmair. 2017. “Spatiotemporal Analysis of Photo Contribution Patterns to Panoramio and Flickr.” Cartography and Geographic Information Science 44 (2): 170–184. doi:10.1080/15230406.2016.1211489.

Antoniou, V., J. Morley, and M. Haklay. 2010. “Web 2.0 Geotagged Photos: Assessing the Spatial Dimension of the Phenomenon.” Geomatica 64 (1): 99–110.

BBC. 2016. “Should You Believe Those Pokemon Go Download Numbers?” Polygon. https://www.polygon.com/2016/7/7/12118576/pokemon-go-pokestop-gym-locations-map-guide.

Clark, A. M., and M. T. G. Clark. 2016. “Pokémon Go and Research Qualitative, Mixed Methods Research, and the Supercomplexity of Interventions.” International Journal of Qualitative Methods 15 (1): 1–3. doi:10.1177/1609406916667765.

Colley, A., J. Thebault-Spieker, A. Y. Lin, D. Degraen, B. Fischman, J. Häkkilä, K. Kuehl, et al. 2017. “The Geography of Pokémon GO: Beneficial and Problematic Effects on Places and Movement.” The 2017 CHI Conference on Human Factors in Computing Systems, Denver, US, May 6–11. doi:10.1145/3025453.3025495.

Dixon, P. M. 2002. Ripley’s K-Function. Encyclopedia of Environmetrics. Hoboken, NJ: Wiley.

Esri. 2016. “Methodology Statement: 2016–2021. Esri US Demographic Updates.” https://downloads.esri.com/esri_content_doc/dbl/us/J10268_Methodology_Statement_2016-2021_Esri_US_Demographic_Updates.pdf.

Goodchild, M. F. 2007. “Citizens as Voluntary Sensors: Spatial Data Infrastructure in the World of Web 2.0 (Editorial).” International Journal of Spatial Data Infrastructures Research 2: 24–32.

Grande, A. 2016. “Is Pokémon Go Going Too Far in Bellevue?” Kiro7. https://www.kiro7.com/news/local/is-pokemon-go-going-too-far-in-bellevue/423744178.

Helbich, M., and J. Arsanjani. 2015. “Spatial Eigenvector Filtering for Spatiotemporal Crime Mapping and Spatial Crime Analysis.” Cartography and Geographic Information Science 42 (2): 134–148. doi:10.1080/15230406.2014.893839.

Hochmair, H. H., and D. Zielstra. 2013. “Development and Completeness of Points of Interest in Free and Proprietary Data Sets: A Florida Case Study.” GI_Forum 2013, Creating the GISociety, Salzburg, July 2–5.doi:10.1553/giscience2013s39.

Hollenstein, L., and R. Purves. 2010. “Exploring Place through User-generated Content: Using Flickr Tags to Describe City Cores.” Journal of Spatial Information Science 1: 21–48. doi:10.5311/JOSIS.2010.1.3.

Johnson, L., R. Smith, H. Willis, A. Levine, and K. Haywood. 2011. “The 2011 Horizon Report – The New Media Consortium.” https://redarchive.nmc.org/publications/horizon-report-2011-k-12-edition.

Juhász, L., and H. H. Hochmair. 2017. “How do Volunteer Mappers Use Crowdsourced Mapillary Street Level Images to Enrich OpenStreetMap?” The 20th AGILE Conference

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17

Page 11: Where to catch ‘em all? – a geographic analysis of Pokémon ... · GEO-SPATIAL INFORMATION SCIENCE 243 Go locations and crowd-sourced POIs. This is followed by a description of

GEO-SPATIAL INFORMATION SCIENCE 251

Zielstra, D., and H. H. Hochmair. 2013. “Positional Accuracy Analysis of Flickr and Panoramio Images for Selected World Regions.” Journal of Spatial Science 58 (2): 251–273. doi:10.1080/14498596.2013.801331.

Zielstra, D., H. H. Hochmair, and P. Neis. 2013. “Assessing the Effect of Data Imports on the Completeness of OpenStreetMap – A United States Case Study.” Transactions in GIS 17 (3): 315–334. doi:10.1111/tgis.12037.

Wu, H.-K., S. W.-Y. Lee, H.-Y. Chang, and J.-C. Liang. 2013. “Current Status, Opportunities and Challenges of Augmented Reality in Education.” Computers & Education 62: 41–49. doi:10.1016/j.compedu.2012.10.024.

Xu, H., Y. Xian, H. Xu, L. Liang, A. F. Hernandez, T. Y. Wang, and E. D. Peterson. 2017. “Does Pokemon Go Help Players be More Active? An Evaluation of Pokemon Go and Physical Activity.” Circulation 135 (Suppl 1): A20.

Dow

nloa

ded

by [

Uni

vers

itat S

alzb

urg]

at 2

3:39

21

Sept

embe

r 20

17