
Cartography and GIS

MODELLING BRUSSELS BIKE-SHARING OPEN DATA USING SPATIAL REGRESSION MODELS

Tiago Daniel Costa Pina

Ana Cristina Costa

NOVA Information Management School (NOVA IMS), Universidade Nova de Lisboa, Portugal

ABSTRACT

In recent years there has been a renewed interest in utilitarian cycling due to its recognised potential to reduce energy consumption and pollution in cities. Bike-sharing systems generate a large amount of data which can be used to improve the systems themselves, or to improve the body of knowledge on urban mobility. The open data automatically generated by the Brussels bike-sharing system (Villo) are explored through spatial regression models of the number of bicycle trips at stations. The main goal of the modelling process is to understand whether socio-economic, infrastructure and land use factors influence mobility patterns in peak periods and on weekdays. The first step of the modelling process consists of setting up exploratory Ordinary Least Squares (OLS) models in order to identify potential explanatory variables. Next, Geographically Weighted Poisson Regression (GWPR) models and semi-parametric versions of GWPR models are parametrised using the previously identified variables. The results show that the relationships between the dependent and independent variables are complex and spatially varying. Furthermore, the results reveal hidden patterns that enable further local investigation of these relationships. The weaknesses and strengths of our approach are discussed, particularly its implementation in other geographic contexts and its potential for generalisation to all bicycle trips.

Keywords: bike-sharing systems, Geographically Weighted Poisson Regression, mobility, spatial statistics, spatial regression

INTRODUCTION

Cycling has been gaining renewed interest as environmental and safety concerns increase and a new mobility paradigm arises [1]. Cycling infrastructure improvements are very important to promote cycling and ensure users' safety, as are other strategies such as multi-modality and bike-sharing. Bike-sharing systems enable short-term rental of bicycles from one docking station to another [2]. Bike-sharing systems can be part of the transport system. In some cases they complement the public transportation system, and in other cases they constitute a real alternative to bicycle ownership. Transport is a service rather than a good: it must be consumed when it is produced because it cannot be stocked [3].


19th International Multidisciplinary Scientific GeoConference SGEM 2019

Understanding the way people travel through a given territory has always been crucial to the transport planning process. Within this context, data collected automatically can complement, and in some cases replace, former methods of data acquisition such as counts or surveys. Contemporary bike-sharing systems generate data that can be used for multiple purposes. Bike-sharing is an important data source not only for users and providers, but also for researchers, as it can provide relevant insights into bike-sharing activity and urban mobility.

The Brussels bike-sharing system, known as “Villo”, had around 360 bike-sharing stations and more than 5000 shared bicycles located all around the Brussels region in 2018. Even though it is one of the largest systems in Europe, there is little academic research about it. To the best of our knowledge, the studies undertaken by [4] and [5] are the only exceptions. Both studies compare several indicators of different bike-sharing systems, providing global insights into the Villo system. However, the Brussels bike-sharing system was not studied in detail by those authors. This paper attempts to fill this gap.

The main goal of this study is to identify underlying factors influencing bike station activity at different periods of the day and the week. The peak periods (morning and evening) and weekdays were analysed using open data. Local spatial regression models were applied in order to reveal the mechanisms underlying the observed spatial patterns. Only a few authors have applied spatial regression approaches in bike-sharing research. [6] used Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR) to identify the factors behind bicycling for health, and [7] used a robust linear regression model to study the Lyon bike-sharing system “Vélo’v”.

DATA AND METHODS

Study area and datasets

The open data used in this research were collected in May 2017 from multiple sources: the bike-sharing provider JCDecaux, the Brussels Institute for Statistics and Analysis, and the mobility geo-portal Mobigis. In total, 46 different variables related to the bike-sharing system “Villo”, socio-economic factors, infrastructure and mobility were collected. The dependent variables of the models are the number of bicycle trips during the morning and evening peak periods, and on weekdays. Data pre-processing involved data cleansing and transposing all variables into a common geographic unit, defined as Voronoi polygons, which we named “service areas” for the purpose of this study. Voronoi polygons were truncated to 1200 metres and were also delimited by the Brussels region border (Fig. 1).
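The service-area definition can be illustrated with a minimal sketch. A location falls in a station's truncated Voronoi cell when that station is the nearest one and lies within 1200 metres. The station identifiers and coordinates below are hypothetical, coordinates are assumed to be in a projected system with metre units, and clipping to the Brussels region border is not reproduced here.

```python
import math

MAX_RADIUS_M = 1200.0  # truncation distance stated in the text


def service_area(point, stations, max_radius=MAX_RADIUS_M):
    """Return the id of the station whose truncated Voronoi cell contains `point`.

    `point` is an (x, y) tuple and `stations` maps a station id to (x, y),
    all assumed to be in metres. Returns None when the nearest station is
    farther away than `max_radius`.
    """
    nearest_id, nearest_dist = None, math.inf
    for station_id, (sx, sy) in stations.items():
        dist = math.hypot(point[0] - sx, point[1] - sy)
        if dist < nearest_dist:
            nearest_id, nearest_dist = station_id, dist
    return nearest_id if nearest_dist <= max_radius else None


# Hypothetical station coordinates in metres, for illustration only.
stations = {"A": (0.0, 0.0), "B": (1000.0, 0.0)}
print(service_area((300.0, 100.0), stations))  # nearest station is A
print(service_area((900.0, -50.0), stations))  # nearest station is B
print(service_area((0.0, 5000.0), stations))   # farther than 1200 m from every station
```

In practice the polygons themselves would be built with a GIS tool; the nearest-station rule above only shows which cell a given location would be assigned to.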

Methods

The methodology is summarised in Figure 2. First, several exploratory Ordinary Least Squares (OLS) regressions were undertaken in order to select the most relevant and appropriate explanatory variables. The best OLS model was selected considering the coefficient of determination (highest adjusted R²), the absence of multicollinearity between independent variables (VIF – Variance Inflation Factor < 10, or Condition Number < 30), and the significance of the regression coefficients (p-values of robust t-tests smaller than 0,05).
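As a rough illustration of the multicollinearity criterion, the VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the remaining ones. With only two predictors, R² reduces to the squared Pearson correlation between them, which is the simplified case sketched below. The function names and the sample values are hypothetical; the threshold of 10 is the one quoted in the text.

```python
import math

VIF_THRESHOLD = 10.0  # screening threshold quoted in the text


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
vif = vif_two_predictors(x1, x2)
print(round(vif, 4), vif < VIF_THRESHOLD)
```

With more than two predictors, each VIF requires a separate auxiliary regression, which statistical packages compute directly.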


Figure 1. Bike-sharing stations and service areas in Brussels

Figure 2. Modelling methodology


The OLS model is not appropriate to deal with spatial data because observations are not independent, and the residuals exhibit both heteroscedasticity (verified with the Koenker-Basset test and the Local Moran’s I statistic) and spatial autocorrelation (tested with the Global Moran’s I statistic). Geographically weighted regression models were specifically developed to deal with spatial heteroscedasticity [8]. Moreover, the dependent variable corresponds to count data, for which Poisson regression models are more appropriate. Therefore, the results of a global Poisson regression model were compared with those of a Geographically Weighted Poisson Regression (GWPR) model, and with the semi-parametric version of the GWPR model (abbreviated as SGWPR), based on the corrected Akaike Information Criterion (AICc; for further details see [9]).
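The residual autocorrelation check can be sketched with the standard Global Moran's I formula, I = (n / S0) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where z are deviations from the mean and S0 is the sum of all weights. The weights matrix and residual values below are hypothetical (four areas in a row, each neighbouring the next); the spatial weights actually used in the study are not described in the text.

```python
def global_morans_i(values, weights):
    """Global Moran's I for `values` given a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    sum_sq = sum(d * d for d in dev)
    return (n / s0) * (cross / sum_sq)


# Hypothetical binary contiguity weights for four areas arranged in a row.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

smooth_residuals = [1.0, 2.0, 3.0, 4.0]        # similar values next to each other
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # neighbours always differ

print(global_morans_i(smooth_residuals, chain_weights))       # positive autocorrelation
print(global_morans_i(alternating_residuals, chain_weights))  # negative autocorrelation
```

Positive values indicate that similar residuals cluster in space, which is the situation that motivates moving from OLS to local models. Assessing significance would additionally require a permutation or analytical test, which is omitted here.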

RESULTS AND DISCUSSION

The final global Poisson model includes as predictors the “Number of Villo’s bike-sharing stands”, “Distance to city centre”, “Public transportation density”, “Density of parking lots for bicycles”, and “Offices density”. The results of this model show that “Distance to city centre” is the only variable with a negative coefficient, which is consistent with the OLS results. This behaviour is expected because activity is highest in the city centre and decays almost proportionally as one moves from the city centre towards the suburbs.

Both GWPR and SGWPR local models perform better than the global model (lower AICc values) for all periods (Table 1). The GWPR models with the same predictors as the global Poisson model were the ones that best fitted both peak periods. The SGWPR model with “Distance to city centre” as a global variable and all other variables as local predictors showed the best performance for the weekday period.

Table 1. Statistical comparison of the global Poisson model, the Geographically Weighted Poisson Regression (GWPR) model, and the semi-parametric GWPR model (SGWPR)

Period        Model           Number of neighbours   AICc       Deviance   Degrees of freedom
Weekdays      Global Poisson  –                      52145,99   52133,73   336
Weekdays      GWPR            12                     4720,16    2594,90    52,94
Weekdays      SGWPR           10                     4684,68    2041,77    43,90
Morning peak  Global Poisson  –                      16258,70   16246,45   336
Morning peak  GWPR            15                     2639,18    1413,47    88,70
Morning peak  SGWPR           15                     2875,52    1884,47    123,28
Evening peak  Global Poisson  –                      14024,9    14012,24   336
Evening peak  GWPR            15                     2581,483   1361,84    88,01
Evening peak  SGWPR           13                     2683,28    1355,83    82,59
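The model comparison criterion can be sketched as follows, assuming the deviance-based form of the corrected AIC used for GWPR in [9]: AICc = D + 2K + 2K(K + 1) / (N − K − 1), with D the deviance, K the number of parameters and N the number of observations. The value N = 342 is an assumption inferred from the 336 degrees of freedom of the global model plus its six parameters (five predictors and an intercept); it is not stated in the text. For the local models, the effective number of parameters would take the place of K.

```python
def aicc(deviance, k, n):
    """Corrected AIC from a deviance, k parameters and n observations."""
    return deviance + 2.0 * k + (2.0 * k * (k + 1.0)) / (n - k - 1.0)


N_SERVICE_AREAS = 342  # assumption: 336 degrees of freedom + 6 parameters
K_GLOBAL = 6           # five predictors plus an intercept

# Deviances of the global Poisson models as reported in Table 1.
weekdays_aicc = aicc(52133.73, K_GLOBAL, N_SERVICE_AREAS)
morning_aicc = aicc(16246.45, K_GLOBAL, N_SERVICE_AREAS)

print(round(weekdays_aicc, 2))
print(round(morning_aicc, 2))
```

Under these assumptions the computed values agree with the AICc reported for the global Poisson model in the weekday and morning-peak rows of Table 1 to within rounding.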

As one of the goals of this research was to investigate the potential generalisation of the models to all bicycle trips, it is of great interest to analyse in more detail how the explanatory variable “Density of parking lots for bicycles” contributes to explaining the spatial patterns of the “Number of bicycle trips”. Both negative and positive significant coefficients are observed in different parts of Brussels, providing evidence that the relationship between these two variables varies locally (Fig. 3). Clusters of negative significant coefficients suggest that an increase in private bicycle usage decreases bike-sharing usage in those areas.

Figure 3. Results of the “Density of parking lots for bicycles” predictor in GWPR models: (a) local coefficients and (b) pseudo-t values in the morning peak; (c) local coefficients and (d) pseudo-t values in the evening peak

“Public transportation density” has positive coefficients in most of the study region (Fig. 4). This was the expected result, considering the complementarity between bike-sharing and public transportation reported in the literature [2]. However, the results show that in a few areas of the Brussels region, bike-sharing might be a competitive alternative to public transportation, as found by [10] for the Tel Aviv bike-sharing system.

The signs of the coefficients of “Distance to city centre” and “Villo bike-stands” are as expected. Coefficients are negative overall and positive overall, respectively, with very limited exceptions.

Finally, the areas with the highest office density are not necessarily those where the coefficients are positive and high, as would be expected for the morning peak period considering the results of the global model. On the contrary, in those areas negative coefficients alternate with positive ones, and they are significant in some cases but not in others (Fig. 5). One possible explanation is that some stations fill up rapidly, since commuting movements occur mainly in the home-to-work direction during the morning peak period and in the reverse direction during the afternoon. If there is no redistribution of bicycles by the operator, nearby stations with available capacity are used.


Figure 4. Results of the “Public transportation density” predictor in GWPR models: (a) local coefficients and (b) pseudo-t values in the morning peak; (c) local coefficients and (d) pseudo-t values in the evening peak

Figure 5. Results of the “Offices density” predictor in GWPR models: (a) local coefficients and (b) pseudo-t values in the morning peak; (c) local coefficients and (d) pseudo-t values in the evening peak


CONCLUSION

Bike usage is influenced by weather and seasonal variations more than other means of transport. Furthermore, cycling has been growing quickly in recent years, and this growth is variable and hard to predict. For these reasons, a predictive spatial regression model would have to take these aspects into consideration in order to provide accurate predictions. Creating a predictive model was not intended; the main goal of this study was to understand the factors behind bike-sharing activity and how they could change over time.

The results show that five variables have a significant relationship with the number of bicycle trips in all the analysed periods, but their explanatory power varies spatially. Those variables are public transportation density, distance to city centre, density of parking lots for bicycles, offices density, and number of Villo’s bike-sharing stands.

The Geographically Weighted Poisson Regression (GWPR) model is the one that best fits the peak period data. However, when analysing the 24-hour period on weekdays, the semi-parametric SGWPR model with distance to city centre as a global variable performs better than any of the other investigated models.

The exploratory analysis and the results of global regression models enable a global understanding of the relationship between dependent and independent variables. Local models are a powerful tool to uncover hidden patterns and relationships in data that could not be identified with global models. Despite the proven strengths of local models, their results are sometimes difficult to interpret. Contextualising the results of GWPR and SGWPR with other data allows a better understanding of local phenomena. Nevertheless, it is difficult to determine whether the models lack important explanatory variables. Therefore, we suggest that future studies investigate the explanatory significance of other variables, such as consumption, tourism, economic activity, and land use or land cover.

As an alternative to the Poisson regression model, future research could focus on models such as the Negative Binomial Regression model, which is specifically suited to modelling count data with over-dispersion. Furthermore, similarly to the study of [10], an extension of this research would be the modelling of origin-destination data (if available), in order to identify relationships and usage patterns between bike-sharing stations. Such a model could be integrated with a traffic assignment approach calibrated with automatic bike counts, in order to gain insights into soft mobility in Brussels.
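A common rough diagnostic for over-dispersion in a Poisson model is the ratio of residual deviance to residual degrees of freedom, with values well above 1 usually read as a sign that the variance exceeds the mean. The sketch below applies this ratio to the global Poisson figures of Table 1. It is offered only as an illustration of why a Negative Binomial model may be worth investigating; this diagnostic is not reported in the study itself.

```python
def deviance_dispersion(deviance, residual_df):
    """Ratio of residual deviance to residual degrees of freedom."""
    return deviance / residual_df


# Deviance and degrees of freedom of the global Poisson models (Table 1).
global_models = {
    "Weekdays": (52133.73, 336),
    "Morning peak": (16246.45, 336),
    "Evening peak": (14012.24, 336),
}

for period, (deviance, df) in global_models.items():
    ratio = deviance_dispersion(deviance, df)
    print(f"{period}: deviance / df = {ratio:.2f}")
```

All three ratios are far above 1, which is consistent with the suggestion that a model allowing for over-dispersion could be a useful alternative.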

REFERENCES

[1] Randriamanamihaga, A. N., Côme, E., Oukhellou, L., Govaert, G., Clustering the Vélib’ dynamic Origin/Destination flows using a family of Poisson mixture models, Neurocomputing, vol. 141, pp 124–138, 2014.

[2] Fishman, E., Washington, S., Haworth, N., Bike Share: A Synthesis of the Literature, Transport Reviews, vol. 33, pp 148–165, 2013.

[3] Ortúzar, J. de D., Willumsen, L. G., Modelling Transport, 4th Edition, John Wiley & Sons, UK, 2011.


[4] O’Brien, O., Cheshire, J., Batty, M., Mining bicycle sharing data for generating insights into sustainable transport systems, Journal of Transport Geography, vol. 34, pp 262–273, 2014.

[5] Otero, I., Nieuwenhuijsen, M. J., Rojas-Rueda, D., Health impacts of bike sharing systems in Europe, Environment International, vol. 115, pp 387–394, 2018.

[6] Griffin, G. P., Jiao, J., Where does bicycling for health happen? Analysing volunteered geographic information through place and plexus, Journal of Transport & Health, vol. 2, pp 238–247, 2015.

[7] Tran, T. D., Ovtracht, N., d’Arcier, B. F., Modeling Bike Sharing System using Built Environment Factors, Procedia CIRP, vol. 30, pp 293–298, 2015.

[8] Fotheringham, A. S., Brunsdon, C., Charlton, M., Geographically Weighted Regression: The Analysis of Spatially Varying Relationships, 1st Edition, John Wiley & Sons, UK, 2002.

[9] Nakaya, T., Fotheringham, A. S., Brunsdon, C., Charlton, M., Geographically weighted Poisson regression for disease association mapping, Statistics in Medicine, vol. 24, pp 2695–2717, 2005.

[10] Levy, N., Golani, C., Ben-Elia, E., An exploratory study of spatial patterns of cycling in Tel Aviv using passively generated bike-sharing data, Journal of Transport Geography, vol. 76, pp 325–334, 2019.