
arXiv:2106.15802v1 [cs.AI] 30 Jun 2021

CityNet: A Multi-city Multi-modal Dataset for Smart City Applications

Xu Geng1, Yilun Jin1, Zhengfei Zheng1, Yu Yang2, Yexin Li3, Han Tian1, Peibo Duan4, Leye Wang5, Jiannong Cao2, Hai Yang1, Qiang Yang1, Kai Chen1

1The Hong Kong University of Science and Technology, Hong Kong, China
2The Hong Kong Polytechnic University, Hong Kong, China

3JD Intelligent Cities Business Unit, Beijing, China
4Northeastern University, China

5Peking University, China

{xgeng, yilun.jin, zzhengak, yliby, htianab}@connect.ust.hk, [email protected], [email protected], [email protected], [email protected], [email protected], {qyang, kaichen}@cse.ust.hk

Abstract

Data-driven approaches have been applied to many problems in urban computing. However, in the research community, such approaches are commonly studied under data from limited sources, and are thus unable to characterize the complexity of urban data coming from multiple entities and the correlations among them. Consequently, an inclusive and multifaceted dataset is necessary to facilitate more extensive studies on urban computing. In this paper, we present CityNet, a multi-modal urban dataset containing data from 7 cities, each coming from 3 data sources. We first present the generation process of CityNet as well as its basic properties. In addition, to facilitate the use of CityNet, we carry out extensive machine learning experiments, including spatio-temporal predictions, transfer learning, and reinforcement learning. The experimental results not only provide benchmarks for a wide range of tasks and methods, but also uncover internal correlations among cities and tasks within CityNet that, with adequate leverage, can improve performance on various tasks. With the benchmarking results and the correlations uncovered, we believe that CityNet can contribute to the field of urban computing by supporting research on many advanced topics.

1 Introduction

We have witnessed remarkable progress in advancing machine learning techniques (e.g., deep learning [23, 9], transfer learning [15, 19], and reinforcement learning [17, 6]) for various smart city applications [24] in recent years, such as predicting traffic speeds and flows, managing ride-hailing fleets, forecasting urban environments, etc. However, due to the absence of a desirable open-source dataset, most research works in urban computing investigate individual tasks using datasets from limited sources. In fact, existing approaches can be further improved to deliver more intelligent decisions by leveraging data from multiple sources. For example, while most research works on taxi fleet management use taxi data only, in practice, such a task can benefit from additional data sources, including real-time traffic speeds, weather conditions, POI distributions, demands and supplies of other transports, etc. All these urban data are closely related to the taxi market from different aspects. However, existing open datasets, such as PeMS [1], METR [4] and NYC Cabs [2], focus on individual aspects of urban dynamics, e.g., traffic speeds or taxis. Furthermore, limited effort has been spent

35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks.


[Figure 1: architecture diagram. (a) Raw sources: Meteorology, Taxi GPS, City Layout. (b) Sub-datasets: Weather, POI, Traffic speed, Connectivity, Inflow, Outflow, Pick-ups, Idle-driving. (c) Categories of sub-datasets (Service, Context) and their correlations across tasks and cities (Beijing, Hong Kong, ...).]

Figure 1: Architecture of CityNet. Left: Three raw data sources of CityNet. Middle: Illustration of all 8 sub-datasets, whose sources are distinguished by underlines shown in Fig. 1(a), and categories are distinguished by colors shown in Fig. 1(c). Right: Decomposition of the data dimensions into cities and tasks. Directed curves indicate correlations to be discovered in this paper.

on unifying the spatio-temporal ranges of existing datasets for joint usage. For example, although TaxiBJ [23], T-Drive [21] and Q-Traffic [10] all describe Beijing taxis, they differ in temporal ranges and thus cannot be jointly utilized. Hence, an inclusive and spatio-temporally aligned dataset is in urgent demand in urban computing to tackle the above problems, so as to facilitate more accurate algorithms and more insightful analyses.

In summary, two challenges hamper building and utilizing such a dataset. First, individual urban datasets from various maintainers differ from each other in spatio-temporal ranges, granularity, attributes, etc. This leads to difficulty in integrating them into unified formats under aligned ranges and granularities for research purposes. Second, in addition to the dataset construction, the correlations among sub-datasets must be uncovered so as to enhance performance by identifying, sharing, and transferring related knowledge.

In this paper, we present CityNet, a dataset with data from multiple cities and sources for smart city applications. Inspired by [16], we call CityNet multi-city multi-modal to describe the wide range of cities and sources from which it is derived. Compared to existing datasets, CityNet has the following features:

• Inclusiveness: As shown in Fig. 1(a), CityNet consists of 3 types of raw data (city layout, taxi, meteorology) collected from 7 cities. Besides, we process the raw data into multiple sub-datasets (Fig. 1(b)) to measure various aspects of urban phenomena. For example, we transform raw taxi data into region-based measurements (taxi flows, pickups, and idle driving time) that show the status of the transportation market and citizen activities.

• Spatio-temporal Alignment: To facilitate studies among cities and tasks, we use a unified spatio-temporal configuration, including geographical coordinates, temporal frames, and sampling intervals, to process all sub-datasets originating from the same city.

• Interrelatedness: We categorize sub-datasets into service and context data, as shown in Fig. 1(b) and (c), to demonstrate the status of urban service providers (e.g., taxi companies) and the urban environment (e.g., weather), respectively. As depicted in Fig. 1(c), we verify the correlations between sub-datasets, including the relationships (1) among services, (2) between cities, and (3) between contexts and services.

Leveraging the above features of CityNet, we conduct a series of experiments to benchmark various popular machine learning tasks, including spatio-temporal prediction, transfer learning and reinforcement learning. The empirical studies reveal that utilizing such a multi-modal dataset leads to performance improvements, and also point to the current challenges and future opportunities.

The contributions of this paper are summarized as follows:

• To the best of our knowledge, CityNet is the first multi-city multi-modal urban dataset to assemble and align sub-datasets from different tasks and cities. It is open to the research community and can facilitate in-depth investigations on urban phenomena from various disciplines.

• Extensive analysis and experiments on CityNet provide new insights for researchers. Upon CityNet, we present diverse benchmarking results so as to motivate further studies in areas like spatio-temporal predictions, transfer learning, reinforcement learning, and federated learning in urban computing. Specifically, we use various tools to reveal and quantify the correlations


among different modalities and cities. We further demonstrate that appropriately utilizing the mutual knowledge among correlated sub-datasets can significantly improve the performance of various machine learning tasks.

CityNet has been open-sourced and is currently accessible at https://github.com/citynet-at-git/Citynet-phase1. Besides, to empower the most advanced research in urban computing, we also plan to share all resources related to this paper on TACC, a high-performance computing platform currently in the initial stages of opening to the public. Please refer to Appendix A.7 for details.

2 Architecture of CityNet

In this section, we introduce the generation of CityNet, including the raw datasets, discretization, generation of individual datasets, and data integration across individual datasets.

2.1 Raw Data Collection

Taxi GPS points Taxi trajectories reflect rich information on citizen activities and the status of the transportation network. We collect taxi GPS point records $G_c$ in each city $c$, which consist of trajectories of taxis $G_c = \{tr_a\}_{a \in A_c}$, where $A_c$ is the set of taxis in city $c$. Each trajectory $tr_a$ is a sequence of tuples $tr_a = [(pos_k, t_k, occ_k)]_{k=1}^{|tr_a|}$, where $pos_k$ denotes the position (i.e., longitude and latitude) of the taxi at time $t_k$, and $occ_k$ denotes whether the taxi is occupied at time $t_k$.

City Layout data We collect POI (points-of-interest), road network, and traffic speed data to describe the overall layout of the city. POI data ($x^{poi}$) describe the functions served by each region. A POI is defined as $poi = (pos_{poi}, cat_{poi})$, where $pos_{poi}$ denotes the position (longitude and latitude) of the POI, and $cat_{poi}$ denotes the category (e.g., catering, shopping) of the POI. We denote the set of all POIs in city $c$ as $POI_c$. The road network and traffic speed data ($x^{speed}$) describe the real-time status of the transportation system. A road segment is defined by its two endpoints as $seg = (pos_0, pos_1)$. Each segment is associated with its historical real-time traffic speeds $sp_{seg} = [(t_k, speed_k)]_{k=1}^{|sp_{seg}|}$, where $speed_k$ denotes the speed of $seg$ at time $t_k$.

Meteorology data We collect weather data recorded at the airport of each city from rp5¹ every hour. We process the weather data of each city into a matrix $x^{mtr}_c \in \mathbb{R}^{T_c/2 \times 49}$, where each 49-dimensional vector contains information about temperature, air pressure, humidity, weather phenomena (e.g., rain, fog), etc. We describe the detailed schema of the weather data in Appendix A.3.

In total, we collect data for 7 cities: Beijing, Shanghai, Shenzhen, Chongqing, Xi'an², Chengdu², and Hong Kong. We provide data properties, including data range, size, and availability, in Table 5, and summarize the notations used in this paper in Table 7 in the Appendix.

2.2 Raw Data Discretization

Discretization is the first pre-processing step that transforms raw datasets from continuous spatio-temporal ranges to a discrete space with fixed sampling intervals. This procedure enables building indicators that quantify various urban phenomena by aggregating discretized measurements from the same interval. The discretization in the spatial and temporal dimensions is defined as follows:

Definition 1 (Region). Each city $c$ is split into $W_c \times H_c$ square grids, each with size 1km×1km. Each grid in a city is referred to as a region, denoted as $r$. We denote the set of all regions in city $c$ as $R_c$, and use $r_c(i, j)$ to denote the region in city $c$ with coordinate $(i, j)$.

Definition 2 (Timestamps). We define the set of available timestamps of city $c$ as $\mathcal{T}_c = [\tau_c - T_c + 1, \ldots, \tau_c]$, where $\tau_c$ denotes the last timestamp, and $T_c$ is the number of available timestamps of city $c$. By default, we choose each timestamp to be a period of 30 minutes.

Definition 3 (Spatio-temporal Tensors). We denote a spatio-temporal tensor for city $c$ (e.g., taxi flow values) as $x_c \in \mathbb{R}^{T_c \times W_c \times H_c}$, and use $x_c(\tau, i, j)$ and $x_c(\tau)$ to denote the value at region $r_c(i, j)$ and the values for all regions in city $c$ at timestamp $\tau$, respectively.
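As a minimal sketch of the discretization above, a GPS record can be mapped to a (timestamp, region) index pair. The bounding box, the degrees-per-km conversion, and the reference time `t0` below are all hypothetical stand-ins; CityNet's actual per-city configurations are given in its appendix.

```python
from datetime import datetime

# Hypothetical south-west corner of a city's bounding box (assumed values).
LNG_MIN, LAT_MIN = 116.25, 39.80
KM_PER_DEG_LAT = 111.0   # rough km per degree of latitude, for 1km x 1km grids
INTERVAL_MIN = 30        # Definition 2: 30-minute timestamps

def to_region(lng, lat, km_per_deg_lng=85.0):
    """Map a GPS point to the grid coordinate (i, j) of its 1km x 1km region."""
    i = int((lng - LNG_MIN) * km_per_deg_lng)   # index along longitude
    j = int((lat - LAT_MIN) * KM_PER_DEG_LAT)   # index along latitude
    return i, j

def to_timestamp(t, t0):
    """Map an absolute time to a discrete timestamp index relative to t0."""
    return int((t - t0).total_seconds() // (INTERVAL_MIN * 60))

t0 = datetime(2018, 3, 1, 0, 0)
print(to_region(116.40, 39.91))                        # grid indices (i, j)
print(to_timestamp(datetime(2018, 3, 1, 1, 10), t0))   # -> 2 (third 30-min slot)
```

All sub-dataset constructions in Section 2.3 reduce to aggregating raw records keyed by such (timestamp, region) pairs.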

¹https://rp5.ru/
²Original data are retrieved from Didi Chuxing, https://gaia.didichuxing.com, under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. All data are fully anonymized.


2.3 Processing Individual Sub-datasets

Taxi Flow Dataset Region-wise human flows are important to traffic management and public safety. Given taxi GPS points $G_c$, we extract inflow and outflow spatio-temporal tensors $x^{in}_c, x^{out}_c \in \mathbb{R}^{T_c \times W_c \times H_c}$ as the number of taxis going into and out of each region within a timestamp [22]:

$$x^{in}_c(\tau, i, j) = \sum_{tr \in G_c} \sum_{(pos_k, t_k, occ_k) \in tr} \mathbb{I}\left(t_k \in \tau \wedge pos_k \in r_c(i, j) \wedge pos_{k-1} \notin r_c(i, j)\right),$$
$$x^{out}_c(\tau, i, j) = \sum_{tr \in G_c} \sum_{(pos_k, t_k, occ_k) \in tr} \mathbb{I}\left(t_{k+1} \in \tau \wedge pos_k \in r_c(i, j) \wedge pos_{k+1} \notin r_c(i, j)\right), \quad (1)$$

where $\mathbb{I}(X)$ is the indicator function: $\mathbb{I}(X) = 1$ iff $X$ is true.
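Eqn. (1) amounts to counting region-boundary crossings along each trajectory. A minimal sketch, assuming the `region_of` and `slot_of` discretization helpers (hypothetical names, as in Section 2.2):

```python
import numpy as np

def taxi_flows(trajectories, T, W, H, region_of, slot_of):
    """Count inflow/outflow per (timestamp, region), as in Eqn. (1).

    trajectories: list of [(pos, t, occ), ...] tuples sorted by time;
    region_of(pos) -> (i, j); slot_of(t) -> timestamp index."""
    x_in = np.zeros((T, W, H))
    x_out = np.zeros((T, W, H))
    for tr in trajectories:
        for k in range(1, len(tr)):
            (p_prev, _, _), (p_cur, t_cur, _) = tr[k - 1], tr[k]
            r_prev, r_cur = region_of(p_prev), region_of(p_cur)
            if r_cur != r_prev:                   # the taxi crossed a region border
                s = slot_of(t_cur)
                x_in[s][r_cur] += 1               # entered r_cur (pos_{k-1} not in it)
                x_out[s][r_prev] += 1             # left r_prev (pos_k not in it)
    return x_in, x_out
```

Each border crossing contributes one inflow count to the region entered and one outflow count to the region left, at the timestamp of the crossing.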

Taxi Pickup and Idle Driving Dataset Demand and supply data of taxis are also important to allocating transportation resources efficiently. Given taxi GPS points $G_c$, we derive the pickup and idle driving spatio-temporal tensors $x^{pickup}_c, x^{idle}_c$ to approximate the demands and supplies of taxis:

$$x^{pickup}_c(\tau, i, j) = \sum_{tr \in G_c} \sum_{(pos_k, t_k, occ_k) \in tr} \mathbb{I}(t_k \in \tau \wedge occ_k = 1 \wedge occ_{k-1} = 0), \quad (2)$$
$$x^{idle}_c(\tau, i, j) = \frac{1}{|\tau|} \sum_{tr \in G_c} \sum_{(pos_k, t_k, occ_k) \in tr} \mathbb{I}(t_k \in \tau \wedge occ_k = 0)(t_{k+1} - t_k), \quad (3)$$

where $|\tau|$ denotes the length of a timestamp. Eqn. 2 computes the number of pickups of all taxis at $r_c(i, j)$ at timestamp $\tau$, which serves as a lower bound for taxi demands. Meanwhile, Eqn. 3 computes the total fraction of time in timestamp $\tau$ during which taxis run idle.
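Eqns. (2)-(3) can be sketched the same way as the flow counts: a pickup is an occupancy transition 0→1, and idle time accumulates the gap to the next GPS point while unoccupied. The `region_of`/`slot_of` helpers are again hypothetical:

```python
import numpy as np

def pickups_and_idle(trajectories, T, W, H, region_of, slot_of, slot_len=1800.0):
    """Sketch of Eqns. (2)-(3); slot_len is |tau| in seconds (30 min by default)."""
    pickup = np.zeros((T, W, H))
    idle = np.zeros((T, W, H))
    for tr in trajectories:
        for k, (pos, t, occ) in enumerate(tr):
            r, s = region_of(pos), slot_of(t)
            if k > 0 and occ == 1 and tr[k - 1][2] == 0:
                pickup[s][r] += 1                       # Eqn. (2): occupancy 0 -> 1 at t_k
            if k + 1 < len(tr) and occ == 0:
                idle[s][r] += (tr[k + 1][1] - t) / slot_len   # Eqn. (3): idle gap / |tau|
    return pickup, idle
```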

POI Dataset For city $c$, we obtain POI data $POI_c = \{(pos_{poi}, cat_{poi})\}$ using the AMap API³. Each POI category⁴ is given an id from 0 to 13. We build POI tensors $x^{poi}_c \in \mathbb{R}^{W_c \times H_c \times 14}$ as follows:

$$x^{poi}_c(i, j, cat) = \sum_{(pos_{poi}, cat_{poi}) \in POI_c} \mathbb{I}(cat_{poi} = cat \wedge pos_{poi} \in r_c(i, j)). \quad (4)$$

$x^{poi}_c(i, j, cat)$ is the number of POIs with category $cat$ in $r_c(i, j)$.

Road Connectivity Dataset We collect road map information, including highways and motorways, from OpenStreetMap⁵. According to the road maps, we build region-wise adjacency matrices $A^{conn}_c \in \mathbb{R}^{|R_c| \times |R_c|}$ as:

$$A^{conn}_c(r_0, r_1) = \sum_{(pos_0, pos_1) \in SEG_c} \mathbb{I}(pos_0 \in r_0 \wedge pos_1 \in r_1), \quad (5)$$

where $r_0, r_1 \in R_c$. $A^{conn}_c(r_0, r_1)$ represents the total number of roads connecting $r_0$ and $r_1$, and therefore $A^{conn}_c$ represents region-wise transportation connectivity within city $c$.

Traffic Speed Dataset The real-time speed data are processed into $x^{speed}_c \in \mathbb{R}^{T_c \times |SEG_c|}$:

$$x^{speed}_c(\tau, s) = \frac{\sum_{(t_k, speed_k) \in sp_s} \mathbb{I}(t_k \in \tau) \cdot speed_k}{\sum_{(t_k, speed_k) \in sp_s} \mathbb{I}(t_k \in \tau)}, \quad s \in SEG_c, \quad (6)$$

which indicates the average traffic speed of road segment $s$ over timestamp $\tau$.
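Eqn. (6) is a per-(timestamp, segment) mean. A minimal sketch with a running sum and count, assuming a hypothetical `slot_of` helper and a dict of per-segment speed records:

```python
from collections import defaultdict

def average_speeds(sp, slot_of):
    """Eqn. (6): mean speed of each segment within each timestamp.

    sp: {segment_id: [(t_k, speed_k), ...]}; returns {(slot, segment): mean}."""
    total = defaultdict(float)
    count = defaultdict(int)
    for seg, records in sp.items():
        for t, speed in records:
            key = (slot_of(t), seg)
            total[key] += speed     # numerator of Eqn. (6)
            count[key] += 1         # denominator of Eqn. (6)
    return {key: total[key] / count[key] for key in total}
```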

As shown in Fig. 1(c), we categorize all taxi data (flow, pickup, and idle driving data) and traffic speed data as service data, as they represent the states of transport service providers, and the rest as context data. We show statistics of all sub-datasets in Table 6 in the Appendix.

³https://lbs.amap.com/
⁴We maintain POIs with the following 14 categories: scenic spots, medical and health, domestic services, residential area, finance, sports and leisure, culture and education, shopping, housing, governments and organizations, corporations, catering, transportation, and public services.
⁵https://openstreetmap.org/


2.4 Data Integration

Service data and context data are usually integrated to jointly train machine learning models. In CityNet, such an integration is simple using the 'JOIN' operation on the aligned spatial or temporal dimension. Using the taxi demand prediction problem as an example, we can represent the integration of taxi pick-up data, POI data, and meteorology data with the following relational algebra formula: $\Pi_*(x^{poi}_c \bowtie_{(i,j)} x^{pickup}_c \bowtie_{\tau} x^{mtr}_c)$, which represents joining pick-up data with POI data on the spatial dimension and joining pick-up data with meteorology data on the temporal dimension.
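Because the sub-datasets share aligned region and timestamp keys, this 'JOIN' maps directly onto ordinary dataframe merges. A toy sketch (all values made up) of the relational-algebra expression above:

```python
import pandas as pd

# Toy stand-ins for three aligned sub-datasets (made-up values).
pickup = pd.DataFrame({"i": [0, 0], "j": [1, 1], "tau": [7, 8], "pickup": [3, 5]})
poi    = pd.DataFrame({"i": [0], "j": [1], "n_poi": [12]})
mtr    = pd.DataFrame({"tau": [7, 8], "temp": [21.5, 20.0]})

# Join pick-up data with POI data on the spatial key (i, j),
# then with meteorology data on the temporal key tau.
joint = pickup.merge(poi, on=["i", "j"]).merge(mtr, on="tau")
print(joint)
```

Each row of `joint` then carries a service target (pickups) together with its spatial and temporal context features, ready for model training.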

3 Data Mining and Analysis

In addition to data gathering and processing, we should also uncover and quantify the correlations between sub-datasets in CityNet to provide insights on how to utilize the multi-modal data. In this section, we use data mining tools to explore the correlations between service data and context data.

| City | Metric | Inflow | Outflow | Pickup | Idle-time |
|---|---|---|---|---|---|
| Beijing | ARI | 0.554 | 0.554 | 0.635 | 0.544 |
| Beijing | AMI | 0.308 | 0.308 | 0.418 | 0.318 |
| Shanghai | ARI | 0.445 | 0.445 | 0.474 | 0.382 |
| Shanghai | AMI | 0.268 | 0.268 | 0.356 | 0.240 |
| Shenzhen | ARI | 0.191 | 0.191 | 0.219 | 0.227 |
| Shenzhen | AMI | 0.132 | 0.132 | 0.192 | 0.182 |
| Chongqing | ARI | 0.471 | 0.471 | 0.624 | 0.541 |
| Chongqing | AMI | 0.261 | 0.261 | 0.404 | 0.350 |
| Chengdu | ARI | 0.197 | 0.196 | 0.167 | N/A |
| Chengdu | AMI | 0.213 | 0.213 | 0.226 | N/A |
| Xi'an | ARI | 0.155 | 0.155 | 0.042 | N/A |
| Xi'an | AMI | 0.164 | 0.164 | 0.05 | N/A |

Table 1: ARI and AMI measurements between clustering assignments based on POI and those based on taxi data. Both measurements range over [−1, 1]; larger values indicate higher coincidence.

3.1 Correlation between POI and Taxi Data

Intuitively, taxi service data should be closely related to POIs. For example, the outflow of residential areas should grow significantly in morning peak hours, while taxi pickups should be frequent in business areas during evening rush hours. To confirm this, we cluster the regions in city $c$ into $k = 3$ disjoint clusters, $C_{c,poi}: R_c = R_{c,1} \cup R_{c,2} \cup \cdots \cup R_{c,k}$, according to their POI vectors ($x^{poi}_c$) using the k-means algorithm. Similarly, we also cluster regions using the average daily pattern of taxi service data to obtain $C_{c,in}, C_{c,out}, C_{c,pickup}, C_{c,idle}$. We use the Adjusted Rand Index (ARI) and the Adjusted Mutual Information (AMI) to measure the similarity between cluster assignments based on POIs ($C_{c,poi}$) and those based on taxi service data ($C_{c,in}$, etc.). The results are shown in Table 1, where both ARI and AMI show positive correlations between the cluster assignments, which confirms the relationship between POI data and taxi service data.
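The measurement pipeline above can be sketched with scikit-learn. The feature matrices below are random toy stand-ins (CityNet uses 14-dimensional POI counts and average daily taxi profiles per region), so the scores here will be near zero rather than the positive values in Table 1; the point is only the cluster-then-compare procedure:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

rng = np.random.default_rng(0)
poi_vecs = rng.random((60, 14))        # toy per-region POI vectors
daily_pattern = rng.random((60, 48))   # toy per-region average daily taxi patterns

# Cluster regions twice: once by POI features, once by taxi service patterns.
c_poi = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(poi_vecs)
c_taxi = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(daily_pattern)

# Compare the two assignments; both scores lie in [-1, 1].
ari = adjusted_rand_score(c_poi, c_taxi)
ami = adjusted_mutual_info_score(c_poi, c_taxi)
print(ari, ami)
```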

Furthermore, we plot the average regional daily patterns of taxi service data from each POI-based cluster in Beijing, Chengdu, and Xi'an in Fig. 2 in the Appendix. On one hand, in Fig. 2(a), taxi patterns in Beijing are cohesive within each POI-based cluster and separable across clusters. On the other hand, in Fig. 2(b), clusters with higher values significantly overlap in Xi'an and Chengdu, the cities with relatively low ARI and AMI in Table 1. This may be caused by the limited number of regions in these cities. However, we can still identify two separable clusters in Fig. 2(b).

According to the results, we conclude that regions with similar POI distributions also share similartaxi service patterns.

3.2 Correlation between Meteorology and Traffic Speed Data

|      | Xi'an   | Chengdu |
|------|---------|---------|
| Rain | -2.06** | -0.64*  |
| Fog  | -0.78** | -0.79** |

Table 2: Regression coefficients between weather phenomena and traffic speed in Chengdu and Xi'an. *, ** indicate 5% and 1% significance under a double-sided t-test, respectively.

It is also intuitive that there exist correlations between weather and traffic speed. For example, when confronted with rain or fog, drivers tend to drive with caution and thus slow down. Therefore, we perform experimental studies to verify these correlations.


| City | Task | ST-HA | ST-LR | ST-CNN | MT-CNN | ST-LSTM | MT-LSTM | ST-GCN | MT-GCN | ST-GAT | MT-GAT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | Inflow | 7.90/2.97 | 6.07/2.59 | 5.60/2.35 | 5.45/2.39 | 5.67/2.31 | 5.48/2.40 | 5.48/2.27 | 5.34/2.19 | 5.59/2.43 | 5.36/2.20 |
| Beijing | Outflow | 7.90/2.97 | 6.07/2.59 | 5.60/2.35 | 5.45/2.39 | 5.67/2.31 | 5.48/2.40 | 5.48/2.27 | 5.33/2.19 | 5.59/2.43 | 5.36/2.20 |
| Beijing | Pickup | 2.51/0.90 | 2.37/0.89 | 2.19/0.79 | 2.15/0.82 | 2.24/0.82 | 2.14/0.80 | 2.23/0.82 | 2.14/0.80 | 2.15/0.80 | 2.14/0.79 |
| Beijing | Idle | 0.73/0.31 | 0.61/0.25 | 0.57/0.24 | 0.54/0.25 | 0.60/0.25 | 0.52/0.22 | 0.55/0.24 | 0.51/0.23 | 0.55/0.24 | 0.50/0.22 |
| Shanghai | Inflow | 19.41/7.93 | 11.72/5.93 | 8.29/4.36 | 8.41/4.51 | 8.60/4.57 | 8.67/4.60 | 10.40/5.26 | 9.77/5.05 | 10.05/5.11 | 9.64/4.99 |
| Shanghai | Outflow | 19.40/7.94 | 11.76/5.94 | 8.42/4.42 | 8.26/4.43 | 8.64/4.53 | 8.52/4.51 | 10.07/5.14 | 9.49/4.90 | 10.10/5.13 | 9.38/4.89 |
| Shanghai | Pickup | 2.24/0.88 | 1.94/0.84 | 1.66/0.75 | 1.64/0.75 | 1.66/0.74 | 1.61/0.73 | 1.78/0.79 | 1.72/0.80 | 1.75/0.77 | 1.71/0.76 |
| Shanghai | Idle | 1.23/0.54 | 0.89/0.41 | 0.86/0.44 | 0.75/0.41 | 0.85/0.41 | 0.73/0.36 | 0.86/0.42 | 0.79/0.38 | 0.81/0.40 | 0.78/0.38 |
| Shenzhen | Inflow | 43.79/18.97 | 24.06/12.23 | 17.62/9.41 | 18.28/9.75 | 17.94/9.57 | 17.75/9.65 | 22.18/11.61 | 20.84/10.91 | 22.35/11.46 | 20.52/10.87 |
| Shenzhen | Outflow | 43.85/18.97 | 24.08/12.23 | 18.08/9.54 | 18.45/9.53 | 17.70/9.54 | 17.80/9.64 | 22.08/11.33 | 20.82/10.81 | 21.99/11.28 | 20.59/10.82 |
| Shenzhen | Pickup | 6.78/2.48 | 4.91/2.09 | 4.29/1.83 | 4.24/1.76 | 4.00/1.75 | 3.99/1.76 | 4.39/2.02 | 4.24/1.86 | 4.50/1.92 | 4.21/1.84 |
| Shenzhen | Idle | 4.25/1.59 | 2.51/1.09 | 2.29/0.98 | 2.16/1.00 | 1.88/0.88 | 1.90/0.93 | 2.33/1.05 | 2.17/0.97 | 2.29/1.01 | 2.18/0.98 |
| Chongqing | Inflow | 37.40/14.97 | 21.19/9.91 | 12.65/6.48 | 12.67/6.52 | 14.39/7.29 | 13.84/7.08 | 18.54/6.59 | 17.12/8.46 | 16.53/8.49 | 15.98/7.93 |
| Chongqing | Outflow | 37.36/14.87 | 21.23/9.85 | 12.90/6.60 | 12.99/6.60 | 14.30/7.10 | 13.68/6.95 | 18.75/8.87 | 16.91/8.29 | 18.12/9.12 | 15.94/7.93 |
| Chongqing | Pickup | 7.64/3.03 | 6.34/2.69 | 5.25/2.24 | 5.15/2.10 | 5.21/2.18 | 5.02/2.10 | 5.77/2.45 | 5.49/2.33 | 5.72/2.45 | 5.34/2.28 |
| Chongqing | Idle | 3.81/1.48 | 2.21/0.96 | 1.53/0.74 | 1.57/0.76 | 1.58/0.73 | 1.57/0.75 | 1.99/0.87 | 1.90/0.84 | 1.97/0.86 | 1.80/0.83 |
| Xi'an | Inflow | 87.00/55.28 | 43.69/29.72 | 27.79/19.20 | 25.77/17.85 | 28.13/19.39 | 26.33/17.94 | 34.77/23.86 | 31.70/21.77 | 31.62/21.67 | 30.67/21.59 |
| Xi'an | Outflow | 86.66/55.29 | 43.58/29.59 | 27.59/18.97 | 25.32/17.55 | 27.90/18.19 | 25.92/17.69 | 35.24/24.15 | 31.04/21.49 | 29.88/22.18 | 28.96/20.12 |
| Xi'an | Pickup | 16.85/10.84 | 11.22/7.66 | 7.77/5.39 | 7.54/5.16 | 7.88/5.38 | 7.71/5.27 | 9.50/6.50 | 9.11/6.26 | 8.22/5.88 | 8.05/5.26 |
| Chengdu | Inflow | 117.5/71.1 | 58.92/37.72 | 31.22/21.09 | 30.02/20.40 | 34.50/23.57 | 32.78/22.42 | 45.10/29.49 | 41.27/27.62 | 40.02/27.27 | 38.47/25.84 |
| Chengdu | Outflow | 116.9/71.5 | 58.98/37.88 | 31.91/21.46 | 29.74/20.20 | 33.55/22.86 | 32.77/22.22 | 43.66/28.47 | 40.39/27.10 | 35.59/24.72 | 35.20/24.58 |
| Chengdu | Pickup | 28.27/15.66 | 17.32/10.54 | 11.02/7.13 | 10.55/6.78 | 11.39/7.21 | 11.15/7.15 | 13.64/8.55 | 13.58/8.76 | 12.30/7.62 | 11.24/7.38 |

Table 3: Results (RMSE/MAE) for taxi prediction of all models and all tasks in 6 cities. HA and LR are non-deep-learning models; CNN and LSTM are Euclidean deep learning models; GCN and GAT are graph learning models. ST-* and MT-* stand for single-task and multi-task models, respectively.

We perform both quantitative and qualitative studies. For the quantitative study, we perform least squares regression of the average speed at each timestamp $\tau$, i.e., $\mathrm{MEAN}_s\left(x^{speed}_c(\tau, s)\right)$, on the weather phenomena $x^{mtr}_c(\tau)$, and show the coefficients in Table 2. Both rain and fog have negative effects on the average traffic speed. We also plot the daily average speeds in Xi'an, with and without rain or fog, in Fig. 3 in the Appendix. We observe consistently higher average speeds when there was no rain, and higher average speeds during daytime when there was no fog.
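A sketch of this regression on synthetic data (the ground-truth coefficients below are assumed, not the paper's; the real study regresses CityNet's average speeds on its 49-dimensional weather vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
rain = rng.integers(0, 2, T).astype(float)   # binary rain indicator per timestamp
fog = rng.integers(0, 2, T).astype(float)    # binary fog indicator per timestamp

# Synthetic mean speeds with negative rain/fog effects (coefficients assumed).
speed = 35.0 - 2.0 * rain - 0.8 * fog + rng.normal(0, 0.5, T)

# Least squares regression of average speed on the weather dummies.
X = np.column_stack([np.ones(T), rain, fog])     # intercept + weather phenomena
coef, *_ = np.linalg.lstsq(X, speed, rcond=None)
print(coef)   # approximately [35, -2.0, -0.8]
```

The fitted rain and fog coefficients recover the (assumed) negative effects, mirroring the sign pattern reported in Table 2.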

4 Machine Learning Applications

In this section, we present experimental results of machine learning tasks supported by CityNet, including spatio-temporal predictions (Section 4.1.1 and Appendix A.5.1), transfer learning (Section 4.2), and reinforcement learning (Appendix A.5.2), to benchmark individual tasks. The benchmark results also uncover the transferability of knowledge across tasks and cities in CityNet.

4.1 Spatio-temporal Predictions

Spatio-temporal predictions are common and important tasks in smart cities. Given the data in CityNet, we provide benchmarks for two typical spatio-temporal prediction tasks: the taxi service prediction tasks in Section 4.1.1, and the traffic speed prediction task in Appendix A.5.1.

4.1.1 Taxi Service Prediction

We investigate the taxi service prediction task [22, 3], including flow and demand-supply prediction⁶.

Problem Formulation We formulate the taxi service prediction problem as learning a single-step prediction function $f_c$ that achieves the lowest prediction error over future timestamps:

$$\hat{x}^*_c(\tau) = f_c\left([x^*_c(\tau - L), \ldots, x^*_c(\tau - 1)]\right),$$
$$f_c = \arg\min_{f_c} \sum_{\tau \in \mathcal{T}_c} error\left(\hat{x}^*_c(\tau), x^*_c(\tau)\right), \quad * = in, out, pickup, idle, joint, \quad (7)$$

where $x^*_c$ stands for the spatio-temporal tensor to predict, and $error$ denotes some error function, such as RMSE or MAE. We introduce two settings, the single-task and multi-task settings, where $x^*_c$ denotes an individual spatio-temporal tensor (e.g., in/outflow) for the single-task setting, and the stacked spatio-temporal tensor $x^{joint}_c = \mathrm{STACK}\left([x^{in}_c, x^{out}_c, x^{pickup}_c, x^{idle}_c]\right)$ for the multi-task

⁶As stated in Sec. 2, we use pickup and idle driving as proxies for demand and supply.


setting. In other words, we achieve multi-task learning for multiple taxi service predictions through direct weight sharing. We set the input to the $L = 5$ closest temporal observations (i.e., 2.5 hours)⁷.
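The single-step formulation of Eqn. (7) amounts to building sliding windows of the $L$ most recent frames and predicting the next frame. A minimal sketch on a random toy tensor, with a simplified History-Average-style baseline that averages over the input window (the paper's HA averages over all historical values):

```python
import numpy as np

def sliding_windows(x, L=5):
    """Build (input, target) pairs for single-step prediction as in Eqn. (7):
    input = the L most recent frames, target = the next frame."""
    X = np.stack([x[t - L:t] for t in range(L, len(x))])
    y = x[L:]
    return X, y

rng = np.random.default_rng(0)
x = rng.random((100, 8, 8))        # toy (T, W, H) spatio-temporal tensor

X, y = sliding_windows(x)          # X: (95, 5, 8, 8), y: (95, 8, 8)
ha = X.mean(axis=1)                # window-mean baseline prediction
rmse = np.sqrt(((ha - y) ** 2).mean())
print(X.shape, y.shape, rmse)
```

All models in Table 3 consume the same (X, y) pairs; only the prediction function $f_c$ differs.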

Methods We study the following methods for taxi prediction.

• HA (History Average): Mean of historical values of the data.

• LR (Linear Regression): Ridge regression on historical values with a regularization of 0.01.

• CNN (Convolutional Neural Network): We implement our CNN based on STResNet [22]. We use 6 residual blocks with 64 3×3 filters. Batch normalization (BN) is used.

• LSTM: We implement our LSTM based on ST-net [19], which first uses convolutions to model spatial relations and then uses an LSTM to capture temporal dependencies. We use 3 residual blocks with BN for the convolutions, and a single-layer LSTM with 256 hidden units.

• GCN (Graph Convolutional Network) [5] and GAT (Graph Attention Network) [14]: We apply GCN and GAT with input features being the historical values (single-task) or stacked historical values of all tasks (multi-task). For graph convolution, we take the adjacency matrix as $A = \left(A^{conn}_c + A^{poi}_c + A^{neigh}_c\right)/3$ following [3], and use the normalized adjacency matrix $\hat{A} = D^{-1/2} A D^{-1/2}$ for GCN.

Among them, HA and LR are traditional time-series forecasting models; CNN and LSTM are deep learning models on Euclidean structures (grids); GCN and GAT are graph deep learning models utilizing additional region-wise connections (e.g., POI and road connectivity).
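The symmetric normalization $\hat{A} = D^{-1/2} A D^{-1/2}$ used by the GCN models can be sketched in a few lines (a standard construction; isolated regions with zero degree are mapped to zero rather than dividing by zero):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric GCN normalization A_hat = D^{-1/2} A D^{-1/2}.

    A may itself be an average of several adjacency matrices, e.g.
    (A_conn + A_poi + A_neigh) / 3 as in the text."""
    d = A.sum(axis=1)                                # region degrees
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    # Scale rows and columns by d^{-1/2}.
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
```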

Experimental Settings We split the earliest 70% of the data for training, the middle 15% for validation, and the latest 15% for testing. All data are normalized to [0, 1] by min-max normalization and recovered for evaluation. We use RMSE and MAE as metrics [20].

Table 3 shows the overall performances under both the single-task (ST-*) setting and the multi-task (MT-*) setting for all investigated models. We make the following observations:

• Deep Learning or Not: Given sufficient data (e.g., 10 days to 1-2 months), deep learning models including CNN, LSTM, GCN, and GAT outperform time-series forecasting methods like HA and LR. This illustrates the power of deep learning in spatio-temporal predictions and the advantage of leveraging correlations in both Euclidean and non-Euclidean spaces.

• Multi-task or Not: Among all 22 tasks, multi-task models achieve the lowest RMSE in 15 (68.2%) tasks, and the lowest MAE in 15 (68.2%) tasks. For each model, MT-* outperforms ST-* in 80 (90.9%) cases in terms of RMSE and 66 (75%) cases in terms of MAE. We conclude that taxi service predictions can be improved through a simple multi-task method, weight sharing, which verifies the connections among heterogeneous taxi service data and paves the way for future research in multi-task spatio-temporal predictions.

• Graph Models or Not: Among all the prediction tasks, CNN-based models (CNN, LSTM) achieve the best performances in most cases, while GNN-based models (GCN, GAT) work well only in Beijing, which has the largest map size and the most complex intra-city relations. We conclude that in cities with large map sizes, incorporating graphs to model region-wise connections improves prediction accuracy, while in cities with small map sizes it hurts performance. One possible reason is that graphs built on a limited number of regions tend to be dense, as shown in Table 6, which causes over-smoothing [7].

4.2 Inter-city Transfer Learning

As shown in Table 3, deep learning models generate accurate predictions when empowered with abundant data. However, the level of digitization varies significantly among cities, and many cities likely cannot build accurate deep prediction models due to a lack of data. One solution to this problem is transfer learning [12], which transfers knowledge from a source domain with abundant data to a target domain without much data; in our case, from one city to another. Therefore, we

⁷For pickup and idle driving datasets, we aggregate 10-minute timestamps into 30-minute timestamps, and use the aggregated tensors for prediction.


| Target City | Task | LR (full) | LSTM (full) | LR (3-day) | LSTM (3-day) | FT: Beijing | FT: Shanghai | FT: Xi'an | RT: Beijing | RT: Shanghai | RT: Xi'an |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Shenzhen | Inflow | 24.06/12.23 | 17.94/9.57 | 24.07/12.27 | 22.63/12.83 | 19.78/10.58 | 20.90/11.09 | N/A | **19.67**/**10.40** | 20.59/10.93 | N/A |
| Shenzhen | Outflow | 24.08/12.23 | 17.80/9.64 | 24.09/12.27 | 22.75/12.87 | 19.82/10.58 | 20.95/11.09 | N/A | **19.62**/**10.49** | 20.70/10.95 | N/A |
| Shenzhen | Pickup | 4.91/2.09 | 3.99/1.76 | 4.92/2.09 | 4.52/2.07 | 4.25/1.86 | 4.37/1.92 | N/A | **4.21**/**1.82** | 4.31/1.89 | N/A |
| Shenzhen | Idle | 2.51/1.09 | 1.90/0.94 | 2.51/1.09 | 2.39/1.17 | **2.12**/**0.97** | 2.16/1.10 | N/A | **2.12**/**0.97** | 2.13/1.00 | N/A |
| Chongqing | Inflow | 21.19/9.91 | 13.84/7.08 | 21.20/9.95 | 18.45/9.81 | 16.10/7.98 | 16.74/8.23 | N/A | **15.87**/**7.90** | 16.59/8.15 | N/A |
| Chongqing | Outflow | 21.24/9.85 | 13.68/6.95 | 21.24/9.86 | 18.22/9.69 | 15.85/**7.83** | 16.61/8.13 | N/A | **15.72**/**7.83** | 16.46/8.08 | N/A |
| Chongqing | Pickup | 6.34/2.69 | 5.02/2.10 | 6.34/2.69 | 5.64/2.52 | 5.42/2.23 | 5.45/2.30 | N/A | **5.37**/**2.20** | 5.42/2.30 | N/A |
| Chongqing | Idle | 2.21/0.96 | 1.57/0.75 | 2.21/0.96 | 2.03/0.98 | **1.75**/**0.81** | 1.80/0.83 | N/A | 1.77/0.82 | 1.77/**0.81** | N/A |
| Chengdu | Inflow | 58.92/37.72 | 32.79/22.42 | 59.29/38.45 | 48.39/33.12 | 43.49/28.88 | 46.30/30.08 | 45.24/29.69 | **43.43**/**28.60** | 45.74/29.92 | 44.43/29.51 |
| Chengdu | Outflow | 58.98/37.88 | 33.55/22.86 | 59.37/38.61 | 47.77/32.69 | 43.12/**28.44** | 46.06/30.12 | 44.66/29.35 | **42.73**/28.66 | 45.59/29.86 | 43.69/29.17 |
| Chengdu | Pickup | 17.32/10.54 | 11.15/7.15 | 17.40/10.69 | 15.21/10.08 | 13.78/8.79 | 14.39/9.12 | 14.07/9.01 | **13.51**/**8.48** | 14.21/9.03 | 13.89/9.06 |

Table 4: Results (RMSE/MAE) for inter-city transfer learning from source cities (Beijing, Shanghai, Xi'an) to target cities (Shenzhen, Chongqing, Chengdu). FT denotes fine-tuning and RT denotes RegionTrans, each using limited (3-day) target data. The lowest RMSE/MAE using limited target data are in bold. Results under full and 3-day data are lower and upper bounds of errors for the corresponding tasks.
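All error tables in this paper report RMSE/MAE pairs. As a reference, a minimal sketch of the two metrics over flat value lists (function names are ours, not from the released code):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over flat lists of values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error over flat lists of values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [2.0, 4.0, 6.0]
y_pred = [1.0, 4.0, 8.0]
print(rmse(y_true, y_pred))  # sqrt((1 + 0 + 4) / 3) ≈ 1.291
print(mae(y_true, y_pred))   # (1 + 0 + 2) / 3 = 1.0
```

Both metrics are averaged over all regions and timestamps of a task; RMSE penalizes large deviations more heavily than MAE.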

carry out transfer learning experiments on CityNet to show that connections among cities can lead to positive knowledge transfer, and to provide benchmarks for future works on urban transfer learning.

We investigate the problem of inter-city transfer learning on the taxi service datasets. We formulate the problem as learning a function f_{c_s,c_t} that achieves the lowest prediction error on the target city c_t, with the help of data from the source city c_s:

$$
\begin{aligned}
\hat{x}^{*}_{c_t}(\tau) &= f_{c_s,c_t}\left(\left[x^{*}_{c_t}(\tau - L), \ldots, x^{*}_{c_t}(\tau - 1)\right], x^{*}_{c_s}\right), \\
f_{c_s,c_t} &= \arg\min_{f_{c_s,c_t}} \sum_{\tau \in T_{c_t}} \mathrm{error}\left(\hat{x}^{*}_{c_t}(\tau), x^{*}_{c_t}(\tau)\right), \\
* &\in \{\mathrm{in}, \mathrm{out}, \mathrm{pickup}, \mathrm{idle}, \mathrm{joint}\}, \qquad |x^{*}_{c_t}| \ll |x^{*}_{c_s}|.
\end{aligned}
\tag{8}
$$

Experimental Settings We use LSTM, which is similar to ST-net in MetaST [19], as the base model, and adopt the multi-task setting. For cities with large map sizes, we choose Beijing and Shanghai as source cities; for those with small map sizes, we choose Xi'an as the source city, and the rest are chosen as target cities. Xi'an is not used as a source city for Shenzhen and Chongqing due to its much smaller map size. For all target cities, we choose a three-day Thursday-Friday-Saturday interval for training, while validation and test data remain the same.

Methods We study the following transfer learning methods:

• Fine-tuning: We train a model on the source city and then tune it on limited target data.

• RegionTrans [15]: RegionTrans computes a matching between source regions Rcs and target regions Rct based on the correlation of time-series (S-match) or auxiliary data (A-match) 8. The matching is used as a regularizer, such that similar regions share similar features. In our experiments, we use POI vectors to compute the matching, which belongs to A-match.
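Fine-tuning reduces to two rounds of the same training loop, with the second round initialized from the first. A toy sketch with a linear model and plain stochastic gradient descent (all data and names here are illustrative, not taken from CityNet or the released code):

```python
def train(w, b, data, lr=0.01, epochs=200):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# "Source city": abundant data following y = 2x (illustrative).
source = [(k * 0.1, 2 * k * 0.1) for k in range(50)]
w, b = train(0.0, 0.0, source)

# "Target city": only three samples of a shifted pattern y = 2x + 1.
target = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(w, b, target, lr=0.05, epochs=500)  # fine-tune from source weights
```

Because the target model starts from source-city weights rather than from scratch, three samples suffice to recover the shifted pattern; training from zero on the same three points is slower and less stable.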

Results We show results of inter-city transfer learning in Table 4. We show results trained using full and 3-day target data, which correspond to the lower and upper bounds of errors, respectively. We also show results of fine-tuning and RegionTrans. We make the following observations:

• Degradation under Data Scarcity. When only 3-day training data are available, LR, as a non-deep-learning model, achieves similar performance as with full data, while LSTM suffers up to 50% more error (see Chengdu). This indicates that, compared to non-deep-learning models, deep learning models are more sensitive to the amount of training data.

• Inter-city Correlations. Compared to using target data only (upper bound), transfer learning achieves error reductions in all source-target pairs, the largest of which is about 15% (see Shenzhen and Chongqing). The results indicate that there exist sufficient cross-city correlations in CityNet, such that knowledge learned for one city can benefit another.

• Impact of Context Data. Using POI data for region matching, RegionTrans achieves lower error than fine-tuning in most cases, which verifies the connection between context and service data, and underscores the importance of multi-modal data in CityNet.

• Domain Selection. We consistently achieve the best results when Beijing is used as the source city, regardless of the algorithm and the target city. One probable reason is that Beijing contains the most regions, and hence the most complex regional service patterns.

8 In this paper, context data serve as auxiliary data.


4.3 Discussion

The spatio-temporal prediction experiments on the taxi service data have uncovered the correlations among different tasks for individual sub-datasets in CityNet. Leveraging the mutual knowledge among them, the prediction performance on individual tasks can be improved by simple multi-task learning strategies such as weight sharing. Also, via transfer learning, the cold-start problem in data-scarce cities can be alleviated by choosing a highly related source domain and transferring the abundant knowledge learned from it. This is supported by the cross-city correlations in CityNet.

To summarize, both the multi-task spatio-temporal prediction and the inter-city transfer learning experiments demonstrate the correlations among individual datasets in CityNet across tasks and cities. Thus, the inclusiveness and interrelatedness of CityNet are important features that facilitate the research of various urban phenomena with new problem settings and new techniques.

5 Conclusion and Opportunities

We present CityNet, a multi-city, multi-modal dataset for smart city research. To the best of our knowledge, CityNet is the first dataset to incorporate spatio-temporally aligned urban data from multiple cities and tasks. In this paper, to illustrate the importance of multi-modal urban data, we show connections between service and context data using data mining tools, and provide extensive experimental results on spatio-temporal predictions, transfer learning, and reinforcement learning to show the connections among tasks and among cities. The results and the uncovered connections endorse the potential of CityNet to serve as a versatile benchmark on various research topics, including but not limited to:

• Transfer learning: In urban computing, the knowledge required for different tasks, cities, or time periods is likely to be correlated. By leveraging the transferable knowledge across these domains, data scarcity problems in newly built or under-developed cities can be alleviated. Containing multi-city, multi-task data, CityNet is an ideal testbed for researchers to develop and evaluate various transfer learning algorithms, including meta-learning [19], AutoML [8], and transfer learning on homogeneous or heterogeneous spatio-temporal graphs.

• Federated learning: Urban data are generated by various human activities and warehoused by different stakeholders, including organizations, companies, and governments. Due to data privacy regulations and the protection of commercial interests, collaborations between them must be privacy-preserving. Federated learning (FL) [18] is an effective solution enabling such privacy-preserving multi-party collaboration. As CityNet contains data from multiple cities and sources, it is well suited for investigating various FL topics under various settings, with each party holding data from one source or one city.

• Explanation and diagnosis: Understanding social effects from data helps city governors make wiser decisions on urban management. Therefore, besides prediction and planning, we hope that applications providing explicit insights into urban management, including causal inference [11], uncertainty estimation, and explainable machine learning, can also be carried out on CityNet.

We hope CityNet can serve as a trigger for more extensive research efforts in machine learning for smart cities. For future work, we note that CityNet is not yet multi-modal in terms of transport modes, and thus plan to incorporate even more sources of data into CityNet, including subways and buses.


References

[1] Chao Chen. Freeway performance measurement system (PeMS). 2003.

[2] Nivan Ferreira, Jorge Poco, Huy T. Vo, Juliana Freire, and Cláudio T. Silva. Visual exploration of big spatio-temporal urban data: A study of New York City taxi trips. IEEE Transactions on Visualization and Computer Graphics, 19(12):2149–2158, 2013.

[3] Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, and Yan Liu. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3656–3663, 2019.

[4] H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M. Patel, Raghu Ramakrishnan, and Cyrus Shahabi. Big data and its technical challenges. Communications of the ACM, 57(7):86–94, 2014.

[5] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

[6] Minne Li, Zhiwei Qin, Yan Jiao, Yaodong Yang, Jun Wang, Chenxi Wang, Guobin Wu, and Jieping Ye. Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning. In The World Wide Web Conference, pages 983–994, 2019.

[7] Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

[8] Ting Li, Junbo Zhang, Kainan Bao, Yuxuan Liang, Yexin Li, and Yu Zheng. AutoST: Efficient neural architecture search for spatio-temporal prediction. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 794–802, 2020.

[9] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017.

[10] Binbing Liao, Jingqing Zhang, Chao Wu, Douglas McIlwraith, Tong Chen, Shengwen Yang, Yike Guo, and Fei Wu. Deep sequence learning with auxiliary information for traffic prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2018.

[11] Wei Liu, Yu Zheng, Sanjay Chawla, Jing Yuan, and Xing Xie. Discovering spatio-temporal causal interactions in traffic data streams. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1010–1018, 2011.

[12] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.

[13] Xiaocheng Tang, Zhiwei (Tony) Qin, Fan Zhang, Zhaodong Wang, Zhe Xu, Yintai Ma, Hongtu Zhu, and Jieping Ye. A deep value-network based approach for multi-driver order dispatching. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1780–1790, 2019.

[14] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.

[15] Leye Wang, Xu Geng, Xiaojuan Ma, Feng Liu, and Qiang Yang. Cross-city transfer learning for deep spatio-temporal prediction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, pages 1893–1899. AAAI Press, 2019.

[16] Ying Wei, Yu Zheng, and Qiang Yang. Transfer knowledge between cities. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1905–1914, 2016.

[17] Zhe Xu, Zhixin Li, Qingwen Guan, Dingshui Zhang, Qiang Li, Junxiao Nan, Chunyang Liu, Wei Bian, and Jieping Ye. Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 905–913, 2018.

[18] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19, 2019.

[19] Huaxiu Yao, Yiding Liu, Ying Wei, Xianfeng Tang, and Zhenhui Li. Learning from multiple cities: A meta-learning approach for spatial-temporal prediction. In The World Wide Web Conference, pages 2181–2191, 2019.

[20] Bing Yu, Haoteng Yin, and Zhanxing Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 3634–3640, 2018.

[21] Jing Yuan, Yu Zheng, Xing Xie, and Guangzhong Sun. Driving with knowledge from the physical world. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 316–324, 2011.

[22] Junbo Zhang, Yu Zheng, and Dekang Qi. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 31, 2017.

[23] Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, and Xiuwen Yi. DNN-based prediction model for spatio-temporal data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pages 1–4, 2016.

[24] Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. Urban computing: Concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 5(3):1–55, 2014.

[25] Ming Zhou, Jiarui Jin, Weinan Zhang, Zhiwei Qin, Yan Jiao, Chenxi Wang, Guobin Wu, Yong Yu, and Jieping Ye. Multi-agent reinforcement learning for order-dispatching via order-vehicle distribution matching. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 2645–2653, 2019.

Acknowledgments and Disclosure of Funding

This project is sponsored by the RGC Theme-based Research Scheme (No. T41-603/20R). The raw datasets are retrieved from the Didi Chuxing GAIA Initiative (https://gaia.didichuxing.com) and some other open-source channels. We appreciate the effort of Hao Liu, Kaiqiang Xu and Xiaoyu Hu ({liuh,kxuar,huxiaoyu}@ust.hk, HKUST) throughout this research work.

Checklist

1. For all authors...

(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]

(b) Did you describe the limitations of your work? [Yes]

(c) Did you discuss any potential negative societal impacts of your work? [No] We only publish the discretized and aggregated measurements of our dataset, which will not reveal sensitive information of individual users.

(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]

2. If you are including theoretical results...

(a) Did you state the full set of assumptions of all theoretical results? [N/A]

(b) Did you include complete proofs of all theoretical results? [N/A]

3. If you ran experiments (e.g. for benchmarks)...

(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]

(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes]

(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [No]

(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes]

4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...

(a) If your work uses existing assets, did you cite the creators? [Yes]

(b) Did you mention the license of the assets? [Yes]

(c) Did you include any new assets either in the supplemental material or as a URL? [No]

(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [Yes]

(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [Yes]

5. If you used crowdsourcing or conducted research with human subjects...

(a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A]

(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]

(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]


A Supplementary Materials

A.1 Overview of CityNet data

In this section, we provide an overview of CityNet in Table 5 and some basic statistical properties in Table 6 as a supplement to the introduction section.

| City | Time Range (inclusive) | Longitude range (E) | Latitude range (N) | POI | Road | Speed | In/Outflow | Pickup | Idle | Meteorology |
|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | 10/20 - 10/29, 2013 | 115.9045-116.9565 | 39.5786-40.4032 | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Shanghai | 08/01 - 08/31, 2016 | 121.2054-121.8176 | 30.9147-31.4202 | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Shenzhen | 12/01 - 12/31, 2015 | 113.8097-114.3087 | 22.5085-22.7739 | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Chongqing | 08/01 - 08/31, 2017 | 106.3422-106.6486 | 29.3795-29.8322 | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | ✓ |
| Xi'an | 10/01 - 11/30, 2016 | 108.9219-109.0093 | 34.2049-34.2799 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Chengdu | 10/01 - 11/30, 2016 | 104.0421-104.1296 | 30.6528-30.7278 | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Hong Kong | 10/13 - 12/22, 2020 | 113.9878-114.2459 | 22.2745-22.4670 | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |

Table 5: Overview of data in CityNet, including spatio-temporal ranges and availability (✓/✗) of sub-datasets in each city. Cities in CityNet have diverse properties: Beijing has the largest map size with |Rc| ≈ 10000; Shanghai, Shenzhen and Chongqing have large maps with |Rc| > 1000, while Chengdu and Xi'an have small maps with |Rc| < 100.
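Regions r_c(i, j) are 1km×1km grid cells cut from each city's bounding box above. A rough illustrative sketch of mapping a GPS point to its grid coordinate, approximating one degree of latitude as 111 km and scaling longitude by the cosine of the latitude (the constant, function names, and rounding convention are our simplification, not the released preprocessing code):

```python
import math

KM_PER_DEG_LAT = 111.0  # rough average; the actual pipeline may differ

def to_region(lon, lat, lon_min, lat_min):
    """Map a GPS point to an (i, j) coordinate on a 1km x 1km grid
    anchored at the city's south-west corner (lon_min, lat_min)."""
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(lat_min))
    i = int((lon - lon_min) * km_per_deg_lon)  # east-west index
    j = int((lat - lat_min) * KM_PER_DEG_LAT)  # north-south index
    return i, j

# Example: a point ~1.5 km east and ~2.3 km north of Shenzhen's corner.
lon_min, lat_min = 113.8097, 22.5085
print(to_region(lon_min + 0.015, lat_min + 0.021, lon_min, lat_min))  # → (1, 2)
```

Applying this mapping to, e.g., Chengdu's roughly 0.087° × 0.075° bounding box yields the small 9×9 grid reported in Table 6.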

Taxi Service:

| City | Sub-dataset | Shape | Mean | Max | Sparsity |
|---|---|---|---|---|---|
| Beijing | In/Outflow | (480, 106, 92) | 6.682 | 671.0 | 0.695 |
| Beijing | Pickup | (1440, 106, 92) | 0.389 | 340.0 | 0.920 |
| Beijing | Idle-time | (1440, 106, 92) | 0.488 | 243.3 | 0.788 |
| Shanghai | In/Outflow | (1448, 62, 57) | 27.181 | 1440.0 | 0.406 |
| Shanghai | Pickup | (4464, 62, 57) | 0.637 | 304.0 | 0.838 |
| Shanghai | Idle-time | (4464, 62, 57) | 1.618 | 438.0 | 0.537 |
| Shenzhen | In/Outflow | (1448, 50, 30) | 88.244 | 3601.0 | 0.313 |
| Shenzhen | Pickup | (4464, 50, 30) | 2.893 | 302.0 | 0.715 |
| Shenzhen | Idle-time | (4464, 50, 30) | 5.488 | 583.7 | 0.412 |
| Chongqing | In/Outflow | (1488, 31, 51) | 54.761 | 1572.0 | 0.411 |
| Chongqing | Pickup | (4464, 31, 51) | 2.651 | 495.0 | 0.727 |
| Chongqing | Idle-time | (4464, 31, 51) | 3.944 | 411.8 | 0.477 |
| Chengdu | In/Outflow | (2928, 9, 9) | 343.309 | 3646.0 | 0.028 |
| Chengdu | Pickup | (8784, 9, 9) | 21.835 | 554.0 | 0.102 |
| Chengdu | Idle-time | - | - | - | - |
| Xi'an | In/Outflow | (2928, 9, 9) | 227.624 | 2768.0 | 0.001 |
| Xi'an | Pickup | (8784, 9, 9) | 12.897 | 250.0 | 0.111 |
| Xi'an | Idle-time | - | - | - | - |

Road Connectivity:

| | Beijing | Shanghai | Shenzhen | Chongqing | Chengdu | Xi'an |
|---|---|---|---|---|---|---|
| # roads | 180,488 | 151,124 | 69,612 | 31,138 | 6,846 | 11,810 |
| Max | 65 | 67 | 32 | 27 | 35 | 23 |
| Mean (×1e-4) | 20.07 | 95.35 | 180.33 | 122.69 | 6752.02 | 4429.20 |
| Sparsity | 0.999 | 0.997 | 0.994 | 0.994 | 0.823 | 0.838 |

Speed:

| | Chengdu | Xi'an | Hong Kong |
|---|---|---|---|
| # roads | 1,676 | 792 | 618 |
| Max | 170.6 | 149.22 | 111.0 |
| Mean | 31.93 | 32.49 | 56.09 |
| Sparsity | 0.047 | 0.053 | 0.027 |

Table 6: Statistics of the Taxi Service, Road Connectivity and Traffic Speed sub-datasets within all cities. The temporal granularity is 30 minutes for in/outflow data, and 10 minutes for pickup, idle driving and traffic speed data. Sparsity denotes the fraction of 0s within all entries of the sub-dataset.
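The Sparsity rows in Table 6 are simply the fraction of zero entries in a sub-dataset; a one-line sketch of the computation over flattened tensor entries:

```python
def sparsity(tensor_entries):
    """Fraction of entries that are exactly 0."""
    vals = list(tensor_entries)
    return sum(1 for v in vals if v == 0) / len(vals)

print(sparsity([0, 3, 0, 0, 5]))  # 3 of 5 entries are zero → 0.6
```

High sparsity (e.g. 0.92 for Beijing pickups) means most region-timestamp cells record no events, which partly explains why dense graph models struggle on some sub-datasets.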

A.2 Summary of Notations

We summarize all notations used in this paper in Table 7.

A.3 Details of Meteorology Dataset

Given the weather data matrix $x^{mtr}_c \in \mathbb{R}^{T_c/2 \times 49}$, each dimension of the 49-dimensional vector $x^{mtr}_c(\tau)$ has the following meaning:

• 0: Air temperature (degrees Celsius);
• 1: Atmospheric pressure (mmHg);
• 2: Atmospheric pressure reduced to mean sea level (mmHg);
• 3: Relative humidity (%);
• 4: Mean wind speed (m/s);
• 5: Horizontal visibility (km);
• 6: Dewpoint temperature (degrees Celsius);
• 7-24: Multi-hot encoding of weather phenomena. Each index represents fog, patches of fog, partial fog, mist, haze, widespread dust, light drizzle, rain, light rain, shower, light shower,


| Notation | Meaning |
|---|---|
| c | A specific city, e.g. Beijing, Shanghai |
| Wc, Hc | Width and height of the city c in grids |
| r | A region, i.e. a 1km×1km grid |
| rc(i, j) | The region in city c with grid coordinate (i, j) |
| Rc | The set of all regions in city c |
| τ | A timestamp, a 30-minute interval by default |
| τc | The last valid timestamp of city c |
| Tc | The set of valid timestamps of city c |
| Tc | The number of valid timestamps of city c |
| Gc | The set of all taxi GPS points in city c |
| Ac | The set of all taxis in city c |
| tra | The trajectory of taxi a |
| (pos, t, occ) | A taxi GPS point |
| POIc | The set of POIs in city c |
| (pospoi, catpoi) | The position and category of a POI |
| xc | A spatio-temporal tensor for city c |
| xc(i, j, τ) | The value at region rc(i, j) at time τ |
| xc(τ) | The spatio-temporal values of Rc at time τ |
| seg = (pos0, pos1) | A road segment and its two endpoints |
| SEGc | The set of road segments in city c |
| spseg | The sequence of historical traffic speeds of seg |

Table 7: Summary of Notations

(a) Beijing (b) Chengdu and Xi'an

Figure 2: Average daily pattern of taxi service data in (a) Beijing, (b) Chengdu and Xi'an. We aggregate all values at each timestamp from all days, and represent mean values at each timestamp using solid lines, and standard deviations using shades.

in the vicinity shower, heavy shower, thunderstorm, light thunderstorm, in the vicinity thunderstorm, heavy thunderstorm, and no special weather phenomena, respectively;

• 25-42: One-hot encoding of wind direction. Each index represents no wind, variable direction, E, E-NE, E-SE, W, W-NW, W-SW, S, S-SE, S-SW, SE, SW, N, N-NE, N-NW, NE, and NW, respectively;

• 43-48: One-hot encoding of cloud cover. Each index represents no significant clouds, cumulonimbus clouds, few clouds, scattered clouds, broken clouds, and overcast, respectively.
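The 49-dimensional layout above can be assembled programmatically; a hedged sketch of packing one observation into the vector (function signature, field names, and example values are ours, not the released code):

```python
def encode_weather(temp, pressure, msl_pressure, humidity, wind_speed,
                   visibility, dewpoint, phenomena_idxs, wind_dir_idx, cloud_idx):
    """Pack one weather observation into the 49-dim layout:
    0-6 continuous readings, 7-24 multi-hot phenomena,
    25-42 one-hot wind direction, 43-48 one-hot cloud cover."""
    v = [0.0] * 49
    v[0:7] = [temp, pressure, msl_pressure, humidity,
              wind_speed, visibility, dewpoint]
    for k in phenomena_idxs:      # multi-hot: several phenomena may co-occur
        v[7 + k] = 1.0
    v[25 + wind_dir_idx] = 1.0    # one-hot wind direction
    v[43 + cloud_idx] = 1.0       # one-hot cloud cover
    return v

# Example: mist (index 3) plus light rain (index 8), east wind (index 2),
# scattered clouds (index 3).
vec = encode_weather(18.5, 752.0, 756.0, 83.0, 3.2, 10.0, 15.4,
                     phenomena_idxs=[3, 8], wind_dir_idx=2, cloud_idx=3)
# vec[10] and vec[15] are 1.0; exactly one entry is set in each one-hot span.
```

The multi-hot span (7-24) can have several active entries at once, while the wind-direction and cloud-cover spans each carry exactly one.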

A.4 Visualization for Data Mining and Analysis

We provide the visualizations for the data mining analysis in Section 3 in Figures 2 and 3.


(a) Rain (b) Fog

Figure 3: Daily average speed (km/h) at each hour (0:00-24:00) in Xi'an, with and without rain (a) or fog (b). Shades represent half of the standard deviation.

| RMSE/MAE | Xi'an | Chengdu | Hong Kong |
|---|---|---|---|
| HA | 8.7475/5.0312 | 7.9507/4.5202 | 6.784/3.8175 |
| LR | 8.1432/4.5687 | 7.4246/4.1844 | 5.7752/3.3942 |
| GCN | 7.5872/4.1807 | 7.0598/3.9894 | 5.5043/3.1405 |
| GAT | 7.5217/4.0987 | 6.9622/3.8825 | 5.5722/3.1465 |

Table 8: Results for traffic speed prediction in Xi'an, Chengdu, and Hong Kong.

A.5 More Machine Learning Benchmarks

A.5.1 Traffic Speed Prediction

We investigate the traffic speed prediction problem [20, 9]. We formulate the problem as learning a multi-step prediction function $f_c(\cdot)$:

$$
\begin{aligned}
\left[\hat{x}^{s}_{c}(\tau), \ldots, \hat{x}^{s}_{c}(\tau + L_o - 1)\right] &= f_c\left(\left[x^{s}_{c}(\tau - L_i), \ldots, x^{s}_{c}(\tau - 1)\right]\right), \\
f_c &= \arg\min_{f_c} \sum_{\tau \in T_c} \mathrm{error}\left(\hat{x}^{s}_{c}(\tau), x^{s}_{c}(\tau)\right),
\end{aligned}
\tag{9}
$$

where $x^{s}_{c}$ is a shorthand for the traffic speed sub-dataset $x^{speed}_{c}$. In this experiment, the input length $L_i$ is set to 12 (2 hours) and the output length $L_o$ is set to 6 (1 hour). We investigate four methods, HA, LR, GCN and GAT, described in Sec. 4.1.1. For $i, j \in SEG_c$, $i = (pos^{i}_{0}, pos^{i}_{1})$, $j = (pos^{j}_{0}, pos^{j}_{1})$, the adjacency matrix $A$ for GNN models is calculated as

$$
A(i, j) =
\begin{cases}
1, & pos^{i}_{0} = pos^{j}_{1} \ \text{or} \ pos^{i}_{1} = pos^{j}_{0}, \\
0, & \text{otherwise}.
\end{cases}
\tag{10}
$$

Table 8 shows the results of the traffic speed prediction tasks in three cities for all examined models.
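The head-to-tail rule of Eqn. 10 can be implemented directly over the segment list; a minimal sketch, with road segments simplified to endpoint tuples (the nested-loop form is illustrative; a production version would index segments by endpoint):

```python
def build_adjacency(segments):
    """Eqn. 10: A(i, j) = 1 iff segment i's endpoint meets segment j's,
    i.e. pos_0^i == pos_1^j or pos_1^i == pos_0^j; 0 otherwise."""
    n = len(segments)
    A = [[0] * n for _ in range(n)]
    for i, (i0, i1) in enumerate(segments):
        for j, (j0, j1) in enumerate(segments):
            if i0 == j1 or i1 == j0:
                A[i][j] = 1
    return A

# Three road segments forming a chain a -> b -> c -> d.
segs = [("a", "b"), ("b", "c"), ("c", "d")]
A = build_adjacency(segs)
# A == [[0, 1, 0], [1, 0, 1], [0, 1, 0]]: only consecutive segments connect.
```

Because the rule checks both directions, the resulting matrix is symmetric for undirected chains like the one above, which matches the intuition that adjacent segments influence each other's speeds.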

A.5.2 Reinforcement Learning

Planning for adequate travel resource allocation is important in smart cities. Based on the data in CityNet, we provide benchmarks for the taxi dispatching task, where operators continually dispatch available taxis to waiting passengers in real time so as to maximize the long-term total revenue of the taxi system.

Problem Statement. Given historical requests from passengers, we learn a real-time taxi dispatching policy. At every timestamp τ, we dispatch available taxis to the current passengers according to the policy so as to maximize the total revenue of all taxis in the long run. Specifically, we divide the city into uniform hexagonal grids instead of square ones, following previous studies [13, 17].

Methods. We study the following methods for taxi dispatching.

• Greedy is a heuristic algorithm that dispatches the closest available taxi to each passengerone by one.


| Date | Completion rate (Greedy) | Completion rate (LLD) | Completion rate (LPA) | Revenue (Greedy) | Revenue (LLD) | Revenue (LPA) |
|---|---|---|---|---|---|---|
| Oct. 1st | 0.689 | 0.734 | 0.718 | 1,017,144 | 1,035,724 | 1,016,410 |
| Oct. 2nd | 0.705 | 0.765 | 0.752 | 969,340 | 984,424 | 975,157 |
| Oct. 3rd | 0.680 | 0.734 | 0.747 | 943,422 | 961,263 | 969,341 |
| Oct. 4th | 0.695 | 0.810 | 0.830 | 965,896 | 1,034,539 | 1,041,978 |
| Oct. 5th | 0.708 | 0.782 | 0.811 | 984,225 | 1,021,727 | 1,032,165 |
| Oct. 6th | 0.735 | 0.832 | 0.825 | 979,356 | 1,028,292 | 1,020,174 |
| Oct. 7th | 0.774 | 0.872 | 0.889 | 932,641 | 977,996 | 980,397 |

Table 9: Results for taxi dispatching in Chengdu

• LLD is an optimization-based approach formulated as Eqn. 11, where a_ij = 1 if taxi j is dispatched to passenger i and 0 otherwise; A(i, j) is the immediate revenue to taxi j after driving passenger i to his/her destination. The definition of the immediate revenue follows [17].

$$
\begin{aligned}
\max_{a_{ij}} \quad & \sum_{i=0}^{m} \sum_{j=0}^{n} A(i, j)\, a_{ij} \\
\text{s.t.} \quad & \sum_{i=0}^{m} a_{ij} = 1, \quad j = 1, 2, 3, \ldots, n, \\
& \sum_{j=0}^{n} a_{ij} = 1, \quad i = 1, 2, 3, \ldots, m.
\end{aligned}
\tag{11}
$$

• LPA [17] is a reinforcement learning based planning approach. We first adopt SARSA [17] to learn the expected long-term revenue of each grid in each period. Then, based on these expected revenues, we dispatch taxis to passengers by the same optimization formulation as Eqn. 11, except that we replace A(i, j) with the scores learned by SARSA. LPA tries to maximize the total revenue of the system in the long run instead of immediate revenues.
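Eqn. 11 is a standard assignment problem: pick one taxi per passenger to maximize total revenue. For tiny instances it can even be solved by brute force, as in this illustrative sketch (the revenue matrix is made up; real systems use an LP or Hungarian-algorithm solver rather than enumeration):

```python
from itertools import permutations

def best_assignment(revenue):
    """Square instance of Eqn. 11 solved by brute force:
    maximize total revenue, one taxi per passenger."""
    n = len(revenue)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):   # perm[i] = taxi assigned to passenger i
        total = sum(revenue[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, best_perm

# revenue[i][j]: revenue if taxi j serves passenger i (illustrative values).
revenue = [[1, 9, 2],
           [7, 3, 4],
           [5, 6, 8]]
print(best_assignment(revenue))  # → (24, (1, 0, 2))
```

Note that greedily giving passenger 0 its best taxi (taxi 1, revenue 9) happens to be part of the optimum here, but in general greedy per-passenger choices can block a better global matching, which is exactly the gap between Greedy and LLD/LPA in Table 9.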

Experimental Settings Based on historical request data, we implement a simulator according to [17, 25] to simulate how the taxi system operates, and evaluate these methods on our simulator.

We show taxi dispatching results in Chengdu in Table 9, where Completion rate denotes the ratio of completed requests among all requests, and Accumulated revenue denotes the total revenue received by all taxis in the whole day. We draw the following conclusions from the experimental results:

• The Greedy algorithm, which does not consider any global optimization targets, performs the worst compared to LLD and LPA. Taking global optimization targets into consideration leads to an improvement of 5%-20% in completion rate and 2%-8% in revenue.

• LPA performs better than LLD most of the time. Since LPA optimizes the expected long-term revenues at each dispatching round, while LLD only focuses on the immediate reward, LPA should compare favorably against LLD.

A.6 Details of Machine Learning Experiments

We provide detailed settings for our machine learning experiments. Note that we only tune these settings to make sure all models converge, rather than to achieve state-of-the-art results.

A.6.1 Taxi Service Prediction

For CNN and LSTM, we train models with learning rate 1e-4 and weight decay 0 using the Adam optimizer for 75 epochs, and choose the model with the lowest validation error for testing. We set the batch size to 64 for CNN, and use variable batch sizes (Beijing 4, Shanghai 8, Shenzhen/Chongqing 16, Xi'an/Chengdu 32) for LSTM. Weather data are used in training the model following the method in [22]. Each result is an average of 5 independent runs.


A.6.2 Inter-city Transfer Learning

For fine-tuning, we tune the model trained on the source city with learning rate 5e-5 and weight decay 0 using the Adam optimizer for 100 epochs, and choose the model with the lowest validation error on the target city for testing. For RegionTrans, the weight for feature consistency is set to 0.01.

A.7 Hardware Configurations and Data-sharing Approaches

All experiments included in this paper are conducted in a Python 3.6 and PyTorch 1.6.0 environment on Tesla V100 GPUs. The hardware is affiliated with the Turing AI Computing Cloud (TACC)^9 maintained by the iSING Lab at HKUST^10.

To empower smart city applications including transportation, healthcare, finance, etc., we are building TACC, an open platform that provides high-performance computing resources, massive smart city data including our CityNet, and state-of-the-art machine learning algorithmic paradigms. Researchers can train their own machine learning models on CityNet on TACC, equipped with GPUs and RDMA, for free. We plan to open the platform to the academic community in Hong Kong by December 2021, and to global researchers by June 2022.

A.8 Author statement and license

We bear all responsibility in case of violation of rights during the use, processing and distribution of these data. This work is licensed under a CC BY 4.0 license.

9 https://turing.ust.hk/tacc.html
10 https://ising.cse.ust.hk/
