13
Research Article Machine Learning Approach to Quantity Management for Long-Term Sustainable Development of Dockless Public Bike: Case of Shenzhen in China Qingfeng Zhou , 1,2 Chun Janice Wong , 1 andXianSu 3 1 Harbin Institute of Technology, Shenzhen, Guangdong 518055, China 2 Shenzhen Key Laboratory of Urban Planning and Decision Making, Shenzhen, Guangdong 518055, China 3 China Resources Land Guangxi Co, Nanning, Guangxi 530000, China CorrespondenceshouldbeaddressedtoChunJaniceWong;[email protected] Received 21 July 2020; Revised 25 October 2020; Accepted 12 November 2020; Published 28 November 2020 AcademicEditor:PetrDolezel Copyright©2020QingfengZhouetal.isisanopenaccessarticledistributedundertheCreativeCommonsAttributionLicense, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. SincethenumberofbicyclesiscriticaltothesustainabledevelopmentofdocklessPBS,thisresearchpracticedtheintroductionofa machinelearningapproachtoquantitymanagementusingOFObikeoperationdatainShenzhen.First,twoclusteringalgorithms wereusedtoidentifythebicyclegatheringarea,andtheavailablebikenumberandcoefficientofavailablebikenumbervariation were analyzed in each bicycle gathering area’s type. Second, five classification algorithms were compared in the accuracy of distinguishingthetypeofbicyclegatheringareasusing25impactfactors.Finally,theapplicationoftheknowledgegainedfrom theexistingdocklessbicycleoperationdatatoguidethenumberplanningandmanagementofpublicbicycleswasexplored.We foundthefollowing.(1)erewere492OFObicyclegatheringareasthatcanbedividedintofourtypes:highinefficient,normal inefficient,highefficient,andnormalefficient.ehighinefficientandnormalinefficientareasgatheredabout110,000bicycles withlowusage.(2)Moretypesofbicyclegatheringareawillaffecttheaccuracyoftheclassificationalgorithm.erandomforest classificationhadthebestperformanceinidentifyingbicyclegatheringareatypesinfiveclassificationalgorithmswithanaccuracy ofmorethan75%.(3)erewereobviousdifferencesinthecharacteristicsof25impactfactorsinfourtypesofbicyclegathering areas.Itisfeasibletousethesefactorstopredictareatypetooptimizethenumberofavailablebicycles,reduceoperatingcosts,and improve utilization efficiency. is work helps operators and government understand the characteristics of dockless PBS and contributes to promoting long-term sustainable development of the system through a machine learning approach. 1.Introduction e public bike system (PBS) also called a bicycle sharing system (BSS), which was born in 1965 in Europe, has been developedforthreegenerations[1].PBSiseconomical,eco- friendly, healthy, more equitable, produces ultralow carbon emissions, and has rapidly emerged in many cities all over the world [2]. Since 2016, a relatively new model of PBS, knownasthefree-floatbikesharingsystem,hasincreasingly gained its popularity. e FFBS is based on the mobile app and GPS which eliminates stations and docks (also called docklessbike).Passengerscaneasilypickupanddropoffa bike anywhere using their cell phones. is system is quite spread nowadays through enterprises as OFO and Mobike since early 2016 in China. Dockless PBS brings new expe- riencesandconveniencesaswellassomeproblems,andan important issue is to consider the number of bicycles available.Ithastwosides:(1)assumingthatthesurrounding roads are suitable for cycling when a large number of docklessbicyclesareconcentratedinanareawithalowcost to use, we can think it is providing “adequate” bike supply, whichcanhelpusfullyunderstandthebikedemandinthis area; (2) in fact, if the number of available bicycles is too large, it will cause a series of waste. Many problems are related to the number of shared bicycles especially for the dockless PBS which is an important issue to be considered. But it is seldom involved in the existing research. e numberofavailablebicyclesisthecoreindicator.Excessive Hindawi Journal of Advanced Transportation Volume 2020, Article ID 8847752, 13 pages https://doi.org/10.1155/2020/8847752

MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

Research ArticleMachine Learning Approach to Quantity Management forLong-Term Sustainable Development of Dockless Public BikeCase of Shenzhen in China

Qingfeng Zhou 12 Chun Janice Wong 1 and Xian Su 3

1Harbin Institute of Technology Shenzhen Guangdong 518055 China2Shenzhen Key Laboratory of Urban Planning and Decision Making Shenzhen Guangdong 518055 China3China Resources Land Guangxi Co Nanning Guangxi 530000 China

Correspondence should be addressed to Chun Janice Wong janicewonghiteducn

Received 21 July 2020 Revised 25 October 2020 Accepted 12 November 2020 Published 28 November 2020

Academic Editor Petr Dolezel

Copyright copy 2020Qingfeng Zhou et al-is is an open access article distributed under theCreative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Since the number of bicycles is critical to the sustainable development of dockless PBS this research practiced the introduction of amachine learning approach to quantity management using OFO bike operation data in Shenzhen First two clustering algorithmswere used to identify the bicycle gathering area and the available bike number and coefficient of available bike number variationwere analyzed in each bicycle gathering arearsquos type Second five classification algorithms were compared in the accuracy ofdistinguishing the type of bicycle gathering areas using 25 impact factors Finally the application of the knowledge gained fromthe existing dockless bicycle operation data to guide the number planning and management of public bicycles was explored Wefound the following (1) -ere were 492 OFO bicycle gathering areas that can be divided into four types high inefficient normalinefficient high efficient and normal efficient -e high inefficient and normal inefficient areas gathered about 110000 bicycleswith low usage (2) More types of bicycle gathering area will affect the accuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicycle gathering area types in five classification algorithms with an accuracyof more than 75 (3) -ere were obvious differences in the characteristics of 25 impact factors in four types of bicycle gatheringareas It is feasible to use these factors to predict area type to optimize the number of available bicycles reduce operating costs andimprove utilization efficiency -is work helps operators and government understand the characteristics of dockless PBS andcontributes to promoting long-term sustainable development of the system through a machine learning approach

1 Introduction

-e public bike system (PBS) also called a bicycle sharingsystem (BSS) which was born in 1965 in Europe has beendeveloped for three generations [1] PBS is economical eco-friendly healthy more equitable produces ultralow carbonemissions and has rapidly emerged in many cities all overthe world [2] Since 2016 a relatively new model of PBSknown as the free-float bike sharing system has increasinglygained its popularity -e FFBS is based on the mobile appand GPS which eliminates stations and docks (also calleddockless bike) Passengers can easily pick up and drop off abike anywhere using their cell phones -is system is quitespread nowadays through enterprises as OFO and Mobike

since early 2016 in China Dockless PBS brings new expe-riences and conveniences as well as some problems and animportant issue is to consider the number of bicyclesavailable It has two sides(1) assuming that the surroundingroads are suitable for cycling when a large number ofdockless bicycles are concentrated in an area with a low costto use we can think it is providing ldquoadequaterdquo bike supplywhich can help us fully understand the bike demand in thisarea (2) in fact if the number of available bicycles is toolarge it will cause a series of waste Many problems arerelated to the number of shared bicycles especially for thedockless PBS which is an important issue to be consideredBut it is seldom involved in the existing research -enumber of available bicycles is the core indicator Excessive

HindawiJournal of Advanced TransportationVolume 2020 Article ID 8847752 13 pageshttpsdoiorg10115520208847752

bicycles can affect the cost and efficiency of operation whichis not conducive to the long-term sustainability of thesystem -e government and scholars have paid more andmore attention to the question of how to rationally developdockless PBS in the city

Computational intelligence such as artificial neuralnetworks fuzzy systems and evolutionary computing hasachieved significant results in modeling learning and searchand optimization problems for smart city applications [3 4]-e characteristics of machine learning make it attractive foranalyzing smart city data with complex nature [5] such asmodes (streams time series images videos and texts) largeamounts (continuous data generated by millions of sensingdevices) space-time dependence etc Researchers in smartcities have applied machine learning in many areas such asurban human mobility [6] public space utilization [7] andpublic bus charging station placement [8] Since the numberof bicycles is critical to the sustainable development ofdockless PBS this research practiced the introduction of amachine learning approach to quantity management Fourissues are discussed from the existing shared bicycle oper-ation data (1) How to identify the gathering area of docklessshared bicycles (2) How to measure the number of bicyclesand activity characteristics in the bicycle gathering area (3)What are the differences between classification algorithms inpredicting the types of bicycle gathering areas (4) How touse activity pattern to guide dockless PBS rationally developin the city In this study first two clustering algorithms wereused to identify the bicycle gathering area and the availablebike number and coefficient of available bike number var-iation were analyzed in each bicycle gathering arearsquos typeSecond five classification algorithms were compared in theaccuracy of distinguishing the type of bicycle gathering areasusing impact factors Finally the application of theknowledge gained from the existing dockless bicycle oper-ation data to guide the number planning andmanagement ofpublic bicycles was explored

-e rest of this paper is organized as follows Section 2presents a literature review on the systems perspective ofpublic bike research Section 3 introduces the indicators andmethods used Section 4 briefly describes bicycle operationdata and influencing variables In Section 5 we discuss thebicycle gathering area typersquos recognition prediction andapplication Finally Section 6 summarizes the results of thisstudy and provides direction in future studies

2 Previous Work

PBS is involved in many areas of research and it is broadlybased on two perspectives user perspective and systemsperspective [9] In this study we only focus on a systemsperspective according to the goals

21 Bike Sharing Rebalance For PBS the lack of resources isthe major issue a user can arrive at a station that has no bikeavailable or wants to return her bike at a station with noempty spot Based on the practical usage several studiesfocused to deal with public bike rebalancing problem using

intelligent algorithms Fricker and Gast [10] proposed astochastic model of homogeneous PBSs to study the effect ofusersrsquo random choices on the number of problematic sta-tions -ey also computed the rate of which bikes must beredistributed by trucks to ensure a given quality of serviceYou et al [11] provided an integrated model to resolve theproblems of fleet sizing empty-resource repositioning andvehicle routing for bike transfer in multiple station systemsOrsquoMahony and Shmoys [12] tackled the problem of reba-lancing PBS during rush hour An optimization problemwhose goal is to plan truck routes to make PBS as balanced aspossible in night shift was studied and novel methods weredeveloped for optimizing rebalancing resources Chen et al[13] addressed the layout planning of public bicycle systemwithin the attracted scope of a metro station Locations ofdifferent PBS service stations and the optimal route optionsfor the implementation of the redistributing strategy wereconsidered Lozano et al [14] proposed a multiagent modelthat provides visualization and prediction tools for PBS

22 Bike Demand Estimation -ese studies examine theinfluence of PBS infrastructure transportation networkinfrastructure land use and urban form meteorologicaldata and temporal characteristics on PBS usage Faghih-Imani et al [15] collected station-level occupancy data andthen transformed station occupancy snapshot data intostation-level customer arrivals and departures -ey devel-oped a mixed linear model to estimate the influence ofbicycle infrastructure sociodemographic characteristicsand land-use characteristics on customer arrivals and de-partures In the work of Krykewycz et al [16] various de-mographic land use and infrastructure factors understoodto be favorable for bike share usage were spatially analyzed todefine a primary market area El-Assi et al [17] investigatedthe effects of weather socioeconomic and demographicfactors and land use and the built environment on bicycleshare ridership A regression analysis was performed onthree different levels Hampshire and Marla [18] employed apanel regression model to explain the factors affecting thebike sharing trip generation and attraction in the presence ofunobserved spatial and temporal variables -e data usedincluded PBSrsquos usage data in Barcelona and Seville ninecensus demographic data and the location of points ofinterest (POIs) Zhang et al [19] employed a multiple linearregression model to examine the influence of built envi-ronment variables on trip demand as well as on the ratio ofdemand to supply at bike stations in China Faghih-Imaniet al [15] investigated factors affecting bicycle share demandat the station level using real-time ridership data -e resultsshowed that stations close to major roads had lower tripactivities compared to stations that were situated aroundminor roads and bicycle lanes A number of land use andbuilt environment variables temporal characteristics andweather variables such as temperature were investigatedMaurer [20] used a pairwise suitability analysis to under-stand the effects of variables such as job density householdincome and alternative commuters on public bicycle shareridership to propose the locations of bicycle stations in

2 Journal of Advanced Transportation

Sacramento California Gebhart and Noland [21] used real-time ridership data for Capital Bikeshare in WashingtonDC to investigate the impact of weather variables andproximity of bike share stations to metro stations on rid-ership levels Buck and Buehler [22] investigated the in-fluence of bicycle infrastructure population density landuse mix around stations and the number of householdswithout a car using bicycle share systems using ridershipdata from Capital Bikeshare Wang et al [23] evaluated theeffect of sociodemographic land use built environment andtransportation infrastructure variables on bicycle shareridership Rixey [24] explored the influence of socio-demographic characteristics such as education income andemployment and population density on monthly ridershipdata from three states of the USA

23 Spatial and Temporal Patterns of Bike Use -ese studiesexplore the spatial and temporal patterns of bike use over thetime of day using data mining and visualization techniquesClustering is frequently used to identify mobility patterns inBSS usage by partitioning the stations into different clustershaving a similar usage Wong and Cheng [25] presented theinsights of imbalanced public bicycle distributions throughthe analysis of spatiotemporal activity patterns of bikestations -e clustering algorithm was used to analyze howstation activity patterns were geographically distributedbased on their usage patterns -ey also explored how theseactivity patterns relate to underlying cultural and spatialcharacteristics of Taipei City in China Temporal and spa-tiotemporal patterns among bike stations of Barcelona bikesharing system were explored by Froehlich et al [26]Numerous research studies also used a hierarchical clus-tering method to generate clusters and investigate usagepatterns geographically distributed in the city to understandthe impact of the inhomogeneity of the city on the long-runactivity of stations [27ndash29] Brien et al [30] proposed aclassification of bike shares based on the geographicalfootprint and diurnal day-of-week and spatial variations inoccupancy rates Etienne and Latifa [31] presented an au-tomatic algorithm based on a new statistical model to au-tomatically cluster PBS stations according to their usageprofile Zhou [32] investigated the spatiotemporal bikingpattern in Chicago by analyzing massive BSS data from Julyto December in 2013 and 2014 Bike flow similarity graphwas constructed with a fast greedy algorithm to detect spatialcommunities of biking flows

Scholars have achieved rich results in measuring theindicators of the bicycle system and the factors affectingcycling -e methods of research mainly involve regressionmodels -e knowledge gained most comes from the dockPBS except a few studies [33ndash35]

3 Methodology

31 Indicators of Dockless PBS -ere are many indicators tomeasure the PBS including the number of bicycle use arrivalrate and departure rate -is study focused on the number ofavailable bicycles and their changes so two indicators were used

311 Average Available Bike Number Unlike the dock PBSthe maximum available bike number is fixed and determinedby the number of docks of station For the dockless PBS themaximum available bike is not subject to parking restric-tions It is related to the initial bike quantity status of de-ployment by the system and varies as the bike flows Weproposed the average available bike number to explore thedockless PBS It represents the number of bicycles availablein a bike service area -is metric is used to measure thebicycle resource -e number of bikes available per hour(Abni) can be calculated by equation (1)

abni 1113936

5d1 abn

id

5 (1)

abn DAY 1113936

24i1Abni

24 (2)

where i represents the ith hour in a day d represents the dthworkday of a week Abni

d is the number of available bicyclesin the ith hour of the dth day Abni is the average number ofavailable bicycles in hour i in work day and abn DAY inequation (2) is the average available vehicle for bike servicearea throughout the day

312 Coefficient of Available Bike Number Variation-e coefficient of variation is used to compare the degree ofdispersion of the two sets of data which can eliminate theinfluence of measurement scale and dimension In thisstudy the coefficient of variation was used to compare thechanges in available bicycles in 24 hours of a day amongbicycle service areas-e calculation formula is shown in thefollowing equation

cv sd abni( 1113857

abn DAYtimes 100 (3)

where sd(abni) is the standard deviation of available bi-cycles number in 24 hours in a service area Obviously cv isaffected by the two statistics of mean and standard deviationof available bike number -is metric is used to measure thevariation in bicycle usage with average available bikenumber

32 Clustering and Classification in MachineLearning Approach

321 Clustering Algorithm Clustering is an unsupervisedlearning algorithm of classifying and organizing members indatasets which are similar [36]

(1) k-Means Clustering Algorithm Given a set of data thek-means algorithm divides the data into k clusters repeatedlyaccording to a distance function -e algorithm operates ona set of d dimensional vectors D xi|i 1 N1113864 1113865 wherexi isin d denotes ith data point -e algorithm is initialized bypicking k points in d as the initial k cluster representatives orldquocentroidsrdquo Techniques for selecting these initial seeds in-clude sampling at random from the dataset setting them as

Journal of Advanced Transportation 3

the solution of clustering a small subset of the data orperturbing the global mean of the data k times -en thealgorithm iterates between two steps till convergence Aboutthe value of k we can choose from reasonable guessed orpredefined number but it is better to know whether k

clusters is better or worse than k minus 1 or k + 1 clusters -emethod of With he Sum of Square (WSS) is often used to getreasonable K value WSS is the sum of the square of thedistance between all points and their nearest centroid point-e calculation is shown in equation (4) pi represents thepoint i and qi represents the nearest centroid point to i if alldata points are relatively close to their respective centersthen the WSS is relatively small If K + 1 clusters do notsignificantly reduce the WSS value of K clusters then theclassification is of little significance

WSS 1113944N

i1d pi q

i1113872 1113873

2 (4)

(2) Mean Shift Clustering Algorithm Mean shift clustering isa general nonparametric cluster finding procedure introducedby Fukunaga and Hostetler [37] and it does not depend onany explicit assumptions on the shape of the point distri-bution the number of clusters or any form of random ini-tialization Mean shift treats the clustering problem bysupposing that all points given represent samples from someunderlying probability density function with regions of highsample density corresponding to the local maxima of thisdistribution To find these local maxima the algorithm worksby allowing the points to attract each other via what might beconsidered a short-ranged ldquogravitationalrdquo force Allowing thepoints to gravitate towards areas of higher density one canshow that they will eventually coalesce at a series of pointsclose to the local maxima of the distribution -ose datapoints that converge to the same local maxima are consideredto bemembers of the same cluster For amathematical detailssee Comaniciu and Meer [38] In the next sections we il-lustrate application of the algorithm to a couple of problemsusing the python package SkLearn which contains a meanshift implementation

322 Classification Algorithm Classification is a kind ofsupervised learning algorithm of training a classifier in agroup of samples that already know the class label so that itcan classify an unknown sample In the field of machinelearning there are hundreds of classifiers to solve real-worldclassification problems [39] and in this research fivecommonly used classification algorithms are selected ran-dom forest classifier K-nearest neighbor classifier logisticregression support vector machine and artificial neuralnetwork -e five algorithms used in this study are based onthe Python platform Scikit-learn package free from httpsscikit-learnorg-e parameters of each algorithm have beenadjusted to ensure the optimal performance of the algo-rithm In the analysis of shared bicycles the accuracy androbustness of these five classification algorithms will becompared

(1) Random Forest Classifier Random forest classifier (RFC)is the most widely used supervised machine learning al-gorithm It is very powerful and usually gives good resultswithout the need to repeatedly adjust the parameters -ebasic unit of random forests is the decision tree A randomforest is a classifier that contains multiple decision trees andthe category of its output is determined by the mode of thecategory of the individual tree output [40] For an inputsampleN trees will haveN classification results-e randomforest integrates all the classification voting results andspecifies the category with the highest number of votes asoutput It has several advantages it enables to handlethousands of input variables without variable deletion andgives estimates of what variables are important in theclassification

(2) K-Nearest Neighbor Classifier K-nearest neighbor(KNN) is a method of measuring the distance betweendifferent feature values for classification Given a training setD and a test object z the test object is a vector composed ofattribute values and an unknown category label -e algo-rithm needs to calculate the distance (or similarity) betweenz and each training object In this way the list of nearestneighbors can be determined-en assign the category withthe dominant number of instances in the nearest neighbor toz -e advantage is that it is easy to understand and goodperformance can be obtained without excessive adjustment-e disadvantage is that the prediction speed is slow and thedataset with many characteristics cannot be processed It isvulnerable to data imbalance And the interpretability of theoutput is not strong

(3) Logistic Regression Logistic regression (LR) is essentiallya linear classifier which refers to the establishment of aregression formula on the classification boundary line basedon the existing data to classify -e calculation cost of thismethod is not high and it is easy to understand and im-plement -e fitted parameters can clearly see the impact ofeach feature on the result And most of the time is used fortraining and classification is fast after training is completedbut it is easy to underfit and the classification accuracy is nothigh -e main reason is that LR is linear fitting but inreality many things do not satisfy linearity

(4) Support Vector Machine Support vector machine (SVM)maps the data to a multidimensional space in the form ofpoints thereby converting the nonlinear separable problemin the original sample space into a linear separable problemin the feature space so that the optimal hyperplane forclassification can be found-en classify the set according tothe hyperplane SVM can make good predictions on dataoutside the training set and has a low generalization errorrate low computational overhead and easy-to-interpretresults but it is too sensitive to parameter adjustments andkernel function parameters

(5) Artificial Neural Network Artificial neural network(ANN) is an information processing system based on imi-tating the structure and function of the brainrsquos neural

4 Journal of Advanced Transportation

network -e ANN algorithm is a set of continuous inputoutput units where each connection is associated with aweight In the learning stage by adjusting the weights of theneural network the correct class label of the sample to belearned can be predicted -e advantages of the ANN al-gorithm are high classification accuracy and strong dis-tributed parallel processing capabilities Artificial neuralnetworks have strong robustness and fault tolerance fordatasets containing a large amount of noisy data but thelearning process cannot be observed and the output resultsare difficult to interpret which will affect the reliability andacceptability of the results It also requires a large number ofparameters such as network topology initial values ofweights and thresholds

4 Study Area

41 OFO Dockless PBS in Shenzhen -is paper focuses onChinarsquos fastest urbanizing city Shenzhen to lay a foun-dation for empirical analysis of the intensity of usage of theOFO dockless bike sharing system It provides a uniquecase study as it is one of the largest bike share programslocated in a metropolis OFO bicycle sharing system waslaunched in Shenzhen in December 2016 with more than220000 bicycles We scanned the working status of thesebicycles every 15 minutes in one week of September 2017-ere are about 576 million bicycle status records in a dayFor a bicycle ID we first judge whether the bicycle is usedby comparing whether its position has changed Ifchanged we saved the time and position of the bicycle-en according to the average travel speed and traveldistance of the bicycle the abnormal bicycle use record isrejected Figure 1 demonstrates the bike service area inShenzhen

Figure 2 shows the trip summary of shared bicycles in 24hours in a workday -ere are two distinct peaks in sharedbicycle use in a workday -e morning peak is between 0800-0900 and the evening peak is between 1800ndash2100 It isreasonable to assume that bikes are used for commuting Inthe morning peak the trip number of bicycles exceeded50000-e trips in evening peak were slightly lower than theearly peak but still more than 40000 During the period of0100ndash0600 bicycle usage is stable and the lowest with about5000 trips per hour At noon period from 1200 to 1500 thebicycle use is about 20000 per hour -e amount of bicycleuse dropped from 40000 to 10000 per hour at the nightperiod which is from 2200 to 2400

42 Influencing Factor of Bike Use In the previous studiesfactors influencing public bike usage are grouped into fourcategories transportation land-usebuild environmentpopulation and meteorological data -e weather variablesare not considered in our study A total 25 factors wereselected including 6 categories of variables populationpoint of interest (POI) road network public transportationdistance and building function -e detailed factors arelisted in Table 1

5 Results and Discussion

51lte Identification of BicycleGatheringArea We used themean shift clustering method to identify the clustering areaof bicycles based on the position of the bicycle at 0900 Inthe choice of bandwidth we considered two bandwidths 300meters and 500 meters because the area identified by thesetwo bandwidths is approximately equal to the grid area sizeof 500mlowast 500m and 1000mlowast 1000m -e minimumnumber of bicycles included in each category is set to 100When the bandwidth is 300 meters a total of 492 bicyclegathering areas are obtained-e 492 bicycle gathering areascontain a total of 140000 bicycles accounting for 636 ofall bicycles When the bandwidth is 500m a total of 270gathering areas are obtained including 140000 bicyclesConsidering that the bicycles contained in the 492 clustersare more compact we finally selected 492 clusters as theanalysis objects

Figure 3 shows the OFO bicycle gathering area identifiedby mean shift clustering In Figure 3 each cluster has acenter point and the buffer analysis was proposed to obtainthe range of the bicycle gathering area-e buffer is a kind ofinfluence range or service scope of the geospatial targetwhich refers to the polygons of a certain width which areautomatically established around the point the line and thesurface entity 300 meters of buffers were established byArcGIS based on all cluster center points thus to calculatethe indicators of dockless PBS and influencing factor ofbicycle gathering area

52 Performance of Five Classification Algorithms Aftercalculating the available bike number and coefficient ofavailable bike number variation of bicycle cluster area thek-means algorithmwas executed to group these areas Figure4 shows the WSS curve and we made WSS values from 2clusters to 19 When k is increased from 2 to 8 WSS de-creases significantlyWhen kgt 8 the improvement ofWSS isvery linear so the cluster centers have similar characteristics-e larger the kmeans the more the classifications of bicyclecluster area which is likely to impact on the accuracy of theclassification algorithm It is necessary to find an optimal kvalue to balance between the accurate cluster and accurateclassification prediction -is research adopts an experi-mental strategy to select k from 3 to 8 and then uses fiveclassification algorithms to compare the prediction accuracy-e bicycle gathering areas in the same cluster are markedwith the same label using k-means clustering Five classi-fication algorithms were compared in the accuracy of dis-tinguishing the type of bicycle gathering areas using 25impact factors -e experimental process is divided into twostages including training and application At the stage oftraining the 492 gathering areas are randomly divided intotwo parts -e first part containing 75 of areas is used fortraining data and the second part as the test data is used toverify the accuracy Figure 5 shows the accuracy of the fiveclassification algorithms in the training set and the test setwhen k takes different values

Journal of Advanced Transportation 5

In the training set the performance differences of thefive algorithms are obvious For different K values the RFCalways maintains the highest accuracy rate which is higherthan 90 -e ANN also has a high accuracy rate When Kis 3-4 the accuracy rate is above 90 and when the k valueis 5ndash8 the accuracy rate drops to above 80 KNN algo-rithm performance is in the middle of the five algorithmsWhen the K value is 3-4 the accuracy rate is above 70and when the K value is 5ndash8 the accuracy rate drops toabove 60 As the k value increases the accuracy of SVMdrops from 63 to 48-e worst-performing algorithm isthe LR algorithm As the k value increases the accuracy ratedrops from 58 to 37 In addition in the trend that theaccuracy rate changes with the value of k the accuracy rateof the RFC algorithm fluctuates little and the other al-gorithms have the highest accuracy rate when the value of Kis small As the value of k increases the accuracy ratedecreases and When k= 8 with ANN the accuracy rate hasincreased In the test set the accuracy of the five algorithmsis lower than that of the training set -e most accurate

performance is still the RFC algorithm When k = 4 RFChas the highest accuracy rate of 7697 which is the onlycase in the test set where the accuracy rate exceeds 75When k = 3 its accuracy rate is 73 For ANN the highestaccuracy rate is 71 when k = 4 -e accuracy of KNN inthe training set is better than that of SVM and LR but itsperformance in the test set is not much different from SVMand LR -e accuracy of these three algorithms is low Inaddition RFC ANN and LR all have the highest accuracywhen k = 4

Comprehensive comparison of the performance of thefive algorithms in the training set and the test set shows thatRFC and ANN have better performance in the prediction ofthe type of bicycle clusters -e accuracy of ANN SVMand LR in the training set is quite different but the dif-ference in the test set is not obvious Basically when the kvalue is greater than 4 as the K value increases the accuracyof the five classification models has a downward trendWhen K 4 RFC and ANN have the highest test accuracyWe choose k 4 which means that the bicycle aggregationis divided into 4 types for further analysis

53 lte Analysis of Bicycle Gathering Area

531 lte Clustering of Bicycle Gathering Area Table 2shows the description of the cluster center when k 4-e table mainly lists four indicators which are thestandard and original values of Abn_DAY and the stan-dard and original values of the coefficient of variation (in k-means clustering standard values were used)-e fourcluster centers have obvious characteristics We first divide

0100002000030000400005000060000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Trips

Figure 2 -e number of sharing bicycle trips in 24 hours in aworkday

Baoan

Longgang

Dapeng

Nanshan

Longhua Pingshan

Guangming

Luohu

Futian

Yantian

Bike service area

Built area

Shenzhen

0 10 20 30 405Kilometers

N

Figure 1 OFO bike service area in Shenzhen

6 Journal of Advanced Transportation

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 2: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

bicycles can affect the cost and efficiency of operation whichis not conducive to the long-term sustainability of thesystem -e government and scholars have paid more andmore attention to the question of how to rationally developdockless PBS in the city

Computational intelligence such as artificial neuralnetworks fuzzy systems and evolutionary computing hasachieved significant results in modeling learning and searchand optimization problems for smart city applications [3 4]-e characteristics of machine learning make it attractive foranalyzing smart city data with complex nature [5] such asmodes (streams time series images videos and texts) largeamounts (continuous data generated by millions of sensingdevices) space-time dependence etc Researchers in smartcities have applied machine learning in many areas such asurban human mobility [6] public space utilization [7] andpublic bus charging station placement [8] Since the numberof bicycles is critical to the sustainable development ofdockless PBS this research practiced the introduction of amachine learning approach to quantity management Fourissues are discussed from the existing shared bicycle oper-ation data (1) How to identify the gathering area of docklessshared bicycles (2) How to measure the number of bicyclesand activity characteristics in the bicycle gathering area (3)What are the differences between classification algorithms inpredicting the types of bicycle gathering areas (4) How touse activity pattern to guide dockless PBS rationally developin the city In this study first two clustering algorithms wereused to identify the bicycle gathering area and the availablebike number and coefficient of available bike number var-iation were analyzed in each bicycle gathering arearsquos typeSecond five classification algorithms were compared in theaccuracy of distinguishing the type of bicycle gathering areasusing impact factors Finally the application of theknowledge gained from the existing dockless bicycle oper-ation data to guide the number planning andmanagement ofpublic bicycles was explored

-e rest of this paper is organized as follows Section 2presents a literature review on the systems perspective ofpublic bike research Section 3 introduces the indicators andmethods used Section 4 briefly describes bicycle operationdata and influencing variables In Section 5 we discuss thebicycle gathering area typersquos recognition prediction andapplication Finally Section 6 summarizes the results of thisstudy and provides direction in future studies

2 Previous Work

PBS is involved in many areas of research and it is broadlybased on two perspectives user perspective and systemsperspective [9] In this study we only focus on a systemsperspective according to the goals

21 Bike Sharing Rebalance For PBS the lack of resources isthe major issue a user can arrive at a station that has no bikeavailable or wants to return her bike at a station with noempty spot Based on the practical usage several studiesfocused to deal with public bike rebalancing problem using

intelligent algorithms Fricker and Gast [10] proposed astochastic model of homogeneous PBSs to study the effect ofusersrsquo random choices on the number of problematic sta-tions -ey also computed the rate of which bikes must beredistributed by trucks to ensure a given quality of serviceYou et al [11] provided an integrated model to resolve theproblems of fleet sizing empty-resource repositioning andvehicle routing for bike transfer in multiple station systemsOrsquoMahony and Shmoys [12] tackled the problem of reba-lancing PBS during rush hour An optimization problemwhose goal is to plan truck routes to make PBS as balanced aspossible in night shift was studied and novel methods weredeveloped for optimizing rebalancing resources Chen et al[13] addressed the layout planning of public bicycle systemwithin the attracted scope of a metro station Locations ofdifferent PBS service stations and the optimal route optionsfor the implementation of the redistributing strategy wereconsidered Lozano et al [14] proposed a multiagent modelthat provides visualization and prediction tools for PBS

22 Bike Demand Estimation -ese studies examine theinfluence of PBS infrastructure transportation networkinfrastructure land use and urban form meteorologicaldata and temporal characteristics on PBS usage Faghih-Imani et al [15] collected station-level occupancy data andthen transformed station occupancy snapshot data intostation-level customer arrivals and departures -ey devel-oped a mixed linear model to estimate the influence ofbicycle infrastructure sociodemographic characteristicsand land-use characteristics on customer arrivals and de-partures In the work of Krykewycz et al [16] various de-mographic land use and infrastructure factors understoodto be favorable for bike share usage were spatially analyzed todefine a primary market area El-Assi et al [17] investigatedthe effects of weather socioeconomic and demographicfactors and land use and the built environment on bicycleshare ridership A regression analysis was performed onthree different levels Hampshire and Marla [18] employed apanel regression model to explain the factors affecting thebike sharing trip generation and attraction in the presence ofunobserved spatial and temporal variables -e data usedincluded PBSrsquos usage data in Barcelona and Seville ninecensus demographic data and the location of points ofinterest (POIs) Zhang et al [19] employed a multiple linearregression model to examine the influence of built envi-ronment variables on trip demand as well as on the ratio ofdemand to supply at bike stations in China Faghih-Imaniet al [15] investigated factors affecting bicycle share demandat the station level using real-time ridership data -e resultsshowed that stations close to major roads had lower tripactivities compared to stations that were situated aroundminor roads and bicycle lanes A number of land use andbuilt environment variables temporal characteristics andweather variables such as temperature were investigatedMaurer [20] used a pairwise suitability analysis to under-stand the effects of variables such as job density householdincome and alternative commuters on public bicycle shareridership to propose the locations of bicycle stations in

2 Journal of Advanced Transportation

Sacramento California Gebhart and Noland [21] used real-time ridership data for Capital Bikeshare in WashingtonDC to investigate the impact of weather variables andproximity of bike share stations to metro stations on rid-ership levels Buck and Buehler [22] investigated the in-fluence of bicycle infrastructure population density landuse mix around stations and the number of householdswithout a car using bicycle share systems using ridershipdata from Capital Bikeshare Wang et al [23] evaluated theeffect of sociodemographic land use built environment andtransportation infrastructure variables on bicycle shareridership Rixey [24] explored the influence of socio-demographic characteristics such as education income andemployment and population density on monthly ridershipdata from three states of the USA

23 Spatial and Temporal Patterns of Bike Use -ese studiesexplore the spatial and temporal patterns of bike use over thetime of day using data mining and visualization techniquesClustering is frequently used to identify mobility patterns inBSS usage by partitioning the stations into different clustershaving a similar usage Wong and Cheng [25] presented theinsights of imbalanced public bicycle distributions throughthe analysis of spatiotemporal activity patterns of bikestations -e clustering algorithm was used to analyze howstation activity patterns were geographically distributedbased on their usage patterns -ey also explored how theseactivity patterns relate to underlying cultural and spatialcharacteristics of Taipei City in China Temporal and spa-tiotemporal patterns among bike stations of Barcelona bikesharing system were explored by Froehlich et al [26]Numerous research studies also used a hierarchical clus-tering method to generate clusters and investigate usagepatterns geographically distributed in the city to understandthe impact of the inhomogeneity of the city on the long-runactivity of stations [27ndash29] Brien et al [30] proposed aclassification of bike shares based on the geographicalfootprint and diurnal day-of-week and spatial variations inoccupancy rates Etienne and Latifa [31] presented an au-tomatic algorithm based on a new statistical model to au-tomatically cluster PBS stations according to their usageprofile Zhou [32] investigated the spatiotemporal bikingpattern in Chicago by analyzing massive BSS data from Julyto December in 2013 and 2014 Bike flow similarity graphwas constructed with a fast greedy algorithm to detect spatialcommunities of biking flows

Scholars have achieved rich results in measuring theindicators of the bicycle system and the factors affectingcycling -e methods of research mainly involve regressionmodels -e knowledge gained most comes from the dockPBS except a few studies [33ndash35]

3 Methodology

31 Indicators of Dockless PBS -ere are many indicators tomeasure the PBS including the number of bicycle use arrivalrate and departure rate -is study focused on the number ofavailable bicycles and their changes so two indicators were used

311 Average Available Bike Number Unlike the dock PBSthe maximum available bike number is fixed and determinedby the number of docks of station For the dockless PBS themaximum available bike is not subject to parking restric-tions It is related to the initial bike quantity status of de-ployment by the system and varies as the bike flows Weproposed the average available bike number to explore thedockless PBS It represents the number of bicycles availablein a bike service area -is metric is used to measure thebicycle resource -e number of bikes available per hour(Abni) can be calculated by equation (1)

abni 1113936

5d1 abn

id

5 (1)

abn DAY 1113936

24i1Abni

24 (2)

where i represents the ith hour in a day d represents the dthworkday of a week Abni

d is the number of available bicyclesin the ith hour of the dth day Abni is the average number ofavailable bicycles in hour i in work day and abn DAY inequation (2) is the average available vehicle for bike servicearea throughout the day

312 Coefficient of Available Bike Number Variation-e coefficient of variation is used to compare the degree ofdispersion of the two sets of data which can eliminate theinfluence of measurement scale and dimension In thisstudy the coefficient of variation was used to compare thechanges in available bicycles in 24 hours of a day amongbicycle service areas-e calculation formula is shown in thefollowing equation

cv sd abni( 1113857

abn DAYtimes 100 (3)

where sd(abni) is the standard deviation of available bi-cycles number in 24 hours in a service area Obviously cv isaffected by the two statistics of mean and standard deviationof available bike number -is metric is used to measure thevariation in bicycle usage with average available bikenumber

32 Clustering and Classification in MachineLearning Approach

321 Clustering Algorithm Clustering is an unsupervisedlearning algorithm of classifying and organizing members indatasets which are similar [36]

(1) k-Means Clustering Algorithm Given a set of data thek-means algorithm divides the data into k clusters repeatedlyaccording to a distance function -e algorithm operates ona set of d dimensional vectors D xi|i 1 N1113864 1113865 wherexi isin d denotes ith data point -e algorithm is initialized bypicking k points in d as the initial k cluster representatives orldquocentroidsrdquo Techniques for selecting these initial seeds in-clude sampling at random from the dataset setting them as

Journal of Advanced Transportation 3

the solution of clustering a small subset of the data orperturbing the global mean of the data k times -en thealgorithm iterates between two steps till convergence Aboutthe value of k we can choose from reasonable guessed orpredefined number but it is better to know whether k

clusters is better or worse than k minus 1 or k + 1 clusters -emethod of With he Sum of Square (WSS) is often used to getreasonable K value WSS is the sum of the square of thedistance between all points and their nearest centroid point-e calculation is shown in equation (4) pi represents thepoint i and qi represents the nearest centroid point to i if alldata points are relatively close to their respective centersthen the WSS is relatively small If K + 1 clusters do notsignificantly reduce the WSS value of K clusters then theclassification is of little significance

WSS 1113944N

i1d pi q

i1113872 1113873

2 (4)

(2) Mean Shift Clustering Algorithm Mean shift clustering isa general nonparametric cluster finding procedure introducedby Fukunaga and Hostetler [37] and it does not depend onany explicit assumptions on the shape of the point distri-bution the number of clusters or any form of random ini-tialization Mean shift treats the clustering problem bysupposing that all points given represent samples from someunderlying probability density function with regions of highsample density corresponding to the local maxima of thisdistribution To find these local maxima the algorithm worksby allowing the points to attract each other via what might beconsidered a short-ranged ldquogravitationalrdquo force Allowing thepoints to gravitate towards areas of higher density one canshow that they will eventually coalesce at a series of pointsclose to the local maxima of the distribution -ose datapoints that converge to the same local maxima are consideredto bemembers of the same cluster For amathematical detailssee Comaniciu and Meer [38] In the next sections we il-lustrate application of the algorithm to a couple of problemsusing the python package SkLearn which contains a meanshift implementation

322 Classification Algorithm Classification is a kind ofsupervised learning algorithm of training a classifier in agroup of samples that already know the class label so that itcan classify an unknown sample In the field of machinelearning there are hundreds of classifiers to solve real-worldclassification problems [39] and in this research fivecommonly used classification algorithms are selected ran-dom forest classifier K-nearest neighbor classifier logisticregression support vector machine and artificial neuralnetwork -e five algorithms used in this study are based onthe Python platform Scikit-learn package free from httpsscikit-learnorg-e parameters of each algorithm have beenadjusted to ensure the optimal performance of the algo-rithm In the analysis of shared bicycles the accuracy androbustness of these five classification algorithms will becompared

(1) Random Forest Classifier Random forest classifier (RFC)is the most widely used supervised machine learning al-gorithm It is very powerful and usually gives good resultswithout the need to repeatedly adjust the parameters -ebasic unit of random forests is the decision tree A randomforest is a classifier that contains multiple decision trees andthe category of its output is determined by the mode of thecategory of the individual tree output [40] For an inputsampleN trees will haveN classification results-e randomforest integrates all the classification voting results andspecifies the category with the highest number of votes asoutput It has several advantages it enables to handlethousands of input variables without variable deletion andgives estimates of what variables are important in theclassification

(2) K-Nearest Neighbor Classifier K-nearest neighbor(KNN) is a method of measuring the distance betweendifferent feature values for classification Given a training setD and a test object z the test object is a vector composed ofattribute values and an unknown category label -e algo-rithm needs to calculate the distance (or similarity) betweenz and each training object In this way the list of nearestneighbors can be determined-en assign the category withthe dominant number of instances in the nearest neighbor toz -e advantage is that it is easy to understand and goodperformance can be obtained without excessive adjustment-e disadvantage is that the prediction speed is slow and thedataset with many characteristics cannot be processed It isvulnerable to data imbalance And the interpretability of theoutput is not strong

(3) Logistic Regression Logistic regression (LR) is essentiallya linear classifier which refers to the establishment of aregression formula on the classification boundary line basedon the existing data to classify -e calculation cost of thismethod is not high and it is easy to understand and im-plement -e fitted parameters can clearly see the impact ofeach feature on the result And most of the time is used fortraining and classification is fast after training is completedbut it is easy to underfit and the classification accuracy is nothigh -e main reason is that LR is linear fitting but inreality many things do not satisfy linearity

(4) Support Vector Machine Support vector machine (SVM)maps the data to a multidimensional space in the form ofpoints thereby converting the nonlinear separable problemin the original sample space into a linear separable problemin the feature space so that the optimal hyperplane forclassification can be found-en classify the set according tothe hyperplane SVM can make good predictions on dataoutside the training set and has a low generalization errorrate low computational overhead and easy-to-interpretresults but it is too sensitive to parameter adjustments andkernel function parameters

(5) Artificial Neural Network Artificial neural network(ANN) is an information processing system based on imi-tating the structure and function of the brainrsquos neural

4 Journal of Advanced Transportation

network -e ANN algorithm is a set of continuous inputoutput units where each connection is associated with aweight In the learning stage by adjusting the weights of theneural network the correct class label of the sample to belearned can be predicted -e advantages of the ANN al-gorithm are high classification accuracy and strong dis-tributed parallel processing capabilities Artificial neuralnetworks have strong robustness and fault tolerance fordatasets containing a large amount of noisy data but thelearning process cannot be observed and the output resultsare difficult to interpret which will affect the reliability andacceptability of the results It also requires a large number ofparameters such as network topology initial values ofweights and thresholds

4 Study Area

41 OFO Dockless PBS in Shenzhen -is paper focuses onChinarsquos fastest urbanizing city Shenzhen to lay a foun-dation for empirical analysis of the intensity of usage of theOFO dockless bike sharing system It provides a uniquecase study as it is one of the largest bike share programslocated in a metropolis OFO bicycle sharing system waslaunched in Shenzhen in December 2016 with more than220000 bicycles We scanned the working status of thesebicycles every 15 minutes in one week of September 2017-ere are about 576 million bicycle status records in a dayFor a bicycle ID we first judge whether the bicycle is usedby comparing whether its position has changed Ifchanged we saved the time and position of the bicycle-en according to the average travel speed and traveldistance of the bicycle the abnormal bicycle use record isrejected Figure 1 demonstrates the bike service area inShenzhen

Figure 2 shows the trip summary of shared bicycles in 24hours in a workday -ere are two distinct peaks in sharedbicycle use in a workday -e morning peak is between 0800-0900 and the evening peak is between 1800ndash2100 It isreasonable to assume that bikes are used for commuting Inthe morning peak the trip number of bicycles exceeded50000-e trips in evening peak were slightly lower than theearly peak but still more than 40000 During the period of0100ndash0600 bicycle usage is stable and the lowest with about5000 trips per hour At noon period from 1200 to 1500 thebicycle use is about 20000 per hour -e amount of bicycleuse dropped from 40000 to 10000 per hour at the nightperiod which is from 2200 to 2400

42 Influencing Factor of Bike Use In the previous studiesfactors influencing public bike usage are grouped into fourcategories transportation land-usebuild environmentpopulation and meteorological data -e weather variablesare not considered in our study A total 25 factors wereselected including 6 categories of variables populationpoint of interest (POI) road network public transportationdistance and building function -e detailed factors arelisted in Table 1

5 Results and Discussion

51lte Identification of BicycleGatheringArea We used themean shift clustering method to identify the clustering areaof bicycles based on the position of the bicycle at 0900 Inthe choice of bandwidth we considered two bandwidths 300meters and 500 meters because the area identified by thesetwo bandwidths is approximately equal to the grid area sizeof 500mlowast 500m and 1000mlowast 1000m -e minimumnumber of bicycles included in each category is set to 100When the bandwidth is 300 meters a total of 492 bicyclegathering areas are obtained-e 492 bicycle gathering areascontain a total of 140000 bicycles accounting for 636 ofall bicycles When the bandwidth is 500m a total of 270gathering areas are obtained including 140000 bicyclesConsidering that the bicycles contained in the 492 clustersare more compact we finally selected 492 clusters as theanalysis objects

Figure 3 shows the OFO bicycle gathering area identifiedby mean shift clustering In Figure 3 each cluster has acenter point and the buffer analysis was proposed to obtainthe range of the bicycle gathering area-e buffer is a kind ofinfluence range or service scope of the geospatial targetwhich refers to the polygons of a certain width which areautomatically established around the point the line and thesurface entity 300 meters of buffers were established byArcGIS based on all cluster center points thus to calculatethe indicators of dockless PBS and influencing factor ofbicycle gathering area

52 Performance of Five Classification Algorithms Aftercalculating the available bike number and coefficient ofavailable bike number variation of bicycle cluster area thek-means algorithmwas executed to group these areas Figure4 shows the WSS curve and we made WSS values from 2clusters to 19 When k is increased from 2 to 8 WSS de-creases significantlyWhen kgt 8 the improvement ofWSS isvery linear so the cluster centers have similar characteristics-e larger the kmeans the more the classifications of bicyclecluster area which is likely to impact on the accuracy of theclassification algorithm It is necessary to find an optimal kvalue to balance between the accurate cluster and accurateclassification prediction -is research adopts an experi-mental strategy to select k from 3 to 8 and then uses fiveclassification algorithms to compare the prediction accuracy-e bicycle gathering areas in the same cluster are markedwith the same label using k-means clustering Five classi-fication algorithms were compared in the accuracy of dis-tinguishing the type of bicycle gathering areas using 25impact factors -e experimental process is divided into twostages including training and application At the stage oftraining the 492 gathering areas are randomly divided intotwo parts -e first part containing 75 of areas is used fortraining data and the second part as the test data is used toverify the accuracy Figure 5 shows the accuracy of the fiveclassification algorithms in the training set and the test setwhen k takes different values

Journal of Advanced Transportation 5

In the training set the performance differences of thefive algorithms are obvious For different K values the RFCalways maintains the highest accuracy rate which is higherthan 90 -e ANN also has a high accuracy rate When Kis 3-4 the accuracy rate is above 90 and when the k valueis 5ndash8 the accuracy rate drops to above 80 KNN algo-rithm performance is in the middle of the five algorithmsWhen the K value is 3-4 the accuracy rate is above 70and when the K value is 5ndash8 the accuracy rate drops toabove 60 As the k value increases the accuracy of SVMdrops from 63 to 48-e worst-performing algorithm isthe LR algorithm As the k value increases the accuracy ratedrops from 58 to 37 In addition in the trend that theaccuracy rate changes with the value of k the accuracy rateof the RFC algorithm fluctuates little and the other al-gorithms have the highest accuracy rate when the value of Kis small As the value of k increases the accuracy ratedecreases and When k= 8 with ANN the accuracy rate hasincreased In the test set the accuracy of the five algorithmsis lower than that of the training set -e most accurate

performance is still the RFC algorithm When k = 4 RFChas the highest accuracy rate of 7697 which is the onlycase in the test set where the accuracy rate exceeds 75When k = 3 its accuracy rate is 73 For ANN the highestaccuracy rate is 71 when k = 4 -e accuracy of KNN inthe training set is better than that of SVM and LR but itsperformance in the test set is not much different from SVMand LR -e accuracy of these three algorithms is low Inaddition RFC ANN and LR all have the highest accuracywhen k = 4

Comprehensive comparison of the performance of thefive algorithms in the training set and the test set shows thatRFC and ANN have better performance in the prediction ofthe type of bicycle clusters -e accuracy of ANN SVMand LR in the training set is quite different but the dif-ference in the test set is not obvious Basically when the kvalue is greater than 4 as the K value increases the accuracyof the five classification models has a downward trendWhen K 4 RFC and ANN have the highest test accuracyWe choose k 4 which means that the bicycle aggregationis divided into 4 types for further analysis

53 lte Analysis of Bicycle Gathering Area

531 lte Clustering of Bicycle Gathering Area Table 2shows the description of the cluster center when k 4-e table mainly lists four indicators which are thestandard and original values of Abn_DAY and the stan-dard and original values of the coefficient of variation (in k-means clustering standard values were used)-e fourcluster centers have obvious characteristics We first divide

0100002000030000400005000060000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Trips

Figure 2 -e number of sharing bicycle trips in 24 hours in aworkday

Baoan

Longgang

Dapeng

Nanshan

Longhua Pingshan

Guangming

Luohu

Futian

Yantian

Bike service area

Built area

Shenzhen

0 10 20 30 405Kilometers

N

Figure 1 OFO bike service area in Shenzhen

6 Journal of Advanced Transportation

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 3: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

Sacramento California Gebhart and Noland [21] used real-time ridership data for Capital Bikeshare in WashingtonDC to investigate the impact of weather variables andproximity of bike share stations to metro stations on rid-ership levels Buck and Buehler [22] investigated the in-fluence of bicycle infrastructure population density landuse mix around stations and the number of householdswithout a car using bicycle share systems using ridershipdata from Capital Bikeshare Wang et al [23] evaluated theeffect of sociodemographic land use built environment andtransportation infrastructure variables on bicycle shareridership Rixey [24] explored the influence of socio-demographic characteristics such as education income andemployment and population density on monthly ridershipdata from three states of the USA

23 Spatial and Temporal Patterns of Bike Use -ese studiesexplore the spatial and temporal patterns of bike use over thetime of day using data mining and visualization techniquesClustering is frequently used to identify mobility patterns inBSS usage by partitioning the stations into different clustershaving a similar usage Wong and Cheng [25] presented theinsights of imbalanced public bicycle distributions throughthe analysis of spatiotemporal activity patterns of bikestations -e clustering algorithm was used to analyze howstation activity patterns were geographically distributedbased on their usage patterns -ey also explored how theseactivity patterns relate to underlying cultural and spatialcharacteristics of Taipei City in China Temporal and spa-tiotemporal patterns among bike stations of Barcelona bikesharing system were explored by Froehlich et al [26]Numerous research studies also used a hierarchical clus-tering method to generate clusters and investigate usagepatterns geographically distributed in the city to understandthe impact of the inhomogeneity of the city on the long-runactivity of stations [27ndash29] Brien et al [30] proposed aclassification of bike shares based on the geographicalfootprint and diurnal day-of-week and spatial variations inoccupancy rates Etienne and Latifa [31] presented an au-tomatic algorithm based on a new statistical model to au-tomatically cluster PBS stations according to their usageprofile Zhou [32] investigated the spatiotemporal bikingpattern in Chicago by analyzing massive BSS data from Julyto December in 2013 and 2014 Bike flow similarity graphwas constructed with a fast greedy algorithm to detect spatialcommunities of biking flows

Scholars have achieved rich results in measuring theindicators of the bicycle system and the factors affectingcycling -e methods of research mainly involve regressionmodels -e knowledge gained most comes from the dockPBS except a few studies [33ndash35]

3 Methodology

31 Indicators of Dockless PBS -ere are many indicators tomeasure the PBS including the number of bicycle use arrivalrate and departure rate -is study focused on the number ofavailable bicycles and their changes so two indicators were used

311 Average Available Bike Number Unlike the dock PBSthe maximum available bike number is fixed and determinedby the number of docks of station For the dockless PBS themaximum available bike is not subject to parking restric-tions It is related to the initial bike quantity status of de-ployment by the system and varies as the bike flows Weproposed the average available bike number to explore thedockless PBS It represents the number of bicycles availablein a bike service area -is metric is used to measure thebicycle resource -e number of bikes available per hour(Abni) can be calculated by equation (1)

abni 1113936

5d1 abn

id

5 (1)

abn DAY 1113936

24i1Abni

24 (2)

where i represents the ith hour in a day d represents the dthworkday of a week Abni

d is the number of available bicyclesin the ith hour of the dth day Abni is the average number ofavailable bicycles in hour i in work day and abn DAY inequation (2) is the average available vehicle for bike servicearea throughout the day

312 Coefficient of Available Bike Number Variation-e coefficient of variation is used to compare the degree ofdispersion of the two sets of data which can eliminate theinfluence of measurement scale and dimension In thisstudy the coefficient of variation was used to compare thechanges in available bicycles in 24 hours of a day amongbicycle service areas-e calculation formula is shown in thefollowing equation

cv sd abni( 1113857

abn DAYtimes 100 (3)

where sd(abni) is the standard deviation of available bi-cycles number in 24 hours in a service area Obviously cv isaffected by the two statistics of mean and standard deviationof available bike number -is metric is used to measure thevariation in bicycle usage with average available bikenumber

32 Clustering and Classification in MachineLearning Approach

321 Clustering Algorithm Clustering is an unsupervisedlearning algorithm of classifying and organizing members indatasets which are similar [36]

(1) k-Means Clustering Algorithm Given a set of data thek-means algorithm divides the data into k clusters repeatedlyaccording to a distance function -e algorithm operates ona set of d dimensional vectors D xi|i 1 N1113864 1113865 wherexi isin d denotes ith data point -e algorithm is initialized bypicking k points in d as the initial k cluster representatives orldquocentroidsrdquo Techniques for selecting these initial seeds in-clude sampling at random from the dataset setting them as

Journal of Advanced Transportation 3

the solution of clustering a small subset of the data orperturbing the global mean of the data k times -en thealgorithm iterates between two steps till convergence Aboutthe value of k we can choose from reasonable guessed orpredefined number but it is better to know whether k

clusters is better or worse than k minus 1 or k + 1 clusters -emethod of With he Sum of Square (WSS) is often used to getreasonable K value WSS is the sum of the square of thedistance between all points and their nearest centroid point-e calculation is shown in equation (4) pi represents thepoint i and qi represents the nearest centroid point to i if alldata points are relatively close to their respective centersthen the WSS is relatively small If K + 1 clusters do notsignificantly reduce the WSS value of K clusters then theclassification is of little significance

WSS 1113944N

i1d pi q

i1113872 1113873

2 (4)

(2) Mean Shift Clustering Algorithm Mean shift clustering isa general nonparametric cluster finding procedure introducedby Fukunaga and Hostetler [37] and it does not depend onany explicit assumptions on the shape of the point distri-bution the number of clusters or any form of random ini-tialization Mean shift treats the clustering problem bysupposing that all points given represent samples from someunderlying probability density function with regions of highsample density corresponding to the local maxima of thisdistribution To find these local maxima the algorithm worksby allowing the points to attract each other via what might beconsidered a short-ranged ldquogravitationalrdquo force Allowing thepoints to gravitate towards areas of higher density one canshow that they will eventually coalesce at a series of pointsclose to the local maxima of the distribution -ose datapoints that converge to the same local maxima are consideredto bemembers of the same cluster For amathematical detailssee Comaniciu and Meer [38] In the next sections we il-lustrate application of the algorithm to a couple of problemsusing the python package SkLearn which contains a meanshift implementation

322 Classification Algorithm Classification is a kind ofsupervised learning algorithm of training a classifier in agroup of samples that already know the class label so that itcan classify an unknown sample In the field of machinelearning there are hundreds of classifiers to solve real-worldclassification problems [39] and in this research fivecommonly used classification algorithms are selected ran-dom forest classifier K-nearest neighbor classifier logisticregression support vector machine and artificial neuralnetwork -e five algorithms used in this study are based onthe Python platform Scikit-learn package free from httpsscikit-learnorg-e parameters of each algorithm have beenadjusted to ensure the optimal performance of the algo-rithm In the analysis of shared bicycles the accuracy androbustness of these five classification algorithms will becompared

(1) Random Forest Classifier Random forest classifier (RFC)is the most widely used supervised machine learning al-gorithm It is very powerful and usually gives good resultswithout the need to repeatedly adjust the parameters -ebasic unit of random forests is the decision tree A randomforest is a classifier that contains multiple decision trees andthe category of its output is determined by the mode of thecategory of the individual tree output [40] For an inputsampleN trees will haveN classification results-e randomforest integrates all the classification voting results andspecifies the category with the highest number of votes asoutput It has several advantages it enables to handlethousands of input variables without variable deletion andgives estimates of what variables are important in theclassification

(2) K-Nearest Neighbor Classifier K-nearest neighbor(KNN) is a method of measuring the distance betweendifferent feature values for classification Given a training setD and a test object z the test object is a vector composed ofattribute values and an unknown category label -e algo-rithm needs to calculate the distance (or similarity) betweenz and each training object In this way the list of nearestneighbors can be determined-en assign the category withthe dominant number of instances in the nearest neighbor toz -e advantage is that it is easy to understand and goodperformance can be obtained without excessive adjustment-e disadvantage is that the prediction speed is slow and thedataset with many characteristics cannot be processed It isvulnerable to data imbalance And the interpretability of theoutput is not strong

(3) Logistic Regression Logistic regression (LR) is essentiallya linear classifier which refers to the establishment of aregression formula on the classification boundary line basedon the existing data to classify -e calculation cost of thismethod is not high and it is easy to understand and im-plement -e fitted parameters can clearly see the impact ofeach feature on the result And most of the time is used fortraining and classification is fast after training is completedbut it is easy to underfit and the classification accuracy is nothigh -e main reason is that LR is linear fitting but inreality many things do not satisfy linearity

(4) Support Vector Machine Support vector machine (SVM)maps the data to a multidimensional space in the form ofpoints thereby converting the nonlinear separable problemin the original sample space into a linear separable problemin the feature space so that the optimal hyperplane forclassification can be found-en classify the set according tothe hyperplane SVM can make good predictions on dataoutside the training set and has a low generalization errorrate low computational overhead and easy-to-interpretresults but it is too sensitive to parameter adjustments andkernel function parameters

(5) Artificial Neural Network Artificial neural network(ANN) is an information processing system based on imi-tating the structure and function of the brainrsquos neural

4 Journal of Advanced Transportation

network -e ANN algorithm is a set of continuous inputoutput units where each connection is associated with aweight In the learning stage by adjusting the weights of theneural network the correct class label of the sample to belearned can be predicted -e advantages of the ANN al-gorithm are high classification accuracy and strong dis-tributed parallel processing capabilities Artificial neuralnetworks have strong robustness and fault tolerance fordatasets containing a large amount of noisy data but thelearning process cannot be observed and the output resultsare difficult to interpret which will affect the reliability andacceptability of the results It also requires a large number ofparameters such as network topology initial values ofweights and thresholds

4 Study Area

41 OFO Dockless PBS in Shenzhen -is paper focuses onChinarsquos fastest urbanizing city Shenzhen to lay a foun-dation for empirical analysis of the intensity of usage of theOFO dockless bike sharing system It provides a uniquecase study as it is one of the largest bike share programslocated in a metropolis OFO bicycle sharing system waslaunched in Shenzhen in December 2016 with more than220000 bicycles We scanned the working status of thesebicycles every 15 minutes in one week of September 2017-ere are about 576 million bicycle status records in a dayFor a bicycle ID we first judge whether the bicycle is usedby comparing whether its position has changed Ifchanged we saved the time and position of the bicycle-en according to the average travel speed and traveldistance of the bicycle the abnormal bicycle use record isrejected Figure 1 demonstrates the bike service area inShenzhen

Figure 2 shows the trip summary of shared bicycles in 24hours in a workday -ere are two distinct peaks in sharedbicycle use in a workday -e morning peak is between 0800-0900 and the evening peak is between 1800ndash2100 It isreasonable to assume that bikes are used for commuting Inthe morning peak the trip number of bicycles exceeded50000-e trips in evening peak were slightly lower than theearly peak but still more than 40000 During the period of0100ndash0600 bicycle usage is stable and the lowest with about5000 trips per hour At noon period from 1200 to 1500 thebicycle use is about 20000 per hour -e amount of bicycleuse dropped from 40000 to 10000 per hour at the nightperiod which is from 2200 to 2400

42 Influencing Factor of Bike Use In the previous studiesfactors influencing public bike usage are grouped into fourcategories transportation land-usebuild environmentpopulation and meteorological data -e weather variablesare not considered in our study A total 25 factors wereselected including 6 categories of variables populationpoint of interest (POI) road network public transportationdistance and building function -e detailed factors arelisted in Table 1

5 Results and Discussion

51lte Identification of BicycleGatheringArea We used themean shift clustering method to identify the clustering areaof bicycles based on the position of the bicycle at 0900 Inthe choice of bandwidth we considered two bandwidths 300meters and 500 meters because the area identified by thesetwo bandwidths is approximately equal to the grid area sizeof 500mlowast 500m and 1000mlowast 1000m -e minimumnumber of bicycles included in each category is set to 100When the bandwidth is 300 meters a total of 492 bicyclegathering areas are obtained-e 492 bicycle gathering areascontain a total of 140000 bicycles accounting for 636 ofall bicycles When the bandwidth is 500m a total of 270gathering areas are obtained including 140000 bicyclesConsidering that the bicycles contained in the 492 clustersare more compact we finally selected 492 clusters as theanalysis objects

Figure 3 shows the OFO bicycle gathering area identifiedby mean shift clustering In Figure 3 each cluster has acenter point and the buffer analysis was proposed to obtainthe range of the bicycle gathering area-e buffer is a kind ofinfluence range or service scope of the geospatial targetwhich refers to the polygons of a certain width which areautomatically established around the point the line and thesurface entity 300 meters of buffers were established byArcGIS based on all cluster center points thus to calculatethe indicators of dockless PBS and influencing factor ofbicycle gathering area

52 Performance of Five Classification Algorithms Aftercalculating the available bike number and coefficient ofavailable bike number variation of bicycle cluster area thek-means algorithmwas executed to group these areas Figure4 shows the WSS curve and we made WSS values from 2clusters to 19 When k is increased from 2 to 8 WSS de-creases significantlyWhen kgt 8 the improvement ofWSS isvery linear so the cluster centers have similar characteristics-e larger the kmeans the more the classifications of bicyclecluster area which is likely to impact on the accuracy of theclassification algorithm It is necessary to find an optimal kvalue to balance between the accurate cluster and accurateclassification prediction -is research adopts an experi-mental strategy to select k from 3 to 8 and then uses fiveclassification algorithms to compare the prediction accuracy-e bicycle gathering areas in the same cluster are markedwith the same label using k-means clustering Five classi-fication algorithms were compared in the accuracy of dis-tinguishing the type of bicycle gathering areas using 25impact factors -e experimental process is divided into twostages including training and application At the stage oftraining the 492 gathering areas are randomly divided intotwo parts -e first part containing 75 of areas is used fortraining data and the second part as the test data is used toverify the accuracy Figure 5 shows the accuracy of the fiveclassification algorithms in the training set and the test setwhen k takes different values

Journal of Advanced Transportation 5

In the training set the performance differences of thefive algorithms are obvious For different K values the RFCalways maintains the highest accuracy rate which is higherthan 90 -e ANN also has a high accuracy rate When Kis 3-4 the accuracy rate is above 90 and when the k valueis 5ndash8 the accuracy rate drops to above 80 KNN algo-rithm performance is in the middle of the five algorithmsWhen the K value is 3-4 the accuracy rate is above 70and when the K value is 5ndash8 the accuracy rate drops toabove 60 As the k value increases the accuracy of SVMdrops from 63 to 48-e worst-performing algorithm isthe LR algorithm As the k value increases the accuracy ratedrops from 58 to 37 In addition in the trend that theaccuracy rate changes with the value of k the accuracy rateof the RFC algorithm fluctuates little and the other al-gorithms have the highest accuracy rate when the value of Kis small As the value of k increases the accuracy ratedecreases and When k= 8 with ANN the accuracy rate hasincreased In the test set the accuracy of the five algorithmsis lower than that of the training set -e most accurate

performance is still the RFC algorithm When k = 4 RFChas the highest accuracy rate of 7697 which is the onlycase in the test set where the accuracy rate exceeds 75When k = 3 its accuracy rate is 73 For ANN the highestaccuracy rate is 71 when k = 4 -e accuracy of KNN inthe training set is better than that of SVM and LR but itsperformance in the test set is not much different from SVMand LR -e accuracy of these three algorithms is low Inaddition RFC ANN and LR all have the highest accuracywhen k = 4

Comprehensive comparison of the performance of thefive algorithms in the training set and the test set shows thatRFC and ANN have better performance in the prediction ofthe type of bicycle clusters -e accuracy of ANN SVMand LR in the training set is quite different but the dif-ference in the test set is not obvious Basically when the kvalue is greater than 4 as the K value increases the accuracyof the five classification models has a downward trendWhen K 4 RFC and ANN have the highest test accuracyWe choose k 4 which means that the bicycle aggregationis divided into 4 types for further analysis

53 lte Analysis of Bicycle Gathering Area

531 lte Clustering of Bicycle Gathering Area Table 2shows the description of the cluster center when k 4-e table mainly lists four indicators which are thestandard and original values of Abn_DAY and the stan-dard and original values of the coefficient of variation (in k-means clustering standard values were used)-e fourcluster centers have obvious characteristics We first divide

0100002000030000400005000060000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Trips

Figure 2 -e number of sharing bicycle trips in 24 hours in aworkday

Baoan

Longgang

Dapeng

Nanshan

Longhua Pingshan

Guangming

Luohu

Futian

Yantian

Bike service area

Built area

Shenzhen

0 10 20 30 405Kilometers

N

Figure 1 OFO bike service area in Shenzhen

6 Journal of Advanced Transportation

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 4: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

the solution of clustering a small subset of the data orperturbing the global mean of the data k times -en thealgorithm iterates between two steps till convergence Aboutthe value of k we can choose from reasonable guessed orpredefined number but it is better to know whether k

clusters is better or worse than k minus 1 or k + 1 clusters -emethod of With he Sum of Square (WSS) is often used to getreasonable K value WSS is the sum of the square of thedistance between all points and their nearest centroid point-e calculation is shown in equation (4) pi represents thepoint i and qi represents the nearest centroid point to i if alldata points are relatively close to their respective centersthen the WSS is relatively small If K + 1 clusters do notsignificantly reduce the WSS value of K clusters then theclassification is of little significance

WSS 1113944N

i1d pi q

i1113872 1113873

2 (4)

(2) Mean Shift Clustering Algorithm Mean shift clustering isa general nonparametric cluster finding procedure introducedby Fukunaga and Hostetler [37] and it does not depend onany explicit assumptions on the shape of the point distri-bution the number of clusters or any form of random ini-tialization Mean shift treats the clustering problem bysupposing that all points given represent samples from someunderlying probability density function with regions of highsample density corresponding to the local maxima of thisdistribution To find these local maxima the algorithm worksby allowing the points to attract each other via what might beconsidered a short-ranged ldquogravitationalrdquo force Allowing thepoints to gravitate towards areas of higher density one canshow that they will eventually coalesce at a series of pointsclose to the local maxima of the distribution -ose datapoints that converge to the same local maxima are consideredto bemembers of the same cluster For amathematical detailssee Comaniciu and Meer [38] In the next sections we il-lustrate application of the algorithm to a couple of problemsusing the python package SkLearn which contains a meanshift implementation

322 Classification Algorithm Classification is a kind ofsupervised learning algorithm of training a classifier in agroup of samples that already know the class label so that itcan classify an unknown sample In the field of machinelearning there are hundreds of classifiers to solve real-worldclassification problems [39] and in this research fivecommonly used classification algorithms are selected ran-dom forest classifier K-nearest neighbor classifier logisticregression support vector machine and artificial neuralnetwork -e five algorithms used in this study are based onthe Python platform Scikit-learn package free from httpsscikit-learnorg-e parameters of each algorithm have beenadjusted to ensure the optimal performance of the algo-rithm In the analysis of shared bicycles the accuracy androbustness of these five classification algorithms will becompared

(1) Random Forest Classifier Random forest classifier (RFC)is the most widely used supervised machine learning al-gorithm It is very powerful and usually gives good resultswithout the need to repeatedly adjust the parameters -ebasic unit of random forests is the decision tree A randomforest is a classifier that contains multiple decision trees andthe category of its output is determined by the mode of thecategory of the individual tree output [40] For an inputsampleN trees will haveN classification results-e randomforest integrates all the classification voting results andspecifies the category with the highest number of votes asoutput It has several advantages it enables to handlethousands of input variables without variable deletion andgives estimates of what variables are important in theclassification

(2) K-Nearest Neighbor Classifier K-nearest neighbor(KNN) is a method of measuring the distance betweendifferent feature values for classification Given a training setD and a test object z the test object is a vector composed ofattribute values and an unknown category label -e algo-rithm needs to calculate the distance (or similarity) betweenz and each training object In this way the list of nearestneighbors can be determined-en assign the category withthe dominant number of instances in the nearest neighbor toz -e advantage is that it is easy to understand and goodperformance can be obtained without excessive adjustment-e disadvantage is that the prediction speed is slow and thedataset with many characteristics cannot be processed It isvulnerable to data imbalance And the interpretability of theoutput is not strong

(3) Logistic Regression Logistic regression (LR) is essentiallya linear classifier which refers to the establishment of aregression formula on the classification boundary line basedon the existing data to classify -e calculation cost of thismethod is not high and it is easy to understand and im-plement -e fitted parameters can clearly see the impact ofeach feature on the result And most of the time is used fortraining and classification is fast after training is completedbut it is easy to underfit and the classification accuracy is nothigh -e main reason is that LR is linear fitting but inreality many things do not satisfy linearity

(4) Support Vector Machine Support vector machine (SVM)maps the data to a multidimensional space in the form ofpoints thereby converting the nonlinear separable problemin the original sample space into a linear separable problemin the feature space so that the optimal hyperplane forclassification can be found-en classify the set according tothe hyperplane SVM can make good predictions on dataoutside the training set and has a low generalization errorrate low computational overhead and easy-to-interpretresults but it is too sensitive to parameter adjustments andkernel function parameters

(5) Artificial Neural Network Artificial neural network(ANN) is an information processing system based on imi-tating the structure and function of the brainrsquos neural

4 Journal of Advanced Transportation

network -e ANN algorithm is a set of continuous inputoutput units where each connection is associated with aweight In the learning stage by adjusting the weights of theneural network the correct class label of the sample to belearned can be predicted -e advantages of the ANN al-gorithm are high classification accuracy and strong dis-tributed parallel processing capabilities Artificial neuralnetworks have strong robustness and fault tolerance fordatasets containing a large amount of noisy data but thelearning process cannot be observed and the output resultsare difficult to interpret which will affect the reliability andacceptability of the results It also requires a large number ofparameters such as network topology initial values ofweights and thresholds

4 Study Area

41 OFO Dockless PBS in Shenzhen -is paper focuses onChinarsquos fastest urbanizing city Shenzhen to lay a foun-dation for empirical analysis of the intensity of usage of theOFO dockless bike sharing system It provides a uniquecase study as it is one of the largest bike share programslocated in a metropolis OFO bicycle sharing system waslaunched in Shenzhen in December 2016 with more than220000 bicycles We scanned the working status of thesebicycles every 15 minutes in one week of September 2017-ere are about 576 million bicycle status records in a dayFor a bicycle ID we first judge whether the bicycle is usedby comparing whether its position has changed Ifchanged we saved the time and position of the bicycle-en according to the average travel speed and traveldistance of the bicycle the abnormal bicycle use record isrejected Figure 1 demonstrates the bike service area inShenzhen

Figure 2 shows the trip summary of shared bicycles in 24hours in a workday -ere are two distinct peaks in sharedbicycle use in a workday -e morning peak is between 0800-0900 and the evening peak is between 1800ndash2100 It isreasonable to assume that bikes are used for commuting Inthe morning peak the trip number of bicycles exceeded50000-e trips in evening peak were slightly lower than theearly peak but still more than 40000 During the period of0100ndash0600 bicycle usage is stable and the lowest with about5000 trips per hour At noon period from 1200 to 1500 thebicycle use is about 20000 per hour -e amount of bicycleuse dropped from 40000 to 10000 per hour at the nightperiod which is from 2200 to 2400

42 Influencing Factor of Bike Use In the previous studiesfactors influencing public bike usage are grouped into fourcategories transportation land-usebuild environmentpopulation and meteorological data -e weather variablesare not considered in our study A total 25 factors wereselected including 6 categories of variables populationpoint of interest (POI) road network public transportationdistance and building function -e detailed factors arelisted in Table 1

5 Results and Discussion

51lte Identification of BicycleGatheringArea We used themean shift clustering method to identify the clustering areaof bicycles based on the position of the bicycle at 0900 Inthe choice of bandwidth we considered two bandwidths 300meters and 500 meters because the area identified by thesetwo bandwidths is approximately equal to the grid area sizeof 500mlowast 500m and 1000mlowast 1000m -e minimumnumber of bicycles included in each category is set to 100When the bandwidth is 300 meters a total of 492 bicyclegathering areas are obtained-e 492 bicycle gathering areascontain a total of 140000 bicycles accounting for 636 ofall bicycles When the bandwidth is 500m a total of 270gathering areas are obtained including 140000 bicyclesConsidering that the bicycles contained in the 492 clustersare more compact we finally selected 492 clusters as theanalysis objects

Figure 3 shows the OFO bicycle gathering area identifiedby mean shift clustering In Figure 3 each cluster has acenter point and the buffer analysis was proposed to obtainthe range of the bicycle gathering area-e buffer is a kind ofinfluence range or service scope of the geospatial targetwhich refers to the polygons of a certain width which areautomatically established around the point the line and thesurface entity 300 meters of buffers were established byArcGIS based on all cluster center points thus to calculatethe indicators of dockless PBS and influencing factor ofbicycle gathering area

52 Performance of Five Classification Algorithms Aftercalculating the available bike number and coefficient ofavailable bike number variation of bicycle cluster area thek-means algorithmwas executed to group these areas Figure4 shows the WSS curve and we made WSS values from 2clusters to 19 When k is increased from 2 to 8 WSS de-creases significantlyWhen kgt 8 the improvement ofWSS isvery linear so the cluster centers have similar characteristics-e larger the kmeans the more the classifications of bicyclecluster area which is likely to impact on the accuracy of theclassification algorithm It is necessary to find an optimal kvalue to balance between the accurate cluster and accurateclassification prediction -is research adopts an experi-mental strategy to select k from 3 to 8 and then uses fiveclassification algorithms to compare the prediction accuracy-e bicycle gathering areas in the same cluster are markedwith the same label using k-means clustering Five classi-fication algorithms were compared in the accuracy of dis-tinguishing the type of bicycle gathering areas using 25impact factors -e experimental process is divided into twostages including training and application At the stage oftraining the 492 gathering areas are randomly divided intotwo parts -e first part containing 75 of areas is used fortraining data and the second part as the test data is used toverify the accuracy Figure 5 shows the accuracy of the fiveclassification algorithms in the training set and the test setwhen k takes different values

Journal of Advanced Transportation 5

In the training set the performance differences of thefive algorithms are obvious For different K values the RFCalways maintains the highest accuracy rate which is higherthan 90 -e ANN also has a high accuracy rate When Kis 3-4 the accuracy rate is above 90 and when the k valueis 5ndash8 the accuracy rate drops to above 80 KNN algo-rithm performance is in the middle of the five algorithmsWhen the K value is 3-4 the accuracy rate is above 70and when the K value is 5ndash8 the accuracy rate drops toabove 60 As the k value increases the accuracy of SVMdrops from 63 to 48-e worst-performing algorithm isthe LR algorithm As the k value increases the accuracy ratedrops from 58 to 37 In addition in the trend that theaccuracy rate changes with the value of k the accuracy rateof the RFC algorithm fluctuates little and the other al-gorithms have the highest accuracy rate when the value of Kis small As the value of k increases the accuracy ratedecreases and When k= 8 with ANN the accuracy rate hasincreased In the test set the accuracy of the five algorithmsis lower than that of the training set -e most accurate

performance is still the RFC algorithm When k = 4 RFChas the highest accuracy rate of 7697 which is the onlycase in the test set where the accuracy rate exceeds 75When k = 3 its accuracy rate is 73 For ANN the highestaccuracy rate is 71 when k = 4 -e accuracy of KNN inthe training set is better than that of SVM and LR but itsperformance in the test set is not much different from SVMand LR -e accuracy of these three algorithms is low Inaddition RFC ANN and LR all have the highest accuracywhen k = 4

Comprehensive comparison of the performance of thefive algorithms in the training set and the test set shows thatRFC and ANN have better performance in the prediction ofthe type of bicycle clusters -e accuracy of ANN SVMand LR in the training set is quite different but the dif-ference in the test set is not obvious Basically when the kvalue is greater than 4 as the K value increases the accuracyof the five classification models has a downward trendWhen K 4 RFC and ANN have the highest test accuracyWe choose k 4 which means that the bicycle aggregationis divided into 4 types for further analysis

53 lte Analysis of Bicycle Gathering Area

531 lte Clustering of Bicycle Gathering Area Table 2shows the description of the cluster center when k 4-e table mainly lists four indicators which are thestandard and original values of Abn_DAY and the stan-dard and original values of the coefficient of variation (in k-means clustering standard values were used)-e fourcluster centers have obvious characteristics We first divide

0100002000030000400005000060000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Trips

Figure 2 -e number of sharing bicycle trips in 24 hours in aworkday

Baoan

Longgang

Dapeng

Nanshan

Longhua Pingshan

Guangming

Luohu

Futian

Yantian

Bike service area

Built area

Shenzhen

0 10 20 30 405Kilometers

N

Figure 1 OFO bike service area in Shenzhen

6 Journal of Advanced Transportation

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 5: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

network -e ANN algorithm is a set of continuous inputoutput units where each connection is associated with aweight In the learning stage by adjusting the weights of theneural network the correct class label of the sample to belearned can be predicted -e advantages of the ANN al-gorithm are high classification accuracy and strong dis-tributed parallel processing capabilities Artificial neuralnetworks have strong robustness and fault tolerance fordatasets containing a large amount of noisy data but thelearning process cannot be observed and the output resultsare difficult to interpret which will affect the reliability andacceptability of the results It also requires a large number ofparameters such as network topology initial values ofweights and thresholds

4 Study Area

41 OFO Dockless PBS in Shenzhen -is paper focuses onChinarsquos fastest urbanizing city Shenzhen to lay a foun-dation for empirical analysis of the intensity of usage of theOFO dockless bike sharing system It provides a uniquecase study as it is one of the largest bike share programslocated in a metropolis OFO bicycle sharing system waslaunched in Shenzhen in December 2016 with more than220000 bicycles We scanned the working status of thesebicycles every 15 minutes in one week of September 2017-ere are about 576 million bicycle status records in a dayFor a bicycle ID we first judge whether the bicycle is usedby comparing whether its position has changed Ifchanged we saved the time and position of the bicycle-en according to the average travel speed and traveldistance of the bicycle the abnormal bicycle use record isrejected Figure 1 demonstrates the bike service area inShenzhen

Figure 2 shows the trip summary of shared bicycles in 24hours in a workday -ere are two distinct peaks in sharedbicycle use in a workday -e morning peak is between 0800-0900 and the evening peak is between 1800ndash2100 It isreasonable to assume that bikes are used for commuting Inthe morning peak the trip number of bicycles exceeded50000-e trips in evening peak were slightly lower than theearly peak but still more than 40000 During the period of0100ndash0600 bicycle usage is stable and the lowest with about5000 trips per hour At noon period from 1200 to 1500 thebicycle use is about 20000 per hour -e amount of bicycleuse dropped from 40000 to 10000 per hour at the nightperiod which is from 2200 to 2400

42 Influencing Factor of Bike Use In the previous studiesfactors influencing public bike usage are grouped into fourcategories transportation land-usebuild environmentpopulation and meteorological data -e weather variablesare not considered in our study A total 25 factors wereselected including 6 categories of variables populationpoint of interest (POI) road network public transportationdistance and building function -e detailed factors arelisted in Table 1

5 Results and Discussion

51lte Identification of BicycleGatheringArea We used themean shift clustering method to identify the clustering areaof bicycles based on the position of the bicycle at 0900 Inthe choice of bandwidth we considered two bandwidths 300meters and 500 meters because the area identified by thesetwo bandwidths is approximately equal to the grid area sizeof 500mlowast 500m and 1000mlowast 1000m -e minimumnumber of bicycles included in each category is set to 100When the bandwidth is 300 meters a total of 492 bicyclegathering areas are obtained-e 492 bicycle gathering areascontain a total of 140000 bicycles accounting for 636 ofall bicycles When the bandwidth is 500m a total of 270gathering areas are obtained including 140000 bicyclesConsidering that the bicycles contained in the 492 clustersare more compact we finally selected 492 clusters as theanalysis objects

Figure 3 shows the OFO bicycle gathering area identifiedby mean shift clustering In Figure 3 each cluster has acenter point and the buffer analysis was proposed to obtainthe range of the bicycle gathering area-e buffer is a kind ofinfluence range or service scope of the geospatial targetwhich refers to the polygons of a certain width which areautomatically established around the point the line and thesurface entity 300 meters of buffers were established byArcGIS based on all cluster center points thus to calculatethe indicators of dockless PBS and influencing factor ofbicycle gathering area

52 Performance of Five Classification Algorithms Aftercalculating the available bike number and coefficient ofavailable bike number variation of bicycle cluster area thek-means algorithmwas executed to group these areas Figure4 shows the WSS curve and we made WSS values from 2clusters to 19 When k is increased from 2 to 8 WSS de-creases significantlyWhen kgt 8 the improvement ofWSS isvery linear so the cluster centers have similar characteristics-e larger the kmeans the more the classifications of bicyclecluster area which is likely to impact on the accuracy of theclassification algorithm It is necessary to find an optimal kvalue to balance between the accurate cluster and accurateclassification prediction -is research adopts an experi-mental strategy to select k from 3 to 8 and then uses fiveclassification algorithms to compare the prediction accuracy-e bicycle gathering areas in the same cluster are markedwith the same label using k-means clustering Five classi-fication algorithms were compared in the accuracy of dis-tinguishing the type of bicycle gathering areas using 25impact factors -e experimental process is divided into twostages including training and application At the stage oftraining the 492 gathering areas are randomly divided intotwo parts -e first part containing 75 of areas is used fortraining data and the second part as the test data is used toverify the accuracy Figure 5 shows the accuracy of the fiveclassification algorithms in the training set and the test setwhen k takes different values

Journal of Advanced Transportation 5

In the training set the performance differences of thefive algorithms are obvious For different K values the RFCalways maintains the highest accuracy rate which is higherthan 90 -e ANN also has a high accuracy rate When Kis 3-4 the accuracy rate is above 90 and when the k valueis 5ndash8 the accuracy rate drops to above 80 KNN algo-rithm performance is in the middle of the five algorithmsWhen the K value is 3-4 the accuracy rate is above 70and when the K value is 5ndash8 the accuracy rate drops toabove 60 As the k value increases the accuracy of SVMdrops from 63 to 48-e worst-performing algorithm isthe LR algorithm As the k value increases the accuracy ratedrops from 58 to 37 In addition in the trend that theaccuracy rate changes with the value of k the accuracy rateof the RFC algorithm fluctuates little and the other al-gorithms have the highest accuracy rate when the value of Kis small As the value of k increases the accuracy ratedecreases and When k= 8 with ANN the accuracy rate hasincreased In the test set the accuracy of the five algorithmsis lower than that of the training set -e most accurate

performance is still the RFC algorithm When k = 4 RFChas the highest accuracy rate of 7697 which is the onlycase in the test set where the accuracy rate exceeds 75When k = 3 its accuracy rate is 73 For ANN the highestaccuracy rate is 71 when k = 4 -e accuracy of KNN inthe training set is better than that of SVM and LR but itsperformance in the test set is not much different from SVMand LR -e accuracy of these three algorithms is low Inaddition RFC ANN and LR all have the highest accuracywhen k = 4

Comprehensive comparison of the performance of thefive algorithms in the training set and the test set shows thatRFC and ANN have better performance in the prediction ofthe type of bicycle clusters -e accuracy of ANN SVMand LR in the training set is quite different but the dif-ference in the test set is not obvious Basically when the kvalue is greater than 4 as the K value increases the accuracyof the five classification models has a downward trendWhen K 4 RFC and ANN have the highest test accuracyWe choose k 4 which means that the bicycle aggregationis divided into 4 types for further analysis

53 lte Analysis of Bicycle Gathering Area

531 lte Clustering of Bicycle Gathering Area Table 2shows the description of the cluster center when k 4-e table mainly lists four indicators which are thestandard and original values of Abn_DAY and the stan-dard and original values of the coefficient of variation (in k-means clustering standard values were used)-e fourcluster centers have obvious characteristics We first divide

0100002000030000400005000060000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Trips

Figure 2 -e number of sharing bicycle trips in 24 hours in aworkday

Baoan

Longgang

Dapeng

Nanshan

Longhua Pingshan

Guangming

Luohu

Futian

Yantian

Bike service area

Built area

Shenzhen

0 10 20 30 405Kilometers

N

Figure 1 OFO bike service area in Shenzhen

6 Journal of Advanced Transportation

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 6: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

In the training set the performance differences of thefive algorithms are obvious For different K values the RFCalways maintains the highest accuracy rate which is higherthan 90 -e ANN also has a high accuracy rate When Kis 3-4 the accuracy rate is above 90 and when the k valueis 5ndash8 the accuracy rate drops to above 80 KNN algo-rithm performance is in the middle of the five algorithmsWhen the K value is 3-4 the accuracy rate is above 70and when the K value is 5ndash8 the accuracy rate drops toabove 60 As the k value increases the accuracy of SVMdrops from 63 to 48-e worst-performing algorithm isthe LR algorithm As the k value increases the accuracy ratedrops from 58 to 37 In addition in the trend that theaccuracy rate changes with the value of k the accuracy rateof the RFC algorithm fluctuates little and the other al-gorithms have the highest accuracy rate when the value of Kis small As the value of k increases the accuracy ratedecreases and When k= 8 with ANN the accuracy rate hasincreased In the test set the accuracy of the five algorithmsis lower than that of the training set -e most accurate

performance is still the RFC algorithm When k = 4 RFChas the highest accuracy rate of 7697 which is the onlycase in the test set where the accuracy rate exceeds 75When k = 3 its accuracy rate is 73 For ANN the highestaccuracy rate is 71 when k = 4 -e accuracy of KNN inthe training set is better than that of SVM and LR but itsperformance in the test set is not much different from SVMand LR -e accuracy of these three algorithms is low Inaddition RFC ANN and LR all have the highest accuracywhen k = 4

Comprehensive comparison of the performance of thefive algorithms in the training set and the test set shows thatRFC and ANN have better performance in the prediction ofthe type of bicycle clusters -e accuracy of ANN SVMand LR in the training set is quite different but the dif-ference in the test set is not obvious Basically when the kvalue is greater than 4 as the K value increases the accuracyof the five classification models has a downward trendWhen K 4 RFC and ANN have the highest test accuracyWe choose k 4 which means that the bicycle aggregationis divided into 4 types for further analysis

53 lte Analysis of Bicycle Gathering Area

531 lte Clustering of Bicycle Gathering Area Table 2shows the description of the cluster center when k 4-e table mainly lists four indicators which are thestandard and original values of Abn_DAY and the stan-dard and original values of the coefficient of variation (in k-means clustering standard values were used)-e fourcluster centers have obvious characteristics We first divide

0100002000030000400005000060000

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Trips

Figure 2 -e number of sharing bicycle trips in 24 hours in aworkday

Baoan

Longgang

Dapeng

Nanshan

Longhua Pingshan

Guangming

Luohu

Futian

Yantian

Bike service area

Built area

Shenzhen

0 10 20 30 405Kilometers

N

Figure 1 OFO bike service area in Shenzhen

6 Journal of Advanced Transportation

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 7: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

the four clusters into inefficient and efficient groups accordingto the value of cv A high cv indicates that the number ofavailable bicycles in the gathering area is more appropriate andthe use of regional bicycles is efficient A low cv indicates thatthe number of available bicycles in the gathering area is largewhich does not match the number of active bicycles and theuse of bicycles is inefficient We call clusters with z_cv lowerthan 0 as an inefficient group and clusters with z_cv greaterthan 0 as an efficient group -en each group is divided intotwo subtypes according to Abn DAY and cv

(i) Cluster A z_ Abn_DAYgt 1 z_cvlt 0 this clustercan be called high efficiencymodeAbn_DAY in thisgroup is reaching 416 but the average daily changeof vehicles is very few with an average of only51-ere are excessive bicycles deployed or stayed inthe area and the activity of more than 300 bicycles isnot high

(ii) Cluster B z_Abn DAYlt 0 z_cvlt 0 we call it anormal inefficient mode Its z_cvlt 0 is as same asCluster A but z_ Abn_DAYlt 0 compared with Aindicating that the number of bicycles in the clusterarea in this group is less than A-ere are about 185bicycles with an average of 17 bikes daily used andmore than 150 are not very active

(iii) Cluster C z_Abn DAYlt 0 z_cv> 2 this cluster hasthe highest cv value among the four clusters indicatingthat the number of available bicycles in this groupmatches the demand for bicycles and there are not toomany idle bicycles available -e average number ofavailable bicycles in this group is 213 and the averagedaily change is 117 More than half of the bicycles areused so it is called high efficient mode

(iv) Cluster D z_Abn DAYlt 0 z_cv> 0 this cluster issimilar to C except that the z_cv value is lower thanthat of the C class but exceeds the average In thisgroup the average number of available bicycles isalmost similar to C but Abn DAY is about half ofthat of C and its average cv is 2-3 times that ofgroups A and B -e use efficiency of bicycles ishigher than that of A and B but lower than C so it iscalled normal effective mode

In general high inefficient mode of cluster A has the maxaverage number of available bikes and high efficiency modeof cluster C has the largest average coefficient of variation-e difference between A and B is in the average number ofavailable bikes and the difference between B C and D is inthe coefficient of variation From the number of the clusterscluster A and cluster B together account for about 73 so

Table 1 -e influencing factors of bike use

Factor type Factor Unit CalculationPopulation Population Number Count the number of residents in the service area

POI

Restaurant Number

Count the number of POIs of the corresponding category in the service areaCompany NumberSmall store NumberCar park Number

Road network

Length of main road m

Calculate the total length of the corresponding road level in the service areaLength of secondaryroad m

Length of branch road m

Publictransportation

Bus stop Number Calculate the number of bus stops in the service area

Distance to subway m Calculate the distance from the center of the service area to the nearest subwaystation

Distance

Distance to university m

Calculate the distance from the center of the service area to the closestcorresponding place

Distance to government mDistance tosupermarket m

Distance to hub mDistance to square mDistance to park mDistance to school mDistance to hospital m

Building function

Office building m2

Calculate the total floor area of the corresponding building in the service area

Industrial building m2

Public building m2

Commercial building m2

Residential building m2

Urban village building m2

Warehouse m2

Building number NumberCover ratio -e ratio of the projected area of all buildings to the area of the service area

Journal of Advanced Transportation 7

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 8: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

the main mode is inefficient mode -ese two groups havegathered a total of 110000 bicycles which used low effi-ciency -e efficient mode area accounts for about 27 of allregions of which the high efficient mode area accounts for10 with 10000 bicycles and the normal efficient modeaccounts for 17 totaling 17705 bicycles A total of 27000bicycles are gathered in these two groups and the bicycle useefficiency in these areas is higher

532 lte Impact of Clustering of Bicycle Gathering AreaFigure 6 shows the spatial distribution of four clusters -ehigh inefficient mode is mainly distributed in the central areaof Shenzhen (Futian and Luohu) and Baoan and shows the

characteristics of spatial clustering Normal inefficient areasare distributed on the periphery of urban built-up areas andefficient mode areas are scattered between normal inefficientand high inefficient areas It is worth noting that in Futianand Luohu districts where the subway network density ishigh the main distribution mode is inefficient In order tobetter understand the influencing factors that affect the typesof bicycle service clusters we analyzed the importance of thefactors that determine the types of bicycle service clustersbased on the RFC model with the highest accuracy

-e importance indicates how important the variablesare in the RFC model -e sum of the importance of allvariables is 1 Figure 7 shows the importance of 25 factors inthe RFC model in ascending order when K= 4 and theaverage importance is 004 (125) -e importance of pop-ulation and buildings number is significantly higher thanother factors which are key factors -e least importantfactors are the length of the branch road the number of busstops and the area of public buildings -e importance ofthese three variables does not exceed 002 lower than theaverage -e importance of the main road length and therestaurant number ranked third and fourth indicating thatthey are important reference variables for identifying bikegathering area type -e importance of resident buildingarea building coverage distance to universities and smallshops is slightly larger than average -e distance to thesubway station ranks only ninth in importance which isabout the same as the commercial building area and thecompany number -e sum of the importance of the top 10variables accounted for 54 Existing studies have shownthat the area around subway stations is the most active area

N

Metro line

Figure 3 -e OFO bicycle gathering area identified by mean shift clustering

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19400

600

800

1000

1200

1400

1600

Number of K

WSS

WSS

Figure 4 -e WSS curve in determining the number of clusters

8 Journal of Advanced Transportation

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 9: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

for bicycle activities but our research has found that thedistance from the subway station is not the most importantfactor in judging the activity type of bicycle gathering areas-e population the number of buildings the length of themain road and restaurants are four important variables forjudging the active types of bicycle clusters In additionamong the 25 influencing factors in Figure 7 except for thevariables with higher and lower importance the importanceof most of the variables in the middle is more evenly dis-tributed indicating that the active types of bicycle clustershave more and more complex influencing factors

Figure 8 shows a comparison of the average values ofthe standard values of 25 factors -e color of the heat mapclearly shows that there are obvious differences in thevalues of 25 variables between the four groups We found

that extreme values of variables tend to appear in groups Aand C Group A is significantly higher than the other threegroups in the seven variables of population number ofbuildings length of main roads number of restaurantsbuilding coverage number of parking lots and number ofbus stops -e number of population buildings and thelength of secondary roads in group C are significantly lowerthan the other three groups and the distance to schoolindustrial building area office area and distance to the parkare significantly higher than the other three groups Al-though the variables in groups B and D rarely havemaximum or minimum values the characteristics of thevariables between them are quite different -e classifica-tion is based on average available number and the coeffi-cient of variation but the 25 variables between the classes

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Accu

racy

Accu

racy

Training

RFC trainKNN trainLR train

SVM trainANN train

RFC testKNN testLR test

SVM testANN test

3 4 5 6 7 803

04

05

06

07

08

09

1

K

Testing

Figure 5 -e accuracy of the five classification algorithms

Table 2 -e description of four clusters

Cluster A B C DDescription High inefficient Normal inefficient High efficient Normal efficientz_ Abn DAY 103 minus073 minus052 minus055z_ cv minus039 minus059 235 075Abn_DAY 416 185 213 209cv 012 009 055 030Abn_DAY lowast cv 51 17 117 63Total available bike number 78588 31574 9986 17705Cluster number 189 171 47 85

Journal of Advanced Transportation 9

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 10: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

Popu

latio

nBu

ildin

g nu

mbe

rM

ain

road

Reat

aura

ntRe

siden

tial b

uild

ing

Cov

er ra

tioD

istan

ce to

uni

vers

itySm

all s

tore

Dist

ance

to su

bway

Com

mer

cial

bui

ldin

gC

ompa

nyD

istan

ce to

scho

olD

istan

ce to

hub

Indu

stria

l bui

ldin

gSe

cond

ary

road

Dist

ance

to h

ospi

tal

Offi

ce b

uild

ing

Dist

ance

to p

ark

Dist

ance

to sq

uare

Dist

ance

to g

over

nmen

tCa

r par

king

Dist

ance

to m

all

Publ

ic b

uild

ing

Bus s

top

Bran

ch ro

ad

01

009

008

007

006

005

004

003

002

001

0

100

90

80

70

60

50 ()

40

30

20

10

0

Figure 7 -e importance of factor in random forest classifier model

A high inefficient

C high efficient

D normal efficient

B normal inefficient

Metro lineType

N

Figure 6 -e distribution of four clusters

10 Journal of Advanced Transportation

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 11: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

have obvious differences in value indicating that the ac-tivity types of bicycles are related to these factors -evalues of these variables can be used to judge the activitytypes of bicycles

533 Guiding Public Bicycle Planning and ManagementLet us assume a scenario to provide a dockless public bicycleservice in a new area in Shenzhen Considering that theinfluencing variables are easy to obtain we apply the RFC to

PopulationBuilding number

Main roadResidential building

Cover ratioDistance to university

Small storeDistance to subway

Commercial buildingCompany

Distance to schoolDistance to hub

Industrial buildingSecondary road

Distance to hospitalOffice building

Distance to parkDistance to square

Distance to governmentCar parking

Distance to mallPublic building

Bus stopBranch road

Fact

ors

05703601

016035

ndash028

ndash032ndash0092ndash024ndash036

ndash036

ndash026

ndash026

016

026

02

015 018

0180087049084021

053

00048

012023013024018026

029 009

0069000330036

0047

062

00930012028

018

ndash01

ndash00013

ndash0032

ndash0041

ndash028

ndash022ndash0028ndash014

ndash0029ndash0058ndash012

ndash0052ndash028

ndash056 ndash00350082007500990031

ndash0087

ndash013 ndash0089ndash025

ndash024ndash0082

ndash0047ndash0035

ndash023

ndash065ndash063 ndash029

ndash026

001

0028

ndash015

ndash018

03

024

ndash014

ndash014

ndash024ndash02

ndash018

ndash033ndash011ndash013ndash021ndash019

ndash014

ndash0044

037

022

ndash016

018

0140053

028

04

A CB DClusters

08

04

ndash08

ndash04

00

Figure 8 -e heatmap of factor in four clusters

Metro line

Prediction

N

A

C

D

B

Service area

Figure 9 -e predicted results of activity patterns in the new service area

Journal of Advanced Transportation 11

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 12: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

predict the activity pattern of the new service area -e builtarea of Shenzhen which does not provide OFO service isdivided into 1459 grids and the influence factors are cal-culated Figure 9 shows the predicted results of the RFCmodel for activity patterns Note that the prediction assumesthat the existing bike operational and deployment strategiesof OFO remain unchanged Yellow grids belonging to clusterB have the largest number about 1025 -ere are 146 bluegrids which are cluster D-e remaining area is 162 red gridsand 126 green grids Most girds are normal inefficient typesTable 2 and the service area type can provide information topublic bicycle quantity management We can get theoreti-cally the minimum number of bicycles needed according tothe gridrsquos cluster and the total number of bicycles required Itis possible to optimize the number of available bicyclesaccording to bike activity of the grid reduce operating costsand improve utilization efficiency

6 Conclusions

-is research practiced the introduction of a machinelearning approach to quantity management using OFO bikeoperation data in Shenzhen -e contributions are mainlyreflected in the following three aspects First we proposed amethod for identifying the cluster area of dockless sharedbicycles which can accurately calculate the impact factors ofshared bicycle systems Second different from previousresearch perspectives this research discusses the perfor-mance and optimization possibilities of the shared bicyclesystem from the number of available bicycles in the gath-ering area and its changes At last this work shows theapplicability and operability of machine learning methods insolving urban planning and management problems which isinspiring for people with urban management background touse computational intelligence

-e bicycle gathering area typersquos recognition predictionand application in this study are meaningful for the sus-tainable development of shared bicycles (1) -ere were 492OFO bicycle gathering areas containing more than 140000bicycles accounting for 636 of all bicycles in Shenzhen (2)More type number of bicycle gathering area will affect theaccuracy of the classification algorithm -e random forestclassification had the best performance in identifying bicyclegathering area with an accuracy of more than 75 (3)Shenzhen OFO dockless public bike gathering areas can bedivided into four types high inefficient normal inefficienthigh efficient and normal efficient -e main area types arehigh inefficient and normal inefficient which gathered about110000 bicycles with low usage(4) -ere were obviousdifferences in the characteristics of impact factors in fourtypes of bicycle gathering areas It is feasible to use thesefactors to predict area type to optimize the number ofavailable bicycles reduce operating costs and improveutilization efficiency So the knowledge from the existingdockless bicycle operation data can be used to guide publicbicycle planning and management -e potential activitypatterns and the minimum number of bikes in new serviceareas can be obtained in advance Operating companies canmake optimization strategies based on this information

Our study also has some limitations First due to thelimitations of data acquisition the working day operationaldata used only contain one week so the results of the analysismay be biased-e data we analyzed did not include data fornonworking days Weekend public bicycle usage patternsmay differ fromweekday Second themodes analyzed in thisstudy rely heavily on operational data and may not be ap-plicable elsewhere When the strategy of bicycle operationchanges or the number of bicycles is reoptimized the activitymodes will be affected After a period of operation accordingto the indicators and models of this study a new mode ofactivity will be formed Last this paper focuses on thenumber of available bicycles and the activity indicators ofdockless PBS need to be further explored

Data Availability

-e bike data population distribution data and POI dataincluding land use and built environment were supplied bythe authors of this article -ey are freely available Requestsfor access to these data should be made to Qingfeng Zhouzhouqingfenghiteducn

Conflicts of Interest

-e authors declare that they have no conflicts of interest

Acknowledgments

Financial support from the National Natural ScienceFoundation of China (grant no 41771169) is acknowledged

References

[1] P Demaio ldquoBike-sharing history impacts models of pro-vision and futurerdquo Journal of Public Transportation vol 12no 4 pp 41ndash56 2009

[2] China Academy of Information and CommunicationsTechnology China Shared Bicycle Industry Development Re-port China Academy of Information and CommunicationsTechnology Beijing China 2018

[3] Y Zhou B P L Lau Z Koh C Yuen and B K K NgldquoUnderstanding crowd behaviors in a social event by passiveWiFi sensing and data miningrdquo IEEE Internet of ltingsJournal vol 7 no 5 pp 4442ndash4454 2020

[4] K Li C Yuen S S Kanhere et al ldquoAn experimental study fortracking crowd in smart citiesrdquo IEEE Systems Journal vol 13no 3 pp 2966ndash2977 2019

[5] Q Chen W Wang F Wu et al ldquoA survey on an emergingarea deep learning for smart city datardquo IEEE Transactions onEmerging Topics in Computational Intelligence vol 3 no 5pp 392ndash410 2019

[6] Y Zhou B P L Lau C Yuen B Tuncer and E WilhelmldquoUnderstanding urban human mobility through crowdsenseddatardquo IEEE Communications Magazine vol 56 no 11pp 52ndash59 2018

[7] L P L Billy N Wijerathne B K K Ng et al ldquoSensor fusionfor public space utilization monitoring in a smart cityrdquo IEEEInternet of ltings Journal vol 5 no 2 pp 473ndash481 2017

[8] X Wang C Yuen N U Hassan et al ldquoElectric vehiclecharging station placement for urban public bus systemsrdquo

12 Journal of Advanced Transportation

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13

Page 13: MachineLearningApproachtoQuantityManagementfor Long ...2020/07/21  · Sacramento,California.GebhartandNoland[21]usedreal-timeridershipdataforCapitalBikeshareinWashington D.C.toinvestigatetheimpactofweathervariablesand

IEEE Transactions on Intelligent Transportation Systemsvol 18 no 1 pp 128ndash139 2017

[9] A Faghih-Imani and N Eluru ldquoAnalysing bicycle-sharingsystem user destination choice preferences chicagorsquos Divvysystemrdquo Journal of Transport Geography vol 44 pp 53ndash642015

[10] C Fricker and N Gast ldquoIncentives and redistribution inhomogeneous bike-sharing systems with stations of finitecapacityrdquo EURO Journal on Transportation and Logisticsvol 5 no 3 pp 261ndash291 2016

[11] P-S You P-J Lee and Y-C Hsieh ldquoAn artificial intelligentapproach to the bicycle repositioning problemsrdquo EngineeringComputations vol 34 no 1 pp 145ndash163 2017

[12] E OrsquoMahony and D B Shmoys ldquoData analysis and opti-mization for (citi) bike sharingrdquo in Proceedings of the Twenty-ninth AAAI Conference on Artificial Intelligence AAAI PressAustin TX USA January 2015

[13] Y Chen Y Li H Hu J Zhang D Gu and P Xu ldquoCom-putational intelligence approaches to robotics automationand controlrdquo Mathematical Problems in Engineeringvol 2015 Article ID 620275 1 page 2015

[14] Ao Lozano J De Paz G Villarrubia Gonzalez D Iglesia andJ Bajo ldquoMulti-agent system for demand prediction and tripvisualization in bike sharing systemsrdquo Applied Sciences vol 8no 1 p 67 2018

[15] A Faghih-Imani N Eluru A M El-Geneidy M Rabbat andU Haq ldquoHow land-use and urban form impact bicycle flowsevidence from the bicycle-sharing system (BIXI) in Mon-trealrdquo Journal of Transport Geography vol 41 pp 306ndash3142014

[16] G R Krykewycz C M Puchalsky J Rocks B Bonnette andF Jaskiewicz ldquoDefining a primary market and estimatingdemand for major bicycle-sharing program in philadelphiaPennsylvaniardquo Transportation Research Record Journal of theTransportation Research Board vol 2143 no 1 p 117 2010

[17] W El-Assi M Salah Mahmoud and K Nurul Habib ldquoEffectsof built environment and weather on bike sharing demand astation level analysis of commercial bike sharing in TorontordquoTransportation vol 44 no 3 pp 589ndash613 2017

[18] R C Hampshire and L Marla An Analysis of Bike SharingUsage Explaining Trip Generation and Attraction from Ob-served Demand Transportation Research Board WashingtonDC USA 2012

[19] Y Zhang T -omas M Brussel and M van MaarseveenldquoExploring the impact of built environment factors on the useof public bikes at bike stations case study in ZhongshanChinardquo Journal of Transport Geography vol 58 pp 59ndash702017

[20] L K Maurer Feasibility Study for a Bicycle Sharing Programin Sacramento California Transportation Research BoardWashington DC USA 2011

[21] K Gebhart and R B Noland ldquo-e impact of weather con-ditions on bikeshare trips in Washington DCrdquo Trans-portation vol 41 no 6 pp 1205ndash1225 2014

[22] D Buck and R Buehler Bike Lanes and Other Determinants ofCapital Bikeshare Trips Transportation Research BoardWashington DC USA 2012

[23] X L G S Wang ldquoModeling bike share station activity effectsof nearby businesses and jobs on trips to and from stationsrdquoJournal of Urban Planning amp Development vol 142 no 12012

[24] R A Rixey ldquoStation-level forecasting of bikesharing rider-shiprdquo Transportation Research Record Journal of the Trans-portation Research Board vol 2387 no 1 pp 46ndash55 2013

[25] J T Wong and C Y Cheng ldquoExploring activity patterns ofthe Taipei public bike sharing systemrdquo Journal of the EasternAsia Society for Transportation Studies vol 11 pp 1012ndash10282015

[26] J Froehlich J Neumann and N Oliver ldquoMeasuring the pulseof the city through shared bicycle programsrdquo in Proceedings ofthe International Workshop on Urban Community and SocialApplications of Networked Sensing Systems-UrbanSense 08Raleigh NC USA November 2008

[27] P Vogel T Greiser and D CMattfeld ldquoUnderstanding bike-sharing systems using data mining exploring activity pat-ternsrdquo Procedia - Social and Behavioral Sciences vol 20pp 514ndash523 2011

[28] N Lathia S Ahmed and L Capra ldquoMeasuring the impact ofopening the London shared bicycle scheme to casual usersrdquoTransportation Research Part C Emerging Technologiesvol 22 no 5 pp 88ndash102 2012

[29] P Borgnat P Abry P Flandrin C Robardet J-B Rouquierand E Fleury ldquoShared bicycles in a city a signal processingand data analysis perspectiverdquo Advances in Complex Systemsvol 14 no 3 pp 415ndash438 2011

[30] O O Brien J Cheshire andM Batty ldquoMining bicycle sharingdata for generating insights into sustainable transport sys-temsrdquo Journal of Transport Geography vol 34 pp 262ndash2732014

[31] C Etienne and O Latifa ldquoModel-based count series clusteringfor bike sharing system usage mining a case study with thevelibrsquo system of parisrdquo ACM Transactions on IntelligentSystems and Technology vol 5 no 3 2014

[32] X L Zhou ldquoUnderstanding spatiotemporal patterns of bikingbehavior by analyzing massive bike-sharing data in ChicagordquoPLoS One vol 10 no 10 2015

[33] L Caggiani R Camporeale M Ottomanelli and W Y SzetoldquoAmodeling framework for the dynamic management of free-floating bike-sharing systemsrdquo Transportation Research PartC Emerging Technologies vol 87 pp 159ndash182 2018

[34] S Reiss and K Bogenberger ldquoGPS-data analysis of munichrsquosfree-floating bike sharing system and application of an op-erator-based relocation strategyrdquo in Proceedings of the 2015IEEE 18th International Conference on Intelligent Trans-portation Systems-(ITSC 2015) September 2015

[35] A Pal Yu Zhang and C Kwon Analyzing Mobility Patternsand Imbalance of Free-Floating Bike Sharing SystemsTransportation Research Board Washington DC USA 2017

[36] P-N Tan M Steinbach and V Kumar Introduction to DataMining Pearson Addison-Wesley Boston MA USA 2006

[37] K Fukunaga and L Hostetler ldquo-e estimation of the gradientof a density function with applications in pattern recogni-tionrdquo IEEE Transactions on Informationlteory vol 21 no 1pp 32ndash40 1975

[38] D Comaniciu and P Meer ldquoMean shift a Robust approachtowards feature space analysisrdquo IEEE Transactions on PatternAnalysis and Machine Intelligence vol 24 no 5 2002

[39] M Fernandez-Delgado and D Amorim ldquoDo we need hun-dreds of classifiers to solve real world classification problemsrdquoJournal of Machine Learning Research vol 15 pp 3133ndash31812014

[40] L Breiman L Breiman and R A Cutler ldquoRandom forestsmachine learningrdquo Journal of Clinical Microbiology vol 2pp 199ndash228 2001

Journal of Advanced Transportation 13