8
Research Article The Aggregation Mechanism Mining of Passengers’ Flow with Period Distribution Based on Suburban Rail Lines Xiaobing Ding, Hua Hu, Zhigang Liu, and Haiyan Zhu College of Urban Rail Transportation, Shanghai University of Engineering Science, Shanghai 201620, China Correspondence should be addressed to Xiaobing Ding; [email protected] Received 31 December 2016; Revised 21 March 2017; Accepted 8 May 2017; Published 22 June 2017 Academic Editor: Seenith Sivasundaram Copyright © 2017 Xiaobing Ding et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Stranding too many passengers at the stations will reduce the service level; if measures are not taken, it may lead to serious security problems. Deeply mining the time distribution mechanism of passenger flow will guide the operation enterprises to make the operation plans, emergency evacuation plans, and so on. Firstly, the big data theory is introduced to construct the mining model of temporal aggregation mechanism with supplement and correction function, then, the clustering algorithm Time cluster , is used to mine the peak time interval of passenger flow, and the passenger flow time aggregation rule is studied from the angle of traffic dispatching command. Secondly, according to the rule of mining traffic aggregation, passenger flow calculation can be determined by the time of train lines in the suburbs of vehicle speed ratio, to match the time period of the uneven distribution of passenger flow. Finally, an example is used to prove the superiority of model in determining train ratios with the experience method. Saving energy consumption improves the service level of rail transit. e research can play a positive role in the operation of energy consumption and can improve the service level of urban rail transit. 1. Introduction e success of urbanization in China has also brought some problems as follows: urban traffic congestion in megacities is becoming more and more serious, and the focus of urban life is shiſting to suburbs, and the passenger flow between suburbs increases. e passenger flow of the rail transit has a great change that is difficult to predict, resulting in increased difficulty of passenger organization; large passenger flow stranded will lead to major safety issues; for example, the serious trampling accident in Chen Yi Square in Shanghai Bund on New Year’s Day of 2015 was a security alarm. By means of big data theory analysis and advanced data mining technology, the study on the aggregation rule of rail passenger flow at the station can obtain the characteristics of passenger flow at different stations and different means in time and help the operators adjust the driving schedule in time and provide decision-making basis for daily transportation dispatching. In many passenger travel distribution models, the gravity model and the growth coefficient method are widely used. In 1940, Stouffer first proposed the interventional opportunity model. In 1955, Casey proposed a gravity model, which was subsequently interpreted NT in terms of maximum entropy and maximum likelihood. In 1965, Fumess proposed the well- known growth coefficient method. Subsequently, the linear regression model was brought into use in traffic planning to predict travel occurrence and attractiveness in the United States. At the end of 1960, the British scholars put forward a cross-classification method; on this basis, Gordon proposed a nonfamily travel model [1]. Yam et al. developed a special model to predict official travel [2]. Bowman and Ben-Akiva established a model to predict the travel occurrence from the traveler’s activities as the starting point [3]. ese foreign scholars made a more in-depth study in the passenger flow prediction and distribution models, which laid a theoretical basis. In recent years, many studies have been conducted in China: Xiaoguang proposed a polynomial distribution lagging model for short-term traffic flow forecasting, which gave overall considerations to factors impacting on the traffic state time series except its own lagging term, and the model calculation is more accurate [4]. Enjian analyzed the theoretical flaws in the traditional disaggregated models and constructed an OD distribution forecasting model based on aggregated data [5]. On the joint model of OD distribution Hindawi Discrete Dynamics in Nature and Society Volume 2017, Article ID 3819304, 7 pages https://doi.org/10.1155/2017/3819304

The Aggregation Mechanism Mining of Passengers’ Flow with

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Aggregation Mechanism Mining of Passengers’ Flow with

Research ArticleThe Aggregation Mechanism Mining of Passengersrsquo Flow withPeriod Distribution Based on Suburban Rail Lines

Xiaobing Ding Hua Hu Zhigang Liu and Haiyan Zhu

College of Urban Rail Transportation Shanghai University of Engineering Science Shanghai 201620 China

Correspondence should be addressed to Xiaobing Ding dxbsuda163com

Received 31 December 2016 Revised 21 March 2017 Accepted 8 May 2017 Published 22 June 2017

Academic Editor Seenith Sivasundaram

Copyright copy 2017 Xiaobing Ding et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Stranding too many passengers at the stations will reduce the service level if measures are not taken it may lead to serious securityproblems Deeply mining the time distribution mechanism of passenger flow will guide the operation enterprises to make theoperation plans emergency evacuation plans and so on Firstly the big data theory is introduced to construct the mining model oftemporal aggregation mechanism with supplement and correction function then the clustering algorithm Time cluster119898119899119896 is usedto mine the peak time interval of passenger flow and the passenger flow time aggregation rule is studied from the angle of trafficdispatching command Secondly according to the rule of mining traffic aggregation passenger flow calculation can be determinedby the time of train lines in the suburbs of vehicle speed ratio to match the time period of the uneven distribution of passenger flowFinally an example is used to prove the superiority of model in determining train ratios with the experience method Saving energyconsumption improves the service level of rail transit The research can play a positive role in the operation of energy consumptionand can improve the service level of urban rail transit

1 Introduction

The success of urbanization in China has also brought someproblems as follows urban traffic congestion in megacitiesis becoming more and more serious and the focus of urbanlife is shifting to suburbs and the passenger flow betweensuburbs increases The passenger flow of the rail transit has agreat change that is difficult to predict resulting in increaseddifficulty of passenger organization large passenger flowstranded will lead to major safety issues for example theserious trampling accident in Chen Yi Square in ShanghaiBund on New Yearrsquos Day of 2015 was a security alarm Bymeans of big data theory analysis and advanced data miningtechnology the study on the aggregation rule of rail passengerflow at the station can obtain the characteristics of passengerflow at different stations and different means in time and helpthe operators adjust the driving schedule in time and providedecision-making basis for daily transportation dispatching

In many passenger travel distribution models the gravitymodel and the growth coefficient method are widely used In1940 Stouffer first proposed the interventional opportunitymodel In 1955 Casey proposed a gravity model which was

subsequently interpreted NT in terms of maximum entropyandmaximum likelihood In 1965 Fumess proposed thewell-known growth coefficient method Subsequently the linearregression model was brought into use in traffic planningto predict travel occurrence and attractiveness in the UnitedStates At the end of 1960 the British scholars put forward across-classification method on this basis Gordon proposeda nonfamily travel model [1] Yam et al developed a specialmodel to predict official travel [2] Bowman and Ben-Akivaestablished a model to predict the travel occurrence fromthe travelerrsquos activities as the starting point [3] These foreignscholars made a more in-depth study in the passenger flowprediction and distribution models which laid a theoreticalbasis In recent years many studies have been conductedin China Xiaoguang proposed a polynomial distributionlagging model for short-term traffic flow forecasting whichgave overall considerations to factors impacting on thetraffic state time series except its own lagging term and themodel calculation is more accurate [4] Enjian analyzed thetheoretical flaws in the traditional disaggregated models andconstructed an OD distribution forecasting model based onaggregated data [5] On the joint model of OD distribution

HindawiDiscrete Dynamics in Nature and SocietyVolume 2017 Article ID 3819304 7 pageshttpsdoiorg10115520173819304

2 Discrete Dynamics in Nature and Society

Interaction module

Raw data

Naive Bayesianclassification missingvalue filling

Rule mining

True-value fitting

Value correction

Data verifying

Cleaning completedCleaning required

Figure 1 The flowchart of data cleaning and preprocessing

and path selection Professor Peng used the extrapolationmethod on parameter calibration because of the lack ofsurvey results on destination selection which resulted in theseparation of two phases and affected the prediction accuracyof the model [6] A short-term passenger flowOD estimationmodel based on the state space method was proposed andthe accuracy of the prediction results is improved [7] In theODmatrix the objective function was estimated by using thegeneralized least squaremodel as theODmatrix and the cor-responding solving process using the Lagrangian algorithmwas given However the accuracy of back stepping resultswas barely satisfactory [8 9] A multiobjective mathematicalmodel was proposed to address a situation in which severalprojects are candidate to be invested completely or partiallyThe computational results were compared with solutionsobtained by NSGA-II algorithm [10] Amultiobjective modelfor relief resources distribution facilities under an uncertaincondition is investigated in two ways of demand satisfactionby considering the relief resources accessibility and demandsatisfaction in a fuzzy logic To solve the problem andto do the sensitivity analysis the NSGA-II algorithm waspresented A constraintmethodwas also proposed to evaluatethe performance of the proposed algorithm [11]

The above research results have not been studied deepin the mass data preprocessing and there have still been nosupplementary mechanisms for coping with lack of data fieldvalue and so on The application of passenger data mainlyfocuses on the calibration of model parameters If we cancarry on the fine pretreatment and the missing supplementto the passenger flow data and on this basis we exploredeeper to probe the station time aggregation mechanismof passengers which will provide the data support for theoptimization of the scheme Based on the passenger flow dataof AFC inbound and outbound traffic from the operators thispaper studies the information tracking based on passengersrsquotransportation card numbers and travel time and calculatesthe travel time of passengers by combining relevant timeparameter data so as to construct the model of miningpassenger aggregation mechanism It is of great help on trainoperating plan making and temporary adjustment and canprovide theoretical and methodic references for supplemen-tation of passenger flow forecasting and emergency reliefmodel

2 Urban Rail Transit Passenger Flow DataCleaning and Processing

After many years of operation the rail transportation enter-prise has accumulated a great deal of operation-related dataand the Oracle database has a large capacity and because ofnot-so-reasonable design of fields stored procedures datarules and paradigms in the early stage of database con-struction the stored operational data contains redundancyor error that not only occupies a lot of storage space andcauses a waste of storage resources but also greatly reducesthe effectiveness of data analysis and stability Thereforefor abnormal data identification missing data complementerror data correction redundant data cleaning and so on[12] big data analysis and processing methods are of greatsignificance

Data redundancy missing data and nonstandard qualityproblems will have a fatal impact on big data application sothere is a need for cleaning big data having quality problemsYou can use the parallel technology to achieve high scalabilityof big data cleaning However due to the lack of effectivedesign common data cleaning methods based on MapRe-duce have redundant computation which leads to the perfor-mance degradation To optimize the data cleaning to improvethe efficiency of cleaning operations may start from theprocess shown in Figure 1 to design the optimization process

In the data generated by the rail transit operation someare filled in by hand some are automatically generated by themachine and some are inevitably missed For this reasona missing value filling mechanism based on naive Bayesianclassification is proposed See Figure 2 for the basic process

119875 (119862119894 | 119883) = 119875 (119883 | 119862119894) times 119875 (119862119894)119875 (119883) (1)

119875(119862119894 | 119883)means that when incident 119883 occurs the prob-ability of incident 119862119894 occurs 119875(119883 | 119862119894) means that whenincident119862119894 occurs the probability of incident119883 occurs119875(119862119894)means the probability of incident 119862119894 and 119875(119883) means theprobability of incident119883(1) Parameter Estimation The entire missing value is pop-ulated with the value of the maximum probability as thefilling value calculated by Bayesian classification [13]The task

Discrete Dynamics in Nature and Society 3

Start Parameter estimation Connection Filling End

Figure 2 The flowchart of missing value filling

of the parameter estimation module is to calculate all theprobabilities using (1) where all values of 119875(119883) are constantso only the calculation of 119875(119883 | 119862119894) times 119875(119862119894) is needed If theprior probability of missing values is unknown then it canbe assumed to be equal probability and 119875(119883 | 119862119894) can becalculated using

119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdottimes 119875 (119883119899 | 119862119894) (2)

119875(119883 | 119862119894) means that when incident 119862119894 occurs theprobability of incident 119883 occurs 119875(1198831 | 119862119894) means thatwhen incident 119862119894 occurs the probability of incident1198831 occurs 119875(119883119899 | 119862119894) means that when incident 119862119894occurs the probability of incident119883119899 occurs

So the entire parameter estimation module is used tocalculate the probability 119875(119883 | 119862119894) of each value of all attrib-utes

From the perspective of probability theory when thesample space is large enough the probability can be approx-imately equivalent to that of the frequency and the systemuses frequency of occurrence of each attribute value in tupleswithout missing values to calculate probability 119875(119883119899 | 119862119894)(2) Connection Module According to (2) the probability ofvarious values to be filled in the tuples containing missingvalues that are determined by its dependency attribute valuecan be calculated However since MapReduce can onlyprocess one record at a time in the Map phase and Reducephase the system must make the dependency attribute valueand its conditional probability correlate this is existencenecessity and the problem to be solved of the connectionmodule The input data of the connection module is theoutput data of the parameter estimation module and theoriginal data to be filledThe output data is the file correlatingthe dependent attribute values in the tuples containing themissing data with the conditional probability of the values

(3) Filling Module The filling module is implemented withMapReduce First the output result and the original inputdata of the connection module are subject to connectiveoperation with offset as the key The Map phase is similarto the connection module The Reduce phase uses (2) tocalculate the conditional probability corresponding to each119862119894 (a possible value of an attribute to be filled) and select119862119894 with the maximum 119875(119862119894 | 119883) probability for filling themissing value

3 The Time_cluster119898119899119896 Model ofAggregation Mechanism Mining

When the database is changed the fuzzy frequent attribute setmaynot be frequent in the updated database In this paper the

definition of negative bounds of the fuzzy frequent attributesets is proposed for the fuzzy sets of the fuzzy frequentattribute set the negative boundary and the fuzzy attributeset in the original database and used for mining the timepassenger flow

31 Mining Association Rules of Passenger Flow Time Seriesin Rail Transit In fuzzy discretization of time series set thetime series 119878 = (1199091 1199092 1199093 119909119899) the time window of thewidth 119908 acts on the subsequence 119878119894 = (1199091 1199092 119909119894+119908minus1)formed by 119878 and a series of subsequences 1199091 1199092 1199093 119909119899minus119908+1 of the width 119908 is formed by single-step slip

119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 (3)

119882(119904 119908) is regarded as 119899 minus 119908 + 1 points in 119908 Euclideanspace randomly assigned to the class 119896 to calculate eachtype of center as a representative of each category and theelements 119878119894 119894 = 1 2 119899 minus 119908 + 1 of the set119882(119904 119908) and themembership attribute function 119906119895(119878119894) belonging to the class-119895representatives is

119906119895 (119878119894) = (1 10038171003817100381710038171003817119904119894 minus 119909119895100381710038171003817100381710038172)1(119887minus1)

sum119896119888=1 (1 1003817100381710038171003817119904119894 minus 1199091198881003817100381710038171003817)2 119895 = 1 2 119896 119887 gt 1 (4)

where 119887 gt 1 denotes a constant that controls the fuzzy degreeof the clustering results 119904119894minus1199091198952 represents the square of thedistance from each point to the representative point of theclass 119895

In the calculation of the support if 119860 occurs 119861 occursin time 119879 that is we need to determine the frequency ofoccurrence of 119860

119865 (119860) = 119899minus119908+1sum119894=1

119906119860 (119878119894) (5)

where 119906119860(119878119894) is the membership of point 119878119894 belonging to the119860th representative attributeThe support 119888 of the correlation rule 119860 119879997888rarr 119861 is 119888(119860 119879997888rarr119861) = 119865(119860 119861 119879)119865(119860) where 119865(119860 119861 119879) denotes frequency of

occurrence of 119861 in time 119879 after 119860 happens with reference tothe 119862-means method of the rule 119860 119879997888rarr 119861

119862 (119861119879 119860) = 119901 (119860) times 119901 (119861119879 | 119860) log 119901 (119861119879 | 119860)119901 (119861119879)+ [1 minus 119901 (119861119879 | 119860)] log 1 minus 119901 (119861119879 | 119860)1 minus 119901 (119861119879)

(6)

where 119901(119860) denotes probability of occurrence of the rulein 119860 119901(119861119879 | 119860) represents occurrence forms of 119861 in time119879 after 119860 occurs and the right side of (5) represents the

4 Discrete Dynamics in Nature and Society

Table 1 The AFC passenger flow data of Shanghai Metro Line 16 (original)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui 31U07452172874 0558 University Town 0649 Zhaojiabang PositiveU31842254712 0658 University Town 0728 Yishan Rd PositiveU12452172127 0831 Jiuting 0903 XujiahuiU07452172874 0621 University Town 0723 Zhaojiabang PositiveU31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1751 University Town PositiveU07452172874 1702 Xujiahui 30

information acquisition process from a priori probability119901(119861119879) to posteriori probability 119901(119861119879 | 119860)32 Mining of Temporal Passenger Flow OD Pairs Basedon the Support of Correlation Rules By preprocessing theAFC passenger flow data of the rail transit stations theobtained data have beenmore accurate and in accordwith thedata format and redundancy requirement of the applicationdata mining tools The main algorithm is used to exploretemporal aggregation correlation rules for passenger datafor the purpose of mining the temporal passenger flow andthe passenger flow OD pairs with traffic card numbers asthe major cord and the designed algorithm is shown asfollows

Step 1 Calculate 119906119860(119878119894) first and then 119888(119860 119879997888rarr 119861) accordingto the support of passenger flow of the rail transit and setminsupport = 119888(119860 119879997888rarr 119861)Step 2 Initialize AFC passenger flow database according tothe mining condition with the cleaned and preprocesseddata as initial data for correlation rule mining scanning thetransaction database 119879ID and finding all the item sets whoselength is 119896 = 1 forming candidate 1-item set (1198621) substituting1198621 into (5) and (6) and calculating the support of eachitem comparing with the minimum support one by one andforming the frequent 1-item set if the support is greater thanthe minsupport

Step 3 Generate candidate (119896 + 1) item set 119862119896+1 accordingto the frequent 119896-item set if 119862119896+1 = Φ go to the next stepotherwise the loop ends

Step 4 Scan the database to calculate and determine thesupport for each candidate item set

Step 5 Delete candidate items with the support smaller thanminsupport forming a frequent item set 119871119896+1 of the (119896 + 1)item

Step 6 119896 = 119896 + 1 go to Step 3

Step 7 This results in the frequent item set correlation rulesof temporal passenger flow aggregation

Step 8 Output 2-hour particle size of the temporal passengerflow aggregation and the results of OD pairs to guide thepreparation and adjustment of rail transit operation plans

4 Case Study and Analysis

41 Cleaning and Missing Filling of Passenger Flow DataPassenger flow data of this paper comes from the AFC gateof Shanghai Rail Transit Line 9 and passenger flow periodcovers 600ndash2200Themain data field includes card numberarrival time inbound station departure time outboundstation P + R (the indication if the P + R parking lots is used)and total time (travel time) many of which are redundantdata such as fare evasion transportation card lost and otherreasons that resulted in the invalid data and some raw dataare shown as Table 1

From the original data we can see that some data fieldsaremissing especially the fields that have to be filled the datacleaning and filling method designed in Section 2 is used todeal with the original data to facilitate data mining

In order to fill the loss values of the original data substi-tuting (2) into (1) yields

119875 (119862119894 | 119883) = 119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdot times 119875 (119883119899 | 119862119894) times 119875 (119862119894)119875 (119883) (7)

Based on the card number field value of AFC data traveltimes and the OD percentage of each passenger are collectedand retrieved by SQL statement and substituted into (7) and

through parameter estimation connection filling and othermodules supplementing the original data yields results ofTable 2

Discrete Dynamics in Nature and Society 5

Table 2 The AFC passenger flow data of Shanghai Metro Line 16 (supplementary and correction)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui Negative 31U07452172874 0558 University Town 0649 Zhaojiabang Rd Positive 51U31842254712 0658 University Town 0728 Yishan Rd Negative 30U12452172127 0831 Jiuting 0903 Xujiahui Negative 32U07452172874 0621 University Town 0710 Zhaojiabang Rd Positive 41U31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1732 University Town Positive 46U07452172874 1702 Xujiahui 1748 University Town Positive 30

Table 3 Characteristic values of 2-hour particle size of Da Xue Cheng Station

Time series ClassMembership 119906119895(119878119894) Calculation of support 119862(119860 minus 119861) Passenger flow

600ndash800 085 090 47804801ndash1000 074 086 310121001-1200 091 084 114301201ndash1400 090 080 90101401ndash1600 089 084 130141601ndash1800 082 081 302541801ndash2000 069 083 301472001ndash2200 092 084 29874

Table 4 The aggregation passenger flow based on distribution time

Time range StationSt1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800 7031 6031 9214 15274 22215 14721 16014 8721 10721 30264 28214801ndash1000 12031 13031 8610 11201 20203 6321 11782 6321 9843 19721 187341001ndash1200 10574 7574 7141 9845 6201 5014 9487 7034 7123 10351 163211201ndash1400 9414 3414 6573 9216 5845 4782 5014 6791 8787 9014 100241401ndash1600 9018 10034 5362 8142 9216 3587 6571 8482 5591 7782 97891601ndash1800 8124 8124 5217 9345 13201 8721 20362 6217 10258 9587 81371801ndash2000 10250 22230 6057 9714 9231 6321 18217 9354 9239 8214 103542001ndash2200 9217 13217 5325 8521 7227 5014 9057 8242 8427 9241 9792

42 Mining of Passenger Flow Period Aggregation Accordingto the time series discretization design in Section 31 thepassenger flow period is divided according to the 2-hourparticle size and single-step slip forms a series of time series119878 = (1199091 1199092 1199093 119909119899) with width of 2

Calculating119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 andthe membership function 119906119895(119878119894) = (1119904119894 minus 1199091198952)1(119887minus1)sum119896119888=1(1119904119894 minus 119909119888)2 119895 = 1 2 119896 119887 gt 1 obtains the mem-bership of station passenger flow aggregation of every 2-hourparticle size

Passenger flow period covers 600 to 2000 and isdivided as 2-hour particle size into 600ndash800 800ndash10001000ndash1200 1200ndash1400 1400ndash1600 1600ndash1800 1800ndash2000 and 2000ndash2200 and membership 119906119895(119878119894) calculationof support 119862(119860 minus 119861) and passenger flow of each period areshown in Table 3

The above-mentioned mining algorithm can be used tocalculate 2-hour particle size period passenger flow of theShanghaiMetro Line 9GuilinRoadCaohejingDevelopmentZone Hechuan Road Xingzhong Road Qibao ZhongchunRoad Jiuting Sijing Sheshan Dongjing Da Xue ChengStation The results are shown in Table 4

Figure 3 shows the change trend of distribution time dataIt can show the passenger flow distribution of the stations inthe 2-hour particle size range visually The distribution of thepassenger flow can guide the operation directly

The mining results in Table 4 can provide the datasupport for the travel rush hour of the passengers and trafficadjustment scheme of the rail transit station so that the traincapacity can be matched with the passenger flow and thepassengers are quickly distributed to the road network It canalso provide data support for the determination of the speed

6 Discrete Dynamics in Nature and Society

Table 5 The mining of passengersrsquo travel OD pair rule (upward direction)

Card number O D P + RU12452172127 Jiuting Xujiahui NegativeU07452172874 Songjiang University Town Zhaojiabang Rd PositiveU31842254712 Songjiang University Town Yishan Rd PositiveU01552172876 Songjiang New Town Qibao PositiveU07452172874 Songjiang New Town Yishan Rd PositiveU12452376572 Songjiang Sports Center Century Ave NegativeU06571172546 Dongjing Century Ave NegativeU318745684324 Sheshan Xujiahui NegativeU01555476158 Qibao Century Ave NegativeU15478475786 Jiuting Century Ave Negative

0

5000

10000

15000

20000

25000

30000

35000

St1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800

801ndash1000

1001ndash1200

1201ndash14001401ndash1600

1601ndash1800

1801ndash2000

2001ndash2200

Figure 3 The aggregation passenger flow based on distributiontime

ratio of expressslow trains of the subway lines of the railtransitThe ratio of the expressslow trains can be determinedaccording to the temporal passenger flow thus saving energyconsumption and passenger travel time

43 Mining of Passenger Flow OD Pairs Based on CorrelationRules According to Section 32 119888(119860 119879997888rarr 119861) of the OD pairsof each traffic card number and comparedwithminsupport =119888(119860 119879997888rarr 119861) the pairs with values higher than it are determinedas the OD pairsThemining results are shown in Table 5Themining results can provide accurate decision-making supportfor the determination of the arrival scheme of the expresstrains in the expressslow mode

It can be seen from Table 5 that if we can combine thetravel time of passengers to mine the OD pairs it can providemore elaborated decision support for the optimization ofthe expressslow train program To a certain extent theaccessibility of passenger travel is increased especially in thepeak commuting accessibility is particularly important toreduce the unnecessary transfer and transfer passenger flowin the road network is relatively reduced

5 Conclusions

Suburban rail transport passenger transport organizationproblems have troubled operation and management enter-prises over years Passenger transport organizations areinvolved in key indicators such as the temporal passengerflow passenger OD and speed car ratio of the expressslowtrains Passenger flow data of the rail transit are complicatedin structure in which redundancy and missing coexist Inthis paper we use the big data processing method to purifyand supplement the original data and then explore the timedistribution rule and the OD pairs of the passenger flow indepth The passenger flow AFC data of Shanghai Rail TransitLine 9 are taken as inputs and are revised supplemented andmined to give out the time distribution characteristics of thepassenger flow of Line 9 in the end and verify the feasibilityof the method In the future the time distribution rule andOD pairs of passenger flow under the network condition willbemined in order to guide the establishment of the operationplan of the whole road network

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research described in this paper was supported by theNational Natural Science Foundation of China (Grant no71601110) the National Key Research and Development Planof China (Grant no 2016YFC0802505) and the NationalKey Technologies R amp D Program of China (Grant no2015BAG19B02-28) The authors also gratefully acknowledgethose who provided data and suggestions

References

[1] W Gordon ldquoImproved modeling of non-home-base tripsrdquoTransportation Research Record pp 23ndash25 1996

[2] R C M Yam R CWhitfield and RW F Chung ldquoForecastingtraffic generation in public housing estatesrdquo Journal of Trans-portation Engineering vol 126 no 4 pp 358ndash361 2000

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: The Aggregation Mechanism Mining of Passengers’ Flow with

2 Discrete Dynamics in Nature and Society

Interaction module

Raw data

Naive Bayesianclassification missingvalue filling

Rule mining

True-value fitting

Value correction

Data verifying

Cleaning completedCleaning required

Figure 1 The flowchart of data cleaning and preprocessing

and path selection Professor Peng used the extrapolationmethod on parameter calibration because of the lack ofsurvey results on destination selection which resulted in theseparation of two phases and affected the prediction accuracyof the model [6] A short-term passenger flowOD estimationmodel based on the state space method was proposed andthe accuracy of the prediction results is improved [7] In theODmatrix the objective function was estimated by using thegeneralized least squaremodel as theODmatrix and the cor-responding solving process using the Lagrangian algorithmwas given However the accuracy of back stepping resultswas barely satisfactory [8 9] A multiobjective mathematicalmodel was proposed to address a situation in which severalprojects are candidate to be invested completely or partiallyThe computational results were compared with solutionsobtained by NSGA-II algorithm [10] Amultiobjective modelfor relief resources distribution facilities under an uncertaincondition is investigated in two ways of demand satisfactionby considering the relief resources accessibility and demandsatisfaction in a fuzzy logic To solve the problem andto do the sensitivity analysis the NSGA-II algorithm waspresented A constraintmethodwas also proposed to evaluatethe performance of the proposed algorithm [11]

The above research results have not been studied deepin the mass data preprocessing and there have still been nosupplementary mechanisms for coping with lack of data fieldvalue and so on The application of passenger data mainlyfocuses on the calibration of model parameters If we cancarry on the fine pretreatment and the missing supplementto the passenger flow data and on this basis we exploredeeper to probe the station time aggregation mechanismof passengers which will provide the data support for theoptimization of the scheme Based on the passenger flow dataof AFC inbound and outbound traffic from the operators thispaper studies the information tracking based on passengersrsquotransportation card numbers and travel time and calculatesthe travel time of passengers by combining relevant timeparameter data so as to construct the model of miningpassenger aggregation mechanism It is of great help on trainoperating plan making and temporary adjustment and canprovide theoretical and methodic references for supplemen-tation of passenger flow forecasting and emergency reliefmodel

2 Urban Rail Transit Passenger Flow DataCleaning and Processing

After many years of operation the rail transportation enter-prise has accumulated a great deal of operation-related dataand the Oracle database has a large capacity and because ofnot-so-reasonable design of fields stored procedures datarules and paradigms in the early stage of database con-struction the stored operational data contains redundancyor error that not only occupies a lot of storage space andcauses a waste of storage resources but also greatly reducesthe effectiveness of data analysis and stability Thereforefor abnormal data identification missing data complementerror data correction redundant data cleaning and so on[12] big data analysis and processing methods are of greatsignificance

Data redundancy missing data and nonstandard qualityproblems will have a fatal impact on big data application sothere is a need for cleaning big data having quality problemsYou can use the parallel technology to achieve high scalabilityof big data cleaning However due to the lack of effectivedesign common data cleaning methods based on MapRe-duce have redundant computation which leads to the perfor-mance degradation To optimize the data cleaning to improvethe efficiency of cleaning operations may start from theprocess shown in Figure 1 to design the optimization process

In the data generated by the rail transit operation someare filled in by hand some are automatically generated by themachine and some are inevitably missed For this reasona missing value filling mechanism based on naive Bayesianclassification is proposed See Figure 2 for the basic process

119875 (119862119894 | 119883) = 119875 (119883 | 119862119894) times 119875 (119862119894)119875 (119883) (1)

119875(119862119894 | 119883)means that when incident 119883 occurs the prob-ability of incident 119862119894 occurs 119875(119883 | 119862119894) means that whenincident119862119894 occurs the probability of incident119883 occurs119875(119862119894)means the probability of incident 119862119894 and 119875(119883) means theprobability of incident119883(1) Parameter Estimation The entire missing value is pop-ulated with the value of the maximum probability as thefilling value calculated by Bayesian classification [13]The task

Discrete Dynamics in Nature and Society 3

Start Parameter estimation Connection Filling End

Figure 2 The flowchart of missing value filling

of the parameter estimation module is to calculate all theprobabilities using (1) where all values of 119875(119883) are constantso only the calculation of 119875(119883 | 119862119894) times 119875(119862119894) is needed If theprior probability of missing values is unknown then it canbe assumed to be equal probability and 119875(119883 | 119862119894) can becalculated using

119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdottimes 119875 (119883119899 | 119862119894) (2)

119875(119883 | 119862119894) means that when incident 119862119894 occurs theprobability of incident 119883 occurs 119875(1198831 | 119862119894) means thatwhen incident 119862119894 occurs the probability of incident1198831 occurs 119875(119883119899 | 119862119894) means that when incident 119862119894occurs the probability of incident119883119899 occurs

So the entire parameter estimation module is used tocalculate the probability 119875(119883 | 119862119894) of each value of all attrib-utes

From the perspective of probability theory when thesample space is large enough the probability can be approx-imately equivalent to that of the frequency and the systemuses frequency of occurrence of each attribute value in tupleswithout missing values to calculate probability 119875(119883119899 | 119862119894)(2) Connection Module According to (2) the probability ofvarious values to be filled in the tuples containing missingvalues that are determined by its dependency attribute valuecan be calculated However since MapReduce can onlyprocess one record at a time in the Map phase and Reducephase the system must make the dependency attribute valueand its conditional probability correlate this is existencenecessity and the problem to be solved of the connectionmodule The input data of the connection module is theoutput data of the parameter estimation module and theoriginal data to be filledThe output data is the file correlatingthe dependent attribute values in the tuples containing themissing data with the conditional probability of the values

(3) Filling Module The filling module is implemented withMapReduce First the output result and the original inputdata of the connection module are subject to connectiveoperation with offset as the key The Map phase is similarto the connection module The Reduce phase uses (2) tocalculate the conditional probability corresponding to each119862119894 (a possible value of an attribute to be filled) and select119862119894 with the maximum 119875(119862119894 | 119883) probability for filling themissing value

3 The Time_cluster119898119899119896 Model ofAggregation Mechanism Mining

When the database is changed the fuzzy frequent attribute setmaynot be frequent in the updated database In this paper the

definition of negative bounds of the fuzzy frequent attributesets is proposed for the fuzzy sets of the fuzzy frequentattribute set the negative boundary and the fuzzy attributeset in the original database and used for mining the timepassenger flow

31 Mining Association Rules of Passenger Flow Time Seriesin Rail Transit In fuzzy discretization of time series set thetime series 119878 = (1199091 1199092 1199093 119909119899) the time window of thewidth 119908 acts on the subsequence 119878119894 = (1199091 1199092 119909119894+119908minus1)formed by 119878 and a series of subsequences 1199091 1199092 1199093 119909119899minus119908+1 of the width 119908 is formed by single-step slip

119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 (3)

119882(119904 119908) is regarded as 119899 minus 119908 + 1 points in 119908 Euclideanspace randomly assigned to the class 119896 to calculate eachtype of center as a representative of each category and theelements 119878119894 119894 = 1 2 119899 minus 119908 + 1 of the set119882(119904 119908) and themembership attribute function 119906119895(119878119894) belonging to the class-119895representatives is

119906119895 (119878119894) = (1 10038171003817100381710038171003817119904119894 minus 119909119895100381710038171003817100381710038172)1(119887minus1)

sum119896119888=1 (1 1003817100381710038171003817119904119894 minus 1199091198881003817100381710038171003817)2 119895 = 1 2 119896 119887 gt 1 (4)

where 119887 gt 1 denotes a constant that controls the fuzzy degreeof the clustering results 119904119894minus1199091198952 represents the square of thedistance from each point to the representative point of theclass 119895

In the calculation of the support if 119860 occurs 119861 occursin time 119879 that is we need to determine the frequency ofoccurrence of 119860

119865 (119860) = 119899minus119908+1sum119894=1

119906119860 (119878119894) (5)

where 119906119860(119878119894) is the membership of point 119878119894 belonging to the119860th representative attributeThe support 119888 of the correlation rule 119860 119879997888rarr 119861 is 119888(119860 119879997888rarr119861) = 119865(119860 119861 119879)119865(119860) where 119865(119860 119861 119879) denotes frequency of

occurrence of 119861 in time 119879 after 119860 happens with reference tothe 119862-means method of the rule 119860 119879997888rarr 119861

119862 (119861119879 119860) = 119901 (119860) times 119901 (119861119879 | 119860) log 119901 (119861119879 | 119860)119901 (119861119879)+ [1 minus 119901 (119861119879 | 119860)] log 1 minus 119901 (119861119879 | 119860)1 minus 119901 (119861119879)

(6)

where 119901(119860) denotes probability of occurrence of the rulein 119860 119901(119861119879 | 119860) represents occurrence forms of 119861 in time119879 after 119860 occurs and the right side of (5) represents the

4 Discrete Dynamics in Nature and Society

Table 1 The AFC passenger flow data of Shanghai Metro Line 16 (original)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui 31U07452172874 0558 University Town 0649 Zhaojiabang PositiveU31842254712 0658 University Town 0728 Yishan Rd PositiveU12452172127 0831 Jiuting 0903 XujiahuiU07452172874 0621 University Town 0723 Zhaojiabang PositiveU31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1751 University Town PositiveU07452172874 1702 Xujiahui 30

information acquisition process from a priori probability119901(119861119879) to posteriori probability 119901(119861119879 | 119860)32 Mining of Temporal Passenger Flow OD Pairs Basedon the Support of Correlation Rules By preprocessing theAFC passenger flow data of the rail transit stations theobtained data have beenmore accurate and in accordwith thedata format and redundancy requirement of the applicationdata mining tools The main algorithm is used to exploretemporal aggregation correlation rules for passenger datafor the purpose of mining the temporal passenger flow andthe passenger flow OD pairs with traffic card numbers asthe major cord and the designed algorithm is shown asfollows

Step 1 Calculate 119906119860(119878119894) first and then 119888(119860 119879997888rarr 119861) accordingto the support of passenger flow of the rail transit and setminsupport = 119888(119860 119879997888rarr 119861)Step 2 Initialize AFC passenger flow database according tothe mining condition with the cleaned and preprocesseddata as initial data for correlation rule mining scanning thetransaction database 119879ID and finding all the item sets whoselength is 119896 = 1 forming candidate 1-item set (1198621) substituting1198621 into (5) and (6) and calculating the support of eachitem comparing with the minimum support one by one andforming the frequent 1-item set if the support is greater thanthe minsupport

Step 3 Generate candidate (119896 + 1) item set 119862119896+1 accordingto the frequent 119896-item set if 119862119896+1 = Φ go to the next stepotherwise the loop ends

Step 4 Scan the database to calculate and determine thesupport for each candidate item set

Step 5 Delete candidate items with the support smaller thanminsupport forming a frequent item set 119871119896+1 of the (119896 + 1)item

Step 6 119896 = 119896 + 1 go to Step 3

Step 7 This results in the frequent item set correlation rulesof temporal passenger flow aggregation

Step 8 Output 2-hour particle size of the temporal passengerflow aggregation and the results of OD pairs to guide thepreparation and adjustment of rail transit operation plans

4 Case Study and Analysis

41 Cleaning and Missing Filling of Passenger Flow DataPassenger flow data of this paper comes from the AFC gateof Shanghai Rail Transit Line 9 and passenger flow periodcovers 600ndash2200Themain data field includes card numberarrival time inbound station departure time outboundstation P + R (the indication if the P + R parking lots is used)and total time (travel time) many of which are redundantdata such as fare evasion transportation card lost and otherreasons that resulted in the invalid data and some raw dataare shown as Table 1

From the original data we can see that some data fieldsaremissing especially the fields that have to be filled the datacleaning and filling method designed in Section 2 is used todeal with the original data to facilitate data mining

In order to fill the loss values of the original data substi-tuting (2) into (1) yields

119875 (119862119894 | 119883) = 119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdot times 119875 (119883119899 | 119862119894) times 119875 (119862119894)119875 (119883) (7)

Based on the card number field value of AFC data traveltimes and the OD percentage of each passenger are collectedand retrieved by SQL statement and substituted into (7) and

through parameter estimation connection filling and othermodules supplementing the original data yields results ofTable 2

Discrete Dynamics in Nature and Society 5

Table 2 The AFC passenger flow data of Shanghai Metro Line 16 (supplementary and correction)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui Negative 31U07452172874 0558 University Town 0649 Zhaojiabang Rd Positive 51U31842254712 0658 University Town 0728 Yishan Rd Negative 30U12452172127 0831 Jiuting 0903 Xujiahui Negative 32U07452172874 0621 University Town 0710 Zhaojiabang Rd Positive 41U31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1732 University Town Positive 46U07452172874 1702 Xujiahui 1748 University Town Positive 30

Table 3 Characteristic values of 2-hour particle size of Da Xue Cheng Station

Time series ClassMembership 119906119895(119878119894) Calculation of support 119862(119860 minus 119861) Passenger flow

600ndash800 085 090 47804801ndash1000 074 086 310121001-1200 091 084 114301201ndash1400 090 080 90101401ndash1600 089 084 130141601ndash1800 082 081 302541801ndash2000 069 083 301472001ndash2200 092 084 29874

Table 4 The aggregation passenger flow based on distribution time

Time range StationSt1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800 7031 6031 9214 15274 22215 14721 16014 8721 10721 30264 28214801ndash1000 12031 13031 8610 11201 20203 6321 11782 6321 9843 19721 187341001ndash1200 10574 7574 7141 9845 6201 5014 9487 7034 7123 10351 163211201ndash1400 9414 3414 6573 9216 5845 4782 5014 6791 8787 9014 100241401ndash1600 9018 10034 5362 8142 9216 3587 6571 8482 5591 7782 97891601ndash1800 8124 8124 5217 9345 13201 8721 20362 6217 10258 9587 81371801ndash2000 10250 22230 6057 9714 9231 6321 18217 9354 9239 8214 103542001ndash2200 9217 13217 5325 8521 7227 5014 9057 8242 8427 9241 9792

42 Mining of Passenger Flow Period Aggregation Accordingto the time series discretization design in Section 31 thepassenger flow period is divided according to the 2-hourparticle size and single-step slip forms a series of time series119878 = (1199091 1199092 1199093 119909119899) with width of 2

Calculating119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 andthe membership function 119906119895(119878119894) = (1119904119894 minus 1199091198952)1(119887minus1)sum119896119888=1(1119904119894 minus 119909119888)2 119895 = 1 2 119896 119887 gt 1 obtains the mem-bership of station passenger flow aggregation of every 2-hourparticle size

Passenger flow period covers 600 to 2000 and isdivided as 2-hour particle size into 600ndash800 800ndash10001000ndash1200 1200ndash1400 1400ndash1600 1600ndash1800 1800ndash2000 and 2000ndash2200 and membership 119906119895(119878119894) calculationof support 119862(119860 minus 119861) and passenger flow of each period areshown in Table 3

The above-mentioned mining algorithm can be used tocalculate 2-hour particle size period passenger flow of theShanghaiMetro Line 9GuilinRoadCaohejingDevelopmentZone Hechuan Road Xingzhong Road Qibao ZhongchunRoad Jiuting Sijing Sheshan Dongjing Da Xue ChengStation The results are shown in Table 4

Figure 3 shows the change trend of distribution time dataIt can show the passenger flow distribution of the stations inthe 2-hour particle size range visually The distribution of thepassenger flow can guide the operation directly

The mining results in Table 4 can provide the datasupport for the travel rush hour of the passengers and trafficadjustment scheme of the rail transit station so that the traincapacity can be matched with the passenger flow and thepassengers are quickly distributed to the road network It canalso provide data support for the determination of the speed

6 Discrete Dynamics in Nature and Society

Table 5 The mining of passengersrsquo travel OD pair rule (upward direction)

Card number O D P + RU12452172127 Jiuting Xujiahui NegativeU07452172874 Songjiang University Town Zhaojiabang Rd PositiveU31842254712 Songjiang University Town Yishan Rd PositiveU01552172876 Songjiang New Town Qibao PositiveU07452172874 Songjiang New Town Yishan Rd PositiveU12452376572 Songjiang Sports Center Century Ave NegativeU06571172546 Dongjing Century Ave NegativeU318745684324 Sheshan Xujiahui NegativeU01555476158 Qibao Century Ave NegativeU15478475786 Jiuting Century Ave Negative

0

5000

10000

15000

20000

25000

30000

35000

St1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800

801ndash1000

1001ndash1200

1201ndash14001401ndash1600

1601ndash1800

1801ndash2000

2001ndash2200

Figure 3 The aggregation passenger flow based on distributiontime

ratio of expressslow trains of the subway lines of the railtransitThe ratio of the expressslow trains can be determinedaccording to the temporal passenger flow thus saving energyconsumption and passenger travel time

43 Mining of Passenger Flow OD Pairs Based on CorrelationRules According to Section 32 119888(119860 119879997888rarr 119861) of the OD pairsof each traffic card number and comparedwithminsupport =119888(119860 119879997888rarr 119861) the pairs with values higher than it are determinedas the OD pairsThemining results are shown in Table 5Themining results can provide accurate decision-making supportfor the determination of the arrival scheme of the expresstrains in the expressslow mode

It can be seen from Table 5 that if we can combine thetravel time of passengers to mine the OD pairs it can providemore elaborated decision support for the optimization ofthe expressslow train program To a certain extent theaccessibility of passenger travel is increased especially in thepeak commuting accessibility is particularly important toreduce the unnecessary transfer and transfer passenger flowin the road network is relatively reduced

5 Conclusions

Suburban rail transport passenger transport organizationproblems have troubled operation and management enter-prises over years Passenger transport organizations areinvolved in key indicators such as the temporal passengerflow passenger OD and speed car ratio of the expressslowtrains Passenger flow data of the rail transit are complicatedin structure in which redundancy and missing coexist Inthis paper we use the big data processing method to purifyand supplement the original data and then explore the timedistribution rule and the OD pairs of the passenger flow indepth The passenger flow AFC data of Shanghai Rail TransitLine 9 are taken as inputs and are revised supplemented andmined to give out the time distribution characteristics of thepassenger flow of Line 9 in the end and verify the feasibilityof the method In the future the time distribution rule andOD pairs of passenger flow under the network condition willbemined in order to guide the establishment of the operationplan of the whole road network

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research described in this paper was supported by theNational Natural Science Foundation of China (Grant no71601110) the National Key Research and Development Planof China (Grant no 2016YFC0802505) and the NationalKey Technologies R amp D Program of China (Grant no2015BAG19B02-28) The authors also gratefully acknowledgethose who provided data and suggestions

References

[1] W Gordon ldquoImproved modeling of non-home-base tripsrdquoTransportation Research Record pp 23ndash25 1996

[2] R C M Yam R CWhitfield and RW F Chung ldquoForecastingtraffic generation in public housing estatesrdquo Journal of Trans-portation Engineering vol 126 no 4 pp 358ndash361 2000

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: The Aggregation Mechanism Mining of Passengers’ Flow with

Discrete Dynamics in Nature and Society 3

Start Parameter estimation Connection Filling End

Figure 2 The flowchart of missing value filling

of the parameter estimation module is to calculate all theprobabilities using (1) where all values of 119875(119883) are constantso only the calculation of 119875(119883 | 119862119894) times 119875(119862119894) is needed If theprior probability of missing values is unknown then it canbe assumed to be equal probability and 119875(119883 | 119862119894) can becalculated using

119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdottimes 119875 (119883119899 | 119862119894) (2)

119875(119883 | 119862119894) means that when incident 119862119894 occurs theprobability of incident 119883 occurs 119875(1198831 | 119862119894) means thatwhen incident 119862119894 occurs the probability of incident1198831 occurs 119875(119883119899 | 119862119894) means that when incident 119862119894occurs the probability of incident119883119899 occurs

So the entire parameter estimation module is used tocalculate the probability 119875(119883 | 119862119894) of each value of all attrib-utes

From the perspective of probability theory when thesample space is large enough the probability can be approx-imately equivalent to that of the frequency and the systemuses frequency of occurrence of each attribute value in tupleswithout missing values to calculate probability 119875(119883119899 | 119862119894)(2) Connection Module According to (2) the probability ofvarious values to be filled in the tuples containing missingvalues that are determined by its dependency attribute valuecan be calculated However since MapReduce can onlyprocess one record at a time in the Map phase and Reducephase the system must make the dependency attribute valueand its conditional probability correlate this is existencenecessity and the problem to be solved of the connectionmodule The input data of the connection module is theoutput data of the parameter estimation module and theoriginal data to be filledThe output data is the file correlatingthe dependent attribute values in the tuples containing themissing data with the conditional probability of the values

(3) Filling Module The filling module is implemented withMapReduce First the output result and the original inputdata of the connection module are subject to connectiveoperation with offset as the key The Map phase is similarto the connection module The Reduce phase uses (2) tocalculate the conditional probability corresponding to each119862119894 (a possible value of an attribute to be filled) and select119862119894 with the maximum 119875(119862119894 | 119883) probability for filling themissing value

3 The Time_cluster119898119899119896 Model ofAggregation Mechanism Mining

When the database is changed the fuzzy frequent attribute setmaynot be frequent in the updated database In this paper the

definition of negative bounds of the fuzzy frequent attributesets is proposed for the fuzzy sets of the fuzzy frequentattribute set the negative boundary and the fuzzy attributeset in the original database and used for mining the timepassenger flow

31 Mining Association Rules of Passenger Flow Time Seriesin Rail Transit In fuzzy discretization of time series set thetime series 119878 = (1199091 1199092 1199093 119909119899) the time window of thewidth 119908 acts on the subsequence 119878119894 = (1199091 1199092 119909119894+119908minus1)formed by 119878 and a series of subsequences 1199091 1199092 1199093 119909119899minus119908+1 of the width 119908 is formed by single-step slip

119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 (3)

119882(119904 119908) is regarded as 119899 minus 119908 + 1 points in 119908 Euclideanspace randomly assigned to the class 119896 to calculate eachtype of center as a representative of each category and theelements 119878119894 119894 = 1 2 119899 minus 119908 + 1 of the set119882(119904 119908) and themembership attribute function 119906119895(119878119894) belonging to the class-119895representatives is

119906119895 (119878119894) = (1 10038171003817100381710038171003817119904119894 minus 119909119895100381710038171003817100381710038172)1(119887minus1)

sum119896119888=1 (1 1003817100381710038171003817119904119894 minus 1199091198881003817100381710038171003817)2 119895 = 1 2 119896 119887 gt 1 (4)

where 119887 gt 1 denotes a constant that controls the fuzzy degreeof the clustering results 119904119894minus1199091198952 represents the square of thedistance from each point to the representative point of theclass 119895

In the calculation of the support if 119860 occurs 119861 occursin time 119879 that is we need to determine the frequency ofoccurrence of 119860

119865 (119860) = 119899minus119908+1sum119894=1

119906119860 (119878119894) (5)

where 119906119860(119878119894) is the membership of point 119878119894 belonging to the119860th representative attributeThe support 119888 of the correlation rule 119860 119879997888rarr 119861 is 119888(119860 119879997888rarr119861) = 119865(119860 119861 119879)119865(119860) where 119865(119860 119861 119879) denotes frequency of

occurrence of 119861 in time 119879 after 119860 happens with reference tothe 119862-means method of the rule 119860 119879997888rarr 119861

119862 (119861119879 119860) = 119901 (119860) times 119901 (119861119879 | 119860) log 119901 (119861119879 | 119860)119901 (119861119879)+ [1 minus 119901 (119861119879 | 119860)] log 1 minus 119901 (119861119879 | 119860)1 minus 119901 (119861119879)

(6)

where 119901(119860) denotes probability of occurrence of the rulein 119860 119901(119861119879 | 119860) represents occurrence forms of 119861 in time119879 after 119860 occurs and the right side of (5) represents the

4 Discrete Dynamics in Nature and Society

Table 1 The AFC passenger flow data of Shanghai Metro Line 16 (original)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui 31U07452172874 0558 University Town 0649 Zhaojiabang PositiveU31842254712 0658 University Town 0728 Yishan Rd PositiveU12452172127 0831 Jiuting 0903 XujiahuiU07452172874 0621 University Town 0723 Zhaojiabang PositiveU31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1751 University Town PositiveU07452172874 1702 Xujiahui 30

information acquisition process from a priori probability119901(119861119879) to posteriori probability 119901(119861119879 | 119860)32 Mining of Temporal Passenger Flow OD Pairs Basedon the Support of Correlation Rules By preprocessing theAFC passenger flow data of the rail transit stations theobtained data have beenmore accurate and in accordwith thedata format and redundancy requirement of the applicationdata mining tools The main algorithm is used to exploretemporal aggregation correlation rules for passenger datafor the purpose of mining the temporal passenger flow andthe passenger flow OD pairs with traffic card numbers asthe major cord and the designed algorithm is shown asfollows

Step 1 Calculate 119906119860(119878119894) first and then 119888(119860 119879997888rarr 119861) accordingto the support of passenger flow of the rail transit and setminsupport = 119888(119860 119879997888rarr 119861)Step 2 Initialize AFC passenger flow database according tothe mining condition with the cleaned and preprocesseddata as initial data for correlation rule mining scanning thetransaction database 119879ID and finding all the item sets whoselength is 119896 = 1 forming candidate 1-item set (1198621) substituting1198621 into (5) and (6) and calculating the support of eachitem comparing with the minimum support one by one andforming the frequent 1-item set if the support is greater thanthe minsupport

Step 3 Generate candidate (119896 + 1) item set 119862119896+1 accordingto the frequent 119896-item set if 119862119896+1 = Φ go to the next stepotherwise the loop ends

Step 4 Scan the database to calculate and determine thesupport for each candidate item set

Step 5 Delete candidate items with the support smaller thanminsupport forming a frequent item set 119871119896+1 of the (119896 + 1)item

Step 6 119896 = 119896 + 1 go to Step 3

Step 7 This results in the frequent item set correlation rulesof temporal passenger flow aggregation

Step 8 Output 2-hour particle size of the temporal passengerflow aggregation and the results of OD pairs to guide thepreparation and adjustment of rail transit operation plans

4 Case Study and Analysis

41 Cleaning and Missing Filling of Passenger Flow DataPassenger flow data of this paper comes from the AFC gateof Shanghai Rail Transit Line 9 and passenger flow periodcovers 600ndash2200Themain data field includes card numberarrival time inbound station departure time outboundstation P + R (the indication if the P + R parking lots is used)and total time (travel time) many of which are redundantdata such as fare evasion transportation card lost and otherreasons that resulted in the invalid data and some raw dataare shown as Table 1

From the original data we can see that some data fieldsaremissing especially the fields that have to be filled the datacleaning and filling method designed in Section 2 is used todeal with the original data to facilitate data mining

In order to fill the loss values of the original data substi-tuting (2) into (1) yields

119875 (119862119894 | 119883) = 119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdot times 119875 (119883119899 | 119862119894) times 119875 (119862119894)119875 (119883) (7)

Based on the card number field value of AFC data traveltimes and the OD percentage of each passenger are collectedand retrieved by SQL statement and substituted into (7) and

through parameter estimation connection filling and othermodules supplementing the original data yields results ofTable 2

Discrete Dynamics in Nature and Society 5

Table 2 The AFC passenger flow data of Shanghai Metro Line 16 (supplementary and correction)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui Negative 31U07452172874 0558 University Town 0649 Zhaojiabang Rd Positive 51U31842254712 0658 University Town 0728 Yishan Rd Negative 30U12452172127 0831 Jiuting 0903 Xujiahui Negative 32U07452172874 0621 University Town 0710 Zhaojiabang Rd Positive 41U31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1732 University Town Positive 46U07452172874 1702 Xujiahui 1748 University Town Positive 30

Table 3 Characteristic values of 2-hour particle size of Da Xue Cheng Station

Time series ClassMembership 119906119895(119878119894) Calculation of support 119862(119860 minus 119861) Passenger flow

600ndash800 085 090 47804801ndash1000 074 086 310121001-1200 091 084 114301201ndash1400 090 080 90101401ndash1600 089 084 130141601ndash1800 082 081 302541801ndash2000 069 083 301472001ndash2200 092 084 29874

Table 4 The aggregation passenger flow based on distribution time

Time range StationSt1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800 7031 6031 9214 15274 22215 14721 16014 8721 10721 30264 28214801ndash1000 12031 13031 8610 11201 20203 6321 11782 6321 9843 19721 187341001ndash1200 10574 7574 7141 9845 6201 5014 9487 7034 7123 10351 163211201ndash1400 9414 3414 6573 9216 5845 4782 5014 6791 8787 9014 100241401ndash1600 9018 10034 5362 8142 9216 3587 6571 8482 5591 7782 97891601ndash1800 8124 8124 5217 9345 13201 8721 20362 6217 10258 9587 81371801ndash2000 10250 22230 6057 9714 9231 6321 18217 9354 9239 8214 103542001ndash2200 9217 13217 5325 8521 7227 5014 9057 8242 8427 9241 9792

42 Mining of Passenger Flow Period Aggregation Accordingto the time series discretization design in Section 31 thepassenger flow period is divided according to the 2-hourparticle size and single-step slip forms a series of time series119878 = (1199091 1199092 1199093 119909119899) with width of 2

Calculating119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 andthe membership function 119906119895(119878119894) = (1119904119894 minus 1199091198952)1(119887minus1)sum119896119888=1(1119904119894 minus 119909119888)2 119895 = 1 2 119896 119887 gt 1 obtains the mem-bership of station passenger flow aggregation of every 2-hourparticle size

Passenger flow period covers 600 to 2000 and isdivided as 2-hour particle size into 600ndash800 800ndash10001000ndash1200 1200ndash1400 1400ndash1600 1600ndash1800 1800ndash2000 and 2000ndash2200 and membership 119906119895(119878119894) calculationof support 119862(119860 minus 119861) and passenger flow of each period areshown in Table 3

The above-mentioned mining algorithm can be used tocalculate 2-hour particle size period passenger flow of theShanghaiMetro Line 9GuilinRoadCaohejingDevelopmentZone Hechuan Road Xingzhong Road Qibao ZhongchunRoad Jiuting Sijing Sheshan Dongjing Da Xue ChengStation The results are shown in Table 4

Figure 3 shows the change trend of distribution time dataIt can show the passenger flow distribution of the stations inthe 2-hour particle size range visually The distribution of thepassenger flow can guide the operation directly

The mining results in Table 4 can provide the datasupport for the travel rush hour of the passengers and trafficadjustment scheme of the rail transit station so that the traincapacity can be matched with the passenger flow and thepassengers are quickly distributed to the road network It canalso provide data support for the determination of the speed

6 Discrete Dynamics in Nature and Society

Table 5 The mining of passengersrsquo travel OD pair rule (upward direction)

Card number O D P + RU12452172127 Jiuting Xujiahui NegativeU07452172874 Songjiang University Town Zhaojiabang Rd PositiveU31842254712 Songjiang University Town Yishan Rd PositiveU01552172876 Songjiang New Town Qibao PositiveU07452172874 Songjiang New Town Yishan Rd PositiveU12452376572 Songjiang Sports Center Century Ave NegativeU06571172546 Dongjing Century Ave NegativeU318745684324 Sheshan Xujiahui NegativeU01555476158 Qibao Century Ave NegativeU15478475786 Jiuting Century Ave Negative

0

5000

10000

15000

20000

25000

30000

35000

St1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800

801ndash1000

1001ndash1200

1201ndash14001401ndash1600

1601ndash1800

1801ndash2000

2001ndash2200

Figure 3 The aggregation passenger flow based on distributiontime

ratio of expressslow trains of the subway lines of the railtransitThe ratio of the expressslow trains can be determinedaccording to the temporal passenger flow thus saving energyconsumption and passenger travel time

43 Mining of Passenger Flow OD Pairs Based on CorrelationRules According to Section 32 119888(119860 119879997888rarr 119861) of the OD pairsof each traffic card number and comparedwithminsupport =119888(119860 119879997888rarr 119861) the pairs with values higher than it are determinedas the OD pairsThemining results are shown in Table 5Themining results can provide accurate decision-making supportfor the determination of the arrival scheme of the expresstrains in the expressslow mode

It can be seen from Table 5 that if we can combine thetravel time of passengers to mine the OD pairs it can providemore elaborated decision support for the optimization ofthe expressslow train program To a certain extent theaccessibility of passenger travel is increased especially in thepeak commuting accessibility is particularly important toreduce the unnecessary transfer and transfer passenger flowin the road network is relatively reduced

5 Conclusions

Suburban rail transport passenger transport organizationproblems have troubled operation and management enter-prises over years Passenger transport organizations areinvolved in key indicators such as the temporal passengerflow passenger OD and speed car ratio of the expressslowtrains Passenger flow data of the rail transit are complicatedin structure in which redundancy and missing coexist Inthis paper we use the big data processing method to purifyand supplement the original data and then explore the timedistribution rule and the OD pairs of the passenger flow indepth The passenger flow AFC data of Shanghai Rail TransitLine 9 are taken as inputs and are revised supplemented andmined to give out the time distribution characteristics of thepassenger flow of Line 9 in the end and verify the feasibilityof the method In the future the time distribution rule andOD pairs of passenger flow under the network condition willbemined in order to guide the establishment of the operationplan of the whole road network

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research described in this paper was supported by theNational Natural Science Foundation of China (Grant no71601110) the National Key Research and Development Planof China (Grant no 2016YFC0802505) and the NationalKey Technologies R amp D Program of China (Grant no2015BAG19B02-28) The authors also gratefully acknowledgethose who provided data and suggestions

References

[1] W Gordon ldquoImproved modeling of non-home-base tripsrdquoTransportation Research Record pp 23ndash25 1996

[2] R C M Yam R CWhitfield and RW F Chung ldquoForecastingtraffic generation in public housing estatesrdquo Journal of Trans-portation Engineering vol 126 no 4 pp 358ndash361 2000

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: The Aggregation Mechanism Mining of Passengers’ Flow with

4 Discrete Dynamics in Nature and Society

Table 1 The AFC passenger flow data of Shanghai Metro Line 16 (original)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui 31U07452172874 0558 University Town 0649 Zhaojiabang PositiveU31842254712 0658 University Town 0728 Yishan Rd PositiveU12452172127 0831 Jiuting 0903 XujiahuiU07452172874 0621 University Town 0723 Zhaojiabang PositiveU31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1751 University Town PositiveU07452172874 1702 Xujiahui 30

information acquisition process from a priori probability119901(119861119879) to posteriori probability 119901(119861119879 | 119860)32 Mining of Temporal Passenger Flow OD Pairs Basedon the Support of Correlation Rules By preprocessing theAFC passenger flow data of the rail transit stations theobtained data have beenmore accurate and in accordwith thedata format and redundancy requirement of the applicationdata mining tools The main algorithm is used to exploretemporal aggregation correlation rules for passenger datafor the purpose of mining the temporal passenger flow andthe passenger flow OD pairs with traffic card numbers asthe major cord and the designed algorithm is shown asfollows

Step 1 Calculate 119906119860(119878119894) first and then 119888(119860 119879997888rarr 119861) accordingto the support of passenger flow of the rail transit and setminsupport = 119888(119860 119879997888rarr 119861)Step 2 Initialize AFC passenger flow database according tothe mining condition with the cleaned and preprocesseddata as initial data for correlation rule mining scanning thetransaction database 119879ID and finding all the item sets whoselength is 119896 = 1 forming candidate 1-item set (1198621) substituting1198621 into (5) and (6) and calculating the support of eachitem comparing with the minimum support one by one andforming the frequent 1-item set if the support is greater thanthe minsupport

Step 3 Generate candidate (119896 + 1) item set 119862119896+1 accordingto the frequent 119896-item set if 119862119896+1 = Φ go to the next stepotherwise the loop ends

Step 4 Scan the database to calculate and determine thesupport for each candidate item set

Step 5 Delete candidate items with the support smaller thanminsupport forming a frequent item set 119871119896+1 of the (119896 + 1)item

Step 6 119896 = 119896 + 1 go to Step 3

Step 7 This results in the frequent item set correlation rulesof temporal passenger flow aggregation

Step 8 Output 2-hour particle size of the temporal passengerflow aggregation and the results of OD pairs to guide thepreparation and adjustment of rail transit operation plans

4 Case Study and Analysis

41 Cleaning and Missing Filling of Passenger Flow DataPassenger flow data of this paper comes from the AFC gateof Shanghai Rail Transit Line 9 and passenger flow periodcovers 600ndash2200Themain data field includes card numberarrival time inbound station departure time outboundstation P + R (the indication if the P + R parking lots is used)and total time (travel time) many of which are redundantdata such as fare evasion transportation card lost and otherreasons that resulted in the invalid data and some raw dataare shown as Table 1

From the original data we can see that some data fieldsaremissing especially the fields that have to be filled the datacleaning and filling method designed in Section 2 is used todeal with the original data to facilitate data mining

In order to fill the loss values of the original data substi-tuting (2) into (1) yields

119875 (119862119894 | 119883) = 119875 (119883 | 119862119894) = 119875 (1198831 | 119862119894) times 119875 (1198832 | 119862119894) times sdot sdot sdot times 119875 (119883119899 | 119862119894) times 119875 (119862119894)119875 (119883) (7)

Based on the card number field value of AFC data traveltimes and the OD percentage of each passenger are collectedand retrieved by SQL statement and substituted into (7) and

through parameter estimation connection filling and othermodules supplementing the original data yields results ofTable 2

Discrete Dynamics in Nature and Society 5

Table 2 The AFC passenger flow data of Shanghai Metro Line 16 (supplementary and correction)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui Negative 31U07452172874 0558 University Town 0649 Zhaojiabang Rd Positive 51U31842254712 0658 University Town 0728 Yishan Rd Negative 30U12452172127 0831 Jiuting 0903 Xujiahui Negative 32U07452172874 0621 University Town 0710 Zhaojiabang Rd Positive 41U31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1732 University Town Positive 46U07452172874 1702 Xujiahui 1748 University Town Positive 30

Table 3 Characteristic values of 2-hour particle size of Da Xue Cheng Station

Time series ClassMembership 119906119895(119878119894) Calculation of support 119862(119860 minus 119861) Passenger flow

600ndash800 085 090 47804801ndash1000 074 086 310121001-1200 091 084 114301201ndash1400 090 080 90101401ndash1600 089 084 130141601ndash1800 082 081 302541801ndash2000 069 083 301472001ndash2200 092 084 29874

Table 4 The aggregation passenger flow based on distribution time

Time range StationSt1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800 7031 6031 9214 15274 22215 14721 16014 8721 10721 30264 28214801ndash1000 12031 13031 8610 11201 20203 6321 11782 6321 9843 19721 187341001ndash1200 10574 7574 7141 9845 6201 5014 9487 7034 7123 10351 163211201ndash1400 9414 3414 6573 9216 5845 4782 5014 6791 8787 9014 100241401ndash1600 9018 10034 5362 8142 9216 3587 6571 8482 5591 7782 97891601ndash1800 8124 8124 5217 9345 13201 8721 20362 6217 10258 9587 81371801ndash2000 10250 22230 6057 9714 9231 6321 18217 9354 9239 8214 103542001ndash2200 9217 13217 5325 8521 7227 5014 9057 8242 8427 9241 9792

42 Mining of Passenger Flow Period Aggregation Accordingto the time series discretization design in Section 31 thepassenger flow period is divided according to the 2-hourparticle size and single-step slip forms a series of time series119878 = (1199091 1199092 1199093 119909119899) with width of 2

Calculating119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 andthe membership function 119906119895(119878119894) = (1119904119894 minus 1199091198952)1(119887minus1)sum119896119888=1(1119904119894 minus 119909119888)2 119895 = 1 2 119896 119887 gt 1 obtains the mem-bership of station passenger flow aggregation of every 2-hourparticle size

Passenger flow period covers 600 to 2000 and isdivided as 2-hour particle size into 600ndash800 800ndash10001000ndash1200 1200ndash1400 1400ndash1600 1600ndash1800 1800ndash2000 and 2000ndash2200 and membership 119906119895(119878119894) calculationof support 119862(119860 minus 119861) and passenger flow of each period areshown in Table 3

The above-mentioned mining algorithm can be used tocalculate 2-hour particle size period passenger flow of theShanghaiMetro Line 9GuilinRoadCaohejingDevelopmentZone Hechuan Road Xingzhong Road Qibao ZhongchunRoad Jiuting Sijing Sheshan Dongjing Da Xue ChengStation The results are shown in Table 4

Figure 3 shows the change trend of distribution time dataIt can show the passenger flow distribution of the stations inthe 2-hour particle size range visually The distribution of thepassenger flow can guide the operation directly

The mining results in Table 4 can provide the datasupport for the travel rush hour of the passengers and trafficadjustment scheme of the rail transit station so that the traincapacity can be matched with the passenger flow and thepassengers are quickly distributed to the road network It canalso provide data support for the determination of the speed

6 Discrete Dynamics in Nature and Society

Table 5 The mining of passengersrsquo travel OD pair rule (upward direction)

Card number O D P + RU12452172127 Jiuting Xujiahui NegativeU07452172874 Songjiang University Town Zhaojiabang Rd PositiveU31842254712 Songjiang University Town Yishan Rd PositiveU01552172876 Songjiang New Town Qibao PositiveU07452172874 Songjiang New Town Yishan Rd PositiveU12452376572 Songjiang Sports Center Century Ave NegativeU06571172546 Dongjing Century Ave NegativeU318745684324 Sheshan Xujiahui NegativeU01555476158 Qibao Century Ave NegativeU15478475786 Jiuting Century Ave Negative

0

5000

10000

15000

20000

25000

30000

35000

St1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800

801ndash1000

1001ndash1200

1201ndash14001401ndash1600

1601ndash1800

1801ndash2000

2001ndash2200

Figure 3 The aggregation passenger flow based on distributiontime

ratio of expressslow trains of the subway lines of the railtransitThe ratio of the expressslow trains can be determinedaccording to the temporal passenger flow thus saving energyconsumption and passenger travel time

43 Mining of Passenger Flow OD Pairs Based on CorrelationRules According to Section 32 119888(119860 119879997888rarr 119861) of the OD pairsof each traffic card number and comparedwithminsupport =119888(119860 119879997888rarr 119861) the pairs with values higher than it are determinedas the OD pairsThemining results are shown in Table 5Themining results can provide accurate decision-making supportfor the determination of the arrival scheme of the expresstrains in the expressslow mode

It can be seen from Table 5 that if we can combine thetravel time of passengers to mine the OD pairs it can providemore elaborated decision support for the optimization ofthe expressslow train program To a certain extent theaccessibility of passenger travel is increased especially in thepeak commuting accessibility is particularly important toreduce the unnecessary transfer and transfer passenger flowin the road network is relatively reduced

5 Conclusions

Suburban rail transport passenger transport organizationproblems have troubled operation and management enter-prises over years Passenger transport organizations areinvolved in key indicators such as the temporal passengerflow passenger OD and speed car ratio of the expressslowtrains Passenger flow data of the rail transit are complicatedin structure in which redundancy and missing coexist Inthis paper we use the big data processing method to purifyand supplement the original data and then explore the timedistribution rule and the OD pairs of the passenger flow indepth The passenger flow AFC data of Shanghai Rail TransitLine 9 are taken as inputs and are revised supplemented andmined to give out the time distribution characteristics of thepassenger flow of Line 9 in the end and verify the feasibilityof the method In the future the time distribution rule andOD pairs of passenger flow under the network condition willbemined in order to guide the establishment of the operationplan of the whole road network

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research described in this paper was supported by theNational Natural Science Foundation of China (Grant no71601110) the National Key Research and Development Planof China (Grant no 2016YFC0802505) and the NationalKey Technologies R amp D Program of China (Grant no2015BAG19B02-28) The authors also gratefully acknowledgethose who provided data and suggestions

References

[1] W Gordon ldquoImproved modeling of non-home-base tripsrdquoTransportation Research Record pp 23ndash25 1996

[2] R C M Yam R CWhitfield and RW F Chung ldquoForecastingtraffic generation in public housing estatesrdquo Journal of Trans-portation Engineering vol 126 no 4 pp 358ndash361 2000

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: The Aggregation Mechanism Mining of Passengers’ Flow with

Discrete Dynamics in Nature and Society 5

Table 2 The AFC passenger flow data of Shanghai Metro Line 16 (supplementary and correction)

Card number Arrival time Inbound station Departure time Outbound station P + R Total time (minute)U12452172127 0912 Jiuting 0943 Xujiahui Negative 31U07452172874 0558 University Town 0649 Zhaojiabang Rd Positive 51U31842254712 0658 University Town 0728 Yishan Rd Negative 30U12452172127 0831 Jiuting 0903 Xujiahui Negative 32U07452172874 0621 University Town 0710 Zhaojiabang Rd Positive 41U31842254712 0624 University Town 0728 Yishan Rd Positive 30U07452172874 1646 Zhaojiabang Rd 1732 University Town Positive 46U07452172874 1702 Xujiahui 1748 University Town Positive 30

Table 3 Characteristic values of 2-hour particle size of Da Xue Cheng Station

Time series ClassMembership 119906119895(119878119894) Calculation of support 119862(119860 minus 119861) Passenger flow

600ndash800 085 090 47804801ndash1000 074 086 310121001-1200 091 084 114301201ndash1400 090 080 90101401ndash1600 089 084 130141601ndash1800 082 081 302541801ndash2000 069 083 301472001ndash2200 092 084 29874

Table 4 The aggregation passenger flow based on distribution time

Time range StationSt1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800 7031 6031 9214 15274 22215 14721 16014 8721 10721 30264 28214801ndash1000 12031 13031 8610 11201 20203 6321 11782 6321 9843 19721 187341001ndash1200 10574 7574 7141 9845 6201 5014 9487 7034 7123 10351 163211201ndash1400 9414 3414 6573 9216 5845 4782 5014 6791 8787 9014 100241401ndash1600 9018 10034 5362 8142 9216 3587 6571 8482 5591 7782 97891601ndash1800 8124 8124 5217 9345 13201 8721 20362 6217 10258 9587 81371801ndash2000 10250 22230 6057 9714 9231 6321 18217 9354 9239 8214 103542001ndash2200 9217 13217 5325 8521 7227 5014 9057 8242 8427 9241 9792

42 Mining of Passenger Flow Period Aggregation Accordingto the time series discretization design in Section 31 thepassenger flow period is divided according to the 2-hourparticle size and single-step slip forms a series of time series119878 = (1199091 1199092 1199093 119909119899) with width of 2

Calculating119882(119904 119908) = 119878119894 | 119894 = 1 2 119899 minus 119908 + 1 andthe membership function 119906119895(119878119894) = (1119904119894 minus 1199091198952)1(119887minus1)sum119896119888=1(1119904119894 minus 119909119888)2 119895 = 1 2 119896 119887 gt 1 obtains the mem-bership of station passenger flow aggregation of every 2-hourparticle size

Passenger flow period covers 600 to 2000 and isdivided as 2-hour particle size into 600ndash800 800ndash10001000ndash1200 1200ndash1400 1400ndash1600 1600ndash1800 1800ndash2000 and 2000ndash2200 and membership 119906119895(119878119894) calculationof support 119862(119860 minus 119861) and passenger flow of each period areshown in Table 3

The above-mentioned mining algorithm can be used tocalculate 2-hour particle size period passenger flow of theShanghaiMetro Line 9GuilinRoadCaohejingDevelopmentZone Hechuan Road Xingzhong Road Qibao ZhongchunRoad Jiuting Sijing Sheshan Dongjing Da Xue ChengStation The results are shown in Table 4

Figure 3 shows the change trend of distribution time dataIt can show the passenger flow distribution of the stations inthe 2-hour particle size range visually The distribution of thepassenger flow can guide the operation directly

The mining results in Table 4 can provide the datasupport for the travel rush hour of the passengers and trafficadjustment scheme of the rail transit station so that the traincapacity can be matched with the passenger flow and thepassengers are quickly distributed to the road network It canalso provide data support for the determination of the speed

6 Discrete Dynamics in Nature and Society

Table 5 The mining of passengersrsquo travel OD pair rule (upward direction)

Card number O D P + RU12452172127 Jiuting Xujiahui NegativeU07452172874 Songjiang University Town Zhaojiabang Rd PositiveU31842254712 Songjiang University Town Yishan Rd PositiveU01552172876 Songjiang New Town Qibao PositiveU07452172874 Songjiang New Town Yishan Rd PositiveU12452376572 Songjiang Sports Center Century Ave NegativeU06571172546 Dongjing Century Ave NegativeU318745684324 Sheshan Xujiahui NegativeU01555476158 Qibao Century Ave NegativeU15478475786 Jiuting Century Ave Negative

0

5000

10000

15000

20000

25000

30000

35000

St1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800

801ndash1000

1001ndash1200

1201ndash14001401ndash1600

1601ndash1800

1801ndash2000

2001ndash2200

Figure 3 The aggregation passenger flow based on distributiontime

ratio of expressslow trains of the subway lines of the railtransitThe ratio of the expressslow trains can be determinedaccording to the temporal passenger flow thus saving energyconsumption and passenger travel time

43 Mining of Passenger Flow OD Pairs Based on CorrelationRules According to Section 32 119888(119860 119879997888rarr 119861) of the OD pairsof each traffic card number and comparedwithminsupport =119888(119860 119879997888rarr 119861) the pairs with values higher than it are determinedas the OD pairsThemining results are shown in Table 5Themining results can provide accurate decision-making supportfor the determination of the arrival scheme of the expresstrains in the expressslow mode

It can be seen from Table 5 that if we can combine thetravel time of passengers to mine the OD pairs it can providemore elaborated decision support for the optimization ofthe expressslow train program To a certain extent theaccessibility of passenger travel is increased especially in thepeak commuting accessibility is particularly important toreduce the unnecessary transfer and transfer passenger flowin the road network is relatively reduced

5 Conclusions

Suburban rail transport passenger transport organizationproblems have troubled operation and management enter-prises over years Passenger transport organizations areinvolved in key indicators such as the temporal passengerflow passenger OD and speed car ratio of the expressslowtrains Passenger flow data of the rail transit are complicatedin structure in which redundancy and missing coexist Inthis paper we use the big data processing method to purifyand supplement the original data and then explore the timedistribution rule and the OD pairs of the passenger flow indepth The passenger flow AFC data of Shanghai Rail TransitLine 9 are taken as inputs and are revised supplemented andmined to give out the time distribution characteristics of thepassenger flow of Line 9 in the end and verify the feasibilityof the method In the future the time distribution rule andOD pairs of passenger flow under the network condition willbemined in order to guide the establishment of the operationplan of the whole road network

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research described in this paper was supported by theNational Natural Science Foundation of China (Grant no71601110) the National Key Research and Development Planof China (Grant no 2016YFC0802505) and the NationalKey Technologies R amp D Program of China (Grant no2015BAG19B02-28) The authors also gratefully acknowledgethose who provided data and suggestions

References

[1] W Gordon ldquoImproved modeling of non-home-base tripsrdquoTransportation Research Record pp 23ndash25 1996

[2] R C M Yam R CWhitfield and RW F Chung ldquoForecastingtraffic generation in public housing estatesrdquo Journal of Trans-portation Engineering vol 126 no 4 pp 358ndash361 2000

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: The Aggregation Mechanism Mining of Passengers’ Flow with

6 Discrete Dynamics in Nature and Society

Table 5 The mining of passengersrsquo travel OD pair rule (upward direction)

Card number O D P + RU12452172127 Jiuting Xujiahui NegativeU07452172874 Songjiang University Town Zhaojiabang Rd PositiveU31842254712 Songjiang University Town Yishan Rd PositiveU01552172876 Songjiang New Town Qibao PositiveU07452172874 Songjiang New Town Yishan Rd PositiveU12452376572 Songjiang Sports Center Century Ave NegativeU06571172546 Dongjing Century Ave NegativeU318745684324 Sheshan Xujiahui NegativeU01555476158 Qibao Century Ave NegativeU15478475786 Jiuting Century Ave Negative

0

5000

10000

15000

20000

25000

30000

35000

St1 St2 St3 St4 St5 St6 St7 St8 St9 St10 St11

601ndash800

801ndash1000

1001ndash1200

1201ndash14001401ndash1600

1601ndash1800

1801ndash2000

2001ndash2200

Figure 3 The aggregation passenger flow based on distributiontime

ratio of expressslow trains of the subway lines of the railtransitThe ratio of the expressslow trains can be determinedaccording to the temporal passenger flow thus saving energyconsumption and passenger travel time

43 Mining of Passenger Flow OD Pairs Based on CorrelationRules According to Section 32 119888(119860 119879997888rarr 119861) of the OD pairsof each traffic card number and comparedwithminsupport =119888(119860 119879997888rarr 119861) the pairs with values higher than it are determinedas the OD pairsThemining results are shown in Table 5Themining results can provide accurate decision-making supportfor the determination of the arrival scheme of the expresstrains in the expressslow mode

It can be seen from Table 5 that if we can combine thetravel time of passengers to mine the OD pairs it can providemore elaborated decision support for the optimization ofthe expressslow train program To a certain extent theaccessibility of passenger travel is increased especially in thepeak commuting accessibility is particularly important toreduce the unnecessary transfer and transfer passenger flowin the road network is relatively reduced

5 Conclusions

Suburban rail transport passenger transport organizationproblems have troubled operation and management enter-prises over years Passenger transport organizations areinvolved in key indicators such as the temporal passengerflow passenger OD and speed car ratio of the expressslowtrains Passenger flow data of the rail transit are complicatedin structure in which redundancy and missing coexist Inthis paper we use the big data processing method to purifyand supplement the original data and then explore the timedistribution rule and the OD pairs of the passenger flow indepth The passenger flow AFC data of Shanghai Rail TransitLine 9 are taken as inputs and are revised supplemented andmined to give out the time distribution characteristics of thepassenger flow of Line 9 in the end and verify the feasibilityof the method In the future the time distribution rule andOD pairs of passenger flow under the network condition willbemined in order to guide the establishment of the operationplan of the whole road network

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research described in this paper was supported by theNational Natural Science Foundation of China (Grant no71601110) the National Key Research and Development Planof China (Grant no 2016YFC0802505) and the NationalKey Technologies R amp D Program of China (Grant no2015BAG19B02-28) The authors also gratefully acknowledgethose who provided data and suggestions

References

[1] W Gordon ldquoImproved modeling of non-home-base tripsrdquoTransportation Research Record pp 23ndash25 1996

[2] R C M Yam R CWhitfield and RW F Chung ldquoForecastingtraffic generation in public housing estatesrdquo Journal of Trans-portation Engineering vol 126 no 4 pp 358ndash361 2000

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: The Aggregation Mechanism Mining of Passengers’ Flow with

Discrete Dynamics in Nature and Society 7

[3] J L Bowman andM E Ben-Akiva ldquoActivity-based disaggregatetravel demand model system with activity schedulesrdquo Trans-portation Research A Policy and Practice vol 35 no 1 pp 1ndash282001

[4] Y Xiaoguang ldquoMultinomial distribution lagging model usedfor short-term traffic flow predictionrdquo Journal of Tongji Univer-sity vol 39 no 9 pp 1297ndash1302 2011

[5] Y Enjian Study on Urban Rail Transit OD Distribution Predic-tion Models under Networking Condition [D] Beijing JiaotongUniversity 2013

[6] Z Peng ldquoUrban rail transit network short-term passengerflow OD estimation modelrdquo The Journal of Traffic SystemsEngineering and Information Technology vol 15 no 2 pp 149ndash155 2015

[7] C Xiaohong ldquoStudy on rail transit passenger flow dynamic ODmatrix estimation methodsrdquo The Journal of Tongji Universityvol 24 no 12 pp 56ndash72 2010

[8] A Jamili M A Shafia S J Sadjadi and R Tavakkoli-Moghad-dam ldquoSolving a periodic single-track train timetabling problemby an efficient hybrid algorithmrdquo Engineering Applications ofArtificial Intelligence vol 25 no 4 pp 793ndash800 2012

[9] M T Isaai A Kanani M Tootoonchi andH R Afzali ldquoIntelli-gent timetable evaluation using fuzzyAHPrdquoExpert SystemswithApplications vol 38 no 4 pp 3718ndash3723 2011

[10] A Ceder ldquoPublic-transport automated timetables using evenheadway and even passenger load conceptsrdquo in Proceedings ofthe 32nd Australasian Transport Research Forum (ATRF rsquo09) 171 pages Auckland New Zealand October 2009

[11] A Nuzzolo U Crisalli and L Rosati ldquoA schedule-based assign-ment model with explicit capacity constraints for congestedtransit networksrdquo Transportation Research Part C EmergingTechnologies vol 20 no 1 pp 16ndash33 2012

[12] M Carey and I Crawford ldquoScheduling trains on a network ofbusy complex stationsrdquo Transportation Research B Methodolog-ical vol 41 no 2 pp 159ndash178 2007

[13] V Guihaire and J K Hao ldquoTransit network design and schedul-ing a global reviewrdquo Transportation Research A Policy andPractice vol 42 no 10 pp 1251ndash1273 2008

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: The Aggregation Mechanism Mining of Passengers’ Flow with

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 201

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of