
Use of Entry-Only Automatic Fare Collection Data to Estimate Linked Transit Trips in New York City

James J. Barry, Robert Freimer, and Howard Slavin

J. J. Barry, New York City Transit, Metropolitan Transit Authority, 72 Lindbergh Lane, New York, NY 10956. R. Freimer and H. Slavin, Caliper Corporation, 1172 Beacon Street, Suite 300, Newton, MA 02461. Corresponding author: R. Freimer, [email protected].

Transportation Research Record: Journal of the Transportation Research Board, No. 2112, Transportation Research Board of the National Academies, Washington, D.C., 2009, pp. 53–61. DOI: 10.3141/2112-07

Many large transit systems use automatic fare collection (AFC) systems. Most AFC systems were designed solely for revenue management, but they contain a wealth of customer use data that can be mined to create inputs to operations planning and demand forecasting models for transportation planning. More detailed information than could ever be collected by any travel survey is potentially available if it is assumed that the transactional data can be processed to produce the desired information. Previous work in this field focused primarily on rail transit, since boardings at fixed stations are easier to locate than boardings of buses, which move around. This paper presents a case study for the Metropolitan Transit Authority’s New York City Transit, a transit system in which a rider swipes a fare card only to enter a station or board a bus. This is the first work to include trips by all transit modes in a system that records the transaction only on rider entry, which is significantly more challenging because all the alighting locations need to be inferred and the bus boarding locations need to be estimated. No location information (from automated vehicle location technology or a Global Positioning System) was available for buses. Software that processes the 7 million–plus daily transactions and that creates a data set of linked transit trips was created. The data set can then be analyzed by using geographic information system-based query software to create reports, maps, origin–destination matrices, load profiles, and new data sets. Subway journeys are assigned by using a schedule-based shortest-path algorithm.

Many large transit systems use automatic fare collection (AFC) systems, in which a rider swipes a card to enter a subway station or to board a bus. Once AFC systems came into use, transportation planners realized that they contained a wealth of customer use data that could be mined to create inputs to operations planning and demand forecasting models. Because most AFC systems were designed solely for the efficient collection of fares from a huge number of riders per day, the special and often difficult processing of their transactional data sets is required to derive information useful for transportation planning (1).

Early work by Barry et al. showed that AFC transactions could be used to estimate subway origin–destination patterns in New York City (2). Rahbee and Czerwinski then applied similar techniques to the Chicago, Illinois, Transit Authority’s (CTA’s) system (3). These earlier works focused on rail transit trips and did not include all bus trips. The Massachusetts Institute of Technology, working with CTA and Transport for London, has conducted a detailed analysis of fare card data, typically with more and better-quality information than that available for New York City (1, 4–6).

AFC systems for transit can be classified by whether or not they require a swipe to leave the system. Exit swipes are required by distance-based fare systems (e.g., those in San Francisco, California, and London) and facilitate a more straightforward approach to the recovery of information about the transit trips that riders make. Other cities (e.g., New York and Chicago) use a flat-rate fare system and collect only boarding information. Such entry-only transit systems require additional logic to impute the alighting locations of transit trips.

This paper presents the findings of work conducted for the New York City Transit (NYCT) of the Metropolitan Transit Authority (MTA) to extend the methods of Barry et al. (2) to include all transit modes: subway, local and express buses, ferry, and tramway. This effort and its relevance to demand forecasting have been described earlier (7). The goals included improving the process for estimation of subway trips, generating origin–destination patterns by traffic analysis zone, creating load profiles for bus and subway routes, and extending the analysis beyond the morning peak period.

MTA’s MetroCard system is an entry-only AFC system in which a rider swipes a MetroCard to enter a subway station or dips a MetroCard to board a bus (a “bus dip”), generating a unique transaction that is logged onto a mainframe computer database. No transaction occurs when a rider exits a station or a bus. The system was designed in the early 1990s to collect fares efficiently from millions of riders per day on a system with a flat-rate fare structure. It was not set up to capture the details of the trips made by each rider. The transaction times saved are truncated to 10ths of an hour.

The basis of the approach described is that for each transaction, an attempt is made to identify the route and the specific boarding and alighting stops that define a trip leg. Multiple trip legs are combined into a linked trip when it is inferred that a rider uses his or her MetroCard two or more times to complete a single journey.

The remainder of this paper is organized in three parts. The first part describes the methods used to process the available AFC and schedule data to produce data sets consisting of geographically referenced linked transit trips, known by NYCT as the Citywide Transit Travel Database. Each data set contains all the transactions for a single day. The second part describes the user-oriented geographic information system-based query and analysis tools that can be used to study and analyze the trip data by creating reports, maps, origin–destination matrices, load profiles, and new data sets. The final part describes how the system is being used in practice by NYCT.

CREATING TRANSIT TRIPS

This section discusses how the MetroCard transactions were processed to create geographically located, linked transit trips. Figure 1 provides a schematic flowchart of the extensive data processing required. The other inputs used include a geographic representation of transit routes, the log of actual bus trips, and the subway and bus schedules.

The procedure generates a table of linked passenger trips for a given day, with each trip having an origin and a destination that is either a subway station or a bus stop. A second table details the component legs for each linked trip. These two tables are the principal inputs used by the query software.

AFC Transactions

The development project was based on a sample data set consisting of MetroCard transactions provided for a 2-week study period in late April 2004. Almost 95 million records were provided in the raw mainframe format and converted into a fixed-format binary table on a personal computer.

FIGURE 1 Data flow (flowchart: MetroCard transactions, the AFC bus trip log, the bus schedules, and the TransCAD route system are combined to produce transactions with located boardings). (continued)

Unlike many cities, NYCT never shuts down completely and runs 24 h a day, so 3 a.m. was chosen as the start and end of a day to minimize the overlapping trips. The procedure processes a day’s worth of transactions at a time. Weekdays include more than 7 million transactions, corresponding to more than 9 million transit legs, because many riders transfer within the subway system.
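The 3 a.m. day boundary described above can be illustrated with a minimal sketch. Only the 3 a.m. cutoff comes from the text; the function name `service_day` and the use of Python `datetime` values are assumptions made for illustration and do not describe the project's actual software.

```python
from datetime import date, datetime, timedelta

# The 3 a.m. boundary is taken from the text; the rest is a hypothetical sketch.
SERVICE_DAY_OFFSET = timedelta(hours=3)


def service_day(timestamp: datetime) -> date:
    """Return the service day of a transaction.

    Shifting the timestamp back by 3 hours makes a transaction recorded
    before 3 a.m. belong to the previous calendar date.
    """
    return (timestamp - SERVICE_DAY_OFFSET).date()
```

Under this convention, a swipe recorded at 2:54 a.m. on April 20 would be processed with the April 19 batch, whereas one recorded at 3:00 a.m. would open the April 20 batch.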

MetroCard transactions are generated for a high percentage of boardings: 97% for the subway and 86% for buses. The remaining trips are paid for by single-use MetroCards or cash fares that are not captured.

Accurate Geographic Route System

Many aspects of the processing rely on an accurate representation of the transit system, with a route for each pattern and stations and bus stops being correctly located. These include the schedule-based shortest-path (SSP) algorithm used to estimate the paths through the subway system, the locations of transfers between routes, and estimates of arrival times so that trips can be linked.

Before the project described here began, NYCT had been using a route system with only the major patterns for the morning peak period. In conjunction with this project, NYCT migrated over to a new route system that was based on a more accurate base map and that extended to include all the subway patterns, bus patterns for other time periods, and routes for other bus systems that also use the MetroCard system. Geographically accurate bus stops with standardized identifiers were added so that the schedules could be matched to routes. The creation of such a route system required a lot of time and effort.

FIGURE 1 (continued) Data flow (flowchart: transactions with located boardings are chained, linked, expanded, and allocated to zones to produce the linked passenger trips used by the query software; TAZ = traffic analysis zone).

AFC Bus Trip Log

The MetroCard system records a log of events for each bus and the event times. These events include destination sign changes, which can usually be used to determine the route pattern that a bus followed. It does not record any location information for a bus or a trip identifier that can be synchronized with the schedule. A significant effort was required to clean up this data set because multiple records frequently need to be combined into a single trip and some records are missing or are incorrect because they are manually generated by the driver changing the sign, which does not always happen as planned.

Subway Schedules

For each subway line, there are up to three separate schedule files, corresponding to weekdays, Saturday, and Sunday. Each file details the stations, the list of trips, and the sequence of stations that comprise a trip along the route.

These were converted into a fixed-format binary table with 596,876 records, with each record corresponding to a schedule event. These events occur at 2,549 unique route stops, corresponding to fewer physical locations.

The schedules were used to update the route system to contain all the subway patterns, including short turns. They are also used as inputs to the SSP algorithm, which is used to determine paths through the subway system.

The MetroCard system is also used by the Staten Island Railway (SIR), the Roosevelt Island Tramway, parts of the Port Authority Trans-Hudson (PATH) system under the Hudson River, and the AirTrain to and from John F. Kennedy International Airport (JFK AirTrain). All of these function similarly to the subway, in that they have fixed stations. Routes were added for all these operators. Schedules for the SIR and the free Staten Island Ferry were added so that the SSP algorithm could be used to assign trips between Staten Island and Manhattan.

Bus Schedules

The NYCT bus schedules are similar to the subway schedules. For each bus route, there are up to four separate schedule files, corresponding to weekdays (when school is open), weekdays (when school is closed), Saturday, and Sunday. These were converted into a fixed-format binary file with more than 5 million schedule events, which occurred at 63,049 unique route stops. The schedules had to be converted from the service schedule order into the equipment order, so that they could be synchronized with the actual trips from the AFC bus trip log.

MetroCard transactions were also provided for the other bus operators within New York City, including Long Island Bus, routes franchised by the New York City Department of Transportation (NYCDOT), Atlantic Express, and the Metro-North Hudson Rail Link. Each of these provided their schedules in a different format that had to be converted and integrated separately. For a variety of reasons, the schedule information for the NYCDOT routes was sometimes incomplete or inaccurate, and during this project these routes were being taken over by the MTA and grouped into the MTA Bus Company.


Boarding Locations

Subway and tramway boardings are located by the use of the turnstile fare collection information. This provides an immediate identification of the station where the boarding occurred but does not identify the routes boarded when multiple lines serve the same station complex or the rider switches subway trains. Even the direction traveled is usually unavailable, because both platforms are often accessible from the same entrance.

Bus boardings are located by estimating the location of the bus at the trip boarding transaction time. This is rather challenging in New York City, because the bus fleet does not employ the automated vehicle location technology and the transaction times are truncated to 6-min intervals. Instead, the AFC bus trip log is combined with the bus schedules and positional information derived from certain MetroCard transactions to obtain approximate bus locations for most trips. The times for intermediate stops between derived locations were interpolated by using distance as the weighting factor.
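The distance-weighted interpolation of intermediate stop times can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `interpolate_stop_times`, the cumulative-distance input, and the dictionary of known ("anchor") stop times are all hypothetical.

```python
def interpolate_stop_times(cum_dist, anchors):
    """Estimate stop times along one bus trip.

    cum_dist: cumulative distance along the route at each stop (any unit).
    anchors:  {stop_index: time_in_minutes} for stops whose time is known.
    Returns a list of times; stops outside the anchored range stay None.
    """
    known = sorted(anchors.items())
    times = [None] * len(cum_dist)
    for index, minute in known:
        times[index] = float(minute)
    # Fill the stops between each consecutive pair of anchors, using the
    # share of the distance covered as the interpolation weight.
    for (i0, t0), (i1, t1) in zip(known, known[1:]):
        span = cum_dist[i1] - cum_dist[i0]
        for i in range(i0 + 1, i1):
            frac = (cum_dist[i] - cum_dist[i0]) / span if span > 0 else 0.0
            times[i] = t0 + frac * (t1 - t0)
    return times
```

In this sketch, a stop one-quarter of the way (by distance) between two anchored stops receives a time one-quarter of the way between their times. Stops before the first anchor or after the last are left unassigned because how to extrapolate is not stated in the text.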

The most difficult portion of this project was trying to determine a method that could be used to clean up the bus trip tables and match them against the schedule so that they could be used to locate the bus boardings. The principal difficulty was the inconsistent number of records corresponding to an actual trip. To create a single trip record, there is a frequent need to combine multiple trip records. In other cases, records need to be split to correct for a driver’s failure to change the sign. There were several false starts before a final strategy was selected.

The original strategy used to locate bus transactions involved the cleaning up of the AFC bus trip log to contain one record per actual trip. The scheduled trip, along with the TransCAD (a commercial software package) route identified by using the schedule files, would control the cleanup process. The scheduled stop times would be adjusted to reflect the actual times observed, and passenger bus dips would be located on the basis of interpolation along the trip route.

Unfortunately, a variety of data imperfections and matching issues combined to foil the cleanup and matching strategy. After several false starts, a successful approach was developed on the basis of the following observations:

1. The most important goals of matching the bus trip to the schedule are to determine the route pattern, the locations of stops, and the relative times between stops. Thus, it is sufficient to find a similar trip made by a bus, as long as it uses the same pattern. It is no longer problematic if two trips match an identical scheduled trip, since only the pattern of stops and their relative times are used by the location procedure.

2. Bus positions can be localized by using the transfers observed in the data. For example, if a passenger takes two bus legs within a short period of time and the routes intersect, then the second bus must be near the intersection point at the transaction time. Similarly, some subway-to-bus transfer locations can be used as well.

A two-pass method is used to locate bus transactions. During the first pass the primary goal is to pinpoint the location of the bus at several stops during each trip by using bus-to-bus and subway-to-bus transfer points. Interpolated times are then assigned to intermediate stops, and these are used to assign locations to the remaining bus dips during the second pass. Adjustments to the boarding stop used are made during the trip-linking procedure.
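The first pass, in which bus-to-bus transfers pin a bus to a known point, might look like the sketch below. It is illustrative only: the `BusDip` record, the `intersections` lookup table keyed by route pair, and the 30-min `max_gap_min` threshold are assumptions (the text says only "a short period of time"), and subway-to-bus transfers are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusDip:
    card_id: str       # MetroCard identifier
    bus_trip_id: str   # bus trip on which the dip occurred
    route: str         # route of that bus trip
    minute: int        # transaction time, minutes into the service day


def transfer_anchors(dips, intersections, max_gap_min=30):
    """Return (bus_trip_id, stop, minute) anchors from bus-to-bus transfers.

    intersections maps (first_route, second_route) to the stop on the
    second route nearest the point where the two routes cross.
    """
    anchors = []
    previous_by_card = {}
    for dip in sorted(dips, key=lambda d: (d.card_id, d.minute)):
        prev = previous_by_card.get(dip.card_id)
        if (prev is not None
                and prev.route != dip.route
                and dip.minute - prev.minute <= max_gap_min):
            stop = intersections.get((prev.route, dip.route))
            if stop is not None:
                # The second bus must be near the crossing at this time.
                anchors.append((dip.bus_trip_id, stop, dip.minute))
        previous_by_card[dip.card_id] = dip
    return anchors
```

Anchors produced this way would then serve as the known stop times between which the remaining stop times are interpolated in the second pass.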

The boarding locations developed are geographically less accurate for buses than for subways because of the 6-min truncation of recorded boarding times, the estimation process, and the cleanup required to use the AFC bus trip log.

Alighting Locations

Two assumptions were made to allow alighting locations to be determined: (a) most riders start their next trip at or near the destination of their previous trip, and (b) most riders end their last trip of the day at or near the start of their first trip of the day. These were shown by NYCT to be reasonable for subway riders by use of a travel diary survey to validate the destinations (2). The methodology resulted in 90% valid destinations.

An additional assumption was made: the pattern of single-fare card users is similar to that of multiple-fare card users at a given boarding location.

A chaining procedure is applied to determine the likely alighting locations for riders with two or more MetroCard transactions on a particular day. Each transaction defines a leg (unlinked trip) in the passenger’s journey. The location of the next transaction by a MetroCard is used to find a nearby bus stop or subway station for the alighting location of the current leg, assuming that a consistent one exists. This assumes that many riders start their next movement near the conclusion of their prior movement. For the final leg, the procedure loops back to the first transaction, unless the passenger makes only a single linked trip during the day. This logic is justified by the fact that many riders return to their origin at the end of the day. Impossible destinations, that is, those that are unreachable by the subway system or bus in question, are discarded. Destinations for single trips and other trips with no chained destination are assigned by using a random sampling of distributions derived from other riders who have the same trip origins.
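The chaining logic can be sketched in a few lines. This is a simplified illustration, not the authors' code: `chain_destinations` and the caller-supplied `nearest_reachable` function are hypothetical names, and the sketch treats any card with a single transaction as unchained rather than testing for a single linked trip.

```python
def chain_destinations(boardings, nearest_reachable):
    """Impute an alighting stop for each leg of one card on one day.

    boardings: list of (route, boarding_stop) in time order.
    nearest_reachable(route, boarding_stop, next_boarding_stop) returns the
    stop on `route` near the next boarding, or None if no consistent
    (reachable) stop exists.
    Returns a list of alighting stops; None means "left for expansion".
    """
    n = len(boardings)
    if n < 2:
        return [None] * n  # single transaction: nothing to chain to
    alightings = []
    for i, (route, stop) in enumerate(boardings):
        # The last leg loops back to the first boarding of the day.
        next_stop = boardings[(i + 1) % n][1]
        alightings.append(nearest_reachable(route, stop, next_stop))
    return alightings
```

Legs for which the helper returns `None` correspond to the impossible or inconsistent destinations that the text says are discarded and later filled in by sampling.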

The arrival time for each imputed alighting location is estimated by running a query in the SSP algorithm for each subway journey or by using the estimated stop times computed for the bus trip. Legs are combined into a single linked trip if the expected arrival time of the first leg is within 18 min of the start of the next leg. The location consistency is ensured by the fact that alighting locations have not yet been assigned for inconsistent legs.
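The 18-min linking rule can be illustrated as follows. Only the 18-min window comes from the text. The record layout is hypothetical, and treating "within 18 min" as an absolute difference (so that a boarding slightly before the estimated arrival still links, given truncated and estimated times) is an assumption of this sketch.

```python
LINK_WINDOW_MIN = 18  # threshold stated in the text


def link_legs(legs):
    """Group one card's legs (in time order) into linked trips.

    Each leg is a dict with 'board_min' and 'arrive_min' (minutes into the
    service day); 'arrive_min' is None when no alighting was imputed.
    """
    trips = []
    for leg in legs:
        previous = trips[-1][-1] if trips else None
        arrival = previous["arrive_min"] if previous else None
        if arrival is not None and abs(leg["board_min"] - arrival) <= LINK_WINDOW_MIN:
            trips[-1].append(leg)   # continuation of the same journey
        else:
            trips.append([leg])     # start of a new linked trip
    return trips
```

A leg whose predecessor has no imputed arrival time always starts a new linked trip here, which mirrors the statement that inconsistent legs have no alighting location at this stage.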

Expansion

Not all transactions will have their alighting locations determined by the chaining procedure because it fails to assign an alighting location when either the rider made a single trip (which could possibly be a multimodal trip) or no alighting stop is consistent with the next boarding location. Two expansion procedures that use sampling to assign alighting locations are applied. For subway transactions, an alighting stop is assigned by uniform sampling on the basis of the observed distribution from riders boarding at the same station with assigned alighting stations. For bus passengers, a similar approach based on a distribution of alighting stops for all passengers boarding at the same stop during the day for that route pattern is used.
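One way to read the sampling step is as a draw from the empirical distribution of alighting stations already assigned to riders who boarded at the same station. The sketch below takes that reading, which is an interpretation rather than a confirmed detail; `sample_alighting` is a hypothetical name, and the random generator is passed in so that runs are reproducible.

```python
import random
from collections import Counter


def sample_alighting(observed, rng):
    """Draw one alighting station for an unchained boarding.

    observed: alighting stations already assigned (by chaining) to other
              riders who boarded at the same station.
    rng:      a random.Random instance.
    Each station is drawn with probability proportional to how often it
    appears in `observed`.
    """
    counts = Counter(observed)
    stations = sorted(counts)
    weights = [counts[s] for s in stations]
    return rng.choices(stations, weights=weights, k=1)[0]
```

For buses, the same function could be applied to the alighting stops observed for the same boarding stop and route pattern, as the text describes.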

Linkage

Two or more movements for a rider are linked together into a single trip when they occur within a short period of time. The alighting times for subway trips are determined by use of the SSP algorithm, which uses the complete subway schedule and geographic representations of all route patterns to predict the route traveled through the subway system and the time of arrival. This method also handles the use of the Staten Island Ferry, which is a free connector between the NYCT subway system and SIR. Bus alighting times are determined by using the estimated arrival time for the bus at the alighting stop.
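The internals of the SSP algorithm are not given in the text. As a stand-in, the sketch below uses a simple connection-scan style earliest-arrival search over timetable events, which is one common way to answer a schedule-based shortest-path query. The tuple layout, the function name `earliest_arrival`, and the assumptions of zero transfer time and strictly positive ride times are all illustrative.

```python
def earliest_arrival(connections, origin, depart_min):
    """Earliest arrival time at every reachable stop.

    connections: iterable of (dep_stop, dep_min, arr_stop, arr_min) taken
                 from the schedule, one per train movement between stops,
                 with arr_min > dep_min.
    origin, depart_min: boarding station and boarding time in minutes.
    Returns {stop: earliest_arrival_minute}.
    """
    best = {origin: depart_min}
    # Scanning in departure-time order guarantees that a stop's best
    # arrival is final before any later departure from it is examined.
    for dep_stop, dep_min, arr_stop, arr_min in sorted(connections, key=lambda c: c[1]):
        reached = best.get(dep_stop)
        if reached is not None and reached <= dep_min:
            if arr_min < best.get(arr_stop, float("inf")):
                best[arr_stop] = arr_min
    return best
```

A production version would also need to record the trains used, so that the path and not only the arrival time is recovered, and to apply minimum transfer times within station complexes.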

Zone Allocation

The procedure generates a table of linked passenger trips for a given day, with each trip having an origin and a destination that is either a subway station or a bus stop. Nearby origin and destination zones (2000 census block groups) were assigned to each trip by a logit allocation procedure that distributes the trips to nearby zones on the basis of a weighting of walking distance and population or employment, depending on the time of day. Special handling was added to assign the trips starting and ending on commuter rail or a bus.
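The exact form of the logit allocation is not given in the text. One plausible form, shown here purely as an assumption, weights each nearby zone by its size (population or employment) multiplied by an exponential decay in walking distance and then normalizes the weights into shares. The coefficient value, field names, and function name are hypothetical.

```python
import math


def allocation_shares(zones, size_key, dist_coeff=2.0):
    """Share of a stop's trips allocated to each nearby zone.

    zones:      list of dicts with 'id', 'walk_km', and size attributes.
    size_key:   'population' or 'employment', chosen by the caller
                according to the time of day.
    dist_coeff: hypothetical distance-decay coefficient (per km).
    """
    weights = {
        z["id"]: z[size_key] * math.exp(-dist_coeff * z["walk_km"])
        for z in zones
    }
    total = sum(weights.values())
    if total == 0:
        return {}  # no zone has any attraction; nothing to allocate
    return {zone_id: w / total for zone_id, w in weights.items()}
```

This is equivalent to a multinomial logit with utility equal to the logarithm of zone size minus the coefficient times walking distance; calibrating the coefficient is outside the scope of the sketch.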

Validation

A variety of methods were used to validate the location and linkage procedure while it was being developed. In particular, some automated procedures were implemented to tabulate the transactions located so that the results could be compared with other available information for validation purposes.

For subway transactions, the entrance and exit counts were tabulated by subway station complex for the full day and by 4-h periods, which correspond to the periods at which the registers are polled at each station; these periods vary in their starting hours. These results can then be used for comparisons with subway register counts (exit counts should be treated as a lower bound, since not all exiting passengers are counted) and to check the balance between the entrance and exit totals for a station, since they should usually be close.

For bus transactions, ride check data are an alternate source of information. Counts of boardings, alightings, and overall loads are provided for each stop along each bus trip. Only a couple of routes were checked during the 2-week study period. The available ride check data were imported and successfully matched to the scheduled trips. The bus transactions were then tabulated by trip stop, so that the ride check data could be compared with the transactions that were located.
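Tabulating located transactions by trip stop, in the same form as a ride check (boardings, alightings, and load), can be sketched as below. The function name `load_profile` and the input layout are assumptions made for illustration.

```python
from itertools import accumulate


def load_profile(stops, legs):
    """Boardings, alightings, and departing load at each stop of a bus trip.

    stops: stop identifiers in the order the bus serves them.
    legs:  list of (boarding_stop, alighting_stop) located on this trip.
    """
    position = {stop: i for i, stop in enumerate(stops)}
    ons = [0] * len(stops)
    offs = [0] * len(stops)
    for board, alight in legs:
        ons[position[board]] += 1
        offs[position[alight]] += 1
    # Load leaving each stop is the running sum of boardings minus alightings.
    loads = list(accumulate(on - off for on, off in zip(ons, offs)))
    return ons, offs, loads
```

The three lists can be compared stop by stop with the ride check counts, keeping in mind that MetroCard-based boardings exclude cash and single-use fares.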

To compare the results obtained by the automated location procedure with those for some known trips, some MetroCards were purchased and 10 predetermined tours were taken. The boarding and alighting times and locations were logged during those tours. The results derived by the automated location procedure were compared with the results derived from actual trips. By and large, the results were quite promising, given the quality of some of the data sets. A couple of subway journeys used different pairs of trains to get between the same pair of stations, which happens when actual choices do not match the SSP algorithm query expansion.

QUERY SOFTWARE

Powerful query software that allows almost any conceivable query to be answered was created for this project. The software works in two steps: trip or leg selection and output creation. Queries can be made on either the linked trips table or the unlinked legs table.

The tool is set up to work with multiple days of data processed from the 2-week study period from April 19 to May 2, 2004, which was selected as a representative period for data collection and which was free of exceptional events, such as snowstorms, disasters, and school vacations.

The software is a custom application of TransCAD that is accessed through a new custom toolbox. The software was designed to be easy for anyone to use to get the information that they desire. No special knowledge or training is required to use the system. Users can interactively select the geographic location and time period of interest.

The query builder step defines a query by combining one or more selection primitives that conceptually select a set of trips or legs (Figure 2). The query can require that either any or all of the primitives be matched. For linked trips, the primitives include selection by the mode, route, or pattern of a particular leg in the trip sequence; by the origin or destination of the trip by the specification of stops, zones, census tracts, or boroughs; by the inclusion of a particular type of transfer between modes within the trip; or by a general structured query language (SQL) query. For unlinked legs, the primitives include selection by the mode, route, or pattern of the leg; by the origin or destination of the leg by the specification of stops, zones, census tracts, or boroughs; by specification of the mode of either the preceding or following leg used as part of a linked trip; by a general SQL query; or by the position of the leg within its linked trip. Queries can be saved for later reuse.
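The idea of combining selection primitives under an "any" or "all" rule can be illustrated with predicate functions. This sketch does not reflect the TransCAD toolbox itself; the primitive constructors, the trip record layout, and the function names are hypothetical.

```python
def by_mode(mode):
    """Primitive: the linked trip contains at least one leg of this mode."""
    return lambda trip: any(leg["mode"] == mode for leg in trip["legs"])


def by_origin_borough(borough):
    """Primitive: the linked trip starts in this borough."""
    return lambda trip: trip["origin_borough"] == borough


def build_query(primitives, match="all"):
    """Combine primitives; `match` is 'all' (every one) or 'any' (at least one)."""
    if match not in ("all", "any"):
        raise ValueError("match must be 'all' or 'any'")
    combine = all if match == "all" else any
    return lambda trip: combine(p(trip) for p in primitives)


def run_query(trips, query):
    """Return the trips selected by the query."""
    return [trip for trip in trips if query(trip)]
```

Because a built query is itself a predicate, it can be stored and reapplied to another day's trips, in the spirit of the saved queries mentioned in the text.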

The execute step specifies the day to be used for the query, which is selected from the list of available days. The trips or legs can be further restricted on the basis of a time period or an existing selection set of linked trips. The choices for the output include reports (Figure 3), maps (Figure 4), origin–destination matrices, or a TransCAD selection set that can be used to create external tables or spreadsheets. The reports can be summarized by arrival time, departure time, mode or route of a particular leg, origin, destination, or origin–destination pair. A ridership report by either route pattern or route segment can also be produced. Maps can depict the origins and destinations of the trips or legs selected. Maps can also include a scaled theme depicting the ridership by street or track segment. The origin–destination matrices summarize the trips or legs by stop, zone, census tract, or borough. In addition to creating export tables, the selection set can be used for general analysis in TransCAD or for examination of individual linked trips in a customized trip browser that depicts each leg of the linked trip on the map (Figure 5).
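Summarizing selected trips into an origin–destination matrix at a chosen geographic level amounts to counting origin and destination pairs. The following minimal sketch assumes a hypothetical record layout in which each trip end carries its stop, zone, tract, and borough.

```python
from collections import Counter


def od_matrix(trips, level):
    """Count trips by (origin, destination) at the given geographic level.

    trips: list of dicts whose 'origin' and 'destination' are dicts keyed
           by level name, e.g. 'stop', 'zone', 'tract', or 'borough'.
    Returns a Counter keyed by (origin_value, destination_value).
    """
    return Counter((t["origin"][level], t["destination"][level]) for t in trips)
```

A sparse mapping of this kind suits stop-level matrices, where most of the possible cells are empty.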

CURRENT USE OF CITYWIDE TRANSIT TRAVEL DATABASE

The Citywide Transit Travel Database with its query tool in TransCAD provides a unique resource for short- and long-term transportation planning in New York City as well as for the planning and scheduling of subway and bus services by NYCT. The millions of transaction records that the MetroCard fare collection system produces daily provide unprecedented details on the spatial and temporal aspects of transit use. In addition to NYCT, MetroCards are accepted on Long Island Bus, MTA Bus, Atlantic Express, Metro-North Hudson Rail Link, the Bee Line bus system in Westchester County, SIR, Roosevelt Island Tramway, the PATH system, and the JFK AirTrain.

FIGURE 2 Query builder example for the No. 6 Train.

The retrieval of basic origin and destination information by route for transit users in New York City is a daunting task. On an average weekday, there are 5 million subway entries and 2.5 million bus boardings. This database provides origin and destination information by route, including the key distinction between linked and unlinked trips that is essential for an understanding of existing travel patterns and the development of forecasts. This trip information at the subway station and bus stop levels was also converted into traditional zone (block group)-to-zone trip tables for use in various travel demand models. NYCT maintains a detailed transit network model for analysis of the effects of subway and bus service alternatives on route choice, travel time savings, and convenience. For regional transit analysis focused on commuter rail and subway use, MTA maintains a model (a regional transit forecasting model) for mode choice, trip assignment, and benefit analysis. The metropolitan planning organization for the New York City portion of the region (the New York Metropolitan Transportation Council) maintains a full-fledged travel demand forecasting model that incorporates features from both the MTA and the NYCT models. Application examples are as follows.

Long- and Short-Term Service Planning

1. The zone-to-zone trip tables are essential for the demand forecasting tasks required for major additions to the subway system, such as the Second Avenue Subway. Zone-level trip tables are needed to project future travel as population, employment, and land use changes occur.

2. Detailed trip information helps address short-term service planning issues, such as capacity and crowding constraints, fleet additions, and the allocation of operating and capital resources.

Reconstruction Service Planning

3. Portions of subway lines sometimes need to be taken out of service for extended periods of time for reconstruction. Information on how passengers use the line is used to help plan for the service interruption, and the network model is used to estimate how passengers will adjust their route use and determine whether supplementary service is needed.

FIGURE 3 Report generated for the No. 6 Train example.

4. Travel pattern information is also used to determine the impacts on passengers of routine late-night and weekend maintenance and replacement work that requires track outages.

Bus Route Analysis

5. Origin and destination information is used to evaluate the proposed splitting of long bus routes in Manhattan and Brooklyn into two shorter and more reliable routes while minimizing the need to transfer.

6. The origin and destination locations of express routes in Staten Island were analyzed.

7. For bus rapid transit corridor planning, origin and destination information was used to help determine existing bus trip lengths and subway transfers in the corridor.

Route Usage

8. Station-to-station subway route use information was provided during off-peak periods and weekends to plan diversions during reconstruction work.

FIGURE 4 Ridership map generated for the No. 6 Train example.

FIGURE 5 A three-legged subway–bus trip being browsed.

9. The total number of riders using each subway route, including transfers, was extracted by time of day to improve the information provided to government agencies and the press.

Model Enhancements and Updates

10. For the first time, the linked trip product provides a transit trip table that includes all of the bus and subway links that form a single trip. This is important for understanding and modeling the thousands of ways that passengers use the extensive bus and subway network in New York City.

11. The modeling results are used to calibrate the NYCT demand model and for validation purposes when the results are compared with other sources of transit usage information.

12. The zone-to-zone trip tables are incorporated into the regional trip tables used in the regional models.

Household Travel Surveying

13. As part of the updating of the model, MTA has been conducting a household travel survey within New York City, and the project team has been using this MetroCard-based information to validate the origin and destination trip tables being produced by the survey.

14. Because of logistical and financial constraints, the household survey is limited to the collection of detailed information from a sample of adult residents of New York City and is subject to response bias. The MetroCard-based information, however, is based on millions of daily transactions and includes all paying passengers, regardless of their age or residence. These two sources of transit demand information complement each other.

CONCLUSION

The project described here involved the creation of custom software that processes MetroCard data and creates geographically located linked passenger trips. These trips can then be queried by the use of user-friendly software to produce reports, maps, extracts, and matrices, as needed, to support various planning and operational needs. Numerous unexpected hurdles were overcome to create a functioning system that is, the authors believe, the first to actually handle buses and mixed-mode trips for a transit system with entry-only AFC data.

The processing procedure developed to generate the origin–destination trip database is highly complex and involves numerous intermediate steps, many assumptions, and some sampling approximations. Undoubtedly, there are errors in the trip tables produced, especially for bus trips. Nevertheless, the portrait of NYCT system use is probably the most accurate that has ever been available, and it has many potential applications.

The data development approach could be improved in many ways. The following changes would improve the accuracy of the results that could be obtained from MetroCard transaction data. Some of them would also allow simpler algorithms to be used for the processing.

1. Improve the precision of the recorded MetroCard transaction times to a minute, a second, or better. A bus can travel a significant distance in 6 min.

2. Preserve the orderings of the MetroCard bus boardings.

3. Improve the route system to the point at which it matches the schedules exactly and also has accurate geographic locations for all the stops.

4. Improve the bus trip-logging system to allow the easier recovery of trip records. This could include having the drivers enter sign codes more consistently. It would be even better to have a Global Positioning System-based system to provide bus locations.
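The effect of the 6-min time resolution noted in the first recommendation can be illustrated with simple arithmetic. The sketch below is only an illustration: the function name and the bus speeds are assumed values chosen for the example, not figures reported in this paper. It computes how far a bus moving at a constant speed could travel within a single timestamp bin.

```python
# Illustrative sketch: the speeds below are assumed values, not data from the paper.
TRUNCATION_HOURS = 0.1  # transaction times are truncated to tenths of an hour (6 min)


def max_position_uncertainty_miles(speed_mph, resolution_hours=TRUNCATION_HOURS):
    """Farthest a bus could travel within one timestamp bin at a constant speed."""
    return speed_mph * resolution_hours


if __name__ == "__main__":
    for speed_mph in (5.0, 10.0, 15.0):  # hypothetical average bus speeds
        miles = max_position_uncertainty_miles(speed_mph)
        print(f"{speed_mph:4.1f} mph: up to {miles:.2f} miles within one 6-min bin")
```

Passing a finer resolution (e.g., `resolution_hours=1 / 60` for 1-min timestamps) shows how the same calculation shrinks the positional ambiguity proportionally.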

The data provide the opportunity to perform much more sophisticated types of forecasting of the use of new services by making it possible to use dynamic (i.e., time-dependent) methods of transit assignment. Further calibration of the transit assignment procedure could work hand in hand to make fuller use of the data generated.

The wealth of data now available also makes it possible to study nontraditional commuting patterns, which previously would have been ignored by onboard surveys. For example, it is now possible to examine commuting between or within the four boroughs outside of Manhattan instead of the more common commute into Manhattan.
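As a minimal sketch of the kind of query this enables, the example below counts linked trips whose origin and destination both lie outside Manhattan. The record layout, the function name, and the sample trips are hypothetical and are used only for illustration; they do not reflect the actual schema or contents of the linked trip database.

```python
from collections import Counter

# Hypothetical linked-trip records: (origin_borough, destination_borough).
# These rows are made-up examples, not data from the MetroCard system.
linked_trips = [
    ("Brooklyn", "Manhattan"),
    ("Queens", "Brooklyn"),
    ("Bronx", "Bronx"),
    ("Brooklyn", "Queens"),
    ("Queens", "Brooklyn"),
    ("Staten Island", "Manhattan"),
]


def non_manhattan_flows(trips):
    """Count trips whose origin and destination are both outside Manhattan."""
    return Counter(
        (origin, dest)
        for origin, dest in trips
        if origin != "Manhattan" and dest != "Manhattan"
    )


flows = non_manhattan_flows(linked_trips)
for (origin, dest), count in sorted(flows.items()):
    print(f"{origin} -> {dest}: {count}")
```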

ACKNOWLEDGMENT

The authors acknowledge the invaluable contributions of the NYCT project manager, Larry Hirsch, whose insights were instrumental in developing a solution to many of the difficult problems encountered in this work. His observation that certain transfer sequences could be used to infer the location of a bus was critical for the development of the bus location procedure.

REFERENCES

1. Zhao, J. Rail Transit OD Planning and Analysis Implications of Automated Data Collection Systems: Rail OD Matrix Inference and Path Choice Modeling Examples. MS thesis. Department of Urban Studies and Planning and Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, 2004.

2. Barry, J. J., R. Newhouser, A. Rahbee, and S. Sayeda. Origin and Destination Estimation in New York City with Automated Fare System Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 1817, Transportation Research Board of the National Academies, Washington, D.C., 2002, pp. 183–187.

3. Rahbee, A., and D. Czerwinski. Using Entry-Only Automatic Fare Collection Data to Estimate Rail Transit Passenger Flows at CTA. Proc., 2002 Transport Chicago Conference, Chicago, Ill., 2002.

4. Wilson, N. H. M., J. Zhao, and A. Rahbee. The Potential Impact of Automated Data Collection Systems on Urban Public Transport Planning. In Schedule-Based Modeling of Transportation Networks. Springer, New York, 2009, pp. 75–99.

5. Utsunomiya, M., J. Attanucci, and N. H. Wilson. Potential Uses of Transit Smart Card Registration and Transaction Data to Improve Transit Planning. In Transportation Research Record: Journal of the Transportation Research Board, No. 1971, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 119–126.

6. Zhao, J., A. Rahbee, and N. H. M. Wilson. Estimating a Rail Passenger Trip Origin–Destination Matrix Using Automatic Data Collection Systems. Computer-Aided Civil and Infrastructure Engineering, Vol. 22, No. 5, 2007, pp. 376–387.

7. Slavin, H., A. Rabinowicz, J. Brandon, G. Flammia, and R. Freimer. Using Automatic Fare Collection Data, GIS, and Dynamic Schedule Queries to Improve Transit Data and Transit Assignment Models. In Schedule-Based Modeling of Transportation Networks (N. H. M. Wilson and A. Nuzzolo, eds.). Springer, New York, 2009, pp. 101–118.

The Rail Transit Systems Committee sponsored publication of this paper, which was selected by the Public Transportation Group to receive the First Annual Outstanding Research Paper in Public Transportation Award.

