Capstone Project
NYC Taxi DataSet
• The data is stored in CSV format, organized by year and
month.
• In each file, each row represents a single taxi trip.
• Table 1 below gives a small sample of this data.
• The data covers four years, with several entries per second.
• The raw trip data takes up about 116GB in text CSV format.
• The data is organized as follows:
• Medallion (car ID).
• Hack license (driver ID).
• Vendor id.
• Rate_code (taximeter rate).
• Store_and_fwd_flag (unknown attribute).
• Pickup datetime: start time of the trip, mm-dd-yyyy
hh24:mm:ss EDT.
• Dropoff datetime: end time of the trip, mm-dd-yyyy
hh24:mm:ss EDT.
• Passenger count: number of passengers on the trip, default
value is one.
• Trip time in secs: trip time measured by the taximeter in
seconds.
• Trip distance: trip distance measured by the taximeter in
miles.
• Pickup_longitude and pickup_latitude: GPS coordinates at
the start of the trip.
• Dropoff longitude and dropoff latitude: GPS coordinates at
the end of the trip.
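The trip attributes listed above can be loaded into typed records. The following is a minimal sketch, not the project's actual loader: the column names (`pickup_datetime`, `trip_time_in_secs`, and so on) and the `Trip` and `read_trips` names are assumptions for illustration, and the timestamp format follows the mm-dd-yyyy hh24:mm:ss pattern given on the slides.

```python
import csv
from dataclasses import dataclass
from datetime import datetime

# mm-dd-yyyy hh24:mm:ss, as described on the slides (time zone handling omitted).
TIME_FORMAT = "%m-%d-%Y %H:%M:%S"


@dataclass
class Trip:
    medallion: str          # car ID
    hack_license: str       # driver ID
    pickup: datetime
    dropoff: datetime
    passenger_count: int
    trip_time_in_secs: int
    trip_distance: float    # miles
    pickup_lon: float
    pickup_lat: float
    dropoff_lon: float
    dropoff_lat: float


def parse_trip(row: dict) -> Trip:
    """Convert one CSV row (column names are assumed) into a Trip."""
    return Trip(
        medallion=row["medallion"],
        hack_license=row["hack_license"],
        pickup=datetime.strptime(row["pickup_datetime"], TIME_FORMAT),
        dropoff=datetime.strptime(row["dropoff_datetime"], TIME_FORMAT),
        # The slides state that the default passenger count is one.
        passenger_count=int(row.get("passenger_count") or 1),
        trip_time_in_secs=int(row["trip_time_in_secs"]),
        trip_distance=float(row["trip_distance"]),
        pickup_lon=float(row["pickup_longitude"]),
        pickup_lat=float(row["pickup_latitude"]),
        dropoff_lon=float(row["dropoff_longitude"]),
        dropoff_lat=float(row["dropoff_latitude"]),
    )


def read_trips(csv_file):
    """Yield one Trip per row of an open CSV file that has a header line."""
    for row in csv.DictReader(csv_file):
        yield parse_trip(row)
```

Reading row by row with a generator avoids loading a whole monthly file into memory, which matters given the size of the raw data.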
• Fare data is also available from 2010-2014. A sample of the fare
data is shown in Table 2 below. This dataset contains the
following attributes:
• Medallion: car ID.
• Hack license: driver ID.
• Vendor id:
• Pickup datetime: start time of the trip, mm-dd-yyyy
hh24:mm:ss EDT.
• Fare amount: the meter fare, in USD; it should include the Newark
surcharge.
• Surcharge: Extra fees, such as rush hour and overnight
surcharges, in USD.
• MTA tax: Metropolitan Commuter Transportation Mobility Tax,
in USD.
• Tip amount: tip amount, in USD.
• Tolls amount: total price paid for tolls, summed across all
tolls for the trip, in USD.
• Total amount: all charges that are presented to the passenger
at time of fare payment (includes tip for non-cash trips), in
USD.
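The fare attributes suggest two simple operations: matching a fare record to its trip, and checking that the charges add up. The sketch below rests on two assumptions that the slides do not state: that the attributes shared by both tables (medallion, hack license, vendor id, pickup datetime) identify a trip, and that the total amount equals the sum of the five listed components. The column names and function names are hypothetical.

```python
from decimal import Decimal

# Assumed column names for the monetary components of a fare record.
COMPONENTS = ("fare_amount", "surcharge", "mta_tax", "tip_amount", "tolls_amount")


def join_key(record: dict) -> tuple:
    """Attributes that appear in both the trip and fare tables (assumed to identify a trip)."""
    return (
        record["medallion"],
        record["hack_license"],
        record["vendor_id"],
        record["pickup_datetime"],
    )


def expected_total(fare: dict) -> Decimal:
    """Sum the fare components, using Decimal to avoid float rounding on currency."""
    return sum((Decimal(fare[name]) for name in COMPONENTS), Decimal("0"))


def is_consistent(fare: dict) -> bool:
    """True when the recorded total matches the sum of its components (an assumed invariant)."""
    return Decimal(fare["total_amount"]) == expected_total(fare)


def join_trips_with_fares(trips: list, fares: list) -> list:
    """Pair each trip with its fare record; trips without a matching fare are skipped."""
    fares_by_key = {join_key(f): f for f in fares}
    return [
        (trip, fares_by_key[join_key(trip)])
        for trip in trips
        if join_key(trip) in fares_by_key
    ]
```

Records that fail `is_consistent` are worth inspecting rather than discarding outright, since the assumed invariant may simply not hold for every record.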
Trajectory Data Query Model
• Existing query models for trajectory data focus on
searching for trajectories or trips with respect to a
given range or point.
• (e.g. “find all objects within a given area (or at a given point)
sometime during a given time interval” or “find the k-closest
objects with respect to a given point at a given time interval”)
• The coordinate based queries:
• Point Queries: (e.g. find the location of a specific object
between 1:00pm-1:30pm).
• Region Queries: (e.g. find all trajectories or trips that passed
through region R between 1:00pm-1:30pm).
• K-Nearest Neighbor Queries: (e.g. find all trajectories or trips
within 500m of a gas station between 1:00pm-1:30pm).
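The region and nearest-neighbor queries above can be sketched with a linear scan over timestamped GPS points. This is an illustrative baseline under stated assumptions, not the indexing scheme of any particular system: the `Point` record, the bounding-box representation of a region, and the function names are all hypothetical, and distances use the haversine formula with a mean Earth radius.

```python
import heapq
import math
from datetime import datetime
from typing import NamedTuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


class Point(NamedTuple):
    trip_id: str
    time: datetime
    lon: float
    lat: float


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_window(point, start, end):
    """True when the point's timestamp lies in the closed interval [start, end]."""
    return start <= point.time <= end


def region_query(points, box, start, end):
    """Points inside box = (min_lon, min_lat, max_lon, max_lat) during the interval."""
    min_lon, min_lat, max_lon, max_lat = box
    return [
        p for p in points
        if in_window(p, start, end)
        and min_lon <= p.lon <= max_lon
        and min_lat <= p.lat <= max_lat
    ]


def range_query(points, lon, lat, radius_m, start, end):
    """Points within radius_m of (lon, lat) during the interval, e.g. within 500m."""
    return [
        p for p in points
        if in_window(p, start, end) and haversine_m(lon, lat, p.lon, p.lat) <= radius_m
    ]


def knn_query(points, lon, lat, k, start, end):
    """The k points closest to (lon, lat) during the interval, nearest first."""
    candidates = (p for p in points if in_window(p, start, end))
    return heapq.nsmallest(k, candidates, key=lambda p: haversine_m(lon, lat, p.lon, p.lat))
```

A linear scan is only practical for small samples; at the scale of the full taxi data, a spatial or spatio-temporal index would be needed to answer these queries interactively.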
• The trajectory based queries:
• Topological Queries: (e.g. “When did vehicle X enter street Y
most recently”).
• Navigational Queries: (e.g. “What is the current speed of
vehicle X”).
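A navigational query such as vehicle speed can only be approximated from the trip records described earlier, since they hold one distance and one duration per trip rather than continuous GPS traces. The sketch below computes a per-trip average speed from the trip distance (miles) and trip time (seconds); the function name is hypothetical, and treating average speed as a stand-in for current speed is an assumption.

```python
def average_speed_mph(trip_distance_miles: float, trip_time_in_secs: int):
    """Average speed over a whole trip in miles per hour, or None if the time is not positive."""
    if trip_time_in_secs <= 0:
        # Zero or negative durations make the speed undefined.
        return None
    return trip_distance_miles * 3600 / trip_time_in_secs
```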
A Study of New York City Taxi Trips
From: Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips. Nivan Ferreira, Jorge Poco, Huy T. Vo, Juliana Freire, and Claudio T. Silva
• For NYC DataSet:
• 2013 : http://www.andresmh.com/nyctaxitrips/
• 2010 – 2013: https://uofi.app.box.com/NYCtaxidata
• NYC TaxiVis Paper: http://vgc.poly.edu/projects/taxivis/
Questions