Capstone Project

Capstone Project. NYC Taxi DataSet The data is stored in CSV format, organized by year and month. In each file, each row represents a single taxi trip



NYC Taxi DataSet

• The data is stored in CSV format, organized by year and month.

• In each file, each row represents a single taxi trip.

• Table 1 below gives a small sample of this data.

• There are several entries per second for four years.

• The raw trip data takes up about 116GB in text CSV format.
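At roughly 116GB, the raw data will not fit in memory on most machines, so it is usually read one monthly file at a time and one row at a time. The following is a minimal sketch of that idea using only the Python standard library. The directory layout (`<root>/<year>/trip_data_<month>.csv`) and the function names are assumptions made for illustration, not the actual layout of the published files.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, List


def monthly_files(root: Path, year: int) -> List[Path]:
    """Return one year's CSV files in month order.

    Assumes a hypothetical layout: <root>/<year>/trip_data_<month>.csv
    """
    files = (root / str(year)).glob("trip_data_*.csv")
    # Sort numerically so that month 10 comes after month 2.
    return sorted(files, key=lambda p: int(p.stem.rsplit("_", 1)[1]))


def iter_trips(path: Path) -> Iterator[Dict[str, str]]:
    """Yield one dict per trip row without loading the whole file."""
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle, skipinitialspace=True):
            yield row


def count_trips(root: Path, year: int) -> int:
    """Count the trips of one year by streaming every monthly file."""
    return sum(1 for path in monthly_files(root, year) for _ in iter_trips(path))
```

Streaming rows through a generator keeps memory use flat regardless of file size, at the cost of re-reading a file for every pass over it.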


• The data is organized as follows:

• Medallion (car ID).

• Hack license (driver ID).

• Vendor id.

• Rate_code (taximeter rate).

• Store_and_fwd_flag (unknown attribute).


• Pickup datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.

• Dropoff datetime: end time of the trip, mm-dd-yyyy hh24:mm:ss EDT.

• Passenger count: number of passengers on the trip, default value is one.

• Trip time in secs: trip time measured by the taximeter in seconds.


• Trip distance: trip distance measured by the taximeter in miles.

• Pickup longitude and pickup latitude: GPS coordinates at the start of the trip.

• Dropoff longitude and dropoff latitude: GPS coordinates at the end of the trip.
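The trip attributes listed above map naturally onto a typed record. The sketch below parses one CSV row (as a dict of strings) into such a record. The snake_case column names, the `Trip` class, and the rule that an empty passenger count falls back to the default of one are assumptions for illustration; the real header names may differ. Timestamps are parsed as naive datetimes, with the EDT zone left implicit.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict

# mm-dd-yyyy hh24:mm:ss, the timestamp layout described in the attribute list.
TIME_FORMAT = "%m-%d-%Y %H:%M:%S"


@dataclass(frozen=True)
class Trip:
    medallion: str            # car ID
    hack_license: str         # driver ID
    pickup: datetime
    dropoff: datetime
    passenger_count: int
    trip_time_secs: int
    trip_distance_miles: float
    pickup_lon: float
    pickup_lat: float
    dropoff_lon: float
    dropoff_lat: float


def parse_trip(row: Dict[str, str]) -> Trip:
    """Convert one CSV row into a Trip (column names are hypothetical)."""
    return Trip(
        medallion=row["medallion"],
        hack_license=row["hack_license"],
        pickup=datetime.strptime(row["pickup_datetime"], TIME_FORMAT),
        dropoff=datetime.strptime(row["dropoff_datetime"], TIME_FORMAT),
        # Assumption: an empty field means the default of one passenger.
        passenger_count=int(row["passenger_count"] or 1),
        trip_time_secs=int(row["trip_time_in_secs"]),
        trip_distance_miles=float(row["trip_distance"]),
        pickup_lon=float(row["pickup_longitude"]),
        pickup_lat=float(row["pickup_latitude"]),
        dropoff_lon=float(row["dropoff_longitude"]),
        dropoff_lat=float(row["dropoff_latitude"]),
    )
```

Parsing into typed fields early surfaces malformed rows (bad timestamps, non-numeric coordinates) as exceptions at load time instead of as silent errors in later analysis.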


• Fare data is also available for 2010-2014. A sample of the fare data is shown in Table 2 below. This dataset contains the following attributes:

• Medallion: car ID.

• Hack license: driver ID.

• Vendor id.

• Pickup datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.


• Fare amount: the meter fare, which should include the Newark surcharge, in USD.

• Surcharge: extra fees, such as rush hour and overnight surcharges, in USD.

• MTA tax: Metropolitan Commuter Transportation Mobility Tax, in USD.

• Tip amount: tip amount, in USD.


• Tolls amount: total price paid for tolls, summed across all tolls for the trip, in USD.

• Total amount: all charges that are presented to the passenger at time of fare payment (includes tip for non-cash trips), in USD.
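The fare data shares medallion, hack license, vendor id, and pickup datetime with the trip data, which suggests joining the two on those attributes. The following is a minimal sketch of such a join over rows represented as dicts. It assumes that the four attributes together identify a trip uniquely and that the columns use the snake_case names shown; neither is confirmed here. The lookup table is held in memory, so this form suits one monthly file at a time, not the full dataset.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Attributes listed for both the trip data and the fare data
# (the snake_case spellings are hypothetical).
KEY_COLUMNS = ("medallion", "hack_license", "vendor_id", "pickup_datetime")


def trip_key(row: Dict[str, str]) -> Tuple[str, ...]:
    """Build the join key from the shared attributes."""
    return tuple(row[column].strip() for column in KEY_COLUMNS)


def join_trips_with_fares(
    trips: Iterable[Dict[str, str]],
    fares: Iterable[Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Yield trip rows extended with fare columns (an inner join)."""
    fare_by_key = {trip_key(fare): fare for fare in fares}
    for trip in trips:
        fare = fare_by_key.get(trip_key(trip))
        if fare is not None:
            # Trip values take precedence for columns present in both rows.
            yield {**fare, **trip}
```

Trips without a matching fare row are dropped; yielding them with empty fare columns instead would turn this into a left join.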


Trajectory Data Query Model

• Existing query models for trajectory data focus on searching for trajectories or trips with respect to a given range or point.

• (e.g. “find all objects within a given area (or at a given point) sometime during a given time interval” or “find the k-closest objects with respect to a given point at a given time interval”)


• The coordinate based queries:

• Point Queries: (e.g. find the location of a specific object between 1:00pm-1:30pm).

• Region Queries: (e.g. find all trajectories or trips that passed through region R between 1:00pm-1:30pm).

• K-Nearest Neighbor Queries: (e.g. find all trajectories or trips within 500m of a gas station between 1:00pm-1:30pm).
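The coordinate-based queries above can be sketched as simple scans over pickup points. The example below is a brute-force illustration only: a region query over a bounding box, the "within 500m" example as a distance filter, and a k-closest variant, all restricted to a time window. The `Pickup` record, the function names, and the use of the haversine great-circle distance are assumptions; a real system would normally use a spatial index, not a linear scan.

```python
import heapq
import math
from datetime import datetime
from typing import Iterable, List, NamedTuple, Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


class Pickup(NamedTuple):
    trip_id: str
    when: datetime
    lon: float
    lat: float


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_window(p: Pickup, start: datetime, end: datetime) -> bool:
    return start <= p.when <= end


def region_query(pickups: Iterable[Pickup], bbox: Tuple[float, float, float, float],
                 start: datetime, end: datetime) -> List[Pickup]:
    """Pickups inside bbox = (min_lon, min_lat, max_lon, max_lat) during the window."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [p for p in pickups
            if in_window(p, start, end)
            and min_lon <= p.lon <= max_lon and min_lat <= p.lat <= max_lat]


def within_distance(pickups: Iterable[Pickup], lon: float, lat: float,
                    radius_m: float, start: datetime, end: datetime) -> List[Pickup]:
    """Pickups at most radius_m metres from (lon, lat) during the window."""
    return [p for p in pickups
            if in_window(p, start, end)
            and haversine_m(p.lon, p.lat, lon, lat) <= radius_m]


def k_nearest(pickups: Iterable[Pickup], lon: float, lat: float, k: int,
              start: datetime, end: datetime) -> List[Pickup]:
    """The k pickups closest to (lon, lat) during the window, nearest first."""
    candidates = [p for p in pickups if in_window(p, start, end)]
    return heapq.nsmallest(k, candidates,
                           key=lambda p: haversine_m(p.lon, p.lat, lon, lat))
```

Each query costs one pass over the input, which is acceptable for a sample but not for several years of trips; grid or tree-based indexing is the usual next step.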


• The trajectory based queries:

• Topological Queries: (e.g. “When did vehicle X enter street Y most recently?”).

• Navigational Queries: (e.g. “What is the current speed of vehicle X?”).
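The taxi dataset records only the start and end of each trip, so a true "current speed" is not available from it. A rough stand-in is the average speed of a vehicle's most recent trip, computed from the taximeter distance and time. The sketch below illustrates that approximation; the `TripSummary` record and the function names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class TripSummary(NamedTuple):
    medallion: str          # car ID
    pickup: datetime
    distance_miles: float   # taximeter distance
    time_secs: int          # taximeter time


def average_speed_mph(distance_miles: float, time_secs: float) -> Optional[float]:
    """Average speed in miles per hour, or None if the trip time is not positive."""
    if time_secs <= 0:
        return None
    return distance_miles * 3600.0 / time_secs


def latest_trip(trips: Iterable[TripSummary], medallion: str) -> Optional[TripSummary]:
    """The most recent trip of the given vehicle, or None if it has no trips."""
    own = [t for t in trips if t.medallion == medallion]
    return max(own, key=lambda t: t.pickup) if own else None


def latest_speed_mph(trips: Iterable[TripSummary], medallion: str) -> Optional[float]:
    """Average speed of the vehicle's most recent trip (a proxy for current speed)."""
    trip = latest_trip(trips, medallion)
    if trip is None:
        return None
    return average_speed_mph(trip.distance_miles, trip.time_secs)
```

Returning `None` for zero or negative trip times avoids a division by zero on rows where the taximeter time is missing or corrupt.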


A Study of New York City Taxi Trips

From: Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips. Nivan Ferreira, Jorge Poco, Huy T. Vo, Juliana Freire, and Claudio T. Silva


• For NYC DataSet:

• 2013 : http://www.andresmh.com/nyctaxitrips/

• 2010 – 2013: https://uofi.app.box.com/NYCtaxidata

• NYC TaxiVis Paper: http://vgc.poly.edu/projects/taxivis/


Questions