8/18/2019 NYC Taxi Data Analysis
NYC TAXI DATA ANALYSIS
Parth Shah - 0989
Recap of Phase-2
• Dataset and Attribute
• Analysis using Map-Reduce (Abstract)
• Data collection and Integration
• Extension of Project
Attribute              Datatype
vendorid               number
trip_pickup_datetime   floating_timestamp
trip_dropoff_datetime  floating_timestamp
passenger_count        number
trip_distance          number
pickup_longitude       number
pickup_latitude        number
ratecodeid             number
store_and_fwd_flag     text
dropoff_longitude      number
dropoff_latitude       number
payment_type           number
fare_amount            number
extra                  number
mta_tax                number
tip_amount             number
tolls_amount           number
total_amount           number
The NYC Taxi dataset for 2013 was made available in 2014 under FOIL (the Freedom of Information Law).
Data was requested and collected by Chris Whong (pictured above) on a hard disk, and the analysis project was made available as open source on GitHub.
Later, the 2013 dataset was decoded by Vijay Pandurangan and ...
Taxi Request Frequency: Day, Month & Time
A few analyses are simple but useful on our data, like the overall NYC Taxi request frequency.
• Date_Mapper Output <Key, Value> -> <RoundbyDate(trip_pickup_datetime), list(occurrence)>
• Date_Reducer Output <Key, Value> -> <trip_pickup_date, count>
Now we will write the output of Date_Reducer to CSV ... <Key, Value> -> <trip_pickup_Month, count>
By Time
• Time_Mapper Output <Key, Value> -> <RoundbyHours(trip_pickup_datetime), list(occurrence)>
• Time_Reducer Output <Key, Value> -> <hour, count>
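A minimal sketch of the request-frequency counts by date, month, and hour, written as plain Python rather than a real Map-Reduce job. The timestamp format, the sample rows, and every function name (`round_by_date`, `round_by_month`, `round_by_hour`, `request_frequency`) are assumptions made for illustration and do not come from the slides.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical pickup timestamps; the real column is trip_pickup_datetime.
SAMPLE_PICKUPS = [
    "2015-01-01 00:15:00",
    "2015-01-01 00:45:10",
    "2015-01-01 13:05:00",
    "2015-01-02 13:30:00",
    "2015-02-02 08:00:00",
]

def parse(ts):
    # Assumed timestamp layout: YYYY-MM-DD HH:MM:SS.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")

# Stand-ins for the RoundbyDate / RoundbyHours keys, plus a month key.
def round_by_date(ts):
    return parse(ts).strftime("%Y-%m-%d")

def round_by_month(ts):
    return parse(ts).strftime("%Y-%m")

def round_by_hour(ts):
    return parse(ts).hour

def mapper(pickups, key_fn):
    # Emit one (key, 1) pair per taxi request.
    for ts in pickups:
        yield key_fn(ts), 1

def shuffle(pairs):
    # Group values by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(groups):
    # Sum the occurrences for each key.
    return {key: sum(values) for key, values in groups.items()}

def request_frequency(pickups, key_fn):
    return reducer(shuffle(mapper(pickups, key_fn)))

if __name__ == "__main__":
    for key_fn in (round_by_date, round_by_month, round_by_hour):
        print(key_fn.__name__, request_frequency(SAMPLE_PICKUPS, key_fn))
```

The same mapper and reducer are reused for all three groupings; only the key function changes.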
The Generous Area of New York
• This is a simple analysis but an interesting one. As already mentioned, we are going to introduce a new class (datatype) named Location
• Rounding the location creates an area; it is like a round region on the map
• Mapper Output <Key, Value> -> <Round(Location), list(tip)>
• Reducer Output <Key, Value> -> <Round(Location), Avg(tip)>
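A minimal sketch of the average tip per rounded location, again in plain Python. Rounding latitude and longitude to two decimal places is an assumed choice of area size, and the sample trips and the names `round_location` and `average_tip_by_area` are hypothetical.

```python
from collections import defaultdict

# Hypothetical trips: (pickup_latitude, pickup_longitude, tip_amount).
SAMPLE_TRIPS = [
    (40.7512, -73.9941, 2.0),
    (40.7488, -73.9912, 4.0),
    (40.6413, -73.7781, 10.0),
]

def round_location(lat, lon, places=2):
    # Nearby coordinates collapse to the same key, which defines one area.
    # Two decimal places is an assumption, not a value from the slides.
    return (round(lat, places), round(lon, places))

def mapper(trips):
    # Emit (rounded location, tip) for every trip.
    for lat, lon, tip in trips:
        yield round_location(lat, lon), tip

def reducer(pairs):
    # Collect the tips per area, then average them.
    tips = defaultdict(list)
    for area, tip in pairs:
        tips[area].append(tip)
    return {area: sum(values) / len(values) for area, values in tips.items()}

def average_tip_by_area(trips):
    return reducer(mapper(trips))

if __name__ == "__main__":
    for area, avg_tip in average_tip_by_area(SAMPLE_TRIPS).items():
        print(area, avg_tip)
```

Fewer decimal places give larger areas with more trips each; more decimal places give finer areas with noisier averages.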
Fare Increase of Taxi and Outlier Trip D...
• For this analysis we are going to integrate the 2014 and 2015 datasets of NYC Taxi and perform the analysis below:
• We will use the output of Analysis A as an extension of this one: we will take the most frequent trip locations and use them for the fare data
• Mapper Output <Key, Value> -> <ForFrequentTrip(RoundbyDate(trip_pickup_datetime)), List(Fare)>
• Reducer Output <Key, Value> -> <ForFrequentTrip(RoundbyDate(trip_pickup_datetime)), Avg(Fare)>
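A minimal sketch of the average fare per date, restricted to frequent trip locations. Treating ForFrequentTrip as a filter on the rounded pickup location is one possible reading of the slide, not a confirmed design. The set `FREQUENT_AREAS`, the sample rows, and all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical output of the earlier frequency analysis: frequent rounded areas.
FREQUENT_AREAS = {(40.75, -73.99)}

# Hypothetical rows from the combined datasets:
# (trip_pickup_datetime, pickup_latitude, pickup_longitude, fare_amount).
SAMPLE_TRIPS = [
    ("2014-03-01 09:00:00", 40.7512, -73.9941, 10.0),
    ("2014-03-01 18:30:00", 40.7488, -73.9912, 14.0),
    ("2014-03-01 19:00:00", 40.6413, -73.7781, 52.0),  # outside frequent areas
    ("2015-03-01 09:10:00", 40.7503, -73.9934, 13.0),
    ("2015-03-01 20:00:00", 40.7521, -73.9907, 15.0),
]

def round_by_date(ts):
    # Assumed timestamp layout: YYYY-MM-DD HH:MM:SS.
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").strftime("%Y-%m-%d")

def for_frequent_trip(lat, lon, frequent_areas, places=2):
    # Assumed meaning: keep only trips that start in a frequent area.
    return (round(lat, places), round(lon, places)) in frequent_areas

def mapper(trips, frequent_areas):
    # Emit (date, fare) for trips in frequent areas only.
    for ts, lat, lon, fare in trips:
        if for_frequent_trip(lat, lon, frequent_areas):
            yield round_by_date(ts), fare

def reducer(pairs):
    # Average the fares collected for each date.
    fares = defaultdict(list)
    for date, fare in pairs:
        fares[date].append(fare)
    return {date: sum(values) / len(values) for date, values in fares.items()}

def average_fare_by_date(trips, frequent_areas):
    return reducer(mapper(trips, frequent_areas))

if __name__ == "__main__":
    for date, avg_fare in sorted(average_fare_by_date(SAMPLE_TRIPS, FREQUENT_AREAS).items()):
        print(date, avg_fare)
```

Comparing the averages for matching dates across the two years gives a rough view of the fare change for the same areas.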
Thank You