NYC Taxi Data Analysis

  • 8/18/2019 NYC Taxi Data Analysis

    NYC TAXI DATA ANALYSIS

    Parth Shah - 0989

    • Dataset and Attribute

    • Analysis using Map-Reduce (Abstract)

    • Data collection and Integration

    • Extension of Project

    Recap of Phase-2

    Attribute                Datatype

    vendorid                 number
    trip_pickup_datetime     floating_timestamp
    trip_dropoff_datetime    floating_timestamp
    passenger_count          number
    trip_distance            number
    pickup_longitude         number
    pickup_latitude          number
    ratecodeid               number
    store_and_fwd_flag       text
    dropoff_longitude        number
    dropoff_latitude         number
    payment_type             number
    fare_amount              number
    extra                    number
    mta_tax                  number
    tip_amount               number
    tolls_amount             number
    total_amount             number
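As a rough illustration of the attribute list above, the sketch below converts one raw CSV row into typed Python values. The timestamp layout (`TIMESTAMP_FORMAT`) and the helper name `parse_trip` are assumptions made for this example; the slides do not show the raw file format.

```python
from datetime import datetime

# Columns listed with datatype "number" in the attribute table.
NUMBER_FIELDS = [
    "vendorid", "passenger_count", "trip_distance",
    "pickup_longitude", "pickup_latitude", "ratecodeid",
    "dropoff_longitude", "dropoff_latitude", "payment_type",
    "fare_amount", "extra", "mta_tax", "tip_amount",
    "tolls_amount", "total_amount",
]

# Columns listed with datatype "floating_timestamp".
TIMESTAMP_FIELDS = ["trip_pickup_datetime", "trip_dropoff_datetime"]

# Assumed timestamp layout; adjust to match the real export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_trip(row):
    """Return a copy of `row` (a dict of strings) with typed values.

    Missing or empty fields are left untouched; text fields such as
    store_and_fwd_flag stay as strings.
    """
    trip = dict(row)
    for name in NUMBER_FIELDS:
        if trip.get(name):
            trip[name] = float(trip[name])
    for name in TIMESTAMP_FIELDS:
        if trip.get(name):
            trip[name] = datetime.strptime(trip[name], TIMESTAMP_FORMAT)
    return trip
```

Rows read with `csv.DictReader` can be passed to `parse_trip` one at a time before being handed to a mapper.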

    The NYC Taxi dataset of 2013 was made available in 2014 under FOIL (the Freedom of Information Law).

    The data was requested and collected on a hard disk by Chris Whong, and the analysis project was made available as open source on GitHub:

    Later, the 2013 dataset was decoded by Vijay Pandurangan and

    Taxi Request Frequency: Day, Month & Time

    A few analyses are simple but useful on our data, such as the overall NYC Taxi request frequency.

    By Date

    • Date_Mapper Output <Key, Value> => <RoundbyDate(trip_pickup_datetime), list(occurrences)>

    • Date_Reducer Output <Key, Value> => <trip_pickup_date, count>

    By Month

    • Now we will write the output of Date_Reducer to CSV; aggregating it by month gives <Key, Value> => <trip_pickup_Month, count>

    By Time

    • Time_Mapper Output <Key, Value> => <RoundbyHours(trip_pickup_datetime), list(occurrences)>

    • Time_Reducer Output <Key, Value> => <hour, count>
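The date, month and hour counts can be sketched in plain Python as a small in-memory map/reduce. This is only an illustration of the key/value flow, not the project's Hadoop code: the `map_reduce` helper, the function names and the timestamp layout are all assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp layout; the slides do not show the raw format.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def round_by_date(pickup):
    """Truncate a pickup timestamp string to its date (YYYY-MM-DD)."""
    return datetime.strptime(pickup, TIMESTAMP_FORMAT).strftime("%Y-%m-%d")


def round_by_hours(pickup):
    """Truncate a pickup timestamp string to its hour of day (0-23)."""
    return datetime.strptime(pickup, TIMESTAMP_FORMAT).hour


def map_reduce(records, mapper, reducer):
    """Group mapper output by key, then reduce each key's list of values."""
    grouped = defaultdict(list)
    for record in records:
        key, value = mapper(record)
        grouped[key].append(value)
    return {key: reducer(values) for key, values in grouped.items()}


def count_by_date(pickups):
    # Mapper emits <date, 1>; reducer sums the occurrences per date.
    return map_reduce(pickups, lambda p: (round_by_date(p), 1), sum)


def count_by_month(date_counts):
    # Re-aggregates the date-level counts: "YYYY-MM-DD" -> "YYYY-MM".
    return map_reduce(date_counts.items(), lambda kv: (kv[0][:7], kv[1]), sum)


def count_by_hour(pickups):
    # Mapper emits <hour, 1>; reducer sums the occurrences per hour.
    return map_reduce(pickups, lambda p: (round_by_hours(p), 1), sum)
```

Feeding the date-level counts into `count_by_month` mirrors the step of reusing the Date_Reducer output rather than rescanning the raw trips.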

    The Generous Area of New-York

    • This is a simple analysis but an interesting one. As we already mentioned, we are going to introduce a new class (datatype) named Location.

    • Rounding a location creates an area, which looks like a round region on the map.

    • Mapper Output <Key, Value> => <Round(Location), list(tip)>

    • Reducer Output <Key, Value> => <Round(Location), Avg(tip)>
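A minimal sketch of the tip analysis follows. It assumes that the location is the pickup point and that "rounding" means rounding longitude and latitude to a fixed number of decimals (two here); the slides specify neither, so both choices and all function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def round_location(longitude, latitude, digits=2):
    """Snap a coordinate pair to a coarse grid cell (assumed rounding rule)."""
    return (round(longitude, digits), round(latitude, digits))


def tip_mapper(trip):
    """Emit <Round(Location), tip> for one trip (pickup point assumed)."""
    area = round_location(trip["pickup_longitude"], trip["pickup_latitude"])
    return area, trip["tip_amount"]


def average_tip_by_area(trips):
    """Reduce step: average the list of tips collected for each area."""
    tips = defaultdict(list)
    for trip in trips:
        area, tip = tip_mapper(trip)
        tips[area].append(tip)
    return {area: mean(values) for area, values in tips.items()}
```

Sorting the result by average tip, highest first, would rank the areas from most to least generous.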

    Fare Increase of Taxi and Outlier Trip D

    • For this analysis we are going to integrate the 2014 and 2015 NYC Taxi datasets and perform the analysis below:

    • We will use the output of Analysis A as an extension of this one: we will take the most frequent trip locations and use them for the fare data.

    • Mapper Output <Key, Value> => <ForFrequentTrip(RoundbyDate(trip_pickup_datetime)), List(Fare)>

    • Reducer Output <Key, Value> => <ForFrequentTrip(RoundbyDate(trip_pickup_datetime)), Avg(Fare)>
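The fare analysis might be sketched as two passes: first pick the most frequent trip locations, then average the fare per date over the trips that start there. The use of rounded pickup coordinates as the "location", the top-N cut-off and every function name below are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from statistics import mean


def trip_area(trip, digits=2):
    """Rounded pickup coordinates, used here as the trip location (assumed)."""
    return (round(trip["pickup_longitude"], digits),
            round(trip["pickup_latitude"], digits))


def most_frequent_areas(trips, top_n):
    """Return the set of the top_n areas with the most trips."""
    counts = Counter(trip_area(trip) for trip in trips)
    return {area for area, _ in counts.most_common(top_n)}


def average_fare_by_date(trips, frequent_areas):
    """Emit <date, fare> for trips in frequent areas, then average per date."""
    fares = defaultdict(list)
    for trip in trips:
        if trip_area(trip) in frequent_areas:
            # Assumes timestamps start with YYYY-MM-DD.
            date = trip["trip_pickup_datetime"][:10]
            fares[date].append(trip["fare_amount"])
    return {date: mean(values) for date, values in fares.items()}
```

Comparing the per-date averages across the two years is then one way to look for a fare increase on comparable trips.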

    Thank You