Delivering Deep Learning to Mobile Devices via Offloading
Xukan Ran*, Haoliang Chen*, Zhenming Liu¹, Jiasi Chen*
*University of California, Riverside
¹College of William and Mary

Slides: jiasi/pub/deep_vrarworkshop17_slides.pdf



Page 2: Deep learning on mobile devices

• Augmented reality (AR) is the next "killer app"
• Fast object recognition is key for general AR applications
• Deep learning is a popular technique for object recognition

Examples shown: Pokemon Go, Snapchat filters (face detection), Google Translate (text processing)

Page 3: Problem

• Current approaches for deep learning on mobile devices
  1. Local-only processing: slow! (~600 ms/frame)
     • Apple Photos, Google Translate
     • GPU speedup [1]
  2. Remote-only processing: doesn't work when network is bad
     • Apple Siri, Amazon Alexa
• Goal: Develop a framework to intelligently offload to nearby edge devices for real-time video analysis using deep learning.
• Cannot use general offloading techniques. Need to specifically account for:
  • Characteristics of the video
  • Characteristics of the deep learning models
  • Application requirements

[1] L. Huynh, Y. Lee, R. Balan, "DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications", ACM MobiSys, 2016.

Page 4: Design space

Degrees of freedom
• Video characteristics
  • Frame rate
  • Resolution
  • Bitrate
• Deep learning characteristics
  • Model size
  • Model latency/energy
  • Model accuracy

Constraints
• App requirements
  • Latency
  • Accuracy

Metrics
• Accuracy
• Frame rate
• Energy

Complex interactions between these degrees of freedom and metrics
• e.g., high bitrate when offloading → high accuracy, high energy
• e.g., small deep learning model → high frame rate, low accuracy

How to decide?

Page 5: Decision framework

[Diagram: degrees of freedom → optimize decision (subject to constraints) → metrics]

Degrees of freedom:
• Offloading decision
• Neural net model size
• Video resolution

Constraints:
• Current network conditions
• Application requirements

Metrics:
• Detection accuracy
• Frame rate
• Energy consumption

The relation between the degrees of freedom and the metrics cannot be understood analytically → need measurements!
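
One way to read the decision framework is as a constrained search over measured configurations. The sketch below is only an illustration under assumptions: the slides do not state the exact objective, so it assumes "maximize accuracy subject to a minimum frame rate and a maximum power draw". The `Measurement` type, the `choose_config` function, and all numbers in `TABLE` are hypothetical placeholders, not results from the talk.

```python
from typing import NamedTuple, Optional, Sequence


class Measurement(NamedTuple):
    """One offline-measured operating point (all fields hypothetical)."""
    offload: bool      # True = run on the server, False = run on the phone
    model: str         # e.g. "tiny-yolo" or "big-yolo"
    resolution: int    # frame side length in pixels
    accuracy: float    # detection accuracy in [0, 1]
    fps: float         # processed frames per second
    power_mw: float    # average power draw on the phone


def choose_config(table: Sequence[Measurement],
                  min_fps: float,
                  max_power_mw: float) -> Optional[Measurement]:
    """Return the most accurate feasible configuration, or None.

    Assumed objective: maximize accuracy, break ties by frame rate,
    subject to fps >= min_fps and power_mw <= max_power_mw.
    """
    feasible = [m for m in table
                if m.fps >= min_fps and m.power_mw <= max_power_mw]
    if not feasible:
        return None
    return max(feasible, key=lambda m: (m.accuracy, m.fps))


# Made-up numbers purely for illustration; real values would come from
# the offline measurements described in the slides.
TABLE = [
    Measurement(False, "tiny-yolo", 160, 0.55, 1.6, 900.0),
    Measurement(False, "tiny-yolo", 320, 0.65, 1.0, 1100.0),
    Measurement(True, "big-yolo", 320, 0.85, 2.5, 1500.0),
    Measurement(True, "big-yolo", 480, 0.95, 1.8, 1800.0),
]
```

With these placeholder values, `choose_config(TABLE, min_fps=1.5, max_power_mw=2000.0)` selects the offloaded big-yolo point at resolution 480, while tightening the power budget to 1000 mW leaves only the local tiny-yolo point at resolution 160.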

Page 6: System design

[Diagram: system architecture. Components shown:]
• Offline performance characterization: measurements → battery function, latency function, accuracy function
• Front-end device: camera feed, online decision engine, small CNN, output display (example detection: "car 0.9")
• Server: big CNN

Page 7: Experimental setup

• Deep learning model: YOLO built on TensorFlow [2]
  • tiny-yolo: 9 convolutional layers
  • big-yolo: 22 convolutional layers
• Local processing: OnePlus 3T Android phone with quad-core CPU, 6 GB RAM
• Remote processing: Server with quad-core CPU, 8 GB RAM, NVIDIA GeForce GTX 970 graphics card with 4 GB of RAM
• Developed app to implement offloading [Screenshot: video frame with bounding box]

[2] Joseph Redmon, Ali Farhadi, "YOLO9000: Better, Faster, Stronger", CVPR, 2017.

Page 8: [Figures: local-only processing; remote-only processing]

Page 9: How do latency and energy change with resolution?

• Encode a video frame at different resolutions
• Measure the processing time and energy usage in Android on the smartphone

Energy and latency increase with pixels²
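
The "increase with pixels²" observation suggests a simple way to turn such measurements into a latency (or energy) function. The sketch below assumes "pixels²" means the square of the frame's side length, and that a model of the form value ≈ a + b·side² is adequate. Both are assumptions, and the helper names `fit_quadratic_in_side` and `predict` are hypothetical.

```python
def fit_quadratic_in_side(sides, values):
    """Least-squares fit of value ≈ a + b * side**2.

    `sides` are frame side lengths in pixels; `values` are the measured
    latencies (or energies) at those sizes. Returns (a, b).
    """
    xs = [s * s for s in sides]          # regress on side squared
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct resolutions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, side):
    """Predicted latency (or energy) for a frame with the given side."""
    return a + b * side * side
```

Given a handful of (resolution, latency) samples from the phone, `fit_quadratic_in_side` returns coefficients that `predict` can evaluate at resolutions that were never measured directly.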

Page 10: How does accuracy change with bitrate and resolution?

• Encode 20 videos at different bitrates and resolutions
• Measure the accuracy (IoU) relative to big-yolo + raw video

[Figure panels: big-yolo, tiny-yolo]

Accuracy increases with larger model, higher resolution, higher bitrate
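
For reference, IoU (intersection over union) between a predicted bounding box and a reference box can be computed as in the sketch below. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions. The slides do not say how boxes are represented or how per-frame scores are aggregated across a video.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    Returns a value in [0, 1]; 0.0 if the boxes do not overlap.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

In the setup described above, the reference box would come from big-yolo on raw video and the other box from the configuration under test.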

Page 11: How fast is deep learning, end-to-end?

• Measure # processed frames per second, under controlled network conditions
• Caveat: stop-and-wait for each processed frame

[Figure: frames per second (0 to 3) vs. added network latency (0 to 400 ms); series: offload to server, run locally on phone]

• Increased bandwidth → higher frame rate
• When bandwidth > 5 Mbps, should offload
• Increased latency → lower frame rate
• When latency < 100 ms, should offload
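
Under the stop-and-wait caveat, the offloaded frame rate can be approximated with a back-of-the-envelope model: each frame costs one upload, one round trip, and one server inference before the next frame is sent. The sketch below is an assumed model, not the authors' measurement method. It ignores the size of the returned result and any encoding time, and the function names and parameters are hypothetical. The 600 ms default for local processing is taken from the ~600 ms/frame figure in the slides.

```python
def offload_fps(frame_kbits, bandwidth_mbps, rtt_ms, server_ms):
    """Frames per second when each frame waits for the previous result.

    frame_kbits:    encoded frame size in kilobits
    bandwidth_mbps: uplink bandwidth in megabits per second
    rtt_ms:         network round-trip time in milliseconds
    server_ms:      server-side inference time in milliseconds
    """
    transfer_s = frame_kbits / (bandwidth_mbps * 1000.0)
    total_s = transfer_s + rtt_ms / 1000.0 + server_ms / 1000.0
    return 1.0 / total_s


def should_offload(frame_kbits, bandwidth_mbps, rtt_ms, server_ms,
                   local_ms=600.0):
    """Offload only if it yields a higher frame rate than running locally."""
    local_fps = 1000.0 / local_ms
    return offload_fps(frame_kbits, bandwidth_mbps, rtt_ms, server_ms) > local_fps
```

The model reproduces the qualitative trends on the slide: more bandwidth shortens the transfer term and raises the frame rate, while added latency lengthens the round-trip term and lowers it. The actual crossover points (5 Mbps, 100 ms) come from the authors' measurements, not from this formula.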

Page 12: How much time is spent for communication?

• Record timestamps as frame travels from phone to server and back

When offloading, majority of time is spent on network
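
A timestamp-based breakdown of this kind could look like the sketch below. It assumes four timestamps per frame (two taken on the phone, two on the server) and subtracts only differences taken on the same clock, so the phone and server clocks do not need to be synchronized. The function and field names are hypothetical, and any encoding or decoding done between the phone-side timestamps is lumped into the network share here.

```python
def breakdown(t_phone_send, t_server_recv, t_server_done, t_phone_recv):
    """Split one frame's end-to-end time into server and network shares.

    t_phone_send / t_phone_recv are read from the phone's clock;
    t_server_recv / t_server_done are read from the server's clock.
    All values are in seconds.
    """
    total_s = t_phone_recv - t_phone_send        # phone clock only
    server_s = t_server_done - t_server_recv     # server clock only
    network_s = total_s - server_s               # everything else
    return {
        "total_s": total_s,
        "server_s": server_s,
        "network_s": network_s,
        "network_fraction": network_s / total_s,
    }
```

For example, a frame that takes 0.5 s end to end with 0.1 s of server processing spends 80% of its time outside the server.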

Page 13: How much battery is used from offloading deep learning?

• Measure the battery drop after 30 seconds of continuous usage

• Higher bandwidth → more battery
• Prefer to run locally to save battery
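
A measured battery drop can be converted into an average power figure so that runs are comparable. The sketch below shows the arithmetic only. The slides do not say how the drop was read or what units were used, so the percentage-based input, the battery capacity, and the nominal voltage are all assumed parameters, and `average_power_w` is a hypothetical helper.

```python
def average_power_w(drop_percent, duration_s, capacity_mah, voltage_v):
    """Average power implied by a battery drop over a time window.

    drop_percent: battery level drop in percentage points (assumed input)
    duration_s:   length of the measurement window in seconds
    capacity_mah: battery capacity in mAh (device-specific, assumed)
    voltage_v:    nominal battery voltage in volts (assumed)
    """
    # Full-battery energy in joules: Ah * V = Wh, and 1 Wh = 3600 J.
    capacity_j = (capacity_mah / 1000.0) * voltage_v * 3600.0
    energy_j = (drop_percent / 100.0) * capacity_j
    return energy_j / duration_s
```

As a worked example with placeholder values, a 0.5-point drop over 30 s on a 3000 mAh, 3.8 V battery corresponds to 205.2 J, or about 6.84 W on average.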

Page 14: How well does offloading do in the wild?

• Perform 5 trials in public locations over LTE and WiFi
  • Coffee shop 1: Different city from server
  • Coffee shop 2: Same city, same subnet as server
  • Apartment 1: Different city than server
  • Apartment 2: Same city as server

• Performance over LTE sometimes > WiFi
• Higher frame rates over LTE at the expense of data cost

Page 15: Key Take-Aways

• Real-time video analysis using deep learning is slow (~600 ms/frame on smartphones)
• Offloading can be beneficial (up to 2x frame rate), but optimal decision is unclear
• In the wild, LTE sometimes > public WiFi