Edge AI with TI Jacinto Processors (Efficient Edge Intelligence: TI Jacinto Processors)
August 2021
Andre Tseng & Rio Chan
Webinar agenda
• Edge AI system challenge
• Introducing TDA4x processors for practical embedded edge AI systems
• TI Edge AI software, tools and services for accelerated edge AI development
• TI Edge AI Cloud Service Demo
• Getting started
Embedded edge AI technology | Unlimited possibilities
AI is influencing broad applications
New use cases in existing applications
• Factory automation
• Retail automation
• Smart buildings & cities
• Industrial transport
• Healthcare
• Agriculture
• Construction
• Aerospace & defense
• Delivery, logistics
Smart cameras, autonomous machines & robots
Edge AI system challenge
Vision pipeline (diagram):
• Camera: capture RAW (CSI-2 RX / USB) or RTSP
• Process / decode: ISP / decode
• Image processing: dewarp, scale, crop, …
• Computer vision processing: depth & motion estimation
• Deep learning: classify, detect, …
• Output: visualization / on-screen display; encode / compress for stream / storage
Challenges:
• Complex vision pipeline: multi-camera image processing, classical computer vision and AI
• Diverse workloads that need a lot of compute horsepower for real-time processing, but at lower system power
• Robust AI system with functional safety and security, at lower system cost & complexity
Building a practical edge AI system
Optimized on 4 vectors
• Performance: deliver speed, latency and accuracy requirements consistently, robustly and reliably, even under harsh environments
• Power, size & weight: meet power, thermal & physical constraints
• Cost: optimized system cost
• Fast development cycle: easy-to-use software development kit
TDA4x processor architecture for practical embedded edge AI systems
Typical architecture (block diagram components): image sensor, ISP, applications processor (general compute), GPU (analytics), Ethernet switch, PCIe, safety MCU (CAN-FD, LIN, SPI), radar/LIDAR MCU
TDA4x processors enable practical embedded edge AI
AI with high performance, at low power and optimized system cost
Jacinto™ TDA4x processors (block diagram):
• Multi-core A72
• Deep learning accelerator
• DSP
• GPU, display
• Imaging & vision accelerators: ISP | LDC | MSC | NF | SDE | DOF
• Video codec accelerator
• Security accelerators
• Multi-core MCU
• Safety MCU
• Large internal memory, high-speed bus
• LPDDR4
• CSI-2 ports, USB 3, Ethernet & PCIe switches
• Functional safety levels shown: SIL-3, SIL-2
Programming with industry-standard APIs
C7x + MMA | Industry’s most efficient deep learning accelerator
▪ C7x DSP + Matrix Multiply Accelerator (MMA)
– Programmable accelerator for tensor, vector and scalar processing
▪ Smart memory architecture results in up to 90% utilization of the accelerator and DDR bandwidth savings
– High-bandwidth interconnect, large internal memory, 4D programmable DMA, data forward engine
▪ Self-sustained for deep learning workloads
– No dependency on host Arm or GPU; has its own DMA engine and memory subsystem
Block diagram: C7x core (scalar and vector units, L1I, L1D, L2, MMU, streaming engine, prefetch, Hist/LUT), MMA, DMA, MSMC (L3, DCU, coherence, firewall), with safety features on each block
• 8 TOPS (int8), 80 GFLOPS @ 1 GHz, per core
• High FPS/TOPS
• Designed for lower power; enables fanless design
• 512-bit wide, 64 GB/s
• Lowest number of DDR interfaces & bandwidth
• Designed for functional safety: ECC on data memory, using TI proprietary technology
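The headline figures on this slide can be cross-checked with simple arithmetic. The sketch below is only a back-of-envelope check, not a description of the hardware: it assumes one full-width bus transfer per clock cycle and the common convention that one multiply-accumulate counts as two operations. Neither assumption is stated in the slides.

```python
# Back-of-envelope checks for the C7x + MMA figures quoted on the slide.
CLOCK_HZ = 1_000_000_000   # 1 GHz, from the slide
BUS_WIDTH_BITS = 512       # from the slide
PEAK_TOPS = 8.0            # int8 peak, from the slide
UTILIZATION = 0.90         # "up to 90%", from the slide
OPS_PER_MAC = 2            # assumption: 1 multiply-accumulate = 2 operations


def bus_bandwidth_gb_per_s(width_bits: int, clock_hz: int) -> float:
    """Bandwidth in GB/s, assuming one full-width transfer per clock cycle."""
    return (width_bits / 8) * clock_hz / 1e9


def implied_macs_per_cycle(tops: float, clock_hz: int, ops_per_mac: int) -> float:
    """Multiply-accumulates per cycle implied by a peak TOPS figure."""
    return tops * 1e12 / clock_hz / ops_per_mac


def sustained_tops(tops: float, utilization: float) -> float:
    """Throughput left after applying an accelerator utilization factor."""
    return tops * utilization


if __name__ == "__main__":
    print(bus_bandwidth_gb_per_s(BUS_WIDTH_BITS, CLOCK_HZ))            # GB/s
    print(implied_macs_per_cycle(PEAK_TOPS, CLOCK_HZ, OPS_PER_MAC))    # MACs per cycle
    print(sustained_tops(PEAK_TOPS, UTILIZATION))                      # TOPS at 90% utilization
```

Under these assumptions, a 512-bit path at 1 GHz works out to the quoted 64 GB/s, and 8 TOPS at 1 GHz corresponds to roughly 4000 multiply-accumulates per cycle.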
Reimagine what’s possible with TDA4x processors
DL inference performance on TDA4VM (8 TOPS), 8-bit fixed-point, batch size 1, single 32-bit LPDDR4
MLPerf 0.7 benchmarks, frames per second (FPS):
• MobileNet V1 (224x224): 741
• ResNet-50 V1.5 (224x224): 162
• SSD-MobileNets-v1 (300x300): 385
• SSD-ResNet34 (1200x1200): 12.5
Feature extraction networks at 1 MP (1024x1024) resolution, frames per second (FPS)*:
• ResNet-50 v1 (1 MP): 22
• MobileNet v1 (1 MP): 102
• MobileNet v2 (1 MP): 96
• InceptionNet v1 (1 MP): 58
* 5-10% performance boost expected with future optimizations
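Because the benchmarks are reported at batch size 1, a frame rate can be turned into an average time per frame, and dividing by the 8 TOPS peak gives a rough FPS-per-TOPS efficiency figure. The sketch below only rearranges the MLPerf 0.7 numbers from the chart; the helper names are illustrative, and the average time per frame is not necessarily the end-to-end latency of a full application.

```python
# MLPerf 0.7 frame rates as read from the slide (TDA4VM, 8 TOPS, batch size 1).
PEAK_TOPS = 8.0
MLPERF_FPS = {
    "MobileNet V1 (224x224)": 741.0,
    "ResNet-50 V1.5 (224x224)": 162.0,
    "SSD-MobileNets-v1 (300x300)": 385.0,
    "SSD-ResNet34 (1200x1200)": 12.5,
}


def frame_time_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def fps_per_tops(fps: float, tops: float) -> float:
    """Frames per second delivered per TOPS of peak compute."""
    return fps / tops


if __name__ == "__main__":
    for name, fps in MLPERF_FPS.items():
        print(f"{name}: {frame_time_ms(fps):.2f} ms/frame, "
              f"{fps_per_tops(fps, PEAK_TOPS):.1f} FPS/TOPS")
```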
Accelerated imaging and computer vision
Blocks: de-warping engine | image pyramid | imaging subsystem
• WDR with 3 exposures
• 50% higher bit depth
• 3A statistics support
• Supports 180° and 360° FOV
• 10 different scales per input
• True 2D bilateral filtering
• Accelerate 10x 2 MP @ 30 fps cameras in real time
• Replace FPGA & custom ISP chips, free up CPU MHz
• Reduce system power, latency and BOM cost
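The multi-camera claim can be sanity-checked as a pixel-rate budget against the 720 MP/s imaging throughput quoted in the TDA4VM specification slide of this deck. The sketch below is a rough estimate under two assumptions that are not in the slides: "2 MP" is taken as exactly 2 megapixels per frame, and every camera passes through the accelerator once per frame.

```python
import math

# Imaging accelerator throughput quoted for TDA4VM in this deck, in MP/s.
IMAGING_CAPACITY_MP_S = 720.0


def demand_mp_per_s(cameras: int, megapixels: float, fps: float) -> float:
    """Aggregate pixel rate of identical cameras, in megapixels per second."""
    return cameras * megapixels * fps


def headroom_mp_per_s(capacity: float, demand: float) -> float:
    """Unused throughput; a negative value means the budget is exceeded."""
    return capacity - demand


def max_cameras(capacity: float, megapixels: float, fps: float) -> int:
    """Upper bound on camera count from raw pixel rate alone."""
    return math.floor(capacity / (megapixels * fps))


if __name__ == "__main__":
    demand = demand_mp_per_s(10, 2.0, 30.0)
    print(demand, headroom_mp_per_s(IMAGING_CAPACITY_MP_S, demand))
    print(max_cameras(IMAGING_CAPACITY_MP_S, 2.0, 30.0))
```

Ten 2 MP cameras at 30 fps need 600 MP/s, which fits inside 720 MP/s. The raw pixel-rate bound is higher than the ten cameras the slides quote, so it is best read as an upper limit rather than a supported configuration.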
Accelerated computer vision
Stereo Depth Estimation
• Depth estimation from two different views
• Confidence score for each disparity output
• Scalable, 2 MP, 192 disparities, 80 MP/s
Dense Optical Flow
• 2D motion vector field estimation given two images
• Confidence score for each flow vector output
• Scalable, 2 MP, 150 MP/s
• Accelerate depth and motion perception on multiple cameras in < 0.5 W
• Free CPU MHz
• Reduce system power, latency and BOM cost
TDA4x processors functional safety
TDA4x block diagram (16 nm FF, high-speed interconnect, power isolation; ASIL D and ASIL B domains)
• Application cores: 2x Arm Cortex-A7x, 48K/32K L1 each, 1 MB shared L2
• C7x DSP with MMA: 32K/48K L1, 512 KB L2
• 2x C66x DSP: 32K/32K L1, 288 KB L2 each
• GPU: Furian GE8430
• Arm Cortex-R5F: three dual-core clusters with lockstep option, 32K/32K L1 and 64 KB RAM per core; cluster SRAM of 0.5 MB, 1 MB and 0.5 MB
• Memory: 8 MB L3 RAM/cache with ECC (MSMC), 32-bit LPDDR4-4266
• Vision processing accelerator: ISP, NF, REMAP, MSC
• Depth & motion processing accelerator: dense optical flow, stereo
• Video acceleration: encode, decode
• Security acceleration: crypto (AES, 3DES, SHA, PKA, RNG)
• Display subsystem: 1x eDP + 1x DSI
• Capture subsystem: 2x CSI2 4L RX 2.5 Gbps, 1x CSI2 4L TX 2.5 Gbps
• Ethernet switch: up to 8 ports
• System services: GPIO, IPC, IOMMU, UDMA, SMMU, debug, timers, WDT, hardware diagnostics
• Connectivity / network: 4x PCIe, 14x …, 2x USB, RGMII/RMII, 1x MediaLB
• Serial: 1x I3C, 1x SDIO, 4x McSPI, 2x UART, 5x McASP, 3x I2C
• Storage: GPMC, 1x SD 3.0, 1x UFS 2.x, 1x eMMC 5.x
• MCU island: DMSC (Device Management & Security Controller); hardware diagnostics (CRC, RTI, ESM, DCC, BIST); 2x OSPI (XIP), 2x ADC, 3x I2C*, 3x SPI*, 2x UART*, 2x I3C*, UDMA, GPIO, 2x RGMII/RMII
Architecture
• ASIL-D/SIL-3 systematic capability
• Built-in hardware diagnostics
• ASIL-D/SIL-3 safety MCU island
• ASIL-B/SIL-2 main domain
• FFI, ECC, clock comparators
• Voltage & temperature monitors
Software
• TUV-certified safety software process
• Safety diagnostics reference & examples
• Self-test libraries
• SW FMEDAs, code coverage, traceability reports
• Compliance support packages
• Compiler qualification kit
Collateral
• Device safety manual
• Configurable FMEDA
• Safety analysis report
• Safety assessment certificate
• Trainings
• Whitepapers & Application
TI Edge AI software, tools and services for accelerated edge AI development
Fast development cycle with industry-standard APIs
Software stack (diagram):
• Applications: multi-camera AI processing, multi-video AI processing, sensor fusion, secure cloud connection
• Python & C++ application layer
• Industry-standard APIs and frameworks: TensorFlow Lite, ONNX RT, TVM, OpenCV, GStreamer, Docker, OpenGL ES (graphics)
• TI tools and middleware for HW accelerators
• TI Edge AI processor: Arm® Cortex®-A, DSP, hardware accelerators (deep learning, imaging, vision, video)
AI in your system | Three steps
Step 1: Train anywhere, develop anywhere
• TI Model Zoo (60+ models): Model Selection tool, new weights
• Own model
• Optional QAT (quantization-aware training) tool from TI: https://github.com/TexasInstruments/jacinto-ai-devkit
Step 2: Compile & optimize for TI SoC, using industry-standard compilers/runtimes
• TFLite / ONNX-RT / TVM compiler
• Common representation, post-training quantization, calibration, optimization, compilation
Step 3: Deploy on TI SoC, using industry-standard APIs
• TFLite RT / ONNX-RT / Neo-AI-DLR with the TIDL runtime on Linux (Cortex-A, DLA)
• TIDL library on RTOS
Key points:
• DL tools & software to reduce model development time
• Out-of-box optimized inference support for 60+ models
• Accelerated inference using open-source, industry-standard runtime engines
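The compile step names post-training quantization and calibration without describing them. As a rough illustration of the idea only, the sketch below shows a generic symmetric per-tensor int8 scheme: calibration data fixes a scale, and values are then rounded and clamped to 8-bit integers. This is an assumption-based example, not the algorithm used by TI's tools, and all function names are hypothetical.

```python
from typing import Iterable, List, Sequence


def calibrate_scale(batches: Iterable[Sequence[float]], num_bits: int = 8) -> float:
    """Pick a symmetric scale from the largest absolute value seen in calibration data."""
    max_abs = 0.0
    for batch in batches:
        for value in batch:
            max_abs = max(max_abs, abs(value))
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: Sequence[float], scale: float, num_bits: int = 8) -> List[int]:
    """Round to the nearest integer step and clamp to the signed integer range."""
    qmax = 2 ** (num_bits - 1) - 1
    qmin = -qmax - 1
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Map integers back to approximate real values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    calibration = [[-1.0, 0.5], [0.25, 2.54]]   # stand-in for real activations
    scale = calibrate_scale(calibration)
    q = quantize([2.54, -1.0, 0.0, 10.0], scale)
    print(scale, q, dequantize(q, scale))
```

Values outside the calibrated range, such as 10.0 above, saturate at the largest representable integer, which is why the choice of calibration data matters for accuracy.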
TI Edge AI Cloud for faster edge AI evaluation
Free online service that enables deep learning evaluation in minutes
Available now at https://dev.ti.com/edgeai
• Evaluate TI SoC DL capabilities on a remote EVM farm using a web browser and Jupyter Notebook
• No EVM purchase needed
• Fast evaluation and easy-to-program software
• Collect latency, FPS, accuracy, DDR BW and power benchmarks in minutes
In-minutes evaluation
• < 1 min to explore & compare performance: Model Selection tool
• < 5 min to experience SW & evaluate HW: TI Model Zoo examples
• < 30 min to evaluate custom models: custom model examples
• 1 hr+ to benchmark performance: TI Model Zoo examples
Use a custom model and open the same program that ran on the PC
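Latency and FPS figures of the kind mentioned above can be collected with a small timing loop around any inference call. The sketch below is a generic harness, not part of the TI cloud tooling: `infer` is a hypothetical zero-argument callable that would wrap one model invocation, and the warm-up and run counts are arbitrary defaults.

```python
import time
from statistics import mean
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time repeated calls to `infer` and report mean/max latency and FPS."""
    for _ in range(warmup):          # warm-up runs are executed but not timed
        infer()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    avg_ms = mean(samples_ms)
    return {
        "mean_ms": avg_ms,
        "max_ms": max(samples_ms),
        "fps": 1000.0 / avg_ms if avg_ms > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a real single-frame inference call.
    print(benchmark(lambda: sum(range(10_000))))
```

The same harness can run unchanged on a PC and on the target, which makes the two sets of numbers directly comparable.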
TI Model Zoo
Extensive and pre-trained | Models available ready to use
▪ TI’s Model Zoo: 60+ models to choose from
▪ Select the type of function: classification, detection or segmentation
▪ Select the runtime: TFLite, TVM or ONNX-RT
Model porting | PC to embedded device
http://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/07_03_00_07/exports/docs/tidl_j7_02_00_00_07/ti_dl/docs/user_guide_html/index.html
Model Inference on TI SoC
Deep learning evaluation with TI Model Zoo in < 5 min
• Example Jupyter Notebook & Python scripts for TFLite, ONNX-RT and Neo-AI-DLR
• Select models from TI Model Zoo, run pre-compiled artifacts and evaluate SW and HW
Bring your own model and deploy in < 30 min
• Jupyter Notebook & Python scripts for TFLite, ONNX-RT and Neo-AI-DLR
• Bring your own model, compile and deploy & get automatic acceleration
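With ONNX Runtime, "automatic acceleration" is usually expressed as an ordered list of execution providers, with the CPU as the fallback. The sketch below uses the standard `onnxruntime` calls `get_available_providers()` and `InferenceSession(...)`. The provider name `TIDLExecutionProvider`, the `artifacts_folder` option key and the file paths are assumptions for illustration and should be checked against TI's documentation before use.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: name of the accelerator provider in TI's ONNX Runtime build.
TIDL_PROVIDER = "TIDLExecutionProvider"
CPU_PROVIDER = "CPUExecutionProvider"


def choose_providers(
    available: Sequence[str], tidl_options: Dict[str, str]
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Prefer the accelerator provider when present, always keeping the CPU as fallback."""
    if TIDL_PROVIDER in available:
        return [TIDL_PROVIDER, CPU_PROVIDER], [dict(tidl_options), {}]
    return [CPU_PROVIDER], [{}]


def create_session(model_path: str, tidl_options: Dict[str, str]):
    """Create an ONNX Runtime session using whichever providers this build offers."""
    import onnxruntime as ort  # imported lazily so choose_providers works without it

    providers, provider_options = choose_providers(
        ort.get_available_providers(), tidl_options
    )
    return ort.InferenceSession(
        model_path, providers=providers, provider_options=provider_options
    )


if __name__ == "__main__":
    # Hypothetical option key and path, for illustration only.
    options = {"artifacts_folder": "./model-artifacts"}
    print(choose_providers([CPU_PROVIDER], options))
    print(choose_providers([TIDL_PROVIDER, CPU_PROVIDER], options))
```

Keeping the provider choice in a separate function means the same script runs on a PC, where only the CPU provider is present, and on the target device.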
Getting started
TDA4VM processor
• Arm® CPU core(s): 2 Arm® Cortex®-A72, 64-bit, @ 1.8 GHz
• Co-processor(s): MCU island of 2 Arm Cortex-R5F (lockstep opt), SoC main of 4 Arm Cortex-R5F (lockstep opt)
• Neural network accelerator: C7x DSP w/ MMA, 8 TOPS
• Computer vision accelerator(s): ISP, image rectification, multi-scaling, noise filtering @ 720 MP/s; depth (80 MP/s) and motion (150 MP/s)
• Decode: 4K60 H.264/H.265
• Encode: 3x 1080p30 H.264
• GPU: 100 GFLOPS
• Display: 2 DPI, 1 DSI, 1 eDP
• Ethernet MAC & PCIe: 8-port 2.5 Gb switch, 4x 2L PCIe Gen 3
• Security: cryptographic acceleration, debug security, device identity, isolation firewalls, secure boot & storage & programming, software IP protection, trusted execution environment
• Rating: Automotive & Industrial
• Operating temperature range: -40 to 125 °C
Refer to TDA4VM for full specification: https://www.ti.com/lit/gpn/tda4vm
Processor SDK | Out-of-box AI demos
Available in SDK: demo applications for deep learning & image processing
Extensive demos, available now!
▪ Image pre-processing demos (hardware accelerated)
• 8x 2 MP @ 30 fps real-time image processing, with room to process 2 more cameras
• Demonstrate RAW to RGB processing
• Image distortion correction
• Flexible programming sub-system
▪ Deep learning demos (hardware accelerated)
• Out-of-box examples for image classification, object detection and semantic segmentation
• Model zoo: 60+ pre-trained TF, PyTorch, TFLite, ONNX and MXNet models validated on TI processors
• Demonstrate simultaneous execution of multiple models: semantic segmentation (resolution 768x384) and 2x object detection (resolution 512x512)
Getting started | Add intelligence with embedded Edge AI from Texas Instruments
TI Edge AI Cloud evaluation
• Zero-cost & in-minutes evaluation of TDA4VM hardware: https://dev.ti.com/edgeai
Full development
• Product folder: https://www.ti.com/product/TDA4VM
• TDA4 EVM: http://www.ti.com/tool/TDA4VMXEVM
Software development kits
• TI Processor SDK (seamlessly reuse and migrate Linux, Linux-RT and TI-RTOS software across TI processors): http://www.ti.com/tool/PROCESSOR-SDK-DRA8X-TDA4X
Support
• https://e2e.ti.com
Please also let us know any specific topics you want us to cover in the future. More information: [email protected]
©2021 Texas Instruments Incorporated. All rights reserved.