Upload
buithien
View
214
Download
0
Embed Size (px)
Citation preview
8/20/2014
1
Computer Vision
Mubarak [email protected]
Computer Vision The ability of computers to see.
Image Understanding Machine Vision Robot Vision Image Analysis Video Understanding
8/20/2014
2
A picture is worth a thousand words.
A word is worth a thousand pictures.
A HUNT
8/20/2014
3
Image 2-D array of numbers (intensity values,
gray levels) Gray levels 0 (black) to 255 (white) Color image is 3 2-D arrays of numbers
Red Green Blue
Resolution (number of rows and columns) 128X128 256X256 512X512 640X480
8/20/2014
4
Image Formats
TIF PGM PBM GIF JPEG RAW
Video Sequence of frames 30 frames per second
Formats AVI MPEG Quick Time
8/20/2014
5
Video Clip
Sequence of Images
8/20/2014
6
Image Formation Light Source Camera (extrinsic and intrinsic
parameters) Scene (Surface reflectance, Surface
shape )
Perspective Projection (Pin Hole)
(X,Y,Z)
Lens
Image Plane
image Zy
fWorldpoint
8/20/2014
7
Orthographic Projection(X,Y,Z)
Image Plane
image
y
Worldpoint
Shape from X Recover 3-D shape from 2-D image(s)
Stereo Motion Shading Texture Contours
8/20/2014
8
Stereo
http://www.vision3d.com/stereo.html
8/20/2014
9
Renault Stereo Pair
Depth Map
8/20/2014
10
Shape from Shading
Lambertian Model
S=L, light source
I=S.N
8/20/2014
11
Vase
(1, 0, 1) (-1, 1, 1) (-1,-1, 1)
Shape from Texture
8/20/2014
12
Visual Motion
Shape from Motion: Moving Light Display
8/20/2014
13
Shape from Motion
Photosynth
8/20/2014
14
Sequence Raw Optical flow
27
Video Clip & Mosaic
8/20/2014
15
Applications of Computer Vision Face Recognition Object Recognition Video Surveillance and Monitoring
Object detection, tracking and behavior analysis
Remote Sensing: UAVs Robotics Computer Graphics
Object Recognition
Finding People in imagesProblem 1: Given an image I Question: Does I contain an image of a
person?
8/20/2014
16
“Yes” Instances
“No” Instances
8/20/2014
17
Localize People (Human Detection)
Human Detection
8/20/2014
18
Airplanes
Motor Cycles
8/20/2014
19
Face Recognition
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
What are wild faces?
8/20/2014
20
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
8/20/2014
21
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
350 million photos uploaded daily350 million photos uploaded daily
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
100 hours of movies uploaded per hour100 hours of movies uploaded per hour
60,000 movies60,000 movies
8/20/2014
22
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Why are wild faces important?
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Three Tasks of Face Recognition
Most Research Focus
8/20/2014
23
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Three Tasks of Face Recognition
Most Research Focus
FERET200 people1400 faces
AR100 people1400 faces
Ext. Yale B38 people2400 faces
PubFig8383 people
>830 faces
YouTube Celebrities
47 people1910 faces
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Three Tasks of Face Recognition
Most Research Focus Pair-Matching Task
Same Not Same
PubFig200 people58,797 faces
LFW5,749 people13,233 faces
YouTube Faces
1,595 people3,425 faces
8/20/2014
24
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Three Tasks of Face Recognition
Most Research Focus Pair-Matching TaskScenario
Most Realistic Scenario
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Open-Universe Face Identification
News Article: Label Important Figures
Social Network: Tag Facebook Friends
Barack Obama
Bob
Alice
8/20/2014
25
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Open-Universe Face IdentificationFind Angelina Jolie and George Clooney
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Open-Universe Face IdentificationFind Angelina Jolie and George Clooney
8/20/2014
26
Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition
CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida
Open-Universe Face Identification
Gwenyth PaltrowRobert Downey Jr.
Face Recognition in Movie Trailers via Mean Sequence Sparse Representation‐based Classification
CVPR 2013
8/20/2014
27
Recognition – Qualitative
Known:Paul RuddSteve Carell
Recognition – Qualitative
Known:Paul RuddSteve Carell
8/20/2014
28
Recognition – Qualitative
Known:Owen WilsonReese WitherspoonPaul Rudd
Recognition – Qualitative
Known:Bruce WillisMorgan Freeman
8/20/2014
29
Recognition – Qualitative
Known:Tom CruiseCameron Diaz
FACIAL EXPRESSIONS
RAISE EYE BROWS SMILE
8/20/2014
30
Detecting Driver Alertness
Lipreading
8/20/2014
31
Video Surveillance and Monitoring
Automated Surveillance System (Detection & Tracking)
Object detection Object tracking Object categorization and classification
Event or Activities Recognition
A. I Person Tracking
A. II Part Tracking
8/20/2014
32
• Part of the WAS (wide area surveillance) project executed by the Homeland Security Advanced Research Project Agency (HSARPA)
• Current Sensor• 8 high-resolution cameras• provide a 100 mega-pixel, 360°field of view• frame rate: 5 frames per second
• Next Generation• 48 cameras with significantly higher resolution• smaller size
NONA: Project Overview
Current Sensor Next Generation
NONA Sysmtem—Airport Sequence 1
8/20/2014
33
UAV: Unmanned Aerial Vehicle
UAVs: Unmanned Aerial Vehicles (Drones)
Global Hawk
Predator
Microdrone
8/20/2014
34
KINGFISHER AEROSTAT BALLOON
8/20/2014 Computer Vision Lab, UCF 67
COCOA – System Flow
Ego Motion Compensation
Feature based + Gradient Based
Motion Detection
Accumulative Frame Differencing + Background
Modeling + Object Segmentation
Object Tracking
Kernel Tracking + Blob Tracking + Occlusion Handling
Telemetry*
Aerial V
ideo
Registered Images Motion Detection Tracks
CO
CO
A
Event Detection & Indexing
8/20/2014
35
Registration Result ‐ I
Aerial Video - EO Mosaic
Alignment Mask
Registration Result ‐ II
Aerial Video - IR Mosaic
Alignment Mask
8/20/2014
36
Detection Results
Tracking Results
8/20/2014
37
Wide Area Surveillance
Wide Area Surveillance
8/20/2014
38
Tracking Results
Robot Vision (Unmanned Ground Vehicle)
UGV
8/20/2014
39
UGV
Human Action Recognition
8/20/2014
40
Events, Actions, Activities,
• Action
• Event
• Movement
• Activity
• Interaction
• Verb
• ….
• 10 actions
• 9 actors per action
Bend Jack Wave2hands Wave1hand Jumpinplace
Sidestep Hop Walk Skip Run
Weizmann Action Dataset
8/20/2014
41
KTH Data Set
• Six Categories, 25 actors, 4 instances, 600. clips
Boxing HandClapping HandWaving
Jogging Running Walking
UCF Sports Action Dataset
Bench Swing Dive Swing Run
Kick Lift Ride Golf Swing Skate
9actions,142videos.
8/20/2014
42
IXMAS Multi‐view Data Set
• 13 action categories, 4 camera views, 10 actors, 3 instances.
View1 View2 View3 View4
CheckWatch
GetUp
UCF YouTube Action Dataset (UCF‐11)
Cycling Diving GolfSwinging Riding
Juggling BasketballShooting Swinging TennisSwinging
Volleyball Spiking TrampolineJumping WalkingDog
8/20/2014
43
UCF-101
BaseballPitch BasketballShooting BikingBenchPress Billiards
Breaststroke CleanandJerk DrummingDiving Fencing
GolfSwing HighJump HorseRidingHorseRace HulaHoop
JavelinThrow JugglingBalls JumpRopeJumpingJack Kayaking
Lunges MilitaryParade NunchucksMixingBatter PizzaTossing
UCF50
8/20/2014
44
PlayingGuitar PlayingPiano PlayingViolinPlayingTabla PoleVault
PommelHorse PullUps PushUpsPunch RockClimbingIndoor
RopeClimbing Rowing SkateBoardingSalsaSpins Skiing
Skijet SoccerJuggling TaiChiSwing TennisSwing
ThrowDiscus TrampolineJumping WalkingwithadogVolleyballSpiking Yo Yo
UCF50
Microsoft Kinect sensor
• Data Captured using Microsoft Kinect sensor
• Approximately 50,000 gesture samples
RGB Camera
IR Camera 1 IR Camera 2
8/20/2014
45
Gesture Lexicons
Gestures from Depth camera
Gestures from RGB camera
Diving Signals Referee Signals Nurse Gesture Music Notes
Discovered Primitives
Representative Motion Primitives (out of 136) for different batches
Left arm moving down
Left arm moving up
Left arm moving to body
Left hand moving forward
Left arm waving up
Left arm moving away from body
Right arm moving laterally up
Right arm moving away from body
Left shoulder moving to right
Left arm moving to right
Left arm moving down
Left shoulder moving down
Right arm moving up
Left arm moving up
Left arm moving down
Right arm moving down
8/20/2014
46
Test case 1: Torso motion adds noise (devel 01– 10 gestures)
Instances of Successful Recognition
Instances of Failed Recognition
Instances of Successful Recognition
Instances of Failed Recognition
Test Case 2: Improvisations (devel 06 – 9 gestures)
8/20/2014
47
Test Case 3: Subtle differences (devel 09 – 10 gestures)
Instances of Successful Recognition
Instances of Failed Recognition
8/20/2014
48
High Density Crowded Scenes
Political Rallies Religious Festivals Marathons High Density Moving Objects
Tracking in Crowds
Average chip size 14 x 22 pixels 492 Frames Selected 199 athletes for tracking Successfully tracked 143 athletes
8/20/2014
49
Results
Experiment – 1
8/20/2014
50
Experiment-3
Average chip size 14 x 17 pixels 453 Frames Selected 50 athletes for tracking
Experiment – 3
8/20/2014
51
Behaviors in Crowded Scenes
Bottleneck Departure Lanes Arch/Rings Blockings
8/20/2014
52
Proposed Method=640Ground truth=634
Proposed Method=1590Ground truth=1567
8/20/2014
53
Proposed Method=1468Ground truth=1428
Proposed Method=673Ground truth=653
8/20/2014
54
Ground truth=2322 Proposed Method=2203
Proposed Method=2496Ground truth=2319
8/20/2014
55
Where Am I?
“Where Am I?” Problem:
Accurate Image Localization
Input Output
Mere Visual Information(Images) Location in Terms of λ (Lon.) and φ (Lat.)=40.4419, =-79.9986
8/20/2014
56
Qualitative Image Localization Results:
68.3%: The best match found:
QueryMatch – Error: 6.4 m
QueryQuery
Match – Error: 7.5 m Match – Error: 62.7 m
QueryMatch – Error: 9.01 m
Qualitative Image Localization Results:
68.3%: The best match found:13.2%: A correct match found, but not the best one:
QueryMatch – Error: 244.7 m
Match – Error: 307.7 m
Match – Error: 157.3 m
Match – Error: 71.3 mQuery
Query
Query
8/20/2014
57
GeospatialTrajectoryExtraction
Visual Business RecognitionACM Multimedia 2013
8/20/2014
58
Computer Vision for Computer Graphics
Video Completion
8/20/2014
59
Layer Based Video Composition
Results of Doll
8/20/2014
60
Results of Mom-Daughter
Multimedia: Segmentation of Moving-Sounding Objects
120
Accepted in IEEE Transactions on Multimedia
8/20/2014
61
Computer Vision Text Books
8/20/2014
62
Computer Vision Researchers
8/20/2014
63
Late Azriel Rosenfeld (Mayrland)
Berthold Horn (MIT)
8/20/2014
64
Thomas Huang (UIUC)
Ruzena Bajcsy
8/20/2014
65
Jake Aggarwal (UT Austin)
Chris Brown (Rochester)
8/20/2014
66
Bob Haralick (CUNY)
Olivier Faugeras (INRIA, France)
8/20/2014
67
Takeo Kanade (CMU)
Sandy Pentland (MIT)
8/20/2014
68
Shree Nayar (Columbia)
John Canny (Berkeley)
8/20/2014
69
Demetri Terzopoulos (UCLA)
Ramesh Jain (UCI)
8/20/2014
70
Jitendra Malik (Berkeley)
Andrew Zisserman (Oxford)
8/20/2014
71
Andrew Blake (Microsoft)
Rama Chellappa
8/20/2014
72
Anil Jain
David Forsyth
8/20/2014
73
Rick Szeliski (Microsof)
David Lowe
8/20/2014
74
Cordelia Schmidt
Computer Vision Journals
8/20/2014
75
8/20/2014
76
8/20/2014
77
Computer Vision Conferences
8/20/2014
78
IEEE Conference on Computer Vision and Pattern Recognition
(CVPR)• June 24-27, 2014
• Greater Columbus Convention Center in Columbus, Ohio.
8/20/2014
79
8/20/2014
80
International Conference on Computer Vision (ICCV)
8/20/2014
81
European Conference on Computer Vision (ECCV)
International Conference on Pattern Recognition (ICPR)
8/20/2014
82
Asian Conference on Computer Vision (ACCV)
International Conference on Image Processing
8/20/2014
83