Computer Visioncrcv.ucf.edu/courses/CAP5415/Fall2014/CVIntroductionAugust2014.pdf · Computer Vision Mubarak Shah [email protected] Computer Vision ... Playing Guitar Playing Piano

8/20/2014

1

Computer Vision

Mubarak [email protected]

Computer Vision The ability of computers to see.

Image Understanding Machine Vision Robot Vision Image Analysis Video Understanding

8/20/2014

2

A picture is worth a thousand words.

A word is worth a thousand pictures.

A HUNT

8/20/2014

3

Image 2-D array of numbers (intensity values,

gray levels) Gray levels 0 (black) to 255 (white) Color image is 3 2-D arrays of numbers

Red Green Blue

Resolution (number of rows and columns) 128X128 256X256 512X512 640X480

8/20/2014

4

Image Formats

TIF PGM PBM GIF JPEG RAW

Video Sequence of frames 30 frames per second

Formats AVI MPEG Quick Time

8/20/2014

5

Video Clip

Sequence of Images

8/20/2014

6

Image Formation Light Source Camera (extrinsic and intrinsic

parameters) Scene (Surface reflectance, Surface

shape )

Perspective Projection (Pin Hole)

(X,Y,Z)

Lens

Image Plane

image Zy

fWorldpoint

8/20/2014

7

Orthographic Projection(X,Y,Z)

Image Plane

image

y

Worldpoint

Shape from X Recover 3-D shape from 2-D image(s)

Stereo Motion Shading Texture Contours

8/20/2014

8

Stereo

http://www.vision3d.com/stereo.html

8/20/2014

9

Renault Stereo Pair

Depth Map

8/20/2014

10

Shape from Shading

Lambertian Model

S=L, light source

I=S.N

8/20/2014

11

Vase

(1, 0, 1) (-1, 1, 1) (-1,-1, 1)

Shape from Texture

8/20/2014

12

Visual Motion

Shape from Motion: Moving Light Display

8/20/2014

13

Shape from Motion

Photosynth

8/20/2014

14

Sequence Raw Optical flow

27

Video Clip & Mosaic

8/20/2014

15

Applications of Computer Vision Face Recognition Object Recognition Video Surveillance and Monitoring

Object detection, tracking and behavior analysis

Remote Sensing: UAVs Robotics Computer Graphics

Object Recognition

Finding People in imagesProblem 1: Given an image I Question: Does I contain an image of a

person?

8/20/2014

16

“Yes” Instances

“No” Instances

8/20/2014

17

Localize People (Human Detection)

Human Detection

8/20/2014

18

Airplanes

Motor Cycles

8/20/2014

19

Face Recognition

Taming Wild Faces: Web‐Scale, Open‐Universe Face Recognition

CRCV | Center for Research in Computer VisionUn ive rs i t y o f Cent ra l F lo r ida

What are wild faces?

8/20/2014

20





8/20/2014

21



350 million photos uploaded daily350 million photos uploaded daily



100 hours of movies uploaded per hour100 hours of movies uploaded per hour

60,000 movies60,000 movies

8/20/2014

22



Why are wild faces important?



Three Tasks of Face Recognition

Most Research Focus

8/20/2014

23




Most Research Focus

FERET200 people1400 faces

AR100 people1400 faces

Ext. Yale B38 people2400 faces

PubFig8383 people

>830 faces

YouTube Celebrities

47 people1910 faces




Most Research Focus Pair-Matching Task

Same Not Same

PubFig200 people58,797 faces

LFW5,749 people13,233 faces

YouTube Faces

1,595 people3,425 faces

8/20/2014

24




Most Research Focus Pair-Matching TaskScenario

Most Realistic Scenario



Open-Universe Face Identification

News Article: Label Important Figures

Social Network: Tag Facebook Friends

Barack Obama

Bob

Alice

8/20/2014

25



Open-Universe Face IdentificationFind Angelina Jolie and George Clooney



Open-Universe Face IdentificationFind Angelina Jolie and George Clooney

8/20/2014

26



Open-Universe Face Identification

Gwenyth PaltrowRobert Downey Jr.

Face Recognition in Movie Trailers via Mean Sequence Sparse Representation‐based Classification

CVPR 2013

8/20/2014

27

Recognition – Qualitative

Known:Paul RuddSteve Carell


Known:Paul RuddSteve Carell

8/20/2014

28


Known:Owen WilsonReese WitherspoonPaul Rudd


Known:Bruce WillisMorgan Freeman

8/20/2014

29


Known:Tom CruiseCameron Diaz

FACIAL EXPRESSIONS

RAISE EYE BROWS SMILE

8/20/2014

30

Detecting Driver Alertness

Lipreading

8/20/2014

31

Video Surveillance and Monitoring

Automated Surveillance System (Detection & Tracking)

Object detection Object tracking Object categorization and classification

Event or Activities Recognition

A. I Person Tracking

A. II Part Tracking

8/20/2014

32

• Part of the WAS (wide area surveillance) project executed by the Homeland Security Advanced Research Project Agency (HSARPA)

• Current Sensor• 8 high-resolution cameras• provide a 100 mega-pixel, 360°field of view• frame rate: 5 frames per second

• Next Generation• 48 cameras with significantly higher resolution• smaller size

NONA: Project Overview

Current Sensor Next Generation

NONA Sysmtem—Airport Sequence 1

8/20/2014

33

UAV: Unmanned Aerial Vehicle

UAVs: Unmanned Aerial Vehicles (Drones)

Global Hawk

Predator

Microdrone

8/20/2014

34

KINGFISHER AEROSTAT BALLOON

8/20/2014 Computer Vision Lab, UCF 67

COCOA – System Flow

Ego Motion Compensation

Feature based + Gradient Based

Motion Detection

Accumulative Frame Differencing + Background

Modeling + Object Segmentation

Object Tracking

Kernel Tracking + Blob Tracking + Occlusion Handling

Telemetry*

Aerial V

ideo

Registered Images Motion Detection Tracks

CO

CO

A

Event Detection & Indexing

8/20/2014

35

Registration Result ‐ I

Aerial Video - EO Mosaic

Alignment Mask

Registration Result ‐ II

Aerial Video - IR Mosaic

Alignment Mask

8/20/2014

36

Detection Results

Tracking Results

8/20/2014

37

Wide Area Surveillance

Wide Area Surveillance

8/20/2014

38

Tracking Results

Robot Vision (Unmanned Ground Vehicle)

UGV

8/20/2014

39

UGV

Human Action Recognition

8/20/2014

40

Events, Actions, Activities,

• Action

• Event

• Movement

• Activity

• Interaction

• Verb

• ….

• 10 actions

• 9 actors per action

Bend Jack Wave2hands Wave1hand Jumpinplace

Sidestep Hop Walk Skip Run

Weizmann Action Dataset

8/20/2014

41

KTH Data Set

• Six Categories, 25 actors, 4 instances, 600. clips

Boxing HandClapping HandWaving

Jogging Running Walking

UCF Sports Action Dataset

Bench Swing Dive Swing Run

Kick Lift Ride Golf Swing Skate

9actions,142videos.

8/20/2014

42

IXMAS Multi‐view Data Set

• 13 action categories, 4 camera views, 10 actors, 3 instances.

View1 View2 View3 View4

CheckWatch

GetUp

UCF YouTube Action Dataset (UCF‐11)

Cycling Diving GolfSwinging Riding

Juggling BasketballShooting Swinging TennisSwinging

Volleyball Spiking TrampolineJumping WalkingDog

8/20/2014

43

UCF-101

BaseballPitch BasketballShooting BikingBenchPress Billiards

Breaststroke CleanandJerk DrummingDiving Fencing

GolfSwing HighJump HorseRidingHorseRace HulaHoop

JavelinThrow JugglingBalls JumpRopeJumpingJack Kayaking

Lunges MilitaryParade NunchucksMixingBatter PizzaTossing

UCF50

8/20/2014

44

PlayingGuitar PlayingPiano PlayingViolinPlayingTabla PoleVault

PommelHorse PullUps PushUpsPunch RockClimbingIndoor

RopeClimbing Rowing SkateBoardingSalsaSpins Skiing

Skijet SoccerJuggling TaiChiSwing TennisSwing

ThrowDiscus TrampolineJumping WalkingwithadogVolleyballSpiking Yo Yo

UCF50

Microsoft Kinect sensor

• Data Captured using Microsoft Kinect sensor

• Approximately 50,000 gesture samples

RGB Camera

IR Camera 1 IR Camera 2

8/20/2014

45

Gesture Lexicons

Gestures from Depth camera

Gestures from RGB camera

Diving Signals Referee Signals Nurse Gesture Music Notes

Discovered Primitives

Representative Motion Primitives (out of 136) for different batches

Left arm moving down

Left arm moving up

Left arm moving to body

Left hand moving forward

Left arm waving up

Left arm moving away from body

Right arm moving laterally up

Right arm moving away from body

Left shoulder moving to right

Left arm moving to right


Left shoulder moving down

Right arm moving up

Left arm moving up


Right arm moving down

8/20/2014

46

Test case 1: Torso motion adds noise (devel 01– 10 gestures)

Instances of Successful Recognition

Instances of Failed Recognition



Test Case 2: Improvisations (devel 06 – 9 gestures)

8/20/2014

47

Test Case 3: Subtle differences (devel 09 – 10 gestures)



8/20/2014

48

High Density Crowded Scenes

Political Rallies Religious Festivals Marathons High Density Moving Objects

Tracking in Crowds

Average chip size 14 x 22 pixels 492 Frames Selected 199 athletes for tracking Successfully tracked 143 athletes

8/20/2014

49

Results

Experiment – 1

8/20/2014

50

Experiment-3

Average chip size 14 x 17 pixels 453 Frames Selected 50 athletes for tracking

Experiment – 3

8/20/2014

51

Behaviors in Crowded Scenes

Bottleneck Departure Lanes Arch/Rings Blockings

8/20/2014

52

Proposed Method=640Ground truth=634


8/20/2014

53



8/20/2014

54

Ground truth=2322 Proposed Method=2203


8/20/2014

55

Where Am I?

“Where Am I?” Problem:

Accurate Image Localization

Input Output

Mere Visual Information(Images) Location in Terms of λ (Lon.) and φ (Lat.)=40.4419, =-79.9986

8/20/2014

56

Qualitative Image Localization Results:

68.3%: The best match found:

QueryMatch – Error: 6.4 m

QueryQuery

Match – Error: 7.5 m Match – Error: 62.7 m


Qualitative Image Localization Results:

68.3%: The best match found:13.2%: A correct match found, but not the best one:


Match – Error: 307.7 m

Match – Error: 157.3 m

Match – Error: 71.3 mQuery

Query

Query

8/20/2014

57

GeospatialTrajectoryExtraction

Visual Business RecognitionACM Multimedia 2013

8/20/2014

58

Computer Vision for Computer Graphics

Video Completion

8/20/2014

59

Layer Based Video Composition

Results of Doll

8/20/2014

60

Results of Mom-Daughter

Multimedia: Segmentation of Moving-Sounding Objects

120

Accepted in IEEE Transactions on Multimedia

8/20/2014

61

Computer Vision Text Books

8/20/2014

62

Computer Vision Researchers

8/20/2014

63

Late Azriel Rosenfeld (Mayrland)

Berthold Horn (MIT)

8/20/2014

64

Thomas Huang (UIUC)

Ruzena Bajcsy

8/20/2014

65

Jake Aggarwal (UT Austin)

Chris Brown (Rochester)

8/20/2014

66

Bob Haralick (CUNY)

Olivier Faugeras (INRIA, France)

8/20/2014

67

Takeo Kanade (CMU)

Sandy Pentland (MIT)

8/20/2014

68

Shree Nayar (Columbia)

John Canny (Berkeley)

8/20/2014

69

Demetri Terzopoulos (UCLA)

Ramesh Jain (UCI)

8/20/2014

70

Jitendra Malik (Berkeley)

Andrew Zisserman (Oxford)

8/20/2014

71

Andrew Blake (Microsoft)

Rama Chellappa

8/20/2014

72

Anil Jain

David Forsyth

8/20/2014

73

Rick Szeliski (Microsof)

David Lowe

8/20/2014

74

Cordelia Schmidt

Computer Vision Journals

8/20/2014

75

8/20/2014

76

8/20/2014

77

Computer Vision Conferences

8/20/2014

78

IEEE Conference on Computer Vision and Pattern Recognition

(CVPR)• June 24-27, 2014

• Greater Columbus Convention Center in Columbus, Ohio.

8/20/2014

79

8/20/2014

80

International Conference on Computer Vision (ICCV)

8/20/2014

81

European Conference on Computer Vision (ECCV)

International Conference on Pattern Recognition (ICPR)

8/20/2014

82

Asian Conference on Computer Vision (ACCV)

International Conference on Image Processing

8/20/2014

83

Documents

Computer Visioncrcv.ucf.edu/courses/CAP5415/Fall2014/CVIntroductionAugust2014.pdf · Computer Vision Mubarak Shah [email protected] Computer Vision ... Playing Guitar Playing Piano