TRACKING-LEARNING-DETECTION

V. H. AYMA

Laboratório de Visão Computacional Departamento de Engenharia Elétrica

PUC – Rio

CONTENT

1. INTRODUCTION

2. TLD FRAMEWORK

3. TRACKING

4. DETECTION

5. LEARNING

INTRODUCTION

• The TLD framework [1] was proposed by Zdenek Kalal, Krystian Mikolajczyk and Jiri Matas. It builds on two earlier results by the same authors:

– P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints [3].

– Forward-Backward Error: Automatic Detection of Tracking Failures [2].

1) Z. Kalal, K. Mikolajczyk and J. Matas, “Tracking-Learning-Detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

2) Z. Kalal, K. Mikolajczyk and J. Matas, “Forward-Backward Error: Automatic Detection of Tracking Failures”, International Conference on Pattern Recognition, 2010, pp. 23-26.

3) Z. Kalal, J. Matas and K. Mikolajczyk, “P-N Learning: Bootstrapping Binary Classifiers by Structural Constraints”, Conference on Computer Vision and Pattern Recognition, 2010.

INTRODUCTION LONG TERM TRACKING

The goal is to determine the object’s bounding box, or to indicate that the object is not visible, in each of the frames that follow.

Long-term tracking algorithms must:

• Detect the object when it reappears in the camera’s field of view.

– The object may change its appearance while it is absent, so its initial appearance may no longer be relevant.

• Handle:

– Scale variations.

– Illumination variations.

– Occlusions.

– Background clutter.

• Operate at frame rate (real time).

TRACKING (Estimates location)

+ Requires initialization

+ Produces smooth trajectories

+ Reasonably fast

‒ Accumulates errors while tracking (drift)

‒ Fails when the object disappears

‒ Has no post-failure recovery mechanism

DETECTION (Finds the best match)

‒ Requires off-line training

‒ Works only on a known object

‒ Produces discrete rather than smooth trajectories

‒ Computationally expensive

+ Doesn’t drift

+ Doesn’t fail if the object disappears

LONG-TERM TRACKING

• Interaction between tracking and detection may benefit long-term tracking.

• How reliable is the training data?

[Diagram: the TRACKER (estimates location) provides TRAINING DATA to the DETECTOR (finds the best match), and the DETECTOR can REINITIALIZE the TRACKER.]

• Long-term tracking can be decomposed into:

• Tracking: follows the object from frame to frame.

• Detection: localizes the appearances that have been observed so far.

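• Learning: estimates the detector’s errors and updates the detector to avoid them. The learning component:

– Should deal with an arbitrarily complex video stream.

– Must not degrade the detector with irrelevant information.

– Must operate in real time.

[Diagram: TRACKING, DETECTION and LEARNING blocks. Detection corrects the tracker; learning updates the detector.]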

TLD FRAMEWORK

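[Block diagram of the three TLD components. Tracking sends fragments of trajectory to Learning, Detection sends detections to Learning, Learning sends training data to Detection, and Detection sends re-initialization to Tracking.]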

TLD FRAMEWORK ADAPTIVE TRACKING

• Adaptive trackers eventually fail because background information is inserted into the object model, a problem commonly known as drift.

• Idea: recognize tracking failures and update the model only when the tracking is correct.

• Which of these points was tracked correctly?

• Measure the similarity between patches around the points.

– Tracking algorithms typically embed the Normalized Cross Correlation (NCC) or the Sum of Squared Errors between patches for this purpose.

• Or, compute the Forward-Backward Error.

[Figure: two points, 1 and 2, tracked from frame t to frame t+1. Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.]

FORWARD-BACKWARD ERROR:

• Track points from frame t to frame t + 1, i.e., Forward tracking.

• Track points from frame t + 1 to frame t, i.e., Backward tracking.

• Compute the error (distance) between each initial point and its backward-tracked point.

– A small error indicates a correctly tracked point.

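[Figure: forward and backward trajectories of the two points between frame t and frame t+1, with the resulting errors error1 and error2. Adapted from: Predator: A visual tracker that learns from its errors, Google Tech Talks.]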
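The three steps above can be sketched in a few lines of Python. This is only an illustration: `track_forward` and `track_backward` are hypothetical callables standing in for whatever point tracker is used, and the toy trackers at the bottom are invented to show one good point and one failed point.

```python
import math

def forward_backward_error(points, track_forward, track_backward):
    """Return one forward-backward error per input point.

    points         -- list of (x, y) positions in frame t
    track_forward  -- hypothetical callable: point in frame t   -> point in frame t+1
    track_backward -- hypothetical callable: point in frame t+1 -> point in frame t
    """
    errors = []
    for p in points:
        q = track_forward(p)                 # forward tracking: t -> t+1
        p_back = track_backward(q)           # backward tracking: t+1 -> t
        errors.append(math.dist(p, p_back))  # Euclidean distance to the start point
    return errors


# Toy trackers (assumptions for the demo): the true motion is a shift of (+3, +1);
# the backward tracker drifts for points that land at x > 10 in frame t+1.
def toy_forward(p):
    return (p[0] + 3.0, p[1] + 1.0)

def toy_backward(q):
    drift = 4.0 if q[0] > 10.0 else 0.0
    return (q[0] - 3.0 + drift, q[1] - 1.0)

fb_errors = forward_backward_error([(1.0, 1.0), (9.0, 2.0)], toy_forward, toy_backward)
```

The first point returns exactly to its start (small error, tracked correctly), while the second one does not (large error, tracking failure).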

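MEDIAN FLOW TRACKER

1. Initialize a grid of points inside the bounding box in frame t.

2. Track the points between frame t and frame t+1.

3. Estimate the reliability of each point.

4. Filter out the 50% least reliable points (outliers).

5. Estimate the bounding box in frame t+1 from the remaining points.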
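A minimal sketch of the filtering and bounding-box steps follows, assuming the points have already been tracked and that a per-point reliability score (for example the forward-backward error) is available. The function name, the box layout `(x, y, w, h)` and the choice to scale the box about its center are assumptions made for this illustration.

```python
import math
from statistics import median

def median_flow_step(box, pts_t, pts_t1, errors):
    """Move box (x, y, w, h) from frame t to frame t+1.

    pts_t, pts_t1 -- matched point lists [(x, y), ...] in frames t and t+1
    errors        -- one error per point (lower means more reliable)
    """
    # Keep points whose error is at or below the median (drops about 50%).
    cutoff = median(errors)
    kept = [(p, q) for p, q, e in zip(pts_t, pts_t1, errors) if e <= cutoff]

    # Translation: median displacement in each dimension.
    dx = median(q[0] - p[0] for p, q in kept)
    dy = median(q[1] - p[1] for p, q in kept)

    # Scale: median ratio of pairwise point distances after / before.
    ratios = []
    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            d_old = math.dist(kept[i][0], kept[j][0])
            d_new = math.dist(kept[i][1], kept[j][1])
            if d_old > 0:
                ratios.append(d_new / d_old)
    s = median(ratios) if ratios else 1.0

    # Shift the box center and rescale the box around it.
    x, y, w, h = box
    cx, cy = x + w / 2 + dx, y + h / 2 + dy
    return (cx - w * s / 2, cy - h * s / 2, w * s, h * s)


pts_t = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
pts_t1 = [(2.0, 3.0), (12.0, 3.0), (2.0, 13.0), (50.0, 50.0)]  # last point is an outlier
new_box = median_flow_step((0.0, 0.0, 10.0, 10.0), pts_t, pts_t1, [0.1, 0.2, 0.1, 9.0])
```

Using medians makes the estimate insensitive to the badly tracked point, which is also removed by the reliability filter.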

TLD FRAMEWORK DETECTION

• The goal is to discriminate the object from the background, i.e., model the appearance of the object.

• Let M be the object model, so that:

$$M = \{p_1^+, p_2^+, \ldots, p_m^+, p_1^-, p_2^-, \ldots, p_n^-\}$$

Where,

– p+, represents the object patches.

– p‒, are the background patches.

• New image patches can be classified as belonging to the object or the background using a Nearest Neighbor Classifier (NNC).

NEAREST NEIGHBOR CLASSIFIER

• Use a relative similarity, S^r, to classify a new patch p (blue point), formally:

$$S^r(p, M) = \frac{S^+(p, M)}{S^+(p, M) + S^-(p, M)}$$

Where,

– S+(p, M), is the similarity with the positive nearest neighbor.

– S–(p, M), is the similarity with the negative nearest neighbor.

[Figure: red and black points represent the object and background patches, respectively, in a d-dimensional space; the blue point is the new patch.]

• Similarity between patches, S(p_i, p_j), is given by:

$$S(p_i, p_j) = 0.5\,\bigl(\mathrm{NCC}(p_i, p_j) + 1\bigr)$$

Where,

– NCC, is the Normalized Cross Correlation between patches p_i and p_j.
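The patch similarity and the relative similarity can be sketched as below. Patches are represented as flat lists of gray values, the NCC is taken in its zero-mean form, and the decision threshold in `is_object` is a hypothetical parameter; all three are assumptions of this sketch rather than details given on the slides.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length pixel lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0

def similarity(a, b):
    """S(p_i, p_j) = 0.5 * (NCC(p_i, p_j) + 1), which lies in [0, 1]."""
    return 0.5 * (ncc(a, b) + 1.0)

def relative_similarity(patch, positives, negatives):
    """S^r = S^+ / (S^+ + S^-), using the nearest neighbor of each class."""
    s_pos = max(similarity(patch, p) for p in positives)
    s_neg = max(similarity(patch, n) for n in negatives)
    total = s_pos + s_neg
    return s_pos / total if total > 0 else 0.0

def is_object(patch, positives, negatives, threshold=0.5):
    # threshold is an assumed value for illustration only
    return relative_similarity(patch, positives, negatives) > threshold


model_pos = [[1, 2, 3, 4]]   # object patches p+
model_neg = [[4, 3, 2, 1]]   # background patches p-
sr = relative_similarity([2, 4, 6, 8], model_pos, model_neg)
```

Here the new patch is a brighter copy of the positive example, so its NCC with the positive is 1 and with the negative is -1, which drives the relative similarity to its maximum.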

• Candidate image patches are generated by shifting and scaling the initial bounding box over the image. For example, for a QVGA image (240x320), the following parameters produce around 50K bounding boxes:

– Scale step = 1.2,

– Horizontal step = 10% of the object’s width,

– Vertical step = 10% of the object’s height,

– Minimal bounding box size = 20 pixels.
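A rough sketch of how such a scanning grid could be generated is shown below. The set of scale exponents (`range(-10, 11)`), the rounding, and the use of the scaled box size for the step are assumptions of this sketch, so the number of boxes it returns will not necessarily match the figure quoted above, which depends on the initial bounding box.

```python
def scanning_grid(img_w, img_h, box_w, box_h, scale_step=1.2,
                  shift=0.1, min_size=20, scale_exponents=range(-10, 11)):
    """Return candidate boxes (x, y, w, h) that lie fully inside the image."""
    boxes = []
    for k in scale_exponents:                 # assumed range of scales
        w = round(box_w * scale_step ** k)
        h = round(box_h * scale_step ** k)
        if min(w, h) < min_size or w > img_w or h > img_h:
            continue                          # too small or larger than the image
        step_x = max(1, round(shift * w))     # horizontal step: 10% of the width
        step_y = max(1, round(shift * h))     # vertical step: 10% of the height
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                boxes.append((x, y, w, h))
    return boxes


# QVGA frame (320 wide, 240 high) with a hypothetical 40x30 initial box.
grid = scanning_grid(320, 240, 40, 30)
```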

• Evaluating the NNC on every one of these patches is infeasible, because each evaluation requires computing the relative similarity against the object model.

USE A CASCADE OF CLASSIFIERS INSTEAD!

CASCADE OF CLASSIFIERS

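[Diagram: each patch passes through three stages in sequence: PATCH VARIANCE, ENSEMBLE CLASSIFIER, NN CLASSIFIER. Every stage discards its REJECTED PATCHES; the ACCEPTED PATCHES leave the last stage.]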

• Patch variance: rejects patches whose gray-value variance is smaller than 50% of the variance of the initial bounding box.
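As a small illustration of this first stage, the variance test might look as follows. Patches are again flat lists of gray values, and the function names are made up for this sketch.

```python
def variance(pixels):
    """Population variance of a flat list of gray values."""
    mean = sum(pixels) / len(pixels)
    return sum((v - mean) ** 2 for v in pixels) / len(pixels)

def passes_variance_filter(patch, reference_patch, ratio=0.5):
    """Keep a patch only if its variance reaches ratio * variance of the reference."""
    return variance(patch) >= ratio * variance(reference_patch)


reference = [0, 10, 0, 10]   # initial bounding box, variance 25
flat = [5, 5, 5, 5]          # uniform background, variance 0
textured = [0, 8, 0, 8]      # variance 16, above the 12.5 cut-off
```

Uniform regions such as sky or walls are rejected at almost no cost before the more expensive stages run.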

• An Ensemble Classifier consists of n base classifiers.

• Each base classifier, c_i, performs a number of pixel comparisons, f_k, on each patch, resulting in a binary code, x_i.

• Consider an image patch, p, and an Ensemble Classifier that has a single base classifier performing four pixel comparisons, f_k, so that:

$$f_k = \begin{cases} 1, & p(\mathrm{position}_{k,\mathrm{red}}) > p(\mathrm{position}_{k,\mathrm{yellow}}) \\ 0, & \text{otherwise} \end{cases}$$

For the patch in the example: f1 = 0, f2 = 1, f3 = 0, f4 = 1.

• Finally, the binary code is x = 0101 (binary), which is equal to 5.
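The way the four comparisons turn into the code 5 can be sketched as follows. The patch values and pixel positions are invented for the demo, and the direction of the comparison (first pixel brighter than the second gives 1) is an assumption of this sketch.

```python
def binary_code(patch, comparisons):
    """patch: 2-D list of gray values; comparisons: list of ((r1, c1), (r2, c2))."""
    code = 0
    for (r1, c1), (r2, c2) in comparisons:
        bit = 1 if patch[r1][c1] > patch[r2][c2] else 0
        code = (code << 1) | bit   # f1 ends up as the most significant bit
    return code


patch = [[10, 20],
         [30, 40]]
comparisons = [((0, 0), (0, 1)),   # f1: 10 > 20 is false -> 0
               ((1, 0), (0, 1)),   # f2: 30 > 20 is true  -> 1
               ((0, 1), (1, 1)),   # f3: 20 > 40 is false -> 0
               ((1, 1), (0, 0))]   # f4: 40 > 10 is true  -> 1
code = binary_code(patch, comparisons)   # bits 0101
```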

• An image patch is classified as “object” if the mean of the posterior probabilities:

$$\bar{P} = \frac{1}{n} \sum_{i=1}^{n} P_i(y \mid x_i)$$

is greater than 0.5.

• The posterior probability for the i-th base classifier is given by:

$$P_i(y \mid x_i) = \frac{n^+(x_i)}{n^+(x_i) + n^-(x_i)}$$

Where n+(x_i) and n‒(x_i) correspond to the number of positive and negative patches, respectively, that were assigned the same binary code.

• For example, consider a trained base classifier in which the code x_1 = 5 was assigned to 6 positive and 4 negative patches; then:

$$P_1(y \mid x_1 = 5) = \frac{6}{6 + 4} = \frac{6}{10} = 0.6$$
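The posterior and the ensemble decision can be sketched as below. The per-classifier count tables, the function names and the convention that an unseen code gets a posterior of zero are assumptions of this illustration.

```python
def posterior(n_pos, n_neg):
    """P_i(y | x_i) = n+ / (n+ + n-); zero when the code was never seen (assumed)."""
    total = n_pos + n_neg
    return n_pos / total if total > 0 else 0.0

def ensemble_is_object(codes, counts, threshold=0.5):
    """codes[i] is the binary code produced by base classifier i;
    counts[i] maps a code to (n_pos, n_neg) collected during training."""
    probs = [posterior(*counts[i].get(x, (0, 0))) for i, x in enumerate(codes)]
    return sum(probs) / len(probs) > threshold


# Classifier 0 saw code 5 on 6 positives and 4 negatives (the example above);
# classifier 1 saw code 3 on 1 positive and 3 negatives (invented numbers).
counts = [{5: (6, 4)}, {3: (1, 3)}]
```

With only the first classifier the mean posterior is 0.6 and the patch is accepted; adding the second one lowers the mean to 0.425 and the patch is rejected.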

TLD FRAMEWORK LEARNING

• Use online learning techniques to improve the performance of the detector.

• Online learning is challenging…

– Lack of training data

– Unlabeled data

– Requires real time processing.

• Assuming that these challenges are overcome…

– Identify the errors committed by the detector in order to update it and avoid these errors in the future.

– Two kinds of experts, P-experts and N-experts, are responsible for identifying these errors.

– The detector can commit two kinds of errors: missed detections and false alarms.

– The N-expert exploits the spatial structure in the data to identify false positives (false alarms).

– The P-expert exploits the temporal structure in the data to identify false negatives (missed detections).
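One simplified way to picture the two experts is sketched below. It assumes that a trusted object location for the current frame is already available (for instance from a reliable trajectory) and that the object occupies a single location, so confident detections far from it must be false alarms. The overlap thresholds `near` and `far` and all names are hypothetical.

```python
def overlap(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def pn_experts(object_box, detections, near=0.6, far=0.2):
    """detections: list of (box, detector_says_object) pairs.
    Returns (new_positives, new_negatives) to add to the training set."""
    new_pos, new_neg = [], []
    for box, says_object in detections:
        ov = overlap(object_box, box)
        if ov >= near and not says_object:
            new_pos.append(box)   # P-expert: missed detection on the trajectory
        elif ov <= far and says_object:
            new_neg.append(box)   # N-expert: false alarm away from the object
    return new_pos, new_neg


trusted = (10, 10, 20, 20)
detections = [((10, 10, 20, 20), False),   # on the object but rejected
              ((100, 100, 20, 20), True),  # far away but accepted
              ((11, 10, 20, 20), True)]    # on the object and accepted
pos, neg = pn_experts(trusted, detections)
```

Only the patches on which the detector disagrees with the experts are returned, which keeps the training set focused on the detector's actual errors.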

PN – LEARNING: INITIAL LEARNING

[Diagram: the labeled data forms the Training Set; Supervised Training on this set produces the classifier parameters for the Classifier.]

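PN – LEARNING

[Diagram: the Classifier assigns labels to the unlabeled data. The P-N Experts check these labels and add positive (+) and negative (‒) examples to the Training Set, which also holds the initial labeled data. Supervised Training on the enlarged set updates the classifier parameters.]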
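The loop in the diagram can be sketched generically as below. The `train` and `experts` arguments are hypothetical callables, and the toy threshold classifier and toy experts exist only to make the example runnable; real P-N experts rely on structural constraints rather than on knowing the true labels.

```python
def pn_learning(labeled, unlabeled, train, experts, rounds=3):
    """labeled: list of (sample, label); unlabeled: list of samples.
    train(training_set) -> classifier, where classifier(sample) -> bool.
    experts(samples, labels) -> list of corrected (sample, label) pairs."""
    training_set = list(labeled)
    classifier = train(training_set)              # initial supervised training
    for _ in range(rounds):
        labels = [classifier(s) for s in unlabeled]
        corrections = experts(unlabeled, labels)  # [+] and [-] examples
        if not corrections:
            break                                 # experts found no errors
        training_set.extend(corrections)
        classifier = train(training_set)          # update classifier parameters
    return classifier


# Toy 1-D problem: the classifier is a threshold halfway between the class means.
def train_threshold(training_set):
    pos = [s for s, y in training_set if y]
    neg = [s for s, y in training_set if not y]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda s: s > t

# Toy experts: a stand-in constraint stating that samples above 5 are the object.
def toy_experts(samples, labels):
    return [(s, s > 5) for s, y in zip(samples, labels) if y != (s > 5)]

clf = pn_learning([(9.0, True), (3.0, False)], [5.5, 6.0, 4.0, 3.0],
                  train_threshold, toy_experts)
```

The initial threshold of 6.0 misses the samples 5.5 and 6.0; after the experts add them as positive examples, the retrained threshold drops below 5 and both are classified correctly.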