Image and video analysis with local binary pattern variants

Matti Pietikäinen, Guoying Zhao

Center for Machine Vision Research

University of Oulu, Finland

http://www.cse.oulu.fi/CMV

Part 1: Introduction to local binary patterns in spatial and spatiotemporal domains

Texture is an important characteristic of images and videos

LBP in spatial domain

2-D surface texture is a two-dimensional phenomenon characterized by:

• spatial structure (pattern)

• contrast (‘amount’ of texture)

Transformation   Pattern     Contrast
Gray scale       no effect   affects
Rotation         affects     no effect
Zoom in/out      affects     ?

Thus,

1) contrast is of no interest in gray scale invariant analysis

2) often we need a gray scale and rotation invariant pattern measure

Local Binary Pattern and Contrast operators

An example of computing LBP and C in a 3x3 neighborhood:

example      thresholded      weights
6 5 2        1 0 0              1   2   4
7 6 1        1   0            128       8
9 8 7        1 1 1             64  32  16

Pattern = 11110001

LBP = 1 + 16 + 32 + 64 + 128 = 241

C = (6+7+8+9+7)/5 - (5+2+1)/3 = 4.7
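The 3x3 example can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the weight layout (clockwise from the top-left corner) is inferred as the one that reproduces LBP = 241, and returning C = 0 when one of the two groups is empty is an assumption.

```python
# Weights laid out clockwise from the top-left corner (inferred layout).
WEIGHTS = [[1, 2, 4],
           [128, 0, 8],
           [64, 32, 16]]

def lbp_and_contrast(block):
    """Return (LBP code, contrast C) for a 3x3 block of gray values."""
    center = block[1][1]
    code = 0
    above, below = [], []
    for r in range(3):
        for c in range(3):
            if (r, c) == (1, 1):
                continue
            if block[r][c] >= center:      # thresholded value 1
                code += WEIGHTS[r][c]
                above.append(block[r][c])
            else:                          # thresholded value 0
                below.append(block[r][c])
    # C: mean of the neighbors thresholded to 1 minus mean of those thresholded to 0.
    # Assumption: C is 0 when either group is empty.
    if not above or not below:
        return code, 0.0
    return code, sum(above) / len(above) - sum(below) / len(below)

example = [[6, 5, 2],
           [7, 6, 1],
           [9, 8, 7]]
code, contrast = lbp_and_contrast(example)
print(code, format(code, "08b"), round(contrast, 1))  # 241 11110001 4.7
```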

Important properties:

• LBP is invariant to any monotonic gray level change

• computational simplicity

- 1778 citations in Google Scholar

- 3rd most cited paper of the PR journal published since 1996

Ojala T, Pietikäinen M & Harwood D (1996) A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29:51-59.

Multiscale LBP

- arbitrary circular neighborhoods

- uniform patterns

- multiple scales

- rotation invariance

- gray scale variance as contrast measure

- 3213 citations in Google Scholar

- 4th most cited paper of the top-ranking IEEE PAMI journal since 2002

- the most cited Finnish paper in ICT area published since 2002

Ojala T, Pietikäinen M & Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971-987 (an early version at ECCV 2000).

LBP computation in four steps for a center pixel of value 70 (values are listed in the order in which they appear in the figure; the weights in step 4 follow the sampling order around the circle, which differs from this listing order):

1. Sample: 51, 70, 62, 83, 65, 78, 47, 80

2. Difference: -19, 0, -8, 13, -5, 8, -23, 10

3. Threshold: 0, 1, 0, 1, 0, 1, 0, 1

4. Multiply by powers of two and sum: 1*1 + 1*2 + 1*4 + 1*8 + 0*16 + 0*32 + 0*64 + 0*128 = 15

An example of LBP image and histogram

MACHINE VISION GROUP

Foundations for LBP: Description of local image texture

Texture at $g_c$ is modeled using a local neighborhood of radius R, which is sampled at P (8 in the example) points:

[Figure: square neighborhood and circular neighborhood ($g_1$, $g_3$, $g_5$, $g_7$ interpolated), each with center $g_c$, radius R and neighbors $g_0, \ldots, g_7$]

Let’s define texture T as the joint distribution of gray levels $g_c$ and $g_p$ ($p = 0, \ldots, P-1$):

$$T = t(g_c, g_0, \ldots, g_{P-1})$$

Description of local image texture (cont.)

Without losing information, we can subtract $g_c$ from $g_p$:

$$T = t(g_c, g_0 - g_c, \ldots, g_{P-1} - g_c)$$

Assuming $g_c$ is independent of $g_p - g_c$, we can factorize the above:

$$T \sim t(g_c)\, t(g_0 - g_c, \ldots, g_{P-1} - g_c)$$

$t(g_c)$ describes the overall luminance of the image, which is unrelated to local image texture, hence we ignore it:

$$T \sim t(g_0 - g_c, \ldots, g_{P-1} - g_c)$$

The above expression is invariant with respect to gray scale shifts

Description of local image texture (cont.)

Exact independence of $t(g_c)$ and $t(g_0 - g_c, \ldots, g_{P-1} - g_c)$ is not warranted in practice:

[Figure: average $t(g_c, g_0 - g_c)$, and average absolute difference between $t(g_c, g_0 - g_c)$ and $t(g_c)\, t(g_0 - g_c)$]

Pooled (G=16) from 32 Brodatz textures used in [Ojala, Valkealahti, Oja & Pietikäinen: Pattern Recognition 2001]

LBP: Local Binary Pattern

Invariance with respect to any monotonic transformation of the gray scale is achieved by considering the signs of the differences:

$$T \sim t(s(g_0 - g_c), \ldots, s(g_{P-1} - g_c))$$

where

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

The above is transformed into a unique P-bit pattern code by assigning the binomial weight $2^p$ to each sign $s(g_p - g_c)$:

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p$$
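A minimal Python sketch of the LBP code with P points sampled on a circle of radius R, using bilinear interpolation for samples that fall between pixels. The sampling convention (p = 0 to the right, counter-clockwise), the small tolerance `EPS` and the function names are assumptions made for illustration; they are not taken from the slides.

```python
import math

EPS = 1e-9  # numerical guard so that exact ties still threshold to 1

def bilinear(img, y, x):
    """Gray value at real-valued (y, x), bilinearly interpolated."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(img) - 1)
    x1 = min(x0 + 1, len(img[0]) - 1)
    fy, fx = y - y0, x - x0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom

def lbp(img, yc, xc, P=8, R=1.0):
    """LBP code at (yc, xc); the whole neighborhood must lie inside img."""
    gc = img[yc][xc]
    code = 0
    for p in range(P):
        angle = 2.0 * math.pi * p / P
        # Assumed convention: p = 0 lies to the right, p grows counter-clockwise.
        y = round(yc - R * math.sin(angle), 9)
        x = round(xc + R * math.cos(angle), 9)
        if bilinear(img, y, x) - gc >= -EPS:  # s(g_p - g_c)
            code += 1 << p                    # weight 2^p
    return code

img = [[0, 0, 0],
       [0, 5, 9],
       [0, 0, 0]]
print(lbp(img, 1, 1))  # only the right-hand neighbor reaches the center value
```

Because interpolation is linear, the code is unchanged by a gray scale change of the form a*g + b with a > 0. For non-interpolated samples it is unchanged by any monotonic increasing transformation.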

‘Uniform’ patterns

[Figure: ‘uniform’ patterns (P=8) with U=0 and U=2; examples of ‘nonuniform’ patterns (P=8) with U=4, U=6 and U=8]

Texture primitives (“micro-textons”) detected by the uniform patterns of LBP

[Figure legend: 1 = black, 0 = white]

Uniform patterns

Bit patterns with 0 or 2 transitions (0→1 or 1→0) when the pattern is considered circular

All non-uniform patterns are assigned to a single bin

58 uniform patterns in the case of 8 sampling points
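The count of 58 can be verified by enumerating all 8-bit codes. The sketch below counts circular bit transitions and builds a histogram mapping in which every uniform pattern gets its own bin and all other patterns share one extra bin; the helper names are illustrative.

```python
def transitions(code, P=8):
    """U value: number of 0/1 transitions in the circular P-bit pattern."""
    rotated = (code >> 1) | ((code & 1) << (P - 1))  # rotate right by one bit
    return bin(code ^ rotated).count("1")

uniform = [c for c in range(256) if transitions(c) <= 2]
print(len(uniform))  # 58

# Each uniform pattern gets its own bin; all non-uniform patterns share bin 58.
bin_of = {c: i for i, c in enumerate(uniform)}

def u2_bin(code):
    return bin_of.get(code, len(uniform))
```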

Rotation of Local Binary Patterns

Spatial rotation of the binary pattern changes the LBP code:

edge (15), under rotation: (30) (60) (120) (240) (225) (195) (135) (15)

Rotation invariant local binary patterns

Formally, rotation invariance can be achieved by defining:

$$LBP_{P,R}^{ri} = \min\{ROR(LBP_{P,R}, i) \mid i = 0, \ldots, P-1\}$$

(15) (30) (60) (120) (240) (225) (195) (135) → mapping → (15)
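The mapping can be sketched in Python as a circular bit rotation followed by a minimum. The function names `ror` and `lbp_ri` are illustrative.

```python
def ror(code, i, P=8):
    """Circular right rotation of a P-bit code by i bits."""
    i %= P
    return ((code >> i) | (code << (P - i))) & ((1 << P) - 1)

def lbp_ri(code, P=8):
    """Rotation invariant code: the minimum over all P rotations."""
    return min(ror(code, i, P) for i in range(P))

rotated_edges = [15, 30, 60, 120, 240, 225, 195, 135]
print([lbp_ri(c) for c in rotated_edges])  # every rotated edge maps to 15
```

For P = 8 this mapping collapses the 256 codes into 36 distinct rotation invariant values.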

Operators for characterizing texture contrast

Local gray level variance can be used as a contrast measure:

$$VAR_{P,R} = \frac{1}{P} \sum_{p=0}^{P-1} (g_p - m)^2, \quad \text{where} \quad m = \frac{1}{P} \sum_{p=0}^{P-1} g_p$$

$VAR_{P,R}$ is

• invariant with respect to gray scale shifts (but, unlike LBP, not to every monotonic transformation)

• invariant with respect to rotation along the circular neighborhood

Using complementary contrast usually leads to better performance than using LBP alone.
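A short Python sketch of the variance measure and its invariance properties. The neighbor values are arbitrary sample numbers chosen for illustration.

```python
def var_pr(neighbors):
    """Local variance of the P neighbor gray values g_0 .. g_{P-1}."""
    P = len(neighbors)
    m = sum(neighbors) / P
    return sum((g - m) ** 2 for g in neighbors) / P

g = [6, 5, 2, 1, 7, 8, 9, 7]
shifted = [v + 10 for v in g]   # gray scale shift
rotated = g[3:] + g[:3]         # rotation along the circle
scaled = [2 * v for v in g]     # monotonic, but not a shift

print(var_pr(g), var_pr(shifted), var_pr(rotated), var_pr(scaled))
```

The first three values agree, while scaling the gray levels by 2 multiplies the variance by 4.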

Quantization of continuous feature space

Texture statistics are described with discrete histograms, so a mapping is needed for continuous-valued contrast features

Non-uniform quantization: every bin has the same amount of total data

The highest resolution of the quantization is used where the number of entries is largest

[Figure: total distribution divided by cut values into equal-area bins 0 to 3]
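One way to obtain equal-area cut values is to take quantiles of the pooled training data. The sketch below is an illustration under that assumption; the function names and the sample values are hypothetical.

```python
import bisect

def equal_area_cuts(values, n_bins):
    """Cut values such that each bin receives about the same number of entries."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]

def quantize(value, cuts):
    """Index of the bin that a continuous feature value falls into."""
    return bisect.bisect_right(cuts, value)

# Hypothetical contrast values pooled from training images.
training = [0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 8.0]
cuts = equal_area_cuts(training, 4)
counts = [0] * 4
for v in training:
    counts[quantize(v, cuts)] += 1
print(cuts, counts)  # [0.3, 0.5, 2.0] [2, 2, 2, 2]
```

The bins are narrow where the data are dense (small contrast values here) and wide in the sparse tail.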

Estimation of empirical feature distributions

The input image (region) is scanned with the chosen operator(s), pixel by pixel, and operator outputs are accumulated into a discrete histogram

[Figure: histogram of $LBP_{P,R}^{riu2}$ with bins 0, 1, ..., P+1; histogram of $VAR_{P,R}$ with bins 0, 1, ..., B-1; joint histogram of the two operators, $LBP_{P,R}^{riu2} / VAR_{P,R}$]

Multi-scale analysis

Information provided by N operators can be combined simply by summing up operatorwise similarity scores into an aggregate similarity score:

$$L_N = \sum_{n=1}^{N} L_n \quad \text{e.g.} \quad LBP_{8,1}^{riu2} + LBP_{8,3}^{riu2} + LBP_{8,5}^{riu2}$$

Effectively, the above assumes that distributions of individual operators are independent
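The aggregation can be sketched as a plain sum over per-operator scores. The score function is passed in as a parameter; histogram intersection is used here only as an example, and the histograms are made-up values.

```python
def aggregate_score(sample_hists, model_hists, score):
    """Sum of per-operator similarity scores (operators treated as independent)."""
    return sum(score(s, m) for s, m in zip(sample_hists, model_hists))

def intersection(s, m):
    return sum(min(a, b) for a, b in zip(s, m))

# One normalized histogram per operator (hypothetical values, e.g. three radii).
sample = [[0.5, 0.5], [0.2, 0.8], [1.0, 0.0]]
model = [[0.4, 0.6], [0.2, 0.8], [0.5, 0.5]]
total = aggregate_score(sample, model, intersection)
print(total)  # sum of the three per-scale intersections 0.9, 1.0 and 0.5
```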

Multiscale analysis using images at multiple scales

LBP features can be computed directly from different levels of an image pyramid, or image regions can be re-scaled prior to feature extraction

Nonparametric classification principle

In Nearest Neighbor classification, sample S is assigned to the class of model M that maximizes

$$L(S, M) = \sum_{b=0}^{B-1} S_b \ln M_b$$

Instead of the log-likelihood statistic, chi square distance or histogram intersection is often used for comparing feature distributions.

The histograms should be normalized, e.g. to unit length, before classification if the sizes of the image windows to be analyzed can vary.

The bins of the LBP feature distribution can also be used directly as features, e.g. for SVM classifiers.
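A sketch of the three comparison measures and of nearest-neighbor classification with the log-likelihood statistic. Normalizing to unit sum and adding a small `eps` inside the logarithm are assumptions made here to keep the example well defined; the class labels and histograms are hypothetical.

```python
import math

def normalize(hist):
    """Scale a histogram to unit sum (one possible normalization)."""
    total = sum(hist)
    return [h / total for h in hist]

def log_likelihood(S, M, eps=1e-12):
    # eps guards against ln(0) for empty model bins (assumption).
    return sum(s * math.log(m + eps) for s, m in zip(S, M))

def chi_square(S, M):
    return sum((s - m) ** 2 / (s + m) for s, m in zip(S, M) if s + m > 0)

def intersection(S, M):
    return sum(min(s, m) for s, m in zip(S, M))

def classify(sample, models):
    """Return the label of the model histogram that maximizes L(S, M)."""
    S = normalize(sample)
    return max(models, key=lambda label: log_likelihood(S, normalize(models[label])))

models = {"A": [8, 1, 1], "B": [1, 1, 8]}
sample = [6, 2, 2]
print(classify(sample, models))  # A
```

Log-likelihood and histogram intersection are similarities (larger is better), whereas chi square is a distance (smaller is better).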

LBP unifies statistical and structural approaches


Spatiotemporal LBP

- 449 citations in Google Scholar

Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.

Dynamic textures (R Nelson & R Polana: IUW, 1992; M Szummer & R Picard: ICIP, 1995; G Doretto et al., IJCV, 2003)

Volume Local Binary Patterns (VLBP)

[Figure: VLBP computation: sampling in volume, thresholding, multiplying by weights, pattern]

LBP from Three Orthogonal Planes (LBP-TOP)

[Figure: length of feature vector (×10^4) versus P, the number of neighboring points, for concatenated LBP and VLBP]
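The growth of the two feature vectors can be tabulated directly. The formulas below are not stated on the slide; they are taken as an assumption from the cited 2007 paper, where basic VLBP thresholds 3P+2 neighbors (giving 2^(3P+2) bins) and LBP-TOP concatenates three 2^P-bin histograms.

```python
def vlbp_length(P):
    """Histogram length of basic VLBP: 3P + 2 thresholded neighbors (assumed formula)."""
    return 2 ** (3 * P + 2)

def lbp_top_length(P):
    """Histogram length of basic LBP-TOP: three concatenated 2^P-bin histograms."""
    return 3 * 2 ** P

for P in (2, 4, 8):
    print(P, vlbp_length(P), lbp_top_length(P))
```

For P = 4 this gives 16384 bins for VLBP against 48 for the concatenated histogram, which is why the concatenated curve stays flat while the VLBP curve explodes.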

[Figure: LBP-TOP sampling neighborhoods in the XY, XT and YT planes, with axes X, Y and T]

LBP-TOP
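A simplified sketch of LBP-TOP: a basic 8-neighbor LBP is computed at each interior voxel in the XY, XT and YT planes, and the three 256-bin histograms are concatenated. To stay short it assumes square 3x3 neighborhoods with radius 1 on every axis instead of circular or elliptical sampling; the volume layout `vol[t][y][x]` and all names are illustrative.

```python
# In-plane offsets (du, dv) of the 8 neighbors, one bit per neighbor.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

def plane_code(vol, t, y, x, plane):
    """Basic LBP code of voxel (t, y, x) within one of the three planes."""
    gc = vol[t][y][x]
    code = 0
    for p, (du, dv) in enumerate(OFFSETS):
        if plane == "XY":
            g = vol[t][y + du][x + dv]   # du along Y, dv along X
        elif plane == "XT":
            g = vol[t + du][y][x + dv]   # du along T, dv along X
        else:  # "YT"
            g = vol[t + du][y + dv][x]   # du along T, dv along Y
        if g >= gc:
            code |= 1 << p
    return code

def lbp_top(vol):
    """Concatenated XY, XT and YT histograms over all interior voxels."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    hists = {plane: [0] * 256 for plane in ("XY", "XT", "YT")}
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                for plane in hists:
                    hists[plane][plane_code(vol, t, y, x, plane)] += 1
    return hists["XY"] + hists["XT"] + hists["YT"]

# Static 3x3x3 volume whose gray value depends on x only.
vol = [[[x for x in range(3)] for _ in range(3)] for _ in range(3)]
feature = lbp_top(vol)
print(len(feature))  # 768
```

In this toy volume the XY and XT planes see the horizontal gradient, while the YT plane sees a constant patch and therefore produces code 255.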

Results of LBP from three planes

LBP                  XY      XZ      YZ      Con     weighted
8,8,8,1,1,1 riu2     88.57   84.57   86.29   93.14   93.43 [2,1,1]
8,8,8,1,1,1 u2       92.86   88.86   89.43   94.57   96.29 [4,1,1]
8,8,8,1,1,1 Basic    95.14   90.86   90      95.43   97.14 [5,1,2]
8,8,8,3,3,3 Basic    90      91.17   94.86   95.71   96.57 [1,1,4]
8,8,8,3,3,1 Basic    89.71   91.14   92.57   94.57   95.71 [2,1,8]

LBP histogram Fourier features

Rotation revisited

Rotation of an image by α degrees

• Translates each local neighborhood to a new location

• Rotates each neighborhood by α degrees

If α = 45°, local binary patterns

• 00000001 → 00000010,

• 00000010 → 00000100, ...,

• 11110000 → 11100001, ...,

Similarly, if α = k*45°, each pattern is circularly rotated by k steps

Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4):1465-1467.

Rotation revisited (2)

In the uniform LBP histogram, rotation of the input image by k*45° causes a cyclic shift by k along each row:

[Figure: uniform LBP histogram arranged in rows; Fourier magnitudes computed along each row form the LBP-HF feature vector]

LBP Histogram Fourier Features

$$H(n, u) = \sum_{r=0}^{P-1} h_I(U_P(n, r))\, e^{-i 2 \pi u r / P}$$

$$|H(n, u)| = \sqrt{H(n, u)\, \overline{H(n, u)}}$$

Here $h_I(U_P(n, r))$ is the histogram bin of the uniform pattern with n ones rotated by r steps.
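The reason the Fourier magnitudes are rotation invariant is that a cyclic shift of a row only changes the phase of its discrete Fourier transform. A minimal Python check, with a made-up histogram row for one value of n:

```python
import cmath

def hf_magnitudes(row):
    """|H(n, u)| for u = 0 .. P-1, where row[r] is the bin of the pattern rotated by r."""
    P = len(row)
    return [abs(sum(row[r] * cmath.exp(-2j * cmath.pi * u * r / P) for r in range(P)))
            for u in range(P)]

row = [5, 1, 0, 2, 7, 3, 0, 4]   # hypothetical counts for one row of the histogram
shifted = row[-3:] + row[:-3]    # image rotated by 3 * 45 degrees: cyclic shift by 3

original = hf_magnitudes(row)
after_rotation = hf_magnitudes(shifted)
print(max(abs(a - b) for a, b in zip(original, after_rotation)))  # close to 0
```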

Example

[Figure: input image; uniform LBP histogram; original rotation invariant LBP (red) and LBP histogram Fourier (blue)]

Rotation Invariant LBP-TOP

Computation of LBP-TOP for “watergrass” with 0 (left) and 60 (right) degrees rotation.


Rotated planes from which LBP is computed.

One Dimensional Histogram Fourier LBP-TOP

(1DHFLBP-TOP)

LBP histograms for uniform patterns in different rotated motion planes with PXY = 8 and PT = 8.


Two Dimensional Histogram Fourier LBP-TOP

(2DHFLBP-TOP)

Uniform pattern before mirror (left)

and after mirror (right).

Examples of LBP codes containing two “1”s, in different rotated motion planes, with PXY = 8 and PT = 8.

DynTex database

Images after rotating by 15 degree intervals.


A nine class dataset with the classes being boiling water (8), fire (8), flowers (12), fountains (20), plants (108), sea (12), smoke (4), water (12) and waterfall (16).

P. Saisan, G. Doretto, Y.N. Wu, and S. Soatto, “Dynamic texture recognition,” CVPR, pp. 58-63, 2001.

A. Ravichandran, R. Chaudhry, and R. Vidal, “View-invariant dynamic texture recognition using a bag of dynamical systems,” CVPR, pp. 1-6, 2009.

Part 2: Examples of applications

Unsupervised texture segmentation

LBP/C was used as texture operator

Segmentation algorithm consists of three phases:

1) hierarchical splitting

2) agglomerative merging

3) pixelwise classification

- 277 citations in Google Scholar

Ojala T & Pietikäinen M (1999) Unsupervised texture segmentation using feature distributions. Pattern Recognition 32:477-486.

Segmentation examples

Natural scene #1: 384x384 pixels

Natural scene #2: 192x192 pixels

Case project: Texture analysis of tissues with virtual microscopy

• Joint research with Adj.Prof. Johan Lundin, FIMM, University of Helsinki

• Texture analysis is becoming a very important tool in microscopy

• The objective is to study and develop automated texture-based classifiers for breast and other cancer tissues that, together with clinical and molecular-genetic data, can distinguish patients with aggressive disease from those with indolent disease. Image analysis tools for the diagnosis of malaria are also studied.

1. Imaging 2. Stitching 3. Virtual Slide 4. Texture Analysis with Local Binary Pattern operator 5. Classifier

[Figure: classifier output as class probabilities: 90%, 8%, 2%]

Linder et al. (2012) Identification of tumor epithelium and stroma in tissue microarrays using texture analysis, Diagnostic Pathology, 2012, 7:22.

[Figure: epithelium and stroma samples, panels A to E, processed with LBP]

In experiments with colorectal cancer microscopy images, over 99% accuracy was obtained with LBP/C features and an SVM classifier

Case: Automatic landscape mode detection

• The aim was to develop and implement an algorithm that automatically classifies images into landscape and non-landscape categories

– The analysis is solely based on the visual content of images.

• The main criterion is to find an accurate but still computationally light solution capable of real-time operation.

Huttunen S, Rahtu E, Heikkilä J, Kunttu I & Gren J (2011) Real-time detection of landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.

Landscape vs. non-landscape

• Definition of landscape and non-landscape images is not straightforward

– If there are no distinct and easily separable objects present in a natural scene, the image is classified as landscape

– The non-landscape branch consists of indoor scenes and other images containing man-made objects at relatively close distance

Data set

• The images used for training and testing were downloaded from the PASCAL Visual Object Classes (VOC2007) database and the Flickr site

– All the images were manually labeled and resized to QVGA (320x240).

• Training: 1115 landscape images and 2617 non-landscape images

• Testing: 912 and 2140, respectively

The approach

• Simple global image representation based on local binary pattern (LBP) histograms is used

• Two variants:

– Basic LBP

– LBP In+Out

• SVM classifier

[Figure: processing blocks: feature extraction, histogram computation, SVM classifier training]

Classification results

Summary of the results

Real-time implementation

• The current real-time implementation coded in C relies on the basic LBPb

• Performance analysis

– Windows PC with Visual Studio 2010 Profiler

• The total execution time for one frame was about 3 ms

– Nokia N900 with FCam

• About 30 ms


Demo videos

Huttunen S, Rahtu E, Kunttu I, Gren J & Heikkilä J (2011) Real-time detection of landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.

Modeling the background and detecting moving objects

- 507 citations in Google Scholar

Heikkilä M & Pietikäinen M (2006) A texture-based method for modeling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(4):657-662.

Roughly speaking, the background subtraction can be seen as a two-stage process as illustrated below.

Background modeling: The goal is to construct and maintain a statistical representation of the scene that the camera sees.

Foreground detection: The comparison of the input frame with the current background model. The areas of the input frame that do not fit the background model are considered as foreground.

Overview of the approach…

We use an LBP histogram computed over a circular region around the pixel as the feature vector.

The history of each pixel over time is modeled as a group of K weighted LBP histograms: {x1, x2, …, xK}.

The background model is updated with the information of each new video frame, which makes the algorithm adaptive.

The update procedure is identical for each pixel.
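The slides do not spell out the update rule, so the following is only a hedged sketch of one plausible per-pixel update for a model of K weighted histograms. Everything specific in it is an assumption made for illustration: matching by histogram intersection against a threshold `T_P`, the learning rates `alpha_b` and `alpha_w`, the initial weight `init_w`, and replacing the lowest-weight histogram when nothing matches.

```python
def intersection(a, b):
    return sum(min(x, y) for x, y in zip(a, b))

def update_pixel_model(model, h, T_P=0.65, alpha_b=0.01, alpha_w=0.01, init_w=0.01):
    """Update one pixel's model in place.

    model: list of [weight, histogram] pairs; h: normalized LBP histogram of the
    new frame around this pixel. Returns True if h matched a model histogram.
    All parameter names and default values are hypothetical.
    """
    scores = [intersection(m, h) for _, m in model]
    best = max(range(len(model)), key=lambda k: scores[k])
    matched = scores[best] >= T_P
    if matched:
        # Blend the new histogram into the best match and raise its weight.
        model[best][1] = [alpha_b * x + (1 - alpha_b) * y
                          for x, y in zip(h, model[best][1])]
        for k, entry in enumerate(model):
            entry[0] = alpha_w * (1.0 if k == best else 0.0) + (1 - alpha_w) * entry[0]
    else:
        # No match: replace the histogram with the lowest weight.
        worst = min(range(len(model)), key=lambda k: model[k][0])
        model[worst] = [init_w, list(h)]
    total = sum(entry[0] for entry in model)
    for entry in model:
        entry[0] /= total
    return matched

model = [[0.5, [1.0, 0.0]], [0.5, [0.0, 1.0]]]
first = update_pixel_model(model, [0.9, 0.1])    # close to the first histogram
second = update_pixel_model(model, [0.5, 0.5])   # matches neither histogram
print(first, second)  # True False
```

In such a scheme a pixel whose histogram matches a high-weight model histogram would be labeled background, and foreground otherwise.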


Examples of detection results

Detection results for images of Toyama et al. (ICCV 1999)


Demo for detection of moving objects

LBP in multi-object tracking

Takala V & Pietikäinen M (2007) Multi-object tracking using color, texture, and motion. Proc. Seventh IEEE International Workshop on Visual Surveillance (VS 2007), Minneapolis, USA, 7 p.


Recognition of human actions

Kellokumpu V, Zhao G & Pietikäinen M (2011) Recognition of human actions using texture descriptors. Machine Vision and Applications 22(5):767-780.

Dynamic textures for action recognition

• Illustration of xyt-volume of a person walking

[Figure: xt and yt slices of the volume]

• Formation of the feature histogram for an xyt volume of short duration

• HMM is used for sequential modeling

[Figure: feature histogram of a bounding volume]

Action classification results - KTH

[Figure: confusion matrix over the classes Box, Clap, Wave, Jog, Run and Walk. Row values as extracted (column alignment not recoverable): (.980, .020), (.855, .145), (.032, .108, .860), (.977, .020, .003), (.01, .987, .003), (.033, .967)]

• Classification accuracy 93.8% using image data


Recognition of natural actions

Chen J, Zhao G, Kellokumpu V & Pietikäinen M (2011) Combining sparse and dense descriptors with temporal semantic structures for robust human action recognition. Proc. ICCV Workshops (VECTaR2011), Barcelona, Spain.

What did we do

• complex human action recognition

What is the contribution

• use a simple LBP descriptor to divide each video into motion segments

• Representation

[Figure: LBP, HOG and HOF descriptors combined as (HOG/HOF)+LBP]

Experimental results

• Datasets

– OSD from YouTube.

• 16 sport classes: high jump, long jump, triple jump, pole vault, discus throw, hammer throw, javelin throw, shot put, basketball layup, bowling, tennis serve, platform (diving), springboard (diving), snatch (weightlifting), clean and jerk (weightlifting) and vault (gymnastics).

– HOHA

• from thirty-two movies, e.g. “American Beauty”, “Big Fish”, “Forrest Gump”. It contains eight types of actions: “AnswerPhone”, “GetOutCar”, “HandShake”, “Hug-Person”, “Kiss”, “SitDown”, “SitUp”, and “StandUp”.

– KTH

• 2,396 video sequences covering 6 types of actions performed by 25 actors

To what extent

                             Olympic Sports Dataset (OSD)   Hollywood Human Action dataset 2
Niebles et al., ECCV 2010    72.1                           N/A
Laptev et al., CVPR 2008     62.1                           38.4
Ours                         80.0                           53.3


Video texture synthesis

Video texture synthesis is the process of providing a continuous and infinitely varying stream of frames, similar to the example texture.

Two major stages:

- Video stitching;

- Transition smoothing.

Guo Y, Zhao G, Zhou Z & Pietikäinen M (2013) Video texture synthesis with multi-frame LBP-TOP and diffeomorphic growth model. IEEE Transactions on Image Processing, in press.

Video stitching

• To generate a similar-looking video of infinite length, taking a short video of finite length as input.

[Figure: a video clip and the resulting video]

Video clip: dynamic behaviors for a specific period of time.

Video texture synthesis depicts the timeless quality of the phenomena in general.

Video stitching

Experiments

• The DynTex database is a diverse collection of high-quality dynamic texture videos. The latest version contains 650 sequences of dynamic textures.

• Samples from other databases and the Internet are also used for evaluation, including different types of dynamic textures, ranging from natural scenes to human motion.

http://www.cse.oulu.fi/CMV/DemosAndVideos/DTwebpage

A typical example illustrating virtual frames. Compared to the original image sequence (the first row), the image sequence containing virtual frames (the second row) also has smooth and consistent visualization. The frames in green square are real frames.

Original

Synthesized

Part 3: A brief review of recent LBP variants

Description of interest regions with center-symmetric LBPs

[Figure: neighborhood of the center pixel $n_c$ with eight neighbors $n_0, \ldots, n_7$, where $n_i$ and $n_{i+4}$ lie opposite each other]

Binary Pattern:

$$LBP = \sum_{i=0}^{7} s(n_i - n_c)\, 2^i$$

$$CS\text{-}LBP = \sum_{i=0}^{3} s(n_i - n_{i+4})\, 2^i$$
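The two codes can be compared side by side in a few lines of Python. The sketch uses the plain sign function s(x) = 1 for x ≥ 0 defined earlier in the tutorial, without any additional threshold, and the neighbor values are sample numbers chosen for illustration.

```python
def s(x):
    return 1 if x >= 0 else 0

def lbp(n, nc):
    """8 comparisons against the center: codes in 0 .. 255."""
    return sum(s(n[i] - nc) << i for i in range(8))

def cs_lbp(n):
    """4 comparisons of center-symmetric pairs (n_i, n_{i+4}): codes in 0 .. 15."""
    return sum(s(n[i] - n[i + 4]) << i for i in range(4))

neighbors = [51, 70, 62, 83, 65, 78, 47, 80]   # n_0 .. n_7 (sample values)
center = 70
print(lbp(neighbors, center), cs_lbp(neighbors))  # 170 12
```

Comparing only opposite pairs reduces the histogram from 256 to 16 bins, which keeps a region descriptor built from many cells compact.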

- 294 citations in Google Scholar

- Most cited paper of PR journal published since 2009

Heikkilä M, Pietikäinen M & Schmid C (2009) Description of interest regions with local binary patterns. Pattern Recognition 42(3):425-436.

Description of interest regions

[Figure: input region → CS-LBP features → region descriptor]

Setup for image matching experiments

CS-LBP performed better than SIFT in image matching and categorization experiments, especially for images with illumination variations

Preprocessing prior to LBP feature extraction

Gabor filtering for extracting more macroscopic information [Zhang et al., ICCV 2005]

Preprocessing for illumination normalization [Tan & Triggs, AMFG 2007]

Edge detection - used to enhance the gradient information; find orientations of local patterns [Liao & Chung, ICIP 2009]; find patterns of oriented edge magnitudes [Vu & Caplier, ECCV 2010, TIP 2012]

Neighborhood topology

Square or circular neighborhood is normally used - circular neighborhood important for rotation-invariant operators

Anisotropic neighborhood (e.g. elliptic)

- improved results in face recognition [Liao & Chung, ACCV 2007] and in medical image analysis [Nanni et al., Artif. Intell. Med. 2010]

Encoding similarities between patches of pixels [Wolf et al., ECCV 2008]

- they characterize well topological structural information of face appearance

Thresholding and encoding

• Using mean or median of the neighborhood for thresholding

• Using a non-zero threshold [Heikkilä et al., IEEE PAMI 2006]

• Local ternary patterns - encoding by 3 values [Tan & Triggs, AMFG 2007]

• Extended quinary patterns – encoding by 5 values [Nanni et al., Artif. Intell. Med. 2010]

• Soft LBP [Ahonen & Pietikäinen, Finsig 2007]

• Scale invariant local ternary pattern [Liao et al., CVPR 2010] - for background subtraction applications

Robust LBP [Heikkilä et al., PAMI 2006]

In robust LBP, the term $s(g_p - g_c)$ is replaced with $s(g_p - g_c + a)$

Allows bigger changes in pixel values without affecting thresholding results

- improved results in background subtraction [Heikkilä et al., PAMI 2006]

- was also used in CS-LBP interest region descriptor [Heikkilä et al., PR 2009]

Local ternary patterns (LTP) [Tan & Triggs, AMFG 2007]

[Figure: a 3x3 neighborhood thresholded into a ternary pattern, which is split into two binary patterns]

Ternary code: 1100(-1)(-1)00

Binary code (positive part): 11000000

Binary code (negative part): 00001100
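The split of a ternary code into two binary codes can be sketched as follows. The tolerance `t` around the center value and the gray values are illustrative assumptions, chosen so that the result reproduces the codes shown on the slide.

```python
def ltp(neighbors, center, t):
    """Ternary code plus its positive and negative binary parts."""
    ternary = [1 if g >= center + t else -1 if g <= center - t else 0
               for g in neighbors]
    positive = [1 if v == 1 else 0 for v in ternary]
    negative = [1 if v == -1 else 0 for v in ternary]
    return ternary, positive, negative

# Hypothetical gray values with center 50 and tolerance t = 5.
ternary, positive, negative = ltp([60, 57, 52, 48, 40, 44, 50, 53], 50, 5)
print(ternary)
print("".join(map(str, positive)), "".join(map(str, negative)))  # 11000000 00001100
```

Each binary part is then histogrammed like an ordinary LBP code, so the descriptor uses two histograms instead of one very long ternary histogram.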

Feature selection and learning

• Feature selection e.g. with AdaBoost to reduce the number of bins [Zhang et al., LNCS 2005]

• Subspace methods projecting LBP features into a lower-dimensional space [Shan et al., ICPR 2006], [Chan et al., ICB 2007]

• Learning the most dominant and discriminative patterns [Liao et al., IEEE TIP 2009], [Guo et al., ACCV 2010]

A learning-based LBP

Guo Y, Zhao G & Pietikäinen M (2012) Discriminative features for texture description. Pattern Recognition 45(10):3834-3843.

Use of complementary contrast/magnitude information

• LBP was designed as a complementary measure of local contrast, using joint LBP/C or LBP/VAR histograms

• LBPV puts the local contrast into 1-dimensional histogram [Guo et al., Pattern Recogn. 2010]

• Completed LBP (CLBP) considers complementary sign and magnitude vectors [Guo et al., IEEE TIP 2010].

• Weber law descriptor (WLD) includes excitation and orientation components [Chen et al., IEEE PAMI 2010]


Completed LBP [Guo et al., IEEE TIP 2010]

a) 3 x 3 sample block

b) The local differences

c) The sign component

d) The magnitude component

Local patterns (sign and magnitude)

Completed local binary patterns (CLBP: sign, magnitude and contrast):

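Difference: $d_p = g_p - g_c$

Sign: $u_p = \begin{cases} 1, & d_p \ge 0 \\ 0, & d_p < 0 \end{cases}$

Magnitude: $h_p = \begin{cases} 1, & |d_p| \ge \delta \\ 0, & |d_p| < \delta \end{cases}$, where $\delta$ is the mean value of $|d_p|$ from the whole image

[Figure panels: $u_p$, $h_p$, $|d_p|$]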
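The sign and magnitude definitions can be checked on a small example. This is only a sketch: the neighbour values and the threshold `delta = 20` are hypothetical, and in CLBP the threshold would be computed as the mean of the absolute differences over the whole image rather than fixed by hand.

```python
import numpy as np

def clbp_sign_magnitude(neighbours, center, delta):
    """Return local differences d_p, sign bits u_p and magnitude bits h_p."""
    d = np.asarray(neighbours, dtype=np.int64) - center
    u = (d >= 0).astype(int)               # sign component
    h = (np.abs(d) >= delta).astype(int)   # magnitude component
    return d, u, h

# Hypothetical 3x3 block flattened to its 8 neighbours, centre value 25.
d, u, h = clbp_sign_magnitude([9, 12, 34, 67, 1, 90, 15, 28], center=25, delta=20)
```

The two bit vectors `u` and `h` are encoded separately and can be combined into joint or concatenated histograms.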


WLD: Weber law descriptor

Composed of excitation and orientation components

Chen J, Shan S, He C, Zhao G, Pietikäinen M, Chen X & Gao W (2010) WLD: A robust local image descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9):1705-1720.

Local Phase Quantization (LPQ)

- blur insensitive

- has recently received considerable attention, e.g. in the face analysis research community

Rahtu E, Heikkilä J, Ojansivu V & Ahonen T (2012) Local phase quantization for blur-insensitive image analysis. Image and Vision Computing 30(8):501–512.


Visual Recognition using Local Quantized Patterns

Sibt ul Hussain and William Triggs In: European Conference on Computer Vision, Florence, Italy, 2012.

Build LUT

Assign each local pattern to the cluster whose mean is closest to it.

$\{x \in k : \lVert x - \mu_k \rVert \le \lVert x - \mu_r \rVert \;\; \forall\, 1 \le r \le K\}$

[Figure: lookup table (LUT) with a column "Local Patterns" (00000000, 00000001, 00000010, …, 11111111) and a column "Index" (values shown: 17, 46, 6, 6)]
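The lookup-table step can be sketched as below. The sketch assumes the cluster means are already available (for example from k-means on training patterns) and that patterns are compared as bit vectors with Euclidean distance; `build_lut` and the two toy centroids are hypothetical names and values, not the authors' code.

```python
import numpy as np

def build_lut(centroids, n_bits=8):
    """Map every n_bits-bit local pattern to the index of its nearest cluster mean."""
    centroids = np.asarray(centroids, dtype=float)
    codes = np.arange(2 ** n_bits)
    # Unpack each integer code into its bit vector, most significant bit first.
    patterns = (codes[:, None] >> np.arange(n_bits - 1, -1, -1)) & 1
    # Squared Euclidean distance from every pattern to every centroid.
    dists = ((patterns[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Two toy cluster means: "mostly zeros" and "mostly ones".
lut = build_lut([[0] * 8, [1] * 8])
```

Once the table is built, quantizing a pattern at run time is a single array lookup, `lut[code]`, which is what makes large pattern spaces practical.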


Completed Local Quantized Patterns (CLQP)

Huang X, Zhao G, Hong X, Pietikäinen M & Zheng W (2013) Texture description with completed local quantized patterns. Proc. 18th Scandinavian Conference on Image Analysis (SCIA 2013).

Ylioinas J, Hadid A, Guo Y & Pietikäinen M (2012) Efficient image appearance description using dense sampling based local binary patterns, ACCV 2012.


Histogram vs. “softly voted” statistic

• In soft voting, weights are assigned based on the Hamming distance between the detected pattern and representative entries (bins) in the statistic

– Experiments on texture classification (CUReT), in the original and “limited-sample-size” scenarios, and on face verification (LFW)

• Minor improvement with CUReT-original and LFW

Ylioinas J, Hong X & Pietikäinen M (2013) Constructing local binary pattern statistics by soft voting. Proc. 18th Scandinavian Conference on Image Analysis (SCIA 2013).

• “Limited-sample-size” scenario using CUReT

- the voted statistic suffered much less when using patches of 41x41 pixels (cropped from the original 200x200 textures)

» E.g. LBP(8,2) without any mapping, and with uniform and rotation invariant mapping

[Figure panels: CUReT original; CUReT “limited-sample-size”]
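A minimal sketch of a softly voted statistic follows. The slide only says that weights depend on the Hamming distance, so the exponential weighting `exp(-distance / sigma)` and the parameter `sigma` are assumptions of this sketch and not necessarily the weighting used by Ylioinas et al.

```python
import numpy as np

def soft_histogram(codes, n_bits=8, sigma=1.0):
    """Each observed code votes for every bin, weighted by Hamming distance."""
    n_bins = 2 ** n_bits
    hist = np.zeros(n_bins)
    for c in codes:
        # Hamming distance from the observed code to each bin label.
        dist = np.array([bin(int(c) ^ b).count("1") for b in range(n_bins)])
        w = np.exp(-dist / sigma)   # assumed weighting function
        hist += w / w.sum()         # each code contributes a total mass of 1
    return hist

# Three observations of 4-bit codes (hypothetical data).
soft = soft_histogram([0, 0, 15], n_bits=4)
```

Compared with a hard histogram, a single observation spreads mass over nearby bins, which is one way to reduce sparsity when few samples are available.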


Recent progress in LBP-based video descriptors

CLBP-TOP, based on computing CLBP in three orthogonal planes

LBP histogram Fourier features for rotation-invariant image and video description

Local monogenic binary patterns, combining monogenic signal analysis and local binary patterns

Combining LBP-TOP, WLD, and optical flow for segmentation

Pfister T, Li X, Zhao G & Pietikäinen M (2011) Differentiating spontaneous from posed facial expressions within a generic facial expression recognition framework. Proc. ICCV Workshops (SISM 2011), Barcelona, Spain, 868-875.

Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4):1465-1477.

Huang X, Zhao G, Pietikäinen M & Zheng W (2012) Spatiotemporal local monogenic binary patterns for facial expression recognition. IEEE Signal Processing Letters 19(5):243-246.

Chen J, Zhao G, Salo M, Rahtu E & Pietikäinen M (2012) Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Transactions on Image Processing 22(1): 326-339.

Part 4: LBP in facial image analysis


Face description with LBP

- 1217 citations in Google Scholar

(ECCV paper: 863 citations)

- 3rd most cited paper of PAMI journal since 2006

Ahonen T, Hadid A & Pietikäinen M (2006) Face description with local binary patterns: application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12):2037-2041. (an early version published at ECCV 2004)

Weighting the regions

[Figure panels: Block size; Metrics; Weighting]

Image size: 130 × 150; block size: 18 × 21

Feature vector length: 2891
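The feature vector length can be reproduced with a short calculation. The sketch assumes non-overlapping 18 × 21 blocks on a 130 × 150 face image and a uniform-pattern LBP histogram with 8 neighbours (59 bins); these assumptions are not stated on the slide but are consistent with the length of 2891 given there.

```python
def uniform_bins(p):
    """Bins of a uniform-pattern LBP histogram with p neighbours:
    p*(p-1) + 2 uniform patterns plus one shared bin for all non-uniform ones."""
    return p * (p - 1) + 3

image_w, image_h = 130, 150
block_w, block_h = 18, 21

# Whole blocks that fit in each direction: 7 across and 7 down.
regions = (image_w // block_w) * (image_h // block_h)
descriptor_length = regions * uniform_bins(8)   # one 59-bin histogram per region
```

The regional histograms are concatenated, so the descriptor length grows linearly with the number of blocks.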


Local Gabor Binary Pattern Histogram Sequence [Zhang et al., ICCV 2005]

Illumination normalization by preprocessing prior to LBP/LPQ feature extraction [Tan & Triggs, AMFG 2007]

Improving the robustness of LBP-based face recognition

Illumination invariance using LBP with NIR imaging

S.Z. Li et al. [IEEE PAMI, 2007]


Hussain et al. (2012) Face recognition using local quantized patterns, BMVC 2012.

- State-of-the-art performance on FERET and LFW databases

Lei Z, Pietikäinen M & Li SZ (2013) Learning discriminant face descriptor. IEEE Trans. Pattern Analysis and Machine Intelligence, accepted.


Research on Trusted biometrics under spoofing attacks (TABULA RASA) 2010-2014

(http://www.tabularasa-euproject.org/)

• The project will address some of the issues of direct (spoofing) attacks on trusted biometric systems. This is an issue that needs to be addressed urgently because it has recently been shown that conventional biometric techniques, such as fingerprints and face, are vulnerable to direct (spoof) attacks.

• Coordinated by IDIAP, Switzerland

• We will focus on face and gait recognition

• Without countermeasures biometric systems are vulnerable to spoofing attacks

– Recognition algorithms try to identify/verify instead of checking whether the biometric input is genuine

• Especially true for face biometrics – falsifying biometric data is easy

– Social media (Facebook, etc.)

– Hard to hide your face in public

– Paper prints, video displays

Problem description


Database examples

Typical countermeasures

• Dedicated sensors (infrared, multispectral & 3D imaging)

– Extra hardware needed – not always possible to deploy

• Multi-modal – hard to spoof multiple traits

– Robustness depends also on fusion rules

• Available sensors – same data used for authentication

– Non-intrusive but no reliable solutions available

• Challenge-response

– Interaction increases both security and intrusiveness

• Costs/intrusiveness and security level go hand in hand


Facial texture analysis – appearance and dynamics

• Proximity between the spoofing medium and the camera (or the use of high-resolution imaging) may

– cause the recaptured face image to be out of focus (overall blur)

– reveal other facial texture quality issues, e.g. due to the display medium used or printing defects

Finding differences in texture patterns using LBP

• Dynamic visual cues that are based on the motion patterns of either

– a genuine human face (e.g. eye blinking, mouth movements, facial expressions), or

– the display medium used (e.g. characteristic reflections of planar objects, distorted motion of warped photos, excessive motion if a hand-held attack is performed relatively close to the camera)

Static and dynamic texture analysis using LBP-TOP

High performance in detecting 2D face spoofing attacks

• Our results were significantly better than those obtained with earlier methods

• LBP was very powerful, discriminating printing artifacts and differences in light reflection

Määttä J, Hadid A & Pietikäinen M (2011) Face spoofing detection from single images using micro-texture analysis. Proc. International Joint Conference on Biometrics (IJCB 2011).


LBP-based method for spoofing attack detection

Facial texture analysis – appearance and dynamics


Facial texture analysis – appearance and dynamics

Gender classification using LBP and contrast

• concatenating LBP sign and contrast histograms: LBP_C

Ylioinas J, Hadid A & Pietikäinen M (2011) Combining contrast information and local binary patterns for gender classification. In: Image Analysis, SCIA 2011 Proceedings, Lecture Notes in Computer Science, 6688:676-686


Age classification using LBP variants

• concatenating LBP sign and magnitude histograms: CLBP_S_M

Ylioinas J, Hadid A & Pietikäinen M (2012) Age classification in unconstrained conditions using LBP variants. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan.

Research on recognizing facial expressions and affect


Facial expression recognition from videos

Determine the emotional state of the face

• Regardless of the identity of the face

Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.

Facial Expression Recognition

Mug Shot Dynamic Information

Action Units Prototypic Emotional Expressions

Psychological studies [Bassili 1979] have demonstrated that humans do a better job in recognizing expressions from dynamic images as opposed to the mug shot.


(a) Non-overlapping blocks (9 x 8) (b) Overlapping blocks (4 x 3, overlap size = 10)

(a) Block volumes (b) LBP features (c) Concatenated features for one block volume from three orthogonal planes with the appearance and motion
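The concatenation of features from three orthogonal planes can be sketched for one block volume as follows. This is a simplified illustration rather than the authors' implementation: it uses the basic 8-neighbour, radius-1 LBP without interpolation, processes every slice of each plane independently, and the helper names `lbp_image` and `lbp_top_histogram` are hypothetical.

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D array."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (nb >= c).astype(np.int64) << bit
    return code

def lbp_top_histogram(volume):
    """Concatenate normalized LBP histograms from the XY, XT and YT planes.

    volume has shape (T, H, W); every dimension must be at least 3.
    """
    v = np.asarray(volume)
    planes = [
        [v[t] for t in range(v.shape[0])],        # XY: appearance
        [v[:, y, :] for y in range(v.shape[1])],  # XT: horizontal motion
        [v[:, :, x] for x in range(v.shape[2])],  # YT: vertical motion
    ]
    hists = []
    for slices in planes:
        hist = np.zeros(256)
        for s in slices:
            hist += np.bincount(lbp_image(s).ravel(), minlength=256)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)

rng = np.random.default_rng(0)
feature = lbp_top_histogram(rng.integers(0, 256, size=(5, 6, 7)))
constant = lbp_top_histogram(np.full((4, 4, 4), 7))
```

For a face or mouth region divided into several block volumes, the 768-bin vectors of all blocks would be concatenated in turn.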

Database

Cohn-Kanade database:

• 97 subjects

• 374 sequences

• Age from 18 to 30 years

• Sixty-five percent were female, 15 percent were African-American, and three percent were Asian or Latino.


Happiness Anger Disgust

Sadness Fear Surprise

Demo for facial expression recognition

Low resolution

No eye detection

Translation, in-plane and out-of-plane rotation, scale

Illumination change

Robust with respect to errors in face alignment


Principal appearance and motion from boosted spatiotemporal descriptors

Multiresolution features => Learning for pairs => Slice selection

1) Use of different numbers of neighboring points when computing the features in XY, XT and YT slices

2) Use of different radii which can catch the occurrences in different space and time scales

Zhao G & Pietikäinen M (2009) Boosted multi-resolution spatiotemporal descriptors for facial expression recognition. Pattern Recognition Letters 30(12):1117-1127.

3) Use of blocks of different sizes to have global and local statistical features

The first two resolutions focus on the pixel level in feature computation, providing different local spatiotemporal information; the third one focuses on the block or volume level, giving more global information in space and time dimensions.


Selected 15 most discriminative slices

Proposing the first method to recognize expressions in weak illumination using near-infrared videos

Visible light (VL): 0.38–0.75 μm; near infrared (NIR): 0.7–1.1 μm

Zhao G, Huang X, Taini M, Li SZ & Pietikäinen M (2011) Facial expression recognition from near-infrared videos. Image and Vision Computing 29(9):607-619.


On-line facial expression recognition from NIR videos

• NIR web camera allows expression recognition in near darkness.

• Image resolution 320 × 240 pixels.

• 15 frames used for recognition.

• Distance between the camera and subject around one meter.

Start sequences Middle sequences End sequences

Towards recognizing spontaneous expressions: a component-based spatiotemporal feature descriptor

• Facial Interest Components

– Forehead

– Eyes

– Nose

– Mouth

• Active Shape Model for dealing with occlusion and small view changes


Component-based approaches

Boosted spatiotemporal LBP-TOP features are extracted from areas centered at fiducial points (detected by ASM) or larger areas

• more robust to changes of pose, occlusions

• can be used for analyzing action units [Jiang et al, FG 2011]

Huang X, Zhao G, Pietikäinen M & Zheng W (2010) Dynamic facial expression recognition using boosted component-based spatiotemporal features and multi-classifier fusion. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2010 Proceedings, Lecture Notes in Computer Science, 6475:312-322.

• Real-world challenge (Occlusion)

• Existing occlusion detection methods (on static images)

X. Huang, G. Zhao, W. Zheng, M. Pietikäinen. Towards a dynamic expression recognition under facial occlusion. Pattern Recognition Letters.

Towards a dynamic expression recognition under facial occlusion


Framework (CFD-OD-WL)

Experimental results


Recognition of spontaneous micro-expressions

Posed Spontaneous


Recognition of spontaneous micro-expressions

Pfister T, Li X, Zhao G & Pietikäinen M (2011) Recognising spontaneous facial micro-expressions. Proc. International Conference on Computer Vision (ICCV 2011), 1449-1456.

Idea

• Read hidden emotions in faces


Potential applications

Reveal lies

Find right price

Our method

Price negotiation

Interrogation

Facial micro-expressions - method

Figure: An example of a facial micro-expression (top-left) being interpolated through graph embedding (top-right); the result from which spatiotemporal local texture descriptors are extracted (bottom-right), enabling recognition using multiple kernel learning.


Figure: Normalisation in space domain to a model face. (a) is the model face onto which the feature points in the example face are mapped. (b) is the example face with its feature points detected. (c) shows the image after the feature points of the example face have been mapped to the model face.

•High speed camera

•Emotional movie clips

•Participant

•Researcher


Movie to induce FEAR emotion: 1/10 Speed

1/10 Speed

Movie to induce HAPPY emotion:


1/10 Speed

Movie to induce SURPRISE emotion:

Recognition of emotions from multiple modalities


Multimodal recognition of emotions

Eye detection Facial representation by LBP

Classification by frame-by-frame method

Kortelainen J, Huang X, Li X, Laukka S, Pietikäinen M & Seppänen T (2012) Multimodal emotion recognition by combining physiological signals and facial expressions: a preliminary study. Proc. the 34th Annual Int’l Conf. IEEE Engineering in Medicine and Biology Society (EMBC'12), San Diego, CA.

Demo

Positive Neutral

Negative


Visual recognition of spoken phrases

Visual speech information plays an important role in speech recognition under noisy conditions or for listeners with hearing impairment.

A human listener can use visual cues, such as lip and tongue movements, to enhance the level of speech understanding.

The process of using the visual modality is often referred to as lipreading, i.e. making sense of what someone is saying by watching the movement of their lips.

The McGurk effect [McGurk and MacDonald 1976] demonstrates that inconsistency between audio and visual information can result in perceptual confusion.

Zhao G, Barnard M & Pietikäinen M (2009). Lipreading with local spatio-temporal descriptors. IEEE Transactions on Multimedia 11(7):1254-1265.

System overview

Our system consists of three stages.

• First stage: face and eye detectors, and the localization of mouth.

• Second stage: extracts the visual features.

• Last stage: recognizes the input utterance.


Local spatiotemporal descriptors for visual information

• (a) Volume of utterance sequence

• (b) Image in XY plane (147x81)

• (c) Image in XT plane (147x38) at y = 40

• (d) Image in TY plane (38x81) at x = 70
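The three views listed above correspond to simple slices of the utterance volume. The sketch below assumes the volume is stored as a NumPy array indexed (T, Y, X) with 38 frames of 147x81 (width x height) mouth images; the array layout and variable names are assumptions for illustration. Note that NumPy reports shapes as (rows, columns), so the XY and XT shapes appear transposed relative to the width-first sizes on the slide.

```python
import numpy as np

# Utterance volume: 38 frames, each 81 pixels high and 147 pixels wide (zeros as placeholder data).
volume = np.zeros((38, 81, 147))

xy_image = volume[0]          # one frame: spatial appearance
xt_image = volume[:, 40, :]   # row y = 40 followed over time
ty_image = volume[:, :, 70]   # column x = 70 followed over time

shapes = (xy_image.shape, xt_image.shape, ty_image.shape)
```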

Overlapping blocks (1 x 3, overlap size = 10).

LBP-YT images

Mouth region images

LBP-XY images

LBP-XT images

• Features in each block volume.

• Mouth movement representation.


Experiments

Database:

Our own visual speech database: OuluVS Database

20 persons, each uttering ten everyday greetings one to five times.

In total, 817 sequences from 20 speakers were used in the experiments.

C1 “Excuse me” C6 “See you”

C2 “Good bye” C7 “I am sorry”

C3 “Hello” C8 “Thank you”

C4 “How are you” C9 “Have a good time”

C5 “Nice to meet you” C10 “You are welcome”

Experimental results - OuluVS database

•Mouth regions from the dataset.

•Speaker-independent:

[Plot: recognition results (%) versus phrase index (C1–C10) for three settings: 1x5x3 block volumes; 1x5x3 block volumes (features just from XY plane); 1x5x1 block volumes]


Selected 15 slices for phrases “See you” and “Thank you”.

Selected 15 slices for phrases “Excuse me” and “I am sorry”.

• The phrases “See you” and “Thank you” were the most difficult to recognize because they are quite similar in the latter part, which contains the same word “you”.

• The selected slices are mainly in the first and second part of the phrase.

• The phrases “Excuse me” and “I am sorry” are different throughout the whole utterance, and the selected features also come from the whole pronunciation.

Selecting 15 most discriminative features

Demo for visual speech recognition


LBP-TOP with video normalization

With normalization, nearly 20% improvement in speaker-independent recognition is obtained

Zhou Z, Zhao G & Pietikäinen M (2011) Towards a practical lipreading system. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 137-144.

Visual speaker identification with spatiotemporal directional features

• Problem: recognizing who is speaking on the basis of individual information included in speech signals and lip movements.

• Audio signal

• Video signal

geometric feature

features from the gray level data directly

• Audio + Video

Zhao G & Pietikäinen M (2013) Visual speaker identification with spatiotemporal directional features. Proc. International Conference on Image Analysis and Recognition.


From XM2VTS database: “Joe took fathers green shoe bench out”

An intuitive understanding of features related to lip movement: different people have different articulatory styles when speaking the same utterance.

• Discrete Cosine Transform (DCT)

• Principal Component Analysis (PCA)

• EdgeMap LBP

• LOCP-TOP: local ordinal contrast pattern with three orthogonal planes

1) they did not consider the directional features;

2) it only used binary code (sign) information;

3) the correlation of pixels was not eliminated.


Encoding Spatiotemporal Directional Changes

Combining Sign and Magnitude Information


Decorrelation

Then we use G(x) for independent sign and magnitude feature calculation:

• XM2VTS Database

• 295 subjects in four sessions, spaced monthly.

• The first recording per session of the sentence “Joe took fathers green shoe bench out” was used for this research.

Experiments


Demo: Affective human-robot interaction


Part 5: Summary and some future directions

Summary

• Modern texture operators form a generic tool for computer vision

• LBP and its variants are very effective for various tasks in computer vision

• The advantages of the LBP and its variants include

- computationally very simple

- can be easily tailored to different types of problems

- robust to illumination variations

- robust to localization errors

• LPQ is also insensitive to image blurring

• New LBP variants are still emerging

• Complementary descriptors are often used

- LBP/C, CLBP, Gabor&LBP, SIFT&LBP, HOG&LBP, LPQ&LBP


Some future directions

• New LBP variants are still emerging

• Often a single descriptor is not effective enough

• Multi-scale processing

• Use of complementary descriptors

- CLBP, Gabor&LBP, Gradient/Orientation&LBP, SIFT&LBP, HOG&LBP, LPQ&LBP

• Combining local with more global information (e.g. LBP & Gabor)

• Combining texture and color - or intensity and depth

• Combining sparse (e.g., SIFT, HOG) and dense descriptors (LBP)

• Machine learning for finding the most effective descriptors for a given problem

• Dynamic textures offer a new approach to motion analysis

- general constraints of motion analysis (i.e. scene is Lambertian, rigid and static) can be relaxed

The success of LBP approach is growing

• Increasing number of citations

• Used in various problems, applications, and industrial products

• First book on LBP published in 2011 (Springer)

• First LBP workshop was held in conjunction with ACCV 2012

• First special issue of a journal on LBP appearing in 2013

• First edited book on LBP will be published in 2013 (Springer)

For downloads, see http://www.cse.oulu.fi/CMV/Downloads


A book on local binary patterns published in 2011

Thanks!