
Image and video analysis with local binary pattern variants

Matti Pietikäinen, Guoying Zhao

Center for Machine Vision Research

University of Oulu, Finland

http://www.cse.oulu.fi/CMV

Part 1: Introduction to local binary patterns in spatial

and spatiotemporal domains


Texture is an important characteristic of images and videos

Transformation   Pattern     Contrast
Gray scale       no effect   affects
Rotation         affects     no effect
Zoom in/out      affects     ?

LBP in spatial domain

Surface texture is a two-dimensional phenomenon characterized by:

• spatial structure (pattern)

• contrast ('amount' of texture)

Thus,

1) contrast is of no interest in gray-scale invariant analysis

2) often we need a gray-scale and rotation invariant pattern measure


Local Binary Pattern and Contrast operators

An example of computing LBP and C in a 3x3 neighborhood:

example      thresholded    weights
6 5 2        1 0 0          1    2    4
7 6 1        1   0          128       8
9 8 7        1 1 1          64   32   16

LBP = 1 + 16 + 32 + 64 + 128 = 241

Pattern = 11110001

C = (6+7+8+9+7)/5 - (5+2+1)/3 = 4.7

Important properties:

• LBP is invariant to any monotonic gray level change

• computational simplicity

- 1778 citations in Google Scholar

- 3rd most cited paper of the PR journal published since 1996

Ojala T, Pietikäinen M & Harwood D (1996) A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29:51-59.

- arbitrary circular neighborhoods
- uniform patterns

- multiple scales

- rotation invariance

- gray scale variance as contrast measure

Multiscale LBP

- 3213 citations in Google Scholar

- 4th most cited paper of the top-ranking IEEE PAMI journal since 2002

- the most cited Finnish paper in ICT area published since 2002

Ojala T, Pietikäinen M & Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971-987 (an early version at ECCV 2000).


An example of computing the LBP code:

1. Sample a 3x3 neighborhood (e.g. center 70 with neighbors 51, 70, 62, 83, 65, 78, 47, 80).
2. Difference: subtract the center from each neighbor (-19, 0, -8, 13, -5, 8, -23, 10).
3. Threshold the differences at zero to get a binary pattern.
4. Multiply by powers of two and sum:

1*1 + 1*2 + 1*4 + 1*8 + 0*16 + 0*32 + 0*64 + 0*128 = 15

An example of LBP image and histogram
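The four steps can be applied at every pixel to produce an LBP image whose histogram describes the region; a minimal NumPy sketch (3x3 neighborhood, with an assumed clockwise weight order):

```python
import numpy as np

def lbp_image(img):
    """3x3 LBP code image (border pixels skipped); weights 1..128 are
    assigned clockwise from the top-left neighbor (an assumed order)."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out += (neighbor >= center).astype(np.int32) * (1 << p)
    return out

def lbp_histogram(img):
    """Normalized 256-bin histogram of the LBP codes."""
    hist = np.bincount(lbp_image(img).ravel(), minlength=256)
    return hist / hist.sum()

rng = np.random.default_rng(0)
hist = lbp_histogram(rng.integers(0, 256, size=(64, 64)))
```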


MACHINE VISION GROUP

Let’s define texture T as the joint distribution of gray levels

gc and gp (p=0,…, P-1):

T = t(gc, g0, …, gP-1)

Texture at gc is modeled using a local neighborhood of radius R, which is sampled at P (8 in the example) points:

(Figure: a square neighborhood and a circular neighborhood of radius R around gc, with samples g0, ..., g7; in the circular neighborhood g1, g3, g5 and g7 are interpolated.)

Foundations for LBP: Description of local image texture


Without losing information, we can subtract gc from gp:

T = t(gc, g0 - gc, ..., gP-1 - gc)

Assuming gc is independent of gp - gc, we can factorize the above:

T ~ t(gc) t(g0 - gc, ..., gP-1 - gc)

t(gc) describes the overall luminance of the image, which is unrelated to local image texture, hence we ignore it:

T ~ t(g0 - gc, ..., gP-1 - gc)

The above expression is invariant w.r.t. gray-scale shifts.

Description of local image texture (cont.)



Exact independence of t(gc) and t(g0-gc, ..., gP-1-gc) is not warranted in practice:

(Figure: average t(gc, g0-gc), and average absolute difference between t(gc, g0-gc) and t(gc) t(g0-gc), pooled (G=16) from 32 Brodatz textures [Ojala, Valkealahti, Oja & Pietikäinen: Pattern Recognition 2001].)

Description of local image texture (cont.)


Invariance w.r.t. any monotonic transformation of the gray scale is achieved by considering the signs of the differences:

T ~ t(s(g0-gc), ..., s(gP-1-gc))

where

s(x) = 1, if x >= 0
       0, if x < 0

The above is transformed into a unique P-bit pattern code by assigning a binomial factor 2^p to each sign s(gp-gc):

LBP_P,R = sum_{p=0}^{P-1} s(gp - gc) 2^p

LBP: Local Binary Pattern



'Uniform' patterns

(Figure: 'uniform' patterns with U=0 and U=2, and examples of 'nonuniform' patterns with U=4, U=6 and U=8, for P=8.)

MACHINE VISION GROUP

Texture primitives ("micro-textons") detected by the uniform patterns of LBP (in the figure, 1 = black, 0 = white).


Uniform patterns:

- bit patterns with 0 or 2 transitions (0→1 or 1→0) when the pattern is considered circular
- all non-uniform patterns are assigned to a single bin
- 58 uniform patterns in the case of 8 sampling points
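The uniformity measure U can be checked by counting circular bit transitions; this sketch verifies the 58-pattern count for P = 8:

```python
def transitions(code, P=8):
    """U value: number of circular 0/1 transitions in a P-bit code."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))

def is_uniform(code, P=8):
    """A pattern is 'uniform' if it has at most 2 transitions."""
    return transitions(code, P) <= 2

uniform_codes = [c for c in range(256) if is_uniform(c)]
```

The count 58 decomposes as two constant patterns (U = 0) plus 8 * 7 = 56 patterns with exactly two transitions.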


Rotation of Local Binary Patterns

Spatial rotation of the binary pattern changes the LBP code:

(Figure: an 'edge' pattern with code 15 passes through codes 30, 60, 120, 240, 225, 195 and 135 as it rotates, returning to 15.)


Rotation invariant local binary patterns

Formally, rotation invariance can be achieved by defining:

LBP_P,R^ri = min{ ROR(LBP_P,R, i) | i = 0, ..., P-1 }

(Figure: the codes 15, 30, 60, 120, 240, 225, 195 and 135 all map to 15.)
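The min-over-rotations mapping can be sketched directly from the definition, with ROR as a circular right shift over P bits:

```python
def ror(code, i, P=8):
    """Circular bitwise right shift of a P-bit code by i steps."""
    mask = (1 << P) - 1
    return ((code >> i) | (code << (P - i))) & mask

def lbp_ri(code, P=8):
    """Rotation-invariant LBP: minimum over all circular rotations."""
    return min(ror(code, i, P) for i in range(P))

rotated_edges = [15, 30, 60, 120, 240, 225, 195, 135]
```

All eight rotated versions of the 'edge' pattern map to the same code 15.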

Operators for characterizing texture contrast

Local gray-level variance can be used as a contrast measure:

VAR_P,R = (1/P) sum_{p=0}^{P-1} (gp - m)^2, where m = (1/P) sum_{p=0}^{P-1} gp

VAR_P,R is

• invariant w.r.t. gray-scale shifts (but, unlike LBP, not w.r.t. arbitrary monotonic transformations)

• invariant w.r.t. rotation along the circular neighborhood

Usually, using complementary contrast leads to better performance than using LBP alone.
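A sketch of VAR_P,R over the sampled neighbors, checking both of its invariances:

```python
import numpy as np

def var_pr(neighbors):
    """VAR_{P,R}: variance of the P circularly sampled gray values."""
    g = np.asarray(neighbors, dtype=float)
    m = g.mean()                       # m = (1/P) * sum_p g_p
    return float(((g - m) ** 2).mean())

g = [6, 5, 2, 1, 7, 8, 9, 7]
v = var_pr(g)
shifted = var_pr([x + 50 for x in g])  # gray-scale shift
rotated = var_pr(g[3:] + g[:3])        # circular rotation of the samples
```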


Quantization of continuous feature space

Texture statistics are described with discrete histograms, so a mapping is needed for continuous-valued contrast features.

Non-uniform quantization: every bin holds the same amount of the total data, so the highest quantization resolution is used where the number of entries is largest.

(Figure: cut values dividing the total distribution into bins 0-3 of equal area.)
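Equal-area cut values can be estimated from training data with percentiles; a sketch of the non-uniform quantization:

```python
import numpy as np

def equal_frequency_cuts(values, n_bins):
    """Cut values placing (roughly) the same amount of the training
    data into each of n_bins bins."""
    interior = np.linspace(0, 100, n_bins + 1)[1:-1]  # interior quantiles
    return np.percentile(values, interior)

def quantize(values, cuts):
    """Map continuous feature values to bin indices 0..len(cuts)."""
    return np.searchsorted(cuts, values, side="right")

train = np.random.default_rng(1).exponential(size=10000)
cuts = equal_frequency_cuts(train, 4)
bins = quantize(train, cuts)
counts = np.bincount(bins, minlength=4)
```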

Estimation of empirical feature distributions

The input image (region) is scanned with the chosen operator(s), pixel by pixel, and the operator outputs are accumulated into a discrete histogram.

(Figure: a B-bin histogram of a single operator with bins 0, 1, ..., B-1; an LBP_P,R^riu2 histogram with bins 0, 1, ..., P+1; and a joint histogram of two operators, LBP_P,R^riu2 / VAR_P,R.)


Multi-scale analysis

Information provided by N operators can be combined simply by summing up operatorwise similarity scores into an aggregate similarity score:

L_N = sum_{n=1}^{N} L_n,  e.g. LBP_8,1^riu2 + LBP_8,3^riu2 + LBP_8,5^riu2

Effectively, the above assumes that distributions of individual operators are independent

LBP features can be computed directly from different levels of an image pyramid - or image regions can be re-scaled prior to feature extraction

Multiscale analysis using images at multiple scales


Nonparametric classification principle

In Nearest Neighbor classification, sample S is assigned to the class of model M that maximizes the log-likelihood statistic

L(S, M) = sum_{b=0}^{B-1} S_b ln M_b

Instead of log-likelihood statistic, chi square distance or histogram intersection is often used for comparing feature distributions.

The histograms should be normalized, e.g. to unit length, before classification if the sizes of the image windows to be analyzed can vary.

The bins of the LBP feature distribution can also be used directly as features, e.g. for SVM classifiers.
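The three measures mentioned above can be sketched for unit-normalized histograms:

```python
import numpy as np

def log_likelihood(S, M, eps=1e-12):
    """L(S, M) = sum_b S_b * ln(M_b); higher means a better match."""
    return float(np.sum(S * np.log(M + eps)))

def chi_square(S, M, eps=1e-12):
    """Chi-square distance between two histograms; lower is better."""
    return float(np.sum((S - M) ** 2 / (S + M + eps)))

def intersection(S, M):
    """Histogram intersection similarity; higher is better."""
    return float(np.minimum(S, M).sum())

S = np.array([0.5, 0.3, 0.2])
M1 = np.array([0.5, 0.3, 0.2])   # identical model
M2 = np.array([0.1, 0.1, 0.8])   # different model
```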

LBP unifies statistical and structural approaches


Spatiotemporal LBP

- 449 citations in Google Scholar

Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.

Dynamic textures (R Nelson & R Polana: IUW, 1992; M Szummer & R

Picard: ICIP, 1995; G Doretto et al., IJCV, 2003)


Volume Local Binary Patterns (VLBP)

(Figure: VLBP computation steps — sampling in the volume, thresholding, multiplying by weights, and forming the pattern.)

LBP from Three Orthogonal Planes (LBP-TOP)

(Figure: feature vector length as a function of P, the number of neighboring points; the VLBP histogram grows much faster than the concatenated LBP (LBP-TOP) histogram.)

(Figure: the XY, XT and YT slices of an xyt volume, from which the three histograms of LBP-TOP are computed.)

LBP-TOP
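A much-simplified sketch of the LBP-TOP idea: 3x3 LBP histograms are computed on slices from the XY, XT and YT planes and concatenated. A real implementation accumulates over all planes of the volume; here only one central slice per axis is used:

```python
import numpy as np

def lbp3x3(img):
    """3x3 LBP code image (assumed clockwise weight order)."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.int32)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    c = img[1:-1, 1:-1]
    for p, (dy, dx) in enumerate(offsets):
        out += (img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx] >= c).astype(np.int32) * (1 << p)
    return out

def lbp_top(volume):
    """Concatenate normalized LBP histograms from XY, XT and YT slices."""
    t, y, x = volume.shape
    slices = [volume[t // 2, :, :],   # XY plane (one frame)
              volume[:, y // 2, :],   # XT plane (one row over time)
              volume[:, :, x // 2]]   # YT plane (one column over time)
    hists = []
    for s in slices:
        h = np.bincount(lbp3x3(s).ravel(), minlength=256)
        hists.append(h / h.sum())
    return np.concatenate(hists)

volume = np.random.default_rng(3).integers(0, 256, size=(10, 12, 14))
features = lbp_top(volume)
```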

Results of LBP from three planes

LBP            Type   XY     XZ     YZ     Con    Weighted
8,8,8,1,1,1    riu2   88.57  84.57  86.29  93.14  93.43 [2,1,1]
8,8,8,1,1,1    u2     92.86  88.86  89.43  94.57  96.29 [4,1,1]
8,8,8,1,1,1    Basic  95.14  90.86  90.00  95.43  97.14 [5,1,2]
8,8,8,3,3,3    Basic  90.00  91.17  94.86  95.71  96.57 [1,1,4]
8,8,8,3,3,1    Basic  89.71  91.14  92.57  94.57  95.71 [2,1,8]

Rotation revisited

Rotation of an image by α degrees
• translates each local neighborhood to a new location
• rotates each neighborhood by α degrees

If α = 45°, local binary patterns change:
• 00000001 → 00000010,
• 00000010 → 00000100, ...,
• 11110000 → 11100001, ...

Similarly, if α = k*45°, each pattern is circularly rotated by k steps.

LBP histogram Fourier features

Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4):1465-1467.


Rotation revisited (2)

In the uniform LBP histogram, rotation of input image by

k*45° causes a cyclic shift by k along each row:

LBP Histogram Fourier Features

(Figure: the LBP-HF feature vector is built from Fourier magnitudes computed along each row of the uniform LBP histogram.)

H(n, u) = sum_{r=0}^{P-1} h_I(U_P(n, r)) e^{-i 2 pi u r / P}

|H(n, u)|^2 = H(n, u) H(n, u)*

where h_I is the uniform-pattern histogram of image I and U_P(n, r) denotes the uniform pattern with n ones rotated by r steps.
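Because rotating the input cyclically shifts each histogram row, DFT magnitudes along the rows are rotation invariant; a sketch with a toy histogram row:

```python
import numpy as np

def lbp_hf(uniform_hist_rows):
    """LBP-HF sketch: Fourier magnitudes along each row of the uniform
    LBP histogram (each row holds the P rotations of one pattern)."""
    rows = np.asarray(uniform_hist_rows, dtype=float)
    return np.abs(np.fft.fft(rows, axis=1)).ravel()

row = np.array([[1.0, 3.0, 0.0, 2.0, 0.0, 0.0, 1.0, 0.0]])
shifted = np.roll(row, 3, axis=1)   # effect of rotating the input image
```

The magnitudes for the original and the shifted row are identical, while the rows themselves differ.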


Example

(Figure: an input image and its uniform LBP histogram; features from the original rotation-invariant LBP (red) compared with LBP histogram Fourier (blue).)

Rotation Invariant LBP-TOP

Computation of LBP-TOP for “watergrass” with 0 (left) and 60

(right) degrees rotation.


Rotated planes from which LBP is computed.

One Dimensional Histogram Fourier LBP-TOP

(1DHFLBP-TOP)

LBP histograms for uniform patterns in different rotated motion planes with PXY = 8 and PT = 8.


Two Dimensional Histogram Fourier LBP-TOP

(2DHFLBP-TOP)

Uniform pattern before mirror (left)

and after mirror (right).

Examples of LBP patterns with two "1"s in different rotated motion planes with PXY = 8 and PT = 8.

DynTex database

Images after rotating by 15 degree intervals.


A nine class dataset with the classes being boiling water (8), fire (8), flowers (12), fountains (20), plants (108), sea (12), smoke (4), water (12) and waterfall (16).

P. Saisan, G. Doretto, Y.N. Wu, and S. Soatto, “Dynamic texture recognition,” CVPR, pp. 58-63, 2001.

A. Ravichandran, R. Chaudhry, and R. Vidal, "View-invariant dynamic texture recognition using a bag of dynamical systems," CVPR, pp. 1-6, 2009.


Part 2: Examples of applications

Unsupervised texture segmentation

LBP/C was used as the texture operator.

The segmentation algorithm consists of three phases:

1) hierarchical splitting
2) agglomerative merging
3) pixelwise classification

- 277 citations in Google Scholar

Ojala T & Pietikäinen M (1999) Unsupervised texture segmentation using feature distributions. Pattern Recognition 32:477-486.


Segmentation examples

Natural scene #2: 192x192 pixels

Natural scene #1: 384x384 pixels

Case project: Texture analysis of tissues with virtual microscopy

• Joint research with Adj.Prof. Johan Lundin, FIMM, University of Helsinki

• Texture analysis is becoming a very important tool in microscopy

• The objective is to study and develop automated texture-based classifiers for breast and other cancer tissues that, together with clinical and molecular-genetic data, can distinguish patients with aggressive disease from those with indolent disease. Image analysis tools for the diagnosis of malaria are also studied.

(Figure: the pipeline — 1. imaging, 2. stitching, 3. virtual slide, 4. texture analysis with the Local Binary Pattern operator, 5. classifier outputting class probabilities, e.g. 90% / 8% / 2%.)


Linder et al. (2012) Identification of tumor epithelium and stroma in tissue

microarrays using texture analysis, Diagnostic Pathology, 2012, 7:22.

(Figure: examples A-E of tumor epithelium and stroma regions analyzed with LBP.)

In experiments with colorectal cancer microscopy images, over 99% accuracy was obtained with LBP/C features and an SVM classifier.

Case: Automatic landscape mode detection

• The aim was to develop and implement an algorithm that automatically classifies images into landscape and non-landscape categories
  – The analysis is based solely on the visual content of the images.
• The main criterion is an accurate but computationally light solution capable of real-time operation.

Huttunen S, Rahtu E, Heikkilä J, Kunttu I & Gren J (2011) Real-time detection of landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.


Landscape vs. non-landscape

• The definition of landscape and non-landscape images is not straightforward
  – If there are no distinct and easily separable objects present in a natural scene, the image is classified as landscape
  – The non-landscape branch consists of indoor scenes and other images containing man-made objects at relatively close distance

Data set

• The images used for training and testing were downloaded from the PASCAL Visual Object Classes (VOC2007) database and the Flickr site
  – All images were manually labeled and resized to QVGA (320x240).
• Training: 1115 landscape and 2617 non-landscape images
• Testing: 912 and 2140, respectively


The approach

• A simple global image representation based on local binary pattern (LBP) histograms is used

• Two variants:

– Basic LBP

– LBP In+Out

• SVM classifier

(Figure: the pipeline — feature extraction, histogram computation, and SVM classifier training.)

Classification results


Summary of the results

Real-time implementation

• The current real-time implementation, coded in C, relies on the basic LBP

• Performance analysis
  – Windows PC with Visual Studio 2010 Profiler: total execution time for one frame about 3 ms
  – Nokia N900 with FCam: about 30 ms


Demo videos

Huttunen S, Rahtu E, Kunttu I, Gren J & Heikkilä J (2011) Real-time detection of landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.

Modeling the background and detecting moving objects

- 507 citations in Google Scholar

Heikkilä M & Pietikäinen M (2006) A texture-based method for modeling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(4):657-662.


Roughly speaking, background subtraction can be seen as a two-stage process:

Background modeling: the goal is to construct and maintain a statistical representation of the scene that the camera sees.

Foreground detection: the input frame is compared with the current background model; the areas of the input frame that do not fit the background model are considered foreground.

Overview of the approach…

We use an LBP histogram computed over a circular region around the pixel as the feature vector.

The history of each pixel over time is modeled as a group of K weighted LBP histograms: {x1,x2,…,xK}.

The background model is updated with the information of each new video frame, which makes the algorithm adaptive.

The update procedure is identical for each pixel.



Examples of detection results

Detection results for images of Toyama et al. (ICCV 1999)


Demo for detection of moving objects

LBP in multi-object tracking

Takala V & Pietikäinen M (2007) Multi-object tracking using color, texture, and motion. Proc. Seventh IEEE International Workshop on Visual Surveillance (VS 2007), Minneapolis, USA, 7 p.


Recognition of human actions

Kellokumpu V, Zhao G & Pietikäinen M (2011) Recognition of human actions using texture descriptors. Machine Vision and Applications 22(5):767-780.

Dynamic textures for action recognition

• Illustration of the xyt volume of a person walking, with its xt and yt slices (figure)


Dynamic textures for action recognition

• Formation of the feature histogram for an xyt volume of short duration

• HMM is used for sequential modeling

Feature histogram of a bounding volume

Action classification results - KTH

(Table: confusion matrix for the six KTH classes Box, Clap, Wave, Jog, Run and Walk; the diagonal entries are .980, .855, .860, .977, .987 and .967.)

• Classification accuracy 93.8% using image data


Recognition of natural actions

Chen J, Zhao G, Kellokumpu V & Pietikäinen M (2011) Combining sparse and dense descriptors with temporal semantic structures for robust human action recognition. Proc. ICCV Workshops (VECTaR2011), Barcelona, Spain.

What did we do

• complex human action recognition


What is the contribution

• use a simple LBP descriptor to divide each video into motion segments
• representation: (HOG/HOF)+LBP

(Figure: HOG, HOF and LBP features combined into the (HOG/HOF)+LBP representation.)


Experimental results

• Datasets – OSD from YouTube.

• 16 sport classes: high jump, long jump, triple jump, pole vault, discus throw, hammer throw, javelin throw, shot put, basketball layup, bowling, tennis serve, platform (diving), springboard (diving), snatch (weightlifting), clean and jerk (weightlifting) and vault (gymnastics).

– HOHA

• from thirty-two movies, e.g. "American Beauty", "Big Fish", "Forrest Gump". It contains eight types of actions: "AnswerPhone", "GetOutCar", "HandShake", "HugPerson", "Kiss", "SitDown", "SitUp", and "StandUp".

– KTH

• 2,396 video sequences covering 6 types of actions performed by 25 actors

To what extent

Method                      Olympic Sports Dataset (OSD)   Hollywood Human Action dataset 2
Niebles et al., ECCV 2010   72.1                           N/A
Laptev et al., CVPR 2008    62.1                           38.4
Ours                        80.0                           53.3


Video texture synthesis

Video texture synthesis is the process of providing a continuous and infinitely varying stream of frames, similar to the example texture.

Two major stages:

- Video stitching;

- Transition smoothing.

Guo Y, Zhao G, Zhou Z & Pietikäinen M (2013) Video texture synthesis with multi-frame LBP-TOP and diffeomorphic growth model. IEEE Transactions on Image Processing, in press.

Video stitching

• To generate a similar-looking video of infinite length, taking a short video of finite length as input.

(Figure: a short video clip is extended into an endless video.)

A video clip captures dynamic behavior for a specific period of time; video texture synthesis depicts the timeless quality of the phenomenon in general.


Video stitching


Experiments

• The DynTex database is a diverse collection of high-quality dynamic texture videos. The latest version contains 650 sequences of dynamic textures.
• Samples from other databases and the Internet are also used for evaluation, covering different types of dynamic textures ranging from natural scenes to human motion.

http://www.cse.oulu.fi/CMV/DemosAndVideos/DTwebpage

A typical example illustrating virtual frames. Compared to the original image sequence (first row), the image sequence containing virtual frames (second row) is also smooth and visually consistent. The frames in green squares are real frames.

Original

Synthesized


Part 3: A brief review of recent LBP variants

Description of interest regions with center-symmetric LBPs

(Figure: a circular neighborhood of eight pixels n0, ..., n7 around the center nc.)

LBP = s(n0 - nc) 2^0 + s(n1 - nc) 2^1 + s(n2 - nc) 2^2 + s(n3 - nc) 2^3 + s(n4 - nc) 2^4 + s(n5 - nc) 2^5 + s(n6 - nc) 2^6 + s(n7 - nc) 2^7

CS-LBP = s(n0 - n4) 2^0 + s(n1 - n5) 2^1 + s(n2 - n6) 2^2 + s(n3 - n7) 2^3

- 294 citations in Google Scholar

- Most cited paper of PR journal published since 2009

Heikkilä M, Pietikäinen M & Schmid C (2009) Description of interest regions with local binary patterns. Pattern Recognition 42(3):425-436.
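A sketch of the CS-LBP code for one 3x3 patch. The n0...n7 positions are an assumed layout, and a small non-zero threshold is used, as in robust LBP:

```python
import numpy as np

def cs_lbp(patch, threshold=0.01):
    """CS-LBP code of one 3x3 patch: compares the four center-symmetric
    neighbor pairs, giving a 4-bit code (16 bins instead of 256)."""
    p = np.asarray(patch, dtype=float)
    # Assumed pair layout: right/left, the two diagonals, top/bottom.
    pairs = [(p[1, 2], p[1, 0]),   # n0 vs n4
             (p[0, 2], p[2, 0]),   # n1 vs n5
             (p[0, 1], p[2, 1]),   # n2 vs n6
             (p[0, 0], p[2, 2])]   # n3 vs n7
    code = 0
    for i, (a, b) in enumerate(pairs):
        if a - b > threshold:
            code |= 1 << i
    return code
```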


Description of interest regions

(Figure: an input region is converted into a CS-LBP feature image, from which the region descriptor is built over a spatial (x, y) grid.)


Setup for image matching experiments

CS-LBP performed better than SIFT in image matching and categorization experiments, especially for images with illumination variations.


Preprocessing prior to LBP feature extraction

- Gabor filtering for extracting more macroscopic information [Zhang et al., ICCV 2005]
- Preprocessing for illumination normalization [Tan & Triggs, AMFG 2007]
- Edge detection, used to enhance gradient information and find orientations of local patterns [Liao & Chung, ICIP 2009], or to find patterns of oriented edge magnitudes [Vu & Caplier, ECCV 2010, TIP 2012]

Neighborhood topology

A square or circular neighborhood is normally used; a circular neighborhood is important for rotation-invariant operators.

Anisotropic neighborhoods (e.g. elliptic)
- improved results in face recognition [Liao & Chung, ACCV 2007] and in medical image analysis [Nanni et al., Artif. Intell. Med. 2010]

Encoding similarities between patches of pixels [Wolf et al., ECCV 2008]
- characterizes well the topological structure of face appearance


Thresholding and encoding

• Using the mean or median of the neighborhood for thresholding
• Using a non-zero threshold [Heikkilä et al., IEEE PAMI 2006]
• Local ternary patterns - encoding by 3 values [Tan & Triggs, AMFG 2007]
• Extended quinary patterns - encoding by 5 values [Nanni et al., Artif. Intell. Med. 2010]
• Soft LBP [Ahonen & Pietikäinen, Finsig 2007]
• Scale invariant local ternary pattern [Liao et al., CVPR 2010] - for background subtraction applications

Robust LBP [Heikkilä et al., PAMI 2006]

In robust LBP, the term s(gp – gc) is replaced with s(gp – gc + a)

Allows bigger changes in pixel values without affecting thresholding results

- improved results in background subtraction [Heikkilä et al., PAMI 2006]

- was also used in CS-LBP interest region descriptor [Heikkilä et al., PR 2009]


Local ternary patterns (LTP) [Tan & Triggs, AMFG 2007]

(Figure: thresholding with a tolerance zone gives the ternary code 1100(-1)(-1)00, which is split into an upper binary code 11000000 and a lower binary code 00001100.)
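A sketch of the LTP split into upper and lower binary codes; the threshold t and the neighbor order are illustrative assumptions:

```python
def ltp_codes(center, neighbors, t=5):
    """LTP sketch: each neighbor is encoded as +1 (above center + t),
    -1 (below center - t) or 0; the ternary code is then split into
    an 'upper' and a 'lower' binary pattern."""
    upper = 0
    lower = 0
    for i, g in enumerate(neighbors):
        if g >= center + t:     # ternary value +1
            upper |= 1 << i
        elif g <= center - t:   # ternary value -1
            lower |= 1 << i
        # values within the tolerance zone contribute 0 to both codes
    return upper, lower
```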

Feature selection and learning

• Feature selection, e.g. with AdaBoost, to reduce the number of bins [Zhang et al., LNCS 2005]
• Subspace methods projecting LBP features into a lower-dimensional space [Shan et al., ICPR 2006], [Chan et al., ICB 2007]
• Learning the most dominant and discriminative patterns [Liao et al., IEEE TIP 2009], [Guo et al., ACCV 2010]


A learning-based LBP

Guo Y, Zhao G & Pietikäinen M (2012) Discriminative features for texture description. Pattern Recognition 45(10):3834-3843.

Use of complementary contrast/magnitude information

• LBP was designed as a complementary measure of local contrast, using joint LBP/C or LBP/VAR histograms
• LBPV puts the local contrast into a one-dimensional histogram [Guo et al., Pattern Recogn. 2010]
• Completed LBP (CLBP) considers complementary sign and magnitude vectors [Guo et al., IEEE TIP 2010]
• The Weber law descriptor (WLD) includes excitation and orientation components [Chen et al., IEEE PAMI 2010]


Completed LBP [Guo et al., IEEE TIP 2010]

(Figure: (a) a 3x3 sample block, (b) the local differences, (c) the sign component, (d) the magnitude component.)

Completed local binary patterns (CLBP: sign, magnitude and contrast):

Difference: d_p = g_p - g_c

Sign: u_p = 1 if d_p >= 0, and 0 if d_p < 0

Magnitude: h_p = 1 if |d_p| >= delta, and 0 if |d_p| < delta, where delta is the mean value of |d_p| over the whole image
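The sign and magnitude components can be sketched directly from the definitions; delta below is an assumed value, whereas in CLBP it is the mean |d_p| over the whole image:

```python
import numpy as np

def clbp_s_m(neighbors, center, delta):
    """CLBP sketch: sign (u_p) and magnitude (h_p) components of the
    local differences d_p = g_p - g_c."""
    d = np.asarray(neighbors, dtype=float) - center
    u = (d >= 0).astype(int)               # sign component u_p
    h = (np.abs(d) >= delta).astype(int)   # magnitude component h_p
    return u, h

neighbors = [6, 5, 2, 1, 7, 8, 9, 7]
center = 6
delta = 3.0   # assumed global mean of |d_p| for this sketch
u, h = clbp_s_m(neighbors, center, delta)
```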


WLD: Weber law descriptor

Composed of excitation and orientation components.

Chen J, Shan S, He C, Zhao G, Pietikäinen M, Chen X & Gao W (2010) WLD: A robust local image descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9):1705-1720.

Local Phase Quantization (LPQ)

- blur insensitive
- has recently received considerable attention, e.g. in the face analysis research community

Rahtu E, Heikkilä J, Ojansivu V & Ahonen T (2012) Local phase quantization for blur-insensitive image analysis. Image and Vision Computing 30(8):501–512.


Visual Recognition using Local Quantized Patterns

Sibt ul Hussain and William Triggs, European Conference on Computer Vision, Florence, Italy, 2012.

Build LUT: assign each local pattern to the cluster whose mean is closest to it:

{ x in k : ||x - mu_k|| <= ||x - mu_r||, for all 1 <= r <= K }

(Figure: a lookup table mapping the local patterns 00000000, 00000001, 00000010, ..., 11111111 to cluster indices, e.g. 17, 46, 6, ..., 6.)

Completed Local Quantized Patterns (CLQP)

Huang X, Zhao G, Hong X, Pietikäinen M & Zheng W (2013) Texture description with completed local quantized patterns. Proc. 18th Scandinavian Conference on Image Analysis (SCIA 2013).

Ylioinas J, Hadid A, Guo Y & Pietikäinen M (2012) Efficient image appearance description using dense sampling based local binary patterns, ACCV 2012.


Histogram vs. "softly voted" statistic

• In soft voting, weights are assigned based on the Hamming distance between the detected pattern and representative entries (bins) in the statistic
• Experiments on texture classification (CUReT, original and "limited-sample-size" scenarios) and face verification (LFW)
  - minor improvement with CUReT-original and LFW
  - in the "limited-sample-size" scenario the voted statistic suffered much less when using 41x41-pixel patches cropped from the original 200x200 textures (e.g. LBP(8,2) without any mapping, and with uniform and rotation-invariant mappings)

Ylioinas J, Hong X & Pietikäinen M (2013) Constructing local binary pattern statistics by soft voting. Proc. 18th Scandinavian Conference on Image Analysis (SCIA 2013).


Recent progress in LBP-based video descriptors

CLBP-TOP, based on computing CLBP in three orthogonal planes

LBP histogram Fourier features for rotation-invariant image and video description

Local monogenic binary patterns, combining monogenic signal analysis and local binary patterns

Combining LBP-TOP, WLD, and optical flow for segmentation

Pfister T, Li X, Zhao G & Pietikäinen M (2011) Differentiating spontaneous from posed facial expressions within a generic facial expression recognition framework. Proc. ICCV Workshops (SISM 2011), Barcelona, Spain, 868-875.

Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4):1465-1467

Huang X, Zhao G, Pietikäinen M & Zheng W (2012) Spatiotemporal local monogenic binary patterns for facial expression recognition. IEEE Signal Processing Letters 19(5):243-246.

Chen J, Zhao G, Salo M, Rahtu E & Pietikäinen M (2012) Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Transactions on Image Processing 22(1): 326-339.

Part 4: LBP in facial image analysis


Face description with LBP

- 1217 citations in Google Scholar

(ECCV paper: 863 citations)

- 3rd most cited paper of PAMI journal since 2006

Ahonen T, Hadid A & Pietikäinen M (2006) Face description with local binary patterns: application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12):2037-2041. (an early version published at ECCV 2004)

Weighting the regions

(Figure: the 130 x 150 face image is divided into 18 x 21-pixel blocks; the main parameters are block size, metrics and weighting. Feature vector length 2891.)
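The 2891-dimensional vector follows from a 7x7 grid of 59-bin uniform-pattern histograms (49 x 59 = 2891); a sketch of the regional concatenation, with the grid size and bin count as assumptions:

```python
import numpy as np

def spatially_enhanced_histogram(lbp_codes, grid=(7, 7), bins=59):
    """Sketch of the regional face descriptor: the LBP code image is
    divided into a grid of blocks and the per-block normalized
    histograms are concatenated (59 bins assumes the uniform-pattern
    mapping for P = 8)."""
    h, w = lbp_codes.shape
    gy, gx = grid
    feats = []
    for by in range(gy):
        for bx in range(gx):
            block = lbp_codes[by * h // gy:(by + 1) * h // gy,
                              bx * w // gx:(bx + 1) * w // gx]
            hist, _ = np.histogram(block, bins=bins, range=(0, bins))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

codes = np.random.default_rng(2).integers(0, 59, size=(130, 150))
desc = spatially_enhanced_histogram(codes)
```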


Improving the robustness of LBP-based face recognition

Local Gabor Binary Pattern Histogram Sequence [Zhang et al., ICCV 2005]

Illumination normalization by preprocessing prior to LBP/LPQ feature extraction [Tan & Triggs, AMFG 2007]

Illumination invariance using LBP with NIR imaging [S.Z. Li et al., IEEE PAMI 2007]


Hussain et al. (2012) Face recognition using local quantized patterns, BMVC 2012.

- State-of-the-art performance on FERET and LFW databases

Lei Z, Pietikäinen M & Li SZ (2013) Learning discriminant face descriptor. IEEE Trans. Pattern Analysis and Machine Intelligence, accepted.


Research on Trusted biometrics under spoofing

attacks (TABULA RASA) 2010-2014

(http://www.tabularasa-euproject.org/)

• The project will address some of the issues of direct (spoofing)

attacks to trusted biometric systems. This is an issue that needs to be

addressed urgently because it has recently been shown that

conventional biometric techniques, such as fingerprints and face, are

vulnerable to direct (spoof) attacks.

• Coordinated by IDIAP, Switzerland

• We will focus on face and gait

recognition

• Without countermeasures biometric

systems are vulnerable to spoofing

attacks

– Recognition algorithms try to identify/verify instead of checking

whether the biometric input is genuine

• Especially true for face biometrics –

falsifying biometric data is easy

– Social media (Facebook, etc.)

– Hard to hide your face in public

– Paper prints, video displays

Problem description


Database examples

Typical countermeasures

• Dedicated sensors (infrared, multispectral & 3D imaging)

– Extra hardware needed – not always possible to deploy

• Multi-modal – hard to spoof multiple traits

– Robustness depends also on fusion rules

• Available sensors – same data used for authentication

– Non-intrusive but no reliable solutions available

• Challenge-response

– Interaction increases both security and intrusiveness

• Costs/intrusiveness and security level go hand in hand


Facial texture analysis – appearance and dynamics

• Proximity between the spoofing medium and the camera (or the use of high-resolution imaging) may cause
  – the recaptured face image to be out of focus (overall blur)
  – other facial texture quality issues to be revealed, e.g. due to the display medium used or printing defects

Finding differences in texture patterns using LBP

• Dynamic visual cues based on the motion patterns of either
  – a genuine human face (e.g. eye blinking, mouth movements, facial expressions), or
  – the display medium used (e.g. characteristic reflections of planar objects, distorted motion of warped photos, excessive motion if a hand-held attack is performed relatively close to the camera)

Static and dynamic texture analysis using LBP-TOP

High performance in detecting 2D face spoofing attacks

• Our results were significantly better than those obtained with earlier methods

• LBP was very powerful, discriminating printing artifacts and differences in light reflection

Määttä J, Hadid A & Pietikäinen M (2011) Face spoofing detection from single images using micro-texture analysis. Proc. International Joint Conference on Biometrics (IJCB 2011).


LBP-based method for spoofing attack detection


Gender classification using LBP and contrast

• concatenating LBP sign and contrast histograms: LBP_C

Ylioinas J, Hadid A & Pietikäinen M (2011) Combining contrast information and local binary patterns for gender classification. In: Image Analysis, SCIA 2011 Proceedings, Lecture Notes in Computer Science, 6688:676-686


Age classification using LBP variants

• concatenating LBP sign and magnitude histograms: CLBP_S_M

Ylioinas J, Hadid A & Pietikäinen M (2012) Age classification in unconstrained conditions using LBP variants. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan.

Research on recognizing facial expressions and

affect


Facial expression recognition from videos

Determine the emotional state of the face

• Regardless of the identity of the face

Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.

Facial Expression Recognition

(Figure: mug shot vs. dynamic information; action units vs. prototypic emotional expressions.)

Psychological studies [Bassili 1979] have demonstrated that humans do a better job of recognizing expressions from dynamic images than from a mug shot.


(a) Non-overlapping blocks (9 x 8); (b) overlapping blocks (4 x 3, overlap size = 10).

(Figure: (a) block volumes, (b) LBP features, (c) concatenated features for one block volume from the three orthogonal planes, capturing appearance and motion.)

Database

Cohn-Kanade database :

• 97 subjects

• 374 sequences

• Age from 18 to 30 years

• Sixty-five percent were female, 15 percent African-American, and three percent Asian or Latino.

67

Happiness Anger Disgust

Sadness Fear Surprise

Demo for facial expression recognition

Low resolution

No eye detection

Translation, in-plane and out-of-plane rotation, scale

Illumination change

Robust with respect to errors in face alignment

68

Principal appearance and motion from boosted spatiotemporal descriptors

Multiresolution features => learning for pairs => slice selection

1) Use of different numbers of neighboring points when computing the features in the XY, XT and YT slices

2) Use of different radii, which can capture occurrences at different spatial and temporal scales

3) Use of blocks of different sizes to obtain global and local statistical features

Zhao G & Pietikäinen M (2009) Boosted multi-resolution spatiotemporal descriptors for facial expression recognition. Pattern Recognition Letters 30(12):1117-1127.

The first two resolutions focus on the pixel level in feature computation, providing different kinds of local spatiotemporal information; the third focuses on the block or volume level, giving more global information in the space and time dimensions.
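The multi-resolution pool described above can be illustrated by enumerating candidate parameter combinations; the specific values below are illustrative, not the exact settings of the paper:

```python
# Enumerating a multi-resolution candidate pool (illustrative values):
neighbor_counts = [4, 8]                  # points per circle in the XY/XT/YT slices
radii = [1, 2, 3]                         # spatial and temporal radii
block_grids = [(2, 2), (4, 4), (8, 8)]    # block sizes: global vs. local statistics

param_sets = [(p, r, g)
              for p in neighbor_counts
              for r in radii
              for g in block_grids]
# A boosting stage (e.g. AdaBoost over slice histograms) would then
# select the most discriminative slices from this pool.
print(len(param_sets))
```

Boosting over such a pool is what yields the "selected 15 most discriminative slices" shown on the next slide.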

69

Selected 15 most discriminative slices

Proposing the first method to recognize expressions in weak illumination using near-infrared videos

Visible light (VL): 0.38-0.75 μm; near infrared (NIR): 0.7-1.1 μm

Zhao G, Huang X, Taini M, Li SZ & Pietikäinen M (2011) Facial expression recognition from near-infrared videos. Image and Vision Computing 29(9):607-619.

70

On-line facial expression recognition from NIR videos

• NIR web camera allows expression recognition in near darkness.

• Image resolution 320 × 240 pixels.

• 15 frames used for recognition.

• Distance between the camera and subject around one meter.

Start sequences Middle sequences End sequences

Towards recognizing spontaneous expressions: a component-based spatiotemporal feature descriptor

• Facial Interest Components

– Forehead

– Eyes

– Nose

– Mouth

• Active Shape Model for dealing with occlusion and small view changes

71

Component-based approaches

Boosted spatiotemporal LBP-TOP features are extracted from areas centered at fiducial points (detected by ASM) or from larger areas

• more robust to changes of pose, occlusions

• can be used for analyzing action units [Jiang et al, FG 2011]

Huang X, Zhao G, Pietikäinen M & Zheng W (2010) Dynamic facial expression recognition using boosted component-based spatiotemporal features and multi-classifier fusion. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2010 Proceedings, Lecture Notes in Computer Science, 6475:312-322.

• Real-world challenge: occlusion

• Existing occlusion detection methods (for static images)

Huang X, Zhao G, Zheng W & Pietikäinen M, Towards a dynamic expression recognition under facial occlusion. Pattern Recognition Letters.

Towards a dynamic expression recognition under facial occlusion

72

Framework (CFD-OD-WL)

Experimental results

73

Recognition of spontaneous micro-expressions

Posed Spontaneous

74

Recognition of spontaneous micro-expressions

Pfister T, Li X, Zhao G & Pietikäinen M (2011) Recognising spontaneous facial micro-expressions. Proc. International Conference on Computer Vision (ICCV 2011), 1449-1456.

Idea

• Read hidden emotions in faces

75

Potential applications of our method: interrogation (revealing lies) and price negotiation (finding the right price)

Facial micro-expressions - method

Figure: An example of a facial micro-expression (top-left) being interpolated through graph embedding (top-right); the result from which spatiotemporal local texture descriptors are extracted (bottom-right), enabling recognition using multiple kernel learning.

76

Figure: Normalisation in space domain to a model face. (a) is the model face onto which the feature points in the example face are mapped. (b) is the example face with its feature points detected. (c) shows the image after the feature points of the example face have been mapped to the model face.

• High-speed camera

• Emotional movie clips

• Participant

• Researcher

77

Movie to induce FEAR emotion (1/10 speed)

Movie to induce HAPPY emotion (1/10 speed)

78

Movie to induce SURPRISE emotion (1/10 speed)

Recognition of emotions from multiple modalities

79

Multimodal recognition of emotions

Pipeline: eye detection, facial representation by LBP, and frame-by-frame classification

Kortelainen J, Huang X, Li X, Laukka S, Pietikäinen M & Seppänen T (2012) Multimodal emotion recognition by combining physiological signals and facial expressions: a preliminary study. Proc. the 34th Annual Int’l Conf. IEEE Engineering in Medicine and Biology Society (EMBC'12), San Diego, CA.

Demo

Positive Neutral

Negative

80

Visual recognition of spoken phrases

Visual speech information plays an important role in speech recognition under noisy conditions or for listeners with hearing impairment.

A human listener can use visual cues, such as lip and tongue movements, to enhance the level of speech understanding.

The process of using the visual modality is often referred to as lipreading: making sense of what someone is saying by watching the movements of their lips.

The McGurk effect [McGurk and MacDonald 1976] demonstrates that inconsistency between audio and visual information can result in perceptual confusion.

Zhao G, Barnard M & Pietikäinen M (2009). Lipreading with local spatio-temporal descriptors. IEEE Transactions on Multimedia 11(7):1254-1265.

System overview

Our system consists of three stages.

• First stage: face and eye detection, and localization of the mouth.

• Second stage: extraction of the visual features.

• Last stage: recognition of the input utterance.

81

Local spatiotemporal descriptors for visual information

• (a) Volume of utterance sequence

• (b) Image in XY plane (147x81)

• (c) Image in XT plane (147x38) at y = 40

• (d) Image in TY plane (38x81) at x = 70

Overlapping blocks (1 x 3, overlap size = 10).

LBP-YT images

Mouth region images

LBP-XY images

LBP-XT images

• Features in each block volume.

• Mouth movement representation.
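The block-based mouth representation above can be sketched as follows; `blockwise_histograms` is a hypothetical helper, and the grid size is illustrative:

```python
import numpy as np

def blockwise_histograms(codes, grid=(1, 5)):
    """Divide an LBP code image into a grid of blocks and concatenate
    per-block normalized histograms (grid size is illustrative)."""
    rows, cols = grid
    h, w = codes.shape
    feats = []
    for r in range(rows):
        for c in range(cols):
            block = codes[r * h // rows:(r + 1) * h // rows,
                          c * w // cols:(c + 1) * w // cols]
            hist = np.bincount(block.ravel(), minlength=256).astype(float)
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)
```

Keeping one histogram per block preserves coarse spatial layout (e.g. left vs. right mouth corner), which a single global histogram would discard.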

82

Experiments

Database:

Our own visual speech database: OuluVS Database

20 persons, each uttering ten everyday greetings one to five times.

In total, 817 sequences from 20 speakers were used in the experiments.

C1 “Excuse me” C6 “See you”

C2 “Good bye” C7 “I am sorry”

C3 “Hello” C8 “Thank you”

C4 “How are you” C9 “Have a good time”

C5 “Nice to meet you” C10 “You are welcome”

Experimental results - OuluVS database

• Mouth regions from the dataset.

• Speaker-independent results:

[Bar chart: recognition results (%) by phrase index C1-C10 for three configurations:]

1x5x3 block volumes

1x5x3 block volumes (features just from XY plane)

1x5x1 block volumes

83

Selected 15 slices for phrases ”See you” and ”Thank you”.

Selected 15 slices for phrases ”Excuse me” and ”I am sorry”.

• These phrases were the most difficult to recognize because they are quite similar in their latter parts, which contain the same word "you".

• The selected slices come mainly from the first and second parts of the phrase.

• The phrases "excuse me" and "I am sorry" are different throughout the whole utterance, and the selected features also come from the whole pronunciation.

Selecting the 15 most discriminative features

Demo for visual speech recognition

84

LBP-TOP with video normalization

With normalization, nearly 20% improvement in speaker-independent recognition is obtained.

Zhou Z, Zhao G & Pietikäinen M (2011) Towards a practical lipreading system. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 137-144.

Visual speaker identification with spatiotemporal directional features

• Problem: recognizing who is speaking on the basis of individual information included in speech signals and lip movements.

• Audio signal

• Video signal

– geometric features

– features from the gray-level data directly

• Audio + Video

Zhao G & Pietikäinen M (2013) Visual speaker identification with spatiotemporal directional features. Proc. International Conference on Image Analysis and Recognition.

85

From the XM2VTS database: "Joe took father's green shoe bench out"

An intuitive understanding of features related to lip movement: different people have different articulatory styles when speaking the same utterance.

• Discrete Cosine Transform (DCT)

• Principal Component Analysis (PCA)

• EdgeMap LBP

• LOCP-TOP: local ordinal contrast pattern with three orthogonal planes

Limitations of these earlier features: 1) they did not consider directional information; 2) they used only the binary code (sign) information; 3) the correlation between pixels was not eliminated.

86

Encoding Spatiotemporal Directional Changes

Combining Sign and Magnitude Information

87

Decorrelation

Then we use G(x) for independent sign and magnitude feature calculation:
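Since the decorrelating transform G(x) itself is not reproduced on the slide, the sketch below illustrates only the subsequent independent sign/magnitude split of neighbor differences, in the spirit of CLBP's S and M components; the function name and the global-mean thresholding rule are assumptions:

```python
import numpy as np

def sign_magnitude_codes(image, mag_threshold=None):
    """Split neighbor differences d_k = g_k - g_c into a sign code
    (as in basic LBP) and a magnitude code (|d_k| thresholded against
    a global mean). 3x3 neighborhood; names are illustrative."""
    h, w = image.shape
    c = image[1:h - 1, 1:w - 1]
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    diffs = np.stack([image[1 + di:h - 1 + di, 1 + dj:w - 1 + dj] - c
                      for di, dj in offs])
    if mag_threshold is None:
        mag_threshold = np.abs(diffs).mean()  # global mean magnitude
    sign = np.zeros(c.shape, dtype=np.uint8)
    mag = np.zeros(c.shape, dtype=np.uint8)
    for k in range(8):
        sign |= (diffs[k] >= 0).astype(np.uint8) << k
        mag |= (np.abs(diffs[k]) >= mag_threshold).astype(np.uint8) << k
    return sign, mag
```

Histogramming the two codes separately and concatenating them then combines the sign and magnitude information, as the slide title indicates.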

• XM2VTS Database

• 295 subjects in four sessions, spaced monthly.

• The first recording per session of the sentence "Joe took father's green shoe bench out" was used for this research.

Experiments

88

Demo: Affective human-robot interaction

89

Part 5: Summary and some future directions

Summary

• Modern texture operators form a generic tool for computer vision

• LBP and its variants are very effective for various tasks in computer vision

• The advantages of LBP and its variants include:

- computational simplicity

- easy tailoring to different types of problems

- robustness to illumination variations

- robustness to localization errors

• LPQ is also insensitive to image blurring

• New LBP variants are still emerging

• Complementary descriptors are often used

- LBP/C, CLBP, Gabor&LBP, SIFT&LBP, HOG&LBP, LPQ&LBP

90

Some future directions

• New LBP variants are still emerging

• Often a single descriptor is not effective enough

• Multi-scale processing

• Use of complementary descriptors

- CLBP, Gabor&LBP, Gradient/Orientation&LBP, SIFT&LBP, HOG&LBP, LPQ&LBP

• Combining local with more global information (e.g. LBP & Gabor)

• Combining texture and color - or intensity and depth

• Combining sparse (e.g., SIFT, HOG) and dense descriptors (LBP)

• Machine learning for finding the most effective descriptors for a given problem

• Dynamic textures offer a new approach to motion analysis

- general constraints of motion analysis (i.e. that the scene is Lambertian, rigid and static) can be relaxed

The success of the LBP approach is growing

• Increasing number of citations

• Used in various problems, applications, and industrial products

• First book on LBP published in 2011 (Springer)

• First LBP workshop was held in conjunction with ACCV 2012

• First special issue of a journal on LBP appearing in 2013

• First edited book on LBP will be published in 2013 (Springer)

For downloads, see http://www.cse.oulu.fi/CMV/Downloads

91

A book on local binary patterns published in 2011

Thanks!
