Image and video analysis with local binary pattern variants
Matti Pietikäinen, Guoying Zhao
Center for Machine Vision Research
University of Oulu, Finland
http://www.cse.oulu.fi/CMV
Part 1: Introduction to local binary patterns in spatial and spatiotemporal domains
Texture is an important characteristic of images and videos

LBP in spatial domain

2-D surface texture is a two-dimensional phenomenon characterized by:
• spatial structure (pattern)
• contrast ('amount' of texture)

Effect of common transformations on the two properties:

Transformation   Pattern     Contrast
Gray scale       no effect   affects
Rotation         affects     no effect
Zoom in/out      affects     ?

Thus,
1) contrast is of no interest in gray scale invariant analysis
2) often we need a gray scale and rotation invariant pattern measure
Local Binary Pattern and Contrast operators

An example of computing LBP and C in a 3x3 neighborhood:

example     thresholded     weights
6 5 2       1 0 0             1   2   4
7 6 1       1 . 0           128   .   8
9 8 7       1 1 1            64  32  16

LBP = 1 + 16 + 32 + 64 + 128 = 241
Pattern = 11110001
C = (6+7+8+9+7)/5 - (5+2+1)/3 = 4.7
Important properties:
• LBP is invariant to any monotonic gray level change
• computational simplicity
- 1778 citations in Google Scholar
- 3rd most cited paper of the PR journal published since 1996
Ojala T, Pietikäinen M & Harwood D (1996) A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29:51-59.
Multiscale LBP
- arbitrary circular neighborhoods
- uniform patterns
- multiple scales
- rotation invariance
- gray scale variance as contrast measure

- 3213 citations in Google Scholar
- 4th most cited paper of the top-ranking IEEE PAMI journal since 2002
- the most cited Finnish paper in the ICT area published since 2002
Ojala T, Pietikäinen M & Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971-987 (an early version at ECCV 2000).
An example of computing the LBP code (neighbor values listed in circular order):

1. Sample: center 70, neighbors 70, 83, 78, 80, 47, 65, 62, 51
2. Difference: 0, 13, 8, 10, -23, -5, -8, -19
3. Threshold: 1, 1, 1, 1, 0, 0, 0, 0
4. Multiply by powers of two and sum:
1*1 + 1*2 + 1*4 + 1*8 + 0*16 + 0*32 + 0*64 + 0*128 = 15

An example of LBP image and histogram
[Figure: an input image, its LBP-coded image and the resulting histogram]
MACHINE VISION GROUP

Foundations for LBP: Description of local image texture

Texture at gc is modeled using a local neighborhood of radius R, which is sampled at P (8 in the example) points. The neighborhood may be square, or circular with g1, g3, g5, g7 interpolated.

Let's define texture T as the joint distribution of gray levels gc and gp (p = 0, ..., P-1):

T = t(gc, g0, ..., gP-1)
Description of local image texture (cont.)

Without losing information, we can subtract gc from gp:

T = t(gc, g0-gc, ..., gP-1-gc)

Assuming gc is independent of gp-gc, we can factorize the above:

T ~ t(gc) t(g0-gc, ..., gP-1-gc)

t(gc) describes the overall luminance of the image, which is unrelated to local image texture, hence we ignore it:

T ~ t(g0-gc, ..., gP-1-gc)

The above expression is invariant w.r.t. gray scale shifts.
Description of local image texture (cont.)

Exact independence of t(gc) and t(g0-gc, ..., gP-1-gc) is not warranted in practice:

[Figure: average t(gc, g0-gc), and average absolute difference between t(gc, g0-gc) and t(gc) t(g0-gc); pooled (G=16) from 32 Brodatz textures used in Ojala, Valkealahti, Oja & Pietikäinen: Pattern Recognition 2001]
LBP: Local Binary Pattern

Invariance w.r.t. any monotonic transformation of the gray scale is achieved by considering only the signs of the differences:

T ~ t(s(g0-gc), ..., s(gP-1-gc))

where

s(x) = 1, x >= 0
       0, x < 0

The above is transformed into a unique P-bit pattern code by assigning a binomial weight 2^p to each sign s(gp-gc):

LBP_{P,R} = sum_{p=0}^{P-1} s(gp-gc) 2^p
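The summation above can be sketched in NumPy for the simple square 3x3 neighborhood (no circular interpolation); the function name and the clockwise neighbor ordering used here are illustrative assumptions, not the reference implementation:

```python
import numpy as np

def lbp_3x3(img):
    """Basic 8-bit LBP over a square 3x3 neighborhood.

    Each neighbor is thresholded against the center, s(x) = 1 if x >= 0,
    and the resulting bits are weighted by powers of two.
    """
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]                      # center pixels g_c
    # neighbor offsets in circular (clockwise) order; bit weight is 2^p
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for p, (dy, dx) in enumerate(offsets):
        gp = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]
        code += (gp >= c).astype(np.int32) << p
    return code

# the 3x3 example from the earlier slide (center pixel 6)
block = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
```

With this neighbor ordering the slide's example block yields the code 241.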
'Uniform' patterns

[Figure: examples of 'uniform' patterns (P=8) with U=0 and U=2, and of 'nonuniform' patterns with U=4, U=6 and U=8]

Texture primitives ("micro-textons") detected by the uniform patterns of LBP (1 = black, 0 = white).
Uniform patterns:
- bit patterns with 0 or 2 transitions (0→1 or 1→0) when the pattern is considered circular
- all non-uniform patterns are assigned to a single bin
- 58 uniform patterns in the case of 8 sampling points
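The circular transition count U can be sketched as follows (the helper names are illustrative):

```python
def transitions(code, P=8):
    """Number of circular 0/1 transitions U in a P-bit LBP code."""
    bits = [(code >> p) & 1 for p in range(P)]
    return sum(bits[p] != bits[(p + 1) % P] for p in range(P))

def is_uniform(code, P=8):
    """A pattern is 'uniform' if it has at most 2 circular transitions."""
    return transitions(code, P) <= 2

# enumerating all 8-bit codes recovers the 58 uniform patterns
uniform_codes = [c for c in range(256) if is_uniform(c)]
```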
Rotation of Local Binary Patterns

Spatial rotation of the binary pattern changes the LBP code: rotating the edge pattern (15) step by step produces the codes (30), (60), (120), (240), (225), (195), (135), and back to (15).
Rotation invariant local binary patterns

Formally, rotation invariance can be achieved by defining:

LBP_{P,R}^{ri} = min{ ROR(LBP_{P,R}, i) | i = 0, ..., P-1 }

e.g. the patterns (15), (30), (60), (120), (240), (225), (195) and (135) all map to (15).

Operators for characterizing texture contrast

Local gray level variance can be used as a contrast measure:

VAR_{P,R} = (1/P) sum_{p=0}^{P-1} (gp - m)^2, where m = (1/P) sum_{p=0}^{P-1} gp

VAR_{P,R} is
• invariant w.r.t. gray scale shifts (but not to any monotonic transformation, unlike LBP)
• invariant w.r.t. rotation along the circular neighborhood

Usually using complementary contrast leads to a better performance than using LBP alone.
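The two operators above can be sketched as follows; `ror`, `lbp_ri` and `var_contrast` are hypothetical helper names for the circular bit rotation, the rotation-invariant mapping and the variance measure:

```python
import numpy as np

def ror(code, i, P=8):
    """Circular right bit rotation of a P-bit code by i steps."""
    return ((code >> i) | (code << (P - i))) & ((1 << P) - 1)

def lbp_ri(code, P=8):
    """Rotation-invariant code: minimum over all P circular rotations."""
    return min(ror(code, i, P) for i in range(P))

def var_contrast(neighbors):
    """VAR_{P,R}: variance of the gray levels on the circular neighborhood."""
    g = np.asarray(neighbors, dtype=float)
    m = g.mean()
    return float(((g - m) ** 2).mean())
```

All rotations of the edge pattern (15), such as (30) or (240), map back to 15 under `lbp_ri`.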
Quantization of continuous feature space

Texture statistics are described with discrete histograms; a mapping is needed for continuous-valued contrast features.

Non-uniform quantization: the cut values are chosen so that every bin holds the same amount of the total data ("equal area"). The highest resolution of the quantization is thus used where the number of entries is largest.

[Figure: bins 0-3 of the total distribution, separated by equal-area cut values]
Estimation of empirical feature distributions

The input image (region) is scanned with the chosen operator(s), pixel by pixel, and the operator outputs are accumulated into a discrete histogram: e.g. an LBP_{P,R}^{riu2} histogram with bins 0, 1, ..., P+1, or a joint histogram of two operators such as LBP_{P,R}^{riu2} / VAR_{P,R}.
Multi-scale analysis

Information provided by N operators can be combined simply by summing up operator-wise similarity scores into an aggregate similarity score:

L_N = sum_{n=1}^{N} L_n, e.g. LBP_{8,1}^{riu2} + LBP_{8,3}^{riu2} + LBP_{8,5}^{riu2}

Effectively, the above assumes that the distributions of the individual operators are independent.

LBP features can be computed directly from different levels of an image pyramid - or image regions can be re-scaled prior to feature extraction (multiscale analysis using images at multiple scales).
Nonparametric classification principle

In nearest neighbor classification, sample S is assigned to the class of the model M that maximizes the log-likelihood statistic:

L(S, M) = sum_{b=0}^{B-1} S_b ln M_b

Instead of the log-likelihood statistic, the chi-square distance or histogram intersection is often used for comparing feature distributions.
The histograms should be normalized e.g. to unit length before classification,
if the sizes of the image windows to be analyzed can vary.
The bins of the LBP feature distribution can also be used directly as
features e.g. for SVM classifiers.
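The dissimilarity measures mentioned above can be sketched minimally as follows (a small epsilon guards against empty bins; function names are illustrative):

```python
import numpy as np

def log_likelihood(S, M, eps=1e-10):
    """L(S, M) = sum_b S_b ln M_b  (higher means more similar)."""
    S, M = np.asarray(S, float), np.asarray(M, float)
    return float(np.sum(S * np.log(M + eps)))

def chi_square(S, M, eps=1e-10):
    """Chi-square distance between histograms (lower means more similar)."""
    S, M = np.asarray(S, float), np.asarray(M, float)
    return float(np.sum((S - M) ** 2 / (S + M + eps)))

def hist_intersection(S, M):
    """Histogram intersection (higher means more similar)."""
    return float(np.minimum(S, M).sum())

# identical unit-length histograms are maximally similar
h = np.array([0.5, 0.3, 0.2])
```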
LBP unifies statistical and structural approaches
Spatiotemporal LBP

- 449 citations in Google Scholar
Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.

Dynamic textures (R Nelson & R Polana: IUW, 1992; M Szummer & R Picard: ICIP, 1995; G Doretto et al., IJCV, 2003)
Volume Local Binary Patterns (VLBP)

Sampling in volume → thresholding → multiply → pattern

LBP from Three Orthogonal Planes (LBP-TOP)

[Figure: length of the feature vector as a function of the number of neighboring points P, for concatenated LBP (LBP-TOP) and VLBP]
LBP-TOP

[Figure: a spatiotemporal volume and its XY, XT and YT planes (axes X, Y and T) from which the three LBP histograms are computed]
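A simplified sketch of LBP-TOP for a gray-level video volume: an ordinary 3x3-neighborhood LBP is applied to every slice of the three orthogonal planes and the three histograms are concatenated. This replaces the circular sampling of the original method with square neighborhoods, and the function names are assumptions:

```python
import numpy as np

def lbp8(plane):
    """8-bit LBP codes for one 2-D slice (square 3x3 neighborhood)."""
    p = np.asarray(plane, dtype=np.int32)
    c = p[1:-1, 1:-1]
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for k, (dy, dx) in enumerate(offs):
        code += (p[1 + dy:p.shape[0] - 1 + dy,
                   1 + dx:p.shape[1] - 1 + dx] >= c).astype(np.int32) << k
    return code

def lbp_top(volume):
    """Concatenate LBP histograms from the XY, XT and YT planes."""
    v = np.asarray(volume)               # shape (T, Y, X)
    hists = []
    for axis in range(3):                # slice along T, Y and X in turn
        slices = np.moveaxis(v, axis, 0)
        codes = np.concatenate([lbp8(s).ravel() for s in slices])
        hists.append(np.bincount(codes, minlength=256) / len(codes))
    return np.concatenate(hists)         # length 3 * 256

vol = np.arange(216).reshape(6, 6, 6) % 7
feat = lbp_top(vol)
```

Each of the three histograms is normalized to sum to one, so the concatenated vector has length 768 and total mass 3.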
DynTex database
• Our methods outperformed the state-of-the-art in experiments with DynTex and MIT dynamic texture databases
Results of LBP from three planes

LBP              XY     XZ     YZ     Con    weighted
8,8,8,1,1,1 riu2  88.57  84.57  86.29  93.14  93.43 [2,1,1]
8,8,8,1,1,1 u2    92.86  88.86  89.43  94.57  96.29 [4,1,1]
8,8,8,1,1,1 basic 95.14  90.86  90.00  95.43  97.14 [5,1,2]
8,8,8,3,3,3 basic 90.00  91.17  94.86  95.71  96.57 [1,1,4]
8,8,8,3,3,1 basic 89.71  91.14  92.57  94.57  95.71 [2,1,8]
Rotation revisited

Rotation of an image by α degrees:
• translates each local neighborhood to a new location
• rotates each neighborhood by α degrees

LBP histogram Fourier features

If α = 45°, the local binary patterns change accordingly:
• 00000001 → 00000010,
• 00000010 → 00000100, ...,
• 11110000 → 11100001, ...
Similarly, if α = k*45°, each pattern is circularly rotated by k steps.

Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4):1465-1467.
Rotation revisited (2)

In the uniform LBP histogram, rotation of the input image by k*45° causes a cyclic shift by k along each row.

LBP Histogram Fourier Features

The discrete Fourier transform is computed along each row of the uniform-pattern histogram h_I:

H(n, u) = sum_{r=0}^{P-1} h_I(U_P(n, r)) exp(-i 2π u r / P)

and rotation-invariant features are formed from the Fourier magnitudes:

|H(n, u)|^2 = H(n, u) H*(n, u)

The LBP-HF feature vector concatenates the Fourier magnitudes of all rows.
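The shift-invariance of the Fourier magnitudes can be sketched with NumPy's FFT; here `hist_rows` is assumed to be the uniform-pattern histogram arranged as rows, one row per number of 1-bits, each with P columns of rotations:

```python
import numpy as np

def lbp_hf(hist_rows):
    """Fourier magnitudes along each row of the uniform LBP histogram.

    A cyclic shift of a row (caused by rotating the image by a multiple
    of 360/P degrees) leaves the DFT magnitudes unchanged, so these
    features are rotation invariant.
    """
    H = np.fft.fft(np.asarray(hist_rows, dtype=float), axis=1)
    return np.abs(H)

rows = np.arange(56, dtype=float).reshape(7, 8)   # toy histogram, P = 8
```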
Example

[Figure: an input image, its uniform LBP histogram, and the original rotation-invariant LBP features (red) vs. LBP histogram Fourier features (blue)]
Rotation Invariant LBP-TOP
Computation of LBP-TOP for "watergrass" with 0 (left) and 60 (right) degrees rotation.
Rotated planes from which LBP is computed.
One Dimensional Histogram Fourier LBP-TOP (1DHFLBP-TOP)
LBP histograms for uniform patterns in different rotated motion planes with PXY = 8 and PT = 8.
Two Dimensional Histogram Fourier LBP-TOP (2DHFLBP-TOP)

[Figure: a uniform pattern before mirroring (left) and after mirroring (right)]

Examples of LBP patterns with two "1"s in different rotated motion planes with PXY = 8 and PT = 8.
DynTex database
Images after rotating by 15 degree intervals.
A nine class dataset with the classes being boiling water (8), fire (8), flowers (12), fountains (20), plants (108), sea (12), smoke (4), water (12) and waterfall (16).
P. Saisan, G. Doretto, Y.N. Wu, and S. Soatto, “Dynamic texture recognition,” CVPR, pp. 58-63, 2001.
A. Ravichandran, R. Chaudhry, and R. Vidal, "View-invariant dynamic texture recognition using a bag of dynamical systems," CVPR, pp. 1-6, 2009.
Part 2: Examples of applications
Unsupervised texture segmentation
LBP/C was used as texture operator
Segmentation algorithm consists of three phases:
1) hierarchical splitting
2) agglomerative merging
3) pixelwise classification
- 277 citations in Google Scholar
Ojala T & Pietikäinen M (1999) Unsupervised texture segmentation using feature distributions. Pattern Recognition 32:477-486.
Segmentation examples

Natural scene #1: 384x384 pixels
Natural scene #2: 192x192 pixels
Case project: Texture analysis of tissues with virtual microscopy
• Joint research with Adj.Prof. Johan Lundin, FIMM, University of Helsinki
• Texture analysis is becoming a very important tool in microscopy
• The objective is to study and develop automated texture-based classifiers for
breast and other cancer tissues that together with clinical and molecular-genetic
data, can distinguish patients with aggressive disease from those with indolent
disease. Also image analysis tools for the diagnosis of malaria are studied.
1. Imaging → 2. Stitching → 3. Virtual slide → 4. Texture analysis with the Local Binary Pattern operator → 5. Classifier

[Figure: class probabilities output by the classifier, e.g. 90%, 8%, 2%]
Linder et al. (2012) Identification of tumor epithelium and stroma in tissue
microarrays using texture analysis, Diagnostic Pathology, 2012, 7:22.
[Figure: epithelium and stroma tissue samples (A-E) with their LBP histograms]
In experiments with colorectal cancer microscopy images, over 99% accuracy was obtained with LBP/C features and an SVM classifier
Case: Automatic landscape mode detection
• The aim was to develop and implement an algorithm that automatically
classifies images to landscape and non-landscape categories
– The analysis is solely based on the visual content of images.
• The main criterion is to find an accurate but still computationally light
solution capable of real-time operation.
Huttunen S, Rahtu E, Heikkilä J, Kunttu I & Gren J (2011) Real-time detection of landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.
Landscape vs. non-landscape
• Definition of landscape and non-landscape images is not
straightforward
– If there are no distinct and easily separable objects present in a
natural scene, the image is classified as landscape
– The non-landscape branch consists of indoor scenes and other
images containing man-made objects at relatively close distance
Data set
• The images used for training and testing were downloaded from the
PASCAL Visual Object Classes (VOC2007) database and the Flickr
site
– All the images were manually labeled and resized to QVGA
(320x240).
• Training: 1115 landscape images and
2617 non-landscape images
• Testing: 912 and 2140, respectively
The approach
• Simple global image representation
based on local binary pattern (LBP)
histograms is used
• Two variants:
– Basic LBP
– LBP In+Out
• SVM classifier
Feature extraction → histogram computation → SVM classifier training
Classification results
Summary of the results
Real-time implementation
• The current real-time implementation coded in C relies on the basic LBP
• Performance analysis
– Windows PC with Visual Studio 2010 Profiler
• The total execution time for one frame was
about 3 ms
– Nokia N900 with FCam
• About 30 ms
Demo videos
Huttunen S, Rahtu E, Kunttu I, Gren J & Heikkilä J (2011) Real-time detection of landscape scenes. Proc. Scandinavian Conference on Image Analysis (SCIA 2011), LNCS, 6688:338-347.
Modeling the background and detecting moving objects
- 507 citations in Google Scholar
Heikkilä M & Pietikäinen M (2006) A texture-based method for modeling the background and detecting moving objects. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(4):657-662.
Roughly speaking, background subtraction can be seen as a two-stage process, as illustrated below.
Background modeling
The goal is to construct and maintain a statistical representation of the scene that the camera sees.
Foreground detection
The input frame is compared with the current background model. The areas of the input frame that do not fit the background model are considered foreground.
Overview of the approach…
We use an LBP histogram computed over a circular region around the pixel as the feature vector.
The history of each pixel over time is modeled as a group of K weighted LBP histograms: {x1,x2,…,xK}.
The background model is updated with the information of each new video frame, which makes the algorithm adaptive.
The update procedure is identical for each pixel.
Examples of detection results
Detection results for images of Toyama et al. (ICCV 1999)
Demo for detection of moving objects
LBP in multi-object tracking
Takala V & Pietikäinen M (2007) Multi-object tracking using color, texture, and motion. Proc. Seventh IEEE International Workshop on Visual Surveillance (VS 2007), Minneapolis, USA, 7 p.
Recognition of human actions
Kellokumpu V, Zhao G & Pietikäinen M (2011) Recognition of human actions using texture descriptors. Machine Vision and Applications 22(5):767-780.
Dynamic textures for action recognition
• Illustration of the xyt-volume of a person walking, with its xt and yt slices
Dynamic textures for action recognition
• Formation of the feature histogram for an xyt volume of short duration
• HMM is used for sequential modeling
Feature histogram of a bounding volume
Action classification results - KTH

Confusion matrix (rows: true class, columns: assigned class):

        Box   Clap  Wave  Jog   Run   Walk
Box     .980  .020
Clap          .855  .145
Wave    .032  .108  .860
Jog                       .977  .020  .003
Run                       .010  .987  .003
Walk                      .033        .967

• Classification accuracy 93.8% using image data
Recognition of natural actions
Chen J, Zhao G, Kellokumpu V & Pietikäinen M (2011) Combining sparse and dense descriptors with temporal semantic structures for robust human action recognition. Proc. ICCV Workshops (VECTaR2011), Barcelona, Spain.
What did we do
• complex human action recognition
What is the contribution
• use a simple LBP descriptor to divide each video into
motion segments
What is the contribution
• Representation: combining LBP with HOG and HOF descriptors, (HOG/HOF)+LBP
Experimental results
• Datasets – OSD from YouTube.
• 16 sport classes: high jump, long jump, triple jump, pole vault, discus throw, hammer throw, javelin throw, shot put, basketball layup, bowling, tennis serve, platform (diving), springboard (diving), snatch (weightlifting), clean and jerk (weightlifting) and vault (gymnastics).
– HOHA
• from thirty-two movies, e.g. “American Beauty”, “Big Fish”, “Forest Gump”. It contains eight types of actions, “AnswerPhone”, “GetOutCar”, “HandShake”, “Hug-Person”, “Kiss”, “SitDown”, “SitUp”, and “StandUp”.
– KTH
• 2,396 video sequences covering 6 types of actions performed by 25 actors
To what extent
                           Olympic Sports Dataset (OSD)   Hollywood Human Action dataset 2
Niebles et al., ECCV 2010  72.1                           N/A
Laptev et al., CVPR 2008   62.1                           38.4
Ours                       80.0                           53.3
Video texture synthesis
Video texture synthesis is the process of providing a continuous and infinitely varying stream of frames, similar to the example texture.
Two major stages:
- Video stitching;
- Transition smoothing.
Guo Y, Zhao G, Zhou Z & Pietikäinen M (2013) Video texture synthesis with multi-frame LBP-TOP and diffeomorphic growth model. IEEE Transactions on Image Processing, in press.
Video stitching
• To generate a similar looking video with infinite length
taking a short video with finite length as input.
[Figure: a short video clip is extended into an endless video]

A video clip captures dynamic behaviors for a specific period of time; video texture synthesis depicts the timeless quality of the phenomena in general.
Video stitching
Experiments
• DynTex database is a diverse collection of high quality
dynamic texture videos. The latest version contains 650
sequences of dynamic textures.
• Samples from other databases and the Internet are also used
for evaluation, including different types of dynamic textures,
ranging from the natural scene to human motion.
http://www.cse.oulu.fi/CMV/DemosAndVideos/DTwebpage
A typical example illustrating virtual frames. Compared to the original image sequence (first row), the image sequence containing virtual frames (second row) also has a smooth and consistent visualization. The frames in green squares are real frames.
Original
Synthesized
Part 3: A brief review of recent LBP variants
Description of interest regions with center-symmetric LBPs
[Figure: a 3x3 neighborhood with center nc and neighbors n0, ..., n7]

LBP = s(n0 - nc)2^0 + s(n1 - nc)2^1 + s(n2 - nc)2^2 + s(n3 - nc)2^3 + s(n4 - nc)2^4 + s(n5 - nc)2^5 + s(n6 - nc)2^6 + s(n7 - nc)2^7

CS-LBP = s(n0 - n4)2^0 + s(n1 - n5)2^1 + s(n2 - n6)2^2 + s(n3 - n7)2^3
- 294 citations in Google Scholar
- Most cited paper of PR journal published since 2009
Heikkilä M, Pietikäinen M & Schmid C (2009) Description of interest regions with local binary patterns. Pattern Recognition 42(3):425-436.
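The CS-LBP code for one pixel can be sketched directly from the center-symmetric sums above; the function name is an assumption, and the paper additionally uses a small positive threshold for robustness on flat regions, exposed here as a parameter:

```python
def cs_lbp(neighbors, threshold=0):
    """Center-symmetric LBP: compare only opposite neighbor pairs.

    neighbors: the 8 values n0..n7 in circular order. Comparing the 4
    center-symmetric pairs yields a 4-bit code (16 bins) instead of the
    8-bit LBP code (256 bins), halving the descriptor dimensionality.
    """
    assert len(neighbors) == 8
    code = 0
    for i in range(4):
        if neighbors[i] - neighbors[i + 4] >= threshold:
            code |= 1 << i
    return code
```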
Description of interest regions

[Figure: input region → CS-LBP features → region descriptor built from a spatial grid of feature histograms]
Setup for image matching experiments

CS-LBP performed better than SIFT in image matching and categorization experiments, especially for images with illumination variations
Preprocessing prior to LBP feature extraction
• Gabor filtering for extracting more macroscopic information [Zhang et al., ICCV 2005]
• Preprocessing for illumination normalization [Tan & Triggs, AMFG 2007]
• Edge detection - used to enhance the gradient information; find orientations of local patterns [Liao & Chung, ICIP 2009]; find patterns of oriented edge magnitudes [Vu & Caplier, ECCV 2010, TIP 2012]

Neighborhood topology
• Square or circular neighborhood is normally used - a circular neighborhood is important for rotation-invariant operators
• Anisotropic neighborhood (e.g. elliptic) - improved results in face recognition [Liao & Chung, ACCV 2007] and in medical image analysis [Nanni et al., Artif. Intell. Med. 2010]
• Encoding similarities between patches of pixels [Wolf et al., ECCV 2008] - characterizes well the topological structural information of face appearance
Thresholding and encoding
• Using the mean or median of the neighborhood for thresholding
• Using a non-zero threshold [Heikkilä et al., IEEE PAMI 2006]
• Local ternary patterns - encoding by 3 values [Tan & Triggs, AMFG 2007]
• Extended quinary patterns - encoding by 5 values [Nanni et al., Artif. Intell. Med. 2010]
• Soft LBP [Ahonen & Pietikäinen, Finsig 2007]
• Scale invariant local ternary pattern [Liao et al., CVPR 2010] - for background subtraction applications

Robust LBP [Heikkilä et al., PAMI 2006]
In robust LBP, the term s(gp - gc) is replaced with s(gp - gc + a), which allows bigger changes in pixel values without affecting the thresholding results:
- improved results in background subtraction [Heikkilä et al., PAMI 2006]
- was also used in the CS-LBP interest region descriptor [Heikkilä et al., PR 2009]
Local ternary patterns (LTP) [Tan & Triggs, AMFG 2007]

[Figure: a ternary code 1100(-1)(-1)00 is split into two binary codes: a positive pattern 11000000 and a negative pattern 00001100]
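The ternary encoding and its split into two binary codes can be sketched as follows (the helper name is hypothetical; t is the ternary threshold):

```python
def ltp_split(neighbors, center, t=2):
    """Local ternary pattern split into positive and negative binary codes.

    Each neighbor is coded +1 (>= center + t), -1 (<= center - t), or 0
    otherwise; the ternary pattern is then split into two LBP-style
    binary codes, one marking the +1 positions and one the -1 positions.
    """
    pos = neg = 0
    for p, g in enumerate(neighbors):
        if g >= center + t:
            pos |= 1 << p
        elif g <= center - t:
            neg |= 1 << p
    return pos, neg
```

For center 50 and t = 5, neighbors [60, 60, 50, 50, 40, 40, 50, 50] give the ternary digits 1,1,0,0,-1,-1,0,0 and hence the code pair (3, 48).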
Feature selection and learning
• Feature selection e.g. with AdaBoost to reduce the number of bins [Zhang et al., LNCS 2005]
• Subspace methods projecting LBP features into a lower-dimensional space [Shan et al., ICPR 2006], [Chan et al., ICB 2007]
• Learning the most dominant and discriminative patterns [Liao et al., IEEE TIP 2009], [Guo et al., ACCV 2010]
A learning-based LBP
Guo Y, Zhao G & Pietikäinen M (2012) Discriminative features for texture description. Pattern Recognition 45(10):3834-3843.
Use of complementary contrast/magnitude information
• LBP was designed as a complementary measure of local contrast, using joint LBP/C or LBP/VAR histograms
• LBPV puts the local contrast into a 1-dimensional histogram [Guo et al., Pattern Recogn. 2010]
• Completed LBP (CLBP) considers complementary sign and magnitude vectors [Guo et al., IEEE TIP 2010]
• The Weber law descriptor (WLD) includes excitation and orientation components [Chen et al., IEEE PAMI 2010]
Completed LBP [Guo et al., IEEE TIP 2010]

Completed local binary patterns (CLBP: sign, magnitude and contrast):

Difference: d_p = g_p - g_c
Sign: u_p = 1 if d_p >= 0, 0 if d_p < 0
Magnitude: h_p = 1 if |d_p| >= δ, 0 if |d_p| < δ, where δ is the mean value of |d_p| over the whole image

[Figure: (a) a 3x3 sample block, (b) the local differences d_p, (c) the sign component u_p, (d) the magnitude component h_p]
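The sign and magnitude components can be sketched directly from the definitions above (the function name is an assumption; δ would be estimated globally over the image):

```python
import numpy as np

def clbp_components(neighbors, center, delta):
    """CLBP sign (u_p) and magnitude (h_p) codes for one pixel.

    delta is the mean of |d_p| over the whole image, passed in here.
    """
    d = np.asarray(neighbors, dtype=float) - center
    sign_code = sum(1 << p for p, dp in enumerate(d) if dp >= 0)
    magn_code = sum(1 << p for p, dp in enumerate(d) if abs(dp) >= delta)
    return sign_code, magn_code
```

For the earlier slide's neighborhood (center 6, neighbors 6, 5, 2, 1, 7, 8, 9, 7) and δ = 3, the sign code is 241, matching the basic LBP example.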
WLD: Weber law descriptor
Composed of excitation and orientation components
Chen J, Shan S, He C, Zhao G, Pietikäinen M, Chen X & Gao W (2010) WLD: A robust local image descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(9):1705-1720.
Local Phase Quantization (LPQ)
- blur insensitive
- has recently received considerable attention, e.g. in the face analysis research community
Rahtu E, Heikkilä J, Ojansivu V & Ahonen T (2012) Local phase quantization for blur-insensitive image analysis. Image and Vision Computing 30(8):501-512.
Visual Recognition using Local Quantized Patterns
Sibt ul Hussain and William Triggs. In: European Conference on Computer Vision, Florence, Italy, 2012.
Build LUT: assign each local pattern to the cluster whose mean is closest to it:

{x ∈ k : ||x − μ_k|| ≤ ||x − μ_r|| ∀ 1 ≤ r ≤ K}

[Figure: binary local patterns (00000000, 00000001, 00000010, ..., 11111111) mapped through a lookup table to cluster indices (e.g. 17, 46, 6, 6)]
Completed Local Quantized Patterns (CLQP)
Huang X, Zhao G, Hong X, Pietikäinen M & Zheng W (2013) Texture description with completed local quantized patterns. Proc. 18th Scandinavian Conference on Image Analysis (SCIA 2013).
Ylioinas J, Hadid A, Guo Y & Pietikäinen M (2012) Efficient image appearance description using dense sampling based local binary patterns, ACCV 2012.
Histogram vs. ”softly voted” statistic
• In soft voting weights are assigned based on Hamming
distance between the detected pattern and
representative entries (bins) in the statistic
– Experiments on texture classification (CUReT) on original and ”limited-sample-size” scenario and face verification (LFW)
• Minor improvement with CUReT-original and LFW
Ylioinas J, Hong X & Pietikäinen M (2013) Constructing local binary pattern statistics by soft voting. Proc. 18th Scandinavian Conference on Image Analysis (SCIA 2013).
"Limited-sample-size" scenario using CUReT:
- the voted statistic suffered much less when using patches of size 41x41 pixels (cropped from the original 200x200 textures)
- e.g. LBP(8,2) without any mapping, and with uniform and rotation invariant mappings

[Figure: results on CUReT original vs. CUReT "limited-sample-size"]
Recent progress in LBP-based video descriptors
CLBP-TOP, based on computing CLBP in three orthogonal planes
LBP histogram Fourier features for rotation-invariant image and video description
Local monogenic binary patterns, combining monogenic signal analysis and local binary patterns
Combining LBP-TOP, WLD, and optical flow for segmentation
Pfister T, Li X, Zhao G & Pietikäinen M (2011) Differentiating spontaneous from posed facial expressions within a generic facial expression recognition framework. Proc. ICCV Workshops (SISM 2011), Barcelona, Spain, 868-875.
Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4):1465-1467
Huang X, Zhao G, Pietikäinen M & Zheng W (2012) Spatiotemporal local monogenic binary patterns for facial expression recognition. IEEE Signal Processing Letters 19(5):243-246.
Chen J, Zhao G, Salo M, Rahtu E & Pietikäinen M (2012) Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Transactions on Image Processing 22(1): 326-339.
Part 4: LBP in facial image analysis
Face description with LBP
- 1217 citations in Google Scholar
(ECCV paper: 863 citations)
- 3rd most cited paper of PAMI journal since 2006
Ahonen T, Hadid A & Pietikäinen M (2006) Face description with local binary patterns: application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(12):2037-2041. (an early version published at ECCV 2004)
Weighting the regions

[Table: recognition results for different block sizes, metrics and weightings; e.g. 18 x 21-pixel blocks on 130 x 150 images, feature vector length 2891]
Improving the robustness of LBP-based face recognition
• Local Gabor Binary Pattern Histogram Sequence [Zhang et al., ICCV 2005]
• Illumination normalization by preprocessing prior to LBP/LPQ feature extraction [Tan & Triggs, AMFG 2007]
• Illumination invariance using LBP with NIR imaging [S.Z. Li et al., IEEE PAMI 2007]
Hussain et al. (2012) Face recognition using local quantized patterns, BMVC 2012.
- State-of-the-art performance on FERET and LFW databases
Lei Z, Pietikäinen M & Li SZ (2013) Learning discriminant face descriptor. IEEE Trans. Pattern Analysis and Machine Intelligence, accepted.
Research on Trusted biometrics under spoofing attacks (TABULA RASA) 2010-2014 (http://www.tabularasa-euproject.org/)
• The project will address some of the issues of direct (spoofing)
attacks to trusted biometric systems. This is an issue that needs to be
addressed urgently because it has recently been shown that
conventional biometric techniques, such as fingerprints and face, are
vulnerable to direct (spoof) attacks.
• Coordinated by IDIAP, Switzerland
• We will focus on face and gait
recognition
Problem description

• Without countermeasures biometric systems are vulnerable to spoofing attacks
  - Recognition algorithms try to identify/verify instead of checking whether the biometric input is genuine
• Especially true for face biometrics - falsifying biometric data is easy
  - Social media (Facebook, etc.)
  - Hard to hide your face in public
  - Paper prints, video displays
Database examples
Typical countermeasures
• Dedicated sensors (infrared, multispectral & 3D imaging)
– Extra hardware needed – not always possible to deploy
• Multi-modal – hard to spoof multiple traits
– Robustness depends also on fusion rules
• Available sensors – same data used for authentication
– Non-intrusive but no reliable solutions available
• Challenge-response
– Interaction increases both security and intrusiveness
• Costs/intrusiveness and security level go hand in hand
Facial texture analysis - appearance and dynamics

• Proximity between the spoofing medium and the camera (or the use of high resolution imaging) may cause
  - the recaptured face image to be out of focus (overall blur)
  - other facial texture quality issues to be revealed, e.g. due to the display medium used or printing defects
  Finding differences in texture patterns using LBP
• Dynamic visual cues based on the motion patterns of either
  - a genuine human face (e.g. eye blinking, mouth movements, facial expressions), or
  - the display medium used (e.g. characteristic reflections of planar objects, distorted motion of warped photos, excessive motion if a hand-held attack is performed relatively close to the camera)
  Static and dynamic texture analysis using LBP-TOP
High performance in detecting 2D face spoofing attacks
• Our results were significantly better than those obtained with earlier methods
• LBP was very powerful, discriminating printing artifacts and differences in light reflection
Määttä J, Hadid A & Pietikäinen M (2011) Face spoofing detection from single images using micro-texture analysis. Proc. International Joint Conference on Biometrics (IJCB 2011).
LBP-based method for spoofing attack detection
Facial texture analysis – appearance and dynamics
Facial texture analysis - appearance and dynamics
Gender classification using LBP and contrast
• concatenating LBP sign and contrast histograms: LBP_C
Ylioinas J, Hadid A & Pietikäinen M (2011) Combining contrast information and local binary patterns for gender classification. In: Image Analysis, SCIA 2011 Proceedings, Lecture Notes in Computer Science, 6688:676-686
Age classification using LBP variants
• concatenating LBP sign and magnitude histograms: CLBP_S_M
Ylioinas J, Hadid A & Pietikäinen M (2012) Age classification in unconstrained conditions using LBP variants. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan.
Research on recognizing facial expressions and affect
Facial expression recognition from videos
Determine the emotional state of the face
• Regardless of the identity of the face
Zhao G & Pietikäinen M (2007) Dynamic texture recognition using local binary patterns with an application to facial expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(6):915-928.
Facial Expression Recognition

[Figure: recognition from mug shots vs. dynamic information, targeting action units or prototypic emotional expressions]
Psychological studies [Bassili 1979] have demonstrated that humans do a better job of recognizing expressions from dynamic images than from mug shots.
[Figure: (a) non-overlapping blocks (9 x 8), (b) overlapping blocks (4 x 3, overlap size = 10)]

[Figure: (a) block volumes, (b) LBP features, (c) concatenated features for one block volume from three orthogonal planes, combining appearance and motion]
Database
Cohn-Kanade database :
• 97 subjects
• 374 sequences
• Age from 18 to 30 years
• 65 percent were female, 15 percent were African-American, and 3 percent were Asian or Latino
Happiness Anger Disgust
Sadness Fear Surprise
Demo for facial expression recognition
• low resolution, no eye detection
• translation, in-plane and out-of-plane rotation, scale
• illumination change
• robust with respect to errors in face alignment
Principal appearance and motion from boosted
spatiotemporal descriptors
Multiresolution features => learning for pairs => slice selection
1) Different numbers of neighboring points when computing the features in the XY, XT and YT slices
2) Different radii, which capture occurrences at different space and time scales
3) Blocks of different sizes, to obtain both global and local statistical features
The first two resolutions work at the pixel level in the feature computation, providing different local spatiotemporal information; the third works at the block (volume) level, giving more global information in the space and time dimensions.
Zhao G & Pietikäinen M (2009) Boosted multi-resolution spatiotemporal descriptors for facial expression recognition. Pattern Recognition Letters 30(12):1117-1127.
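These three resolution axes can be illustrated as a parameter grid; the values below are hypothetical, chosen only to show how the pool of candidate histogram features grows before boosting selects from it.

```python
# Hypothetical parameter grid for the three resolution axes described
# above (the exact values used in the paper may differ).
neighbour_counts = [4, 8]          # P: points sampled around each pixel
radii = [1, 3]                     # R: spatial/temporal sampling radius
block_grids = [(2, 2), (4, 4)]     # coarse (global) vs fine (local) blocks

descriptors = []
for P in neighbour_counts:
    for R in radii:
        for rows, cols in block_grids:
            bins = 3 * 2 ** P                 # three orthogonal planes
            descriptors.append({
                "P": P, "R": R, "blocks": (rows, cols),
                "dims": rows * cols * bins,   # histogram length per setting
            })

print(len(descriptors))  # 8 candidate feature pools for boosting
print(sum(d["dims"] for d in descriptors))
```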
Selected 15 most discriminative slices
Proposing the first method to recognize expressions
in weak illumination using near-infrared videos
Visible light (VL): 0.38-0.75 μm; near-infrared (NIR): 0.7-1.1 μm
Zhao G, Huang X, Taini M, Li SZ & Pietikäinen M (2011) Facial expression recognition from near-infrared videos. Image and Vision Computing 29(9):607-619.
On-line facial expression recognition from NIR videos
• NIR web camera allows expression recognition in near darkness.
• Image resolution 320 × 240 pixels.
• 15 frames used for recognition.
• Distance between the camera and subject around one meter.
Start sequences Middle sequences End sequences
Towards recognizing spontaneous expressions: a
component-based spatiotemporal feature descriptor
• Facial Interest Components
– Forehead
– Eyes
– Nose
– Mouth
• Active Shape Model for dealing with occlusion and small view
changes
Component-based approaches
Boosted spatiotemporal LBP-TOP features are extracted from areas centered at fiducial points (detected by ASM) or from larger areas
• more robust to pose changes and occlusions
• can be used for analyzing action units [Jiang et al., FG 2011]
Huang X, Zhao G, Pietikäinen M & Zheng W (2010) Dynamic facial expression recognition using boosted component-based spatiotemporal features and multi-classifier fusion. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2010 Proceedings, Lecture Notes in Computer Science, 6475:312-322.
Towards dynamic expression recognition under facial occlusion
• Real-world challenge: occlusion
• Existing occlusion detection methods work on static images
Huang X, Zhao G, Zheng W & Pietikäinen M. Towards a dynamic expression recognition under facial occlusion. Pattern Recognition Letters.
Framework (CFD-OD-WL)
Experimental results
Recognition of spontaneous micro-expressions
Posed Spontaneous
Recognition of spontaneous micro-expressions
Pfister T, Li X, Zhao G & Pietikäinen M (2011) Recognising spontaneous facial micro-expressions. Proc. International Conference on Computer Vision (ICCV 2011), 1449-1456.
Idea
• Read hidden emotions in faces
Potential applications
• Interrogation: revealing lies
• Price negotiation: finding the right price
Facial micro-expressions - method
Figure: An example of a facial micro-expression (top-left) being interpolated through graph embedding (top-right); the result from which spatiotemporal local texture descriptors are extracted (bottom-right), enabling recognition using multiple kernel learning.
Figure: Normalisation in space domain to a model face. (a) is the model face onto which the feature points in the example face are mapped. (b) is the example face with its feature points detected. (c) shows the image after the feature points of the example face have been mapped to the model face.
• High-speed camera
• Emotional movie clips
• Participant
• Researcher
Movie to induce FEAR emotion (1/10 speed)
Movie to induce HAPPY emotion (1/10 speed)
Movie to induce SURPRISE emotion (1/10 speed)
Recognition of emotions from multiple modalities
Multimodal recognition of emotions
Eye detection => facial representation by LBP => frame-by-frame classification
Kortelainen J, Huang X, Li X, Laukka S, Pietikäinen M & Seppänen T (2012) Multimodal emotion recognition by combining physiological signals and facial expressions: a preliminary study. Proc. the 34th Annual Int’l Conf. IEEE Engineering in Medicine and Biology Society (EMBC'12), San Diego, CA.
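A minimal sketch of the frame-by-frame scheme, assuming (as one plausible reading of the pipeline, not stated in the slides) that per-frame predictions are fused into a video-level label by majority vote:

```python
from collections import Counter

# Hypothetical per-frame predictions from an LBP-based frame classifier;
# the video-level label is the majority vote over the frames.
frame_preds = ["positive", "neutral", "positive", "positive", "negative"]
video_label = Counter(frame_preds).most_common(1)[0][0]
print(video_label)  # positive
```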
Demo
Positive Neutral
Negative
Visual recognition of spoken phrases
Visual speech information plays an important role in speech recognition under noisy conditions or for listeners with hearing impairment.
A human listener can use visual cues, such as lip and tongue movements, to enhance the level of speech understanding.
The process of using the visual modality is often referred to as lipreading: making sense of what someone is saying by watching the movements of their lips.
The McGurk effect [McGurk and MacDonald 1976] demonstrates that inconsistency between audio and visual information can result in perceptual confusion.
Zhao G, Barnard M & Pietikäinen M (2009). Lipreading with local spatio-temporal descriptors. IEEE Transactions on Multimedia 11(7):1254-1265.
System overview
Our system consists of three stages:
• First stage: face and eye detection, and localization of the mouth.
• Second stage: extraction of the visual features.
• Last stage: recognition of the input utterance.
Local spatiotemporal descriptors for visual information
• (a) Volume of utterance sequence
• (b) Image in XY plane (147x81)
• (c) Image in XT plane (147x38) in y = 40
• (d) Image in TY plane (38x81) in x = 70
Overlapping blocks (1 x 3, overlap size = 10).
LBP-YT images
Mouth region images
LBP-XY images
LBP-XT images
• Features in each block volume.
• Mouth movement representation.
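The block-volume representation can be sketched as follows. This is an illustrative simplification: a plain 16-bin intensity histogram stands in for the per-block LBP-TOP histogram, and the blocks are non-overlapping, whereas the slides use overlapping blocks.

```python
import numpy as np

def block_volumes(volume, grid=(1, 5, 3)):
    """Split a (T, H, W) mouth-region volume into a grid of block volumes."""
    t, h, w = volume.shape
    bt, bh, bw = grid
    blocks = []
    for i in range(bt):
        for j in range(bh):
            for k in range(bw):
                blocks.append(volume[i * t // bt:(i + 1) * t // bt,
                                     j * h // bh:(j + 1) * h // bh,
                                     k * w // bw:(k + 1) * w // bw])
    return blocks

def describe(volume, grid=(1, 5, 3)):
    """Concatenate a toy 16-bin intensity histogram per block
    (standing in for the LBP-TOP histogram of that block volume)."""
    hists = [np.bincount(b.ravel() // 16, minlength=16)
             for b in block_volumes(volume, grid)]
    return np.concatenate(hists)

rng = np.random.default_rng(1)
seq = rng.integers(0, 256, size=(38, 81, 147))  # utterance volume (T, H, W)
feat = describe(seq)
print(feat.shape)  # 1 * 5 * 3 blocks x 16 bins = (240,)
```

Keeping a separate histogram per block preserves the rough location of mouth movements, which a single global histogram would discard.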
Experiments
Database:
Our own visual speech database: OuluVS Database
20 persons, each uttering ten everyday greetings one to five times.
In total, 817 sequences from 20 speakers were used in the experiments.
C1 “Excuse me” C6 “See you”
C2 “Good bye” C7 “I am sorry”
C3 “Hello” C8 “Thank you”
C4 “How are you” C9 “Have a good time”
C5 “Nice to meet you” C10 “You are welcome”
Experimental results - OuluVS database
• Mouth regions from the dataset.
• Speaker-independent results:
Figure: Recognition results (%) for each phrase (C1-C10), comparing 1x5x3 block volumes, 1x5x3 block volumes with features from the XY plane only, and 1x5x1 block volumes.
Selected 15 slices for phrases ”See you” and ”Thank you”.
Selected 15 slices for phrases ”Excuse me” and ”I am sorry”.
• These phrases were the most difficult to recognize because they are quite similar in their latter parts, which contain the same word "you".
• The selected slices come mainly from the first and second parts of the phrase.
• The phrases "excuse me" and "I am sorry" differ throughout the whole utterance, and the selected features accordingly come from the whole pronunciation.
Selecting the 15 most discriminative features
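The papers select discriminative slices with AdaBoost; as a simplified stand-in, the sketch below ranks toy slice features by the Fisher criterion and keeps the top 15. All data here are synthetic, constructed so that the first five slices separate the two phrase classes.

```python
import numpy as np

rng = np.random.default_rng(2)
n_slices = 45                      # e.g. 15 block volumes x 3 planes
# Toy per-slice scalar features for two phrase classes (synthetic data).
a = rng.normal(0.0, 1.0, size=(40, n_slices))
b = rng.normal(0.0, 1.0, size=(40, n_slices))
b[:, :5] += 3.0                    # make the first five slices informative

# Fisher criterion per slice: between-class over within-class scatter.
fisher = (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)
top15 = np.argsort(fisher)[::-1][:15]
print(sorted(top15[:5].tolist()))
```

The planted slices dominate the ranking; the remaining selected slices are noise, which boosting would likewise prune by validation error.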
Demo for visual speech recognition
LBP-TOP with video normalization
With normalization, nearly 20% improvement in speaker-independent recognition is obtained.
Zhou Z, Zhao G & Pietikäinen M (2011) Towards a practical lipreading system. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), 137-144.
Visual speaker identification with
spatiotemporal directional features
• Problem: recognizing who is speaking on the basis of individual information included in speech signals and lip movements.
• Modalities: audio signal; video signal (geometric features, or features computed directly from the gray-level data); audio + video.
Zhao G & Pietikäinen M (2013) Visual speaker identification with spatiotemporal directional features. Proc. International Conference on Image Analysis and Recognition.
From the XM2VTS database: "Joe took father's green shoe bench out"
An intuitive understanding of features related to lip movement:
different people have different articulatory styles when
speaking the same utterance.
• Discrete Cosine Transform (DCT)
• Principal Component Analysis (PCA)
• EdgeMap LBP
• LOCP-TOP: local ordinal contrast pattern with three orthogonal planes
Limitations of these earlier features:
1) they did not consider directional features;
2) they used only the binary (sign) code information;
3) the correlation between pixels was not eliminated.
Encoding Spatiotemporal Directional Changes
Combining Sign and Magnitude Information
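A minimal sketch of a sign/magnitude split in the spirit of this idea (a CLBP-style decomposition of the neighbour differences; the actual method's directional filtering and the decorrelating transform G(x) discussed below are omitted):

```python
import numpy as np

def sign_magnitude_codes(samples, centre, threshold=None):
    """Split neighbour differences d_p = g_p - g_c into a sign code
    (the classical LBP) and a magnitude code thresholded at mean |d|."""
    d = samples - centre
    sign_bits = (d >= 0).astype(int)
    if threshold is None:
        threshold = np.abs(d).mean()
    mag_bits = (np.abs(d) >= threshold).astype(int)
    weights = 2 ** np.arange(len(samples))   # 1, 2, 4, ..., 128
    return int(sign_bits @ weights), int(mag_bits @ weights)

# The 3x3 neighbourhood from the worked example earlier in the
# presentation: centre 6, neighbours clockwise from the top-left.
neighbours = np.array([6, 5, 2, 1, 7, 8, 9, 7])
s_code, m_code = sign_magnitude_codes(neighbours, centre=6)
print(s_code)  # 241, matching the LBP value in the earlier example
```

The sign code alone is the classical LBP; keeping a second magnitude code retains contrast information that the sign discards.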
Decorrelation
Then we use G(x) for independent sign and magnitude feature calculation:
Experiments
• XM2VTS database
• 295 subjects in four sessions, spaced monthly.
• The first recording per session of the sentence "Joe took father's green shoe bench out" was used for this research.
Part 5: Summary and some future directions
Summary
• Modern texture operators form a generic tool for computer vision
• LBP and its variants are very effective for various tasks in computer
vision
• The advantages of LBP and its variants include:
- computational simplicity
- easy tailoring to different types of problems
- robustness to illumination variations
- robustness to localization errors
• LPQ is also insensitive to image blurring
• New LBP variants are still emerging
• Complementary descriptors are often used
- LBP/C, CLBP, Gabor&LBP, SIFT&LBP, HOG&LBP, LPQ&LBP
Some future directions
• New LBP variants are still emerging
• Often a single descriptor is not effective enough
• Multi-scale processing
• Use of complementary descriptors
- CLBP, Gabor&LBP, Gradient/Orientation&LBP, SIFT&LBP, HOG&LBP,
LPQ&LBP,
• Combining local with more global information (e.g. LBP & Gabor)
• Combining texture and color - or intensity and depth
• Combining sparse (e.g., SIFT, HOG) and dense descriptors (LBP)
• Machine learning for finding the most effective descriptors for
a given problem
• Dynamic textures offer a new approach to motion analysis
- general constraints of motion analysis (i.e. scene is Lambertian, rigid
and static) can be relaxed
The success of LBP approach is growing
• Increasing number of citations
• Used in various problems, applications, and industrial products
• First book on LBP published in 2011 (Springer)
• First LBP workshop was held in conjunction with ACCV 2012
• First special issue of a journal on LBP appearing in 2013
• First edited book on LBP will be published in 2013 (Springer)
For downloads, see http://www.cse.oulu.fi/CMV/Downloads
A book on local binary patterns published in 2011
Thanks!