Fine-grained recognition of physical objects on mobile phones: from categorization to identification

Maurits Diephuis
Stochastic Information Processing
University of Geneva, Switzerland
[email protected]

Sviatoslav Voloshynovskiy
[email protected]

Taras Holotyak
[email protected]

Abstract

This paper addresses the recognition of physical objects such as pharmaceutical and cosmetic packages at two hierarchical levels. The first level categorises the object based on printed images, graphics and text, while the second one identifies it based on so-called micro-structures: images of the physical, non-cloneable surface structure.

Both levels deploy a tracing descriptor, which combines fingerprint-like properties with reasonable invariance to geometrical and lighting distortions, owing to its semi-local nature. Crucially, objects can be identified without any geometrical matching or final re-ranking procedure.

1. Introduction

In this paper we consider fine-grained identification of physical objects, such as packages, at two levels. The first level categorises the type of package based on visual cues such as design elements or text. The second level identifies an individual object based on optically acquired micro-structures, which act as physical unclonable functions (PUFs). Similar to biometrics, the working principle rests on their unique and non-cloneable character. Such architectures are cheap to enrol, do not require any product modification, and verification is relatively easy.

In both the categorisation and the identification we use a descriptor that traces over the image between selected key-points. The key idea is to build a discriminative descriptor that can function as a persistent hash and does not require any exhaustive geometric re-ranking.

2. Framework

The targeted class of objects is characterized by localized graphical features that can be detected for fine-grained recognition. Such a localization increases both the accuracy of recognition and robustness to geometric distortions typical for mobile imaging.

Figure 1: Automatic partitioning by the categorization algorithm. It is language and font independent.

The tracing descriptor is used twice throughout the hierarchy, but with a different method to generate the key-points to trace between. As traces are formed between a pair of key-points, the stability of those points is crucial.
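To make the notion of a trace concrete, the following is a minimal Python sketch of sampling an intensity profile along the straight segment between two key-points. It is an illustration under stated assumptions, not the authors' implementation: the bilinear interpolation, the number of samples (`n_samples`), the zero-mean unit-variance normalization and the function names `bilinear`, `sample_trace` and `normalize` are hypothetical choices that the paper does not specify.

```python
def bilinear(img, x, y):
    """Bilinearly interpolate img (a list of rows) at real-valued (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp the coordinates so that samples never leave the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sample_trace(img, p, q, n_samples=128):
    """Sample n_samples intensities on the segment from p to q (both (x, y))."""
    (px, py), (qx, qy) = p, q
    trace = []
    for i in range(n_samples):
        t = i / (n_samples - 1)  # t runs from 0 at p to 1 at q
        trace.append(bilinear(img, px + t * (qx - px), py + t * (qy - py)))
    return trace


def normalize(trace):
    """Assumed normalization: zero mean, unit variance (flat traces map to zeros)."""
    n = len(trace)
    mean = sum(trace) / n
    std = (sum((v - mean) ** 2 for v in trace) / n) ** 0.5
    return [(v - mean) / std if std > 0 else 0.0 for v in trace]
```

Because the trace is read along a line defined by the two key-points themselves, any error in their positions shifts the whole signal, which is one way to see why key-point stability matters here.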

For initial categorisation of a physical package, all text and individual design elements are segmented using local variance and morphological operations to find individual elements and regions (Fig. 1). The centroids of individual graphical elements within segmented regions form the key-points that will be traced between. This enables accurate and fast categorization of an object using advanced indexing techniques or even direct, fast Hamming-distance-based matching.
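As a rough illustration of how centroids of graphical elements could serve as key-points, the sketch below thresholds a local variance map and takes the centroid of each connected region. It is a simplified stand-in under stated assumptions: the window radius `r`, the `threshold`, the 4-connectivity and the function names are hypothetical, and the morphological operations mentioned above are omitted entirely.

```python
def local_variance(img, r=1):
    """Variance of each pixel's (2r+1) x (2r+1) neighbourhood, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            out[y][x] = sum((v - m) ** 2 for v in vals) / len(vals)
    return out


def element_centroids(img, threshold, r=1):
    """Centroids (x, y) of 4-connected regions whose local variance exceeds threshold."""
    var = local_variance(img, r)
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or var[y][x] <= threshold:
                continue
            # Flood-fill one region with an explicit stack.
            stack, pixels = [(x, y)], []
            seen[y][x] = True
            while stack:
                cx, cy = stack.pop()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and var[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            centroids.append((sum(p[0] for p in pixels) / len(pixels),
                              sum(p[1] for p in pixels) / len(pixels)))
    return centroids
```

Flat background has zero local variance and is ignored, so only textured areas such as glyphs or logos yield regions. This is consistent with the claim that the partitioning does not depend on language or font.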

Having recognized the type of the presented object, the system proceeds with the specific identification of that object based on local micro-structures. Given the ascertained type of the object, image patches from pre-selected regions, free from graphical elements, are extracted.

FAST [2] key-points are detected and spatially clustered using graph connected components. Clusters of local points are spatially averaged to form a single new point, while clusters with a single point are removed.


Figure 2: Basic work-flow of the tracing descriptor. (Block labels: select feature pair; select 3x3 ROI; mean of all signals; variance of local variance; accept/reject; order statistics. Panel labels: micro-structure; text; graphic.)

Figure 3: Trace examples from an original package and micro-structure and their distorted copies. (Plot legend: Original, Distorted.)

This clustering step stems from the observation that whole clusters of FAST key-points tend to be more stable than individual points.
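The clustering of key-points can be sketched as follows. This is a minimal example under stated assumptions, not the authors' code: the graph is assumed to link any two key-points closer than a hypothetical `radius`, connected components are found with a small union-find, and the input is a plain list of (x, y) tuples rather than the output of an actual FAST detector.

```python
def cluster_keypoints(points, radius):
    """Average each connected group of key-points; drop groups of size one."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Assumed graph: an edge joins two points at most `radius` apart.
    for i in range(n):
        for j in range(i + 1, n):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= radius * radius:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(points[i])

    merged = []
    for members in clusters.values():
        if len(members) < 2:
            continue  # isolated key-points are removed
        merged.append((sum(p[0] for p in members) / len(members),
                       sum(p[1] for p in members) / len(members)))
    return merged
```

The pairwise loop is quadratic in the number of key-points, which is acceptable for a sketch; a spatial grid or k-d tree would be the usual replacement for larger inputs.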

Generating the trace (Figs. 2 and 3) is identical for both stages in the hierarchy and is done as follows:

• Feature points are chosen as a pair when the distance between them is sufficient given the interpolation used.

• For each point of a pair, a 3 × 3 neighbourhood is taken, after which the traces are computed between the two sets of key-points. The resulting set is rescaled and averaged into a single re-normalized signal.

• The most informative traces are chosen by only selecting signals whose variance of local variances is at least 1.5 times the average for that specific image.

• Candidate traces for which this value is disproportionally caused by an edge are also rejected.

• Resulting traces are quantized using either random projections followed by the sign function, or product VQ with a pre-trained codebook of about 2K centroids for each VQ block.

• Identification is based on indexing: each descriptor of a probe image is matched against the index, and the final decision is based on majority voting.
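The selection, quantization and voting steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the local window length (`window`), the Gaussian random projections, the code length and all function names are hypothetical. The edge-rejection step and the product VQ alternative are not shown, and a linear nearest-neighbour search stands in for the index.

```python
import random
from collections import Counter


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def variance_of_local_variances(trace, window=8):
    """Variance of the variances of consecutive windows (needs len(trace) >= window)."""
    local = [variance(trace[i:i + window])
             for i in range(0, len(trace) - window + 1, window)]
    return variance(local)


def select_informative(traces, factor=1.5, window=8):
    """Keep traces scoring at least `factor` times the average over the image."""
    scores = [variance_of_local_variances(t, window) for t in traces]
    mean_score = sum(scores) / len(scores)
    return [t for t, s in zip(traces, scores) if s >= factor * mean_score]


def make_projections(dim, n_bits, seed=0):
    """Assumed quantizer: n_bits random Gaussian directions of length dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binarize(trace, projections):
    """One bit per projection: 1 if the inner product is non-negative, else 0."""
    return tuple(1 if sum(w * v for w, v in zip(row, trace)) >= 0 else 0
                 for row in projections)


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def identify(probe_codes, database):
    """database: list of (label, code). Each probe code votes for its nearest entry."""
    votes = Counter(min(database, key=lambda e: hamming(code, e[1]))[0]
                    for code in probe_codes)
    return votes.most_common(1)[0][0]
```

Because every probe descriptor votes independently, a few mismatched traces do not change the decision as long as the correct object collects the most votes. This is one reading of why no geometric re-ranking is required.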

3. Experimental validation

To empirically test the accuracy of the categorization we constructed a database of packages and acquired each of them 3 times using a hand-held Samsung Galaxy III in ordinary light. We used a single frontal image for the enrolment and the two others for the categorization testing. In total 450 packages were enrolled, from each of which about 30 regions were extracted and passed to the trace descriptor. All of these were classified without error.

Figure 4: ROC showing the discriminative power of ORB, SIFT and Trace descriptors detected in images distorted by AWGN, JPEG compression and a projective transformation. (Axes: PM and PF, both on a logarithmic scale; curves: SIFT, ORB, Trace.)

To validate the identification, the micro-structure images of 50 packages were acquired with a mobile phone, 3 times each, all in about the same position and covering regions of roughly 4 × 4 cm. Every image was enrolled (a) with 100 trace descriptors; (b) with 1’000 SIFT descriptors selected via gradient magnitude; and (c) in an unlimited regime with on average 10’000 SIFT descriptors per image [1]. Even on (distorted) micro-structures the trace descriptor is discriminative enough to allow for error-less identification. This is not the case for SIFT (or ORB) in this particular framework, as its feature points are not stable enough, nor are its descriptors discriminative enough. As emphasized by the ROC in Fig. 4, the probability of false acceptance for the trace descriptor is significantly lower than that for SIFT and ORB.
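For readers unfamiliar with the quantities on an ROC, the sketch below shows how points on such a curve could be computed from descriptor distances. It assumes that PM denotes the probability of miss and PF the probability of false acceptance, and that a match is accepted when the distance is at most a threshold. The function names and the toy distances in the usage note are hypothetical and are not taken from the experiments.

```python
def error_rates(genuine, impostor, threshold):
    """Return (P_M, P_F) for the rule 'accept if distance <= threshold'."""
    # Miss: a genuine pair is rejected because its distance is too large.
    p_miss = sum(d > threshold for d in genuine) / len(genuine)
    # False acceptance: an impostor pair falls within the threshold.
    p_false = sum(d <= threshold for d in impostor) / len(impostor)
    return p_miss, p_false


def roc_points(genuine, impostor):
    """(P_F, P_M) pairs, one for every threshold that occurs in the data."""
    thresholds = sorted(set(genuine) | set(impostor))
    points = []
    for t in thresholds:
        p_miss, p_false = error_rates(genuine, impostor, t)
        points.append((p_false, p_miss))
    return points
```

With made-up distances such as `genuine = [2, 3, 5, 9]` and `impostor = [8, 12, 14, 15]`, raising the threshold trades misses for false acceptances. A more discriminative descriptor separates the two distance sets further and so pushes the curve towards the origin.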

4. Conclusion

We have shown the viability of categorizing and identifying physical objects using a simple descriptor and a mobile phone. Specifically, this can be done without a geometry-based re-ranking procedure. This work was partially supported by SNF project No. 200020-146379.

References

[1] M. Brown and D. G. Lowe. Invariant features from interest point groups. In BMVC, number 1, 2002.

[2] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In Computer Vision–ECCV 2006, pages 430–443. Springer, 2006.