University of Venice, Italy

C. Lucchese, M. Vlachos, D. Rajan, P.S. Yu

IBM Research, IBM Research, University of Chicago

[Figure: trajectory plot with axes Time, X, Y]

Objective: Ownership seal with Mining Guarantees

Output on database and data mining operations is the same as on the original data

Final Destination

Embed a stamp so that we can claim ownership of the data

NN Search

Clustering

Classification

The trajectories are modified imperceptibly, but their neighboring objects are not distorted

…

Applications: Database Search

Search operations remain the same

– outsource data to a mining company

– maintain principal rights of the dataset

We want to retain the Nearest Neighbors of each object.

x

NN(x)

y1 y2

Watermark does not change the nearest neighbor

Determine the maximum watermark embedding power p which maintains NN

for all objects:

Dp(x, NN(x)) < Dp(x,y)
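The nearest-neighbor condition above can be checked directly once a candidate watermarked dataset exists. The following is a minimal sketch, assuming Euclidean distance and objects of equal length stored as rows of an array; the function names `nearest_neighbors` and `nn_preserved` are illustrative and do not come from the poster.

```python
import numpy as np

def nearest_neighbors(data):
    """Index of the nearest neighbor of every object (Euclidean distance).

    data: array of shape (num_objects, length), real or complex.
    """
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((np.abs(diff) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an object is not its own neighbor
    return dist.argmin(axis=1)

def nn_preserved(original, watermarked):
    """True if every object keeps the same nearest neighbor."""
    return bool(np.array_equal(nearest_neighbors(original),
                               nearest_neighbors(watermarked)))
```

A dataset passes the check when `nn_preserved(original, watermarked)` is true; the pairwise distance matrix makes this quadratic in the number of objects, which is fine for illustration but not for large datasets.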

Class A

Class B

Dataset of time-series/trajectories with class labels

Objective: Distort the data imperceptibly so that class labels are maintained.

Acceptable

Unacceptable

Modified Dataset including watermark

Applications: Classification Preservation

Applications: Clustering Preservation

Results of clustering remain the same

– geodesic distances will remain the same

– hierarchical clustering will not be affected

De Brazza Monkey Male

De Brazza Monkey Juvenile Male

Juvenile Baboon

Mandrill2 male

Orangutan juvenile

Orangutan2 male

Gray-necked Owl Monkey Male

Gray-necked Owl Monkey Female

Red Howler Monkey Male

Mandrill male

Common Chimpanzee male

Common Chimpanzee Male 2

Mantled Howler Monkey

The secret key is embedded in a domain resilient to common trajectory transformations

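[Figure: embedding pipeline. The original data is mapped to the frequency domain (FT) and split into magnitude and phase; the watermark modifies the magnitudes while the phase stays the same; the inverse transform (IFT) of the watermarked magnitudes yields the watermarked data]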

Frequency Domain

Example: w = [-1 1 -1 -1 1 1]

p (embedding power)

Additive Embedding in Magnitudes
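As an illustration of additive embedding in the magnitudes, here is a minimal sketch under several assumptions that the poster does not spell out: the trajectory is treated as the complex sequence x + iy, the key is placed in the highest-magnitude coefficients excluding the DC term, and p is small compared with the selected magnitudes. The function name `embed_watermark` is hypothetical.

```python
import numpy as np

def embed_watermark(traj, w, p):
    """Additively embed key w with power p into DFT magnitudes.

    traj: array of shape (n, 2) holding the x and y coordinates.
    Returns the watermarked trajectory and the coefficient indices used.
    """
    w = np.asarray(w, dtype=float)
    z = traj[:, 0] + 1j * traj[:, 1]        # trajectory as a complex sequence
    spectrum = np.fft.fft(z)
    mag, phase = np.abs(spectrum), np.angle(spectrum)

    # Assumption: use the len(w) highest-magnitude coefficients, skipping the
    # DC term (index 0), which only encodes the position of the trajectory.
    idx = 1 + np.argsort(mag[1:])[::-1][:len(w)]

    mag_w = mag.copy()
    mag_w[idx] = mag[idx] + p * w           # additive embedding; phase untouched
    z_w = np.fft.ifft(mag_w * np.exp(1j * phase))
    return np.column_stack([z_w.real, z_w.imag]), idx
```

With the example key above, `embed_watermark(traj, [-1, 1, -1, -1, 1, 1], p)` changes six magnitudes by plus or minus p and leaves every phase as it was.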

Techniques are also applicable for image shapes

Red Howler Monkey Male

(Alouatta seniculus seniculus)

Conversion of skull shape into a two-dimensional sequence

Orangutan skull

Extracted Shape

Embed the key in the k most important coefficients

(shapes can be treated as trajectories)

Secret information is hidden in some of the frequency components

[Figure: panels using 2, 4, 8, 16, 32, and 64 coefficients]

Select the frequency coefficients that best describe the shape of the trajectory

One can select either the highest-energy coefficients or the low-frequency coefficients. Removing the watermark is then more difficult without destroying the important trajectory characteristics

[Figure: trajectory plot with axes Time, X, Y]

The key is detected very efficiently, even when it is inserted with low embedding power

[Figure: detection pipeline. The watermarked data is mapped to the frequency domain (FT) and split into magnitude and phase; the magnitudes are correlated with the watermark w = [-1 1 -1 -1 1 1]]

Detection of the embedded key is virtually perfect

Better Detection (semi-blind): Remove ‘background noise’ bias before the embedding and during the detection
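Correlation-based detection with bias removal might look like the sketch below. It assumes the detector knows which coefficient indices carry the key and, in semi-blind mode, has the magnitudes recorded at those positions before embedding; the poster does not say how the bias is estimated, so this choice and the names `detection_score` and `is_watermarked` are illustrative only.

```python
import numpy as np

def detection_score(traj, w, idx, bias=None):
    """Correlation between key w and the DFT magnitudes at positions idx.

    bias: magnitudes recorded at idx before embedding (semi-blind mode);
    hypothetical side information, None means blind detection.
    """
    w = np.asarray(w, dtype=float)
    z = traj[:, 0] + 1j * traj[:, 1]
    mag = np.abs(np.fft.fft(z))[idx]
    if bias is not None:
        mag = mag - bias                    # remove the 'background noise' bias
    return float(np.mean(w * mag))

def is_watermarked(traj, w, idx, bias, threshold):
    """Declare the key present when the score exceeds the threshold."""
    return bool(detection_score(traj, w, idx, bias) > threshold)
```

Under these assumptions the score of the correct key equals the embedding power p when the data is unmodified, while a different key scores p times its normalized correlation with the true key, so a threshold between the two separates them.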

Threshold

MST after watermarking

MST before watermarking

Example of using our technique for spanning tree preservation

The proposed fast algorithm prunes a significant amount of the search space

We need to examine, for each power p, how many times the following condition is violated:

Dp(x, NN(x)) < Dp(x, y)

x

NN(x)

y z

Finding the maximum embedding power

Express distance parametrized by the embedding power of the key
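One way to express the distance as a function of the embedding power is sketched below. It assumes additive embedding of the same key in the DFT magnitudes of every object with phases unchanged, so a watermarked spectrum is X + p * w * exp(i * phase(X)). By Parseval's relation the squared Euclidean distance then becomes a quadratic in p, D_p(x, y)^2 = a p^2 + b p + c. The function names and this particular parametrization are illustrative and are not taken from the poster.

```python
import numpy as np

def unit_phase(spectrum):
    """Unit-magnitude phasor exp(i*phase) of each DFT coefficient."""
    return np.exp(1j * np.angle(spectrum))

def squared_distance_coeffs(X, Y, w_full):
    """Coefficients (a, b, c) with D_p(x, y)^2 = a*p**2 + b*p + c.

    X, Y: DFTs (np.fft.fft) of two trajectories of equal length n.
    w_full: length-n key, zero at coefficients that carry no watermark.
    """
    n = len(X)
    d = X - Y                                 # difference of original spectra
    u = unit_phase(X) - unit_phase(Y)         # difference of phase directions
    a = np.sum(w_full**2 * np.abs(u)**2) / n
    b = 2.0 * np.sum(w_full * np.real(d * np.conj(u))) / n
    c = np.sum(np.abs(d)**2) / n
    return a, b, c

def nn_violated(coeffs_nn, coeffs_other, p):
    """True if, at power p, object y is closer to x than NN(x) is."""
    d_nn = np.polyval(coeffs_nn, p)           # D_p(x, NN(x))^2
    d_other = np.polyval(coeffs_other, p)     # D_p(x, y)^2
    return bool(d_nn > d_other)
```

Once the three coefficients are computed for a pair, testing any candidate power costs a constant number of operations instead of a fresh inverse transform and distance computation.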

Our approach can embed the hidden information more than 300 times faster than the brute-force approach

The fast search techniques find the same result as the exhaustive search, but are 2-3 orders of magnitude faster

Running Time

The efficient key embedding and detection allow for effective key recovery even under attacks

Geometric Attacks: perfect detection under Translation/Rotation/Scaling attacks

Gaussian Noise attack has to destroy the data in order to be effective

Decimation attack can be perfectly withstood

Data Reduction attack (even when pruning 50% of dataset) is not effective
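The resilience to geometric attacks listed above follows from properties of the DFT magnitudes, which the sketch below illustrates. It assumes the trajectory is treated as the complex sequence x + iy: rotation multiplies every coefficient by a unit phasor, translation changes only the DC term, and uniform scaling multiplies all magnitudes by the same factor. Normalizing to unit energy is one possible way to cancel the scaling and is an assumption here, not a step stated on the poster; both function names are hypothetical.

```python
import numpy as np

def normalized_magnitudes(traj):
    """DFT magnitudes without the DC term, scaled to unit energy."""
    z = traj[:, 0] + 1j * traj[:, 1]
    mag = np.abs(np.fft.fft(z))[1:]   # dropping DC removes translation
    return mag / np.linalg.norm(mag)  # unit energy removes uniform scaling

def transform(traj, angle, scale, shift):
    """Rotate by angle (radians), scale uniformly, then translate."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return scale * traj @ rotation.T + np.asarray(shift)
```

Comparing `normalized_magnitudes(traj)` with `normalized_magnitudes(transform(traj, angle, scale, shift))` gives the same vector up to floating-point error, so a key carried by these magnitudes is unaffected by such transformations.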
