31
1/21 LSSD LSSD: a Controlled Large JPEG Image Database for Deep-Learning-based Steganalysis Hugo RUIZ 1 , Mehdi YEDROUDJ 1 , Marc CHAUMONT 1;2 , Fr edericCOMBY 1 , Gerard SUBSOL 1 1 Research-Team ICAR, LIRMM, Univ. Montpellier, CNRS, France; 2 Univ. N^mes, France [email protected] January 11, 2021 ICPR’2021, International Conference on Pattern Recognition, MMForWILD’2021, Worshop on MultiMedia FORensics in the WILD, Lecture Notes in Computer Science, LNCS, Springer. January 10-15, 2021, Virtual Conference due to Covid (formerly Milan, Italy).

LSSD: a Controlled Large JPEG Image Database for Deep

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: LSSD: a Controlled Large JPEG Image Database for Deep

1/21

LSSD

LSSD: a Controlled Large JPEG Image Databasefor Deep-Learning-based Steganalysis

Hugo RUIZ1, Mehdi YEDROUDJ1, Marc CHAUMONT1,2,

Frederic COMBY1, Gerard SUBSOL1

1Research-Team ICAR, LIRMM, Univ. Montpellier, CNRS, France;2Univ. Nımes, France

[email protected]

January 11, 2021

ICPR’2021, International Conference on Pattern Recognition,MMForWILD’2021, Worshop on MultiMedia FORensics in the WILD,Lecture Notes in Computer Science, LNCS, Springer.

January 10-15, 2021, Virtual Conference due to Covid (formerly Milan, Italy).

Page 2: LSSD: a Controlled Large JPEG Image Database for Deep

2/21

LSSD

Introduction

Steganography / Steganalysis

Lorem ipsum dolor sit amet, consecteturadipiscing elit. Aenean id nunc convallis, volutpat

purus et, sodales tortor. Nullam tristique leosapien, nec aliquet velit posuere ac. Maecenas eulorem eu libero faucibus congue vel rutrum enim.

Donec sagittis elit a turpis posuere aliquam.Maecenas quis sem eleifend arcu finibus volutpat.

Emb

Cover imageC

Message M

Alice

Page 3: LSSD: a Controlled Large JPEG Image Database for Deep

2/21

LSSD

Introduction

Steganography / Steganalysis

Lorem ipsum dolor sit amet, consecteturadipiscing elit. Aenean id nunc convallis, volutpat

purus et, sodales tortor. Nullam tristique leosapien, nec aliquet velit posuere ac. Maecenas eulorem eu libero faucibus congue vel rutrum enim.

Donec sagittis elit a turpis posuere aliquam.Maecenas quis sem eleifend arcu finibus volutpat.

Emb

Cover imageC

Message M

Alice

ExtLorem ipsum dolor sit amet, consectetur

adipiscing elit. Aenean id nunc convallis, volutpatpurus et, sodales tortor. Nullam tristique leo

sapien, nec aliquet velit posuere ac. Maecenas eulorem eu libero faucibus congue vel rutrum enim.

Donec sagittis elit a turpis posuere aliquam.Maecenas quis sem eleifend arcu finibus volutpat.

Bob

Stego image S

Message M

Secret key K

Page 4: LSSD: a Controlled Large JPEG Image Database for Deep

2/21

LSSD

Introduction

Steganography / Steganalysis

Lorem ipsum dolor sit amet, consecteturadipiscing elit. Aenean id nunc convallis, volutpat

purus et, sodales tortor. Nullam tristique leosapien, nec aliquet velit posuere ac. Maecenas eulorem eu libero faucibus congue vel rutrum enim.

Donec sagittis elit a turpis posuere aliquam.Maecenas quis sem eleifend arcu finibus volutpat.

Emb

Cover imageC

Message M

Alice

ExtLorem ipsum dolor sit amet, consectetur

adipiscing elit. Aenean id nunc convallis, volutpatpurus et, sodales tortor. Nullam tristique leo

sapien, nec aliquet velit posuere ac. Maecenas eulorem eu libero faucibus congue vel rutrum enim.

Donec sagittis elit a turpis posuere aliquam.Maecenas quis sem eleifend arcu finibus volutpat.

Bob

Eve

Stego image S

Message M

Secret key K

Stego ≈ Cover

Page 5: LSSD: a Controlled Large JPEG Image Database for Deep

3/21

LSSD

Introduction

Efficient methods in steganalysis

I First were based on Machine Learning [KCP13]

I Deep Learning improved performance

I But it requires a significant training database

Page 6: LSSD: a Controlled Large JPEG Image Database for Deep

4/21

LSSD

Introduction

The mismatch problem

Learningbase Test base

Page 7: LSSD: a Controlled Large JPEG Image Database for Deep

4/21

LSSD

Introduction

The mismatch problem

I Differences between learning base andtest base

• different image sources(Camera & ISO)

• differences in development parameters(RAW → JPEG)

Learningbase Test base

Page 8: LSSD: a Controlled Large JPEG Image Database for Deep

4/21

LSSD

Introduction

The mismatch problem

I Differences between learning base andtest base

• different image sources(Camera & ISO)

• differences in development parameters(RAW → JPEG)

I Reduce performance of the method

Learningbase Test base

Page 9: LSSD: a Controlled Large JPEG Image Database for Deep

4/21

LSSD

Introduction

The mismatch problem

I Differences between learning base andtest base

• different image sources(Camera & ISO)

• differences in development parameters(RAW → JPEG)

I Reduce performance of the method

Learningbase Test base

Objective to work with mismatchTo be closer to the real world and have a network more robust

Page 10: LSSD: a Controlled Large JPEG Image Database for Deep

5/21

LSSD

Introduction

Limits of image bases currently used in steganalysis

I ALASKA v1 (2018) & v2 (2020): big base (up to 80k images)with a lot of diversity. link here

I BOSS (2010): a reference base but few images (10k). link here

I Dresden & RAISE : very few images (∼10k both combined).RAISE link here

All these bases contain about 100,000 images in RAW format butsome are no longer available for download.

Page 11: LSSD: a Controlled Large JPEG Image Database for Deep

6/21

LSSD

Introduction

Examples

Part of image from ALASKA2256×256 JPEG image in colour

Part of image from BOSS256×256 JPEG image in grayscale

Page 12: LSSD: a Controlled Large JPEG Image Database for Deep

7/21

LSSD

Diversity

Sources

Increase diversity: get as RAW images as possible

I Total: 127,420 images

I Size: ∼ 3000×5000

I Cameras: > 100

Page 13: LSSD: a Controlled Large JPEG Image Database for Deep

8/21

LSSD

Diversity

Sources

Analysis of diversity of the images

The ISO number is the value of sensitivity of the camera sensor.

I ISO for ALASKA2 base:

ISO < 100 100 ≤ ISO < 1000 1000 ≤ ISO

Number 12,497 55,893 11,615

I Camera models: > 100

ALASKA2 BOSS Dresden RAISE Stego App

Number 40 7 25 3 26

Page 14: LSSD: a Controlled Large JPEG Image Database for Deep

9/21

LSSD

Diversity

Development process

Processing pipeline

Split

1 RAWOriginal sizeColour image

1 JPEG1024x1024

Colour | Grayscale

16 JPEG256x256

Grayscale

Dev

To obtain LSSD database: 127,420×16 ' 2M

Page 15: LSSD: a Controlled Large JPEG Image Database for Deep

10/21

LSSD

Diversity

Development process

Processing pipeline

Rawtherapee process

RAWdataset

Demosaicing Resizing Crop1024x1024

JPEGcompression

Microconstrast toolp = 0.5Denoising

LSSD

Multi split256x256

Page 16: LSSD: a Controlled Large JPEG Image Database for Deep

10/21

LSSD

Diversity

Development process

Processing pipeline

Rawtherapee process

RAWdataset

Demosaicing Resizing Crop1024x1024

JPEGcompression

Microconstrast toolp = 0.5Denoising

LSSD

Multi split256x256

With this method, it is possible to create 1024×1024 and256×256 JPEG images.

During the Rawtherapee process, Unsharp Masking (USM) can be used

and more values are available for the denoising.

Page 17: LSSD: a Controlled Large JPEG Image Database for Deep

11/21

LSSD

Diversity

Development process

Parameters for the LSSD database

1https://alaska.utt.fr/

Based on the script[CGB20] usedfor the development of the imagesof the ALASKA2 challenge1.

Other parameters could be used:

I Demosaicing: AMAZE or IGV

I Resize: can be random

I Denoise: different range

I . . .

Name Value

Demosaicing Fast or DCBResize & Crop YesSize (resize) 1024× 1024

Kernel (resize)Nearest (0.2)Bicubic (0.5)Bilinear (0.3)

Resize factor depends on initial sizeUnsharp Masking No

Denoise(Pyramid Denoising)

Yes

Intensity [0; 60]Detail [0; 40]

Micro-contrast Yes (p = 0.5)Color No

Quality factor 75

Page 18: LSSD: a Controlled Large JPEG Image Database for Deep

12/21

LSSD

Diversity

Development process

Quality Factor (QF), a key-parameter for JPEG images

QF is linked to quantization matrices in DCT image compression.We use only standard matrices in the first version of LSSD.

However, possible to increase diversity by changing QF.

QF ∈ [0, 100]

I 0: very poor quality

I 50: minimum

I 75: our choice

I 100: best quality available

Page 19: LSSD: a Controlled Large JPEG Image Database for Deep

13/21

LSSD

Experimental protocol

Global process

Dataset

RAW

LSSD

CoverJPEG

Development

Page 20: LSSD: a Controlled Large JPEG Image Database for Deep

13/21

LSSD

Experimental protocol

Global process

I Algorithm: J-UNIWARD [HF13]

I Payload: 0.2 bpnzacs

I Type: grayscale

Dataset

RAW

LSSD

CoverJPEG

LSSD

StegoJPEG

Development Steganography

Page 21: LSSD: a Controlled Large JPEG Image Database for Deep

13/21

LSSD

Experimental protocol

Global process

Dataset

RAW

LSSD

CoverJPEG

LSSD

StegoJPEG

LSSD

MAT

Development Steganography

Decompression

Page 22: LSSD: a Controlled Large JPEG Image Database for Deep

14/21

LSSD

Experimental protocol

Building different bases

I With different sizes

I 6 bases: from 10k to 2M images

I The smaller ones are extracted fromthe larger ones⇒ same development

I Number of images = cover imagesEx: LSSD 50k = 50k cover + 50k stego

LSSD_1M

LSSD_500k

LSSD_100k

LSSD_50k

LSSD_10k

LSSD_2M

Page 23: LSSD: a Controlled Large JPEG Image Database for Deep

15/21

LSSD

Experimental protocol

Test base

Alldevelopmentpipeline

TSTbase100kJPEG

RAWdataset

RAWTST

I Extract 6,250 RAW imagesfrom RAW dataset

I Same development pipeline(with same parameters)

I Test base of 100k images

Page 24: LSSD: a Controlled Large JPEG Image Database for Deep

15/21

LSSD

Experimental protocol

Test base

Alldevelopmentpipeline

TSTbase100kJPEG

RAWdataset

RAWTST

I Extract 6,250 RAW imagesfrom RAW dataset

I Same development pipeline(with same parameters)

I Test base of 100k images

This test base is totallyindependent of the learning base

Page 25: LSSD: a Controlled Large JPEG Image Database for Deep

16/21

LSSD

Results

A new database: LSSD

http://www.lirmm.fr/~chaumont/LSSD.html

Page 26: LSSD: a Controlled Large JPEG Image Database for Deep

16/21

LSSD

Results

A new database: LSSD

A database of 2 million JPEG (color or grayscale) 256× 256 images.

Different databases available for downloading separately.

http://www.lirmm.fr/~chaumont/LSSD.html

Page 27: LSSD: a Controlled Large JPEG Image Database for Deep

17/21

LSSD

Results

To download (access soon)

A shell script to download: https://github.com/Yiouki/download_lssd

Parameters:

I Base name (-b): LSSD 10k, 50k. . . or ALASKA2, BOSS. . .

I Type (-t): JPEG or MAT

I Coloring (-c): Color or Gray

I Nature (-n): Cover or Stego (Stego P02 in our case)

I Output (-o): Choose the output path

I Help (-h): Print help

Example to download LSSD 10k gray cover images in MAT format: sh

LSSD download script -b LSSD 10k -t MAT -c Gray -n Cover

Page 28: LSSD: a Controlled Large JPEG Image Database for Deep

18/21

LSSD

Conclusions and perspectives

Conclusions

I Steganalysis community can find multiple usages of this base:

• scalability analysis [RCY+21]• learning bases exceeding millions of controlled images

I Lot of important steps to develop a complete database

I About one week to develop a complete and usable base.(Computer: 16 proc Intel Xeon(R) W-2145 3.70Ghz)

Page 29: LSSD: a Controlled Large JPEG Image Database for Deep

19/21

LSSD

Conclusions and perspectives

To be continued. . .

I Create and analyze the controlled mismatch between thetraining/learning and test base.With USM, denoising, demosaicing. . .

I Use different steganography algorithms with more payloads

I Use colour in the base

Page 30: LSSD: a Controlled Large JPEG Image Database for Deep

20/21

LSSD

References

References I

Remi Cogranne, Quentin Giboulot, and Patrick Bas.

Challenge Academic Research on Steganalysis with Realistic Images.

In Proceedings of the IEEE International Workshop on InformationForensics and Security, WIFS’2020, Virtual Conference due to covid(Formerly New-York, NY, USA), December 2020.

Vojtech Holub and Jessica Fridrich.

Digital image steganography using universal distortion.

IH&MMSec 13 Proceedings of the first ACM workshop on Informationhiding and multimedia security, 2013(1), 2013.

Sarra Kouider, Marc Chaumont, and William Puech.

Adaptive Steganography by Oracle (ASO).

In Proceeding of the IEEE International Conference on Multimedia andExpo, ICME’2013, pages 1–6, San Jose, California, USA, July 2013.

Page 31: LSSD: a Controlled Large JPEG Image Database for Deep

21/21

LSSD

References

References II

Hugo Ruiz, Marc Chaumont, Mehdi Yedroudj, Ahmed Oulad-Amara,Frederic Comby, and Gerard Subsol.

Analysis of the Scalability of a Deep-Learning Network for Steganography“Into the Wild”.

In Submited to MultiMedia FORensics in the WILD, MMForWILD’2020,in conjunction with ICPR2020 The 25th International Conference onPattern Recognition, Virtual Conference due to covid (Formerly Milan,Italy), January 2021.