Symposium E-discovery 2019

Artificial Intelligence and digital forensic investigation: risk or solution?

prof. dr. ing. Zeno Geradts

Senior forensic scientist / Special Chair Forensic Data Science
Digital Technology and Biometrics / University of Amsterdam



COST Project DigForAsp

DigForAsp (Digital forensics: evidence analysis via intelligent systems and practices), CA17124, is funded by the European Cooperation in Science and Technology (COST). DigForAsp activities were launched on 10 September 2018 and run for 4 years.

Outline

- Introduction
- Deep learning and neural networks
- Examples
- Issues
- Outlook and conclusion


Netherlands Forensic Institute


University of Amsterdam Chair Forensic Data Science

● Store and process
● Understand and decide
● Analyse and model
● Report and visualize
● Higher efficiency
● Data-intensive
● Evidential strength of big data


Machine learning vs deep learning


Neural network multilayer


Calculation speed with Digital Evidence


Digital Evidence


Extract data

Make data readable

Organize data

Interpret data

The police do 97% of the work

Focus on data


Challenge: many formats, old & new, non-standard

• Tool and library development

• Reverse engineering: discover the technological principles of a system (e.g. software or a communication protocol) through analysis of its function and operation

Make data readable


Trace Recovery & Analysis

Trace analysis is the expertise to preserve, detect, repair, undelete, decrypt, find, structure and interpret data and traces on any case-related digital medium.


THE DIGITAL WORLD

digital behaviour

rapid/short development cycles

fast global expansion of bandwidth: 57% per year

consumer prices for devices + data falling rapidly

increasing streaming data volume

time spent online
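To put the bandwidth figure of 57% per year in perspective, a constant annual growth rate can be converted into a doubling time. The Python sketch below assumes the rate stays constant from year to year, which the slide does not claim; the names `ANNUAL_GROWTH` and `growth_factor` are illustrative only.

```python
import math

# Growth rate quoted on the slide; treating it as constant is an assumption.
ANNUAL_GROWTH = 0.57


def growth_factor(years: float) -> float:
    """Factor by which bandwidth grows after the given number of years."""
    return (1 + ANNUAL_GROWTH) ** years


# Years needed for bandwidth to double at this rate.
doubling_time_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

print(f"doubling time: {doubling_time_years:.2f} years")
print(f"growth over 10 years: {growth_factor(10):.0f}x")
```

Under that assumption bandwidth doubles roughly every year and a half.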


Internet of things


Internet of things 2020 Gartner


8 terabyte?

1600 hours of HD video

A 40-kilometre queue of trucks filled with paper!
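As a rough sanity check on the video comparison, 8 terabyte spread over 1600 hours can be turned into an average data rate. The sketch assumes decimal terabytes (10^12 bytes) and a constant bitrate; neither is stated on the slide.

```python
# Figures from the slide; decimal units are an assumption.
TOTAL_BYTES = 8 * 10**12
VIDEO_HOURS = 1600

# Average storage used per hour of video.
bytes_per_hour = TOTAL_BYTES // VIDEO_HOURS

# The same rate expressed as a video bitrate.
megabits_per_second = bytes_per_hour * 8 / 3600 / 1e6

print(f"{bytes_per_hour / 1e9:.0f} GB per hour")
print(f"{megabits_per_second:.1f} Mbit/s")
```

That works out to 5 GB per hour, or roughly 11 Mbit/s.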


Big Data issues


The good news: many examples where it works well

Credit card fraud detection and casework

VISA states that they save billions of euros a year


Big Data at NFI

⬛ Text Mining

⬛ Data Profiling

⬛ Financial Data Analysis

⬛ Social Network Analysis


How to identify relevant digital traces?

By smart automation of our data factories!

smart search + find and smart analysis solutions

smart broadband infrastructure and smart scalable storage


What are digital traces?

Rows of bits (0's and 1's):

0101010010010100100101110010100111010100100011110010101

0100110010010010010011010101010100101001010000101011111

1111100100100110101010101001010010100001010111110101011

…with a meaning (after interpretation)
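The point that bits only acquire meaning after interpretation can be illustrated with a few lines of Python. The four bytes below are chosen purely for this example (they are not taken from the slide), and the three interpretations shown are only a small sample of what a storage format might intend.

```python
import struct

# Four example bytes, chosen for illustration only.
raw = bytes([0x4E, 0x46, 0x49, 0x21])


def as_bits(data: bytes) -> str:
    """Render bytes as groups of eight 0s and 1s."""
    return " ".join(f"{byte:08b}" for byte in data)


def interpret(data: bytes) -> dict:
    """Read the same four bytes under three different assumptions."""
    return {
        "bits": as_bits(data),
        "ascii": data.decode("ascii"),
        "uint32_little_endian": struct.unpack("<I", data)[0],
        "uint32_big_endian": struct.unpack(">I", data)[0],
    }


views = interpret(raw)
for name, value in views.items():
    print(f"{name}: {value}")
```

Without knowledge of the storage format there is no way to tell which reading, if any, is the intended one.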

Interpretation is difficult because of:

Undocumented storage formats

Deleted files

Files partly overwritten

Encryption

100 kB = 100,000 bytes

2 MB = 2,000,000 bytes

10 kB = 10,000 bytes


Data analysis at the NFI

Specifications available?

• Yes? Use the specs

• No? Reverse Engineering and Carving

Add results to Forensic libraries

• File systems (Snorkel)

• File formats (Traces)

• RAM memory (Mammal)

Process data based on libraries

• Create trace index (Data model)

• Investigate using GUI or API (Query model)
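Carving, named above as the fallback when no specification is available, generally means scanning raw bytes for known file signatures instead of relying on file system metadata. The sketch below is a minimal illustration using the JPEG start-of-image (FF D8 FF) and end-of-image (FF D9) markers on a synthetic buffer. It does not model the NFI libraries listed above, and it ignores fragmentation and embedded thumbnails; `carve_jpegs` is a hypothetical helper name.

```python
from typing import List

JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus first byte of the next marker
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(blob: bytes) -> List[bytes]:
    """Return every byte run that starts with JPEG_START and ends at the next JPEG_END."""
    carved = []
    position = 0
    while True:
        start = blob.find(JPEG_START, position)
        if start == -1:
            break
        end = blob.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        stop = end + len(JPEG_END)
        carved.append(blob[start:stop])
        position = stop
    return carved


# Synthetic "disk image": two fake JPEG files surrounded by unrelated bytes.
fake_jpeg = JPEG_START + b"\xe0" + b"payload" + JPEG_END
disk = b"\x00" * 16 + fake_jpeg + b"unallocated space" + fake_jpeg + b"\x00" * 8

carved_files = carve_jpegs(disk)
print(len(carved_files), "candidate files carved")
```

Both files are recovered even though the buffer contains no directory entry pointing at them, which is why carving also helps with deleted files.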


Digital investigation using XIRAF

analyst

tactical investigator

Several weeks (1 TB in 24 hrs)

technical investigator

ANALYSE, REPORT, ENABLE ACCESS / ENRICH, SECURE, CONFISCATION

Future of digital investigation: HANSKEN

analyst

tactical investigator

Some hours (1 TB / 20 min), direct results at the start

technical investigator

ANALYSE, REPORT, ENABLE ACCESS / ENRICH, SECURE, CONFISCATION

Datastream X

Datastream Y
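The two processing rates quoted for XIRAF (1 TB in 24 hrs) and HANSKEN (1 TB per 20 min) can be compared directly. The sketch below assumes that processing time scales linearly with data volume, and the 8 TB seizure size is a hypothetical input, not a case figure.

```python
# Processing rates quoted on the XIRAF and HANSKEN slides, in seconds per terabyte.
SECONDS_PER_TB_XIRAF = 24 * 60 * 60
SECONDS_PER_TB_HANSKEN = 20 * 60


def processing_hours(terabytes: float, seconds_per_tb: int) -> float:
    """Hours needed to process the given volume, assuming linear scaling."""
    return terabytes * seconds_per_tb / 3600


speedup = SECONDS_PER_TB_XIRAF / SECONDS_PER_TB_HANSKEN

SEIZED_TB = 8  # hypothetical seizure size
xiraf_hours = processing_hours(SEIZED_TB, SECONDS_PER_TB_XIRAF)
hansken_hours = processing_hours(SEIZED_TB, SECONDS_PER_TB_HANSKEN)

print(f"speed-up: {speedup:.0f}x")
print(f"XIRAF: {xiraf_hours:.0f} h, HANSKEN: {hansken_hours:.1f} h")
```

On these numbers the raw processing rate improves by a factor of 72, so a volume that takes 8 days at the XIRAF rate takes under 3 hours at the HANSKEN rate.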


Evolution forensic analysis – automation, speed & coverage

Approach      Import     Processing                   Throughput  Percentages on slide
Conventional  manual     manual                       months      50% / 50%
XIRAF         manual     automated                    weeks       70% / 30%
HANSKEN       automated  automated, massive-parallel  hours       85% / 15%


Examples hypotheses in digital forensic science

• Has the computer been hacked or not?
• Has the email been sent or not?
• Has the USB device been plugged in or not?
• Was the phone at this location or at the location presented by the defence?
• Has the child pornography been sent from the suspect's computer or not?
• Was the child pornography photographed with this camera or with another camera?


Challenge: data is not self-explanatory

Add models and analysis to support interpretation

• Scenario analysis

• Timeline analysis

• Geographical models: e.g. location of cell phones

• Analysis of images / video / audio

– Size

– Speed

– Face recognition

– Speech recognition

• Author recognition

Interpret data


Digital Camera Identification

The process of:

• Linking images to the source camera

• Linking images to images in a database to determine a common source


Casework links


Casework

• Example where it worked


Bayesian

Question: were the images made with the seized camera?

Conclusion

The findings of the investigation are:

Equally likely

Somewhat more likely

More likely

Much more likely

Very much more likely

if H1 is true than if H2 is true.

The findings are very much more likely if the seized camera took the child pornographic image than if another camera took the image.
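The verbal conclusions above express a likelihood ratio: the probability of the findings under H1 divided by their probability under H2. The sketch below shows that ratio, how it updates prior odds, and a mapping to the verbal phrases. The numeric cut-offs in `ILLUSTRATIVE_SCALE` are placeholders invented for this example; the slides give no thresholds, and a real laboratory scale may differ.

```python
from typing import List, Tuple


def likelihood_ratio(p_findings_given_h1: float, p_findings_given_h2: float) -> float:
    """LR = P(findings | H1) / P(findings | H2)."""
    if p_findings_given_h2 <= 0:
        raise ValueError("P(findings | H2) must be positive")
    return p_findings_given_h1 / p_findings_given_h2


def posterior_odds(prior_odds: float, lr: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    return prior_odds * lr


# Hypothetical upper bounds for each phrase; NOT taken from the slides.
ILLUSTRATIVE_SCALE: List[Tuple[float, str]] = [
    (2.0, "equally likely"),
    (10.0, "somewhat more likely"),
    (100.0, "more likely"),
    (10_000.0, "much more likely"),
]


def verbal_conclusion(lr: float) -> str:
    """Map an LR of at least 1 (findings support H1) to a verbal phrase."""
    for upper_bound, phrase in ILLUSTRATIVE_SCALE:
        if lr < upper_bound:
            return phrase
    return "very much more likely"


example_lr = likelihood_ratio(0.5, 0.25)
print(example_lr, verbal_conclusion(example_lr))
```

The expert reports only the likelihood ratio; the prior odds, and therefore the posterior odds, are left to the court.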


Large Scale Camera Identification

• Sorting photos by source
• Identify photos from the same source (camera)
• New valuable information and insight

Panda


Scan → Extract → Compare → Cluster → Explore

Sorting Images by Source

Scan

4320x3240 1024x768

Sorted by resolution and directory


Extract

PRNU noise patterns (fingerprints)
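A PRNU fingerprint is commonly estimated from the noise residual of an image, that is, the image minus a denoised version of itself. The sketch below illustrates that idea in plain Python on a tiny greyscale grid. The 3x3 mean filter is a simple stand-in chosen for this example (published PRNU methods typically use stronger, wavelet-based denoising), and the function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def mean_filter_3x3(image: Image) -> Image:
    """Denoise by averaging each pixel with its neighbours (edges use fewer pixels)."""
    height, width = len(image), len(image[0])
    smoothed = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbours = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbours) / len(neighbours))
        smoothed.append(row)
    return smoothed


def noise_residual(image: Image) -> Image:
    """Residual = image - denoised(image); scene content is largely suppressed."""
    smoothed = mean_filter_3x3(image)
    return [
        [pixel - smooth for pixel, smooth in zip(row, smooth_row)]
        for row, smooth_row in zip(image, smoothed)
    ]


# A flat grey patch with one slightly brighter pixel in the centre.
patch = [[100.0, 100.0, 100.0], [100.0, 109.0, 100.0], [100.0, 100.0, 100.0]]
residual = noise_residual(patch)
print(residual[1][1])
```

In practice residuals from many images of the same camera are averaged, so that random noise cancels and the sensor-specific pattern remains.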


Compare

Images compared to all images
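Comparing every image with every other image means n(n-1)/2 comparisons for n images. The sketch below shows that all-pairs step using normalized cross-correlation between fingerprints stored as flat lists. The choice of similarity measure is an assumption for this example (other measures, such as peak-to-correlation energy, are also used for PRNU), and the fingerprint values are made up.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def normalized_correlation(a: List[float], b: List[float]) -> float:
    """Correlation coefficient between two equally long fingerprints."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [value - mean_a for value in a]
    centred_b = [value - mean_b for value in b]
    denominator = math.sqrt(
        sum(value * value for value in centred_a)
        * sum(value * value for value in centred_b)
    )
    if denominator == 0:
        return 0.0
    return sum(x * y for x, y in zip(centred_a, centred_b)) / denominator


def compare_all(fingerprints: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of images exactly once."""
    return {
        (first, second): normalized_correlation(fingerprints[first], fingerprints[second])
        for first, second in combinations(sorted(fingerprints), 2)
    }


# Made-up fingerprints: "a" and "b" share a pattern, "c" does not.
fingerprints = {
    "a": [1.0, -1.0, 1.0, -1.0],
    "b": [2.0, -2.0, 2.0, -2.0],
    "c": [1.0, 1.0, -1.0, -1.0],
}
scores = compare_all(fingerprints)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Because the number of pairs grows quadratically with the number of images, this step dominates the cost for large collections.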


Cluster

Images grouped by source

threshold = 0.001
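One simple way to group images by source from pairwise scores is to link every pair whose score reaches a threshold and take the connected groups. The sketch below does this with a small union-find structure. It is only one possible clustering approach, not necessarily the one behind the slide. The slide shows threshold = 0.001 without saying which score it applies to, so the scores and the cut-off of 0.5 here are made up for illustration.

```python
from typing import Dict, List, Tuple


def cluster_by_threshold(
    items: List[str],
    scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group items that are linked by at least one score >= threshold."""
    parent = {item: item for item in items}

    def find(item: str) -> str:
        # Follow parent links to the group representative, shortening the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for (first, second), score in scores.items():
        if score >= threshold:
            parent[find(first)] = find(second)

    groups: Dict[str, List[str]] = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


# Made-up similarity scores between four images.
images = ["img1", "img2", "img3", "img4"]
pair_scores = {
    ("img1", "img2"): 0.90,
    ("img1", "img3"): 0.02,
    ("img1", "img4"): 0.00,
    ("img2", "img3"): 0.01,
    ("img2", "img4"): 0.03,
    ("img3", "img4"): 0.80,
}
clusters = cluster_by_threshold(images, pair_scores, threshold=0.5)
print(clusters)
```

The threshold trades false links against missed links, so in casework it would need to be validated on images of known origin.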

Sorting Images by Source: also GPU / social networks, also deep learning applied


Facial comparison


NIST test of faces in the wild


Other examples of deep learning

- manipulation detection
- face morphing / deepfakes
- court findings finding irregularities


Discussion

Rafferty said: “Cost-cutting and outsourcing has put the administration of justice at risk ... I don’t think it’s bad faith by the police. They have been under-resourced. They are swamped. In some of my cases it’s the police who have revealed material that’s helpful to the defence.”

Collie, the head of Discovery Forensics in London who mainly works for defendants, said: “The odds are stacked against the defence in many ways. We rarely get access to the actual piece of equipment. In the past I could go to the police station and see a phone or a computer and physically check it’s the right piece. Now everything comes prepackaged and is handed over on a hard drive or USB stick.”


Collapsed rape prosecutions

December: Liam Allan

The first case to be abandoned due to the failure by police to hand over crucial digital evidence was that of London student Liam Allan, 22, in December. Allan was charged with 12 counts of rape and sexual assault, but his trial was abandoned after police were ordered to hand over phone records that should have already been provided to the defence.

December: Isaac Itiary

Shortly before Christmas, an alleged child rapist, Isaac Itiary, 25, was cleared at Inner London crown court when the prosecution offered no evidence. Material recovered from the phone of the complainant by police was only handed over to defence lawyers shortly before it was due to come to trial.

January: Oliver Mears

In January, Oliver Mears, 19, a student at Oxford University, was charged with the alleged rape of a teenage woman in 2015 following


• Explain Deep Learning in court

• Bias in Model

• Training of users

• Anti-forensic software

Challenges


Questions

Zeno Geradts

z.geradts@nfi.minvenj.nl