Analysis 360: Blurring the line between EDA and PC Andrea Gibson, Product Director, Kroll Ontrack March 27, 2014



Discussion Overview

Pushing the Boundaries of Early Data Analysis (EDA)

Examining Traditional EDA Tools

Leveraging Predictive Coding (PC) for Analysis

Using PC in an EDA Environment


Pushing the Boundaries of EDA


EDA | An acronym worth defining

Early Data Analysis (EDA) aids fact-finding and narrows the data scope by helping attorneys understand their datasets:
» Triage data into critical and non-critical groupings
» Identify key players and reduce their number
» Test search terms
» Identify critical case arguments
» Categorize documents as efficiently as possible for production

A true methodology: technology fuels human decisions
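Testing search terms usually starts with hit counts: how many documents each term reaches, and how many documents only that term reaches. The Python sketch below assumes single-word terms, case-insensitive whole-word matching, and a small hypothetical in-memory collection; `term_hit_report` is an illustrative name, not a platform API.

```python
import re

def term_hit_report(docs, terms):
    """For each single-word search term, count the documents it hits and the
    documents it hits that no other term hits ("unique hits")."""
    # Which terms does each document contain? (case-insensitive, whole words)
    matched = {}
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        matched[doc_id] = {t for t in terms if t.lower() in words}

    report = {}
    for term in terms:
        hit = [d for d, found in matched.items() if term in found]
        unique = [d for d in hit if matched[d] == {term}]
        report[term] = {"hits": len(hit), "unique_hits": len(unique)}
    # Documents no term reaches: candidates for the non-critical grouping.
    untouched = [d for d, found in matched.items() if not found]
    return report, untouched

# Hypothetical mini-collection and term list.
docs = {
    "DOC-001": "Merger timeline attached for the board",
    "DOC-002": "Lunch order for Friday",
    "DOC-003": "Board call re: merger pricing",
    "DOC-004": "Pricing sheet v2",
}
report, untouched = term_hit_report(docs, ["merger", "pricing", "board"])
for term, counts in report.items():
    print(term, counts)
print("no hits:", untouched)
```

A term with many hits but no unique hits adds little on top of the others, while a term with many unique hits deserves a closer look before it is kept or dropped.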


Traditional EDA | Overview

Identify, Collect & Process
» Processing

Analysis: Import & Perform Early Analysis
» Filter
» Search
» Cluster
» Test
» QC

Analysis: Export to Review Platform
» Ensure portability of groups and tags
» Ensure production/search capabilities of review platform

Document Review
» Search
» Tag
» Redact
» Log
» Route
» Report


Where does Predictive Coding fit in?

Identify, Collect & Process
» Processing

Analysis: Import & Perform Analysis
» Filter
» Search
» Cluster
» Test
» QC

Analysis: Export to Review Platform
» Ensure portability of groups and tags
» Ensure production/search capabilities of review platform

Document Review (Predictive Coding!)
» Search
» Tag
» Redact
» Log
» Route
» Report


Traditional EDA | How efficient is it?

Identify, Collect & Process

Analysis: Import & Perform Analysis
» Filter
» Search
» Cluster
» Test
» QC

Analysis: Export to Review Platform
» Ensure portability of groups and tags
» Ensure production/search capabilities of review platform

Review (Predictive Coding!)
» Search
» Tag
» Redact
» Log
» Route
» Report

The Bermuda Triangle of ediscovery
» PC is massively underused
» The tools used during analysis and review overlap substantially
» Jockeying data between two standalone platforms creates pointless inefficiencies


EDA + Review | Could it look like this?

Identify, Collect & Process
» Process

Analyze and Review
» PC
» Filter
» Search
» Cluster
» Test
» QC
» Route
» Report
» Tag


Examining Traditional EDA Tools


Keyword Search & Concept Search

Keyword Search
» Uses search terms and Boolean operators (&, or, not) to retrieve documents that contain those exact terms
» Standard practice
» Generally accepted in the courts
Example: “baseball & field”
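A keyword search evaluates each document against a Boolean expression. A minimal Python sketch, assuming single-word terms, `&` for AND, lowercase `or`/`not`, and no parentheses or phrases; real review platforms support much richer query syntax, and `matches` is an illustrative name only.

```python
import re

def matches(query, text):
    """Evaluate a flat Boolean query such as 'baseball & field' against one document.

    Assumed syntax for this sketch: single-word terms joined by '&' (AND) and
    'or', with an optional 'not' before a term. NOT binds tightest, then &,
    then OR. Parentheses and phrases are not supported.
    """
    if not query.strip():
        return False
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # OR has the lowest precedence: the document matches if any clause holds.
    for clause in re.split(r"\s+or\s+", query.strip(), flags=re.IGNORECASE):
        clause_ok = True
        for term in clause.split("&"):
            parts = term.lower().split()
            if not parts:
                continue
            negated = parts[0] == "not" and len(parts) > 1
            present = parts[-1] in words
            # A plain term must be present; a negated term must be absent.
            if present == negated:
                clause_ok = False
                break
        if clause_ok:
            return True
    return False

docs = {
    "D1": "Field notes from the baseball game",
    "D2": "Baseball cards for sale",
    "D3": "Soccer field permit",
}
print([d for d, t in docs.items() if matches("baseball & field", t)])
print([d for d, t in docs.items() if matches("baseball & not field", t)])
print([d for d, t in docs.items() if matches("baseball or soccer", t)])
```

Only exact terms count: a document about a "diamond" and an "MLB" game would not match "baseball & field", which is the gap concept search tries to close.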

Concept Search
» A technology alternative to keyword search
» Allows reviewers to find documents with conceptually similar terms even if they do not contain the exact search terms
» Seldom used for filtering; increasingly used for review
Example: “baseball” also surfaces diamond, MLB, hit, out
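Concept search can be approximated by expanding a query with related terms. This sketch uses a hypothetical hand-written concept map built from the slide's "baseball" example; production engines derive such associations statistically from the collection itself (latent semantic indexing is one common technique), so treat `CONCEPTS` and `concept_search` as illustrations only.

```python
import re

# Hypothetical concept map taken from the slide's example. A real concept-search
# engine would learn these associations from the collection rather than from a
# hand-written table.
CONCEPTS = {"baseball": {"diamond", "mlb", "hit", "out"}}

def concept_search(term, docs, concepts=CONCEPTS):
    """Return ids of documents containing the term or any related term,
    ranked by how many of those terms each document contains."""
    expanded = {term.lower()} | concepts.get(term.lower(), set())
    scored = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        overlap = len(words & expanded)
        if overlap:
            scored.append((-overlap, doc_id))  # negate so more overlap sorts first
    return [doc_id for _, doc_id in sorted(scored)]

docs = {
    "D1": "MLB trade deadline memo",
    "D2": "Quarterly budget review",
    "D3": "He was out at second on a hit to the diamond",
}
print(concept_search("baseball", docs))
```

None of these documents contains the word "baseball", so an exact keyword search would return nothing, while the expanded query finds two of them.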


Topic Grouping & Language Identification

Topic Grouping
» Documents automatically grouped by theme without human input (example groups: Finance, Contract)

Language Identification
» Identify all languages in a document
» Used to group and sort documents for review by multilingual reviewers
(Example documents: 非披露協議 “non-disclosure agreement”; كتيب الموظف “employee handbook”)
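Both tools can be sketched in a few lines of Python. Topic grouping here is a simple greedy clustering by vocabulary overlap (Jaccard similarity), chosen for brevity rather than taken from any product; the `threshold` value is an assumption. The language step only detects writing systems from Unicode character names, which is a rough proxy: CJK characters alone do not distinguish Chinese from Japanese, and Latin script covers many languages, so real language identification relies on statistical models.

```python
import re
import unicodedata

def word_set(text):
    return set(re.findall(r"\w+", text.lower()))

def group_by_topic(docs, threshold=0.3):
    """Greedy single-pass grouping with no human input: each document joins the
    first group whose seed document shares enough vocabulary (Jaccard
    similarity >= threshold); otherwise it seeds a new group."""
    groups = []  # list of (seed word set, member ids)
    for doc_id, text in docs.items():
        words = word_set(text)
        for seed, members in groups:
            union = seed | words
            if union and len(seed & words) / len(union) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((words, [doc_id]))
    return [members for _, members in groups]

def scripts_in(text):
    """Return the writing systems present, taken from the first word of each
    letter's Unicode name (e.g. 'LATIN', 'ARABIC', 'CJK')."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}

# Hypothetical documents: two finance-themed, two contract-themed.
docs = {
    "F1": "quarterly revenue forecast and budget",
    "C1": "contract renewal terms and signature",
    "F2": "revenue forecast budget update",
    "C2": "contract terms signature page",
}
print(group_by_topic(docs))
print(scripts_in("非披露協議 كتيب الموظف Contract"))
```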


Email Threading & Near Deduplication

Email Threading
» Identifies and groups e-mail conversations based on content
(Start-point email → RE: → FWD: → end-point email)

Near Deduplication
» Reviewers can quickly identify and compare documents that are very similar to one another but are not exact duplicates
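Both steps can be sketched in Python. The threading function groups messages only by subject after stripping RE:/FW:/FWD: prefixes, a deliberate simplification: threading tools described as content-based also compare message bodies and headers. Near-duplicate detection uses 5-character shingles and Jaccard similarity; the shingle size and the 0.8 threshold are illustrative assumptions, and all function names are hypothetical.

```python
import re
from collections import defaultdict

REPLY_PREFIX = re.compile(r"^\s*(re|fw|fwd)\s*:\s*", re.IGNORECASE)

def thread_key(subject):
    """Strip any chain of leading RE:/FW:/FWD: prefixes and normalize case."""
    while REPLY_PREFIX.match(subject):
        subject = REPLY_PREFIX.sub("", subject, count=1)
    return subject.strip().lower()

def thread_emails(emails):
    """emails: iterable of (message_id, subject). Groups messages by thread key."""
    threads = defaultdict(list)
    for msg_id, subject in emails:
        threads[thread_key(subject)].append(msg_id)
    return dict(threads)

def shingles(text, k=5):
    """Overlapping k-character substrings of the whitespace-normalized text."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(len(t) - k + 1, 1))}

def near_duplicates(docs, threshold=0.8):
    """Return pairs of document ids whose shingle sets have Jaccard similarity
    >= threshold: very similar, but not necessarily identical."""
    ids = list(docs)
    sets = {d: shingles(docs[d]) for d in ids}
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            union = sets[a] | sets[b]
            if union and len(sets[a] & sets[b]) / len(union) >= threshold:
                pairs.append((a, b))
    return pairs

emails = [("m1", "Q3 budget"), ("m2", "RE: Q3 budget"),
          ("m3", "FWD: RE: Q3 budget"), ("m4", "Lunch")]
print(thread_emails(emails))

docs = {
    "A": "Please review the attached draft agreement and send comments to legal by Friday.",
    "B": "Please review the attached draft agreement and send comments to legal by Monday.",
    "C": "Lunch order for the team offsite on Thursday.",
}
print(near_duplicates(docs))
```

The pairwise loop is quadratic in the number of documents; large collections typically use hashing schemes such as MinHash to avoid comparing every pair.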


Finding a Common Thread

At their cores, these tools help attorneys learn more about their data
» Does PC fit the bill?

(Diagram: the analytical tools Topic Group, Keyword Search, Language ID, Dedupe, Email Threading, and Concept Search, alongside Predictive Coding)


Leveraging PC for Analysis


Predictive Coding for Production


Predictive Coding for Analysis

PC has been praised for its ability to reduce the number of documents manually reviewed during first-pass review.

But at least three critical components of PC give attorneys unrivaled knowledge about their case:
» Prioritization
» Categorization
» Active Learning


The Prioritization Component

» Learns from reviewer decisions and escalates documents based on two binary categories: responsive or non-responsive
» Works after a modest amount of training
» Increases the ratio of responsive documents that get routed to reviewers

(Chart: 74,000 responsive vs. 480,000 non-responsive documents)
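The deck does not say which classifier drives prioritization, so the sketch below swaps in a textbook multinomial naive Bayes model with add-one smoothing, trained on hypothetical reviewer decisions. It shows the general idea only: learn from responsive/non-responsive calls, then rank unreviewed documents by predicted responsiveness. `train`, `responsive_score`, and `prioritize` are illustrative names.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def train(decisions):
    """decisions: list of (text, responsive) pairs from reviewers."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, responsive in decisions:
        word_counts[responsive].update(tokens(text))
        doc_counts[responsive] += 1
    return word_counts, doc_counts

def responsive_score(text, model):
    """Multinomial naive Bayes estimate of P(responsive | text), add-one smoothed."""
    word_counts, doc_counts = model
    vocab = len(set(word_counts[True]) | set(word_counts[False]))
    n_docs = sum(doc_counts.values())
    log_p = {}
    for label in (True, False):
        total = sum(word_counts[label].values())
        lp = math.log((doc_counts[label] + 1) / (n_docs + 2))
        for w in tokens(text):
            lp += math.log((word_counts[label][w] + 1) / max(total + vocab, 1))
        log_p[label] = lp
    # Normalize the two log scores into a probability (subtract max for stability).
    top = max(log_p.values())
    p_resp = math.exp(log_p[True] - top)
    p_non = math.exp(log_p[False] - top)
    return p_resp / (p_resp + p_non)

def prioritize(unreviewed, model):
    """Order unreviewed document ids from most to least likely responsive."""
    return sorted(unreviewed, key=lambda d: responsive_score(unreviewed[d], model),
                  reverse=True)

decisions = [
    ("merger pricing board", True),
    ("merger timeline", True),
    ("lunch menu", False),
    ("holiday party lunch", False),
]
model = train(decisions)
unreviewed = {"U1": "lunch plans", "U2": "merger pricing update"}
print(prioritize(unreviewed, model))
```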


The Prioritization Component

How does this help attorneys analyze their case?
» When attorneys ‘check out’ documents to review, they see the documents most likely to be responsive
» For the same reasons this speeds up production, attorneys who put eyes on these richly relevant documents will know more about their case earlier, which drives arguments and fills knowledge gaps
» It runs in the background, so you don’t need to carve into billable hours to test keywords

(Diagram: a review batch requested from the entire corpus)
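Checking out a batch of the most-likely-responsive documents can be sketched as a top-N selection over corpus-wide scores. The function name, the scores, and the assumption that scores are refreshed in the background after each round of reviewer decisions are all illustrative.

```python
import heapq

def request_batch(scores, reviewed, batch_size):
    """Return up to batch_size unreviewed document ids, highest predicted
    responsiveness first. `scores` covers the entire corpus and is assumed to
    be refreshed in the background as new reviewer decisions arrive."""
    candidates = ((score, doc_id) for doc_id, score in scores.items()
                  if doc_id not in reviewed)
    return [doc_id for _, doc_id in heapq.nlargest(batch_size, candidates)]

# Hypothetical scores for a five-document corpus; D1 has already been reviewed.
scores = {"D1": 0.92, "D2": 0.15, "D3": 0.78, "D4": 0.66, "D5": 0.40}
print(request_batch(scores, reviewed={"D1"}, batch_size=2))
```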


The Categorization Component

» Learns from trainer decisions and suggests coding on multiple categories for an entire collection of documents
» Assigns a predicted responsiveness score
» Improves the speed and quality of categorization decisions

(Graphic: example documents coded Responsive, Non-responsive, and Privileged, with predicted scores such as 75%, 67%, and 89%)
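Multi-category suggestions can be sketched as thresholding one predicted score per category. The per-category scores and the 0.5 threshold below are hypothetical; how a given PC product produces and presents these scores is not specified in the deck.

```python
def suggest_codes(predictions, threshold=0.5):
    """predictions: {doc_id: {category: predicted probability}}, e.g. from one
    classifier per category. Suggests every category whose score meets the
    threshold; the trainer then confirms or overrides each suggestion."""
    return {
        doc_id: sorted(cat for cat, p in scores.items() if p >= threshold)
        for doc_id, scores in predictions.items()
    }

# Hypothetical per-category scores for two documents.
predictions = {
    "DOC-17": {"Responsive": 0.81, "Privileged": 0.58, "Hot": 0.12},
    "DOC-18": {"Responsive": 0.22, "Privileged": 0.04, "Hot": 0.01},
}
print(suggest_codes(predictions))
```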


The Categorization Component

How does this help attorneys analyze their case?
» Allows attorneys to segregate data at user-defined predicted responsiveness ratings after modest training
» Empowers attorneys to route certain categories of documents (e.g., “hot” docs) to specific sub-groups within the team

(Chart: Post Round One Categorization Results, 65% cutoff; x-axis: % likelihood to be responsive, 0% to 100%; 1,427 docs and 9,522 docs fall on either side of the cutoff)

(Example: a hot document routed to Brief-writer Bryan, “Re: Good luck on the first draft!”)
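Segregating at a user-defined cutoff and routing categories to sub-groups might look like the sketch below. The 65% cutoff mirrors the slide's example; the routing table, team names, and document scores are hypothetical.

```python
def segregate(scores, cutoff=0.65):
    """Split documents at a user-defined predicted-responsiveness cutoff."""
    at_or_above = sorted(d for d, s in scores.items() if s >= cutoff)
    below = sorted(d for d, s in scores.items() if s < cutoff)
    return at_or_above, below

# Hypothetical routing table: which team sub-group receives which category.
ROUTES = {"Hot": "brief-writing team", "Privileged": "privilege team"}

def route(doc_categories, routes=ROUTES, default="general review"):
    """Send each document to the first sub-group (in ROUTES order) whose
    category it carries; everything else goes to the default queue."""
    return {
        doc_id: next((team for cat, team in routes.items() if cat in cats), default)
        for doc_id, cats in doc_categories.items()
    }

scores = {"A": 0.90, "B": 0.30, "C": 0.65}
print(segregate(scores))
print(route({"A": {"Hot", "Responsive"}, "B": {"Responsive"}, "C": {"Privileged"}}))
```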


The Active Learning Component

Key component of any true PC solution
» Automatically escalates focus documents for training (as opposed to only handpicked or only randomly selected training documents)

Focus documents:
» Come from grey areas in the classifier, where the machine is currently uncertain whether they are responsive or non-responsive
» Are ideal candidates to improve machine learning
» Are not random, but queried

(Graphic: a scale from 0% non-responsive to 100% responsive in 10% steps)
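Picking focus documents from the grey area corresponds to what the machine-learning literature calls uncertainty sampling. A minimal sketch, assuming a single responsiveness score per document and treating "closest to 50%" as "most uncertain"; the deck does not specify the exact query strategy a given PC tool uses, and `focus_documents` is a hypothetical name.

```python
def focus_documents(scores, reviewed, n):
    """Uncertainty sampling: return the n unreviewed documents whose predicted
    responsiveness is closest to 0.5, where the classifier is least certain."""
    pool = sorted((abs(s - 0.5), d) for d, s in scores.items() if d not in reviewed)
    return [d for _, d in pool[:n]]

# Hypothetical scores after an early training round.
scores = {"A": 0.97, "B": 0.52, "C": 0.08, "D": 0.41, "E": 0.63}
print(focus_documents(scores, reviewed=set(), n=2))
```

Documents A and C are skipped even though they are unreviewed: the machine is already confident about them, so a human decision there teaches it little.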


The Active Learning Component

How does this help attorneys analyze their case?
» Introduces attorneys to the documents on the fringe of relevancy
– These could be case-changing documents that the machine just doesn’t know enough about yet
» Most effective way to boost metrics and improve results between early training rounds
– Reduces false positives; improves the accuracy of the machine’s concept of relevancy

(Chart: precision and recall compared across training rounds TR1 and TR2)


Additional Efficiencies

Production
» Can easily transition into production, whether leveraging PC or not
– Most practical form of PC for EDA

Reporting
» Even if only one or two training rounds are performed, metrics will show where you stand
– In this vein, no other EDA tool comes close to PC’s automatic reporting
– There’s a reason courts often ask for recall and precision: these indicate whether your understanding of the data set is accurate
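Precision is the share of documents the machine calls responsive that truly are responsive, TP / (TP + FP); recall is the share of truly responsive documents the machine finds, TP / (TP + FN). A short sketch computing both from a hypothetical validation sample; recomputing them after each training round is one way to see where you stand.

```python
def precision_recall(predicted, actual):
    """predicted: ids the machine calls responsive; actual: ids reviewers
    confirmed as responsive (e.g. in a validation sample).
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical validation sample.
predicted = {"D1", "D2", "D3", "D4"}
actual = {"D1", "D2", "D5"}
print(precision_recall(predicted, actual))
```

Here half of the machine's responsive calls are correct (precision 0.5), and it finds two of the three truly responsive documents (recall of about 0.67).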


Additional Efficiencies

Other ECA tools complement predictive coding
» Predictive coding requires reviewing a few thousand documents in training
– Most PC solutions also come equipped with all the other EDA tools
– These help you navigate the training set as well as the review

Intra-team quality control
» Compare reviewer-machine agreement rates side by side
» Identify points of disagreement and inconsistency
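Side-by-side agreement can be sketched by comparing each reviewer's calls with the machine's on the documents both have coded. Reviewer names, calls, and function names below are hypothetical; the disagreement list is the starting point for quality control.

```python
def agreement(reviewer_calls, machine_calls):
    """Compare one reviewer's responsive/non-responsive calls with the machine's
    on the documents both have coded. Returns (agreement rate, disagreeing ids)."""
    shared = sorted(reviewer_calls.keys() & machine_calls.keys())
    disagree = [d for d in shared if reviewer_calls[d] != machine_calls[d]]
    rate = 1 - len(disagree) / len(shared) if shared else 0.0
    return rate, disagree

def side_by_side(calls_by_reviewer, machine_calls):
    """Agreement report for every reviewer on the team."""
    return {name: agreement(calls, machine_calls)
            for name, calls in calls_by_reviewer.items()}

machine = {"D1": True, "D2": False, "D3": True, "D4": False}
team = {
    "Reviewer A": {"D1": True, "D2": False, "D3": True, "D4": False},
    "Reviewer B": {"D1": True, "D2": True, "D3": False, "D4": False},
}
print(side_by_side(team, machine))
```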


Additional Efficiencies

The small case conundrum
» The analytical value of PC is greater when the subject-matter expert who trains the system is the same attorney who is forming case strategy
– This is most likely true in small to medium cases, where one attorney may be in charge of a case through trial
» The production value of using PC to aid review is greater when high upfront costs can be recouped by applying the machine’s logic to a large number of documents
– Traditionally, this has been true only in large cases


Additional Efficiencies

This is all changing

The “portfolio approach” to ediscovery
» Pay a yearly data hosting fee that covers PC (and everything that precedes it) in all your cases, with processing done on the vendor’s side
– Upload on day one, train on day one, and see a list of documents ranked by relevancy on day one


Using PC in an EDA Environment


Overview | It’s not that crazy
» EDA tools let you learn more about your data, and so does PC
» Many of the tools discussed today (e.g., de-duplication, concept searching) already exist in standalone “PC solutions”

Aggressive culling via keywords can affect training in PC

Any search strategy must be well designed for the matter at hand

The producing party is given substantial deference in conducting its search


In re Biomet
» Defendant’s search strategy: a pre-PC keyword cull reduced 19.5 million documents to 3 million, which were then run through PC for production
» Plaintiffs argued: the defendant should have used PC on the whole 19.5 million document corpus because the keywords tainted the training. “We want joint review of training docs.”
» Court held: the defendant’s search was reasonable

(Diagram: 19.5 million documents → keyword cull → 3 million documents → PC → production)


Parting Thoughts

There are many ways to learn about data
» Different tools on the same belt; multi-modal search

Solutions are emerging that offer all of these tools in one location
» No more data jockeying
» More information for better decisions

Quality control is essential whenever you use one of these tools to remove documents from production
