
Catalyst_Reeling in the Red Herrings of TAR


Page 1

877.557.4273

catalystsecure.com

Reeling In the Red Herrings of TAR
Evidence vs. FUD

WEBINAR

Tom Gricks Mark Noel

Speakers

Page 2

Speakers

A prominent e-discovery lawyer and one of the nation's leading authorities on the use of TAR in litigation, Tom advises corporations and law firms on best practices for applying Catalyst’s TAR technology, Insight Predict, to reduce the time and cost of discovery. He has more than 25 years’ experience as a trial lawyer and in-house counsel, most recently with the law firm Schnader Harrison Segal & Lewis, where he was a partner and chair of the e-Discovery Practice Group.

Tom Gricks
Managing Director, Professional Services, Catalyst

Mark specializes in helping clients use technology-assisted review, advanced analytics, and custom workflows to handle complex and large-scale litigations. He also works with Catalyst’s research and development group on new litigation technology tools. Before joining Catalyst, Mark was an intellectual property litigator with Latham & Watkins LLP, co-founder of an e-discovery software startup, and a research scientist at Dartmouth College’s Interactive Media Laboratory.

Mark Noel
Managing Director, Professional Services, Catalyst

Page 3

Today’s Agenda

Page 4

Key Points for Today

Rely on evidence and data, not intuition or FUD

Be clear about your goals and the “why”

Measure everything against the same standard

FUD (fear, uncertainty, and doubt)

Page 5

Example 1

Page 6

Example 1 (improved by editing)

Page 7

Recall
“How complete was my catch?”

[Diagram: the Search Results set overlaid on Relevant and Not Relevant documents]

Page 8

Precision
“How pure was my catch?”

[Diagram: the Search Results set overlaid on Relevant and Not Relevant documents]
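To make the fishing metaphor concrete, here is a minimal Python sketch of both measures; the document IDs and relevance calls are hypothetical:

    # Hypothetical example: compute recall and precision for a search result set.
    relevant = {"doc1", "doc2", "doc3", "doc4", "doc5"}     # truly relevant documents
    search_results = {"doc2", "doc3", "doc9", "doc10"}      # what the search returned

    true_positives = relevant & search_results              # relevant docs we caught
    recall = len(true_positives) / len(relevant)            # how complete was the catch?
    precision = len(true_positives) / len(search_results)   # how pure was the catch?

    print(f"Recall: {recall:.0%}, Precision: {precision:.0%}")   # Recall: 40%, Precision: 50%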

Page 9

The 1st Problem with Linear Review

Disagreement among reviewers

Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011), http://jolt.richmond.edu/v17i3/article11.pdf; Ellen M. Voorhees, Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, 36 Info. Processing & Mgmt 697 (2000).

Page 10

The 2nd Problem with Linear Review
Inconsistent, often poor results

[Chart: review results plotted by Recall (x-axis) and Precision (y-axis)]

Recall: the % of relevant documents from the Collection Set that are in the Review Set

Precision: the % of documents in the Review Set that are truly relevant (the balance of the Review Set is junk)

Page 11

The 1st Problem with Keyword Search
It can become overly complex

• (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)

Jason R. Baron, Through A Lawyer’s Lens: Measuring Performance in Conducting Large Scale Searches Against Heterogeneous Data Sets in Satisfaction of Litigation Requirements, University of Pennsylvania Workshop (October 26, 2006).
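To see why nested Boolean queries become hard to reason about, here is a deliberately simplified sketch of the kind of include/exclude logic the query above encodes. The terms, documents, and naive substring matching are hypothetical; this is not any vendor's search engine:

    # Simplified sketch of Boolean keyword logic; matching is naive substring search.
    def hits(doc_text, term):
        return term.lower() in doc_text.lower()

    def matches(doc_text):
        include = ((hits(doc_text, "master settlement agreement") or hits(doc_text, "msa"))
                   and not hits(doc_text, "medical savings account"))
        exclude = hits(doc_text, "tobacco") or hits(doc_text, "cigarette")
        return include and not exclude   # every added clause interacts with every other clause

    docs = ["Draft of the master settlement agreement terms",
            "MSA benefits under the medical savings account plan"]
    print([matches(d) for d in docs])    # [True, False]

With only four terms the logic is already non-obvious; the real query above has dozens of interacting clauses, each of which can silently include or exclude documents the drafters never anticipated.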

Page 12

The 2nd Problem with Keyword Search
Generally poor results

[Chart: keyword search results plotted by Recall (x-axis) and Precision (y-axis)]

Recall: the % of relevant documents from the Collection Set that are in the Review Set

Precision: the % of documents in the Review Set that are truly relevant (the balance of the Review Set is junk)

Page 13

The 2nd Problem with Keyword Search
Generally poor results

• Attorneys worked with experienced paralegals to develop search terms. Upon finishing, they estimated that they had retrieved at least three quarters (75%) of all relevant documents.

• What they actually retrieved: roughly 20%

Blair & Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, Comm. ACM (1985).
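The gap is easy to state as arithmetic. Only the 75% estimate and the roughly 20% measured recall come from the study cited above; the document counts below are invented for illustration:

    # Illustrative arithmetic only; counts are hypothetical.
    retrieved_relevant = 2_000          # relevant documents the searches actually found
    believed_recall = 0.75              # attorneys' estimate of completeness
    actual_recall = 0.20                # what careful measurement later showed

    believed_total = retrieved_relevant / believed_recall   # universe assumed to exist
    actual_total = retrieved_relevant / actual_recall       # universe that actually existed
    print(f"Assumed universe: {believed_total:,.0f} docs; actual universe: {actual_total:,.0f} docs")
    print(f"Relevant documents left behind: {actual_total - retrieved_relevant:,.0f}")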

Page 14

Physiological Factors

• “Justice is what the judge ate for breakfast” – Jerome Frank

• Review accuracy may be what the junior associates ate for lunch

• Executive function is influenced by blood glucose levels, breaks, positive mood, and viewing pictures of nature.

[Figure: Proportion of rulings in favor of the prisoners by ordinal position. Danziger S et al., PNAS 2011;108:6889-6892. ©2011 by National Academy of Sciences]

Danziger, Levav and Avnaim-Pesso, Extraneous Factors in Judicial Decisions, PNAS (2011).

Page 15
Page 16

The Current State of the Law
So Where Are We Today?

• United States Federal Court — Rio Tinto v. Vale (2015, Peck, J.): “In the three years since Da Silva Moore, the case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.”

• United States Tax Court — Dynamo Holdings v. Commissioner (2014)

• State Court — Global Aerospace (2012, Loudoun County, VA)

• Great Britain — Pyrrho Investments (2016); Brown v. BCA Trading (2016)

• Ireland — IRBC v. Quinn (2015)

Page 17

Example 2

Page 18

Allaying Concerns Over Definition

• Widely Disparate Views

• Grossman-Cormack Glossary: “A process for Prioritizing or Coding a Collection of Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection.”

• Everything else — clustering, threading, near-dupe, even keyword search

• An Easy Solution

• Address your concerns in an ESI Case Management Order

• Focus primarily on validation

Page 19

Three Major Components of a TAR System

• Feature Selection: e.g., latent semantic whatever, words, n-grams, metadata

• Algorithm: e.g., SVM, k-nearest neighbor, Bayesian inference, logistic regression

• Process (Workflow): e.g., Continuous Active Learning
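To make the three components concrete, here is a minimal, generic sketch of a TAR-style loop in Python with scikit-learn. It is illustrative only, not Insight Predict or any particular product: TF-IDF word features, a logistic regression algorithm, and a simple continuous-active-learning-style workflow. The documents and coding decisions are hypothetical.

    # Generic sketch: features = TF-IDF, algorithm = logistic regression,
    # process = review the top-ranked document, code it, re-rank, repeat.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    import numpy as np

    docs = ["pricing memo for the merger", "fantasy football picks",
            "merger due diligence checklist", "lunch menu for friday",
            "draft merger agreement terms", "weekend plans"]
    true_labels = [1, 0, 1, 0, 1, 0]            # stand-in for attorney review decisions

    X = TfidfVectorizer().fit_transform(docs)   # feature selection
    labeled = {0: 1, 1: 0}                      # seed judgments from a reviewer
    while len(labeled) < len(docs):
        model = LogisticRegression().fit(X[list(labeled)], [labeled[i] for i in labeled])
        unreviewed = [i for i in range(len(docs)) if i not in labeled]
        scores = model.predict_proba(X[unreviewed])[:, 1]    # algorithm ranks the rest
        nxt = unreviewed[int(np.argmax(scores))]             # process: review top-ranked next
        labeled[nxt] = true_labels[nxt]                      # reviewer codes it; model re-learns
    print("Review order:", list(labeled))

Real systems differ in all three slots, which is exactly the slide's point: two tools can both be "TAR" and still behave very differently.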

Page 20
Page 21

Example 3

Page 22

Cranfield Model (1966)

1. Assemble a test collection:
   • Document corpus
   • Expression of user information need
   • Relevance judgments, aka ground truth

2. Choose an Effectiveness Metric

3. Vary the TAR system
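As a concrete illustration of the Cranfield approach (a minimal sketch; the document IDs, rankings, and judgments are hypothetical), the collection and relevance judgments stay fixed while only the retrieval system varies, and every system is scored with the same effectiveness metric:

    # Cranfield-style comparison: fixed collection and judgments, vary the system,
    # score everything with one metric (here, recall in the top k).
    ground_truth = {"d1", "d4", "d7"}                      # relevance judgments

    def recall_at_k(ranking, k):
        """Effectiveness metric: share of relevant docs found in the top k."""
        return len(set(ranking[:k]) & ground_truth) / len(ground_truth)

    system_a = ["d1", "d2", "d4", "d3", "d7", "d5"]        # output of TAR configuration A
    system_b = ["d2", "d3", "d5", "d1", "d6", "d4"]        # output of TAR configuration B

    for name, ranking in [("A", system_a), ("B", system_b)]:
        print(name, round(recall_at_k(ranking, k=3), 2))   # A 0.67, B 0.0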

Page 23

Frequency of Re-ranking (Case 1)

[Chart legend: Perfect, 10 minutes, Daily, Weekly, Linear]

Page 24

Frequency of Re-ranking (Case 2)

[Chart legend: Perfect, 10 minutes, Daily, Weekly, Linear]

Page 25

Frequency of Re-ranking (Case 3)

[Chart legend: Perfect, 10 minutes, Daily, Weekly, Linear]

Page 26

Frequency of Re-ranking (Case 4)

[Chart legend: Perfect, 10 minutes, Daily, Weekly, Linear]
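The four cases above compare how quickly relevant documents surface when the ranking is refreshed at different intervals. As a rough illustration of what such a gain curve measures (a sketch on a simulated collection, not the data plotted in the charts), here is a minimal Python example comparing the "Perfect" and "Linear" endpoints; the intermediate re-ranking frequencies fall between them:

    # Gain curve sketch: cumulative recall as a function of documents reviewed.
    # The collection is simulated (10% prevalence); numbers are illustrative only.
    import random

    labels = [1] * 100 + [0] * 900                 # hypothetical collection, 10% relevant

    def gain_curve(order):
        """Cumulative recall after each document reviewed, in the given order."""
        found, total, curve = 0, sum(labels), []
        for idx in order:
            found += labels[idx]
            curve.append(found / total)
        return curve

    perfect = sorted(range(len(labels)), key=lambda i: -labels[i])   # relevant docs first
    linear = random.sample(range(len(labels)), len(labels))          # shuffled, like linear review

    # Recall after reviewing 100 documents under each ordering:
    print("Perfect:", gain_curve(perfect)[99])                # 1.0
    print("Linear:", round(gain_curve(linear)[99], 2))        # roughly 0.1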

Page 27
Page 28

Example 4

Recall is a bad metric for TAR or e-discovery. It only reflects the “easy to find” documents.

Page 29

Recall and the Metrics of Validation

• Looking at typical metrics

• Recall

• Precision

• Challenges to recall as a validation measure

• Recall only measures mass, not distribution

• Recall can’t guarantee finding the “smoking gun”

• Recall doesn’t address probative weight
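For context on what validation typically looks like in practice, here is a sketch of one common approach: a simple elusion-style sample of the discard pile. It is not any particular court-approved protocol, and every count below is hypothetical:

    # Sample the discard pile, estimate how many relevant documents it still holds
    # ("elusion"), and derive an approximate recall figure.
    produced_relevant = 8_000            # relevant docs found in the reviewed/produced set
    discard_pile_size = 90_000           # documents the review will not produce
    sample_size = 1_500                  # random sample drawn from the discard pile
    relevant_in_sample = 30              # relevant docs the sample turned up

    elusion = relevant_in_sample / sample_size                 # share of discard pile still relevant
    est_missed = elusion * discard_pile_size                   # ~1,800 relevant docs missed
    est_recall = produced_relevant / (produced_relevant + est_missed)
    print(f"Estimated elusion: {elusion:.1%}, estimated recall: {est_recall:.1%}")
    # A real protocol would also report a confidence interval around these point estimates.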

Page 30

Recall Only Reflects “Easy to Find” Documents

• The theory

• You can easily achieve 80% recall with duplicates, finding only predominant themes, but missing documents from a wide array of other topics

• The facts

• “Reasonable inquiry” not thematic distribution

• ALWAYS have the right to challenge deficiencies

• Know your tools

• CAL achieves multi-faceted recall (Grossman & Cormack, SIGIR 2015)

• Catalyst Predict incorporates Contextual Diversity
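A small sketch of the "mass vs. distribution" point the bullets above address: overall recall can look healthy while an entire topic goes unfound. The topic names and counts are hypothetical; diversity-oriented sampling, such as the contextual diversity feature mentioned above, is aimed at exactly these untouched pockets.

    # Overall recall vs. per-topic recall; topics and counts are invented for illustration.
    relevant_by_topic = {"pricing": 700, "shipping": 250, "side agreements": 50}
    found_by_topic = {"pricing": 690, "shipping": 110, "side agreements": 0}

    overall_recall = sum(found_by_topic.values()) / sum(relevant_by_topic.values())
    print(f"Overall recall: {overall_recall:.0%}")            # 80%
    for topic in relevant_by_topic:
        topic_recall = found_by_topic[topic] / relevant_by_topic[topic]
        print(f"  {topic}: {topic_recall:.0%}")               # side agreements: 0%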

Page 31

Recall Can Miss the “Smoking Gun”

• The theory

• You can’t guarantee finding the smoking gun with a recall metric

• The facts

• You never could guarantee finding a smoking gun

“I’m in. I’ll be shredding ’till 11am so I should have plenty of time to make it.”

• Unlikely pre-planned indecipherability — “documents have friends”

• Again, Contextual Diversity guards against unseen words

Page 32

Recall Doesn’t Account for Probative Weight

• The theory

• By focusing on mass, you don’t address whether “important” documents are produced

• The facts

• Production has NEVER been done that way

• It’s not required by ANY Rules of Civil Procedure

• Most certainly work product

• Disagreement among opposing counsel

Page 33

“The first principle is that you must not fool yourself, and you are the easiest person to fool.”

– Richard Feynman

Page 34
Page 35

Abraham Flexner

Evidence-based medicine

Page 36
Page 37

Questions & Discussion

You may use the chat feature at any time to ask questions

Mark [email protected]

Tom [email protected]

Page 38

We hope you’ve enjoyed this discussion.
Thank You!

A prominent e-discovery lawyer and one of the nation's leading authorities on the use of TAR in litigation, Tom advises corporations and law firms on best practices for applying Catalyst’s TAR technology, Insight Predict, to reduce the time and cost of discovery. He has more than 25 years’ experience as a trial lawyer and in-house counsel, most recently with the law firm Schnader Harrison Segal & Lewis, where he was a partner and chair of the e-Discovery Practice Group.

Tom Gricks
Managing Director, Professional Services, Catalyst

Mark specializes in helping clients use technology-assisted review, advanced analytics, and custom workflows to handle complex and large-scale litigations. He also works with Catalyst’s research and development group on new litigation technology tools. Before joining Catalyst, Mark was an intellectual property litigator with Latham & Watkins LLP, co-founder of an e-discovery software startup, and a research scientist at Dartmouth College’s Interactive Media Laboratory.

Mark Noel
Managing Director, Professional Services, Catalyst

[email protected]

[email protected]