Data-Driven Discovery: Seven Metrics for Smarter Decisions and Better Results


877.557.4273 | catalystsecure.com

Data-Driven Discovery: Seven Metrics for Smarter Decisions & Better Results

Speakers

Robert Ambrogi, Esq. | Host & Moderator
Jeremy Pickens, Ph.D.
Mark Noel, Esq.

APRIL WEBINAR

Today’s Agenda

§ Being Empirical 101
  § Kaizen, the scientific method, and getting good data
  § How much do you really need to do or know to be effective?
§ A data-driven look at seven phases of discovery
  § What variables might be in play?
  § Any variables that seem plausible but don’t have much effect?
  § When it might be a good idea to measure
  § What/how to measure
  § Other experiments or research
§ Questions / open discussion

Speakers

Jeremy Pickens, Ph.D. | Chief Scientist, Catalyst

Mark Noel, Esq. | Managing Director, Professional Services, Catalyst

Robert Ambrogi, Esq. | Director of Communications, Catalyst

Jeremy is one of the world's leading search scientists and a pioneer in the field of collaborative exploratory search. He has six patents pending in the field of search and information retrieval, including two for collaborative exploratory search systems. At Catalyst, Jeremy researches and develops methods of using collaborative search and other techniques to enhance search and review within the Catalyst system and help clients achieve more intelligent and precise results in e-discovery search and review.

Mark specializes in helping clients use technology-assisted review, advanced analytics, and custom workflows to handle complex and large-scale litigations. He also works with Catalyst’s research and development group on new litigation technology tools. Before joining Catalyst, Mark was an intellectual property litigator with Latham & Watkins LLP, co-founder of an e-discovery software startup, and a research scientist at Dartmouth College’s Interactive Media Laboratory.

Bob is a practicing lawyer in Massachusetts and is the former editor-in-chief of The National Law Journal, Lawyers USA and Massachusetts Lawyers Weekly. A fellow of the College of Law Practice Management, he writes the award-winning blog LawSites and co-hosts the legal-affairs podcast Lawyer2Lawyer. He is a regular contributor to the ABA Journal and is vice chair of the editorial board of the ABA’s Law Practice magazine.

Abraham Flexner | Evidence-based medicine

Fundamentals

§ Spot the testable question – don’t guess
§ Good experimental design – don’t make it up as you go along
§ Control variables
§ Assume data will be noisy – you may need several matters
§ Measure early and often

“The first principle is that you must not fool yourself, and you are the easiest person to fool.”

– Richard Feynman

Cranfield Model (1966)

1. Assemble a test collection:
   § Document Corpus
   § Expression of User Information Need
   § Relevance Judgments, aka ground truth

2. Choose an Effectiveness Metric

3. Vary the TAR system

Training Protocol (Condition 1 vs. Condition 2 vs. Condition 3)

§ Document Corpus: Corpus Z (same for all three conditions)
§ Starting Condition (e.g., seed documents, ad hoc query): [docid:7643 = true], [docid:225 = true] (same for all three)
§ Feature (Signal) Extraction: Character n-grams (same for all three)
§ Ranking Engine: Logistic Regression (same for all three)
§ Training/Review Protocol: SPL vs. SAL vs. CAL
§ Ground Truth: [docid:7643 = true], [docid:225 = true], [docid:42 = false] (same for all three)
§ Evaluation Metric: Precision@75% recall (same for all three)
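Each condition in these designs is scored with the same effectiveness metric, Precision@75% recall. As a rough illustration only (not Catalyst's implementation), the sketch below computes that metric from a ranked list of ground-truth labels; the SPL and CAL rankings shown are hypothetical.

```python
# Minimal sketch: precision at a fixed recall level, computed from a ranked
# document list whose entries are ground-truth relevance labels.

def precision_at_recall(ranked_labels, target_recall=0.75):
    """ranked_labels: list of booleans (True = relevant), in ranked order."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    needed = target_recall * total_relevant
    found = 0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            found += 1
        if found >= needed:
            return found / rank        # precision at the cutoff that reaches target recall
    return found / len(ranked_labels)

# Hypothetical rankings produced by two different training protocols on the same corpus:
spl_ranking = [True, False, True, False, True, True]
cal_ranking = [True, True, True, False, True, False]
print(precision_at_recall(spl_ranking), precision_at_recall(cal_ranking))
```

Because the corpus, seeds, features, ranking engine, ground truth, and metric are held constant, any difference in the scores can be attributed to the one variable that changes: the training protocol.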

Okay, but what about all us non-scientists?

1. Targeted Collections

Factors to consider:

§ Number of custodians
§ Number or type of collection sources
§ Sophistication, reasonableness, or tenacity of opposing side
§ Likelihood of unrelated but sensitive material being scooped up
§ Capabilities of targeted collection tools

1. Targeted Collections

Example: Generating and validating boolean terms

§ Process and review random samples from initial custodians
§ Use TAR to sort into “likely positive” and “likely negative” populations
§ Use TAR to also generate or supplement a list of potential search terms
§ Run a report to compare each term’s hit count or hit density in each of the populations (a sketch of this report follows the example terms below)

Example terms: “Raptor”, “Payments”
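A minimal sketch of the hit-density report described above. The document strings and the likely_pos/likely_neg populations are hypothetical; only the example terms come from the slide.

```python
# Minimal sketch: for each candidate term, compare hit density in the TAR
# "likely positive" vs. "likely negative" populations. Documents are assumed
# to be plain-text strings.

def hit_density(term, docs):
    hits = sum(1 for d in docs if term.lower() in d.lower())
    return hits / len(docs) if docs else 0.0

def term_report(terms, likely_positive, likely_negative):
    rows = []
    for term in terms:
        pos = hit_density(term, likely_positive)
        neg = hit_density(term, likely_negative)
        # A high positive-to-negative ratio suggests a useful boolean term.
        ratio = pos / neg if neg else float("inf")
        rows.append((term, pos, neg, ratio))
    return sorted(rows, key=lambda r: r[3], reverse=True)

# Hypothetical populations sorted by TAR:
likely_pos = ["raptor payments schedule", "raptor entity structure", "quarterly payments"]
likely_neg = ["lunch schedule", "payments to vendors", "fantasy football"]
for term, p, n, r in term_report(["raptor", "payments"], likely_pos, likely_neg):
    print(f"{term}: {p:.2f} in likely positive vs {n:.2f} in likely negative (ratio {r:.1f})")
```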

2. Culling

Factors to consider:

§ Black list vs. white list search terms
§ File type, date range, etc.
§ Can we get a stip?
§ Do we need to validate in order to avoid a Biomet problem?
§ Is it even worth doing in light of what we’re doing next?

Problem with Keyword Search

§ Attorneys worked with experienced paralegals to develop search terms. Upon finishing, they estimated that they had retrieved at least three quarters of all relevant documents.
§ What they actually retrieved: only about 20 percent of the relevant documents (Blair & Maron, 1985).

Generally Poor Results

Blair & Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System (1985).

Problem with Keyword Search

§ (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)

Jason R. Baron, Through A Lawyer’s Lens: Measuring Performance in Conducting Large Scale Searches Against Heterogeneous Data Sets in Satisfaction of Litigation Requirements, University of Pennsylvania Workshop (October 26, 2006).

It can become overly complex

2. Culling

Potential Metrics:

§ Sample review to validate recall or elusion (see the sketch below)
§ Total cost of culling effort vs. total cost to promote additional documents to review
§ Using TAR, additional non-relevant docs might not get reviewed anyway
§ Developing and validating extensive culling terms requires a lot of human effort
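For the first metric, here is a minimal sketch of an elusion estimate, assuming a random sample of the culled-out population is reviewed by a human. The review_fn callback and the 385-document sample size are illustrative assumptions, not a prescribed protocol.

```python
# Minimal sketch: estimate elusion by reviewing a random sample drawn from the
# population that the culling terms excluded.

import random

def elusion_estimate(excluded_doc_ids, review_fn, sample_size=385):
    """review_fn(doc_id) -> True if the reviewer codes the document relevant.
    385 is roughly a 95% confidence / +-5% margin sample for large populations."""
    sample = random.sample(excluded_doc_ids, min(sample_size, len(excluded_doc_ids)))
    relevant = sum(1 for doc_id in sample if review_fn(doc_id))
    elusion = relevant / len(sample)                     # rate of relevant docs left behind
    estimated_missed = elusion * len(excluded_doc_ids)   # projected count in the whole excluded set
    return elusion, estimated_missed

# Hypothetical use with a dummy reviewer that finds nothing relevant:
print(elusion_estimate(list(range(50_000)), review_fn=lambda doc_id: False))
```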

2. Culling
Example: Manual search and culling vs. TAR / CAL

§ Manual development and validation of search terms: two weeks (with two associates) to cull 700,000 documents; cost: 160 associate hours
§ Letting TAR do the work: one day (with a 12-person review team) to review 6,000 more docs; cost: 100 review hours + technology cost of 700,000 * $0.01
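A back-of-the-envelope comparison of the two rows above. The hourly rates are assumptions added for illustration; only the hour counts, document counts, and per-document technology fee come from the slide.

```python
# Rough cost comparison using the figures on the slide.
# Hourly rates are assumptions for illustration, not from the presentation.

ASSOCIATE_RATE = 300   # assumed $/hour for an associate
REVIEWER_RATE = 60     # assumed $/hour for a contract reviewer

manual_cost = 160 * ASSOCIATE_RATE                    # 160 associate hours over two weeks
tar_cost = 100 * REVIEWER_RATE + 700_000 * 0.01       # 100 review hours + per-doc technology fee

print(f"Manual search-term culling: ${manual_cost:,.0f}")
print(f"TAR / CAL alternative:      ${tar_cost:,.0f}")
```

Under these assumed rates the manual approach costs $48,000 against $13,000 for the TAR path, before accounting for the two-week vs. one-day difference in elapsed time.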

3. ECA & Investigation

Factors to consider:

§ Know what we’re looking for?
§ Possible intent to evade search?
§ Time/resource constraints?
§ Blair & Maron – almost 50 topics, 75% recall on 1, but many with less than 3% recall.
§ TREC 2016 will have a topical recall track.

3. ECA & Investigation

Potential Metrics:

§ Number of different search techniques
§ Total time, total docs, or percentage of docs required to reach a defensible outcome

4. Review

Factors to consider:

§ Richness estimates
  § Population overall
  § Batch richness / review precision
§ Family or doc level review
  § Factors for and against each
  § Overall richness
  § Average family size
  § Workflow and tool capabilities
§ Dependence on context to make relevance judgments

4. Review

More Factors to consider:

§ Review rate
  § Relevant vs. Non-relevant
  § Threading
  § Clustering / TAR
  § Heterogeneity
§ Complexity
  § Number of coding fields
  § Separating variables in coding field
  § Bifurcated workflows (e.g., special file types)

4. Review

Potential Metrics:

§ The usual suspects
  § Population richness
  § Batch richness
  § Docs per hour
  § Relevant docs per review hour
§ Review precision by day (“yield”)
§ Review precision by reviewer (see the sketch below)
§ A/B Testing
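A minimal sketch of several of these metrics (docs per hour and review precision, broken out by reviewer and by day), computed from a hypothetical coding log. The field names are assumptions, not an actual review-platform export.

```python
# Minimal sketch: docs per hour and review precision ("yield") by reviewer and day,
# from a hypothetical coding log with one record per reviewed document.

from collections import defaultdict

def review_metrics(coding_log):
    """coding_log: iterable of dicts with keys 'reviewer', 'day', 'minutes', 'relevant'."""
    buckets = defaultdict(lambda: {"docs": 0, "relevant": 0, "minutes": 0.0})
    for rec in coding_log:
        b = buckets[(rec["reviewer"], rec["day"])]
        b["docs"] += 1
        b["relevant"] += int(rec["relevant"])
        b["minutes"] += rec["minutes"]
    for (reviewer, day), b in sorted(buckets.items()):
        rate = b["docs"] / (b["minutes"] / 60)     # docs per hour
        precision = b["relevant"] / b["docs"]      # review precision ("yield")
        print(f"{reviewer} {day}: {rate:.0f} docs/hr, review precision {precision:.0%}")

# Hypothetical log entries:
review_metrics([
    {"reviewer": "Reviewer A", "day": "Day 1", "minutes": 1.5, "relevant": True},
    {"reviewer": "Reviewer A", "day": "Day 1", "minutes": 0.9, "relevant": False},
    {"reviewer": "Reviewer B", "day": "Day 1", "minutes": 2.0, "relevant": True},
])
```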

5. TAR and Analytics

Factors to consider:

§ Re-using seeds
§ Which/how many judgmental seeds
§ Artificial seeds
§ Variations on weighting
§ Richness constraints?
§ Training protocol
§ Frequency of re-ranking (a CAL-style sketch follows below)
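To make the training-protocol and re-ranking factors concrete, here is a highly simplified CAL-style loop (a sketch, not Catalyst's system): rank the unreviewed documents, have reviewers code the top batch, retrain, and repeat. The model, vectorize, and human_review names are placeholders; the model is assumed to expose scikit-learn-style fit/predict_proba. The rerank_every parameter corresponds to the re-ranking frequency compared in the experiment below.

```python
# Highly simplified CAL-style loop (illustrative sketch only).
# `model`, `vectorize`, and `human_review` are placeholders; `model` is assumed
# to expose scikit-learn-style fit()/predict_proba().

def cal_loop(docs, seed_judgments, model, vectorize, human_review,
             batch_size=100, rerank_every=1):
    """docs: {doc_id: text}; seed_judgments: {doc_id: bool (relevant?)}."""
    judgments = dict(seed_judgments)
    ranking = list(docs)                       # arbitrary order until the first re-rank
    batches = 0
    while len(judgments) < len(docs):
        if batches % rerank_every == 0:        # controls re-ranking frequency
            ids = list(judgments)
            model.fit([vectorize(docs[d]) for d in ids], [judgments[d] for d in ids])
            unreviewed = [d for d in docs if d not in judgments]
            scores = model.predict_proba([vectorize(docs[d]) for d in unreviewed])[:, 1]
            ranking = [d for _, d in sorted(zip(scores, unreviewed), reverse=True)]
        batch = [d for d in ranking if d not in judgments][:batch_size]
        for d in batch:
            judgments[d] = human_review(d)     # reviewer codes the next most-likely docs
        batches += 1
    return judgments
```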

5. TAR and Analytics
Frequency of updates (Condition 1 vs. Condition 2)

§ Document Corpus: Corpus Z (both conditions)
§ Starting Condition (e.g., seed documents, ad hoc query): [docid:7643 = true], [docid:225 = true] (both)
§ Feature (Signal) Extraction: 1-grams (both)
§ Ranking Engine: Logistic Regression (both)
§ Training/Review Protocol: CAL, re-ranked once a day vs. CAL, re-ranked every 10 min.
§ Ground Truth: [docid:7643 = true], [docid:225 = true], [docid:42 = false] (both)
§ Evaluation Metric: Precision@75% recall (both)

5. TAR and Analytics
Example: Frequency of Re-ranking (results charts for Case 1, Case 2, Case 3, and Case 4)

5. TAR and Analytics
Condition 1 (English corpus) vs. Condition 2 (Mixed Japanese + English corpus)

§ Document Corpus: Condition 1 = English Corpus; Condition 2 = Mixed Japanese + English Corpus
§ Starting Condition (e.g., seed documents, ad hoc query): Condition 1 = [docid:7643 = true], [docid:225 = true]; Condition 2 = [docid:9356 = false], [docid:89 = true]
§ Feature (Signal) Extraction: 1-grams, no cross-language features (both)
§ Ranking Engine: Condition 1 = Logistic Regression; Condition 2 = Support Vector Machine
§ Training/Review Protocol: CAL (both)
§ Ground Truth: Condition 1 = [docid:7643 = true], [docid:225 = true], [docid:42 = false]; Condition 2 = [docid:9356 = false], [docid:89 = true], [docid:42 = false]
§ Evaluation Metric: Precision@75% recall (both)

6. Quality Control

Factors to consider:

§ How much is enough?
§ Random or systematic?
§ Validate?
§ Base level of disagreement inherent in human review, or significant?
§ Disagreement due to expertise differences between first pass and QC team (tendency of less expert reviewers to over-mark and err on the side of relevant)
§ Relevance drift

Disagreement Among Reviewers

Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII Rich. J.L. & Tech. 11 (2011), http://jolt.richmond.edu/v1713/articlee11.pdf; Ellen M. Voorhees, Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, 36 Info. Processing & Mgmt. 697 (2000).

6. Quality Control

Potential metrics:

§ Overturn rate in a validation sample
§ Overturn yield (see the sketch below)
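A minimal sketch of these two metrics, computed from a QC sample of first-pass calls re-reviewed by the QC team. Overturn yield is computed here as overturns per QC hour, which is an assumed definition; the slide does not spell out the formula.

```python
# Minimal sketch of the QC overturn metrics above.

def overturn_metrics(qc_sample, qc_hours):
    """qc_sample: list of (first_pass_call, qc_call) boolean pairs."""
    overturns = sum(1 for first, qc in qc_sample if first != qc)
    overturn_rate = overturns / len(qc_sample)
    # Assumed definition of overturn yield: overturned calls per QC review hour.
    overturn_yield = overturns / qc_hours if qc_hours else 0.0
    return overturn_rate, overturn_yield

# Hypothetical QC pass over 200 sampled docs taking 10 hours:
sample = [(True, True)] * 180 + [(True, False)] * 12 + [(False, True)] * 8
print(overturn_metrics(sample, qc_hours=10))   # -> (0.1, 2.0)
```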

Expert vs. Non-Expert Training (Condition 1 vs. Condition 2)

§ Document Corpus: Corpus Z (both conditions)
§ Starting Condition (e.g., seed documents, ad hoc query): Condition 1 = [docid:7643 = true], [docid:225 = false]; Condition 2 = [docid:7643 = true], [docid:225 = true]
§ Feature (Signal) Extraction: 1-grams (both)
§ Ranking Engine: Logistic Regression (both)
§ Training/Review Protocol: Condition 1 = CAL, using non-expert judgments; Condition 2 = CAL, using expert judgments
§ Ground Truth: [docid:7643 = true], [docid:225 = true], [docid:42 = false] (both)
§ Evaluation Metric: Precision@75% recall (both)

7. Validation

Factors to consider:

§ How confident do you need to be?
§ What are the boundaries of the process we need to validate?
§ Will one total recall number be sufficient, or do you also need some guarantee of topical completeness?
§ Who are you defending the process to, and what metrics do they care about?
§ Recall – people trying to find stuff (estimation sketch below)
§ Precision – people paying for stuff
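A minimal sketch of recall and precision estimated from a random validation sample coded by a senior reviewer, with a simple normal-approximation confidence interval on recall. The sample counts are hypothetical, and the interval method is an assumption rather than anything a particular court or protocol requires.

```python
# Minimal sketch: recall and precision from a random validation sample, with a
# normal-approximation confidence interval on recall.

import math

def validate(sample, z=1.96):
    """sample: list of (predicted_relevant, actually_relevant) boolean pairs
    drawn at random from the full population."""
    tp = sum(1 for pred, act in sample if pred and act)
    fp = sum(1 for pred, act in sample if pred and not act)
    fn = sum(1 for pred, act in sample if not pred and act)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    n_relevant = tp + fn                         # relevant docs found in the sample
    margin = z * math.sqrt(recall * (1 - recall) / n_relevant) if n_relevant else 0.0
    return recall, precision, (recall - margin, recall + margin)

# Hypothetical 400-doc validation sample:
sample = ([(True, True)] * 60 + [(True, False)] * 20 +
          [(False, True)] * 15 + [(False, False)] * 305)
print(validate(sample))   # recall 0.80, precision 0.75, CI roughly (0.71, 0.89)
```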

7. Validation – Another Example: Review Precision

7. Validation – Example: Review Metrics for Outside Counsel

Questions & Answers

Jeremy Pickens, Ph.D. | jpickens@catalystsecure.com
Mark Noel, Esq. | mnoel@catalystsecure.com
Robert Ambrogi, Esq. | bambrogi@catalystsecure.com

You may use the chat feature at any time to ask questions.

Lowering Your Total Cost of Review Using Predictive Analytics

Thursday, May 12, 2016 | 2 p.m. Eastern

John Tredennick Michael Arkfeld David Stanton
