© 2012 Winston & Strawn LLP
eDiscovery and Technology Assisted Review: What You Need to Know Now
Brought to you by Winston & Strawn’s eDiscovery & Information Management Practice Group
Today’s eLunch Presenters
John Rosenthal, Chair, eDiscovery & Information Management, Washington, DC
Chris Costello, eDiscovery & Information Management, New York
Welcome!
Overview
Technology Assisted Review ("TAR")
• Definitions
• Types of technology assisted review
• Predictive coding
Proposed Changes to the Federal Rules
• Why the need for further changes in the rules
• Overview of the rules process
• Two-track approach of the Advisory Committee
• Rule 37
• Duke Conference – Rules 16 and 26
Let’s Look at the Numbers
94% of respondents found the cost of e-discovery “frustrating”
87% of respondents used an early case assessment to try to resolve matters earlier
81% of respondents brought software in-house, which helps to cut costs on law firm or service provider fees
52% of respondents brought staff in-house to help reduce fees spent on law firms or service providers
32% of respondents used clustering or visualization tools to speed review along (down from 34% in 2010)
71% of respondents used contract attorneys for legal review (down from 77% in 2010)
61% of respondents were able to quantify how much money they spent on e-discovery. Many companies are still unaware of their spending habits.
42% of respondents have a tool to collect and preserve data from the cloud or from social media
Source: FTI survey of 31 in-house general counsel
E-Discovery Spend
Source: Fulbright Annual Litigation Trends Survey
Electronic Document Review
Excessive and unpredictable costs: 58% to 70% of total litigation costs
Document review costs are rising due to the increasing amount of electronic information
Traditional document review is not accurate:
• Evidence suggests that there are high error rates in linear manual review
• Error rates increase the likelihood of inadvertent production of privileged or sensitive information
Inability to defend the review process: judges are increasingly focusing on the need for validation of review processes
Traditional Electronic Document Review = Linear Review
Over collection
Little or no culling
Ad hoc use of Boolean searches
Linear review of the data set
Use of traditional associate work force to perform review
Traditional approach: manually acquire broad amounts of data → process data → first-level review → second-level review → produced documents
Goals of ESI Review
Recall - Identification and prioritization of relevant material
Precision - Elimination of irrelevant/non-responsive material
Identification of privileged material
[Venn diagram: relevant data vs. retrieved data; the overlap is "relevant and retrieved," the remainders are "relevant and not retrieved" and "non-relevant and retrieved"]
Accuracy of Human Review
Recall = (responsive documents retrieved) ÷ (total responsive documents in the collection)
Precision = (responsive documents retrieved) ÷ (total documents retrieved)
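For illustration, the two measures above can be computed from a hypothetical review (the document IDs and counts here are invented):

```python
def recall(retrieved, responsive):
    """Fraction of all responsive documents that the review retrieved."""
    return len(retrieved & responsive) / len(responsive)

def precision(retrieved, responsive):
    """Fraction of retrieved documents that are actually responsive."""
    return len(retrieved & responsive) / len(retrieved)

# Hypothetical review: 10 responsive docs in the collection, 8 docs retrieved.
responsive = {f"DOC{i}" for i in range(1, 11)}  # DOC1..DOC10
retrieved = {"DOC1", "DOC2", "DOC3", "DOC4", "DOC5", "DOC6", "BAD1", "BAD2"}

print(recall(retrieved, responsive))     # 6 of 10 responsive found -> 0.6
print(precision(retrieved, responsive))  # 6 of 8 retrieved are responsive -> 0.75
```

Note the trade-off: retrieving everything drives recall to 100% while precision collapses, which is why both measures are tracked together.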
Accuracy of Human Review
[Scatter plot of precision (0–100%) vs. recall (0–100%), marking a "Perfection" point and data points labeled Blair & Maron (1985), Voorhees, Roitblat, and TREC]
The Sedona Conference Commentary on the Use of Search and Information Retrieval
“[T]here appears to be a myth that manual review by humans of large amounts of information is as accurate and complete as possible – perhaps even perfect – and constitutes the gold standard by which all searches should be measured. Even assuming that the profession had the time and resources to continue to conduct manual review of massive sets of electronic data sets (which it does not), the relative efficacy of that approach versus utilizing newly developed automated methods of review remains very much open to debate.”
2011 RAND Study re E-Discovery
“Taken together, this body of research shows that groups of human reviewers exhibit significant inconsistency when examining the same set of documents for responsiveness under conditions similar to those in large-scale reviews. . . . Human error in applying the criteria for inclusion appears to be the primary culprit [regarding the lack of accuracy], not a lack of clarity in the document’s meaning or ambiguity in how the scope of the production demand should be interpreted. In other words, people make mistakes, and based on the evidence, they make them regularly when it comes to judging relevancy or responsiveness.”
Ralph Losey Revised EDRM Model
Document Review Models
Outsourced Manual Review
• Most prominent model used today
• Limited culling and analysis
• Heavy reliance on attorney review
• Use of sampling to ensure quality control
Predictive Coding
• Great deal of confusion regarding what it means
• Uses attorneys to develop a seed set of data that can be fed into a black box to find similar documents
• Emphasizes sampling of inclusion set and exclusion set
• Only a handful of courts have addressed its use
Technology Assisted Reviews
• Process approach to review to increase efficiency, recall, and precision, using legally accepted tool sets:
• Threading
• Near-duping
• Advanced search
• Clustering
Technology Assisted Review
Meta-Data Context
Boolean Queries
Wildcard expansions
Proximity Specification
Misspellings/Fuzzy Search
Synonyms
Dupe and Near Dupe
Threading
Concept/clustering engines
LSI, LSA, PLSA
Predictive coding
Technology Assisted Reviews
Analytical
• Working with the client and data to develop a set of defensible "relevance criteria" to select data subject to review
Collection
• Use of search and retrieval at the front end can dramatically reduce volume and cost
• Risk considerations
Processing, Filtering and Culling
• Employ more sophisticated processing tools to further reduce the volume set
• Unilateral vs. negotiated
Non-Linear Review
• Clustering/concepting
• Threading
• Near-dupe
• Predictive coding
E-Mail Threads
• 70% of production is e-mail, and nearly 65% or more of it is part of e-mail threads
The problem:
• No clear method to identify e-mail threads
• E-mails are reviewed multiple times and inconsistently
• Extremely difficult to identify where e-mails are missing
Step 1 – group into e-mail sets
Step 2 – build the tree structure; identify missing links; suppress duplicates; focus on inclusives
Result: less cost, less time, fewer errors
Source: Equivio
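A toy illustration of step 1, grouping messages by normalized subject line (real threading engines also rely on message IDs and reply headers; the messages below are invented):

```python
import re

def thread_key(subject):
    """Normalize a subject line so replies/forwards join the original thread."""
    s = subject.strip().lower()
    # Strip any leading Re:/Fw:/Fwd: prefixes, which may be stacked.
    while True:
        m = re.match(r"^(re|fw|fwd)\s*:\s*", s)
        if not m:
            return s
        s = s[m.end():]

def group_threads(emails):
    """Group (subject, body) e-mails into threads keyed by normalized subject."""
    threads = {}
    for subj, body in emails:
        threads.setdefault(thread_key(subj), []).append((subj, body))
    return threads

emails = [
    ("Q3 forecast", "draft attached"),
    ("RE: Q3 forecast", "looks good"),
    ("FW: RE: Q3 forecast", "please review"),
    ("Lunch?", "noon?"),
]
threads = group_threads(emails)
print(len(threads))                  # 2 distinct threads
print(len(threads["q3 forecast"]))   # 3 messages in the forecast thread
```

Once grouped, a reviewer can read only the "inclusive" messages that contain the full thread instead of re-reading every reply.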
Duplication and Near-Duplication
• 15% to 40% of the document population are duplicates or near-duplicates
The problem:
• No clear method to organize and allocate documents across reviewers
• Documents are reviewed multiple times by different reviewers
• High risk of inconsistent coding among similar documents
Step 1 – group the near-duplicates; identify the differences among them
Step 2 – assign near-dupe sets to reviewers for coherent review; reviewers prioritize and review only the differences; apply coding to entire near-dupe sets where appropriate
Result: less cost, less time, fewer errors
Source: Equivio
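Step 1's grouping can be sketched with word shingles and Jaccard similarity, one common near-duplicate technique, though vendors' actual methods differ (the document texts and the 0.5 threshold are invented):

```python
def shingles(text, k=3):
    """Set of k-word shingles from a document's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    return len(a & b) / len(a | b)

def near_dupes(docs, threshold=0.5):
    """Pair up documents whose shingle overlap meets the threshold."""
    sh = {name: shingles(text) for name, text in docs.items()}
    names = sorted(sh)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if jaccard(sh[a], sh[b]) >= threshold]

docs = {
    "v1": "please find attached the signed merger agreement for review",
    "v2": "please find attached the signed merger agreement for comment",
    "memo": "quarterly sales figures are due on friday afternoon",
}
print(near_dupes(docs))  # [('v1', 'v2')]
```

The two drafts share 6 of 8 shingles (similarity 0.75), so they land in one review set; the unrelated memo does not.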
Clustering or Concepting
Concept search places a document or part of a document in this space.
Results are returned in order of relevance.
Higher score = closer document (e.g., Document 1: 98; Document 3: 92; Document 4: 91)
Source: kCura Corp.
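A bare-bones sketch of the relevance scoring above: real concept engines use LSI/PLSA-style analysis rather than the raw term vectors shown here, and the documents, query, and scores are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Score each document against the query on a 0-100 scale, highest first."""
    q = Counter(query.lower().split())
    scores = {name: round(100 * cosine(q, Counter(text.lower().split())))
              for name, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = {
    "doc1": "merger agreement price merger agreement",
    "doc2": "lunch menu for friday",
    "doc3": "draft merger price merger terms",
}
ranking = rank("merger price", docs)
print(ranking)  # doc3 and doc1 score high, doc2 scores 0
```

As on the slide, results come back in score order so reviewers see the closest documents first.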
“Predictive Coding”
What is Predictive Coding?
[Diagram: document set for review]
Source: Servient Inc. http://www.servient.com/
Use Cases for Predictive Coding
Early case assessment
Relevance inclusion
Relevance exclusion
Pre-review tagging
Pre-review batching
Privileged review
Review of incoming productions
Internal investigations
Limitations on Predictive Coding
As with any statistical model, caution should be exercised (“Torture numbers, and they’ll confess to anything”)
Garbage in = garbage out
Limitations:
• Not right for all types of cases
• Size matters
• Unable to address: images, graphics, Excel files, video, voice
• Confidentiality
Do you Need to Understand the Technology?
Very few individuals in the industry will ever understand the technology
Even fewer people would know how to attack the technology
Does the technology matter?
Not all TAR software is created equal
The same seed set put into different TAR software will yield vastly different results
“Muddy water is best cleared by leaving it alone.”
Alan Wilson Watts
Defending the Technology?
What is the basic underlying technology?
Support Vector Machines (SVM) (i.e., patterns are determined and categorized from positive examples (relevant documents) and negative examples (irrelevant documents), and new examples are classified in one category or the other based on whether these patterns appear in the new examples)
Probabilistic Latent Semantic Analysis (PLSA) (i.e., documents are categorized by detecting concepts through a statistical analysis of word contexts; documents are grouped based on probabilities of the number of times words occur together)
Other potential algorithms that generate correlations and categorizations
What has the vendor done to explain the technology?
What has the vendor done to defend the technology?
How can the technology be abused or misused?
What are its limitations?
Source: Technology Concepts & Design, Inc. (2012)
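The SVM idea described above, learning a boundary from positive (relevant) and negative examples and then classifying new documents against it, can be sketched in a few lines. This illustration substitutes a simple perceptron for a true SVM, and the seed documents and labels are invented:

```python
from collections import Counter

def featurize(text):
    """Bag-of-words feature counts for one document."""
    return Counter(text.lower().split())

def train(seed_set, epochs=10):
    """Learn per-term weights from coded seed documents.

    A plain perceptron stands in for the SVM: both learn a linear
    boundary from positive (relevant) and negative (irrelevant) examples.
    """
    w = Counter()
    for _ in range(epochs):
        for text, label in seed_set:      # label: +1 relevant, -1 not
            feats = featurize(text)
            score = sum(w[t] * c for t, c in feats.items())
            if score * label <= 0:        # misclassified: nudge the weights
                for t, c in feats.items():
                    w[t] += label * c
    return w

def classify(w, text):
    """Positive score means predicted relevant."""
    return sum(w[t] * c for t, c in featurize(text).items()) > 0

seed = [
    ("price fixing meeting with competitor", +1),
    ("competitor pricing discussion notes", +1),
    ("holiday party planning committee", -1),
    ("parking garage access form", -1),
]
w = train(seed)
print(classify(w, "notes from pricing meeting"))  # True (relevant)
print(classify(w, "holiday parking form"))        # False
```

Even this toy shows why seed-set quality matters: the weights are nothing more than patterns extracted from the examples the attorneys coded.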
Stages of a Predictive Coding Process
• Team selection
• Culling
• Selection of control set
• Iterative training of the system
• Selection of sensitivity
• Quality control of the remaining corpus
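The iterative-training stage can be pictured as a loop that retrains until a control-set metric stops moving. A minimal sketch; the accuracy numbers, the three-reading window, and the 0.02 tolerance are invented for illustration:

```python
def stabilized(history, window=3, tolerance=0.02):
    """True once the last `window` control-set accuracy readings move
    less than `tolerance`: a simple stand-in for a stabilization test."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tolerance

# Hypothetical control-set accuracy after each training iteration.
accuracy = [0.52, 0.61, 0.70, 0.74, 0.75, 0.755]
for i in range(1, len(accuracy) + 1):
    if stabilized(accuracy[:i]):
        print(f"stable after iteration {i}")
        break
```

The point is the shape of the process: early iterations improve sharply, later ones plateau, and the plateau (however a given tool defines it) is the signal to stop training.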
The Process
Who designed, implemented, and supervised the process?
What should your team look like?
• Senior partner?
• Contract attorneys?
How many people should be on the team?
The Process
Selection of the control set:
• Size?
• Random or targeted?
• Entire corpus or issue-driven?
• Entire documents or selected portions?
• Richness of the data?
Training the system:
• Iterations?
• How are conflicts resolved?
• Is it more important to focus on inclusive or exclusive documents?
Stabilization criteria
Source: Equivio
The Process
Sensitivity
Source: Equivio
The Process
Quality control of the remaining corpus:
• Written sampling protocol?
• How much do you look at?
• When do you need to retrain?
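One common answer to "how much do you look at" is the standard sample-size formula for estimating a proportion. A sketch; the 95% / ±5% defaults are illustrative conventions, not a legal standard:

```python
import math

def sample_size(confidence_z=1.96, margin=0.05, p=0.5):
    """Documents to sample to estimate a proportion (e.g., the rate of
    responsive documents left in the excluded set) at the given confidence
    level and margin of error, using the infinite-population formula."""
    return math.ceil(confidence_z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size())             # 385 docs for 95% confidence, +/- 5%
print(sample_size(margin=0.02))  # 2401 docs for 95% confidence, +/- 2%
```

Tightening the margin of error is expensive: halving it roughly quadruples the sample, which is why the protocol should fix these numbers in writing up front.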
Predictive Coding Decisions
• Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (S.D.N.Y. Feb. 24, 2012) (Peck) – "[t]he Court determined that the use of predictive coding was appropriate considering … the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches)."
• Global Aerospace v. Landow Aviation, No. CL 61040 (Va. Cir. Ct. Loudoun Co. Apr. 23, 2012) – Virginia state judge approved use of predictive coding where defendant stated it would achieve recall of 76.7%.
• Kleen Products v. Packaging Corp. (N.D. Ill.) (Nolan) – refused plaintiffs' request to force defendants to use predictive coding over search terms.
• In re Actos (Pioglitazone) Products Liability Litigation (W.D. La. 2012) (Doherty) – order setting forth a detailed protocol for the use of predictive coding.
Moore v. Publicis & MSL
Judge Peck – “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”
“The technology exists and should be used where appropriate, but it is not a case of machine replacing humans: it is the process used and the interaction of man and machine that the court needs to examine.”
Judge Peck’s Key Takeaways
Process
Transparency
Proportionality
Cooperation
Competence
Kleen Products, LLC v. Packaging Corp. of America, et al. (J. Nolan), N.D. Ill., Case No. 1:10-cv-05711 (filed Sept. 9, 2010)
• Antitrust class action filed in 2010
• Plaintiffs requested that defendants use predictive coding (Feb. 2012)
• Defendants (7 paper companies) had already produced over 1 million documents using traditional keyword-based search terms on key custodians
• Thousands of hours of review (99% complete)
• No glaring omissions
• 3 days of hearings
Global Aerospace v. Landow Aviation (J. Chamblin), Va. Cir. Ct. (Loudoun), No. CL-61040 (Apr. 23, 2012)
Protective order — defendants can use predictive coding to process and produce documents.
Explains that predictive coding meets duty under Virginia law to use reasonable inquiry and care in discovery.
Contrasts predictive coding with linear human review and keyword searches.
Takeaway — such opinions will dominate.
In re Actos (Pioglitazone) Products Liability Litigation, MDL No. 2299 (W.D. La. 2012) (Doherty)
Court issued ESI protocol utilizing predictive coding (Equivio Relevance)
Select 4 custodians for creation of the sample collection population; parties to select three "experts" to work collaboratively to train the system.
Parties to meet and confer on the relevance scores generated using sample collection and to decide on a “cutoff” score.
Iterative training phase until system reaches stability
Post Predictive Coding “meet and confer” to finalize method for searching for documents.
Results still a long way off.
Defense of Process
Legal Standard – does not exist
Documentation vs. Transparency
Transparency:
• Is it required?
• How much is too much?
Lessons Learned
Sedona Conference Cooperation Proclamation is gaining traction among judiciary, especially as it applies to TAR/predictive coding.
Discussions concerning use of predictive coding should occur early and often (e.g., disclosure of seed sets and process involved, discuss acceptable rates of recall and precision, number of iterations, etc.)
Counsel needs to be cognizant of the strengths and weaknesses of the various TAR/Predictive Coding software and prepared to discuss how best to implement it.
Clients should inquire as to use of predictive coding and appropriateness in case at hand and cost-saving potential.
Although Predictive Coding is not appropriate in all circumstances, courts are beginning to accept its use as a means to handle high volume, complex litigation where it can serve to reduce overall costs and increase likelihood of recall and precision.
Moving Forward
Expect to see more instances where Predictive Coding gets judicial stamp of approval.
Use of Predictive Coding continues moving to investigations and review of documents produced by opposing party to speed reviews.
Expect to see more instances where clients push for cost savings and benefits from using predictive coding.
More in-depth discussions of predictive coding methodologies, proportionality, and sharing of data between counsel prior to Rule 26(f) conferences, and longer, multi-day Rule 26(f) conferences as parties try to agree on protocols implementing TAR/Predictive Coding strategies.
Focus on the process and transparency of the software/Predictive Coding protocol.
Increased importance of developing highly trained and experienced “experts” to develop sample/seed sets.
Loss in revenue from linear review and shifting law firm approach to embrace new technologies/roles for lawyers and paralegals.
Update on Federal Rule Process
Overview of the rules process:
• Discovery Subcommittee (preservation – triggers, scope, sanctions)
• Duke Subcommittee (proportionality, cooperation, early case management)
Scope of potential amendments:
• Rule 1
• Rule 26
• Rule 16
• Rule 37
• Rule 45 (already in progress)
Federal Rules Process
2010:
• American College of Trial Lawyers study
• Sedona Conference on the Future of Civil Litigation
• Duke Conference on the Future of Civil Litigation
2011:
• Call for a "mini-conference" in September
• Mini-conference occurs in Dallas (Sept. 9)
• Submissions by law firms, corporations, and academia
2012:
• FJC Early Stages of Litigation Report
• RAND Report
• Sedona Conference proposed draft rules
Microsoft’s Comments
Preserved: 48,431,250 pages
Collected and processed: 12,915,000 pages
Reviewed: 645,750 pages
Produced: 141,450 pages
Used: 142 pages
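Each stage of Microsoft's funnel can be read as a yield against the prior stage; a quick calculation from the slide's numbers:

```python
# Page counts from Microsoft's comments, top of funnel to bottom.
funnel = [
    ("Preserved",               48_431_250),
    ("Collected and processed", 12_915_000),
    ("Reviewed",                   645_750),
    ("Produced",                   141_450),
    ("Used",                           142),
]
# Yield of each stage relative to the one before it.
for (stage, pages), (_, prev) in zip(funnel[1:], funnel):
    print(f"{stage}: {100 * pages / prev:.1f}% of the prior stage")
# End-to-end yield: pages actually used vs. pages preserved.
print(f"Used/preserved: {100 * funnel[-1][1] / funnel[0][1]:.5f}%")
```

Roughly a quarter of what is preserved gets processed, 5% of that is reviewed, and in the end only about three pages in every million preserved are used.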
Rule 37(e) Proposal
(e) FAILURE TO PRESERVE DISCOVERABLE INFORMATION. If a party fails to preserve discoverable information that reasonably should be preserved in the anticipation or conduct of litigation,
(1) The court may permit additional discovery, order the party to undertake curative measures, or require the party to pay the reasonable expenses, including attorney’s fees, caused by the failure.
(2) The court may impose any of the sanctions listed in Rule 37(b)(2)(A) or give an adverse-inference jury instruction only if the court finds:
(A) that the failure was willful or in bad faith and caused substantial prejudice in the litigation; or
(B) that the failure irreparably deprived a party of any meaningful opportunity to present a claim or defense.
(3) In determining whether a party failed to preserve discoverable information that reasonably should have been preserved, and whether the failure was willful or in bad faith, the court should consider all relevant factors, including:
(A) the extent to which the party was on notice that litigation was likely and that the information would be discoverable;
(B) the reasonableness of the party’s efforts to preserve the information, including the use of a
litigation hold and the scope of the preservation efforts;
(C) whether the party received a request that information be preserved, the clarity and reasonableness of the request, and whether the person who made the request and the party engaged in good-faith consultation regarding the scope of preservation;
(D) the party’s resources and sophistication in litigation;
(E) the proportionality of the preservation efforts to any anticipated or ongoing litigation; and
(F) whether the party sought timely guidance from the court regarding any unresolved disputes concerning the preservation of discoverable information.
What Will Happen?
Questions?
Thank You.