© 2012 Winston & Strawn LLP
eDiscovery and Technology Assisted Review: What You Need to Know Now
Brought to you by Winston & Strawn’s eDiscovery & Information Management Practice Group
Today’s eLunch Presenters
John Rosenthal, Chair, eDiscovery & Information Management, Washington, DC
Chris Costello, eDiscovery & Information Management, New York
Welcome!
Overview
Technology Assisted Review ("TAR")
• Definitions
• Types of technology assisted review
• Predictive coding
Proposed Changes to the Federal Rules
• Why the need for further changes in the rules
• Overview of the rules process
• Two-track approach of the Advisory Committee
• Rule 37
• Duke Conference – Rules 16 and 26
Let’s Look at the Numbers
94% of respondents found the cost of e-discovery “frustrating”
87% of respondents used an early case assessment to try to resolve matters earlier
81% of respondents brought software in-house, which helps to cut costs on law firm or service provider fees
52% of respondents brought staff in-house to help reduce fees spent on law firms or service providers
32% of respondents used clustering or visualization tools to speed review along (down from 34% in 2010)
71% of respondents used contract attorneys for legal review (down from 77% in 2010)
61% of respondents were able to quantify how much money they spent on e-discovery. Many companies are still unaware of their spending habits.
42% of respondents have a tool to collect and preserve data from the cloud or from social media
Source: FTI survey of 31 in-house general counsel
E-Discovery Spend
Source: Fulbright Annual Litigation Trends Survey
Electronic Document Review
Excessive and unpredictable costs: 58% to 70% of total litigation costs
Document review costs are rising due to the increasing amount of electronic information
Traditional document review is not accurate:
• Evidence suggests that there are high error rates in linear manual review
• Error rates increase the likelihood of inadvertent production of privileged or sensitive information
Inability to defend the review process: judges are increasingly focusing on the need for validation of review processes
Traditional Electronic Document Review = Linear Review
Over collection
Little or no culling
Ad hoc use of Boolean searches
Linear review of the data set
Use of traditional associate work force to perform review
Traditional approach: manually acquire broad amounts of data → process data → first-level review → second-level review → produced documents
Goals of ESI Review
Recall - Identification and prioritization of relevant material
Precision - Elimination of irrelevant/non-responsive material
Identification of privileged material
[Venn diagram: relevant data vs. retrieved data; the overlap is "relevant and retrieved," the remainders are "relevant and not retrieved" and "non-relevant and retrieved"]
Accuracy of Human Review
Recall = (responsive documents retrieved) ÷ (total responsive documents in the collection)
Precision = (responsive documents retrieved) ÷ (total documents retrieved)
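For illustration, the two measures above can be computed from a hypothetical review (the document IDs and counts here are invented):

```python
def recall(retrieved, responsive):
    """Fraction of all responsive documents that the review retrieved."""
    return len(retrieved & responsive) / len(responsive)

def precision(retrieved, responsive):
    """Fraction of retrieved documents that are actually responsive."""
    return len(retrieved & responsive) / len(retrieved)

# Hypothetical review: 10 responsive docs in the collection, 8 docs retrieved.
responsive = {f"DOC{i}" for i in range(1, 11)}  # DOC1..DOC10
retrieved = {"DOC1", "DOC2", "DOC3", "DOC4", "DOC5", "DOC6", "BAD1", "BAD2"}

print(recall(retrieved, responsive))     # 6 of 10 responsive found -> 0.6
print(precision(retrieved, responsive))  # 6 of 8 retrieved are responsive -> 0.75
```

Note the trade-off: retrieving everything drives recall to 100% while precision collapses, which is why both measures are tracked together.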
Accuracy of Human Review
[Scatter plot of precision (0–100%) vs. recall (0–100%), marking a "Perfection" point and data points labeled Blair & Maron (1985), Voorhees, Roitblat, and TREC]
The Sedona Conference Commentary on the Use of Search and Information Retrieval
“[T]here appears to be a myth that manual review by humans of large amounts of information is as accurate and complete as possible – perhaps even perfect – and constitutes the gold standard by which all searches should be measured. Even assuming that the profession had the time and resources to continue to conduct manual review of massive sets of electronic data sets (which it does not), the relative efficacy of that approach versus utilizing newly developed automated methods of review remains very much open to debate.”
2011 RAND Study re E-Discovery
“Taken together, this body of research shows that groups of human reviewers exhibit significant inconsistency when examining the same set of documents for responsiveness under conditions similar to those in large-scale reviews. . . . Human error in applying the criteria for inclusion appears to be the primary culprit [regarding the lack of accuracy], not a lack of clarity in the document’s meaning or ambiguity in how the scope of the production demand should be interpreted. In other words, people make mistakes, and based on the evidence, they make them regularly when it comes to judging relevancy or responsiveness.”
Ralph Losey Revised EDRM Model
Document Review Models
Outsourced Manual Review
• Most prominent model used today
• Limited culling and analysis
• Heavy reliance on attorney review
• Use of sampling to ensure quality control
Predictive Coding
• Great deal of confusion regarding what it means
• Uses attorneys to develop a seed set of data that can be fed into a black box to find similar documents
• Emphasizes sampling of inclusion set and exclusion set
• Only a handful of courts have addressed its use
Technology Assisted Reviews
• Process approach to review to increase efficiency, recall, and precision, using legally accepted tool sets:
• Threading
• Near-duping
• Advanced search
• Clustering
Technology Assisted Review
Meta-Data Context
Boolean Queries
Wildcard expansions
Proximity Specification
Misspellings/Fuzzy Search
Synonyms
Dupe and Near Dupe
Threading
Concept/clustering engines
LSI, LSA, PLSA
Predictive coding
Technology Assisted Reviews
Analytical
• Working with the client and data to develop a set of defensible "relevance criteria" to select data subject to review
Collection
• Use of search and retrieval at the front end can dramatically reduce volume and cost
• Risk considerations
Processing, Filtering and Culling
• Employ more sophisticated processing tools to further reduce the volume set
• Unilateral vs. negotiated
Non-Linear Review
• Clustering/concepting
• Threading
• Near-dupe
• Predictive coding
E-Mail Threads
• 70% of production is e-mail, and nearly 65% or more of it is part of e-mail threads
The problem:
• No clear method to identify e-mail threads
• E-mails are reviewed multiple times and inconsistently
• Extremely difficult to identify where e-mails are missing
Step 1 – group into e-mail sets
Step 2 – build the tree structure; identify missing links; suppress duplicates; focus on inclusives
Result: less cost, less time, fewer errors
Source: Equivio
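A toy illustration of step 1, grouping messages by normalized subject line (real threading engines also rely on message IDs and reply headers; the messages below are invented):

```python
import re

def thread_key(subject):
    """Normalize a subject line so replies/forwards join the original thread."""
    s = subject.strip().lower()
    # Strip any leading Re:/Fw:/Fwd: prefixes, which may be stacked.
    while True:
        m = re.match(r"^(re|fw|fwd)\s*:\s*", s)
        if not m:
            return s
        s = s[m.end():]

def group_threads(emails):
    """Group (subject, body) e-mails into threads keyed by normalized subject."""
    threads = {}
    for subj, body in emails:
        threads.setdefault(thread_key(subj), []).append((subj, body))
    return threads

emails = [
    ("Q3 forecast", "draft attached"),
    ("RE: Q3 forecast", "looks good"),
    ("FW: RE: Q3 forecast", "please review"),
    ("Lunch?", "noon?"),
]
threads = group_threads(emails)
print(len(threads))                  # 2 distinct threads
print(len(threads["q3 forecast"]))   # 3 messages in the forecast thread
```

Once grouped, a reviewer can read only the "inclusive" messages that contain the full thread instead of re-reading every reply.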
Duplication and Near-Duplication
• 15% to 40% of the document population are duplicates or near-duplicates
The problem:
• No clear method to organize and allocate documents across reviewers
• Documents are reviewed multiple times by different reviewers
• High risk of inconsistent coding among similar documents
Step 1 – group the near-duplicates; identify the differences among them
Step 2 – assign near-dupe sets to reviewers for coherent review; reviewers prioritize and review only the differences; apply coding to entire near-dupe sets where appropriate
Result: less cost, less time, fewer errors
Source: Equivio
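Step 1's grouping can be sketched with word shingles and Jaccard similarity, one common near-duplicate technique, though vendors' actual methods differ (the document texts and the 0.5 threshold are invented):

```python
def shingles(text, k=3):
    """Set of k-word shingles from a document's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets."""
    return len(a & b) / len(a | b)

def near_dupes(docs, threshold=0.5):
    """Pair up documents whose shingle overlap meets the threshold."""
    sh = {name: shingles(text) for name, text in docs.items()}
    names = sorted(sh)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if jaccard(sh[a], sh[b]) >= threshold]

docs = {
    "v1": "please find attached the signed merger agreement for review",
    "v2": "please find attached the signed merger agreement for comment",
    "memo": "quarterly sales figures are due on friday afternoon",
}
print(near_dupes(docs))  # [('v1', 'v2')]
```

The two drafts share 6 of 8 shingles (similarity 0.75), so they land in one review set; the unrelated memo does not.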
Clustering or Concepting
Concept search places a document or part of a document in this space.
Results are returned in order of relevance.
Higher score = closer document (e.g., Document 1: 98; Document 3: 92; Document 4: 91)
Source: kCura Corp.
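A bare-bones sketch of the relevance scoring above: real concept engines use LSI/PLSA-style analysis rather than the raw term vectors shown here, and the documents, query, and scores are invented:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Score each document against the query on a 0-100 scale, highest first."""
    q = Counter(query.lower().split())
    scores = {name: round(100 * cosine(q, Counter(text.lower().split())))
              for name, text in docs.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = {
    "doc1": "merger agreement price merger agreement",
    "doc2": "lunch menu for friday",
    "doc3": "draft merger price merger terms",
}
ranking = rank("merger price", docs)
print(ranking)  # doc3 and doc1 score high, doc2 scores 0
```

As on the slide, results come back in score order so reviewers see the closest documents first.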
“Predictive Coding”
What is Predictive Coding?
[Diagram: document set for review]
Source: Servient Inc. http://www.servient.com/
Use Cases for Predictive Coding
Early case assessment
Relevance inclusion
Relevance exclusion
Pre-review tagging
Pre-review batching
Privileged review
Review of incoming productions
Internal investigations
Limitations on Predictive Coding
As with any statistical model, caution should be exercised (“Torture numbers, and they’ll confess to anything”)
Garbage in = garbage out
Limitations:
• Not right for all types of cases
• Size matters
• Unable to address: images, graphics, Excel files, video, voice
• Confidentiality
Do you Need to Understand the Technology?
Very few individuals in the industry will ever understand the technology
Even fewer people would know how to attack the technology
Does the technology matter?
Not all TAR software is created equal
The same seed set put into different TAR software will yield vastly different results
“Muddy water is best cleared by leaving it alone.”
Alan Wilson Watts
Defending the Technology?
What is the basic underlying technology?
Support Vector Machines (SVM) (i.e., patterns are determined and categorized from positive examples (relevant documents) and negative examples (irrelevant documents), and new examples are classified in one category or the other based on whether these patterns appear in the new examples)
Probabilistic Latent Semantic Analysis (PLSA) (i.e., documents are categorized by detecting concepts through a statistical analysis of word contexts; documents are grouped based on probabilities of the number of times words occur together)
Other potential algorithms that generate correlations and categorizations
What has the vendor done to explain the technology?
What has the vendor done to defend the technology?
How can the technology be abused or misused?
What are its limitations?
Source: Technology Concepts & Design, Inc. (2012)
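The SVM idea described above, learning a boundary from positive (relevant) and negative examples and then classifying new documents against it, can be sketched in a few lines. This illustration substitutes a simple perceptron for a true SVM, and the seed documents and labels are invented:

```python
from collections import Counter

def featurize(text):
    """Bag-of-words feature counts for one document."""
    return Counter(text.lower().split())

def train(seed_set, epochs=10):
    """Learn per-term weights from coded seed documents.

    A plain perceptron stands in for the SVM: both learn a linear
    boundary from positive (relevant) and negative (irrelevant) examples.
    """
    w = Counter()
    for _ in range(epochs):
        for text, label in seed_set:      # label: +1 relevant, -1 not
            feats = featurize(text)
            score = sum(w[t] * c for t, c in feats.items())
            if score * label <= 0:        # misclassified: nudge the weights
                for t, c in feats.items():
                    w[t] += label * c
    return w

def classify(w, text):
    """Positive score means predicted relevant."""
    return sum(w[t] * c for t, c in featurize(text).items()) > 0

seed = [
    ("price fixing meeting with competitor", +1),
    ("competitor pricing discussion notes", +1),
    ("holiday party planning committee", -1),
    ("parking garage access form", -1),
]
w = train(seed)
print(classify(w, "notes from pricing meeting"))  # True (relevant)
print(classify(w, "holiday parking form"))        # False
```

Even this toy shows why seed-set quality matters: the weights are nothing more than patterns extracted from the examples the attorneys coded.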
Stages of a Predictive Coding Process
• Team selection
• Culling
• Selection of control set
• Iterative training of the system
• Selection of sensitivity
• Quality control of the remaining corpus
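The iterative-training stage can be pictured as a loop that retrains until a control-set metric stops moving. A minimal sketch; the accuracy numbers, the three-reading window, and the 0.02 tolerance are invented for illustration:

```python
def stabilized(history, window=3, tolerance=0.02):
    """True once the last `window` control-set accuracy readings move
    less than `tolerance`: a simple stand-in for a stabilization test."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tolerance

# Hypothetical control-set accuracy after each training iteration.
accuracy = [0.52, 0.61, 0.70, 0.74, 0.75, 0.755]
for i in range(1, len(accuracy) + 1):
    if stabilized(accuracy[:i]):
        print(f"stable after iteration {i}")
        break
```

The point is the shape of the process: early iterations improve sharply, later ones plateau, and the plateau (however a given tool defines it) is the signal to stop training.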
The Process
Who designed, implemented, and supervised the process?
What should your team look like?
• Senior partner?
• Contract attorneys?
How many people should be on the team?
The Process
Selection of the control set:
• Size?
• Random or targeted?
• Entire corpus or issue-driven?
• Entire documents or selected portions?
• Richness of the data?
Training the system:
• Iterations?
• How are conflicts resolved?
• Is it more important to focus on inclusive or exclusive documents?
Stabilization criteria
Source: Equivio
The Process
Sensitivity
Source: Equivio
The Process
Quality control of the remaining corpus:
• Written sampling protocol?
• How much do you look at?
• When do you need to retrain?
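One common answer to "how much do you look at" is the standard sample-size formula for estimating a proportion. A sketch; the 95% / ±5% defaults are illustrative conventions, not a legal standard:

```python
import math

def sample_size(confidence_z=1.96, margin=0.05, p=0.5):
    """Documents to sample to estimate a proportion (e.g., the rate of
    responsive documents left in the excluded set) at the given confidence
    level and margin of error, using the infinite-population formula."""
    return math.ceil(confidence_z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size())             # 385 docs for 95% confidence, +/- 5%
print(sample_size(margin=0.02))  # 2401 docs for 95% confidence, +/- 2%
```

Tightening the margin of error is expensive: halving it roughly quadruples the sample, which is why the protocol should fix these numbers in writing up front.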
Predictive Coding Decisions
• Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (S.D.N.Y. Feb. 24, 2012) (Peck) – "[t]he Court determined that the use of predictive coding was appropriate considering … the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches)."
• Global Aerospace v. Landow Aviation, No. CL 61040 (Va. Cir. Ct. Loudoun Co. Apr. 23, 2012) – Virginia state judge approved use of predictive coding where defendant stated it would achieve recall of 76.7%.
• Kleen Products v. Packaging Corp. (N.D. Ill.) (Nolan) – refused plaintiffs' request to force defendants to use predictive coding over search terms.
• In re Actos (Pioglitazone) Products Liability Litigation (W.D. La. 2012) (Doherty) – order setting forth a detailed protocol for the use of predictive coding.
Moore v. Publicis & MSL
Judge Peck – “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”
“The technology exists and should be used where appropriate, but it is not a case of machine replacing humans: it is the process used and the interaction of man and machine that the court needs to examine.”
Judge Peck’s Key Takeaways
Process
Transparency
Proportionality
Cooperation
Competence
Kleen Products, LLC v. Packaging Corp. of America, et al. (J. Nolan), N.D. Ill., Case No. 1:10-cv-05711 (filed Sept. 9, 2010)
• Antitrust class action filed in 2010
• Plaintiffs requested that defendants use predictive coding (Feb. 2012)
• Defendants (7 paper companies) had already produced over 1 million documents using traditional keyword-based search terms on key custodians
• Thousands of hours of review (99% complete)
• No glaring omissions
• 3 days of hearings
Global Aerospace v. Landow Aviation (J. Chamblin), Va. Cir. Ct. (Loudoun), No. CL-61040 (Apr. 23, 2012)
Protective order — defendants can use predictive coding to process and produce documents.
Explains that predictive coding meets duty under Virginia law to use reasonable inquiry and care in discovery.
Contrasts predictive coding with linear human review and keyword searches.
Takeaway — such opinions will dominate.
In re Actos (Pioglitazone) Products Liability Litigation, MDL No. 2299 (W.D. La. 2012) (Doherty)
Court issued ESI protocol utilizing predictive coding (Equivio Relevance)
Select 4 custodians for creation of the sample collection population; parties to select three "experts" to work collaboratively to train the system.
Parties to meet and confer on the relevance scores generated using sample collection and to decide on a “cutoff” score.
Iterative training phase until system reaches stability
Post Predictive Coding “meet and confer” to finalize method for searching for documents.
Results still a long way off.
Defense of Process
Legal Standard – does not exist
Documentation vs. Transparency
Transparency:
• Is it required?
• How much is too much?
Lessons Learned
Sedona Conference Cooperation Proclamation is gaining traction among judiciary, especially as it applies to TAR/predictive coding.
Discussions concerning use of predictive coding should occur early and often (e.g., disclosure of seed sets and process involved, discuss acceptable rates of recall and precision, number of iterations, etc.)
Counsel needs to be cognizant of the strengths and weaknesses of the various TAR/Predictive Coding software and prepared to discuss how best to implement it.
Clients should inquire as to use of predictive coding and appropriateness in case at hand and cost-saving potential.
Although Predictive Coding is not appropriate in all circumstances, courts are beginning to accept its use as a means to handle high volume, complex litigation where it can serve to reduce overall costs and increase likelihood of recall and precision.
Moving Forward
Expect to see more instances where Predictive Coding gets judicial stamp of approval.
Use of Predictive Coding continues moving to investigations and review of documents produced by opposing party to speed reviews.
Expect to see more instances where clients push for cost savings and benefits from using predictive coding.
More in-depth discussions of predictive coding methodologies, proportionality, and sharing of data between counsel prior to Rule 26(f) conferences, and longer, multi-day Rule 26(f) conferences as parties try to agree on protocols implementing TAR/Predictive Coding strategies.
Focus on the process and transparency of the software/Predictive Coding protocol.
Increased importance of developing highly trained and experienced “experts” to develop sample/seed sets.
Loss in revenue from linear review and shifting law firm approach to embrace new technologies/roles for lawyers and paralegals.
Update on Federal Rule Process
Overview of the rules process:
• Discovery Subcommittee (preservation – triggers, scope, sanctions)
• Duke Subcommittee (proportionality, cooperation, early case management)
Scope of potential amendments:
• Rule 1
• Rule 26
• Rule 16
• Rule 37
• Rule 45 (already in progress)
Federal Rules Process
2010:
• American College of Trial Lawyers study
• Sedona Conference on the Future of Civil Litigation
• Duke Conference on the Future of Civil Litigation
2011:
• Call for a "mini-conference" in September
• Mini-conference occurs in Dallas (Sept. 9)
• Submissions by law firms, corporations, and academia
2012:
• FJC Early Stages of Litigation Report
• RAND Report
• Sedona Conference proposed draft rules
Microsoft’s Comments
Preserved: 48,431,250 pages
Collected and processed: 12,915,000 pages
Reviewed: 645,750 pages
Produced: 141,450 pages
Used: 142 pages
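Each stage of Microsoft's funnel can be read as a yield against the prior stage; a quick calculation from the slide's numbers:

```python
# Page counts from Microsoft's comments, top of funnel to bottom.
funnel = [
    ("Preserved",               48_431_250),
    ("Collected and processed", 12_915_000),
    ("Reviewed",                   645_750),
    ("Produced",                   141_450),
    ("Used",                           142),
]
# Yield of each stage relative to the one before it.
for (stage, pages), (_, prev) in zip(funnel[1:], funnel):
    print(f"{stage}: {100 * pages / prev:.1f}% of the prior stage")
# End-to-end yield: pages actually used vs. pages preserved.
print(f"Used/preserved: {100 * funnel[-1][1] / funnel[0][1]:.5f}%")
```

Roughly a quarter of what is preserved gets processed, 5% of that is reviewed, and in the end only about three pages in every million preserved are used.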
Rule 37(e) Proposal
(e) FAILURE TO PRESERVE DISCOVERABLE INFORMATION. If a party fails to preserve discoverable information that reasonably should be preserved in the anticipation or conduct of litigation,
(1) The court may permit additional discovery, order the party to undertake curative measures, or require the party to pay the reasonable expenses, including attorney’s fees, caused by the failure.
(2) The court may impose any of the sanctions listed in Rule 37(b)(2)(A) or give an adverse-inference jury instruction only if the court finds:
(A) that the failure was willful or in bad faith and caused substantial prejudice in the litigation; or
(B) that the failure irreparably deprived a party of any meaningful opportunity to present a claim or defense.
(3) In determining whether a party failed to preserve discoverable information that reasonably should have been preserved, and whether the failure was willful or in bad faith, the court should consider all relevant factors, including:
(A) the extent to which the party was on notice that litigation was likely and that the information would be discoverable;
(B) the reasonableness of the party’s efforts to preserve the information, including the use of a
litigation hold and the scope of the preservation efforts;
(C) whether the party received a request that information be preserved, the clarity and reasonableness of the request, and whether the person who made the request and the party engaged in good-faith consultation regarding the scope of preservation;
(D) the party’s resources and sophistication in litigation;
(E) the proportionality of the preservation efforts to any anticipated or ongoing litigation; and
(F) whether the party sought timely guidance from the court regarding any unresolved disputes concerning the preservation of discoverable information.
What Will Happen?
Questions?
Thank You.