E-Discovery and E-Recordkeeping: The Litigation Risk Factor As A Driver of Institutional Change Collaborative Expedition Workshop #63 National Science Foundation Washington, D.C. Jason R. Baron Director of Litigation Office of General Counsel U.S. National Archives and Records Administration July 18, 2007


E-Discovery and E-Recordkeeping: The Litigation Risk Factor As A Driver of Institutional Change

Collaborative Expedition Workshop #63
National Science Foundation
Washington, D.C.

Jason R. Baron
Director of Litigation
Office of General Counsel
U.S. National Archives and Records Administration

July 18, 2007


Searching Through E-Haystacks: A Big Challenge


Overview

+ Introduction: Myth, Hype, Reality – Information Retrieval and the Problem of Language
+ Case Study: U.S. v. Philip Morris
+ The TREC Legal Track: Initial Results
+ Strategic Challenges in Thinking About Search Issues Across The Enterprise
+ References


The Myth of Search & Retrieval

When lawyers request production of “all” relevant documents (and now ESI), all or substantially all will in fact be retrieved by existing manual or automated methods of search.

Corollary: in conducting automated searches, the use of “keywords” alone will reliably produce all or substantially all documents from a large document collection.


The “Hype” on Search & Retrieval

Claims in the legal tech sector that a very high rate of “recall” (i.e., finding all relevant documents) is easily obtainable provided one uses a particular software product or service.


The Reality of Search & Retrieval

+ Past research (Blair & Maron, 1985) has shown a gap or disconnect between lawyers’ perceptions of their ability to ferret out relevant documents, and their actual ability to do so:
  -- in a 40,000 document case (350,000 pages), lawyers estimated that a manual search would find 75% of relevant documents, when in fact the research showed only 20% or so had been found.


More Reality: IR is Hard

+ Information retrieval (IR) is a hard problem: difficult even with English-language text, and even harder with non-textual forms of ESI (audio, video, etc.) caught up in litigation.
+ A vast field of IR research exists, including some fundamental concepts and terminology, that lawyers would benefit from having greater exposure to.


Why is IR hard (in general)?

+ Fundamental ambiguity of language
+ Human errors
+ OCR problems
+ Non-English language texts
+ Nontextual ESI (in .wav, .mpg, .jpg formats, etc.)
+ Lack of helpful metadata


Problems of language

+ Polysemy: ambiguous terms (e.g., “George Bush,” “strike”)
+ Synonymy: variation in describing the same person or thing in a multiplicity of ways (e.g., “diplomat,” “consul,” “official,” “ambassador,” etc.)
+ Pace of change: text messaging, computer gaming as latest examples (e.g., “POS,” “1337”)
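The synonymy problem can be made concrete with a small sketch. It assumes a hand-built synonym table (`SYNONYMS`, a hypothetical name seeded with the slide's own "diplomat" example) and compares a literal keyword match against a synonym-expanded one. It illustrates the idea only and does not describe any particular product.

```python
import re

# Hypothetical synonym table; the entries echo the slide's own example.
SYNONYMS = {
    "diplomat": {"diplomat", "consul", "official", "ambassador"},
}

def expand_terms(term, synonyms=SYNONYMS):
    """Return the set of terms to search for, including any listed synonyms."""
    return synonyms.get(term.lower(), {term.lower()})

def matches(text, term, synonyms=SYNONYMS):
    """True if the text contains the term or a listed synonym as a whole word."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return bool(words & expand_terms(term, synonyms))

docs = [
    "The ambassador met with the trade delegation.",
    "Workers voted to strike on Monday.",
    "The consul issued the visa.",
]

# Expanded search finds the "ambassador" and "consul" documents.
hits = [d for d in docs if matches(d, "diplomat")]
# A literal search for "diplomat" alone finds nothing in this toy set.
literal_hits = [d for d in docs if matches(d, "diplomat", synonyms={})]
```

Expansion of this kind addresses synonymy only. Polysemy pulls the other way: a term such as "strike" matches the second document whether or not labor disputes are the topic of interest.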


Why is IR hard (for lawyers)?

+ Lawyers not technically grounded
+ Traditional lawyering doesn’t emphasize front-end “process” issues that would help simplify or focus search problem in particular contexts
+ The reality is that huge sources of heterogeneous ESI exist, presenting an array of technical issues
+ Deadlines and resource constraints
+ Failure to employ best strategic practices


Snapshot of 2007 ESI Heterogeneity

E-mail, integrated with voice mail & VOIP, word processing (including not in English), spreadsheets, dynamic databases, instant messaging, Web pages including intraweb sites, Blogs, wikis, and RSS feeds, backup tapes, hard drives, removable media, flash drives, new storage devices, remote PDAs, and audit logs and metadata of all types.


Sedona Guideline 11 (2007)

A responding party may satisfy its good faith obligation to preserve and produce potentially relevant electronically stored information by using electronic tools and processes, such as data sampling, searching, or the use of selection criteria, to identify data reasonably likely to contain relevant information.

www.thesedonaconference.org


Litigation Targets

+ Defining “relevance”
+ Maximizing # responsive docs
+ Minimizing retrieval “noise” or false positives (non-responsive docs)


FINDING RESPONSIVE DOCUMENTS IN A LARGE DATA SET: FOUR LOGICAL CATEGORIES

The document set divides into:

+ Relevant and Retrieved
+ Not Relevant and Retrieved (FALSE POSITIVES)
+ Relevant and Not Retrieved (FALSE NEGATIVES)
+ Not Relevant and Not Retrieved


FINDING RESPONSIVE DOCUMENTS IN A LARGE DATA SET: THE REALITY OF LARGE SCALE DISCOVERY

[Figure: document set with regions labeled “RELEVANT DOCUMENTS” and ““HITS” ON NONRELEVANT DOCUMENTS”; the remainder is filled with question marks and labeled “The Great Unknown”]


Measures of Information Retrieval

Recall = (# of responsive docs retrieved) / (# of responsive docs in collection)


Precision = (# of responsive docs retrieved) / (# of docs retrieved)
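As a worked illustration of the two measures, the sketch below computes recall and precision for a toy collection of 20 documents, along with the four logical categories. The document IDs and the function name `recall_precision` are assumptions made for the example; in practice the full responsive set is exactly what is not known in advance.

```python
def recall_precision(retrieved, responsive):
    """Return (recall, precision) for a set of retrieved document IDs."""
    retrieved, responsive = set(retrieved), set(responsive)
    found = retrieved & responsive  # responsive docs retrieved
    recall = len(found) / len(responsive) if responsive else 0.0
    precision = len(found) / len(retrieved) if retrieved else 0.0
    return recall, precision

collection = set(range(1, 21))      # 20 documents in total
responsive = set(range(1, 11))      # documents 1-10 are responsive
retrieved = {1, 2, 11, 12, 13}      # what the search returned

false_positives = retrieved - responsive               # not relevant, retrieved
false_negatives = responsive - retrieved               # relevant, not retrieved
true_negatives = collection - retrieved - responsive   # neither

recall, precision = recall_precision(retrieved, responsive)
# recall = 2/10 = 0.2, precision = 2/5 = 0.4
```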


THE RECALL-PRECISION TRADEOFF

[Figure: recall and precision axes, each running from 0 to 100%]
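One way to see the tradeoff is to walk down a ranked result list and compute both measures at each cutoff. This is a minimal sketch with made-up document IDs; the function name `precision_recall_at_cutoffs` is hypothetical. Recall can only rise as more documents are reviewed, while precision typically falls as non-responsive documents accumulate.

```python
def precision_recall_at_cutoffs(ranked, responsive):
    """Return (cutoff, recall, precision) after each position in a ranked list."""
    responsive = set(responsive)
    if not responsive:
        raise ValueError("need at least one responsive document")
    points = []
    found = 0
    for k, doc in enumerate(ranked, start=1):
        if doc in responsive:
            found += 1
        points.append((k, found / len(responsive), found / k))
    return points

ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
responsive = {"d1", "d2", "d5", "d8"}

for cutoff, recall, precision in precision_recall_at_cutoffs(ranked, responsive):
    print(f"top {cutoff}: recall={recall:.2f} precision={precision:.2f}")
```

In this toy list, stopping after two documents gives perfect precision but only half the responsive documents; reading all eight gives full recall at 50% precision.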


Three Questions

(1) How can one go about improving rates of recall and precision (so as to find a greater number of relevant documents, while spending less overall time, cost, etc., sifting through noise)?
(2) What alternatives to keyword searching exist?
(3) Are there ways in which to benchmark alternative search methodologies so as to evaluate their efficacy?


Beyond Reliance on Keywords Alone: Alternative Search Methods

+ Greater Use Made of Boolean Strings
+ Fuzzy Search Models
+ Probabilistic models (Bayesian)
+ Statistical methods (clustering)
+ Machine learning approaches to semantic representation
+ Categorization tools: taxonomies and ontologies
+ Social network analysis
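As one small illustration of the "fuzzy search" idea, the sketch below uses Python's standard `difflib.get_close_matches` to find words that are close to a query term but not identical, such as an OCR misreading. The word list, the `fuzzy_hits` helper, and the 0.8 similarity cutoff are assumptions for the example; commercial fuzzy search tools may work quite differently.

```python
import difflib

def fuzzy_hits(query_term, doc_words, cutoff=0.8):
    """Return words from doc_words whose similarity to query_term meets the cutoff."""
    if not doc_words:
        return []
    return difflib.get_close_matches(
        query_term, doc_words, n=len(doc_words), cutoff=cutoff
    )

# "tobaeco" stands in for an OCR error; an exact keyword match would miss it.
words = ["tobaeco", "tobacco", "tabasco", "television"]
close = fuzzy_hits("tobacco", words)
```

Lowering the cutoff recovers more misspellings but also admits unrelated words such as "tabasco", which is the recall-precision tradeoff in miniature.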


What is TREC?

+ Conference series co-sponsored by the National Institute of Standards and Technology (NIST) and the Advanced Research and Development Activity (ARDA) of the Department of Defense
+ Designed to promote research into the science of information retrieval
+ First TREC conference was in 1992
+ 15th Conference held November 15-17, 2006 in Gaithersburg, Maryland, U.S. (NIST headquarters)


TREC 2006 Legal Track

+ The TREC 2006 Legal Track was designed to evaluate the effectiveness of search technologies in a real-world legal context
+ First-of-a-kind study using nonproprietary data since the Blair/Maron research in 1985
+ 5 hypothetical complaints and 43 “requests to produce” drafted by Sedona Conference members
+ “Boolean negotiations” conducted as a baseline for search efforts
+ Documents to be searched were drawn from a publicly available 7 million document tobacco litigation Master Settlement Agreement database
+ 6 participating teams submitted 33 runs. The teams were Hummingbird, National University of Singapore, Sabir Research, University of Maryland, University of Missouri-Kansas City, and York University


<?xml version="1.0" encoding="ISO-8859-1" ?>
<TrecLegalProductionRequest>
  <ProductionRequest>
    <RequestNumber>6</RequestNumber>
    <RequestText>All documents discussing, referencing, or relating to company guidelines or internal approval for placement of tobacco products, logos, or signage, in television programs (network or cable), where the documents expressly refer to the programs being watched by children.</RequestText>
    <BooleanQuery>
      <FinalQuery>(guide! OR strateg! OR approval) AND (place! OR promot! OR logos OR sign! OR merchandise) AND (TV OR "T.V." OR televis! OR cable OR network) AND ((watch! OR view!) W/5 (child! OR teen! OR juvenile OR kid! OR adolescent!))</FinalQuery>
      <NegotiationHistory>

TREC 2006 LEGAL TRACK XML ENCODED TOPICS WITH NEGOTIATION HISTORY – ONE EXAMPLE
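To make the query syntax concrete, the following sketch evaluates the final Boolean query for request 6 against plain text. It rests on two assumptions about the operators: `!` is treated as truncation (prefix match) and `W/5` as "within five words, in either order". The helper names (`tokenize`, `positions`, `any_of`, `within`, `matches_request_6`) and the sample documents are made up for illustration; this is not the software used by any TREC participant.

```python
import re

def tokenize(text):
    """Lowercase word tokens; internal dots are kept so "T.V." becomes "t.v"."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

def positions(tokens, pattern):
    """Token positions matching a pattern; a trailing "!" means prefix match."""
    pattern = pattern.lower().rstrip(".")
    if pattern.endswith("!"):
        stem = pattern[:-1]
        return [i for i, t in enumerate(tokens) if t.startswith(stem)]
    return [i for i, t in enumerate(tokens) if t == pattern]

def any_of(tokens, patterns):
    """Positions matching any pattern in an OR group (empty list if none)."""
    return sorted({i for p in patterns for i in positions(tokens, p)})

def within(left, right, n):
    """True if some left position is within n words of some right position."""
    return any(abs(i - j) <= n for i in left for j in right)

def matches_request_6(text):
    tokens = tokenize(text)
    return bool(
        any_of(tokens, ["guide!", "strateg!", "approval"])
        and any_of(tokens, ["place!", "promot!", "logos", "sign!", "merchandise"])
        and any_of(tokens, ["TV", "T.V.", "televis!", "cable", "network"])
        and within(
            any_of(tokens, ["watch!", "view!"]),
            any_of(tokens, ["child!", "teen!", "juvenile", "kid!", "adolescent!"]),
            5,
        )
    )

doc_a = ("Guidelines for product placement on network television: "
         "avoid programs watched mostly by children.")
doc_b = "Approval needed for signage at the stadium."
doc_c = ("Strategy memo on cable TV placement. Viewers are adults; a separate "
         "unrelated section, many pages later, mentions kids.")

results = [matches_request_6(d) for d in (doc_a, doc_b, doc_c)]
```

Only the first document satisfies all four AND groups. The second has no television term, and in the third "Viewers" and "kids" are more than five words apart. Truncation also over-matches: `sign!` would hit "signature" as readily as "signage", one source of the retrieval noise discussed earlier.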


[Figure: bar chart by topic; vertical axis “Known Relevant Documents” (0 to 500); series: Boolean, Expert Searcher, Ranked Only]

TREC Legal Track 2006: Percentage of Unique Documents By Topic Found By Boolean, Expert Searcher, and Other Combined Methods of Search


TREC Legal Track 2006: Sort by Increasing Percentage of Unique Documents Per Topic Found By Combined Methods Other Than A Baseline Boolean Search

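[Figure: bar chart, one bar per topic, sorted by increasing percentage; vertical axis “Percent” (0% to 100%); horizontal axis “Topic”]

Topics in axis order: 17, 37, 40, 41, 51, 28, 18, 27, 50, 13, 30, 25, 22, 19, 47, 45, 29, 20, 39, 21, 43, 31, 32, 26, 44, 8, 38, 14, 36, 23, 35, 6, 34, 46, 9, 33, 7, 24, 10

Bar labels in increasing order: 0%, 0%, 0%, 0%, 0%, 2%, 3%, 6%, 6%, 7%, 7%, 11%, 14%, 16%, 17%, 17%, 18%, 20%, 22%, 24%, 24%, 25%, 28%, 32%, 32%, 36%, 38%, 44%, 46%, 51%, 53%, 54%, 62%, 64%, 73%, 76%, 76%, 78%, 100%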


Boolean vs. Hypothetical Alternative Search Method

[Figure: two curves, “Boolean Run” and “Alternative Search Run”; horizontal axis “INCREASING EFFORT (time, resources expended, etc.)”; vertical axis “SUCCESS (in retrieving relevant docs)”; marked points A, B, C, D, x, y]


Managing Litigation Risk

↑ Success per amount of effort = ↓ Litigation Risk


Strategic Challenges

Convincing lawyers and judges that automated searches are not just desirable but necessary in response to large e-discovery demands.


Challenges (cont’d)

Having all parties and adjudicators understand that the use of automated methods does not guarantee all responsive documents will be identified in a large data collection.


Challenges (cont’d)

Designing an overall review process which maximizes the potential to find responsive documents in a large data collection (no matter which search tool is used), and using sampling and other analytic techniques to test hypotheses early on.
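The sampling idea can be sketched as follows: draw a random sample from the documents a search did not retrieve, have a reviewer code each one, and estimate what fraction of the unretrieved set is responsive. The function name `sample_estimate`, the `review` callback (standing in for human review), and the normal-approximation 95% interval are all assumptions made for illustration. The approximation is rough when the proportion is very small, and this is not a prescribed protocol.

```python
import math
import random

def sample_estimate(doc_ids, review, sample_size, seed=0, z=1.96):
    """Estimate the responsive fraction of doc_ids from a simple random sample.

    review(doc_id) -> bool stands in for a human reviewer's judgment.
    Returns (estimate, lower bound, upper bound).
    """
    pool = sorted(doc_ids)
    if not pool or sample_size <= 0:
        raise ValueError("need a non-empty pool and a positive sample size")
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    sample = rng.sample(pool, min(sample_size, len(pool)))
    n = len(sample)
    p = sum(1 for d in sample if review(d)) / n
    margin = z * math.sqrt(p * (1 - p) / n)  # normal approximation
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Toy setup: 10,000 unretrieved documents, of which every 20th is responsive.
unretrieved = range(10_000)
estimate, low, high = sample_estimate(unretrieved, lambda d: d % 20 == 0, 400)
print(f"estimated responsive fraction: {estimate:.3f} ({low:.3f} to {high:.3f})")
```

If the estimated fraction of missed responsive documents is high, that is an early signal to revisit the search terms or method before committing to full review.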


Challenges (cont’d)

Parties making a good faith attempt to collaborate on the use of particular search methods, including utilizing multiple “meet and confers” as necessary, informed by initial sampling or surveying of the ESI retrieved by whatever methods are used.


Challenges (cont’d)

Being open to using new and evolving search and information retrieval methods and tools.


Leading U.S. Case Precedent on Automated Searches

-- Current cases emphasize parties discussing proposed keyword searches under the rubric of coming up with a “search protocol.” Treppel v. Biovail, 233 F.R.D. 363 (S.D.N.Y. 2006)

-- First case discussing parties’ employing or relying on concept searches as an alternative search methodology. Disability Rights Council v. WMATA, 2007 WL 1585452 (June 1, 2007)


Future Research

+ TREC 2007 Legal Track
+ The Sedona Conference Search & Retrieval Commentary & additional activities


References

J. Baron, “Toward A Federal Benchmarking Standard for Evaluating Information Retrieval Products Used in E-Discovery,” 6 Sedona Conference Journal 237 (2005) (available on Westlaw and LEXIS)

J. Baron, “Information Inflation: Can The Legal System Adapt?” (with co-author George L. Paul), 13 Richmond J. Law & Technology (2007), vol. 3, article 10 (available online at http://law.richmond.edu/jolt/index.asp)


J. Baron, D. Oard, and D. Lewis, “TREC 2006 Legal Track Overview,” available at http://trec.nist.gov/pubs/trec15/t15_proceedings.html (document 3)

The Sedona Conference, Best Practices Commentary on The Use of Search & Retrieval Methods in E-Discovery (forthcoming 2007)

TREC 2007 Legal Track Home Page, see http://trec-legal.umiacs.umd.edu/


ICAIL 2007 (International Conference on Artificial Intelligence and the Law), Workshop on Supporting Search and Sensemaking for ESI in Discovery Proceedings, see http://www.umiacs.umd.edu/~oard/desi-ws/

See also J. Baron and P. Thompson, “The Search Problem Posed By Large Heterogeneous Data Sets in Litigation: Possible Future Approaches to Research,” ICAIL 2007 Conference Paper, June 4-8, 2007, available at http://www.umiacs.umd.edu/~oard/desi-ws/ (click link to conference paper).


Jason R. Baron
Director of Litigation
Office of General Counsel
National Archives and Records Administration
8601 Adelphi Road, Suite 3110
College Park, MD 20740
(301) 837-1499
Email: [email protected]