Deep Dive: Advanced Search Technologies

Even with recent advancements in predictive coding, tried-and-true searching tactics such as keyword searching, concept searching, topic grouping, near de-duplication, and email threading will continue to play an important role in ediscovery filtering, review, and production across the Electronic Discovery Reference Model (EDRM).

Discussion Overview

Case Law and Industry Guidance: The Role of Searching in Ediscovery

Back to the Basics: Keyword Searching Tips

Deep Dive: Advanced Searching Technologies

Judicial Viewpoints on Keyword Searching

Court required parties to “confer on the development of reasonable search terms” instead of compelling production without a list of proposed search terms provided by the requesting party

“Common practice governing the discovery of [ESI] requires the use of search terms . . . If the producing party generates the search terms on its own, the inevitable result will be complaints that the search terms were inadequate” EEOC v. McCormick & Schmick’s Seafood Restaurants, Inc., 2012 WL 380048 (D. Md. Feb. 3, 2012).

Analyzing Search Methods

Keyword searching plays an important role in winnowing document sets for discovery

Objective of search: high recall and precision

» Recall – fraction of all relevant documents in the collection that the search actually finds

» Precision – fraction of retrieved documents that are actually relevant

In this example, fruit is relevant; broccoli is not.
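
As a rough illustration of the two measures, the sketch below scores one hypothetical search result. The item names and the helper `recall_precision` are invented for the example and do not come from any review platform.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents the search actually found
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: fruit is relevant, broccoli is not.
relevant = {"apple", "banana", "cherry", "grape"}
retrieved = {"apple", "banana", "broccoli"}

recall, precision = recall_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Here the search finds two of the four relevant items (recall 0.50), and two of its three hits are relevant (precision 0.67).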

Designing Effective Keyword Searches

1. Understand your search engine

» Learn how each operator works (OR, AND, PROXIMITY, etc.)

» Be aware of operator precedence (Boolean or left-to-right) and use parentheses to clarify

» Work with your ediscovery provider to create an alternative strategy for lengthy searches that may “time out”
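
To see why precedence matters, the sketch below evaluates the same three-term query under both readings. It uses plain Python booleans as a stand-in for a search engine's query language; the query, the sample text, and the helper `has` are assumptions made for illustration.

```python
# Hypothetical document text, reduced to a set of words.
doc_words = set("quarterly merger update".split())


def has(term):
    """True if the document contains the exact term."""
    return term in doc_words


# Query as typed: merger OR acquisition AND confidential

# Boolean precedence: AND binds tighter than OR.
boolean_reading = has("merger") or (has("acquisition") and has("confidential"))

# Left-to-right: operators are applied in the order written.
left_to_right_reading = (has("merger") or has("acquisition")) and has("confidential")

print(boolean_reading, left_to_right_reading)
```

The document is a hit under the Boolean reading and a miss under the left-to-right reading, which is why explicit parentheses are worth the extra keystrokes.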

Designing Effective Keyword Searches

2. Develop a search strategy

» Run broad searches first (date-range culling, etc.), then use the results as the scope for sub-level searches

» Save searches and search results for future use and reference

» Find on-point documents and use “similar” documents and concepts to provide additional key terms

» Know your universe (foreign language requires foreign keywords!)
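
The broad-then-narrow approach can be sketched as follows. The record layout, field names, and search label are hypothetical; a review platform would expose its own equivalents.

```python
from datetime import date

# Hypothetical document records; field names are illustrative only.
documents = [
    {"id": 1, "sent": date(2011, 3, 14), "text": "Pricing agreement draft attached"},
    {"id": 2, "sent": date(2009, 7, 2), "text": "Old pricing agreement"},
    {"id": 3, "sent": date(2011, 5, 9), "text": "Lunch on Friday?"},
]


def cull_by_date(docs, start, end):
    """Broad first pass: keep documents sent within [start, end]."""
    return [d for d in docs if start <= d["sent"] <= end]


def keyword_search(docs, term):
    """Sub-level search run only against the supplied scope."""
    return [d for d in docs if term.lower() in d["text"].lower()]


scope = cull_by_date(documents, date(2011, 1, 1), date(2011, 12, 31))

# Saving results under a descriptive name keeps them available for later reference.
saved_searches = {"2011 + pricing": keyword_search(scope, "pricing")}
print([d["id"] for d in saved_searches["2011 + pricing"]])
```

Only document 1 survives both passes: document 2 mentions pricing but falls outside the date range, and document 3 is in range but off topic.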

Designing Effective Keyword Searches

3. Build smart keyword lists

Use a text editor to reduce errors

» Programs that format text can cause difficulty

» Use a program like Notepad and place each term on a separate line

» Spell check

» Be aware of commonly misspelled keywords or privilege terms

Understand the impact of your key terms

» Be flexible: account for word/phrase permutations – use a “Data Dictionary”

» Over-inclusive? Under-inclusive?

» “Noise words” increase likelihood of false hits
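
A minimal sketch of cleaning a plain-text keyword list, assuming one term per line as recommended above. The noise-word list and both helper names are hypothetical; actual noise words depend on the search engine in use.

```python
# Hypothetical noise-word list; real platforms ship their own.
NOISE_WORDS = {"the", "and", "of", "to", "a"}


def load_keywords(raw_text):
    """Parse a plain-text keyword list with one term per line."""
    terms, seen = [], set()
    for line in raw_text.splitlines():
        term = " ".join(line.split())  # trim and collapse stray whitespace
        key = term.lower()
        if not term or key in seen:
            continue  # skip blank lines and case-insensitive duplicates
        seen.add(key)
        terms.append(term)
    return terms


def flag_noise_terms(terms):
    """Return terms made up entirely of noise words, which tend to over-hit."""
    return [t for t in terms if all(w in NOISE_WORDS for w in t.lower().split())]


raw = "attorney-client\n  privileged \nPrivileged\n\nthe\nwork product\n"
keywords = load_keywords(raw)
print(keywords)
print(flag_noise_terms(keywords))
```

The duplicate "Privileged" and the blank line are dropped, and "the" is flagged for a second look before the list is submitted.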

Advanced Searching Technologies

What are some “new and evolving” search methods?

1. Concept Searching

2. Topic Grouping

3. Language Identification

4. Email Threading

5. Near De-Duplication

6. Sampling

**Technology-assisted Review: will not cover in this presentation (hot, evolving topic!)

Items 1 through 6: will cover in this presentation

1. Keyword Searching vs. Concept Searching

Keyword Searching

» Uses search terms to retrieve documents that contain those exact terms

» Standard practice; generally accepted in the courts

Concept Searching

» Allows reviewers to find documents with similar conceptual terms even if they do not contain the exact search terms

» Seldom used for filtering; increasingly used for review

» Emerging as a technology alternative
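
The difference between the two approaches can be shown with a toy example. Concept-search engines typically derive term relationships from the data itself; the hand-written `CONCEPTS` map below is only a stand-in for that step, and all names and documents are hypothetical.

```python
# Hypothetical concept map; a stand-in for relationships a real engine would learn.
CONCEPTS = {"bribe": {"bribe", "kickback", "payoff", "gift"}}

documents = {
    "doc1": "He offered a bribe to the inspector",
    "doc2": "The kickback was wired on Tuesday",
    "doc3": "Schedule for the inspector visit",
}


def tokens(text):
    """Lowercased set of words in a document."""
    return set(text.lower().split())


def keyword_search(term):
    """Return documents containing the exact term."""
    return sorted(d for d, text in documents.items() if term in tokens(text))


def concept_search(term):
    """Return documents containing the term or any conceptually related term."""
    related = CONCEPTS.get(term, {term})
    return sorted(d for d, text in documents.items() if related & tokens(text))


print(keyword_search("bribe"))
print(concept_search("bribe"))
```

The keyword search returns only doc1, while the concept search also surfaces doc2, which never uses the word "bribe".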

2. Topic Grouping

Documents automatically grouped by theme without human input

Topic grouping will group similar documents and label them for quick identification

Users do not need to “seed” the processing engine by providing keywords
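
One way to picture unseeded grouping is a simple vocabulary-overlap clusterer. This is a deliberately naive sketch, not how any particular product works: the stopword list, the 0.3 threshold, and the greedy single-pass strategy are all assumptions chosen to keep the example short.

```python
from collections import Counter

STOPWORDS = {"the", "a", "for", "of", "to", "on", "is", "and"}  # illustrative only


def terms(text):
    """Set of non-stopword terms in a document."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def group_by_topic(docs, threshold=0.3):
    """Greedy single pass: join the first group whose vocabulary overlaps enough."""
    groups = []  # each group: {"vocab": set of terms, "members": [doc ids]}
    for doc_id, text in docs.items():
        doc_terms = terms(text)
        for group in groups:
            if jaccard(doc_terms, group["vocab"]) >= threshold:
                group["members"].append(doc_id)
                group["vocab"] |= doc_terms
                break
        else:
            groups.append({"vocab": set(doc_terms), "members": [doc_id]})
    return groups


def label(group, docs, size=2):
    """Label a group with its most frequent terms."""
    counts = Counter(w for m in group["members"] for w in terms(docs[m]))
    return [w for w, _ in counts.most_common(size)]


docs = {
    "d1": "merger pricing schedule",
    "d2": "pricing for the merger",
    "d3": "softball league practice",
    "d4": "softball league signup",
}
groups = group_by_topic(docs)
for group in groups:
    print(sorted(label(group, docs)), group["members"])
```

No keywords are supplied up front: the two groups and their labels emerge from the documents' own vocabulary.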

3. Language Identification

This technology can identify all languages in a document as well as the primary language and pass this information along via a metadata field

A legal team needs to know what languages are in a collection, and the volume of foreign language documents

Reports can help determine whether to use machine translations, foreign language reviewers, or a combination
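
The idea of tagging each document with its languages can be sketched with a toy stopword counter. Production language identifiers rely on far richer models; the three tiny word lists and the metadata field names below are assumptions for illustration only.

```python
from collections import Counter

# Tiny illustrative stopword lists; not sufficient for real identification.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to"},
    "spanish": {"el", "la", "y", "es", "de"},
    "german": {"der", "die", "und", "ist", "das"},
}


def identify_languages(text):
    """Return metadata naming every detected language and the primary one."""
    counts = Counter()
    for word in text.lower().split():
        for language, stopwords in STOPWORDS.items():
            if word in stopwords:
                counts[language] += 1
    if not counts:
        return {"languages": [], "primary_language": None}
    return {
        "languages": sorted(counts),
        "primary_language": counts.most_common(1)[0][0],
    }


metadata = identify_languages(
    "The contract is attached and the original is signed la firma de Madrid"
)
print(metadata)
```

Aggregating the `primary_language` field across a collection gives the kind of volume report a legal team could use when weighing machine translation against foreign language reviewers.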

4. Email Threading

Identifies e-mail conversations based on content and groups them for review

Using actual content of the e-mails to identify e-mail threads is the most reliable method, as it will not fail to recognize a thread if the subject line changes or if e-mails are exchanged across different e-mail applications
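
A minimal sketch of content-based threading, assuming replies quote the earlier message in full and that messages arrive in chronological order. The record layout and helper names are hypothetical, and commercial tools use more robust matching than this substring test.

```python
def normalize(text):
    """Drop quote markers and collapse whitespace so quoted text compares equal."""
    lines = (line.lstrip("> ").strip() for line in text.splitlines())
    return " ".join(" ".join(lines).lower().split())


def contains_earlier(body, thread):
    """True if body quotes the full text of any earlier message in the thread."""
    for prior in thread:
        prior_body = normalize(prior["body"])
        if prior_body and prior_body in body:
            return True
    return False


def build_threads(emails):
    """Group e-mails by quoted content, ignoring subject lines entirely."""
    threads = []  # each thread is a list of e-mails, oldest first
    for email in emails:  # assumed to be in chronological order
        body = normalize(email["body"])
        for thread in threads:
            if contains_earlier(body, thread):
                thread.append(email)
                break
        else:
            threads.append([email])
    return threads


emails = [
    {"id": 1, "subject": "Q3 numbers", "body": "Please send the Q3 numbers by Friday."},
    {"id": 2, "subject": "RE: Q3 numbers",
     "body": "Attached.\n> Please send the Q3 numbers by Friday."},
    {"id": 3, "subject": "Thanks!",
     "body": "Got them, thanks.\n> Attached.\n> > Please send the Q3 numbers by Friday."},
    {"id": 4, "subject": "Q3 numbers", "body": "Different conversation about the offsite."},
]
thread_ids = [[e["id"] for e in thread] for thread in build_threads(emails)]
print(thread_ids)
```

Message 3 stays in the thread despite its changed subject line, while message 4 is kept separate even though it reuses the original subject.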

5. Near De-Duplication

Reviewers can quickly identify and compare documents that are very similar to one another but are not exact duplicates

The technology assesses similarities across the document set, identifying the most representative documents as “the core”

» All related documents are then grouped around the core
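
Near-duplicate grouping can be approximated with word shingles and Jaccard similarity. This sketch simplifies core selection by treating the first unmatched document as the core, which is an assumption made for brevity; the 0.5 threshold and all names are likewise hypothetical.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def similarity(a, b):
    """Jaccard similarity between two shingle sets."""
    return len(a & b) / len(a | b)


def near_duplicate_groups(docs, threshold=0.5):
    """Group documents around a core; here the core is the first unmatched document."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    groups = {}  # core id -> list of near-duplicate ids
    for doc_id, doc_set in sets.items():
        for core in groups:
            if similarity(doc_set, sets[core]) >= threshold:
                groups[core].append(doc_id)
                break
        else:
            groups[doc_id] = []
    return groups


docs = {
    "a": "the parties agree to the terms set out in this draft agreement",
    "b": "the parties agree to the terms set out in this final agreement",
    "c": "lunch is at noon on friday in the main conference room",
}
groups = near_duplicate_groups(docs)
print(groups)
```

Documents "a" and "b" differ by a single word and land in one group, so a reviewer can compare them side by side; "c" forms its own group.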

6. Sampling: Defensibility & Quality Control

Sampling is the practice of reviewing a set percentage of the documents in a data set or a particular folder of data

» Strengthens the defensibility of the process

» Helps validate what you have (and equally important, do not have) in your production set

» May take place iteratively throughout the review process or prior to production

– During ongoing quality control

– At the end to assess completeness of review
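
A quality-control sample of this kind can be drawn as sketched below. The folder contents, the 5% rate, and the recheck findings are invented for illustration; recording the seed is one assumed way to make the draw reproducible if the process is later questioned.

```python
import math
import random


def draw_sample(doc_ids, percent, seed=None):
    """Return a simple random sample covering `percent` of the documents."""
    size = max(1, math.ceil(len(doc_ids) * percent / 100))
    return random.Random(seed).sample(list(doc_ids), size)


# Hypothetical folder of 2,000 documents coded "not relevant" during review.
not_relevant_folder = [f"DOC-{n:05d}" for n in range(2000)]
sample = draw_sample(not_relevant_folder, percent=5, seed=42)

# A second-level reviewer rechecks the sample; these findings are invented.
miscoded = set(sample[:3])
error_rate = len(miscoded) / len(sample)
print(f"checked {len(sample)} documents, {error_rate:.1%} miscoded")
```

The same helper can be rerun at each quality-control checkpoint and again before production to assess the completeness of the review.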
