Redefining High Speed eDiscovery Processing & Production

Conversion of the EDRM Enron Dataset from Natives to TIFF images in 5.3 hours (23 Million pages/day rate) using the Lexbe eDiscovery Processing System

August 6, 2014

Karsten Weber, Principal, Lexbe LC


eDiscovery Webinar Series: Information

○ Takes place monthly

○ Covers a variety of relevant eDiscovery topics

○ Presentations available for download by registrants

High Speed eDiscovery Processing & Production | EDRM Enron Demonstration | August 2014

eDiscovery Webinar Series: About Lexbe

Lexbe is an Austin, TX based eDiscovery software and services provider.

○ Lexbe eDiscovery Platform: a hosted eDiscovery processing and review tool. Users can load a variety of file types, process them for review, OCR them for search, conduct document reviews and productions, prepare for depositions and analyze transcripts, conduct case analytics, prepare for dispositive motions, and provide litigation support during trial.

○ Lexbe eDiscovery Services: Lexbe performs large-volume document culling, processing from native to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26 and project management consulting, and related eDiscovery services.

Lexbe Sales: [email protected], (800) 401-7809 x22

eDiscovery Webinar Series: Questions & Technical Issues

If you have any questions or technical issues, please e-mail them to:

[email protected]

Questions will be forwarded to Gene and answered during the webinar, or via e-mail if we run out of time.

eDiscovery Webinar Series: Karsten Weber Bio

○ Current
- Principal of Lexbe LC
- Principal Architect of Lexbe eDiscovery Platform and Lexbe eDiscovery Services

○ Prior Experience
- Consulting Expert, Lumin Expert Group
- Director of Software, nLine Corporation
- Software Engineering Manager, KLA-Tencor

○ Education
- MBA, University of Texas
- M.S. Engineering, Danish Technical University

Contact Karsten: [email protected]

Executive Summary: High-Speed eDiscovery Processing

○ Background of eDiscovery Processing & Production

○ eDiscovery Review Tools in Use Today

○ TIFF Popularity and Processing Throughput Challenge

○ The Lexbe eDiscovery Processing System

○ Test Methodology & the EDRM Enron Data Set

○ Performance Results

○ Comparison with a Large Provider Using Traditional Processing Methods

○ Conclusion

Data Types and Volume Keep Expanding: Growth of Data Worldwide

Data sources shown: VoIP, Email, iPhones, Peer-to-Peer, Online Storage, Digital Cameras, Facebook, LinkedIn, Dropbox, Backup Devices, Elastic Storage, SaaS, PaaS, Google Streets, Personal Blogs, Skype, World Satellite Images, Personal Scanners, Customer Service Recordings, Public Webcams, Google Goggles, Netbooks, Cloud Instance Servers.

[Chart: Digital Information Created, Captured, Replicated Worldwide, in zettabytes (y-axis 1 to 4), for 2005, 2010, and 2015. 1 zettabyte = 1 trillion gigabytes. Source: IDC Digital Universe Study (2012).]

Growth of eDiscovery Processing: Data Volume is Rising

[Chart: GBs of ESI in a Typical Commercial Case, y-axis from Low to High, x-axis 1995 to 2015.]

Enron Criminal Trial (2005)
○ Source ESI: 100M pages (~4 TBs)
○ Brought to trial: 1M pages (~40 GBs)
○ Extraordinary at the time
○ Not now

Microsoft (2011)
○ Microsoft collects 45 custodians per matter, on average (2011)
○ Almost 1 TB per matter, on average

Growth of eDiscovery Processing: Processing Costs Are Falling - But Still High

[Chart: Cost per GB to Process ESI in Volume, y-axis $0 to $2,000, x-axis 2005 to 2015.]

○ $1,800/GB (2006). Source: Forrester Research
○ $500/GB (2011). Source: Forrester Research

ESI processing costs have fallen 90% in the last 10 years.
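The two Forrester figures quoted on this slide can be compared directly. The sketch below is illustrative arithmetic only: the function name is hypothetical, and because the slide does not show the data points behind the 90% figure, the second calculation simply shows what a 90% decline from the 2006 figure would amount to.

```python
def percent_decline(start_cost: float, end_cost: float) -> float:
    """Return the percentage drop from start_cost to end_cost."""
    if start_cost <= 0:
        raise ValueError("start_cost must be positive")
    return (start_cost - end_cost) / start_cost * 100.0


# Per-GB processing costs quoted on the slide (Forrester Research).
cost_2006 = 1800.0
cost_2011 = 500.0

decline = percent_decline(cost_2006, cost_2011)
print(f"Decline from 2006 to 2011: {decline:.1f}%")

# Illustration only: the per-GB cost that a 90% decline from the
# 2006 figure would imply. This value is not stated in the slides.
implied_cost = cost_2006 * (1 - 0.90)
print(f"Cost implied by a 90% decline from $1,800/GB: ${implied_cost:.0f}/GB")
```

On these two points alone the decline is roughly 72% over five years, so the 90% figure presumably rests on a longer span or on endpoints not shown here.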

Growth of eDiscovery Processing: eDiscovery Market is Big & Growing

eDiscovery Software & Services

○ $5.5 billion today
○ Growing 15.5% annually
○ Projected $9.8 billion (2017)
○ Services (72%)
○ Software (28%)

Source: Complex Discovery (ComplexDiscovery.com), based on a combination of public market sizing estimates.
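The market projection above is consistent with simple compound growth. The sketch below checks that arithmetic; the helper name is hypothetical, and the four-year horizon is an assumption, since the slide does not state the base year for "today".

```python
def compound(value: float, annual_rate: float, years: int) -> float:
    """Grow value at annual_rate, compounded once per year."""
    return value * (1 + annual_rate) ** years


market_today_bn = 5.5   # $ billions, from the slide
growth_rate = 0.155     # 15.5% annually, from the slide

# Assumption: the projection horizon is four years (not stated in the slide).
for years in range(1, 5):
    projected = compound(market_today_bn, growth_rate, years)
    print(f"After {years} year(s): ${projected:.2f} billion")
```

Four years of 15.5% growth on $5.5 billion gives about $9.8 billion, matching the projected 2017 figure.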

eDiscovery Processing Background: Processing Activities & Functions

Collection
○ Identify and execute retrieval of discoverable documents and electronic evidence.

Culling
○ Reduce collections using keyword or date range parameters.

Native Processing
○ Convert native documents (Outlook, Microsoft Office, etc.) into reviewable formats (TIFF, PDF, near native).
○ Can include application of OCR to make documents searchable.

Review
○ Load/ingest ESI into a litigation database to prepare for trial.

Production
○ Create a production in a specified format and apply Bates numbers.
○ Apply privilege QC procedures to avoid inadvertently producing confidential case documents.

[Processing graphic, stage labels: Setup & Planning; Collection; Culling & Analysis; Review & Production; Depos & Motions; Processing.]
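The culling step described above, reducing a collection by keyword or date range, can be sketched as a simple filter. This is a minimal illustration, not Lexbe's implementation: the `Document` record, the `cull` function, and the sample records are all hypothetical, and the sketch applies both filters together, whereas a real workflow might apply either one alone.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    sent: date
    text: str


def cull(documents, keywords, start, end):
    """Keep documents dated within [start, end] that mention any keyword."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for doc in documents:
        in_range = start <= doc.sent <= end
        has_keyword = any(k in doc.text.lower() for k in lowered)
        if in_range and has_keyword:
            kept.append(doc)
    return kept


# Invented sample records, for illustration only.
docs = [
    Document("DOC-1", date(2001, 5, 14), "Quarterly forecast for the trading desk"),
    Document("DOC-2", date(1999, 1, 3), "Forecast review meeting"),
    Document("DOC-3", date(2001, 8, 2), "Lunch on Friday?"),
]

kept = cull(docs, ["forecast"], date(2001, 1, 1), date(2001, 12, 31))
print([d.doc_id for d in kept])
```

Only DOC-1 survives: DOC-2 falls outside the date range and DOC-3 contains no keyword.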

eDiscovery Processing Background: Review Environments and TIFF

Type: TIFF
Examples: Concordance, Summation, CaseLogistix, RingTail, iConnect
○ Currently the most commonly used format/review environment
○ Must process ESI to single-page TIFFs with text and load files before review

Type: PDF
Examples: WorldDox, Adobe
○ Requires documents to be converted to PDF for review

Type: Processed Natives
Examples: Relativity, Allegro
○ Must process ESI into a ‘native load file’
○ Generate ‘near native’ HTML for review

Type: Raw Natives
Examples: Lexbe, Digital Warroom, NextPoint
○ Load raw natives that will be automatically processed within the review software

eDiscovery Processing Background: TIFF Background

○ A 2013 ILTA (International Legal Technology Association) survey found that the vast majority (91%) of firms still use TIFF-based software.

TIFF Benefits:

○ Standardized review format
○ Page-level Bates stamping can be applied
○ Addresses concern of opposition altering native files
○ Easy to redact
○ TIFF viewer is the only requirement
○ Often can be hosted & supported internally
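Page-level Bates stamping, listed among the TIFF benefits above, amounts to assigning one sequential identifier per page across the whole production. The sketch below shows that numbering under stated assumptions: the function name, the `ABC` prefix, the seven-digit width, and the page counts are all hypothetical, and the actual stamping of numbers onto images is not shown.

```python
def assign_bates(page_counts, prefix, start=1, width=7):
    """Map each document to its (first, last) Bates number.

    page_counts: dict of document id -> number of pages, in production order.
    """
    ranges = {}
    next_number = start
    for doc_id, pages in page_counts.items():
        if pages < 1:
            raise ValueError(f"{doc_id} has no pages to number")
        first = next_number
        last = next_number + pages - 1
        ranges[doc_id] = (f"{prefix}{first:0{width}d}", f"{prefix}{last:0{width}d}")
        next_number = last + 1
    return ranges


# Invented page counts, for illustration only.
production = {"DOC-1": 3, "DOC-2": 1, "DOC-3": 2}
for doc_id, (first, last) in assign_bates(production, "ABC").items():
    print(doc_id, first, last)
```

Because every page gets its own number, a single-page TIFF can be cited unambiguously, which is harder to do with an unpaginated native file.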

eDiscovery Processing Background: The TIFFing Challenge

○ Traditional TIFFing methods have been time-consuming and expensive because the process requires considerable computing power.

○ As data volumes continue to grow, the time and expense issues associated with TIFFing become more severe.

Meeting the Challenge - Study Goals: High-Speed eDiscovery Processing

Evaluate the capabilities of the Lexbe eDiscovery Processing System (LEPS) under testable and repeatable conditions.

● Use an industry-standard dataset to ensure transparent results. The study was run on the 53 GB EDRM Enron Data Set.

● What is the TIFF throughput rate of LEPS?

● How automated is LEPS?

● What quality control procedures are in place?

● How does LEPS compare to current industry leaders?

High-Speed Processing Demonstration: Lexbe Architecture

Scalable: Systems architecture allows LEPS to increase server instances to apply more resources to your processing task.

Automated: LEPS minimizes the need for ‘babysitting’.

Fault Tolerant: Processing tasks are not ‘batch-centric’, and check-out/check-in procedures ensure individual processing steps operate independently.

Secure Processing Environment: LEPS is powered by Amazon S3 servers to facilitate redundancy and high security standards. All data is strongly encrypted (256-bit) in transit and in place. Our data centers provide SOC 1 and SOC 2 reports published under the SSAE 16 and ISAE 3402 professional standards and are ISO 27001 certified.
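The check-out/check-in idea mentioned under fault tolerance can be illustrated with a small task board. The slides do not describe how LEPS implements it, so everything below is an assumption: the `TaskBoard` class, the lease timeout, and the rule that an expired check-out returns the task to the pending list are one common way to keep a failed worker from stalling the rest of a job.

```python
class TaskBoard:
    """Tracks per-file tasks that workers check out and check back in."""

    def __init__(self, task_ids, lease_seconds=60.0):
        self.lease_seconds = lease_seconds
        self.pending = list(task_ids)
        self.checked_out = {}   # task id -> time at which the lease expires
        self.done = set()

    def check_out(self, now):
        """Hand the next pending task to a worker, or None if there is none."""
        self._reclaim_expired(now)
        if not self.pending:
            return None
        task_id = self.pending.pop(0)
        self.checked_out[task_id] = now + self.lease_seconds
        return task_id

    def check_in(self, task_id):
        """Mark a checked-out task as finished."""
        if task_id in self.checked_out:
            del self.checked_out[task_id]
            self.done.add(task_id)

    def _reclaim_expired(self, now):
        """Return tasks whose lease has expired to the pending list."""
        for task_id, expiry in list(self.checked_out.items()):
            if expiry <= now:
                del self.checked_out[task_id]
                self.pending.append(task_id)


board = TaskBoard(["file-a", "file-b"], lease_seconds=60)
first = board.check_out(now=0)     # a worker takes file-a, then crashes
second = board.check_out(now=1)    # another worker takes file-b
board.check_in(second)             # file-b completes normally
retry = board.check_out(now=120)   # file-a's lease has expired, so it is reissued
print(first, second, retry, sorted(board.done))
```

Each file is tracked on its own, so one failure delays only that file rather than an entire batch.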

High-Speed Processing Demonstration: Lexbe Process

○ Archive/Container Decompression

○ File Repair

○ Metadata extraction & fielding

○ MD5 hash code generation

○ System file identification & DeNIST

○ Email attachment extraction & parent email association

○ Native text extraction

○ OCR of image files

○ Full-text indexing

○ Bates stamping

○ PDF & TIFF creation

○ Placeholder creation

○ Native extracted, PDF, and TIFF load file generation in multiple formats: XLSX (Lexbe), DAT/OPT (CaseLogistix, Concordance, Ipro Allegro, Ringtail, kCura Relativity), and DII (Summation), plus quality control reports
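Two of the steps in the process list, MD5 hash code generation and DeNIST, fit together naturally: DeNISTing removes files whose hashes appear on a list of known system files. The sketch below is a minimal illustration using Python's standard `hashlib`; the function names, the in-memory file list, and the tiny stand-in hash set are hypothetical, and a real run would load a published reference hash list instead. Skipping exact duplicates by hash is added here as a related use of the same digest, not as a step this slide names.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def filter_files(files, known_system_hashes):
    """Drop files whose hash is on the known-system list or already seen.

    files: iterable of (name, content bytes) pairs.
    Returns a list of (name, md5 hex digest) for the files that are kept.
    """
    seen = set()
    kept = []
    for name, data in files:
        digest = md5_hex(data)
        if digest in known_system_hashes or digest in seen:
            continue
        seen.add(digest)
        kept.append((name, digest))
    return kept


# Invented sample content, for illustration only.
files = [
    ("memo.txt", b"hello"),
    ("copy_of_memo.txt", b"hello"),      # exact duplicate of memo.txt
    ("system.dll", b"system file"),      # stands in for a known system file
]
known_system_hashes = {md5_hex(b"system file")}

for name, digest in filter_files(files, known_system_hashes):
    print(name, digest)
```

Only `memo.txt` is kept: the copy hashes identically, and the system file matches the reference set.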

High-Speed Processing Demonstration: Results

High-Speed Processing Demonstration: Sample Output

○ High-quality output is critical, especially when making a claim of increased efficiency.

High-Speed Processing Demonstration: Lexbe Quality Control Tools and Features

○ Programmatic batching of processing to individual servers (reduces human error)
○ Custom QC flag creation and filtering
○ Integration with Excel for reporting and analysis
○ Pivot table analysis and charting
○ Ability to view all documents, including parent containers (email and attachments), together
○ Ability to verify image quality
○ Filtering and reporting by any captured or calculated fields, including failed to convert, words in document, placeholders, etc.
○ Native files are extracted and provided for linked load and review
○ Statistical sampling and reporting

High-quality output is critical, especially when making a claim of increased efficiency.
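Statistical sampling for QC, the last item in the list above, usually means reviewing a random subset large enough to estimate an error rate. The slides do not say which method Lexbe uses, so the sketch below is an assumption: it applies the standard sample-size formula for a proportion with a finite population correction, then draws a simple random sample. The function name, the 95% confidence and 5% margin defaults, and the document identifiers are hypothetical.

```python
import math
import random


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    z=1.96 corresponds to 95% confidence; p=0.5 is the most conservative guess.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Invented document identifiers, for illustration only.
doc_ids = [f"DOC-{i:05d}" for i in range(1, 10001)]

n = sample_size(len(doc_ids))
rng = random.Random(42)            # fixed seed so the draw is repeatable
qc_sample = rng.sample(doc_ids, n)
print(f"Review {n} of {len(doc_ids)} documents")
```

A fixed seed makes the selection reproducible, which helps when a QC report needs to show which documents were checked.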

eDiscovery Processing Background: Providers of TIFF Processing

Type: Service Providers
Examples: Xerox, Lexbe, etc.
● Business service bureaus that deliver a wide range of processing services.
● Local server setup and capacity

Type: Professionals
Examples: Internal Litigation Support
● Departments inside law firms responsible for conducting litigation support processing functions.
● Often work with service and software providers to meet internal demands.

Type: Software Providers
Examples: Ipro, Law
● Develop processing software that is licensed for resale by service providers or for use in internal litigation support departments

Lexbe v. Xerox: Compare Lexbe to Industry Leaders

○ Xerox is known for its high-volume litigation processing and production capacity.

○ Xerox states in its service literature that its production capacity is 5 million pages a day.
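The comparison with Xerox's stated capacity can be put in numbers using the figures from the title slide (5.3 hours, 23 million pages/day rate). The sketch below is illustrative arithmetic only: the total page count is derived from the quoted rate rather than stated in the slides, and it sets a measured rate against a published capacity figure, which may not be a like-for-like comparison.

```python
run_hours = 5.3                    # duration of the Enron run, from the title slide
lexbe_rate_per_day = 23_000_000    # pages/day rate, from the title slide
xerox_rate_per_day = 5_000_000     # capacity stated in Xerox service literature

# Pages implied by the quoted rate over the run (derived, not stated).
pages_in_run = lexbe_rate_per_day * run_hours / 24

# Ratio of the two daily rates.
ratio = lexbe_rate_per_day / xerox_rate_per_day

# Hours the same page volume would take at the stated Xerox capacity.
xerox_hours = pages_in_run / xerox_rate_per_day * 24

print(f"Pages implied by the run: {pages_in_run:,.0f}")
print(f"Rate ratio: {ratio:.1f}x")
print(f"Same volume at 5M pages/day: {xerox_hours:.1f} hours")
```

On these figures the run covers roughly 5.1 million pages, and the same volume would take about a day at 5 million pages per day.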

Summary: High-Speed eDiscovery Processing

○ TIFF is important and turnaround time is critical

○ Traditional approaches:

■ Fixed capacity leading to variable turnaround time.

○ Lexbe approach:

■ Scalable capacity leading to fixed turnaround time.

○ The Lexbe study demonstrates what we believe is the world's fastest TIFF processing, thereby allowing you to meet even the toughest discovery deadlines.
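The contrast between fixed capacity and scalable capacity can be made concrete with a small calculation. Every number below is hypothetical: the per-instance rate, the instance count, the job sizes, and the deadline are invented for illustration, and the sketch assumes throughput scales linearly with the number of server instances, which the slides do not quantify.

```python
import math


def turnaround_hours(pages, pages_per_instance_hour, instances):
    """Hours to finish a job with a fixed number of instances."""
    return pages / (pages_per_instance_hour * instances)


def instances_needed(pages, pages_per_instance_hour, deadline_hours):
    """Instances required to finish a job within a fixed deadline."""
    return math.ceil(pages / (pages_per_instance_hour * deadline_hours))


RATE = 10_000          # hypothetical pages per instance per hour
FIXED_INSTANCES = 20   # hypothetical fixed-capacity shop
DEADLINE_HOURS = 6     # hypothetical target turnaround

for pages in (1_000_000, 5_000_000, 20_000_000):
    fixed = turnaround_hours(pages, RATE, FIXED_INSTANCES)
    scaled = instances_needed(pages, RATE, DEADLINE_HOURS)
    print(f"{pages:>10,} pages: fixed capacity {fixed:.0f} h, "
          f"or {scaled} instances for a {DEADLINE_HOURS} h deadline")
```

With fixed capacity the turnaround grows with the job; with scalable capacity the instance count grows instead and the deadline stays put.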

Related Lexbe Services: High-Speed eDiscovery Processing

○ ESI Culling: Reduce ESI stores to manageable sizes with DeNIST, deduplication, date culling, and keyword culling. Metadata extraction and PST reconstitution are available as well.

○ ESI Email Collection: Flatten and extract native file attachments and metadata to create load files in preparation for native or near-native review.

○ Native Processing: Convert native documents, including Outlook email and Microsoft Office files, into TIFF or PDF format for searchability, Bates stamping, and preparation for online review.

○ eDiscovery OCR: Apply optical character recognition to increase searchability of PDFs, TIFFs, or document-formatted JPGs or PNGs.

○ NearDup Groupings: Identify key documents, group similar documents, ensure consistency in privilege coding, and enable email threading.

Thank You: Contact Info

Karsten Weber: [email protected], (800) 401-7809

Stu Van Dusen, Marketing Manager: [email protected], (512) 669-9485

Webinar Questions: [email protected]