Redefining High Speed eDiscovery Processing & Production
Conversion of the EDRM Enron Dataset from Natives to TIFF images in 5.3 hours (23 Million pages/day rate) using the Lexbe eDiscovery Processing System
August 6, 2014
Karsten Weber
Principal, Lexbe LC
eDiscovery Webinar Series
○ Takes Place Monthly
○ Covers a Variety of Relevant eDiscovery Topics
○ Presentations Available for Download by Registrants
Information
High Speed eDiscovery Processing & Production | EDRM Enron Demonstration | August 2014
Lexbe is an Austin, TX based eDiscovery software and services provider.
○ Lexbe eDiscovery Platform: a hosted eDiscovery processing and review tool. Users can load a variety of file types, process them for review, OCR them for search, conduct document reviews and productions, prepare for depositions and analyze transcripts, conduct case analytics, prepare for dispositive motions, and provide litigation support during trial.
○ Lexbe eDiscovery Services: Lexbe performs large-volume document culling, processing from native to PDF or TIFF, load file creation, high-volume OCR of image files, Rule 26 and project management consulting, and related eDiscovery services.
About Lexbe
Lexbe Sales [email protected]
(800) 401-7809 x22
If you have any questions or technical issues, please e-mail them to:
Questions will be forwarded to Gene and answered during the webinar or via e-mail if we run out of time.
Questions & Technical Issues
Karsten Weber Bio
○ Current
- Principal of Lexbe LC
- Principal Architect of Lexbe eDiscovery Platform and Lexbe eDiscovery Services
○ Prior Experience
- Consulting Expert, Lumin Expert Group
- Director of Software, nLine Corporation
- Software Engineering Manager, KLA-Tencor
○ Education
- MBA, University of Texas
- M.S. Engineering, Danish Technical University
Contact Karsten [email protected]
○ Background of eDiscovery Processing & Production
○ eDiscovery Review Tools in Use Today
○ TIFF Popularity and Processing Throughput Challenge
○ The Lexbe eDiscovery Processing System
○ Test Methodology & the EDRM Enron Data Set
○ Performance Results
○ Comparison with a Large Provider Using Traditional Processing Methods
○ Conclusion
Executive Summary: High-Speed eDiscovery Processing
Data Types and Volume Keep Expanding: Growth of Data Worldwide
[Figure: Digital information created, captured, and replicated worldwide, in zettabytes (axis 1 to 4), 2005 to 2015. Source: IDC Digital Universe Study (2012). 1 zettabyte = 1 trillion gigabytes.]
Data types shown: VoIP, Email, iPhones, Peer-to-Peer, Online Storage, Digital Cameras, Facebook, LinkedIn, DropBox, Backup Devices, Elastic Storage, SaaS, PaaS, Google Streets, Personal Blogs, Skype, World Satellite Images, Personal Scanners, Customer Service Recordings, Public Webcams, Google Goggles, Netbooks, Cloud Instance Servers
Growth of eDiscovery Processing: Data Volume is Rising
[Figure: GBs of ESI in a typical commercial case, low and high estimates, 1995 to 2015]
Enron Criminal Trial (2005)
○ Source ESI: 100M pages (~4 TBs)
○ Brought to Trial: 1M pages (~40 GBs)
○ Extraordinary at the time
○ Not now
Microsoft (2011)
○ Microsoft collects 45 custodians per matter, on average (2011)
○ Almost 1 TB per matter, on average
Growth of eDiscovery Processing: Processing Costs Are Falling - But Still High
[Figure: Cost per GB to process ESI in volume, $0 to $2,000, 2005 to 2015]
○ $1,800/GB (2006). Source: Forrester Research
○ $500/GB (2011). Source: Forrester Research
○ ESI processing costs have fallen 90% in the last 10 years
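The two Forrester data points can be turned into a decline rate with simple arithmetic. The sketch below computes the drop between 2006 and 2011 and then extrapolates the same yearly rate to ten years; the extrapolation is an assumption made here for illustration, not something the Forrester figures state.

```python
def percent_decline(start: float, end: float) -> float:
    """Total percentage drop from start to end."""
    return (start - end) / start * 100.0


def annual_decline_rate(start: float, end: float, years: int) -> float:
    """Constant yearly percentage drop that takes start to end in `years`."""
    return (1.0 - (end / start) ** (1.0 / years)) * 100.0


COST_2006 = 1800.0  # $/GB, from the slide
COST_2011 = 500.0   # $/GB, from the slide

five_year_drop = percent_decline(COST_2006, COST_2011)
yearly_drop = annual_decline_rate(COST_2006, COST_2011, 2011 - 2006)

# Assumption: the same yearly rate holds for a full ten years.
ten_year_drop = (1.0 - (1.0 - yearly_drop / 100.0) ** 10) * 100.0

print(f"2006 to 2011 drop: {five_year_drop:.1f}%")
print(f"Implied yearly drop: {yearly_drop:.1f}%")
print(f"Ten-year drop at that rate: {ten_year_drop:.1f}%")
```

Under that assumption the ten-year figure lands close to the 90% quoted on the slide, while the two cited points alone span only five years.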
Growth of eDiscovery Processing: eDiscovery Market is Big & Growing
eDiscovery Software & Services
○ $5.5 Billion today
○ Growing 15.5% annually
○ Projected $9.8 Billion (2017)
○ Services (72%)
○ Software (28%)
Source: Complex Discovery (ComplexDiscovery.com), based on a combination of public market sizing estimates.
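The market figures above fit a plain compound-growth calculation. The following sketch checks how many years of 15.5% growth take $5.5 billion to $9.8 billion and splits the current figure by the stated services and software shares. The function names are illustrative, and treating the growth rate as constant is an assumption.

```python
import math


def project(value: float, annual_growth: float, years: float) -> float:
    """Value after compounding at annual_growth for the given number of years."""
    return value * (1.0 + annual_growth) ** years


def years_to_reach(value: float, target: float, annual_growth: float) -> float:
    """Years of compounding needed for value to reach target."""
    return math.log(target / value) / math.log(1.0 + annual_growth)


MARKET_TODAY_B = 5.5   # $ billions, from the slide
GROWTH = 0.155         # 15.5% per year, from the slide
TARGET_2017_B = 9.8    # $ billions, from the slide

after_four_years = project(MARKET_TODAY_B, GROWTH, 4)
years_needed = years_to_reach(MARKET_TODAY_B, TARGET_2017_B, GROWTH)
services_b = MARKET_TODAY_B * 0.72
software_b = MARKET_TODAY_B * 0.28

print(f"After 4 years: ${after_four_years:.2f}B")
print(f"Years to reach ${TARGET_2017_B}B: {years_needed:.2f}")
print(f"Services: ${services_b:.2f}B, Software: ${software_b:.2f}B")
```

The projection matches the 2017 figure after roughly four years of compounding, which suggests the "today" estimate is based on market data from about four years before 2017.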
eDiscovery Processing Background: Processing Activities & Functions
Collection
○ Identify and execute retrieval of discoverable documents and electronic evidence
Culling
○ Reduce collections using keyword or date range parameters
Native Processing
○ Convert native documents (Outlook, Microsoft Office, etc.) into reviewable formats (TIFF, PDF, near native)
○ Can include application of OCR to make documents searchable
Review
○ Load/ingest ESI into a litigation database to prepare for trial
Production
○ Create a production in a specified format and apply Bates numbers
○ Apply privilege QC procedures to avoid inadvertently producing confidential case documents
[Graphic: processing workflow. Stages shown: Setup & Planning, Collection, Culling & Analysis, Review & Production, Depos & Motions, Processing]
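The Production step applies Bates numbers at the page level, so each document receives a contiguous begin/end range. A minimal sketch of that bookkeeping follows; the prefix `ABC`, the seven-digit width, and the function name are hypothetical choices for illustration, not a description of any particular tool.

```python
from typing import List, Tuple


def assign_bates(page_counts: List[int], prefix: str = "ABC",
                 start: int = 1, width: int = 7) -> List[Tuple[str, str]]:
    """Return (first, last) Bates numbers for each document, in order.

    Every page consumes one number, so a document with N pages
    occupies N consecutive numbers.
    """
    ranges = []
    next_number = start
    for pages in page_counts:
        if pages < 1:
            raise ValueError("each document needs at least one page")
        first = f"{prefix}{next_number:0{width}d}"
        last = f"{prefix}{next_number + pages - 1:0{width}d}"
        ranges.append((first, last))
        next_number += pages
    return ranges


# Three documents with 3, 1, and 2 pages.
for first, last in assign_bates([3, 1, 2]):
    print(first, last)
```

Keeping the counter outside the per-document logic is what guarantees that ranges never overlap across a production.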
eDiscovery Processing Background: Review Environments and TIFF
Type: TIFF
Examples: Concordance, Summation, CaseLogistix, RingTail, iConnect
○ Currently the most commonly used format/review environment
○ Must process ESI to single-page TIFFs with text and load files before review
Type: PDF
Examples: WorldDox, Adobe
○ Requires documents to be converted to PDF for review
Type: Processed Natives
Examples: Relativity, Allegro
○ Must process ESI into a 'native load file'
○ Generate 'near native' HTML for review
Type: Raw Natives
Examples: Lexbe, Digital Warroom, NextPoint
○ Load raw natives that will be automatically processed within the review software
eDiscovery Processing Background: TIFF Background
○ A 2013 ILTA (International Legal Technology Association) survey found that the vast majority (91%) of firms still use TIFF-based software.
TIFF Benefits:
○ Standardized review format
○ Page-level Bates stamping can be applied
○ Addresses concern of opposition altering native files
○ Easy to redact
○ TIFF viewer is the only requirement
○ Often can be hosted & supported internally
eDiscovery Processing Background: The TIFFing Challenge
○ Traditional TIFFing methods have been time consuming and expensive because the process requires considerable computing power
○ As data volumes continue to increase in size, the time and expense issues associated with TIFFing become more severe
Meeting the Challenge - Study Goals: High-Speed eDiscovery Processing
● Use an industry-standard dataset to ensure transparent results. The study was run on the 53 GB EDRM Enron Data Set.
● What is the TIFF throughput rate of the Lexbe eDiscovery Processing System (LEPS)?
● How automated is LEPS?
● What quality control procedures are in place?
● How does LEPS compare to current industry leaders?
Evaluate the capabilities of the Lexbe eDiscovery Processing System (LEPS) under testable and repeatable conditions
High-Speed Processing Demonstration: Lexbe Architecture
Scalable: The systems architecture allows LEPS to increase server instances to apply more resources to your processing task.
Automated: LEPS minimizes the need for 'babysitting'.
Fault Tolerant: Processing tasks are not 'batch-centric', and check-out/check-in procedures ensure that individual processing steps operate independently.
Secure Processing Environment: LEPS is powered by Amazon S3 servers to facilitate redundancy and high security standards. All data is strongly encrypted (256-bit) in transit and at rest. Our data centers provide SOC 1 and SOC 2 reports published under the SSAE 16 and ISAE 3402 professional standards and are ISO 27001 certified.
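The check-out/check-in idea behind the fault-tolerance point can be illustrated with a small lease-based task board: a worker checks out one item at a time, and an item whose worker disappears becomes available again once its lease expires. This is a generic sketch of the pattern under assumed names (`TaskBoard`, `lease_seconds`); the presentation does not describe how LEPS implements it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Task:
    task_id: str
    state: str = "pending"          # pending, checked_out, or done
    owner: Optional[str] = None
    lease_expires: float = 0.0


class TaskBoard:
    """In-memory board where each item is processed independently."""

    def __init__(self, task_ids: Iterable[str], lease_seconds: float = 300.0):
        self.tasks: Dict[str, Task] = {tid: Task(tid) for tid in task_ids}
        self.lease_seconds = lease_seconds

    def check_out(self, worker: str, now: float) -> Optional[str]:
        """Hand the worker one available item, or None if nothing is available."""
        for task in self.tasks.values():
            expired = task.state == "checked_out" and now >= task.lease_expires
            if task.state == "pending" or expired:
                task.state = "checked_out"
                task.owner = worker
                task.lease_expires = now + self.lease_seconds
                return task.task_id
        return None

    def check_in(self, task_id: str, worker: str) -> bool:
        """Mark an item done; rejected if the worker no longer holds it."""
        task = self.tasks[task_id]
        if task.state != "checked_out" or task.owner != worker:
            return False
        task.state = "done"
        task.owner = None
        return True


board = TaskBoard(["doc-1", "doc-2"], lease_seconds=10.0)
first = board.check_out("worker-a", now=0.0)     # worker-a takes an item
second = board.check_out("worker-b", now=1.0)    # worker-b takes the other
retried = board.check_out("worker-c", now=10.5)  # worker-a's lease has lapsed
print(first, second, retried)
```

Because a failed worker only strands the single item it held, the rest of the job keeps moving, which is the practical difference from batch-centric processing.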
High-Speed Processing Demonstration: Lexbe Process
○ Archive/Container Decompression
○ File Repair
○ Metadata extraction & fielding
○ MD5 hash code generation
○ System file identification & DeNIST
○ Email attachment extraction & parent email association
○ Native text extraction
○ OCR of image files
○ Full-text indexing
○ Bates stamping
○ PDF & TIFF creation
○ Placeholder creation
○ Native extracted, PDF and TIFF loadfile generation in multiple formats: XLSX (Lexbe), DAT/OPT (Case Logistix, Concordance, iPro Allegro, Ringtail, kCura Relativity) and DII (Summation), plus quality control reports
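The MD5 hash code generation step in the list above can be sketched with Python's standard `hashlib`. One common use of such hashes is grouping byte-identical files so exact duplicates can be spotted; the presentation does not say how LEPS uses them, so the grouping below is an assumption for illustration, and the file names are made up.

```python
import hashlib
from typing import Dict, List


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


def group_by_hash(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each digest to the sorted names of the files that share it."""
    groups: Dict[str, List[str]] = {}
    for name in sorted(files):
        groups.setdefault(md5_hex(files[name]), []).append(name)
    return groups


# Hypothetical in-memory files; real input would be read from disk in chunks.
sample_files = {
    "a.msg": b"hello",
    "b.msg": b"hello",
    "c.doc": b"world",
}
hash_groups = group_by_hash(sample_files)
duplicates = {h: names for h, names in hash_groups.items() if len(names) > 1}
print(duplicates)
```

Hashing the raw bytes means two files match only when they are identical bit for bit, so near-duplicates still need a separate comparison.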
High-Speed Processing Demonstration: Results
○ High quality output is critical, especially when making a claim of increased efficiency.
High-Speed Processing Demonstration: Sample Output
High-Speed Processing Demonstration: Lexbe Quality Control Tools and Features
○ Programmatic batching of processing to individual servers (reduces human error)
○ Custom QC flag creation and filtering
○ Integration with Excel for reporting and analysis
○ Pivot table analysis and charting
○ Ability to view all documents including parent containers (email and attachments) together
○ Ability to verify image quality
○ Filtering and reporting by any captured or calculated fields including failed to convert, words in document, placeholders, etc.
○ Native files are extracted and provided for linked load and review
○ Statistical sampling and reporting
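Statistical sampling for QC, the last item above, usually means reviewing a random subset of documents and estimating the defect rate of the whole set from it. The sketch below draws a reproducible sample and computes a normal-approximation 95% interval; the sample size, seed, and defect count are invented for illustration and do not come from the study.

```python
import math
import random
from typing import Iterable, List, Tuple


def draw_sample(doc_ids: Iterable[str], sample_size: int, seed: int = 0) -> List[str]:
    """Pick sample_size distinct documents; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(sorted(doc_ids), sample_size)


def defect_rate_interval(defects: int, sample_size: int,
                         z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for the defect rate."""
    p = defects / sample_size
    margin = z * math.sqrt(p * (1.0 - p) / sample_size)
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical run: 10,000 documents, 400 reviewed, 8 found defective.
all_ids = [f"DOC{i:05d}" for i in range(10000)]
qc_sample = draw_sample(all_ids, 400, seed=42)
low, high = defect_rate_interval(defects=8, sample_size=400)
print(f"Reviewed {len(qc_sample)} documents; defect rate between {low:.2%} and {high:.2%}")
```

The normal approximation is a simplification that becomes unreliable when very few defects are found; an exact or Wilson interval would be a safer choice in that case.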
eDiscovery Processing Background: Providers of TIFF Processing
Type: Service Providers
Examples: Xerox, Lexbe, etc.
● Business service bureaus that deliver a wide range of processing services.
● Local server setup and capacity
Type: Professionals
Examples: Internal Litigation Support
● Department inside law firms responsible for conducting litigation support processing functions.
● Often work with service and software providers to meet internal demands.
Type: Software Providers
Examples: Ipro, Law
● Develop processing software that is licensed for resale by service providers or for use in internal litigation support departments
Lexbe v. Xerox
Compare Lexbe to Industry Leaders
○ Xerox is known for its high-volume litigation processing and production capacity.
○ Xerox states in its service literature that its production capacity is 5 million pages a day.
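The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this presentation (53 GB, 5.3 hours, a 23 million pages/day rate, and the 5 million pages/day capacity attributed to Xerox); the implied page count is derived here, not reported by the study, and a measured rate is not strictly comparable to a stated capacity.

```python
DATASET_GB = 53.0             # EDRM Enron Data Set size
RUN_HOURS = 5.3               # reported conversion time
LEXBE_PAGES_PER_DAY = 23e6    # reported daily rate
XEROX_PAGES_PER_DAY = 5e6     # capacity stated in Xerox literature, per the slide


def implied_pages(pages_per_day: float, hours: float) -> float:
    """Pages produced when running at pages_per_day for the given hours."""
    return pages_per_day * hours / 24.0


def hours_at_rate(pages: float, pages_per_day: float) -> float:
    """Hours needed to produce the given pages at pages_per_day."""
    return pages / pages_per_day * 24.0


pages = implied_pages(LEXBE_PAGES_PER_DAY, RUN_HOURS)
gb_per_hour = DATASET_GB / RUN_HOURS
rate_ratio = LEXBE_PAGES_PER_DAY / XEROX_PAGES_PER_DAY
hours_at_5m = hours_at_rate(pages, XEROX_PAGES_PER_DAY)

print(f"Implied pages in the run: {pages:,.0f}")
print(f"Throughput: {gb_per_hour:.1f} GB/hour")
print(f"Rate ratio: {rate_ratio:.1f}x")
print(f"Same pages at 5M pages/day: {hours_at_5m:.1f} hours")
```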
Summary: High-Speed eDiscovery Processing
○ TIFF is important and turnaround time is critical
○ Traditional approaches:
■ Fixed capacity leading to variable turnaround time
○ Lexbe approach:
■ Scalable capacity leading to fixed turnaround time
○ The Lexbe study demonstrates what we believe is the world's fastest TIFF processing, allowing you to meet even the toughest discovery deadlines.
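The contrast between fixed and scalable capacity comes down to which variable is held constant. In the sketch below, a fixed pool of servers gives a turnaround that grows with job size, while a scalable pool holds the deadline and varies the server count. The per-instance rate of 10,000 pages/hour is a made-up figure for illustration, and the model assumes throughput scales linearly with instances.

```python
import math


def turnaround_hours(pages: int, instances: int, pages_per_hour_each: float) -> float:
    """Fixed capacity: time depends on how large the job is."""
    return pages / (instances * pages_per_hour_each)


def instances_needed(pages: int, pages_per_hour_each: float,
                     deadline_hours: float) -> int:
    """Scalable capacity: instance count depends on how large the job is."""
    return math.ceil(pages / (pages_per_hour_each * deadline_hours))


RATE = 10_000.0  # hypothetical pages/hour per instance

for job_pages in (1_000_000, 5_000_000):
    fixed = turnaround_hours(job_pages, instances=10, pages_per_hour_each=RATE)
    scaled = instances_needed(job_pages, RATE, deadline_hours=5.0)
    print(f"{job_pages:,} pages: {fixed:.0f} h on 10 instances, "
          f"or {scaled} instances for a 5 h deadline")
```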
Related Lexbe Services: High-Speed eDiscovery Processing
○ ESI Culling
+ Reduce ESI stores to manageable sizes with DeNIST, deduplication, date culling and keyword culling. Metadata extraction and PST reconstitution are available as well.
○ ESI Email Collection
+ Flatten and extract native file attachments and metadata to create loadfiles in preparation for native or near native review.
○ Native Processing
+ Convert native documents, including Outlook email and Microsoft Office files, into TIFF or PDF format for searchability, Bates stamping, and preparation for online review.
○ eDiscovery OCR
+ Apply optical character recognition to increase searchability of PDFs, TIFFs, or document-formatted JPGs or PNGs.
○ NearDup Groupings
+ Identify key documents, group similar documents, ensure consistency in privilege coding, and enable email threading.
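Date and keyword culling, listed under ESI Culling above, can be sketched as a simple filter over document records. The record fields (`id`, `date`, `text`) and the sample documents are assumptions for illustration; a production tool would work from extracted metadata and a full-text index instead of in-memory strings.

```python
from datetime import date
from typing import Dict, List, Sequence


def cull(documents: List[Dict], start: date, end: date,
         keywords: Sequence[str]) -> List[Dict]:
    """Keep documents dated within [start, end] that contain any keyword."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for doc in documents:
        in_range = start <= doc["date"] <= end
        text = doc["text"].lower()
        if in_range and any(k in text for k in lowered):
            kept.append(doc)
    return kept


# Hypothetical records.
docs = [
    {"id": "D1", "date": date(2001, 3, 5), "text": "Quarterly trading forecast"},
    {"id": "D2", "date": date(2001, 6, 1), "text": "Lunch on Friday?"},
    {"id": "D3", "date": date(1999, 1, 9), "text": "Trading desk staffing"},
]
responsive = cull(docs, date(2001, 1, 1), date(2001, 12, 31), ["trading"])
print([d["id"] for d in responsive])
```

Substring matching is the simplest possible keyword test; it will also match partial words, so real culling usually tokenizes the text first.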
Thank You: Contact Info
Karsten Weber: [email protected]: (800) 401-7809
Stu Van Dusen [email protected] Marketing Manager: (512) 669-9485
Webinar Questions: [email protected]