Open Source eDiscovery Presentation for "Women in eDiscovery" Houston, TX 12/15/2011




Open source eDiscovery

• Pre-history
• Present capabilities
• Foreseeable future
• Vision


Qualifications

• MS Math
• MS Computer Science
• Mensa; 10 languages
• Oil: patents, books, awards, software
• Projects...
• JD: eDiscovery
• eDiscovery 1
• eDiscovery 2
• Free Discovery


Following the People with Luck

Watch the people who made it


My first project: writing eDiscovery software for one computer, ending with 30


My second project: writing eDiscovery software for an unlimited cluster, ending with Big Data


Big Data! Enter Hadoop

 


Hadoop = Big Data

 


Big Data History

• 2004: Google reveals its big data technology (MapReduce)
• 2005: The technology becomes open source as Hadoop
• 2008: eDiscovery on the cluster
• 2009-2011: Big Data explosion


Writing a book

Hadoop in Practice for Manning


Getting invited

• YouTube
• Microsoft Bing
• Facebook
• Google
• Yahoo


So what is FreeEed?

• Applied knowledge gained from eDiscovery applications and competitor analysis
• Big Data
• Open source


Built for Big Data

Write the code once and run it on one computer or on thousands:

• One machine
• Many private computers (cluster)
• Many rented Amazon computers
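The "write once, run on one computer or on thousands" idea rests on the MapReduce programming model that Hadoop implements: the developer supplies a map function and a reduce function, and the framework decides how many computers run them. The sketch below is an illustration only, not FreeEed code. It runs a simple word count on a single machine, and the names `map_doc`, `reduce_counts`, and `run_local` are hypothetical. On a Hadoop cluster, the same map and reduce logic would be spread across many nodes, with Hadoop itself handling the grouping (shuffle) step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_doc(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (term, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(term: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: add up all the counts emitted for one term."""
    return term, sum(counts)


def run_local(docs: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical single-machine runner: map, shuffle, then reduce.
    A Hadoop cluster performs these same phases, spread over many nodes."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for term, count in map_doc(doc_id, text):
            grouped[term].append(count)  # shuffle: group values by key
    return dict(reduce_counts(term, counts) for term, counts in grouped.items())


if __name__ == "__main__":
    emails = {"email1": "merger deal signed", "email2": "deal delayed"}
    print(run_local(emails))
```

Because `map_doc` and `reduce_counts` never refer to machines, the work can be split among any number of computers without changing them. Only the runner differs.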


What is a cluster?

Many computers organized together


What is a Hadoop cluster?

• A group of computers ready to work together
• Hadoop allows them to share the workload
• Fault-tolerant: work continues when individual computers fail
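Fault tolerance here means that when one computer fails, its share of the work is handed to another computer instead of stopping the whole job. Hadoop reschedules failed tasks in this spirit. The following sketch is a simplified, hypothetical scheduler (`run_with_retries` and `fake_execute` are illustrative names, not Hadoop APIs). It assigns tasks round-robin and retries a failed task on the next worker.

```python
from typing import Callable, Dict, List


def run_with_retries(
    tasks: List[str],
    workers: List[str],
    execute: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Hypothetical scheduler: give tasks to workers round-robin and, when a
    worker fails a task, retry that task on the next worker in line."""
    results: Dict[str, str] = {}
    next_worker = 0
    for task in tasks:
        for _ in range(max_attempts):
            worker = workers[next_worker % len(workers)]
            next_worker += 1
            try:
                results[task] = execute(task, worker)
                break  # success: move on to the next task
            except RuntimeError:
                continue  # this worker failed: try the task on another one
        else:
            # every attempt failed, so give up on this task
            raise RuntimeError(f"task {task} failed {max_attempts} times")
    return results


def fake_execute(task: str, worker: str) -> str:
    """Stand-in for real work; node2 simulates a broken computer."""
    if worker == "node2":
        raise RuntimeError("node2 is down")
    return f"{task} done on {worker}"


if __name__ == "__main__":
    print(run_with_retries(["split-1", "split-2", "split-3"],
                           ["node1", "node2", "node3"], fake_execute))
```

In this run, `split-2` first lands on the broken `node2` and is then completed on `node3`, so the job still finishes.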


What is open source?

Many programmers working together


Open source for eDiscovery

• Low cost for the user
• Ideal for in-house implementation
• Better code quality
• Open collaboration
• Fast development using existing open source tools and applications


FreeEed present capabilities

 

• Text extraction
• Culling
• Flexible search syntax
• Scalability
• PDF imaging
• Runs on Windows, Mac, Linux, and Hadoop clusters
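Culling narrows a collection before review, typically by date range and search terms. The Python sketch below is a rough illustration only, not FreeEed's search engine or query syntax. It keeps documents whose date falls within a range and whose text contains at least one keyword. The `Doc` type and `cull` function are hypothetical, and plain substring matching stands in for a real query language.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class Doc:
    doc_id: str
    sent: date
    text: str


def cull(docs: Iterable[Doc], start: date, end: date,
         keywords: List[str]) -> List[Doc]:
    """Keep documents dated within [start, end] (inclusive) whose text
    contains at least one keyword, ignoring case."""
    terms = [k.lower() for k in keywords]
    kept: List[Doc] = []
    for doc in docs:
        if not (start <= doc.sent <= end):
            continue  # outside the relevant date range
        body = doc.text.lower()
        if any(term in body for term in terms):
            kept.append(doc)  # candidate for review
    return kept


if __name__ == "__main__":
    collection = [
        Doc("d1", date(2010, 1, 5), "Merger terms attached"),
        Doc("d2", date(2009, 6, 1), "merger draft"),
        Doc("d3", date(2010, 3, 1), "lunch plans"),
    ]
    for doc in cull(collection, date(2010, 1, 1), date(2010, 12, 31), ["merger"]):
        print(doc.doc_id)
```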


FreeEed processing stages

  

• Staging: maintaining the integrity of the data
• Processing: text, native files, exceptions, PDF
• Review: Concordance, or a future review platform
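One common way to maintain the integrity of staged data is to record a cryptographic hash of every file when it is staged and to re-check the hashes later. Any mismatch shows that a file has changed. The Python sketch below assumes SHA-256 and a simple in-memory manifest. The function names `file_digest`, `stage`, and `verify` are hypothetical, and the sketch does not describe how FreeEed itself stores hashes.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def stage(files: List[Path]) -> Dict[str, str]:
    """Build a manifest mapping each staged file path to its SHA-256 digest."""
    return {str(p): file_digest(p) for p in files}


def verify(manifest: Dict[str, str]) -> List[str]:
    """Return the paths whose current digest no longer matches the manifest."""
    return [p for p, digest in manifest.items() if file_digest(Path(p)) != digest]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memo = Path(tmp) / "memo.txt"
        memo.write_text("original contents")
        manifest = stage([memo])
        print(verify(manifest))  # nothing has changed yet
        memo.write_text("edited contents")
        print(verify(manifest))  # memo.txt is now reported as changed
```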


FreeEed screens

Project, Settings, History


FreeEed immediate future: the next 3 months

• Amazon cloud processing
• Multiple enhancements (imaging, deduping, OCR, etc.)
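Deduplication, one of the planned enhancements, is commonly done by hashing each document and keeping only the first document seen for each hash. The Python sketch below is a hypothetical illustration (`dedupe` is not a FreeEed function). It catches only byte-identical copies and records which kept document each duplicate corresponds to, so the duplicates can still be reported.

```python
import hashlib
from typing import Dict, List, Tuple


def dedupe(docs: List[Tuple[str, bytes]]) -> Tuple[List[str], Dict[str, str]]:
    """Return (ids of unique docs, mapping of duplicate id -> id of kept copy).
    The first document seen with a given hash is the one that is kept."""
    first_seen: Dict[str, str] = {}  # digest -> id of first doc with it
    unique: List[str] = []
    duplicates: Dict[str, str] = {}
    for doc_id, content in docs:
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            unique.append(doc_id)
    return unique, duplicates


if __name__ == "__main__":
    batch = [("d1", b"Quarterly report"), ("d2", b"Board minutes"),
             ("d3", b"Quarterly report")]
    print(dedupe(batch))  # (['d1', 'd2'], {'d3': 'd1'})
```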


Next organizational steps

• Support
• Development
• In-house EDD


Exciting future steps

• Enhanced capabilities based on cloud power
• iPad/Chrome tablet eDiscovery
• Big Data technology for review
• Text understanding: predictive coding, automated privilege review, clustering, email chains

Advanced FreeEed technology will be a powerful weapon in future legal battles.