Page 1: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

AIIM Conference 2014 Orlando, FL

April 2, 2014

Jason R. Baron, Esq. Information Governance and eDiscovery Group

Drinker Biddle & Reath LLP Washington, D.C. 20005

© Jason R. Baron 2014

Finding The Signal in the Noise: Bringing Predictive Analytics To the Information Governance Space

Page 2: All Needle, No Haystack: Bring Predictive Analysis to Information Governance
Page 3: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

(c) Jason R. Baron 2013

Page 4: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

We have entered the era where Big Data is ….

Page 5: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

The World Has Changed

§  We are not just managing thousands or millions of paper files

§  We are at an inflection point in history in terms of data volume

§  IDC Report: 1800 new exabytes this year (1 exabyte=data equivalent of 50,000 yrs of continuous movies)

§  Open data policies vs. “the iceberg”: a vast amount of information is “hidden” underneath the web. How is it to be reliably preserved and accessed?

Page 6: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Reality: The era of information inflation and Big Data in litigation has just begun….

Lehman Brothers Investigation

-  350 billion page universe (3 petabytes)

-  Examiner narrowed collection by selecting key custodians, using dozens of Boolean searches

-  Reviewed 5 million docs (40 million pages) using 70 contract attorneys

Source: Report of Anton R. Valukas, Examiner, In re Lehman Brothers Holdings Inc., et al., Chapter 11 Case No. 08-13555 (U.S. Bankruptcy Ct. S.D.N.Y. March 11, 2010), Vol. 7, Appx. 5, at

Page 7: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Information governance is needed in a world where . . .

-  80% of enterprise data is unstructured

-  60% of documents are obsolete

-  50% of documents are duplicate

-  80% of documents are not retrieved by traditional search

Page 8: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Do YOU understand the business challenge of the next 10 years?

This ebook from AIIM President John Mancini explains.

Page 9: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Traditional Document Review Processes


§  Labor intensive

§  Linear Review

§  Quality of manual coding for responsiveness open to question (see RAND Study, 2012)

Page 10: All Needle, No Haystack: Bring Predictive Analysis to Information Governance


Searching the Haystack….

Page 11: All Needle, No Haystack: Bring Predictive Analysis to Information Governance


to find relevant needles…

Page 12: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

False Positives Relevant Smoking Policy Emails


VP Chief of Staff Ron Klain

Office of the U.S. Trade Rep.

White House


Page 13: All Needle, No Haystack: Bring Predictive Analysis to Information Governance


Example of Boolean search string from U.S. v. Philip Morris

§  (((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
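To show why each AND NOT exclusion is there, here is a minimal Python sketch of a Boolean filter. It covers only three of the terms from the string above. The sample emails are invented and the helper names (`RULES`, `contains`, `matches`) are hypothetical. It is an illustration, not the search tool used in the case.

```python
import re

# Each rule is (term, exclusion phrases). A document satisfies the rule when it
# contains the term and none of the exclusions: "term AND NOT (exclusion ...)".
RULES = [
    ("msa", ["medical savings account", "metropolitan standard area"]),
    ("ets", ["educational testing service"]),
    ("marlboro", ["upper marlboro"]),
]

def contains(text, phrase):
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None

def matches(text):
    """True if any rule fires: term present and no exclusion phrase present."""
    return any(
        contains(text, term) and not any(contains(text, ex) for ex in exclusions)
        for term, exclusions in RULES
    )

# Invented sample emails, for illustration only.
docs = {
    "a": "Draft talking points on the MSA payments to the states",
    "b": "Your medical savings account (MSA) enrollment form",
    "c": "Traffic is backed up near Upper Marlboro this morning",
    "d": "Marlboro advertising exhibits for the hearing",
}

hits = sorted(doc_id for doc_id, text in docs.items() if matches(text))
print(hits)  # ['a', 'd']
```

The exclusions remove the savings-account and Upper Marlboro emails, but they would also drop an email that mentions both Upper Marlboro and Marlboro cigarettes. Every hand-written exclusion trades false positives for possible misses.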

Page 14: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Emerging New Strategies: “Predictive Analytics”

Improved review and case assessment: cluster docs through use of software, with minimal human intervention at the front end to code a “seeded” data set

Slide adapted from Gartner Conference, June 23, 2010, Washington, D.C.

Page 15: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Defining “predictive coding” or “TAR”

§  A process for prioritizing or coding a collection of electronic documents using a computerized system that harnesses human judgments of one or more subject matter experts on a smaller set of documents and then extrapolates those judgments to the remaining document population.

§  Also referred to as “supervised or active machine learning,” “computer-assisted review” or “technology-assisted review”

Source: Adapted from Grossman-Cormack Glossary of Technology Assisted Review, v. 1.0 (Oct 2012)
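The definition above can be sketched in a few lines of Python: experts code a small seed set, and the system extrapolates those judgments to uncoded documents. The glossary definition does not name an algorithm. This sketch assumes a simple naive Bayes word-count classifier as a stand-in, and the documents, labels, and function names are invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(seed_set):
    """Learn per-label word counts from expert-coded (text, label) pairs."""
    word_counts = {}        # label -> Counter of words
    doc_counts = Counter()  # label -> number of seed documents
    for text, label in seed_set:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in the seed set
                # Add-one smoothing keeps an unseen word from zeroing the score.
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented seed set, standing in for a subject matter expert's coding.
seed_set = [
    ("tobacco settlement payment schedule", "relevant"),
    ("cigarette advertising settlement terms", "relevant"),
    ("lunch menu for friday", "not relevant"),
    ("parking garage closed friday", "not relevant"),
]
model = train(seed_set)

# Extrapolate the expert judgments to documents nobody has reviewed.
uncoded = ["settlement terms for tobacco advertising", "friday lunch in the garage"]
predictions = [predict(model, text) for text in uncoded]
print(predictions)  # ['relevant', 'not relevant']
```

In practice the scores would also be used to rank documents for review, which is the "prioritizing" half of the definition.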

Page 16: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Judicial endorsement of predictive analytics in document review by Judge Peck in da Silva Moore v. Publicis Groupe (SDNY Feb. 24, 2012)

“This opinion appears to be the first in which a Court has approved of the use of computer-assisted review. . . . What the Bar should take away from this Opinion is that computer-assisted review is an available tool and should be seriously considered for use in large-data-volume cases where it may save the producing party (or both parties) significant amounts of legal fees in document review. Counsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review . . . Computer-assisted review can now be considered judicially-approved for use in appropriate cases.”

Page 17: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

The da Silva Moore Protocol

•  Supervised learning

•  Random sampling

•  Establishment of seed set

•  Issue tags

•  Iteration

•  Random sampling of docs deemed irrelevant
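The last step of the protocol, random sampling of docs deemed irrelevant, can be sketched as follows. This is an assumption-laden illustration, not the protocol's actual procedure. The function name, the sample size, and the synthetic document set are invented, and `is_relevant` stands in for a human reviewer. The estimated fraction is what the TAR literature calls the elusion rate.

```python
import random

def estimate_elusion(discard_pile, is_relevant, sample_size, seed=0):
    """Estimate the share of relevant documents left in the set deemed irrelevant.

    discard_pile: document ids the system coded as irrelevant
    is_relevant:  callable standing in for a human reviewer's judgment
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(discard_pile, sample_size)
    missed = sum(1 for doc_id in sample if is_relevant(doc_id))
    return missed / sample_size

# Synthetic example: 10,000 discarded documents, of which every 50th
# (2 percent) is actually relevant. A real review would not know this.
discard_pile = list(range(10_000))
reviewer = lambda doc_id: doc_id % 50 == 0

rate = estimate_elusion(discard_pile, reviewer, sample_size=500)
projected_missed = rate * len(discard_pile)
print(f"estimated elusion rate: {rate:.3f}, projected missed docs: {projected_missed:.0f}")
```

A low estimate supports stopping the iteration. A high one suggests another training round is needed before the discard pile can be set aside.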

Page 18: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

The demise of RM….

●  John Mancini, President of AIIM:

“If by traditional records management you mean manual systems – even if they are computerized – then I would say traditional records management is dead. The idea that we could get busy people to care about our complicated retention schedules, and drag and drop documents into folders, and manually apply metadata document by document according to an elaborate taxonomy will soon seem as ridiculous as asking a blacksmith to work on a Ferrari.”

Page 19: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Process Optimization Problem: The transactional toll of user-based recordkeeping schemes (“as is” RM)

Page 20: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

…. and the need for better, automated solutions ….

Page 21: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Email is still the 800 lb. gorilla of ediscovery

Page 22: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Archivist/OMB Directive

●  M-12-18, Managing Government Records Directive, dated 8/24/12:

1.1 By 2019, Federal agencies will manage all permanent records in an electronic format.

1.2 By 2016, Federal agencies will manage both permanent and temporary email records in an accessible electronic format.

Page 23: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

NARA Moved to the Cloud for Email with Embedded RM/Autocategorization

Page 24: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Capstone Officials

Capstone officials may include:

●  Officials at or near the top of an agency or an organizational subcomponent

●  Key staff members that may be in positions that create or receive presumptively permanent email records

Capstone accounts

Other accounts

Key staff accounts

Other accounts
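The role-based idea on this slide can be sketched as a simple lookup: the retention category follows from the account holder's position, not from a message-by-message decision. The role lists, addresses, and function name below are hypothetical, and the slide does not say how "other" accounts are handled, so the sketch leaves them uncategorized.

```python
from dataclasses import dataclass

# Hypothetical role labels; an agency would define its own lists.
CAPSTONE_ROLES = {"agency head", "deputy agency head", "component head"}
KEY_STAFF_ROLES = {"chief of staff", "special assistant"}

@dataclass
class Account:
    address: str
    role: str

def retention_category(account):
    """Assign a presumptive retention category from the account holder's role."""
    role = account.role.lower()
    if role in CAPSTONE_ROLES or role in KEY_STAFF_ROLES:
        return "presumptively permanent"
    return "other"  # handling of other accounts is not specified on the slide

accounts = [
    Account("head@agency.example", "Agency Head"),
    Account("cos@agency.example", "Chief of Staff"),
    Account("analyst@agency.example", "Program Analyst"),
]
categories = {a.address: retention_category(a) for a in accounts}
print(categories)
```

The design choice is that no end user has to file or tag individual messages; the category is fixed once, at the account level.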

Page 25: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

How To Avoid A Train Wreck With Email Archiving….

Capture E-mail But Utilize Records Management!

Page 26: All Needle, No Haystack: Bring Predictive Analysis to Information Governance


Can advanced analytics techniques and technologies, including Auto-Categorization, Auto-redaction, Auto-indexing, Auto-translation, etc., be applied and leveraged by Records Managers/Information Governance types? Yes, but ….

Information Governance / Records Analytics

Page 27: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Homage to Carl Linnaeus (1707-1778)

Page 28: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Linnaean classification of the animal kingdom

§  Kingdom: Animalia

§  Phylum: Chordata

§  Subphylum: Vertebrata

§  Superclass: Tetrapoda

§  Class: Mammalia

§  Subclass: Theria

§  Infraclass: Eutheria

§  Cohort: Unguiculata

§  Order: Primata

§  Suborder: Anthropoidea

§  Superfamily: Hominoidea

§  Family: Hominidae

§  Subfamily: Homininae

§  Genus: Homo

§  Subgenus: Homo (Homo)

§  Specific epithet: sapiens

Page 29: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Which category?

Page 30: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

The Coming Age of Dark Archives (and the inability to provide access unless we have smart ways of extracting signal from noise)

Page 31: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

We should be leveraging the power of predictive analytics to improve information governance . . .

-- RM: defensible disposal of low value information

-- Regulatory compliance

-- Risk mitigation – segregating sensitive materials (PII, proprietary, etc.)

-- Business intelligence

-- E-discovery

-- Collaboration across enterprise

-- Providing access to dark data & archives

Page 32: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

IG & Analytics: True Life Stories “Ripped from the Headlines”

§  The Case of the Wayward Would-Be Whistleblower

§  The Case of the Mistakenly Valued Merger & Acquisition

Page 33: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

What is the IGI?

The IGI is a cross-disciplinary think tank and consortium dedicated to advancing the adoption of Information Governance practices and technologies through research, publishing, advocacy, and peer-to-peer networking.

It provides industry thought leadership and benchmarking designed to foster consensus and conversation.

It is a connector among the stakeholders of information governance.

It is a promoter of industry best practices and standards.

Page 34: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

“The future is here. It is just not evenly distributed.”

--William Gibson

Page 35: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

References

Sources Referencing Information Governance, Autocategorization & Predictive Coding

B. Borden & J.R. Baron, “Finding the Signal in the Noise: Information Governance, Analytics, and The Future of the Law,” 20 Richmond J. Law & Technology 7 (2014),

J.R. Baron, “Law in the Age of Exabytes: Some Further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search,” 17 Richmond J. Law & Technology (2011), see

N. Pace, “Where The Money Goes: Understanding Litigant Expenditures for Producing E-Discovery,” RAND Publication (2012), see

TREC Legal Track Home Page, (includes bibliography for further reading)

The Sedona Conference®, The Sedona Conference Commentary on Information Governance (2013)

Latest “Supervised Learning/Predictive Coding” Case Law:

•  Da Silva Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012), approved and adopted in Da Silva Moore v. Publicis Groupe, 2012 WL 1446534, at *2 (S.D.N.Y. Apr. 26, 2012)

•  EORHB v. HOA Holdings, Civ. No. 7409-VCL (Del. Ch. Oct. 15, 2012)

•  Global Aerospace Inc., et al. v. Landow Aviation, L.P., et al., 2012 WL 1431215 (Va. Cir. Ct. Apr. 23, 2012)

•  In re Actos (Pioglitazone) Products, 2012 WL 3899669 (W.D. La. July 27, 2012)

•  Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711 (N.D. Ill.) (Nolan, M.J.)

•  In re Biomet M2a Magnum Hip Implant Products Liability Litigation, 3:12-MD-2391 (S.D. Ind.) (April 18,


Page 37: All Needle, No Haystack: Bring Predictive Analysis to Information Governance

Jason R. Baron
Of Counsel
Drinker Biddle & Reath LLP
1500 K Street, N.W.
Washington, D.C. 20005

(202) 230-5196
Email: [email protected]

