L1-based compression of random forest model


DESCRIPTION

Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible. Link to the paper: http://orbi.ulg.ac.be/handle/2268/124834


L1-based compression of random forest model

Arnaud Joly, François Schnitzler, Pierre Geurts, Louis Wehenkel
a.joly@ulg.ac.be - www.ajoly.org

Department of EE and CS & GIGA-Research

High dimensional supervised learning applications

3D image segmentation: x = original image, y = segmented image
Genomics: x = DNA sequence, y = phenotype
Electrical grid: x = system state, y = stability

From 10^5 to 10^9 dimensions.

Tree based ensemble methods

From a dataset of input-output pairs {(x_i, y_i)}_{i=1}^n ⊂ X × Y, we approximate f : X → Y by learning an ensemble of M decision trees.

[Figure: an example decision tree — internal nodes test the input with Yes/No branches that lead to leaves Label 1, Label 2, Label 3; the ensemble contains M such trees.]

The estimator f̂ of f is obtained by averaging the predictions of the ensemble of trees.
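As a concrete illustration of this setup, here is a minimal sketch using scikit-learn's ExtraTreesRegressor (the Extra-Trees learner is available there under that name); the dataset and parameter values are illustrative, not the poster's experimental settings.

```python
# Minimal sketch: fit an ensemble of M extremely randomized trees and
# average the per-tree predictions (scikit-learn; illustrative data).
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import ExtraTreesRegressor

X, y = make_friedman1(n_samples=300, n_features=10, random_state=0)

M = 100
ensemble = ExtraTreesRegressor(n_estimators=M, max_features=X.shape[1],
                               min_samples_split=2, random_state=0)
ensemble.fit(X, y)

# The ensemble prediction is the average of the individual tree predictions.
per_tree = np.stack([tree.predict(X) for tree in ensemble.estimators_])
assert np.allclose(per_tree.mean(axis=0), ensemble.predict(X))
```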

High model complexity → Large memory requirement

The complexity of tree-based methods is measured by the number of internal nodes and increases with

- the ensemble size M;
- the number of samples n in the dataset.

The variance of individual trees increases with the dimension p of the original feature space → M(p) should increase with p to yield near-optimal accuracy.

Complexity grows as nM(p) → may require a huge amount of storage.

Memory limitations therefore become an issue in high-dimensional problems.

L1-based compression of random forest model (I)

We first learn an ensemble of M extremely randomized trees (Geurts et al., 2006) . . .

Example

[Figure: an example ensemble of M = 3 trees whose nodes are indexed (m, l) — nodes (1,1)–(1,7) for the first tree, (2,1)–(2,7) for the second and (3,1)–(3,5) for the third.]

. . . and associate to each node an indicator function 1_{m,l}(x), which is equal to 1 if the sample (x, y) reaches the l-th node of the m-th tree, and 0 otherwise.

L1-based compression of random forest model (II)

The node indicator functions 1_{m,l}(x) may be used to lift the input space X towards its induced feature space Z:

z(x) = (1_{1,1}(x), ..., 1_{1,N_1}(x), ..., 1_{M,1}(x), ..., 1_{M,N_M}(x)).

Example for one sample x_s

[Figure: for a sample x_s, the nodes of each of the three example trees that x_s traverses are highlighted; the corresponding entries of z(x_s) equal 1.]
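A minimal sketch of this lifting, assuming the ensemble and X from the previous sketch: scikit-learn's forests expose a decision_path method that returns exactly this binary node-indicator matrix, with one column per node of every tree.

```python
# Minimal sketch: lift the samples into the induced node-indicator space Z.
# decision_path returns, for each sample, a binary row over all nodes of all
# trees: 1 if the sample traverses that node, 0 otherwise (sparse CSR matrix).
Z, n_nodes_ptr = ensemble.decision_path(X)

q = Z.shape[1]                 # q = total number of nodes over the M trees
print(f"induced space Z has {q} binary features")

# z(x_s) for one sample x_s: the indices of the nodes it reaches.
s = 0
reached_nodes = Z[s].indices
```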

L1-based compression of random forest model (III)

A variable selection method (regularization with the L1-norm) is applied on the induced space Z to compress the tree ensemble using the solution of

(β*_j(t))_{j=0}^q = argmin_β  Σ_{i=1}^n ( y_i − β_0 − Σ_{j=1}^q β_j z_j(x_i) )²   s.t.   Σ_{j=1}^q |β_j| ≤ t.

Pruning: a test node is deleted if all its descendants (including the test node itself) correspond to β*_j(t*) = 0.
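A minimal sketch of this step, again assuming Z and y from the sketches above: the poster states the constrained (budget-t) form, while scikit-learn's Lasso solves the equivalent penalized form, so its alpha parameter plays the role of the budget t (a smaller alpha corresponds to a larger t); the value used here is purely illustrative.

```python
# Minimal sketch: L1-regularized least squares on the induced space Z.
# scikit-learn's Lasso solves the penalized (Lagrangian) form of the
# constrained problem stated above; alpha is illustrative, not tuned.
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=0.01, fit_intercept=True, max_iter=10000)
lasso.fit(Z, y)                 # Z: node-indicator features, y: targets
beta = lasso.coef_              # one coefficient beta_j per node indicator

# Pruning rule: a test node can be deleted once it and all of its
# descendants have beta_j = 0; as a first summary, count surviving nodes.
n_selected = int((beta != 0).sum())
print(f"{n_selected} of {beta.shape[0]} node indicators have non-zero weight")
```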

Overall assessment on 3 datasets

Dataset      Error (ET)   Error (rET)   Complexity (ET)   Complexity (rET)   Ratio ET/rET
Friedman1    0.19587      0.18593       29900             885                34
Two-norm     0.04177      0.06707       4878              540                9
SEFTi        0.86159      0.84131       39436             2055               19

ET: Extra-Trees; rET: L1-based compression of ET.

Table: Parameters of the Extra-Tree method: M = 100; K = p; nmin = 1 on Friedman1 and Two-norm, nmin = 10 on SEFTi.

An increase of t decreases the error of rET until t = 3, with drastic pruning

[Figure: (a) estimated risk and (b) complexity as a function of t — Friedman1: M = 100, K = p = 10, nmin = 1.]

Managing complexity in the Extra-Trees method

Bound M
Restrict the size M of the tree-based ensemble.

Pre-pruning
Pre-pruning reduces the complexity of tree-based methods by imposing a condition on node splitting (see the code sketch after this list), e.g.

- a minimum number of samples nmin in order to split,
- a minimum decrease of an impurity measure,
- . . .
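For concreteness, here is how these two controls map onto scikit-learn's Extra-Trees implementation; the parameter names are scikit-learn's and the values are illustrative.

```python
# Minimal sketch: the complexity controls above, as scikit-learn parameters.
from sklearn.ensemble import ExtraTreesRegressor

small_ensemble = ExtraTreesRegressor(
    n_estimators=50,            # bound M: fewer trees in the ensemble
    min_samples_split=10,       # pre-pruning: n_min samples required to split
    min_impurity_decrease=0.0,  # pre-pruning: minimum impurity decrease per split
)
```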

The accuracy and complexity of an rET model do not depend on nmin, for nmin small enough (nmin < 10).

[Figure: (c) estimated risk and (d) complexity as a function of nmin — Friedman1: M = 100, K = p = 10, t = t*_cv.]

After variance reduction has stabilized (M ≈ 100), further increasing M keeps enhancing the accuracy of the rET model without increasing complexity

[Figure: (e) estimated risk and (f) complexity as a function of M — Friedman1: nmin = 10, K = p = 10, t = t*_cv.]

Conclusion & perspectives

1. Drastic pruning while preserving accuracy.
2. Strong compressibility of the tree ensemble suggests that it could be possible to design novel algorithms suited to very high-dimensional input spaces.
3. Future research will target a similar compression ratio without using the complete set of node indicator functions of the forest model.


Appendix

Overall assessment on 3 datasets

Dataset      Error (ET)   Error (rET)   Error (Lasso)   Complexity (ET)   Complexity (rET)   Complexity (Lasso)
Friedman1    0.19587      0.18593       0.282441        29900             885                4
Two-norm     0.04177      0.06707       0.033500        4878              540                20
SEFTi        0.86159      0.84131       0.988031        39436             2055               14

ET: Extra-Trees; rET: L1-based compression of ET.

Table: Overall assessment (parameters of the Extra-Tree method: M = 100; K = p; nmin = 1 on Friedman1 and Two-norm, nmin = 10 on SEFTi).
