NLP in Scala with Breeze and Epic
David Hall, UC Berkeley


ScalaNLP Ecosystem

• Breeze ≈ Numpy/Scipy: linear algebra, scientific computing, optimization
• Epic ≈ PyStruct/NLTK: natural language processing, structured prediction
• Puck: a super-fast GPU parser for English

Natural Language Processing

Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap. It certainly won't get there on looks.

[Figure: constituency parse tree of the first sentence, with S, NP, VP, V, and PP nodes]

Epic for NLP

[Figure: the same example sentence and parse tree as on the previous slide]

Named Entity Recognition

Person, Organization, Location, Misc

NER with Epic

> import epic.models.NerSelector
> val nerModel = NerSelector.loadNer("en").get
> val tokens = epic.preprocess.tokenize("Almost 20 years ago, Bill Watterson walked away from \"Calvin & Hobbes.\"")
> println(nerModel.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [LOC:Calvin & Hobbes] . ''

Not a location!

Annotate a bunch of data?

Building an NER system

> val data: IndexedSeq[Segmentation[Label, String]] = ???
> val system = SemiCRF.buildSimple(data, startLabel, outsideLabel)
> println(system.bestSequence(tokens).render("O"))
Almost 20 years ago , [PER:Bill Watterson] walked away from `` [MISC:Calvin & Hobbes] . ''

Gazetteers

http://en.wikipedia.org/wiki/List_of_newspaper_comic_strips_A%E2%80%93F


Using your own gazetteer

> val data: IndexedSeq[Segmentation[Label, String]] = ???
> val myGazetteer = ???
> val system = SemiCRF.buildSimple(data, startLabel, outsideLabel, gaz = myGazetteer)

Gazetteers
• Careful with gazetteers!
• If it's built from the training data, the system will use it, and only it, to make predictions!
• So only known forms will be detected.
• Still, gazetteers can be very useful…
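As a mental model, a gazetteer is just a collection of known surface forms with labels. A minimal sketch in plain Scala (the phrase list and inGazetteer helper are illustrative assumptions, not Epic's actual Gazetteer API):

// hypothetical illustration: a gazetteer as a map from known phrases to labels
val comicStrips: Map[Seq[String], String] = Map(
  Seq("Calvin", "&", "Hobbes") -> "MISC",
  Seq("Doonesbury")            -> "MISC"
)

// an "in-gazetteer" feature fires when a candidate span matches an entry
def inGazetteer(span: Seq[String]): Boolean = comicStrips.contains(span)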

Semi-CR-What?
• Semi-Markov Conditional Random Field
• Don't worry about the name.

Semi-CRFs

score(Chez Panisse)
  + score(Berkeley, CA)
  + score(- A bowl of )
  + score(Churchill-Brenneis Orchards)
  + score(Page mandarins and medjool dates)
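The point of the decomposition above: a semi-CRF scores a whole segmentation as a sum of independent per-segment scores. A toy sketch of that structure (the Segment type and scoreSegment stub are hypothetical stand-ins, not Epic's types):

case class Segment(label: String, begin: Int, end: Int)

// score one labeled span of the sentence
def scoreSegment(words: IndexedSeq[String], seg: Segment): Double = ???

// the score of a segmentation is the sum over its segments
def scoreSegmentation(words: IndexedSeq[String], segs: Seq[Segment]): Double =
  segs.map(scoreSegment(words, _)).sum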

Features

score(Chez Panisse)
  = w(starts-with-Chez)
  + w(starts-with-C…)
  + w(ends-with-P…)
  + w(starts-sentence)
  + w(shape:Xxx Xxx)
  + w(two-words)
  + w(in-gazetteer)
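And each segment score is itself a sum: the weights of the segment's active features. A minimal sketch (the weight map and features extractor are illustrative assumptions):

// learned weights, keyed by feature name
val w: Map[String, Double] = ???

// e.g. Seq("starts-with-Chez", "shape:Xxx Xxx", "two-words", ...)
def features(span: Seq[String]): Seq[String] = ???

def score(span: Seq[String]): Double =
  features(span).map(f => w.getOrElse(f, 0.0)).sum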

Building your own features

val dsl = new WordFeaturizer.DSL[L](counts) with SurfaceFeaturizer.DSL
import dsl._

word(begin)              // word at the beginning of the span
  + word(end - 1)        // end of the span
  + word(begin - 1)      // before (gets things like Mr.)
  + word(end)            // one past the end
  + prefixes(begin)      // prefixes up to some length
  + suffixes(begin)
  + length(begin, end)   // names tend to be 1-3 words
  + gazetteer(begin, end)

Using your own featurizer

> val data: IndexedSeq[Segmentation[Label, String]] = ???
> val myFeaturizer = ???
> val system = SemiCRF.buildSimple(data, startLabel, outsideLabel, featurizer = myFeaturizer)

Features
• So far, we've been able to do everything with (nearly) no math.
• To understand more, we need to do some math.

Machine Learning Primer
• Training example (x, y)
  – x: a sentence of some sort
  – y: its labeled version
• Goal: want score(x, y) > score(x, y′) for all y′ ≠ y.

Machine Learning Primer

score(x, y) = wᵀ f(x, y)

Machine Learning Primer

score(x, y) = w.t * f(x, y)

Machine Learning Primer

score(x, y) = w dot f(x, y)

Machine Learning Primer

score(x, y) >= score(x, y’)

Machine Learning Primer

w dot f(x, y) >= w dot f(x, y’)


Machine Learning Primer

[Figure: geometric view of the perceptron update — the weight vector w is moved toward f(x, y) and away from f(x, y′), so that:]

(w + f(x,y)) dot f(x,y) >= w dot f(x,y)

The Perceptron

> val featureIndex: Index[Feature] = ???
> val labelIndex: Index[Label] = ???
> val weights = DenseVector.rand[Double](featureIndex.size)

> for ( epoch <- 0 until numEpochs; (x, y) <- data ) {
    val labelScores = DV.tabulate(labelIndex.size) { yy =>
      val indexed = featuresFor(x, yy)
      weights.t * new FeatureVector(indexed)
      // or: weights dot new FeatureVector(indexed)
    }
    …
  }

The Perceptron (cont'd)

> val featureIndex: Index[Feature] = ???
> val labelIndex: Index[Label] = ???
> val weights = DenseVector.rand[Double](featureIndex.size)

> for ( epoch <- 0 until numEpochs; (x, y) <- data ) {
    val labelScores = ...
    val y_best = argmax(labelScores)
    if (y != y_best) {
      weights += new FeatureVector(featuresFor(x, y))
      weights -= new FeatureVector(featuresFor(x, y_best))
    }
  }

Structured Perceptron
• Can't enumerate all segmentations! (≈ L · 2ⁿ of them)
• But dynamic programs exist to max or sum…
• …if the feature function has a nice form (see the Viterbi sketch after the next slide).

Structured Perceptron

> val featureIndex: Index[Feature] = ???
> val labelIndex: Index[Label] = ???
> val weights = DenseVector.rand[Double](featureIndex.size)

> for ( epoch <- 0 until numEpochs; (x, y) <- data ) {
    val y_best = bestStructure(weights, x)
    if (y != y_best) {
      weights += featuresFor(x, y)
      weights -= featuresFor(x, y_best)
    }
  }
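What hides inside bestStructure: when the features decompose over adjacent labels, the argmax over the exponentially many label sequences is the Viterbi dynamic program, O(n · L²) rather than Lⁿ. A generic chain sketch (the score callback is an assumed interface; Epic's actual inference is the semi-Markov variant over spans):

// score(i, prevLabel, label): local score of `label` at position i
// (prevLabel == -1 at the first position)
def viterbi(n: Int, numLabels: Int,
            score: (Int, Int, Int) => Double): IndexedSeq[Int] = {
  val chart = Array.fill(n, numLabels)(Double.NegativeInfinity)
  val back  = Array.fill(n, numLabels)(0)
  for (y <- 0 until numLabels) chart(0)(y) = score(0, -1, y)
  for (i <- 1 until n; y <- 0 until numLabels; yp <- 0 until numLabels) {
    val s = chart(i - 1)(yp) + score(i, yp, y)
    if (s > chart(i)(y)) { chart(i)(y) = s; back(i)(y) = yp }
  }
  // recover the best sequence by following backpointers from the best end label
  var best = (0 until numLabels).maxBy(y => chart(n - 1)(y))
  val labels = Array.fill(n)(0)
  labels(n - 1) = best
  for (i <- n - 1 until 0 by -1) { best = back(i)(best); labels(i - 1) = best }
  labels.toIndexedSeq
}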

Constituency Parsing

Some fruit visionaries say the Fuji could someday tumble the Red Delicious from the top of America's apple heap.

[Figure: the constituency parse tree for this sentence, as shown earlier]

Multilingual Parser

[Bar chart: parsing accuracy (70–95) for the "Berkeley" parser vs. Epic across Avg, Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish, and Swedish]

"Berkeley": [Petrov & Klein, 2007]; Epic: [Hall, Durrett, and Klein, 2014]

Epic Pre-built Models
• Parsing
  – English, Basque, French, German, Swedish, Polish, Korean
  – (working on Arabic, Chinese, Spanish)
• Part-of-Speech Tagging
  – English, Basque, French, German, Swedish, Polish
• Named Entity Recognition
  – English
• Sentence segmentation
  – English
  – (OK support for others)
• Tokenization
  – all of the above languages

What Is Breeze?
• Dense vectors, matrices, sparse vectors, counters, matrix decompositions
• Nonlinear optimization, probability distributions

Getting started

libraryDependencies ++= Seq(
  "org.scalanlp" %% "breeze" % "0.8.1",
  // optional: native linear algebra libraries; bulky, but faster
  "org.scalanlp" %% "breeze-natives" % "0.8.1"
)

scalaVersion := "2.11.1"

Linear Algebra

> import breeze.linalg._
> val x = DenseVector.zeros[Int](5)
// DenseVector(0, 0, 0, 0, 0)

> val m = DenseMatrix.zeros[Int](5,5)
> val r = DenseMatrix.rand(5,5)

> m.t    // transpose
> x + x  // addition
> m * x  // multiplication by vector
> m * 3  // by scalar
> m * m  // by matrix
> m :* m // element-wise multiplication, Matlab .*

Return Type Selection

> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)

> dv + sv
// res: DenseVector[Double] = DenseVector(1.0, 2.0)

> sv + sv
// res: SparseVector[Double] = SparseVector()

Return Type Selection

> import breeze.linalg.{DenseVector => DV}
> val dv = DV(1.0, 2.0)
> val sv = SparseVector.zeros[Double](2)

> (dv: Vector[Double]) + (sv: Vector[Double])
// res: Vector[Double] = DenseVector(1.0, 2.0)
// static type: Vector; dynamic type: Dense

> (sv: Vector[Double]) + (sv: Vector[Double])
// res: Vector[Double] = SparseVector()
// static type: Vector; dynamic type: Sparse

Linear Algebra: Slices

> val m = DenseMatrix.zeros[Int](5, 5)

> m(::, 1) // slice a column
// DenseVector(0, 0, 0, 0, 0)

> m(4, ::) // slice a row

> m(4, ::) := DV(1, 2, 3, 4, 5).t

> m.toString
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 2 3 4 5

Linear Algebra: Slices

> m(0 to 1, 3 to 4).toString
0 0
2 3

> m(IndexedSeq(3, 1, 4, 2), IndexedSeq(4, 4, 3, 1)).toString
0 0 0 0
0 0 0 0
5 5 4 2
0 0 0 0

Universal Functions

> import breeze.numerics._

> log(DenseVector(1.0, 2.0, 3.0, 4.0))
// DenseVector(0.0, 0.6931471805599453,
//             1.0986122886681098, 1.3862943611198906)

> exp(DenseMatrix((1.0, 2.0), (3.0, 4.0)))

> sin(Array(2.0, 3.0, 4.0, 42.0))

> // also cos, sqrt, asin, floor, round, digamma, trigamma, ...

UFuncs: In-Place

> import breeze.numerics._
> val v = DV(1.0, 2.0, 3.0, 4.0)

> log.inPlace(v)
// v == DenseVector(0.0, 0.6931471805599453,
//                  1.0986122886681098, 1.3862943611198906)

UFuncs: Reductions

> sum(DenseVector(1.0, 2.0, 3.0))
// 6.0
> sum(DenseVector(1, 2, 3))
// 6
> mean(DenseVector(1.0, 2.0, 3.0))
// 2.0
> mean(DenseMatrix((1.0, 2.0, 3.0),
                   (4.0, 5.0, 6.0)))
// 3.5

UFuncs: Reductions

> val m = DenseMatrix((1.0, 3.0), (4.0, 4.0))

> sum(m)  == 12.0
> mean(m) == 3.0

> sum(m(*, ::))  == DenseVector(4.0, 8.0)   // row sums
> sum(m(::, *))  == DenseMatrix((5.0, 7.0)) // column sums

> mean(m(*, ::)) == DenseVector(2.0, 4.0)
> mean(m(::, *)) == DenseMatrix((2.5, 3.5))

Broadcasting

> val dm = DenseMatrix((1.0, 2.0, 3.0),
                       (4.0, 5.0, 6.0))

> sum(dm(::, *)) == DenseMatrix((5.0, 7.0, 9.0))

> dm(::, *) + DenseVector(3.0, 4.0)
// DenseMatrix((4.0, 5.0, 6.0), (8.0, 9.0, 10.0))

> dm(::, *) := DenseVector(3.0, 4.0)
// DenseMatrix((3.0, 3.0, 3.0), (4.0, 4.0, 4.0))

UFuncs: Implementation

> object log extends UFunc

> implicit object logDouble extends log.Impl[Double, Double] {
    def apply(x: Double) = scala.math.log(x)
  }

> log(1.0) // == 0.0

> // elsewhere: magic to automatically make this work:
> log(DenseVector(1.0))  // == DenseVector(0.0)
> log(DenseMatrix(1.0))  // == DenseMatrix(0.0)
> log(SparseVector(1.0)) // == SparseVector()

UFuncs: Implementation

> object add1 extends UFunc

> implicit object add1Double extends add1.Impl[Double, Double] {
    def apply(x: Double) = x + 1.0
  }

> add1(1.0) // == 2.0

> // elsewhere: magic to automatically make this work:
> add1(DenseVector(1.0))  // == DenseVector(2.0)
> add1(DenseMatrix(1.0))  // == DenseMatrix(2.0)
> add1(SparseVector(1.0)) // == SparseVector(2.0)
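If that "magic" weren't in scope, you could register a collection-level implementation by hand. A sketch following the same Impl pattern as above (the elementwise body is our assumption; DenseVector's map is real Breeze API):

> implicit object add1DV extends add1.Impl[DenseVector[Double], DenseVector[Double]] {
    // apply the scalar rule elementwise
    def apply(v: DenseVector[Double]) = v.map(_ + 1.0)
  }

> add1(DenseVector(1.0, 2.0)) // == DenseVector(2.0, 3.0)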

Breeze Benchmarks

[Bar chart: Singular Value Decomposition, lower is better —
 Breeze: 135.04, jblas: 685.5, Mahout: 2626, Apache Commons: 2151]

Breeze Benchmarks

[Bar chart: Sparse/Dense Multiply, lower is better —
 Breeze: 26, Mahout: 2562, Apache Commons: 56, jblas: N/A]

Breeze Benchmarks

[Bar chart: Sparse/Dense Addition, lower is better —
 Breeze: 0.037, Mahout: 0.075, Apache Commons: 25.376, jblas: N/A]

Nonnegative Matrix Factorization

V ≈ W H, with Vij, Wij, Hij ≥ 0

Nonnegative Matrix Factorization
• Input: matrix V
• Output: W and H with W * H ≈ V, all entries ≥ 0

[Lee and Seung, 2001]
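The code on the next slide implements Lee and Seung's multiplicative updates. Writing :* and :/ for elementwise multiply and divide (as in the Breeze code below), the updates are:

H ← H :* (Wᵀ V) :/ (Wᵀ W H)
W ← W :* (V Hᵀ) :/ (W H Hᵀ)

Each update keeps the factors nonnegative and never increases the reconstruction error.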

Breeze NMF

val n = V.rows
val m = V.cols
val r = dims
val W = DenseMatrix.rand(n, r)
val H = DenseMatrix.rand(r, m)

for (i <- 0 until iters) {
  H :*= ((W.t * V) :/= (W.t * W * H))
  W :*= ((V * H.t) :/= (W * H * H.t))
}

(W, H)

From Breeze to Gust


Breeze NMF

val n = X.rows
val m = X.cols
val r = dims
val W = DenseMatrix.rand(n, r)
val H = DenseMatrix.rand(r, m)

for (i <- 0 until iters) {
  H :*= ((W.t * X) :/= (W.t * W * H))
  W :*= ((X * H.t) :/= (W * H * H.t))
}

(W, H)

Gust NMF

// identical to the Breeze version; only DenseMatrix became CuMatrix
val n = X.rows
val m = X.cols
val r = dims
val W = CuMatrix.rand(n, r)
val H = CuMatrix.rand(r, m)

for (i <- 0 until iters) {
  H :*= ((W.t * X) :/= (W.t * W * H))
  W :*= ((X * H.t) :/= (W * H * H.t))
}

(W, H)

≥6x speedup on my laptop!

Optimization

(x² + y − 11)² + (x + y² − 7)²

Optimization

trait DiffFunction[T] extends (T => Double) {
  /** Calculates both the value and the gradient at a point */
  def calculate(x: T): (Double, T)
}

Optimization

val df = new DiffFunction[DV[Double]] {
  def calculate(values: DV[Double]) = {
    val gradient = DV.zeros[Double](2)
    val (x, y) = (values(0), values(1))
    val value = pow(x * x + y - 11, 2) + pow(x + y * y - 7, 2)
    gradient(0) = 4 * x * (x * x + y - 11) + 2 * (x + y * y - 7)
    gradient(1) = 2 * (x * x + y - 11) + 4 * y * (x + y * y - 7)
    (value, gradient)
  }
}

Optimization

breeze.optimize.minimize(df, DV(0.0, 0.0))
// DenseVector(3.0000000000012905, 1.999999999997128)
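That objective is Himmelblau's function; (3, 2) is one of its four global minima, so the optimizer has converged to a correct answer.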

Optimization
• Fully configurable
• Support for:
  – LBFGS (+ L2/L1 regularization)
  – Stochastic gradient (+ L2/L1 regularization)
  – Truncated Newton
  – Linear programs
  – Conjugate gradient descent

Thanks!

Breeze Epic Puck

www.github.com/dlwh/{breeze, epic, puck}
