45
Estimating Querying Decoding Language Model Algorithms Kenneth Heafield Carnegie Mellon, University of Edinburgh March 7, 2013 Kenneth Heafield Language Model Algorithms

Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Language Model Algorithms

Kenneth Heafield

Carnegie Mellon, University of Edinburgh

March 7, 2013

Kenneth Heafield Language Model Algorithms

Page 2: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

MT is Expensive

I “Since decoding is very time-intensive” [Jehl et al, 2012]

I “Based on the amount of memory we can afford” [Wuebker et al, 2012]

Much of the computational cost is due to the language model.

Kenneth Heafield Language Model Algorithms

Page 3: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

MT is Expensive

I “Since decoding is very time-intensive” [Jehl et al, 2012]

I “Based on the amount of memory we can afford” [Wuebker et al, 2012]

Much of the computational cost is due to the language model.

Kenneth Heafield Language Model Algorithms

Page 4: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Desiderata

I More data

I Less CPU time

I Less RAMI High Quality

I Search AccuracyI BLEU

Kenneth Heafield Language Model Algorithms

Page 5: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Language Model Algorithms

Estimating Text → ARPA file with probabilities.

Querying ARPA file → efficient data structure.

Decoding Searching for high-scoring translations.

Kenneth Heafield Language Model Algorithms

Page 6: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Estimating: The Problem

SRILM uses too much memory =⇒ limit on data size.Also, annoying to compile.

IRSTLM does not estimate modified Kneser-Ney models.Also, segfaults.

Kenneth Heafield Language Model Algorithms

Page 7: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Estimating: The Problem

SRILM uses too much memory =⇒ limit on data size.Also, annoying to compile.

IRSTLM does not estimate modified Kneser-Ney models.Also, segfaults.

Kenneth Heafield Language Model Algorithms

Page 8: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Stupid Backoff [Brants et al, 2007]

1. Count n-grams offline

count(wn1 )

2. Compute pseudo-probabilities at runtime

s(wn|wn−11 ) =

{ count(wn1 )

count(wn−11 )

if count(wn) > 0

0.4s(wn|wn−12 ) if count(wn) = 0

NB: s does not sum to 1.

Kenneth Heafield Language Model Algorithms

Page 9: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Stupid Backoff [Brants et al, 2007]

1. Count n-grams offline

count(wn1 )

2. Compute pseudo-probabilities at runtime

s(wn|wn−11 ) =

{ count(wn1 )

count(wn−11 )

if count(wn) > 0

0.4s(wn|wn−12 ) if count(wn) = 0

NB: s does not sum to 1.

Kenneth Heafield Language Model Algorithms

Page 10: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Counting n-grams

<s> Australia is one of the few

5-gram Count<s> Australia is one of 1Australia is one of the 1is one of the few 1

Hash table?

Runs out of RAM.

Kenneth Heafield Language Model Algorithms

Page 11: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Counting n-grams

<s> Australia is one of the few

5-gram Count<s> Australia is one of 1Australia is one of the 1is one of the few 1

Hash table?Runs out of RAM.

Kenneth Heafield Language Model Algorithms

Page 12: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Spill to Disk When RAM Runs Out

Text

Hash Table

Hash Table Hash Table

Sort Sort

File File

File

Merge Sort

Kenneth Heafield Language Model Algorithms

Page 13: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Split Data

Text

Hash Table Hash Table

Sort Sort

File File

File

Merge Sort

Kenneth Heafield Language Model Algorithms

Page 14: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Split and Merge

Text

Hash Table Hash Table

Sort Sort

File File

File

Merge Sort

Kenneth Heafield Language Model Algorithms

Page 15: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Fanout: Merging More Than 2 Files

logfanout |data| passes, each of which reads and writes |data|.

Small fanout =⇒ more passes.Large fanout =⇒ more disk seeks.

Compromise: 64 MB buffer per file; fanout = (RAM/64MB)− 1.

Kenneth Heafield Language Model Algorithms

Page 16: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Optimizing Merge Sort

I Laziness: do the last merge as the file is being read

I Sharding: partition the data by key

I Combine counts during merge passes

Kenneth Heafield Language Model Algorithms

Page 17: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Now we can estimate Stupid Backoff models.

What about modified Kneser-Ney?

Kenneth Heafield Language Model Algorithms

Page 18: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Modified Kneser-NeyText

Extract 5-Grams

Adjust Counts

Normalize

Interpolate

ARPA file

Sort in Suffix Order

Sort in Context Order

Sort in Suffix Order

Kenneth Heafield Language Model Algorithms

Page 19: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Sorting Orders

Suffix Order5 4 3 2 1A A A A AC A A A AA Y A B AA Y Z B AA A A A YC A B A Z

Context Order4 3 2 1 5A A A A AA A A A YC A A A AC A B A ZA Y A B AA Y Z B A

Each step is a streaming algorithm in suffix or context order.

Kenneth Heafield Language Model Algorithms

Page 20: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Adjust Counts

adjusted(wn1 ) =

{count(wn

1 ) if n = 5 or w1 = <s>

|v : count(vwn1 ) > 0| otherwise

Suffix order makes vwn1 consecutive.

Kenneth Heafield Language Model Algorithms

Page 21: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Normalize

normalized(wn1 ) =

adjusted(wn1 )− discount(adjusted(wn

1 ))∑v adjusted(wn−1

1 v)

Context order makes wn−11 v consecutive.

The denominator is computed by a thread that reads ahead.

Kenneth Heafield Language Model Algorithms

Page 22: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Interpolate

p(wn|wn−11 ) = normalized(wn

1 ) + backoff(wn−11 )p(wn|wn−1

2 )

Suffix order means p(wn|wn−12 ) and p(wn|wn−1

1 ) are close.

Kenneth Heafield Language Model Algorithms

Page 23: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

SortingModified Kneser-Ney

Estimation Results

lmplz 132 billion tokens, 4.3 days, one machine (140 GBRAM), lossless

Google 2007 31 billion tokens, 2 days, 400 machines, prunedsingleton words

Kenneth Heafield Language Model Algorithms

Page 24: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Querying: The Problem

The LM is queried ≈2.7 million times per sentence translated.=⇒ Represent the LM to make queries fast.

The University of Edinburgh owns a machine with 1 TB RAM.=⇒ Find more training data!

Kenneth Heafield Language Model Algorithms

Page 25: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Querying: The Problem

The LM is queried ≈2.7 million times per sentence translated.=⇒ Represent the LM to make queries fast.

The University of Edinburgh owns a machine with 1 TB RAM.=⇒ Find more training data!

Kenneth Heafield Language Model Algorithms

Page 26: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Example Language ModelUnigrams

Words log p log b<s> -∞ -2.0iran -4.1 -0.8is -2.5 -1.4one -3.3 -0.9of -2.5 -1.1

BigramsWords log p log b<s> iran -3.3 -1.2iran is -1.7 -0.4is one -2.0 -0.9one of -1.4 -0.6

TrigramsWords log p<s> iran is -1.1iran is one -2.0is one of -0.3

Kenneth Heafield Language Model Algorithms

Page 27: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Example QueriesUnigrams

Words log p log b<s> -∞ -2.0iran -4.1 -0.8is -2.5 -1.4one -3.3 -0.9of -2.5 -1.1

BigramsWords log p log b<s> iran -3.3 -1.2iran is -1.7 -0.4is one -2.0 -0.9one of -1.4 -0.6

TrigramsWords log p<s> iran is -1.1iran is one -2.0is one of -0.3

Query: <s> iran is

log p(is | <s> iran) = -1.1

Query: iran is oflog p(of) -2.5log backoff(is) -1.4log backoff(iran is) + -0.4

log p(of | iran is) = -4.3

Kenneth Heafield Language Model Algorithms

Page 28: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Lookup I: Giant Hash Table

Hash every n-gram to a 64-bit integer. Ignore collisions.Store n-grams in custom linear probing hash tables.

Fastest.

Kenneth Heafield Language Model Algorithms

Page 29: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Lookup II: Minimal Perfect Hash [Talbot and Brants, 2008]

Function from seen n-grams to an integers:

Perfect Each seen n-gram has a unique integer.

Minimal Integers are in [0, |n-grams|).

Use the integer to index an array of probability and backoff.

False Positives

Problem Unseen n-grams collide with seen n-grams.

Solution Store an f -bit signature =⇒ 2−f false positives.

Lowest memory but has false positives.

Kenneth Heafield Language Model Algorithms

Page 30: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Lookup II: Minimal Perfect Hash [Talbot and Brants, 2008]

Function from seen n-grams to an integers:

Perfect Each seen n-gram has a unique integer.

Minimal Integers are in [0, |n-grams|).

Use the integer to index an array of probability and backoff.

False Positives

Problem Unseen n-grams collide with seen n-grams.

Solution Store an f -bit signature =⇒ 2−f false positives.

Lowest memory but has false positives.

Kenneth Heafield Language Model Algorithms

Page 31: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Lookup III: Reverse Trie

Australia <s>

are

oneare

is Australiais Australia <s>

<s>

of oneare

is

Most popular format. Exact, middle memory usage.

Kenneth Heafield Language Model Algorithms

Page 32: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Optimizing Storage

Bit-Level Packing

Use only as many bits as needed.

Chop Bits [Raj and Whittaker, 2003]

In a sorted array, encode bits by the offset where they roll over.

Quantization [Whittaker and Raj, 2001] [IRSTLM]

Cluter floats into 2q bins then store q bits/float.

Kenneth Heafield Language Model Algorithms

Page 33: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Storing Less

Filtering

Remove n-grams that will not be queried during decoding.

Pruning

Remove low-count n-grams.

Encode Less Than Probability and Backoff

I Stupid backoff has one value: count.

I Collapse probability and backoff (requires more CPU time).

Kenneth Heafield Language Model Algorithms

Page 34: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Moses Benchmarks: Single Threaded

8

8

TrieProbing

Chop SRIIRST

8 Rand Backoff 2−8 false

,

/

0 2 4 6 8 10 120

1

2

3

4

Memory (GB)

Tim

e(h

)

Kenneth Heafield Language Model Algorithms

Page 35: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

LookupOptimizing

Querying Takeaways

I Optimizing the LM optimizes the decoder.

I Approximations can have negligible impact on quality27.29 BLEU → 27.09 BLEU by quantizing to 4 bits.

I There is no single best data structure.

Kenneth Heafield Language Model Algorithms

Page 36: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Decoding

March 5: Cube Pruning

Can we do better?

Kenneth Heafield Language Model Algorithms

Page 37: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Recall: Cube Pruning

I Each constituent is visited in bottom-up (topological) order.

I Generate a fixed number of hypothese per constituent.

I Estimated probabilities for words on the edge.

Kenneth Heafield Language Model Algorithms

Page 38: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

A Beam

X :X

countries that a maintain diplomatic relations ` with North Korea .

Left State Right Staterelations

tiescountries that have a an embassy in ` DPR Korea .country a that maintains some diplomatic ties ` in North Korea .nations which has a some diplomatic ties ` with DPR Korea .country a that maintains some diplomatic ties ` with DPR Korea .

� denotes words omitted by state.Rule “is a X :X”

Cube pruning tests variants of the same bad idea.

Kenneth Heafield Language Model Algorithms

Page 39: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

A Beam With State

X :X

Left State Right State Score(countries that a � ` with North Korea .) -2(nations which has a � ` with DPR Korea .) -4(countries that have a � ` DPR Korea .) -5(country a � ` in North Korea .) -8(country a � ` with DPR Korea .) -9

� denotes words omitted by state.

Rule “is a X :X”Cube pruning tests variants of the same bad idea.

Kenneth Heafield Language Model Algorithms

Page 40: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

A Beam With State

X :X

Left State Right State Score(countries that a � ` with North Korea .) -2(nations which has a � ` with DPR Korea .) -4(countries that have a � ` DPR Korea .) -5(country a � ` in North Korea .) -8(country a � ` with DPR Korea .) -9

is ais ais a

is a

� denotes words omitted by state.

Rule “is a X :X”Cube pruning tests variants of the same bad idea.

Kenneth Heafield Language Model Algorithms

Page 41: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

High Level Idea of Incremental Search

Group hypotheses by common outer words.Score outer words first, see how well they do.

Kenneth Heafield Language Model Algorithms

Page 42: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Make a tree

(ε � ε)

(country a � Korea .)

(country a � ` with DPR Korea .)

(country a � ` in North Korea .)

(nations which has a � ` with DPR Korea .)

(countries that � Korea .)

(countries that have a � ` DPR Korea .)

(countries that a � ` with North Korea .)

[1]

Kenneth Heafield Language Model Algorithms

Page 43: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Use the Tree

Try “in (country a � ` Korea .)” once and see if you like it.

Pop top candiate off the priority queue, expand children, push.

Kenneth Heafield Language Model Algorithms

Page 44: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Use the Tree

Try “in (country a � ` Korea .)” once and see if you like it.

Pop top candiate off the priority queue, expand children, push.

Kenneth Heafield Language Model Algorithms

Page 45: Language Model Algorithms - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/sp2013-11731/slides/15.decoding.pdf · Estimating Querying Decoding MT is Expensive I \Since decoding is

EstimatingQueryingDecoding

Results

Kenneth Heafield Language Model Algorithms