NLP Breakfast 11 Structuring legal documents with Deep Learning Pauline Chavallard 2019/10/17


Page 1

NLP Breakfast 11

Structuring legal documents

with Deep Learning

Pauline Chavallard, 2019/10/17

Page 2

Plan

Page 3

Google for law

Doctrine was created in 2016

Challenges

- volume of data

- heterogeneity

- domain specificity

Page 4

Legal content has tons of links

Page 5

Challenges in data science at Doctrine

Low/weak supervision:

● No labeled data (esp. in French)

High specificity/heterogeneity:

● Language differs between court decisions, legislation and commentaries

● Among decisions, structure varies from one court to another

● Content comes in various formats (papers, images, PDFs, texts)

Page 6

An example of a French court decision

Page 7

Plan

Page 8

Motivation

● Four million court decisions delivered each year in France

● Critical information for lawyers

Problem:

● Long and complex documents

● Readers may be interested only in a very specific part

Page 9

French court decisions

A French court decision is generally structured into the following sections:

● Metadata (« En-tête » in French): court, number, date, etc., of the trial.

● Parties (« Parties » in French): information about the claimants and defendants

● Composition of the court (« Composition de la cour » in French)

● Facts (« Faits » in French): what happened?

● Pleas in law and main arguments (« Moyens » in French): arguments presented by the claimant and defendant.

● Grounds (« Motifs » in French): reasons and arguments used by the court

● Operative part of the judgment (« Dispositif » in French): final decision

Page 10

French court decisions - Example

Cour d'appel de Metz, 28 January 2015

Page 11

French court decisions

Unfortunately, there is no mandatory guideline on how to release a court decision.

Courts may use:

● different writing styles

● different ways of organising the document

● all of the sections listed above, or only a subset

Page 12

The French Court of Appeal usually has a very unified way of writing: ~55% of its decisions have explicit titles for their sections

French Court of Appeal

Extracted from https://www.doctrine.fr/d/CA/Orleans/2007/SKDD824CCFE8D8D9D93128.

Page 13

The French Court of Appeal usually has a very unified way of writing: ~55% of its decisions have explicit titles for their sections

French Court of Appeal

Extracted from https://www.doctrine.fr/d/CA/Orleans/2007/SKDD824CCFE8D8D9D93128.

Facts

Page 14

For the remaining 45%, it is harder...

French Court of Appeal

Extracted from https://www.doctrine.fr/d/CA/Metz/2015/RAC1261A1563690C06B77

Page 15

How could an algorithm automatically generate a table of contents?

Page 16

Plan

Page 17

Information needed

To complete this task, a human would rely on:

1. The vocabulary used

2. The order of the paragraphs

Page 18

Information needed

1. The vocabulary used

Not always so obvious: legislation references appear in both...

-> standard approaches (BoW, TF-IDF encodings) performed poorly

Page 19

Information needed

1. The vocabulary used

2. The order of the paragraphs

● Metadata

● Parties

● Composition of the court

● Facts

● Pleas in law and main arguments

● Grounds

● Operative part of the judgment

-> sequential information is important

Page 20

Modeling

Split decisions into paragraphs (X)

Pre-process: randomly replace rare words by <UNK> (probability p = 0.5)
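The two steps above can be sketched as follows, under stated assumptions: paragraphs are split on blank lines, and a word counts as "rare" when it appears fewer than `min_count` times in the corpus. The helper names and the `min_count` threshold are hypothetical; the slide only specifies p = 0.5. Randomly masking rare words during training is a common way to let a model learn a useful `<UNK>` embedding for words it will not recognise at inference time.

```python
import random
from collections import Counter


def split_paragraphs(decision_text):
    """Split a decision into non-empty paragraphs on blank lines (simplifying assumption)."""
    return [p.strip() for p in decision_text.split("\n\n") if p.strip()]


def rare_words(paragraphs, min_count=2):
    """Words seen fewer than min_count times (min_count is a hypothetical threshold)."""
    counts = Counter(word for para in paragraphs for word in para.split())
    return {word for word, count in counts.items() if count < min_count}


def replace_rare(paragraph, rare, p=0.5, rng=random):
    """Replace each rare token by <UNK> independently with probability p."""
    return " ".join(
        "<UNK>" if word in rare and rng.random() < p else word
        for word in paragraph.split()
    )


rng = random.Random(0)  # seeded so the noise is reproducible
paragraphs = split_paragraphs("the court rules\n\nthe court confirms")
rare = rare_words(paragraphs)  # {"rules", "confirms"}
noisy = [replace_rare(para, rare, p=0.5, rng=rng) for para in paragraphs]
```

Frequent words are never touched; only tokens in the rare set are candidates for masking.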

Page 21

Dataset creation

● Find labeled data from structured decisions with titles

● Remove titles

● Assign each paragraph to its corresponding label (y)

● y ∈ {0, …, 6} (one integer per section, i.e. seven classes)

-> Supervised classification
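A minimal sketch of this weak-labelling step, assuming decisions are already split into paragraphs and that section titles appear as standalone paragraphs. The title dictionary is hypothetical and deliberately tiny (real titles vary widely from court to court), and mapping sections to the integers 0-6 simply follows the order of the sections listed earlier.

```python
# Label indices follow the section order listed earlier (an assumption).
SECTIONS = ["Metadata", "Parties", "Composition of the court", "Facts",
            "Pleas in law and main arguments", "Grounds", "Operative part of the judgment"]

# Hypothetical title -> label dictionary; real titles are far more varied.
TITLE_TO_LABEL = {
    "en-tête": 0,
    "parties": 1,
    "composition de la cour": 2,
    "faits": 3,
    "moyens": 4,
    "motifs": 5,
    "dispositif": 6,
}


def normalise_title(paragraph):
    """Lower-case and strip trailing spaces/colons, e.g. 'MOTIFS :' -> 'motifs'."""
    return paragraph.strip().lower().rstrip(" :")


def label_paragraphs(paragraphs):
    """Return (paragraph, y) pairs with titles removed, or [] if no title is found.

    Paragraphs before the first title get label 0 (Metadata), another assumption.
    """
    if not any(normalise_title(p) in TITLE_TO_LABEL for p in paragraphs):
        return []  # unstructured decision: not usable as training data
    examples, current = [], 0
    for para in paragraphs:
        title = normalise_title(para)
        if title in TITLE_TO_LABEL:
            current = TITLE_TO_LABEL[title]  # switch label and drop the title itself
            continue
        examples.append((para, current))
    return examples


decision = ["Cour d'appel de Metz", "FAITS", "M. X a signé un bail.",
            "MOTIFS :", "La cour estime que le bail est valide.",
            "DISPOSITIF", "Confirme le jugement."]
examples = label_paragraphs(decision)
named = [(SECTIONS[y], text) for text, y in examples]
```

Decisions without any recognised title return an empty list, mirroring the idea of only mining labels from decisions that are already structured.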

Page 22

Modeling

Looks like NER... at paragraph scale.

Inspired by the literature and by what we use at Doctrine for NER: bi-LSTMs with attention [1]

With LSTMs, we capture information from:

● the paragraph's inherent properties

● the paragraph's context (the neighborhood gives insight into the label)

[1] Neural Architectures for Named Entity Recognition. Lample, Ballesteros, Subramanian, Kawakami, Dyer. NAACL 2016.

Page 23

Modeling: paragraph embedding
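The figure for this slide did not survive extraction. As a hedged illustration only, one common way to turn word-level vectors (for instance bi-LSTM outputs) into a single paragraph vector is attention pooling: score each word vector against a context vector, apply a softmax, and take the weighted average. The sketch uses plain Python lists, and the context vector is a hypothetical fixed input rather than a learned parameter; it is not necessarily the exact architecture shown in the talk.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_vectors, context):
    """Pool word vectors into one paragraph vector.

    Each word is scored by its dot product with the context vector; the softmax
    of the scores gives attention weights, and the output is the weighted average.
    """
    scores = [sum(w * c for w, c in zip(vec, context)) for vec in word_vectors]
    weights = softmax(scores)
    dim = len(word_vectors[0])
    pooled = [sum(a * vec[i] for a, vec in zip(weights, word_vectors)) for i in range(dim)]
    return pooled, weights


words = [[1.0, 0.0], [0.0, 1.0]]                     # toy word (or bi-LSTM state) vectors
pooled, weights = attention_pool(words, [0.0, 0.0])  # equal scores: plain average
focused, sharp = attention_pool(words, [10.0, 0.0])  # first word dominates
```

The returned weights are also what an attention visualisation would display: they show which words drove the paragraph representation.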

Page 24

Modeling: all in one

Page 25

Modeling: all in one

Page 26

Modeling: all in one

Page 27

Plan

Page 28

Modeling: results

● Trained on 20,000 decisions

Page 29

Modeling: results

● Trained on 20,000 decisions

Page 30

Modeling: results

● Trained on 20,000 decisions

Page 31

Modeling: results

● Trained on 20,000 decisions

Page 32

Modeling: results

● Trained on 20,000 decisions

Page 33

Modeling: results

The CRF lets us inspect the learned transition probabilities:

● Each class tends to be followed by itself

● Metadata -> Parties

● Metadata -> Composition

● Lower triangular part: green

● Upper triangular part: red
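To make the role of the CRF concrete, the sketch below implements standard Viterbi decoding for a linear-chain CRF in plain Python. Given per-paragraph label scores (the emissions, which in the model would come from the bi-LSTM) and a label-to-label transition matrix, it returns the highest-scoring label sequence. The three-label toy example and all its scores are hypothetical, start/end transitions are omitted for brevity, and the matrix orientation used here (`transitions[i][j]` scores moving from label i to label j) is not necessarily that of the heatmap in the slide.

```python
def viterbi(emissions, transitions):
    """Best label sequence under a linear-chain CRF.

    emissions[t][j]: score of label j for paragraph t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for ending at label j at step t.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Toy labels: 0 = Facts, 1 = Grounds, 2 = Operative part (hypothetical subset).
transitions = [
    [0.0, 0.0, 0.0],      # from Facts
    [-10.0, 0.0, 0.0],    # from Grounds: going back to Facts is penalised
    [-10.0, -10.0, 0.0],  # from Operative part: going back is penalised
]
emissions = [
    [2.0, 0.0, 0.0],  # clearly Facts
    [0.0, 2.0, 0.0],  # clearly Grounds
    [1.0, 0.5, 0.0],  # on its own, looks slightly more like Facts
    [0.0, 0.0, 2.0],  # clearly Operative part
]
best_path = viterbi(emissions, transitions)  # [0, 1, 1, 2]
```

Taking the argmax of each paragraph independently would give [0, 1, 0, 2]; the transition penalty on jumping back to an earlier section makes the third paragraph Grounds instead, which is exactly the kind of ordering information a CRF layer adds.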

Page 34

Modeling: attention

Page 35

Modeling: attention

Page 36

Product outcome

On the 45% of Court of Appeal decisions whose table of contents was incomplete, we now manage to obtain 90% complete ones with this approach
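Once each paragraph has a predicted label, a table of contents can be derived by grouping consecutive paragraphs that share a label. A minimal sketch, assuming labels 0-6 follow the section order listed earlier; the section names and the (name, first paragraph, last paragraph) output format are illustrative choices, not Doctrine's product format.

```python
from itertools import groupby

# Label indices follow the section order listed earlier (an assumption).
SECTIONS = ["Metadata", "Parties", "Composition of the court", "Facts",
            "Pleas in law and main arguments", "Grounds", "Operative part of the judgment"]


def table_of_contents(labels):
    """Return (section name, first paragraph index, last paragraph index) per run of labels."""
    toc, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive paragraphs with this label
        toc.append((SECTIONS[label], start, start + length - 1))
        start += length
    return toc


toc = table_of_contents([0, 0, 1, 3, 3, 5, 6])
```

Sections that the model never predicts simply do not appear in the result, which matches the fact that courts may use only a subset of the sections.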

Page 37

Plan

Page 38

Copyright © Doctrine

Model errors

Page 39


Further work

- better paragraph / sentence splitting

- one of the tags is very rare, and the model does not perform well on it

- play with optimizers, dropout, …

- try different architectures?

Page 40

A blog post is available

Paragraph classification: article by Doctrine

Page 41

Thank you for your attention!

Any questions?