
Chair of Software Engineering for Business Information Systems (sebis)

Faculty of Informatics

Technische Universität München

wwwmatthes.in.tum.de

Sentence Boundary Detection in German Legal Documents

Sebastian Moser, August 19th, 2019, Final Presentation, Bachelor’s Thesis


Outline

Introduction

▪ Motivation

▪ Research Questions

▪ Sentence Boundaries in Legal Documents

Dataset

SBD System

▪ Overview

▪ Existing Approaches

▪ Rule-Based

▪ Conditional Random Fields

▪ Recurrent Neural Network

Evaluation

▪ Legal Documents

▪ Wikipedia Articles

▪ XML Documents

Conclusion + Demo

© sebis190819 Moser SBD in German Legal Documents 2


Motivation - Publisher

[Figure: diagram of the publisher setting with the elements Author, Legal Expert, Legal Text, XML/DTD, and Regular Expressions]

Research Questions

What are sentences in the legal domain?

How should the document corpus be built?

Which methods are state-of-the-art solutions in other domains?

What are the best methods for SBD on German legal documents?

What are the functional/non-functional requirements of the SBD system?

How well do existing approaches perform on German legal documents?

Are different solutions required for different legal document types?


Sentence Boundaries in Legal Documents

§ 556g Rechtsfolge; Auskunft über die Mieter¹

(1) Eine zum Nachteil des Mieters von den Vorschriften dieses Unterkapitels abweichende Vereinbarung ist unwirksam.

(1a) Der Vermieter ist verpflichtet, dem Mieter vor dessen Abgabe der Vertragserklärung über Folgendes unaufgefordert Auskunft zu erteilen:

1. im Fall des § 556e Abs. 1 darüber, wie hoch die Vormiete ein Jahr vor Beendigung des Vormietverhältnisses war, […]

4. im Fall des § 556f Satz 2 darüber, dass es sich um die erste Vermietung nach umfassender Modernisierung handelt. […]

(4) Sämtliche Erklärungen nach den Absätzen 1a bis 3 bedürfen der Textform.

Fußnote

(+++ § 556g: Zur Anwendung vgl. §§ 557a, 557b +++)

(+++ § 556g: Zur Nichtanwendung vgl. § 35 BGBEG +++)

(+++ § 556g: Zur Anwendung vgl. Art. 229 § 49 Abs. 2 BGBEG +++)

Unterkapitel 2

Regelungen über die Miethöhe

§ 557 Mieterhöhungen nach Vereinbarung oder Gesetz

(1) Während des Mietverhältnisses können die Parteien eine Erhöhung der Miete vereinbaren.

Sentence Boundaries: Abbreviations [1]


¹ BGB § 556g, shortened and slightly changed
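A minimal sketch of why abbreviations and enumerations are a problem: a naive splitter (the hypothetical `naive_split`, not one of the evaluated systems) that ends a sentence at every `.`, `!` or `?` followed by whitespace cuts item 1 of the excerpt into three pieces, although it is a single unit.

```python
import re


def naive_split(text: str) -> list[str]:
    """Hypothetical baseline: end a sentence after every '.', '!' or '?' followed by whitespace."""
    # The lookbehind keeps the punctuation mark attached to the preceding piece
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


item = ("1. im Fall des § 556e Abs. 1 darüber, wie hoch die Vormiete "
        "ein Jahr vor Beendigung des Vormietverhältnisses war")
for piece in naive_split(item):
    print(piece)
# 1.
# im Fall des § 556e Abs.
# 1 darüber, wie hoch die Vormiete ein Jahr vor Beendigung des Vormietverhältnisses war
```

Both the enumeration marker "1." and the abbreviation "Abs." trigger false boundaries, which is what the rule-based and statistical methods below have to avoid.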


Sentence Boundaries: Mandatory Positions [2]


Sentence Boundaries: Legal Domain [3][4]


Dataset

[Figure: composition of the dataset by source (Laws, Judgments, Terms of Service, Privacy Policies, XML, Wikipedia); labels include the statutes BGB, SGB 1-3, StGB, and GG and the document counts 131, 100, 11, and 14]

Dataset

[Figure: bar chart of the number of sentences (in thousands) per source: Laws, Judgments, Terms of Service, Privacy Policies, XML, Wikipedia]


Overview

[Figure: system overview built up over three slides with the components Text, NLTK, OpenNLP, and Splitter; Tokenizer, Rule, CRF, and NN are added in the second step and Template in the third]

Existing Approaches

NLTK/Punkt [5]

▪ Unsupervised

▪ Abbreviation Disambiguation

▪ Hypothesis testing

▪ Python

OpenNLP [6]

▪ Supervised

▪ Statistical model with hard-coded features

▪ Java


Rule-Based

▪ Definition of rules for sentence boundaries (SB)

▪ Rules based on context window and regular expressions

▪ Positive rules → SB

▪ Negative rules → remove SB

▪ Paragraph: [§|Upper, Number, AlphaNumeric, \n] ↔ § 81 Stiftungsgeschäft \n

▪ Abbreviation: [Abs|Art|Rn|Urt|Buchst|bzw|…]
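A minimal sketch of positive and negative rules over a token context window, assuming a simple regex tokenizer, a four-token window for the paragraph-heading pattern and a one-token left context for the abbreviation rule. The names `tokenize`, `detect_boundaries` and `split_sentences` are hypothetical, and the abbreviation set is only a sample (the slide's list is elided and "vgl" is added here as an assumption). The input is a toy string assembled from fragments quoted on these slides.

```python
import re

# Hypothetical sample of abbreviations; the real rule list is longer (elided on the slide)
ABBREVIATIONS = {"Abs", "Art", "Rn", "Urt", "Buchst", "bzw", "vgl"}

# Tokens: newline, '§', runs of word characters, or any single other non-space symbol
TOKEN_RE = re.compile(r"\n|§|\w+|[^\w\s]")
NUMBER_RE = re.compile(r"\d+\w*")  # e.g. "81", "556g"


def tokenize(text):
    """Return (token, start, end) triples so boundaries map back to character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def detect_boundaries(tokens):
    """Return sorted token indices after which a sentence boundary (SB) is placed."""
    words = [tok for tok, _, _ in tokens]
    boundaries = set()
    for i, tok in enumerate(words):
        # Positive rule: sentence-final punctuation
        if tok in {".", "!", "?"}:
            boundaries.add(i)
        # Positive rule (paragraph heading): [§|Upper, Number, AlphaNumeric, \n]
        if tok == "\n" and i >= 3:
            first, num, word = words[i - 3:i]
            if ((first == "§" or first[:1].isupper())
                    and NUMBER_RE.fullmatch(num) and word.isalnum()):
                boundaries.add(i)
    # Negative rule: a period directly after a known abbreviation is not an SB
    for i in list(boundaries):
        if words[i] == "." and i > 0 and words[i - 1] in ABBREVIATIONS:
            boundaries.discard(i)
    return sorted(boundaries)


def split_sentences(text):
    """Cut the text at the character offset following each boundary token."""
    tokens = tokenize(text)
    sentences, start = [], 0
    for i in detect_boundaries(tokens):
        end = tokens[i][2]
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


sample = ("§ 81 Stiftungsgeschäft\n"
          "Zur Anwendung vgl. Art. 229 § 49 Abs. 2 BGBEG. "
          "Sämtliche Erklärungen bedürfen der Textform.")
for sentence in split_sentences(sample):
    print(sentence)
# § 81 Stiftungsgeschäft
# Zur Anwendung vgl. Art. 229 § 49 Abs. 2 BGBEG.
# Sämtliche Erklärungen bedürfen der Textform.
```

Applying the negative rules after the positive ones mirrors the slide's "Positive rules → SB, Negative rules → remove SB" ordering: every candidate is proposed first, then vetoed by context.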

Conditional Random Fields

▪ Statistical model for sequence modelling

▪ Label probability inferred via predefined features for individual tokens

▪ Features used: Special, Lowercase, Length, Signature, Lower, Upper, Number

▪ Implementation with CRFSuite [7]

▪ Dependencies between neighbouring labels, conditioned on the input sequence → linear-chain CRF

[Figure: linear-chain CRF with inputs X1, X2, X3 and labels Y1, Y2, Y3]
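The slide names the features but not their exact definitions. One plausible reading, sketched here as an assumption, turns each token into a dictionary of these features plus prefixed copies for its neighbours, the kind of per-token feature dictionaries commonly passed to Python bindings of CRFSuite. The function names and the `window` parameter are hypothetical; training and Viterbi decoding with CRFSuite itself are not shown.

```python
def signature(token):
    """Word shape: uppercase -> 'A', lowercase -> 'a', digit -> '0', other characters kept."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            shape.append("0")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(token):
    """Hypothetical interpretation of the seven features named on the slide."""
    return {
        "special": not token.isalnum(),   # punctuation or other symbol
        "lowercase": token.lower(),       # normalized token string
        "length": len(token),
        "signature": signature(token),
        "lower": token.islower(),         # all cased characters lowercase
        "upper": token[:1].isupper(),     # starts with an uppercase letter
        "number": token.isdigit(),
    }


def sequence_features(tokens, window=1):
    """One feature dict per token, including features of neighbours within `window`."""
    sequence = []
    for i in range(len(tokens)):
        feats = {}
        for offset in range(-window, window + 1):
            j = i + offset
            if 0 <= j < len(tokens):
                for name, value in token_features(tokens[j]).items():
                    feats[f"{offset:+d}:{name}"] = value
            else:
                feats[f"{offset:+d}:pad"] = True  # position outside the sequence
        sequence.append(feats)
    return sequence


feats = sequence_features(["Abs", ".", "1"])
print(feats[1]["-1:signature"], feats[1]["+0:special"], feats[1]["+1:number"])
# Aaa True True
```

Including the neighbours' features is what lets the model see, for example, that a period preceded by "Aaa"-shaped "Abs" and followed by a number is probably not a boundary.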

Recurrent Neural Network

▪ Recurrent Neural Networks keep information from previous processing steps

▪ Input: Word2vec word embeddings (pretrained and trained on the corpus) plus the context around each token

▪ Implementation in PyTorch

▪ Combination of bidirectional recurrent (RNN, LSTM, GRU) and linear processing units

[Figure: recurrent network with inputs X1, X2, X3, summation units (Σ), and outputs Y1, Y2, Y3]
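The "context around token" input can be sketched as fixed-size windows of vocabulary indices centred on each candidate punctuation mark. This is a library-free sketch under assumptions: the candidate set, the window size and the `<pad>`/`<unk>` handling are illustrative choices, and the embedding lookup (e.g. initialized from Word2vec vectors) as well as the bidirectional recurrent and linear layers in PyTorch are omitted.

```python
PAD, UNK = "<pad>", "<unk>"
# Hypothetical set of tokens at which a boundary decision is made
CANDIDATES = frozenset({".", ":", ";", "?", "!"})


def build_vocab(tokens):
    """Map each distinct token to an index; 0 and 1 are reserved for padding and unknowns."""
    vocab = {PAD: 0, UNK: 1}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def context_windows(tokens, vocab, size=2, candidates=CANDIDATES):
    """Return (position, indices) with `size` tokens of context on each side of a candidate."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok not in candidates:
            continue
        ids = []
        for j in range(i - size, i + size + 1):
            if 0 <= j < len(tokens):
                ids.append(vocab.get(tokens[j], vocab[UNK]))
            else:
                ids.append(vocab[PAD])  # window extends past the sequence
        windows.append((i, ids))
    return windows


tokens = ["des", "§", "556e", "Abs", ".", "1", "darüber"]
vocab = build_vocab(tokens)
print(context_windows(tokens, vocab, size=2))
# [(4, [4, 5, 6, 7, 8])]
```

Each index window would become a sequence of embedding vectors that the bidirectional layer reads left-to-right and right-to-left before a linear layer scores "boundary" versus "no boundary" for the centre token.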


Legal Documents: Laws

Method        F1      Recall   Precision
OPENNLP       72.9%   61.1%    90.4%
NLTK          73.0%   61.2%    92.1%
TEMP(NLTK)    79.0%   71.0%    89.2%
RULE          82.7%   74.7%    92.8%
CRF           95.4%   94.4%    96.4%
NN            97.7%   98.1%    97.3%
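The tables report precision, recall and F1 for the predicted sentence boundaries. A minimal sketch of these metrics over sets of boundary positions follows; the slides do not say how scores were averaged across documents, so pooling all positions into one set is an assumption of this sketch, and the function names are hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def boundary_scores(predicted, gold):
    """Precision, recall and F1 over sets of boundary positions (e.g. character offsets)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Toy example: 3 of 4 predicted boundaries are correct, 3 of 5 gold boundaries are found
p, r, f = boundary_scores({10, 25, 40, 55}, {10, 25, 55, 70, 90})
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.75 R=0.60 F1=0.67

# Consistency check against the NN row for laws: precision 97.3%, recall 98.1%
print(round(100 * f1(0.973, 0.981), 1))  # 97.7
```

Because F1 is a harmonic mean, a method with high precision but low recall (such as OPENNLP on laws) still ends up with a low F1.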

Legal Documents: Judgments

Method        F1      Recall   Precision
OPENNLP       78.1%   65.6%    96.3%
NLTK          69.1%   66.4%    73.9%
TEMP(NLTK)    69.1%   69.7%    68.5%
RULE          88.6%   88.3%    88.9%
CRF           96.8%   96.6%    97.0%
NN            98.7%   99.3%    98.1%

Legal Documents: Privacy Policies, Terms of Service

Model         F1      Recall   Precision
CRF(LAWS)     95.4%   94.4%    96.4%
NN(LAWS)      97.7%   98.1%    97.3%
CRF(JUDG)     96.8%   96.6%    97.0%
NN(JUDG)      98.7%   99.3%    98.1%
CRF(TOS)      91.8%   86.4%    95.7%
NN(TOS)       92.3%   87.3%    97.7%
CRF(PRV)      84.5%   81.6%    87.6%
NN(PRV)       82.6%   84.0%    81.3%
WIKI          87.3%   86.8%    87.7%

XML Documents

Model         F1      Recall   Precision
CRF LAW       95.4%   94.4%    96.6%
CRF XML       97.0%   97.5%    96.4%


Conclusion

Implementation of a tailored SBD system for the German legal domain

Creation of an SBD dataset for the German legal domain

Performance evaluation:

→ Existing solutions (NLTK, OpenNLP) perform poorly on legal texts (F1 of 69.1% to 78.1% on laws and judgments)

→ Highly specialized methods are needed

→ State-of-the-art results with recurrent neural networks (F1 of 97.7% on laws and 98.7% on judgments)

Legal documents are harder to process than normal text

Demonstration


References

[1] Reynar, J. C.; Ratnaparkhi, A.: A Maximum Entropy Approach to Identifying Sentence Boundaries. In Proceedings of COLING 2012: Posters. Pages 985-994. Mumbai, India. December 2012.

[2] Mikheev, A.: Tagging Sentence Boundaries. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference. NAACL 2000. Pages 264-271. Stroudsburg, PA, USA. 2000.

[3] de Maat, E.: Making sense of legal texts. PhD thesis. University of Amsterdam. 2012.

[4] Savelka, J.; Ashley, K. D.: Sentence Boundary Detection in Adjudicatory Decisions in the United States. Traitement automatique des langues. 58(February):21-45. 2017.

[5] Natural Language Toolkit. https://www.nltk.org. Last accessed: August 16, 2019.

[6] Apache OpenNLP. https://opennlp.apache.org/. Last accessed: August 16, 2019.

[7] Okazaki, N.: CRFSuite: a fast implementation of Conditional Random Fields (CRFs). 2007.


Technische Universität München

Faculty of Informatics

Chair of Software Engineering for Business Information Systems

Boltzmannstraße 3

85748 Garching bei München

Tel +49.89.289.17132

Fax +49.89.289.17136

wwwmatthes.in.tum.de

Sebastian Moser

Dataset

[Figure: chart of sentences (in thousands) and tokens per sentence per source: Laws, Judgments, Terms of Service, Privacy Policies, XML, Wikipedia]

Neural Network Performance on Legal Documents

Configuration   F1      Recall   Precision
LAWS (LAW)      97.7%   98.1%    97.3%
LAWS (JUDG)     84.3%   80.9%    88.0%
JUDG (LAW)      87.5%   84.4%    90.9%
JUDG (JUDG)     98.7%   99.3%    98.1%

Template


▪ Idea: Automatic detection of structure

▪ Identify headlines, paragraphs,…

▪ Pre-processing method

▪ Algorithm:

1. Find numbered patterns

→ Construct regular expressions

2. Segment into coarser structure

3. Repeat

▪ Individual sentences determined by other methods
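The template algorithm can be sketched as a recursive segmentation: split at the coarsest numbering scheme present, then repeat inside each block with finer schemes. In this sketch the numbering schemes are hypothetical and fixed in advance, whereas the slide describes constructing the regular expressions from numbered patterns found in the document; `SCHEMES`, `segment` and `headers` are illustrative names. The input lines are taken from the BGB excerpt above.

```python
import re

# Hypothetical numbering schemes, ordered from coarse to fine
SCHEMES = [
    ("section", re.compile(r"§\s*\d+\w*")),    # e.g. "§ 556g"
    ("paragraph", re.compile(r"\(\d+\w?\)")),  # e.g. "(1)", "(1a)"
    ("item", re.compile(r"\d+\.\s")),          # e.g. "1. "
]


def segment(lines, level=0):
    """Split lines at the coarsest scheme present, then repeat inside each block.

    Returns the plain list of lines once no scheme is left; otherwise a list of
    (scheme name, header line, children) tuples, with header None for a preamble.
    """
    if level >= len(SCHEMES):
        return lines
    name, pattern = SCHEMES[level]
    # pattern.match only matches at the start of a line
    starts = [i for i, line in enumerate(lines) if pattern.match(line)]
    if not starts:
        return segment(lines, level + 1)  # scheme absent: try the next, finer one
    blocks = []
    if starts[0] > 0:
        blocks.append(("preamble", None, segment(lines[:starts[0]], level + 1)))
    bounds = starts + [len(lines)]
    for start, end in zip(bounds, bounds[1:]):
        blocks.append((name, lines[start], segment(lines[start + 1:end], level + 1)))
    return blocks


def headers(blocks, depth=0):
    """Yield (depth, header) pairs describing the detected structure."""
    for block in blocks:
        if isinstance(block, tuple):
            _, header, children = block
            yield depth, header or "(preamble)"
            yield from headers(children, depth + 1)


lines = [
    "§ 556g Rechtsfolge; Auskunft über die Mieter",
    "(1) Eine zum Nachteil des Mieters von den Vorschriften dieses "
    "Unterkapitels abweichende Vereinbarung ist unwirksam.",
    "(1a) Der Vermieter ist verpflichtet, dem Mieter vor dessen Abgabe der "
    "Vertragserklärung über Folgendes unaufgefordert Auskunft zu erteilen:",
    "1. im Fall des § 556e Abs. 1 darüber, wie hoch die Vormiete ein Jahr "
    "vor Beendigung des Vormietverhältnisses war,",
    "(4) Sämtliche Erklärungen nach den Absätzen 1a bis 3 bedürfen der Textform.",
    "§ 557 Mieterhöhungen nach Vereinbarung oder Gesetz",
    "(1) Während des Mietverhältnisses können die Parteien eine Erhöhung der Miete vereinbaren.",
]
tree = segment(lines)
for depth, header in headers(tree):
    print("  " * depth + header[:40])
```

The leaf blocks correspond to the coarse units found by the template step; as on the slide, the individual sentences inside them would still be determined by one of the other methods.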


Demo

[Screenshot: demo user interface with the annotations Marked Sentence Boundaries and Choice of Method]

Demo: Rule-based

[Screenshot: rule-based demo with the annotations Wrong Prediction, True Prediction, and False Negative]

Demo: NN


Class Overview

[Figure: class diagram with the classes NLTKModule, OpenNLPModule, RuleModule, CRFModule, NNModule, SBDModule, Tokenizer, Splitter, and Annotator]
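A minimal sketch of how the classes in the diagram could fit together, assuming every method module implements a common `SBDModule` interface that returns boundary offsets and the `Splitter` cuts the text at those offsets. The method names `tokenize`, `boundaries` and `split`, the toy rule inside `RuleModule`, and the omission of `Annotator` and of the NLTK, OpenNLP, CRF and NN modules are assumptions of this sketch, not the thesis implementation.

```python
import re
from abc import ABC, abstractmethod


class Tokenizer:
    """Hypothetical tokenizer: (token, start, end) triples for words and single symbols."""
    _pattern = re.compile(r"\w+|[^\w\s]")

    def tokenize(self, text):
        return [(m.group(), m.start(), m.end()) for m in self._pattern.finditer(text)]


class SBDModule(ABC):
    """Common interface shared by all method modules (assumed)."""

    @abstractmethod
    def boundaries(self, text):
        """Return the character offsets at which a sentence ends."""


class RuleModule(SBDModule):
    """Toy stand-in: a period ends a sentence unless it follows a listed abbreviation."""

    def __init__(self, tokenizer, abbreviations=frozenset({"Abs", "Art", "vgl"})):
        self.tokenizer = tokenizer
        self.abbreviations = abbreviations

    def boundaries(self, text):
        tokens = self.tokenizer.tokenize(text)
        offsets = []
        for i, (tok, _, end) in enumerate(tokens):
            if tok == "." and not (i > 0 and tokens[i - 1][0] in self.abbreviations):
                offsets.append(end)
        return offsets


class Splitter:
    """Cuts the text at the offsets produced by any SBDModule."""

    def __init__(self, module: SBDModule):
        self.module = module

    def split(self, text):
        sentences, start = [], 0
        for end in self.module.boundaries(text):
            sentences.append(text[start:end].strip())
            start = end
        if text[start:].strip():
            sentences.append(text[start:].strip())
        return sentences


splitter = Splitter(RuleModule(Tokenizer()))
print(splitter.split("Zur Anwendung vgl. §§ 557a, 557b. Sämtliche Erklärungen bedürfen der Textform."))
# ['Zur Anwendung vgl. §§ 557a, 557b.', 'Sämtliche Erklärungen bedürfen der Textform.']
```

Keeping the Splitter independent of the method means a CRF or NN module could be swapped in by implementing the same `boundaries` method.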