
Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his



Creation in Graphs

Extracting Conceptual Structures from Old Testament Texts

Ulrik Petersen

A thesis submitted in partial fulfillment
of the requirements for the degree of
cand.mag. in human-centered informatics

Supervisor: Prof. dr.scient., PhD Peter Øhrstrøm
Study program: Human-centered Informatics
M.A. Thesis
Aalborg University, Denmark


Copyright (C) 2004 by Ulrik Petersen.

All rights reserved.

Please contact the author for permission before mass-distributing this thesis, either electronically or in hard copy.

The author can be contacted via:

• ulrikp|a-t|hum(do-t)aau(d-ot)dk or

• ulrikp|t-a|emdros(d-o-t)org

Both addresses have been obfuscated to avoid spam.

This thesis has a homepage here:

http://ulrikp.org/MA/

Scripture quotations marked NIV are taken from the HOLY BIBLE, NEW INTERNATIONAL VERSION.

Copyright (C) 1973, 1978, 1984 by International Bible Society. Used by permission of Hodder & Stoughton, Ltd, a member of the Hodder Headline Plc Group. All rights reserved.

NIV is a registered trademark of International Bible Society. UK trademark number 1448790.



Abstract

The main goal of this MA thesis is to develop a method for automatically transforming natural language text into a formalization of a possible meaning of the text, expressed in the conceptual graphs of John Sowa. I have implemented my method in a computer program, and during the course of my thesis, I demonstrate empirically that my method works, by applying it to a specific piece of text.

My chosen text consists of parts of chapter 1 from the book of Genesis in the Old Testament of the Bible. I have chosen Hebrew as the particular natural language on which to test my method.

The method chosen is that of syntax-directed, ontology-guided, rule-based, stepwise transformation. This is but one of two competing methods described in the literature, the other being based on syntax-directed maximal joining of canonical graphs.

As input to my method, I have four classes of data: First, the Hebrew text itself, and second, a ready-made syntactic analysis of the text. Both are taken from the Hebrew WIVU database developed by Prof. Dr. Eep Talstra and his research group, Werkgroep Informatica, at the Free University of Amsterdam. The third class of input data is an ontology of the concepts found in the text, derived from a matching of a concise Hebrew-English lexicon with WordNet. The fourth class of input data contains the relation hierarchy, the rules, and the lexicons I have developed as part of my method.
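In essence, such an ontology serves the method as a concept type hierarchy against which the rules can ask subsumption questions ("is this concept a kind of that one?"). The following is a minimal sketch only: the parent relation below is invented for illustration and is not the thesis's actual hierarchy, although type names such as Entity, Situation, and Event do figure in the ontology discussed in Chapter 4.

```python
# A toy concept type hierarchy with a subsumption test, illustrating the
# kind of ontology-guided lookup the transformation rules rely on.
# The parent links here are invented for illustration.
PARENT = {
    "God": "Entity",
    "Animal": "Entity",
    "Entity": "Universal",
    "Event": "Situation",
    "Situation": "Universal",
}

def is_subtype(t, ancestor):
    """Return True if type `t` equals `ancestor` or lies below it."""
    while t is not None:
        if t == ancestor:
            return True
        t = PARENT.get(t)  # walk upwards; None once we fall off the top
    return False
```

With this sketch, `is_subtype("God", "Entity")` holds, while `is_subtype("Event", "Entity")` does not, which is the kind of check a rule might use to decide whether a concept may fill a given relation slot.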

My method runs in three stages.

First, the syntax trees from the WIVU database are refined and transformed into more traditional generative syntax trees with smaller units. This step is necessary to make my method workable: one of the main assumptions of my thesis is that semantics can be viewed as compositional in nature, meaning that the semantics of a text unit (e.g., a sentence) can be derived by breaking its meaning down into ever smaller units, going right down to the level of words and perhaps morphemes. Conversely, the meaning of a unit can then be constructed back again by composing together the individual parts of the meaning, directed by the syntax. The WIVU syntax trees, by themselves, have units which are far too large for compositional semantics to be workable; hence I must transform the trees so that the units are smaller. This is done in Step 1.

Second, having obtained a more refined syntax tree, I then transform the text into "intermediate" CGs. This is done by starting at the bottom of the tree (i.e., with words) and traversing the tree upwards, composing the meaning of the higher-level units from the meaning of the lower-level units by using rule-based, syntax-tree-directed, ontology-guided joining of conceptual graphs. This process carries on right up to clause-level, where a different algorithm takes over. The result is CGs which are "quite good", but which still have bits of syntax left.
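The bottom-up, syntax-directed composition described above can be pictured schematically as follows. This is only a sketch of the general shape of such a traversal: the tuple-based tree encoding, the `word_meaning` lookup, and the `join` callback are placeholders, not the thesis's actual data structures or joining algorithm.

```python
# Schematic bottom-up composition over a syntax tree: the meaning of each
# node is built by folding together the meanings of its children, which is
# the essence of syntax-directed compositional semantics.
def compose(node, word_meaning, join):
    """node is either a word (str) or a (label, children) pair."""
    if isinstance(node, str):
        return word_meaning(node)            # leaf: look up the word's graph
    label, children = node
    parts = [compose(c, word_meaning, join) for c in children]
    result = parts[0]
    for part in parts[1:]:                   # fold children's meanings together
        result = join(label, result, part)
    return result
```

As a trivial usage example, taking a word's "meaning" to be a one-element set and `join` to be set union, `compose(("S", [("NP", ["God"]), ("VP", ["created", "heaven"])]), lambda w: {w}, lambda lbl, a, b: a | b)` yields the set of all words; in the thesis, the leaves would instead be small conceptual graphs and `join` a rule-based, ontology-guided graph join.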

Third, the intermediate CGs are transformed into fully semantic CGs using rules. These rules have a premise-conclusion structure, and are capable of transforming concepts, relations, and structure alike. The end result is CGs which are by now "adequate", and which have no syntax left.
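A premise-conclusion rule of this general kind can be sketched as a match-and-rewrite step. The triple representation and the "?" wildcard below are invented for this illustration; the thesis's actual rules (Chapter 9) match and rewrite full conceptual graphs, not bare triples.

```python
# Minimal premise-conclusion rewriting over (source, relation, target)
# triples: any triple matching the premise pattern is replaced according to
# the conclusion.  "?" in the premise matches anything; "?" in the
# conclusion carries the matched value over unchanged.
def apply_rule(triples, premise, conclusion):
    out = []
    ps, pr, pt = premise
    cs, cr, ct = conclusion
    for s, r, t in triples:
        if ps in ("?", s) and pr in ("?", r) and pt in ("?", t):
            out.append((s if cs == "?" else cs,
                        r if cr == "?" else cr,
                        t if ct == "?" else ct))
        else:
            out.append((s, r, t))
    return out
```

For instance, a rule with premise `("?", "subj", "?")` and conclusion `("?", "agnt", "?")` rewrites a leftover syntactic subject relation into a semantic agent relation (AGNT being one of Sowa's standard conceptual relations) while leaving all other triples untouched.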

The thesis is divided into two parts. Part I contains background information necessary for understanding my method, whereas Part II develops and discusses the method itself.



Part I starts out with an introductory chapter (Chapter 1). After that, I introduce the three tools I have used, namely the Jython programming language, the Notio CG framework, and the Emdros text database engine (Chapter 2). In a short chapter, I then describe Hebrew as a language as well as the Hebrew WIVU database (Chapter 3). After that, I describe and discuss my ontology (Chapter 4), followed by a literature survey of the state of the art in text-to-CG transformation (Chapter 5). This concludes Part I.

Part II starts out by introducing my method from a bird's eye perspective (Chapter 6). As I mentioned, the method runs in three stages, treated in the three subsequent chapters, namely: refinement of the syntax trees (Chapter 7); transformation of the refined syntax trees to intermediate CGs (Chapter 8); and finally transformation of intermediate CGs to fully semantic CGs (Chapter 9). Having thus developed my method, I discuss and philosophize over the method (Chapter 10). Finally, I round off the thesis in a concluding chapter (Chapter 11).



Resumé [1]

The main aim of this MA thesis is to develop a method for automatically transforming a natural language text into a formalization of a possible meaning of the text, expressed by means of John Sowa's conceptual graphs. I have implemented my method in a computer program, and in the course of the thesis I demonstrate that the method works by applying it to a specific piece of text.

The empirical basis for testing my method comes from my chosen text, namely parts of chapter 1 of Genesis in the Old Testament of the Bible. I have chosen Hebrew as the natural language on which to test my method.

The chosen method is syntax-directed, ontology-guided, rule-based, stepwise transformation. This is only one of two basic methods described in the literature; the other is based on syntax-directed maximal joining of canonical graphs.

As input to my method, I have four classes of data: 1) the Hebrew text and 2) a ready-made syntactic analysis of this text, both taken from the Hebrew database developed by the Werkgroep Informatica at the Free University of Amsterdam (the WIVU database), produced by Prof. Dr. Eep Talstra and his research group; 3) an ontology of the concepts of the text, based on a matching of a concise Hebrew-English lexicon with WordNet; and 4) the data I have developed myself as part of my method, namely a relation hierarchy, a rule set, and a number of small lexicons.

My method runs in three stages.

The first stage is a refinement of the WIVU syntax into more fine-grained syntax trees with smaller units, resembling traditional generative syntax trees more closely. This step is necessary for my method to work properly: one of the main assumptions behind my thesis is that semantics can be understood as compositional in nature. That is, the meaning of a text unit (e.g., a sentence) can be broken down into smaller constituents until one reaches the level of words and perhaps morphemes. Conversely, these smallest semantic units can then be combined again into larger and larger semantic units until we arrive at the meaning of the original text unit. The syntax trees of the WIVU database simply have units that are too large for such a compositional approach to semantics to be directed by the syntax tree. The WIVU syntax must therefore be refined into smaller parts, and this is done in the first stage.

The second stage transforms the refined syntax from the first stage into "intermediate" CGs. This is done by starting at the bottom of the syntax tree (i.e., at the words) and combining the conceptual graphs that arise, moving upwards in the tree and letting it direct the composition. This combination is rule-based and builds on grammatical rules extracted from the text in the first stage. The process continues up to clause level, where a different algorithm takes over. The result is CGs which are "quite good", but which still have elements of syntax left.

The third stage takes the intermediate CGs from the second stage and transforms them into fully semantic graphs. This is done by means of rules with a premise-conclusion structure. These rules are capable of transforming concepts, relations, and CG structure alike. The end result is a set of conceptual graphs which are now "adequate" and have no trace of syntax left.

The thesis is divided into two main parts: Part I contains background information necessary for understanding the rest of the thesis; Part II develops and discusses the method itself.

Part I begins with an introductory chapter (Chapter 1), after which I introduce the three tools I have used, namely the Jython programming language, the Notio CG framework, and the Emdros text database engine (Chapter 2). In a short chapter, I then sketch the necessary elements of Hebrew grammar, as well as the main features of the WIVU database (Chapter 3). In Chapter 4, I describe and discuss my ontology, and in Chapter 5 I review, in a literature survey, the two methods described in the literature. This concludes Part I.

[1] This abstract in Danish has been added as per my agreement with the Director of Studies, Jørgen Stigel.

Part II begins by introducing my method from a bird's eye perspective (Chapter 6). As mentioned, the method runs in three stages, each subsequently treated in its own chapter: from WIVU syntax to more fine-grained syntax (Chapter 7), from fine-grained syntax to intermediate CGs (Chapter 8), and from intermediate CGs to semantic CGs (Chapter 9). In Chapter 10, I discuss my method and philosophize over it. Finally, I round off the thesis in a concluding chapter (Chapter 11).


Contents

I Background 21

1 Introduction 231.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231.2 Hebrew . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231.3 Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241.4 Conceptual Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251.5 Problem description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251.6 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261.7 Overview of Part I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2 Tools 272.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272.2 Jython . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272.3 Notio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282.4 Emdros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.4.2 Emdros concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3 Hebrew 313.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.2 The Hebrew Bible . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.3 The Hebrew language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323.3.2 Parts of speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323.3.3 Hebrew morphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.4 The WIVU database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.4.2 Distributional vs. functional data . . . . . . . . . . . . . . . . . . . . . . 333.4.3 Subphrases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

7

Page 8: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

8 CONTENTS

4 Ontology 354.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354.2 9th semester work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.2.2 Ontological concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364.2.3 The lexicon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374.2.4 WordNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384.2.5 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.3 Changes from previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404.4 Discussion of ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.4.2 The top . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.4.2.2 Situations, States, Processes, and Events . . . . . . . . . . . . 424.4.2.3 Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444.4.2.4 To bless: State or Process? . . . . . . . . . . . . . . . . . . . 444.4.2.5 States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.4.2.6 Phenomenon . . . . . . . . . . . . . . . . . . . . . . . . . . . 454.4.2.7 Entities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.4.2.8 psychological feature . . . . . . . . . . . . . . . . . . . . . . 46

4.4.3 The rest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.4.3.2 God . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.4.3.3 Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.4.3.4 Primeval ocean . . . . . . . . . . . . . . . . . . . . . . . . . . 474.4.3.5 void . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.4.3.6 image/likeness . . . . . . . . . . . . . . . . . . . . . . . . . . 484.4.3.7 creepy-crawly . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5 Literature review 495.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495.2 Canonical graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.2.1 Sowa and Way (1986) . . . . . . . . . . . . . . . . . . . . . . . . . . . 495.2.2 Sowa (1988) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505.2.3 Velardi et.al. (1988) (DANTE) . . . . . . . . . . . . . . . . . . . . . . . 505.2.4 Other work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.3 Rule-based transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525.3.1 Barrière . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525.3.2 Nicolas et al. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555.3.3 Nicolas (2003) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Page 9: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

CONTENTS 9

II Creating conceptual structures 59

6 Method overview 616.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616.2 Choosing an approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616.3 Overview of my method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636.4 Conclusion and overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

7 Transforming the WIVU syntax 677.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677.2 The bare WIVU syntax trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677.3 Problems of the WIVU syntax trees . . . . . . . . . . . . . . . . . . . . . . . . 677.4 Making better syntax trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

7.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687.4.2 Main transformation algorithm . . . . . . . . . . . . . . . . . . . . . . . 697.4.3 transform_phrase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697.4.4 transform_phrase_subphrase . . . . . . . . . . . . . . . . . . . . . . . . 707.4.5 get_subphrase_string_left_branching . . . . . . . . . . . . . . . . . . . 717.4.6 process_subphrase_list . . . . . . . . . . . . . . . . . . . . . . . . . . . 737.4.7 process_subphrase . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

8 From syntax to CGs 798.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798.2 General approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798.3 Phrase- and word-level rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

8.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808.3.2 Properties of rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808.3.3 Noun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818.3.4 Verb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818.3.5 Adjective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838.3.6 Adverb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838.3.7 Preposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 838.3.8 Conjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848.3.9 NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

8.3.9.1 Parallel construction . . . . . . . . . . . . . . . . . . . . . . . 848.3.9.2 Regens/rectum . . . . . . . . . . . . . . . . . . . . . . . . . . 85

8.3.10 VP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 858.3.11 PP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868.3.12 CjP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 868.3.13 AP and AdvP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878.3.14 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Page 10: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

10 CONTENTS

8.4 Clause-level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888.4.2 Excursus: The number of rules . . . . . . . . . . . . . . . . . . . . . . . 888.4.3 Method used at clause-level . . . . . . . . . . . . . . . . . . . . . . . . 92

8.5 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 938.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 938.5.2 Overview of algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 938.5.3 Word-level algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 948.5.4 Phrase-level algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 948.5.5 Clause-level algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

8.5.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 958.5.5.2 Top-level clause-transformation . . . . . . . . . . . . . . . . . 958.5.5.3 transform_clause . . . . . . . . . . . . . . . . . . . . . . . . . 968.5.5.4 joinAtClauseLevel . . . . . . . . . . . . . . . . . . . . . . . . 978.5.5.5 join_graphs_with_relation . . . . . . . . . . . . . . . . . . . . 98

8.5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 988.6 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

8.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 998.6.2 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

8.7 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1008.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

9 From intermediate CGs to more semantic CGs 1039.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1039.2 Overview of method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1039.3 Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

9.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049.3.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049.3.3 Rule structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049.3.4 Rule preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059.3.5 Sample rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

9.4 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1089.4.2 Mappings and lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1099.4.3 Bird’s eye view of algorithm . . . . . . . . . . . . . . . . . . . . . . . . 1109.4.4 preprocessRule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1109.4.5 process_conclusion_concept . . . . . . . . . . . . . . . . . . . . . . . . 1129.4.6 getInnerGraph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1139.4.7 transformIntermediateGraph . . . . . . . . . . . . . . . . . . . . . . . . 1139.4.8 ApplyConclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1159.4.9 lookupConcept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1179.4.10 lookupConceptGetPair . . . . . . . . . . . . . . . . . . . . . . . . . . . 1179.4.11 CopyConcept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

Page 11: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

CONTENTS 11

9.4.12 removeFunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1189.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

10 Discussion 12310.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12310.2 This is machine translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12310.3 This is knowledge representation . . . . . . . . . . . . . . . . . . . . . . . . . . 12410.4 The semantics of my work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

10.4.1 Surface semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12610.4.2 Compositional semantics . . . . . . . . . . . . . . . . . . . . . . . . . . 128

10.5 The non-centrality of Hebrew . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12910.6 Does my method scale? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13010.7 Ambiguity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13110.8 Critique of my method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13210.9 Ontology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

10.9.1 Perspective and general adequacy . . . . . . . . . . . . . . . . . . . . . 13310.9.2 Adequacy for my purposes . . . . . . . . . . . . . . . . . . . . . . . . . 13410.9.3 Key role of my ontology . . . . . . . . . . . . . . . . . . . . . . . . . . 134

10.10Further research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13410.11Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

11 Conclusion 13911.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13911.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13911.3 Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14011.4 Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14111.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

A Hebrew transliteration 155

B Hebrew 157B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157B.2 The Hebrew language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

B.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157B.2.2 Graphology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157B.2.3 Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

B.2.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 158B.2.3.2 Parts of speech . . . . . . . . . . . . . . . . . . . . . . . . . . 158B.2.3.3 Hebrew morphology . . . . . . . . . . . . . . . . . . . . . . . 159

B.2.4 The verbal system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160B.2.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 160B.2.4.2 Tenses and moods . . . . . . . . . . . . . . . . . . . . . . . . 161

Page 12: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

12 CONTENTS

B.2.4.3 Tense/Aspect . . . . . . . . . . . . . . . . . . . . . . . . . . . 161B.2.4.4 Moods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162B.2.4.5 Wav + perfect/imperfect . . . . . . . . . . . . . . . . . . . . . 162B.2.4.6 Infinitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163B.2.4.7 Directives (Imperatives, Cohortatives, Jussives) . . . . . . . . 163

B.2.5 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164B.2.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 164B.2.5.2 State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164B.2.5.3 Tense . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

B.2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166B.3 The WIVU database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

B.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166B.3.2 History of the Werkgroep Informatica . . . . . . . . . . . . . . . . . . . 166B.3.3 Distributional and functional data . . . . . . . . . . . . . . . . . . . . . 171

B.3.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 171B.3.3.2 Initial definitions . . . . . . . . . . . . . . . . . . . . . . . . . 171B.3.3.3 Purpose and order of creation . . . . . . . . . . . . . . . . . . 172B.3.3.4 Further definitions . . . . . . . . . . . . . . . . . . . . . . . . 172

B.3.4 Methods used in analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 173B.3.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 173B.3.4.2 Bottom-up strategy . . . . . . . . . . . . . . . . . . . . . . . 173B.3.4.3 Top-down approach . . . . . . . . . . . . . . . . . . . . . . . 174B.3.4.4 Overview of the analysis procedures . . . . . . . . . . . . . . 174B.3.4.5 Description of procedure . . . . . . . . . . . . . . . . . . . . 175B.3.4.6 Ambiguity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

B.3.5 Word level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177B.3.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 177B.3.5.2 Verbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177B.3.5.3 Subphrases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

B.3.6 Phrase level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179B.3.6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 179B.3.6.2 Definition of phrases and phrase atoms . . . . . . . . . . . . . 179B.3.6.3 Clause constituent labels . . . . . . . . . . . . . . . . . . . . 181

B.3.7 Clause level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183B.3.7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 183B.3.7.2 Clause atoms and clauses . . . . . . . . . . . . . . . . . . . . 183B.3.7.3 Clause hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . 183

B.3.8 Sentence level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184B.3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

B.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186


C Categories in the WIVU database . . . 187
C.1 Introduction . . . 187
C.2 Parts of speech . . . 187
C.2.1 Values . . . 187
C.2.2 Method . . . 187

C.3 Phrase-dependent parts of speech . . . 188
C.3.1 Values . . . 188
C.3.2 Method . . . 188
C.3.3 Commentary . . . 188

C.4 Changes in part of speech . . . 188
C.4.1 Values . . . 188
C.4.2 Method . . . 189
C.4.3 Commentary . . . 189

C.5 Verbal tense . . . 189
C.5.1 Values . . . 189
C.5.2 Method . . . 189

C.6 Verbal stem . . . 189
C.6.1 Values . . . 189
C.6.2 Method . . . 190

C.7 Person . . . 190
C.7.1 Values . . . 190
C.7.2 Method . . . 190

C.8 Number . . . 190
C.8.1 Values . . . 190
C.8.2 Method . . . 190

C.9 Gender . . . 191
C.9.1 Values . . . 191
C.9.2 Method . . . 191

C.10 Phrase atom type . . . 191
C.10.1 Values . . . 191
C.10.2 Method . . . 191
C.10.3 Commentary . . . 191

C.11 Phrase type . . . 192
C.11.1 Values . . . 192
C.11.2 Method . . . 192
C.11.3 Commentary . . . 192

C.12 Phrase function . . . 192
C.12.1 Values . . . 192
C.12.2 Method . . . 193
C.12.3 Commentary . . . 193

C.13 Clause atom relation . . . 193
C.13.1 Values . . . 193

C.13.1.1 Genesis 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193


C.13.1.2 Genesis 1:1-3 . . . 193
C.13.2 Method . . . 193

C.14 Verbs with non-qal stems . . . 194
C.14.1 Introduction . . . 194
C.14.2 Verbs . . . 194

C.14.2.1 Genesis 1 . . . 194
C.14.2.2 Genesis 1:1-3 . . . 194

C.14.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

D Textual emendations . . . 197
D.1 Introduction . . . 197
D.2 Genesis 1:14 monads 262-265 . . . 197

D.2.1 MQL . . . 197
D.2.2 Description . . . 198
D.2.3 Explanation . . . 198

D.3 Genesis 1:26 monads 506-523 . . . 198
D.3.1 MQL . . . 198
D.3.2 Description . . . 199
D.3.3 Explanation . . . 200

D.4 Genesis 1:28 monads 572-584 . . . 200
D.4.1 MQL . . . 200
D.4.2 Description . . . 200
D.4.3 Explanation . . . 201

E Emdros . . . 203
E.1 Introduction . . . 203
E.2 Origins of Emdros . . . 203
E.3 Emdros concepts . . . 204

E.3.1 Introduction . . . 204
E.3.2 Monad . . . 204
E.3.3 Object . . . 205
E.3.4 Object type . . . 205
E.3.5 Feature . . . 205
E.3.6 Conclusion . . . 206

E.4 An example . . . 206
E.5 Emdros API . . . 207
E.6 Conclusion . . . 207

F Ontology . . . 209
F.1 Introduction . . . 209
F.2 Ontology of Genesis 1:1-3, abbreviated . . . 209
F.3 Ontology of Genesis 1, abbreviated . . . 211
F.4 Ontology of Genesis 1:1-3, full . . . 219


F.5 Ontology of Genesis 1, full . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220

G Grammar of Genesis 1 . . . 227
G.1 Introduction . . . 227
G.2 Grammar of Gen 1:1-3 . . . 227
G.3 Grammar of Genesis 1 . . . 228

H Plots . . . 231
H.1 Introduction . . . 231
H.2 Data . . . 231
H.3 GNUPlot scripts . . . 232

H.3.1 Plotting rules against words . . . 232
H.3.2 Plotting ln(rules) against ln(words) . . . 233

I Mathematics of plots . . . 235
I.1 Introduction . . . 235
I.2 Argumentation . . . 235

J Rules . . . 239
J.1 Introduction . . . 239
J.2 Rules . . . 239


List of Tables

8.1 Suffix conversion strings for possessive suffix . . . 81
8.2 Suffix conversion strings for subject suffix . . . 82
8.3 Rules for nouns . . . 82
8.4 Rules for verbs . . . 82
8.5 Rule for adjectives . . . 83
8.6 Rule for adverbs . . . 83
8.7 Lexicon for prepositions . . . 84
8.8 Lexicon for conjunctions . . . 84
8.9 Rules for NPs . . . 84
8.10 Rule for VPs . . . 86
8.11 Rules for PPs . . . 86
8.12 Rule for CjP . . . 86
8.13 Rules for APs . . . 87
8.14 Rule for AdvPs . . . 87
8.15 Sliding scale of importance for clause labels . . . 93
8.16 Mapping of clause text types to concept types . . . 96
8.17 Relations used in transforming syntax to CG . . . 99

9.1 Fields of each Rule data-structure . . . . . . . . . . . . . . . . . . . . . . . . . 105

A.1 The Hebrew alphabet and its transliteration . . . . . . . . . . . . . . . . . . . . 155

B.1 Parts of speech in Biblical Hebrew (van der Merwe et al. (1999)). . . . 159
B.2 Summary of possible values for person, number, gender, and state. . . . 160
B.3 Summary of Hebrew suffixes. . . . 160
B.4 Hebrew verb tenses . . . 161
B.5 Hebrew verb "moods" or "stem formations" . . . 161
B.6 Directives in Biblical Hebrew. . . . 163
B.7 Tenses as specified by intraclausal syntax. . . . 165
B.8 Verb-tenses in the WIVU database. . . . 178
B.9 Subphrases, or functional categories at word-level. . . . 178
B.10 Clause constituent labels . . . 182
B.11 Groups of clause atom relation codes in Genesis 1 . . . 184


List of Figures

4.1 The top ontology from Sowa (1992) . . . 42
4.2 My own top ontology, derived from Sowa (1992) and Martin (1995). . . . 43

5.1 Overview of Barrière's method (after Barrière (1997), p. 31) . . . 53
5.2 WordNet-actor from Nicolas (2003), p. 55 . . . 56
5.3 Rules from Nicolas (2003), Appendix B, page 108 . . . 57

6.1 Overview of method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

7.1 Example regens-rectum structure . . . 68
7.2 Overlapping parallel subphrase example . . . 72
7.3 Transformed, left-branching parallel example . . . 72
7.4 Transformed right-branching regens-rectum example . . . 73
7.5 WIVU syntax of "Darkness was on the face of the deep" . . . 76
7.6 Transformed syntax of "Darkness was on the face of the deep" . . . 77

8.1 Plot of rules against words . . . 90
8.2 Plot of ln(rules) against ln(words) . . . 91
8.3 Intermediate CG for "Darkness was on the surface of the deep" . . . 101
8.4 Graphs produced from syntax, Genesis 1:1-3 . . . 102

9.1 Rule for Time/ptim . . . 106
9.2 Rule for be/Subj/stat . . . 106
9.3 Rule for Subj/agnt . . . 107
9.4 Rule for Objc/thme . . . 107
9.5 Rule for PreC/Subj/over . . . 108
9.6 Summary of transformIntermediateGraph . . . 110
9.7 Summary of applyConclusion . . . 111
9.8 Semantic CG for "Darkness was on the surface of the deep" . . . 120
9.9 Graphs produced from intermediate graphs, Genesis 1:1-3 . . . 121

10.1 Machine translation from Hebrew to English CGs . . . 124
10.2 Machine translation from Hebrew to CGs to other natural languages . . . 125
10.3 Example of better handling of direct speech . . . 135


10.4 Example of embedded quote . . . 136
10.5 Rule for transforming multi-layered parallel constructions . . . 136

B.1 Overview of Werkgroep Informatica analysis-procedure. . . . . . . . . . . . . . 176

E.1 Objects of type "Word" and "Phrase" . . . 205
E.2 A small EMdF database . . . 206


List of abbreviations

AI Artificial Intelligence
AP/AdjP Adjective Phrase
AdvP Adverb Phrase
API Application Programming Interface
AST Abstract Syntax Tree (a term in compiler construction in Computer Science)
BHS Biblia Hebraica Stuttgartensia (standard scholarly edition of the Hebrew Bible)
CG Conceptual Graph
CGIF CG Interchange Form (a CG expression format)
CjP Conjunction Phrase
Deut Deuteronomy (Book of the Bible)
EMdF Extended MdF (Emdros text database model)
Emdros Engine for MdF Database Retrieval, Organization, and Storage
Gen Genesis (Book of the Bible)
LCPC List of Conclusion-Premise Concepts (see section 9.3.4)
LCRC List of Conclusion-Result Concepts (see section 9.4.2)
LMC List of Matched Concepts (see section 9.4.2)
LMR List of Matched Relations (see section 9.4.2)
LPNMC List of Premise Non-Matched Concepts (see section 9.4.2)
id_d A unique ID of an Emdros object in a database
Judg Judges (Book of the Bible)
MA Master of Arts. A translation of the Danish "cand.mag."
MdF Monads dot Features (text database model)
MQL Mini QL (Emdros query language)
NIV New International Version (of the Bible)
NP Noun Phrase
P Premise
PP Prepositional Phrase
PPObj PP object (i.e., the NP governed by the preposition(s) in a PP)
Prep Preposition
VP Verb Phrase
VU Vrije Universiteit (Free University) (of Amsterdam)
WI Werkgroep Informatica (Workgroup for Informatics), Faculty of Theology, VU
WIVU Werkgroep Informatica, Vrije Universiteit


Acknowledgements

No man is an island, as the poet says. In this context of acknowledgements, this means in particular that no person grows by his own strength alone, and no research takes place in a vacuum. I have much to be grateful to others for, both in terms of growth and research. Prof. dr.scient., PhD Peter Øhrstrøm has been an invaluable help not only with this MA thesis, but by supporting my growth as an academic and as a teacher. PhD candidate, MA Henrik Schärfe has been a great colleague who has supported my efforts to grow as a teacher, who has provided many good conversations on research-related topics, and who also gave me the idea for my MA topic on an airport shuttle in Sofia, Bulgaria, in the hot Summer of 2002. Associate professor, dr.theol. Georg Adamsen has been a great support in my endeavors to become an academic. Associate professor, teol.dr. Nicolai Winther-Nielsen has had a great and lasting influence on my life in ways that are too numerous to recount here. I shall thus only mention the fact that he instigated in me a love for Hebrew and computational linguistics in that fateful Summer of 1996 when we first met. Professor Ordinarius Dr. Eep Talstra of the Free University of Amsterdam has helped me with his friendship, his guidance, his sponsorship of some of my work-related efforts, his generous offer to let me use his database in my MA, and by his kindness. PhD Katy Barnwell has been a great support in personal and spiritual growth over the years, and she also had the vision to bring me to Dallas, Texas, in 1998 and 1999, where I have spent some of my happiest moments. My mentor and friend, M.D., has been a support for so many years and in so many ways that I cannot express my gratitude to you. And finally, to my parents, for their constant love and support throughout the years.

S.D.G.


Part I

Background


Chapter 1

Introduction

1.1 Introduction

This thesis is concerned with methods for automatically transforming syntactic structures to conceptual structures. At the heart of my studies stands the assumption that it is possible to transform syntactic structures to conceptual structures by means of rules which transform syntax to semantics, combined with an ontology. The thesis can be seen as a sustained argument in favor of this assumption.

The concrete goal of my studies is to be able to transform a syntactic analysis of parts of the Hebrew text of Genesis chapter 1 from the Hebrew Bible into the conceptual graphs of John Sowa.1 The proposed method is partially rule-based, partially based on an ontology derived from a subset of a Hebrew-English lexicon covering the lexemes in the chosen text. The method will be implemented in a computer program along with a number of data files.

I have divided the thesis into two parts. Part I lays the foundation for the rest of my work, while Part II develops and discusses my method.

In the following, I briefly present some of the main chapters in Part I. First, I describe Hebrew as the chosen domain language (1.2). Then I touch on ontology as a basis for my work (1.3). Then I discuss conceptual graphs, their origins, and academic backing (1.4). I then proceed to defining the problem which my thesis strives to answer in a problem description (1.5). After that, I present some hypotheses which I hope will be made plausible during the course of my thesis (1.6). Finally, I conclude with an overview of Part I (1.7).

1.2 Hebrew

The Hebrew Bible begins with "the beginning." In its first book, Genesis, the story of the creation of the world is told, along with the Great Fall, Cain and Abel, Noah and the deluge, the Tower of Babel, and the story of how the Israelites came to be a people, told in the stories of Terah, Abraham, Isaac, Jacob, and Joseph.

1 Sowa (2000a).


In this thesis, I have restricted myself to dealing with Genesis chapter 1, or parts thereof. I really have two target texts: First, I only hope to be able to demonstrate my method on the first three verses of Genesis 1 (Genesis 1:1-3). However, I will try to make it plausible that the method could be extended to cover the whole of chapter 1, the whole of Genesis, and ultimately the whole of the Hebrew Bible, given enough attention to detail. Thus I will try to make it plausible that the method works at least in principle for large amounts of Hebrew text, while I only demonstrate it on Genesis 1:1-3.

One could ask, why this text? Part of an answer would be that the text is among the most studied in the whole of the Hebrew Bible. Thus adding to the volume of knowledge about the text can do little or no harm. Another part of the answer is that most of the text is fairly simple Hebrew with a low degree of polysemy and much repetition,2 thus simplifying my task.

One could also ask, why deal with Hebrew? This choice was motivated by two factors: First, the intrinsic interest-value in dealing with a foreign, Biblical, ancient language such as Hebrew. And second, the novelty of the problem perspective, in the fact that, to the best of my knowledge, no-one had yet tried to transform Hebrew to conceptual graphs before my study.

For the purposes of my thesis, the Hebrew text has to be available in machine-readable form, and a syntactic analysis has to be produced. Fortunately, a research group of the Faculty of Theology at the Free University of Amsterdam has done just this. Under the leadership of Prof. Dr. Eep Talstra, the Werkgroep Informatica (WI) has been at work since 1977 producing a syntactically analyzed database of the Hebrew Bible.3 The Free University is called "Vrije Universiteit" or VU in Dutch, so the database is called the WIVU-database. I have generously been granted access to the database, and this will be my source of both the text of Genesis 1 and a syntactic analysis of the text.

In Chapter 3 on page 31, I describe certain aspects of the Hebrew language and the WIVU syntactic analysis. That chapter only gives a brief introduction; the details can be found in Appendix B on page 157.

1.3 Ontology

To summarize John Sowa's definition,4 ontology is the study of the categories that exist in some domain. An ontology, on the other hand, is the product of such a study. In Chapter 4, starting on page 35, I present the ontology which underlies my work. This ontology has been derived automatically from a Hebrew-English lexicon married to the WordNet lexical database. It has also been merged with a consistent top ontology, so that the ontology can better serve as the basis for applying CG algorithms to the problem at hand.
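The central service such an ontology provides to CG algorithms is subsumption: deciding whether one type is a subtype of another. The following is a minimal sketch of that idea, assuming a hand-made toy hierarchy; the type names are illustrative only and are not taken from the thesis's actual ontology.

```python
# Toy sketch of an ontology as a type hierarchy with subsumption.
# Each type maps to its immediate supertype; "T" is the universal top type.
SUPERTYPE = {
    "Bird": "Animal",
    "Animal": "Entity",
    "Light": "Entity",
    "Entity": "T",
}

def is_subtype(t, u):
    """Return True if t equals u or is a (transitive) subtype of u."""
    while t is not None:
        if t == u:
            return True
        t = SUPERTYPE.get(t)  # walk one step up the hierarchy
    return False

print(is_subtype("Bird", "Entity"))   # True
print(is_subtype("Light", "Animal"))  # False
```

A real ontology would be a lattice rather than a tree (multiple supertypes), but the subsumption test is the same in spirit.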

2 There are 108 lexical forms in Genesis 1, and a total word count of 673, yielding an average repetition-rate of 6.23. Moreover, each word only occurs in a small number of senses, often only 1.

3 See, e.g., Talstra and Postma (1989), Talstra and Sikkel (2000), Talstra (2002).
4 See Sowa (2000a, p. 492) and section 4.1 on page 35 of this thesis.
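The repetition-rate in footnote 2 is just the token count divided by the number of distinct lexical forms; a quick check of the arithmetic:

```python
# Checking footnote 2's arithmetic: 673 word tokens divided by
# 108 distinct lexical forms gives the average repetition-rate.
tokens = 673
lexical_forms = 108
rate = float(tokens) / lexical_forms
print(round(rate, 2))  # 6.23
```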


1.4 Conceptual Graphs

Conceptual graphs became widely known with the publication of John F. Sowa's5 1984 book, "Conceptual Structures: Information Processing in Mind and Machine" (Sowa (1984)). This generated a lot of research, and gradually a research community was founded, resulting in a series of seven workshops, followed by eleven annual international conferences to date.6 In 2000, Sowa published another important book, "Knowledge Representation: Logical, Philosophical, and Computational Foundations" (Sowa (2000a)). Through numerous articles, Dr. Sowa has contributed to the development of the theory of conceptual graphs, as have many other researchers.7

Conceptual graphs are not the only contender for a formal language in which to represent meaning. Others include Concept Graphs,8 Word Graphs,9 the technologies propounded by the Semantic Web initiative,10 and others. The reasons I have chosen conceptual graphs over all of these include their simplicity, their expressivity, the availability of CG software manipulation tools, the academic community and backing behind CGs, and my familiarity with them.

I will not describe conceptual graphs further, but simply assume that the reader is familiar with them, at least enough to be able to understand my later discussions.11
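For readers who want only the barest picture: a conceptual graph is a bipartite structure of concept nodes and relation nodes. The sketch below is my own simplification, with illustrative type and relation names, linearizing one relation in a CGIF-like notation.

```python
# A minimal, simplified picture of a conceptual graph: concept nodes
# connected by relation nodes. Types and relation names are illustrative.
class Concept:
    def __init__(self, ctype, referent=None):
        self.ctype = ctype        # e.g. "Darkness"
        self.referent = referent  # an optional individual marker

class Relation:
    def __init__(self, rtype, args):
        self.rtype = rtype  # e.g. "on"
        self.args = args    # ordered tuple of Concept arguments

def linearize(rel):
    """Render a relation in a CGIF-like linear notation."""
    args = " ".join("[%s]" % c.ctype for c in rel.args)
    return "(%s %s)" % (rel.rtype, args)

on = Relation("on", (Concept("Darkness"), Concept("Surface")))
print(linearize(on))  # (on [Darkness] [Surface])
```

The full CG theory adds contexts, coreference links, and quantification on top of this skeleton.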

1.5 Problem description

How can one transform a text in natural language into a representation in a formal language, and do so automatically? In my thesis I wish to explore methods for automatically transforming the Hebrew text of Genesis chapter 1, verses 1-3, into the conceptual graphs of John Sowa. I will do so by developing a variation of a specific method, namely rule-based, stepwise, ontology-guided, syntax-driven transformation of syntactic structures to conceptual structures. What problems are involved in this process? How well does it work? What can be done when it does not work?

These and other questions I hope to answer in this thesis.
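The kind of rule-based, syntax-driven transformation just named can be caricatured in a few lines. The sketch below reuses relation names that appear later in the thesis (agnt, thme, ptim), but the rule format and the `transform` function are my own simplification, not the actual method under development.

```python
# Toy sketch: rules map clause-constituent labels from the syntactic
# analysis to semantic relations, producing graph edges. Hypothetical rules.
RULES = {
    "Subj": "agnt",  # subject     -> agent relation
    "Objc": "thme",  # object      -> theme relation
    "Time": "ptim",  # time phrase -> point-in-time relation
}

def transform(verb, constituents):
    """constituents: list of (label, phrase_head) pairs from the parse."""
    edges = []
    for label, head in constituents:
        rel = RULES.get(label)
        if rel is not None:
            edges.append("[%s]->(%s)->[%s]" % (verb, rel, head))
    return edges

print(transform("create", [("Subj", "God"), ("Objc", "heavens")]))
# ['[create]->(agnt)->[God]', '[create]->(thme)->[heavens]']
```

The real method is stepwise and ontology-guided: the choice of relation depends not only on the constituent label but also on the types of the concepts involved.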

5 Dr. Sowa has a website at <http://www.jfsowa.com/>.
6 The proceedings of the Seventh Annual Workshop on Conceptual Structures are available as Pfeiffer and Nagle (1993). The international conference on conceptual structures (ICCS) has been held annually since 1993. See Mineau, Moulin and Sowa (1993), Tepfenhart, Dick and Sowa (1994), Ellis, Levinson, Rich and Sowa (1995), Eklund, Ellis and Mann (1996), Lukose, Delugach, Keeler, Searle and Sowa (1997), Mugnier and Chein (1998), Tepfenhart and Cyre (1999), Ganter and Mineau (2000), Stumme (2000), Delugach and Stumme (2001), Mineau (2001), Priss, Corbett and Angelova (2002), Angelova, Corbett and Priss (2002), and Moor, Lex and Ganter (2003).
7 In my own small way, I have contributed to the advancement of the theory of conceptual graphs through writing some teaching materials for teaching conceptual graphs and Prolog+CG to students from the humanities. See <http://www.huminf.aau.dk/cg/>. These materials have been developed through the academic and financial sponsorship of the Department of Communication, Aalborg University, and the Flexnet project of the Department of Humanistic Information Science, University of Southern Denmark.
8 Wille (1997), extended in Wille (1998).
9 Hoede and Liu (1996), extended in Hoede and Liu (1998).
10 See <http://www.semanticweb.org/>.
11 I refer the interested reader to <http://www.huminf.aau.dk/cg/>, Module 1, for a tutorial.


1.6 Hypotheses

I have some hypotheses which I hope to make plausible during the course of my thesis. These hypotheses have grown out of my thinking about the subject matter; some have been hunches from the start, while others have become clearer as I have worked with the data. Some have sprung from my work suddenly and forcefully.

My hypotheses are:

1. It is possible to transform syntactic structures to conceptual structures by means of the method under development, with a considerable degree of success. In particular, the conceptual structures will have no syntax left from the source text, and will only contain semantics. In addition, a human reader will be able to recognize a large degree of similarity between the semantics of the CGs and a possible meaning of the natural language text.

2. The ontology which I developed in my 9th semester project is adequate for the purposes of this thesis, given certain modifications.

3. Syntax alone cannot yield enough semantics; the ontology will demonstrably play a key role in yielding meaning from the text.

4. The method I have chosen to develop will work for Hebrew, in particular when using the syntactic analysis of the text which I have at my disposal as a basis.

5. It will be beneficial to transform the syntactic analysis to something which is closer to traditional generative syntax trees.

6. The number of phrase structure rules found in any given amount of text grows slowly with the number of words analyzed.

7. The number of clause valency patterns grows faster with the amount of text analyzed than the number of phrase-level phrase structure rules. Thus it will be easier to treat clause-level using a different method than the one used at phrase-level.

In Section 11.4 on page 141, I return to these hypotheses and check whether they have been validated.
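Hypotheses 6 and 7 concern growth rates, which the thesis later examines by plotting ln(rules) against ln(words) (Appendix H). If rules grow roughly as c · words^a, then ln(rules) = a·ln(words) + ln(c), and the exponent a can be estimated by a least-squares line fit on the logs. The data points below are made up purely for illustration.

```python
import math

# Made-up (words, rules) counts, for illustration only.
words = [100, 200, 400, 800]
rules = [20, 26, 34, 44]

xs = [math.log(w) for w in words]
ys = [math.log(r) for r in rules]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Least-squares slope of ln(rules) against ln(words):
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
print(round(slope, 2))  # a slope below 1 indicates sub-linear growth
```

A slope well below 1 for phrase-level rules, and a larger slope for clause valency patterns, is what hypotheses 6 and 7 jointly predict.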

1.7 Overview of Part I

In Chapter 2, I describe the tools which I have used in building an implementation of my method. In Chapter 3, I describe Hebrew as a language. In Chapter 4, I describe ontology as a basis for my further work. Finally, in Chapter 5, I give a literature review of the field of my thesis.


Chapter 2

Tools

2.1 Introduction

In this chapter, I describe three of the technical tools used in the implementation of my method. The first is the Jython programming language. The second, Notio, is a Java-based implementation of the theory of conceptual graphs. The third is Emdros, a text database engine which is used at the VU for storing and retrieving the WIVU-database.

All three are foundational to my work. The Jython language is foundational in that it provides the concrete computer programming language in which to express the algorithms of the method. Notio and Emdros are foundational in the sense that they function as prerequisites upon which the rest of the application is built. If I had not had access to these tools, the development time would have been significantly longer.

In the following, I describe first Jython (2.2), then Notio (2.3), and lastly Emdros (2.4). Finally, I summarize the chapter in a conclusion (2.5).

2.2 Jython

Jython1 is a programming language based on two other programming languages, namely Python2 and Java.3 Jython is an implementation, in Java, of Python, and provides access to Java programs from within the Python language.

Jython is a general-purpose programming language. It is also reasonably fast, quite robust, with good compiler-support, and has a level of abstraction that is more than adequate for my purposes. It also allows for rapid prototyping and quick changes to a program. These characteristics make the language a good choice in which to implement my method.
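The Java-from-Python interplay can be illustrated as follows. Under Jython, java.util.ArrayList imports like any Python module; the fallback class below only exists so that the sketch also runs under plain CPython, and the tokens stored are illustrative transliterations.

```python
# Jython's key feature: Java classes import and behave like Python objects.
try:
    from java.util import ArrayList   # succeeds under Jython only
except ImportError:
    class ArrayList(list):            # CPython stand-in with the same calls
        def add(self, item):
            self.append(item)
        def size(self):
            return len(self)

words = ArrayList()
words.add("BR>CJT")   # transliterated Hebrew tokens, for illustration
words.add("BR>")
print(words.size())   # 2
```

Because Java objects are first-class in Jython, a Java CG library like Notio can be scripted from Python code with no wrapper layer.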

1 See <http://www.jython.org/>.
2 See <http://www.python.org/>.
3 See <http://java.sun.com/>.


2.3 Notio

Notio4 is an API5 for developing applications in Java which need to manipulate conceptual graphs. Notio is well-documented,6 with documentation in English, and by design seeks to adhere to the CG standard. It seems to have about the right level of abstraction for my purposes in the primitives it provides.

There are also other CG frameworks besides Notio. The three contenders were:

• CoGITaNT,7

• Prolog+CG,8 and

• Amine Platform.9

Amine was not ready and available early enough to be a real contender. Prolog+CG does not support the full CG theory in that it lacks indexicals, monadic and triadic relations, and other desirable properties of the full CG theory. The drawbacks of CoGITaNT, which led to its rejection, include the fact that it seems overly complex for my purposes, and the fact that it does not adhere to the CG standard.10

For completeness, I should also mention Synergy by Prof. Dr. Adil Kabbaj.11 However, Synergy was not really a contender, since it is not a framework for representing knowledge using conceptual graphs, but rather a programming language based on conceptual graphs. In the same vein, pCG12 might have been a contender, were it not for the fact that it is a process-oriented language based on Conceptual Graphs. Thus it is not for general knowledge-representation.

Other arguments in favor of Notio can be advanced as follows. CoGITaNT is written in C++, which is not a language known for its friendliness towards rapid application development. Since my MA work largely involved methodological explorations, with concomitant rapid change of the code-base, I needed to work in a language which provided rapid turn-around times in terms of changes to the code-base. Since Notio was written in Java, it could be trivially exposed in Jython. Since Jython is both familiar to me as a programming language and is known to provide very rapid turn-around times in terms of changes to the program, the Notio-Jython combination seemed very appealing. I could, of course, have wrapped the CoGITaNT API in Python or Java with the SWIG program13 and achieved the same effect. However,

4 See Southey and Linders (1999) and <http://notio.lucubratio.org/>.
5 Application Programming Interface. A specification of how programmers can access the services which a piece of software offers.
6 See <http://notio.lucubratio.org/>.
7 See Genest and Salvat (1998) and <http://cogitant.sourceforge.net/>.
8 See Kabbaj and Janta-Polczynski (2000), Kabbaj, Moulin, Gancet, Nadeau and Rouleau (2001), and <http://prologpluscg.sourceforge.net/>.
9 See <http://amine-platform.sourceforge.net/>.
10 CoGITaNT implements the Nested Conceptual Graph (NCG) standard, which deviates from the Standard Conceptual Graph model (SCG) in the way nested graphs are handled. See Genest and Salvat (1998, pp. 156–159) and Chein and Mugnier (1997, pp. 101–108).
11 See Kabbaj (1999a) and Kabbaj (1999b).
12 See Benn and Corbett (2001).
13 SWIG is the “Simplified Wrapper Interface Generator.” It is available from <http://www.swig.org/>.


as mentioned before, CoGITaNT is somewhat heavy-weight in relation to my needs, and Notio seemed to provide a better level of abstraction and a smaller, simpler range of features which better suited my needs.

Thus Notio seemed to be the best choice for a CG framework on which to base my studies.

2.4 Emdros

2.4.1 Introduction

Emdros14 is a “text database engine for analyzed or annotated text.”15 It is an engine in that it acts as a middle-ware layer offering certain services to other software. It is a database engine in that the services it offers are database-oriented services. It is a text database engine in that its intended database application domain is text.

Emdros deals with “analyzed or annotated text,” meaning that it handles text plus information about that text. This extra information can be anything from linguistic analyses to document structure to pagination. The Werkgroep Informatica uses Emdros to store its version of the Hebrew Bible, plus its linguistic analyses and the traditional book-chapter-verse document structure of the Hebrew Bible.

In the following section, I briefly introduce the most important concepts in Emdros. The interested reader is referred to Appendix E on page 203, where I elaborate more on Emdros.

2.4.2 Emdros concepts

Emdros rests on four concepts: Monad, object, object type, and feature. In this section, I briefly introduce these concepts.

A monad is simply an integer. The sequence of integers (1, 2, 3, etc.) is used to dictate the reading order of the text.

An object is a set of monads plus some attributes, called “features”. The notation for accessing an object’s features is “Object.feature”. Thus a word’s surface text might be denoted “w1.surface”.

Objects are grouped in object types, e.g., Word, Phrase, Clause, or Verse. The object type determines what features an object has.

One special feature which always exists on an object is “self”, referring to the object’s ID in the database (called an id_d). Thus the feature-value “O.self” refers to the unique ID of the object “O”.
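To make the four concepts concrete, here is a small sketch in Python (the language family used elsewhere in this thesis). This is emphatically not the Emdros API; the class, method names, and feature values below are hypothetical, serving only to illustrate monads, objects, object types, and features as just described.

```python
class EmdrosObject:
    """A toy object: a set of monads plus features; its object type
    determines which features it carries.  Illustrative only."""
    _next_id_d = 1

    def __init__(self, object_type, monads, **features):
        self.object_type = object_type      # e.g. "Word", "Phrase"
        self.monads = set(monads)           # monads dictate reading order
        self.features = dict(features)
        # The special feature "self" always exists: the object's
        # unique database ID (an id_d).
        self.features["self"] = EmdrosObject._next_id_d
        EmdrosObject._next_id_d += 1

    def get(self, feature):
        # Mirrors the "Object.feature" notation, e.g. "w1.surface".
        return self.features[feature]


# Two Word objects and a Phrase spanning both of them:
w1 = EmdrosObject("Word", {1}, surface="In-the-beginning")
w2 = EmdrosObject("Word", {2}, surface="created")
p1 = EmdrosObject("Phrase", {1, 2}, phrase_type="VP")

assert w1.get("self") != w2.get("self")     # "self" IDs are unique
assert p1.monads == w1.monads | w2.monads   # objects are monad sets
```

The sketch shows only the data model; the real engine adds storage, retrieval, and a query language on top of it.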

I shall refer to some of these concepts later in the thesis.

Thus Emdros is a text database engine for storing and retrieving analyzed or annotated text.

Its significance for my thesis is that it provides the easiest way of accessing the WIVU database. It is based on monads, objects, object types, and features.

14 Emdros is introduced for linguists in Petersen (forthcoming).
15 See the Emdros website: <http://emdros.org/>. The quote is the Emdros slogan displayed on the front page.


2.5 Conclusion

In this chapter I have described three tools which I have used for implementing my method. The tools are the Jython programming language, the Notio CG API, and the Emdros text database engine. All of these have proven to be useful in implementing my method, lightening the load by providing services upon or through which to build my application.


Chapter 3

Hebrew

3.1 Introduction

In this chapter, I lay the foundation for my later work, in so far as knowledge of Hebrew and the Hebrew Bible is concerned. I start by introducing the Hebrew Bible as a whole (3.2). Then I treat the features of the Hebrew language which are necessary for understanding my later discussions (3.3). Then I describe the analysis of the Hebrew text as it is embodied in the WIVU database (3.4). Finally, I sum up my findings in a conclusion (3.5).

I shall not go into detail in this chapter: The details are saved for Appendix B on page 157. Instead, this chapter only summarizes the main points of Appendix B. I have done this for two reasons: A) so as to save space in the main part of the thesis, and B) because describing the Hebrew language is not my main intention in this thesis. My main intention is to describe a method for transforming Hebrew to conceptual graphs. The interested reader can find additional detail in Appendix B, should the need arise.

3.2 The Hebrew Bible

“The Hebrew Bible” is the term commonly applied to what Christians call the “Old Testament” in Hebrew and Aramaic.1 It is a corpus consisting of approximately 426,000 words,2 organized in 39 books. Each book has one or more sequentially numbered chapters, each of which is subdivided into a number of sequentially numbered verses. The standard way of referring to a given passage by book-chapter-verse is to follow the formula: “Book chapter:verse(s)”. So “Genesis 1:1-3” would refer to the book called “Genesis” chapter “1” verses “1 to 3.” It is also standard practice to abbreviate the book-name. In this thesis, I will use “Gen” for “Genesis.”
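As an aside, the “Book chapter:verse(s)” formula lends itself to straightforward mechanical parsing. The following sketch is my own illustration, not part of the thesis’s method; the function name and return shape are hypothetical.

```python
import re

# "Book chapter:verse" with an optional "-verse" range, e.g. "Gen 1:1-3".
REF = re.compile(r"^(?P<book>\w+) (?P<ch>\d+):(?P<v1>\d+)(?:-(?P<v2>\d+))?$")

def parse_ref(ref):
    """Parse a reference into (book, chapter, list-of-verses)."""
    m = REF.match(ref)
    if m is None:
        raise ValueError("not a valid reference: %r" % ref)
    v1 = int(m.group("v1"))
    v2 = int(m.group("v2") or v1)   # single verse if no range given
    return m.group("book"), int(m.group("ch")), list(range(v1, v2 + 1))

assert parse_ref("Gen 1:1-3") == ("Gen", 1, [1, 2, 3])
assert parse_ref("Gen 1:3") == ("Gen", 1, [3])
```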

The standard scholarly edition of the Hebrew Bible is called the “Biblia Hebraica Stuttgartensia”, or BHS for short.3 The BHS is, of course, written using a Hebrew script. In this thesis, I will not use a Hebrew script when writing Hebrew words, but rather use the transliteration used in the Werkgroep Informatica BHS.4

1 Jews following Judaism do not, of course, believe that the predicate “Old” applies to the Hebrew Bible in the sense that something (viz., the New Testament) has replaced the “Old” by fulfillment and partial obsolescence.
2 Count of words in the Emdros database of the Werkgroep Informatica Hebrew Bible. The exact count is 426,477.

Thus the Hebrew Bible is the “Old Testament” in Hebrew and Aramaic. It is structured in books, chapters, and verses, and specific sections are referenced by the formula “Book chapter:verse(s)”, e.g., “Gen 1:3”. The standard scholarly edition of the Hebrew Bible is called BHS. In this thesis, I will be using a transliteration rather than a Hebrew script.

3.3 The Hebrew language

3.3.1 Introduction

In this section, I briefly introduce some concepts from Hebrew which are necessary for understanding my later treatment. I mostly refer to Appendix B for the details, and here only give the most cursory treatment of each topic.

First, I introduce the parts of speech in Hebrew (3.3.2). Then I introduce the basic elements of Hebrew morphology (3.3.3). And this will suffice for my purposes in the rest of the thesis.

3.3.2 Parts of speech

Hebrew has a number of parts of speech, all listed in Table B.1 on page 159. These parts of speech are for the most part not unusual, since most of them are present in the Indo-European languages. In fact, the odd ones out (“Predicators of existence” and “Discourse markers”) are not classified as such in the WIVU database, but as other, more well-known parts of speech. Hence they pose no problem for understanding my method.

3.3.3 Hebrew morphology

Hebrew nouns, adjectives, pronouns, verbs, and participles are inflected for number and gender. Number is either singular, dual, or plural, and gender is either masculine or feminine. Verbs and pronouns are inflected for person. (See Section B.2.3.3 on page 159.)

Verbs and participles are further inflected for tense and mood. Mood is used for showing reflexivity, passive/active voice, and intensity (see Section B.2.4.4 on page 162). Tense may show either time or aspect or both; scholars tend to disagree on the correct interpretation of tenses (see Section B.2.4.3 on page 161).

Number, gender, and person are realized as suffixes. There are three kinds of suffixes: nominal endings, verbal endings, and pronominal suffixes. Nominal endings show the noun’s number/gender. Verbal endings show the person/number/gender of the subject of the verb. Pronominal suffixes show possession on nouns, whereas on verbs they show the verb’s object. (See Section B.2.3.3 on page 159.)

3 Elliger, Rudolf and Weil (1997).
4 See Appendix A.


Nouns, adjectives, and participles show a feature called “state”. It can be either “absolute” or “construct”. This is used to specify genitive-relations between words (e.g., “the bridle OF the horse OF the king”; see Section B.2.5.2 on page 164).

3.4 The WIVU database

3.4.1 Introduction

In this section, I mention a few concepts which are relevant for my later discussions. Most of the concepts are described under the heading of “distributional vs. functional data” (3.4.2). One concept which deserves special mention is that of subphrases, and accordingly this has been given a section of its own (3.4.3). Subphrases play a very important role in my method when I transform the WIVU syntax into more fine-grained syntax trees (Chapter 7 starting on page 67). In this section, I shall mostly refer to Appendix B for the full treatment, and here only summarize the main points.

3.4.2 Distributional vs. functional data

The most striking feature of the WIVU Hebrew database may well be its distinction between distributional and functional data. The distinction centers around the notions of form and function: Form is what you can see directly in the text, whereas function is derived from the form.

For word-level, these two kinds are conflated into the same Emdros object. The distributional data for Word-level include such things as which affixes are present, and the word’s part of speech according to an analytical lexicon. The functional data for Word-level include person, number, gender, tense, and state, since these are derived from the distributional data such as morpheme affixes.

For phrase-level, clause-level, and sentence-level, each level is split into distributional and functional units. Thus there are two kinds of phrases: Distributional “phrase atoms” and functional “phrases”. The same is true for clause-level and sentence-level (i.e., clause atoms vs. clauses, sentence atoms vs. sentences).

Distributional atom-units are contiguous, and may not be complete units. Functional units are made up of distributional units of the same level (e.g., phrases are made of phrase atoms). Thus, while functional units sometimes coincide with their distributional counterparts, the functional units may be larger, comprising more than one distributional unit of the same level.

For phrase-level, functional phrases are the largest units which have a function in the clause, whereas phrase atoms are the next largest phrasal units. Functional phrases are given a label signifying their role in the clause. These labels can be seen in Table B.10 on page 182, and will be used later in the thesis.

PPs are not split into their head preposition and object NP, not even at the phrase-atom-level. This calls for specific methods of treating PPs, which I will explicate later.

For clause-level, distributional clause-atoms are mostly full clauses with a single verbal predicate, alternatively a nominal clause. However, in the cases of ellipsis and embedding, clause atoms represent the contiguous, adjacent units which make up the matrix functional clause and


its dependents. Functional clauses are full clauses with a single verbal predicate, alternatively a nominal clause.

Each clause has a “text type” which shows the type as either “Unknown”, “Narrative”, “Quotation”, or “Discourse.” I use this text type when deciding the outermost context of each CG representation of a clause.

All of this is described in detail in Section B.3.3 on page 171. In the above, I have also taken concepts from the description of words (B.3.5 on page 177), phrases (B.3.6 on page 179), and clauses (B.3.7 on page 183).

3.4.3 Subphrases

The WIVU database lumps large stretches of words together at phrase-level. Therefore, to remedy the gap that arises between word-level and phrase-level, the WIVU database introduces a layer called “subphrases” in between word-level and phrase-level. This is used for such things as parallel constructions (e.g., “A and B and C”), regens-rectum-constructions5 (the genitive constructions mentioned previously, e.g., “the bridle OF the horse OF the king”), and attributive constructions (with adjectives). I will not treat them further here, but refer the reader to Section B.3.5.3 on page 178.

3.5 Conclusion

I have briefly introduced the Hebrew Bible, the Hebrew language, and the WIVU database as far as is necessary for my later discussions. Details can be found in Appendix B. Along the way, I have referred to specific sections in that appendix, and the interested reader is encouraged to read the relevant sections in the appendix.

5 Also referred to as “status constructions” or “state constructions.”


Chapter 4

Ontology

4.1 Introduction

Sowa (2000a, p. 492) very nicely defines what ontology is:

“The subject of ontology is the study of the categories of things that exist or may exist in some domain. The product of such a study, called an ontology, is a catalog of the types of things that are assumed to exist in a domain of interest D from the perspective of a person who uses a language L for the purpose of talking about D.” (p. 492; emphasis in original.)

Thus an ontology is always about a certain domain of interest, always has a certain perspective, and is always expressed in some language.

In my work, I need an ontology to stand behind the conceptual graphs produced. This is because an ontology is always behind every conceptual graph, and Sowa (2000a, p. 487) explicitly states that every knowledge base must have a type hierarchy (i.e., ontology). Since what I am building is a knowledge base, I need an ontology.

What is a domain? The Oxford Concise Dictionary1 gives the definition “scope, field, province, of thought or action” (p. 285). Thus a domain is a scope or field of thought, the encompassment of the concepts in a particular field of thought. For example, the united domain of cars and airplanes would include concepts such as “wing”, “wheel”, “transmission”, “jet-engine”, “aviation”, “driving”, “yaw”, “roll”, “pitch”, “veer”, “slip”, “skid”, “road”, “runway”, “corridor”, “airspace”, “map”, “driver”, “pilot”, “co-driver”, “co-pilot”, and a host of other concepts2 which are all related by being in the same field of thought.

My ontology shares the characteristics described by Sowa. It is about a certain domain: The domain of discourse described in Genesis 1. It is made from a certain perspective, namely a cross between that of the WordNet editors and my own reading of the text. And it is expressed in a certain language, namely English.

1 Seventh edition: Sykes (1982).
2 Here I use the term “concept” more broadly than in the CG theory. Here, I really mean “conceptual type”, whereas in the CG theory, a concept is a concept type paired with a referent.


All three characteristics are important: The first characteristic (domain) is important because of what is not there: It does not describe cars or airplanes, because they are not necessary for describing what is in Genesis 1. It is also important because of what is there: It delimits and identifies what concepts are really necessary for expressing the semantic content of the text. The second characteristic (perspective) is important because it influences how the ontology is shaped, and hence how one can interpret the graphs which my method produces. The third characteristic (language) is important because it highlights a certain fact to which I shall return later,3 namely that my method is partially a method involving machine translation between languages (Hebrew and English-in-conceptual-graphs).

4.2 9th semester work

4.2.1 Introduction

I have chosen to base my ontology on the ontology which I created in my 9th semester report.4 In that report, I showed how one can take a concise Hebrew-English lexicon5 and programmatically match it with WordNet6 to produce an English-language type hierarchy.

In the following, I will recount the most important aspects of my 9th semester work. First, I will recount some of the ontological concepts or ideas which I described (4.2.2). Next, I will describe the Hebrew-English lexicon on which my work is based (4.2.3). Next, I will briefly describe WordNet (4.2.4). Finally, I will describe the method I used (4.2.5).

4.2.2 Ontological concepts

In my 9th semester report (pp. 11–14), I described various concepts or ideas related to ontology. Work in the field of ontology can be seen as being placed on a continuum ranging from “purely metaphysical considerations to very formal considerations (formal ontology)” (p. 12). Since what I was dealing with was formal ontology, I described some important concepts or ideas from formal ontology.

One very important concept is that of “ontotype.” Ontotypes are also called “conceptual types”, “types”, and “categories” in the literature. An ontotype corresponds to one of the categories that Sowa mentions in his definition of ontology cited above. Thus ontotypes are the stuff out of which ontologies are made.

Two other important concepts are “individual” and “extension”. Any ontotype may have zero or more individuals in its extension. That is, while an ontotype always is abstract, referring to a category of entities, the extension of the ontotype is the set of real-world entities that exist belonging to that category. Those real-world entities may be physical (as in the individual “Jerusalem” belonging to the ontotype “city”) or abstract (as in the individual “42” belonging to the ontotype “natural number”).

3 See Section 10.2 on page 123.
4 Petersen (2003). You can download my 9th semester report from <http://www.hum.aau.dk/~ulrikp/studies.html>.
5 The lexicon chosen was that of the Werkgroep Informatica. The lexicon has a homepage here: <http://jakob.th.vu.nl/~hjb/lexicon.html>.
6 See Fellbaum (1998) and <http://www.cogsci.princeton.edu/~wn/>.

A concept related to “extension” is “intension.” The intension of an ontotype is the set of characteristics which all or most of the individuals in the extension of the ontotype have in common, and which serve to define the ontotype.

A formal ontology, then, is “a logical specification of essential concepts within a domain structured by relationships identified between the concepts” (Nilsson (2001)). The relationship most commonly used to structure the domain is that of the is-a relation, also called the “hyperonymy” relation. If ontotype A is-a ontotype B, then B is said to be a hyperonym of A. B is also said to be a supertype of A, while A is said to be a subtype of B. A inherits all of the intensional properties of B, while adding some intensional properties of its own. Therefore, A is said to be a specialization of B, while B is a generalization of A. Any ontotype may have more than one supertype, which is called multiple inheritance.

When ontotypes are ordered by an is-a relation, this can be seen as a partial order over ontotypes. Partial orders give rise to mathematical structures called lattices. Hence, a formal ontology is often constructed in such a way that it becomes a lattice. This, in particular, means that for any two ontotypes in the lattice, it is always possible to find a unique least common supertype, and a unique greatest common subtype.7 Also, it means that there must always be a unique topmost element, variously called Entity, Universal, or Top, and a unique bottommost element, which I will call Absurdity.
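The least-common-supertype operation can be made concrete with a small Python sketch over a toy is-a hierarchy. The type names below are hypothetical, not taken from the thesis ontology; in this tree-shaped fragment the most specific common ancestor can be found by comparing ancestor sets, and the lattice property is what guarantees that the choice is unique in general.

```python
# Each type maps to its direct supertypes (a toy, tree-shaped fragment).
ISA = {
    "Cat": ["Mammal"], "Dog": ["Mammal"],
    "Mammal": ["Animal"], "Car": ["Artifact"],
    "Animal": ["Entity"], "Artifact": ["Entity"],
    "Entity": [],           # the unique topmost element
}

def ancestors(t):
    """The reflexive-transitive closure of the is-a relation from t."""
    result = {t}
    for s in ISA[t]:
        result |= ancestors(s)
    return result

def least_common_supertype(a, b):
    common = ancestors(a) & ancestors(b)
    # The least (most specific) common supertype is the common ancestor
    # with the largest ancestor set, i.e. the one deepest in the order.
    return max(common, key=lambda t: len(ancestors(t)))

assert least_common_supertype("Cat", "Dog") == "Mammal"
assert least_common_supertype("Cat", "Car") == "Entity"
```

With multiple inheritance, `max` alone would not suffice; the full lattice structure is what ensures a unique answer.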

This concludes my recounting of the most important ontological concepts which I described in my 9th semester report.

4.2.3 The lexicon

The lexicon used is the one produced in the Werkgroep Informatica, by PhD-candidate Hendrik Jan Bosman and Prof. Dr. Ferenç Postma. In my 9th semester report, I did not describe the full range of features found in the lexicon, but restricted myself to a few that were relevant for my work. In the following, I recount what I said back then (pp. 15–16).

The lexicon divides lexemes into fourteen parts of speech. Each lexical entry is keyed to a lexeme, and contains a number of fields.

Some fields are present for all lexemes, whereas some are present only for some parts of speech, and some are present only optionally. Every entry has a lexeme, a part of speech, an English gloss, and a German gloss. Nouns may also have gender, semantic set, plural form, and dual form. The semantic set of a noun is one or more labels drawn from a small, fixed set of semantic designations describing some general tenet of the semantics of the noun. Verbs may optionally have a verbal stem (binyan).

The structure of the glosses follows a certain pattern. Within each gloss, different senses are separated by semicolon, whereas different glosses for the same sense are separated by comma. Sometimes, a gloss will have an elaboration, either in <angle brackets> or in (parentheses).
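The gloss convention just stated can be sketched mechanically as follows. This is an illustrative simplification under the stated conventions, not the thesis’s actual parser (which builds a full AST); the function name and the sample entry are hypothetical.

```python
import re

def parse_gloss(gloss):
    """Split a gloss into senses (";"), each a list of alternative
    glosses (","), with any trailing elaboration separated out."""
    senses = []
    for sense in gloss.split(";"):
        alternatives = []
        for alt in sense.split(","):
            alt = alt.strip()
            # A trailing elaboration is either <angle brackets>
            # or (parentheses); None if absent.
            m = re.match(r"^(.*?)\s*(<[^>]*>|\([^)]*\))?$", alt)
            alternatives.append((m.group(1), m.group(2)))
        senses.append(alternatives)
    return senses

# Hypothetical entry in the style "create; make, form (by hand)":
parsed = parse_gloss("create; make, form (by hand)")
assert parsed == [[("create", None)],
                  [("make", None), ("form", "(by hand)")]]
```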

7 See Braüner, Nilsson and Rasmussen (1999, pp. 458–459) for a nice introduction. Nilsson (2001) goes into greater detail.


This concludes my recounting of my description of the lexicon in my 9th semester report.

4.2.4 WordNet

WordNet is an “electronic lexical database”, which is also the title of the main work currently describing WordNet (Fellbaum (1998)). In my 9th semester report, I described the aspects of WordNet which were most important for my work (pp. 14, 17).

WordNet contains almost two hundred thousand lexical items, organized into so-called synsets. A synset is a set of lexical items which are related by synonymy. The synsets, in turn, are related by various semantic relations. For synsets of nouns and verbs, the most important one is that of hyperonymy. Synsets of adjectives and adverbs are not ordered according to hyperonymy, but by means of other relations which I did not use in my 9th semester work.

WordNet’s nouns and verbs are organized into several hierarchies, each with a “unique beginner”. The unique beginners are not connected at the top. To remedy this deficiency, Miller (1998, p. 30) suggested a way of organizing the unique beginners for nouns into a hierarchy with more levels (p. 14). In a similar vein, Martin (1995) has shown how to organize the unique beginners for nouns according to Sowa’s top-level distinctions described in Sowa (1992).

WordNet provides an API which I have exploited in my 9th semester work. The two most important functions are: A) finding hyperonyms recursively (up to the unique beginner), and B) finding just synsets.

4.2.5 Method

The bulk of my 9th semester report went into describing my method (pp. 17–27). The following gives an overview of the most important aspects of my method.

In my method, I invented a number of new data-structures to describe and contain the ontology. I declared that an ontology was a list of entry clusters, where each entry cluster contained zero or more ontology entries (p. 20). Each of these concepts will be explained in more detail below.

An entry cluster corresponds to a WordNet synset. It has: A) an ID that is unique within the ontology, B) the WordNet synset on which it is based, C) an English gloss, and D) a set of IDs pointing upwards to the entry clusters immediately above it in the synset hierarchy. In addition, it has a set of IDs pointing to zero or more ontology entries which it contains. (p. 20).

An ontology entry corresponds to a sense of a lexeme in the Hebrew-English lexicon. It has: A) an ID that is unique within the ontology, B) the lexeme on which it is based, and C) an English gloss. It does not have a placement in the overall ontology, except indirectly, through the entry cluster of which it is a member. (p. 20).

An ontology, then, is a list of entry clusters, and the entry clusters themselves have the upwards-pointing IDs which make this list into a lattice. (p. 20).
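The data structures just described might be rendered in outline as follows. The field names and sample values are my own; the 9th semester report defines the authoritative versions.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyEntry:
    """One sense of a lexeme from the Hebrew-English lexicon."""
    id: int
    lexeme: str
    gloss: str                # English gloss

@dataclass
class EntryCluster:
    """Corresponds to a WordNet synset."""
    id: int
    synset: str               # the synset on which it is based
    gloss: str
    supertype_ids: set = field(default_factory=set)  # upward links
    entry_ids: set = field(default_factory=set)      # contained entries

# An ontology is then a list of entry clusters; the upward-pointing
# IDs are what turn this list into a lattice.
top = EntryCluster(1, "entity.n.01", "entity")
animal = EntryCluster(2, "animal.n.01", "animal", supertype_ids={1})
entry = OntologyEntry(10, "BHMH", "beast")    # hypothetical lexeme
animal.entry_ids.add(entry.id)
ontology = [top, animal]

assert 1 in animal.supertype_ids
assert entry.id in animal.entry_ids
```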

The only parts of speech which are in the ontology are the most important open-set classes,8

8 In linguistics, lexical classes (or word-classes, or parts of speech) can be divided into open classes and closed classes. Closed classes are those whose membership is fairly stable over time, i.e., in which not many new members


namely noun, verb, adjective, and adverb. This is largely because WordNet only contains these parts of speech. However, in my 9th semester report, I also gave a number of other reasons for this choice. First, the parts of speech found in WordNet carry the most semantic vs. grammatical information. The other parts of speech also carry semantic meaning, but their overall contribution to the meaning of a sentence is often more grammatical than semantic. Second, all the other parts of speech, except Proper Noun, are closed-set classes. This, coupled with the fact that they only represent a small number of lexemes compared to the four included parts of speech, makes it feasible to deal with them separately. Third, it is not evident that the other parts of speech belong in an ontology at all. For example, prepositions combine with noun phrases to form prepositional phrases, and their contribution to the meaning of a sentence is to give rise to a relation rather than a concept. Although relations can also be encoded in an ontology, it might be better to deal with them in a relation hierarchy than in a conceptual hierarchy. Similarly, personal pronouns and demonstratives would typically give rise to indexicals in concepts rather than concept types. The conjunctions are likely to give rise to juxtapositions (“and”) or other relations between graphs (“or”, “because”, etc.) rather than concept types. Negatives are likely to be translated into negation operators. Interjections could be represented as propositions with specific literal referents. To summarize, all other parts of speech are likely to give rise, in a conceptual graph, not to a concept type, but to something else, be it an indexical, a relation or a structural change. Therefore, they probably do not belong in an ontology over conceptual types. (p. 21).

When a gloss does not occur in WordNet, I have emended the lexicon so that it does. This is better than hard-coding into the program how to deal with a certain lexeme, first, because it does not alter the method, and second, because it is far simpler than hard-coding the change into the program. (p. 22).

How do I deal with the various parts of speech? Verbs and nouns are placed into a hierarchy containing all the hyperonyms up to the unique beginners. Adjectives, not being ordered hyperonymically in WordNet, are all placed underneath the “attribute” entry cluster.9 Similarly, adverbs, also not ordered hyperonymically in WordNet, are placed underneath the “manner” entry cluster. (p. 23).

The algorithm works as follows. First, I parse each lexical entry into an Abstract Syntax Tree (AST) according to generally accepted compiler-construction principles. Second, I traverse the AST in such a way that each semicolon-separated sense ends up as a separate ontology entry in a separate entry cluster, whereas each comma-separated gloss contributes to the same ontology entry. Each semicolon-separated sense gives rise to a set of strings which are candidates for matching with WordNet. These strings are based on the comma-separated glosses, as well

appear in the language over long periods of time. Open classes, on the other hand, are those classes whose membership is open to admission of new members. Nouns, Verbs, Adjectives, and Adverbs are most often seen as open classes, whereas most other classes tend to be fairly closed. Closed classes would include determiners, prepositions, pronouns, negators, conjunctions, other particles, etc. See Matthews (1997, pp. 57, 257) for a fuller treatment. Givón (1984, pp. 51–56) gives an introduction to the time-stability scale, which explains why some lexical classes are more stable while others are less stable, thus giving rise to the notion of “open” vs. “closed” classes.

9 Thus it is assumed that all adjectives are attributive. They may, of course, be predicative, but even in that case, the best place to place them in the ontology would be under “attribute”.


40 CHAPTER 4. ONTOLOGY

as their combinations with elaborations in (parentheses). Then, various heuristics are used to find the most likely candidate for a WordNet synset to match the set of strings. In particular, the algorithm aims to find the most specific ontotype which is still a description of all of the matches. This gives rise to a synset, which is then used as the basis of an entry cluster. If the entry cluster is not present in the ontology already, it is added, along with all of its supertypes. Then an ontology entry is created to contain the sense, which is then added to the entry cluster. (pp. 23–25).
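The sense-splitting step just described can be sketched as follows. This is a reconstruction of the behavior, reducing the AST traversal to simple string splitting; the function name and example data are illustrative, not taken from the thesis program.

```python
import re

def split_senses(gloss_field):
    """Split a lexical-entry gloss field into senses and, per sense,
    the candidate strings to match against WordNet.  Semicolons
    separate senses; commas separate glosses within a sense;
    parenthesized elaborations yield extra candidates both with
    and without the parenthetical."""
    senses = []
    for sense in gloss_field.split(";"):
        candidates = []
        for gloss in sense.split(","):
            gloss = gloss.strip()
            # Candidate with the parenthesized elaboration removed ...
            bare = re.sub(r"\s*\([^)]*\)", "", gloss).strip()
            candidates.append(bare)
            # ... and, if present, with it folded into the string.
            if "(" in gloss:
                candidates.append(re.sub(r"[()]", "", gloss))
        senses.append(candidates)
    return senses

print(split_senses("ocean (primeval); deep, abyss"))
# [['ocean', 'ocean primeval'], ['deep', 'abyss']]
```

Each inner list would then be handed to the WordNet-matching heuristics, which pick the most specific synset describing all candidates.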

Various methods have been used to remedy the situation when the algorithm failed. The most common problem was that there was no match with WordNet at all. In such cases, I have emended the lexicon to use the nearest synonym which does occur in WordNet. The other problem dealt with was when the lexeme matched wrongly. In such cases, I have also emended the lexicon, this time adding the right gloss to match precisely the correct WordNet synset, or changing the gloss slightly so as to give rise to a better match. (pp. 25–27).

This concludes my treatment of my method as explained in my 9th semester report.

4.3 Changes from previous work

I haven’t changed much in the method for building the ontology since my 9th semester work. Here is a list of changes:

1. In cases where the lexicon did not match WordNet, or where it found the wrong interpretation, I have edited the source lexicon by hand so as to better fit with my interpretation of Genesis 1. This, of course, adds my own perspective to the ontology, as discussed in Section 4.1.

2. I have added XML output/input to the program. That is, it can now store the lexicon and ontology in an XML file, which it can then read into memory again. This speeds up the usage of the ontology, since creating the ontology from scratch takes longer than reading a ready-made ontology from XML.

3. I have wrapped the API using SWIG10 so that I can use it from Jython.

4. I have added various API niceties for using the lexicon and ontology from Jython.

5. I have added Sowa’s top from Sowa (1992) (see Figure 4.1 on page 42). I have also added some of the distinctions found in Martin (1995), plus all of the verb unique beginners which show up in my text. The result is seen in Figure 4.2 on page 43.

6. I have added the semantic set “padj” (“potential adjective”) to the adverb “VWB/” (’tov’, which means ’good’). This is so as to signal to the ontology-building program that it could be taken as an adjective.

10 SWIG is the “Simplified Wrapper and Interface Generator”, a program which can wrap C/C++ code so that it can be used from a variety of other programming languages. SWIG has a website at <http://www.swig.org/>.


7. I have added the semantic set “card, ordn” to the noun “>XD/” (’achad’, ’first’, ’one’) so as to signal that this can be both a cardinal and an ordinal. In the original lexicon, it was listed solely as a cardinal.

8. Each entry cluster now has a unique English gloss, as does each ontology entry. Both entry cluster glosses and ontology entry glosses are unique in the ontology; that is, they are both drawn from the same pool of unique glosses.11

9. I have improved the algorithm for finding the English gloss of an ontology entry. It now uses the first gloss in the lexical entry.

10. When a verb occurs in moods other than qal (see Section C.6 on page 189), I have made sure that they show up in the lexicon.12 Furthermore, I have changed the program such that they are read into the lexicon in-memory, and stored/read to/from XML. However, I have not taken the added lexemes into the ontology. This is because none of the lexemes in question change their meaning beyond adding intensive (for piel), causative (for hifil) or passive (for nifal), so a separate entry in the ontology is not necessary.

11. When a word can occur in more than one part of speech (such as “M>D”, which means “very”, and can occur as both a noun and an adverb), I have made sure that all potential parts of speech are tried as a match against WordNet.

12. When a verb means “be X”, where X is an adjective (for example, “VWB[”, ’tov’, ’be good’), I have added the ontology entry to the adjective, and added a flag in the ontology entry saying that it is to be construed as a verb with the meaning “be X”, where “X” is taken from the containing adjectival entry cluster.

13. I have programmatically emended the lexicon such that when a noun is a cardinal or ordinal, I treat it as an adjective. This is because “>XD” (“one”, “first”) and “CNJM” (“two”) show up as nouns in Hebrew but as adjectives in English. Hence, WordNet would not find the correct entries if I treated them as nouns.

14. I have stipulated that when an entry cluster ECsub is a subtype of another entry cluster ECsuper, and ECsuper has at least one ontology entry, then ECsub is properly a subtype of all of the ontology entries in ECsuper, not of ECsuper itself. This is reflected in the abbreviated ontologies in Appendix F on page 209.

11 This is why the graphs exhibit concept types such as “be_1”. This means that the “be” concept type had already been taken before the ontology entry for the verb “HJH” (to be) was added, and so “_1” was suffixed to distinguish it from the other “be”.

12 An exception was “<WP[”, which occurs in the piel in the text, but which is not present as a separate lexical entry for piel in the lexicon. The only other exception was “QWH=[”, which occurs in the nifal in the text, but which is already nifal with this lexeme in the lexicon.
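The unique-gloss requirement in point 8, together with the suffixing behavior described in footnote 11, amounts to a small allocation pool. A minimal sketch (my own reconstruction; the class and method names are illustrative):

```python
class GlossPool:
    """Hand out unique English glosses.  When a gloss is already
    taken, suffix _1, _2, and so on -- this is how a concept type
    such as "be_1" can arise in the graphs (cf. footnote 11)."""
    def __init__(self):
        self._taken = set()

    def allocate(self, gloss):
        candidate, n = gloss, 0
        while candidate in self._taken:
            n += 1
            candidate = "%s_%d" % (gloss, n)
        self._taken.add(candidate)
        return candidate

pool = GlossPool()
print(pool.allocate("be"))  # be
print(pool.allocate("be"))  # be_1
```

Since entry cluster glosses and ontology entry glosses are drawn from the same pool, a single such allocator would serve both.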


Figure 4.1: The top ontology from Sowa (1992)

The top node (T) is split into a Situation/Entity dichotomy. The Situation node is split into a Process/State dichotomy, reflecting the time-scale of the Situation. Entities are split into a Representation/Physical_object/Abstraction trichotomy. Propositions serve as one important subtype of Abstraction.

4.4 Discussion of ontology

4.4.1 Introduction

The ontology can be seen in Appendix F on page 209. In this section, I discuss some aspects ofthe ontology, with a view towards clarifying both the top, my integration of WordNet, and thechoices made in the ontology. I do so first in a long section about the top (4.4.2), and then in amore general section on some of the choices made in the ontology (4.4.3).

4.4.2 The top

4.4.2.1 Introduction

As noted in Section 4.3, I have largely adopted Sowa’s top from Sowa (1992) (see Figure 4.1), as well as some of the distinctions found in Martin (1995). This allows for some reasoning about the ontology.

In the following, I describe Sowa’s top, my adaptation of it, and my incorporation of the WordNet unique beginners.

4.4.2.2 Situations, States, Processes, and Events

Sowa (1992) makes a distinction between Situations and Entities. A Situation “is a finite configuration of some aspect of the world in a limited region of space and time” (p. 13). A situation


Figure 4.2: My own top ontology, derived from Sowa (1992) and Martin (1995).

A “W_” prefix means that the concept type comes from WordNet. Other than that, Sowa’s top is taken over with little modification. Underneath “Representation”, I have added “Conceptual_graph”, which will be used for representing Rules.


may be further subdivided into whether it is a State or a Process, the former having no relevant changes to its configuration over a period of time, and the latter undergoing change (Sowa 1992, p. 19). This is with reference to some “clock tick” which is relevant for the situation. For example, with respect to the geological formation of rocks, a century might be a relevant clock tick if the Situation is to be seen as a Process, whereas seconds are usually a good clock tick for measuring human States and Processes (ibid.).

Finally, an Event is a special kind of Process which happens instantaneously with respect to the relevant clock tick.
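The State/Process/Event distinction relative to a clock tick can be captured in a tiny classifier. This is my own formalization of Sowa's prose, not code from the thesis; the function name and thresholds are illustrative:

```python
def situation_kind(change_duration, clock_tick):
    """Classify a Situation relative to a relevant clock tick:
    no relevant change -> State; change completed within one
    tick (i.e. "instantaneous") -> Event; change spanning
    several ticks -> Process.  (My reading of Sowa 1992.)"""
    if change_duration == 0:
        return "State"
    if change_duration <= clock_tick:
        return "Event"
    return "Process"

print(situation_kind(0, 1))    # State
print(situation_kind(0.5, 1))  # Event
print(situation_kind(100, 1))  # Process
```

The point of the sketch is that the classification is relative: the same change duration yields an Event against a geological clock tick and a Process against a human one.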

4.4.2.3 Actions

Sowa places Actions as a subtype of Event, something which Martin (1995, section 5.1) criticizes. By placing Action underneath Event, Martin says, Sowa forbids Action from ever being a State. For this reason, Martin preferred to place “action” underneath Process, arguing that the knowledge engineer could then choose to see “action” as a State.

I must admit that I don’t follow Martin’s logic. If “action” is placed as a subtype of “Process”, it still cannot be a “State”, since the distinction between State and Process is a dichotomy. Rather, what one gains by placing “Action” as a subtype of “Process” rather than “Event” is that the “clock tick” grows longer, but the action still induces change.

Like Martin, I have chosen to view “Action” as a Process rather than an Event. This is reflected in Figure 4.2 on the page before. The reason is that in my chosen text, the things that I would want to call “Actions” have such varied “clock ticks” that they cannot all be consolidated under “Event.” For example, when in Gen 1:2 the Spirit of God was “hovering” over the face of the waters, the way I read it, the “clock tick” is fairly long, perhaps centuries, perhaps hours. Whereas when, in Gen 1:28, God speaks to the first male and female human pair, the “clock tick” is much shorter, on the human level of face-to-face conversation between Creator and creature. Thus not all Actions can be viewed as Events, and therefore it is better to view them as Processes.

4.4.2.4 To bless: State or Process?

One could argue that certain things which I have called “Actions” are in fact States, because they seem to induce no physical change in the Situation at hand. One example would be “BRK[”, which means “to bless”. In both contexts in which it is used in my text (Gen 1:22 and Gen 1:28),13 the blessing is done by God by means of a speech act which he performs towards his creatures. The question is: Is this speech act a State (because no immediate physical change is involved in the Situation), or a Process (because some kind of change is involved after all)?

I have chosen to call it an Action because the act of blessing does in fact induce a change in the state of affairs of the Situation context, even if the change is not immediately physical: To

13 Gen 1:22 reads “God blessed them [every creature living in the sea and every bird] and said ’Be fruitful and increase in number and fill the water in the seas and let the birds increase on the earth.”’ (NIV)

Gen 1:28 reads “God blessed them [the first two humans] and said ’Be fruitful and increase in number; fill the earth and subdue it. Rule over the fish of the sea and the birds of the air and over every living creature that moves on the ground.”’ (NIV)


bless, when it is God doing the blessing, has the same effect as a commissive: God promises to do good for his creatures,14 and the act of blessing constitutes the promise. Therefore, the act of blessing causes the following change for the blessed parties: They go from a state of not being blessed to a state of being blessed, or under God’s promise of future benefaction.

This raises the metaphysical question of the extent of the domain about which we are speaking. Do we allow non-physical entities such as commissive speech acts to induce a relevant change in our Situations? Or do we prefer positivistic adherence to matter-and-matter-only? I have chosen to allow more non-physical material to enter my ontological commitments than most positivists would care for. This is because it is necessary to do so in order to take the text under investigation seriously. The text bills itself as being an account of how God, a Spirit, speaks into the world and causes changes in the world by means of speech acts. Furthermore, the text deals with relationships of love between man and woman, and love is difficult to reduce to merely physical actions or relationships. Hence, a more-than-physical ontological commitment is necessary in order to take the text seriously, and this is what I have done, exemplified in my choice of letting “BRK[” (to bless) be an Action.

4.4.2.5 States

Most verbs end up under “Action”, but not all. A few verbs end up under “State” because I have explicitly placed their unique beginner under “State”. This is the case with “see”, for example, whose unique beginner is “perceive, comprehend.” Verbs of cognition and perception are normally viewed as stative verbs,15 and hence I have placed them under “State”.

Another example is “be”, which is often seen as a stative verb.16

All verbs which do not have their unique beginner under “State” end up under “Action.” This is so as to be able to use the label “Action” in relation signatures such as AGNT to say that they must involve an Action.
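This rule amounts to a one-line classification over unique beginners. A sketch, where the set of stative beginners is only an illustrative subset of what the ontology actually contains:

```python
# Unique beginners explicitly placed under "State" (illustrative subset).
STATIVE_BEGINNERS = {"perceive, comprehend", "be"}

def situation_supertype(unique_beginner):
    """Classify a verb's entry as State or Action from its WordNet
    unique beginner, per the rule stated in the text."""
    return "State" if unique_beginner in STATIVE_BEGINNERS else "Action"

print(situation_supertype("perceive, comprehend"))  # State
print(situation_supertype("move"))                  # Action
```

Having "Action" as the default branch is what makes it usable as a constraint in relation signatures such as AGNT.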

4.4.2.6 Phenomenon

I have chosen to place WordNet’s “phenomenon” as a subtype of Process. This is because the only two “phenomenon”s that occur in my text are “light” and “breath”. Since light is indeed a physical process, involving photons that, when viewed as waves, emanate by means of a change in an electric field, it makes sense to place “phenomenon” as a subtype of Process. Likewise, breath is a physical process which causes change in the lungs of the breather, and in their bloodstream.

14 In this case, allowing them to be fruitful and increase in number; and, in the case of humans, to let them extend their rule over the other creatures.

15 See, e.g., Van Valin and LaPolla (1997, pp. 114–115).

16 This placement has profound implications for my method, as it necessitates different rules for “be” than for “Action”. See, e.g., Figure 9.2 on page 106.


4.4.2.7 Entities

Sowa does not define explicitly what an Entity is, but one may infer that it is everything which is not Situation-like. Under Entity, one finds the dichotomy between “Abstraction” and “Physical_object”. The Abstractions, in turn, include Propositions and Attributes.

Propositions are important in and of themselves in the theory of Conceptual Graphs, since they are often used as descriptions of situations, among many other uses.17

Attributes include most adjectives in the text. I have simply placed an adjective underneath “Attribute” by default.

Under Attribute, one also finds Property, and under Property, “Manner”. As explained in my 9th semester work, I have placed all adverbs under “Manner”. This is in line with the only two adverbs present in my text, “KN” (“so”) and “M>D” (“very”). The latter is used to modify an adjective18 and so may properly be viewed as a subtype of Manner. Similarly, “KN” is always used to say “and it was so”, which could be translated by the graph

[Universal: #it]->(STAT)->[BE]->(MANR)->[So]

Under Representation, one finds “Conceptual_graph”. It is important for my formulation of the rules to be able to talk about conceptual graphs in the ontology, as we shall see later.

4.4.2.8 psychological feature

I have disagreed with Martin and placed WordNet’s unique beginner “psychological feature” under “Abstraction” rather than “State”. The reason is that at least one entry under “psychological feature”, namely “kind”, is more like an Entity than a Situation. This is because a Kind is not limited to a particular region of space and time, but is rather a categorical description of a certain class of Entities. The other entry which appears under “psychological feature” is “God.” God is also more like an Entity than a Situation, and hence this entry also makes more sense under “Abstraction.”19

4.4.3 The rest

4.4.3.1 Introduction

In this section, I describe some peculiar points in the ontology which are of general interest.

4.4.3.2 God

One peculiar feature of the ontology is that God is an Abstraction, a “psychological feature”, and a “belief”. This is because WordNet places God under “spiritual being”, which they have classified as a “belief”, the “content” of a “cognition” which is a “psychological feature”. Thus the

17 See, e.g., Sowa (1992, pp. 14–15).

18 Gen 1:31, “W-HNH VWB M>D”, “and behold, it was very good.”

19 The doctrines of God’s being immanent, omnipresent, and eternal would imply that he is present everywhere all the time. But this is not a “limited region of space and time”, which is the criterion for being a Situation. Everywhere and every time are not limitations.


ontology would seem to imply that God only exists in the mind of the believer. Christians, believing Jews, and Muslims alike would vehemently argue against such a reductionist description. However, a few points may be put forward in favor of leaving the ontology as it is rather than emending it.

First, although my method works from a closed-world assumption, this is only from the inside of the meaning-universe of the process. Conceptual graphs gain their meaning not from the ontology and the catalog of individuals alone, but also from the meanings which the human reader of the conceptual graphs invests in the labels in the ontology. Hence, it is possible that meaning may be imported from the outside, and thus the ontology could be augmented with an external (human) reader’s assumptions about the entities and situations in the ontology. This leaves room for extra interpretations of the concept of “God” being placed into the reading of the graphs.

Second, while it may or may not be true that God is more than a psychological feature, it is certainly true that, if he is more than that, he communicates himself by means of psychological features. While God may not in and of himself be a psychological feature or belief, he is manifested as such in the vast majority of (believing) people’s minds. This is no different from my conception of New York. I have never been to New York, but I believe that such a place exists. Hence, in my mind, New York manifests itself as a belief, even though in physical reality it is more than that. Hence, it is entirely true to reality to classify God as a belief.

Third, to draw the former two together: It is acceptable to classify God as a belief and hence an Abstraction if one remembers that there is room for classifying him as being more than that, by means of external augmentation of the ontology in the process of ascribing meaning to the conceptual graphs.

4.4.3.3 Two

The word for “two” is “CNJM” in Hebrew. Because it is an adjective in WordNet, it ends up under “Attribute” when it might more properly have been a Number. I have chosen not to do anything about it because, in my view, it doesn’t really matter for the purposes and scope of this thesis whether a number is a Number or an Attribute.

4.4.3.4 Primeval ocean

The Hebrew word “THWM” is used in Gen 1:2 to say that “darkness was on the surface of the deep (THWM)” (NIV). Holladay (1988, p. 386)20 states that it here means “primeval ocean, deep”. Unfortunately, WordNet does not have an entry for “primeval ocean”, and hence I have chosen to simply label it “ocean”.

4.4.3.5 void

If one looks carefully at the ontology, one sees that “void” ends up both under “abstraction” and under “Physical object”. This is because WordNet classifies a void as a “space”, which can be

20 This is my reference Hebrew lexicon.


both an “amorphous shape” (which ends up under “abstraction”) and a “location” (which ends up under “entity, physical thing”). An interesting philosophical question arises here: Is a void, which is a nothingness, a part of space? And if it is part of space, is it a physical thing or an abstraction? The WordNet editors have left both options open, in that they haven’t made an ontological commitment either way.21
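Because “void” has two parents, the ontology is a directed acyclic graph rather than a tree, and a subtype test must follow all parent links. A minimal sketch of such a test (the type names are an illustrative fragment, not the ontology's actual labels):

```python
# Parent relation with multiple inheritance (illustrative fragment).
PARENTS = {
    "void": {"amorphous_shape", "location"},
    "amorphous_shape": {"abstraction"},
    "location": {"physical_object"},
    "abstraction": {"entity"},
    "physical_object": {"entity"},
    "entity": set(),
}

def is_subtype(sub, sup):
    """True if sub <= sup in the hierarchy (reflexive, transitive),
    following every parent link of a multiply-inherited type."""
    if sub == sup:
        return True
    return any(is_subtype(p, sup) for p in PARENTS.get(sub, ()))

print(is_subtype("void", "abstraction"))      # True
print(is_subtype("void", "physical_object"))  # True
```

Both calls return True, which is precisely the WordNet editors' refusal to commit: a void is reachable from both branches of the Entity dichotomy.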

4.4.3.6 image/likeness

Genesis 1:27 states that:

“God created man in his own image
in the image of God he created him
as male and female he created them.”

The question is, what does “image” (Hebrew “YLM”) mean here? The WordNet entry which comes closest to my understanding of the word is “likeness, alikeness, similitude”, with the definition “similarity in appearance or character or nature between persons or things”. The other possibility would have been “double, image, look-alike”, but this would have made God’s image a person and a being, which it clearly is not in this context.

4.4.3.7 creepy-crawly

The Hebrew word “RMF” has been translated as “creepy-crawly.” Holladay (1988, p. 341) says that the word signifies “animal world exc[luding] large animals and birds: coll[ectively] small animals, reptiles.” So we are dealing with insects, arachnids, worms, and other small animals. WordNet defines “creepy-crawly” as “an animal that creeps or crawls (such as worms or spiders or insects)”, which seems to fit the bill perfectly. Moreover, “RMF” comes from a root which means “swarm, teem (of vast numbers of creatures in water, on ground, in woods; in random movement)” (ibid.), which also seems to tie in with “creeping” and “crawling”. Hence, this choice is justified, if a little curious.

4.5 Conclusion

After a general introduction to the subject of ontology (4.1), I have described my 9th semester work, on which my ontology for this thesis is built (4.2). I have then described the changes I have made to the 9th semester work in order to obtain the ontology as it is used in this thesis (4.3). I have then gone into detail about the top of the ontology, including its integration of WordNet and certain peculiarities in the ontology (4.4). All of this drives towards one goal: Producing an ontology which can form the basis of giving meaning to, shaping, and reasoning about the conceptual graphs which I hope to produce.

21 Blaise Pascal and Pater Noël had a discussion in Pascal’s day about the nature of empty space. It is recounted (in Danish) in Hartnack, Justus and Johannes Sløk (eds.) (1970): “De store tænkere: Pascal”, pp. 33–69, Berlingske Forlag, Copenhagen.


Chapter 5

Literature review

5.1 Introduction

In the literature, two methods have been described for creating conceptual structures from text. One method centers around a repository of canonical graphs, while the other uses rules to transform syntactic structures to conceptual structures, guided by an ontology but without canonical graphs.

In this chapter, I describe much of the previous work done in this area. Section 5.2 deals with the method centered around canonical graphs, whereas Section 5.3 deals with the rule-based method. Finally, I conclude the chapter in Section 5.4.

5.2 Canonical graphs

The first method described in the literature was that of using a lexicon of canonical graphs. The first reference which I could find was Sowa and Way (1986). Others using this methodology have been Sowa (1988), Velardi, Pazienza and De Giovanetti (1988), Fargues, Landau, Dugourd and Catach (1986), Landau (1990), Antonacci, Pazienza, Russo and Velardi (1992), Bornerand and Sabah (1992), Schröder (1993), Mann (1993), and Zweigenbaum and Bouaud (1997). The article by Sowa and Way (1986) laid the groundwork for most of the later work done using this method, and therefore I will describe the work done in this article in the greatest detail, comparing other methods with this point of reference.

5.2.1 Sowa and Way (1986)

Sowa and Way start by describing various levels of meaning in communication which any theory of semantics must deal with if it is to be successful in describing the meaning of any domain. These levels include lexical, syntactic, semantic, and episodic information. The authors then proceed to describe how their method deals with each of these.


The lexical and syntactic levels are dealt with using a parser developed by Jensen and Heidorn.1 The semantic and episodic levels are dealt with using a lexicon of canonical graphs, optionally augmented with a lexicon of schemas. A canonical graph is a minimal graph which describes the default patterns in which a given concept is used. A schema is a more elaborate graph which shows background information in addition to normal usage.2

The system takes the parse tree generated by the parser, and proceeds in a bottom-up fashion to construct the resulting conceptual graph. The process begins at the word level, where the lexicon of canonical graphs is used to map word senses to canonical graphs. The resulting canonical graphs are then joined, using a maximal join3 algorithm, with the parse tree as a guide to the order in which the joins should take place. The joins are sometimes blocked, in which case the system backtracks and tries different word senses (other canonical graphs), different join orders (using the same parse tree), or different parse trees altogether. Thus the basic operation is maximal join on conceptual graphs, starting from canonical graphs at the word level, working one’s way up the parse tree, backtracking where necessary.
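The backtracking control structure of this process can be sketched as follows. The maximal join itself is stubbed out as a simple compatibility predicate between adjacent senses; this is an illustration of the search strategy only, not of Sowa and Way's actual system, and all names and data are invented for the example.

```python
def try_parse(word_senses, compatible):
    """Pick one canonical-graph sense per word so that every
    adjacent pair joins, backtracking when a join is blocked.
    word_senses: list of per-word sense lists; compatible: a
    predicate standing in for 'the maximal join succeeds'."""
    chosen = []

    def backtrack(i):
        if i == len(word_senses):
            return True
        for sense in word_senses[i]:
            if not chosen or compatible(chosen[-1], sense):
                chosen.append(sense)
                if backtrack(i + 1):
                    return True
                chosen.pop()  # blocked further on: undo and retry
        return False

    return chosen if backtrack(0) else None

# Toy "join": two senses are joinable if they share a semantic field.
fields = {"bank_river": "water", "bank_money": "finance",
          "fish": "water", "loan": "finance"}
compat = lambda a, b: fields[a] == fields[b]
print(try_parse([["bank_money", "bank_river"], ["fish"]], compat))
# ['bank_river', 'fish']
```

The first sense of "bank" is tried and rejected when the join with "fish" blocks, whereupon the search backtracks to the second sense, mirroring the behavior described in the text.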

5.2.2 Sowa (1988)

This is also the basic process described in Sowa (1988). In this article, however, Sowa explains in greater detail how the process deals with ambiguities, and how it tackles a number of other problems in semantics and syntax.

Of special noteworthiness for my efforts is Sowa’s section on how hard it is to produce a lexicon of canonical graphs (pp. 135–136). He mentions that for the DANTE project (next section), it took nearly a day to write one entry in the lexicon of canonical graphs by hand. A student named Magrini then wrote a program to automatically extract first drafts of entries from text. These could then be checked by a linguist and corrected if necessary. This reduced the average time to process one lexicon entry to about half an hour. I shall use these facts in my discussion of whether my method scales (Section 10.6 on page 130).

5.2.3 Velardi et al. (1988) (DANTE)

The article by Velardi et al. (1988) uses a slightly different approach, even if the basic method is the same. The goals of Velardi et al. are also somewhat different from those of Sowa and Way, which likely has influenced their choice of methodology. While Sowa and Way are concerned only with generating conceptual graphs from natural language text, Velardi et al. have a much wider scope: Their system does generation of conceptual graphs from text, generation of natural-language text from conceptual graphs, and query answering using these two components.

The system relies on a semantic lexicon which is similar to that of Sowa and Way: For each word sense, a number of “Semantic Surface Patterns” (SSPs) are given, along with larger graphs

1 See Jensen, Karen and Heidorn, George E. (1983), “The Fitted Parse: 100% Parsing Capability in a Syntactic Grammar of English”, in Proceedings of the Conference on Applied Natural Language Processing, Association for Computational Linguistics, Santa Monica, CA.

2 See also Sowa (1984, pp. 90–96).

3 Maximal join is described in Sowa (1984, pp. 96–103).


which are case frames giving background and usage information. The former is analogous to Sowa and Way’s canonical graphs, while the latter is analogous to Sowa and Way’s schemas. The SSPs are in the form of a two-concept branch,

[Concept-being-defined]->(Relation)->[Related-concept]

or

[Concept-being-defined]<-(Relation)<-[Related-concept]

The process proceeds through morphological and syntactic parsing, through matching of words with conceptual types from the lexicon, checking of constraints through SSPs, generation of semantic hypotheses about the meaning of the various parse-tree constructions, checking of semantic hypotheses using SSPs and word-usage graphs, and generation of conceptual graphs from semantic hypotheses. The finished product is a conceptual graph representing the meaning of the original sentence. This conceptual graph can then be stored in a database for later matching in the query-answering system.
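Since an SSP is a two-concept branch, it lends itself to representation as a simple tuple, and constraint checking becomes set membership. A sketch under assumed data (the relation names and patterns are invented for illustration, not taken from the DANTE system):

```python
# Illustrative SSPs as (concept, relation, related concept, direction),
# where direction "out" corresponds to [Concept]->(Rel)->[Related]
# and "in" to [Concept]<-(Rel)<-[Related].
SSPS = {
    ("drink", "AGNT", "animate", "out"),
    ("drink", "OBJ", "liquid", "out"),
}

def ssp_allows(concept, relation, related, direction="out"):
    """Check a hypothesized relation against the SSP constraints."""
    return (concept, relation, related, direction) in SSPS

print(ssp_allows("drink", "OBJ", "liquid"))  # True
print(ssp_allows("drink", "OBJ", "rock"))    # False
```

A real implementation would also consult the type hierarchy so that, e.g., any subtype of "liquid" satisfies the pattern; the sketch omits that step.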

5.2.4 Other work

The article by Fargues et al. (1986) shows how the principles laid down in Sowa and Way (1986) can be implemented using Prolog (Sowa and Way used the Programming Language for Natural Language Processing, PLNLP). This project is the KALIPSOS project at IBM Paris.

An article related to the KALIPSOS project described in Fargues et al. (1986) is Landau (1990). It describes the KALIPSOS project more from a linguistic perspective, and less from a systems programming perspective. It also gives examples of how the method disambiguates ambiguous sentences, and how it deals with anaphoric references.

The article by Zweigenbaum and Bouaud (1997) uses an interesting application of Lexical Functional Grammar (LFG), but nevertheless the basic method is still syntax-directed joining of canonical graphs, and thus falls within the scope of the method proposed by Sowa and Way (1986).

The article by Mann (1993) aimed at developing a system for aiding physical navigation. The method used was basically that of Sowa and Way (1986), but with an extra step aimed at pragmatic disambiguation and refinement.

The research presented in Bornerand and Sabah (1992) is a fascinating hybrid between the method of Sowa and Way (1986) and the method presented in a later section (5.3.2 on page 55). The basic algorithm is still syntax-directed joining of conceptual graphs, but the lexicon and the grammar-rules are described partially in terms of conceptual graphs. This means that each grammar rule results in a conceptual graph representing the grammatical information found by the lexical and syntactic parsers. These graphs are then used as input to the joining-algorithms, resulting in the finished interpretation of the original sentence in conceptual graphs.


The research presented in Schröder (1993) uses an approach similar to Sowa and Way (1986), but augments the method for the purposes of inference-based querying of the domain in question, namely radiological reports in the field of medicine.

The research of Rassinoux, Baud and Scherrer (1994) presents some interesting ways of doing transformation of natural language texts (in this case, medical discharge reports) into conceptual graphs. The basic method is still syntax-directed joining of schemata, but with intriguing twists in how the syntactic analyses are obtained. Not only syntactic knowledge is brought to bear on the text before joining schemata, but also semantic and functional knowledge.

5.3 Rule-based transformation

The second method described in the literature is ontology-guided, syntax-driven, and rule-based. One of the first to use the method was Barrière (1997). Another work in this vein is Nicolas, Mineau and Moulin (2002).

I have chosen to follow some methods from Nicolas et al. (2002), but mostly to follow the overall approach taken in Barrière (1997). I have adapted both methods to fit the context of my particular problem. The adaptation has taken into account the fact that I am dealing with Hebrew rather than English, that I am dealing with a particular syntactic model of analysis (the WIVU database), and that this project is, to some extent, really also a project in machine translation.4

5.3.1 Barrière

Barrière (1997) seems to be the major work using the method of rule-based transformations of CGs. What follows is a summary of Chapter 2 from her book, which explains how she extracts CGs from text.

Barrière's aim is to take a children's dictionary and build a lexical knowledge-base from it. In this process, she builds CGs from the text of the dictionary, builds a lexical ontology from the dictionary entries, and builds a number of other byproducts which are useful in a lexical knowledge base.

In building CGs from text, her method does two kinds of disambiguation: first, structural disambiguation (singling out one graph from a number of possibilities), and second, semantic disambiguation (refining graphs so as to have more semantic meaning and less similarity to syntactic constructions).

The overall method runs in five steps (see also Figure 5.1):

1. Building a parse tree: Tagging and parsing the text into a syntax tree.

4 My thesis is a project in machine translation because I am translating not only from one natural language (Hebrew) to a formal language (conceptual graphs), but also from a natural language (Hebrew) to a representation (conceptual graphs) that uses a particular natural language (English) for its ontological labels. Thus the translation is not only from natural to formal language, but also from one natural language to another. See also Section 10.2 on page 123.


Figure 5.1: Overview of Barrière’s method (after Barrière (1997), p. 31)


2. Building "raw" CGs from the parse tree: Transforming the syntax tree to CGs which have most of the syntactic structure intact (but are still quite good, or "semantic", CGs).

3. Building the type hierarchy from the CGs made in step 2.

4. Structural disambiguation:

(a) Prepositional attachment

(b) Conjunctional disambiguation

5. Semantic disambiguation:

(a) Anaphora resolution

(b) Word sense disambiguation

(c) Finding deeper semantic relations (such as "instrument", "possession") through Semantic Relation Transformation Graphs (SRTGs)

(d) Conjunction distribution

These processes are applied in a spiralling fashion, in that the steps are applied iteratively several times. This is necessary because the steps influence each other (e.g., the semantic disambiguation might lead to a better type hierarchy, which might lead to better structural disambiguation).
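As a rough illustration of this spiralling control flow, the iteration-to-fixpoint idea can be sketched as follows; the step functions below are invented stand-ins, not Barrière's actual steps:

```python
# A hypothetical sketch of the "spiralling" application of steps: the
# steps are applied repeatedly, because one step's output can improve
# another step's input, until a fixpoint is reached.

def iterate_until_stable(state, steps, max_rounds=10):
    """Apply each step in turn, repeating until no step changes the state."""
    for _ in range(max_rounds):
        before = state
        for step in steps:
            state = step(state)
        if state == before:  # nothing changed in a full pass: stop
            break
    return state

# Toy example: 'b' can only be derived once 'a' is known, so a single
# pass in this step order is not enough; the second pass picks it up.
add_a = lambda facts: facts | {"a"}
add_b_if_a = lambda facts: facts | {"b"} if "a" in facts else facts

result = iterate_until_stable(frozenset(), [add_b_if_a, add_a])
```

The toy run shows why iteration matters: the fact "b" is only derivable after an earlier round has established "a".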

Building a parse tree from "raw" text goes through the steps of tagging and parsing, resulting in a number of parse trees. Various heuristics are used to limit the number of parse trees.

Building "raw" CGs from the parse tree(s) involves applying rules to the parse-tree(s) to build CGs. This may result in more than one CG, since there may be more than one parse tree, as well as other factors. Therefore, a number of heuristics are again applied to reduce the number of resulting CGs. The goal is to have one, or at least very few, resulting CGs. The rules which transform parse trees to CGs are based on: a) closed-set words, and b) syntactic structure. The latter kind of rules are formulated in terms of phrase-structure production rules from her grammar yielding such and such a CG structure. Thus it is crucial that Barrière wrote the grammar herself, since this means that she knows exactly what the parse-tree will look like, and is thus able to formulate rules based on the structure of the parse-tree.

Structural disambiguation for Barrière is a process of deciding where prepositional phrases and conjunctions attach in the parse tree (and hence, the resulting graphs). The disambiguation is done on the CGs, not the parse tree, through (a) statistical analysis of the corpus, (b) decisions based on the least common supertype (for conjunctional attachment), and (c) Semantic Relation Transformation Graphs (SRTGs). The SRTGs have a premise ("if") and a conclusion ("then"), and work by replacing (part of) the input CG with the conclusion.
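A minimal sketch of the premise/conclusion replacement behind SRTGs, assuming graphs reduced to sets of (concept, relation, concept) triples (real SRTGs match full CG structure, not flat triples):

```python
# Sketch of a Semantic Relation Transformation Graph (SRTG): a premise
# pattern and a conclusion that replaces the matched part of the graph.
# The rule contents below are invented for illustration.

def apply_srtg(graph, premise, conclusion):
    """If all premise triples are present, replace them with the
    conclusion triples; otherwise return the graph unchanged."""
    if premise <= graph:
        return (graph - premise) | conclusion
    return graph

# Toy rule: the surface relation "with" between an action and a tool
# is refined to the deeper semantic relation "Inst" (instrument).
g = {("Cut", "with", "Knife"), ("Cut", "Obj", "Bread")}
rule_premise = {("Cut", "with", "Knife")}
rule_conclusion = {("Cut", "Inst", "Knife")}
g2 = apply_srtg(g, rule_premise, rule_conclusion)
```

The same replace-on-match mechanism serves both the structural and the semantic disambiguation steps described below.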

Semantic disambiguation has four components for Barrière: (a) anaphora resolution (which is kept fairly simple); (b) word sense disambiguation (which is done through various heuristics); (c) finding deeper semantic relations (such as "possession", "instrument", etc.), which is a process of replacing surface semantics with deeper semantic relations and is done through applying SRTGs to prepositions and other closed-set words; and (d) conjunction distribution, which is the opposite process of conjunctional attachment, namely splitting a conjunction over several parts of a CG; again, this is done through SRTGs.

Thus Barrière uses four steps to produce CGs from text (apart from the step of building a type hierarchy): text to parse tree, parse tree to CG, structural disambiguation of CGs, and semantic disambiguation of CGs. These processes are mostly guided by rules, either grammar production rules, parse-tree-based rules, or CG-based rules (SRTGs), although a few heuristics are also used (such as statistical information on prepositional co-occurrence and least common supertype for conjunctional attachment).

The end result is CGs which are of a very high quality, being very semantic in nature, and having almost no syntax left.

5.3.2 Nicolas et al.

In this section, I briefly describe the work of Nicolas et al. (2002), in order to be able to use some of their methods later.

The context of the work of Nicolas et al. is a desire to improve upon present-day search engines such as Google by giving the possibility of searching the WWW for semantics rather than flat text strings. In this respect, their work resembles that of Velardi et al. (1988), where the goal was also a query-answering system based on semantics. One difference is the scope of the input data: Velardi et al. (1988) used their software on a limited corpus of Italian newspaper articles, whereas Nicolas et al. (2002) wanted to search the entire English-speaking WWW.

The system takes as input a sentence in English. This sentence is then transformed to a representation in conceptual graphs. A traditional search engine is then used to retrieve a number of web-pages which could be candidates for "hits". These retrieved pages are then transformed to conceptual graphs, and the graphs are matched with the graphs from the input sentence. If enough similarity is found, the page is added as a "hit" to be displayed later.

The system relies on an external parser based on Functional Dependency Grammar (FDG) for analyzing the English sentences syntactically. In order to deal with coordinating conjunctions, a link grammar from CMU is used "to transform a sentence containing a coordinating conjunction into a set of sentences without coordinating conjunctions" (p. 20).

WordNet5 is used as a lexical ontology. For purposes of simplicity, only the first sense in each WordNet entry is used, thus avoiding problems of polysemy. In order to assign a meaning from WordNet to each syntagm6, an actor is constructed which relates syntagmas to WordNet concept types.
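The first-sense policy can be illustrated with a toy stand-in for WordNet; the sense inventory below is invented for illustration, not WordNet's actual data:

```python
# Sketch of the "first sense only" policy described above. The sense
# inventory is a hypothetical stand-in; Nicolas' actual actor maps
# syntagmas to WordNet concept types.

SENSES = {
    "bank": ["financial_institution", "river_bank"],
    "where": ["location_question"],
}

def wordnet_actor(syntagm):
    """Return the concept type for a syntagm: always the first listed
    sense, sidestepping polysemy at the cost of accuracy."""
    return SENSES[syntagm][0]
```

Always taking the first sense keeps the system simple, but any syntagm whose intended sense is not the first one will be mistyped.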

The first step in generating CGs from the English sentences is to transform the syntactic analysis from the FDG parser into a representation in conceptual graphs. In order to do this, an ontology has been constructed for the various grammatical concepts in the FDG output. Thus this step is lossless; it simply transforms the syntactic analysis into a CG representation. The resulting graphs are called "grammatical graphs," since they bear little relation to the semantic CGs which are normally used to express the meaning of text.

5 See Fellbaum (1998) and <http://www.cogsci.princeton.edu/~wn/>.
6 "Syntagm" is Nicolas' term for "word."

Figure 5.2: WordNet-actor from Nicolas (2003), p. 55

The ontology for the grammatical graphs has concept types such as "Grammatical_Concept", "POS_Modifier", "Noun_case", "WordNet_syntagma", "Noun", "Verb", and others. Similarly, a custom relation hierarchy is used. This hierarchy has a dichotomy at the top: "Attr" and "Grammatical_relation". Under "Grammatical_relation", one finds a number of relations from the FDG grammar, such as "Agent", "Pre_marker", "Subject", "Object", etc.

The second step is to transform the grammatical graphs into something expressing more semantics than syntax. This is done by a rule-based procedure, where the rules are themselves expressed as conceptual graphs. Each rule is expressed as an If-Then implication, using the WordNet actor depicted in Figure 5.2 to transform words into concepts, and using other grammatical features to construct the resulting conceptual graph.

5.3.3 Nicolas (2003)

The work done in Nicolas et al. (2002) was later brought to fruition in Nicolas' M.Sc. thesis, Nicolas (2003). In the following, I describe some of his ideas.

Briefly, the method is the same as in Nicolas et al. (2002) (see p. 49), but with clarifications and corrections to the framework.

A WordNet-actor is used for transforming bare words to concepts in an ontology. The actor is depicted in Figure 5.2.

The rules are expressed in CGIF7 for easy loading into Notio. Examples of rules are given in Figure 5.3, copied from Nicolas' Annexe B, page 108. The only change is to the formatting, which has been made more readable. Below I explain the rules, first generally and then each of the examples in turn.

Each rule follows a general structure: The outermost context of each rule is a concept with the type "rule". Inside this concept one finds a nested graph with three disconnected subgraphs: a grammatical graph ("Graph_grammatical"), a semantic graph ("Graph_conceptuel"), and a binding of concepts inside of these two graphs with the WordNet actor.

7 CG Interchange Form, as per the upcoming CG ISO Standard. See <http://www.jfsowa.com/cg/cgstand.htm#Header_44>.

[rule: ; A verb in the active voice ;
  [Verbe_grammatical*a][Universal*b]<wn_actor?a|?b>
  [Graph_grammatical:
    [Verbe_grammatical?a][Voix_verbal*c:'active']
    (Propriete_grammatical?a?c)
  ]
  [Graph_conceptuel:
    [Universal?b]
  ]
].

[rule: ; Interrogative pronoun "where" ;
  [Adverbe_grammatical*a][Universal*b]<wn_actor?a|?b>
  [Forme_verbal_grammatical*c][Universal*d]<wn_actor?c|?d>
  [Graph_grammatical:
    [Adverbe_grammatical?a:'where'][Forme_verbal_grammatical?c]
    (Relation_grammatical?c?a)
  ]
  [Graph_conceptuel:
    [Universal?b][Universal?d](loc?d?b)
  ]
]

Figure 5.3: Rules from Nicolas (2003), Appendix B, page 108. Two rules are shown, both with the same overall structure: the top-level concept is a "rule", nested inside of which we find a number of subgraphs: one or two graphs binding syntagmas to WordNet ontology entries, a "grammatical graph" (the premise), and a "conceptual graph" (the conclusion). This structure has been taken over with little modification in my own work.

The first rule transforms a verb in the active voice to a concept with the correct concept type. The WordNet actor binds a grammatical verb "*a" to a Universal "*b". Inside the grammatical graph, one finds a grammatical verb which is coreferent with the "?a" verb acted on by the WordNet actor. One also finds a concept, linked to the grammatical verb, which says that its voice is active. Inside the semantic graph, one simply finds a Universal which is coreferent with the "?b" acted on by the WordNet actor.

The second rule is more complex. When the adverb "where" is connected to a verb by any grammatical relation, this rule infers that they should be transformed to a graph with a concept corresponding to "where", a concept corresponding to the verb, and the two linked with a "loc" relation. The structure and general ideas are the same as for the first rule, so I will not explain it in detail.
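As a hypothetical plain-Python rendering of the second rule (not Nicolas' actual implementation, which operates on full CGs via Notio), with graphs reduced to triples and a toy concept lookup standing in for the WordNet actor:

```python
# Re-implementation sketch of the second rule: whenever the adverb
# 'where' stands in some grammatical relation to a verb form, emit a
# conceptual graph linking the corresponding concepts with 'loc'.
# CONCEPT_OF is an invented stand-in for the WordNet actor.

CONCEPT_OF = {"where": "Place", "sleep": "Sleep"}

# Any of these counts as "some grammatical relation" in this sketch.
GRAMMATICAL_RELATIONS = {"Relation_grammatical", "Subject", "Object"}

def where_rule(grammatical_graph):
    """Premise: 'where' related to a verb form.
    Conclusion: (loc verb-concept where-concept)."""
    conceptual_graph = set()
    for head, rel, dep in grammatical_graph:
        if dep == "where" and rel in GRAMMATICAL_RELATIONS:
            conceptual_graph.add((CONCEPT_OF[head], "loc", CONCEPT_OF[dep]))
    return conceptual_graph

cg = where_rule({("sleep", "Relation_grammatical", "where")})
```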


5.4 Conclusion

I have presented two general approaches to the transformation of text to CGs. One is based on syntax-directed maximal joins of canonical graphs. The other is based on syntax-directed, ontology-driven, rule-based joining and refinement of graphs. Sowa and Way (1986) were the main exponents of the former method, and many followed in their footsteps. Barrière (1997) was the main exponent of the latter method. The work done in Nicolas et al. (2002) then took Barrière's approach and applied it to a search-engine retrieval application; this work was brought to fruition in Nicolas (2003).

In my own method, I shall use ideas from both Barrière (1997) and Nicolas’ work.


Part II

Creating conceptual structures




Chapter 6

Method overview

6.1 Introduction

In this chapter, I briefly present my method in general terms. Later chapters will flesh out the details. I first deliberate on whether to choose a Barrière-style approach or a Nicolas-style approach (6.2). I then give an overview of my chosen method (6.3). Finally, I conclude the chapter and look towards further chapters (6.4).

6.2 Choosing an approach

Barrière transformed her syntax trees into CGs which were quite close to being "good enough", in that they actually resembled the end result: fully semantic CGs. These intermediate CGs still had bits of syntax left, especially in their relations, but were quite good despite their shortcomings. She then "fixed" the problems by means of Semantic Relation Transformation Graphs (SRTGs), which were rules based on an input premise graph and an output conclusion graph. The resulting graphs were very good, having almost no syntax left, and were thus fully semantic.

In contrast, Nicolas first transformed the syntax tree to CGs which had little or no semantics, being just a one-to-one mapping of the syntax tree. He then transformed these "syntactic CGs" to more semantic CGs, like Barrière using a rule-formalism with input premise graphs and output conclusion graphs.

One can ask why Nicolas chose a different approach than Barrière. I believe part of the answer lies in the different natures of their input syntax trees. While Barrière's syntax trees were "traditional" syntax trees in the Chomsky tradition, Nicolas used a parser based on dependency grammar. Thus the structures of the respective syntax trees were quite different, and it is my conjecture that the dependency-grammar syntax trees were not very amenable to direct transformation to CGs, due to their different structure.

In addition, Barrière wrote her parser-grammar herself, and thus she knew exactly what the syntax trees would look like in terms of production rules. This made it possible for her to write rules which worked on specific production rules, mapping each to a semantic CG structure. Nicolas, on the other hand, to my knowledge, had no way of knowing exactly what the syntax trees would look like, since his parser and grammar were written by someone else (the company Connexor Oy). This may also have influenced Nicolas's choice of method. Nicolas was partially "blind" to the structures of the syntax tree. Thus it would have been more difficult to transform the syntax trees directly to semantic CGs, because he could not know a priori what syntactic structures would map to which semantic structures. It is one thing to make rules to transform syntax trees to syntactic CGs based on very abstract knowledge of the structure of the syntax tree, and quite another thing to make rules to map particular syntax-tree structures to particular semantic CG structures, especially when one does not know exactly what syntactic structures are going to show up. Hence he opted for the intermediate step of syntactic CGs. These were easier to deal with in rules because of the richer matching and joining algorithms of CGs, which made the problem more tractable from two opposite ends: First, it meant that the rules could be more modular and more specific, each tending to a smaller syntactic problem (the locative of a "where"-question, for example). Second, it meant that the rules could be made to apply not only to specific nodes in the syntax tree and their immediate descendants, but also to their lower-level features and to nodes lower down in the tree, all at once, thus potentially widening the scope of application of each rule. The fact that the rules could look at a whole grammatical graph at once and match arbitrary parts of it made this possible. Thus, by using CG-based rules rather than syntax-tree-directed, very myopic transformation rules, Nicolas was able to cast off the straitjacket of the syntax tree and its nodes.

Barrière's syntax trees were similar to the syntax trees used by Sowa and Way. Since Sowa and Way were successful in joining their CGs, directed by their syntax trees, into CGs which closely resembled fully semantic CGs, it should come as no surprise that Barrière's syntax trees would do the same for her, because the structures of the syntax trees were similar. While Sowa and Way had canonical graphs to guide the joins, Barrière used ad-hoc heuristics and arrived at graphs which were, admittedly, inferior to those of Sowa and Way; but she later corrected these deficiencies through SRTGs.

Barrière, it may be noted, took advantage of the same algorithms in her SRTGs as Nicolas did in his rules. The main difference between their approaches, then, is in the first step from syntax to CG. Where Nicolas produced syntactic graphs with little or no semantics in his first step, Barrière produced CGs which were close to the end result, and then used methods similar to Nicolas's from there on. Thus Barrière was also able to shun the straitjacket of the syntax tree after the first step. Barrière's CGs, as a result, were arguably of a slightly higher quality than those of Nicolas, comparing samples from each of their works.1

I have chosen to be more like Barrière in my method than like Nicolas, for two reasons:

1. First, in order for my method to work, I have to transform the WIVU syntax into something more closely resembling Barrière's syntax trees. This is necessary because the WIVU syntax contains phrasal units which are too large and internally undifferentiated to be transformed meaningfully into CGs by means of composition or joining. Thus I shall have to derive more traditional, more differentiated syntax trees from the WIVU syntax anyway.

1 Henrik Schärfe pointed out to me that Nicolas had a very convenient methodological self-sufficiency: his graphs were guaranteed to be "good enough" simply because he applied the same graph-extraction algorithms to the search input sentence as he did to the search result candidates.


2. Second, these syntax trees will resemble the syntax trees of Barrière and of Sowa and Way, and hence it is plausible that the same production-rule-based joining of CGs will produce CGs which are at least as good as Barrière's first-attempt CGs. Hence it is hoped that the final CGs will also be of a higher quality than would have been possible had I taken Nicolas's approach.

As I said, I have to transform the WIVU syntax into more traditional syntax trees. The outcome of this step is two-fold: First, it produces the more traditional syntax trees which form a better basis for transformation to CGs. Second, it allows me to automatically extract a transformational grammar of the text based on the better syntax trees. This will allow me to know exactly what the syntax trees look like in terms of production rules, and hence I can write rules which map syntax to CGs based on these production rules.

6.3 Overview of my method

My method runs in three stages:

1. First, the WIVU syntax is transformed into syntax trees that are closer to traditional syntax trees in the generative tradition.2 An important by-product of this step is a generative grammar of the text. This grammar is used in step 2. The input to this stage is the Hebrew text and its WIVU syntactic analysis (from Emdros).

2. Second, the resulting syntax trees are transformed to intermediate CGs by means of a rule-based joining algorithm. The rules are based on the grammar produced in step 1. The algorithm is a depth-first join of the syntax trees. The input to this stage comprises:

(a) The Hebrew text and its (now transformed) syntactic analysis

(b) The ontology

(c) A number of small Hebrew-English lexicons (for answers to lexicon-specific questions)

(d) A relation-hierarchy

(e) A number of rules for transforming the syntactic analyses to syntactic CGs.

3. Third, the intermediate CGs are transformed to more semantic CGs by means of rules based on a premise input graph and a conclusion output graph. The input to this stage consists of:

2 Such as Chomsky's various theories, Joan Bresnan's Lexical-Functional Grammar (LFG), the Head-Driven Phrase Structure Grammar (HPSG) of Pollard and Sag, and others. See Horrocks (1987) for a good introduction to some of these. HPSG is described in Pollard, Carl and Ivan A. Sag (1994) Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago. LFG has most recently been treated in Bresnan, Joan (2000) Lexical-Functional Syntax. Blackwell Publishers, Oxford.


(a) The CGs from step 2.

(b) The ontology

(c) A relation hierarchy

(d) Rules for transforming intermediate CGs to semantic CGs. These rules are expressedin terms of CGs with a premise-conclusion structure.

The method is depicted in Figure 6.1. The rectangular boxes depict each of the three stages just described, while the rounded boxes show the input data at each stage, as well as the output data.
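Schematically, the three stages can be sketched as a function pipeline. All names and data structures below are placeholders of my own; they only show how the outputs of one stage feed the next:

```python
# Schematic sketch of the three-stage pipeline. Each stage is a
# placeholder function; the real stages consume the WIVU analysis, the
# extracted grammar, lexicons, the ontology, the relation hierarchy,
# and rule sets.

def stage1_reshape_syntax(wivu_analysis):
    """Stage 1: WIVU syntax -> traditional syntax trees + a grammar."""
    trees = {"kind": "traditional-trees", "data": wivu_analysis}
    grammar = {"productions": []}  # by-product, consumed by stage 2
    return trees, grammar

def stage2_trees_to_cgs(trees, grammar, lexicons, ontology, rel_hierarchy, rules):
    """Stage 2: rule-based, depth-first joining into intermediate CGs."""
    return {"kind": "intermediate-cgs", "from": trees["kind"]}

def stage3_refine_cgs(cgs, ontology, rel_hierarchy, rules):
    """Stage 3: premise/conclusion rules yield more semantic CGs."""
    return {"kind": "semantic-cgs", "from": cgs["kind"]}

trees, grammar = stage1_reshape_syntax({"source": "Emdros"})
icgs = stage2_trees_to_cgs(trees, grammar, [], {}, {}, [])
semantic = stage3_refine_cgs(icgs, {}, {}, [])
```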

6.4 Conclusion and overview

I have presented a discussion of the two methods of Barrière and Nicolas, and I have motivated my choice of being more like Barrière than like Nicolas (6.2). I have also briefly outlined my method, and presented a diagram of the method (6.3).

Looking now towards the rest of the thesis: in Chapter 7 (starting on page 67) I present my method of transforming the WIVU syntax to more traditional syntax trees. In Chapter 8 (starting on page 79) I present my method for transforming these syntax trees to intermediate CGs as a first step. In Chapter 9 (starting on page 103) I present my method for transforming these intermediate CGs to more semantic CGs. Then, in Chapter 10 (starting on page 123), I discuss the complete method, its possible applications, and its philosophical underpinnings. Finally, I conclude the thesis in Chapter 11 (starting on page 139).


Figure 6.1: Overview of method. The Hebrew text plus its syntactic analysis are first transformed to more fine-grained syntax trees. These syntax trees are then transformed to intermediate CGs using syntax-transformation rules, a Hebrew-English lexicon, an ontology, and a relation hierarchy. These intermediate CGs are then transformed to semantic CGs using premise-conclusion-based transformation rules, plus the same ontology and relation hierarchy as were used for the intermediate CGs.


Chapter 7

Transforming the WIVU syntax

7.1 Introduction

In this chapter, my goal is to show the inherent problems in dealing with the WIVU syntax for my purposes, and how I solve these problems. First, I briefly recapitulate the most important tenets of the WIVU syntax trees (7.2). I then discuss what the problems involved are (7.3). I then show how I transform these WIVU syntax trees into "better", more "finely grained" syntax trees which are closer to traditional transformational-generational syntax trees (7.4). To make my description less abstract, I provide an example with references to the algorithms (7.5). Finally, I conclude the chapter (7.6).

7.2 The bare WIVU syntax trees

The WIVU syntactic analysis comprises four layers (word, phrase, clause, and sentence). The upper three layers (phrase, clause, and sentence) exist in two versions: functional units (phrase, clause, sentence) and distributional units (phrase atom, clause atom, sentence atom). The distributional units often coincide with their functional counterparts, but sometimes a functional unit is made up of more than one distributional unit.

The functional phrases make up the largest units that have a function at clause level. Phrase atoms, when different from their functional counterparts, make up the level just below the functional level. In between word level and phrase-atom level we find the subphrase level, which further breaks down the phrase-atom level into functional categories such as "parallel", "genitive", "attributive", and "modifier".

7.3 Problems of the WIVU syntax trees

The problems with the WIVU syntax trees, for my purposes, are the following:

• They are not strict trees, since the phrase-atom level and the functional-phrase level are not in a strict tree-hierarchy. Instead, the two hierarchies (functional and distributional) are parallel hierarchies, forming two separate trees.

Figure 7.1: Example regens-rectum structure

  A       B       C
| REG   | rec   |
        | REG   | rec   |

There are two regens-rectum structures, sharing the middle term (B). The top-layer regens-rectum structure consists of "A" and "B". The bottom-layer regens-rectum structure consists of "B" and "C". It is not possible to make this into a tree without changing one of the subphrase units.

• They are not fine-grained enough. The phrases are much too large and unwieldy, even if one considers the phrase-atom level. The phrases are often so large that joining their words becomes difficult, since the phrases lack internal structure.

• The subphrase level, while attempting to break up the large phrases, sometimes follows conventions that make it less than easy to fit directly into a tree structure. Consider, for example, the regens-rectum structure in Figure 7.1. (Regens-rectum is the Hebrew counterpart, if not direct equivalent, of the Indo-European genitive structure.) The trouble with fitting this structure into a tree is that the word B would need two nodes in the tree: one to express the rectum relationship with the word A, and one to express the REGens relationship with the word C. This is impossible in a tree structure, since no matter which way one turns the relationships, either B's rectum relationship with A would need to dominate both B and C, or B's REGens relationship with the word C would need to dominate both A and B. Thus the subphrases do not fit nicely into a tree structure.
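The tree problem can be restated computationally in terms of monad (word position) sets: two constituents fit in one tree only if their monad sets nest or are disjoint. A small sketch, with hypothetical positions A=1, B=2, C=3:

```python
# The two regens-rectum subphrases {A, B} and {B, C} overlap without
# nesting, so no tree can hold both unchanged. Positions are
# hypothetical: A=1, B=2, C=3.

def tree_compatible(x, y):
    """True iff monad sets x and y could both be nodes of one tree:
    they must nest (one contains the other) or be disjoint."""
    return x <= y or y <= x or not (x & y)

top_subphrase = {1, 2}      # REG(A) + rec(B)
bottom_subphrase = {2, 3}   # REG(B) + rec(C)
```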

7.4 Making better syntax trees

7.4.1 Introduction

I have developed a solution to all of the problems described in the previous section. My solution involves doing away with the distributional level, unifying the distributional levels with their functional counterparts, in certain situations transforming the subphrases to strict trees, then doing away with the subphrase level, and creating a true tree.

The tree’s structure is preserved by a feature on each syntactic Emdros object called “id_up”, which “points to” the direct ancestor in the tree of that node. The “id_up” of a child A points to the id_d of its ancestor B, so that B.self = A.id_up.¹

1 See section 2.4.2 on page 29 for an explanation of the notations B.self and A.id_up.
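As a minimal sketch of how this parent-pointer scheme encodes a tree (the Node class and the children_of helper below are hypothetical illustrations, not the actual Jython code; only the id_d and id_up fields correspond to the text):

```python
# Minimal sketch of the id_up parent-pointer scheme. The Node class and
# the children_of helper are hypothetical; only id_d and id_up correspond
# to the thesis.

class Node:
    def __init__(self, id_d, id_up):
        self.id_d = id_d    # the object's own id ("self" in Emdros terms)
        self.id_up = id_up  # id_d of the direct ancestor; 0 at the root

def children_of(nodes, parent_id):
    """Recover a node's immediate constituents from the id_up pointers."""
    return [n for n in nodes if n.id_up == parent_id]

# A clause (10002) dominating a phrase (10030) dominating a word (10011):
nodes = [Node(10002, 0), Node(10030, 10002), Node(10011, 10030)]
```

The point of the scheme is that the id_up pointers alone determine the whole tree, so no explicit child lists need to be stored on the objects.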

Page 69: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

7.4. MAKING BETTER SYNTAX TREES 69

In the following, I give the main thrust of the algorithms that make up the transformation from WIVU syntax to more traditional syntax trees.

7.4.2 Main transformation algorithm

My solution is embodied in a Jython script which does the following:

1. Load the Emdros database into memory (all of it, or only a part). Call the in-memory database “indb”.

2. Create an empty in-memory Emdros database. Call this in-memory database “outdb”.

3. Copy the functional sentences, the functional clauses, and the words directly from indb to outdb, but create new id_ds in outdb.

4. Take each phrase in indb, and call the subroutine “transform_phrase” on it in turn (see below).

5. Sew the words onto the lowest phrases that exist at this point. I.e., make sure that each word’s id_up feature points to the phrase object that is lowest in the hierarchy and which contains the monad of the word.
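The five steps above can be sketched over toy in-memory databases as follows. This is a hypothetical simplification: plain dicts stand in for Emdros databases, and all field names except id_d and id_up are invented for illustration.

```python
# Sketch of the main transformation loop (steps 2-5) over toy in-memory
# databases; plain dicts stand in for Emdros databases, and all field
# names except id_d/id_up are hypothetical simplifications.

def transform_database(indb, transform_phrase):
    # Steps 2-3: a new database with clauses and words copied over.
    outdb = {"clauses": [dict(c) for c in indb["clauses"]],
             "words": [dict(w) for w in indb["words"]],
             "phrases": []}
    # Step 4: transform each phrase into outdb.
    for phrase in indb["phrases"]:
        transform_phrase(phrase, outdb)
    # Step 5: sew each word onto the lowest (narrowest) containing phrase.
    for word in outdb["words"]:
        containing = [p for p in outdb["phrases"]
                      if p["first"] <= word["monad"] <= p["last"]]
        lowest = min(containing, key=lambda p: p["last"] - p["first"])
        word["id_up"] = lowest["id_d"]
    return outdb

# Toy database: two phrases, the narrower one nested inside the wider one.
indb = {"clauses": [{"id_d": 1}],
        "words": [{"monad": 19, "id_d": 10011}, {"monad": 20, "id_d": 10012}],
        "phrases": [{"id_d": 100, "first": 19, "last": 20},
                    {"id_d": 101, "first": 19, "last": 19}]}
outdb = transform_database(indb, lambda p, db: db["phrases"].append(dict(p)))
```

Note how the word at monad 19 ends up attached to the narrower phrase, exactly as step 5 requires.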

7.4.3 transform_phrase

The “transform_phrase” subroutine takes a phrase as input and transforms its phrase atoms and subphrases. First, the phrase is split into its constituent phrase atoms (if there is more than one phrase atom). Then, any parallel subphrases are transformed, using a subroutine. Then, any PPs that result from this are broken down into their constituent preposition(s) and the object NP. Finally, any regens/rectum subphrases are transformed, using the same subroutine as for parallel subphrases.

The algorithm does the following:

1. Get the clause in outdb which contains the input phrase.

2. Make a copy of the input phrase and call it “copyphrase”. Give it a new id_d which is unique in outdb.

3. Set copyphrase.id_up to the id_d of the containing clause.

4. Add copyphrase to outdb.

5. Split copyphrase into its constituent phrase atoms, if different from copyphrase. Create a “phrase” from each phrase atom that resembles a functional phrase in its features, including a “function” which is the same as copyphrase.function. Set each new phrase’s id_up to copyphrase.self. Add each to outdb.


70 CHAPTER 7. TRANSFORMING THE WIVU SYNTAX

6. Transform parallel subphrases by means of the “transform_phrase_subphrase” subroutine (see below).

7. Transform PP phrases that exist in outdb at this point into their Prep and NP constituents. Each NP is given the function “PPObj”. The algorithm looks for the first non-preposition as the beginning of the NP, and takes the end of the input PP phrase as the end of the NP.

8. Transform regens/rectum subphrases by means of the “transform_phrase_subphrase” subroutine (see below).

I could have chosen to transform “attribute” and “modifier” subphrases as well, but decided that it was too much effort for too little gain; the law of diminishing returns.
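The PP-splitting heuristic of step 7 (the object NP starts at the first non-preposition and runs to the end of the PP) can be sketched like this, with words modelled as (monad, part-of-speech) pairs; a hypothetical simplification:

```python
# Sketch of the PP-splitting heuristic of step 7: the object NP starts at
# the first non-preposition word and runs to the end of the PP. Words are
# modelled as (monad, part_of_speech) pairs; a hypothetical simplification.

def split_pp(words):
    """Return (prepositions, object_np) as two lists of words."""
    for i, (_, pos) in enumerate(words):
        if pos != "prep":
            return words[:i], words[i:]  # the NP gets the function "PPObj"
    return words, []                     # degenerate case: no object NP

# "over the face of the deep" (monads 21-23): a preposition plus two nouns.
preps, obj_np = split_pp([(21, "prep"), (22, "n"), (23, "n")])
```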

7.4.4 transform_phrase_subphrase

The “transform_phrase_subphrase” method takes as input the highest phrase that we are dealing with (copyphrase), and two strings (“MOTHER_type_string” and “daughter_type_string”) which show, respectively, the name of the mother’s type (e.g., “REG”) and the name of the daughter’s type (e.g., “rec”). The goal is to transform the subphrases of the specified kind (REGens/rectum or PARallel) into a tree. To do this, some subphrase-patterns are transformed into strict trees using a subroutine (described in Section 7.4.5). Then, having gotten a tree, another subroutine is called (described in Section 7.4.6) for transforming this tree. The algorithm takes into consideration the fact that parallel structures are left-branching whereas regens/rectum structures are right-branching. This is reflected, e.g., in the way the lists of subphrases are sorted, whether they are sorted first-to-last or last-to-first.

The algorithm runs as follows:

1. Get a list of the id_ds of the subphrases having monads in the monads of copyphrase.

2. If there are none, return at this point.

3. Otherwise, sift through the subphrases, building two lists of objects, one that contains the mothers (subphrases_MOTHER_objects) and one that contains the daughters (subphrases_daughter_objects), based on MOTHER_type_string and daughter_type_string.

4. Sort each list. If mother is a “PAR”, sort them such that objects which start first and end sooner come before objects which start later and/or end later. If mother is a “REG”, do the reverse sort.

5. Run through each list, and set a flag (“hasBeenSeen”) in each object to “false”. This says that the object has not been “seen” (or “visited”) yet.

6. If mother’s type is “PAR”: For each mother object, do the following:

(a) If the mother has already been “seen” (as indicated by the “hasBeenSeen” flag), don’t do anything.


(b) Otherwise:

i. get a list of the objects that comprise a “string” of subphrases. Use the “get_subphrase_string_left_branching” algorithm (shown below).

ii. Sort the list objects of the subphrase string so that objects that start first and end sooner are before objects that start later and/or end later.

iii. Call subroutine “process_subphrase_list” (7.4.6) with this sorted list of objects as an argument.

7. Otherwise, if mother’s type is “REG”: For each daughter object, do the following:

(a) If the daughter has already been “seen” (as indicated by the “hasBeenSeen” flag), don’t do anything.

(b) Otherwise:

i. get a list of the objects that comprise a “string” of subphrases. Use the “get_subphrase_string_right_branching” algorithm. This is the counterpart of the “get_subphrase_string_left_branching” algorithm, but makes a right-branching tree instead of a left-branching tree.

ii. Sort the list objects of the subphrase string so that objects that end last and start later are before objects that end sooner and/or start sooner.

iii. Call subroutine “process_subphrase_list” with this sorted list of objects as an argument.
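The two sort orders of step 4 can be sketched as follows, with subphrases modelled as dicts holding first and last monads (a hypothetical simplification):

```python
# Sketch of the two sort orders of step 4: first-to-last for "PAR"
# (left-branching), last-to-first for "REG" (right-branching). Subphrases
# are modelled as dicts with first/last monads; a hypothetical simplification.

def sort_subphrases(subphrases, mother_type):
    if mother_type == "PAR":
        # objects that start first and end sooner come before the others
        return sorted(subphrases, key=lambda s: (s["first"], s["last"]))
    else:  # "REG": the reverse sort
        return sorted(subphrases, key=lambda s: (-s["last"], -s["first"]))

subs = [{"first": 2, "last": 3}, {"first": 1, "last": 2}]
```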

7.4.5 get_subphrase_string_left_branching

The “get_subphrase_string_left_branching” subroutine takes as input a mother subphrase, a list of mother objects of the same type, and a list of daughter objects of the same type. It returns a list of subphrases which contains a “string” of subphrases that are connected by overlap, if there are more than two subphrases involved. If only one pair is involved, that pair is returned. It also transforms the subphrases into a strict tree by making sure that any overlapping structures (such as those in Figure 7.2 on the following page) are transformed into a full left-branching tree. The example in Figure 7.2 would be transformed into the structure in Figure 7.3. This can clearly be transformed into a tree, since all nodes are fully contained, pair-wise, within another node (the top-most PAR/par pair being contained in the containing phrase).

The algorithm does the following:

1. Create an empty result list.

2. Set the mother object’s “hasBeenSeen” flag to “true”.

3. Find the daughter of the mother. Call it “obj_daughter”.

4. Set obj_daughter’s “hasBeenSeen” flag to “true”.


A     B     C
| PAR | par |
      | PAR | par |

Figure 7.2: Overlapping parallel subphrase example

Like the regens-rectum structure in Figure 7.1, this cannot be transformed into a tree structure without changing one of the subphrase units.

A     B     C
| PAR | par |
| PAR       | par |

Figure 7.3: Transformed, left-branching parallel example

This transformed version of Figure 7.2 is a left-branching tree. The middle PAR subphrase has been extended over both A and B.

5. Append mother and daughter to the list to be returned.

6. Check to see whether there is a mother in the list of mothers that has the same set of monads as the daughter. Call it “obj_second_mother” if it is there.

7. If there is such a second mother:

(a) Create a new mother object on the basis of “obj_second_mother”. Call it “new_mother”.

(b) Set new_mother’s set of monads to be the range from the input mother’s first monad to obj_second_mother’s last monad. This effectively extends it over both the mother and the daughter, thereby creating a left-branching structure.

(c) Set obj_second_mother’s “hasBeenSeen” flag to “true”.

(d) Call ourselves (“get_subphrase_string”) recursively, with “new_mother” as the mother. Extend our result list by the results of the recursive call when it returns. This ensures that the second mother’s daughter is also added to the list, along with new_mother, and that any second mother equal to the second daughter’s set of monads also gets added, until the list of mothers is exhausted.

Note that this only shows the algorithm for making left-branching strings. The algorithm for right-branching strings is analogous, but changes a few things, such as treating daughters where this algorithm treats mothers.
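The effect of the left-branching construction, namely that each second mother is extended back over everything collected so far so that the resulting monad ranges nest, can be sketched over (first, last) monad pairs (a hypothetical simplification of the actual subroutine):

```python
# Sketch of the left-branching "string" construction: each second mother
# is extended back to the first mother's first monad, so the resulting
# monad ranges nest pairwise. Input pairs are (mother_range, daughter_range)
# tuples of (first, last) monads; a hypothetical simplification.

def left_branching_string(pairs):
    result = []
    (m_first, m_last), daughter = pairs[0]
    result.extend([(m_first, m_last), daughter])
    for (_, sm_last), next_daughter in pairs[1:]:
        new_mother = (m_first, sm_last)   # extend over mother and daughter
        result.extend([new_mother, next_daughter])
    return result

# Figure 7.2 -> Figure 7.3: PAR/par pairs (A,B) and (B,C) over monads 1-3.
tree = left_branching_string([((1, 1), (2, 2)), ((2, 2), (3, 3))])
```

The returned ranges nest pairwise, which is exactly the property a strict tree requires.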

The left-branching structure is chosen for the parallel subphrases, whereas the right-branching structure is chosen for the REG/rec subphrases. Why this choice?


A      B     C
bridle horse king
| REG | rec        |
      | REG | rec |

Figure 7.4: Transformed right-branching regens-rectum example

This transformed version of Figure 7.1 on page 68 is a right-branching tree. This has been accomplished by extending the middle “rec” subphrase over “B” and “C”. Note that the two lines have been swapped since Figure 7.1.

The parallel subphrases get a left-branching structure because this structure is found elsewhere in the WI’s treatment of parallel subphrases. It also makes good sense, since the list is thus built up from the front, with later elements being tacked onto the end. Had the structure been right-branching, the list would have been built up from the back, which is counterintuitive, given that lists are spoken in a given order, which is ordered by the textual sequence.

The regens/rectum subphrases get a right-branching structure because it matches the semantics of the regens/rectum structure. For example, if the words “A B C” in Figure 7.1 on page 68 were “bridle horse king”, the semantics would be “bridle of the horse of the king”, not “bridle’s horse’s king”. The phrase “bridle of the horse of the king” is right-branching in that “horse” and “king” are more tightly bound than “bridle” and “horse”: it is “bridle” and “horse of the king” that are bound together at the higher level. This is depicted in Figure 7.4. Thus right-branching structures are necessary for regens/rectum.

7.4.6 process_subphrase_list

The “process_subphrase_list” subroutine takes as arguments a sorted list of objects making up a “subphrase string”, and the MOTHER_type_string from “transform_phrase_subphrase”. It basically calls the next algorithm, process_subphrase, for all mothers in the list. However, if process_subphrase has already processed the mother, this algorithm bypasses it. This is done by means of a flag called “hasBeenSeen2” on each object.

It does the following:

1. For each object in the list of objects, set a flag “hasBeenSeen2” to false.

2. For each object in the object list:

(a) If the object’s “hasBeenSeen2” flag is true, don’t do anything with it.

(b) Otherwise, if the object is of type MOTHER_type_string, call the subroutine “process_subphrase”.
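In outline, the subroutine can be sketched as follows, with objects as dicts and process_subphrase passed in as a callback (a hypothetical simplification):

```python
# Sketch of process_subphrase_list: call the supplied process_subphrase
# callback once per not-yet-seen mother in the sorted subphrase string.
# Objects are modelled as dicts; a hypothetical simplification.

def process_subphrase_list(objects, mother_type, process_subphrase):
    for obj in objects:                  # step 1: clear the flag
        obj["hasBeenSeen2"] = False
    for obj in objects:                  # step 2
        if obj["hasBeenSeen2"]:
            continue                     # already processed; bypass it
        if obj["type"] == mother_type:
            process_subphrase(obj)       # expected to mark objects as seen

objs = [{"type": "REG"}, {"type": "rec"}, {"type": "REG"}]
calls = []
def mark_seen(obj):
    obj["hasBeenSeen2"] = True
    calls.append(obj["type"])
process_subphrase_list(objs, "REG", mark_seen)
```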


7.4.7 process_subphrase

The process_subphrase algorithm processes a mother subphrase and its concomitant daughter subphrase. It creates new phrases based on the parent phrase. It copies the parent phrase (thereby taking over its phrase type), and sets the id_up of each new phrase to the id_d of the parent.

The algorithm does the following:

1. Get the daughter that belongs to the mother.

2. Set both daughter and mother subphrase’s “hasBeenSeen2” flag to “true”.

3. Create a set of monads that is the union of both daughter and mother. Call it “som”.

4. Get the phrase in outdb which is lowest in the hierarchy yet contains all the monads of “som”. Call it “lowest_phrase_obj”.

5. Make a copy of “lowest_phrase_obj”, and set its set of monads to the mother’s set of monads, set its function to the mother’s type, autogenerate a unique id_d for it, and set its id_up to lowest_phrase_obj.self.

6. Do something similar for the daughter object.

7. For both new phrases, check whether the parent is a PP, yet this object contains no preposition in front. If this is the case, set its “phrase_type” to “NP”. This is to deal with those cases where the parent is a PP, but the two parallel things are strictly inside the PP’s object NP.

8. Add both to outdb.
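The core of steps 5-7 can be sketched as follows, with phrases and subphrases as dicts; all field names except id_d and id_up are hypothetical stand-ins:

```python
# Sketch of process_subphrase (steps 5-7): copy the lowest containing
# phrase twice, narrow each copy to the mother's or daughter's monads,
# and hang both copies under it via id_up. Dict-based model; all field
# names except id_d/id_up are hypothetical stand-ins.

def process_subphrase(mother, daughter, lowest_phrase, next_id):
    new_phrases = []
    for sub in (mother, daughter):
        phrase = dict(lowest_phrase)             # copy the parent phrase
        phrase["monads"] = sub["monads"]         # narrow to the subphrase
        phrase["function"] = sub["type"]         # e.g. "REG" or "rec"
        phrase["id_d"] = next_id                 # fresh unique id_d
        next_id += 1
        phrase["id_up"] = lowest_phrase["id_d"]  # hang it under the parent
        # Step 7: a PP parent whose piece has no preposition in front
        # yields an NP (the pieces sit inside the PP's object NP).
        if phrase["phrase_type"] == "PP" and not sub.get("starts_with_prep"):
            phrase["phrase_type"] = "NP"
        new_phrases.append(phrase)
    return new_phrases

# The REG/rec pair of the example in Section 7.5 (monads 22 and 23):
lowest = {"id_d": 10033, "phrase_type": "NP", "monads": (22, 23)}
new = process_subphrase({"monads": (22, 22), "type": "REG"},
                        {"monads": (23, 23), "type": "rec"}, lowest, 10034)
```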

7.5 Example

All of this can seem very abstract. Therefore, I shall illuminate certain aspects of the method by means of an example.

Consider Figure 7.5 on page 76. It shows the WIVU analysis of the Hebrew for “Darkness was on the face of the deep” from Gen 1:2. At the top you see the row of monads (19-23). Then comes a row with the words, followed by rows with phrases, phrase atoms, subphrases, and the clause. Each object in each row is delimited by lines, thereby showing the monads of each object. Each object has an id_d (e.g., 10020 for the first Word-object). Each object further has a number of features. The most important ones are: For phrases: function (func) and phrase type (phrtyp). For phrase_atom: phrase atom type (pat). For subphrase: mother² (moth), kind³, and type.⁴

2 The mother is a pointer to the id_d of the mother, or head subphrase. If this is 0, then the subphrase object is a mother. If it is non-0, then the subphrase is a daughter, and the mother feature points to the mother via the mother’s id_d.

3 The kind is either “moth” (mother) or “daugh” (daughter).
4 The type shows whether it is a parallel construction, a regens/rectum construction, or a number of other types. In the example, the construction is a “REG(ens)” / “rec(tum)” construction.


7.6. CONCLUSION 75

The most interesting aspects of this analysis are the following:

• In all cases, the phrase atoms coincide with the phrases. There is no phrase atom which is smaller than its functional phrase counterpart.

• One phrase-atom (with id_d 10951, the one that extends over monads 21-23) is a PP. Therefore, it will be split into a preposition plus an NP.

• The presence of the regens/rectum subphrases at the end means that this NP will be split further into sub-NPs.

Consider now Figure 7.6 on page 77. It shows the result of transforming the WIVU syntax in Figure 7.5.

First, note how each object now has a new feature, called id_up. This points to the immediate ancestor in the tree. Thus the first phrase (with id_d 10030, which consists of the monad-set {19}) has its id_up feature set to “10002”, which is the clause. This shows that this phrase is an immediate constituent in the clause. Note also how the first word (with id_d 10011 and the same monad set as the phrase) has its id_up feature set to “10030”, which is the phrase just mentioned. Thus this word is an immediate constituent in that phrase. This system of id_up preserves the structure of the tree.

Second, note how there are no phrase atoms. They have been done away with in the transform_phrase algorithm (Section 7.4.3 on page 69). Note also that as a consequence of this, and the fact that all the phrase atoms coincided with their corresponding functional phrases, the first three phrases now have words as their immediate constituents.

Third, consider the multiple levels of phrases in monads 21-23. We can see that the PP phrase (id_d 10032) has been split into a preposition (id_d 10013) and an NP (id_d 10033). This was also done in the algorithm in Section 7.4.3 on page 69. Then, because of the regens-rectum subphrases, this NP (id_d 10033) has been split further into two separate NPs (id_ds 10034 and 10035). These new NPs have their id_up feature set to the parent NP (id_d 10033). This was done in the algorithms described in Sections 7.4.4 on page 70, 7.4.5 on page 71, 7.4.6 on page 73, and 7.4.7 on the preceding page. The algorithm in Section 7.4.4 basically figured out what subphrases were present, sorted them and made them into a tree (using the algorithm in Section 7.4.5), then called the algorithm in Section 7.4.6 with this tree. This algorithm, in turn, called the algorithm in Section 7.4.7 to do the actual splitting.

Thus I have given an example of how all parts of my algorithm work.

7.6 Conclusion

I have developed a method for transforming the syntactic analysis of the WIVU database into a more traditional syntax tree. It only contains three object types: word, phrase, and clause, and is thus a simplification from the WIVU syntax. Functional phrases are split into their phrase atoms. The resulting phrases are split for parallel subphrases, PPs, and regens/rectum subphrases. Parallel subphrases give rise to a left-branching structure, whereas regens/rectum gives rise to a


Gen 1:2, monads 19-23:

word:
  monad 19: w: 10020, oldlxm: W, gw: W:-, pdpsp: cj
  monad 20: w: 10021, oldlxm: XCK/, gw: XO73CEk:, pdpsp: n
  monad 21: w: 10022, oldlxm: <L, gw: <AL&, pdpsp: prep
  monad 22: w: 10023, oldlxm: PNH/, gw: P.:N;74J, pdpsp: n
  monad 23: w: 10024, oldlxm: THWM/, gw: T:HO92Wm, pdpsp: n
phrase:
  monad 19: p: 11345, func: Conj, phrtyp: CjP, det: NA
  monad 20: p: 11346, func: Subj, phrtyp: NP, det: indet
  monads 21-23: p: 11347, func: PreC, phrtyp: PP, det: indet
phrase_atom:
  monad 19: p: 10949, det: NA, patyp: CjP
  monad 20: p: 10950, det: indet, patyp: NP
  monads 21-23: p: 10951, det: indet, patyp: PP
subphrase:
  monad 22: s: 11728, moth: 0, kind: moth, type: REG
  monad 23: s: 11729, moth: 11728, kind: daugh, type: rec
clause:
  monads 19-23: c: 10677, text_type: ?, moth: 0, cltype: NmCl, ccrel: none

Figure 7.5: WIVU syntax of “Darkness was on the face of the deep”

The monads are shown along the top. There are five object types: word, phrase, phrase atom, subphrase, and clause. Each object is delimited by lines, and has an ID (e.g., 10020 for the first word) and a number of features.


Gen 1:2, monads 19-23:

word:
  monad 19: w: 10011, oldlxm: W, gw: W:-, id_up: 10030, pdpsp: cj
  monad 20: w: 10012, oldlxm: XCK/, gw: XO73CEk:, id_up: 10031, pdpsp: n
  monad 21: w: 10013, oldlxm: <L, gw: <AL&, id_up: 10032, pdpsp: prep
  monad 22: w: 10014, oldlxm: PNH/, gw: P.:N;74J, id_up: 10034, pdpsp: n
  monad 23: w: 10015, oldlxm: THWM/, gw: T:HO92Wm, id_up: 10035, pdpsp: n
phrase:
  monad 19: p: 10030, phrtyp: CjP, func: Conj, det: NA, id_up: 10002
  monad 20: p: 10031, phrtyp: NP, func: Subj, det: indet, id_up: 10002
  monads 21-23: p: 10032, phrtyp: PP, func: PreC, det: indet, id_up: 10002
  monads 22-23: p: 10033, func: PPobj, phrtyp: NP, det: indet, id_up: 10032
  monad 22: p: 10034, phrtyp: NP, func: REG, det: indet, id_up: 10033
  monad 23: p: 10035, phrtyp: NP, func: rec, det: indet, id_up: 10033
clause:
  monads 19-23: c: 10002, text_type: ?, moth: 0, id_up: 0, cltype: NmCl, ccrel: none

Figure 7.6: Transformed syntax of “Darkness was on the face of the deep”

The monads are shown along the top. There are three object types: word, phrase, and clause. Each object is delimited by lines, and has an ID (e.g., 10011 for the first word) and a number of features.


right-branching structure. This reflects the semantics of the two kinds of subphrases. The result is a proper syntax tree with an acceptable level of granularity.


Chapter 8

From syntax to CGs

8.1 Introduction

In this chapter, I discuss my method for transforming the syntax trees uncovered in the previous chapter into CGs. For word-level and phrase-level, the method is very close to what Barrière does. For clause-level, however, I take up ideas from Nicolas.

First I describe the general approach (8.2). Then I describe the rules whereby the syntactic structures are transformed at phrase- and word-level, first generally and then specifically for each syntactic category (8.3). Then I describe how I treat clause-level (8.4), followed by a description of the algorithms used (8.5). I then treat the relations used in the previous sections (8.6). I then give an example of how the rules are applied, with references to the previous sections (8.7). Finally, I conclude the chapter (8.8).

8.2 General approach

The general approach of my method can be described as “syntax-directed, rule-based joining of conceptual graphs”. As such, it is similar to what Barrière does in the first step of her transformation to CGs, and it is also similar to what Sowa and Way do, along with their followers.

The key difference between what I do and what Sowa and Way do can be summarized in two points: a) I use my ontology, rather than canonical graphs, for the leaf nodes, and b) I use only the signature of relations rather than full maximal joins of canonical graphs to guide the join.

The syntax tree is traversed in a depth-first order, thus starting with the leaf nodes (i.e., words) and working one’s way up through the tree.

The output is not a graph but a list of graphs. Whenever there is ambiguity (such as a polysemous word), the graphs in the list are duplicated as many times as the ambiguity allows for, and then added to the list, barring joins that are not allowed by the signatures of relations.
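This bookkeeping can be sketched as a filtered cross-product. In this hypothetical simplification, graphs are reduced to tuples of sense labels, and the function passed as allowed stands in for the relation-signature check:

```python
# Sketch of the ambiguity handling: the running list of graphs is
# duplicated once per word sense, minus any combination the relation
# signatures disallow. Graphs are reduced to tuples of sense labels and
# "allowed" stands in for the signature check; both are hypothetical.

def expand_ambiguity(graphs, senses, allowed):
    return [g + (s,) for g in graphs for s in senses if allowed(g, s)]

graphs = [("darkness_n",)]               # one graph so far
senses = ["face_n1", "face_n2"]          # a polysemous word with two senses
expanded = expand_ambiguity(graphs, senses, lambda g, s: True)
```

With no filter, the list doubles; a restrictive signature check prunes the disallowed combinations instead.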

The rule-based join proceeds up until the clause-level is reached. At that point, a different algorithm takes over with different heuristics. This algorithm simply joins the conceptual graphs discovered for the phrases below clause-level by taking the predicate (or predicate complement) as the head concept and then joining the rest of the conceptual graphs to this head concept with



80 CHAPTER 8. FROM SYNTAX TO CGS

relations that have the same name as the clause labels with which they are labeled. It is up to the third stage to take these syntactic relations and make them into more semantic relations or constructions.

8.3 Phrase- and word-level rules

8.3.1 Introduction

In this section, I introduce and discuss the rules which I have created for treating phrase- and word-level. I first discuss general properties of the rules (8.3.2). I then describe and discuss the rules, listed by grammatical category, first words and then phrases: Noun (8.3.3), Verb (8.3.4), Adjective (8.3.5), Adverb (8.3.6), Preposition (8.3.7), Conjunction (8.3.8), NP (8.3.9), VP (8.3.10), PP (8.3.11), CjP (8.3.12), and AP and AdvP (8.3.13). Finally, I conclude the section (8.3.14).

8.3.2 Properties of rules

The rules have the following properties:

1. The input is either a word with concomitant morphological attributes, or a phrase-structure rule for a phrase.

2. The output is a list of (fragments of) conceptual graphs, not a single conceptual graph. This is so as to be able to deal with ambiguity. However, when there is no ambiguity, the list contains only one conceptual graph.

3. The output always has a specially privileged node which is an attachment point for the rest of the graph when joining at a higher level. This node is marked by being attached to a relation called “attach”, and may have the Universal type as a concept type if the type is to be determined at a higher level.

4. The rules take advantage of a lexicon for words which are not in the ontology.

5. For words in the ontology, all the possible senses of the word are produced, each giving rise to one conceptual graph in the output list, except when this is prohibited by relation signatures.

6. Whenever a grammatical category (such as “noun” or “NP/REG”) is used as the concept type, it is understood that what is meant is the concept that has the attachment point from whatever production rule it stands for.

Below I show the rules, listed by grammatical category. First I treat the four parts of speech which are present in the WordNet ontology, then I treat the rest of the parts of speech in my text, after which I treat each of the different kinds of phrase structure rules for phrases.


8.3. PHRASE- AND WORD-LEVEL RULES 81

Suffix | Meaning                            | Replacement string
J      | 1 sing on noun (possessive suffix) | my
K      | 2 masc sing                        | your_sing
K=     | 2 fem sing                         | your_sing
W      | 3 masc sing (primarily on noun)    | his_or_its
H      | 3 fem sing                         | hers_or_its
NW     | 1 plur                             | our
KM     | 2 masc plur                        | your_plural
KN     | 2 fem plur                         | your_plural
HM     | 3 masc plur                        | their
M      | 3 masc plur                        | their
MW     | 3 masc plur (archaic)              | their
HN     | 3 fem plur                         | their
N      | 3 fem plur                         | their

Table 8.1: Suffix conversion strings for possessive suffix

The tables contain a left-hand side which is the input, and a right-hand side which is the output. In the output, emphasized words are to be taken from the input and replaced by whatever means are available. If the input is a noun, verb, adjective, or adverb, the output is taken from the ontology. If the input is a suffix, the output is taken from the list of strings in Tables 8.1 and 8.2 on the next page. The tables are based on personal communication with teol.dr. Nicolai Winther-Nielsen, a hebraist and associate professor of Old Testament studies at Copenhagen Lutheran School of Theology, to whom I express my gratitude, and parts of them can also be found in Petersen (1997a).

8.3.3 Noun

See Table 8.3 on the following page.

Plural nouns get the usual “{*}” plurality-marker, whereas singular nouns don’t get it. Nouns with a suffix are further distinguished with an indexical which is the suffix replacement string of Table 8.1. If the noun is “>LHJM/” (“God”), then, even though the noun is plural, it is not given the “{*}” plurality-marker. This is because the meaning is not “Gods”, but “God” (in the singular). After all, we are dealing with the holy writ of two monotheistic religions here.
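Read as code, Table 8.3 amounts to the following rule. The function is a hypothetical sketch: the CG fragment is rendered as a plain string, and only the “>LHJM/” special case and the referent fields come from the text.

```python
# Sketch of the noun rules of Table 8.3, with the output CG fragment
# rendered as a plain string. The function and its signature are
# hypothetical; the ">LHJM/" ("God") special case is from the text.

def noun_rule(noun, plural, suffix=None, lexeme=None):
    referent = []
    if plural and lexeme != ">LHJM/":   # "God" stays singular in meaning
        referent.append("{*}")          # the usual plurality-marker
    if suffix is not None:
        referent.append("#" + suffix)   # indexical from Table 8.1
    if referent:
        return "[%s: %s]<-attach" % (noun, " ".join(referent))
    return "[%s]<-attach" % noun
```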

8.3.4 Verb

See Table 8.4 on the following page.

There are two-by-two cases: One dimension is whether the verb in question is flagged in the ontology as a “be_X”, e.g., Hebrew “VWB[” = “be_good”. The other dimension is whether the verb has a subject suffix.


Suffix | Meaning                          | Replacement string
NJ     | 1 sing on verb (object suffix)   | I
K      | 2 masc sing                      | you_sing
K=     | 2 fem sing                       | you_sing
W      | 3 masc sing (primarily on noun)  | he_or_it
HW     | 3 masc sing (only on verb)       | he_or_it
H      | 3 fem sing                       | she_or_it
NW     | 1 plur                           | we
KM     | 2 masc plur                      | you_plural
KN     | 2 fem plur                       | you_plural
HM     | 3 masc plur                      | they
M      | 3 masc plur                      | they
MW     | 3 masc plur (archaic)            | they
HN     | 3 fem plur                       | they
N      | 3 fem plur                       | they

Table 8.2: Suffix conversion strings for subject suffix

input                                 | output
noun, singular, no suffix             | [noun]<-attach
noun, plural, no suffix, is not “GOD” | [noun: {*}]<-attach
noun, plural, is “GOD”                | [noun]<-attach
noun, singular, suffix                | [noun: #suffix]<-attach
noun, plural, suffix, is not “GOD”    | [noun: {*} #suffix]<-attach
noun, plural, suffix, is “GOD”        | [noun: #suffix]<-attach

Table 8.3: Rules for nouns

input                          | output
verb, no suffix, is not “be_X” | [verb]<-attach
verb, suffix, is not “be_X”    | attach->[verb]->agnt->[Universal: #suffix]
verb, no suffix, is “be_X”     | attach->[State: [be]->attr->[verb]]
verb, suffix, is “be_X”        | attach->[State: [verb]<-attr<-[be]]<-stat<-[Universal: #suffix]

Table 8.4: Rules for verbs


input     | output
adjective | [adjective]<-attach

Table 8.5: Rule for adjectives

input  | output
adverb | [adverb]<-attach

Table 8.6: Rule for adverbs

When the verb is a “be_X” verb, I use the verb “be” and an “attr” relation,¹ enclosed in a State concept. The State concept has the attachment point. When the suffix is present, I use the “STAT” relation to indicate that the subject suffix is in a state.

When the verb is not a “be_X” verb, I simply take the verb’s ontology entry’s English gloss, create a concept from this concept type, and use it as the attachment point. If there is a subject suffix, I link it with an AGNT relation to the verb. The relation could, in some cases, be a STAT relation, e.g., when the verb is a verb of existence like HJH. However, for the sake of simplicity, I will assume an AGNT relation.

The suffixes are taken from Table 8.2 on the facing page.

8.3.5 Adjective

See Table 8.5.

Nothing much to say here: Adjectives are simply taken from the ontology and then encapsulated in a concept and passed on. This leaves the decision of whether it is attributive or predicative at a higher level, where it belongs.

8.3.6 Adverb

See Table 8.6.

Again, adverbs are simply found in the ontology and then passed on. Just as with adjectives, the relation “manr” is added at a higher level.

8.3.7 Preposition

The lexicon for the prepositions in Genesis 1:1-3 is as in Table 8.7 on the next page. These words end up in relations.

1 Recall from point 12 on page 41 that “be_X” verbs, where X is an adjective, end up under “attribute.” Hence, the choice of the “attr” relation is justified.


input | output
B     | in
<L    | over

Table 8.7: Lexicon for prepositions

input | output
W     | and

Table 8.8: Lexicon for conjunctions

8.3.8 Conjunction

The lexicon for the conjunctions in Genesis 1:1-3 is as in Table 8.8. This word ends up in relations.

8.3.9 NP

See Table 8.9.

There is not much to say about NPs from a high-level perspective: When the NP consists of a single noun, that noun becomes the attachment point. If an article is in front of the noun, then the “#” indexical is added. The parallel construction and the regens/rectum construction, however, deserve special mention, and will be treated in the following two subsections.

8.3.9.1 Parallel construction

The PAR/par construction takes the two NPs, connects them with the conjunction, and places them in the context of a concept whose type is the least common supertype of the attachment points of the two NPs. This outer context will then become the attachment point at a higher level.

The PAR/par construction is a left-branching structure. Therefore, the string

input                              output
NP –> noun                         [noun]<-attach
NP –> article noun                 [noun: #]<-attach
NP –> NP/REG NP/rec                attach->[NP/REG]<-poss<-[NP/rec]
NP –> NP/PAR conjunction NP/par    [least_common_supertype(NP/PAR, NP/par): [NP/PAR]->(conjunction)->[NP/par]]<-attach

Table 8.9: Rules for NPs


8.3. PHRASE- AND WORD-LEVEL RULES 85

“[[Abraham and Isaac] and Jacob]”

would yield a structure of the form:

[Person:
  [Person:
    [Person: Abraham]->(and)->[Person: Isaac]
  ]->(and)->[Person: Jacob]
]

Clearly this is contrived. However, the next stage (Chapter 9) in principle allows for such ugly constructions to be rectified. (See the discussion in Section 10.10 on page 135.)
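The least-common-supertype lookup at the heart of the PAR/par rule can be sketched as follows. This is an illustrative Python sketch, not the thesis’ actual Notio-based code: the toy type hierarchy, the function names, and the string representation of graphs are all mine.

```python
# Child -> parent links in a toy type lattice rooted at "Universal"
# (hypothetical; the real ontology is far larger).
PARENTS = {
    "Person": "Animate",
    "Animate": "Entity",
    "Entity": "Universal",
    "Attribute": "Universal",
}

def supertypes(t):
    """Return the chain from t up to the root, t included."""
    chain = [t]
    while t in PARENTS:
        t = PARENTS[t]
        chain.append(t)
    return chain

def least_common_supertype(t1, t2):
    """First type on t1's chain upwards that also dominates t2."""
    ancestors2 = set(supertypes(t2))
    for t in supertypes(t1):
        if t in ancestors2:
            return t
    return "Universal"

def par_construction(np1, np2, conjunction="and"):
    """Join two NP attachment points inside an outer context whose
    type is their least common supertype (cf. Table 8.9)."""
    t1, ref1 = np1
    t2, ref2 = np2
    outer = least_common_supertype(t1, t2)
    return "[%s: [%s: %s]->(%s)->[%s: %s]]" % (
        outer, t1, ref1, conjunction, t2, ref2)

print(par_construction(("Person", "Abraham"), ("Person", "Isaac")))
# [Person: [Person: Abraham]->(and)->[Person: Isaac]]
```

Applying the rule again to the result and [Person: Jacob] would yield the nested, left-branching structure shown above.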

8.3.9.2 Regens/rectum

In the REGENS/rectum relationship, a three-element construction would yield a structure like this, e.g.:

[bridle]<-poss<-[horse]<-poss<-[king]

for the “bridle of the horse of the king” example. Note how the final attachment point would be [bridle], which is indeed the right attachment point: The first element is always the head of the construction. Since the structure is right-branching, its innermost pair sits on the right, and we move from right to left through the structure as we go upwards through the tree. Choosing the leftmost element as the attachment point at each level therefore ensures that the leftmost element of the whole construction becomes the attachment point at the outermost level.
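The right-to-left pairing can be sketched as a fold. This is an illustrative Python sketch under my own naming; strings stand in for the Notio graph objects the thesis actually uses.

```python
def regens_rectum(elements):
    """elements in textual order, e.g. ["bridle", "horse", "king"].
    Pair from the right inwards; the attachment point at every level
    is the left member, so the head survives to the outermost level."""
    graph = "[%s]" % elements[-1]
    for regens in reversed(elements[:-1]):
        graph = "[%s]<-poss<-%s" % (regens, graph)
    return graph

print(regens_rectum(["bridle", "horse", "king"]))
# [bridle]<-poss<-[horse]<-poss<-[king]
```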

Whether the choice of “poss” as the relation to express regens/rectum is always justified remains to be debated. For example, when one speaks of “the surface of the deep”, does one really imply possession, or rather a part-of relationship? As is well known in the linguistic literature, the English preposition “of” is so polysemous as to be almost devoid of meaning. Something similar might be true of the regens/rectum relationship, but I leave that for further research.2

8.3.10 VP

See Table 8.10 on the next page.

There is no difference between PreC and Pred: The PreC is always for participles, and Pred is always for finite verbs. But since conceptual graphs are tense-less in the verbs (tense could be shown with relations at a higher level), there is no difference in the rules.

2 One possible way of solving this problem is to use a preposition-relation such as “of”, and then resolve at a higher level (i.e., using the method of the next chapter) exactly which semantic relation is meant (such as “poss”, “part”, etc.).


input         output
VP –> verb    [verb]<-attach

Table 8.10: Rule for VPs

input                                        output
PP –> preposition NP/PPobj (prep is “>T”)    [NP/PPobj]<-attach
PP –> preposition NP/PPobj (other prep)      attach->[Universal]->(preposition)->[NP/PPobj]
PP –> PP/PAR conjunction PP/par              [least_common_supertype(PP/PAR, PP/par): [PP/PAR]->(conjunction)->[PP/par]]<-attach

Table 8.11: Rules for PPs

8.3.11 PP

See Table 8.11. The preposition “>T” is treated specially, since it usually does not function as a preposition that has any English counterpart. Instead, it is an “object marker”, showing that the NP which it governs is an object, often at clause-level. Thus the NP should be allowed to be carried forward unchanged. For other prepositions, however, the attachment point is a concept of type “Universal”, which is then related to the governed NP by an English translation of the Hebrew preposition. The Universal attachment point is then joined with whatever is the attachment point at a higher level.
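The two non-parallel PP rules of Table 8.11 can be sketched like this. The helper name and the string representation are mine; the preposition glosses come from Table 8.7.

```python
PREP_GLOSS = {"B": "in", "<L": "over"}  # from Table 8.7

def pp_rule(preposition, np_graph):
    """">T" is the Hebrew object marker: pass the NP through unchanged.
    Other prepositions hang the NP off a Universal attachment point
    via a relation named after the English gloss."""
    if preposition == ">T":
        return np_graph
    gloss = PREP_GLOSS.get(preposition, preposition)
    return "[Universal]->(%s)->%s" % (gloss, np_graph)

print(pp_rule(">T", "[heavens: #]"))  # [heavens: #]
print(pp_rule("B", "[beginning]"))    # [Universal]->(in)->[beginning]
```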

The parallel construction is treated analogously to the parallel construction for NPs (see Section 8.3.9.1 on page 84). This gives exactly the same problems as for NPs. The solutions, however, would be similar (see the discussion in Section 10.10 on page 135).

8.3.12 CjP

See Table 8.12. The CjP/Conj phrase is always used as an inter-clause marker. Since I deal only with units up to and including clause-level, I do not wish to represent inter-clausal markers.

Note that CjPs only occur as top-level phrases having a function at clause-level. The method described in Chapter 7 treats conjunctions that relate phrases at lower levels such that the conjunction becomes an immediate constituent in the phrase which is the parent of the two parallel phrases. Thus CjPs are never produced by the method in Chapter 7. Hence it is safe, and makes sense for the reason just given, to eliminate them altogether.

input                      output
CjP/Conj –> conjunction    delete (empty graph)

Table 8.12: Rule for CjP

input                                 output
AP/PreC –> adjective                  [adjective]<-attach
AP/any other function –> adjective    [adjective]<-attr<-[Universal]<-attach

Table 8.13: Rules for APs

input             output
AdvP –> adverb    [adverb]<-manr<-[Universal]<-attach

Table 8.14: Rule for AdvPs

8.3.13 AP and AdvP

For completeness, I show the rules as they would have been for APs and AdvPs, had they been in my chosen text (Genesis 1:1-3).

Adjective phrases are described in Table 8.13. As can be seen, an AP as a predicate complement (PreC) is treated specially, just like a plain, bare adjective. This is because a predicate complement will be taken care of at a higher level, namely at clause-level. Any other function will be treated as an attributive. This makes it possible to distinguish between predicative adjectives and attributive adjectives.

All adjectives end up under “attribute” in the ontology; hence it makes sense to use the “attr” relation (see footnote 9 on page 39).

Adverb phrases are described in Table 8.14. Adverbs are treated as “manner” because all adverbs end up under “manner” in the ontology.

8.3.14 Conclusion

I have presented the rules I have devised for treating word-level and phrase-level. The rules take their point of departure either in a part of speech with concomitant morphological attributes, or in a phrase-structure rule. The result in each case is a list of conceptual graphs, to allow for ambiguity. In each graph there is a specially privileged node, called the point of attachment, which is used for joining at higher levels in the tree.

The process runs bottom-up in a depth-first traversal of the syntax tree. This means that the algorithm actually starts at the top of the tree (with the clause), but recurses down through the tree until it hits the bottom (the words). The results are then combined, or joined, bottom-up until one reaches the top (the clause) again.


8.4 Clause-level

8.4.1 Introduction

In this section, I describe what I do about clause-level. First I make an excursus about the number of clause-level rules in the grammar versus the number of phrase-rules, and the rate at which these numbers grow with the amount of text analyzed (8.4.2). Then I describe in general terms what I actually do (8.4.3).

8.4.2 Excursus: The number of rules

What should be done about clause-level? It would have been nice to treat each valency-pattern specially, each in a separate rule along the same lines as is done with phrases. This turns out, however, to be infeasible for large amounts of text, simply because valency patterns are so numerous that writing a rule for each quickly becomes cumbersome. Moreover, the clause-patterns multiply more quickly for larger amounts of text than the phrase-level patterns do.

To back up this statement, I have run the method in Section 7.4 on page 68 on various portions of Genesis, including all of it, extracting various grammars at each stage. The grammars extracted are:

1. A grammar where the phrases are distinguished both by phrase type and by function.

2. A grammar where the phrases are distinguished by phrase type alone, leaving off functions. This results in a smaller number of patterns, both at clause-level and at phrase-level, since some patterns which were distinct only in the phrase functions in the first grammar can now be subsumed under the same rule.

3. A grammar of clause-level that takes into account the functions of the phrases in each pattern, but leaves out the phrase types. At the same time, the constituent order is abstracted away by sorting the patterns according to a sorting scheme based on the phrase functions. This results in a grammar which subsumes patterns with the same set of phrase functions but different constituent order under the same pattern.

4. A grammar where the phrases are distinguished by both phrase type and function in the production-part of the rule (right-hand side of the production), but by phrase type alone in the head of the rule (left-hand side of the production). This grammar is especially important, because it resembles the way I have treated phrase-level in the rules of the previous section (8.3). Thus this grammar gives a feel for the number of rules that would need implementation if one were to scale the method up to include larger parts of the Old Testament.

The result, with the number of clause- and phrase-rules plotted against the number of words covered,3 can be seen in Figure 8.1 on page 90. As can be seen, the number of clause-rules with phrase types and phrase functions (thick dash-dot line) is much higher than the number of phrase-rules with phrase types and phrase functions (thin dash-dot line), by about a factor of 2. Similarly, the number of clause-rules with only phrase types (thick broken line) is higher than the number of phrase-rules with only phrase types (thin broken line). It can also be seen that abstracting the phrase types and constituent order away from the clause-rules does not do much to reduce complexity, as this curve (solid line) is not far below the curve for phrase types only (thick broken line). Finally, taking the phrase functions away from the rule head gives a nice reduction in the number of rules (thin dotted line vs. thin dash-dot line), and the number of rules is then not much greater than the number of rules without phrase functions altogether (thin broken line). This bodes well for the scalability of my method, as we shall see in Section 10.6 on page 130.

3 The “words covered” are counted as the number of running-text words in the text analyzed. It is word tokens, not distinct words, that are counted. Thus “He is a fox, he is!” would amount to 6 words, not 4.

Conclusion: The number of clause-rules grows faster than the number of phrase-rules, and is about a factor of 2 larger. This, in my view, prohibits using the same method for both clause-level and phrase-level.

The curves have an interesting shape: They resemble root functions (e.g., square root, cube root). To verify whether this is the case, I have plotted the natural logarithm of the number of rules against the natural logarithm of the number of words. The result can be seen in Figure 8.2 on page 91. As can be seen, all of the plots form more-or-less straight lines with a gradient that is less than one. This means, if you do the math, that the number of rules relates to the number of words as follows:

rules = d · words^(1/b),   where b, d > 1

This means that the number of rules grows slowly with the number of words: the curves become flatter and flatter as the number of words grows. In other words, there will come a point after which analyzing more text does not discover many more rules. This also bodes well for my method, since there is empirical evidence that the number of rules which have to be taken into account will not grow fast with the amount of text to be analysed.

Note, however, that the formula predicts that the number of rules will grow to infinity if the number of words grows to infinity. Thus the formula does not predict an absolute limit on the number of rules. People will say all sorts of things, and given enough time, they are likely to say something with a new structure.4

The fact that the number of clause-level rules grows faster than the number of phrase-level rules can also be seen in Figure 8.2: the gradients for the clause-level rules are higher than their corresponding phrase-level gradients.

The data used to plot the graphs can be seen in Appendix H on page 231, and the mathematical derivations can be seen in Appendix I on page 235.

I mentioned earlier that the graph with a thin dotted line was the most interesting from the perspective of scalability of my method. This line represents rules with phrase types (but no functions) on the rule head, and phrase types and phrase functions on the production-part of the

4 As the book of Ecclesiastes puts it, “Of making many books there is no end, and much study wearies the body” (NIV; Ecclesiastes 12:12).

[Figure 8.1: Plot of rules against words. The x-axis shows words covered (0 to 30,000, with data points at Genesis 1-3, 1-11, 1-20, 1-26, and 1-50); the y-axis shows the number of rules (0 to 1,600). Six curves are plotted: clauses only with phrase types; phrases only with phrase types; clauses with phrase types and phrase functions; phrases with phrase types and phrase functions; phrases with phrase types and functions but no functions on heads; and clauses only with phrase functions and disregarding constituent order.]

[Figure 8.2: Plot of ln(rules) against ln(words). The x-axis shows ln(words) (6.5 to 10.5); the y-axis shows ln(rules) (3.5 to 7.5). The same six curves are plotted as in Figure 8.1.]


rule. How many phrase-level rules of this kind are there for the entire Hebrew Bible? Let’s give an estimate. Two relevant data points are (13624, 221) and (1096, 55). If we assume that the line is defined by these two data points, then the gradient is (ln(221) - ln(55)) / (ln(13624) - ln(1096)) ≈ 0.55. The place where this line crosses the y-axis would then be ln(221) - 0.55 · ln(13624) ≈ 0.16. Placing this into our formula, and assuming 426,000 words in the entire Hebrew Bible,5 we get e^0.16 × 426,000^0.55 ≈ 1460 rules. That is not a whole lot.
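The back-of-the-envelope estimate above can be reproduced directly. The two data points and the 426,000-word figure are from the text; the helper function and its name are mine.

```python
import math

def power_law_estimate(p1, p2, x):
    """Fit ln(rules) = a * ln(words) + c through two points,
    then evaluate rules = e^c * x^a at x."""
    (x1, y1), (x2, y2) = p1, p2
    a = (math.log(y1) - math.log(y2)) / (math.log(x1) - math.log(x2))
    c = math.log(y1) - a * math.log(x1)
    return math.exp(c) * x ** a

rules = power_law_estimate((13624, 221), (1096, 55), 426000)
print(round(rules))  # close to the text's estimate of ~1460
```

With unrounded gradient and intercept the result lands slightly above 1460; the text’s rounded constants 0.55 and 0.16 give ≈ 1460.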

Finally, I should mention some caveats. This section by no means constitutes proof of anything as fact beyond the raw data. In particular, I have not proved that the number of rules is a root function of the number of words analyzed for large numbers of words, nor have I proved that the rate of change actually does approach zero. All sorts of theoretical complications arise, such as the fact that we are dealing with discrete values and not continuous functions.

8.4.3 Method used at clause-level

Given that the same method used at phrase-level will not work, what will? The method I have chosen takes a cue from Nicolas’s work. Recall that in Section 6.2 on page 61, I argued that Nicolas was able to free himself from the straitjacket of the syntax tree by using rules which could match arbitrary parts of the overall graph. I am going to follow Nicolas for clause-level, in the sense that I produce an intermediate CG which has a lot of the syntax left in, and then use rules in the third stage of my method to take away that syntax, replacing it with semantics. This is what both Nicolas and Barrière do, Nicolas with his rules and Barrière with her SRTGs.

Recall that clauses are always made up of a string of phrases which are the largest units that function at clause-level. Each of these phrases will have a clause label or function, like “predicate”, “predicate complement”, “subject”, “object”, “time reference”, and the like. Thus there are no words at clause-level, since all words, even conjunctions, have been encapsulated in at least one level of phrase-units. Moreover, the functions assigned to each phrase at clause-level are drawn from a fixed set of highly semantic labels. This structure can work to our advantage.

My method, then, takes the graph resulting from the “main” phrase and lets it be the hub in a star configuration, where the graphs resulting from the other phrases are joined to the “main” graph by means of relations with the same names as the functions of each of the other phrases.

What constitutes the “main” phrase? It can be argued that if there is a verbal predicate, then that is always central, for two reasons. First, the verbal valency of a verb in a sense determines what the other phrases can be, apart from peripheral elements such as adjuncts. Second, the verb is central because many of the other elements incur relations with it. Thus the subject incurs a relation with the verb, as do the object, the complement, the adverbial elements, and the predicate complement in a copulative clause. It should be noted, however, that this is not the case for all elements in the clause, since in some linguistic theories (e.g., Role and Reference Grammar), things like fronted elements and fronted time-references do not belong together with the verb, nor would many linguistic theories (including Role and Reference Grammar) say that the adjunct incurs a relation with the verb. However, it can be concluded that the verbal predicate, when present, is to be taken as central.

5 See footnote 2 on page 31.



 1. Pred     9. IntS    17. IrpC    25. Ques
 2. PreS    10. ExsS    18. Adju    26. Intj
 3. PreO    11. IrpS    19. Rela    27. Exst
 4. PreC    12. NegS    20. Loca    28. Frnt
 5. IrpP    13. ModS    21. Time    29. Supp
 6. PtcO    14. Objc    22. Modi    30. Conj
 7. PtSp    15. IrpO    23. Nega    31. IrpC
 8. Subj    16. Cmpl    24. Voct

Table 8.15: Sliding scale of importance for clause labels

What happens when there is no verbal predicate? In that case, a sliding scale of importance can be used. The predicate complement should always take primacy when there is no predicate, since it is the most predicate-like element after predicates. I have listed the sliding scale of importance in Table 8.15. The principle is:

Predicates > Subjects > Objects > Complements > Adjuncts > All others

This partially reflects the hierarchy of grammatical relations proposed in Van Valin (2001, p. 46), where “Subjects > Objects > non-terms” are seen in that hierarchy. Predicates come before subjects because we wish the verbal predicate to take precedence over all others, as argued above. Complements come before adjuncts because a complement is required for the completion of the meaning of the verb, whereas adjuncts are not.6
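Selecting the most important phrase via the sliding scale amounts to a minimum over ranks. A Python sketch under my own naming, with only the first part of Table 8.15 spelled out:

```python
# First 18 labels of the sliding scale in Table 8.15, in rank order
# (the remaining labels continue the list in the same way).
SLIDING_SCALE = ["Pred", "PreS", "PreO", "PreC", "IrpP", "PtcO",
                 "PtSp", "Subj", "IntS", "ExsS", "IrpS", "NegS",
                 "ModS", "Objc", "IrpO", "Cmpl", "IrpC", "Adju"]

def importance(function):
    """Lower rank = more important; labels not listed sort last."""
    try:
        return SLIDING_SCALE.index(function)
    except ValueError:
        return len(SLIDING_SCALE)

def most_important_phrase(phrases):
    """phrases: list of (function, graph) pairs in textual order."""
    return min(phrases, key=lambda p: importance(p[0]))

clause = [("Conj", "[and]"), ("Pred", "[create]"), ("Subj", "[God]")]
print(most_important_phrase(clause))  # ('Pred', '[create]')
```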

8.5 Algorithms

8.5.1 Introduction

In this section, I describe the algorithms I use for transforming the syntax trees to CGs. The algorithms implement the general design described in the previous sections, and take advantage of the rules expressed therein.

I first give an overview of how the algorithms work from a “bird’s eye” perspective (8.5.2). I then describe the algorithms for word-level (8.5.3), phrase-level (8.5.4), and clause-level (8.5.5). Finally, I sum this section up in a conclusion (8.5.6).

8.5.2 Overview of algorithms

The first general tenet which runs across all of the algorithms for transforming syntax to intermediate CGs is that they all return CGs, not concepts. This is because, even for word-level, it

6 The clause labels in Table 8.15 are described in Section B.3.6.3 on page 181.


may be the case that a whole CG is required to express the meaning of the word. See, e.g., the rules for NPs in Section 8.3.9 on page 84.

The second general tenet is that the algorithms allow for ambiguity. Hence, each function produces not a single CG, but a list of CGs. These lists then give rise to a multiplicity of CGs as they float upwards through the tree. Or, if there is no ambiguity at any point, the result will be just one CG.

This possibility for ambiguity starts right at word-level, where a single word may be polysemous, and hence give rise to more than one CG.

The third general tenet is that the algorithm is a depth-first, bottom-up join of CGs, starting from the words and working up the tree to clause-level.

Thus each algorithm produces a list of CGs, which may contain one or more CGs. If there is more than one result CG from any given step, this gives rise to a multiplicity of CGs at each of the steps upwards in the tree.

8.5.3 Word-level algorithms

There is not much to say about the word-level algorithms, except that there is a top-level entry point for all words, which then delegates to subroutines for each part of speech. Each part of speech is then treated according to the rules given above, based on the morphological properties of the word. The “nuts and bolts” of Notio do most of the heavy lifting, rendering my own algorithms rather boring. Therefore I shall not describe the word-level algorithms in detail.

8.5.4 Phrase-level algorithms

There is a top-level entry point for transforming phrases. It works as follows:

1. Find all the immediate constituents of the phrase, and sort them so that they appear in textual order.

2. Build a list of lists of graphs based on the immediate constituents of the phrase. That is, for all phrases in the list of constituents, call ourselves recursively and append the output to the list we are building. Words are not treated at this point, but in each of the rules for this phrase, and an empty list of CGs is inserted for each word (just to keep the sequence straight).

3. Get the phrase’s phrase type.

4. Delegate processing to each of the subroutines based on the phrase type. This is actually what produces the list of CGs.

5. Return the result.

Note how the depth-first traversal is effected: First, we call ourselves recursively, and then we process ourselves based on the phrase type. This is the classic algorithm for doing a depth-first traversal of a tree.
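The recurse-then-combine shape of the traversal can be sketched minimally. This is an illustrative Python sketch: the tree representation, the trivial combination “rule”, and the string graphs are placeholders for the thesis’ Notio-based implementation.

```python
def transform(node):
    """node is ("label", [children]) for a phrase, or a bare string
    for a word. Returns a list of graphs, to allow for ambiguity."""
    if isinstance(node, str):              # word: would be an ontology lookup
        return ["[%s]" % node]
    label, children = node
    child_lists = [transform(c) for c in children]   # recurse first
    # Then combine: a trivial stand-in "rule" that joins every
    # combination of the children's alternatives, so ambiguity
    # multiplies on the way up the tree.
    results = [""]
    for alternatives in child_lists:
        results = [r + a for r in results for a in alternatives]
    return results

tree = ("NP", [("article", ["the"]), ("noun", ["earth"])])
print(transform(tree))  # ['[the][earth]']
```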


Each phrase type is treated in a special subroutine. The general strategy of each of these subroutines is as follows, from a high-level standpoint:

1. Build local copies of grammar-rules which match the production rules which we wish to handle.

2. Call the grammar-module to build the grammar rule for the phrase at hand. This is the same grammar-module as was used in Chapter 7 to build grammars of the analyzed syntax.

3. Compare the grammar-rule thus built with the local grammar-rules, and call the appropriate subroutine implementing the matching rule.

The rule-algorithms themselves are not very noteworthy, since they just implement the rules outlined above using the “nuts and bolts” provided by Notio. One general thing which they all do is look for the attachment point in the list of lists of CGs produced in step 2 of the top-level phrase algorithm, and use this attachment point for joining inside the rule. They also take care of calling the word-level algorithm, as this was not done in step 2.

8.5.5 Clause-level algorithms

8.5.5.1 Introduction

Clause-level algorithms are slightly more interesting than either the phrase-level or the word-level rules. Therefore, I shall treat these algorithms in more detail. I start at the top-level algorithm and work my way down the call-chain.

8.5.5.2 Top-level clause-transformation

The top-level clause-transformation first finds a list of all clauses in the database, then treats each of the elements of this list. First, each clause is transformed by the subroutine transform_clause (Section 8.5.5.3 on the next page). Then the CG(s) which result from this are encapsulated in either a Situation concept or a proposition concept, depending on the text type of the clause. Then the resulting concept(s) are encapsulated in CGs and added to the list of results.

The algorithm works as follows:

1. Find all clauses in the database and sort them according to textual order.

2. For each clause object:

(a) Call transform_clause (Section 8.5.5.3) on the clause object to get a list of CGs.

(b) Call a subroutine named getClauseConceptListFromCGList to get a list of clause-concepts containing all the CGs. This produces a list of concepts, each of which has a descriptor which is a CG in the list of CGs produced by transform_clause. The concept type is either “Situation” or “proposition”, depending on the clause’s text type


Text type    Meaning      Concept type
?            Unknown      Situation
N            Narrative    Situation
Q            Quote        proposition
D            Discourse    Situation

Table 8.16: Mapping of clause text types to concept types

as given in the WIVU database. The subroutine works as follows:

Only the last character of the text type is considered, and the concept type is chosen as in Table 8.16. It can be seen that embedded quotations are treated as propositions (because somebody is saying something), whereas all other text types are mapped to Situation.

The Unknown (’?’) text type is assumed to be a Situation by default, since saying that it is a proposition is more of a risk than saying it is a Situation. In the WIVU database, the unknown text type is generally used for all clauses at the beginning of a chapter, until the text type can be determined through either syntactic or lexical means (e.g., a wayyiqtol narrative tense, or the presence of a verb of saying).

The Narrative and Discourse text types are mapped to Situation because they generally describe just that: situations. For example, Narrative is used to recount how this or that person acts, moves, or stays, and hence the supertype “Situation” is appropriate (since it maps to events, processes, or states).

(c) Call a subroutine called addConceptListToResult to add these clauses to the result. This just encapsulates each of the concepts produced in the previous step in a CG, and adds the CG to the result list of CGs.
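The text-type decision in step (b) above can be sketched as a small lookup. The function name is mine; the mapping itself is Table 8.16.

```python
# Mapping of Table 8.16; only the last character of the WIVU text
# type is inspected, and unknown values default to Situation.
TEXT_TYPE_TO_CONCEPT = {"?": "Situation", "N": "Situation",
                        "Q": "proposition", "D": "Situation"}

def concept_type_for(text_type):
    last = text_type[-1] if text_type else "?"
    return TEXT_TYPE_TO_CONCEPT.get(last, "Situation")

print(concept_type_for("NQ"))  # proposition (embedded quotation)
print(concept_type_for("N"))   # Situation
```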

8.5.5.3 transform_clause

This algorithm takes as input a clause object and produces a list of CGs. It first gets the clause’s immediate constituents and sorts them. It then removes any CjPs, as specified and argued in Section 8.3.12 on page 86. It then calls the phrase-level rules to build lists of graphs for each of the constituent phrases. Finally, it calls the algorithm described in the next section (8.5.5.4) to produce the resulting list of graphs.

It works as follows:

1. Get a list of immediate constituent phrases and sort it according to textual order.

2. Remove any CjP (conjunction phrase), as per the rule given above.


3. Build a list of lists of graphs from each of the constituent phrases, using the phrase-level algorithms described in Section 8.5.4 on page 94.

4. Call joinAtClauseLevel (Section 8.5.5.4) to produce the result list of graphs.

5. Return the output of joinAtClauseLevel.

8.5.5.4 joinAtClauseLevel

This algorithm takes as input a list of immediate phrase constituents of a clause, and a list of lists of CGs produced from this list of immediate phrase constituents. The output is a list of CGs. The algorithm works as follows:

1. Go through the list of phrase objects (immediate constituents of the clause) and set an attribute on each object: the attribute “index”, which is the index the object has into the list of phrase objects.

2. Copy the list of phrase objects into a new list. We do this because we wish to sort the list of phrases, but we do not wish to destroy its original sequence, which is the textual sequence.

3. Sort this copied list of phrase objects according to phrase function, using the sliding scale of importance depicted in Table 8.15 on page 93.

4. Get the most important phrase from the original list of phrases by getting the index attribute of the first element of the sorted copy, then using this index to look up the most important phrase. Call this index “mip_index”.

5. Create a variable, mip_graph_list, which is the list of CGs of the most important phrase (mip), taken from the list of lists of CGs.

6. Copy mip_graph_list to a list called “result_graph_list”.

7. For each object in the list of phrase constituents:

(a) If the object’s index is mip_index, don’t do anything.

(b) Else:

i. Get the phrase’s graph list from the list of lists of CGs.
ii. Get the phrase’s function.
iii. For each CG (called “phrase_graph1”) in the phrase’s graph list:
A. Initialize “result_graph_list2” to be the empty list.
B. For each of the CGs (called “phrase_graph2”) in result_graph_list: i) call join_graphs_with_relation (Section 8.5.5.5 on the following page), passing the phrase’s function as the relation name, and phrase_graph1 and phrase_graph2 as the graphs to be joined; ii) add the result to result_graph_list2.


iv. Replace result_graph_list with result_graph_list2, thus extending it with this phrase’s list of CGs.

8. For each graph in result_graph_list, remove the attachment point indicator, and place a name designator with the most important phrase’s phrase function on the former attachment point.

9. Return result_graph_list.

To summarize, the main thrust of the algorithm can be distilled as follows:

1. Find the most important phrase (according to the sliding scale of importance).

2. Initialize the result list to be the most important phrase’s list of CGs.

3. For each phrase object that is not the most important phrase, join each of its possible CGs with each of the CGs in the current result graph list, making copies of the list as many times as there are CGs in the phrase’s list of CGs.

4. For each graph thus created, remove the residual attachment point (which would be the attachment point of the most important phrase), and give the previous attachment point a name designator which is the phrase function of the most important phrase. This is so that we can distinguish it in the next stage.

5. Return the resulting list of graphs.
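The cross-product behavior of steps 2–3 can be sketched in Python. This is a simplified illustration, not the actual implementation: graphs are stand-in strings, join_graphs_with_relation is a hypothetical stub (the real one uses Notio, Section 8.5.5.5), and the parameter layout is my own.

```python
def join_graphs_with_relation(relation, graph1, graph2):
    # Hypothetical stub: the real implementation joins the two CGs at
    # their attachment points; here graphs are just strings.
    return "(%s %s %s)" % (relation, graph1, graph2)

def join_clause_graphs(mip_graph_list, other_phrases):
    # mip_graph_list: list of CGs of the most important phrase.
    # other_phrases: list of (function, phrase_graph_list) pairs for
    # every phrase except the most important one.
    result_graph_list = list(mip_graph_list)
    for function, phrase_graph_list in other_phrases:
        result_graph_list2 = []
        for phrase_graph1 in phrase_graph_list:
            for phrase_graph2 in result_graph_list:
                result_graph_list2.append(
                    join_graphs_with_relation(
                        function, phrase_graph1, phrase_graph2))
        result_graph_list = result_graph_list2
    return result_graph_list
```

Note how ambiguity multiplies: a hub with one CG joined with phrases carrying two and three readings yields six result graphs.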

8.5.5.5 join_graphs_with_relation

This method takes a relation name and two graphs (graph1 and graph2). It joins the two graphs using the relation name for a relation between the two. The concepts that are joined are the attachment points in each of the graphs. The attachment point is removed from graph1, but retained in graph2. A description of the algorithm would not be very interesting, as it mostly uses Notio to do the heavy lifting.
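Although Notio does the heavy lifting in the actual implementation, the intended effect can be shown on a toy graph representation (the dict layout and index-based attachment point are my own stand-ins, not the real data structures):

```python
def join_graphs_with_relation(relation, graph1, graph2):
    # Toy CG: {'concepts': [...], 'relations': [(name, src, dst)],
    # 'attach': index of the attachment-point concept}.
    # The two graphs are merged; a relation named after the phrase
    # function links graph2's attachment point to graph1's former
    # attachment point. graph1 loses its attachment-point status,
    # graph2 keeps its own, as described above.
    offset = len(graph2['concepts'])
    concepts = graph2['concepts'] + graph1['concepts']
    relations = list(graph2['relations'])
    relations += [(name, src + offset, dst + offset)
                  for (name, src, dst) in graph1['relations']]
    relations.append((relation, graph2['attach'], graph1['attach'] + offset))
    return {'concepts': concepts, 'relations': relations,
            'attach': graph2['attach']}
```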

8.5.6 Conclusion

I have described the algorithms I use for transforming the syntax uncovered in the previous chapter to intermediate CGs. The word-level algorithms (8.5.3) and phrase-level algorithms (8.5.4) are not that interesting, because they mostly just implement the rules specified in Section 8.3. Clause-level algorithms are more interesting, and have been described in greater detail (8.5.5).

The word-level rules and phrase-level rules follow Barrière and produce “good” CGs, much better than Nicolas’ grammatical graphs. This is done by syntax-tree-directed, bottom-up joining of conceptual graphs, with the leaf nodes (words) being taken from the ontology.


Relation name   Signature
attach          (Universal)
poss            (Entity, Entity)
agnt            (Situation, Universal)
stat            (Universal, State)
attr            (Universal, Attribute)
manr            (Universal, Manner)
in              (Universal, Universal)
over            (Entity, Entity)
and             (Universal, Universal)

Table 8.17: Relations used in transforming syntax to CG

The clause-level rules, however, follow Nicolas and produce CGs with bits of syntax left in. In particular, the top-level phrases are sorted by a sliding scale of importance, then the most important phrase is singled out as the hub of a star of conceptual graphs. The rest of the conceptual graphs are joined to this “hub” by means of relations taken from each phrase’s function in the clause. Thus bits of syntax (viz. the functions) are still left in the graphs at this point. The idea is to transform these bits of syntax away in the next stage (Chapter 9), resulting in fully semantic graphs.

All of the algorithms have the following tenets:

• They are bottom-up.

• They return lists of conceptual graphs in order to allow for ambiguity. This ambiguity results in a multiplicity of graphs as the results are joined upwards in the tree.

• Each graph (except the final graphs of clause-level) has one privileged concept called “the attachment point” which shows where to join the conceptual graph at a higher level.

8.6 Relations

8.6.1 Introduction

The previous sections have used a number of relations. In this section, I itemize and define the relations. I do so with respect to the ontology defined in Chapter 4.

8.6.2 Relations

The relations used are listed in Table 8.17.

The signature of the “attach” relation is (Universal) because the concept on the outgoing arc can be anything.


The signature of the “poss” relation is (Entity, Entity) because it enters into both regens/rectum relations, and it is assumed that all things that can enter into regens/rectum relationships are Entities.

The signature of “agnt” is (Situation, Universal) because it has either a state or a process attached to its incoming arc (hence the Situation part), and sometimes a [Universal: #suffix] concept attached to its outgoing arc (hence the Universal part).

The signature of “stat” is (Universal, State) because it can be anything on the incoming arc and always has a State on the outgoing arc.

The signatures of “attr” and “manr” have Universal on their incoming arc because it really can be anything (but is more likely to be an Entity than a Situation), and either Attribute or Manner on their outgoing arc because that is how the ontology is structured: Adjectives and adverbs end up under Attribute and Manner respectively.

The signature of “in” is (Universal, Universal) because it is not possible to know what will be on the incoming arc (hence Universal) and because, though “in” is mostly a spatial concept (hence Entity could plausibly have been used), it is also used for Situations such as “beginning” (which in the ontology is an Event), hence we must make the outgoing arc Universal and not Entity.

The signature of “over” is (Entity, Entity) because it is a spatial relation, hence it would not make sense to include “Situation” by making the signature (Universal, Universal). The reason it is not (Physical_object, Physical_object) is that collections may be involved (e.g., “<L H-CMJM”, over the heavens).

The signature of “and” is (Universal, Universal) because it is not possible to know what will be on either side of the relation, whether Situation or Entity.

In addition to these, all the functions listed in Table B.10 on page 182 are used as dyadic relations, with a signature of (Universal, Universal).
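The signature constraints of Table 8.17 can be made concrete as a small signature checker. This is an illustrative sketch only: the SUPERTYPE map is a toy stand-in for the ontology of Chapter 4 (its exact subsumption relations are my assumption), and check_relation is a hypothetical helper, not part of the implementation.

```python
# Toy supertype map standing in for the type hierarchy (assumed shape).
SUPERTYPE = {
    'Entity': 'Universal', 'Situation': 'Universal',
    'State': 'Situation', 'Attribute': 'Universal',
    'Manner': 'Universal', 'Universal': None,
}

# Dyadic signatures from Table 8.17 as (incoming, outgoing) pairs.
SIGNATURES = {
    'poss': ('Entity', 'Entity'),
    'agnt': ('Situation', 'Universal'),
    'stat': ('Universal', 'State'),
    'attr': ('Universal', 'Attribute'),
    'manr': ('Universal', 'Manner'),
    'in':   ('Universal', 'Universal'),
    'over': ('Entity', 'Entity'),
    'and':  ('Universal', 'Universal'),
}

def conforms(concept_type, signature_type):
    # True if concept_type equals signature_type or is a subtype of it.
    t = concept_type
    while t is not None:
        if t == signature_type:
            return True
        t = SUPERTYPE.get(t)
    return False

def check_relation(name, arg_types):
    # Each argument type must conform to the corresponding signature slot.
    sig = SIGNATURES[name]
    return all(conforms(a, s) for a, s in zip(arg_types, sig))
```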

8.7 Example

In order to make the previous discussion less abstract, I have chosen to give an example. It is the same example as was given in Section 7.5.

Consider the syntax of Figure 7.6 on page 77. It shows the Hebrew for “Darkness was upon the surface of the deep.” It is transformed into the CG in Figure 8.3 on the facing page. This is done as follows.

First, in the algorithm described in Section 8.5.5.3 on page 96 (transform_clause), the overall clause is broken down into its immediate constituent phrases. This would be the phrases with id_ds 10030 (CjP), 10031 (NP), and 10032 (PP).

Second, the CjP is eliminated, since the highest level which we wish to treat is clause-level, and a CjP shows inter-clausal relations. This leaves the NP and the PP for consideration.

Third, the phrases are treated, resulting in lists of conceptual graphs. The NP (10031) is treated according to the rules in Sections 8.3.9 on page 84 and 8.3.3 on page 81. This gives rise to [darkness]. The PP (10032) is first treated according to a PP rule in Section 8.3.11 on page 86. This ultimately gives rise to the structure “(attach)->[Universal]->over->[NP/PPObj]”. Before


[Situation: [Universal: ’PreC’]-
    ->Subj->[darkness],
    ->over->[surface_1: ’{*}’]<-poss<-[ocean]
]

Figure 8.3: Intermediate CG for “Darkness was on the surface of the deep”

that, the object NP (NP/PPObj) is treated (8.3.9), using the Regens/rectum rule. This ultimately gives rise to the structure “(attach)->[NP/REG]<-poss<-[NP/rec]”. Before that, the regens and rectum NPs are treated, giving rise to [surface_1: ’{*}’] and [ocean]. Note how the noun rule for surface adds the plural marker (8.3.3). These CGs are then combined to “(attach)->[surface_1: ’{*}’]<-poss<-[ocean]” (NP rule for regens/rectum, 8.3.9), which again is combined to “(attach)->[Universal]-over->[surface_1: ’{*}’]<-poss<-[ocean]” (PP rule, 8.3.11).

Fourth, these two conceptual graphs are joined using the algorithm in 8.5.5.4 on page 97. The graph resulting from the PP ((attach)->[Universal]-over->[surface_1: ’{*}’]<-poss<-[ocean]) is chosen as the most important phrase, since its function (“PreC”) is more important than that of the NP (“Subj”). Hence the PP’s CG becomes the hub, and it is joined with the NP’s CG by means of a relation with the same name as the NP’s function (“Subj”). The result can be seen in Figure 8.3.

8.8 Conclusion

In this chapter, I have shown how I produce intermediate CGs from the syntax uncovered in Chapter 7. First, I have given a general overview of my method (8.2). Then, I have shown how I do word-level and phrase-level (8.3), followed by how I deal with clause-level (8.4). Having treated clause-level, I have detailed the algorithms used to arrive at the result (8.5). Then I have detailed the relations used, and their rationale (8.6). Finally, I have given an example (8.7).

Two important points made in the section on clause-level are: a) that the number of phrase-structure rules grows slowly with the amount of text analyzed, and b) that the number of clause-level rules grows faster than the number of phrase-level rules. This validates hypotheses 6 and 7.

The graphs produced for Gen 1:1-3 are shown in Figure 8.4 on the next page. The English translations are from the Holy Bible, New International Version (NIV), used by permission of Hodder & Stoughton. The Hebrew is also shown for comparison.


Gen 1:1: In the beginning God created the heavens and the earth.
B-R>CJT BR> >LHJM >T H-CMJM W->T H->RY

[Situation:[entity*a:[heavens*b:’{*} #’][earth*c:’#’](and?b?c)][God*d][beginning*e][Universal*f][create*g:’Pred’](in?f?e)(Time?g?f)(Subj?g?d)(Objc?g?a)]

Gen 1:2: Now the earth was formless and empty
W-H->RY HJTH THW W-BHW

[Situation:[Universal*a:[emptiness*b][void*c](and?b?c)][earth*d:’#’][be_1*e:’Pred’](Subj?e?d)(PreC?e?a)]

Gen 1:2: darkness was over the surface of the deep
W-XCK <L PNJ THWM

[Situation:[darkness*a][ocean*b][surface_1*c:’{*}’][Universal*d:’PreC’](poss?b?c)(over?d?c)(Subj?d?a)]

Gen 1:2: and the Spirit of God was hovering over the waters
W-RWX >LHJM MRXPT <L PNJ H-MJM

[Situation:[water*a:’{*} #’][surface_1*b:’{*}’][Universal*c][God*d][spirit*e][hover*f:’PreC’](poss?a?b)(over?c?b)(poss?d?e)(Subj?f?e)(Cmpl?f?c)]

Gen 1:3: And God said
W-J>MR >LHJM

[Situation:[God*a][say*b:’Pred’](Subj?b?a)]

Gen 1:3: “Let there be light”
JHJ >WR

[proposition:[light*a][be_1*b:’Pred’](Subj?b?a)]

Gen 1:3: And there was light.
W-JHJ >WR

[Situation:[light*a][be_1*b:’Pred’](Subj?b?a)]

Figure 8.4: Graphs produced from syntax, Genesis 1:1-3

This is the intermediate result of my work. For each clause, the English, the Hebrew, and the CG are displayed.


Chapter 9

From intermediate CGs to more semantic CGs

9.1 Introduction

The final step of my method takes the CGs produced in the previous chapter and removes most of the syntax, thus leaving fully semantic CGs. The method is modeled after what both Nicolas and Barrière do, Nicolas with his rules and Barrière with her Semantic Relation Transformation Graphs (SRTGs). My approach is more similar to Nicolas’ approach than that of Barrière, however.

In the following, I first give an overview of my method (9.2), followed by a presentation of the rule structure, along with samples of the rules I have devised (9.3), after which I explain the algorithms used (9.4). To make the preceding sections less abstract, I then give an example (9.5). Finally, I conclude the chapter (9.6).

9.2 Overview of method

Both Nicolas and Barrière transform various stages of their output using rules with a premise-conclusion structure. Barrière keeps the premise and conclusion separate, whereas Nicolas integrates them into one CG. Moreover, Nicolas makes use of a “WordNetActor” for binding “syntagmas”, as he calls them (i.e., words), to WordNet concepts.

I have chosen to be more like Nicolas than Barrière in this step of my method. However, my algorithm is different because of the different natures of our respective input CGs. Nicolas’ input graphs contain almost no semantics, and are full of syntax. My input graphs contain almost no syntax, and are full of semantics. Also, I do not need the WordNetActor, since the previous step has taken care of binding Hebrew words to English WordNet concepts.

The overall strategy is, for each intermediate CG, to match as much of it as possible with as many rules as possible, to process what these rules match, and then, after all rules have been exhausted, to copy whatever didn’t get copied. This is different from Nicolas, who leaves out


material from his syntactic graphs without copying. It is like Barrière, however, who copies whatever wasn’t matched by an SRTG.

9.3 Rules

9.3.1 Introduction

In this section, I describe what I mean by “rule.” First I give a general overview (9.3.2), followed by a description of the general rule structure (9.3.3). I then show how the rules are preprocessed before processing clauses (9.3.4), after which I show some of the rules needed for transforming Gen 1:1-3 from intermediate graphs to more semantic graphs (9.3.5).

9.3.2 Overview

A rule is basically a structure with a premise and a conclusion. In my method, I have chosen to be like Nicolas and express each rule as a single CG. I then express the rules in a text file using CGIF, loading them into my program at the appropriate time. I then do some preprocessing on each rule needed for the proper functioning of my method.

9.3.3 Rule structure

Consider the rule in Figure 9.1 on page 106. It shows the overall structure of a rule:

• The containing structure is a concept with concept type Rule.

• Nested inside the Rule concept one finds a CG with three components:

– A Premise concept with a nested Premise graph

– A Conclusion concept with a nested Conclusion graph

– A number of defining concepts used for coreference links between the Premise and the Conclusion graphs. These defining concepts need to be in a scope which is accessible from both the Premise graph and the Conclusion graph. Since they need to be inside the rule, the best place is where they are.

This structure has been taken directly from Nicolas’ work, as can be seen in Figure 5.3 on page 57.

In addition, each in-memory rule will have a list LCPC and a list LPNMC which are described in the next section. These lists are derived from the rule CG.


Data member Type Meaning

P       CG                                  Premise graph
C       CG                                  Conclusion graph
LCPC    list of pairs (Concept, Concept)    List of Conclusion-Premise Concepts: a mapping of concepts in C to their counterparts in P
LPNMC   list of Concept                     List of Premise Non-Matched Concepts: the concepts in P which do not have a counterpart in C

Table 9.1: Fields of each Rule data-structure

9.3.4 Rule preprocessing

Each rule is preprocessed to discover two things:

1. A mapping between the concepts of the conclusion graph and the concepts of the premise graph. This is realized as a list, called LCPC (List of Conclusion-Premise Concepts), which lists pairs of type (concept, concept). The first concept of each pair is a concept from the conclusion graph, and the second is the concept from the premise graph to which it corresponds. The correspondence is established by means of the coreference sets.

2. A list of concepts in the premise which do not have a counterpart in the conclusion. This is called LPNMC (List of Premise Non-Matched Concepts).

The preprocessing also breaks the coreference links, thereby rendering the premise graph and the conclusion graph stand-alone graphs. This is done after the coreference links have been used to discover the LCPC mapping.

Thus each rule is a data-structure with the fields depicted in Table 9.1.
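The data-structure of Table 9.1 can be sketched as a small Python class. This is a sketch under assumptions: coreference is modelled as a plain dict from conclusion concepts to premise concepts (the real implementation derives it from Notio coreference sets before breaking the links), and concepts are arbitrary objects.

```python
class Rule:
    # Sketch of the Rule data-structure in Table 9.1.
    # coref: dict mapping conclusion concepts to their coreferent
    # premise concepts (absent key = no counterpart) -- a hypothetical
    # stand-in for the coreference sets.
    def __init__(self, premise_concepts, conclusion_concepts, coref):
        self.P = premise_concepts        # premise graph (concepts only here)
        self.C = conclusion_concepts     # conclusion graph (concepts only)
        # LCPC: (conclusion concept, premise counterpart or None)
        self.LCPC = [(c, coref.get(c)) for c in conclusion_concepts]
        # LPNMC: premise concepts with no counterpart in the conclusion
        matched = set(p for (_, p) in self.LCPC if p is not None)
        self.LPNMC = [p for p in premise_concepts if p not in matched]
```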

9.3.5 Sample rules

In this section, I have gathered a sample of the rules I have specified. Each rule will be described by its premise and its conclusion, and for some rules, I will offer a few comments. The full list of rules can be seen in Appendix J on page 239.

Consider the rule in Figure 9.1 on the following page. It has a premise with a Universal connected to a Universal with the Time relation. The originating Universal must have a Name designator saying ’Pred’. The conclusion again has two Universals, but now they are connected via a ptim relation rather than a Time relation. Thus the “Time” function is turned into a “ptim” relation.

At least one other relation would have to be considered in a full system, namely “DUR” or duration. One cannot say, in general, that the Hebrew “Time” function will always need to be replaced by a “ptim” relation. To solve this problem, one could go further and require that the


[Rule: [Universal*a][Universal*b]
    [Premise:
        [Universal?a:’Pred’][Universal?b](Time?a?b)]
    [Conclusion:
        [Universal?a][Universal?b](ptim?a?b)]
]

Figure 9.1: Rule for Time/ptim

The same rule structure is exhibited as in Nicolas’ rules in Figure 5.3 on page 57: A containing outer context (“Rule”), nested inside of which we find some binding coreference concepts (both “Universal” in this case), a “Premise” graph and a “Conclusion” graph.

[Rule: [Universal*a][Universal*b]
    [Premise:
        [be_1?a:’Pred’][Entity?b](Subj?a?b)]
    [Conclusion:
        [Universal?a][Universal?b](stat?b?a)]
]

Figure 9.2: Rule for be/Subj/stat

outgoing arc must connect to something that is a point in time rather than a duration. Doing so would require that the ontology had this distinction, of course.1

Now consider the rule in Figure 9.2. It has a premise with a rather specific predicate and subject: It must be the “be” type connected to an “Entity” as a Subject. The conclusion specifies that the subject must be related to the predicate with a “stat” relation. The “stat” relation is more appropriate than the “agnt” relation when the predicate is a state, as in the case of “be_1”.

Now consider the rule in Figure 9.3 on the next page. It has a premise with an “Action” predicate connected to an “Entity” via a Subject relation. It replaces the subject relation with an “agnt” relation. Note how the predicate must be an “Action”. For a subject to be an “agnt” rather than having a “stat” relation with the predicate, the predicate must be some sort of Process, in this case an “Action”.

1 See also the discussion of the ptim/dur duality in Section 10.7 on page 132.


[Rule: [Universal*a][Universal*b]
    [Premise:
        [Action?a:’Pred’][Entity?b](Subj?a?b)]
    [Conclusion:
        [Universal?a][Universal?b](agnt?a?b)]
]

Figure 9.3: Rule for Subj/agnt

[Rule: [Universal*a][Universal*b]
    [Premise:
        [Situation?a:’Pred’][Entity?b](Objc?a?b)]
    [Conclusion:
        [Universal?a][Universal?b](thme?a?b)]
]

Figure 9.4: Rule for Objc/thme

Agenthood also necessitates a volitional subject. This is harder to capture in the ontology, but an attempt has been made by specifying that the subject must be an Entity, not a Situation. Perhaps this requirement could be made even stronger by requiring the subject to be a subtype of “Physical_object”. However, this would exclude “Entity_playing_a_role”, which may be too strong a requirement, since some entities capable of agenthood might be beneath this type. (See Figure 4.2 on page 43).

Thus this rule transforms the Entity subject of an Action into an agent.

Now consider the rule in Figure 9.4. It has a premise with a Situation which is a Predicate, connected to an Entity via an Object relation. The “Objc” relation is replaced with a “thme” relation. This is for such things as “Create” (Pred) “the Heavens and the Earth” (Objc).

Note that there would be a number of other role relations which would have to be tried in a full system, such as “ptnt” and “rslt”. Again the ontology might help in disambiguating which it should be, either on the predicate end or the object end.

Note also that “Situation” is probably too broad, as “Action” might be more appropriate.

Thus this rule transforms the object of a Situation into a theme.

Finally, consider the rule in Figure 9.5 on the following page. This rule is important for my discussion of the example (Section 9.5 on page 119). The premise has three Universal concepts: One, the head, has a “PreC” (predicate complement) label, and is connected to the two


[Rule: [Universal*a][Universal*b][Universal*c]
    [Premise:
        [Universal?a:’PreC’][Universal?b][Universal?c]
        (Subj?a?b)(over?a?c)]
    [Conclusion:
        [be_1*d][Universal?b][Universal?c]
        (stat?b?d)(over?d?c)]
]

Figure 9.5: Rule for PreC/Subj/over

other Universal concepts by a “Subj” and an “over” relation respectively. The conclusion transforms the head Universal into a “be” concept, and the “Subj” relation into a “stat” relation. This makes sense because it comes from a nominal clause (i.e., a verbless clause) where the “be” is understood.

Note how this example is specific to the “over” relation as a predicate complement. Other prepositions in the same configuration would need to have their own, separate rules, unless one could use the relation hierarchy to one’s advantage. For example, all preposition-relations (such as “over”, “in”, “on”, etc.) could be subtypes of a “preposition” relation or similar. This would allow general formulation of this rule in terms of this supertype.

9.4 Algorithms

9.4.1 Introduction

In this section, I detail the algorithms used to transform intermediate CGs to semantic CGs. Before I detail the algorithms one by one, I first introduce the seven mappings and lists employed (9.4.2). I then give a bird’s eye view of the two main algorithms, transformIntermediateGraph and applyConclusion (9.4.3).

The algorithm in Section 9.4.4, preprocessRule, is for preprocessing rules before the main run, as described in Section 9.3.4 on page 105. It has a helper subroutine, process_conclusion_concept (9.4.5). It also uses a utility subroutine, getInnerGraph (9.4.6), which is also used in other algorithms.

The algorithm in Section 9.4.7, transformIntermediateGraph, is the main algorithm. It calls the other, subsequent algorithms. The applyConclusion algorithm (9.4.8) does a lot of heavy lifting. Two utility subroutines, lookupConcept (9.4.9) and lookupConceptGetPair (9.4.10), look up concepts in various list-mappings, such as the LCPC list described in Section 9.3.4 on page 105 about rule preprocessing. The utility subroutine CopyConcept (9.4.11) copies a concept to the


output graph, provided it has not already been dealt with. Finally, the utility subroutine removeFunction (9.4.12) cleans up a CG by removing any residual functions, such as “Pred”, “PreC”, “Objc”, and others described in Table 8.15 on page 93.

9.4.2 Mappings and lists

The algorithm employs a number of mappings and lists which need to be explained in detail before I specify the algorithms.

First, each rule R has two lists, R.LCPC and R.LPNMC, which are explained in Section 9.3.4 on page 105 and in Table 9.1 on page 105. The reader is encouraged to look up their definitions at this point.

Second, the main algorithm, transformIntermediateGraph (9.4.7), employs two mappings, LC and LR.

LC is a map of pairs (Concept, Concept) which maps concepts in the input CG to the concept in the output CG to which it is copied. Thus LC is used for tracking which concepts in the output CG are the product of a concept in the input CG. This is used at the end of the transformIntermediateGraph algorithm to copy concepts which have not been copied yet. The second concept of each pair may be None, a special Jython type meaning “no object”, in which case the input concept is copied. Or it may be a concept in the output CG, showing that the concept has already been copied. Finally, the second concept may also be a special concept, “[Universal: ’NoCopy’]”, meaning that the concept should not be copied.

LR is for similar purposes, but is used to track relations which have been accounted for in a rule. It is a list of pairs (Relation, boolean). The first part of each pair is a relation in the input graph. The second part of each pair is a boolean showing whether a rule, which was also applied, matched the given relation at some point. After all rules have been tried, the relations which were not accounted for (and thus whose second element of the pair is “false”) will be copied to the output graph.
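The bookkeeping roles of LC and LR can be sketched as follows. The list layouts, the NO_COPY sentinel constant, and the helper names are illustrative stand-ins for the actual Jython data structures, not the implementation itself.

```python
NO_COPY = "[Universal: 'NoCopy']"   # sentinel: concept must not be copied

def init_tracking(concepts, relations):
    # LC: (input concept, output concept) -- None means "not yet
    # copied"; NO_COPY means "suppressed by a rule premise with no
    # conclusion counterpart". LR: (input relation, matched-by-rule flag).
    LC = [(c, None) for c in concepts]
    LR = [(r, False) for r in relations]
    return LC, LR

def copy_leftovers(LC, LR, output):
    # Final phase of transformIntermediateGraph: copy every concept
    # and relation that no rule accounted for into the output graph.
    for c, out in LC:
        if out is None:
            output['concepts'].append(c)
    for r, matched in LR:
        if not matched:
            output['relations'].append(r)
```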

Third, the applyConclusion algorithm (9.4.8) employs a number of mappings, namely LMC, LMR, and LCRC. They are explained below.

The applyConclusion algorithm takes as part of its input a rule which was successfully matched against the input CG. Notio maintains, for each match, a mapping of concepts which matched between the two matched graphs (in this case, the rule’s premise and the input graph). Similarly, a mapping of relations is maintained. These mappings are extracted from the matching as LMC and LMR. LMC is a list of pairs (Concept, Concept) mapping concepts in the premise of the rule to the matched concepts in the input graph. Similarly, LMR is a list of pairs (Relation, Relation) mapping relations in the rule premise to the matched relations in the input graph. Finally, LCRC (“List of Conclusion concepts mapped to Result Concepts”) maps concepts in the conclusion to the result concepts in the output graph. It is a list of pairs (Concept, Concept) where the first member of each pair is a concept in the conclusion, and the second member of the pair is the concept in the result graph to which the conclusion concept was copied.

Thus we have seven mappings or lists which are central to the algorithm:

Each rule has a mapping LCPC and a list LPNMC. LCPC maps concepts in the conclusion to their corresponding concepts in the premise. LPNMC is a list of concepts in the premise which


Figure 9.6: Summary of transformIntermediateGraph

are not to be copied, since they do not have a counterpart in the conclusion.

The transformIntermediateGraph algorithm uses two mappings: LC and LR. LC maps concepts in the input graph to their corresponding concepts in the output graph, if any. This mapping is also used to signal that a concept is not to be copied. The LR list is used to keep track of which relations have already been accounted for, and thus should not be copied. Both LC and LR are modified by applyConclusion.

The applyConclusion algorithm uses three mappings: LMC, LMR, and LCRC. LMC maps concepts in the premise to their matching concepts in the input graph. LMR does the same for relations in the premise and relations in the input graph. LCRC maps each concept in the conclusion to the concept to which it has been copied in the output graph.

9.4.3 Bird’s eye view of algorithm

From a bird’s eye perspective, the algorithm works as depicted in Figure 9.6 (transformIntermediateGraph) and Figure 9.7 on the facing page (applyConclusion). The reader is encouraged to peruse these figures before continuing, as it will likely help in grasping the big picture of the algorithms.

9.4.4 preprocessRule

The preprocessRule algorithm takes a rule and preprocesses it according to the description given in Section 9.3.4 on page 105. It works as follows:


Figure 9.7: Summary of applyConclusion


1. Use the getInnerGraph algorithm (9.4.6) to get the inner graph in the Rule concept. Call it CG_inner.

2. Get the premise concept (“Premise_concept”) by using Notio’s notio.Graph.getConceptsWithExactType.

3. Get the premise graph (“Premise_graph”) by using Notio’s notio.Concept.getEnclosedGraph() on Premise_concept.

4. Get the conclusion graph (“Conclusion_graph”) in like manner.

5. Get a list of the premise graph’s concepts (“Premise_concepts”) using notio.Graph.getConcepts()

6. Get a list of the conclusion graph’s concepts (“Conclusion_concepts”) in like manner.

7. For all concepts “conclusion_concept” in Conclusion_concepts:

(a) Call the “process_conclusion_concept” algorithm (9.4.5) with (conclusion_concept, Premise_concepts).

(b) Call Notio’s notio.Concept.isolate() method on conclusion_concept. This breaks any coreference links to this concept.

8. For all concepts “premise_concept” in Premise_concepts:

(a) If premise_concept is not in LCPC, add it to LPNMC.

(b) Call premise_concept.isolate(). This is necessary for later matching to occur properly.

9.4.5 process_conclusion_concept

The process_conclusion_concept algorithm is used in preprocessRule (described in the previous section). It serves to build the rule’s LCPC mapping, which maps conclusion concepts to their corresponding premise concepts. (See Section 9.3.4 on page 105.) If a conclusion concept is not represented in the premise graph, the mapping will show this with the Python value “None”, which means “no object”.

The algorithm takes two parameters: The conclusion concept to process (“conclusion_concept”) and a list of the premise’s concepts (“Premise_concepts”).

Note how, if a coreferent premise concept is not found, the “coreferent_premise_concept” variable remains “None”, and so this is the value that is going to get into LCPC.

1. Initialize coreferent_premise_concept to None. This is the value it will keep if we don’t find it in Premise_concepts.

2. For all coreferent concepts (“coreferent_concept”) in conclusion_concept’s coreferent concepts (notio.Concept.getCoreferentConcepts()):


(a) Set bBreak to “false”

(b) For all premise concepts (“premise_concept”) in Premise_concepts:

i. If there is identity between coreferent_concept and premise_concept:

A. Set coreferent_premise_concept to premise_concept.

B. Set bBreak to “true”.

C. Break out of the for-loop iterating over Premise_concepts.

(c) If bBreak: break out of the for-loop iterating over coreferent concepts.

3. Append the pair (conclusion_concept, coreferent_premise_concept) to LCPC.
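The steps above can be sketched in Python. The coreferents parameter is a hypothetical stand-in for notio.Concept.getCoreferentConcepts(); Python’s for/else replaces the bBreak flag of the pseudocode, and the identity test mirrors step 2(b)i.

```python
def process_conclusion_concept(conclusion_concept, premise_concepts,
                               coreferents, LCPC):
    # coreferents: the concepts coreferent with conclusion_concept
    # (stand-in for notio.Concept.getCoreferentConcepts()).
    coreferent_premise_concept = None   # kept if no premise match is found
    for coreferent_concept in coreferents:
        for premise_concept in premise_concepts:
            if coreferent_concept is premise_concept:   # identity test
                coreferent_premise_concept = premise_concept
                break
        else:
            continue   # inner loop found nothing; try next coreferent
        break          # inner loop broke: a match was found
    LCPC.append((conclusion_concept, coreferent_premise_concept))
```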

9.4.6 getInnerGraph

The getInnerGraph algorithm takes a CG “cg” as input and returns the nested graph of the first concept in “cg”.

1. Call cg.getConcepts(), getting an array of concepts from Notio.

2. Take the first concept C in this array and return C.getEnclosedGraph().
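In Python, over a toy graph type (dicts standing in for notio.Graph.getConcepts() and notio.Concept.getEnclosedGraph(), which the real code uses), this amounts to:

```python
def get_inner_graph(cg):
    # A toy CG is a dict whose 'concepts' each may carry an
    # 'enclosed' nested graph (hypothetical stand-in structures).
    first_concept = cg['concepts'][0]
    return first_concept['enclosed']
```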

9.4.7 transformIntermediateGraph

This is the main algorithm for transforming an intermediate graph to a more semantic graph. It is called once for each intermediate graph. The only parameter is the intermediate CG “cg” to process.

1. Make a new, empty graph, called CG_result

2. Make a global variable, “non_copy_concept”, which looks like this: “[Universal: ’NoCopy’]”. This will be used to say that a concept should not be copied.

3. Find the inner graph “CG_inner” in cg, using the getInnerGraph(cg) algorithm (Section 9.4.6).

4. For all concepts c in CG_inner, store them in a list of pairs (concept, concept) showing whether and how this concept was copied to CG_result. Initially, the second element in the pair is set to “None”, which is Jython’s way of saying “no object”. The list is called LC.

5. For all relations r in CG_inner, store them in a list of pairs (relation, boolean) showing whether this relation was matched by a rule and thus should not be copied. Initially, the second element in the pair is set to false, showing that the relation was not matched. The list is called LR.


114 CHAPTER 9. FROM INTERMEDIATE CGS TO MORE SEMANTIC CGS

6. Now we need to apply the rules in turn.

For each rule R:

(a) Get the premise P and the conclusion C.

(b) Try to match the premise P with CG_inner. This is done with notio.Graph.matchGraphs().

(c) If the graphs matched: apply the conclusion using ApplyConclusion (9.4.8).

7. Now we need to copy the concepts in CG_inner which were not accounted for by ApplyConclusion. That is, copy all those concepts which were neither copied via a conclusion, nor marked as not-to-be-copied because they occurred in a premise but not in the conclusion.

For all pairs (CGcg, CG_out) in LC:

(a) If CG_out is None (i.e., it was not copied, and it was not matched by the premise of a rule which didn’t copy it):

i. Make a copy C_out of CGcg
ii. Apply removeFunction (Section 9.4.12 on page 118) to C_out.
iii. Add C_out to CG_result using the CopyConcept subroutine.

8. Now we need to copy those relations that have not been accounted for by ApplyConclusion.

For all pairs (Rcg, bHasBeenAccountedFor) in LR: If bHasBeenAccountedFor is FALSE:

(a) Make a list cgLR of the concept(s) in CG_inner to which Rcg relates. Make the list so that its order matches the order of the concepts in the relation itself. Use notio.Relation.getArguments().

(b) Make an empty list CG_result_LR which is to be the same as cgLR, but using concepts from CG_result.

(c) For all concepts CGcg in cgLR:

i. Look up the pair (CGcg, C_out) in LC, and
ii. add C_out to CG_result_LR.

(d) Make a copy of Rcg using CG_result_LR.

(e) Add this copy to CG_result.

9. Since CG_result only accounts for the inner graph, we need to nest it in a concept of the same type.

(a) Get the Situation/proposition concept by calling notio.Graph.getConcepts() on “cg” and taking the first one.



(b) Get the type label of the concept’s concept type.

(c) Construct a new concept with this type. Call it outer_concept.

(d) Embed CG_result inside the new outer_concept.

(e) Construct an empty graph CG_outer_result, and add outer_concept to it.

10. Return CG_outer_result
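The bookkeeping of steps 7 and 8 can be sketched in Python as follows. Concepts are reduced to labelled objects and relations to (name, argument-list) pairs; the class, the NoCopy sentinel, and the function name are illustrative stand-ins for the Jython/Notio implementation, and removeFunction/CopyConcept are folded into a direct copy:

```python
# Illustrative stand-in for a CG concept.
class Concept:
    def __init__(self, label):
        self.label = label


NO_COPY = Concept("NoCopy")  # stands in for the [Universal: 'NoCopy'] sentinel


def copy_remaining(LC, LR, cg_result):
    """LC: list of [input_concept, output_concept-or-None-or-NO_COPY];
    LR: list of [(name, args), matched_flag]; cg_result: dict of lists."""
    # Step 7: copy each concept that was neither copied (pair[1] is a concept)
    # nor suppressed (pair[1] is NO_COPY).
    for pair in LC:
        if pair[1] is None:
            c_out = Concept(pair[0].label)  # copy; removeFunction omitted here
            cg_result["concepts"].append(c_out)
            pair[1] = c_out
    # Step 8: copy each relation not matched by any rule, rewiring its
    # arguments to the corresponding output concepts via LC.
    out_of = dict((c_in, c_out) for c_in, c_out in LC)
    for relation, accounted in LR:
        if not accounted:
            name, args = relation
            cg_result["relations"].append((name, [out_of[a] for a in args]))
```

In practice a relation touching a suppressed concept would already have its matched flag set to true, so the rewiring only ever meets concepts that have genuine output counterparts.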

9.4.8 ApplyConclusion

The ApplyConclusion subroutine takes as input the input CG CG_inner, the rule R, the Notio MatchResult from the matching, the lists LC and LR, and CG_result. Its purpose is to apply the conclusion of R to CG_inner. Along the way, it updates LC and LR from the transformIntermediateGraph algorithm in the previous section. It works as follows:

1. Get the premise P and the conclusion C from the rule R.

2. For all concepts c in P, make a list of pairs (concept, concept) which maps concepts in P to their matching concepts in CG_inner. Call the list LMC (“List of Matched Concepts”). This is done as follows:

(a) Set LMC to the empty list.

(b) Get the list of concepts from the premise (“firstConcepts”) through the first member of matchresult’s NodeMappings. Use notio.NodeMapping.getFirstConcepts(). This will get the concepts from the premise which were found to map to concepts in the CG_inner graph while matching the two graphs.

(c) Similarly, get the list of concepts from the input graph (“secondConcepts”).

(d) For all indexes in the range 0 to length of firstConcepts minus 1:

i. Set PC to firstConcepts[index]
ii. Set CGC to secondConcepts[index]

iii. Append the pair (PC,CGC) to LMC

3. For all relations r in P, make a list of pairs (relation, relation) which maps relations in P to their matching relations in cg. Call the list LMR (“List of Matched Relations”). This is done analogously to the building of LMC.

4. Get the rule’s LCPC list (“LCPC”). (See Section 9.3.4 on page 105.)

5. Make an empty list of pairs (concept, concept) which is to map concepts in C to their output concepts in CG_result. Call it LCRC (“List of Conclusion concepts mapped to Result Concepts”).



6. Now we need to copy the conclusion’s concepts.

For all concepts CC in C:

(a) Look up the concept CR in P to which CC corresponds, using LCPC. Use the lookupConcept(CC, LCPC) algorithm (Section 9.4.9 on the next page). CR is now the concept in P to which CC corresponds.

(b) If CC does not have a counterpart in P (i.e., there is no such concept CR), then:

i. Make a copy “C_out” of CC,
ii. apply removeFunction to it (Section 9.4.12 on page 118),
iii. add it to CG_result, and
iv. place the pair “(CC, C_out)” into LCRC.
v. Then go on to the next concept.

(c) Otherwise, we now know that the conclusion concept matches a concept CR in P. Look up CR in LMC, thereby getting the concept “Ccg” which matches CR. Again, use the lookupConcept(CR, LMC) algorithm (Section 9.4.9 on the next page). Ccg is now the concept in the input graph to which CC corresponds, by way of its matching concept CR in P.

(d) Look up Ccg in LC and see whether it has already been copied. Use lookupConcept(Ccg, LC) and call the result C_out. C_out will be None if it has not been copied, and a non-None value if it has already been copied.

i. If it has already been copied (C_out is not None), add (CC, C_out) to LCRC, and go on to the next concept.

ii. If it has not been copied (C_out is None), then:

A. Make a copy “C_out” of Ccg
B. Apply removeFunction (Section 9.4.12 on page 118) to C_out
C. Copy C_out to CG_result, using the subroutine CopyConcept (Section 9.4.11 on page 118).
D. Add (CC, C_out) to LCRC.

7. Now we need to make sure that all concepts which were matched by something in the premise, but which must not be copied because they were not in the conclusion, will not be copied. This is done by placing “non_copy_concept” into LC for the correct CG_inner concept.

(a) Get the rule’s LPNMC list (“LPNMC”). (See Section 9.3.4 on page 105.)

(b) For all pairs (PC, CGC) in LMC:

i. If PC is in LPNMC:



A. Find CGC in LC and replace the second part of the pair with the global variable non_copy_concept (see Section 9.4.7)

8. Now we need to make sure that all relations in the conclusion get copied.

For all relations CR in C:

(a) Make a list CLR of the concept(s) in C to which it relates. Make the list so that its order matches the order of the concepts in the relation itself. Use notio.Relation.getArguments().

(b) Make an empty list CG_result_LR which is to be the same as CLR, but using concepts from CG_result.

(c) For all concepts CC in CLR: Look up the pair (CC, C_out) in LCRC, and add C_out to CG_result_LR.

(d) Make a copy of CR using CG_result_LR.

(e) Add this copy to CG_result.

9. Now we need to make sure that all the relations which have been accounted for by P are not copied later, by setting the corresponding “boolean” to “true” in LR.

For all relations PR in P:

(a) Look up the pair (PR, MRcg) in LMR.

(b) Look up the pair (MRcg, boolean) in LR.

(c) Set the boolean in this pair to “true”.
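Two of the bookkeeping pieces, steps 2 and 9, can be sketched in isolation. The function names and data shapes below are illustrative, not the thesis code; step 2 simply zips the two parallel arrays that the match result yields:

```python
def build_LMC(firstConcepts, secondConcepts):
    """Step 2: pair each premise concept with its match in the input graph."""
    LMC = []
    for index in range(len(firstConcepts)):
        LMC.append((firstConcepts[index], secondConcepts[index]))
    return LMC


def flag_matched_relations(premise_relations, LMR, LR):
    """Step 9: mark every input-graph relation matched by the premise.
    LMR: list of (premise_relation, input_relation) pairs;
    LR: list of [input_relation, matched_flag] pairs."""
    for PR in premise_relations:
        # Step 9(a): look up PR's matching input-graph relation MRcg.
        MRcg = None
        for first, second in LMR:
            if first is PR:
                MRcg = second
                break
        # Steps 9(b)-(c): set MRcg's flag in LR to true.
        for pair in LR:
            if pair[0] is MRcg:
                pair[1] = True
```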

9.4.9 lookupConcept

The lookupConcept algorithm takes a concept C and a list L as input. The list L must be a list of pairs (concept, concept) which maps concepts to concepts. If C is found as the first member of a pair, the second member of that pair is returned. Otherwise, if C is not found as the first member of a pair in the list, None is returned.

9.4.10 lookupConceptGetPair

The lookupConceptGetPair algorithm is similar to lookupConcept, except that it directly returns the pair rather than the second element of the pair. If C is not found as the first member of a pair in the list, the pair (None, None) is returned.
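Both helpers can be sketched directly; using object identity (“is”) as the comparison is my assumption here, in keeping with the identity-based bookkeeping of the surrounding algorithms:

```python
def lookup_concept(C, L):
    """Return the second member of the first pair in L whose first member
    is C, or None if no such pair exists."""
    for first, second in L:
        if first is C:
            return second
    return None


def lookup_concept_get_pair(C, L):
    """Like lookup_concept, but return the whole pair, or (None, None)."""
    for pair in L:
        if pair[0] is C:
            return pair
    return (None, None)
```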



9.4.11 CopyConcept

The CopyConcept subroutine takes as input a concept “Ccg” in the input graph cg, a concept “C_out” by which to replace it in the output graph, the output graph “CG_result”, and the list “LC”. The purpose is to copy “C_out” to CG_result, and to make sure that LC tells us that C_out replaces Ccg. It works as follows:

1. Make sure that LC does not say that Ccg is already copied. If it does, then return without doing anything.

2. Place C_out into CG_result.

3. Find Ccg in LC and place C_out in the second element of the pair.
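A sketch, with CG_result reduced to a plain list and LC to a list of mutable [input concept, output concept] pairs (both simplifications of the actual Notio objects):

```python
def copy_concept(Ccg, C_out, CG_result, LC):
    """Place C_out in CG_result and record in LC that it replaces Ccg."""
    for pair in LC:
        if pair[0] is Ccg:
            if pair[1] is not None:  # step 1: already copied; do nothing
                return
            CG_result.append(C_out)  # step 2: place C_out in the output graph
            pair[1] = C_out          # step 3: record the replacement in LC
            return
```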

9.4.12 removeFunction

The removeFunction algorithm takes a concept and looks at its referent. If the referent is a NameReferent, it sees whether the referent name contains one of the WIVU phrase functions. If it does, the function is removed. This is so as to remove the last vestiges of syntax. It takes one parameter, “concept”, and modifies it in-place.

1. Get referent “old_referent” from concept.

2. Set newReferent to “None”

3. If old_referent is not None:

(a) Get quantifier “quantifier” from old_referent.

(b) Get descriptor “descriptor” from old_referent.

(c) Get designator “designator” from old_referent.

(d) If designator is None:

i. newDesignator = None

(e) Otherwise, if designator is not a Name designator:

i. newDesignator = designator

(f) Otherwise:

i. Get label “name” from designator (which is a Name designator)
ii. See if “name” contains one of the WIVU functions. If it does, remove the function from “name”.
iii. If length of “name” is greater than 0:

A. Create a new Name designator and assign it to newDesignator
iv. Otherwise:



A. newDesignator = None

(g) Create a new referent “newReferent” with quantifier, descriptor, and newDesignator as the parameters.

4. Replace concept’s referent with newReferent.
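The name-stripping heart of step 3(f) can be sketched over plain strings. The description above does not specify how a WIVU phrase function is embedded in a name, so both the marker format (“name/Function”) and the function list below are invented for illustration:

```python
# Illustrative subset of WIVU phrase-function labels; not the full list.
WIVU_FUNCTIONS = ("Subj", "Objc", "PreC", "Pred")


def strip_function(name):
    """Return the name with any WIVU function marker removed, or None if
    nothing remains (mirroring steps 3(f)iii-iv above)."""
    for function in WIVU_FUNCTIONS:
        marker = "/" + function  # assumed embedding format: "name/Function"
        if marker in name:
            name = name.replace(marker, "")
    return name if len(name) > 0 else None
```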

9.5 Example

To make the above algorithms less abstract, I here present an example in action.

Consider the graph in Figure 8.3 on page 101. It shows a CG which is the output of the syntax-to-intermediate-CG algorithm, and expresses the sentence “Darkness was over the surface of the deep”. The algorithms in this chapter transform this graph to the graph in Figure 9.8 on the following page.

First, note how the two graphs are almost identical. The only differences are:

1. that “[be_1]” has been substituted for “[Universal: ’PreC’]”, and

2. that the “Subj” relation has been replaced by the “stat” relation turning in the other direction.

This is due to the rule in Figure 9.5 on page 108, which was discussed on page 107.

This rule has, before we even get to this CG, been processed by the algorithms in Sections 9.4.4 on page 110 and 9.4.5 on page 112. This has produced two lists associated with the rule: the LCPC list, which maps concepts in the conclusion to concepts in the premise, and the LPNMC list, which shows the concepts in the premise which are not matched by a concept in the conclusion. For this rule, the “[Universal: ’PreC’]” concept is not matched by anything in the conclusion, and so it ends up in LPNMC. Similarly, the “[be_1]” concept in the conclusion is paired with “None” (meaning “no object”) in LCPC.

Now consider the algorithm in Section 9.4.7. Two empty lists are formed, namely LC and LR. LC maps concepts in the input to concepts in the output graph. LR shows us which relations in the input graph have been accounted for.

Then all of the rules in Appendix J on page 239 are tried one by one. The only one that matches is the one in Figure 9.5 on page 108.

Since this rule matched, we apply the conclusion using the ApplyConclusion algorithm in Section 9.4.8 on page 115. This copies all of the concepts from the conclusion to the output graph, substituting the relations in the conclusion for the relations in the input graph which are matched by the premise. These matched relations are then accounted for in LR. The “[Universal: ’PreC’]” concept, which was not copied (since it was in the premise but not in the conclusion), ends up in LC paired with the special concept “[Universal: ’NoCopy’]” so that we know not to copy it later. All other concepts which were copied from the conclusion, and which matched concepts in the input graph, are flagged in LC as having been copied to the respective concepts. This is done using CopyConcept (Section 9.4.11 on the preceding page).



[Situation:[be_1]-

<-stat<-[darkness],->over->[surface_1:’{*}’]<-poss<-[ocean]

]

Figure 9.8: Semantic CG for “Darkness was on the surface of the deep”

Back in the main algorithm (Section 9.4.7 on page 113), we check whether there are any concepts which have not been copied or which must not be copied. This is not the case, since all concepts were accounted for.

We then check whether any relations need to be copied. This is also not the case, since all relations were accounted for. This is because the rule matched all relations in the input graph.

Finally, we place the input graph back into its Situation context and construct a CG from this Situation concept.

Thus I have described almost all of the algorithms using an example. Hopefully this makes it all less abstract.

9.6 Conclusion

I have presented a method for transforming the intermediate graphs uncovered in the previous chapter to more semantic graphs.

The result can be seen in Figure 9.9 on the next page. I think that these graphs are pretty good as far as their semantic content goes. No syntax is left, and everything is specified in terms of semantic concept types or semantic relations.2 This validates Hypothesis 1 on page 26, namely that it is possible to achieve my goal of having “quite good” CGs with no syntax left.

2 Whether preposition-relations such as “in” and “over” represent some last vestiges of syntax may, I suppose, be debated.



Gen 1:1: In the beginning God created the heavens and the earth.
B-R>CJT BR> >LHJM >T H-CMJM W->T H->RY

[Situation:[create*a:][Universal*b][God*c][entity*d:[heavens*e:’{*}#’][earth*f:’#’](and?e?f)][beginning*g](ptim?a?b)(agnt?a?c)(thme?a?d)(in?b?g)]

Gen 1:2: Now the earth was formless and empty
W-H->RY HJTH THW W-BHW

[Situation:[earth*a:’#’][Universal*b:[emptiness*c][void*d](and?c?d)](stat?a?b)]

Gen 1:2: darkness was over the surface of the deep
W-XCK <L PNJ THWM

[Situation:[be_1*a][darkness*b][surface_1*c:’{*}’][ocean*d](stat?b?a)(over?a?c)(poss?d?c)]

Gen 1:2: and the Spirit of God was hovering over the waters
W-RWX >LHJM MRXPT <L PNJ H-MJM

[Situation:[hover*a:][spirit*b][surface_1*c:’{*}’][water*d:’{*} #’][God*e](agnt?a?b)(over?a?c)(poss?d?c)(poss?e?b)]

Gen 1:3: And God said
W-J>MR >LHJM

[Situation:[say*a:][God*b](agnt?a?b)]

Gen 1:3: “Let there be light”
JHJ >WR

[proposition:[be_1*a:][light*b](stat?b?a)]

Gen 1:3: And there was light.
W-JHJ >WR

[Situation:[be_1*a:][light*b](stat?b?a)]

Figure 9.9: Graphs produced from intermediate graphs, Genesis 1:1-3

This is the end result of my work. For each clause, the English, the Hebrew, and the CG are displayed.




Chapter 10

Discussion

10.1 Introduction

In this chapter, I discuss some general aspects of my work. I start out by discussing the fact that my project can be viewed as being within the field of machine translation (10.2). I then discuss the knowledge-representation aspects of my work, and possible applications (10.3). I then discuss various aspects of the semantics of my work (10.4). I then discuss the fact that the Hebrew language is not central to my method, and could be replaced by another language (10.5). I then discuss whether my method scales well, and compare it to the method using canonical graphs (10.6). I then discuss various aspects relating to ambiguity, including where it arises in my method (10.7). I then give a critique of my method, enumerating various points where I could have done better (10.8). I then discuss various aspects of my ontology, including whether the hypotheses related to the ontology are validated (10.9). I then discuss possible avenues of further research (10.10). Finally, I conclude and summarize the chapter (10.11).

10.2 This is machine translation

My work can be seen as machine translation in more than one way.

First, it is machine translation because it translates (“transforms” would be a more appropriate word) between one language, Hebrew, and another language, conceptual graphs. It tries to capture the meaning of the source language in the statements of the target language, the CGs. As such, it is machine translation.

Second, it is machine translation because it translates from one language (Hebrew) into conceptual graphs expressed in English. As such, it is machine translation not only between two languages, a natural language (Hebrew) and a formal language (Conceptual Graphs), but also between a natural language (Hebrew) and terms from another natural language (English). Thus the machine translation can be depicted as in Figure 10.1 on the next page.

Third, it is machine translation because of the potential of the graphs as an intermediate step in machine translation between two natural languages. Having created a meta-level representation of a possible meaning of the Hebrew text (in CGs), one can then proceed to generating other natural languages from this meta-level representation. Hence the CGs can be seen as a stepping stone to further machine translation, as depicted in Figure 10.2 on the facing page. Note how one would have to map not just the CGs themselves, but also the English ontology to each of the target languages.

Figure 10.1: Machine translation from Hebrew to English CGs

The following references are all concerned with generation of natural language from conceptual graphs: Sowa (1984, pp. 230–246), Dogru and Slagle (1993), Velardi et al. (1988), Antonacci et al. (1992), Harrius (1992), and van Rijn (1992). One would be able to take the ideas presented in these articles and build a system for automatically translating into other languages from the CGs. If one could make the generational algorithms language-independent, one could facilitate automatic Bible Translation into a lot of the world’s minority languages, producing first drafts perhaps within months rather than the typical X number of years it takes to produce a first draft by hand. Organizations such as Wycliffe Bible Translators and United Bible Societies would be potentially interested in such a software system.

10.3 This is knowledge representation

My work is also a work squarely placed within the field of knowledge representation. I take raw text, producing a possible representation of its meaning in a knowledge base. Hence, some of the applications of knowledge representation can be transferred to my work as examples of what you can do with the work.

For example, one application of knowledge representation is text retrieval or query-answering: given a knowledge base, we can answer queries from a user. Given the Biblical nature of my knowledge base, the target audience could comprise a number of profiles. For example, a theologian might wish to query the knowledge base for exegetical parallels based on semantics rather than word occurrences. The type hierarchy would play an especially important role in this respect, since it could be used to find texts based on more general concepts than is possible using just text-strings as search criteria. For example, one could find all instances where water is mentioned, regardless of whether it is a brook, a river, a lake, or a sea. This might yield new exegetical insights on particular passages.

Figure 10.2: Machine translation from Hebrew to CGs to other natural languages
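The kind of subsumption-based search just described can be illustrated with a toy is-a hierarchy; the hierarchy entries and clause data below are invented for illustration, not drawn from the actual ontology:

```python
# Toy is-a hierarchy: each type maps to its immediate supertype.
HIERARCHY = {
    "brook": "river", "river": "water", "lake": "water",
    "sea": "water", "ocean": "sea", "water": "entity",
}


def is_a(concept_type, ancestor):
    """Walk up the hierarchy, testing whether ancestor subsumes concept_type."""
    while concept_type is not None:
        if concept_type == ancestor:
            return True
        concept_type = HIERARCHY.get(concept_type)
    return False


def find_clauses(clauses, ancestor):
    """clauses: list of (reference, [concept types occurring in its CG]).
    Return the references whose CGs mention a type subsumed by ancestor."""
    return [ref for ref, types in clauses
            if any(is_a(t, ancestor) for t in types)]
```

Querying for “water” then retrieves a clause mentioning “ocean” even though the string “water” never occurs in it, which is exactly the advantage over text-string search.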

Another profile would be the narratologist wishing to search for narratological parallels between texts. The work of Schärfe (2001) shows beautifully how such searches can be put to good use.

A third profile would be the average Bible student wishing to do Bible study based on semantic searches, much like the theologian above.

Articles which use query-answering based on CGs as part of the method include Velardi et al. (1988), Fargues (1992), and Myaeng (1992).

Another application within the field of knowledge representation would be the narratological studies alluded to above, but with a wider scope than text retrieval and query-answering. Henrik Schärfe in his MA thesis (Schärfe (2001)) showed how a CG knowledge base of an Old Testament text could be the subject of structural narratological investigations, using Prolog predicates to search for, e.g., the hero and the villain in a story, based on structural narratological criteria. Other references on this topic include Schärfe and Øhrstrøm (2000), Schärfe (2002), and Schärfe and Øhrstrøm (2003).

10.4 The semantics of my work

I would like to comment on the semantic outlook underpinning my work. In fact, two threads weave through the thesis as foundational assumptions about the semantics involved, like strong undercurrents. The first involves the distinction between “surface semantics” and “deep semantics” (10.4.1). The second involves the notion of “compositional semantics” (10.4.2).

10.4.1 Surface semantics

First, what I have produced in my thesis is “surface semantics” rather than “deep semantics”. Velardi et al. (1988, p. 252) draw a distinction between “surface semantics” and “deep semantics”, affirming that their work falls in the first category. They write:

“We believe that the ultimate goal of a language-understanding system is to produce a “deep” representation, but the methods by which this representation should be derived are unclear and not generally accepted in the present state of the art.” (p. 252)

My work, too, falls squarely within the “shallow/surface semantics” camp. An example will serve to illustrate my point.

Consider the graph in Figure 9.8 on page 120. What exactly constitutes the semantics in this graph? How can a human (or a machine, for that matter) hope to extract meaning from this graph?

The answer is three-fold: First, the semantics of the ontology should provide the reference meaning of the concept types. Second, the semantics of the relation hierarchy should provide the reference meaning of the relations. And third, the syntax of the CGs (bipartite graph, direction of arcs determining the arguments of a relation, concepts split into concept type and referent, referents having certain well-defined components such as “{*}” and “#”, etc.) helps us relate the semantics of the ontology and relation hierarchy to form coherent meaning, by a process which I shall elucidate in the next section on compositional semantics.

However, if we look closely, we notice that the ontology does not really specify any “deep” semantics. It only goes so far as to specify the “is-a” relations between concept types, and, because it is based on WordNet, we can refer to WordNet’s definitions and example sentences for the “meaning” of each entry in the ontology.

Similarly, if we look closely, we notice that the relation hierarchy only has an implicit, external semantics, in that one can refer to Appendix B of Sowa (2000a) for the meaning of each relation. And for relations such as “in” and “over”, we need to refer to an external dictionary for their meaning. Hence, it becomes very difficult for a computer to operate with the CGs as if they were semantic and not really just symbols connected together by a certain structure and certain relations.

This goes right back to the “meaning triangle” of Ogden and Richards (1923), cited in Sowa (2000b). The meaning triangle relates an object (such as a cat) with the symbol which stands for the object (such as “Garfield”), and these two in turn with the “meaning” of the object, or the concept or neural excitation which appears in the mind of a person thinking about Garfield the cat. Peirce called the symbol “Representamen” and the concept “Interpretant”. Frege used the words “Zeichen” for the symbol, “Sinn” for the concept, and “Bedeutung” for the object. (See Sowa (2000b, p. 59).)

Sowa calls this neural excitation “elusive” (p. 60), and with good reason. Hoffmeyer (1996) writes:



“All computer programs are completely based on Peircean "secondness", i.e. syntactic operations, since application of the rules governing the manipulation of the symbols does not depend upon what the symbols "mean" (their external semantics), only upon the symbol type. The problem is not only that the semantic dimension of the mental cannot be reduced to pure syntactics. . . . The problem rather is that the semantic level itself is bound up in the unpredictable and creative power of the intentional, goal-oriented embodied mind.”

Thus Hoffmeyer would argue that it takes “the unpredictable and creative power of the intentional, goal-oriented embodied mind” to produce semantics. Since intentions are inherently based on Peircean “thirdness”, and since all present-day computer programs are completely based on Peircean “secondness”, it follows that present-day computer programs cannot attain to “deep semantics”, since all they can do is manipulate symbols syntactically.

As Copeland (1993) admits, every computer is nothing more than a symbol-manipulating machine (p. 59). Much of his book is devoted to explaining how present-day computers cannot understand the symbols which they manipulate. However, Copeland argues that it is possible that a computer which does understand and think can exist (pp. 79–81). We just have not learned how to build such a computer yet.1

Thus all computer programs in the present state of the art must rely on syntactic operations on symbols, and cannot access any external semantics.2 This means, applying it to my thesis, that the semantics of my CGs are by definition limited to “surface semantics” rather than “deep semantics”, since they can only contain semantics by means of external references, which cannot be codified in the computer, except through more symbols which indexically point to the external semantics.3

However, this state of affairs is actually good enough for my purposes. I have only demonstrated that it is possible to transform the syntax of one language to a formal language which has less Hebrew-specific syntax (in fact, none at all), and which expresses “surface semantics” by means of the ontology, the relation hierarchy, and the syntactic rules governing CGs. And that was all I set out to do. Since I have achieved my goal, namely to produce “good enough” CGs with no syntax left, I do not further need to demonstrate the usefulness of the CGs. After all, an MA thesis is supposed to be limited in scope.4

1 Copeland would go even further and argue that any entity capable of understanding and thought is a computer, and hence our brains are also computers. Thus we, too, are nothing more than symbol-manipulating machines, and the Peircean Interpretants or concepts that arise as neural excitations in our brains can be represented by a complex of symbols. This would introduce Peircean “thirdness” into the computer by means of a complexity of symbol-manipulations.

2 Searle’s Chinese Room argument tries to argue that it is not possible for a computer system to understand, given that it can only rely on syntactic operations on symbols, and can never access the nebulous realm of semantics. Copeland (1993, pp. 121–139) presents the argument and argues that Searle’s argument is invalid.

3 My assumption here is not that symbols cannot account for semantics, nor that semantic concepts cannot be reduced to symbols. I am merely arguing that the present state of the art cannot produce the complex of symbols which would be necessary to produce semantics, if this is possible.

4 I would not argue that “deep semantics” does not exist. Far from it: I believe that the Interpretants or concepts which arise in our brains as neural excitations do in fact constitute “deep semantics”. Whether this actually arises through a mind-bogglingly complex arrangement of nothing more than symbols, I do not presume to know.



The first assumption, then, is that “surface semantics” is “good enough” for my purposes. Since I cannot hope to attain to the difficult goal of “deep semantics” during the course of a mere MA thesis, I have chosen to settle for the lesser goal of “surface semantics.”5

10.4.2 Compositional semantics

The second thread which weaves through my thesis involves the notion of “compositional semantics.” In fact, my thesis is based on the assumption that semantics can be constructed compositionally. To see what this means, let us consider what I have done in my thesis.

First, I have obtained some fine-grained syntactic trees based on the WIVU syntax. These fine-grained syntactic trees have the property that the bottommost phrase-level nodes are very small, mostly consisting of one or two words. They also have the property that each further node upwards in the tree only involves one, two, or three immediate constituents.

Second, I have constructed conceptual graphs by taking the leaf nodes of the tree (i.e., words) and producing CGs from these. These CGs have then been combined or joined by means of rules, going upwards in the tree. At each level, one considers the CGs produced at the lower nodes as atomic building blocks which are then combined at this level, only to be passed upwards in the tree, where the resulting CG is again treated as atomic. Finally, at clause-level, the structure of the clause is captured by making a star construction, using the CG resulting from the “most important phrase” as the hub of the star.

This process can be called compositional: It joins CGs resulting from lower-level units, thereby composing the meaning together as the tree is traversed upwards, treating the output of each level immediately beneath the current level as a unit at the current level.
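The bottom-up composition described above can be sketched in a few lines of code. This is an illustrative sketch only, not the thesis’s actual Jython implementation; the names (`Node`, `word_to_cg`, `join`, `compose`) and the toy string representation of CGs are invented for illustration.

```python
# Illustrative sketch of bottom-up, compositional joining: CGs built at
# lower nodes are treated as atomic units and joined on the way up the tree.
# Names and the string representation are hypothetical, not the thesis's code.

class Node:
    def __init__(self, label, children=None, word=None):
        self.label = label            # e.g. "NP", "VP", "Clause"
        self.children = children or []
        self.word = word              # set only on leaf (word) nodes

def word_to_cg(word):
    """Stand-in for the word-level stage: one concept per word."""
    return [f"[{word.capitalize()}]"]

def join(label, child_cgs):
    """Stand-in for the joining rules: concatenate the children's CGs,
    tagging the result with the node label."""
    parts = [cg for cgs in child_cgs for cg in cgs]
    return ["(" + label + ": " + " ".join(parts) + ")"]

def compose(node):
    """Traverse bottom-up; each level sees its children's output as atomic."""
    if node.word is not None:
        return word_to_cg(node.word)
    child_cgs = [compose(child) for child in node.children]
    return join(node.label, child_cgs)

# A toy two-word clause:
tree = Node("Clause", [Node("NP", word="god"), Node("VP", word="created")])
print(compose(tree))   # → ['(Clause: [God] [Created])']
```

The real method differs at clause-level, where a star construction around the most important phrase is built instead of a flat concatenation, but the traversal pattern is the same.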

Thus the second fundamental assumption which weaves through my thesis is that semantics can function compositionally. That is, that meaning can be composed recursively from lower-level units, and that the syntax tree is a reliable guide to this composition.

This assumption has a long tradition in AI. Fargues et al. (1986) write:

“A classical property of the formal models for natural language semantics used in AI is that they obey the compositionality principle. It is usually assumed that a representation of the semantics of an entire sentence can be built by combining the semantic representations associated with its components.” (p. 73)

Thus I am in good company in having chosen this methodology. Note that for this to work, I need a fine-grained syntax tree, and so I needed to produce one, as I did in Chapter 7.6

5 To state what is hopefully obvious by now: I am not arguing that computers cannot work through Peircean “thirdness”, and hence cannot have the possibility of understanding at a semantic level. If it were true that computers cannot understand, it would be a very forceful argument against strong AI. Since this is still an open question, and since, as Copeland (p. 121) states, the issue can only be settled empirically, not philosophically, I shall not venture such an argument.

6 The compositionality principle is also discussed in Bornerand and Sabah (1992, pp. 481, 492), who mention (p. 492) that for the principle to work, a syntax tree is necessary.


10.5 The non-centrality of Hebrew

One might ask the pertinent question: How central to the working of my method is my choice of input language? Could another language be substituted with only slight modifications, or is the method so bound to Hebrew and the WIVU database that this would be difficult?

The answer depends on how much modification one is willing to accept for my method to work on another language. Let me enumerate the places where the method is dependent on the WIVU database and Hebrew:

1. The ontology is the most obvious place where Hebrew plays a role. The ontology directly maps Hebrew lexemes to WordNet categories. However, given that WordNet is used as the basic ontological framework, it would not be difficult to substitute the English WordNet, or any of the European WordNet implementations, and thus use other languages.

2. The next most obvious place where Hebrew plays a role is in the syntax-to-CG rules, which are based on grammar rules extracted from the Hebrew text. However, the basic algorithms are not bound up with these rules, and rules from another language could be substituted for the present rules without changing the algorithms. This includes the treatment of Hebrew suffixes, which are incidental to Hebrew and not central to the method. Similar things could be said for the lexicons of conjunctions and prepositions; they, too, could be rewritten to cater to another language. Other than that, the main requirement for the syntax of a new language would be that the syntax trees be traditional generative syntax trees, such as I have produced. Obviously, the first step (Chapter 7) would not be necessary if such generative trees were already extant.

3. The third most obvious place where the WIVU database plays a role is in the treatment of phrases at clause-level. Here, the clause labels or phrase functions are used to join the graphs from the levels below the uppermost phrase level. If the new language did not have similar phrase-function labels, then a different algorithm would have to be devised for clause-level. However, one might also be lucky enough that the syntactic analysis does contain such phrase functions, in which case one can use them directly.

4. The fourth most obvious place where the WIVU database plays a role is in the premise-conclusion rules of Chapter 9. Here, the clause labels mentioned in the previous item are transformed, along with certain structures resulting from the syntax-to-CG rules. However, the algorithm is so general and so widely applicable that this is not a major concern: There is nothing in the algorithm of Chapter 9 that is Hebrew-specific; the Hebrew-specificity only occurs in the rules, which would have to be recreated for a new language anyway.

Thus, on balance, it appears that my method is general and is not intrinsically (that is, in the algorithms) Hebrew-specific, nor specific to the WIVU database. The method is specific to Hebrew only in the input data (such as the ontology, the rules, and the lexicons), as well as in the algorithm at clause-level. These need to be substituted when applying the method to a new language anyway, so the loss is not so great.


10.6 Does my method scale?

In this section, I argue that my method does indeed scale to a text the size of the entire Hebrew Bible. I start by offering some perspective on why I chose the method I did, and then argue that this was a rational choice.

The choice of method was between:

1. Syntax-directed joining of canonical graphs, or

2. Syntax-directed, ontology-driven, rule-based transformation.

The reason I chose option 2 over option 1 is that I perceived that building a lexicon of canonical graphs would be a major undertaking. I estimated that, for my method to scale, it would be more feasible to base it on 2 rather than 1. I wanted my method to scale potentially to all of the Hebrew Bible, even though I could not reach this goal in my MA.

Was this estimation justified? Let me argue that this is the case.

There are around 11000 lexemes in the Hebrew Bible, including both Hebrew and Aramaic.7

Building canonical graphs for this number of lexemes would be a huge undertaking. Sowa (1988, p. 135) mentions no less than five subtasks involved in building such a lexicon of canonical graphs, all of which require a large degree of human interaction and creativity. Sowa mentions that, doing all of these tasks by hand, building a lexicon of canonical graphs could progress at the rate of 1 concept per day (p. 136). However, Sowa also mentions the thesis of a student, Magrini, who produced a tool which did most of the heavy lifting, leaving only the checking to be done by a linguist. With Magrini’s tool, the time to build a lexicon entry was reduced to about half an hour. Still, 11000 lexemes times half an hour makes 5500 hours, or 687 man-days, given an 8-hour business day. This is about 2.8 man-years, given a five-day working week and 48 weeks in a working year.

Contrast this with the method I have chosen. The ontology is generated automatically from a marriage of WordNet and a Hebrew-English lexicon. Yes, the resulting ontology needs adjustment and checking, but not nearly on the scale of 2.8 man-years. In my 9th-semester work, I was able to check and correct the results of about 300 lexemes in about four days. That makes for about 75 lexemes per day, or 146 man-days to produce a complete ontology of the Old Testament. Probably, with experience, this time would be shortened even further.

However, not everything centers around the ontology. One must also produce code that maps production rules to CGs. For this small pilot study, I used about 5+8 hours both to write the rules and to implement them. This involved:

1. Devising the rules (this took 5 hours).

2. Writing the lexicons of prepositions, conjunctions, and suffixes.

3. Implementing the general framework.

4. Implementing the 17 rules.

7 The numbers for the WIVU lexicon at my disposal are 10145 lexemes for Hebrew and 829 lexemes for Aramaic. The actual numbers are potentially even higher, since not all proper nouns are included in the lexicon.

The last three items took about 8 hours, probably with a distribution of time such that the 17 rules took 3 hours and the other two tasks took the rest of the time. I realize that this is anecdotal evidence, since I didn’t write down the times for the subtasks, but for the sake of argument, let us assume that this is the case. Recall that there would be around 1450 rules to implement for the entire Hebrew Bible. If one can do 17 rules in 5+3 hours, then the entire Hebrew Bible would take 682 hours, or 85 working days of 8 hours, or a little over four months. Even assuming, very conservatively, that I did the 17 rules in 5 hours, the numbers don’t skew too much: this brings the total to 853 hours, or 107 working days of 8 hours, or a little over five months.

So we have a conservative estimate of 146 + 107 = 253 working days, or 50 work-weeks, which is a little over a man-year. Contrast this with 2.8 man-years for the method using canonical graphs.
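The estimates above can be reproduced with a few lines of arithmetic. The figures (11000 lexemes, half an hour per canonical graph, 75 lexemes checked per day, 1450 rules at a conservative 5+5 hours per 17 rules) come from the text; small rounding differences from the text are due to the text truncating intermediate results.

```python
# Reproducing the scaling estimates from the text
# (8-hour days, 5-day weeks, 48 working weeks per man-year).

HOURS_PER_DAY = 8
DAYS_PER_YEAR = 5 * 48          # working days in a man-year

# Option 1: a lexicon of canonical graphs, ~11000 lexemes at 0.5 h each.
canonical_hours = 11000 * 0.5                        # 5500 hours
canonical_years = canonical_hours / HOURS_PER_DAY / DAYS_PER_YEAR

# Option 2: ontology checking at ~75 lexemes/day, plus rule writing
# (conservatively 5 h devising + 5 h implementing per 17 rules, ~1450 rules).
ontology_days = 11000 / 75                           # ~146.7 days
rule_days = 1450 * (5 + 5) / 17 / HOURS_PER_DAY      # ~106.6 days
my_method_years = (ontology_days + rule_days) / DAYS_PER_YEAR

print(round(canonical_years, 2), round(my_method_years, 2))   # → 2.86 1.06
```

The comparison, roughly 2.9 man-years against roughly 1.1, is what the text rounds to 2.8 versus "a little over a man-year."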

Hence, I conclude that my method does scale better than the method using canonical graphs.

Finally, let me briefly tie this section together with the section on machine translation. One of the reasons why I chose to marry WordNet and the WIVU lexicon to produce the ontology was that I wanted the graphs to be based on English. This would facilitate sharing of the graphs with non-Hebrew-speaking researchers. It also facilitates the machine-translation aspect of translating from Hebrew to English conceptual graphs. Thus an interesting by-product of my chosen ontology-building method is that it facilitates machine translation between Hebrew and English CGs.

10.7 Ambiguity

A few words need to be said about ambiguity. My method allows for ambiguity, at least in the step which transforms syntax to intermediate CGs. In this method, any polysemous word with more than one entry in the ontology would give rise to a multiplicity of graphs which would be carried upwards through the tree.
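How polysemy multiplies the graphs carried upwards can be sketched as follows. This is an illustrative sketch, not the thesis’s implementation: the mini-ontology, the sense names, and the `readings` function are all invented for the example.

```python
# Illustrative sketch of how polysemy yields a multiplicity of readings:
# each word contributes one concept per ontology entry, and all the
# combinations are carried upward. Sense names are hypothetical.

from itertools import product

senses = {
    "light": ["[Light-radiation]", "[Light-lamp]"],   # polysemous: 2 entries
    "be": ["[Be]"],                                   # monosemous: 1 entry
}

def readings(words):
    """All combinations of word senses -> one candidate CG per combination."""
    return [" ".join(combo) for combo in product(*(senses[w] for w in words))]

print(readings(["be", "light"]))   # two readings, one per sense of "light"
```

A clause of n words with k senses each would thus yield kⁿ candidate graphs, which is why disambiguation at a later stage becomes important as soon as polysemous words occur.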

However, that is also the only place where ambiguity can enter. The syntactic, lexical, and morphological levels have all been completely disambiguated by the good people at the Werkgroep Informatica. Their database does not contain any ambiguities, each value being specified at most once for each element. This of course reflects a number of choices on the part of the Werkgroep, some of which are interpretational in nature. However, that is not something which I can change; I am only a consumer of a product which someone else has labored hard to produce, and I cannot alter the basic premises of this product, at least not within the scope of an MA.

The stage which transforms intermediate CGs to semantic CGs likewise does not allow for more ambiguity than is already present from the previous stage. The algorithm produces exactly one output CG for each input CG, no more, no less.

This is clearly a weakness in my method. For example, the rule depicted in Figure 9.1 on page 106 always transforms a “Time” relation to a “ptim” relation, and the rule in Figure 9.4 on page 107 always transforms an object relation to a “thme” relation. Yet neither is a foregone conclusion. For example, the target of the first rule is “In the beginning (Time), God created the Heavens and the Earth.” This is always transformed to say “[Create]->(ptim)->. . . ” etc., signalling that “the beginning” means, in this case, a “point in time.” Yet some so-called “old-Earth creationists”, such as Hugh Ross, would argue that “In the beginning” refers not to a specific point in time, but to a duration of several billion years, during which God created all the heavenly bodies (other planets and the stars, including the Sun) as well as the Earth. Thus the relation “dur” might be better than “ptim” for this particular instance of the “Time” relation. Having the option of generating both possibilities would enhance the value of the method, provided one also made provisions for disambiguating at a later stage. It is better to know that there is an ambiguity, and be able to disambiguate, than not to know one’s options. Something similar can be said for the “object-as-theme” rule, which clearly needs to be only one among several possibilities. Hence, a necessary extension to the method would be to introduce the possibility of ambiguity at the rule-level.

It is a coincidence, but none of the words in the text I have chosen to treat (Genesis 1:1-3) are polysemous. Hence, there are no ambiguities at word-level, and hence only one CG is produced for each clause, not a multiplicity of CGs. This is just luck: Chapter 1 in its entirety does contain polysemous words, and so I would have encountered ambiguities had I treated more text.

10.8 Critique of my method

I would like to take the opportunity to critique my method as it stands. A number of points can be advanced against it:

1. While I intended to treat relations in such a way that they would form the basis of rejecting certain joins due to incompatible relation signatures, I did not follow through on this in the algorithms which I developed. However, since almost all joining takes place in one subroutine (Section 8.5.5.5 on page 98), it would not be difficult to modify the program to handle this.

2. The stage which transforms intermediate CGs to semantic CGs always produces exactly one CG from an input CG, no more, no less. This should be modified so as to be able to reject certain CGs if no rules matched, or if the rules that matched could not be applied because the conclusion used relations with incompatible signatures, for example. As it is, input CGs which are not matched by any rule are simply copied verbatim to the output CG, and no checking is done on relation signatures to verify that the copying makes sense.

3. My ontology should ideally be extended to include defining graphs, canonical graphs, and schemata for each concept type, not just a placement in the “is-a” hierarchy. This is because defining graphs, canonical graphs, and schemata are one way of attempting to “get at” the elusive “concept” or neural excitation behind each concept type. Such an extension of the ontology would make it more useful in a context of knowledge engineering, formal narratological analysis, or machine translation, since the “meaning” of each concept would be further specified. It would still be just symbols in a computer, but at least the meaning would be more specified than by a mere placement in the “is-a” hierarchy, and applications could take advantage of this added knowledge. I know that I argued that writing canonical graphs would be infeasible, but note that I said “ideally” at the beginning of this paragraph. Thus, the ideal and most useful situation would be to have a completely specified ontology, with not only an “is-a” lattice, but also defining graphs, canonical graphs, and schemata for each concept type.

4. For my method, I produced custom lexicons of conjunctions and prepositions. Ideally, these should be taken from the Hebrew-English lexicon directly. However, to save time and to enable more fine-grained control over the lexicons, I wrote them myself.

5. The ontology has a certain perspective, and this influences the meaning which can be extracted from the resulting graphs to a great extent. I treat this problem in Section 10.9.1.

6. My method leaves a lot of Hebrew characteristics unprocessed, such as tense, mood, and inter-clausal relations. The lack of inter-clausal relations is particularly odious, since it means that one can query the knowledge base asking “Did God say anything?” and get an answer, but one cannot ask “What did God say?” and get an answer. This is because each clause results in a separate graph, and the graphs are not reconnected at a higher level. Hence, the graph for “And God said:” is never connected to the graph for “Let there be light.” I shall discuss these shortcomings in the section on further research (10.10).

10.9 Ontology

I wish to say something about my ontology and the way in which it influences the meaning and usefulness of the resulting graphs (10.9.1). I also wish to discuss the adequacy of my ontology for my purposes (10.9.2), and the way in which the ontology plays a key role in my method (10.9.3).

10.9.1 Perspective and general adequacy

Recall that every ontology has a certain perspective (Section 4.1 on page 35). My ontology is no different, in that it takes over the perspective of the WordNet lexicographers. Whether this leaves the ontology useful and adequate can only be determined by the actual application in which my method’s resulting graphs are used.

For example, if the ontology were to be used for Bible Translation, it is not a given that it would make sense, from the point of view of this application, to accept the specification of “God” as a “belief”, as discussed in Section 4.4.3.2 on page 46.

Thus the ontology and its perspective greatly influence the usefulness and applicability of the resulting graphs. Whether they are adequate for a particular application will have to be determined relative to the purposes of that application.


10.9.2 Adequacy for my purposes

However, for my purposes, the ontology is adequate in that it supports my efforts in reaching my goal. My goal was to produce semantic CGs from the source text with no syntax left, and the ontology has certainly shown itself adequate in this respect, since I reached my goal. This validates Hypothesis 2 on page 26.

10.9.3 Key role of my ontology

I hypothesized in Hypothesis 3 on page 26 that syntax alone could not produce enough semantics, and that the ontology would play a key role in producing semantics from syntax. I shall now argue that this is the case.

There are two places in my method where the ontology really shines. One place is at the source of the syntax-to-intermediate-CGs method, namely at word level. Here, the ontology is instrumental in picking a conceptual type for the concept which will represent the word in question.

The other place where the ontology really shines is in the stage where I transform intermediate CGs to semantic CGs. It is not explicitly mentioned in the algorithmic sections, but I rely on Notio’s capability to only produce matches which are compatible in terms of supertype/subtype relations in the ontology. Thus the premise graph must always be a generalization of the part of the input graph which it matches. In other words, the premise graph must have concepts whose concept types are supertypes of the concepts in the input graph which they match.8 This rules out a lot of false matches between rules that really do not apply and certain input graphs.

Thus the ontology does play a key role in my method: First, it is necessary for transforming the Hebrew words to concept types, and second, without it, Notio’s CG matching, which is so central to my method, would not work properly. This validates Hypothesis 3.
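The matching constraint just described can be illustrated with a toy re-implementation. This is not Notio’s actual API; the is-a hierarchy and the function `is_supertype` are invented for the example, and only the constraint itself comes from the text.

```python
# Toy re-implementation (not Notio's API) of the matching constraint:
# a premise concept matches an input concept only if the premise's type
# is a supertype of the input's type, where "supertype" is reflexive.

isa = {                      # child -> parent: a toy "is-a" hierarchy
    "God": "Deity",
    "Deity": "Entity",
    "Light": "Entity",
    "Entity": "Universal",
}

def is_supertype(sup, sub):
    """True iff `sup` is `sub` itself or a (transitive) ancestor of `sub`."""
    while sub is not None:
        if sub == sup:
            return True
        sub = isa.get(sub)   # walk up the hierarchy; None at the top
    return False

# A general premise type matches a more specific input type ...
assert is_supertype("Entity", "God")
# ... and, reflexively, a type matches itself (the relation is a partial order),
assert is_supertype("Light", "Light")
# ... but a premise cannot be more specific than the input it matches.
assert not is_supertype("God", "Entity")
```

In the real system this check is performed for every concept of the premise graph against the corresponding concept of the candidate subgraph, which is what filters out the false matches mentioned above.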

10.10 Further research

My work leaves much to be desired, and opens up many avenues which could be explored in further research. In this section, I consider some of them.

First, the most obvious thing to do would be to extend my method to work for all of the Hebrew Bible. I have already discussed the benefits which such an achievement would bring to various fields. However, it is clearly not something to be attempted during the course of an MA, and so I have limited the scope of my research accordingly.

Second, the next thing to do after having a larger amount of text analyzed would be to implement some of the applications I have mentioned, such as automatic Bible Translation or formal, automatic narratological analysis.

Third, my method could be made better in various ways. For example, I could treat Hebrew verb tenses and Hebrew verb moods. Anaphora resolution could be added. Inter-clausal relations could be considered, such as relative clauses, elliptic clauses, subject and object clauses, adjective clauses, parallel clauses, and defective clauses. All of these are encoded in the WIVU database, and my method should ideally deal intelligently with them.

8 Recall that the supertype relation is reflexive (it is a partial order), and so the concept types can be the same in the premise graph as in the input graph.

[Situation: [Say]-
   -agnt->[God],
   -thme->[proposition:
      [Light]->(stat)->[Be]]
]

Figure 10.3: Example of better handling of direct speech

As previously mentioned (10.7), it would be important to extend the method of Chapter 9 to handle ambiguity.

Also, one important extension of my method would be to handle discourse structures such as direct speech, embedded narratives, and embedded quotations. The WIVU database encodes such structures beautifully, and therefore it is possible to treat them in detail. To give but one example from my chosen text, it would be nice if the words “And God said: ’Let there be light”’ were transformed along the lines of Figure 10.3.

Similarly, the sentence:

The serpent said to the woman: “Did the LORD God really say: ’You must not eat from any tree in the garden’?”

could be represented roughly as in Figure 10.4 on the following page.9 Narratives embedded in quotes would be dealt with similarly. Finally, some way of chaining clauses at a discourse level would be necessary, especially in narrative sections.

Another avenue of research could be how to deal with the problem of contrived parallel structures, as presented in Section 8.3.9.1 on page 84. One possible way of dealing with it would be to use a rule as in Figure 10.5 on the following page. Note how the supertype of the concept which surrounds both layers (?a) will become the new supertype for the combined construction. This makes good sense because it has been arrived at through a two-step process: First, (?c) and (?d) had their least common supertype taken, resulting in (?b). Then (?b) and (?e) had their least common supertype taken, resulting in (?a). Thus (?a) is in fact the least common supertype of all three (?c, ?d, ?e). This would solve the problem for both NPs and PPs.
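The two-step least-common-supertype computation can be sketched as follows. The hierarchy and names here are invented for illustration, not taken from the thesis’s ontology; only the two-step process itself comes from the text.

```python
# Illustrative two-step least-common-supertype computation over a toy
# hierarchy: lcs(?c, ?d) gives ?b, then lcs(?b, ?e) gives ?a, so ?a is
# the least common supertype of all three types.

isa = {"Cat": "Animal", "Dog": "Animal", "Animal": "PhysicalObject",
       "Rock": "PhysicalObject", "PhysicalObject": "Universal"}

def ancestors(t):
    """The chain from `t` up to the top of the hierarchy, `t` included."""
    chain = [t]
    while t in isa:
        t = isa[t]
        chain.append(t)
    return chain

def lcs(a, b):
    """Least common supertype: the first ancestor of `a` also above `b`."""
    above_b = set(ancestors(b))
    for t in ancestors(a):
        if t in above_b:
            return t
    return "Universal"

b = lcs("Cat", "Dog")    # step 1: the inner layer's supertype
a = lcs(b, "Rock")       # step 2: the outer layer's supertype
print(b, a)              # → Animal PhysicalObject
```

Because `lcs` is applied layer by layer, the outermost result is automatically the least common supertype of all the conjoined concepts, which is exactly the property the rule in Figure 10.5 relies on.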

[Situation: [Say]-
   -agnt->[Serpent: #],
   -rcpt->[Woman: #],
   -manr->[Questioning],
   -thme->[proposition:
      [Say]-
         -agnt->[LORD: #],
         -manr->[Commanding],
         -thme->[proposition:
            [Must]-
               <-(neg),
               -ptnt->[Person: {*} #you],
               -thme->[Situation:
                  [Eat]-
                     -thme->[Tree: #any]-
                        -loc->[Garden: #]]
         ]]
]

Figure 10.4: Example of embedded quote

9 This sentence has the text type or structure “NQQ” and is a quotation embedded in a quotation embedded in a narrative.

[Rule: [Universal*a][Universal*b][Universal*c][Universal*d][Universal*e]
   [Premise:
      [Universal?a:
         [Universal?b:
            [Universal?c]->(and)->[Universal?d]]->(and)->[Universal?e]
      ]]
   [Conclusion:
      [Universal?a:
         [Universal?c]->(and)->[Universal?d]->(and)->[Universal?e]
      ]]
]

Figure 10.5: Rule for transforming multi-layered parallel constructions

Another interesting avenue of research would be how to handle metaphors. Metaphors provide a rich mode of expression which is hard to formalize. There are a number of reasons why metaphors are hard to formalize, but chief among them is that metaphors inherently say one thing but mean another. Hence the real meaning of the metaphor is “hidden”, and is certainly “deeper” than the “surface semantics” which I have treated in this thesis. Thus what is interesting in a metaphor is not so much what is said, but what is not said. Since all my method is able to handle is syntax-directed transformation, and since syntax can only ever encode what is there, it follows that metaphors are hard to treat using my method. The work of Rasmussen (1998) or Eileen C. Way10 would probably come in handy, as would Schärfe (2002).

Thus my work opens up possibilities for future research which would provide useful occupation for many years to come, were they to be explored exhaustively.

10.11 Conclusion

In this chapter, I have discussed various concepts, fields, problems, applications, and ingredients relating to my method. I have discussed the fact that my project is in some sense a project in machine translation (10.2). I have discussed the fact that my project lies within the field of knowledge representation, and the various applications that this entails (10.3). I have discussed various aspects of the semantics of my method, including its double nature as “surface semantics” and “compositional semantics” (10.4). I have then investigated whether my method is closely tied to Hebrew, or whether a different language could be substituted, and I concluded that it could (10.5). I have then deliberated on whether my method scales well, and on the feasibility of producing CGs for the entire Old Testament (10.6). I have discussed how ambiguity enters into my method and how I deal with it (10.7). I have given a number of points of critique of my method, where I could have done better (10.8). I have discussed various aspects of my ontology (10.9), and have concluded that Hypotheses 2 and 3 are validated. Finally, I have discussed possible avenues of further research (10.10).

10 See Way, Eileen Cornell. (1991) “Knowledge Representation and Metaphor”, Kluwer, Dordrecht.


Chapter 11

Conclusion

11.1 Introduction

In this chapter, I summarize my thesis (11.2), offer perspective on my work (11.3), and treat the hypotheses which I set forth in Section 1.6 on page 26 (11.4). Finally, I round off the thesis in a final conclusion (11.5).

11.2 Summary

My thesis has revolved around finding a method for transforming syntax to semantics. In particular, I have developed a method for transforming the Hebrew text of Genesis chapter 1, verses 1-3, into the conceptual graphs of John Sowa.

I have used various tools to accomplish the job, among them Notio, Jython, and Emdros. I have also used certain input data as a basis for my work, in particular WordNet, a Hebrew-English lexicon, and the WIVU syntactic analysis of the Hebrew text, in addition to the Hebrew text itself.

The method has been implemented in a Jython program. The process runs automatically without user intervention, and produces a set of conceptual graphs matching the semantics of the input text to a high degree.

In Part I, I laid the groundwork for my method. I introduced my tools (Chapter 2), the Hebrew language and the WIVU database (Chapter 3), and my ontology (Chapter 4).

I also gave a literature review, showing what had been done previously in the field of generation of CGs from text (Chapter 5). In particular, I reviewed two basic methods, namely that of Sowa and Way (1986) on the one hand, and that of Barrière and Nicolas on the other. Sowa and Way used a lexicon of canonical graphs to accomplish their goal, whereas Barrière and Nicolas used syntax-directed joining coupled with rule-based transformation of conceptual graphs.

Part II is where the meat of my thesis is found.

In Chapter 6, I gave an overview of my method and introduced the rest of Part II.

In order for my method to work, I had to transform the WIVU syntax into more traditional syntax trees in the generative tradition. This was necessitated by the large size of the units in the WIVU syntax. In order for the compositionality principle to work (Section 10.4.2), I needed a more fine-grained syntax. I also needed a generative grammar extracted from the analysis of my text. Both goals were accomplished in Chapter 7.

Having obtained a more fine-grained syntactic analysis plus a generative grammar, I was ready to transform this syntax into “intermediate graphs.” This was done in Chapter 8. By means of a three-layered approach (word, phrase, clause), I transformed the syntax trees into CGs which were “quite good”, but which still had bits of syntax left, especially in the relations. The method was bottom-up, syntax-directed joining of conceptual graphs. I should have made it ontology-guided, too, but this was left for further research.

Then, building upon these “quite good” intermediate graphs, I processed them further in Chapter 9, yielding CGs which were now “good enough” and which had no syntax left. This was done through rule-based matching of subgraphs. Each rule had a premise and a conclusion. The premise was matched against the input graph, and if a match occurred, the conclusion was applied. Concepts and relations which were not accounted for in this way were simply copied over into the resulting CG.

Having arrived at the goal of my thesis, namely a method to transform the source text to CGs which were “good enough” and which had no syntax left, I then proceeded to discuss various aspects of my method in Chapter 10. Among other things, I discussed possible applications, semantic considerations, ontological considerations, a critique of my method, ambiguity, and further research. I also argued that my method scales better than the competing method of Sowa and Way (1986), namely syntax-directed joining of canonical graphs.

11.3 Perspective

My work can be seen as one contribution in a long line of research into the generation of CGs from text. The literature review lists much of this research, even though some references have been left out. My contribution is at least partially novel for three reasons:

1. The source language is Hebrew, which, to the best of my knowledge, has not yet been treated with conceptual graphs.

2. I have combined elements from the methods of other people in a novel way. In particular, my method combines Barrière’s syntax-directed joining of CGs to produce “intermediate graphs” with Nicolas’ rules instantiated in premise-conclusion CGs. The nexus point where this joining becomes necessary is at clause-level: At clause-level, I deviate from what Barrière does and follow what Nicolas does instead. In particular, at clause-level I produce intermediate CGs which still have bits of syntax left in them (the phrase-function relations). At this point, Barrière goes all the way and produces more or less fully semantic CGs, because she treats the various patterns that arise at clause-level in her syntax-directed joining method. Nicolas, on the other hand, produces CGs which have almost no semantics and are full of syntax, then uses rules to transform this syntax into more semantic CGs. Thus I follow both Barrière and Nicolas, and in this combination I am somewhat novel.


3. My method for treating clause-level is novel in itself, in that I always produce a star topology. I decide what the most important phrase is, and use that phrase’s CG(s) as the hub(s) of the star(s). This is motivated by the fact that the most important phrase will often be a predicate or a predicate complement. In my experience, the verb, or that which the clause predicates (the predicate complement, in the case of the lack of a verbal predicate), will often be just that: the hub in a star of subgraphs.
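The star construction just described can be sketched like this. The phrase representation (a list of phrase-function/concept pairs) and the hub priority list are assumptions for illustration, not the thesis data structures.

```python
# Sketch of the clause-level star topology: choose the most important
# phrase (the predicate, else the predicate complement) as the hub,
# and link every other phrase to it as a spoke.
# The representation and priorities are illustrative assumptions.

HUB_PRIORITY = ["predicate", "predicate_complement"]

def build_star(phrases):
    """phrases: list of (phrase_function, concept) pairs for one clause.
    Returns the hub concept and the (relation, concept) edges of the star."""
    hub = None
    for wanted in HUB_PRIORITY:
        for function, concept in phrases:
            if function == wanted:
                hub = concept
                break
        if hub is not None:
            break
    if hub is None:
        hub = phrases[0][1]  # fall back to the first phrase
    edges = [(f, c) for f, c in phrases if c != hub]
    return hub, edges

# Verbal clause: the predicate's concept becomes the hub
hub, edges = build_star([("subject", "God"), ("predicate", "Create"),
                         ("object", "Heavens")])
# hub is "Create"; "God" and "Heavens" hang off it as spokes
```

A verbless clause such as `[("subject", "Light"), ("predicate_complement", "Good")]` falls through to the second priority, making the predicate complement the hub, just as the text describes.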

My work can also be seen more broadly as laying down the seeds of a wider contribution to the field of knowledge engineering. As I argued in Section 10.3 on page 124, this opens up the possibility for a number of applications within this field.

11.4 Hypotheses

At various junctures throughout my thesis, I have referred to one or another of my hypotheses, and have stated that they are validated for some reason. In this section, I recapitulate the hypotheses and explain why I think they are validated.

1. It is possible to transform syntactic structures to conceptual structures by means of the method under development, with quite some degree of success. In particular, the conceptual structures will have no syntax left from the source text, and will only contain semantics. In addition, a human reader will be able to recognize a large degree of similarity between the semantics of the CGs and a possible meaning of the natural language text.

I think that the fact that my method works so well and produces such good, syntax-free graphs as those depicted in Figure 9.9 on page 121 is reason enough to validate this hypothesis. The graphs are good for two reasons: First, they contain no syntax (as I stipulated that they must not). Second, a human reader can verify that they do in fact represent the meaning of the Hebrew quite faithfully, if somewhat incompletely.1

2. The ontology which I developed in my 9th semester project is adequate for the purposes of this thesis, given certain modifications.

The fact that my ontology has served my purposes, given the modifications I mentioned in Section 4.3 on page 40, shows that this hypothesis has been validated.

3. Syntax alone cannot yield enough semantics; the ontology will demonstrably play a key role in yielding meaning from the text.

As I argued in Section 10.9.3 on page 134, the ontology does play a key role at two junctures in my method, and my method would not work without the ontology. Therefore, this hypothesis is validated.

1 Things like tenses and moods are missing, for example.


4. The method I have chosen to develop will work for Hebrew, and in particular, using the syntactic analysis of the text which I have at my disposal as a basis.

My method does work for Hebrew and for the particular analysis at my disposal. The graphs in Figure 9.9 show this to be the case. I had to make the syntactic analysis more fine-grained, but this was possible given the categories already existing in the database. Hence, the syntactic analysis served its purpose well.

5. It will be beneficial to transform the syntactic analysis to something which is closer to traditional generative syntax trees.

Without the transformations of Chapter 7, the syntactic units would have been too large and unwieldy for the compositionality principle to function (Section 10.4.2 on page 128). Take, for example, the PP phrase of the first sentence in the text:

>T H-CMJM W >T H->RY
“Et ha-shamayim we et ha-arets”
“The Heavens and the Earth”

This was transformed to the following structure:

[>T [H-CMJM]] W [>T [H->RY]]

This structure had elements of size two and three, which are much easier to deal with. Of course, one could have written a rule specifically for the pattern

Prep Article Noun Conjunction Prep Article Noun

But writing such rules would quickly get out of hand, as the number of rules of this kind would grow very fast, since the syntax allows such great variation in the expression and combination of phrases. Hence, it was beneficial to my method to break down the syntax tree into more fine-grained trees in the generative tradition. Thus, this hypothesis is validated.
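The contrast can be made concrete with a toy sketch: two small compositional rules suffice where flat pattern rules would need one rule per surface combination. The function names, token shapes, and bracketing below are illustrative assumptions, not the thesis grammar.

```python
# Toy illustration of the compositional breakdown: one small rule for a
# prepositional phrase and one for coordination replace a flat rule for
# every 'Prep Article Noun Conjunction Prep Article Noun' combination.
# Token shapes and bracketing are assumptions for illustration.

def parse_pp(tokens):
    """Turn 'Prep Article Noun' into the nested form [Prep [Article-Noun]]."""
    prep, article, noun = tokens
    return [prep, [article + "-" + noun]]

def parse_coord(left, conj, right):
    """Join two already-parsed phrases under a conjunction."""
    return [left, conj, right]

left = parse_pp([">T", "H", "CMJM"])
right = parse_pp([">T", "H", ">RY"])
structure = parse_coord(left, "W", right)
# structure == [['>T', ['H-CMJM']], 'W', ['>T', ['H->RY']]]
```

Because each small rule handles one local configuration, new surface combinations are covered by composing the existing rules rather than by adding new flat patterns.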

6. The number of phrase structure rules found in any given amount of text grows slowly with the number of words analyzed.

This point was argued at length in Section 8.4.2 on page 88 and also in Appendix I on page 235. The numerical basis for the claims is documented in Appendix H on page 231. Here I shall only refer to these sections and state that the hypothesis has been validated empirically.

7. The number of clause valency patterns grows faster with the amount of text analyzed than the number of phrase-level phrase structure rules. Thus it will be easier to treat clause-level using a different method than the one used at phrase-level.

Again, this point was argued at length in Section 8.4.2 on page 88. I have also argued that the larger set of clause-level rules precludes using the same method as that at phrase-level. Further, I have argued that it is in fact easier to treat clause-level using the method I described in Section 8.4.3 on page 92, augmented with the method in Chapter 9. Hence, the hypothesis is validated.

Thus all my hypotheses were validated.


11.5 Conclusion

My work leaves much to be desired, both in terms of execution and in terms of methodology. Further research could go on for many years, but I leave the work here, hoping that others will pick up the work where I left off. The perspectives for automated Bible Translation, in particular, are very promising. SIL International estimates that there are 2700 languages in the world that still need a Bible,2 and automated Bible Translation could be one strategy to reach the goal of completing all these Bible Translations.

Part of my method may also be useful to some weary soul who wishes to transform syntax to semantics. May their efforts be successful and their work their reward. This has been my lot in this MA.

2 SIL International is a non-profit academic charity organization dedicated to literacy and Bible Translation among minority people groups around the world. See <http://www.sil.org/> for more information.


Bibliography

Andersen, F. I. (1994). Salience, implicature, ambiguity, and redundancy in clause-clause relationships in Biblical Hebrew, in R. D. Bergen (ed.), Biblical Hebrew and Discourse Linguistics, Summer Institute of Linguistics, Dallas, Texas, pp. 99–116. ISBN 1-55671-007-0.

Angelova, G., Corbett, D. and Priss, U. (eds) (2002). Foundations and Applications of Conceptual Structures – Contributions to ICCS 2002, Bulgarian Academy of Sciences, Sofia, Bulgaria.

Antonacci, F., Pazienza, M. T., Russo, M. and Velardi, P. (1992). Analysis and generation of Italian sentences, in Nagle, Nagle, Gerholz and Eklund (1992), pp. 437–459. ISBN: 0-13-175878-0.

Barrière, C. (1997). From a Children’s First Dictionary to a Lexical Knowledge Base of Conceptual Graphs, PhD thesis, Université Simon Fraser. Accessed online April 2003: http://www.site.uottawa.ca/~barriere/research/BarrierePhD.ps.

Benn, D. and Corbett, D. (2001). An application of the process mechanism to a room allocation problem using the pCG language, in Delugach and Stumme (2001), pp. 360–376.

Bornerand, S. and Sabah, G. (1992). Conceptual parsing: a syntax-directed joining algorithm for natural language understanding, in Nagle et al. (1992), pp. 481–513. ISBN: 0-13-175878-0.

Bosman, H. J. (1995). Computer assisted clause description of Deuteronomy 8, Actes du Quatrième Colloque International: “Bible et Informatique: «Matériel et Matière»: L’impact de l’informatique sur les études bibliques”, Amsterdam, 15–18 August 1994, number 57 in Travaux de linguistique quantitative, Association Internationale Bible et Informatique, Honoré Champion, Paris, pp. 76–100.

Bosman, H. J. and Sikkel, C. (2002a). Reading authors and reading documents, in Cook (2002), pp. 113–134. ISBN 9004124950.

Bosman, H. J. and Sikkel, C. (2002b). Word level analysis, Internal documentation for the Werkgroep Informatica.

Braüner, T., Nilsson, J. F. and Rasmussen, A. (1999). Conceptual graphs as algebras – with an application to analogical reasoning, in Tepfenhart and Cyre (1999), pp. 456–469.


Chein, M. and Mugnier, M.-L. (1997). Positive nested conceptual graphs, in Lukose et al. (1997), pp. 95–109.

Cohn-Sherbok, D. (1996). Biblical Hebrew for Beginners, Society for Promoting Christian Knowledge, London. ISBN 0-281-04818-5.

Cook, J. (ed.) (2002). Bible and Computer - the Stellenbosch AIBI-6 Conference: Proceedings of the Association Internationale Bible Et Informatique “From Alpha to Byte”, University of Stellenbosch, 17–21 July, 2000, Association Internationale Bible et Informatique, Brill, Leiden. ISBN 9004124950.

Copeland, B. J. (1993). Artificial Intelligence – A Philosophical Introduction, Blackwell Publishers, Oxford and Malden, MA. ISBN 0-631-18385-X.

Delugach, H. and Stumme, G. (eds) (2001). Conceptual Structures: 9th International Conference on Conceptual Structures, ICCS 2001, Stanford, CA, USA, July/August 2001, Proceedings, Vol. 2120 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Doedens, C. F. J. (1994). Text Databases: One Database Model and Several Retrieval Languages, number 14 in Language and Computers, Editions Rodopi, Amsterdam and Atlanta, GA. ISBN 90-5183-729-1.

Dogru, S. and Slagle, J. R. (1993). A system that translates conceptual structures into English, in Pfeiffer and Nagle (1993), pp. 283–292.

Dyk, J. W. (1994). Participles in Context. A Computer-Assisted Study of Old Testament Hebrew, Vol. 12 of APPLICATIO, VU University Press, Amsterdam. Doctoral dissertation, Vrije Universiteit Amsterdam. Promotors: Prof. Dr. G. E. Booij and Prof. Dr. E. Talstra.

Dyk, J. W. (2002). Linguistic aspects of the Peshitta version of 2 Kings 18 and 19, in Cook (2002), pp. 519–544. ISBN 9004124950.

Dyk, J. W. and Talstra, E. (1988). Computer-assisted study of syntactical change, the shift in the use of the participle in Biblical and Post-Biblical Hebrew texts, in P. van Reenen and K. van Reenen-Stein (eds), Spatial and Temporal Distributions, Manuscript Constellations – Studies in language variation offered to Anthonij Dees on the occasion of his 60th birthday, John Benjamins Publishing Co., Amsterdam/Philadelphia, pp. 49–62. ISBN 90-272-2062-X.

Eklund, P. W., Ellis, G. and Mann, G. (eds) (1996). Conceptual Structures: Knowledge Representation as Interlingua – 4th International Conference on Conceptual Structures, ICCS’96, Sydney, Australia, August 1996, Proceedings, Vol. 1115 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Elliger, K., Rudolf, W. and Weil, G. (eds) (1997). Biblia Hebraica Stuttgartensia, editio quinta emendata edn, Deutsche Bibelgesellschaft, Stuttgart.


Ellis, G., Levinson, R., Rich, W. and Sowa, J. F. (eds) (1995). Conceptual Structures: Applications, Implementation and Theory – Third International Conference on Conceptual Structures, ICCS’95, Santa Cruz, CA, USA, August 1995, Proceedings, Vol. 954 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Fargues, J. (1992). Conceptual graph information retrieval using linear resolution, generalization and graph splitting, in Nagle et al. (1992), pp. 573–582. ISBN: 0-13-175878-0.

Fargues, J., Landau, M.-C., Dugourd, A. and Catach, L. (1986). Conceptual graphs for semantics and knowledge processing, IBM Journal of Research and Development 30(1): 70–79.

Fellbaum, C. (ed.) (1998). WordNet: An Electronic Lexical Database, MIT Press, London, England and Cambridge, Massachusetts.

Ganter, B. and Mineau, G. W. (eds) (2000). Conceptual Structures: 8th International Conference on Conceptual Structures, ICCS 2000, Darmstadt, Germany, August 2000, Proceedings, Vol. 1867 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Genest, D. and Salvat, E. (1998). A platform allowing typed nested graphs: How CoGITo became CoGITaNT, in Mugnier and Chein (1998), pp. 154–161.

Givón, T. (1984). Syntax – A Functional-Typological Introduction, Vol. I, John Benjamins, Amsterdam/Philadelphia.

Hardmeier, C., Syring, W.-D., Range, J. D. and Talstra, E. (eds) (2000). Ad Fontes! Quellen erfassen - lesen - deuten. Was ist Computerphilologie?, Vol. 15 of APPLICATIO, VU University Press, Amsterdam.

Harmsen, H. (1992). QUEST: a query concept for text research, Actes du Troisième Colloque International: “Bible et Informatique: Interprétation, Herméneutique, Compétence Informatique”, Tübingen, 26–30 August, 1991, number 49 in Travaux de linguistique quantitative, Association Internationale Bible et Informatique, Champion-Slatkine, Paris and Genève, pp. 319–328.

Harrius, J. (1992). Text generation in expert critiquing systems using rhetorical structure theory, in Nagle et al. (1992), pp. 515–523. ISBN: 0-13-175878-0.

Hoede, C. and Liu, X. (1996). Word graphs: The first set, in P. W. Eklund, G. Ellis and G. Mann (eds), Conceptual Structures: Knowledge Representation as Interlingua – Auxiliary Proceedings of the Fourth International Conference on Conceptual Structures, Bondi Beach, Sydney, Australia, ICCS’96, Unknown publisher, Unknown address, pp. 81–93.

Hoede, C. and Liu, X. (1998). Word graphs: The second set, in Mugnier and Chein (1998), pp. 375–389.


Hoffmeyer, J. (1996). Evolutionary intentionality, in E. Pessa, A. Montesanto and M. Penna (eds), Proceedings from The Third European Conference on Systems Science, Rome 1.–4. Oct. 1996, Edizioni Kappa, Rome, pp. 699–703. http://www.molbio.ku.dk/MolBioPages/abk/PersonalPages/Jesper/Intentionality.html.

Holladay, W. L. (ed.) (1988). A Concise Hebrew and Aramaic Lexicon of the Old Testament, E.J. Brill, Leiden. ISBN 0-8028-3413-2.

Horrocks, G. (1987). Generative Grammar, Longman, London and New York.

Hughes, J. J. (ed.) (1987). Bits, Bytes and Biblical Studies: A Resource Guide for the Use of Computers in Biblical and Classical Studies, Zondervan, Grand Rapids, Michigan. ISBN 0-310-28581-X.

Jenner, K. and Talstra, E. (2002). CALAP and its relevance for the translation and interpretation of the Syriac Bible. The presentation of a research programme on the computer assisted linguistic analysis of the Peshitta, in Cook (2002), pp. 681–697. ISBN 9004124950.

Kabbaj, A. (1999a). Synergy: A conceptual graph activation-based language, in Tepfenhart and Cyre (1999), pp. 198–213.

Kabbaj, A. (1999b). Synergy as an hybrid object-oriented conceptual graph language, in Tepfenhart and Cyre (1999), pp. 247–261.

Kabbaj, A. and Janta-Polczynski, M. (2000). From PROLOG++ to PROLOG+CG: A CG object-oriented logic programming language, in Ganter and Mineau (2000), pp. 540–554.

Kabbaj, A., Moulin, B., Gancet, J., Nadeau, D. and Rouleau, O. (2001). Uses, improvements, and extensions of Prolog+CG: Case studies, in Delugach and Stumme (2001), pp. 346–359.

Landau, M.-C. (1990). Solving ambiguities in the semantic interpretation of texts, in H. Karlgren (ed.), COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics, Helsinki, August 20–25, 1990, pp. 65–70. Available online at http://acl.ldc.upenn.edu/C/C90/C90-2042.pdf.

Lukose, D., Delugach, H., Keeler, M., Searle, L. and Sowa, J. (eds) (1997). Conceptual Structures: Fulfilling Peirce’s Dream – Fifth International Conference on Conceptual Structures, ICCS’97, Seattle, Washington, USA, August 1997, Proceedings, Vol. 1257 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Mann, G. A. (1993). Assembly of conceptual graphs from natural language by means of multiple knowledge specialists, in Pfeiffer and Nagle (1993), pp. 275–282.

Martin, P. (1995). Using the WordNet concept catalog and a relation hierarchy for knowledge acquisition, in P. Eklund (ed.), Proceedings of the Fourth International Workshop on PEIRCE, Santa Cruz, USA, pp. 36–47.


Matthews, P. H. (1997). The Concise Oxford Dictionary of Linguistics, Oxford University Press, Oxford.

McCawley, J. D. (1982). Parentheticals and discontinuous constituent structure, Linguistic Inquiry 13(1): 91–106.

Miller, G. A. (1998). Nouns in WordNet, in Fellbaum (1998), pp. 23–46.

Mineau, G., Moulin, B. and Sowa, J. F. (eds) (1993). Conceptual Structures: Conceptual Graphs for Knowledge Representation – First International Conference on Conceptual Structures, ICCS’93, Quebec City, Canada, August 1993, Proceedings, Vol. 699 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Mineau, G. W. (ed.) (2001). Conceptual Structures: Extracting and Representing Semantics. Contributions to ICCS 2001. The 9th International Conference on Conceptual Structures, July 30th to August 3rd, 2001, Stanford University, California, USA, Dept of Computer Science, Faculty of Sciences and Engineering, University Laval, Quebec City, Quebec, Canada. Email: mineau(a-t)ift(d-ot)ulaval(do-t)ca.

Moor, A. d., Lex, W. and Ganter, B. (eds) (2003). Conceptual Structures for Knowledge Creation and Communication: 11th International Conference on Conceptual Structures, ICCS 2003, Dresden, Germany, July 21–25, 2003, Proceedings, Vol. 2746 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Mugnier, M.-L. and Chein, M. (eds) (1998). Conceptual Structures: Theory, Tools and Applications – 6th International Conference on Conceptual Structures, ICCS’98, Montpellier, France, August 1998, Proceedings, Vol. 1453 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Myaeng, S. H. (1992). Conceptual graphs as a framework for text retrieval, in Nagle et al. (1992), pp. 559–572. ISBN: 0-13-175878-0.

Nagle, T. E., Nagle, J. A., Gerholz, L. L. and Eklund, P. W. (eds) (1992). Conceptual Structures: Current Research and Practice, Ellis Horwood, New York. ISBN: 0-13-175878-0.

Nicolas, S. (2003). Sesei: un filtre sémantique pour les moteurs de recherche conventionnels par comparaison de structures de connaissance extraites depuis des textes en langue naturelle, Master’s thesis, Faculté des études supérieures de l’Université Laval. Accessed online April 2004, http://s-java.ift.ulaval.ca/~steff/memoire-SN-depot-final.pdf.

Nicolas, S., Mineau, G. W. and Moulin, B. (2002). Extracting conceptual structures from English texts using a lexical ontology and a grammatical parser, in Angelova et al. (2002), pp. 15–28.


Nilsson, J. F. (2001). A logico-algebraic framework for ontologies: ONTOLOG, in P. A. Jensen and P. Skadhauge (eds), Ontology-based Interpretation of Noun Phrases: Proceedings of the First International OntoQuery Workshop, number 21/2001 in Skriftserie - Syddansk Universitet, Institut for Fagsprog, Kommunikation og Informationsvidenskab, Dept. of Business Communication and Information Science, University of Southern Denmark, Kolding, pp. 11–35.

Oswalt, J. N. (1986). The Book of Isaiah Chapters 1–39, The New International Commentary on the Old Testament, William B. Eerdmans Publishing Company, Grand Rapids, Michigan. ISBN: 0-8028-2368-8.

Petersen, U. (1997a). Documentation of SynXX, Part II, Unpublished report, available from the author upon request.

Petersen, U. (1997b). Feature types and object types in an MdF implementation of the WI BHS, Unpublished report, available from the author upon request. December 11.

Petersen, U. (1999). The Extended MdF model, Unpublished B.Sc. thesis in computer science, DAIMI, Aarhus University, Denmark. Available from http://www.hum.aau.dk/~ulrikp/.

Petersen, U. (2003). Automatic lexicon-based ontology-creation – a methodological study, Unpublished 9th semester project report, Department of Communication, Aalborg University, available at http://www.hum.aau.dk/~ulrikp/.

Petersen, U. (forthcoming). Emdros – a text database engine for analyzed or annotated text, Proceedings of COLING 2004, Association for Computational Linguistics, pages unknown.

Petersen, U. and Winther-Nielsen, N. (1996). Documentation of SynXX. An introduction to the linguistic programs of the Werkgroep Informatica, Unpublished report, Aalborg University Language Lab. October 8.

Pfeiffer, H. D. and Nagle, T. E. (eds) (1993). Conceptual Structures: Theory and Implementation. 7th Annual Workshop, Las Cruces, NM, USA, July 1992, Proceedings, Vol. 754 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Priss, U., Corbett, D. and Angelova, G. (eds) (2002). Conceptual Structures: Integration and Interfaces: 10th International Conference on Conceptual Structures, ICCS 2002, Borovets, Bulgaria, July 2002, Proceedings, Vol. 2393 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Rasmussen, A. (1998). Metaforisk ræsonneren, PhD thesis, Department of Communication, Aalborg University.

Rassinoux, A.-M., Baud, R. and Scherrer, J.-R. (1994). A multilingual analyser of medical texts, in Tepfenhart et al. (1994), pp. 84–96.


Ruwe, A. (2000). Zur Analyse der Redeformen im Hohenlied mit QUEST 2, in Hardmeier, Syring, Range and Talstra (2000), pp. 171–181.

Schärfe, H. (2001). Reasoning with narratives, Master’s thesis, Department of Communication, Aalborg University. Available from http://www.hum.aau.dk/~scharfe.

Schärfe, H. (2002). CG representations of non-literal expressions, in Priss et al. (2002), pp. 166–176.

Schärfe, H. and Øhrstrøm, P. (2000). Computer aided narrative analysis using conceptual graphs, in Stumme (2000), pp. 16–29. ISBN 3-8265-7669-1. Web: www.shaker.de.

Schärfe, H. and Øhrstrøm, P. (2003). Representing time and modality in narratives with conceptual graphs, in Moor et al. (2003), pp. 201–214.

Schröder, M. (1993). Knowledge based analysis of radiology reports using conceptual graphs, in Pfeiffer and Nagle (1993), pp. 293–302.

Sikkel, C. (2001). Description of Quest II data file format, Unpublished technical note of the Werkgroep Informatica. Version 1.10.

Sikkel, C. (2003). Description of Quest II data file format, Unpublished technical note of the Werkgroep Informatica. Version 1.12.

Southey, F. and Linders, J. G. (1999). Notio – a Java API for developing CG tools, in Tepfenhart and Cyre (1999), pp. 262–271.

Sowa, J. F. (1984). Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA.

Sowa, J. F. (1988). Using a lexicon of canonical graphs in a semantic interpreter, in M. Evens (ed.), Relational Models of the Lexicon, Cambridge University Press, Cambridge, UK, pp. 113–137.

Sowa, J. F. (1992). Conceptual graphs summary, in Nagle et al. (1992), pp. 3–51. ISBN: 0-13-175878-0.

Sowa, J. F. (2000a). Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole Thomson Learning, Pacific Grove, CA.

Sowa, J. F. (2000b). Ontology, metadata, and semiotics, in Ganter and Mineau (2000), pp. 55–81.

Sowa, J. F. and Way, E. C. (1986). Implementing a semantic interpreter using conceptual graphs, IBM Journal of Research and Development 30(1): 57–69.

Stumme, G. (ed.) (2000). Working with Conceptual Structures: Contributions to ICCS 2000, Shaker Verlag, Aachen. ISBN 3-8265-7669-1. Web: www.shaker.de.


Sykes, J. (ed.) (1982). The Concise Oxford Dictionary of Current English, seventh edn, Oxford University Press, Oxford.

Syring, W.-D. (1998). QUEST 2 - computergestützte Philologie und Exegese, Zeitschrift für Althebraistik 11: 85–89.

Syring, W.-D. (2000). Nutzung grammatischer Textdatenbanken zur Analyse literarischer Texte mit QUEST 2, in Hardmeier et al. (2000), pp. 159–170.

Talstra, E. (1997). A hierarchy of clauses in Biblical Hebrew narrative, in E. van Wolde (ed.), Narrative Syntax and the Hebrew Bible, Vol. 29 of Biblical Interpretation Series, Brill, Leiden, New York, Köln, pp. 85–118. ISBN 90-04-10787-8.

Talstra, E. (1998). Phrases, clauses and clause connections in the Hebrew data base of the Werkgroep Informatica: Computer-assisted production of syntactically parsed textual data, Unpublished manuscript detailing the procedures used in the analysis-software developed at the Werkgroep Informatica.

Talstra, E. (2002). Computer-assisted linguistic analysis. The Hebrew database used in Quest.2, in Cook (2002), pp. 3–22. ISBN 9004124950.

Talstra, E. and Dyk, J. W. (1999). Paradigmatic and syntagmatic features in identifying subject and predicate in nominal clauses, in C. L. Miller (ed.), The Verbless Clause in Biblical Hebrew. Linguistic approaches, Vol. 1 of LSAWS: Linguistic Studies in Ancient West Semitic, Eisenbrauns, Winona Lake, pp. 133–185. ISBN 1575060361.

Talstra, E. and Postma, F. (1989). On texts and tools. A short history of the Werkgroep Informatica (1977–1987), in E. Talstra (ed.), Computer Assisted Analysis of Biblical Texts, Vol. 7 of APPLICATIO, VU University Press, Amsterdam, pp. 9–27.

Talstra, E. and Sikkel, C. (2000). Genese und Kategorienentwicklung der WIVU-Datenbank, in Hardmeier et al. (2000), pp. 33–68.

Talstra, E. and van der Merwe, C. H. (2002). Analysis, retrieval and the demand for more data. Integrating the results of a formal textlinguistic and cognitive based pragmatic approach to the analysis of Deut 4:1–40, in Cook (2002), pp. 43–78. ISBN 9004124950.

Talstra, E. and Van Wieringen, A. L. (1992). Introduction, in E. Talstra and A. L. Van Wieringen (eds), A Prophet on the Screen – Computerized Description and Literary Interpretation of Isaianic Texts, Vol. 9 of APPLICATIO, VU University Press, Amsterdam, pp. 1–20.

Talstra, E., Hardmeier, C. and Groves, J. A. (1992). Quest. Electronic concordance application for the Hebrew Bible (database and retrieval software), Nederlands Bijbelgenootschap (NBG), Haarlem, Netherlands. Manual: J. A. Groves, H. J. Bosman, J. H. Harmsen, E. Talstra, User Manual Quest. Electronic Concordance Application for the Hebrew Bible, Haarlem, 1992.


Tepfenhart, W. and Cyre, W. (eds) (1999). Conceptual Structures: 7th International Conference on Conceptual Structures, ICCS’99, Blacksburg, VA, USA, July 1999, Proceedings, Vol. 1640 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

Tepfenhart, W. M., Dick, J. P. and Sowa, J. F. (eds) (1994). Conceptual Structures: Current Practices – Second International Conference on Conceptual Structures, ICCS’94, College Park, Maryland, USA, August 1994, Proceedings, Vol. 835 of Lecture Notes in Artificial Intelligence (LNAI), Springer Verlag, Berlin.

van der Merwe, C., Naudé, J. A. and Kroeze, J. H. (1999). A Biblical Hebrew Reference Grammar, number 3 in Biblical Languages: Hebrew, Sheffield Academic Press, Sheffield.

van Keulen, P. S. (2002). A case of ancient exegesis: The story of Solomon’s adversaries (1 Kgs 11:14–25) in Septuaginta, Peshitta, and Josephus, in Cook (2002), pp. 555–572. ISBN 9004124950.

van Rijn, A. (1992). Generating language from conceptual dependency graphs, in Nagle et al. (1992), pp. 525–542. ISBN: 0-13-175878-0.

Van Valin, Jr., R. D. (2001). An Introduction to Syntax, Cambridge University Press, Cambridge, U.K.

Van Valin, Jr., R. D. and LaPolla, R. J. (1997). Syntax – Structure, meaning, and function, Cambridge University Press, Cambridge, U.K.

Velardi, P., Pazienza, M. T. and De Giovanetti, M. (1988). Conceptual graphs for the analysis and generation of sentences, IBM Journal of Research and Development 32(2): 251–267.

Verheij, A. J. (1994). Grammatica Digitalis I – The Morphological Code in the «Werkgroep Informatica» Computer Text of the Hebrew Bible, Vol. 11 of APPLICATIO, VU University Press, Amsterdam.

Wille, R. (1997). Conceptual graphs and formal concept analysis, in Lukose et al. (1997), pp. 290–303.

Wille, R. (1998). Triadic concept graphs, in Mugnier and Chein (1998), pp. 194–208.

Winskel, G. (1993). The Formal Semantics of Programming Languages: An Introduction, MIT Press, Cambridge, Mass. ISBN 0-262-23169-7.

Winther-Nielsen, N. and Talstra, E. (1995). A Computational Display of Joshua. A Computer-assisted Analysis and Textual Interpretation, Vol. 13 of APPLICATIO, VU University Press, Amsterdam.

Zweigenbaum, P. and Bouaud, J. (1997). Construction d’une représentation sémantique en graphes conceptuels à partir d’une analyse LFG, in D. Genthial (ed.), Actes de la quatrième conférence annuelle sur le Traitement Automatique des Langues Naturelles (TALN 97), Grenoble, France, pp. 30–39. See http://www-test.biomath.jussieu.fr/~pz/.


Appendix A

Hebrew transliteration

The following table gives the transliteration used in this thesis. No differentiation is made between final versions and non-final versions, and the dagesh is not represented.

The dash (-) is used both for (some instances of) maqqeph and for separating words that are written together. For example, H-MLK is “ham-melech”, “the king”, which would have been written as one word in BHS.

Name     Transliteration   Name     Transliteration   Name     Transliteration
Aleph    >                 Tet      V                 Peh      P
Beth     B                 Yod      J                 Qoph     Q
Gimel    G                 Kaf      K                 Tsadi    Y
Daleth   D                 Lamed    L                 Resh     R
Heh      H                 Mem      M                 Sin      F
Waw      W                 Nun      N                 Shin     C
Zayin    Z                 Samech   S                 Tav      T
Chet     X                 Ayin     <

Table A.1: The Hebrew alphabet and its transliteration
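Since the scheme in Table A.1 is a fixed one-to-one mapping, it can be written down as a small lookup table. The following Python sketch is purely illustrative (the dictionary and helper function are hypothetical, not part of the thesis or the WIVU tools):

```python
# Hypothetical lookup table for the transliteration scheme of Table A.1.
# Keys are the transliteration symbols; values are the Hebrew letter names.
TRANSLITERATION = {
    ">": "Aleph", "B": "Beth", "G": "Gimel", "D": "Daleth",
    "H": "Heh",   "W": "Waw",  "Z": "Zayin", "X": "Chet",
    "V": "Tet",   "J": "Yod",  "K": "Kaf",   "L": "Lamed",
    "M": "Mem",   "N": "Nun",  "S": "Samech", "<": "Ayin",
    "P": "Peh",   "Q": "Qoph", "Y": "Tsadi", "R": "Resh",
    "F": "Sin",   "C": "Shin", "T": "Tav",
}

def letter_names(transliterated):
    """Spell out a transliterated word letter by letter, skipping the
    dash that separates words written together (e.g. article + noun)."""
    return [TRANSLITERATION[ch] for ch in transliterated if ch != "-"]

print(letter_names("H-MLK"))  # ['Heh', 'Mem', 'Lamed', 'Kaf']
```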


Appendix B

Hebrew

B.1 Introduction

In this appendix, I describe the Hebrew language and the WIVU database. I refer to this material in Chapter 3.

B.2 The Hebrew language

B.2.1 Introduction

In this section, I introduce a number of features of the Hebrew language as it is found in my chosen text, Genesis 1. I start by describing the graphology (B.2.2), after which I treat word-level, including the morphological and lexical levels (B.2.3). The verbal system, though mostly belonging to word-level, is so complex that it deserves a section of its own (B.2.4). After this section, I treat some aspects of Hebrew syntax (B.2.5). Finally, I sum up my findings in a conclusion (B.2.6).

B.2.2 Graphology

The Hebrew alphabet contains 22 or 23 letters,1 depending on how you count.2 They are listed in Appendix A on page 155. The letters are consonants, with vowels being represented as symbols below, above, or inside the consonants.3

1 See van der Merwe, Naudé and Kroeze (1999, pp. 22–23).
2 The difference is whether sin and shin are counted as one letter or two. They represent different phonetic values (“/s/” vs. “/sh/”, respectively), but ancient evidence suggests that they were originally thought of as being the same letter.
One such piece of evidence comes from the acrostic psalms and other songs of the Old Testament. An acrostic poem is a poem in which the sections or stanzas each begin with successive letters of an alphabet. In almost all (if not all) the acrostic psalms of the Old Testament, the two letters are conflated, thus leading to 22 rather than 23 stanzas. See, for example, Psalm 119:161–168, where both letters are attested. Other examples with both letters in the same stanza include Lamentations 3:61–63. Other examples include Psalm 111:10, where sin is used (no shin), and Psalm 112:10, where shin is used (no sin).

The vowels are auxiliary, in the sense that they are not strictly necessary for reading, since they can, to some extent, be inferred from the consonants. In fact, the vowels are left out in the manuscripts used for public reading in the Jewish synagogues.4 The vowels were not part of the original manuscripts, but were added between 600 AD and 1000 AD by a group of scholars known as the Masoretes.5

There are a number of other signs used in the BHS. Of these, I will only mention a few. The dagesh, represented by a dot in the middle of a letter, is used for a number of purposes, including showing that a consonant has to be doubled (reduplication, see van der Merwe et al. (1999, pp. 38–39)), and showing that a letter is to be pronounced as a plosive rather than a fricative6 (van der Merwe et al. (1999, pp. 24, 38–39)). The maqqeph is a hyphen which connects linguistic words into one graphical entity (van der Merwe et al. (1999, p. 43)).

Thus the Hebrew alphabet consists of 22 or 23 consonants. Vowels are indicated with symbols above, below, or inside the consonants. The dagesh shows reduplication or plosiveness, and the maqqeph joins distinct linguistic words into one graphical entity.

B.2.3 Words

B.2.3.1 Introduction

In this section, I describe certain aspects of word-level in Hebrew which it would be advantageous to be able to discuss at a later stage. First I treat parts of speech (B.2.3.2). I then treat certain aspects of Hebrew morphology (B.2.3.3).

B.2.3.2 Parts of speech

The standard reference grammar used in this thesis, van der Merwe et al. (1999, pp. 53–59), lists the parts of speech given in Table B.1.

Two parts of speech deserve special mention. First, the “predicators of existence”7 form a class consisting of the two words “JC”8 (“there is”) and “>JN” (“there is not”). They do not occur in my chosen text. Second, the discourse markers9 are used “to comment on the content of a sentence and/or sentences from a meta-level”.10 Of the forms that van der Merwe et al. (1999) list, only the following are found in my chosen text: “HNH” (“behold” or “look”) and “Wa-JeHiJ” (“and there became/was”).

3 See van der Merwe et al. (1999, pp. 28–29).
4 See Cohn-Sherbok (1996, p. 5).
5 See van der Merwe et al. (1999, p. 28).
6 “Plosive” and “fricative” are labels which phoneticians give to two different manners of articulation. For example, [f] is a fricative, whereas [p] is a plosive in English. Likewise, [th] is a fricative, whereas [d] is a plosive in English. See Matthews (1997, pp. 137, 284).
7 See van der Merwe et al. (1999, pp. 58, 320–321).
8 See Appendix A on page 155 for the transliteration used in this thesis.
9 See van der Merwe et al. (1999, pp. 59, 328–333).
10 van der Merwe et al. (1999, p. 328).


Part of speech
Verb
Noun
Adjective
Preposition
Adverb
Conjunction
Interrogative
Interjection
Predicators of existence
Discourse marker

Table B.1: Parts of speech in Biblical Hebrew (van der Merwe et al. (1999)).

Note, however, that the “discourse markers” may equally plausibly be regarded as belonging to other parts of speech: “HNH” can thus be classified as an interjection (as the WIVU database does), and “Wa-JeHiJ” can also be described as a wav consecutive imperfect of the verb “HJH” (see below, Section B.2.4.5 on page 162). Likewise, “JC” and “>JN” can also be classified as nouns, which is what Holladay (1988, pp. 145, 13) does.

The other parts of speech present in my chosen text11 are: verb, noun, adjective, adverb, preposition, conjunction, and interjection.12

In addition, Hebrew has a definite article,13 as well as an “object marker” (“>T”, or “et”).14

Thus Hebrew has many of the familiar parts of speech found in Indo-European languages, as well as two “parts of speech” as classified by van der Merwe et al. (1999) which are non-standard.

B.2.3.3 Hebrew morphology

Hebrew has quite a number of classes of morphemes which are used in constructing words. Three classes which deserve special mention15 are:

• nominal ending,

• verbal ending, and

• pronominal suffix.

11 That is, according to the WIVU database.
12 The only interjection present is HNH, “behold”.
13 See van der Merwe et al. (1999, p. 187).
14 For the object marker, see van der Merwe et al. (1999, pp. 245–247).
15 Because all of them will be treated specially by my method.


Person    Number      Gender       State
First     Singular    Masculine    Absolute
Second    Dual        Feminine     Construct
Third     Plural

Table B.2: Summary of possible values for person, number, gender, and state.

                 Nominal ending   Verbal ending    Pronominal suffix
Person           No               Yes              Yes
Number           Yes              Yes              Yes
Gender           Yes              Yes              Yes
State            Possibly         No               No
Meaning-target   noun itself      verb’s subject   noun’s possessor or verb’s object

Table B.3: Summary of Hebrew suffixes.

The nominal ending tells the reader which gender, number, and possibly “state” a noun or adjective belongs to.16 The gender may be either feminine or masculine.17 The number may be either singular, plural, or dual.18 The “state” can be either “construct” or “absolute,” and is used for genitive relations.19 See Section B.2.5.2 on page 164 for a fuller treatment of state relations.

The verbal ending tells the reader which person, number, and gender a verb’s subject has.20

The pronominal suffix is used for possessives on nouns21 and objects on verbs.22 It shows person, number, and gender.

All of this can be summarized as in Table B.2 and Table B.3.
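Table B.3 is essentially a small feature matrix, and can be restated as data. The following Python sketch is my own illustration (the names and representation are hypothetical, not from the thesis or the WIVU database):

```python
# Hypothetical encoding of Table B.3: which grammatical features each
# Hebrew suffix class expresses, and what the suffix says something about.
SUFFIX_FEATURES = {
    "nominal ending": {
        "person": False, "number": True, "gender": True, "state": "possibly",
        "meaning_target": "the noun itself",
    },
    "verbal ending": {
        "person": True, "number": True, "gender": True, "state": False,
        "meaning_target": "the verb's subject",
    },
    "pronominal suffix": {
        "person": True, "number": True, "gender": True, "state": False,
        "meaning_target": "the noun's possessor or the verb's object",
    },
}

# Example query: which suffix classes express person?
with_person = [name for name, f in SUFFIX_FEATURES.items() if f["person"]]
print(with_person)  # ['verbal ending', 'pronominal suffix']
```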

B.2.4 The verbal system

B.2.4.1 Introduction

In this subsection, I describe various aspects of the Hebrew verbal system as necessary for my later discussions. I start by summarizing the tenses and moods (B.2.4.2). I then discuss tense/aspect in more detail (B.2.4.3), after which I do the same for moods (B.2.4.4). The Hebrew verb-system has an intraclausal feature called wav-consecutive perfect/imperfect, which I treat in B.2.4.5. Hebrew also has infinitives (B.2.4.6) and directives (B.2.4.7).

16 See Cohn-Sherbok (1996, pp. 29–31).
17 See van der Merwe et al. (1999, pp. 175–181).
18 See van der Merwe et al. (1999, pp. 181–187).
19 See van der Merwe et al. (1999, pp. 191–200).
20 See Cohn-Sherbok (1996, pp. 55–56), van der Merwe et al. (1999, pp. 67–68).
21 See Cohn-Sherbok (1996, pp. 36–37), van der Merwe et al. (1999, pp. 200–212).
22 See van der Merwe et al. (1999, pp. 90–95).


Tense
Perfect (Qatal)
Imperfect (Yiqtol)
Imperative (Qetol)
Cohortative
Jussive
Infinitive Construct
Infinitive Absolute
Participle

Table B.4: Hebrew verb tenses

Stem formation   Meaning
Qal              No specific meaning; either action or stative verbs
Niphal           Passive
Piel             Intensive
Pual             Passive intensive
Hithpael         Reflexive
Hiphil           Causative
Hophal           Passive causative

Table B.5: Hebrew verb “moods” or “stem formations”

B.2.4.2 Tenses and moods

Hebrew verbs come in 8 “tenses” and 7 “moods.” The tenses (van der Merwe et al. (1999, pp. 68–72)) are listed in Table B.4. The “moods” or “stem formations” (van der Merwe et al. (1999, pp. 73–90)) are listed in Table B.5.

B.2.4.3 Tense/Aspect

Generally speaking, a language may have a verb-system which is either tense/time-oriented or aspect-oriented.23 A verb-system which has the grammatical means of signifying time relative to a ‘now’ (i.e., past/present/future) is primarily tense/time-oriented. A verb-system which does not have this means usually has the grammatical means of signifying whether an action is completed or uncompleted.

There is scholarly disagreement as to whether Biblical Hebrew is a tense/time-oriented or an aspect-oriented language.24 However, in any given instance of a verb-usage in a particular context, both tense/time and aspect are semantically present. Thus, “[i]t is not clear whether in BH25 it is time that assumes aspect, or aspect that assumes time.”26

I do not presume to be able to say with authority either for or against either position. For my work, I have chosen to disregard the tense/aspect elements of Hebrew syntax altogether. Instead, I have made the graphs tense-less and aspect-less. This is clearly a point for further research. Later (in Section B.2.5.3) I will show how the tenses could be interpreted.

23 See van der Merwe et al. (1999, pp. 141–142).
24 See van der Merwe et al. (1999, p. 142).

B.2.4.4 Moods

The “stem formations” or “moods” are also called the “binyanim.” Morphologically, they are realized by means of verb prefixes, infixes, and suffixes, as well as vowel changes.

For example, the verb “HLL”, “to praise”, in the Hithpael or reflexive mood means “to praise oneself”.27 In the Niphal or passive first person masculine singular, “HLL” would mean “I am praised”. The verb “CWB”, which in the Qal stem means “return”, among other things,28 would mean “cause us to return” in the Hiphil or causative first person plural masculine imperative.

Observe that the Niphal is the passive version of Qal,29 the Pual is the passive version of Piel,30 and the Hophal is the passive version of Hiphil.31

Thus the “mood” shows both voice (active/passive) and a number of other features, including intensive, reflexive, and causative.
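Table B.5, together with the passive pairings just noted, can be modelled as a small table of voice and nuance per stem. The Python sketch below is a deliberately simplified illustration of mine (the labels, and treating the Hithpael as “active”, are my own assumptions; the Niphal–Qal pairing is the over-simplification discussed in the footnotes):

```python
# Hypothetical, simplified model of the binyanim in Table B.5:
# each stem gets a default voice and (possibly) an added nuance.
BINYANIM = {
    "Qal":      ("active",  None),
    "Niphal":   ("passive", None),
    "Piel":     ("active",  "intensive"),
    "Pual":     ("passive", "intensive"),
    "Hithpael": ("active",  "reflexive"),   # "active" is a simplification
    "Hiphil":   ("active",  "causative"),
    "Hophal":   ("passive", "causative"),
}

def passive_counterpart(stem):
    """Return the passive stem carrying the same nuance, if any."""
    _, nuance = BINYANIM[stem]
    for other, (voice, other_nuance) in BINYANIM.items():
        if other != stem and voice == "passive" and other_nuance == nuance:
            return other
    return None

print(passive_counterpart("Piel"), passive_counterpart("Hiphil"))
# Pual Hophal
```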

B.2.4.5 Wav + perfect/imperfect

The letter “Wav” is the conjunction “and”. It combines with the perfect (qatal) and the imperfect (yiqtol) at the level of intra-clausal syntax to generate two special verb-forms. These are called wav-consecutive perfect (weqatal) and wav-consecutive imperfect (wayyiqtol).

The wayyiqtol form is used together with qatal forms to indicate continuation of temporal spheres and aspects, but can also signal progression.32 It can also be used to introduce or control the flow of narrative.33

The weqatal form is used to “refer to the same temporal spheres and aspects as an imperfect form”,34 but can also signify progression, e.g., sequence in time.35 It can also be used to carry the backbone of a narrative, either predictive (future), habitually descriptive, or precative.36

25 In quotes from the van der Merwe grammar, “BH” means “Biblical Hebrew.”
26 Quote is from van der Merwe et al. (1999, p. 144).
27 See Cohn-Sherbok (1996, p. 78).
28 See Holladay (1988, p. 362).
29 This is an over-simplification; see van der Merwe et al. (1999, p. 78), which states that only about 60% of the extant Niphals are related to the Qal stem. The rest are related to Piel or Hiphil, or bear no relation to any other stem.
30 In all cases, according to van der Merwe et al. (1999, p. 82).
31 According to van der Merwe et al. (1999, p. 88). The Niphal may also take on the function of the passive sense of Hiphil (ibid.).
32 See van der Merwe et al. (1999, p. 165).
33 See van der Merwe et al. (1999, pp. 166–168).
34 See van der Merwe et al. (1999, p. 169).
35 Ibid.
36 See van der Merwe et al. (1999, pp. 169–170).


Form          Person            Direct/Indirect
Imperative    Second            Direct
Cohortative   First             Indirect
Jussive       Third (Second)    Indirect

Table B.6: Directives in Biblical Hebrew.

Sometimes, the wav does not give the verb this special “consecutive” status. In such cases, grammarians of Biblical Hebrew speak of “wav-copulative perfect” and “wav-copulative imperfect.”37 Then the conjunction just means “and”, and the verb is a straight perfect (qatal) or imperfect (yiqtol) with no special meaning.

Below, in Section B.2.5.3 on page 165, I return to this distinction, giving it a tense-oriented semantic interpretation.

B.2.4.6 Infinitives

There are two kinds of infinitives: infinitive construct and infinitive absolute. In my chosen text, only the infinitive construct occurs. It may be used in place of a noun, often filling the slot of Subject in the matrix clause.38 It may also be part of the predicate, either as complement, filler of mental states after verbs of cognition, signifying purpose, manner or other adverbial functions, and a number of other roles as part of the predicate.39 Or, it may act as a complementizer marking reported speech,40 in the form L->MR.41

B.2.4.7 Directives (Imperatives, Cohortatives, Jussives)

Three verb forms are precative in their meaning: Imperative, Cohortative, and Jussive. Imperative is, broadly speaking, a direct command to the second person.42 The cohortative is an indirect command to the first person,43 and the jussive is an indirect command to the third person, sometimes the second.44 This can be summarized as in Table B.6.

Morphologically, the Cohortative and Jussive are special forms of the imperfect (yiqtol).45 This means that they carry much of the same semantic load as the imperfect, as regards tense/aspect.46

37 See van der Merwe et al. (1999, pp. 168, 171).
38 See van der Merwe et al. (1999, p. 154).
39 See van der Merwe et al. (1999, pp. 154–155).
40 See van der Merwe et al. (1999, pp. 155–156).
41 L>MR is pronounced “lemor”, and comes from the root “>MR,” “to say.” See Holladay (1988, p. 21). The infinitive construct L>MR is used very often in the Hebrew Bible to mark reported speech. So often, in fact, that the programs of the Werkgroep Informatica use it as one of the triggers for deciding that a piece of text is a quotation.
42 See van der Merwe et al. (1999, p. 71).
43 Ibid. and p. 151.
44 Ibid. and p. 152.
45 See van der Merwe et al. (1999, pp. 75–76).
46 See van der Merwe et al. (1999, p. 152).


B.2.5 Syntax

B.2.5.1 Introduction

In this subsection, I describe two features of Hebrew which will each be treated in its own special way in my method. First, I describe the feature “state”, which has to do with genitive relations (B.2.5.2). Second, I describe the “tense” of verbs, as regards the tense/time dimension (B.2.5.3).

B.2.5.2 State

Hebrew has a feature called “state.” It applies to nouns,47 adjectives,48 and participles.49 The feature “state” may take on the values “absolute” or “construct”. The “absolute” value is the “normal” form of the word.50 The “construct” value is shown by various morphological changes, and is used to show that the word in the construct state and the word following it “form a possessive construction (in the broadest sense of the word)”.51

For example, the English construction

Horses of the King

can be written in Hebrew as

SWSJ H-MLK52

where “SWSJ” is in the construct state and “MLK” is in the absolute state. Thus the word in the absolute state is the possessor, and the word in the construct state is the possessed.

Sometimes, the word in the construct state is homonymous with the same form in the absolute state. For example, the masculine singular construct is mostly the same as the masculine singular absolute.53 Therefore, construct relations are best viewed as a phrase-level construction rather than a morphological construction, which can be disambiguated by means of syntactic or pragmatic considerations, even if the morphology is ambiguous.

Thus Hebrew has a construct relationship which applies to nouns, adjectives, and participles. It consists of a word in the construct state followed by a word in the absolute state. The construction shows a possessive relationship (in the broadest sense of the word). The construction is best viewed as a phrase-level phenomenon rather than as only occurring at the level of words or morphemes.

47 See van der Merwe et al. (1999, pp. 191–200).
48 When functioning as a noun. See van der Merwe et al. (1999, p. 234).
49 When functioning as a noun. See van der Merwe et al. (1999, p. 163).
50 See van der Merwe et al. (1999, p. 192).
51 Ibid.
52 “Susey hammelek”. “SWS” means “horse” (or “stallion”), and “MLK” means “king”. “H” is the definite article. See Holladay (1988, pp. 254, 198).
53 See van der Merwe et al. (1999, p. 193).
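Because the construct relation is a phrase-level pattern (a construct-state word followed by an absolute-state word), it can be sketched as a simple rule over tagged words. The Python below is a toy illustration of mine, not the thesis’s actual extraction method:

```python
# Toy sketch of reading construct chains: a word in the construct state
# is the possessed; the following word in the absolute state is the
# possessor (a "possessive construction in the broadest sense").
def possessive_pairs(words):
    """words: list of (lexeme, state) pairs in text order."""
    pairs = []
    for (lex, state), (next_lex, next_state) in zip(words, words[1:]):
        if state == "construct" and next_state == "absolute":
            pairs.append((lex, next_lex))  # (possessed, possessor)
    return pairs

# SWSJ H-MLK, "horses of the king": SWSJ is construct, MLK absolute.
print(possessive_pairs([("SWSJ", "construct"), ("H-MLK", "absolute")]))
# [('SWSJ', 'H-MLK')]
```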


Past                                       Future
Qatal                                      Yiqtol
Wav-consecutive imperfect (wayyiqtol)      Wav-consecutive perfect (weqatal)
Wav-copulative perfect (wav + qatal)       Wav-copulative imperfect (weyiqtol)

Table B.7: Tenses as specified by intraclausal syntax.

B.2.5.3 Tense

Andersen (1994) argues that the problem of tense in Biblical Hebrew is resolved not at the level of verb morphology, but at the level of intra-clausal and inter-clausal syntax. He argues that the division shown in Table B.7 is the salient reading of Hebrew verb forms (see the sustained argument on pp. 99–102 of Andersen (1994)).

The wav-copulative perfect (past) and the wav-consecutive perfect (future) are homonymous (p. 102). However, in a paragraph dominated by wav-consecutive imperfect (wayyiqtol; past), the former reading is more likely (p. 103). This shows that inter-clausal relationships are also important.

Andersen’s division is in agreement with van der Merwe et al. (1999) on the following points:

• Weqatal can be used as future predictively and precatively (van der Merwe et al. (1999, pp. 169–170), Andersen (1994, pp. 102–103)).

• Yiqtol can also be used as future tense (van der Merwe et al. (1999, pp. 146–147), Andersen (1994, p. 103)).

• Wayyiqtol can be used as past tense (van der Merwe et al. (1999, p. 166), Andersen (1994, p. 103)).

• Qatal can also be used as past tense (van der Merwe et al. (1999, pp. 144–145), Andersen (1994, p. 103)).

Andersen’s division shows the salient reading of the verb-forms, that is, the one that can be assumed by default. However, pragmatic concerns may force us to interpret the verb-forms differently: for example, in the presence of time-adverbials which contradict this division, a different reading is possible. Andersen argues that this division is the one that holds in most cases, and I shall not have occasion to question his findings.54

This forms the basis of an interpretation of the Hebrew tenses which could be used in future research. In my work, however, I have chosen to disregard the tenses, instead making the graphs tenseless.

54 Other scholars postulate the same division. E.g., Dyk and Talstra (1988, p. 51) quote the textbook of Wolfgang Schneider (“Grammatik des Biblischen Hebräisch”, Claudius Verlag, Munich, 1985–1988) as having basically the same division.
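Andersen’s default division in Table B.7, together with the homonymy caveat, can be stated as a small decision rule. The sketch below is my own hedged reading of it (the form labels and the majority heuristic are my assumptions, not Andersen’s formulation):

```python
# Hypothetical encoding of Table B.7: the salient (default) tense of
# each verb form, following Andersen (1994).
DEFAULT_TENSE = {
    "qatal": "past",
    "wayyiqtol": "past",      # wav-consecutive imperfect
    "wav+qatal": "past",      # wav-copulative perfect
    "yiqtol": "future",
    "weqatal": "future",      # wav-consecutive perfect
    "weyiqtol": "future",     # wav-copulative imperfect
}

def salient_tense(form, context_forms=()):
    """Default tense of a form. Since weqatal and wav+qatal are
    homonymous on the surface, a surrounding paragraph dominated by
    wayyiqtol tips the reading toward the past (copulative) variant."""
    if form in ("weqatal", "wav+qatal") and context_forms:
        wayyiqtols = sum(1 for f in context_forms if f == "wayyiqtol")
        if wayyiqtols > len(context_forms) / 2:
            return "past"
    return DEFAULT_TENSE[form]

print(salient_tense("wayyiqtol"))                            # past
print(salient_tense("weqatal", ["wayyiqtol", "wayyiqtol"]))  # past
```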


B.2.6 Conclusion

In this section, I have looked at a number of features of the Hebrew language which are important for understanding my later discussions. This has included graphology (B.2.2), Word-level (B.2.3), the verbal system (B.2.4), and various aspects of the syntax of the language (B.2.5).

For Word-level, I have treated the various parts of speech (B.2.3.2), and a number of features from Hebrew morphology (B.2.3.3).

Since the verbal system is one of the most complex areas of Hebrew grammar, I have treated it in a section of its own apart from Word-level. I have looked at tense/aspect and moods in the Hebrew verb (B.2.4.2, B.2.4.3, B.2.4.4). I have looked at what happens when a wav combines with a perfect or imperfect verb form (B.2.4.5). Finally, I have looked at infinitives (B.2.4.6) and directives (B.2.4.7) in Hebrew.

For syntax, I have looked at only two features of the language, namely state (B.2.5.2) and tense/time (B.2.5.3).

Thus I have described certain features of the Hebrew language, some of which are pertinent to my method and to my discussions elsewhere in the thesis.

B.3 The WIVU database

B.3.1 Introduction

In this section, I describe the parts of the WIVU database which are relevant to my methodological considerations. First, I give a short history of the Werkgroep Informatica (B.3.2). I then treat a very important distinction in the WIVU database which permeates all of the data, namely that between distributional and functional data (B.3.3). I then describe the methods and procedures used in producing the WIVU database (B.3.4). I then treat Word level (B.3.5), Phrase level (B.3.6), Clause level (B.3.7), and Sentence level (B.3.8) in succession, reflecting the order in which the database is produced. Finally, I sum up my findings in a conclusion (B.3.9).

B.3.2 History of the Werkgroep Informatica

The Werkgroep Informatica was founded by Eep Talstra in 1977.55 From the beginning, it was a research unit within the Faculty of Theology at the Vrije Universiteit Amsterdam.56 Its founding ideal was to “bring together members of the Faculty who would be helped in their research by any kind of computer application”.57 Thus the Werkgroep initially drew together members from “biblical studies, the sociology of religion, [and] the disciplines of ecclesiastical history and dogmatics”,58 which I judge to be quite a diverse range of theological disciplines.59

55 I have adapted much of the discussion in this section from Talstra and Postma (1989). A description of the early stages of the project is also found in Hughes (1987, pp. 505–509).
56 I.e., Free University of Amsterdam.
57 I quote from Talstra and Postma (1989, p. 9).

It was, however, also clear from the start that the computer-assisted analysis of biblical texts would be the most complex and demanding task of the Werkgroep, while also being the most important. Therefore, the “biblical studies” section has been the most active in the Werkgroep.

The years 1977–1979[60] were devoted to matters necessary for setting up the research unit, including getting access to the necessary computer facilities and getting the proper training, e.g., in PASCAL programming. A small database of Hebrew Old Testament texts was produced, allowing the testing of various methods and programs for linguistic analysis of the text. Only morphological and lexical information was produced at this stage, including an encoding scheme for the text which allowed programmatic production of morphologically and lexically analyzed texts from the encoded text alone.

The years 1979–1983[61] saw more focused interest in developing the Hebrew database, enfleshed by the inclusion of a new member of the Werkgroep, Ferenç Postma. He was responsible for producing a fully morphologically and lexically encoded text according to the principles hammered out during the years 1977–1979. From this text, a number of results began to emerge, including the production of concordances for certain Old Testament books, and a study by P.A. Siebesma on the verbal tenses in Deutero-Isaiah.62 Another research effort of the Werkgroep which dates from this period was the design of a similar encoding for New Testament Greek, where an encoded text, together with the necessary software for analyzing it, was produced for a number of books of the Greek New Testament. A third research effort from this period was the start of a project on Old Testament bibliography by A.J.O. van der Wal. In summary, this period saw a continued effort in the linguistic encoding and analysis of Biblical texts, resulting in the emergence of the first results, along with other related research efforts.63

58 Ibid.
59 We may note that the founding ideal of the WI thus is very similar to one of the goals of human-centered informatics. In human-centered informatics, we strive both, on the one hand, to apply computer science to the traditional subjects of humanities, such as linguistics, and, on the other hand, to apply insights from traditional subjects in humanities to computer science.
At the WI, they do both. On the one hand, they apply computer-methods to studying ancient texts and their exegesis, while on the other hand, this application of computer-technology is shaped by an understanding of the traditional subjects of linguistics and theology. In other words, both the humanities-oriented subjects of linguistics and theology on the one hand, and computer science on the other hand, are enriched by the research efforts of the WI.

60 See Talstra and Postma (1989, p. 10–13).61 See Talstra and Postma (1989, p. 13–15).62 Deutero-Isaiah is the common name used by theologians for the book of Isaiah chapters 40-55. See Oswalt

(1986, pp. 17–19).63 On the subject of the importance of the production of a linguistically analysed database of the Biblical texts,

Talstra and Postma write (p. 13), “Using this database, we have programs designed to do the grammatical and lexicalsearching based on the assumption that much exegetical work, of whatever method, involves searching for parallelsyntactic constructions as much as, or even more than searching for lexical information on word frequencies andword repetitions. In many cases, both will be needed. Because the system of morphological encoding is capable ofboth isolating the lexeme and calculating grammatical functions, programs could be written now to provide for thetype of research wanted.” This resulted in the publication of a book including, not only a traditional lexically basedconcordance, but also a syntactic concordance for Deutero-Isaiah (Talstra, E. and Postma, F. and van Zwet, H.A.(1982). “Deuterojesaja. Proeve van automatische tekstverwerking ten dienste van de exegese”, Free University

Page 168: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

168 APPENDIX B. HEBREW

The years 1983–198564 saw both a weakening and a strengthening of the Werkgroep. Thefirst years of the period were marked by a weakening economy and resultant budget cuts, whichalmost led to the cancellation of the Werkgroep’s activities. However, the Werkgroep survivedthe dire prospects of termination, and subsequently has flourished remarkably. In 1983, EepTalstra started developing linguistic programs which used the Hebrew machine-readable text ofthe Werkgroep for analysis at linguistic levels above the Word-level. The book of Deutoronomywas almost fully analyzed from morpheme to text in cooperation with Lenart de Regt of LeidenUniversity. In addition, a new project was started to encode and analyze Biblical Aramaic texts.65

The years 1985–198766 saw the addition of two new Werkgroep members, Janet W. Dyk andPeter Crom. Mr. Crom worked as a programmer and system administrator. Ms. Dyk, who hada background in linguistics and semitic languages, worked on training students in the use of theprograms of the Werkgroep, so that courses could be established for students with an interest inBible translation and the grammar of the Biblical languages. She also revised the morphologicalencoding for the Greek New Testament, leading to more possibilities in the morphological andlexical analysis.

From 1987 onwards, there is no published material on the history of the Werkgroep per se.Instead, the following relies on personal communication with members of the Werkgroep,67 aswell as publications which mark milestones in the history of the Werkgroep.

In 1987, Hendrik Jan Bosman joined the Werkgroep. About a year later, Constantijn Sikkelalso joined. These two men have been with the Werkgroep ever since, and have contributedgreatly to the advancement of the Werkgroep’s purposes. Hendrik Jan initially assisted withdata production. From 1992–1996, he was a PhD student at the Werkgroep. Constantijn Sikkelhas worked as a system administrator and programmer. Bosman says of Mr. Sikkel, “Doeseverything for everybody, really.”68

Saskia Leene joined the Werkgroep in 1987 and left in 1994. Ms. Leene assisted with data production, and was the programmer for Ms. Janet Dyk’s dissertation (promotion in 1994).

In 1992, a milestone in the history of the Werkgroep was reached through the publication of a software program called QUEST. This software program was a text database application for the IBM PC and MS-DOS containing the WIVU database in its 1992 state. QUEST was produced by the Werkgroep Informatica in cooperation with Christof Hardmeier of Theologische Hochschule Bielefeld and J. Alan Groves of Westminster Theological Seminary in Philadelphia, Pennsylvania. The software was written by AND software of Rotterdam, and was published by the Netherlands Bible Society.69

QUEST had a query language called QML (Query Menu Language). This language was

Press, Amsterdam, 2nd edition). Thus one of the main reasons for producing a syntactic database of the Old Testament in Hebrew is that it provides a supporting research tool for doing exegetical research.

64 See Talstra and Postma (1989, pp. 15–20).
65 Parts of the Old Testament are originally in Aramaic, a semitic language related to Hebrew but distinct from it.
66 See Talstra and Postma (1989, pp. 20–21).
67 I am especially indebted to Mr. Hendrik Jan Bosman for helpful assistance with digging up historical facts about the Werkgroep.
68 Personal communication.
69 See Talstra, Hardmeier and Groves (1992).


B.3. THE WIVU DATABASE 169

designed by Henk Harmsen, Crist-Jan Doedens, and Jan Melein and implemented by AND software.70 Crist-Jan Doedens later took the ideas nascent in QML and turned them into a full-fledged query language, as well as a text database model, in his 1994 PhD thesis.71 Out of Doedens’ work sprang my B.Sc. work,72 which elaborated on Doedens’ database model and made his query language more implementable. This eventually led to the implementation of Emdros, and today the Werkgroep are actively using Emdros for some of their research tasks.

In 1996, I myself became involved in the Werkgroep. In the Spring of 1996, I met an associate professor of theology, teol.dr. Nicolai Winther-Nielsen, who introduced me to the analysis-programs of the Werkgroep. My task was to make the programs run under the Linux operating system, and to analyse and describe the programs. The latter task resulted in a preliminary report (Petersen and Winther-Nielsen (1996)) which later turned into a more stringent report (Petersen (1997a)). The former task has continued even until today, where I remain the maintainer of the Linux versions of the Werkgroep’s programs. In February 1997, I received a grant to go to Amsterdam to visit the Werkgroep. During this time, I reprogrammed parts of one of their programs, and formed ties with the Werkgroep. These ties led to their inviting me to come and work for them during the months October to December 1997. During this time, I reimplemented a few more of their programs, designed and implemented an algorithm for finding the boundaries of certain elements in the WIVU database,73 and wrote a design document for how to apply Doedens’ work to the WIVU database.74

QUEST contained the state-of-the-art WIVU database as it was in 1992. However, since then the database has been expanded greatly, with more books being fully analysed from morpheme to text, and all of the text being analysed up to clause-level. Since 1992, a project has been underway to create the successor to QUEST, which for many years has been code-named QUEST 2, but in its final market-form will be called the “Stuttgart Electronic Study Bible” (SESB). This project is a cooperation between the Werkgroep, the Ernst-Moritz-Arndt-Universität Greifswald (with Christof Hardmeier and Wolf-Dieter Syring as the main partners), the German Bible Society and Libronix Corporation.75 The Werkgroep’s task in this endeavor is to deliver the WIVU database in machine-readable format, and for this purpose, a data format was devised, called QDF.76 This data format was in part based on my earlier work from 1997.77

Let me here interject an anecdote. I have written a program which takes the QDF files, importing them into Emdros. This gives rise to the possibility of doing syntactic research on

70 See Doedens (1994, pp. 263–264). The language was described partially in Harmsen, Henk (1988). “Software-functions, quest operating system”, unpublished report, Theological Faculty, Vrije Universiteit, Amsterdam, September 1988 and more fully in Doedens, Crist-Jan and Harmsen, Henk (1990). “QUEST retrieval language reference manual”, unpublished reference manual, presented to AND software, September 1990, both of which are cited in Doedens (1994). In addition, Harmsen (1992) partially describes the QUEST retrieval language.

71 Doedens (1994).
72 Petersen (1999).
73 See below, section B.3.5.3 on page 178.
74 Petersen (1997b).
75 For a description of the project, see Syring (1998). For examples of research done with beta-versions, see Syring (2000) and Ruwe (2000).
76 Quest Database Format. See Sikkel (2001).
77 Petersen (1997b).


170 APPENDIX B. HEBREW

the WIVU database with Emdros. Sometimes, this can lead to confirmation of old insights into Hebrew grammar which are significant exegetically. For example, in Judges 5:1, we have the phrase “And Deborah and Barak sang”. The subject of this sentence (“Deborah and Barak”) is obviously plural, since it has “X and Y”. Because the verb is feminine singular in Hebrew, certain feminist theologians have advanced the theory that it was really only Deborah who sang, whereas Barak was only an adjunct who had to be there because of the alleged male-chauvinistic nature of the Hebrew semitic culture, and the resulting impossibility of letting a woman sing single-voicedly in Holy Scripture. The feminists might have taken the trouble to read the old standard Hebrew grammar by Gesenius and have seen that a verb agrees with the first part of a compound subject. Can this old insight be confirmed by modern means?

Using the MQL query language of Emdros, together with the WIVU database, one can search the entire Old Testament for similar constructions (i.e., a clause with a singular verb yet a compound subject in which an “and” conjunction is found). The results turn up around 250 unique instances, which shows that the construction was not uncommon in ancient Hebrew. Therefore, the claim of the feminists is weakened on syntactic grounds. This is an example of how a syntactic database can be of help in testing exegetical theories. End of anecdote.
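
The condition being searched for can be sketched in miniature. The following Python fragment is my own illustration, not actual MQL and not the real WIVU schema: the keys ("function", "number", "lexemes") and their values are hypothetical stand-ins for whatever features the real database uses.

```python
# Illustrative sketch (not MQL, not the WIVU schema): find clauses whose
# predicate verb is singular but whose subject phrase contains the
# coordinating conjunction "W" ("and"), as in Judges 5:1.

def has_compound_subject_with_singular_verb(clause):
    """True if the clause has a singular predicate and a subject
    phrase containing the conjunction 'W'."""
    predicate_singular = any(
        p["function"] == "Pred" and p["number"] == "singular"
        for p in clause["phrases"]
    )
    compound_subject = any(
        p["function"] == "Subj" and "W" in p["lexemes"]
        for p in clause["phrases"]
    )
    return predicate_singular and compound_subject

# A toy clause modeled after Judges 5:1, "And Deborah and Barak sang"
# (transliterations are approximate and for illustration only):
judges_5_1 = {
    "phrases": [
        {"function": "Pred", "number": "singular", "lexemes": ["CJR"]},
        {"function": "Subj", "number": "plural",
         "lexemes": ["DBWRH", "W", "BRQ"]},
    ]
}

corpus = [judges_5_1]
matches = [c for c in corpus if has_compound_subject_with_singular_verb(c)]
print(len(matches))  # prints 1
```

In the real setting, the query engine evaluates this kind of condition over the whole Old Testament; the point here is only the shape of the test.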

The Werkgroep has also produced an electronic Hebrew-and-Aramaic to English-and-German lexicon. This project ran officially from 1997 to 1998, and unofficially until the present. Hendrik Jan Bosman and Ferenç Postma were the chief compilers of this lexicon.

In the late 1980’ies, Arian J.C. Verheij joined the Werkgroep as a PhD student (promotionin 1990). He initially worked on data production, as well as a new encoding of the text. Thislatter project culminated in 1994 with the publication of Verheij (1994), which described thenew encoding, another milestone in the history of the Werkgroep. For this project, Dr. Verheijcollaborated with C. Sikkel and F. Postma. Later, Dr. Verheij worked on the binyanim of He-brew,78 culminating in the publication of Verheij, A.J.C. (2000), “Bits, Bytes, and Binyanim. AQuantitative Study of Verbal Lexeme Formations in the Hebrew Bible”, Orientalia LovaniensiaAnalecta, Volume 93, Peeters, Leuven.

Several other persons have been involved peripherally in the Werkgroep. One who deserves mention is teol.dr. Nicolai Winther-Nielsen, then of the Lutheran School of Theology in Aarhus, who co-authored a volume on the book of Joshua with Prof. Dr. Eep Talstra.79 Prof. Dr. Christo H.J. van der Merwe of Stellenbosch University, South Africa, has also had an influence on the Werkgroep, notably through a collaboration with Prof. Dr. Eep Talstra,80 and guest visits between the two colleagues, both in Stellenbosch and in Amsterdam.

In 1999, a project called CALAP was started in collaboration with colleagues from Leiden University.81 CALAP stands for “Computer Assisted Linguistic Analysis of the Peshitta”. The Peshitta contains the Syriac version of the Old Testament, and the project aims at developing for parts of the Syriac Old Testament what the Werkgroep has done for Hebrew. This project is done mainly by Dr. Janet Dyk and Dr. Percy van Keulen (of Leiden) under the supervision of Dr. K.D. Jenner (of Leiden) and Prof. Dr. Eep Talstra. Hendrik Jan Bosman and Constantijn Sikkel also

78 See above, section B.2.4.4 on page 162.
79 Winther-Nielsen and Talstra (1995).
80 See, e.g., Talstra and van der Merwe (2002).
81 See, e.g., Jenner and Talstra (2002), Dyk (2002) and van Keulen (2002).


help on this project, notably with a redesign of the word-level encodings and algorithms.82 They are currently working on a book on the subject.

Prof. Dr. Ferenç Postma has been an anchor man on the textual analysis and text encoding through most of the existence of the Werkgroep. Like an old-fashioned scholar, he has laboriously pored over the texts and their encoding, smoking his cigars when no-one else was around in the office. For a number of years he has held a chair in Church History at the University of Budapest, in his native Hungary.

Dr. Janet W. Dyk has done a lot of the analysis of the texts in the database. Her special area of interest has been the participle in Hebrew83 and Syriac, as well as verb valency patterns. With her solid background in linguistics, she has provided much linguistic insight for the Werkgroep’s work.

Thus the Werkgroep has done much research in its 27 years of existence, having collaborated with numerous scholars from around the world. It owes much of its existence to one man, namely Eep Talstra. He has successfully fought for the existence and funding of the Werkgroep since its inception in 1977, and he has provided guidance, supervision, and direction to most of the projects of the Werkgroep. We are many who are grateful for his gracious attitude, his wisdom, and his kindness.

B.3.3 Distributional and functional data

B.3.3.1 Introduction

One distinction which is very important in understanding the WIVU database is the distinction between “distributional” and “functional” data. In this section, I circumscribe these notions by means of quotes from the writings of the Werkgroep.

The circumscription will be gradual, starting with an initial definition (B.3.3.2), moving on to a description of the purpose of the distinction and the order of creation of the units (B.3.3.3), and finally looking at further definitions (B.3.3.4).

B.3.3.2 Initial definitions

In describing the two kinds of data, Talstra (2002) says (p. 10):

“Distributional: defines a linguistic unit in terms of its composition from lower-level components; no gapping is allowed. These units are called ’atoms’.

“Functional: defines a linguistic unit in terms of its function in a higher-level linguistic unit; gapping is allowed. These units are called: phrase, clause, sentence, paragraph.”

82 See, e.g., Bosman and Sikkel (2002a), Bosman and Sikkel (2002b).
83 See, e.g., Dyk (1994), Dyk and Talstra (1988).


Thus the distributional units are called “phrase atom,” “clause atom,” and “sentence atom,” whereas the functional units are called “phrase,” “clause,” and “sentence.”84

B.3.3.3 Purpose and order of creation

The difference between distributional and functional units is further described in Talstra and Sikkel (2000, pp. 51–52). Here the purpose of the distinction is also described, along with the order in which the units are created. I quote:

“The ‘atoms’ as elementary categories are helpful for two purposes, even if some of them do not represent complete linguistic entities. First, they fit into the ideal notion of a strict linguistic hierarchy: atoms fit exactly into and next to one another; they form units which do not overlap and which segment the text linearly and without gaps. In this way, the textual hierarchy can be represented for a computer program or a database. Second, the atoms serve as a kind of meta-language: they make it possible to define the functional linguistic units in a text as combinations of atoms, and thus also to capture units containing gaps; thus, for example, a sentence no. 10 can consist of two sentence atoms (nos. 12 and 14), which are separated from each other by an embedded element (sentence atom no. 13).

“In a second pass, the atoms are combined into functional linguistic units. These represent the practical use of linguistic levels in a concrete text: units of different levels can be connected with one another or embedded within one another. Different levels of the hierarchy can thus combine into a unit of higher order.”

Thus “atoms” are the linear building blocks out of which the (possibly discontiguous) functional units are built. The atoms are useful for two reasons: First, they segment the text into contiguous, non-overlapping, strictly adjacent units which correspond to the ideal notion of a strict linguistic hierarchy. And second, they form a kind of meta-language out of which the functional, “real” linguistic units can be built. The atoms are built first; and out of the atoms, the functional units are built.

Also, distributional units are linear, containing no gaps. Functional units, on the other hand, may contain gaps. This arises, for example, when a functional unit is interrupted by an embedded element.
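
The atom/functional-unit distinction can be modeled compactly. The sketch below is my own construction, not the WIVU data model (the class names and fields are invented for illustration): atoms are contiguous word spans, and a functional unit is a set of atoms which may therefore contain gaps.

```python
# Sketch (my own modeling, not the WIVU implementation): distributional
# atoms tile the text as contiguous, non-overlapping word spans; a
# functional unit is a set of atoms, and so may contain gaps.

class Atom:
    def __init__(self, first_word, last_word):
        self.first_word = first_word  # inclusive word positions
        self.last_word = last_word

class FunctionalUnit:
    def __init__(self, atoms):
        self.atoms = sorted(atoms, key=lambda a: a.first_word)

    def has_gap(self):
        """True if consecutive atoms of this unit are not adjacent,
        i.e., some embedded element interrupts the unit."""
        return any(
            nxt.first_word != cur.last_word + 1
            for cur, nxt in zip(self.atoms, self.atoms[1:])
        )

# The example from the quotation: sentence 10 consists of sentence
# atoms 12 and 14, interrupted by embedded sentence atom 13.
atom12 = Atom(1, 4)
atom13 = Atom(5, 7)   # embedded element, belonging to another sentence
atom14 = Atom(8, 10)

sentence10 = FunctionalUnit([atom12, atom14])
print(sentence10.has_gap())  # prints True
```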

This summarizes the purpose of the distinction, and the order of creation.

B.3.3.4 Further definitions

Another way of looking at it is to say that the atoms are the smallest units of a given level which cannot be broken down further into units of the same level. For example, Bosman (1995, pp. 76–77) says, concerning clause-level:

84 In the version of the database to which I have access, there are as yet no paragraphs and no paragraph atoms in the database, as Talstra and Sikkel (2000) point out (by not mentioning them).


“A clause atom is a ’simple’ structure that can not itself be divided into constituent clauses. Structures like this are called clause atoms, as they are the building blocks of ’real’ grammatical clauses.”

Likewise, Talstra (1998, p. 10) says, concerning phrase-level:

“The combination ’Def.art. + Noun’ (H-FDH),85 for instance, is used frequently as a pattern of its own in the texts. It can also be used in combination with other phrase-atoms to build one phrase (HMLK DWD),86 but it can not be subdivided further into smaller parts without being divided into elements of a lower linguistic level, i.e. lexemes. The largest possible units built from them in a certain context by applying the paradigms of linguistic theory are labeled with the more traditional linguistic terminology of the functional type: phrase, clause, or sentence.”

Thus in general, distributional atoms of a given unit cannot be subdivided further without becoming units of a lower linguistic level.87 Functional units, on the other hand, are the largest possible units at the given level.

B.3.4 Methods used in analysis

B.3.4.1 Introduction

In this subsection, I describe the methods used by the Werkgroep Informatica in building the database. First, I note that the general strategy has always been bottom-up (B.3.4.2). However, the method is also partially top-down, as I point out next (B.3.4.3). I then give an overview of the procedure, noting that the main methodological process employed is pattern-matching (B.3.4.4). I then give a detailed description of each of the steps in the procedure for building a full analysis of a piece of text (B.3.4.5). Finally, I briefly describe how the Werkgroep deal with ambiguity in the text (B.3.4.6).

B.3.4.2 Bottom-up strategy

From the beginning of the existence of the Werkgroep Informatica, an overarching methodological decision has been at the core of the Werkgroep’s methods, namely that of choosing a bottom-up strategy for producing the database. Thus Talstra and Postma (1989) write (p. 11):

“[I]n building up a database of biblical texts one should work according to a well-defined hierarchical model, starting from analysis at a low level, i.e., the morpheme, and then adding at each higher level the linguistic information valid for that particular level.”

85 Here the original has Hebrew characters, which I transliterate. “H-” is the definite article, and “FDH” means “open country, open field”, according to Holladay (1988, p. 349).

86 Again the original has Hebrew characters which I transliterate. “HMLK” means “the king” and “DWD” means “David,” so the phrase means “David the king.”

87 However, this is not quite true for phrase-level, as we shall see (p. 179).


Thus the Werkgroep Informatica has chosen a bottom-up strategy according to a “well-defined hierarchical model.” This design choice has shaped all of the Werkgroep’s research methods from the very beginning.88

B.3.4.3 Top-down approach

This general principle of working bottom-up is embodied in the mainline procedure for producing the data (see Section B.3.4.4). However, there is also a secondary procedure, which is top-down: After each of the distributional levels has been produced, this secondary procedure takes a step downwards, producing functional labels for the units below the level which has just been produced.

Thus, after phrase-atom level has been produced, functional labels such as “genitive” and “attributive” are applied to the word-level, and after clause-atom level has been produced, the functional phrases are produced and adorned with clause-constituent labels such as “Subject”, “Object”, “Predicate”, etc.

Thus, while the mainline analysis is bottom-up, the method takes a step downwards after each new distributional level has been reached, producing the functional units of the level just below the current distributional level.

B.3.4.4 Overview of the analysis procedures

The basic method of analysis used in the production of the WIVU database is that of pattern-matching.89 A list of previously encountered patterns is maintained for each level. A pattern for level X is generally expressed in terms of the units at level X-1. For example, clause-atom

88 The reasons of the WI for choosing a bottom-up strategy rather than a top-down strategy have not been articulated very well in the literature. Hence, I must here merely adduce my own speculations on the subject.

There are generally two methodological approaches to the analysis of language: top-down and bottom-up. Top-down starts with a higher level, such as semantics and pragmatics, and works one’s way down the layers through syntax and morphology. The bottom-up strategy starts with a lower level, such as the morphological or lexical level, and works one’s way upwards through syntax and semantics.

The difficulty with choosing the top-down approach in the context of computer-aided analysis is that, so far, machines are only capable of manipulating symbols, not meaning directly. This imposes limits on the kinds of semantic or pragmatic analysis which one can perform in the computer. In particular, it is difficult to go from raw text to a semantic or pragmatic analysis in one step; a large amount of human intervention would be required with today’s tools. Thus top-down analysis is so far reserved mostly for human agents.

Tools exist which can analyze the semantics of raw text, but these tools are almost invariably bottom-up, thus performing the intermediate steps of morphological, lexical, and syntactic analysis before applying semantic analysis to the output of the lower levels of analysis. As an example, see Velardi et al. (1988), which reports on the DANTE system.

The bottom-up strategy, on the other hand, is tractable given enough resources. This is because the process always has a previous layer upon which to build a new layer. Of course, the process must be bootstrapped at some level. In the case of the WI, they have bootstrapped the process by supplying a disambiguated, morphologically and lexically analyzed text by hand.

89 See Bosman (1995, p. 77).


patterns are expressed in terms of phrase-atoms.90

Rules formulated a priori are rejected as a research method. Instead, rules are seen as the end result rather than the starting point of the analysis procedure.91 From the patterns gathered during analysis, one should in principle be able to abstract general rules of the language.92

In general, the longest available pattern is used.93 At phrase-level, this design choice is grounded in the definition of a phrase-atom (see section B.3.6.2 below).

At clause-level, this design choice makes sense due to a simple argument. I argue as follows: The matching process has to try the patterns in some order. It is most efficient if the list is sorted, since this allows faster lookup of promising patterns. The list has to be sorted on some key. The obvious key to choose is the units of level X-1, since they are the building blocks out of which the list is made. This leaves us with a design choice of sorting longest-first or shortest-first for patterns that are identical in their first n terms. If we choose shortest-first, and traverse the list in the sorted order, then we will never get to the longer patterns, since the shorter patterns are matched first.94 Therefore, the longer patterns must come before the shorter patterns, or the longer patterns will never be used. On the other hand, if the longer patterns do not match, the shorter patterns can be tried in succession until one is found that matches.
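
The longest-first argument can be made concrete with a minimal sketch. This is my own reconstruction, not the Werkgroep’s code, and the unit labels are invented: a matcher that applies the first pattern that matches, as described above, can only ever find the shorter of two prefix-sharing patterns unless the longer one is listed first.

```python
# Sketch of the longest-first ordering argument (my reconstruction, not
# the actual analysis programs). Patterns are sequences of level X-1
# unit labels; the matcher applies the first pattern that is a prefix
# of the input, with no backtracking.

def first_match(patterns, units):
    """Return the first pattern that is a prefix of `units`, or None."""
    for pattern in patterns:
        if units[:len(pattern)] == pattern:
            return pattern
    return None

units = ["Conj", "NP", "VP", "PP"]

# Two patterns identical in their first two terms:
patterns_shortest_first = [["Conj", "NP"], ["Conj", "NP", "VP", "PP"]]
patterns_longest_first = list(reversed(patterns_shortest_first))

# Shortest-first: the longer pattern is never reached.
print(first_match(patterns_shortest_first, units))  # ['Conj', 'NP']

# Longest-first: the longer pattern wins, and the shorter one remains
# available as a fallback for inputs the longer one does not match.
print(first_match(patterns_longest_first, units))   # ['Conj', 'NP', 'VP', 'PP']
```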

B.3.4.5 Description of procedure

An overview of the parsing procedure can be found in Figure B.1 on the next page. I now describe this figure in words.95

First, syn01 and syn02 take the coded text and produce lexemes with parts of speech, person, number, gender, tense, and other morphological features.

Then the syn03 program takes the lexical level just produced, and produces phrase atoms. This is done by means of pattern-matching, where each pattern consists of a string of parts of speech with assorted morphological features (i.e., it is expressed in terms of lexemes).

Then the parsephrases program takes the phrase atoms and produces functional labels for the word-level, called “subphrases.” See Section B.3.5.3 on page 178 below for a description of subphrases. The functional labels include “genitive”,96 “attributive,” and “parallel”, among others.

Then the syn04 program takes the phrase atoms and produces clause atoms. This is done by means of pattern matching, where each pattern consists of a string of phrase-atom types97 along

90 Plus additional morphological and lexical information, but the basic organizing principle of the clause-atom patterns is a string of phrase-atoms. See Bosman (1995, pp. 77–78).

91 See Bosman (1995, p. 77), Talstra (1998), Talstra and Sikkel (2000), Talstra (2002).
92 This has been done, e.g., in Talstra and van der Merwe (2002, pp. 55–56), Talstra and Dyk (1999), Talstra (1997, pp. 95–96, 102–103).
93 See Bosman (1995, p. 77), Talstra (1998, p. 41).
94 The Werkgroep Informatica have chosen the naïve strategy of applying the first pattern that matches, rather than using backtracking techniques.
95 Most of the description comes from my own familiarity with the programs. For a diagram which shows the procedure, see Talstra (1998, p. 17). See also Petersen (1997a).
96 I.e., construct relation; see above, section B.2.5.2 on page 164 for a description.
97 I.e., NP, VP, AdjP, PP, etc.


Figure B.1: Overview of Werkgroep Informatica analysis-procedure.

The ellipses represent data, whereas the rectangles represent programs. Solid-border ellipses represent distributional data. Dashed-line-border ellipses show functional data, and dash-dot-border ellipses show distributional and functional data. The input to one stage will generally be the output from the previous stage.


with assorted lexical and morphological features.

The parseclauses program then takes the phrase atoms within the boundaries of the clause atoms, producing functional phrases out of the phrase atoms, and giving the functional phrases functional labels such as “Subject,” “Object,” “Predicate,” etc. This is done by means of pattern-matching, where each pattern is a list of clause atom-based patterns of phrase-atoms in various positions and with various verb valency configurations.98

Then the syn04types program takes all of the levels below, producing functional clauses, sentence atoms, sentences, clause-relations, discourse type of clauses, and a clause-hierarchy. This is done by means of pattern matching, statistical methods, and using a similarity measure between clauses.

Thus the end result is a text which is fully analyzed from morpheme-level to text-level. This is done in a bottom-up order with occasional top-down steps. The general method is pattern-matching, where the patterns of level X are expressed in terms of units of level X-1. The process is iterative, going over the same text a number of times until the full analysis has been built.

B.3.4.6 Ambiguity

Most real texts will have some degree of ambiguity in their interpretation. This is true at almost all linguistic levels, not just at the semantic and pragmatic levels. Hence, in any endeavor to create a syntactic database, one must choose a strategy for dealing with ambiguity.

The Werkgroep have taken the approach of completely disambiguating each linguistic choice of analysis in the database. That is, the database model employed does not have room for encoding more than one analysis of any given text at any given level, of which Sikkel (2003) is one long affirmation.

The question then inevitably arises: How does the analyst choose between several ambiguous choices of analysis? The answer is not clear, and is not sufficiently described in the literature of the Werkgroep for me to be able to venture any descriptions.

B.3.5 Word level

B.3.5.1 Introduction

In this subsection, I treat two aspects of Word-level in the WIVU database: verbs (B.3.5.2) and subphrases (B.3.5.3).

B.3.5.2 Verbs

The verbal tenses listed in Table B.4 on page 161 are the ones found in van der Merwe et al. (1999). The WIVU database has a slightly different division. First, the Cohortative and Jussive forms are really special forms of the imperfect (yiqtol), and so are listed as such in the WIVU database.99 Second, the WIVU database distinguishes wav-consecutive imperfect (wayyiqtol)

98 Actually, two lists of patterns are used: one for verbal clauses and one for nominal/verbless clauses.
99 See above, section B.2.4.7 on page 163 and van der Merwe et al. (1999, pp. 75–76).


Tense
Perfect (Qatal)
Imperfect (Yiqtol)
Imperative (Qetol)
Infinitive Construct
Infinitive Absolute
Participle
Passive participle
Wav-consecutive imperfect (Wayyiqtol)
Wav-copulative imperfect (Weyiqtol)

Table B.8: Verb-tenses in the WIVU database.

Functional label
Adjunct
Attributive
Demonstrative
Modifier
Parallel
Genitive

Ordering    Meaning
mother      first subphrase in pair
daughter    second subphrase in pair

Table B.9: Subphrases, or functional categories at word-level.

from wav-copulative imperfect (weyiqtol). Third, the WIVU database distinguishes active participle from passive participle at the level of verb-tense values.

Thus the full list of verb-tenses as they are found in the WIVU database can be summarized as in Table B.8.

B.3.5.3 Subphrases

Once the phrase-atoms have been produced, the analysis-procedure takes a step downwards, producing functional labels and categories at the word-level. This is done by identifying relationships between words or word-groups inside the phrase-atom.

The functional labels applied are listed in Table B.9. Each label is applied to pairs of subphrases, where the first is called the “mother”, and the second is called the “daughter”. For example, in the genitive relationship, the word in the construct state is the mother, and the word in the absolute state is the daughter.100

For any of the labels, e.g., Parallel, the subphrase may extend over more than one word. For example, in the Hebrew phrase

100 For a description of construct and absolute state, see above, section B.2.5.2 on page 164.


>T H-CMJM W->T H->RY101

there are two parallel units, coordinated by the conjunction “W”, meaning “and”:

1. >T H-CMJM (“Et ha shamayim”, “the heavens”)

2. >T H->RY (“Et ha arets”, “the Earth”)

This has been analyzed in the Werkgroep Informatica database in such a way that these two units are subphrases, labeled with the “Parallel” relation, where the first is the “mother”, and the second is the “daughter”.102

Thus the subphrases are strings of words which are below the phrase-atom level, one or more words long. The subphrases always come in pairs: a mother first, and then a daughter. Each pair of subphrases is labeled with a functional label from the set given in Table B.9 on the preceding page. This produces functional categories for word-level.103
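
As a minimal illustration of this pairing, the Genesis 1:1 example above could be represented as follows. This is my own modeling, not the WIVU encoding; the field names are invented, and the tokenization of the transliterated words is for illustration only.

```python
# Sketch (my own modeling, not the WIVU encoding): each subphrase pair
# carries one functional label from Table B.9; the first member is the
# "mother", the second the "daughter". Both may span several words.

from collections import namedtuple

SubphrasePair = namedtuple("SubphrasePair", ["label", "mother", "daughter"])

# ">T H-CMJM W->T H->RY" ("the heavens and the earth"), analyzed as two
# parallel subphrases coordinated by "W" ("and"):
pair = SubphrasePair(
    label="Parallel",
    mother=[">T", "H-CMJM"],   # "the heavens"
    daughter=[">T", "H->RY"],  # "the earth"
)

print(pair.label)  # prints Parallel
```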

B.3.6 Phrase level

B.3.6.1 Introduction

In this subsection, I first circumscribe the definition of phrases and phrase atoms in the WIVU database. This is done by means of quotations from the writings of the Werkgroep (B.3.6.2). Second, I describe the functional labels applied to the functional phrases by the parseclauses program (B.3.6.3).

B.3.6.2 Definition of phrases and phrase atoms

In general, atoms of a given level cannot be broken down into constituent parts which are themselves units of the same level.104 This view, however, has to be slightly amended for phrase-level.

101 “Et ha shamayim we et ha arets”, meaning “the heavens and the Earth”. “>T” is the object marker, since the phrase appears in the larger clause “In the beginning, God created the heavens and the Earth” (Gen 1:1). This phrase is the object of “created”, and so has the object marker.

102 Back in 1997, when I was employed at the Vrije Universiteit as a programmer in the Werkgroep Informatica, I developed and implemented the algorithm currently in use for finding the beginning boundaries of the subphrases. Up until 1997, only the end of the subphrases had been demarcated.

103 One might argue that since the subphrases may consist of more than one word, they do not really constitute functional categories at word-level, but rather at phrase-level. This is true to some extent. The words do have functional labels already, in that they are labeled with grammatical categories such as “person”, “number,” “gender,” and “tense.”

The reason I have chosen to call the subphrases a word-level phenomenon is that “phrase-level”, as envisaged by the Werkgroep, consists of the largest possible phrases (functional phrases) and the phrases just below that (phrase atoms), as we shall see in the next section. Thus the term “phrase” has a slightly different meaning in Werkgroep Informatica-terminology than in general linguistics.

Since many of the subphrase-relations do obtain between words rather than groups of words (e.g., genitive, attributive, modifier), it seems best for clarity’s sake to call the subphrases a word-level phenomenon. This is obviously also in line with the overall view of the procedure that after a distributional level X has been analyzed, the functional categories of level X-1 are produced.

104 See above, section B.3.3.4 on page 172.



There, phrase-atoms may in fact be larger than the smallest units at phrase-level. The functional units (“phrases”) are declared to be the largest units which fill a function at clause-level, e.g., subject or predicate. But if such a phrase consists recursively of more than one phrase-level unit, which in turn consist recursively of more than one phrase-level unit, then the phrase-atoms from which the functional phrase is built are declared to be the largest possible units at the level just below the functional phrase level. Talstra (1998, p. 23) writes:

“. . . it seems best to use the label phrase for the largest combinations possible, . . . . In this way the labeling is consistent with the observations made . . . : phrases equal the functional categories of the next level, i.e. clause constituents and clause modifiers. The smaller parts, the ones that can be found as independent phrases elsewhere, but are not used independently in this particular context, are called: phrase-atoms.

“This implies that neither products of lexeme combinations, i.e. phrases and phrase atoms can be seen as fixed paradigmatic entities. For their identification they depend fully on the combinations made in a particular context. The phrase atoms are the ’building blocks’, the smaller phrase-level units that could be used as phrases themselves elsewhere.

“There is, however, one further complication. Clearly, phrase atoms can have more than one level of embedding. They can be combined into larger phrase atoms, that can be combined again into larger units, either phrases or phrase atoms.”

Talstra goes on to say (p. 24):

“In any particular context only the largest units at phrase level should be called phrase. The composing parts and their combinations are the phrase atoms. This implies, for instance, that the unit >BTJK105 + prep106 is a phrase in Deut 10,15107 . . . , whereas in Deut 9,5108 . . . it is a phrase atom.”

Talstra also says (p. 25):

“This means that the parsing process produces phrase atoms that either represent the highest level phrases in a particular context, or represent the phrase atoms at the next lower level.”

Thus phrase atoms can be broken down into constituent parts which are themselves phrase-level entities, if the phrasal construction is recursive to at least three levels of phrase-level constituents. The uppermost level will be called the functional phrase, whereas the elements just below the

105 Here the original text had Hebrew characters which I have transliterated. “>BTJK” means “your fathers”.

106 “prep” is Talstra’s abbreviation for “preposition.”

107 “Deut” is an abbreviation for the book of Deuteronomy, the fifth book of Moses in the Bible. In Deut 10,15, the construction is “B->BTJK” and means “on your fathers,” in the wider construction “the Lord set his affection on your fathers.” (NIV)

108 In Deut 9,5, the Hebrew functional phrase reads “L->BTJK (L->BRHM L-JYXQ W-L-J<QB)” (to your fathers, to Abraham, to Isaac, and to Jacob).



uppermost level will be called phrase atoms. This is one reason why phrase atoms may be broken down into elements which are themselves phrasal in nature.

In addition, prepositional phrases are always treated with the preposition + the NP as one single PP at phrase-atom level. The NP is never split off from the preposition at phrase-atom level.109

This situation is ameliorated to some extent by the existence of subphrases, since they sometimes capture the essence of phrases that are below the phrase atom level.110 The particular problem of Prep + NP is not dealt with in this way, however, since subphrases are for housing functional labels, and the Prep + NP is strictly a distributional phenomenon.111

B.3.6.3 Clause constituent labels

In the parseclauses program, where the functional phrases are produced,112 the functional phrases are also given functional labels. These labels show the function of the phrase in its clause atom, and are shown in Table B.10 on the following page. They include such familiar labels as “Subject”, “Object”, “Predicate”, “Complement,” and “Adjunct”, most of which can be said to be

109 I have learned this particular piece of information inductively by studying the so-called phrase set, which contains the list of phrase-atom patterns which the Werkgroep Informatica has encountered in their years of research. This list has a long section containing prepositional phrase-atoms, or PPs. Also see Talstra (1998, pp. 28–29), where NPs and PPs are described.

110 See above, section B.3.5.3 on page 178.

111 Some linguists would argue that the NP is the “object” of the preposition, and that a functional relationship therefore obtains between the two parts of the PP. Within the framework of the WIVU database, however, there is no room for this special relationship at the functional level. I do not know exactly why this is the case. I could imagine that the following reasons might have been contemplated:

1. The Preposition + NP “head-object” relationship does not have the same symmetry as do most of the functional categories in Table B.9 on page 178.

2. Occam’s razor may have been applied to the functional labels: The Preposition + NP “head-object” relationship does not obtain between any of the other units in which there is perceived interest, and so may seem redundant to some extent.

3. The relationship may simply not be interesting enough. In cases of single prepositional phrases, the relationship can be calculated without much trouble, since the first word will always be the preposition, and the rest of the phrase atom will always be the NP. In cases of more than one prepositional phrase being coordinated by a conjunction in the same phrase atom, the relationship can be calculated from the existing parallel subphrases, since they demarcate each prepositional phrase within the phrase atom. Thus the same algorithm as with the single prepositional phrases can be applied, this time to the subphrases: The first word is the preposition, and all the remaining words of the subphrase are the NP.

It is because of the algorithmically obtainable nature of the relationship, as laid out in point no. 3, that I said above that the Prep + NP relationship is strictly distributional in nature.
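The word-split just described can be illustrated with a small, self-contained shell sketch. This is my own illustration, not WI code; the phrase below reuses the transliterated example >BTJK from the quotations above, with the words separated by whitespace for the sake of the demonstration.

```shell
# Toy illustration of the Prep + NP split: in a single prepositional
# phrase atom, the first word is the preposition and the remaining
# words form the NP.
phrase="B >BTJK"      # "on your fathers", transliterated, words space-separated
prep=${phrase%% *}    # strip everything from the first space onwards
np=${phrase#* }       # strip everything up to and including the first space
echo "prep=$prep np=$np"
```

The same split, applied per parallel subphrase instead of per phrase atom, covers the coordinated case described in point no. 3.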

112 See Figure B.3.4.4 on page 174.



Label  Meaning
Adju   Adjunct
Cmpl   Complement
Conj   Conjunction
ExsS   Existence with SS
Exst   Existence
Frnt   Fronted element
Intj   Interjection
IntS   Interjection with SS
IrpC   IP as Cmpl
IrpO   IP as Objc
IrpP   IP as Pred
IrpS   IP as Subj
Loca   Locative
Modi   Modifier
ModS   Modifier with SS
Nega   Negation
NegS   Negation with SS
ObjC   Object
PreC   Predicate complement
Pred   Predicate
PreO   Predicate with object suffix
PreS   Predicate with SS
PtcO   Participle with object
PtSp   Participle with specification
Ques   Question
Rela   Relative
Subj   Subject
Supp   Supplementary constituent
Time   Time reference
Unkn   Unknown
Voct   Vocative

Table B.10: Clause constituent labels

Note: SS = Subject Suffix, IP = Interrogative Pronoun

grammatical relations.113 However, the rest of the roles are a mixed bag of lexical,114 syntactic,115 thematic,116 and pragmatic117 roles.

113 See Van Valin (2001, pp. 22–23).

114 I deem the following roles to be lexical in nature: e.g., Conj, Intj, IntS, Modi, ModS, Nega, NegS, Ques, Rela. These all owe their existence to specific parts of speech, viz. Conjunction, Interjection, Interjection, Adverb, Adverb, Negative, Negative, Interrogative, and Relative pronoun respectively. In addition, Ques, ExsS, Exst can be seen as being lexical in nature, since they all come from very closed classes of lexemes.

115 I deem the following roles to be syntactic in nature: e.g., Frnt, Supp, PreO, PreS, PtcO, PtSp. The Fronted element (Frnt) and Supplementary constituent (Supp) roles are syntactic because they owe their existence to certain locations in the clause. PreS and PreO roles are syntactic because they are kinds of predicates, which are generally seen as belonging to the “grammatical relation” category. PtcO and PtSp roles are syntactic because they are analogous to the predicate roles, except that the verb is not finite but a participle.

116 The Time (reference) and Loca(tive) roles are most often seen as being thematic. See e.g. Givón (1984, pp. 131–132), Van Valin (2001, pp. 94–95), Van Valin and LaPolla (1997, pp. 126–128).

117 I deem the Voct role to be pragmatic in nature. This is because in Hebrew, the vocative is not expressed by morphological or syntactic means, or by specific lexemes, but mainly by using proper names and personal pronouns. However, proper names and personal pronouns are used for many other roles. As such, the identification of these as being vocative has to depend on pragmatic considerations.



B.3.7 Clause level

B.3.7.1 Introduction

In this subsection, I describe clause-level in the Werkgroep Informatica database. I start out by describing the differences and interplay between clause atoms and functional clauses (B.3.7.2). Finally, I describe the hierarchy of clauses that is produced with the syn04types program (B.3.7.3).

B.3.7.2 Clause atoms and clauses

As I described in Section B.3.3 on page 171, the WIVU database is divided up into distributional and functional data. The distributional units at clause level are called clause atoms, whereas the functional units at clause level are called clauses. Bosman (1995, p. 77) mentions two requirements that a clause atom will meet:

1. A clause atom will have no more than one predicate. Clause atoms may also be without a predicate.

2. A clause atom is consecutive, i.e., it will not have gaps. When a (functional) clause is ’broken up’ by an embedded (functional) clause, the (surrounding) clause will consist of two clause atoms, separated by the embedded clause. Each clause atom will tend to be ’defective’, since the two will tend to require each other for forming a syntactically meaningful unit.

Clause atoms, then, form the basis for constructing grammatical (or functional) clauses. Oftentimes, a clause atom will coincide with a grammatical clause, but as indicated, this is not always the case.

B.3.7.3 Clause hierarchy

Clause atoms are organized in a hierarchy with relations between clause atoms. The description that follows has been adapted from Bosman (1995, pp. 78–79).

A clause atom is either a daughter, a mother, or both. Each daughter clause atom is connected to either zero or one other clause atom, called its mother. In other words, a daughter clause atom can have at most one mother. A mother clause atom, on the other hand, can have one or more daughters.

In each analyzed piece of text (usually a chapter), there is exactly one daughter clause atom which does not have a mother. Such a daughter clause atom is called a root.
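These constraints can be checked mechanically. The following sketch (the atom and mother numbers are invented toy data; this is not WI code or data) counts the atoms without a mother in a small mother-listing; under the constraints just stated, the count should come out as exactly one root per analyzed text.

```shell
# One line per clause atom: "atom_id mother_id", where mother_id 0
# means the atom has no mother.  Exactly one such atom (the root)
# should exist per analyzed piece of text.
roots=$(printf '1 0\n2 1\n3 1\n4 2\n' |
  awk '$2 == 0 { n++ } END { print n }')
echo "roots=$roots"
```

Running the same count on a real chapter would be one way to sanity-check an analysis against the root constraint.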

The connection between a daughter and its mother is usually upwards, i.e., the mother precedes the daughter. However, the reverse can also be true: The daughter can precede the mother. In either case, the daughter can stand in one of the following relations to the mother:

• The daughter depends on the mother syntactically (i.e., daughter is subordinate to mother, and forms a constituent of it), or



Code     Meaning
10       Relative clause atom with Hebrew article
12       Relative clause atom in qatal
17       Nominal clause atom with >CR (asher), the relative pronoun
64       Infinitive construct clause atom with the preposition L
100-168  Asyndetic (without a conjunction) clause atom
200-201  Parallel clause atom
222      Defective clause atom with verbal predicate in daughter
223      Defective clause atom with verbal predicate in mother
300-378  Dependent clause atom with a W (“and”) conjunction
480-488  Dependent clause atom in the weyiqtol “tense”
527      Dependent clause atom with the conjunction KJ or KJ >M
999      First clause atom of direct speech

Table B.11: Groups of clause atom relation codes in Genesis 1

• The daughter follows the mother as a syntactic continuation (i.e., not as a subordinate constituent, but nevertheless connected to the mother, e.g., through pronominal references, conjunctions, or other criteria), or

• The daughter is syntactically parallel to the mother (a special case of continuation).

When a daughter is connected backwards to a preceding mother which is not adjacent to the daughter, the daughter is said to be connected to the whole block of text in between the mother and the daughter. The link in the database only points to the mother, but the intended interpretation is to connect the daughter to all of the preceding text up to and including the mother.

The relations between clause atoms are encoded as integers with up to three base-10 digits.118

The integers encode various data items, such as conjunction class, verbal tense of mother clause atom, verbal tense of daughter clause atom, preposition class, and other information. The integers are grouped into three sets, each of which maps to one of the relations mentioned above. The relations are useful both while producing the database, and as a research tool afterwards.
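As a rough illustration of how such integer codes can be mapped back to coarse groups, consider the following sketch. It is my own hypothetical lookup, covering only some of the code ranges of Table B.11 on the preceding page; it is not part of the WI toolchain, and codes outside the listed ranges simply fall through to "other".

```shell
# Map a clause atom relation code to a coarse group, following some
# of the ranges listed in Table B.11.
classify() {
  c=$1
  if   [ "$c" -ge 100 ] && [ "$c" -le 168 ]; then echo "asyndetic"
  elif [ "$c" -ge 200 ] && [ "$c" -le 201 ]; then echo "parallel"
  elif [ "$c" -ge 300 ] && [ "$c" -le 378 ]; then echo "dependent-W"
  elif [ "$c" -ge 480 ] && [ "$c" -le 488 ]; then echo "weyiqtol"
  elif [ "$c" -eq 999 ]; then echo "direct-speech"
  else echo "other"
  fi
}
classify 302   # one of the codes attested in Genesis 1:1-3
classify 999
```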

Some of the codes present in my chosen text carry information which could be made useful for my method. The groups present are described in Table B.11.119 I have chosen not to treat these codes in my actual method, in order to limit the scope of my method. But this is an area for further research.

B.3.8 Sentence level

The WIVU database to which I have access does contain sentence atoms and sentences. However, I have been unable to find any published source describing how the sentence boundaries

118 See Talstra and Van Wieringen (1992, pp. 14–16), and Sikkel (2001, pp. 8–10).

119 The complete list of codes present is described in section C.13 on page 193.



are determined. I know from my experience with the WI programs that they are calculated automatically based on the clause hierarchy. Thus the user has no direct influence on exactly how the sentences are demarcated. From personal communication with Professor Talstra, I know that a sentence in the WIVU database is by definition a main clause plus its dependents, if any.

I have chosen to disregard the sentences in my work. This is because: A) they add a further complicating factor to an already difficult problem, and B) while they are linguistically motivated, they are not subject to user correction, and thus may be misleading.

B.3.9 Conclusion

The data arising from the linguistic levels above Word-level in the WIVU database can be divided into “distributional” and “functional” data. The distributional units are called “atoms”, thus giving rise to “phrase atoms,” “clause atoms,” and “sentence atoms.” Their functional counterparts are “phrases,” “clauses,” and “sentences” respectively.

The atoms are contiguous and non-overlapping, and are the building blocks out of which the functional units are built. This allows for discontiguous functional units, so that embedding is possible. The atoms are built first, then the functional units are built out of the atoms.

Atoms generally cannot be broken down into further units at the same level, whereas functional units are generally the largest possible units of a given level.

The overall methodological strategy used by the Werkgroep Informatica in producing the WIVU database is bottom-up analysis with occasional top-down analysis. The main, ascendent procedure produces distributional units, whereas the secondary, descendent procedure produces functional units. When a new distributional level X has been reached, the functional units of level X-1 are produced. This is done mostly by means of pattern-matching with a longest-pattern-first, applying-the-first-match strategy. The procedure is depicted in Figure B.1 on page 176. The database is completely disambiguated, but the methods for choosing one analysis over another in cases of ambiguity remain unknown.
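The longest-pattern-first, first-match strategy can be sketched in a few lines of shell. The "patterns" below are invented part-of-speech strings chosen only for illustration; they are not actual WI patterns.

```shell
# Candidate patterns are tried in order of decreasing length; the
# first one that matches the input wins.
input="art noun"
matched=""
for pattern in "art noun adj" "art noun" "noun"; do
  case "$input" in
    "$pattern") matched=$pattern; break ;;
  esac
done
echo "matched: $matched"
```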

The verbal system is described slightly differently in the WIVU database from the way van der Merwe et al. (1999) describe it. The WIVU division of tenses is shown in Table B.8 on page 178.

Subphrases are word-level functional categories, produced after phrase atoms have been produced. Subphrases occur in pairs, each pair having exactly one functional label, drawn from Table B.9 on page 178.

Functional phrases are the largest possible phrasal units at clause atom level, and are labeled with functional labels as in Table B.10 on page 182. Phrase atoms are the largest possible phrasal units just below functional phrase level. This means that if a phrasal structure is recursive to three or more levels of nesting, only the uppermost two are demarcated in the analysis. This situation is ameliorated to some extent by the existence of the subphrases.

Clause-level is divided up into two distinct categories: Distributional clause atoms and functional clauses. The clause atoms are contiguous, and have no more than one predicate. They are the building blocks of the functional clauses. The clause atoms may be defective, e.g., in cases of discontiguous functional clauses which arise due to embedding. Clause atoms are organized into



a clause hierarchy. Within this hierarchy, clause atoms either play the role of daughter, mother, or both. A daughter has at most one mother, and a mother has one or more daughters. The kind of relation that obtains between the daughter and the mother is specified by means of integer codes.

B.4 Conclusion

I have presented aspects of the Hebrew language (B.2) and the WIVU database (B.3).

In the section on the Hebrew language, I have dealt with graphology (B.2.2), word-level (B.2.3), the verbal system (B.2.4), and syntax (B.2.5). This more or less reflects the order in which many grammars treat the language, including my reference grammar, van der Merwe et al. (1999).

For the WIVU database, I have treated the history of the Werkgroep Informatica (B.3.2), their distinction between distributional and functional data (B.3.3), methods used in analysis (B.3.4), word-level (B.3.5), phrase-level (B.3.6), clause-level (B.3.7), and sentence-level (B.3.8). The historical section is provided mainly for humanistic interest. The distinction between distributional and functional data is so fundamental to the database that it needed to be described early. The methods used in analysis, while not strictly necessary for understanding my method, provide useful insights into the research of the Werkgroep Informatica, and are included to expand the general scope and breadth of my thesis. Then comes a treatment of each of the levels of analysis present in the database, ordered in their sequence of production.

Thus I have circumscribed enough of the grammar of Hebrew to be able to understand what goes on in my method. I have done the same for the WIVU database, which is a fundamental building block in my method. However, understanding my method requires only a cursory introduction to these topics, such as that provided in Chapter 3. Therefore, I have relegated the details to this appendix.


Appendix C

Categories in the WIVU database

C.1 Introduction

In this appendix, I list the categories present in my chosen text (Genesis 1), along with the methods I used to obtain these category-lists. There were two sources: The Emdros database containing Genesis 1, and the pX files produced by the WI analysis programs, such as syn04types. In each case, I have listed first the category values present in Genesis 1, then the method used to obtain the list. After that, I may or may not give some commentary on the values.

For each category, there are two tables. The first table is for all of Genesis 1, whereas the second table is for Genesis 1:1-3. The method-section of each category only shows how to get the values for all of Genesis. The method for getting the values for Gen 1:1-3 is usually just to use another input file containing only that text.

C.2 Parts of speech

C.2.1 Values

Genesis 1:

Value  Meaning
0      Article
1      Verb
2      Noun
4      Adverb
5      Preposition
6      Conjunction
10     Interjection
13     Adjective

Genesis 1:1-3:

Value  Meaning
0      Article
1      Verb
2      Noun
5      Preposition
6      Conjunction

C.2.2 Method

I ran the ps3 file through the following shell script:




cut -c 34-36 genesis01.ps3 | sort | uniq
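The fixed-width extraction performed by cut here can be illustrated on fabricated input. The three lines built below are not real ps3 data; only the column positions mimic the script above.

```shell
# Build three 36-character lines whose columns 34-36 carry a value,
# then extract and deduplicate those columns as the script above does.
vals=$(printf '%33s%3d\n%33s%3d\n%33s%3d\n' "" 2 "" 5 "" 2 |
  cut -c 34-36 | sort | uniq | tr -d ' ')
echo "$vals"
```

The result is the distinct values found in those columns, here 2 and 5, one per line.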

C.3 Phrase-dependent parts of speech

C.3.1 Values

Genesis 1:

Value  Meaning
0      Article
1      Verb
2      Noun
4      Adverb
5      Preposition
6      Conjunction
10     Interjection
13     Adjective

Genesis 1:1-3:

Value  Meaning
0      Article
1      Verb
2      Noun
5      Preposition
6      Conjunction

C.3.2 Method

I ran the ps3 file through the following shell script:

cut -c 74-76 genesis01.ps3 | sort | uniq

C.3.3 Commentary

These values are exactly the same set as for the part of speech, both for all of Genesis 1 and for Gen 1:1-3. This does not, however, mean that all phrase dependent parts of speech are the same as the formal part of speech category for all words, at least not for all of Genesis 1. See the following section.

C.4 Changes in part of speech

C.4.1 Values

Genesis 1:

From (formal psp)  To (phr.dep. psp)
Article            Conjunction
Verb               Adjective
Adjective          Noun
Noun               Adverb
Noun               Preposition

Genesis 1:1-3:

From (formal psp)  To (phr.dep. psp)
no change          no change

Page 189: Creation in Graphs Extracting Conceptual Structures from Old …people.hum.aau.dk/~ulrikp/MA/Download/Report-Final.pdf · 2006. 5. 4. · developed by Prof. Dr. Eep Talstra and his

C.5. VERBAL TENSE 189

C.4.2 Method

I ran the ps3 file through the following shell script:

cut -c 34-36,74-76 genesis01.ps3 | sort | uniq

C.4.3 Commentary

The table above lists only those transformations where the part of speech actually changes. As can be seen, in Gen 1:1-3, there are no psp changes.

C.5 Verbal tense

C.5.1 Values

Genesis 1:

Value  Meaning
1      yiqtol
2      qatal
3      imperative
4      infinitive construct
6      participle
11     wayyiqtol
12     weyiqtol

Genesis 1:1-3:

Value  Meaning
1      yiqtol
2      qatal
6      participle
11     wayyiqtol

C.5.2 Method

I ran the ps3 file through the following shell script:

cut -c 55-57 genesis01.ps3 | sort | uniq

C.6 Verbal stem

C.6.1 Values

Genesis 1:

Value  Meaning
0      Qal
1      Piel
2      Hifil
3      Nifal

Genesis 1:1-3:

Value  Meaning
0      Qal
1      Piel



C.6.2 Method

I ran the ps3 file through the following shell script:

cut -c 41-43 genesis01.ps3 | sort | uniq

C.7 Person

C.7.1 Values

Genesis 1:

Value  Meaning
0      unknown
1      first person
2      second person
3      third person

Genesis 1:1-3:

Value  Meaning
0      unknown
3      third person

C.7.2 Method

I ran the ps3 file through the following shell script:

cut -c 58-60 genesis01.ps3 | sort | uniq

C.8 Number

C.8.1 Values

Genesis 1:

Value  Meaning
0      unknown
1      singular
2      dual
3      plural

Genesis 1:1-3:

Value  Meaning
1      singular
3      plural

C.8.2 Method

I ran the ps3 file through the following shell script:

cut -c 61-63 genesis01.ps3 | sort | uniq



C.9 Gender

C.9.1 Values

Genesis 1:

Value  Meaning
0      unknown
1      feminine
2      masculine

Genesis 1:1-3:

Value  Meaning
0      unknown
1      feminine
2      masculine

C.9.2 Method

I ran the ps3 file through the following shell script:

cut -c 64-66 genesis01.ps3 | sort | uniq

C.10 Phrase atom type

C.10.1 Values

Genesis 1:

Value  Meaning
1      Verb phrase (VP)
2      Noun phrase (NP)
4      Adverb phrase (AdvP)
5      Prepositional phrase (PP)
6      Conjunction phrase (CjP)
10     Interjection phrase (IjP)
13     Adjective phrase (AP)

Genesis 1:1-3:

Value  Meaning
1      Verb phrase (VP)
2      Noun phrase (NP)
5      Prepositional phrase (PP)
6      Conjunction phrase (CjP)

C.10.2 Method

I ran the ps3 file through the following shell script:

cut -c 78-80 genesis01.ps3 | sort | uniq

C.10.3 Commentary

This closely mirrors the set of parts of speech given above, except that articles are not represented. This is understandable, since articles always combine with at least one other word to form a phrase, and hence are never phrases on their own.



C.11 Phrase type

C.11.1 Values

Genesis 1:

Value  Meaning
1      Verb phrase (VP)
2      Noun phrase (NP)
4      Adverb phrase (AdvP)
5      Prepositional phrase (PP)
6      Conjunction phrase (CjP)
10     Interjection phrase (IjP)
13     Adjective phrase (AP)

Genesis 1:1-3:

Value  Meaning
1      Verb phrase (VP)
2      Noun phrase (NP)
5      Prepositional phrase (PP)
6      Conjunction phrase (CjP)

C.11.2 Method

I ran the following MQL script:

SELECT ALL OBJECTS
IN {1-673}
WHERE
[Phrase GET phrase_type]
GO

Through the following shell script:

mql -d ~/db/genesis1 phrase_types.mql | grep "phrase_type" |
gawk '{ match($0, "phrase_type=\"([0-9]+)", a); print a[1]; }' |
sort | uniq

C.11.3 Commentary

C.12 Phrase function

C.12.1 Values

Genesis 1:

Value  Meaning                       QDF
501    Predicate                     Pred
502    Subject                       Subj
503    Object                        Objc
504    Complement                    Cmpl
505    Adjunct                       Adju
506    Time reference                Time
507    Locative                      Loca
508    Modifier                      Modi
509    Conjunction                   Conj
512    Interjection                  Intj
519    Relative                      Rela
521    Predicate complement          PreC
531    Predicate with object suffix  PreO
566    Parallel                      Para
567    Conjunction                   Link
582    Specification                 Spec



Genesis 1:1-3:

Value  Meaning         QDF
501    Predicate       Pred
502    Subject         Subj
503    Object          Objc
504    Complement      Cmpl
506    Time reference  Time
509    Conjunction     Conj

C.12.2 Method

I ran the ps4.p file through the following shell script:

cut -c 120-122 genesis01.ps4.p | sort | uniq

C.12.3 Commentary

This feature is split over two object types: functional phrases and phrase atoms. The phrase atoms use 566, 567, and 582 to describe their relationship with another phrase atom or word. The phrases use the rest of the values in the table to show their phrase function.

The “Value” columns show the numerical value present in the ps4.p file. The “QDF” columns show the textual label used in the QDF files and in the Emdros database.

C.13 Clause atom relation

C.13.1 Values

C.13.1.1 Genesis 1

10, 12, 17, 64, 101, 107, 122, 127, 160, 161, 162, 167, 200, 201, 222, 223, 302, 307, 311, 313, 321, 322, 327, 360, 372, 461, 481, 527, and 999.

C.13.1.2 Genesis 1:1-3

200, 302, 322, 360, 372, and 999.

C.13.2 Method

I ran the following MQL script:

SELECT ALL OBJECTS
WHERE
[Verse book = Genesis and
 chapter = 1
 [clause_atom relation <> 0
  GET relation, mother]
]
GO

Through the following shell script:

mql -d ~/tmp/backup.db/genesis-speciale-2004-01-16/genesis1 clause-atom-relations.mql | grep "relation" |
gawk '{ match($0, "relation=\"([0-9]+)", a); print a[1]; }' |
sort | uniq

C.14 Verbs with non-qal stems

C.14.1 Introduction

As noted in Section C.6, only four verbal stems are represented in Genesis 1: qal, piel, hifil, and nifal. The majority of verbs are in the qal stem. Here I list the verbs present in piel, hifil, and nifal.

C.14.2 Verbs

C.14.2.1 Genesis 1

Piel (1):
BRK[
<WP[
RXP[

Hifil (2):
>WR[
BDL[
DC>[
ZR<[
JY>[

Nifal (3):
QWH=[
R>H[

C.14.2.2 Genesis 1:1-3

Piel (1):
RXP[

C.14.3 Method

I ran the ps3 file through the following shell scripts:



# for piel:
cut -c 12-27,41-43 genesis01.ps3 | grep -v "\-1" | grep "1" | sort | uniq

# for hifil:
cut -c 12-27,41-43 genesis01.ps3 | grep -v "\-1" | grep "2" | sort | uniq

# for nifal:
cut -c 12-27,41-43 genesis01.ps3 | grep -v "\-1" | grep "3" | sort | uniq





Appendix D

Textual emendations

D.1 Introduction

In this appendix, I describe the emendations I have made to the text, or rather, to the syntactic analysis thereof. In all cases, the emendation has been made because I disagreed with the analysis of the WI. Each emendation is described first by an MQL script that shows what I have done, then by a description of the text both before and after the emendation, and finally by an explanation of why I have felt that an emendation was necessary.

D.2 Genesis 1:14 monads 262-265

D.2.1 MQL

DELETE OBJECTS
BY ID_D = 11757
[subphrase]
GO

CREATE OBJECT
FROM MONADS = { 263 }
WITH ID_D = 11757
[subphrase
  subphrase_type := PAR;
  subphrase_kind := mother;
  mother := NIL;
]
GO


D.2.2 Description

The text reads:

L-JMJM W CNJM

And has been analyzed thus with subphrases:

{PAR+ L-JMJM +PAR} W {par- CNJM -par}

I have changed it to be analyzed thus:

L-{PAR+ JMJM +PAR} W {par- CNJM -par}

D.2.3 Explanation

I believe that this is a prepositional phrase with an embedded NP that consists of two parallel NPs, each consisting of a noun. My emendation reflects this fact. The WI had analyzed it instead as a prepositional phrase which was parallel to an NP.

D.3 Genesis 1:26 monads 506-523

D.3.1 MQL

DELETE OBJECTS
BY ID_D = 11822
[subphrase]
GO

CREATE OBJECT
FROM MONADS = { 506-514 }
WITH ID_D = 11822
[subphrase
  subphrase_type := PAR;
  subphrase_kind := mother;
  mother := NIL;
]
GO

DELETE OBJECTS
BY ID_D = 11824
[subphrase]
GO

CREATE OBJECT
FROM MONADS = { 506-518 }
WITH ID_D = 11824
[subphrase
  subphrase_type := PAR;
  subphrase_kind := mother;
  mother := NIL;
]
GO

DELETE OBJECTS
BY ID_D = 11828
[subphrase]
GO

CREATE OBJECT
FROM MONADS = { 506-523 }
WITH ID_D = 11828
[subphrase
  subphrase_type := PAR;
  subphrase_kind := mother;
  mother := NIL;
]
GO

D.3.2 Description

The text reads:

B-DGT H-JM W B-<WP H-CMJM W B--BHMH W B KL H->RY W B-KL H-RMF

With parallel subphrases, it looks as follows (subphrases on the same level point to each other):

B-DGT H-JM W B-<WP H-CMJM W B--BHMH W B KL H->RY W B-KL H-RMF
|PAR | |par |
|PAR | |par |
|PAR | |par |
|PAR | |par |

I have changed it to be analyzed thus:

B-DGT H-JM W B-<WP H-CMJM W B--BHMH W B KL H->RY W B-KL H-RMF
|PAR | |par |
|PAR | |par |
|PAR | |par |
|PAR | |par |


D.3.3 Explanation

This is clearly a string of prepositional phrases, all of which are parallel in a left-branching structure. The original analysis had the beginnings of some of the head parallel subphrases wrong. I have emended the wrong PAR subphrases so that they all start at the beginning of the PP.

D.4 Genesis 1:28 monads 572-584

D.4.1 MQL

DELETE OBJECTS
BY ID_D = 11842
[subphrase]
GO

CREATE OBJECT
FROM MONADS = { 572-580 }
WITH ID_D = 11842
[subphrase
  subphrase_type := PAR;
  subphrase_kind := mother;
  mother := NIL;
]
GO

D.4.2 Description

The text reads:

B-DGT H-JM W B-<WP H-CMJM W B-KL XJH

With parallel subphrases, it looks as follows (subphrases on the same level point to each other):

B-DGT H-JM W B-<WP H-CMJM W B-KL XJH
|PAR | |par |
|PAR | |par |

I have changed it to be analyzed thus:

B-DGT H-JM W B-<WP H-CMJM W B-KL XJH
|PAR | |par |
|PAR | |par |


D.4.3 Explanation

This is clearly a string of prepositional phrases, all of which are parallel in a left-branching structure. The original analysis had the beginning of one of the head parallel subphrases wrong. I have emended the wrong subphrase so that the mother PAR subphrases all start at the beginning of the PP.


Appendix E

Emdros

E.1 Introduction

This appendix describes Emdros in greater detail than was possible in Chapter 2. In the following, I first describe the origins of Emdros (E.2). I then give an introduction to some Emdros concepts of importance for my study (E.3), after which I give an example of an Emdros database (E.4). I then briefly note that Emdros has an API which I have used in building my application (E.5). Finally, I summarize this appendix in a conclusion (E.6).

E.2 Origins of Emdros

Emdros is an implementation of the EMdF¹ database model and the MQL² query language. EMdF and MQL, in turn, are derivatives of the work done by Dr. Crist-Jan Doedens in his 1994 PhD thesis.³ In his PhD thesis, Doedens developed the MdF database model as a mathematical model of “text plus information about that text,” as well as the QL query language to complement the MdF model. In my B.Sc. thesis,⁴ I took the MdF model and extended it slightly, making it easier to implement. I also took the QL query language, first cutting it down to a manageable core, then giving this core an operational semantics. Since Doedens had only given a denotational semantics for QL, it was not easy to implement. My derivative, which I called MQL, would later prove much easier to implement, since it had been given an operational semantics.⁵

In the Spring of 2001, I spent some time extending MQL to become not just a query language for query-operations, but a full-access language with create/update/delete operations in all of the data domains of the EMdF model. Subsequently, I was able to implement both the EMdF model

¹ Extended Monads dot Features
² Mini Query Language
³ Doedens (1994).
⁴ Petersen (1999).
⁵ I am here talking about formal semantics of programming languages, not the semantics of natural languages.

The difference between operational semantics and denotational semantics is as follows: denotational semantics specifies what to compute, but not how; operational semantics specifies how to compute the results, but not what they are. See Winskel (1993) for an introduction to the subject.


and the now-extended MQL query language. In September of 2001, I released the first version as Open Source⁶ on the Internet.

Thus Emdros implements the EMdF model and the MQL query language, both of which are derivatives of the work done in Doedens (1994). EMdF provides a mathematical abstraction of text, or the language in which to talk about text in an Emdros database. MQL is a full-access language supporting create/update/delete/query operations on all data domains of the EMdF model.

E.3 Emdros concepts

E.3.1 Introduction

In this section, I briefly introduce four key concepts necessary for understanding how Emdros is used in the implementation of my method. They are:

• monad (E.3.2),

• object (E.3.3),

• object type (E.3.4), and

• feature (E.3.5).

They all belong to the EMdF model, and as a whole describe the core of the EMdF model. Finally, I sum up this subsection in a conclusion (E.3.6).

E.3.2 Monad

A monad is simply a positive integer greater than or equal to 1.⁷ The integers are arranged in a sequence determined by the successor function.⁸ In the EMdF model, the linear sequence of the text is determined by the linear sequence of the integers or monads. So the sequence of the monads determines the sequence of the text which is stored in an EMdF database.

⁶ For a definition of the Open Source concept, see <http://www.opensource.org/>.

⁷ The term “monad” here comes from the work of Doedens (1994). Doedens states (p. 57): “A monad corresponds to a single position in the text considered to be indivisible”, and then cites the Random House Dictionary from 1987 with the definition of a monad: “a single unit or entity [1605-1615]”.

Thus its definition, as Doedens uses it, predates Leibniz’s “Monadology” from 1714 (<http://www.rbjones.com/rbjpub/philos/classics/leibniz/monad.htm>). Leibniz’s work places a lot of notions into the concept which are not present in Doedens’ use of the word, such as its being linked to perception. However, there is some overlap between Leibniz’s and Doedens’s uses of the word, as the following quotation from the above URL (a translation of “Monadology”) shows:

“1. The Monad, of which we shall here speak, is nothing but a simple substance, which enters into compounds. By ‘simple’ is meant ‘without parts.’”

In Doedens’ use, a monad is not a substance, since it is merely an abstract integer. However, it is “simple” in that it is “without parts”, and it can “enter into compounds” in the sense of being a member of a larger set of monads, thereby forming objects.

⁸ I.e., succ(x) = x + 1. So the sequence is 1, 2, 3, 4, 5, . . .


Word
  Monads          { 2 }
  Number          2
  surface         “door,”
  part_of_speech  noun

Phrase
  Monads       { 1, 2 }
  Number       1
  phrase_type  NP

Figure E.1: Objects of type “Word” and “Phrase”

E.3.3 Object

An object is simply a non-empty set of monads. This set must have at least one monad, but other than that, there are no restrictions on the set. In particular, it can contain discontiguous sequences of monads, thus representing objects which pertain to discontiguous parts of the text. This could, e.g., be a clause which has an embedded relative clause which is not part of it.⁹
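The door example from the footnote can be sketched with plain Python sets; this is an illustration of the idea, not the Emdros API:

```python
# A sketch in plain Python sets, not the Emdros API: an object is just a
# non-empty set of monads, and nothing forbids gaps in the set.
clause = {1, 2, 8, 9}       # "The door ... was blue."
relative = {3, 4, 5, 6, 7}  # "which opened towards the East"

def is_contiguous(monads):
    """True if the monad set covers an unbroken stretch of integers."""
    return max(monads) - min(monads) + 1 == len(monads)

print(is_contiguous(relative))  # True
print(is_contiguous(clause))    # False: interrupted by the relative clause
```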

E.3.4 Object type

All objects are of exactly one object type. An object type could, e.g., be “Word,” “Phrase,” “Clause,” “Sentence,” “Book,” “Page,” “Chapter,” etc. An object type groups EMdF objects with similar characteristics, much like a conceptual “type” (such as “Cat”, “Chair”, “Lover”) groups real-world objects with similar characteristics. The difference is that an object type is defined much more formally and specially: An object type groups objects that have the same features, where “feature” has a special meaning, which I describe next.

E.3.5 Feature

A feature is a function¹⁰ whose domain is objects of a given object type, and whose codomain is one of “integer,” “string,” “id_d,”¹¹ or “enumeration.”¹² Another word for the features of an object type could be “attributes.” Thus it is through features that we assign specific values to attributes of a given object.

For example, an object type “Word” might have the features “surface” and “part_of_speech.” A given object of type Word might then, e.g., have “door,” for “surface” and “noun” for “part_of_speech.” Similarly, an object of type “Phrase” spanning the monads corresponding to the Word objects “The” and “door,” might have the value “NP” for its “phrase_type” feature. See Figure E.1 for a tabular view of these objects.
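The two objects of Figure E.1 can be sketched as plain dictionaries; this illustrates the feature-as-function idea, not how Emdros actually stores objects:

```python
# Features map an object of a type to a value; here each object simply
# carries its feature assignments as dictionary entries (illustrative only).
word = {"monads": {2}, "surface": "door,", "part_of_speech": "noun"}
phrase = {"monads": {1, 2}, "phrase_type": "NP"}

def feature(obj, name):
    """Look up feature `name` on `obj` -- the O.f notation in prose."""
    return obj[name]

print(feature(word, "part_of_speech"))  # noun
print(feature(phrase, "phrase_type"))   # NP
```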

⁹ An example would be: “The door, which opened towards the East, was blue.” In this example, “The door . . . was blue” forms a single clause which is interrupted by the embedded relative clause “which opened towards the East.” The set of monads describing the clause “The door . . . was blue” would be discontiguous.

¹⁰ I here use the word “function” in its mathematical sense. See, e.g., Winskel (1993, p. 7).
¹¹ An id_d is an integer uniquely identifying an object in the database.
¹² An enumeration is a set of labels. So for example, “{ masculine, feminine }” could be an enumeration which we might call “gender.”


[Figure E.2 is a tabular diagram spanning monads 1-9. The Word row gives each word of “The door, which opened towards the East, was blue.” with its “surface” and “part_of_speech” features (def.art., noun, rel.pron., verb, prep., def.art., noun, verb, adject.); the Phrase rows give “phrase_type” values NP, NP, VP, PP, NP, VP, and AP; further rows show the Clause_atom, Clause, and Sentence objects.]

Figure E.2: A small EMdF database

All object types have at least one feature, namely the “self” feature, which for each object shows what its id_d is.

The notation for features on objects is as follows: Given an object O with an object type OT that has a feature f, then O’s feature f is denoted “O.f”. For example, “O.self” means “the value of object O’s ’self’ feature.”

E.3.6 Conclusion

Objects are sets of monads. Monads are integers. The sequence of monads as embodied in the successor function determines the textual sequence. An object is of exactly one object type. The object type determines which features are applicable to a given object. The features are used to assign values to attributes of objects. This summarizes the core of the EMdF model.

E.4 An example

In order to make the preceding material less abstract, I have provided an example in Figure E.2.

This small EMdF database has nine monads and five object types (Word, Phrase, Clause_atom, Clause, Sentence). One monad corresponds to one word. The Word object type has the features “surface” and “part_of_speech,” while the “Phrase” object type has the feature “phrase_type.” The other object types have no features.

The Phrase object labelled “5” consists of the monad-set { 6, 7 }. It has the “phrase_type” “NP.” It is embedded in the “PP” Phrase object labelled “4”, which consists of the monad-set { 5, 6, 7 }.

The “Clause” object labelled “1” is discontiguous and consists of the monads { 1, 2, 8, 9 }. It represents the clause “The door . . . was blue.” The intervening relative clause is, by some linguistic accounts, not part of the surrounding clause.¹³
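The embedding and discontiguity just described can be checked with a small sketch. The monad sets and object labels are taken from the example above; the dictionary representation is illustrative only, not Emdros storage.

```python
# Monad sets from the E.4 example; keys are the object labels in Figure E.2.
phrases = {
    4: {"monads": {5, 6, 7}, "phrase_type": "PP"},
    5: {"monads": {6, 7},    "phrase_type": "NP"},
}
clause = {1, 2, 8, 9}  # the discontiguous Clause object labelled "1"

def phrases_at(monad):
    """Labels of Phrase objects whose monad set contains the given monad."""
    return sorted(label for label, p in phrases.items()
                  if monad in p["monads"])

print(phrases_at(6))  # [4, 5]: the NP lies inside the PP
print(3 in clause)    # False: monad 3 belongs to the relative clause
```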

E.5 Emdros API

Emdros has an API for accessing its services. It is described online,¹⁴ and need not concern us here, except to note that I have used it in implementing access to the WIVU Hebrew database.

E.6 Conclusion

Emdros is a text database engine for “text plus information about that text.” Its significance in the context of this study is that the WIVU database, which provides the empirical basis for my project, is most easily accessed from an Emdros database. Emdros implements the EMdF database model and the MQL query language, both of which are derived from the work of Doedens (1994). The EMdF model is based on monads, objects, object types, and features.

¹³ See McCawley (1982) for a sustained argument in favor of this interpretation. This article also discusses why monads 1-7 do not constitute a big noun phrase. One of the arguments is that the relative clause is a non-restrictive relative clause, i.e., it does not serve to specify which door, but rather it serves to give further information about the already-identified door. Non-restrictive relative clauses are generally not seen as modifying their target NP, and hence are not part of the target NP.

¹⁴ See the “Emdros Programmer’s Reference Guide,” <http://emdros.org/progref/>.


Appendix F

Ontology

F.1 Introduction

This appendix lists the ontology in the form of trees. Two forms are given: an abbreviated form, and a full form with differentiation between entry clusters and ontology entries. Also, both forms are shown twice: once for Genesis 1:1-3 and once for all of Genesis 1.

Nodes at the same level of indentation are children of the node that is immediately above which is indented one level less. Leaf nodes in these representations have “Absurd” as the single subtype.

F.2 Ontology of Genesis 1:1-3, abbreviated

Universal
Entity
Entity_playing_a_role
Possessed_entity
possession
Collection
group
abstraction
proposition
If
communication
Then
attribute
shape
amorphous_shape
space
void_1
void
property
manner
psychological_feature
cognition
content
belief
spiritual_being
God_1
God
Physical_object
entity
location
space
void_1
void
region
extremity
boundary
surface
surface_1
celestial_sphere
heavens
object
land
earth
body_of_water
ocean_1
ocean
substance
compound
binary_compound
water_1
water
fluid
liquid
water_1
water
causal_agent
vital_principle
spirit_1
spirit
representation
conceptual_graph
Rule
Premise
Conclusion
Situation
State
be
be_1
perceive
state
illumination
dark
darkness
condition
emptiness_1
emptiness
relation
Process
Action
hesitate
hover_1
hover
make
create
express
state_1
say
event
happening
beginning_1
beginning
phenomenon
natural_phenomenon
physical_phenomenon
energy
radiation
electromagnetic_radiation
actinic_radiation
light_1
light

F.3 Ontology of Genesis 1, abbreviated

Universal
Entity
Physical_object
entity
causal_agent
vital_principle
spirit_1
spirit
part
body_part
organ
wing_1
wing
object
living_thing
organism
being
plant
vascular_plant
woody_plant
tree_1
tree
herb_1
herb
gramineous_plant
grass_1
grass
animal
animal_1
creepy-crawly_1
creepy-crawly
marine_animal_1
marine_animal
chordate
vertebrate
mammal
placental
livestock_1
livestock
primate
hominid
homo
man
aquatic_vertebrate
fish_1
fish
bird_1
bird
mutant
freak
leviathan_1
leviathan
land
ground
earth
dry_land
natural_object
plant_part
plant_organ
reproductive_structure
fruit_1
fruit
seed_1
seed
celestial_body
heavenly_body
star_1
star
body_of_water
sea_1
sea
ocean_1
ocean
location
space
void_1
void
region
extremity
boundary
surface
surface_1
celestial_sphere
heavens
firmament
inside
midst_1
midst
point
topographic_point
place
substance
fluid
liquid
water_1
water
compound
binary_compound
water_1
water
food_1
food
representation
conceptual_graph
Rule
Premise
Conclusion
abstraction
attribute
shape
amorphous_shape
space
void_1
void
every(a)
every
fourth_1
fourth
first_1
first
quality
sameness
similarity
likeness_1
likeness
likeness_2
fifth_1
fifth
living_1
living
all(a)
all
one_1
one
third_1
third
good_1
good
be_good
property
magnitude
amount
quantity
abundance
profusion
greenness_1
greenness
manner
thus
so
very_1
very
two_1
two
female_1
female
male_1
male
second_1
second
fruitful
be_fruitful
small_1
small
great_1
great
sixth_1
sixth
psychological_feature
cognition
content
idea
concept
category
kind_1
kind
belief
spiritual_being
God_1
God
proposition
If
communication
indication
evidence
clue
sign_1
sign
Then
measure
time_unit
day_2
day_1
fundamental_quantity
time_period
night_1
night
morning_1
morning
day
evening_1
evening
season_1
season
year_1
year
Entity_playing_a_role
Possessed_entity
possession
Collection
group
collection_1
collection
Situation
State
be
be_1
perceive
see_1
see
state
condition
emptiness_1
emptiness
dominance
dominion_1
dominion
illumination
dark
darkness
relation
Process
Action
express
state_1
say
control
govern_1
govern
rule
change
lighten
light_2
illuminate
change_magnitude
increase
grow_1
develop
grow
shoot
sprout
hesitate
hover_1
hover
move_1
gather
crowd
pour
teem
change_1
fill_1
fill
give_1
give
oppress
repress
subdue
determine
identify
distinguish
differentiate
designate
label
name
call
travel
move_2
fly_1
fly
move
put_1
put
spill
seed_3
seed_2
separate_1
separate
make
create
produce_1
produce
reproduce
breed
multiply
bless_1
bless
gather_2
gather_1
phenomenon
process
natural_process
organic_process
bodily_process
breath_1
breath
natural_phenomenon
physical_phenomenon
energy
radiation
electromagnetic_radiation
actinic_radiation
light_1
light
event
happening
beginning_1
beginning

F.4 Ontology of Genesis 1:1-3, full

EC: { gl.='Universal' syns.='Universal' }
EC: { gl.='Entity' syns.='Entity' }
EC: { gl.='Collection' syns.='Collection' }
EC: { gl.='group' syns.='group, grouping' }
EC: { gl.='abstraction' syns.='abstraction' }
EC: { gl.='psychological_feature' syns.='psychological_feature' }
EC: { gl.='cognition' syns.='cognition, knowledge, noesis' }
EC: { gl.='content' syns.='content, cognitive_content, mental_object' }
EC: { gl.='belief' syns.='belief' }
EC: { gl.='spiritual_being' syns.='spiritual_being, supernatural_being' }
EC: { gl.='God_1' syns.='God, Supreme_Being' }
OE: { lexeme = '>LHJM/' gloss = 'God' }
EC: { gl.='proposition' syns.='proposition' }
EC: { gl.='If' syns.='If' }
EC: { gl.='Then' syns.='Then' }
EC: { gl.='communication' syns.='communication' }
EC: { gl.='attribute' syns.='attribute' }
EC: { gl.='property' syns.='property' }
EC: { gl.='manner' syns.='manner, mode, style, way, fashion' }
EC: { gl.='shape' syns.='shape, form' }
EC: { gl.='amorphous_shape' syns.='amorphous_shape' }
EC: { gl.='space' syns.='space' }
EC: { gl.='void_1' syns.='void, vacancy, emptiness' }
OE: { lexeme = 'BHW/' gloss = 'void' }
EC: { gl.='Physical_object' syns.='Physical_object' }
EC: { gl.='entity' syns.='entity, physical_thing' }
EC: { gl.='body_of_water' syns.='body_of_water, water' }
EC: { gl.='ocean_1' syns.='ocean' }
OE: { lexeme = 'THWM/' gloss = 'ocean' }
EC: { gl.='substance' syns.='substance, matter' }
EC: { gl.='fluid' syns.='fluid' }
EC: { gl.='liquid' syns.='liquid' }
EC: { gl.='water_1' syns.='water, H2O' }
OE: { lexeme = 'MJM/' gloss = 'water' }
EC: { gl.='compound' syns.='compound, chemical_compound' }
EC: { gl.='binary_compound' syns.='binary_compound' }
EC: { gl.='water_1' syns.='water, H2O' }
OE: { lexeme = 'MJM/' gloss = 'water' }
EC: { gl.='causal_agent' syns.='causal_agent, cause, causal_agency' }
EC: { gl.='vital_principle' syns.='vital_principle, life_principle' }
EC: { gl.='spirit_1' syns.='spirit' }
OE: { lexeme = 'RWX/' gloss = 'spirit' }
EC: { gl.='location' syns.='location' }
EC: { gl.='region' syns.='region, part' }
EC: { gl.='extremity' syns.='extremity' }
EC: { gl.='boundary' syns.='boundary, bound, bounds' }
EC: { gl.='surface' syns.='surface' }
OE: { lexeme = 'PNH/' gloss = 'surface_1' }
EC: { gl.='celestial_sphere' syns.='celestial_sphere, sphere, empyrean, firmament, heavens, vault_of_heaven, welkin' }
OE: { lexeme = 'CMJM/' gloss = 'heavens' }
EC: { gl.='space' syns.='space' }
EC: { gl.='void_1' syns.='void, vacancy, emptiness' }
OE: { lexeme = 'BHW/' gloss = 'void' }
EC: { gl.='object' syns.='object, physical_object' }
EC: { gl.='land' syns.='land, dry_land, earth, ground, solid_ground, terra_firma' }
OE: { lexeme = '>RY/' gloss = 'earth' }
EC: { gl.='Entity_playing_a_role' syns.='Entity_playing_a_role' }
EC: { gl.='Possessed_entity' syns.='Possessed_entity' }
EC: { gl.='possession' syns.='possession' }
EC: { gl.='Situation' syns.='Situation' }
EC: { gl.='State' syns.='State' }
EC: { gl.='relation' syns.='relation' }
EC: { gl.='state' syns.='state' }
EC: { gl.='illumination' syns.='illumination' }
EC: { gl.='dark' syns.='dark, darkness' }
OE: { lexeme = 'XCK/' gloss = 'darkness' }
EC: { gl.='condition' syns.='condition, status' }
EC: { gl.='emptiness_1' syns.='emptiness' }
OE: { lexeme = 'THW/' gloss = 'emptiness' }
EC: { gl.='be' syns.='be' }
OE: { lexeme = 'HJH[' gloss = 'be_1' }
EC: { gl.='perceive' syns.='perceive, comprehend' }
EC: { gl.='Process' syns.='Process' }
EC: { gl.='event' syns.='event' }
EC: { gl.='happening' syns.='happening, occurrence, natural_event' }
EC: { gl.='beginning_1' syns.='beginning' }
OE: { lexeme = 'R>CJT/' gloss = 'beginning' }
EC: { gl.='phenomenon' syns.='phenomenon' }
EC: { gl.='natural_phenomenon' syns.='natural_phenomenon' }
EC: { gl.='physical_phenomenon' syns.='physical_phenomenon' }
EC: { gl.='energy' syns.='energy' }
EC: { gl.='radiation' syns.='radiation' }
EC: { gl.='electromagnetic_radiation' syns.='electromagnetic_radiation, electromagnetic_wave, nonparticulate_radiation' }
EC: { gl.='actinic_radiation' syns.='actinic_radiation, actinic_ray' }
EC: { gl.='light_1' syns.='light, visible_light, visible_radiation' }
OE: { lexeme = '>WR/' gloss = 'light' }
EC: { gl.='Action' syns.='Action' }
EC: { gl.='make' syns.='make, create' }
OE: { lexeme = 'BR>[' gloss = 'create' }
EC: { gl.='express' syns.='express, verbalize, verbalise, utter, give_tongue_to' }
EC: { gl.='state_1' syns.='state, say, tell' }
OE: { lexeme = '>MR[' gloss = 'say' }
EC: { gl.='hesitate' syns.='hesitate, waver, waffle' }
EC: { gl.='hover_1' syns.='hover, vibrate, vacillate, oscillate' }
OE: { lexeme = 'RXP[' gloss = 'hover' }

F.5 Ontology of Genesis 1, full

EC: { gl.='Universal' syns.='Universal' }
EC: { gl.='Entity' syns.='Entity' }
EC: { gl.='Collection' syns.='Collection' }
EC: { gl.='group' syns.='group, grouping' }
EC: { gl.='collection_1' syns.='collection, aggregation, accumulation, assemblage' }
OE: { lexeme = 'MQWH/' gloss = 'collection' }
EC: { gl.='abstraction' syns.='abstraction' }
EC: { gl.='psychological_feature' syns.='psychological_feature' }
EC: { gl.='cognition' syns.='cognition, knowledge, noesis' }
EC: { gl.='content' syns.='content, cognitive_content, mental_object' }
EC: { gl.='idea' syns.='idea, thought' }
EC: { gl.='concept' syns.='concept, conception, construct' }
EC: { gl.='category' syns.='category' }
EC: { gl.='kind_1' syns.='kind, sort, form, variety' }
OE: { lexeme = 'MJN/' gloss = 'kind' }
EC: { gl.='belief' syns.='belief' }
EC: { gl.='spiritual_being' syns.='spiritual_being, supernatural_being' }
EC: { gl.='God_1' syns.='God, Supreme_Being' }
OE: { lexeme = '>LHJM/' gloss = 'God' }
EC: { gl.='proposition' syns.='proposition' }
EC: { gl.='If' syns.='If' }
EC: { gl.='Then' syns.='Then' }
EC: { gl.='communication' syns.='communication' }
EC: { gl.='indication' syns.='indication, indicant' }
EC: { gl.='evidence' syns.='evidence' }
EC: { gl.='clue' syns.='clue, clew, cue' }
EC: { gl.='sign_1' syns.='sign, mark' }
OE: { lexeme = '>WT/' gloss = 'sign' }
EC: { gl.='attribute' syns.='attribute' }
EC: { gl.='first_1' syns.='first, 1st' }
OE: { lexeme = '>XD/' gloss = 'first' }
EC: { gl.='third_1' syns.='third, 3rd, tertiary' }
OE: { lexeme = 'CLJCJ/' gloss = 'third' }
EC: { gl.='second_1' syns.='second, 2nd, 2d' }
OE: { lexeme = 'CNJ/' gloss = 'second' }
EC: { gl.='two_1' syns.='two, 2, ii' }
OE: { lexeme = 'CNJM/' gloss = 'two' }
EC: { gl.='property' syns.='property' }
EC: { gl.='manner' syns.='manner, mode, style, way, fashion' }
EC: { gl.='very_1' syns.='very, really, real, rattling' }
OE: { lexeme = 'M>D/' gloss = 'very' }
EC: { gl.='thus' syns.='thus, thusly, so' }
OE: { lexeme = 'KN' gloss = 'so' }
EC: { gl.='magnitude' syns.='magnitude' }
EC: { gl.='amount' syns.='amount' }
EC: { gl.='quantity' syns.='quantity' }
EC: { gl.='abundance' syns.='abundance, copiousness, teemingness' }
EC: { gl.='profusion' syns.='profusion, profuseness, richness, cornucopia' }
EC: { gl.='greenness_1' syns.='greenness, verdancy, verdure' }
OE: { lexeme = 'JRQ/' gloss = 'greenness' }
EC: { gl.='good_1' syns.='good' }
OE: { lexeme = 'VWB/' gloss = 'good' }
OE: { lexeme = 'VWB[' gloss = 'be_good' }
EC: { gl.='fourth_1' syns.='fourth, 4th, quaternary' }
OE: { lexeme = 'RBJ<J/' gloss = 'fourth' }
EC: { gl.='sixth_1' syns.='sixth, 6th' }
OE: { lexeme = 'CCJ/' gloss = 'sixth' }
EC: { gl.='all(a)' syns.='all(a), all_of' }
OE: { lexeme = 'KL/' gloss = 'all' }
EC: { gl.='quality' syns.='quality' }
EC: { gl.='sameness' syns.='sameness' }
EC: { gl.='similarity' syns.='similarity' }
EC: { gl.='likeness_1' syns.='likeness, alikeness, similitude' }
OE: { lexeme = 'DMWT/' gloss = 'likeness' }
OE: { lexeme = 'YLM/' gloss = 'likeness_2' }
EC: { gl.='great_1' syns.='great' }
OE: { lexeme = 'GDWL/' gloss = 'great' }
EC: { gl.='every(a)' syns.='every(a)' }
OE: { lexeme = 'KL/' gloss = 'every' }
EC: { gl.='female_1' syns.='female' }
OE: { lexeme = 'NQBH/' gloss = 'female' }
EC: { gl.='living_1' syns.='living' }
OE: { lexeme = 'XJ/' gloss = 'living' }
EC: { gl.='one_1' syns.='one, 1, i, ane' }
OE: { lexeme = '>XD/' gloss = 'one' }
EC: { gl.='fruitful' syns.='fruitful' }
OE: { lexeme = 'PRH[' gloss = 'be_fruitful' }
EC: { gl.='small_1' syns.='small, little' }
OE: { lexeme = 'QVN/' gloss = 'small' }
EC: { gl.='fifth_1' syns.='fifth, 5th' }
OE: { lexeme = 'XMJCJ/' gloss = 'fifth' }
EC: { gl.='shape' syns.='shape, form' }
EC: { gl.='amorphous_shape' syns.='amorphous_shape' }
EC: { gl.='space' syns.='space' }
EC: { gl.='void_1' syns.='void, vacancy, emptiness' }
OE: { lexeme = 'BHW/' gloss = 'void' }
EC: { gl.='male_1' syns.='male' }
OE: { lexeme = 'ZKR=/' gloss = 'male' }
EC: { gl.='measure' syns.='measure, quantity, amount, quantum' }
EC: { gl.='fundamental_quantity' syns.='fundamental_quantity, fundamental_measure' }
EC: { gl.='time_period' syns.='time_period, period_of_time, period' }
EC: { gl.='day' syns.='day, daytime, daylight' }
EC: { gl.='evening_1' syns.='evening, eve, eventide' }
OE: { lexeme = '<RB/' gloss = 'evening' }
EC: { gl.='season_1' syns.='season' }
OE: { lexeme = 'MW<D/' gloss = 'season' }
EC: { gl.='night_1' syns.='night, nighttime, dark' }
OE: { lexeme = 'LJLH/' gloss = 'night' }
EC: { gl.='year_1' syns.='year, twelvemonth, yr' }
OE: { lexeme = 'CNH/' gloss = 'year' }
EC: { gl.='morning_1' syns.='morning, morn, morning_time, forenoon' }
OE: { lexeme = 'BQR=/' gloss = 'morning' }
EC: { gl.='time_unit' syns.='time_unit, unit_of_time' }
EC: { gl.='day_2' syns.='day, twenty-four_hours, solar_day, mean_solar_day' }
OE: { lexeme = 'JWM/' gloss = 'day_1' }
EC: { gl.='Physical_object' syns.='Physical_object' }
EC: { gl.='entity' syns.='entity, physical_thing' }
EC: { gl.='part' syns.='part, piece' }
EC: { gl.='body_part' syns.='body_part' }
EC: { gl.='organ' syns.='organ' }
EC: { gl.='wing_1' syns.='wing' }
OE: { lexeme = 'KNP/' gloss = 'wing' }
EC: { gl.='object' syns.='object, physical_object' }
EC: { gl.='land' syns.='land, dry_land, earth, ground, solid_ground, terra_firma' }
OE: { lexeme = '>DMH/' gloss = 'ground' }
OE: { lexeme = '>RY/' gloss = 'earth' }
OE: { lexeme = 'JBCH/' gloss = 'dry_land' }
EC: { gl.='natural_object' syns.='natural_object' }
EC: { gl.='celestial_body' syns.='celestial_body, heavenly_body' }
OE: { lexeme = 'M>WR/' gloss = 'heavenly_body' }
EC: { gl.='star_1' syns.='star' }
OE: { lexeme = 'KWKB/' gloss = 'star' }
EC: { gl.='plant_part' syns.='plant_part' }
EC: { gl.='plant_organ' syns.='plant_organ' }
EC: { gl.='reproductive_structure' syns.='reproductive_structure' }
EC: { gl.='fruit_1' syns.='fruit' }
OE: { lexeme = 'PRJ/' gloss = 'fruit' }
EC: { gl.='seed_1' syns.='seed' }
OE: { lexeme = 'ZR</' gloss = 'seed' }
EC: { gl.='living_thing' syns.='living_thing, animate_thing' }
EC: { gl.='organism' syns.='organism, being' }
OE: { lexeme = 'NPC/' gloss = 'being' }
EC: { gl.='plant' syns.='plant, flora, plant_life' }
EC: { gl.='vascular_plant' syns.='vascular_plant, tracheophyte' }
EC: { gl.='herb_1' syns.='herb, herbaceous_plant' }
OE: { lexeme = '<FB/' gloss = 'herb' }
EC: { gl.='gramineous_plant' syns.='gramineous_plant, graminaceous_plant' }
EC: { gl.='grass_1' syns.='grass' }


F.5. ONTOLOGY OF GENESIS 1, FULL 223

OE: { lexeme = 'DC>/' gloss = 'grass' }
EC: { gl.='woody_plant' syns.='woody_plant, ligneous_plant' }
EC: { gl.='tree_1' syns.='tree' }
OE: { lexeme = '<Y/' gloss = 'tree' }
EC: { gl.='animal' syns.='animal, animate_being, beast, brute, creature, fauna' }
OE: { lexeme = 'XJH/' gloss = 'animal_1' }
EC: { gl.='marine_animal_1' syns.='marine_animal, sea_animal' }
OE: { lexeme = 'CRY/' gloss = 'marine_animal' }
EC: { gl.='chordate' syns.='chordate' }
EC: { gl.='vertebrate' syns.='vertebrate, craniate' }
EC: { gl.='bird_1' syns.='bird' }
OE: { lexeme = '<WP/' gloss = 'bird' }
EC: { gl.='aquatic_vertebrate' syns.='aquatic_vertebrate' }
EC: { gl.='fish_1' syns.='fish' }
OE: { lexeme = 'DGH/' gloss = 'fish' }
EC: { gl.='mammal' syns.='mammal' }
EC: { gl.='placental' syns.='placental, placental_mammal, eutherian, eutherian_mammal' }
EC: { gl.='livestock_1' syns.='livestock, stock, farm_animal' }
OE: { lexeme = 'BHMH/' gloss = 'livestock' }
EC: { gl.='primate' syns.='primate' }
EC: { gl.='hominid' syns.='hominid' }
EC: { gl.='homo' syns.='homo, man, human_being, human' }
OE: { lexeme = '>DM/' gloss = 'man' }
EC: { gl.='creepy-crawly_1' syns.='creepy-crawly' }
OE: { lexeme = 'RMF/' gloss = 'creepy-crawly' }
EC: { gl.='mutant' syns.='mutant, mutation, variation, sport' }
EC: { gl.='freak' syns.='freak, monster, monstrosity, lusus_naturae' }
EC: { gl.='leviathan_1' syns.='leviathan' }
OE: { lexeme = 'TNJN/' gloss = 'leviathan' }
EC: { gl.='body_of_water' syns.='body_of_water, water' }
EC: { gl.='ocean_1' syns.='ocean' }
OE: { lexeme = 'THWM/' gloss = 'ocean' }
EC: { gl.='sea_1' syns.='sea' }
OE: { lexeme = 'JM/' gloss = 'sea' }
EC: { gl.='substance' syns.='substance, matter' }
EC: { gl.='food_1' syns.='food, nutrient' }
OE: { lexeme = '>KLH/' gloss = 'food' }
EC: { gl.='compound' syns.='compound, chemical_compound' }
EC: { gl.='binary_compound' syns.='binary_compound' }
EC: { gl.='water_1' syns.='water, H2O' }
OE: { lexeme = 'MJM/' gloss = 'water' }
EC: { gl.='fluid' syns.='fluid' }
EC: { gl.='liquid' syns.='liquid' }
EC: { gl.='water_1' syns.='water, H2O' }
OE: { lexeme = 'MJM/' gloss = 'water' }
EC: { gl.='causal_agent' syns.='causal_agent, cause, causal_agency' }
EC: { gl.='vital_principle' syns.='vital_principle, life_principle' }
EC: { gl.='spirit_1' syns.='spirit' }
OE: { lexeme = 'RWX/' gloss = 'spirit' }
EC: { gl.='location' syns.='location' }
EC: { gl.='space' syns.='space' }
EC: { gl.='void_1' syns.='void, vacancy, emptiness' }
OE: { lexeme = 'BHW/' gloss = 'void' }
EC: { gl.='region' syns.='region, part' }
EC: { gl.='extremity' syns.='extremity' }
EC: { gl.='boundary' syns.='boundary, bound, bounds' }
EC: { gl.='surface' syns.='surface' }

OE: { lexeme = 'PNH/' gloss = 'surface_1' }
EC: { gl.='celestial_sphere' syns.='celestial_sphere, sphere, empyrean, firmament, heavens, vault_of_heaven, welkin' }
OE: { lexeme = 'CMJM/' gloss = 'heavens' }
OE: { lexeme = 'RQJ</' gloss = 'firmament' }
EC: { gl.='inside' syns.='inside, interior' }
EC: { gl.='midst_1' syns.='midst, thick' }
OE: { lexeme = 'TWK/' gloss = 'midst' }

EC: { gl.='point' syns.='point' }
EC: { gl.='topographic_point' syns.='topographic_point, place, spot' }
OE: { lexeme = 'MQWM/' gloss = 'place' }
EC: { gl.='Entity_playing_a_role' syns.='Entity_playing_a_role' }
EC: { gl.='Possessed_entity' syns.='Possessed_entity' }
EC: { gl.='possession' syns.='possession' }
EC: { gl.='Situation' syns.='Situation' }
EC: { gl.='State' syns.='State' }
EC: { gl.='relation' syns.='relation' }
EC: { gl.='state' syns.='state' }
EC: { gl.='illumination' syns.='illumination' }
EC: { gl.='dark' syns.='dark, darkness' }
OE: { lexeme = 'XCK/' gloss = 'darkness' }
EC: { gl.='condition' syns.='condition, status' }
EC: { gl.='dominance' syns.='dominance, ascendance, ascendence, ascendancy, ascendency, control' }
EC: { gl.='dominion_1' syns.='dominion, rule' }
OE: { lexeme = 'MMCLH/' gloss = 'dominion' }
EC: { gl.='emptiness_1' syns.='emptiness' }
OE: { lexeme = 'THW/' gloss = 'emptiness' }
EC: { gl.='be' syns.='be' }
OE: { lexeme = 'HJH[' gloss = 'be_1' }
EC: { gl.='perceive' syns.='perceive, comprehend' }
EC: { gl.='see_1' syns.='see' }
OE: { lexeme = 'R>H[' gloss = 'see' }
EC: { gl.='Process' syns.='Process' }
EC: { gl.='event' syns.='event' }
EC: { gl.='happening' syns.='happening, occurrence, natural_event' }
EC: { gl.='beginning_1' syns.='beginning' }
OE: { lexeme = 'R>CJT/' gloss = 'beginning' }
EC: { gl.='phenomenon' syns.='phenomenon' }
EC: { gl.='process' syns.='process' }
EC: { gl.='natural_process' syns.='natural_process, natural_action, action, activity' }
EC: { gl.='organic_process' syns.='organic_process, biological_process' }
EC: { gl.='bodily_process' syns.='bodily_process, body_process, bodily_function, activity' }
EC: { gl.='breath_1' syns.='breath' }
OE: { lexeme = 'NPC/' gloss = 'breath' }
EC: { gl.='natural_phenomenon' syns.='natural_phenomenon' }
EC: { gl.='physical_phenomenon' syns.='physical_phenomenon' }
EC: { gl.='energy' syns.='energy' }
EC: { gl.='radiation' syns.='radiation' }
EC: { gl.='electromagnetic_radiation' syns.='electromagnetic_radiation, electromagnetic_wave, nonparticulate_radiation' }
EC: { gl.='actinic_radiation' syns.='actinic_radiation, actinic_ray' }
EC: { gl.='light_1' syns.='light, visible_light, visible_radiation' }
OE: { lexeme = '>WR/' gloss = 'light' }
EC: { gl.='Action' syns.='Action' }
EC: { gl.='determine' syns.='determine, set' }
EC: { gl.='identify' syns.='identify, place' }
EC: { gl.='distinguish' syns.='distinguish, separate, differentiate, secern, secernate, severalize, severalise, tell, tell_apart' }
OE: { lexeme = 'BDL[' gloss = 'differentiate' }


EC: { gl.='bless_1' syns.='bless' }
OE: { lexeme = 'BRK[' gloss = 'bless' }
EC: { gl.='travel' syns.='travel, go, move, locomote' }
OE: { lexeme = 'RMF[' gloss = 'move_2' }
EC: { gl.='fly_1' syns.='fly, wing' }
OE: { lexeme = '<WP[' gloss = 'fly' }
EC: { gl.='make' syns.='make, create' }
OE: { lexeme = 'BR>[' gloss = 'create' }
EC: { gl.='reproduce' syns.='reproduce, procreate, multiply' }
EC: { gl.='breed' syns.='breed, multiply' }
OE: { lexeme = 'RBH[' gloss = 'multiply' }
EC: { gl.='produce_1' syns.='produce, bring_forth' }
OE: { lexeme = 'JY>[' gloss = 'produce' }
EC: { gl.='express' syns.='express, verbalize, verbalise, utter, give_tongue_to' }
EC: { gl.='state_1' syns.='state, say, tell' }
OE: { lexeme = '>MR[' gloss = 'say' }
EC: { gl.='move_1' syns.='move' }
EC: { gl.='gather' syns.='gather, congregate, collect' }
EC: { gl.='crowd' syns.='crowd, crowd_together' }
EC: { gl.='pour' syns.='pour, swarm, stream, teem, pullulate' }
OE: { lexeme = 'CRY[' gloss = 'teem' }
EC: { gl.='change' syns.='change' }
EC: { gl.='lighten' syns.='lighten, lighten_up' }
EC: { gl.='light_2' syns.='light, illume, illumine, light_up, illuminate' }
OE: { lexeme = '>WR[' gloss = 'illuminate' }
EC: { gl.='change_magnitude' syns.='change_magnitude' }
EC: { gl.='increase' syns.='increase' }
EC: { gl.='grow_1' syns.='grow' }
EC: { gl.='develop' syns.='develop' }
EC: { gl.='grow' syns.='grow' }
EC: { gl.='shoot' syns.='shoot, spud, germinate, pullulate, bourgeon, burgeon_forth, sprout' }
OE: { lexeme = 'DC>[' gloss = 'sprout' }
EC: { gl.='oppress' syns.='oppress, suppress, crush' }
EC: { gl.='repress' syns.='repress, quash, keep_down, subdue, subjugate, reduce' }
OE: { lexeme = 'KBC[' gloss = 'subdue' }
EC: { gl.='change_1' syns.='change, alter' }
EC: { gl.='fill_1' syns.='fill, fill_up, make_full' }
OE: { lexeme = 'ML>[' gloss = 'fill' }
EC: { gl.='hesitate' syns.='hesitate, waver, waffle' }
EC: { gl.='hover_1' syns.='hover, vibrate, vacillate, oscillate' }
OE: { lexeme = 'RXP[' gloss = 'hover' }
EC: { gl.='control' syns.='control, command' }
EC: { gl.='govern_1' syns.='govern, rule' }
OE: { lexeme = 'MCL[' gloss = 'govern' }
OE: { lexeme = 'RDH[' gloss = 'rule' }
EC: { gl.='move' syns.='move, displace' }
EC: { gl.='spill' syns.='spill, shed, disgorge' }
EC: { gl.='seed_3' syns.='seed' }
OE: { lexeme = 'ZR<[' gloss = 'seed_2' }
EC: { gl.='separate_1' syns.='separate, disunite, divide, part' }
OE: { lexeme = 'BDL[' gloss = 'separate' }
EC: { gl.='put_1' syns.='put, set, place, pose, position, lay' }
OE: { lexeme = 'NTN[' gloss = 'put' }
EC: { gl.='gather_2' syns.='gather, garner, collect, pull_together' }
OE: { lexeme = 'QWH=[' gloss = 'gather_1' }
EC: { gl.='give_1' syns.='give' }
OE: { lexeme = 'NTN[' gloss = 'give' }
EC: { gl.='designate' syns.='designate, denominate' }
EC: { gl.='label' syns.='label' }
EC: { gl.='name' syns.='name, call' }
OE: { lexeme = 'QR>[' gloss = 'call' }


Appendix G

Grammar of Genesis 1

G.1 Introduction

This appendix shows the grammar of Genesis 1:1-3 as I have extracted it from the WIVU database. I also show the grammar of the whole of Genesis 1, since that is, in a sense, also my target text.

G.2 Grammar of Gen 1:1-3

clause --> VP/Pred NP/Subj
clause --> CjP/Conj VP/Pred NP/Subj
clause --> CjP/Conj NP/Subj PP/PreC
clause --> PP/Time VP/Pred NP/Subj PP/Objc
clause --> CjP/Conj NP/Subj VP/PreC PP/Cmpl
clause --> CjP/Conj NP/Subj VP/Pred NP/PreC
VP/PreC --> verb
VP/Pred --> verb
NP/PAR --> noun
NP/PPobj --> noun
NP/PPobj --> NP/REG NP/rec
NP/PPobj --> article noun
NP/PreC --> NP/PAR conjunction NP/par
NP/REG --> noun
NP/Subj --> noun
NP/Subj --> NP/REG NP/rec
NP/Subj --> article noun
NP/par --> noun
NP/rec --> noun
NP/rec --> article noun
PP/Cmpl --> preposition NP/PPobj
PP/Objc --> PP/PAR conjunction PP/par
PP/PAR --> preposition NP/PPobj
PP/PreC --> preposition NP/PPobj
PP/Time --> preposition NP/PPobj


PP/par --> preposition NP/PPobj
CjP/Conj --> conjunction

G.3 Grammar of Genesis 1

clause --> PP/Objc
clause --> VP/Pred
clause --> NP/PreC
clause --> VP/Pred NP/Subj
clause --> CjP/Rela VP/PreC
clause --> CjP/Rela PP/PreC
clause --> CjP/Conj PP/Objc
clause --> NP/Objc PP/Adju
clause --> CjP/Conj VP/PreO
clause --> VP/PreC NP/Objc
clause --> CjP/Rela VP/Pred
clause --> VP/Pred PP/Cmpl
clause --> CjP/Rela AP/PreC
clause --> CjP/Conj VP/Pred
clause --> CjP/Conj VP/Pred PP/PreC
clause --> CjP/Conj VP/Pred PP/Cmpl
clause --> CjP/Conj VP/Pred NP/Subj
clause --> CjP/Rela VP/Pred NP/Subj
clause --> CjP/Conj PP/Objc PP/Adju
clause --> VP/PreC NP/Objc PP/Adju
clause --> PP/Cmpl VP/Pred PP/PreC
clause --> CjP/Rela PP/PreC NP/Subj
clause --> CjP/Conj VP/Pred PP/Objc
clause --> CjP/Conj VP/Pred AdvP/Modi
clause --> CjP/Conj NP/Subj PP/PreC
clause --> VP/Pred NP/Subj PP/PreC
clause --> CjP/Rela VP/PreC PP/Cmpl
clause --> PP/Adju VP/Pred PP/Objc
clause --> NP/Adju VP/Pred PP/Objc
clause --> VP/Pred NP/Subj PP/Cmpl
clause --> VP/Pred NP/Subj NP/Objc
clause --> CjP/Rela NP/Subj PP/PreC
clause --> CjP/Conj VP/Pred VP/PreC PP/Cmpl
clause --> PP/Time VP/Pred NP/Subj PP/Objc
clause --> CjP/Conj VP/Pred PP/PreC PP/Loca
clause --> CjP/Conj PP/Cmpl VP/Pred NP/Objc
clause --> CjP/Conj NP/Subj VP/Pred PP/Cmpl
clause --> CjP/Conj IjP/Intj AP/PreC AdvP/Modi
clause --> CjP/Conj PP/Cmpl PP/Objc PP/Adju
clause --> IjP/Intj VP/Pred PP/Cmpl PP/Objc
clause --> CjP/Conj VP/Pred PP/Cmpl NP/Subj
clause --> CjP/Conj NP/Subj VP/Pred NP/PreC
clause --> VP/Pred NP/Objc PP/Adju PP/Adju
clause --> VP/Pred NP/Subj NP/Objc PP/Adju
clause --> CjP/Conj NP/Subj VP/Pred PP/Loca


clause --> CjP/Conj NP/Subj VP/PreC PP/Cmpl
clause --> CjP/Conj VP/Pred NP/Subj PP/Cmpl
clause --> CjP/Conj VP/Pred NP/Subj PP/Objc
clause --> CjP/Conj VP/Pred PP/Objc NP/Subj
clause --> VP/PreC NP/Objc PP/Adju PP/Loca
clause --> CjP/Conj VP/Pred PP/Cmpl CjP/Conj PP/Cmpl
clause --> CjP/Conj VP/Pred PP/Objc NP/Subj PP/Cmpl
clause --> CjP/Conj VP/Pred NP/Subj PP/Objc PP/Adju
clause --> VP/Pred NP/Subj NP/Objc NP/Objc NP/Objc
clause --> CjP/Conj VP/Pred NP/Objc PP/Cmpl NP/Objc
clause --> CjP/Conj VP/Pred NP/Subj NP/Objc NP/Objc CjP/Conj NP/Objc
VP/PreC --> verb
VP/PreO --> verb
VP/Pred --> verb
VP/Pred --> preposition verb
NP/Adju --> NP/PAR conjunction NP/par
NP/Objc --> noun
NP/Objc --> NP/REG NP/rec
NP/Objc --> noun adjective
NP/Objc --> NP/PAR conjunction NP/par
NP/PAR --> noun
NP/PAR --> NP/REG NP/rec
NP/PAR --> NP/PAR conjunction NP/par
NP/PPobj --> noun
NP/PPobj --> noun noun
NP/PPobj --> NP/REG NP/rec
NP/PPobj --> article noun
NP/PPobj --> NP/PAR conjunction NP/par
NP/PPobj --> article noun article adjective
NP/PreC --> noun noun
NP/PreC --> noun adjective
NP/PreC --> NP/REG NP/rec
NP/PreC --> NP/PAR conjunction NP/par
NP/REG --> noun
NP/REG --> NP/REG NP/rec
NP/Subj --> noun
NP/Subj --> article noun
NP/Subj --> NP/Subj PP/Subj
NP/Subj --> NP/REG NP/rec
NP/Subj --> noun adjective
NP/par --> noun
NP/par --> NP/REG NP/rec
NP/rec --> noun
NP/rec --> adjective
NP/rec --> noun adjective
NP/rec --> article noun
NP/rec --> noun noun
NP/rec --> article noun article adjective
AP/PreC --> adjective
PP/Adju --> preposition NP/PPobj
PP/Cmpl --> preposition


PP/Cmpl --> PP/Cmpl PP/Cmpl
PP/Cmpl --> preposition NP/PPobj
PP/Cmpl --> PP/PAR conjunction PP/par
PP/Loca --> preposition NP/PPobj
PP/Objc --> preposition
PP/Objc --> preposition NP/PPobj
PP/Objc --> PP/Objc PP/Objc
PP/Objc --> PP/PAR conjunction PP/par
PP/Objc --> PP/Objc CjP/Conj PP/Objc
PP/PAR --> preposition NP/PPobj
PP/PAR --> PP/PAR conjunction PP/par
PP/PAR --> preposition NP/PAR conjunction PP/par
PP/PreC --> preposition
PP/PreC --> preposition NP/PPobj
PP/PreC --> PP/PAR conjunction PP/par
PP/PreC --> preposition preposition preposition NP/PPobj
PP/Subj --> preposition preposition NP/PPobj
PP/Time --> preposition NP/PPobj
PP/par --> preposition NP/PPobj
AdvP/Modi --> adverb
CjP/Conj --> conjunction
CjP/Rela --> conjunction
IjP/Intj --> interjection
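Rule inventories like the one above lend themselves to a very simple machine representation. The following sketch (a hypothetical helper, not the toolchain actually used in the thesis) stores each rewrite rule as a (left-hand side, right-hand-side constituents) pair and counts distinct rules, which is the kind of tally that underlies the data in Appendix H:

```python
# Hypothetical sketch: represent extracted rewrite rules as
# (lhs, tuple-of-rhs-constituents) pairs; a set deduplicates them,
# so its size is the number of distinct rules found so far.

def parse_rule(line):
    """Parse a line like 'clause --> VP/Pred NP/Subj' into (lhs, rhs)."""
    lhs, rhs = line.split("-->")
    return lhs.strip(), tuple(rhs.split())

rules = {
    parse_rule(line)
    for line in [
        "clause --> VP/Pred NP/Subj",
        "clause --> CjP/Conj VP/Pred NP/Subj",
        "clause --> VP/Pred NP/Subj",  # duplicate analysis: counted once
        "VP/Pred --> verb",
    ]
}

print(len(rules))  # 3 distinct rules
```

Because ordering of constituents is part of the tuple, two clauses with the same functions in a different order count as two rules, matching the "constituent order" distinction drawn in the plot legends of Appendix H.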


Appendix H

Plots

H.1 Introduction

This appendix lists the data and gnuplot[1] scripts used to produce the graphs in Figure 8.1 on page 90 and Figure 8.2 on page 91.

The values for text sizes smaller than Genesis 1:1-2:21 (1096 words) have not been considered, as the margin of error is too large for these small values to make any meaningful contribution to the plots and the formula. Below 1096 words, the formula mentioned in Section 8.4.2 on page 89 does not hold, but as can be seen from the plots, and especially Figure 8.2 on page 91, the formula holds beautifully for larger values, even much larger values such as all of Genesis 1-50. For values smaller than 1096, the properties of the logarithmic functions are such that even small changes in the argument to the function (such as a jump from 16 to 17) produce a margin of error which greatly influences the shape of the graph.

For example, looking at 139 words (Genesis 1:1-8), we find that 17 phrase-level rules have been found. Had this been but 1 different, the difference on the plot's y-axis would have been ln(17) - ln(16), which is approximately 0.060. Taking this now to 1096 words, we find that 108 phrase-level rules have been found. Had this been 1 different, we would have gotten a difference on the plot's y-axis of ln(108) - ln(107), approximately 0.0093, a margin of error almost an order of magnitude smaller. Thus small fluctuations in the lower end of the spectrum greatly influence the shape of the graph.
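The two margin-of-error figures above can be checked directly with a few lines of Python (added here purely as a verification sketch):

```python
import math

# Difference on the log plot's y-axis caused by one rule more or less,
# at 139 words (17 phrase-level rules) and at 1096 words (108 rules).
small_text = math.log(17) - math.log(16)    # approx. 0.0606
large_text = math.log(108) - math.log(107)  # approx. 0.0093

print(round(small_text, 4))  # 0.0606
print(round(large_text, 4))  # 0.0093
```

The first figure is roughly six and a half times the second, confirming that a one-rule fluctuation distorts the log plot far more at the small end of the text-size spectrum.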

H.2 Data

# Column 1: Number of words
# Column 2: Number of phrase-level patterns with phrase-types and functions
# Column 3: Number of clause-level patterns with phrase-types and functions
# Column 4: Number of phrase-level patterns with phrase-types
# Column 5: Number of clause-level patterns with phrase-types
# Column 6: Number of clause-level patterns with functions

[1] Gnuplot is a standard plotting program available from <http://gnuplot.sourceforge.net/>.


# Column 7: Number of phrase-level patterns with phrase-types and functions,
#           but no functions for rule heads.
#
#    39 words: Genesis 1:1-3
#   139 words: Genesis 1:1-8
#   326 words: Genesis 1:1-17
#   673 words: Genesis 1
#  1096 words: Genesis 1:1-2:21
#  1666 words: Genesis 1-3
#  3543 words: Genesis 1-7
#  5483 words: Genesis 1-11
#  9621 words: Genesis 1-20
# 13624 words: Genesis 1-26
# 28735 words: Genesis 1-50
#
#39     21   6    8   6   6   8
#139    30   17   13  16  17  13
#326    45   30   19  24  30  20
#673    74   54   31  38  49  34
1096   116  108  50  71  77  55
1666   138  163  57  102 113 64
3543   237  290  84  168 184 110
5483   288  397  115 225 238 138
9621   384  683  151 361 380 187
13624  449  878  179 445 454 221
28735  650  1525 260 710 682 322
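As an illustration of the straight-line behaviour these data exhibit on a log-log plot (the point argued in Appendix I), a least-squares fit of ln(rules) against ln(words) can be computed from the uncommented rows above, here using column 2 (phrase-level patterns with phrase-types and functions). This is a verification sketch, not part of the thesis toolchain:

```python
import math

# (words, phrase-level rules with phrase-types and functions): column 2 above
data = [(1096, 116), (1666, 138), (3543, 237), (5483, 288),
        (9621, 384), (13624, 449), (28735, 650)]

xs = [math.log(w) for w, _ in data]
ys = [math.log(r) for _, r in data]
n = len(data)

# Ordinary least squares for ln(r) = a*ln(w) + c
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
c = my - a * mx

print(round(a, 2), round(c, 2))  # gradient below 1, intercept above 1
```

The fitted gradient lies between 0 and 1 and the intercept above 1, which is exactly the precondition assumed in the derivation of Appendix I.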

H.3 GNUPlot scripts

H.3.1 Plotting rules against words

set terminal postscript
set output "rules-against-words.ps"
set title "Plot of rules against words"
set xlabel "Words"
set ylabel "Rules"
set label "Gen 1-3" at 1666,980 rotate
set label "Gen 1-11" at 5483,980 rotate
set label "Gen 1-20" at 9621,980 rotate
set label "Gen 1-26" at 13624,980 rotate
set label "Gen 1-50" at 28735,980 rotate
plot 'rule_graph_data.txt' using 1:5 with lines lt 2 lw 4 \
     title "Clauses only with phrase types", \
     'rule_graph_data.txt' using 1:4 with lines lt 2 lw 2 \
     title "Phrases only with phrase types", \
     'rule_graph_data.txt' using 1:3 with lines lt 5 lw 4 \
     title "Clauses with phrase types and phrase functions", \
     'rule_graph_data.txt' using 1:2 with lines lt 5 lw 2 \
     title "Phrases with phrase types and phrase functions", \
     'rule_graph_data.txt' using 1:7 with lines lt 3 lw 2 \
     title "Phrases with phrase types and functions but no functions on heads", \
     'rule_graph_data.txt' using 1:6 with lines lt 1 lw 4 \
     title "Clauses only with phrase functions and disregarding constituent order"


H.3.2 Plotting ln(rules) against ln(words)

set terminal postscript
set output "log-rules-against-log-words.ps"
set title "Plot of ln of rules against ln of words"
set xlabel "ln(Words)"
set ylabel "ln(Rules)"
plot 'rule_graph_data.txt' using (log($1)):(log($5)) with lines lt 2 lw 4 \
     title "Clauses only with phrase types", \
     'rule_graph_data.txt' using (log($1)):(log($4)) with lines lt 2 lw 2 \
     title "Phrases only with phrase types", \
     'rule_graph_data.txt' using (log($1)):(log($3)) with lines lt 5 lw 4 \
     title "Clauses with phrase types and phrase functions", \
     'rule_graph_data.txt' using (log($1)):(log($2)) with lines lt 5 lw 2 \
     title "Phrases with phrase types and phrase functions", \
     'rule_graph_data.txt' using (log($1)):(log($7)) with lines lt 3 lw 2 \
     title "Phrases with phrase types and functions but no functions on heads", \
     'rule_graph_data.txt' using (log($1)):(log($6)) with lines lt 1 lw 4 \
     title "Clauses only with phrase functions and disregarding constituent order"


Appendix I

Mathematics of plots

I.1 Introduction

This appendix shows the mathematical argumentation for the point I made in Section 8.4.2 on page 88 that the number of grammar rules grows more and more slowly as the amount of text analyzed grows. This was taken from the empirical data in the graphs in Figure 8.1 on page 90 and Figure 8.2 on page 91.

I.2 Argumentation

Recall that what we are measuring is the number of rules r found after analyzing w words of text. When one takes ln(r) and plots it against ln(w), one gets the very beautiful straight lines in Figure 8.2 on page 91.

Notice that the gradient of all of the lines is less than 1, and that all of the lines cross the y-axis above 1. This means that

    ln(r) = a ln(w) + c,    where 0 < a < 1 and c > 1

Now, if one raises e to the power of each side, one gets:

    e^(ln(r)) = e^(a ln(w) + c)

which simplifies to

    r = e^c (e^(ln(w)))^a

which simplifies to

    r = d w^a,    where d = e^c

So r is a power function of the number of words, with a proportionality constant d = e^c and an exponent a.


Now, since a < 1, we can rephrase this as

    r = d w^(1/b),    where b = 1/a > 1

which is another way of saying that r equals d times the b'th root of w.

Now, in order to know how this changes as w grows, we can take the derivative with respect to w. The rule for constant factors is

    d/dx (c f(x)) = c (d/dx f(x))

and the rule for powers is that

    d/dx x^n = n x^(n-1)

Taken together, these imply that

    dr/dw = d a w^(a-1)

Since 0 < a < 1, we have that -1 < (a-1) < 0. This means that we can write the derivative as

    dr/dw = d a w^(-m),    where m = 1 - a and 0 < m < 1

or, in a still different rewrite:

    dr/dw = d a w^(-1/j),    where j = 1/m and j > 1

This again means that

    dr/dw = d a / w^(1/j)

that is, the constant d a divided by the j'th root of w. Since we have a constant in the numerator and a monotonically growing function of w in the denominator, we have that

    lim (w -> infinity) of d a / w^(1/j) = 0

Ergo, as w grows towards infinity, the rate of change gets closer and closer to 0. This means that as w grows, one will have to analyze greater and greater amounts of text in order to get the same amount of growth in the rules.


Now, it may be that this reasoning is not valid. After all, we are not dealing with a continuous function, but with discrete (integer) values. Hence the method I have used may not work, since in general one cannot differentiate a non-continuous function.

Also, one has to consider what the domain under investigation is, namely language. Language is highly dynamic, and people will, given enough time, say the strangest things, which, when analyzed, give rise to still new exceptions to rules. People bend language and twist it to suit their communicative needs. Hence the number of rules is likely to continue to grow and not completely stagnate, given enough analyzed text. However, what I have argued is that at some point, the rate of change will drop very close to 0.


Appendix J

Rules

J.1 Introduction

In this appendix, I list the rules that I have devised for transforming the intermediate graphs to semantic graphs.

J.2 Rules

[Rule:
  [Universal*a][Universal*b]
  [Premise:
    [Universal?a:'Pred'][Universal?b](Time?a?b)]
  [Conclusion:
    [Universal?a][Universal?b](ptim?a?b)]
].

[Rule:
  [Universal*a][Universal*b]
  [Premise:
    [be_1?a:'Pred'][Process?b](Subj?a?b)]
  [Conclusion:
    [Universal?a][Universal?b](stat?b?a)]
].

[Rule:
  [Universal*a][Universal*b]
  [Premise:
    [Action?a:'Pred'][Entity?b](Subj?a?b)]
  [Conclusion:
    [Universal?a][Universal?b](agnt?a?b)]
].


[Rule:
  [Universal*a][Universal*b]
  [Premise:
    [Action?a:'PreC'][Entity?b](Subj?a?b)]
  [Conclusion:
    [Universal?a][Universal?b](agnt?a?b)]
].

[Rule:
  [Universal*a][Universal*b]
  [Premise:
    [Situation?a:'Pred'][Entity?b](Objc?a?b)]
  [Conclusion:
    [Universal?a][Universal?b](thme?a?b)]
].

[Rule:
  [Universal*a][Universal*b][Universal*c]
  [Premise:
    [be_1?a:'Pred'][Entity?b][Universal?c](Subj?a?b)(PreC?a?c)]
  [Conclusion:
    [Universal?b][Universal?c](stat?b?c)]
].

[Rule:
  [Universal*a][Universal*b][Universal*c]
  [Premise:
    [Universal?a:'PreC'][Universal?b][Universal?c](Cmpl?a?b)(over?b?c)]
  [Conclusion:
    [Universal?a][Universal?c](over?a?c)]
].

[Rule:
  [Universal*a][Universal*b][Universal*c]
  [Premise:
    [Universal?a:'PreC'][Universal?b][Universal?c](Subj?a?b)(over?a?c)]
  [Conclusion:
    [be_1*d][Universal?b][Universal?c](stat?b?d)(over?d?c)]
]
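Each rule has the general shape "where the premise pattern matches in the intermediate graph, assert the conclusion relation". As a purely illustrative sketch (hypothetical data structures; the actual implementation also respects the ontology's type hierarchy, so that any subtype of Action or Entity matches), the rule mapping a Subj relation from a 'Pred'-marked Action to an Entity onto the semantic relation agnt might be rendered as:

```python
# Sketch: a graph is a dict of node id -> (concept type, grammatical marker),
# plus a set of (relation, source, target) triples. This mirrors only the
# Action/Subj/agnt rule above, without type subsumption over the ontology.

def apply_agnt_rule(nodes, edges):
    """If a 'Pred'-marked Action node stands in a Subj relation to an
    Entity node, add the semantic relation (agnt ?a ?b)."""
    out = set(edges)
    for rel, a, b in edges:
        if (rel == "Subj"
                and nodes[a] == ("Action", "Pred")
                and nodes[b][0] == "Entity"):
            out.add(("agnt", a, b))
    return out

# Toy instance: a Pred-marked Action verb with an Entity subject.
nodes = {"v": ("Action", "Pred"), "n": ("Entity", None)}
edges = {("Subj", "v", "n")}
print(sorted(apply_agnt_rule(nodes, edges)))
```

Note that the sketch keeps the syntactic Subj edge alongside the new agnt edge; whether the grammatical relation is retained or replaced in the semantic graph is a design decision of the real rule engine.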