The SHEBANQ project (half-way) as a use case in querying language resources. The corpus is the text of the Hebrew Bible with linguistic features, packaged in a special text database and converted to LAF.
Data Archiving and Networked Services
SHEBANQ
Dirk Roorda - researcher @ DANS, TLA
System for HEBrew Text: ANnotations for Queries and Markup
TEI pre-conference workshop: Query
Roma – 2013-10-01
Overview
1. Context: text, data, research in Hebrew Bible
2. MdF database model, MQL query language
3. Sharing the research process
4. CLARIN-NL project: SHEBANQ
5. Towards new tools
1 (of 5) Context
Text, data and research in the Hebrew Bible
VU Amsterdam
Eep Talstra Centre for Bible and Computer
text + linguistic features => database
database + research questions => publications
2 (of 5) MdF and MQL
• MdF database model
• MQL query language
Monad Object Feature
1977-now: Eep Talstra et al. ECA, WIVU. Print reference (Google Books)
1988-1994 Crist-Jan Doedens: Text Databases – One Database Model and Several Retrieval Languages (google books reference)
2004: Ulrik Petersen. Emdros - a text database engine for analyzed or annotated text. COLING
Monad-Object-Feature
[Figure: Genesis 1:1 in the standard edition, shown as a row of 11 monads (atomic chunks of text) with layers of objects over them: word, subphrase, phrase_atom, phrase, clause_atom and sentence objects. Each object covers a range of monads and carries features.]
text: בראשית ברא אלהים את השמים ואת הארץ׃
word object features (shown for the lexeme ראשית): lexeme_utf8, old_lexeme_utf8, vocalized_lexeme_utf8, surface_consonants_utf8, graphical_lexeme_utf8
clause_atom object features: clause_atom_number=1, clause_atom_relation=0, clause_atom_relation_daughter_tense=unknown, clause_atom_relation_kind=No_relation, clause_atom_relation_mother_tense=unknown, clause_atom_relation_preposition_class=none, clause_atom_type=xQtl, indentation=0
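The Monad-Object-Feature idea can be sketched in a few lines of Python. This is only an illustration, not how Emdros stores data: the `TextObject` class, the `span` helper and the toy monad ranges are assumptions made for the example; only the feature names `clause_atom_number`, `clause_atom_type`, `part_of_speech` and `phrase_function` come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class TextObject:
    """A typed object covering a set of monads and carrying features (hypothetical class)."""
    otype: str
    monads: frozenset
    features: dict = field(default_factory=dict)

    def embeds(self, other: "TextObject") -> bool:
        # Embedding: every monad of the other object lies inside this one.
        return other.monads <= self.monads

    def precedes(self, other: "TextObject") -> bool:
        # Sequence: this object ends before the other one starts.
        return max(self.monads) < min(other.monads)


def span(first: int, last: int) -> frozenset:
    """Return the monads first..last, inclusive."""
    return frozenset(range(first, last + 1))


# Toy data for an 11-monad verse; the word and phrase spans are illustrative.
clause_atom = TextObject("clause_atom", span(1, 11),
                         {"clause_atom_number": 1, "clause_atom_type": "xQtl"})
time_phrase = TextObject("phrase", span(1, 2))
verb_word = TextObject("word", span(3, 3), {"part_of_speech": "verb"})
object_phrase = TextObject("phrase", span(5, 11), {"phrase_function": "Objc"})

print(clause_atom.embeds(verb_word))     # True
print(time_phrase.precedes(verb_word))   # True
print(object_phrase.embeds(verb_word))   # False
```

Representing every object as a set of monads means that both relations a query needs, embedding and sequence, reduce to set inclusion and integer comparison.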
MQL query language
topographic, i.e.:
the structure of the query expression mirrors the structure of the query results w.r.t.
• sequence
• embedding
Example
SELECT ALL OBJECTS
WHERE
[Clause
  [Phrase
    [Word FOCUS
      part_of_speech = verb AND
      lexeme = "FJM["
    ]
  ]
  ..
  [Phrase FOCUS
    phrase_function = Objc OR
    phrase_function = IrpO
  ]
  ..
  [Phrase FOCUS
    phrase_function = Objc OR
    phrase_function = IrpO
  ]
]
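What "topographic" means for the example query can be sketched in Python: nesting in the query becomes an embedding test and `..` becomes "somewhere later in the same clause". This is a much simplified illustration and not the MQL engine; the tuple layout, the helper names and the toy objects are all assumptions made for the example.

```python
from itertools import combinations

# Each toy object: (type, first_monad, last_monad, features). Made-up data.
OBJECTS = [
    ("clause", 1, 6, {}),
    ("phrase", 1, 1, {"phrase_function": "Pred"}),
    ("word", 1, 1, {"part_of_speech": "verb", "lexeme": "FJM["}),
    ("phrase", 2, 3, {"phrase_function": "Objc"}),
    ("phrase", 4, 6, {"phrase_function": "Objc"}),
    ("clause", 7, 10, {}),
    ("phrase", 7, 7, {"phrase_function": "Pred"}),
    ("word", 7, 7, {"part_of_speech": "verb", "lexeme": "FJM["}),
    ("phrase", 8, 10, {"phrase_function": "Objc"}),
]

OBJECT_FUNCTIONS = {"Objc", "IrpO"}


def inside(inner, outer):
    """Embedding: inner lies within the monad range of outer."""
    return outer[1] <= inner[1] and inner[2] <= outer[2]


def before(left, right):
    """Sequence: left ends before right starts."""
    return left[2] < right[1]


def double_object_clauses(objects, lexeme):
    """Find clauses with a verb phrase followed by two object phrases."""
    hits = []
    for clause in (o for o in objects if o[0] == "clause"):
        phrases = sorted(
            (o for o in objects if o[0] == "phrase" and inside(o, clause)),
            key=lambda o: o[1],
        )
        words = [o for o in objects if o[0] == "word" and inside(o, clause)]
        for verb_phrase in phrases:
            has_verb = any(
                inside(w, verb_phrase)
                and w[3].get("part_of_speech") == "verb"
                and w[3].get("lexeme") == lexeme
                for w in words
            )
            if not has_verb:
                continue
            later = [
                p for p in phrases
                if before(verb_phrase, p)
                and p[3].get("phrase_function") in OBJECT_FUNCTIONS
            ]
            for first, second in combinations(later, 2):
                if before(first, second):
                    hits.append((clause, verb_phrase, first, second))
    return hits


hits = double_object_clauses(OBJECTS, "FJM[")
print([(clause[1], clause[2]) for clause, *_ in hits])  # [(1, 6)]
```

Only the first toy clause matches, because the second one has a single object phrase after the verb.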
3 (of 5) Sharing
Problem: how to share (intermediate) results of analysis
Solution: saving queries as annotations
Lock-in
scholarly-bibles.com
Stuttgart Electronic Study Bible
⇒ massive dissemination
But
⇒ not the right dynamics for tool development
Leiden: international workshop biblical scholarship
Desiderata:
new tool development
text transmission (variants)
linguistic analysis (features)
even combined!
a short history: 2012
Leiden Lorentz
Hebrew Text in the Archive
urn:nbn:nl:ui:13-ikjj-ek
how can people annotate our work?
Research Data Cycle
[Diagram, shown in two builds; the second build adds CLARIN SHEBANQ and its audiences. Nodes:]
Text transmission, tradition, editorial processes
Free University, theology faculty, server department, WIVU project
NWO projects
religious communities
theol. scholars
enlightened lay people
scholarly-bibles.com
CLARIN SHEBANQ
linguists
dig. hum
comp. hum
Wider public: Annotation, Query Saving, via Linked Data
Research Data Archiving: DANS
3 (of 5) Sharing (cont'd)
Solution: Queries As Annotations
queries-as-annotations
model | query | example
body | query instruction | SELECT ALL OBJECTS WHERE [Word FOCUS part_of_speech = verb AND lexeme = "שים"]
targets | query results in context | ו ישכם יעקב ב בקר ו יקח את ה אבן אשר שם מראשתיו ו ישם אתה מצבה ו יצק שמן על ראשה
annotation | published query | qu123 (just an identifier)
metadata | researcher, date created, date last run, research question | Janet Dyk, 2004-02-16, 2012-01-27, Can the verb שים have a double object? - article in Foundations for Syriac Lexicography
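The queries-as-annotations model can be written down as a small record, sketched here in Python. The class and field names are hypothetical and are not taken from the Open Annotation vocabulary or from the demonstrator; the monad range in `targets` is a made-up placeholder. The identifier, query text and metadata values are the ones from the example above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QueryAnnotation:
    """A saved query: body = instruction, targets = results, plus metadata."""
    identifier: str
    body: str
    targets: list
    metadata: dict


annotation = QueryAnnotation(
    identifier="qu123",
    body='SELECT ALL OBJECTS WHERE '
         '[Word FOCUS part_of_speech = verb AND lexeme = "שים"]',
    # Placeholder target: one result expressed as a hypothetical monad range.
    targets=[{"first_monad": 100, "last_monad": 120}],
    metadata={
        "researcher": "Janet Dyk",
        "date_created": "2004-02-16",
        "date_last_run": "2012-01-27",
        "research_question": "Can the verb שים have a double object?",
    },
)

# Serialize for exchange; ensure_ascii=False keeps the Hebrew readable.
serialized = json.dumps(asdict(annotation), ensure_ascii=False, indent=2)
print(serialized)
```

Keeping the query text in the body and the result locations in the targets means that a re-run only has to replace `targets` and `date_last_run`, while the identifier stays stable.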
OpenAnnotation openannotation.org
provenance
motivation
demonstrator datanetworkservice.nl/qaa
still missing:
saving queries
not semantic-web-enabled
sustainability
4 (of 5) Project
CLARIN-NL: SHEBANQ:
(A) Curation
(B) Demonstrator
SHEBANQ
System for Hebrew Text: ANnotations for Queries
CLARIN-NL project
data curation: LAF
demonstrator: query saver
#!/etc bc
s/g$/q/
Linguistic Annotation Framework
ISO 24612:2012
Nancy Ide, Laurent Romary
feature definitions
TEI ISO-FS schema
dcr:datcat on <fDecl> versus <f>
26,225,966 <f>s
2.5 GB redundant attribute material
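A back-of-the-envelope check of what these two figures imply per `<f>` element, sketched in Python. Whether "GB" on the slide means 10^9 or 2^30 bytes is not stated, so both readings are computed; the variable names are only illustrative.

```python
F_ELEMENTS = 26_225_966  # number of <f> elements, from the slide

# The slide does not say which gigabyte is meant, so try both readings.
redundant_bytes_decimal = 2.5 * 10**9
redundant_bytes_binary = 2.5 * 2**30

per_f_decimal = redundant_bytes_decimal / F_ELEMENTS
per_f_binary = redundant_bytes_binary / F_ELEMENTS

print(f"{per_f_decimal:.1f} bytes per <f> (decimal GB)")
print(f"{per_f_binary:.1f} bytes per <f> (binary GB)")
```

Either way the redundancy comes to roughly a hundred bytes per `<f>`, which is the cost of repeating the same attribute on every feature instance instead of stating it once on the `<fDecl>`.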
4 (of 5) Project
CLARIN-NL: SHEBANQ: (B) Demonstrator
[Screenshot: demonstrator query screen]
Query:
select all objects where
[clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ]]
Button: Execute
Status: Executing query ... / Query executed
Query results: paged list (Prev ... Next) of passages with their text and a "view in context" link
Passage labels shown: Gen 1:1, 2Chron 3:4, 1Sam 12:4, Ex 23:2
Sample text shown:
Gen 1:1 בראשית ברא אלהים את השמים ואת הארץ׃
ויאמר חזקיהו מה אות כי אעלה בית יהוה׃
Save this query
Name: valency ברא
Researcher: Oliver Glanz
Date created: 2013-08-25
Date last run: 2013-08-25
Project: Data and Tradition
Institute: VU/Eep Talstra Centre for Bible and Computing
Reason: irregular valency of ברא
Comments: needs to be combined with query on אלהים
Buttons: Save, Publish, Cancel
Saved Query Results
[Screenshot: result list with passage labels (Gen 1:1, 2Chron 3:4, 1Sam 12:4, Ex 23:2), passage text, paging controls, "view in context" links and an "Edit Query" button]
Information on this query
Name: valency ברא
Researcher: Oliver Glanz
Date created: 2013-08-25
Date last run: 2013-08-25
Project: Data and Tradition
Institute: VU/Eep Talstra Centre for Bible and Computing
Reason: irregular valency of ברא
Comments: needs to be combined with query on אלהים
Query Info
MQL query text:
select all objects where
[clause [phrase phrase_function = Objc [word FOCUS tense = infinitive_absolute] ]]
Persistent Identifier: urn:nbn:nl:ui:13-scpm-ji
http://www.persistent-identifier.nl/?identifier=urn...
datanetworkservice.nl/qaa
SHEBANQ: implementing Q-a-A
5 (of 5) Towards new tools
• LAF tools
• or generic graph algorithms
• Emdros tools
• or generic database technology
• Linked Data tools
• or generic SPARQL queries
Side conditions
• development close to the researchers
• preferably in their own institutions
• decent performance
• within the scale of a laptop
• usable to researchers
• that is: non-programmers
• persistence in mind
• new results will be archived and re-enter the data cycle
thank you
slideshare.net/dirkroorda/
s/g$/q/
#!/etc bc Eep Talstra Centre for Bible and Computer