The Universities’ Collection Databases

“The Universities’ Collection Databases” denotes all databases developed by the Unit for Digital Documentation at the Arts Faculty, University of Oslo.

The databases contain data from archaeology, anthropology, botany, zoology, numismatics, history, art history and lexicography.

The databases are accessible via specially developed end-user applications and via the WWW.

The Universities' Collection Databases

This presentation gives an overview of:

A common user interface

Samples from some of the databases

Implementation

The databases are implemented in Oracle 8.1.7, without using any specific object-oriented features.

The object types (and the table structures) are defined in a common meta database.

All databases are accessed via a common framework.

The common framework gets design and structure information from the meta database. All queries are generated automatically on the basis of the information in the meta database.

Each user is granted access via a user database.

The user interface program checks the meta database for new versions of modules and upgrades itself automatically via the net.

New databases are added regularly.

A WWW version is being developed.
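
The idea of generating queries from a meta database can be sketched as follows. This is only an illustration under stated assumptions: the `META` dictionary, the `artifact` table and its column names are hypothetical, and `sqlite3` stands in for Oracle so that the example is self-contained. It is not the framework's actual code.

```python
import sqlite3

# Hypothetical meta database: one entry per object type, naming the table
# and the columns (attribute label -> column name) that belong to it.
META = {
    "artifact": {
        "table": "artifact",
        "columns": {"Type": "artifact_type", "County": "county"},
    },
}


def build_query(object_type, criteria):
    """Generate a parameterised SELECT from the meta description."""
    meta = META[object_type]
    columns = meta["columns"]
    sql = f"SELECT {', '.join(columns.values())} FROM {meta['table']}"
    params = []
    if criteria:
        # Labels are looked up in the meta data, so only known column names
        # reach the SQL text; the search values are bound as parameters.
        conditions = []
        for label, value in criteria.items():
            conditions.append(f"{columns[label]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def demo():
    """Run a generated query against a small in-memory sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artifact (artifact_type TEXT, county TEXT)")
    conn.executemany(
        "INSERT INTO artifact VALUES (?, ?)",
        [("ring", "Akershus"), ("ring", "Oslo"), ("axe", "Akershus")],
    )
    sql, params = build_query("artifact", {"Type": "ring"})
    return sql, conn.execute(sql, params).fetchall()
```

Because the query text is derived from the meta description, adding a new object type in this sketch only requires a new `META` entry, not new query code.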

The users have their personal navigator for quick access to databases of interest.

Each database has an associated object type.

The users can add their own folders or categories to the navigator

Search for the artifact type “ring”.

Choose a database (archaeological artifacts and finds).

Click on a column title to sort the result grid.

Drag and drop a column title to group the rows in the grid.

9 rings found in the county 'Akershus'.

Double-click to view detailed information (show the object viewer).

The artifacts found together with the selected ring (in the same find event).
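
A lookup of this kind can be expressed as a self-join on the find event. The sketch below is an assumption about how such data might be laid out: the `find_event` and `artifact` tables, their columns and the sample rows are hypothetical, and `sqlite3` is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: every artifact row points to the find event it came from.
conn.executescript("""
    CREATE TABLE find_event (id INTEGER PRIMARY KEY, place TEXT);
    CREATE TABLE artifact (
        id INTEGER PRIMARY KEY,
        artifact_type TEXT,
        find_event_id INTEGER REFERENCES find_event(id)
    );
    INSERT INTO find_event VALUES (1, 'site A'), (2, 'site B');
    INSERT INTO artifact VALUES
        (10, 'ring', 1), (11, 'sword', 1), (12, 'bead', 1), (13, 'ring', 2);
""")


def found_together(conn, artifact_id):
    """Return the other artifacts that share a find event with artifact_id."""
    return conn.execute(
        """
        SELECT other.id, other.artifact_type
        FROM artifact AS selected
        JOIN artifact AS other
          ON other.find_event_id = selected.find_event_id
         AND other.id <> selected.id
        WHERE selected.id = ?
        ORDER BY other.id
        """,
        (artifact_id,),
    ).fetchall()
```

Calling `found_together(conn, 10)` returns the sword and the bead from the same hypothetical find event, while the ring with id 13 has no companions.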

The users can export the data as HTML, as Excel, or according to the users’ predefined report templates.

The result grid exported to Excel.

The users can define report templates.

Drag and drop result rows onto a predefined report template of the corresponding object type to create a report.

The report is ready to be printed

Click to save pointers to selected rows in the result grid.

A list can hold pointers to a manually selected set of objects or to a dynamic (query-defined) set. The pointers can all be of a single object type or of different types. In the latter case the type of the list will be the common supertype.
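
The rule that a mixed list takes the common supertype of its members can be illustrated with a small class hierarchy. The class names below are hypothetical and Python classes merely stand in for the object types defined in the meta database; this is a sketch of the rule, not of the system's implementation.

```python
# Hypothetical object type hierarchy.
class CollectionObject:
    pass


class Artifact(CollectionObject):
    pass


class Ring(Artifact):
    pass


class Coin(Artifact):
    pass


class PlaceName(CollectionObject):
    pass


def common_supertype(objects):
    """Return the most specific class that every object is an instance of."""
    if not objects:
        raise ValueError("an empty list has no object type")
    first, *rest = [type(obj) for obj in objects]
    # The method resolution order runs from the most specific class to the
    # most general one, so the first match is the closest common supertype.
    for candidate in first.__mro__:
        if all(issubclass(other, candidate) for other in rest):
            return candidate
```

In this sketch a list of rings and coins is typed as `Artifact`, and a list mixing a ring with a place name falls back to `CollectionObject`.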

Click on the list icon to see the content of a list. In the system a stored list is just a (sub)database and can be queried.

Additional pointers can be added to an existing list.

Click on the explorer icon to get an overview of users and data sources (databases and stored lists).

Select another database (here: place name excerpts).

Click to see both the result grid and the object viewer.

Click to switch windows.

Display the object corresponding to the next row in the result grid.

The users can create and store their personal result grid design.

The tree structure reflects the structure of the object type as defined in the meta database.

The users can create and store their personal query form design.

Linguistic and lexicographic applications

Lexicographic archives

Lexical databases

Dictionary databases

Editing tools

The Meta Dictionary - a tool for the field linguist or lexicographer

The Norwegian Dictionary project

Text corpus tools

Lexical archives

The database for the traditional word slip collection of the Norwegian Dictionary project

Main collection: 2 900 000 facsimiles

Regional collection: 187 000 facsimiles

The database is linked to the Meta Dictionary

Head word

Part of speech

Literature references

Place of utterance

Facsimiles

Morphological databases

Lists with lemmata and inflected forms for the two Norwegian written languages (bokmål, nynorsk)

Basis for a two-level morpho-syntactic tagger

Produced in collaboration with the Text Laboratory at the Arts Faculty, Univ. of Oslo

Bokmål: 156,000 lemmata, 1.2 million inflected forms

Nynorsk: 123,000 lemmata, 896,000 inflected forms

The databases are linked to the Meta Dictionary

Paradigm codes and generated inflected forms

Lemma
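
Generating inflected forms from a lemma and a paradigm code can be sketched as a table lookup. The codes `m1` and `n1` and the suffix table below are invented for illustration and cover only two simple bokmål noun patterns; the actual paradigm codes of the morphological databases are not given in the slides.

```python
# Hypothetical paradigm table: code -> suffixes for indefinite singular,
# definite singular, indefinite plural and definite plural.
PARADIGMS = {
    "m1": ("", "en", "er", "ene"),  # e.g. a regular masculine noun
    "n1": ("", "et", "", "ene"),    # e.g. a monosyllabic neuter noun
}


def inflect(lemma, paradigm_code):
    """Generate the inflected forms of a lemma from its paradigm code."""
    return [lemma + suffix for suffix in PARADIGMS[paradigm_code]]
```

With this table, `inflect("bil", "m1")` yields the forms bil, bilen, biler and bilene. Storing only the lemma and its code keeps the lemma list compact, and the full form list can be regenerated whenever a paradigm is corrected.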

Dictionary databases

Database tools for two major Norwegian dictionaries

The entire process from editing to camera-ready manuscript

The tools are integrated in the common framework

The manuscripts are linked to the Meta Dictionary

The dictionary entry

Fields for different information categories

Graphical representation of the definition structure

The editing tools are for the time being not part of the common framework

A WYSIWYG presentation of the entry

The program generates the head word part of the entry based on the lemma and part of speech marking

The entries can be viewed in their running context

Navigation buttons

A set of entries (or the entire manuscript) can be typeset in PDF format and presented on the screen.

The entries are exported from the database as XML documents, converted via TeX and DVI to PDF, and sent back to the user.
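
The first step of such a pipeline, turning exported XML into TeX source, might look like the sketch below. The element names (`entries`, `entry`, `headword`, `pos`, `definition`), the sample entry and the LaTeX layout are assumptions made for illustration; the slides do not describe the actual XML schema or TeX macros. The two external commands are listed only to show the remaining steps and are not executed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format for a set of dictionary entries.
SAMPLE = (
    "<entries><entry><headword>hus</headword><pos>n.</pos>"
    "<definition>building for people to live in</definition></entry></entries>"
)

# A few common TeX special characters; not a complete escaping table.
TEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}

# Assumed follow-up steps: TeX to DVI, then DVI to PDF.
TYPESET_COMMANDS = [["latex", "entries.tex"], ["dvipdfm", "entries.dvi"]]


def tex_escape(text):
    """Escape characters that would otherwise be read as TeX markup."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def entries_to_tex(xml_text):
    """Convert the exported XML entries into a small LaTeX document."""
    root = ET.fromstring(xml_text)
    lines = [r"\documentclass{article}", r"\begin{document}"]
    for entry in root.iter("entry"):
        head = tex_escape(entry.findtext("headword", default=""))
        pos = tex_escape(entry.findtext("pos", default=""))
        definition = tex_escape(entry.findtext("definition", default=""))
        lines.append(rf"\noindent\textbf{{{head}}} \textit{{{pos}}} {definition}\par")
    lines.append(r"\end{document}")
    return "\n".join(lines)
```

Keeping the XML export separate from the typesetting step means the same export could feed other output formats without touching the database side.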

The Norwegian Dictionary

A national dictionary project (nynorsk)

To be finished in 12 volumes by the year 2014

DOK is developing the software solutions

The dictionary manuscript is linked to the Meta Dictionary

Graphical representation of the entry

The full text based on the structure of the entry

Each part of the dictionary entry has its own data entry form

Data entry form for the head word part

The entry text is updated automatically (on-screen message “Artikkelteksten vert kontinuerleg oppdatert”: “The entry text is continuously updated”)

Skard’s dictionary

Defines the 1938 orthography

32 000 entries

The dictionary is linked to the Meta Dictionary

The Meta Dictionary

A tool for systematising weakly normalized languages and a tool for the development of the Norwegian Dictionary (NO2014)

Interlinks different lexical databases

521 000 headwords (NO2014)

The backbone of the NO2014 project

924 slips about the word “hus” (house)

Word forms/lemmata written in different dialects and/or according to changing orthographies

Word compound analysis

Links to other lexical resources

Object viewer according to the type of lexical resource (here: slips)

Tool for fast normalization of the head words in the Meta Dictionary

Each project assistant has to normalize 300 entries a day

All links are manually checked

Norwegian (Nynorsk) electronic text corpus

Background

Editorial requirements for NO2014

Design and implementation

Unit for digital documentation, DOK

Work began in August 2002 and will continue according to the tasks assigned to the unit by NO2014 for one [email protected]

Norwegian (Nynorsk) electronic text corpus: Long-term goals

The definitive corpus for New Norwegian for lexicography and for other domains using electronic resources

A corpus access system that can be reused for other languages and text collections

Incorporation of robust methods from computational linguistics with the goal of creating a linguistic workbench, over and above a corpus workbench

Norwegian (Nynorsk) electronic text corpus: Application area

Editorial work within NO2014

Headword selection

Choice of examples

Examples are catalogued in the Meta Dictionary

Sense division

Firth: Knowing a word by the company it keeps.

Aided by the refined collection of examples

Norwegian (Nynorsk) electronic text corpus: Integration with the Meta Dictionary

Excerpts refined by

Methods from computational linguistics

Human interaction

Eventually a selection will be made for publication, but in the framework of the Meta Dictionary, even those that were excluded from publication will remain available for other application areas

Communication with the editing software through the Meta Dictionary

Norwegian (Nynorsk) electronic text corpus: Design

Representative corpus based on specifications produced by the EU language resources project LE-PAROLE

SGML markup in accordance with PAROLE’s specifications, based on TEI

One-to-one mapping between the PAROLE format and a database structure defined in Oracle
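
What a one-to-one mapping between markup and tables means in practice can be shown with a round trip: load the markup into tables, then rebuild identical markup from the rows. The sketch makes several assumptions: it uses a tiny well-formed fragment with the TEI element names `p`, `s` and `w` in place of real PAROLE SGML, the `sentence` and `word` tables are hypothetical, and `sqlite3` replaces Oracle.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Simplified stand-in for a marked-up corpus paragraph.
SAMPLE = "<p><s><w>Eg</w><w>les</w></s><s><w>Ho</w><w>skriv</w></s></p>"


def load(conn, markup):
    """Store each <s> and <w> element as one row in its own table."""
    conn.execute("CREATE TABLE sentence (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE word (sentence_id INTEGER, position INTEGER, form TEXT)"
    )
    for s_id, s in enumerate(ET.fromstring(markup).iter("s"), start=1):
        conn.execute("INSERT INTO sentence VALUES (?)", (s_id,))
        for position, w in enumerate(s.iter("w"), start=1):
            conn.execute(
                "INSERT INTO word VALUES (?, ?, ?)", (s_id, position, w.text)
            )


def dump(conn):
    """Rebuild the markup from the rows, in the stored order."""
    p = ET.Element("p")
    sentence_ids = conn.execute("SELECT id FROM sentence ORDER BY id").fetchall()
    for (s_id,) in sentence_ids:
        s = ET.SubElement(p, "s")
        words = conn.execute(
            "SELECT form FROM word WHERE sentence_id = ? ORDER BY position",
            (s_id,),
        )
        for (form,) in words:
            ET.SubElement(s, "w").text = form
    return ET.tostring(p, encoding="unicode")
```

If `dump` returns exactly the markup that `load` received, no information was lost in either direction, which is the property a one-to-one mapping is meant to guarantee.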

Norwegian (Nynorsk) electronic text corpus: Status

25,000,000+ words

Dag og Tid (newspaper)

21,000,000 words

Legacy data

approx. 5,000,000 literature

Existing agreements

Weekly deliveries from Dag og Tid

Samlaget (publishing house)

Syn og Segn (monthly magazine)

Norwegian (Nynorsk) electronic text corpus: The next steps

Application for access through the web, last quarter of 2002

Balancing the domains covered by the corpus: continuous

Stand-alone Windows application

Continuous incorporation of computational linguistics methods for phrase identification and extraction,