V3NLP Framework dbAnnotation Database Schema (created 12/2011; revised 10/09/2012, 10/18/2012, 10/23/2012, 10/25/2012)


Tables (original)

document
    document_id BIGINT
    referenceSystem VARCHAR(120)
    referenceLocator VARCHAR(120)

documentAnnotations
    documentAnnotation_id BIGINT
    document_id BIGINT
    annotation_id BIGINT

annotation
    entityDefinition_id BIGINT

entityDefinition
    Name VARCHAR(120)
    provenance VARCHAR(120)

span
    span_id BIGINT
    filter VARCHAR(50)
    startOffset INTEGER
    endOffset INTEGER

feature
    feature_id BIGINT

featureElementText
    featureElement_id BIGINT
    feature_id BIGINT
    value VARCHAR(6300)


Annotation Notes

• There is a one-to-one relationship between rows in the documentAnnotations table and the annotation table.
• It is recognized that these tables should be folded into one table. There is an explanation for why they are not.
• Resources, including annotation admin and chart reader, use a schema similar to this. dbAnnotation's schema is meant to be isomorphic with the schemas for annotation admin and chart reader.
• These two tables mirror external tables set up for other tools within VINCI's data.
• Chart reader and annotation admin allow for an annotation that spans across documents. Under such circumstances, a single annotation_id would have more than one documentAnnotation_id.
• The dbAnnotation schema does not handle this pathological circumstance, resulting in the one-to-one relationship rather than the n-to-1 relationship in the other schemas.
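The one-to-one relationship described in these notes could be enforced at the database level. The following is a minimal sketch, not the framework's actual DDL: it assumes SQLite (via Python's sqlite3 module) as a stand-in engine, and it assumes a UNIQUE constraint on annotation_id, which the schema itself does not state.

```python
import sqlite3

# In-memory SQLite database used purely for illustration (assumption:
# the notes do not say which database engine dbAnnotation runs on).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE annotation (
    annotation_id       BIGINT PRIMARY KEY,
    entityDefinition_id BIGINT
);
CREATE TABLE documentAnnotations (
    documentAnnotation_id BIGINT PRIMARY KEY,
    document_id           BIGINT,
    -- Assumed constraint: UNIQUE makes the relationship one-to-one.
    annotation_id         BIGINT UNIQUE REFERENCES annotation(annotation_id)
);
""")
conn.execute("INSERT INTO annotation VALUES (1, 10)")
conn.execute("INSERT INTO documentAnnotations VALUES (100, 7, 1)")


def link_again():
    """Try to attach annotation 1 to a second document.

    Returns True if the database rejects the second link.
    """
    try:
        conn.execute("INSERT INTO documentAnnotations VALUES (101, 8, 1)")
    except sqlite3.IntegrityError:
        return True
    return False


rejected = link_again()
link_count = conn.execute(
    "SELECT COUNT(*) FROM documentAnnotations").fetchone()[0]
```

Dropping the UNIQUE keyword would allow the n-to-1 case that chart reader and annotation admin permit, where one annotation spans several documents.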

documentAnnotations
    document_id BIGINT

Annotation [see notes]
    entityDefinition_id BIGINT

Additional Tables (revised)

Corpus [see notes]
    corpus_id BIGINT
    document_id BIGINT
    run_id VARCHAR(20)
    documentName VARCHAR(120)
    documentTitle VARCHAR(120)
    patient_id VARCHAR(20)
    tiu_id VARCHAR(20)

annotationConceptIndex [see notes]
    corpus_id BIGINT
    document_id BIGINT
    run_id VARCHAR(20)
    tiu_id VARCHAR(20)
    patient_id VARCHAR(20)
    documentTitle VARCHAR(120)
    startOffset INTEGER
    endOffset INTEGER
    annotation_name VARCHAR(60)
    content VARCHAR(2100)
    negationStatus VARCHAR(20)
    sectionName VARCHAR(40)
    conceptNames VARCHAR(160)
    cuis VARCHAR(12)
    semanticTypes VARCHAR(20)
    semanticGroups VARCHAR(20)
    featureNames VARCHAR(2100)
    featureValues VARCHAR(2100)

Corpus Notes

• This table is needed to track the same document through the same software multiple times, as when the software gets revised.
• Document name is equivalent to the reference locator in the document table, but is filled out only with the full path to the location of the document. (The reference locator might be filled out with the query that created the record.)
• tiu_id is the record id from the table (TIU_NOTES) from which the document came. This might differ from the document name.
• patient_id is the link to groups of documents. It is not propagated to the normalized table, to keep a firewall between potentially de-identified records and patient-sensitive data.
• documentTitle is a slot for the document title, if known.
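To illustrate how the Corpus table can track one document through several runs of the software, here is a minimal sketch using Python's sqlite3 module. The column list follows the Corpus table in this document; SQLite as the engine, the sample values (run ids, path, patient and TIU ids), and the helper name runs_for_document are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE corpus (
    corpus_id     BIGINT PRIMARY KEY,
    document_id   BIGINT,
    run_id        VARCHAR(20),
    documentName  VARCHAR(120),
    documentTitle VARCHAR(120),
    patient_id    VARCHAR(20),
    tiu_id        VARCHAR(20)
)""")

# Hypothetical data: the same document (document_id 42) processed twice,
# once per software run, giving one corpus row per (document, run) pair.
rows = [
    (1, 42, "run-A", "/hypothetical/path/note42.txt", None, "P1", "T42"),
    (2, 42, "run-B", "/hypothetical/path/note42.txt", None, "P1", "T42"),
]
conn.executemany("INSERT INTO corpus VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def runs_for_document(document_id):
    """Return the run ids under which a document was processed, oldest first."""
    cur = conn.execute(
        "SELECT run_id FROM corpus WHERE document_id = ? ORDER BY corpus_id",
        (document_id,),
    )
    return [r[0] for r in cur]
```

Calling runs_for_document(42) on this sample data lists both runs, which is the kind of lookup the table is meant to support when the software is revised and rerun.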

Corpus [see notes]
    corpus_id BIGINT
    document_id BIGINT
    run_id VARCHAR(20)
    documentName VARCHAR(120)
    tiu_id VARCHAR(20)

annotationConceptIndex Notes

• This table is a flattened view of the corpus for information retrieval purposes.
• One row per annotation, and one table for query purposes.
• It is just one of a number of indexes/views that could be made from the normalized tables.
• Includes patient and tiu ids.
• One-to-one relationship between corpus, document, and run id.
• The (normalized) text between offsets is represented in this table within the content field.
• Annotation names will contain labels that are kinds of concepts, for example Symptom.
• Includes slots for documentTitle and sectionName.
• Concept attributes are represented as explicit fields, including conceptNames, cuis, semanticTypes, and semanticGroups.
• Concept attributes are pipe-delimited fields.
• featureNames is a pipe-delimited string, with each field being a feature name, as a catch-all for other attributes.
• featureValues is a pipe-delimited string, with each field being a feature value, as a catch-all for other attributes.
• One-to-one correspondence between the feature name and feature value fields.
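The pairing of featureNames and featureValues can be sketched as follows. This is an illustrative helper, not part of the v3NLP tools: the function name parse_features and the sample feature names are hypothetical, and the sketch assumes that individual names and values never contain the pipe character themselves.

```python
def parse_features(feature_names, feature_values, sep="|"):
    """Pair pipe-delimited feature names with their pipe-delimited values.

    Relies on the one-to-one correspondence between the two fields:
    the i-th name belongs to the i-th value.
    """
    names = feature_names.split(sep) if feature_names else []
    values = feature_values.split(sep) if feature_values else []
    if len(names) != len(values):
        raise ValueError(
            "featureNames has %d fields but featureValues has %d"
            % (len(names), len(values))
        )
    return dict(zip(names, values))


# Hypothetical row content from annotationConceptIndex.
features = parse_features("certainty|temporality", "definite|current")
```

Raising an error on a length mismatch is a design choice made here for the sketch; it surfaces rows where the one-to-one correspondence has been broken instead of silently dropping trailing fields.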


annotationConceptIndex [see notes]
    corpus_id BIGINT
    document_id BIGINT
    run_id VARCHAR(20)
    tiu_id VARCHAR(20)
    startOffset INTEGER
    endOffset INTEGER
    cuis VARCHAR(160)





View to be created from dbAnnotation to annotations-dbd

• The annotations-dbd schema is an agreed-upon schema for interoperability between several systems at the Salt Lake City VA, including annotationAdmin and ChartReader.

• When the need arises, a database view can be created to make dbAnnotation look like the annotations-dbd tables, preserving interoperability between systems.
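As a rough illustration of such a view, the sketch below exposes the dbAnnotation featureElementText table under annotations-dbd column names, using the field correspondence given in this document's compatibility listing (featureElement_id to id, value to text_value). SQLite is assumed as a stand-in engine, and the view name dbd_feature_element_text and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- dbAnnotation table, with the fields listed in this document.
CREATE TABLE featureElementText (
    featureElement_id BIGINT PRIMARY KEY,
    feature_id        BIGINT,
    value             VARCHAR(6300)
);

-- Hypothetical view presenting the same rows with annotations-dbd
-- column names (id, text_value).
CREATE VIEW dbd_feature_element_text AS
SELECT featureElement_id AS id,
       value             AS text_value
FROM featureElementText;
""")

# Hypothetical sample row.
conn.execute("INSERT INTO featureElementText VALUES (1, 5, 'negated')")
row = conn.execute(
    "SELECT id, text_value FROM dbd_feature_element_text").fetchone()
```

Because a view stores no data of its own, tools that expect the annotations-dbd layout could read through it while the v3NLP tools keep writing to the dbAnnotation tables.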

Tables (revised)

document
    document_id BIGINT
    referenceSystem VARCHAR(120)
    referenceLocator VARCHAR(120)

documentAnnotations
    document_id BIGINT

Annotation [see notes]

entityDefinition
    Name VARCHAR(120)
    provenance VARCHAR(120)

span
    span_id BIGINT
    filter VARCHAR(50)
    startOffset INTEGER
    endOffset INTEGER

feature
    feature_id BIGINT

featureElementText
    featureElement_id BIGINT
    feature_id BIGINT
    value VARCHAR(6300)


Corpus [see notes]
    corpus_id BIGINT
    document_id BIGINT
    run_id VARCHAR(120)
    documentName VARCHAR(120)
    tiu_id VARCHAR(20)


annotationConceptIndex [see notes]
    corpus_id BIGINT
    document_id BIGINT
    run_id VARCHAR(20)
    tiu_id VARCHAR(20)
    startOffset INTEGER
    endOffset INTEGER
    cuis VARCHAR(160)

annotations-dbd Schema

Compatibility with the annotations-dbd schema

Annotations-dbd table: analyte_reference (field: id)
dbAnnotations table: document (fields: document_id, run_id)
Compatibility notes: Both have reference_system and reference_locator fields. v3NLP tools do not fill these fields out. The annotations-dbd schema does not have a run_id.

Annotations-dbd table: annotation_analyte_reference (field: analyte_reference_id)
dbAnnotations table: documentAnnotations (field: documentAnnotation_id)
Compatibility notes: Both have the field filter. v3NLP tools do not fill this field out.

Annotations-dbd table: span (field: id)
dbAnnotations table: span (field: span_id)
Compatibility notes: Offsets are long in the annotations-dbd schema but int in the dbAnnotations schema.

Annotations-dbd table: annotation (fields: id, resource_id)
dbAnnotations table: annotation (fields: annotation_id, entityDefinition_id)

Annotations-dbd table: reference (fields: id, uri)
dbAnnotations table: entityDefinition (fields: entityDefinition_id, provenance)

Annotations-dbd table: feature (fields: id, resource_id)
dbAnnotations table: feature (fields: feature_id, entityDefinition_id)
Compatibility notes:
1. Annotations-dbd contains a parent id field not replicated in the dbAnnotations schema.
2. The annotations-dbd feature table can reference other features. V3NLP tools have not implemented this relationship.

Annotations-dbd table: feature_element_text (fields: id, text_value)
dbAnnotations table: featureElementText (fields: featureElement_id, value)
Compatibility notes: The feature_id, resource_id pair is redundant and not replicated in the dbAnnotations schema.

Annotations-dbd table: feature_element_numeric
dbAnnotations table: [TBD]

Annotations-dbd table: feature_element_blob
dbAnnotations table: [TBD]

