Data modeling and metadata

Page 1: Data modeling and metadata

DATA MODELING AND METADATA
From graphs to graphs

Page 2: Data modeling and metadata

Metadata
  • Full metadata: relational schemas
  • Self-defining data: XML, key/value, key/document
  • No metadata: untagged images, video, audio
  • Parallel metadata: tagged images, video, audio

Page 3: Data modeling and metadata

Full schema metadata
  • Origins:
      • Semantic networks in AI
      • Metadata mixed in with data
      • Objects (nodes in graph), has-a (arcs in graph), is-a (arcs in graph), types (nodes), subtypes (nodes)
  • Essentially a network with metadata and all instances of the metadata
  • Goal was to model knowledge of the real world, not to manage volumes of data
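The node-and-arc structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not something taken from the slides: the class name `SemanticNetwork`, the method names, and the car example are all assumptions. Note how types, subtypes, and instances all live in the same graph, which is the "metadata mixed in with data" point.

```python
from collections import defaultdict

class SemanticNetwork:
    """Hypothetical toy semantic network: nodes are strings, arcs are labeled."""

    def __init__(self):
        # arcs[label][source] -> set of target nodes
        self.arcs = defaultdict(lambda: defaultdict(set))

    def add(self, source, label, target):
        self.arcs[label][source].add(target)

    def is_a(self, node, ancestor):
        """Follow is-a arcs transitively from node, looking for ancestor."""
        seen, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.arcs["is-a"][current])
        return False

net = SemanticNetwork()
net.add("sedan", "is-a", "car")        # subtype -> type (metadata)
net.add("car", "has-a", "engine")      # has-a arc at the type level (metadata)
net.add("my_sedan", "is-a", "sedan")   # instance -> type (data in the same graph)
```

Here `net.is_a("my_sedan", "car")` walks two is-a arcs, one from instance to subtype and one from subtype to type, without distinguishing between them.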

Page 4: Data modeling and metadata

Early databases
  • Slow to adopt data structuring abstractions because speed of access was the focus
  • Hierarchical and network databases
      • Links from records of one file to records of another
      • E.g., each claim record is linked to a subscriber record
      • Also, sets of records and sets of links
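The claim-to-subscriber link can be sketched with direct object references standing in for record pointers. This is only an illustration of the linking idea, not how any particular hierarchical or network system stored its records; the names `Subscriber`, `Claim`, and `file_claim` are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Subscriber:
    name: str
    claims: list = field(default_factory=list)  # the set of linked claim records

@dataclass
class Claim:
    amount: float
    subscriber: Subscriber  # a direct link to the owning record, not a key value

def file_claim(subscriber, amount):
    """Create a claim record and link it in both directions."""
    claim = Claim(amount, subscriber)
    subscriber.claims.append(claim)
    return claim

joe = Subscriber("Joe")
first = file_claim(joe, 250.0)
second = file_claim(joe, 75.5)
```

Navigation follows the links: `first.subscriber` reaches the subscriber record, and `joe.claims` walks the set of claims. There is no declarative query, which is the access-speed-first design the slide describes.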

Page 5: Data modeling and metadata

Relational databases
  • First true abstraction of metadata separated from data
  • Minimal structure in order to accommodate fast retrieval of tuples
  • Abstractions
      • Relation
      • Attribute
      • Tuple
      • Primary keys (PKs), candidate keys (CKs), foreign keys (FKs), null/not null
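These abstractions can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumed example, not part of the slides: the `subscriber` and `claim` tables and their columns are hypothetical. It declares a primary key, a candidate key (via `UNIQUE`), a foreign key, and null/not-null attributes, then reads the schema back separately from the tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE subscriber (
    subscriber_id INTEGER PRIMARY KEY,        -- primary key
    member_no     TEXT NOT NULL UNIQUE,       -- candidate key
    name          TEXT NOT NULL
);
CREATE TABLE claim (
    claim_id      INTEGER PRIMARY KEY,
    subscriber_id INTEGER NOT NULL REFERENCES subscriber(subscriber_id),  -- foreign key
    amount        REAL                        -- nullable attribute
);
""")

def try_insert(sql):
    """Return True if the tuple is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO subscriber VALUES (1, 'M-0001', 'Joe')")
conn.execute("INSERT INTO claim VALUES (10, 1, 250.0)")

# The metadata is queryable on its own, independent of the stored tuples.
claim_columns = [row[1] for row in conn.execute("PRAGMA table_info(claim)")]
```

A claim that references a missing subscriber, a duplicate `member_no`, or a null `name` is rejected by the schema rather than by application code.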

Page 6: Data modeling and metadata

Concurrent with relational database development: “semantic” databases
  • Like semantic networks (quite deliberately), only with metadata separated from data
  • Not object-oriented
      • No object IDs
      • No classes instantiated from types
  • A wide variety of competing models, with “the” Semantic Model being one of them

Page 7: Data modeling and metadata

Semantic databases, continued
  • Other modeling notions
      • Components or aggregates that are necessary parts of an object and cannot be changed, like the day you were born or the VIN of a car
      • Versus properties or attributes that can be changed, like your name or the transmission in a car
      • Cause and effect relationships, such as a sales visit leading to a sale
      • And many other specialized relationships
  • Interestingly, no query facilities and no commercially successful systems

Page 8: Data modeling and metadata

Persistent programming languages
  • Not necessarily object-oriented
  • Host language is the only language
  • Data can be persistent or not, often selectively
  • Strong notion of metadata as programming data types

Page 9: Data modeling and metadata

Object-oriented databases
  • Strong notion of object ID and object identity
  • Types/subtypes and classes
  • Strong sense of metadata separate from data
  • Behavioral encapsulation

Page 10: Data modeling and metadata

Object-relational databases
  • Objects in the small
  • User-defined data types for attribute domains
  • No behavioral encapsulation

Page 11: Data modeling and metadata

One-of-a-kind semantically rich databases
  • Engineering/CAD data
  • Complex objects
  • Lots of singleton types, but with a strict notion of metadata
  • Complex constraints
  • Far-reaching component and constraint relationships

Page 12: Data modeling and metadata

One-of-a-kind scientific/medical/financial databases
  • Managing type-based, voluminous data with little internal structure (imaging)
  • Managing textual data with some structure and lots of domain-based terminology
  • Often there are real-time demands made on distributed databases (a very difficult problem)
      • By putting timing constraints on specific parts of the data processing code

Page 13: Data modeling and metadata

Self-defining data
  • Inspired by the need to stream data live and process it in one pass
  • Also inspired by the need to vary the structure of individual pieces of data, like documents and other items that don’t really have a shared type construct
  • XML developed as a shared language model for semi-structured (or self-defining) data
      • Developed in part to assist the construction of the semantic web
      • Data is streamed on the Internet or from sensors
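One-pass processing of self-defining data can be sketched with `xml.etree.ElementTree.iterparse` from the Python standard library. The sensor feed below is invented for illustration (the element and attribute names are assumptions). The second reading carries an extra `humidity` attribute that the others lack, which the consumer simply ignores because each element describes itself.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sensor feed; an in-memory buffer stands in for a live stream.
stream = io.BytesIO(b"""<readings>
  <reading sensor="a" celsius="20.5"/>
  <reading sensor="b" celsius="22.0" humidity="40"/>
  <reading sensor="a" celsius="21.5"/>
</readings>""")

def average_celsius(source):
    """Consume the document in a single pass, one element at a time."""
    total, count = 0.0, 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "reading":
            total += float(elem.get("celsius"))
            count += 1
            elem.clear()  # drop the element's attributes and children once used
    return total / count if count else None

average = average_celsius(stream)
```

No schema is consulted anywhere: the tags and attribute names travelling with the data are the only metadata.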

Page 14: Data modeling and metadata

Self-defining data, continued
  • NoSQL databases that store extremely high volumes of loosely structured data
      • Documents with internal structure
      • Values with no meaning within the database
  • Usually no formal query language, as data is interpreted programmatically (either partially or fully); sometimes there is a library of common query templates
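The idea of values that mean nothing to the database, with "queries" written as ordinary code, can be sketched with a plain dictionary. This is a toy stand-in, not the API of any real NoSQL product; `store`, `put`, and `find` are hypothetical names.

```python
import json

store = {}  # key -> opaque bytes; the "database" knows nothing about the contents

def put(key, document):
    """Store a document as an uninterpreted byte string."""
    store[key] = json.dumps(document).encode("utf-8")

def find(predicate):
    """Stand-in for a query: decode every value and filter in application code."""
    return sorted(key for key, raw in store.items() if predicate(json.loads(raw)))

# Documents need not share a structure.
put("doc:1", {"title": "Intro", "tags": ["xml"]})
put("doc:2", {"title": "Notes", "author": "someone"})
put("doc:3", {"title": "More", "tags": ["xml", "graphs"]})

tagged_xml = find(lambda doc: "xml" in doc.get("tags", []))
```

A reusable predicate such as the lambda above plays the role of the "query template" mentioned on the slide.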

Page 15: Data modeling and metadata

No metadata databases
  • Early blob and continuous data
      • Images
      • Video
      • Audio
      • Flash
  • All processing of data taking place in complex programs that do not retrieve metadata or insert metadata in the data
      • E.g., image processing, facial searching, language searching

Page 16: Data modeling and metadata

Recent blob/continuous data
  • Development of parallel metadata databases that contain low-level and semantically rich tagging
  • Only the metadata database is actively searched
  • Searching can be enhanced by downloading small samples
  • Feedback loops to improve tag interpretation
  • Tags taken from shared namespaces
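A parallel metadata database can be sketched as a tag index kept beside the blobs. The blob ids, the namespaces (`scene`, `object`), and the helper names below are all hypothetical. The point is that a search touches only the index and never opens the binary data.

```python
from collections import defaultdict

# Hypothetical blob store: ids mapped to uninterpreted binary content.
blobs = {
    "img-001": b"<binary image data>",
    "img-002": b"<binary image data>",
    "vid-007": b"<binary video data>",
}

tag_index = defaultdict(set)  # (namespace, tag) -> ids of blobs carrying the tag

def tag(blob_id, namespace, value):
    tag_index[(namespace, value)].add(blob_id)

def search(*tags):
    """Return ids of blobs carrying every requested tag; blobs are never read."""
    sets = [tag_index.get(t, set()) for t in tags]
    return sorted(set.intersection(*sets)) if sets else []

tag("img-001", "scene", "beach")
tag("img-001", "object", "dog")
tag("img-002", "scene", "beach")
tag("vid-007", "object", "dog")

beach_dogs = search(("scene", "beach"), ("object", "dog"))
```

Qualifying each tag with a namespace keeps identical tag words from different vocabularies apart.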

Page 17: Data modeling and metadata

Assertion-based databases
  • Usually use triples (assertions)
  • Triples are chained together to make new inferences
  • Metadata is treated like data
      • Joe owns a Ford
      • Fords are cars
  • SQL-like, triple-hopping query languages
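The slide's two assertions can be chained with a very small forward-chaining loop. This is a sketch under simplifying assumptions: the predicate names `owns` and `is-a`, the single inference rule, and the `match` helper are illustrative and are not the semantics of any particular triple store or query language.

```python
# The two assertions from the slide, stored as (subject, predicate, object).
triples = {
    ("Joe", "owns", "Ford"),   # data
    ("Ford", "is-a", "car"),   # metadata, stored exactly like data
}

def infer(facts):
    """Apply one rule until nothing new appears:
    (x, p, y) and (y, is-a, z)  =>  (x, p, z)."""
    facts = set(facts)
    while True:
        new = {(x, p, z)
               for (x, p, y) in facts
               for (y2, rel, z) in facts
               if rel == "is-a" and y2 == y}
        if new <= facts:
            return facts
        facts |= new

def match(facts, s=None, p=None, o=None):
    """Pattern query: None acts as a wildcard in any position."""
    return sorted(t for t in facts
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

closed = infer(triples)
joe_owns = match(closed, s="Joe", p="owns")
```

Chaining the two stored triples yields the new assertion that Joe owns a car, which was never entered directly.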

Page 18: Data modeling and metadata

Graph databases
  • Networks of objects that blur the boundary between data and metadata
  • Support levels of connectivity orders of magnitude bigger than in the network and hierarchical databases of old
  • Have a purpose that is reminiscent of network/hierarchical databases: to represent the fluid and highly interconnected nature of complex data, such as that collected from social media
  • Use graph-like query and programming interfaces
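A graph-style query is typically a traversal rather than a join. The sketch below uses a made-up social graph and a hypothetical `within_hops` helper to find everyone reachable within a given number of hops. It uses plain breadth-first search, not the query language of any specific graph database.

```python
from collections import deque

# Hypothetical undirected "knows" relationships, stored as adjacency sets.
edges = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}

def within_hops(graph, start, max_hops):
    """Breadth-first search returning {node: hop count} for nodes within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    dist.pop(start)
    return dist

near_ann = within_hops(edges, "ann", 2)
```

The same question in a relational system would need one self-join per hop, which is part of why highly connected data motivates graph interfaces.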

Page 19: Data modeling and metadata

Graphics/animation/gaming data
  • Shares a lot of properties with scientific and engineering data
  • Innately mathematical
      • Straight and curved line 2D geometry used in 3-space
          • Bezier and NURBS for curves
      • Matrix mathematics for 3D manipulation
          • Translate, Scale, Rotate
  • Mapping to pixel-based data for presentation
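The matrix mathematics for 3D manipulation can be sketched with 4x4 homogeneous matrices in plain Python. The function names are assumptions, and rotation is shown about the z axis only. Real pipelines would use a numerical library or the graphics hardware instead.

```python
import math

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translate(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m

def scale(sx, sy, sz):
    m = identity()
    m[0][0], m[1][1], m[2][2] = sx, sy, sz
    return m

def rotate_z(theta):
    """Counterclockwise rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    m = identity()
    m[0][0], m[0][1] = c, -s
    m[1][0], m[1][1] = s, c
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, point):
    """Transform a 3D point by treating it as the column vector (x, y, z, 1)."""
    v = (*point, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Composition reads right to left: scale first, then rotate, then translate.
combined = matmul(translate(10, 0, 0),
                  matmul(rotate_z(math.pi / 2), scale(2, 2, 2)))
moved = apply(combined, (1.0, 0.0, 0.0))
```

Because the three operations collapse into one matrix, an entire mesh can be transformed with a single multiplication per vertex.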

Page 20: Data modeling and metadata

Graphics/animation/gaming, continued
  • For real-time rendering, low-polygon objects and bounding box collision mathematics are used
  • Creates the most aggressive demands on processing and graphics card technology
  • Often no notion of metadata at all
  • Even non-real-time animation demands low-quality interactive rendering
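Bounding box collision mathematics is cheap because it reduces to a few comparisons per axis. The sketch below assumes axis-aligned boxes. The `AABB` class and `overlaps` function are illustrative names, and real engines often add finer checks after this coarse test.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    min_corner: tuple
    max_corner: tuple

def overlaps(a, b):
    """Two boxes collide only if their extents overlap on all three axes."""
    return all(a.min_corner[i] <= b.max_corner[i] and
               b.min_corner[i] <= a.max_corner[i]
               for i in range(3))

player = AABB((0, 0, 0), (1, 2, 1))
crate = AABB((0.5, 0, 0.5), (1.5, 1, 1.5))
far_wall = AABB((10, 0, 0), (11, 3, 5))
```

The test ignores the actual polygons inside each box, trading accuracy for the speed that real-time rendering needs.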

Page 21: Data modeling and metadata

Procedural data
  • Used heavily in photo/video processing
      • Focusing, removing objects, adding color effects, changing lighting, etc.
      • There are standalone apps and plugin products
  • Used heavily in animation
      • Procedural textures and materials that don’t need to be tiled
      • Environment procedures (often sun and sky)
      • Cloning to make crowds
      • Lighting and camera objects
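A procedural texture is a function evaluated at each surface coordinate rather than a stored image, which is why it never needs to be tiled. The checkerboard below is a deliberately simple, assumed example. The function name and the `squares_per_unit` parameter are hypothetical and not taken from any product.

```python
import math

def checker(u, v, squares_per_unit=4):
    """Return 0 or 1 for any (u, v) coordinate; the pattern is computed, not stored."""
    return (math.floor(u * squares_per_unit) +
            math.floor(v * squares_per_unit)) % 2

# The "data" is the procedure itself: it covers an unbounded surface at any
# resolution without repeating a stored bitmap.
row = [checker(x / 8, 0.1) for x in range(8)]
```

Nothing in the function states what the pattern means, which hints at why attaching metadata to procedural data is hard.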

Page 22: Data modeling and metadata

Metadata for procedural data
  • Big problem
      • Difficult to crisply define the “meaning” of procedural data
      • Often, the reason procedural data exists is that the task is too complex
      • This sort of data is often inherently non-declarative
      • The marketplace is filled with competing, varying products, each with its own interface, and they are too powerful to scrap

Page 23: Data modeling and metadata

Procedural data, continued
  • Mathematical packages used for mining
      • Almost ironically, these are somewhat easier to package declaratively, since the mathematics can be so complex that its foundation is used in a black box fashion