Why SGML (Retro Alert 1995)


DESCRIPTION

A presentation developed and delivered in 1995, designed as part of a larger introduction to SGML. It is interesting today because it foregrounds many (if not all, and perhaps a few extra) of the themes now raised in discussions of Intelligent Content. It is shared here in case anyone thinks these ideas are new.


(1995)

Why SGML?

The Need for SGML

[Title-slide diagrams: a document structure (title, sub-title, para, figure, list) marked with +, ? and *; a Course composed of Modules; and a knowledge / information / data hierarchy]

First delivered: 1995

www.gollner.ca

What is SGML?

SGML stands for the Standard Generalized Markup Language

SGML is an international (ISO) standard

ISO 8879:1986 Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML)

What is SGML? Informal Definitions

SGML is a system- and processing-independent means of representing, creating, managing and exchanging information.

SGML is an “intelligent markup language” that protects the accessibility, usability, life expectancy and value of information.

Why SGML? A Meditation on a Paper Clip

The paper clip is a low-tech version of hypertext, facilitating the physical association of documents and fragments.

It is often used in addition to electronic files, where such associations cannot easily be shown or enforced.

SGML was created to better manage documents

Publications

Training Manuals

Specifications

Documentation

Reports

Correspondence

Policies

Procedures

Standards

Plans

Directives

Commentaries

Proposals

Most Information is held in Documents

Document Information Database Information

10% 90%

IM Budget

Allocations 90% 10%

Structured Database Information

Formalized Processes

Relational Structure

Strict Definitions

Limited Access

Stable Organizational Boundaries

Limited Flexibility

Document Information

A Document is a meaningful organization of Information

A Document is meaningful because it is communicated between people to achieve specific goals

A Document combines multiple media types together in an organized, but not strictly predictable, form that people can use

Document Information Features

Multiple Dynamic Processes

Wide and Variable Access

Hierarchical Structure

Variable Definitions

Variable Organizational Boundaries

Document Information Conclusions

Document Information does not fit within the conventional Database paradigm

Database Information is organized according to the needs of the Computer

Document Information is organized according to the needs of the User

Few of the assumptions within the Database Paradigm apply to Documents

Document Management Technology Today

Documents and Computers

Computers help us create more paper faster

Computers help us format printed documents more efficiently and at less cost

Computers have not helped with the management consequences

The Document Explosion

The volume of documents is growing exponentially

The visibility of document-based transactions is increasing

The rise of the Internet and Enterprise Integration dramatically alters the potential user community of a document

Documents are becoming more complex, larger and more varied in format

Management Breakdown

Traditional Records Management practices and technologies cannot cope with the volume, complexity, or volatility of computer-generated documents

The typical response has been to extend the Database paradigm to document information

Given the technology currently in use, the best that can be done is the “Electronic Filing Cabinet” (old tools made electronic - again)

What’s Wrong

Computers traditionally store documents as “objects”

Computers know very little (almost nothing) about these objects:

  some management information (author, version, date)

  little awareness of document content

  less awareness of document structure

Computers can only associate some information with the objects as the objects have no inherent “intelligence”

New Technologies

Applications have evolved to redress some of these shortcomings

“Electronic Filing Cabinets” associate management information with document objects and physically control events

Full-Text Retrieval technologies have been used to access Document “Content”

Word Processors are used to infer the structure of documents based on format (styles and templates)

Electronic Filing Cabinets

In an “Electronic Filing Cabinet” environment, management information is associated with these “objects”

Document objects that leave the sphere of control are no longer managed

[Figure: document objects and the Sphere of Control]

Full-Text Retrieval

Create external indices of the textual content of a document

Various text indexing algorithms are used to support searches by word, by text string, by proximity, by exclusion and so on

Useful, but less precise as document volume increases

New technologies are arising to improve search precision (lexicon-based, links to metadata)
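The kinds of search listed on this slide can be illustrated with a small external index. What follows is only a minimal sketch, not a description of any 1995 product: the function names (`build_index`, `search_word`, `search_excluding`, `search_near`) and the three sample documents are assumptions made up for the illustration.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Build an external index: word -> {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(pos)
    return index

def search_word(index, word):
    """Search by word: ids of all documents containing the word."""
    return set(index.get(word.lower(), {}))

def search_excluding(index, word, excluded):
    """Search by exclusion: documents with `word` but without `excluded`."""
    return search_word(index, word) - search_word(index, excluded)

def search_near(index, first, second, max_gap):
    """Search by proximity: both words occur within `max_gap` positions."""
    hits = set()
    for doc_id in search_word(index, first) & search_word(index, second):
        a = index[first.lower()][doc_id]
        b = index[second.lower()][doc_id]
        if any(abs(i - j) <= max_gap for i in a for j in b):
            hits.add(doc_id)
    return hits

# Hypothetical sample documents, invented for this sketch.
DOCS = {
    "memo": "The training manual describes the approval procedure.",
    "spec": "This specification is subject to approval. "
            "A separate manual covers training.",
    "plan": "The plan lists each procedure without a manual.",
}
INDEX = build_index(DOCS)
```

In this toy collection a plain search for "manual" matches all three documents, although they concern different things; that is the imprecision the slide refers to, and it grows with volume. Proximity and exclusion narrow the result, but the index still knows nothing about what role a word plays in a document.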

Word Processors

Evolving to include basic management information (profiles)

Evolving to include template structures (document types)

Management and structural information is only accessible through the Word Processor application (directly or via API)

These new Word Processing features are not generally used

Proprietary Documents

The basic problem is that traditional documents are produced and maintained in a proprietary and non-intelligent format

Electronic Documents are simply paper documents in a more reproducible form:

  Electronic Documents are printed for use

  People retain and use hardcopy “files”

New Applications still assume a static environment and single format use

Proprietary Formats

Word Processing applications offer an enhanced implementation of the typewriter, the copy editor and the typesetter

Word Processing applications:

  Add formatting instructions to text

  Execute formatting instructions to produce an output (operating system and printer interface)

Formatting Instructions are specific to the application that created them and the platform on which they were created

Procedural Markup Processing Instructions

[Figure: a sample page (Chapter Title, Section Title, page 1) annotated with the formatting instructions 12 pt. bold Helvetica, 10 pt. bold Helvetica, 8 pt. Times on 10 pt. leading, and 7 pt. Helvetica bold]

Proprietary Markup Typical of Word Processors

[Center][Und On]SGML[Und Off][Hrt]
[Hrt]
[Font: Helvetica 10pt]
[Indent]Introduction[Hrt]
[Hrt]
[Font: Times Roman 8pt]
[Tab]Someday [Italic On]information
[Italic Off] will be free.[Hrt]

[Callouts on the sample: Position, Style, Font]
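To make the point of this slide concrete, the sample can be split into codes and text and each code sorted under one of the slide's three labels. This is a sketch under stated assumptions: the code stream is copied from the slide, but the assignment of individual codes to Position, Style or Font is a guess based on those labels, and the names `classify` and `tokenize` are invented for the illustration.

```python
import re

# The code stream shown on the slide, joined into one string.
SAMPLE = (
    "[Center][Und On]SGML[Und Off][Hrt]"
    "[Hrt]"
    "[Font: Helvetica 10pt]"
    "[Indent]Introduction[Hrt]"
    "[Hrt]"
    "[Font: Times Roman 8pt]"
    "[Tab]Someday [Italic On]information"
    "[Italic Off] will be free.[Hrt]"
)

# Assumed grouping of the codes under the slide's labels.
POSITION = {"Center", "Indent", "Tab", "Hrt"}
STYLE = {"Und On", "Und Off", "Italic On", "Italic Off"}

def classify(code):
    """Return the assumed category of one bracketed formatting code."""
    if code.startswith("Font:"):
        return "font"
    if code in POSITION:
        return "position"
    if code in STYLE:
        return "style"
    return "unknown"

def tokenize(stream):
    """Split the stream into (kind, value) pairs of codes and plain text."""
    tokens = []
    for match in re.finditer(r"\[([^\]]+)\]|([^\[]+)", stream):
        code, text = match.groups()
        if code is not None:
            tokens.append((classify(code), code))
        else:
            tokens.append(("text", text))
    return tokens

TOKENS = tokenize(SAMPLE)
KINDS = {kind for kind, _ in TOKENS}
PLAIN_TEXT = "".join(value for kind, value in TOKENS if kind == "text")
```

Every code in the sample falls under position, style or font. Nothing in the stream says that "SGML" is a title or that "Introduction" is a heading; a program can recover the words and how they should look, but not what they are.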

Binary Storage Formats: Highly Proprietary and Optimized for Performance

[Figure: raw dump of a word processor's binary file, beginning with the signature "WPC". It is unreadable apart from embedded font and printer names such as Courier 12pt (10cpi), CG Times (WN), Univers (WN), and HP LaserJet III (HPLASIII.WRS)]

Proprietary Documents

Are proprietary to the originating software

Limit or obstruct cross-platform interchange

Are non-intelligent:

  provide no consistent mechanism to determine document context, content, or structure

  provide no means to enhance automation

Support only one output rendering (print)

Will become obsolete

Information in an obsolete format is itself obsolete!

Portability Problems: Paper remains the format for Document Interchange

Low Document Intelligence: Marginal Automated Support for Business Processes

Lack of Document Intelligence prevents computers from providing effective document management or workflow support

Paper remains the working medium

[Figure: a document with Approval and Review steps]

Single Output Formats Create Additional Costs

[Figure: a WP document in Proprietary Formatting produces Printed Documents; reaching CD ROM, WWW, or a Database each requires a separate Conversion ($)]

Obsolescence: Information must survive when Products become obsolete

MultiMate

WPS Plus

DisplayWrite

Lotus Manuscript

Lanier

Wang

Mass-11

WPS-8

CPT

Word-11

NBI Legend

XyWrite

Where are they now?

Summary

Traditional computing technology and management practices are failing to cope with the increasing volume of documents

Non-Intelligent, Proprietary document formatting restricts document manageability, portability, utility, quality, affordability, suitability for multi-format publishing, and longevity.

Business is therefore conducted on paper!

Are your information assets frozen in Proprietary Formats?
