Formatting standards for document interchange

Joan Smith discusses the work of the ISO in preparing a document layout standard

This paper describes the current work of ISO/TC 97/SC 18/WG 3 in the writing of a multipart standard on 'Information processing - text preparation and interchange - text structures', which is approaching the status of a draft proposal. The purpose of the standard is to facilitate the interchange of office documents, be they reports, invoices, letters or memoranda, where the layout of the document is preserved so that it is displayed on the receiving device as intended by the author, and that its logical structure supports further processing, be it editing, sorting or indexing. This methodology is intended to allow unambiguous description of all operations, chiefly creation, presentation, communication, receipt, distribution, filing, handling and processing. It is specified in the (currently) four parts of the standard (general introduction, office document architecture, document description and office document interchange format), which are presented in the paper. The architecture allows for single-mode and mixed-mode working, taking into account such services as teletex, videotex and facsimile, where documents may be composed of character box, geometric or photographic elements, or an admixture thereof. It is pertinent to all those who wish to interchange and store textual documents where such interchange may be local, national, or international.

Keywords: office automation, document transmission formatting, standards

National Computing Centre, Oxford Road, Manchester M1 7ED, UK

Word processors started to come onto the market many years ago, and as they became more sophisticated and more of them became available the stage was reached when most large offices considered buying a word processor to replace the typewriter. Of course, a printer was also required, and in some cases, because reports often contained diagrams, a graphics device was also purchased. Then there is facsimile, and provision will shortly exist for what is described as group 4 facsimile, which can interface with other modes to provide for multimedia documents. These have to be indexed and stored for subsequent retrieval, and it may be necessary to edit the documents further, arrange for copies to be sent elsewhere and so on. Thus the office environment of yesterday is slowly changing and moving towards the electronic office.

Since these devices are all electronic, it is not unreasonable to think of connecting them, and since communications technology has been developing apace, it should surely be possible to interchange documents locally, even with the next office; nationally, perhaps with another branch; internationally, in the case of a multinational concern; or with colleagues in different environments and with equipment of different vendors. All this is perfectly feasible, and could be possible if only there were the required standards, without which it will remain but a pipe dream.

Standards development has recognized this situation and has been working towards the means of creating an environment where intercommunication of word processors and workstations becomes viable. If manufacturers implement the standards this will become possible, and if users specify conformance with standards when procuring new equipment, it will become reality, or, to be more precise, it can become reality.

BACKGROUND TO STANDARDS WORK

The International Organization for Standardization (ISO) has been working on interconnection via Subcommittee 16, which deals with open systems interconnection (OSI), within Technical Committee 97, responsible for all aspects of information processing. This standard, or to be more precise, series of standards, is now nearing publication, its parts being at various stages, from a fully fledged international standard in the case of ISO 7498, 'Information processing - open systems interconnection - a basic reference model', via draft international standards, to draft proposals. These specify interconnection by means of protocols or sets of rules.

0140-3664/84/040171-10$03.00 © 1984 Butterworth & Co (Publishers) Ltd.

vol 7 no 4 august 1984

Allied to this, at the application level, there is the work of Subcommittee 18, which deals with text preparation and interchange, this being achieved through its working groups:

• WG 1 - user requirements,
• WG 2 - symbols and terminology,
• WG 3 - text structures,
• WG 4 - procedures for text interchange,
• WG 5 - text preparation and presentation.

The remit of WG 1 is to make known the user requirements to WGs 3, 4 and 5, while the projected output from WG 2 is a glossary to which the other WGs are to contribute. Only WGs 3, 4 and 5 will produce standards. WG 3 has the task of defining the structures of the text and their representation within the data stream, WG 4 deals with the messaging aspects, and WG 5 effectively takes the data stream and positions the text on the rendition media.

Since any standards relating to this work must necessarily be internationally recognized and internationally adopted in general, member countries of the ISO are devoting all their effort to ISO work, and are not developing separate standards for the same subject. The British Standards Institution's (BSI) OIS/18 (comparable to ISO/TC 97/SC 18) has working parties which provide input to their opposite number, for example WP 3 to WG 3.

In addition to member bodies of the various countries concerned, WG 3 also liaises with Technical Committee 29 of the European Computer Manufacturers' Association (ECMA) and study group (SG) VIII of the International Telegraph and Telephone Consultative Committee (CCITT). The CCITT's SG VIII has produced a draft recommendation T.73, 'Document interchange protocol for the telematic services', to be presented at the end of the four-year study period at the general assembly to be held in November 1984, whereas ECMA's TC 29 is preparing a standard which it hopes to have ratified at a general assembly meeting before the end of 1984. WG 3 submitted draft proposals of the multipart standard to the SC 18 plenary session in April 1984, this being but the first stage in the process of publication, where a draft proposal (DP) in time becomes a draft international standard (DIS), when comments are taken into account and it is voted upon prior to being published as an international standard.

When WG 3 was set up in April 1981, its remit was

to standardize the elements of formatted text and those in a logical structure of processable text (e.g. suitable for manipulations such as editing, sorting, indexing), and to standardize the sequences of text elements.

The purpose of the standard was to define a methodology to describe a generalized text structure which would take into account the logical aspects of the rendition media. The document went on to state that this was intended to allow unambiguous description of all operations that may be performed on the text, chiefly creation, presentation, filing, communication, receipt, distribution, handling and processing.

This is being achieved through the (currently) four parts of the multipart standard:

• Part 1 - general introduction,
• Part 2 - office document architecture,
• Part 3 - document description,
• Part 4 - office document interchange format.

It is designed to be expansible, so that other parts may be added in the future. The extent to which it is achieving the objectives may be judged from the following sections.

PROPOSED MULTIPART STANDARD

General introduction

This draft proposal has yet to be allocated a number, which will remain with it even when upgraded. Nevertheless, Part 1, the general introduction of 'Information processing - text preparation and interchange - text structures', does exist. It was further refined by an ad hoc working group which met in Paris in December 1983, then presented to WG 3 at its meeting in February 1984 in Geneva, before transmission to SC 18. It provides a framework for the multipart standard and acts as a carrier for the definitions which have been written by the ad hoc group. Terms used in a non-dictionary sense are extracted by members of the group from current output documents, a copy of the definitions being made available to WG 2 for the glossary, and also to ECMA and the CCITT. It is expected that conformance clauses will be added later, possibly as a result of output from an SC 18 ad hoc group meeting due to take place in November 1983 on functional classes of documents. At present, it contains just one clause dealing with an implementation of document description.

The introduction states that the purpose of the standard is to facilitate the interchange of office documents, providing for their representation in such a way as to enable the documents to be reproduced as intended by the sender, and to facilitate their processing by the recipient, where the interchange is by means of data communications or the exchange of storage media. Its field of application defines office documents as items such as memoranda, letters, forms and reports, including pictures and tabular material, where the graphic elements used within the documents include character box, geometric and photographic elements, potentially all within one document. The model is designed to be expansible to encompass other types of documents and elements, such as digitized sound.

computer communications

It will continue to be updated and refined as the other parts progress, gaining, for example, more references to published standards, further definitions and more conformance clauses, so that manufacturers can supply equipment conforming to the relevant parts of the standard, depending on the device and its degree of sophistication.

Office document architecture

The office document architecture (ODA) provides the definition of an abstract document model intended for representation of office documents, and the description of the model's constituent parts. These are organized into two hierarchical structures, the logical structure and the layout structure, which are linked by layout directives. The content of the document consists of character box, geometric or photographic elements, the logical structure relating the content to logical text objects, such as headings, sections, paragraphs, figures and footnotes. It is the layout structure that relates the content to its positioning and rendition on the various presentation media, whatever these might be, the layout directives expressing the layout requirements of the logical objects.

A conscious effort has been made to allow interfacing with other international standards for text processing and interchange, and ODA supports the incorporation of subarchitectures within the content which are in accordance with other standards, for example those specific to character box, geometric or photographic elements. The architecture identifies and interrelates objects which may contain components with different subarchitectures, thus providing for mixed-mode documents.

Documents can be regarded as being members of different classes, examples being memorandum, letter or report, where any class may be defined by specifying a set of common properties. These properties can be the types of logical text objects that may occur and the relationships between them, predefined layout components and common portions of content. Their aim is to maintain the consistency of the document within the definition of the class, should the document be modified, and to facilitate the creation of documents within a class.

Those structures which are common to a class are called generic logical and layout structures, the relationships between them being generic layout directives. Those relating to a single instance are called specific logical and layout structures, the relationships between them being specific layout directives.

The generic logical structure may be regarded as a set of rules from which specific logical structures may be derived. It is unnecessary to interchange the generic structure each time that a specific instance of that class is interchanged. In this case, only one or more external references to the generic structure are required. This reduces potential overheads in the transmission of unnecessary information, with the added implication of interpretation by the receiving device.

The specific logical structure is basically a tree structure in which the number of its hierarchical levels depends on the application. Nodes of the tree are called logical text objects, the terminal nodes being called basic text objects. The content of a basic text object is of a single category of graphic element, be it character box, geometric or photographic, and depending on the category, it may have a more detailed structure. A basic text object of character box elements, for example, may be subdivided into sentences or words.

Those nodes below the level of document and above that of basic text objects are called composite text objects, examples of which are chapters, sections, paragraphs, abstracts, figures and footnotes, the application determining which of these actually constitute hierarchical levels in the logical structure of a document. The generic logical structure can define and control the application-dependent identification of text objects in the specific logical structure, their hierarchical relations and permissible sequences.
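The tree just described can be pictured with a small data structure. The following is only a minimal sketch, not a form taken from the draft standard: the class name `LogicalObject`, its fields and the rule that composite objects hold no content of their own are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Content categories named in the text; the string labels are illustrative.
CATEGORIES = {"character box", "geometric", "photographic"}


@dataclass
class LogicalObject:
    """A node of a specific logical structure (hypothetical class name)."""
    kind: str                                   # e.g. "report", "section"
    children: List["LogicalObject"] = field(default_factory=list)
    category: Optional[str] = None              # set only on basic text objects
    content: str = ""

    def is_basic(self) -> bool:
        # Terminal nodes of the tree are the basic text objects.
        return not self.children

    def validate(self) -> None:
        if self.is_basic():
            # A basic text object holds exactly one category of graphic element.
            if self.category not in CATEGORIES:
                raise ValueError(f"{self.kind!r} needs a single content category")
        else:
            # Assumption: composite objects carry no content of their own.
            if self.category is not None or self.content:
                raise ValueError(f"composite {self.kind!r} cannot hold content")
            for child in self.children:
                child.validate()

    def depth(self) -> int:
        # Number of hierarchical levels at and below this node.
        return 1 + max((c.depth() for c in self.children), default=0)


report = LogicalObject("report", [
    LogicalObject("title", category="character box", content="Annual report"),
    LogicalObject("section", [
        LogicalObject("paragraph", category="character box", content="Sales rose."),
        LogicalObject("figure", category="geometric", content="<bar chart>"),
    ]),
])
report.validate()
print(report.depth())  # 3
```

Here 'section' is a composite text object, while 'title', 'paragraph' and 'figure' are basic text objects; an application with chapters would simply add another level.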

In a similar way, the generic logical structure is also basically a tree structure, allowing for predefined relationships between logical text objects, possibly with predefined contents for some of the basic text objects, for example standard paragraphs. One type of branch within this tree defines the hierarchy of the generic logical objects and that of the corresponding specific logical text objects, the terminal nodes being generic basic text objects, generic composite text objects being the intermediate nodes. Another type of branch defines the permissible sequences of logical text objects. These branches, which only occur between nodes of equal level in the hierarchy, may form cycles to allow the repetition of logical text objects.
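The sequence branches of the generic structure behave much like a small state machine. The sketch below assumes a hypothetical 'report' class; the table `NEXT`, the `START` and `END` markers and the object names are inventions for illustration and do not come from the draft proposal. A cycle (for example 'section' followed by 'section') expresses repetition.

```python
# Hypothetical sequence rules: for each parent object, which object may
# follow which at the same level. "START" and "END" are marker names
# introduced for this sketch only.
NEXT = {
    "report": {
        "START": {"title"},
        "title": {"abstract", "section"},
        "abstract": {"section"},
        "section": {"section", "END"},      # cycle: sections may repeat
    },
    "section": {
        "START": {"heading"},
        "heading": {"paragraph"},
        "paragraph": {"paragraph", "figure", "END"},
        "figure": {"paragraph", "END"},
    },
}


def sequence_is_permissible(parent, children):
    """Check one level of a specific structure against the generic rules."""
    rules = NEXT[parent]
    current = "START"
    for kind in children:
        if kind not in rules.get(current, set()):
            return False
        current = kind
    # The sequence must also be allowed to stop where it stops.
    return "END" in rules.get(current, set())


print(sequence_is_permissible("report", ["title", "section", "section"]))  # True
print(sequence_is_permissible("report", ["section"]))                      # False
```

A specific structure derived from the class would pass this check at every level of its tree.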

As far as the interchanged data stream is concerned, the structures do not impose a particular sequence of the objects as presented. Figures 1 and 2 give examples of generic and specific logical structures, respectively, for a document named 'report'.

Turning to the specific layout structure, this too is a tree structure with a variable number of hierarchical levels depending on the application, but where the minimum number below that of 'document' is one. The nodes in this tree are called layout objects, the branches of the tree representing their division into subordinate layout objects. Terminal nodes are called basic layout objects, which may have a more detailed internal structure depending on the type of content; for example, a basic layout object of character box elements may be subdivided into lines or character strings. The set of rules governing this is termed a subarchitecture, the content of a basic layout object being limited to a single subarchitecture.

Figure 1. Example of a generic logical structure of a class of document named 'report'

Figure 2. Example of a specific logical structure of a 'report'

Below the document level, the layout objects are page set, page, frame and block, where a basic layout object may be either a page or a block. A page, which corresponds to a unit of the presentation medium, is a reference area used for positioning and imaging the content of the document, where this area may be smaller than, equal to or greater than the size of the physical page. A page set consists of one or more pages or subordinate page sets that need to be identified as a group, containing a section of a manual, for example.

If a page is not a basic layout object, it may contain one or more levels of frame. Within these levels, the contents of logical objects may be formatted under control of the layout directives which refer to the frames. Frames may be overlaid, either partially or fully. The basic container for a portion of document content is a block, all blocks and frames below the level of page being positioned relative to the next level of frames. Blocks may be overlaid, either partially or fully, transparently or opaquely, the latter being specified by attributes of the block concerned. A frame may act as a container for two blocks; for example, a picture and its caption, where these should not be separated.
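Relative positioning of frames and blocks can be illustrated as follows. This is a sketch under stated assumptions: the class `LayoutObject`, the `x` and `y` offsets, the units and the reading that each object is placed relative to the object immediately enclosing it are illustrative choices, not definitions from the standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class LayoutObject:
    """A page, frame or block (hypothetical class; units are arbitrary)."""
    kind: str
    x: int = 0          # horizontal offset from the enclosing object's origin
    y: int = 0          # vertical offset from the enclosing object's origin
    children: List["LayoutObject"] = field(default_factory=list)
    parent: Optional["LayoutObject"] = field(default=None, repr=False)

    def add(self, child: "LayoutObject") -> "LayoutObject":
        child.parent = self
        self.children.append(child)
        return child

    def absolute_position(self) -> Tuple[int, int]:
        # Sum the offsets of this object and of every enclosing object.
        x, y, node = self.x, self.y, self.parent
        while node is not None:
            x, y = x + node.x, y + node.y
            node = node.parent
        return x, y


page = LayoutObject("page")
figure_frame = page.add(LayoutObject("frame", x=20, y=30))
picture = figure_frame.add(LayoutObject("block", x=5, y=5))
caption = figure_frame.add(LayoutObject("block", x=5, y=60))

print(picture.absolute_position())  # (25, 35)
print(caption.absolute_position())  # (25, 90)
```

Because the picture and its caption share one frame, moving the frame moves both blocks together, which is one way of keeping them from being separated.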

The generic layout structure is also a tree structure, allowing for predefined relationships between specific pages, frames and blocks. Its nodes are generic page set, generic page, generic frame and generic block, with similar rules to those for their specific counterparts. A generic block allows for a predefined block having a predefined position and content, for example standard logos or a copyright notice. Its content comprises part of the common content portions of the document class, where the content may be of only one category of graphic element, either character box, geometric or photographic.

Document content is divided into content portions, the subdivision being such that any content portion corresponds to (at most) one basic logical text object and (at most) one basic layout object. For examples of the generic and specific layout structures of a document named 'manual', see Figures 3 and 4, respectively.

Further text for the section on attributes was added at the recent meeting of WG 3 in Dallas, USA. These attributes are parameters of the components of the logical and layout structures, specifying characteristics of the components and relationships between the components. Each attribute consists of an identifier and a value specification, where the latter may contain one of the following:

• a constant,
• a user name of the components,
• a reference to another component,
• a rule to calculate the value depending on an attribute of another component,
• a procedure, such as plausibility checking for forms,
• a rule to calculate the value depending on the content of the document.

There are both specific and generic attributes associated with a component of the specific and generic document, respectively. The generic attribute applies to all corresponding components in the specific document, but may be overridden by a specific attribute. Specific attributes may be handed down from hierarchically higher components.

Figure 3. Example of a generic layout structure of a class of document named 'manual'

Figure 4. Example of a specific layout structure of a 'manual'

Attributes of components of the logical structure that map these components onto those of the layout structure are called layout directives. Examples of their use are to specify that a chapter is to start on a new page, or that a footnote is to be presented at the bottom of the current page. In this way they control the page layout, depending on the specific structure and the amount of content, and can be either specific or generic attributes, where the generic can be overridden by the specific.

A layout directive can consist of a rule associated as an attribute to a logical object, and a reference to a frame, such rules relating to the centring of headings or the layout for a table, for example. Some may contain just a rule, perhaps a directive concerning the typeface to be used for the display of the logical elements.
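To make the effect of a layout directive concrete, the sketch below applies one directive, 'start on a new page', while filling pages of a fixed capacity. It is an illustration only: the function `paginate`, the tuple layout and the line counts are assumptions, and a real formatter would also split objects that are too long for a page, which this sketch does not attempt.

```python
def paginate(objects, lines_per_page):
    """Assign logical objects to pages.

    objects is a list of (name, line_count, start_on_new_page) tuples,
    where the flag stands in for a layout directive (hypothetical encoding).
    """
    pages = [[]]
    used = 0
    for name, lines, start_on_new_page in objects:
        overflow = used + lines > lines_per_page
        # Break only if the current page already holds something.
        if pages[-1] and (start_on_new_page or overflow):
            pages.append([])
            used = 0
        pages[-1].append(name)
        used += lines
    return pages


document = [
    ("chapter 1 heading", 2, True),    # directive: chapter starts a new page
    ("paragraph A", 30, False),
    ("paragraph B", 30, False),
    ("chapter 2 heading", 2, True),
    ("paragraph C", 10, False),
]

for number, page in enumerate(paginate(document, lines_per_page=50), start=1):
    print(number, page)
# 1 ['chapter 1 heading', 'paragraph A']
# 2 ['paragraph B']
# 3 ['chapter 2 heading', 'paragraph C']
```

The resulting page layout depends both on the directives and on the amount of content, as the text notes.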

An attribute is specified as being associated with a generic or specific logical or layout object, and classified as common, layout, logical, content or document description.

• Common attributes are applicable to generic and specific logical and layout objects, and include those to specify object types and identify object portions.

• Layout attributes are applicable to generic and specific layout objects, comprising layout structure attributes, position and dimension attributes, those relating to overlay, extended layout (specifying additional properties of layout objects to facilitate formatting, e.g. tabulation) and presentation (specifying details of the presentation of the content associated with the layout objects, e.g. character spacing).

• Logical attributes are applicable to generic and specific logical objects, comprising those for logical structure, layout directives (see above), and presentation (specifying details of the presentation of the content associated with the logical objects, e.g. hyphenation).

• Content attributes are applicable to content portions, including those to identify the individual content portions and to specify the types of graphic elements that make up the content portion and the coding type used to represent them.

• Document description attributes include those to specify general characteristics of the document and to facilitate its storage and retrieval (see the section on document description).

Of these, some items are mandatory, while others are optional, default values being associated with some, as appropriate. Defaulting rules can apply to both specific logical and layout objects where the attributes can be specified at higher levels in the hierarchy, these being interpreted as default values at the lower levels, although they can be overridden by generic or specific attributes at those lower levels. Thus, nominal page dimensions may be specified at the document level, for example, or the default resolution for photographic blocks at page level. The priority order determining the attributes of a specific object is:

• those specified explicitly for the specific object,
• those specified explicitly for the corresponding generic object,
• those relating to the specific object at the next higher level (where these have been determined using this same priority order to establish the value),
• the default value given in the standard itself.


This applies in the case of optional attributes; for those which are mandatory only the first two items apply.
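The priority order lends itself to a short recursive lookup. The following is a sketch rather than a rendering of the standard: the class `Component`, the function `resolve`, the attribute names and the contents of `STANDARD_DEFAULTS` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Placeholder defaults; real values would be those given in the standard.
STANDARD_DEFAULTS: Dict[str, Any] = {"page_width": 210}


@dataclass
class Component:
    name: str
    specific: Dict[str, Any] = field(default_factory=dict)  # set on this object
    generic: Dict[str, Any] = field(default_factory=dict)   # set on its generic counterpart
    parent: Optional["Component"] = None                    # next higher level


def resolve(component: Component, attribute: str, mandatory: bool = False) -> Any:
    # 1. Value given explicitly for the specific object.
    if attribute in component.specific:
        return component.specific[attribute]
    # 2. Value given explicitly for the corresponding generic object.
    if attribute in component.generic:
        return component.generic[attribute]
    # Mandatory attributes stop here: only the first two items apply.
    if mandatory:
        raise KeyError(f"mandatory attribute {attribute!r} missing on {component.name!r}")
    # 3. Value of the object at the next higher level, found the same way.
    if component.parent is not None:
        return resolve(component.parent, attribute)
    # 4. Default value given in the standard itself.
    return STANDARD_DEFAULTS[attribute]


document = Component("document", specific={"page_height": 297})
page = Component("page", generic={"resolution": 300}, parent=document)
block_a = Component("block A", specific={"resolution": 200}, parent=page)
block_b = Component("block B", parent=page)

print(resolve(block_a, "resolution"))   # 200  (specific value wins)
print(resolve(block_b, "resolution"))   # 300  (inherited from the page level)
print(resolve(block_b, "page_height"))  # 297  (inherited from the document level)
print(resolve(block_b, "page_width"))   # 210  (default of last resort)
```

The recursion in step 3 reflects the parenthetical remark that the higher-level value is itself established using the same priority order.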

While many of the attribute definitions have been completed for this section, others have still to be added before the document can reach the status of a draft proposal. So much for ODA at this stage.

Document description

This part of the standard is applicable to both the implementor and the end user, since the attributes of the document description were specified bearing in mind all the possible things that the end user might wish to do with a document during its lifetime, including being able to state when that lifetime would expire. Inevitably, not everything could be accorded a separate entry in a standard, and many companies could have private parameters; for them a free-format field has been included to allow the addition of ancillary information.

The document description precedes and is an integral part of the interchanged document. It provides information for comprehension by both a human and a machine. This is achieved by means of attributes for the handling of the document as a whole, including information for processing the document (for example formatting, editing) and for filing and retrieval. Some of the attributes are applicable to its own rendition (for example, the character set used). Mandatory items include the title of the document and the overall length, optional items being author(s) and document date. Some are provided at the originating end before transmission, while others may be added by the recipient, who may also amend attributes to suit particular needs. However, amending the document and/or the document description results in the creation of a new document, and this is an important point to note.

The semantics of the document description are described in Part 3 of the standard, their syntax and coding being specified in Part 4. Figure 5 gives an indication of what the preparer or encoder of the document might expect to see on a display screen. In the standard itself, these items are written in a syntactic metalanguage, each followed by an explanatory comment.

Where the mandatory items are concerned, it is useful to know the graphic character set in which the document description itself is transmitted, particularly if from a country with a different alphabet (for example, an EEC document in Greek could be sent to a delegate in Athens, but where the character set of the description itself was in ITA No 2). Content type(s) could denote teletex and/or facsimile (a service), for example, or that they were drawn by vector or raster graphics. The length is that of the document as a whole (the body plus the document description) and may be actual or estimated, where the estimate must be the maximum storage requirement. (This was seen as being of value for hand-held devices used in a library, for example, where the maximum length was that of a full cassette tape.)

Certain optional items may require some explanation. User-specific codes, for example, were seen as being a contract or a project number, or possibly a budget code. Where the originating and receiving system identifiers are concerned, these could be of value where an article for publication is prepared on a word processor of a certain manufacturer, but which includes typesetting commands for a laser printer to which the editor has access. A publisher of electronic journals, however, may be unhappy about their possibly 'mangled' display on an inferior device to that on which he had composed his aesthetically pleasing journals, and would therefore impose a rendition restriction. Special stationery applies to a pro forma invoice, for example, where the information received would otherwise appear in nonexistent boxes. Keywords may be supplied by the originator and/or the receiver, their purpose being to 'permit logical associations to be made about the content of the document'.

DOCUMENT DESCRIPTION

Note that items marked * must be completed. The others are optional.

* document description graphic character set =
* content type(s) =
* title =
* length =

version number =
reference =
user-specific code(s) =
originating system identifier =
receiving system identifier(s) =
rendition restriction =
special stationery =
author(s) =
organization(s) =
authorization =
preparer =
superseded document(s) =
number of pages =
subject =
creation date =
document date =
document class =
copy list =
last change date and time =
expiry date =
owner(s) =
filing reference(s) =
filing date and time =
keyword(s) =
summary information =
copyright =
security =
copy protection(s) =
access right(s) =
encryption =
language(s) =
external reference(s) =
additional information =

Figure 5. Example of a display of document description
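A preparer's system might check the items that must be completed before a document is sent. The sketch below assumes the four mandatory items discussed in the text (graphic character set, content type(s), title and length); the dictionary representation, the function name and the sample values are hypothetical and stand in for the syntactic metalanguage of the standard.

```python
# Items that must be completed, named as in the display of Figure 5.
MANDATORY_ITEMS = (
    "document description graphic character set",
    "content type(s)",
    "title",
    "length",
)


def missing_mandatory_items(description):
    """Return the mandatory items that are absent or left empty."""
    return [
        item for item in MANDATORY_ITEMS
        if description.get(item) in (None, "")
    ]


description = {
    "document description graphic character set": "example-set",  # placeholder value
    "content type(s)": ["teletex"],
    "title": "Quarterly report",
    "author(s)": ["J Smith"],          # optional item
}

print(missing_mandatory_items(description))  # ['length']

description["length"] = 4096  # an actual or estimated (maximum) length
print(missing_mandatory_items(description))  # []
```

Optional items such as author(s) are simply ignored by the check.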


Language(s) relates to the primary language(s) in which the main body of the document is written, be it French or Arabic, COBOL or BASIC, for example.

It is specified that the encryption algorithm, if appropriate and if given, should be a string of alphanumeric characters, in accordance with the method laid down in the file-transfer standards. Similarly, the date and time are to be coded in accordance with ISO 2014, 'The writing of calendar dates in all-numeric forms', and ISO 3307, 'Information interchange - representations of time of the day'. Thus, 5.45 pm on 14 November 1983 could be 1983-11-14-17:45. Hopefully, it will be all things to all men, something that only time will tell.
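The date and time example can be reproduced with a few lines of code. This sketch follows only the layout of the example in the text (year-month-day, a hyphen, then hours and minutes on the 24-hour clock); it has not been checked against the full provisions of ISO 2014 or ISO 3307, and the function names are illustrative.

```python
from datetime import datetime

# Layout taken from the example in the text: 1983-11-14-17:45
LAYOUT = "%Y-%m-%d-%H:%M"


def encode_timestamp(moment: datetime) -> str:
    """Write a date and time in the all-numeric form used in the example."""
    return moment.strftime(LAYOUT)


def decode_timestamp(text: str) -> datetime:
    """Read the same form back into a datetime value."""
    return datetime.strptime(text, LAYOUT)


stamp = encode_timestamp(datetime(1983, 11, 14, 17, 45))
print(stamp)                    # 1983-11-14-17:45
print(decode_timestamp(stamp))  # 1983-11-14 17:45:00
```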

Office document interchange format

Part 4 of the standard describes the formats of the data streams representing documents that are structured in accordance with the architecture defined in Part 2. It allows various subsets of the ODA logical and layout information to be represented in the data stream, which would result in different formats. These include text-image format (TIF), suitable primarily for imaging the text exactly as intended by the originator, and text-processible format (TPF), suitable for processing the text in addition to imaging, and there are several extensions to these. Conformance levels for data streams containing different amounts of logical and layout information relating to TIF, TPF and possible extensions are derived from this.

The document description and the document body are represented in the data stream, where the latter represents the content and the structure of the document, being composed of descriptors and text units. A descriptor comprises a set of attributes relating to a generic or a specific logical or layout object. A text unit represents a portion of the document content, of a single graphic element type, that is associated with (at most) one logical object and (at most) one layout object, although two or more text units may be associated with a single logical or layout object. Relationships between descriptors, and between descriptors and text units, are expressed by pointers, where a pointer is a data item used within a descriptor or a text unit that uniquely identifies the descriptor to which it refers.

A descriptor, then, is a data structure, the subordinate data items of which identify the descriptor and the document component corresponding to it by means of attributes. Any subordinate data structure consists of more elementary attributes and data items where these, at the lowest level, are of a basic data type, such as a character or a bit string.

There are two main parts to a text unit: a data structure and an information field. The subordinate data items and (possibly) subordinate data structures of the former identify the text unit and type of document content (character box, geometric or photographic elements) represented by the information field. They refer to the descriptor of the logical object and/or the layout object with which the text unit is associated. Subordinate data structures consist of more elementary data structures and data items which, at the lowest level, are of the basic data type defined above. The information field represents the type of document content, and may contain embedded control functions. One consisting of character box elements could therefore include format effectors for carriage return and line feed, say, or a presentation control function for select graphic rendition.
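The relationship between descriptors, text units and pointers can be pictured with a small data model. This is only a sketch under stated assumptions: the class and field names (`Descriptor`, `TextUnit`, `ident`, `superior`, `units_for`) are hypothetical, pointers are modelled as plain strings, and none of this is the representation laid down in the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Descriptor:
    ident: str                      # value that pointers use to refer to this descriptor
    kind: str                       # "logical" or "layout"
    generic: bool = False           # generic (shared) or specific object
    attributes: Dict[str, str] = field(default_factory=dict)
    superior: Optional[str] = None  # hypothetical pointer to another descriptor


@dataclass
class TextUnit:
    ident: str
    content_type: str               # "character box", "geometric" or "photographic"
    information: bytes              # content, possibly with embedded control functions
    logical_object: Optional[str] = None  # pointer to at most one logical descriptor
    layout_object: Optional[str] = None   # pointer to at most one layout descriptor


def units_for(descriptor_id: str, units: List[TextUnit]) -> List[TextUnit]:
    """Return every text unit whose pointers refer to the given descriptor."""
    return [
        unit
        for unit in units
        if descriptor_id in (unit.logical_object, unit.layout_object)
    ]


paragraph = Descriptor("para-1", "logical", attributes={"type": "paragraph"})
block = Descriptor("block-1", "layout", attributes={"type": "block"})
units = [
    # Carriage return and line feed embedded in a character box information field.
    TextUnit("tu-1", "character box", b"First line\r\n", "para-1", "block-1"),
    TextUnit("tu-2", "character box", b"Second line\r\n", "para-1", None),
]
print([unit.ident for unit in units_for("para-1", units)])
```

Here two text units point at the same logical object, which the text allows, while each text unit points at no more than one logical and one layout descriptor.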

Attributes are identified in the document using a syntactic metalanguage similar to that described in the CCITT document X.MHS4. Those for character presentation relate to the character repertoire, character path, character orientation, character spacing, alignment, line progression, line spacing, character box size, baseline offset and graphic rendition. Not all have yet been specified, although the list should be completed shortly.

The coding method given is type, length, value (TLV), where the coded representation of each data structure or data item (with the exception of elementary data items) consists of a type field, a length field and a value field. The type field identifies the attribute or group of attributes corresponding to the data structure or data item, and the length field specifies the length of the value. The value field consists of one or more triplets, each of which is composed of a type, a length and a value field, representing the subordinate data structures and items. Details of the coding method have yet to be defined.
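Since the details of the coding have yet to be defined, the following is no more than a sketch of the general TLV idea. It assumes a one-octet type field and a one-octet length field, and it wraps leaf values in a triplet as well so that a decoder can delimit them; the type codes and the function names `encode` and `decode` are invented for illustration and do not come from the standard.

```python
from typing import List, Tuple


def encode(type_code: int, value) -> bytes:
    """Encode one triplet; `value` is raw bytes or a list of (type, value) pairs."""
    if isinstance(value, bytes):
        body = value
    else:
        # A structured value is the concatenation of its subordinate triplets.
        body = b"".join(encode(sub_type, sub_value) for sub_type, sub_value in value)
    if not 0 <= type_code <= 255:
        raise ValueError("type code must fit in one octet")
    if len(body) > 255:
        raise ValueError("value too long for a one-octet length field")
    return bytes([type_code, len(body)]) + body


def decode(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a byte string into one level of (type, value) pairs."""
    triplets = []
    position = 0
    while position < len(data):
        if position + 2 > len(data):
            raise ValueError("truncated type or length field")
        type_code, length = data[position], data[position + 1]
        value = data[position + 2 : position + 2 + length]
        if len(value) != length:
            raise ValueError("truncated value field")
        triplets.append((type_code, value))
        position += 2 + length
    return triplets


# Hypothetical type codes: 0x30 for a descriptor, 0x01 and 0x02 for two attributes.
encoded = encode(0x30, [(0x01, b"A4"), (0x02, b"\x0c")])
print(encoded.hex(" "))
outer_type, inner = decode(encoded)[0]
print(decode(inner))
```

Decoding is applied one level at a time: the outer call yields the descriptor triplet, and a second call on its value field yields the subordinate triplets.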

How much logical and layout information is to accompany the document body depends on the requirement. For TIF, there is layout information (generic and/or specific layout structure and attributes) but no logical information. It allows for the unambiguous specification of the image of the document, basic TIF containing specific layout structure and attributes for pages and blocks only. Extension 1 contains additional layout information for limited image manipulation (for example, rearrangement of blocks or sets of blocks within pages), this being the specific layout structure and attributes for frames. Extension 2 includes generic layout structure, which effectively reduces the potential length of the data stream, since common document components need only be specified once.

TPF contains logical information (generic and/or specific logical structure and attributes) and may include layout directives, along with layout information (generic and/or specific layout structure and attributes). It is the logical information which enables processing to be carried out while preserving the logical structure and associated layout characteristics which may be specified by the layout directives. Basic TPF contains specified logical information and includes declarations of the logical object types. It may also contain layout directives that do not refer to layout objects. Extension 1 includes the generic logical structure, thus providing for maintenance of the document's logical structure when the document is modified in some way, perhaps by editing. Extension 2 includes specific and/or generic layout structure, along with layout directives that refer to layout objects. Conformance levels can thus be defined, depending on the level of TIF or TPF required.
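The way conformance levels follow from the information carried can be sketched as a simple lookup. The sketch assumes that each extension is cumulative, that is, it carries everything in the level below it plus the additions named in the text; that reading, the level names and the function `levels_with` are assumptions made for illustration, not definitions from the standard.

```python
from typing import Dict, FrozenSet, Iterable, List

PAGES_BLOCKS = "specific layout structure (pages, blocks)"
FRAMES = "specific layout structure (frames)"
GENERIC_LAYOUT = "generic layout structure"
SPECIFIC_LOGICAL = "specific logical structure"
GENERIC_LOGICAL = "generic logical structure"
LAYOUT_STRUCTURE = "specific and/or generic layout structure"

# Each level is built from the one before it (assumed cumulative).
LEVELS: Dict[str, FrozenSet[str]] = {}
LEVELS["TIF basic"] = frozenset({PAGES_BLOCKS})
LEVELS["TIF extension 1"] = LEVELS["TIF basic"] | {FRAMES}
LEVELS["TIF extension 2"] = LEVELS["TIF extension 1"] | {GENERIC_LAYOUT}
LEVELS["TPF basic"] = frozenset({SPECIFIC_LOGICAL})
LEVELS["TPF extension 1"] = LEVELS["TPF basic"] | {GENERIC_LOGICAL}
LEVELS["TPF extension 2"] = LEVELS["TPF extension 1"] | {LAYOUT_STRUCTURE}


def levels_with(required: Iterable[str]) -> List[str]:
    """List the levels whose data streams carry all the required information."""
    needed = set(required)
    return [name for name, carried in LEVELS.items() if needed <= carried]


# Which levels preserve logical structure through editing?
print(levels_with({GENERIC_LOGICAL}))
```

A purchaser who needs documents to survive editing with their logical structure intact would, on this reading, look for TPF extension 1 or above.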

CONCLUSIONS

This standard, while potentially of great importance to end users and written for their benefit, is not intended to be used by them. In this sense it is directed at implementors, be they suppliers of equipment or software. It is for analysts and programmers who will have the task of making its inner workings transparent to the user.

The would-be purchaser of a system will want to be conversant with the conformance clauses given in Part 1 of the standard; to know which level of the standard will be most appropriate for his requirements, be it TIF or TPF or any extension, indeed any other level which may be defined or addition thereto. He is then in a position to refer to specific clauses of the standard and to require conformance when inviting manufacturers to tender. He would be interested in the ease of use of the system, and that the particular implementation of the document description is readily understandable by the preparers or encoders of office documents in his company. After all, all he wishes to do is to be able to interchange office documents. They may be single mode or mixed mode; using equipment from one supplier or several; it may be with the office next door, elsewhere in the country or elsewhere in the world. The author sees this as a reasonable request in this day and age.

The outlook is bright! By the end of 1984 there should be an ECMA standard for 'office document architecture' and a CCITT recommendation for a 'document interchange protocol for the telematic services'. The ISO/TC 97/SC 18/WG 3 documents should have a draft proposal number which will remain with them all their life, since it will be that of the (eventual) standard. Things are starting to move fast.

ISO work continues, taking in that to be decided by ECMA and CCITT, which will form a subset (or subsets) of the ISO standard. More substance is always being added to the skeletal framework, text gradually replacing occurrences of the phrase 'for further study' to result in an all-singing, all-dancing standard. And long before the projected publication date, it is confidently predicted that suppliers will be anticipating the outcome. There are indeed many hopeful signs for aspiring users.

Note

ISO working documents are not generally available to the public. However, they are available to members of contributing organizations; NCC members may thus obtain a copy on application to the Standardization Office, Oxford Road, Manchester M1 7ED, UK.
