Components of WordprocessingML

• Main Document

• Paragraphs & Rich Formatting

– Runs

– Run Content

• Tables

• Custom Markup

• Sections

• Styles

– Paragraph

– Character

– Numbering

– Table

– Document Defaults

• Fonts

• Numbering

• Headers/Footers

• Footnotes/Endnotes

• Glossary Document

• Annotations

– Comments

– Revisions

– Bookmarks

• Mail Merge

• Document Settings

– Web Settings

– Compatibility Settings

• Fields & Hyperlinks

• Odds & Ends (Textboxes, Subdocuments, Extensibility)

Ecma/TC45/2006/011 (Rev.)

WordprocessingML – Paragraphs & Rich Formatting

Paragraphs

• The most basic unit of a WordprocessingML document

• Analogous to the HTML <p> tag

• A paragraph contains three pieces of information:

– Paragraph properties

– Inline content

– (optionally) a set of revision IDs used for document merge and compare

Paragraph Example

• A basic paragraph with three different text formats:

Paragraph properties

Paragraph contents
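
A minimal sketch of such a paragraph, assuming the standard WordprocessingML main namespace. The sample text and the helper name `describe_paragraph` are illustrative and do not come from the original slide. The `pPr` child holds the paragraph properties; the three `r` children are the paragraph contents, one per text format.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Illustrative paragraph: centered, with plain, bold, and italic runs.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:jc w:val="center"/>
  </w:pPr>
  <w:r>
    <w:t xml:space="preserve">Plain, </w:t>
  </w:r>
  <w:r>
    <w:rPr><w:b/></w:rPr>
    <w:t xml:space="preserve">bold, </w:t>
  </w:r>
  <w:r>
    <w:rPr><w:i/></w:rPr>
    <w:t>italic</w:t>
  </w:r>
</w:p>
"""

def describe_paragraph(p):
    """Return (paragraph property names, [(run property names, text), ...])."""
    ppr = p.find("w:pPr", NS)
    prop_names = [c.tag.split("}")[1] for c in ppr] if ppr is not None else []
    runs = []
    for r in p.findall("w:r", NS):
        rpr = r.find("w:rPr", NS)
        fmt = [c.tag.split("}")[1] for c in rpr] if rpr is not None else []
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        runs.append((fmt, text))
    return prop_names, runs

props, runs = describe_paragraph(ET.fromstring(PARAGRAPH))
print(props)
print(runs)
```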

Paragraph Properties

• The paragraph properties are stored on the pPr element

• This contains all information on the formatting applied at the paragraph level, as well as to the paragraph mark character

Paragraph Properties

• Paragraph Style

• Keep on same page with previous/next paragraph

• Page break before

• Text frame

– Text frame properties

• Widow/Orphan control

– Prevents a single line of a paragraph from being stranded on a different page from the rest of the paragraph

• Numbering properties

• Paragraph borders

Paragraph Properties (cont'd)

• Suppress line numbering

• Paragraph shading

• Tab stops

• Override hyphenation

• RTL vs. LTR

• East Asian typography settings

• Line spacing

• Document grid settings

– Adjust text to grid

– Snap margins to grid

• Paragraph alignment

Paragraph Properties (cont'd)

• Indentation

– Mirror indents?

• Text orientation (vertical vs. horizontal)

• Outline level

• HTML <div> references

• Conditional formatting properties (in tables)

• Formatting properties for the paragraph mark character

• Section properties

Paragraph Properties (cont'd)

• Paragraph property revisions
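
As a rough illustration of how a consumer might read a few of these properties, the sketch below parses a hand-written `pPr` element. The property values and the helper names `local` and `read_properties` are assumptions made for the example; only a handful of the properties listed above are shown.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative paragraph properties (values are made up for the example).
PPR = f"""
<w:pPr xmlns:w="{W}">
  <w:pStyle w:val="Heading1"/>
  <w:keepNext/>
  <w:pageBreakBefore/>
  <w:spacing w:before="240" w:after="120"/>
  <w:ind w:left="720"/>
  <w:jc w:val="center"/>
  <w:outlineLvl w:val="0"/>
</w:pPr>
"""

def local(name):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return name.split("}", 1)[1] if "}" in name else name

def read_properties(ppr):
    """Map each property element to its attributes; True marks a bare toggle."""
    props = {}
    for child in ppr:
        attrs = {local(k): v for k, v in child.attrib.items()}
        props[local(child.tag)] = attrs or True
    return props

props = read_properties(ET.fromstring(PPR))
for name, value in props.items():
    print(name, value)

# Spacing and indentation are measured in twentieths of a point,
# so before="240" is 12 pt and left="720" is half an inch.
print(int(props["spacing"]["before"]) / 20)
```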

Runs

• A run is a region of text with a common set of properties

• All text in a word processing document is contained within runs

• A run contains three pieces of information:

– Run properties

– Run content (e.g. text)

– (optionally) A set of revision IDs for document comparison

Runs

• All runs must be contained within a paragraph

• Producers may break runs whenever they choose, as long as the net property set for each run is correct

Run Example

Run w/ no format

Run w/ no format

Bold run

Run Example (cont'd)

Four runs w/ no formatting

Two bold runs

Run w/ no formatting

Run Example (cont'd)

• The second example may be less efficient, but it's equally valid.
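
One way to see why both forms are valid is to merge adjacent runs that carry the same properties and compare the results. The sketch below does this for a three-run paragraph and a seven-run paragraph; the sample text and the helpers `run_xml`, `paragraph`, `run_format`, and `normalized_runs` are hypothetical. It is a simplification: it compares property elements literally and does not resolve styles or treat `<w:b w:val="0"/>` as equal to an absent `b`.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def run_xml(text, bold=False):
    """Build the markup for one run, optionally bold."""
    rpr = "<w:rPr><w:b/></w:rPr>" if bold else ""
    return f'<w:r>{rpr}<w:t xml:space="preserve">{text}</w:t></w:r>'

def paragraph(*runs):
    """Wrap run markup in a paragraph element and parse it."""
    return ET.fromstring(f'<w:p xmlns:w="{W}">{"".join(runs)}</w:p>')

def run_format(r):
    """A hashable summary of the run's direct properties."""
    rpr = r.find("w:rPr", NS)
    if rpr is None:
        return ()
    return tuple(sorted((c.tag, tuple(sorted(c.attrib.items()))) for c in rpr))

def normalized_runs(p):
    """Merge adjacent runs whose property sets are identical."""
    merged = []
    for r in p.findall("w:r", NS):
        fmt = run_format(r)
        text = "".join(t.text or "" for t in r.findall("w:t", NS))
        if merged and merged[-1][0] == fmt:
            merged[-1] = (fmt, merged[-1][1] + text)
        else:
            merged.append((fmt, text))
    return merged

# Three runs: no formatting, bold, no formatting.
compact = paragraph(run_xml("Hello "), run_xml("big", bold=True), run_xml(" world"))

# Seven runs: four unformatted, two bold, one unformatted.
fragmented = paragraph(
    run_xml("He"), run_xml("l"), run_xml("lo"), run_xml(" "),
    run_xml("b", bold=True), run_xml("ig", bold=True),
    run_xml(" world"),
)

print(normalized_runs(compact) == normalized_runs(fragmented))
print([text for _, text in normalized_runs(fragmented)])
```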

Run Properties

• The run properties are stored on the rPr element

• This contains all information on the formatting applied to the characters in this run

Run Properties

• Character style

• Font face

• Font size

• Bold

• Italic

• ALL CAPS

• Small caps

• Strikethrough

• Double Strikethrough

• Outline

• Shadow

• Emboss

• Engrave

• Hidden text

Run Properties (cont'd)

• Run property revisions

• Fit text (for East Asian typography)

• Vertical alignment

• RTL vs. LTR

• Complex script flag

• Emphasis mark

• Language ID of text

• Horizontal in vertical

• Two lines in one

• Math
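
A short sketch of reading a few run properties from an `rPr` element, under the assumption that on/off properties such as bold are switched off by a `val` of `0`, `false`, or `off` and are on when the element appears without `val`. The sample values and the helpers `toggle` and `font_size_points` are illustrative; only direct formatting is examined, not styles.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Illustrative run properties (values are made up for the example).
RPR = f"""
<w:rPr xmlns:w="{W}">
  <w:rFonts w:ascii="Calibri" w:hAnsi="Calibri"/>
  <w:b/>
  <w:i w:val="0"/>
  <w:caps w:val="true"/>
  <w:sz w:val="24"/>
</w:rPr>
"""

def toggle(rpr, name):
    """True/False if the on/off property is set here, None if it is absent."""
    el = rpr.find(f"w:{name}", NS)
    if el is None:
        return None
    val = el.get(f"{{{W}}}val", "true")
    return val not in ("0", "false", "off")

def font_size_points(rpr):
    """Font size in points; the sz value is stored in half-points."""
    sz = rpr.find("w:sz", NS)
    return int(sz.get(f"{{{W}}}val")) / 2 if sz is not None else None

rpr = ET.fromstring(RPR)
print(toggle(rpr, "b"), toggle(rpr, "i"), toggle(rpr, "caps"), toggle(rpr, "strike"))
print(font_size_points(rpr))
```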

Run Content

• Runs may contain 'run content':

– Text

– Deleted text

– Soft line breaks

– Field codes

– Deleted field codes

– Footnote/endnote reference marks

– Fields

Run Content

• Runs may contain 'run content' (cont'd):

– Page numbers

– Tabs

– Ruby text

– DrawingML content

– Embedded objects

– Pictures
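
To illustrate that a run can mix several kinds of content, here is a minimal sketch that flattens one run to plain text, mapping `tab` to a tab character and `br` (a soft line break) to a newline. The sample content and the helper name `run_to_plain_text` are assumptions; every other kind of run content is simply skipped.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative run mixing text, a tab, and a soft line break.
RUN = f"""
<w:r xmlns:w="{W}">
  <w:t>Name:</w:t>
  <w:tab/>
  <w:t>Ada</w:t>
  <w:br/>
  <w:t>Done</w:t>
</w:r>
"""

def run_to_plain_text(r):
    """Flatten a run's content to a string, in document order."""
    out = []
    for child in r:
        kind = child.tag.split("}", 1)[1]
        if kind == "t":
            out.append(child.text or "")
        elif kind == "tab":
            out.append("\t")
        elif kind == "br":
            out.append("\n")
        # Other run content (fields, drawings, and so on) is skipped here.
    return "".join(out)

plain = run_to_plain_text(ET.fromstring(RUN))
print(repr(plain))
```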

Text

• The text elements are the only elements in the main story that can contain a text node(!)

– All other text is in an attribute value

• There are four types of text in WordprocessingML:

– Text

– Deleted text

– Field codes

– Deleted field codes

Text

• Why do we use a different element for deleted text?

– Good question!

• This allows simple consumers to get the text of the document easily by just grabbing the contents of the t node (text)

• They don't need to check where revisions start and end, etc. to extract the visible contents
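
The point above can be sketched as follows: a simple consumer collects every `t` element and never looks at `delText`, so the deleted text drops out without any revision handling. The sample paragraph, the author name, and the helper `visible_text` are illustrative; the sketch treats tracked insertions as visible text.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative paragraph with one tracked deletion and one tracked insertion.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">The quick </w:t></w:r>
  <w:del w:id="1" w:author="Example Author">
    <w:r><w:delText xml:space="preserve">brown </w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="Example Author">
    <w:r><w:t xml:space="preserve">red </w:t></w:r>
  </w:ins>
  <w:r><w:t>fox</w:t></w:r>
</w:p>
"""

def visible_text(p):
    """Concatenate every t element; delText is a different element, so it is ignored."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))

p = ET.fromstring(PARAGRAPH)
print(visible_text(p))
```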

Disclaimer

This presentation is for informational purposes only, and should not be relied upon as a substitute or replacement for Microsoft formal file format documentation, which is available at the following website: https://msdn.microsoft.com/en-us/library/cc313118(v=office.12).aspx. Any views or opinions presented in this material are solely those of the author and do not necessarily represent those of Microsoft. Microsoft disclaims all liability for mistakes or inaccuracies in this presentation.