Inline Markup in XLIFF 2.0

Fredrik Estreen - Lionbridge, Yves Savourel - ENLASO


Page 1: Inline Markup in XLIFF 2.0

Fredrik Estreen - Lionbridge
Yves Savourel - ENLASO

Page 2: Disclaimer

While we believe the information presented here is pretty stable, it only reflects the general consensus of the sub-committee working on the inline markup.

Things may change during formal approval by the sub-committee, and later when the specification goes through review and approval by the main XLIFF TC.

Page 3: Agenda

• Principles and Background

• Inline Markup
  o Characters that are invalid in XML
  o Native Codes
  o Annotations

• Extensions

• Processing requirements

• XLIFF Toolkit

Page 4: Some Principles

Some of the guidelines we are trying to follow during the work:

• Try to have only one way to do one thing

• Provide processing requirements

• Try to re-use existing standards when possible

• Try to keep things simple

Page 5: Containing Structure

The structural part of XLIFF changes in 2.0, and the inline markup should be easy to handle in the new model.

• Static structure
  o <file> -> <group>* -> <unit>
  o The contents of the concatenated <source> elements remain static during processing

• Dynamic structure inside <unit>
  o <segment>, <ignorable> -> <source>, <target>
  o A processor may merge or split the contents of <segment> and <ignorable> elements.
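To make the static/dynamic distinction concrete, here is a minimal Java sketch. It assumes a deliberately simplified, hypothetical in-memory model (`UnitContent` and its nested `Part` class are illustration-only names, not the XLIFF Toolkit API) and ignores inline codes. Merging and re-splitting parts changes the segmentation but leaves the concatenated source of the unit untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model (not the toolkit API): a unit's content is an ordered
// list of parts, each a <segment> or an <ignorable> holding its source text.
final class UnitContent {

    static final class Part {
        final boolean ignorable;
        final String source;

        Part(boolean ignorable, String source) {
            this.ignorable = ignorable;
            this.source = source;
        }
    }

    final List<Part> parts = new ArrayList<>();

    // The static view of the unit: the concatenated source of all parts.
    String concatenatedSource() {
        StringBuilder sb = new StringBuilder();
        for (Part p : parts) {
            sb.append(p.source);
        }
        return sb.toString();
    }

    // Merges part i+1 into part i; the merged part is treated as a segment.
    void merge(int i) {
        Part next = parts.remove(i + 1);
        parts.set(i, new Part(false, parts.get(i).source + next.source));
    }

    // Splits part i at a character offset into two parts of the same kind.
    void split(int i, int offset) {
        Part p = parts.get(i);
        parts.set(i, new Part(p.ignorable, p.source.substring(0, offset)));
        parts.add(i + 1, new Part(p.ignorable, p.source.substring(offset)));
    }

    public static void main(String[] args) {
        UnitContent unit = new UnitContent();
        unit.parts.add(new Part(false, "First sentence."));
        unit.parts.add(new Part(true, " "));
        unit.parts.add(new Part(false, "Second sentence."));
        String before = unit.concatenatedSource();

        unit.merge(0);      // segment + ignorable
        unit.merge(0);      // ... + second segment: one segment left
        unit.split(0, 15);  // re-split right after "First sentence."

        System.out.println(unit.parts.size());                        // 2
        System.out.println(before.equals(unit.concatenatedSource())); // true
    }
}
```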

Page 6: What's the Inline Markup?

The inline markup is what's inside the <source> and <target> elements:

• Characters that are invalid in XML

• Original inline codes

• Annotations

Page 7: Inline codes and segmentation

• Inline codes belong to the <unit>, not to the <segment>(s)

• IDs are unique within the <unit>

• This allows simple re-segmentation of the content of the <unit>

• No need to clone codes that span multiple segments
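Because code IDs are scoped to the whole <unit>, a processor can check them across segment boundaries. The sketch below is a hypothetical helper (`UnitIdCheck.duplicateIds` is not a toolkit method) that works on raw segment strings with a regular expression rather than a real XML parser; it looks at the `id` attributes of <ph>, <pc>, <sc> and <ec>, but not at `rid` or `nid`. An <sc/> in one segment and its <ec/> in the next pass the check with no cloning.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnitIdCheck {
    // The id attribute (not rid or nid) of an inline code element.
    private static final Pattern CODE_ID =
        Pattern.compile("<(?:ph|pc|sc|ec)\\b[^>]*?\\bid=['\"]([^'\"]*)['\"]");

    // Returns the ids used more than once across all segments of one unit.
    static List<String> duplicateIds(List<String> segmentSources) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String source : segmentSources) {
            Matcher m = CODE_ID.matcher(source);
            while (m.find()) {
                String id = m.group(1);
                if (!seen.add(id) && !duplicates.contains(id)) {
                    duplicates.add(id);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // <sc/> opens in the first segment and <ec/> closes it in the second:
        // nothing is cloned, because ids are scoped to the unit.
        List<String> valid = List.of("A <sc id='1'/>bold", " text<ec rid='1'/> B<ph id='2'/>");
        List<String> invalid = List.of("A<ph id='1'/>", "B<ph id='1'/>");
        System.out.println(duplicateIds(valid));   // []
        System.out.println(duplicateIds(invalid)); // [1]
    }
}
```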

Page 8: Characters that are Invalid in XML

Some characters, for example most control characters, are not allowed in XML content, so they cannot be stored as-is in XLIFF. They are represented with the <cp> element instead:

<cp hex="0007"/> represents U+0007 (the "bell" character)

- Same notation as the Unicode LDML format

- Only characters that are invalid in XML must use this notation.
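A writer has to decide, code point by code point, whether text can be emitted as-is. This minimal sketch (the `CpEscaper` class is a hypothetical name) tests each code point against the XML 1.0 `Char` production and emits <cp hex="..."/> only for code points outside it, escaping `&`, `<` and `>` otherwise. Every code point that is invalid in XML 1.0 is at most U+FFFF, so four hex digits, as in the slide's example, are always enough.

```java
final class CpEscaper {
    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Escapes text for inline content: &, < and > become entity references and
    // code points invalid in XML become <cp hex="XXXX"/>. An unpaired surrogate is
    // returned by codePointAt as the surrogate value itself, so it is escaped too.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (isValidXmlChar(cp)) {
                sb.appendCodePoint(cp);
            } else {
                // Every invalid code point is at most U+FFFF: four hex digits suffice.
                sb.append(String.format("<cp hex=\"%04X\"/>", cp));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Ring\u0007 the bell & go"));
        // Ring<cp hex="0007"/> the bell &amp; go
    }
}
```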

Page 9: Inline Codes

• Support any type of native markup

• Standalone: <ph/>

• Spanning: <pc> and <sc/> + <ec/>

Page 10: Inline Codes - Use Cases

All possible cases:

• Standalone code: <ph id='1'/>

• Well-formed spanning code: <pc id='1'>text</pc>

• Start marker of spanning code: <sc id='1'/>

• End marker of spanning code: <ec rid='1'/>

• Orphan start marker of spanning code: <sc id='1' isolated='yes'/>

• Orphan end marker of spanning code: <ec id='1' isolated='yes'/>
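The pairing rules implied by these cases can be checked mechanically. A minimal sketch, assuming string input with quoted attribute values; `SpanningCodeCheck` is a hypothetical name. Every non-isolated <sc id='X'/> must be closed by a later <ec rid='X'/> within the unit content, while markers with isolated='yes' are skipped because their partner lives outside the unit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpanningCodeCheck {
    // A start or end marker: <sc .../> or <ec .../>.
    private static final Pattern MARKER = Pattern.compile("<(sc|ec)\\b([^>]*)/>");

    // Returns the value of the named attribute, or null if it is absent.
    static String attr(String attrs, String name) {
        Matcher m = Pattern.compile("\\b" + name + "=['\"]([^'\"]*)['\"]").matcher(attrs);
        return m.find() ? m.group(1) : null;
    }

    // Lists pairing problems in the content of one unit (all segments concatenated).
    static List<String> check(String unitContent) {
        Set<String> open = new LinkedHashSet<>();
        List<String> problems = new ArrayList<>();
        Matcher m = MARKER.matcher(unitContent);
        while (m.find()) {
            String attrs = m.group(2);
            if ("yes".equals(attr(attrs, "isolated"))) {
                continue; // orphan marker: its partner is outside this unit
            }
            if (m.group(1).equals("sc")) {
                open.add(attr(attrs, "id"));
            } else {
                String rid = attr(attrs, "rid");
                if (!open.remove(rid)) {
                    problems.add("<ec rid='" + rid + "'/> has no matching <sc/>");
                }
            }
        }
        for (String id : open) {
            problems.add("<sc id='" + id + "'/> is never closed");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("<sc id='1'/>bold<ph id='2'/>text<ec rid='1'/>")); // []
        System.out.println(check("<sc id='1'/>never closed"));
        System.out.println(check("<ec id='1' isolated='yes'/>orphan end"));       // []
    }
}
```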

Page 11: Inline Codes - Storage of Original

• No storage:

<source>A<ph id="1"/>B</source>

• Store, but only outside the segment:

<source>A<ph id="1" nid="d1"/>B</source>

<originalData>
  <data id="d1">&lt;BR></data>
</originalData>
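When codes are stored outside the segment, a merger has to put the native data back. The sketch below (a hypothetical `OriginalDataMerger`, regex-based, assuming the exact attribute layout of the example) reads the <data> entries, unescapes the predefined XML entities, and replaces each <ph nid="..."/> with its stored code. Codes without `nid` fall under the "no storage" case and are left unchanged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OriginalDataMerger {
    // <data id="X">...</data>, with the attribute layout used in the example.
    private static final Pattern DATA =
        Pattern.compile("<data id=\"([^\"]*)\">(.*?)</data>", Pattern.DOTALL);
    // A <ph/> carrying an nid attribute that points into <originalData>.
    private static final Pattern PH_WITH_NID =
        Pattern.compile("<ph\\b[^>]*?\\bnid=\"([^\"]*)\"[^>]*/>");

    // Unescapes the predefined XML entities; &amp; goes last so "&amp;lt;" becomes "&lt;".
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&apos;", "'")
                .replace("&amp;", "&");
    }

    // Indexes the native codes of an <originalData> element by data id.
    static Map<String, String> readOriginalData(String originalData) {
        Map<String, String> byId = new HashMap<>();
        Matcher m = DATA.matcher(originalData);
        while (m.find()) {
            byId.put(m.group(1), unescape(m.group(2)));
        }
        return byId;
    }

    // Replaces each <ph nid="X"/> with the native code stored under X; a code
    // whose data is missing, or that has no nid at all, is left unchanged.
    static String restore(String source, Map<String, String> data) {
        Matcher m = PH_WITH_NID.matcher(source);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String nativeCode = data.getOrDefault(m.group(1), m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(nativeCode));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = readOriginalData(
            "<originalData> <data id=\"d1\">&lt;BR></data>\n</originalData>");
        System.out.println(restore("A<ph id=\"1\" nid=\"d1\"/>B", data)); // A<BR>B
    }
}
```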

Page 12: Annotations

<mrk> for well-formed constructs

<sm/> + <em/> otherwise

Attributes:

• id (required)

• type (default=generic)

• translate (yes or no, default=yes)

• ref (optional type-specific URI)

• value (optional type-specific text/data)
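Defaults matter when writing annotations, since an attribute at its default value can simply be omitted. A minimal sketch with a hypothetical `MrkWriter.mrk` helper: it enforces the required `id` and writes `type` and `translate` only when they differ from `generic` and `yes`; `content` is assumed to be already-escaped inline content.

```java
final class MrkWriter {
    // Escapes the characters that cannot appear as-is in a double-quoted attribute.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Writes a <mrk> element. id is required; type and translate are written only
    // when they differ from their defaults (generic, yes); ref and value only when set.
    static String mrk(String id, String type, boolean translate,
                      String ref, String value, String content) {
        if (id == null || id.isEmpty()) {
            throw new IllegalArgumentException("id is required");
        }
        StringBuilder sb = new StringBuilder("<mrk id=\"").append(escapeAttr(id)).append('"');
        if (type != null && !type.equals("generic")) {
            sb.append(" type=\"").append(escapeAttr(type)).append('"');
        }
        if (!translate) {
            sb.append(" translate=\"no\"");
        }
        if (ref != null) {
            sb.append(" ref=\"").append(escapeAttr(ref)).append('"');
        }
        if (value != null) {
            sb.append(" value=\"").append(escapeAttr(value)).append('"');
        }
        return sb.append('>').append(content).append("</mrk>").toString();
    }

    public static void main(String[] args) {
        System.out.println(mrk("1", "generic", false, null, null, "content"));
        // <mrk id="1" translate="no">content</mrk>
    }
}
```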

Page 13: Annotation Types

• Translate annotations

• Term annotations

• Comment annotations

• Custom annotations

The IDs link the same annotation in source and target if needed.
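Since IDs are what link an annotation in the source to its counterpart in the target, a checker can list the IDs that have no counterpart. This is a sketch under simple assumptions: regex-based, annotations opened by <mrk> or <sm/>, and a hypothetical `AnnotationLinks` class. Because the link is only needed in some cases, an unmatched ID is something to review rather than an error.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnnotationLinks {
    // The id of a <mrk> or <sm/> annotation.
    private static final Pattern ANNOTATION_ID =
        Pattern.compile("<(?:mrk|sm)\\b[^>]*?\\bid=['\"]([^'\"]*)['\"]");

    // Collects annotation ids in document order.
    static Set<String> ids(String content) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = ANNOTATION_ID.matcher(content);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // Annotation ids present in 'from' but not in 'to'.
    static Set<String> unmatched(String from, String to) {
        Set<String> result = ids(from);
        result.removeAll(ids(to));
        return result;
    }

    public static void main(String[] args) {
        String source = "<mrk id=\"1\" type=\"term\">cat</mrk> and <mrk id=\"2\" translate=\"no\">Okapi</mrk>";
        String target = "<mrk id=\"1\" type=\"term\">chat</mrk> et Okapi";
        System.out.println(unmatched(source, target)); // [2]
        System.out.println(unmatched(target, source)); // []
    }
}
```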

Page 14: Translate Annotation

• To protect (or not) a span of content:

<mrk id="1" translate="no">content</mrk>

Note that the translate attribute can also be used with other types of annotations.
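One practical use of translate="no" is working out what a translator actually has to handle, for example for word counts. A minimal sketch, assuming non-nested annotations and double-quoted attributes; `TranslatableText` is a hypothetical name. Because translate can be combined with any annotation type, the pattern matches translate="no" regardless of the other attributes on the <mrk>.

```java
import java.util.regex.Pattern;

final class TranslatableText {
    // A <mrk> carrying translate="no", whatever its type; assumes no nesting.
    private static final Pattern PROTECTED =
        Pattern.compile("<mrk\\b[^>]*\\btranslate=\"no\"[^>]*>.*?</mrk>", Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Drops protected spans, strips the remaining markup and normalizes spaces.
    static String translatable(String content) {
        String s = PROTECTED.matcher(content).replaceAll(" ");
        s = TAG.matcher(s).replaceAll("");
        return s.trim().replaceAll("\\s+", " ");
    }

    // Counts space-separated words in the translatable text.
    static int wordCount(String content) {
        String s = translatable(content);
        return s.isEmpty() ? 0 : s.split(" ").length;
    }

    public static void main(String[] args) {
        String src = "Click <mrk id=\"1\" translate=\"no\">Save</mrk> to <ph id=\"2\"/>continue.";
        System.out.println(translatable(src)); // Click to continue.
        System.out.println(wordCount(src));    // 3
    }
}
```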

Page 15: Term Annotation

• To denote a "term":

<mrk id="1" type="term" value="simple definition" ref="reference to more info">content</mrk>

The id links source and target if needed.
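Term annotations can be harvested, for instance to seed a glossary. The sketch below (a hypothetical `TermExtractor`, regex-based, assuming non-nested annotations and double-quoted attributes) collects the text, `value` and `ref` of each <mrk> whose type is `term`. The example.com URL is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TermExtractor {
    // A non-nested <mrk ...>content</mrk>.
    private static final Pattern MRK =
        Pattern.compile("<mrk\\b([^>]*)>(.*?)</mrk>", Pattern.DOTALL);

    static final class Term {
        final String id, text, value, ref;

        Term(String id, String text, String value, String ref) {
            this.id = id;
            this.text = text;
            this.value = value;
            this.ref = ref;
        }

        @Override
        public String toString() {
            return text + ": " + value + " (" + ref + ")";
        }
    }

    // Returns the value of a double-quoted attribute, or null if it is absent.
    static String attr(String attrs, String name) {
        Matcher m = Pattern.compile("\\b" + name + "=\"([^\"]*)\"").matcher(attrs);
        return m.find() ? m.group(1) : null;
    }

    // Collects every annotation whose type is "term".
    static List<Term> terms(String content) {
        List<Term> result = new ArrayList<>();
        Matcher m = MRK.matcher(content);
        while (m.find()) {
            String attrs = m.group(1);
            if ("term".equals(attr(attrs, "type"))) {
                result.add(new Term(attr(attrs, "id"), m.group(2),
                                    attr(attrs, "value"), attr(attrs, "ref")));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "Press <mrk id=\"1\" type=\"term\" value=\"simple definition\""
            + " ref=\"http://example.com/glossary#t1\">content</mrk> now.";
        System.out.println(terms(source));
        // [content: simple definition (http://example.com/glossary#t1)]
    }
}
```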

Page 16: Comment Annotation

• Simple:

<source><mrk id="1" type="comment" value="The text of the comment">content</mrk></source>

• With associated note:

<source><mrk id="1" type="comment" ref="#n1">content</mrk></source>

<notes>
  <note id="n1">Text of the note</note>
</notes>
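A tool that displays comments has to handle both forms. A minimal sketch with a hypothetical `CommentResolver`: it prefers `value` and otherwise follows a local "#id" reference to the <note> with that id, as in the second example. How other kinds of `ref` values would be resolved is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentResolver {
    // A <note id="X">text</note> element.
    private static final Pattern NOTE =
        Pattern.compile("<note\\b[^>]*?\\bid=\"([^\"]*)\"[^>]*>(.*?)</note>", Pattern.DOTALL);

    // Indexes the <note> elements of a <notes> block by id.
    static Map<String, String> notes(String notesXml) {
        Map<String, String> byId = new HashMap<>();
        Matcher m = NOTE.matcher(notesXml);
        while (m.find()) {
            byId.put(m.group(1), m.group(2));
        }
        return byId;
    }

    // The comment text: the value attribute if present, otherwise the note
    // that a local "#id" reference in ref points to; null if neither applies.
    static String commentText(String value, String ref, Map<String, String> notesById) {
        if (value != null) {
            return value;
        }
        if (ref != null && ref.startsWith("#")) {
            return notesById.get(ref.substring(1));
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> index = notes("<notes>\n<note id=\"n1\">Text of the note</note></notes>");
        System.out.println(commentText(null, "#n1", index));                      // Text of the note
        System.out.println(commentText("The text of the comment", null, index)); // The text of the comment
    }
}
```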

Page 17: Custom Annotation

• User-defined annotation:
  - The type attribute has the form <prefix>:<userType>
  - The meanings of the value and ref attributes are defined by the user.

<mrk id="1" type="myPrefix:isbn" value="978-0-14-044919-8">The Epic of Gilgamesh</mrk>
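Validating a `type` value then comes down to accepting either a built-in value or the <prefix>:<userType> form. A minimal sketch, assuming the built-in set is exactly the values shown in these slides (`generic`, `comment`, `term`) and using a simplified ASCII name syntax for both parts; `AnnotationType` is a hypothetical name.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class AnnotationType {
    // Built-in values of mrk@type as used in these slides (an assumption).
    private static final Set<String> BUILT_IN = Set.of("generic", "comment", "term");
    // <prefix>:<userType>, with a simplified ASCII name syntax for both parts.
    private static final Pattern CUSTOM =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_.-]*):([A-Za-z_][A-Za-z0-9_.-]*)");

    static boolean isCustom(String type) {
        return CUSTOM.matcher(type).matches();
    }

    static boolean isValid(String type) {
        return BUILT_IN.contains(type) || isCustom(type);
    }

    // The prefix of a custom type, or null for built-in or invalid values.
    static String prefix(String type) {
        return isCustom(type) ? type.substring(0, type.indexOf(':')) : null;
    }

    public static void main(String[] args) {
        System.out.println(isValid("myPrefix:isbn")); // true
        System.out.println(prefix("myPrefix:isbn"));  // myPrefix
        System.out.println(isValid("isbn"));          // false: custom types need a prefix
    }
}
```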

Page 18: Extensions

• A few attributes can take user-defined values, e.g. mrk@type, ph@type, pc@type

• No additional attributes are allowed in any of the inline elements

• No additional elements are allowed inside <source>, <target> or <data>

Custom annotations are essentially the only way to extend markup inside the inline content.

Page 19: Processing Requirements

• Allowed markup transforms between <pc> and an <sc>/<ec> pair, and the related attribute mapping

• Requirements for the creation and editing of target text

• Rules on cloning markup, with and without reference to native data

• Stricter rules on attributes and ID references

• How to handle segmentation changes
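The first requirement, transforming a <pc> into an <sc>/<ec> pair, can be sketched as a rewrite from the innermost <pc> outwards. This is an illustration under stated assumptions, not the specified mapping: `PcTransform` is a hypothetical name, all attributes of the <pc> are copied onto the <sc/>, and the <ec/> gets only `rid`, following the use-case syntax shown earlier. Deciding which attributes belong on which marker is exactly what these processing requirements are meant to pin down; the reverse transform is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PcTransform {
    // An innermost <pc ...>...</pc>: its content contains no other <pc> start tag.
    private static final Pattern PC =
        Pattern.compile("<pc\\b([^>]*)>((?:(?!<pc\\b).)*?)</pc>", Pattern.DOTALL);
    // The id attribute and the quote character used around its value.
    private static final Pattern ID = Pattern.compile("\\bid=(['\"])([^'\"]*)\\1");

    // Rewrites every <pc id='X' ...>text</pc> as <sc id='X' .../>text<ec rid='X'/>,
    // repeating until no <pc> is left so that nested spans are handled.
    static String pcToScEc(String content) {
        String previous;
        do {
            previous = content;
            Matcher m = PC.matcher(content);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                String attrs = m.group(1);
                Matcher id = ID.matcher(attrs);
                if (!id.find()) {
                    throw new IllegalArgumentException("<pc> without id: " + m.group());
                }
                String q = id.group(1);
                String replacement = "<sc" + attrs + "/>" + m.group(2)
                    + "<ec rid=" + q + id.group(2) + q + "/>";
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            content = sb.toString();
        } while (!content.equals(previous));
        return content;
    }

    public static void main(String[] args) {
        System.out.println(pcToScEc("<pc id='1'>a<pc id='2'>b</pc>c</pc>"));
        // <sc id='1'/>a<sc id='2'/>b<ec rid='2'/>c<ec rid='1'/>
    }
}
```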

Page 20: XLIFF Toolkit - A Library and More

• Java-based and open source (LGPL)

• http://code.google.com/p/okapi-xliff-toolkit/

• Stream-based rather than DOM-based, to handle very large documents

• Event-driven reader

• Each unit is available as a single object

• Writer also available

Page 21: Library - Reading a Document

import java.io.File;

XLIFFReader reader = new XLIFFReader();
reader.open(new File("myInput.xlf"));
// Pull events one at a time; only text units are processed here
while ( reader.hasNext() ) {
    XLIFFEvent event = reader.next();
    if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
        Unit unit = event.getUnit();
        // Do something with the unit
    }
}
reader.close();

Page 22: Library - Updating a Document

import java.io.File;

XLIFFReader reader = new XLIFFReader();
XLIFFWriter writer = new XLIFFWriter();
reader.open(new File("myInput.xlf"));
writer.create(new File("myOutput.xlf"));
while ( reader.hasNext() ) {
    XLIFFEvent event = reader.next();
    if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
        Unit unit = event.getUnit();
        // Do something with the unit
    }
    // Write every event, modified or not, to the output document
    writer.write(event);
}
reader.close();
writer.close();

Page 23: Q & A

Useful links

• Read the latest Editor's Draft: https://wiki.oasis-open.org/xliff/

• Comment or ask questions on the mailing lists:
  o https://lists.oasis-open.org/archives/xliff-comment/
  o https://lists.oasis-open.org/archives/xliff-users/

• Try out the toolkit: http://code.google.com/p/okapi-xliff-toolkit/