23rd Internationalization and Unicode Conference, Prague, Czech Republic – March, 2003

Common XML Locale Repository

Dr. Mark Davis

[email protected]

Steven R. Loomis

[email protected]

IBM San José Globalization Center of Competency

Copyright © 2003 IBM Corporation

Prague, Czech Republic, March 2003 – Common XML Locale Repository

Locale Data Confusion

Variations in localized data can irritate or confuse users…

OS #1: 2003-02-17 (févr. )

OS #2: 03-02-17 (fév)

Locale Data Problems

But over a network, mismatched data can be catastrophic.

OS #1: 2003-02-17 (févr. )

OS #2: 03-02-17 (fév)
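A minimal Python sketch of the failure, assuming (hypothetically) that OS #2 sends a two-digit year-month-day string and the receiving system reads the same string as day-month-year:

```python
from datetime import date, datetime

sent = date(2003, 2, 17)

# OS #2 style (assumed): two-digit year, then month, then day
wire = sent.strftime("%y-%m-%d")  # "03-02-17"

# A receiver that assumes day-month-year still gets a *valid* date, just the wrong one
misread = datetime.strptime(wire, "%d-%m-%y").date()

print(wire, "->", misread)
```

Both sides succeed without error, so nothing flags the mismatch: the receiver silently lands on 3 February 2017.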

What is Locale Data?

• Locale = identifier string referring to linguistic and cultural preferences

• Typical data
  – Dates/times
  – Numbers
  – Measurement
  – Currency
  – Sorting (Collation)
  – Translated country and language names

Where is locale data found?

• International Components for Unicode (ICU)

• OpenOffice.org

• Operating Systems
  – Linux, Solaris, AIX, Windows, …

• Java

Common XML Locale Repository Team

• Li18nux is now OpenI18N (part of the Free Standards Group)

– Linux Application Development Environment subgroup

• Common XML Locale Repository project

http://www.openi18n.org/subgroups/lade/locale/

Repository Objectives

• Common XML format for locale data

• Collect data from platforms

• Make repository available to the public

• Validate and release corrected data

• Enable W3C Web Services
  – Exchange and display of data in localized form

Repository Features

• Version controlled database

• HTTP-based: browsing or custom tools

• Compare data between platforms
  – (Comparisons available now)

Repository Structure

• Contents
  – Common
  – ICU
  – OpenOffice.org
  – Windows?
  – …

• Migrate to Common over time

Locale Data Markup Language

• XML “vocabulary” for locale data interchange

• Data stored in separate files, one per locale (e.g. fr.xml or cs_CZ.xml)

• Inheritance: ‘root.xml’ for the root locale, ‘fr.xml’ for French, ‘fr_CA.xml’ for French (Canada)
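The slides do not spell out the lookup algorithm; assuming a POSIX-style truncation that drops one underscore-separated subtag at a time, the chain of files consulted for a locale can be sketched in Python as follows (`fallback_chain` is a hypothetical helper):

```python
def fallback_chain(locale_id):
    """Return the files consulted, most specific first, ending at root.xml."""
    files = []
    parts = locale_id.split("_")
    while parts:
        files.append("_".join(parts) + ".xml")
        parts.pop()  # drop the last subtag: fr_CA -> fr
    files.append("root.xml")
    return files

print(fallback_chain("fr_CA"))
```

For `fr_CA` this yields `fr_CA.xml`, `fr.xml`, `root.xml`; a value missing from the more specific file is taken from the next one down the list.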

Locale Naming

• ISO-639 + ISO-3166 + Variant (POSIX-like)
  – en: English
  – fr_BE: French in Belgium
  – de_DE: German in Germany
  – sv_FI_AALAND: Swedish in Finland (Åland region)

de_DE@collation=phonebook,currency=pre-euro

German in Germany, Phonebook collation, pre-Euro Currency.
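A minimal parsing sketch in Python, assuming the part after `@` is a comma-separated list of `key=value` pairs as in the example above (`parse_locale_id` is a hypothetical name, not an API from the repository):

```python
def parse_locale_id(locale_id):
    """Split e.g. 'de_DE@collation=phonebook' into subtags and keywords."""
    base, _, keyword_part = locale_id.partition("@")
    subtags = base.split("_")  # language, then territory, then variant
    keywords = {}
    for item in filter(None, keyword_part.split(",")):  # skip empty pieces
        key, _, value = item.partition("=")
        keywords[key] = value
    return subtags, keywords

print(parse_locale_id("de_DE@collation=phonebook,currency=pre-euro"))
```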

<identity> Element

<localeData>
  <identity>
    <version number="1.1">Various notes and changes</version>
    <generation date="2002-08-28"/>
    <language type="sv"/>
    <territory type="FI"/>
    <variant type="AALAND"/>
  </identity>
</localeData>
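Because the identity is spelled out element by element, a consumer can rebuild the locale name from it. A sketch using Python's standard `xml.etree.ElementTree`, assuming `territory` and `variant` may be absent (`locale_id_from_identity` is hypothetical):

```python
import xml.etree.ElementTree as ET

IDENTITY_XML = """<localeData>
  <identity>
    <version number="1.1">Various notes and changes</version>
    <generation date="2002-08-28"/>
    <language type="sv"/>
    <territory type="FI"/>
    <variant type="AALAND"/>
  </identity>
</localeData>"""

def locale_id_from_identity(xml_text):
    """Join the language/territory/variant type attributes with underscores."""
    identity = ET.fromstring(xml_text).find("identity")
    parts = []
    for tag in ("language", "territory", "variant"):
        element = identity.find(tag)
        if element is not None:  # territory and variant are optional
            parts.append(element.get("type"))
    return "_".join(parts)

print(locale_id_from_identity(IDENTITY_XML))  # sv_FI_AALAND
```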

Inheritance

fr
• Janvier, Février…
• 1,234.56
• …

fr_CA
• 1 234,57 $
• …

fr_LU
• 1.234,57 €
• …
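The idea is that a child locale stores only what differs from its parent. A minimal in-memory sketch in Python; the `DATA` table and its keys are hypothetical stand-ins for the XML files, not the repository's real layout:

```python
# Hypothetical data: each locale stores only what it overrides.
DATA = {
    "root": {"decimal": ".", "group": ","},
    "fr": {"month.1": "Janvier", "month.2": "Février"},
    "fr_CA": {"decimal": ",", "group": " ", "currency": "$"},
}

def parent(locale_id):
    """fr_CA -> fr -> root"""
    return locale_id.rsplit("_", 1)[0] if "_" in locale_id else "root"

def lookup(locale_id, key):
    """Walk up the inheritance chain until some locale defines the key."""
    while True:
        values = DATA.get(locale_id, {})
        if key in values:
            return values[key]
        if locale_id == "root":
            raise KeyError(key)
        locale_id = parent(locale_id)

print(lookup("fr_CA", "month.2"), lookup("fr_CA", "decimal"))
```

`fr_CA` gets its month names from `fr` and its own separators, while `fr` itself falls through to `root` for numbers.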

Aliasing

zh (Chinese)

• zh_CN: Simplified
• zh_TW: Traditional
• zh_HK: Traditional (aliased to zh_TW)

<alias> element

<localeData>
  <identity>
    <language type="zh"/>
    <territory type="HK"/>
  </identity>
  <collations>
    <alias source="zh_TW"/>
  </collations>
</localeData>
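A consumer resolves an alias by loading the source locale instead. A Python sketch with a hypothetical in-memory `REPO` (the `stroke` collation content is invented for illustration), including a hop limit in case aliases ever form a cycle:

```python
import xml.etree.ElementTree as ET

# Hypothetical repository: locale id -> LDML text, trimmed to <collations>
REPO = {
    "zh_TW": "<localeData><collations><collation type='stroke'/></collations></localeData>",
    "zh_HK": "<localeData><collations><alias source='zh_TW'/></collations></localeData>",
}

def resolve_collations(locale_id, max_hops=10):
    """Follow <alias source=...> until a locale with real collation data is found."""
    for _ in range(max_hops):  # guard against alias cycles
        collations = ET.fromstring(REPO[locale_id]).find("collations")
        alias = collations.find("alias")
        if alias is None:
            return locale_id, collations
        locale_id = alias.get("source")
    raise ValueError("alias chain too long")

resolved_id, collations = resolve_collations("zh_HK")
print(resolved_id)  # zh_TW
```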

type attribute

cs_CZ

<numberFormatStyle type="decimal"> → 1 234,57

<numberFormatStyle type="percent"> → 123%

type attribute in Locale

<numberFormatStyle type="percent"> → 123%

selected by the locale keyword: cs_CZ@numberFormatStyle=percent
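A Python sketch of how a keyword could pick out the matching `type` element. It assumes a simplified structure (the `<pattern>` sits directly under `<numberFormatStyle>`, skipping `<numberFormat>`) and a default of `decimal` when no keyword is given; `select_style` is hypothetical:

```python
import xml.etree.ElementTree as ET

NUMBERS_XML = """<numberFormats>
  <numberFormatStyle type="decimal"><pattern>#,##0.###</pattern></numberFormatStyle>
  <numberFormatStyle type="percent"><pattern>#,##0%</pattern></numberFormatStyle>
</numberFormats>"""

def select_style(locale_id, xml_text, default="decimal"):
    """Return the pattern whose type matches the numberFormatStyle keyword."""
    _, _, keywords = locale_id.partition("@")
    wanted = default
    for item in filter(None, keywords.split(",")):
        key, _, value = item.partition("=")
        if key == "numberFormatStyle":
            wanted = value
    for style in ET.fromstring(xml_text).iter("numberFormatStyle"):
        if style.get("type") == wanted:
            return style.findtext("pattern")
    raise LookupError(wanted)

print(select_style("cs_CZ@numberFormatStyle=percent", NUMBERS_XML))  # #,##0%
```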

Standard Keys/Types

• Collation: Traditional, Pinyin, Stroke, Direct (Hindi), posix

• Currency: Pre-Euro

• Calendar: Gregorian, Arabic (Religious and Civil), Chinese, Hebrew, Japanese, Thai (Buddhist)

draft and standard

• Unverified data may be marked with draft="true": <localeData draft="true">

• Standard-conforming data may be marked with standard=…
  – Name: <collation standard="MSA 200:2002">
  – URL: <dateFormatStyle standard="ISO 8601, http://www.iso.ch/iso/…CatalogueDetail?…ICS3=30,DIN 5008">
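A validation tool might start by listing every element flagged as unverified. A Python sketch over a hypothetical sample document (the `CZK` currency block is invented for illustration); it reports only elements carrying the attribute themselves, leaving it to the caller to treat their descendants as unverified too:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<localeData>
  <numbers>
    <symbols><decimal>,</decimal></symbols>
    <currencies draft="true"><currency type="CZK"><symbol>Kč</symbol></currency></currencies>
  </numbers>
</localeData>"""

def draft_paths(xml_text):
    """Yield slash-separated paths of elements marked draft="true"."""
    def walk(element, path):
        here = f"{path}/{element.tag}"
        if element.get("draft") == "true":
            yield here
        for child in element:
            yield from walk(child, here)
    yield from walk(ET.fromstring(xml_text), "")

print(list(draft_paths(SAMPLE)))  # ['/localeData/numbers/currencies']
```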

Data Access

• Normal HTTP request

http://openi18n.org/locale/icu/de_DE.xml?version=2.2&currency=pre-euro

• Accessible by web browser or programmatically.
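Programmatic access amounts to building the same URL. A Python sketch that only constructs the request (it does not fetch; the 2003 server may no longer answer), assuming the `/locale/<source>/<locale>.xml` path layout from the example above; `locale_url` is hypothetical:

```python
from urllib.parse import urlencode

BASE = "http://openi18n.org/locale"  # taken from the example URL

def locale_url(source, locale_id, **params):
    """Build a request such as .../icu/de_DE.xml?version=2.2&currency=pre-euro"""
    url = f"{BASE}/{source}/{locale_id}.xml"
    query = urlencode(params)  # keyword order is preserved
    return f"{url}?{query}" if query else url

url = locale_url("icu", "de_DE", version="2.2", currency="pre-euro")
print(url)
```

The result could then be passed to `urllib.request.urlopen` and the body parsed as LDML.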

Calendars

• Non-Gregorian calendars supported; Gregorian data is the ‘root’ for inheritance

• Calendars distinguished by ‘class’ (class="japanese", class="arabic", …)

<calendars>

<calendar class="gregorian">
  <monthNames>
    <month type="1">January</month>
    <month type="2">February</month>
  </monthNames>
  <dayNames>
    <day type="sun">Sunday</day>
    <day type="mon">Monday</day>
  </dayNames>

<calendars> (cont’d)

<dateFormats>
  <default type="medium"/>
  <dateFormatStyle type="full">
    <dateFormat>
      <pattern>EEEE, MMMM d, yyyy</pattern>
    </dateFormat>
  </dateFormatStyle>

  <dateFormatStyle type="medium">
    <default type="DateFormatsKey2"/>
    <dateFormat type="DateFormatsKey2">
      <pattern>MMM d, yyyy</pattern>
    </dateFormat>
    <dateFormat type="DateFormatsKey3">
      <pattern>MMM dd, yyyy</pattern> …
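To show how these patterns are read, here is a Python sketch that expands only the letters used above (`y`, `M`, `d`, `E`); it is not a full pattern implementation (no quoted literals, no other fields). It uses the English names from the `<monthNames>`/`<dayNames>` sample and assumes, hypothetically, that abbreviations are the first three letters:

```python
from datetime import date
from itertools import groupby

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def format_date(pattern, d):
    """Expand runs of y, M, d, E; any other character is copied literally."""
    out = []
    for letter, run in groupby(pattern):
        width = len(list(run))
        if letter == "y":
            out.append(f"{d.year % 100:02d}" if width == 2 else str(d.year))
        elif letter == "M":
            name = MONTHS[d.month - 1]
            if width >= 4:
                out.append(name)             # MMMM: full name
            elif width == 3:
                out.append(name[:3])         # MMM: abbreviation (assumed)
            else:
                out.append(f"{d.month:0{width}d}")  # M or MM: number
        elif letter == "d":
            out.append(f"{d.day:0{width}d}")
        elif letter == "E":
            name = DAYS[d.weekday()]
            out.append(name if width >= 4 else name[:3])
        else:
            out.append(letter * width)
    return "".join(out)

print(format_date("EEEE, MMMM d, yyyy", date(2003, 2, 17)))  # Monday, February 17, 2003
print(format_date("MMM dd, yyyy", date(2003, 2, 5)))         # Feb 05, 2003
```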

<calendars> <eras> (gregorian, continued)

  <eras>
    <eraAbbr>
      <era type="0">BC</era>
      <era type="1">AD</era>
    </eraAbbr>
  </eras>
</calendar>

<calendar class="japanese">
  <eras>
    <eraAbbr>
      <era type="0">Taika</era>
      <era type="1">Hakuchi</era>
    </eraAbbr>
  </eras>
</calendar>
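Since Gregorian data is the root for calendar inheritance, the Japanese calendar above only needs to supply its eras. A Python sketch with hypothetical in-memory data mirroring the two samples:

```python
# Hypothetical per-calendar data; only gregorian carries month names here.
CALENDARS = {
    "gregorian": {"months": ["January", "February"], "eras": ["BC", "AD"]},
    "japanese": {"eras": ["Taika", "Hakuchi"]},
}

def calendar_field(calendar_class, field):
    """Use the calendar's own data, else fall back to the gregorian 'root'."""
    own = CALENDARS.get(calendar_class, {})
    if field in own:
        return own[field]
    return CALENDARS["gregorian"][field]

print(calendar_field("japanese", "eras"))    # ['Taika', 'Hakuchi']
print(calendar_field("japanese", "months"))  # inherited from gregorian
```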

<numbers>

• <symbols> - digits, separators, signs

• <numberFormats> - Patterns

• <currencies> - Monetary patterns, symbols

<symbols>

<symbols>
  <decimal>.</decimal>
  <group>,</group>
  <list>;</list>
  <percentSign>%</percentSign>
  <nativeZeroDigit>0</nativeZeroDigit>
  <patternDigit>#</patternDigit>
  <plusSign>+</plusSign>
  <minusSign>-</minusSign>
  <exponential>E</exponential>
  <perMille>‰</perMille>
  <infinity>∞</infinity>
  <nan>_</nan>
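Patterns are written with the root symbols, so a formatter can produce the root form first and then substitute the locale's symbols. A Python sketch using `str.translate` (which maps every character in a single pass, so `,` and `.` never interfere); the `CS_LIKE` values are hypothetical, modelled on the cs_CZ sample "1 234,57" with a no-break space assumed as the grouping separator:

```python
def localize_digits(formatted, symbols):
    """Swap root symbols (',' group, '.' decimal, '-' minus) for localized ones."""
    table = str.maketrans({
        ",": symbols["group"],
        ".": symbols["decimal"],
        "-": symbols["minusSign"],
    })
    return formatted.translate(table)

ROOT = {"decimal": ".", "group": ",", "minusSign": "-"}
CS_LIKE = {"decimal": ",", "group": "\u00a0", "minusSign": "-"}  # hypothetical

print(localize_digits(f"{-1234.57:,.2f}", CS_LIKE))
```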

<numberFormats>

<numberFormats>
  <numberFormatStyle type="decimal">
    <numberFormat type="long">
      <pattern type="positive">#,##0.###</pattern>
      <pattern type="negative">-#,##0.###</pattern>
    </numberFormat>
  </numberFormatStyle>

  <numberFormatStyle type="percent">
    <numberFormat type="short">
      <pattern type="positive">#,##0%</pattern>
    </numberFormat>
  </numberFormatStyle>
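In these patterns `0` is a required digit, `#` an optional one, `,` turns on grouping and a trailing `%` multiplies by 100. A Python sketch handling only that subset (it hard-codes `-` for negatives rather than reading the negative subpattern; `apply_pattern` is hypothetical):

```python
def apply_pattern(value, pattern):
    """Format value with a tiny subset of the pattern syntax: , 0 # and a % suffix."""
    percent = pattern.endswith("%")
    if percent:
        value *= 100  # '%' suffix: show the value as a percentage
        pattern = pattern[:-1]
    integer, _, fraction = pattern.partition(".")
    min_frac = fraction.count("0")             # '0' = required digit
    max_frac = min_frac + fraction.count("#")  # '#' = optional digit
    grouping = "," if "," in integer else ""
    text = f"{abs(value):{grouping}.{max_frac}f}"
    whole, _, frac = text.partition(".")
    frac = frac[:min_frac] + frac[min_frac:].rstrip("0")  # drop unused optional digits
    text = whole + ("." + frac if frac else "")
    return ("-" if value < 0 else "") + text + ("%" if percent else "")

print(apply_pattern(1234.5, "#,##0.###"))  # 1,234.5
print(apply_pattern(1.23, "#,##0%"))       # 123%
```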

<numberFormats> currency

  <numberFormatStyle type="currency">
    <numberFormat type="medium">
      <pattern type="positive">¤#,##0.00;</pattern>
      <pattern type="negative">(¤#,##0.00)</pattern>
    </numberFormat>
  </numberFormatStyle>
</numberFormats>

<currencies>

<currencies>
  <default type="USD"/>
  <currency type="USD">
    <displayName>dollar</displayName>
    <symbol>$</symbol>
  </currency>
  <currency type="JPY">
    <displayName>yen</displayName>
    <symbol>¥</symbol>
  </currency>
</currencies>
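Combining the currency patterns with this data: a Python sketch assuming the patterns mark the symbol slot with `¤` (U+00A4), as ICU patterns do, and that negatives use the parenthesized form. It ignores per-currency fraction digits (the yen is normally shown without decimals), which real data would also need; `format_currency` is hypothetical:

```python
# Currency data mirroring the <currencies> sample.
CURRENCIES = {"USD": {"displayName": "dollar", "symbol": "$"},
              "JPY": {"displayName": "yen", "symbol": "¥"}}

# Assumed ICU-style patterns: ¤ marks where the currency symbol goes.
POSITIVE = "¤#,##0.00"
NEGATIVE = "(¤#,##0.00)"

def format_currency(amount, code="USD"):
    """Fill the ¤ slot with the symbol; negatives use the parenthesized pattern."""
    pattern = NEGATIVE if amount < 0 else POSITIVE
    digits = f"{abs(amount):,.2f}"  # the #,##0.00 part: grouping, two decimals
    return pattern.replace("#,##0.00", digits).replace("¤", CURRENCIES[code]["symbol"])

print(format_currency(1234.5))   # $1,234.50
print(format_currency(-1234.5))  # ($1,234.50)
```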

<collations>

• ‘root’ locale behavior = UCA (Unicode Collation Algorithm)

• Sub-locales defined as tailorings of the UCA

<collations>: Swedish

<collation>
  <base UCA='3.1.1'/>
  <settings caseLevel="on"/>
  <rules>
    <reset>Z</reset>
    <p>æ</p>
    <t>Æ</t>
    <t>aa</t>
    <t>aA</t>
    <t>Aa</t>
    <t>AA</t>
    ...
  </rules>
</collation>
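To make the rules concrete: `<reset>Z</reset><p>æ</p>` puts æ one primary step after Z, and each `<t>` adds a tertiary (case-like) variant at the same primary position, so æ, Æ, aa, aA, Aa, AA all sort together after z. The Python sketch below is a toy (primary, tertiary) key over ASCII letters only. It is not UCA and ignores `caseLevel`; it only shows what the listed rules say:

```python
# Tailored tokens in rule order: <p>æ</p> <t>Æ</t> <t>aa</t> <t>aA</t> <t>Aa</t> <t>AA</t>
TAILORED = ["æ", "Æ", "aa", "aA", "Aa", "AA"]
AFTER_Z = 26  # <reset>Z</reset>: the tailored group sorts right after z

def collation_key(word):
    """Toy key: letters by alphabet position, tailored tokens after z."""
    primary, tertiary = [], []
    i = 0
    while i < len(word):
        # Prefer two-character contractions (aa, aA, ...) over single characters
        token = word[i:i + 2] if word[i:i + 2] in TAILORED else word[i]
        if token in TAILORED:
            primary.append(AFTER_Z)
            tertiary.append(TAILORED.index(token))  # æ < Æ < aa < aA < Aa < AA
        else:
            lower = token.lower()
            primary.append(ord(lower) - ord("a") if "a" <= lower <= "z" else 100 + ord(lower))
            tertiary.append(1 if token.isupper() else 0)  # lowercase first
        i += len(token)
    return primary, tertiary

words = ["Zebra", "ærlig", "apa", "Aalborg"]
print(sorted(words))                     # code point order: ['Aalborg', 'Zebra', 'apa', 'ærlig']
print(sorted(words, key=collation_key))  # tailored: ['apa', 'Zebra', 'Aalborg', 'ærlig']
```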

<special>

• Can appear anywhere in the locale

• Denotes data that is not part of the LDML specification.

• Used to store data specific to OpenOffice.org, ICU, or other sources
  – Single source.

<special> example

<special owner="http://oss.software.ibm.com/icu/">
  <transforms>
    <transform type="Latin">
      &lt;&gt; a ; &lt;&gt; v ;
    </transform>
  </transforms>
</special>
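Because `<special>` can appear anywhere, a consumer that understands only some owners can prune the rest before processing. A Python sketch; the second owner URL and the sample content are hypothetical. It collects the removals first so the tree is not modified while being iterated:

```python
import xml.etree.ElementTree as ET

LOCALE_XML = """<localeData>
  <special owner="http://oss.software.ibm.com/icu/"><transforms/></special>
  <numbers>
    <special owner="http://example.org/other-tool/"><settings/></special>
  </numbers>
</localeData>"""

def strip_foreign_specials(xml_text, understood_owners):
    """Drop <special> blocks, at any depth, whose owner is not understood."""
    root = ET.fromstring(xml_text)
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag == "special" and child.get("owner") not in understood_owners]
    for parent, child in doomed:
        parent.remove(child)
    return root

root = strip_foreign_specials(LOCALE_XML, {"http://oss.software.ibm.com/icu/"})
print([s.get("owner") for s in root.iter("special")])
```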

Other Elements

• <displayName>
• <localizedPatternChars>
• <timeZoneNames>
• <delimiters>
• <encodings>
• <layout>
• <localeDisplayNames>
• <measurement>

Open Issues

• Vetting process not defined

• Versioning and release of Repository not finalized

Current Status

• LDML 1.0 Specification released and approved by the OpenI18N steering committee

• Preliminary data available via CVS (source code repository)

• Newsgroup available for discussions

• Database available for reporting bugs or feature requests

For More Information

• http://www.openi18n.org/subgroups/lade/locale/