Internationalizing JavaScript Applications
Norbert Lindenberg
© Norbert Lindenberg 2012. All rights reserved.
ECMAScript
• Language Specification
• Developed by Ecma TC 39
• Language syntax and semantics
• Core API: Object, String, Array, RegExp, ...
• 5.1 current
• 6 expected December 2013
ECMAScript
• Internationalization API Specification
• Developed by Ecma TC 39 + experts
• Collation, number, date & time formatting
• Started fall 2010
• Specification stable
• Implementations and test suite in progress
• Approval expected December 2012
JavaScript Environments
• Web browsers: with DOM, XHR
• Servers: Node
• Platforms: Firefox OS, Metro Windows 8-style UI, Phonegap
• Libraries: jQuery, Dojo, YUI, GWT, +++++
Collation
Collation (Sorting)
• Old: String.prototype.localeCompare
• Only string argument
• New: Intl.Collator
• locales
• options
• Fixed: String.prototype.localeCompare
• With locales and options arguments
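A minimal sketch of the old and new entry points, assuming an engine that implements the Internationalization API. The word list is made up for illustration, and "en" is chosen only because it is widely available.

```javascript
// Hypothetical sample data.
const words = ["banana", "apple", "Cherry"];

// Default sort() compares UTF-16 code units, so uppercase "C" sorts before lowercase letters.
const byCodeUnit = [...words].sort();

// New: a reusable Intl.Collator created with an explicit locale.
const collator = new Intl.Collator("en");
const byCollator = [...words].sort(collator.compare);

// Fixed: localeCompare accepting locales and options arguments.
const viaLocaleCompare = "a".localeCompare("B", "en", { sensitivity: "base" });

console.log(byCodeUnit);           // [ 'Cherry', 'apple', 'banana' ]
console.log(byCollator);           // [ 'apple', 'banana', 'Cherry' ]
console.log(viaLocaleCompare < 0); // true
```

Creating the collator once and reusing its compare function is usually preferable to calling localeCompare with arguments inside a sort callback.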
Locales
• BCP 47 language tags
• Language, script, country codes
• “es”, “en-AU”, “zh-Hans-CN”
• Unicode locale extension
• “de-u-co-phonebk”
• Preference lists
• [“mr”, “hi”, “en-IN”]
Locale Negotiation
• BCP 47 Lookup
• [“es-GT”, “es-MX”] → “es-GT”, “es”, “es-MX”
• Best fit
• implementation defined
• [“es-GT”, “es-MX”] → “es-GT”, “es-MX”, “es”
• Unicode extension handled separately
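Negotiation can be observed from script. The sketch below assumes an engine with the Internationalization API; the results depend on the locale data the engine ships with, so no particular output is implied.

```javascript
// Preference list taken from the slide.
const requested = ["es-GT", "es-MX"];

// Which of the requested tags does this implementation support for collation?
const supported = Intl.Collator.supportedLocalesOf(requested, { localeMatcher: "lookup" });

// Which locale did negotiation settle on for an actual collator?
const resolvedLocale = new Intl.Collator(requested, { localeMatcher: "best fit" })
  .resolvedOptions().locale;

console.log(supported, resolvedLocale);
```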
Collator Extensions
• co: collation – phonebook, pinyin, ...
• kf: case first – upper, lower
• kn: numeric sorting
• kk: use normalization
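The extension keys ride along inside the language tag. A small sketch, assuming the engine honors the "kn" key; whether "phonebk" is honored for German depends on the available locale data, so only the resolved value is printed.

```javascript
// "kn" switches on numeric sorting through the tag itself.
const numericByTag = new Intl.Collator("en-u-kn-true");
const versions = ["v10", "v9", "v1"].sort(numericByTag.compare);

// "co" requests a specific collation, here the German phonebook order from the slide.
const phonebook = new Intl.Collator("de-u-co-phonebk");
const resolvedCollation = phonebook.resolvedOptions().collation;

console.log(versions);                               // [ 'v1', 'v9', 'v10' ]
console.log(numericByTag.resolvedOptions().numeric); // true
console.log(resolvedCollation);                      // implementation dependent
```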
Collator Options
• localeMatcher: lookup, best fit
• usage: sort, search
• sensitivity: base, accent, case, variant
• ignorePunctuation
• numeric, normalization, caseFirst
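The same settings can be passed as an options object. A short sketch with made-up sample strings, assuming English locale data is present:

```javascript
const base = new Intl.Collator("en", { usage: "search", sensitivity: "base" });
const accent = new Intl.Collator("en", { sensitivity: "accent" });
const numeric = new Intl.Collator("en", { numeric: true });

// "base": case and accent differences are ignored.
const baseIgnoresCaseAndAccent = base.compare("a", "\u00C1") === 0; // "a" vs "Á"

// "accent": accents matter, case does not.
const accentIgnoresCase = accent.compare("a", "A") === 0;
const accentSeesAccent = accent.compare("a", "\u00E1") !== 0;       // "a" vs "á"

// numeric: digit sequences are compared by value.
const sortedFiles = ["file10", "file2", "file1"].sort(numeric.compare);

console.log(baseIgnoresCaseAndAccent, accentIgnoresCase, accentSeesAccent); // true true true
console.log(sortedFiles); // [ 'file1', 'file2', 'file10' ]
```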
Non-ECMAScript
• Nothing good found (some for Latin only)
• Collation is hard
• Knowledge of full Unicode character set
• Big tables
Number Formatting
Number Formatting
• Old: Number.prototype.toLocaleString
• No arguments
• New: Intl.NumberFormat
• locales
• options
• Fixed: Number.prototype.toLocaleString
• With locales and options arguments
NumberFormat Extensions
• nu: numbering system
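A numbering system can be requested through the tag. This is only a sketch: whether Thai digits are actually produced depends on the engine's locale data, so the resolved value is printed rather than assumed.

```javascript
// "nu-thai" asks for Thai digits; the engine may fall back to another numbering system.
const thaiDigits = new Intl.NumberFormat("th-u-nu-thai");
const formattedYear = thaiDigits.format(2012);
const resolvedNumbering = thaiDigits.resolvedOptions().numberingSystem;

console.log(formattedYear, resolvedNumbering);
```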
NumberFormat Options
• localeMatcher: lookup, best fit
• style: decimal, currency, percent
• currency: ISO 4217 currency code
• currencyDisplay: symbol, code, name
• minimum/maximum digits
• useGrouping
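A sketch of the options above in use, assuming "en-US" locale data; the amount is an arbitrary sample value.

```javascript
const amount = 1234.5;

const asCurrency = new Intl.NumberFormat("en-US", { style: "currency", currency: "USD" })
  .format(amount);
const asPercent = new Intl.NumberFormat("en-US", { style: "percent" }).format(0.25);
const twoDecimals = new Intl.NumberFormat("en-US", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
}).format(amount);
const ungrouped = new Intl.NumberFormat("en-US", { useGrouping: false }).format(amount);

// The fixed toLocaleString takes the same locales and options arguments.
const withCode = amount.toLocaleString("en-US", {
  style: "currency",
  currency: "USD",
  currencyDisplay: "code",
});

console.log(asCurrency);  // $1,234.50
console.log(asPercent);   // 25%
console.log(twoDecimals); // 1,234.50
console.log(ungrouped);   // 1234.5
console.log(withCode);    // contains the ISO 4217 code "USD"
```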
Non-ECMAScript
Library              ¤    %    ๙    #    ,    ⚑
Globalize            +    +    -    +    -    250+
Dojo                 +    +    -    +    -    30+
Closure              +    +    +    +    +    300+
Windows 8-style UI   +    +    +    +    +    100s
iLib                 +    +    -    +    -    10+
¤: currency formatting. %: percent formatting. ๙: numbering systems. #: digit settings. ,: grouping separator option. ⚑: supported locales.
Date and Time Formatting
Date and Time Formatting
• Old: Date.prototype.toLocale[|Date|Time]String
• No arguments
• New: Intl.DateTimeFormat
• locales
• options
• Fixed: Date.prototype.toLocale[|Date|Time]String
• With locales and options arguments
DateTimeFormat Extensions
• ca: calendar
• nu: numbering system
DateTimeFormat Options
• localeMatcher: lookup, best fit
• timeZone: UTC
• hour12
• weekday, era, year, month, day, hour, minute, second, timeZoneName: components
• formatMatcher: basic, best fit
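A sketch of component options with a fixed time zone, assuming "en-US" locale data. The date is an arbitrary sample; UTC is used so the output does not depend on the machine's time zone.

```javascript
const when = new Date(Date.UTC(2012, 11, 1, 15, 30)); // 2012-12-01T15:30Z

const longDate = new Intl.DateTimeFormat("en-US", {
  timeZone: "UTC",
  year: "numeric",
  month: "long",
  day: "numeric",
}).format(when);

const weekdayName = new Intl.DateTimeFormat("en-US", {
  timeZone: "UTC",
  weekday: "long",
}).format(when);

const time24 = new Intl.DateTimeFormat("en-US", {
  timeZone: "UTC",
  hour: "2-digit",
  minute: "2-digit",
  hour12: false,
}).format(when);

console.log(longDate);    // December 1, 2012
console.log(weekdayName); // Saturday
console.log(time24);      // 15:30
```

The components listed are a request; the engine picks the closest pattern it has for the locale, which is what formatMatcher controls.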
Non-ECMAScript
Library              ca   tz   ๙    ⚑
Globalize            5+   +    -    250+
Dojo                 4    -    -    30+
Closure              +    +    +    300+
Windows 8-style UI   ?    -    ?    ?
iLib                 3    +    -    10+
YUI                  -    -    -    50+
ca: calendars. tz: time zones. ๙: numbering systems. ⚑: supported locales.
Message Construction
• Substitution
• {user} went to {city}.
• {user}さんは{city}へ行きました。
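Substitution can be sketched in a few lines. The helper name `substitute` and the sample values are hypothetical and not part of any library named in these slides.

```javascript
// Hypothetical helper: replaces each {name} placeholder with values[name].
// Unknown placeholders are left untouched.
function substitute(pattern, values) {
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

const values = { user: "Aiko", city: "Kyoto" };
const english = substitute("{user} went to {city}.", values);
const japanese = substitute("{user}さんは{city}へ行きました。", values);

console.log(english);  // Aiko went to Kyoto.
console.log(japanese); // AikoさんはKyotoへ行きました。
```

Named placeholders let translators reorder arguments freely, which positional concatenation cannot do.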
Message Construction
• Plurals
• {user} est allé à {city}.
• {user1} et {user2} sont allés à {city}.
• 1-6 forms depending on language
• {number, plural {one {...} few {...} many {...}}}
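Plural selection can be sketched with Intl.PluralRules. This is an assumption about the runtime: Intl.PluralRules was added to the Internationalization API after the edition described in these slides. The helper `pickPlural` is hypothetical.

```javascript
// Message variants keyed by plural category, taken from the slide.
const frenchMessages = {
  one: "{user1} est allé à {city}.",
  other: "{user1} et {user2} sont allés à {city}.",
};

// Hypothetical helper: choose a variant by the locale's plural category,
// falling back to "other" when a category has no entry.
function pickPlural(locale, count, messages) {
  const category = new Intl.PluralRules(locale).select(count);
  return messages[category] !== undefined ? messages[category] : messages.other;
}

const singular = pickPlural("fr", 1, frenchMessages);
const plural = pickPlural("fr", 2, frenchMessages);
const englishCategories = [1, 2].map((n) => new Intl.PluralRules("en").select(n));

console.log(singular);          // {user1} est allé à {city}.
console.log(plural);            // {user1} et {user2} sont allés à {city}.
console.log(englishCategories); // [ 'one', 'other' ]
```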
Message Construction
• Gender
• {user} est allé à {city}.
• {user} est allée à {city}.
• 1-4 forms depending on language
• {gender, select {female {...} male {...} unknown {...}}}
Message Construction
{gender, select {
female {num, plural {
one {{user1} est allée à {city}.}
other {{user1} et {user2} sont allées à {city}.}}}
male {num, plural {
one {{user1} est allé à {city}.}
other {{user1} et {user2} sont allés à {city}.}}}
}}
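The nested message above can be modeled as a plain object and resolved in two steps. This sketch does not parse the MessageFormat syntax; the function `formatVisit`, the fallback to the masculine forms, and the sample names are assumptions, and Intl.PluralRules is assumed to be available.

```javascript
// Variants keyed first by gender, then by plural category.
const visitMessage = {
  female: {
    one: "{user1} est allée à {city}.",
    other: "{user1} et {user2} sont allées à {city}.",
  },
  male: {
    one: "{user1} est allé à {city}.",
    other: "{user1} et {user2} sont allés à {city}.",
  },
};

// Hypothetical resolver: select by gender, then by plural category, then substitute.
function formatVisit(args) {
  const byGender = visitMessage[args.gender] || visitMessage.male; // fallback is an assumption
  const category = new Intl.PluralRules("fr").select(args.num);
  const pattern = byGender[category] || byGender.other;
  return pattern.replace(/\{(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name) ? String(args[name]) : match
  );
}

const one = formatVisit({ gender: "female", num: 1, user1: "Marie", city: "Lyon" });
const two = formatVisit({ gender: "male", num: 2, user1: "Luc", user2: "Paul", city: "Nice" });

console.log(one); // Marie est allée à Lyon.
console.log(two); // Luc et Paul sont allés à Nice.
```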
Message Construction
• Google has MessageFormat for Closure environment
• Alex Sexton provided standalone version
Occupy Wall Street. By @tanlines.
Supplementary Characters
• Characters above U+FFFF
• Emoji, rare CJK, ancient scripts, musical symbols, ...
• 2 units in UTF-16
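The two-unit encoding can be checked directly. The helper `toSurrogatePair` is a hypothetical name for the standard UTF-16 arithmetic.

```javascript
const emoji = "\uD83D\uDE00"; // U+1F600 GRINNING FACE

// Standard UTF-16 encoding of a code point above U+FFFF.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const units = [emoji.charCodeAt(0), emoji.charCodeAt(1)];
const computed = toSurrogatePair(0x1F600);

console.log(emoji.length);                         // 2
console.log(units.map((u) => u.toString(16)));     // [ 'd83d', 'de00' ]
console.log(computed.map((u) => u.toString(16)));  // [ 'd83d', 'de00' ]
```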
Today: UCS-2 or UTF-16?
UCS-2:
• Regular expressions
• String comparison
• Case conversion
UTF-16:
• Source text conversion
• URI handling
• DOM, text input, text rendering, XMLHttpRequest, libraries, apps
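The split can be observed with one supplementary character. A sketch using only ES5-era features:

```javascript
const face = "\uD83D\uDE00"; // U+1F600: one character, two UTF-16 code units

// UCS-2 behavior: without a Unicode mode, "." matches a single code unit,
// and a character class containing the pair holds two separate units.
const dotMatchesWhole = /^.$/.test(face);                  // false
const classMatchesHalf = /[\uD83D\uDE00]/.test("\uD83D");  // true

// UTF-16 behavior: URI encoding combines the pair into one code point
// before converting it to UTF-8.
const encoded = encodeURIComponent(face);                  // "%F0%9F%98%80"

// A lone surrogate is not valid UTF-16, so URI encoding rejects it.
let loneSurrogateRejected = false;
try {
  encodeURIComponent("\uD83D");
} catch (e) {
  loneSurrogateRejected = e instanceof URIError;
}

console.log(dotMatchesWhole, classMatchesHalf, encoded, loneSurrogateRejected);
```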
ECMAScript 6: UTF-16
• New Unicode mode in regular expressions
• Case conversion for full Unicode
• Full Unicode in identifiers
• String accessors for code points
• But: no change to low-level string comparison
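A sketch of these features as they shipped in ES6-capable engines. The Deseret letter is chosen here only as a sample supplementary character that has a case mapping.

```javascript
const grin = "\uD83D\uDE00"; // U+1F600

// Unicode mode: "." matches a whole code point.
const unicodeDot = /^.$/u.test(grin);                 // true

// Code point accessors.
const codePoint = grin.codePointAt(0);                // 0x1F600
const roundTrip = String.fromCodePoint(0x1F600) === grin;
const codePointCount = Array.from(grin).length;       // 1 (string iteration is by code point)

// Case conversion covers supplementary characters:
// DESERET CAPITAL LETTER LONG I (U+10400) lowercases to U+10428.
const lowered = "\uD801\uDC00".toLowerCase().codePointAt(0);

// Unchanged: relational operators still compare code units, so U+FF5E
// sorts after U+1F600 because 0xFF5E > 0xD83D.
const bmpSortsAfter = "\uFF5E" > grin;                // true

console.log(unicodeDot, codePoint.toString(16), roundTrip, codePointCount);
console.log(lowered.toString(16), bmpSortsAfter);
```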
Rendering
• Emoji on Mac/iOS are rendered with color font
• On Mac, only Safari supports this font
• Not Firefox, Chrome, Opera
• Fonts for other supplementary characters supported in all modern browsers
Regular Expressions
• RegExp in ES5 doesn’t have much Unicode support
• No support for Unicode character properties
• No support for supplementary characters
Regular Expressions
• CSet (inimino): Character classes with supplementary characters
• XRegExp (Steven Levithan and Mathias Bynens): Unicode categories and properties with supplementary characters
Unicode Normalization
• Makes strings be equal that users perceive as equal (more or less)
• ä = a ¨
• ự = ự
• 김 = ㄱ ㅣ ㅁ
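A sketch of these equivalences using String.prototype.normalize, which is an assumption about the runtime: it arrived with ECMAScript 6. Note that canonical decomposition of 김 yields conjoining jamo (U+1100, U+1175, U+11B7), not the standalone compatibility letters displayed above.

```javascript
const precomposed = "\u00E4"; // ä as a single code point
const decomposed = "a\u0308"; // a followed by COMBINING DIAERESIS

// Without normalization the two strings differ.
const rawEqual = precomposed === decomposed; // false

// After normalizing both to the same form they compare equal.
const nfcEqual = precomposed.normalize("NFC") === decomposed.normalize("NFC"); // true

// Hangul syllables decompose algorithmically into jamo.
const hangul = "\uAE40"; // 김
const jamo = hangul.normalize("NFD");
const jamoCodePoints = Array.from(jamo, (c) => c.codePointAt(0).toString(16));

console.log(rawEqual, nfcEqual);                    // false true
console.log(jamoCodePoints);                        // [ '1100', '1175', '11b7' ]
console.log(jamo.normalize("NFC") === hangul);      // true
```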
Unicode Normalization
• ECMAScript “assumes” normalization happens where needed
• Reality: applications have to do it
• Libraries available, but not up to date:
• unorm (Matsuza)
• Richard Ishida’s normalizer
北京大学.中国
Internationalized Domain Names
• Unicode at user interface
• ASCII under the hood
• 北京大学.中国 = xn--1lq90ic7fzpc.xn--fiqs8s
• Main steps:
• normalization (as discussed)
• punycode (Mathias Bynens has latest)
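Where a global URL class is available (an assumption: current browsers and Node.js provide it), the conversion can be observed without a separate punycode library, since the URL parser converts the host to its ASCII form.

```javascript
// The URL parser normalizes the host and encodes each non-ASCII label with Punycode.
const asciiHost = new URL("http://北京大学.中国/").hostname;

console.log(asciiHost); // each label now starts with "xn--"
```

For label-level control, or for environments without URL, a dedicated punycode implementation is still the usual choice.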
Summary
• ECMAScript Internationalization API provides core functionality
• Please review and provide feedback
• http://norbertlindenberg.com/2012/06/ecmascript-internationalization-api/
• Libraries provide more internationalization support than you may think