One thing hasn’t changed in Web 2.0: users can be from many different countries, speaking many different languages. This session will show how to design internationalized SOAP and REST web services, how to deal with multiple languages in syndication, and how to make all of this work with Ajax in the browser. Over the course of the session we will take a plain, screen-scraping Web 1.0 currency converter and remake it into a multilingual, personalized Web 2.0 mashup, gadget, feed, and service. Learn the principles and apply them in your environment.
© Copyright 2007 Achim Ruopp
Web 2.0 Expo 2007
Making Cents of Yens and Euros: Web 2.0 Internationalization
Achim Ruopp
Digital Silk Road
http://www.digitalsilkroad.net/
Demo: A Currency Converter Application – before and after Web 2.0 Internationalization
Agenda
Introduction to Web Internationalization (i18n)
• Selecting and Persisting User Preferences
• Locales and Locale Identifiers
• Unicode
• Localization – Model and Tools
Client-side Scripting
• JavaScript Internationalization
• Ajax
Multi-lingual Syndication
• RSS
• Atom
International Web Services Design
• REST
• SOAP
Intro to Web Internationalization: Language and Location
[Figure: example language preferences: en-US; fr, en;q=0.8; da-DK]
Intro to Web Internationalization: User Preferences
Language
• HTTP Accept-Language header
• E.g.: en, fr-CA;q=0.8, fr;q=0.6
• Language negotiation with the server
Locale
• Cultural preferences for formatting, sorting etc.
• Infer from Accept-Language header
• Map IPv4 address to ccTLD (country code top-level domain)
    Public information accessible through libraries
    • E.g. Perl IP::Country CPAN module
    Commercial services offer more precision
Always provide option to change defaults
Store preferences in cookies
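The language negotiation step on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the fallback from a regional tag to its primary language (fr-CA to fr), and the default of "en" are assumptions made for this example and are not taken from the talk's Perl demo.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs from an Accept-Language header, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(header, supported, default="en"):
    """Pick the best supported language for the header, else the default."""
    by_lower = {tag.lower(): tag for tag in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        # Try the full tag first, then shorter prefixes (fr-ca, then fr).
        parts = tag.split("-")
        while parts:
            candidate = "-".join(parts)
            if candidate in by_lower:
                return by_lower[candidate]
            parts.pop()
    return default
```

For example, `negotiate("da-DK, fr-CA;q=0.8", ["en", "fr", "de"])` falls back from fr-CA to fr. Whatever the negotiation returns should still be only a default that the user can override, as the slide recommends.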
Intro to Web Internationalization: Internet Language Tags
IETF Language Tags (BCP 47)
Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*
Examples
• en-CA: English in Canada
• zh-Hant-TW: Chinese written in traditional Chinese script used in Taiwan
Obsoletes RFC 3066 & RFC 1766
• Often still used in products/earlier standards
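To make the tag structure concrete, here is a deliberately simplified Python sketch that splits a tag into language, script, and region. It is not a full BCP 47 parser: extended language subtags, variants, extensions, and private-use subtags are rejected, and the function name and returned dictionary layout are assumptions for this example.

```python
import re

# Simplified pattern: language, optional script, optional region only.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_language_tag(tag):
    """Split a simple language tag and apply the conventional casing."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")
    language = match.group("language").lower()
    script = match.group("script")
    region = match.group("region")
    parts = {
        "language": language,
        "script": script.title() if script else None,   # e.g. Hant
        "region": region.upper() if region else None,   # e.g. TW
    }
    # Rebuild the tag from the subtags that are present.
    parts["tag"] = "-".join(p for p in (parts["language"], parts["script"], parts["region"]) if p)
    return parts
```

Tags are case-insensitive, so `parse_language_tag("ZH-hant-tw")["tag"]` yields the conventional spelling `zh-Hant-TW`.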
Internationalization ChangesInternationalization Changes
Intro to Web Internationalization: POSIX Locales
Cross-platform API
• Locale-identifiers can have variations
    Un*x: en_US
    Windows: English_United States
• Results can be platform-dependent
Basis for locale functionality in all scripting languages
Provides functionality for
• Number Formatting: 1,000,000.23
• Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ
• Sorting
• String processing (e.g. upper-/lower-casing)
• Some translated strings like weekdays, yes/no messages
Intro to Web Internationalization: International Components for Unicode
IBM Open Source project
Extensive locale data and APIs
• Data vetted as part of Common Locale Data Repository (CLDR) project
Java and C++ APIs
Wrappers for scripting languages
• PyICU (Python)
• ICU4R (Ruby) – abandoned?
• DIY – difficult because of API complexity and character encoding issues
Intro to Web Internationalization: Microsoft Internationalization APIs
Windows NLS API
Microsoft .NET Framework System.Globalization namespace
Similar set of data to ICU
• Data vetted by Microsoft subsidiaries
APIs accessible from all Microsoft programming languages
Intro to Web Internationalization: Unicode 5.0
[Figure: Unicode 5.0 code space. Planes: 00000 Basic Multilingual Plane; 10000 Dead Languages & Math; 20000 Han Characters; E0000 Language Tags; F0000 and 100000 Private Use. Within the Basic Multilingual Plane (0000 to F000): Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility.]
99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
Intro to Web Internationalization: Unicode Encoding Forms
Variable length: UTF-8/UTF-16
Fixed length: UTF-32
U+2122: ™: Trade Mark Sign
UTF-8: 0xE2 0x84 0xA2 (11100010 10000100 10100010)
UTF-16: 0x2122 (00100001 00100010)
UTF-32: 0x00002122 (0…00100001 00100010)
* source: Google presentation at IUC30
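The byte sequences for U+2122 can be checked with Python's built-in codecs. This is a small sketch; the helper name `encoding_forms` is made up for this example, and the big-endian codec variants are chosen only so that no byte order mark is prepended.

```python
def encoding_forms(ch):
    """Return the bytes of one character in UTF-8, UTF-16 and UTF-32 as hex."""
    forms = {}
    # The "-be" codecs emit big-endian code units without a byte order mark.
    for codec in ("utf-8", "utf-16-be", "utf-32-be"):
        forms[codec] = " ".join(f"0x{byte:02X}" for byte in ch.encode(codec))
    return forms


if __name__ == "__main__":
    for codec, hex_bytes in encoding_forms("\u2122").items():
        print(f"{codec:10} {hex_bytes}")
```

The trade mark sign takes three bytes in UTF-8, two in UTF-16, and four in UTF-32, while a plain ASCII letter such as "A" takes one, two, and four bytes respectively.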
Intro to Web Internationalization: Unicode on the Web
XML processors are required to process UTF-8/UTF-16
Encoding declaration precedence
1. HTTP Content-Type header charset declaration
2. XML encoding declaration (XHTML)
3. meta charset declaration in (X)HTML
4. link element charset attribute
Approx. 4% of pages have encoding errors*
No real need for character references
• Exceptions: <, >, &, "
Use styles to control font selection
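The precedence order above can be expressed as a small resolver. The following Python sketch is illustrative only: the function name, its parameters, the simple regular expressions, and the final fallback to UTF-8 are assumptions for this example, and real browsers apply additional sniffing rules.

```python
import re

_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
_XML_ENCODING = re.compile(r"encoding\s*=\s*[\"']([\w.:-]+)[\"']", re.IGNORECASE)


def effective_encoding(http_content_type=None, xml_declaration=None,
                       meta_charset=None, link_charset=None, default="utf-8"):
    """Return the declared encoding with the highest precedence."""
    # 1. HTTP Content-Type header, e.g. "text/html; charset=ISO-8859-1"
    if http_content_type:
        match = _CHARSET.search(http_content_type)
        if match:
            return match.group(1).lower()
    # 2. XML declaration, e.g. '<?xml version="1.0" encoding="UTF-16"?>'
    if xml_declaration:
        match = _XML_ENCODING.search(xml_declaration)
        if match:
            return match.group(1).lower()
    # 3. meta charset value, then 4. charset attribute on the referring link
    for declared in (meta_charset, link_charset):
        if declared:
            return declared.strip().lower()
    return default  # assumption: fall back to UTF-8 when nothing is declared
```

A page served with `charset=ISO-8859-1` in the HTTP header is therefore treated as ISO-8859-1 even if its meta element says UTF-8, which is one common source of the encoding errors mentioned on the slide.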
Demo: A Currency Converter Application – globalized but not localized
Intro to Web Internationalization: Localization Recommendations
Avoid translatable text in graphics
Make sure graphics are culturally neutral
Avoid absolute sizing
Use HTML flow layout
Write complete sentences
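The "write complete sentences" recommendation matters because word order differs between languages, so sentences assembled from fragments cannot be translated reliably. A minimal Python sketch, assuming a hypothetical in-memory catalog (`MESSAGES`) and amounts that have already been formatted for the target locale:

```python
# Hypothetical message catalog: one complete sentence per language,
# with named placeholders that translators are free to reorder.
MESSAGES = {
    "en": "{amount} {source} is worth {result} {target}.",
    "de": "{result} {target} erhalten Sie für {amount} {source}.",
}


def render(lang, **values):
    """Fill the complete-sentence template for a language (English fallback)."""
    template = MESSAGES.get(lang, MESSAGES["en"])
    return template.format(**values)
```

`render("de", amount="100", source="EUR", result="117,85", target="CAD")` places the result first, something that string concatenation such as `amount + " " + source + " is worth " + ...` could not express.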
Intro to Web Internationalization: Localization Model and Tools
Text translation
• Localization formats
    HTML with template library
    • W3C Internationalization Tag Set (tool support?)
    GNU gettext/PO
    XLIFF - XML Localization Interchange File Format
• Localization tools
    OmegaT
    Open Language Tools (Sun)
    The WordForge Project: Pootle
    …
Searchability – Links/Sitemap
Demo: A Currency Converter Application – fully internationalized Web 1.0 application
Client-side Scripting: JavaScript Internationalization
ECMAScript edition 3 added a range of internationalization features (1999)
• Good support for Unicode processing
• Set of locale-sensitive functions
    Dependent on host locale (i.e. browser)
• Set of locale-insensitive functions
• No number or date/time parsing
JavaScript libraries with additional internationalization functionality
• dojo Toolkit (i18n contributed by IBM)
• Microsoft AJAX Library
Client-side Scripting: AJAX Recommendations
Late globalization
• Transmit data in locale-independent form with XMLHttpRequest
• Might require some creative parsing/UI
Early localization
• Text localization server-side
• Browsers are missing a message-catalog facility
• Dynamically created page content is invisible to search engines
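One way to read "late globalization" is that the server answers an XMLHttpRequest with values in a locale-independent form and leaves the formatting to the last possible moment. The Python sketch below shows what such a server-side payload could look like. The field names, the use of JSON, and the currency pair are assumptions for this example; the talk's demo may use a different wire format.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def exchange_payload(source, target, rate, as_of):
    """Serialize an exchange rate without any locale-specific formatting."""
    return json.dumps({
        "source": source,            # ISO 4217 currency code, not a localized name
        "target": target,
        "rate": str(rate),           # plain decimal string with "." separator
        "asOf": as_of.astimezone(timezone.utc).isoformat(),  # UTC timestamp
    })


if __name__ == "__main__":
    print(exchange_payload("USD", "CAD", Decimal("1.1785"),
                           datetime(2007, 3, 8, 17, 0, tzinfo=timezone.utc)))
```

The rate is sent as a string so that no precision is lost in transit, and the timestamp is normalized to UTC so the client can convert it to the user's time zone and format.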
Demo: A Currency Converter Application – dynamic update of exchange amounts using Ajax
Multi-lingual Syndication: RSS 2.0
Character encoding
• RSS 2.0 is an XML application
• XML encoding rules apply
Language
• Element only on channel (feed), not on item
    Create one channel per language
• Specified to comply with RFC 1766 language tags
Date/Time
• In standard RFC 822 format (including 4-digit years)
    E.g. “Wed, 02 Oct 2002 08:00:00 EST”
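RFC 822 dates always use English day and month abbreviations, so they should not be produced with locale-sensitive formatting functions. As a sketch, Python's standard `email.utils.format_datetime` generates this format independently of the process locale; the wrapper name `rss_pub_date` is an assumption for this example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rss_pub_date(moment):
    """Format a timezone-aware datetime for an RSS 2.0 date element."""
    # format_datetime uses fixed English names and a numeric UTC offset.
    return format_datetime(moment)


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))
    print(rss_pub_date(datetime(2002, 10, 2, 8, 0, 0, tzinfo=eastern)))
```

The output uses the numeric offset `-0500` instead of a zone abbreviation such as EST. Numeric offsets are unambiguous, and `format_datetime(moment, usegmt=True)` can be used for UTC datetimes when a `GMT` suffix is preferred.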
Multi-lingual Syndication: Atom Syndication
More granular language marking
• xml:lang can be applied to any human readable text in the format
• Aggregators need to deal with this
Better date/time format: RFC 3339
• E.g. “2003-12-13T18:30:02-05:00”
Acknowledgement: Tim Bray
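Both Atom features on this slide can be tried with Python's standard library. The sketch below builds a fragment of a feed whose default language is set on the feed element and overridden on one entry title, with an RFC 3339 timestamp. It is not a complete, valid Atom feed (required elements such as id and author are omitted), and the function name and sample title are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialized as xml:lang


def build_feed(feed_lang, entry_lang, entry_title, updated):
    """Build a feed fragment with per-element language marking."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed", {XML_LANG: feed_lang})
    entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
    title = ET.SubElement(entry, f"{{{ATOM_NS}}}title", {XML_LANG: entry_lang})
    title.text = entry_title
    stamp = ET.SubElement(entry, f"{{{ATOM_NS}}}updated")
    # isoformat() on an aware datetime gives an RFC 3339 compatible string.
    stamp.text = updated.isoformat()
    return feed


if __name__ == "__main__":
    when = datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone(timedelta(hours=-5)))
    print(ET.tostring(build_feed("en", "de", "Wechselkurse", when), encoding="unicode"))
```

ElementTree chooses its own prefix for the Atom namespace when serializing; the xml:lang attributes and the timestamp are what the example is meant to show.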
Demo: A Currency Converter Application – adding a syndication feed with exchange rate information
International Web Services Design: Service Patterns
Locale Neutral
• Description: Neutral data formats
• Request data: CAD
• Return data: 1.1785
Client Influenced
• Description: Service reacts to client locale, e.g. HTTP Accept-Language
• Request data: CAD (Accept-Language: de)
• Return data: Kanadischer Dollar
Service Determined
• Description: Service is locale-specific and ignores client preference
• Return data: 03/08/2007 12:00pm EST
Data Driven
• Description: Service adjusts formatting and language to locale the data refers to
• Request data: NOK; CHF
• Return data: norske kroner; ?
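The difference between the first two service patterns listed above can be sketched in Python. Everything in this block is hypothetical sample code: the lookup tables, the function names, and the simplification of taking Accept-Language entries in header order while ignoring q-values.

```python
# Hypothetical sample data for illustration only.
RATES = {"CAD": "1.1785"}
CURRENCY_NAMES = {"CAD": {"en": "Canadian Dollar", "de": "Kanadischer Dollar"}}


def locale_neutral_rate(code):
    """Locale Neutral: the same neutral value for every client."""
    return RATES[code]


def client_influenced_name(code, accept_language, fallback="en"):
    """Client Influenced: pick the currency name by the client's language."""
    names = CURRENCY_NAMES[code]
    for item in accept_language.split(","):
        # Keep only the primary language subtag, e.g. "de-AT;q=0.9" -> "de".
        primary = item.split(";")[0].strip().split("-")[0].lower()
        if primary in names:
            return names[primary]
    return names[fallback]
```

A Data Driven service has no obvious answer for CHF because Switzerland has several official languages, which may be what the question mark in the last row points at.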
International Web Services Design: REST
REST naturally ties into i18n features in HTTP/HTML/XML
• Locale indicated with HTTP Accept-Language
• Encoding and language marking in markup
Special caution for HTTP GET parameters
• Locale-independent formatting recommended
• Text parameters
    Encode in UTF-8 and escape in URIs
    IRI (Internationalized Resource Identifier) functionality might provide this for you
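Encoding text parameters as UTF-8 and percent-escaping them is handled by Python's standard `urllib.parse` module. In this sketch the base URL, the path layout, and the parameter names are hypothetical; only `quote` and `urlencode` are real library calls.

```python
from urllib.parse import quote, urlencode


def build_rate_url(base, place, params):
    """Build a GET URL with UTF-8 percent-encoded path and query values."""
    # quote() encodes str values as UTF-8 by default; safe="" also escapes "/".
    path = quote(place, safe="")
    # urlencode() percent-encodes each value and joins the pairs with "&".
    return f"{base}/{path}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_rate_url(
        "http://example.org/rates",   # hypothetical service URL
        "Zürich",
        {"from": "CHF", "to": "EUR", "amount": "1000000.23"},
    ))
```

The amount is passed in locale-independent form (no grouping separators, "." as decimal separator), in line with the formatting recommendation on the slide.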
International Web Services Design: SOAP
Locale can be communicated in
• Transport header (e.g. HTTP)
• SOAP header
• SOAP message body
Beware of automatically generated SOAP interfaces
• Might be locale-dependent without allowing the locale to be specified
Use of XML Schema data types promotes locale-independence
Also consider localization of error messages
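As an illustration of carrying the locale in a SOAP header, the Python sketch below builds a SOAP 1.2 envelope by hand. The application namespace and the element names `Locale`, `GetRate`, `Currency`, and `Amount` are hypothetical; a real service defines its own header block in its WSDL and schema. The amount is written in the lexical form of the XML Schema decimal type, which has no grouping separators and uses "." as the decimal separator.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope
APP_NS = "http://example.org/currency"               # hypothetical namespace


def build_request(locale, currency, amount):
    """Build a request envelope with the client locale in a header block."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}Locale").text = locale
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetRate")
    ET.SubElement(request, f"{{{APP_NS}}}Currency").text = currency
    # format(..., "f") avoids exponent notation such as 1E+3.
    ET.SubElement(request, f"{{{APP_NS}}}Amount").text = format(amount, "f")
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("de-DE", "CAD", Decimal("1000000.23")))
```

Keeping the locale out of the message body leaves the body locale-neutral, so the same request can be reused for clients with different preferences.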
Demo: A Currency Converter Application – exchange rates as a REST web service
Conclusions
Unification
• One code base
Customization
• Localization and adaptation for locales
Next step: cross-language “leakage”
• Provide views in multiple languages to the same (user-generated) data
• Translate user-generated content
    Volunteers
    Machine Translation
Call for Contributions
The Perl CGI demo code is available on
• http://www.digitalsilkroad.net/twiki/CurrencyConverter
Add a version in your preferred language
• Ruby on Rails
• PHP
• Python
• …
A similar application for ASP.NET is available on
• http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx
References
W3C Internationalization Activity
• http://www.w3.org/International/
POSIX Locale
• http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
International Components for Unicode
• http://www-306.ibm.com/software/globalization/icu/
Unicode/Common Locale Data Repository
• http://www.unicode.org/
Microsoft Internationalization APIs
• http://msdn2.microsoft.com/en-us/library/ms776254.aspx
• http://msdn2.microsoft.com/en-us/library/system.globalization.aspx
References
OmegaT
• http://www.omegat.org/omegat/omegat_en/omegat.html
Open Language Tools
• https://open-language-tools.dev.java.net/
The WordForge Project
• http://www.wordforge.org/drupal/
JavaScript Internationalization
• http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html
RSS 2.0
• http://www.rssboard.org/rss-specification
Atom Syndication
• http://www.atomenabled.org/developers/syndication
RSS 1.0
• http://web.resource.org/rss/1.0/spec
W3C Web Services Internationalization Usage Scenarios
• http://www.w3.org/TR/ws-i18n-scenarios/
Additional Slides
Multi-lingual Syndication: RSS 1.0
Character encoding
• RSS 1.0 is an XML application
• XML encoding rules apply
Complies with RDF (Resource Description Framework) specification
• Definition of language and date/time formats is left to RDF metadata formats
    Dublin Core Metadata Element Set
    Language: RFC1766/ISO639-2
    Date/Time: ISO 8601 (superset of RFC 3339)
    • Dublin Core also allows time periods to be specified!