© Copyright 2007 Achim Ruopp. Web 2.0 Expo 2007. Making Cents of Yens and Euros: Web 2.0 Internationalization. Achim Ruopp, Digital Silk Road, http://www.digitalsilkroad.net/

Making Cents of Yens and Euros: Web 2.0 Internationalization

DESCRIPTION

One thing hasn’t changed in Web 2.0: users can be from many different countries, speaking many different languages. This session will show how to design internationalized SOAP and REST web services, how to deal with multiple languages in syndication, and how to make all of this work with Ajax in the browser. Over the course of the session we will take a plain, screen-scraping Web 1.0 currency converter and remake it into a multilingual, personalized Web 2.0 mashup, gadget, feed, and service. Learn the principles and apply them in your environment.

Page 1: Making Cents of Yens and Euros: Web 2.0 Internationalization

© Copyright 2007 Achim Ruopp
Web 2.0 Expo 2007

Making Cents of Yens and Euros: Web 2.0 Internationalization

Achim Ruopp
Digital Silk Road
http://www.digitalsilkroad.net/

Page 2: Making Cents of Yens and Euros: Web 2.0 Internationalization

Demo
A Currency Converter Application: before and after Web 2.0 Internationalization

Page 3: Making Cents of Yens and Euros: Web 2.0 Internationalization

Agenda

Introduction to Web Internationalization (i18n)
• Selecting and Persisting User Preferences
• Locales and Locale Identifiers
• Unicode
• Localization: Model and Tools

Client-side Scripting
• JavaScript Internationalization
• Ajax

Multi-lingual Syndication
• RSS
• Atom

International Web Services Design
• REST
• SOAP

Page 4: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Language and Location

[Figure: example language preferences by location: en-US; fr, en;q=0.8; da-DK]

Page 5: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
User Preferences

Language
• HTTP Accept-Language header
• E.g.: en, fr-CA;q=0.8, fr;q=0.6
• Language negotiation with the server

Locale
• Cultural preferences for formatting, sorting etc.
• Infer from Accept-Language header
• Map IPv4 address to ccTLD (country code top-level domain)
    • Public information accessible through libraries
        • E.g. Perl IP::Country CPAN module
    • Commercial services offer more precision

Always provide option to change defaults
Store preferences in cookies
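
The language negotiation step can be sketched in Python. This is only an illustration, not the demo application's code (which is Perl CGI): the function names `parse_accept_language` and `negotiate`, the list of supported languages, and the `en` default are assumptions made for the example. The header value is the slide's example written with the `q=` weights that HTTP uses.

```python
def parse_accept_language(header):
    """Return (tag, quality) pairs sorted by descending quality."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # a tag without a q parameter has full weight
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        prefs.append((tag, quality, position))
    # Highest quality first; ties keep the order the client sent.
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, q) for tag, q, _ in prefs]


def negotiate(header, supported, default="en"):
    """Pick the best supported language, falling back from fr-CA to fr."""
    by_lower = {s.lower(): s for s in supported}
    for tag, quality in parse_accept_language(header):
        if quality <= 0:
            continue
        candidate = tag.lower()
        while candidate:
            if candidate in by_lower:
                return by_lower[candidate]
            candidate = candidate.rpartition("-")[0]  # drop the last subtag
    return default


preferences = parse_accept_language("en, fr-CA;q=0.8, fr;q=0.6")
chosen = negotiate("fr-CA;q=0.8, de;q=0.9", ["en", "fr"])
```

Whatever the negotiation picks is only a default; as the slide says, the user should be able to override it, and the override can be stored in a cookie.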

Page 6: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Internet Language Tags

IETF Language Tags (BCP 47)
    Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*

Examples
• en-CA: English in Canada
• zh-Hant-TW: Chinese written in traditional Chinese script used in Taiwan

Obsoletes RFC 3066 & RFC 1766
• Often still used in products/earlier standards
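
To make the tag structure concrete, here is a deliberately simplified Python sketch that splits a tag into language, script, and region. It is not a full BCP 47 parser: extended language, variant, extension, and private-use subtags are left out, and the names `TAG_PATTERN` and `parse_language_tag` are made up for this example.

```python
import re

# Simplified pattern: primary language, optional script, optional region.
# Extended language, variant, extension and private-use subtags are not handled.
TAG_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_language_tag(tag):
    """Return the subtags in conventional casing, or None if the tag does not match."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


taiwan = parse_language_tag("zh-hant-tw")
canada = parse_language_tag("en-CA")
```

Tags are compared case-insensitively, so the sketch normalizes to the customary casing (lowercase language, title-case script, uppercase region). A POSIX-style identifier such as `en_US` is rejected because it uses an underscore.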

Page 7: Making Cents of Yens and Euros: Web 2.0 Internationalization

Internationalization Changes

Page 8: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
POSIX Locales

Cross-platform API
• Locale identifiers can have variations
    • Un*x: en_US
    • Windows: English_United States
• Results can be platform-dependent

Basis for locale functionality in all scripting languages

Provides functionality for
• Number Formatting: 1,000,000.23
• Date/Time Formatting: 8 Μάρτιος 2007 12:00:00 μμ
• Sorting
• String processing (e.g. upper-/lower-casing)
• Some translated strings like weekdays, yes/no messages
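
As a rough illustration of the platform dependence, the following Python sketch formats a number through the POSIX-style API exposed by Python's standard `locale` module. The helper name `format_amount` is hypothetical. Which locale names are installed, and how they are spelled, depends on the operating system, so the helper returns `None` when a name is not available.

```python
import locale


def format_amount(amount, locale_name):
    """Format amount with grouping in the given locale, or return None if unavailable."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None  # this locale name is not known on this platform
    try:
        return locale.format_string("%.2f", amount, grouping=True)
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore the process-wide setting


neutral = format_amount(1000000.23, "C")
missing = format_amount(1000000.23, "no_such_locale_XX")
```

The "C" locale defines no grouping, so `neutral` is `"1000000.23"`. On a Un*x system with the locale installed, a name such as `en_US.UTF-8` would be expected to give the slide's `1,000,000.23`, while Windows expects a differently spelled name. Note that `setlocale` changes process-wide state, which is one reason libraries such as ICU are often preferred in server code.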

Page 9: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
International Components for Unicode

IBM Open Source project
Extensive locale data and APIs
• Data vetted as part of Common Locale Data Repository (CLDR) project
Java and C++ APIs
Wrappers for scripting languages
• PyICU (Python)
• ICU4R (Ruby): abandoned?
• DIY: difficult because of API complexity and character encoding issues

Page 10: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Microsoft Internationalization APIs

Windows NLS API
Microsoft .NET Framework System.Globalization namespace
Similar set of data to ICU
• Data vetted by Microsoft subsidiaries
APIs accessible from all Microsoft programming languages

Page 11: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Unicode 5.0

[Figure: the Unicode code space. Planes, on an axis from 00000 to 100000: Basic Multilingual Plane, Dead Languages & Math, Han Characters, Language Tags, Private Use. Basic Multilingual Plane, on an axis from 0000 to F000: Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility.]

99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
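
The total of 1,114,112 follows from the layout of the code space: 17 planes of 65,536 code points each. A small Python check, with the hypothetical helper `plane_of` added to show which plane a character falls in:

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # highest Unicode code point

total_code_points = MAX_CODE_POINT + 1
plane_count = total_code_points // PLANE_SIZE


def plane_of(character):
    """Return the plane number (0 = Basic Multilingual Plane) of a single character."""
    return ord(character) // PLANE_SIZE


bmp_example = plane_of("\u2122")          # TRADE MARK SIGN
han_example = plane_of("\U00020000")      # first code point of plane 2
```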

Page 12: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Unicode Encoding Forms

Variable length: UTF-8/UTF-16
Fixed length: UTF-32

U+2122: ™: Trade Mark Sign

Encoding   Code units        Bits
UTF-8      0xE2 0x84 0xA2    1110 0010  10 000100  10 100010
UTF-16     0x2122            00100001 00100010
UTF-32     0x00002122        0…00100001 00100010
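
The byte values for U+2122 can be reproduced with Python's built-in codecs. The big-endian codec names are used so that no byte order mark is emitted; the `bits` helper is just a formatting aid for this example.

```python
trade_mark = "\u2122"  # TRADE MARK SIGN

utf8 = trade_mark.encode("utf-8")
utf16 = trade_mark.encode("utf-16-be")
utf32 = trade_mark.encode("utf-32-be")


def bits(data):
    """Render bytes as space-separated groups of eight bits."""
    return " ".join(f"{byte:08b}" for byte in data)


utf8_bits = bits(utf8)

# UTF-16 is variable length: a code point outside the Basic Multilingual
# Plane needs two 16-bit code units (a surrogate pair).
supplementary_utf16 = "\U00010000".encode("utf-16-be")
```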

Page 13: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Unicode on the Web

XML processors are required to process UTF-8/UTF-16

Encoding declaration precedence
1. HTTP Content-Type header charset declaration
2. XML encoding declaration (XHTML)
3. meta charset declaration in (X)HTML
4. link element charset attribute

Approx. 4% of pages have encoding errors*

No real need for character references
• Exceptions: <, >, &, "

Use styles to control font selection

* source: Google presentation at IUC30
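
The precedence order on this slide can be expressed as a small Python function that returns the first declaration present. This is a sketch only: the function and parameter names are invented, the caller is assumed to have already extracted the XML, meta, and link declarations from the document, and the UTF-8 fallback is an assumption of the example rather than something the slide specifies.

```python
import re


def charset_from_content_type(value):
    """Extract the charset parameter from a Content-Type header value, if any."""
    match = re.search(r'charset\s*=\s*"?([^";\s]+)', value or "", re.IGNORECASE)
    return match.group(1) if match else None


def effective_encoding(content_type=None, xml_declaration=None,
                       meta_charset=None, link_charset=None,
                       fallback="utf-8"):
    """Return the highest-precedence declared encoding, lowercased."""
    candidates = [
        charset_from_content_type(content_type),  # 1. HTTP header
        xml_declaration,                           # 2. XML encoding declaration
        meta_charset,                              # 3. meta element
        link_charset,                              # 4. link element charset attribute
    ]
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    return fallback  # assumption: nothing declared, fall back to UTF-8


from_header = effective_encoding("text/html; charset=ISO-8859-1", meta_charset="utf-8")
from_meta = effective_encoding("text/html", meta_charset="Shift_JIS")
```

The first call shows how a server-level header silently overrides a correct meta declaration, which is one plausible way pages end up with encoding errors.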

Page 14: Making Cents of Yens and Euros: Web 2.0 Internationalization

Demo
A Currency Converter Application: globalized but not localized

Page 15: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Localization Recommendations

Avoid translatable text in graphics
Make sure graphics are culturally neutral
Avoid absolute sizing
Use HTML flow layout
Write complete sentences

Page 16: Making Cents of Yens and Euros: Web 2.0 Internationalization

Intro to Web Internationalization
Localization Model and Tools

Text translation
• Localization formats
    • HTML with template library
        • W3C Internationalization Tag Set (tool support?)
    • GNU gettext/PO
    • XLIFF (XML Localization Interchange File Format)
• Localization tools
    • OmegaT
    • Open Language Tools (Sun)
    • The WordForge Project: Pootle
    • …

Searchability: Links/Sitemap

Page 17: Making Cents of Yens and Euros: Web 2.0 Internationalization

Demo
A Currency Converter Application: fully internationalized Web 1.0 application

Page 18: Making Cents of Yens and Euros: Web 2.0 Internationalization

Client-side Scripting
JavaScript Internationalization

ECMAScript edition 3 added a range of internationalization features (1999)
• Good support for Unicode processing
• Set of locale-sensitive functions
    • Dependent on host locale (i.e. browser)
• Set of locale-insensitive functions
• No number or date/time parsing

JavaScript libraries with additional internationalization functionality
• Dojo Toolkit (i18n contributed by IBM)
• Microsoft AJAX Library

Page 19: Making Cents of Yens and Euros: Web 2.0 Internationalization

Client-side Scripting
Ajax Recommendations

Late globalization
• Transmit data in locale-independent form with XMLHttpRequest
• Might require some creative parsing/UI

Early localization
• Text localization server-side
• Browsers are missing a message-catalog facility
• Dynamically created page content is invisible to search engines
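
One way to picture "late globalization" is a server response that carries only locale-independent values and leaves formatting to the client. The Python sketch below builds such a payload as JSON. The field names, the function `exchange_rate_payload`, the USD base currency, and the choice of JSON are assumptions for illustration, not taken from the demo.

```python
import json
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def exchange_rate_payload(base, quote, rate, as_of):
    """Serialize a rate with no locale-specific formatting."""
    return json.dumps({
        "base": base,                 # ISO 4217 currency codes, not names
        "quote": quote,
        "rate": str(rate),            # plain decimal point, no grouping
        "asOf": as_of.astimezone(timezone.utc).isoformat(),  # ISO 8601 in UTC
    })


noon_est = datetime(2007, 3, 8, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
payload = exchange_rate_payload("USD", "CAD", Decimal("1.1785"), noon_est)
decoded = json.loads(payload)
```

The browser script can then render `rate` and `asOf` for the user's locale, which is where the "creative parsing/UI" mentioned on the slide comes in.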

Page 20: Making Cents of Yens and Euros: Web 2.0 Internationalization

Demo
A Currency Converter Application: dynamic update of exchange amounts using Ajax

Page 21: Making Cents of Yens and Euros: Web 2.0 Internationalization

Multi-lingual Syndication
RSS 2.0

Character encoding
• RSS 2.0 is an XML application
• XML encoding rules apply

Language
• Element only on channel (feed), not on item
    • Create one channel per language
• Specified to comply with RFC 1766 language tags

Date/Time
• In standard RFC 822 format (including 4-digit years)
    • E.g. “Wed, 02 Oct 2002 08:00:00 EST”
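
A feed generator can produce this date format with Python's standard `email.utils` module, sketched below. The fixed UTC-5 offset stands in for the slide's "EST" and is an assumption of the example; `format_datetime` writes the zone as a numeric offset rather than a name.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

EST = timezone(timedelta(hours=-5))  # fixed offset standing in for "EST"

published = datetime(2002, 10, 2, 8, 0, 0, tzinfo=EST)
pub_date = format_datetime(published)          # value for an RSS pubDate element
round_trip = parsedate_to_datetime(pub_date)   # what an aggregator would parse
```

The day and month names in this format are always English, so a locale-sensitive call such as `strftime("%a, %d %b %Y")` is a poor substitute: under a non-English locale it could emit names that feed readers cannot parse.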

Page 22: Making Cents of Yens and Euros: Web 2.0 Internationalization

Multi-lingual Syndication
Atom Syndication

More granular language marking
• xml:lang can be applied to any human-readable text in the format
• Aggregators need to deal with this

Better date/time format: RFC 3339
• E.g. “2003-12-13T18:30:02-05:00”

Acknowledgement: Tim Bray
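
The RFC 3339 example on this slide can be produced and parsed with Python's `datetime` alone, which is part of what makes the format easier to handle than RFC 822. A minimal sketch, using the slide's own timestamp:

```python
from datetime import datetime, timedelta, timezone

updated = datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone(timedelta(hours=-5)))

atom_updated = updated.isoformat()                      # value for an Atom updated element
parsed = datetime.fromisoformat(atom_updated)           # round trip (Python 3.7+)
as_utc = updated.astimezone(timezone.utc).isoformat()   # same instant, expressed in UTC
```

The format contains no month or weekday names, so there is nothing language-dependent to get wrong, and timestamps with the same offset sort correctly as plain strings.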

Page 23: Making Cents of Yens and Euros: Web 2.0 Internationalization

Demo
A Currency Converter Application: adding a syndication feed with exchange rate information

Page 24: Making Cents of Yens and Euros: Web 2.0 Internationalization

International Web Services Design
Service Patterns

Pattern | Description | Request data | Return data
Locale Neutral | Neutral data formats | CAD | 1.1785
Client Influenced | Service reacts to client locale, e.g. HTTP Accept-Language | CAD (Accept-Language: de) | Kanadischer Dollar
Service Determined | Service is locale-specific and ignores client preference | | 03/08/2007 12:00pm EST
Data Driven | Service adjusts formatting and language to the locale the data refers to | NOK | norske kroner
Data Driven | (as above) | CHF | ?
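
The difference between the locale-neutral and client-influenced patterns can be sketched in a few lines of Python. The lookup table `CURRENCY_NAMES` is hypothetical sample data and `describe_currency` is an invented name; for brevity the sketch takes languages in header order and ignores `q` weights.

```python
# Hypothetical sample data: display names keyed by currency code and language.
CURRENCY_NAMES = {
    "CAD": {"en": "Canadian dollar", "de": "Kanadischer Dollar"},
}


def describe_currency(code, accept_language=None):
    """Client-influenced lookup with a locale-neutral fallback (the bare code)."""
    names = CURRENCY_NAMES.get(code, {})
    if accept_language:
        for item in accept_language.split(","):
            # Keep only the primary language subtag, e.g. "de" from "de-AT;q=0.8".
            language = item.split(";")[0].strip().lower().split("-")[0]
            if language in names:
                return names[language]
    return code  # locale-neutral: the ISO 4217 code itself


german = describe_currency("CAD", "de")
neutral = describe_currency("CAD")
unsupported = describe_currency("CAD", "da-DK")
```

A data-driven service would instead key the language off the currency itself, which is why CHF is marked with a question mark in the table: Switzerland has several official languages, so the data does not determine a single one.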

Page 25: Making Cents of Yens and Euros: Web 2.0 Internationalization

International Web Services Design
REST

REST naturally ties into i18n features in HTTP/HTML/XML
• Locale indicated with HTTP Accept-Language
• Encoding and language marking in markup

Special caution for HTTP GET parameters
• Locale-independent formatting recommended
• Text parameters
    • Encode in UTF-8 and escape in URIs
    • IRI (Internationalized Resource Identifier) functionality might provide this for you
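
For GET parameters, "encode in UTF-8 and escape in URIs" amounts to percent-encoding the UTF-8 bytes of each value. Python's `urllib.parse` does this by default, as the sketch below shows. The parameter names (`from`, `to`, `amount`, `note`) are hypothetical and not the demo service's actual interface.

```python
from urllib.parse import quote, unquote, urlencode

city = "Zürich"

escaped = quote(city)  # UTF-8 bytes of "ü" (0xC3 0xBC) are percent-encoded
query = urlencode({
    "from": "CAD",
    "to": "CHF",
    "amount": "1000000.23",   # locale-independent: decimal point, no grouping
    "note": city,
})
restored = unquote(escaped)
```

Sending the amount as `1000000.23` rather than `1,000,000.23` or `1.000.000,23` follows the slide's recommendation of locale-independent formatting, so the service never has to guess the client's number format.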

Page 26: Making Cents of Yens and Euros: Web 2.0 Internationalization

International Web Services Design
SOAP

Locale can be communicated in
• Transport header (e.g. HTTP)
• SOAP header
• SOAP message body

Beware of automatically generated SOAP interfaces
• Might be locale-dependent without allowing the caller to specify a locale

Use of XML Schema data types promotes locale-independence
Also consider localization of error messages

Page 27: Making Cents of Yens and Euros: Web 2.0 Internationalization

Demo
A Currency Converter Application: exchange rates as a REST web service

Page 28: Making Cents of Yens and Euros: Web 2.0 Internationalization

Conclusions

Unification
• One code base

Customization
• Localization and adaptation for locales

Next step: cross-language “leakage”
• Provide views in multiple languages to the same (user-generated) data
• Translate user-generated content
    • Volunteers
    • Machine Translation

Page 29: Making Cents of Yens and Euros: Web 2.0 Internationalization

Call for Contributions

The Perl CGI demo code is available on
• http://www.digitalsilkroad.net/twiki/CurrencyConverter

Add a version in your preferred language
• Ruby on Rails
• PHP
• Python
• …

A similar application for ASP.NET is available on
• http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx

Page 30: Making Cents of Yens and Euros: Web 2.0 Internationalization

References

W3C Internationalization Activity
• http://www.w3.org/International/
POSIX Locale
• http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
International Components for Unicode
• http://www-306.ibm.com/software/globalization/icu/
Unicode/Common Locale Data Repository
• http://www.unicode.org/
Microsoft Internationalization APIs
• http://msdn2.microsoft.com/en-us/library/ms776254.aspx
• http://msdn2.microsoft.com/en-us/library/system.globalization.aspx

Page 31: Making Cents of Yens and Euros: Web 2.0 Internationalization

References

OmegaT
• http://www.omegat.org/omegat/omegat_en/omegat.html
Open Language Tools
• https://open-language-tools.dev.java.net/
The WordForge Project
• http://www.wordforge.org/drupal/
JavaScript Internationalization
• http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html
RSS 2.0
• http://www.rssboard.org/rss-specification
Atom Syndication
• http://www.atomenabled.org/developers/syndication
RSS 1.0
• http://web.resource.org/rss/1.0/spec
W3C Web Services Internationalization Usage Scenarios
• http://www.w3.org/TR/ws-i18n-scenarios/

Page 32: Making Cents of Yens and Euros: Web 2.0 Internationalization

Additional Slides

Page 33: Making Cents of Yens and Euros: Web 2.0 Internationalization

Multi-lingual Syndication
RSS 1.0

Character encoding
• RSS 1.0 is an XML application
• XML encoding rules apply

Complies with the RDF (Resource Description Framework) specification
• Definition of language and date/time formats is left to RDF metadata formats
    • Dublin Core Metadata Element Set
        • Language: RFC 1766/ISO 639-2
        • Date/Time: ISO 8601 (superset of RFC 3339)
        • Dublin Core also allows specifying time periods!