Addison Phillips, Chair

W3C Internationalization WG

Towards the Promised Land:

Globalization Developments in Web Standards

Presenter

• Globalization Architect, Amazon Lab126 (we make the Kindle)

• Chair, W3C Internationalization WG

Acknowledgements

• This presentation owes much of its content to these contributors:
– Richard Ishida (W3C Internationalization Activity lead)
– Felix Sasaki (W3C MLW-LT)
– Aharon Lanin (Google, bidi maven)
– Norbert Lindenberg (ES-I18N)
– Koji Ishii (Rakuten)

The Web: vastly improved or room for improvement?

• Why “the promised land”?
– The promise of a multilingual Web is being realized, and new W3C specifications help demonstrate that.
– Many features are implemented.

• Why only “towards”?
– We’ve waited a long time.
– Many of the features we’ll talk about today are not implemented yet, or are only partially implemented.

• What issues are more or less solved on the Web?

• What are we doing to address the remaining problems?

• How can you influence the outcomes?

جعل شبكة الويب العالمية عالمية حقًّا!

وب جهانی را به درستی جهانی سازیم!

عالمگیر ویب کو حقیقی طور پر عالمگیر بنانا

Համաշխարհային ցանցն իրոք համաշխարհային դարձնելը

ᑖᑦᓱᒪ ᐃᑭᐊᖅᑭᕕᒃ ᓯᓚᕐᔪᐊᓕᒫᒥᒃ ᓈᕆᑎᑉᐹ.

"Дүниежүзілік торды" нағыз дүниежүзілік етеміз!

वर्ल्ड वाईड वेबलाई यथार्थमै विश्वव्यापी बनाउने !

የዓለም አቀፉን ድር በእውነት አለም አቀፍ ማድረግ!

Κάνοντας τον Παγκόσμιο Ιστό πραγματικά Παγκόσμιο

ਵਰਲਡ ਵਾਈਡ ਵੈਬ ਨੂੰ ਵਾਕਈ ਵਿਸ਼ਵ-ਵਿਆਪੀ ਬਨਾਉਣਾ !

缔造真正全球通行的万维网

ליצור מהרשת רשת כלל עולמית באמת!

ˈmeɪkɪŋ ðə wɜːld waɪd wɛb ˈtruːlɪ ˈwɜːldˈwaɪd

ワールド・ワイド・ウェッブを世界中に広げましょう

ធ្វើឲ្យវើលវ៉ាយវ៉េបមានទូទាំងពិភពលោកពិតប្រាកដមែន!

전세계의 월드 와이드 웹으로 만들기 !

Gwneud y we fyd-eang yn wirioneddol fyd-eang!

การทำให้ World Wide Web แพร่หลายไปทั่วโลกอย่างแท้จริง

འཛམ་གླིང་ཡོངས་འབྲེལ་འདི་ ངོ་མ་འབད་རང་ འཛམ་གླིང་ཡོངས་ལུ་ཁྱབ་ཚུགསཔ་བཟོ་བ།

"The Path W3C follows to making text on the Web truly global is Unicode." Tim Berners-Lee

Unicode

http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html


Encoding declarations

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html lang='en'>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

</head>

...

<!DOCTYPE html>

<html>

<head>

<meta charset=utf-8>

</head>

...

• Strong encouragement to use UTF-8.

• New meta charset declaration. Either form works, but make sure the page does not contain both.

• The declaration must fit entirely within the first 1024 bytes of the file.
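The 1024-byte rule can be checked mechanically. The sketch below is only an illustration: `declaresCharsetEarly` is a hypothetical helper name, and the regular expression is a rough stand-in for the real HTML5 prescan, which is more involved.

```javascript
// Sketch: does a <meta> encoding declaration fit entirely inside the
// first 1024 bytes of a page? (Hypothetical helper, simplified matching.)
const PRESCAN_LIMIT = 1024;

function declaresCharsetEarly(bytes) {
  // Only the first 1024 bytes are examined.
  const head = bytes.subarray(0, PRESCAN_LIMIT);
  // Decode byte-for-byte; the declaration itself is ASCII.
  let text = "";
  for (const b of head) text += String.fromCharCode(b);
  // Accepts <meta charset=utf-8> and the older http-equiv form,
  // but only if the closing ">" is also inside the window.
  return /<meta\b[^>]*\bcharset\s*=\s*["']?[\w-]+[^>]*>/i.test(text);
}

const encoder = new TextEncoder();
const early = encoder.encode(
  "<!DOCTYPE html><html><head><meta charset=utf-8></head>"
);
const late = encoder.encode(
  "<!DOCTYPE html><!--" + "x".repeat(2000) + "--><meta charset=utf-8>"
);

console.log(declaresCharsetEarly(early)); // true
console.log(declaresCharsetEarly(late));  // false
```

A long comment or script block ahead of the declaration is the usual way a page ends up in the second case.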

HTML5 Encoding Spec

• Rules for determining, parsing, handling legacy encodings.

<h2><a id="რჩეული">რჩეული ფოტოსურათი</a></h2>

<p><a href="/wiki/ჭიამაია" title="ჭიამაია" class="mw-redirect">ჭიამაია</a> (Coccinellidae), ხოჭოების ოჯახს ეკუთვნის. აქვს ამობურცული, მომრგვალო ან ოვალური სხეული. ზურგზე ღია ფონზე შავი ლაქები აყრია, იშვიათად ...

Unicode versions and ids

History: CharMod

CharMod (the Character Model for the World Wide Web) was the start of the Internationalization Activity, based on requirements originally published in 1998. So how is this news?

I◌́zeli◌́to◌̋u◌̈l (NFD)

Ízelítőül (NFC)

Ha a világ beszélni akarna, Unicode-ul szólalna meg. Regisztráljon már most a Tizedik Nemzetközi Unicode Konferenciára, melyet 1997. március 10-12-én rendeznek Meinz-ban, Németországban. Ezen a konferencián az iparág több neves szakértője is résztvesz. Ízelítőül a témákból: a világháló és a Unicode nemzetközisítése és lokalizálása, a Unicode alkalmazása működő rendszerekben és alkalmazásokban, szövegelrendezésnél, és többnyelvű számítógépeken.

Normalization
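The difference between the two forms is easy to see in code. This is a minimal sketch using the standard `String.prototype.normalize` method and the word from the sample text; `sameText` is a hypothetical helper name.

```javascript
// "Ízelítőül" written with precomposed characters (NFC).
const nfc = "\u00CDzel\u00EDt\u0151\u00FCl";
// The same word with base letters followed by combining marks (NFD).
const nfd = nfc.normalize("NFD");

console.log(nfc.length); // 9
console.log(nfd.length); // 13 (four accents became separate code points)
console.log(nfc === nfd); // false: same text, different code point sequences

// Normalize both sides to one form before comparing (hypothetical helper).
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(sameText(nfc, nfd)); // true
```

The same mismatch affects ids, class names, and selectors when markup and stylesheet are authored in different forms.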

Evolution & Revolution

W3C ،نشاط التدويل

W3Cنشاط التدويل، ✘

✔<description dir="rtl">W3Cنشاط التدويل، </description>

Bidirectional text support

Bidi isolation for inserted text

<span dir=rtl>לילית</span> - 3 reviews✘


• CSS3 added the “isolate” value to the unicode-bidi property.

• HTML5 adds a new <bdi> element, with unicode-bidi:isolate in the default stylesheet.

• The <output> element behaves the same way.
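When text of unknown direction is inserted into a page by script, wrapping it in `<bdi>` keeps it from reordering its neighbours. A small sketch, assuming a review listing like the one shown earlier; `escapeHtml` and `reviewLine` are hypothetical helper names.

```javascript
// Escape text before placing it into markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap the inserted name in <bdi> so its direction cannot affect
// the " - 3 reviews" text that follows it.
function reviewLine(name, count) {
  return `<bdi>${escapeHtml(name)}</bdi> - ${count} reviews`;
}

console.log(reviewLine("A&B", 3)); // <bdi>A&amp;B</bdi> - 3 reviews
console.log(reviewLine("\u05DC\u05D9\u05DC\u05D9\u05EA", 3));
```

The second call produces the same markup shape for a Hebrew name, so the template does not need to know the direction in advance.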

Determining direction at run time


• HTML5 adds new “auto” value for the dir attribute.

• CSS3 adds a “plaintext” value to the unicode-bidi property to allow per-paragraph auto-direction, primarily for use on <textarea> and <pre> elements.

• dir=auto sets the unicode-bidi CSS property to “plaintext” for <textarea> and <pre> elements, to “bidi-override isolate” for <bdo> elements, and to “isolate” otherwise.

• dir=auto estimates the direction from the first character with a strong direction, following the Unicode Bidirectional Algorithm (UBA).

<p>Your search - <span class=booktitle dir=auto>הצהרות CSS</span> - did not match any documents.</p>
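The first-strong heuristic can be approximated in a few lines. This sketch is an assumption-laden simplification: JavaScript regular expressions do not expose the Bidi_Class property, so it treats letters from a handful of right-to-left scripts as strong RTL and every other letter as strong LTR. `estimateDirection` is a hypothetical name.

```javascript
// Letters of any script count as "strong" here (a simplification).
const ANY_LETTER = /\p{L}/u;
// A short, incomplete list of right-to-left scripts.
const RTL_SCRIPT =
  /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Return "rtl" or "ltr" based on the first letter found, or null
// when the text has no letters at all (digits, punctuation only).
function estimateDirection(text) {
  for (const ch of text) {
    if (ANY_LETTER.test(ch)) {
      return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
    }
  }
  return null;
}

console.log(estimateDirection("\u05D4\u05E6\u05D4\u05E8\u05D5\u05EA CSS")); // "rtl"
console.log(estimateDirection("123 CSS")); // "ltr"
console.log(estimateDirection("123 !"));   // null
```

Because only the first letter decides, a Hebrew title that happens to start with a Latin brand name would be estimated as left-to-right, which is the known weakness of this kind of heuristic.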

Unicode Isolate Controls

Four new code points:
• U+2066 LEFT-TO-RIGHT ISOLATE (LRI)
• U+2067 RIGHT-TO-LEFT ISOLATE (RLI)
• U+2068 FIRST STRONG ISOLATE (FSI)
• U+2069 POP DIRECTIONAL ISOLATE (PDI)

FSIפיצה!PDI - 3 reviews ==> !3 - פיצה reviews
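In plain text, where markup is not available, the same isolation can be applied with the control characters themselves. A minimal sketch; `isolate` is a hypothetical helper name.

```javascript
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Wrap inserted text so its direction is resolved on its own and
// does not leak into the surrounding string.
function isolate(text) {
  return FSI + text + PDI;
}

// Hebrew name followed by "!", then the English remainder.
const line = `${isolate("\u05E4\u05D9\u05E6\u05D4!")} - 3 reviews`;

console.log(line.codePointAt(0).toString(16)); // "2068"
console.log(line.indexOf(PDI));                // 6
```

Rendering still depends on the display engine supporting the isolate characters, which is not something this sketch can check.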

Unicode Isolates -> HTML Markup

• http://www.w3.org/International/wiki/Html-bidi-isolation Needs comments!
– @direction (isolating)
– Other options rejected:
  • Change dir to be isolating
  • Use <bdi> for isolation
  • Add ‘rli’ ‘lri’ to @dir (<span dir=“rli”>)
  • Add @isolate (<span dir=“rtl” isolate>)

Other bidi changes

• Reporting the chosen direction of <input> and <textarea> in form submissions (@dirname)

• <br> should serve as a bidi separator

• Block elements as bidi separators (isolating)

• <title> supports the dir attribute

• <option> supports the dir attribute and is displayed accordingly, both in the dropdown and after being chosen

Implementers of user agents need to be prodded by the public to support the developing marketplace!

[Figure: baseline alignment across scripts, showing the hanging, alphabetic, and ideographic baselines]

CSS3

Requirements for Japanese Layout

What about my language?

• Other language groups interested in building documents can do so
– Korean nearing FPWD (First Public Working Draft)
– Indic languages
– ???

Vertical text

Writing Mode

CSS3 has a new module for “writing mode” that supports vertical text.

http://www.w3.org/TR/css3-writing-modes/

Ruby annotation

<ruby>凝<rt>ぎょう</rt>視<rt>し</rt></ruby>

<ruby><rb>凝</rb><rt>ぎょう</rt></ruby> <ruby><rb>視</rb><rt>し</rt></ruby>

<ruby><rbc><rb>凝</rb><rb>視</rb></rbc><rtc><rt>ぎょう</rt><rt>し</rt></rtc></ruby>

• http://rishida.net/misc/ruby/ruby-authoring.html

Zusätzlich erleichtert PLS die Eingrenzung von Anwendungen, indem es Aussprachebelange von anderen Teilen der Anwendung abtrennt.

* { hyphens: auto; }

Zusätzlich er-leichtert PLS die Eingrenzung von Anwendungen, in-dem es Aussprac-hebelange von an-deren Teilen der Anwendung ab-trennt.

Hyphenation

Hyphenation Support

• Hyphenation support is starting to become available.

– Still works best with embedded (server-side) hinting

– Language support??

Still in flux… development needed

<!DOCTYPE html>

<html lang=it>

<head>

<meta http-equiv=Content-Language content="en, it">

</head>

...

• Attributes indicate the language of text inside that element for text processors. Only one language value allowed.

• Meta elements indicate the language of the expected readership. Multiple languages are ok.

• Attributes override other declarations.

• The meta element with Content-Language is now non-conforming.

Language declarations

BCP 47 improvements

• Basis for Java 7, JavaScript, PHP, .NET and other locale systems

• -u- extension

– Unicode Locales (RFC 6067)

• :lang pseudo-class

– CSS selection

• -t- extension

– Transliterations and transformations (RFC 6497)
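A language tag with a -u- extension can be taken apart in script. The sketch below uses `Intl.Locale` and `Intl.getCanonicalLocales`, which arrived in JavaScript engines later than the first edition of the Internationalization API discussed in this talk; the tag itself is just an example.

```javascript
// Thai, Thailand, with Unicode extension keys for the Buddhist
// calendar (ca) and Thai digits (nu).
const locale = new Intl.Locale("th-TH-u-ca-buddhist-nu-thai");

console.log(locale.language);        // "th"
console.log(locale.region);          // "TH"
console.log(locale.calendar);        // "buddhist"
console.log(locale.numberingSystem); // "thai"

// Tags are case-insensitive; canonicalization fixes the casing.
const [canonical] = Intl.getCanonicalLocales("EN-us");
console.log(canonical); // "en-US"
```

The extension keys travel with the tag, so a single string can carry both the language and the formatting preferences.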

<time datetime="2004-08-08">8 สิงหาคม ๒๕๔๗</time>

<form>

<input type="date">

</form>

Improved Date/Time Support

Locale Sensitivity

• Still an issue for the Web
– Date pickers not locale or language sensitive
– No markup-based control over format
– Time zone support is spotty

JavaScript gets locales at last!

• ECMAScript ‘intl’ extension work
– Locales based on BCP 47 language tags
– Date, number formatting
– Collation
– and more…

• Core spec addressing Unicode needs, particularly supplementary character support

http://wiki.ecmascript.org/doku.php?id=strawman:i18n_api
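Supplementary characters are where the UTF-16 roots of JavaScript strings show through. A short sketch using U+20BB7, a CJK ideograph outside the Basic Multilingual Plane, chosen here only as an example; the code point syntax and methods shown come from later editions of the core language.

```javascript
// One supplementary character followed by a Latin letter.
const text = "\u{20BB7}a";

console.log(text.length);      // 3: length counts UTF-16 code units
console.log([...text].length); // 2: iteration walks code points

// charCodeAt sees only the high surrogate; codePointAt sees the character.
console.log(text.charCodeAt(0).toString(16));  // "d842"
console.log(text.codePointAt(0).toString(16)); // "20bb7"
```

Code that truncates or reverses strings by index can split the surrogate pair, which is why code point aware iteration matters.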

ES I18N Spec

• Internationalization API Specification
• Developed by ECMA TC 39 + experts
• Collation, number, date & time formatting
• Started fall 2010
• Implementations and test suite in progress
• Approved in December 2012
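The three services listed above look roughly like this in use. This is a sketch that assumes a runtime with full locale data; the sample values and locales are illustrative, and results depend on the locale data the runtime ships with.

```javascript
// Number formatting: same value, different grouping and decimal marks.
const amount = 1234567.89;
const german = new Intl.NumberFormat("de-DE").format(amount);
const american = new Intl.NumberFormat("en-US").format(amount);
console.log(german);   // "1.234.567,89" with full locale data
console.log(american); // "1,234,567.89"

// Date formatting: Thai locale, Buddhist calendar, Thai digits.
const date = new Date(Date.UTC(2004, 7, 8));
const thaiDate = new Intl.DateTimeFormat("th-TH-u-ca-buddhist-nu-thai", {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
}).format(date);
console.log(thaiDate);

// Collation: German sorts "ä" next to "a", Swedish sorts it after "z".
const words = ["z", "\u00E4", "a"];
const germanOrder = [...words].sort(new Intl.Collator("de").compare);
const swedishOrder = [...words].sort(new Intl.Collator("sv").compare);
console.log(germanOrder);  // [ "a", "ä", "z" ]
console.log(swedishOrder); // [ "a", "z", "ä" ]
```

All three constructors take BCP 47 language tags, so the locale plumbing is shared across formatting and sorting.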

Webapps at W3C

• Various technologies that make Web-based applications possible are under development. Some samples:

– IDL
– Web sockets, Web storage, Web workers
– XHR
– Widgets
– Selectors
– File APIs
– DOM

The Widget Spec

• Widget containers deliver “apps” cross-platform based on HTML5
– Extensive localization model
– Ability to set base locale

<widget xmlns="http://www.w3.org/ns/widgets" defaultlocale="en">

<name short="Weather"> Weather! a totally awesome application! </name>

<name short="آب و هوا" xml:lang="fa" dir="rtl"><span dir="ltr" xml:lang="en">Weather!</span> برنامه واقعا بزرگ</name>

</widget>


ITS 2.0

• Internationalization Tag Set (ITS) 2.0
• Currently being defined in the W3C MultilingualWeb-LT Working Group
• Latest draft: 6 December 2012 (“Last Call”) http://www.w3.org/TR/its20/
• WG homepage: http://www.w3.org/International/multilingualweb/lt/
• ITS 2.0 test suite: https://github.com/finnle/ITS-2.0-Testsuite/


“Translate” locally in HTML5 or XML (example: DocBook)

<!DOCTYPE html>
<html> ...
<p>The <span translate=no>World Wide Web Consortium</span> is making the World Wide Web worldwide!</p>
...
</html>

<db:article ...>
<db:para>The <db:emphasis its:translate="no">World Wide Web Consortium</db:emphasis> is making the World Wide Web worldwide!</db:para>
...
</db:article>

Part of HTML5 !!

markup for bidirectional text

normalization

working with case sensitivity

more information about date & time

Capturing guidance for spec developers and implementers (and you)

Tests

Articles

Tutorials

Technical notes

Talks

Tools

Reviews

http://www.w3.org/International/

Internationalization Activity

http://validator.w3.org/i18n-checker/

Checker tool

1. Discover

2. Check

Get involved!

• Follow the discussions on the internationalization mailing lists (e.g. www-international@w3.org), and track other technologies for internationally relevant topics. Follow our RSS feeds and Twitter channels (@webi18n and @multilingweb)

• Read and review specifications (http://www.w3.org/TR/tr-technology-drafts) and send comments to the www-international list or direct to the Working Group.

• Discuss local requirements for the Multilingual Web, and if you identify missing features, find ways to coordinate proposals.

• Use features needed for non-Latin script support and push implementers to include more in browsers and authoring tools.

• Join the Working Group

The Web needs your help

this is your Web – not the W3C's

we need you to make the Web worldwide

get involved

Thank you

http://www.inter-locale.com/whitepaper/imug2013
