
"Internationalisation with PHP and Intl" source code


Accompanying source code for "Internationalisation with PHP and Intl"


First play and test

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

//'hi' is Hindi, 'fa' is Farsi, 'ar_EG' is Egyptian Arabic
$locales = array('en', 'en_US', 'fr_FR', 'de_DE', 'hi', 'fa', 'ar_EG');

$number = 1234567890;

foreach($locales as $locale){
    $formatter = new NumberFormatter($locale, NumberFormatter::DECIMAL);
    echo $locale . ":\t" . $formatter->format($number) . "\n";
}

//Output:
//en: 1,234,567,890
//en_US: 1,234,567,890
//fr_FR: 1 234 567 890
//de_DE: 1.234.567.890
//hi: १,२३,४५,६७,८९०
//fa: ۱٬۲۳۴٬۵۶۷٬۸۹۰
//ar_EG: ١٬٢٣٤٬٥٦٧٬٨٩٠

?>
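
The same formatter can also turn a localized string back into a number. The following is a minimal sketch, assuming the Intl extension is loaded; the variable names are illustrative and not part of the original script.

```php
<?php
//Sketch only: format a number for de_DE, then parse the string back.
$formatter = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);

//same value and locale as in the script above
$formatted = $formatter->format(1234567890);

//parse() returns a float by default, or false if the string can't be parsed
$parsed = $formatter->parse($formatted);

echo $formatted . "\n";
var_dump($parsed);
```

Parsing with the formatter of the locale that produced the string matters here: the full stop is a grouping separator in de_DE but a decimal separator in en_US.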

Sorting German

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

//some German surnames
$german_names = array('Weiß', 'Goldmann', 'Göbel', 'Weiss', 'Göthe', 'Goethe', 'Götz');

sort($german_names);

//gives Array ( [0] => Goethe [1] => Goldmann [2] => Göbel [3] => Göthe
//[4] => Götz [5] => Weiss [6] => Weiß )
//which COINCIDENTALLY is the Austrian sort order
print_r($german_names);

sort($german_names, SORT_STRING); //default is SORT_REGULAR
print_r($german_names); //gives same as above

//BTW, you're not going to get far with setlocale() if you don't
//have that particular locale supported on your OS!
//on *nixes, something like:
//> locale --all-locales
//will give you a list of all installed locales
//
//you can give setlocale() a *list* of locales to try if
//you're not sure how your OS is spelling it etc
//
//ICU (and therefore Intl) uses its own locales and is not dependent
//on the operating system for the locale data

setlocale(LC_ALL, 'de_DE');
sort($german_names, SORT_LOCALE_STRING);
print_r($german_names);
//above gives Array ( [0] => Göbel [1] => Göthe [2] => Götz [3] => Goethe
//[4] => Goldmann [5] => Weiß [6] => Weiss )
//which isn't dictionary, phonebook or Austrian sort order :(
//[it seems to be "umlauted vowel comes before plain vowel, eszett
//comes before double ess" order]

setlocale(LC_ALL, 'de_AT');
sort($german_names, SORT_LOCALE_STRING);
print_r($german_names);
//above gives same as for de_DE - ie. nothing good :(

//this is curious...
setlocale(LC_ALL, 'de_DE.utf8');
sort($german_names, SORT_LOCALE_STRING);
print_r($german_names);
//above gives Array ( [0] => Göbel [1] => Goethe [2] => Goldmann
//[3] => Göthe [4] => Götz [5] => Weiss [6] => Weiß )
//which is our dictionary sort order!

//----Let's try using Intl--------

$coll = new Collator('de_DE');

$coll->sort($german_names);

//gives Array ( [0] => Göbel [1] => Goethe [2] => Goldmann [3] => Göthe
//[4] => Götz [5] => Weiss [6] => Weiß )
//which is our dictionary sort order!
print_r($german_names);

//Collator constructor can accept UCA keywords
//see http://userguide.icu-project.org/collation/architecture
//and http://userguide.icu-project.org/locale
$coll = new Collator('de@collation=phonebook');

$coll->sort($german_names);

//gives Array ( [0] => Göbel [1] => Goethe [2] => Göthe [3] => Götz
//[4] => Goldmann [5] => Weiss [6] => Weiß )
//which is our phonebook sort order!
print_r($german_names);

?>
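
The difference between the two orders can be seen on a single pair of names with Collator::compare(). This is a small sketch, assuming the same ICU collation data that produced the outputs above; the variable names are illustrative.

```php
<?php
//Sketch only: compare() returns a negative number, zero or a positive number
//(or false on error), much like strcmp().
$dictionary = new Collator('de_DE');
$phonebook  = new Collator('de@collation=phonebook');

//dictionary order sorts "ö" alongside "o", so Goldmann comes before Göthe
$dictionary_result = $dictionary->compare('Goldmann', 'Göthe');

//phonebook order sorts "ö" like "oe", so Göthe comes before Goldmann
$phonebook_result = $phonebook->compare('Göthe', 'Goldmann');

var_dump($dictionary_result < 0);
var_dump($phonebook_result < 0);
```

Both comparisons agree with the sorted arrays printed by the script above. The same method can be passed to usort() as array($coll, 'compare') when the array holds more than plain strings.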

Page 3: "Internationalisation with PHP and Intl" source code

Japanese era

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

$timezones = array(
    'en_GB' => 'Europe/London',
    'ja_JP' => 'Asia/Tokyo',
    'ja_JP@calendar=japanese' => 'Asia/Tokyo',
);

$now = new DateTime(); //DateTime is a core PHP class as of version 5.2.0

foreach($timezones as $locale => $timezone){
    $calendar = IntlDateFormatter::GREGORIAN;

    if(strpos($locale, 'calendar=') !== false){
        //slightly presumptuous as @calendar=gregorian also exists
        $calendar = IntlDateFormatter::TRADITIONAL;
    }

    $formatter = new IntlDateFormatter($locale, IntlDateFormatter::FULL,
        IntlDateFormatter::FULL, $timezone, $calendar);

    echo 'It is now: "' . $formatter->format($now) . '" in ' . "{$timezone}\n";
}

//Last line of output gives "平成23年" which is Heisei 23!
//(Heisei 23 is 2011, the year this script was run)

?>
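
Because the script above formats the current time, its output changes from run to run. The following sketch pins the date to 17 August 2011 (the date used elsewhere in these examples) so the era is reproducible. The variable names are illustrative.

```php
<?php
//Sketch only: a fixed date instead of "now"
$date = new DateTime('2011-08-17 12:00:00', new DateTimeZone('Asia/Tokyo'));

$formatter = new IntlDateFormatter(
    'ja_JP@calendar=japanese',
    IntlDateFormatter::FULL,        //date style
    IntlDateFormatter::NONE,        //no time part
    'Asia/Tokyo',
    IntlDateFormatter::TRADITIONAL  //use the calendar named in the locale
);

$formatted = $formatter->format($date);

//the formatted date should contain the era and year "平成23年"
echo $formatted . "\n";
```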

Korean numbers

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

//See:
//http://en.wikipedia.org/wiki/Korean_numerals
//http://askakorean.blogspot.com/2010/03/korean-language-series-sino-korean.html

$number = 1234567890;

$formatter = new NumberFormatter('ko_KR', NumberFormatter::SPELLOUT);

echo "Korean spellout ({$formatter->getLocale()}):\t". $formatter->format($number) . "\n";

//above gives [Korean spellout (en_GB): one thousand two hundred and thirty-four
//million, five hundred and sixty-seven thousand, eight hundred and ninety]
//ie. locale has fallen back to system default (in this case en_GB)

//but ko_KR is a valid ICU locale string, so let's check:
$formatter = new NumberFormatter('ko_KR', NumberFormatter::CURRENCY);

echo "Korean currency ({$formatter->getLocale()}):\t". $formatter->format($number) . "\n";

//above gives [Korean currency (ko): ₩1,234,567,890] which is correct

//ok, so it looks like we don't have the rules for Korean spellout
//we'll have to supply the NumberFormatter with our own ruleset.
//the technical details of the ruleset format are at
//http://userguide.icu-project.org/formatparse/numbers/rbnf-examples
//BUT we *do* have a ruleset for Japanese spellout which we can modify and use!
//(it's similar because it also counts in ten thousands and has non-Arabic numerals)
//the Japanese spellout ruleset is this (construct a NumberFormatter for Japanese
//spellout and then var_dump($formatter->getPattern())):
//
//pattern for japanese spellout
/* string(1520) "%financial:

0: 零;

1: 壱;

2: 弐;

3: 参;

4: 四;

5: 伍;

6: 六;

7: 七;

8: 八;

9: 九;

10: 拾;

11: 拾>%financial>;

20: <%financial<拾;

21: <%financial<拾>%financial>;

100: <%financial<百;

101: <%financial<百>%financial>;

1000: <%financial<千;

1001: <%financial<千>%financial>;

10000: <%financial<萬;

10001: <%financial<萬>%financial>;

100000000: <%financial<億;

100000001: <%financial<億>%financial>;


1000000000000: <%financial<兆;

1000000000001: <%financial<兆>%financial>;

10000000000000000: =#,##0=;

-x: マイナス>%financial>;

x.x: <%financial<点>%financial>;

%traditional:

0: 〇;

1: 一;

2: 二;

3: 三;

4: 四;

5: 五;

6: 六;

7: 七;

8: 八;

9: 九;

10: 十;

11: 十>%traditional>;

20: <%traditional<十;

21: <%traditional<十>%traditional>;

100: 百;

101: 百>%traditional>;

200: <%traditional<百;

201: <%traditional<百>%traditional>;

1000: 千;

1001: 千>%traditional>;

2000: <%traditional<千;

2001: <%traditional<千>%traditional>;

10000: <%traditional<万;

10001: <%traditional<万>%traditional>;

100000000: <%traditional<億;

100000001: <%traditional<億>%traditional>;

1000000000000: <%traditional<兆;

1000000000001: <%traditional<兆>%traditional>;

10000000000000000: =#,##0=;

-x: マイナス>%traditional>;

x.x: <%traditional<・>%traditional>;"*/

//basically, for Japanese we count in groups of ten thousands
//and we have a traditional set of characters
//and a financial (anti-forgery) set of characters.
//
//Korean also counts in groups of ten thousands
//so let's modify this pattern for Korean
//(note that we'll use only South Korean Sino-Korean numbers)

$korean_pattern = '%hangul:

0: 영; 1: 일; 2: 이; 3: 삼; 4: 사; 5: 오; 6: 육; 7: 칠; 8: 팔; 9: 구; 10: 십;

11: 십>%hangul>;

20: <%hangul<십;

21: <%hangul<십>%hangul>;

100: 백;

101: 백>%hangul>;

200: <%hangul<백;

201: <%hangul<백>%hangul>;

1000: 천;

1001: 천>%hangul>;

2000: <%hangul<천;

2001: <%hangul<천>%hangul>;

10000: <%hangul<만;

10001: <%hangul<만>%hangul>;

100000000: <%hangul<억;

100000001: <%hangul<억>%hangul>;

1000000000000: <%hangul<조;

1000000000001: <%hangul<조>%hangul>;

10000000000000000: =#,##0=;

-x: ->%hangul>;

x.x: <%hangul<・>%hangul>;

%hanja:

0: 〇;

1: 一;

2: 二;

3: 三;

4: 四;

5: 五;

6: 六;

7: 七;

8: 八;

9: 九;

10: 十;

11: 十>%hanja>;

20: <%hanja<十;

21: <%hanja<十>%hanja>;

100: 百;

101: 百>%hanja>;


200: <%hanja<百;

201: <%hanja<百>%hanja>;

1000: 千;

1001: 千>%hanja>;

2000: <%hanja<千;

2001: <%hanja<千>%hanja>;

10000: <%hanja<萬;

10001: <%hanja<萬>%hanja>;

100000000: <%hanja<億;

100000001: <%hanja<億>%hanja>;

1000000000000: <%hanja<兆;

1000000000001: <%hanja<兆>%hanja>;

10000000000000000: =#,##0=;

-x: ->%hanja>;

x.x: <%hanja<・>%hanja>;';

$formatter = new NumberFormatter('ko_KR', NumberFormatter::PATTERN_RULEBASED, $korean_pattern);

$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%hangul");

$numbers = array_merge(range(0, 20), range(30, 100, 10),
    array(1000, 10000, 100000000, 1000000000000));

foreach($numbers as $number){
    echo "{$number}: {$formatter->format($number)}\n";
}

//outputs a correct list of Korean Hangul numbers

?>
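
The paired rules in the pattern above (20 and 21, 200 and 201, and so on) are easier to follow in a cut-down ruleset. The sketch below keeps only the rules for 0 to 99 from the %hangul set; everything from 100 upwards falls through to plain digits. As far as I understand the ICU rule format, a rule applies from its base value up to the next rule's base value, `<` inserts the number divided by the rule's divisor (10 here), `>` inserts the remainder, and a rule with both substitutions is skipped in favour of the preceding rule when the remainder is zero. That is why 30 is formatted by the "20:" rule and not the "21:" rule.

```php
<?php
//Sketch only: a reduced copy of the %hangul rules, covering 0 to 99
$rules = '%hangul:
0: 영; 1: 일; 2: 이; 3: 삼; 4: 사; 5: 오; 6: 육; 7: 칠; 8: 팔; 9: 구; 10: 십;
11: 십>%hangul>;
20: <%hangul<십;
21: <%hangul<십>%hangul>;
100: =#,##0=;';

$formatter = new NumberFormatter('ko_KR', NumberFormatter::PATTERN_RULEBASED, $rules);
$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%hangul");

//15 uses "11:" (ten + remainder), 20 and 30 use "20:" (quotient + ten),
//21 uses "21:" (quotient + ten + remainder)
foreach(array(10, 15, 20, 21, 30) as $n){
    echo "{$n}: " . $formatter->format($n) . "\n";
}
```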

strftime (not on presentation)

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

//out of curiosity, let's see how PHP's core strftime() handles
//some different locale date formats

//BTW, you're not going to get far with setlocale() if you don't
//have that particular locale supported on your OS!
//on *nixes, something like:
//> locale --all-locales
//will give you a list of all installed locales
//
//you can give setlocale() a *list* of locales to try if
//you're not sure how your OS is spelling it etc
//
//ICU (and therefore Intl) uses its own locales and is not dependent
//on the operating system for the locale data

$locales = array(
    array('fi_FI.utf8', 'fi_FI'),
    array('ja_JP.utf8', 'ja_JP'),
    array('fr_FR.utf8', 'fr_FR'),
);

$format = "%A %d %B %Y"; //ie. "Wednesday 17 August 2011"

foreach($locales as $locale_array){
    $locale = setlocale(LC_TIME, $locale_array);
    echo "{$locale}:\t" . strftime($format) . "\n";
}

//Output:
//fi_FI.utf8: keskiviikko 17 elokuu 2011
//ja_JP.utf8: 水曜日 17 8月 2011
//fr_FR.utf8: mercredi 17 août 2011

//Comment:
//Not bad at all. The Japanese date won't look very natural to a
//native Japanese speaker, as the day and year aren't followed by the
//appropriate characters (日 and 年).
//There also seem to be some implementation issues
//(see http://uk.php.net/manual/en/function.strftime.php).
//The Intl extension gives us more power and flexibility.

?>
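
For comparison, here is a sketch of the same three dates produced with IntlDateFormatter, which needs no OS locales and no setlocale() call. The fixed date and the $results array are my own additions for illustration. Note also that strftime() has been deprecated since PHP 8.1.0.

```php
<?php
//Sketch only: the Intl counterpart of the strftime() loop, with a fixed date
$date = new DateTime('2011-08-17 12:00:00', new DateTimeZone('UTC'));

$results = array();

foreach(array('fi_FI', 'ja_JP', 'fr_FR') as $locale){
    $formatter = new IntlDateFormatter($locale, IntlDateFormatter::FULL,
        IntlDateFormatter::NONE, 'UTC');

    //each locale supplies its own word order, month form and date characters
    $results[$locale] = $formatter->format($date);
    echo "{$locale}:\t" . $results[$locale] . "\n";
}
```

Unlike the fixed "%A %d %B %Y" format string, the FULL style lets each locale decide the layout, so the Japanese date comes out with 年, 月 and 日 in place.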

Japanese financial numbers (not on presentation)

<?php

//note that this script file is UTF-8

//UTF-8 CLI assumed, else you'll need this:
//header("Content-Type: text/html; charset=UTF-8;");

//SUPPORTED LOCALES FOR "SPELLOUT" ARE A LOT MORE LIMITED
//THAN FOR DECIMAL OR CURRENCY ETC...
//
//from ICU website:
//"ICU provides number spellout rules for several locales,
//but not for all of the locales that ICU supports, and not all of the predefined rule types.
//Also, as of release 2.6, some of the provided rules are known to be incomplete."

$number = 1234567890;

$formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);

echo "Default Japanese spellout:\t" . $formatter->format($number) . "\n";

//above gives [十二億三千四百五十六万七千八百九十] - the usual kanji numbers

//see "Key/Type Definitions" at http://www.unicode.org/reports/tr35
$formatter = new NumberFormatter('ja_JP@numbers=jpanfin', NumberFormatter::SPELLOUT);
echo "Modified locale spellout:\t" . $formatter->format($number) . "\n";

//above also gives [十二億三千四百五十六万七千八百九十] - not our financial kanji numbers!

//Hmmmm, but if we now var_dump($formatter->getPattern()) we get:

//pattern for japanese spellout
//(interestingly, financial kanji here and at http://www.sljfaq.org/afaq/banknote-numbers.html differ)
/* string(1520) "%financial:

0: 零;

1: 壱;

2: 弐;

3: 参;

4: 四;

5: 伍;

6: 六;

7: 七;

8: 八;

9: 九;

10: 拾;

11: 拾>%financial>;

20: <%financial<拾;

21: <%financial<拾>%financial>;

100: <%financial<百;

101: <%financial<百>%financial>;

1000: <%financial<千;

1001: <%financial<千>%financial>;

10000: <%financial<萬;

10001: <%financial<萬>%financial>;

100000000: <%financial<億;

100000001: <%financial<億>%financial>;

1000000000000: <%financial<兆;

1000000000001: <%financial<兆>%financial>;

10000000000000000: =#,##0=;

-x: マイナス>%financial>;

x.x: <%financial<点>%financial>;

%traditional:

0: 〇;


1: 一;

2: 二;

3: 三;

4: 四;

5: 五;

6: 六;

7: 七;

8: 八;

9: 九;

10: 十;

11: 十>%traditional>;

20: <%traditional<十;

21: <%traditional<十>%traditional>;

100: 百;

101: 百>%traditional>;

200: <%traditional<百;

201: <%traditional<百>%traditional>;

1000: 千;

1001: 千>%traditional>;

2000: <%traditional<千;

2001: <%traditional<千>%traditional>;

10000: <%traditional<万;

10001: <%traditional<万>%traditional>;

100000000: <%traditional<億;

100000001: <%traditional<億>%traditional>;

1000000000000: <%traditional<兆;

1000000000001: <%traditional<兆>%traditional>;

10000000000000000: =#,##0=;

-x: マイナス>%traditional>;

x.x: <%traditional<・>%traditional>;"*/

//so the financial kanji are in there but how to wrangle them out??

$formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);

$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%financial");

echo "setTextAttribute spellout:\t" . $formatter->format($number) . "\n";

//above gives [拾弐億参千四百伍拾六萬七千八百九拾] - bingo!

//now, out of curiosity

//same formatter as above
$formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%traditional");

echo "setTextAttribute spellout:\t" . $formatter->format($number) . "\n";

//yes, this gives [十二億三千四百五十六万七千八百九十]

//notice that the %traditional and %financial patterns
//differ in more than just the characters used
//(for example, look at each format for the value of 100).
//let's take a look

$numbers = array(100, 199, 200, 201, 1000, 1999, 2000, 2001);

$traditional_formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);
$financial_formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);
$financial_formatter->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%financial");

foreach($numbers as $number){
    echo "{$number} as traditional:\t" . $traditional_formatter->format($number) . "\n";
    echo "{$number} as financial:\t" . $financial_formatter->format($number) . "\n";
    echo "----------------\n";
}

//outputs:
/*
100 as traditional: 百
100 as financial: 壱百
----------------
199 as traditional: 百九十九
199 as financial: 壱百九拾九
----------------
200 as traditional: 二百
200 as financial: 弐百
----------------
201 as traditional: 二百一
201 as financial: 弐百壱
----------------
1000 as traditional: 千
1000 as financial: 壱千
----------------
1999 as traditional: 千九百九十九
1999 as financial: 壱千九百九拾九
----------------
2000 as traditional: 二千
2000 as financial: 弐千
----------------
2001 as traditional: 二千一
2001 as financial: 弐千壱
----------------
*/
//We see that the different rules enable the financial spellout to write, say, "one thousand" instead of
//the traditional "thousand". This clearly makes sense in an anti-forgery context.

//User exercise: compare and contrast with PHP's core localeconv()

?>
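
Reading ruleset names out of a long getPattern() dump by eye is tedious. The sketch below uses a hypothetical helper, list_rulesets(), which is not part of Intl: it simply looks for "%name:" at the start of a line or directly after a semicolon. It assumes the rules string has the same shape as the dump shown above, and other ICU versions may ship different rulesets.

```php
<?php
//Hypothetical helper (not an Intl function): extract ruleset names
//such as "%financial" from a rule-based pattern string.
function list_rulesets($rules){
    //a name starts with % (or %% for private sets), follows the start of a
    //line or a semicolon, and is terminated by a colon
    preg_match_all('/(?:^|;)\s*(%%?[\w-]+):/mu', $rules, $matches);
    return $matches[1];
}

//a short excerpt in the same shape as the dump above
$sample = "%financial:\n0: 零;\nx.x: <%financial<点>%financial>;%traditional:\n0: 〇;";
print_r(list_rulesets($sample));

//against the live rules for Japanese spellout
$formatter = new NumberFormatter('ja_JP', NumberFormatter::SPELLOUT);
print_r(list_rulesets($formatter->getPattern()));
```

Any name it returns can then be tried with setTextAttribute(NumberFormatter::DEFAULT_RULESET, ...), as done for "%financial" above.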

Locale::acceptFromHttp (not on presentation)

<?php

//note that this script file is UTF-8

//We can set Intl's locale based on the browser's HTTP_ACCEPT_LANGUAGE header.
//Browsers send this header based on their "preferred language" setting.
//Only power users would tinker with this setting directly, but we can assume
//that it is *usually* correct.
//Google sites are quite good at using this header, try changing your
//browser's preferred language setting and then visit your favourite
//Google site!

header("Content-Type: text/html; charset=UTF-8;");

echo 'Browser\'s Accept-Language header: ' . $_SERVER['HTTP_ACCEPT_LANGUAGE'] . '<br>';

$browser_locale = Locale::acceptFromHttp($_SERVER['HTTP_ACCEPT_LANGUAGE']);
echo 'Decided browser locale: ' . $browser_locale . '<br>';

Locale::setDefault($browser_locale);
echo 'Intl default locale now: ' . Locale::getDefault() . '<br>'; //a check

$all_variants = Locale::getAllVariants(Locale::getDefault());
echo 'All variants: ';
print_r($all_variants);
echo '<br>';

$language_name = Locale::getDisplayLanguage(Locale::getDefault());
echo 'Language display name: ' . $language_name . '<br>';

$region_name = Locale::getDisplayRegion(Locale::getDefault());
echo 'Region display name: ' . $region_name . '<br>';

$script_name = Locale::getDisplayScript(Locale::getDefault());
echo 'Script display name: ' . $script_name . '<br>';

$variant_name = Locale::getDisplayVariant(Locale::getDefault());
echo 'Variant display name: ' . $variant_name . '<br>';

$keywords = Locale::getKeywords(Locale::getDefault());
echo 'Keywords: ';
print_r($keywords);
echo '<br>';

?>
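
Not every request carries an Accept-Language header (command-line runs and some bots send none), so a more defensive variant may be useful. The sketch below wraps the lookup in a hypothetical helper, pick_locale(); the helper name and the 'en_GB' fallback are my own assumptions, not part of the original script.

```php
<?php
//Hypothetical helper: choose a locale from an Accept-Language value,
//falling back to a default when the header is missing or unusable.
function pick_locale($header, $fallback = 'en_GB'){
    if(!is_string($header) || trim($header) === ''){
        return $fallback;
    }

    $locale = Locale::acceptFromHttp($header);

    //treat any non-string or empty result as "no match"
    return (is_string($locale) && $locale !== '') ? $locale : $fallback;
}

$header = isset($_SERVER['HTTP_ACCEPT_LANGUAGE']) ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : null;
$locale = pick_locale($header);
Locale::setDefault($locale);

//anything derived from a request header is escaped before going into HTML
echo 'Decided browser locale: ' . htmlspecialchars($locale, ENT_QUOTES, 'UTF-8') . '<br>';
```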