Mahipalsinh Rana
Member of Technical Staff
Sun Microsystems
Internationalization 360˚ Testing
Agenda
Introduction of I18n - 40 minutes
Internationalization (I18n) 360˚ testing - 60 minutes
Testing Standalone Applications - 15 minutes
Quiz - 10 minutes
Testing Web Applications - 15 minutes
Quiz - 10 minutes
I18n testing Automation - 30 minutes
Advanced I18n testing, References - 15 minutes
Q/A - 15 minutes
Introduction
Understanding of Internationalization (I18n)
Why I18n testing
Myths about I18n testing
Scope of I18n testing
Terminology in i18n technology
  Character set/Character repertoire
  Character Code/Code Point, Coded Character
  Encoding, Unicode, UTF-8, UTF-16, UTF-32
  Glyph, Fonts, Input Method Engine (IME)
  Locale
“Everyone has the right... to seek, receive and impart information and ideas through any media regardless of frontiers”
-- Universal Declaration of Human Rights
Why Globalization
Sun Portal server in Chinese
Why Globalization
Yahoo.com in Kannada
Why Globalization
“Visitors linger twice as long as they do at English-only URL's. Business users are 3 times more likely to buy when addressed in their language. Customer service costs drop when instructions are displayed in the user's native language.”
'Strategies for Global Sites', Donald DePalma, Forrester Research Inc.
Why Globalization
"One large IT company discovered that a significant percentage of inquiries were coming from South Korea - they created a Korean website and revenues rose by 8 percent."
'Global eCommerce', Donald J. Plumley, Bowne Global Solutions
What's with the acronyms?
Internationalization ====> i18n. How? There are 18 characters between i and n.
With that logic:
Localization ====> L10n
Globalization ====> G11n
Translation ====> T9n
and you can call me M5l ==> Mahipal.
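The abbreviation rule is mechanical, so it can be checked in a few lines. This is a minimal sketch in Python; the function name `numeronym` is made up for this illustration.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + number of letters in between + last letter."""
    if len(word) < 4:
        return word  # nothing worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for term in ("internationalization", "Localization", "Globalization", "Translation"):
    print(term, "->", numeronym(term))
```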
Don't they all look the same?
Localization
Internationalization
Globalization
Translation
How do they differ and relate?
Globalization encompasses i18n and l10n.
Internationalization enables localization.
An expert in i18n may not be an expert in l10n.
LISA* Definitions
Globalization (G11n)
“Globalization addresses the business issues associated with taking a product global. In the globalization of high-tech products this involves integrating localization throughout a company, after proper internationalization and product design, as well as marketing, sales, and support in the world market.”
Internationalization (I18n)
“Internationalization is the process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for re-design. Internationalization takes place at the level of program design and document development.”
Localization (L10n)
“Localization involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.”
*Localization Industry Standards Association
Why I18n testing?
I18n testing is required to enable product localization in multiple languages.
Removing barriers to localization
  Enabling Unicode
  Independence from UI strings in code
  Handling legacy character encodings
  Separating localizable elements from source
Enabling code to support local, regional, language, or culturally related preferences
Myths for I18n Testing
Misunderstood as translation testing
Only a language expert can perform i18n testing
Done after the product is released
Confused with product localization
It is only about string messages
Terminologies of I18n
What is Character set/Character repertoire?
What is Character Code/Code Point, Coded Character?
What is Unicode?
What is meant by Encoding?
  UTF-8, UTF-16, UTF-32
What is Glyph?
What is Font?
What is Input Method Engine (IME)?
What is Locale?
What is Character, Character Set?
A character is just an abstract minimal unit of text. It doesn't have a fixed shape (that would be a glyph), and it doesn't have a value.
"A" is a character, and so is "$", a currency symbol.
A character set/repertoire is a collection of characters.
Examples:
Making the World Wide Web world wide!
What is Character Code/Code Point, Coded Character Set?
Character Code - A mapping that defines a one-to-one correspondence between the characters in a character repertoire and a set of non-negative integers.
Examples of character codes:
  ASCII
  ISO Latin 1, alias ISO 8859-1
  ISO 10646
  The Windows character set, which exists in different variations or "code pages" (CP), e.g. Windows code page 1252
A code point is the unique non-negative integer assigned to a character in a character code.
A coded character set is a character set in which each character has been assigned a unique code point.
What is Character Code/Code Point, Coded Character Set ?
The ASCII character set, one of the earliest character sets
What is Character Code/Code Point, Coded Character Set ?
Ex. ASCII character set
ASCII is a 7-bit character set; its 8-bit extensions cover most of the characters needed by Europeans. But what about the eastern part of the world?
Answer is Unicode
It has characters from almost every written script in the world:
European alphabetic scripts: Latin, Greek, Cyrillic, Armenian, Georgian, Runic, Ogham, Modifier letters
Middle East scripts: Hebrew, Arabic, Syriac, Thaana
South & South East Asian scripts: Devanagari, Bengali, Gujarati, Panjabi, Oriya, Tamil, Telugu, Kannada, Malayalam
East Asian scripts: Han, Hiragana, Katakana, Hangul, Bopomofo, Yi
Symbols: Currency symbols, Letterlike symbols, Mathematical operators, Numeric forms, Technical symbols, Geometrical symbols
Additional scripts: Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Mongolian
What is Character Encoding ?
A mapping from a set of non-negative integers that are elements of a Coded Character Set to a set of sequences of code units of some specified width, such as 8-bit, 16-bit or 32-bit integers.
The most commonly used code units are bytes, but 16-bit or 32-bit integers can also be used for internal processing.
Examples are UTF-8,UTF-16,UTF-32
UTF-8, UTF-16, UTF-32
UTF-32 simply represents each Unicode code point as the 32-bit integer of the same value.
UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. [Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value. Supplementary characters are encoded in two code units.]
UTF-8 uses sequences of one to four bytes to encode Unicode code points. [U+0000 to U+007F are encoded in one byte, U+0080 to U+07FF in two bytes, U+0800 to U+FFFF in three bytes, and U+10000 to U+10FFFF in four bytes.]
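The ranges above can be verified by counting code units. This is a small sketch using Python's built-in codecs; the helper name `code_unit_counts` is an assumption of this example, and the big-endian codec variants are chosen only so that no byte order mark is added to the output.

```python
def code_unit_counts(ch: str) -> dict:
    """Count the code units one character needs in each Unicode encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units (bytes)
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }


# One sample code point from each of the four UTF-8 length ranges.
for code_point in (0x0041, 0x05D0, 0x597D, 0x1F600):
    print(f"U+{code_point:04X}", code_unit_counts(chr(code_point)))
```

The last sample is a supplementary character, so it is the only one that needs two UTF-16 code units.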
Relation between Character set and Encoding
Characters   A             א             好
Code Point   41            5D0           597D
UTF-8        41            D7 90         E5 A5 BD
UTF-16       00 41         05 D0         59 7D
UTF-32       00 00 00 41   00 00 05 D0   00 00 59 7D
Different encodings yield different byte sequences for the same character in a character set.
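The byte sequences for the three code points on this slide can be reproduced as follows. This is only a sketch; `hex_bytes` is a hypothetical helper, and the big-endian codecs are an assumption made so that the UTF-16 and UTF-32 output carries no byte order mark.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


# U+0041, U+05D0 and U+597D, the code points listed on the slide.
for ch in ("\u0041", "\u05d0", "\u597d"):
    print(
        f"U+{ord(ch):04X}",
        "| UTF-8:", hex_bytes(ch, "utf-8"),
        "| UTF-16:", hex_bytes(ch, "utf-16-be"),
        "| UTF-32:", hex_bytes(ch, "utf-32-be"),
    )
```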
Unicode Character set, code set, encodings
Universal character set/repertoire: All character sets will be a subset of this huge character repertoire, e.g. the ASCII set, French, Japanese, Korean, Devanagari.
Unicode code points: Each Unicode character is assigned a Unicode code point. The range is U+0000 to U+10FFFF.
UTF encodings: UTF-8, UTF-16 and UTF-32 are the encoding formats for internal processing.
What is Glyph?
A glyph is a visual appearance.
It is important to distinguish the character concept from the glyph concept. A glyph is a presentation of a particular shape which a character may have when rendered or displayed.
Example: a letter and different glyphs for it: Latin capital letter Z (U+005A)
Z Z Z Z
What is Font?
A repertoire of glyphs comprises a font.
A font is a numbered set of glyphs. The numbers correspond to the code positions of the characters (presented by the glyphs).
A font that includes the characters of a language should be available for an application to display text in that language.
What is Input Method Engine(IME)
Input methods capture a sequence of keystrokes and form a character or characters as input for languages
Input Method Engine (IME) is a program or operating system component that allows computer users to enter complex characters and symbols using a standard Western keyboard. It is also referred to as Input Method Environment.
What is Locale?
A locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface.
The locale naming convention is usually: language[_territory][.encoding][@modifier]
Example for Hindi with UTF-8 encoding: hi_IN.UTF-8
The encoding can be a native encoding (ISO 8859-*, Shift_JIS, GB18030, Big5, ISO 2022) or a Unicode encoding (UTF-8, UTF-16).
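A test harness often needs to split such a locale name into its parts. The sketch below follows the language[_territory][.encoding][@modifier] convention shown above; the `LocaleName` type, the `parse_locale` function and the exact character classes in the pattern are assumptions of this example, not a standard API.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    encoding: Optional[str]
    modifier: Optional[str]


# language[_territory][.encoding][@modifier]
_LOCALE_PATTERN = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<encoding>[A-Za-z0-9_-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def parse_locale(name: str) -> LocaleName:
    """Split a locale name into its four parts; absent parts are None."""
    match = _LOCALE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognizable locale name: {name!r}")
    return LocaleName(**match.groupdict())


print(parse_locale("hi_IN.UTF-8"))
print(parse_locale("de_DE@euro"))
```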
What is Locale?
Behavior affected by locale
Language culture data
  Sorting, searching, text boundary, text conversion
  Indexing
Country culture data
  Calendar, date/time/number/currency format
  People name/mailing address layout
I18n 360˚ Testing Approach
What is Traditional Approach
What is 360˚ Approach
Case Study
Requirement Phase
Design Phase
Implementation Phase
QA Phase
Documentation
Traditional Approach of I18n Testing
Generally starts after a build is released by the development team. In some cases it starts even after the product release, because a separate international release is shipped.
Major focus on functionality testing
I18n testing is done on the following:
  Messages
  Date/Calendar
  Sorting/Searching
Traditional Approach of I18n Testing
Major architectural flaws related to i18n get caught quite late
Support for adding a new language is not provided
Does not consider global cultural requirements
All the issues are reported in the QA phase, when they take longer to fix
Documentation does not cover usage in a non-English environment
What is 360˚ Approach
Starts as early as product planning
I18n has a role to play in each phase of the product life cycle
No corner untouched
Helps to design and build better products for global customers
Case Study – Railway Reservation System
Use cases
  Search Trains/Fare
  Make Reservation
  Know Passenger status
  Know Train Schedule
  User management
We will use this case study to illustrate key points in this workshop
Requirements phase
What are the global market requirements?
Languages and regions to be supported
Which date format will be supported?
Which calendar will be supported?
  Gregorian, Vikram Samvat, Lunar etc.
Which cultural requirements need to be taken care of?
Case Study - Requirements phase
Who are the customers of this website?
What languages will our target customers use?
Which payment methods should be made available?
Should we display information visually (ex. seat availability) or textually?
What kind of Internet access/computers will our target customers use? How will that affect l10n?
Should email confirmations/alerts be sent in the local language?
Design phase
What approach will be used to support multiple languages?
  Browser based and/or command line based
  List of languages on website
User interface design
  How will I18n of UI messages be done?
  How will I18n of UI components be done?
    Button and drop-down box sizes must accommodate multibyte values
  Review of images for cultural sensitivity
Ex. The sentence "Every <number> days" contains a variable part that comes from a text field. The design engineer should therefore externalize the whole sentence as a single string, not as 3 strings concatenated programmatically.
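The "Every <number> days" point can be illustrated with a short sketch. The bundle contents, the key name `repeat.interval` and the function `repeat_label` are hypothetical, and the German and Japanese strings are illustrative rather than reviewed translations. What matters is that each locale stores one complete sentence with a placeholder, so the translator decides where the number goes.

```python
# Hypothetical resource bundles: one whole sentence per key and per locale.
MESSAGES = {
    "en": {"repeat.interval": "Every {count} days"},
    "de": {"repeat.interval": "Alle {count} Tage"},
    "ja": {"repeat.interval": "{count}\u65e5\u3054\u3068"},  # number comes first
}


def repeat_label(locale_name: str, count: int) -> str:
    """Look up the externalized sentence and fill in the variable part."""
    return MESSAGES[locale_name]["repeat.interval"].format(count=count)


# Anti-pattern for comparison: "Every " + str(count) + " days" fixes the
# English word order in code and leaves translators three fragments.
for name in MESSAGES:
    print(name, "->", repeat_label(name, 3))
```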
Design phase
I18n compliant product architecture
Consideration of encoding, charset, Bi-Di
Standard I18n mechanism or customized i18n solution for each technology used in the product
  ex. Java I18n, JSP I18n, AJAX I18n, JRuby I18n
How will locale fall-back be handled?
How will I18n of date, calendar, sorting and searching techniques be done?
How will I18n of error messages and system error messages be done?
How will I18n of log messages be done?
Input/output should handle multibyte characters
Which features do not need I18n?
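One common answer to the locale fall-back question is to try the most specific locale first and then drop parts until a message is found. The sketch below assumes underscore-separated locale names and English as the default; `fallback_chain`, `lookup` and the sample bundles are hypothetical, and a real product would normally rely on the fall-back rules of its i18n framework.

```python
def fallback_chain(locale_name: str, default: str = "en") -> list:
    """Return locale names from most to least specific, ending with the default."""
    parts = locale_name.split("_")
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def lookup(key: str, locale_name: str, bundles: dict) -> str:
    """Return the first message found while walking the fall-back chain."""
    for candidate in fallback_chain(locale_name):
        bundle = bundles.get(candidate, {})
        if key in bundle:
            return bundle[key]
    raise KeyError(key)


# Hypothetical bundles: the Hindi bundle is only partially translated.
BUNDLES = {
    "en": {"schedule.title": "Train Schedule", "fare.label": "Fare"},
    "hi": {"schedule.title": "<Hindi text for Train Schedule>"},
}

print(fallback_chain("hi_IN"))
print(lookup("fare.label", "hi_IN", BUNDLES))  # falls back to the English entry
```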
Case Study - Design phase
How can the customer change the language?
How will messages on the website be shown in the customer's local language?
How will different encodings be handled?
How will UI components on the website handle messages in different languages?
How can a user register in a local language?
How is I18n of the various technologies done?
Implementation phase – Interaction with Development Team
Setting up common conventions
  Naming convention for localizable files
  Directory to store localizable files
  How to mark non-localizable text in a property file or HTML file
Educating developers about i18n best practices
  Most technologies have a standard way of doing i18n
  Defining a customized i18n solution for technologies which do not have a standard one
Implementation phase – Code review
Best way to find the most common i18n issues early
Should be done to catch the following:
  Message externalization
  Date, calendar I18n
  Encoding handling, HTTP content header
  Searching technique i18n
  Sorting technique i18n
  Input fields should have clear hints about which characters are allowed
I18n implementation should be common across modules for the same technology
  ex. Java I18n should be done in the same way across modules
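For the "HTTP content header" item, a reviewer or a unit test can check which charset a response declares. The sketch below reuses the header parser from Python's standard `email` package; the helper names `declared_charset` and `declares_expected_encoding`, and the choice of UTF-8 as the expected value, are assumptions of this example.

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    header = Message()
    header["Content-Type"] = content_type
    return header.get_content_charset()  # lower-cased by the library


def declares_expected_encoding(content_type: str, expected: str = "utf-8") -> bool:
    """True only if the header names the expected encoding explicitly."""
    return declared_charset(content_type) == expected


print(declared_charset("text/html; charset=UTF-8"))
print(declares_expected_encoding("text/html"))  # no charset declared
```

A missing charset is worth flagging in review, because the browser then has to guess the encoding.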
Implementation phase – Unit testing
Incorporate i18n in developer-level testing
  Lets developers see for themselves if they broke i18n
  Helps prevent regressions
  Improves product quality tremendously
Case Study – Implementation phase
Find any hard-coded messages in the code
Check the encoding declared in HTML or JSP pages
Verify how dates are displayed
Check whether button and drop-down sizes are sufficient for localized characters
Include i18n test cases in developer testing
QA phase – I18n Test case writing
I18n test plan
  Which build to start i18n testing with
  How much testing is required
  Which areas need more i18n focus and which need less
I18n test case writing and review
  Review base team test cases for functionality coverage
  Test cases should capture the flow of multibyte data in the product
  Test cases should cover culture-specific issues
    Date format changes in various languages
  Include negative test cases for i18n
    Fields which do not accept multibyte data
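A negative i18n test case feeds multibyte data to a field that must reject it. The sketch below assumes a hypothetical booking reference of six ASCII capital letters or digits; the field, its rule and the function name are invented for illustration. It also shows a pitfall worth a test case of its own: in Python, `\d` matches digits from any script, so an ASCII-only field should spell out its character ranges.

```python
import re

# Hypothetical rule: exactly six ASCII capital letters or digits.
_BOOKING_REFERENCE = re.compile(r"[A-Z0-9]{6}")


def is_valid_booking_reference(value: str) -> bool:
    """Accept only ASCII input; full-width and Indic look-alikes are rejected."""
    return _BOOKING_REFERENCE.fullmatch(value) is not None


FULLWIDTH = "\uff21\uff22\uff11\uff12\uff13\uff14"          # full-width "AB1234"
DEVANAGARI_DIGITS = "\u0967\u0968\u0969\u096a\u096b\u096c"  # Devanagari "123456"

print(is_valid_booking_reference("AB1234"))
print(is_valid_booking_reference(FULLWIDTH))
print(is_valid_booking_reference(DEVANAGARI_DIGITS))
# Pitfall: a pattern written with \d would have accepted the Devanagari digits.
print(re.fullmatch(r"\d{6}", DEVANAGARI_DIGITS) is not None)
```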
QA phase – Configuration Matrix
Configuration matrix for i18n testing
  Which locales to be tested
  Which encodings to be tested
  Which platforms to be tested
    Install OS with l10n support
  Which features to be tested
    Hint: test features which the base testing team has already tested
QA phase – Cultural Differences
Language- and culture-specific representation of data
ex. name and address formats are specific to language

Format                       Examples
town, province postalcode    China, India
postalcode town-province     Brazil
postalcode town, province    México
town province postalcode     USA, Canada, Australia
Symbolism can differ from place to place. For example, the check mark means incorrect in some places around the world. Ensure that you do not give the wrong message through your use of colors, symbolism, examples, etc.
Be cautious with humour. It doesn't travel well.
When dealing with graphics, consider how to deal with text. Ideally the text will be overlaid on a graphic, rather than embedded in it. If the text is within the graphic, try to ensure that you develop it in layers, with text on a separate layer, so that when it comes to translation the text can be easily removed and replaced over complicated backgrounds.
Ensure that examples used in text are understandable by the audience of the translated version.
QA phase – Cultural Differences
Fast relief, when you need it most!
Color also has different connotations in different parts of the world.
For example, a black wedding kimono is not as strange in Japan as it may seem to a European.
QA phase – Cultural Differences
QA phase – Cultural Differences
Culture-specific order
QA phase – Human Interface
Input
  Entering data in different languages - is one keystroke equal to one character for non-English languages?
  The application should parse multibyte input data and process it accordingly
  The operating system allows entering data in various languages
  The application can also provide a built-in feature, ex. Orkut
QA phase – Human Interface
Output
  Displaying data in different languages - what you enter, what is stored in memory and what gets displayed: is this all a one-to-one mapping?
  It becomes complex and includes many-to-one mappings
  Text rendering, reordering and layout of strings become complex
  One character will not be equal to one glyph
  Example: languages like Hindi have Complex Text Layout (CTL), which can use a number of glyphs to form a single character
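The gap between what is stored and what is displayed is easy to see with a Hindi conjunct syllable. The sketch below uses Python's `unicodedata` module; the sample syllable is chosen for this illustration. A reader sees one unit on screen, while the string holds four code points.

```python
import unicodedata

# Devanagari KA + VIRAMA + SSA + VOWEL SIGN I, rendered as one visual unit.
syllable = "\u0915\u094d\u0937\u093f"

for ch in syllable:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print("code points:", len(syllable))
print("UTF-8 bytes:", len(syllable.encode("utf-8")))
```

Test cases for field length limits, cursor movement and truncation should therefore use such sequences, not only single-code-point characters.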
QA phase – Text Processing
What are the considerations when you have to process text in different languages?
Text boundary - character/word/sentence/line boundary
  Chinese and Japanese do not have spaces between words
  A CTL character may consist of multiple code points
Text input/output, encoding conversion
  Text transferred between applications or external files should have a consistent encoding, else encoding conversion is involved
Text layout and direction, vertical and BiDi
  Some Asian countries still use a vertical writing system
  Arabic and Hebrew use a bidirectional writing system
Text sorting and searching
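The encoding conversion point can be sketched with Python's built-in codecs. The scenario (a legacy system sending Shift_JIS while the application expects UTF-8) and the helper names are assumptions of this example.

```python
def convert_encoding(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)


def decodes_as(data: bytes, encoding: str) -> bool:
    """True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


text = "\u65e5\u672c\u8a9e"           # three Japanese characters
legacy = text.encode("shift_jis")      # bytes as a legacy system might send them
converted = convert_encoding(legacy, "shift_jis")

print(len(legacy), "bytes in Shift_JIS,", len(converted), "bytes in UTF-8")
print("legacy bytes readable as UTF-8:", decodes_as(legacy, "utf-8"))
```

Skipping the conversion step is what produces garbled text or decoding errors, so the declared encoding on both sides of each interface is a natural test target.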
QA phase – Presentation Matters
Vertical text should be displayed correctly for the languages that use it
Text proceeds downwards syllable by syllable, not letter by letter.
QA phase – Presentation Matters
Right-to-left layout
BBC site in a left-to-right and a right-to-left language.
QA phase – Format
Formatting of data differs across languages and regions
Date/time formats, calendar
  Date/time formats are different across languages and countries
  Some countries use a local calendar as their official calendar
Number/currency format
  Show numbers in the format of the language the user prefers
  Such numbers should be parsed by a number parser for the user's preferred language
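The sketch below shows why the same date and number produce different strings per locale, and why parsing has to use the same conventions as formatting. The `FORMATS` table is a small illustrative assumption covering three locales; a real product should take these conventions from its platform's locale data rather than hard-code them.

```python
from datetime import date

# Illustrative conventions only, not a complete or authoritative locale database.
FORMATS = {
    "en_US": {"date": "%m/%d/%Y", "group": ",", "decimal": "."},
    "en_GB": {"date": "%d/%m/%Y", "group": ",", "decimal": "."},
    "de_DE": {"date": "%d.%m.%Y", "group": ".", "decimal": ","},
}


def format_date(value: date, locale_name: str) -> str:
    return value.strftime(FORMATS[locale_name]["date"])


def format_number(value: float, locale_name: str) -> str:
    """Format with two decimals using the locale's separators."""
    conv = FORMATS[locale_name]
    english = f"{value:,.2f}"  # e.g. 1,234,567.89
    # Swap separators through a placeholder so they do not overwrite each other.
    return (english.replace(",", "\0")
                   .replace(".", conv["decimal"])
                   .replace("\0", conv["group"]))


def parse_number(text: str, locale_name: str) -> float:
    """Parse a number written with the locale's separators."""
    conv = FORMATS[locale_name]
    return float(text.replace(conv["group"], "").replace(conv["decimal"], "."))


day = date(2009, 3, 4)
for name in FORMATS:
    print(name, format_date(day, name), format_number(1234567.891, name))
```

The string 03/04/2009 means 4 March in one locale and 3 April in another, which is exactly the kind of ambiguity a date test case should cover.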
QA phase – Message
What are the considerations when dealing with messages in your application?
Externalizing UI messages and error messages from the program to resource files for localization
  Categorise static content (help files/docs) into language-specific directories
Message formatting
  When a message contains more than one placeholder, consider that the translated message may re-order the placeholders
Message encoding
  Messages should be encoded in the encoding that the application expects
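Placeholder re-ordering is easiest to handle when placeholders are named rather than positional. In the sketch below the message key, the `booking_message` function and both templates are hypothetical, and the German sentence is illustrative rather than a reviewed translation.

```python
# Hypothetical message templates with named placeholders.
BOOKING_CONFIRMED = {
    "en": "{passenger} has booked {seats} seats on train {train}.",
    "de": "Im Zug {train} wurden {seats} Pl\u00e4tze f\u00fcr {passenger} gebucht.",
}


def booking_message(locale_name: str, passenger: str, seats: int, train: str) -> str:
    """Fill the template; the placeholder order is decided by the translation."""
    template = BOOKING_CONFIRMED[locale_name]
    return template.format(passenger=passenger, seats=seats, train=train)


for name in BOOKING_CONFIRMED:
    print(booking_message(name, passenger="Asha", seats=2, train="12951"))
```

The calling code passes the same arguments for every locale, so a translator can move the train number to the front without any code change.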
QA phase – Pseudo localization
Can be used when the product is yet to be localized
Create a localized resource bundle by adding a localized character at the beginning and end of each English message
Effective way of finding hard-coded strings
ex. English resource bundle MyMessages.properties
welcome=Welcome to I18n World
startProcess=Start the process
Create the resource bundle for Hindi as follows: MyMessages_hi.properties
welcome= Welcome to I18n World
startProcess= Start the process
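Generating the pseudo-localized bundle can be automated. The sketch below is a simplified take on the idea: it handles only plain key=value lines, and the marker characters (two Devanagari letters), the function names and the hard-coded-string check are assumptions of this example, not the workshop's actual tooling.

```python
def pseudo_localize(properties_text: str,
                    prefix: str = "\u0905", suffix: str = "\u0915") -> str:
    """Wrap every message value in marker characters; keep comments and blank lines."""
    output = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", "!")) or "=" not in line:
            output.append(line)
            continue
        key, value = line.split("=", 1)
        output.append(f"{key}={prefix}{value.strip()}{suffix}")
    return "\n".join(output)


def looks_hard_coded(displayed: str,
                     prefix: str = "\u0905", suffix: str = "\u0915") -> bool:
    """A UI string without the markers probably bypassed the resource bundle."""
    return not (displayed.startswith(prefix) and displayed.endswith(suffix))


english = "welcome=Welcome to I18n World\nstartProcess=Start the process"
print(pseudo_localize(english))
```

When the product runs with the pseudo bundle, any string on screen that lacks the markers is a candidate hard-coded string.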
Case Study - QA phase
Access the website in a non-English language
Register as a non-English user
Book a ticket for a non-English passenger
Verify that the site displays non-English characters correctly
Ensure the website provides correct responses to non-English input
Check whether the website comes up in the user's language
Documentation
How to install the product in a non-English environment
How to configure features in a non-English environment
How to add a new language to the product
Verify that I18n-specific hints and processes are documented correctly
Hints to translators regarding culture-specific images in documentation
Case Study
Testing Standalone applications
Setting up a localized environment
Operating system with l10n support
Starting the product in a non-English environment
  Language selection
Application testing with multibyte data
Quiz
Testing Web applications
Setting up a localized environment
Operating system with l10n support
Starting the product in a non-English environment
  Browser preferred language
  Content negotiation
  Precedence of language (user preferred locale, browser preferred locale, platform locale)
Application testing with multibyte data
Quiz
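Content negotiation and language precedence can be sketched as follows. The order used here (user preferred locale, then the browser's Accept-Language list, then the platform locale) follows the order listed above and is an assumption about the intended precedence; the function names and the simplified q-value handling are also assumptions of this example.

```python
def parse_accept_language(header: str) -> list:
    """Return language tags from an Accept-Language value, best quality first."""
    preferences = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        preferences.append((tag.strip(), quality))
    preferences.sort(key=lambda pair: pair[1], reverse=True)  # stable sort
    return [tag for tag, quality in preferences if quality > 0]


def choose_language(supported, user_preference=None,
                    accept_language="", platform_locale="en"):
    """Pick the first supported language in order of precedence."""
    candidates = []
    if user_preference:
        candidates.append(user_preference)
    candidates.extend(parse_accept_language(accept_language))
    candidates.append(platform_locale)
    for tag in candidates:
        for option in (tag, tag.split("-")[0]):  # try "kn-IN", then "kn"
            if option in supported:
                return option
    return platform_locale


print(choose_language({"en", "hi", "kn"}, accept_language="kn-IN,en;q=0.5"))
```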
Automation Testing
Automation framework
  The automation tool should support multibyte data
  Leverage from the core testing team
Scope of testing to be automated
  Regression testing
Demo
Advanced I18n testing
Speech based
  Higher recognition accuracy can be obtained by tailoring voice input to regional dialects
  Voice output in the wrong dialect can make an application sound ‘foreign’
  Applications supported with regional dialects have better impact
Indic, Bi-Di specific issues
Titles and names
Different ways of expressing currency
Presentation/styling issues
Calendars - Vikram Samvat/Saka/Hijri/Islamic
Advanced I18n testing – Internationalized Domain Name (IDN)
There is a lot of demand for non-ASCII domain names
http://räksmörgås.josefsson.org/mål/franzén.html
  domain name: räksmörgås.josefsson.org
  path: /mål/franzén.html
New standards have come out of the IETF recently that make this possible. W3C personnel contributed to the development of these standards. There are still some hurdles to overcome with regard to security and deployment, but it is possible to use these now. For more information see http://www.w3.org/International/articles/idn-and-iri/ .
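The two parts of such an address are converted to ASCII in different ways: the domain name goes through the IDNA "xn--" form, and the path is percent-encoded as UTF-8. This is a minimal sketch using Python's built-in `idna` codec (which implements the older IDNA 2003 rules) and `urllib.parse.quote`; the helper name `to_ascii_url` is an assumption of this example.

```python
from urllib.parse import quote


def to_ascii_url(scheme: str, host: str, path: str) -> str:
    """Build an ASCII-only URL from a non-ASCII host name and path."""
    ascii_host = host.encode("idna").decode("ascii")  # labels become xn--...
    ascii_path = quote(path, safe="/")                # UTF-8 percent-encoding
    return f"{scheme}://{ascii_host}{ascii_path}"


url = to_ascii_url("http",
                   "r\u00e4ksm\u00f6rg\u00e5s.josefsson.org",
                   "/m\u00e5l/franz\u00e9n.html")
print(url)
```

Test cases can then check that the application accepts both the native and the ASCII form and treats them as the same address.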
References
W3C Internationalization: http://www.w3.org/International/
Sun Software Globalization: http://developers.sun.com/techtopics/global/
Software Globalization - Architecture, Design, Testing: http://developers.sun.com/techtopics/global/technology/arch/
Software Globalization - JES: http://developers.sun.com/techtopics/global/products_platforms/jes/
Sun Software Product Internationalization Taxonomy: http://developers.sun.com/dev/gadc/des_dev/i18ntaxonomy
Subscribe to Software Globalization Newsletter: http://developers.sun.com/dev/gadc/subscribe/index.html
Technical articles on Java Internationalization: http://java.sun.com/developer/technicalArticles/Intl/
Java Internationalization Tutorial: http://java.sun.com/docs/books/tutorial/i18n/index.html
The Java Tutorial's Weblog: http://blogs.sun.com/thejavatutorials/
Last but not the least!
“Maintain that rapport with your Development team.”
Q/A