ARCH-12: Broaden Your Potential Customer Base Using Unicode
David Lund, Sr. Training Program Manager, Progress
2 © 2005 Progress Software Corporation ARCH-12, Unicode
Broaden Your Potential Customer Base Using Unicode
Unicode is the best way to support multiple languages
A number of recent OpenEdge™ enhancements facilitate Unicode
OpenEdge tools simplify the task
Agenda - Implementing Unicode
Essentials
Migrating a Database
Unicode Client
Sorting
Normalization
Other Areas to Consider
Unicode Essentials
Unicode is the foundation for creating internationalized and localized applications
Unicode provides a unique number for every character
Lossless round tripping
– Mapping any Unicode coded character sequence S to a sequence of bytes and back produces S again
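The round-trip property can be checked outside OpenEdge with a few lines of Python. This is only an illustrative sketch: the sample string and the helper name `round_trips` are made up for the example, and Python's `cp850` codec stands in for a legacy single-byte code page that cannot hold every character.

```python
# Illustrative sample text (hypothetical): Latin letters with accents plus CJK.
SAMPLE = "\u00e9cole, \u00e7edilla, \u65e5\u672c\u8a9e"

def round_trips(text, encoding):
    """Return True if text survives encode -> decode unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for some character.
        return False

for encoding in ("utf-8", "utf-16", "utf-32", "cp850"):
    print(encoding, round_trips(SAMPLE, encoding))
```

The three UTF encodings round-trip the whole string, while the legacy code page fails as soon as a character falls outside its table.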
Unicode Essentials
UTF (Unicode Transformation Format)
– Algorithm for mapping (encoding) each Unicode scalar value to a unique sequence
– Three formats (mappings): UTF-8, UTF-16, UTF-32
– Formats vary in how they handle the mapping
– The choice impacts access, storage, and performance
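To make the storage impact concrete, the following Python sketch counts the bytes one character occupies in each format. The helper name `encoded_sizes` and the sample characters are assumptions chosen for illustration; the little-endian codec names are used only so that no byte order mark is counted.

```python
def encoded_sizes(ch):
    """Return the byte length of one character in each UTF format."""
    # The "-le" variants avoid counting a byte order mark.
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(ch.encode(name)) for name in encodings}

# ASCII letter, accented Latin letter, CJK ideograph, emoji (outside the BMP).
for ch in ("A", "\u00e9", "\u65e5", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 is compact for ASCII but grows to three or four bytes for other scripts, UTF-16 needs a surrogate pair outside the Basic Multilingual Plane, and UTF-32 is always four bytes.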
Unicode Essentials
Code page = table that assigns a numeric value to each character
– Letters, numbers, punctuation, control codes, etc.
– prolang\list-cp.p lists the code pages in convmap
Sample code page: IBM850 (partial; column = high hex digit, row = low hex digit)
– Character ‘2’ is hex 32
      20       30   40
  0   (space)  0    @
  1   !        1    A
  2   "        2    B
  3   #        3    C
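A code page lookup can be reproduced in Python as a rough illustration. The sketch assumes that Python's `cp850` codec corresponds to the IBM850 code page named on the slide; the helper name `value_in_code_page` is hypothetical.

```python
def value_in_code_page(ch, code_page="cp850"):
    """Return the byte value of ch in the given code page, as a hex string."""
    # Assumption: Python's "cp850" codec matches the IBM850 code page.
    return ch.encode(code_page).hex()

print(value_in_code_page("2"))       # digit from the ASCII range
print(value_in_code_page("@"))       # punctuation from the ASCII range
print(value_in_code_page("\u00e9"))  # accented letter above the ASCII range
```

Characters in the ASCII range have the same value in most code pages, while accented letters are where code pages diverge.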
Progress I18N (Internationalization) Essentials
Undefined code page
– Tells Progress not to do any conversions when reading or writing data
– For example, the Sports database uses undefined
– Can be used with any character set
Progress I18N (Internationalization) Essentials
Startup parameters
– -cpinternal: code page used for internal data processing
– -cpstream: code page used for stream files
– Parameter file: prolang\UTF\UTF-8.pf
-cpinternal utf-8
-cpstream utf-8
Progress I18N Essentials
Performing code page conversions
– Progress provides a character set management facility
– Automatically converts data between the code pages of different data sources and targets
– Code pages must be in the CONVMAP file
Targets for code page conversion
– Memory (-cpinternal)
– Streams (-cpstream)
– Databases
Progress I18N Essentials
Referenced code pages must be in CONVMAP
Modifying CONVMAP
– Edit convmap.dat
– Compile CONVMAP:
proutil <dbname> -C CODEPAGE-COMPILER convmap.dat convmap.cp
– Make convmap.cp available to the session through one of:
  the Progress installation directory
  the PROCONV environment variable
  the -convmap startup parameter
Progress I18N Essentials
Converting characters or strings in memory
– Specify the code page in functions: ASC, CHR, CODEPAGE-CONVERT
Converting input and output data
– Specify the code page in statements:
  INPUT FROM (input source to memory target)
  OUTPUT TO (memory source to output target)
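The idea behind a source-to-target conversion can be sketched in Python. This is not the OpenEdge facility, only an analogy: the function name `convert_code_page` is hypothetical, and Python's `cp850` codec is assumed to stand in for IBM850.

```python
def convert_code_page(data: bytes, source_cp: str, target_cp: str) -> bytes:
    """Decode bytes from source_cp, then re-encode them in target_cp."""
    return data.decode(source_cp).encode(target_cp)

# "école" as it would be stored in a single-byte code page (assumed cp850).
ibm850_bytes = "\u00e9cole".encode("cp850")
utf8_bytes = convert_code_page(ibm850_bytes, "cp850", "utf-8")

print(ibm850_bytes.hex(), "->", utf8_bytes.hex())
```

The same text takes five bytes in the single-byte code page and six in UTF-8, which is why the source and target code pages must both be known for the conversion to be correct.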
Fonts for Unicode
Locating fonts on Windows
– C:\WINDOWS\Fonts
– Control Panel, select the Fonts icon
Unicode fonts may need to be purchased
Setting Unicode fonts for Progress
– Progress.ini
– Use ini2reg.exe to place the settings in the registry
Migrating a Database to Unicode
Two ways to migrate a database to Unicode
– Dump and load
– Convert the database without doing a dump and load
Start an OpenEdge session
– Use startup parameters:
-cpinternal UTF-8 -cpstream UTF-8
Migrating a Database to Unicode: using dump and load (1 of 3)
Cautions:
Back up your database
Dump definitions and data
– Do not do a binary dump and load: binary data is not converted to the code page of the database when it is loaded
– Always use the Data Admin tool, which goes through automatic conversion
Migrating a Database to Unicode: using dump and load (2 of 3)
Create an empty UTF-8 database
– Data Administration tool: Database > Create Database
– In the Create Database dialog:
  Select the radio-set option to create a copy of some other database
  Select an empty database from prolang/UTF-8 (for example, empty4.db)
Migrating a Database to Unicode: using dump and load (3 of 3)
Load the definitions
– The load converts them to UTF-8 automatically
Load the data
– Data is automatically converted to UTF-8 from the dumped code page when it is loaded
Migrating a Database to Unicode: converting without a dump and load
Back up your database
Use proutil to convert the database:
proutil <db-name> -C convchar convert UTF-8
Load the UTF-8 collation table
– prolang/UTF/_tran.df
Assign word-break rules to the database
Rebuild the indexes:
proutil <db-name> -C idxbuild
Benefits of GUI Unicode Client
Added in the OpenEdge 10.0A release
Multilingual
– Able to use data from multiple languages in the same session
Fully enables AppBuilder to build multilingual UTF-8 applications
Easier deployment
– Lower costs, higher ROI
– No need for different configurations with specific settings per language
Increased competitive advantage
– No changes (or very few) required for existing applications to take advantage of the GUI Unicode client
Unicode Editor
RichEdit editor in OpenEdge 10
– Supports Unicode
Selecting an editor
– Modify UseSourceEditor in progress.ini
– Default (SlickEdit): UseSourceEditor=yes
– For Unicode (RichEdit): UseSourceEditor=no
Demonstration: GUI Unicode client, multiple languages
Linguistic Sorting: the goal
Language-sensitive collations
– Tailored to the expectations of the locale (language, country)
Easy to use
– Functions just like any other collation for 4GL
Unicode Sorting
OpenEdge 10.0A supports binary sorting
– Basic collation support
– Sorts by value in the code page
– User-defined sorting is possible
OpenEdge 10.0B also supports linguistic sorting
– Supports ICU (International Components for Unicode) collations
OpenEdge does not support multiple collations in the database
Binary versus Linguistic Sorting: A Visual
Binary Sort    Linguistic Sort, English (ICU-en)
beet           beet
carrot         carrot
entry          çedilla
trust          école
zoom           entry
école          trust
çedilla        zoom
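The difference between the two orderings can be approximated in Python. The sketch below is not ICU and not the OpenEdge collation engine: `accent_folded_key` is a hypothetical helper that strips accents before comparing, which is only a crude stand-in for a real linguistic collation. In the binary sort the accented words fall after zoom; their order relative to each other depends on the numeric values the encoding assigns, so it may differ from the slide.

```python
import unicodedata

WORDS = ["beet", "carrot", "\u00e7edilla", "entry", "\u00e9cole", "zoom", "trust"]

def accent_folded_key(word):
    """Sort key: compare base letters first, then the original spelling."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks so that "é" compares like "e" and "ç" like "c".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)

binary_order = sorted(WORDS)                             # by code point value
linguistic_order = sorted(WORDS, key=accent_folded_key)  # approximation only

print(binary_order)
print(linguistic_order)
```

A production system would rely on a real collation library rather than accent stripping, since languages differ in how they rank accents, case, and special letters.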
Linguistic Sorting
Progress uses collations for:
– -cpcoll session startup parameter
– Database collation
– Collation of a database CLOB column
– Argument to the COMPARE function
– COLLATE option of the BY phrase
Linguistic Sorting: Supported Collations
OpenEdge supports all ICU collations in the icui18n library
– Beyond icui18n, one additional collation is supported: Japanese Hiragana Quaternary as case-sensitive
Linguistic Sorting
4GL usage
– Reference the collation by name, for example “ICU-fr” for French
Specify using
– -cpcoll <table name>
  Identifies the collation table to use with the code page in memory at session startup
  <table name> is the collation table in convmap.cp or the name of the ICU collation
– 4GL statements: COMPARE, COLLATE
Linguistic Sorting
Sort order depends on the selected collation
/* French collation */
DISPLAY "ICU-fr = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-fr")).
/* Spanish collation */
DISPLAY "ICU-es = " + STRING(COMPARE("côte", "<", "coté", "case-insensitive", "ICU-es")).
Output of the above statements:
ICU-fr = yes
ICU-es = no
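One plausible reason the two collations disagree is that French collation has traditionally compared accents from the end of the word backwards, while most other languages compare them from the start. The Python sketch below is a toy model of that rule, not ICU itself: `split_accents` and `less_than` are hypothetical helpers, accent weights are simply the code points of the combining marks, and real ICU behavior may differ by version and locale.

```python
import unicodedata

def split_accents(word):
    """Return (base letters, list of accent weights, one per base letter)."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the combining mark to the preceding base letter.
            if accents:
                accents[-1] = ord(ch)
        else:
            bases.append(ch.casefold())
            accents.append(0)
    return "".join(bases), accents

def less_than(a, b, backwards_accents):
    """Compare base letters first; break ties on accents, optionally reversed."""
    base_a, acc_a = split_accents(a)
    base_b, acc_b = split_accents(b)
    if base_a != base_b:
        return base_a < base_b
    if backwards_accents:
        acc_a, acc_b = acc_a[::-1], acc_b[::-1]
    return acc_a < acc_b

print("backwards accents:", less_than("c\u00f4te", "cot\u00e9", True))
print("forwards accents: ", less_than("c\u00f4te", "cot\u00e9", False))
```

With accents compared backwards, the final é of "coté" outweighs the earlier ô of "côte", so "côte" sorts first; compared forwards, the ô is met first and the result flips.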
Linguistic Sorting: Examples (1 of 4)
– UTF-8 database with “basic” collation
– Names: beet, carrot, çedilla, entry, école, zoom, trust
FOR EACH words WHERE name < "t":
    DISPLAY name.
END.
Output result:
beet
carrot
entry
Linguistic Sorting: Examples (2 of 4)
FOR EACH words WHERE name >= "t":
    DISPLAY name.
END.
Output result:
trust
zoom
école
çedilla
Linguistic Sorting: Examples (3 of 4)
FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en"):
    DISPLAY name.
END.
Output result:
beet
carrot
entry
école
çedilla
Linguistic Sorting: Examples (4 of 4)
FOR EACH words WHERE COMPARE(name, "<", "t", "case-insensitive", "ICU-en")
    BY COLLATE(name, "case-insensitive", "ICU-en"):
    DISPLAY name.
END.
Output result:
beet
carrot
çedilla
école
entry
Unicode Normalization: why is this needed?
– Puts text in the “NFC” form expected by XML (and other W3C entities)
– Best way to convert from Unicode to other code pages
– Useful for tasks such as making comparisons
Unicode Normalization: what is normalization?
Unicode has different ways of expressing the same character
– Base letter plus combining mark (accent) as two Unicode code points
  Á = composite sequence (decomposed form): (U+0041, Latin Capital Letter A) + (U+0301, Combining Acute Accent ´)
– Base letter and accent as one Unicode code point
  Á = precomposed: (U+00C1, Latin Capital Letter A with Acute)
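The two representations of Á can be compared directly in Python with the standard `unicodedata` module. This is an illustration of the Unicode behavior, not of the OpenEdge implementation; the variable names are chosen for the example.

```python
import unicodedata

decomposed = "A\u0301"   # U+0041 + U+0301: two code points
precomposed = "\u00c1"   # U+00C1: one code point

# Without normalization the two strings are different code point sequences.
print(decomposed == precomposed)

# After normalizing both to the same form they compare equal.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```

This is why normalization matters for comparisons: two strings that look identical on screen can fail an equality test until they are brought to the same form.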
Unicode Normalization
NORMALIZE
– 4GL function new in OpenEdge 10.0B
– Returns either CHAR or LONGCHAR, matching the source string
  A CHAR variable must be UTF-8
  A LONGCHAR variable can be any form of Unicode: UTF-8, UTF-16, UTF-32
result-string = NORMALIZE(source-string, normalization-mode)
Normalization Modes Supported
NFD
– Canonical Decomposition
NFC
– Canonical Decomposition, followed by Canonical Composition
NFKD
– Compatibility Decomposition
NFKC
– Compatibility Decomposition, followed by Canonical Composition
None
– No change to source string
– Turns off normalization when normalization-mode is a variable
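The effect of each mode can be seen on a short string in Python. The wrapper below is a hypothetical analogue of a function taking a mode argument, including a "NONE" mode that returns the source unchanged; it uses Python's `unicodedata.normalize`, not the 4GL NORMALIZE function, and the sample string is made up for the example.

```python
import unicodedata

def normalize(source, mode):
    """Apply a Unicode normalization mode; "NONE" returns the source unchanged."""
    if mode.upper() == "NONE":
        return source
    return unicodedata.normalize(mode.upper(), source)

# U+FB01 (fi ligature, a compatibility character) followed by U+00E9 (é).
sample = "\ufb01\u00e9"

for mode in ("NONE", "NFC", "NFD", "NFKC", "NFKD"):
    result = normalize(sample, mode)
    print(mode, [f"U+{ord(ch):04X}" for ch in result])
```

The canonical forms (NFC, NFD) leave the ligature alone and only compose or decompose the accent, while the compatibility forms (NFKC, NFKD) also replace the ligature with the plain letters f and i.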
Bidi Support
Bi-directional (bidi)
– Behavior of individual widgets and/or the complete window to go from right to left or left to right
Supported
– Fill-in widget: can type right to left or left to right
Not supported
– Whole frame: cannot switch labels from left side to right side
GB18030 Code Page Support
Added in OpenEdge 10.0B
New Chinese code page
Required for all new software sold in mainland China as of Jan. 1, 2001
In summary: Broaden Your Potential Customer Base Using Unicode
Unicode is the best way to support multiple languages
A number of recent OpenEdge™ enhancements facilitate Unicode
OpenEdge tools simplify the task
Documentation
OpenEdge Development: Internationalizing Applications
Unicode Resources
Unicode home page
– http://www.unicode.org
– Unicode Standard, Unicode Consortium
International Components for Unicode
– http://www-124.ibm.com/icu/docs/
– http://www-124.ibm.com/icu/docs/papers/forms_of_unicode/
System Resources
Viewing keyboard layouts: http://www.microsoft.com/globaldev/reference/keyboards.aspx
– Select the language and the keyboard layout is displayed
– Use the Shift key to toggle between lowercase and uppercase characters
– Use MS Internet Explorer to display
Questions?
Thank you for your time!