© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Unicode in SAP NetWeaver
Sebastian Buhlinger, SAP Consultant, HP-SAP EMEA CC
3/31/2004 2
Agenda
1. Introduction to Unicode
2. Unicode & SAP in General
3. Technology in Depth
4. Sizing Information for Unicode-based SAP Systems
Introduction to Unicode
1. Introduction to Unicode
• What is text?
• History of character encoding
• Problem of character encoding
• From ASCII to Unicode
• What is Unicode exactly?
• The Unicode Standard
• Where is Unicode used?
• The Unicode Consortium
• Unicode Encodings
What is text?
• Code pages and encodings describe how text is handled and stored in
  • computers
  • files
  • data structures
• Inside a computer program or data file, text is stored as a sequence of numbers, just like “everything else”
• A character is a
  • letter,
  • digit,
  • period,
  • hyphen,
  • punctuation mark or
  • math symbol
• Furthermore, there are control characters, which are typically not visible
History of Character Encoding
• Historically, computers were fairly slow, had little memory and were very expensive
• Up to the 1960s, I/O meant punching holes into paper tape
• Most character sets date back to the punch-card age and were designed with these cards in mind
• In the early days of computing, every hardware manufacturer used proprietary technology (and encodings)
• International data interchange was not an issue, so nothing needed to fit together
Problem of character encoding
• Which number is assigned to which character?
• When you type an ‘A’ on the keyboard, the computer uses the character code to look up the glyph stored under the same binary number in a font file, and displays or prints it
• The character ‘A’ may also have different integer values in different programs or data files (the number used for ‘A’ might select an entirely different glyph in an Arabic font file)
• In some cases, no number is available for certain characters (e.g. “ä” → “Ä”)
• All data is encoded in the form of binary numerical codes
Character repertoire
• English alphabet with some digits and a little more: ~60 characters
• Western European standard: ~300 characters for several languages
• Korean: ~12.000 syllables
• Chinese dictionaries: ~50.000 characters
• Hundreds of other characters in common use, such as math and currency symbols
From ASCII to Unicode
• Most character sets and encodings of the 1970s/80s were modifications or extensions of ASCII
• Many of them used 8 bits and contained (a subset of) the 94 printable ASCII characters
• The most common encodings today use a single byte per character (SBCS)
• They are all limited to 256 characters
• Because of that, none of them can cover even all the letters of the Western European languages
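The 256-character limit is easy to observe in practice. The sketch below uses Python's built-in `latin-1` and `cp1252` codecs as stand-ins for Western European single-byte code pages (an assumption for illustration) and checks whether a text fits into a given encoding. The French ligature "œ", for example, is missing from ISO 8859-1, and Greek text fits into neither single-byte code page.

```python
def fits(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


samples = ["Straße", "cœur", "Ελλάδα"]  # German, French, Greek
for s in samples:
    print(f"{s:8} latin-1={fits(s, 'latin-1')} "
          f"cp1252={fits(s, 'cp1252')} utf-8={fits(s, 'utf-8')}")
```

Only the Unicode encoding (UTF-8 here) accepts all three samples.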
From ASCII to Unicode
• Consequence: many different 8-bit encodings were created to fulfill the needs of different user communities
• Solution for data interchange in a globally networked information society and collaborative business world: a single character set for all languages in use
• The Unicode code space comprises 1.114.112 code points (U+0000 to U+10FFFF) for characters, symbols and control characters
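The size of the Unicode code space follows from its structure of 17 planes with 65.536 code points each. A short Python check is shown below. `chr()` is used only because it enforces the same upper bound of U+10FFFF.

```python
PLANES = 17
PLANE_SIZE = 0x10000            # 65.536 code points per plane
CODE_SPACE = PLANES * PLANE_SIZE

# The highest code point is U+10FFFF, so the space holds 0x10FFFF + 1 values.
assert CODE_SPACE == 0x10FFFF + 1
print(f"{CODE_SPACE:,} code points")          # 1,114,112 code points

# The last code point is still encodable (4 bytes in UTF-8) ...
print(chr(0x10FFFF).encode("utf-8").hex())   # f48fbfbf

# ... but anything beyond the code space is rejected.
try:
    chr(0x110000)
except ValueError as exc:
    print("rejected:", exc)
```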
What is Unicode exactly?
• Unicode = universally encoded character set to store information from any language
• Unicode
  • defines properties for each character
  • standardizes script behavior
  • provides a standard algorithm for bidirectional text
  • defines cross-mappings to other standards
• Unicode defines a unique code value for every character, regardless of the platform, program or programming language used
What is Unicode exactly?
• The Unicode Standard primarily encodes scripts rather than languages
• A script is a set of symbols that several languages have historically shared
• In many cases, one script serves to write dozens of languages (e.g. the Latin script)
• In other cases, one script corresponds to a single language (e.g. Hangul)
What is Unicode exactly?
• Additionally, it includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats etc.
• In all, the Unicode Standard (version 4.0) comprises more than 95.000 characters, ideographs and symbols
The Unicode Standard
• The Unicode Standard is a character coding system designed to support the worldwide interchange, processing and display of the written texts of the diverse languages and technical disciplines of the modern world
• In addition, it supports classical and historical texts of many written languages
Where is Unicode used?
• The Unicode Standard has been adopted by many software and hardware vendors
• Most operating systems support Unicode
• Unicode is required for international document and data interchange, the Internet and the WWW, and therefore by modern standards such as
  • Java, C#, Perl, Python
  • markup languages such as XML, HTML, XHTML, MathML, WML etc.
  • JavaScript
  • LDAP
  • CORBA etc.
The Unicode Consortium
• The Unicode Consortium is a non-profit organization originally founded to develop, extend and promote the use of the Unicode Standard
• Members of the Consortium include major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups and interested individuals
The Unicode Consortium
• The Consortium cooperates with the W3C and ISO, and has liaison status "C" with ISO/IEC JTC 1/SC 2/WG 2, which is responsible for refining the specification and expanding the character set of ISO/IEC 10646
Unicode Encodings
• UTF = Unicode Transformation Format
• UCS = Universal Character Set
• CESU = Compatibility Encoding Scheme for UTF-16
• Conversion between the different encodings is a simple bit-wise operation (defined in the standard)
• No performance-intensive conversion tables are necessary!
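Because every Unicode encoding is computed from the code points alone, converting between them needs no lookup table. The sketch below relies on Python's built-in UTF codecs as an illustration. The sample text mixes one character from each UTF-8 length class.

```python
text = "Aä€\U0001D11E"   # A, a-umlaut, euro sign, musical G clef (U+1D11E)

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")   # big endian, no byte order mark
utf32 = text.encode("utf-32-be")

for name, data in [("UTF-8", utf8), ("UTF-16", utf16), ("UTF-32", utf32)]:
    print(f"{name:7} {len(data):2} bytes  {data.hex(' ').upper()}")

# Converting between encodings is decode + encode; the round trip is lossless.
assert utf8.decode("utf-8").encode("utf-16-be") == utf16
assert utf16.decode("utf-16-be").encode("utf-32-be") == utf32
```

For this sample, UTF-8 and UTF-16 both need 10 bytes and UTF-32 needs 16.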
Unicode Encodings
• UTF-8: Unicode Transformation Format based on an 8-bit representation
• CESU-8: Compatibility Encoding Scheme for UTF-16 on an 8-bit base
• UTF-16: Unicode Transformation Format based on a 16-bit representation
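CESU-8 differs from UTF-8 only for supplementary characters (above U+FFFF). CESU-8 first splits such a character into a UTF-16 surrogate pair and then writes each surrogate as its own 3-byte sequence. Python has no CESU-8 codec, so the minimal sketch below builds one on top of the `surrogatepass` error handler. It is an illustration and not a production encoder.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP characters: same bytes as UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Supplementary characters: build the UTF-16 surrogate pair ...
            v = cp - 0x10000
            high, low = 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)
            # ... and write each surrogate as a 3-byte UTF-8-style sequence.
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
print(clef.encode("utf-8").hex(" ").upper())   # F0 9D 84 9E
print(cesu8_encode(clef).hex(" ").upper())      # ED A0 B4 ED B4 9E
```

A supplementary character therefore takes 6 bytes in CESU-8 instead of 4 bytes in UTF-8. This is the price of byte-for-byte compatibility with UTF-16 sort order.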
Unicode Encodings
• UCS-2: Universal Character Set, 2-byte variant (16 bit)
• UTF-32: Unicode Transformation Format based on a 32-bit representation
• UCS-4: Universal Character Set, 4-byte variant (32 bit)
Unicode Encodings
• Not all Unicode characters are 2 bytes long → hardware requirements do not simply double
• The Unicode encoding determines the length of a character
• In a Unicode encoding, one character can be longer than 1 byte; therefore Unicode characters can take up more space than characters of a standard single-byte code page
UTF-8
• UTF-8 is the 8-bit encoding of Unicode
• It is a variable-width encoding and a strict superset of 7-bit ASCII
• “Strict superset” means that every character in 7-bit ASCII is available in UTF-8 with the same code point value
• 1 character = 1 to 4 bytes in the encoding
• Characters from European scripts: 1 or 2 bytes
• Asian scripts: 3 or 4 bytes
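The 1-to-4-byte rule comes from a fixed bit layout. The sketch below encodes one code point by hand and compares the result with Python's built-in codec. For brevity it does not reject the surrogate range U+D800 to U+DFFF. Note that the euro sign needs 3 bytes even though it is used in Europe, because the "1 or 2 bytes" rule applies to European letters only.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the UTF-8 bit layout (sketch)."""
    if cp < 0:
        raise ValueError(f"not a Unicode code point: {cp}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    raise ValueError(f"not a Unicode code point: {cp:#x}")


for ch in "Aé€\u9875\U0001D11E":
    print(f"U+{ord(ch):04X} -> {len(utf8_encode(ord(ch)))} byte(s)")
```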
UTF-8
• UTF-8 is used on UNIX platforms, in HTML and by most Internet browsers
• Main benefits of UTF-8:
  • compact storage requirements for European scripts
  • in general, European scripts occupy less storage on disk and in memory
  • ease of migration → since 7-bit ASCII data remains the same in UTF-8, the data conversion effort between ASCII-based character sets and UTF-8 is reduced significantly
UTF-8 / CESU-8 (8-bit encodings)
• 8-bit encodings are well suited for data transfer, since all 7-bit ASCII characters keep their byte values (8-bit ISO characters keep their code points, but are encoded in 2 bytes)
• Easier communication with legacy and non-Unicode systems
• Downside: variable character length
UCS-2
• UCS-2 has a fixed width of 16 bits (2 bytes)
• UCS-2 is the Unicode encoding of Java and Windows NT 4.0
• Main benefits of UCS-2:
  • more compact storage for Asian scripts (each character is represented with 2 bytes only)
  • faster string processing, because all characters have the same width
  • good compatibility with Java and Microsoft clients
• Downside:
  • UCS-2 supports only the Unicode characters defined up to Unicode 3.0 (max. 65.536)
UTF-16
• UTF-16 is the 16-bit encoding of Unicode
• Basically an extension of UCS-2
• One Unicode character takes 2 or 4 bytes in the encoding
• Characters from European and most Asian scripts are represented in 2 bytes
• Supplementary characters are represented in 4 bytes (a surrogate pair)
• UTF-16 is the main Unicode encoding as of Windows 2000
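UTF-16 reaches beyond the 65.536-character limit of UCS-2 through surrogate pairs. The 20 bits left after subtracting 0x10000 are split into a high and a low surrogate. The Python sketch below computes the code units for the G clef U+1D11E used in the examples of this deck.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point (sketch)."""
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit unit, as in UCS-2
    v = cp - 0x10000                     # 20 bits remain
    return [0xD800 + (v >> 10),          # high surrogate: upper 10 bits
            0xDC00 + (v & 0x3FF)]        # low surrogate: lower 10 bits


print([f"{u:04X}" for u in utf16_units(0x1D11E)])   # ['D834', 'DD1E']
```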
UTF-16
• Main benefits of UTF-16:
  • more compact storage for Asian scripts (2 bytes for commonly used characters)
  • ideal if European and Asian scripts are used together → UTF-16 will occupy less storage on disk and in memory than UTF-8 (which needs 3 bytes for Asian characters)
  • balance of efficient character access and economical use of storage
• These points are the reason why UTF-16 is used in the SAP Web Application Server
UCS-2 / UTF-16 (16-bit encodings)
• 16-bit encodings offer a compromise between the pros and cons of the 8-bit and the 32-bit encodings
• They do not need as much memory as 32-bit encodings, but offer a quasi-fixed character length
• UCS-2 has a fixed character length, but it cannot define more than 2^16 (65.536) characters
UTF-32
• 32-bit encoding
• Popular when memory space is no concern
• Fixed width (4 bytes)
UCS-4 / UTF-32 (32-bit encodings)
• All 32-bit encodings have a fixed length
• This advantage is outweighed by the extensive memory and storage requirements
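The storage trade-offs between the 8-, 16- and 32-bit encodings can be measured directly. The sketch below uses Python's `-le` codec variants so that no byte order mark is counted. It compares a German, a Japanese and a mixed sample.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")  # "-le": no byte order mark


def storage(text: str) -> dict[str, int]:
    """Bytes needed to store `text` in each Unicode encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


for sample in ["Grüße", "日本語", "Grüße 日本語"]:
    print(f"{sample:12} {storage(sample)}")
```

UTF-8 wins for the German sample (7 vs. 10 bytes) and UTF-16 wins for the Japanese one (6 vs. 9 bytes). For the short mixed sample, UTF-8 is still marginally smaller (17 vs. 18 bytes), so the break-even point depends on the actual share of Asian characters in the data. UTF-32 is always the largest.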
Example #1

Character       UTF-8          UCS-2   UTF-16
A               41             0041    0041
c               63             0063    0063
Æ               C3 86          00C6    00C6
ö               C3 B6          00F6    00F6
٤ (U+0664)      D9 A4          0664    0664
页 (U+9875)     E9 A1 B5       9875    9875
𝄞 (U+1D11E)    F0 9D 84 9E    N/A     D834 DD1E
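The rows of Example #1 can be regenerated with Python's built-in codecs, which is a convenient way to check such tables. In the sketch below, the UCS-2 column is derived as "N/A" for anything above U+FFFF.

```python
def row(ch: str) -> tuple[str, str, str]:
    """UTF-8, UCS-2 and UTF-16 (big endian) hex values for one character."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8").hex(" ").upper()
    ucs2 = f"{cp:04X}" if cp <= 0xFFFF else "N/A"     # UCS-2 stops at U+FFFF
    utf16 = ch.encode("utf-16-be").hex(" ", 2).upper()  # group per 16-bit unit
    return utf8, ucs2, utf16


for ch in ["A", "c", "\u00c6", "\u00f6", "\u0664", "\u9875", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", *row(ch), sep="  ")
```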
Example #2 – character “가” U+AC00

                                      BIN                          HEX
UTF-8                                 11101010 10110000 10000000   EA B0 80
Remove lead/trailing byte indicators  1010 110000 000000
Regroup bits                          10101100 00000000
UTF-16                                1010110000000000             AC00

Lead byte indicator: 1110   Trailing byte indicator: 10
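The steps of Example #2 translate directly into bit operations. The minimal Python sketch below handles only 3-byte sequences. It strips the lead-byte indicator (1110) and the trailing-byte indicators (10), then regroups the remaining 4 + 6 + 6 = 16 bits into one UTF-16 code unit.

```python
def decode_3byte_utf8(seq: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence by hand (sketch)."""
    b0, b1, b2 = seq
    if b0 >> 4 != 0b1110 or b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Remove the indicator bits, keeping 4, 6 and 6 payload bits ...
    bits = [b0 & 0x0F, b1 & 0x3F, b2 & 0x3F]
    # ... and regroup them into a single 16-bit value.
    return (bits[0] << 12) | (bits[1] << 6) | bits[2]


cp = decode_3byte_utf8(bytes([0xEA, 0xB0, 0x80]))
print(f"U+{cp:04X} {chr(cp)}")   # U+AC00 가
```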
Unicode & SAP in General
2. Unicode & SAP in General
• Languages and characters
• Characters on Disk/Memory
• Code Pages
• SAP & Code Pages
• Language Combinations before Unicode
• Recommendations from SAP (w/o Unicode)
• Unicode-compliant SAP products
• When/why do customers need Unicode?
Languages and characters
• Languages are written with specific sets of characters
• Only a few languages use the same set of characters
• A font is a group of characters
Characters on Disk/Memory
• A character is stored as a byte sequence on disk
• A code page defines the mapping between the byte sequence and a character
Code Pages
• The code page determines which characters you can see and enter
Code Pages
• Different code pages map different characters to the same byte sequence
[Figure: Characters on Disk/Memory, single-byte vs. double-byte code pages]
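The effect of a code page is easy to reproduce: one and the same byte yields a different character in each of them. The sketch below uses Python's codec names for Western European, Greek, Cyrillic and Hebrew single-byte code pages. These stand in for the legacy code pages discussed here and are not SAP code page numbers.

```python
def decode_everywhere(raw: bytes, codecs: list[str]) -> dict[str, str]:
    """Interpret the same bytes with several code pages."""
    return {name: raw.decode(name) for name in codecs}


CODE_PAGES = ["latin-1", "iso8859-7", "cp1251", "iso8859-8"]  # Western, Greek, Cyrillic, Hebrew
for name, text in decode_everywhere(bytes([0xE4]), CODE_PAGES).items():
    print(f"{name:10} 0xE4 -> {text}")
```

The byte 0xE4 reads as "ä", "δ", "д" and "ה" respectively. Without knowing the code page, the stored byte alone does not identify the character.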
SAP & Code Pages
Language Combinations before Unicode
• Single standard code pages
  • support specific sets of languages
  • the number and combination of supported languages cannot be altered
• Standard code pages and R/3 languages (w/o EBCDIC)
[Table: Double-Byte Code Pages]
Language Combinations before Unicode
• It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports (see SAP Note 0112065)
Language Combinations before Unicode
• Blended Code Pages (≥ Rel. 3.1D)
  • SAP proprietary code pages that contain characters from one or more standard code pages
  • increase the number of language combinations that can be used
  • functionally, a Blended Code Page system is a single code page system
  • users can see and enter all characters contained in the code page, regardless of their log-in language
Language Combinations before Unicode
[Table: SAP Code Page / Supported Languages]
Language Combinations before Unicode
• The availability of SAP blended code pages is platform-dependent, because SAP blended locales need to be created for each platform
• Blended Locale Status (x = available, −− = not available)
Language Combinations before Unicode
• MDMP (≥ Rel. 3.1I): Multi-Display / Multi-Processing
  • allows dynamic code page switching on the application server
  • therefore permits any combination of standard code pages on one system
  • the log-on language determines the code page that is active for each user
• An MDMP system is recommended if:
  1. one or more additional code pages are required to add languages to your existing installation
  2. a blended code page cannot support the combination of languages you need for a new installation. For example, an MDMP system with the code pages 1100 and 8000 allows German and Japanese users to log on to the same R/3 system in their respective languages
Language Combinations before Unicode – Example
• Each user can only access one code page at a time: a user who logs in as a Japanese user cannot enter German characters, and German characters stored in the database will not be displayed correctly
[Figure: MDMP system with code pages 1100 (ISO-1) and 8000 (SJIS); database, application server and front ends in Japan and Germany]
Language Combinations before Unicode – Example
[Figure: Japanese user and German user]
Language Combinations before Unicode
Please Note:
• It is possible for a user to log on in German and then manipulate the character set and font settings so that he can enter what appear to be Japanese characters; these characters will not be stored correctly in the database, and this data will be corrupt
• If a user wants to enter e.g. Japanese, he/she must log on in Japanese
Language Combinations before Unicode
Please Note:
• To ensure that no data corruption occurs, the following restrictions must be observed:
  • Global data must contain only 7-bit ASCII characters, which are in all code pages
  • Users may use only the characters of their log-in language or 7-bit ASCII
  • Batch processes must be assigned the correct user ID and language
  • EBCDIC code pages are not supported
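The first two MDMP restrictions can be expressed as a simple check. The Python sketch below is hypothetical. The `LOGON_CODE_PAGE` table and the function name are illustrative, and SAP code pages 1100 (ISO-1) and 8000 (SJIS) from the example are approximated by Python's `latin-1` and `shift_jis` codecs. It is not an SAP tool, only a way to make the rules concrete.

```python
from typing import Optional

# Hypothetical log-on language -> code page table for a two-code-page MDMP
# system (SAP 1100 ~ latin-1, SAP 8000 ~ shift_jis).
LOGON_CODE_PAGE = {"DE": "latin-1", "JA": "shift_jis"}


def violates_mdmp_rules(text: str, logon_language: Optional[str]) -> bool:
    """Check a value against the MDMP restrictions (sketch).

    logon_language=None marks global data, which must be 7-bit ASCII;
    otherwise the text must fit into the code page of the log-on language.
    """
    if logon_language is None:
        return not text.isascii()
    try:
        text.encode(LOGON_CODE_PAGE[logon_language])
    except UnicodeEncodeError:
        return True
    return False


print(violates_mdmp_rules("Müller", None))   # True: global data must be ASCII
print(violates_mdmp_rules("Müller", "DE"))   # False
print(violates_mdmp_rules("日本", "DE"))      # True
```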
Recommendations from SAP (w/o Unicode)
• In general, a single standard code page is the best choice for new installations and upgrades
• If additional languages or language combinations are needed, SAP recommends Unambiguous Blended Code Pages for new installations and MDMP for existing installations
• Unambiguous Blended Code Pages only support certain language combinations; therefore an MDMP setup may be the only option for new installations as well
Unicode-compliant SAP products
• Currently, all Unicode installations require written permission from SAP and are carried out as customer projects together with SAP; the only exception is new installations of R/3 Enterprise Extension Set 2.0
Unicode-compliant SAP products (SAP Note 79991)
✓ SAP Web Application Server (≥ 6.20)
✓ mySAP Customer Relationship Management (CRM)
  • The Unicode version of mySAP CRM 4.0 is available via Ramp-Up
✓ mySAP Supply Chain Management (SCM)
  • The Unicode version of mySAP SCM 4.0 is available via Ramp-Up
✓ mySAP Supplier Relationship Management (SRM)
  • The Unicode version of mySAP SRM 4.0 is available via Ramp-Up
  • conversions (with or without MDMP) of existing SRM installations
Unicode-compliant SAP products (SAP Note 79991)
✓ mySAP Business Intelligence (BW)
  • The Unicode version of mySAP BW 3.5 is available via Ramp-Up
  • the conversion of existing BW installations is done as a customer project
  • SAP Note 643813 has a collection of all relevant SAP Notes concerning Unicode-based SAP BW installations
✓ mySAP Product Lifecycle Management (PLM)
  • The Unicode version of mySAP PLM 4.0 is available via Ramp-Up
✓ SAP R/3 Enterprise (Ext. 1.10 & higher)
✓ SAP Exchange Infrastructure
When/why do customers need Unicode?
• Global businesses that require IT systems to support multilingual data without any restrictions → e.g. customers with one worldwide central SAP system
• Web interfaces open the door to a global customer base, so IT systems must be able to support multiple local languages simultaneously
When/why do customers need Unicode?
• With J2EE integration, mySAP components fully support web standards, and with Unicode they can now take full advantage of XML and Java
• Only Unicode makes it possible to seamlessly integrate heterogeneous SAP and non-SAP system landscapes → NetWeaver
Technology in Depth
3. Technology in Depth
• Unicode & Operating Systems
• Unicode & Databases
• SAP Unicode-based Code Pages
• How to Unicode-enable a program
• Unicode-enabled ABAP
• Migrating to Unicode-enabled ABAP
• Unicode Conversion, IMIG Lab Test
• SAP System-to-System communication
• Printing & Output Management
Unicode & Operating Systems – HP-UX
• HP-UX has been Unicode-enabled since version 10.x
• All Unicode locales in the HP-UX operating environment are based on the UTF-8 format
• Each locale includes a base language in the UTF-8 code set and the regional data related to this base language
  • This includes local formatting rules, text messages, help messages and other related files
• Each locale also supports several other scripts for input, display, code conversion and printing
Unicode & Operating Systems – Windows
• Some Unicode support has been included in Microsoft Windows since Windows 95 and Windows NT 4
• Windows 2000 and Windows XP/2003 are based on Unicode instead of the ANSI or WGL4 character sets
• Before Windows 2000, your version of Windows may have used a different character set if you lived in a country that uses a non-Latin alphabet, such as Egypt, Greece, Israel, Russia or Thailand
Unicode & Operating Systems – Windows
• The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets
• This caused and still causes problems when moving documents between operating systems such as DOS, Windows, Mac OS and UNIX or exchanging documents electronically that were created on computers using different character sets
Unicode & Operating Systems – Linux
• Before UTF-8 emerged, Linux users all over the world had to use various language-specific extensions of ASCII
• Most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc.
• This made the exchange of files difficult, and application software had to deal with various small differences between these encodings
Unicode & Operating Systems – Linux
• Because of these difficulties, major Linux distributors and application developers have started to phase out these legacy encodings in favor of UTF-8
• UTF-8 support has improved dramatically over the last few years, and more and more people now use UTF-8 on a daily basis in
  • text files (source code, HTML files, email messages, etc.)
  • file names
  • standard input and standard output, pipes
  • …
Unicode & Operating Systems – Linux
• In UTF-8 mode, terminal emulators (such as xterm) transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process
• Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font
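Because a terminal receives bytes in arbitrary chunks, a multi-byte UTF-8 sequence can be split across two reads. The UTF-8 decoder therefore has to keep state between calls. The Python sketch below illustrates this with the standard `codecs` incremental decoder. It shows the principle only and is not how xterm is implemented.

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# "Grüße" arriving in chunks; the first chunk boundary splits "ü" (C3 BC).
chunks = [b"Gr", b"\xc3", b"\xbc\xc3\x9f", b"e"]
shown = "".join(decoder.decode(chunk) for chunk in chunks)
shown += decoder.decode(b"", final=True)   # flush; raises on a truncated sequence
print(shown)   # Grüße
```

The decoder returns an empty string for the lone lead byte `\xc3` and emits "ü" only once the trailing byte arrives.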
Unicode & Operating Systems – Linux
• Before you start experimenting with UTF-8 under Linux, update your installation to a recent distribution with up-to-date UTF-8 support
• This is particularly the case if you use an installation older than SuSE 8.1 or Red Hat 8.0
• Before these releases, UTF-8 support was far too limited and experimental to be recommended for daily use
Little vs. Big Endian
• UCS and Unicode are first of all just code tables that assign integer numbers to characters
• There are several alternatives for representing a sequence of such characters, or their integer values, as a sequence of bytes
• The two most obvious encodings store Unicode text as sequences of either 2-byte or 4-byte units
Little vs. Big Endian
• The official terms for these encodings are UCS-2 and UCS-4, respectively
• Unless otherwise specified, the most significant byte comes first (Big Endian convention)
• An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every byte
• For a UCS-4 file, three 0x00 bytes have to be inserted in front of every byte instead
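The zero-byte trick works because the 256 Latin-1 byte values coincide with the first 256 Unicode code points. The Python sketch below widens Latin-1 bytes to big-endian UCS-2 or UCS-4 and checks the result against the built-in UTF-16/UTF-32 codecs, which are identical to UCS-2/UCS-4 for these characters.

```python
def latin1_to_ucs(data: bytes, width: int) -> bytes:
    """Widen Latin-1 bytes to big-endian UCS-2 (width=2) or UCS-4 (width=4)
    by putting width-1 zero bytes in front of each byte (sketch)."""
    pad = b"\x00" * (width - 1)
    return b"".join(pad + bytes([b]) for b in data)


raw = "SAP Ä".encode("latin-1")
print(latin1_to_ucs(raw, 2).hex(" "))   # 00 53 00 41 00 50 00 20 00 c4
```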
Little vs. Big Endian

Character   Unicode Scalar Value   UTF-8 / CESU-8   UTF-16 [Big Endian]   UTF-16 [Little Endian]
A           U+0041                 41               00 41                 41 00
Ä           U+00C4                 C3 84            00 C4                 C4 00
α           U+03B1                 CE B1            03 B1                 B1 03
א           U+05D0                 D7 90            05 D0                 D0 05
晓          U+6653                 E6 99 93         66 53                 53 66
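The two UTF-16 columns of the table differ only in byte order. The Python sketch below reproduces them. It also shows how Python's plain `utf-16` codec resolves the ambiguity by writing a byte order mark (BOM) in the machine's native order.

```python
def utf16_both(ch: str) -> tuple[str, str]:
    """UTF-16 bytes of one character in big- and little-endian order, as hex."""
    return (ch.encode("utf-16-be").hex(" ").upper(),
            ch.encode("utf-16-le").hex(" ").upper())


for ch in "A\u00c4\u03b1\u05d0\u6653":   # A, Ä, alpha, alef, U+6653
    be, le = utf16_both(ch)
    print(f"U+{ord(ch):04X}  BE {be}  LE {le}")

# The plain "utf-16" codec prepends a BOM so a reader can detect the order.
bom = "A".encode("utf-16")[:2]
print("BOM:", bom.hex(" ").upper())   # FF FE on little-endian machines
```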
Unicode & Databases
Supported Databases by SAP (WAS 6.20)

Database     Win2K   HP-UX   Solaris   AIX   OS/400   OS/390   Linux
SQL Server   P       --      --        --    --       --       --
Oracle       P       P       P         P     --       --       P
DB2          P       P       P         P     P        ?        P
SAP DB       P       P       P         P     --       --       P

P = Available   ? = Currently not available   -- = Unsupported in general
Unicode & Databases

Manufacturer   Version   Encoding
SQL Server     2000      UTF-16
Oracle         7.2       UTF-8
               8         UTF-8
               9i        UTF-8 / UTF-16
               10g       UTF-8 / UTF-16
DB2            AIX       CESU-8
               AS400     UTF-16
SAP DB         7.0       UTF-16
               8.0       UTF-8
SAP Unicode-based Code Pages
• With the Unicode enablement of mySAP.com components (check chapter #1), the old code page management had to be changed
• Instead of SAP character numbers, all code pages are now based on Unicode character IDs
  → 5-digit SAP character numbers are no longer adequate
This change is valid for both Unicode and non-Unicode systems!
SAP Unicode-based Code Pages
• The connection between the SAP character number and the Unicode character ID is stored in table TCP01
• You can see the connection in the character section of transaction SPAD
• NOTE: not every character has a corresponding Unicode character ID!
SAP Unicode-based Code Pages
• All SAP code pages were migrated from the old to the new format using report RSCP0126
• Code pages are still defined in table TCP00
Customers must migrate their own code pages (9xxx) themselves, using RSCP0126!
How to Unicode-enable a program
• Separate Unicode and non-Unicode versions of R/3, built from one ABAP source:
  Non-Unicode R/3
  • 1 character = 1 byte (types C, N, D, T, STRING)
  • Non-Unicode kernel
  • Non-Unicode database
  Unicode R/3
  • 1 character = 2 bytes → UTF-16 (types C, N, D, T, STRING)
  • Unicode kernel
  • Unicode database
• No explicit Unicode data type in ABAP
• Single ABAP source for Unicode and non-Unicode systems
How to Unicode-enable a program
• The major part of ABAP coding is ready for Unicode without any changes
• A minor part of ABAP coding has to be adapted to comply with Unicode restrictions (e.g. syntactical restrictions)
How to Unicode-enable a program
• Program attribute "Unicode checks active"
Unicode Enabled ABAP
Design Goals
• Platform independence
  → Identical behavior on Unicode and non-Unicode systems
• Highest level of compatibility to the pre-Unicode world
  → Minimize costs for Unicode enabling of ABAP programs
Main Features
• Clear distinction between character and byte processing: 1 character <> 1 byte
Unicode Enabled ABAP
ABAP lists: difference between memory length and display length
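The gap between memory length and display length can be illustrated outside ABAP. In a Shift-JIS system, a Japanese character took 2 bytes and 2 screen columns, so byte length and list width matched. In UTF-16, the same character counts as 1 character but still needs 2 columns. The Python sketch below uses `unicodedata.east_asian_width` as a stand-in for list-column width. This is an assumption for illustration only and does not describe how the ABAP list processor computes widths.

```python
import unicodedata


def display_width(text: str) -> int:
    """Columns needed in fixed-width output: East Asian wide (W) and
    fullwidth (F) characters take two columns, everything else one (sketch)."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


for text in ["SAP", "日本語"]:
    print(f"{text}: {len(text)} characters, "
          f"{len(text.encode('utf-16-le'))} bytes in UTF-16, "
          f"{len(text.encode('shift_jis'))} bytes in SJIS, "
          f"{display_width(text)} columns")
```

For "日本語", the SJIS byte length (6) equals the display width (6), while the Unicode character count (3) does not. This is why list layouts with former double-byte characters need checking.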
Migrating to Unicode enabled ABAP – Step 1
• In the non-Unicode system:
  • Adapt all ABAP programs to Unicode syntax and runtime restrictions
  • Set the attribute "Unicode enabled" for all programs
Migrating to Unicode enabled ABAP – Step 2
• Set up a Unicode system
  • Unicode kernel + Unicode database
  • Only ABAP programs with the Unicode attribute are executable
• Do runtime tests in the Unicode system
  • Check for runtime errors
  • Look for semantic errors
  • Check the ABAP list layout with former double-byte characters
Migrating to Unicode enabled ABAP
Use UCCHECK to analyze your applications:
• Remove errors
• Inspect places that cannot be analyzed statically (optional)
  • Untyped field symbols
  • Offsets with variable length
  • Generic access to database tables
• Set the Unicode program attribute using UCCHECK or SE38 / SE24 / ...
• Do additional checks with SLIN (e.g. matching of actual and formal parameters in function modules)
Migrating to Unicode enabled ABAP
Upgrade to Unicode
• With Unicode, there are no limitations on users, and all languages of the ISO 639 standard can be used
• Unicode is technically supported as of Basis Release 6.20; see SAP Note 0379940 for more information
• A single code page system (standard or Unambiguous Blended Code Page) can be upgraded to Unicode using the normal upgrade method
Unicode Conversion Roadmap – Preparation
• During preparation, topics such as
  • additional hardware requirements,
  • downtime issues,
  • Unicode enabling of customer developments,
  • and the special treatment of MDMP systems
  have to be taken into consideration
Unicode Conversion Roadmap – Conversion
• The Unicode conversion process is based on a system copy; during this process, the database conversion and the system shutdown/restart are automated as far as possible
• For small to mid-size databases (< 1 TB), the conversion is based on an SAP unload/reload of the complete database; minimum-downtime tools are used for larger databases
Unicode Conversion Roadmap – Post-Conversion
• Once the Unicode system is up and running, you need to
  • verify data consistency on a scenario basis,
  • and carry out general integration testing
• For systems that support multiple languages, special emphasis must be placed on cross-language handling during the test phase
• SAP provides correction tools that can be used if the conversion did not run properly
Unicode Conversion Roadmap – Post-Conversion
• Additional tool: SAP Data Management, for reducing database size and growth
• To keep your database costs in check, the SAP Data Management service frees up valuable database resources by showing you how to reduce the size and growth of your database, typically by 25 %
Unicode Conversion at a Glance
Preparation
• Set up the Unicode conversion project
• Check prerequisites
• Data analysis for downtime minimization – special MDMP treatment
• Enabling of customer developments
Conversion
• Highly automated
• System will be down during database conversion
• Unload/reload process for small databases
• Minimum-downtime tool for large databases
Post-Conversion
• Unicode system is up and running
• Verification of data consistency
• Integration testing focused on language handling
Upgrade Paths to Unicode (R/3 Enterprise)
Source systems: R/3 3.1I, 4.0B, 4.5B, 4.6B, 4.6C
→ direct upgrade to the target system R/3 Enterprise non-Unicode
→ conversion to R/3 Enterprise Unicode
• First upgrade, then conversion to Unicode
• R/3 Enterprise Ramp-Up started 2002-07
• Unicode availability follows a phase of restricted shipment with pilot customers
Upgrade Paths to Unicode (BW 3.1)
Source systems: BW 2.0B, 2.1C, 3.0
→ upgrade to the target system BW 3.1 non-Unicode
→ conversion to BW 3.1 Unicode
• Interfacing with R/3 MDMP on a project basis only
• Unicode BEx GUI restrictions apply
• First upgrade, then conversion to Unicode
• BW 3.1 Ramp-Up starting 2002-12
• Unicode availability follows a phase of restricted shipment with pilot customers
Upgrade Paths to Unicode (CRM 3.1)
Source systems: CRM 2.0B, 2.0C, 3.0
→ upgrade to the target system CRM 3.1 non-Unicode
→ conversion to CRM 3.1 Unicode
• Selected scenarios only; cooperation with SAP GBU CRM required
• First upgrade, then conversion to Unicode
• CRM 3.1 Ramp-Up starting 2002-12
• Unicode availability follows a phase of restricted shipment with pilot customers
Prerequisites, special MDMP treatment
• OSS Note 548016: Conversion from Unicode to non-Unicode is not possible. The Unicode conversion of MDMP systems and also of ambiguous code page systems (code page numbers 6100, 6200 and 6500) is only supported on a project basis with SAP involvement
• OSS Note 543715: The Unicode conversion of a BW 3.1 system requires additional steps regarding the system copy
• OSS Note 573044: If you are using HR functionality within R/3 Enterprise, additional steps are also mandatory
6.30 Unicode & MCOD
[Figure: systems QA1 and TC2 (schemas SAPQA1/SAPQA1DB and SAPTC2/SAPTC2DB), each with an ABAP stack (non-Unicode/Unicode) and a Java stack (Unicode), in one database]
• With SAP Web AS 6.30, a database abstraction layer for the Java stack was introduced: Open SQL for Java
• Tables of the Java stack are stored in the same database instance as the tables of the ABAP stack, in two different schemas (except Informix)
• The concept of MCOD installations is fully supported by the combined ABAP and Java stack
Unicode Conversion – IMIG
Whitepaper: "SAP R/3 incremental migration test"
http://saphpcc.bbn.hp.com/Global/Compet/migration/migration.HTM
SAP System-to-System Communication
• SAP Web Application Server (≥ 6.20)
  • Only one source code exists for Unicode-based and non-Unicode-based systems → new developments can be exchanged smoothly
  • The interfaces (e.g. RFC) have been extended so that communication with other Unicode-based or non-Unicode-based systems is possible. Furthermore, SAP provides standard tools for the installation of (and conversion to) Unicode-based systems that can also be used for checking and Unicode-enabling customer developments
SAP System-to-System communication
• Solid lines: the receiver can receive all characters
• Dotted lines: the receiver cannot receive characters that are not in its own code page. But as long as you restrict the character set, data can be sent from everywhere to everywhere.
[Figure: Unicode R/3, WWW, SJIS and Latin-1 systems, non-Unicode R/3 (SJIS) and MDMP R/3 (Latin-1, SJIS), connected via http/RFC]
SAP System-to-System communication – RFC
• Unicode <-> Unicode
  • no problem
• non-Unicode <-> non-Unicode
  • as before: the receiver converts the code page if possible
• Unicode <-> non-Unicode
  • the Unicode side converts from/to the code page of the non-Unicode side
  • MDMP is converted with a language key
  • system settings allow the configuration of error handling
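The conversion rule for Unicode <-> non-Unicode RFC can be sketched as follows. A language key selects the target code page, and a setting decides whether unconvertible characters cause an error or are replaced. Everything below is a hypothetical illustration. `CODE_PAGE_BY_LANGUAGE`, `to_non_unicode` and the `strict` flag are invented names, SAP code pages 1100/8000 are approximated by Python's `latin-1`/`shift_jis`, and the actual RFC implementation and its SM59 settings are not modeled.

```python
# Hypothetical language key -> code page table (SAP 1100 ~ latin-1,
# SAP 8000 ~ shift_jis), as used for an MDMP or non-Unicode receiver.
CODE_PAGE_BY_LANGUAGE = {"DE": "latin-1", "EN": "latin-1", "JA": "shift_jis"}


def to_non_unicode(text: str, language: str, strict: bool = False) -> bytes:
    """Convert Unicode text for a non-Unicode receiver (sketch).

    The language key selects the target code page; `strict` chooses between
    raising UnicodeEncodeError and substituting '?' for unconvertible chars.
    """
    code_page = CODE_PAGE_BY_LANGUAGE[language]
    return text.encode(code_page, errors="strict" if strict else "replace")


print(to_non_unicode("Müller 日本", "DE"))   # b'M\xfcller ??'
```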
SAP System-to-System communication – RFC (SM59): Unicode <-> non-Unicode
Printing & Output Management
What is an SAP device type?
• A configuration file for the SAP printer driver that ensures proper functionality between the SAP data stream and the printer or output device the data is sent to
Printer drivers & device types
• In R/3, a distinction is made between "printer driver" and "device type"
• A device type consists of a variety of attributes defined for an output device
• One of these attributes is the printer driver to be used by SAPscript (R/3 forms processor) for this particular printer
Printing & Output Management
• Device types cover aspects such as control commands for font selection, page size, character set selection, the character set used and so on
• For every new printer defined in the SAP environment, a device type must be specified to enable direct printing from SAP applications
• SAP creates device types for the entire HP LaserJet printer family on the basis of PCL5, PCL6 and PostScript
• SAP develops, tests and supports device types for HP products; these can be found here: http://h40045.www4.hp.com/printing_solutions/Device_Types.html
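The relationship described above (an output device needs a device type; a device type bundles attributes, one of which is the SAPscript printer driver) can be pictured as a small data model. This is a hypothetical Python sketch for illustration only: the class names, fields and the device type name `ZHPLJ_DEMO` are assumptions, not SAP's actual repository structures.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DeviceType:
    """Hypothetical model of an SAP device type: attributes for one output device."""
    name: str
    printer_driver: str                 # driver SAPscript uses, e.g. "HP-PCL5"
    character_set: str                  # character set sent to the printer
    page_sizes: list[str] = field(default_factory=list)
    font_commands: dict[str, str] = field(default_factory=dict)  # font name -> control command

@dataclass
class OutputDevice:
    """A printer defined in the SAP environment."""
    name: str
    device_type: Optional[DeviceType] = None

    def can_print_directly(self) -> bool:
        # Direct printing from SAP applications requires an assigned device type
        return self.device_type is not None

printer = OutputDevice("PRN1")
print(printer.can_print_directly())  # False
printer.device_type = DeviceType(
    name="ZHPLJ_DEMO",                 # hypothetical device type name
    printer_driver="HP-PCL5",
    character_set="ISO 8859-1",
    page_sizes=["A4", "Letter"],
)
print(printer.can_print_directly())  # True
```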
Printing & Output Management
• At present, there are five SAPscript printer drivers. They include:
  • HP-PCL5 (for example, HP LaserJet 3, 4, 5, 6 series)
  • PostScript printers (PS level 2)
  • PRESCRIBE (for example, Kyocera FS-1500)
  • device types SWIN/SAPWIN/xxSWIN/xxSAPWIN
Printing & Output Management: Unicode Device Types
• LEXMARK is approaching HP accounts, claiming that only LEXMARK can support SAP Unicode printing. Background:
  • to support Unicode character sets on an HP printer, customers need a Unicode-compliant printer and an SAP Unicode device type
  • Unicode-compliant printers are defined by firmware support for UTF-8 and/or UTF-16 and by Unicode fonts loaded on the printer
  • today, LEXMARK is the preferred vendor for SAP Unicode printing
Printing & Output Management: Solution for HP
• All OZ-based printers (LJ2300 and higher) support Unicode UTF-16 fonts in PCL6 by default
• The LJ2300, CLJ9500 and future products will support UTF-8 fonts in PCL5
• A firmware roll-out is planned so that all current OZ-based printers (LJ4200/4300, LJ9000, CLJ4600, CLJ5500) also support UTF-8 in PCL5
• Furthermore, the Unicode fonts need to be loaded on the printer (e.g. stored on the internal hard disk)
• Today, we have a Unicode prototype solution available for printing from an SAP environment
• For more information, contact Alan Cooke (U.S.) or Stephen Westberg (EMEA)
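The two printer slides boil down to a simple check: a printer is Unicode-compliant when its firmware handles UTF-8 and/or UTF-16 and Unicode fonts are loaded, and the usable encoding depends on the page description language (UTF-16 with PCL6, UTF-8 with PCL5). The Python sketch below encodes that rule; the `Printer` class, `PDL_ENCODING` table and helper names are hypothetical, and the PDL-to-encoding pairing is a simplification of the slide.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Printer:
    model: str
    firmware_encodings: set[str] = field(default_factory=set)  # e.g. {"UTF-16"}
    unicode_fonts_loaded: bool = False

def is_unicode_compliant(p: Printer) -> bool:
    """Firmware supports UTF-8 and/or UTF-16 AND Unicode fonts are loaded."""
    return bool(p.firmware_encodings & {"UTF-8", "UTF-16"}) and p.unicode_fonts_loaded

# Simplified pairing from the slide: UTF-16 with PCL6, UTF-8 with PCL5
PDL_ENCODING = {"PCL6": "UTF-16", "PCL5": "UTF-8"}

def stream_encoding(p: Printer, pdl: str) -> Optional[str]:
    """Return the encoding to use for this page description language, or None."""
    wanted = PDL_ENCODING.get(pdl)
    if is_unicode_compliant(p) and wanted in p.firmware_encodings:
        return wanted
    return None

# An OZ-based printer today: UTF-16 in PCL6; UTF-8 in PCL5 only after the firmware roll-out
lj4200 = Printer("LJ4200", {"UTF-16"}, unicode_fonts_loaded=True)
print(stream_encoding(lj4200, "PCL6"))  # UTF-16
print(stream_encoding(lj4200, "PCL5"))  # None
print(is_unicode_compliant(Printer("LJ4200", {"UTF-16"})))  # False: no fonts loaded
```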
Sizing Information for Unicode-based SAP Systems
Sizing Info - General
The space required to encode text in Unicode, compared with the encodings currently in use (8 bits per character for European languages, more for Chinese/Japanese/Korean), is shown on the next slide.
This affects disk storage space and network transfer speed (when no compression is used).
Sizing Info - General
UTF-8
No change for US ASCII, just a few percent more for ISO-8859-1, 50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic
UCS-2 and UTF-16
No change for Chinese/Japanese/Korean; 100% more for US ASCII, ISO-8859-1, Greek and Cyrillic
UCS-4
100% more for Chinese/Japanese/Korean; 300% more for US ASCII, ISO-8859-1, Greek and Cyrillic
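These percentages follow directly from bytes per character, and can be reproduced by encoding sample text. A minimal Python sketch: Python has no UCS-2/UCS-4 codecs, so `utf-16-le` and `utf-32-le` are used as stand-ins (they give the same byte counts for characters in the Basic Multilingual Plane); the `-le` variants avoid counting a byte order mark. The sample strings and the helper `overhead_percent` are illustrative choices.

```python
def overhead_percent(text: str, legacy_codec: str, unicode_codec: str) -> int:
    """Extra bytes (in %) when text moves from a legacy code page to a Unicode encoding."""
    legacy = len(text.encode(legacy_codec))
    unicode_bytes = len(text.encode(unicode_codec))
    return round(100 * (unicode_bytes - legacy) / legacy)

samples = [
    ("SAP", "ascii", "US ASCII"),
    ("Москва", "iso8859-5", "Cyrillic"),
    ("日本語", "shift_jis", "Japanese"),
]
for text, legacy, label in samples:
    for uni in ("utf-8", "utf-16-le", "utf-32-le"):
        print(f"{label:9} {uni:10} +{overhead_percent(text, legacy, uni)}%")
```

ASCII gives +0%/+100%/+300%, Cyrillic +100%/+100%/+300% and Japanese +50%/+0%/+100%, matching the slide. ISO-8859-1 text lands "a few percent" above ASCII in UTF-8 only because accented letters (2 bytes each) are a small share of typical text.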
Expected Hardware Requirements
• Increase of CPU requirements
  • depending on the existing solution:
    ISO-Latin-1 (ASCII) → Unicode: +30%
    Double-byte/MDMP → Unicode: < +5%
• Increase of memory requirements
  • depends on the underlying DB (~ +50%)
  • application server internally based on UTF-16; DB uses UTF-8, CESU-8 or UTF-16
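CESU-8, one of the database encodings named above, is identical to UTF-8 for every character in the Basic Multilingual Plane. It differs only for supplementary characters: these are first split into a UTF-16 surrogate pair, and each surrogate is written as its own 3-byte sequence (6 bytes instead of UTF-8's 4). Python has no built-in CESU-8 codec, so this is a hand-written sketch; `cesu8_encode` is a hypothetical helper that relies on Python's `surrogatepass` error handler.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (UTF-8 with supplementary characters as surrogate pairs)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            out += ch.encode("utf-8")  # BMP: same bytes as UTF-8
        else:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)   # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)  # low (trail) surrogate
            for unit in (high, low):
                # "surrogatepass" lets the UTF-8 codec emit a lone surrogate as 3 bytes
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

for s in ("A", "Ä", "日", "\U00020000"):
    print(repr(s), len(cesu8_encode(s)), len(s.encode("utf-8")), len(s.encode("utf-16-le")))
```

For 'A', 'Ä' and '日' CESU-8 and UTF-8 need 1, 2 and 3 bytes; only the supplementary character U+20000 differs (6 bytes in CESU-8 vs. 4 in UTF-8 and UTF-16).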
Unicode Conversion Demo
Java applet demo
Expected Hardware Requirements
• Database growth depends on
  • the DB Unicode encoding scheme (e.g. CESU-8, UTF-16)
  • the languages in use
[Figure: byte representation of 'A', 'Ä' and a third character in CESU-8 vs. UTF-16]
Encoding        Manufacturers                              Additional storage requirements
UTF-16          SQL Server, DB/2 (AS400), SAP DB (7.0)     60-70%
UTF-8/CESU-8    Oracle, SAP DB (8.0), DB/2 (AIX)           35%
• Network load (draft results): <7% for Latin-1, about 15% for Japanese and about 25% for other Asian languages
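The storage figures on this slide (about 35% more for UTF-8/CESU-8 databases, 60-70% more for UTF-16 databases) turn into a quick projection for an existing system. A minimal Python sketch, assuming a hypothetical 200 GB non-Unicode database; actual growth also depends on the languages in use, so treat the output as a rough planning range only.

```python
# Additional storage per DB encoding, as percentage ranges from the slide
GROWTH = {
    "UTF-8/CESU-8": (35, 35),  # Oracle, SAP DB 8.0, DB/2 (AIX)
    "UTF-16": (60, 70),        # SQL Server, DB/2 (AS400), SAP DB 7.0
}

def projected_db_size(current_gb: float, encoding: str) -> tuple[float, float]:
    """Return (low, high) projected size in GB after a Unicode conversion."""
    low, high = GROWTH[encoding]
    return current_gb * (1 + low / 100), current_gb * (1 + high / 100)

for enc in GROWTH:
    lo_gb, hi_gb = projected_db_size(200, enc)  # hypothetical 200 GB database
    print(f"{enc:13} {lo_gb:.0f}-{hi_gb:.0f} GB")
```

For the assumed 200 GB database this prints roughly 270 GB for UTF-8/CESU-8 and 320-340 GB for UTF-16.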
Expected Hardware Requirements
Non-Unicode
R/3 Release   4.0   4.5    4.6C                   4.7 (6.20) non-Unicode
CPU           1     +20%   +15%                   +5%
Memory        1     +20%   DB: +20%; App: +10%    +5%
Disk          1     +10%   +10%                   +10%
Expected Hardware Requirements
Unicode
R/3 Release   4.7 (6.20) non-Unicode   4.7 with Unicode
CPU           1                        +30% to 35%
Memory        1                        +50%
Disk          1                        ~+35% (UTF-8), +60-70% (UTF-16)
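The hardware tables list increases per release step. To estimate total CPU growth for a customer moving from R/3 4.0 straight to 4.7 with Unicode, the step factors can be multiplied. This Python sketch assumes each percentage is relative to the preceding release column; the slides do not state this explicitly, and `cumulative_factor` is a hypothetical helper.

```python
from math import prod

def cumulative_factor(increments_percent: list[float]) -> float:
    """Multiply per-step increases, assuming each is relative to the previous step."""
    return prod(1 + p / 100 for p in increments_percent)

# CPU path 4.0 -> 4.5 -> 4.6C -> 4.7 non-Unicode -> 4.7 Unicode (values from the tables)
low = cumulative_factor([20, 15, 5, 30])
high = cumulative_factor([20, 15, 5, 35])
print(f"CPU: {low:.2f}x to {high:.2f}x the 4.0 baseline")
```

Under this assumption the CPU requirement roughly doubles (about 1.88x to 1.96x) over the whole path, with the Unicode step being the largest single contributor.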