
© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Unicode in SAP NetWeaver

Sebastian Buhlinger, SAP Consultant, HP-SAP EMEA CC

3/31/2004 2

Agenda

1. Introduction to Unicode

2. Unicode & SAP in General

3. Technology in Depth

4. Sizing Information for Unicode-based SAP Systems

Introduction to Unicode

1. Introduction to Unicode
• What is text?
• History of character encoding
• Problem of character encoding
• From ASCII to Unicode
• What is Unicode exactly?
• The Unicode Standard
• Where is Unicode used?
• The Unicode Consortium
• Unicode Encodings

What is text?
• Code pages and encodings describe how text is handled and stored in
  • Computers
  • Files
  • Data structures
• Inside a computer program or data file, text is stored as a sequence of numbers, just like everything else
• A character is a
  • Letter
  • Digit
  • Period
  • Hyphen
  • Punctuation mark or
  • Math symbol
• In addition, there are control characters, which are typically not visible

History of Character Encoding
• Historically, computers were slow, had little memory and were very expensive
• Up to the 1960s, I/O meant punching holes into paper tapes
• Most character sets date back to the punch-card age and were designed with these cards in mind
• In the early days of computing, every hardware manufacturer used proprietary technology (and encodings)
• International data interchange was not an issue, so nothing needed to fit together

Problem of character encoding
• Which number is assigned to which character?
• When an ‘A’ is typed on the keyboard, the computer uses the character code to look up the shape of ‘A’ in a font file under the same binary number, and then displays or prints it
• The character ‘A’ may have different integer values in different programs or data files (the code for ‘A’ might select a different glyph in an Arabic font file)
• In some cases no number is available for a character (e.g. “&auml;” → ä)
• All data is encoded in the form of binary numerical codes
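The point that one character can carry different numbers in different encodings can be tried out with a minimal Python sketch. The helper name `byte_value` is hypothetical, and the code pages used (ASCII, Latin-1 and the EBCDIC code page `cp500`) are only illustrative choices, not ones named on the slide.

```python
from typing import Optional


def byte_value(ch: str, code_page: str) -> Optional[int]:
    """Return the byte a single-byte code page assigns to ch, or None if it has none."""
    try:
        encoded = ch.encode(code_page)
    except UnicodeEncodeError:
        # The code page has no number for this character.
        return None
    return encoded[0]


if __name__ == "__main__":
    for code_page in ("ascii", "latin-1", "cp500"):
        print(code_page, byte_value("A", code_page), byte_value("\u00e4", code_page))
```

The same letter ‘A’ is stored as a different number in ASCII and in EBCDIC, and 7-bit ASCII has no number at all for ä.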

Character repertoire
• English alphabet with digits and a few more symbols: ~60 characters
• Western European standard: ~300 characters for several languages
• Korean: ~12,000 syllables
• Chinese dictionaries: ~50,000 characters
• Hundreds of other characters in common use, such as math and currency symbols

From ASCII to Unicode
• Most character sets and encodings of the 1970s and 1980s were modifications or extensions of ASCII
• Many of them used 8 bits with a subset of the 94 printable ASCII characters
• Most common encodings today use a single byte per character (SBCS)
• They are all limited to 256 characters
• Because of that, none of them can cover even the letters of all Western European languages

From ASCII to Unicode
• Consequence: many different 8-bit encodings were created to meet the needs of different user communities
• Solution for data interchange in a global, networked information society and collaborative business world: a single character set for all languages in use
• A 32-bit code unit can distinguish 4,294,967,296 values; the Unicode code space itself is limited to 1,114,112 code points (U+0000 to U+10FFFF) for characters, symbols and control characters

What is Unicode exactly?
• Unicode = universally encoded character set to store information from any language
• Unicode
  • defines properties for each character
  • standardizes script behavior
  • provides a standard algorithm for bidirectional text
  • defines cross-mappings to other standards
• Unicode defines a unique code value for every character, regardless of the platform, program or programming language used

What is Unicode exactly?
• The Unicode Standard primarily encodes scripts rather than languages
• A script serves several languages that historically share the same set of symbols
• In many cases a script is used to write dozens of languages (e.g. the Latin script)
• In other cases one script corresponds to one language (e.g. Hangul)

What is Unicode exactly?
• It also includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats etc.
• In all, the Unicode Standard comprises more than 95,000 characters, ideograph sets and symbols (version 4.0)

The Unicode Standard
• The Unicode Standard is a character coding system designed to support the worldwide
  • interchange,
  • processing,
  • and display
  of written text of the diverse languages and technical disciplines of the modern world
• In addition, it supports classical and historical texts of many written languages

Where is Unicode used?
• The Unicode Standard has been adopted by many software and hardware vendors
• Most operating systems support Unicode
• Unicode is required for international document and data interchange, the Internet and the WWW, and therefore by modern standards such as:
  • Java, C#, Perl, Python
  • Markup languages such as XML, HTML, XHTML, MathML, WML etc.
  • JavaScript
  • LDAP
  • CORBA etc.

The Unicode Consortium
• The Unicode Consortium is a non-profit organization originally founded to
  • develop,
  • extend,
  • and promote the use of the Unicode Standard
• Members of the Consortium include major computer corporations, software producers, database vendors, research institutions, international agencies, various user groups, and interested individuals

The Unicode Consortium
• The Consortium cooperates with
  • W3C and
  • ISO
• It has liaison status "C" with ISO/IEC/JTC 1/SC2/WG2, which is responsible for refining the specification and expanding the character set of ISO/IEC 10646

Unicode Encodings
• UTF = Unicode Transformation Format
• UCS = Universal Character Set
• CESU = Compatibility Encoding Scheme
• Conversion between the different encodings is a simple, bit-wise operation (defined in the standard)
• No performance-intensive conversion tables are necessary!

Unicode Encodings
• UTF-8: Unicode Transformation based on 8-bit representation
• CESU-8: Compatibility Encoding Scheme of UTF-16 on an 8-bit base
• UTF-16: Unicode Transformation based on 16-bit representation

Unicode Encodings
• UCS-2: Universal Character Set, 2-byte variation (16 bit)
• UTF-32: Unicode Transformation based on 32-bit representation
• UCS-4: Universal Character Set, 4-byte variation (32 bit)

Unicode Encodings
• Not all Unicode characters are 2 bytes long → hardware requirements do not simply double
• The Unicode encoding determines the length of a character
• A character in a Unicode encoding can be longer than 1 byte; Unicode characters can therefore be longer than characters defined in a standard code page

UTF-8
• UTF-8 is the 8-bit encoding of Unicode
• It is a variable-width encoding and also a strict superset of 7-bit ASCII
• "Strict superset" means that every character in 7-bit ASCII is available in UTF-8 with the same code point value
• 1 character = 1 to 4 bytes in the encoding
• Characters from European scripts: either 1 or 2 bytes
• Asian scripts: 3 or 4 bytes
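The variable width of UTF-8 can be checked with a short Python sketch. The helper name `utf8_length` and the sample characters are illustrative assumptions, not taken from the slides.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


# Sample characters: ASCII letter, Latin letter with diaeresis,
# Hangul syllable, and a supplementary character (musical G clef).
SAMPLES = ["A", "\u00c4", "\uac00", "\U0001d11e"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X} needs {utf8_length(ch)} byte(s) in UTF-8")
```

An ASCII byte sequence is unchanged by the encoding, which is what "strict superset" means in practice: `"SAP".encode("ascii") == "SAP".encode("utf-8")`.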

UTF-8
• UTF-8 is used on UNIX platforms, in HTML and in most Internet browsers
• Main benefits of UTF-8:
  • compact storage requirements for European scripts
  • in general, European scripts occupy less storage on disk and in memory
  • ease of migration → since 7-bit ASCII data remains the same in UTF-8, the data conversion effort between ASCII-based character sets and UTF-8 is reduced significantly

UTF-8 / CESU-8 (8-bit encodings)
• 8-bit encodings are well suited for data transfer since all 7-bit ASCII and 8-bit ISO characters retain the same code points
• Easier communication with legacy and non-Unicode systems
• Downside: variable character length

UCS-2
• UCS-2 has a fixed width of 16 bit (2 bytes)
• UCS-2 is the Unicode encoding for Java and Win NT 4.0
• Main benefits of UCS-2:
  • More compact storage requirements for Asian scripts (each character is represented with 2 bytes only)
  • String processing is faster because all characters have the same width
  • Good compatibility with Java and Microsoft clients
• Downside:
  • UCS-2 can support Unicode characters defined up to Unicode 3.0 only (max. 65,536)

UTF-16
• UTF-16 is the 16-bit encoding of Unicode
• Basically an extension of UCS-2
• One Unicode character can be 2 or 4 bytes in the encoding
• Characters from European and most Asian scripts are represented in 2 bytes
• Supplementary characters are represented in 4 bytes
• UTF-16 is the main Unicode encoding as of Windows 2000
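How a supplementary character ends up as 4 bytes (two 16-bit code units, a surrogate pair) can be sketched as follows. The function name `to_utf16_units` is hypothetical; the arithmetic follows the UTF-16 definition, and the sketch does not validate its input (for example, it does not reject surrogate code points).

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Split a code point into one or two 16-bit UTF-16 code units."""
    if code_point < 0x10000:
        # Characters of the Basic Multilingual Plane fit into one unit (2 bytes).
        return [code_point]
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # upper 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits -> low surrogate
    return [high, low]


if __name__ == "__main__":
    for cp in (0x0041, 0xAC00, 0x1D11E):
        units = " ".join(f"{u:04X}" for u in to_utf16_units(cp))
        print(f"U+{cp:04X} -> {units}")
```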

UTF-16
• Main benefits of UTF-16:
  • More compact storage requirements for Asian scripts (2 bytes for commonly used characters)
  • Ideal if European and Asian scripts are used together → UTF-16 occupies less storage on disk and in memory than UTF-8 (3 bytes for the Asian part)
  • Balance of efficient access to characters and economical use of storage
• The points above are the reason for the use of UTF-16 in the SAP Web Application Server

UCS-2 / UTF-16 (16-bit encodings)
• 16-bit encodings offer a compromise between the pros and cons of the 8-bit and the 32-bit encodings
• They do not need as much memory as 32-bit encodings, but offer a quasi-fixed character length
• UCS-2 has a fixed character length, but it cannot define more than 2^16 (65,536) characters

UTF-32
• 32-bit encoding
• Popular when memory space is no concern
• Fixed width (4 bytes)

UCS-4 / UTF-32 (32-bit encodings)
• All 32-bit encodings have a fixed length
• This advantage is outweighed by the extensive memory and storage requirements

Example #1

Character   UTF-8         UCS-2   UTF-16
A           41            0041    0041
c           63            0063    0063
Æ           C3 86         00C6    00C6
ö           C3 B6         00F6    00F6
٤           D9 A4         0664    0664
页          E9 A1 B5      9875    9875
𝄞           F0 9D 84 9E   N/A     D834 DD1E
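The byte sequences for the code points in Example #1 can be recomputed with Python's built-in codecs. This is only a verification sketch; the helper name `hex_bytes` is hypothetical, and `utf-16-be` is used so that no byte order mark is added.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return the bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# Code points taken from the UCS-2 / UTF-16 columns of the example.
CODE_POINTS = [0x0041, 0x0063, 0x00C6, 0x00F6, 0x0664, 0x9875, 0x1D11E]

if __name__ == "__main__":
    for cp in CODE_POINTS:
        ch = chr(cp)
        print(f"U+{cp:04X}  UTF-8: {hex_bytes(ch, 'utf-8'):<12} "
              f"UTF-16: {hex_bytes(ch, 'utf-16-be')}")
```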

Example #2 – character "가" U+AC00

UTF-8    HEX                  EA B0 80
UTF-8    BIN                  11101010 10110000 10000000
         Remove lead bytes    1010 110000 000000
         Regroup bits         10101100 00000000
UTF-16   BIN                  1010110000000000
UTF-16   HEX                  AC00

Lead byte indicator: 1110 (first byte of a 3-byte sequence)
Trailing byte indicator: 10 (each following byte)
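The bit-wise conversion for U+AC00 can be followed step by step in a small Python sketch. It handles only 3-byte UTF-8 sequences, and the function name `utf8_three_bytes_to_code_unit` is a hypothetical choice for this illustration.

```python
def utf8_three_bytes_to_code_unit(data: bytes) -> int:
    """Convert one 3-byte UTF-8 sequence into its 16-bit code unit."""
    if len(data) != 3:
        raise ValueError("expected exactly three bytes")
    lead, trail1, trail2 = data
    # Check the indicators: 1110 for the lead byte, 10 for each trailing byte.
    if lead >> 4 != 0b1110 or trail1 >> 6 != 0b10 or trail2 >> 6 != 0b10:
        raise ValueError("not a 3-byte UTF-8 sequence")
    # Remove the indicator bits and regroup the remaining 4 + 6 + 6 bits.
    return ((lead & 0x0F) << 12) | ((trail1 & 0x3F) << 6) | (trail2 & 0x3F)


if __name__ == "__main__":
    unit = utf8_three_bytes_to_code_unit(bytes([0xEA, 0xB0, 0x80]))
    print(f"{unit:016b} = {unit:04X}")
```

No lookup table is involved; the conversion consists only of masking and shifting.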

Unicode & SAP in General

2. Unicode & SAP in General
• Languages and characters
• Characters on Disk/Memory
• Code Pages
• SAP & Code Pages
• Language Combinations before Unicode
• Recommendations from SAP (w/o Unicode)
• Unicode-compliant SAP products
• When/why do customers need Unicode?

Language and characters
• Languages are written in fonts
• Only a few languages use the same fonts
• A font is a group of characters

Characters on Disk/Memory
• A character is stored as a byte sequence on disk
• A code page defines the mapping between the byte sequence and a character

Code Pages
• The code page determines which characters you can see and enter

Code Pages
• Different code pages map different characters to the same byte sequence
[Figure: Characters on Disk/Memory, single byte and double byte]
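The effect of reading the same bytes with different code pages can be reproduced with Python's codecs. This is a sketch under assumptions: the helper name `interpret` and the chosen byte values are illustrative, and the Python codec names stand in for the corresponding code pages.

```python
def interpret(data: bytes, code_page: str) -> str:
    """Decode the same stored bytes under a given code page."""
    return data.decode(code_page)


SINGLE = bytes([0xC4])          # one byte on disk
DOUBLE = bytes([0x93, 0xFA])    # two bytes on disk

if __name__ == "__main__":
    # Single-byte code pages: one byte, three different characters.
    for code_page in ("latin-1", "iso8859-7", "cp1251"):
        print(code_page, interpret(SINGLE, code_page))
    # The same two bytes are one character in a double-byte code page
    # and two characters in a single-byte code page.
    print(len(interpret(DOUBLE, "shift_jis")), len(interpret(DOUBLE, "latin-1")))
```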

SAP & Code Pages

Language Combinations before Unicode
• Single Standard Code Pages
  • support specific sets of languages
  • the number and combination of supported languages cannot be altered
• Standard code pages and R/3 languages (w/o EBCDIC)
• Double-Byte Code Pages

Language Combinations before Unicode
• It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports (see Note 0112065)

Language Combinations before Unicode
• Blended Code Pages (≥ Rel. 3.1D)
  • SAP proprietary code pages that contain characters from one or more standard code pages
  • increase the combinations of languages that can be used
  • functionally, a Blended Code Page system uses a single code page
  • a Blended Code Page system is a single code page system
  • users can see and enter all characters contained in the code page, regardless of their log-on language

Language Combinations before Unicode
[Table: SAP code pages and their supported languages]

Language Combinations before Unicode
• The availability of SAP blended code pages is platform dependent, because SAP blended locales need to be created for each platform
• Blended locale status (x = available, -- = not available)

Language Combinations before Unicode
• MDMP (≥ Rel. 3.1I): Multi-Display / Multi-Processing
  • allows dynamic code page switching on the application server
  • therefore permits any combination of standard code pages on one system
  • the log-on language determines the code page that is active for each user
• An MDMP system is recommended if:
  1. one or more additional code pages are required to add languages to your existing installation
  2. a blended code page cannot support the combination of languages you need for a new installation. For example, an MDMP system with the code pages 1100 and 8000 allows German and Japanese users to log on to the same R/3 system in their respective languages

Language Combinations before Unicode
Example
• Each user can access only one code page at a time: a user who logs on as a Japanese user cannot enter German characters, and German characters in the database will not be displayed correctly
[Figure: front ends in Japan and Germany, application server and DB; code pages 1100 (ISO-1) and 8000 (SJIS)]

Language Combinations before Unicode
Example
[Figure: Japanese user and German user]

Language Combinations before Unicode
Please note:
• It is possible for a user to log on in German and then manipulate the character set and font settings so that he or she can enter what appear to be Japanese characters; these characters will not be stored correctly in the database, and the data will be corrupt
• A user who wants to enter Japanese, for example, must log on in Japanese

Language Combinations before Unicode
Please note:
• To ensure that no data corruption occurs, the following restrictions must be observed:
  • Global data must contain only 7-bit ASCII characters, which are in all code pages
  • Users may use only the characters of their log-on language or 7-bit ASCII
  • Batch processes must be assigned the correct user ID and language
  • EBCDIC code pages are not supported
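The restrictions above can be illustrated with a small Python check. This is not an SAP tool: the helper names are hypothetical, and the Python codecs `latin-1` and `shift_jis` are assumed here as stand-ins for the code pages 1100 (ISO-1) and 8000 (SJIS).

```python
def is_seven_bit_ascii(text: str) -> bool:
    """True if the text is safe as global data under every code page."""
    return all(ord(ch) < 0x80 for ch in text)


def storable_in(text: str, code_page: str) -> bool:
    """True if every character of the text exists in the given code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for text in ("Material 4711", "Gr\u00f6\u00dfe", "\u65e5\u672c\u8a9e"):
        print(repr(text), is_seven_bit_ascii(text),
              storable_in(text, "latin-1"), storable_in(text, "shift_jis"))
```

Only the pure ASCII text is storable under both code pages, which is why global data is restricted to 7-bit ASCII.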

Recommendations from SAP (w/o Unicode)
• In general, using a single standard code page for new installations and upgrades is the optimal decision
• If additional languages or language combinations are needed, SAP recommends Unambiguous Blended Code Pages for new installations and MDMP for existing installations
• Unambiguous Blended Code Pages support only certain language combinations, so an MDMP setup may be the only possibility for new installations as well

Unicode-compliant SAP products
• Currently, all Unicode installations require written permission from SAP and are carried out as customer projects together with SAP, except for new installations of R/3 Enterprise Extension Set 2.0

Unicode-compliant SAP products (SAP Note 79991)
✓ SAP Web Application Server (≥ 6.20)
✓ mySAP Customer Relationship Management (CRM)
  • The Unicode version of mySAP CRM 4.0 is available via Ramp-Up
✓ mySAP Supply Chain Management (SCM)
  • The Unicode version of mySAP SCM 4.0 is available via Ramp-Up
✓ mySAP Supplier Relationship Management (SRM)
  • The Unicode version of mySAP SRM 4.0 is available via Ramp-Up
  • Conversions (with or without MDMP) of existing SRM installations

Unicode-compliant SAP products (SAP Note 79991)
✓ mySAP Business Intelligence (BW)
  • The Unicode version of mySAP BW 3.5 is available via Ramp-Up
  • The conversion of existing BW installations as customer project
  • SAP Note 643813 has a collection of all relevant SAP notes concerning Unicode-based SAP BW installations
✓ mySAP Product Lifecycle Management (PLM)
  • The Unicode version of mySAP PLM 4.0 is available via Ramp-Up
✓ SAP R/3 Enterprise (Ext. 1.10 & higher)
✓ SAP Exchange Infrastructure

When/why do customers need Unicode?
• Global businesses require IT systems that support multilingual data without any restrictions → e.g. customers with one worldwide central SAP system
• Web interfaces open the door to a global customer base, and IT systems must consequently be able to support multiple local languages simultaneously

When/why do customers need Unicode?
• With J2EE integration, mySAP components fully support web standards, and with Unicode they can now take full advantage of XML and Java
• Only Unicode makes it possible to seamlessly integrate heterogeneous SAP and non-SAP system landscapes → NetWeaver

Technology in Depth

3. Technology in Depth
• Unicode & Operating Systems
• Unicode & Databases
• SAP Unicode-based Code Pages
• How to Unicode-enable a program
• Unicode-enabled ABAP
• Migrating to Unicode enabled ABAP
• Unicode Conversion, IMIG Lab Test
• SAP System-to-System communication
• Printing & Output Management

Unicode & Operating Systems – HP-UX
• HP-UX has been Unicode-enabled since version 10.x
• All Unicode locales in the HP-UX operating environment are based on the UTF-8 format
• Each locale includes a base language in the UTF-8 code set and the regional data related to this base language
• This includes local formatting rules, text messages, help messages, and other related files
• Each locale also supports several other scripts for input, display, code conversion, and printing

Unicode & Operating Systems – Windows
• Some Unicode support has been included in Microsoft Windows since Windows 95 and Windows NT 4
• Windows 2000 and Windows XP/2003 are based on Unicode instead of the ANSI or WGL4 character sets
• Before Windows 2000, your version of Windows may have used a different character set if you live in a country such as Egypt, Greece, Israel, Russia or Thailand that uses a non-Latin alphabet

Unicode & Operating Systems – Windows
• The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets
• This caused and still causes problems when moving documents between operating systems such as DOS, Windows, Mac OS and UNIX, or when electronically exchanging documents that were created on computers using different character sets

Unicode & Operating Systems – Linux
• Before UTF-8 emerged, Linux users all over the world had to use various language-specific extensions of ASCII
• Most popular were ISO 8859-1 and ISO 8859-2 in Europe, ISO 8859-7 in Greece, KOI-8 / ISO 8859-5 / CP1251 in Russia, EUC and Shift-JIS in Japan, BIG5 in Taiwan, etc.
• This made the exchange of files difficult, and application software had to deal with various small differences between these encodings

Unicode & Operating Systems – Linux
• Because of these difficulties, major Linux distributors and application developers have started to phase out these legacy encodings in favor of UTF-8
• UTF-8 support has improved dramatically over the last few years, and more and more people now use UTF-8 on a daily basis in
  • text files (source code, HTML files, email messages, etc.)
  • file names
  • standard input and standard output, pipes
  • …

Unicode & Operating Systems – Linux
• In UTF-8 mode, terminal emulators (such as xterm) transform every keystroke into the corresponding UTF-8 sequence and send it to the stdin of the foreground process
• Similarly, any output of a process on stdout is sent to the terminal emulator, where it is processed with a UTF-8 decoder and then displayed using a 16-bit font

Unicode & Operating Systems – Linux
• Before you start experimenting with UTF-8 under Linux, update your installation to a recent distribution with up-to-date UTF-8 support
• This is particularly the case if you use an installation older than SuSE 8.1 or Red Hat 8.0
• Before these releases, UTF-8 support was too limited and experimental to be recommended for daily use

Little vs. Big Endian
• UCS and Unicode are first of all just code tables that assign integer numbers to characters
• There are several alternatives for how a sequence of such characters, or their respective integer values, can be represented as a sequence of bytes
• The two most obvious encodings store Unicode text as a sequence of either 2-byte or 4-byte units

Little vs. Big Endian
• The official terms for these encodings are UCS-2 and UCS-4, respectively
• Unless otherwise specified, the most significant byte comes first in these (Big Endian convention)
• An ASCII or Latin-1 file can be transformed into a UCS-2 file by simply inserting a 0x00 byte in front of every ASCII byte
• For a UCS-4 file, three 0x00 bytes have to be inserted before every ASCII byte instead
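The zero-padding rule can be tried out in Python. The helper name `widen` is hypothetical; the results are compared with Python's `utf-16-be` and `utf-32-be` codecs, which coincide with big-endian UCS-2 and UCS-4 for ASCII and Latin-1 text.

```python
def widen(data: bytes, width: int) -> bytes:
    """Turn ASCII/Latin-1 bytes into big-endian units of `width` bytes each."""
    padding = bytes(width - 1)              # width - 1 bytes of 0x00
    return b"".join(padding + bytes([b]) for b in data)


if __name__ == "__main__":
    text = "SAP"
    print(widen(text.encode("latin-1"), 2).hex())   # UCS-2, big endian
    print(widen(text.encode("latin-1"), 4).hex())   # UCS-4, big endian
    print(text.encode("utf-16-le").hex())           # little endian for comparison
```

In the little-endian form the 0x00 byte follows each ASCII byte instead of preceding it.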

Little vs. Big Endian

Character  Unicode Scalar Value  UTF-8 / CESU-8  UTF-16 [Little Endian]  UTF-16 [Big Endian]
A          U+0041                41              41 00                   00 41
Ä          U+00C4                C3 84           C4 00                   00 C4
α          U+03B1                CE B1           B1 03                   03 B1
א          U+05D0                D7 90           D0 05                   05 D0
晓         U+6653                E6 99 93        53 66                   66 53

Unicode & Databases
Supported Databases by SAP (WAS 6.20)

Database     Win2K  HP-UX  Solaris  AIX  OS/400  OS/390  Linux
SQL Server   P      --     --       --   --      --      --
Oracle       P      P      P        P    --      --      P
DB2          P      P      P        P    P       ?       P
SAP DB       P      P      P        P    --      --      P

P = Available   ? = Currently not available   -- = Unsupported in general

Unicode & Databases

Manufacturer  Version  Encodings
SQL Server    2000     UTF-16
Oracle        7.2      UTF-8
              8        UTF-8
              9i       UTF-8 / UTF-16
              10g      UTF-8 / UTF-16
DB2           AIX      CESU-8
              AS400    UTF-16
SAP DB        7.0      UTF-16
              8.0      UTF-8

SAP Unicode-based Code Pages
• With the Unicode enablement of mySAP.com components (check chapter #1), the old code page management had to be changed
• Instead of using SAP character numbers, all code pages are now based on Unicode character IDs
  → 5-digit SAP character numbers are no longer adequate
• This change is valid for both Unicode and non-Unicode systems!

SAP Unicode-based Code Pages
• The connection between SAP character number and Unicode character ID is found in table TCP01
• You can see the connection in the SPAD character section
• NOTE: not every character has a corresponding Unicode character ID!

SAP Unicode-based Code Pages
• The migration of all SAP code pages from the old to the new format was done using report RSCP0126
• The definition of code pages is still in TCP00
• Customers must migrate their own code pages (9xxx) themselves using RSCP0126!

How to Unicode-enable a program
• Separate Unicode and non-Unicode versions of R/3, built from one ABAP source
• Non-Unicode R/3
  • 1 character = 1 byte (types C, N, D, T, STRING)
  • Non-Unicode kernel
  • Non-Unicode database
• Unicode R/3
  • 1 character = 2 bytes → UTF-16 (types C, N, D, T, STRING)
  • Unicode kernel
  • Unicode database
• No explicit Unicode data type in ABAP
• Single ABAP source for Unicode and non-Unicode systems

How to Unicode-enable a program
• The major part of ABAP coding is ready for Unicode without any changes
• A minor part of ABAP coding has to be adapted to comply with Unicode restrictions (e.g. syntactical restrictions)

How to Unicode-enable a program
• Program attribute "Unicode checks active"

Unicode Enabled ABAP
Design Goals
• Platform independence
  → Identical behavior on Unicode and non-Unicode systems
• Highest level of compatibility with the pre-Unicode world
  → Minimize costs for Unicode enabling of ABAP programs
Main Features
• Clear distinction between character and byte processing: 1 character <> 1 byte

Unicode Enabled ABAP
• ABAP lists: difference between memory length and display length
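The difference between memory length and display length can be approximated in Python. This is only a sketch under assumptions: it is not how ABAP lists compute their layout, the helper names are hypothetical, and display width is estimated from the Unicode East Asian Width property (wide and fullwidth characters count as two columns).

```python
import unicodedata


def memory_bytes_utf16(text: str) -> int:
    """Bytes the text occupies in UTF-16."""
    return len(text.encode("utf-16-be"))


def display_columns(text: str) -> int:
    """Approximate number of fixed-width columns needed to show the text."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


if __name__ == "__main__":
    for text in ("ABC", "\u65e5\u672c"):
        print(repr(text), len(text), memory_bytes_utf16(text), display_columns(text))
```

Three Latin letters need 6 bytes and 3 columns, while two former double-byte characters need 4 bytes and 4 columns, so list layouts that assume one column per character have to be checked.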

Migrating to Unicode enabled ABAP
Step 1
• In the non-Unicode system
  • Adapt all ABAP programs to the Unicode syntax and runtime restrictions
  • Set the attribute "Unicode enabled" for all programs

Migrating to Unicode enabled ABAP
Step 2
• Set up a Unicode system
  • Unicode kernel + Unicode database
  • Only ABAP programs with the Unicode attribute are executable
• Do runtime tests in the Unicode system
  • Check for runtime errors
  • Look for semantic errors
  • Check the ABAP list layout with former double-byte characters

Migrating to Unicode enabled ABAP
Use UCCHECK to analyze your applications:
• Remove errors
• Inspect places that cannot be analyzed statically (optional)
  • Untyped field symbols
  • Offset with variable length
  • Generic access to database tables
• Set the Unicode program attribute using UCCHECK or SE38 / SE24 / ...
• Do additional checks with SLIN (e.g. matching of actual and formal parameters in function modules)

Migrating to Unicode enabled ABAP

Upgrade to Unicode
• With Unicode, there are no limitations on users, and all languages in the ISO 639 standard can be used
• Unicode is technically supported as of Basis Release 6.20; see Note 0379940 for more information
• A single code page system (standard or Unambiguous Blended Code Page) can be upgraded to Unicode using the normal upgrade method

Unicode Conversion Roadmap
Preparation
• During preparation, topics such as
  • additional hardware requirements,
  • downtime issues,
  • Unicode-enabling of customer developments,
  • and the special treatment of MDMP systems
  have to be taken into consideration

Unicode Conversion Roadmap
Conversion
• The Unicode conversion process is based on a system copy; during this process, the database conversion and system shutdown/restart are as automated as possible
• For small to mid-size databases (< 1 TB), this is based on an SAP unload/reload of the complete database; minimum downtime tools will be used for larger databases

Unicode Conversion Roadmap
Post-Conversion
• Once the Unicode system is up and running, you need to
  • verify data consistency on a scenario basis,
  • and carry out general integration testing
• For systems that support multiple languages, special emphasis needs to be placed on cross-language handling during the test phase
• SAP provides correction tools that can be used if the conversion did not run properly

Unicode Conversion Roadmap
Post-Conversion
• Additional tool: SAP Data Management for reducing database size and growth
• To keep your database costs in check, the SAP Data Management service frees up valuable database resources by showing you how to reduce the size and growth of your database, typically by 25 %

Unicode Conversion at a Glance
Preparation
• Set up the Unicode conversion project
• Check prerequisites
• Data analysis for downtime minimization, special MDMP treatment
• Enabling of customer developments
Conversion
• Highly automated
• System will be down during database conversion
• Unload/reload process for small databases
• Minimum downtime tool for large databases
Post-Conversion
• Unicode system is up and running
• Verification of data consistency
• Integration testing focused on language handling

Upgrade Paths to Unicode (R/3 Enterprise)
[Figure: source systems R/3 3.1i, 4.0b, 4.5b, 4.6b and 4.6c → direct upgrade → target system R/3 Enterprise non-Unicode → conversion → R/3 Enterprise Unicode]
• First upgrade, then conversion to Unicode
• R/3 Enterprise Ramp-Up started 2002-07
• Unicode availability follows a phase of restricted shipment with pilot customers

Upgrade Paths to Unicode (BW 3.1)
[Figure: BW 2.0B, BW 2.1C and BW 3.0 → BW 3.1 non-Unicode → conversion → BW 3.1 Unicode]
• Interfacing R/3 MDMP on a project basis only
• Unicode BEXGUI restrictions apply
• BW 3.1 Ramp-Up starting 2002-12

Upgrade Paths to Unicode (CRM 3.1)
[Figure: CRM 2.0B, CRM 2.0C and CRM 3.0 → CRM 3.1 non-Unicode → conversion → CRM 3.1 Unicode]
• Selected scenarios only ↔ cooperation with SAP GBU CRM required
• CRM 3.1 Ramp-Up starting 2002-12


Prerequisites, special MDMP treatment
• OSS Note 548016
  Conversion from Unicode to non-Unicode is not possible.
  The Unicode conversion of MDMP systems and of ambiguous code page systems (code page numbers 6100, 6200 and 6500) is only supported on a project basis with SAP involvement
• OSS Note 543715
  The Unicode conversion of a BW 3.1 system requires additional steps regarding the system copy
• OSS Note 573044
  If you are using HR functionality within R/3 Enterprise, additional steps are also mandatory

6.30 Unicode & MCOD
[Figure: systems QA1 and TC2, each with an ABAP stack (non-Unicode/Unicode) and a Java stack (Unicode); database schemas SAPQA1, SAPQA1DB, SAPTC2 and SAPTC2DB]
• With SAP WebAS 6.30, a database abstraction layer for the Java stack was introduced: OpenSQL for Java
• Tables of the Java stack are stored in the same database instance as the tables of the ABAP stack, in two different schemas (except Informix)
• The concept of MCOD installations is fully supported by the combined stack of ABAP and Java


Unicode Conversion - IMIG
Whitepaper: "SAP R/3 incremental migration test"

http://saphpcc.bbn.hp.com/Global/Compet/migration/migration.HTM

SAP System-to-System Communication

SAP System-to-System communication
• SAP Web Application Server (≥ 6.20)
  • Only one source code exists for Unicode-based and non-Unicode-based systems → new developments can be exchanged smoothly
  • The interfaces (e.g. RFC) have been extended so that communication with other Unicode-based systems as well as with non-Unicode-based systems is possible. Furthermore, SAP provides standard tools for the installation of (and conversion to) Unicode-based systems that can also be used for checking and Unicode-enabling customer developments

SAP System-to-System communication
• Solid lines: the receiver can receive all characters
• Dotted lines: the receiver cannot receive characters that are not in its own code page. As long as you restrict the character set, however, data can be sent from everywhere to everywhere.
[Figure: Unicode R/3, non-Unicode R/3 systems (Latin-1, SJIS), an MDMP R/3 system (Latin-1 and SJIS) and the WWW, connected via http/RFC]

SAP System-to-System communication
RFC
• Unicode <-> Unicode
  • no problem
• non-Unicode <-> non-Unicode
  • unchanged behavior: the receiver converts the code page if possible
• Unicode <-> non-Unicode
  • the Unicode side converts from/to the code page of the non-Unicode side
  • MDMP is converted with a language key
  • system settings allow the configuration of error handling
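What "converting to the code page of the non-Unicode side" with configurable error handling means can be illustrated by analogy with Python's codec error handlers. This is not SAP's RFC implementation or its settings; the function name `to_code_page` is hypothetical.

```python
def to_code_page(text: str, code_page: str, on_error: str = "strict") -> bytes:
    """Convert Unicode text to a legacy code page.

    on_error="strict" raises an error for characters missing in the target
    code page; on_error="replace" substitutes a question mark instead.
    """
    return text.encode(code_page, errors=on_error)


if __name__ == "__main__":
    message = "Tokyo \u6771\u4eac"
    print(to_code_page(message, "shift_jis"))            # all characters available
    print(to_code_page(message, "latin-1", "replace"))   # kanji replaced by '?'
    try:
        to_code_page(message, "latin-1")                 # strict: conversion fails
    except UnicodeEncodeError as exc:
        print("conversion error:", exc.reason)
```

Whether a conversion error should abort the call or replace the character is the kind of decision such error-handling settings have to express.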

SAP System-to-System communication
RFC (SM59) – Unicode <-> non-Unicode

Printing & Output Management
What is an SAP device type?
• A configuration file for the SAP printer driver that ensures proper functionality between the SAP data stream and the printer or output device to which the data is sent
Printer drivers & device types
• In R/3, a distinction is made between "printer driver" and "device type"
• A device type consists of a variety of attributes defined for an output device
• One of these attributes is the printer driver to be used by SAPscript (the R/3 forms processor) for this particular printer

Printing & Output Management
• Device types cover aspects such as control commands for font selection, page size, character set selection, character set used and so on
• A device type must be specified for every new printer defined in the SAP environment to enable direct printing from the SAP applications
• Device types are created by SAP for the entire HP LaserJet printer family on the basis of PCL5, PCL6 and PostScript
• SAP develops, tests and supports device types for HP products, which can be found here: http://h40045.www4.hp.com/printing_solutions/Device_Types.html


Printing & Output Management
• at present, there are five SAPscript printer drivers

They include:
• HP-PCL5 (for example, HP LaserJet 3, 4, 5, 6 series)
• PostScript printers (PS level 2)
• PRESCRIBE (for example, Kyocera FS-1500)
• device types SWIN/SAPWIN/xxSWIN/xxSAPWIN


Printing & Output Management
Unicode Device Types
• LEXMARK is going into HP accounts, claiming that only LEXMARK could support SAP UNICODE printing.

Background:
• in order to support UNICODE character sets on an HP printer, customers need a UNICODE-compliant printer and an SAP UNICODE device type
• UNICODE-compliant printers are defined by firmware support for UTF-8 and/or UTF-16 and by UNICODE fonts loaded on the printer
• today LEXMARK is the preferred vendor for SAP UNICODE printing


Printing & Output Management
Solution for HP
• all OZ-based printers (LJ2300 and higher) support UNICODE UTF-16 fonts in PCL6 by default
• the LJ2300, CLJ9500 and future products will support UTF-8 fonts in PCL5
• a firmware roll is planned so that all current OZ-based printers (LJ4200/4300, LJ9000, CLJ4600, CLJ5500) also support UTF-8 in PCL5

• furthermore the UNICODE fonts need to be loaded on the printer (e.g. stored on internal hard-disc)

• today we have a UNICODE-prototype-solution available to print from an SAP environment

• for more information, contact Alan Cooke (U.S.) or Stephen Westberg (EMEA)


Sizing Information for Unicode-based SAP Systems


Sizing Info - General
The space required to encode a text, compared to the encodings currently in use (8 bits per character for European languages, more for Chinese/Japanese/Korean), is shown on the next slide.

This influences disk storage space and network download speed (when no form of compression is used)


Sizing Info - General
UTF-8
No change for US ASCII, just a few percent more for ISO-8859-1, 50% more for Chinese/Japanese/Korean, 100% more for Greek and Cyrillic

UCS-2 and UTF-16
No change for Chinese/Japanese/Korean. 100% more for US ASCII and ISO-8859-1, Greek and Cyrillic

UCS-4
100% more for Chinese/Japanese/Korean. 300% more for US ASCII and ISO-8859-1, Greek and Cyrillic
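
These percentages can be checked with a short Python sketch. The sample strings and the helper name `overhead_percent` are assumptions made for illustration; `utf-32-le` stands in for UCS-4, and the `-le` codec variants are used so that no byte order mark is counted.

```python
def overhead_percent(text: str, legacy_codec: str, unicode_codec: str) -> float:
    """Extra bytes needed by a Unicode encoding, in percent of the legacy size."""
    legacy_size = len(text.encode(legacy_codec))
    unicode_size = len(text.encode(unicode_codec))
    return 100 * (unicode_size - legacy_size) / legacy_size


# Illustrative samples: (text, legacy code page it was stored in)
samples = {
    "US ASCII": ("Unicode", "ascii"),
    "Greek": ("αβγδ", "iso8859_7"),
    "Japanese": ("日本語", "shift_jis"),
}

for name, (text, legacy) in samples.items():
    for target in ("utf-8", "utf-16-le", "utf-32-le"):
        extra = overhead_percent(text, legacy, target)
        print(f"{name:9} {legacy:10} -> {target:9}: +{extra:.0f}%")
```

Real text mixes scripts, digits and punctuation, so measured values for production data will fall between these pure-script cases.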


Expected Hardware Requirements
• Increase of CPU requirements
  • Depending on existing solution:
    ISO-LATIN1 (ASCII) -> Unicode: +30%
    Double-Byte/MDMP -> Unicode: +<5%
• Increase of memory requirements
  • Increase of memory requirements depending on underlying DB (+~50%)
  • Application Server internally based on UTF-16; DB either UTF-8, CESU-8 or UTF-16


Unicode Conversion Demo

JAVA Applet Demo


Expected Hardware Requirements
• Database growth depending on
  • DB Unicode encoding schema (e.g. CESU-8, UTF-16)
  • Languages in use

[Figure: bytes needed to store sample characters (A, Ä and a third character lost in extraction) under 1100, 8000, CESU-8 and UTF-16; one box = 1 byte]

Encoding        Manufacturers                             Additional Storage Req's
UTF-16          SQL Server, DB/2 (AS400), SAP DB (7.0)    60-70%
UTF-8, CESU-8   Oracle, SAP DB (8.0), DB/2 (AIX)          35%

• Network load: (draft results) <7% for Latin-1, about 15% for Japanese, 25% for other Asian languages


Expected Hardware Requirements

NON-Unicode

R/3 Release   4.0   4.5    4.6c                  4.7 (6.20) non-Unicode
CPU           1     +20%   +15%                  +5%
Memory        1     +20%   DB: +20%; App: +10%   +5%
Disk          1     +10%   +10%                  +10%


Expected Hardware Requirements

Unicode

R/3 Release   4.7 (6.20) non-Unicode   4.7 with Unicode
CPU           1                        +30% to 35%
Memory        1                        +50%
Disk          1                        +~35% (UTF-8), +60-70% (UTF-16)
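
As a rough worked example, the increases quoted above can be applied to an existing 4.7 non-Unicode baseline. The sketch below is only arithmetic on those figures; the names `UNICODE_INCREASE_PCT` and `estimate`, and the baseline values, are assumptions made for illustration, not an SAP sizing tool.

```python
# Increase in percent (low, high) when moving 4.7 non-Unicode -> 4.7 Unicode,
# taken from the figures on the slide.
UNICODE_INCREASE_PCT = {
    "cpu": (30, 35),
    "memory": (50, 50),
    "disk_utf8": (35, 35),
    "disk_utf16": (60, 70),
}


def estimate(baseline: float, resource: str) -> tuple[float, float]:
    """Return the (low, high) requirement after Unicode conversion."""
    low, high = UNICODE_INCREASE_PCT[resource]
    return (baseline * (100 + low) / 100, baseline * (100 + high) / 100)


# Hypothetical baseline: 1000 GB database, 64 GB memory.
print("disk, UTF-8 :", estimate(1000, "disk_utf8"))
print("disk, UTF-16:", estimate(1000, "disk_utf16"))
print("memory      :", estimate(64, "memory"))
```

The result is a planning range only; the actual growth depends on the database and on the languages in use.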
