History of Code Pages and Unicode in SAP_v2.0


  • 8/14/2019 History of Code Pages and Unicode in SAP_v2.0


    History of Code Pages and Unicode in SAP

    Sumit Kothiyal, Basis Consultant


    History of Code pages in SAP

    With the need for Multi National Language Support (MNLS) in SAP, SAP developed blended code pages as a workaround for the problems MNLS raised. It also suggested the MDMP solution, which was not very popular at the time. In the meantime, MDMP has become more stable and is preferred to blended code pages. As of the Unicode-enabled Basis release 6.10, a universal solution to the problem is available.

    Pre-Unicode Solutions by SAP

    Single Code Page System

    Blended Code Page System (Release 3.0D)

    MDMP System Configuration (Release 3.1I)

    Language Combinations before Unicode

    It is also possible to specify a customer-specific language; this language must use one of the code pages that SAP supports. See Note 0112065 for more information.


    First question first: what is a code page?

    It is a good question to start with: we should understand what a code page is before looking at the different encoding techniques that SAP has implemented or used to store character data.

    Code Page: All data in the database, including characters, is stored as a sequence of bytes (numbers). For character data, a code page defines the mapping between a byte sequence and a character (a letter, symbol, ideograph, dingbat, etc.). A code page is a matrix of code points; each code point is the combination of the coordinate values for a given cell in the code page matrix (see example). A code page is used whenever character data is processed on the application server, displayed on the front end, or rendered by the printer. To avoid errors while processing character data, SAP uses code pages based on the ISO standard. Each code page is assigned a unique four-digit number.

    In the example code page, the character "A" has the code point '41' in hexadecimal notation, and the character "}" has the code point '7D'. The empty fields are reserved for non-printing characters, such as END OF LINE, or have not been assigned a character. The ASCII characters are shaded in the example.
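    The mapping between byte sequences and characters can be tried out with a short Python sketch. This is an illustration only: Python codec names such as latin-1 and iso8859_5 stand in for SAP's four-digit code page numbers, and the helper names code_point and character are hypothetical.

```python
# Illustration only: Python codec names stand in for SAP code page numbers.

def code_point(char, code_page):
    """Return the byte sequence of `char` in `code_page` as upper-case hex."""
    return char.encode(code_page).hex().upper()


def character(hex_bytes, code_page):
    """Return the character that `code_page` assigns to the given hex bytes."""
    return bytes.fromhex(hex_bytes).decode(code_page)


print(code_point("A", "latin-1"))    # 41
print(code_point("}", "latin-1"))    # 7D

# The same byte means different characters in different code pages.
print(character("E4", "latin-1"))    # ä
print(character("E4", "iso8859_5"))  # ф
```

    The last two lines show why the code page has to be known before a byte sequence can be interpreted: the byte E4 is a Latin letter in one code page and a Cyrillic letter in another.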


    Single Code Page System

    A single code page system uses one standard code page, which can support a specific set of languages. In a single code page system, all application servers and the database use one standard system code page. This may be a 1-byte code page such as Latin 1 (for Western Europe) or Latin 2 (for Eastern Europe), or a multi-byte Japanese code page. If your system landscape extends beyond one of these regions, however, a single code page system is no longer sufficient.

    If you are using single code page systems, the conversion to Unicode is straightforward and, in fact, mostly automated.
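    The regional limit of a single code page can be sketched in Python. The codec names below are Python's, used here as stand-ins for the Latin 1, Latin 2 and Japanese system code pages; the function name can_store and the sample texts are assumptions made for this illustration.

```python
def can_store(text, code_page):
    """Return True if every character of `text` exists in `code_page`."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


samples = {"German": "Grüße", "Polish": "Łódź", "Japanese": "東京"}

for code_page in ("latin-1", "iso8859_2", "shift_jis", "utf-8"):
    supported = [lang for lang, text in samples.items() if can_store(text, code_page)]
    print(code_page, "->", supported)
```

    Each regional code page covers only part of the samples, while a Unicode encoding such as UTF-8 covers all of them.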


    SAP Blended Code Pages (R/3 3.0D to R/3 4.6D)

    We start with blended code pages, as this was the first solution for multi-language support in SAP.

    From R/3 3.0D on, SAP application servers could run multi-byte blended code pages, which contain characters from several standard code pages. Blended code pages are not standard code pages, but SAP-customized pages created to support an increased number of possible language combinations in a single code page. Such an approach covers only a fixed set of language combinations, however, and does not allow any flexibility regarding additional code pages. There are two types of SAP blended code pages: ambiguous blended code pages and unambiguous blended code pages.

    Ambiguous Blended Code Pages

    Code page   Name                       Contents
    6100        SAP Unification            ISO1 + ISO2 + ISO7 + SJIS1
    6200        SAP Asian Unification      ASCII + SJIS1 + Asian
    6500        SAP Diocletian             ISO1 with ISO7 D7 and F7

    Unambiguous Blended Code Pages

    Code page   Name                       Contents
    6230        SAP Asian Unification T    trad. Chinese + SJIS1
    6240        SAP Asian Unification C    simp. Chinese + SJIS1
    6250        SAP Asian Unification K    Korean + SJIS1
    6300        SAP Eurojapan              ISO1 + SJIS1
    6400        SAP Silk Road              ISO7 + SJIS1
    6600        SAP Nagamasa               Thai + SJIS1
    6700        SAP Trans Siberian         ISO5 + SJIS1


    What is the difference?

    When you use an ambiguous blended code page, several characters can be assigned to one and the same byte sequence. Each character can be represented by different byte sequences. Put simply, two characters can share the same code point.

    When you use an unambiguous blended code page, each byte sequence is assigned exactly one character. Each character can be represented by different byte sequences. Put simply, each code point refers to exactly one character.
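    The notion of ambiguity can be made concrete with a toy model in Python. This is not how SAP constructed its blended code pages: the sketch simply overlays two standard single-byte code pages (Python codecs latin-1 and iso8859_7, chosen as assumptions) and checks whether any byte ends up with more than one character. The names blend and is_ambiguous are hypothetical.

```python
def blend(*code_pages):
    """Overlay single-byte code pages: map each byte to the set of characters it may mean."""
    table = {}
    for byte in range(0x100):
        for code_page in code_pages:
            try:
                char = bytes([byte]).decode(code_page)
            except UnicodeDecodeError:
                continue  # byte is unassigned in this code page
            table.setdefault(byte, set()).add(char)
    return table


def is_ambiguous(table):
    """A table is ambiguous if any byte is assigned more than one character."""
    return any(len(chars) > 1 for chars in table.values())


naive = blend("latin-1", "iso8859_7")
print(is_ambiguous(naive))             # True
print(sorted(naive[0xE4]))             # ['ä', 'δ']
print(is_ambiguous(blend("latin-1")))  # False
```

    In this toy model the byte E4 carries both a Latin and a Greek character, which is the situation the text calls ambiguous; a table in which every byte maps to exactly one character is unambiguous.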


    MDMP - Multiple Display/Multiple Processing

    History: The MDMP solution was introduced with R/3 3.1I when the blended code pages solution turned out to be no longer sufficient: the size of a code page limits the number and combination of languages that can be supported in a single code page system, as shown in the diagram on the previous slide. MDMP was initially treated as a temporary solution for R/3 systems, with the restrictions explained in SAP Notes 747036, 745030 and 73606. CRM, SCM, BI and other non-R/3 components never supported MDMP.

    Since Web AS 6.20, the standard code page technology for SAP systems is Unicode.

    Support: Existing MDMP installations are supported up to SAP NetWeaver 2004 (ERP 2004). With SAP NetWeaver 7.0 (ERP 6.0) and all higher releases, including all enhancement packages, MDMP is out of support; SAP Note 79991 provides the details of MDMP support. SAP systems with more than one system code page must be converted to Unicode before or during the upgrade to SAP NetWeaver 7.0 (ERP 6.0). See SAP Notes 838402 and 928729 for more information.


    Technical Information

    An MDMP system uses more than one code page in order to allow more languages in the system. The catch is that the characters used by these languages are not all in the same code page. The code page used on the application server is selected dynamically, based on the user's logon language. To sum up, only the characters in the active code page can be displayed properly, although they are stored correctly in the database.

    Consider it from the user's perspective: a user who wants to enter Japanese must log on in Japanese. To ensure that no data corruption occurs, the following restrictions must be followed: global data must contain only 7-bit ASCII characters, which are in all code pages; users may use only the characters of their logon language or 7-bit ASCII; and batch processes must be assigned the correct user ID and language.

    An example: MDMP can dynamically assign a code page, mapping hex values to natural-language characters, based on the code page that contains the language of the user's logon session. A Japanese user working in an R/3 MDMP system (logged on in Japanese) can view texts that were originally entered in Kanji (Japanese), but cannot correctly view text data originally entered by a user logged on in Russian. This is because the hex values that represent the Russian text would not map correctly to Kanji characters; the Japanese user would see garbage characters when viewing, for instance, a customer name that was entered with Russian characters.
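    The Russian/Japanese example can be reproduced in a few lines of Python. This is a simplified model, not SAP code: the dictionary LOGON_CODE_PAGE and the functions store and display are hypothetical, and Python codec names stand in for the code pages an MDMP system would select from the logon language.

```python
# Hypothetical mapping from logon language to the active code page.
LOGON_CODE_PAGE = {"EN": "latin-1", "RU": "iso8859_5", "JA": "shift_jis"}


def store(text, logon_language):
    """Encode text with the code page of the user who enters it."""
    return text.encode(LOGON_CODE_PAGE[logon_language])


def display(raw, logon_language):
    """Decode stored bytes with the code page of the user who views them."""
    return raw.decode(LOGON_CODE_PAGE[logon_language], errors="replace")


customer = store("Иванов", "RU")      # entered by a user logged on in Russian

print(display(customer, "RU"))        # Иванов
print(display(customer, "JA"))        # garbage: not the name that was entered
print(display(store("SAP", "RU"), "JA"))  # SAP (7-bit ASCII survives)
```

    The bytes in the database never change; only the interpretation does, which is why the text stresses logging on with the correct language.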


    End of Support for MDMP SAP Systems

    Restrictions in MDMP:

    Users can only use the characters of their logon language or 7-bit ASCII.

    Incorrect or faulty locales can lead to data corruption.

    English texts are "fixed" in one code page and therefore must be repeatedly translated.

    Handling errors are likely: for example, users must log on with the correct language, batch processes must be assigned the correct user ID and language, and the correct device type must be used.

    Global data (from tables without a language flag) must contain only 7-bit ASCII characters, which are in all code pages; otherwise data corruption can occur.

    As MDMP is an SAP proprietary solution, mixed MDMP data cannot be interpreted by most third-party products. Only RFC and BAPI communication is possible.

    SAP cannot guarantee that Java texts containing Unicode data are properly interpreted by MDMP systems. Java is always based on Unicode.

    Integration of WebDynpro and MDMP systems is problematic.

    SAP cannot guarantee that data coming from the internet containing Unicode data is properly interpreted by MDMP systems.

    The relationships between the language keys and code pages in MDMP systems are only well-defined within SAP systems.
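    The rule that global data must be 7-bit ASCII can be checked mechanically. The following Python sketch is an assumption-laden illustration: the function names are hypothetical, and the four Python codecs are merely examples of code pages that might coexist in an MDMP system.

```python
def is_7bit_ascii(text):
    """True if every character has a code below 128."""
    return all(ord(char) < 0x80 for char in text)


def same_in_all_code_pages(
    text, code_pages=("latin-1", "iso8859_2", "iso8859_5", "shift_jis")
):
    """True if `text` encodes to identical bytes in every listed code page."""
    encodings = set()
    for code_page in code_pages:
        try:
            encodings.add(text.encode(code_page))
        except UnicodeEncodeError:
            return False  # at least one code page lacks a character
    return len(encodings) == 1


for value in ("MATERIAL-0042", "Müller"):
    print(value, is_7bit_ascii(value), same_in_all_code_pages(value))
```

    A value that passes the first check also passes the second for these code pages, which is why 7-bit ASCII is safe for data that is read under any logon language.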


    Introduction of Unicode in SAP

    Language, the way in which we all communicate, is one of the most interesting features of human society. There are a great many languages, divided into many language families, and every language inevitably changes, even over a relatively short time, proving that the only constant is change. The reason is continuous communication between speakers of different languages who try to speak one common language, each introducing small changes. An Indian speaking British English, for example, brings a different modulation and intonation to it and, over time, adds words from his or her native language.

    A significant challenge during the fast-paced development of information technology and computers was therefore to encode language, and the characters associated with it, in a form suitable for machines, so that data could be stored and exchanged. Data exchange was, and still is, challenging, because standards must be defined to ensure the smoothest possible exchange between different computers and programs. With time, it became clear that the variety of formats, introduced mainly because of increasing globalization, was still unable to represent languages sufficiently well, and that errors even occurred during data exchange between heterogeneous IT platforms. The solution to this omnipresent problem was Unicode. For the first time, there was global agreement to create a uniform standard. The idea was the fixed assignment of one number to every character, which guarantees that texts in any language can be displayed and transmitted without error, both today and in the future.

    Today all SAP applications are available in Unicode-based versions, and new products such as SAP XI and SAP NetWeaver Portal are delivered only in Unicode versions. SAP plans to end support for obsolete solutions for combinations of languages and code pages in R/3, such as MDMP, single code pages and blended code pages, and is doing so step by step. SAP ERP 2005 no longer supports MDMP, and after 2007 all new installations of applications based on SAP NetWeaver will be possible only in Unicode.


    What is Unicode exactly?

    Unicode = universally encoded character set to store information from any language

    Unicode:

    Defines properties for each character

    Standardizes script behavior

    Provides a standard algorithm for bidirectional text

    Defines cross-mappings to other standards

    Defines a unique code value for every character, regardless of the platform, program or programming language used

    The Unicode Standard primarily encodes scripts rather than languages:

    Scripts comprise several languages that historically share the same set of symbols

    In many cases a script serves to write dozens of languages (e.g. the Latin script)

    In other cases one script corresponds to one language (e.g. Hangul)

    Additionally, it includes punctuation marks, diacritics, mathematical symbols, technical symbols, musical symbols, arrows, dingbats etc.

    In all, the Unicode Standard comprises more than 95,000 characters, ideograph sets and symbols.

    The Unicode Standard

    The Unicode Standard is a character coding system designed to support the worldwide interchange, processing and display of written text in the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages.
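    The unique code value and the per-character properties that Unicode defines can be inspected with Python's standard unicodedata module. The helper name describe and the choice of sample characters are assumptions made for this sketch.

```python
import unicodedata


def describe(char):
    """Collect a few of the properties Unicode defines for one character."""
    return {
        "code_point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char),
        "category": unicodedata.category(char),
        "bidi_class": unicodedata.bidirectional(char),
    }


for char in ("A", "א", "한"):
    print(describe(char))
```

    For "A" this reports U+0041, LATIN CAPITAL LETTER A, category Lu and bidirectional class L; the Hebrew letter alef reports bidirectional class R, the property that the bidirectional text algorithm relies on.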


    What is Unicode exactly? Contd.

    Where is Unicode used?

    The Unicode Standard has been adopted by many software and hardware vendors

    Most operating systems support Unicode

    Unicode is required for international document and data interchange, the Internet and the WWW, and therefore by modern standards such as:

    Java, C#, Perl, Python

    Markup languages such as XML, HTML, XHTML, MathML, WML etc.

    JavaScript

    LDAP

    CORBA etc.


    Unicode-compliant SAP products (SAP Note 79991)

    mySAP Business Intelligence (BW)

    The Unicode version of mySAP BW 3.5 is available via Ramp-Up

    The conversion of existing BW installations is carried out as a customer project

    SAP Note 643813 has a collection of all relevant SAP Notes concerning Unicode-based SAP BW installations

    mySAP Product Lifecycle Management (PLM)

    The Unicode version of mySAP PLM 4.0 is available via Ramp-Up

    SAP R/3 Enterprise (Ext. 1.10 & higher)

    SAP Exchange Infrastructure

    Why do we need Unicode?

    The answer to this question is fairly straightforward, as the following points explain:

    1. Unicode provides global support for IT systems that hold multilingual data, without any restrictions.

    2. Web interfaces open the door to a global customer base, so multiple regions and languages must be supported simultaneously.

    3. SAP has integrated J2EE and cannot fully support web standards without Unicode; with Unicode it can take advantage of XML and Java functionality.

    4. Only Unicode is able to integrate heterogeneous SAP and non-SAP system landscapes.


    Guidelines for Unicode Conversion Projects

    As explained earlier, all SAP applications are today available in Unicode-based versions, and new software products from SAP such as the SAP NetWeaver Exchange Infrastructure (SAP XI) or the SAP NetWeaver Portal are now delivered only as Unicode versions. Support for MDMP-based R/3 systems is also being terminated, so the need for Unicode systems has increased in order to globalize systems without any restrictions.

    Unicode Conversion: Below is a rough overview of the conversion of one SAP system, showing the phases of a conversion. The strategy for a conversion remains the same: preparation remains very important, followed by the conversion itself, and then the post-processing phase.


    Information Gathering, Evaluation and Analysis

    Before a Unicode conversion starts, we must gather as much information as possible and clarify the specific situation at the client. In general, cost and effort are the focus. It is essential to have a business justification for the conversion.

    The following points describe the factors to be taken into account in this step:

    Unicode conversion process and its outcome

    Acquiring relevant customer-specific information, such as:

    Overview of the system landscape (systems, releases, support packages, front-end software, and so on)

    Database sizes (in GB), the 50 largest tables, and the hardware configuration of all relevant systems

    Requirements pertaining to tolerable downtime for individual systems and their impact on the business

    Code page setup of all systems (MDMP, single code page, blended code page)

    Description and configuration of the interfaces between the systems and to non-SAP systems

    Existing add-on solutions in the systems (SAP and non-SAP)

    Number and type of existing custom and modified developments in SAP system.

    Existing rollout plans in other countries for the different systems

    Planned system mergers

    Possible conversion strategies


    Information Gathering, Evaluation and Analysis contd.

    Gathering experience of other customers

    Creating the first rough estimate of effort

    Defining the business case and creation of initial project plan

    Evaluating the consequences if the Unicode conversion is postponed; for more information, see SAP Note 79991


    Determining Factors of a Conversion Project

    Several factors influence the decision to convert to Unicode. The best approach is to check which of the following basic reasons apply before you start, or even plan, the conversion.

    Your company is planning to upgrade an existing MDMP system to SAP ERP 5.0 or ERP 6.0; ERP 6.0 does not support MDMP (see SAP Note 79991).

    You want to use English as the central logon language for all countries or languages.

    You want to use Java technologies such as ESS/MSS in the MDMP environment.

    You need support for dialects (such as Canadian French).

    You need to display certain characters that are not supported in MDMP.

    You need an Internet connection.

    You need Java integration.

    You need to consolidate systems with different code page configurations.


    Determining Factors contd.

    The duration of the project depends on many factors; a few of the main ones are shown below. It also depends, of course, on the availability of resources and their level of knowledge. As categorized below, the duration depends on the languages used, the SAP solution used and the platform used. The hardware requirements for Unicode are different, so they must also be examined in the planning stage and sized accurately.


    Determining Factors contd.

    As a minimum for the conversion of a three-system landscape, you can assume about four weeks of project runtime. On average, these projects take about three to four months. For very large MDMP systems with many custom ABAP objects or interfaces to other MDMP systems, the runtime can even exceed a year.


    Determining Factors contd.

    Specialists needed: A Unicode project needs not only Basis/NetWeaver experts but also expertise in ABAP enabling and in the interface area. Transactions SPUMG/SPUM4 and SUMG are generally executed by SAP NetWeaver/Basis experts. For the preparation of the system vocabulary, however, experts in vocabulary creation are needed. The export and import procedures and their optimization are comparable to an upgrade and require technical knowledge. Testing is generally the responsibility of the application team.


    Release Changes and Unicode Conversion

    We need to find the best possible way of combining the conversion and the upgrade of our system. In other words: how can an upgrade and a Unicode conversion be combined?

    Upgrades (release changes) and Unicode conversions are both projects that require a great deal of application testing. Although these are two logically independent steps, there is still the question of how well the two tasks can be combined. This is particularly interesting in an upgrade from a non-Unicode-capable release with MDMP to SAP ERP 6.0, because MDMP is no longer supported in the target release (see SAP Note 79991).

    The following strategies are possible for the upgrade and Unicode conversion:

    1. Separate projects: The upgrade is treated as a separate project from the Unicode conversion, or vice versa. This is the greatest possible separation of upgrade and Unicode conversion.

    2. Upgrade and Unicode conversion on the same weekend: This depends on whether the runtime of the upgrade and the conversion can be accommodated in one weekend. Normally the Unicode conversion itself takes around 40 hours of downtime; with the upgrade downtime added, it is highly unlikely that both procedures can be finished within the 48 hours of a weekend. Another drawback of this approach is the increased complexity of the project, for example in handling the ABAP objects during the upgrade from source release 4.6C.


    Release Changes and Unicode Conversion Contd.

    3. Upgrade and Unicode conversion on different weekends: It is possible to perform the conversion and the upgrade in one project, but on different weekends for the production system. Assuming that the upgrade is performed before the conversion, tests must be performed in both the non-Unicode and the Unicode systems, because in this case the non-Unicode system goes live on the new release. The advantages of this approach are that one sandbox system can be used for both the upgrade and the conversion, and that the tests, although performed twice, are performed shortly after one another in an identical procedure.

    4. Combined Upgrade and Unicode Conversion (CU&UC): The CU&UC method was developed primarily for MDMP customers who are on SAP R/3 4.6C and moving to the target release ECC 6.0; refer to SAP Note 928729 for more information. The major component of this approach is SPUM4, which is equivalent to SPUMG. The principle behind SPUM4 and SPUMG is that the transaction is run online during production operation. Because Transaction SPUMG runs for at least a matter of days for MDMP customers, running SPUMG in the target release during the downtime is impossible. SPUMG was therefore implemented in SAP R/3 4.6C as SPUM4, so that it can be run online under this release. Unicode enabling transactions such as UCCHECK, however, are not available under this release, so this work has to be done on the sandbox or the upgraded system, and the results can then be transported into production later. Refer to the diagram shown below for a sample procedure:


    Release Changes and Unicode Conversion Contd.

    5. Twin Upgrade and Unicode Conversion (TU&UC): As explained earlier, CU&UC cannot be performed from releases prior to SAP R/3 4.6C. For these releases, the TU&UC method has been developed and used successfully. In this method, a twin system is created as a copy of the production system and an upgrade is performed without Unicode conversion. Because transaction SPUM4 is available only on SAP R/3 4.6C, it cannot be used on earlier releases, so the idea is to upgrade the system to the target release, then use SPUMG, which is available starting from ECC 5.0, and then do the Unicode conversion. The results of SPUMG can then be transported to the upgraded production system, and SUMG can be used to make any corrections needed in the upgraded and Unicode-converted production system. For more information on TU&UC limitations and FAQs, see SAP Note 959698. Refer to the diagram below for a sample TU&UC procedure:
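    Two of the decisions described in the strategies above can be written down as a rough planning aid in Python. This is only a reading of the presentation, not SAP guidance: the 40-hour and 48-hour figures come from the text, while the release shortlist R3_RELEASES and both function names are hypothetical, and the applicable SAP Notes remain the authority on which method a given release supports.

```python
CONVERSION_HOURS = 40   # downtime the text quotes for a normal Unicode conversion
WEEKEND_HOURS = 48      # downtime window of one weekend

# Hypothetical shortlist of R/3 source releases, oldest first, for illustration.
R3_RELEASES = ["3.1I", "4.0B", "4.5B", "4.6B", "4.6C"]


def fits_one_weekend(upgrade_hours, conversion_hours=CONVERSION_HOURS,
                     window_hours=WEEKEND_HOURS):
    """Strategy 2: do upgrade and conversion downtime fit into one weekend?"""
    return upgrade_hours + conversion_hours <= window_hours


def combined_method(source_release):
    """Strategies 4 and 5: CU&UC needs SPUM4 (R/3 4.6C); earlier releases use TU&UC."""
    if source_release not in R3_RELEASES:
        raise ValueError(f"release {source_release!r} is not covered by this sketch")
    return "CU&UC" if source_release == "4.6C" else "TU&UC"


print(fits_one_weekend(upgrade_hours=12))  # False: 52 hours exceed the window
print(combined_method("4.6C"))             # CU&UC
print(combined_method("4.5B"))             # TU&UC
```

    Under these figures only an upgrade with eight hours of downtime or less would leave room for the conversion in the same weekend, which matches the text's assessment that the same-weekend strategy is rarely feasible.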


    Summary

    The focus of this presentation was to explain what character encoding is and what it implies in SAP, starting from single code pages, blended code pages (ambiguous and unambiguous) and MDMP in SAP systems, and the challenges involved in converting these systems to Unicode, including the different conversion procedures and their complexities. A new installation is relatively simple, because it hardly differs from the installation of a non-Unicode system. In the conversion of a three-system landscape, on the other hand, there are many different options for implementing Unicode, as explained in the later part of the presentation. We also looked at how to estimate the effort of a Unicode project and the factors involved: database size, the possible use of MDMP, the number of custom programs, and the type and number of interfaces all play significant parts. Finally, the comparison of a Unicode conversion with an upgrade project made it clear that, depending on the conditions, the Unicode conversion can be done smoothly with proper planning and pre-analysis of the impact. The difference between CU&UC and TU&UC was explained, and an overview of these conversion procedures was given.