Documentum and UTF-8: Converting Content Management Software Product Line to Unicode

18th International Unicode Conference
27 April 2001
Donald Ziff
Documentum Proprietary


Agenda

• What is Documentum?

• Documentum’s I18N Problem

• How Unicode UTF-8 Saved the Day

• Other Success Factors

• Demo


About Documentum

• Documentum: NASDAQ “DCTM”

• The Leader in Web and Enterprise Content Management Solutions

• More than $128M in revenue in 1999; more than 800 employees

• More than 900 Global 2000 customers with strong vertical focus

• More than 25 offices in more than 10 countries


DCTM’s I18N Problem

• Everyone agrees: we need I18N to fuel growth – especially in Asia

• Asian-certified product much more important than multi-lingual
  – Although demand for multi-lingual is growing…

• So why not I18N?


I18N Perception Problems

• Too Difficult – won’t fit into a development cycle

• Too much Overhead – multiplies QA and Support

• Not Sexy – no new functionality

Let’s look at these problems…


“I18N is too difficult”

Product Layers:

• Server (built on RDBMS + Verity)

• DMCL: Client Library (C++)

• DFC: Foundation Classes (Java)

• DTC: Desktop Client – Win32 end-user client

• WDK: Web Development Kit

• RightSite: Legacy Web-Server Integration

• Web Publisher: Web Content Management App

• Legacy clients: Workspace (Win32), Intranet


History Lesson

• Server v3.1.6.INT, created by consultants for Japanese market, was expensive and time-consuming
  – 3.1.6.INT attempted to internationalize all the layers in the DCTM architecture at once

• 4.0 was released without I18N changes

• 4.1 followed; the deltas from 3.1.6 to 3.1.6.INT became hard to apply…


“I18N requires too much overhead”

• The DCTM server requires pharmaceutical-strength certification

• Dimensions of certification:
  – 3 RDBMS platforms: Oracle, Sybase, SQL Server
  – 4 Server OS’s: NT, Solaris, HPUX, AIX

• The 3.1.6.INT architecture introduced new dimensions, leading us to…


Certification Hell!

• New certification dimensions:
  – 5 DCTM Server code-pages
  – 5 RDBMS code-pages

• Market requires another dimension:
  – 5 Server OS Localizations

• 125 new combinations (5 × 5 × 5) times 12 old (3 × 4) = 1500 certs!

• Exaggeration, of course… But still…
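
The arithmetic behind the 1500 figure can be checked with a short sketch. The dimension sizes come from the slides; the variable names are illustrative only and are not part of any Documentum tooling.

```python
from math import prod

# Original certification dimensions (from the "overhead" slide).
old_dimensions = {
    "rdbms_platform": ["Oracle", "Sybase", "SQL Server"],
    "server_os": ["NT", "Solaris", "HPUX", "AIX"],
}

# Dimensions added by the 3.1.6.INT architecture and by market demand.
# Only the counts are given on the slide, so only counts are used here.
new_dimension_sizes = {
    "server_code_page": 5,
    "rdbms_code_page": 5,
    "server_os_localization": 5,
}

old_certs = prod(len(values) for values in old_dimensions.values())
new_factor = prod(new_dimension_sizes.values())
total_certs = old_certs * new_factor

print(old_certs, new_factor, total_certs)
```

Each independent dimension multiplies the matrix, which is why removing even one dimension (for example, by fixing a single internal code page) shrinks the total by a whole factor.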


“I18N not sexy”

• DCTM is a growth company, needs sizzle as well as steak

• I18N grows markets, but doesn’t add much to marketing message

• To be fair: new functionality is not just “sexy” – it is essential to DCTM’s continued survival

• Other priorities will move to the top…


DCTM’s I18N Requirements

• Crucial need: support Asia from the main code-line. One binary for the world

• Backward compatibility essential

• Multi-lingual features would be a side-benefit. High on the wish list for a few key customers

• I18N project must be scoped down to be achievable


How UTF-8 Saved the Day

• UTF-8 moves safely through the server because anything that looks like ASCII actually is

• Standardizing on UTF-8 as the only supported internal code-page cuts down certification matrix


Lessons from Double-Byte Experiments

• EUC-KR: 4.1 server works (basically)

• SJIS: problems! double-byte characters whose second bytes are ASCII: \ ` |

• Lessons:
  – Non-ASCII moves through the server safely
  – String handling need not be double-byte aware, if ASCII always means ASCII

• Solution: UTF-8!
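
The Shift-JIS hazard described on this slide can be reproduced in a few lines. This is a minimal sketch using Python's standard codecs as a stand-in for the server's byte-oriented C/C++ string handling; the helper name `ascii_range_bytes` is made up for illustration. Katakana "so" (U+30BD) is used because its Shift-JIS encoding ends in 0x5C, the ASCII backslash.

```python
def ascii_range_bytes(ch: str, encoding: str) -> list:
    """Return the bytes below 0x80 in the encoding of one non-ASCII character."""
    return [b for b in ch.encode(encoding) if b < 0x80]

KATAKANA_SO = "\u30bd"  # a well-known Shift-JIS problem character

# In Shift-JIS the trail byte falls in the ASCII range (0x5C, backslash).
sjis_hits = ascii_range_bytes(KATAKANA_SO, "shift_jis")

# In UTF-8 every byte of a non-ASCII character is 0x80 or above.
utf8_hits = ascii_range_bytes(KATAKANA_SO, "utf-8")

# A byte-oriented scan for a backslash gives a false match only in Shift-JIS.
sjis_false_match = b"\\" in KATAKANA_SO.encode("shift_jis")
utf8_false_match = b"\\" in KATAKANA_SO.encode("utf-8")

print(sjis_hits, utf8_hits, sjis_false_match, utf8_false_match)
```

Code that scans bytes for escape characters, quotes, or path separators therefore misfires on Shift-JIS input, while the same code sees no spurious ASCII bytes in UTF-8 input.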


UTF-8: ASCII is ASCII

• No need for special string handling
  – Server 3.1.6.INT replaced all standard C string handling with calls to 3rd-party library
  – With UTF-8, we stick with standard – yacc and other legacy tools work fine

• Greatly improved perception (and reality) of how difficult I18N would be
  – Now, it’s relatively low-impact


It’s UTF-8, dummy!

• Use UTF-8 everywhere, cut down on certification dimensions

• Provides safe character-handling for Asia

• Even though multi-lingual is not a requirement

• Easier to support


Other Success Factors

• Rely on RDBMS services to translate between RDBMS code-page and UTF-8

• Market research cut back on OS localization constraints

• Transcoding infrastructure


RDBMS transcodes to/from UTF-8

• Oracle and Sybase transcode automatically – SQL Server is a problem

• No need for new transcoding calls between Server and RDBMS – lower impact

• Upgrade customers have non-Unicode RDBMS – no need for them to convert

• One less certification dimension!


Cut back on Localized OS certs

• Limit RDBMS for Asia – for 4.2, just Oracle

• Localized OS certification not necessary for Europe


Transcoding Infrastructure

• Server must be aware of interface code-pages

• Transcoding done at the interfaces

• 3rd party transcoding used: Uniscape’s GlobalC
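
Transcoding at the interfaces might look roughly like the sketch below. It uses Python's built-in codecs purely as a stand-in; it does not reproduce the GlobalC API, and the function names `to_internal` and `to_client` are hypothetical. The assumption is that the server knows each client's code page and keeps everything in UTF-8 internally.

```python
INTERNAL_ENCODING = "utf-8"  # the single internal code page

def to_internal(data: bytes, client_code_page: str) -> bytes:
    """Transcode bytes arriving from a client into internal UTF-8."""
    return data.decode(client_code_page).encode(INTERNAL_ENCODING)

def to_client(data: bytes, client_code_page: str) -> bytes:
    """Transcode internal UTF-8 bytes back into the client's code page."""
    return data.decode(INTERNAL_ENCODING).encode(client_code_page)

# Example: a legacy client that speaks EUC-KR.
text = "\ud55c\uae00"  # two Hangul syllables used as sample text
from_client = text.encode("euc_kr")

stored = to_internal(from_client, "euc_kr")  # what the server core sees
returned = to_client(stored, "euc_kr")       # what goes back to the client

print(stored == text.encode("utf-8"), returned == from_client)
```

Keeping conversion at the boundary means the code between the two calls only ever handles one encoding, which is what lets the internal string handling stay unchanged.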


New I18N Architecture

[Architecture diagram. Components as labeled, with the encoding shown for each:]

• RDBMS (Unicode)
• Verity
• File System
• e-Content Server (UTF8)
• DMCL 4.2 (UTF8)
• DFC (Unicode)
• WDK (Unicode)
• DMCL ≤ 4.1 (NCS)
• Rightsite (NCS)
• ARP (NCS)
• Web Cache
• Clients: Intranet Client, Administrator, Web Publisher, WorkSpace, Custom WebApp, Desktop Client

Legend: Unicode; National Character Set (NCS)


Demo

• Demo – multilingual WDK

• If there’s time, a quick look at localized Desktop Client (Win32 Client)


Conclusion

UTF-8 was a crucial technology in DCTM’s I18N strategy:

• Provided an easy path for legacy C++

• Supported specific Asian languages consistently, minimizing certifications

• Prepared infrastructure for multi-lingual requirements