Documentum Proprietary 1
18th International Unicode Conference
Documentum and UTF-8: Converting Content Management Software Product Line to Unicode
27 April 2001
Donald Ziff
Agenda
• What is Documentum?
• Documentum’s I18N Problem
• How Unicode UTF-8 Saved the Day
• Other Success Factors
• Demo
About Documentum
• Documentum: NASDAQ “DCTM”
• The Leader in Web and Enterprise Content Management Solutions
• More than $128M in revenue in 1999; more than 800 employees
• 900+ Global 2000 customers with strong vertical focus
• Over 25 Offices in 10+ countries
DCTM’s I18N Problem
• Everyone agrees: we need I18N to fuel growth – especially in Asia
• Asian-certified product much more important than multi-lingual
  – Although demand for multi-lingual is growing…
• So why not I18N?
I18N Perception Problems
• Too Difficult – won’t fit into a development cycle
• Too much Overhead – multiplies QA and Support
• Not Sexy – no new functionality
Let’s look at these problems…
“I18N is too difficult”
Product Layers:
• Server (built on RDBMS + Verity)
• DMCL: Client Library (C++)
• DFC: Foundation Classes (Java)
• DTC: Desktop Client – Win32 end-user client
• WDK: Web Development Kit
• RightSite: Legacy Web-Server Integration
• Web Publisher: Web Content Management App
• Legacy clients: Workspace (Win32), Intranet
History Lesson
• Server v3.1.6.INT, created by consultants for Japanese market, was expensive and time-consuming
  – 3.1.6.INT attempted to internationalize all the layers in the DCTM architecture at once
• 4.0 was released without I18N changes
• 4.1 followed, and the deltas from 3.1.6 to 3.1.6.INT became hard to apply…
“I18N requires too much overhead”
• The DCTM server requires pharmaceutical-strength certification
• Dimensions of certifications:
  – 3 RDBMS platforms: Oracle, Sybase, SQL Server
  – 4 Server OS’s: NT, Solaris, HPUX, AIX
• The 3.1.6.INT architecture introduced new dimensions, leading us to…
Certification Hell!
• New certification dimensions:
  – 5 DCTM Server code-pages
  – 5 RDBMS code-pages
• Market requires another dimension:
  – 5 Server OS Localizations
• 125 new × 12 old = 1500 certs!
• Exaggeration, of course… But still…
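The arithmetic on this slide can be checked in a few lines of Python. This is only an illustrative sketch: the dictionary keys paraphrase the slide, `matrix_size` is a hypothetical helper, and the last step assumes that a single internal code-page would reduce that one dimension to a count of 1.

```python
from math import prod

# Certification dimensions quoted on the slides (counts only).
old_dims = {"RDBMS platforms": 3, "server OSs": 4}
new_dims = {
    "DCTM Server code-pages": 5,
    "RDBMS code-pages": 5,
    "server OS localizations": 5,
}

def matrix_size(dims):
    """Number of combinations if every value of every dimension is certified."""
    return prod(dims.values())

old = matrix_size(old_dims)   # 3 * 4
new = matrix_size(new_dims)   # 5 * 5 * 5
print(old, new, old * new)    # 12 125 1500

# Assumption for illustration: one internal code-page collapses that dimension to 1.
single = dict(new_dims, **{"DCTM Server code-pages": 1})
print(old * matrix_size(single))  # 300
```

Because the dimensions multiply, removing any one of them shrinks the whole matrix by that dimension's count.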
“I18N not sexy”
• DCTM is a growth company, needs sizzle as well as steak
• I18N grows markets, but doesn’t add much to marketing message
• To be fair: new functionality is not just “sexy” – it is essential to DCTM’s continued survival
• Other priorities will move to the top…
DCTM’s I18N Requirements
• Crucial need: support Asia from the main code-line. One binary for the world
• Backward compatibility essential
• Multi-lingual features would be a side-benefit. High on the wish list for a few key customers
• I18N project must be scoped down to be achievable
How UTF-8 Saved the Day
• UTF-8 moves safely through the server because anything that looks like ASCII actually is
• Standardizing on UTF-8 as the only supported internal code-page cuts down certification matrix
Lessons from Double-Byte Experiments
• EUC-KR: 4.1 server works (basically)
• SJIS: problems! double-byte characters whose second bytes are ASCII: \ ` |
• Lessons:
  – Non-ASCII moves through the server safely
  – String handling need not be double-byte aware, if ASCII always means ASCII
• Solution: UTF-8!
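The Shift-JIS hazard described above can be reproduced with Python's standard codecs. This is a minimal sketch, not Documentum code: `ascii_trail_bytes` is a hypothetical helper, and the two sample characters were chosen because their Shift-JIS second byte is 0x5C (backslash).

```python
def ascii_trail_bytes(text, encoding):
    """Return ASCII-range bytes that occur inside encoded non-ASCII characters."""
    hits = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # genuine ASCII characters are not the problem
        hits.extend(b for b in ch.encode(encoding) if b < 0x80)
    return bytes(hits)

sample = "ソ表"  # both have 0x5C as their second Shift-JIS byte

print(ascii_trail_bytes(sample, "shift_jis"))  # b'\\\\' (two 0x5C bytes)
print(ascii_trail_bytes(sample, "utf-8"))      # b''

# A byte-oriented split on backslash cuts the Shift-JIS character in half...
print("ソ".encode("shift_jis").split(b"\\"))   # [b'\x83', b'']
# ...but leaves the UTF-8 encoding untouched.
print("ソ".encode("utf-8").split(b"\\"))       # [b'\xe3\x82\xbd']
```

In UTF-8 every byte of a non-ASCII character has its high bit set, so a byte below 0x80 can only ever be a real ASCII character.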
UTF-8: ASCII is ASCII
• No need for special string handling
  – Server 3.1.6.INT replaced all standard C string handling with calls to 3rd-party library
  – With UTF-8, we stick with standard string handling; yacc and other legacy tools work fine
• Greatly improved perception (and reality) of how difficult I18N would be
  – Now, it’s relatively low-impact
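To illustrate why byte-oriented legacy parsing keeps working, here is a small Python sketch of a scanner that walks the input one byte at a time, the way a C lexer would. `split_fields` and the sample record are hypothetical, invented for this example; nothing here is taken from the Documentum server.

```python
def split_fields(raw: bytes, sep: int = ord(",")) -> list:
    """Split on an ASCII separator, honouring backslash escapes, byte by byte."""
    fields, current, i = [], bytearray(), 0
    while i < len(raw):
        b = raw[i]
        if b == ord("\\") and i + 1 < len(raw):
            current.append(raw[i + 1])  # keep the escaped byte literally
            i += 2
            continue
        if b == sep:
            fields.append(bytes(current))
            current = bytearray()
        else:
            current.append(b)
        i += 1
    fields.append(bytes(current))
    return fields

line = "title=報告書,owner=José\\, Jr.,lang=ko"
fields = [f.decode("utf-8") for f in split_fields(line.encode("utf-8"))]
print(fields)  # ['title=報告書', 'owner=José, Jr.', 'lang=ko']
```

The scanner knows nothing about multi-byte characters, yet every field decodes cleanly, because the separator and escape bytes never appear inside a UTF-8 multi-byte sequence.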
It’s UTF-8, dummy!
• Use UTF-8 everywhere, cut down on certification dimensions
• Provides safe character-handling for Asia
• Even though multi-lingual is not a requirement
• Easier to support
Other Success Factors
• Rely on RDBMS services to translate between RDBMS code-page and UTF-8
• Market research cut back on OS localization constraints
• Transcoding infrastructure
RDBMS transcodes to/from UTF-8
• Oracle and Sybase transcode automatically – SQL Server is a problem
• No need for new transcoding calls between Server and RDBMS – lower impact
• Upgrade customers have non-Unicode RDBMS – no need for them to convert
• One less certification dimension!
Cut back on Localized OS certs
• Limit RDBMS for Asia – for 4.2, just Oracle
• Localized OS certification not necessary for Europe
Transcoding Infrastructure
• Server must be aware of interface code-pages
• Transcoding done at the interfaces
• 3rd party transcoding used: Uniscape’s GlobalC
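Transcoding at the interfaces can be sketched as follows. This is an assumption-laden illustration that uses Python's built-in codecs as a stand-in for the third-party library named above, whose API is not shown in the slides. `from_client`, `to_client` and `INTERNAL` are hypothetical names.

```python
INTERNAL = "utf-8"  # the only encoding used inside the (hypothetical) server core

def from_client(raw: bytes, client_codepage: str) -> bytes:
    """Inbound boundary: client code-page bytes -> internal UTF-8 bytes."""
    return raw.decode(client_codepage).encode(INTERNAL)

def to_client(internal: bytes, client_codepage: str) -> bytes:
    """Outbound boundary: internal UTF-8 bytes -> client code-page bytes.

    Characters the client code-page cannot represent are replaced with '?'.
    """
    return internal.decode(INTERNAL).encode(client_codepage, errors="replace")

# A Korean client sends EUC-KR; a Japanese client sends Shift-JIS.
korean = from_client("문서".encode("euc_kr"), "euc_kr")
japanese = from_client("文書".encode("shift_jis"), "shift_jis")

# Inside the server both values are plain UTF-8 and can sit side by side.
print(korean.decode(INTERNAL), japanese.decode(INTERNAL))  # 문서 文書

# The round trip back to the originating client is lossless...
print(to_client(korean, "euc_kr") == "문서".encode("euc_kr"))  # True
# ...while a client whose code-page lacks the characters gets replacements.
print(to_client(korean, "latin-1"))  # b'??'
```

Keeping the conversion at the boundary means the core only ever sees one encoding, which is why the server needs to know each interface's code-page but nothing more.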
New I18N Architecture
[Architecture diagram; layout not recoverable. Components, with the character encoding shown for each where labelled:]
• RDBMS (Unicode)
• Verity
• File System
• e-Content Server (UTF8)
• DMCL 4.2 (UTF8)
• DFC (Unicode)
• WDK (Unicode)
• DMCL ≤ 4.1 (NCS)
• Rightsite (NCS)
• ARP (NCS)
• Web Cache
• Applications and clients: Intranet Client, Administrator, Web Publisher, WorkSpace, Custom WebApp, Desktop Client
Legend: Unicode; NCS = National Character Set
Demo
• Demo – multilingual WDK
• If there’s time, a quick look at localized Desktop Client (Win32 Client)
Conclusion
UTF-8 was a crucial technology in DCTM’s I18N strategy:
• Provided an easy path for legacy C++
• Supported specific Asian languages consistently, minimizing certifications
• Prepared infrastructure for multi-lingual requirements