Software Localization and Internationalization: How and Why
Gregory M. Shreve
The GREEN Digital Library: A Specialized Materials Science
Collection of the National Science Digital Library
Kent State University
Internet World Stats estimates the current number of WWW users at
785 million. Of these, 29% reside in North America, 27.7% reside in
Europe, and 31% reside in Asia with penetration rates of 69.8%,
29.9% and 6.7% respectively.
With 58.7% of current users residing in regions with an average
penetration rate of only 18.3%, it is clear that these foreign
markets offer substantial rewards for those prepared to enter
them.
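The two aggregate figures follow from the regional numbers above. The 58.7% is the combined European and Asian share of users, and the 18.3% is the simple (unweighted) mean of the two regions' penetration rates. Estimating each region's population as users divided by penetration and weighting by population instead gives a lower combined rate:

$$
27.7\% + 31\% = 58.7\%, \qquad \frac{29.9\% + 6.7\%}{2} = 18.3\%
$$

$$
\frac{0.277 \times 785\,\text{M} + 0.31 \times 785\,\text{M}}{\dfrac{0.277 \times 785\,\text{M}}{0.299} + \dfrac{0.31 \times 785\,\text{M}}{0.067}}
\approx \frac{217.4\,\text{M} + 243.4\,\text{M}}{727\,\text{M} + 3632\,\text{M}} \approx 10.6\%
$$

Either way, most current users live in regions where penetration remains low.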
Internet, E-Commerce & Foreign Markets
The growth of the Internet and e-commerce over the next decade will
be driven by the expansion of foreign markets.
11/7/2004
In 2003 e-commerce sales to foreign customers exceeded domestic
sales. This year the European Internet economy is expected to break
the 4 trillion dollar mark, growing at a compound annual rate of
87%. Western Europe is expected to lead all regions with 692
billion dollars in global online exports in 2004.
North America will move 23% of its exports online, with the U.S.
pumping 210 billion dollars into cross border e-commerce. The
Asia-Pacific region will reach 219 billion dollars in 2004, sparked
by 57 billion dollars in Japanese online exports.
Consumer as Foreigner
Global, Globalize, Globalization
Companies that intend to sell online will have to globalize their
web presence and their products to reach the majority of the online
marketplace. They will have to make their web sites, software
interfaces, and product documentation available in the languages
and cultural styles of an increasingly diverse and international
market by applying a process called localization – the translation
of content and adaptation of interface and form to reflect the
expectations of one or many given locales.
For global-strategy American companies, over 40% of total revenue
comes from international sales. These companies market
high-technology products such as software, medical instrumentation,
CAD / CAM devices, and so on.
Most of these products have a high documentation overhead, with
instructions on the assembly, use, maintenance, and repair of the
products delivered via offline and online electronic documentation.
Most are marketed and supported online. Further, many products
have embedded software components, and their user interfaces may
use online databases. These products and documents must be
delivered to locales: target markets with different cultural and
linguistic contexts.
[Figure: deliverables such as the UI and CBT (computer-based training)]
Language Industry
While global marketing existed before the 1990s, the translation /
software localization industry (or “language industry” for short)
today has evolved primarily as a result of the rapid global
expansion of the computer software market and the increasing use of
the Internet as a global marketing and customer service tool – all
part of globalization.
The corporate problem is, of course, that many companies do not
understand HOW to prepare their many products, documents, web pages
and database interfaces for distribution in other linguistic and
cultural locales – hence the need for the services of the language
industry.
New Media, New Markets
Experts estimate the current worth of the U.S. language industry at
just under $2 billion annually, with the global market worth
approximately $6 billion. Indications are that growth will continue
to be strong into the next decade because of new electronic media
and markets.
Consider the case of massively multi-player online games (MMOGs):
the language industry enables the publishers of these games to
leverage their initial development investment by translating and
adapting the games for international locales. Industry projections
are that MMOGs will post a 52% compound annual growth rate
between 2002 and 2006.
This presentation examines the issues and processes involved in
software internationalization and localization.
There are three related major processes to consider. We have
already discussed globalization.
globalization, a strategic decision to reach an international
audience or to include different linguistic and cultural materials
in a product, software application, web site or digital
collection;
internationalization, a design process intended to enable efficient
and cost-effective subsequent linguistic and cultural
adaptation;
localization, the preparation of locale-specific versions of an
application’s interface and content.
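Abbreviations: G11N (globalization), L10N (localization), I18N
(internationalization); each number counts the letters omitted
between the first and last letter.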
Localization is the preparation of locale-specific versions of a
software application, electronic document, internet resource, or
digital collection. It consists of the translation of textual
material into the language and textual conventions of the target
locale and the adaptation of non-textual materials and delivery /
display mechanisms to take into account the cultural requirements
of that locale.
Internationalization is an “upstream” engineering process that
should precede localization. Its aim is to make subsequent
localization/translation easier, more efficient, and less
costly.
Internationalization & Localization
Each of these processes has a different scope and occurs at a
different point in the business and document cycles of an
organization.
[Figure: the processes on a timeline from earlier to later; labels: documents, interfaces, tools]
Evolution of Software Localization
Software localization developed as part of the globalization of the
personal computer software market. Software applications and
supporting electronic documents were the first “localized”
products. The growth of the Internet and the World Wide Web created
a demand for localized web pages and sites. Digital multimedia and
digital repositories (including digital libraries) are emerging
foci of localization.
[Figure: timeline from 1980 to 2005: PC software, WWW, multimedia, repositories]
Localization focuses on both display (appearance, presentation) and
content. Thus, localization includes a cultural adaptation as well
as a linguistic translation component.
Culturally variable conventions include date, time, calendar,
currency, number, and address formats.
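Strings are localizable material; in an unlocalized program they are
embedded directly in the C source:
/* Hard-coded English UI strings: every language version needs edited code */
char y[8];
printf("This program converts decimal numbers to hexadecimal\n\n");
while (1) {
    printf("\nDo you want to continue? ");
    scanf("%7s", y);
    /* ... conversion logic omitted ... */
}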
Localizable material also includes dialog boxes.
Localization of HTML
Web sites are also now being localized. The link below points to a
commented HTML file that gives a simple introduction to localizing
an HTML web page. At the localizer’s level, some of the issues (not
an exhaustive list) are:
not breaking tags;
evaluating CSS and stylesheet changes;
making changes to graphics.
A Solution: Re-Engineer the Software
As one might imagine, localizing directly in the code led to problems.
First, translators and localizers were quite capable of “breaking
code.” There were also problems associated with the need for
multiple “rebuilds” of the basic software for each language
version. Language expansion (differences in textual volume) created
sizing problems in dialogs and controls. Localization was
labor-intensive, difficult, and expensive. A solution was to
re-engineer the software with the intent of separating language
resources from the underlying delivery mechanism.
Internationalization is a re-engineering and re-design process
intended to make localization and translation easier, faster and
more cost-effective.
A first step in the internationalization of software applications
is the separation or extraction of linguistic and cultural
resources from the application, leaving a “neutral” software
kernel.
Extraction requires specialized localization tools.
[Figure: application software separated into a kernel and resources]
Internationalized kernel, identical for every language version:
main()
{
    printf(intl_m_msg("", "mypg", 1));
    while (1) {
        printf(intl_m_msg("", "mypg", 2));
        ...
    }
}
English resources:
\n Enter decimal number:
\n Do you want to continue?
\n exiting ..\n
French resources:
Ce programme convertit les nombres décimaux en hexadécimal\n\n
\nEntrer le nombre décimal:
\nVoulez-vous continuer?
Web page example, with content embedded in the markup:
<TR><TD>Kent</TD></TR>
<TR><TD>Ohio</TD></TR>
<TR><TD>44240</TD></TR>
...
Content and Display in Web Pages
Web pages share with application software the problem of
separating content from code. You can see from our web page example
how true this is. Internationalization solutions for web pages also
involve the “extraction” of linguistic and cultural material from
the software vehicle. Cutting-edge solutions create dynamic HTML
from XML-based language content.
XML-based language content, extracted as in software localization
(fragment):
<gradinquiry>
  <name>...</name>
  ...
  <addressline2/>
</gradinquiry>
Alternative: multiple static versions of pages stored in a folder
hierarchy by language and navigated by a language-selection
mechanism.
Truly effective internationalization also involves early
intervention in and re-design of “upstream” business and document
processes like authoring to exert greater control and to reduce
variability.
[Figure: document cycle; creation stage: authoring]
… internationalized products.
… own processes and tools.
Translation memories (TMs) and terminology managers are important
tools for maintaining standardized translations and glossaries. TMs
are the focus of quality assurance (QA), ensure replicability and
repeatability, and allow re-use of linguistic and cultural materials.
[Figure: localization toolkit (distribution) and localization tool (translator), with translation memory and terminology manager]
Specialized localization tools for alignment and term extraction are
used to automate the construction of TMs.
[Figure labels: term extraction; previous; 20% change]
Objective of internationalization and …: reusability and
scalability, achieved by separating content from display, defining
and extracting culturally variable material from fixed or neutral
material, intervening in the document cycle to exert control over
document processes, and using translation memories and terminology
management to ensure critical characteristics such as authority and
reusability.
Future directions in internationalization will involve exploiting
document corpora more effectively and extracting useful linguistic
and textual objects for control and re-use.
Control of the document cycle begins with understanding the
documents we already “own” and enhancing them.
Corpus
Corpus Replication
Using statistical techniques, it is possible to replicate the
contents of a monolingual corpus and add to it multilingual
equivalents for terms, phrases, document segments, and other
objects.
What The Industry is Doing Now
The language industry currently relies on using translation
memories and terminology managers. There are significant drawbacks
to this method that prevent new gains in cost reduction and
profitability, the goal of internationalization.
A New Model
New approaches to internationalization and automatic localization
leverage the linguistic value of existing corpora and allow the
creation of “enhanced” corpora whose contents are understood and
controlled. Statistical corpus linguistics and XML combine to allow
the next step in localization technology.
Peer-to-Peer Localization Resources
A peer-to-peer networking platform with a security and digital
rights management layer can be used to link clients in an XML
resource network. A vendor can assess per-transaction charges for
access to corpus object stores.
Socio-Cultural Style Sheets