
Introduction - Georgia Institute of Technologysonify.psych.gatech.edu/~walkerb/classes/ms-hci/extrareading/Laza… · libraries increase the accessibility of information sources,


18.1 Introduction

This chapter describes how the digital library creation tool Greenstone addresses the problem of internationalization. Digital collections served by Greenstone involve several different groups of users:

• Readers - who consult the multimedia documents in the collections to support their personal tasks.
• Anthologists - who gather documents together and establish collections.
• Librarians - who monitor and update collections as they are used by readers.
• Technicians - who respond to user feedback and implement new functionality.

In practice, although the other boundaries are clear, there is little difference between the anthologists who create collections and the librarians who maintain them, and in this chapter we call both groups 'librarians.'

These groups of users are common to all digital library applications. However, Greenstone aims to be a multilingual solution for building digital libraries and therefore needs to accommodate documents and interfaces in any language. While it is most common to browse collections in their native language, this is not always the case (and cannot be the case for multilingual collections). Therefore, Greenstone strives to support arbitrary combinations of languages. In other words, one might employ an English interface to Chinese text documents, or an Arabic interface to Spanish documents.

Multilingual software typically breaks the problem of language support down into two parts: internationalization and localization. Internationalization involves using a software architecture that can easily be localized, while localization refers to the process of adapting an application for a particular culture or market (Yeo, 2001; Aykin, 2005). As an open source project, Greenstone does not have the linguistic resources to achieve localization itself. Consequently, a further set of users is identified:

• Translators - who translate the interface into their own language.

Greenstone allows end-user communities to both adapt the software and add their language contributions into a central repository from which all users can benefit. In this chapter we describe how the software infrastructure manages the relationships between the various user groups to enable the improvement and localization of the Greenstone interface.

We first outline the challenges in building global digital libraries and how the Greenstone software has evolved over the past decade. We describe the problems of serving multilingual content with multilingual interfaces and then explain the approach that the Greenstone developers have adopted. The chapter concludes with a summary of the lessons learnt during the development of the toolset and their relevance for the different user groups.

18.2 Global Digital Libraries

Libraries have been an important element in the development of modern societies, providing low-cost access to diverse information sources for their patrons. Digital libraries increase the accessibility of information sources, lowering costs, extending access via networking, and adding new access mechanisms such as full-text searching. As the Internet spreads over the world, digital libraries, along with other software applications, face the three universal usability challenges: technological variety, user diversity, and gaps in user knowledge (Shneiderman, 2000). These challenges become apparent when we consider how to spread the benefits of universal information access in developing countries: hardware and software from a decade ago are common, networking may be unreliable or absent, languages and cultures of users differ from those of the software developers and the fundamental concepts involved in digital libraries may not be widely understood by library users or by potential digital library creators (Witten et al., 2002).

In addressing the global challenges of producing tools to build digital libraries it is useful to briefly consider the context of use. Although it is undeniably useful to disseminate information collections built in the developed world, as present digital libraries tend to do, a better strategy for sustained long-term development is to disseminate the capability to create information collections rather than the collections themselves. Effective human development blossoms from empowerment rather than gifting. Digital libraries enable indigenous people to participate actively in preserving and disseminating their own culture (Nichols et al., 2005). In accepting this rationale of tool distribution we also implicitly accept the technical challenges of designing software to work on older computer systems and without the assumption of fast reliable network access (Witten et al., 2002; Witten and Bainbridge, 2003).

The universal usability challenge of user diversity has several aspects, including the capacity to support differences in age, gender, language, culture, literacy, disability, etc. (Shneiderman, 2000). In this chapter we concentrate on language; other work in this volume addresses other aspects of the challenge. The use of language, for a reader, in a digital library occurs in two different ways: the language of the documents and the language of the interface. As documents may be any multimedia resource (e.g. images, audio, video, etc.), it is the interface that is the only ever-present element. The librarians, however, also interact with the software tools needed to construct the collection and its interface for the readers. The translators may experience a different subset of the software as their specialized task can usually be isolated from more general maintenance issues. Finally, the technicians have to create tools and documentation that can function in diverse multilingual environments on, potentially, less sophisticated computer systems. In the following sections we describe how Greenstone has responded to these challenges.

18.3 Conception and Birth of Greenstone

The project that grew into Greenstone began in 1995 with the construction of a New Zealand digital library for computer science research (Witten et al., 1995), one of many early technology-based projects that build searchable collections of papers in computer science. A few years later we worked with a Belgium-based humanitarian organization, the Human Info NGO, to build stand-alone CD-ROM collections of practical information on topics such as agriculture, building, energy, health, nutrition, sanitation, and water. This work forced us to consider the essential but mundane business of making the software work reliably on low-end Windows systems, for all our research work had been done on Linux, and to build installers that allowed end-users to install the collections on their computer. This work produced an end-user system for working with information collections that we refer to as the 'reader's' interface to Greenstone.


Human Info's connections led to discussions with UNESCO, who encouraged us to consider distributing the capability for building digital libraries, rather than the collections themselves, for all the reasons discussed above. An international cooperative effort with UNESCO and Human Info was formally initiated in 2000 and led to the production of the first of a series of CD-ROM distributions of the Greenstone software itself, not collections that had been built with it. This involved an extensive effort in documentation and multilingual computing, for UNESCO wanted to issue an English, Spanish, French, and Russian version of the Greenstone CD-ROM as the second of the series. In parallel, we had been working to make it possible for volunteers to translate the reader's interface into their own language, and - importantly - update it when changes were made to the software. It also involved the design and production of an end-user system for building collections that we call the 'librarian's' interface to Greenstone.

Greenstone's original design philosophy emphasized the following points:

• Trivial to install (for individuals without any institutional computer support).
• Easy for end-users to build collections with existing documents and metadata.
• Open approach to document and metadata formats: anything can be accommodated.
• End-users can wrap individual collections into a package (e.g. on CD-ROM or DVD) for use on non-networked machines.
• Exactly the same interface will be provided for networked and non-networked Greenstone installations.
• Librarian's interface runs on all machines.
• Reader's interface runs on all machines right down to very low-end ones (Windows 3.1).
• Multilingual, with the capability of easily adding new languages and maintaining existing ones.

Only when these basic requirements were satisfied did we turn to more sophisticated facilities, such as distributed use of the librarian's interface and interoperation with standard protocols and other digital library systems. While others were doing research on interoperability, our internal motto was 'first operability, then interoperability.'

18.4 Related Work

There are many examples - hundreds if not thousands - of digital libraries on the web. With only a few exceptions, these are done with custom software. One-offs! The pattern such projects tend to follow is that the organization (typically an institution of some form) that is responsible for the source material (typically empowered to be the guardian of the material) decides to make the material available over the web to make it more accessible. In other words, they need to build a digital library. The size of the organization is such that it is within their resources to either commit IT staff to develop the necessary software or to outsource the work to a company who then, more often than not, writes custom software. In the case of documents starting in physical form, digitization - a considerable undertaking - is factored in as part of the process.

At the top end of the scale in terms of investment, the American Memory project by the Library of Congress (www.memory.loc.gov) is testimony to what can be achieved with this approach. With the aim of providing 'a digital record of American history and creativity,' the digital library embodies dozens of collections providing open access over the web to over 5 million items. Artifacts include video, maps, sheet music in addition to the written and spoken word. In its peak period of development (1996-2000) it received over US$45 million of funding.


Despite the wide array of end-result digital libraries, there are only a few digital library systems that are frameworks, like Greenstone, to target the abstract concepts and constructs of digital libraries, thereby enabling others to accomplish their online delivery aims with minimal IT investment. Of note are DSpace (Tansley et al., 2005) and Fedora (Lagoze et al., 2006). Both are open source, like Greenstone.

DSpace (www.dspace.org) facilitates the building of institutional repositories that capture, distribute, and preserve intellectual output at an institutional level. It is produced by Hewlett-Packard and designed in partnership with MIT. Its designers note that much of the intellectual output of professors and researchers is in digital form, and unless their home institution has an aggressive policy for collecting and preserving it, this information is potentially ephemeral. DSpace is designed to help capture and organize everything produced by faculty and staff - digitized versions of lecture notes, videos, papers, and data sets - into an 'institutional repository' that will make it available to future generations in its original digital form.


Fedora (www.fedora.info) is a general purpose repository service developed jointly by the University of Virginia Library and Cornell University. The Fedora project is devoted to the goal of providing open-source repository software that can serve as the foundation for many types of information management systems. The software demonstrates how distributed digital information management can be deployed using web-based technologies, including XML and web services. At its core is a powerful digital object model that supports multiple representations or views of each digital object. Relationships among digital objects can be stored and queried, providing the foundation for expressing rich information networks. These objects exist within a repository architecture that supports a variety of management functions such as fine granularity access control, version control, and ingest and export of information in standard XML formats.

Unlike Greenstone, which evolved, both these projects were conceived and implemented as general frameworks. The abstraction in DSpace is tightly focused on an institutional repository model and has seen significant uptake in universities. To stray beyond these bounds and repurpose the software, however, requires significant programming effort, although this is clearly possible given the open source nature of the code. Born out of digital library work, Fedora is at the other end of the spectrum in terms of levels of abstraction, with the broad remit of information management systems, of which digital libraries is just one example. This comes at a price, however. For instance, it is assumed that the XML-based document object model used by Fedora is the starting point. It is up to a project using Fedora to develop software that converts source documents into this format. Also, in response to the difficult issue of the end-user interface, Fedora essentially leaves this unspecified, as it is not possible to design something in the realm of information management that is all things to all people. Again, it is up to a particular project to design and implement this.

In terms of internationalization, DSpace has work underway with the user interface being translated into several languages (nine European languages as well as Chinese, Japanese, and Indonesian at the time of writing) in various stages of completion. The underlying mechanism used for this is based on the Java Standard Tag Library, with a property file for each language storing the language strings akin to the language-specific macro files used by Greenstone (see below). Of course, the reader's interface is merely one component to full internationalization (again see below) and the developers list some technical areas of DSpace that need attention, such as indexing. In Fedora there is no evidence of internationalization work; however, it should be noted that they make use of the same Java Standard Tag Library, and therefore the same technical solution is available to them, should they pursue this.

18.5 Greenstone: A Tool for Creating Digital Libraries

Greenstone is a suite of software for building and distributing digital library collections (Witten and Bainbridge, 2003). It is not a digital library but a tool for building digital libraries. It provides a new way of organizing information and publishing it on the Internet in the form of a fully-searchable, metadata-driven collection. It is open source, multilingual software, issued under the terms of the GNU General Public License. Collections built with Greenstone automatically include effective full-text searching and metadata-based browsing facilities that are attractive and easy to use. They are easily maintainable and can be rebuilt entirely automatically.


Table 18.1 A representative cross-section of Greenstone-based public digital libraries from around the world

Association of Indian Labour Historians, Delhi
California University at Riverside
Chicago University Library
Detroit Public Library
Gresham College, London
Illinois Wesleyan University
Kyrgyz Republic National Library
Lehigh University, Pennsylvania
Mari El Republic, Russia
National Centre for Science Information, Bangalore, India
Netherlands Institute for Scientific Information Services
New York Botanical Garden
Peking University Digital Library
Philippine Research Education and Government Information Network
Secretary of Human Rights of Argentina
Slavonski Brod Public Library, Slovenia
State Library of Tasmania
Stuttgart University of Applied Sciences
Vietnam National University
Vimercate Public Library, Milan, Italy
Washington Research Library Consortium
Welsh Books Council

Greenstone runs on all versions of Windows, Unix, and Mac OS X. It is easy to install. For the default Windows installation absolutely no configuration is necessary, and end-users routinely install Greenstone on their personal laptops or workstations. Institutional users run it on their main web server, where it interoperates with standard web server software (such as Apache).

In common with many other open-source projects the precise user base for Greenstone is unknown. It is distributed on SourceForge, a leading distribution center for open-source software. Since 2003 the average number of downloads per month has been over 4500, with 40% of those being the software and the remaining 60%, documentation. 80% of the software downloads are for Windows binaries, with the next biggest category being 15% for Linux, also binaries. Although we do not have detailed information on Greenstone's users, we are aware of many public digital libraries that use the software around the world (Table 18.1).

Greenstone allows users to create digital library collections using a wide variety of source documents: text, web pages, images, audio, video, etc. An inevitable consequence of this broad approach is that the collections contain documents in more than one language.

Figure 18.1 shows typical images from interacting with a Greenstone collection: in this case the Niupepa (2005) collection of Maori newspapers (Apperley et al., 2002). A user has accessed the home page of the Niupepa collection where three options are available to access the Maori language newspapers: rapu (Search), Tanga Pukapuka (Series Listings), and Nga Ra (Date). The user has selected the full-text search option and has undertaken a search for the word 'waka' (canoe). This has returned a list of results and the user has selected the first one. The Greenstone software then displays a textual representation of the newspaper page with the search term highlighted as in Figure 18.1(a). This feature allows the user to easily find an inconspicuous search term on a large page of text. To view the original image the user clicks on the 'Whakaahua Nui' (Large Image) button and it is displayed as in Figure 18.1(b).

[Screenshots: (a) Viewing a document: extracted text; (b) Viewing a document: scanned text]

Figure 18.1 The Niupepa collection of Maori newspapers displayed via Greenstone.

This example illustrates three aspects of international digital library work:

• Access to a textual representation (thereby allowing textual searching and other access mechanisms).
• Access to an original image of a document.
• The interface through which access is provided.

In the case described above the interface language is Maori and the documents themselves are also in Maori. However, universal usability mandates that access to original material should be possible in any language: the interface should be capable of being translated. It makes little sense (and is sometimes distasteful) to have a collection whose content is in Chinese or Hindi, but whose supporting text - instructions, navigation buttons, labels, images, help text, and so on - can only be seen in English.

Automated translation of the original content is a much more difficult problem and is beyond the scope of this chapter. However, the textual elements of interfaces are usually simpler, smaller (in terms of number of words), and more easily separated into manageable fragments (Purvis et al., 2001; Bainbridge et al., 2003). Although the text of an interface is important there are also other characteristics of interfaces that influence usability, such as interactivity, position, color, sound, fonts, etc. A general solution to creating international digital libraries has to cope with the potential variability of all these interface elements (Savarimuthu and Purvis, 2004). In other words, translation is necessary, but not sufficient, for localization.

18.6 Internationalizing Greenstone

A distinction is often made between the internationalization of software architecture and the specific localization work necessary to adapt software to a specific language and culture (Crystal, 2000; Purvis et al., 2001; Hogan et al., 2004). Using this framework we can divide up the work required to create multilingual interfaces.

Technicians:
• Internationalize the software architecture of Greenstone to permit the textual strings of interfaces to be used in more than one language.
• Provide a mechanism for nontextual interface elements to be customized.

Translators:
• Translate the textual strings for the interface.

Librarians:
• Localize the interface based on the translated textual strings.

This division of labor is a natural consequence of the expertise of the respective groups: technicians are not necessarily good at translation and translators should not need to learn a programming language in order for the project to benefit from their skills.

    package Global

    _header_  { The New Zealand Digital Library Project }
    _content_ { Oops. If you are reading this then an error
                has occurred in the runtime system. }
    _footer_  { Powered by <a href="www.greenstone.org">Greenstone</a>. }

    package query

    _content_ { _If_(_cgiargqb_ eq "large",_largequerybox_,_normalquerybox_) ... }

    # ... the macro descriptions for _largequerybox_, _normalquerybox_,
    # and other nested macros are omitted for brevity

    _header_ [l=en] {Begin search}
    _header_ [l=fr] {Démarrer la recherche}
    _header_ [l=es] {Iniciar la búsqueda}

    # ... and so on

    # Images containing language-dependent text

    ## "HELP" ## top_nav_button ## chelp ##
    _httpiconchelp_ [l=en,v=1] {_httpimg_/en/chelp.gif}
    _httpiconchelp_ [l=en,v=0] {HELP}

    # ... and so on

Figure 18.2 Excerpt of Greenstone macro file syntax to illustrate some of the internationalization features.

Technically, Greenstone uses the international standard Unicode (Unicode Consortium, 2000) to represent text and builds on the display capabilities of web browsers for displaying documents to users. Documents in any language and character encoding can be imported and example collections in Arabic, Chinese, Cyrillic, French, Spanish, German, Hindi, and Maori can be examined at the New Zealand Digital Library website (www.nzdl.org).
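As a rough illustration of what importing documents in any character encoding into a Unicode-based system involves, the sketch below decodes bytes from a declared legacy encoding and re-encodes them as UTF-8. It is not Greenstone's import code; the function name to_utf8 and the choice of UTF-8 as the stored form are assumptions made for this example.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert bytes in a known legacy encoding to UTF-8.

    Hypothetical helper for illustration; not part of Greenstone.
    """
    text = raw.decode(source_encoding)  # legacy bytes -> Unicode string
    return text.encode("utf-8")         # Unicode string -> UTF-8 bytes


# A Chinese document stored as GB2312 and a Russian one stored as KOI8-R
# both end up in the same Unicode-based representation.
chinese = to_utf8("中文".encode("gb2312"), "gb2312")
russian = to_utf8("Привет".encode("koi8-r"), "koi8-r")
```

Once every document is held in one Unicode form, indexing and display no longer need to know which encoding the source file used.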

However, internationalization is not just about translating or displaying text fragments. Technology-centric issues such as nontextual elements and user-driven ones such as cultural differences need to be considered.

The software developers had to create a mechanism for more detailed customization that could be used by the librarians. This task involved generalizing the output architecture of Greenstone so that interface components could be selected contextually by language or manually customized. The mechanism to achieve this internationalization of the Greenstone software architecture is a macro language facility.

Figure 18.2 shows an artificially constructed excerpt that illustrates the syntax through which macros are defined and used. Macro definitions comprise a name, flanked by underscores, and the corresponding content, placed within braces ({ ... }). They are grouped together into packages, with lexical scoping, and an inheritance scheme is used to determine which definitions are in effect at any given time. This allows global formatting styles to be embedded with the particular content that is generated for a page.

Figure 18.2 shows a baseline page defined in the 'Global' package, which, in fact, is never intended to be seen. It is overridden in the 'query' package below to generate a page that invites the user to enter search terms and perform a query. Like other pages, it comprises a _header_ ... _content_ ... _footer_ sequence.

Macros can include parameters interposed in square brackets between name and content. Such parameters are known as 'page parameters' because they control the overall generation of a page. They are expressed as [x=y], which gives parameter x the value y. Two parameters of particular interest are l, which determines what language is used, and v, which controls whether or not images are used in the interface.

Figure 18.2 shows the definition of three versions of the macro _header_ within the 'query' package, corresponding to the languages English, French, and Spanish. They set the language parameter l to the appropriate two-letter international standard abbreviation (ISO 639), enabling the system to present the appropriate version when the page is generated. If a macro has no definition for a given language, it will resolve to the version given without any language parameter - which, in the current implementation, is English (though another language could be chosen as the default).
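The lookup behaviour just described can be sketched in a few lines of Python. This is only a model of the idea, not Greenstone's implementation: the table layout, the function name resolve, and the exact search order (the requested package before 'Global', and the language-specific definition before the one without a language parameter) are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# (package, macro name, language or None) -> content.
# None stands for a definition written without an l page parameter.
MacroTable = Dict[Tuple[str, str, Optional[str]], str]

MACROS: MacroTable = {
    ("Global", "header", None): "The New Zealand Digital Library Project",
    ("query", "header", None): "Begin search",  # default (English) version
    ("query", "header", "fr"): "Démarrer la recherche",
    ("query", "header", "es"): "Iniciar la búsqueda",
}


def resolve(macros: MacroTable, package: str, name: str, lang: str) -> str:
    """Return the content of a macro for a page in the given language.

    Assumed search order: the page's own package first, then 'Global';
    within each, the language-specific version, then the default one.
    """
    for pkg in (package, "Global"):
        for language in (lang, None):
            content = macros.get((pkg, name, language))
            if content is not None:
                return content
    raise KeyError(f"macro _{name}_ is not defined")


french = resolve(MACROS, "query", "header", "fr")
german = resolve(MACROS, "query", "header", "de")      # no German version
inherited = resolve(MACROS, "about", "header", "fr")   # hypothetical package
```

A request in a language with no translation falls back to the default text rather than failing, so a partly translated interface stays usable.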

Greenstone uses many images that contain language-dependent text. These are created by an open-source utility using scripting to automate image generation. Macro files use a specially constructed form of comment (signified by a double hash, ##) to convey additional information for a processing program - in this case the image generation script. Near the end of Figure 18.2 an icon with the text HELP is generated and placed in the file chelp.gif in subdirectory en (for 'English'). The image generation script parses the comment to determine the type of image to generate, the text to be placed on it, and where to put the result. Then it automatically generates the image and stores it in the language-specific directory appropriate to the l page parameter.
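The parsing step could look roughly like the following Python sketch. It is not the actual image generation script: the field order (text, image type, base file name) is inferred from the single example comment in Figure 18.2, and ImageSpec, parse_image_comment, and output_path are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ImageSpec(NamedTuple):
    text: str        # text to draw on the image, e.g. HELP
    image_type: str  # kind of image to generate, e.g. top_nav_button
    name: str        # base file name, e.g. chelp


# Matches comments of the form: ## "HELP" ## top_nav_button ## chelp ##
_IMAGE_COMMENT = re.compile(
    r'^##\s*"(?P<text>[^"]*)"\s*##\s*(?P<type>\w+)\s*##\s*(?P<name>\w+)\s*##\s*$'
)


def parse_image_comment(line: str) -> Optional[ImageSpec]:
    """Return the image specification in a ## comment, or None otherwise."""
    match = _IMAGE_COMMENT.match(line.strip())
    if match is None:
        return None
    return ImageSpec(match.group("text"), match.group("type"), match.group("name"))


def output_path(spec: ImageSpec, lang: str) -> str:
    """Place the image in the directory named after the l page parameter."""
    return f"{lang}/{spec.name}.gif"


spec = parse_image_comment('## "HELP" ## top_nav_button ## chelp ##')
```

Ordinary single-hash comments do not match the pattern, so only lines written for the image generator are picked up.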

A precedence ordering for evaluating page parameters is built into the macro language to resolve conflicting definitions. Also included are conditional statements. An example can be seen in Figure 18.2's _content_ macro, which uses an If statement, conditioned by the macro _cgiargqb_, to determine whether the query box that appears on the search page should be the normal one or a large one. The value of _cgiargqb_ is set at runtime by the Greenstone system (the user can change it on a Preferences page). Many other system-defined macros have values that are determined at runtime: examples include the URL prefix where Greenstone is installed on the system, and the number of documents returned by a search.
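To make the conditional concrete, the sketch below evaluates a single If expression of the shape used in Figure 18.2 against a dictionary of runtime values. It covers only the 'eq' comparison with a quoted literal and assumes the parenthesized form _If_(condition,then,else); the function name choose_macro and the runtime dictionary are hypothetical, and this is not the Greenstone macro interpreter.

```python
import re
from typing import Dict

# Matches: _If_(_macro_ eq "literal",_thenmacro_,_elsemacro_)
_IF = re.compile(
    r'_If_\(\s*(_\w+_)\s+eq\s+"([^"]*)"\s*,\s*(_\w+_)\s*,\s*(_\w+_)\s*\)'
)


def choose_macro(expr: str, runtime: Dict[str, str]) -> str:
    """Return the name of the macro selected by an _If_ expression."""
    match = _IF.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    variable, literal, then_macro, else_macro = match.groups()
    # Runtime macros with no value set compare as the empty string.
    if runtime.get(variable, "") == literal:
        return then_macro
    return else_macro


EXPR = '_If_(_cgiargqb_ eq "large",_largequerybox_,_normalquerybox_)'
large = choose_macro(EXPR, {"_cgiargqb_": "large"})
normal = choose_macro(EXPR, {})
```

The selected name would then be expanded like any other macro, which is how a preference set at runtime changes the page without any change to the macro files.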


Figure 18.2 is artificial in both content (to demonstrate the salient features of the macro language) and layout (for expository convenience). In reality, all English text phrases are stored together in the file english.dm, with French phrases in french.dm and so on (the suffix .dm denotes a macro file). Package names serve to lexically scope the phrase definitions. There are two files for each language, for example english.dm and english2.dm. The first contains the core Greenstone phrases and the second those used in auxiliary subsystems. This allows the translation facility to differentiate between these two classes.
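Keeping each language's phrases in its own file makes it straightforward to see which phrases a translation still lacks. The Python sketch below compares the macro names found in two macro-file texts. It handles only single-line definitions of the form shown in Figure 18.2 and ignores packages; the macro names in the sample data and the function names are invented for illustration, and this is not how Greenstone's translation facility is implemented.

```python
import re
from typing import List, Set

# One definition per line: _name_ [optional page parameters] {content}
_DEFINITION = re.compile(
    r'^_(?P<name>\w+?)_\s*(?:\[(?P<params>[^\]]*)\])?\s*\{(?P<body>[^}]*)\}',
    re.MULTILINE,
)


def macro_names(dm_text: str) -> Set[str]:
    """Collect the names of all macros defined in a macro file's text."""
    return {match.group("name") for match in _DEFINITION.finditer(dm_text)}


def untranslated(base_text: str, target_text: str) -> List[str]:
    """List macros defined in the base language but not in the target."""
    return sorted(macro_names(base_text) - macro_names(target_text))


# Hypothetical file contents, standing in for english.dm and french.dm.
ENGLISH = "_header_ {Begin search}\n_textsearch_ {Search}\n_texthelp_ {Help}\n"
FRENCH = "_header_ [l=fr] {Démarrer la recherche}\n_texthelp_ [l=fr] {Aide}\n"

missing = untranslated(ENGLISH, FRENCH)
```

Running the same comparison separately on the core file and the auxiliary file would let a translator deal with the core phrases first.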

The internationalization support systems described in this chapter fulfill the same goal as those in the Firstsearch interface case study (Perlman, 2000). However, features such as lexical scoping, inheritance, integration with the release system, and the translation support system (described below) provide a more general solution to the localization challenge. It is interesting to note that Greenstone has independently evolved a community solution to the translation problem similar to that used by the CITIDEL project (Perugini et al., 2004). CITIDEL's evaluation showed that their community translation was of significantly better quality than a machine translation of the same content. In contrast to the International Children's Digital Library (Hutchinson et al., 2005), Greenstone does not face the issues of managing metadata in multiple languages, as we provide an infrastructure for others to publish content rather than becoming a content publisher ourselves.

The detail of the macro facility outlined above illustrates some of the work that was required to provide an internationalized architecture for Greenstone. The goal of the technicians in performing this work is to provide an abstraction for the librarians - to enable customization without coding. Tennant (2002) outlines a similar level of abstraction:

... all librarians need not know how to code software. But they should know what software is capable of doing, when a program could be easily written to accomplish a task, and what skills someone needs to write one.

18.7 Supporting Localization: The Translator's Interface

The deployment of a localized digital library collection can be viewed as a long-term collaboration between technicians, librarians, and translators. Each of these groups lacks the expertise to succeed on its own, and the role of the Greenstone translator's interface is to provide the infrastructure for the deployment and maintenance of the collection. It allows users to:

• Translate the interface into a new language.
• Update an existing language interface to reflect new Greenstone facilities.
• Refine an existing language interface by correcting errors.


(a) The status page for French (b) Updating a section of the Maori interface

Figure 18.3 Using the translator's interface.

On entry the user selects the target language that they are translating into. The base language is always English, because this is what is used to develop Greenstone and is guaranteed to be the most up-to-date representation of the interface. Originally we planned to allow users to select other base languages, but we removed this facility for practical reasons: we did not want to compound errors by using a base language that was incomplete or included incorrect translations.

Having selected the target language, the user is shown a status page for that language (Figure 18.3(a)). Greenstone distinguishes between phrases that are used in the main system - for instance, search, browsing, and help pages - and phrases in less-frequently-used subsystems - for instance, the site administration pages through which usage statistics and logs are viewed, and the translator service itself - for this too needs translating! The phrases are divided into a few sections that reflect this distinction; currently there are four. For each one, the status page shows the number of translations that have been done and the number remaining to do.

In Figure 18.3(b) the user has begun to update the core macro file for the Maori language interface. A single language fragment is shown, together with the date when this string was last updated. Also included is a progress indicator: how many fragments remain to be done. The English phrase appears at the top; below it is the box into which the translated version can be entered. Two kinds of phrase appear: ones that are missing from the Maori version, and ones whose Maori translation is outdated because the English version has been edited more recently. In the latter case the outdated translation appears as a visual cue (as in Figure 18.3(b)). After completing the translation the changes are committed back to the translation server.
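The distinction between missing and outdated phrases amounts to a comparison of modification dates. The following Python sketch shows one way to express it; the data layout (a phrase key mapped to a text and a last-updated date), the phrase keys, and the dates are all invented for illustration and do not describe how Greenstone stores its fragments.

```python
from datetime import date


def classify(english, target):
    """Sort phrase keys into missing, outdated, and current.

    Both arguments map a phrase key to a (text, last_updated) pair.
    """
    status = {"missing": [], "outdated": [], "current": []}
    for key, (_, english_date) in english.items():
        if key not in target:
            status["missing"].append(key)
        elif target[key][1] < english_date:
            # The English phrase was edited after the translation was made.
            status["outdated"].append(key)
        else:
            status["current"].append(key)
    return status


# Hypothetical phrase keys, texts, and dates.
english = {
    "textsearch": ("Search", date(2005, 3, 1)),
    "texthelp": ("Help", date(2004, 6, 1)),
    "textprefs": ("Preferences", date(2005, 5, 1)),
}
maori = {
    "textsearch": ("<older Maori translation>", date(2004, 1, 1)),
    "texthelp": ("<Maori translation>", date(2004, 7, 1)),
}

status = classify(english, maori)
remaining = len(status["missing"]) + len(status["outdated"])
print(status, remaining)
```

The count of missing plus outdated phrases corresponds to the "remaining to do" figure on the status page, and the outdated list identifies the cases where the old translation would be shown as a cue.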

Another feature of the translator, not shown in the figures, is the ability to search for a language fragment and then, on choosing a particular occurrence, enter a translation window similar to that shown in Figure 18.3(b). This is useful when a translator is checking the text in situ in the reader's interface and spots an error, say a typing mistake. Using the search feature they can quickly locate the macro with the mistake in it, and correct it. Internally, the language fragments are managed as a Greenstone collection that is private (which means it does not normally appear to a regular user), and therefore the ability to search it comes for free.

Hogan et al. (2004) note the importance of the 'ability to incorporate the context of each translation' when working with internationalized software applications. Changes to the translator's interface take place immediately: users can see their new translations in context by accessing (or reloading) the appropriate pages in Greenstone. However, these changes are not made automatically to the public Greenstone site, nor are they automatically committed to the master software repository. Instead, to guard against error and misuse, they take effect in a special replica of the Greenstone site used for translation. When satisfied with the entire translation, users notify the central Greenstone repository's administrator of the change through email. Then, issuing a single command fully integrates the changes into the officially released version of the software.

Because each translated text string is saved when it is submitted, a user need not translate all phrases in one sitting. Moreover, when they return to the service the system regenerates everything from scratch, which means that only the outstanding phrases are shown. For well-maintained language interfaces such as Spanish, French, and Russian, only a few new translation requests are generated when new features are added. However, some less-used language interfaces contain translations only for the core phrases that appear in the main system.

New languages are added in the same way that existing ones are updated, except that no existing translations appear in the right-hand column. A would-be translator emails the system administrator and the administrator manually adds the new language to the list. There are a total of about 750 phrases in the entire interface. Of these about 60% (450 phrases) pertain to the core Greenstone system, which every language interface covers; the remainder are for the less-used 'auxiliary' parts of the interface. Of the existing language interfaces, 15 are for the complete interface and the remaining 23 cover just the core parts.

Sometimes phrases in an existing language interface need to be refined. For example, a typographical error may have been overlooked when entering a phrase, or seeing a phrase in its actual interface context may suggest a better form of expression. To accommodate this requirement, users need to be able to locate an existing phrase and update its translation. Consequently each page of the interface contains a link to a search facility that allows you to find all fragments that contain the specified search terms.

As with the macro language, the Translator's Interface provides a simplified view of the complex internals of Greenstone. The textual fragments are separated from other elements so that translators only see relevant content. The ability to perform this separation, and provide this service, is an effective operational test of the internationalization of a software application.

18.7.1 A Localization Example

The localization process involves adapting the textual elements of the digital library interface and customizing other elements of the interface. For example, the collections at Ulukau: The Hawaiian Electronic Library (Ulukau, 2005) use a customized version of Greenstone that provides a graphical aid for creating queries containing specific Hawaiian characters, allowing users to create an accurate query without having to remember sequences of keystrokes.

The localization shown in Figure 18.4 goes beyond that shown in Figure 18.1. In the case of Ulukau the local Hawaiian language community had sufficient technical knowledge to be able to add more extensive customization. This customization was possible because Greenstone is released as open source software; the community had permission to look inside and change the program. The librarians who made these changes were aware that their reader community had difficulty using standard keyboards to construct their Hawaiian language queries. Local contexts such as this are difficult to predict for the technicians developing the Greenstone software. Although the internationalization of the architecture meets many localization needs, it is unlikely to ever be complete: in many circumstances the flexibility of the open source model provides the final localization step.

Figure 18.4 Ulukau: the Hawaiian Electronic Library.

Figure 18.5 The Greenstone collection First Aid in Pictures.

The flexibility of the localization possible in Greenstone is also evident in the creation of a collection aimed at illiterate users (Deo et al., 2004). The collection First Aid in Pictures derives from an internationalized book that describes medical procedures without a textual interface. Figure 18.5 shows searching and browsing structures provided by nontextual means.

18.8 The Effects of Localization

As multilingual collections are created and achieve significant usage levels, it becomes possible to study the effects of internationalization on the user experience, to tease out insights into the effect of interface language on user information behavior. Research is no longer limited to small-scale experiments over artificial tasks and collections - it suddenly becomes feasible to learn what real users do in multilingual digital libraries, as they attempt to satisfy authentic information needs.

Log analysis is a particularly promising tool, as it allows the researcher to summarize large amounts of user activity over extended periods. Analysis of a year's usage of the Niupepa collection of Maori newspapers (Figure 18.1), for example, has identified patterns of behavior with implications for interface and interaction design (Keegan and Cunningham, 2005a,b). The Niupepa collection allows users to choose between an English and a Maori interface to the predominantly Maori language document collection. Three patterns of usage emerged from the log data: usage sessions conducted primarily via the English language interface (surprisingly, two-thirds of sessions), sessions conducted via the Maori interface, and 'bilingual' sessions in which the user switched between the two languages. Users in 'bilingual' sessions presumably changed interface to access a retrieval strategy in a language that they were more comfortable with, or perhaps to access a certain text that was only stored in one language in a multi-language collection - further (qualitative) research is needed to come to a deeper understanding of the 'why' behind the patterns located in the logs.

Information retrieval strategies differed between sessions conducted under the Maori and the English language interfaces: the Maori interface users tended to rely more heavily on browsing than on searching to reach pages of interest in the collection, whereas the English language interface sessions included considerably more searching than browsing. Historically, Maori has been an oral rather than a written language, and so it is likely that the proportion of Niupepa users who are fluent in written Maori is small - and so browsing is a more effective information-seeking mechanism for these users. The implications for interface design are clear: to provide more, and more flexible, browsing tools through the Maori language.
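The three session patterns can be illustrated with a small Python sketch of the grouping step in such a log analysis. It is a simplification under stated assumptions: the record format (a session identifier paired with an interface language code), the codes en and mi, and the rule that any switch makes a session 'bilingual' are choices made here for illustration, not the method of the cited study, which describes English sessions as conducted "primarily" in English.

```python
from collections import Counter


def classify_session(interface_languages):
    """Label a session by the interface languages of its requests."""
    used = set(interface_languages)
    if used == {"en"}:
        return "english"
    if used == {"mi"}:
        return "maori"
    return "bilingual"  # the user switched interface at least once


def summarize(log_records):
    """Count session types from (session_id, interface_language) records."""
    sessions = {}
    for session_id, language in log_records:
        sessions.setdefault(session_id, []).append(language)
    return Counter(classify_session(langs) for langs in sessions.values())


# Invented records, for illustration only.
records = [
    ("a", "en"), ("a", "en"),
    ("b", "mi"),
    ("c", "en"), ("c", "mi"),
]
summary = summarize(records)
print(summary)
```

The same grouping could be extended to tally search and browse requests per session type, which is the comparison on which the retrieval-strategy findings rest.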

Experiments with alternating the default interface language between English and Maori indicate that, as perhaps might be expected, users tend to accept the default - no matter which of the two languages it is (Keegan and Cunningham, 2005a,b). Given that users appear to have weak preferences, as shown by their disinclination to switch languages, it seems more natural to make Maori the default. Digital library developers who have more than one interface language available have the opportunity, perhaps the obligation, to ensure that the default language best matches the language of the collection's content and thus gives users of the collection a smoother and more comfortable information retrieval experience.

18.9 Understanding Localization in Context: Training

Training is a serious barrier to the adoption of advanced information systems in public institutions - particularly in poorer parts of the world where commercial courses are infeasible. UNESCO's Communication and Information program has mounted courses on the use of Greenstone for building collections, in Bangalore (India), Almaty (Kazakhstan), Dakar (Senegal), and Suva (Fiji). Courses sponsored by other sources have been mounted in Colima (Mexico), Pune and Kozhikode (India), Havana, Cape Town, Shanghai, Singapore, and Arusha (Tanzania). There have also been many courses in the USA and Europe.

These workshops promote the development and sharing of digital library collections using Greenstone. The aim of the UNESCO workshops is to train trainers, and participants are expected to promote digital library collection development by conducting similar programs in their countries. Attendees are supplied with a full set of teaching material, in printed form and also on CD-ROM, that they can use for local courses back home.

To give a specific early example, the first Fiji workshop took place in November 2003, sponsored jointly by the University of the South Pacific Library. It was attended by 15 participants from 7 countries in the Pacific region: Fiji, Marshall Islands, Papua New Guinea, Samoa, Solomon Islands, Tonga, and Vanuatu. Most of the participants were librarians and library systems personnel from national and educational institutions in these countries. The workshop was designed and conducted by members of the New Zealand Digital Library project. The content was based on an earlier workshop at NCSI, Indian Institute of Science, Bangalore.

The workshop covered the following aspects: overview of Greenstone's features, capabilities, and applications; platforms, installation, and configuration; using the Librarian Interface to build collections and add and use metadata; advanced features of the Interface; sample collections; multilingual support; new interface languages. Most of the presentation and demonstration sessions were followed by laboratory sessions where the participants experimented with Greenstone through carefully designed exercises. Each participant had a dedicated workstation with the Windows XP operating system. Apart from the printed course material, Greenstone and associated software were distributed to the participants on CD-ROM.

During the lab sessions, each participant built (among other things) a small Greenstone collection of images about tourism in their home country, downloaded from the web, with manually assigned metadata. At the end of the workshop these were all placed on a self-installing Greenstone CD-ROM entitled 'Pacific Tourism,' and copies were made for participants to take away as a memento. The collections were charming: all had striking images, some had maps, and one included an underwater video.

Course evaluation forms were used to assess the effectiveness of presentations and lab exercises. Based on formal and informal feedback from participants, the workshop was successful in imparting a conceptual and practical understanding of the development of digital collections and the use of Greenstone. There was a consensus that further time was needed to cover all the features of the software that participants wanted to learn about, and many expressed interest in a more advanced follow-up workshop. However, opinion was split as to whether additional time would be better spent in presenting new information or reinforcing existing material through more lab exercises. In fact, it would be useful in future to divide attendees on the basis of their prior knowledge and experience and run parallel sessions.

All course material (slightly revised to correct minor errors that were discovered during the workshop) has been made freely available on the Greenstone web site (www.greenstone.org) for others to use in future workshops. Face-to-face training sessions have been especially useful for connecting the technicians and the librarians, who usually only communicate electronically.


18.10 Implications for Users

18.10.1 Shift the User Mindset: Users are Part of the Team

Greenstone has many of the attributes of a 'real' piece of software: its releases have version numbers indicating that it has been in existence a respectable amount of time, regular updates are released, documentation is available, and so forth. 'Proper' software should be accompanied by extensive user support services: help lines, dedicated user support personnel, guaranteed response times for queries. This level of user support is simply not possible for a product created in a university research lab.

So as we make it easy for users to contribute, we must also try to shift the mindset of users: to let users see that they do have something to contribute to Greenstone. One part of this shift comes from literally allowing users to join the Greenstone family: a donation allows an individual or a group to become a 'Friend of Greenstone,' the advantages of which include additional support and time from a project member to explore that individual's particular requirements.

We also seek to engage users in dialog, to have our users tell us what features of Greenstone they find frustrating or difficult to understand. The aim is to move users from seeing themselves as powerless end-users, to participants in an ongoing usability evaluation and interface design refinement process. To this end, we have created a software framework for this 'participatory usability' approach that allows users to easily report the context of a Greenstone interaction that they find problematic (Nichols et al., 2003).

18.10.2 Make it Easy for Users to Contribute

Most of our users are end-users, typically readers and librarians, rather than programmers. In many ways this has been a disadvantage. We feel that we have not been able to benefit as much as other open source systems do from others helping us develop the system and fix bugs. Many of the responses on our mailing lists come from our own software developers, rather than being contributed by the wider community.

However, we have found users very eager to collaborate and contribute. One way we have benefited from this is in translation of the interface into different languages - an area which the software developers could not possibly tackle by themselves. The sheer magnitude of effort involved in translating a substantial user interface into different languages is staggering. In the case of Greenstone, there are about 700 separate language strings in the reader's interface alone, which has been translated into nearly 40 languages. (There are almost twice as many strings in the librarian's interface, but this has been rendered in only five languages.)

18.11 Implications for Designers

18.11.1 Adapt the Infrastructure to the Strengths of the Respective User Groups

Examining the infrastructure that has evolved with Greenstone, it appears that we have converged on an arrangement called 'central oversight with local empowerment' (COLE) (Woods, 2005). This COLE organization is regarded as best practice in global enterprises for playing to the strengths of the various stakeholders whilst retaining sufficient coordination to benefit from the advantages of consistency. The localization work is pushed away from the core of developers to those places where it can be performed most efficiently: the language communities themselves. The strength of the COLE approach is that locale-specific issues are dealt with by local experts, but because this work is performed within a global standard structure it can easily be reused by other groups (Woods, 2005). For example, although a language interface for use in Brazil needs to be tailored to represent differences from Portuguese, the work is incremental rather than starting from a blank slate.

The COLE structure of Greenstone leaves the technicians free to concentrate on their speciality - designing and coding software - whilst hiding the complexity from other groups. The librarians see a tool that abstracts away from code and represents familiar concepts such as collections and metadata. The translators see language fragments (and their contexts of use) and can effectively ignore many other aspects of the application. The readers see their locale-specific content presented with an appropriate interface that facilitates their access to information.

It is easy for designers and developers to regard universal usability as a challenge for their skill set. The experience of the localization of Greenstone is that involving the users as partners in development can be an effective approach that can spread the workload.


18.11.2 Don't be a Linguistic Perfectionist

Despite our exhortation to cultivate attention to detail in the interface itself, we have been remarkably simplistic in our approach to linguistic support of different languages. For example, stemming is implemented only for English and, in a rather simple form, for French. Alphabetization of browsing lists has not been examined carefully in different languages. It probably does not work well for many languages, and certainly not for ideographic ones from which the very notion of alphabetic ordering is absent. For example, in Chinese we need a Pinyin browser and a stroke-based browser for those older readers who are more comfortable with this representation of character ordering. Greenstone presently has neither. Alphabetical lists of titles in English follow the convention that initial articles (A and The) are ignored in the ordering; this is not extended to other languages.
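The English convention for initial articles is easy to express as a sort key, which also shows why it does not carry over to other languages: the list of articles is language-specific. The Python sketch below is an illustration only, not Greenstone's implementation; the function name and the example titles are invented.

```python
# Articles ignored at the start of English titles, as described above.
ENGLISH_ARTICLES = ("a", "the")


def sort_key(title):
    """Return a lower-case key that skips a leading English article."""
    words = title.split()
    if len(words) > 1 and words[0].lower() in ENGLISH_ARTICLES:
        words = words[1:]
    return " ".join(words).lower()


# Invented titles, for illustration only.
titles = [
    "The Niupepa Collection",
    "First Aid in Pictures",
    "A Guide to Greenstone",
]
ordered = sorted(titles, key=sort_key)
print(ordered)
```

Supporting another language would need at least a different article list, and for scripts without alphabetic order (such as the Pinyin and stroke-based orderings mentioned for Chinese) an entirely different key function.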

In fact, we are surprised that we have been able to get away with this sloppiness. Users submit countless questions and requests for further features to the mailing list, but rarely comment on linguistic issues like this. Despite the attention paid to stemming in information retrieval, most users do not really seem to care - and many do not even know whether their favorite search engines stem query terms or not. As far as non-English languages are concerned, we believe that most users are surprised and grateful to find that the interface is available at all in their own language, and do not worry too much about linguistic deficiencies. It would have been easy for us to get bogged down in trying to cope properly with different languages, to the extent that the software was never released or used. Fortunately, we have avoided this optimality trap.

18.12 Implications for Policymakers

18.12.1 Reap the Benefits of Open Source Software

The elements of localization that extend beyond translating the text are hard for a software developer to predict. A global user base means that there will always be requests for types of customization that the software does not currently provide. However, an open source application, such as Greenstone, allows unlimited customization by technically competent users. The Ulukau digital library shows that localization can be facilitated by an appropriate licensing scheme; it is instructive to contrast Ulukau with previous localization experiences in Hawaii. Warschauer (1998) describes how the Hawaiian language community developed their own software systems because they could not find localized versions appropriate to their needs. In Greenstone's case, the licensing enabled developers to localize incrementally, rather than develop appropriate software from scratch.


Although the philosophy behind Greenstone's development has been one of tool empowerment, it is clear that constructing a completely generic digital library tool is very difficult. Although Greenstone, as distributed, is a very flexible tool, the open source licensing model provides local communities with the potential for fine-grained localization for their specific needs.

Greenstone has received funding from the New Zealand government to aid its development. It is common for funded projects to be asked to justify their economic impact and, of course, Greenstone can be freely downloaded and used. However, a spin-off company, DL Consulting, is now providing support, consulting, and custom collection-building services based around Greenstone software. Policymakers need to recognize that the impact of open source projects can be more indirect and diffuse than that of clearly commercially relevant projects. For example, given Greenstone's work in developing countries it could almost be considered part of New Zealand's overseas development aid.

18.12.2 Provide Support for Development as Well as Research

Greenstone has been created within a research group. While some aspects of Greenstone are in themselves research, the majority of the Greenstone effort has been straightforward software development. But now that it has been created, Greenstone is a research tool: it supports experimentation by ourselves and researchers worldwide in digital library user interfaces, novel approaches to searching and browsing, facilities for multimedia documents, and so forth. The benefits to us as researchers are incalculable, but at times the costs to the group have seemed high, and the path precarious.

Recognition by funding agencies and policymakers of the dual development/research roles that a group must take on for a project such as Greenstone would ease the pain of producing both solid software and solid research. Researchers typically do not have the experience, expertise, or energy to distribute and market their work; it would be invaluable if infrastructure were available for these portions of the software creation cycle that are so far outside normal research activities.

18.12.3 Do not Neglect Training

As we have noted above, training courses are an important part of the overall program of getting Greenstone widely used, particularly in developing countries. It is easy for software developers to neglect this because they are used to dealing with other software specialists. In our case the principal users are librarians, and they often need to be gently introduced to the system in a carefully designed training program.


We should probably have identified trainers as a further class of user, with their own requirements. In fact, most training courses have been run by the system developers, or by self-taught users. In future, as we expand our training program, we will probably have to identify the needs of trainers and cater for them as a separate user group.

Our holistic view of software distribution that includes training end-users has been immensely valuable. It has enabled face-to-face contact between developers and users which affords the developers new insights into how people sometimes struggle valiantly with the software - problems that may not be discussed in other fora, such as mailing lists. Running training courses has also given us an immense amount of personal satisfaction.

18.13 Implications for Researchers

18.13.1 Partner with Global Organizations

Partnering with global organizations can provide an invaluable source of expertise and resources. In August 2000 the Greenstone project entered into a partnership with UNESCO (and also with the Belgium-based Human Info NGO). Through its Information for All program, UNESCO recognizes that digital libraries are radically reforming how information is acquired and disseminated in its partner communities and institutions in the fields of education, science, and culture around the world, and particularly in developing countries. UNESCO distributes the Greenstone software widely in developing countries with the aim of empowering users, particularly in universities, libraries, and other public service institutions, to build their own digital collections. Their hope is that this software will encourage the effective deployment of digital libraries to share information and place it in the public domain.

The partnership with UNESCO has been a crucial feature of the development and internationalization of Greenstone. They give universal credibility to the international branding of Greenstone. They have strongly encouraged us to take a global outlook (particularly in developing countries). They have provided us with contacts, with multilingual resources, and (though to a far lesser extent) a limited amount of seed funding.

For example, all of the Greenstone documentation has, with the aid of UNESCO, been translated into Spanish, French, and Russian. This includes not just the end-user reader's interface, but also the full documentation for building collections; all buttons, menu items, and online help in the Greenstone Librarian Interface, which is used for building collections; and all error and warning messages. For example, some of the output from our Perl scripts is shown to the user to indicate the status of collection-building operations or to give warnings, and this output has also been translated. To produce a comprehensive piece of interactive software, with full documentation, in four languages is something far beyond our own resources.

The global outlook of Greenstone has been facilitated through UNESCO cooperation and has provided momentum to ensure that the software can be successfully localized. The readers and librarians have benefited from the multilingual focus that a large organization such as UNESCO has brought to the software development. In effect, UNESCO has acted as a user champion to ensure the thorough internationalization of the software.

18.14 Future Directions

The Greenstone project is walking a delicate tightrope between producing and promulgating socially useful software on the one hand and undertaking cutting-edge computer science research on the other. The current 'production' version of the system, which we call Greenstone2, was designed over five years ago and has reached a stage where it is comprehensive, mature, and reliable. Nevertheless, new requirements continue to emerge. For example, as we write we are extending a new facility for incorporating CDS/ISIS databases into Greenstone collections, a format that is unknown in the West but widely used in developing countries for storing bibliographic records, even in major libraries. To debug this facility we must collaborate closely with users in developing countries: at home there are no CDS/ISIS users.

We are also striving to improve the software engineering methodology used to develop the system. Refactoring is the process of restructuring code to improve its design in a way that does not alter its observable behavior (Fowler et al., 1999). The improved design is intended to make further enhancements to the software easier, and to reduce the possibility of introducing new bugs or unintended side effects. Backwards compatibility has become a watchword for Greenstone2 development, and is motivating the design of a refactoring tool that takes advantage of aspects peculiar to digital library software. Creating a digital library collection involves generating many static files, such as indexes and database tables, that encapsulate much of the external behavior of the digital library. Spotting differences between files generated by original and refactored code allows developers to identify potential errors in the revised version. This provides a kind of regression testing facility.
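The file-comparison idea can be illustrated with a minimal Python sketch. It is not the Greenstone refactoring tool: the function names and the choice of SHA-256 hashing are assumptions made for illustration, and it presumes that the two builds are byte-for-byte reproducible (generated files that embed timestamps, for instance, would need to be normalized first).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_builds(original: Path, refactored: Path) -> Dict[str, List[str]]:
    """Report files that are missing, extra, or changed in the refactored build.

    Both arguments are hypothetical directories holding the static files
    (indexes, database tables, and so on) produced by each version of the code.
    """
    before = file_digests(original)
    after = file_digests(refactored)
    shared = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "extra": sorted(set(after) - set(before)),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }
```

Building the same collection once with the original code and once with the refactored code, and then calling `compare_builds` on the two output directories, gives an empty report when the refactoring preserved the generated files; any entry in the report is a candidate regression to inspect.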

Alongside the production version of Greenstone we are working on a radical redesign and complete reimplementation, called Greenstone3, informed by end-users' and collection developers' experience over the last decade. This provides flexible ways of dynamically configuring the run-time system and adding new services. It modularizes the internal structure and simplifies the addition of new modules. It is written in Java to promote portability, dynamic loading of objects, and internationalization. Modules communicate by streaming XML messages between each other. Using SOAP, this communication can be distributed across a network. All modules have the ability to describe themselves in a machine-readable form, and to apply an XSLT to transform messages. This is instrumental in providing different levels of configurability, an important ability given the different types of people involved in the lifecycle of a digital library.
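As a rough illustration of modules that exchange XML messages and can describe themselves, consider the following Python sketch. It does not reproduce the Greenstone3 message format or API: the class name, the element names (`message`, `request`, `response`, `service`, `requestType`), and the request types are all hypothetical, and SOAP transport and XSLT transformation are left out.

```python
import xml.etree.ElementTree as ET


class EchoService:
    """A toy module that accepts and returns XML messages as strings."""

    name = "echo"

    def process(self, message_xml: str) -> str:
        # Every message is assumed to wrap a single <request> element.
        request = ET.fromstring(message_xml).find("request")
        response = ET.Element("response", {"from": self.name})
        if request is None:
            response.set("error", "no request element")
        elif request.get("type") == "describe":
            # Self-description: list the request types this module understands.
            service = ET.SubElement(response, "service", {"name": self.name})
            for kind in ("describe", "process"):
                ET.SubElement(service, "requestType").text = kind
        elif request.get("type") == "process":
            # The "service" itself: return the supplied text in upper case.
            text = request.findtext("text") or ""
            ET.SubElement(response, "text").text = text.upper()
        else:
            response.set("error", "unknown request type")
        message = ET.Element("message")
        message.append(response)
        return ET.tostring(message, encoding="unicode")
```

Because the module's only interface is a string of XML in and a string of XML out, a caller can first send a `describe` request to discover what the module offers, and the same messages could in principle be carried over a network or passed through a stylesheet without changing the module.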

In keeping with our original philosophy of laying the basic groundwork first, Greenstone3 already provides full backwards compatibility with collections built with Greenstone2. It can serve collections built under the old regime without any modification at all, and present them in a way that looks indistinguishable to readers. At present we recommend that librarian-level users work with Greenstone2, secure in the knowledge that their collections will continue to operate when they eventually upgrade to the new system. We recommend that computer science-level users who wish to develop new services, corresponding to new ways of accessing and presenting information, work with Greenstone3.

This twin approach is intended to satisfy users in developing countries who want reliable, easy-to-use software that runs on low-end hardware (Greenstone2), as well as high-end research-level users who enjoy sophisticated computer environments (Greenstone3). Our long-term research aim is to encourage digital library researchers, including ourselves, to make their new facilities available to real end-users. Our long-term service aim is to do this in a way that will eventually allow people in all corners of the world to benefit from the advances.

18.15 Conclusion

Providing truly international digital library collections requires a collaboration infrastructure that connects the librarians, technicians, and translators. Greenstone can provide end-users (the readers) with multilingual documents using different interface languages. However, in terms of universal usability the software developers have to consider the librarians as a separate set of users, whose goal is to produce collections for the readers. A current goal of research on Greenstone is to address the needs of this group of users, making it easier for them to adapt and localize collections without being overwhelmed by the technical details of server-side collection customization. The recent results (Keegan and Cunningham, 2005a,b) showing that collection interfaces translated into indigenous languages are actually used in those languages provide the Greenstone development team with additional rationale for easing the process of collection customization.

Greenstone has adapted its multilingual support to accommodate many languages; in doing so it has simplified usage for technicians, translators, and librarians. As the user base has become more diversified, the range of contexts of use has become greater, and consequently the software has had to adapt to these diverse conditions. As a result of coping with these 'edge' conditions, the software is now increasingly robust in normal usage. This experience supports Shneiderman's (2000) belief that a 'broader spectrum of usage situations forces researchers to consider a wider range of designs and often leads to innovations that benefit all users.' In fact, the multilingual nature of Greenstone has encouraged a more diverse user population than an English-only application would have; the effects of this diversity of use are felt throughout the application and are not confined to the language components. In this way the benefits arising from addressing the universal usability challenge are distributed to all the user groups: the translators, the technicians, the readers, and the librarians.

References

Apperley, M., Keegan, T.T., Cunningham, S.J. and Witten, I.H. (2002) Delivering the Maori-language newspapers on the Internet. In Curnow, J., Hopa, N. & McRae, J. (Eds.) Rere atu, taku manu! Discovering History, Language and Politics in the Maori-Language Newspapers, Auckland University Press, Auckland, New Zealand, 211-232.

Aykin, N. (2005) Overview: where to start and what to consider. In Aykin, N. (Ed.) Usability and Internationalization of Information Technology, Lawrence Erlbaum Associates, Mahwah, NJ, 3-20.

Bainbridge, D., Edgar, K.D., McPherson, J.R. and Witten, I.H. (2003) Managing change in a digital library system with many interface languages. Proceedings of the 7th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2003), Springer-Verlag, Berlin, Germany, LNCS 2769, pp. 350-361.

Crystal, D. (2000) Language Death. Cambridge University Press, Cambridge.

Deo, S., Nichols, D.M., Cunningham, S.J. et al. (2004) Digital library access for illiterate users. Proceedings of the International Research Conference on Innovations in Information Technology (IIT 2004), UAE University, Dubai, UAE, pp. 506-516.

Fowler, M., Beck, K., Brant, J. et al. (1999) Refactoring: Improving the Design of Existing Code. Addison-Wesley, Reading, MA.

Hogan, J.M., Ho-Stuart, C. and Pham, B. (2004) Key challenges in software internationalisation. Proceedings of the Australasian Workshop on Software Internationalisation (AWSI 2004). ACSW Frontiers 2004: Conferences on Research and Practice in Information Technology, Volume 32, Australian Computer Society, Inc., Sydney, Australia, pp. 187-194.

Hutchinson, H.B., Rose, A., Bederson, B. et al. (2005) The International Children's Digital Library: a case study in designing for a multi-lingual, multi-cultural, multi-generational audience. Information Technology and Libraries, 24(1), 4-12.

Keegan, T.T. and Cunningham, S.J. (2005a) Language preference in a bi-language digital library. Proceedings of the Joint Conference on Digital Libraries (JCDL '05), ACM Press, New York, pp. 174-175.

Keegan, T.T. and Cunningham, S.J. (2005b) What happens if we switch the default language of a website? Proceedings of the 1st International Conference on Web Information Systems and Technologies (WEBIST '05), INSTICC, Setubal, Portugal, pp. 263-269.

Lagoze, C., Payette, S., Shin, E. and Wilper, C. (2006) Fedora: an architecture for complex objects and their relationships. International Journal on Digital Libraries, 6(2), 124-138.

Nichols, D.M., McKay, D. and Twidale, M.B. (2003) Participatory usability: empowering proactive users. Proceedings of the 4th Annual Conference of the ACM Special Interest Group on Computer Human Interaction - New Zealand Chapter (CHINZ '03), ACM SIGCHI New Zealand, Dunedin, New Zealand, pp. 63-68.

Nichols, D.M., Witten, I.H., Keegan, T.T. et al. (2005) Digital libraries and minority languages. New Review of Hypermedia and Multimedia, 11(2), 139-155.

Niupepa: Maori newspapers (2005) Available online at: http://nzdl.org/niupepa (accessed 25 October 2005).

Perugini, S., McDevitt, K., Richardson, R. et al. (2004) Enhancing usability in CITIDEL: multimodal, multilingual and interactive visualization interfaces. Proceedings of the Joint Conference on Digital Libraries (JCDL '04), ACM Press, New York, pp. 315-324.

Perlman, G. (2000) The FirstSearch user interface architecture: universal access for any user, in many languages, on any platform. Proceedings of the Conference on Universal Usability (CUU '00), ACM Press, New York, pp. 1-8.

Purvis, M., Hwang, P., Purvis, M. et al. (2001) A practical look at software internationalisation. Journal of Integrated Design Processes in Science, 5(3), 79-90.

Savarimuthu, B.T.R. and Purvis, M. (2004) Towards a multi-lingual workflow system - a practical outlook. Proceedings of the Australasian Workshop on Software Internationalisation (AWSI 2004). ACSW Frontiers 2004: Conferences on Research and Practice in Information Technology, Volume 32, Australian Computer Society, Inc., Sydney, Australia, pp. 205-210.

Shneiderman, B. (2000) Universal usability. Communications of the ACM, 43(5), 85-91.

Tansley, R., Smith, M. and Walker, J.H. (2005) The DSpace open source digital asset management system: challenges and opportunities. Proceedings of the 9th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2005), Springer-Verlag, Berlin, Germany, LNCS 3652, pp. 242-253.

Tennant, R. (2002) The digital librarian shortage. Library Journal, 127(5), 32.

Ulukau: the Hawaiian Electronic Library (2005) Available online at: http://ulukau.org (accessed 25 October 2005).

The Unicode Consortium (2000) The Unicode Standard, Version 4.0. Addison-Wesley, Boston, MA.

Warschauer, M. (1998) Technology and indigenous language revitalization: analyzing the experience of Hawai'i. Canadian Modern Language Review, 55(1), 140-161.

Witten, I.H. and Bainbridge, D. (2003) How to Build a Digital Library. Morgan Kaufmann, San Francisco, CA.

Witten, I.H., Cunningham, S.J., Vallabh, M. and Bell, T.C. (1995) A New Zealand digital library for computer science research. Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries (DL '95), Texas A&M University, Austin, TX, pp. 25-30.

Witten, I.H., Loots, M., Trujillo, M.F. and Bainbridge, D. (2002) The promise of digital libraries in developing countries. The Electronic Library, 20(1), 7-13.

Woods, J. (2005) Managing multicultural content in the global enterprise. In Aykin, N. (Ed.) Usability and Internationalization of Information Technology, Lawrence Erlbaum Associates, Mahwah, NJ, 123-154.

Yeo, A.W. (2001) Global-software development lifecycle: an exploratory study. Proceedings of the Conference on Human Factors in Computing Systems (CHI '01), ACM Press, New York, pp. 104-111.