

Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 141–150, Honolulu, Hawai‘i, USA, March 6–7, 2017. ©2017 Association for Computational Linguistics

Waldayu and Waldayu Mobile: Modern digital dictionary interfaces for endangered languages

Patrick Littell
Carnegie Mellon University
Language Technologies Institute
5000 Forbes Ave.
Pittsburgh PA 15213
[email protected]

Aidan Pine
University of British Columbia
Department of Linguistics
2613 West Mall
Vancouver, BC V6T 1Z4
[email protected]

Henry Davis
University of British Columbia
Department of Linguistics
2613 West Mall
Vancouver, BC V6T 1Z4
[email protected]

Abstract

We introduce Waldayu and Waldayu Mobile, web and mobile front-ends for endangered language dictionaries. The Waldayu products are designed with the needs of novice users in mind – both novices in the language and technological novices – and work in tandem with existing lexicographic databases. We discuss some of the unique problems that endangered-language dictionary software products face, and detail the choices we made in addressing them.

1 Introduction

Lexicographers have noted that with the increase in access to digital technology, “lexicography is clearly at a turning point in its history” (Granger and Paquot, 2012). While the changes that technology presents to lexicography are relevant to non-endangered languages as well, there exists a unique set of challenges in developing lexicographic materials for endangered languages in particular. We identify and address two of these fundamental and perennial difficulties.

1. At least in the North American context (and likely elsewhere), there are relatively few potential users who are both fluent in the language and trained in a systematic orthography. Many users are students who have not yet achieved fluency in the language they are searching.

2. Lexicographic efforts have, in many communities, taken place over generations by various scholars, using a variety of formats, orthographies, and assumptions, leading to data sets that are often very heterogeneous.

To address these issues, we have developed Waldayu and Waldayu Mobile. Waldayu is an orthographically-aware dictionary front-end with built-in approximate search, and a plugin architecture to allow it to operate with a variety of dictionary formats, from the output of advanced back-ends like TLex (Joffe and de Schryver, 2004), to semi-structured HTML like online word lists, to simple word/definition spreadsheets. Waldayu Mobile is a mobile implementation of Waldayu which is compatible with both Android and iOS devices.

Waldayu and Waldayu Mobile were originally developed to provide online and mobile interfaces to a forthcoming Gitksan (Tsimshianic) e-dictionary, but are intended to be language-neutral and have since been expanded to St’at’imcets (Salish), Nuu-chah-nulth (Wakashan), Sliammon (Salish), Squamish (Salish), Thangmi (Sino-Tibetan), and Cayuga (Iroquoian).

In Section 3, we discuss some of the user experience principles we have adopted, and in Section 4 discuss our challenges and solutions in adapting these to mobile devices. In Section 5, we discuss our implementation of approximate search.

2 A reusable front-end for novice users

Waldayu is primarily intended as a front-end solution to the perennial “Dictionary problem”: that dictionaries are fundamentally a language-learning tool but require a certain level – sometimes an advanced level – of language knowledge to use in the first place. This is particularly apparent in our field context, the North American Pacific Northwest, where the sheer phonological and morphological complexity of the languages makes traditional print dictionary use particularly difficult (Maxwell and Poser, 2004).

We can observe this when, to give a real-life example, a user is trying to look up the word for “coyote” (snk̓y̓ep) in Thompson’s monumental print dictionary of Nłeʔkepmxcín (Thompson, 1996). To successfully find this word, the user must know that it is alphabetized under k̓ rather than s or n (which are prefixes/proclitics) and that k̓ and y̓ are distinct from k and y for the purposes of collation. With thousands of nouns starting with s and about ten phonemes that might be confused with k (both pan-Salishan traits), most words in the dictionary cannot easily be found by novice users.

This is not a specific criticism of Thompson’s lexicographic choices; complex languages require the lexicographer to make difficult decisions about orthography, morphology, morphophonology, and collation, and any decision they make about these will pose difficulty for some segment of novice users.

However, modern technology – particularly approximate search – allows us a way to meet the user halfway, by letting the system itself be aware of complications (prefixes/proclitics, easily-confused sounds, orthographic difference, etc.) that novices have yet to master.

Waldayu is not, however, intended to be a replacement for a mature, collaborative lexicographic software solution such as TLex, Kamusi (Benjamin and Radetzky, 2014), or the Online Linguistic Database (Dunham, 2014). Most lexicographic teams we have encountered already have preferred back-ends, file formats, and workflows; solving the “dictionary problem” for users should not require teams to abandon solutions into which they have already invested time, effort, and resources. Rather, Waldayu and Waldayu Mobile are intended to serve as a lightweight, uncluttered online front-end for novice users, while expert users can make use of more advanced functionality offered by a mature lexicographic database.

3 User experience and interaction

There are five user experience principles that we attempted to make consistent throughout both products, so as to remove barriers for novice users who may not be familiar with online dictionaries, or online interfaces in general.

3.1 Consistent control behaviours

Each control (button, search box, link, etc.) has only a single function, and no controls change their behaviour depending on the settings of other controls. For example, there is no Gitksan-English/English-Gitksan toggle that changes the behaviour of the search box; this convention, although ubiquitous in online bilingual dictionaries, is a frequent source of user error even for experienced users.

Rather, Waldayu utilizes a double search bar, in which user input in the left search box searches the primary language (e.g., Gitksan) while the right search box searches in the secondary language (e.g., English).[1] This parallels the two most frequent user tasks: using English as a query language to find an entry in a target language dictionary, and using a target language as a query language to find an English definition.

[1] This is superficially similar in appearance to the two-box interface used by Google Translate, but both boxes accept user input, and the user cannot swap the boxes or otherwise change what languages they represent.

Figure 1: Screenshot of Waldayu as the Gitksan/English Online Dictionary

Figure 2: Reduced Waldayu control layout embedded at the top of each page

3.2 Immediate responses

In addition to the above, we try to avoid requiring other “two step” user inputs (e.g., typing a query and then hitting “search”). Instead, we attempt to provide user inputs with immediate feedback à la Google Instant search or other “AJAX”-style client and server interfaces (although as noted in Section 3.5, Waldayu does not in fact have a client-server architecture).

3.3 Consistent visual metaphors

In both controls and presentation, consistent visual metaphors are maintained. We borrow the fundamental metaphor from the FirstVoices online word lists (First Peoples’ Cultural Council, 2009) – the online interface most familiar to our users – that the primary language (e.g., Gitksan) is always in the left column, and the secondary language (e.g., English) is always in the right column. This metaphor is maintained in the search interface (as seen in Fig. 1, the left search box searches the primary language, the right search box searches the secondary language), in browsing interfaces like the random word page (Fig. 3), and in entry pages (with primary language keywords and examples on the left, definitions and commentary on the right).

This horizontal visual metaphor is augmented by a colour scheme that presents the target language as green-blue (#066) and English as dark gray (#333); like other metaphors, these colours are preserved throughout the site.

Figure 3: Position and colour metaphors maintained in a “browsing” page

3.4 Continued presence of controls

The major controls and links are present on every page, and work the same way on each page. The interface of the start page (a horizontal bar with links to free browsing, random browsing, etc., the page title, and the dual search bar) is embedded (with a reduced layout, as seen in Fig. 2) and fully functional at the top of every page.

This is intended to make it essentially impossible to “get lost” within the site; there is no question of how to return to the site’s core functionality because every page has that functionality.

3.5 Connection-independence

Many of our user communities are located in remote regions of Canada, and some users do not have home internet connections or reliable mobile data access. While the initial connection to the Waldayu site, or the initial download of Waldayu Mobile, requires an internet connection, subsequent uses should not.[2]

While Waldayu appears to be a modern AJAX client-and-server web application, in actuality the Waldayu engine compiles a single HTML file, containing the dictionary itself in JSON format and JavaScript code that emulates a multi-page website. This way, the page can be downloaded and used offline on any platform; it has no installation steps, external files, or any prerequisites (save, of course, for a reasonably recent web browser).
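The single-file idea can be sketched as a small build step. This is an illustrative sketch only, not the Waldayu engine: the template, the `build_page` function, the `__DATA__` placeholder, and the `word`/`definition` field names are all assumptions made for the example.

```python
import json

# Hypothetical page template; the actual Waldayu template is not described in the text.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Dictionary</title></head>
<body>
<input id="query" placeholder="Search">
<ul id="results"></ul>
<script>
const ENTRIES = __DATA__;
document.getElementById("query").addEventListener("input", (event) => {
  const q = event.target.value.toLowerCase();
  const hits = q ? ENTRIES.filter((e) => e.word.toLowerCase().includes(q)) : [];
  const list = document.getElementById("results");
  list.textContent = "";
  for (const e of hits) {
    const item = document.createElement("li");
    item.textContent = e.word + ": " + e.definition;
    list.appendChild(item);
  }
});
</script>
</body>
</html>
"""


def build_page(entries):
    """Embed the dictionary as JSON inside one self-contained HTML page."""
    data = json.dumps(entries, ensure_ascii=False)
    # Escape "</" so that entry text can never close the <script> element early.
    data = data.replace("</", "<\\/")
    return PAGE_TEMPLATE.replace("__DATA__", data)


page = build_page([{"word": "gan", "definition": "tree"}])
```

Because the data and the search code travel in the same file, the saved page keeps working without a server; the trade-off is that the whole dictionary is downloaded on the first visit.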

4 The mobile user experience

Exporting Waldayu’s user experience to mobile devices could not be done wholesale. While most laptop screens have a minimum of ~700px in width, mobile devices are typically in the ~300px range, with the result that maintaining the principles in Section 3 often required different interface decisions.

[2] Or, more precisely, as much functionality as possible should be retained even if the internet connection is lost. Some multimedia capability is lost when Waldayu loses internet access, but the basic search functionality remains.

4.1 Side menus

While a navigation bar that itemizes each individual page associated with the site was an appropriate organization for Waldayu, the navigation bar became too cluttered for mobile devices, especially considering Waldayu Mobile has additional pages such as a “Flashcards” page. For this reason, navigation for Waldayu Mobile was translated into a side-menu that is accessible by tapping the three-bar icon (what is sometimes called “hamburger” in mobile interface jargon) in the upper left corner of the screen, or swiping to the right on a touch screen. This is seen below in Figure 4.

Figure 4: Side menus

4.2 A unified search bar

Similarly, maintaining Waldayu’s horizontal metaphor for search bars in Waldayu Mobile would force search bars to be too small (i.e., less than 150px wide) for a positive mobile user experience. The option to force users to turn their devices and search with a ‘landscape’ (horizontal) orientation was also rejected, because users would no longer be able to see results while typing a query. They would have to search with a horizontal orientation, and view results either by scrolling or by switching to a vertical orientation, which would go against our UX principles in Section 3.

Initially, two solutions were created. The first kept both search bars and dynamically changed their widths depending on whether a user tapped the left or right search bar. However, this still proved to limit the size of the search bar too severely.

The second solution was to use only a single search bar and a language selector; while this does not achieve the principle in Section 3.1, we hypothesized that increased attention to the visual metaphors in Section 3.3 might alleviate the difficulties this interface can cause. The language-selection radio buttons preserved the aforementioned horizontal metaphor (L1 on the left, L2 on the right) and dynamically changed the colour scheme of the search box to either the L1 or L2 colour.

While this solved the issues related to screen size, user testing revealed difficulties with the “two step” process, in which users first need to consider what type of search they wish to perform and then select a button, which in turn changes the behaviour of the search bar. Likely, this is a result of habituation to Google-style search bars, where users can type any sort of query and expect accurate and relevant results.

Ultimately, this led us to develop a single search bar which searches the query term in both languages. The L1 and L2 results are presented together (still preserving the aforementioned visual metaphors), but with additional highlighting indicating, for each result, whether the match is on the L1 side or the L2 side. This is seen below in Figure 5.

Figure 5: Highlighted search results in Cayuga (Ranjeet and Dyck, 2011) (left) and English (right)
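The unified search bar can be illustrated with a minimal sketch that searches both sides of every entry and records which side matched, so that an interface could highlight it. The function name `unified_search`, the result fields, and the sample entries are assumptions for illustration, and plain substring matching stands in for the approximate search of Section 5.

```python
def unified_search(query, entries):
    """Search L1 and L2 together; report which side(s) of each entry matched.

    entries: list of (l1_word, l2_definition) pairs.
    """
    q = query.lower()
    results = []
    for l1, l2 in entries:
        matched = []
        if q in l1.lower():
            matched.append("L1")  # would be highlighted on the left
        if q in l2.lower():
            matched.append("L2")  # would be highlighted on the right
        if matched:
            results.append({"l1": l1, "l2": l2, "matched": matched})
    return results


# Sample entries: glosses taken from the examples in this paper.
ENTRIES = [("gan", "tree"), ("noosik", "caterpillar")]
hits_l2 = unified_search("tree", ENTRIES)
hits_l1 = unified_search("noo", ENTRIES)
```

Keeping the matched side as data, rather than deciding the query language up front, is what lets one search box serve both directions.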

5 Approximate Search

In this section, we consider the implementation of approximate search within Waldayu and Waldayu Mobile.

There are four primary reasons why an approximate search is not simply a convenient feature for endangered language dictionaries, but a necessary one.

1. Some users, even if they can distinguish all the phonemes of the language (e.g., /k/ vs. /q/), do not always know the orthographic convention used to encode this difference.

2. Other users – particularly students – may know the orthographic convention but be unable to reliably discern certain phonological differences in their target language, resulting in systematic and predictable errors in spelling.

3. Many users will not have easy access to a keyboard which is able to type the required characters, whether for lack of a language-specific keyboard, difficulty with keyboard installation, or because they are using a mobile device.[3]

4. Many North American Indigenous languages have multiple orthographies, sometimes historical, but sometimes in current competition with each other. Different regions, generations, scholars, and schools have produced materials using different orthographic conventions, and users may have learned any of them, or may be attempting to enter data from material written in a different orthography.

Sometimes these issues are compounded, in that a user might type either “kl” or “tl” as an approximation of /ň/. Sometimes, though, the correct sequence of characters is perfectly valid using a standard English keyboard, but the user has difficulty hearing the distinction, as is the case with glottalized resonants (e.g. w̓ vs. w) in many languages of the Pacific Northwest.

[3] FirstVoices (First Peoples’ Cultural Council, 2009) recently developed an Android and iOS Keyboard app that allows users to type in over 100 endangered languages in Canada, the US and New Zealand.

Figure 6: Gitksan-English Online Dictionary returning noosik and naasik’ as possible matches to user query nosek.

To address these issues, we implemented several approximate search algorithms. The most important criterion – aside from being fast enough to return results to users in a reasonable amount of time – was that this algorithm should either not require parameterization for a particular language, or should be able to be parameterized by a non-programmer (so that lexicographers can adapt the Waldayu software to a new language without hiring a programmer or having to contact the developers).[4] Ideally, a lexicographer should be able to adapt Waldayu using only familiar consumer software (like a spreadsheet program) rather than modifying the code or composing a structured data format directly.

[4] The third possibility is that the system learns appropriate parameters directly from the data. While this is promising for future work, we did not have, for any of our development languages, an appropriate corpus of user-transcribed data from which a system could learn orthographic correspondences.

5.1 Unweighted Levenshtein search

We began with a simple Levenshtein distance algorithm (Levenshtein, 1966), in which each word is compared to the user query and ranked according to the number of single-letter edits needed to change the query into the word. This comparison has the benefit that it can be expressed as a finite-state automaton (Schulz and Mihov, 2002) and, after construction of the automaton itself, run in linear time over the length of the query in characters, making search results nearly instantaneous regardless of the size of the dictionary.

Simple Levenshtein distance, however, is insufficiently discriminative to give useful results to users; for example, the Gitksan word for tree, g̱an (/ɢan/), sounds similar to English “gun”, but given the query gun, an unweighted Levenshtein algorithm has no reason to order the result g̱an above the result din, since these both differ from “gun” by two edits.
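The tie described above can be reproduced with a textbook dynamic-programming edit distance. This is a minimal sketch, not the Waldayu implementation; it assumes that words are pre-split into letters and that the initial consonant of the Gitksan word for tree is a uvular letter distinct from plain g (written here as g plus the combining mark U+0331).

```python
def levenshtein(a, b):
    """Unweighted edit distance between two sequences of letters."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y, or match
            ))
        prev = curr
    return prev[-1]


UVULAR_G = "g\u0331"  # assumption: the uvular stop is one letter, distinct from "g"
query = ["g", "u", "n"]
tree = [UVULAR_G, "a", "n"]
other = ["d", "i", "n"]

d_tree = levenshtein(query, tree)
d_other = levenshtein(query, other)
```

Both distances come out equal, so a ranking based on this measure alone cannot prefer the phonetically closer word.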

5.2 Weighted Levenshtein search

To address this, we then parameterized edit costs such that more similar and frequently-confused characters (g and g̱, a and u) accrue a lesser penalty, thereby ranking more-similar words higher, along the lines of Needleman and Wunsch (1970), Kondrak (2000), or Rytting et al. (2011).[5] Users could parameterize these penalties using a simple spreadsheet, in which they identified classes of similar sounds and chose [0.0-1.0] penalties within these classes.
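A minimal sketch of class-based substitution penalties follows. The classes and the penalty values (0.2 and 0.4) are invented for illustration, as are the names `make_cost` and `weighted_levenshtein`; in practice such values would come from the lexicographer's spreadsheet.

```python
def make_cost(classes):
    """Build a substitution-cost function from (letters, penalty) classes."""
    def cost(x, y):
        if x == y:
            return 0.0
        for letters, penalty in classes:
            if x in letters and y in letters:
                return penalty  # confusable pair: reduced penalty
        return 1.0              # unrelated letters: full penalty
    return cost


def weighted_levenshtein(a, b, sub_cost, indel=1.0):
    """Edit distance with per-pair substitution costs."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,
                curr[j - 1] + indel,
                prev[j - 1] + sub_cost(x, y),
            ))
        prev = curr
    return prev[-1]


UVULAR_G = "g\u0331"  # assumption: uvular stop written as g + U+0331
# Hypothetical similarity classes and penalties.
CLASSES = [({"g", UVULAR_G}, 0.2), ({"a", "u"}, 0.4)]
cost = make_cost(CLASSES)

d_tree = weighted_levenshtein(["g", "u", "n"], [UVULAR_G, "a", "n"], cost)
d_other = weighted_levenshtein(["g", "u", "n"], ["d", "i", "n"], cost)
```

With these penalties the word for tree scores about 0.6 against the query gun while din still scores 2.0, so the tie from the unweighted comparison is broken in favour of the similar-sounding word.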

While this approach proved adequate for our sample dictionaries, it did not scale well to larger dictionaries, leading to inappropriately long search times in practice, particularly on mobile devices.[6] Searching a five-letter word in a dictionary of about 9,000 words took up to 9 seconds on an iPhone 4.

[5] We also reduced the relative penalties for deletions and insertions at the beginnings and ends of words; this had the effect of allowing search for subword strings such as roots.

[6] As noted in Section 3.5, Waldayu ensures basic functionality even while offline. Search is therefore performed entirely on the client side, to ensure a consistent user experience online and offline.

5.3 Unweighted Levenshtein search on “comparison” forms

We therefore adopted a hybrid approach that leverages Waldayu’s built-in orthographic conversion tools, in which we run an unweighted Levenshtein comparison between “orthographies” that purposely fail to represent easily-confused distinctions.

As background, Waldayu is designed to be orthography-independent, since it is designed to be able to take in heterogeneous resources that are not necessarily in the same orthography or the orthography that the user expects. Non-programmer lexicographers can specify orthographic correspondences using a simple table like that in Table 1, which Waldayu can read and compile into a finite state transducer.[7]

kw     kʷ
ḵ      q
ḵw     qʷ
x̱      χ
x̱w     χʷ

Table 1: Sample orthographic correspondence table
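How a correspondence table of this kind can drive a converter is sketched below. A greedy longest-match rewriter stands in for the compiled finite state transducer, which the text does not describe in detail; `compile_table` and the row values are assumptions for illustration (underlined letters for uvulars, raised w for rounding).

```python
def compile_table(rows):
    """Turn (source, target) rows into a converter that applies the
    longest matching source string at each position."""
    table = dict(rows)
    longest = max(len(src) for src in table)

    def convert(text):
        out, i = [], 0
        while i < len(text):
            for n in range(min(longest, len(text) - i), 0, -1):
                chunk = text[i:i + n]
                if chunk in table:
                    out.append(table[chunk])
                    i += n
                    break
            else:
                out.append(text[i])  # no rule applies: copy the character
                i += 1
        return "".join(out)

    return convert


# Hypothetical rows in the spirit of Table 1.
ROWS = [
    ("kw", "k\u02b7"),
    ("\u1e35", "q"),
    ("\u1e35w", "q\u02b7"),
    ("x\u0331", "\u03c7"),
    ("x\u0331w", "\u03c7\u02b7"),
]
convert = compile_table(ROWS)
```

Trying longer source strings first is what keeps a digraph such as kw from being rewritten as k followed by a stray w.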

In addition to specifying genuine orthographic transformations, the user can also specify transformations into one or more “comparison” orthographies, in which easily-confused (or difficult to enter on a keyboard) distinctions are not represented. Comparing words by collapsing similar sounds into a small number of equivalence classes is a technique of long standing in lexicography (Boas and Hunt, 1902),[8] information retrieval (Russell, 1918) and historical linguistics (Dogolpolsky, 1964).

[7] Programmer-lexicographers, on the other hand, can write orthographic transformation plugins if they require transformations more sophisticated than can be expressed in a tabular format.

[8] Presuming that he likely made mistakes in differentiating the similar sounds of the Kwak’wala language, and that users of the dictionary would face similar confusions, Boas collated the glossary of the Kwakiutl Texts according to equivalence classes, so that, for example, all lateral fricatives and affricates were collated together.

In Waldayu, the user can specify a Soundex-like (Russell, 1918) transformation of entries and queries; by default Waldayu uses a transformation intended for North American Pacific Northwest languages (“PugetSoundex”), but it can straightforwardly be adapted to other languages. Table 2 illustrates a sample transformation.

k      KY
kw     KW
ḵ      K
ḵw     KW
x      HY
xw     HW
x̱      H
x̱w     HW

Table 2: Sample correspondence table for approximate phonological comparison

For example, in the Gitksan development dictionary, the entry noosik (“caterpillar”) undergoes a transformation into NWSYK, losing the distinctions between vowel length and height, the u/w/rounding and i/y/palatalization distinctions, and the velar/uvular distinction. Meanwhile, a user input like nosek (Fig. 6) would undergo a similar transformation and likewise result in NWSYK. The results of these transformations are then compared using an unweighted Levenshtein distance; since the edit penalties are not continuous, we can implement this as a Levenshtein automaton using the liblevenshtein library (Edwards, 2014). As both orthographic transduction and Levenshtein automata operate in linear time, the resulting system returns results nearly instantaneously (dropping from the 9 seconds reported in Section 5.2 to less than 10 milliseconds).[9]
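The comparison-form idea can be sketched in a few lines. The rule set below is a toy invented for this example and is not the actual “PugetSoundex” table; a plain dynamic-programming edit distance also stands in for the Levenshtein automaton, so the sketch shows the ranking logic but not the speed-up.

```python
# Toy rules for illustration only; NOT the actual "PugetSoundex" table.
RULES = {
    "o": "W", "u": "W", "w": "W",
    "i": "Y", "e": "Y", "y": "Y",
    "a": "A",
    "k": "K", "\u1e35": "K", "g": "K",
    "n": "N", "s": "S",
}


def comparison_form(word):
    """Map a word to a reduced form that ignores easily-confused distinctions."""
    out = []
    for ch in word.lower():
        sym = RULES.get(ch)
        if sym is None:
            continue                   # drop apostrophes and unknown characters
        if not out or out[-1] != sym:  # collapse runs, merging long and short vowels
            out.append(sym)
    return "".join(out)


def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def search(query, entries, max_dist=1):
    """Return (distance, entry) pairs whose comparison forms are close to the query's."""
    target = comparison_form(query)
    scored = [(levenshtein(target, comparison_form(e)), e) for e in entries]
    return sorted(pair for pair in scored if pair[0] <= max_dist)


results = search("nosek", ["noosik", "naasik\u2019", "gan"])
```

Under these toy rules both noosik and nosek reduce to NWSYK, while naasik’ reduces to a form one edit away, which mirrors the pair of matches shown in Figure 6.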

[9] Qualitative evaluation of these approximate search algorithms (e.g., how often does a user query result in their intended word?) will require a larger collection of user-generated text than we currently have, and thus remains to be done.

In practice, all entries in the development dictionary go through two transformations and comparisons, one to a very reduced form like NWSYK and another to a more faithful (although still quite broad) phonological representation. The search algorithm ranks results according to a weighted average of these comparisons. This dual representation allows us to rank entries by both coarse and fine distinctions without using a continuous penalty function.
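A sketch of ranking by a weighted average of two integer-valued comparisons is given below. The two transformations (`coarse_form`, `fine_form`) and the equal weights are assumptions chosen to keep the example small; they are not the forms or weights used in Waldayu.

```python
VOWELS = set("aeiou")


def _collapse(symbols):
    """Join symbols, merging immediate repeats (e.g. long vowels)."""
    out = []
    for s in symbols:
        if not out or out[-1] != s:
            out.append(s)
    return "".join(out)


def coarse_form(word):
    # Hypothetical very reduced form: every vowel becomes V.
    return _collapse("V" if c in VOWELS else c.upper()
                     for c in word.lower() if c.isalpha())


def fine_form(word):
    # Hypothetical broader form: keeps vowel quality, drops length and punctuation.
    return _collapse(c for c in word.lower() if c.isalpha())


def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def score(query, entry, coarse_weight=0.5):
    """Weighted average of two unweighted distances (weights are assumptions)."""
    coarse = levenshtein(coarse_form(query), coarse_form(entry))
    fine = levenshtein(fine_form(query), fine_form(entry))
    return coarse_weight * coarse + (1 - coarse_weight) * fine


def rank(query, entries):
    return sorted(entries, key=lambda e: score(query, e))


ranking = rank("nosek", ["naasik\u2019", "noosik"])
```

Here the coarse forms of both entries equal that of the query, and it is the finer comparison that places noosik ahead of naasik’, while each individual comparison remains an integer edit count suitable for an automaton.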

6 Conclusion and Future Research

Waldayu and Waldayu Mobile[10] are under continued development, although they are functional as-is and can be (and are being) adapted to additional languages.

[10] www.waldayu.org

Of the many features that remain to be implemented, several touch on unsolved (at least for this domain) research problems:

1. Incorporating algorithmic methods for determining the language of a given query term, along the lines of Cavnar and Trenkle (1994), could enhance the results of the “unified” search bar (Section 4.2) by dynamically prioritizing L1 and L2 results according to which language the system decides the query is targeting. However, this is not a trivial task (Beesley, 1988), and the difficulties in Section 5 also pose difficulties for language identification, since many user queries will be attempts at L1 words but influenced by L2 phonology and orthography.

2. Combining phonological/orthographical approximate search with morphologically-aware subword search has proven problematic when faced with long words composed of many relatively short morphemes. Take, for instance, the word for “telephone” in Gitksan:

haluu’algyagamt’uuts’xw
ha-luu-algyax-m-t’uuts’xw
INSTR-in-speech-ATTR-wire
“telephone” (Hindle and Rigsby, 1973)

The results returned from combined phonological/orthographical/morphological search, although “relevant” in the sense that the entries contain sequences of morphemes that are phonologically/orthographically close to the query, can seem very counter-intuitive from a user point of view. It remains to be seen what level of morphological analysis and what notion of distance produces results that are intuitively “correct” from a user point of view.

3. We need to move our assumptions about the efficacy of our UX and algorithms beyond anecdote. For search algorithm efficiency, we would like to develop both statistical methods and analytics for determining how often users are given correct or relevant search results. For UX, we would like to conduct controlled experiments with users who possess varying levels of linguistic knowledge and target language competency.

4. Finally, the conflicting demands of web and mobile interfaces have led to a degree of codebase fragmentation. This fragmentation was in part due to Waldayu’s origin as a web-based application – where there are fewer constraints on the interface – and subsequent adaptation to more-constrained mobile applications. The next version of Waldayu, now well into development, seeks to unify the interfaces and codebase as much as possible by taking a “mobile first” approach and implementing all three versions (web, Android, and iOS) in AngularJS, using the Ionic framework[11] for Waldayu Mobile. This change reworks Waldayu into a responsive Single Page Application (SPA), as seen in Figure 7, and brings features of Waldayu Mobile (like automatic flashcard generation) back into the web-based interface.

[11] ionicframework.com

Figure 7: Browse state of Waldayu using AngularJS

These problems suggest interesting future research directions at the intersection of user experience, linguistics, and computation, and furthermore suggest that user-facing, novice-friendly interfaces may be valuable in generating novel data sets and research questions for endangered language research.

Acknowledgments

The development of Waldayu and Waldayu Mobile was supported by the SSHRC Insight Grant 435-2016-1694, ‘Enhancing Lexical Resources for BC First Nations Languages’. Finally, we would also like to acknowledge our presence on unceded Coast Salish territory, where the majority of this work was inspired, created, and written about.

References

Kenneth Beesley. 1988. Language identifier: A computer program for automatic natural-language identification of on-line text.

Martin Benjamin and Paula Radetzky. 2014. Small languages, big data: Multilingual computational tools and techniques for the lexicography of endangered languages. In Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 15–23.

Franz Boas and George Hunt. 1902. Kwakiutl texts. Memoirs of the American Museum of Natural History, 5.

William B. Cavnar and John M. Trenkle. 1994. N-gram based text categorization. In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pages 161–175.

Aron B. Dogolpolsky. 1964. Gipoteza drevnejego rodstva jazykovych semej severnoj evrazii s verojatnostej toky zrenija. Voprosy Jazykoznanija, 2:53–63.

Joel Dunham. 2014. The Online Linguistic Database: Software for Linguistic Fieldwork. Ph.D. thesis, University of British Columbia.

Dylon Edwards. 2014. liblevenshtein: A library for generating finite state transducers based on Levenshtein automata. Retrieved from https://github.com/universal-automata/liblevenshtein.

First Peoples’ Cultural Council. 2009. FirstVoices. Retrieved from http://www.firstvoices.com/.

Sylviane Granger and Magali Paquot. 2012. Electronic lexicography. Oxford University Press.

Lonnie Hindle and Bruce Joseph Rigsby. 1973. A short practical dictionary of the Gitksan language, volume 7. Department of Sociology/Anthropology, University of Idaho.

David Joffe and Gilles-Maurice de Schryver. 2004. Tshwanelex: A state-of-the-art dictionary compilation program. pages 99–104.

Grzegorz Kondrak. 2000. A new algorithm for the alignment of phonetic sequences. In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, pages 288–295. Association for Computational Linguistics.

Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10:707–710.

Michael Maxwell and William Poser. 2004. Morphological interfaces to dictionaries. In Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries, ElectricDict ’04, pages 65–68, Stroudsburg, PA, USA. Association for Computational Linguistics.

Saul B. Needleman and Christian D. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443–53.

Kumar Ranjeet and Carrie Dyck. 2011. Cayuga digital dictionary. Retrieved from https://cayugalanguage.ca.

Robert C. Russell. 1918. U.S. Patent 1,261,167. U.S. Patent and Trademark Office.

C. Anton Rytting, David M. Zajic, Paul Rodrigues, Sarah C. Wayland, Christian Hettick, Tim Buckwalter, and Charles C. Blake. 2011. Spelling correction for dialectal Arabic dictionary lookup. 10(1):3:1–3:15, March.

Klaus U. Schulz and Stoyan Mihov. 2002. Fast string correction with Levenshtein-automata. International Journal of Document Analysis and Recognition, 5(1):67–85.

Laurence C. Thompson. 1996. Thompson River Salish dictionary: Nłeʔkepmxcín. University of Montana Occasional Papers in Linguistics.