

Introduction
This dictionary project is the first step in a larger project to develop tools that indigenous Alaskans can use to help revitalize their languages.

Purpose
To develop software to encourage Yup’ik writing using modern technologies.

Goals
• This project will develop data structures to define and store a digital dictionary for the Central Yup’ik language.
• This project will develop basic word-checking software to show the functionality of this dictionary.

Process
• Define Yup’ik word-forming grammar rules with clear algorithms
• Create data structures to store Yup’ik morphemes
• Develop word-checking software to test data structures and algorithms

Revitalizing Central Yup’ik Language
The Central Yup’ik language, like Alaska Native languages all over the state, has been in decline for a generation or more. The younger generations of speakers are not learning and using the language of their ancestors. We may be able to reconnect with the next generation of speakers using modern technologies.

Acknowledgments
Thank you to Theo Sery and Jeane Breinig of the UAA English department: Theo for helping me write about these projects, and Jeane for being supportive with credit.

Thank you Marie Meade and Nancy Furlow with Alaska Native Studies for providing excellent education and support for Alaska Native peoples.

Thank you Frank Moore and Kendrick Mock in the Computer Science department for providing excellent instruction and guidance.

But thank you most of all to Herb Schroeder and everybody on the ANSEP team. Their community environment and financial support enable me to continue school.

Modeling Yup’ik Grammar
The Yup’ik language is a polysynthetic language. This means that most Yup’ik words are formed using a base, zero or more postbases, and an appropriate ending. Each time a postbase is added to the base, it is treated as an expanded base and can receive additional postbases to add meaning to the word.

The bases can be defined as being either a noun or a verb, and can be classified into one of six morpho-phonological classes that help define how postbases and endings will be added.

This is a brief table of nouns that Professor Marie Meade uses in her Yup’ik language classes here at the university. Two endings are included to show the notation used for postbases and endings.

What is a trie?
A trie is a type of tree data structure.

A tree is made up of a series of linked nodes.

The lengthy explanation…

A tree structure begins with a head node, providing the starting point for the tree. This head node contains the addresses of each of the nodes that branch from it, each of which is referred to as a child node. Each of these child nodes, in turn, has links to child nodes of its own, until all the data that needs to be found in the tree has been stored.

The trie data structure is a tree that can be easily used to store and look up words in a dictionary. The head node points to the first letter of each word. Each of these letters points to the letters that may follow it. This process is repeated until every word in the dictionary, up to the longest, has been stored.

A figure is the best way to describe how this data structure works. Here is a trie storing the English words DO, DOG, DOT, and COT.
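The same four-word trie can be sketched in a few lines of Python. This is only an illustration of the general data structure described above, not the project's actual implementation; the names TrieNode, Trie, insert, and contains are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # maps a letter to the child node for that letter
        self.is_end = False    # True when a stored word ends at this node


class Trie:
    def __init__(self):
        self.head = TrieNode()  # the head node holds no letter of its own

    def insert(self, word):
        node = self.head
        for letter in word:
            # Reuse the child for this letter if it exists, else create it.
            node = node.children.setdefault(letter, TrieNode())
        node.is_end = True

    def contains(self, word):
        node = self.head
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False    # no path for this letter
        return node.is_end      # a path alone is not enough; a word must end here


trie = Trie()
for word in ["DO", "DOG", "DOT", "COT"]:
    trie.insert(word)

print(trie.contains("DOG"))  # True
print(trie.contains("CO"))   # False: CO is only a prefix of COT
```

The is_end flag plays the role of the "end" markers in the figure: DO and DOG share the path D, O, and the flag is what records that DO is a word in its own right.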

The short explanation…

Yup’ik language dictionary and processing software

Eric Somerville, Researcher
Dr. Frank Moore, Mentor
Office of Undergraduate Research and Scholarship, University of Alaska Anchorage

Notice that in these examples the vowel i was doubled when the class VI noun base panig- was combined with the unpossessed plural ending %:(e)t. Notice also how the k of the first-person singular possessive (1s-s) ending -ka changed to q, giving -qa, when attached to the base-final r of qetunrar-.

Defining the symbols
The process of adding %:(e)t to the class VI base panig- is as follows:

• % - the final consonant of the class VI base is retained. The base remains: panig-

• (e) - the letter e is inserted for some bases, but not others. In this case, for class VI only. This gives us: panig:et

• : - when a velar marked with this character is surrounded by single vowels, the vowel-velar-vowel series is replaced with a vowel-vowel pair. This gives us: paniit
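The three steps above can be traced in a small Python sketch. It covers only the class VI example on this poster and is not a general implementation of Yup’ik suffixation. The function names, the set of letters treated as velars, and the choice to repeat the preceding vowel when the velar drops are all assumptions made for illustration; the last one is inferred solely from the panig:et to paniit example.

```python
VOWELS = {"a", "e", "i", "u"}
VELARS = {"g", "r"}  # assumption for this sketch only


def resolve_marked_velar(marked):
    """Apply the ':' rule to a marked form such as 'panig:et'."""
    left, right = marked.split(":")
    before, velar = left[:-1], left[-1]
    # A "single" vowel is a vowel that is not next to another vowel.
    single_before = before[-1:] in VOWELS and before[-2:-1] not in VOWELS
    single_after = right[:1] in VOWELS and right[1:2] not in VOWELS
    if velar in VELARS and single_before and single_after:
        # Drop the velar and form a vowel-vowel pair by repeating the
        # preceding vowel (inferred from the paniit example).
        return before + before[-1] + right[1:]
    return left + right  # rule does not apply; just remove the marker


def add_plural_class_vi(base):
    """Add %:(e)t to a class VI base such as 'panig-'."""
    stem = base.rstrip("-")      # "panig-" -> "panig"
    # %   : the final consonant is retained, so the stem is unchanged
    # (e) : e is inserted for class VI
    marked = stem + ":" + "et"   # "panig:et"
    return resolve_marked_velar(marked)


print(add_plural_class_vi("panig-"))  # paniit
```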

For Further Information
Please contact [email protected].

www.camai-ellamyui.com should become available over the summer.

Yup’ik Spell Checking
The algorithm I currently plan to use for this spell-checking software will search a series of trie data structures for matching morphemes, combining them using proper Yup’ik grammar.

This process begins by checking a list of bases, returning a list of possible bases to be checked. Each base will go through a process of adding appropriate postbases and endings, until the input word is either completely formed or found to be outside the dictionary.

Below is a figure outlining this process.

Notice the process of adding postbases onto bases, forming expanded bases, then passing each expanded base back to the relevant postbase and ending tries.
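The loop described above can be sketched in Python as follows. This is a simplified illustration under several assumptions: plain sets stand in for the tries, morphemes are joined by simple concatenation with none of the sound changes that real Yup’ik suffixation requires, and the morpheme strings are made-up placeholders rather than Yup’ik data. The names check_word, check_expanded, and matching_prefixes are hypothetical.

```python
# Placeholder morphemes for illustration only; these are not Yup'ik.
BASES = {"ba", "bab"}
POSTBASES = {"po", "pu"}
ENDINGS = {"en", "e"}


def matching_prefixes(text, morphemes):
    """Return every morpheme that the given text starts with."""
    return [m for m in morphemes if text.startswith(m)]


def check_expanded(word, expanded):
    """Try to finish the word, starting from a base or expanded base."""
    rest = word[len(expanded):]
    if rest in ENDINGS:
        return True  # ending found: the word is completely formed
    for postbase in matching_prefixes(rest, POSTBASES):
        # Expanded Base = Base Guess + Postbase Guess, then try again.
        if check_expanded(word, expanded + postbase):
            return True
    return False  # no postbase found


def check_word(word):
    """Return True if the word can be built as base + postbases + ending."""
    for base in matching_prefixes(word, BASES):
        if check_expanded(word, base):
            return True
    return False  # no base found


print(check_word("bapoen"))  # True: ba + po + en
print(check_word("bapo"))    # False: no ending
```

Because more than one base can match the start of a word (here both "ba" and "bab"), the sketch tries every base guess before giving up, which mirrors the list of possible bases mentioned above.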

Five tries to get it right…
The tries I’m currently planning to divide Yup’ik morphemes into are as follows:

A base trie will store the list of both noun and verb bases in a single structure.

A verbal-adding postbase trie will contain all postbases that can be added to a verb base. Some of these postbases will expand the verb and leave it verbal; others will expand the verb and change it into a noun.

Complementing this trie will be the nominal-adding postbase trie, storing all postbases that can be added to noun bases.

The final two tries will store verb and noun endings.
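One way to picture how the five stores could work together is sketched below in Python. It is a rough illustration, not the planned implementation: dictionaries and sets stand in for the tries, every morpheme string is a made-up placeholder, and the idea of tagging each postbase with the category it produces is an assumption drawn from the description of verbal-adding postbases above. The names tries_for and category_after are hypothetical.

```python
# Plain dicts and sets stand in for the five tries.
# All morpheme strings are placeholders, not Yup'ik data.
base_trie = {"nb": "noun", "vb": "verb"}            # base -> category
verb_postbase_trie = {"vv": "verb", "vn": "noun"}   # postbase -> resulting category
noun_postbase_trie = {"nn": "noun"}
verb_ending_trie = {"ve"}
noun_ending_trie = {"ne"}


def tries_for(category):
    """Pick the postbase and ending stores that apply to a category."""
    if category == "verb":
        return verb_postbase_trie, verb_ending_trie
    return noun_postbase_trie, noun_ending_trie


def category_after(base, postbases):
    """Follow the category of an expanded base as postbases are added."""
    category = base_trie[base]
    for postbase in postbases:
        postbase_trie, _ = tries_for(category)
        # Raises KeyError if the postbase cannot attach to this category.
        category = postbase_trie[postbase]
    return category


# A verb base plus a noun-forming postbase must take a noun ending.
category = category_after("vb", ["vn"])
print(category)                # noun
print(tries_for(category)[1])  # {'ne'}
```

Keeping the category alongside each expanded base is what tells the checker which postbase trie and which ending trie to consult next.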


| English | Citation | Base | Class | Plural %:(e)t | 1s-s “My one” -ka |
|---|---|---|---|---|---|
| Mother | Aana | Aana- | I | Aanat | Aanaka |
| Husband | Ui | Ui- | II | Uit | Uika |
| Fish/Food | Neqa | Neqe- | III | Neqet | Neqeka |
| Dog | Qimugta | Qimugte- | IV | Qimugtet | Qimugteka |
| Son | Qetunraq | Qetunrar- | V | Qetunrat | Qetunraqa |
| Daughter | Panik | Panig- | VI | Paniit | Panika |

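[Figure: flowchart of the word-checking process. Input String → Base Trie (no base found: Return false) → Base Guess → Ending Trie (ending found: Return true) → Postbase Trie (no postbase found: Return false) → Expanded Base = Base Guess + Postbase Guess, which is passed back to the ending and postbase tries.]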

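[Figure: trie storing DO, DOG, DOT, and COT. The Head Node branches to D and C. D → O (end), and that O branches to G (end) and T (end). C → O → T (end).]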