Burmese Project
LEARN
April 23, 2018
Ye Min Tun
Burmese, LCI
Post-Visit Report: Memo 1
12-Week Workshop
- Challenges
- Prototyping
- Solutions
- Memo 2: A Project Proposal
Design Thinking
Burmese Unicode Fonts and Lack of Available Keyboard
Is there a workable Burmese font that we could use with the language tools available on the internet?
Is there a Burmese keyboard for Unicode fonts?
The Analysis and Design Process
Stage I: Technical Challenges for Burmese
We were able to:
(1) Identify a Burmese Unicode font that can be used for vocabulary development with tools and technology available on the internet
(2) Design and create new Burmese Unicode keyboard software at FSI
(3) Find an appropriate tool for Burmese word segmentation
(4) Find appropriate tools to create a word frequency list
(5) Prototype a Burmese word frequency list
End of Stage II of Design Thinking
Major Challenges Identified and Solved
“Word lists lie at the heart of good vocabulary course design, the development of graded materials for extensive listening and extensive reading, research on vocabulary load, and vocabulary test development.”
Paul Nation, Making and Using Word Lists for Language Learning and Testing (2016)
Making Word Lists
Scope and Sequence of the Burmese Curriculum Project
Phase I: Vocabulary frequency list (7 months)
(i) 1000 to 3000 high-frequency words
Phase II: Texts and prototype lessons (5 months)
(i) Reading texts
(ii) Lesson prototypes
(iii) New ways of teaching the Burmese script and sound system
(iv) Prototype lessons to be tested in class
Phase III: Curriculum (9 months)
(i) A curriculum based on the vocabulary frequency list
(ii) Job-related materials
(iii) An extensive reading program
(iv) Grammar instruction and materials
Purpose for making a particular list has a strong effect on the decisions and procedures that need to be followed.
Paul Nation, Making and Using Word Lists for Language Learning and Testing (2016)
Phase I: Creation of Vocabulary Frequency List for the Burmese Project
1000, 2000, 3000
1000, 2000, 3000 for each cone
A. Making a general corpus
- Target tokens: 2.5 million
- A general frequency list
B. Making sub-corpora (BEA/CS/GC/PM/STA)
- Target tokens: 500K (each)
- Frequency lists for each cone
C. Testing frequency lists
- Word list, keyword list, collocation, concordance
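As a rough illustration of how a frequency list and its 1000/2000/3000 bands could be derived from segmented text, here is a minimal Python sketch. It assumes the segmenter has already separated words with spaces; the function names and the three sample sentences are hypothetical and are not project data.

```python
from collections import Counter

def frequency_list(segmented_lines):
    """Count tokens in text whose words are already separated by spaces."""
    counts = Counter()
    for line in segmented_lines:
        counts.update(line.split())
    return counts

def top_n(counts, n):
    """Return up to n words, most frequent first."""
    return [word for word, _ in counts.most_common(n)]

# Toy stand-in for segmenter output (hypothetical sample).
sample = [
    "ကျွန်တော် စာ ဖတ် တယ်",
    "သူ စာ ရေး တယ်",
    "ကျွန်တော် စာ ရေး တယ်",
]

counts = frequency_list(sample)

# One list per band; with a real corpus each band would hold n words.
bands = {n: top_n(counts, n) for n in (1000, 2000, 3000)}

for word, freq in counts.most_common(3):
    print(word, freq)
```

The same two functions could be run once on the general corpus and once per sub-corpus to produce a list for each cone.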
Phase I: How and What
Step 1. Finding appropriate texts
http://myanmar.mmtimes.com/
Step 2. Convert text into Unicode if needed
http://burglish.my-mm.org/latest/trunk/web/fontconv.htm
Step 3. Segmentation
http://www.nlpresearch-ucsy.edu.mm/NLP_UCSY/wsandpos.html
Step 4A. Cleaning the noise
Step 4B. Cleaning the noise
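The slides do not spell out what counts as noise, so the following Python sketch rests on an assumption: that noise means tokens with no Myanmar-script characters (Latin words, Western digits, URLs) plus the Myanmar punctuation marks U+104A and U+104B. The function name and sample tokens are hypothetical.

```python
import re

# Any character in the Myanmar Unicode block (U+1000 to U+109F).
MYANMAR_CHAR = re.compile(r"[\u1000-\u109F]")
# Myanmar "little section" and "section" punctuation marks.
MYANMAR_PUNCT = re.compile(r"[\u104A\u104B]")

def clean_tokens(tokens):
    """Drop punctuation marks and tokens that contain no Myanmar script."""
    cleaned = []
    for token in tokens:
        token = MYANMAR_PUNCT.sub("", token)
        if token and MYANMAR_CHAR.search(token):
            cleaned.append(token)
    return cleaned

# Hypothetical segmented line with typical web-text noise mixed in.
raw = ["စာ", "ဖတ်", "တယ်", "\u104B", "Yangon", "2016", "သူ"]
print(clean_tokens(raw))
```

Myanmar digits (U+1040 to U+1049) fall inside the same block and are kept by this filter; whether they belong in a frequency list is a separate decision.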
Step 4C. Improving accuracy
Step 5. Save a text file
Step 6A. Spreadsheet Record for General Corpus
Why do we need a record?
1. Future use of materials
2. Copyright issues
3. Sources of materials
4. Spoken vs. written
5. Range of the corpora (how many tokens acquired for each category)
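A record of this kind can be kept as a plain CSV file. The Python sketch below is only one possible layout: the column names and the three rows are hypothetical, chosen to cover source, copyright status, spoken versus written mode, and tokens per category. It writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column layout for the corpus record.
FIELDS = ["file", "source", "category", "mode", "copyright", "tokens"]

# Hypothetical rows; not taken from the project's spreadsheet.
rows = [
    {"file": "news_001.txt", "source": "newspaper", "category": "GC",
     "mode": "written", "copyright": "to check", "tokens": 1200},
    {"file": "news_002.txt", "source": "newspaper", "category": "GC",
     "mode": "written", "copyright": "to check", "tokens": 800},
    {"file": "talk_001.txt", "source": "interview", "category": "PM",
     "mode": "spoken", "copyright": "cleared", "tokens": 500},
]

def tokens_per_category(records):
    """Sum the token counts for each corpus category."""
    totals = defaultdict(int)
    for record in records:
        totals[record["category"]] += int(record["tokens"])
    return dict(totals)

# Write the record, then read it back as a spreadsheet export would be read.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
loaded = list(csv.DictReader(buffer))
totals = tokens_per_category(loaded)
print(totals)
```

Summing by category in this way is what makes it possible to track each sub-corpus against its token target.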
Step 6A. Spreadsheet Record for Sub‐Corpora
Number of acquired tokens as of December 2016 (sub-corpora) = 2,451,217
PM: 516,946
BEA: 488,469
STA: 398,922
CS: 535,288
GC: 472,983
DS: 36,109
Total number of acquired tokens (sub-corpora) = 2,731,762
PM: 516,946
BEA: 497,723
STA: 482,732
CS: 535,288
GC: 475,809
DS: 43,208
CON: 43,850
Creating High-Frequency Lists
Testing with Tools (AntWordProfiler, AntConc, TextFixer)
http://www.laurenceanthony.net/
AntConc: A tool to do many things
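One of the things a concordancer does is show each occurrence of a word with its neighbours (a keyword-in-context view). The Python sketch below illustrates that idea on space-segmented text; it is not AntConc's implementation, and the function name, window size, and sample sentence are hypothetical.

```python
def kwic(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits

# Hypothetical segmented text.
tokens = "ကျွန်တော် စာ ဖတ် တယ် သူ စာ ရေး တယ်".split()

for left, key, right in kwic(tokens, "စာ"):
    print(f"{left} [{key}] {right}")
```

Because the lookup works on whole tokens, its results are only as good as the segmentation that produced them.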
What I am currently doing …
Using AntConc to make a stable list of high-frequency words
Next …
Using AntWordProfiler to profile the texts
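Profiling a text against frequency lists comes down to asking what share of its running words the lists cover. The Python sketch below shows that calculation under simple assumptions: the text is space-segmented and the ranked word list is ordered most frequent first. It is not AntWordProfiler's own method, and the names and sample data are hypothetical.

```python
def coverage(tokens, word_list):
    """Share of running tokens found in word_list, from 0.0 to 1.0."""
    if not tokens:
        return 0.0
    known = set(word_list)
    return sum(1 for token in tokens if token in known) / len(tokens)

def profile(tokens, ranked_words, band_sizes=(1000, 2000, 3000)):
    """Cumulative coverage when only the first n ranked words are known."""
    return {n: coverage(tokens, ranked_words[:n]) for n in band_sizes}

# Hypothetical ranked list (most frequent first) and a short segmented text.
ranked = ["တယ်", "စာ", "ကျွန်တော်", "ရေး"]
text = "သူ စာ ရေး တယ်".split()

# Tiny band sizes so the toy data shows a difference between bands.
result = profile(text, ranked, band_sizes=(1, 2, 4))
print(result)
```

With real lists the default bands of 1000, 2000, and 3000 words would apply, and a text with low coverage at the 3000 band would be a poor fit for early lessons.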
What does this project mean for similar languages?
Discussion:
Burmese Project
One-Page Description
Road Map: Three Phases of the Burmese Project
Phase I: Creation of Vocabulary List
Phase II: Texts and Prototype Lessons
(i) Reading texts
(ii) Lesson prototypes
(iii) New ways of teaching the Burmese script and sound system
(iv) Prototype lessons to be tested in class
Phase III:
(i) Vocabulary-based curriculum following the “Four Strands” principle
(ii) Development of grammar instruction and its integration into the curriculum