1
Multimedia Technology Text
S T Nandasara, ADMTC/UCSC
2
World of Languages
3
World of Languages Asian Countries
Source: Ethnologue: Languages of the World (the exact number of languages may never be determined)
4
World of Languages Asian region
(Half of the world's languages are spoken in only eight countries)
5
World of Languages Asian Countries
Country, Number of Languages, Country Population, Official or National Languages
Indonesia 742 245,452,739 Indonesian
India 427 1,095,351,995 Assamese, Bengali, Bodo, Dogri, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Marwari, Nepali, Oriya, Panjabi, Sanskrit, Sindhi, Tamil, Telugu, Urdu
China 241 1,313,973,713 Chinese, Zhuang, Uighur, Hmong, Hani
Philippines 180 89,468,677 Filipino, English
Malaysia 147 24,385,858 Malay
Nepal 125 28,287,147 Nepali, Gurung, Tamang
Myanmar 109 47,382,633 Burmese
Vietnam 93 84,402,966 Vietnamese
Laos 82 6,368,481 Lao
Thailand 75 64,631,595 Thai
Iran 74 68,688,433 Arabic, Farsi
Pakistan 69 165,803,560 Urdu, Panjabi, Sindhi, English
Afghanistan 45 31,056,997 Dari, Pashto
Bangladesh 38 147,365,352 Bengali
Bhutan 24 2,279,723 Dzongkha
Iraq 23 26,783,383 Arabic, Kurdi
Cambodia 19 13,881,427 Khmer
Brunei 17 379,444 Malay, English
Mongolia 12 2,832,224 Halh Mongolian
Sri Lanka 8 20,222,240 Sinhala, Tamil, English
6
World of Languages Script Diversity
Three major types of scripts in South, South East and East Asia
In East Asia: Chinese ideographic scripts
In South Asia, around the Indian subcontinent, and in part of South East Asia: scripts influenced by Brahmi
In part of South East Asia and Australasia: Roman scripts
Two major types of scripts in West and Central Asia
In Central Asia: historically Arabic, later replaced by Cyrillic
In Western Asia: Arabic script is widely used
One major type of script in Europe and the West
Roman script
7
World of Languages Script in Asia
Language, Number of Speakers, Native Name (where the sample survives)
Chinese (Mandarin) 885,000,000
English 322,000,000 English
Arabic (al-Arabiyya) 280,000,000
Bengali 196,000,000
Hindi 182,000,000
Portuguese (Português) 182,000,000 Português
Indonesian 140,000,000 Bahasa Indonesia
Japanese (Nihongo) 125,000,000
Korean (Hangugeo) 75,000,000
Telugu 73,000,000
Vietnamese 66,897,000 Tiếng Việt
Marathi 64,783,000
Tamil 62,000,000
Turkish (Türkçe) 59,000,000 Türkçe
Urdu 54,000,000
Gujarati 44,000,000
Malayalam 34,014,000
Kannada 33,663,000
Punjabi/Panjabi 25,700,000
Thai 21,000,000
Sindhi 19,675,000
Uzbek (Cyrillic) 18,386,000
Malay (Bahasa Melayu) 17,600,000 Bahasa Melayu
Nepali 16,200,000
Filipino (Tagalog) 14,850,000 Tagalog
Assamese 14,604,000
Azeri/Azerbaijani (Cyrillic) 13,869,000
Sinhala 13,218,000
Zhuang 10,000,000 Saw cuengh
Pashto/Pakhto 9,585,000
Kazakh 8,000,000
Uighur (Uyghur) 7,464,000
Khmer 7,063,200
Dari 7,000,000
Tatar 7,000,000
Turkmen 5,397,500
Kashmiri 4,381,000
Lao 4,000,000
Balinese 3,800,000 Bahasa Bali
Kyrgyz 2,631,420
Fijian 650,000 vaka-Viti
Maldivian (Dhivehi) 280,000
Sanskrit 194,433
Tahitian 150,000 Te Reo Tahiti
Maori 70,000 Te Reo Māori
Hawaiian 8,000 ʻŌlelo Hawaiʻi
8
World of Languages Script in Asia
9
Nature of Text
The most basic media.
Easiest to generate, store and transfer on a PC.
Still the best for complex explanation.
Using structured text/Hypertext
Light weight
Smallest sized media
Static
Language dependent (biggest problem)
Presenter notes: Hypertext is text that links to more text, often found in help files and on the Web. It is organized information that allows meaningful, non-linear access to text-oriented resources.
10
Text Digital Form
Input, Digital Form, Output
Creation: Handwriting, Printed Documents, Human Voice
Input methods: Keyboard, Handwriting Recognition, Optical Character Recognition (OCR), Voice Recognition
Digital form: Text Data (character code)
ASCII: 7 bit, stored in an 8-bit byte
Unicode: 16 bit (early versions)
Universal Character Set: 32 bit
Output: Typeface (Bitmap Font, Vector Font), Voice (Text-to-Speech)
Presenter notes: UCS: Universal Character Set. Handwriting: personal information managers.
11
Indexing and Hypertext
Indexing
Rapid random access/search method for Large Text Data.
Essential for reference-type applications (dictionary, encyclopedia, etc.)
Hypertext
Non-sequential navigation structure for Large Text Data
Used in Web pages (HTML)
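As an illustration of indexing, here is a minimal sketch of a word index that maps each word to the character offsets where it occurs, so a lookup does not have to scan the whole text. The function name `build_index` and the sample sentence are assumptions made for this example; real reference applications use far more elaborate structures.

```python
from collections import defaultdict


def build_index(text):
    """Map each lowercase word to the character offsets where it starts."""
    index = defaultdict(list)
    pos = 0
    for word in text.split():
        start = text.index(word, pos)          # offset of this occurrence
        key = word.lower().strip(".,;:!?")     # normalise case and punctuation
        index[key].append(start)
        pos = start + len(word)
    return index


sample = "The telephone and radio for voice, the camera for image."
index = build_index(sample)

# Random access: jump straight to every occurrence of a word.
for offset in index["for"]:
    print(offset, sample[offset:offset + 12])
```

The index costs extra storage, but a search becomes a dictionary lookup instead of a sequential scan.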
[Figure: an index (a, b, c, d, e, with entries such as ad, adjust, adorn, am, bi, bot, by) pointing into a block of large text data.]
Presenter notes: Diagram to show hypertext data. An example of linking information of different media.
12
Hypertext, Hypermedia and Multimedia
[Figure: Multimedia and Hypertext overlap; the overlap is Hypermedia.]
A hypermedia system includes the non-linear information links of hypertext systems and the continuous and discrete media of multimedia systems.
Presenter notes: We know what hypertext is. Let's look at the differences between hypermedia and multimedia.
13
Typography
Until the end of the 14th century, all writing was done by hand.
Typography: the design of the characters that make up text and display type, and the way they are configured on the page.
Modern software allows:
Rotating or distorting type, wrapping text around images
Presenter notes: The history of typography began after the invention of the printing press and movable type. Type: font.
14
Typography Evolution of Asian Scripts
[Figure: evolution of Asian scripts. Scripts shown: Pallawa, Panjabi, Devanagari, Gujarati, Bengali, Oriya, Telugu, Kannada, Tamil, Malayalam, Sinhala. Time axis: 3rd century BC, 1st century, 3rd century, 6th century, 8th century, 10th century, 12th century, modern.]
15
Typography Complex Scripts
Bengali, Devanagari
Gujarati
Kannada, Malayalam, Telugu
Sinhala, Tamil, Ranjana
Gurmukhi, Oriya, Tibetan
Khmer, Lao, Thai
Jawani, Thaana, Bagini
Sanskrit
16
Typography - Complex Vowels
17
Typography - ASCII & EBCDIC
ASCII, EBCDIC
18
Typography - 8 Bit English and Sinhala
1989 - SLASCII
Wadan Tharuwa, SBIOS
19
The Code Page Problem
Characters in most languages are traditionally represented by single-byte values
Allows for 256 characters max
Real limit for most encodings is 192 characters
This includes letters, digits, punctuation, symbols
When a system is used for a new language, the encoding has to be adapted to use that language's characters
Encodings proliferate
Each language or group of languages gets its own encoding
Different vendors or standards committees devise different encodings, so generally each language has several, often incompatible, encodings
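The effect of proliferating single-byte encodings can be seen in a short sketch: the same byte value decodes to a different character under each code page. The three codecs below are only examples picked for illustration from those shipped with Python.

```python
# One byte, three single-byte encodings, three different characters.
raw = bytes([0xE9])

CODECS = ("latin-1", "cp1251", "iso8859_7")   # Western European, Cyrillic, Greek
decoded = {codec: raw.decode(codec) for codec in CODECS}

for codec, char in decoded.items():
    print(f"{codec:10} 0xE9 -> {char!r} (U+{ord(char):04X})")
```

Without knowing which code page produced the byte, a receiving system cannot tell which of these characters was meant.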
20
Multi-byte encodings
Some languages (Chinese, Japanese, Korean, etc.) have more than 256 characters
Encoding standards for these languages use sequences of bytes for many characters
In many standards, not all characters are the same number of bytes
Can't tell whether a given byte is a whole character or part of a character
Corruption of one byte can corrupt the whole data stream
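A small sketch of the ambiguity, using Shift-JIS as one example of a variable-length legacy encoding (the choice of encoding and of the sample character is an assumption made for illustration). The second byte of the katakana character below has the same value as the ASCII backslash, so a byte inspected in isolation cannot be classified.

```python
# Katakana "so" followed by ASCII "A", encoded in Shift-JIS.
encoded = "ソA".encode("shift_jis")
print(" ".join(f"{b:02X}" for b in encoded))

lead, trail = encoded[0:1], encoded[1:2]

# The trail byte of the two-byte character is identical to ASCII backslash.
looks_like_backslash = (trail == b"\\")
print("trail byte equals backslash:", looks_like_backslash)

# If the lead byte is lost, the remaining bytes are all valid single-byte
# values, so nothing signals that a character has been damaged.
damaged = encoded[1:]
all_ascii = all(b < 0x80 for b in damaged)
print("damaged stream is plain ASCII:", all_ascii)
```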
21
22
Interoperability problems
Can't easily mix languages in a document or system
Data not tagged with encoding, so loss can occur when transferring between systems
Most encodings are ASCII-based, so problems often not seen with English-only data
Two possible solutions:
Systematic tagging of textual data with encoding ID
Universal encoding standard with all languages' characters
23
Encoding space
An ASCII character is 7 bits wide
24
Encoding space
Most encodings press the eighth bit into service
25
Encoding space
Early versions of Unicode used 16 bits
26
Encoding space
Unicode now uses 21 bits
27
Encoding space
Plane number
Row number
Character number
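The plane, row and character numbers are simply bit fields of a code point. A minimal sketch, assuming the usual split into a plane number above two 8-bit fields (the function name `split_code_point` is made up for this example):

```python
def split_code_point(cp):
    """Split a code point into (plane, row, character number)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode encoding space")
    plane = cp >> 16            # 0x00 .. 0x10
    row = (cp >> 8) & 0xFF      # 256 rows per plane
    cell = cp & 0xFF            # 256 characters per row
    return plane, row, cell


for cp in (0x0041, 0x0D85, 0x10346):
    plane, row, cell = split_code_point(cp)
    print(f"U+{cp:04X}: plane {plane:X}, row {row:02X}, character {cell:02X}")
```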
28
Unicode
The 21-bit encoding space (17 planes of 65,536 code points each) allows for 1,114,112 characters
95,156 code point values assigned to characters in Unicode 3.2
137,216 code point values set aside for application use
2,114 code point values set aside for non-character use
879,626 code point values reserved for future character assignments
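The figures on this slide can be checked with a few lines of arithmetic. The sketch below only re-adds the numbers quoted above and assumes the encoding space runs from U+0000 to U+10FFFF, that is, 17 planes of 65,536 code points.

```python
PLANES = 17
PLANE_SIZE = 0x10000                 # 65,536 code points per plane
total = PLANES * PLANE_SIZE

# Breakdown as quoted for Unicode 3.2.
breakdown = {
    "assigned to characters": 95_156,
    "application (private) use": 137_216,
    "non-character use": 2_114,
    "reserved for future assignment": 879_626,
}

print(f"total code points: {total:,}")
print(f"sum of breakdown:  {sum(breakdown.values()):,}")
```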
29
The Unicode Encoding Space
Planes (hex): 0 1 2 3 4 5 6 7 8 9 A B C D E F 10
Plane 0: Basic Multilingual Plane
30
The Unicode Encoding Space
Planes (hex): 0 1 2 3 4 5 6 7 8 9 A B C D E F 10
Planes 1 to 10: Supplementary Planes
31
The Unicode Encoding Space
Planes (hex): 0 1 2 3 4 5 6 7 8 9 A B C D E F 10
Plane 1: Supplementary Multilingual Plane
Plane 2: Supplementary Ideographic Plane
Plane E: Supplementary Special-Purpose Plane
32
The Unicode Encoding Space
Planes (hex): 0 1 2 3 4 5 6 7 8 9 A B C D E F 10
Planes F and 10: Private Use Planes
33
The Unicode Encoding Space
Planes (hex): 0 1 2 3 4 5 6 7 8 9 A B C D E F 10
Plane 0: Basic Multilingual Plane
34
The Basic Multilingual Plane
Rows grouped by leading hex digit: 0 1 2 3 4 5 6 7 8 9 A B C D E F
General Scripts Area
Symbols Area
CJK Punct.
Han
Yi
Hangul
Surrogates Area
Private Use Area
Compatibility Area
35
The General Scripts Area
Rows (hex): 00/01, 02/03, 04/05, 06/07, 08/09, 0A/0B, 0C/0D, 0E/0F, 10/11, 12/13, 14/15, 16/17, 18/19, 1A/1B, 1C/1D, 1E/1F
Scripts, in row order: Latin, IPA, Diacriticals, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian, Hangul, Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Ogham, Runic, Philippine, Khmer, Mongolian, Latin, Greek
36
Unicode Coverage
European scripts
Latin, Greek, Cyrillic, Armenian, Georgian, IPA
Bidirectional (Middle Eastern) scripts
Hebrew, Arabic, Syriac, Thaana
Indic (Indian and Southeast Asian) scripts
Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Khmer, Myanmar, Tibetan, Philippine
East Asian scripts
Chinese (Han) characters, Japanese (Hiragana and Katakana), Korean (Hangul), Yi
Other modern scripts
Mongolian, Ethiopic, Cherokee, Canadian Aboriginal
Historical scripts
Runic, Ogham, Old Italic, Gothic, Deseret
Punctuation and symbols
Numerals, math symbols, scientific symbols, arrows, blocks, geometric shapes, Braille, musical notation, etc.
37
Characters, Glyphs, and Fonts
In computer terms, a character is a grouping of bits (binary ones and zeros) in packages of 8: one or more bytes
There are two broad classes of characters: data characters and control characters
38
Characters, Glyphs, and Fonts
The character A shown as a different glyph in each font:
A - Arial
A - Times New Roman
A - Courier New
A - Giddyup Standard
A - Bodoni
A - Papyrus
A - Forte
39
Characters, Glyphs, and Fonts
You can run out of available characters pretty quick if you allow all those strange foreign, mathematical, scientific, engineering, currency, and other symbols
(Informal Roman)
40
Unicode properties
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
Code point: 0041
Name: LATIN CAPITAL LETTER A
General category: Uppercase letter (Lu)
Canonical combining class: Standard spacing (0)
Bidirectional category: Left-to-right (L)
Mirrored: no (N)
Lowercase mapping: 0061
Representative glyph
Semantic properties
A
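The semicolon-separated record above can be split into named fields and cross-checked against Python's standard `unicodedata` module. This is a sketch only: the field names in `record` are labels chosen for this example, and only the fields mentioned on the slide are extracted.

```python
import unicodedata

line = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
fields = line.split(";")

record = {
    "code_point": fields[0],
    "name": fields[1],
    "general_category": fields[2],
    "combining_class": int(fields[3]),
    "bidi_category": fields[4],
    "mirrored": fields[9],
    "lowercase": fields[13],
}

ch = chr(int(record["code_point"], 16))

# The same properties as reported by the interpreter's own Unicode database.
print(unicodedata.name(ch), unicodedata.category(ch),
      unicodedata.combining(ch), unicodedata.bidirectional(ch),
      unicodedata.mirrored(ch), f"{ord(ch.lower()):04X}")
```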
41
Combining characters
One character
42
Combining characters
or two?
43
Combining characters
Actually, either.
Unicode is generative, with accent marks represented with their own code point values
é = U+0065 (e) + U+0301 (accent)
but common combinations of letters and accents are also given their own code points for convenience.
é = U+00E9
44
Combining characters
This can be tough, because the two representations are to be treated as absolutely identical.
U+0065 U+0301 = U+00E9
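In practice, "treated as identical" means comparing strings only after normalizing them. A minimal sketch using Python's `unicodedata.normalize`; the helper name `canonically_equal` is an assumption made for this example.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


decomposed = "e\u0301"     # U+0065 followed by U+0301
precomposed = "\u00e9"     # U+00E9

print("raw comparison:       ", decomposed == precomposed)
print("normalized comparison:", canonically_equal(decomposed, precomposed))
```

A plain code point comparison reports the two spellings as different; only the normalized comparison reflects the rule stated on the slide.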
45
Combining characters
Things can get really wild for characters with more than one accent mark:
ộ = 006F (o) 0302 (circumflex) 0323 (dot)
ộ = 006F (o) 0323 (dot) 0302 (circumflex)
ộ = 00F4 (o-circumflex) 0323 (dot)
ộ = 1ECD (o-dot) 0302 (circumflex)
ộ = 1ED9 (o-circumflex-dot)
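What makes these five spellings comparable is canonical ordering: combining marks are sorted by their canonical combining class before composition. The sketch below reimplements only that reordering step (the function `reorder_marks` is illustrative, not the full normalization algorithm) and then checks the five sequences with the standard library.

```python
import unicodedata


def reorder_marks(s):
    """Stable-sort each run of combining marks by canonical combining class."""
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch):       # non-zero class: a combining mark
            run.append(ch)
        else:
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


spellings = [
    "\u006f\u0302\u0323",
    "\u006f\u0323\u0302",
    "\u00f4\u0323",
    "\u1ecd\u0302",
    "\u1ed9",
]
composed = {unicodedata.normalize("NFC", s) for s in spellings}
print(len(composed), [f"{ord(c):04X}" for c in composed.pop()])
```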
46
Typography - Complex Vowels Positioning
47
Smart rendering: Arabic
Keyboard: babibu b
Screen: بَبِبُ ب
Code points: 0628 064E 0628 0650 0628 064F 0020 0628
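The stored text is just the sequence of code points in typing order; choosing joined letter shapes and placing the vowel marks is left to the rendering engine. A small sketch that rebuilds the string from the code points of the Arabic example and prints the properties a renderer would consult:

```python
import unicodedata

CODE_POINTS = "0628 064E 0628 0650 0628 064F 0020 0628"
text = "".join(chr(int(h, 16)) for h in CODE_POINTS.split())

for ch in text:
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          "bidi=" + unicodedata.bidirectional(ch),
          "ccc=" + str(unicodedata.combining(ch)))

# Count the base letters and the combining vowel marks attached to them.
letters = [ch for ch in text if unicodedata.category(ch) == "Lo"]
marks = [ch for ch in text if unicodedata.combining(ch)]
print(len(letters), "letters,", len(marks), "combining marks")
```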
48
Smart rendering: Burmese
Keyboard: krui
Screen: k, kr, kru (successive states as each key is typed)
Code points: 1000 1039 101B 102F 102D
49
Smart rendering: Tamil
Keyboard: Ur rU yU NU mU kU jU
Screen: the display is rebuilt after each keystroke (U, Ur, Ur r, Ur rU, and so on)
Code points: 0B8A 0BB0 0BB0 0BC2 0BAF 0BC2 0BA3 0BC2 0BAE 0BC2 0B95 0BC2 0B9C 0BC2
50
Typography - Complex Ligature
51
Canonical equivalence
01FA: LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
212B 0301: ANGSTROM SIGN + COMBINING ACUTE ACCENT
00C5 0301: LATIN CAPITAL LETTER A WITH RING ABOVE + COMBINING ACUTE ACCENT
0041 030A 0301: LATIN CAPITAL LETTER A + COMBINING RING ABOVE + COMBINING ACUTE ACCENT
52
Case mapping
Case mapping may produce strings of different length
U+01F0 (ǰ) -> U+004A U+030C (J + combining caron)
Case mapping may depend on the locale
English: U+0069 (i) -> U+0049 (I)
Turkish/Azeri: U+0069 (i) -> U+0130 (İ)
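Both points can be tried in Python. `str.upper()` applies the locale-independent full case mapping, so it shows the length change; it has no Turkish mode, so `upper_tr` below is a hypothetical helper written for this sketch, not a standard library function.

```python
def upper_tr(s):
    """Uppercase using the Turkish/Azeri dotted and dotless i rules."""
    # Handle the two locale-specific letters first, then fall back to the
    # default mapping for everything else.
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()


j_caron = "\u01f0"
upper_j = j_caron.upper()
print(len(j_caron), "->", len(upper_j),
      [f"{ord(c):04X}" for c in upper_j])

print("default:", "istanbul".upper())
print("Turkish:", upper_tr("istanbul"))
```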
54
Typography - Unicode Sinhala
1987 - Unicode Ver. 1.0 Sinhala
1998 - Unicode Ver. 3.0 Sinhala
55
Typography - Complex Ligature
Tva in Malayalam
ttha in Tamil
Tva in Sinhala
ttha in Devanagari
56
Typography - Complex Ligature
Tva with ZWNJ in Malayalam
Tva with ZWJ in Malayalam
Tva with ZWNJ in Sinhala
Tva with ZWJ in Sinhala
ZWNJ: U+200C, UTF-8 E2 80 8C
ZWJ: U+200D, UTF-8 E2 80 8D
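The joiner characters are invisible code points stored in the text itself. The sketch below builds three spellings of "tva" in Sinhala and prints their UTF-8 bytes. The placement of the joiner after the al-lakuna is an assumption for illustration; how each variant is drawn depends on the font and the rendering engine.

```python
ZWNJ = "\u200c"   # zero width non-joiner
ZWJ = "\u200d"    # zero width joiner

TA = "\u0dad"          # Sinhala letter ta
AL_LAKUNA = "\u0dca"   # Sinhala vowel-killer sign
VA = "\u0dc0"          # Sinhala letter va

variants = {
    "plain": TA + AL_LAKUNA + VA,
    "with ZWNJ": TA + AL_LAKUNA + ZWNJ + VA,
    "with ZWJ": TA + AL_LAKUNA + ZWJ + VA,
}

for label, s in variants.items():
    utf8 = " ".join(f"{b:02X}" for b in s.encode("utf-8"))
    print(f"{label:10} {len(s)} code points: {utf8}")
```

The three strings look similar on screen but differ in storage, which matters when text is searched or compared.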
57
Typography - Complex Ligature - UTF-8
U+0000 .. U+007F: 1 byte: 0xxx xxxx
U+0080 .. U+07FF: 2 bytes: 110x xxxx 10xx xxxx
U+0800 .. U+FFFF: 3 bytes: 1110 xxxx 10xx xxxx 10xx xxxx
U+10000 .. U+10FFFF: 4 bytes: 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
U+0026 AMPERSAND (decimal 38)
U+0D85 SINHALA LETTER AYANNA (decimal 3,461)
U+4E2D HAN IDEOGRAPH 4E2D (decimal 20,013)
U+10346 GOTHIC LETTER FAIHU (decimal 66,374)
U+0E12 THAI LETTER THO PHUTHAO (decimal 3,602)
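The bit patterns in the table translate directly into shifts and masks. The following is a sketch of an encoder written from the table for illustration (it does not reject surrogate code points); in real code the built-in `str.encode("utf-8")` should be used, and it serves here as the reference for the five example characters.

```python
def utf8_encode(cp):
    """Encode one code point to UTF-8 following the bit-pattern table."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


EXAMPLES = (0x0026, 0x0D85, 0x4E2D, 0x10346, 0x0E12)
for cp in EXAMPLES:
    data = utf8_encode(cp)
    print(f"U+{cp:04X} -> {' '.join(f'{b:02X}' for b in data)}")
```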
58
Typography - Complex Ligature
Preventing Conjunct Forms in Devanagari
Half-Consonants in Devanagari
59
Typography - Complex Ligature
Buddha in Sinhala
60
Typography - Complex Ligature in DB
-- The dump for my database storing Sinhala UTF-8 strings is:
CREATE TABLE `sinhala` (
  `data` varchar(1000) character set utf8 collate utf8_bin default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `sinhala` VALUES ('...');
61
Typography
Typical typefaces (fonts) and type styles used in word processors
Typefaces
Serif typefaces: Times New Roman, Courier, Palatino
Sans-serif typefaces: Arial, Impact, Arial Narrow
Special typefaces: Symbol, freehand
Type styles: Bold, Italics, Outline
Crazy fonts can be distracting!
62
Typography
Special effects
Kerning increases or decreases the spacing between certain pairs of letters to improve their appearance.
Line spacing or leading
Orientation
Anti-alias: to smooth out a text edge. This makes the edges of the text blend into the background so that the text is cleaner and more readable when it is large.
Presenter notes: Kerning: most fonts include information that automatically reduces the amount of space between certain letter pairs, such as TA or Va. Fireworks auto-kerning uses a font's kerning information when displaying text, but you may want to turn it off at smaller point sizes, or when the text has no anti-aliasing. Kerning is measured as a percentage. In Fireworks, you set horizontal and vertical orientation as well as the direction of text flow in the Property inspector. These settings apply to entire text blocks only. Show an example using Fireworks.
63
Typography
Ascender height
Descender height
Cap Height
X height
Base line
64
Typography - Tracking & Kerning
65
Typography - Orientation
66
Typography Anti-alias
67
Typography
Special effects (cont.)
Applying strokes, fills, effects and styles to text
stroke, fill, effect, style
Presenter notes: Select the character, then select the default stroke/fill colours under the colour palette.
68
Typography
Special effects (cont.)
Attaching text to a path
Presenter notes: Attaching text to a path. To free text from the restrictions of rectangular text blocks, you can draw a path and attach text to it. The text flows along the shape of the path and remains editable. To place text on a path: Shift-select a text block and a path, then choose Text > Attach to Path. To detach text from a selected path: choose Text > Detach from Path.
69
Typography
Special effects (cont.)
Converting text to path:
Text converted to paths retains all of its visual attributes, but you can edit it only as paths.
Presenter notes: To edit converted text character paths individually, do one of the following: select the converted text with the Subselection tool, or select the converted text and choose Modify > Ungroup. You can edit the individual converted character paths using the vector-editing tools. Other effects: weight, stress, varying x-heights, alignments, etc.
70
Typography
Bitmap Font
Vector Font
TrueType: fast, standard, for computer screens and printers
Adobe Type 1: precise, professional, used for publishing
Anti-aliased small font
For LCD screens: ClearType, etc.
Screen from Fontographer
Normal
Optimized
Presenter notes: Bitmap fonts cannot be enlarged properly. Vector fonts can be enlarged properly and are the fundamental (standard) type for word processors. Adobe Type 1: better quality. ClearType: Microsoft technology, used in mobile phones and organizers. Fontographer: not available; FontLab is popular.
71
Text- Cross-media Technology
Voice Recognition
Converts voice (sound data) to text data
Needs real-time processing
Specific speaker/non-specific speaker
Text-to-Speech (Speech Synthesis)
Computer dictates text data
Automatic information services/new mail dictation
Presenter notes: Show some software.
72
Text- Cross-media Technology cont
Optical Character Recognition
Converts text bitmap image to real text data
Used with image scanner
Handwriting Recognition
Similar to OCR, but use writing order/direction for better recognition.
Used in PIM (Personal Information Manager) devices (palmtop computers)
73
Text- Cross-media Technology cont
Machine Translation
All text-based techniques are language dependent
Needs automatic translation
Vertical market: technical document translation
Personal market: Web browsing
Combination of media technology
Automatically translate international telephone messages:
Japanese voice -> Japanese voice recognition -> Japanese text data -> machine translation -> English text data -> English speech synthesis -> English voice
Presenter notes: Pagelater used in Iraq war.
74
File Format
.TXT - (unformatted text, e.g. Notepad)
.DOC - (developed by Microsoft, e.g. MS Word)
.RTF - (Rich Text Format)
.PDF - (Portable Document Format), Adobe
.PS - (PostScript), a page description language used mainly for desktop publishing