Tafseer Ahmed, Department of Computer Science, University of Karachi
Urdu on Linux
International Support
Note:
All the issues and support discussed for Urdu also apply to other Pakistani languages such as Sindhi, Pashto, Punjabi, and Balochi.
Character
A character is an abstract entity, such as "LATIN CAPITAL LETTER A" or "ARABIC LETTER HAH".
Every character has exactly one position (code point) in a character representation scheme such as Unicode.
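As a small illustration of the character/code point relationship, the sketch below uses Python's standard `unicodedata` module to look up the formal Unicode name and code point of two characters. The helper name `describe` is an assumption made for this example only.

```python
import unicodedata

def describe(ch):
    """Return (code point as U+XXXX, formal Unicode name) for one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

latin_a = describe("A")          # Latin capital letter A
arabic_hah = describe("\u062d")  # Arabic letter HAH

print(latin_a)
print(arabic_hah)
```

Each character maps to a single code point regardless of how it is later drawn on screen.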
Glyph
The visual representation of the character made on screen or paper is called a Glyph.
A character can have more than one glyph.
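One way to see several glyphs for one character is through the Arabic presentation forms that Unicode keeps for compatibility. The sketch below is only an illustration: real fonts identify glyphs by font-internal glyph IDs, and the presentation-form code points are used here merely as a convenient stand-in for the isolated, final, initial and medial shapes of the letter BEH (U+0628).

```python
import unicodedata

# Compatibility code points for the four positional shapes of BEH.
beh_forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]

# Each decomposition looks like "<isolated> 0628": a shape tag plus the
# abstract character that the shape represents.
shapes = {}
for form in beh_forms:
    tag, base = unicodedata.decomposition(form).split()
    shapes[tag.strip("<>")] = base

print(shapes)
```

All four shapes decompose to the same abstract character, which is the sense in which one character has many glyphs.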
Script
A script is the writing system used by a language.
For example, English and French are written in the Roman script, and Urdu and Farsi are written in the Arabic script.
Character Encoding
Data, and hence text, is stored in a computer as binary numbers.
A character encoding scheme such as ASCII or EBCDIC maps (English) characters to binary numbers for storage and processing.
The characters of any language can be given a character encoding. This is the basis of code pages.
Every language has a code page that holds the encoding of that language's characters.
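To make the idea of a code page concrete, the sketch below encodes the same characters under different schemes using Python's built-in codecs. The choice of `cp037` (one EBCDIC variant) and `iso8859_6` (an Arabic code page) is an assumption made for illustration; neither is the Urdu code page discussed in these slides.

```python
# The same abstract character gets a different byte in each encoding scheme.
ascii_a = "A".encode("ascii")    # ASCII
ebcdic_a = "A".encode("cp037")   # EBCDIC (US/Canada variant)

# An Arabic-script code page assigns single bytes to Arabic letters.
hah = "\u062d"                   # ARABIC LETTER HAH
arabic_hah = hah.encode("iso8859_6")

print(ascii_a.hex(), ebcdic_a.hex(), arabic_hah.hex())

# A code page only covers its own repertoire: ASCII cannot encode HAH.
try:
    hah.encode("ascii")
    ascii_can_encode_hah = True
except UnicodeEncodeError:
    ascii_can_encode_hah = False
```

The failure in the last step is the practical problem that a single multilingual repertoire is meant to solve.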
Character Encoding of Urdu
Proprietary standards (the biggest problem in Urdu software development)
Urdu Zabta Takhti (National standard code page of Urdu)
Unicode (International Standard for Multilingual Characters)
Unicode
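Unicode is a repository of the characters of almost all the languages of the world.
The original 16-bit design provides more than 65,000 code points for characters; the standard has since been extended beyond 16 bits, up to U+10FFFF.
Most major software vendors now support Unicode or are switching to it.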
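As a quick illustration, the sketch below lists the Unicode code points of the Urdu word for "Pakistan" and shows how many bytes it takes in two common encoding forms. The variable names are chosen for this example only.

```python
# The Urdu word "Pakistan": PEH, ALEF, KEHEH, SEEN, TEH, ALEF, NOON
word = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"

# One code point per character, all inside the Arabic block (U+0600..U+06FF).
code_points = [f"U+{ord(ch):04X}" for ch in word]

utf8_bytes = word.encode("utf-8")       # 2 bytes per character in this block
utf16_bytes = word.encode("utf-16-be")  # one 16-bit code unit per character

print(code_points)
print(len(utf8_bytes), len(utf16_bytes))
```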
Unicode™ / ISO 10646
16-bit international character encoding
[Figure: layout of the 16-bit code space from 0x0000 to 0xFFFF, with regions labeled ASCII, Latin, Greek, Arabic and Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Private use, Compatibility, and Future use. Example string of code units: 0041 (A), 9662, FF96, 4F85, 0000 (null).]
OpenType Font
OpenType is a new cross-platform font file format developed jointly by Adobe and Microsoft.
It is an extension of the TrueType font format.
An OpenType font may contain more than 65,000 glyphs.
One character may correspond to several glyphs.
It provides a rich mapping between characters and glyphs, which supports ligatures, positional forms, alternates, and other substitutions.
It includes information to support features for two-dimensional positioning and glyph attachment.
It carries explicit script and language information, so a text-processing application can adjust its behavior accordingly.
Tables in OTF Font
CMAP (Character to Glyph Mapping)
GDEF (Glyph Definition Data)
GPOS (Glyph Position Data)
GSUB (Glyph Substitution Data)
BASE (Baseline Data)
JSTF (Justification Data)
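The tables are located through a table directory at the start of the font file. The sketch below reads that directory with Python's `struct` module; it is a minimal illustration rather than a full font parser. The functions `list_tables` and `build_demo_directory` are hypothetical names, and the demo bytes are a synthetic directory with zeroed checksums, offsets and lengths, not a real font.

```python
import struct

def list_tables(data):
    """Return (sfnt_version, [table tags]) read from a font's table directory."""
    # Header: 4-byte version tag, then numTables and three search fields (uint16 each).
    sfnt_version, num_tables = struct.unpack_from(">4sH", data, 0)
    tags = []
    for i in range(num_tables):
        # Each 16-byte record: tag, checksum, offset, length.
        tag, _checksum, _offset, _length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        tags.append(tag.decode("ascii"))
    return sfnt_version, tags

def build_demo_directory(tags):
    """Build a synthetic directory (no table data) for demonstration only."""
    header = struct.pack(">4sHHHH", b"OTTO", len(tags), 0, 0, 0)
    records = b"".join(struct.pack(">4sIII", t.encode("ascii"), 0, 0, 0) for t in tags)
    return header + records

demo = build_demo_directory(["BASE", "GDEF", "GPOS", "GSUB", "JSTF", "cmap"])
version, tags = list_tables(demo)
print(version, tags)
```

With a real font, `list_tables(open(path, "rb").read())` would list whatever tables that font contains; note that the character mapping table's tag is written in lowercase as `cmap` inside the file.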
GSUB: An Example of OTF Tables
The GSUB table contains information for substituting glyphs to render the scripts and language systems supported in a font.
Types of Substitution
A Single Substitution replaces a single glyph with another single glyph.
An Alternate Substitution identifies functionally equivalent but different looking forms of a glyph.
A Multiple Substitution replaces a single glyph with more than one glyph. This is used to specify actions such as ligature decomposition.
A Ligature Substitution replaces several glyph indices with a single glyph index.
Contextual substitution describes glyph substitutions in context–that is, a substitution of one or more glyphs within a certain pattern of glyphs.
Each substitution describes one or more input glyph sequences and one or more substitutions to be performed on that sequence.
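The substitution types above can be sketched on plain lists of glyph names. This is a toy model under stated assumptions: real GSUB lookups operate on glyph indices and binary lookup tables, and the glyph names (`lam`, `alef`, `lam_alef`, `beh.init`) and function names here are hypothetical.

```python
def apply_single(glyphs, mapping):
    """Single substitution: one glyph replaced by one glyph."""
    return [mapping.get(g, g) for g in glyphs]

def apply_multiple(glyphs, mapping):
    """Multiple substitution: one glyph replaced by several glyphs."""
    out = []
    for g in glyphs:
        out.extend(mapping.get(g, [g]))
    return out

def apply_ligatures(glyphs, ligatures):
    """Ligature substitution: several glyphs replaced by one, longest match first."""
    longest = max((len(seq) for seq in ligatures), default=0)
    out, i = [], 0
    while i < len(glyphs):
        for n in range(longest, 1, -1):
            seq = tuple(glyphs[i:i + n])
            if len(seq) == n and seq in ligatures:
                out.append(ligatures[seq])
                i += n
                break
        else:
            # No ligature starts here; keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out

run = ["beh", "lam", "alef"]
positional = apply_single(run, {"beh": "beh.init"})
ligated = apply_ligatures(positional, {("lam", "alef"): "lam_alef"})
decomposed = apply_multiple(ligated, {"lam_alef": ["lam", "alef"]})
print(positional, ligated, decomposed)
```

The last step shows the ligature decomposition use of multiple substitution mentioned above.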
The Alphabet Soup
GNOME is a desktop environment for the user, as well as a powerful application framework for the software developer.
GTK+ is a multi-platform toolkit for creating graphical user interfaces, offering a complete set of widgets.
GTK+ is based on three libraries: GLib, Pango, and ATK.
GNOME uses GTK+ for its graphical user interface.
GNOME and GTK+ are open-source software and part of the GNU Project.
Pango
The word "Pango" consists of:
Greek "Pan" / U+03A0 U+03B1 U+03BD / All
Japanese "Go" / U+8A9E / Language
The Pango project is an open-source framework for the layout and rendering of internationalized text.
Pango uses Unicode (UTF-8 encoded strings) for all of its encoding, and will eventually support output in all the world's major languages.
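As an illustration of what a UTF-8 encoded string looks like, the sketch below builds the two halves of the name from the code points listed above and encodes the result. It uses only Python's built-in string handling and does not call Pango itself.

```python
# "Pan" in Greek and "Go" in Japanese, from the code points given above.
pan = "\u03a0\u03b1\u03bd"
go = "\u8a9e"
word = pan + go

# UTF-8 is variable-width: these Greek letters take 2 bytes each,
# and the Japanese character takes 3 bytes.
encoded = word.encode("utf-8")
byte_lengths = [len(ch.encode("utf-8")) for ch in word]

print(word, byte_lengths, len(encoded))
```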
Pango Fonts
Pango supports the following fonts:
Bitmap fonts (under the X Window System)
Type 1 fonts (Adobe standard)
TrueType fonts (Apple and Microsoft standard)
OpenType fonts (Adobe and Microsoft standard)
The Layout and Rendering Pipeline
Input: abc PAY ALIF KAF SEEN TAY ALIF NOON
Itemization: The input string is broken into portions rendered with a consistent font, with a consistent language tag, and with a specific bidirectional embedding level.
{abc} {PAY ALIF KAF SEEN TAY ALIF NOON}
Reordering: The items are reordered from logical order into visual order according to their bidirectional embedding levels.
{abc} {NOON ALIF TAY SEEN KAF ALIF PAY}
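The two steps above can be sketched in a heavily simplified form. The code below is not Pango's implementation and not the full Unicode Bidirectional Algorithm: it assumes a left-to-right paragraph, splits only on character direction (ignoring font and language), attaches neutral characters such as spaces to the preceding run, and reverses right-to-left runs character by character. The function names are hypothetical.

```python
import unicodedata

def direction(ch):
    """Classify a character as 'ltr', 'rtl', or None (neutral)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in ("R", "AL"):
        return "rtl"
    if bidi == "L":
        return "ltr"
    return None

def itemize(text):
    """Split text into runs of consistent direction (logical order)."""
    items = []
    for ch in text:
        d = direction(ch)
        if items and (d is None or d == items[-1][0]):
            items[-1][1].append(ch)
        else:
            items.append([d or "ltr", [ch]])
    return [(d, "".join(chars)) for d, chars in items]

def reorder(items):
    """Turn each run from logical into visual order."""
    return [(d, run[::-1] if d == "rtl" else run) for d, run in items]

urdu = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"  # PAY ALIF KAF SEEN TAY ALIF NOON
items = itemize("abc " + urdu)
visual = reorder(items)
print(items)
print(visual)
```

After reordering, the right-to-left run starts with NOON, matching the visual order shown above.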
The Layout and Rendering Pipeline (contd.)
Glyph Selection (Shaping): The characters in each item are turned into glyphs.
Justification: The glyph strings created in the previous step are adjusted to fit the line-justification policies that are in place.
Rendering: The justified glyph strings are rendered in their final order onto the output device.
abc پاکستان
Sample Screenshots
The GTK+ color selector localized to Farsi
GTK+ labels rendering various languages
Web Resources
• www.unicode.org
• www.adobe.com/type/opentype/
• www.microsoft.com/typography/developers/opentype/
• communities.msn.com/MicrosoftVOLTuserscommunity/
• www.gtk.org
• www.pango.org
• i18n.kde.org
• tremu.gov.pk/tremu/workingroups/url.htm