Tafseer Ahmed Department of Computer Science University of Karachi Urdu on Linux International Support


Tafseer Ahmed, Department of Computer Science, University of Karachi

Urdu on Linux

International Support

Note:

All the issues and support discussed for Urdu also apply to other Pakistani languages such as Sindhi, Pashto, Punjabi, and Balochi.

Character Encoding

Font

Text Display Engine

Character, Script, Glyph and Font

Character

A character is an abstract entity, such as "LATIN CAPITAL LETTER A" or the Arabic letter "HA".

Every character has exactly one position (code point) in a character representation scheme such as Unicode.

Glyph

The visual representation of the character made on screen or paper is called a Glyph.

A character can have more than one glyph.
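The character/glyph distinction can be checked against the Unicode Character Database. The following is a minimal Python sketch. The choice of the Arabic letter BEH is only an illustration, and the code points U+FE8F to U+FE92 are legacy compatibility characters ("presentation forms"), not the way a modern font stores its glyphs.

```python
import unicodedata

# One abstract character, one code point.
beh = "\u0628"
print(unicodedata.name("A"))   # LATIN CAPITAL LETTER A
print(unicodedata.name(beh))   # ARABIC LETTER BEH

# Unicode keeps four legacy presentation forms of BEH for compatibility.
# Each one records which abstract character it is a visual form of.
forms = {}
for cp in range(0xFE8F, 0xFE93):
    tag, base = unicodedata.decomposition(chr(cp)).split()
    forms[tag.strip("<>")] = chr(int(base, 16))

for shape, char in forms.items():
    print(shape, "form of", unicodedata.name(char))
```

All four shapes map back to the same character, which is the sense in which one character has several glyphs.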

Script

A script is the writing system used to write a language.

For example, English and French are written in the Roman script, and Urdu and Farsi are written in the Arabic script.

Writing Styles of Urdu

Naskh

Nastaleeq

Character Encoding


Data, and hence text, is stored in a computer as binary numbers.

Character encoding schemes such as ASCII and EBCDIC map (English) characters to binary numbers for storage and processing.

The characters of any language can be given a character encoding. This is the basis of code pages.

Every language has a code page that holds the encoding of that language's characters.
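A short Python sketch of the two points above: one character gets different numbers under different encodings, and one number means different characters under different code pages. The code pages used here (cp037 for EBCDIC, cp1256 and cp1252 for Windows Arabic and Western European) are chosen only as examples; they are not mentioned in the slides.

```python
# The same character maps to different numbers under different encodings.
ascii_a = "A".encode("ascii")    # ASCII
ebcdic_a = "A".encode("cp037")   # EBCDIC, code page 037
print(ascii_a.hex(), ebcdic_a.hex())   # 41 c1

# The same byte maps to different characters under different code pages.
byte = b"\xc7"
arabic = byte.decode("cp1256")    # Windows Arabic code page
western = byte.decode("cp1252")   # Windows Western European code page
print(f"U+{ord(arabic):04X}", f"U+{ord(western):04X}")   # U+0627 U+00C7
```

Text stored in a code page is only readable when the reader knows which code page was used, which is the problem a single shared standard avoids.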

Character Encoding of Urdu

Proprietary standards (the biggest problem in Urdu software development)

Urdu Zabta Takhti (National standard code page of Urdu)

Unicode (International Standard for Multilingual Characters)

Unicode

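Unicode is a repository of the characters of almost all the languages of the world.

Unicode has more than 65,000 code points in its original 16-bit range (0x0000 to 0xFFFF, now called the Basic Multilingual Plane); the full code space extends to 0x10FFFF.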

All Software vendors are now supporting or switching to Unicode.
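As an illustration, the Urdu word for "Pakistan" can be inspected code point by code point. This is a minimal Python sketch; the word is written with escapes so that the code points are explicit.

```python
import unicodedata

# "Pakistan" in Urdu: PEH ALEF KEHEH SEEN TEH ALEF NOON, in logical order.
word = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"

for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Code points are abstract numbers; UTF-8 and UTF-16 are two ways to store them.
utf8 = word.encode("utf-8")
utf16 = word.encode("utf-16-be")
print(len(word), len(utf8), len(utf16))   # 7 14 14
```

Every letter of this word lies in the Arabic block, so each one takes two bytes in both UTF-8 and UTF-16.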

Unicode™ / ISO 10646

16-bit international character encoding

[Figure: layout of the code space from 0x0000 to 0xFFFF, with regions labeled ASCII, Latin, Greek, Arabic, Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Private use, Compatibility, and Future use. Example code values: A = 0041, then 9662, FF96, 4F85, and 0000 (null).]

Font for Text Display

Open Type Font

OpenType is a new cross-platform font file format developed jointly by Adobe and Microsoft.

It is an extension of the TrueType font format.

An OpenType font may contain more than 65,000 glyphs.

One character may correspond to several glyphs.

A rich mapping between characters and glyphs, which supports ligatures, positional forms, alternates, and other substitutions.

Information to support features for two-dimensional positioning and glyph attachment.

Explicit script and language information, so a text-processing application can adjust its behavior accordingly.

Tables in OTF Font

cmap (Character to Glyph Mapping)

GDEF (Glyph Definition Data)

GPOS (Glyph Position Data)

GSUB (Glyph Substitution Data)

BASE (Baseline Data)

JSTF (Justification Data)
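The tables are listed in a directory at the start of the font file: a 12-byte header followed by one 16-byte record (tag, checksum, offset, length) per table. The following Python sketch reads that directory. The function names are hypothetical, and `fake_font` builds a header-only stand-in so that the example runs without a real font file.

```python
import struct

def list_tables(data: bytes) -> dict:
    """Return {tag: (offset, length)} from an OpenType table directory."""
    _version, num_tables = struct.unpack(">4sH", data[:6])
    tables = {}
    for i in range(num_tables):
        start = 12 + 16 * i   # records follow the 12-byte header
        tag, _checksum, offset, length = struct.unpack(
            ">4sIII", data[start:start + 16]
        )
        tables[tag.decode("ascii")] = (offset, length)
    return tables

def fake_font(tags):
    """Build a directory with no table data (hypothetical test input)."""
    header = struct.pack(">4sHHHH", b"\x00\x01\x00\x00", len(tags), 0, 0, 0)
    records = b"".join(
        struct.pack(">4sIII", tag.encode("ascii"), 0, 0, 0) for tag in tags
    )
    return header + records

font = fake_font(["GDEF", "GPOS", "GSUB", "cmap"])
print(sorted(list_tables(font)))   # ['GDEF', 'GPOS', 'GSUB', 'cmap']
```

With a real font, the bytes would come from reading the file in binary mode. This sketch does not handle font collection files.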

GSUB: An Example of OTF Tables

The GSUB table holds information for substituting glyphs to render the scripts and language systems supported in a font.

Types of Substitution

A Single Substitution replaces a single glyph with another single glyph.

An Alternate Substitution identifies functionally equivalent but different looking forms of a glyph.

A Multiple Substitution replaces a single glyph with more than one glyph. This is used to specify actions such as ligature decomposition.

A Ligature Substitution replaces several glyph indices with a single glyph index.

Contextual Substitution describes glyph substitutions in context, that is, a substitution of one or more glyphs within a certain pattern of glyphs.

Each substitution describes one or more input glyph sequences and one or more substitutions to be performed on that sequence.
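The substitution types can be pictured as simple rewrites on a list of glyphs. The following Python sketch is a toy model only: the glyph names and rules are hypothetical, real fonts store glyph indices in binary lookup tables, and contextual substitution is left out.

```python
# Hypothetical glyph names and rules, for illustration only.
SINGLE = {"heh": "heh.alt"}                 # one glyph -> one glyph
MULTIPLE = {"lam_alef": ["lam", "alef"]}    # one glyph -> several glyphs
LIGATURE = {("lam", "alef"): "lam_alef"}    # several glyphs -> one glyph

def substitute_single(glyphs):
    return [SINGLE.get(g, g) for g in glyphs]

def substitute_multiple(glyphs):
    out = []
    for g in glyphs:
        out.extend(MULTIPLE.get(g, [g]))
    return out

def substitute_ligatures(glyphs):
    out, i = [], 0
    longest = max(len(key) for key in LIGATURE)
    while i < len(glyphs):
        # Try the longest input sequence first.
        for n in range(longest, 1, -1):
            key = tuple(glyphs[i:i + n])
            if key in LIGATURE:
                out.append(LIGATURE[key])
                i += n
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out

print(substitute_ligatures(["seen", "lam", "alef"]))   # ['seen', 'lam_alef']
print(substitute_multiple(["lam_alef"]))               # ['lam', 'alef']
```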

Text Display Engine

The Alphabet Soup

GNOME is a desktop environment for the user, as well as a powerful application framework for the software developer.

GTK+ is a multi-platform toolkit for creating graphical user interfaces, offering a complete set of widgets.

GTK+ is based on three libraries: GLib, Pango, and ATK.

GNOME uses GTK+ for its graphical user interface.

GNOME and GTK+ are open source software and part of the GNU Project.

Pango

The word "Pango" consists of:

Greek "Pan" / U+03A0 U+03B1 U+03BD / All

Japanese "Go" / U+8A9E / Language

The Pango project is an open-source framework for the layout and rendering of internationalized text.

Pango uses Unicode (UTF-8 encoded strings) for all of its encoding, and will eventually support output in all the world's major languages.

Pango Fonts

Pango supports the following fonts:

Bitmap fonts (under the X Window System)

Type 1 fonts (Adobe standard)

TrueType fonts (Apple and Microsoft standard)

OpenType fonts (Adobe and Microsoft standard)

The Layout and Rendering Pipeline

abc PAY ALIF KAF SEEN TAY ALIF NOON

Itemization: The input string is broken into portions rendered with a consistent font, with a consistent language tag, and with a specific bidirectional embedding level.

{abc} {PAY ALIF KAF SEEN TAY ALIF NOON}

Reordering: The items are reordered from logical order into visual order according to their bidirectional embedding levels.

{abc} {NOON ALIF TAY SEEN KAF ALIF PAY}
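The two steps above can be sketched in Python for the same input. This is a heavy simplification and not Pango's implementation: it splits only on direction (ignoring font and language), treats spaces and other neutral characters as left-to-right, and assumes a left-to-right paragraph. The full procedure is the Unicode Bidirectional Algorithm.

```python
import unicodedata
from itertools import groupby

def direction(ch):
    # Strong right-to-left classes are "R" (e.g. Hebrew) and "AL" (Arabic).
    return "rtl" if unicodedata.bidirectional(ch) in ("R", "AL") else "ltr"

def itemize(text):
    """Break the text into runs of consistent direction."""
    return [(d, "".join(run)) for d, run in groupby(text, key=direction)]

def reorder(items):
    """Logical order to visual order: reverse the right-to-left runs."""
    return [(d, run[::-1] if d == "rtl" else run) for d, run in items]

# PEH ALEF KEHEH SEEN TEH ALEF NOON, in logical (typing) order.
word = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"
items = itemize("abc " + word)
visual = reorder(items)

for d, run in visual:
    print(d, [unicodedata.name(c, "SPACE").split()[-1] for c in run])
```

After reordering, the right-to-left item starts with NOON and ends with PEH, matching the visual order shown above.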

The Layout and Rendering Pipeline (contd.)

Glyph Selection (Shaping): The characters in each item are turned into glyphs.

Justification: The glyph strings created in the previous step are adjusted to fit the line-justification policies that are in place.

Rendering: The justified glyph strings are rendered in their final order onto the output device.

abc پاکستان
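For the Arabic script, much of glyph selection comes down to choosing a positional form for each letter. The Python sketch below does this for the example word. The joining sets are hand-written for this one word (an assumption, not a general table), and a real shaper takes the actual glyphs from the font's substitution tables.

```python
# Letters that join on both sides: PEH, KEHEH, SEEN, TEH, NOON.
DUAL = set("\u067e\u06a9\u0633\u062a\u0646")
# Letters that join only to the preceding letter: ALEF.
RIGHT = set("\u0627")

def positional_forms(word):
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        # A letter connects backwards only if the previous letter joins forwards.
        joins_prev = prev in DUAL and ch in (DUAL | RIGHT)
        # A letter connects forwards only if it is dual-joining itself.
        joins_next = ch in DUAL and nxt in (DUAL | RIGHT)
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

# PEH ALEF KEHEH SEEN TEH ALEF NOON, in logical order.
word = "\u067e\u0627\u06a9\u0633\u062a\u0627\u0646"
print(positional_forms(word))
# ['initial', 'final', 'initial', 'medial', 'medial', 'final', 'isolated']
```

Each ALEF breaks the connection to the letter after it, so the word is drawn as three connected groups.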

Sample Screenshots

The GTK+ color selector localized to Farsi

GTK+ labels rendering various languages

Web Resources

• www.unicode.org

• www.adobe.com/type/opentype/

• www.microsoft.com/typography/developers/opentype/

• communities.msn.com/MicrosoftVOLTuserscommunity/

• www.gtk.org

• www.pango.org

• i18n.kde.org

• tremu.gov.pk/tremu/workingroups/url.htm