
Unicode and Collation Support in Microsoft SQL Server

Michael S. Kaplan

Globalization Infrastructure and Font Technology

Windows International

Microsoft

24-26 March 2003 Prague, Czech Republic (IUC23)

Unicode Support

Uses the "N" (national) data types from the SQL-92 specification:
– NCHAR, NVARCHAR, NTEXT

What the SQL-99 specification says about Unicode

Interoperability with other clients
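As a rough illustration of what the N types cost in storage: SQL Server 2000 stores NCHAR/NVARCHAR/NTEXT data as 16-bit UCS-2 code units, so NCHAR(n) takes 2n bytes whatever the script. The Python sketch below uses UTF-16LE as a stand-in for that storage; the helper names are hypothetical. A character outside the BMP counts as two code units (a surrogate pair).

```python
def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units the text occupies (UCS-2/UTF-16 storage)."""
    # UTF-16LE encodes each BMP character in 2 bytes and each
    # supplementary-plane character as a surrogate pair (4 bytes).
    return len(text.encode("utf-16-le")) // 2


def n_storage_bytes(text: str) -> int:
    """Bytes needed to hold the text in an NCHAR/NVARCHAR value."""
    return 2 * utf16_code_units(text)


if __name__ == "__main__":
    for sample in ["čeština", "한국어", "\U00020000"]:
        print(repr(sample), utf16_code_units(sample), n_storage_bytes(sample))
```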


Collation in SQL Server <= 6.5

No Unicode support at all

One code page per server

One collation per server

No good solution for multilingual support
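Why one code page per server rules out multilingual data: no single Windows code page covers, for example, Czech, Greek, and Thai text together. A minimal Python check, assuming Python's `cp125x`/`cp874` codecs match the Windows code pages closely enough for illustration; the helper name is hypothetical.

```python
# Python codec names for a few Windows code pages (illustrative selection).
CANDIDATE_CODE_PAGES = ("cp1250", "cp1251", "cp1252", "cp1253", "cp874")


def code_pages_that_fit(text: str, code_pages=CANDIDATE_CODE_PAGES) -> list:
    """Return the code pages that can encode every character of text."""
    fits = []
    for cp in code_pages:
        try:
            text.encode(cp)  # strict mode: raises on any unmappable character
        except UnicodeEncodeError:
            continue
        fits.append(cp)
    return fits


if __name__ == "__main__":
    mixed = "čeština ελληνικά ภาษาไทย"
    print(code_pages_that_fit(mixed))      # no single code page holds all three
    print(code_pages_that_fit("čeština"))  # Central European text alone fits one
```

Unicode sidesteps the problem: the same mixed string encodes cleanly as UTF-16.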


Collation in SQL Server 7.0

Unicode data types supported

Two collations:
– Unicode
– Non-Unicode

Number of collations distilled down to the minimum necessary


7.0 flattening of collations

Example: the General Unicode sort order handles: Afrikaans, Albanian, Arabic, Basque, Belarusian, Bulgarian, English, Faeroese, Farsi, Georgian (Traditional), Greek, Hebrew, Hindi, Indonesian, Malay, Russian, Serbian, Swahili, and Urdu


OS independence

Collation is independent of the operating system

Based on the Jet “Unicorn” DLLs


SQL Language Support (limited locale information)

Messages

Date/Time

First Day of Week

Currency and currency symbols

Month/day names and abbreviated month names
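The first-day-of-week setting is not cosmetic; it changes query results. Each language carries a default for SET DATEFIRST (7, Sunday, for us_english), and DATEPART(weekday, ...) counts from that day. A small Python model, assuming SQL Server's numbering of 1 = Monday through 7 = Sunday; `sql_weekday` is a hypothetical name.

```python
import datetime


def sql_weekday(day: datetime.date, datefirst: int) -> int:
    """Model DATEPART(weekday, day) under SET DATEFIRST datefirst.

    datefirst uses SQL Server's numbering: 1 = Monday ... 7 = Sunday.
    """
    if not 1 <= datefirst <= 7:
        raise ValueError("DATEFIRST must be between 1 and 7")
    # date.isoweekday() also numbers Monday as 1 and Sunday as 7,
    # so shifting by datefirst makes the chosen first day come out as 1.
    return (day.isoweekday() - datefirst) % 7 + 1


if __name__ == "__main__":
    sunday = datetime.date(2003, 3, 23)
    print(sql_weekday(sunday, 7), sql_weekday(sunday, 1))
```

The same Sunday is day 1 when the week starts on Sunday and day 7 when it starts on Monday.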


SQL Language Support (list of languages)

Arabic, British English, Brazilian, Bulgarian, Simplified Chinese, Traditional Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish


Getting at the list of languages

sp_helplanguage stored procedure

syslanguages/sysmessages tables

SET LANGUAGE
– SET LANGUAGE čeština
– SET LANGUAGE 한국어

Each language has a langid (0–32)
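A toy model of the SET LANGUAGE lookup, assuming each syslanguages row carries both a localized name (čeština, 한국어) and an English alias that SET LANGUAGE also accepts. The rows below are illustrative, not read from a server, so check them against sp_helplanguage; the helper names are hypothetical.

```python
# Illustrative rows only; a real server exposes these through syslanguages
# or sp_helplanguage. The alias values are assumptions to verify.
LANGUAGES = [
    {"name": "čeština", "alias": "Czech"},
    {"name": "한국어", "alias": "Korean"},
]


def resolve_language(requested: str):
    """Return the row whose name or alias equals the requested string."""
    for row in LANGUAGES:
        if requested in (row["name"], row["alias"]):
            return row
    return None  # SET LANGUAGE would reject an unknown language


def is_valid_langid(langid: int) -> bool:
    """The slide's range of langid values: 0 through 32."""
    return 0 <= langid <= 32
```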


Collation in SQL Server 2000

Combined code pages and collations into a single entity
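Because each SQL Server 2000 collation implies a code page for non-Unicode data, the code page can be looked up from the collation name (on a real server, `COLLATIONPROPERTY('Czech_CI_AS', 'CodePage')`). A hypothetical Python version with a handful of commonly documented pairs, not the full table:

```python
# Hypothetical lookup modeled on COLLATIONPROPERTY(name, 'CodePage').
# Only a few well-known pairs are listed; verify others on a real server.
BASE_CODE_PAGES = {
    "Latin1_General": 1252,
    "Czech": 1250,
    "Polish": 1250,
    "Cyrillic_General": 1251,
    "Greek": 1253,
    "Turkish": 1254,
    "Hebrew": 1255,
    "Arabic": 1256,
    "Thai": 874,
    "Japanese": 932,
    "Chinese_PRC": 936,
    "Korean_Wansung": 949,
}

SUFFIXES = {"BIN", "CI", "CS", "AI", "AS", "KS", "WS"}


def base_name(collation: str) -> str:
    """Strip trailing sensitivity/binary suffixes from a Windows collation name."""
    parts = collation.split("_")
    while len(parts) > 1 and parts[-1] in SUFFIXES:
        parts.pop()
    return "_".join(parts)


def code_page_for(collation: str):
    """Return the non-Unicode code page for a collation, or None if unknown."""
    return BASE_CODE_PAGES.get(base_name(collation))
```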


"Windows" collations

Added for unique code pages (Example – Arabic)

Added for unique ordering (Example – French)

Removed for identical ordering (Example – Finnish_Swedish)


43 Windows Collations

Albanian, Arabic, Chinese_PRC, Chinese_PRC_Stroke, Chinese_Taiwan_Bopomofo, Chinese_Taiwan_Stroke, Cyrillic_General, Croatian, Czech, Danish_Norwegian, Estonian, Finnish_Swedish, French, Georgian_Modern_sort, German_PhoneBook, Greek, Hebrew, Hindi, Hungarian, Hungarian_Technical, Icelandic, Japanese, Japanese_Unicode, Korean_Wansung, Korean_Wansung_Unicode, Latin1_General, Latvian, Lithuanian, Lithuanian_Classic, FYRO Macedonian, Spanish (Spain), Polish, Romanian, Slovak, Slovenian, Thai, Traditional_Spanish, Turkish, Ukrainian, Vietnamese


Windows collations, continued

Suffix meanings:
– _BIN (binary)
– _CI/_CS (case sensitivity)
– _AI/_AS (accent sensitivity)
– _KS (kanatype sensitivity: hiragana/katakana)
– _WS (width sensitivity: full/half width)
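The suffixes can be read mechanically. A hypothetical Python parser, plus a deliberately rough comparison key that honors only the _CI and _AI flags; real collations use Windows sort keys, not Unicode decomposition, so treat this as an illustration of what the flags mean rather than how the server compares.

```python
import unicodedata
from dataclasses import dataclass

SUFFIXES = {"BIN", "CI", "CS", "AI", "AS", "KS", "WS"}


@dataclass(frozen=True)
class CollationFlags:
    base: str
    binary: bool
    case_sensitive: bool
    accent_sensitive: bool
    kana_sensitive: bool
    width_sensitive: bool


def parse_windows_collation(name: str) -> CollationFlags:
    """Split a Windows collation name into its base name and suffix flags.

    A missing _KS or _WS suffix means kana-type or width insensitive.
    A _BIN collation compares binary values, so it is treated here as
    sensitive to everything.
    """
    parts = name.split("_")
    found = set()
    while len(parts) > 1 and parts[-1] in SUFFIXES:
        found.add(parts.pop())
    binary = "BIN" in found
    return CollationFlags(
        base="_".join(parts),
        binary=binary,
        case_sensitive=binary or "CS" in found,
        accent_sensitive=binary or "AS" in found,
        kana_sensitive=binary or "KS" in found,
        width_sensitive=binary or "WS" in found,
    )


def rough_key(text: str, flags: CollationFlags) -> str:
    """Very rough equality key honoring only the case and accent flags."""
    if flags.binary:
        return text
    if not flags.accent_sensitive:
        # Decompose, then drop combining marks such as acute accents.
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not flags.case_sensitive:
        text = text.casefold()
    return text
```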


SQL Collations

Provided for backwards compatibility with prior versions of SQL Server


SQL Collations

SQL_1xCompat_CP850, SQL_Estonian_CP1257, SQL_Latin1_General_Pref_CP437, SQL_AltDiction_CP1253, SQL_Hungarian_CP1250, SQL_Latin1_General_Pref_CP850, SQL_AltDiction_CP850, SQL_Icelandic_Pref_CP1, SQL_Latvian_CP1257, SQL_AltDiction_Pref_CP850, SQL_Latin1_General_CP1, SQL_Lithuanian_CP1257, SQL_Croatian_CP1250, SQL_Latin1_General_CP1250, SQL_MixDiction_CP1253, SQL_Czech_CP1250, SQL_Latin1_General_CP1251, SQL_Polish_CP1250, SQL_Danish_Pref_CP1, SQL_Latin1_General_CP1253, SQL_Romanian_CP1250, SQL_EBCDIC037_CP1, SQL_Latin1_General_CP1254, SQL_Scandinavian_CP850, SQL_EBCDIC273_CP1, SQL_Latin1_General_CP1255, SQL_Scandinavian_Pref_CP850, SQL_EBCDIC277_CP1, SQL_Latin1_General_CP1256, SQL_Slovak_CP1250, SQL_EBCDIC278_CP1, SQL_Latin1_General_CP1257, SQL_Slovenian_CP1250, SQL_EBCDIC280_CP1, SQL_Latin1_General_CP437, SQL_SwedishPhone_Pref_CP1, SQL_EBCDIC284_CP1, SQL_Latin1_General_CP850, SQL_SwedishStd_Pref_CP1, SQL_EBCDIC285_CP1, SQL_Latin1_General_Pref_CP1, SQL_Ukrainian_CP1251
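SQL collation names follow a pattern: sort order, then `CP` plus the code page, then (in the full server names) flags such as `_CI_AS`. The abbreviation `CP1` is generally documented as code page 1252. A hypothetical Python parser for that pattern:

```python
import re

# SQL_<sort order>_CP<code page>[_<flags>]; the sort order is matched lazily
# so that it stops at the first "_CP<digits>".
SQL_COLLATION = re.compile(r"^SQL_(?P<sort>.+?)_CP(?P<cp>\d+)(?:_(?P<flags>.+))?$")


def parse_sql_collation(name: str):
    """Return (sort order, code page, flags) or None if not a SQL collation name."""
    match = SQL_COLLATION.match(name)
    if match is None:
        return None
    code_page = int(match.group("cp"))
    if code_page == 1:
        code_page = 1252  # "CP1" is shorthand for Windows code page 1252
    return match.group("sort"), code_page, match.group("flags")
```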


Collation at four levels

Server

Database

Column

Expression
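Each level acts as the default for the one below it. A simplified Python model, assuming the most specific level that is set wins; it ignores the precedence rules for combining two operands, and in practice a column takes the database default at the time the column is created. The function name is hypothetical.

```python
from typing import Optional


def effective_collation(server: str,
                        database: Optional[str] = None,
                        column: Optional[str] = None,
                        expression: Optional[str] = None) -> str:
    """Return the most specific collation that has been set.

    None means "not set at this level, inherit from the level above".
    """
    for level in (expression, column, database):
        if level is not None:
            return level
    return server
```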


At the server level

Acts as a default for all databases

Can be changed with RebuildM.exe (the Rebuild Master utility) in the tools\BINN directory

Querying the server collation:

SELECT CONVERT(sysname, SERVERPROPERTY('Collation'))
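A caveat when converting the server collation to a string: CONVERT to char without a length yields char(30), so longer collation names are silently cut off, while converting to sysname keeps the full name. A Python model of the truncate-and-pad behavior; `convert_to_char` is hypothetical.

```python
def convert_to_char(value: str, length: int = 30) -> str:
    """Model CONVERT(char(length), value): truncate, then pad with spaces.

    T-SQL uses a length of 30 when CAST/CONVERT to char omits the length.
    """
    return value[:length].ljust(length)


if __name__ == "__main__":
    print(repr(convert_to_char("Chinese_Taiwan_Bopomofo_CI_AS_KS_WS")))
```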


At the database level

Every database has a collation (the default is the server collation)

Collation can be changed under some circumstances (ALTER DATABASE ... COLLATE); existing columns keep their original collation


At the column level

Overrides the database-level collation

Specifies the code page for non-Unicode columns

Again, can be changed under some circumstances

No multilingual columns with separate collations (a single column cannot use different collations for different rows)
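What "specifies the code page" means in practice: a non-Unicode column can only hold characters that exist in its collation's code page. A Python sketch of the round trip, assuming Python's codecs approximate the Windows code pages; Python substitutes `?` for unmappable characters, while the server may choose a best-fit character instead. Both helper names are hypothetical.

```python
def fits_code_page(text: str, code_page: int) -> bool:
    """True if every character of text exists in the given Windows code page."""
    try:
        text.encode(f"cp{code_page}")
    except UnicodeEncodeError:
        return False
    return True


def store_in_varchar(text: str, code_page: int) -> str:
    """Round-trip text through a code page, replacing unmappable characters."""
    # errors="replace" makes Python emit b"?" for anything not in the code page.
    encoded = text.encode(f"cp{code_page}", errors="replace")
    return encoded.decode(f"cp{code_page}")


if __name__ == "__main__":
    print(store_in_varchar("čeština", 1250))  # Czech collation: round-trips
    print(store_in_varchar("čeština", 1252))  # Latin1_General: loses the č
```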


At the expression level

Can be used to override any other collation

Uses the COLLATE keyword
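COLLATE at the expression level changes how values compare, not what is stored, e.g. `ORDER BY name COLLATE Czech_CI_AS`. Czech shows why this matters: its alphabet treats "ch" as a single letter that sorts after "h". A deliberately simplified Python model of that one rule (not the real Czech collation; accents and other letters are ignored):

```python
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}


def czech_like_key(word: str) -> tuple:
    """Simplified Czech-style key: 'ch' is one letter sorting after 'h'."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            key.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the toy alphabet sort after it by code point.
            key.append(RANK.get(word[i], len(ALPHABET) + ord(word[i])))
            i += 1
    return tuple(key)


words = ["ina", "chata", "hrad"]
print(sorted(words))                      # ['chata', 'hrad', 'ina']
print(sorted(words, key=czech_like_key))  # ['hrad', 'chata', 'ina']
```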


Metadata in System Tables

All metadata is stored as Unicode, regardless of the database collation

The Unicode 2.0 repertoire is used for identifiers (delimit anything else with [brackets], or with double quotes when QUOTED_IDENTIFIER is ON)
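Identifiers outside the regular-identifier rules need delimiters. A hypothetical Python sketch of bracket quoting as T-SQL's QUOTENAME does it by default (doubling any `]`), plus a rough regular-identifier check. Python's `str.isalpha()` follows the current Unicode version rather than Unicode 2.0, and reserved words are not checked, so this only approximates the server's rule.

```python
def quote_name(identifier: str) -> str:
    """Bracket-delimit an identifier, escaping ']' by doubling it."""
    return "[" + identifier.replace("]", "]]") + "]"


def looks_regular(identifier: str) -> bool:
    """Rough check for an identifier that needs no delimiters.

    First character: a letter, '_', '@' or '#'; the rest: letters, digits,
    '_', '@', '#' or '$'. Reserved words are not checked.
    """
    if not identifier:
        return False
    first, rest = identifier[0], identifier[1:]
    if not (first.isalpha() or first in "_@#"):
        return False
    return all(ch.isalnum() or ch in "_@#$" for ch in rest)
```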


More on the COLLATE keyword

COLLATE { <Windows_collation_name> | <SQL_collation_name> }

Specific rules of precedence:
– Explicit (two different explicit collations == runtime error)
– Implicit (two different implicit collations == no collation)
– Default
– <no collation>
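The precedence rules can be written as a small function. A Python sketch, assuming the combination table in the SQL Server "Collation Precedence" documentation (which calls Default "coercible-default"): explicit wins over everything except a different explicit collation, and <no collation> wins over implicit and default. The class and function names are hypothetical.

```python
from enum import IntEnum
from typing import NamedTuple, Optional


class Label(IntEnum):
    NO_COLLATION = 0
    DEFAULT = 1      # "coercible-default" in the SQL Server documentation
    IMPLICIT = 2     # e.g. a column reference
    EXPLICIT = 3     # an expression with a COLLATE clause


class Operand(NamedTuple):
    label: Label
    collation: Optional[str]  # None only for NO_COLLATION


class CollationConflict(Exception):
    """Two different explicit collations cannot be combined."""


def combine(a: Operand, b: Operand) -> Operand:
    """Combine two operands' collations following the precedence rules."""
    if a.label == Label.EXPLICIT and b.label == Label.EXPLICIT:
        if a.collation != b.collation:
            raise CollationConflict(f"{a.collation} vs {b.collation}")
        return a
    if Label.EXPLICIT in (a.label, b.label):
        return a if a.label == Label.EXPLICIT else b
    if Label.NO_COLLATION in (a.label, b.label):
        return Operand(Label.NO_COLLATION, None)
    if a.label == Label.IMPLICIT and b.label == Label.IMPLICIT:
        if a.collation != b.collation:
            return Operand(Label.NO_COLLATION, None)
        return a
    if Label.IMPLICIT in (a.label, b.label):
        return a if a.label == Label.IMPLICIT else b
    return a  # both coercible-default
```

An operation that actually needs a collation (a comparison, for instance) would fail on a NO_COLLATION result; adding an explicit COLLATE to either side resolves it.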


Limitations

Features people will want for future versions:
– LCID --> Collation
– ISO string <--> Collation
– Creating custom collations?


References

http://microsoft.com/globaldev/

“International Features in Microsoft SQL Server 2000” (by Michael Kaplan) at http://msdn.microsoft.com/


Questions?
