Upload
sybil-mckenzie
View
220
Download
1
Embed Size (px)
Citation preview
DAT290Databases for the World:Designing Multilingual Databases Using SQL Server 2005
Michael KaplanTechnical LeadGlobalization Infrastructure, Fonts & ToolsMicrosoft Corporation
DAT290Wereld databanken:Meertalige databases ontwerpen met SQL Server 2005
Michael KaplanTechnical LeadGlobalization Infrastructure, Fonts & ToolsMicrosoft Corporation
Agenda
Unicode tekstvelden gebruiken
Omgaan met niet-Unicode data
SQL Server collation-ondersteuning
Collations aanpassen
Te vermijden collation-valkuilen
Andere taalondersteuningen
Meer informatie
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
Unicode is…
16-bit encoding, originally designed to handle 65,536 code points
Supplementary characters allow for over 1 million code points
Microsoft Windows NT (especially Windows 2000) and later is Unicode-based
0x0000
0xFFFF
PunctuationPunctuation
ASCIIASCII
Private usePrivate use
CompatibilityCompatibility
IndianIndian
GreekGreek
Arabic, HebrewArabic, Hebrew
LatinLatin
IdeographsIdeographs(Hanzi, Kanji,(Hanzi, Kanji,
Hanja) Hanja)
SymbolsSymbols
HangulHangul
KanaKana
ThaiThai
SurrogatesSurrogates
Not to scale!Not to scale!
A0041 9662 FF96 0000
(null)
Unicode
MBCS41 88 AF D6 00
SurrogatesSurrogates
Using Unicode Text Fields (cont’d)Basics about SQL Server Unicode
NCHAR, NVARCHAR, NTEXT
META data – SYSNAME
N’Unicode’
UCS-2 Server storage(with some UTF-16 abilities)
Conversions between ANSI to Unicode use MultiByteToWideChar and WideCharToMultiByte
Using Unicode Text Fields (cont’d)Unicode encoding
UTF-16 (or UCS-2)
The “standard” 16-bit Unicode
UTF-32 (or UCS-4)
UTF-8
“Unicode Transformation Format, 8-bit”
For transmission over unknown lines
UTF-8 Bit Distribution
UTF-16 1st Byte 2nd Byte 3rd Byte 4th Byte
00000000 0xxxxxxx 0xxxxxxx - - -
00000yyy yyxxxxxx 110yyyyy 10xxxxxx - -
zzzzyyyy yyxxxxxx 1110zzzz 10yyyyyy 10xxxxxx -
110110ww wwzzzzyy110111yy yyxxxxxx
11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
Using Unicode Text Fields (cont’d)UCS-2 vs. UTF-16
The difference?
Supplementary character (also known as surrogate pair) support
Using Unicode Text Fields (cont’d)Supplementary characters
Surrogate range from U+D800 to U+DFFF
Contains 1024 low and 1024 high surrogate values
Surrogate aware comparison operator for all new (*_90) collations
Ability to manipulate and display supplementary characters
Using Unicode Text Fields (cont’d)Surrogate pairs in new (*_90) collations
Supplementary character aware=, <, >, >=, =<, <>, !=, !<, !>
With Group By, Order By, Distinct
Based on code point comparisonEquivalent to code point comparison when comparing two supplementary characters
When comparing a supplementary and a BMP character, the supplementary character has a fixed weight
Native T-SQL functions give surrogate code units no weight
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
Handling Non-Unicode DataGB18030 Support
Chinese National Standard
Incorporates all of Unicode, including Extension A and B ideographs
Can be one, two, or four bytes
Microsoft SQL Server has support for:
Converting native GB18030 stream into Unicode
Storing and manipulating native GB18030 (as Unicode characters)
Output can be either native GB18030 or Unicode characters
Handling Non-Unicode Data (cont’d)All other non-Unicode data
Use SQL Server collations
Only ANSI code pages (ACPs) supported
Use the .NET Framework Encoding support or tools like BCP/DTS for other code pages
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
SQL Server Collation SupportCollation basics
Provides an appropriate sort orderGerman: a < ä, Swedish: z < äFrench: x < y, Lithuanian: i < y
Collations also define code pages for non-Unicode conversions
55 language collations – Updated to Microsoft Windows XP/Microsoft Windows Server 2003 data (plus Turkish i!)
New Comparison Flag
_BIN2 (Binary) Fastest – True Unicode Code Point Compare
New Yukon collations are tagged _90
SQL Server Collation Support (cont’d)Technical details
Windows 55 language collationsWindows Locale + sort orderSuffix meanings (comparison flag (sort order)
_BIN (Binary) Fastest _BIN2 (Binary code Point) Fastest*_CI/_CS (Case sensitivity) (A == a)_AI/_AS (Accent sensitivity) (á == a)_KS - kanatype sensitivity (hiragana == katakana) _WS - width sensitivity (full width == half width)
SQL CollationsProvided for backwards compatibility with prior versions of SQL Server
* Slightly slower on insert than _BIN, same speed otherwise
SQL Server Collation Support (cont’d)Technical details (cont’d)
What makes up the full set?
All unique collations
All unique ANSI code pages
All new unique collations
All SQL compatibility collations
The full number of SQL Server collations
Select * FROM ::fn_helpcollations()Select * FROM ::fn_helpcollations()
SQL Server Collation Support (cont’d)A four-level approach
Collation can be specified at four levels
Server
Database
Column
Expression
SQL Server Collation Support (cont’d)At the server level
Acts as a default for all databases
Can be changed with RebuildM.exe in the tools\BINN dir (or unattended setup)
Querying the server collation:
SELECT CONVERT(char, SERVERPROPERTY('collation'))SELECT CONVERT(char, SERVERPROPERTY('collation'))
SQL Server Collation Support (cont’d)Setup/upgrade
Unicode datatypes in master database
DB-object metadata converted in Unicode
Sort order compatibility
7.07.08.08.06.56.5
SQL Server Collation Support (cont’d)At the database level
Every database has a collation (default is the server collation)
Collation can be changed using ALTER DATABASE
Create database db_test collate Latin1_General_CI_AICreate database db_test collate Latin1_General_CI_AI
ALTER DATABASE ALTER DATABASE CONVERT(char, CONVERT(char, DATABASEPROPERTYEX(‘db_test', 'collation')) DATABASEPROPERTYEX(‘db_test', 'collation'))
SQL Server Collation Support (cont’d)At the column level
Overrides database level collation
CREATE TABLE jobs( job_id smallint IDENTITY(1,1) PRIMARY KEY CLUSTERED, job_desc nvarchar(50) COLLATE Arabic_CI_AI_KS NOT NULL DEFAULT 'New Position - title not formalized yet',)
SQL Server Collation Support (cont’d)At the expression level
SELECT * FROM tblCustomers ORDER BY LastName COLLATE Lithuanian_AI_CI,FirstName COLLATE Lithuanian_AI_CI
SELECT * FROM tblCustomers ORDER BY LastName, FirstName
Collation-aware syntax
Regular syntax
SQL Server Collation Support (cont’d)Collation example
Inherits collation from Name column (French_CS_AS)
Forced collation/ordering
SELECT *FROM PRODUCTSORDER BY Name, Description
SELECT *FROM PRODUCTSORDER BY Name COLLATE French_CI_AI, Description
SQL Server Collation Support (cont’d)The rules of precedence
Explicit B
Expression
Implicit B
Column
Default
DatabaseNo Collation
Explicit A
ExpressionRuntime Error Explicit A Explicit A Explicit A
Implicit A
ColumnExplicit B No Collation Implicit A No Collation
Default
DatabaseExplicit B Implicit B Default No Collation
No Collation Explicit B No Collation No Collation No Collation
SQL Server Collation Support (cont’d)Precedence example
This query will return the following error:
Msg 446, Level 16, State 9, Server cpsqlintl3, Line 1Cannot resolve collation conflict for equal to operation
This query will work properly and return one row:
CREATE TABLE TestTab (id int,
GreekCol nvarchar(10) collate greek_ci_as,
LatinCol nvarchar(10) collate latin1_general_cs_as)
INSERT TestTab VALUES (1, N'A', N'a')
SELECT * FROM TestTab
WHERE GreekCol = LatinCol
SELECT * FROM TestTab
WHERE GreekCol = LatinCol COLLATE greek_ci_as
SQL Server Collation Support (cont’d)Full Text Search
Allows for word or phrase-based indexing of character data
Full-text indexing enables the creation and population of the full-text catalogs
Maintained outside of SQL Server
Managed by the Microsoft Search service
Full-text search uses new Transact-SQL predicates to query these populated full-text catalogs
CONTAINS
CONTAINSTABLE
FREETEXT
FREETEXTTABLE
With a full-text query, you can perform:
A linguistic search of character data in tables enabled for full-text search
A linguistic search that operates on words and phrases unlike the LIKE predicate which is used to search character patterns
SQL Server Collation Support (cont’d)Full Text Search enhancements
Accent sensitivity
New word-breakers are accent sensitive
Usage of LCID
Specify the LCID on individual queries
Choose the correct language resources in a query
Useful against both XML and BLOB data
Number of supported MS Search word breakers is increasing all the time
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
Changing CollationsUnicode fields
No data loss – just an order change....
Gotchas when moving between old and new collations
Changing Collations (cont’d)Non-Unicode fields
Nearly guaranteed data loss in most cases
Many caveats – only 100% safe prior to data being added
Many safeguards to prevent data loss
Changing Collations (cont’d) Updates to tools and technologies
BCP
DTS (Data Transformation Services)
SQLCMD
XML Support
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
Collation Pitfalls to AvoidAvoid accent-insensitive collations
U+ac00 (가 , Hangul Syllable Kiyeok A) under Korean_Wansung_CI_AI or _CS_AI is equal to all of the following characters:
伽佳假價加可呵哥嘉嫁家暇架枷柯歌珂痂稼苛茄街袈訶賈跏軻迦駕仮傢咖哿坷宊斝榎檟珈笳耞舸葭謌
All of these Hanja have the same pronunciation as U+ac00, and differ only by that secondary “diacritic” weight
User perception of any operation for sorting/searching that treats them as equal – BROKEN APPLICATION
Other collations have similar problems
Thai, Indic languages, others
Workaround – search with AI but sort by AS (closest matches will be on top)
Collation Pitfalls to Avoid (cont’d)Be careful with wildcard searches
Take two queries:
Using Thai_CS_AS collation,Query #1 returns true, query #2 returns false!
In Thai, U+0e40 U+0e21 is a unique compression
Has its own sort weight
Neither U+0e40 nor U+0e21 can be found individually when they are together in a text field
Workaround: use a Binary collation
if N'เหม' like N'%ม%' select 'true' else select 'false'if N'เม' like N'%ม%' select 'true' else select 'false‘
Collation Pitfalls to Avoid (cont’d)Binary collations have no language
All *_BIN and *_BIN2 collations are equal for Unicode data
For non-Unicode data, they control code page to use and nothing else
Collation Pitfalls to Avoid (cont’d)Binary collations, SQL Server vs. the CLR
*_BIN == Ordinal
*_BIN2 == <nothing>
<nothing> == OrdinalIgnoreCase
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
Other Language Support IssuesOverview
Culturally appropriate behavior
Language of error messages
Date/Time
Month/day names and abbreviated month names
First day of week
Currency and currency symbols
Other Language Support Issues (cont’d)Setting the language
Session language syslanguages
Set by ‘set language’
SET LANGUAGE French
Set by connection attribute
OLE DB – SSPROP_INIT_CURRENTLANGUAGE
ODBC – LANGUAGE keyword
.NET Data Provider – language keyword in ConnectionString
Set by user record in syslogins
sp_configure "language", 2
sp_helplanguage
Other Language Support Issues (cont’d)Date/Time support
DATETIME
1 Jan 1753 to 31 Dec 9999
1/300 of a second precision (3.33 milliseconds)
Stored as 2x 4-byte integers
SMALLDATETIME
1 Jan 1900 to 6 Jun 2079
1 minute precision (29.998 seconds round down)
Stored as 2x 2-byte integers
No name strings in date/time
Use Convert() Function to format date/time values
SELECT CONVERT(char, GETDATE(), 104) AS [104]
21.11.2002
UDT support for Date/Time
Custom calendars!
Other Language Support Issues (cont’d)SQL Server vs. the CLR
Updated collations that match the CLR
Look for the _90 suffix on the name
SQL Server locale semantics:
One setting for UI and formatting
Another setting for collation/encoding
.NET/Windows semantics
One setting for UI
Another setting for formatting/collation
Solution
Special GetCultureInfo override that takes two CultureInfo names for the two SQL Server settings
Other Language Support Issues (cont’d)SQL Server vs. the CLR, continued
Microsoft.ReportingServices.Diagnostics.Localization
CatalogCulture
ClientPrimaryCulture
DefaultReportServerCulture
FallbackUICulture
InstalledCultureNames
ReportParameterCulture
SqlCulture
Other Language Support Issues (cont’d)Documentation enhancements
Books Online
International Considerations for SQL Server
Windows Server System Books
Planning
Upgrading from SQL Server 2000 to 2005
Migrating your Applications from SQL Server 2000 to 2005
Planning Guide
Collation Compatibility (collation/language/SQL version)
Collation Support
Collation Performance Considerations
Deployment
Deploying SQL Server 2005 in an International Environment
Mixed Collation Environments
Windows versus SQL Collations
Locale versus Country versus Region
Agenda
Using Unicode text fields
Handling non-Unicode data
SQL Server collation support
Changing Collations
Collation pitfalls to avoid
Other language support issues
Resources
Resources
International Features in Microsoft SQL Server 2000: http://msdn.microsoft.com/library/en-us/dnsql2k/html/intlfeaturesinsqlserver2000.asp
Arabic Language Support in MS SQL Server 2000: http://msdn.microsoft.com/library/en-us/dnsql2k/html/sql_arabicsupport.asp
Preserving Client-Server Data Integrity with Unicode and Microsoft SQL Server 2000:http://msdn.microsoft.com/library/en-us/dnsql2k/html/sql_dataencoding.asp
SQL Server Books On Line
Sorting It All Out: http://blogs.msdn.com/michkap
Global Development: http://microsoft.com/globaldev
Ask The ExpertsGet Your Questions Answered
You can find me at the Microsoft Ask the Experts area, located in the Exhibition Hall:
Tuesday 5 July 13:30 – 15:00Thursday 7 July 13:30 – 15:00
Community ResourcesMicrosoft Community Resourceshttp://www.microsoft.com/communities/default.mspx
Non-Microsoft Community Resourceshttp://www.microsoft.com/communities/related/default.mspx
NewsgroupsConverse online with Microsoft Newsgroups, including Worldwidehttp://www.microsoft.com/communities/newsgroups/default.mspx
User GroupsMeet and learn with your peershttp://www.microsoft.com/communities/usergroups/default.mspx
Attend a free chathttp://www.microsoft.com/communities/chats/default.mspx
Attend a free web casthttp://www.microsoft.com/usa/webcasts/default.asp
Most Valuable Professional (MVP)http://mvp.support.microsoft.com/
Microsoft Learning Resources:
Click here to access free Microsoft Learning Assessments http://www.microsoft.com/learning/assessment/ind/default.asp
To access free Microsoft Learning E-Learning visit http://www.microsoft.com/learning/access and reference the promotional code 9185-TECHED-6650.
Special offers on Microsoft Certificationfrom Microsoft Learning
Click here to take advantage of these special offers: http://www.microsoft.com/learning/mcp/
Get ready for Microsoft Visual Studio 2005 and Microsoft SQL Server 2005 with free Assessments and E-Learning, and a chance to win a laptop, GPS, and more
1 2 3
Self Support• Knowledge Base
• Microsoft Newsgroups
• Support Webcasts
Assisted Support
Email Support
Phone Support
Advisory Services
Remotely delivered, hourly fee–based, consultative support option
Contract-based Support
Microsoft Premier Support
Provides premium level support and a designated Account Manager
Microsoft Essential Support
Prepackaged support options including phone and online access to a pool of skilled Technical Account Specialists
Microsoft Partner Advantage
Partner Advantage provides technology partners the best response times and highest level of problem resolution support
Free of charge support Fee based support Managed support
The Microsoft Guide to Support Options
Resources:Microsoft Support Website: http://support.microsoft.com Microsoft TechNet program: http://technet.microsoft.comMicrosoft MSDN program: http://msdn.micrososft.com
© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only.MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.