55
9 Chapter Nine Extracting and Transforming Data with SQL Server 2000

Chapter Nine

  • Upload
    buzz

  • View
    23

  • Download
    0

Embed Size (px)

DESCRIPTION

Chapter Nine. Extracting and Transforming Data with SQL Server 2000. Objectives. Understand the option in SQL Server 2000 for inserting from and extracting to external data sources - PowerPoint PPT Presentation

Citation preview

Page 1: Chapter Nine

9Chapter NineChapter Nine

Extracting and Transforming Data with SQL Server 2000

Page 2: Chapter Nine

9 ObjectivesObjectives

• Understand the option in SQL Server 2000 for inserting from and extracting to external data sources

• Use the BULK INSERT statement and the BCP utility in SQL Server 2000 to move records into and out of tables and views

• Use the Data Transformation Services (DTS) interface to import and export data

Page 3: Chapter Nine

9 ObjectivesObjectives

• Create and save DTS packages to perform transformations and add workflow to packages

• Identify and implement an appropriate strategy for replicating data among instances of SQL Server 2000

• Understand and configure SQL Server 2000 XML support

Page 4: Chapter Nine

9 The BULK INSERT Statement

The BULK INSERT Statement

• BULK INSERT statement– Used in SQL Server 2000 to insert data into a

table or view from a data file• Architecture

– The OLE DB provider in SQL Server 2000 interprets the BULK INSERT statement natively• As a result, it is possible to use the bulk copy API

from ODBC, OLE DB, SQL-DMO and DB-Library-based applications

Page 5: Chapter Nine

9 The BULK INSERT Statement

The BULK INSERT Statement

• The BULK INSERT statement uses a simple syntax for inserting data from a file into a table– The destination table name, qualified by the

database name, and the source data file are the only two parameters that must be specified for each BULK INSERT operation to succeed

Page 6: Chapter Nine

9 BULK INSERT Statement Usage Scenarios

BULK INSERT Statement Usage Scenarios

• The BULK INSERT statement is a high-performance way to insert data from a file into SQL Server 2000 using a standard T-SQL statement– Best used when the data to be imported exist in a tabular,

non-hierarchical format, such as a comma-delimited file– Can be used within a stored procedure as part of a data

feed and can be initiated by scheduling a job that executes the stored procedure

– Can be used if an event or program external to SQL Server 2000 initiates the data import

Page 7: Chapter Nine

9 BULK INSERT Statement Performance

BULK INSERT Statement Performance

• BULK INSERT statement and BCP are two high-performance ways to insert data into database tables in SQL Server 2000– There is an overhead associated with logging the

insert of records using these methods– SQL Server 2000 incurs this overhead to ensure

that records inserted can be rolled back and rolled forward

Page 8: Chapter Nine

9 BULK INSERT Statement Performance

BULK INSERT Statement Performance

• To optimize the performance of a bulk insert, it is possible to cause SQL Server 2000 not to log the inserts of records– To do this, ensure the following are true:

• Database option “select into/bulkcopy” is set to true• Target table has no indexes or if the table has indexes, it

is empty when the bulk copy starts• Target table is not being replicated• TABLOCK hint is specified

Page 9: Chapter Nine

9 Bulk Copy ProgramBulk Copy Program

• Bulk Copy Program– Used to move data into and out of SQL Server

2000• BCP syntax

– BCP utility requires a slightly different syntax than the BULK INSERT statement but uses similar parameters• A key difference in syntax is that it is necessary to

specify user name, password, database name and server name when using the BCP utility

Page 10: Chapter Nine

9 BCP in ActionBCP in Action

Figure 9-1: The BCP command to export order data

Page 11: Chapter Nine

9 BCP Usage ScenariosBCP Usage Scenarios

• The same requirements that lead to the selection of the BULK INSERT statement for inserting large sets of data into SQL Server 2000 can be used to select the BCP utility– In addition, you should select the BCP utility when

you need to export data from a SQL Server table into a data file and extensive transformations are not necessary

Page 12: Chapter Nine

9 BCP PerformanceBCP Performance

• Like BULK INSERT statement, BCP utility is a high-performance, batch-oriented mechanism for transferring large sets of records– Because both implement the Bulk Copy API, the

performance of BCP will not differ much from the performance of the BULK INSERT statement

Page 13: Chapter Nine

9 Data Transformation Services

Data Transformation Services

• Data Transformation Services (DTS)– Set of features bundled together for use in

Enterprise Manager• Interface

– SQL Server 2000 provides an intuitive, easy-to-use interface to the Data Transformation Services in the format of the Data Transformation Services Import/Export Wizard

Page 14: Chapter Nine

9 InterfaceInterface

Figure 9-2: Invoking the DTS Wizard from Enterprise Manager

Page 15: Chapter Nine

9 InterfaceInterface

Figure 9-3: Invoking the DTS Wizard from the program group

Page 16: Chapter Nine

9 Executing a Transformation Using DTS

Executing a Transformation Using DTS

• The first two steps when executing a transformation using DTS are to correctly specify the source and target data repositories– Subsequently, it is necessary to specify several

options about the data transfer and customer transformations• At a minimum, the user executing a transformation needs

to select one or more tables or views to copy from the source to the target data repositories

Page 17: Chapter Nine

9 Choosing a Data Source/DestinationChoosing a Data

Source/Destination• It is necessary to identify source and target data

repositories involved in the data transfer/transformation after invoking DTS Import/Export Wizard

• To identify each repository, specify the following:– Type of data held in each repository– Physical location of the repository– Any information required to gain access to the repository

Page 18: Chapter Nine

9 Choosing a Data Source/DestinationChoosing a Data

Source/Destination

Figure 9-4: The Choose a Data Source dialog box in the DTS Wizard

Page 19: Chapter Nine

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-5: The Specify Table Copy or Query dialog box

Page 20: Chapter Nine

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-6: The Select Source Tables and Views dialog box

Page 21: Chapter Nine

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-7: The Type SQL Statement dialog box

Page 22: Chapter Nine

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-8: Using Query Builder to create SQL statements

Page 23: Chapter Nine

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-9: Column Mappings tab in the Column Mappings and Transformation dialog box, which allows customization of destination column mappings

Page 24: Chapter Nine

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-10: Transformation tab in the Column Mappings and Transformations dialog box, which allows custom transformations to the source data

Page 25: Chapter Nine

9 Copying Objects and Data between SQL Server Databases

Copying Objects and Data between SQL Server Databases

Figure 9-11: The Select Objects to Copy dialog box

Page 26: Chapter Nine

9 Saving DTS PackagesSaving DTS Packages

Figure 9-12: The Save, schedule and replicate package dialog box

Page 27: Chapter Nine

9 Saving DTS PackagesSaving DTS Packages

• SQL Server– With this default save option, you can store a package as a table

in the SQL Server msdb database• SQL Server Meta Data Services

– With this save option, you can maintain historical information about the data manipulated by the package

• Structured Storage File– With this save option, you can copy, move and send a package

across the network without storing the file a SQL Server database• Visual Basic File

– With this save option, you can programmatically customize a package created in DTS Designer or DTS Import/Export Wizard

Page 28: Chapter Nine

9 Editing DTS PackagesEditing DTS Packages

Figure 9-13: Invoking the DTS Designer from Enterprise Manager

Page 29: Chapter Nine

9 Editing DTS PackagesEditing DTS Packages

Figure 9-14: The DTS Designer interface

Page 30: Chapter Nine

9 ReplicationReplication

• Replication– Process of synchronizing copies of data stored in

multiple sites

• Uses of replication:– Allowing multiple locations to keep copies of the same

data– Separating transaction processing applications from

analytical processing applications– Allowing greater autonomy

Page 31: Chapter Nine

9 ReplicationReplication

• Uses of replication (cont.):– Creating a scalable application that is based on

strategically locating data that will be accessed– Bringing data closer to individuals or groups– Using replication as part of a customized

standby server strategy

Page 32: Chapter Nine

9 Replication OverviewReplication Overview

• Publisher– Server that makes data available for replication to

other servers

• Subscribers– Servers that receive replicated data

• Distributor– Server that contains the distribution database and

stores meta-data, history data and/or transactions

Page 33: Chapter Nine

9 Snapshot ReplicationSnapshot Replication

• Snapshot replication– Periodically

copies entire publications as they exist at a moment in time

Figure 9-15: The components involved in snapshot replication

Page 34: Chapter Nine

9 Usage Scenarios for Snapshot ReplicationUsage Scenarios for Snapshot Replication

• Snapshot replication is the best choice when the following are true:– Most of the data are static and do not change often– It is acceptable to have copies of data that are out

of data for some period of time– The total size of data publications to replicate is

small– The sites are often disconnected, eliminating the

possibility to send constant updates of data as in transactional replication

Page 35: Chapter Nine

9 Transactional ReplicationTransactional Replication

• Transactional replication– Incremental

changes made on the Publisher are sent to each Subscriber

Figure 9-16: The components involved in transactional replication

Page 36: Chapter Nine

9 Usage Scenarios for Transactional Replication

Usage Scenarios for Transactional Replication

• Transactional replication is an excellent choice when:– Data modification should be propagated to Subscribers

within seconds of when they occur– Transactions should be atomic– Subscribers are connected to the Publisher most of the

time– The application cannot tolerate out-of-date data on the

subscribers that receive the changes

Page 37: Chapter Nine

9 Merge ReplicationMerge Replication

• Merge replication– Data

modifications can be made at each site

Figure 9-17: The components involved in merge replication

Page 38: Chapter Nine

9 Usage Scenarios for Merge Replication

Usage Scenarios for Merge Replication

• Merge replication is the best selection when:– Multiple Subscribers need to update data at

various times and propagate those changes to the Publisher and to other Subscribers

– Subscribers need to receive data, make changes offline and synchronize changes later with the Publisher and other Subscribers

– The ability for each site to operate independently is critical

Page 39: Chapter Nine

9 Replication ToolsReplication Tools

• SQL Server 2000 provides several easy-to-use tools for establishing and configuring replication– Each of these tools is available through Enterprise

Manager– Most take the form of wizards that allow the

definition of Publishers, Distributors, Subscribers, subscriptions and their properties

Page 40: Chapter Nine

9 Replication WizardsReplication Wizards

• Configure Publishing and Distribution Wizard– Typically the first wizard that is invoked to establish replication

• Create Publication Wizard– Allows creation of publication

• Push Subscription Wizard– Allows the creation of a subscription to an existing publication

• Pull Subscription Wizard– Allows the creation of a subscription to a publication requested by

the Subscriber on the SQL Server specified in this wizard• Disable Publishing and Distribution Wizard

– Allows removal of replication publications and distribution service from the SQL Server machine

Page 41: Chapter Nine

9 Filtering Published DataFiltering Published Data

• By filtering published data, it is possible to:– Minimize the amount of data sent over the network– Reduce the amount of storage space required at

the Subscriber– Customize publications and applications based on

individual Subscriber requirements– Avoid or reduce conflicts that could result if the

same data were sent multiple Subscribers

Page 42: Chapter Nine

9 Replication Performance Optimization

Replication Performance Optimization

• There are several application-specific and network-specific techniques for optimizing the performance of replication– Simple way is to set minimum amount of memory allocated

to SQL Server 2000 by using the Min Server Memory option– Using a separate disk drive for the transaction log on any

server involved in replication will help to avoid the disk I/O bottleneck that can result from logging many operations

– Ensuring that the data that is replicated is then filtered as much as possible will also help to avoid unnecessary performance problems

Page 43: Chapter Nine

9 Heterogeneous QueriesHeterogeneous Queries

• A heterogeneous query retrieves, inserts or updates data by joining together sets of data from multiple data source types– It is necessary to initiate a heterogeneous query from a

SQL Server database that is linked to another data source

– A linked server configuration allows remote server access and allows access of non-SQL Server data sources to be treated the same way as the access of SQL Server data sources

Page 44: Chapter Nine

9 Configuring a Linked Server

Configuring a Linked Server

• To configure a linked server definition, a SQL Server administrator specifies an OLE DB provider and OLE DB data source

Table 9-1: OLE DB Providers and Data Sources

Page 45: Chapter Nine

9 Configuring a Linked Server

Configuring a Linked Server

Figure 9-18: Linked server configuration

Page 46: Chapter Nine

9 Managing a Linked Server Definition

Managing a Linked Server Definition

• When setting up a linked server, register connection information and data source with SQL Server 2000– After registration is accomplished, it is possible to refer to

the data source with a single logical name

• You can create or delete a linked server definition with stored procedures or through Enterprise Manager– It it possible to use stored procedures to manage a linked

server definition

Page 47: Chapter Nine

9 ArgumentsArguments

• Table 9-2 describes data returned by sp_linkedservers system-stored procedure

Table 9-2: Returned Data

Page 48: Chapter Nine

9 XML Support in SQL Server 2000

XML Support in SQL Server 2000

• Standard data format eXtensible Markup Language (XML) is a format for marking up data with extensible tags that can be used by applications to determine the purpose of the data– In SQL Server 2000, it is possible to make a request for relational

data in the form of XML and have the SQL OLE DB provider in SQL Server construct an XML document from the resulting data

– It is possible to insert, update or delete data in SQL Server using XML syntax to describe the data that should be inserted, updated or deleted

– Much of this functionality is available through stored procedures and general T-SQL statements that are executed against SQL Server 2000

Page 49: Chapter Nine

9 XML Support in SQL Server 2000

XML Support in SQL Server 2000

• There are features that can be used directly from a browser and do not require writing T-SQL code, including:– Executing a T-SQL Query in the URL window of a browser to

retrieve the results as XML– Executing a query against a database object using the

Xpath query syntax– Defining an XML template with a T-SQL statement that can

be executed to return a set of data in the form of XML– Defining an XML View that uses XML Data Reduced (XDR)

syntax instead of T-SQL syntax

Page 50: Chapter Nine

9 Configuring Internet Information Services for XML Support

Configuring Internet Information Services for XML Support

Figure 9-19: The MMC snap-in to configure SQL SML support in IIS

Page 51: Chapter Nine

9 Virtual DirectoriesVirtual Directories

• sqlisapi.dll– Implementation of the Internet Services API (ISAPI) filter that

provides SQL-specific functionality to Internet Information Services (IIS)

• To allow IIS to differentiate between request files that exist in conventional virtual directories and XML requests, it is necessary to create SQL virtual directories

Page 52: Chapter Nine

9 Virtual DirectoriesVirtual Directories

Figure 9-20: The settings tab of the Demos Properties dialog box

Page 53: Chapter Nine

9 Virtual NamesVirtual Names

• Virtual names– Sub-namespace

in IIS whose HTTP requests are forwarded to SQL Server 2000 as a specific type of request

Figure 9-21: The Virtual Names tab of the Demos Properties dialog box

Page 54: Chapter Nine

9 Virtual NamesVirtual Names

• Dbobject– Allows access to objects in underlying database of the

Allow XPath option is selected for the virtual directory

• Schema– Allows access to data in the underlying database

without the use of a T-SQL statement

• Template– Allows access to an XML file that acts as a template if

the “Allow template queries” option is selected for the virtual directory

Page 55: Chapter Nine

9 Chapter SummaryChapter Summary

• There are a variety of data formats and methods for moving data into and out of SQL Server 2000

• For one-time importing and exporting of data, the bulk insert task and BCP utility can be executed using T-SQL statements and Windows command-line statements

• To apply user-defined transformations to data imports and exports, DTS packages can be managed in Enterprise Manager

• If it is necessary to retrieve or to insert data in XML format, the FOR XML clause can be used in T-SQL statements