
Simple ETL Solution - Marco Kiesewetter



SIMPLE ETL SOLUTION

Extracting, Transforming, Loading Data From Any System To SQL Server Right From Your Desktop.


OVERVIEW

Why use manual ETL from your desktop?

A common issue to deal with

Example: Salesforce to SQL - a simple way to load fresh data

Recap: What is ETL?

Extract data from an outside source, e.g. Salesforce.com

Transform data to fit operational needs

Load data into a data storage system such as a SQL database, data mart or data warehouse


WHY USE MANUAL ETL

Business Analysts, Financial Analysts and BI Developers run into two common situations where a manual desktop upload is needed:

The data does not yet exist in SQL

Often it is helpful to get a sample dataset into SQL to test and justify a new data source.

Development using this new data source can begin immediately, even before the automated ETL job is set up by your IT department.

The data needs to be refreshed off-schedule

Many ETL jobs run once or twice a day. If an extra refresh is needed in between, a manual upload of ‘fresh’ data can be the quickest way.


A COMMON ISSUE TO DEAL WITH

One of the most common issues is the formatting of the raw data:

Field delimiters & row delimiters may not be standard

The use of quotation marks and commas in text fields can cause delimiter recognition to fail

SQL Server only supports CSV upload if the data is in a specific format

Salesforce row delimiters are not recognized

Quotes only work if all fields in a column are enclosed in quotes
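
For illustration, here is a hypothetical two-row extract (the column names are invented) that shows the problem:

Id,Account Name,Comment
001,Acme,called on Monday
002,"Beta, Inc.","asked for a ""quick"" quote, will follow up"

Splitting the second row on commas gives the wrong number of columns unless the parser honors the quotes, and because the first row's fields are unquoted, the quotes cannot simply be treated as part of the delimiter either.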


SALESFORCE TO SQL - A SIMPLE WAY TO LOAD FRESH DATA

Example


AUTOMATION & SIMPLICITY

Requirements:

Automation – very little manual effort

Simplicity – anyone can run the update


STEP 1: THE EXTRACT FILE

Download the Salesforce report results you want to upload

Download as CSV

Save in an accessible network path

The SQL server has to be able to access it.

Use a filename that does not change

Avoid dates etc. in the file name.

Scripts will access this filename in this folder. For an update, simply overwrite this file.
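
For reference, the scripts on the following slides assume the extract is saved under a fixed placeholder path such as:

\\server\folder\salesforceExtract.csv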


STEP 2: STANDARDIZE THE FILE

Now we use Windows PowerShell to prepare the CSV for upload.

Often we will have commas in comments or other text fields. We should change the field delimiter to a different symbol. The pipe “|” is a good option.

First we need to change all existing pipes to something else in order to make the pipe symbol unique as a field delimiter.

In the example below I simply remove the pipes by replacing them with an empty string; they could be replaced with anything else, though.

# Define the file but note that I do not add ".csv"
$csvfile = '\\server\folder\salesforceExtract'

# Now we replace all pipes
Get-Content ($csvfile + ".csv") | % { $_ -replace "\|", "" } | Out-File ($csvfile + " (no Pipes).csv") -Force -Encoding ascii
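
As an optional sanity check (not part of the original slides), you can confirm that no pipes remain before changing the delimiter:

# Should return no matches once all pipes have been removed
Select-String -Path ($csvfile + " (no Pipes).csv") -Pattern '\|'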


Next we will change the delimiter and standardize the CSV file:

# Make a standard CSV but use pipes as the delimiter
# -NoTypeInformation stops Windows PowerShell's Export-Csv from writing a "#TYPE ..." line as the first row
Import-Csv -Path ($csvfile + " (no Pipes).csv") -Delimiter ',' | Export-Csv -Path ($csvfile + " (standardized).csv") -Delimiter '|' -NoTypeInformation

The file salesforceExtract (standardized).csv now has all fields in quotes. Since we do not use commas as delimiters and all pipes were removed from all text fields, we can safely remove all quotes from the file:

# Remove all quotes (")
Get-Content ($csvfile + " (standardized).csv") | % { $_ -replace '"', "" } | Set-Content ($csvfile + " (upload).csv")
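
Taken together, the PowerShell snippets above can be saved as one script file. A minimal sketch, assuming the same placeholder path, which would serve as the “My PowerShell Script.ps1” referenced in Step 4:

# Standardize the Salesforce extract for BULK INSERT (placeholder path, note there is no ".csv" suffix here)
$csvfile = '\\server\folder\salesforceExtract'

# 1. Remove all pipes so "|" can later be used as a unique field delimiter
Get-Content ($csvfile + ".csv") | % { $_ -replace "\|", "" } | Out-File ($csvfile + " (no Pipes).csv") -Force -Encoding ascii

# 2. Re-export the data as a pipe-delimited CSV
Import-Csv -Path ($csvfile + " (no Pipes).csv") -Delimiter ',' | Export-Csv -Path ($csvfile + " (standardized).csv") -Delimiter '|' -NoTypeInformation

# 3. Strip the quotes that Export-Csv places around every field
Get-Content ($csvfile + " (standardized).csv") | % { $_ -replace '"', "" } | Set-Content ($csvfile + " (upload).csv")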


STEP 3: UPLOAD TO SQL

Finally, we upload this prepared CSV to SQL Server using Microsoft SQL Server Management Studio

For this we use the BULK INSERT command

In most cases we will upload a completely new data set. The easiest way to handle this is to drop the old table and re-create it.

Another advantage of doing this is the ease with which new fields can be added or the data types of fields can be changed. Re-creating the table allows for any such adjustments.

Note that BULK INSERT is governed by a server-level permission and may need to be enabled for your login.
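
If the permission is missing, a DBA can grant it. A minimal sketch, with a placeholder login name, of two common ways to do this:

-- Grant the server-level permission required for BULK INSERT
GRANT ADMINISTER BULK OPERATIONS TO [YourLogin];
-- Or add the login to the bulkadmin fixed server role
ALTER SERVER ROLE bulkadmin ADD MEMBER [YourLogin];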


Here is an example SQL script you can adjust to fit your needs:

use YourDB
go

-- Re-create the target table so field changes are easy to apply
drop table [dbo].[YourTable]
go
create table [dbo].[YourTable](
    Field1 nvarchar(255) null,
    Field2 datetime null,
    Field3 float null
)
go

-- Load the standardized, pipe-delimited file; firstrow = 2 skips the header row
bulk insert YourTable
from '\\server\folder\salesforceExtract (upload).csv'
with (
    fieldterminator = '|',
    rowterminator = '\n',
    firstrow = 2
)
go
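
A quick way to confirm the load (not shown in the original deck) is a simple row count against the re-created table:

select count(*) as LoadedRows from [dbo].[YourTable]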


STEP 4: PUTTING IT ALL TOGETHER

Put all the PowerShell commands into a text file with the extension .ps1

Put the SQL script into a text file with the extension .sql

Now create a batch script (example below, file extension .bat) that runs all of the above commands, and place everything in the same folder in which you save your extracted CSV from Salesforce.

@echo off
cls
echo Standardizing the CSV file...
powershell.exe -noprofile -ExecutionPolicy ByPass -File "My PowerShell Script.ps1"
echo.
echo Is SQL Server Management Studio running and logged in to the server?
pause
echo.
echo Loading the SQL Query "My SQL Script.sql"
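
The slide stops at the last echo, so the line that actually hands the script to SQL Server Management Studio is not shown. One way to finish the batch file, assuming .sql files are associated with Management Studio, is:

rem Open "My SQL Script.sql" in the associated editor (typically SSMS)
start "" "My SQL Script.sql"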


SIMPLE ETL

Your automated ETL solution is ready.

All you have to do now is save the report results under the same file name, run the batch script and hit “Execute” in SQL Server Management Studio once the script has loaded.


THANK YOU Questions?

Reach out:

https://www.linkedin.com/in/marcokiesewetter