Data Done Right

Preview:

DESCRIPTION

You probably already know that managing data in Salesforce can be a formidable task. But you might not know that it doesn't have to be! In this session, we'll focus on strategies to help you with key data tasks such as data migration, managing large data volumes, org merges, and data consolidations.

Citation preview

Data Done RightAdministrators

Brian Wiebe: Technical Engagement Manager, salesforce.comEzra Kenigsberg: Data Architect, salesforce.com

Safe HarborSafe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.

The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2010. This documents and others are available on the SEC Filings section of the Investor Information section of our Web site.

Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

TinyURL.com/SalesforceSafeHarbor

Purpose

To walk through three big data issues that can help

make you an even-better administrator.

This Session… Other Sessions…

Practical demos—things you

can do TODAY

Required– Data Loader

– Microsoft Excel

– A decent text editor

(I use Notepad++)

Optional– Cloud Converter

– Synchronizer

(requires Microsoft Access)

Bigger-picture data strategy

Professional third-party tools

Overview

1. Introduction

5 min

2. Moving Data

15 min

3. Cleaning Data

15 min

4. Working with Large Data Volumes

15 min

5. Q&A

until they kick us out

Prior to making any major changes to your org:

BACK UP!

Ezra Kenigsberg

salesforce.com

Moving Data

The Scenario

The scenario we’re walking through:

Gotta import new records by tomorrow

We’re creating a repeatable, documented process– “Just load it” fails the hit-by-a-bus test…

• …is difficult to audit after the fact

• …may not be reversible if I’ve made a mistake

Links and Tools

Useful links:

developer.force.com/consultants

EzraKenigsberg.com

Our tools:

Required– Data Loader

– Microsoft Excel

– A decent text editor (I use Notepad++)

Optional– Cloud Converter

– Synchronizer (requires Microsoft Access)

Useful Links: developer.force.com/consultants

Dedicated pages for• Data Migration• Large Data Volumes• many others

Useful Links: EzraKenigsberg.com

Dedicated sections for• Handy Tools• Reference Links• Presentations• Requests for Salesforce & Data Loader Improvements

Three Key Steps

Three steps:

1. How should I map my data?

2. How can I automate the generating of CSVs?

3. How can I load data in an auditable way?

HOW do I

generate? load?map?

HOW do I

generate? load?map?

Moving Data #1: How Do I Map?

Create a mapping file:

1. Create list of source fields in source file/s

2. Create list of API field names (not the UI labels!)– Get them with Data Loader or Cloud Converter

3. Match source fields to API field names

HOW do I

map?

Grab the Source Field Names

Transposed in Excel

Excel 2003:Edit | Paste Special | Transpose

Excel 2007 or 2010:Home | Paste | Paste Special | Transpose

1

2

HOW do I

map?

Grab the API Field Names (slide 1 of 2)

2

Model Metrics’ free utility “Cloud Converter” is a straightforward

way to export metadata

1

HOW do I

map?

Grab the API Field Names (slide 2 of 2)

2

1

...and a row for

every field

“Cloud Converter” exports a tab for every object...

HOW do I

map?

Build Out Mapping File

HOW do I

map?

Index Column

An index columnfacilitates latersorting

HOW do I

map?

A Column of Arrows

Arrows remind people of the DIRECTION of data

HOW do I

map?

Not Every Field Has to Map

Not every column has to map

HOW do I

map?

Keep it Simple!

The most common cause of bad data maps?

Too much stuff!

Put in only the columns you’ll UPDATE and USE

HOW do I

map?

Moving Data #2: How Do I Generate CSVs?

Import legacy data into a tool that enables REUSE– Raw data comes in, ready-to-load data comes out

– Minimize manual steps. Tools I’m not so thrilled with:• Microsoft Excel

• Import Wizard

– Each subsequent load becomes a straightforward process• Test-then-Production

• Follow-up loads

Which tool should I use?– Good question! Can you wait eight slides?

HOW do I

generate?

Moving Data #3: How Do I Load?

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Folder Naming

YYYY-MM-DD#NNforces folders

to sort inchronological

order

HOW do I

load?

Keep All Files Together

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Folder Contents (1 of 3)

Save source file, success files, and error files all in same directory

HOW do I

load?

Folder Contents (2 of 3)

When loading errors:• Copy error file and• Use as source for next load

1

2

HOW do I

load?

Folder Contents (3 of 3)HOW do I

load?

Dummy-Proof the Auto-Match

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

Perfect Auto-Match

With a file created this way, Auto-Match gets every field mapped

HOW do I

load?

The Smarter Way to Load Data: Upsert!

Loading Best Practices:

1. Group work by folder

2. Keep all files together

3. Auto-Match

4. Upsert!

HOW do I

load?

export UPSERT UPSERT

Why UPSERT and EXTERNAL IDs are Great

With Upsert & External IDs

export map update insert

map map update insert

a

c

a

c

a

c

a

c

a

c

cc

a

c

a

c

aa

c

Without Upsert & External IDs

a

c

a

HOW do I

generate? load?map?

Synchronizer is an OPEN SOURCE application that is NOT

SUPPORTED by Salesforce

Synchronizer has functions that facilitate all these tasks:– Mapping

– Generating CSVs

– Loading data

HOW do I

Synchronize?

How Synchronizer Automates All This

Synchronizer Walkthrough

In this walkthough, we’ll:

1. Import legacy data into Synchronizer

2. Map the data

3. Document the data map

4. Migrate the data into Salesforce

5. Review the files created

HOW do I

Synchronize?

Synchronizer Review: Import and Map

One-button CSV importer (not perfect, but fast and simple):

Grab API fields, then tie legacy fields to API fields:

HOW do I

Synchronize?

Synchronizer Review: Create Data Map

One-click data map generator:

HOW do I

Synchronize?

Synchronizer Review: Migrate

One-screen UI for loading data

HOW do I

Synchronize?

Synchronizer Review: Organize Files

Folder and file discipline:

Reimporting of success and error files for use in future loads

HOW do I

Synchronize?

Synchronizer Review: Other Goodies

Some other Synchronizer functions not covered:

Ability to run multiple steps in sequence

Scheduling

Mass-create tasks– Creating Users

– Assigning Users to Groups

Custom reports– Storage usage by User, by object

HOW do I

Synchronize?

Best Practices

1. Build a mapping file

2. Leverage a tool to generate CSVs

3. Use loading Best Practices

Get Synchronizer and help make it better!

HOW do I

generate? load?map?

Brian Wiebe

salesforce.com

Cleaning Data

43

What is Data Quality?Combination of Processes, Policies and Tools

Involves Governance, Enforcement, Prevention Goal is not perfection

What are the typical Issues?Duplicates (Account, Contact), Incomplete information,

Stale or Untouched data, Inconsistent values, Incorrect linkages

What are the typical causes?Not part of Budget, Unmeasurable problem

No Action Plan, No Ownership, Lack of Training, Non-optimized salesforce.com

Key Data Quality ConceptsDefining a Broad Topic

The Full Data Quality Lifecycle

AssessAssess

Cle

anse

•Train users•Enforce processes •Monitor on-going

quality

Data Protect

Protect

Our 3-step, iterative process quickly identifies problems, fixes them and helps you maintain high data quality over time

Data Cleanse

Data QualityAssessment•Profile data•Analyze results• Identify problems

and next steps

•Standardize & Cleanse

•Supplement & Enrich

•Test & Load

Data Quality Assessment

Data Quality Assessment

Project Planning

Strong Sponsorship– Committed Involvement & Availability (The DQ Assessment helps justify this)– Appreciation, Awareness & Understanding of Data complexities

Limiting Scope / Phased Approach– ACHIEVABLE Goals– Define critical quick-win items for Phase 1 (focus on biggest issues for end-users)

Test, Test, Test– Leverage your Sandbox Environment– Data Quality cleansing is a “destructive” process

Plan for End-user involvement– Data Quality is an iterative process – and MUST involve end-user buy-in and input

If the foundation is off..

Begin Governance & Stewardship– Involve IT and Business users– Monitoring Data Quality Dashboards – report back monthly– Use Salesforce features (e.g., Data Validation Rules, Conditional Workflow field

updates, Analytic Snapshots for trending)

Archive un-used Data– Data must be USEFUL to the Business and must be justifiable– Candidates for archiving : last updated > 1 year ago, no child records, Missing

Core Required Fields

Correct existing data– Users who have left company and STILL own records, Find/Replace picklist

values, Apply Naming Standards

Data Quality Solution Considerations

Identify and remove Dupes– Low hanging fruit:

• Simple dupes: e.g.,) match on a unique key like email address• Flag dupes for merging in Salesforce

– Leverage available de-dupe tools• Complex definition of dupe: e.g.,) fuzzy matching on name+address

– Define your rules (matching rules, merging rules)

Enrich & Append– Enrich your existing data– Add NEW data for known companies– 3rd party data vendors – helpful in creating Account hierarchies, helpful

for accurate Contact Info – especially at various levels in the Company

Data Quality Solution Considerations

Limit points of Entry– List Imports restricted to certain profiles– Control data being entered – without overwhelming users– Leverage Sales Intelligence Tools, Dupe Prevention, Address

Validation

Automation / Integration– Integrations, Master Data Management– Nightly Batch Updates

Data Quality Solution Considerations

Cleansing Environment

Staging

Staging

Production

• Transform & Re-model• Cleanse & Standardize• Enrich & De-dupe• Iterate• Validate with Business

Users

Company Name & Address

Enrich (Optional)Enrich (Optional)

Acme Inc HQAcme UK

Hierarchy Data

Demographics

3

Names

StandardizeStandardize

US, U.S. U.S.A USA

acme incorpAcme Inc

Addresses

Postal Standards

Identify, Match & Score

4

De-dupeDe-dupe

J. Smith, John Smith 80%

Re-parent Child Records

Account: Division, Opportunity, Contact

Merge

J. Smith, John Smith John Smith

Find & Replace

2

CleanseCleanse

Acme-Widgets-453

Hot HighCold Low

Data Transformation

Naming Conventions

Mergers, acquisitions, spin-offs

Archiving & Filtering

Load to Sandbox

5

ValidateValidate

Load to Production

Validate & Modify

1

Cleansing Process

Safeguard your cleansed data and prevent future deterioration.

TrainTrain

• User Training• Naming Conventions• Address Conventions• Dupe. Prevention Process• Data Importing Policies

• Required Fields• Default Values• Data Validation Rules• Workflow Field Updates• Web-to-Lead Restrictions

• Data Quality Dashboards• Data Quality

Reassessment• AppExchange Tools

EnforceEnforce MonitorMonitor

Protect Your Data

What tools do I use?

The AppExchange The Trusted Cloud Computing Marketplace

1000+Pre-Integrated Apps

300+ Services

4000+Customer Reviews

200+ Free Apps to Get You Started

• Reports & dashboards to end-to-end templates

• Fully customizable

AppExchange Tools Worth Checking Out!

Cloud Converter (Free)

Synchronizer (Free)

Jigsaw for Salesforce (Paid)

CRM Fusion (Paid)

Data Quality Dashboard (EE edition)

Data Quality Analysis Dashboard (EE Edition)

All reports pull from just TWO formula fields.

59

The Formula Field

You can EXPAND these formulas to include YOUR custom fields.

Brian WiebeEzra Kenigsberg

salesforce.com

Managing Large Data Volumes (LDV)

What Do We Mean by Large Data Volumes?

You know you’ve got scale when …– 1,000s of Users

– 1,000,000s of records for a single Object

– Role or Territory hierarchy > dozens of levels

– Public Groups nested > 5 levels deep

– A single User, Queue, Role, Public Group, or Territory:• Owning 10,000s of records

• Seeing 10,000s of records as a result of sharing

– 10,000s of Public Groups

– 1,000s of Territories

These are NOT hard limits—only useful guides to proceed carefully!

Talk to your Account Executive

Where Would We Proceed Carefully?

User Interface– Reports, Dashboards, List Views

– Searches

API– Queries

– Integration

Synchronizing with end-user (Outlook, Mobile)

apps

What Options are Available?

1. Segment

2. Optimize sharing

3. Leverage indexes & skinny tables

4. Move data asynchronously

1) Segmenting with Divisions Think millions

Use data access patterns

– Does everyone really look at

everything?

Acts like DB partitions

– Breaks up big objects

– Aim for <1M / division

Used for performance of search,

reports, dashboards, and list

views

Not a security measure!

Global Division

… SOHO SMB MSB Enterprise Gov …

By Geography

By Responsibility

Example: Financial Services Customer13M Clients / ~500 Branches = 26,000 records per Division

Arc

hive

1) Segmenting with Tiered Data Think tens of millions

Focus users on active data– Open cases

– Warm leads

– Recent history

Segregate inactive data

Can be used with Divisions

The Archive Data table...– ...doesn’t have to be custom

– ...doesn’t have to be Salesforce

Can use Analytic Snapshots

Example: Financial Services Customer145M Activities - 90M Legacy = 55M55M / ~500 Branches = 110,000

Active DataStandard or Custom ObjectStandard functionality

Batch ApexScheduled Apex

Archive DataStandard or Custom ObjectSubset of columnsReport focused

2) Optimize Sharing

Use private sharing strategically

Enforce ownership to prevent data

concentration– “Super-owner” individuals

Streamline hierarchies– Limit depth of nested groups

– Roles, Groups, Territories, etc.

Leverage all capabilities– Apex Managed Sharing for custom

objects

3) Leverage Custom Indexes

Standard Indexes: Created Date, Last Modified Date, Division, Record Type

Administrators can index fields bydesignating them as External IDs

Custom Indexes are availablethrough Support– Multi-column custom indexes also supported

– Can be applied based on use case, impact and priority

– Salesforce.com working on automatically detecting the need

Example: Large Japanese Insurance Co.Custom object with 10M records queried regularly by 50,000+ users80x boost in query perf. due to multi-column Custom Index

3) A Word About Skinny Tables

Database

Innovation– No work required

– Managed by

salesforce.com

2-10x performance

for some analytics

Also available

through Support

Name Address 1 ST Comments SUM

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

John 1 Terracotta Ln CA Need follow up here. 500

Name ST SUM

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

John CA 500

BaseTable

Ski

nn

yTa

ble

Faster Reports(more rows fit in memory)

Faster Reports(more rows fit in memory)

Fewerrowsper

fetch

Morerowsper

fetch

Data streamed to temporary storageData streamed to

temporary storage

ClientClientProcessing

ThreadProcessing

Thread

Processing Thread

Processing Thread

Processing Servers

Processing Servers

JobJob

Data Batches

Data Batches

Dequeue batch

Dequeue batch

Insert/updateInsert/update

Save resultsSave

results

Send all data

Send all data

Check StatusCheck Status

Retrieve ResultsRetrieve Results

4) Bulk Up—Asynchronously

Job updated in Admin Setup

Job updated in Admin Setup

Dataset processed in parallel

Dataset processed in parallel

ResultsResults

Bulk API

The “go-to” option for tens of thousands of records

and up

Up to 10,000 records in a batch file

Asynchronous loading, tracked in Salesforce’s

Walkthrough time!

Upsert legacy data into Salesforce—FAST

Example: American Insurance Co.230 million records processed in 33 hours,14 hours ahead of schedule

Q&A

Q&A

Links:

developer.force.com/consultants

ezrakenigsberg.com

Post-Session Questions?• Brian Wiebe

Technical Engagement Manager (West)bwiebe@salesforce.com

• Ezra KenigsbergData Architect (Midwest) ekenigsberg@salesforce.com

D I S C O V E R

Visit Customer Success Team at Campground

Discover

Training

Learning Paths

Experience

Product

Demos

Learn about Customer

Resources

the products, services and resources

Meet Success Experts

S U C C E S S

Find us at the Customer Success Team area of salesforce.com Campground at Moscone North

Learn about how to win prizes including 10 iPads & more!

that help you achieve

How Could Dreamforce Be Better? Tell Us!

Log in to the Dreamforce app to submit

surveys for the sessions you attendedUse the

Dreamforce Mobile app to submit

surveysEvery session survey you submit is

a chance to win an iPod nano!

OR