Storage Developer Conference 2009 © 2009 Insert Copyright information here. All rights reserved.

Using Data Classification to Manage File Servers

Adi Oltean – Senior SDE, Microsoft Corporation
Ran Kalach – Principal Dev Manager, Microsoft Corporation


Agenda

Customer challenges

Solution: File Classification
  • Manage data based on business value
  • Grow the ecosystem in classification solutions

File Classification Infrastructure
  • The classification pipeline
  • Aggregation, conflict resolution
  • Incremental classification
  • Challenges, Mitigations & Best Practices

Conclusions


Customer challenges – file servers

Challenges:
• Storage growth
• Storage cost
• Compliance
• Security and information leakage
• Data sharing and search

Data management tools:
• Replication
• Backup
• HSM
• Security
• Archive
• Encryption
• Expiration

Increasing data management needs / many data management tools


File shares and business requirements

Requirements from the business to IT:
• Need a share per project
• Make sure high business impact files do not leak out
• Back up files with personal information to an encrypted store
• Expire low business impact files created three years ago and not touched for a year


Some time later …


Classify and apply policy

Step 1: Classify data

Classification methods:
• Manual
• Line of business application
• Automatic classification (location, content, owner)
• IT scripts

Step 2: Apply policy based on classification

Actions based on classification:
• Backup
• Archive
• Reports
• HSM
• Expiration
• Replication
• Security
• Encryption
• Search
• Leakage prevention
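The two steps above can be illustrated with a minimal Python sketch. It is not FCI code: the `FileInfo` record, the share paths, the `hr` owner account, the keyword match, and the action names are all hypothetical, chosen only to show that step 2 reads nothing but the properties produced in step 1 plus basic timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    path: str
    owner: str
    text: str
    created: datetime
    last_accessed: datetime


def classify(f: FileInfo) -> dict:
    """Step 1: derive classification properties from location, content and owner."""
    props = {"BusinessImpact": "LBI", "PersonalInformation": False}
    if f.path.startswith("/shares/finance/"):        # location (hypothetical path)
        props["BusinessImpact"] = "HBI"
    if "social security number" in f.text.lower():   # content (toy keyword match)
        props["PersonalInformation"] = True
    if f.owner == "hr":                               # owner (hypothetical account)
        props["PersonalInformation"] = True
    return props


def choose_actions(f: FileInfo, props: dict, now: datetime) -> list:
    """Step 2: choose actions from the classification, not from the raw content."""
    actions = []
    if props["PersonalInformation"]:
        actions.append("backup-to-encrypted-store")
    old = now - f.created >= timedelta(days=3 * 365)
    idle = now - f.last_accessed >= timedelta(days=365)
    if props["BusinessImpact"] == "LBI" and old and idle:
        actions.append("expire")
    return actions


now = datetime(2009, 9, 15)
memo = FileInfo("/shares/misc/memo.txt", "alice", "lunch menu",
                datetime(2005, 1, 1), datetime(2007, 6, 1))
payroll = FileInfo("/shares/finance/payroll.txt", "hr", "salary table",
                   datetime(2009, 1, 1), datetime(2009, 9, 1))

memo_actions = choose_actions(memo, classify(memo), now)
payroll_actions = choose_actions(payroll, classify(payroll), now)
```

Keeping the two functions separate mirrors the slide: new classification methods can be added without touching the policies, and new policies can be added without touching the classifiers.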


File shares and business requirements

Classification properties: Personal Information, Business Impact

Requirements from the business to IT:
• Need a share per project
• Make sure high business impact files do not leak out
• Back up files with personal information to an encrypted store
• Expire low business impact files created three years ago and not touched for a year


Customer benefits - Summary

Reduce cost
• Expire files to reduce storage purchasing needs
• Move files to less expensive storage
• Optimize backup SLAs
• Replicate only business related files

Manage risk
• Find sensitive files on public servers
• Watermark documents
• Keep files containing personal information encrypted in backup
• Apply rights management to high secrecy files
• Comply with retention policies

Apply policies based on classification = Manage data based on business value!




File Classification Infrastructure

Stages (File Classification extensibility points):
• Discover Data
• Extract classification properties
• Classify Data
• Store classification properties
• Apply Policy based on classification

APIs for external applications:
• Get classification properties API
• Set classification properties API


Classification pipeline – an example

Pipeline stages: discovery, load properties, classification, save properties, run policies

Modules in this example (a Classification Runtime Process plus hosting processes):
• Scanner: gets basic file properties
• Office Storage [Load]
• Folder Classifier
• Content Classifier
• Office Storage [Save]
• Reporting Engine

• Each component passes a property bag object to the next one
• Most modules are hosted within a separate process
• Property bags can cross processes
    • Security checks are performed on cross-process data transfers

This is an example of a pipeline setup with one storage module and two classifiers.
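As a rough illustration of how a property bag travels through such a pipeline, here is a minimal in-process Python sketch. It is not the FCI API: the module functions, the dictionary-based bag, the `STORED` and `REPORT` stand-ins, and the example path are assumptions, and the cross-process hosting and security checks from the slide are deliberately left out.

```python
from typing import Callable, Dict, List, Tuple

PropertyBag = Dict[str, object]
Module = Callable[[PropertyBag], PropertyBag]

# Stand-ins for properties persisted inside files and for a report sink.
STORED: Dict[str, Dict[str, object]] = {}
REPORT: List[Tuple[str, Dict[str, object]]] = []


def scanner(bag: PropertyBag) -> PropertyBag:
    """Discovery: record basic file properties."""
    bag["size"] = len(bag["content"])
    return bag


def storage_load(bag: PropertyBag) -> PropertyBag:
    """Load properties: copy previously stored properties into the bag."""
    bag["properties"].update(STORED.get(bag["path"], {}))
    return bag


def folder_classifier(bag: PropertyBag) -> PropertyBag:
    """Classification by location (hypothetical folder rule)."""
    if bag["path"].startswith("/shares/legal/"):
        bag["properties"]["Department"] = "Legal"
    return bag


def content_classifier(bag: PropertyBag) -> PropertyBag:
    """Classification by content (toy keyword match)."""
    if "confidential" in bag["content"].lower():
        bag["properties"]["Confidential"] = True
    return bag


def storage_save(bag: PropertyBag) -> PropertyBag:
    """Save properties: persist the classification result."""
    STORED[bag["path"]] = dict(bag["properties"])
    return bag


def reporting_engine(bag: PropertyBag) -> PropertyBag:
    """Run policies: here, only append a line to a report."""
    REPORT.append((bag["path"], dict(bag["properties"])))
    return bag


PIPELINE: List[Module] = [scanner, storage_load, folder_classifier,
                          content_classifier, storage_save, reporting_engine]


def run_pipeline(path: str, content: str) -> PropertyBag:
    bag: PropertyBag = {"path": path, "content": content, "properties": {}}
    for module in PIPELINE:
        bag = module(bag)  # each component hands the bag to the next one
    return bag


result = run_pipeline("/shares/legal/nda.txt", "CONFIDENTIAL draft")
```

Because every module has the same signature, adding a second storage module or a third classifier only changes the `PIPELINE` list.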


Aggregation and Conflict Resolution

Problem:
• A classification rule may produce a value that conflicts with the value already stored in the file
• Two classification rules may produce conflicting values for the same property
• Example:
    • Admin creates a “Business Impact” property with possible values (LBI, MBI, HBI)
    • A file previously classified as MBI is copied to a folder x:\foo
    • The Folder rule for x:\foo classifies all files as LBI
    • The Content classifier scans the file and classifies it as HBI
    • What is the correct value?

Solution:
• Provide several types of classification rules:
    • Default: the rule runs only if the property is not present in the file
    • Otherwise: rules can explicitly either aggregate or overwrite previously stored properties
• Value aggregation depends on the property type
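The rule types above can be sketched in Python using the Business Impact example. The slide does not say how an ordered property is aggregated, so this sketch assumes that the highest value wins; the mode names `default`, `aggregate`, and `overwrite` and the `apply_rules` helper are hypothetical.

```python
from typing import List, Optional, Tuple

ORDER = ["LBI", "MBI", "HBI"]  # ascending business impact


def aggregate_ordered(values: List[str]) -> str:
    """Assumed aggregation for an ordered property: keep the highest value."""
    return max(values, key=ORDER.index)


def apply_rules(stored: Optional[str],
                rule_results: List[Tuple[str, str]]) -> Optional[str]:
    """Resolve one property.

    stored: value already present in the file, or None.
    rule_results: (mode, value) pairs, mode in {"default", "aggregate", "overwrite"}.
    """
    candidates = []
    overwrite = False
    for mode, value in rule_results:
        if mode == "default" and stored is not None:
            continue  # default rules run only when the property is absent
        if mode == "overwrite":
            overwrite = True  # the stored value will be discarded
        candidates.append(value)
    if stored is not None and not overwrite:
        candidates.append(stored)
    return aggregate_ordered(candidates) if candidates else None


# File previously classified MBI; Folder rule says LBI, Content classifier says HBI.
keep_stored = apply_rules("MBI", [("default", "LBI"), ("default", "HBI")])
aggregated = apply_rules("MBI", [("aggregate", "LBI"), ("aggregate", "HBI")])
overwritten = apply_rules("MBI", [("overwrite", "LBI")])
```

Under these assumptions the same file ends up as MBI, HBI, or LBI depending on the rule types, which is why the administrator has to choose them deliberately.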


Incremental Classification

Goal: Minimize re-classification of already classified files
• Crucial for scalability (large number of files)

Automatic classification (scheduled)
• Cache classification results in an ADS (alternate data stream)
    • The ADS contains a hash of certain file properties (last-modify-time, file-path, file-id, etc.)
    • The ADS contains the last classification time
    • This allows determining whether the cached classification is up to date
• Re-classify the file only if:
    • The file changed or was added since the previous classification (hash is different), or
    • A rule has changed since the previous classification, or
    • The configuration of a classifier has been updated since the previous classification

Get Property API (on-demand)
• If the cache is present and up to date, return the cached properties
• Otherwise (out-of-date classification), the application can choose:
    • Accuracy: classify the file “on the fly”
    • Performance: return the stored properties
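The freshness check described above can be sketched in Python as follows. The cache is modeled as a plain object rather than an alternate data stream, and the hash inputs, the SHA-256 choice, the field names, and the behavior when no cache exists at all are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CacheEntry:
    file_hash: str        # hash of selected file properties
    classified_at: float  # time of the last classification
    properties: dict      # cached classification result


def file_hash(path: str, file_id: int, last_modified: float) -> str:
    """Hash the file properties that signal a change (assumed selection)."""
    raw = f"{path}|{file_id}|{last_modified}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def is_up_to_date(entry: Optional[CacheEntry], current_hash: str,
                  rules_changed_at: float, config_changed_at: float) -> bool:
    if entry is None:
        return False                      # file was never classified
    if entry.file_hash != current_hash:
        return False                      # file changed since classification
    if rules_changed_at > entry.classified_at:
        return False                      # a rule changed
    if config_changed_at > entry.classified_at:
        return False                      # a classifier was reconfigured
    return True


def get_properties(entry: Optional[CacheEntry], current_hash: str,
                   rules_changed_at: float, config_changed_at: float,
                   prefer_accuracy: bool,
                   classify_now: Callable[[], dict]) -> dict:
    """On-demand lookup: the caller trades accuracy against performance."""
    if is_up_to_date(entry, current_hash, rules_changed_at, config_changed_at):
        return entry.properties
    if prefer_accuracy:
        return classify_now()             # classify "on the fly"
    return entry.properties if entry else {}  # stale (or empty) stored result


h = file_hash("/shares/a.txt", 42, 100.0)
entry = CacheEntry(h, 200.0, {"BusinessImpact": "MBI"})
fresh = is_up_to_date(entry, h, 150.0, 150.0)
stale_after_edit = is_up_to_date(entry, file_hash("/shares/a.txt", 42, 300.0), 150.0, 150.0)
stale_after_rule_change = is_up_to_date(entry, h, 250.0, 150.0)
```

The scheduled pass would call `is_up_to_date` for every file and classify only the ones that return `False`, which is what keeps a large share scannable.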


Challenges, Mitigations & Best Practices

1 - Performance
• Content classification is expensive (I/O, CPU)
    • Must optimize to scan & classify only when needed
    • Must be able to cache results
• Minimize performance impact on the host of the data being classified
    • Classify on another machine
    • When classifying locally, throttle machine resource usage and back off when the machine becomes non-idle
    • Be smart about how you schedule classification; support pause/resume
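One way to read the throttling and pause/resume advice is a batch loop that stops as soon as the host is busy and reports where to continue. The Python sketch below assumes a caller-supplied `is_idle` probe and a per-run file budget; both are hypothetical, as is the `classify_batch` name.

```python
from typing import Callable, List, Tuple


def classify_batch(paths: List[str], start_index: int,
                   is_idle: Callable[[], bool],
                   classify: Callable[[str], dict],
                   max_files: int) -> Tuple[List[Tuple[str, dict]], int]:
    """Classify up to max_files files, starting at start_index.

    Stops early when the host is no longer idle. Returns the results and the
    index to resume from, so an interrupted run can be continued later.
    """
    results = []
    index = start_index
    while index < len(paths) and len(results) < max_files:
        if not is_idle():
            break  # back off: the machine is needed for real work
        results.append((paths[index], classify(paths[index])))
        index += 1
    return results, index


paths = ["a.txt", "b.txt", "c.txt", "d.txt"]
idle_states = iter([True, True, False])  # host becomes busy on the third check

first_run, resume_at = classify_batch(
    paths, 0, lambda: next(idle_states), lambda p: {"seen": p}, max_files=10)
second_run, finished_at = classify_batch(
    paths, resume_at, lambda: True, lambda p: {"seen": p}, max_files=10)
```

Persisting `resume_at` between runs is what turns a throttled scan into a resumable one.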


Challenges, Mitigations & Best Practices

2 - Accuracy
• Automatic classification can almost never be 100% accurate
• Tune your rules for false positives / false negatives according to the scenario
    • Example: when securing files, err toward false positives; when expiring files, err toward false negatives
• Policy execution: revert in case of classification error
    • Example: back up files one last time just before you expire them
• Examine classification results periodically
    • Modify your rules or classifiers until they are optimized for your data set
• Enable manual classification
• Clear and consistent policy for aggregating and resolving conflicts
• Support flexible rules that allow tuning by administrator or application
    • One answer doesn’t fit all!
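The "back up one last time before you expire" advice can be sketched as a reversible expiration step. This Python example uses only the standard library; the function names and the flat backup directory layout are assumptions, and a real policy would need to handle name collisions and permissions.

```python
import shutil
import tempfile
from pathlib import Path


def expire_with_backup(path: Path, backup_dir: Path) -> Path:
    """Copy the file to backup_dir, then delete the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / path.name  # flat layout: a simplification
    shutil.copy2(path, backup_path)       # copy2 also preserves timestamps
    path.unlink()
    return backup_path


def revert_expiration(backup_path: Path, original_path: Path) -> None:
    """Undo an expiration that turned out to rest on a misclassification."""
    original_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup_path, original_path)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    doc = root / "share" / "old-report.txt"
    doc.parent.mkdir()
    doc.write_text("quarterly numbers")

    backup = expire_with_backup(doc, root / "backup")
    expired = not doc.exists()

    revert_expiration(backup, doc)
    restored_text = doc.read_text()
```

The point of the design is that the destructive action never runs without a way back, so a false positive in the expiration rule costs a restore rather than the data.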


Challenges, Mitigations & Best Practices

3 - Real-time Classification and Policies
• Some policies require real-time or near real-time execution
    • Example: removing a confidential file from an unsecured share
• Solution: event-based classification
    • File-system activity can be a trigger
    • Needs a hook into file-system operations (many implementation options exist)
    • Consider classifying only when the file content is “stable”
    • Avoid overloading the server with overly aggressive classification
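A simple way to approximate "stable" content is to wait for a quiet period after the last write event before classifying. The Python sketch below assumes that change notifications arrive from some file-system hook not shown here; the `StableFileTrigger` class and the 30-second quiet period are hypothetical.

```python
from typing import Dict, List


class StableFileTrigger:
    """Queue changed files and release them only after a quiet period."""

    def __init__(self, quiet_seconds: float):
        self.quiet_seconds = quiet_seconds
        self._last_change: Dict[str, float] = {}  # path -> time of last write

    def on_file_changed(self, path: str, now: float) -> None:
        """Called by the (assumed) file-system hook on every write event."""
        self._last_change[path] = now  # repeated writes restart the timer

    def due(self, now: float) -> List[str]:
        """Return files that have been quiet long enough and forget them."""
        ready = sorted(p for p, t in self._last_change.items()
                       if now - t >= self.quiet_seconds)
        for p in ready:
            del self._last_change[p]
        return ready


trigger = StableFileTrigger(quiet_seconds=30)
trigger.on_file_changed("a.docx", now=0)
trigger.on_file_changed("b.docx", now=5)
trigger.on_file_changed("a.docx", now=20)  # a.docx is still being edited

due_at_40 = trigger.due(now=40)
due_at_50 = trigger.due(now=50)
due_at_60 = trigger.due(now=60)
```

Coalescing repeated writes into one classification per file also limits the load that event-based classification puts on the server.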


Examples of FCI-enabled solutions

Solution: Classification solutions
Example: An LOB app that maintains special classification rules for the PII data it generates

Solution: Custom “classifiers” that extract metadata from files
Example: A medical imaging classifier extracts embedded metadata from scanned images

Solution: Custom “storage modules” that load/store custom metadata in files
Example: Load/store metadata in your custom file formats (example: videos)

Solution: Add “classification awareness” to existing data management solutions
Example: A backup app can have special backup policies for HBI data

Solution: Build “intelligent” policy-based data management solutions
Example: Define a policy to automatically encrypt HBI data


Opportunities for you

Why participate in the File Classification Infrastructure ecosystem?
• Use FCI for existing software
    • Enhance existing data-producing apps to also attach classification to generated files (ex: LOB applications)
    • Enhance existing data management apps to consume classification
• Use FCI for new software solutions
    • Develop solutions on top of FCI
    • Develop components for the FCI ecosystem: classifiers, storage modules

How can I develop against it?
• File Classification Infrastructure can be consumed through a rich, scriptable COM API
• FCI can be extended using C++/C# code, or PowerShell scripts

When can I start?
• Now: FCI is part of the latest Server releases (starting with Windows Server 2008 R2)


More information about FCI

General information
• Home page: http://www.microsoft.com/windowsserver2008/en/us/fci.aspx
• Team blog: http://blogs.technet.com/filecab
• API documentation on MSDN: http://msdn.microsoft.com/en-us/library/bb972746(VS.85).aspx

Sample code
• Windows SDK: http://msdn.microsoft.com/en-us/windows/bb980924.aspx
    • Sample FCI clients (C++, C#)
    • Sample classifiers (C++, C#)
• Code Gallery: http://code.msdn.microsoft.com/fci