© 2013 IBM Corporation

InfoSphere Guardium Tech Talk: Big Data Security Use Case: A Holistic Approach to Data Protection

Rodrigo Bisbal / [email protected]



Logistics

This tech talk is being recorded. If you object, please hang up and leave the webcast now.

We’ll post a copy of the slides and a link to the recording on the Guardium community tech talk wiki page: http://ibm.co/Wh9x0o

You can listen to the tech talk using audiocast and ask questions in the chat to the Q and A group.

We’ll try to answer questions in the chat or address them at the speaker’s discretion.

– If we cannot answer your question, please include your email so we can get back to you.

When the speaker pauses for questions:

– We’ll go through existing questions in the chat


Reminder: Guardium Tech Talks

A link to more information about this and upcoming tech talks can be found on the InfoSphere Guardium developerWorks community: http://ibm.co/Wh9x0o

Please submit a comment on this page for ideas for tech talk topics.

Next tech talk: WATCH THIS SPACE

Speakers:

Date & Time:

Register here:


Table of contents

–What is Big Data ?

–Why Big Data ?

–What to do with it ?

–Access methods, new exposure

–How to secure it ?

–Playing nice with the enterprise

–Q&A


What is Big Data ?


There is an Explosion in Data and Real World Events

4.6 billion mobile phones worldwide

1.3 billion RFID tags in 2005; 30 billion RFID tags today

2 billion Internet users by 2011

Twitter processes 7 terabytes of data every day

Facebook processes 10 terabytes of data every day

World Data Centre for Climate: 220 terabytes of Web data, 9 petabytes of additional data

Capital market data volumes grew 1,750%, 2003-06


Information is Exploding…

2009: 800,000 petabytes

2020: 35 zettabytes

44x as much data and content over the coming decade

80% of the world’s data is unstructured

Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010


Source: WHATRUNSWHERE.COM http://blog.whatrunswhere.com/big-data-online-marketing-strategy/

Why Big Data ?

Volume: Be able to capture large amounts of unstructured data: stats, video, sensor data, messages, likes, etc.

Cost: It is cheaper to store in Big Data than in an RDBMS.

Real Time: Use analysis tools to go through large amounts of unstructured data very fast. Results are used to improve business processes and decisions.

Scalability: Be able to grow exponentially without increasing cost, compared to warehousing.

……

14 November 2013

Why Big Data ?

Case study: Aviation Data

Jet sensors: Collect jet engine data (temperature, humidity, air pressure) to predict part failure and take preventative action. Reduce cost by pre-empting failure.

Reduce down-time: Preventative maintenance reduces down time, so more planes are available to serve customers.

By analyzing arrival/departure data, weather conditions and other data sources, airlines can better manage their fleets and schedules.

Happier customers: Fewer delays improve customer satisfaction, increasing customer loyalty and bookings.

By analyzing customers’ flying patterns, airlines can identify new routes and add other services that benefit customers and the airline.

Greener: More efficient jet engines consume less fuel and emit less CO2.


Case study: Facebook Messaging

▪ High write throughput

▪ Every message, instant message, SMS, and e-mail

▪ Search indexes for all of the above

▪ Denormalized schema

▪ A product at massive scale on day one

▪ 6k messages a second

▪ 50k instant messages a second

▪ 300TB data growth/month compressed


Case Study: Facebook “likes” on outdoor gear

USA 2.5M

MEX 300K

BRA 1.2M

PERU 100K

ARG 350K

SA 80K

INDIA 1.5M

RUSSIA 500K

UK 300K


Two security requirements in the era of big data

#1 Deploy Security Analytics

– Security analytics to predict, prevent and act on information in real time and through historical analysis

– Threats to physical and cyber assets must be understood and analyzed. Analyze in real time and based on persisted data.

#2 Scale Existing Technology to Big Data

– Existing cyber security strategies such as encryption and data activity monitoring must be applied to big data. For example, mask unstructured data types such as medical records or XML data, dynamically mask data from the Hadoop platform, or monitor all Hadoop activity and access patterns

– Apply data security and privacy policies based on security analytics and business rules

[Figure: Big Data Analytics alongside Traditional Security Operations and Technology. Data sources labeled in the figure: logs; events; alerts; configuration information; system audit trails; external threat intelligence feeds; network flows and anomalies; identity context; web page text; full packet and DNS captures; e-mail; business process data; customer transactions; social data (blogs, tweets, chats); satellites; GPS tracking; smart devices; network traffic; sensors; images; spreadsheets; financial transactions; telephone records.]


Access Methods: Load NYSE stock data in Hadoop. It could not be easier !


Access Methods: create nyse_stocks table from the CSV file

Very easy to load data and create tables from CSV files

Automatic data type detection


Access Methods: run analysis script against Hadoop file

Find the average volume of IBM stock data, very easy !
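The analysis script itself appears only as a screenshot on the slide. As a rough illustration of the same calculation, here is a minimal Python sketch that averages the volume column for one ticker in CSV text. It is plain Python over an in-memory CSV, not the script from the slide, and the column names `stock_symbol` and `stock_volume` and the sample rows are assumptions made up for the example.

```python
import csv
import io


def average_volume(csv_text, symbol):
    """Return the mean of the volume column for one ticker, or None if absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    volumes = [
        int(row["stock_volume"])          # assumed column name
        for row in reader
        if row["stock_symbol"] == symbol  # assumed column name
    ]
    return sum(volumes) / len(volumes) if volumes else None


# Made-up rows for illustration only; not real NYSE data.
SAMPLE = """exchange,stock_symbol,date,stock_volume
NYSE,IBM,2010-01-04,100
NYSE,IBM,2010-01-05,300
NYSE,GE,2010-01-04,500
"""

print(average_volume(SAMPLE, "IBM"))  # 200.0
```

The point for security purposes is how little is needed: a few lines of script are enough to read and aggregate the whole file, so every such access path is worth auditing.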


Use InfoSphere Guardium Hadoop Activity Monitor to audit every transaction

Guardium report:


Access Methods: extract Hadoop NYSE stock data from the command line


Challenge: Capturing HDFS activity with Guardium

Guardium report:


Monitor sensitive data access with InfoSphere Guardium

Authorized users group

Directories that contain sensitive data


Who is accessing my sensitive data?

Unauthorized user accessing sensitive data

Sensitive data directory
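The idea behind this report can be sketched as a simple filter: keep every access that touches a sensitive directory and was made by a user outside the authorized group. The Python below is only an illustration of that logic, not Guardium's implementation; the `Access` record, the user names and the directory path are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical audit record: who touched which HDFS path."""
    user: str
    path: str


AUTHORIZED_USERS = {"svc_etl", "analyst1"}   # hypothetical group members
SENSITIVE_DIRS = ("/user/finance/cards",)    # hypothetical sensitive directory


def is_sensitive(path):
    """True if the path is a sensitive directory or lies underneath one."""
    return any(path == d or path.startswith(d + "/") for d in SENSITIVE_DIRS)


def unauthorized_accesses(events):
    """Accesses to sensitive directories by users not in the authorized group."""
    return [
        e for e in events
        if is_sensitive(e.path) and e.user not in AUTHORIZED_USERS
    ]


events = [
    Access("analyst1", "/user/finance/cards/2013.csv"),
    Access("guest", "/user/finance/cards/2013.csv"),
    Access("guest", "/user/public/readme.txt"),
]
violations = unauthorized_accesses(events)
print(violations)
```

Only the second event is reported: the first user is in the group, and the third path is not sensitive.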


Hadoop – Unauthorized MapReduce Jobs Report

This group contains authorized programs.

This report shows programs that are NOT in the group.

What applications are running on my system?

Who is running them?
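In outline, the report answers both questions by counting jobs whose program is not in the authorized group, keyed by program and user. The sketch below shows that idea in Python under stated assumptions: the program names, users and the (program, user) job tuples are invented for the example and do not reflect Guardium's data model.

```python
from collections import Counter

AUTHORIZED_PROGRAMS = {"wordcount", "nightly_etl"}  # hypothetical group content


def unauthorized_jobs(jobs):
    """Count (program, user) pairs whose program is NOT in the authorized group."""
    return Counter(
        (program, user)
        for program, user in jobs
        if program not in AUTHORIZED_PROGRAMS
    )


jobs = [
    ("wordcount", "alice"),
    ("card_dump", "bob"),
    ("card_dump", "bob"),
    ("nightly_etl", "etl"),
]
report = unauthorized_jobs(jobs)
for (program, user), count in report.items():
    print(program, user, count)
```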


InfoSphere Guardium pre-built reports

Log in as a user

On the View tab is the Hadoop section

The Hadoop section has all the pre-built reports

If you log in as admin, you will need to add reports to the web console. You can add them to the “My New Reports” tab.


Where to protect ?

[Figure labels: Sensitive Data; Distributed Hadoop Cluster; Traditional Sources; Data Streams; Web Pages; Social Networks; Business Intelligence Output; Protect Here (shown twice)]


Securing the Hadoop Filesystem with InfoSphere Guardium Data Encryption:

• High level HDFS access policy easily implemented with Guardium Data Encryption

• Process aware

• User aware


Sample data file exploit:

• Data store files (csv, images, encoded, etc.) sit in the filesystem

• Direct access can be used to extract data

• Mission critical data needs to be secured

Simple strings command is enough to extract card data from a file !
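To make the exposure concrete, here is a minimal Python sketch of what a `strings`-style scan does: it pulls runs of printable ASCII out of raw bytes and then looks for card-like tokens. It is an approximation for illustration, not the `strings` utility itself; the 16-digit pattern is deliberately naive, and the sample bytes use a well-known test card number rather than real data.

```python
import re

# Runs of 4 or more printable ASCII bytes, similar in spirit to `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Naive assumption: any standalone 16-digit token is treated as card-like.
CARD_LIKE = re.compile(rb"\b\d{16}\b")


def printable_strings(data):
    """Return every printable ASCII run found in the raw bytes."""
    return [m.group() for m in PRINTABLE_RUN.finditer(data)]


def card_like_numbers(data):
    """Return 16-digit tokens found inside the printable runs."""
    found = []
    for run in printable_strings(data):
        found.extend(m.group().decode("ascii") for m in CARD_LIKE.finditer(run))
    return found


# Binary noise around readable fields; the number is a standard test value.
blob = b"\x00\x01\x02name=Test User\x00\xff4111111111111111\x00\x7f\x03end!"

print(printable_strings(blob))
print(card_like_numbers(blob))  # ['4111111111111111']
```

Nothing here depends on Hadoop: anyone with read access to the underlying files can run an equivalent scan, which is why the data at rest needs its own protection.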


Use Guardium Data Encryption to protect

• Define the user that is allowed to access the file

• Define the process that is allowed to access the files

• Specify the Effect: all FS operations: read, write, list, audit, encrypt, etc

Create a policy to protect the directory:


Use Guardium Data Encryption to protect ( cont. )

• Different policies can be used for different directories

• Centrally manage file system security across the entire enterprise

Policy has been applied to a directory: Guard point


This time the data exploit fails !

• Only the authorized process can access the files

• In this case the admin cannot read the file contents directly

• Policy allows for authorized processes, apps and backups to access the file

Simple strings command is not enough to extract card data from a file !
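The behaviour described above (user aware, process aware, with an effect per operation) can be modelled roughly as rule matching. The Python below is a hypothetical model only: the `Rule` structure, the first-match-wins order and the default deny are assumptions chosen for the example and are not Guardium Data Encryption's actual policy format or evaluation semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule; "*" matches any user or any process."""
    user: str
    process: str
    actions: frozenset
    effect: str  # "permit" or "deny"


def evaluate(rules, user, process, action):
    """Assumed semantics: the first matching rule wins, otherwise deny."""
    for rule in rules:
        if (rule.user in ("*", user)
                and rule.process in ("*", process)
                and action in rule.actions):
            return rule.effect
    return "deny"


# Illustrative policy: the Hadoop service process may read and write,
# everyone else may only list the directory.
POLICY = [
    Rule("hdfs", "/usr/bin/java", frozenset({"read", "write", "list"}), "permit"),
    Rule("*", "*", frozenset({"list"}), "permit"),
]

print(evaluate(POLICY, "hdfs", "/usr/bin/java", "read"))     # permit
print(evaluate(POLICY, "root", "/usr/bin/strings", "read"))  # deny
print(evaluate(POLICY, "root", "/usr/bin/strings", "list"))  # permit
```

In this model the admin running `strings` is denied the read even though the same file remains readable by the authorized process, which mirrors the outcome shown on the slide.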


How to Secure It and Add Enterprise Value ?

How to securely use mission critical data with big data ?


Integrate Guardium Data Encryption logs with Guardium Activity Monitoring

- Guarded file system’s activity is logged in detail: User, action, process, object, timestamp

- In this form the logs’ usefulness is limited

- Read native logs from a CSV stream or use the Guardium messaging API to send them in real time
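As a sketch of the CSV route, the Python below parses audit lines into normalized records carrying the five fields named above, which is the shape a forwarder would need before sending them on. The column order, timestamp format, `_denied` action suffix and sample lines are assumptions for illustration; the real GDE log layout and the Guardium messaging API are not shown here.

```python
import csv
import io
from datetime import datetime

# Assumed column order; the real log layout may differ.
FIELDS = ["timestamp", "user", "process", "action", "object"]


def parse_audit_csv(text):
    """Turn CSV audit lines into dicts with a parsed timestamp."""
    records = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        row["timestamp"] = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        records.append(row)
    return records


# Made-up sample lines for illustration only.
SAMPLE = (
    "2013-11-14 10:00:00,root,/usr/bin/strings,read_denied,/data/cards.csv\n"
    "2013-11-14 10:00:05,hdfs,/usr/bin/java,read,/data/cards.csv\n"
)

records = parse_audit_csv(SAMPLE)
denied = [r for r in records if r["action"].endswith("_denied")]
print(len(records), "records,", len(denied), "denied")
```

Once records are in a common shape like this, filtering for blocked operations or correlating by user and object becomes a simple query rather than a text search.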


Integrate GDE logs with Guardium !

- GDE audit data is now on Guardium and can be correlated with Hadoop Activity Monitoring

- Data is normalized for easy filtering

- Easily integrate with: Alerting, Workflow, Correlation engine, Quick Search, etc


Sample Customer Questions about GDE:

Q1: How does this work at runtime? For example, if a file is encrypted and is used by an MR job, how is the decryption invoked (since MR does not like encrypted files)? Is there an API that one needs to call as part of the application? Or is there a plugin at the DFS layer that is invoked on access?

A1: It is transparent. You simply define which processes and/or users and groups get access. Nothing changes for those “trusted” with access. No application API calls or code changes are required. This is a crucial benefit we are bringing.

Q2: Is the encryption at a block level? Is it sensitive to Hadoop "splits"? How does it work in concert with compression (especially BZ2 and CMX)?

A2: It is at the FS block level. HDFS writes to the local FS blocks. This is what we care about. We don’t care what HDFS does before actually doing the IO (file write/read) to the underlying FS.


Integrate GDE logs with Guardium !

By reading the GDE audit stream and forwarding logs into the Guardium audit database for:

- Alerting to QRadar, Syslog, ArcSight, etc

- Correlate with audit data

- Guardium Quick Search

- Complete risk view

- Analyze blocked operations

- etc


InfoSphere Guardium Goals


InfoSphere Guardium integration with other IBM products

[Figure: InfoSphere Guardium at the center, connected to the following IBM products.]

• Master Data Management: InfoSphere MDM

• Web Application Platform: WebSphere

• Databases: DB2 [LUW, i, z, native agent], Informix, IMS

• Data warehouses: Netezza, PureData, PureFlex

• Big Data: BigInsights

• SIEM: QRadar

• Storage and Archival: Optim Archival, Tivoli Storage Manager

• Endpoint Configuration Assessment and Patch Management: Tivoli Endpoint Manager

• LDAP Directory: Security Directory Server

• Static Data Masking: Optim Data Masking

• Data Discovery/Classification: InfoSphere Discovery, Business Glossary

• Help Desk: Tivoli Maximo

• Event Monitoring: Tivoli Netcool

• Software Distribution: Tivoli Provisioning Manager

• Transaction Application: CICS

• Database tools: Change Data Capture, Query Monitor, Optim Test Data Manager, Optim Capture Replay, InfoSphere DataStage

• Analytic Engines: InfoSphere Sensemaking

• Business Intelligence: Cognos

Connector labels in the figure (their pairing with individual products is not recoverable from the extracted text): open tickets; SNMP alerts; distribute STAPs; remediate vulnerability; send alert, audit, vulnerability; user and group mgmt; monitor end-user activity; leverage capture function; leverage audit change; share discovery & policies; share discovery; share discovery & classif.; monitor, audit, protect; monitor, audit; monitor, audit, archive; archive audit.


Resources

• E-book “Planning a security and auditing deployment for Hadoop”: http://www.ibm.com/software/sw-library/en_US/detail/I804665J74548G31.html

• Big Data Security and Auditing with IBM InfoSphere Guardium: http://www.ibm.com/developerworks/data/library/techarticle/dm-1210bigdatasecurity/

• Data Security best practices: A practical guide to implementing data encryption on IBM InfoSphere BigInsights: http://public.dhe.ibm.com/software/dw/bigdata/bd-datasecuritybp/Encryption_1.4.pdf

• Quick Start Edition for BigInsights: http://www.ibm.com/software/data/infosphere/biginsights/quick-start/


Information, training, and community

InfoSphere Guardium Tech Talks – at least one per month. Suggestions welcome!

InfoSphere Guardium YouTube Channel – includes overviews, technical demos, tech talk replays

InfoSphere Guardium newsletter

developerWorks forum (very active)

Guardium DAM User Group on LinkedIn (very active)

Community on developerWorks (includes discussion forum, content and links to a myriad of sources, developerWorks articles, tech talk materials and schedules)

Guardium Info Center (Installation, System z S-TAPs, how-tos, more to come)

Technical training courses (classroom and self-paced)

InfoSphere Guardium Virtual User Group. Open, technical discussions with other users. Not recorded!

Send a note to [email protected] if interested.


Questions ?