Page 1: Practical Query Governance  & Data Security

PRESENTED BY

Anthony Krinsky
Sales Consultant

Practical Query Governance & Data Security
TCC 2013 UPDATED!

[email protected]

Page 2: Practical Query Governance  & Data Security

©2012 Tableau Software Inc. All rights reserved.

Agenda

• Terms
• Security filtering
• Managing big data
• Drill-down design patterns
• Limiter design patterns

Page 3: Practical Query Governance  & Data Security

Terms

• Data Governance
• Data Security
• Query Governance

Page 4: Practical Query Governance  & Data Security

“Data Governance”

• “The discipline embodies a convergence of data quality, data management, data policies, business process management, and risk management surrounding the handling of data in an organization.” (Wikipedia)

Page 5: Practical Query Governance  & Data Security

In-scope

• Data Security
  • Authentication
  • Data encryption
  • Row-level security
• Query Governance

Page 6: Practical Query Governance  & Data Security

Data Security

• Data encryption
  • Use the database: Tableau will not decrypt
• Tableau provides two methods for authenticating to Server
  • AD (Active Directory) security
  • Local security
• v8 introduced datasource filters
  • Immutable
  • Table row-level filters on saved datasources
  • Security enforced through SQL WHERE clauses: must be modeled

Page 7: Practical Query Governance  & Data Security

Datasource Filters

• Like common filters, but applied to the datasource object directly.

• Useful session security expressions can bind rules to user identities.

• Saved with the datasource
• Can be published with the datasource.

Page 8: Practical Query Governance  & Data Security

Session security expressions

• USERNAME()
  • Username of the logged-in user, otherwise the Windows username. Authors may impersonate other server users.
• ISMEMBEROF(<string literal>)
• FULLNAME()
• USERDOMAIN()
  • Tableau Server domain or Windows domain
• ISFULLNAME(<string literal>)
• ISUSERNAME(<string literal>)

Page 9: Practical Query Governance  & Data Security

Use data server to “bake” security into datasources

Page 10: Practical Query Governance  & Data Security

Data Server

• Control for IT:
  • Users cannot override calculations
  • Users cannot edit joins or connection information
  • Users cannot write SQL
  • Users cannot alter or republish data sources
  • Authentication can be fixed or prompted
• Flexibility for users
  • Can access remotely over HTTP
  • Can write new calculations
  • Can blend desktop data
  • No need to download data/extracts
  • No drivers to install
  • Can leverage power of Tableau Server

Page 11: Practical Query Governance  & Data Security

Group/Role vs. User Filtering

[Diagram: Superstore tables, relationship labeled 1:0 or many]

Restrict to regions the user manages:
[Users].[Manager]=USERNAME()

THIS WILL NOT WORK:
ISMEMBEROF([Users].[Region])

Tableau wants to see a STRING literal here (not a dynamic variable)

Page 12: Practical Query Governance  & Data Security

Dynamic group filtering does NOT work

Works! Does not work!

Page 13: Practical Query Governance  & Data Security

Adding users to groups on Tableau server

Page 14: Practical Query Governance  & Data Security

Use ISMEMBEROF() for role-based permissions

• ISMEMBEROF("<ROLE>")
• Use group membership for capabilities/role-based security:
  • ISMEMBEROF("HR")
  • ISMEMBEROF("FINANCE")
  • ISMEMBEROF("ADMIN")

Page 15: Practical Query Governance  & Data Security

Use USERNAME() for dynamic security

• <NAMEFIELD>=USERNAME() for user or group-level dynamic security

Page 16: Practical Query Governance  & Data Security

User security design pattern

[Diagram: Fact, Table 1 … Table N connected by inner joins (1:many), with USERNAME labeled on the tables]
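A minimal sketch of the user security pattern, written in Python with SQLite standing in for the real database. The table and column names (`orders`, `users`, `manager`, `region`) are hypothetical, and the `username` argument plays the role that USERNAME() plays in a Tableau datasource filter. It only illustrates the idea that an inner join to an entitlement table restricts fact rows per user.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a fact table and an entitlement table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER, region TEXT, sales REAL);
        CREATE TABLE users (manager TEXT, region TEXT);
        INSERT INTO orders VALUES
            (1, 'East', 100.0), (2, 'West', 250.0),
            (3, 'East', 75.0), (4, 'South', 40.0);
        INSERT INTO users VALUES
            ('alice', 'East'), ('bob', 'West'), ('bob', 'South');
        """
    )
    return conn


def rows_visible_to(conn, username):
    """Return only the fact rows whose region is managed by `username`.

    The inner join drops every fact row without a matching entitlement row,
    so a user with no entries in `users` sees nothing.
    """
    sql = """
        SELECT o.order_id, o.region, o.sales
        FROM orders AS o
        INNER JOIN users AS u ON u.region = o.region
        WHERE u.manager = ?
        ORDER BY o.order_id
    """
    return conn.execute(sql, (username,)).fetchall()


conn = build_demo()
print(rows_visible_to(conn, "alice"))    # East rows only
print(rows_visible_to(conn, "bob"))      # West and South rows
print(rows_visible_to(conn, "mallory"))  # no entitlements, no rows
```

Because the join is an inner join, a user with no entitlement rows gets an empty result rather than everything.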

Page 17: Practical Query Governance  & Data Security

Blend possible, but not recommended

• Filtering on secondary dimensions is supported in v8.
• But…
  • Defined for the view: can be easily defeated or misapplied by the author.
  • Can explode memory footprint if the security table is too large.

[Diagram: Fact blended with a Security Table (in another datasource) on USERNAME]

Page 18: Practical Query Governance  & Data Security

View-level filter test

Page 19: Practical Query Governance  & Data Security

Database enforced join

Page 20: Practical Query Governance  & Data Security

Make immutable with Datasource filters

Page 21: Practical Query Governance  & Data Security

Works same as local filter but tamper-proof

Page 22: Practical Query Governance  & Data Security

Impersonation different for published datasource

• Note that when publishing the datasource, Desktop user impersonation will stop working.

• To test different users, edit datasource connection.

Page 23: Practical Query Governance  & Data Security

Big Data: what can possibly go wrong?

[Illustration: the data, a data source, and an innocent Tableau user: “just drag and…”]

Page 24: Practical Query Governance  & Data Security

When too much data becomes a problem

1. Reports take too long to render
2. Report interactions are too sluggish
3. Tableau (often Server) is unstable
  • Memory issues may be latent.
  • You may not notice a problem until reports stop processing (spinners on one or more panes of a dashboard)
4. Long-running queries are outraging the DBA

Page 25: Practical Query Governance  & Data Security

What is a lot of marks?

• Tableau renders images fast.
• 1 million marks is fast, fluid, natural.
• 5 million?
• 1 trillion?

• Can you make sense of 1 million marks???

Page 26: Practical Query Governance  & Data Security

What is a lot of data?

• Report complexity, query working set size, and cache settings affect memory footprint.
• Render time is not always a good indicator of memory utilization (e.g. table calcs are fast).
• v8.0 VizQL processes are 32-bit processes (2 GB or 4 GB per process).
• 8.1 with 64-bit (16 TB) practically erases limits.
• As a rule of thumb:
  • 100 MB is a lot of data for a view to process.
  • If caching is enabled, 10 MB per view adds up quickly.

Page 27: Practical Query Governance  & Data Security

Sizing query results

• Function of aggregation, data types and factorial of dimensions and rows.

• Database profiling tools can often tell you exactly how much data is being transferred/requested.

• Check size/cardinality of dimensions using “describe field” -> “load”
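A rough way to reason about result size before running a query, sketched in Python. The function names are hypothetical, and the 80 bytes per row figure is an assumption chosen for illustration; real row widths depend on data types and driver. The idea is that an aggregated query cannot return more rows than the product of the cardinalities of its dimensions, nor more rows than the fact table holds.

```python
from math import prod


def estimate_result_rows(dimension_cardinalities, fact_rows):
    """Upper bound on rows returned by a query grouped on the given dimensions.

    The bound is the number of possible dimension combinations, capped by
    the number of rows in the fact table.
    """
    return min(prod(dimension_cardinalities), fact_rows)


def estimate_result_bytes(rows, bytes_per_row):
    """Approximate transfer size for `rows` rows of an assumed average width."""
    return rows * bytes_per_row


# Region (4) x month (12) x category (50): a small, safe result.
small = estimate_result_rows([4, 12, 50], fact_rows=1_000_000)
print(small, estimate_result_bytes(small, 80))

# Swap category for a 100,000-value SKU dimension: the bound hits the fact table size.
large = estimate_result_rows([4, 12, 100_000], fact_rows=1_000_000)
print(large, estimate_result_bytes(large, 80))
```

This is only an upper bound; database profiling tools give the actual figure.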

Page 28: Practical Query Governance  & Data Security

Specifics: MS SQL “Client Statistics”

Example: 1 million row fact table query = 80 MB

Page 29: Practical Query Governance  & Data Security

Cardinality of dimensions (high = lots)

Page 30: Practical Query Governance  & Data Security

How to blow up a VizQL process?

• You’re rarely going to see it on million-row datasets, but as results get larger, it’s possible…
• Drag and drop a high-cardinality dimension in a DW
  • e.g. SKUs, names, or IDs
• (Inadvertently) tell Tableau NOT to aggregate huge queries.
• Blending, per se, will not constrain the size of the secondary query.
• Table calculations are applied AFTER data is returned.
• A “top N” filter on “index()” requires all rows to be retrieved FIRST.

Page 31: Practical Query Governance  & Data Security

Caution: your report may run fine in Desktop!

• Yes, memory-hogging reports may “just work” on your laptop.

• Desktop = dedicated server, per user.
• Desktop processing is easier: no HTML to render.
• Server does not always release memory (especially if “refresh less often” is enabled)

Page 32: Practical Query Governance  & Data Security

Use common sense

• Can you make sense of 1 million points?
• How many rows does your report require to render 1,000 marks?
• If queries require 10x as many rows, or more, you may have a report design issue.
• If your data set has 1 million rows or fewer, it probably doesn’t matter.
• If it’s more, consider defensive maneuvers.
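The 10x rule of thumb can be turned into a quick check, sketched here in Python. The function names and the way you obtain the two counts (for example from a database profiler and from the view's mark count) are assumptions for illustration.

```python
def rows_per_mark(rows_returned, marks_rendered):
    """Ratio of rows fetched from the database to marks drawn in the view."""
    if marks_rendered <= 0:
        raise ValueError("marks_rendered must be positive")
    return rows_returned / marks_rendered


def looks_wasteful(rows_returned, marks_rendered, threshold=10.0):
    """True when the query returns `threshold` times as many rows as marks, or more."""
    return rows_per_mark(rows_returned, marks_rendered) >= threshold


print(looks_wasteful(50_000, 1_000))  # 50 rows per mark: worth a design review
print(looks_wasteful(3_000, 1_000))   # 3 rows per mark: probably fine
```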

Page 33: Practical Query Governance  & Data Security

Tableau Warnings: Informed Consent

Page 34: Practical Query Governance  & Data Security

Take precautions

• Defensive data source definition
  • Be mindful of high-cardinality dimensions
  • Avoid index() filters (post query)
  • Avoid blends (post query)
    • Or keep secondary data sources small
  • By default, show no/few records
    • Use data source filters/parameters
• Summary tables
• Consider extracts
• Use summary-detail design pattern
• In-database query governance

Page 35: Practical Query Governance  & Data Security

Summary Tables

• Summarize data you will query often using aggregate functions (SUM, AVG, MIN, MAX) and GROUP BY on dimensions.
• Create summary tables in the database (materialized views make this easy)
• Extracts are a great option (up to 1 billion rows)
  • Remember to use the “aggregate” option on creation.
  • You are generally “safe” with extracts (memory-backed file architecture)
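A small sketch of building a summary table with aggregate functions and GROUP BY, using Python with SQLite as a stand-in database. The `sales` and `sales_summary` tables and their columns are hypothetical; in a production warehouse a materialized view or an ETL job would typically do this work.

```python
import sqlite3


def build_summary(conn):
    """Rebuild a summary table with one row per (region, category)."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS sales_summary;
        CREATE TABLE sales_summary AS
        SELECT region,
               category,
               COUNT(*)   AS order_count,
               SUM(sales) AS total_sales,
               AVG(sales) AS avg_sales,
               MIN(sales) AS min_sales,
               MAX(sales) AS max_sales
        FROM sales
        GROUP BY region, category;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, category TEXT, sales REAL)")

# 4,000 detail rows that collapse into 3 distinct (region, category) groups.
detail = [
    ("East", "Chairs", 100.0),
    ("East", "Chairs", 50.0),
    ("East", "Tables", 200.0),
    ("West", "Chairs", 30.0),
] * 1000
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", detail)

build_summary(conn)
detail_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM sales_summary").fetchone()[0]
print(detail_rows, summary_rows)
```

A view pointed at `sales_summary` scans a handful of rows instead of the full detail table, while still answering totals, averages, and extremes per group.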

Page 36: Practical Query Governance  & Data Security

Summary Extracts

Page 37: Practical Query Governance  & Data Security

Summary-Detail Design Pattern

Page 38: Practical Query Governance  & Data Security

Summary-Detail filter action

• Make sure to “Exclude all values” when not selected

Page 39: Practical Query Governance  & Data Security

When page loads, no detail shown.

Page 40: Practical Query Governance  & Data Security

Database Top N

• Syntax varies by database

• MySQL, Postgres, Vertica, Pivotal, Netezza syntax
  • SELECT column_name(s)
    FROM table_name
    LIMIT number
• Oracle syntax
  • SELECT column_name(s)
    FROM table_name
    WHERE ROWNUM <= number;
• SQL Server / MS Access / Excel syntax
  • SELECT TOP number column_name(s)
    FROM table_name;

Page 41: Practical Query Governance  & Data Security

Excel example

• SELECT TOP 1000

Page 42: Practical Query Governance  & Data Security

Top N in Tableau

• Simple T/F on number of rows
  • Does not work!
• Top N on a dimension
  • Works!
  • But not immutable
• Sets & T/F calculated field
  • Works!
  • Can bind to data source filter (immutable)
• Using index()<top N
  • Works!
  • But does not filter.

Page 43: Practical Query Governance  & Data Security

But remember… order of evaluation

1. Custom SQL or DB view
2. Datasource Filters
3. Context Filters
4. Filter Shelf (no context)
5. Where you would want a LIMIT function (if other filters)

# = order of evaluation. Are we filtering on a pre-limited subset???

[Diagram labels: “Where you can add one easily”; Top N on dim; Top N with sets; Top N or LIMIT in SQL]

Page 44: Practical Query Governance  & Data Security

What about 100% Custom SQL filters

1. Custom SQL or DB view
2. Datasource Filters
3. Context Filters
4. Filter Shelf (no context)

But to get accurate results, you can’t use these filters AFTER the limit has been applied. OK if logical. Less OK if arbitrary.
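Why filtering after a limit gives inaccurate results can be seen in a small sketch, written in Python with SQLite. The `orders` table is hypothetical. The first function applies the limit in an inner query (as custom SQL or a view would) and filters afterwards (as the filter shelf would); the second filters first and then limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "East", 500.0),
        (2, "East", 400.0),
        (3, "East", 300.0),
        (4, "West", 200.0),
        (5, "West", 100.0),
    ],
)


def limit_then_filter(conn, region, n):
    """Top n orders overall, then keep only the requested region."""
    sql = """
        SELECT order_id FROM (
            SELECT order_id, region FROM orders ORDER BY sales DESC LIMIT ?
        )
        WHERE region = ?
        ORDER BY order_id
    """
    return [row[0] for row in conn.execute(sql, (n, region))]


def filter_then_limit(conn, region, n):
    """Keep only the requested region, then take its top n orders."""
    sql = """
        SELECT order_id FROM orders
        WHERE region = ?
        ORDER BY sales DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (region, n))]


print(limit_then_filter(conn, "West", 3))  # West rows never made the top 3 overall
print(filter_then_limit(conn, "West", 3))  # West rows are found when filtered first
```

The West region has data, but the limit-first query reports none, because the limit discarded those rows before the filter ran.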

Page 45: Practical Query Governance  & Data Security

So… Top N as a governor: pick 2

Method                          | Immutable | Accurate results (with other filters/context) | Reduces # rows
Sets & T/F calc                 | YES       | NO                                            | YES
Custom SQL or view              | YES       | NO                                            | YES
“Standard” Top N on dimension   | NO        | YES                                           | YES
Index() < top N & T/F calc      | NO        | YES                                           | NO

Page 46: Practical Query Governance  & Data Security

A note about context filters…

• By default, context filters write results to temp tables
• With big data… writing temp tables can be slow (depending on database)
• On ODBC and some databases, you can disable temp table generation through the datasource XML.

<customization name='CAP_CREATE_TEMP_TABLES' value='no' />
<customization name='CAP_SELECT_INTO' value='no' />

Page 47: Practical Query Governance  & Data Security

So…

• We can use Top N for governance, but then we won’t necessarily get the right results.

• But the methods that do get the right results are either defeatable by the author (not bound to the datasource, so not immutable) or do not actually limit rows (table calcs).

• Can you constrain your query non-arbitrarily?

Page 48: Practical Query Governance  & Data Security

Key insights so far…

• Be thoughtful in exposing and authoring with large dimensions without summarization

• Consider non-arbitrary (i.e. not Top N) filters/parameters to constrain dimensions to a sensible domain

  • Datasource parameters (new in 8.0)
  • Custom SQL, Views
  • Stored procedures (new in 8.1)
  • SAP BW variables

• Leave run-away query governance to the database itself – if your database supports it.

• Then, detect when the database is truncating results and indicate if necessary.

Page 49: Practical Query Governance  & Data Security

In-database query governance

1. Custom SQL or DB view
2. Datasource Filters
3. Context Filters
4. Filter Shelf (no context)
5. In-database query governor for Tableau user

# = order of evaluation

Truncates but does not introduce inaccuracies

Page 50: Practical Query Governance  & Data Security

In-database query governors

• Elephant in the room:
  • Tableau does not have a mid-tier query governor to wrap the final SQL with a limit
  • And… most but not all query governors are resource-constrained, not row-constrained
• Bullet-proof, but functionality does not exist for all databases, I think.
• Set initial SQL coming to more databases in 8.2 (supported in Teradata today)
• Work with your DBA to set up resource limits
• I welcome your feedback: [email protected]

Page 51: Practical Query Governance  & Data Security

When truncating, let the user know!

• Create calculated field TOTAL(SUM([Number of Records])) that triggers when limit is reached. Simply display in title.
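The logic behind such an indicator is simple, and is sketched here in Python rather than as a Tableau calculation. The function name and message wording are hypothetical; `row_count` corresponds to the total number of records in the view and `row_limit` to the limit the database governor enforces.

```python
def truncation_notice(row_count, row_limit):
    """Return title text telling the user whether the result hit the row limit."""
    if row_count >= row_limit:
        # Reaching the limit exactly most likely means rows were cut off.
        return f"Showing first {row_limit:,} rows; results may be truncated"
    return f"Showing all {row_count:,} rows"


print(truncation_notice(1000, 1000))
print(truncation_notice(250, 1000))
```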

Page 52: Practical Query Governance  & Data Security

Separate indicator view

• Requires all main view filters to be applied!

Page 53: Practical Query Governance  & Data Security

Indicator with record Count < or > 1,000

Page 54: Practical Query Governance  & Data Security

If not bullet-proof, be informed

• If immutable row-count limits/constraints are not available in your database
  • Train your users to be respectful of large dimensions
  • Encourage use of parameters/filters that will non-arbitrarily constrain large dimensions.
  • Avoid blends and table calcs with high-cardinality dimensions.
  • Carefully review your report design and identify where your queries are returning “too many” rows for the task at hand (more than 10x should be a dead giveaway)

Page 55: Practical Query Governance  & Data Security

Big Data Governance

• Summary tables
  • Materialized views or ETL
  • Tableau extracts
• Other guidance
  • Try to ensure that run-away queries are governed by the database itself: bullet-proof limits may be available for your DB
  • And notify the user, if possible, when limits are reached.
  • Be careful of Top N when used with other filters
    • Understand data context
  • Educate users about querying high-cardinality dimensions

Page 56: Practical Query Governance  & Data Security

Questions