Zeppelin Meetup 2016 Madrid


Advanced features of Apache Zeppelin

http://zeppelin.apache.org

Jongyoul Lee

PMC member of Apache Zeppelin since Sep. 2015.

Software Development Engineer at NFLabs

Advanced?

• Helium
  • A new extension for visualization
• Multi-user features
  • Users & Permissions
  • Per user/Per note & Shared/Scoped/Isolated
• Futures
  • Impersonation & Personalized mode
  • Scalability & Reliability

Helium

Zeppelin

Visualizations: 6 built-in visualizations come with pivot

Table, Bar, Pie, Area, Line, Scatter

Free to draw any customized visualization inside a notebook

Helium
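Each built-in chart is drawn from the same pivot over a paragraph's tabular result. A minimal sketch of that pivot step, assuming hypothetical names (`pivot`, `key`, `group`, `value`) and rows given as Python dicts; it illustrates the idea and is not Zeppelin's front-end code:

```python
from collections import defaultdict

def pivot(rows, key, group, value, agg=sum):
    """Pivot a list of dict rows into {key: {group: aggregated value}}.

    Hypothetical helper: `key` becomes the x-axis, each distinct `group`
    value becomes one series, and `value` is combined with `agg`.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        buckets[row[key]][row[group]].append(row[value])
    # Collapse each list of values into a single aggregated number.
    return {k: {g: agg(vs) for g, vs in groups.items()}
            for k, groups in buckets.items()}

rows = [
    {"year": 2015, "city": "Madrid", "visits": 3},
    {"year": 2015, "city": "Seoul", "visits": 5},
    {"year": 2016, "city": "Madrid", "visits": 4},
    {"year": 2016, "city": "Madrid", "visits": 2},
]
print(pivot(rows, key="year", group="city", value="visits"))
```

Swapping `agg` (for example `len` or `max`) changes the aggregation without touching the grouping logic, which is why one pivot can feed several chart types.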

Diagram (Helium components):

Interpreter: Spark, Flink, Geode, JDBC …

Notebook Storage: File System, Amazon S3, Git …

Application: Visualizations, Map, WordCloud, Analytics …

Resource Pool: SparkContext, Flink Environment, JDBC connection …, User object

Extend pluggable visualization to pluggable analytics applications

Work in progress to make visualization pluggable
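The Helium diagram pairs applications with a resource pool that holds objects such as a SparkContext, a JDBC connection, or any user object. A minimal sketch of how a pluggable application could be offered only when every resource type it needs is present in the pool; `ResourcePool`, `App`, `runnable_apps`, and the stand-in classes are hypothetical names, not Zeppelin's API:

```python
from dataclasses import dataclass, field

@dataclass
class ResourcePool:
    """Hypothetical pool mapping a resource name to the pooled object."""
    resources: dict = field(default_factory=dict)

    def put(self, name, obj):
        self.resources[name] = obj

    def type_names(self):
        # Type names of all pooled objects, used to match app requirements.
        return {type(obj).__name__ for obj in self.resources.values()}

@dataclass
class App:
    """Hypothetical pluggable application declaring the resource types it needs."""
    name: str
    requires: frozenset

def runnable_apps(apps, pool):
    """Return names of apps whose required resource types are all in the pool."""
    available = pool.type_names()
    return [app.name for app in apps if app.requires <= available]

class SparkContext:  # stand-in class for illustration only, not pyspark's
    pass

class Connection:  # stand-in for a JDBC connection
    pass

pool = ResourcePool()
pool.put("sc", SparkContext())
apps = [
    App("wordcloud", frozenset({"SparkContext"})),
    App("sql-explorer", frozenset({"Connection"})),
]
print(runnable_apps(apps, pool))  # ['wordcloud']
```

Matching on resource types rather than on interpreter names keeps applications independent of which interpreter produced the object.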

Users and Permissions

• Complaints from companies

• Why does security work …

• Why does authentication work …

• Why does Zeppelin store my password as plain …

• Why do two users use the same Spark …

• Why do I wait while others run something

& Enterprise

Authentication: Integrated with Apache Shiro

Contributions:

• PAM
• ActiveDirectory
• JDBC
• JNDI
• LDAP
• Properties
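Shiro lets several realms be configured side by side. A minimal sketch of trying an ordered list of realms and stopping at the first one that accepts the credentials, similar in spirit to Shiro's FirstSuccessfulStrategy; `PropertiesRealm`, `LdapRealmStub`, and `login` are hypothetical Python stand-ins, not Shiro's Java API. The properties-style realm stores a salted PBKDF2 digest instead of the plain password:

```python
import hashlib
import hmac
import os

def hash_password(password, salt, rounds=100_000):
    """PBKDF2-SHA256 digest, so the realm never keeps the plain password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)

class PropertiesRealm:
    """Hypothetical file-style realm holding (salt, digest) per user."""
    def __init__(self):
        self.users = {}

    def add_user(self, user, password):
        salt = os.urandom(16)
        self.users[user] = (salt, hash_password(password, salt))

    def authenticate(self, user, password):
        if user not in self.users:
            return False
        salt, digest = self.users[user]
        # Constant-time comparison avoids leaking timing information.
        return hmac.compare_digest(digest, hash_password(password, salt))

class LdapRealmStub:
    """Hypothetical stand-in for a directory-backed realm (LDAP, AD, ...)."""
    def __init__(self, directory):
        self.directory = directory

    def authenticate(self, user, password):
        return self.directory.get(user) == password

def login(realms, user, password):
    """Try realms in order; return the index of the first that accepts, else None."""
    for i, realm in enumerate(realms):
        if realm.authenticate(user, password):
            return i
    return None

props = PropertiesRealm()
props.add_user("admin", "s3cret")
ldap = LdapRealmStub({"user1": "pw1"})
realms = [props, ldap]
print(login(realms, "admin", "s3cret"), login(realms, "user1", "pw1"))
```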


Notebook Authorization: Owners, Writers, Readers per note
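A minimal sketch of the per-note checks, assuming (from my reading of Zeppelin's behavior; treat it as an assumption) that an empty set means "everyone" and that higher roles imply lower ones: an owner can write and read, a writer can read. `NoteAuth`, `is_owner`, `is_writer`, and `is_reader` are hypothetical names, and `entities` is the user name plus that user's roles:

```python
from dataclasses import dataclass, field

@dataclass
class NoteAuth:
    """Hypothetical per-note permissions; an empty set means 'everyone'."""
    owners: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    readers: set = field(default_factory=set)

def _member(entities, allowed):
    # Assumption: an empty permission set leaves the note open to all.
    return not allowed or bool(entities & allowed)

def is_owner(auth, entities):
    return _member(entities, auth.owners)

def is_writer(auth, entities):
    # Owners can always write.
    return _member(entities, auth.writers) or _member(entities, auth.owners)

def is_reader(auth, entities):
    # Writers and owners can always read.
    return (_member(entities, auth.readers)
            or _member(entities, auth.writers)
            or _member(entities, auth.owners))

auth = NoteAuth(owners={"user1"}, writers={"user2"}, readers={"user3"})
print(is_writer(auth, {"user1"}), is_writer(auth, {"user3"}))
```

Passing roles inside `entities` lets a role such as a team name grant access without listing every user.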


Multi-tenancy: Per user/Per note & Shared/Scoped/Isolated

            SHARED   ISOLATED   SCOPED
PROCESS     1        N          1
THREADS     1        1          N

Multi-tenancy
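One way to read the table, taking "THREADS" as interpreter instances per process (an assumption): shared mode runs one process with one instance for everybody, isolated mode starts one process per note or user, and scoped mode keeps one process but creates one instance per note or user inside it. A minimal sketch of that mapping; `Mode`, `placement`, `count`, and the key strings are hypothetical, not Zeppelin's internal API:

```python
from enum import Enum

class Mode(Enum):
    SHARED = "shared"
    SCOPED = "scoped"
    ISOLATED = "isolated"

def placement(mode, owner):
    """Return (process_key, instance_key) for a note or user `owner`.

    Hypothetical keys: equal process keys share one interpreter process,
    equal instance keys share one interpreter instance.
    """
    if mode is Mode.SHARED:
        return ("shared", "shared")
    if mode is Mode.SCOPED:
        return ("shared", owner)  # one process, one instance per owner
    return (owner, owner)         # one process per owner

def count(mode, owners):
    """Return (number of processes, interpreter instances per process)."""
    per_process = {}
    for owner in owners:
        process, instance = placement(mode, owner)
        per_process.setdefault(process, set()).add(instance)
    return len(per_process), max(len(s) for s in per_process.values())

owners = ["NoteA", "NoteB", "NoteC"]
for mode in Mode:
    print(mode.value, count(mode, owners))
```

With three notes this reproduces the table with N = 3: shared (1, 1), scoped (1, 3), isolated (3, 1).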

Diagram: ZeppelinServer with one SparkInterpreter. User1 runs P1 on NoteA and User2 runs P2 on NoteB; both paragraphs run on the same SparkInterpreter.

Shared

• Originally implemented

• Pros
  • Simple structure
  • Predictable behavior

• Cons
  • All resources shared
  • Interference among users

Shared
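The interference in shared mode (and the complaint about waiting while others run something) follows from paragraphs queueing on one interpreter. A minimal sketch, assuming a first-in-first-out queue per interpreter instance, all paragraphs submitted at t=0, and made-up run times; `finish_times` is a hypothetical helper:

```python
def finish_times(jobs, queue_of):
    """Simulate FIFO queues: jobs are (name, user, seconds) in submit order.

    `queue_of(user)` picks the queue a job lands on; each queue runs one
    job at a time, and every job is assumed to be submitted at t=0.
    """
    clock = {}     # queue -> time at which it becomes free
    finished = {}  # job name -> finish time
    for name, user, seconds in jobs:
        q = queue_of(user)
        clock[q] = clock.get(q, 0) + seconds
        finished[name] = clock[q]
    return finished

# Made-up durations: User1's long job, then User2's short one.
jobs = [("P1", "User1", 60), ("P2", "User2", 5)]
print(finish_times(jobs, lambda user: "shared"))  # {'P1': 60, 'P2': 65}
print(finish_times(jobs, lambda user: user))      # {'P1': 60, 'P2': 5}
```

With one shared queue, User2's 5-second paragraph finishes only after User1's 60-second one; with a queue per user it finishes at t=5.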

Diagram: ZeppelinServer with two SparkInterpreters. User1 runs P1 on NoteA and User2 runs P2 on NoteB; each paragraph runs on its own SparkInterpreter.

Isolated

• Pros
  • No pending
  • No resources shared

• Cons
  • Lots of memory
  • Inefficient memory use
  • Limited by resources

Isolated
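"Lots of memory" is simple arithmetic: every isolated process pays the fixed cost of its own JVM and interpreter, while scoped instances share one process. A back-of-the-envelope sketch with hypothetical figures (1024 MB base cost per process, 256 MB per interpreter instance) chosen only for illustration; `memory_mb` is a hypothetical helper:

```python
def memory_mb(mode, users, base_mb=1024, per_instance_mb=256):
    """Rough memory estimate; base_mb and per_instance_mb are made-up figures."""
    if mode == "shared":
        # One process, one instance for everybody.
        return base_mb + per_instance_mb
    if mode == "scoped":
        # One process, one instance per user.
        return base_mb + users * per_instance_mb
    if mode == "isolated":
        # One full process (base cost included) per user.
        return users * (base_mb + per_instance_mb)
    raise ValueError(f"unknown mode: {mode}")

for mode in ("shared", "scoped", "isolated"):
    print(mode, memory_mb(mode, users=10))
```

For 10 users these figures give 1280 MB shared, 3584 MB scoped, and 12800 MB isolated: the per-process base cost is what isolated mode multiplies.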

Diagram: ZeppelinServer with one JDBCInterpreter holding a JDBCInstance for User1 and a JDBCInstance for User2. User1 runs P2 on NoteA and User2 runs P3 on NoteB.

Scoped

• Pros
  • Less memory
  • Some resources isolated

• Cons
  • Some resources shared
  • Big single process

Scoped
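The scoped diagram shows one JDBCInterpreter process holding a separate JDBCInstance per user. A minimal sketch, assuming a hypothetical `ScopedInterpreter` that lazily creates one instance per user inside a single process and lets different users run concurrently on threads; `FakeJdbcInstance` stands in for a real connection:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeJdbcInstance:
    """Hypothetical stand-in for a per-user JDBC connection."""
    def __init__(self, user):
        self.user = user
        self.queries = []

    def run(self, sql):
        self.queries.append(sql)
        return f"{self.user}: {sql}"

class ScopedInterpreter:
    """Hypothetical scoped interpreter: one process, one instance per user."""
    def __init__(self):
        self._instances = {}
        self._lock = threading.Lock()

    def instance_for(self, user):
        # The lock ensures two threads never create two instances for one user.
        with self._lock:
            if user not in self._instances:
                self._instances[user] = FakeJdbcInstance(user)
            return self._instances[user]

    def run(self, user, sql):
        return self.instance_for(user).run(sql)

interp = ScopedInterpreter()
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(interp.run, ["User1", "User2"], ["SELECT 1", "SELECT 2"]))
print(results)  # ['User1: SELECT 1', 'User2: SELECT 2']
```

Per-user state stays separate, but both users still live in the same process, which is where "some resources shared" and "big single process" come from.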


• By 0.7.0
  • Impersonation of JDBC/Spark interpreters
  • Personalized mode

• 0.7.0 and beyond
  • Scalability & Reliability
  • …

& Futures
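Impersonation means an interpreter runs with the identity of the logged-in user rather than the Zeppelin server's account. A minimal sketch, assuming a hypothetical `launch_command` helper: for Spark it uses spark-submit's `--proxy-user` option, and for other interpreters it swaps in a plain `sudo -u <user>` wrapper. Neither is presented as Zeppelin's actual mechanism:

```python
import shlex

def launch_command(interpreter, user, base_cmd):
    """Build an argv that runs `base_cmd` as `user` (hypothetical helper).

    Spark: pass spark-submit's --proxy-user so the cluster sees `user`.
    Others: wrap the command with `sudo -u user` on the local machine.
    """
    argv = shlex.split(base_cmd)
    if interpreter == "spark":
        # Insert the option right after the spark-submit executable.
        return argv[:1] + ["--proxy-user", user] + argv[1:]
    return ["sudo", "-u", user] + argv

print(launch_command("spark", "user1", "spark-submit --class Main app.jar"))
print(launch_command("jdbc", "user2", "java -cp jdbc.jar Main"))
```

Returning an argv list rather than a single shell string avoids quoting problems when the user name or arguments contain special characters.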

Thank you

Jongyoul Lee jongyoul@nflabs.com

@madeng