32
User Defined Functions … new in Apache Cassandra 3.0 CASSANDRA-7395 + many more…

User defined-functions-cassandra-summit-eu-2014

Embed Size (px)

Citation preview

Page 1: User defined-functions-cassandra-summit-eu-2014

User Defined Functions… new in Apache Cassandra 3.0

CASSANDRA-7395 + many more…

Page 2: User defined-functions-cassandra-summit-eu-2014

Me, Robert…• Contribute code to Apache Cassandra (UDFs, row cache + more)

• Help customers to build Cassandra solutions

• Freelancer, Coder

Robert Stupp, Cologne, Germany @snazy [email protected]

Page 3: User defined-functions-cassandra-summit-eu-2014

Disclaimer• Apache Cassandra 3.0 is the next major release

• Everything is under development

• Things may change

• Things may be different in final 3.0 release

Page 4: User defined-functions-cassandra-summit-eu-2014

Apache Cassandra 3.0• … will bring a lot of cool, new features !

• … will bring a lot of great improvements !

• UDFs is just one of these features :)

Page 5: User defined-functions-cassandra-summit-eu-2014

UDF - What’s that??

Page 6: User defined-functions-cassandra-summit-eu-2014

UDF• UDF means User Defined Function

• You write the code that’s executed on Cassandra nodes

• Functions are distributed transparently to the whole cluster

• You may not have to wait for a new release for new functionality :)

Page 7: User defined-functions-cassandra-summit-eu-2014

UDF Characteristics• „Pure“

• just input parameters

• no state, side effects, dependencies to other code, etc

• Usually deterministic

Page 8: User defined-functions-cassandra-summit-eu-2014

Consider a Java function like…import  nothing;

public  final  class  MyClass  {      public  static  int  myFunction  (  int  argument  )      {          return  argument  *  42;      }  }

This would be your UDF

Page 9: User defined-functions-cassandra-summit-eu-2014

Java works out of the box!

CREATE  FUNCTION  sinPlusFoo(          valueA        double,          valueB        double)RETURNS  doubleLANGUAGE  javaAS  ’return  Math.sin(valueA)  +  valueB;’;

Example

Java code

define the UDF language

arguments

return type

Page 10: User defined-functions-cassandra-summit-eu-2014

JavaScript works out of the box!

Cassandra 3.0 targets Java8 - so it’s „Nashorn“

CREATE  FUNCTION  sin  (          value        double  )RETURNS  doubleLANGUAGE  javascriptAS  ’Math.sin(value);’;

Next Example

Javascript code

JavaScript works, too!

Page 11: User defined-functions-cassandra-summit-eu-2014

JSR 223• “Scripting for the Java Platform“

• UDFs can be written in Java and JavaScript

• Optionally: Groovy, JRuby, Jython, Scala

• Not: Clojure (JSR 223 implementation’s wrong)

Page 12: User defined-functions-cassandra-summit-eu-2014

Behind the scenes• Builds Java (or script) source

• Compiles that code (Java class, or compiled script)

• Loads the compiled code

• Migrates the function to all other nodes

• Done - UDF is executable on any node

Page 13: User defined-functions-cassandra-summit-eu-2014

Types for UDFs• Support for all Cassandra types for arguments and return value

• All means

• Primitives (boolean, int, double, uuid, etc)

• Collections (list, set, map)

• Tuple types, User Defined Types

Page 14: User defined-functions-cassandra-summit-eu-2014

UDF - For what?

Page 15: User defined-functions-cassandra-summit-eu-2014

UDF invocation

Now your application can sum two values in one row - or create the sin of a value!GREAT NEW FEATURES!

Okay - not really…

SELECT  sumThat  (  colA,  colB  )        FROM  myTable        WHERE  key  =  ...

SELECT  sin  (  foo  )        FROM  myCircle        WHERE  pk  =  ...

Page 16: User defined-functions-cassandra-summit-eu-2014

UDFs are good for…• UDFs on their own are just „nice to have“

• Nothing you couldn’t do better in your application

Page 17: User defined-functions-cassandra-summit-eu-2014

Real Use Case for UDFs ?

Page 18: User defined-functions-cassandra-summit-eu-2014

User Defined Aggregates !CASSANDRA-8053

Page 19: User defined-functions-cassandra-summit-eu-2014

User Defined Aggregates

Use UDFs to code your own aggregation functions

(Aggregates are things like SUM, AVG, MIN, MAX, etc) Aggregates :

consume values from multiple rows & produce a single result

Page 20: User defined-functions-cassandra-summit-eu-2014

CREATE  AGGREGATE  minimum  (  int  )        STYPE  int        SFUNC  minimumState;

Example

Syntax similar to Postgres.

name of the aggregate function argument types

name of the “state“ UDF

state type

Page 21: User defined-functions-cassandra-summit-eu-2014

How an aggregate worksSELECT  minimum  (  val  )  FROM  foo …

1. Initial state is set to null

2. for each row the state function is called withcurrent state and column value - returns new state

3. After all rows the aggregate returns the last state

Page 22: User defined-functions-cassandra-summit-eu-2014

CREATE  AGGREGATE  average  (  int  )        SFUNC  averageState        STYPE  tuple<long,int>        FINALFUNC  averageFinal        INITCOND  (0,  0);

More sophisticated

FINALFUNC + INITCOND are optional initial state value

UDF called after last row

Page 23: User defined-functions-cassandra-summit-eu-2014

How that aggregate worksSELECT  average  (  val  )  FROM  foo …

1. Initial state is set to (0,0)

2. for each row the state function is called withcurrent state + column value - returns new state

3. After all rows the final function is called with last state

4. final function calculates the aggregate

Page 24: User defined-functions-cassandra-summit-eu-2014

Now everybody can execute evil code on your cluster :)

Page 25: User defined-functions-cassandra-summit-eu-2014

UDF permissions• There will be permissions to restrict (allow)

• UDF creation (DDL)

• UDF execution (DML)

CASSANDRA-7557

Page 26: User defined-functions-cassandra-summit-eu-2014

Built in functions• All known built-in functions are called native functions

• Native functions belong to SYSTEM keyspace

• Native functions cannot be modified (or dropped)

Note: you already know native functions like now, count, unixtimestampof

Page 27: User defined-functions-cassandra-summit-eu-2014

UDF belong to a keyspace• User Defined Functions and

• User Defined Aggregate

• belong to a keyspace

• SYSTEM keyspace is searched first for functions (then the current keyspace) if function/aggregate is not fully qualified

Page 28: User defined-functions-cassandra-summit-eu-2014

UDF - some final words…Keep in mind:

• JSR-223 has overhead - Java UDFs are much faster

• Do not allow everyone to create UDFs (in production)

• Keep your UDFs “pure“

• Test your UDFs and user defined aggregates thoroughly

Page 29: User defined-functions-cassandra-summit-eu-2014

For the geeks :)• UDFs and user defined aggregates are executed on the coordinator node

• Prefer to use Java-UDFs for performance reasons

Page 30: User defined-functions-cassandra-summit-eu-2014

Let a man dream…UDFs could be useful for…

• Functional indexes

• Partial indexes

• Filtering

• Distributed GROUP BY

• etc etc

Page 31: User defined-functions-cassandra-summit-eu-2014

Q & A

THANK YOU FOR YOUR ATTENTION :)

Robert Stupp, Cologne, Germany

@snazy [email protected]

Page 32: User defined-functions-cassandra-summit-eu-2014