Cassandra Summit 2015 - Building a multi-tenant API PaaS with DataStax Enterprise Search

http://cassandrasummit-datastax.com/agenda/?source=



Agenda

1. Introduction
2. Persistence needs of an API PaaS
3. Selecting DataStax Enterprise Search
4. Main challenges and solutions
5. Conclusion
6. Q&A

Introduction

● Jérôme Louvel
  ○ founder & CTO of Restlet, Web API platform vendor
  ○ created Restlet Framework, first REST framework in 2004
  ○ contributor to “RESTful Web Services” (O’Reilly, 2007)
  ○ member of the JAX-RS 1.0 expert group (2007 - 2009)
  ○ co-author of “Restlet in Action” (Manning, 2012)
  ○ InfoQ editor covering Web APIs since 2014

● Guillaume Blondeau
  ○ DevOps engineer at Restlet
  ○ working on APISpark cloud platform
  ○ Cassandra Administrator certified by DataStax

About the Speakers


● Key features
  ○ visual creation & deployment of data APIs
  ○ operation of APIs & their local data sources
  ○ management of any API

● Benefits
  ○ accessible via web browser, no technical expertise required
  ○ companies of any size can become API providers
  ○ get started for free, then pay when the API generates traffic

About APISpark

Persistence Needs of an API PaaS

High Availability of APIs and their Data Stores

Low Latency for Users Across the Globe

Rugby World Cup Data

High Scalability & Elasticity

● For API traffic
  ○ concurrent calls
  ○ workload types
  ○ peaks handling

● For data storage
  ○ number of stores
  ○ volume of data


● Filtering on properties

● Pagination

● Sorting

Rich Query Capabilities

High Multi-tenant Density

● Balance between
  ○ data isolation
  ○ low cost

● Many customers & projects
  ○ sharing persistence infrastructure
  ○ isolated data stores

● Many users & groups
  ○ personal data
  ○ shared group data

Selecting DataStax Enterprise Search

Step 1: Prototyping with AWS NoSQL

● Started with SimpleDB
  ○ zero ops, highly available & low latency
  ○ mono-region & limited query capabilities

● Upgraded to DynamoDB
  ○ better scalability & predictability
  ○ not really for multi-tenant use cases (soft limits)
  ○ not very elastic (provisioned throughput)

● Other limitations
  ○ unable to develop and test locally (MySQL mode)
  ○ strong AWS lock-in

Step 2: Moving to Apache Cassandra

● For APISpark beta version
  ○ increasing multi-tenancy needs
  ○ increasing cost concerns

● Benefits
  ○ fully open source & free (vendor support)
  ○ on-premise deployments possible
  ○ proven scalability on AWS (Netflix)
  ○ richer query capabilities
  ○ natively multi-region

Step 3: Upgrading to DataStax Enterprise

● For APISpark GA
  ○ DataStax certified stack
  ○ production support

● Improved capabilities
  ○ much richer query capabilities with Solr integration
  ○ administration console
  ○ command line tooling
  ○ comprehensive documentation

● Still open source foundation
  ○ limited vendor lock-in
  ○ mature open source components

Current Persistence Design

Entity Store

Entity

Property

Primary Key

7 Main Challenges & Solutions

DataStax Enterprise Search 4.6.7 (Cassandra 2.0.14, Solr 4.6.0)

● Using Ec2MultiRegionSnitch

● 1 Entity Store = 1 Keyspace
  ○ Each keyspace can set its own replication policy

I. Deploying Across Multiple Regions
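The "1 Entity Store = 1 Keyspace" rule with a per-keyspace replication policy can be sketched as follows. This is only an illustration, not the APISpark implementation: the helper name `create_keyspace_cql`, the keyspace name `store_rugby`, and the replica counts are hypothetical. The data center names assume the Ec2MultiRegionSnitch convention of naming data centers after AWS regions (for example `us-east`, `eu-west`).

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    replicas_per_dc maps a data center name to its replication factor,
    so each Entity Store (keyspace) can choose where its data lives.
    """
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    # Sort the data centers so the generated statement is deterministic.
    options = ", ".join(
        "'{}': {}".format(dc, int(rf))
        for dc, rf in sorted(replicas_per_dc.items())
    )
    template = (
        "CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
        "{{'class': 'NetworkTopologyStrategy', {options}}};"
    )
    return template.format(name=keyspace, options=options)


# Hypothetical store replicated in two regions with different factors.
statement = create_keyspace_cql("store_rugby", {"us-east": 3, "eu-west": 2})
print(statement)
```

A store that only needs to serve one region would pass a single entry, which keeps its storage cost lower than a globally replicated store.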

● 1 Entity Store = 1 Keyspace
  ○ Data isolated in File System and Memory

● Complementary benefit
  ○ ACL per keyspace

II. Isolating Customer Data & Keeping Cost Low

Keyspace

Table

Composite property

List property

III. Supporting Complex Data Models

IV. Dealing with Dynamic Schema Changes (1/3)

ALTER TABLE DROP

ALTER TABLE ADD


User Action on Entity Store     Action performed in DB
Create Entity                   CQL: “CREATE TABLE <tableName>” + Solr Core creation
Delete Entity                   CQL: “DROP TABLE <tableName>”
Create Property                 CQL: “ALTER TABLE <tableName> ADD <columnName> <type>” + Solr Core schema update
Delete Property                 CQL: “ALTER TABLE <tableName> DROP <columnName>” + Solr Core schema update
Add Property in composite       Java: Alter JSON for all rows
Delete Property in composite    Java: Alter JSON for all rows
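The CQL side of the mapping above could be generated along these lines. This is a minimal sketch under stated assumptions: the function name `schema_statement`, the action names, and the single `id text` primary key are hypothetical, and the Solr core creation and schema update steps are deliberately left out.

```python
def schema_statement(action, keyspace, table, column=None, cql_type=None):
    """Translate a user action on an Entity Store into a CQL statement."""
    target = "{}.{}".format(keyspace, table)
    if action == "create_entity":
        # Assumption: every entity table starts with one text primary key.
        return "CREATE TABLE {} (id text PRIMARY KEY);".format(target)
    if action == "delete_entity":
        return "DROP TABLE {};".format(target)
    if action == "create_property":
        if not column or not cql_type:
            raise ValueError("column and cql_type are required")
        return "ALTER TABLE {} ADD {} {};".format(target, column, cql_type)
    if action == "delete_property":
        if not column:
            raise ValueError("column is required")
        return "ALTER TABLE {} DROP {};".format(target, column)
    raise ValueError("unsupported action: {}".format(action))


# Hypothetical usage: add, then remove, a property on a "players" entity.
add_stmt = schema_statement("create_property", "store_rugby", "players", "team", "text")
drop_stmt = schema_statement("delete_property", "store_rugby", "players", "team")
print(add_stmt)
print(drop_stmt)
```

Properties nested inside a composite are not covered here, since the table above handles them by rewriting the stored JSON in Java rather than by altering the table.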

● Advantages
  ○ flexibility compared to RDBMS
    ■ no lock
  ○ available actions
    ■ add / drop / rename column
    ■ change type of column

● Limitations
  ○ schema deployment can take time
  ○ in some edge cases can’t recreate columns


V. High Multi-tenant Density (1/2)

Schema deployment time with growing # of tables

● Challenge
  ○ large number of C* tables & Solr cores
  ○ memory usage (e.g., 1 C* table takes more than 1 MB of heap)

● Solutions
  ○ adjust JVM memory settings
  ○ create additional clusters
  ○ deprovision unused Entity Stores

V. High Multi-tenant Density (2/2)
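The heap figure quoted above (more than 1 MB per C* table) gives a rough lower bound on the memory cost of tenant density. The sketch below only does that arithmetic; the store and table counts are made-up inputs, not APISpark numbers, and the Solr cores are not accounted for.

```python
def min_schema_heap_mb(entity_stores, tables_per_store, heap_mb_per_table=1.0):
    """Lower bound on heap used by table metadata, in MB.

    Uses the "more than 1 MB of heap per C* table" figure, so the real
    usage is expected to be higher than the returned value.
    """
    if entity_stores < 0 or tables_per_store < 0:
        raise ValueError("counts must be non-negative")
    return entity_stores * tables_per_store * heap_mb_per_table


def max_tables_for_budget(heap_budget_mb, heap_mb_per_table=1.0):
    """Upper bound on the number of tables that fit in a heap budget."""
    return int(heap_budget_mb // heap_mb_per_table)


# Hypothetical cluster: 500 Entity Stores with 8 entities (tables) each.
lower_bound = min_schema_heap_mb(500, 8)
print(lower_bound)
print(max_tables_for_budget(2048))
```

An estimate like this suggests when deprovisioning unused Entity Stores or adding a cluster becomes necessary, before heap pressure shows up in production.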

VI. Query Capabilities (1/2)

Search queries

Upsert / Delete / “Get by id” queries

● Filtering on a property

● Pagination

● Sorting

VI. Query Capabilities (2/2)

Solr Queries
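Filtering, pagination, and sorting map naturally onto standard Solr request parameters (`q`, `fq`, `sort`, `start`, `rows`). The following is a hedged sketch of how such a query string might be assembled. The helper name `solr_search_params` and the field names are hypothetical, and this is not presented as the way APISpark builds its queries.

```python
from urllib.parse import urlencode, parse_qs


def solr_search_params(filters, sort_by=None, descending=False, page=1, page_size=20):
    """Build a Solr query string with filtering, sorting and pagination."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    params = [("q", "*:*")]  # match everything, then narrow with filters
    for field, value in sorted(filters.items()):
        # Quote the value and escape backslashes and double quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", '{}:"{}"'.format(field, escaped)))
    if sort_by:
        params.append(("sort", "{} {}".format(sort_by, "desc" if descending else "asc")))
    params.append(("start", str((page - 1) * page_size)))  # zero-based offset
    params.append(("rows", str(page_size)))
    params.append(("wt", "json"))
    return urlencode(params)


# Hypothetical usage: third page of players from one team, sorted by name.
query_string = solr_search_params({"team": "France"}, sort_by="name", page=3)
parsed = parse_qs(query_string)
print(parsed["fq"], parsed["sort"], parsed["start"], parsed["rows"])
```

Offset-based paging with `start` and `rows` is simple, but it gets slower on deep pages, which is worth keeping in mind for large stores.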

VII. Analytics (1/2)

Provide analytics about API calls

VII. Analytics (2/2)

used for latest API calls

issue with wide rows(heavily used APIs)

1 table per report

use of C* counters
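The "1 table per report" and "use of C* counters" notes could translate into CQL along these lines. This is a sketch under assumptions: the keyspace, the report name, and the column layout (`api_id`, `period`, `calls`) are hypothetical and not taken from the slides. In a CQL counter table, every column outside the primary key must be a counter.

```python
def counter_table_cql(keyspace, report):
    """CREATE TABLE statement for one report backed by a counter column."""
    template = (
        "CREATE TABLE IF NOT EXISTS {ks}.{report} ("
        "api_id text, period text, calls counter, "
        "PRIMARY KEY (api_id, period));"
    )
    return template.format(ks=keyspace, report=report)


def increment_cql(keyspace, report):
    """Parameterized UPDATE that adds one call to a report counter."""
    template = (
        "UPDATE {ks}.{report} SET calls = calls + 1 "
        "WHERE api_id = ? AND period = ?;"
    )
    return template.format(ks=keyspace, report=report)


# Hypothetical report counting API calls per day.
create_stmt = counter_table_cql("analytics", "calls_per_day")
update_stmt = increment_cql("analytics", "calls_per_day")
print(create_stmt)
print(update_stmt)
```

Counters are only ever changed through `UPDATE` with an increment or decrement, so each API call becomes one small write and reads return pre-aggregated values.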

Conclusion

● Special use case of DataStax Enterprise
  ○ not a lot of shared knowledge about it
  ○ great support from DataStax
  ○ DSE is a good fit despite some challenges

● Looking forward to DSE 4.8!
  ○ User Defined Types with Solr indexing
  ○ live indexing of C* data into Solr
  ○ improved overall performance


Questions?
