Powering Enterprise Data-driven Applications with Cassandra
Be Right Faster™ with
Reliable Data, Relevant Insights, Recommended Actions
#DataManagement
#BigData
#ML
© 2015. All Rights Reserved.
Anastasia Zamyshlyaeva, VP Platform Product Management and Co-founder @ Reltio
• 2011 – started working with C* (Cassandra)
• 2012 – selected C* as the persistence store for a hybrid columnar and graph data store
• Since 2012 – running in production, supporting:
– 24/7 uptime with 99.995% availability
– Multi-Tenancy across customers
– both Operational and Analytical workloads
[email protected]/in/azamyshlyaeva
“If you focus on the smallest details, you never get the big picture right”
~ Leroy Hood
Sales
Web site
Support
Supply
Marketing
Enterprise Applications Ecosystem
Is data up-to-date?
Is data correct?
Is data complete?
Sales
Web site
Supply
Marketing
Support
Data Unification Application (based on relational databases):
• Fixed structure
• No big data support
• Expensive
• Hard to support graphs and complex attributes
• Often a single point of failure
Sales
Web site
Supply
Marketing
Support
Data Unification Application (based on Cassandra)
Why Cassandra?
• High performance
• Fault tolerance
• Linear scalability
• Multi-datacenter
Reltio Metadata-driven Model and Operations
Doctors and Hospitals schema → configure → UI, REST API, Analytics
Oil & Gas schema → configure → UI, REST API, Analytics
Asset Catalog schema → configure → UI, REST API, Analytics
Cassandra is a primary datastore
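The slides above show one model and one set of operations reconfigured per industry by a schema. A minimal Python sketch of that idea follows; it is an illustration under assumptions, not Reltio's implementation. The schema layout, the entity types `HCP` and `Well`, their attributes, and the `validate` function are all hypothetical.

```python
# Hypothetical schemas: entity type -> {attribute name: kind}.
# The entity types and attributes are invented for illustration.
SCHEMAS: dict[str, dict[str, dict[str, str]]] = {
    "Doctors and Hospitals": {"HCP": {"Name": "String", "Specialty": "List"}},
    "Oil & Gas": {"Well": {"Name": "String", "Operator": "String"}},
}


def validate(schema: dict[str, dict[str, str]], entity_type: str, doc: dict) -> list[str]:
    """Check a document against one schema; an empty result means it conforms."""
    attributes = schema.get(entity_type)
    if attributes is None:
        return [f"unknown entity type {entity_type!r}"]
    problems = []
    for name, value in doc.items():
        kind = attributes.get(name)
        if kind is None:
            problems.append(f"unknown attribute {name!r}")
        elif kind == "String" and not isinstance(value, str):
            problems.append(f"{name!r} must be a String")
        elif kind == "List" and not isinstance(value, list):
            problems.append(f"{name!r} must be a List")
    return problems


# The same generic code serves every industry; only the schema changes.
print(validate(SCHEMAS["Doctors and Hospitals"], "HCP",
               {"Name": "Dr. Smith", "Specialty": ["Cardiology"]}))  # []
print(validate(SCHEMAS["Oil & Gas"], "Well", {"Name": "W-1", "Depth": 3000}))
# ["unknown attribute 'Depth'"]
```

Adding an industry here means adding a schema entry, not new code, which is the point the slides make about a metadata-driven model.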
Metadata-driven Documents in Columnar storage
Simple metadata-driven attributes in Cassandra (Thrift API)
Entity:
ID: doc1
Type: Individual
Name: John
Email: [email protected], [email protected]
Address: CA, shipping; NY, billing
Metadata:
Entity type: Individual
- Name: String
- Email: List
- Address: Complex
  - State: String
  - Type: List
Cassandra row (row key doc1): <Name>.1 = John, …
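A minimal sketch of the simple-attribute layout, assuming `<Name>` stands for the attribute's name from the metadata and the number after the dot is the value's index. `encode_simple` and `decode_simple` are hypothetical helpers, not part of the Thrift API or of Reltio.

```python
def encode_simple(doc: dict[str, str]) -> dict[str, str]:
    """Flatten single-valued attributes into {column name: value}.

    Hypothetical naming: "<attribute>.<index>"; a simple attribute has
    exactly one value, so its index is always 1.
    """
    return {f"{attr}.1": value for attr, value in doc.items()}


def decode_simple(columns: dict[str, str]) -> dict[str, str]:
    """Inverse of encode_simple: drop the ".<index>" suffix from each column name."""
    return {name.rsplit(".", 1)[0]: value for name, value in columns.items()}


row_key = "doc1"                        # the document ID is the row key
columns = encode_simple({"Name": "John"})
print(row_key, columns)                 # doc1 {'Name.1': 'John'}
print(decode_simple(columns))           # {'Name': 'John'}
```

Because column names come from metadata rather than a fixed table definition, a new attribute needs no schema change in Cassandra.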
Multi-value metadata-driven attributes in Cassandra (Thrift API)
Cassandra row (row key doc1): …, <Email>.1 = [email protected], <Email>.2 = [email protected], …
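A multi-value attribute gives each value its own column, numbered from 1. The sketch below assumes column names of the form `<attribute>.<index>`; `encode_list` and `decode_lists` are hypothetical helpers and the example addresses are placeholders. It also shows one detail the naming scheme implies: indexes must be compared as numbers when rebuilding the list.

```python
from collections import defaultdict


def encode_list(attr: str, values: list[str]) -> dict[str, str]:
    """Give every value its own column: <attr>.1, <attr>.2, ..."""
    return {f"{attr}.{i}": value for i, value in enumerate(values, start=1)}


def decode_lists(columns: dict[str, str]) -> dict[str, list[str]]:
    """Group columns back into per-attribute lists, ordered by numeric index."""
    grouped: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name, value in columns.items():
        attr, index = name.rsplit(".", 1)
        grouped[attr].append((int(index), value))
    # Sort numerically: compared as strings, "Email.10" would sort before "Email.2".
    return {attr: [value for _, value in sorted(pairs)] for attr, pairs in grouped.items()}


columns = encode_list("Email", ["john@example.com", "j.smith@example.com"])
print(columns)  # {'Email.1': 'john@example.com', 'Email.2': 'j.smith@example.com'}
print(decode_lists(columns))  # {'Email': ['john@example.com', 'j.smith@example.com']}
```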
Complex metadata-driven attributes in Cassandra (Thrift API)
Addresses of doc1: (1) CA, shipping; (2) NY, billing
Cassandra row (row key doc1): …, <Address>.1.<State>.1 = CA, <Address>.1.<Type>.1 = shipping, <Address>.2.<State>.1 = NY, …
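Complex attributes nest the same naming: the parent's `<Attr>.<i>` becomes a prefix for each sub-attribute. A recursive sketch follows, assuming every attribute holds a list of values and a complex value is a dict of sub-attributes; the `encode` function and the document layout are hypothetical.

```python
def encode(attrs: dict, prefix: str = "") -> dict[str, str]:
    """Flatten a document into "<Attr>.<i>[.<SubAttr>.<j>...]" column names.

    Every attribute holds a list of values (a simple attribute is a one-element
    list); a complex value is a dict of sub-attributes and is flattened
    recursively under its parent's "<Attr>.<i>." prefix.
    """
    columns: dict[str, str] = {}
    for attr, values in attrs.items():
        for i, value in enumerate(values, start=1):
            name = f"{prefix}{attr}.{i}"
            if isinstance(value, dict):
                columns.update(encode(value, prefix=name + "."))
            else:
                columns[name] = value
    return columns


doc1 = {
    "Name": ["John"],
    "Address": [
        {"State": ["CA"], "Type": ["shipping"]},
        {"State": ["NY"], "Type": ["billing"]},
    ],
}
for column, value in encode(doc1).items():
    print(column, "=", value)
```

Running it prints `Name.1 = John`, then `Address.1.State.1 = CA`, `Address.1.Type.1 = shipping`, `Address.2.State.1 = NY` and `Address.2.Type.1 = billing`, one per line.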
Metadata-driven Documents – CQL wide rows
CREATE TABLE ENTITIES (
  doc_id int,
  attribute_name text,
  attribute_value text,
  -- …
  PRIMARY KEY (doc_id, attribute_name)
);

-- Select all addresses of document 1. '/' is the character right after '.',
-- so this range covers every attribute_name that starts with 'Address.'.
SELECT *
FROM ENTITIES
WHERE doc_id = 1
  AND attribute_name >= 'Address.' AND attribute_name < 'Address/';
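Because `attribute_name` is a clustering column, a partition's rows are kept sorted by it, so selecting one attribute's columns is a range scan over a sorted sequence. The sketch below simulates that with Python's `bisect` module on an in-memory list; it assumes ASCII column names, for which Python's string order matches the byte order Cassandra uses for `text`. `prefix_range` is a hypothetical helper. It also shows why a bound such as `attribute_name <= 'Address.9'` misses columns like `'Address.9.State.1'`.

```python
from bisect import bisect_left


def prefix_range(sorted_names: list[str], prefix: str) -> list[str]:
    """Return every name in sorted_names that starts with prefix.

    Mirrors the CQL range: lower bound = prefix (inclusive), upper bound =
    prefix with its last character incremented (exclusive), e.g.
    'Address.' -> 'Address/', since '/' is the character after '.'.
    """
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(sorted_names, prefix)
    hi = bisect_left(sorted_names, upper)
    return sorted_names[lo:hi]


# Clustering keeps one partition's attribute names in sorted order.
names = sorted([
    "Name.1", "Email.1", "Email.2",
    "Address.1.State.1", "Address.1.Type.1",
    "Address.9.State.1", "Address.10.State.1",
])
print(prefix_range(names, "Address."))
# ['Address.1.State.1', 'Address.1.Type.1', 'Address.10.State.1', 'Address.9.State.1']

# An inclusive upper bound of 'Address.9' would drop 'Address.9.State.1',
# because a string sorts after every proper prefix of itself.
print("Address.9.State.1" <= "Address.9")  # False
```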
Hybrid Graphs – linked entities with infinite attribution
Example graph: John, Dwight (Individual); Dunder Mifflin (Organization); Copy Paper (Product); relation type: Employee
Cassandra:
- Records storage across datacenters
Reltio:
- Metadata-driven graphs
- Rich model for entities, relations
- Partitioning
- Effective joins
- Graph operations
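The slides do not say how relations are laid out in Cassandra. As one hypothetical reading, the sketch below stores each entity's relations as columns in that entity's own row, named `<RelationType>.<OtherEntityId>`, so a hop is a single row read and a two-hop traversal is a join done in application code. The in-memory `rows` dict stands in for Cassandra; `relate`, `neighbors`, `colleagues`, and the `Sells` relation type are invented for illustration.

```python
from collections import defaultdict

# Hypothetical layout: one row per entity; each relation is a column named
# "<RelationType>.<OtherEntityId>", so all relations of an entity share a row.
rows: dict[str, dict[str, str]] = defaultdict(dict)


def relate(source: str, rel_type: str, target: str) -> None:
    """Store the edge under both endpoints so traversal can start from either."""
    rows[source][f"{rel_type}.{target}"] = "out"
    rows[target][f"{rel_type}.{source}"] = "in"


def neighbors(entity: str, rel_type: str) -> list[str]:
    """Read one row and keep the columns of the requested relation type."""
    prefix = rel_type + "."
    return sorted(col[len(prefix):] for col in rows.get(entity, {}) if col.startswith(prefix))


def colleagues(person: str) -> list[str]:
    """Two-hop traversal: people employed by the same organization(s)."""
    found = set()
    for org in neighbors(person, "Employee"):
        found.update(neighbors(org, "Employee"))
    found.discard(person)
    return sorted(found)


relate("John", "Employee", "Dunder Mifflin")
relate("Dwight", "Employee", "Dunder Mifflin")
relate("Dunder Mifflin", "Sells", "Copy Paper")  # "Sells" is an invented relation type
print(colleagues("John"))  # ['Dwight']
```

Writing each edge twice trades extra storage for reads that never have to scan other partitions, a common choice for Cassandra-backed graphs.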
Reltio de-duplication
John Smith
Jon Smith
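The slides do not describe Reltio's matching rules. Purely as an illustration, the sketch below flags "John Smith" and "Jon Smith" as probable duplicates using Levenshtein edit distance after normalizing case and whitespace. The choice of Levenshtein distance, the threshold of 1, and the helper names are assumptions, not Reltio's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))            # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                  # delete ca
                curr[j - 1] + 1,              # insert cb
                prev[j - 1] + (ca != cb),     # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def is_probable_duplicate(a: str, b: str, max_distance: int = 1) -> bool:
    """Hypothetical match rule: names within max_distance edits are flagged."""
    return levenshtein(normalize(a), normalize(b)) <= max_distance


print(is_probable_duplicate("John Smith", "Jon Smith"))   # True
print(is_probable_duplicate("John Smith", "Jane Smith"))  # False
```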
Hybrid search = Cassandra + Elasticsearch*
* Elasticsearch index with documents excluded
Hybrid Search – without documents!
[Charts: Elasticsearch alone vs. hybrid search (Elasticsearch + Cassandra), comparing data volume in the Elasticsearch index (TB), Elasticsearch indexing performance (OPS), and search performance on large documents (sec)]
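One reading of "hybrid search without documents" is that the search index stores only what is needed to match and return IDs, while full documents are read from the primary store. The sketch below models both systems as in-memory Python structures; `primary_store`, `search_index`, `index_document`, and `search` are hypothetical stand-ins, not Elasticsearch or Cassandra client APIs.

```python
from collections import defaultdict

# In-memory stand-ins, both hypothetical:
#   primary_store plays Cassandra: document ID -> full document
#   search_index plays Elasticsearch with documents excluded: term -> set of IDs
primary_store: dict[str, dict[str, str]] = {}
search_index: dict[str, set[str]] = defaultdict(set)


def index_document(doc_id: str, doc: dict[str, str]) -> None:
    primary_store[doc_id] = doc                 # the full document lives only here
    for value in doc.values():
        for term in value.lower().split():
            search_index[term].add(doc_id)      # the index keeps IDs, not document bodies


def search(query: str) -> list[dict[str, str]]:
    """Match all query terms in the index, then fetch full documents by ID."""
    terms = query.lower().split()
    if not terms:
        return []
    ids = set.intersection(*(search_index.get(t, set()) for t in terms))
    return [primary_store[doc_id] for doc_id in sorted(ids)]


index_document("doc1", {"Name": "John Smith", "State": "CA"})
index_document("doc2", {"Name": "Jon Smith", "State": "NY"})
print(search("smith ny"))  # [{'Name': 'Jon Smith', 'State': 'NY'}]
```

Keeping document bodies out of the index shrinks it, at the cost of an extra key lookup per hit in the primary store.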
Reltio Cloud Data Components
Spark
AWS
AWS Redshift
Cassandra
Elasticsearch
Reltio Use Cases
Thank you