planet-cassandra
Over the past few years, Health Market Science has transitioned from traditional relational databases and enterprise systems to a massively scalable Big Data platform that combines Cassandra and Storm to ingest thousands of feeds of data from the health market industry to produce a single high-quality masterfile. Come hear the "Why?", "What for?" and "How?" of that evolution.
© Health Market Science 2013, All Rights Reserved
Isaac Rieksts
Software Developer
@IsaacRieksts, [email protected]
CROSSING THE CHASM
SQL to NOSQL
#Cassandra13
Our Mission
§ Deliver the most current information on the U.S. healthcare provider universe using integrated solutions in order for customers to:
  › Prevent fraud, waste and abuse across the healthcare system
  › Comply with evolving state and federal regulations
  › Improve market opportunity for non retail drugs and devices
The Business
Business Solutions
Health Care Provider & Facilities
Variety/Velocity
• >2000 sources
• 6 Million unique HCPs
• 10+ years history
Data Challenges
• Constant change in real world data
• Conflicting & partial info
• Frequent changes to source structure
• Authoritative sources vs. crowdsource
• Predicting source quality
Master Data Solutions
Medical Procedures & Diagnosis
Volume/Velocity
• ~1B claims annually
• +5B records annually
• 5+ years history
Data Challenges
• Sources have incomplete capture
• Overlapping source data
• Statistical projections & biases
• Social media type relationships
Medical Claims Data
Batch (CompleteView, Expense Manager, CompleteSpend)
Transactional (PRS/PE)
Big Data Relational DB & Analytics (Claims)
Master Data Management
Visualization: Dashboard / Reports
Structured Storage: Relational Indexing
Flexible Storage: NoSQL Graph(s)
Interfacing: Web Services
Distributed Processing: Standardize, Validate, Match, Consolidate, Analytics
Data Sources: Government, Web, Customer
I’m happy
User Interface
Consolidation
First Name: John Middle Name: David Last Name: Smith
First Name: Mike Middle Name: Steve Last Name: Smith
First Name: Mike Middle Name: David Last Name: Smith
Legacy System
§ Relational DB
§ JBoss
§ JBoss MQ
§ 1 week to process a record through the system
Our Solutions
Business Needs: Finance & Legal, Business Systems, Compliance, Sales & Marketing
Solutions: Compliance Outsourcing; Data Assessment, Integration, & Enrichment Services; Provider Data; Market Intelligence
HMS Authoritative Sources: PDC, Federal, State, Medical Claims, Web Derived
Advanced Technology: Storm, HMS MDM
Data Model
§ Think of full entity
§ Build entity as you go
§ Get full view upon fetch
§ Choose PK carefully
Cassandra-Indexing
§ Fast wide row alternate key for Cassandra
§ Two row pull process
  › Fetch PKs matching AK
  › Use PK to fetch your data
https://github.com/hmsonline/cassandra-indexing
Cassandra-Indexing
§ Key: Col1:Col2
§ Index: Col2:Col1
Cassandra-Indexing Example
§ Key: <First Name>:<Last Name>
§ Index: <Last Name>:<First Name>
§ Data
  › John:Smith
  › Steve:Smith
  › David:Jones
§ Index fetch “Smith” => John:Smith, Steve:Smith
§ Index fetch “Jones” => David:Jones
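The two-step pull in this example can be sketched in plain Java. This is a minimal in-memory illustration, not the cassandra-indexing API: the class name, method names, and column names (`first_name`, `last_name`) are assumptions made for the sketch, and ordinary maps stand in for the data column family and the index wide row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an alternate-key (AK) index.
// Not the cassandra-indexing API; all names here are illustrative.
final class AlternateKeyIndexSketch {
    // Data rows: primary key "<First Name>:<Last Name>" -> columns.
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // One sorted "wide row" per last name: "<Last Name>:<First Name>" -> primary key.
    private final Map<String, SortedMap<String, String>> index = new HashMap<>();

    void put(String firstName, String lastName) {
        String pk = firstName + ":" + lastName;
        Map<String, String> row = new HashMap<>();
        row.put("first_name", firstName);
        row.put("last_name", lastName);
        rows.put(pk, row);
        // Maintain the index entry alongside the data write.
        index.computeIfAbsent(lastName, k -> new TreeMap<>())
             .put(lastName + ":" + firstName, pk);
    }

    // Step 1: fetch the PKs matching the alternate key.
    List<String> fetchPrimaryKeys(String lastName) {
        SortedMap<String, String> wideRow = index.get(lastName);
        return wideRow == null ? new ArrayList<>() : new ArrayList<>(wideRow.values());
    }

    // Step 2: use each PK to fetch the data.
    List<Map<String, String>> fetchByLastName(String lastName) {
        List<Map<String, String>> result = new ArrayList<>();
        for (String pk : fetchPrimaryKeys(lastName)) {
            result.add(rows.get(pk));
        }
        return result;
    }

    public static void main(String[] args) {
        AlternateKeyIndexSketch store = new AlternateKeyIndexSketch();
        store.put("John", "Smith");
        store.put("Steve", "Smith");
        store.put("David", "Jones");
        System.out.println(store.fetchPrimaryKeys("Smith")); // prints [John:Smith, Steve:Smith]
        System.out.println(store.fetchPrimaryKeys("Jones")); // prints [David:Jones]
    }
}
```

Keeping the index entries sorted by `<Last Name>:<First Name>` mirrors the wide-row idea: every entry for one last name sits together, so one read returns all matching PKs.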
System Phase 1
System Phase 2
System Phase 3
Oracle Advanced Queue
§ Integrate relational DB and JMS
§ Near-real-time processing of data
  › Table trigger
§ Bulk exports
  › Keep only what you need on the queue
Oracle Advanced Queue (cont)
§ Distributed processing
  › Write to Cassandra as of queue time
  › Write only ids and query back for data
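The "write only ids and query back for data" option can be sketched as follows. This is a rough illustration under stated assumptions, not the HMS implementation: a `BlockingQueue` stands in for Oracle Advanced Queue/JMS, plain maps stand in for the relational table and for Cassandra, and every class and method name is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the queue carries only record ids and the consumer
// queries back for the current data. All names are illustrative.
final class IdOnlyQueueSketch {
    final Map<String, String> relationalTable = new ConcurrentHashMap<>(); // stand-in for the relational DB
    final Map<String, String> cassandraCopy = new ConcurrentHashMap<>();   // stand-in for Cassandra
    final BlockingQueue<String> queue = new LinkedBlockingQueue<>();       // stand-in for the queue

    // Stand-in for a table trigger: store the row, enqueue only its id.
    void upsert(String id, String payload) {
        relationalTable.put(id, payload);
        queue.add(id);
    }

    // Drain the queue: for each id, query back the current row and write it out.
    // Returns the number of messages processed.
    int drain() {
        int processed = 0;
        String id;
        while ((id = queue.poll()) != null) {
            String current = relationalTable.get(id);
            if (current != null) {
                cassandraCopy.put(id, current);
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        IdOnlyQueueSketch sketch = new IdOnlyQueueSketch();
        sketch.upsert("hcp-1", "version 1");
        sketch.upsert("hcp-1", "version 2");
        System.out.println(sketch.drain());
        System.out.println(sketch.cassandraCopy.get("hcp-1"));
    }
}
```

Because messages hold only ids, they stay small, and a consumer that runs late still copies the latest row rather than a stale snapshot. The trade-off is an extra read against the source database per message.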
Unit testing
§ Module level
  › In memory mock
  › Map<String, Map<String, Map<String, Map<String, String>>>>
  › Map<Keyspace, Map<Column Family, Map<Column, Map<Row Key, Value>>>>
§ Integration
  › Embedded Cassandra super class
  › Schema migration
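A module-level in-memory mock built on the nested map above might look like this. It is a minimal sketch: the class and method names are assumptions, and the nesting order (keyspace, column family, column, row key) simply follows the type shown on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory mock for module-level tests; names are illustrative.
// Nesting follows the slide: keyspace -> column family -> column -> row key -> value.
final class InMemoryCassandraMock {
    private final Map<String, Map<String, Map<String, Map<String, String>>>> data = new HashMap<>();

    // Creates any missing level on the way down, then stores the value.
    void write(String keyspace, String columnFamily, String column, String rowKey, String value) {
        data.computeIfAbsent(keyspace, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new HashMap<>())
            .computeIfAbsent(column, k -> new HashMap<>())
            .put(rowKey, value);
    }

    // Returns null when any level of the path is missing.
    String read(String keyspace, String columnFamily, String column, String rowKey) {
        Map<String, Map<String, Map<String, String>>> ks = data.get(keyspace);
        if (ks == null) {
            return null;
        }
        Map<String, Map<String, String>> cf = ks.get(columnFamily);
        if (cf == null) {
            return null;
        }
        Map<String, String> col = cf.get(column);
        return col == null ? null : col.get(rowKey);
    }

    public static void main(String[] args) {
        InMemoryCassandraMock mock = new InMemoryCassandraMock();
        mock.write("mdm", "providers", "last_name", "John:Smith", "Smith");
        System.out.println(mock.read("mdm", "providers", "last_name", "John:Smith"));
        System.out.println(mock.read("mdm", "providers", "last_name", "missing"));
    }
}
```

A mock like this lets module tests run without a cluster, while the integration tests listed above exercise a real embedded Cassandra.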
QA
§ Fail fast and early
§ SoapUI and Maven
Organization Design
§ Project Manager
§ Business Analyst
§ Quality Assurance
§ Software Developer
§ Development Operations
Devops
§ Virtual Hardware (VMware)
§ Puppet
  › Puppet Master
  › Jenkins
§ Promote using config
  › Same script run in DEV as in Prod
Real-time System
Diagram labels: Kafka Queue(s), Offset, A, B, C, C* (Cassandra), Elastic Search (ES1, ES2), REST API
Storm
• Guaranteed once semantics
• Well-designed processing abstraction
• Beats BYODP
• Momentum
Storm and Cassandra
§ Use Cases:
  › Write Storm Tuple data to C*
    § Computation Results
    § Pre-computed indices
  › Read data from C* and emit Storm Tuples
    § Dynamic Lookups
http://github.com/hmsonline/storm-cassandra
Storm-Cassandra Project
§ ColumnsMapper Interface
  › Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple
§ Given a C* Row Key and list of Columns:
  › Return a list of Storm Tuples
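The contract described here (row key and columns in, list of tuples out) can be illustrated with a standalone sketch. The interface below is hypothetical and is not the storm-cassandra `ColumnsMapper` signature. Tuples are modeled as plain `List<Object>` values so the example runs without Storm on the classpath, and the one-tuple-per-column mapping is just one possible choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a row-to-tuples mapper; not the storm-cassandra API.
final class ColumnsMapperSketch {
    // Given a row key and its columns, return a list of tuples.
    interface RowToTuplesMapper {
        List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns);
    }

    // Emits one (rowKey, columnName, columnValue) tuple per column.
    static final class OneTuplePerColumnMapper implements RowToTuplesMapper {
        @Override
        public List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns) {
            List<List<Object>> tuples = new ArrayList<>();
            for (Map.Entry<String, String> column : columns.entrySet()) {
                List<Object> tuple = new ArrayList<>();
                tuple.add(rowKey);
                tuple.add(column.getKey());
                tuple.add(column.getValue());
                tuples.add(tuple);
            }
            return tuples;
        }
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("first_name", "John");
        columns.put("last_name", "Smith");
        RowToTuplesMapper mapper = new OneTuplePerColumnMapper();
        for (List<Object> tuple : mapper.mapToTuples("John:Smith", columns)) {
            System.out.println(tuple);
        }
    }
}
```

Keeping the mapping behind an interface means a lookup bolt can stay generic while each topology decides how a row becomes tuples.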
Vision
Engine
Unstructured Data
• Unpredictable schema/layout
• Expand data storage structure dynamically
• Fuzzy Search
Graph Database
• Traversing relationships
• Building connections
• Real time relationship changes
Structured Data
• Traditional database
• Predictable, logical structure
• Faceted Search
Distributed Processing
• Scalability
• Performance
• Processing power
• Virtual grow/shrink
Data
Summary
§ Cassandra-Indexing
§ Oracle Advanced Queue
§ Storm-Cassandra
THE SCIENCE OF BETTER RESULTS
www.healthmarketscience.com
2700 Horizon Drive • King of Prussia, PA 19406 • 800.593.4467 • [email protected]
Questions?