Table Of Contents
Abstract Title: MMIS - Big Data Integration
Author: Rajasekaran Kandhasamy
Overview of Paper
    Introduction
    Note
Functional Specification
    Primary Use Cases
        1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud Data Center (CDC)
        2) Unified Claims Archiving (UAC)
        3) Extract, Transform and Load (ETL) Integration
        4) Process Large Audit or Log Files
    Secondary Use Cases
        5) Near - Continuous data protection (CDP) or Backup & Recovery
Value to Payers
Technical Specification
    High Level Architecture
        1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud Data Center (CDC) Flow
        2) Unified Claims Archiving (UAC)
            HBase Claim Archival Sample Data Model
        3) Extract, Transform and Load (ETL) Integration
            Proposed system
            Design Option 1
            Design Option 2
        4) Process Large Audit or Log Files
            Security Audit
            Application Audit
        5) Near - Continuous data protection (CDP) Backup & Recovery
            Proposed system
            Backup
            Recovery
            Hive Claim Backup Sample Data Model
Abstract Title: MMIS - Big Data Integration
Author : Rajasekaran Kandhasamy ([email protected])
Overview of Paper:
Introduction: MMIS/HealthCare Payer applications depend upon traditional database models and
structured data analytics to fulfill their needs. These approaches, while adequate in the past, will not
suffice to address future requirements. They lack the processing capability to load and query multi-
terabyte datasets in a timely fashion and the flexibility to effectively manage unstructured and semi-
structured data. Adapting a Big Data platform to the MMIS application will resolve the above issues.
This technical paper provides details about integrating MMIS/HealthCare payer
applications with a Hadoop based Big Data platform.
Note: MMIS is a large application, hence this paper covers use cases related only to claims.
The proposal is not a replacement for the OLTP database approach; it is an idea of what benefits we can
get if we integrate with Hadoop technologies. Another non-MMIS application covered here, MMIS
BI/BIRT/JASPERSOFT, is a set of business intelligence analytical tools with chart/report capabilities;
simply, an open source BI or reporting tool to manage big data activities.
Functional Specification:
Primary Use Cases:
1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or]
Cloud Data Center (CDC): a multi-source data collection/management platform that
delivers backup, archive, search, and analytics capabilities to Medicaid or HealthCare
Payer applications. It introduces cloud based options for data that must be kept for
extremely long periods of time. Simply, it consolidates MMIS file transfer processes under
one managed solution.
CDC provides connectivity for the flow of data, in the form of files, between providers, state
agencies, switch vendors and the Enterprise System.
Within the context of MMIS, a CDC/HSS describes an interaction between external
entities, systems or service agencies (e.g. a switch vendor or provider agency)
and the MMIS/Payer applications. This interaction could involve transferring and
storing data. Most of these external interfaces will be file based inbound and outbound
interfaces which come in batches. External systems will exchange data with the
MMIS in different formats. Each data file may contain one or more records. The possible
file formats are:
a. X12 files
b. XML files
c. Flat files (Delimited and Fixed width data, comma separated values)
d. Binary data
Advantages:
- While the majority of files in most payer applications are stored on file servers that are
not always under the direct control of IT, here we maintain them all under one umbrella.
- Reduce licensing, maintenance and support costs of file servers. Develop a no
vendor lock-in framework.
- Scalable storage (Hadoop) environment with CDC, a solution that doesn't
require a change in platforms or the retraining of IT employees and
administrators. CDC delivers comprehensive capabilities more efficiently than
ad hoc data/file management systems do, allowing an enterprise to dedicate
fewer resources to supporting infrastructure and more to innovating, so it can
quickly bring its innovation to market.
- Reduce MMIS operational costs with respect to data storage, backup, archive
and maintenance.
- Minimize analytical latency in a big data like environment.
- External systems can connect to UFM/CDC/HSS using any FTP/SFTP client or a
REST based Resource Oriented Architecture (ROA).
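As a sketch, inbound files could be routed into per-tenant, per-format landing-zone directories before processing. All paths, directory names and extension mappings below are illustrative assumptions, not part of the proposal:

```python
import os

# Hypothetical CDC landing-zone layout; names are illustrative only.
LANDING_ZONE = "/cdc/inbound"
FORMAT_DIRS = {
    ".x12": "x12",      # X12 transaction files (e.g. 837 claims)
    ".edi": "x12",
    ".xml": "xml",      # XML files
    ".csv": "flat",     # delimited flat files
    ".txt": "flat",     # fixed-width flat files
    ".dat": "binary",   # binary data
}

def landing_path(tenant: str, filename: str) -> str:
    """Route an inbound file to its tenant/format directory in HDFS."""
    ext = os.path.splitext(filename)[1].lower()
    fmt = FORMAT_DIRS.get(ext, "binary")  # unknown extensions treated as binary
    return f"{LANDING_ZONE}/{tenant}/{fmt}/{filename}"

print(landing_path("maryland", "claims_837.x12"))
# /cdc/inbound/maryland/x12/claims_837.x12
```

In the actual solution the same routing decision would be made by the SFTP/REST front end before writing the file into HDFS.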
2) Unified Claims Archiving (UAC): preserve information for compliance, legal, business
reference, or system optimization purposes. These are archiving capabilities that MEDICAID may not
believe it needs now but, given current archive market trends, will find extremely useful
in the near future, driven by the combination of increasing manual and machine-generated
data and increasingly larger file/message/database sizes. For a variety of reasons, a good
portion of this data needs to be archived. Once a claim case is closed, it becomes the
responsibility of the MMIS/HealthCare Payer to archive and manage the closed file in
compliance with regulations.
Advantages:
- Data associated with claims processing is a good candidate for data archival. If
the size of a production table gets too large there will be a distinct impact on
retrieval time. Most of the core system screens include limits on the number of
records the screen will retrieve and display. When tables are large, screens will
not display all of the applicable data and some screens will not function.
- Moving old data from the MMIS OLTP database to Hadoop based HBase can increase MMIS
OLTP performance; a large number of unwanted/old unused records in the claims
tables would decrease performance.
- Historical information for comparative and competitive analysis.
- Enhanced data quality and completeness.
- Supplementing disaster recovery plans with another data backup source.
3) Extract, Transform and Load (ETL) Integration: Most ETL software packages require
their own servers, processing, databases, and licenses. They also require setup,
configuration, and development by experts in that particular tool, and those skills are not
always transferable.
Advantages:
- For instance, the MMIS reference sub module receives different sets of procedure,
diagnosis and other codes as files from CMS periodically. These files are stored
in CDC, and Hadoop based Pig/Hive is used to convert this file data into an MMIS
understandable format in a less expensive manner.
Note: most existing MMIS use PL/SQL based ETL for loading data into the
MMIS DB. If the MMIS doesn't want to break the existing flow, use a CDC adapter to
get files from Hadoop instead of a traditional file server.
- UFM/CDC data can easily be stored inexpensively in the cloud and processed by Hive
to ETL data. It is a cost-effective complement to data warehouse solutions, and
it reduces risk and cost and/or improves accessibility over in-house solutions. Once
data is processed and stored in Hive, it makes sense to consider the various
file formats available.
4) Process Large Audit or Log Files: Audits are historical and immutable. We can
segregate MMIS audits in two categories.
a) Security Audit: the MMIS application keeps logging user actions in the form of a
file, and this file will be moved to the Hadoop based HBase NoSQL DB. Through
the MMIS BI application a user can view who logged in, what actions they performed
and so on.
b) Application Audit: existing MMIS have the following application audit options:
1) DB triggers,
2) Module specific code inserted for each user operation, e.g. error codes
view history.
The new proposal uses JMS or a queue: a scalable approach, if you really need it,
and one that is completely in line with the J2EE specification, is to use JMS.
That is, publish your audit log messages to a message queue, and another,
separate process (Flume) can take them off the queue and log them in a
Cloud Data Center (CDC) based HBase NoSQL database.
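The queue-based proposal above can be sketched with Python's standard library, using queue.Queue and a thread as stand-ins for the JMS queue and the separate Flume consumer. The function names, event fields and the in-memory "sink" are illustrative assumptions:

```python
import queue
import threading

# Stand-in for the JMS queue; in the proposal, Flume would consume from the
# real queue and write into HBase. Here the sink is just a list.
audit_queue = queue.Queue()
hbase_sink = []  # stands in for the APPLICATION_AUDIT HBase table

def publish_audit(user, action):
    """Application code publishes audit events instead of writing to the DB."""
    audit_queue.put({"user": user, "action": action})

def consumer():
    """Separate process (Flume in the proposal) drains the queue into the sink."""
    while True:
        event = audit_queue.get()
        if event is None:  # poison pill to stop the consumer
            break
        hbase_sink.append(event)

t = threading.Thread(target=consumer)
t.start()
publish_audit("clerk1", "update error code")
publish_audit("clerk1", "view history")
audit_queue.put(None)
t.join()
print(len(hbase_sink))  # 2
```

The key property, decoupling the application thread from audit persistence, is what makes this approach scale: the MMIS request finishes as soon as the message is enqueued.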
Secondary Use Cases:
5) Near - Continuous data protection (CDP) or Backup & Recovery: Near-continuous
data protection (near CDP) is a general term for backup and recovery products that take
backup snapshots at set intervals. CDP technology protects data on a nearly continuous
basis. Rather than running a large monolithic backup overnight, CDP products back up
data every few minutes, 24 hours a day.
Advantages:
- N-CDP is a Hadoop-based backup solution that efficiently and cost-effectively
protects business-critical healthcare data such as databases and files.
- By default, Hadoop's replication features enable near-instant recovery from
disasters.
- By providing continuous and periodic protection, N-CDP allows organizations to
reduce or eliminate their tape-backup infrastructures, minimizing software
license and maintenance fees as well as hardware and tape costs.
- Recovery Point Objective (RPO) refers to the point in time in the past to which
you will recover.
- Recovery Time Objective (RTO) refers to the point in time in the future at which
you will be up and running again.
Difference between CDP and N-CDP:
CDP backs up the data for every action on the data, but N-CDP takes a backup at a
user defined regular interval.
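As a toy illustration of RPO under N-CDP: with snapshots taken at a fixed interval, everything after the last snapshot is lost on failure, so the worst-case data loss equals the interval. The 15-minute interval and timestamps below are purely illustrative:

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=15)  # assumed near-CDP interval

def data_loss(last_snapshot, failure_time):
    """RPO in practice: everything after the last snapshot is lost."""
    return failure_time - last_snapshot

last = datetime(2014, 1, 1, 10, 0)      # last snapshot taken at 10:00
failure = datetime(2014, 1, 1, 10, 12)  # disaster strikes at 10:12
loss = data_loss(last, failure)
print(loss <= SNAPSHOT_INTERVAL)  # True: loss never exceeds the interval
```

True CDP drives this loss window toward zero at the cost of capturing every write.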
Value to Payers:
Proudly say Payers are in the cloud and big data market.
The cloud based data centers can be subscribed to by other parties/states/payers with agreed
SLAs, so there is no separate data center maintenance required for each state or payer.
Reduce licensing, maintenance and support costs. Go with a no vendor lock-in
framework. Wherever possible, avoid licensed software running along with MMIS and go with
the proposed open source tools, e.g.:
a) Informatica - Use CDC based Hadoop ETL tools.
b) FTP Server - Use CDC.
c) COGNOS or Other BI tools - Use MMIS BI/BIRT/JASPERSOFT based open source
analytical tool.
d) Archive and backup tool - Use proposed approach.
Developing more operational and analytical use cases with this integration
will move Payers into the business intelligence tool market.
A SaaS/multi-tenant enabled MMIS BI application can be used by several customers with low
infrastructure maintenance cost.
Many more big data advantages.
Technical Specification:
High Level Architecture:
[High Level Architecture diagram: Providers, Agencies and Others exchange files over SFTP over
Hadoop with a tenant (e.g. Maryland MMIS) through Inbound and Outbound Landing Zones. Data
Marts (Claims/Reference/TPL, Member, Provider, Others) are built on HBase/Hive, Flume, Pig/Sqoop,
Oozie tools and YARN, and serve EHR/Cognos/Others, the MMIS BI/BIRT/JASPERSOFT DB and the
MMIS/HealthCare Payer DB.]
1) Unified File Management (UFM) - Heterogeneous Storage Solutions (HSS) [or] Cloud
Data Center (CDC) Flow:
Cloud Data Center:
i. External clients can upload files to their dedicated inbound directory through
FTP/SFTP.
ii. Here we use an Apache MINA based, customized SFTP server to support the Hadoop file
system.
iii. Once the files are placed, the MMIS listening queues pick the files from HDFS and
start claim processing as per the flow above.
iv. Also, MMIS BI is to be capable of exposing a REST enabled service to upload files.
MMIS BI Unified File Management:
i. UFM is one of the sub modules in the MMIS BI application.
ii. Using the SaaS MMIS BI, a user can view complete inbound and outbound file details
under a single point of access for the particular tenant.
iii. Different kinds of charts/metrics are used to monitor day to day file activities in CDC.
Note: the name Software As A Service (SaaS) MMIS BI indicates that the application is tenant
aware, so the same application services can be used by other subscribers.
[UFM/CDC flow diagram: EDI claims, paper claims (via Emdeon/OCR) and reference files arrive from
claims/ref FTP/SFTP clients over SFTP over Hadoop into CDC-HDFS. EDI claim flow: HIPAA validation,
HIPAA translation and the claims loader feed the MMIS DB. Paper claim flow: image archival and claim
OCR data loading. Reference file loading: a PL/SQL based reference file read and load process. The
SaaS MMIS BI Unified File Management module provides file monitoring across all flows.]
2) Unified Claims Archiving (UAC):
MMIS APP:
Through the MMIS application a user can perform different types of archival as per the above
diagram.
Once archival is initiated, the Sqoop module is triggered. This Sqoop module loads details from
the MMIS DB to the CDC based HBase DB. The claim HBase data structure is depicted in the diagram below.
MMIS BI:
Through the MMIS BI application a user can view different types of claims related charts. These
will read data from the HBase NoSQL DB.
One best example is "Operational Metrics", where a user can view paid claims through a
certain period of time.
Also the application supports "Analytical Metrics", where a user can view a claim forecast for a
certain period of time.
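A minimal sketch of the "Operational Metrics" paid-claims view, assuming illustrative field names and values rather than the real MMIS schema:

```python
from datetime import date

# Illustrative claim rows as they might be read back from the HBase archive.
claims = [
    {"tcn": "T001", "status": "PAID",   "paid_date": date(2014, 1, 10), "amount": 120.0},
    {"tcn": "T002", "status": "DENIED", "paid_date": None,              "amount": 0.0},
    {"tcn": "T003", "status": "PAID",   "paid_date": date(2014, 2, 3),  "amount": 80.5},
]

def paid_claims(rows, start, end):
    """Operational metric: paid claims within a reporting period."""
    return [r for r in rows
            if r["status"] == "PAID" and start <= r["paid_date"] <= end]

jan = paid_claims(claims, date(2014, 1, 1), date(2014, 1, 31))
print([r["tcn"] for r in jan], sum(r["amount"] for r in jan))
# ['T001'] 120.0
```

In the proposed system the same filter would run as an HBase scan or Hive query rather than over an in-memory list.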
[UAC diagram: from the MMIS APP a user can initiate provider type based, claim type based, date
wise, claim status based, quarterly or yearly archival. Apache Sqoop loads data from the MMIS DB
into the HBase NoSQL database (Claims Data Mart: archive table, backup table and other tables) in
the Cloud Data Center - Hadoop. MMIS BI reads HBase to present Claims Archive Operational Metrics
(past) and Claims Archive Analytical Metrics (future).]
HBase Claim Archival Sample Data Model:
Around 50 - 80 tables are involved in the claim adjudication related process. The below
section depicts mapping the OLTP claim data model to an HBase based NoSQL data model.
HBase currently does not do well with anything above two or three column families, so in
this design we have one column family for all header related tables and one for claim
line related tables.
In the below layout, all header related table entries go into the "HEADER FAMILY" and line
item related table entries go into the "LINE FAMILY".
<ChildTableName>_<RecordNumber>_<ColumnName> is the generic format used to insert the
values. This is nothing but mapping OLTP one-to-many relationships to NoSQL tables. E.g. an OLTP claim
header cutback table entry goes in as CUTBACK_1_QLFR: CUTBACK (table name),
1 (record number), QLFR (column name).
HBase table: CLM_ARC_TB
Row-Key: <claimfiledate_claimtype_providername>
HEADER FAMILY: TCN, CUTBACK_1_QLFR, TPL_1_AMT
LINE FAMILY: ATTACHMENT_1_NAME, PRVDR_1_LCTN, PROCEDURE_1_CODE
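The qualifier convention above can be sketched in Python: flatten below builds the <ChildTableName>_<RecordNumber>_<ColumnName> cells for one column family. The table names, column names and key values are illustrative:

```python
def row_key(claim_file_date, claim_type, provider_name):
    """Compose the row key in the <claimfiledate_claimtype_providername> format."""
    return f"{claim_file_date}_{claim_type}_{provider_name}"

def flatten(child_table, rows):
    """Map N one-to-many OLTP child rows into flat HBase column qualifiers:
    <ChildTableName>_<RecordNumber>_<ColumnName> -> value."""
    cells = {}
    for record_number, row in enumerate(rows, start=1):
        for column, value in row.items():
            cells[f"{child_table}_{record_number}_{column}"] = value
    return cells

# Two cutback child rows for one claim header collapse into one row's cells.
header_family = flatten("CUTBACK", [{"QLFR": "A1"}, {"QLFR": "B2"}])
print(row_key("20140101", "PROF", "DRSMITH"))
print(header_family)
# {'CUTBACK_1_QLFR': 'A1', 'CUTBACK_2_QLFR': 'B2'}
```

This is exactly the denormalization trade-off noted above: all child records for a claim live in one wide row, so a single row read reconstructs the whole claim.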
3) Extract, Transform and Load (ETL) Integration: Taxonomy codes, HCPCS, Correct Coding
Initiative (CCI), Diagnosis Related Group codes (DRG), Medicare Physician Fee Schedule (MPFS),
ICD-10 and Clinical Lab Fee Schedule codes are a few of the interface reference files which the payer
will receive from CMS/State/Others. All are claim reference codes used to adjudicate claims, and these
need to be updated periodically in the MMIS DB.
Proposed system:
Design Option 1:
- CMS/State/Others can place the reference files in CDC.
- MMIS DB procedures pick the files from CDC and start loading the file content
into the MMIS DB.
Design Option 2:
- CMS/State/Others can place the reference files in CDC.
- An Apache Pig application is the ETL transaction model that describes how a
process will extract data from CDC, transform it according to a rule set and
then load it into Apache Hive.
- Apache Sqoop loads the details from Apache Hive to the MMIS DB.
[ETL diagram: CMS/State/Others place reference files into the Cloud Data Center - Hadoop HDFS
(extract). Apache Pig transforms the data and loads it into the Hive Reference Data Mart reference
table. Apache Sqoop then moves the data from Hive into the MMIS DB.]
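The three Pig steps of Design Option 2 can be sketched as a Python stand-in. The pipe-delimited code|description file layout is an assumption for illustration; the real flow would read from HDFS and write into Hive:

```python
import io

# Assumed inbound reference file: pipe-delimited code|description records.
raw_file = io.StringIO("0001|OFFICE VISIT \n0002|lab test\n\n")

def extract(fh):
    """EXTRACT: read non-empty lines from the landing-zone file."""
    return [line.rstrip("\n") for line in fh if line.strip()]

def transform(lines):
    """TRANSFORM: split, trim and normalize each record per a simple rule set."""
    rows = []
    for line in lines:
        code, desc = line.split("|")
        rows.append({"code": code.strip(), "description": desc.strip().upper()})
    return rows

def load(rows, table):
    """LOAD: append the transformed rows to the (in-memory) Hive table."""
    table.extend(rows)

hive_reference_table = []
load(transform(extract(raw_file)), hive_reference_table)
print(hive_reference_table)
# [{'code': '0001', 'description': 'OFFICE VISIT'}, {'code': '0002', 'description': 'LAB TEST'}]
```

Each function corresponds to one arrow in the diagram above; Sqoop would then carry the Hive rows into the MMIS reference tables.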
4) Process Large Audit or Log Files:
- Whenever a user logs into the system, MMIS starts capturing user page actions in file
format.
- Apache Flume listens to this file, and whenever a row is added the information is moved to
the HBase DB.
- A user can use the BI tool to view the details in the allowed formats.
Security Audit:
[Security Audit diagram: the MMIS APP writes all user actions to a security log file; Apache Flume
live-streams the file into the SECURITY_AUDIT table in HBASE within the Cloud Data Center - Hadoop;
MMIS BI shows live data where logged in user actions can be seen.]
Application Audit:
[Application Audit diagram: the MMIS APP publishes all user modifications to a JMS queue; Apache
Flume moves them into the APPLICATION_AUDIT table in HBASE within the Cloud Data Center - Hadoop;
MMIS BI shows live data where logged in user modifications can be seen.]
5) Near - Continuous data protection (CDP) Backup & Recovery:
Proposed system:
Backup:
Design Option 1: one time full load and subsequent updates based on a time stamp.
- The MMIS BI integration module triggers the backup service every hour.
- The backup service calls the Java Sqoop client with the claim tables as parameters.
- One time activity: Sqoop connects to the MMIS DB and starts to import the
complete table data. It starts with the header table, and the child tables are
loaded iteratively.
- Sqoop supports an alternate table update strategy called lastmodified
mode: when rows of the source (MMIS) table are updated, each such update
sets the value of a last-modified column to the current timestamp. Only those
records get updated on the Hive side, and new records are inserted as usual in Hive.
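Sqoop's lastmodified mode boils down to a timestamp filter. The toy rows and column names below are assumptions; the real work is done by `sqoop import --incremental lastmodified --check-column <col> --last-value <timestamp>`:

```python
from datetime import datetime

# Toy source rows with a last-modified column, as Sqoop's lastmodified
# incremental mode expects; field names are illustrative.
source = [
    {"tcn": "T001", "status": "PAID",   "last_modified": datetime(2014, 1, 1, 9, 0)},
    {"tcn": "T002", "status": "DENIED", "last_modified": datetime(2014, 1, 1, 11, 30)},
]

def incremental_rows(rows, last_run):
    """Select only rows touched since the previous backup run."""
    return [r for r in rows if r["last_modified"] > last_run]

# Previous backup ran at 10:00, so only T002 (modified 11:30) is picked up.
changed = incremental_rows(source, datetime(2014, 1, 1, 10, 0))
print([r["tcn"] for r in changed])  # ['T002']
```

Sqoop records the new last-value after each run, so the next hourly trigger starts from where the previous one stopped.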
Design Option 2: a complete MMIS snapshot every time.
- The MMIS BI integration module triggers the backup service every hour.
- The backup service calls the Java Sqoop client with the claim tables as parameters.
- Every time, Sqoop imports the complete data from the MMIS DB: the header table
first and the child tables next.
- Each primary key is appended with the job trigger time, and this is used at
recovery time.
[Backup & Recovery diagram: at a regular interval, MMIS BI calls an ESB BackupService which runs a
Sqoop import of the claims header, claims line and other tables from the MMIS DB into Hive in the
Cloud Data Center - Hadoop. For recovery, the user enters a recovery point in the UI; a
RecoveryService loads data from the mentioned time via Sqoop export back into the MMIS DB.]
Recovery:
- If we choose Design Option 1, a recovery time is not required, because the backup
is an exact MMIS DB copy.
- If we choose Design Option 2, the recovery time is a list of trigger times in the
MMIS BI context. The user can choose any one of the times, and the snapshot data
obtained at that time will be loaded from the cloud to the MMIS DB.
- If in case a disaster happens to the MMIS DB, we can use the Recovery module to load
data from the cloud to MMIS.
- Sqoop will export the data from the Hive tables to the MMIS tables.
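For Design Option 2, the surrogate-keyed snapshots and the recovery-point selection can be sketched as follows; the <primary_key>_<trigger_time> key format and the sample data are illustrative assumptions:

```python
# Toy Hive backup keyed as <primary_key>_<trigger_time> (Design Option 2).
hive_backup = {
    "T001_2014010109": {"status": "PAID"},
    "T001_2014010110": {"status": "ADJUSTED"},
    "T002_2014010110": {"status": "DENIED"},
}

def recovery_points(table):
    """List of trigger times the user can pick from in MMIS BI."""
    return sorted({key.rsplit("_", 1)[1] for key in table})

def restore(table, trigger_time):
    """Rows belonging to the chosen snapshot, surrogate key stripped,
    ready for Sqoop export back to the MMIS tables."""
    return {key.rsplit("_", 1)[0]: row
            for key, row in table.items()
            if key.endswith("_" + trigger_time)}

print(recovery_points(hive_backup))                # ['2014010109', '2014010110']
print(sorted(restore(hive_backup, "2014010110")))  # ['T001', 'T002']
```

Note the trade-off against Design Option 1: every snapshot is fully restorable on its own, at the cost of storing the full table once per trigger.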
Hive Claim Backup Sample Data Model:
There are no differences between the MMIS data model and the Hive data model. More or less
all are the same for Design Option 1. For Design Option 2, all primary keys will be appended with an
additional timestamp surrogate key.