
Making Sense of Medicare Data: From Mining to Analytics


Tripfilms.com

The Achievement Network

Archway Health Advisors

Medicare

• Centers for Medicare & Medicaid Services (CMS)
• Medicare is a national social insurance program, in place since 1966, covering Americans aged 65 and older
• "Fee For Service" Model

Medicare Fee for Service

[Diagram: Hospital, SNF, Home Health, and Hospital (Readmit) each submit claims to CMS.]

Hospital             3 days      $3,000
SNF                  18 days     $12,000
Hospital (Readmit)   2 days      $5,000
Home Health          12 visits   $4,000
Total                            $24,000

Bundled Payments for Care

• Better care, smarter spending, and healthier people
• The Bundled Payments for Care Improvement (BPCI)
    Payment arrangements
    Based on financial and performance
• 4 Models
    Model 1: Retrospective, Acute Care Hospital Stay Only
    Model 2: Retrospective, Acute Care Hospital Stay and Post-Acute Care
    Model 3: Retrospective, Post-Acute Care Only
    Model 4: Acute Care Hospital Stay Only

Medicare BPCI

[Diagram: Hospital, SNF, Home Health, and Hospital (Readmit) around CMS; claims are shown for the Hospital and the SNF.]

Hospital   3 days    $3,000
SNF        18 days   $12,000

Episode: $20,000
$20,000 - ($3,000 + $12,000) = $5,000
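The arithmetic on the two payment slides can be checked with a short Python sketch. The dollar figures come from the slides; the dictionary keys, and the pairing of the $5,000 and $4,000 payments with the readmission and home health, are assumptions made for illustration.

```python
# Payments per provider under fee for service (figures from the slides;
# which provider received $5,000 and which $4,000 is assumed here).
fee_for_service = {
    "hospital": 3_000,
    "snf": 12_000,
    "hospital_readmit": 5_000,
    "home_health": 4_000,
}
ffs_total = sum(fee_for_service.values())

# In the bundled example the episode has a fixed price, and only the
# hospital stay and the SNF stay generate claims.
episode_price = 20_000
bundled_claims = {"hospital": 3_000, "snf": 12_000}
remaining = episode_price - sum(bundled_claims.values())

print(ffs_total)   # 24000
print(remaining)   # 5000
```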

Claims

• Statement of services and costs from a healthcare provider
    Patient information
    Diagnosis information
    Procedure(s) information
• Types
    Hospital (Inpatient) Claims
    Skilled Nursing Facility (SNF) Claims
    Home Health Agency (HHA) Claims
    …

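Episode of Care

[Figure: one episode of care along a time axis, made up of inpatient stays (IP 1, IP 2, IP 3), skilled nursing stays (SNF 1, SNF 2), and home health periods (HHA 1, HHA 2).]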

Mining Claims

[Figure: rows of IP, SNF, and HHA claims reduced to IP records. Labels: Hospital Stays, Post-Acute Care, Anchor, Episode Initiator.]

Technologies considered

• Pig (& Hadoop)
    Data Processing Language
    Procedural Language
    Relational-oriented
• SAS
    Business Analytics & BI Software
    De-facto standard in Healthcare Industry
    Proprietary

HPCC Systems: Quick Introduction

Rodrigo Pastrana, Consulting Software Engineer

WHT082311

What is HPCC Systems?

• Open Source distributed data-intensive computing platform
• Provides end-to-end Big Data workflow management, scheduler, integration, tools, etc.
• Runs on commodity computing/storage nodes
• Binary packages available for the most common Linux distributions
• Originally designed circa 1999 (predates the original paper on MapReduce from Dec. '04)
• Improved over a decade of real-world Big Data analytics
• In use across critical production environments throughout LexisNexis for more than 10 years

The HPCC Systems platform

The Three HPCC Systems components

1. HPCC Systems Data Refinery (Thor)
    • Massively parallel data processing engine
    • Enables data integration on a scale not previously available
    • Programmable using ECL

2. HPCC Systems Data Delivery Engine (Roxie)
    • A massively parallel, high-throughput query engine
    • Low latency, highly concurrent, and highly available
    • Several advanced strategies for efficient retrieval
    • Programmable using ECL

3. Enterprise Control Language (ECL)
    • An easy-to-use, declarative, data-centric programming language optimized for large-scale data management and query processing
    • Highly efficient: automatically distributes workload across all nodes and compiles to native machine code
    • Automatic parallelization and synchronization

Conclusion: End-to-end platform
• No need for any third-party tools

Enterprise Control Language (ECL)

• Declarative programming language: Describe what needs to be done and not how to do it
• Powerful: High-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: Modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism
• Maintainable: High-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: One language to express data algorithms across the entire HPCC Systems platform: data integration, analytics, and high-speed delivery
• Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more

Current Status and Resources

• HPCCSystems.com – Tutorials, Docs, Platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements
    • Drastic GUI improvements
    • Ganglia and Nagios plug-in for system monitoring and alerting
    • Security Enhancements – tighter authentication measures, intra-component communication encryption
    • Embedded Languages – Cassandra support, memcache and redis access
    • JSON based data support
    • Dynamic ESDL – Provides simple middleware/back-end interface definition
    • JAVA API project – facilitates interaction between Java based apps and HPCC web services and C++ tools
• Available now – HPCCSystems.com

Data mining with HPCC Systems

• Thor
    Responsible for processing vast amounts of data
    Optimized for Extraction, Transformation, Loading, Sorting, and Linking Data
• ECL
    Declarative
    More Data Centric
    Fast & Implicitly Parallel
    Inline data
    Unit Tests in ECL

SQL vs ECL

SQL:

    SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd

ECL:

    TABLE(
      inpatient_claims,
      {
        diag_group_cd,
        INTEGER volume := COUNT(GROUP),
        REAL costs := SUM(GROUP, pmt_amt)
      },
      diag_group_cd
    )

SQL:

    SELECT *
    FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
      ON LEFT.id = RIGHT.id

ECL:

    JOIN(
      inpatient_claims,
      ip_value_codes,
      LEFT.id = RIGHT.id
    )

SQL vs ECL

SQL:

    DECLARE my_cursor CURSOR FOR
      SELECT * FROM inpatient_claims
    OPEN my_cursor
    FETCH NEXT FROM my_cursor INTO …, …
    WHILE @@FETCH_STATUS = 0
    BEGIN
      …
    END
    CLOSE my_cursor
    DEALLOCATE my_cursor

ECL:

    ITERATE(
      inpatient_claims,
      TRANSFORM(inpatient_claim_layout,
        SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt);
        SELF := RIGHT
      )
    )
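For readers who do not know ECL, the record-by-record pass can be approximated in Python. This is only a sketch of the idea, not ECL: the `iterate` helper, the 365-day rule inside `is_one_year_or_greater`, and the sample claims are all assumptions made for illustration.

```python
from datetime import date

def is_one_year_or_greater(admsn_dt, dschrgdt):
    """Hypothetical helper: True when the stay lasts 365 days or more."""
    return (dschrgdt - admsn_dt).days >= 365

def iterate(records, transform):
    """Call transform(left, right) in order; left is the previous result."""
    out = []
    left = None  # stands in for the empty LEFT record seen by the first row
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out

# Hypothetical sample claims.
claims = [
    {"id": 1, "admsn_dt": date(2013, 1, 1), "dschrgdt": date(2013, 1, 4)},
    {"id": 2, "admsn_dt": date(2013, 1, 1), "dschrgdt": date(2014, 2, 1)},
]

flagged = iterate(
    claims,
    lambda left, right: {
        **right,
        "is_dropped": is_one_year_or_greater(right["admsn_dt"], right["dschrgdt"]),
    },
)
print([c["is_dropped"] for c in flagged])  # [False, True]
```

Unlike the cursor loop, the transform only describes what happens to one record; the loop itself is hidden in the helper.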

ECL ROLLUP

[Figure: ROLLUP over records R1 to R6. LEFT and RIGHT records pass through a transform (Tx), producing merged records RA and RB.]

ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
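The pairwise merge can be sketched in Python. This is an approximation of the ROLLUP idea rather than ECL itself; the `rollup` helper and the sample rows are assumptions made for illustration.

```python
def rollup(records, condition, transform):
    """Merge runs of adjacent records for which condition(left, right) holds."""
    out = []
    for right in records:
        if out and condition(out[-1], right):
            # The merged record becomes the LEFT side of the next comparison.
            out[-1] = transform(out[-1], right)
        else:
            out.append(right)
    return out

# Hypothetical rows: (key, amount).
rows = [("a", 1), ("a", 2), ("b", 5), ("a", 7)]
merged = rollup(
    rows,
    condition=lambda left, right: left[0] == right[0],
    transform=lambda left, right: (left[0], left[1] + right[1]),
)
print(merged)  # [('a', 3), ('b', 5), ('a', 7)]
```

Only adjacent records are compared, which is why the input is normally sorted first: the last `("a", 7)` row is not merged with the first two.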

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series of claims has patient (…) [as "still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
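The five rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the production logic: the `Claim` fields, the `still_patient` flag, and the rule used by `is_interim` (same beneficiary, provider, and admission date) are hypothetical, and the sketch only flags a stay under rule 5 instead of removing its claims.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Claim:
    bene_id: int
    provider: str
    admsn_dt: int                # YYYYMMDD
    dschrgdt: int                # YYYYMMDD
    ms_drg: str
    pmt_amt: float
    still_patient: bool = False  # hypothetical "still a patient" flag
    is_dropped: bool = False

def is_interim(left, right):
    # Assumption: claims of one stay share beneficiary, provider, admission date.
    return (left.bene_id, left.provider, left.admsn_dt) == (
        right.bene_id, right.provider, right.admsn_dt)

def merge_interim_claims(left, right):
    # Rules 1-2: keep the first admission date, take the last discharge date.
    # Rule 3: take the MS-DRG of the last claim. Rule 4: sum the costs.
    return replace(left, dschrgdt=right.dschrgdt, ms_drg=right.ms_drg,
                   pmt_amt=left.pmt_amt + right.pmt_amt,
                   still_patient=right.still_patient)

def build_stays(claims):
    ordered = sorted(
        claims, key=lambda c: (c.bene_id, c.provider, c.admsn_dt, c.dschrgdt))
    stays = []
    for claim in ordered:
        if stays and is_interim(stays[-1], claim):
            stays[-1] = merge_interim_claims(stays[-1], claim)
        else:
            stays.append(claim)
    # Rule 5: flag stays whose last claim says the patient was not discharged.
    return [replace(s, is_dropped=s.still_patient) for s in stays]

# Hypothetical sample claims.
claims = [
    Claim(1, "P1", 20140101, 20140110, "470", 3000.0),
    Claim(1, "P1", 20140101, 20140120, "469", 2000.0),
    Claim(2, "P9", 20140301, 20140305, "291", 1500.0, still_patient=True),
]
stays = build_stays(claims)
for stay in stays:
    print(stay)
```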

Processing Claims With ECL

    H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
    H_2 := ROLLUP(H_1,
                  is_interim(LEFT, RIGHT),
                  merge_interim_claims(LEFT, RIGHT));
    H_3 := JOIN(H_2, H_1,
                LEFT.bene_sk = RIGHT.bene_sk […],
                RIGHT ONLY);
    H_4 := PROJECT(H_3,
                   TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                     SELF.is_dropped := TRUE;
                     SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                     SELF := LEFT));
    H := H_2 + H_4;
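The RIGHT ONLY join keeps the records of the second input that have no match in the first, which here means the claims that were merged away by the rollup. A minimal Python sketch of that step, assuming a hypothetical `claim_id` key because the join condition on the slide is elided:

```python
def right_only(left_rows, right_rows, key):
    """Return the right-hand rows whose key has no match on the left."""
    left_keys = {key(row) for row in left_rows}
    return [row for row in right_rows if key(row) not in left_keys]

# Hypothetical data: h_1 holds every sorted claim, h_2 the rollup survivors.
h_1 = [{"claim_id": 1}, {"claim_id": 2}, {"claim_id": 3}]
h_2 = [{"claim_id": 1}, {"claim_id": 3}]

merged_away = right_only(h_2, h_1, key=lambda row: row["claim_id"])
flagged = [{**row, "is_dropped": True} for row in merged_away]
h = h_2 + flagged
print(h)
```

Keeping the flagged claims in the output, rather than discarding them, leaves a record of why each claim did not become a stay.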

Template Language

    EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
      LOADXML(pFileSet);
      baseDataDirectory := pBaseDataDirectory + pId + […];
      #FOR(folder)
        #UNIQUENAME(subId)
        %subId% := […];
        #UNIQUENAME(subDS)
        %subDS% := Client.Datasets(%subId%) [...];
        #UNIQUENAME(id)
        %id% := pId + […] + […];
        #UNIQUENAME(dataDir)
        %dataDir% := baseDataDirectory + […] + […];
        #UNIQUENAME(etl)
        %etl% := Client.ETL(%dataDir%, %id%);
        %etl%.run();
      #END
    ENDMACRO;

Template Language

    file_set := '<folders>' +
                '<folder>M201409</folder>' +
                '<folder>M201410</folder>' +
                '<folder>M201411</folder>' +
                '<folder>M201412</folder>' +
                '<folder>M201501</folder>' +
                '<folder>M201502</folder>' +
                '<folder>M201503</folder>' +
                '</folders>';
    load_all_client_files(1234, file_set, '/volume1/data/');

Beyond Processing Data

• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

Beyond Processing: Security

• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP

Beyond Processing: Workunits

• Workunit Identifier
• Attribution
• Query
• Timings
• Results

Beyond Processing: Collaboration

[Screenshots]

Beyond Processing: Unit Tests

    interim_claims := MODULE
      // Test Data
      test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 30420 )
                + BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 114090 )
                + ...;
      EXPORT Actual := Step2.ip_stays;
      SHARED TestSuite := MODULE
        EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou
        EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
      END;
      EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
    END;

Beyond Processing: Unit Tests (2)

    // Using inline dataset
    simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
    ip_claims := Samples.ip_claims(simple_ip_claims);

    // OR passing NAMED parameters
    ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );

    simplified_ip_layout := RECORD
      UNSIGNED bene_id;
      UNSIGNED claim_id;
      STRING   claim_type;
      STRING   provider_number;
      INTEGER4 through_date;
      STRING   status_code;
      INTEGER4 admission_date;
      INTEGER4 discharge_date;
      STRING   ms_drg_code;
    END;

Beyond Processing: Visualization

Custom Visualization

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49).]

Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49), split into three series: No readmit, 1 Readmit, 2 Readmits.]

Data Delivery: Roxie

• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services

Data Services

• Web Services over Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

Data Service Query

• Define service using ECL

    INTEGER4 oOffset    := 1        : STORED('Offset');
    INTEGER4 oResults   := 100      : STORED('Results');
    INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
    INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
    oParams := DATASET([ { oOffset, oResults, oStartDate, oEndDate, … } ],
                       Layouts.service_parameters_layout);

    T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
      RETURN TABLE(pData,
        {
          STRING bpid := pData.bpid;
          UNSIGNED INTEGER1 model := pData.model;
          UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
          INTEGER8 total_episodes := COUNT(GROUP);
          DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
          DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
          DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
        },
        bpid, model, post_dsch_prd_length);
    END;

    ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
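The grouped summary computed by the service (episode count, total, average, and standard deviation of costs per group) can be sketched in Python. The sample rows are hypothetical, and the use of the population standard deviation to mirror SQRT(VARIANCE(...)) is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical episode rows: (bpid, model, post_dsch_prd_length, payment).
episodes = [
    ("9999", 2, 90, 10_000.0),
    ("9999", 2, 90, 14_000.0),
    ("8888", 2, 30, 5_000.0),
]

def summarize(rows):
    """Group by (bpid, model, post_dsch_prd_length) and aggregate payments."""
    groups = defaultdict(list)
    for bpid, model, length, payment in rows:
        groups[(bpid, model, length)].append(payment)
    return {
        key: {
            "total_episodes": len(payments),
            "total_costs": sum(payments),
            "average_costs": mean(payments),
            "std_dev_costs": pstdev(payments),  # population standard deviation
        }
        for key, payments in groups.items()
    }

summary = summarize(episodes)
print(summary[("9999", 2, 90)])
```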

Data (Web) Services

https://.../WsEcl/forms/json/query/roxie/summary

Request:

    {
      "summary": {
        "offset": 1,
        "results": 10,
        "begin_date": 20130101,
        "end_date": 20130201,
        …
      }
    }

Response:

    {
      "summaryResponse": {
        …
        "Results": {
          …
          "Summary": {
            "Row": [
              {
                "bpid": "9999",
                "model": 2,
                "post_dsch_prd_length": 90,
                "total_episodes": 987,
                "total_costs": 987654321,
                "average_costs": 1235813,
                …
              }
            ]
          }
          …
        }
      }
    }

Data Services: WsECL

[Screenshots]

Loading up data

• Logical vs Physical

    ~abc::subfolder::subsubfolder::myfile
    abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster

    oDS := DATASET( Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                    Layouts.ip_claim_layout,
                    CSV(HEADING(0)) );
    oDSDistributed := DISTRIBUTE(oDS, bene_id);
    OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

    oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);

SuperFiles

• Super File = Symbolic link, list of sub-files
• Each sub-file must have the same layout

    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    …
    run_report() := FUNCTION
      oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
      RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;

• Including (more) data

    SEQUENTIAL(
      Std.File.StartSuperFileTransaction(),
      Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
      Std.File.FinishSuperFileTransaction()
    );

Data Services: Reusability

    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
      // Filtering data based on parameters
      #UNIQUENAME(DS)
      %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
      […]
      #UNIQUENAME(B)
      %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
      #UNIQUENAME(C)
      %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]
      […]
      #UNIQUENAME(report)
      %report% := pReportFunction(%K%);
      #UNIQUENAME(sorted)
      %sorted% := SORT(%report%, pSortByField);
      #UNIQUENAME(O1)
      %O1% := OUTPUT(pParams, NAMED('Request'));
      oSummary := DATASET([ { COUNT(%sorted%) } ], WS.Layouts.service_summary_layout);
      #UNIQUENAME(O2)
      %O2% := OUTPUT(oSummary, NAMED('Metadata'));
      #UNIQUENAME(O3)
      %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                     NAMED(pServiceName), ALL);
      PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
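The last step of the macro returns one page of the sorted report: a number of rows starting at a 1-based offset, plus a row count as metadata. A minimal Python sketch of that paging step follows; the function names, the `count` field, and the sample rows are assumptions made for illustration.

```python
def choosen(rows, n, start=1):
    """Return up to n rows beginning at the 1-based position start."""
    if start < 1:
        raise ValueError("start is 1-based")
    return rows[start - 1:start - 1 + n]

def run_report(rows, sort_key, results, offset):
    """Sort the report, then return the requested page and the total count."""
    ordered = sorted(rows, key=sort_key)
    return {
        "Metadata": {"count": len(ordered)},  # hypothetical field name
        "Rows": choosen(ordered, results, offset),
    }

# Hypothetical report rows.
rows = [{"bpid": b} for b in ["30", "10", "20", "40"]]
page = run_report(rows, sort_key=lambda r: r["bpid"], results=2, offset=2)
print(page)
```

Returning the total count alongside the page lets a client work out how many further requests it needs.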

AHA System Architecture

Archway Analytics

Thank you! Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Page 3: Making Sense of Medicare Data: From Mining to Analytics

3

The Achievement Network

4

Archway Health Advisors

Medicare bull Centers for Medicare amp Medicaid Services (CMS)

bull Medicare is a national social insurance program since

1966 covering Americans aged 65 and older

bull ldquoFee For Servicerdquo Model

5

Medicare Fee for Service

6

Hospital SNF

CMS

Claim Claim Claim Claims Claim Claim Claim Claims

Claim Claim Claim Claims Claim Claim Claim Claims

Home Health Hospital (Readmit)

3 days 18 days 2 days 12 visits

$3000 $12000 $5000 $4000

= $24000

Bundled Payments for Care
  • Better care, smarter spending, and healthier people
  • The Bundled Payments for Care Improvement (BPCI) initiative
      - Payment arrangements based on financial and performance accountability
  • 4 Models
      - Model 1: Retrospective, Acute Care Hospital Stay Only
      - Model 2: Retrospective, Acute Care Hospital Stay and Post-Acute Care
      - Model 3: Retrospective, Post-Acute Care Only
      - Model 4: Acute Care Hospital Stay Only

Medicare BPCI

[Diagram: providers (Hospital, SNF, Home Health, Hospital (Readmit)) and the claims sent to CMS, measured against a single episode amount.]

  Hospital    3 days     $3,000
  SNF         18 days    $12,000
  Episode                $20,000

  $20,000 - ($3,000 + $12,000) = $5,000
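The arithmetic on the two payment slides can be checked with a short Python sketch. The dollar figures come from the slides; the function names, and the pairing of each cost with a provider by slide order, are assumptions made for illustration.

```python
# Costs per provider from the fee-for-service slide.
# Pairing each cost with a provider by slide order is an assumption.
FEE_FOR_SERVICE_CLAIMS = {
    "Hospital": 3000,
    "SNF": 12000,
    "Hospital (Readmit)": 5000,
    "Home Health": 4000,
}

# The BPCI slide shows only the hospital and SNF stays against a $20,000 episode.
BUNDLED_CLAIMS = {"Hospital": 3000, "SNF": 12000}
EPISODE_PRICE = 20000


def total_paid(claims):
    """Fee for service: every claim is paid, so the total is a plain sum."""
    return sum(claims.values())


def episode_difference(episode_price, claims):
    """Bundled payment: the episode amount minus what was actually spent."""
    return episode_price - total_paid(claims)


print(total_paid(FEE_FOR_SERVICE_CLAIMS))                 # 24000
print(episode_difference(EPISODE_PRICE, BUNDLED_CLAIMS))  # 5000
```

Avoiding the readmission and the home health visits is what turns a $24,000 fee-for-service total into a $5,000 difference under the episode amount in this example.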

Claims
  • Statement of services and costs from a healthcare provider
      - Patient information
      - Diagnosis information
      - Procedure(s) information
  • Types
      - Hospital (Inpatient) Claims
      - Skilled Nursing Facility (SNF) Claims
      - Home Health Agency (HHA) Claims
      - …

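Episode of Care

[Timeline diagram: inpatient stays (IP 1, IP 2, IP 3), skilled nursing facility stays (SNF 1, SNF 2) and home health periods (HHA 1, HHA 2) plotted along a time axis.]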

Mining Claims

[Diagram: rows of mixed IP, SNF and HHA claims are separated into Hospital Stays (IP) and Post-Acute Care; labels mark the Anchor and the Episode Initiator.]

Technologies considered
  • Pig (& Hadoop)
      - Data processing language
      - Procedural language
      - Relational-oriented
  • SAS
      - Business analytics & BI software
      - De facto standard in the healthcare industry
      - Proprietary

HPCC Systems Quick Introduction

Rodrigo Pastrana, Consulting Software Engineer

WHT082311

What is HPCC Systems?
  • Open Source distributed data-intensive computing platform
  • Provides end-to-end Big Data workflow management, scheduler, integration, tools, etc.
  • Runs on commodity computing/storage nodes
  • Binary packages available for the most common Linux distributions
  • Originally designed circa 1999 (predates the original paper on MapReduce from Dec. '04)
  • Improved over a decade of real-world Big Data analytics
  • In use across critical production environments throughout LexisNexis for more than 10 years

The HPCC Systems platform

The Three HPCC Systems components

1. HPCC Systems Data Refinery (Thor)
  • Massively parallel data processing engine
  • Enables data integration on a scale not previously available
  • Programmable using ECL

2. HPCC Systems Data Delivery Engine (Roxie)
  • A massively parallel, high-throughput query engine
  • Low latency, highly concurrent, and highly available
  • Several advanced strategies for efficient retrieval
  • Programmable using ECL

3. Enterprise Control Language (ECL)
  • An easy-to-use, declarative, data-centric programming language optimized for large-scale data management and query processing
  • Highly efficient: automatically distributes workload across all nodes and compiles to native machine code
  • Automatic parallelization and synchronization

Conclusion: End-to-end platform
  • No need for any third-party tools

Enterprise Control Language (ECL)
  • Declarative programming language: describe what needs to be done, not how to do it
  • Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
  • Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
  • Implicitly parallel: parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism
  • Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
  • Complete: ECL provides a complete data programming paradigm
  • Homogeneous: one language to express data algorithms across the entire HPCC Systems platform (data integration, analytics, and high-speed delivery)
  • Polyglot: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more

Current Status and Resources
  • HPCCSystems.com: tutorials, docs, platform distributions, and more
  • Latest release, 5.2.0, adds many new features and improvements
      - Drastic GUI improvements
      - Ganglia and Nagios plug-ins for system monitoring and alerting
      - Security enhancements: tighter authentication measures, intra-component communication encryption
      - Embedded languages: Cassandra support, memcached and Redis access
      - JSON-based data support
      - Dynamic ESDL: provides simple middleware/back-end interface definition
      - Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
  • Available now: HPCCSystems.com

Data mining with HPCC Systems
  • Thor
      - Responsible for processing vast amounts of data
      - Optimized for Extraction, Transformation, Loading, Sorting and Linking Data
  • ECL
      - Declarative
      - More data-centric
      - Fast & implicitly parallel
      - Inline data
      - Unit tests in ECL

SQL vs ECL

SQL:
    SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd

ECL:
    TABLE(
        inpatient_claims,
        {
            diag_group_cd,
            INTEGER volume := COUNT(GROUP),
            REAL costs := SUM(GROUP, pmt_amt)
        },
        diag_group_cd
    )

SQL:
    SELECT *
    FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
        ON LEFT.id = RIGHT.id

ECL:
    JOIN(
        inpatient_claims,
        ip_value_codes,
        LEFT.id = RIGHT.id
    )

SQL vs ECL

SQL:
    DECLARE my_cursor CURSOR FOR
        SELECT * FROM inpatient_claims
    OPEN my_cursor
    FETCH NEXT FROM my_cursor INTO … …
    WHILE @@FETCH_STATUS = 0
    BEGIN
        …
    END
    CLOSE my_cursor
    DEALLOCATE my_cursor

ECL:
    ITERATE(
        inpatient_claims,
        TRANSFORM(inpatient_claim_layout,
            SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt);
            SELF := RIGHT
        )
    )

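ECL ROLLUP

[Diagram: records R1 to R6 are taken in pairs as LEFT and RIGHT; where the condition holds, the transformation (Tx) merges them into rolled-up records such as RA and RB.]

    ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )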
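The ROLLUP signature above can be mimicked in a few lines of Python to show the semantics: adjacent records are compared, and when the condition holds the transform's result takes the place of the left record and is compared with the next one. This is only an illustrative sketch; the function name and the sample tuples are not from the deck.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def rollup(records: Iterable[T],
           condition: Callable[[T, T], bool],
           transform: Callable[[T, T], T]) -> List[T]:
    """Merge runs of adjacent records for which condition(left, right) is true."""
    result: List[T] = []
    for right in records:
        if result and condition(result[-1], right):
            # The merged record becomes the LEFT side of the next comparison.
            result[-1] = transform(result[-1], right)
        else:
            result.append(right)
    return result


rows = [("a", 1), ("a", 2), ("b", 5), ("a", 7)]
merged = rollup(rows,
                lambda left, right: left[0] == right[0],
                lambda left, right: (left[0], left[1] + right[1]))
print(merged)  # [('a', 3), ('b', 5), ('a', 7)]
```

Only neighbouring records are merged, which is why the trailing ("a", 7) stays separate here and why the input is normally sorted before a rollup.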

Processing Claims
  1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
  2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
  3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
  4. Costs across all IP claims included in the single stay are aggregated to the stay level.
  5. Claims where the last in the series of claims has patient (…) [as "still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
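The five rules above can be sketched in Python as follows. The record fields, the `same_series` test for interim claims (same beneficiary, provider and admission date) and the use of a missing discharge date for "still a patient" are all assumptions for illustration; the deck does not show how its own `is_interim` check is defined.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Claim:
    bene_id: int
    provider: str
    admission: date
    discharge: Optional[date]  # None models "still a patient" (assumption)
    through: date
    ms_drg: str
    payment: float


@dataclass
class Stay:
    bene_id: int
    provider: str
    admission: date
    discharge: Optional[date]
    ms_drg: str
    total_payment: float
    is_dropped: bool = False

    @property
    def length_of_stay(self) -> Optional[int]:
        if self.discharge is None:
            return None
        return (self.discharge - self.admission).days


def same_series(stay: Stay, claim: Claim) -> bool:
    # Hypothetical interim test: same patient, provider and admission date.
    return (stay.bene_id, stay.provider, stay.admission) == (
        claim.bene_id, claim.provider, claim.admission)


def build_stays(claims: List[Claim]) -> List[Stay]:
    ordered = sorted(claims, key=lambda c: (c.bene_id, c.provider, c.admission, c.through))
    stays: List[Stay] = []
    for claim in ordered:
        if stays and same_series(stays[-1], claim):
            stay = stays[-1]
            stay.discharge = claim.discharge     # rules 1-2: last claim sets the discharge date
            stay.ms_drg = claim.ms_drg           # rule 3: discharge MS-DRG comes from the last claim
            stay.total_payment += claim.payment  # rule 4: costs aggregate to the stay level
        else:
            stays.append(Stay(claim.bene_id, claim.provider, claim.admission,
                              claim.discharge, claim.ms_drg, claim.payment))
    for stay in stays:
        stay.is_dropped = stay.discharge is None  # rule 5: flag series that never discharge
    return stays


claims = [
    Claim(1, "P1", date(2015, 1, 1), None, date(2015, 1, 31), "469", 1000.0),
    Claim(1, "P1", date(2015, 1, 1), date(2015, 2, 10), date(2015, 2, 10), "470", 2500.0),
    Claim(2, "P1", date(2015, 3, 1), None, date(2015, 3, 31), "291", 800.0),
]
stays = build_stays(claims)
```

In this sample the two claims for beneficiary 1 collapse into one 40-day stay costing 3500.0, while the single open-ended claim for beneficiary 2 is flagged as dropped.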

Processing Claims With ECL

    H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
    H_2 := ROLLUP(H_1,
                  is_interim(LEFT, RIGHT),
                  merge_interim_claims(LEFT, RIGHT));
    H_3 := JOIN(H_2, H_1,
                LEFT.bene_sk = RIGHT.bene_sk […],
                RIGHT ONLY);
    H_4 := PROJECT(H_3,
                   TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                             SELF.is_dropped := TRUE;
                             SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                             SELF := LEFT));
    H := H_2 + H_4;

Template Language

    EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
        LOADXML(pFileSet);
        baseDataDirectory := pBaseDataDirectory + pId + […];
        #FOR(folder)
            #UNIQUENAME(subId)
            %subId% := […];
            #UNIQUENAME(subDS)
            %subDS% := ClientDatasets(%subId%);
            […]
            #UNIQUENAME(id)
            %id% := pId + […] + […];
            #UNIQUENAME(dataDir)
            %dataDir% := baseDataDirectory + […] + […];
            #UNIQUENAME(etl)
            %etl% := ClientETL(%dataDir%, %id%);
            %etl%.run();
        #END
    ENDMACRO;

Template Language

    file_set := '<folders>' +
                '<folder>M201409</folder>' +
                '<folder>M201410</folder>' +
                '<folder>M201411</folder>' +
                '<folder>M201412</folder>' +
                '<folder>M201501</folder>' +
                '<folder>M201502</folder>' +
                '<folder>M201503</folder>' +
                '</folders>';
    load_all_client_files(1234, file_set, '/volume1/data/');
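To make the loop over folders concrete, here is a Python sketch of the same idea: parse the XML folder list and derive one data directory per folder. The function name and the path layout (base directory, then client id, then folder) are assumptions, since the string literals that build the paths are not legible in the macro, and the base path used below is only a placeholder.

```python
import posixpath
import xml.etree.ElementTree as ET


def client_data_dirs(client_id, file_set_xml, base_dir):
    """Return (folder, directory) pairs, one per <folder> element in the file set."""
    root = ET.fromstring(file_set_xml)
    pairs = []
    for folder in root.findall("folder"):
        name = folder.text.strip()
        # Assumed layout: <base_dir>/<client_id>/<folder>
        pairs.append((name, posixpath.join(base_dir, str(client_id), name)))
    return pairs


file_set = "<folders><folder>M201409</folder><folder>M201410</folder></folders>"
dirs = client_data_dirs(1234, file_set, "/volume1/data")
for name, directory in dirs:
    print(name, directory)
```

Adding a month of data then only means adding one more `<folder>` element to the list, which is the appeal of driving the ETL from a template.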

Beyond Processing Data
  • Security & Authentication
  • Collaboration
  • Unit Tests
  • Visualizations

Beyond Processing: Security
  • HTTPS
  • Htpasswd
  • LDAP support
  • File-level security when using LDAP

Beyond Processing: Workunits
  • Workunit Identifier
  • Attribution
  • Query
  • Timings
  • Results

Beyond Processing: Collaboration

Beyond Processing: Unit Tests

    interim_claims := MODULE
        // Test Data
        test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420) +
                    BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090) +
                    …
        EXPORT Actual := Step2.ip_stays
        SHARED TestSuite := MODULE
            EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…
            EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
        END;
        EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
    END;

Beyond Processing: Unit Tests (2)

    // Using inline dataset
    simple_ip_claims := DATASET([
        {1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61'}
    ], simplified_ip_layout);
    ip_claims := Samples.ip_claims(simple_ip_claims);

    // OR passing NAMED parameters
    ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

    simplified_ip_layout := RECORD
        UNSIGNED bene_id;
        UNSIGNED claim_id;
        STRING   claim_type;
        STRING   provider_number;
        INTEGER4 through_date;
        STRING   status_code;
        INTEGER4 admission_date;
        INTEGER4 discharge_date;
        STRING   ms_drg_code;
    END;

Beyond Processing: Visualization

Custom Visualization

(No) Insights

[Chart: number of episodes (0 to 80) against costs in $1000 (1 to 49).]

Insights

[Chart: same axes, with episodes split into No readmit, 1 Readmit and 2 Readmits.]

Data Delivery: Roxie
  • Data Delivery Engine
  • Indexed, compressed, and in-memory
  • Data Warehouse capabilities
  • Data Services

Data Services
  • Web Services over Data Warehouse
  • XML/SOAP, but also JSON
  • Web Services defined in ECL
  • Solution to add/remove data from the cluster

Data Service Query
  • Define service using ECL

    INTEGER4 oOffset    := 1        : STORED('Offset');
    INTEGER4 oResults   := 100      : STORED('Results');
    INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
    INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
    oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}],
                       Layouts.service_parameters_layout);

    T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
        RETURN TABLE(pData,
                     {
                         STRING bpid := pData.bpid,
                         UNSIGNED INTEGER1 model := pData.model,
                         UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
                         INTEGER8 total_episodes := COUNT(GROUP),
                         DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
                         DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
                         DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))
                     },
                     bpid, model, post_dsch_prd_length);
    END;

    ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
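As a cross-check on what the summary TABLE above computes, the same grouping and aggregates can be written in Python. The field names follow the ECL on the slide; the sample rows are invented, and treating VARIANCE as a population variance is an assumption worth verifying against the ECL documentation.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def summarize(rows):
    """Group episode rows by (bpid, model, post_dsch_prd_length) and aggregate costs."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["bpid"], row["model"], row["post_dsch_prd_length"])
        groups[key].append(row["sum_post_dsch_prd_pay"])

    summary = []
    for (bpid, model, length), costs in sorted(groups.items()):
        summary.append({
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(costs),
            "total_costs": round(sum(costs), 2),
            "average_costs": round(mean(costs), 2),
            # Population variance is assumed here.
            "std_dev_costs": round(sqrt(pvariance(costs)), 2),
        })
    return summary


# Invented sample rows, for illustration only.
rows = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 100.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 200.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 300.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 50.0},
]
summary = summarize(rows)
```

Each output row corresponds to one (bpid, model, post-discharge period length) combination, which is the shape the web service returns in its Summary rows.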

Data (Web) Services

https://…/WsEcl/forms/json/query/roxie/summary

Request:
    summary
        offset: 1
        results: 10
        begin_date: 20130101
        end_date: 20130201
        …

Response:
    summaryResponse
        …
        Results
            …
            Summary
                Row [
                    bpid: 9999
                    model: 2
                    post_dsch_prd_length: 90
                    total_episodes: 987
                    total_costs: 987654321
                    average_costs: 1235813
                    …
                ]
        …

Data Services: WsECL

Loading up data
  • Logical vs Physical

        ~abc::subfolder::subsubfolder::myfile
        /abc/subfolder/subsubfolder/myfile

  • ECL to load data into cluster

        IMPORT Std;
        oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                       Layouts.ip_claim_layout,
                       CSV(HEADING(0)));
        oDSDistributed := DISTRIBUTE(oDS, bene_id);
        OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

  • ECL to use data loaded into cluster

        oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);

SuperFiles
  • Super File = symbolic link to a list of sub-files
  • Each sub-file must have the same layout

        WEBLOGS_FILE := '~somewhere::logs::web';
        Std.File.CreateSuperFile(WEBLOGS_FILE);
        …
        run_report() := FUNCTION
            oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
            RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
        END;

  • Including (more) data

        SEQUENTIAL(
            Std.File.StartSuperFileTransaction(),
            Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
            Std.File.FinishSuperFileTransaction()
        );

Data Services: Reusability

    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
        // Filtering data based on parameters
        #UNIQUENAME(DS)
        %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
        […]
        #UNIQUENAME(B)
        %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
        #UNIQUENAME(C)
        %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]
        […]
        #UNIQUENAME(report)
        %report% := pReportFunction(%K%);
        #UNIQUENAME(sorted)
        %sorted% := SORT(%report%, pSortByField);
        #UNIQUENAME(O1)
        %O1% := OUTPUT(pParams, NAMED('Request'));
        oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
        #UNIQUENAME(O2)
        %O2% := OUTPUT(oSummary, NAMED('Metadata'));
        #UNIQUENAME(O3)
        %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                       NAMED(pServiceName), ALL);
        PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
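The tail end of the macro (report, sort, then return the request, a total count and one page of rows) can be sketched in Python. The `choosen` helper mirrors the 1-based start position of ECL's CHOOSEN; the function names and the "total" key are hypothetical, and the parameter-based filtering steps, which are elided in the deck, are left out.

```python
def choosen(records, n, startpos=1):
    """Return up to n records starting at the 1-based position startpos."""
    if n < 0 or startpos < 1:
        raise ValueError("n must be >= 0 and startpos must be >= 1")
    start = startpos - 1
    return list(records[start:start + n])


def run_service(rows, report_function, sort_key, offset, results):
    """Build the three outputs of the service: request echo, metadata, one page of results."""
    ordered = sorted(report_function(rows), key=sort_key)
    return {
        "Request": {"offset": offset, "results": results},
        "Metadata": {"total": len(ordered)},  # hypothetical key name
        "Results": choosen(ordered, results, offset),
    }


response = run_service([5, 3, 1, 4, 2],
                       report_function=lambda rows: rows,
                       sort_key=lambda row: row,
                       offset=2, results=2)
print(response["Results"])  # [2, 3]
```

Returning the total alongside the page lets a client compute how many pages exist without issuing a second query.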

AHA System Architecture

Archway Analytics

Thank you! Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Page 5: Making Sense of Medicare Data: From Mining to Analytics

Medicare bull Centers for Medicare amp Medicaid Services (CMS)

bull Medicare is a national social insurance program since

1966 covering Americans aged 65 and older

bull ldquoFee For Servicerdquo Model

5

Medicare Fee for Service

6

Hospital SNF

CMS

Claim Claim Claim Claims Claim Claim Claim Claims

Claim Claim Claim Claims Claim Claim Claim Claims

Home Health Hospital (Readmit)

3 days 18 days 2 days 12 visits

$3000 $12000 $5000 $4000

= $24000

Bundled Payments for Care bull Better care smarter spending and healthier people

bull The Bundled Payments for Care Improvement (BPCI)

Payment arrangements Based on financial and performance

bull 4 Models

Model 1 Retrospective Acute Care Hospital Stay Only Model 2 Retrospective Acute Care Hospital Stay And Post-

acute care Model 3 Retrospective Post-Acute Care Only Model 4 Acute Care Hospital Stay Only

7

Medicare BPCI

8

Hospital SNF

CMS

Claim Claim Claim Claims Claim Claim Claim Claims

Home Health Hospital (Readmit)

3 days 18 days

Episode $20000

$20000 - ($3000 + $12000) = $5000

$3000 $12000

Claims bull Statement of services and costs from a healthcare

provider Patient information Diagnosis information Procedure(s) information

bull Types

Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims hellip

9

10

Episode of Care

SNF 1 SNF 2

IP 1

IP 2

SNF 2

IP 3

HHA 1

HHA 2

time

Mining Claims

11

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP

IP

IP

IP

IP

Hospital Stays

Post-Acute Care

Anchor

Episode Initiator

Technologies considered bull Pig (amp Hadoop)

Data Processing Language Procedural Language Relational-oriented

bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary

12

HPCC Systems: Quick Introduction
Rodrigo Pastrana, Consulting Software Engineer

What is HPCC Systems?
• Open Source distributed data-intensive computing platform
• Provides end-to-end Big Data workflow management, scheduler, integration tools, etc.
• Runs on commodity computing/storage nodes
• Binary packages available for the most common Linux distributions
• Originally designed circa 1999 (predates the original paper on MapReduce from Dec '04)
• Improved over a decade of real-world Big Data analytics
• In use across critical production environments throughout LexisNexis for more than 10 years

The HPCC Systems platform
The Three HPCC Systems components
1. HPCC Systems Data Refinery (Thor)
   • Massively parallel data processing engine
   • Enables data integration on a scale not previously available
   • Programmable using ECL
2. HPCC Systems Data Delivery Engine (Roxie)
   • A massively parallel, high-throughput query engine
   • Low latency, highly concurrent, and highly available
   • Several advanced strategies for efficient retrieval
   • Programmable using ECL
3. Enterprise Control Language (ECL)
   • An easy-to-use, declarative, data-centric programming language optimized for large-scale data management and query processing
   • Highly efficient: automatically distributes workload across all nodes, compiles to native machine code
   • Automatic parallelization and synchronization
Conclusion: End-to-end platform
   • No need for any third-party tools

Enterprise Control Language (ECL)
• Declarative programming language: describe what needs to be done, not how to do it
• Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform: data integration, analytics, and high-speed delivery
• Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more

Current Status and Resources
• HPCCSystems.com - Tutorials, Docs, Platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements
    • Drastic GUI improvements
    • Ganglia and Nagios plug-in for system monitoring and alerting
    • Security enhancements - tighter authentication measures, intra-component communication encryption
    • Embedded languages - Cassandra support, memcache and redis access
    • JSON-based data support
    • Dynamic ESDL - provides simple middleware/back-end interface definition
    • JAVA API project - facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now - HPCCSystems.com

Data mining with HPCC Systems
• Thor
    Responsible for processing vast amounts of data
    Optimized for Extraction, Transformation, Loading, Sorting, and Linking Data
• ECL
    Declarative
    More Data Centric
    Fast & Implicitly Parallel
    Inline data
    Unit Tests in ECL

SQL vs ECL
SQL:
    SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd
ECL:
    TABLE(inpatient_claims,
          { diag_group_cd,
            INTEGER volume := COUNT(GROUP),
            REAL costs := SUM(GROUP, pmt_amt) },
          diag_group_cd)
SQL:
    SELECT *
    FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
    ON LEFT.id = RIGHT.id
ECL:
    JOIN(inpatient_claims, ip_value_codes, LEFT.id = RIGHT.id)

SQL vs ECL
SQL:
    DECLARE my_cursor CURSOR FOR SELECT * FROM inpatient_claims
    OPEN my_cursor
    FETCH NEXT FROM my_cursor INTO …
    WHILE @@FETCH_STATUS = 0
    BEGIN
      …
    END
    CLOSE my_cursor
    DEALLOCATE my_cursor
ECL:
    ITERATE(inpatient_claims,
            TRANSFORM(inpatient_claim_layout,
                      SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt),
                      SELF := RIGHT))

ECL ROLLUP
[Diagram: records R1 to R6 are compared pairwise as LEFT and RIGHT; matching pairs pass through the transform Tx and are merged into RA and RB]
    ROLLUP(dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT))
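The ROLLUP pattern is easier to follow in a few lines of ordinary code. The Python sketch below imitates the behaviour described on the slide and is not ECL itself: adjacent records are compared as LEFT and RIGHT, a matching pair is merged by the transform, and the merged record becomes the LEFT of the next comparison. The function name and the sample rows are assumptions made for illustration.

```python
def rollup(records, condition, transform):
    """Merge runs of adjacent records for which condition(left, right) holds."""
    result = []
    left = None
    for right in records:
        if left is None:
            left = right                    # first record starts the first run
        elif condition(left, right):
            left = transform(left, right)   # merged record is the next LEFT
        else:
            result.append(left)             # run ended: emit it, start a new one
            left = right
    if left is not None:
        result.append(left)
    return result


rows = [("a", 1), ("a", 2), ("b", 5), ("a", 7), ("a", 1)]
merged = rollup(
    rows,
    condition=lambda l, r: l[0] == r[0],
    transform=lambda l, r: (l[0], l[1] + r[1]),
)
print(merged)  # [('a', 3), ('b', 5), ('a', 8)]
```

Only neighbouring records are merged, which is why the input is normally sorted first.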

Processing Claims
1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series of claims has patient (…) [as "still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
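The five rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ECL used in the talk: a series of interim claims is assumed to share beneficiary, provider, and admission date, and the `still_patient` flag is a hypothetical stand-in for the status value elided on the slide.

```python
from dataclasses import dataclass, replace
from itertools import groupby


@dataclass(frozen=True)
class Claim:
    bene_id: int
    provider: str
    admission: int               # dates as YYYYMMDD integers
    discharge: int
    ms_drg: str
    cost: float
    still_patient: bool = False  # hypothetical flag for "still a patient"


def merge_stay(series):
    """Rules 1 to 4: admission date from the first claim, discharge date and
    MS-DRG from the last claim, costs summed over the whole series."""
    first, last = series[0], series[-1]
    return replace(last, admission=first.admission,
                   cost=sum(c.cost for c in series))


def build_stays(claims):
    """Return (stays, dropped). Rule 5: a series whose last claim is still
    a patient is dropped in full."""
    def key(c):
        return (c.bene_id, c.provider, c.admission)  # assumed series key

    ordered = sorted(claims, key=lambda c: key(c) + (c.discharge,))
    stays, dropped = [], []
    for _, group in groupby(ordered, key=key):
        series = list(group)
        if series[-1].still_patient:
            dropped.extend(series)
        else:
            stays.append(merge_stay(series))
    return stays, dropped


claims = [
    Claim(1, "P1", 20150101, 20150110, "470", 1000.0),
    Claim(1, "P1", 20150101, 20150120, "469", 2500.0),
    Claim(2, "P9", 20150205, 20150215, "291", 800.0, still_patient=True),
]
stays, dropped = build_stays(claims)
print(stays[0].discharge, stays[0].ms_drg, stays[0].cost)  # 20150120 469 3500.0
```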

Processing Claims With ECL
    H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
    H_2 := ROLLUP(H_1,
                  is_interim(LEFT, RIGHT),
                  merge_interim_claims(LEFT, RIGHT));
    H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY);
    H_4 := PROJECT(H_3,
                   TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                             SELF.is_dropped := TRUE,
                             SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
                             SELF := LEFT));
    H := H_2 + H_4;

Template Language
    EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
      LOADXML(pFileSet)
      baseDataDirectory := pBaseDataDirectory + pId + …
      #FOR(folder)
        #UNIQUENAME(subId)
        %subId% := …
        #UNIQUENAME(subDS)
        %subDS% := Client.Datasets(%subId%)
        [...]
        #UNIQUENAME(id)
        %id% := pId + … + …
        #UNIQUENAME(dataDir)
        %dataDir% := baseDataDirectory + … + …
        #UNIQUENAME(etl)
        %etl% := Client.ETL(%dataDir%, %id%)
        %etl%.run()
      #END
    ENDMACRO

Template Language
    file_set := '<folders>' +
                '<folder>M201409</folder>' +
                '<folder>M201410</folder>' +
                '<folder>M201411</folder>' +
                '<folder>M201412</folder>' +
                '<folder>M201501</folder>' +
                '<folder>M201502</folder>' +
                '<folder>M201503</folder>' +
                '</folders>';
    load_all_client_files(1234, file_set, '/volume1/data/');
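The idea behind the template macro is to run the same ETL steps once per monthly folder listed in a small XML document. The Python sketch below shows only that expansion step. The function name and the way the directory path is assembled are assumptions, because the string pieces of the original macro did not survive extraction.

```python
import xml.etree.ElementTree as ET


def client_data_dirs(client_id, file_set_xml, base_dir):
    """Return one data directory per <folder> element in the file set.
    The base/client/folder layout is an assumed convention."""
    root = ET.fromstring(file_set_xml)
    return [f"{base_dir}/{client_id}/{folder.text}"
            for folder in root.findall("folder")]


months = ["M201409", "M201410", "M201411"]
file_set = "<folders>" + "".join(f"<folder>{m}</folder>" for m in months) + "</folders>"

dirs = client_data_dirs(1234, file_set, "/volume1/data")  # hypothetical base path
for d in dirs:
    print(d)
```

Each resulting directory would then be handed to the per-folder ETL step, which is the role of the loop body in the macro.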

Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP

Beyond Processing: Workunits
• Workunit Identifier
• Attribution
• Query
• Timings
• Results

Beyond Processing: Collaboration
[Four screenshot slides]

Beyond Processing: Unit Tests
    interim_claims := MODULE
      // Test Data
      test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420)
                + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090)
                + …
      EXPORT Actual := Step2.ip_stays;
      SHARED TestSuite := MODULE
        EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…
        EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
      END;
      EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
    END;

Beyond Processing: Unit Tests (2)
    // Using inline dataset
    simple_ip_claims := DATASET([
      {1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61'}
    ], simplified_ip_layout);
    ip_claims := Samples.ip_claims(simple_ip_claims);
    // OR passing NAMED parameters
    ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

    simplified_ip_layout := RECORD
      UNSIGNED bene_id;
      UNSIGNED claim_id;
      STRING claim_type;
      STRING provider_number;
      INTEGER4 through_date;
      STRING status_code;
      INTEGER4 admission_date;
      INTEGER4 discharge_date;
      STRING ms_drg_code;
    END;

Beyond Processing: Visualization
[Screenshot slide]

Custom Visualization
[Screenshot slide]

(No) Insights
[Chart: Episodes (y-axis, 0 to 80) against Costs in $1000 (x-axis, 1 to 49)]

Insights
[Chart: Episodes (y-axis, 0 to 80) against Costs in $1000 (x-axis, 1 to 49), split into three series: No readmit, 1 Readmit, 2 Readmits]

Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services

Data Services
• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

Data Service Query
• Define service using ECL
    INTEGER4 oOffset := 1 : STORED('Offset');
    INTEGER4 oResults := 100 : STORED('Results');
    INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
    INTEGER4 oEndDate := 20140201 : STORED('End_Date');
    oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}],
                       Layouts.service_parameters_layout);
    T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
      RETURN TABLE(pData,
        { STRING bpid := pData.bpid,
          UNSIGNED INTEGER1 model := pData.model,
          UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
          INTEGER8 total_episodes := COUNT(GROUP),
          DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
          DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
          DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)) },
        bpid, model, post_dsch_prd_length);
    END;
    ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
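The report function groups episodes by bpid, model, and post-discharge period length, then computes count, total, average, and standard deviation of the episode payments. The Python sketch below shows the same aggregation on a handful of made-up rows. The function name and sample values are illustrative, and it assumes the standard deviation is the population one.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def summarize(rows):
    """Group rows by (bpid, model, post_dsch_prd_length) and aggregate costs."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["bpid"], row["model"], row["post_dsch_prd_length"])
        groups[key].append(row["sum_post_dsch_prd_pay"])
    return [
        {
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(pays),
            "total_costs": sum(pays),
            "average_costs": mean(pays),
            "std_dev_costs": sqrt(pvariance(pays)),  # population std dev (assumed)
        }
        for (bpid, model, length), pays in sorted(groups.items())
    ]


# Made-up sample rows, for illustration only.
rows = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 100.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 300.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 50.0},
]
summary = summarize(rows)
for line in summary:
    print(line)
```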

Data (Web) Services
https://…/WsEcl/forms/json/query/roxie/summary
Request:
    { "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }
Response:
    { "summaryResponse": { … "Results": { … "Summary": { "Row": [
        { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987,
          "total_costs": 987654321, "average_costs": 1235813, … }
    ] } … } } }

Data Services: WsECL
[Two screenshot slides]

Loading up data
• Logical vs Physical
    ~abc::subfolder::subsubfolder::myfile
    /abc/subfolder/subsubfolder/myfile
• ECL to load data into cluster
    oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                   Layouts.ip_claim_layout,
                   CSV(HEADING(0)));
    oDSDistributed := DISTRIBUTE(oDS, bene_id);
    OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);
• ECL to use data loaded into cluster
    oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);

SuperFiles
• Super File = Symbolic link / list of sub-files
• Each sub-file must have the same layout
    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    …
    run_report() := FUNCTION
      oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
      RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;
• Including (more) data
    SEQUENTIAL(
      Std.File.StartSuperFileTransaction(),
      Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
      Std.File.FinishSuperFileTransaction()
    );
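The point of a superfile is that a report reads one logical name while the set of underlying sub-files grows over time. The Python sketch below models that idea only. The class, its methods, and the sample log rows are hypothetical and are not part of the HPCC Systems API.

```python
from collections import Counter


class SuperFile:
    """Toy model of a superfile: an ordered list of sub-files that share
    one layout and are read as a single dataset."""

    def __init__(self):
        self.sub_files = []

    def add_sub_file(self, rows):
        self.sub_files.append(rows)

    def read(self):
        for sub_file in self.sub_files:
            yield from sub_file


def run_report(superfile):
    """Count hits per ip_address across every sub-file."""
    return Counter(row["ip_address"] for row in superfile.read())


weblogs = SuperFile()
weblogs.add_sub_file([{"ip_address": "10.0.0.1"}, {"ip_address": "10.0.0.2"}])
before = run_report(weblogs)

# Including more data: the report code does not change.
weblogs.add_sub_file([{"ip_address": "10.0.0.1"}])
after = run_report(weblogs)

print(before["10.0.0.1"], after["10.0.0.1"])  # 1 2
```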

Data Services: Reusability
    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
      // Filtering data based on parameters
      #UNIQUENAME(DS)
      %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
      […]
      #UNIQUENAME(B)
      %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
      #UNIQUENAME(C)
      %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]
      #UNIQUENAME(report)
      %report% := pReportFunction(%K%);
      #UNIQUENAME(sorted)
      %sorted% := SORT(%report%, pSortByField);
      #UNIQUENAME(O1)
      %O1% := OUTPUT(pParams, NAMED('Request'));
      oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
      #UNIQUENAME(O2)
      %O2% := OUTPUT(oSummary, NAMED('Metadata'));
      #UNIQUENAME(O3)
      %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                     NAMED(pServiceName), ALL);
      PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
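The shared service macro follows a fixed pipeline: filter by the request parameters, run the report function, sort, then return one page of results together with the request and a row count. A rough Python equivalent is sketched below. The dictionary keys, the `total` field, and the sample rows are assumptions, and the offset is treated as 1-based because it is passed as the starting record of CHOOSEN.

```python
def run_it(rows, params, report_function, sort_by_field):
    """Filter, report, sort, and page, mirroring the steps of the macro."""
    filtered = rows
    if params.get("providers"):          # an empty list means "no filter"
        filtered = [r for r in filtered if r["provider_id"] in params["providers"]]

    report = report_function(filtered)
    ordered = sorted(report, key=lambda r: r[sort_by_field])

    start = params["offset"] - 1         # 1-based offset to 0-based index
    page = ordered[start:start + params["results"]]
    return {
        "Request": params,
        "Metadata": {"total": len(ordered)},  # hypothetical field name
        "Results": page,
    }


# Made-up rows, for illustration only.
rows = [
    {"provider_id": "A", "bpid": "3"},
    {"provider_id": "B", "bpid": "1"},
    {"provider_id": "A", "bpid": "2"},
    {"provider_id": "A", "bpid": "5"},
    {"provider_id": "C", "bpid": "4"},
]
params = {"offset": 2, "results": 2, "providers": ["A"]}
response = run_it(rows, params, lambda data: data, "bpid")
print(response["Metadata"], [r["bpid"] for r in response["Results"]])
```

Keeping the filtering and paging in one place means each new service only supplies its own report function and sort field.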

AHA System Architecture
[Diagram slide]

Archway Analytics
[Screenshot slide]

Thank you! Questions? Feedback?
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal: http://hpccsystems.com

Page 7: Making Sense of Medicare Data: From Mining to Analytics

Bundled Payments for Care bull Better care smarter spending and healthier people

bull The Bundled Payments for Care Improvement (BPCI)

Payment arrangements Based on financial and performance

bull 4 Models

Model 1 Retrospective Acute Care Hospital Stay Only Model 2 Retrospective Acute Care Hospital Stay And Post-

acute care Model 3 Retrospective Post-Acute Care Only Model 4 Acute Care Hospital Stay Only

7

Medicare BPCI

8

Hospital SNF

CMS

Claim Claim Claim Claims Claim Claim Claim Claims

Home Health Hospital (Readmit)

3 days 18 days

Episode $20000

$20000 - ($3000 + $12000) = $5000

$3000 $12000

Claims bull Statement of services and costs from a healthcare

provider Patient information Diagnosis information Procedure(s) information

bull Types

Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims hellip

9

10

Episode of Care

SNF 1 SNF 2

IP 1

IP 2

SNF 2

IP 3

HHA 1

HHA 2

time

Mining Claims

11

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP

IP

IP

IP

IP

Hospital Stays

Post-Acute Care

Anchor

Episode Initiator

Technologies considered bull Pig (amp Hadoop)

Data Processing Language Procedural Language Relational-oriented

bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary

12

HPCC Systems Quick Introduction

Rodrigo Pastrana -Consulting Software Engineer

WHT082311

What is HPCC Systems

14

bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler

integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on

MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout

LexisNexis for more than 10 years

WHT082311

The HPCC Systems platform

15

WHT082311

bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL

HPCC Systems Data Refinery (Thor)

HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL

Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing

bull Highly efficient automatically distributes workload across all nodes compiles to native machine code

bull Automatic parallelization and synchronization

1

2

3

The Three HPCC Systems components

Conclusion End to End platform bull No need for any third party tools

16

WHT082311

Enterprise Control Language (ECL)

• Declarative programming language: describe what needs to be done, not how to do it
• Powerful: high-level data activities such as JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: parallelism is built into the underlying platform; the programmer need not be concerned with data partitioning and parallelism
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform (data integration, analytics, and high-speed delivery)
• Polyglot: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more

Current Status and Resources

• HPCCSystems.com: tutorials, docs, platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements
  • Drastic GUI improvements
  • Ganglia and Nagios plug-in for system monitoring and alerting
  • Security enhancements: tighter authentication measures, intra-component communication encryption
  • Embedded languages: Cassandra support, memcache and redis access
  • JSON-based data support
  • Dynamic ESDL: provides simple middleware/back-end interface definition
  • Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now: HPCCSystems.com

Data mining with HPCC Systems

• Thor
  Responsible for processing vast amounts of data
  Optimized for Extraction, Transformation, Loading, Sorting and Linking Data

• ECL
  Declarative
  More data-centric
  Fast & implicitly parallel
  Inline data
  Unit tests in ECL

SQL vs ECL

Grouped aggregate, SQL:

  SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
  FROM inpatient_claims
  GROUP BY diag_group_cd;

Grouped aggregate, ECL:

  TABLE(
    inpatient_claims,
    {
      diag_group_cd,
      INTEGER volume := COUNT(GROUP),
      REAL costs := SUM(GROUP, pmt_amt)
    },
    diag_group_cd
  );

Join, SQL (the aliases l and r stand in for ECL's LEFT and RIGHT):

  SELECT *
  FROM inpatient_claims AS l
  JOIN ip_value_codes AS r
    ON l.id = r.id;

Join, ECL:

  JOIN(
    inpatient_claims,
    ip_value_codes,
    LEFT.id = RIGHT.id
  );

SQL vs ECL

Row-by-row processing, SQL (cursor):

  DECLARE my_cursor CURSOR FOR
    SELECT * FROM inpatient_claims
  OPEN my_cursor
  FETCH NEXT FROM my_cursor INTO …
  …
  WHILE @@FETCH_STATUS = 0
  BEGIN
    …
  END
  CLOSE my_cursor
  DEALLOCATE my_cursor

Row-by-row processing, ECL (ITERATE with a transform):

  ITERATE(
    inpatient_claims,
    TRANSFORM(inpatient_claim_layout,
      SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt),
      SELF := RIGHT
    )
  );
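For readers who do not know ECL, the ITERATE call above can be modelled in a few lines of Python. This is only a sketch of the semantics, not ECL and not the presenter's code: it assumes that LEFT is the previously produced row and RIGHT is the current input row, and `is_one_year_or_greater` is a hypothetical stand-in (365 days or more) for the helper named on the slide, whose real definition is not shown.

```python
from datetime import date


def is_one_year_or_greater(admission: date, discharge: date) -> bool:
    """Hypothetical helper: True when the stay spans 365 days or more."""
    return (discharge - admission).days >= 365


def iterate(records, transform, empty):
    """Model of ITERATE: LEFT is the previous output row, RIGHT the current input row."""
    out = []
    left = empty
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out


def flag_long_stay(left, right):
    # As on the slide, only RIGHT is consulted; all other fields are copied over.
    flag = is_one_year_or_greater(right["admsn_dt"], right["dschrgdt"])
    return {**right, "is_dropped": flag}


claims = [
    {"claim_id": 1, "admsn_dt": date(2014, 1, 1), "dschrgdt": date(2014, 1, 4)},
    {"claim_id": 2, "admsn_dt": date(2013, 1, 1), "dschrgdt": date(2014, 3, 1)},
]
flagged = iterate(claims, flag_long_stay, empty={})
```

Because the transform ignores LEFT, this particular use behaves like a plain per-row mapping; ITERATE becomes more interesting when a value has to be carried from one row to the next.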

ECL ROLLUP

[Figure: records R1 to R6 are processed pairwise as LEFT and RIGHT; the transform (Tx) merges matching records into RA, then RB.]

  ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
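The ROLLUP signature can be read as a fold over adjacent records. The Python below is a minimal model of that reading, assuming the running merged record is what gets compared with the next record; the function and sample rows are illustrative only and are not part of HPCC Systems.

```python
def rollup(records, condition, transform):
    """Merge each record into the previous output row while condition holds."""
    out = []
    for right in records:
        if out and condition(out[-1], right):
            # LEFT is the running (possibly already merged) record.
            out[-1] = transform(out[-1], right)
        else:
            # No match: RIGHT starts a new output row.
            out.append(right)
    return out


rows = [("a", 1), ("a", 2), ("b", 5), ("a", 7)]
merged = rollup(
    rows,
    condition=lambda left, right: left[0] == right[0],
    transform=lambda left, right: (left[0], left[1] + right[1]),
)
```

Only neighbouring records are merged, so the trailing `("a", 7)` stays separate. That is why the input is normally sorted on the matching fields before it is rolled up.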

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Where the last claim in the series has patient (…) [as "still a patient", not discharged], flag these and drop all of the claims in the series from the IP hospital stay file.
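The five rules can be sketched as a single function that collapses one series of interim claims into a stay. This is an illustration under assumptions, not the production logic: the `Claim` and `Stay` types, their field names, and the boolean `still_patient` flag are all hypothetical, and the series is taken as already grouped (how interim claims are detected is not shown on the slide).

```python
from dataclasses import dataclass


@dataclass
class Claim:
    bene_id: int
    provider: str
    admission: int        # dates as YYYYMMDD integers
    discharge: int
    ms_drg: str
    payment: float
    still_patient: bool = False


@dataclass
class Stay:
    bene_id: int
    provider: str
    admission: int
    discharge: int
    ms_drg: str
    cost: float
    is_dropped: bool


def build_stay(series):
    """Collapse one series of interim claims into a single hospital stay."""
    ordered = sorted(series, key=lambda c: (c.admission, c.discharge))
    first, last = ordered[0], ordered[-1]
    return Stay(
        bene_id=first.bene_id,
        provider=first.provider,
        admission=first.admission,              # rules 1 and 2: from the first claim
        discharge=last.discharge,               # rules 1 and 2: from the last claim
        ms_drg=last.ms_drg,                     # rule 3: discharge MS-DRG
        cost=sum(c.payment for c in ordered),   # rule 4: costs aggregated to the stay
        is_dropped=last.still_patient,          # rule 5: flag undischarged series
    )


series = [
    Claim(1, "P1", 20140101, 20140112, "469", 2500.0),
    Claim(1, "P1", 20140101, 20140105, "470", 1000.0),
]
stay = build_stay(series)
```

Flagging the stay rather than deleting it keeps the decision auditable, which matches the `is_dropped` and dropped-reason fields used in the ECL version.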

Processing Claims With ECL

  H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
  H_2 := ROLLUP(H_1,
                is_interim(LEFT, RIGHT),
                merge_interim_claims(LEFT, RIGHT));
  H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY);
  H_4 := PROJECT(H_3,
                 TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                           SELF.is_dropped := TRUE,
                           SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
                           SELF := LEFT));
  H := H_2 + H_4;

Template Language

  EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
    LOADXML(pFileSet);
    baseDataDirectory := pBaseDataDirectory + pId + …;
    #FOR(folder)
      #UNIQUENAME(subId)
      %subId% := …;
      #UNIQUENAME(subDS)
      %subDS% := ClientDatasets(%subId%);
      […]
      #UNIQUENAME(id)
      %id% := pId + … + …;
      #UNIQUENAME(dataDir)
      %dataDir% := baseDataDirectory + … + …;
      #UNIQUENAME(etl)
      %etl% := ClientETL(%dataDir%, %id%);
      %etl%.run();
    #END
  ENDMACRO;

Template Language

  file_set := '<folders>'
            + '<folder>M201409</folder>'
            + '<folder>M201410</folder>'
            + '<folder>M201411</folder>'
            + '<folder>M201412</folder>'
            + '<folder>M201501</folder>'
            + '<folder>M201502</folder>'
            + '<folder>M201503</folder>'
            + '</folders>';
  load_all_client_files(1234, file_set, '/volume1/data/');
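The template code loads an XML list of folders and emits one block of ETL code per folder. A rough Python analogue of that loop is sketched below, purely to show the idea. It is not how the ECL template language works internally, and the `_` and `/` separators, the `plan_loads` name, and the `/data` base directory are assumptions made for the example, since the separators on the slide did not survive extraction.

```python
import xml.etree.ElementTree as ET


def plan_loads(client_id, file_set_xml, base_dir):
    """Return one load plan per <folder> element, mirroring the per-folder loop."""
    root = ET.fromstring(file_set_xml)
    plans = []
    for folder in root.findall("folder"):
        sub_id = folder.text
        plans.append({
            # Hypothetical naming scheme: client id and folder joined by "_".
            "id": f"{client_id}_{sub_id}",
            # Hypothetical directory layout: base/client/folder.
            "data_dir": f"{base_dir}/{client_id}/{sub_id}",
        })
    return plans


months = ["M201409", "M201410", "M201411"]
file_set = "<folders>" + "".join(f"<folder>{m}</folder>" for m in months) + "</folders>"
plans = plan_loads(1234, file_set, "/data")
```

The difference in ECL is that the loop runs at compile time and generates definitions, whereas this sketch simply builds a list at run time.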

Beyond Processing Data

• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

Beyond Processing: Security

• HTTPS
• Htpasswd
• LDAP support
• File-level security when using LDAP

Beyond Processing: Workunits

• Workunit identifier
• Attribution
• Query
• Timings
• Results

Beyond Processing: Collaboration

[Four image-only slides.]

Beyond Processing: Unit Tests

  interim_claims := MODULE
    // Test Data
    test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420)
              + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090)
              + …
    EXPORT Actual := Step2.ip_stays …
    SHARED TestSuite := MODULE
      EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…
      EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
    END;
    EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
  END;

Beyond Processing: Unit Tests (2)

  simplified_ip_layout := RECORD
    UNSIGNED bene_id;
    UNSIGNED claim_id;
    STRING   claim_type;
    STRING   provider_number;
    INTEGER4 through_date;
    STRING   status_code;
    INTEGER4 admission_date;
    INTEGER4 discharge_date;
    STRING   ms_drg_code;
  END;

  // Using inline dataset
  simple_ip_claims := DATASET([
    {1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61'}
  ], simplified_ip_layout);
  ip_claims := Samples.ip_claims(simple_ip_claims);

  // OR passing NAMED parameters
  ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

Beyond Processing: Visualization

[Image-only slide.]

Custom Visualization

[Image-only slide.]

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49).]

Insights

[Chart: same axes, with episodes split into three series: No readmit, 1 Readmit, 2 Readmits.]

Data Delivery: Roxie

• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse capabilities
• Data Services

Data Services

• Web Services over Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

Data Service Query

• Define service using ECL

  INTEGER4 oOffset    := 1        : STORED('Offset');
  INTEGER4 oResults   := 100      : STORED('Results');
  INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
  INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
  oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}],
                     Layouts.service_parameters_layout);

  T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
    RETURN TABLE(pData,
      {
        STRING bpid := pData.bpid,
        UNSIGNED INTEGER1 model := pData.model,
        UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
        INTEGER8 total_episodes := COUNT(GROUP),
        DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
        DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
        DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))
      },
      bpid, model, post_dsch_prd_length);
  END;

  ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
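The report function groups episodes by `bpid`, `model`, and `post_dsch_prd_length`, then computes a count, total, average, and standard deviation per group. The same aggregation is sketched in Python below so the grouping can be checked by hand. It is an illustration only, the sample rows are invented, and it assumes VARIANCE denotes the population variance (hence `pvariance`).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def summarize(episodes):
    """Group episodes by (bpid, model, post_dsch_prd_length) and aggregate costs."""
    groups = defaultdict(list)
    for e in episodes:
        key = (e["bpid"], e["model"], e["post_dsch_prd_length"])
        groups[key].append(e["sum_post_dsch_prd_pay"])
    return [
        {
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(pays),
            "total_costs": sum(pays),
            "average_costs": mean(pays),
            # Assumption: population variance, not sample variance.
            "std_dev_costs": sqrt(pvariance(pays)),
        }
        for (bpid, model, length), pays in sorted(groups.items())
    ]


episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 10.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 20.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 30.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 5.0},
]
summary = summarize(episodes)
```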

Data (Web) Services

https://…/WsEcl/forms/json/query/roxie/summary

Request:

  { "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }

Response:

  { "summaryResponse": { …
      "Results": { …
        "Summary": { "Row": [
          { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
            "total_episodes": 987, "total_costs": 9876543.21, "average_costs": 12358.13 },
          …
        ] } …
      } } }

Data Services: WsECL

[Two image-only slides.]

Loading up data

• Logical vs Physical

  ~abc::subfolder::subsubfolder::myfile
  /abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster

  oDS := DATASET(
    Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
    Layouts.ip_claim_layout,
    CSV(HEADING(0))
  );
  oDSDistributed := DISTRIBUTE(oDS, bene_id);
  OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

  oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);

SuperFiles

• Super File = symbolic link: a list of sub-files
• Each sub-file must have the same layout

  WEBLOGS_FILE := '~somewhere::logs::web';
  Std.File.CreateSuperFile(WEBLOGS_FILE);
  …
  run_report() := FUNCTION
    oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
    RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
  END;

• Including (more) data

  SEQUENTIAL(
    Std.File.StartSuperFileTransaction(),
    Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
    Std.File.FinishSuperFileTransaction()
  );

Data Services: Reusability

  EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
    // Filtering data based on parameters
    #UNIQUENAME(DS)
    %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
    […]
    #UNIQUENAME(B)
    %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
    #UNIQUENAME(C)
    %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]
    #UNIQUENAME(report)
    %report% := pReportFunction(%K%);
    #UNIQUENAME(sorted)
    %sorted% := SORT(%report%, pSortByField);
    #UNIQUENAME(O1)
    %O1% := OUTPUT(pParams, NAMED('Request'));
    oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
    #UNIQUENAME(O2)
    %O2% := OUTPUT(oSummary, NAMED('Metadata'));
    #UNIQUENAME(O3)
    %O3% := OUTPUT(
      CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
      NAMED(pServiceName), ALL);
    PARALLEL(%O1%, %O2%, %O3%);
  ENDMACRO;
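The macro applies each filter only when its parameter list is non-empty, sorts the report, and returns one page of rows plus a total count. A compact Python model of that flow follows. It is a sketch rather than the macro itself: `run_report`, its arguments, and the sample rows are hypothetical, and the paging assumes CHOOSEN takes a row count followed by a 1-based starting position.

```python
def run_report(rows, providers, sort_key, results, offset):
    """Optional filter, sort, then return one page plus the total row count."""
    # An empty provider list means "no filter", mirroring IF(COUNT(...) = 0, A, A(...)).
    if providers:
        filtered = [r for r in rows if r["provider_id"] in providers]
    else:
        filtered = rows
    ordered = sorted(filtered, key=lambda r: r[sort_key])
    start = max(offset, 1) - 1          # convert the 1-based offset to a list index
    return {"Metadata": len(ordered), "Rows": ordered[start:start + results]}


rows = [
    {"provider_id": 7, "bpid": "c"},
    {"provider_id": 8, "bpid": "a"},
    {"provider_id": 7, "bpid": "b"},
    {"provider_id": 9, "bpid": "d"},
]
page = run_report(rows, providers=[], sort_key="bpid", results=2, offset=2)
```

Returning the total alongside the page lets a client compute how many pages exist without issuing a second query.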

AHA System Architecture

[Image-only slide.]

Archway Analytics

[Image-only slide.]

Thank you! Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Page 9: Making Sense of Medicare Data: From Mining to Analytics

Claims bull Statement of services and costs from a healthcare

provider Patient information Diagnosis information Procedure(s) information

bull Types

Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims hellip

9

10

Episode of Care

SNF 1 SNF 2

IP 1

IP 2

SNF 2

IP 3

HHA 1

HHA 2

time

Mining Claims

11

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP SNF HHA IP IP IP IP SNF SNF

IP

IP

IP

IP

IP

Hospital Stays

Post-Acute Care

Anchor

Episode Initiator

Technologies considered bull Pig (amp Hadoop)

Data Processing Language Procedural Language Relational-oriented

bull SAS Business Analytics amp BI Software De-facto standard in Healthcare Industry Proprietary

12

HPCC Systems Quick Introduction

Rodrigo Pastrana -Consulting Software Engineer

WHT082311

What is HPCC Systems

14

bull Open Source distributed data-intensive computing platform bull Provides end-to-end Big Data workflow management scheduler

integration tools etc bull Runs on commodity computingstorage nodes bull Binary packages available for the most common Linux distributions bull Originally designed circa 1999 (predates the original paper on

MapReduce from Dec lsquo04) bull Improved over a decade of real-world Big Data analytics bull In use across critical production environments throughout

LexisNexis for more than 10 years

WHT082311

The HPCC Systems platform

15

WHT082311

bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL

HPCC Systems Data Refinery (Thor)

HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL

Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing

bull Highly efficient automatically distributes workload across all nodes compiles to native machine code

bull Automatic parallelization and synchronization

1

2

3

The Three HPCC Systems components

Conclusion End to End platform bull No need for any third party tools

16

WHT082311

bull Declarative programming language Describe what needs to be done and not how to do it

bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available

bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand

bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism

bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot

bull Complete ECL provides a complete data programming paradigm

bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery

bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more

Enterprise Control Language (ECL)

17

WHT082311

Current Status and Resources

bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements

bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-

component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis

access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface

definition bull JAVA API project ndash facilitates interaction between Java based apps and

HPCC web services and c++ tools bull Available now ndash HPCCSystemscom

Data mining with HPCC Systems bull Thor

Responsible for processing vast amount of data Optimized for Extraction Transformation Loading

Sorting and Linking Data

bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL

19

20

SQL vs ECL

SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs

FROM inpatient_claims

GROUP BY diag_group_cd

TABLE( inpatient_claims

diag_group_cd INTEGER volume =

COUNT(GROUP) REAL costs = SUM(pmt_amt)

diag_group_cd

)

SQL ECL

SELECT

FROM inpatient_claims LEFT JOIN ip_value_codes

RIGHT ON LEFTid = RIGHTid

JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid

)

21

SQL vs ECL

DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims

OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN

hellip END CLOSE my_cursor DEALLOCATE my_cursor

ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout

SELFis_dropped = is_one_year_or_greater(

RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT

) )

SQL ECL

Tx

22

ECL ROLLUP

R1 R2 R3 R4 R5 R6

LEFT RIGHT

RA

Tx LEFT RIGHT

RB R4 R6 R5

ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )

Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for

most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim

2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay

3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode

4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level

5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file

23

Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1

is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))

H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4

24

25

Template Language

  EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
    LOADXML(pFileSet);
    baseDataDirectory := pBaseDataDirectory + pId + […];
    #FOR(folder)
      #UNIQUENAME(subId)
      %subId% := […];
      #UNIQUENAME(subDS)
      %subDS% := ClientDatasets(%subId%) [];
      #UNIQUENAME(id)
      %id% := pId + […] + […];
      #UNIQUENAME(dataDir)
      %dataDir% := baseDataDirectory + […] + […];
      #UNIQUENAME(etl)
      %etl% := ClientETL(%dataDir%, %id%);
      %etl%.run();
    #END
  ENDMACRO;

26

Template Language

  file_set := '<folders>' +
              '<folder>M201409</folder>' +
              '<folder>M201410</folder>' +
              '<folder>M201411</folder>' +
              '<folder>M201412</folder>' +
              '<folder>M201501</folder>' +
              '<folder>M201502</folder>' +
              '<folder>M201503</folder>' +
              '</folders>';
  load_all_client_files(1234, file_set, '/volume1/data/');
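The idea behind the template macro, expanding one load step per folder listed in an XML string, can be sketched in Python. This is an analogue only: the function name `plan_loads`, the `base/client/folder` directory layout and the `client_folder` id format are assumptions, since the separators used in the original macro are not recoverable from the slide.

```python
import xml.etree.ElementTree as ET


def plan_loads(client_id, file_set_xml, base_dir):
    """Return one (load id, data directory) pair per <folder> element."""
    root = ET.fromstring(file_set_xml)
    plans = []
    for folder in root.findall("folder"):
        sub_id = folder.text.strip()
        # Assumption: directories are laid out as base/client/folder.
        data_dir = f"{base_dir.rstrip('/')}/{client_id}/{sub_id}"
        plans.append((f"{client_id}_{sub_id}", data_dir))
    return plans


file_set = "<folders><folder>M201409</folder><folder>M201410</folder></folders>"
for load_id, data_dir in plan_loads("1234", file_set, "/volume1/data/"):
    print(load_id, data_dir)
```

In ECL the expansion happens at compile time, so each folder becomes its own set of definitions; the Python loop only mimics the resulting list of work items.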

Beyond Processing Data

• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

27

Beyond Processing Security

• HTTPS
• htpasswd
• LDAP support
• File level security when using LDAP

28

Beyond Processing Workunits

• Workunit Identifier
• Attribution
• Query
• Timings
• Results

29

30

Beyond Processing Collaboration

31

Beyond Processing Collaboration (2)

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing Unit Tests

  interim_claims := MODULE
    // Test Data
    test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420) +
                BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090) +
                […]
    EXPORT Actual := Step2.ip_stays
    SHARED TestSuite := MODULE
      EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou […]
      EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
    END;
    EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
  END;

Beyond Processing Unit Tests (2)

35

  // Using inline dataset
  simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
  ip_claims := Samples.ip_claims(simple_ip_claims);

  // OR passing NAMED parameters
  ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

  simplified_ip_layout := RECORD
    UNSIGNED bene_id;
    UNSIGNED claim_id;
    STRING   claim_type;
    STRING   provider_number;
    INTEGER4 through_date;
    STRING   status_code;
    INTEGER4 admission_date;
    INTEGER4 discharge_date;
    STRING   ms_drg_code;
  END;

36

Beyond Processing Visualization

37

Custom Visualization

38

(No) Insights

[Chart: Episodes (axis 0 to 80) against Costs in $1000 (axis 1 to 49)]

39

Insights

[Chart: Episodes (axis 0 to 80) against Costs in $1000 (axis 1 to 49); series: No readmit, 1 Readmit, 2 Readmits]

Data Delivery Roxie

• Data Delivery Engine
• Indexed, compressed and in-memory
• Data Warehouse Capabilities
• Data Services

40

Data Services

• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

41

Data Service Query

• Define service using ECL

42

  INTEGER4 oOffset    := 1        : STORED('Offset');
  INTEGER4 oResults   := 100      : STORED('Results');
  INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
  INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
  oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}], Layouts.service_parameters_layout);

  T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
    RETURN TABLE(pData,
      {
        STRING bpid := pData.bpid;
        UNSIGNED INTEGER1 model := pData.model;
        UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
        INTEGER8 total_episodes := COUNT(GROUP);
        DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
        DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
        DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
      },
      bpid, model, post_dsch_prd_length);
  END;

  ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
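For readers less familiar with grouped TABLE definitions, the aggregation performed by the report function can be sketched in Python. This is a minimal analogue, not the service itself; it assumes episodes arrive as dictionaries carrying the same field names, and it uses the population variance on the assumption that this matches what ECL's VARIANCE computes.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def summarize(episodes):
    """Group episodes by (bpid, model, period length) and compute totals."""
    groups = defaultdict(list)
    for e in episodes:
        key = (e["bpid"], e["model"], e["post_dsch_prd_length"])
        groups[key].append(e["sum_post_dsch_prd_pay"])
    rows = []
    for (bpid, model, length), costs in sorted(groups.items()):
        rows.append({
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(costs),
            "total_costs": sum(costs),
            "average_costs": mean(costs),
            # Assumption: population variance, as VARIANCE is understood to compute.
            "std_dev_costs": sqrt(pvariance(costs)),
        })
    return rows


# Made-up sample data for illustration only.
episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 100.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 300.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 50.0},
]
summary = summarize(episodes)
for row in summary:
    print(row)
```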

43

Data (Web) Services

Request:

  { "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }

Response:

  { "summaryResponse": { … "Results": { … "Summary": { "Row": [
      { "bpid": 9999, "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987,
        "total_costs": 987654321, "average_costs": 1235813 }
      …
  ] … } } } }

https://…/WsEcl/forms/json/query/roxie/summary
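A client for such a service only needs to wrap its parameters under the query name and unwrap the rows from the response. The sketch below does this with Python's `json` module and no network call. The envelope shape (`summaryResponse`, `Results`, `Summary`, `Row`) follows the request and response shown on the slide, while the helper names `build_request` and `extract_rows` are hypothetical.

```python
import json


def build_request(query, **params):
    """Wrap the parameters under the query name."""
    return json.dumps({query: params})


def extract_rows(response_text, query, result_name):
    """Pull the row list out of a '<query>Response' envelope."""
    body = json.loads(response_text)
    return body[f"{query}Response"]["Results"][result_name]["Row"]


request = build_request("summary", offset=1, results=10,
                        begin_date=20130101, end_date=20130201)

# Hand-written sample response with made-up values, shaped like the slide's.
sample = json.dumps({"summaryResponse": {"Results": {"Summary": {"Row": [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987}
]}}}})
rows = extract_rows(sample, "summary", "Summary")
print(rows[0]["total_episodes"])
```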

44

Data Services WsECL

45

Data Services WsECL

Loading up data

• Logical vs Physical

  ~abc::subfolder::subsubfolder::myfile
  abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster

  oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                 Layouts.ip_claim_layout,
                 CSV(HEADING(0)));
  oDSDistributed := DISTRIBUTE(oDS, bene_id);
  OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

  oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout);

46

SuperFiles

• Super File = Symbolic link, list of sub-files
• Each sub-file must have the same layout

  WEBLOGS_FILE := '~somewhere::logs::web';
  Std.File.CreateSuperFile(WEBLOGS_FILE);
  …
  run_report() := FUNCTION
    oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
    RETURN TABLE(oDS, {ip_address, COUNT(GROUP)}, ip_address);
  END;

• Including (more) data

  SEQUENTIAL(
    Std.File.StartSuperFileTransaction(),
    Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
    Std.File.FinishSuperFileTransaction()
  );

47
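The SuperFile idea, one logical name over a growing list of sub-files with additions made inside a transaction, can be modelled in a few lines of Python. This is a toy model for intuition only, not the HPCC Systems implementation; the class, its methods, the in-memory `storage` dictionary and the sub-file names are all hypothetical.

```python
from collections import Counter


class SuperFile:
    """Toy model: a named, ordered list of sub-files read as one dataset."""

    def __init__(self, name):
        self.name = name
        self.subfiles = []
        self._pending = None

    def start_transaction(self):
        self._pending = list(self.subfiles)

    def add(self, subfile):
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        self._pending.append(subfile)

    def finish_transaction(self):
        if self._pending is None:
            raise RuntimeError("no transaction in progress")
        # Sub-files become visible only when the transaction is finished.
        self.subfiles, self._pending = self._pending, None

    def rows(self, storage):
        for sub in self.subfiles:
            yield from storage[sub]


# Hypothetical storage: sub-file name -> rows sharing one layout.
storage = {
    "web::20150331": [{"ip_address": "10.0.0.1"}],
    "web::20150401": [{"ip_address": "10.0.0.1"}, {"ip_address": "10.0.0.2"}],
}
weblogs = SuperFile("~somewhere::logs::web")
weblogs.start_transaction()
weblogs.add("web::20150331")
weblogs.add("web::20150401")
weblogs.finish_transaction()

hits = Counter(row["ip_address"] for row in weblogs.rows(storage))
print(hits)
```

The report code never changes when new data arrives; only the list of sub-files behind the logical name grows.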

48

Data Services Reusability

  EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
    // Filtering data based on parameters
    #UNIQUENAME(DS)
    %DS% := WS.Datasets.dsFactEpisodeCostsIndex […]
    #UNIQUENAME(B)
    %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
    #UNIQUENAME(C)
    %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]
    #UNIQUENAME(report)
    %report% := pReportFunction(%K%);
    #UNIQUENAME(sorted)
    %sorted% := SORT(%report%, pSortByField);
    #UNIQUENAME(O1)
    %O1% := OUTPUT(pParams, NAMED('Request'));
    oSummary := DATASET([{COUNT(%sorted%)}], WS.Layouts.service_summary_layout);
    #UNIQUENAME(O2)
    %O2% := OUTPUT(oSummary, NAMED('Metadata'));
    #UNIQUENAME(O3)
    %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset), NAMED(pServiceName), ALL);
    PARALLEL(%O1%, %O2%, %O3%);
  ENDMACRO;
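The reusable part of the macro boils down to three steps: optional filters that are skipped when their parameter list is empty, a sort, and a page cut with a 1-based offset. A minimal Python sketch of that flow follows; the function names and the `total` metadata field are hypothetical, and only a single `provider_id` filter is shown.

```python
def filter_in(rows, field, allowed):
    """Keep every row when `allowed` is empty, else only rows whose field is listed."""
    if not allowed:
        return list(rows)
    return [r for r in rows if r[field] in allowed]


def page(rows, results, offset):
    """Return `results` rows starting at 1-based position `offset`."""
    start = max(offset, 1) - 1
    return rows[start:start + results]


def run_report(rows, providers, sort_by, results, offset):
    selected = sorted(filter_in(rows, "provider_id", providers),
                      key=lambda r: r[sort_by])
    return {"Metadata": {"total": len(selected)},
            "Rows": page(selected, results, offset)}


# Made-up rows for illustration only.
rows = [
    {"provider_id": "A", "cost": 30},
    {"provider_id": "B", "cost": 10},
    {"provider_id": "A", "cost": 20},
]
unfiltered = run_report(rows, [], "cost", results=2, offset=2)
only_a = run_report(rows, ["A"], "cost", results=10, offset=1)
print(unfiltered)
print(only_a)
```

Returning the total count alongside the page lets a client render paging controls without a second query.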

49

AHA System Architecture

50

Archway Analytics

51

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?

  • Slide Number 1
  • Tripfilmscom
  • The Achievement Network
  • Archway Health Advisors
  • Medicare
  • Medicare Fee for Service
  • Bundled Payments for Care
  • Medicare BPCI
  • Claims
  • Episode of Care
  • Mining Claims
  • Technologies considered
  • HPCC Systems
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Current Status and Resources
  • Data mining with HPCC Systems
  • SQL vs ECL
  • SQL vs ECL
  • ECL ROLLUP
  • Processing Claims
  • Processing Claims With ECL
  • Template Language
  • Template Language
  • Beyond Processing Data
  • Beyond Processing Security
  • Beyond Processing Workunits
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration (2)
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration
  • Beyond Processing Unit Tests
  • Beyond Processing Unit Tests (2)
  • Beyond Processing Visualization
  • Custom Visualization
  • (No) Insights
  • Insights
  • Data Delivery Roxie
  • Data Services
  • Data Service Query
  • Data (Web) Services
  • Data Services WsECL
  • Data Services WsECL
  • Loading up data
  • SuperFiles
  • Data Services Reusability
  • AHA System Architecture
  • Archway Analytics
  • Slide Number 51
Page 10: Making Sense of Medicare Data: From Mining to Analytics

10

Episode of Care

[Diagram: an episode of care on a time axis, made up of claims IP 1, IP 2, IP 3, SNF 1, SNF 2, HHA 1 and HHA 2]

Mining Claims

11

[Diagram: rows of mixed claims (IP, SNF, HHA, IP, IP, IP, IP, SNF, SNF) next to the IP claims extracted from them; labels: Hospital Stays, Post-Acute Care, Anchor, Episode Initiator]

Technologies considered

• Pig (& Hadoop)
    Data Processing Language
    Procedural Language
    Relational-oriented
• SAS
    Business Analytics & BI Software
    De-facto standard in Healthcare Industry
    Proprietary

12

HPCC Systems Quick Introduction

Rodrigo Pastrana - Consulting Software Engineer

WHT082311

What is HPCC Systems

14

• Open Source distributed data-intensive computing platform
• Provides end-to-end Big Data workflow management, scheduler, integration, tools, etc.
• Runs on commodity computing/storage nodes
• Binary packages available for the most common Linux distributions
• Originally designed circa 1999 (predates the original paper on MapReduce from Dec '04)
• Improved over a decade of real-world Big Data analytics
• In use across critical production environments throughout LexisNexis for more than 10 years

The HPCC Systems platform

15

The Three HPCC Systems components

1. HPCC Systems Data Refinery (Thor)
   • Massively parallel data processing engine
   • Enables data integration on a scale not previously available
   • Programmable using ECL

2. HPCC Systems Data Delivery Engine (Roxie)
   • A massively parallel, high throughput query engine
   • Low latency, highly concurrent and highly available
   • Several advanced strategies for efficient retrieval
   • Programmable using ECL

3. Enterprise Control Language (ECL)
   • An easy to use, declarative, data-centric programming language optimized for large-scale data management and query processing
   • Highly efficient: automatically distributes workload across all nodes, compiles to native machine code
   • Automatic parallelization and synchronization

Conclusion: End to End platform
• No need for any third party tools

16

Enterprise Control Language (ECL)

• Declarative programming language: Describe what needs to be done and not how to do it
• Powerful: High level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: Modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism
• Maintainable: High level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: One language to express data algorithms across the entire HPCC Systems platform: data integration, analytics and high speed delivery
• Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL and more

17

Current Status and Resources

• HPCCSystems.com - Tutorials, Docs, Platform distributions and more
• Latest release 5.2.0 adds many new features and improvements
    • Drastic GUI improvements
    • Ganglia and Nagios plug-in for system monitoring and alerting
    • Security Enhancements - tighter authentication measures, intra-component communication encryption
    • Embedded Languages - Cassandra support, memcache and redis access
    • JSON based data support
    • Dynamic ESDL - Provides simple middleware/back-end interface definition
    • JAVA API project - facilitates interaction between Java based apps and HPCC web services and C++ tools
• Available now - HPCCSystems.com

Data mining with HPCC Systems

• Thor
    Responsible for processing vast amounts of data
    Optimized for Extraction, Transformation, Loading, Sorting and Linking Data
• ECL
    Declarative
    More Data Centric
    Fast & Implicitly Parallel
    Inline data
    Unit Tests in ECL

19

20

SQL vs ECL

SQL:

  SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
  FROM inpatient_claims
  GROUP BY diag_group_cd

ECL:

  TABLE(inpatient_claims,
        { diag_group_cd,
          INTEGER volume := COUNT(GROUP),
          REAL costs := SUM(GROUP, pmt_amt) },
        diag_group_cd)

SQL:

  SELECT *
  FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
  ON LEFT.id = RIGHT.id

ECL:

  JOIN(inpatient_claims, ip_value_codes, LEFT.id = RIGHT.id)


mezzetinblogspotcom

HPCC Systems open source portal httphpccsystemscom

Thank you Questions Feedback Questions Feedback

wwwlinkedincominlucpezet

mezzetin blogspot com

  • Slide Number 1
  • Tripfilmscom
  • The Achievement Network
  • Archway Health Advisors
  • Medicare
  • Medicare Fee for Service
  • Bundled Payments for Care
  • Medicare BPCI
  • Claims
  • Episode of Care
  • Mining Claims
  • Technologies considered
  • HPCC Systems
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Current Status and Resources
  • Data mining with HPCC Systems
  • SQL vs ECL
  • SQL vs ECL
  • ECL ROLLUP
  • Processing Claims
  • Processing Claims With ECL
  • Template Language
  • Template Language
  • Beyond Processing Data
  • Beyond Processing Security
  • Beyond Processing Workunits
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration (2)
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration
  • Beyond Processing Unit Tests
  • Beyond Processing Unit Tests (2)
  • Beyond Processing Visualization
  • Custom Visualization
  • (No) Insights
  • Insights
  • Data Delivery Roxie
  • Data Services
  • Data Service Query
  • Data (Web) Services
  • Data Services WsECL
  • Data Services WsECL
  • Loading up data
  • SuperFiles
  • Data Services Reusability
  • AHA System Architecture
  • Archway Analytics
  • Slide Number 51
Page 12: Making Sense of Medicare Data: From Mining to Analytics

Technologies considered

  • Pig (& Hadoop)
      • Data processing language
      • Procedural language
      • Relational-oriented
  • SAS
      • Business analytics & BI software
      • De facto standard in the healthcare industry
      • Proprietary

HPCC Systems Quick Introduction

Rodrigo Pastrana, Consulting Software Engineer

WHT082311

What is HPCC Systems?

  • Open source, distributed, data-intensive computing platform
  • Provides end-to-end Big Data workflow management, scheduler, integration, tools, etc.
  • Runs on commodity computing/storage nodes
  • Binary packages available for the most common Linux distributions
  • Originally designed circa 1999 (predates the original paper on MapReduce from Dec. '04)
  • Improved over a decade of real-world Big Data analytics
  • In use across critical production environments throughout LexisNexis for more than 10 years

The HPCC Systems platform

The three HPCC Systems components:

1. HPCC Systems Data Refinery (Thor)
  • Massively parallel data processing engine
  • Enables data integration on a scale not previously available
  • Programmable using ECL

2. HPCC Systems Data Delivery Engine (Roxie)
  • A massively parallel, high-throughput query engine
  • Low latency, highly concurrent, and highly available
  • Several advanced strategies for efficient retrieval
  • Programmable using ECL

3. Enterprise Control Language (ECL)
  • An easy-to-use, declarative, data-centric programming language optimized for large-scale data management and query processing
  • Highly efficient: automatically distributes workload across all nodes, compiles to native machine code
  • Automatic parallelization and synchronization

Conclusion: end-to-end platform
  • No need for any third-party tools

Enterprise Control Language (ECL)

  • Declarative programming language: describe what needs to be done, not how to do it
  • Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
  • Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
  • Implicitly parallel: parallelism is built into the underlying platform; the programmer need not be concerned with data partitioning and parallelism
  • Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
  • Complete: ECL provides a complete data programming paradigm
  • Homogeneous: one language to express data algorithms across the entire HPCC Systems platform (data integration, analytics, and high-speed delivery)
  • Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more

Current Status and Resources

  • HPCCSystems.com: tutorials, docs, platform distributions, and more
  • Latest release, 5.2.0, adds many new features and improvements
      • Drastic GUI improvements
      • Ganglia and Nagios plug-ins for system monitoring and alerting
      • Security enhancements: tighter authentication measures, intra-component communication encryption
      • Embedded languages: Cassandra support, memcache and redis access
      • JSON-based data support
      • Dynamic ESDL: provides simple middleware/back-end interface definition
      • Java API project: facilitates interaction between Java-based apps and HPCC web services and C++ tools
  • Available now: HPCCSystems.com

Data mining with HPCC Systems

  • Thor
      • Responsible for processing vast amounts of data
      • Optimized for extraction, transformation, loading, sorting, and linking data
  • ECL
      • Declarative
      • More data-centric
      • Fast & implicitly parallel
      • Inline data
      • Unit tests in ECL

SQL vs ECL

Aggregation

  SQL:

    SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd

  ECL:

    TABLE(
      inpatient_claims,
      {
        diag_group_cd,
        INTEGER volume := COUNT(GROUP),
        REAL costs := SUM(GROUP, pmt_amt)
      },
      diag_group_cd
    )

Join

  SQL:

    SELECT *
    FROM inpatient_claims LEFT
    JOIN ip_value_codes RIGHT
      ON LEFT.id = RIGHT.id

  ECL:

    JOIN(
      inpatient_claims,
      ip_value_codes,
      LEFT.id = RIGHT.id
    )
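As a rough cross-check of the aggregation comparison above, the Python sketch below computes what both the SQL GROUP BY and the ECL TABLE describe: one output row per `diag_group_cd`, with a claim count and a payment total. The helper name `summarize_by_group` and the sample claims are invented for illustration; only the field names come from the slide.

```python
from collections import defaultdict


def summarize_by_group(claims):
    """Return {diag_group_cd: (volume, costs)} for a list of claim dicts."""
    totals = defaultdict(lambda: [0, 0.0])
    for claim in claims:
        entry = totals[claim["diag_group_cd"]]
        entry[0] += 1                 # COUNT(*) in SQL, COUNT(GROUP) in ECL
        entry[1] += claim["pmt_amt"]  # SUM(pmt_amt) in SQL, SUM(GROUP, pmt_amt) in ECL
    return {code: (volume, costs) for code, (volume, costs) in totals.items()}


# Hypothetical sample data, not taken from the presentation.
sample_claims = [
    {"diag_group_cd": "470", "pmt_amt": 12000.0},
    {"diag_group_cd": "470", "pmt_amt": 8000.0},
    {"diag_group_cd": "291", "pmt_amt": 5000.0},
]
summary = summarize_by_group(sample_claims)
```

The third argument of the ECL TABLE (`diag_group_cd`) plays the role of the dictionary key here: it names the grouping field.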

SQL vs ECL

Row-by-row processing

  SQL:

    DECLARE my_cursor CURSOR FOR
      SELECT * FROM inpatient_claims
    OPEN my_cursor
    FETCH NEXT FROM my_cursor INTO …
    …
    WHILE @@FETCH_STATUS = 0
    BEGIN
      …
    END
    CLOSE my_cursor
    DEALLOCATE my_cursor

  ECL:

    ITERATE(
      inpatient_claims,
      TRANSFORM(inpatient_claim_layout,
        SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt),
        SELF := RIGHT
      )
    )
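To make the ITERATE side of this comparison concrete, here is a minimal Python sketch of its record-by-record behavior: each call to the transform sees the previous output record as LEFT and the current input record as RIGHT. The `iterate` helper is a model of that behavior, not HPCC code. The body of `is_one_year_or_greater` is an assumption (integer YYYYMMDD dates, one year taken as 365 days), since the slide only shows its name.

```python
from datetime import date


def parse_yyyymmdd(value):
    """Convert an integer such as 20130201 into a date (assumed date encoding)."""
    return date(value // 10000, value // 100 % 100, value % 100)


def is_one_year_or_greater(admsn_dt, dschrgdt):
    """Hypothetical rule: the stay spans at least 365 days."""
    return (parse_yyyymmdd(dschrgdt) - parse_yyyymmdd(admsn_dt)).days >= 365


def iterate(records, transform, blank):
    """Model of ITERATE: LEFT is the previous result, RIGHT the current record."""
    output = []
    left = blank
    for right in records:
        left = transform(left, right)
        output.append(left)
    return output


def flag_long_stays(left, right):
    # Mirrors the slide's transform: copy RIGHT and set is_dropped; LEFT is unused.
    return dict(right, is_dropped=is_one_year_or_greater(right["admsn_dt"], right["dschrgdt"]))


# Hypothetical sample claims.
claims = [
    {"admsn_dt": 20130101, "dschrgdt": 20130110},
    {"admsn_dt": 20120101, "dschrgdt": 20130301},
]
flagged = iterate(claims, flag_long_stays, blank={})
```

Because the slide's transform never reads LEFT, a plain per-record projection would give the same result; ITERATE matters when a record's output depends on the one before it.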

ECL ROLLUP

[Diagram: ROLLUP over records R1 to R6, combining LEFT and RIGHT records through a transformation (Tx) into merged records RA and RB.]

ROLLUP(dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT))
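The ROLLUP signature above can be modeled in a few lines of Python. This is a sketch of the semantics as I understand them, not HPCC code: records are visited in order, and whenever the condition holds between the most recent output record (LEFT) and the next input record (RIGHT), the transformation replaces LEFT with the merged result. The sample tuples are hypothetical.

```python
def rollup(records, condition, transform):
    """Merge adjacent records for which condition(left, right) is true."""
    result = []
    for right in records:
        if result and condition(result[-1], right):
            # The merged record becomes LEFT for the next comparison.
            result[-1] = transform(result[-1], right)
        else:
            result.append(right)
    return result


# Hypothetical (key, amount) records; only adjacent matches are merged.
records = [("a", 1), ("a", 2), ("b", 5), ("a", 4)]
merged = rollup(
    records,
    condition=lambda left, right: left[0] == right[0],
    transform=lambda left, right: (left[0], left[1] + right[1]),
)
```

The trailing `("a", 4)` is not folded into the first group because it is not adjacent to it, which is why the input is sorted before a rollup.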

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series of claims has patient (…) [as "still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
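The five rules above can be sketched in Python as follows. Several details are assumptions made for the example rather than facts from the slides: the boolean `still_patient` field stands in for the elided patient status value, two claims are treated as part of the same series when they share beneficiary and provider and the earlier one is marked still a patient, and all field names other than `bene_id`, `claim_id`, and `ms_drg_code` are hypothetical.

```python
def merge_interim_claims(claims):
    """Collapse each series of interim inpatient claims into one stay record."""
    ordered = sorted(
        claims,
        key=lambda c: (c["bene_id"], c["provider"], c["admission_date"], c["through_date"]),
    )
    stays = []
    for claim in ordered:
        prev = stays[-1] if stays else None
        same_series = (
            prev is not None
            and prev["still_patient"]
            and prev["bene_id"] == claim["bene_id"]
            and prev["provider"] == claim["provider"]
        )
        if same_series:
            # Rules 1 and 2: keep the first admission date, take the last discharge date.
            prev["discharge_date"] = claim["discharge_date"]
            # Rule 3: the MS-DRG of the last claim is the discharge MS-DRG.
            prev["ms_drg_code"] = claim["ms_drg_code"]
            # Rule 4: aggregate costs to the stay level.
            prev["cost"] += claim["cost"]
            prev["still_patient"] = claim["still_patient"]
            prev["claim_ids"].append(claim["claim_id"])
        else:
            stay = dict(claim)
            stay["claim_ids"] = [claim["claim_id"]]
            stays.append(stay)
    for stay in stays:
        # Rule 5: a series that never reaches a discharge is flagged and dropped.
        stay["is_dropped"] = stay["still_patient"]
    return stays


# Hypothetical claims: beneficiary 1 has two interim claims, beneficiary 2 is still a patient.
claims = [
    {"bene_id": 1, "claim_id": 2, "provider": "P1", "admission_date": 20140101,
     "through_date": 20140118, "discharge_date": 20140118, "still_patient": False,
     "ms_drg_code": "469", "cost": 114090},
    {"bene_id": 1, "claim_id": 1, "provider": "P1", "admission_date": 20140101,
     "through_date": 20140110, "discharge_date": None, "still_patient": True,
     "ms_drg_code": "470", "cost": 30420},
    {"bene_id": 2, "claim_id": 3, "provider": "P1", "admission_date": 20140201,
     "through_date": 20140205, "discharge_date": None, "still_patient": True,
     "ms_drg_code": "291", "cost": 5000},
]
stays = merge_interim_claims(claims)
```

The sort followed by a pairwise merge is the same shape as a SORT feeding a ROLLUP; the final loop corresponds to flagging dropped series.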

Processing Claims With ECL

  H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
  H_2 := ROLLUP(H_1,
                is_interim(LEFT, RIGHT),
                merge_interim_claims(LEFT, RIGHT));
  H_3 := JOIN(H_2, H_1,
              LEFT.bene_sk = RIGHT.bene_sk […],
              RIGHT ONLY);
  H_4 := PROJECT(H_3,
                 TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                           SELF.is_dropped := TRUE,
                           SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
                           SELF := LEFT));
  H := H_2 + H_4;

Template Language

  EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
    LOADXML(pFileSet);
    baseDataDirectory := pBaseDataDirectory + pId + […];
    #FOR(folder)
      #UNIQUENAME(subId)
      %subId% := […];
      #UNIQUENAME(subDS)
      %subDS% := ClientDatasets(%subId%) [];
      #UNIQUENAME(id)
      %id% := pId + […] + […];
      #UNIQUENAME(dataDir)
      %dataDir% := baseDataDirectory + […] + […];
      #UNIQUENAME(etl)
      %etl% := ClientETL(%dataDir%, %id%);
      %etl%.run();
    #END
  ENDMACRO;

Template Language

  file_set := '<folders>' +
              '<folder>M201409</folder>' +
              '<folder>M201410</folder>' +
              '<folder>M201411</folder>' +
              '<folder>M201412</folder>' +
              '<folder>M201501</folder>' +
              '<folder>M201502</folder>' +
              '<folder>M201503</folder>' +
              '</folders>';
  load_all_client_files(1234, file_set, '/volume1/data/');

Beyond Processing Data

  • Security & Authentication
  • Collaboration
  • Unit Tests
  • Visualizations

Beyond Processing: Security

  • HTTPS
  • Htpasswd
  • LDAP support
  • File-level security when using LDAP

Beyond Processing: Workunits

  • Workunit identifier
  • Attribution
  • Query
  • Timings
  • Results

Beyond Processing: Collaboration

[Four image-only slides; no text recovered.]

Beyond Processing: Unit Tests

  interim_claims := MODULE
    // Test Data
    test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420)
              + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090)
              + […]
    EXPORT Actual := Step2.ip_stays […]
    SHARED TestSuite := MODULE
      EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou […]
      EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
    END;
    EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
  END;

Beyond Processing: Unit Tests (2)

  // Using inline dataset
  simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
  ip_claims := Samples.ip_claims(simple_ip_claims);

  // OR passing NAMED parameters
  ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := '00');

  simplified_ip_layout := RECORD
    UNSIGNED bene_id;
    UNSIGNED claim_id;
    STRING   claim_type;
    STRING   provider_number;
    INTEGER4 through_date;
    STRING   status_code;
    INTEGER4 admission_date;
    INTEGER4 discharge_date;
    STRING   ms_drg_code;
  END;

Beyond Processing: Visualization

Custom Visualization

(No) Insights

[Chart: axes labeled "Episodes" and "Costs in $1000"; tick marks run from 0 to 80 and from 1 to 49.]

Insights

[Chart: same axes as the previous chart ("Episodes" and "Costs in $1000"), with three series: No readmit, 1 Readmit, 2 Readmits.]

Data Delivery: Roxie

  • Data delivery engine
  • Indexed, compressed, and in-memory
  • Data warehouse capabilities
  • Data services

Data Services

  • Web services over data warehouse
  • XML/SOAP but also JSON
  • Web services defined in ECL
  • Solution to add/remove data from the cluster

Data Service Query

  • Define service using ECL

  INTEGER4 oOffset    := 1        : STORED('Offset');
  INTEGER4 oResults   := 100      : STORED('Results');
  INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
  INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
  oParams := DATASET([{ oOffset, oResults, oStartDate, oEndDate, … }],
                     Layouts.service_parameters_layout);

  T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
    RETURN TABLE(pData,
      {
        STRING bpid := pData.bpid,
        UNSIGNED INTEGER1 model := pData.model,
        UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
        INTEGER8 total_episodes := COUNT(GROUP),
        DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
        DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
        DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))
      },
      bpid, model, post_dsch_prd_length);
  END;

  ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
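For readers who do not use ECL, the summary query above can be approximated in Python. The sketch groups episodes by `bpid`, `model`, and `post_dsch_prd_length` and computes the same four aggregates. Two points are assumptions on my part: that VARIANCE is a population variance (hence `pvariance`), and that rounding to two decimals is an adequate stand-in for DECIMAL15_2. The sample episodes are invented.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def summarize_episodes(episodes):
    """Group episodes and compute count, total, average and std. dev. of costs."""
    groups = defaultdict(list)
    for episode in episodes:
        key = (episode["bpid"], episode["model"], episode["post_dsch_prd_length"])
        groups[key].append(episode["sum_post_dsch_prd_pay"])
    rows = []
    for (bpid, model, length), costs in sorted(groups.items()):
        rows.append({
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(costs),                       # COUNT(GROUP)
            "total_costs": round(sum(costs), 2),                # SUM(GROUP, ...)
            "average_costs": round(mean(costs), 2),             # AVE(GROUP, ...)
            "std_dev_costs": round(sqrt(pvariance(costs)), 2),  # SQRT(VARIANCE(GROUP, ...))
        })
    return rows


# Hypothetical episodes for one bundled-payment participant.
episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 10000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 20000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 30000.0},
]
report = summarize_episodes(episodes)
```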

Data (Web) Services

  https://…/WsEcl/forms/json/query/roxie/summary

  Request:

    {
      "summary": {
        "offset": 1,
        "results": 10,
        "begin_date": 20130101,
        "end_date": 20130201,
        …
      }
    }

  Response:

    {
      "summaryResponse": {
        …
        "Results": {
          …
          "Summary": {
            "Row": [
              {
                "bpid": 9999,
                "model": 2,
                "post_dsch_prd_length": 90,
                "total_episodes": 987,
                "total_costs": 987654321,
                "average_costs": 1235813
              },
              …
            ]
          }
        }
        …
      }
    }

Data Services: WsECL

Loading up data

  • Logical vs physical file names

      ~abc::subfolder::subsubfolder::myfile
      abc/subfolder/subsubfolder/myfile

  • ECL to load data into the cluster

      oDS := DATASET(
        Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
        Layouts.ip_claim_layout,
        CSV(HEADING(0))
      );
      oDSDistributed := DISTRIBUTE(oDS, bene_id);
      OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

  • ECL to use data loaded into the cluster

      oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);

SuperFiles

  • Super File = symbolic link (list of sub-files)
  • Each sub-file must have the same layout

      WEBLOGS_FILE := '~somewhere::logs::web';
      Std.File.CreateSuperFile(WEBLOGS_FILE);
      …
      run_report() := FUNCTION
        oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
        RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
      END;

  • Including (more) data

      SEQUENTIAL(
        Std.File.StartSuperFileTransaction(),
        Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
        Std.File.FinishSuperFileTransaction()
      );
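The superfile idea above (one logical name that resolves to a list of sub-files sharing a layout) can be modeled with a small Python class. This is only a conceptual sketch, not the HPCC implementation: the class, its method names, the `url` field, and the sample rows are all hypothetical.

```python
from collections import Counter


class SuperFile:
    """A named list of sub-files that is read as if it were a single file."""

    def __init__(self, name):
        self.name = name
        self.sub_files = []  # list of (sub_file_name, layout, rows)

    def add_sub_file(self, sub_file_name, layout, rows):
        # Every sub-file must have the same layout as the ones already attached.
        if self.sub_files and self.sub_files[0][1] != layout:
            raise ValueError("sub-file layout differs from the super file layout")
        self.sub_files.append((sub_file_name, layout, list(rows)))

    def rows(self):
        for _name, _layout, rows in self.sub_files:
            yield from rows


def run_report(super_file):
    """Count records per ip_address across all sub-files."""
    return Counter(row["ip_address"] for row in super_file.rows())


layout = ("ip_address", "url")
weblogs = SuperFile("~somewhere::logs::web")
weblogs.add_sub_file("web::20150401", layout, [
    {"ip_address": "10.0.0.1", "url": "/a"},
    {"ip_address": "10.0.0.2", "url": "/b"},
])
weblogs.add_sub_file("web::20150402", layout, [
    {"ip_address": "10.0.0.1", "url": "/c"},
])
report_counts = run_report(weblogs)
```

The report code never changes when a new day of logs arrives; only the list of sub-files grows, which is the point of the "Including (more) data" step.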

Data Services: Reusability

  EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
    // Filtering data based on parameters
    #UNIQUENAME(DS)
    %DS% := WS.Datasets.dsFactEpisodeCostsIndex […]

    #UNIQUENAME(B)
    %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
    #UNIQUENAME(C)
    %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]

    #UNIQUENAME(report)
    %report% := pReportFunction(%K%);
    #UNIQUENAME(sorted)
    %sorted% := SORT(%report%, pSortByField);
    #UNIQUENAME(O1)
    %O1% := OUTPUT(pParams, NAMED('Request'));
    oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout);
    #UNIQUENAME(O2)
    %O2% := OUTPUT(oSummary, NAMED('Metadata'));
    #UNIQUENAME(O3)
    %O3% := OUTPUT(
      CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
      NAMED(pServiceName), ALL);

    PARALLEL(%O1%, %O2%, %O3%);
  ENDMACRO;
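The paging step at the end of this macro (sort the report, output the total count as metadata, then return a window of rows) can be sketched in Python. The sketch assumes CHOOSEN takes a row count followed by a 1-based start position, which matches how `results` and `offset` are passed above. The function names and the `total` key are hypothetical.

```python
def choose_n(rows, n, start_pos=1):
    """Return up to n rows starting at the 1-based position start_pos."""
    if start_pos < 1:
        raise ValueError("start_pos is 1-based and must be >= 1")
    begin = start_pos - 1
    return rows[begin:begin + n]


def run_page(rows, sort_key, results, offset):
    """Sort the report, then return paging metadata plus the requested page."""
    ordered = sorted(rows, key=sort_key)
    return {
        "Metadata": {"total": len(ordered)},  # corresponds to COUNT(sorted)
        "Rows": choose_n(ordered, results, offset),
    }


# Hypothetical report rows keyed by bpid.
report_rows = [{"bpid": "3"}, {"bpid": "1"}, {"bpid": "2"}, {"bpid": "4"}]
page = run_page(report_rows, sort_key=lambda r: r["bpid"], results=2, offset=2)
```

Returning the total alongside the page lets a client compute how many pages exist without fetching every row.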

AHA System Architecture

Archway Analytics

Thank you! Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Page 13: Making Sense of Medicare Data: From Mining to Analytics

HPCC Systems Quick Introduction

Rodrigo Pastrana, Consulting Software Engineer

WHT082311

Page 14: Making Sense of Medicare Data: From Mining to Analytics

What is HPCC Systems?

  • Open source, distributed, data-intensive computing platform
  • Provides end-to-end Big Data workflow management, scheduler, integration, tools, etc.
  • Runs on commodity computing/storage nodes
  • Binary packages available for the most common Linux distributions
  • Originally designed circa 1999 (predates the original paper on MapReduce from Dec. '04)
  • Improved over a decade of real-world Big Data analytics
  • In use across critical production environments throughout LexisNexis for more than 10 years

WHT082311

The HPCC Systems platform

15

WHT082311

bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL

HPCC Systems Data Refinery (Thor)

HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL

Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing

bull Highly efficient automatically distributes workload across all nodes compiles to native machine code

bull Automatic parallelization and synchronization

1

2

3

The Three HPCC Systems components

Conclusion End to End platform bull No need for any third party tools

16

Enterprise Control Language (ECL)

• Declarative programming language: describe what needs to be done and not how to do it
• Powerful: high-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available
• Extensible: modular and extensible, it can shape itself to adapt to the type of problem at hand
• Implicitly parallel: parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism
• Maintainable: a high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot
• Complete: ECL provides a complete data programming paradigm
• Homogeneous: one language to express data algorithms across the entire HPCC Systems platform (data integration, analytics, and high-speed delivery)
• Polyglot: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more

17

Current Status and Resources

• HPCCSystems.com - tutorials, docs, platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements
  • Drastic GUI improvements
  • Ganglia and Nagios plug-in for system monitoring and alerting
  • Security enhancements - tighter authentication measures, intra-component communication encryption
  • Embedded languages - Cassandra support, memcache and redis access
  • JSON-based data support
  • Dynamic ESDL - provides simple middleware/back-end interface definition
  • Java API project - facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now - HPCCSystems.com

Data mining with HPCC Systems

• Thor
  • Responsible for processing vast amounts of data
  • Optimized for Extraction, Transformation, Loading, Sorting, and Linking Data
• ECL
  • Declarative
  • More data-centric
  • Fast & implicitly parallel
  • Inline data
  • Unit tests in ECL

19

20

SQL vs ECL

SQL:

    SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd

ECL:

    TABLE(
      inpatient_claims,
      {
        diag_group_cd,
        INTEGER volume := COUNT(GROUP),
        REAL costs := SUM(GROUP, pmt_amt)
      },
      diag_group_cd
    )

SQL:

    SELECT *
    FROM inpatient_claims LEFT
    JOIN ip_value_codes RIGHT
    ON LEFT.id = RIGHT.id

ECL:

    JOIN(
      inpatient_claims,
      ip_value_codes,
      LEFT.id = RIGHT.id
    )

21

SQL vs ECL

SQL:

    DECLARE my_cursor CURSOR FOR
      SELECT * FROM inpatient_claims
    OPEN my_cursor
    FETCH NEXT FROM my_cursor INTO …
    …
    WHILE @@FETCH_STATUS = 0
    BEGIN
      …
    END
    CLOSE my_cursor
    DEALLOCATE my_cursor

ECL:

    ITERATE(
      inpatient_claims,
      TRANSFORM(inpatient_claim_layout,
        SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt);
        SELF := RIGHT
      )
    )
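For readers who have not used ECL, the ITERATE call on this slide can be read as one sequential pass over the records, with LEFT being the previous output record and RIGHT the current one. Below is a minimal Python sketch of that behaviour. It assumes dates are stored as YYYYMMDD integers, and the body of `is_one_year_or_greater` is hypothetical (the slide only shows its name).

```python
from datetime import date


def parse_yyyymmdd(value: int) -> date:
    """Convert an integer such as 20140315 into a date (assumed storage format)."""
    return date(value // 10000, (value // 100) % 100, value % 100)


def is_one_year_or_greater(admsn_dt: int, dschrgdt: int) -> bool:
    """Hypothetical helper: True when the stay spans 365 days or more."""
    return (parse_yyyymmdd(dschrgdt) - parse_yyyymmdd(admsn_dt)).days >= 365


def iterate(records, transform):
    """Mimic ECL ITERATE: call transform(left, right) for each record in order.

    `left` is the previous output record (None for the first record, where ECL
    would pass an empty record) and `right` is the current input record.
    """
    out = []
    left = None
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out


def flag_long_stays(left, right):
    # Like the slide's TRANSFORM: copy RIGHT and set is_dropped; LEFT is unused.
    dropped = is_one_year_or_greater(right["admsn_dt"], right["dschrgdt"])
    return {**right, "is_dropped": dropped}


claims = [
    {"claim_id": 1, "admsn_dt": 20130101, "dschrgdt": 20130110},
    {"claim_id": 2, "admsn_dt": 20120101, "dschrgdt": 20130301},
]
flagged = iterate(claims, flag_long_stays)
```

The point of the comparison is that the cursor loop disappears: the per-record logic lives in the transform, and the platform decides how to run it.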

22

ECL ROLLUP

[Figure: ROLLUP applied to records R1 to R6. The transform (Tx) combines a matching LEFT/RIGHT pair into a rolled-up record (RA, then RB).]

ROLLUP(dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT))
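A small Python sketch may help make the ROLLUP signature concrete. It mimics the behaviour on a single, already-sorted list: adjacent records are compared, and when the condition holds the transform merges them and the merged record becomes LEFT for the next comparison. The record fields (`key`, `total`) are made up for illustration.

```python
def rollup(records, condition, transform):
    """Mimic ECL ROLLUP on an already-sorted list of records."""
    out = []
    for right in records:
        if out and condition(out[-1], right):
            # Merge RIGHT into the running LEFT record.
            out[-1] = transform(out[-1], right)
        else:
            # No match: RIGHT starts a new output record.
            out.append(right)
    return out


rows = [
    {"key": "a", "total": 1},
    {"key": "a", "total": 2},
    {"key": "b", "total": 5},
    {"key": "b", "total": 1},
    {"key": "c", "total": 7},
]

rolled = rollup(
    rows,
    condition=lambda left, right: left["key"] == right["key"],
    transform=lambda left, right: {"key": left["key"], "total": left["total"] + right["total"]},
)
```

Here five input rows collapse into three, one per key, with the totals summed.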

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series of claims has patient (…) [as "still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
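The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `STILL_PATIENT` value, and the rule that a series continues while the previous claim for the same beneficiary and provider is marked "still a patient" are all hypothetical, since the slide does not spell them out.

```python
STILL_PATIENT = "30"  # assumed placeholder for the "still a patient" status code


def build_stays(claims):
    """Merge interim inpatient claims into stays (rules 1 to 5, simplified)."""
    ordered = sorted(
        claims,
        key=lambda c: (c["bene_id"], c["provider"], c["admsn_dt"], c["thru_dt"]),
    )
    stays = []
    for claim in ordered:
        last = stays[-1] if stays else None
        continues = (
            last is not None
            and last["bene_id"] == claim["bene_id"]
            and last["provider"] == claim["provider"]
            and last["status_code"] == STILL_PATIENT
        )
        if continues:
            last["dschrg_dt"] = claim["dschrg_dt"]      # rules 1 and 2: last discharge date
            last["ms_drg"] = claim["ms_drg"]            # rule 3: discharge MS-DRG
            last["pmt_amt"] += claim["pmt_amt"]         # rule 4: aggregate costs
            last["status_code"] = claim["status_code"]
            last["claim_ids"].append(claim["claim_id"])
        else:
            # First claim of a series keeps its admission date (rule 1).
            stays.append({**claim, "claim_ids": [claim["claim_id"]]})
    for stay in stays:
        # Rule 5: a series that ends "still a patient" is flagged and dropped.
        stay["is_dropped"] = stay["status_code"] == STILL_PATIENT
    return stays


claims = [
    {"claim_id": 1, "bene_id": 1, "provider": "P1", "admsn_dt": 20140101,
     "dschrg_dt": None, "thru_dt": 20140131, "status_code": "30",
     "ms_drg": "470", "pmt_amt": 3000.0},
    {"claim_id": 2, "bene_id": 1, "provider": "P1", "admsn_dt": 20140101,
     "dschrg_dt": 20140210, "thru_dt": 20140210, "status_code": "01",
     "ms_drg": "469", "pmt_amt": 1500.0},
    {"claim_id": 3, "bene_id": 2, "provider": "P1", "admsn_dt": 20140305,
     "dschrg_dt": None, "thru_dt": 20140331, "status_code": "30",
     "ms_drg": "291", "pmt_amt": 2000.0},
]
stays = build_stays(claims)
```

Claims 1 and 2 become one stay carrying the first admission date, the last discharge date and MS-DRG, and the summed payment; claim 3 is flagged as dropped.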

23

Processing Claims With ECL

    H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
    H_2 := ROLLUP(H_1,
                  is_interim(LEFT, RIGHT),
                  merge_interim_claims(LEFT, RIGHT));
    H_3 := JOIN(H_2, H_1,
                LEFT.bene_sk = RIGHT.bene_sk […],
                RIGHT ONLY);
    H_4 := PROJECT(H_3,
                   TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                     SELF.is_dropped := TRUE;
                     SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                     SELF := LEFT
                   ));
    H := H_2 + H_4;

24

25

Template Language

    EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
      LOADXML(pFileSet);
      baseDataDirectory := pBaseDataDirectory + pId + …;
      #FOR(folder)
        #UNIQUENAME(subId)
        %subId% := …;
        #UNIQUENAME(subDS)
        %subDS% := Client.Datasets(%subId%);
        […]
        #UNIQUENAME(id)
        %id% := pId + … + …;
        #UNIQUENAME(dataDir)
        %dataDir% := baseDataDirectory + … + …;
        #UNIQUENAME(etl)
        %etl% := Client.ETL(%dataDir%, %id%);
        %etl%.run();
      #END
    ENDMACRO;

26

Template Language

    file_set := '<folders>' +
        '<folder>M201409</folder>' +
        '<folder>M201410</folder>' +
        '<folder>M201411</folder>' +
        '<folder>M201412</folder>' +
        '<folder>M201501</folder>' +
        '<folder>M201502</folder>' +
        '<folder>M201503</folder>' +
        '</folders>';
    load_all_client_files(1234, file_set, '/volume1/data/');
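The idea behind the template macro is that an XML list of monthly folders drives one ETL run per folder. A rough Python equivalent is sketched below. The way the id and data directory are composed here (`client/folder` under a base directory) is an assumption, as is the `/base/data` path; the slide's own string expressions did not survive extraction.

```python
import posixpath
import xml.etree.ElementTree as ET


def folders_from_file_set(file_set_xml):
    """Return the folder names listed in a <folders><folder>...</folder></folders> string."""
    root = ET.fromstring(file_set_xml)
    return [element.text for element in root.findall("folder")]


def plan_loads(client_id, file_set_xml, base_dir):
    """Build one (load_id, data_dir) pair per folder; the naming scheme is hypothetical."""
    plan = []
    for folder in folders_from_file_set(file_set_xml):
        load_id = f"{client_id}/{folder}"
        data_dir = posixpath.join(base_dir, client_id, folder)
        plan.append((load_id, data_dir))
    return plan


months = ["M201409", "M201410", "M201411"]
file_set = "<folders>" + "".join(f"<folder>{m}</folder>" for m in months) + "</folders>"
plan = plan_loads("1234", file_set, "/base/data")
```

In the ECL version the loop is expanded at compile time, so each folder yields its own generated definitions rather than a runtime iteration.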

Beyond Processing Data

• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

27

Beyond Processing Security

• HTTPS
• htpasswd
• LDAP support
• File-level security when using LDAP

28

Beyond Processing Workunits

• Workunit Identifier
• Attribution
• Query
• Timings
• Results

29

30

Beyond Processing Collaboration

31

Beyond Processing Collaboration (2)

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing Unit Tests

    interim_claims := MODULE
      // Test Data
      test_set := BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 1, pmt_amt := 30420)
                + BPCI.Test.Samples.ip_claim(bene_id := 1, claim_id := 2, pmt_amt := 114090)
                + …
      EXPORT Actual := Step2.ip_stays
      SHARED TestSuite := MODULE
        EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…
        EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
      END;
      EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
    END;

Beyond Processing Unit Tests (2)

35

    // Using inline dataset
    simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
    ip_claims := Samples.ip_claims(simple_ip_claims);

    // OR passing NAMED parameters
    ip_claims := Samples.ip_claims2(bene_id := 1, claim_id := 1, claim_type := 00);

    simplified_ip_layout := RECORD
      UNSIGNED bene_id;
      UNSIGNED claim_id;
      STRING claim_type;
      STRING provider_number;
      INTEGER4 through_date;
      STRING status_code;
      INTEGER4 admission_date;
      INTEGER4 discharge_date;
      STRING ms_drg_code;
    END;

36

Beyond Processing Visualization

37

Custom Visualization

38

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49).]

39

Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49), broken down into three series: No readmit, 1 Readmit, 2 Readmits.]

Data Delivery Roxie

• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse capabilities
• Data Services

40

Data Services

• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

41

Data Service Query

42

• Define service using ECL

    INTEGER4 oOffset := 1 : STORED('Offset');
    INTEGER4 oResults := 100 : STORED('Results');
    INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
    INTEGER4 oEndDate := 20140201 : STORED('End_Date');
    oParams := DATASET([{ oOffset, oResults, oStartDate, oEndDate, … }],
                       Layouts.service_parameters_layout);

    T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
      RETURN TABLE(pData,
        {
          STRING bpid := pData.bpid,
          UNSIGNED INTEGER1 model := pData.model,
          UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
          INTEGER8 total_episodes := COUNT(GROUP),
          DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
          DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
          DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay))
        },
        bpid, model, post_dsch_prd_length);
    END;

    ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
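To see what the summary function computes, here is a plain Python sketch of the same grouped aggregation: episodes are grouped by (bpid, model, post_dsch_prd_length) and each group gets a count, total, average, and standard deviation of costs. It assumes the standard deviation is the population form (square root of the population variance); the sample rows are invented.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def summarize(rows):
    """Group episode rows and compute per-group cost statistics."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["bpid"], row["model"], row["post_dsch_prd_length"])
        groups[key].append(row["sum_post_dsch_prd_pay"])

    summary = []
    for (bpid, model, length), costs in sorted(groups.items()):
        summary.append({
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(costs),
            "total_costs": sum(costs),
            "average_costs": mean(costs),
            # Assumption: population standard deviation.
            "std_dev_costs": sqrt(pvariance(costs)),
        })
    return summary


episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 10000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 20000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 30000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 5000.0},
]
summary = summarize(episodes)
```

The ECL version expresses the same thing declaratively in one TABLE call, and the platform spreads the grouping across nodes.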

43

Data (Web) Services

Request:

    {
      "summary": {
        "offset": 1,
        "results": 10,
        "begin_date": 20130101,
        "end_date": 20130201,
        …
      }
    }

Response:

    {
      "summaryResponse": {
        …
        "Results": {
          …
          "Summary": {
            "Row": [
              {
                "bpid": 9999,
                "model": 2,
                "post_dsch_prd_length": 90,
                "total_episodes": 987,
                "total_costs": 987654321,
                "average_costs": 1235813
              },
              …
            ]
          }
        }
      }
    }

https://…/WsEcl/forms/json/query/roxie/summary
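On the client side, calling such a service comes down to building a JSON body keyed by the query name and reading the rows back out of the response. The sketch below does only that, with no network call. The nesting (`summaryResponse`, `Results`, `Summary`, `Row`) follows the names on the slide; the helper names and the sample response are made up for illustration.

```python
import json


def build_request(query_name, **params):
    """Wrap the parameters under the query name, as in the slide's request."""
    return json.dumps({query_name: params})


def extract_rows(response_text, query_name, result_name):
    """Pull the list of rows for one named result out of a response body."""
    payload = json.loads(response_text)
    return payload[f"{query_name}Response"]["Results"][result_name]["Row"]


request_body = build_request(
    "summary", offset=1, results=10, begin_date=20130101, end_date=20130201
)

# Invented response, shaped like the one on the slide.
sample_response = json.dumps({
    "summaryResponse": {
        "Results": {
            "Summary": {
                "Row": [
                    {"bpid": 9999, "model": 2, "post_dsch_prd_length": 90,
                     "total_episodes": 987}
                ]
            }
        }
    }
})
rows = extract_rows(sample_response, "summary", "Summary")
```

The `results` and `offset` parameters are what make the service pageable from a web front end.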

44

Data Services WsECL

45

Data Services WsECL

Loading up data

46

• Logical vs Physical

    ~abc::subfolder::subsubfolder::myfile
    abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster

    oDS := DATASET(
      Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
      Layouts.ip_claim_layout,
      CSV(HEADING(0))
    );

    oDSDistributed := DISTRIBUTE(oDS, bene_id);
    OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

    oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);
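Distributing by bene_id matters because it puts all claims of a beneficiary on the same node, so later per-beneficiary steps need no cross-node traffic. The toy Python sketch below shows the idea with a simple modulo rule; the exact placement rule used by the platform is an assumption here, and the node count is arbitrary.

```python
def distribute(records, key, n_nodes):
    """Assign each record to a node based on the value of its key field."""
    nodes = [[] for _ in range(n_nodes)]
    for record in records:
        nodes[record[key] % n_nodes].append(record)
    return nodes


claims = [
    {"bene_id": 1, "claim_id": 10},
    {"bene_id": 2, "claim_id": 11},
    {"bene_id": 1, "claim_id": 12},
    {"bene_id": 5, "claim_id": 13},
    {"bene_id": 2, "claim_id": 14},
]
nodes = distribute(claims, "bene_id", n_nodes=3)

# For each beneficiary, collect the set of nodes holding its claims.
placement = {}
for node_index, node in enumerate(nodes):
    for record in node:
        placement.setdefault(record["bene_id"], set()).add(node_index)
```

Every beneficiary ends up on exactly one node, even though a node may hold several beneficiaries.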

SuperFiles

47

• Super File = symbolic link / list of sub-files
• Each sub-file must have the same layout

    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    …
    run_report() := FUNCTION
      oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
      RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;

• Including (more) data

    SEQUENTIAL(
      Std.File.StartSuperFileTransaction(),
      Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
      Std.File.FinishSuperFileTransaction()
    );
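The benefit of a superfile is that the report code never changes when new data arrives: it reads one logical name, and adding a sub-file extends what that name covers. A toy Python model of that behaviour is sketched below; the class, its method names, and the sample rows are all invented for illustration.

```python
from collections import Counter


class SuperFile:
    """Toy model of a superfile: one logical name pointing at a list of sub-files."""

    def __init__(self, name):
        self.name = name
        self.sub_files = []  # each entry: (sub_file_name, layout, rows)

    def add_sub_file(self, sub_name, layout, rows):
        # Mirrors the slide's rule: every sub-file must share the same layout.
        if self.sub_files and self.sub_files[0][1] != layout:
            raise ValueError(f"{sub_name} has a different layout")
        self.sub_files.append((sub_name, layout, list(rows)))

    def rows(self):
        # Reading the superfile reads all sub-files as one dataset.
        for _, _, rows in self.sub_files:
            yield from rows


def run_report(super_file):
    """Count records per ip_address, like the slide's grouped TABLE."""
    return Counter(row["ip_address"] for row in super_file.rows())


layout = ("ip_address", "url")
weblogs = SuperFile("weblogs")
weblogs.add_sub_file("20150331", layout, [{"ip_address": "10.0.0.1", "url": "/a"}])
before = run_report(weblogs)

weblogs.add_sub_file("20150401", layout, [
    {"ip_address": "10.0.0.1", "url": "/b"},
    {"ip_address": "10.0.0.2", "url": "/a"},
])
after = run_report(weblogs)
```

The same `run_report` call returns updated counts after the second sub-file is added, which is the effect the transaction on the slide achieves on the cluster.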

48

Data Services Reusability

    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
      // Filtering data based on parameters
      #UNIQUENAME(DS)
      %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
      […]
      #UNIQUENAME(B)
      %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
      #UNIQUENAME(C)
      %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]
      #UNIQUENAME(report)
      %report% := pReportFunction(%K%);
      #UNIQUENAME(sorted)
      %sorted% := SORT(%report%, pSortByField);
      #UNIQUENAME(O1)
      %O1% := OUTPUT(pParams, NAMED('Request'));
      oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout);
      #UNIQUENAME(O2)
      %O2% := OUTPUT(oSummary, NAMED('Metadata'));
      #UNIQUENAME(O3)
      %O3% := OUTPUT(
        CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
        NAMED(pServiceName), ALL);
      PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
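The last step of the macro is plain pagination: sort the report, return the total count as metadata, and return one page selected by a 1-based offset and a page size. A minimal Python sketch of that step follows; the `total` field name and the sample rows are hypothetical.

```python
def choosen(rows, n, start=1):
    """Mimic CHOOSEN(recordset, n, startpos): n rows starting at 1-based startpos."""
    begin = max(start, 1) - 1
    return rows[begin:begin + n]


def run_report(rows, sort_key, results, offset):
    """Sort the rows, then return the total count and the requested page."""
    sorted_rows = sorted(rows, key=lambda row: row[sort_key])
    return {
        "Metadata": {"total": len(sorted_rows)},  # hypothetical field name
        "page": choosen(sorted_rows, results, offset),
    }


report_rows = [{"bpid": bpid} for bpid in ["0005", "0002", "0004", "0001", "0003"]]
response = run_report(report_rows, "bpid", results=2, offset=3)
```

With five rows, a page size of 2, and an offset of 3, the page holds the third and fourth rows in sorted order, while the metadata still reports all five.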

49

AHA System Architecture

50

Archway Analytics

51

Thank you! Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Page 16: Making Sense of Medicare Data: From Mining to Analytics

WHT082311

bull Massively Parallel data processing engine bull Enables data integration on a scale not previously available bull Programmable using ECL

HPCC Systems Data Refinery (Thor)

HPCC Systems Data Delivery Engine (Roxie) bull A massively parallel high throughput query engine bull Low latency highly concurrent and highly available bull Several advanced strategies for efficient retrieval bull Programmable using ECL

Enterprise Control Language (ECL) bull An easy to use declarative data-centric programming language optimized for large-scale data management and query processing

bull Highly efficient automatically distributes workload across all nodes compiles to native machine code

bull Automatic parallelization and synchronization

1

2

3

The Three HPCC Systems components

Conclusion End to End platform bull No need for any third party tools

16

WHT082311

bull Declarative programming language Describe what needs to be done and not how to do it

bull Powerful High level data activities like JOIN TRANSFORM PROJECT SORT DISTRIBUTE MAP etc are available

bull Extensible Modular and extensible it can shape itself to adapt to the type of problem at hand

bull Implicitly parallel Parallelism is built into the underlying platform The programmer needs not be concerned with data partitioning and parallelism

bull Maintainable High level programming language without side effects and with efficient encapsulation programs are more succinct reliable and easier to troubleshoot

bull Complete ECL provides a complete data programming paradigm

bull Homogeneous One language to express data algorithms across the entire HPCC Systems platform data integration analytics and high speed delivery

bull Polyglottic ECL supports the embedding of other languages such as Java Python R SQL and more

Enterprise Control Language (ECL)

17

WHT082311

Current Status and Resources

bull HPCCSystemscom ndash Tutorials Docs Platform distributions and more bull Latest release 520 adds many new features and improvements

bull Drastic GUI improvements bull Ganglia and Nagios plug-in for system monitoring and alerting bull Security Enhancements ndash tighter authentication measures intra-

component communication encryption bull Embedded Languages ndash Cassandra support memcache and redis

access bull JSON based data support bull Dynamic ESDL ndash Provides simple middlewareback-end interface

definition bull JAVA API project ndash facilitates interaction between Java based apps and

HPCC web services and c++ tools bull Available now ndash HPCCSystemscom

Data mining with HPCC Systems bull Thor

Responsible for processing vast amount of data Optimized for Extraction Transformation Loading

Sorting and Linking Data

bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL

19

20

SQL vs ECL

SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs

FROM inpatient_claims

GROUP BY diag_group_cd

TABLE( inpatient_claims

diag_group_cd INTEGER volume =

COUNT(GROUP) REAL costs = SUM(pmt_amt)

diag_group_cd

)

SQL ECL

SELECT

FROM inpatient_claims LEFT JOIN ip_value_codes

RIGHT ON LEFTid = RIGHTid

JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid

)

21

SQL vs ECL

DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims

OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN

hellip END CLOSE my_cursor DEALLOCATE my_cursor

ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout

SELFis_dropped = is_one_year_or_greater(

RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT

) )

SQL ECL

Tx

22

ECL ROLLUP

R1 R2 R3 R4 R5 R6

LEFT RIGHT

RA

Tx LEFT RIGHT

RB R4 R6 R5

ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )

Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for

most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim

2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay

3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode

4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level

5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file

23

Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1

is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))

H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4

24

25

Template Language EXPORT load_all_client_files(pId pFileSet pBaseDataDirectory) = MACRO LOADXML(pFileSet) baseDataDirectory = pBaseDataDirectory + pId + FOR(folder) UNIQUENAME(subId) subId = UNIQUENAME(subDS) subDS = ClientDatasets(subId) [] UNIQUENAME(id) id = pId + + UNIQUENAME(dataDir) dataDir = baseDataDirectory + + UNIQUENAME(etl) etl = ClientETL(dataDir id) etlrun() END ENDMACRO

26

Template Language file_set = rsquoltfoldersgtrsquo + ltfoldergtM201409ltfoldergt + ltfoldergtM201410ltfoldergt + ltfoldergtM201411ltfoldergt + ltfoldergtM201412ltfoldergt + ltfoldergtM201501ltfoldergt + ltfoldergtM201502ltfoldergt + ltfoldergtM201503ltfoldergt + lsquoltfoldersgtrsquo load_all_client_files(1234 file_set lsquovolume1datalsquo)

Beyond Processing Data

bull Security amp Authentication

bull Collaboration

bull Unit Tests

bull Visualizations

27

Beyond Processing Security

bull HTTPS

bull Htpasswd

bull LDAP support

bull File level security when using LDAP

28

Beyond Processing Workunits bull Workunit Identifier

bull Attribution

bull Query

bull Timings

bull Results

29

30

Beyond Processing Collaboration

31

Beyond Processing Collaboration (2)

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END

Beyond Processing: Unit Tests (2)

    simplified_ip_layout := RECORD
        UNSIGNED bene_id;
        UNSIGNED claim_id;
        STRING   claim_type;
        STRING   provider_number;
        INTEGER4 through_date;
        STRING   status_code;
        INTEGER4 admission_date;
        INTEGER4 discharge_date;
        STRING   ms_drg_code;
    END;

    // Using inline dataset
    simple_ip_claims := DATASET([
        { 1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61' }
    ], simplified_ip_layout);
    ip_claims := Samples.ip_claims(simple_ip_claims);

    // OR passing NAMED parameters
    ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );
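The "named parameters" style of building test claims can be sketched in Python as a small builder that fills every field with a default and lets each test override only what it cares about. This is only an illustration of the idea, not the deck's `Samples` module: the function name `ip_claim` and the default values are assumptions, and the field names are taken from `simplified_ip_layout`.

```python
from typing import Any, Dict

# Hypothetical defaults; field names follow simplified_ip_layout.
_DEFAULT_CLAIM: Dict[str, Any] = {
    "bene_id": 0,
    "claim_id": 0,
    "claim_type": "",
    "provider_number": "",
    "through_date": 0,
    "status_code": "",
    "admission_date": 0,
    "discharge_date": 0,
    "ms_drg_code": "",
}


def ip_claim(**overrides: Any) -> Dict[str, Any]:
    """Build a sample inpatient claim, overriding only the named fields."""
    unknown = set(overrides) - set(_DEFAULT_CLAIM)
    if unknown:
        # Reject misspelled field names so a test cannot silently pass.
        raise KeyError(f"unknown claim field(s): {sorted(unknown)}")
    return {**_DEFAULT_CLAIM, **overrides}


# Inline test data: two claims for the same beneficiary.
test_set = [
    ip_claim(bene_id=1, claim_id=1, claim_type="00"),
    ip_claim(bene_id=1, claim_id=2, claim_type="00"),
]
```

Keeping the defaults in one place means a layout change touches the builder rather than every test.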

Beyond Processing: Visualization

Custom Visualization

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49)]

Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49), split into three series: No readmit, 1 Readmit, 2 Readmits]

Data Delivery: Roxie

• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services

Data Services

• Web Services over Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

Data Service Query

• Define service using ECL

    INTEGER4 oOffset    := 1        : STORED('Offset');
    INTEGER4 oResults   := 100      : STORED('Results');
    INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
    INTEGER4 oEndDate   := 20140201 : STORED('End_Date');
    oParams := DATASET([ { oOffset, oResults, oStartDate, oEndDate, … } ],
                       Layouts.service_parameters_layout);

    T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
        RETURN TABLE(pData, {
            STRING            bpid                 := pData.bpid;
            UNSIGNED INTEGER1 model                := pData.model;
            UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
            INTEGER8          total_episodes       := COUNT(GROUP);
            DECIMAL15_2       total_costs          := SUM(GROUP, sum_post_dsch_prd_pay);
            DECIMAL15_2       average_costs        := AVE(GROUP, sum_post_dsch_prd_pay);
            DECIMAL15_2       std_dev_costs        := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
        }, bpid, model, post_dsch_prd_length);
    END;

    ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
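For readers less familiar with ECL's `TABLE ... GROUP` form, the aggregation in `T` can be sketched in Python roughly as follows. The sample rows are made up for illustration, and using the population standard deviation for `SQRT(VARIANCE(...))` is an assumption about how `VARIANCE` behaves.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List

# Illustrative rows only; field names follow the ECL on the slide.
episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 10000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 14000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 8000.0},
]


def summarize(rows: List[Dict]) -> List[Dict]:
    """Group by (bpid, model, post_dsch_prd_length) and aggregate the costs."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["bpid"], row["model"], row["post_dsch_prd_length"])
        groups[key].append(row["sum_post_dsch_prd_pay"])
    return [
        {
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(costs),
            "total_costs": sum(costs),
            "average_costs": mean(costs),
            # Assumption: population (not sample) standard deviation.
            "std_dev_costs": pstdev(costs),
        }
        for (bpid, model, length), costs in sorted(groups.items())
    ]


summary = summarize(episodes)
```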

Data (Web) Services

https://.../WsEcl/forms/json/query/roxie/summary

Request:

    { "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }

Response:

    { "summaryResponse": { … "Results": { … "Summary": { "Row": [
        { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987,
          "total_costs": 987654321, "average_costs": 1235813, … }
    ] } … } } }
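A client for such a service mostly builds the request body and unpacks the nested response. The sketch below does only that, without sending anything over the network. The host name is a placeholder, the path is copied from the slide, and the response nesting (`summaryResponse` → `Results` → `Summary` → `Row`) is assumed from the sample shown there.

```python
import json
from typing import Dict, List, Tuple

BASE_URL = "https://roxie.example.com"  # hypothetical host, not shown on the slide
QUERY = "summary"


def build_request(offset: int, results: int, begin_date: int, end_date: int) -> Tuple[str, str]:
    """Return the URL and JSON body for one page of the summary query."""
    url = f"{BASE_URL}/WsEcl/forms/json/query/roxie/{QUERY}"  # path as on the slide
    body = {QUERY: {"offset": offset, "results": results,
                    "begin_date": begin_date, "end_date": end_date}}
    return url, json.dumps(body)


def extract_rows(response_text: str) -> List[Dict]:
    """Pull the result rows out of the response (nesting assumed from the slide)."""
    doc = json.loads(response_text)
    return doc[f"{QUERY}Response"]["Results"]["Summary"]["Row"]


url, body = build_request(offset=1, results=10, begin_date=20130101, end_date=20130201)
```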

Data Services: WsECL

Loading up data

• Logical vs. physical file names

    ~abc::subfolder::subsubfolder::myfile
    /abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster

    IMPORT Std;

    // Read the CSV file from a landing zone machine
    oDS := DATASET(
        Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
        Layouts.ip_claim_layout,
        CSV(HEADING(0)) );

    // Spread the records across the nodes, then write a logical file
    oDSDistributed := DISTRIBUTE(oDS, bene_id);
    OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

    oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);
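The effect of `DISTRIBUTE(oDS, bene_id)` can be pictured with a small Python sketch: each record is assigned to a node from its key, so all claims for one beneficiary end up together. The modulo rule and the three-node cluster are assumptions made for illustration, and the claim rows are invented.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def distribute(records: List[Dict], key: Callable[[Dict], int], node_count: int) -> Dict[int, List[Dict]]:
    """Assign each record to a node using key modulo node_count."""
    nodes: Dict[int, List[Dict]] = defaultdict(list)
    for record in records:
        nodes[key(record) % node_count].append(record)
    return dict(nodes)


# Illustrative claims only.
claims = [
    {"bene_id": 1, "claim_id": 10},
    {"bene_id": 2, "claim_id": 11},
    {"bene_id": 1, "claim_id": 12},
    {"bene_id": 5, "claim_id": 13},
    {"bene_id": 2, "claim_id": 14},
]
placement = distribute(claims, key=lambda c: c["bene_id"], node_count=3)
```

Because every claim of a beneficiary sits on one node, later per-beneficiary steps such as sorting and rolling up interim claims can run locally on each node.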

SuperFiles

• Super File = symbolic link / list of sub-files
• Each sub-file must have the same layout

    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    …
    run_report() := FUNCTION
        oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
        RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;

• Including (more) data

    SEQUENTIAL(
        Std.File.StartSuperFileTransaction(),
        Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
        Std.File.FinishSuperFileTransaction()
    );
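As a mental model, a SuperFile behaves like a named list of sub-files that a report reads as one dataset, with sub-files added inside a transaction. The Python sketch below models only that idea. The class, its method names, and the in-memory `storage` dictionary are hypothetical and are not the `Std.File` API.

```python
from collections import Counter
from typing import Dict, List, Optional


class SuperFile:
    """A named list of sub-files that is read as if it were one file."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sub_files: List[str] = []
        self._pending: Optional[List[str]] = None

    def start_transaction(self) -> None:
        # Work on a copy so readers keep seeing the old list until commit.
        self._pending = list(self.sub_files)

    def add(self, sub_file: str) -> None:
        target = self._pending if self._pending is not None else self.sub_files
        target.append(sub_file)

    def finish_transaction(self) -> None:
        if self._pending is not None:
            self.sub_files, self._pending = self._pending, None

    def rows(self, storage: Dict[str, List[Dict]]) -> List[Dict]:
        return [row for name in self.sub_files for row in storage[name]]


# Illustrative sub-files with the same layout.
storage = {
    "web_20150401": [{"ip_address": "10.0.0.1"}, {"ip_address": "10.0.0.2"}],
    "web_20150402": [{"ip_address": "10.0.0.1"}],
}
weblogs = SuperFile("weblogs")
weblogs.start_transaction()
weblogs.add("web_20150401")
weblogs.add("web_20150402")
weblogs.finish_transaction()

# The report code never changes as sub-files are added.
hits = Counter(row["ip_address"] for row in weblogs.rows(storage))
```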

Data Services: Reusability

    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
        // Filtering data based on parameters
        #UNIQUENAME(DS)
        %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
        […]
        #UNIQUENAME(B)
        %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
        #UNIQUENAME(C)
        %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[
        […]
        // Run the report, sort it, and return one page of results
        #UNIQUENAME(report)
        %report% := pReportFunction(%K%);
        #UNIQUENAME(sorted)
        %sorted% := SORT(%report%, pSortByField);
        #UNIQUENAME(O1)
        %O1% := OUTPUT(pParams, NAMED('Request'));
        oSummary := DATASET([ { COUNT(%sorted%) } ], WS.Layouts.service_summary_layout);
        #UNIQUENAME(O2)
        %O2% := OUTPUT(oSummary, NAMED('Metadata'));
        #UNIQUENAME(O3)
        %O3% := OUTPUT(
            CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
            NAMED(pServiceName), ALL);
        PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
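The shape of this reusable service (filter, report, sort, paginate, return three named outputs) can be sketched in Python as below. This is a simplified stand-in, not the macro itself: it applies only the provider filter, takes the data as an explicit argument, and the `count` key under `Metadata` is an assumed name. The 1-based offset mirrors how `CHOOSEN` takes its starting position.

```python
from typing import Callable, Dict, List


def run_it(service_name: str, params: Dict, report_function: Callable[[List[Dict]], List[Dict]],
           sort_by_field: str, data: List[Dict]) -> Dict:
    """Filter, report, sort, and return one page plus request and metadata."""
    rows = data
    if params.get("providers"):
        # An empty provider list means "no filter", as in the IF(COUNT(...) = 0, ...) steps.
        rows = [r for r in rows if r["provider_id"] in params["providers"]]
    ordered = sorted(report_function(rows), key=lambda r: r[sort_by_field])
    start = params["offset"] - 1  # offsets are 1-based
    page = ordered[start:start + params["results"]]
    return {"Request": params, "Metadata": {"count": len(ordered)}, service_name: page}


# Illustrative data and a trivial report function.
data = [
    {"provider_id": "A", "bpid": "2"},
    {"provider_id": "B", "bpid": "1"},
    {"provider_id": "A", "bpid": "3"},
]
response = run_it("Summary", {"offset": 2, "results": 1, "providers": ["A"]},
                  report_function=lambda rows: rows, sort_by_field="bpid", data=data)
```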

AHA System Architecture

Archway Analytics

Thank you. Questions? Feedback?

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

  • Slide Number 1
  • Tripfilmscom
  • The Achievement Network
  • Archway Health Advisors
  • Medicare
  • Medicare Fee for Service
  • Bundled Payments for Care
  • Medicare BPCI
  • Claims
  • Episode of Care
  • Mining Claims
  • Technologies considered
  • HPCC Systems
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Current Status and Resources
  • Data mining with HPCC Systems
  • SQL vs ECL
  • SQL vs ECL
  • ECL ROLLUP
  • Processing Claims
  • Processing Claims With ECL
  • Template Language
  • Template Language
  • Beyond Processing Data
  • Beyond Processing Security
  • Beyond Processing Workunits
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration (2)
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration
  • Beyond Processing Unit Tests
  • Beyond Processing Unit Tests (2)
  • Beyond Processing Visualization
  • Custom Visualization
  • (No) Insights
  • Insights
  • Data Delivery Roxie
  • Data Services
  • Data Service Query
  • Data (Web) Services
  • Data Services WsECL
  • Data Services WsECL
  • Loading up data
  • SuperFiles
  • Data Services Reusability
  • AHA System Architecture
  • Archway Analytics
  • Slide Number 51
Page 17: Making Sense of Medicare Data: From Mining to Analytics

Enterprise Control Language (ECL)

• Declarative programming language: Describe what needs to be done, not how to do it.
• Powerful: High-level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available.
• Extensible: Modular and extensible, it can shape itself to adapt to the type of problem at hand.
• Implicitly parallel: Parallelism is built into the underlying platform. The programmer need not be concerned with data partitioning and parallelism.
• Maintainable: A high-level programming language without side effects and with efficient encapsulation; programs are more succinct, reliable, and easier to troubleshoot.
• Complete: ECL provides a complete data programming paradigm.
• Homogeneous: One language to express data algorithms across the entire HPCC Systems platform: data integration, analytics, and high-speed delivery.
• Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more.

Current Status and Resources

• HPCCSystems.com – Tutorials, docs, platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements:
  • Drastic GUI improvements
  • Ganglia and Nagios plug-in for system monitoring and alerting
  • Security enhancements – tighter authentication measures, intra-component communication encryption
  • Embedded languages – Cassandra support, memcache and redis access
  • JSON-based data support
  • Dynamic ESDL – provides a simple middleware/back-end interface definition
  • Java API project – facilitates interaction between Java-based apps and HPCC web services and C++ tools
• Available now – HPCCSystems.com

Data mining with HPCC Systems

• Thor
  • Responsible for processing vast amounts of data
  • Optimized for Extraction, Transformation, Loading, Sorting, and Linking Data
• ECL
  • Declarative
  • More data-centric
  • Fast & implicitly parallel
  • Inline data
  • Unit tests in ECL

SQL vs ECL

SQL:

    SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
    FROM inpatient_claims
    GROUP BY diag_group_cd

ECL:

    TABLE( inpatient_claims,
           { diag_group_cd,
             INTEGER volume := COUNT(GROUP),
             REAL    costs  := SUM(GROUP, pmt_amt) },
           diag_group_cd )

SQL:

    SELECT *
    FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
    ON LEFT.id = RIGHT.id

ECL:

    JOIN( inpatient_claims, ip_value_codes, LEFT.id = RIGHT.id )
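Both the SQL `GROUP BY` and the ECL `TABLE` form describe the same aggregation: one output row per `diag_group_cd` with a count and a cost total. A plain Python rendering of that result, on made-up claims, might look like this.

```python
from collections import defaultdict
from typing import Dict

# Illustrative claims only; column names follow the slide.
inpatient_claims = [
    {"id": 1, "diag_group_cd": "A", "pmt_amt": 100.0},
    {"id": 2, "diag_group_cd": "A", "pmt_amt": 50.0},
    {"id": 3, "diag_group_cd": "B", "pmt_amt": 25.0},
]

# One accumulator per diagnosis group: volume = COUNT, costs = SUM(pmt_amt).
totals: Dict[str, Dict[str, float]] = defaultdict(lambda: {"volume": 0, "costs": 0.0})
for claim in inpatient_claims:
    group = totals[claim["diag_group_cd"]]
    group["volume"] += 1
    group["costs"] += claim["pmt_amt"]

summary = {code: dict(values) for code, values in totals.items()}
```

The declarative forms leave the loop and the accumulator to the engine, which is what lets the platform parallelize the work.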

SQL vs ECL

SQL:

    DECLARE my_cursor CURSOR FOR
        SELECT * FROM inpatient_claims
    OPEN my_cursor
    FETCH NEXT FROM my_cursor INTO …
    …
    WHILE @@FETCH_STATUS = 0
    BEGIN
        …
    END
    CLOSE my_cursor
    DEALLOCATE my_cursor

ECL:

    ITERATE( inpatient_claims,
             TRANSFORM(inpatient_claim_layout,
                 SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt),
                 SELF := RIGHT ) )
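`ITERATE` walks the records in order and hands the transform the previous result as LEFT and the current record as RIGHT, which is why it can replace a cursor loop. A minimal Python sketch of that behavior follows. The body of `is_one_year_or_greater` is an assumption (365 days or more), since the slide only shows its name, and the claim rows are invented.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

Claim = Dict[str, object]


def iterate(records: List[Claim], transform: Callable[[Optional[Claim], Claim], Claim]) -> List[Claim]:
    """Apply transform(LEFT, RIGHT) in order; LEFT is the previous output record."""
    out: List[Claim] = []
    left: Optional[Claim] = None  # no previous record for the first row
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out


def is_one_year_or_greater(admsn_dt: date, dschrg_dt: date) -> bool:
    # Assumed rule: a stay of 365 days or more.
    return (dschrg_dt - admsn_dt).days >= 365


inpatient_claims = [
    {"claim_id": 1, "admsn_dt": date(2014, 1, 1), "dschrgdt": date(2014, 1, 5)},
    {"claim_id": 2, "admsn_dt": date(2013, 1, 1), "dschrgdt": date(2014, 2, 1)},
]

# Like the slide's transform, this one only looks at RIGHT.
flagged = iterate(
    inpatient_claims,
    lambda left, right: {**right, "is_dropped": is_one_year_or_greater(right["admsn_dt"], right["dschrgdt"])},
)
```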

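ECL ROLLUP

[Diagram: input records R1 to R6. The transform (Tx) combines a LEFT and a RIGHT record into RA, then combines LEFT and RIGHT again into RB, leaving RB alongside R4, R5, and R6.]

    ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )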
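In Python terms, the ROLLUP pattern can be sketched as a single pass that keeps merging the next record into the running result for as long as the condition holds. The function below is an illustration of that idea on made-up `(key, amount)` pairs, not the ECL implementation.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def rollup(records: Iterable[T],
           condition: Callable[[T, T], bool],
           transform: Callable[[T, T], T]) -> List[T]:
    """Merge adjacent records while condition(LEFT, RIGHT) is true."""
    out: List[T] = []
    for right in records:
        if out and condition(out[-1], right):
            # LEFT is the running result, RIGHT is the next input record.
            out[-1] = transform(out[-1], right)
        else:
            out.append(right)
    return out


# Illustrative input, already sorted by key.
rows: List[Tuple[str, int]] = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
merged = rollup(rows,
                condition=lambda l, r: l[0] == r[0],
                transform=lambda l, r: (l[0], l[1] + r[1]))
```

Only adjacent records are compared, so the input has to be sorted on the fields the condition uses.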

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.
2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.
3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.
4. Costs across all IP claims included in the single stay are aggregated to the stay level.
5. Claims where the last in the series of claims has patient (…) [as "still a patient", not discharged]: flag these and drop all of the claims in the series from the IP hospital stay file.
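The five rules above can be sketched as one function that collapses an ordered series of interim claims into a stay record. Field names are borrowed from the simplified layout used elsewhere in the deck, the payment amounts are made up, and `STILL_PATIENT` is a placeholder value because the slide elides the actual status code.

```python
from datetime import datetime
from typing import Dict, List

STILL_PATIENT = "still_patient"  # hypothetical placeholder; the real code is elided on the slide


def _days_between(start: int, end: int) -> int:
    """Days between two YYYYMMDD integers."""
    fmt = "%Y%m%d"
    return (datetime.strptime(str(end), fmt) - datetime.strptime(str(start), fmt)).days


def merge_interim_claims(series: List[Dict]) -> Dict:
    """Collapse one ordered series of interim claims into a single stay."""
    first, last = series[0], series[-1]
    return {
        "bene_id": first["bene_id"],
        "admission_date": first["admission_date"],             # rule 1
        "discharge_date": last["discharge_date"],              # rule 1
        "length_of_stay": _days_between(first["admission_date"],
                                        last["discharge_date"]),  # rule 2
        "ms_drg_code": last["ms_drg_code"],                    # rule 3
        "pmt_amt": sum(c["pmt_amt"] for c in series),          # rule 4
        "is_dropped": last["status_code"] == STILL_PATIENT,    # rule 5
    }


series = [
    {"bene_id": 1, "admission_date": 20140101, "discharge_date": 20140110,
     "ms_drg_code": "470", "status_code": STILL_PATIENT, "pmt_amt": 3000},
    {"bene_id": 1, "admission_date": 20140101, "discharge_date": 20140120,
     "ms_drg_code": "469", "status_code": "discharged", "pmt_amt": 12000},
]
stay = merge_interim_claims(series)
```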

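Processing Claims With ECL

    // Order each beneficiary's claims so interim claims are adjacent
    H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
    // Merge each series of interim claims into one stay
    H_2 := ROLLUP(H_1,
                  is_interim(LEFT, RIGHT),
                  merge_interim_claims(LEFT, RIGHT));
    // Claims that were absorbed by the rollup
    H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY);
    // Flag the absorbed claims as dropped
    H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                  SELF.is_dropped := TRUE,
                  SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim,
                  SELF := LEFT ));
    H := H_2 + H_4;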
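The `RIGHT ONLY` join is what recovers the claims that the rollup absorbed, so they can be kept and flagged rather than lost. A Python sketch of that step is shown below. Matching on `(bene_sk, claim_id)` is an assumption, because the slide elides most of the join condition, and the rolled-up input is written by hand instead of being computed.

```python
from typing import Callable, Dict, Hashable, List


def right_only(left: List[Dict], right: List[Dict], key: Callable[[Dict], Hashable]) -> List[Dict]:
    """Keep the right-hand records whose key matches no left-hand record."""
    left_keys = {key(rec) for rec in left}
    return [rec for rec in right if key(rec) not in left_keys]


def claim_key(claim: Dict) -> Hashable:
    return (claim["bene_sk"], claim["claim_id"])  # assumed join key


# H_1: all claims, sorted. Illustrative values only.
h_1 = [
    {"bene_sk": 1, "claim_id": 1, "pmt_amt": 3000, "is_dropped": False},
    {"bene_sk": 1, "claim_id": 2, "pmt_amt": 12000, "is_dropped": False},
    {"bene_sk": 2, "claim_id": 3, "pmt_amt": 5000, "is_dropped": False},
]
# H_2: stand-in for the rollup result, with claims 1 and 2 merged under claim 1.
h_2 = [
    {"bene_sk": 1, "claim_id": 1, "pmt_amt": 15000, "is_dropped": False},
    {"bene_sk": 2, "claim_id": 3, "pmt_amt": 5000, "is_dropped": False},
]

h_3 = right_only(h_2, h_1, claim_key)  # claims absorbed by the rollup
h_4 = [{**c, "is_dropped": True, "dropped_reason_code": "InterimClaim"} for c in h_3]
h = h_2 + h_4
```

Keeping the absorbed claims with a reason code leaves an audit trail of what was merged away.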

Page 19: Making Sense of Medicare Data: From Mining to Analytics

Data mining with HPCC Systems bull Thor

Responsible for processing vast amount of data Optimized for Extraction Transformation Loading

Sorting and Linking Data

bull ECL Declarative More Data Centric Fast amp Implicitly Parallel Inline data Unit Tests in ECL

19

20

SQL vs ECL

SELECT diag_group_cd COUNT() as volume SUM(pmt_amt) as costs

FROM inpatient_claims

GROUP BY diag_group_cd

TABLE( inpatient_claims

diag_group_cd INTEGER volume =

COUNT(GROUP) REAL costs = SUM(pmt_amt)

diag_group_cd

)

SQL ECL

SELECT

FROM inpatient_claims LEFT JOIN ip_value_codes

RIGHT ON LEFTid = RIGHTid

JOIN( inpatient_claims ip_value_codes LEFTid = RIGHTid

)

21

SQL vs ECL

DECLARE my_cursor CURSOR FOR SELECT FROM inpatient_claims

OPEN my_cursor FETCH NEXT FROM my_cursor INTO hellip hellip WHILE FETCH_STATUS = 0 BEGIN

hellip END CLOSE my_cursor DEALLOCATE my_cursor

ITERATE( inpatient_claims TRANSFORM(inpatient_claim_layout

SELFis_dropped = is_one_year_or_greater(

RIGHTadmsn_dt RIGHTdschrgdt) SELF = RIGHT

) )

SQL ECL

Tx

22

ECL ROLLUP

R1 R2 R3 R4 R5 R6

LEFT RIGHT

RA

Tx LEFT RIGHT

RB R4 R6 R5

ROLLUP( dataset condition(LEFT RIGHT) transformation(LEFT RIGHT) )

Processing Claims 1 The intent here is to make the series of interim claims look like a single claim for

most purposes where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim

2 11130901113090The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay

3 11130901113090The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is includedexcluded as a readmission for an existing episode

4 11130901113090Costs across all IP claims included in the single stay are aggregated to the stay level

5 Claims where the last in the series of claims has patient (hellip) [as ldquostill a patientrdquo not discharged] flag these and drop all of the claims in the series from the IP hospital stay file

23

Processing Claims With ECL H_1 = SORT( A bene_sk provider admsn_dt dschrgdt thru_dt) H_2 = ROLLUP(H_1

is_interim(LEFT RIGHT) merge_interim_claims(LEFT RIGHT))

H_3 = JOIN(H_2 H_1 LEFTbene_sk = RIGHTbene_sk [hellip] RIGHT ONLY) H_4 = PROJECT(H_3 TRANSFORM(BPCILayoutsip_claim_etl_layout SELFis_dropped = TRUE SELFdropped_reason_code = BPCILayoutsDROPPED_REASON_CODESInterimClaim SELF = LEFT )) H = H_2 + H_4

Page 20: Making Sense of Medicare Data: From Mining to Analytics

20

SQL vs ECL

SQL:
SELECT diag_group_cd, COUNT(*) AS volume, SUM(pmt_amt) AS costs
FROM inpatient_claims
GROUP BY diag_group_cd

ECL:
TABLE(inpatient_claims,
      { diag_group_cd,
        INTEGER volume := COUNT(GROUP),
        REAL costs := SUM(GROUP, pmt_amt) },
      diag_group_cd)

SQL:
SELECT *
FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT
ON LEFT.id = RIGHT.id

ECL:
JOIN(inpatient_claims, ip_value_codes,
     LEFT.id = RIGHT.id)
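For readers who know neither dialect, the grouped TABLE and the JOIN above can be read as the following plain Python sketch. It only illustrates the semantics and says nothing about how HPCC executes them; the sample rows and the helper names `volume_and_costs` and `inner_join` are assumptions made for the example.

```python
from collections import defaultdict

# Made-up sample rows; only the field names come from the slide.
inpatient_claims = [
    {"id": 1, "diag_group_cd": "A", "pmt_amt": 3000.0},
    {"id": 2, "diag_group_cd": "A", "pmt_amt": 12000.0},
    {"id": 3, "diag_group_cd": "B", "pmt_amt": 5000.0},
]
ip_value_codes = [
    {"id": 1, "value_code": "01"},
    {"id": 3, "value_code": "80"},
]


def volume_and_costs(claims):
    """GROUP BY diag_group_cd: return {code: (volume, costs)}."""
    volume = defaultdict(int)
    costs = defaultdict(float)
    for claim in claims:
        volume[claim["diag_group_cd"]] += 1                 # COUNT(GROUP)
        costs[claim["diag_group_cd"]] += claim["pmt_amt"]   # SUM(GROUP, pmt_amt)
    return {code: (volume[code], costs[code]) for code in volume}


def inner_join(left_rows, right_rows):
    """Pair every LEFT row with every RIGHT row whose id matches."""
    by_id = defaultdict(list)
    for right in right_rows:
        by_id[right["id"]].append(right)
    return [{**left, **right}
            for left in left_rows
            for right in by_id.get(left["id"], [])]


summary = volume_and_costs(inpatient_claims)
joined = inner_join(inpatient_claims, ip_value_codes)
```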

21

SQL vs ECL

SQL:
DECLARE my_cursor CURSOR FOR SELECT * FROM inpatient_claims
OPEN my_cursor
FETCH NEXT FROM my_cursor INTO …
…
WHILE @@FETCH_STATUS = 0
BEGIN
    …
END
CLOSE my_cursor
DEALLOCATE my_cursor

ECL:
ITERATE(inpatient_claims,
        TRANSFORM(inpatient_claim_layout,
                  SELF.is_dropped := is_one_year_or_greater(RIGHT.admsn_dt, RIGHT.dschrgdt);
                  SELF := RIGHT))
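A minimal Python sketch of what ITERATE does, under the usual reading that the transform sees the previous output record as LEFT and the current input record as RIGHT. The `iterate` helper, the `seq` counter, and the 365-day definition of "one year" are assumptions for illustration; the deck does not show the body of `is_one_year_or_greater`.

```python
from datetime import date


def iterate(records, transform, blank):
    """Call transform(LEFT, RIGHT) for each record in order.

    LEFT is the previous output record (blank for the first record) and
    RIGHT is the current input record.
    """
    out = []
    left = blank
    for right in records:
        left = transform(left, right)
        out.append(left)
    return out


def is_one_year_or_greater(admsn_dt, dschrg_dt):
    # Assumption: "one year" is taken as 365 days for this sketch.
    return (dschrg_dt - admsn_dt).days >= 365


def flag_long_stays(left, right):
    return {
        **right,                    # SELF := RIGHT
        "seq": left["seq"] + 1,     # uses the previous result (LEFT)
        "is_dropped": is_one_year_or_greater(right["admsn_dt"], right["dschrg_dt"]),
    }


claims = [
    {"admsn_dt": date(2014, 1, 1), "dschrg_dt": date(2014, 1, 5)},
    {"admsn_dt": date(2013, 1, 1), "dschrg_dt": date(2014, 3, 1)},
]
flagged = iterate(claims, flag_long_stays, blank={"seq": 0})
```

Unlike the cursor loop, the per-record logic lives in one function with no explicit open, fetch, or close steps.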

22

ECL ROLLUP

[Diagram: ROLLUP applied to records R1 to R6. LEFT and RIGHT mark the pair being compared; the transform (Tx) produces the merged records RA and RB.]

ROLLUP(dataset,
       condition(LEFT, RIGHT),
       transformation(LEFT, RIGHT))
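The ROLLUP signature above can be sketched in Python as follows. This is an illustration of the semantics only: the `rollup` helper and the tuple records are assumptions, not HPCC code.

```python
def rollup(records, condition, transform):
    """Merge runs of adjacent records, in the spirit of ECL's ROLLUP.

    When condition(LEFT, RIGHT) is true, transform(LEFT, RIGHT) replaces LEFT
    and is compared with the next record; otherwise RIGHT starts a new record.
    """
    out = []
    for right in records:
        if out and condition(out[-1], right):
            out[-1] = transform(out[-1], right)
        else:
            out.append(right)
    return out


records = [("a", 1), ("a", 2), ("b", 5), ("a", 4)]
merged = rollup(
    records,
    condition=lambda left, right: left[0] == right[0],
    transform=lambda left, right: (left[0], left[1] + right[1]),
)
```

Only adjacent records are compared, so the trailing `("a", 4)` is not merged with the first run. That is why the input is normally sorted on the matching fields before rolling up.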

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.

2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.

3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record or whether the stay is included/excluded as a readmission for an existing episode.

4. Costs across all IP claims included in the single stay are aggregated to the stay level.

5. Where the last claim in the series has patient (…) [as "still a patient", not discharged], flag these and drop all of the claims in the series from the IP hospital stay file.
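The five rules above can be sketched in Python as a sort followed by a merge of adjacent claims. Everything here is illustrative: the `Claim` fields, the `still_patient` flag, and in particular the `is_interim` test (same beneficiary, provider, and admission date) are assumptions, since the deck does not show the real rule. The sketch flags a stay under rule 5 but does not physically remove it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Claim:
    bene_id: int
    provider: str
    admsn_dt: int            # YYYYMMDD
    dschrg_dt: int           # YYYYMMDD
    ms_drg: str
    pmt_amt: float
    still_patient: bool = False
    is_dropped: bool = False


def is_interim(left, right):
    # Assumption: same beneficiary, provider and admission date => same stay.
    return (left.bene_id, left.provider, left.admsn_dt) == (
        right.bene_id, right.provider, right.admsn_dt)


def merge_interim_claims(left, right):
    return replace(
        left,                                   # rule 1: keep the first admission date
        dschrg_dt=right.dschrg_dt,              # rules 1-2: last discharge date
        ms_drg=right.ms_drg,                    # rule 3: discharge MS-DRG
        pmt_amt=left.pmt_amt + right.pmt_amt,   # rule 4: aggregate costs
        still_patient=right.still_patient,      # status of the last claim
    )


def build_stays(claims):
    ordered = sorted(
        claims, key=lambda c: (c.bene_id, c.provider, c.admsn_dt, c.dschrg_dt))
    stays = []
    for claim in ordered:
        if stays and is_interim(stays[-1], claim):
            stays[-1] = merge_interim_claims(stays[-1], claim)
        else:
            stays.append(claim)
    # Rule 5: flag stays whose last claim is "still a patient".
    return [replace(stay, is_dropped=stay.still_patient) for stay in stays]


stays = build_stays([
    Claim(1, "P1", 20140101, 20140120, "470", 2000.0),
    Claim(1, "P1", 20140101, 20140110, "291", 3000.0),
    Claim(2, "P1", 20140301, 20140310, "291", 1000.0, still_patient=True),
])
```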

23

Processing Claims With ECL

H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
H_2 := ROLLUP(H_1,
    is_interim(LEFT, RIGHT),
    merge_interim_claims(LEFT, RIGHT));

H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY);
H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
    SELF.is_dropped := TRUE;
    SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
    SELF := LEFT));
H := H_2 + H_4;

24

25

Template Language

EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet)
  baseDataDirectory := pBaseDataDirectory + pId +
  #FOR(folder)
    #UNIQUENAME(subId)
    %subId% :=
    #UNIQUENAME(subDS)
    %subDS% := ClientDatasets(%subId%) []
    #UNIQUENAME(id)
    %id% := pId + +
    #UNIQUENAME(dataDir)
    %dataDir% := baseDataDirectory + +
    #UNIQUENAME(etl)
    %etl% := ClientETL(%dataDir%, %id%)
    %etl%.run()
  #END
ENDMACRO

26

Template Language

file_set := '<folders>'
          + '<folder>M201409</folder>'
          + '<folder>M201410</folder>'
          + '<folder>M201411</folder>'
          + '<folder>M201412</folder>'
          + '<folder>M201501</folder>'
          + '<folder>M201502</folder>'
          + '<folder>M201503</folder>'
          + '</folders>';

load_all_client_files(1234, file_set, '/volume1/data/');
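The macro appears to run one ETL per `<folder>` element of the XML it is given. A rough Python equivalent of that loop is sketched below; the `client_data_dirs` helper and the `<base>/<client id>/<folder>` directory layout are assumptions, because the path pieces are not recoverable from the slide.

```python
import xml.etree.ElementTree as ET

file_set = (
    "<folders>"
    "<folder>M201409</folder>"
    "<folder>M201410</folder>"
    "<folder>M201411</folder>"
    "</folders>"
)


def client_data_dirs(client_id, file_set_xml, base_dir):
    """Return one data directory per <folder> element, in document order."""
    root = ET.fromstring(file_set_xml)
    # Assumed layout: <base_dir>/<client_id>/<folder>
    return [f"{base_dir}/{client_id}/{folder.text}"
            for folder in root.findall("folder")]


dirs = client_data_dirs(1234, file_set, "/volume1/data")
```

In the ECL version the loop is expanded at compile time, so each folder gets its own generated definitions rather than a runtime iteration.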

Beyond Processing Data

• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

27

Beyond Processing Security

• HTTPS
• htpasswd
• LDAP support
• File-level security when using LDAP

28

Beyond Processing Workunits

• Workunit Identifier
• Attribution
• Query
• Timings
• Results

29

30

Beyond Processing Collaboration

31

Beyond Processing Collaboration (2)

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing Unit Tests

interim_claims := MODULE
  // Test Data
  test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 30420 )
            + BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 114090 )
            +
  EXPORT Actual := Step2.ip_stays
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;

Beyond Processing Unit Tests (2)

35

// Using inline dataset
simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
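The "NAMED parameters" style of building test claims amounts to a factory with defaults, where each test states only the fields it cares about. A small Python sketch of that idea follows; the default values and the `ip_claim` helper are assumptions, and only the field names come from `simplified_ip_layout`.

```python
# Assumed defaults for illustration; field names follow simplified_ip_layout.
DEFAULT_IP_CLAIM = {
    "bene_id": 0,
    "claim_id": 0,
    "claim_type": "",
    "provider_number": "",
    "through_date": 0,
    "status_code": "",
    "admission_date": 0,
    "discharge_date": 0,
    "ms_drg_code": "",
}


def ip_claim(**overrides):
    """Build one test claim, overriding only the fields a test cares about."""
    unknown = set(overrides) - set(DEFAULT_IP_CLAIM)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    return {**DEFAULT_IP_CLAIM, **overrides}


claim = ip_claim(bene_id=1, claim_id=1, claim_type="00")
```

Rejecting unknown field names keeps a typo in a test from silently producing a claim with default values.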

36

Beyond Processing Visualization

37

Custom Visualization

38

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by episode cost in $1000 (x-axis, 1 to 49)]

39

Insights

[Chart: number of episodes (y-axis, 0 to 80) by episode cost in $1000 (x-axis, 1 to 49), broken out into three series: No readmit, 1 Readmit, 2 Readmits]

Data Delivery Roxie

• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services

40

Data Services

• Web Services over Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

41

Data Service Query

• Define service using ECL

42

INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');
oParams := DATASET([ { oOffset, oResults, oStartDate, oEndDate, … } ], Layouts.service_parameters_layout);

T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
    { STRING bpid := pData.bpid,
      UNSIGNED INTEGER1 model := pData.model,
      UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length,
      INTEGER8 total_episodes := COUNT(GROUP),
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay),
      DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay),
      DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)) },
    bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParameters, T, bpid);
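The grouped summary computed by the report function can be checked by hand with a short Python sketch. The sample episodes and the `summarize` helper are made up for illustration, and the standard deviation here uses the population variance (dividing by n), which is assumed to match what ECL's VARIANCE returns.

```python
from collections import defaultdict
from math import sqrt

# Made-up episodes; only the field names follow the slide.
episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 100.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 200.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 300.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 50.0},
]


def summarize(rows):
    """Group by (bpid, model, post_dsch_prd_length) and compute cost statistics."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["bpid"], row["model"], row["post_dsch_prd_length"])
        groups[key].append(row["sum_post_dsch_prd_pay"])
    report = {}
    for key, costs in groups.items():
        mean = sum(costs) / len(costs)
        # Population variance (divide by n); assumed to match ECL's VARIANCE.
        variance = sum((c - mean) ** 2 for c in costs) / len(costs)
        report[key] = {
            "total_episodes": len(costs),
            "total_costs": sum(costs),
            "average_costs": mean,
            "std_dev_costs": sqrt(variance),
        }
    return report


report = summarize(episodes)
```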

43

Data (Web) Services

Request:
{ "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }

Response:
{ "summaryResponse": { … "Results": { … "Summary": { "Row": [
    { "bpid": 9999, "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987, "total_costs": 987654321, "average_costs": 1235813, … }
] } … } } }

https://…/WsEcl/forms/json/query/roxie/summary
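On the client side, talking to such a service comes down to building a JSON body and unpacking the rows from the response. The sketch below does both without any network call. The nesting (`summaryResponse` / `Results` / `Summary` / `Row`) follows the order of the names on the slide and is an assumption, as are the helper names and the hand-written sample response.

```python
import json


def build_summary_request(offset, results, begin_date, end_date):
    """Return the JSON body for the 'summary' query shown on the slide."""
    return json.dumps({"summary": {
        "offset": offset,
        "results": results,
        "begin_date": begin_date,
        "end_date": end_date,
    }})


def extract_rows(response_text):
    """Pull the Summary rows out of a summaryResponse document."""
    body = json.loads(response_text)
    return body["summaryResponse"]["Results"]["Summary"]["Row"]


# Hand-written response in the shape suggested by the slide (not real output).
sample_response = json.dumps({"summaryResponse": {"Results": {"Summary": {"Row": [
    {"bpid": 9999, "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987},
]}}}})

request_body = build_summary_request(1, 10, 20130101, 20130201)
rows = extract_rows(sample_response)
```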

44

Data Services WsECL

45

Data Services WsECL

46

Loading up data

• Logical vs Physical
    Logical:  ~abc::subfolder::subsubfolder::myfile
    Physical: abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster
    oDS := DATASET(Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                   Layouts.ip_claim_layout,
                   CSV(HEADING(0)));
    oDSDistributed := DISTRIBUTE(oDS, bene_id);
    OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster
    oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout);
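The point of distributing on `bene_id` is that all claims for one beneficiary end up on the same node, so later per-beneficiary steps need no cross-node traffic. A toy Python sketch of that placement is shown below, assuming records are assigned by key value modulo the node count; the `distribute` helper is hypothetical and HPCC's actual placement may differ.

```python
def distribute(records, key, node_count):
    """Assign each record to a node by key value modulo node_count."""
    nodes = [[] for _ in range(node_count)]
    for record in records:
        nodes[record[key] % node_count].append(record)
    return nodes


# Made-up claims: claim_id 1..5 for beneficiaries 7, 3, 7, 10, 3.
claims = [{"bene_id": b, "claim_id": c}
          for c, b in enumerate([7, 3, 7, 10, 3], start=1)]
nodes = distribute(claims, "bene_id", node_count=4)
```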

47

SuperFiles

• Super File = symbolic link to a list of sub-files
• Each sub-file must have the same layout

    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    …
    run_report() := FUNCTION
      oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
      RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;

• Including (more) data

    SEQUENTIAL(
      Std.File.StartSuperFileTransaction(),
      Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
      Std.File.FinishSuperFileTransaction()
    );
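The benefit of a super file is that the report code never changes when new data arrives: it reads one name, and the list behind that name grows. The Python sketch below models this with a hypothetical `SuperFile` class and made-up web log rows; it is not the HPCC API.

```python
class SuperFile:
    """A named list of sub-files that is read as one dataset."""

    def __init__(self, layout):
        self.layout = tuple(layout)
        self.sub_files = []

    def add_sub_file(self, rows):
        for row in rows:
            # Every sub-file must share the super file's layout.
            if tuple(row) != self.layout:
                raise ValueError("sub-file layout does not match the super file")
        self.sub_files.append(rows)

    def read(self):
        return [row for rows in self.sub_files for row in rows]


def hits_per_ip(rows):
    """Same idea as the slide's report: count rows per ip_address."""
    counts = {}
    for row in rows:
        counts[row["ip_address"]] = counts.get(row["ip_address"], 0) + 1
    return counts


weblogs = SuperFile(layout=("ip_address", "url"))
weblogs.add_sub_file([{"ip_address": "10.0.0.1", "url": "/a"}])
before = hits_per_ip(weblogs.read())
weblogs.add_sub_file([{"ip_address": "10.0.0.1", "url": "/b"},
                      {"ip_address": "10.0.0.2", "url": "/a"}])
after = hits_per_ip(weblogs.read())
```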

48

Data Services Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex;
  […]
  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[
  […]
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParameters, NAMED('Request'));
  oSummary := DATASET([ { COUNT(%sorted%) } ], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
                 NAMED(pServiceName), ALL);
  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
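The macro's flow (optional filters, report function, sort, one page of results plus a count) can be sketched in Python as below. The dictionary keys, the `total` field, and the sample rows are assumptions; the one detail taken from ECL is that CHOOSEN's start position is 1-based.

```python
def run_it(rows, params, report_function, sort_by_field):
    """Filter, report, sort, then return one page plus a total count."""
    filtered = rows
    # An empty filter list means "no restriction", like IF(COUNT(...) = 0, ...).
    if params["providers"]:
        filtered = [r for r in filtered if r["provider_id"] in params["providers"]]
    report = report_function(filtered)
    ordered = sorted(report, key=lambda r: r[sort_by_field])
    start = params["offset"] - 1          # CHOOSEN's start position is 1-based
    page = ordered[start:start + params["results"]]
    return {"Request": params, "Metadata": {"total": len(ordered)}, "page": page}


# Made-up rows; the report function here simply passes rows through.
rows = [
    {"bpid": "c", "provider_id": 1},
    {"bpid": "a", "provider_id": 2},
    {"bpid": "d", "provider_id": 1},
    {"bpid": "b", "provider_id": 2},
]
all_providers = run_it(rows, {"providers": [], "offset": 2, "results": 2},
                       lambda r: r, "bpid")
one_provider = run_it(rows, {"providers": [1], "offset": 1, "results": 10},
                      lambda r: r, "bpid")
```

Because the report function and the sort field are parameters, each new service only has to supply its own aggregation.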

49

AHA System Architecture

50

Archway Analytics

51

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?

  • Slide Number 1
  • Tripfilmscom
  • The Achievement Network
  • Archway Health Advisors
  • Medicare
  • Medicare Fee for Service
  • Bundled Payments for Care
  • Medicare BPCI
  • Claims
  • Episode of Care
  • Mining Claims
  • Technologies considered
  • HPCC Systems
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Current Status and Resources
  • Data mining with HPCC Systems
  • SQL vs ECL
  • SQL vs ECL
  • ECL ROLLUP
  • Processing Claims
  • Processing Claims With ECL
  • Template Language
  • Template Language
  • Beyond Processing Data
  • Beyond Processing Security
  • Beyond Processing Workunits
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration (2)
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration
  • Beyond Processing Unit Tests
  • Beyond Processing Unit Tests (2)
  • Beyond Processing Visualization
  • Custom Visualization
  • (No) Insights
  • Insights
  • Data Delivery Roxie
  • Data Services
  • Data Service Query
  • Data (Web) Services
  • Data Services WsECL
  • Data Services WsECL
  • Loading up data
  • SuperFiles
  • Data Services Reusability
  • AHA System Architecture
  • Archway Analytics
  • Slide Number 51
  • Custom Visualization
  • (No) Insights
  • Insights
  • Data Delivery Roxie
  • Data Services
  • Data Service Query
  • Data (Web) Services
  • Data Services WsECL
  • Data Services WsECL
  • Loading up data
  • SuperFiles
  • Data Services Reusability
  • AHA System Architecture
  • Archway Analytics
  • Slide Number 51
Page 23: Making Sense of Medicare Data: From Mining to Analytics

Processing Claims

1. The intent here is to make the series of interim claims look like a single claim for most purposes: the admission date of the first claim becomes the admission date of the whole claim, and the discharge date of the last claim in the series becomes the discharge date of the whole claim.

2. The admission date from the first claim in the series and the discharge date from the last claim in the series define the length of the stay.

3. The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record, or whether the stay is included/excluded as a readmission for an existing episode.

4. Costs across all IP claims included in the single stay are aggregated to the stay level.

5. Where the last claim in the series has patient (…) [as "still a patient", not discharged], flag these and drop all of the claims in the series from the IP hospital stay file.
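
The five rules above can be sketched in Python, used here only so the example runs without an HPCC cluster; the deck's own implementation is in ECL. Everything named below is an assumption made for illustration: the field names, the status code "30" for "still a patient", the use of the last claim's through date as the discharge date, and the idea that claims sharing beneficiary, provider and admission date form one interim series.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby

STILL_PATIENT = "30"  # assumed discharge-status code meaning "still a patient"


@dataclass
class Claim:
    bene_id: int
    provider: str
    admit: int    # admission date, YYYYMMDD
    thru: int     # through date, YYYYMMDD (assumed to be the discharge date on the last claim)
    status: str   # patient discharge status code
    drg: str      # MS-DRG code
    paid: float


@dataclass
class Stay:
    bene_id: int
    provider: str
    admit: int
    discharge: int
    los_days: int
    drg: str
    paid: float


def to_date(yyyymmdd: int) -> date:
    return date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100)


def build_stays(claims):
    """Collapse each interim series into one stay; return (stays, dropped claims)."""
    def series_key(c):
        return (c.bene_id, c.provider, c.admit)

    stays, dropped = [], []
    ordered = sorted(claims, key=lambda c: (*series_key(c), c.thru))
    for (bene_id, provider, admit), group in groupby(ordered, key=series_key):
        series = list(group)
        last = series[-1]
        if last.status == STILL_PATIENT:
            dropped.extend(series)          # rule 5: drop the whole series
            continue
        stays.append(Stay(
            bene_id, provider,
            admit,                           # rule 1: admission date of the first claim
            last.thru,                       # rule 1: discharge date of the last claim
            (to_date(last.thru) - to_date(admit)).days,  # rule 2: length of stay
            last.drg,                        # rule 3: discharge MS-DRG
            sum(c.paid for c in series),     # rule 4: costs aggregated to the stay
        ))
    return stays, dropped


claims = [
    Claim(1, "P1", 20130101, 20130110, "30", "470", 3000.0),
    Claim(1, "P1", 20130101, 20130118, "01", "469", 1500.0),
    Claim(2, "P9", 20130205, 20130212, "30", "291", 2000.0),
]
stays, dropped = build_stays(claims)
```

With this sample, the two claims for beneficiary 1 collapse into a single stay, while the lone claim for beneficiary 2 is dropped because its last (and only) claim still reports the patient as not discharged.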

Page 24: Making Sense of Medicare Data: From Mining to Analytics

Processing Claims With ECL

H_1 := SORT(A, bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
H_2 := ROLLUP(H_1,
              is_interim(LEFT, RIGHT),
              merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1,
            LEFT.bene_sk = RIGHT.bene_sk […],
            RIGHT ONLY);
H_4 := PROJECT(H_3,
               TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
                         SELF.is_dropped := TRUE;
                         SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
                         SELF := LEFT;));
H := H_2 + H_4;
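
The SORT, ROLLUP, right-only JOIN and PROJECT pipeline above can be mimicked in plain Python to show what each step contributes. This is a sketch under stated assumptions, not the deck's code: the bodies of `is_interim` and `merge_interim_claims` are not shown on the slide, so the versions below are guesses, and the `claim_id` and `pmt_amt` fields and the join key are hypothetical stand-ins for the elided join condition.

```python
INTERIM_CLAIM = "InterimClaim"  # hypothetical value, named after the deck's reason-code constant


def rollup(rows, condition, transform):
    """Fold each run of adjacent rows for which condition(accumulated, next) holds."""
    out = []
    for row in rows:
        if out and condition(out[-1], row):
            out[-1] = transform(out[-1], row)
        else:
            out.append(dict(row))
    return out


def is_interim(left, right):
    # Assumed rule: same beneficiary, provider and admission date.
    return all(left[k] == right[k] for k in ("bene_sk", "provider", "admsn_dt"))


def merge_interim_claims(left, right):
    # Assumed merge: keep the first claim's identity, extend the through date, add payments.
    return {**left,
            "thru_dt": right["thru_dt"],
            "pmt_amt": left["pmt_amt"] + right["pmt_amt"]}


def stays_with_audit_trail(claims):
    h1 = sorted(claims, key=lambda c: (c["bene_sk"], c["provider"],
                                       c["admsn_dt"], c["thru_dt"]))
    h2 = rollup(h1, is_interim, merge_interim_claims)
    kept_ids = {c["claim_id"] for c in h2}
    # Right-only join: input claims with no counterpart in the rolled-up output.
    h3 = [c for c in h1 if c["claim_id"] not in kept_ids]
    h4 = [{**c, "is_dropped": True, "dropped_reason_code": INTERIM_CLAIM} for c in h3]
    kept = [{**c, "is_dropped": False, "dropped_reason_code": None} for c in h2]
    return kept + h4


claims = [
    {"claim_id": 10, "bene_sk": 1, "provider": "P1",
     "admsn_dt": 20130101, "thru_dt": 20130110, "pmt_amt": 3000},
    {"claim_id": 11, "bene_sk": 1, "provider": "P1",
     "admsn_dt": 20130101, "thru_dt": 20130118, "pmt_amt": 1500},
    {"claim_id": 12, "bene_sk": 2, "provider": "P9",
     "admsn_dt": 20130205, "thru_dt": 20130212, "pmt_amt": 2000},
]
result = stays_with_audit_trail(claims)
```

Keeping the absorbed claims in the output, flagged rather than deleted, leaves an audit trail: every input claim can still be found, either as a merged stay or as a dropped record with a reason code.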

Page 25: Making Sense of Medicare Data: From Mining to Analytics

Template Language

EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
  LOADXML(pFileSet);
  baseDataDirectory := pBaseDataDirectory + pId +
  #FOR(folder)
    #UNIQUENAME(subId)
    %subId% :=
    #UNIQUENAME(subDS)
    %subDS% := ClientDatasets(%subId%) []
    #UNIQUENAME(id)
    %id% := pId + +
    #UNIQUENAME(dataDir)
    %dataDir% := baseDataDirectory + +
    #UNIQUENAME(etl)
    %etl% := ClientETL(%dataDir%, %id%);
    %etl%.run();
  #END
ENDMACRO;
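
The macro's overall shape, which is to load an XML list of folders and run one ETL per folder, can be illustrated with a small Python analogue. It is only a sketch: the way the load id and the data directory are composed is not recoverable from the slide, so the `load_id` naming scheme, the directory layout and the `run_etl` callback are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def load_all_client_files(client_id, file_set_xml, base_dir, run_etl):
    """Call run_etl(data_dir, load_id) once per <folder> element, in document order."""
    root = ET.fromstring(file_set_xml)
    client_dir = PurePosixPath(base_dir) / str(client_id)
    results = []
    for folder in root.iter("folder"):
        sub_id = folder.text.strip()
        load_id = f"{client_id}_{sub_id}"   # hypothetical naming scheme
        data_dir = client_dir / sub_id      # hypothetical directory layout
        results.append(run_etl(str(data_dir), load_id))
    return results


file_set = "<folders><folder>M201409</folder><folder>M201410</folder></folders>"
# A stand-in ETL step that simply reports what it was asked to load.
runs = load_all_client_files(1234, file_set, "/data", lambda d, i: (d, i))
```

The difference from the ECL version is timing: the template language expands the loop at compile time, generating one set of definitions per folder, whereas this Python loop runs at execution time.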

Page 26: Making Sense of Medicare Data: From Mining to Analytics

Template Language

file_set := '<folders>'
          + '<folder>M201409</folder>'
          + '<folder>M201410</folder>'
          + '<folder>M201411</folder>'
          + '<folder>M201412</folder>'
          + '<folder>M201501</folder>'
          + '<folder>M201502</folder>'
          + '<folder>M201503</folder>'
          + '</folders>';

load_all_client_files(1234, file_set, 'volume1data');
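
The folder list above is one entry per month, so it could also be generated rather than typed out. The following Python sketch shows one way to do that; the helper names `month_folders` and `build_file_set` are hypothetical and do not come from the deck.

```python
def month_folders(start, end):
    """Yield 'MYYYYMM' names for every month from start to end inclusive (YYYYMM ints)."""
    year, month = divmod(start, 100)
    while year * 100 + month <= end:
        yield f"M{year:04d}{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def build_file_set(start, end):
    """Wrap the monthly folder names in the <folders>/<folder> XML the macro expects."""
    items = "".join(f"<folder>{name}</folder>" for name in month_folders(start, end))
    return f"<folders>{items}</folders>"


file_set = build_file_set(201409, 201503)
```

For the range September 2014 to March 2015 this produces the same seven `<folder>` entries as the hand-written string.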

Page 27: Making Sense of Medicare Data: From Mining to Analytics

Beyond Processing Data

• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations

27

Beyond Processing Security

• HTTPS
• htpasswd
• LDAP support
• File-level security when using LDAP

28

Beyond Processing Workunits

• Workunit Identifier
• Attribution
• Query
• Timings
• Results

29

30

Beyond Processing Collaboration

31

Beyond Processing Collaboration (2)

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing Unit Tests

interim_claims := MODULE
  // Test Data
  test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 30420 )
            + BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 114090 ) +
  EXPORT Actual := Step2.ip_stays
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou […]
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;

Beyond Processing Unit Tests (2)

35

// Using inline dataset
simple_ip_claims := DATASET([ 11001000120000201200001202000020161 ], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;
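The "named parameters" style of building test claims can be sketched outside ECL as well. The Python below is only an illustration of the idea, not the deck's Samples module: the default values and the `ip_claim` helper are assumptions, and only the field names come from simplified_ip_layout.

```python
# Hypothetical defaults; only the field names are taken from simplified_ip_layout.
DEFAULT_IP_CLAIM = {
    "bene_id": 0,
    "claim_id": 0,
    "claim_type": "",
    "provider_number": "",
    "through_date": 0,
    "status_code": "",
    "admission_date": 0,
    "discharge_date": 0,
    "ms_drg_code": "",
}


def ip_claim(**overrides):
    """Build one sample inpatient claim, overriding only the named fields."""
    unknown = set(overrides) - set(DEFAULT_IP_CLAIM)
    if unknown:
        raise TypeError(f"unknown claim field(s): {sorted(unknown)}")
    return {**DEFAULT_IP_CLAIM, **overrides}


# A test only spells out the fields it cares about.
claim = ip_claim(bene_id=1, claim_id=1, claim_type="00")
```

Naming only the relevant fields keeps each test short and makes it obvious which values the assertion depends on.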

36

Beyond Processing Visualization

37

Custom Visualization

38

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by cost in $1000 (x-axis, 1 to 49)]

39

Insights

[Chart: number of episodes (y-axis, 0 to 80) by cost in $1000 (x-axis, 1 to 49), broken out by series: No readmit, 1 Readmit, 2 Readmits]

Data Delivery Roxie

• Data Delivery Engine
• Indexed, compressed, and in-memory
• Data Warehouse Capabilities
• Data Services

40

Data Services

• Web Services over Data Warehouse
• XML/SOAP, but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster

41

Data Service Query

• Define service using ECL

42

INTEGER4 oOffset    := 1        : STORED('Offset');
INTEGER4 oResults   := 100      : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate   := 20140201 : STORED('End_Date');

oParams := DATASET([ { oOffset, oResults, oStartDate, oEndDate, … } ], Layouts.service_parameters_layout);

// Report function: one summary row per (bpid, model, post_dsch_prd_length)
T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
  RETURN TABLE(pData,
    {
      STRING bpid := pData.bpid;
      UNSIGNED INTEGER1 model := pData.model;
      UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
      INTEGER8 total_episodes := COUNT(GROUP);
      DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
      DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
      DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
    },
    bpid, model, post_dsch_prd_length);
END;

ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
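For readers who do not know ECL, the grouped TABLE in the report function can be approximated in Python. This is a minimal sketch, not the service itself: the sample episodes are made up, and `pstdev` is used on the assumption that ECL's VARIANCE is the population variance.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def summarize(episodes):
    """Aggregate episode costs per (bpid, model, post_dsch_prd_length)."""
    groups = defaultdict(list)
    for episode in episodes:
        key = (episode["bpid"], episode["model"], episode["post_dsch_prd_length"])
        groups[key].append(episode["sum_post_dsch_prd_pay"])

    rows = []
    for (bpid, model, length), costs in sorted(groups.items()):
        rows.append({
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": length,
            "total_episodes": len(costs),                # COUNT(GROUP)
            "total_costs": round(sum(costs), 2),         # SUM(GROUP, ...)
            "average_costs": round(fmean(costs), 2),     # AVE(GROUP, ...)
            "std_dev_costs": round(pstdev(costs), 2),    # SQRT(VARIANCE(GROUP, ...))
        })
    return rows


# Made-up episodes, for illustration only.
episodes = [
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 10000.0},
    {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "sum_post_dsch_prd_pay": 14000.0},
    {"bpid": "8888", "model": 2, "post_dsch_prd_length": 30, "sum_post_dsch_prd_pay": 5000.0},
]
summary = summarize(episodes)
```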

43

Data (Web) Services

Request: https://[…]/WsEcl/forms/json/query/roxie/summary

{ "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } }

Response:

{ "summaryResponse": { …
    "Results": { …
      "Summary": { "Row": [
        { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90,
          "total_episodes": 987, "total_costs": 987654321, "average_costs": 1235813, … }
      ] } … } } }
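A client for such a service mostly wraps and unwraps JSON. The sketch below makes no network call; it only builds a request body and pulls the rows out of a response. The nesting (query name, then "Response", "Results", result name, "Row") is an assumption read off the flattened slide, and the helper names are hypothetical.

```python
import json


def build_request(query_name, **params):
    """Wrap the service parameters under the query name."""
    return json.dumps({query_name: params})


def extract_rows(response_text, query_name, result_name):
    """Return the list of rows for one named result of the response."""
    body = json.loads(response_text)
    results = body[f"{query_name}Response"]["Results"]
    return results[result_name]["Row"]


request_body = build_request(
    "summary", offset=1, results=10, begin_date=20130101, end_date=20130201
)

# Sample response using values shown on the slide (other fields omitted).
sample_response = json.dumps({
    "summaryResponse": {"Results": {"Summary": {"Row": [
        {"bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987}
    ]}}}
})
rows = extract_rows(sample_response, "summary", "Summary")
```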

44

Data Services WsECL

45

Data Services WsECL

Loading up data

• Logical vs Physical

~abc::subfolder::subsubfolder::myfile
abc/subfolder/subsubfolder/myfile

• ECL to load data into cluster

oDS := DATASET( Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
                Layouts.ip_claim_layout, CSV(HEADING(0)) );
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);

46

SuperFiles

• Super File = symbolic link to a list of sub-files
• Each sub-file must have the same layout

WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
END;

• Including (more) data

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);

47
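The point of a superfile is that the report code never changes when data is added. The Python below is a toy model of that idea, not HPCC's implementation: the `SuperFile` class, the two-column layout (ip_address, path), and the sample log lines are all assumptions.

```python
import csv
import io
from collections import Counter


class SuperFile:
    """A named list of sub-files that is read as if it were one file."""

    def __init__(self, name):
        self.name = name
        self.sub_files = []  # (sub-file name, CSV text) pairs

    def add(self, sub_file_name, text):
        self.sub_files.append((sub_file_name, text))

    def rows(self, fieldnames):
        # Every sub-file is parsed with the same layout, as the slide requires.
        for _, text in self.sub_files:
            yield from csv.DictReader(io.StringIO(text), fieldnames=fieldnames)


def run_report(super_file):
    """Count hits per ip_address across all sub-files."""
    return Counter(row["ip_address"] for row in super_file.rows(["ip_address", "path"]))


weblogs = SuperFile("weblogs")
weblogs.add("day-1", "10.0.0.1,/home\n10.0.0.2,/home\n")
before = run_report(weblogs)

# Including more data: add a sub-file, then rerun the unchanged report.
weblogs.add("day-2", "10.0.0.1,/about\n")
after = run_report(weblogs)
```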

48

Data Services Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex […]

  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]

  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([ { COUNT(%sorted%) } ], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset), NAMED(pServiceName), ALL);

  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
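The reusable part of the macro is a fixed pipeline: filter only when a filter is supplied, run the report, sort, then return one page. A rough Python equivalent follows, assuming a 1-based offset as in CHOOSEN; the function names, the "total_rows" metadata field, and the sample rows are hypothetical.

```python
def filter_if_given(rows, field, allowed):
    """Keep every row when `allowed` is empty, else only rows whose field is in it."""
    if not allowed:
        return rows
    return [row for row in rows if row[field] in allowed]


def choose_n(rows, results, offset=1):
    """Return at most `results` rows starting at 1-based position `offset`."""
    start = max(offset, 1) - 1
    return rows[start:start + max(results, 0)]


def run_it(service_name, params, report_function, sort_field, data):
    """Filter, report, sort, and page; returns the three named outputs."""
    filtered = filter_if_given(data, "provider_id", params.get("providers", []))
    report = report_function(filtered)
    ordered = sorted(report, key=lambda row: row[sort_field])
    return {
        "Request": params,
        "Metadata": {"total_rows": len(ordered)},
        service_name: choose_n(ordered, params["results"], params["offset"]),
    }


# Made-up rows; the identity lambda stands in for a real report function.
data = [
    {"bpid": "0003", "provider_id": "A"},
    {"bpid": "0001", "provider_id": "B"},
    {"bpid": "0002", "provider_id": "A"},
    {"bpid": "0004", "provider_id": "A"},
]
params = {"offset": 2, "results": 2, "providers": ["A"]}
response = run_it("Summary", params, lambda rows: rows, "bpid", data)
```

Returning the total row count alongside the page lets a client render pagination without a second query.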

49

AHA System Architecture

50

Archway Analytics

51

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?

Page 31: Making Sense of Medicare Data: From Mining to Analytics

31

Beyond Processing Collaboration (2)

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing Unit Tests interim_claims = MODULE Test Data test_set = BPCITestSamplesip_claim( bene_id = 1 claim_id = 1 pmt_amt = 30420 ) + BPCITestSamplesip_claim( bene_id = 1 claim_id = 2 pmt_amt = 114090 ) + EXPORT Actual = Step2ip_stays SHARED TestSuite = MODULE EXPORT Test01 = ASSERT(oActual(NOT is_dropped) claimno IN [12] Did not filter ou EXPORT Test02 = ASSERT(oActual(is_dropped) claimno IN [35]) END EXPORT AllTests = TestSuiteTest01 + TestSuiteTest02 END

Beyond Processing Unit Tests (2)

35

Using inline dataset simple_ip_claims = DATASET([ 11001000120000201200001202000020161 ] simplified_ip_layout) ip_claims = Samplesip_claims(simple_ip_claims) OR passing NAMED parameters ip_claims = Samplesip_claims2( bene_id = 1 claim_id = 1 claim_type = 00 )

simplified_ip_layout = RECORD UNSIGNED bene_id UNSIGNED claim_id STRING claim_type STRING provider_number INTEGER4 through_date STRING status_code INTEGER4 admission_date INTEGER4 discharge_date STRING ms_drg_code END

36

Beyond Processing Visualization

37

Custom Visualization

38

(No) Insights

0

20

40

60

80

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49

Episo

des

Costs in $1000

39

Insights

0

20

40

60

80

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49

Episo

des

Costs in $1000

No readmit 1 Readmit 2 Readmits

Data Delivery Roxie

bull Data Delivery Engine

bull Indexed compressed and in-memory

bull Data Warehouse Capabilities

bull Data Services

40

Data Services

bull Web Services over Data Warehouse

bull XMLSOAP but also JSON

bull Web Services defined in ECL

bull Solution to addremove data from the cluster

41

Data Service Query bull Define service using ECL

42

INTEGER4 oOffset = 1 STORED(Offset) INTEGER4 oResults = 100 STORED(Results) INTEGER4 oStartDate = 20130201 STORED(Begin_Date) INTEGER4 oEndDate = 20140201 STORED(End_Date) oParams = DATASET([ oOffset oResults oStartDate oEndDate hellip ] Layoutsservice_parameters_layout) T(DATASET(RECORDOF(DatasetsdsFactEpisodeCostsIndex)) pData) = FUNCTION RETURN TABLE(pData STRING bpid = pDatabpid UNSIGNED INTEGER1 model = pDatamodel UNSIGNED INTEGER1 post_dsch_prd_length = pDatapost_dsch_prd_length INTEGER8 total_episodes = COUNT(GROUP) DECIMAL15_2 total_costs = SUM(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 average_costs = AVE(GROUP sum_post_dsch_prd_pay) DECIMAL15_2 std_dev_costs = SQRT(VARIANCE(GROUP sum_post_dsch_prd_pay)) bpid model post_dsch_prd_length) END ReportServicesBaseServicerun_it(Summary oParameters T bpid)

43

Data (Web) Services

summary offset 1 results 10 begin_date 20130101 end_date 20130201 hellip

summaryResponse hellip Results hellip Summary Row [ bpid 9999 model 2 post_dsch_prd_length 90 total_episodes 987 total_costs 987654321 average_costs 1235813

hellip ] hellip

Request Response

httpsWsEclformsjsonqueryroxiesummary

44

Data Services WsECL

45

Data Services WsECL

Loading up data bull Logical vs Physical

~abcsubfoldersubsubfoldermyfile abcsubfoldersubsubfoldermyfile

bull ECL to load data into cluster

46

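// Read a CSV file directly from a machine outside the cluster
oDS := DATASET(
  Std.File.ExternalLogicalFilename('172.0.0.1', '/var/lib/myfile.csv'),
  Layouts.ip_claim_layout,
  CSV(HEADING(0))
);

// Spread the records across the nodes by beneficiary, then write a logical file
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed, , '~somewhere::over::here::myfile', OVERWRITE);

• ECL to use data loaded into cluster

oDS := DATASET('~somewhere::over::here::myfile', Layouts.ip_claim_layout, THOR);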
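The effect of distributing on `bene_id` can be illustrated with a small Python sketch: every claim for a given beneficiary lands on the same node, so per-beneficiary processing needs no cross-node traffic. The function name `distribute`, the sample claims and the key-modulo-node-count placement rule are assumptions made for illustration; the cluster's actual placement logic may differ.

```python
def distribute(records, key, node_count):
    """Assign each record to a node by key value modulo the node count."""
    nodes = [[] for _ in range(node_count)]
    for record in records:
        nodes[record[key] % node_count].append(record)
    return nodes


# Hypothetical claims: beneficiary 1 has two claims.
claims = [
    {"bene_id": 1, "claim_id": 10},
    {"bene_id": 2, "claim_id": 11},
    {"bene_id": 1, "claim_id": 12},
    {"bene_id": 5, "claim_id": 13},
]

nodes = distribute(claims, "bene_id", node_count=3)
for index, node in enumerate(nodes):
    print(index, node)
```

Note that a node may hold several beneficiaries (here 2 and 5 share one), but a single beneficiary is never split across nodes.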

SuperFiles

• Super File = symbolic link to a list of sub-files
• Each sub-file must have the same layout

47

WEBLOGS_FILE := '~somewhere::logs::web';
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
// The report reads the super file like any other logical file
run_report() := FUNCTION
  oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
  RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
END;

• Including (more) data

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web::20150401'),
  Std.File.FinishSuperFileTransaction()
);
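The super file idea, one name standing for a growing list of same-layout sub-files, can be modelled in a few lines of Python. This is a conceptual sketch only: the `SuperFile` class, its method names and the sample log rows are hypothetical and do not mirror the HPCC Systems API.

```python
from collections import Counter


class SuperFile:
    """A named list of sub-files that is read as one dataset."""

    def __init__(self, name):
        self.name = name
        self.sub_files = []  # each entry: (sub-file name, list of row dicts)

    def add_sub_file(self, name, rows):
        # Enforce the "same layout" rule by comparing field names.
        if self.sub_files and rows:
            existing = self.sub_files[0][1]
            if existing and set(existing[0]) != set(rows[0]):
                raise ValueError("sub-file layout differs from existing sub-files")
        self.sub_files.append((name, rows))

    def read(self):
        # Readers see one continuous stream of rows.
        for _, rows in self.sub_files:
            yield from rows


def run_report(super_file):
    """Count rows per ip_address across all sub-files."""
    return Counter(row["ip_address"] for row in super_file.read())


weblogs = SuperFile("weblogs")
weblogs.add_sub_file("day1", [{"ip_address": "10.0.0.1"}, {"ip_address": "10.0.0.2"}])
print(run_report(weblogs))

# Adding a sub-file changes the report without changing the report code.
weblogs.add_sub_file("day2", [{"ip_address": "10.0.0.1"}])
print(run_report(weblogs))
```

The report function never mentions individual sub-files, which is why new data can be included without redeploying the query.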

48

Data Services Reusability

EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
  // Filtering data based on parameters
  #UNIQUENAME(DS)
  %DS% := WS.Datasets.dsFactEpisodeCostsIndex; […]

  #UNIQUENAME(B)
  %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
  #UNIQUENAME(C)
  %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]

  // Run the report, sort it, and emit request, metadata and one page of results
  #UNIQUENAME(report)
  %report% := pReportFunction(%K%);
  #UNIQUENAME(sorted)
  %sorted% := SORT(%report%, pSortByField);
  #UNIQUENAME(O1)
  %O1% := OUTPUT(pParams, NAMED('Request'));
  oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout);
  #UNIQUENAME(O2)
  %O2% := OUTPUT(oSummary, NAMED('Metadata'));
  #UNIQUENAME(O3)
  %O3% := OUTPUT(
    CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
    NAMED(pServiceName), ALL);

  PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
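The shape of this reusable service (sort the report, return the total count as metadata, then return one page of rows) can be sketched in Python. The names `choose_n` and the `total` metadata field are hypothetical, the filtering steps elided on the slide are left out, and the 1-based offset is assumed to match how CHOOSEN treats its starting position.

```python
def choose_n(rows, n, start=1):
    """Return up to n rows beginning at 1-based position start."""
    if start < 1:
        raise ValueError("start is 1-based")
    return rows[start - 1:start - 1 + n]


def run_it(rows, params, report_function, sort_key):
    """Run a report, sort it, and return request, metadata and one page."""
    report = sorted(report_function(rows), key=lambda r: r[sort_key])
    return {
        "Request": params,
        "Metadata": {"total": len(report)},
        "Results": choose_n(report, params["results"], params["offset"]),
    }


# Hypothetical rows and an identity report function.
data = [{"bpid": "c"}, {"bpid": "a"}, {"bpid": "d"}, {"bpid": "b"}]
response = run_it(data, {"offset": 2, "results": 2}, lambda rows: rows, "bpid")
print(response)
```

Returning the total alongside the page lets a client compute how many pages exist without fetching every row.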

49

AHA System Architecture

50

Archway Analytics

51

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?

  • Slide Number 1
  • Tripfilmscom
  • The Achievement Network
  • Archway Health Advisors
  • Medicare
  • Medicare Fee for Service
  • Bundled Payments for Care
  • Medicare BPCI
  • Claims
  • Episode of Care
  • Mining Claims
  • Technologies considered
  • HPCC Systems
  • Slide Number 14
  • Slide Number 15
  • Slide Number 16
  • Slide Number 17
  • Current Status and Resources
  • Data mining with HPCC Systems
  • SQL vs ECL
  • SQL vs ECL
  • ECL ROLLUP
  • Processing Claims
  • Processing Claims With ECL
  • Template Language
  • Template Language
  • Beyond Processing Data
  • Beyond Processing Security
  • Beyond Processing Workunits
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration (2)
  • Beyond Processing Collaboration
  • Beyond Processing Collaboration
  • Beyond Processing Unit Tests
  • Beyond Processing Unit Tests (2)
  • Beyond Processing Visualization
  • Custom Visualization
  • (No) Insights
  • Insights
  • Data Delivery Roxie
  • Data Services
  • Data Service Query
  • Data (Web) Services
  • Data Services WsECL
  • Data Services WsECL
  • Loading up data
  • SuperFiles
  • Data Services Reusability
  • AHA System Architecture
  • Archway Analytics
  • Slide Number 51
Page 32: Making Sense of Medicare Data: From Mining to Analytics

32

Beyond Processing Collaboration

33

Beyond Processing Collaboration

34

Beyond Processing: Unit Tests

interim_claims := MODULE
  // Test Data
  test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 30420 )
            + BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 114090 )
            + …
  EXPORT Actual := Step2.ip_stays …
  SHARED TestSuite := MODULE
    EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1, 2], 'Did not filter ou…
    EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3, 5]);
  END;
  EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;

Beyond Processing Unit Tests (2)

35

simplified_ip_layout := RECORD
  UNSIGNED bene_id;
  UNSIGNED claim_id;
  STRING claim_type;
  STRING provider_number;
  INTEGER4 through_date;
  STRING status_code;
  INTEGER4 admission_date;
  INTEGER4 discharge_date;
  STRING ms_drg_code;
END;

// Using inline dataset
simple_ip_claims := DATASET([ {1, 1, '00', '10001', 20000201, '', 20000120, 20000201, '61'} ], simplified_ip_layout);
ip_claims := Samples.ip_claims(simple_ip_claims);

// OR passing NAMED parameters
ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' );
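The named-parameter style of building test claims, where a test states only the fields it cares about and everything else takes a default, has a direct Python analogue. This is a sketch under assumptions: the `IpClaim` class and the `ip_claim` factory are hypothetical, and the default values are chosen arbitrarily for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IpClaim:
    """Mirror of simplified_ip_layout with neutral defaults."""
    bene_id: int = 0
    claim_id: int = 0
    claim_type: str = ""
    provider_number: str = ""
    through_date: int = 0
    status_code: str = ""
    admission_date: int = 0
    discharge_date: int = 0
    ms_drg_code: str = ""


DEFAULT_CLAIM = IpClaim()


def ip_claim(**overrides):
    """Build a test claim, overriding only the named fields."""
    return replace(DEFAULT_CLAIM, **overrides)


claim = ip_claim(bene_id=1, claim_id=1, claim_type="00")
print(claim)
```

Keeping defaults in one place means a layout change touches the factory rather than every test.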

36

Beyond Processing Visualization

37

Custom Visualization

38

(No) Insights

[Chart: axes "Episodes" and "Costs in $1000"]

39

Insights

[Chart: axes "Episodes" and "Costs in $1000"; series: No readmit, 1 Readmit, 2 Readmits]

Page 37: Making Sense of Medicare Data: From Mining to Analytics

37

Custom Visualization

Page 38: Making Sense of Medicare Data: From Mining to Analytics

38

(No) Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49)]

Page 39: Making Sense of Medicare Data: From Mining to Analytics

39

Insights

[Chart: number of episodes (y-axis, 0 to 80) by costs in $1000 (x-axis, 1 to 49), split into three series: No readmit, 1 Readmit, 2 Readmits]

Page 40: Making Sense of Medicare Data: From Mining to Analytics

Data Delivery Roxie

  • Data Delivery Engine
  • Indexed, compressed and in-memory
  • Data Warehouse Capabilities
  • Data Services

40

Page 41: Making Sense of Medicare Data: From Mining to Analytics

Data Services

  • Web Services over Data Warehouse
  • XML/SOAP but also JSON
  • Web Services defined in ECL
  • Solution to add/remove data from the cluster

41

Page 42: Making Sense of Medicare Data: From Mining to Analytics

Data Service Query

  • Define service using ECL

42

  INTEGER4 oOffset    := 1        : STORED('Offset');
  INTEGER4 oResults   := 100      : STORED('Results');
  INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
  INTEGER4 oEndDate   := 20140201 : STORED('End_Date');

  oParams := DATASET([{oOffset, oResults, oStartDate, oEndDate, …}],
                     Layouts.service_parameters_layout);

  // Report function: one summary row per (bpid, model, post_dsch_prd_length)
  T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
    RETURN TABLE(pData,
      {
        STRING            bpid                 := pData.bpid;
        UNSIGNED INTEGER1 model                := pData.model;
        UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
        INTEGER8          total_episodes       := COUNT(GROUP);
        DECIMAL15_2       total_costs          := SUM(GROUP, sum_post_dsch_prd_pay);
        DECIMAL15_2       average_costs        := AVE(GROUP, sum_post_dsch_prd_pay);
        DECIMAL15_2       std_dev_costs        := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
      },
      bpid, model, post_dsch_prd_length);
  END;

  ReportServices.BaseService.run_it('Summary', oParams, T, bpid);
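The grouped aggregation performed by the TABLE expression above can be sketched outside the cluster. The following is a minimal Python illustration, not the ECL service itself: the sample rows are invented, and population variance is used on the assumption that it matches what VARIANCE returns.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance

# Hypothetical sample rows: (bpid, model, post_dsch_prd_length, sum_post_dsch_prd_pay)
episodes = [
    ("9999", 2, 90, 10000.0),
    ("9999", 2, 90, 14000.0),
    ("9999", 2, 30, 5000.0),
]


def summarize(rows):
    """Group by (bpid, model, post_dsch_prd_length) and aggregate the payments."""
    groups = defaultdict(list)
    for bpid, model, period, pay in rows:
        groups[(bpid, model, period)].append(pay)

    report = []
    for (bpid, model, period), pays in sorted(groups.items()):
        report.append({
            "bpid": bpid,
            "model": model,
            "post_dsch_prd_length": period,
            "total_episodes": len(pays),            # COUNT(GROUP)
            "total_costs": sum(pays),               # SUM(GROUP, ...)
            "average_costs": mean(pays),            # AVE(GROUP, ...)
            "std_dev_costs": sqrt(pvariance(pays)), # SQRT(VARIANCE(GROUP, ...))
        })
    return report


for row in summarize(episodes):
    print(row)
```

Each output row corresponds to one combination of the three grouping fields, which is the shape the web service returns in its Summary result.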

Page 43: Making Sense of Medicare Data: From Mining to Analytics

43

Data (Web) Services

https://…/WsEcl/forms/json/query/roxie/summary

Request

  { "summary": {
      "offset": 1,
      "results": 10,
      "begin_date": 20130101,
      "end_date": 20130201,
      …
  } }

Response

  { "summaryResponse": {
      …
      "Results": {
        …
        "Summary": {
          "Row": [
            { "bpid": "9999",
              "model": 2,
              "post_dsch_prd_length": 90,
              "total_episodes": 987,
              "total_costs": 987654321,
              "average_costs": 1235813,
              … }
          ]
        }
        …
      }
  } }
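A client-side view of this exchange can be sketched as follows. This is a minimal Python illustration that only builds the request body and unpacks a response of the shape shown on the slide; the helper names `build_request` and `extract_rows` are hypothetical, the sample response is invented, and no network call is made.

```python
import json


def build_request(query, **params):
    """Wrap the parameters under the query name, as in the request above."""
    return json.dumps({query: params})


def extract_rows(response_text, query, result_name):
    """Pull the rows of one named result out of the '<query>Response' envelope."""
    envelope = json.loads(response_text)[query + "Response"]
    return envelope["Results"][result_name]["Row"]


request_body = build_request(
    "summary", offset=1, results=10, begin_date=20130101, end_date=20130201
)

# Hypothetical response, shaped like the one on the slide
sample_response = json.dumps({
    "summaryResponse": {
        "Results": {
            "Summary": {
                "Row": [
                    {"bpid": "9999", "model": 2,
                     "post_dsch_prd_length": 90, "total_episodes": 987}
                ]
            }
        }
    }
})

rows = extract_rows(sample_response, "summary", "Summary")
print(request_body)
print(rows[0]["total_episodes"])
```

The result name ("Summary") is the name the ECL code passes to NAMED, so each OUTPUT in the service shows up as its own key under Results.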

44

Data Services WsECL

45

Data Services WsECL

Loading up data

  • Logical vs Physical

    Logical:  ~abc::subfolder::subsubfolder::myfile
    Physical: /abc/subfolder/subsubfolder/myfile

  • ECL to load data into cluster

46

oDS = DATASET( stdFileExternalLogicalFilename(172001varlibmyfilecsv)

Layoutsip_claim_layout CSV(HEADING(0)) )

oDSDistributed = DISTRIBUTE(oDS bene_id) OUTPUT(oDSDistributed lsquo~somewhereoverheremyfilersquo OVERWRITE)

oDS = DATASET(lsquo~somewhereoverheremyfilersquo Layoutsip_claim_layout)

bull ECL to use data loaded into cluster
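The DISTRIBUTE step in the load snippet matters because later per-beneficiary work can run locally on each node. Here is a rough Python sketch of the idea, assuming an integer `bene_id` and simple modulo placement across a hypothetical three-node cluster; the function name `distribute` and the sample claims are illustrative only, not the platform's implementation.

```python
from collections import defaultdict


def distribute(records, key, n_nodes):
    """Place each record on node (key value mod n_nodes).

    Records sharing the same key value always land on the same node.
    """
    nodes = defaultdict(list)
    for rec in records:
        nodes[rec[key] % n_nodes].append(rec)
    return dict(nodes)


# Made-up claims for three beneficiaries.
claims = [
    {"bene_id": 101, "claim": "IP"},
    {"bene_id": 102, "claim": "SNF"},
    {"bene_id": 101, "claim": "HHA"},
    {"bene_id": 104, "claim": "IP"},
]

placement = distribute(claims, "bene_id", 3)
```

With the claims of one beneficiary co-located, grouping them into episodes needs no cross-node traffic, which is the usual reason to distribute on the patient key before processing.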

47

SuperFiles

• Super File = symbolic link to a list of sub-files
• Each sub-file must have the same layout

    WEBLOGS_FILE := '~somewhere::logs::web';
    Std.File.CreateSuperFile(WEBLOGS_FILE);
    …
    run_report() := FUNCTION
      oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
      RETURN TABLE(oDS, { ip_address, COUNT(GROUP) }, ip_address);
    END;

• Including (more) data

    SEQUENTIAL(
      Std.File.StartSuperFileTransaction(),
      Std.File.AddSuperFile(WEBLOGS_FILE, '~somewhere::logs::web20150401'),
      Std.File.FinishSuperFileTransaction()
    );
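The SuperFile behaviour described on this slide can be mimicked in a few lines: a named list of sub-files that share one layout, with additions staged inside a transaction and only visible once it finishes. This is an illustrative Python model, not the HPCC implementation; the class `SuperFile`, its method names, and the sample log rows are all hypothetical.

```python
from collections import Counter


class SuperFile:
    """A named list of sub-files that must all share one layout."""

    def __init__(self, name, layout):
        self.name = name
        self.layout = tuple(layout)
        self.subfiles = []     # committed (name, rows) pairs
        self._pending = None   # staged additions while a transaction is open

    def start_transaction(self):
        self._pending = []

    def add_subfile(self, name, layout, rows):
        if tuple(layout) != self.layout:
            raise ValueError("sub-file layout differs from super file layout")
        target = self.subfiles if self._pending is None else self._pending
        target.append((name, rows))

    def finish_transaction(self):
        # Staged sub-files become visible to readers only at this point.
        self.subfiles.extend(self._pending or [])
        self._pending = None

    def rows(self):
        for _, rows in self.subfiles:
            yield from rows


def run_report(superfile):
    """Count rows per ip_address across every committed sub-file."""
    return Counter(row["ip_address"] for row in superfile.rows())


layout = ("ip_address", "path")
weblogs = SuperFile("~somewhere::logs::web", layout)
weblogs.add_subfile("day1", layout, [
    {"ip_address": "10.0.0.1", "path": "/"},
    {"ip_address": "10.0.0.2", "path": "/a"},
])
before = run_report(weblogs)

weblogs.start_transaction()
weblogs.add_subfile("day2", layout, [{"ip_address": "10.0.0.1", "path": "/b"}])
during = run_report(weblogs)   # the staged sub-file is not visible yet
weblogs.finish_transaction()
after = run_report(weblogs)
```

The report function never changes when data is added: it reads the super file by name, and new sub-files simply appear in its input after the transaction completes.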

48

Data Services Reusability

    EXPORT run_it(pServiceName, pParams, pReportFunction, pSortByField) := MACRO
      // Filtering data based on parameters
      #UNIQUENAME(DS)
      %DS% := WS.Datasets.dsFactEpisodeCostsIndex […]

      #UNIQUENAME(B)
      %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
      #UNIQUENAME(C)
      %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […]

      #UNIQUENAME(report)
      %report% := pReportFunction(%K%);
      #UNIQUENAME(sorted)
      %sorted% := SORT(%report%, pSortByField);
      #UNIQUENAME(O1)
      %O1% := OUTPUT(pParameters, NAMED('Request'));
      oSummary := DATASET([ { COUNT(%sorted%) } ], WS.Layouts.service_summary_layout);
      #UNIQUENAME(O2)
      %O2% := OUTPUT(oSummary, NAMED('Metadata'));
      #UNIQUENAME(O3)
      %O3% := OUTPUT(
        CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
        NAMED(pServiceName), ALL);

      PARALLEL(%O1%, %O2%, %O3%);
    ENDMACRO;
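The macro's flow (optional filters, a pluggable report function, a sort, then a paged slice returned alongside the request and a row count) can be sketched in Python. This is an approximation under stated assumptions: the filter fields come from the slide, but the `episodes` argument, the `total` metadata key, the `episode_costs` report and the sample rows are hypothetical, and `choosen` only imitates ECL's CHOOSEN with its 1-based start position.

```python
def choosen(rows, n, startpos=1):
    """Imitate ECL CHOOSEN: n rows starting at 1-based position startpos."""
    return rows[startpos - 1:startpos - 1 + n]


def run_it(service_name, params, report_function, sort_by_field, episodes):
    rows = episodes
    # An empty filter list means "no filter", as in the macro's IF(COUNT(...) = 0, ...).
    if params.get("providers"):
        rows = [r for r in rows if r["provider_id"] in params["providers"]]
    if params.get("npis"):
        rows = [r for r in rows
                if r["at_npi"] in params["npis"] or r["op_npi"] in params["npis"]]
    report = sorted(report_function(rows), key=lambda r: r[sort_by_field])
    return {
        "Request": params,                     # echo of the request
        "Metadata": {"total": len(report)},    # row count before paging
        service_name: choosen(report, params["results"], params["offset"]),
    }


def episode_costs(rows):
    """Hypothetical report: one output row per episode with its cost."""
    return [{"provider_id": r["provider_id"], "cost": r["cost"]} for r in rows]


episodes = [
    {"provider_id": 1, "at_npi": 11, "op_npi": 21, "cost": 300},
    {"provider_id": 2, "at_npi": 12, "op_npi": 22, "cost": 100},
    {"provider_id": 1, "at_npi": 13, "op_npi": 23, "cost": 200},
    {"provider_id": 3, "at_npi": 11, "op_npi": 24, "cost": 400},
]
params = {"providers": [1, 3], "npis": [], "offset": 2, "results": 2}

out = run_it("summary", params, episode_costs, "cost", episodes)
```

Reporting the total before paging lets a client compute how many pages exist while fetching only one slice at a time.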

49

AHA System Architecture

50

Archway Analytics

51

lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com

HPCC Systems open source portal: http://hpccsystems.com

Thank you! Questions? Feedback?
