Introducing CCC Data...5 Project History 2016/17 Data Lake POC to support CCC Assess & CCC Apply 2017/18 Data Warehouse established with CCC Apply, MyPath & Canvas 2018/19 DSP grant

Introduction

Mark Cohen
Product Manager, CCC Data Services & Transcripts
CCC Technology Center

[email protected]

Agenda

• Project History
• Overview
• Data Sources
• Flow of Data
• Security
• Stakeholder Engagement
• Next Steps
• Discussion

Introducing CCC Data

The Data Warehouse is part of the CCC Data project, supported by the Data Services Program (DSP) initiative from the California Community Colleges Chancellor's Office.

● Project overseen by the CCCCO Data Governance Council
  ○ Manages MOUs and data sharing agreements that enable data to be stored in the Data Lake and accessed through the Data Warehouse

Project History

2016/17

Data Lake POC to support CCC Assess & CCC Apply

2017/18

Data Warehouse established with CCC Apply, MyPath & Canvas

2018/19

DSP grant funding CCC Data. Launched DW pilot. Launched LGBTQ report

2019/20

COCI & C-ID data added to DL/DW, MIS & Cal-PASS+ to DL. CCC Data to production

2020/21

• FY 2016/17: The CCC Tech Center was asked, through the Tech Center Grant, to construct a Data Lake to support Canvas data as part of the Online Education Initiative (OEI), assessment data as part of the Common Assessment Initiative (CAI), and CCCApply data. An initial Proof of Concept (POC) project was set up to create the Data Lake and store data in Amazon's S3 object store. The POC was completed in May 2017.

• FY 2017/18: After the successful completion of the POC, the OEI Workplan and the Common Assessment Grant called for the Tech Center to build a Data Warehouse to house CCCApply, MyPath, Canvas, and Multiple Measures data. That work picked up where the Data Lake POC ended: it began in June 2017 and became the current Data Warehouse project. The intent was to build out a production-ready data warehouse and begin piloting it with colleges in the spring, prior to going into production.

• FY 2018/19: The CCC Technology Center began working on an architecture for system-wide data, information, and technology infrastructure, which includes Business Intelligence. We brought in CCCApply, Canvas, and MyPath and de-escalated the CCCAssess data as per the instructions from the Chancellor's Office. In late May 2018 we narrowed our focus to providing all of the colleges with the LGBTQ report, based on the legislatively driven need for the LGBTQ reports and on the need for flexibility, so we can pivot to follow the greater strategy of the Chancellor's Office for the next few years. We piloted the data warehouse to build on the POC. An infrastructure was built out to move raw data from the sources to the S3 data lake and from there into readable tables and objects. This data was then presented through the CCC Report Center interface.

• FY 2019/20: Completed a successful pilot of the Data Warehouse, including data from CCC Apply, MyPath, and the colleges' Canvas data, with Foothills, Butte, Shasta, Lake Tahoe, Yuba, and Mt. SAC. We added COCI and C-ID data to the Data Lake and are now working to add MIS and Cal-PASS+ data to the Data Lake. We are launching CCC Data to production, including the Data Lake, Data Warehouse, and Data Pipelines, and launching the DW Report Center to all CCCs. We are forming the Data Warehouse Advisory Group and continuing to work closely with the CCCCO and data governance.

CCC Data Overview

[Diagram: Source Databases → CCC Data Lake → CCC Data Warehouse (115+1 schemas) → Research & Analytics Tools and CCC Report Center]

● The CCC Data Warehouse project provides a set of data products to serve the California Community Colleges and the California Community Colleges Chancellor's Office. These include:

● A Data Lake in which any of the source data can be persisted as it comes from the source, for data mining and auditing purposes. All data from the source databases is stored with changes of data over time.

● A Data Warehouse that acts as a structured source of master data that can be used to generate the data marts, reports, and analytics that end users need. It holds unencrypted data available for researchers.

  ○ 115+1 schemas ensure security by creating distinct schemas against the data warehouse, so that each college accesses only its own data.

● A Report Center, plus ODBC/JDBC connections directly to Redshift, that provide colleges and the Chancellor's Office with access to these data for reporting and analytics.

● The ETLs (data pipelines) that support the movement of data between the data sources and CCC Data tools.

● Over time, this may expand to include additional elements, which we will discuss later in the presentation.
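
The per-college schema layout described above can be sketched as generated SQL. This is only an illustration under stated assumptions, not the CCC Data team's provisioning code: the schema naming scheme, the group names, the college codes, and the idea that the "+1" schema holds shared system data are all hypothetical. Only the general shape (one schema per college plus one more) comes from the slides.

```python
import re


def schema_name(college_code: str) -> str:
    """Build a schema name from a (hypothetical) lowercase college code."""
    # Reject anything that is not a plain identifier, since the code is
    # interpolated into SQL text below.
    if not re.fullmatch(r"[a-z0-9_]+", college_code):
        raise ValueError(f"unsafe college code: {college_code!r}")
    return f"college_{college_code}"


def isolation_statements(college_codes, shared_schema="system_data"):
    """Return SQL giving each college's group access to its own schema only.

    shared_schema is an assumed name for the "+1" schema.
    """
    statements = [f"CREATE SCHEMA IF NOT EXISTS {shared_schema};"]
    for code in college_codes:
        schema = schema_name(code)
        group = f"{schema}_readers"  # hypothetical group naming
        statements += [
            f"CREATE SCHEMA IF NOT EXISTS {schema};",
            f"CREATE GROUP {group};",
            f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};",
        ]
    return statements


if __name__ == "__main__":
    for statement in isolation_statements(["butte", "yuba"]):
        print(statement)
```

Because no group is ever granted usage on another college's schema, a user in `college_butte_readers` cannot query tables in `college_yuba`, which is the isolation property the slide describes.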

Current Data Sources

Source Data                             Data Lake      Data Warehouse
CCC Apply: Application                  ✔              ✔
CCC Apply: International Application    ✔              ✔
CCC Apply: Fee Waiver                   ✔              ✔
Multiple Measures Placement (MMPP)      ✔              ✔
MyPath                                  ✔              ✔
COCI                                    ✔              ✔
C-ID                                    ✔              ✔
MIS                                     ✔              Pending Approval
Cal-PASS+                               Pending MOU    Pending Approval
Canvas (integration per college)        ✔              ✔

● Each source is connected to the DL through a series of ETLs

Data Flow

[Diagram: System data (CCC Apply app/intl/fee waiver, MMPP, MyPath, MIS, COCI, C-ID) and college data (Canvas, one per college) reach the CCC Data Lake (Amazon Simple Storage Service, S3) through AWS Kinesis Firehose, CCC SuperGlue, and AWS Data Pipeline. AWS Data Pipelines (CCC Apply, MMPP, MyPath, Canvas) load the CCC Data Warehouse (Amazon Redshift), which serves the CCC Report Center (Tibco Jasper) and, through an ODBC/JDBC connection, research and analytics tools.]

● CCC Data is built on an AWS-centric, cloud-based solution that stores and structures data sets from the CCC data sources.

● Source data consists of system data and college-specific data.

● Sources are connected through a series of ETLs that are developed based on requirements.

● The data pipelines run nightly, or more often based on requirements.

● All data is captured in the Data Lake.

● Value is added by capturing incremental updates to the data and identifying which data sources have deltas/diff info generated, so researchers can find data that, in some cases, exists only in the DW because it is overwritten in the source DB.

  ○ DIFF tables are captured for CCCApply (standard, fee waiver, and international), C-ID, and COCI.

  ○ Diff tables are not stored for MMPP, MyPath, and Canvas at this time, due to use case or the nature of how the data is shared.

● The data can be quickly discovered, retrieved, and used for analytics, reporting, or data mining.

● The Data Warehouse enables colleges to perform analysis across their data and system data.

  ○ AWS Redshift supports up to 1,024 unique schemas against the data repository, effectively creating 115+1 distinct data warehouses and assuring that each college accesses only its own data.

● In addition to the Report Center as the default method to get to the data, researchers can connect to Redshift via ODBC/JDBC to use their own tools (Power BI, Cognos, etc.).
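
The diff tables mentioned above record what changed between successive pulls of a source. The sketch below shows the general idea only, assuming each snapshot is a mapping from a primary key to a row. The function name, the row layout, and the sample keys are illustrative and are not taken from the CCC Data implementation.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots keyed by primary key and list the changes."""
    changes = []
    # Keys present only in the newer snapshot are inserts.
    for key in sorted(current.keys() - previous.keys()):
        changes.append({"key": key, "op": "insert", "old": None, "new": current[key]})
    # Keys present only in the older snapshot are deletes.
    for key in sorted(previous.keys() - current.keys()):
        changes.append({"key": key, "op": "delete", "old": previous[key], "new": None})
    # Keys present in both are updates when the row content differs.
    for key in sorted(previous.keys() & current.keys()):
        if previous[key] != current[key]:
            changes.append(
                {"key": key, "op": "update", "old": previous[key], "new": current[key]}
            )
    return changes


if __name__ == "__main__":
    # Made-up sample data for two nightly pulls.
    monday = {101: {"status": "submitted"}, 102: {"status": "in progress"}}
    tuesday = {102: {"status": "submitted"}, 103: {"status": "in progress"}}
    for change in diff_snapshots(monday, tuesday):
        print(change["op"], change["key"], change["old"], change["new"])
```

Keeping the `old` value alongside the `new` one is what would let a researcher recover a value that has since been overwritten in the source database.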

Data Flow: AWS Data Pipeline

[Diagram: External Data Sources → AWS Data Pipeline → CCC Data Lake → AWS Data Pipeline → CCC Data Warehouse]

● AWS Data Pipelines are used for pulling data from the source database.

● We control when a pipeline runs and with what frequency; the source DB doesn't know when its data is shared and coming into the DL.

● AWS data pipelines are developed and maintained by the CCC Data Team.

● AWS Data Pipeline talks to the external data sources and pulls the data into the DL; from there, a data pipeline brings the data from the DL to the DW.
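
The two-stage pull described above (source to Data Lake, then Data Lake to Data Warehouse) can be sketched in memory. This is a simplified illustration, not AWS Data Pipeline code: plain dictionaries stand in for S3 and Redshift, and the function names, key format, and table layout are assumptions made for the example.

```python
import json
from datetime import date


def pull_to_lake(fetch_rows, lake, source, run_date):
    """Stage 1: pull rows from a source and persist them unchanged in the lake."""
    key = f"{source}/{run_date.isoformat()}.jsonl"  # hypothetical key layout
    lake[key] = "\n".join(json.dumps(row, sort_keys=True) for row in fetch_rows())
    return key


def load_to_warehouse(lake, key, warehouse, table, columns):
    """Stage 2: read a raw lake object and load selected columns into a table."""
    rows = [json.loads(line) for line in lake[key].splitlines()]
    warehouse.setdefault(table, []).extend(
        tuple(row.get(column) for column in columns) for row in rows
    )
    return len(rows)


if __name__ == "__main__":
    lake, warehouse = {}, {}

    def fetch_applications():
        # Stand-in for a query against a source database.
        return [{"app_id": 1, "term": "Fall", "notes": "raw field kept in lake"}]

    key = pull_to_lake(fetch_applications, lake, "ccc_apply", date(2019, 9, 1))
    load_to_warehouse(lake, key, warehouse, "applications", ["app_id", "term"])
    print(key, warehouse)
```

The caller decides when `pull_to_lake` runs, which mirrors the point above that the pipeline, not the source database, controls timing. The lake keeps every field as received, while the warehouse table holds only the structured columns.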

Data Flow: CCC SuperGlue

[Diagram: External Data Sources → External API Gateway → CCC Data Lake → AWS Data Pipeline → CCC Data Warehouse]

● SuperGlue dumps data from the source database into the Data Lake.

● The SuperGlue team is responsible for moving data from the data source to the API gateway...

● SuperGlue controls timing: it knows which database to read, how often, and how to fetch the data and bring it to the DL.

Data Flow: AWS Kinesis

[Diagram: External Data Sources → AWS Kinesis Firehose → CCC Data Lake → AWS Data Pipeline → CCC Data Warehouse]

● Streaming data arrives at the Data Lake via AWS Kinesis.

● MyPath uses Kinesis, which was developed as part of the MyPath architecture.

● The source team gets log data and publishes it to Kinesis, and Kinesis moves the data to the DL.
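
Kinesis Data Firehose delivers streaming records to S3 in batches, flushing its buffer when a size or time threshold is reached. The sketch below imitates that behaviour in memory to show the publish-then-deliver flow. It does not call AWS, and the class name, thresholds, and key format are made up for illustration.

```python
class BufferedDelivery:
    """In-memory imitation of buffered stream delivery into a data lake."""

    def __init__(self, lake, prefix, max_records=3, max_age_seconds=60.0):
        self.lake = lake  # a dict standing in for the S3 bucket
        self.prefix = prefix
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self._buffer = []
        self._first_arrival = None
        self._batch_number = 0

    def publish(self, record, now):
        """Accept one record; flush if the buffer is full or too old.

        Simplification: age is only checked when a record arrives,
        whereas a real delivery service would flush on a timer.
        """
        if self._first_arrival is None:
            self._first_arrival = now
        self._buffer.append(record)
        too_old = now - self._first_arrival >= self.max_age_seconds
        if len(self._buffer) >= self.max_records or too_old:
            self.flush()

    def flush(self):
        """Write buffered records to the lake as one object and reset."""
        if not self._buffer:
            return
        key = f"{self.prefix}/batch-{self._batch_number:05d}"
        self.lake[key] = list(self._buffer)
        self._buffer.clear()
        self._first_arrival = None
        self._batch_number += 1


if __name__ == "__main__":
    lake = {}
    delivery = BufferedDelivery(lake, "mypath", max_records=2)
    for second, event in enumerate(["login", "view_card", "logout"]):
        delivery.publish(event, now=float(second))
    delivery.flush()  # deliver whatever is still buffered
    print(lake)
```

The publisher only calls `publish`; it never writes to the lake directly, which matches the division of labour above where the source team publishes and Kinesis moves the data.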

CCC Data Built on AWS Platform

• AWS selected through an open RFP process
• Flexible architecture is highly scalable
• Relatively low cost and transparent pricing
• Addresses speed, performance, and storage requirements
• Includes robust security and supporting tools
• Leverages existing AWS infrastructure
• 24/7 monitoring and incident response

AWS Platform Security

Amazon Web Services provides one of the most secure cloud environments available for sensitive data and conforms to the following standards:

Securing CCC Data

Stakeholder Engagement

Data Warehouse Advisory Group

● Advisory group made up of
  ○ CCC institutional research, planning and effectiveness professionals (identified by the RP Group)
  ○ Chancellor's Office stakeholders

● Meet monthly to inform and help prioritize requirements:
  ○ Identify system and college data sources
  ○ Requirements for Report Center reports
  ○ Colleges' connection to their data

● This Advisory Group is composed of institutional research, planning and effectiveness (IRPE) professionals from California Community Colleges, along with representation from the Chancellor's Office and the CCC Technology Center.

● The CCC Data project is guided by their input and requirements.

● The goal for this advisory group is to
  ○ provide guidance to ensure that this project is developed in a manner consistent with the needs of the CCC IRPE community.
  ○ inform the business requirements for how colleges connect to the Data Warehouse, including the data accessed that support reporting, analysis, and research, and to

Next Steps

• Engage Data Warehouse Advisory Group
• Continue coordination with CCCCO efforts
• Connect more data sources
• Support colleges accessing Data Warehouse & Report Center
• Ongoing development of CCC Data infrastructure
• Develop reporting in Data Warehouse Report Center
• Participate in Data Services Program activities

• Engage Data Warehouse Advisory Group to identify requirements, data sources, and reporting for the Data Warehouse and DW Report Center

• Work with CO on continued direction of CCC Data, MOUs governing data sources, data policies, and data-related use cases

• Continue developing data pipelines to bring more data into the DL, DW, and Report Center

• Work with colleges connecting to the DW through the DW Report Center and through ODBC/JDBC to Redshift

• Develop reports in the DW Report Center, working with the advisory group and other stakeholders

• Work with CO on systemwide data projects

Ideas for the future ...

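[Diagram: College data (Canvas and possible future sources, shown as "?") and system data (CCC Apply, Cal-PASS+, MMPP, COCI, MIS, MyPath, C-ID, and a possible future source, shown as "?") feed the CCC Data Lake and CCC Data Warehouse, with Data Marts, the CCC Report Center, a BI Tool, and an ODBC/JDBC Connection downstream.]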

● Based on direction from the CCCCO and input from the advisory group, future development of CCC Data may include:

○ Bring in more sources of college data, so that the DL and DW are made up of both data originating from the colleges as well as system data

○ Develop a set of Data Marts from the Data Warehouse and Data Lake, to provide zone- and domain-scoped data sets with dashboards, reports, and analytics targeted to those users.

○ Evaluate business intelligence tools that may expand on the functionality of the CCC Data platform

○ May explore multi-dimensional/Cube (OLAP) databases as needed

Related DSP Activities

In coordination with the Chancellor's Office Digital Innovation and Infrastructure Division:

• Participate in CCC Data Governance Council
• Support selection of systemwide Data Dictionary Application
• Develop strategy for Master Data Management

Data Warehouse Access

● CCC Data available to CCC Institutional Research, Planning and Effectiveness

● Data Warehouse Report Center
  ○ Upgraded Report Center will be available to researchers with access to the LGBTQ report
  ○ Additional staff may request access at [email protected]

● ODBC/JDBC Connection
  ○ Colleges may request access to the CCC Data Warehouse through a request to [email protected]

Discussion

Introducing the Data Warehouse

Mark Cohen

[email protected]