CS-22-Data Analytics: Using an Interdisciplinary Approach to Teach STEM and non-STEM Students Grambling State University Connie Walton, Corisma Akins, Yenumula Reddy December 2019 Annual SACSCOC Meeting Date/Time: 12/8/2019: Sunday: 1:30PM - 2:30PM Location: 351 F, Level 3, GRB

Page 1: Data Analytics: Using an Interdisciplinary Approach to

CS-22-Data Analytics: Using an Interdisciplinary Approach to Teach STEM and non-STEM Students

Grambling State University

Connie Walton, Corisma Akins, Yenumula Reddy

December 2019 Annual SACSCOC Meeting

Date/Time: 12/8/2019: Sunday: 1:30PM - 2:30PM

Location: 351 F, Level 3, GRB

Page 2: Data Analytics: Using an Interdisciplinary Approach to

Grambling State University

Founded in 1901

Located in north Louisiana

Enrollment ~5200 students

Offers degrees at the bachelor's, master's, and doctoral levels

Center of Academic Excellence in Mathematical Achievement for Science & Technology

Page 3: Data Analytics: Using an Interdisciplinary Approach to

Academic Divisions

• College of Education and Graduate Studies

• College of Business

• College of Professional Studies

• College of Arts & Sciences

Accreditations/Certifications

AACSB, ABET-CS, ABET-TAC, ACEN, ACS-Committee on Professional Training, CAEP, COAPRT, CWSE, NASM, NAST, NASPAA

Page 4: Data Analytics: Using an Interdisciplinary Approach to

NSF HBCU-UP FUNDED PROJECT

Expand Data Science/Data Analytics Training of undergraduate STEM and non-STEM Students

Page 5: Data Analytics: Using an Interdisciplinary Approach to

Data Analytics

(Source: https://searchdatamanagement.techtarget.com/definition/data-analytics)

“Data analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”

Used to make business decisions and by researchers to prove or disprove theories.

“Data analytics applications involve more than just analyzing data. Particularly on advanced analytics projects, much of the required work takes place upfront, in collecting, integrating and preparing data and then developing, testing and revising analytical models to ensure that they produce accurate results. In addition to data scientists and other data analysts, analytics teams often include data engineers, whose job is to help get data sets ready for analysis.”

Data from different source systems may need to be combined via data integration routines, transformed into a common format and loaded into an analytics system, such as a Hadoop cluster, NoSQL database or data warehouse.
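
As a rough illustration of the integration step described above, the sketch below combines records from two hypothetical source systems (a CSV export and a JSON feed), transforms them into a common format, and loads them into an in-memory SQLite database that stands in for the analytics system. The field names, sample values, and table name are assumptions made for this example and do not come from the presentation.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export with its own column names.
CSV_SOURCE = "student_id,full_name,gpa\n101,Ada Jones,3.6\n102,Ben Lee,3.1\n"

# Hypothetical source 2: a JSON feed with different field names and string values.
JSON_SOURCE = '[{"id": "103", "name": "Cy Diaz", "grade_point": "3.9"}]'


def from_csv(text):
    """Transform CSV rows into the common (id, name, gpa) format."""
    for row in csv.DictReader(io.StringIO(text)):
        yield int(row["student_id"]), row["full_name"], float(row["gpa"])


def from_json(text):
    """Transform JSON records into the same (id, name, gpa) format."""
    for rec in json.loads(text):
        yield int(rec["id"]), rec["name"], float(rec["grade_point"])


def load(records):
    """Load the unified records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
    conn.executemany("INSERT INTO students VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


conn = load(list(from_csv(CSV_SOURCE)) + list(from_json(JSON_SOURCE)))
row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
average_gpa = conn.execute("SELECT AVG(gpa) FROM students").fetchone()[0]
print(row_count, round(average_gpa, 2))
```

A Hadoop cluster, NoSQL database, or data warehouse would replace SQLite in practice; the combine, transform, and load steps keep the same shape.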

Page 6: Data Analytics: Using an Interdisciplinary Approach to

https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insights/The%20age%20of%20analytics%20Competing%20in%20a%20data%20driven%20world/MGI-The-Age-of-Analytics-Full-report.ashx

Page 7: Data Analytics: Using an Interdisciplinary Approach to

National need

2011 McKinsey & Company Report

https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_exec_summary.ashx

The United States faces a shortage of 140,000 to 190,000 workers with deep analytic skills.

An additional 1.5 million managers and analysts are also needed who understand big data well enough to ask the right questions and use the results effectively to solve problems.

“ just three exabytes of data existed in 1986—but by 2011, that figure was up to more than 300 exabytes. The trend has not only continued but has accelerated since then. One analysis estimates that the United States alone has more than two zettabytes (2,000 exabytes) of data, and that volume is projected to double every three years.”
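
The figures in the quotation above can be checked with a little arithmetic. The sketch below assumes steady exponential growth, which is a simplification: it computes the doubling time implied by growth from 3 to 300 exabytes between 1986 and 2011, and projects a 2 zettabyte volume forward under the quoted three-year doubling period. The function names are illustrative.

```python
import math


def doubling_time(start, end, years):
    """Years needed to double, given steady growth from start to end over `years`."""
    return years * math.log(2) / math.log(end / start)


def projected_volume(initial, years_elapsed, doubling_years):
    """Volume after `years_elapsed` if it doubles every `doubling_years`."""
    return initial * 2 ** (years_elapsed / doubling_years)


# 3 exabytes in 1986 growing to 300 exabytes in 2011.
historic_doubling = doubling_time(3, 300, 2011 - 1986)
print(round(historic_doubling, 2))  # about 3.76 years

# 2 zettabytes doubling every three years, six years after the estimate.
six_year_projection = projected_volume(2.0, 6, 3)
print(six_year_projection)  # 8.0 zettabytes
```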

Page 8: Data Analytics: Using an Interdisciplinary Approach to

Strategies to Expand Data Analytics Skills of GSU Undergraduate Students

Certificate Program in Data Analytics

Infuse Topics into Existing Courses

Undergraduate Research Projects

Professional Development for Faculty

Page 9: Data Analytics: Using an Interdisciplinary Approach to

Certificate Courses

INTRO TO DATA ANALYTICS

DATA ANALYTICS STATISTICS

Page 10: Data Analytics: Using an Interdisciplinary Approach to

Intro to Big Data Course

3 credit hour Computer Science course

100 level course

Topics include characteristics of big data, sources of big data, big data platforms, text analysis/streams, and an introduction to the R language

Page 11: Data Analytics: Using an Interdisciplinary Approach to

Data Analytics Course

Sophomore level course

Learning outcomes include demonstrating a fundamental understanding of the Hadoop Distributed File System (HDFS), understanding how to test and debug MapReduce applications, and using RHadoop to analyze big data.

Mini-projects infused
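
To make the MapReduce learning outcome concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop or RHadoop code; it only imitates the map, shuffle, and reduce phases in one process, and the function names and sample lines are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big ideas", "Data beats opinion"])
print(counts)
```

Writing the mapper and reducer as small pure functions is one way to test and debug them on a laptop before running the same logic on a cluster.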

Page 12: Data Analytics: Using an Interdisciplinary Approach to

Intro to Big Data Course-CS 112

In the first semester it was offered, students felt it was a programming course

The course introduced Apache Hadoop and the R language.

Page 13: Data Analytics: Using an Interdisciplinary Approach to

Big Data Science Camp

Middle & high school students

• One Week Summer Camp

• Daily Themes (Healthcare, Sports, Social Media, Natural Disaster, Music, etc.)

• Mini Projects

• Guest Speakers from Different Professions

• Daily Presentations

Page 14: Data Analytics: Using an Interdisciplinary Approach to

Sample Social Media Project

Page 15: Data Analytics: Using an Interdisciplinary Approach to

Sample Social Media Project

LeBron James was traded during the camp

• Observed data changing in real time

• Experienced how social media data could be used to analyze various aspects of the sports industry

Figure labels: Stephen Curry, LeBron James

Page 16: Data Analytics: Using an Interdisciplinary Approach to

Sample Project

Students completed projects using ArcMap and ArcGIS

Page 17: Data Analytics: Using an Interdisciplinary Approach to

Lessons Learned from Camp

Use activities that are of interest to students

• Health Care

• Music

• Sports

• Natural disasters

• Politics

Use activities that show diverse uses of data

Page 18: Data Analytics: Using an Interdisciplinary Approach to

Revamped Intro to Big Data Course

Team Taught

Less coding

Solicited mini projects from campus community and alumni

Page 19: Data Analytics: Using an Interdisciplinary Approach to

Students Introduced to Data Analysis through Varied Projects

• Find common and distinctive words in song lyrics and books

• Discover trends in university class registration data

• Compare nutritional information from different cereal brands

• Correlate bike accidents by weather, conditions, and driver sex

• Track the shift in literary genres by distribution of texts

• Time how long politicians take to delete typos on Twitter

• Measure the emotions expressed by social media users

• Determine a flower’s species using machine learning
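
As a sketch of the first project listed above (common and distinctive words in song lyrics and books), the following example splits two texts into words and uses set operations to separate shared vocabulary from words unique to each. The two sample strings are invented for illustration, and the presentation does not say which tools the students used.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def common_and_distinctive(text_a, text_b):
    """Return (words in both, words only in A, words only in B)."""
    a, b = word_counts(text_a), word_counts(text_b)
    return set(a) & set(b), set(a) - set(b), set(b) - set(a)


lyrics = "the river sings and the night is young"
book = "the river was wide and the road was long"

common, only_lyrics, only_book = common_and_distinctive(lyrics, book)
print(sorted(common))
print(sorted(only_lyrics))
print(sorted(only_book))
```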

Page 20: Data Analytics: Using an Interdisciplinary Approach to

Faculty shared the data processes in their research

Page 21: Data Analytics: Using an Interdisciplinary Approach to

Faculty created example reports to show some data workflows

Page 22: Data Analytics: Using an Interdisciplinary Approach to

Students enrolled in Intro to Big Data presented at the 12th Annual Undergraduate Research Conference, hosted at University of Louisiana at Lafayette (November 2019):

• Cancer and Cyberbullying: Monitoring and Analyzing Data from Social Media

• Predictive Modelling of Gender Classification with Caret

Page 23: Data Analytics: Using an Interdisciplinary Approach to

Certificate Program

Need a certificate program that can address needs of both STEM & non-STEM majors

Need a core set of required foundational courses that will be taken by both STEM and non-STEM majors

Have one set of required courses for STEM majors and another set for non-STEM majors (courses at 300 & 400 levels)

Require completion of 18 credit hours, half at 300 & 400 levels

Page 24: Data Analytics: Using an Interdisciplinary Approach to

Probability and Statistics I Course

Data Analytics

Basic Probability and Statistical Distributions

Data Manipulation

Data Visualization and Statistical Graphics

Statistical Inference

Techniques for Supervised Learning

Techniques for Unsupervised Learning

Page 25: Data Analytics: Using an Interdisciplinary Approach to

Statistics I Course

The focus is on preparing students to use data to obtain information.

Extensive examples using actual data are provided, illustrating diverse information sources in socioeconomics, marketing, advertising and finance, among many others.

In many cases, Python code is used to analyze the data.

Page 26: Data Analytics: Using an Interdisciplinary Approach to

Getting Insights from Data (1)

Descriptive Statistics

• Scale Types

• Descriptive Univariate Analysis

• Descriptive Bivariate Analysis

Descriptive Multivariate Analysis

• Multivariate Frequencies

• Multivariate Data Visualization

• Multivariate Statistics
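
A minimal sketch of the descriptive univariate and bivariate analysis named above, using only the Python standard library. The study-hours and exam-score values are hypothetical, and Pearson's correlation is written out by hand so each term is visible.

```python
import math
import statistics

hours = [2, 4, 6, 8, 10]       # hypothetical study hours
scores = [65, 70, 80, 85, 95]  # hypothetical exam scores

# Descriptive univariate analysis of one variable.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
stdev_score = statistics.stdev(scores)  # sample standard deviation


def pearson_r(xs, ys):
    """Descriptive bivariate analysis: Pearson correlation of two variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson_r(hours, scores)
print(mean_score, median_score, round(stdev_score, 2), round(r, 3))
```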

Page 27: Data Analytics: Using an Interdisciplinary Approach to

Getting Insights from Data (2)

Data Quality and Preprocessing

• Data Quality

• Converting to a Different Scale Type

• Data Transformation

• Dimensionality Reduction

Clustering

• Distance Measure

• Clustering Validation

• Clustering Techniques
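
The preprocessing and clustering topics above fit together in a short sketch: min-max scaling as the data transformation, Euclidean distance as the distance measure, and k-means as the clustering technique. This is a plain-Python illustration with hypothetical data and fixed starting centroids, not the course's own material.

```python
import math


def min_max_scale(rows):
    """Rescale each column to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def euclidean(p, q):
    """Distance measure between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return [nearest(p, centroids) for p in points], centroids


# Hypothetical two-feature data on very different scales.
raw = [(1, 100), (2, 120), (1.5, 110), (8, 900), (9, 950), (8.5, 920)]
scaled = min_max_scale(raw)
labels, centroids = k_means(scaled, centroids=[scaled[0], scaled[3]])
print(labels)
```

Without the scaling step, the second feature would dominate every distance because its raw values are about a hundred times larger.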

Page 28: Data Analytics: Using an Interdisciplinary Approach to

A Project on Data Analytics- Statistics I Course

• Understanding the problem to be solved

• Defining the objectives of the projects

• Looking for the necessary data

• Preparing these data so that they can be used

• Identifying suitable methods and choosing between them

• Analyzing and evaluating the results

• Redoing the pre-processing tasks and repeating the experiments

Page 29: Data Analytics: Using an Interdisciplinary Approach to

Data Analysis Application Examples

Data Munging

Cleaning Data

Filtering

Merging Data

Reshaping Data

Data Aggregation

Grouping Data
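
The data munging steps listed above can be walked through on a tiny example. The sketch below uses plain Python structures and hypothetical enrollment records; in a course setting a library such as pandas would typically do the same work, but that is an assumption rather than something the slides state.

```python
from collections import defaultdict

# Hypothetical raw records with inconsistent text and a missing value.
enrollments = [
    {"student": "a1", "course": "CS 112", "credits": "3"},
    {"student": "a2", "course": "cs 112 ", "credits": "3"},
    {"student": "a3", "course": "BIOL 409", "credits": ""},
    {"student": "a1", "course": "BIOL 409", "credits": "3"},
]
majors = {"a1": "Biology", "a2": "Computer Science", "a3": "Biology"}

# Cleaning: normalize text, convert types, mark missing credits as None.
cleaned = [
    {
        "student": r["student"],
        "course": r["course"].strip().upper(),
        "credits": int(r["credits"]) if r["credits"] else None,
    }
    for r in enrollments
]

# Filtering: keep only records with a known credit value.
complete = [r for r in cleaned if r["credits"] is not None]

# Merging: attach each student's major from the second source.
merged = [{**r, "major": majors[r["student"]]} for r in complete]

# Reshaping: turn the long list into one row of courses per student.
courses_by_student = defaultdict(list)
for r in merged:
    courses_by_student[r["student"]].append(r["course"])
courses_by_student = dict(courses_by_student)

# Grouping and aggregation: total credits per major.
credits_by_major = defaultdict(int)
for r in merged:
    credits_by_major[r["major"]] += r["credits"]
credits_by_major = dict(credits_by_major)

print(courses_by_student)
print(credits_by_major)
```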

Page 30: Data Analytics: Using an Interdisciplinary Approach to

Infusion of Data Analytics into Existing Courses

Page 31: Data Analytics: Using an Interdisciplinary Approach to

Infusion of Big Data in Existing Courses

BIOL 409: Biological Research

CHEM 226: Organic Chemistry Lab

CS 435: Big Data and Cloud Computing

Select Business Courses

Page 32: Data Analytics: Using an Interdisciplinary Approach to

Big Data in BIOL 409: Biological Research

Enrollment

• Fall 2018: 6 students

• Spring 2019: 10 students

• Offered only as a Spring course starting 2020

Description

• Training in use of big data analytics in biological research applications, culminating in group project

• In class lectures

• Online bioinformatics modules via Pine Biotech (New Orleans, LA)

• Bioinformatics analyses via T-BioInfo platform

Objectives

• Understand research methodologies and experimental design

• Apply descriptive and inferential statistical methods to datasets

• Analyze Next Generation Sequencing (NGS) datasets using genomic/transcriptomic approaches

Page 33: Data Analytics: Using an Interdisciplinary Approach to

BIOL 409 Data Analytics Content

Statistics

• Descriptive: mean, median, mode, range, standard deviation, frequency table, frequency histogram, bivariate scatterplot

• Inferential: Pearson's correlation coefficient, chi-square test, Student's t-test, factor regression, null and alternative hypothesis testing
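
Two of the inferential methods listed above can be sketched by hand in Python: the two-sample Student's t statistic (pooled variance, equal variances assumed) and the chi-square statistic for a contingency table. The measurements and counts are hypothetical, and turning either statistic into a p-value would still require the matching reference distribution, which is not shown here.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


def chi_square(table):
    """Chi-square statistic: sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


control = [10, 12, 11, 13, 14]   # hypothetical measurements
treated = [15, 17, 16, 18, 19]
t_stat, dof = student_t(control, treated)

chi2 = chi_square([[20, 30], [30, 20]])  # hypothetical 2x2 counts
print(round(t_stat, 3), dof, round(chi2, 3))
```

The null hypothesis in each case is "no difference" or "no association"; a statistic far from zero is evidence against it.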

Transcriptomics

• Map RNA sequencing reads to reference genome using TopHat > Cufflinks > Cuffmerge > Bowtie2-t

• Convert to gene expression levels using RsemExptable

• Find differential gene expression using DESeq2

• Visualize and compress data using Principal Component Analysis

Genomics

• Map genomic sequencing reads to reference genome using Bowtie2

• Call variants using Strelka

• Visualize Single Nucleotide Polymorphisms using UCSC Genome Browser
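
The Principal Component Analysis step mentioned under Transcriptomics can be illustrated in a few lines. The sketch below finds the first principal component of a tiny, hypothetical two-gene expression table by power iteration on the covariance matrix, then projects each sample onto it to compress two values into one. It only illustrates the idea and makes no claim about how the T-BioInfo platform computes PCA.

```python
import math


def center(rows):
    """Subtract each column's mean so the data are centered at the origin."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance matrix of already centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]


def first_principal_component(cov, iterations=100):
    """Power iteration: repeatedly apply the covariance matrix and renormalize."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


# Hypothetical expression levels: rows are samples, columns are two genes.
samples = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]]
centered = center(samples)
pc1 = first_principal_component(covariance_matrix(centered))

# Compressed one-dimensional representation of each sample.
projections = [sum(v * w for v, w in zip(row, pc1)) for row in centered]
print([round(w, 3) for w in pc1])
print([round(p, 2) for p in projections])
```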

Page 34: Data Analytics: Using an Interdisciplinary Approach to

BIOL 409 Modules to Pipelines to Data

Page 35: Data Analytics: Using an Interdisciplinary Approach to

BIOL 409 Project

• Bioinformatics project relevant to students in Environmental Science concentration

• Genes involved in drought resistance

• Discuss journal article

• Use bioinformatics tools to explore published results

• Future: carry out novel analysis

Page 36: Data Analytics: Using an Interdisciplinary Approach to

Big Data in CHEM 226: Organic Chemistry Lab

Molecular Modeling Experiment:

• NIH database of small molecules

• Dock molecules in a protein binding site

• Molecules are scored on various properties, such as intermolecular vs. intramolecular bonds

• Put together a drug molecule for the disease state

Page 37: Data Analytics: Using an Interdisciplinary Approach to

Seminars on Big Data

Health Disparities

Data Analytics Professionals

Corporate Executives

Page 38: Data Analytics: Using an Interdisciplinary Approach to

Contact Information

Dr. Connie Walton

[email protected]

Mrs. Corisma Akins

[email protected]

Dr. Yenumula Reddy

[email protected]