CS-22-Data Analytics: Using an Interdisciplinary Approach to Teach STEM and non-STEM Students
Grambling State University
Connie Walton, Corisma Akins, Yenumula Reddy
December 2019 Annual SACSCOC Meeting
Date/Time: 12/8/2019: Sunday: 1:30PM - 2:30PM
Location: 351 F, Level 3, GRB
Grambling State University
Founded in 1901
Located in north Louisiana
Enrollment ~5200 students
Offer degrees at bachelor, master, doctoral levels
Center of Academic Excellence in Mathematical Achievement for Science & Technology
Academic Divisions
• College of Education and Graduate Studies
• College of Business
• College of Professional Studies
• College of Arts & Sciences
Accreditations/Certifications
AACSB, ABET-CS, ABET-TAC, ACEN, ACS-Committee on Professional Training, CAEP, COAPRT, CWSE, NASM, NAST, NASPAA
NSF HBCU-UP FUNDED PROJECT
Expand Data Science/Data Analytics Training of undergraduate STEM and non-STEM Students
Data Analytics
(Source: https://searchdatamanagement.techtarget.com/definition/data-analytics)
“Data analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.”
Used to make business decisions and used by researchers to prove or disprove theories.
“Data analytics applications involve more than just analyzing data. Particularly on advanced analytics projects, much of the required work takes place upfront, in collecting, integrating and preparing data and then developing, testing and revising analytical models to ensure that they produce accurate results. In addition to data scientists and other data analysts, analytics teams often include data engineers, whose job is to help get data sets ready for analysis.”
Data from different source systems may need to be combined via data integration routines, transformed into a common format and loaded into an analytics system, such as a Hadoop cluster, NoSQL database or data warehouse.
https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insights/The%20age%20of%20analytics%20Competing%20in%20a%20data%20driven%20world/MGI-The-Age-of-Analytics-Full-report.ashx
National need
2011 McKinsey & Company Report
https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_exec_summary.ashx
The United States faces a shortage of between 140,000 and 190,000 workers with deep analytic skills.
An additional 1.5 million managers and analysts who understand big data science enough to ask the correct questions and use the results effectively to solve problems are also needed.
“ just three exabytes of data existed in 1986—but by 2011, that figure was up to more than 300 exabytes. The trend has not only continued but has accelerated since then. One analysis estimates that the United States alone has more than two zettabytes (2,000 exabytes) of data, and that volume is projected to double every three years.”
Strategies to Expand Data Analytics Skills of GSU Undergraduate Students
Certificate Program in Data Analytics
Infuse Topics into Existing Courses
Undergraduate Research Projects
Professional Development for Faculty
Certificate Courses
INTRO TO DATA ANALYTICS
DATA ANALYTICS STATISTICS
Intro to Big Data Course
3 credit hour Computer Science course
100 level course
Topics include characteristics of big data, sources of big data, big data platforms, text analysis/streams, and an introduction to the R language
Data Analytics Course
Sophomore level course
Learning outcomes include demonstrating a fundamental understanding of the Hadoop Distributed File System, understanding how to test and debug MapReduce applications, and using RHadoop to analyze big data.
Mini-projects infused
Intro to Big Data Course-CS 112
In the first semester it was offered, students felt it was a programming course
The course introduced Apache Hadoop and the R language.
Big Data Science Camp: middle & high school students
• One Week Summer Camp
• Daily Themes (Healthcare, Sports, Social Media, Natural Disaster, Music, etc.)
• Mini Projects
• Guest Speakers from Different Professions
• Daily Presentations
Sample Social Media Project
LeBron James was traded during the camp
• Observed data changing in real time
• Experienced how social media data could be used to analyze various aspects of the sports industry
Stephen Curry
LeBron James
Sample Project
Students completed projects using ArcMap and ArcGIS
Lessons Learned from Camp
Use activities that are of interest to students
• Health Care
• Music
• Sports
• Natural disasters
• Politics
Use activities that show diverse uses of data
Revamped Intro to Big Data Course
Team Taught
Less coding
Solicited mini projects from campus community and alumni
Students Introduced to Data Analysis through Varied Projects
Find common and distinctive words in song lyrics and books
Discover trends in university class registration data
Compare nutritional information from different cereal brands
Correlate bike accidents by weather, conditions, and driver sex
Track the shift in literary genres by distribution of texts
Time how long politicians take to delete typos on
Measure the emotions expressed by social media users
Determine a flower’s species using machine learning
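As one illustration of the first project in the list above, here is a minimal Python sketch of finding common and distinctive words in two texts. The sample strings and the helper names `word_counts` and `compare_vocab` are hypothetical and are not taken from the course materials; a real project would read full lyrics and book texts from files.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def compare_vocab(text_a, text_b):
    """Return (common, only_a, only_b) sets of words for two texts."""
    a, b = set(word_counts(text_a)), set(word_counts(text_b))
    return a & b, a - b, b - a

# Hypothetical inputs standing in for a song lyric and a book excerpt.
lyric = "Love me, love me, say that you love me"
book = "It is a truth universally acknowledged that you must love"

common, only_lyric, only_book = compare_vocab(lyric, book)
print("common:", sorted(common))
print("only in lyric:", sorted(only_lyric))
print("most frequent lyric word:", word_counts(lyric).most_common(1))
```

Set intersection gives the shared vocabulary and set difference gives the words distinctive to each text; the `Counter` keeps the frequencies in case a project wants to rank words rather than just list them.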
Faculty shared the data processes in their research
Faculty created example reports to show some data workflows
Students enrolled in Intro to Big Data presented at the 12th Annual Undergraduate Research Conference hosted at University of Louisiana at Lafayette (November 2019)
• Cancer and Cyberbullying: Monitoring and analyzing Data from Social Media
• Predictive Modelling of Gender Classification with Caret
Certificate Program
Need a certificate program that can address needs of both STEM & non-STEM majors
Need to have core set of required foundational courses that will be taken by both STEM and non-STEM majors
Have one set of required courses for STEM majors and another set of required courses for non-STEM majors (courses at 300 & 400 levels)
Require completion of 18 credit hours, half at 300 & 400 levels
Probability and Statistics I Course
Data Analytics
Basic Probability and Statistical Distributions
Data Manipulation
Data Visualization and Statistical Graphics
Statistical Inference
Techniques for Supervised Learning
Techniques for Unsupervised Learning
Statistics I Course
The focus is on preparing students to use data to obtain information.
Extensive examples using actual data are provided, illustrating diverse informatics sources in socioeconomics, marketing, advertising and finance, among many others.
In many cases, computer code using Python is employed to analyze the data.
Getting Insights from Data (1)
Descriptive Statistics
• Scale Types
• Descriptive Univariate Analysis
• Descriptive Bivariate Analysis
Descriptive Multivariate Analysis
• Multivariate Frequencies
• Multivariate Data Visualization
• Multivariate Statistics
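The descriptive topics above can be sketched with Python's standard library. This is a minimal example under stated assumptions: the `hours` and `scores` values are invented sample data and the `covariance` helper is hypothetical, not drawn from the course.

```python
import statistics
from collections import Counter

# Hypothetical sample: hours studied and exam scores for six students.
hours = [2, 4, 4, 5, 7, 8]
scores = [55, 60, 65, 70, 85, 90]

# Descriptive univariate analysis of one variable.
summary = {
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "mode": statistics.mode(hours),
    "range": max(hours) - min(hours),
    "stdev": statistics.stdev(hours),  # sample standard deviation
}

# Frequency table: how often each value occurs.
frequency_table = Counter(hours)

def covariance(xs, ys):
    """Sample covariance, a simple descriptive bivariate measure."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

cov = covariance(hours, scores)
print(summary)
print(dict(frequency_table))
print("covariance:", cov)
```

A positive covariance here indicates that the two variables tend to increase together; multivariate work would usually move to a library such as pandas, which this sketch deliberately avoids.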
Getting Insights from Data (2)
Data Quality and Preprocessing
• Data Quality
• Converting to a Different Scale Type
• Data Transformation
• Dimensionality Reduction
Clustering
• Distance Measure
• Clustering Validation
• Clustering Techniques
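The three clustering topics above (a distance measure, a clustering technique, and a validation measure) can be illustrated together. The sketch below assumes Euclidean distance, basic k-means, and the within-cluster sum of squares as the validation score; these are common choices, not necessarily the ones used in the course, and the points and function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Distance measure: straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iterations=10):
    """Basic k-means from given starting centroids; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Validation: total squared distance of points to their centroid (lower is tighter)."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
wss = within_cluster_ss(points, labels, centroids)
print(labels, centroids, wss)
```

Starting centroids are passed in explicitly so the run is reproducible; production code would typically use random or k-means++ initialization and compare the validation score across several values of k.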
A Project on Data Analytics- Statistics I Course
Understanding the problem to be solved
Defining the objectives of the projects
Looking for the necessary data
Preparing these data so that they can be used
Identifying suitable methods and choosing between them
Analyzing and evaluating the results
Redoing the pre-processing tasks and repeating the experiments
Data Analysis Application Examples
Data Munging
Cleaning Data
Filtering
Merging Data
Reshaping Data
Data Aggregation
Grouping Data
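The data munging steps listed above can be sketched on a small list of records using only the Python standard library. The `sales` and `regions` data are invented for illustration; in practice a library such as pandas is often used for the same operations.

```python
from collections import defaultdict

# Hypothetical raw records with inconsistent text and a missing value.
sales = [
    {"store": " north ", "month": "Jan", "units": "10"},
    {"store": "North", "month": "Feb", "units": "12"},
    {"store": "south", "month": "Jan", "units": None},
    {"store": "South", "month": "Feb", "units": "7"},
]
# Hypothetical lookup table to merge in.
regions = {"north": "Region A", "south": "Region B"}

# Cleaning: normalize store names, convert units to int, drop missing rows.
clean = [
    {"store": r["store"].strip().lower(), "month": r["month"], "units": int(r["units"])}
    for r in sales
    if r["units"] is not None
]

# Filtering: keep only rows with at least 8 units.
large = [r for r in clean if r["units"] >= 8]

# Merging: attach the region for each store (a join on the store key).
merged = [{**r, "region": regions[r["store"]]} for r in clean]

# Grouping and aggregation: total units per store.
totals = defaultdict(int)
for r in merged:
    totals[r["store"]] += r["units"]

# Reshaping: long format (one row per store-month) to wide (one entry per store).
wide = defaultdict(dict)
for r in clean:
    wide[r["store"]][r["month"]] = r["units"]

print(dict(totals))
print({store: months for store, months in wide.items()})
```

Each step consumes the output of an earlier one, which mirrors how a munging pipeline is usually ordered: clean first, then filter, merge, and summarize.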
Infusion of Data Analytics into Existing Courses
Infusion of Big Data in Existing Courses
BIOL 409: Biological Research
CHEM 226: Organic Chemistry Lab
CS 435: Big Data and Cloud Computing
Select Business Courses
Big Data in BIOL 409: Biological Research
Enrollment
• Fall 2018: 6 students
• Spring 2019: 10 students
• Offered only as a Spring course starting 2020
Description
• Training in use of big data analytics in biological research applications, culminating in group project
• In class lectures
• Online bioinformatics modules via Pine Biotech (New Orleans, LA)
• Bioinformatics analyses via T-BioInfo platform
Objectives
• Understand research methodologies and experimental design
• Apply descriptive and inferential statistical methods to datasets
• Analyze Next Generation Sequencing (NGS) datasets using GENOMIC/TRANSCRIPTOMIC approaches
BIOL 409 Data Analytics Content
Statistics
• Descriptive: mean, median, mode, range, standard deviation, frequency table, frequency histogram, bivariate scatterplot
• Inferential: Pearson's correlation coefficient, chi-square test, Student's t-test, factor regression, null and alternative hypothesis testing
Transcriptomics
• Map RNA sequencing reads to reference genome using TopHat > Cufflinks > Cuffmerge > Bowtie2-t
• Convert to gene expression levels using RsemExptable
• Find differential gene expression using DESeq2
• Visualize and compress data using Principal Component Analysis
Genomics
• Map genomic sequencing reads to reference genome using Bowtie2
• Call variants using Strelka
• Visualize Single Nucleotide Polymorphisms using UCSC Genome Browser
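Two of the inferential measures named under Statistics above, Pearson's correlation coefficient and the chi-square statistic, can be computed directly from their definitions. The sketch below is illustrative only: the data are invented, the function names are hypothetical, and it stops at the test statistic without computing a p-value. The bioinformatics steps themselves run on the tools named in the slide and are not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def chi_square_statistic(table):
    """Chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical paired measurements with a perfectly linear relationship.
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Hypothetical 2 x 2 table, e.g. trait present/absent in two groups.
chi2 = chi_square_statistic([[10, 20], [20, 10]])
print(r, chi2)
```

To test a null hypothesis, the chi-square statistic would be compared against the chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, which a statistics package normally handles.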
BIOL 409 Modules to Pipelines to Data
BIOL 409 Project
• Bioinformatics project relevant to students in Environmental Science concentration
• Genes involved in drought resistance
• Discuss journal article
• Use bioinformatics tools to explore published results
• Future: carry out novel analysis
Big Data in CHEM 226: Organic Chemistry Lab
Molecular Modeling Experiment:
• NIH database of small molecules
• Dock molecules in a protein binding site
• Molecules get scored based on various properties such as intermolecular vs. intramolecular bonds
• Put together a drug molecule for the disease state
Seminars on Big Data
Health Disparities
Data Analytics Professionals
Corporate Executives
Contact Information
Dr. Connie Walton
Mrs. Corisma Akins
Dr. Yenumula Reddy