Data Science and Analytics Curriculum development at Rensselaer (and the Tetherless World Constellation) NRC BigData Education Workshop April 11-12, 2014, Washington DC Peter Fox (RPI and WHOI/AOP&E) [email protected], @taswegian Tetherless World Constellation, http://tw.rpi.edu #twcrpi Earth and Environmental Science, Computer Science, Cognitive Science, and IT and Web Science

Page 1: NRC BigData Education Workshop April 11-12, 2014, Washington DC

Data Science and Analytics Curriculum development at Rensselaer (and the Tetherless World Constellation)

NRC BigData Education Workshop

April 11-12, 2014, Washington DC

Peter Fox (RPI and WHOI/AOP&E) [email protected], @taswegian
Tetherless World Constellation, http://tw.rpi.edu #twcrpi
Earth and Environmental Science, Computer Science, Cognitive Science, and IT and Web Science

Page 2

Data is a 1st class citizen

http://thomsonreuters.com/content/press_room/science/686112

Page 3

tw.rpi.edu

Research Themes

Future Web
• Web Science
• Policy
• Social

Xinformatics
• Data Science
• Semantic eScience
• Data Frameworks

Semantic Foundations
• Knowledge Provenance
• Ontology Engineering Environments
• Inference, Trust

Hendler

Fox

McGuinness

Multiple depts/schools/programs ~ 35 (Post-doc, Staff, Grad, Ugrad)

Page 4

Application Themes

Govt. Data
• Open
• Linked
• Apps

Env. Informatics
• Ecosystems
• Sea Ice
• Ocean imagery
• Carbon

Health Care/ Life Sciences
• Population Science
• Translational Med
• Health Records

Hendler/ Erickson

Fox

McGuinness

Platforms: Bio-nano tech center; Exp. Media and Perf. Arts Ctr.; Center for Comput. Innovation

Institute for Data Exploration and Applications http://idea.rpi.edu

Page 5

http://tw.rpi.edu/web/Courses

[Figure: Data, Information, Knowledge diagram with the labels Context, Presentation, Organization, Integration, Conversation, Creation, Gathering, and Experience; courses placed on it: Data Science, Xinformatics, Semantic eScience, Web Science, GIS4Science, Data Analytics]

Page 6

I teach and am involved:

• Data Science*, Xinformatics*, GIS for the Sciences*, Semantic eScience*, Data Analytics*, Semantic Technologies**

• School of Science
– ITWS and E&ES curriculum committees, SoS CC
– E&ES international student advisor
– Institute Faculty Fellow

• Institute-wide
– New Digital Humanities program

• Institute for Data Exploration and Applications

Page 7

Data Science/ Xinformatics

Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such it is changing the way all of these disciplines do both their individual and collaborative work. Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature whose progress is presently limited by lack of available tools and a fully trained and agile workforce. At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in e-science collaborations. The need is to teach key methodologies in application areas based on real research experience and build a skill-set. At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

In the last 2-3 years, Informatics has attained greater visibility across a broad range of disciplines, especially in light of great successes in bio- and biomedical-informatics and significant challenges in the explosion of data and information resources. Xinformatics is intended to provide both the common informatics knowledge as well as how it is implemented in specific disciplines, e.g. X=astro, geo, chem, etc. Informatics' theoretical basis arises from information science, cognitive science, social science, library science as well as computer science. As such, it aggregates these studies and adds both the practice of information processing, and the engineering of information systems. This course will introduce informatics, each of its components and ground the material that students will learn in discipline areas by coursework and project assignments.

Page 8

Modern informatics enables a new scale-free framework approach

Page 9

Mediation; generations

Borgmann et al., Cyber Learning Report, NSF 2008

Page 10

Data Analytics Challenge


Page 11

IT and Web Science

• First IT academic program in U.S.

• First web science degree program in U.S.

• BS in ITWS (20 concentrations) and MS in IT (10 concentrations)

• PhD in Multi-Disciplinary Sciences

• http://itws.rpi.edu

Page 12

  

Technical Track Courses and Concentrations

Computer Engineering Track
Courses:
1) ECSE-2610 Computer Components and Operations
2) ENGR-2350 Embedded Control
3) ECSE-2660 Computer Architecture, Networking and Operating Systems
Concentrations: Civil Engineering; Computer Hardware; Computer Networking (hardware focus); Mechanical/Aeronautical Eng.

Computer Science Track
Courses:
1) CSCI-2200 Foundations of Computer Science
2) CSCI-2300 Introduction to Algorithms
3) CSCI-2500 Computer Organization
Concentrations: Cognitive Science; Computer Networking (software focus); Information Security; Machine and Computational Learning

Information Systems Track
Courses:
1) CSCI-2200 Foundations of Computer Science
2) CSCI-2500 Computer Organization
3) Four credits from the following:
– CSCI-2220 Programming in Java (2 credits)
– CSCI-2961 Program in Python (2 credits)
– CSCI-2300 Introduction to Algorithms (4 credits)
– ITWS-49XX Web Systems Development II (4 credits)
Concentrations: Arts; Communication; Economics; Entrepreneurship; Finance; Management Information Systems; Medicine; Pre-law; Psychology; STS

Web Science Track
Courses:
1) CSCI-2200 Foundations of Computer Science
2) CSCI-2500 Computer Organization
3) One of the following:
– CSCI-49XX Web Systems Development II
– Web/Data Course approved by ITWS Curriculum Committee
Concentrations: Data Science; Science Informatics; Web Technologies

Page 13

CHANGES TO THE MASTER’S IN INFORMATION TECHNOLOGY PROGRAM

• In Spring 2013 the MS in IT core curriculum was revised to include Data Analytics.

• Networking core classes were replaced with Data Analytics core classes: Data Science, Database Mining, X-informatics, and Data Analytics (a new class offered in Spring 2014).

• The MS in IT program also added two new concentrations: Data Science and Analytics and Information Dominance.

• The Information Dominance concentration was developed for a new Navy program that will be educating a select group of 5-10 naval officers a year with the skills needed for military cyberspace operations. Two officers started in Fall 2013 and three began in Spring 2014.

Page 14

MS in IT Required Core Courses

| IT Core Area | Course Number | Course Title | Term(s) Offered |
|---|---|---|---|
| Database Systems | CSCI-4380 | Database Systems | Fall/Spring |
| Data Analytics | ITWS-6350 | Data Science | Fall |
| Software Design and Engineering | CSCI-4440 | Software Design and Documentation | Fall |
| | ITWS-6400 | X-Informatics | Spring |
| Management of Technology* | ITWS-6300 | Business Issues for Engineers and Scientists (Professional Track Only) | Fall/Spring |
| Human Computer Interaction | COMM-6420 | Foundations of HCI Usability | Fall |
| | COMM-696X | Human Media Interaction | Spring |

* For the research track, replace ITWS-6300 Business Issues for Engineers and Scientists with one of the two semester courses ITWS-6980 Master’s Project or ITWS-6990 Master’s Thesis.

Advanced Core options for students who have previously completed a Core Course

| IT Core Area | Course Number | Course Title | Term(s) Offered |
|---|---|---|---|
| Database Systems | CSCI-6390 | Database Mining | Fall |
| | ITWS-6350 | Data Science | Fall |
| | ITWS-696X | Semantic E-Science | Fall |
| Data Analytics | CSCI-6390 | Database Mining | Fall |
| | ITWS-6400 | X-Informatics | Spring |
| | ITWS-696X | Data Analytics | Spring |
| Software Design | CSCI-6500 | Distributed Computing Over the Internet | Fall |
| | ECSE-6780 | Software Engineering II | Fall |
| | ITWS-696X | Semantic E-Science | Fall |
| Management of Technology | MGMT-6080 | Networks, Innovation and Value Creation | Fall |
| | MGMT-6140 | Information Systems for Management | Spring |
| Human Computer Interaction | COMM-6620 | Information Architecture | Spring |
| | COMM-6770 | User-Centered Design | Fall |
| | COMM-696X | Interactive Media Design | Summer |

Page 15

Two New MS in IT Concentrations

Concentration: Data Science and Analytics

Data and Information analytics extends analysis (descriptive and predictive models to obtain knowledge from data) by using insight from analyses to recommend action or to guide and communicate decision-making. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with an entire methodology. Key topics include: advanced statistical computing theory, multivariate analysis, and application of computer science courses such as data mining and machine learning and change detection by uncovering unexpected patterns in data.

Select two or three of the following courses:

| Course Number | Course Name | Term(s) Offered |
|---|---|---|
| ITWS-6350 | Data Science | Fall |
| ITWS-6400 | X-Informatics | Spring |
| ITWS-696X | Data Analytics | Spring |
| ITWS-696X | Semantic E-Science | Fall |
| ITWS-696X | Advanced Semantic Technologies* | Spring |

If only two of the above were chosen, select one more of the following courses:

| Course Number | Course Name | Term(s) Offered |
|---|---|---|
| COMM-6620 | Information Architecture | Spring |
| CSCI-4020 | Computer Algorithms | Spring |
| CSCI-4150 | Introduction to AI | Fall |
| CSCI-6390 | Database Mining | Fall |
| CSCI-4220 or CSCI-6220 | Network Programming or Parallel Algorithm Design | Spring |
| ISYE-4220 | Optimization Algorithms and Applications | Fall |
| ISYE-6180 | Knowledge Discovery with Data Mining | Spring |
| MGMT-696X | Technology Foundations for Business Analytics | Fall |
| MGMT-696X | Predictive Analytics Using Social Media | Spring |

Concentration: Information Dominance

The Information Dominance concentration prepares students for careers designing, building, and managing secure information systems and networks. The concentration includes advanced study in encryption and network security, formal models and policies for access control in databases and application systems, secure coding techniques, and other related information assurance topics. The combination of coursework provides comprehensive coverage of issues and solutions for utilizing high assurance systems for tactical decision-making. It prepares students for careers ranging from secure information systems analyst, to information security engineer, to field information manager and chief information officer. It is also appropriate for all IT professionals who want to enhance their knowledge of how to use pervasive information in situational awareness, operations scenarios, and decision-making.

Select two or three of the following courses:

| Course Number | Course Name | Term(s) Offered |
|---|---|---|
| ISYE-6180 | Knowledge Discovery with Data Mining | Spring |
| CSCI-6960 | Cryptography and Network Security I | Fall |
| ITWS-4370 | Information System Security | Spring |
| CSCI-4650 | Networking Laboratory I | Fall/Spring |
| MGMT-7760 | Risk Management | Fall |
| ISYE-4310 | Ethics of Modeling for Industrial Systems Engineering | Fall |

If only two of the above were chosen, select one more of the following courses:

| Course Number | Course Name | Term(s) Offered |
|---|---|---|
| CSCI-6390 | Database Mining | Fall |
| CSCI-6968 | Cryptography and Network Security II | Spring |
| CSCI-4660 | Networking Laboratory II | Fall/Spring |
| ECSE-6860 | Evaluation Methods for Decision Making | Fall |
| ISYE-6500 | Information and Decision Technologies for Industrial and Service Systems | Fall/Spring |
| CSCI-496X | Computational Analysis of Social Processes | Fall |

Page 16

Also at RPI

• Data Science Research Center and Data Science Education Center (dsrc.rpi.edu, 2009)

• http://www.rpi.edu/about/inside/issue/v4n17/datacenter.html
– Over 45: research faculty, post-docs, grad students, staff, undergraduates…

• Data is one of the Rensselaer Plan’s five thrusts

• Other key faculty
– Fran Berman (Center for Digital Society and RDA)
– Bulent Yener (DSRC Director)
– Jim Hendler (IDEA Director)

Page 17

data.rpi.edu (v0.1, 2009)

Page 18

Soon…

Page 19

More RPI Curricula

• Environmental Science with Geoinformatics concentration

• Bio, geo, chem, astro, materials - informatics

• GIS for Science

• Master of Science – Data Science?? (pending)

• Multi-disciplinary science program - PhD in Data and Web Science

• DATUM: Data in Undergraduate Math! (Bennett)

• Missing – intermediate statistics

• Graphs – significant potential here – must teach!

Page 20

5-6 years in…

• Science and interdisciplinary from the start!
– Not a question of: do we train scientists to be technical/data people, or do we train technical people to learn the science
– It’s a skill/ course level approach that is needed

• We teach methodology and principles over technology *

• Data science must be a skill, and natural like using instruments, writing/using codes

• Team/ collaboration aspects are key **

• Foundations and theory must be taught ***

Page 21

Challenging the “Heroic” Science Paradigm

This national and international has drawn attention to the need for a reassessment of priorities to recognize that, in the new data era, the burden of making data and information usable shifts from the user to the provider.

Page 22

And thus … in <10 years