View
0
Download
0
Category
Preview:
Citation preview
An integrated, modular approach to data science education
in the life sciences
Kimberly A Dill-McFarland1,2*, Steven J Hallam1*,
1 Department of Microbiology and Immunology, University of British Columbia,
Vancouver, BC, Canada 2 Division of Allergy and Infectious Diseases, Department of
Medicine, University of Washington, Seattle, WA, USA
* kadm@uw.edu * shallam@mail.ubc.ca
Abstract
We live in an increasingly data-driven world. This is strongly evident in the life sciences
where high-throughput data generation platforms are transforming biology into
information science. This has shifted major challenges in biological research from data
generation to interpretation and knowledge translation. However, training in
bioinformatics, or more generally data science for life scientists, lags behind current
demands at multiple career levels. In particular, the development of accessible,
undergraduate data science curricula has the potential to improve learning outcomes
and better prepare students in the life sciences to address current social, environmental,
and technological careers. Here, we describe the Experiential Data science for
Undergraduate Cross-Disciplinary Education (EDUCE) initiative, which aims to
progressively build data science competencies across two years of undergraduate
coursework. In this program, students complete data science modules (2 - 17 hrs)
integrated into required courses augmented with opportunities to participate in
additional training through calibrated co-curricular activities. The EDUCE program
draws on a cross-disciplinary community of practice to overcome several reported
barriers to data science for life scientists, including instructor capacity, student prior
knowledge, and relevance to discipline-specific problems. Preliminary survey results
September 26, 2019 1/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
indicate that even a single module improves student self-reported interest and/or
experience in bioinformatics and computer science. Thus, EDUCE provides a flexible
framework for integration of effective data science curriculum into a diverse range of
undergraduate programs.
Introduction 1
The world is increasingly inundated with data. This is especially true in the life sciences, 2
where recent advances in high-throughput sequencing and mass spectrometry have 3
resulted in an explosion of multi-omic information (e.g. DNA, RNA, protein, etc.). For 4
example, over 31 tera-bases of genomic data are created on average per second, and 5
rates are expected to continue to increase with continued improvement in platform 6
technologies [1]. 7
Over the past decade, application of these multi-omic platforms to the study of 8
microbial communities has changed our perception of the world, and the impacts of 9
these tools can be seen across environmental [2], agricultural [3], and medical fields of 10
study [4], as just some examples. Thus, major challenges in the life sciences are shifting 11
away from data generation toward data processing and interpretation. This has resulted 12
in a need for training in bioinformatics or more generally, data science. 13
However, life science training in these areas lags behind recognized needs. Despite 14
calls to action as early as 20 years ago [5], there remains a sustained, unmet need for 15
bioinformatics training in the life sciences [6]. A meta-analysis of surveys from 16
organizations such as the Global Organisation for Bioinformatics Learning, Education 17
and Training (GOBLET), the European life-sciences Infrastructure for biological 18
Information (ELIXIR), the U.S. National Science Foundation (NSF), and the Australia 19
Bioinformatics Resource (EMBL-ABR) found the desire for training to be widespread 20
both in terms of geography and career level [6]. While a number of training formats 21
were applicable, the most common ask for undergraduate students was integrated data 22
science training within current degree programs [6]. This ensures that all students 23
receive the necessary training and provides comprehensive, foundational knowledge for 24
further studies or subsequent careers. 25
In order to meet this clear and present need for data science education in the life 26
September 26, 2019 2/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
sciences, the University of British Columbia has launched a number of initiatives in 27
recent years. These include the Master of Data Science program (established 2016) as 28
well as several specialized undergraduate majors like biotechnology (BIOT) and 29
combined computer science-life science programs (established early 2000s). While these 30
programs provide a subset of students with in-depth training in data science and 31
bioinformatics, they are inherently self-selecting. This is concerning as there is 32
continued under-representation of women and minorities in undergraduate degree 33
programs in computer science, mathematics, and statistics [7]. Therefore, since data 34
science training is necessary for the majority of life science graduates and specialized 35
degree programs do not attract all students, this training needs to be incorporated into 36
the fabric of undergraduate coursework in a progressive manner such that students are 37
able to build their confidence and skillsets across the curriculum. 38
With this integrative objective in mind, the Experiential Data science for 39
Undergraduate Cross-Disciplinary Education (EDUCE) initiative was launched in Fall 40
2017. The following provides an in-depth description of the EDUCE framework as well 41
as preliminary findings from EDUCE in the Department of Microbiology and 42
Immunology at the U. of British Columbia. 43
Experiential Data science for Undergraduate 44
Cross-Disciplinary Education (EDUCE) 45
EDUCE seeks to develop a uniform, cross-disciplinary, and collaborative framework to 46
equip all undergraduate students in the life sciences with basic competency in data 47
science. The EDUCE curriculum provides students with the most commonly needed 48
data science skills as defined by recent surveys [6]. Our foundational learning objectives 49
are for students to learn to: 50
• Recognize and define uses of data science 51
• Explore and manipulate data 52
• Visualize data in tables and figures 53
• Apply and interpret statistical tests 54
September 26, 2019 3/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
For each program, sub-objectives within these foundational objectives are then 55
framed and defined using discipline-specific data, questions, and software. These 56
learning objectives are achieved through 1) modular curriculum in courses, 2) calibrated 57
co-curricular activities, and 3) a cross-disciplinary community of practice. These diverse 58
learning opportunities allow students to develop progressive data science competencies 59
over several years of integrated coursework. 60
Modular curriculum 61
EDUCE curriculum is modular, thus providing a flexible framework for integration of 62
data science curriculum across a diverse range of programs. Modules are self-contained, 63
customizable bundles of learning materials that are scaffolded into existing courses. 64
Each module can be delivered as a separate chunk within a course (like a 2-week 65
intensive), a reoccurring section (like ’data science Fridays’), or a fully integrated 66
curriculum wherein the students are not specifically aware of what is or is not part of a 67
module. 68
Because EDUCE is modular, it overcomes several of the largest barriers to data 69
science education in the life sciences [8]. Specifically, the modules can be taught by the 70
main course instructor(s), visiting lectures, or the students themselves. This lifts the 71
need for all faculty to be well-versed in data science methods as a single, knowledgeable 72
instructor can teach modules across several courses or even across an entire program. 73
Also, the modules assume no prior knowledge as they incorporate all necessary 74
introductory and background material, including materials unique to that module and 75
review activities from previous modules. Finally, module integration circumvents 76
traditional course creation, i.e. new course codes, thus allowing quicker, more broadly 77
accessible curriculum deployment within departments or across faculties. 78
In addition, modular instruction allows for more timely and direct linkages to 79
discipline-specific content than traditional course structures. For example, students 80
cover the global impacts, thermodynamics, etc. of the nitrogen cycle in their regular 81
course content and then immediately enter an EDUCE module in which they quantify 82
and plot nitrogen species in the ocean using R [9]. This aids student learning in both 83
the discipline and data science as students are able to reference prior knowledge [10] as 84
September 26, 2019 4/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
well as think about the meaning and application of a new skill as they learn that 85
skill [11]. Thus, modules leverage student interest in their chosen area of study and 86
maximize the relevancy of the data science content. 87
An EDUCE module consists of all materials related to a set of learning objectives 88
and includes 1) introduction, 2) practice, 3) application, and 4) communication (Fig 1). 89
As students progress from introductory to advanced modules across several courses, 90
their focus transitions from introduction and practice to application and communication. 91
Thus, while each module contains all four aspects, later modules build on previous 92
knowledge to challenge students to apply their data science skills to more complex data 93
and new scientific questions. This builds progressive competency in data science over 94
several years of instruction without the need for additional course requirements. 95
Fig 1. EDUCE module overview. Each EDUCE module consists of introduction,practice, application, and communication. Early, introductory modules are focused onintroduction and practice while later, more advanced modules challenge students toapply their data science skills to complex data and to communicate novel researchfindings.
Modules vary in size from a single activity to semester-long projects, and they can 96
include a range of materials. The following pieces are collated together based on the 97
module’s learning objectives, available class time, and instructor needs. Examples of 98
specific microbiology modules are shown in detail below and additional examples can be 99
found on the EDUCE website (educe-ubc.github.io). 100
• Lecture slides 101
• Notes / handouts 102
• In-class tutorials 103
• At-home tutorials 104
• Screen capture videos 105
• Individual assignments 106
• Team projects 107
September 26, 2019 5/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Example modules in Microbiology 108
EDUCE has or is in the process of being integrated into seven third and fourth year 109
microbiology (MICB) courses selected based on internal MICB curriculum review and 110
instructor interest (Table 1). These courses include lectures, wet labs, and dry labs (i.e. 111
tutorials). 112
Table 1. EDUCE courses in microbiology (MICB)
MICB code Name L W D Hrs/wk EDUCE hrs
301 Microbial ecophysiology X 3 5322 Molecular microbiology laboratory X X 6 3323 Molecular immunology and virology laboratory X X 7 3405 Bioinformatics X X 4 17425 Microbial ecological genomics X 3 17
421, 447 Experimental microbiology/molecular biology (CURE) X X 7 2+L: Lecture, W: Wet lab, D: Dry lab, CURE: Course-based undergraduate research experience
EDUCE content varies in each course and builds from the third (300+) to 113
fourth-year (400+) (Fig 2). Ideally, students enter the program in MICB 301 where 114
they master basic R/RStudio scripting to manipulate data, create figures, and perform 115
t-tests. Then, students acquire additional practice in R/RStudio in 300-level labs 116
(MICB 322, 323) where they employ specialized R packages to manipulate and plot data 117
generated in the course. For example, MICB 322 uses the Sushi package [12] to visualize 118
transposon insertions in the Caulobacter mutant library created by students in the 119
course. Next, students progress to capstone 400-level courses. These include traditional 120
400-level courses (MICB 405, 425) and course-based undergraduate research experiences 121
(CUREs, MICB 421, 447). In the traditional courses, EDUCE modules are built into 122
the coursework and include capstone projects using published but under-explored 123
metagenomic and metatranscriptomic data-sets [13]. In contrast, CUREs involve 124
student-led research projects that generate novel data. Thus, EDUCE serves as a 125
consultation resource to help students with R/RStudio packages, Unix tools, or 126
statistical methods as required for their individual projects. Finally, all EDUCE 127
capstone courses offer the opportunity for students to communicate their scientific 128
findings through publication in the undergraduate Journal of Microbiology and 129
Immunology (JEMI, jemi.microbiology.ubc.ca) and/or presentation at the MICB 130
Undergraduate Research Symposium (URS, 131
jemi.microbiology.ubc.ca/UndergraduateResearchSymposium). 132
September 26, 2019 6/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Fig 2. EDUCE curriculum in microbiology. Students enter the program inMICB 301 at the U. of British Columbia or equivalent coursework at the BritishColumbia Institute of Technology (BCIT). Some students take additional 300-levelcourses, MICB 322 and 323. Then, all students progress to one or more capstone400-level courses that culminate formal publication or presentation within thedepartment. MICB: microbiology
While the above is an ideal module progression, there exists considerable variation 133
and flexibility in the EDUCE program. This is necessary, because MICB prerequisite 134
structure and major requirements do not yield a linear or conserved course progression 135
for students. Microbiology (MICB) and Microbiology Honours (MBIM) degrees require 136
the majority of EDUCE courses while the combined majors require only some EDUCE 137
courses with little overlap across the three options. Specifically, the MICB/MBIM and 138
computer science (+CPSC) or Earth, Ocean, and Atmospheric Sciences (+EOAS) 139
majors require the first EDUCE course (MICB 301) but different 400-level capstones 140
(MICB 405 and 425, respectively). MICB/MBIM + EOAS also requires an additional 141
300-level EDUCE ’practice’ course. The Biotechnology (+BIOT) degree, on the other 142
hand, is a joint program wherein students spend their second and third years at the 143
British Columbia Institute of Technology (BCIT). Thus, +BIOT does not require any 144
300-level EDUCE courses. In addition, many students choose to take the optional or 145
suggested 400-level capstone courses in addition to those required by their major 146
(Table 2). 147
Table 2. EDUCE course requirements for microbiology majors
Undergraduate majorMICB code MICB/MBIM +CPSC +EOAS +BIOT
301 R R R S322 R S R O323 R S O O405 O R S R425 O O R S
421/447 R O O S
R: Required, S: Suggested, O: Optional. MICB: Microbiology, MBIM: MicrobiologyHonours, CPCS: Computer Science, EOAS: Earth, Ocean, and Atmospheric Sciences,BIOT: Biotechnology.
These course modules form the basis of the EDUCE program and address the 148
foundational learning objectives. While all MICB students obtain some level of data 149
literacy through EDUCE modules within their required courses, some students obtain 150
September 26, 2019 7/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
more practice or more advanced skills if they choose to take additional EDUCE courses. 151
Additionally, all students can gain more experience and competencies through the 152
remaining aspects of the program. 153
Co-curricular activities 154
EDUCE co-curricular activities reinforce course modules by providing students with 155
diverse data science learning opportunities outside the classroom. These can include 156
workshops, hackathons, independent research, etc. that incorporate one or more of the 157
foundational learning objectives and meet the following criteria. 158
Co-curricular activities must first and foremost be accessible to undergraduate 159
students in terms of cost, content, and schedule as these are common barriers. 160
Specifically, undergraduate students are often unable to pay the high costs of data 161
”boot camps” [14] and unwilling to pursue outside training opportunities as they feel 162
intimidated by data science tools [15]. 163
Secondarily, EDUCE co-curricular activities directly reiterate and build on learning 164
objectives from related courses. As with the integrated course modules, this aims to 165
improve student learning by referencing prior knowledge [10] and imparting 166
discipline-specific meaning to the skills being learned [11]. This is particularly important 167
for short-term training opportunities as fully stand-alone programs, such as ”boot 168
camps”, do not promote long-term skill development even at the graduate level [16]. 169
Finally, co-curriculars utilize the same or similar data sets and tools as courses. 170
Course modules, thus, serve as pre-requisites for co-curriculars such that students are 171
more prepared and less intimidated by data science training opportunities outside the 172
classroom. This requires co-curriculars to be discipline-specific but does not necessarily 173
exclude other participants. For example, an R [9] workshop using oceanic oxygen data 174
may be designed for life or Earth science students but is no less accessible to other 175
disciplines than the commonly used cars data [17]. 176
EDUCE co-curricular activities fulfill a number of roles within the program. 177
Workshops generally serve as additional practice with data science tools for students 178
who want more experience with concurrent modules or to review concepts from past 179
modules. They also allow students to explore modules from courses that do not fit in 180
September 26, 2019 8/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
their schedules. Other opportunities like hackathons and research challenge students to 181
move beyond the course modules and to develop their data science skills to answer new, 182
unexplored questions alongside other undergraduates as well as graduate students and 183
even faculty. Thus, as with the course modules, EDUCE co-curriculars are flexible and 184
integrated in order to provide students with varied types and levels of instruction from 185
introductory to advanced (Fig 1). 186
Example co-curricular workshops in Microbiology 187
The EDUCE program at UBC has partnered with the Ecosystem Services, 188
Commercialization Platforms and Entrepreneurship (ECOSCOPE, ecoscope.ubc.ca) 189
training program and the Applied Statistics and Data Science Group (ASDa, 190
asda.stat.ubc.ca) to deliver data science workshops accessible to all participants 191
from undergraduate students to industry professionals. 192
The current workshop portfolio (github.com/EDUCE-UBC/workshops_R) consists of 193
32 hours of content in R [9] using the same oceanic geochemical data set [18] as most 194
EDUCE course modules. These workshops target the EDUCE foundational learning 195
objectives and include: 196
• Introduction to R 197
• The R tidyverse 198
• Reproducible research in R and Git 199
• Intermediate R programming 200
• Statistical models in R 201
• Phylogenetics and microbiomes in R 202
Introduction to R is a short refresher of the MICB 301 module or an introduction to 203
the workshops for non-EDUCE students. The R tidyverse closely follows its respective 204
module in MICB 405 and 425 (Fig 1) and thus, can be used as a refresher or additional 205
practice. The remaining workshops have some overlap with course modules (like 206
ANOVA in Statistical models in R and MICB 405) but mainly build on the fourth year 207
modules to challenge students to continue their learning in data science. 208
September 26, 2019 9/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Importantly, ECOSCOPE sponsorship allows undergraduates to take any workshop 209
for 10 CAD as compared to regular registration fees of 100 CAD or more. This, along 210
with scheduling to align with course modules and to accommodate the most common 211
undergraduate schedules in MICB, facilitates undergraduate participation. Moreover, 212
these opportunities are actively advertised in undergraduate courses in MICB and 213
directly linked to concurrent and previous course modules. 214
Community of practice 215
Data science, itself, is a cross-disciplinary field traditionally comprised of statistics, 216
mathematics, and computer science. Thus, EDUCE operates through a 217
cross-disciplinary community of practice including these and other fields. While 218
EDUCE at UBC is led by a postdoctoral fellow in MICB, the team spans 10 219
departments across three faculties (Fig 3). For example, statistics consults on module 220
development, mathematics provides resource support, and teaching assistants (TAs) 221
have been recruited from the Faculties of Science, Applied Science, and Medicine. The 222
community also includes scientists at many career levels including undergraduate and 223
graduate students, postdoctoral fellows, instructors, research faculty, and staff. Overall, 224
the EDUCE team brings together expertise from across the university to provide 225
relevant curriculum grounded in both current data science best practices and 226
scholarship of education leadership. 227
Fig 3. EDUCE team members at UBC. The EDUCE program is led by MICB inthe Faculty of Science. However, TAs, consultants, collaborators, and other supportcome from across UBC, including 10 departments from 5 Faculties. Team members alsorepresent multiple career stages from undergraduates to faculty to staff. Other includesthe Faculties of Applied Science, Land and Food Systems, Medicine, and UBC Central.
The EDUCE community of practice reflects widespread changes in teaching in 228
MICB. A growing number of MICB courses (24% in 2018/19) are taught by a team of 229
faculty often including both tenure-track researchers and instructors. This team 230
teaching brings additional expertise, styles, and perspectives into the classroom [19,20] 231
as well as fosters collaboration and community at the university [20]. 232
Additionally, EDUCE TAs contribute to team teaching. Unlike traditional 233
appointments, EDUCE TAs are not assigned to a specific course nor do they always 234
September 26, 2019 10/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
come from the course department. Instead, EDUCE TAs are recruited from any 235
department and work across the entire program. With their varied expertise, TAs act as 236
contributing members to all aspects of EDUCE from curriculum development to office 237
hours to instruction. This provides the EDUCE students with diverse resources, 238
viewpoints, and role models as they progress through modules and co-curriculars. 239
Moreover, this provides the TAs will hands-on teaching experience as well as evaluation 240
to build their teaching portfolios. 241
Preliminary outcomes of EDUCE 242
EDUCE launched at UBC in Term 1 of the 2017/18 academic year. Per year, 243
approximately 300 (redundant) students are impacted through 40 total classroom hours 244
in five MICB courses. Once modules have been deployed in all seven planned courses, 245
this will increase to roughly 475 students and 45 hours. Also, over the first two years, 246
co-curricular workshops were taken by 77 EDUCE students, which resulted in increasing 247
undergraduate participation from 0% (2016/17, prior to partnership with EDUCE) to 248
8.8% (2017/18) to 23% (2018/19) of total registrants. 249
Evaluation of EDUCE is on-going and includes student surveys (BREB H17-02416) 250
at the start (pre) and end (post) of courses (Text S1) as well as at the end of workshops. 251
Courses with <5 EDUCE hours (Table 1) were excluded to avoid redundant surveying 252
of students. For example, >95% of students in MICB 322 concurrently take MICB 301. 253
To-date, 439 pre-course responses and 400 post-course responses have been collected 254
with 91 students indicating willingness to complete post-graduation surveys or focus 255
groups. We have also collected 44 responses from co-curricular workshops. 256
Preliminary analysis of the first EDUCE course, MICB 301, from 2017-18 revealed 257
that student self-reported interest in ’bioinformatics’ increased with a significant 258
transition from medium to high interest (Monte Carlo FDR P = 1.82E-2, Fig 4). 259
Similarly, student self-reported experience in ’bioinformatics’ significantly increased from 260
none to low (FDR P = 1.84E-7) or medium (FDR P = 2.74E-5) as well as from low to 261
medium (FDR P = 2.69E-2). This is particularly important for the MICB program at 262
UBC as the fourth year course MICB 405, Bioinformatics is available to these students. 263
In contrast, self-reported interest in ’computer science’ remained constant but 264
September 26, 2019 11/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Fig 4. Student interest and experience in data science. Student responses frommatched pre- and post-surveys in MICB 301 indicating self-reported interest (N = 143 -146) and experience (N = 136 -143) in data science areas including bioinformatics andcomputer science. Numerical responses from 2018/19 were converted to categoricalgroups from 2017/18 as none (0), low (1-3), medium (4-7), and high (8-10). Arrows andp-values indicate significant response changes by Monte Carlo symmetry tests for pairedcontingency tables.
showed significant gains in experience from none to low (FDR P = 2.44E-3), and 265
’statistics’ had no significant changes in either interest or experience. Thus, it appears 266
that even minimal exposure to data science in disciplinary coursework (5 hours, 267
(Table 1)) may impact student interest and confidence in some data science areas. 268
??????RESULTS ON JEMI, URS 269
Full analysis details are available at 270
https://github.com/kdillmcfarland/EDUCE_desc. 271
On-going challenges 272
The EDUCE program aims to improve undergraduate competency in data science 273
through modular course curriculum and calibrated co-curricular activities offered by a 274
cross-disciplinary teaching team. While course modules are designed to be extensible, 275
some modification is required prior to implementation in a new course in order to 276
accommodate individual teaching style, course schedule, and student prior knowledge. 277
This process is made easier through collaboration across the EDUCE community of 278
practice. However, as with any new curriculum, instructor preparation time remains a 279
barrier to implementation of EDUCE modules in some courses. 280
Another on-going challenge is student’s prior knowledge. Due to the non-linear 281
nature of MICB degree requirements at UBC, some students come into an intermediate 282
EDUCE course without prior EDUCE experience while others repeat EDUCE content 283
in multiple courses. To overcome this, we are developing supplementary exercises for 284
students entering the program in fourth-year as well as integrating new data sets in 285
fourth-year modules. 286
Finally, while preliminary results indicate that EDUCE may have positive impacts 287
on student interest and learning in data science, further analyses of collected survey 288
data are needed to determine if EDUCE curriculum is truly causative of these outcomes. 289
September 26, 2019 12/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
?????Do we need a discussion section that related EDUCE findings with other 290
programs and papers? 291
Acknowledgments 292
The authors wish to thank faculty and instructors for access to courses and support 293
during module integration, including Sean Crowe, Marcia Graves, Martin Hirst, William 294
Mohn, David Oliver, and Jennifer Sibley. 295
We gratefully acknowledge the financial support for this project provided by UBC 296
Vancouver students via the Teaching and Learning Enhancement Fund. This work was 297
also supported the UBC Department of Microbiology and Immunology as well as the 298
Natural Sciences and Engineering Research Council of Canada (NSERC), Collaborative 299
Research and Training Experience Program (CREATE, ????Grant number?????). 300
References
1. Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, et al. Big
Data: Astronomical or Genomical? PLOS Biology. 2015;13(7):e1002195.
2. Hawley AK, Nobu MK, Wright JJ, Durno WE, Morgan-Lang C, Sage B, et al.
Diverse Marinimicrobia bacteria may mediate coupled biogeochemical cycles
along eco-thermodynamic gradients. Nature communications. 2017;8(1):1507.
doi:10.1038/s41467-017-01376-9.
3. Neumann AP, Suen G. The Phylogenomic Diversity of Herbivore-Associated
Fibrobacter spp. Is Correlated to Lignocellulose-Degrading Potential. mSphere.
2018;3(6):e00593–18. doi:10.1128/mSphere.00593-18.
4. Sze MA, Schloss PD. Leveraging Existing 16S rRNA Gene Surveys To Identify
Reproducible Biomarkers in Individuals with Colorectal Tumors. mBio.
2018;9(3):e00630–18. doi:10.1128/mBio.00630-18.
5. MacLean M, Miles C. Swift action needed to close the skills gap in bioinformatics.
Nature. 1999;401(6748):10. doi:10.1038/43269.
September 26, 2019 13/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
6. Attwood TK, Blackford S, Brazas MD, Davies A, Schneider MV. A global
perspective on evolving bioinformatics and data science training needs. Briefings
in Bioinformatics. 2017; p. bbx100–bbx100.
7. National Science Foundation: National Center for Science and Engineering
Statistics. Women, Minorities, and Persons with Disabilities in Science and
Engineering: 2019. Alexandria, VA: Special Report NSF 19-304; 2019. Available
from: https://www.nsf.gov/statistics/wmpd.
8. Williams JJ, Teal TK. A vision for collaborative training infrastructure for
bioinformatics. Annals of the New York Academy of Sciences. 2017;1387(1):54–60.
doi:10.1111/nyas.13207.
9. R Core Team. R: A language and environment for statistical computing; 2019.
Available from: http://www.r-project.org/.
10. Bransford JD, Brown AL, Cocking RR. How People Learn: Brain, Mind,
Experience, and School. Washington D.C.: National Academy Press; 2000.
11. Morris CD, Bransford JD, Franks JJ. Levels of processing versus transfer
appropriate processing. Journal of Verbal Learning and Verbal Behavior.
1977;16(5):519–533. doi:10.1016/S0022-5371(77)80016-9.
12. Phanstiel DH. Sushi: Tools for visualizing genomics data; 2018.
13. Hawley AK, Torres-Beltran M, Zaikova E, Walsh DA, Mueller A, Scofield M,
et al. A compendium of multi-omic sequence information from the Saanich Inlet
water column. Scientific Data. 2017;4:170160.
14. National Academies of Sciences and Medicine E. Data Science for
Undergraduates: Opportunities and Options. Washington, DC: The National
Academies Press; 2018. Available from: https://www.nap.edu/catalog/25104/
data-science-for-undergraduates-opportunities-and-options.
15. Farrell KJ, Carey CC. Power, pitfalls, and potential for integrating
computational literacy into undergraduate ecology courses. Ecology and
evolution. 2018;8(16):7744–7751. doi:10.1002/ece3.4363.
September 26, 2019 14/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
16. Feldon DF, Jeong S, Peugh J, Roksa J, Maahs-Fladung C, Shenoy A, et al. Null
effects of boot camps and short-format training for PhD students in life sciences.
Proceedings of the National Academy of Sciences of the United States of America.
2017;114(37):9854–9858. doi:10.1073/pnas.1705783114.
17. Fox J, Weisberg S. An {R} Companion to Applied Regression. 3rd ed. Thousand
Oaks, Ca: Sage; 2019.
18. Torres-Beltran M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, et al.
A compendium of geochemical information from the Saanich Inlet water column.
Scientific Data. 2017;4:170159.
19. Jones F, Harris S. Benefits and Drawbacks of Using Multiple Instructors to
Teach Single Courses. College Teaching. 2012;60:132–139.
20. Bess JL. Teaching Alone, Teaching Together : Transforming the Structure of
Teams for Teaching. 1st ed. San Francisco, USA: Jossey-Bass; 2000.
September 26, 2019 15/15
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Introductionto data science
Communicateresults
Apply toolsto public data
Practicewith tools
- Definitions- Uses in discipline
- Data manipulation- Data visualization- Statistics
- Discipline-specific- Messy, imperfect data- Novel findings
- Schorlarly reports- Peer-reviewed articles- Symposium presentations
Apply toolsto course data
Introductory
Advanced
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Introduction to data science
Additional practice
MICB301 BCIT
MICB322
MICB323
- Discipline-specific uses- Introduction to R/RStudio- Data manipulation and figures- t-tests
- Specialized R packages- Student-generated data
Scientific communication
MICB405
MICB425
Applications of data science
- Intermediate R/RStudio- ANOVA and linear regression- Introduction to Unix- Capstone projects
MICB421
MICB447
- Consultation in R/RStudio- Consultation in statistics- Student-generated data- Capstone projects
- Publication in the Undergraduate Journal of Microbiology and Immunology (UJEMI)- Presentation at the MICB Undergraduate Research Symposium (URS)
UJEMI MICBURS
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Faculty of Science Other
Undergraduatestudent
Graduatestudent
Postdoctoralfellow
Faculty(instructor)
Faculty(research)
Staff Graduatestudent
Staff
0
1
2
3
4
5
6
Career level
Num
ber
of E
DU
CE
team
mem
bers
Department
Applied Mathematics
Botany
Centre for Teaching,Learning & Technology
Computer Science
Electrical & ComputerEngineering
Food Science
Mathematics
Medical Genetics
Microbiology &Immunology
Statistics
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
BioinformaticsP = 1.82E−2
Computer science Statistics
Pre Post Pre Post Pre Post
0%
25%
50%
75%
100%
Survey
Pro
port
ion
ofre
spon
ses
Interest
High
Medium
Low
None
A
BioinformaticsP < 0.03
Computer scienceP = 2.44E−3
Statistics
Pre Post Pre Post Pre Post
0%
25%
50%
75%
100%
Survey
Pro
port
ion
ofre
spon
ses
Experience
High
Medium
Low
None
B
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted July 26, 2020. ; https://doi.org/10.1101/2020.07.25.218453doi: bioRxiv preprint
Recommended