A Proposal for a Program of Graduate Study in Data Science leading to a degree in

Doctor of Philosophy in Data Science (PhD/DS)

By:
Doctoral Program Committee, HDSI (AY 2019)
Gal Mishne, Virginia De Sa, Yian Ma, Jingbo Shang, Vineet Bafna, Lily Xu, Michael Holst, Rayan Saab, Armin Schwartzman, George Sugihara, Dimitris Politis.

Doctoral Program Committee, HDSI (AY 2020)
Faculty Council

Contacts:
Academic:
Rajesh K. Gupta
Director, Halıcıoğlu Data Science Institute (HDSI)
(858) 822-4391
[email protected]

Administrative:
Yvonne Wollmann
Student Affairs Manager
Halıcıoğlu Data Science Institute (HDSI)
(858) 246-5427
[email protected]

Version History:
April 12, 2020: Version 1.1 submitted for preliminary review by HDSI faculty.
Oct 6, 2020: Version 2.0 submitted for administrative review.
Oct 12, 2020: Version 2.2 updated with inputs from HDSI faculty council.
Nov 30, 2020: Version 3.1 submitted to Graduate Council for review.
Jan 28, 2021: Version 4.0 revised and updated based on feedback from the Graduate Council.

Online Link: https://bit.ly/HDSI-PHD

A Proposal for a Program of Study in Data Science leading to a degree in

Doctor of Philosophy in Data Science (PhD/DS)

Executive Summary

The Halıcıoğlu Data Science Institute proposes a doctoral degree program in “Data Science” (PhD/DS) to serve the need for advanced graduate studies in the area of Data Science, a field in which HDSI currently offers a well-received Bachelor of Science degree as a part of its academic mission “to promote a unified campus-wide approach to research and teaching in Data Science.” The proposed doctoral program will join similar degree programs coming up across the country as the emerging field continues to define its core intellectual thrusts and its academic community. The nascent field of Data Science spans mathematical models, computational methods and analysis tools for navigating and understanding data in a broad range of application domains. The scientific community in the area is accordingly drawn from many different existing disciplines, driven in the near term by the immediate demand and limited success of applying data science methods and tools in application areas such as information technology, communications and financial markets. These early successes have led to a demand for data scientists in a whole range of industries, from drug discovery to healthcare management, from manufacturing to enterprise business processes, as well as in government organizations, with the expectation that they can perform “data-driven” tasks such as creating mathematical models of data, identifying trends and patterns using suitable algorithms and presenting the results in an effective manner. However, there is also a growing realization that scientific knowledge is not enough for data scientists, who must also demonstrate awareness of ethical responsibilities in their work and outcomes.

The goal of the doctoral program is to teach students the knowledge, skills and awareness required to perform data-driven tasks and, using this shared background, to lay the foundation for research that expands the boundaries of knowledge in Data Science. To achieve these goals, the graduate program is structured as a set of three key requirements related to coursework, examinations and dissertation compliance. The course preparation consists of breadth and depth requirements of 48 units taken for letter grade and 4 units of satisfactory completion of professional preparation courses. After a required preliminary advisory assessment at the end of the first year, the examination requirements consist of a research qualifying examination and a dissertation defense examination. The dissertation compliance requirement is an approved thesis document that specifically meets reproducibility requirements. The implementation plan is designed to open the program for internal transfers in Fall 2021, with a formal announcement and new admissions starting Fall 2022.

Ph.D. in Data Science, November 30, 2020 Version 4.1 2 | Page

Table of Contents

Executive Summary
1 Introduction
  Program Goals
  Historical Development of the Field
  Rationale and Justification
  Timetable for Development of the Program
  Relationship of the Proposed Program to Existing Program on Campus
  Contributions to Diversity
  Relationship of the Proposed Program with Other UC Campuses
  Program Administration and Resource Planning
  Plan for Evaluation of the Program
2 Program
  Undergraduate Preparation for Admission into the Doctoral Program
  Admission Requirements
  Foreign Language
  Overview of the Proposed Doctoral Program
  Plan of Study
  Unit Requirements
  Structure of the Proposed Graduate Program
    Course Numbering Schema
    Required and Recommended Courses
    Program Structure
    Student Advising Information
  Field Examinations
  Qualifying Examinations
    Preliminary Assessment
    Research Qualifying Examination, or the UQE
  Thesis Requirements
    Generalization and Reproducibility Requirements
  Final Examination
  Explanation of Special Requirements Over and Above Graduate Division Minimum Requirements
    Generalizability and Reproducibility Requirements
    Rotation Training Program
  Relationship of Master’s and Doctoral Programs
  Special Preparation for Careers in Teaching
3 Projected Need
  Student Demand for the Program
  Opportunities for Placement of Graduates
  Importance to the Discipline
  Ways in which the program will meet the needs of the society
  Relationship of the program to research and/or professional interests of the faculty
  Program Differentiation
4 Faculty
5. Courses
6. Resource Requirements
7. Graduate Student Support
8. Governance
9. Changes in Senate Regulations
Appendix A: Listing of Research Areas
Appendix B: Letters of Support Solicited
Appendix C: Catalogue Copy Description [Draft]
  Data Science (DSC)
  The Graduate Program
  Data Science Program
    Course Requirements
    Preliminary Advisory Assessment
    Research Qualifying Examination (UQE)
    Dissertation Defense Examination
  Student with Disabilities
Appendix D: HDSI Bylaws
Appendix E: Faculty Vitae


1. Introduction

1. Program Goals

The goal of the doctoral program is to train graduate students who will advance the field of Data Science. The doctoral program is an integral part of a greater mandate of the Halıcıoğlu Data Science Institute (HDSI) to create a community and ecosystem of self-identifying data science researchers and practitioners. The resulting discipline of Data Science consists of a community of recognized researchers and a common interest in questions such as: What problems are considered important? What solution methods are considered legitimate, valid or useful? What verification regimes are considered essential to assess a proposed solution?

The nascent discipline of Data Science currently draws researchers from diverse fields that share a quantitative intellectual tradition, from mathematics, computer science, engineering, physical sciences, and quantitative social sciences. Naturally, the proposed program builds upon intellectual traditions and training in disciplines that currently constitute the primary drivers of knowledge advances in Data Science: computer science, mathematics, statistics, and electrical engineering. While these are the disciplines from which the majority (but not all) of the current HDSI faculty are drawn, we are keenly aware of the importance of various use cases that are principally driving adoption of data science advances in practice, including engineering, medicine, governance, journalism, and archeological discovery. More importantly, beyond use cases, scholarship in Data Science must also demonstrate awareness of ethical responsibilities for the direct role data has on our social, cultural and personal lives.

This places a significant academic challenge on the HDSI faculty to create a program with a well-defined intellectual core that invites and cultivates diverse intellectual traditions. Such a program cannot simply be a collection of diverse existing topics, multiple courses and degrees in sciences and humanities stacked on an individual, or a specialization of an existing program. Instead, a streamlined and integrated approach to curriculum is needed that is accessible to students drawn from different undergraduate degree backgrounds. The HDSI faculty have addressed the challenge of program accessibility in its Master of Science (MS) program recently approved by the Graduate Council of the Academic Senate. Building upon the MS/DS program, the doctoral program is structured to cultivate both a generalist’s penchant for persistence in results validated by proofs and robust experimentation, and a specialist’s view of practical impact validated by real-world demonstrations, user studies and trials.¹

1 David Epstein in “Range: Why generalists triumph in a specialized world” (Riverhead Books, 2019) on the importance of broad thinking and diverse experiences.


To achieve these goals we outline here a broad and equitably-accessible program of graduate study that clearly articulates core knowledge and skills to be expected from our graduates while ensuring success of students drawn from diverse educational backgrounds. We seek to achieve these via well-articulated pathways through existing and new courses as well as on-ramp courses, with the financial support necessary to ensure a diverse talent pool consistent with aspirational goals of UC San Diego as an academic institution.

The educational objectives of the proposed degree program include knowledge and skills expected of all Data Science graduates, namely: (a) collect raw data from various sources and convert this raw data into a curated form suitable for computational modeling and analysis (e.g., its use in designing experiments); (b) understand learning algorithms and how to appropriately use them in a given domain by developing effective optimization methods; (c) interpret the results of these algorithms and iteratively drill down into the data, perform analysis, visualize results and carry out scientific enquiry appropriate for the targeted domains.² These educational goals are to be achieved through required courses structured into breadth, depth and elective groups. A successful completion of the doctoral degree program will require a demonstrated advance in the state-of-the-art in data science evidenced through traditional means of academic research success: peer reviewed publications, software (tools) or system artifacts and evidence of generalizability and reproducibility documented in a well-written and approved dissertation document. These requirements are discussed in detail in Section 2.10.

A successful execution of the proposed program also induces imperatives and resource commitments by the Institute that are discussed later in this document. These include effective partnership with academic units, institutes and centers for maximum exposure of potential domain experts to graduate student training including any rotation programs, necessary teaching capacity by HDSI faculty for timely graduation of students, and essential advising and counseling services for the students to appropriately guide them towards graduation and into post-graduation careers.

2. Historical Development of the Field

Scientific and engineering advances have given us a better understanding of the physical world, material and structural properties and their use in accomplishing primarily physical work more effectively and efficiently. These advances through new measurements and models have also given us insights into the living world and the processes of life, mind and the intellect. In doing so, these advances are incorporating human knowledge from social sciences, humanities, natural and life sciences into a greater understanding of us and the world around us.

2 These conclusions have been arrived at through discussions within the HDSI faculty council informed by national debate on this subject organized in a series of meetings by the National Academy of Sciences, Division of Engineering and Physical Sciences under the “Roundtable on Data Science Post-Secondary Education”, 2016-17.

Data, collected or synthesized, is the primary means of such knowledge exploration and integration. Historically, data analysis has been a domain of Statistics, a field that spans multiple centuries. The growth of scientific enquiry, especially through the nineteenth-century post-Napoleonic period of quantitative scientific discovery, relied upon calculus and probability to understand measurement data. Statistical analysis spread widely to many areas of human enquiry, in particular areas of social sciences such as economics and clinical psychology. These efforts contributed to significant growth in statistics.

Statistics departments are now common in most universities. At UC San Diego, while there is no department of Statistics, statistics faculty are part of the Department of Mathematics and HDSI on the General Campus, and also of the Division of Biostatistics in the School of Medicine. As Statistics matured with strong foundational results and practical methods influencing many application domains, more recent advances in computing hardware, software, engineering of sensory devices, etc. enabled volumetric growth not only in data but also in the computational means to handle such data. Recent advances in algorithmic processing and machine learning have significantly advanced computational means for data processing. Early efforts in defining computational means of handling large data sets and streams placed a new field of Data Science at the intersection of statistics and computer science,³ while others characterized it as a growth area of Statistics with strong applications focus.⁴,⁵

While strong footprints of Computer Science, Mathematics and Statistics can be seen in its origins, Data Science has emerged as a discipline in its own right to define either the core problems of the sciences and society, or fundamental theories and underlying methods and tools to solve these problems. Many of these problems concern reasoning, spanning intellect and knowledge domains that are assisted by computing machines, thus referred to as machine intelligence or artificial intelligence (AI). An independent and rich tradition in signal processing, information theory, detection and estimation theory from Electrical Engineering has contributed significantly to modern automation methods in AI. While AI has caught the imagination of computer scientists and mathematicians since the early years of computing machines nearly half a century ago, technological advances have only recently made it possible to realize answers to some of the pressing questions such as:

● How do we automate routine tasks without violating human autonomy of thought or conduct?

3 David Blei, Padhraic Smyth, “Science and Data Science”, PNAS, August 2017.
4 Bin Yu, “Let us own Data Science”, IMS Presidential Address, October 2014.
5 David Donoho, “50 Years of Data Science,” Journal of Computational and Graphical Statistics, Dec 2017.


● How can we incorporate machine intelligence into decision processes that are currently purely human, and thus transition from purely human decision making to combined human/machine decision making?

As we are beginning to provide answers to such questions, typically in the form of new software and systems in various application domains such as improved automated diagnostics from radiological images, we are beginning to face an entirely new set of sophisticated questions such as:

● Anticipatory Awareness: How do we integrate algorithmic decision making into political, social, and economic institutions in a way that anticipates how the algorithm itself might change the incentives/behavior of individuals and cause negative and positive feedback loops?

● Artificial Sentience: What are the ethical, moral and business considerations when an algorithm learns by observations and produces new products and services? Who are the ultimate beneficiaries of these intellectual or material products: for instance in a healthcare setting, is it the patient or the doctor being observed, the business creating new services or the machine itself?

The list of such questions is mind-boggling and touches pretty much every area of human enquiry.⁶ As an academic institution, fortunately, our focus is limited to how knowledge advances in the emerging domain will be achieved and how we will create a talent pool for the emerging area. As mentioned earlier, the academic areas that have made the most early advances in methods, tools and systems to perform such data analysis are statistics and computer science, especially in the context of understanding brain and cognition. Such talent is typically found in the departments of electrical engineering and computer science, cognitive science, as well as mathematics and statistics in natural sciences.

New sensing, data-collection and computing devices have also brought together these domains, enabling practitioners to relax assumptions about the nature of the process that generates the data and use real-life datasets instead. Thus, the analysis methods can be directly interfaced with real-life systems to actually capture and analyze real data (and sometimes in real time as well). Without the axioms underlying data generation processes, the mathematics and statistics required to arrive at robust answers analytically become exceedingly complex. It is also precisely in these circumstances where computing steps in and provides us computational models and solutions that can deliver practical answers. Yet neither mathematical analysis nor computational models can provide generic answers as problem-solving methods (and tools) that individual applications can use, because in the absence of a broadly applicable axiomatic framework, the very success of these methods and tools depends upon the structure, dynamics and meaning of the actual data. In this environment, to make real progress and demonstrate impact, it is essential to work closely with the scientists, engineers and social scientists (the “domain experts”) before the real problems are understood, articulated and solutions devised to make an impact.

6 Recently, a number of attempts to define Grand Challenge problems in the Data Science area have been made. Prominent among these are essays by Jeannette Wing, Bin Yu, and Xuming He & Xihong Lin. https://hdsr.mitpress.mit.edu/pub/d9j96ne4/release/2

In this context, UC San Diego provides a rich tapestry of domain experts starting with perhaps one of the most complex of application domains (the human brain and mind) and spread across the triumvirate of general campus, health and marine sciences. Over the past four years the Institute leadership has engaged deeply with a broad community of nearly 500 researchers across the campus through many meetings in small group settings. These efforts yielded a core group of founding faculty who came together and organized their ideas in data science. There are now over two hundred faculty affiliated with HDSI, drawn from all schools and divisions and nearly all departments, who participate in various HDSI events, including its weekly Friday seminars during the academic year. HDSI affiliates are organized into six research clusters shown below and on the HDSI website under Research:

● Data Science Theory: Researchers in this cluster work on theoretical foundations of Data Science, design machine and statistical learning algorithms with provable guarantees, and develop methods and tools for practitioners that are broadly useful in combating the “deluge” of data caused by ever growing sources of data. Researchers with core expertise in algorithms, mathematics, and statistics work with domain experts in areas where there is a perceived benefit to collecting large amounts of data. The constant interplay between the particulars of a domain and generality of methods is essential to the advances we seek in algorithmic data sciences. [https://datascience.ucsd.edu/research/theory-cluster/]

● Enabling Discovery: Researchers in this group are drawn together from the ongoing Center for Computational Mathematics that administers the campus-wide Computational Science, Mathematics and Engineering (CSME) graduate program. With the rise of big data, the CSME area has evolved into Data-enabled Computational Science that seeks to advance and make available integrated approaches to massively parallel computation, from architectures to algorithms, as building blocks for scientists and engineers. [https://datascience.ucsd.edu/research/discovery-cluster/]

● Education and Curricula Design: The goal of the Education cluster is to enable training of students in methods and tools of Data Science regardless of their majors or degree program. We seek to enhance skills of our graduates in experimental design, hypothesis testing and data analysis by offering courses, online and in person, that provide opportunities for significant hands-on learning experience. [https://datascience.ucsd.edu/research/education-cluster/]


● Quality of Life: The researchers in this cluster span areas of health sciences that rely on large data sets such as precision health imaging, pharmaceutical data sciences, cancer cytogenomics and immunogenomics, cancer biology, and medical and population genomics. [https://datascience.ucsd.edu/research/life-cluster/]

● Cross-Cutting Areas and Systems: Researchers in this cluster address infrastructural needs of cloud computing, telecommunication networks, data-driven system design, data visualization, scientific workflows, and data science in art. [https://datascience.ucsd.edu/research/systems-cluster/]

● Data Science in Society: This focus group aims to develop advanced geospatial tools and research methods for scalable analysis of satellite data. The group will develop workflows that combine machine learning, remote sensing and crowdsourcing tools to map our changing world and to address many of the world’s greatest challenges. The group will identify multidisciplinary research domains that would utilize remotely sensed data to address one or more of UCSD’s research themes. [https://datascience.ucsd.edu/research/society-cluster/]

Each cluster consists of a number of interested groups where the researchers and practitioners come together for joint research efforts in response to various research funding opportunities, engagements with the industry, etc. HDSI provides personnel and material support to the entire community for both proposal preparation and industry engagements. A complete list of 44 different research areas covered by the HDSI affiliates is available on the website.

Over the past two years, HDSI has recruited core faculty members and drawn in a number of existing faculty through partial appointments to build an active governing body, the Faculty Council. As of this writing the faculty council consists of 11 full-time faculty members, 13 partially-appointed faculty members and 24 formally appointed faculty with no teaching responsibilities at HDSI (i.e., the so-called 0% appointments). A complete list of faculty members and their specializations is provided in Section 1.8.

Over the past year, the HDSI faculty council has worked to identify broader research challenges that are central to Data Science as a field. A compilation of all these efforts reduced the core areas of research to the following eight research themes, listed below, which form the scope of the HDSI academic programs and continue to drive our faculty recruiting strategy.

Core Theme | Brief Description
1. Artificial Intelligence | AI is about automating the decision processes, augmenting, complementing or amplifying decision making means. Challenging problem areas are related to finding and learning good representations of knowledge with connections to human cognition.
2. Machine Learning | Computing machines, from neuromorphic to cloud processing, have energized algorithms and CS theory that enable computing systems to learn from data and create examples and counter-examples. ML covers various learning methods (reinforcement, transfer, resource-constrained, deep learning), architectural acceleration and algorithms for game theoretic setups, natural language processing, etc.
3. Data Infrastructure | This theme covers the entire gamut of machines and systems that enable us to curate, organize, visualize and navigate large datasets and identify structure in such data; and the design, deployment and security of systems and their software stack, including new programming paradigms, languages and methods.
4. Mathematical Foundations | Mathematical foundations span areas of probability theory, statistics and applied mathematics that are used to address current and pressing challenges of data science such as causal inference, non-parametric data analysis, compressed sensing, multiple hypotheses testing and submodular optimization, etc.
5. Digital Humanities | Digital humanities is an umbrella term that in the UCSD context spans social sciences, arts and humanities. The research topics include privacy, public policy, ethics, computational social science, computational linguistics, and philosophy of information.
6. Systems and Applications | A large and heterogeneous group of topics from algorithms and demonstrable systems, their use in specific domains from medical signal processing, economics, geospatial data systems to political and economic systems as well as applications in cyber-physical systems and robotics.
7. Scientific Discovery | Data-driven scientific advances through new instruments and analytics, especially for high-throughput biology and chemistry that enable us to understand the natural world as well as life processes.
8. Healthcare/Medicine | Research in this theme focuses on how data can be harnessed to improve human health and well-being. The goal is to develop technical contributions (i.e., theory, methods, hardware, software) to drive progress in areas such as: clinical decision support, precision medicine, clinical trial design, medical image processing, pharmaceutical development, bioinformatics, and genomics. (https://www.mlforhc.org/, https://www.nature.com/npjdigitalmed/)

3. Rationale and Justification

There are three key reasons for offering the new degree program, each of which is ultimately tied to the mission of HDSI to be the hub for data science at UC San Diego: (a) to develop the nascent discipline of data science; (b) to meet the demand for specialists in data science areas by academia and industry; and (c) to catalyze data science research by building an ecosystem of partnerships in research and teaching across UC San Diego. As a core subject area of the Halıcıoğlu Data Science Institute, the Data Science doctoral program is a natural evolution of the undergraduate and (pending) master’s program in Data Science and a key vehicle for the academic research conducted by its full and jointly-appointed faculty members. Beyond HDSI being the academic home for Data Science, a practical reason for launching this program now is to satisfy the growing demand for a graduate program in data science, both internally and externally. Launched in 2016, the undergraduate Data Science major is already the 8th largest major at UC San Diego, on par with Electrical Engineering, despite being under an enrollment cap. In a survey of our students graduating from the Data Science major, roughly a third of them have indicated their interest in a graduate degree program in Data Science. In 2019, nearly half (1571) of the more than 3000 applicants to various graduate degree programs at UC San Diego who indicated interest in Machine Learning and Artificial Intelligence related topics in Electrical Engineering, Computer Science and Cognitive Sciences directly indicated their interest in Data Science programs at HDSI.

In short, the demand for doctoral training in Data Science is high and continues to grow with the need for future leaders in Data Science both in research and education. Further, the program not only serves the HDSI mission of educating talent in the area of Data Science, but also serves as a vehicle for continued engagement and proliferation of Data Science training across various graduate programs through foundational, core and elective course offerings that engage domain experts into the field of Data Science (see Section 1.5 on HDSI Strategy for Partnership with Other Academic Units). The program provides an excellent means to create new educational opportunities for students, especially for underserved and economically-disadvantaged student populations who can benefit from graduate scholarships offered by HDSI as a part of its core foundation-supported activities.

4. Timetable for Development of the Program

The HDSI faculty council kicked off discussion on the graduate program in Summer 2019 with a formal presentation hosted by Yian Ma and Gal Mishne at the two-day Faculty Retreat on September 16-17, 2019. The faculty resolved to proceed with the plans for a Ph.D. program and formed a faculty committee for defining and developing the PhD program in Fall 2019, headed by Professor Gal Mishne. The discussions matured into a proposal that was revised in view of the Graduate Council feedback on a concurrent proposal by HDSI for a Master’s program in Data Science. The two proposals were closely coordinated due to the graduate courses (being offered and planned for future) that are common to both degree programs.


We plan a two-phase launch of the Ph.D. program, with internal transfers (from other degree programs) beginning Fall 2021, followed by a formal announcement and launch beginning Fall 2022 with a general admission deadline of January 15, 2022. Initial enrollment is estimated to be 5-10 students, with approximately 10-15 new students per year, ramping to 20-25 new students per year at steady state for an average of 4-5 students per faculty FTE. We plan to conduct a review of the program and its outcomes after the first three years of operation as a part of our re-assessment of capacity and any enrollment changes.

Needs assessment and faculty discussions in HDSI | Summer’19-Winter’20
Administrative Review and Routing | Fall 2020 (October 2020)
Proposal submitted for UCSD Graduate Council, Reviewed by mid-December 2021 | November 30, 2020
Revised proposal submitted | January 6, 2021
UCSD proposal submission to CCGA | March 2021
CCGA approval received | Early Spring 2021 (early May)
UCOP approval received | Spring 2021 (June)
Program open for admissions (internal transfers only) | Summer 2021 (July)
Program announced for new admissions | Early Fall 2021 (Application Deadline Jan 15, 2022)
Admission of first class announced | Spring 2022 (April)
Orientation and student advising | Summer 2022
Program offered and courses begin | Fall 2022

5. Relationship of the Proposed Program to Existing Programs on Campus

As part of a transdisciplinary field, Data Science courses necessarily intersect with programs in computer science, electrical engineering, mathematics and cognitive sciences, which are among the closest and founding partners of the HDSI. As attested in letters from chairs of these departments and divisional deans, these intersections discussed below are seen by the departments and HDSI faculty as a strength that makes the proposed degree program unique in bringing together the very best UCSD has to offer educationally. In what follows, we first briefly describe HDSI’s approach to building partnerships with other academic units on campus and our approach to pedagogy in a resource-optimal way.

How do we partner with other academic units? Over 200 founding faculty as a part of HDSI Faculty Affiliate program are a starting point for a deeper engagement with HDSI and its governance. HDSI's relationships with other academic units are governed by jointly appointed faculty members on the HDSI Faculty Council with specific roles in keeping campus units apprised of HDSI plans and progress through its weekly meetings. HDSI programs are overseen by standing committees of the HDSI Faculty Council. Before hiring any of our core full-time or part-time HDSI faculty, it has been the role of the HDSI Faculty Council to define and develop both the Data Science curriculum and research directions. The HDSI Faculty Council now consists of faculty drawn from all across UC San Diego: the home departments of Council members are in Engineering (e.g., Computer Science, Electrical Engineering), Physical Sciences (e.g., Mathematics, Physics), Arts & Humanities (Philosophy, Visual Arts), Social Sciences (e.g., Cognitive Science, Communication, Political Science), Medicine (e.g., Biostatistics and Bioinformatics, Radiology, Pediatrics), the Scripps Institution of Oceanography, and the Supercomputer Center. With such a diverse background to draw upon, the HDSI Faculty Council has managed to create a unifying vision for Data Science, and to steer the Institute towards a future that is based on interdisciplinary collaboration with all units on Campus.

Following senate regulations, the Faculty Council has developed a detailed set of Bylaws [attached with the proposal] to facilitate the governance and growth of HDSI as an academic unit. The HDSI Faculty Council remains open to new faculty interested in joining HDSI via a well-defined review, advise and consent process. Using this governance structure, HDSI has successfully conducted six joint searches in 2019, and four in 2020. The Faculty Council currently consists of 48 faculty members:

● 11 faculty members with 100% appointments in HDSI (2 Full Professors, 1 Associate, and 8 Assistant Professors)

● 13 faculty members with joint appointments with another department (Communication, Computer Science and Engineering, Neurobiology, Bioengineering, Mathematics, Political Science, Biostatistics, Philosophy)

● 24 faculty members with current or proposed 0% appointments in HDSI. These are among the original faculty council members who have guided recruiting. All of them will eventually transition to 0% appointments with HDSI as the ongoing process completes.

We note that in its proposed three-year hiring plan, HDSI has requested the largest number of joint searches among all divisions on the general campus. Partners of HDSI can be found in almost all units on campus.

Program Engagements: The proposed doctoral program lists a number of new core courses that are also part of our MS program and designed to be broadly accessible. Some of the proposed courses will be taught by faculty in other departments, and thereby cross-listed with courses outside HDSI. When and if relevant courses are available in other UCSD departments, students will be encouraged to enroll in them. Notably, HDSI has partnered with the Computer Science and Engineering Department to create an Online M.S. program that was recently approved by the Graduate Council, and that will serve as an excellent on-ramp preparation for a few selected students into the Ph.D. program. Indeed, HDSI currently offers scholarships to students that cover the cost of attending online courses taught by HDSI-affiliated faculty.

Among the related programs that partially cover some of the topical areas of the PhD/DS program are the doctoral programs in Computer Science by CSE, Electrical Engineering by ECE, and Statistics by Math. More precisely, there are specializations of these programs that feature elective courses on machine learning and on statistical learning and inference. The proposed program is directly and entirely dedicated to Data Science and differs from existing programs in two material ways: in the breadth of the student population it serves and in the scope of the transdisciplinary training it provides, as discussed below:

(a) The proposed degree is targeted to students drawn from a wide variety of backgrounds in their undergraduate education in an effort to serve a diverse group of learners interested in Data Science. This is in contrast to existing programs that either target a different population of students or focus on subject areas specific to their domain. For instance, a doctoral degree in Computer Science aims to admit students “with a strong academic background in computer science and engineering and/or a related field.” Students in the PhD program must select four courses from ten different breadth areas that include Artificial Intelligence and Robotics. Similarly, the Machine Learning and Data Science (ML/DS) specialization in the ECE Ph.D. program is one of 13 specializations, and one of the three “impacted” programs, that is, capacity controlled areas along with Circuits and Robotics that are restricted to ECE students with a required Bachelor’s (and optionally MS) degree in Engineering, Sciences or Mathematics. Mentored by notable researchers in the areas of information and coding theory, statistical signal processing, robotics and controls, the doctoral specialization provides deep insights into intellectual underpinnings for data analytics and machine learning and its various application domains. Students in the Ph.D. program are required to meet 48 units of course work structured into three sets of courses that cover basic knowledge of programming, linear algebra, probability and statistics, a set of required courses and another set of technical electives. The nature of these courses as well as CSE courses, and their coordination with HDSI courses are discussed further below. Among other programs, the department of mathematics offers Ph.D. degrees in mathematics with specialization either in Computational Science (CSME) or Statistics.


Admission to the Ph.D. programs in Mathematics requires a B.S. degree in Mathematics or a strong background in mathematics with demonstrated completion of a full sequence of courses in calculus, differential equations, linear algebra and a year’s sequence in both abstract algebra and real analysis. Both specializations require completion of 48 units of courses in the core curriculum in mathematics and 24 units of specialization in topics related to analysis, probability and statistics, numerical optimizations, and applications of statistical methods to Bioinformatics. The students are required to pass two written qualifying examinations; typical choices for the latter are Mathematical Statistics and Real Analysis. Finally, the division of Biostatistics and Bioinformatics of the department of Family Medicine and Public Health (FMPH) in the Herbert Wertheim School of Public Health offers a PhD degree in Biostatistics drawing upon courses and instructors from the departments of Mathematics, Computer Science and FMPH. The program requires 68 units, a vast majority of which are required in Mathematical Statistics, Biostatistical Methods, and life science applications with one elective course drawn from Biostatistics, CS or Mathematics.

(b) The courses offered by the HDSI graduate programs will be available to the students in related programs, and in fact, will be taught by faculty jointly appointed with other departments representing domain knowledge. For instance, a faculty member jointly appointed with HDSI and Bioengineering is planning to offer a graduate course in Biomedical Data Analysis. Such a course will constitute a core requirement in the Bioengineering graduate program as well as a HDSI cross-listed elective course for students with background and interest in biology and engineering. Thus, by serving as a catalyst for creation of new courses and student support, HDSI seeks to enhance the overall capacity of UC San Diego in serving educational and training needs of a growing population of students whose interests extend beyond the offerings of the existing programs.

Impact on Existing Programs: We do not anticipate any adverse material impact on existing PhD programs in CSE, ECE, Mathematics or Cognitive Science. Their respective specializations in Artificial Intelligence, Machine Learning/Data Science and Statistics are part of much larger PhD programs among a dozen or so other specializations and are heavily oversubscribed by students in their respective departments with class enrollments routinely over 200 students.

On the contrary, we expect a positive impact from increased participation of various academic units in Data Science related subject areas (such as Computational Biology, Computational Chemistry, Computational Social Science) that are currently inaccessible to graduate students from other departments despite their need and demand, an assessment supported by a letter from the dean of the division of social science. The proposed PhD/DS program expands the pool of applications by enabling students from diverse backgrounds such as Economics, Cognitive Science, Biology, etc. to consider a research career in Data Analytics or its application to their own domains.

There should also be a positive impact on certain specialization area courses offered by the partner departments: some of the data science graduate students will increase enrollments in these classes, most of which are cross-listed with DSC courses and/or taught by HDSI or jointly-appointed HDSI faculty in the Math, CSE, ECE departments. We expect to see a spread of data science graduate students across half a dozen or so available area specializations.

6. Contributions to Diversity

Our vision for how the proposed program will advance UC’s goals for diversity is informed by the founding document “HDSI Strategy for Inclusive Excellence.” It is a living document available online at https://bit.ly/HDSI-Diversity that will be updated with information available from Diversity Dashboard and our surveys.

The nascent nature of the HDSI organization provides us with additional flexibility to incorporate Equity, Diversity and Inclusion (EDI) goals into the DNA of the new institution, that is, to embed these goals in all our processes and actions ab initio. In particular, the three core tenets of “access & success, climate and accountability” are vigorously pursued. The Institute has taken three concrete steps towards EDI goals that will directly impact its programs -- including the proposed graduate program -- in the coming years. First, HDSI faculty recruiting is carefully planned with anti-bias training required from the entire faculty council before any faculty search is initiated. Second, every faculty member hired into HDSI has been provided with $30K in support of EDI goals. These funds are held by the Institute and released for specific and approved activities that advance diversity goals. The faculty members are encouraged to pool these resources and seek additional matching resources from the Institute to launch substantive programmatic activities.

Beyond this “access and success” part, the third element of HDSI strategy directly addresses climate and accountability. HDSI has proposed and eventually succeeded in using its endowment resources to identify and recruit a full-time coordinator for broadening participation (https://bit.ly/HDSI-BPC). While a permanent position is pending and yet to be created, working with the administration we have been able to recruit Saura Naderi, who has been tasked full-time with developing measures, establishing metrics and directing the activities related to EDI goals.7

The personnel and EDI share-pool mentioned earlier empower HDSI faculty to put into action concrete plans, including seminars, enrichment activities and additional counseling, that are implemented, tracked and accounted for. With increased direct attention to BPC (broadening participation in computing) plans in research proposals to agencies such as the NSF, HDSI’s broadening participation plan provides a sustainable institutional mechanism and support for the HDSI community. Under the leadership of Saura Naderi, HDSI has created a DEI council for the faculty and staff. The DEI council meets weekly, and has started to work on projects such as understanding racial influences and bias. It has also created a number of projects that are in the early stages of planning and execution. Prominent among these are the “Pathways to AI” outreach program, and an educational trial program in Chula Vista Middle School. These and other projects are discussed and launched by the DEI council that will be advising faculty in terms of promoting inclusion and equity. Under the stewardship of the DEI council, we will be creating efforts where faculty can participate using their funding.

7 https://datascience.ucsd.edu/about/dei/ website provides a starting point for engagement with HDSI personnel who are dedicated to achieving EDI goals of the Institute.

Beyond the three elements of the HDSI EDI strategy mentioned earlier, the proposed doctoral program will also provide an excellent vehicle for deploying our fellowship support to encourage URM participation as well as making the program accessible for academically strong but economically disadvantaged students to ensure the program provides an affordable pathway for a broad and diverse student population. Among the programs we have already devised and launched are scholarships for graduate students (a commitment of $600K for the current year,8 and likely to rise in coming years), as well as access to learning outside the classroom through mechanisms such as EdX micromasters programs where the Institute offers financial support to all students interested in taking on-ramp classes at their own pace. This on-ramping is a critical element of our strategy to leverage the existing micromasters program. It enables students from different backgrounds, who may otherwise be rejected from the doctoral program, to demonstrate that they can do well in the program at low or no cost through our scholarship to undergraduate students across the campus.

Beyond strategic decisions and choices to appoint personnel and devote resources, we are cultivating a climate for faculty to conceive of new ideas that directly contribute to inclusive excellence. Among the measures that we seek to improve are the participation rates of women and underrepresented minorities in our classes and degree programs, retention rates, progress towards graduation and placement results. Institutional role models and mentoring are key means that are already implemented in the HDSI foundations and shall remain a cornerstone of HDSI faculty recruiting and leadership advancement.

7. Relationship of the Proposed Program with Other UC Campuses

Given the historical development of the field discussed in Section 1.2, Data Science as a doctoral subject area is often led by departments of Statistics or Operations Research on campuses where such departments exist, in cooperation with Computer Science or EECS departments. UC San Diego’s Ph.D. program in Data Science will be the first such program on a UC campus joining similar programs nationally.

8 Please see pages 60-61 of our annual report at http://bit.ly/HDSIfirstyear.

To meet the growing scope and demand for data science, a number of UC campuses are offering or planning to offer undergraduate and graduate programs in data science while at the same time building new academic departments and schools of data science to support the nascent field. Prominent among these is UC Berkeley’s Division of Computing, Data Science and Society (CDSS) consisting of the departments of Statistics, Electrical Engineering and Computer Science (which is jointly part of Engineering and CDSS), and the School of Information. CDSS at UC Berkeley provides specialization in the form of a “Designated Emphasis in Computational and Data Science and Engineering” to existing Ph.D. programs through curriculum specialization in the individual Ph.D. programs.

At UC Davis and UC Irvine, the departments of Statistics have taken leadership in data science as a part of their existing degree programs in Statistics, especially due to organizational structures of the underlying departments (such as Statistics being part of a School of Information and Computer Science at UCI). While the current focus in these emerging academic units is on bachelor’s and master’s programs, new doctoral programs and specializations are beginning to appear.

To a first order, none of the specializations of existing Ph.D. degree programs on other campuses prepare students for a research career in Data Science, an important objective of HDSI’s proposed doctoral program. More importantly, no program provides the wide accessibility to students from diverse educational backgrounds to Data Science and its applications. Indeed, the structure of the proposed graduate program, consisting of foundation and core courses, makes it possible for HDSI to offer a single program that admits students with training as broad as engineering, sciences, social sciences, business and humanities. The program produces students with a graduate degree in Data Science and well-defined research specializations, depending upon the application domain of the dissertation research, that make it easier for them to pursue different career paths in targeted domains.

Nationally, New York University (NYU) has offered a Ph.D. in Data Science since 2017 with five required courses in programming, probability and statistics, big data information and representation and nine elective courses drawn from the data science areas of machine learning, artificial intelligence and statistics. Columbia University offers a data science specialization of Computer Science, Electrical Engineering, Industrial Engineering & Operations Research and Statistics doctoral programs. Among other notable national programs are Yale University’s Statistics and Data Science program, the PhD in Data Science and Operations offered by the Marshall School of Business at the University of Southern California, and the PhD in Statistics and Machine Learning offered by the Department of Statistics and Machine Learning at Carnegie Mellon University.

8. Program Administration and Resource Planning

The proposed program will be offered by the Halıcıoğlu Data Science Institute (HDSI), established as an academic unit by the UC Academic Senate in June 2018 under a divisional budget model on the UC San Diego campus. The Institute also carries a $75M founding endowment with an annual payout that is expressly dedicated to support the mission of HDSI in training and preparation of Data Science talent by the Institute activities and programs. Due to its founding commitments, the proposal does not require separate new infusion of campus resources.

The Institute faculty, and members of its faculty council, are listed below with their annual teaching workload in HDSI. The Graduate Admissions and Graduate Program are among the standing committees of the HDSI Faculty Council. These committees are supported by a full-time academic coordinator as well as an assistant director of training programs to ensure program operation and academic advising of the graduate students.

Faculty Members with Teaching Responsibilities in HDSI Programs

| Faculty Group | Names and Title | Appointments | Nominal Teaching Workload in Data Science |
|---|---|---|---|
| Full-Time Faculty in HDSI (11) | Mikhail Belkin, Professor | HDSI | 3 courses |
| | Justin Eldridge, Assistant Teaching Professor | HDSI | 6 courses |
| | Aaron Fraenkel, Assistant Teaching Professor | HDSI | 6 courses |
| | Yian Ma, Assistant Professor | HDSI | 3 courses |
| | Arya Mazumdar, Associate Professor | HDSI | 3 courses |
| | Gal Mishne, Assistant Professor | HDSI | 3 courses |
| | Yusu Wang, Professor | HDSI | 3 courses |
| | Babak Salimi, Assistant Professor | HDSI | 3 courses |
| | Zhiting Hu, Assistant Professor | HDSI | 3 courses |
| | Berk Ustun, Assistant Professor | HDSI | 3 courses |
| | Lily Weng, Assistant Professor | HDSI | 3 courses |
| Joint Faculty in HDSI (13) | R. Stuart Geiger, Assistant Professor | Communication & HDSI | 1.5 courses |
| | David Danks, Professor | HDSI and Philosophy | 1.5 courses |
| | Mikio Aoi, Assistant Professor | HDSI & Neurobiology | 1.5 courses |
| | Jingbo Shang, Assistant Professor | CSE & HDSI | 1.5 courses |
| | Benjamin Smarr, Assistant Professor | Bioengineering & HDSI | 1.5 courses |
| | Barna Saha, Associate Professor | CSE & HDSI | 1 course |
| | Arun Kumar, Assistant Professor | CSE & HDSI | 1 course |
| | Yoav Freund, Professor | CSE & HDSI | 1 course |
| | Jelena Bradic, Associate Professor | Mathematics & HDSI | 1 course |
| | Rayan Saab, Associate Professor | Mathematics & HDSI | 1 course |
| | Alex Cloninger, Assistant Professor | Mathematics & HDSI | 1 course |
| | Margaret Roberts, Associate Professor | Political Science & HDSI | 1 course |
| | Armin Schwartzman, Professor | Biostatistics & HDSI | 1 course |
| Fellows & Others (5) | Bradley Voytek, Associate Professor | HDSI Fellow, Cognitive Science | 2 cross-listed courses |
| | Ilkay Altintas, Chief Data Scientist, SDSC | HDSI Fellow, SDSC | |
| | Virginia De Sa, Professor | HDSI Associate Director, Cognitive Science | 1 cross-listed course |
| | Dimitris Politis, Distinguished Professor | HDSI Associate Director, Mathematics | 1 cross-listed course |
| | Rajesh K. Gupta, Distinguished Professor | HDSI Director, CSE | 1 cross-listed course |

In addition to the 29 faculty members listed above (15.5 FTE), the Institute is also planning to fill one teaching faculty (LSOE) and one advancing faculty diversity (AFD) position in the current recruiting season. It anticipates an additional 1-2 new faculty members to join the Institute for a total faculty strength of 16-17 FTE including 3 FTE LPSOE and 8 U18 lecturers.

Together, these provide a capacity of 51-52 courses annually by the current ladder-rank faculty in addition to 5 cross-listed courses as well as teaching by U18 continuing lecturers for a combined total annual capacity of 58-74 courses. The current Data Science undergraduate program accounts for 35 courses/sections per year. Conservatively, the Institute has the capacity to offer 6-10 graduate courses per quarter, which enables it to adequately serve the proposed doctoral program.

Financial Support for Doctoral Students in the program will follow the current campus support model consisting of direct support from the Graduate Division (formerly block grant), Graduate Student Research (GSR) support from grants and contracts, and teaching assistant (TA) employment funding. The Graduate Division support typically corresponds to one year of non-employment based support including a levelized tuition (for resident and non-resident students). The total amount of this support is based on historical enrollment and a function of overall contracts and grant activity and is a part of the annual budgeting process. The current TA support provides for 8-10 TA FTEs who will be drawn from the graduate student pool in Data Science.

To ease the transition process, for the first five years of the program launch, the Institute will set aside 20% of our annual graduate student funding liability in the foundation accounts as a contingency measure to ensure support continuity in the face of any short-term GSR funding shortfall as the Institute faculty ramp up extramural funding support through competitive grants. Thus, we are confident that using a combination of resources from the Graduate Division, GSR, TA and our foundation accounts, we will be able to guarantee five years of funding to every doctoral student in the program. In the steady state, we plan to deploy our fellowship support to encourage URM participation as well as make the program accessible for academically strong but economically disadvantaged students to ensure the program provides an affordable pathway for a broad and diverse student population.


9. Plan for Evaluation of the Program

The doctoral program will be formally evaluated every 8 years like all other UC San Diego graduate programs, in a way that is consistent with senate regulations. This evaluation process will include an external review and UC San Diego graduate council oversight. In addition, as mentioned earlier, the HDSI faculty will perform a mid-flight review after three years focusing on issues such as success in EDI-specific goals, its needs, and a comprehensive evaluation of program placement outcomes. As with the formal evaluation, the internal review process, both annually and mid-flight, will include student feedback and surveys, teaching evaluations, alumni and industry feedback.

HDSI's existing faculty members have significant experience in building, launching and directing graduate programs in the CSE, Bioengineering, Mathematics, Cognitive Science and Biostatistics units. The academic coordinator position was specifically designed with a view towards broadening participation, improving student learning experience and career placement outcomes. The Institute is planning to be physically co-located in the same building with a segment of the Teaching and Learning Commons (TLC) starting 2021. This colocation will present us with additional opportunities for engagement with UC San Diego’s expertise in improving learning experience and outcomes for all our students.

2. Program

1. Undergraduate Preparation for Admission into the Doctoral Program

The HDSI faculty have spent significant time discussing and formulating a plan that enables maximum participation of interested students into the envisioned graduate program. Arguably, this is the chief distinction of the campus-wide Data Science program. We are also keenly aware of our primary obligation to ensure successful and timely completion of the graduate degree program given the significant level of individual and institutional investment in terms of time and resources. Balancing these requirements has required us to structure the incoming stream of students into essentially three broad categories:

1. students who come with preparation in computing and/or information sciences at a level to master algorithmic programming and cloud computing skills;

2. students with preparation in mathematics and statistics at a level to master probability and statistical methodology necessary for meaningful data analysis;

3. students who enter the program from other areas of science that rely upon collecting and analyzing observational or experimental data in order to advance scientific understanding. These are students with a degree in natural sciences such as physics, chemistry, biology, environmental sciences, etc. or coming from a social science background such as communication, economics, political science, psychology, etc. Application examples may be causal inference in economics, assessing statistical significance of a pharmaceutical experiment or psychological treatment, the study of social networks in political science, etc.

We note that these are broad and overlapping categories. Even when students come prepared in both advanced computing and mathematics/statistics, Data Science research problems challenge them to apply these skills meaningfully in diverse applications to advance knowledge.

The graduate admissions process will use text analysis methods to automatically sort and bin admitted students into three pools and thus drive the subsequent advising process, including prior communication to the students regarding their preparation options using online and other offers by UC San Diego and other organizations. HDSI student advising will recruit an advisor dedicated to graduate program advising and will develop pathways for newly admitted students to take specific upper-level undergraduate courses from different areas, in order to solidify their backgrounds when/if there is some perceived weakness.
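As a rough illustration of the kind of text analysis described above, the sketch below assigns an applicant's stated degree field to one of the three pools by simple keyword matching. The keyword lists, the pool labels and the function name `assign_pool` are hypothetical and are not part of the proposal; an actual admissions workflow would likely use richer text features and human review.

```python
# Hypothetical keyword prefixes for the first two pools; the proposal
# does not specify any keywords or matching method.
POOL_KEYWORDS = {
    "computing": ("computer", "computing", "software", "information"),
    "math_stats": ("math", "statistics", "probability"),
}
# Fallback pool for the third category (natural and social sciences).
DEFAULT_POOL = "domain_science"


def assign_pool(degree_text: str) -> str:
    """Return the pool whose keywords match the most words in the text."""
    words = degree_text.lower().replace(",", " ").split()
    counts = {
        pool: sum(any(w.startswith(k) for k in keywords) for w in words)
        for pool, keywords in POOL_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    # With no keyword hits, fall back to the third (domain science) pool.
    return best if counts[best] > 0 else DEFAULT_POOL


if __name__ == "__main__":
    for text in ("B.S. in Computer Science", "M.S. Statistics", "B.A. in Political Science"):
        print(text, "->", assign_pool(text))
```

Because the categories are described as broad and overlapping, a single label per student is a simplification; a fuller version might return the match counts for all pools so that an advisor can weigh them.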

2. Admission Requirements

A Ph.D. degree in Data Science is an advanced degree that prepares students for leadership in data science research in academia, industry or civic organizations. To be successful in this program, the students must have a background in quantitative analysis typically seen in degree programs with substantial mathematical preparation and programming skills. Course work or equivalent experience in programming, calculus, probability and statistics is required.

Admissions requirements for the Ph.D. program are:

● Bachelor’s and/or Master’s degree in a quantitative field such as engineering, computer science, mathematics, statistics, cognitive science, scientific disciplines or quantitative social sciences such as economics or computational social science. Other degree options are acceptable with demonstrated course work or experience in programming, calculus, probability and statistics.

● Undergraduate GPA of at least 3.0 on a 4.0 scale

● College Transcripts

● Optional GRE requirements as per the latest guidance from the Graduate Division.

● Three letters of recommendation.

● Evidence of proficiency for international students: three English proficiency examinations are accepted for graduate study at UC San Diego:

○ The Test of English as a Foreign Language (TOEFL): The minimum TOEFL score for admission is 85 for the Internet Based Test and 64 for the Paper Based Test. Please note the Paper Based Test does not have a speaking component. TOEFL information and forms are available at the TOEFL website.

○ The International English Language Testing System (IELTS) Academic Training exam: The minimum IELTS score is Band 7.0. IELTS registration information is available on the IELTS website.

○ The Pearson Test of English Academic (PTE Academic). The minimum PTE academic score required for graduate admission is overall score 65. Registration and test information is available on the Pearson website.

● A statement of purpose that clearly outlines the motivation, background preparation, any relevant work experience in data science related areas and topical interests for a degree in Data Science. Prospective students will be asked to identify any faculty members whom they would like to seek as a research advisor.
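The minimum English proficiency scores listed above can be summarized in a small lookup, sketched here for illustration only. The exam keys and the function name `meets_english_minimum` are hypothetical; the thresholds are the ones stated in the requirements, and current Graduate Division guidance should be treated as authoritative.

```python
# Minimum scores as stated in the admission requirements above.
# The dictionary keys are hypothetical labels chosen for this sketch.
MIN_SCORES = {
    "TOEFL_IBT": 85,   # TOEFL Internet Based Test
    "TOEFL_PBT": 64,   # TOEFL Paper Based Test
    "IELTS": 7.0,      # IELTS band score
    "PTE": 65,         # PTE Academic overall score
}


def meets_english_minimum(exam: str, score: float) -> bool:
    """Return True if the score reaches the stated minimum for the exam."""
    if exam not in MIN_SCORES:
        raise ValueError(f"unrecognized exam: {exam}")
    return score >= MIN_SCORES[exam]


if __name__ == "__main__":
    print(meets_english_minimum("TOEFL_IBT", 90))
    print(meets_english_minimum("IELTS", 6.5))
```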

3. Foreign Language

A demonstrated proficiency in English is expected for international applicants. Foreign language proficiency is not required for this degree.

4. Overview of the Proposed Doctoral Program

The Ph.D. program consists of the following components, consistent with Regulation 715 of the San Diego Division of the Academic Senate [9]:

● Research rotation requirements, to be completed by taking research rotation courses in at least two laboratories in the first two quarters of the Ph.D. program;

● Formal coursework requirements representing breadth and depth requirements, consisting of 48 units of courses structured in three groups (foundations, core and depth areas), as well as 4 units of professional preparation including the 1-unit HDSI Faculty Research Seminar, 2 units of TA/Tutor training and a 1-unit Research Skills course, to be completed with a Satisfactory grade;

● Completing a preliminary advisory assessment in a technical area of the student's choice, conducted by a committee set by the Graduate Committee (GradCom). This examination is to be completed before the start of the second year. Preliminary examinations will normally be scheduled annually in the Spring quarter through Summer quarter of the first year. The goal of the preliminary assessment is to assess student preparation in background courses and identify any required courses consistent with the planned research area. In rare cases, the assessment outcome may include a requirement to retake the examination. The preliminary assessment must be successfully completed no later than completion of two years (or six quarters of enrollment) in the Ph.D. program;

[9] http://senatestage.ucsd.edu/Operating-Procedures/Senate-Manual/Regulations/715

● Passing a research qualifying examination (UQE) that is conducted by the dissertation committee consisting of five or more members approved by the graduate division as per senate regulation 715(D). One senate faculty member must have a primary appointment in a department outside of HDSI. Faculty with 25% or less partial appointment in HDSI may be considered for meeting this requirement on an exceptional basis upon approval from the graduate division [10]. The goal of the UQE is to assess the ability of the candidate to perform independent critical research as evidenced by a presentation and a written technical report at the level of a peer-reviewed journal or conference publication. The research qualifying examination must be completed no later than the fourth year or 12 quarters from the start of the degree program; the UQE is tantamount to the advancement to Ph.D. candidacy exam;

● Annual review of the progress in the doctoral program by the graduate committee of the HDSI faculty council;

● Teaching requirements including completion of a teacher training course (DSC 599) and a minimum of one quarter of teaching experience at a half-time (50%) appointment as a Teaching Assistant over the course of the degree program;

● Successful defense of the dissertation presentation in a final examination to the doctoral dissertation committee;

● Approved dissertation that must explicitly address the reproducibility requirement. This requirement can be met by providing supplementary online material consisting of code, data repositories, any evidence of use by external parties and/or, where necessary, validated proof of results.
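The two quarter-based deadlines in the components above (preliminary assessment within six quarters, qualifying examination within 12 quarters) lend themselves to a simple progress check. The sketch below is illustrative only: the milestone names and the function are hypothetical, and it assumes progress is counted in enrolled quarters.

```python
# Latest allowed completion, in enrolled quarters, as stated in the list above.
# Milestone names are illustrative assumptions, not official identifiers.
DEADLINES_IN_QUARTERS = {
    "preliminary_assessment": 6,   # two years of enrollment
    "qualifying_exam": 12,         # fourth year, i.e. 12 quarters
}


def overdue_milestones(quarters_enrolled, completed):
    """List milestones not yet completed whose deadline has already passed."""
    return sorted(
        name
        for name, deadline in DEADLINES_IN_QUARTERS.items()
        if name not in completed and quarters_enrolled > deadline
    )
```

For example, a student in the seventh quarter who has not yet passed the preliminary assessment would be flagged, while a student in the sixth quarter would not.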

Time Limits: Assuming a student has no deficiencies and is enrolled full-time in the program, our normative length of time is 3 years pre-candidacy and 2 years in candidacy. Extension of total time from matriculation to degree beyond six years will require petition and approval from the graduate division. HDSI has instituted several mechanisms and incentives to ensure expeditious time-to-degree. These include a full-time graduate student advisor in HDSI Graduate Affairs [11], a preliminary assessment examination and advisory in the first year, an annual review of each graduate student in the Ph.D. program led by the assigned faculty academic advisor of the student, graduate scholarships funded by HDSI foundation accounts to cultivate a culture of excellence in research, and dedicated staff for computing and data curation services to ensure smooth and easy access to necessary experimental platforms.

[10] This exception is stipulated in view of the large number of formally appointed faculty on the HDSI faculty council (at 25% or 0%) drawn from different departments and divisions, thus making it impossible for a student to find an “outside” faculty member in some areas.

[11] Academic and career advising are among the highest profile investments by HDSI and stipulated explicitly as a part of the founding gift agreement for the Institute. We plan to build a portal and services similar to our undergraduate advising to ensure success of our graduate students. (https://datascience.ucsd.edu/academics/undergraduate/resources/)

5. Plan of Study

The program plan will follow Plan A consistent with Regulation 715 of the San Diego Division of the Academic Senate.

Before admission to candidacy for the Ph.D. degree, the student must have passed a preliminary assessment examination conducted by a committee constituted by the Graduate Committee (GradCom) of the HDSI Faculty Council. This committee shall not include any assigned or selected research advisor.

The doctoral dissertation committee, chaired by the academic research advisor, shall be appointed by the Dean of Graduate Studies under the authority of the Graduate Council of the Academic Senate. The committee members shall be chosen from at least two departments, and at least two members shall represent academic specialities that differ from the student’s chosen specialization. In all cases, the doctoral committee will include one tenured or emeritus UCSD faculty member from outside HDSI. In exceptional conditions, a petition may be made to the graduate division for a faculty member with a home department outside of HDSI and a 25% or less appointment in HDSI to meet this requirement. Additional rules per Regulation 715 [12] on the composition and conduct of the doctoral committee shall apply.

[12] http://senatestage.ucsd.edu/Operating-Procedures/Senate-Manual/Regulations/715

6. Unit Requirements

For the conferral of the Ph.D. degree in Data Science, 48 units (12 courses) must be taken for a letter grade and 4 units of professional preparation must be taken for a passing (Satisfactory) grade. The professional preparation consists of 1 unit of faculty research seminar, 2 units of TA/tutor training and 1 unit of survival skills course. Out of the 12 courses, at least 10 must be graduate-level courses; at most two can be upper-level undergraduate courses. 36 units (9 courses) must be completed within six quarters from the start of the degree program.

7. Structure of the Proposed Graduate Program

1. Course Numbering Schema

The course numbering scheme in HDSI reflects its fundamental mission as the hub for Data Science across the campus. Accordingly, course series are structured into groups according to the intellectual (and corresponding organizational) areas where Data Science as a subject intersects with existing topical areas. The first digit (from the left) of the three-digit course number reflects Undergraduate (‘1’) or Graduate (‘2’) course designation. Based on content, prerequisites and credit policies, some courses can be taken by both undergraduate seniors and beginning graduate students. These are colloquially referred to as “mezzanine” courses. The number ‘5’ as first digit refers to teaching-related courses, such as tutor/TA training or credit accounting for graduate teaching activities. The middle digit describes either introductory data science subjects (‘0’), foundational core subjects (‘1’, ‘4’), advanced topics in data science (‘5’) or data science subjects related to domain areas such as life sciences (‘2’), computing (‘3’), society and humanities (‘6’), natural sciences (‘7’) etc. Finally, the last (right) digit reflects a partially ordered sequence of courses on a topical area, starting with ‘0’ for courses whose prerequisites are lower-division courses (for undergraduate courses) or mezzanine courses (for graduate courses).

| Areas Description | UD UG series | Grad series | Notes |
|---|---|---|---|
| Data Management & Data Systems, Data Security | DSC 10X | DSC 20X | Introductory & Mezzanine Courses |
| Computational & Mathematical Foundations | DSC 11X | DSC 21X | |
| Data and Life Science | DSC 12X | DSC 22X | |
| Digital Infrastructure, Computing Systems, Cloud, Cyber-infrastructure, Traditional & non-traditional computing systems | DSC 13X | DSC 23X | |
| Data Science Theoretical Foundations (builds upon lower division DSC 4X series) | DSC 14X | DSC 24X | |
| Applied Machine Learning: Data Mining (incl. graph mining, time-series mining), recommender systems, ML-based vision, deep learning applications, Natural Language Processing | DSC 15X | DSC 25X | Multiple domain-specific ML applications. |
| Arts, Humanities, Society, Policy and Social Sciences | DSC 16X | DSC 26X | |
| Data and Physical, Environmental Sciences | DSC 17X | DSC 27X | |
| Capstone Project Courses | DSC 18X | NA | |
| Special Topics | DSC 19X | DSC 29X | Topics: 291, Projects: 292, Seminars: 293, Rotation: 294, Survival Skills: 295. |
| Directed Research | DSC 199 | DSC 298, DSC 299 | 298: Independent Research; 299: teaching credit |
| TA/Tutor Training | | DSC 599 | |
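The numbering scheme described above can be illustrated with a small decoder. This is a minimal sketch under stated assumptions: the function name, the returned field names and the area labels are illustrative; the labels for middle digits ‘8’ and ‘9’ are taken from the table rather than from the prose; and the middle digit is not interpreted for the teaching-related 5XX series, since the scheme does not define it there.

```python
import re

# First digit: course level, as described in the numbering scheme.
LEVELS = {"1": "undergraduate", "2": "graduate", "5": "teaching-related"}

# Middle digit: subject area. Labels are paraphrased from the text and table.
AREAS = {
    "0": "introductory data science subjects",
    "1": "foundational core subject",
    "2": "life sciences",
    "3": "computing",
    "4": "foundational core subject",
    "5": "advanced topics in data science",
    "6": "society and humanities",
    "7": "natural sciences",
    "8": "capstone projects",
    "9": "special topics and directed research",
}


def decode_course(code):
    """Split a DSC course number such as 'DSC 204A' into its components."""
    match = re.fullmatch(r"DSC\s*(\d)(\d)(\d)([A-Z]?)", code.strip().upper())
    if match is None:
        raise ValueError(f"not a DSC course number: {code!r}")
    level_digit, area_digit, sequence_digit, suffix = match.groups()
    # The area digit is only interpreted for the 1XX and 2XX series.
    area = AREAS[area_digit] if level_digit in ("1", "2") else None
    return {
        "level": LEVELS.get(level_digit, "unknown"),
        "area": area,
        "sequence": int(sequence_digit),
        "suffix": suffix,
    }
```

For instance, `decode_course("DSC 240")` reports a graduate-level foundational core course at position 0 in its sequence, and `decode_course("DSC 599")` reports a teaching-related course with no area.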

2. Required and Recommended Courses

The formal course requirements for the doctoral program build upon the course requirements to earn a Master’s degree in Data Science, with additional requirements related to teaching experience, professional preparation, research rotation and coverage of both breadth and depth subject areas necessary for a successful doctorate degree. This structure rationalizes significant preparation and common knowledge and skills expected of all our students in data science while preparing our students for leadership careers in data science. It does so by leveraging the significant effort HDSI faculty spend to teach and prepare students for core and domain-specialized topics in the master’s program, while preparing them to take advanced courses in chosen depth areas. It also provides a safe harbor for a small minority of students who may not qualify, or who otherwise choose to exit from the doctoral program, to leave with a successfully completed master’s degree without significant additional investment of time in the graduate program.

3. Program Structure

Courses in the Data Science Graduate Program are structured into three groups: Group A, Group B and Group C. Group A courses are introductory-level courses taught at the level of undergraduate senior or mezzanine courses. Group B courses are core graduate-level courses with prerequisites from Group A courses. Group C courses are advanced, specialized and free-standing courses, often part of the required courses in the Data Science specialization of graduate programs in other departments. In all three groups, required courses are indicated as such; they cannot be substituted by other courses without exception approval from the graduate program committee.

Group A: Preparatory Knowledge and Skill Areas [Credit for maximum of 3 courses]

We have identified five important knowledge and skill areas necessary for understanding (and advancing) core data science knowledge. It is, therefore, important that all our entering students either have background preparation or have courses available in the program to ensure a successful completion of the stipulated doctoral degree program:

1. Algorithms and Programming skills: ability to efficiently translate algorithmic knowledge and analysis methods into suitable programming platforms, especially using cloud computing resources.

2. Data organization methods and skills: ability to cast data from raw sources into formats (structured or semi-structured) that are amenable to scalable automated analysis, visualization on various platforms, and data wrangling.

3. Numerical Linear Algebra: knowledge of the underlying mathematics that supports the ability of students to conceptualize transformation operations and convert them into computational algorithms such as Principal Component Analysis (PCA).

4. Multivariate Calculus: the mathematical study of functions of multiple variables, as required in understanding optimization methods such as gradient descent that underlie much of modern machine learning.

5. Probability and Statistics: understanding randomness in data, which is fundamental to understanding the processes that generate data and estimation procedures, as a basis for critical thinking and data analysis; quantifying the accuracy in estimation and model-fitting; performing multiple hypothesis tests; and optimality in estimation and prediction.

Given the breadth of the applicant pool, it is understandable that among the incoming students interested in the Data Science graduate program, there may be some lacking the basic background at the undergraduate level in one or more of the above areas. This would prevent them from taking the relevant graduate-level courses. Accordingly, we have devised five foundational knowledge area courses described in the catalogue copy and listed below. A student can receive credit towards the Ph.D. degree for a maximum of three courses from the list of courses below. We expect that students graduating from quantitative undergraduate backgrounds would have taken a majority of these courses (or equivalent). Students with an undergraduate degree from the Data Science major or a Data Science minor would have taken courses in all five areas mentioned above, thus obviating the need for background preparation.

1. DSC 200: Data Science Programming [New], 4 units: Computing structures and programming concepts such as object orientation, data structures such as queues, heaps, lists, search trees and hash tables. Laboratory skills include Jupyter notebooks, RESTful interfaces and various software development kits (SDKs). Instructors: Aaron Fraenkel, Yoav Freund

2. DSC 202: Data Management for Data Science [New], 4 units: Principles of data management, relational data model, relational algebra, SQL for data science, NoSQL databases (document, key–value, graph, column-family), multidimensional data management (data warehousing, OLAP queries, OLAP cubes, visualizing multidimensional data). Instructors: Babak Salimi, Jingbo Shang, Amarnath Gupta

3. DSC 210: Numerical Linear Algebra [New], 4 units: Linear algebraic systems, least squares problems, orthogonalization methods, ill-conditioned problems, eigenvalue and singular value decomposition, principal component analysis. Instructors: Rayan Saab, Alex Cloninger, Gal Mishne

4. DSC 211: Introduction to Optimization [New], 4 units: Continuity and differentiability of a function of several variables, gradient vector, Hessian matrices, Taylor approximation, fundamentals of optimization, Lagrange multipliers, convexity, gradient descent. Instructors: Yian Ma, Rayan Saab, Arya Mazumdar

5. DSC 212: Probability and Statistics for Data Science [New], 4 units: Probability, random variables, distributions, central limit theorem, maximum likelihood estimation, method of moments, confidence intervals, hypothesis testing, Bayesian estimation, introduction to simulation and the bootstrap. Instructors: Jelena Bradic, Dimitris Politis, Armin Schwartzman

Group B: Core Knowledge and Skill Areas [Ph.D. students take at least 6 courses]

Building upon the foundation courses in Group A, the graduate program identifies several core graduate courses. Four core courses are required for all Ph.D. students, including those with a Bachelor’s degree in Data Science. The four required courses are:

1. DSC 240: Machine Learning [New], 4 units: A graduate-level course in machine learning algorithms: decision trees, principal component analysis, k-means, clustering, logistic regression, random forests, boosting, neural networks, deep learning. Instructors: Misha Belkin, Yian Ma, Jelena Bradic, Gal Mishne, Virginia de Sa

2. DSC 260: Data Ethics and Fairness [New], 4 units: Ethical considerations regarding privacy and control of information. Principles of fairness, accountability, and transparency. Use of metadata to inform algorithms. Algorithmic fairness. Policy issues such as the Fair Information Practices Principles Act, and laws concerning the “right to be forgotten.” Instructors: R. Stuart Geiger, David Danks

3. *DSC 241: Statistical Models [New], 4 units: Linear/nonlinear models, generalized linear models, model fitting and model selection (cross-validation, knockoffs, etc.), regularization and penalization (ridge regression, lasso, etc.), robust methods, nonparametric regression, conformal prediction, causal inference. Instructors: Ery Arias-Castro, Jelena Bradic, Dimitris Politis

4. *DSC 204A: Scalable Data Systems [New], 4 units: Storage/memory hierarchy, distributed scalable computing (i.e., cluster, cloud, edge) principles. Big Data storage, management and processing at scale. Dataflow programming systems and programming models (MapReduce/Hadoop and Spark). [Prerequisite: DSC 202] Instructors: Ilkay Altintas, Mai Nguyen

(*) Depending on academic preparation, a Ph.D. student can take an advanced course on Applied Statistics, such as MATH 282B, instead of DSC 241. Similarly, instead of DSC 204A, a student can take a course on Algorithms, such as CSE 202: Design and Analysis of Algorithms.

In addition, a doctoral student must select at least 2 out of the following 8 core courses [13].

5. DSC 203: Data Visualization and Scalable Visual Analytics [New], 4 units: Commonly used algorithms and techniques in data visualization. Interactive reasoning and exploratory analysis through visual interfaces. Application of data visualization in various domains including science, engineering, and medicine. Scalable interactive methods involving exploring with big data and visualization methods. Techniques to evaluate effectiveness and interpretability of analytical products for diverse users to obtain insights in support of assessment, planning, and decision making. [Prerequisite: DSC 202] Instructors: Ilkay Altintas, Juergen Schulze

6. DSC 204B: Big Data Analytics & Applications: The goal of this course is to introduce the student to the methods and methodologies of big data analytics. Methods covered include: I/O bottleneck and the memory hierarchy, HDFS, Spark, XGBoost and TensorFlow. Methodologies include: writing Jupyter notebooks that can be understood and used by people of diverse backgrounds; replicability and statistical significance. [Prerequisite: DSC 204A] Instructor: Yoav Freund

7. DSC 242: High-dimensional Probability and Statistics [New], 4 units: Concentration inequalities, Markov processes and ergodicity, martingale inequalities, empirical processes, sparse linear models in high dimensions, principal component analysis in high dimensions, estimation of large covariance matrices. [This class may be cross-listed with the Mathematics Department.] Instructors: Jelena Bradic, Rayan Saab

8. DSC 243: Advanced Optimization [New], 4 units: Linear/quadratic programming, optimization under constraints, gradient descent (deterministic and stochastic), convergence rate of gradient descent, acceleration phenomena in convex optimization, stochastic optimization with large data sets, complexity lower bounds for convex optimization. Instructors: Yian Ma, Rayan Saab

9. DSC 244: Large-Scale Statistical Analysis [New], 4 units: Exploratory data analysis, diagnostics, bootstrap, large-scale (multiple) hypothesis testing, false discovery rate, empirical Bayes methods. [This class may be cross-listed with Mathematics and/or Biostatistics.] Instructors: Armin Schwartzman, Jelena Bradic

10. DSC 245: Introduction to Causal Inference [New], 4 units: Causal versus predictive inference, potential outcomes and randomized experiments (A/B testing), structural causal models (interventions, counterfactuals, causal diagram, do-operator, d-separation), identification of causal effect (back-door and front-door criterion, do-calculus), estimation of causal effect (matching, propensity score, g-computation, doubly robust estimation, regression discontinuity and instrumental variables, conditional effects), structure learning (constraint and score-based algorithms), advanced topics (mediation and path-specific effects, bounding causal effect, selection bias, external validity and transportability, processing missing data, causal inference in networks). [Prerequisite: DSC 212, 240] Instructor: Babak Salimi

11. DSC 250: Advanced Data Mining [New], 4 units: Graph mining and basic text analysis (including keyphrase extraction and generation), set expansion and taxonomy construction, graph representation learning, graph convolutional neural networks, heterogeneous information networks, label propagation, and truth finding. [Prerequisite: DSC 190A or CSE 158 or equivalent] Instructor: Jingbo Shang

12. DSC 261: Responsible Data Science [New], 4 units: Responsible data management, algorithmic fairness (fairness definitions, impossibility results, causal fairness, building fair ML models, fairness beyond classification), algorithmic transparency (interpretability vs. explainability, auditing black-box algorithms, algorithmic recourse), privacy and data protection, sampling bias, reproducibility. [Prerequisite: DSC 260, 240, 245] Instructor: Babak Salimi

[13] HDSI faculty plans to propose two additional course options in the areas of Artificial Intelligence and Accountability/Trust and Critical Data Studies.

Thus, together with Group A and Group C courses, doctoral students are required to take a minimum of 5 courses for letter-grade credit. On the other end, students can satisfy all letter-grade course requirements except the professional preparation courses (teaching, survival skills and research seminar), which require satisfactory completion. These students are expected to enroll in individual research (DSC 298) in a section offered by the faculty advisor to meet residency requirements and maintain graduate student standing during the period of dissertation research.
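The Group A and Group B rules above can be expressed as a small coursework check. The following is a rough sketch only, not an official degree audit: the function and variable names are hypothetical, and the substitutes for DSC 241 and DSC 204A are limited to the two examples named in the text (MATH 282B and CSE 202), although other courses may also be accepted.

```python
# Group A foundation courses; at most three count towards the degree.
GROUP_A = {"DSC 200", "DSC 202", "DSC 210", "DSC 211", "DSC 212"}
MAX_GROUP_A_CREDITED = 3

# Each required Group B slot lists the courses that can fill it.
# MATH 282B and CSE 202 are the example substitutes named in the text.
REQUIRED_GROUP_B = [
    {"DSC 240"},
    {"DSC 260"},
    {"DSC 241", "MATH 282B"},
    {"DSC 204A", "CSE 202"},
]

# The eight Group B courses from which at least two must be selected.
SELECTIVE_GROUP_B = {
    "DSC 203", "DSC 204B", "DSC 242", "DSC 243",
    "DSC 244", "DSC 245", "DSC 250", "DSC 261",
}
MIN_SELECTIVE_GROUP_B = 2


def audit_coursework(courses):
    """Report credited Group A courses and unmet Group B requirements."""
    taken = set(courses)
    problems = []
    for options in REQUIRED_GROUP_B:
        if not taken & options:
            problems.append("missing required core: " + " or ".join(sorted(options)))
    selected = len(taken & SELECTIVE_GROUP_B)
    if selected < MIN_SELECTIVE_GROUP_B:
        problems.append(
            f"need at least {MIN_SELECTIVE_GROUP_B} selective core courses, found {selected}"
        )
    credited_a = min(len(taken & GROUP_A), MAX_GROUP_A_CREDITED)
    return {"group_a_credited": credited_a, "problems": problems}
```

The check deliberately ignores unit totals, grading options and the graduate-level course minimum, which would need transcript data beyond course numbers.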


Group C: Professional Preparation and Elective Courses [Remaining credits]

Group C courses aim to provide either practical experiences in chosen specialization areas, or advanced training for students preparing for doctoral programs. The courses include required professional preparation courses: 2-unit TA/tutor training (DSC 599), 1 unit of academic survival skills (DSC 295) and 1-unit faculty research seminar (DSC 293), all of which must be completed with a Satisfactory (S) grade using the S/U option.

Courses in this group also serve as a means to directly engage faculty in departments across the campus who are directly interested in Data Science related topics and instruction. Consequently, we make important courses taught by HDSI-affiliated faculty visible to the Data Science graduate students. However, their availability is subject to schedule and enrollment constraints of the individual departments. Based on written approval from participating departments, courses available in a given domain in a given year will be announced at the beginning of the academic year with a pre-registration deadline for capacity planning purposes.

Professional Preparation Courses:

DSC 599: TA/Tutor Training: 2 units (S/U): Expected TA duties, evaluation methods. Rules governing TA appointment, conduct and evaluation. Practice of effective teaching strategies including communications with students and instructors, conduct of discussion sessions, formulating learning objectives and implementation of active learning strategies. Prerequisites: none. Instructors: Teaching Faculty Staff. CSE 599 can be taken for credit to meet this requirement.

DSC 293: Faculty Research Seminar: 1 unit (S/U): Weekly faculty research seminar. Individual HDSI colloquia and distinguished lecturers may be included at the discretion of the instructor. Instructor: HDSI Faculty.

DSC 294: Research Rotation: 4 units (S/U): Special topics research under the direction of an HDSI faculty member. The research topics may include training in specific research methodologies consisting of practical laboratory skills, computational skills or proof systems in a research group/laboratory in which the student may pursue doctoral dissertation research. Prerequisites: Data Science graduate students and consent of the instructor.

DSC 295: Academia Survival Skills: 1 unit (S/U): Basic skills necessary to succeed as a researcher in Data Science including scripting, cloud computing skills, fellowship proposal preparation, CV preparation, writing reviews, preparing posters etc.

General Elective Courses:


These are advanced courses in core Data Science subjects listed under Group B above, or offered as research topics (DSC 291), or they can be graduate courses in other departments subject to approval by the student’s HDSI academic advisor [14]. Additional elective courses will be offered based on faculty interest and availability. Any numbered course (other than DSC 291) must be offered at least once in three years to stay in the course catalogue. HDSI plans to expand offerings as a part of its growing engagement with faculty across other departments.

DSC 205: Geometry of Data, Instructors: Gal Mishne, Alex Cloninger
Graph-based data modeling, analysis and representation. Topics include: spectral graph theory, spectral clustering, kernel-based manifold learning, dimensionality reduction and visualization, multiway data analysis, multimodal and multiview data representation, graph neural networks.

DSC 213: Topological Data Analysis, Instructor: Yusu Wang
Topological methods provide powerful tools for analyzing complex data. This course introduces basic concepts and topological structures, as well as recent theoretical and algorithmic developments, together with examples of applications. Some topics include: basics in topology, simplicial complexes to model data, persistent homology, discrete Morse theory, topology inference, the Mapper methodology, hierarchical clustering, and integration of topological methods with machine learning.

DSC 231: Embedded Sensing and IOT Data Models and Methods: Sensory data and control is mediated by devices near the edge of sensor networks, referred to as IOT (Internet of Things) devices. Components of IOT platforms: signal processing, communications/networking, control, real-time operating systems. Interfaces to cloud computing stack, publish-subscribe protocols such as MQTT, embedded software/middleware components, metadata schema, metadata normalization methods, applications in selected CPS (cyber-physical system) domains. Instructor: Rajesh Gupta

DSC 251: Machine Learning in Control: Estimation of stability and uncertainty, optimal control, and sequential decision making. Instructor: Yian Ma

DSC 252: Statistical Natural Language Processing, 4 units. A deep dive into the classical NLP pipeline: tokenization, stemming, lemmatization, part-of-speech tagging, named entity recognition, parsing, and machine translation. Finite-state transducers, context-free grammars, Hidden Markov Models (HMM), and Conditional Random Fields (CRF) will be covered in detail. Instructor: Jingbo Shang

[14] Academic advisors are appointed by the HDSI GradCom and are required to be different from the student's research advisor (or chair of dissertation committee). The primary responsibility of an academic advisor is to provide an assessment of student progress, and be a spokesperson for the student's welfare to the HDSI faculty.

DSC 253: Advanced Data-driven Text Mining, 4 units: Unsupervised, weakly supervised, and distantly supervised methods for text mining problems, including information retrieval, open-domain information extraction, text summarization (both extractive and generative), and knowledge graph construction. Bootstrapping, comparative analysis, and learning from seed words and existing knowledge bases will be the key methodologies. Instructor: Jingbo Shang

DSC 254: Statistical Signal and Image Analysis. 4 units. A graduate-level course on signal and image analysis spanning three main themes. Statistical signal processing: random processes, stochasticity, stationarity, Wiener filter, Kalman filter, matched filter; Signal processing: time-frequency representations, wavelets, signal processing with sparse representation (dictionary learning); Image processing: registration, image degradation and restoration (noise models and denoising), image pyramids, random fields. Instructors: Gal Mishne, Armin Schwartzman

DSC 213: Statistics on Manifolds. 4 units. This is a graduate topics course covering statistics with manifold constraints. Topics include: Fréchet means and variances, principal geodesic analysis, directional statistics, random fields on manifolds, statistical distances between distributions, transport problems, and information geometry. Manifold constraints will be considered on simplexes, spheres, Stiefel manifolds, stratified manifolds, the cone of positive definite matrices, trees, compositional data, and other relevant manifolds. Instructors: Armin Schwartzman, Alex Cloninger

CSE 234: Data Systems for Machine Learning. 4 units. Data management and systems issues across the whole lifecycle of ML-based analytics in real-world applications, including: data sourcing, preparation, and organization for ML; programming models and systems for scalable ML training, feature engineering, and model selection; systems for ML inference, deployment, and explanations; and governed ML platforms and feature stores. Instructor: Arun Kumar

DSC 261: Responsible Data Science, 4 units. Computational aspects of responsible data science. Computational approaches for enforcing fairness in machine learning, interpretability, explainability, privacy. Prerequisites: DSC 240, DSC 241. Instructor: Babak Salimi

MATH 281A-B-C: Mathematical Statistics (4-4-4 units). MATH 281A consists of statistical models, sufficiency, efficiency, optimal estimation, least squares and maximum likelihood, large sample theory. MATH 281B continues with hypothesis testing and confidence intervals, one-sample and two-sample problems, Bayes theory, statistical decision theory, linear models and regression. MATH 281C finishes the sequence with nonparametrics: tests, regression, density estimation, bootstrap and jackknife. Instructors: Jelena Bradic, Ery Arias-Castro

MATH 284: Survival Analysis. 4 units. Survival analysis is an important tool in many areas ofapplications including biomedicine, economics, engineering. It deals with the analysis of time toevents data with censoring. This course discusses the concepts and theories associated withsurvival data and censoring, comparing survival distributions, proportional hazards regression,nonparametric tests, competing risk models, and frailty models. The emphasis is onsemiparametric inference, and material is drawn from recent literature Instructor: Lily Xu,Jelena Bradic

MATH 285. Stochastic Processes (4 units). Elements of stochastic processes, Markov chains,hidden Markov models, martingales, Brownian motion, Gaussian processes. Recommendedpreparation: undergraduate probability theory. Instructor: Ruth Williams

MATH 287A. Time Series Analysis (4 units). Discussion of finite parameter schemes in theGaussian and non-Gaussian context. Estimation for finite parameter schemes. Linear vs.nonlinear time series. Stationary processes and their spectral representation. Spectral estimation.Students who have not taken MATH 282A may enroll with consent of the instructor. Instructor:Dimitris Politis

MATH 287B: Multivariate Analysis. 4 units. Bivariate and more general multivariate normal distribution. Study of tests based on Hotelling’s T². Principal components, canonical correlations, and factor analysis will be discussed as well as some competing nonparametric methods, such as cluster analysis. Students who have not taken MATH 282A may enroll with consent of the instructor. Instructor: Ery Arias-Castro.

MATH 287D: Statistical Learning Theory. 4 units. Topics include regression methods: (penalized) linear regression and kernel smoothing; classification methods: logistic regression and support vector machines; model selection; and mathematical tools and concepts useful for theoretical results such as VC dimension, concentration of measure, and empirical processes. Instructor: Jelena Bradic.

COGS 243: Statistical Inference and Data Analysis (4 units). This course provides a rigorous treatment of hypothesis testing, statistical inference, model fitting, and exploratory data analysis techniques used in the cognitive and neural sciences. Students will acquire an understanding of mathematical foundations and hands-on experience in applying these methods using Matlab. Cognitive science PhD students must enroll for four units and will be required to do assignments and a final project. All other students can enroll for two units and will be required to complete all assignments but not a final project (or, by request, a project and no assignments). Instructor: Angela Yu, Virginia de Sa.

4. Student Advising Information

The following lists important advising information for meeting the course completion requirements of the proposed doctoral program.

a. Incoming students must meet with their assigned HDSI faculty academic advisor to customize their individual program of study. Students must demonstrate that they have the required preparation in the above five areas in order to be exempted from taking some (or all) of the above courses. For example, having taken MATH 170A (or equivalent) would indicate that the student does not need to take DSC 210. Having taken MATH 173A (or equivalent) would show that the student does not need to take DSC 211, and having taken MATH 181A (or equivalent) would show that the student does not need to take DSC 212. Similarly, having taken BENG 216 would show that the student does not need to take DSC 200. Once again, these are only representative examples among a large number of preparatory courses across different programs.

b. In addition, incoming students can transfer up to two upper-level undergraduate or graduate courses taken under a different program and/or university, as long as (i) these courses are related to one of the above five foundational areas, (ii) these courses have not already been used for credit towards a different degree, and (iii) these courses are approved by the student’s HDSI faculty advisor to establish their relevance to data science and avoid course duplication.

c. A student can receive credit towards the M.S. or Ph.D. degree for a maximum of two courses (8 units) taken at the upper-division undergraduate level, subject to the approval of the student’s faculty advisor. These two courses can be transferred (as discussed above), or taken during the course of the graduate program. For example, a student can make use of the equivalencies discussed in part (a) above.

8. Field Examinations

No field examinations are required.

9. Qualifying Examinations

As discussed in Section 2.4, successful completion of a Ph.D. program in Data Science requires timely completion of a preliminary assessment, a research qualifying examination, and a final dissertation defense examination.

1. Preliminary Assessment

The preliminary assessment is an advisory examination. It consists of an oral examination in an area selected by the student. Its goal is to assess the student's preparation for the proposed area, including several relevant topics, and to identify any courses that are required or recommended for the candidate based on the knowledge shown and the critical missing background revealed. The preliminary examination must be completed before the start of the second year in the doctoral degree program. The examination dates are announced no later than the start of the Winter Quarter. A failing grade in the preliminary examination comes with a recommendation that the student have the opportunity to receive an MS in Data Science degree, provided they meet the degree requirements in no more than one extra quarter over the standard time for the MS program; here we refer to the newly proposed degree of MS in Data Science (not its online version). Students who fail the preliminary examination may file a petition to retake it; if the petition is approved, they will be allowed to retake it one (and only one) more time.

After a student successfully completes the preliminary assessment examination, in the next annual review of the student (conducted annually in the Fall Quarter as a part of the Annual Faculty Retreat), the GradCom of the HDSI Faculty Council assigns the academic advisor to provide necessary updates to the GradCom and helps in setting up the doctoral dissertation committee.

2. Research Qualifying Examination, or the UQE

A research qualifying examination (UQE) is conducted by the dissertation committee consisting of five or more members approved by the graduate division as per senate regulation 715(D). One senate faculty member must have a primary appointment in a department outside of HDSI. Faculty with 25% or less partial appointment in HDSI may be considered for meeting this requirement on an exceptional basis upon approval from the graduate division [15]. The goal of the UQE is to assess the ability of the candidate to perform independent critical research as evidenced by a presentation and a written technical report at the level of a peer-reviewed journal or conference publication. The research qualifying examination must be completed no later than the fourth year or 12 quarters from the start of the degree program; the UQE is tantamount to the advancement to PhD candidacy exam.

[15] This exception is stipulated in view of a large number of formally appointed faculty on the HDSI faculty council (at 25% or 0%) drawn from different departments and divisions, thus making it impossible for a student to find an “outside” faculty member in some areas.

10. Thesis Requirements

Thesis requirements for the HDSI PhD program must meet Regulation 715(D). Additional requirements above UC San Diego Graduate Division requirements are explained below. Specifically, a dissertation in the scope of Data Science is required of every candidate for the Ph.D. degree. A draft of the dissertation must be submitted to each member of the doctoral committee at least four weeks before the final examination (also known as the doctoral defense examination, discussed below). The final form of the dissertation document must comply with published guidelines by the Graduate Division. Two official copies of the approved dissertation must be submitted to the Registrar for deposit in the University Library.

Generalization and Reproducibility Requirements:

A candidate for the doctoral degree in data science is expected to demonstrate evidence of generalization skills as well as evidence of reproducibility in research results. Evidence of generalization skills may be in the form of, but is not limited to, generalization of results arrived at across domains or across applications within a domain, generalization of applicability of the method(s) proposed, or generalization of thesis conclusions rooted in formal or mathematical proof or quantitative reasoning supported by robust statistical measures. The reproducibility requirement may be satisfied by supplementary material consisting of code and a data repository, along with evidence of independent external use or adoption.

11. Final Examination

The candidate must successfully defend the dissertation in a final examination before the doctoral committee, consisting of five or more members approved by the graduate division as per senate regulation 715(D). One senate faculty member must have a primary appointment in a department outside of HDSI. As explained earlier, partially appointed faculty in HDSI (at 25% or less) are acceptable in meeting this outside-department requirement as long as their main (lead) department is not HDSI.

12. Explanation of Special Requirements Over and Above Graduate Division Minimum Requirements

Generalizability and Reproducibility Requirements

There are no special requirements over and above Graduate Division minimum requirements related to course work. There are requirements as to the structure of the doctoral dissertation, specifically related to evidence of generalizability and reproducibility, as explained in Section 2.10. The primary reason for this additional requirement is the transdisciplinary nature of the nascent discipline, which places an additional emphasis on identifying core elements of a research and dissertation that form a basis for it to be considered primarily in data science.

Rotation Training Program

In many areas of science, research rotations provide the opportunity for first-year PhD students to obtain research experience under the guidance of HDSI faculty members. Through the rotations, students can identify a faculty member under whose sponsorship their dissertation research will be completed.

Given the diversity of background training and intellectual persuasion of entering Ph.D. students, a subset of the HDSI faculty council felt strongly that a rotation training program would be essential to providing an informed match for each Ph.D. student. The interdisciplinary nature of data science makes research rotation experience a desirable aspect of the Ph.D. program while addressing principally different advising cultures in constituent areas of data science. Accordingly, HDSI seeks a principled way to address this difference in academic advising culture. One possibility is to require rotation of all candidates, but allow for exceptions through individual review of candidates who demonstrate strong background work and inclination to work with a specific faculty member. The other possibility is to identify a subset of admitted students who would be good candidates for participation in the first-year rotation program. While the exact details will be worked out by the Graduate Committee, we plan to offer participation in the rotation program to all students at the time of admission.

A research rotation is a guided research experience lasting one quarter (10 weeks) obtained by registering for DSC 294 with an instructor. Ph.D. students will participate in a minimum of two research rotations during their first year, with a minimum of two different faculty members, and as many as four rotations including the summer quarter. A student may rotate twice under the same faculty member as long as they rotate with at least two faculty members. The goal is to help the student identify and develop their research interests and to expose students to new methodological approaches or domain knowledge that may be outside the scope of their eventual thesis. Research rotations must be completed before the start of the second year with a signed commitment form from a faculty advisor.

The Graduate Committee (GradCom) will develop detailed guidelines on the selection process and conduct of the Rotation Program that specifically address questions such as rotation schedule, whether or not it is arranged by the faculty advisor or by the students themselves, orientation of students into the rotation program, guidelines on academic advising by the rotation advisors and student evaluations in a rotation program, and exception conditions including extension of the rotation experience for a student to four quarters and other special circumstances.

The rotation program will be evaluated for its effectiveness as a part of our first three-year review program with recommendations for any changes to the program submitted for review and approval by the Graduate Council.

13. Relationship of Master’s and Doctoral Programs

The proposed doctoral program is closely related to and builds upon the pending Master’s program in Data Science (MS-DS) by HDSI that has been recently approved by the Graduate Council of the Academic Senate. In particular, the doctoral program is structured to benefit from our investment and support for courses that enable students with a broad range of backgrounds to successfully complete master’s level work in data science. This ‘onboarding’ of entering students into our Master’s program is equally valuable for our doctoral program coursework even though the qualifications for entry into the doctoral program are more detailed in terms of background preparation in computing and mathematical subject areas. In addition, PhD students along their way towards their degree may fulfill all requirements for the M.S. degree, and therefore can apply for and receive it before the conferral of the Ph.D. degree; notably, the Registrar’s Office does not award the M.S. and the Ph.D. degrees in the same quarter.

14. Special Preparation for Careers in Teaching

All graduate students in the doctoral program are required to complete at least one quarter of experience in the classroom as teaching assistants regardless of their eventual career goals. Effective communication and the ability to explain deep technical subjects are considered a key measure of a well-rounded doctoral education. Thus, Ph.D. students are also required to take the 1-unit DSC 295 (Academia Survival Skills) course for a Satisfactory grade.

3. Projected Need

1. Student Demand for the Program

Section 1.1 outlines the programmatic reasons for launching the graduate program as a key vehicle for advancing knowledge and practice in Data Science. The driver for such a program, indeed, of HDSI as an institution, is to satisfy the growing demand for the graduate program in data science both internally and externally. Almost all our external letters of evaluation from academic institutions have specifically pointed to “student demand for rigorous PhD-level training in data science” (Alex Aue, UC Davis), because of “unprecedented increase in demand” (Larry Wasserman, CMU), stated similarly by George Michailidis, University of Florida. Academic demand for Data Science postdoctoral scholars and faculty is not hard to see given the rise of academic units (departments and schools) in Data Science across the country.

Further, industry surveys have repeatedly shown a soaring need for data scientists. Chief and credible among these are reports by McKinsey [16], IBM [17], and Bloomberg [18]. As mentioned, in a survey of our students graduating from the Data Science major, roughly a third of them have indicated their interest in a graduate degree in a Data Science program despite strong placement opportunities for our graduates in the industry for students training in key data science areas of AI and Machine Learning. Since our undergraduate major in Data Science covers basic and advanced courses in these areas, the graduate interest is primarily for a doctoral research degree program. In 2019, nearly half (1571) of over 3000 applicants to various graduate degree programs at UC San Diego who indicated interest in Machine Learning and Artificial Intelligence related topics in Electrical Engineering, Computer Science and Cognitive Science directly indicated their interest in Data Science programs at HDSI. HDSI offered scholarships to 10 of these students admitted into degree programs in Computer Science, Math, Electrical Engineering or Cognitive Sciences.

Thus, the demand for graduate training in Data Science is high and continues to grow. Further, the program not only serves the HDSI mission of educating talent in the area of Data Science, but also serves as a vehicle for continued engagement and proliferation of Data Science training across various graduate programs through new foundational, core and elective course offerings that engage domain experts into the field of Data Science. The program provides an excellent means to create new educational opportunities for students, especially for underserved and economically-disadvantaged student populations who can benefit from graduate scholarships offered by HDSI as a part of its core endowment-supported activities that we have mentioned earlier.

[16] https://mck.co/2W4LriY
[17] https://bit.ly/3dkiDIS
[18] https://bloom.bg/3chfute

Based on the demographic data of students interested in constituent areas of machine learning, information theory and statistics, one would expect a skewed demographic balance at about one-fifth of domestic students. However, with growth of areas such as computational biology and computational social science, we expect the international fraction to be closer to the campus average of 60% (7% in Social Sciences, 15% in Arts and Humanities, 23% in Biological Sciences, 32% in Health, 45% in GPS, 50% in Physical Sciences, 70% in JSOE and RSM) and better than the founding areas of Mathematics (90%), Computer Science (80%), and Electrical Engineering (83%). Indeed, the demographics on our Undergraduate Program support an expectation of one-third to one-half of domestic students. Following a similar reasoning, we expect a better ratio of California resident applicants over computer science, where we saw four times as many resident applicants for the MS program over the doctoral program. We shall be monitoring and reporting on these numbers in our annual program review as a part of the HDSI annual report.

2. Opportunities for Placement of Graduates

Data Science as an academic subject area is rapidly emerging with Schools or Colleges of Data Sciences (such as Berkeley, Wisconsin, MIT, Columbia) or Departments (such as NYU, Michigan, Yale, Cornell, UC Irvine, Virginia), just to name a few. UC Berkeley recently reorganized and launched the Division of Computing, Data and Society. Regardless of institutional home as a department, division, school or college, the faculty demand for Data Science is expected to rise along with the need for a pipeline of graduate students in the coming years.

While no survey data is available for the doctoral demand, we have plenty of data and evidence for placement of graduate students in the industry and civil organizations. As a case study, we examined the entire class of Ph.D. students who graduated from the Ph.D. program at NYU. We located 30 graduates from the program (from the 2017 starting year) on LinkedIn working as postdoctoral scholars; as senior data scientists at companies such as Boston Consulting Group, Walmart Laboratories, Facebook, Uber; as research analysts in venture capital and investment banking; and as software engineers at Hulu and FreeWheel.

A detailed market analysis in support of the opportunity in ‘Big Data’ and long-term trends in this domain comes from a recent McKinsey Global Institute report that identified ‘Game Changer’ opportunities for US growth [19]. The May 2011 McKinsey Global Institute report, “Big Data: The next frontier for innovation, competition and productivity” [20], predicted the need for over 500,000 data scientists by 2018. McKinsey projected a shortfall of 1.5 million additional managers and analysts in the U.S. who can “ask the right questions and consume the results of the analysis of big data effectively.” These numbers were later analyzed by commercial outfits such as kdnuggets.com. Analysis of the LinkedIn Workforce report dated August 2018 states, “Nationally, we have a shortage of 151,717 people with data science skills”. Kaggle, the largest community of data scientists and now a part of Google, has over 2 million subscribers. The LinkedIn profile of Data Scientists lists 132,083 people with Data Scientist titles that are spread across IT services, Computer Software, Financial Services, Banking, Insurance and Higher Education.

[19] http://www.mckinsey.com/insights/americas/us_game_changers
[20] http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

In a separate ‘bottom-up’ study, the EMC Corporation, a publicly traded company with 60,000 employees, interviewed 497 data science and business intelligence professionals from around the world. The results of their study on the need for Data Scientists pointed to some interesting trends in the computing industry. About two-thirds of the individuals polled believe the demand for data scientists will outpace supply in the next five years with nearly 30% coming from professionals in disciplines other than computer science. The study also cited the lack of training and resources as the biggest obstacles to data science in organizations. These observations directly support the case for the need for rigorous scientific training for the professionals moving into the data science field.

Data Scientists constitute a separate category of jobs that are currently posted along with IT and business analytics positions. As of this writing, Indeed.com posts openings for 5953 Data Science jobs, and Glassdoor lists 21,166 Data Science jobs with a salary range from $76,000 to $148,000 for an average of $118,700. This compares with an average salary of $76,500 for software engineers, $70,700 for computer engineers and $110,200 for computer scientists. Indeed, driven by the opportunities available, a whole industry has sprung up around Data Science placements [21]. Data Science graduates will be well qualified for job titles such as data analysts, business intelligence and predictive analysis professionals. The students are likely to find employment across many areas including internet companies, banking, insurance, investments, engineering and healthcare. We will work with Career and Placement Services as well as the Alumni Board to ensure mentoring and placement of graduates from the Data Science program.

[21] https://www.dataquest.io/blog/career-guide-find-data-science-jobs/

3. Importance to the Discipline

Section 1.2 addresses the intellectual underpinning driving the emergence of the discipline of Data Science, where UC San Diego already has an oversubscribed undergraduate program (currently the 8th largest major at UC San Diego) as well as an active postdoctoral program offered by HDSI. A graduate program is crucial to the establishment of Data Science as an academic area. The proposed doctoral program is a necessary step toward a complete graduate program that establishes the complete pipeline of talent into academic (teaching and research) careers in the emerging discipline.

4. Ways in which the program will meet the needs of the society

Data Scientists are highly sought after, showing a societal need for individuals with this professional competency. Furthermore, Data Sciences are already having an impact on many aspects of society, including e-commerce, financial industries, technology companies, health care, and academia. There are few aspects of society that will not be affected by Data Science. The proposed program directly serves the current and growing need of professionals in the area and its applications.

5. Relationship of the program to research and/or professional interests of the faculty

As of this writing, the Institute has appointed 2 teaching assistant professors, 11 ladder-rank full-time professors (8 assistant, 1 associate, 2 full) as well as 13 ladder-rank joint faculty (5 at 50% and 8 at 25%) into the Institute, each of whom directly lists Data Science as their main professional research interest.

6. Program Differentiation

Sections 1.5 and 1.6 cover in detail related programs at UC San Diego and in the UC system. The growth in Data Science degree programs is following a middle-out process, starting with a large number of MS programs counting over 100, with emerging data science bachelor’s degree programs such as those at UC Berkeley and UC Irvine in addition to the BS in Data Science offered by HDSI, now in its fifth year. New York University has offered a PhD program in Data Science since 2017, with specializations in Data Science of existing PhD degrees offered by Columbia, Michigan and many other schools. In contrast to emerging programs as specializations of Statistics or Engineering degrees, the HDSI organization presents us with the capability to design integrated programs in Data Science for a broader and deeper training through a large and diversified set of electives. With the increasing participation from faculty and departments across the campus in creating additional electives/specialization courses, we hope to extend the reach and impact of Data Science as a discipline.

4. Faculty

The HDSI faculty community consists of over 200 faculty affiliates organized into 44 different research clusters. The core faculty of HDSI consists of 48 faculty who are appointed at various levels of FTE scale reflecting the extent of teaching responsibilities within Data Science [22]. The table below lists the Institute faculty. A two-page CV for each faculty member is provided as a separate attachment to the proposal.

[22] https://datascience.ucsd.edu/about/faculty/hdsi-faculty-council/

HDSI Faculty Council

Faculty Group | Names and Title | Specialization Areas
Full-Time Faculty in HDSI (11) | Mikhail Belkin, Professor | Machine Learning, Learning Theory, AI: understanding structure in data, analysis and algorithms for non-linear high-dimensional data
| Justin Eldridge, Assistant Teaching Professor | Machine Learning Theory: improving correctness of learning algorithms, clustering, process of learning
| Aaron Fraenkel, Assistant Teaching Professor | Machine Learning, Design of anti-fraud, anti-abuse systems
| Yian Ma, Assistant Professor | Machine Learning Theory: scalable inference methods, time-series data and sequential decision making, Bayesian inference algorithms
| Arya Mazumdar, Associate Professor | Machine Learning, Information Theory: error correcting codes for use in storage systems
| Gal Mishne, Assistant Professor | Signal processing and machine learning for graph-based modeling, processing and analysis of large-scale high-dimensional real-world data; unsupervised data analysis in neuroscience
| Yusu Wang, Professor | Computational Geometry, Topological/geometric methods for data analysis
| Babak Salimi, Assistant Professor | Data Management, Causal Inference, Fairness in Decision Support Systems
| Zhiting Hu, Assistant Professor | Machine Learning and Natural Language Processing with applications in controllable content generation, enterprise AI platforms and healthcare
| Berk Ustun, Assistant Professor | Interpretability, Fairness, Accountability in Machine Learning, Applications of ML to Medicine, Finance, Justice and Business
| Tsui-Wei (Lily) Weng, Assistant Professor | Robust, Reliable and Trustworthy AI Systems, Deep Reinforcement Learning, Fairness in machine learning and Robustness against adversarial attacks

Joint Faculty in HDSI (4.5 FTE, 13 headcount) | David Danks, Professor | Learning & reasoning in humans, Ethics & policy for autonomous systems, machine learning
| R. Stuart Geiger, Assistant Professor | Computational Social Science: computational ethnography, socio-technical systems
| Mikio Aoi, Assistant Professor | Computational Biology: large-scale Bayesian nonparametric inference, Bayesian optimization, neuronal analysis
| Jingbo Shang, Assistant Professor | Data mining, natural language processing, and machine learning: mining and constructing structured knowledge from massive text corpora with minimum human effort
| Benjamin Smarr, Assistant Professor | Computational biology, dynamical systems, stochastic processes and Biological Circuits
| Barna Saha, Associate Professor | Theoretical Computer Science: algorithm design and analysis, probabilistic method and large scale data analytics
| Arun Kumar, Assistant Professor | Databases, Data management and software systems, data preparation, model selection, and model deployment, ML/AI-based data analytics
| Yoav Freund, Professor | Machine Learning and its applications in bioinformatics, computer vision, finance, network routing, and high-performance computing
| Jelena Bradic, Associate Professor | Statistics: causal inference, ensemble learning, robust statistics and survival analysis with applications to gene-knockout experiments, understanding cell cycles, developing new policies or detecting effects of treatments on survival
| Rayan Saab, Associate Professor | Mathematics: signal processing and analysis, sparse and low-dimensional representations of high dimensional data, compressed sensing
| Alex Cloninger, Assistant Professor | Applied harmonic analysis, machine learning, neural networks, analysis of high-dimensional data
| Margaret Roberts, Associate Professor | Political Science: automated content analysis, political methodology and politics of information
| Armin Schwartzman, Professor | Biostatistics: Signal and image analysis; functional and manifold-valued data; high-dimensional data; modern multivariate statistics; large scale multiple testing; applications in biomedicine and the environment

Fellows & Administrative | Bradley Voytek, Associate Professor, HDSI Fellow | Cognitive neuroscience: neural modeling and simulation, along with large-scale data mining and machine learning techniques, to understand the physiological basis of human cognition and age-related cognitive decline
| Ilkay Altintas, Research Scientist, HDSI Fellow | Scientific workflows and solution architectures for data and computational science, eScience applications
| Virginia De Sa, Professor, HDSI Associate Director | Cognitive Science: computational neuroscience, visual perception, EEG analysis, brain-computer interfaces; Machine Learning, multi-view learning, multi-task learning, computer vision applications
| Dimitris Politis, Distinguished Professor, HDSI Associate Director | Statistics: time series, bootstrap methods, and nonparametric estimation methods
| Rajesh K. Gupta, Distinguished Professor, HDSI Director | Embedded and Cyber-physical Systems: sensor data organization, metadata models and methods

0%Appointments

Angela Yu AI (artificial agents): learning anddecision making under uncertainty,social cognition.

Eran Mukamel Computational neuroscience: modelingand analysis of large-scale data sets tounderstand complex biologicalnetworks of the brain

Frank Wuerthwein Physics: experimental particle physics,distributed high-throughput computingwith large data volumes

George Sugihara Complex system dynamics, methodsfor forecasting chaotic systems,neurobiology, gene expression incancer

Henrik Christensen Robotics, computer vision, AI:systems-oriented approach to machineperception, robotics and design of

Ph.D. in Data Science, November 30, 2020 Version 4.1 50 | Page

intelligent machines

Julian McAuley, Assistant Professor Machine Learning: Social Networks, using artificial intelligence in fashion choice, and data science in various applications.

Larry Smarr, Distinguished Professor High-performance computing and networking: advanced cyberinfrastructure, experimental systems

Lucila Ohno-Machado, Professor Biomedical Informatics: accessible and usable health data and its use in evidence-based health decisions

Michael Pazzani, Professor Emeritus, HDSI Distinguished Scientist

Machine learning, explainable artificial intelligence, personalization, internet search, and recommendation systems

Michael Holst, Professor Mathematical and Computational Physics: biochemistry and biophysics, computational fluid dynamics, computer graphics, materials science, and numerical algorithms relativity

Robin Knight, Professor Bioengineering, Cellular and Molecular Biology, Computer Science: Microbial analysis

Ronghui (Lily) Xu, Professor Mathematics and Biostatistics: machine learning, statistical inference for complex data-types in the presence of high-dimensional covariates

Ruth Williams, Distinguished Professor

Mathematics: Probability theory, stochastic models of complex networks (e.g., in internet, systems biology)

Shankar Subramaniam, Distinguished Professor

Bioengineering and Systems Biology

Shannon Ellis, Assistant Teaching Professor

Human Genetics, Data Science Education

Tara Javidi, Professor Information Theory, Machine Learning: wireless mesh networks.

Terry Sejnowski, Distinguished Professor

Computational Neurobiology, Neurosciences, Neural Networks

Vineet Bafna, Professor Bioinformatics, Computational Biology

Young-Han Kim, Professor Information Theory: network information theory.

In addition to the 48 faculty members listed above (15.5 FTE), the Institute is also planning to fill one teaching faculty (LSOE) and one advancing faculty diversity (AFD) position in the current recruiting season. It anticipates that an additional 1-2 new faculty members will join the Institute, for a total faculty strength of 16-17 FTE, including 3 FTE LPSOE and 8 U18 lecturers.

Together, these provide a capacity of 51-52 courses annually by the current ladder-rank faculty, in addition to 5 cross-listed courses as well as teaching by U18 continuing lecturers, for a combined total annual capacity of 58-74 courses. The current Data Science undergraduate program accounts for 35 courses/sections per year. Conservatively, the Institute has the capacity to offer 6-10 graduate courses per quarter, which enables it to adequately serve the proposed doctoral program.


5. Courses
Section 2.7 describes the structure of the program and courses. Descriptions of the courses and their instructors are attached in Appendix C. We note that the courses have been devised to ensure the broadest possible access to the Data Science graduate program by motivated students from diverse educational backgrounds. Accordingly, the program makes provision for course credit for a maximum of 3 out of 5 courses (Group A: Foundational Areas) that ensure adequate preparation of students to enable successful completion of the graduate degree. Group B: Core Areas specifies a minimum of 5 courses out of 10 courses that constitute the body of knowledge and skills in methods/tools areas of data science. This list includes two required courses on Machine Learning and Data Ethics & Fairness. Finally, elective courses (including thesis research) seek to specialize data science skills in specific areas or application domains.

6. Resource Requirements
As mentioned earlier, no new or additional resource requirements are expected from the campus in support of the proposed Ph.D. program. Instead, the Institute’s continuing and planned expenses in graduate scholarships, faculty recruiting and cyber-infrastructure resources (including personnel) will be key enablers for the successful operation of the proposed graduate program. Starting Winter 2021, the Institute has been allocated space on two floors of the 38,000-square-foot Literature Building, which will provide ample space for housing the faculty, students, and advising staff for the graduate and undergraduate programs. HDSI’s current undergraduate program has over 1000 students in its majors and minors, making the undergraduate major the 8th largest major. This provides a funding source for graduate Teaching Assistants, who will be primarily drawn from the proposed doctoral program. Faculty fully or primarily appointed in HDSI currently direct research projects worth $5.5M annually that are managed by HDSI. The sponsored research is expected to grow as additional faculty join starting Fall 2021. The combined research and teaching activities will be taken into consideration by the Graduate Division in setting the graduate scholarship support, which is expected to cover one full year of non-employment-based graduate student support for each of the entering graduate students. We point out that this graduate support is realized due to additional teaching and research activities (and associated revenues) by the HDSI faculty. HDSI faculty will continue to fund and supervise students working on Data Science research projects who are drawn from Ph.D. programs in other departments as well.

7. Graduate Student Support
Consistent with the Graduate Division’s instructions, HDSI plans to offer five years of confirmed financial support, including tuition remission, for all its entering students; this support would consist of a combination of research and teaching assistantships. HDSI will guarantee a first year of non-employment-based support as a part of the rotation program. Our normative expectation is that Ph.D. students are able to confirm a research advisor by the end of the first year, who will be responsible for providing graduate student research support.

Overall, HDSI’s guarantee of financial support is rooted in four primary sources of funding for graduate students:

(a) Graduate Division support of graduate students, based on campus policy on the distribution of scholarship support to graduate students. It was earlier known as “block grants”, derived on the basis of campus policies for academic units based on their need and extramurally funded research activities.

(b) Teaching assistant support. Currently, TA support is at 8 TA FTE per year and is expected to rise with the increase in undergraduate enrollment in our majors and minors (from the current 700 students in the major and 5000 students in classes annually to 1000 majors and 10000 students in classes annually in three years). Graduate students will be trained and, once determined to be qualified per university regulations, will be offered TAships.

(c) Extramurally funded research projects, including training grant(s). Extramural funding is likely to be the largest source of funding for our graduate students, given the extensive growth and consistent availability of research funded by organizations such as NSF, DARPA, DOE, ARL and others. Data Science areas are among the most intensely invested areas of research both by public and private organizations (foundations). Based on budget analysis provided as a part of 3-year FTE planning, we anticipate annual research support of $200K/year per faculty appointed in the Institute.

(d) Endowment-supported graduate student scholarships. We have currently budgeted $600K per year for this program. We expect to grow this program with the growing industry contributions and philanthropic support to the Institute. Financial aid will be available to approximately one quarter of our best students in the early years. As we scale the program, the ratio of financial support may drop to no less than 15% of the total student population. In addition, as outlined in our EDI strategy (Section 1.6), the Institute will directly offer scholarships for URM students.

8. Governance
The program is offered by the Halıcıoğlu Data Science Institute, established as an academic unit by the Academic Senate on June 6, 2018. The HDSI Faculty Council is the governing body for all academic programs offered by the Institute. A copy of the Bylaws is attached in the Appendix.

9. Changes in Senate Regulations
No changes to Senate regulations are proposed.


Appendix A: Listing of Research Areas
The following table lists topical areas covered in doctoral research efforts engaging core HDSI faculty, organized by seven core themes of HDSI.

AI: Automated Reasoning, Knowledge Representations, Cognition

Knowledge Representations, Distributed representations, learning multiple levels of representation or based on composing learned functions

Multi-agent Systems combined with graph signal processing, network analysis

Automated decisions, Computer augmented decision making (with applications in geospatial analysis,health)

Intelligence amplification and application to decision making, Augmented Cognition

Machine Learning: Theory, Algorithms, Systems

Adversarial ML, ML for security and privacy, algorithmic fairness

Reinforcement learning, Learning as optimization, multi-task learning, transfer learning, learning to learn

Algorithms, game-theoretic setups such as GANs, realistic study of the limits of machine learning and applied statistics

NLP, language technologies, unstructured text analysis

Accelerated ML Systems: architectures, algorithms, tools and libraries for accelerated ML systems

Data Infrastructure: Data Viz, Programming and DB Systems

Data Visualization, Visual Analytics, HCI for data science

Databases/data systems for data systems

Data mining, data integration from multiple modalities (text, time-series, imaging, etc.)

Methods and System design to ensure data security and privacy

Distributed/cloud computing

IOT and Cyber-Physical Systems, AI sensors

Software engineering and PL for Machine Learning, ML Systems

Mathematical Foundations of Data Science: Causal Inference, Hypothesis Testing, Optimization Theory

Causal inference in machine learning, Sequential decision making methods and their statistical analysis

Non-parametric data analysis

Multiple hypothesis testing and high dimensional data analysis, false discovery rate

Applied probability problems for the analysis of data science methods

Submodular optimization, transport theory (optimal transport/Wasserstein distance, parallel transport, especially on non-Euclidean spaces), optimization theory and algorithms that use large data to reduce computation without compromising statistical validity

Digital Humanities: DS in Society, Ethics/Policy

Ethics, data science in public interest

Philosophy of information: how data science allows us to learn about the world, information transfer from data to models, prediction/interpretation tradeoffs

Privacy and public policy, Accountability measures and methods from Data Science

Understanding humans, data science in language, literature and arts

Computational social science, data-driven sociology

Computational Linguistics, Speech versus intentional language, Conversational design and human behavior

Systems and Applications: CPS/IOT, Architectures, Health, Economics, Robotics

Brain-inspired Computing Machines, Neuromorphic Architectures, Hyperdimensional Computing

Medical signal processing, computational medicine, medical data integration challenges (patient records, device records, insurance records, etc.); Causal inference in Medical Informatics

Data-driven developmental economics, new economic theories based on automated data-driven measurements and methods

Statistics and economics: statistical game theory (focus on statistical and computational properties of the Nash equilibrium and its implications to fairness), market efficiency (antitrust), and other market design problems

Geospatial data collection and analysis

Robust and commonsense learning in robotic systems

ML augmented organizational workflows, data science applications in organizational behavior, business

ML methods for dynamic, time, causal inference with implications for Political Science, Economics and/or Healthcare

IOT for health, autonomous vehicles

eSciences: Real-Time Instrumentation Data, Sustainability

Environmental Data Sciences

Data science in ecology and conservation, Sustainability at scale

Data Science in Precision Imaging Systems: Applied Optical and Electron Microscopy

Data Science for High-throughput Biology, Sequencing, Mass Spectrometry in support of Bioinformatics, Quantitative Biology

Appendix B: Letters of Support Solicited

As of this writing, the following letters of support have been received and are enclosed. Additional letters will be provided in time for review at the Graduate Council meeting in January 2021.

Divisional Deans & Directors
1. Al Pisano, Dean of Jacobs School of Engineering
2. Peter Cowhey, Dean of the School of Global Policy and Strategy
3. Carol Padden, Dean of Social Sciences
4. Cristina Della Coletta, Dean of Arts and Humanities
5. Cheryl Anderson, Dean, Herbert Wertheim School of Public Health and Human Longevity Science
6. Lisa Ordóñez, Dean, Rady School of Management

Department Chairs
1. James McKernan, Chair, Department of Mathematics
2. Jonathan Cohen, Chair, Department of Philosophy
3. Brian Goldfarb, Chair, Department of Communication
4. Bill Lin, Chair, Department of Electrical and Computer Engineering
5. Kun Zhang, Chair, Department of Bioengineering
6. Sorin Lerner, Chair, Department of Computer Science and Engineering
7. Stefan Leutgeb, Neurobiology Section, Division of Biological Sciences
8. Thad Kousser, Chair, Department of Political Science

External Reviewers:
1. Professor Alexander Aue, Chair, Department of Statistics, Co-Director, Center for Data Science and Artificial Intelligence Research, UC Davis
2. Professor Larry Wasserman, Department of Statistics and Data Science, CMU
3. Professor George Michailidis, Founding Director, UF Informatics Institute, University of Florida
4. Professor Sharad Mehrotra, Information and Computer Science, UC Irvine

Faculty, Instructors
1. Jorge Cortes, MAE 247
2. James Fowler, POLI 287
3. Trey Ideker, BNFO 286 / MED 283
4. Massimo Franceschetti, ECE 227
5. Vineet Bafna, CSE 283 / BENG 203, CSE 280A
6. Alex Cloninger, DSC 210, Math 170A, Math 277A
7. Arun Kumar, CSE 234, DSC 202, DSC 204
8. Rayan Saab, DSC 210, DSC 211, DSC 242, DSC 243
9. Armin Schwartzman, DSC 244, DSC 212, DSC 241, DSC 242
10. Angela Yu, COGS 243
11. Brad Voytek, COGS 280
12. Ronghui (Lily) Xu, MATH 284
13. Siavash Mirarabbaygi, ECE


ALBERT (“AL”) P. PISANO, DEAN
IRWIN AND JOAN JACOBS SCHOOL OF ENGINEERING
WALTER J. ZABLE PROFESSOR OF ENGINEERING
7313 JACOBS HALL
9500 GILMAN DRIVE
LA JOLLA, CALIFORNIA 92093-0403
TEL: (858) 534-6237
FAX: (858) 822-3904
EMAIL: DeanPisano@eng.ucsd.edu

29 November 2020

TO: Graduate Council
FROM: Albert P. Pisano, Dean of Engineering
RE: Doctor of Philosophy Degree in Data Science (Ph.D./DS)

I am writing to express my strong support for the new, proposed Doctor of Philosophy Degree in Data Science (Ph.D./DS), to be offered by the Halicioglu Data Science Institute (HDSI). There already exist significant collaborations between HDSI and the Jacobs School of Engineering, and this new Ph.D. degree will serve to strengthen and expand that collaboration. I am pleased to report that there are six jointly appointed faculty between HDSI and the Jacobs School on which this strong collaboration will be based: Benjamin Smarr, Bioengineering, Jingbo Shang, CSE, Barna Saha, CSE, Joav Freund, CSE, Arun Kumar, CSE, and Rajesh Gupta, CSE. In my conversations with faculty colleagues I find there is broad support for the proposed Ph.D. program. Indeed, because HDSI is a unit that has faculty who conduct research with Ph.D. students, it seems appropriate that HDSI have the ability to offer the proposed degree. Further, Engineering is willing to collaborate with HDSI in areas of common research interest, including Algorithms, Artificial Intelligence, Machine Learning, Data Infrastructure and Systems, as well as application areas of the research. There are many opportunities for broadening the course offerings at UCSD, and the course offerings in Data Science will benefit the Ph.D. and MS students in CSE and ECE. Similarly, a number of the graduate classes in CSE and ECE are sure to be of interest to Data Science PhD students.

I am confident that HDSI and Engineering will move forward together in a mutually-beneficial way, and I anticipate there will be high demand from students for this program, and look forward to an exciting new crop of Ph.D. researchers.

Sincerely,

Albert ("Al") P. Pisano
Member, US National Academy of Engineering
Member, US National Academy of Inventors
Walter J. Zable Distinguished Professor & Dean
Irwin and Joan Jacobs School of Engineering
University of California, San Diego

PETER F. COWHEY
Dean, School of Global Policy and Strategy
Qualcomm Chair in Communications and Technology Policy
9500 Gilman Drive #0519
La Jolla, California 92093-0519
T: (858) 534-1946 | F: (858) 534-3939
[email protected] | gps.ucsd.edu

To: The Graduate Council
From: Peter Cowhey, Dean, GPS
Re: HDSI proposal for a PhD program in Data Science

I have reviewed the proposal and fully endorse it.

The proposal (and all of the HDSI work) recognizes that the application of data science to applied problem solving requires partnership with domain experts. HDSI has worked to organize a Data Science in Society cluster that fulfills this philosophy. It includes a number of faculty members from GPS. As a result, the PhD program has a roster of pertinent researchers (and teachers) already in place whose interests will lead to constructive spillovers to the teaching and research programs of GPS.

The cluster is more than aspirational. HDSI is already providing support for some of the large research initiatives at GPS. Two examples are the big data program for the analysis of the politics and economics of China, housed at our 21st Century China Center, and the "Big Pixel" program employing satellite imagery analysis in our Center on Global Transformation. These and other initiatives are leading to a new set of graduate courses on marrying data science to policy analysis. One of our senior faculty members, Professor John Ahlquist, has made a multi-year commitment to being the "Sherpa" for this undertaking. We expect some of these courses will be available to HDSI PhD students.

Finally, it should be noted that GPS has no plan for a PhD program that would conflict with the proposed HDSI offering.

Dean, Division of Social Sciences University of California San Diego • 9500 Gilman Drive # 05020 • La Jolla, California 92093-0502 Tel: (858) 534-6073 • Fax: (858) 534-7394 • socialsciences.ucsd.edu

November 30, 2020

TO: Rajesh Gupta, Director Halıcıoğlu Data Science Institute

RE: Proposal for a PhD program in Data Science

In another letter to the Graduate Council, we offered strong support to HDSI’s plans for a M.S. program in Data Science. I am pleased to also support HDSI’s plans for a PhD program in Data Science. The proposed program shows there is strong coherence across their degree programs, at the undergraduate and masters’ level as well. This proposal for a PhD is a natural extension of their curricular planning to date.

We do not have a surplus of courses and programs about data science. Although areas such as machine learning and artificial intelligence are also taught in Computer Science and Cognitive Science (and Mathematics), the teaching emphasis and the cases under study across the divisions are different enough as not to be redundant or overlapping. Further, because we have a number of faculty in the Social Sciences (e.g. Voytek, Roberts, Geiger) who participate in the teaching programs of the HDSI, their perspectives are regularly considered and incorporated into HDSI courses – thus maintaining cooperative intellectual and research programs across divisions while teaching to the needs and ambitions of the different PhD programs.

I agree with HDSI that the demand for high-level training in data science is such that having multiple PhD programs is not a problem. In fact, the areas of machine learning and artificial intelligence will benefit from being taught in different ways for different populations of students who will bring their skills to a broad job market that has an acute need for skill in this area. The proposed program has many interesting and thoughtful elements, and it is clear they have thought very carefully about pedagogy at this level. I welcome their innovative teaching into our campus community.

Sincerely,

Carol Padden Dean, Division of Social Sciences

Division of Arts and Humanities University of California San Diego ∙ 9500 Gilman Drive #0406 ∙ La Jolla, California 92093-0406 Tel (858) 534-6270 ∙ Fax (858) 534-0091 ∙ artsandhumanities.ucsd.edu

November 29, 2020

To: Rajesh K. Gupta, Director, Halıcıoğlu Data Science Institute (HDSI)

From: Cristina Della Coletta, Dean Arts & Humanities

RE: Proposed PhD Degree in Data Science

Dear Professor Gupta:

I am very pleased to offer my support for the creation of a Doctoral Degree program in “Data Science” (PhD/DS) at UC San Diego.

The proposal frames the program’s core objectives very clearly around three main competency areas, namely, to train students to (a) collect raw data for computational modeling and analysis; (b) appropriately use algorithms in a specific domain by developing effective optimization methods; and (c) interpret, analyze, and visualize the results of these algorithms to complete relevant scientific inquiry.

The program is designed around multiple specialization tracks, in order to allow students from diverse academic backgrounds to both develop shared core competencies and explore domain-elective courses.

As noted in the proposal, the demand for PhD education in Data Science is growing across multiple institutions. The structure of the HDSI proposed program is especially nimble and innovative, as it will allow doctoral students to train across various graduate programs through foundational, core and elective course offerings, in partnership with other Academic Units. This feature makes the program especially competitive.

Not only will the PhD program in Data Science create timely transdisciplinary opportunities for many students; it will also play a crucial role in serving underserved and economically-disadvantaged student populations, thanks to the graduate scholarships offered by HDSI as a part of the Institute’s foundation-supported initiatives.

The proposal for the PhD in Data Science is well-argued and meticulously presented. I believe the PhD degree program in Data Sciences will provide a welcome addition to graduate studies at UC San Diego. I look forward to seeing this program take off, and to further opportunities of collaboration between the Division of Arts and Humanities and HDSI.

Sincerely,

Cristina Della Coletta Dean, Arts & Humanities

Cheryl A. M. Anderson, PhD, MPH, MS

Professor and Dean • Herbert Wertheim School of Public Health and Human Longevity Science

UC San Diego • 9500 Gilman Drive # 0628 • La Jolla, California 92093 • Tel: (858) 534-8363

November 29, 2020

Rajesh K. Gupta, Director
UC San Diego Halıcıoğlu Data Science Institute

Dear Dr. Gupta,

On behalf of the Herbert Wertheim School of Public Health and Human Longevity Science (HWSPH), I am pleased to offer support for your proposal for a program of graduate study in data science leading to a Doctor of Philosophy in Data Science (PhD/DS).

Thank you for the opportunity to review and comment on your proposal. Strengths of this proposal are that it addresses an emerging field of study for which there is a demand for training, it is highly relevant to a wide range of industries, uses a campus-wide collaborative approach, and I see it as complementary to the degrees we offer in the HWSPH. It was also great to have faculty from the HWSPH’s Biostatistics and Bioinformatics group (Drs. Xu and Schwartzman) included in the planning process.

I offer my best wishes as you create this important training program, and look forward to supporting it when it is approved.

Sincerely,

Cheryl A. M. Anderson, PhD, MPH, MS
Professor and Dean
Herbert Wertheim School of Public Health and Human Longevity Science

Lisa D. Ordóñez, PhD
Dean
Stanley and Pauline Foster Endowed Chair
Rady School of Management
9500 Gilman Drive # 0553
La Jolla, California 92093-0553
Tel: (858) 822-0830
[email protected]
rady.ucsd.edu

Dec. 1, 2020

To: Graduate Council

From: Dean Lisa Ordóñez

Re: Doctor of Philosophy in Data Science

Dear Colleagues,

Several faculty members at Rady and I have had an opportunity to review HDSI’s proposed degree program for a Doctor of Philosophy in Data Science. We are very supportive of the proposal and believe it will be well received by other units on campus. This proposal is a timely one and will help address an unmet and growing need for graduate education in data science.

There is a lot to like in the details of HDSI’s proposal. First, a fundamental goal of the program is to “lay the foundation for future researchers who can expand the boundaries of knowledge in Data Science itself”. This is an important aspect that will help produce capable researchers who will become leaders in theory and practice of data science and advance the emerging field.

We view the PhD in Data Science program as complementary to our degree programs and believe the experience of graduate students focusing in the areas of data science and its applications will be positively impacted by its existence. For instance, our students will mutually benefit from some of the new graduate courses that are created as part of this proposal. While our students typically take their breadth electives within Rady, some students seek courses in departments across campus. These electives require approval by program directors who assess fit.

In closing, I am supportive of the Doctor of Philosophy in Data Science program presented in this proposal. It is very well thought out and designed with aspects that make it unique within UC San Diego. I anticipate that the program will be successful in achieving its goals.

Best regards,

Lisa D. Ordóñez, PhD

Dean, Rady School of Management

Stanley and Pauline Foster Endowed Chair

UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD

BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO • SANTA BARBARA • SANTA CRUZ

Professor James McKernan, Chair
Department of Mathematics
9500 Gilman Drive # 0112
La Jolla, CA 92093–0112

Tel: (858) 534-6347
Fax: (858) 534-5273

Email: [email protected]
http://www.math.ucsd.edu/~jmckerna/

October 30, 2020

Professor Rajesh Gupta
Director HDSI
UCSD

Dear Professor Gupta,

I am writing to express the Mathematics Department’s support for the new proposed Doctor of Philosophy in Data Science (PhD/DS) to be offered by the Halicioglu Data Science Institute (HDSI) in collaboration with various academic units on the UC San Diego campus.

The proposed program will give its students the knowledge, skills and awareness required to perform data-driven tasks to do research which will expand the boundaries of Data Science. This training and research will provide students with the knowledge, skills and research expertise for a career in Data Science, in academia, industry or the civil service.

As mathematics and statistics are integral components of the interdisciplinary field of Data Science, and indeed many UCSD Mathematics Department faculty are affiliated with HDSI, the department is pleased to further cement the connections that this proposal will make with the Mathematics Department. In particular, many courses (including DSC 205, 210, 211, 212, 213, 240, 241, 242, 243, 244, 281ABC, 284, 285, 287AB) will often be taught by faculty who have joint appointments with HDSI and mathematics.

The Mathematics Department looks forward to cooperating with HDSI on this program to further catalyze connections and collaborations related to data science.

Yours sincerely,

James McKernan, FRS
Department Chair
Charles Lee Powell Endowed Chair in Mathematics

UNIVERSITY OF CALIFORNIA, SAN DIEGO

BERKELEY · DAVIS · IRVINE · LOS ANGELES · RIVERSIDE · SAN DIEGO · SAN FRANCISCO · SANTA BARBARA · SANTA CRUZ

JONATHAN COHEN

PROFESSOR AND CHAIR

DEPARTMENT OF PHILOSOPHY

9500 GILMAN DRIVE, DEPT. 0119

LA JOLLA, CALIFORNIA 92093–0119

(760) 814-1110

FAX: (858) 534-8566

[email protected]://aardvark.ucsd.edu

October 8, 2020

UC San Diego Academic Senate

Dear Committee Members:

On behalf of the Department of Philosophy, I write to offer support for the proposal for a PhD program in Data Science that would be housed within Halicioglu Data Science Institute (HDSI).

Data science is clearly an important emerging field with connections to many areas of intellectual inquiry spread across our University. The creation of a PhD program that would capitalize on these resources in a way that benefits a new generation of scholars is an exciting prospect.

We hope and expect that the establishment of such a program will lead to further cooperation in research and instruction between Philosophy and HDSI in the areas of causal discovery, machine learning, data ethics, and more. We look forward to discussing ways in which we might contribute as the shape of the new program becomes clearer.

We are confident that HDSI has the infrastructure and expertise to run the proposed PhD program. Our Department does not expect that the program will negatively impact our research or pedagogical missions at any level, and so endorses the proposal without reservations.

Please feel free to contact me for any additional questions.

Sincerely,

Jonathan Cohen, Professor and Chair

DEPARTMENT OF COMMUNICATION, MC0503
9500 GILMAN DRIVE
LA JOLLA, CALIFORNIA 92093-0503
OFFICE: (858) 534-0234
FAX: (858) 534-7315

To: Rajesh K. Gupta, Director, Halıcıoğlu Data Science Institute (HDSI)
From: Brian Goldfarb, Associate Prof. and Chair, Department of Communication
Subject: Support for proposed doctoral degree program in “Data Science” (PhD/DS)

October 26, 2020

Dear Rajesh,

I am writing to express support from the Department of Communication for the proposed doctoral degree program in “Data Science” (PhD/DS) to be offered by the Halıcıoğlu Data Science Institute (HDSI). As a key partner to HDSI in building Data Science, the Communication Department views the proposed program as an important step in advancing interdisciplinary cross-fertilization between the two units. Our department has two faculty affiliated with HDSI: Kelly Gates and Lilly Irani, and one, Stuart Geiger, who holds a joint appointment across the two units. Since joining the faculty this fall, Prof. Geiger has been working to establish a working group on Critical Data Studies which promises to build a tighter fabric of connections between HDSI and Communication. The creation of the proposed PhD promises to set up a platform to expand the interactions among our faculty around research and graduate advising/mentorship.

The proposal lays out a plan for a program with clear standards for academic rigor. The curriculum includes a well-considered set of requirements as well as options that balance the establishment of shared scholarly concerns with cross-fertilization of contributing disciplines. The proposal also articulates the impressive scope and strengths of faculty who would participate in the program and establishes a reassuring picture of the adequacy of the facilities that will be dedicated to research and teaching. Finally, the initial success of the undergraduate program and the interest from graduate students in affiliated departments signal that HDSI can anticipate a strong applicant pool, while the growth of the field bodes well for the placement prospects for its graduates.

In summary, I confirm support of this proposal and look forward to the opportunities for collaborations between faculty and students in our Department and HDSI.

Sincerely,

Brian Goldfarb, Assoc. Professor and Chair, Department of Communication

PROF. BILL LIN
ELECTRICAL & COMPUTER ENGINEERING
9500 GILMAN DRIVE, MAIL CODE 0407
LA JOLLA, CALIFORNIA 92093-0407
TEL: (858) 822-1383
E-MAIL: [email protected]

DATE: October 26, 2020
TO: Graduate Council
FROM: Bill Lin, Chair, Department of Electrical and Computer Engineering
RE: Doctoral Degree in Data Science

Dear Colleagues,

It is my pleasure to write this strong letter of support for the newly proposed Doctoral Degree in Data Science (PhD/DS), to be offered by the Halıcıoğlu Data Science Institute (HDSI). The proposed doctoral degree will serve the need for advanced graduate students in the area of Data Science. Demand for data scientists is clearly exploding in both academia and industry, as data science is being applied in all aspects of society. The proposed PhD/DS program in HDSI is very timely to serve this need. The proposed program is very strong in both quality and academic rigor. Further, HDSI is fully capable of administering this program given the size and expertise of the HDSI faculty as well as the facilities and budgets available to HDSI. Also, the exploding demand for data scientists in both academia and industry will ensure a strong applicant pool as well as exceptionally strong placement prospects for the graduates of the proposed PhD/DS program. In addition, the proposed HDSI PhD/DS program will facilitate closer engagements and collaborations between the faculty in HDSI and ECE, as well as other departments across campus. Overall, the proposed HDSI PhD/DS program will undoubtedly bring much greater visibility to UC San Diego as the preeminent university for artificial intelligence, machine learning, and data science. I look forward to cooperating with HDSI on this program as well as other initiatives.

Best regards,

Bill Lin, Chair Electrical and Computer Engineering Department University of California, San Diego

KUN ZHANG, PH.D. TELEPHONE (858) 822-7876 PROFESSOR FAX (858) 534-5722 CHAIRMAN E-MAIL: [email protected] DEPARTMENT OF BIOENGINEERING 9500 GILMAN DRIVE 0412 LA JOLLA, CALIFORNIA 92093-0412

November 29, 2020 To: Professor Rajesh Gupta, Director, HDSI at UCSD

Re: Data Science PhD Program

I am writing on behalf of the Department of Bioengineering to enthusiastically endorse and support the graduate program Doctor of Philosophy in Data Science.

Given the vast importance of Data Sciences in modern society, it is imperative that we train qualified professionals who can join the workforce solving problems where big data is the paradigm. I have reviewed your proposed program and the design of the curriculum is excellent and will be ideal for training students.

The proposed PhD Program will benefit from the interactions with Bioengineering and our top-notch Bioengineering graduate program. We anticipate that students in your PhD Program will have the option to take a number of Bioengineering courses related to systems biology, genomics and imaging. I am also excited that we have an outstanding joint hire in Ben Smarr, who will serve as a bridge between our programs. I should also add that my colleague Dr. Shankar Subramaniam is launching a new course on Biomedical Data Sciences in the academic year 2020-21, which would be valuable to the two recently proposed MS Programs in BE and HDSI, as well as this PhD program.

Several other courses offered by Bioengineering including graduate courses in technologies that generate vast data in biomedicine and complex modeling courses that transform data into knowledge would be valuable for our Programs.

I look forward to working with you and helping make the PhD/DS graduate program a harbinger of the future. Best regards,

Kun Zhang Professor and Chair Department of Bioengineering

DATE: 11/29/2020

FROM: Sorin Lerner, Chair, Department of Computer Science and Engineering

TO: Rajesh K. Gupta, Director, Halıcıoğlu Data Science Institute (HDSI)

Dear Rajesh,

The department of Computer Science and Engineering fully supports the proposed PhD program in Data Science. Since HDSI is a unit that has faculty who conduct research with PhD students, it only makes sense that HDSI have its own PhD program. We are happy to work with HDSI on areas of common interest, including Algorithms, Artificial Intelligence, Machine Learning, Data Infrastructure and Systems, and application areas of common interest. There are many opportunities for broadening the course offerings at UCSD, and the course offerings in Data Science will benefit the PhD and MS students in CSE. Similarly, some of the graduate classes in CSE, such as the CSE 202 course on Algorithms (but others too), will be of interest to Data Science PhD students, and we are happy to make seats available to the Data Science PhD students in those classes.

Sincerely,

Sorin Lerner

STEFAN LEUTGEB, Ph.D. PACIFIC HALL

CHAIR AND PROFESSOR ROOM 3225A

NEUROBIOLOGY SECTION 9500 GILMAN DRIVE

DIVISION OF BIOLOGICAL SCIENCES LA JOLLA, CA 92093-0357

TEL: (858) 246-0824

FAX: (858) 534-7309

EMAIL: [email protected]

Nov 2, 2020

Rajesh K. Gupta

Director

Halıcıoğlu Data Science Institute

Dear Rajesh,

I write to express my highest enthusiasm for the proposed doctoral degree program in Data Sciences by the Halıcıoğlu Data Science Institute (HDSI). As you know, there is a strong connection of Neurobiology research with many of the core themes of HDSI, including artificial intelligence, machine learning, and scientific discovery. To take advantage of these shared interests, Neurobiology and HDSI have hired several faculty at the intersection between our fields over just the past three years, including Gal Mishne and Alex Cloninger at HDSI and Marcus Benna and Yonatan Aljadeff in Neurobiology. In addition, our first joint faculty member, Mikio Aoi, has just been hired and arrived on campus.

A PhD program in Data Sciences will fill a major gap in our current offerings of PhD programs, in that your program will not be merely geared towards prospective students in mathematics, computer science and engineering but will also attract those with an avid interest in one of the application sciences such as chemistry and biology. From our perspective, computational neuroscience, in combination with big data from neural recordings, is a discipline that strongly benefits from the integration with data science, and your PhD program will be unique in attracting students at this intersection between disciplines. Conversely, many of the foundational ideas for machine learning have at least a loose analogue in circuit mechanisms that are used by the brain, and there is an enormous potential in further applying findings from rigorous experimental research to engineering applications. These are new frontiers that can be effectively approached by the type of PhD applicant that only your program can attract, such as students with a background in both the life sciences and in computer sciences. Importantly, there is also an increasing number of prospective employers in both academia and industry who are in need of a workforce who can lead projects in data analytics in fields that include molecular biology, biochemistry, and neurobiology.

HDSI has already brought together an impressively interdisciplinary group of faculty who have the expertise to train a new generation of data scientists. Including trainees at the doctoral level is particularly valuable, because PhD students not only contribute to the training of students at the undergraduate and Masters level but are also invaluable for the research mission and the continued national and international leadership of faculty at HDSI. The launch of your proposed PhD program is therefore well timed in that all the necessary expertise across disciplines is now in place so that there will be a rewarding interaction that will further strengthen the status of HDSI as one of the premier institutions of its kind. Based on the faculty with appointments and joint appointments in HDSI, their expertise is well suited to teach the range of classes and seminars that are proposed. Taken together, the PhD program in Data Sciences will not only lead to a rigorous education of the students in the program but also fill a gap that is currently not covered by more specialized PhD programs in the analytical and application disciplines. By admitting students who can bridge gaps between these established programs, numerous PhD programs will be substantially strengthened by the cohort of students that can go between these diverse fields.

Given that close collaborations between HDSI and our department have already been developing among faculty, I anticipate that your PhD program will only further foster these interactions and become a pillar for the type of interdisciplinary science that UC San Diego stands for. Such interdisciplinarity will benefit the entire campus community and particularly students at all levels within your program as well as beyond your program. I therefore most enthusiastically support the addition of a PhD program in Data Sciences.

Sincerely,

Stefan Leutgeb, PhD

Professor and Chair

Neurobiology Section, Division of Biological Sciences

Fellow, Kavli Institute for Brain and Mind

University of California, San Diego

UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO SANTA BARBARA • SANTA CRUZ

THAD KOUSSER DEPT. OF POLITICAL SCIENCE 0521 Professor and Department Chair 9500 Gilman Drive E-MAIL: [email protected] La Jolla, California 92093-0521 TEL: (858) 534-3239 FAX: (858) 534-7130

October 30, 2020 Dear Academic Senate Members,

As chair of the Department of Political Science, I am writing to voice my department’s strong support for the Halıcıoğlu Data Science Institute’s Proposal for a Program of Study in Data Science Leading to a Degree in Doctor of Philosophy in Data Science. We have reviewed this proposal and are excited about its potential to be both a rigorous and successful program in its own right and to serve as a central force uniting other campus strengths in bolstering UC San Diego’s emerging leadership in data science education and research.

We believe that this proposal lays out a rigorous course of study in data science that the faculty associated with the HDSI, which includes some of our faculty, are highly qualified to deliver. It will deliver foundational skills early in the program and important applied machine learning skills as students progress. We are especially encouraged by the inclusion of a course in Arts, Humanities, Society, Policy and Social Sciences, which will connect students with the diverse disciplinary strengths of our campus.

We believe that students graduating with a Ph.D. in the proposed data science degree would have strong placement prospects both within and outside of academia. Within academia, there is an increasing demand across disciplines to hire faculty with data science expertise. The Department of Political Science has itself hired successfully in data science and is continuously looking to expand the group of political methodologists with a data science focus. Data scientists are also in high demand in the non-academic sector, and supply is still limited. The Political Science Department has been very successful in placing its students with data science expertise in a variety of companies, ranging from Amazon to Facebook to Google. We therefore believe that students who graduate with a dedicated data science degree would have very strong non-academic job prospects. The proposed program also offers potential synergy effects across departments on campus. A data science PhD program would offer additional courses that would be attractive to PhD students from other programs. At the same time, Data Science could potentially draw from existing courses that fit very well in the proposed curriculum. If I can answer any further questions about this matter, please feel free to call me at 858-246-0721 or to email me at [email protected].

Sincerely,

Thad Kousser

October 28, 2020

RE: Support for proposed PhD Program in Data Science

Dear Dr. Gupta:

I am writing in enthusiastic support of the PhD Program in Data Science currently proposed at UC San Diego. In my capacity as a Co-Director of the UC Davis Center for Data Science and Artificial Intelligence Research and as chair of our campus-wide Data Science Steering Committee, I have been closely following the developments at UC San Diego and the HDSI, viewing them as remarkably useful and exemplary for our own deliberations. I believe UCSD has gotten crucial decisions right in the past and is in the process of adding another successful piece to its data science portfolio. The planned introduction of a PhD Program in Data Science is the natural next step for your campus and completes the educational data science infrastructure, complementing the already existing undergraduate major and the recently approved MS program. While the latter programs help provide industry, government and academia with graduates versed in the application of diverse data scientific tools, a maturing of the field will require mapping out and building the intellectual foundations that make data science a unique new field, and placing it within the existing disciplinary landscape. This is best done in conjunction with the development of a strong PhD program that allows for this process to play out in a coordinated yet flexible fashion. Outside of academia, future PhD graduates will take on leadership positions in industry and government that require more data science expertise than expected of those with undergraduate and MS degrees. I imagine the presence of the HDSI will enable a seamless integration of academia, industry and government interests into a coherent whole. Given the all-encompassing role data science is expected to play in the future, the PhD program will be of great service to all constituents at UC San Diego.
It will in particular help satisfy student demand for rigorous PhD-level training in data science. I specifically like the proposers’ thoughtfulness in defining the aims of the PhD program and devising the curriculum, clearly bearing in mind the transdisciplinarity and evolving nature of the field, within the UC San Diego data science ecosystem and beyond. The strategy laid out in the proposal made available to me is sound and constitutes a broad consensus of the involved parties. It is also laudable that guidelines put forward by the National Academies have been followed in mapping out the structure of the coursework, inclusive of important ethical components. The proposed curriculum will serve future PhD students in Data Science well. Faculty members listed as having teaching responsibilities in HDSI programs include renowned experts and leaders in their fields and should ensure the highest quality of instruction. I was pleased to see the on-ramping options that will allow students from diverse backgrounds to enter the program. Once the program is in existence, I will make sure that undergraduate and MS students in Statistics and other disciplines at UC Davis are made aware of this exciting new opportunity for graduate education at UC San Diego. Overall, I view UC San Diego as primed to play a major role in data science research and education on the national level. The PhD program is the last piece missing to complete the full educational pipeline. The proposal is well thought out and administered by leading faculty at one of the foremost data science institutes. The PhD degree will be an outstanding addition to the portfolio of graduate degrees on your campus, providing your students with a pathway to the high-level jobs in data-intensive fields the US needs to cultivate in order to ensure a prosperous as well as equitable future. The proposal has my enthusiastic support. Please let me know if you should have any further questions. Yours sincerely,

Alexander Aue

Professor and Chair Department of Statistics

Co-Director Center for Data Science and Artificial Intelligence Research

University of California, Davis

+1-530-752-0560 [email protected]

Carnegie Mellon
Department of Statistics & Data Science
232 Baker Hall
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

Larry Wasserman
UPMC Professor
(412) [email protected]/∼larry
October 24, 2020

Rajesh Gupta
HDSI Director
UCSD

Dear Professor Gupta:

I am writing in support of the proposal to create a doctoral degree program in HDSI.

Data science is one of the fastest growing areas in academia. The reason is that, with the flood of information that we now have, data science plays a role in every science and in understanding societal issues. Every statistics, data science and machine learning doctoral program that I know of has experienced an unprecedented increase in demand both in terms of applicants and in demand for graduates.

Increasing the capacity to service more doctoral candidates in data science is thus critical to the infrastructure of science and public policy. In short, we need more doctoral programs.

HDSI is well positioned to offer a doctoral program. There is already a B.Sc. and there is a large pool of talented faculty with an impressive array of research expertise. I have reviewed the proposal and it is clear that the proposed program has been clearly thought out. I should add that UCSD is unusual in that it does not have a statistics department. Having a doctoral program in data science will thus fill a serious gap.

In summary, the proposal to create a doctoral program in data science in HDSI is well supported and I strongly support this proposal.

Sincerely,

Larry Wasserman

The Foundation for The Gator Nation An Equal Opportunity Institution

UF Informatics Institute
E251 CISE Bldg
PO Box 118545
Gainesville, FL 32611-8545
352-294-3912

October 26, 2020

Rajesh Gupta, Ph.D.
Distinguished Professor
HDSI Founding Director
UC San Diego

Dear Professor Gupta,

It is my pleasure to write a letter of support for the new Ph.D. program in Data Science proposed by the Halıcıoğlu Data Science Institute (HDSI) of UC San Diego. The demand for graduates in data science is very high in all sectors of the economy, even during the pandemic, and the need for Ph.D. level researchers is subsequently becoming evident in industry, as well as academia.

I have been involved with the design of two data science programs during my career. The first is the Masters in Data Science at the University of Michigan, which was launched in late 2015 while I was on the faculty there. It is jointly administered by the Departments of Computer Science and Statistics and the School of Information and provides training in both core methodologies (programming, data structures, data management, probability, statistical inference, data modeling, machine learning, optimization and computational methods) and domain expertise through elective coursework. The degree also requires a capstone course in which students do an end-to-end data science project involving understanding the scientific question of interest, data collection and curation, modeling and computation, and finally communication of the results through a written report and an oral presentation. The second data science program I have been involved with is the Data Science undergraduate major at the University of Florida, a program jointly administered by the Departments of Statistics, Mathematics and Computer Science. Its philosophy is analogous to the previous program and aims to provide training in core methodologies, but also to expose students to additional training in data science problems in specific domains (e.g. social sciences, natural sciences, public health) through specific thrusts.
The Ph.D. program proposal developed by HDSI is elaborate and well thought out: the proposed coursework covers state-of-the-art technical topics in depth and also provides exposure to a wide range of topics necessary to produce well-rounded data scientists. I was particularly impressed by the fact that the Ph.D. program will be open to students from various and diverse backgrounds and that it is designed to prepare them for success. To that end, students will attend (as needed) certain carefully designed preparatory classes on core methods (computing, mathematics and statistics). Hence, all incoming students will be brought to the same page by the end of first year, including students who are admitted with little quantitative background. UCSD has a large Department of Mathematics, but (surprisingly) it does not have a Department of Statistics. The Statistics group within the UCSD Mathematics Department is very strong in Mathematical Statistics. The proposed Ph.D. program in Data Science may thus provide an outlet for some top-quality, more computationally oriented and applications-oriented work coming out of UCSD. In summary, I believe this new Ph.D. is carefully designed to accommodate a wide range of students and thus it represents an exciting development for UCSD. I believe it will be highly successful and I fully support the proposal. Please do not hesitate to contact me should you need more information. Sincerely,

George Michailidis Founding Director, UF Informatics Institute U Florida Research Foundation Professor Professor of Statistics and Computer Science University of Florida

UNIVERSITY OF CALIFORNIA

BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO SANTA BARBARA • SANTA CRUZ

The Donald Bren School of Information and Computer Sciences
Department of Computer Science
Irvine, CA 92697-3435
Tel: (949) 824-0016
Fax: (949) 824-4056
www.ics.uci.edu

January 8th, 2021

I would like to commend UC San Diego and HDSI for their initiative to create a Ph.D. program in Data Sciences. This program, spearheaded by a set of very talented and dedicated faculty, will undoubtedly continue the meteoric trajectory UC San Diego is on in the important and timely area of data sciences. As a database faculty member with keen interest in data sciences, I have been closely monitoring UCSD's efforts led by HDSI over the past several years. It is now well recognized that data science is destined to be the catalyst for disruptive innovations in science and technology leading to unprecedented changes and improvements to all walks of modern society. We live in the time of a data revolution where machines, sensors, and a variety of data capture devices enable us to collect and monitor every aspect of our lives, whether they be personal experiences, health, social interactions, or our interactions with engineered and physical systems. The ability to automatically and seamlessly monitor social as well as physical worlds at various spatial and temporal granularities has created unprecedented opportunities leading to major data-centric innovations, new opportunities, new efficiencies, and new industries. Companies such as Google and Yahoo! have used such data to provide improved search and better personalized experiences for individuals on the internet, and have designed novel ways of monetizing and funding new ideas through placement of advertisements. Moving beyond internet companies, organizations such as health care providers, product companies, political activist groups, and news media have developed tools to monitor public opinion and feedback about their goods and services and use such feedback to launch new product lines or new models.
While the above emphasizes the role and impact of data-centric approaches in industry, their role in the future of science and technology, and in new discoveries whether they be in medicine, health sciences, oceanography, or cosmology, will be even more profound. UC San Diego was amongst the early schools to realize the central role data science was to play in the future of education, and now, with its proposed Ph.D. program, it is all set to lead the academic community in creating the foundational principles that form the core of data-driven explorations, as well as to expand the boundaries of knowledge and contribute to tools and techniques that will expand the nascent field of Data Sciences. While I cannot emphasize enough the timeliness of creating such a program and the very strong arguments the proposal makes as to how such a program will help not just UC San Diego but the academic community, what I found truly exceptional was how well thought out the operational plans for creating such a program are.

In particular, the proposal clearly articulates the important role of multidisciplinary research and education in such a program. Based on that realization, it is noteworthy that the leadership at HDSI systematically approached over 200 faculty (drawn from Engineering, Computer Science, Physical Sciences, Arts, Humanities, Social Sciences, Medicine, and Health), who are now part of the affiliates program, to create the cohesive, integrative vision of Data Sciences outlined in the proposal. The approach emphasizes not just the principles, algorithms, mathematical foundations, tools and technologies at the core of data-driven approaches, but provides an integrative view that incorporates domain sciences to set a path forward for doctoral dissertations based on interdisciplinary collaborations that open up opportunities for breakthroughs in areas such as physics, medicine, social sciences, etc. Indeed, the architects of the proposal have this view firmly in their minds when they observe two unique aspects of the proposed Ph.D. program compared to existing data science efforts at UC San Diego and other universities, which are typically part of Computer Science and Machine Learning. While existing efforts can be expected to advance algorithmic solutions, machine learning, and data management principles that form the theoretical underpinning of data science, a truly effective program (such as the one promoted by the proposal) must seek the involvement of researchers with multidisciplinary backgrounds who embrace an interdisciplinary curriculum, with faculty and students from disciplines interested in exploring data sciences in order to advance science and technology using data-driven approaches. Indeed, interactions of specialists and research at the boundaries between disciplines are where the largest advances in data sciences and the greatest benefits of data-driven approaches are expected to be.
In looking through the details of the program articulated in the proposal, it is clear that the proposal writers have done their homework and tried to strike a balance in terms of courses and requirements that highlights quality and academic rigor while at the same time ensuring the success of the program from the very beginning. As is always the case, additional or new needs will emerge when the program is launched. The proposal includes mechanisms necessary for such future adaptations based on emerging needs. With the faculty talent associated with the proposal, both at the leadership levels as well as excellent new hires associated with HDSI, and faculty affiliated with HDSI, I have no doubt that once the Ph.D. program is launched, it will be monitored and improved based on initial lessons learnt, and the progress of the program will set the example for other universities, including my own, UC Irvine, to follow in its footsteps. It is for all these reasons that I very enthusiastically support the proposed UCSD effort. The proposal is very well thought out. It addresses an emerging need and is the logical next step for HDSI as it establishes itself to be a center of excellence and leadership in data sciences. Please do not hesitate to contact me if I can be of further assistance.

Prof. Sharad Mehrotra
IEEE Fellow
Department of Computer Science
University of California, Irvine, CA 92617

UNIVERSITY OF CALIFORNIA, SAN DIEGO

BERKELEY · DAVIS · IRVINE · LOS ANGELES · MERCED · RIVERSIDE · SAN DIEGO · SAN FRANCISCO

SANTA BARBARA · SANTA CRUZ

MECHANICAL AND AEROSPACE ENGINEERING +1 (858) 534-0708
LA JOLLA, CALIFORNIA 92093

Mechanical and Aerospace Engineering
Jacobs School of Engineering
University of California
9500 Gilman Dr
La Jolla, California 92093
PHONE: +1 (858) 822-7930
EMAIL: [email protected] 29, 2020

To: Professor Rajesh Gupta, Director HDSI
Re: MAE247: Cooperative Control of Multi-Agent Systems
From: Jorge Cortes, Professor, Mechanical and Aerospace Engineering

Dear Rajesh,

This is to confirm that I support the listing of the graduate course “MAE247: Cooperative Control of Multi-Agent Systems” as an elective for the HDSI Masters program under the Networks specialization and warmly welcome qualified graduate students in the course.

Sincerely,

Jorge Cortes
Professor

UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD

DEPARTMENT OF POLITICAL SCIENCE, 0521 9500 GILMAN DRIVE TELEPHONE: (858) 534-6807 LA JOLLA, CALIFORNIA FAX: (858) 534-7130 92093-0521

BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO

SANTA BARBARA • SANTA CRUZ

April 29, 2020

Dear Colleagues:

To: Professor Rajesh Gupta, Director HDSI
Re: POLI 287: Multidisciplinary Methods in Political Science: Social Networks
From: James Fowler, Professor

This is to confirm that I support the listing of the above course as an elective for the HDSI Masters program under the Networks specialization and welcome qualified graduate students in this course.

Sincerely,

James H. Fowler Professor University of California, San Diego [email protected]

April 29, 2020

Professor Rajesh Gupta, Director HDSI UC San Diego [email protected]

Re: BNFO 286 / MED 283: Network Biology and Biomedicine

Dear Dr. Gupta,

This is to confirm that I support the listing of the above course as an elective for the Halıcıoğlu Data Science Institute (HDSI) Masters program under the Networks specialization and welcome qualified graduate students in this course.

Sincerely,

Trey Ideker, Ph.D.

UC San Diego Health Department of Medicine 9500 Gilman Drive MC-0688 La Jolla, CA 92093-0688 T: +1 858.822.4558 F: +1 858.534.4246 [email protected] idekerlab.ucsd.edu

Trey Ideker, Ph.D.

Professor of Medicine

Adjunct Professor of Bioengineering and Computer Science

Director, NCI Cancer Cell Map Initiative (CCMI)

Director, NIGMS National Resource for Network Biology (NRNB)

Director, NIMH Psychiatric Cell Map Initiative (PCMI)

UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD

BERKELEY ⋅ DAVIS ⋅ IRVINE ⋅ LOS ANGELES ⋅ MERCED ⋅ RIVERSIDE ⋅ SAN DIEGO ⋅ SAN FRANCISCO

9500 GILMAN DRIVE LA JOLLA, CALIFORNIA 92093-0404

FAX: (858) 534-7029

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF CALIFORNIA, SAN DIEGO THE IRWIN AND JOAN JACOBS SCHOOL OF ENGINEERING

SANTA BARBARA ⋅ SANTA CRUZ

May 4, 2020

To: Prof. Rajesh Gupta, Director HDSI

This is to confirm that I support the listing of the following courses that I teach as electives for the HDSI MS program under the Bio specialization and welcome qualified graduate students in these courses.

1. CSE280A: Algorithms for population genetics
2. CSE283/Beng203

Sincerely,

Vineet Bafna, PhD Professor, CSE, #4218 UC San Diego 9500 Gilman Drive La Jolla, CA 92093-0404 [email protected] 858-822-4978 (O) 858-534-7029 (F) http://www.cs.ucsd.edu/~vbafna

UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD

BERKELEY · DAVIS · IRVINE · LOS ANGELES · MERCED · RIVERSIDE · SAN DIEGO · SAN FRANCISCO SANTA BARBARA · SANTA CRUZ

DEPARTMENT OF MATHEMATICS +1 (858) 534-4889
9500 GILMAN DRIVE, #0112 FAX +1 (858) 534-5273
LA JOLLA, CALIFORNIA 92093-0112 EMAIL [email protected]

May 3, 2020

Dear Colleagues,

To: Professor Rajesh Gupta, Director HDSI

This is to confirm that I support the listing of the course Math 277A: Topics in Computational and Applied Math: Diffusion Geometry and Metric Graph Learning as an elective for the HDSI Masters program under the Networks specialization, and welcome qualified graduate students in the course.

Sincerely yours,

Alexander Cloninger, Ph.D.
Assistant Professor of Mathematics and Halıcıoğlu Data Science Institute
University of California, San Diego

DEPARTMENT OF COGNITIVE SCIENCE 9500 GILMAN DRIVE, 0515 858-822-3317 LA JOLLA, CALIFORNIA 92093-0515 [email protected]

MAY 1, 2020

Dear Colleagues:

To: Professor Rajesh Gupta, Director HDSI
Re: COGS 243: Statistical Inference and Data Analysis (4 units)
From: Angela Yu, Associate Professor

This is to confirm that I support the listing of the above course as a general elective for the HDSI Masters program under Group B (Core Knowledge and Skills Areas), and welcome qualified graduate students in this course.

Sincerely,

Angela Yu, PhD
Associate Professor
University of California, San Diego

BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO SANTA BARBARA • SANTA CRUZ

BRADLEY VOYTEK ASSOCIATE PROFESSOR UC SAN DIEGO

9500 GILMAN DR. LA JOLLA, CA

92093-0515

2020 May 01

To: Professor Rajesh Gupta, Director, Halıcıoğlu Data Science Institute Re: COGS 280: Neural Oscillations From: Bradley Voytek, Associate Professor

This letter confirms my support for including the above class, COGS 280: Neural Oscillations, as an elective for the Halıcıoğlu Data Science Institute Master program, Computational Neuroscience Specialization Area.

I look forward to working with students from HDSI!

Sincerely,

Bradley Voytek, Ph.D.

UC San Diego Department of Cognitive Science Halıcıoğlu Data Science Institute Neurosciences Graduate Program

Cc: Professor Doug Nitz, Chair, Cognitive Science Jennifer Morgan, MSO, HDSI

9500 GILMAN DRIVE

LA JOLLA, CA 92093-0515

To: Professor Rajesh Gupta, Director, Halıcıoğlu Data Science Institute

Re: COGS 225: Image Recognition

From: Zhuowen Tu, Professor

This letter confirms my support for including the above class, COGS 225: Image Recognition, as an elective for the Halıcıoğlu Data Science Institute Master program, Computational Neuroscience Specialization Area.

Sincerely,

Zhuowen Tu

Professor

Department of Cognitive Science,

Department of Computer Science and Engineering (affiliate)

University of California, San Diego

Email: [email protected]

Tel: +1-858-822-0908

Cc:

Professor Doug Nitz, Chair, Cognitive Science

Jennifer Morgan, MSO, HDSI

DEPARTMENT OF COGNITIVE SCIENCE, 0515

La Jolla, CA 92093 Fax: (858) 534-1128


SIAVASH MIRARABBAYGI
ASSISTANT PROFESSOR OF ELECTRICAL AND COMPUTER ENGINEERING
9500 GILMAN DRIVE, MC 0407
LA JOLLA, CA 92093
PHONE: (858) 822-6245
E-MAIL: [email protected]

May 4, 2020

To: Prof. Rajesh Gupta, Director HDSI
Re: ECE208: Computational Evolutionary Biology
From: Siavash Mirarab, Assistant Professor, ECE

This is to confirm that I support the listing of the above course that I teach for the graduate program in ECE at the Masters level as an elective for the HDSI Masters program under any appropriate specialization and welcome qualified graduate students in this course.

With kindest regards

Sincerely,

Siavash Mirarab


MASSIMO FRANCESCHETTI
PROFESSOR OF ELECTRICAL ENGINEERING
9500 GILMAN DRIVE, MC 0407
LA JOLLA, CALIFORNIA 92093-0407
PHONE: (858) 822-2284
FAX: (858) 534-2486
E-MAIL: [email protected]

April 30, 2020

To: Prof. Rajesh Gupta, Director HDSI
Re: ECE227: Big Network Data
From: Massimo Franceschetti, Professor, ECE

This is to confirm that I support the listing of the above course that I teach for the machine learning and data science graduate program in ECE at the Masters level as an elective for the HDSI Masters program under the Networks specialization and welcome qualified graduate students in this course.

With kindest regards

Massimo Franceschetti

Appendix C: Catalogue Copy Description [Draft]

Data Science (DSC)

All courses, faculty listings, and curricular and degree requirements described herein are subject to change or deletion without notice. Updates may be found on the Academic Senate website: http://senate.ucsd.edu/catalog-copy/approved-updates/.

The Graduate Program

The graduate program offers a master of science degree and a doctor of philosophy degree in data science. To be accepted into either course of study, a student should have a BS or BA degree in relevant fields or work experience in Data Science, or be able to demonstrate an equivalent competency.

Admission to the graduate program is done through the Graduate Division, UC San Diego. The application deadline is in December. Admissions are always effective the following fall quarter. For admission deadline and requirements, please refer to the departmental web page: http://datascience.ucsd.edu.

Admission decisions for the MS and PhD programs are made separately. A current MS student who wishes to enter the PhD program must submit a petition, including a new statement of purpose and three new letters of recommendation, to the HDSI graduate admissions committee.

Data Science Program

The field of Data Science spans mathematical models, computational methods and analysis tools for navigating and understanding data and applying these skills to a broad and emerging range of application domains. A whole range of industries, from drug discovery to healthcare management, from manufacturing to enterprise business processes, as well as government organizations, are creating demand for data scientists with a skill set that enables them to create mathematical models of data, identify trends and patterns using suitable algorithms and present the results effectively. The target systems can be, for example, biological (e.g., clinical data from cancer patients), physical (e.g., transportation networks), social (e.g., social networks) or cyber-physical (e.g., smart grids). In all these cases, there is a combination of core knowledge in information processing coupled with the skills to abstract, build and test predictive and descriptive models that must be taught and learnt in the context of an application domain. These application areas are in many domains served by Engineering, Physical Sciences, Social Sciences, Health & Life Sciences, and Arts & Humanities.

Doctor of Philosophy Program

The goal of the doctoral program is to create leaders in the field of Data Science who will lay the foundation and expand the boundaries of knowledge in the field.

Course Requirements

There are Foundation, Core, Elective and Research requirements for the graduate program. These course requirements are intended to ensure that students are exposed to (1) fundamental concepts and tools (Foundation), (2) advanced, up-to-date views in topics central to Data Science for all students (the Core requirement), and (3) a deep, current view of their research or application area (the Elective requirement). Courses may not fulfill more than one requirement.

The doctoral program is structured as a total of 52 units in courses grouped into foundational, core, professional preparation and research experience areas as described below. Successful completion of the program requires successful and timely completion of three examinations and completion of a doctoral dissertation. Out of the 52 units, 48 units must be taken for letter grade and at least 40 units must come from graduate-level courses. Out of the 12 courses, at least 10 must be graduate-level courses; at most two can be upper-level undergraduate courses. 36 units or 9 courses must be completed within six quarters from the start of the degree program. Courses are organized into three groups: Group A, Group B and Group C. Group A courses are introductory level courses taught at the level of undergraduate senior or mezzanine courses. Group B are core graduate level courses with prerequisites from Group A courses. Group C are advanced, specialized and free-standing courses, often part of the required courses in the Data Science specialization of Graduate Program in other departments. In all three groups, required courses are indicated as such; they cannot be substituted by other courses without exception approval from the graduate program committee.
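The unit and course counts stated above can be checked mechanically. The following Python sketch is illustrative only and is not part of the catalog copy: the `Course` record and the `unmet_unit_rules` function are hypothetical names, and the sketch assumes that the 12 courses are the letter-graded ones, so that the graduate-level counts apply to letter-graded courses only. The six-quarter timing rule is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Course:
    """Hypothetical record for one completed course."""
    code: str
    units: int
    graduate_level: bool  # False means upper-level undergraduate
    letter_graded: bool   # False means taken S/U


def unmet_unit_rules(courses):
    """Return the unmet rules; an empty list means every check passes."""
    graded = [c for c in courses if c.letter_graded]
    total_units = sum(c.units for c in courses)
    letter_units = sum(c.units for c in graded)
    # Assumption: graduate-level counts apply to letter-graded courses only.
    grad_units = sum(c.units for c in graded if c.graduate_level)
    grad_courses = sum(1 for c in graded if c.graduate_level)
    undergrad_courses = len(graded) - grad_courses

    problems = []
    if total_units < 52:
        problems.append("fewer than 52 total units")
    if letter_units < 48:
        problems.append("fewer than 48 letter-graded units")
    if grad_units < 40:
        problems.append("fewer than 40 graduate-level units")
    if grad_courses < 10:
        problems.append("fewer than 10 graduate-level courses")
    if undergrad_courses > 2:
        problems.append("more than 2 upper-level undergraduate courses")
    return problems
```

Under these assumptions, a plan of ten graduate-level and two upper-level undergraduate 4-unit courses taken for a letter grade, plus the 4 units of S/U professional preparation courses, passes every check.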

Group A: Preparatory Courses

There are five important knowledge and skill areas necessary for understanding (and advancing) core data science. It is, therefore, important that all our entering students either have background preparation or have courses available in the program to ensure a successful completion of the stipulated doctoral degree program. A student can receive credit towards the Ph.D. degree for a maximum of three courses from the list of courses below:

1. DSC 200: Data Science Programming
2. DSC 202: Data Management for Data Science
3. DSC 210: Numerical Linear Algebra
4. DSC 211: Introduction to Optimization
5. DSC 212: Probability and Statistics for Data Science

Group B: Core Courses

Four core courses are required for all Ph.D. students, including those with a Bachelors in Data Science. The four required courses are:

1. DSC 240: Machine Learning
2. DSC 260: Data Ethics and Fairness
3. DSC 241: Statistical Models (or MATH 282B)
4. DSC 204A: Scalable Data Systems (or CSE 202)

In addition, a doctoral student must select at least 2 out of the following 8 core courses:

1. DSC 203: Data Visualization and Scalable Visual Analytics
2. DSC 204B: Big Data Analytics and Applications
3. DSC 242: High-dimensional Probability and Statistics
4. DSC 243: Advanced Optimization
5. DSC 244: Large-Scale Statistical Analysis
6. DSC 245: Introduction to Causal Inference
7. DSC 250: Advanced Data Mining
8. DSC 261: Responsible Data Science

Thus, together with Group A and Group C courses, doctoral students are required to take a minimum of 5 courses for letter-grade credit. At the other extreme, students can satisfy all letter-grade course requirements except for the professional preparation courses (teaching, survival skills and research seminar), which require satisfactory completion. These students are expected to enroll in individual research (DSC 298) in a section offered by the faculty advisor to meet residency requirements and maintain graduate student standing during the period of dissertation research.

Group C: Professional Preparation and Elective Courses

Group C courses aim to provide either practical experiences in chosen specialization areas, or advanced training for students preparing for doctoral programs. The courses include required professional preparation courses: 2 unit TA/tutor training (DSC 599), 1 unit of academic survival skills (DSC 295) and 1 unit faculty research seminar (DSC 293), all of which must be completed with a Satisfactory (S) grade using the S/U option.

Professional Preparation Courses

1. DSC 599: TA/Tutor Training
2. DSC 293: Faculty Research Seminar
3. DSC 294: Research Rotation
4. DSC 295: Academia Survival Skills

Elective and Specialization Courses


Students can choose a total of three 4-unit courses from the following elective or specialization tracks to complete course requirements.

DSC 205, DSC 231, DSC 251, DSC 252, DSC 253, DSC 254, DSC 213, CSE 234, MATH 181 A-B-C, MATH 284, MATH 285, MATH 287A-B, COGS 243.

Preliminary Advisory Assessment

The preliminary assessment is an advisory examination. It consists of an oral examination in an area selected by the student with the goal to assess the student's preparation for the proposed area, including several relevant topics, and identify any courses that are required or recommended for the candidate based on knowledge shown and critical missing background revealed. The preliminary examination must be completed before the start of the second year in the doctoral degree program. The examination dates are announced no later than the start of the Winter Quarter. A failing grade in the preliminary examination would include a recommendation that the student be given the opportunity to receive an MS in Data Science degree, provided the student meets the degree requirements in no more than one extra quarter over the standard time for the MS program; here we refer to the newly proposed degree of MS in Data Science (not its online version). Students who fail the preliminary examination may file a petition to retake it; if the petition is approved, they will be allowed to retake it one (and only one) more time.

After a student successfully completes the preliminary assessment examination, in the next annual review of the student (conducted annually in the Fall Quarter as a part of the Annual Faculty Retreat), the GradCom of the HDSI Faculty Council assigns the academic advisor to provide necessary updates to the GradCom and to help in setting up the doctoral dissertation committee.

Research Qualifying Examination (UQE)

A research qualifying examination (UQE) is conducted by the dissertation committee consisting of five or more members approved by the graduate division as per senate regulation 715(D). One senate faculty member must have a primary appointment in a department outside of HDSI. Faculty with 25% or less partial appointment in HDSI may be considered for meeting this requirement on an exceptional basis upon approval from the graduate division. The goal of the UQE is to assess the ability of the candidate to perform independent critical research as evidenced by a presentation and writing a technical report at the level of a peer-reviewed journal or conference publication. The research qualifying examination must be completed no later than the fourth year or 12 quarters from the start of the degree program; the UQE is tantamount to the advancement to PhD candidacy exam.

Dissertation Defense Examination

Students must successfully complete a final dissertation defense presentation and examination.


Students with Disabilities

In order for the program to respond, a student requiring accommodation for disability may make a request for accommodation upon submission of the student’s intent to apply to the Graduate Program. Declaration of any disability information is not part of the admissions review process and will not be a factor in admissions.

Information concerning accommodation requests is available at: https://disabilities.ucsd.edu/. Distance learning sites must confirm their ability to support students with disabilities.


Halicioglu Data Science Institute (HDSI) Bylaws – Preamble

UC San Diego has organized the Halicioglu Data Science Institute (HDSI) as an academic unit tasked with creation and operation of academic programs related to the field of Data Science, broadly defined as the study of mathematical models, computational methods and analysis tools for navigating, securing and understanding data, data-driven systems and decisions and applying these skills to all areas of human enquiry, creativity and applications in natural, social and engineered systems. Due to its breadth, Data Science is considered a transdisciplinary subject, that is, spanning and overlapping many existing disciplines such as Mathematics, Computer Science, Electrical Engineering as well as topical areas of Machine Learning, Artificial Intelligence, Data/Cyber Infrastructure and Digital Humanities.

Serving as the hub for data science talent and programs, HDSI builds upon unique strengths of UC San Diego. In particular, UC San Diego seeks institutional presence, both of UCSD in Data Science and of Data Science at UCSD, that benefits all existing units, departments, programs and schools with potential to contribute to the academic discipline. As an academic unit, HDSI's mission consists of three core components: (a) train talent in data science at all levels via courses, degree and professional training programs; (b) catalyze research in data science via integrative research projects and initiatives; and (c) cultivate an ecosystem of data science by engaging industry partners, non-profit and civic organizations with potential to contribute to data science practice.

Institutionally, HDSI is designated an academic unit with responsibilities that include functions carried out by traditional academic departments and schools. Accordingly, HDSI functions under a divisional budget model and direct oversight by the Academic Senate and administration. The Institute is also endowed by a founding gift to ensure HDSI is the hub for Data Science and is able to carry out its three-part mission by bringing the campus together.

Halicioglu Data Science Institute (HDSI) Bylaws

1st Draft 14 Oct. 2019; this version May 1, 2020

1. INTRODUCTION

Preamble

The authority for the departments to establish a form of departmental governance is established by Academic Senate Bylaws (Part I), Title VI, Bylaw 55.A:

“According to the Standing Orders of The Regents, ‘... the several departments of the University, with the approval of the President, shall determine their own form of administrative organization.... No department shall be organized in such a way that would deny to any of its non-emeritae/i faculty who are voting members of the Academic Senate, as specified in Standing Order 105.1(a), the right to vote on substantial departmental questions...’”

The source documents, referred to as “Higher Authority” for the Bylaws, are the UCSD PPM, the UC APM and the UC and UCSD Academic Senate Bylaws. Important, frequently used policies that are specified in detail in these documents are described and referenced here, along with Department policies that fill in additional details. In the case of discrepancies, Higher Authority takes precedence.

Principles

Responsibilities

These bylaws describe procedures for discharging faculty responsibilities in an academic unit. Some responsibilities are assigned by the UCSD Policies and Procedures Manual (PPM) specifically to the faculty as a whole, others to the Director, and some are assigned to both. These bylaws describe responsibilities, the rules and guidelines for performing a responsibility, and the selection of faculty members who will perform a role that fulfills a responsibility. A basic principle that has been followed in the case of shared responsibilities is for the Director to initiate decisions and actions, and for the faculty to approve them or take remedial action. On all academic matters, including courses, degree programs, senate faculty appointments, faculty approval as a group is a necessary requirement without any exceptions.

Delegation of Responsibilities

In order to facilitate effective governance, HDSI may have Associate and/or Assistant Directors, and permanent committees to which the faculty may delegate some of their responsibilities and on which the Director may rely for advice. The Director shall have the authority to create committees, and Associate and/or Assistant Director positions, subject to relevant university regulations, and to choose the faculty to serve in those positions. The appointment of Associate Director(s) is done by the Chancellor in consultation with the Director. Associate Directors may assist the Director in carrying out the Director’s responsibilities as Director, but the responsibilities themselves may not be delegated.

Furthermore:

(i) all such entities shall have a written statement of their responsibilities;

(ii) faculty may request a Departmental vote on the creation of new entities or on significant changes that are made to the responsibilities of those entities if some or all of these responsibilities are not explicitly allocated to the Director;

(iii) committees, Associate/Assistant Director(s) and the Director shall keep the faculty informed of all significant decisions or actions;

(iv) a faculty member may request a Departmental vote, and by a regular approval majority vote overturn a committee decision or action over which the faculty has full or shared jurisdiction.

Structure of the Bylaws

The Bylaws are organized into the following sections.

1. Introduction
2. General Voting Definitions and Procedures
3. Faculty Meetings
4. Directors and Associate Directors
5. Standing Committees
6. Faculty Appointments
7. Faculty Promotions
8. Miscellaneous and Infrequent Responsibilities
9. Bylaw Changes

2. GENERAL VOTING DEFINITIONS AND PROCEDURES

Voting

The HDSI Faculty Council consists of all faculty members who are members of the Academic Senate and hold a full or partial (even 0%) appointment with HDSI; HDSI fellows are also members of the HDSI Faculty Council. Faculty members whose HDSI appointment is solely as adjunct or affiliate are not part of the HDSI Faculty Council.

A faculty member is considered to be in residence for the purposes of a vote if the member is neither on leave from HDSI or the university, nor on sabbatical away from the campus, in the quarter during which the vote is taken.

The HDSI voting population consists of all members of the HDSI Faculty Council that are in residence.

The means of taking a vote include: a show of hands at a faculty meeting, a secret written ballot conducted at a faculty meeting, a secret ballot circulated by mail, and an email or fax ballot. Votes conducted by written ballot, email and/or fax are referred to as mail votes. Written and non-written votes conducted at meetings are referred to as meeting votes.

A secret ballot vote must be used for a vote if it is requested by at least one member of the voting population.

For non-secret votes, a fax or email vote may be used at the discretion of a voting faculty member. For secret ballot votes, fax and email votes will be allowed for faculty members who are not in residence, or who are unable to be on campus for any reason. They are also allowed for other members unless two faculty members request that email and fax ballots be restricted for that secret vote. Email votes are counted by the chief administrative officer of HDSI.

Proxy voting is not allowed. A faculty member who is eligible to vote during a faculty meeting, but who will be absent for the meeting, may request a ballot in advance, which can be submitted to and entered into the voting process by the supervisor for that vote.

All written ballots will allow a simple yes, no, or abstain choice on an issue and provide a space for remarks. The members of a voting population who are in residence and who do not vote are reported as abstain. The members of a voting population who are not in residence and who do not vote are reported as absent. The remarks shall be reported along with any report of the vote’s results, and must be included in any HDSI letter that summarizes HDSI’s position.

The supervisor of a vote will report the results of all votes to the relevant voting population in a timely manner. A vote shall not be considered completed until it has been reported to the voting population.

Some issues will specify a required vote. If a required vote is part of some process, it must be held. Others will involve a requested vote. A requested vote only occurs under certain circumstances, which include a certain number or percentage of members of a voting population requesting the vote. When requested, it must be held in a timely fashion. A requested vote can be either a mail or meeting vote.

Quorums and Levels of Approval

A quorum is defined to be more than half of the members of the HDSI voting population. All decision-making votes require a quorum in order to be valid. Mail votes are invalid if the number of people voting does not meet the quorum by the final deadline for the vote. At the discretion of the supervisor of the vote, an initial deadline for a vote may be extended, provided the original deadline has not passed. A mail or electronic vote shall be held open for a minimum of three working days.
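The quorum rule above amounts to a strict-majority test on the size of the voting population. The Python sketch below is only an illustration: the function name `has_quorum` and its parameters are hypothetical, and it assumes that "the number of people voting" means the number of ballots returned by the final deadline.

```python
def has_quorum(ballots_returned: int, voting_population: int) -> bool:
    """True when strictly more than half of the voting population voted."""
    if voting_population <= 0:
        raise ValueError("voting population must be positive")
    if not 0 <= ballots_returned <= voting_population:
        raise ValueError("ballots returned must be between 0 and the population size")
    # Integer arithmetic avoids rounding issues with odd population sizes.
    return 2 * ballots_returned > voting_population
```

With a voting population of 10, five returned ballots would not meet the quorum under this reading, while six would.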

Approval requirements for a vote are defined as follows:

Super-approval: the number of yes votes greater than 2/3 of the size of the HDSI voting population.

Regular approval: the number of yes votes greater than 50% of the size of the HDSI voting population.

Simple approval: the number of yes votes greater than 50% of the number of votes cast.
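The three approval levels differ only in the threshold and in the denominator used. As a hedged illustration, the following Python sketch evaluates all three for a single vote; the function name `approval_levels` is hypothetical, and the sketch assumes that "votes cast" counts every returned ballot, including explicit abstentions.

```python
def approval_levels(yes_votes: int, votes_cast: int, voting_population: int) -> dict:
    """Report which approval levels a vote reaches.

    Assumption: votes_cast counts all returned ballots (yes, no, abstain).
    """
    if not 0 <= yes_votes <= votes_cast <= voting_population:
        raise ValueError("expected 0 <= yes_votes <= votes_cast <= voting_population")
    return {
        # More than 2/3 of the whole voting population voted yes.
        "super": 3 * yes_votes > 2 * voting_population,
        # More than 50% of the whole voting population voted yes.
        "regular": 2 * yes_votes > voting_population,
        # More than 50% of the ballots actually cast were yes.
        "simple": 2 * yes_votes > votes_cast,
    }
```

For a voting population of 12, a vote with 9 ballots cast and 7 yes votes would reach regular and simple approval but not super-approval, which under this reading needs at least 9 yes votes.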

As a general rule, super-approval is required for all required and requested votes that result in the recruitment of faculty members, changes to the bylaws, creation of (or significant changes to) descriptions of permanent committee responsibilities, revoking senate committee assignments, and voting on personnel matters outside the default voting rules. Regular approval is sufficient for the promotion and/or academic review of HDSI faculty, and for voting on departmental actions required by policy. Simple approval is typically used for minor issues in the context of a faculty meeting.

3. FACULTY MEETINGS

A faculty meeting, also referred to as faculty council meeting, is used to carry out a number of different responsibilities. It may be used by the Director and other faculty members to make announcements, to provide a forum for discussion of issues of importance, and to facilitate decisions and actions for which there is no specific provision in the bylaws.

Scheduling

(i) Faculty meetings shall occur at least once a quarter, excluding the summer.

(ii) The Director must call a meeting within a reasonable amount of time if any three members of the HDSI voting population petition the Director to do so on any issue.

Agenda

The Director shall announce the agenda in writing for each faculty meeting at least two working days in advance. Urgent items can be added to the agenda at the last minute but the case for the urgency has to be explicitly stated.

(i) Issues that were the reason for scheduling the meeting, as described above, will be automatically placed on the agenda.

(ii) For all other issues, any two members of the HDSI voting population may request that any issue be placed on the agenda for a meeting, and the Director may delay placing that issue on the agenda for at most one meeting.

Motions

Any issue related to an agenda item may be brought to a vote if it is proposed and seconded by two members of the HDSI voting population. The Director shall allow a reasonable amount of time in each faculty meeting to consider faculty motions. If the issue is one for which the Director has sole, specific authority, the result of the vote will be advisory, but otherwise it will be binding. The level of required approval will depend on the issue.

Operation

The Director presides over faculty meetings, or delegates this duty to an Associate Director, or another member of the HDSI voting population. If necessary, a meeting shall have both an open and a closed part. All academic personnel matters shall be discussed only during closed parts of the meetings that are restricted to relevant voting members of the HDSI voting population, the Associate and/or Assistant Director(s), and the relevant academic specialists. A faculty member may request that an item be discussed in a closed session which is restricted to members of the relevant HDSI voting population.

Minutes

The Director shall ensure that the minutes for each faculty meeting are published within five working days following the meeting. As a minimum, minutes shall include a record of each motion voted on and the outcome of the vote. Faculty members have five working days to submit corrections to the minutes. Minutes shall be stored in a safe place for at least five years. Access to the minutes for the closed part of a meeting shall be restricted to faculty members who were eligible to attend the closed part.

4. DIRECTORS AND ASSOCIATE DIRECTOR(S)

Responsibilities

The specific responsibilities for which the Director has an authority that cannot be delegated are analogous to those of a Department Chair as described in university regulations (PPM 230-1.IV.B).

Department Consultation: The Director is expected to inform the HDSI faculty and seek advice for major decisions made with respect to the above responsibilities. The Director shall inform the faculty of staff organization and responsibilities, and seek advice on how these arrangements can best be used to support faculty duties and responsibilities. The Director must be receptive to questions and facilitate appropriate remedial procedures as required.

Selection of Director and Associate/Assistant Director(s)

The HDSI Director is appointed by the Chancellor following UCSD procedures for administrative appointments. The HDSI Director reports to the HDSI Oversight Committee. The Director must hold the rank of Full Professor.

Associate Directors are appointed by the Chancellor at the recommendation of the Director. Associate Directors must be tenured members of the Academic Senate.

Assistant Director appointments are made by the Director. Assistant Director(s) can be staff members.

The Director and Associate/Assistant Director(s) can be re-appointed by the Chancellor for an unlimited number of consecutive terms.

The appointment to the office of Director is for a period of five years.

Associate Director appointments are for a period of three years, subject to annual review.

If a Director does not wish to be reappointed, then the new appointments procedure specified in [PPM 230 2 III A] will be followed. The procedure requires that the tenured members of the HDSI voting population meet to consider their recommendation of a new Director. In the case where the recommendation is not unanimous, a vote will take place whose results will be included as part of the recommendation to the Chancellor.

In the case where a Director wishes to be reappointed, the reappointments procedure specified in [PPM 230 2 III B] will be followed. The procedure requires that the reappointment ad hoc committee consult with faculty members. In the year of the reappointment, the tenured members of the HDSI voting population will meet to determine their recommendation, which will be forwarded to the committee. In the case where the recommendation is not unanimous, a vote will take place whose results will be included as part of the recommendation to the Chancellor.

The Director reappointment evaluation procedure may be initiated by the Chancellor at intermediate stages of a Director’s tenure.

Any two tenured faculty members may request a secret ballot vote of no confidence in the Director. The faculty will meet to select, by vote, a committee of two HDSI faculty members who will administer the vote and report it to the office of the Chancellor via the Dean and/or Vice Chancellor.

5. STANDING COMMITTEES

Responsibilities

Certain responsibilities will be managed by permanent (standing) committees listed below in no particular order:

• Space Planning & Collaboratories (SpaceCom)
• Computing and Cyber-Infrastructure (CI)
• Graduate Admissions & Scholarships (GradAdmin)
• Grad and Post-doctoral Programs (GradCom)
• Undergraduate Programs and Scholarships (UGS)
• Colloquia, DLS and Sponsorships (DLS)
• Equality, Diversity and Inclusion (EDI)
• Industry Liaison and Institutional Partnerships (ILIP)
• Recruiting (RecCom): multiple recruiting committees may be appointed to conduct searches in broadly different areas.

Standing committees have specifically defined areas of significant responsibility and continue indefinitely, even though their membership may change. Some of the responsibilities may involve issues that are the sole responsibility of the Director and cannot be delegated (see Section 4.1). In this case the committees may act in an advisory role. Other responsibilities are the prerogative of the faculty or are shared responsibilities, and the committee acts as the representative of the faculty.

Committee Creation, Deletion, and Modification

The Director has the authority to create a new committee, providing a written description of the committee's role, procedures and policies. Any two faculty may request a faculty vote of approval. Deletion of any existing committee is also open to a requested vote of approval. Substantial changes to a committee’s role, procedures or policies must be announced and are open to a requested vote of approval. Requested votes on standing committee creation and deletion require super-approval. Requested votes on committee policies and procedures, for which the faculty has sole or shared responsibility, require regular approval.

Committee Selection

In order to facilitate effective governance, the Director chooses committee chairs and, in consultation with the committee chair, chooses the members of the committee. All permanent committee members must be members of the appropriate HDSI voting population. The Director and/or Associate Directors may be a member or chair of one or more standing committees.

Committee Procedures and Policies

Each committee chair, in consultation with the Director and/or Associate Director(s), must prepare a specification of that committee’s responsibilities and policies.

In some cases, such as the Recruiting Committee, the policies and procedures are specified by a higher authority, such as the UCSD PPM or the UC APM. Some of the more important procedures and policies for this committee are summarized in the Recruiting Section of these Bylaws.

In other cases, such as Undergraduate Program, Master's Program and Ph.D. program, the significant policies and procedures are the responsibility of the faculty and must be approved by a faculty vote. Examples include the procedure for selecting admissions to the Ph.D. program.

Advisory Board

The chair may create an Advisory Board that consists of up to five distinguished individuals who do not necessarily belong to the HDSI Faculty Council. The Advisory Board does not have authority to make decisions. Its creation and operation must follow the procedures specified in Sections 5.2, 5.3 and 5.4 above.

6. FACULTY APPOINTMENTS

UC Academic Bylaws assign the responsibility for faculty appointments to the tenured members of a Department [VI.55.B.1]. A vote will be held among the tenured members of the HDSI voting population to extend the responsibility for faculty appointments to all members of the Faculty Council, tenured or not; this vote will require super-approval. Responsibilities for faculty appointments include identifying, evaluating and voting on new HDSI faculty members. All appointments, except part-time lecturers, require a vote with super-approval. This includes research series and adjunct appointments. The Director has sole responsibility for the appointment of part-time lecturers which can be delegated to the Chair of the suitable committee. Appointments to visiting positions will either be voted on or the HDSI voting population will delegate this responsibility to the Director or a duly appointed committee.

Adjunct, visiting and research appointments will be processed individually and require a vote with regular approval.

The operation of the HDSI faculty as a recruiting committee of the whole is authorized by the PPM. Some of these responsibilities are delegated by the faculty to the Recruiting committee. The Recruiting committee and its chair are chosen by the Director. They use the following procedures.

Hiring Plan

At the beginning of the recruiting season, the Director, in consultation with the HDSI Faculty Council, will formulate a hiring plan. This plan will be based on the expected number of positions that will be available, the expected levels of appointments, and targeted areas. Any specific strategy to be adopted to meet diversity goals will be part of this plan. Examples of strategy include flexibility in target areas, seniority or any specific target(s) of opportunity available that season.

Screening

The Recruiting committee, which will be made up of members of the HDSI Faculty Council, will be responsible for initial screening of applicants for all open positions. This consists of all positions except for part-time lecturers.

The Recruiting committee will evaluate candidates, solicit letters of reference, and recommend candidates to be invited for a visit to HDSI. The committee will make every effort to consider all candidates fairly and to use an appropriate comparison process. The Recruiting committee will consider both the plan for hiring and the excellence of candidates which may result in exceptions to the plan consistent with the strategy specified. The Recruiting committee shall provide the Director its recommendation concerning which candidates to invite for a visit. The Director will share the committee’s recommendation with the HDSI faculty. All members of the appropriate HDSI voting population may examine the applicant files and suggest to the Committee that specific additional candidates be added to the committee’s recommendation list. In the case of disagreement, a vote will be scheduled in a timely manner to allow new candidates to be considered along with the others. Adding an applicant to the list will require, under the general rules for faculty meetings, regular approval.

Institutional/Departmental Evaluation

After all candidates in a particular search or search specialization who were approved by the screening process have completed their interviews with HDSI, the Director will call a meeting of the relevant HDSI voting population to discuss the candidates. At this meeting, a vote will be held in which each faculty member may vote yes or no for each candidate. Super-approval is required for a candidate to be further considered. The result of Institutional Evaluation is a list of faculty candidates recommended for making an offer of a faculty appointment in HDSI and/or jointly with another department/school. The list may be unordered or partially ordered. The actual offer and offer order will be specified by an offer strategy discussed next.

Offer Strategy

In the case of multiple candidates for one or more positions, the Director may formulate a strategy for scheduling the approved candidates for a formal vote and offer. This will take into account: the original plan for hiring, the approved candidates, the maximal number of offers to have out at one time, balance between areas, financial cost considerations and the responsiveness of the candidates to academic and EDI goals of the Institute and of the hiring plans. The Director will present the strategy to the faculty for advice and consent. If requested by a member of the voting population, the strategy will be put to a faculty vote, where it will require majority approval. If no strategy is proposed, or no proposed strategy is approved, the candidates will be offered positions in an order determined by the number of votes they received. In the case of ties, the Director will make the decision.

Documentation

The documentation required for a proposed HDSI appointment is covered in PPM 230.20.IX. Included in this documentation is the Departmental Recommendation Letter. This letter is meant to summarize the Department position. The file, including this letter, shall be made available for inspection by the HDSI voting faculty for a period of not less than five working days before submission of the file. The Director shall announce to the members of the HDSI voting population when the letter is available for inspection. If a faculty member objects to the Department letter, or to the process that was used, the faculty member has the option of including a letter of dissent, which may be signed by one or more faculty members. Dissenting faculty members must submit their letter within the five-day inspection period. A file shall not be submitted to the administration that has not had a five-day inspection period.

If desired, the Director may also include a confidential letter in a file, which can be used to express the Director’s personal opinion.

If significant additional evidence for a file arrives after the file has been submitted, the Director has the option of submitting the additional information or recalling the file for additional faculty consideration and/or processing.

Endowed Chairs

The procedure for the awarding of an endowed Chair, including the required faculty consultation, is described in [PPM 230 8]. In the case where an endowed chair is to be used in faculty recruiting, the candidate must satisfy both the procedure for faculty hiring and the endowed chair appointment.

7. FACULTY PROMOTIONS

The relevant responsibilities in this category include: identification of HDSI faculty members who may be eligible for normal or accelerated advancement, assembling a promotion file, and carrying out a faculty vote; this includes HDSI faculty members with partial HDSI appointments as long as they have a non-zero % HDSI appointment.

Responsibilities with regard to faculty promotions are carried out by the Director, the relevant voting population subset of the faculty, an ad hoc committee, and the individual faculty member who is up for promotion.

The Director shall select the chair of an ad hoc committee, which will consist of the chair and two additional faculty members. The Director shall select the members of the ad hoc committee in consultation with the ad hoc chair. The committee members must be chosen from the voting population for the candidate’s promotion.

If a faculty member has a partial (but non-zero) HDSI appointment, HDSI shall follow the above procedure and arrive at its own recommendation, even if the candidate’s other department(s) are also conducting a separate review. If the different academic units disagree in their assessment of the candidate’s file, the candidate will be given the option to apply to move their FTE to the academic unit of their choice before the promotion recommendation is filed to the Campus.

Screening

In the academic year preceding a normal or proposed accelerated promotion, the Director shall determine which faculty members are eligible for a normal merit promotion within rank, or promotion to the next rank.

Any faculty member may request consideration for an accelerated promotion, either at the time of what would be a normal promotion or at an intermediate time in a promotion cycle. A faculty member who would be a member of the voting population for a proposed accelerated promotion may also propose such a promotion for another faculty member.

Institutional/Departmental Evaluation

Faculty members who are eligible for a normal promotion, or who have been proposed for an accelerated promotion, will be informed in a timely manner of the procedures for preparing their promotion files, and the deadlines for submission of materials for which they are responsible.

The ad hoc committee will, if necessary, choose references and oversee the assembly of a candidate’s file. The Director has the final authority over the selection of references.

A faculty member may object to the choice of references in a dissenting letter, to be added to the file before it is submitted.

It is required by the PPM that voting members have the opportunity to express their opinions of the promotion case. There will be a pre-vote meeting scheduled for the voting faculty that must be held far enough in advance of the vote to allow suggestions related to the processing of the file to be implemented.

The chair of the ad hoc committee shall prepare a letter to the Director that details the committee’s recommendation.

The relevant voting population shall vote on all promotions to a new rank, advancements from Full Professor Step V to Step VI, advancements from Full Professor Step IX to Above Scale (AS), and all accelerated advancements either within or to a new rank. In the case of merit advancements that are not accelerated, for which the Director in consultation with the ad hoc committee recommends approval, no faculty vote is required. In the case of merit advancements for which the Director in consultation with the ad hoc committee recommends disapproval, the candidate may request that their file be put forward for a vote, along with the department's negative recommendation.

Voting Population

The voting populations for promotions of members of the Academic Senate will be based on the default populations specified in UC Academic Senate Bylaw 55. They are defined in the following table. All references to Professor in the table, unless designated otherwise, refer to ladder rank Professor appointments who are members of the HDSI voting population.

For voting purposes, all cases that involve the removal of the Acting modifier from the title of a member of the Academic Senate shall be treated as promotions to the rank in question. The table contains an entry for the voting population for promotion to Assistant Professor. This may occur as the result of the removal of the "Acting" designation, or a promotion from Instructor. There is no corresponding case for Lecturer PSOE. An appointment to Senior Lecturer PSOE is considered to be comparable to that of an initial appointment as Acting Full Professor, based on the salary restrictions. A subsequent promotion from Senior Lecturer PSOE to Senior Lecturer SOE is covered by row 5 in the table.

In the case of promotions for non-academic senate members, these promotions will be determined by faculty members of the Academic Senate using a voting population that parallels that for Academic Senate promotions. Such appointments include adjunct and research series appointments and are referred to as “at a level equal to ...” in the table below.

Promotion to and within | Eligible voters

Assistant Professor, Assistant Teaching Professor | Full Professors, Associate Professors, Assistant Professors, Full Teaching Professor, Associate Teaching Professors

Teaching Professor (Lecturer SOE) | Full Professors, Associate Professors, Full Teaching Professors, Associate Teaching Professors (Senior Lecturers SOE)

Associate Professor, Associate Teaching Professor, Associate Professor In Residence | Full Professors, Associate Professors, Associate Teaching Professors

Senior Lecturer SOE | Full Professors, Senior Lecturers SOE

Full Professor, Full Professor in Residence, or at a level equal to a Full Professor | Full Professors

Documentation

The clarification and additional documentation details for appointments, as contained in the Section for recruiting, shall also apply to promotions. In addition, a faculty member who is a candidate for promotion may, after examination of the redacted promotion file, include a letter in the file.

8. MISCELLANEOUS AND INFREQUENT RESPONSIBILITIES

These are responsibilities that are not covered in the bylaws. They may be unanticipated, infrequent, or minor in nature. These responsibilities may be carried out by the Director or Associate Director(s), by temporary committees, or by the faculty as a whole.

The Director and the faculty shall have the authority to create a temporary committee and choose its members. The committee is expected to be advisory.

The Director will have primary authority for these responsibilities. In substantial matters for which the Director and/or faculty has authority, the Director may request a faculty vote. In matters for which the faculty shares or has authority, the faculty may request a vote. All votes will be approved by regular approval, except in matters that are covered in the other sections of the bylaws, for which a higher level of approval is indicated.

The Director should keep the faculty informed of all important issues and decisions taken with respect to the issues.

9. BYLAW CHANGES

Changes to, additions to, and deletions from the bylaws are carried out by the HDSI voting population.

Suggestions for changes to the bylaws, and requests for a vote on the suggested changes, may be made by any member of the HDSI voting population in accordance with the regulations for faculty meetings. A vote is required on a suggested change, and such a vote may be either a meeting vote or a mail vote. All such votes require super-approval.

Ilkay Altintas
San Diego Supercomputer Center
9500 Gilman Drive, MC 0505
La Jolla, CA 92093-0505
Telephone: (858) 822-5453
Fax: (858) 822-3693
E-mail: [email protected]

Professional Preparation

Middle East Technical University, Ankara, Turkey

B.S. Computer Engineering 1999

Middle East Technical University, Ankara, Turkey

M.S. Computer Engineering 2001

University of Amsterdam, Amsterdam, Netherlands

Ph.D. Computational Science 2011

Appointments

2018- Fellow, Halicioglu Data Science Institute, UCSD

2016- Associate Research Scientist, San Diego Supercomputer Center, UCSD

2016- Faculty Co-Director, Master of Advanced Studies in Data Science and Engineering, UCSD

2015- Chief Data Science Officer, San Diego Supercomputer Center (SDSC), UCSD

2015- Division Director, Cyberinfrastructure Research, Education and Development, SDSC, UCSD

2014- Founder and Director, Workflows for Data Science Center of Excellence, SDSC, UCSD

2012- Lecturer, Department of Computer Science and Engineering, UCSD

2012-2016 Assistant Research Scientist, San Diego Supercomputer Center, UCSD

2008-2014 Deputy Coordinator for Research, San Diego Supercomputer Center, UCSD

2004-2014 Founder and Director, Scientific Workflow Automation Technologies Laboratory, SDSC, UCSD

2005-2007 Assistant Director, National Laboratory for Advanced Data Research (NLADR) - Data, SDSC, UCSD

2001-2004 Research Programmer (P/A III), SDSC, UCSD

1999-2001 Research Assistant, Middle East Technical University (Ankara, TURKEY)

Products (Out of 100+)

1. I. Altintas, J. Block, R. de Callafon, D. Crawl, C. Cowart, A. Gupta, M. Nguyen, H.W. Braun, J. Schulze, M. Gollner, A. Trouve, L. Smarr: Towards an Integrated Cyberinfrastructure for Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience of Wildfires. In Proceedings of the Workshop on Dynamic Data-Driven Application Systems (DDDAS) at the 15th International Conference on Computational Science (ICCS 2015), Procedia Computer Science, Volume 51, 2015, Pages 1633-1642, ISSN 1877-0509, doi:10.1016/j.procs.2015.05.296. (Best Paper Award)

2. Kepler Scientific Workflow System Releases 1.0, 2.0 through 2.4. (Downloaded by 100K+)

3. J. Wang, D. Crawl, I. Altintas, W. Li. Big Data Applications using Workflows for Data Parallel Computing. Computing in Science & Eng., 16(4), pp. 11-22, July-Aug. 2014, IEEE.

4. J. Wang, P. Korambath, I. Altintas, J. Davis, D. Crawl. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms. In Proceedings of International Conference on Computational Science (ICCS 2014), pages 546-556. DOI: 10.1016/j.procs.2014.05.049

5. B. Ludaescher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. Lee, J. Tao, Y. Zhao, Scientific Workflow Management and the Kepler System, Concurrency and Computation: Practice & Experience, 18(10), pp. 1039-1065, 2006. (Cited by 2124 in October 2019.)

Other Selected Products

6. I. Altintas, M.K. Anand, T. Vuong, S. Bowers, B. Ludaescher, P.M.A. Sloot, “A Data Model for Analyzing User Collaborations in Workflow-Driven eScience,” The International Journal of Computers and Their Applications (IJCA), 2011. Vol. 18, No. 3, p.160 – 180, Dec, 2011.

7. I. Altintas, A.W. Lin, J. Chen, C. Churas, M. Gujral, S. Sun, W. Li, R. Manansala, M. Sedova, J.S. Grethe, and M. Ellisman, “CAMERA 2.0: A Data-centric Metagenomics Community Infrastructure Driven by Scientific Workflows,” In Proceedings of the SWF 2010 at IEEE SERVICES '10, pp. 352-359, 2010. DOI=10.1109/SERVICES.2010.89

8. A. Goderis, C. Brooks, I. Altintas, E. Lee, and C. Goble, “Heterogeneous composition of models of computation,” FGCS, vol. 25, no. 5, pp. 552–560, 2009.

9. I. Altintas, O. Barney, E. Jaeger-Frank, Provenance Collection Support in the Kepler Scientific Workflow System, in Provenance and Annotation of Data, LNCS Volume 4145/2006, pages 118-132, 2006. (Cited by 361 in October 2019.)

10. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludaescher, and S. Mock, “Kepler: An extensible system for design and execution of scientific workflows,” in Intl. Conference on Scientific and Statistical Database Management (SSDBM), Greece, 2004. (Cited by 1103 in October 2019.)

Recent Synergistic Activities

• Associate Editor, Elsevier Future Generation Computer Systems - Impact Factor: 5.768 (since 2012)

• Massive Open Online Course (MOOC) Instructor, Coursera and edX – over 1 Million students worldwide (since 2016)

• Member, The National Academies of Sciences, Engineering, and Medicine Committee on Forecasting Costs for Preserving, Archiving, and Promoting Access to Biomedical Data, 2019-2020

• Member, The National Academies of Sciences, Engineering, and Medicine Committee on Realizing Opportunities for Advanced and Automated Workflows in Scientific Research, 2019-2020

• Advisory Board Member, National Center for Atmospheric Research (NCAR) Computational and Systems Information Lab (since 2017)

ERY ARIAS-CASTRO

CONTACT INFORMATION

Department of Mathematics
University of California, San Diego
La Jolla, CA 92093-0112 (USA)
Voice: (858) 534-3590
Fax: (858) 534-5273
E-mail: [email protected]

EDUCATION

Ph.D. in Statistics, Stanford University 2004

M.S. in Artificial Intelligence and Applied Mathematics, Ecole Normale Superieure de Cachan and Washington University in Saint Louis 1998

B.S. in Mathematics, Ecole Normale Superieure de Cachan 1997

PROFESSIONAL

Professor, Mathematics, University of California, San Diego 2015–present

Associate Professor, Mathematics, University of California, San Diego 2011–2015

Assistant Professor, Mathematics, University of California, San Diego 2005–2011

Postdoctoral Fellow, Mathematical Sciences Research Institute Spring 2005

Postdoctoral Fellow, Institute for Pure and Applied Mathematics Fall 2004

MEMBERSHIPS

Institute of Mathematical Statistics

SERVICE

Committee work within UCSD: Academic Integrity Review Board [2012-13], Faculty mediator [2018-]

Associate Editor: Annals of Statistics [2013-19], Journal of the American Statistical Association [2014-], Journal of the Royal Statistical Society [2016-], Electronic Journal of Statistics [2015-], ESAIM - Probability and Statistics [2015-], ALEA [2019-]

Area Chair: Conference on Learning Theory (COLT) [2016], Artificial Intelligence and Statistics (AISTATS) [2017, 2019, 2020]

Guest Editor: Special Issue on Detection, IEEE Journal of Selected Topics in Signal Processing, 2012

Reviewer: IEEE Transactions on Image Processing, IEEE Transactions on Information Theory, IEEE Transactions on Signal Processing, IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal of Mathematical Imaging and Vision, Annals of Statistics, Electronic Journal of Statistics, Journal of Multivariate Analysis, Journal of the Royal Statistical Society (Series B), Annals of Applied Statistics, Statistical Science, Journal of Nonparametric Statistics, Solar Energy, The Astrophysical Journal, Bernoulli, Journal of the American Statistical Association, ESAIM: Probability and Statistics, Journal of Machine Learning Research, Conference on Learning Theory, Artificial Intelligence and Statistics, International Conference on Machine Learning, etc.

Conference Organization: Math+Stats+X, a conference in honor of David Donoho’s 60th birthday, 2017 (co-organizer); IMS Meeting, 2014 (session organizer); Probability and Statistics Day, 2013 (co-organizer); Meeting of New Researchers in Statistics and Probability, 2014 (board member); Meeting of New Researchers in Statistics and Probability, 2013 (board member); Meeting of New Researchers in Statistics and Probability, 2012 (chair and local chair); Quality and Productivity Research Conference, 2012 (session organizer)

Other: NSF grant panel [2012, 2016]

FIVE RECENT PUBLICATIONS

1. E. Arias-Castro, A. Javanmard, and B. Pelletier, “Perturbation bounds for procrustes, classical scaling, and trilateration, with applications to manifold learning,” Journal of Machine Learning Research, vol. 21, pp. 15–1, 2020

2. E. Arias-Castro, S. Bubeck, G. Lugosi, and N. Verzelen, “Detecting Markov random fields hidden in white noise,” Bernoulli, vol. 24, no. 4B, pp. 3628–3656, 2018

3. E. Arias-Castro, G. Lerman, and T. Zhang, “Spectral clustering based on local PCA,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 253–309, 2017

4. E. Arias-Castro, “Some theory for ordinal embedding,” Bernoulli, vol. 23, no. 3, pp. 1663–1693, 2017

5. N. Verzelen and E. Arias-Castro, “Community detection in sparse random networks,” The Annals of Applied Probability, vol. 25, no. 6, pp. 3465–3510, 2015

1. Name: Mikhail Belkin

2. Education – degree, discipline, institution, year:

Ph.D., mathematics, University of Chicago, 2003. M.Sc., mathematics, University of Chicago, 1997. B.Sc., mathematics, University of Toronto, 1995.

3. Academic experience – institution, rank, title (chair, coordinator, etc. if appropriate)

Ohio State University, professor, 2017-present. Simons Institute for the Theory of Computing, UC Berkeley, visiting faculty, 2017, 2019. Ohio State University, associate professor, 2012-2017. Ohio State University, assistant professor, 2005-2012

4. Current membership in professional organizations:

Association for Computing Machinery (ACM)

5. Honors and awards: NSF Career Award, Google Faculty Research Award, Lumley Research Award.

6. Service activities (within and outside of the institution):

Editorial Board Service: SIAM Journal on Mathematics of Data Science, Associate Editor, 2020 – present; The Journal of Machine Learning Research, Action Editor, 2011 – 2020; IEEE Transactions on Pattern Analysis and Machine Intelligence, Associate Editor, 2011 – 2016.

Recent workshop organizing activities: Steering Committee of Midwest Machine Learning Symposium, 2018-present; Information Modeling and Control of Complex Systems Workshop, Ohio State University, 2016, 2017; Simons Institute Workshop on Spectral Algorithms: From Theory to Practice (co-chair), 2014.

Recent program committee service: Area chair/PC for COLT 2019, NeurIPS 2018, ICML 2018, ICML 2017; AAAI 2017; COLT 2016; AI and Statistics 2015.

7. Briefly list the most important publications and presentations from the past five years – title, co-authors if any, where published and/or presented, date of publication or presentation.

• Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal, Reconciling modern machine learning practice and the bias-variance trade-off, PNAS, 2019, 116 (32).

• Chaoyue Liu, Libin Zhu, Mikhail Belkin, Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning, arxiv, 2020.

• Chaoyue Liu, Mikhail Belkin, Accelerating Stochastic Training for Over-parametrized Learning, ICLR 2020.

• Mikhail Belkin, Daniel Hsu, Ji Xu, Two models of double descent for weak features, arxiv 2019.

• Mikhail Belkin, Siyuan Ma, Soumik Mandal, To understand deep learning we need to understand kernel learning, ICML 2018.

• Siyuan Ma, Mikhail Belkin, Kernel machines that adapt to GPUs for effective large batch training, SysML 2019.

• Mikhail Belkin, Daniel Hsu, Partha Mitra, Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate, Neural Inf. Proc. Systems (NeurIPS) 2018.

• Siyuan Ma, Raef Bassily, Mikhail Belkin, The power of interpolation: understanding the effectiveness of SGD in modern over-parametrized learning, ICML 2018.

Jelena Bradic
Department of Mathematics & Halicioglu Data Science Institute
University of California, San Diego (UCSD)
Applied Physics & Mathematics Building (AP&M)
9500 Gilman Drive # 0112
La Jolla, CA 92093-0112
Office: 858-534-3590
Email: [email protected]
Homepage: http://www.jelenabradic.net

Education

PhD, Operations Research and Financial Engineering, Princeton University, 2011

Magister, Probability and Statistics, Belgrade University, 2007

BS, Mathematics, Belgrade University, 2004

Academic Experience

Stanford University, Statistics Department, Visiting Associate Professor, 2019-present

University of California San Diego, Halicioglu Data Science Institute, Associate Professor (with tenure), 2019-present

University of California San Diego, Mathematics Department, Associate Professor (with tenure), 2018-present

University of California San Diego, Mathematics Department, Assistant Professor (on maternity leave 2011/2012), 2011-2018

Professional Associations

Institute of Mathematical Statistics, American Statistical Association, Bernoulli Society

Honors and Awards

Journal of the American Statistical Association Discussion Paper, to be awarded in JSM meeting, for the paper Tuning-free Robust Regression with High-Dimensional Heavy-Tailed Data, 2020

Leads Fellow, UC San Diego, 2019

NSF DMS award 17212481, PI (single), Hypothesis Testing in High-Dimensions without Sparsity, 2017

Hellman Fellowship, awarded by Hellman Foundation, 2014

NSF DMS award 1205296, PI (single), Regularization for High-Dimensional Inference and Sparse Recovery, 2012

Laha Award (superseded by IMS New Researcher Travel Award), awarded by IMS, 2010

Professional Service

Program Chair Elect, section on Statistical Learning and Data Science of the American Statistical Association, 2020-2021

Task-force on Equity, Diversity and Inclusion in Undergraduate Education, UC San Diego, 2019-2020

Program co-Chair, conference on Statistical Learning and Data Science, 2020

Associate Editor, Journal of the American Statistical Association, 2019-

Associate Editor, Journal of Nonparametric Statistics, 2019-

Associate Editor, Scandinavian Journal of Statistics, 2019-

Treasurer, section on Nonparametric Statistics, of the American Statistical Association, 2018-2019

Task-force on the Status of Women in Physical Sciences, UC San Diego, 2018-2019

NSF DMS Panelist, Statistics section, 2019, 2017, 2016

Publications

Bradic, Jelena and Jianqing Fan and Zhu, Yinchu (2020), Testability of high-dimensional linear models with non-sparse structures, to appear at the Annals of Statistics

Bradic, Jelena and Claekens, Gerda and Gueuning, Thomas (2020), Testing fixed effects in high-dimensional misspecified linear mixed models, Journal of the American Statistical Association: Theory & Methods, 115 (529), 1-16.

Zhu, Yinchu and Bradic, Jelena (2018), Linear hypothesis testing in dense high-dimensional linear models, Journal of American Statistical Association: Theory & Methods, 113(524), 1583-1600.

Li, Alexander Hanbo and Bradic, Jelena (2018), Boosting in the presence of outliers: classification with non-convex loss functions, Journal of American Statistical Association: Theory & Methods, 512 (113), 660-674.

Ryzhov, Ilya and Han, Bin and Bradic, Jelena (2016), Cultivating Disaster Donors: A Case Application of Scalable Analytics for Big Data, Management Science, 62(3), 849-866.

ALEXANDER C. [email protected]

EMPLOYMENT

UNIVERSITY OF CALIFORNIA, SAN DIEGO, La Jolla, CA
Assistant Professor, July 2017-present
Mathematics Department, 2017-present
Halicioglu Data Science Institute, 2020-present

YALE UNIVERSITY, New Haven, CT
Gibbs Assistant Professor and NSF Postdoctoral Fellow, September 2014-June 2017
Applied Mathematics Program

EDUCATION

UNIVERSITY OF MARYLAND, College Park, MD
Ph.D. in Applied Math and Scientific Computing Program, May 2014
Adviser: Wojciech Czaja and John J. Benedetto
Thesis: Exploiting Data-Dependent Structure for Improving Sensor Acquisition and Integration

WASHINGTON UNIVERSITY, St. Louis, MO
B.S. Physics (Second Major in Pure Math), May 2009

WORK EXPERIENCE

INSTITUTE FOR DEFENSE ANALYSIS, Bowie, MD
Center for Computing Sciences, 6/2010 - 9/2013

DEPARTMENT OF DEFENSE, Various locations
Summer Programs, Summer 2008-Summer 2009

AWARDS AND HONORS

• US Patent 10,613,176 - 2D NMR Relaxometry with Partial Data, 2020
• Co-PI, Russell Sage Foundation Grant Number 2196 - Economics and Satellite Imagery, 2019-2020
• PI on Collaborative Research NSF DMS-1819222 - Generative Models, 2018-2021
• Founding Member, Halıcıoglu Data Science Institute, 2018
• NSF Mathematical Sciences Postdoctoral Research Fellowship - Data Fusion, 2014-2017
• Spotlight on Student Research Prize, University of Maryland Math Department, 2014
• Ann G. Wylie Dissertation Fellow, 2013
• 3rd Place in IEEE GRSS Data Fusion Best Paper Contest, 2013
• Seymour Goldberg Prize for Exposition, University of Maryland Math Department, 2012
• Distinguished Teaching Assistant, University of Maryland, 2009-2010 and 2010-2011
• Gold Medal in Teaching Excellence, University of Maryland Math Department, 2010-2011

SERVICE ACTIVITIES

• NSF Panel Reviewer, Apr. 2020
• Organizer of Mini-symposium “Distance Metrics High Dim. Point Clouds”, ICIAM, Jul. 2019
• Organizer of Mini-symposium “High Dim. Machine Learning”, DSCO SF Institute, Mar. 2019
• Organizer of Panel “AI and DNN in Radiation Oncology”, ASTRO, Oct. 2018
• Organizer of Undergraduate Math Colloquium Talks, Fall 2018
• Founding Faculty, HDSI Institute, UCSD, Mar. 2018
• Organizer of Mini-symposium “Laplacians and Applications”, SIAM PDE Conference, Dec. 2017
• Organizer of Applied Math Seminar at Yale University, 2014-2016

SELECTED PUBLICATIONS

• A Potapov, I Colbert, K Kreutz-Delgado, A Cloninger, S Das. “PT-MMD: A Novel Statistical Framework for the Evaluation of Generative Systems.” ASILOMAR, 2019.
• X. Cheng, A. Cloninger, and R.R. Coifman. “Two Sample Statistics Based on Anisotropic Kernels.” Information and Inference, 2019.
• A. Cloninger, B. Roy, C. Riley, and H. Krumholz. “People Mover’s Distance: Class level geometry using fast pairwise data adaptive transportation costs.” Applied and Computational Harmonic Analysis, 2019.
• G. Mishne, U. Shaham, A. Cloninger, I. Cohen. “Diffusion Nets.” Applied and Computational Harmonic Analysis, 2018.
• A Cloninger, S Steinerberger. “On the dual geometry of Laplacian eigenfunctions.” Experimental Mathematics, 2018.
• J. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, Y. Kluger. “Deep Survival: A Deep Cox Proportional Hazards Network.” BMC Medical Research Methodology, 2018.
• A. Cloninger. “A Note on Markov Normalized Magnetic Eigenmaps.” Applied and Computational Harmonic Analysis, 2017.
• A. Cloninger, W. Czaja, and T. Doster. “The Pre-image Problem for Laplacian Eigenmaps Utilizing L1 Regularization with Applications to Data Fusion.” Inverse Problems, 2017.
• A. Cloninger, S. Steinerberger. “Spectral Echolocation via the Wave Embedding.” Applied and Computational Harmonic Analysis, 2017.
• U. Shaham, A. Cloninger, R. Coifman. “Provable approximation properties for deep neural networks.” Applied and Computational Harmonic Analysis, 2017.
• A. Cloninger. “Prediction models for graph-linked data with localized regression.” SPIE: Wavelets and Sparsity XVII, 2017.
• Nicholas S Downing, Alexander Cloninger, Arjun K Venkatesh, Angela Hsieh, Elizabeth E Drye, Ronald R Coifman, Harlan M Krumholz. “Describing the performance of US hospitals by applying big data analytics.” PloS One, 2017.
• A Cloninger, S Steinerberger. “On suprema of autoconvolutions with an application to Sidon sets.” Proceedings of the American Mathematical Society, 2017.
• A. Cloninger, R. Coifman, N. Downing, H. Krumholz. “Bigeometric Organization with Deep Nets.” Applied and Computational Harmonic Analysis, 2016.
• A. Cloninger, W. Czaja. “Eigenvector Localization on Data-Dependent Graphs.” SampTA, 2015.
• A. Hafftka, H. Celik, A. Cloninger, W. Czaja, R. Spencer. “2D Sparse Sampling Algorithm for ND Fredholm Equations with Applications to NMR Relaxometry.” SampTA, 2015.
• R. Bai, P. Basser, A. Cloninger, W. Czaja. “Efficient 2D MRI Relaxometry Using Compressed Sensing.” Journal of Magnetic Resonance, 2015.
• N Jamil, X Chen, A Cloninger. “Hildreth’s algorithm with applications to soft constraints for user interface layout.” Journal of Computational and Applied Mathematics, 2015.

SELECTED TALKS

• Kernel approaches in global statistical distances, local measure detection, and active learning. Colloquium talk, Claremont Graduate University, Claremont, CA, February 5, 2020.
• Dual Geometry of Laplacian Eigenfunctions with Applications to Graph Wavelets, Cuts, and Visualization. Jubilee of Fourier Analysis and Applications, College Park, MD, September 21, 2019.
• Manifold Learning with Diffusion Variational Autoencoders. Approximation Theory 16, Vanderbilt University, Nashville, TN, May 21, 2019.
• Fast Detection of Inter-Group Differences in Images. Statistical, Variational, and Learning Techniques in Image Analysis, Joint Math Meetings, Baltimore, MD, January 19, 2019.
• New Developments in AI/Deep Learning. Artificial Intelligence and Deep Learning Within Radiation Oncology, 2018 ASTRO Meeting, San Antonio, TX, October 23, 2018.
• Fast Point Cloud Distances and Multi-Sample Testing. Applied Harmonic Analysis and Data Processing, Mathematisches Forschungsinstitut Oberwolfach, Oberwolfach, Germany, March 29, 2018.
• Deep Learning Function Approximation on Manifolds. Applied Harmonic Analysis, Massive Data Sets, Machine Learning, and Signal Processing Workshop, Casa Matematica Oaxaca, October 18, 2016.
• Defining Distances Between High-Dimensional Point Clouds. Symposium on Advanced Computational Methods in Biomedical Imaging, National Institutes of Health, October 6, 2016.

BIOGRAPHICAL SKETCH
NAME: de Sa, Virginia R

EDUCATION
Institution | Degree | Completion Date | Field of Study
Queen’s University | B.Sc. Engineering | 06/1988 | Mathematics and (Electrical) Engineering
University of Rochester | PhD | 06/1994 | Computer Science
University of Toronto | Postdoc | 12/1995 | Computer Science
University of California, San Francisco | Postdoc | 08/2001 | Theoretical Neuroscience

Academic Experience
1994-1995 Postdoctoral Fellow, Computer Science, University of Toronto (mentor: Geoff Hinton)
1996-2001 Postdoctoral Fellow, Physiology, University of California at San Francisco (mentors: Michael Merzenich, Michael Stryker)
2001-2008 Assistant Professor, Cognitive Science, University of California at San Diego
2008-2018 Associate Professor, Cognitive Science, University of California at San Diego
2018-present Professor, Cognitive Science, University of California at San Diego
2019-present Associate Director, Halıcıoğlu Data Science Institute, University of California at San Diego

Current Membership in Professional Organizations
Founding Member of the BCI Society

Honors and Awards
1988 Medal in Mathematics and Engineering (88), Queen's University [highest (in ME dept) standing]
1988 Professional Engineer's Gold Medal (88), Queen's University [highest standing in final year]
1988 Governor-General's Medal, Queen's University [highest standing throughout 4 years of Eng]
1988-1992 Natural Sciences and Engineering Research Council of Canada (NSERC) 1967 Science and Engineering Scholarship [one of 47 given to graduating students across Canada]
1994-1995 Natural Sciences and Engineering Research Council of Canada (NSERC) Postdoctoral Fellowship
1996-1998 Sloan Postdoctoral Fellowship
2001-2007 NSF CAREER Award
2003-2004 UCSD Faculty Career Development Program Award
2007-2008 UCSD Chancellor’s Collaboratories award
2012-2013, 2016-2017, 2018-2019 Kavli Innovative Research Award
2016-2017, 2017-2018 UCSD Frontiers of Innovation Scholars Program
2019-2020 Kavli Symposium Inspired Proposal Award

Service Activities (External)
1999, 2000 Advanced Tutor, EU Advanced Course in Computational Neuroscience, Trieste, Italy
2001-2002 Co-chair for the Neural Information Processing Systems workshops
2002 Member of NSF, Knowledge and Cognitive Systems, grant review panel
2003 Member of NSF, Machine Learning, grant review panel
2008 Member of NSF, Robust Intelligence, grant review panel
2009-present Institutional Review Board for Neurosky

2013, 2016 Member of NSF, Human Centered Computing, grant review panel
2007, 2014, 2017, 2018, 2019, 2020 Program Committee (Neural Information Processing Systems (NIPS))
2016, 2017, 2018, 2019, 2020 Program Committee (Cognitive Science Conference)
2018, 2019 Program Committee (International Conference on Learning Representations (ICLR))
2019 Program Committee (International Joint Conference on Artificial Intelligence (IJCAI))
2020 NSF grant Panel Review CISE

Recent Service Activities (Internal)
2019-present Associate Director of the Halıcıoğlu Data Science Institute
2019-present Served on Capacity-based admissions workgroup
2019-2020 Chaired one and sat on 3 other faculty recruitment committees
2019-present Executive Committee, Institute for Neural Computation

Most important publications and presentations
Noh, E., Liao, K., Mollison, M.V., Curran, T., & de Sa, V.R. (2018). Single-trial EEG analysis predicts memory retrieval and reveals source-dependent differences. Frontiers in Human Neuroscience 12:258. doi: 10.3389/fnhum.2018.00258
Mousavi, M., & de Sa, V.R. (2019). Temporally Adaptive Common Spatial Patterns with Deep Convolutional Neural Networks. Proceedings of the 41st Annual International Conference of the IEEE EMBS Engineering in Medicine and Biology Society (EMBC'19).
Xu, X., Huang, J., & de Sa, V.R. (2019). Pain Evaluation in Video using Extended Multitask Learning from Multidimensional Measurements. Proceedings of Machine Learning Research (Machine Learning for Health ML4H at NeurIPS 2019).
Liao, K., Mollison, M., Curran, T., & de Sa, V.R. (2018). Single-Trial EEG Predicts Memory Retrieval Using Leave-One-Subject-Out Classification. First International Workshop on Machine Learning for EEG Signal Processing (MLESP 2018).
Mousavi, M., & de Sa, V.R. (2019). Spatio-temporal analysis of error-related brain activity in active and passive brain-computer interfaces. Brain-Computer Interfaces. https://doi.org/10.1080/2326263X.2019.1671040
Mousavi, M., Koerner, A.S., Zhang, Q., Noh, E., & de Sa, V.R. (2017). Improving motor imagery BCI with user response to feedback. Brain-Computer Interfaces. doi: 10.1080/2326263X.2017.1303253
de Sa, V.R. Using insights from cortical architectures for neural networks. Invited presentation at Cell Press Beijing Conference: AI and the Brain. Nov 6-7, 2019, Sunrise Kempinski Hotel, Beijing, China.
Tang, S., & de Sa, V.R. (2019). Exploiting Invertible Decoders for Unsupervised Sentence Representation Learning. ACL 2019.

Most recent professional development activities
UCSD faculty leadership academy (2019-present)

Justin Eldridge [email protected] http://web.cse.ohio-state.edu/~eldridge.48/ (330) 803-7449

Research Interests
Machine learning and artificial intelligence.
Theoretical foundations of unsupervised learning.
Scientific discovery aided by machine learning.
Pedagogy of computer science and machine learning.

Education
Ph.D. in Computer Science, 2017
The Ohio State University
Dissertation: Clustering Consistently
Advisors: Mikhail Belkin & Yusu Wang

M.S. in Computer Science, 2016
The Ohio State University

B.S. in Physics, 2011
The Ohio State University
Summa cum laude

B.S. in Applied Mathematics, 2011
The Ohio State University
Summa cum laude

Academic Positions
Senior Lecturer, Spring 2018
CSE 2321: Foundations I (Algorithms), OSU

Post-Doctoral Researcher, Spring 2018
The Ohio State University

Presidential Fellow, 2017
The Ohio State University

Graduate Visitor, Spring 2017
Simons Institute for the Theory of Computing, Berkeley

Graduate Research Assistant, 2011-2017
Dept. of Computer Science and Engineering, OSU
Advisors: Mikhail Belkin & Yusu Wang

Graduate Research Assistant, 2012
Center for Cognitive Science, OSU
Advisors: Mikhail Belkin, Simon Dennis, Allison Lane

Graduate Student Instructor, 2011-2012
CSE 101/105: Intro to Computer-Assisted Problem Solving, OSU

Undergraduate Research Assistant, 2010
Dept. of Physics, The Ohio State University
Advisor: Fengyuan Yang

Undergraduate Research Assistant, 2009
Dept. of Physics, University of California, Davis
Advisor: Rena Zieve

Awards
Presidential Fellowship, The Ohio State University, 2017.
Most prestigious award bestowed by the OSU graduate school.

Neural Information Processing Systems (NIPS) Travel Award, 2016.

Best Student Paper, Conference on Learning Theory (COLT), 2015.
Beyond Hartigan Consistency, with M. Belkin & Y. Wang.

Center for Cognitive Science GRA, The Ohio State University, 2012.

Smith Senior Award, Dept. of Physics, The Ohio State University, 2011.

National Merit Scholar, 2006.


Publications

Conference Papers

Unperturbed: spectral analysis beyond Davis-Kahan.
J. Eldridge, M. Belkin, Y. Wang.
Algorithmic Learning Theory (ALT), 2018.

Graphons, mergeons, and so on!
J. Eldridge, M. Belkin, Y. Wang.
Neural Information Processing Systems (NIPS), 2016.
Full oral, top ∼2% of submissions.

Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering.
J. Eldridge, M. Belkin, Y. Wang.
Conference on Learning Theory (COLT), 2015.
Mark Fulk Award, best student paper.

Support Vector Machine (SVM) Analysis of Auditory Oddball Event-Related Potentials (ERP) Classifies Toddlers with and without Early Signs of Autism.
A.E. Lane, J. Eldridge, K. Harpster, S. Dennis, T. Shahin, M. Belkin.
International Meeting for Autism Research (IMFAR), 2012.

Journal Articles

Robust features for the automatic identification of autism spectrum disorder in children.
J. Eldridge, A.E. Lane, M. Belkin, S. Dennis.
Journal of Neurodevelopmental Disorders, 2014.

Workshop Abstracts

Graphons, mergeons, and so on!
J. Eldridge, M. Belkin, Y. Wang.
Abstract, talk. Workshop on Geometry and Machine Learning, 2016.

Technical Reports

Denali: A tool for visualizing scalar functions as landscape metaphors.
J. Eldridge, M. Belkin, Y. Wang.
http://denali.cse.ohio-state.edu/tech_report.pdf


Reviewing
Theoretical Computer Science, special issue on Algorithmic Learning Theory.
IEEE Transactions on Pattern Analysis and Machine Intelligence.

Talks

Invited
Tulane CS Colloquium, November 2017.
Air Force Research Laboratory ATR Summer Seminar, June 2017.
Information Theory and Applications, Graduation Day Talk, Feb. 2017.
Italian Institute of Technology Machine Learning Seminar, Dec. 2016.

Conference
NIPS 2016, full oral. Video: https://youtu.be/en_qtNAtkUs
COLT 2015, best student paper. Video: https://goo.gl/c7M42J

Seminar
Consistent Clustering. AI Seminar, OSU, November 2017.
Graphons, mergeons, and so on! Topology, Geometry, and Data Analysis (TGDA) seminar, OSU, November 2016.
Graphons, mergeons, and so on! AI Seminar, OSU, November 2016.
What do we seek in a hierarchical clustering? AI Seminar, OSU, April 2015.

Teaching
Senior Lecturer, The Ohio State University
CSE 2321: Foundations I (Algorithms), Spring 2018

Guest Lecturer, The Ohio State University
CSE 5522: Machine Learning – 3 classes
CSE 2331: Foundations II (Algorithms) – 2 classes

Graduate Instructor, CSE 101/105: Computer-Assisted Problem Solving, The Ohio State University, 2011-2012.
Invited by students to Faculty Appreciation Lunch.

Software
Denali: Cross-platform, open source interface for visualizing hierarchies as landscape metaphors. Written in C++ using Qt and VTK.
http://denali.cse.ohio-state.edu


Name: Aaron Fraenkel

Education: Ph.D. Mathematics, UC Berkeley, 2011

Academic Experience:
● UCSD, Assistant Teaching Professor, Chair Undergraduate Program, 2018-2020
● Boston College, Visiting Assistant Professor, 2012-2014
● Pennsylvania State University, Chowla Research Assistant Professor, 2011-2012

Non-Academic Experience:
● ID Analytics, Senior Data Scientist, Fraud Modeling and Identity Resolution, 2014-2016
● Amazon.com, Senior Machine Learning Scientist, Security and Abuse, 2016-2018

Service: Chair, DSC Undergraduate Program

Rajesh K. Gupta
Computer Science and Engineering
University of California, San Diego
9500 Gilman Drive, MC 0404
La Jolla, CA 92093-0404
Phone: 858-822-4391
http://mesl.ucsd.edu

EDUCATION
Indian Institute of Technology, Kanpur | Electrical Engineering | B. Tech, 1984
UC Berkeley, Berkeley, CA | Electrical Engineering & Computer Science | M.S., 1986
Stanford University, Stanford, CA | Electrical Engineering | Ph. D., 1994

ACADEMIC APPOINTMENTS
2018-now Distinguished Professor, Computer Science & Engineering, UC San Diego
2018-now Director, Halicioglu Data Science Institute, UC San Diego
2003-2018 Qualcomm Chair Professor, Computer Science & Eng., UC San Diego
2006 Visiting Professor, EPFL, Lausanne, Switzerland
2005 Visiting Professor, Electrical Engineering, Stanford University
2002-2003 Professor of Information and Computer Science, UC Irvine
1998-2002 Associate Professor of Information and Computer Science, UC Irvine
1996-1997 Assistant Professor of Information and Computer Science, UC Irvine
1994-1996 Assistant Professor of Computer Science, U. Illinois, Urbana-Champaign
1986-1993 Senior Design Engineer, Intel Corporation, Santa Clara, California

RELEVANT RECENT PUBLICATIONS

1. “Who can Access What, and When? Understanding Minimal Access Requirements of Building Applications,” J. Koh, D. Hong, S. Nagare, S. Boovaraghavan, Y. Agarwal, R. K. Gupta ACM International Conference on Systems for Buildings, Cities, and Transportation (BuildSys), 2019.

2. “Beyond a House of Sticks: Formalizing Metadata Tags with Brick”, G Fierro, J. Koh, Y. Agarwal, R. K. Gupta, D. E. Culler ACM Buildsys, 2019.

3. “Plaster: An Integration, Benchmark, and Development Framework for Metadata Normalization Methods”, J. Koh, D. Hong, R. K. Gupta, K. Whitehouse, H. Wang, Y. Agarwal, ACM BuildSys, 2018

4. “Scrabble: Transferrable Semi-Automated Semantic Metadata Normalization using Intermediate Representation,” J. Koh, B. Balaji, D. Sengupta, J. McAuley, R. K. Gupta, Y. Agarwal, ACM BuildSys, 2018.

5. “Brick: Metadata schema for portable smart building applications,” B. Balaji, A. Bhattacharya, G. Fierro, J. Gao, J. Gluck, D. Hong, A. Johansen, J. Koh, J. Ploennings, Y. Agarwal, M. Berges, D. Culler, R. K. Gupta, Mikkel B. Kjaergaard, M. Srivastava, K. Whitehouse, Applied Energy, 2018.

OTHER RECENT PUBLICATIONS

1. “Towards verified programming of embedded devices,” J-P Talpin, J-J Marty, S. Narayana, D. Stefan, R. K. Gupta, Design, Automation & Test in Europe (DATE), 2019.

2. “Real Time Principal Component Analysis,” R. R. Chowdhury, M. A. Adnan, R. K. Gupta, IEEE International Conference on Data Engineering (ICDE), 2019.

3. “Zodiac: Organizing Large Deployment of Sensors to Create Reusable Applications for Buildings,” B. Balaji, C. Verma, B. Narayanaswamy, Y. Agarwal, ACM BuildSys, 2015.

4. “Sentinel: An Occupancy Based HVAC Actuation System using existing WiFi Infrastructure in Commercial Buildings”, B. Balaji, J. Xu, R. K. Gupta, Y. Agarwal, ACM Conference on Embedded Networked Sensor Systems (SenSys 2013), 2013.

5. “Duty-Cycling Buildings Aggressively: The Next Frontier in HVAC Control,” Y. Agarwal, B. Balaji, S. Dutta, R. K Gupta, T. Weng, IEEE/ACM International Conference on Information Processing in Sensor Networks: Sensor Platforms, Tools and Design Methods (IPSN/SPOTS), April 2011.

RECENT AND ONGOING SYNERGISTIC ACTIVITIES

1. Editor-in-Chief, IEEE Trans on CAD, 2018-.
2. General Chair, BuildSys 2018, IPSN 2009, CPSWeek 2009.
3. Founding Editor-in-Chief, IEEE Embedded Systems Letters, 2009-13.
4. Founding General Chair, ACM/IEEE Conference on Models and Methods in Codesign (MEMOCODE).
5. Founding General Co-Chair of ACM/IEEE/IFIP CODES+ISSS Conference.

Arun Kumar

3218 CSE/EBU3b, UC San Diego
9500 Gilman Drive, Mail Code 0404
La Jolla, CA 92093

Email: [email protected]
(+1) 614-602-9734
Web: http://cseweb.ucsd.edu/~arunkk/

EDUCATION
University of Wisconsin-Madison
Ph.D. in Computer Sciences, 2011–2016
M.S. in Computer Sciences, 2009–2011

Indian Institute of Technology, Madras
B.Tech. in Computer Science and Engineering, 2005–2009

ACADEMIC EXPERIENCE
University of California, San Diego, Assistant Professor
Department of Computer Science and Engineering (CSE), from 2016
Halicioglu Data Science Institute (HDSI), from 2019

NON-ACADEMIC EXPERIENCE
Microsoft Jim Gray Systems Lab
Research Assistant, Fall 2013–Summer 2016

Microsoft Cloud and Information Services Lab
Research Intern, Summer 2013

Oracle Labs
Research Intern, Summer 2012

IBM Research Almaden
Research Intern, Summer 2011

PROFESSIONAL MEMBERSHIPS
Association for Computing Machinery (ACM)
Member since 2010. Professional Member since 2017.

The Institute of Electrical and Electronics Engineers (IEEE)
Member since 2014.

SELECTED HONORS
Google Faculty Research Award, 2020, 2017
Invited Paper at ACM Transactions on Database Systems, 2020, 2016
Honorable Mention for Best Paper Award at ACM SIGMOD, 2019
ACM SIGMOD Distinguished PC Member, 2019, 2017
VLDB Distinguished PC Member, 2019
Hellman Fellowship, 2018
Faculty of the Year from UCSD oSTEM Chapter, 2018
UW-Madison CS Graduate Student Research Award for best PhD research, 2016
Best Paper Award at ACM SIGMOD, 2014
Invited Paper at the Communications of the ACM, 2013

MAJOR SERVICE
Organizer: Associate Editor for VLDB‘21, XLDB‘18, SIGMOD‘18 DEEM Workshop, SIGKDD‘18 CMI Workshop, SoCal DB Day 2018

PC Member: SIGMOD ‘17–‘20, VLDB ‘18–‘21, MLSys ‘19–‘20, ICDE‘17, SIGMOD‘17 Demo and SRC, HotCloud‘16, SIGMOD‘16 URC

Reviewer: ACM TODS 2017 and 2015, IEEE TKDE 2014

Proposal Reviewer/Panelist:
NSF SBIR/STTR Phase II 2020, DOE Solar Office 2020, NSF HDR Data Science Corps 2019

Key Department/University Service:
2019–20: HDSI Faculty Recruiting Committee
2019–20: CSE Bylaws Committee
2018: UCSD LGBTQIA+ Undergraduate Scholarships Committee
2017–20: CSE MS Committee
2017: UCSD SDSC Sustainability Committee
2016–17: CSE PhD Admissions Committee

Key Contributions to Diversity:
2019–20: Represented UCSD CSE at NSF Workshop on Departmental BPC Plans; co-authored CSE’s Departmental BPC plan
2019: Organized a panel on LGBTQ+ community resources at UCSD on CSE Celebration of Diversity Day
2017–18: Represented UCSD and CSE twice at oSTEM National Conference
2017: Co-proposed/created a new UCSD CSE PhD diversity-focused scholarship
2017–: Active member of UCSD CSE DEI Committee
2017–: Actively involved with UCSD LGBT Resource Center and oSTEM activities (Q & A panels, talks, scholarships, etc.) as an out faculty member

PUBLICATIONS SUMMARY
Full papers at top-tier conferences (SIGMOD, VLDB, etc.): 21
Other peer-reviewed conference and journal papers: 6
Peer-reviewed workshop and demonstration papers: 12
Full papers under submission: 3
Number of citations: 1431 and h-index: 15 (as per Google Scholar in May 2020)
Full list of publications: https://adalabucsd.github.io/publications.html

SELECTED MAJOR PUBLICATIONS

Vista: Declarative Feature Transfer from Deep CNNs at Scale
Supun Nakandala and Arun Kumar
ACM SIGMOD 2020 (To appear)

Incremental and Approximate Inference for Faster Occlusion-based Deep CNN Explanations
Supun Nakandala, Arun Kumar, and Yannis Papakonstantinou
ACM SIGMOD 2019 (Honorable Mention for Best Paper Award)

Data Management in Machine Learning Systems
Matthias Boehm, Arun Kumar, and Jun Yang
Synthesis Lectures on Data Management, Morgan & Claypool Publ. (Book), 2019

Are Key-Foreign Key Joins Safe to Avoid when Learning High Capacity Classifiers?
Vraj Shah, Arun Kumar, and Xiaojin Zhu
VLDB 2018

Towards Linear Algebra over Normalized Data
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel
VLDB 2017

Learning Generalized Linear Models Over Normalized Data
Arun Kumar, Jeffrey Naughton, and Jignesh M. Patel
ACM SIGMOD 2015

Materialization Optimizations for Feature Selection Workloads
Ce Zhang, Arun Kumar, and Christopher Re
ACM SIGMOD 2014 (Best Paper Award; Invited to ACM TODS 2016)

Yian Ma

E-mail: [email protected]
https://sites.google.com/view/yianma

EXPERIENCE

Assistant Professor, Halicioglu Data Science Institute, University of California, San Diego, July 2020 –

Visiting Faculty, Google Brain Health and Google Research, August 2019 – July 2020

Post-doctoral Fellow, Electrical Engineering and Computer Sciences, September 2017 – August 2019
University of California, Berkeley, CA, USA
Advisor: Michael I. Jordan

EDUCATION

Ph.D. of Science, Applied Mathematics, June 2017
University of Washington, Seattle, WA, USA
Advisors: Emily B. Fox and Hong Qian

Bachelor of Engineering, Computer Science and Engineering (honor thesis), June 2012
Shanghai Jiao Tong University, Shanghai, China

SELECTED PUBLICATIONS

• Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan. Sampling can be faster than optimization, Proc. Natl. Acad. Sci., 2019.

• Chris Aicher, Yi-An Ma, Nick Foti, Emily B. Fox. Stochastic gradient MCMC methods for state space models, SIAM J. Math. Data Sci., 2019.

• Yi-An Ma, Emily B. Fox, Tianqi Chen, Lei Wu. Irreversible samplers from jump and continuous Markov processes, Stat. Comput. (2018).

• Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan. On the theory of variance reduction for stochastic gradient Monte Carlo, in Proceedings of International Conference on Machine Learning 35 (ICML 2018).

• Yi-An Ma, Nick Foti, Emily B. Fox. Stochastic gradient MCMC methods for hidden Markov models, in Proceedings of International Conference on Machine Learning 34 (ICML 2017).

• Xiaojie Qiu, Andrew Hill, Jonathan Packer, Dejun Lin, Yi-An Ma, Cole Trapnell. Single-cell mRNA quantification and differential analysis with Census, Nature Methods (2017).

• Yi-An Ma, Tianqi Chen, Emily B. Fox. A complete recipe for stochastic gradient MCMC, in Advances in Neural Information Processing Systems 28 (NIPS 2015).

SELECTED TALKS

• Bridging MCMC and Optimization

– Statistics Department Seminar, Mathematics Department, University of California, Davis; April 2020.

– Mathematics Department Seminar, Duke University; Sept. 2019.

– Invited Talk at Microsoft Research New England, Boston, MA; Aug. 2019.

– Invited Talk at Google Research, San Francisco, CA; July 2019.

– Statistics Department Seminar, University of Warwick; Feb. 2019.

– Machine Learning Department Seminar, Carnegie Mellon University; Feb. 2019.

– Halicioglu Data Science Institute (HDSI) Seminar, University of California, San Diego; Feb. 2019.

– Statistics Department Seminar, Rutgers University; Feb. 2019.

– Department of Statistics and Data Science Seminar, Yale University; Feb. 2019.

– Statistics Department Seminar, Eberly College of Science, Penn State; Jan. 2019.

– Courant Institute and Center for Data Science Seminar, New York University; Jan. 2019.

– Stewart School of Industrial and Systems Engineering (ISyE) Seminar, Georgia Tech; Jan. 2019.

• When is sampling faster than optimization and how to accelerate it?

– Invited talk at Bayes Comp, University of Florida; Jan. 2020.

– Statistics and Data Science Symposium, Halicioglu Data Science Institute (HDSI), UC San Diego; Jan. 2019.

• Stochastic gradient MCMC for independent and correlated data

– Statistics Department Seminar, University of Minnesota; April 2018.

– SAMSI Workshop on “Trends and advances in Monte Carlo algorithms”, Duke University; Dec. 2017.

– Invited talk at Pacific Northwest National Lab (PNNL), Richland, WA; Sept. 2017.

• A unifying framework for constructing MCMC algorithms from irreversible diffusion processes

– Probability Seminar, University of California, Berkeley; April 2018.

• Scalable and efficient MCMC for complex posteriors

– Statistics Department Seminar, Stanford University; Feb. 2017.

– Invited talk at Los Alamos National Lab (LANL), Los Alamos, NM; Jul. 2016.

– ICERM Workshop on “Stochastic numerical algorithms, multiscale modeling and high-dimensional data analytics”, Brown University; July, 2016.

– Intractable Likelihood (I-Like) Workshop; Lancaster University, Lancaster, UK; June, 2016.

• Stochastic gradient MCMC methods for hidden Markov models

– 34th International Conference on Machine Learning (ICML); Sydney, Australia; Aug. 2017.

• A complete recipe for stochastic gradient MCMC

– 29th Annual Conference on Neural Information Processing Systems (NIPS); Montreal, Canada; December, 2015.

SELECTED AWARDS

• 2017 Stein fellowship (declined for other opportunities).

• Best undergraduate thesis “Lyapunov functions for oscillatory and chaotic dynamical systems” awarded by the computer science department, Shanghai Jiao Tong University.

SERVICES

• Journals: reviewer for Journal of the American Statistical Association (JASA), Biometrika, Bernoulli, Journal of Machine Learning Research (JMLR), Statistics and Computing.

• Conferences: reviewer for Advances in Neural Information Processing Systems (NeurIPS/NIPS), International Conference on Machine Learning (ICML), Annual Conference on Learning Theory (COLT); served on the program committee of AAAI Conference on Artificial Intelligence.

• Secretary for the University of Washington Chapter of Society for Industrial and Applied Mathematics (SIAM), 2015-2016.

PATENT

• Patent: Real Time Supervise Machine against Traffic Law Violation awarded by the State Intellectual Property of the People's Republic of China in 2012 (Patent No.: 201120076406.X).

1. Gal Mishne

2. Education
BSc, Electrical Engineering, 2009
BSc, Physics, 2009
PhD, Electrical Engineering, 2017

3. Academic experience
Yale University, Gibbs Assistant Professor, 2017-2019
UC San Diego, Assistant Professor, 2019-present

4. Non-academic experience
Rafael Advanced Defense Systems Ltd., Image processing engineer, 2008-2014

5. Awards
AMS-Simons Travel Grant, 2018
SIAM Early Career Travel Award, 2017
Wolf Foundation Award for Ph.D. students, 2016

6. Service activities
• Hiring committee – HDSI/Neurobiology, 2020
• PhD program committee – HDSI, 2020
• Reviewer – Cosyne 2020; ICML 2020; Neural Computation; Involve; IEEE Transactions on Image Processing; IEEE ICASSP; Elsevier Information Sciences; Journal of Mathematical Imaging and Vision; Neurons, Behavior, Data analysis, and Theory; Advances in Computational Mathematics
• DeepMath 2020 – Co-organizer

7. Briefly list the most important publications from the past five years
• X. Cheng and G. Mishne, “Spectral embedding norm: To look deep into the spectrum of the graph Laplacian”, accepted to SIAM Imaging Sciences.
• G. C. Linderman, G. Mishne, A. Jaffe, Y. Kluger and S. Steinerberger, “Randomized nearest neighbor graphs, giant components and applications in data science”, accepted to Advances in Applied Probability.
• S. Gigante, A. S. Charles, S. Krishnaswamy and G. Mishne, “Visualizing the PHATE of Neural Networks”, NeurIPS-2019, December 2019.
• G. Mishne, Eric C. Chi and R. R. Coifman, “Co-manifold learning with missing data”, ICML 2019, June 2019.
• X. Cheng, G. Mishne, and S. Steinerberger, “The geometry of nodal sets and outlier detection”, Journal of Number Theory, vol. 185, pp. 48-64, 2018.
• G. Mishne, R. Talmon, I. Cohen, Y. Kluger and R. R. Coifman, “Data-driven tree transforms and metrics”, IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 3, pp. 451-466, Sept. 2018.
• G. Mishne, U. Shaham, A. Cloninger and I. Cohen, “Diffusion Nets”, Applied and Computational Harmonic Analysis, Aug. 2017.
• G. Mishne, R. Talmon, R. Meir, J. Schiller, M. Lavzin, U. Dubin and R. R. Coifman, “Hierarchical coupled-geometry analysis for neuronal structure and activity pattern discovery”, IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 7, pp. 1238-1253, Oct. 2016.

8. Briefly list the most recent professional development activities

DIMITRIS N. POLITIS

Education

1990 Stanford University, Ph.D. in Statistics.
1990 Stanford University, M.S. in Statistics.
1989 Stanford University, M.S. in Mathematics.
1985 Rensselaer Polytechnic Institute, M.S. in Computer and Systems Engineering.
1984 University of Patras, B.S. in Electrical Engineering.

Academic Positions

2018-present Associate Director, Halicioglu Data Science Institute, UCSD.
2016-present Distinguished Professor, Department of Mathematics; also affiliated with the Department of Economics, University of California at San Diego.
2001-2016 Professor, Department of Mathematics, and Adjunct Professor, Department of Economics, University of California at San Diego.
2014 (Summer) John-von-Neumann Visiting Professor, Department of Mathematics, Technical University of Munich, Germany.
1997-2001 Associate Professor, Department of Mathematics, UCSD.
1999 (Fall) Visiting Associate Professor, Athens University of Economics and Business.
1995-1997 Associate Professor, Department of Mathematics and Statistics, University of Cyprus.
1995-1996 Associate Professor, Department of Statistics, Purdue University (on leave).
1990-1995 Assistant Professor, Department of Statistics, Purdue University.

Professional Associations

Institute of Mathematical Statistics (Fellow), American Statistical Association (Fellow), The Econometric Society, International Society for NonParametric Statistics (Co-founder).

Grants and Awards

2020 Co-Principal investigator, NIH grant T32 MH122376-01 [304118-00001], "Advanced data analytics training for behavioral and social sciences research".
2019 Principal investigator, NSF grant DMS 19-14556, "Computer-intensive methods for nonparametric analysis of dependent data".
2013 Awarded the Econometric Theory Multa Scripsit Award.
2013 Fellow of the Institute of Advanced Study, Technische Universitaet Muenchen.
2012 Co-Principal investigator, NSF grant DMS 12-23137, "ATD: Detection of Clusters in Spatial Data and Images".
2012 Awarded the Tjalling C. Koopmans Econometric Theory Prize 2009-2011 for the paper "Higher-Order Accurate, Positive Semi-Definite Estimation of Large-Sample Covariance and Spectral Density Matrices," Econometric Theory, Vol. 27, No. 4, August 2011, pp. 703-744.
2011 Fellowship from the John Simon Guggenheim Memorial Foundation for the project "Model-free Prediction and Regression".
2011 Elected Fellow of the American Statistical Association (ASA). Citation reads: "For path-breaking research in nonparametric statistics, for outstanding applications of this methodology to time series analysis, resampling, subsampling, and function estimation; and for exemplary leadership and service to the profession, especially for conference organization and prolific editorial work."
2004 Elected Fellow of the Institute of Mathematical Statistics (IMS). Citation reads: "Prof. Politis received the award for innovative methodology in the analysis of time series and models of spatial dependence, as well as groundbreaking theory in nonparametric statistics."

Professional Service

• Chair (2018-2019) and Chair-Elect (2017-2018) of the Section on Nonparametric Statistics of the American Statistical Association.

• Member of the External Review Committee for the Department of Statistics, Purdue University, Sept. 2016.

• Co-founder (with M. Akritas and S.N. Lahiri) of the International Society for NonParametric Statistics (ISNPS), and member of the ISNPS Executive Committee 2010-2016.

Editorial Work

• Co-Editor of the Journal of Time Series Analysis, 2013--present.
• Editor of the IMS Bulletin, 2011--2014.
• Founding member of the Editorial Board of the Springer Book Series: Frontiers in Probability and the Statistical Sciences, 2012--present.
• Editor of the Journal of Nonparametric Statistics, 2008--2011.
• Associate Editor of the journal Econometrics and Statistics, 2016--2018.
• Associate Editor of the journal Bernoulli, 2013--2018.
• Associate Editor of the Electronic Journal of Statistics, 2013--2018.
• Associate Editor of the Journal of the American Statistical Association, 2011--2020.
• Associate Editor of the Journal of the Royal Statistical Society, Series B, 2009--2012.
• Associate Editor of the IMS Collections Series, 2008--2012.
• Associate Editor of the Journal of Time Series Analysis, 2006--2013.
• (Associate) Editor of the Journal of Multivariate Analysis, 2005--2011.
• (Associate) Editor of the Journal of Nonparametric Statistics, 2005--2008.
• Associate Editor of the Journal of Statistical Planning and Inference, 2000--2006.
• Co-Editor of Sankhya, the Indian Journal of Statistics, 1999--2002.

Publications

Co-author of over 100 journal papers (see list at: www.math.ucsd.edu/~politis/) and of the books:
-- SUBSAMPLING, D.N. Politis, J.P. Romano, M. Wolf, Springer, New York, 1999.
-- MODEL-FREE PREDICTION AND REGRESSION: A TRANSFORMATION-BASED APPROACH TO INFERENCE, D.N. Politis, Springer, New York, 2015.
-- TIME SERIES: A FIRST COURSE WITH BOOTSTRAP STARTER, T.S. McElroy and D.N. Politis, Chapman and Hall/CRC Press, 2020.

1. Name: Rayan Saab

2. Education:

• The University of British Columbia, Vancouver, Canada; Electrical Engineering; Ph.D., 2010

3. Academic experience:

• 2017–present: Associate Professor, Mathematics, The University of California, San Diego (UCSD), San Diego, CA

• 2013–2017: Assistant Professor, Mathematics, The University of California, San Diego (UCSD), San Diego, CA

• 2011–2013: Visiting Assistant Professor, Mathematics, Duke University, Durham, NC

• 2010–2011: Postdoctoral Researcher, Mathematics, The University of British Columbia, Vancouver, Canada

4. Non-academic experience: N/A

5. Certifications or professional registrations

6. Member of the AMS

7. Honors and awards:

• August-Wilhelm Scheer Visiting Prof., Technische Universität München (June 2017)

• Hellman Fellowship (July 2015 -- June 2016)

• Simons Foundation Collaboration Grant (2015, declined)

• Mercator Fellowship (2014 -- 2017)

• Banting Postdoctoral Fellowship (October 2011 -- September 2013)

• NSERC Postdoctoral Fellowship (2011, declined)

8. Service activities (within and outside of the institution):

• Within UCSD (last 2 years):

i. 2018/2019 – 3 departmental merit review AdHoc Committees, Mathematics Department Hiring Committee, Data Science Hiring Committee, HDSI Faculty Council, Mathematics Department Strategic Growth Committee, Mathematics Department Space Committee, Faculty Advisor, Mathematics – Computer Science Major, Math Dept., UCSD.

ii. 2019/2020 – Faculty Advisor, Mathematics – Applied Math Major, Math Dept., UCSD, Graduate Admissions Committee, Mathematics Department, HDSI Undergraduate Scholarship Committee, UCSD, HDSI PhD Program Planning Committee, HDSI Hiring Committee.

• Outside UCSD (last 3 years): Special session organizer, Joint Math Meetings, Atlanta GA, 2017; Session organizer, Information Theory and Applications Workshop, San Diego CA, February 2017, 2018, 2019; Invited session organizer, Sampling Theory and Applications (SampTA), July 2017, 2019; IMA Special Workshop “Phaseless Imaging in Theory and Practice: Realistic Models, Fast Algorithms, and Recovery Guarantees”, August 14 - 18, 2017; Currently co-organizing the international inter-institutional “One World Mathematics of Information, Data, and Signals (MINDS)” seminar.

9. Briefly list the most important publications and presentations from the past five years – title, co-authors if any, where published and/or presented, date of publication or presentation

• T. Huynh, R. Saab, “Fast binary embeddings, and quantized compressed sensing with structured matrices”, Communications on Pure and Applied Mathematics, vol. 73, no. 1, pp. 110–149, 2020.

• D. Needell, R. Saab, T. Woolf, “Simple Classification using Binary Data”, The Journal of Machine Learning Research, vol. 19, no. 1, pp. 2487–2516, 2018.

• R. Saab, R. Wang, Ö. Yılmaz, “Quantization of compressive samples with stable and robust recovery”, Applied and Computational Harmonic Analysis, vol. 44, pp. 123–143, 2018.

• K. Knudson, R. Saab, R. Ward, “One-bit compressive sensing with norm estimation”, IEEE Transactions on Information Theory, vol. 62, no. 5, pp. 2748–2758, 2016.

• M. A. Iwen, B. Preskitt, R. Saab, A. Viswanathan, “Phase retrieval from local measurements: Improved robustness via eigenvector-based angular synchronization”, Applied and Computational Harmonic Analysis, vol. 48, no. 1, pp. 415–444, 2020.

• R. Saab, R. Wang and Ö. Yılmaz, “From Compressed Sensing to Compressed Bit-Streams: Practical Encoders, Tractable Decoders”, IEEE Transactions on Information Theory, vol. 64, no. 9, pp. 6098–6114, 2018.

10. Briefly list the most recent professional development activities N/A (?)


Armin Schwartzman

Education
Technion – Israel Inst. of Tech., Haifa, Israel; Electrical Eng.; BS, 1995
California Inst. of Tech., Pasadena, CA; Electrical Eng.; MS, 1996
Stanford University, Stanford, CA; Statistics; PhD, 2006

Academic Appointments (full time)
U. of California, San Diego; Biostatistics & HDSI; Professor, 2019-
U. of California, San Diego; Biostatistics; Assoc. Prof., 2016-2019
Technion – Israel Inst. of Tech.; Industrial Eng.; Visiting Prof., 2015-2016
North Carolina State University; Statistics; Assoc. Prof., 2013-2015
Technion – Israel Inst. of Tech.; Electrical Eng.; Visiting Prof., 2012-2013
Harvard School of Public Health; Biostatistics; Assist. Prof., 2007-2012

Professional Experience (full time)
DaimlerChrysler Research & Tech., Palo Alto, CA; Research Intern, 2013
Biosense Webster (Israel) Ltd., Haifa, Israel; Algorithm Developer, 1999-2001
Rockwell Semiconductor Systems, San Diego, CA; Systems Engineer, 1996-1998

Professional Memberships: Institute of Mathematical Statistics (lifetime member), American Statistical Association (lifetime member)

Honors and Awards (excluding research grants)
U. of California, San Diego; Hispanic Center of Excellence (HCOE) Fellow, 2018
Stanford University; Teaching award in statistics, 2003
Stanford University; W. R. Kimbal and S. Heart Graduate Fellowship, 2001
Technion – Israel Inst. of Tech.; President's Academic List of Honors, 1992, 1994
Technion – Israel Inst. of Tech.; Dean’s Academic List of Honors, 1993

Service Activities

1. Academic Advising: 9 postdoctoral scholars, 17 PhD students (4 as main thesis adviser + 5 as PhD committee member + 8 other), 4 MSc students, 2 BSc students.

2. University Service:

UC San Diego: HDSI MS Program Cmte. (AY 2019-20); HDSI PhD Program Cmte. (AY 2019-20); HDSI Faculty Council (AY 2019-20); HDSI Advisory Board (AY 2018-19); Biostatistics Executive Committee (AY 2018-20); Biostatistics PhD Program Admissions Cmte. (AY 2016-20); Biostatistics PhD Program Education Cmte. (AY 2016-20); BS in Public Health Steering Cmte. (AY 2018-19); Biostatistics Hiring Cmte. (AY 2016-17).

NC State University: Written Preliminary Exam Committee (AY 2014-15); Hiring Committee (AY 2014-15); Basic Exam Committee (AY 2013-14).

Harvard SPH: High Dimensional Data Seminar co-Chair (AY 2007-12); Qualifying Exam Cmte. (AY 2008-11); Newsletter Cmte. (AY 2007-09); Degree Program Cmte. (AY 2007-08); Diversity Cmte. (AY 2007-08); Seminar Cmte. (AY 2006-07).


3. Reviewer services:

a. Associate Editor, Electronic Journal of Statistics (2016-Present); Econometrics and Statistics, Special issue on Neuroimaging (2018-2020).

b. Journal reviewer: 77 times for 17 statistical journals; 29 times for 16 scientific journals.

c. Grant Reviewer: Emerging Imaging Technologies in Neuroscience (EITN) Study Section, NIH (2020); Dutch Research Council (NWO) (2019); Israeli-Québec Collaboration in Medical Bio-Imaging (2017); Statistics Program, Division of Mathematical Sciences, NSF (2017); Biostatistical Methods and Research Design Study Section (BMRD), NIH (2015); Network for Translational Research: Optical Imaging, NCI/NIH (2008); In-vivo Cellular and Molecular Imaging Centers, NCI/NIH (2007).

4. Conference organization: Comp. and Methodological Statistics, Pisa, Italy (2018), London, UK (2017) and Sevilla, Spain (2016); Joint Statistical Meetings, Montreal, QC (2013) and Vancouver, BC (2010); International Biometric Society, Fort Collins, CO (2012) and San Luis Obispo (2011); Harvard Cancer Center (2009); Radcliffe Institute for Advanced Study (2008).

Publications: 57 original articles (29 in methodological journals, 28 in other scientific journals), 4 invited comments, 5 refereed conference proceedings.

MOST IMPORTANT PUBLICATIONS – LAST FIVE YEARS

1. Schwartzman A, Keeling RF. Achieving atmospheric verification of CO2 emissions. Nature Climate Change 2020; 10: 416-417.

2. Schwartzman A, Schork A, Zablocki R, Thompson WK. A simple, consistent estimator of heritability from genome-wide association studies. Ann. Appl. Stat. 2019; 13(4): 2509-2538.

3. Schwartzman A, Telschow F. Peak p-values and false discovery rate inference in neuroimaging. Neuroimage 2019; 197: 402-413.

4. Sommerfeld M, Sain S, Schwartzman A. Asymptotic Confidence Regions for Spatial Excursion Sets, with an Application to Climate. J. Amer. Stat. Assoc. 2018; 113:523, 1327-1340.

5. Cheng D, Schwartzman A. Expected Number and Height Distribution of Critical Points of Smooth Isotropic Gaussian Random Fields. Bernoulli 2018; 24(4B), 3422-3446.

6. Cheng D, Schwartzman A. Multiple Testing of Local Maxima for Detection of Peaks in Random Fields. Ann. Stat. 2017; 45(2): 529-556.

7. Schwartzman A. Log-Normal Distributions and Geometric Averages of Positive Definite Matrices. International Statistical Review 2016; 84(3): 456-486.

8. Azriel D, Schwartzman A. The Empirical Distribution of a Large Number of Correlated Normal Variables. J. Amer. Stat. Assoc. 2015; 110(511): 1217-1228.

Oral Presentations (national and international): 63 invited seminars (Statistics, Biostatistics, Mathematics and Engineering departments); 41 invited conference presentations; 9 contributed conference presentations; 11 contributed posters.

Recent Professional Development Activities: Mentoring and teaching workshops as fellow of the UCSD Hispanic Center of Excellence (2017-2019).


1. Name Jingbo Shang

2. Education – degree, discipline, institution, year:

PhD, Computer Science, University of Illinois at Urbana-Champaign, 2019

3. Academic experience – institution, rank, title (chair, coordinator, etc. if appropriate), when (ex. 2002-2007), full time or part time

University of California, San Diego, Assistant Professor, 2020-Present

4. Non-academic experience – company or entity, title, brief description of position, when (ex. 2008-2012), full time or part time

5. Certifications or professional registrations

6. Current membership in professional organizations

7. Honors and awards

Google PhD Fellowship, 2017-2019
World Champions in IEEE Xtreme Competition, 2018 and 2019
The Web Conference Best Poster Award Runner-up, 2018
4th Place in Fake News Challenge, 2017
Grand Prize in Yelp Dataset Challenge, 2015

8. Service activities (within and outside of the institution)

ACM SIGKDD 2020 Workshop Co-Chair. Program Committee member of ACL and EMNLP since 2019. Reviewer for the IEEE TKDE, TKDD, and TBD journals. Tutorial Organizer at ACM SIGKDD since 2017. Chief Judge of ACM-ICPC North America Mid-Central Region since 2019.

9. Briefly list the most important publications and presentations from the past five years – title, co-authors if any, where published and/or presented, date of publication or presentation

There are about 40 publications in the past 5 years. The complete list can be found on my website (https://shangjingbo1226.github.io/publications/) or my Google scholar page (https://scholar.google.com/citations?user=0SkFI4MAAAAJ&hl=en). Here are a few examples:

1. “Contextualized Weak Supervision for Text Classification,” D. Mekala and J. Shang. Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

2. “Empower Entity Set Expansion via Language Model Probing,” Y. Zhang, J. Shen, J. Shang and J. Han. Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

3. “NetTaxo: Automated Topic Taxonomy Construction from Large-Scale Text-Rich Network,” J. Shang, X. Zhang, L. Liu, S. Li and J. Han. The Web Conference (WWW), 2020.

4. “Integrating Local Context and Global Cohesiveness for Open Information Extraction,” Q. Zhu, X. Ren, J. Shang, Y. Zhang, A. El-Kishky and J. Han. International Conference on Web Search and Data Mining (WSDM), 2019.

5. “CrossWeigh: Training Named Entity Tagger from Imperfect Annotations,” Z. Wang, J. Shang, L. Liu, L. Lu, J. Liu and J. Han. ACL SIGDAT Empirical Methods in Natural Language Processing (EMNLP), 2019.

6. “Learning Named Entity Tagger using Domain-Specific Dictionary,” J. Shang, L. Liu, X. Gu, X. Ren, T. Ren and J. Han. ACL SIGDAT Empirical Methods in Natural Language Processing (EMNLP), 2018.

7. “Empower Sequence Labeling with Task-Aware Neural Language Model,” L. Liu, J. Shang, F. Xu, X. Ren, H. Gui, J. Peng and J. Han. AAAI Conference on Artificial Intelligence (AAAI), 2018.

8. “Automated Phrase Mining from Massive Text Corpora,” J. Shang, J. Liu, M. Jiang, X. Ren, C. Voss and J. Han. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.

9. “MetaPAD: Meta Pattern Discovery from Massive Text Corpora,” M. Jiang, J. Shang, T. Cassidy, X. Ren, L. Kaplan, T. Hanratty and J. Han. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2017.

10. Briefly list the most recent professional development activities

1. NAME. Benjamin Smarr

2. EDUCATION.
Ph.D., Neurobiology and Behavior Program, University of Washington.
B.S., Biological Sciences, University of California at Santa Cruz.

3. POSITION.
Assistant Professor in Bioengineering and the Halıcıoğlu Data Science Institute, University of California, 3/2019-
NIH K99 fellow, UC Berkeley, 2017-2019
NIH F23 fellow, UC Berkeley, 2014-2017
Postdoctoral researcher, Kriegsfeld lab, UC Berkeley, 2013

4. CONSULTING AND ADVISING
HALE Sports, 2018 - Scientific advisor
Decoding Superhuman Podcast, 2018 - Scientific consultant
OWaves, 2017 - Scientific advisor
Stephen Auger Lighting / AhhBe, 2016 - Scientific advisor
Invicta, 2016 - Scientific advisor
PrimaTemp, 2016 - Scientific advisor
Oura Ring, 2015 - Scientific consulting
Reverie Sleep Systems, 2015 - Scientific advisor

5. CURRENT MEMBERSHIPS
Society for Behavioral Neuroendocrinology, 2014-
Endocrine Society, 2012-
Society for Research in Biological Rhythms, 2011-
Society for Neuroscience, 2007-
Inaugural student president, Center for Sensorimotor Neural Engineering, UW, 2011-2012
President, Neurobiology and Behavior Community Outreach, UW, 2010-2012
Organizing member of Neurobiology and Behavior Community Outreach, UW, 2006-2012

6. SELECTED AWARDS AND HONORS
2017-2022. Pathway to Independence Award (K99/R00), NIEHS (PI: Smarr). “Understanding the Impact of Environmental Disruption in Biological Timing Systems Through Signal Processing”.
2014-2017. Postdoctoral Ruth L. Kirschstein National Service Award (NRSA), NICHD (PI: Smarr). “The Role of Circadian Stability During Development in Adult Health and Behavior”.
2012-2018. Research Merit Award, Society for Research in Biological Rhythms.
2016. Best in Body Area Networks Award, Body Area Networks.
2012. Seed Project Grant, NSF Center for Sensory-motor and Neural Engineering, University of Washington (PI: Smarr). “Circadian Modulation of Neuromotor Control”.
2012. Education and Outreach-Development Grant, NSF Center for Sensory-motor and Neural Engineering, University of Washington (PI: Smarr). “Neural Engineering Outreach Development”.
2010. Outreach activities featured in Science with picture, SCIENCE VOL 328 25 JUNE 2010.
2009. NSF Doctoral Dissertation Improvement Grant (DDIG) (PI: de la Iglesia and Smarr). “Circadian Regulation of Female Reproductive Physiology”.

7. SERVICE
Science in Society Outreach: Vocal proponent of data rights in biomedicine, and frequent public speaker / interviewee about the role of technology advancing public health, and the importance of the individual in this process. Recent publications include Wired, The Economist, BBC Business Daily, San Francisco Chronicle, Readers’ Digest, and many others globally.

Education outreach: Over 100 K-12 “What do Brains do” school visits and lab tours; Special emphasis on schools in lower socioeconomic status neighborhoods; Organizer, “Brain Days Fair” at University of Washington through the Neurobiology and Behavior Community Outreach; Created lesson plans for neuroscience classroom activities with over 10,000 downloads.

8. SELECT PUBLICATIONS FROM 2016-2020, AND THEIR IMPORTANCE.

Koskimaki H, Kinnunen H, Salla R, Smarr B. Following the heart – What does variation of resting heart rate tell us about us as individuals and as population. UbiComp. 2019. Winner, best paper.

Smarr B, Cutler T, Loh D, Kuljis D, Kudo T, Kriegsfeld L, Ghiani C, Colwell C. Circadian dysfunction in the (z)Q175 model of Huntington’s disease: network analysis. J. Neurosci Res. 2019.

Smarr B, Schirmer A. 3.4 million real-world learning management system logins reveal the majority of students experience social jetlag correlated with decreased performance. Scientific Reports. 2018 March 29; 8:4793. Featured on BBC, Reddit front page, The Telegraph, ScienceDaily, Slate, and dozens of other news outlets globally.

Gharibans AA, Smarr B, Kunkel DC, Kriegsfeld LJ, Hayat MM, Coleman TP. Demonstration and Validation of a Noninvasive System for Measuring Gastric Myoelectric Activity in Ambulatory Subjects. Scientific Reports. 2018 March 22; 8:5019. Featured in San Diego Union-Tribune, EurekAlert!, MobiHealthNews, and many national news outlets.

Smarr B, Grant A, Perez L, Zucker I, Kriegsfeld LJ. Maternal and Early-Life Circadian Disruption Have Long Lasting Negative Consequences on Offspring Development and Adult Behavior in Mice. Scientific Reports. 2017 June 12; 7(1):3326.

Smarr B, Grant A, Zucker I, Prendergast BJ, Kriegsfeld LJ. Sex Differences in Variability Across Time Scales in Mice. Biology of Sex Differences. 2017 Feb 9; 8:7.

Smarr B, Zucker I, Kriegsfeld LJ. Detection of Successful and Unsuccessful Pregnancies in Mice within Hours of Pairing Through Frequency Analysis of High Temporal Resolution Core Body Temperature Data. PLoS One. 2016 Jul 28; 11(7):e0160127.

Smarr B, Burnett DC, Mesri SM, Kriegsfeld LJ, Pister KSJ. A Wearable Sensors System with Circadian Rhythm Stability Estimation for Prototyping Biomedical Studies. IEEE Transactions on Affective Computing. 2016 Jul; 7(3): 220-230.

Smarr B, Kriegsfeld LJ. “Biological Rhythms.” American Psychiatric Association Handbook. 2016; 599-614.

My most important work of the last 5 years has been leveraging time series data into insights about health outcomes. In animal models these include the ability to detect pregnancy within hours of conception, and to predict pregnancy outcome within the first day of pregnancy; to develop within-individual tracking of Huntington’s disease progression; to demonstrate that female animals show less variability than males, not more, across ovulatory cycles when faster biological rhythms are included in analyses; and that disruptions to circadian rhythms during pregnancy cause autism-like outcomes in resultant offspring.

I have transitioned this work into human populations, contributing a number of important insights in the last few years. These include: determining wearable device design for inferring circadian phase from body temperature; inferring internal hormonal concentrations from time series features of wearable sensor data; prediction of student academic performance from sleep and circadian metrics; co-discovery of circadian and ultradian rhythms within the gastric system; seasonal variation in cardiac output across large populations.

My work is now focused on developing early insights into COVID-19 and other illnesses from wearable device data. Manuscripts in preparation include detection of fever, prediction of illness onset, and classification of illness variants from physiological time series features. This work is being carried out with participation from 50,000 wearable device users, and is an important template for expanded capture of “natural experiments” for developing tools for individuals at public-health scale using distributed hardware infrastructure of personal tracking devices.

Together, these works focus on unlocking the novel potential of continuous physiological data generated from wearable devices and related technologies, with a focus on modernizing women’s health, education outcomes, and long-term care and monitoring in illness.

9. RECENT PROFESSIONAL DEVELOPMENT
NSF Big Data in Public Infrastructure workshop, participant
World Economic Forum on Public Health Potential from Wearable Devices, participant and presenter
TemPredict public-private planning sessions, coordination with NIH, DoD, wearable device manufacturers, UCSD @ HDSI and SDSC

Berk Ustun

Education

Massachusetts Institute of Technology, 2010 – 2017
PhD in Electrical Engineering and Computer Science
SM in Computation for Design and Optimization

University of California, Berkeley, 2005 – 2009
BS in Industrial Engineering and Operations Research, BA in Economics

Academic Positions

University of California, San Diego, 2021 – Present
Assistant Professor, Halıcıoğlu Data Science Institute

Google Research, 2020 – 2021
Visiting Faculty, Google Medical Brain

Harvard University, 2017 – 2020
Postdoctoral Fellow, Center for Research on Computation and Society

Selected Professional Experience

Petal. New York, NY, 2015 – 2020
Co-Founder & Technical Advisor
Spearheaded machine learning strategy to lend responsibly to consumers without credit history in the US. Petal is a credit card startup with over 100 employees and 50K customers.

Amazon. Seattle, WA, Summer 2013
Research Scientist Intern, IPC Buying Strategy Team
Developed algorithms to identify complementary products. Proposed new inventory and transportation policies for complementary products that achieved major cost savings.

Selected Honors & Awards

INFORMS Innovative Applications in Analytics Award, 2016 & 2019
INFORMS Computing Society Best Student Paper Prize, 2017
INFORMS Wagner Award for Excellence in Operations Research Practice, Finalist, 2017
MIT Presidential Fellowship, 2012

Academic Service

Conference & Workshop Organization: NeurIPS Workshop Selection Program Committee (2020), FAT* – Co-Chair for the Computer Science Track (2020), FAT/ML – Workshop Organizer & Webmaster (2017, 2018), INFORMS Session Organizer (2013-2019)

Grant Reviewing: National Science Foundation Panelist (2019)

Journal Reviewing: Management Science, IEEE Transactions on Signal Processing, Statistical Analysis & Data Mining, Artificial Intelligence, Information Sciences, Minds & Machines, Big Data, Epidemiology, Nature Digital Medicine, Artificial Intelligence & Law, Journal of Quantitative Criminology, IBM Journal of Research & Development.

Conference Program Committee: NeurIPS (2018, 2019, 2020), ICML (2019), ICLR (2020), FAT* (2018, 2019), AISTATS (2019), AAAI (2019), HCOMP (2019), UAI (2018), ISIT (2018)


Advising

PhD Students
Jennifer Chien, UCSD CSE, 2020 – Present
Jamelle Watson-Daniels, Harvard SEAS, 2020 – Present
Eric Mibuari, Harvard SEAS, 2018 – Present
Hao Wang, Harvard SEAS, 2017 – Present

MS Students
Haorang Zhang, University of Toronto, 2020 – Present
Vinith Suriyakumar, University of Toronto, 2020 – Present
Alexander Spangher, Columbia University, 2017 – 2019

Undergraduates
Tynan Seltzer, Harvard SEAS, 2018 – Present
Charles Marx, Haverford College, Summer 2019
Jiaming Zeng, MIT, 2014 – 2016

Selected Papers

1. Predictive Multiplicity in Classification. Charles Marx, Flavio Calmon, Berk Ustun. International Conference on Machine Learning, 2020

2. Learning Optimized Risk Scores. Berk Ustun, Cynthia Rudin. Journal of Machine Learning Research, 2019

3. Fairness without Harm: Decoupled Classifiers with Preference Guarantees. Berk Ustun, Yang Liu, David C. Parkes. International Conference on Machine Learning, 2019

4. Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions. Hao Wang, Berk Ustun, Flavio Calmon. International Conference on Machine Learning, 2019

5. Actionable Recourse in Linear Classification. Berk Ustun, Alexander Spangher, Yang Liu. ACM Conference on Fairness, Accountability, and Transparency, 2019

6. The World Health Organization Adult ADHD Self-Report Screening Scale for DSM-5. Berk Ustun, Lenard Adler, Cynthia Rudin, Stephen Faraone, Thomas Spencer, Patricia Berglund, Michael Gruber, Ronald C. Kessler. JAMA Psychiatry, 2017

7. Association of an EEG-Based Risk Score With Seizure Probability in Hospitalized Patients. Aaron Struck, Berk Ustun, Andres Ruiz, Jong Woo Lee, Suzette LaRoche, Lawrence Hirsch, Emily Gilmore, Jan Vlachy, Hiba A. Haider, Cynthia Rudin, Brandon Westover. JAMA Neurology, 2017

8. Interpretable Classification Models for Recidivism Prediction. Jiaming Zeng, Berk Ustun, Cynthia Rudin. JRSS Series A, 2016

9. Supersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring SystemsSupersparse Linear Integer Models for Optimized Medical Scoring Systems.Berk Ustun, Cynthia Rudin. Machine Learning, 2015

Professional Development

MIT Kaufman Teaching Certification Program, Spring 2014. Completed semester-long course on modern approaches to curriculum design, lecturing, and assessment.


Bradley Voytek, Ph.D.

Initial faculty appointment: March 01, 2014

University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0515

(858) 643-0002 [email protected]

http://www.voyteklab.com http://github.com/voytekresearch

Education

2004 - 2010 Ph.D. Neuroscience, UC Berkeley
1998 - 2002 B.S. Psychology, University of Southern California

Professional

2020 - current Vice-chair: Data Science Major/Minor Steering Committee, UC San Diego
2019 - current Board Member: UC San Diego, Halıcıoğlu Data Science Institute Industry Member Board
2019 - current Co-founder and Board Member: Data Science Alliance - 501(c)(3) non-profit
2019 - current Diversity Representative and Chair: Halıcıoğlu Data Science Institute, UC San Diego
2018 - current Director: Halıcıoğlu Data Science Institute Scholarship Program, UC San Diego
2018 - current Director: Halıcıoğlu Data Science Institute Distinguished Lecture Series, UC San Diego
2018 - current Affiliate: Institute for Practical Ethics, UC San Diego
2018 - current Associate Professor (with tenure), Cognitive Science, UC San Diego
2018 - 2020 Fellow (similar to endowed chair), Halıcıoğlu Data Science Institute at UC San Diego
2017 - current Executive Committee, Neurosciences Graduate Program, UC San Diego
2017 - current Executive Committee, Halıcıoğlu Data Science Institute, UC San Diego
2017 - 2019 Co-founder, Diversity Outreach and Training Committee, Cognitive Neuroscience Society
2016 - 2019 Diversity Committee Chair: Neurosciences Graduate Program, UC San Diego
2014 - current Cognitive Science Student Association Faculty Representative, UC San Diego
2014 - 2016 Diversity Representative: UC San Diego, Neurosciences Graduate Program
2014 - current Data Science Student Society (DS3) Faculty Representative, UC San Diego
2013 - current Consultant, National Academy of Sciences, Science & Entertainment Exchange

Honors and Awards

• 2017 National Academy of Sciences Kavli Frontiers of Science, Symposium Chair

• 2016 Computational and Systems Neuroscience New Attendee Award

• 2015 National Academy of Sciences Kavli Fellow

• 2015 Alfred P. Sloan Research Fellow in Neuroscience

• 2011 AAAS: Finalist - Early Career Award for Public Engagement with Science

• UC Berkeley: Outstanding Graduate Student Instructor, University-wide teaching award

Research Support

• 2019 - 2023: NIGMS R01 GM134363-01: Tools for parameterizing and visualizing electrophysiological rhythmic and arrhythmic features. $1,262,000 (total). (PI: B. Voytek)

• Intel Corporation: On Device Telemetry Workload Correlations with Personas and Environment. $15,000 (total). (PI: B. Voytek)

• 2018 - 2020: Whitehall Foundation 2017-12-73: Prefrontal Oscillatory Mechanism of “Activity Silent” Memory. $225,000 (total). (PI: B. Voytek)

• 2018: UC Stem Cell Program Innovative Award. $100,000 (total). (PI: A. Muotri; Co-PI: B. Voytek).

• 2018: UC San Diego, Shiley-Marcos Alzheimer’s Disease Research Center (ADRC): Research Training in Alzheimer’s Disease. $35,000 (total). (PI: B. Voytek)

• 2017 - 2020: National Science Foundation BCS COGNEURO 1736028: Oscillatory phase dynamics coordinate cognitive neural networks, $471,777 (total). (PI: B. Voytek)

• 2017 - 2020: National Science Foundation DGE NRT 1735234: NRT-IGE: Augmenting, Piloting, and Scaling Computational Notebooks to Train New Graduate Researchers in Data-Centric Programming, $498,751 (total). (PI: James Hollan; Co-PIs: Philip Guo, Scott Klemmer, Bradley Voytek)

Selected Publications

• Voytek B & Knight RT. Prefrontal cortex and basal ganglia contributions to visual working memory. Proc Natl Acad Sci USA 2010.

• Voytek B, Canolty RT, Shestyuk A, Crone NE, Parvizi J, Knight RT. Shifts in gamma phase-amplitude coupling frequency from theta to alpha over posterior cortex during visual tasks. Front Hum Neurosci 2010.

• Voytek B, Davis M, Yago E, Barceló F, Vogel EK, Knight RT. Dynamic neuroplasticity after human prefrontal cortex damage. Neuron 2010.

• Voytek JB & Voytek B. Automated cognome construction and semi-automated hypothesis generation. J Neurosci Methods 2012.

• Voytek B, D’Esposito M, Crone NE, Knight RT. A method for event-related phase/amplitude coupling. NeuroImage 2013.

• Voytek B, Kayser A, Badre D, Fegen D, Chang EF, Crone NE, Parvizi J, Knight RT, D’Esposito M. Oscillatory dynamics coordinating human frontal networks in support of goal maintenance. Nature Neuroscience 2015.

• Voytek B, Kramer MA, Case J, Lepage KQ, Tempesta ZR, Knight RT, Gazzaley A. Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience 2015.

• Voytek B & Knight RT. Dynamic network communication as a unifying neural basis for cognition, development, aging, and disease. Biological Psychiatry 2015.

• Tran TT, Hoffner NC, LaHue SC, Tseng L, Voytek B. Alpha phase dynamics predict age-related visual working memory decline. NeuroImage 2016.

• Voytek B. The virtuous cycle of a data ecosystem. PLOS Computational Biology 12(8): 1-6. 2016.

• Cole SR, van der Meij R, Peterson EJ, de Hemptinne C, Starr PA, Voytek B. Nonsinusoidal beta oscillations reflect cortical pathophysiology in Parkinson's disease. J Neurosci 2017.

• Cole SR & Voytek B. Brain oscillations and the importance of waveform shape. Trends Cogn Sci 2017.

• Voytek B. Social Media, Open Science, and Data Science are Inextricably Linked. Neuron 2017.

• Gao RD, Peterson EJ, Voytek B. Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage 2017.

• Moore SM, Seidman JS, Ellegood J, Gao R, Savchenko A, Troutman TD, Abe Y, Stender J, Lee D, Wang S, Voytek B, Lerch JP, Suh H, Glass C, Muotri A. Setd5 haploinsufficiency alters neuronal network connectivity and leads to autistic-like behaviors in mice. Translational Psychiatry 2019.

• Cole SR, Donoghue T, Gao R, Voytek B. NeuroDSP: A package for neural digital signal processing. Journal of Open Source Software 2019.

• Jackson N, Cole SR, Voytek B, Swann NC. Characteristics of waveform shape in Parkinson’s disease detected with scalp electroencephalography. eNeuro 2019.

• Veerakumar A, Tiruvadi V, Howell B, Waters AC, Crowell AL, Voytek B, Posse PR, Denison L, Rajendra JK, Edwards JA, Bijanki KR, Choi KS, Mayberg HS. Field potential 1/f activity in the subcallosal cingulate region as a candidate signal for monitoring deep brain stimulation for treatment resistant depression. J Neurophysiol. 2019.

• Cole S & Voytek B. Cycle-by-cycle analysis of neural oscillations. J Neurophysiol. In press.

• Trujillo CA*, Gao R*, Negraes PD*, Chaim IA, Domissy A, Vandenberghe M, Devor A, Yeo GW, Voytek B#, Muotri AR#. Complex Oscillatory Waves Emerging from Cortical Organoids Model Early Human Brain Network Development. Cell Stem Cell. 2019. *,# these authors contributed equally

• Robertson MM, Furlong S, Voytek B, Donoghue T, Boettiger CA, Sheridan MA. EEG Power Spectral Slope differs by ADHD status and stimulant medication exposure in early childhood. J Neurophysiol. 122(6): 2427-2437. 2019.

• Molina JL, Voytek B, Thomas ML, Joshi YB, Bhakta SG, Talledo JA, Swerdlow NR, Light GA. Memantine effects on EEG measures of putative excitatory/inhibitory balance in schizophrenia. Biol Psychiatry Cogn Neurosci Neuroimaging. In press.

• Ghatak S, Dolatabadi N, Gao R, Wu Y, Scott H, Trudler D, Sultan A, Ambasudhan R, Nakamura T, Masliah E, Talantova M, Voytek B, Lipton SA. NitroSynapsin ameliorates hypersynchronous neural network activity in Alzheimer hiPSC models. Mol Psychiatry. In press.

• Tran TT, Rolle CE, Gazzaley A, Voytek B. Linked sources of neural noise contribute to age-related cognitive decline. J Cogn Neurosci. In press.

Book

• Verstynen T & Voytek B (2014). Do Zombies Dream of Undead Sheep? A Neuroscientific View of the Zombie Brain: Princeton University Press.

Please use the following format for the faculty vitae (2 pages maximum in Times New Roman 12 point type)

1. Name Tsui-Wei (Lily) Weng

2. Education – degree, discipline, institution, year

Ph.D., EECS, MIT, Sep 2020
M.S., Communication Engineering, National Taiwan University, June 2013
B.S., Electrical Engineering, National Taiwan University, June 2011

3. Academic experience – institution, rank, title (chair, coordinator, etc. if appropriate), when (ex. 2002-2007), full time or part time

Assistant Professor, UCSD HDSI, 2021-present

4. Non-academic experience – company or entity, title, brief description of position, when (ex. 2008-2012), full time or part time

MIT-IBM Watson AI Lab, Research Staff Member, 2020-2021, Full time
Google DeepMind, Research Intern, 2019.05-2019.09, Full time
IBM Research, Research Intern, 2018.05-2018.08, Full time
IBM Research, Research Intern, 2017.05-2017.08, Full time
Mitsubishi Electric Research Lab, Research Intern, 2015.06-2015.08, Full time

5. Certifications or professional registrations

6. Current membership in professional organizations

7. Honors and awards

8. Service activities (within and outside of the institution)

Reviewer for top-tier machine learning conferences including ICML, CVPR, AAAI, ICLR, ECCV, and NeurIPS, 2018-present

9. Briefly list the most important publications and presentations from the past five years – title, co-authors if any, where published and/or presented, date of publication or presentation

(i) Five Most Relevant Publications:

1. T.-W. Weng, H. Zhang, P.-Y. Chen, J. Yi, D. Su, Y. Gao, C.-J. Hsieh and L. Daniel, “Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach”, ICLR 2018

2. T.-W. Weng, H. Zhang, H. Chen, Z. Song, C.-J. Hsieh, D. Boning, I. S. Dhillon and L. Daniel, “Toward Fast Computation of Certified Robustness for ReLU Networks”, ICML 2018

3. T.-W. Weng, P.-Y. Chen, L. M. Nguyen, M. S. Squillante, A. Boopathy, I. Oseledets, and L. Daniel, “PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach”, ICML 2019

4. A. Boopathy, T.-W. Weng, P.-Y. Chen, S. Liu and L. Daniel, “CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks”, AAAI 2019

5. A. Boopathy, T.-W. Weng, S. Liu, P.-Y. Chen and L. Daniel, “Efficient Training of Robust and Verifiable Neural Networks”, under submission to NeurIPS 2020

(ii) Five Other Significant Publications:

1. Y.-S. Weng, T.-W. Weng, and L. Daniel, “Neural Network Control Policy Verification with Persistent Adversarial Perturbations”, ICML 2020

2. J. Mohapatra, T.-W. Weng, P.-Y. Chen, S. Liu and L. Daniel, “Towards Verifying Robustness of Neural Networks Against a Family of Semantic Perturbations”, CVPR 2020

3. T.-W. Weng, K. Dvijotham, J. Uesato, K. Xiao, S. Gowal, R. Stanforth, Pushmeet Kohli, “Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control”, ICLR 2020

4. H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh and L. Daniel, “Efficient Neural Network Robustness Certification with General Activation Functions”, NeurIPS 2018

5. C.-Y. Ko, Z. Lyu, T.-W. Weng, L. Daniel, N. Wong, and D. Lin, “POPQORN: Quantifying Robustness of Recurrent Neural Networks”, ICML 2019

10. Briefly list the most recent professional development activities

1. Name: Yusu Wang

2. Education – degree, discipline, institution, year

School, university | Dates of attendance | Location | Major subject or field | Degrees | Date received
Stanford University | 2004-2005 | Stanford, CA | Computer Science | Postdoc | 2005
Duke University | 2000-2004 | Durham, NC | Computer Science | Ph.D. | 2004
Duke University | 1998-2000 | Durham, NC | Computer Science | M.Sc. | 2000
Tsinghua University | 1993-1998 | China | Computer Science | B.Sc. | 1998

3. Academic experience – institution, rank, title (chair, coordinator, etc. if appropriate), when (ex. 2002-2007), full time or part time

Period of employment (From - To) | Institution, firm or organization | Location | Rank, title, or position
2020-Present | Univ. California, San Diego | California | Professor
2018-2020 | Foundation Research Community of Practice (CoP) at Translational Data Analytics Institute @ OSU | Ohio | Co-director of CoP
2017-2020 | Ohio State University | Ohio | Professor
2011-2017 | Ohio State University | Ohio | Associate Professor
2012-2013 | Institute of Science and Technology | Austria | Visiting Professor
2005-2011 | Ohio State University | Ohio | Assistant Professor

4. Non-academic experience – company or entity, title, brief description of position, when (ex. 2008-2012), full time or part time

N/A

5. Certifications or professional registrations N/A

6. Current membership in professional organizations 2005-present ACM member

7. Honors and awards

(2002) Department service award, Computer Science Dept., Duke University
(2004) Best Ph.D. Dissertation, Computer Science Dept., Duke University
(2006) U.S. Department of Energy (DOE) Early Career Principal Investigator (ECPI) Award
(2008) U.S. National Science Foundation (NSF) Career Award
(2008) Top Reviewer for journal Computational Geometry: Theory and Applications
(2010) A Best Paper Award, Eurovis
(2011) Lumley Research Award, College of Engineering, OSU
(2015) Best Paper Award at ACM SIGSPATIAL GIS
(2015) Best Student Paper Award at Conf. Learning Theory (COLT)

8. Service activities (within and outside of the institution)

Steering Committee: (06/2020-present) Computational Geometry Steering Committee

Associate Editors: (2019-present) SIAM Journal on Computing (SICOMP); (2020-present) Computational Geometry: Theory and Applications; (2010-present) Journal of Computational Geometry

Program committee (Co-)Chair: (2016) Joint STOC/SoCG Workshop Day; (2019) 35th Symposium on Computational Geometry (SoCG)

Co-organizers: TGDA@OSU Conference (2016, 2018), TGDA@OSU summer school (2018), NSF CBMS Conference on Elastic Functional and Shape Data Analysis (2019), AMW workshop on Women in Computational Topology at JMM (Joint Mathematics Meeting) (2019).

9. Briefly list the most important publications and presentations from the past five years

• (2021) AMS Southeastern Sectional Meeting of the Society, Keynote speaker

• (2020) 32nd Canadian Conference on Computational Geometry (CCCG), Keynote speaker

• (2019) 34th Summer Conf. on Topology and its Applications, Semi-Plenary speaker

• (2017) 6th Mini-Symposium on Computational Topology, Australia, Keynote speaker

• (2016) International Workshop on Topological Data Analysis in Biomedicine (TDA-Bio), part of ACM BCB, Keynote speaker

• Banerjee, L. Magee, D. Wang, X. Li, B. Huo, J. Jayakumar, K. Matho, M. Lin, K. Ram, M. Sivaprakasam, J. Huang, Y. Wang, and P. Mitra. Semantic segmentation of microscopic neuroanatomical data by combining topological priors with encoder-decoder deep networks. Nature Machine Intelligence. Sept. 2020.

• L. Yan, Y. Wang, E. Munch, E. Gasparovic, and B. Wang. A structural average of labeled merge trees for uncertainty visualization. IEEE Trans. Vis. Comput. Graph. (TVCG), 26(1), 832–842 (2020).

• Q. Zhao and Y. Wang. Learning metrics for persistence-based summaries and applications for graph classification. In 33rd Conf. Neural Information Processing Systems (NeurIPS), 9855–9866, 2019.

• C. Chen, X. Ni, Q. Bai, Y. Wang. A topological regularizer for classifiers via persistent homology. In 22nd Intl. Conf. Artificial Intelligence and Statistics, 2573–2582, 2019.

• T. K. Dey, J. Wang and Y. Wang. Graph reconstruction by discrete Morse theory. In Sympos. Comput. Geom. (SoCG), 31:1–31:15, 2018.

10. Briefly list the most recent professional development activities

BIOGRAPHICAL SKETCH: RUTH J. WILLIAMS

Address: Dept. of Mathematics, UCSD, 9500 Gilman Drive, La Jolla, CA 92093-0112.

Email: [email protected], Web page: http://www.math.ucsd.edu/~williams/

a. Professional Preparation.

B.Sc. (Hons.), Mathematics, University of Melbourne, Melbourne, Australia, 1977.

M.Sc., Mathematics, University of Melbourne, Melbourne, Australia, 1979.

Ph.D., Mathematics, Stanford University, Palo Alto, CA, 1983.

b. Appointments.

Charles Lee Powell Chair in Mathematics I, UC San Diego, 2011-present.

Distinguished Professor, Dept. of Mathematics, UC San Diego, 2009–present.

Professor, Dept. of Mathematics, UC San Diego, 1991–2009.

Associate Professor, Dept. of Mathematics, UC San Diego, 1988–91.

Assistant Professor, Dept. of Mathematics, UC San Diego, 1983–88.

Postdoctoral Member in Probability, Courant Inst. of Math. Sciences, NYU, 1983–84.

Visiting Positions (last 25 years).

Visiting Scholar, Center of Mathematical Sciences & Applic., Harvard Univ., 9/19-11/19.

Visiting Member of ACEMS (Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers), Univ. Melbourne, Australia, 9/15, 7/18 and 7/19.

Visiting Member, Center for Modeling Stochastic Sys., Monash Univ., Australia, 3/17-5/17.

G. C. Steward Visiting Fellow in Math., Gonville & Caius College, Cambridge, 3/10-4/10.

Visiting Fellow, Isaac Newton Institute for Math. Sciences, Cambridge, 3/10-4/10.

Visiting Prof. of Operations, Information & Technology, Stanford University, 9/01–6/02.

Research Professor, Mathematical Sciences Research Institute, Berkeley CA, 9/97–5/98.

Visiting Scientist, Inst. for Math. & Its Applic., U. Minnesota, 3–6/86, 1–3/94 & 10-11/15.

c. Honors.

Fellow of the Society for Industrial and Applied Mathematics, since 2020.

Corresponding Member of the Australian Academy of Science, elected 2018.

Award for the Advancement of Women in Operations Research & Management Sci., 2017.

John von Neumann Theory Prize, 2016 (jointly with M. I. Reiman), awarded by the Institute for Operations Research and the Management Sciences (INFORMS).

Inaugural Fellow of the American Mathematical Society, Class of 2013.

Fellow of the American Academy of Arts and Sciences since 2009.

INFORMS Fellow since 2008.

INFORMS Applied Probab. Soc. Best Pub. Award 2007 (joint with Gromoll and Puha).

Guggenheim Fellowship, 2001-2002.

Invited 45 minute speaker, International Congress of Mathematicians, Berlin, 1998.

Fellow of the American Association for the Advancement of Science since 1995.

Fellow of the Institute of Mathematical Statistics since 1992.

Alfred P. Sloan Research Fellow, 1988–92.

National Science Foundation Presidential Young Investigator Award, 1987–93.


d. Research Interests.

Probability, stochastic processes and their applications; multidimensional reflected diffusions; stochastic differential (delay) equations; measure-valued processes; fluid and diffusion approximations for complex networks; analysis and control of stochastic networks with applications to operations management, telecommunications and systems biology.

e. Five Selected Recent Publications.

1. Yingjia Fu and Ruth J. Williams, Stability of a subcritical fluid model for fair bandwidth sharing with general file size distributions, Stochastic Systems, in press.

2. J. A. Mulvany, A. L. Puha and R. J. Williams, Asymptotic behavior of a critical fluid model for a multiclass processor sharing queue via relative entropy, Queueing Systems, 93 (2019), 351–397.

3. S. C. Leite and R. J. Williams, A constrained Langevin approximation for chemical reaction networks, Annals of Applied Probability, 29 (2019), 1541–1608.

4. R. J. Williams, Stochastic Processing Networks, invited article for Annual Review of Statistics and Its Application, 3 (2016), 323–345.

5. D. Lipshutz and R. J. Williams, Existence, uniqueness and stability of slowly oscillating periodic solutions for delay differential equations with non-negativity constraints, SIAM J. on Mathematical Analysis, 47 (2015), 4467–4535.

f. External Professional Activities (5 illustrative examples).

(i) Member of the Council of the National Academy of Sciences, 2019-2022.

(ii) Member of the selection committee for the biennial INFORMS Impact Prize, 2018, 2020 (committee chair in 2020).

(iii) Member of Governance Board of MATRIX (Australian Mathematics Inst.), 2015-pres.

(iv) Associate Editor for Applied Probability Trust Journals: Journal of Applied Probability and Advances in Applied Probability (2016-present).

(v) President, Institute of Mathematical Statistics, 2012.

g. UC San Diego Recent Service Activities (5 illustrative examples).

(i) Founding Faculty and Council Member, Halicioglu Data Science Institute, 2018-present

(ii) Faculty Advisory Committee for Moore Science Communication grant, 2017–19.

(iii) Physical Sciences Task Force on the Status of Women in the Physical Sciences, 2017–18.

(iv) Council Member, Mathematics Department, 2016–2020 (one of three Council members elected to represent the faculty).

(v) Chair, Mathematics Department Hiring Committee, 2016-2018.


1. Name: Arya Mazumdar

2. Education – degree, discipline, institution, year

a. Ph.D., Electrical and Computer Engineering, University of Maryland College Park, 2011

1. Academic experience – institution, rank, title (chair, coordinator, etc. if appropriate), when (ex. 2002-2007), full time or part time

a. University of California San Diego, Associate Professor, 2021-present.
b. University of Massachusetts Amherst, Associate Professor, 2019-2021.
c. University of Massachusetts Amherst, Assistant Professor, 2015-2019.
d. University of Minnesota Twin Cities, Assistant Professor, 2013-2015.
e. Massachusetts Institute of Technology, Postdoctoral Associate, 2011-2012.

2. Non-academic experience – company or entity, title, brief description of position, when (ex. 2008-2012), full time or part time

a. Amazon, Senior Scientist, 2019-2020.
b. IBM Almaden, PhD Intern, 2010.
c. HP Labs, PhD Intern, 2008.

3. Current membership in professional organizations: Senior Member, IEEE and IEEE Information Theory Society; Member, ACM

4. Honors and awards

a. Best Paper Award, European Association for Signal Processing, 2020.
b. NSF CAREER Award, 2015.
c. Distinguished Dissertation Award, University of Maryland, 2011.
d. Jack K. Wolf Paper Award, ISIT, 2010.

5. Service activities (within and outside of the institution)

a. UCSD: DSC Program Committee, Industry Liaison Committee

b. UMass: Faculty Hiring Committee, Graduate Committee, Admissions Committee

c. Editorial Boards: Associate Editor: IEEE Transactions on Information Theory; Area Editor: Now Publishers Foundations and Trends; Guest Editor: Entropy

d. Organization:

i. Information Theory School, 2019

ii. Workshop on Coding Theory for Optimization, Learning, Inference

e. Review Panels: NSF, BSF (US+Israel), ISF (Israel), Research Grant Council (Hong Kong)

f. Conference Program Committees: ISIT, AAAI, AISTATS, DSP, ITW, NVMW

6. Briefly list the most important publications and presentations from the past five years – title, co-authors if any, where published and/or presented, date of publication

a. A Mazumdar, S Pal, “Semisupervised clustering by queries and locally encodable source coding,” IEEE Transactions on Information Theory, vol. 67, no. 2, 2021. Preliminary version in NeurIPS 2017 (Spotlight paper).

b. V Gandikota, D Kane, R Maity, A Mazumdar, “vqSGD: vector quantized stochastic gradient descent,” AISTATS, 2021.

c. A Ghosh, R Maity, A Mazumdar, “Distributed Newton can communicate less and resist byzantine workers,” NeurIPS, 2020.

d. V Gandikota, A Mazumdar, S Pal, “Recovery of sparse linear classifiers from mixture of responses,” NeurIPS, 2020.

e. S Ubaru, S Dash, A Mazumdar, O Gunluk, “Multilabel classification by hierarchical partitioning and data-dependent grouping,” NeurIPS, 2020.

f. R McKenna, R Maity, A Mazumdar, G Miklau, “A workload-adaptive mechanism for linear queries under local differential privacy,” VLDB, 2020.

g. A Mazumdar, S Pal, “Recovery of sparse signals from a mixture of linear samples,” International Conference on Machine Learning (ICML), 2020.

h. A Krishnamurthy, A Mazumdar, A McGregor, S Pal, “Algebraic and Analytic Approaches for Parameter Learning in Mixture Models,” ALT, 2020.

i. A Krishnamurthy, A Mazumdar, A McGregor, S Pal, “Sample complexity of learning mixture of sparse linear regressions,” NeurIPS, 2019.

j. W Huleihel, A Mazumdar, M Medard, S Pal, “Same-cluster querying for overlapping clusters,” NeurIPS, 2019.

k. L Flodin, V Gandikota, A Mazumdar, “Superset technique for approximate recovery in one-bit compressed sensing,” NeurIPS, 2019.

l. A Mazumdar, A McGregor, S Vorotnikova, “Storage capacity as information theoretic vertex cover and the index coding rate,” IEEE Transactions on Information Theory, vol. 65, no. 9, 2019.

m. A Mazumdar, B Saha, “Clustering with noisy queries,” NIPS, 2017.

n. A Mazumdar, B Saha, “Query complexity of clustering with side information,” NIPS, 2017.

o. A Barg, A Mazumdar, “Group testing schemes from codes and designs,” IEEE Transactions on Information Theory, vol. 63, no. 11, Nov 2017.

p. S Ubaru, A Mazumdar, Y Saad, “Low rank approximation and decomposition of large matrices using error correcting codes,” IEEE Transactions on Information Theory, vol. 63, no. 9, Sep 2017.

q. A Mazumdar, “Nonadaptive group testing with random set of defectives,” IEEE Transactions on Information Theory, vol. 62, no. 12, Dec 2016.

7. Briefly list the most recent professional development activities

a. Courses Taught Recently: Algorithms for Data Science, Optimization in Computer Science, Undergraduate Probability and Statistics, Information Theory, Coding Theory

Angela J. Yu

Associate Professor (Asst. Prof. 2008-2014)
Department of Cognitive Science, University of California San Diego
[email protected]
http://www.cogsci.ucsd.edu/ ajyu

Education

Princeton University (04/05–07/08), Princeton, NJ. Post-doctoral fellow in the Center for the Study of Brain, Mind, and Behavior.

UCL Gatsby Computational Neuroscience Unit (10/00–06/05), London, UK. Ph.D. in Computational Neuroscience.

Massachusetts Institute of Technology (09/96–06/00), Cambridge, MA. B.S. Mathematics, B.S. Computer Science, B.S. Brain & Cognitive Sciences; GPA: 5.0/5.0

Selected Pubs

Guan, J, Ryali, C, & Yu, A J (2018). Computational modeling of social face perception in humans: Leveraging the active appearance model. bioRxiv, https://doi.org/10.1101/360776.

Ryali, C, & Yu, A J (2018). Beauty-in-averageness and its contextual modulations: A Bayesian statistical account. Adv. in Neural Information Processing Systems, 32.

Guo, D, Yu, A J (2018). Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task. Adv. in Neural Information Processing Systems, 32.

Ryali, C, Gautam, R, & Yu, A J (2018). Demystifying excessively volatile human learning: A Bayesian persistent prior and a neural approximation. Adv. in Neural Information Processing Systems, 32.

Wang, W, Hu, S, Ide, J S, Zhornitsky, S, Zhang, S, & Yu, A J, Li, C-S R (2018). Motor preparation disrupts proactive control in the stop signal task. Frontiers in Human Neuroscience, doi: 10.3389/fnhum.2018.00151.

Cogliati Dezza, I, Yu, A J, Cleeremans, A, Alexander, W (2017). Learning the value of information and reward over time when solving exploration-exploitation problems. Nature Scientific Reports, 7:16919.

Harle, K M, Guo, D, Zhang, S, Paulus, M, Yu, A J (2017). Anhedonia and anxiety underlying depressive symptomatology have distinct effects on reward-based decision-making. PLoS ONE, 12(10):e0186473.

Harle, K M, Zhang, S, Ma, N, Yu*, A J, & Paulus, M P* (2016). Reduced neural recruitment for Bayesian adjustment of inhibitory control in methamphetamine dependence. Biological Psychology: Cog. Neurosci. and Neuroimaging, 1: 48-459. *Co-senior authors.

Li L, Malave, V, Song, A, & Yu, A J (2016). Extracting Human Face Similarity Judgments: Pairs or Triplets? Proceedings of the Cognitive Science Society Conference.

Ma, N & Yu, A J (2016). Inseparability of Go and Stop in Inhibitory Control: Go Stimulus Discriminability Affects Stopping Behavior. Frontiers in Decision Neuroscience, 10 (54).

Harle, K M, Zhang, S, Schiff, M, Mackey, S, Paulus, M P, & Yu, A J (2015). Altered statistical learning and decision-making in methamphetamine dependence: Evidence from a two-armed bandit task. Frontiers in Psychology, 6 (1910).

Harle, K M, Steward, J L, Zhang, S, Tapert, S, Paulus, M P, & Yu, A J (2015). Bayesian neural adjustment of inhibitory control predicts emergence of problem stimulant use. Brain, 138:3413-26.

Ma, N & Yu, A J (2015). Statistical Learning and Adaptive Decision-Making Underlie Human Response Time Variability in Inhibitory Control. Frontiers in Psychology, 6(1046).

Ide, J S, Hu, S, Zhang, S, Yu, A J & Li, C-S R (2015). Impaired Bayesian learning for cognitive control in cocaine dependence. Drug and Alcohol Dependence, 151: 220-227.

Ahmad, S & Yu, A J (2015). A rational model for individual differences in preference choice. Proceedings of the Cognitive Science Society Conference.

Zhang, S, Song, M, & Yu, A J (2015). Bayesian hierarchical model of local-global processing:Visual crowding as a case-study. Proceedings of the Cognitive Science Society Conference.

Yu, A J & Huang, H (2014). Maximizing masquerading as matching: Statistical learning and decision-making in choice behavior. Decision, 1 (4): 275-287.

Harle, K M, Shenoy, P, Steward, J L, Tapert, S, Yu*, A J, & Paulus*, M P (2014). Altered neural processing of the need to stop in young adults at risk for stimulus dependence. Journal of Neuroscience, 34(13): 4567-4580. *Co-senior authors.

Ahmad, S, Huang, H, & Yu, A J (2013). Context-sensitivity in human active sensing. Adv. in Neural Information Processing Systems 26.

Zhang, S & Yu, A J (2013). Forgetful Bayes and myopic planning: Human learning and decision-making in a bandit setting. Adv. in Neural Information Processing Systems 26.

Shenoy, P & Yu, A J (2013). A rational account of contextual effects in preference choice: What makes for a bargain? Cognitive Science Society Conference.

Dayanik, S & Yu, A J (2013). Reward-rate maximization in sequential identification under a stochastic deadline. SIAM Journal on Control and Optimization, 51 (4), 2922-2948.

Yu, A J (2013). Bayesian Models of Attention. Chapter in Handbook of Attention, Eds. S. Kastner & K. Nobre. Oxford, UK: Oxford University Press.

Ide, J S, Shenoy, P, Yu*, A J, & Li*, C-R (2013). Bayesian prediction and evaluation in the anterior cingulate cortex. Journal of Neuroscience, 33: 2039-2047. *Co-senior authors.

Shenoy, P & Yu, A J (2012). Rational impatience in perceptual decision-making: a Bayesian account of discrepancy between two-alternative forced choice and Go/NoGo behavior. Adv. in Neural Information Processing Systems 25.

Yu, A J (2012). Change is in the eye of the beholder. Nature Neuroscience 15: 933-935.

Shenoy, P, Rao, R, & Yu, A J (2010). A rational decision making framework for inhibitory control. Adv. in Neural Information Processing Systems 23: 2146-2154.

Yu, A J & Cohen, J D (2009). Sequential effects: Superstition or rational behavior? Adv. in Neural Information Processing Systems 21: 1873-1880.

Yu, A J, Dayan, P, & Cohen, J D (2008). Dynamics of attentional selection under conflict: Toward a rational Bayesian account. J. Exp. Psy.: Human Perc. and Perf., 35: 700-717.

Frazier, P & Yu, A J (2008). Sequential hypothesis testing under stochastic deadlines. Adv. in Neural Information Processing Systems 20: 465-72.

Yu, A J (2007). Adaptive behavior: Humans act as Bayesian learners. Current Biology 17.

Cohen, J D, McClure, S M, & Yu, A J (2007). Should I stay or should I go? How the human brain manages the tradeoff between exploitation and exploration. Philosophical Transactions of the Royal Society B: Biological Sciences 362: 933-942.

Yu, A J (2007). Optimal change-detection and spiking neurons. Adv. in Neural Information Processing Systems 19: 1545-52. MIT Press, Cambridge, MA.

Dayan, P & Yu, A J (2006). Norepinephrine and neural interrupts. Adv. in Neural Information Processing Systems 18: 243-50.

Yu, A J & Dayan, P (2005). Uncertainty, neuromodulation, and attention. Neuron, 46: 681-692.

Yu, A J & Dayan, P (2005). Inference, attention, and decision in a Bayesian neural architecture. In Adv. in Neural Information Processing Systems 17: 1577-84.

Yu, A J & Dayan, P (2003). Expected and unexpected uncertainty: ACh and NE in the neocortex. In Adv. in Neural Information Processing Systems 15.

Yu, A J & Dayan, P (2002). Acetylcholine in cortical inference. Neural Networks, 15 (4/5/6):719-730.

Dayan, P & Yu, A J (2002). Acetylcholine, uncertainty, and cortical inference. Adv. in Neural Information Processing Systems 14.

Other Activities

Journal Editor: Decision, Frontiers in Behavioral Neurosci., Frontiers in Human Neurosci.

Journal Reviewer: Adaptive Behavior, Brain Research, Cognition, Cognitive Psychology, Computational Brain & Behavior, Current Biology, Decision, eLife, European Journal of Neuroscience, Frontiers in Computational Neuroscience, Frontiers in Decision Neuroscience, Frontiers in Behavioral Neuroscience, Frontiers in Human Neuroscience, Journal of Autonomous Agents and Multi-Agent Systems, Journal of Neuroscience, Journal of Theoretical Biology, Memory & Cognition, Nature Communications, Nature Human Behavior, Nature Reviews, Neural Computation, Neuron, PLoS Computational Biology, PLoS ONE, PNAS, Psychological Review, Psychonomic Bulletin & Review, Psychopharmacology, Science

Conference Reviewer/Organizer: Cogsci, Cosyne, IJCAI, RLDM, NIPS (AC, SAC)

Zhiting Hu

Halıcıoğlu Data Science Institute
University of California San Diego
La Jolla, CA 92093

Phone: +1 (412) 320-0630

Email: [email protected]@gmail.com

Homepage: http://zhiting.ucsd.edu/

Education

2020 Ph.D., Machine Learning Department, Carnegie Mellon University

Advisor: Eric P. Xing

2016 M.S., Language Technologies Institute, Carnegie Mellon University

Advisor: Eric P. Xing

2014 B.S., Computer Science, Peking University, China

Academic Experience

starting 2021.9 Assistant Professor, Halıcıoglu Data Science Institute, UC San Diego

Non-Academic Experience

2020 – 2021.9 Full-time Visiting Academic, Amazon Alexa AI

2017 – 2020 Full-time Research Scientist, Petuum Inc.

Awards & Honors (Selected)

2019 Best Demo Paper Nomination, ACL 2019

2019 Best Paper Award, ICLR 2019 drlStructPred Workshop

2017 NVIDIA Pioneering Research Award

2017 IBM Ph.D. Fellowship

2017 Baidu Ph.D. Scholarship

2016 Outstanding Paper Award, ACL 2016

2014 Excellence Award of Stars of Tomorrow Internship Program, Microsoft Research Asia

2014 GPA 1st/140, EECS, Peking University

2013 Outstanding Undergraduate Award, China Computer Federation (CCF)

2013 Google Excellence Scholarship

2009 First Prize in China National Biology Olympiad for Senior High School

Service Activities

Co-organizer, NeurIPS 2019 Workshop on Learning with Rich Experience: Integration of Learning Paradigms

Co-organizer, CVPR 2019 Workshop on Towards Causal, Explainable and Universal Medical Visual Diagnosis

Co-organizer, ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Representations

Co-organizer, ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models

Reviewer, NeurIPS, ICML, ACL, EMNLP, NAACL, CVPR, AAAI, KDD, WWW, JMLR, MLJ, TPAMI, etc.

Outstanding reviewer, EMNLP 2020

Publications (Selected)

Google Scholar Profile

[1] Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P Xing, Zhiting Hu. Improving GAN Training with Probability Ratio Clipping and Sample Reweighting. Neural Information Processing Systems (NeurIPS 2020).

[2] Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed, Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Xiaodan Liang, Teruko Mitamura, Eric P Xing, Zhiting Hu. A Data-Centric Framework for Composable NLP Workflows. Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), demo.

[3] Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Tom Mitchell, Eric P Xing. Learning Data Manipulation for Augmentation and Weighting. Neural Information Processing Systems (NeurIPS 2019).

[4] Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, et al. Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation. Annual Meeting of the Association for Computational Linguistics (ACL 2019). Best Demo Paper Nomination, https://github.com/asyml

[5] Zhiting Hu*, Bowen Tan* (equal contrib.), Zichao Yang, Ruslan Salakhutdinov, Eric P Xing. Connecting the Dots between MLE and RL for Sequence Prediction. ICLR 2019 Workshop on Deep Reinforcement Learning Meets Structured Prediction. Best Paper Award

[6] Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric P Xing. Deep Generative Models with Learnable Knowledge Constraints. Neural Information Processing Systems (NeurIPS 2018).

[7] Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P Xing. On Unifying Deep Generative Models. International Conference on Learning Representations (ICLR 2018).

[8] Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P Xing. Toward Controlled Generation of Text. International Conference on Machine Learning (ICML 2017).

[9] Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric P Xing. Harnessing Deep Neural Networks with Logic Rules. Annual Meeting of the Association for Computational Linguistics (ACL 2016). Outstanding Paper Award

Curriculum Vitae (abbreviated) R. Stuart Geiger, Ph.D

[email protected]

Name: Geiger II, Richard Stuart

Dept: Communication & Data Science

Title: Assistant Professor (effective 7/1/2020)

Education

School, college, university | Dates of attendance | Location | Major subject or field | Degrees or certificates | Date received
University of Texas at Austin | 8/2005 - 5/2007 | Austin, TX | Humanities Honors | B.A. | 5/2007
Georgetown University | 8/2007 - 5/2009 | Washington, DC | Communication, Culture, and Technology | M.A. | 5/2009
UC Berkeley, School of Information | 8/2010 - 12/2015 | Berkeley, CA | Information Management & Systems | Ph.D. | 12/2015

Academic Experience

Period of employment | Institution, firm or organization | Location | Rank, title, or position
10/2019 – 6/2020 | UC Berkeley, Berkeley Institute for Data Science | Berkeley, CA | Professional researcher, assistant rank (full-time)
1/2016 - 9/2019 | UC Berkeley, Berkeley Institute for Data Science | Berkeley, CA | Postdoctoral scholar (full-time)
5/2009 – 8/2010 | Georgetown University, Dept of Communication, Culture, and Technology | Washington, DC | Research associate (full-time)

Current memberships in professional organizations

1. Association for Computing Machinery (ACM), 2009-present

2. Society for the Social Studies of Science (4S), 2009-present

3. Association of Internet Researchers (AoIR), 2013-present

4. International Communication Association (ICA), 2014-present

Honors and Awards

1. Research grant (lead PI): $138,055 from the Sloan Foundation & Ford Foundation for 2018-2020 grant on “The Visible and Invisible Work of Maintaining Open-Source Software” w/ Co-PIs L. Irani & A. Paxton

2. Best presentation award: Awarded at the 2018 Annual European Conference on Computer-Supported Cooperative Work (ECSCW): Nancy, France, June 7, 2018. For “The Types, Practices, and Roles of Documentation.”

3. Best paper award (1st runner up): The 2018 David B. Martin Best Paper Award (1st runner up), from the European Society for Socially Embedded Technologies. For “The Types, Practices, and Roles of Documentation,” with N. Varoquaux, C. Mazel-Cabasse, and C. Holdgraf.

Service activities

1. Lead organizer of the Best Practices in Data Science working group at the UC-Berkeley Institute for Data Science (2018-2020)

2. Co-organizer of workshop and event series on Diversity & Inclusion in Data Science at UC-Berkeley (2018-2020)

3. Program committee member of the ACM Conference on Collective Intelligence (2019-present)

4. Co-organizer of the Data Science Studies / Critical Data Studies conference track at the Annual Meeting of the Society for the Social Studies of Science (4S) (2016-2018)

5. Undergraduate Research Apprenticeship Mentor, UC-Berkeley (2018-2020)

Recent selected publications and presentations

1. Geiger, R.S., K. Yu, Y. Yang, M. Dai, J. Qiu, R. Tang, and J. Huang. 2020. "Garbage In, Garbage Out: Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?" In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency.

2. Geiger, R.S. 2019. “The Visible and Invisible Work of Maintaining and Sustaining Open-Source Software.” Keynote at the SciPy (Scientific Python) 2019 conference, Austin, Texas. July 10, 2019.

3. Geiger, R.S. 2018. “Key Values: What We Talk About When We Talk About ‘Open Science.’” Keynote at the 2018 Hawai’i Open Science Symposium, Manoa, HI. Apr 20, 2018.

4. Geiger, R.S., N. Varoquaux, C. Mazel-Cabasse, and C. Holdgraf. 2018. “The Types, Roles, and Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative Ethnography of Documentation Work.” Computer Supported Cooperative Work. https://doi.org/10.1007/s10606-018-9333-1

5. Geiger, R.S. and Halfaker, A. 2017. “Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of Even Good Bots Fight." In Proceedings of the ACM on Human-Computer Interaction (Nov 2017 issue, CSCW 2018 Online First). https://doi.org/10.1145/3134684

6. Geiger, R.S. 2017. "Beyond opening up the black box: Investigating the role of algorithmic systems in Wikipedian organizational culture." Big Data & Society 4(2). http://stuartgeiger.com/algoculture-bds.pdf

7. Geiger, R.S. 2016. “Bot-based collective blocklists in Twitter: the counterpublic moderation of harassment in a networked public space.” Information, Communication, and Society 19(6). http://stuartgeiger.com/blockbots-ics.pdf

Recent Professional Development Activities

1. UC Sexual Violence and Sexual Harassment Prevention Training (6/18/2019)

2. UC Cyber Security Awareness Fundamentals (8/26/2019)

Margaret (Molly) E. Roberts

Contact Information

University of California, San Diego
Social Sciences Building 301
Gilman Drive, #0521
La Jolla, CA 92093-0521
http://margaretroberts.net
[email protected]

Academic Appointments

University of California, San Diego
Chancellor’s Associates Endowed Chair, 2020-.
Associate Professor, Dept of Political Science and Halıcıoğlu Data Science Institute, July 2020-.
Associate Professor, Dept of Political Science, July 2018-2020.
Assistant Professor, Dept of Political Science, July 2014-2018.

Education

Harvard University, Ph.D., Government, 2014

Stanford University, M.S., Statistics, June 2009

Stanford University, B.A., International Relations & Economics, June 2009

Books

- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. Text as Data. Princeton University Press. (In Press)
- Roberts, Margaret E. Censored: Distraction and Diversion Inside China’s Great Firewall. Princeton University Press. (2018)

Selected Publications

- Eddie Yang and Margaret E. Roberts. 2021. “Censorship of Online Encyclopedias: Implications for NLP Models.” In Conference on Fairness, Accountability, and Transparency (FAccT ’21).
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. “Machine Learning for Social Science: An Agnostic Approach.” Annual Review of Political Science. (In Press)
- Roberts, Margaret E. “Resilience to online censorship.” Annual Review of Political Science 23 (2020): 401-419.
- Iyad Rahwan, Manuel Cebrian, Nick Obradovich, Josh Bongard, Jean-François Bonnefon, Cynthia Breazeal, Jacob W. Crandall, Nicholas A. Christakis, Iain D. Couzin, Matthew O. Jackson, Nicholas R. Jennings, Ece Kamar, Isabel M. Kloumann, Hugo Larochelle, David Lazer, Richard McElreath, Alan Mislove, David C. Parkes, Alex ‘Sandy’ Pentland, Margaret E. Roberts, Azim Shariff, Joshua B. Tenenbaum, Michael Wellman. “Machine behaviour.” Nature. 2019 Apr; 568(7753): 477.
- Hobbs, William R., and Margaret E. Roberts. “How sudden censorship can increase access to information.” American Political Science Review 112.3 (2018): 621-636.
- King, Gary, Jennifer Pan, and Margaret E. Roberts. 2017. “How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument.” American Political Science Review.
- King, Gary, Patrick Lam, and Margaret E. Roberts. 2017. “Computer-Assisted Keyword and Document Set Discovery from Unstructured Text.” American Journal of Political Science.
- Roberts, Margaret E, Brandon M. Stewart, and Edo M. Airoldi. 2016. “A model of text for experimentation in the social sciences.” Journal of the American Statistical Association, 111 (515): 988-1003.
- King, Gary, Jennifer Pan, and Margaret E. Roberts. 2014. “Reverse Engineering Chinese Censorship: Randomized Experimentation and Participant Observation.” Science, 345 (6199): 1-10.
- Roberts, Margaret E, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Gadarian, Bethany Albertson, David Rand. 2014. “Structural Topic Models for Open-Ended Survey Responses.” American Journal of Political Science, 58 (4): 1064-1082.
- King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review, 107(2), 326-343.

Selected Honors and Awards

• 2019 Best Book in the Information Technology and Politics Section, APSA
• 2019 Best Book Award in the Human Rights Section, APSA
• 2019 Goldsmith Book Prize
• 2019 International Studies Association Human Rights Section’s Best Book Runner Up
• Foreign Affairs Best Books of 2018
• 2018 Society for Political Methodology Statistical Software Award for stm: An R package for Structural Topic Models
• 2017 National Science Foundation RIDIR Award, “Collaborative Research: Analytical tools for text based social data integration” (with Amarnath Gupta and Brandon Stewart) $1,516,099
• 2016 UC San Diego, Hellman Fellowship, “Estimating the Impact of the Great Firewall on Political Opinion in China” $46,447
• 2016 UC San Diego, Academic Senate Research Award, “The Credibility of Media Information to Citizens” (with Seth Hill) $14,032
• 2015 UC San Diego, Integrated Digital Infrastructure Grant, “A High-Performance Storage, Management and Computation Platform for Heterogeneous, Multilingual Text Data to Enable Social Science Research” (with James Fowler and Amarnath Gupta) $94,000
• 2015 National Science Foundation RAPID Grant, “RAPID: Measuring the Intent of Chinese Leaders through Censorship Behavior” (with Gary King and Jennifer Pan) $200,000
• 2015 APSA Division of Political Communication’s Outstanding Dissertation Award
• 2015 Richard J. Herrnstein Prize for Dissertation “Fear, Friction, and Flooding: Methods of Online Information Control.”
• 2014 Edward M. Chase Prize for the best dissertation on a subject relating to the promotion of world peace

Selected Service

• Editorial Board Member: Political Analysis, Asian Survey, American Journal of Political Science, American Political Science Review, China Quarterly, Political Behavior, World Politics
• American Political Science Association Methodology Section Chair (2019)
• Text as Data Association Board (2019-)
• Society for Political Methodology Diversity Committee (2014-present)

David Danks

Education

Ph.D., Philosophy, University of California, San Diego, 2001
M.A., Philosophy, University of California, San Diego, 1999
A.B., Philosophy, Princeton University, 1996

Academic experience

(As of July 2021: University of California, San Diego, Professor of Data Science & Philosophy)
Carnegie Mellon University, L.L. Thurstone Professor of Philosophy & Psychology, 2016 –
CMU, Head, Department of Philosophy, 2014 -
CMU, Professor of Philosophy & Psychology, 2014 - 2016
CMU, Associate Professor of Philosophy & Psychology, 2008 - 2014
CMU, Assistant Professor of Philosophy, 2003 - 2008
Florida Institute for Human & Machine Cognition, Research Scientist, 2001 – 2012 (full-time for 2001-2003; part-time from 2003-2012)
Colorado College, Visiting Assistant Professor of Philosophy, 2002 - 2003

Membership in professional organizations

American Philosophical Association
Cognitive Science Society
Philosophy of Science Association

Honors and awards

Andrew Carnegie Fellowship (2017)
James S. McDonnell Foundation Scholar (2008)

Service activities (selected)

National Academies Committee on Responsible Computing Research, Member
Technology Transformation Services (GSA) AI Portfolio, Expert advisor
National Security Commission on Artificial Intelligence, SGE for Ethics Line of Effort
Pittsburgh Task Force on Public Algorithms, Member
Grefenstette Center for Ethics in Science, Tech., & Law (Duquesne Univ.), Advisory Board
Partnership to Advance Responsible Technology, Founding Board member
Salesforce Ethical & Responsible Use advisory council, External member
IBM Watson AI XPRIZE competition, Lead/Presiding judge
CMU Center for Informed Democracy and Social Cybersecurity (IDeaS), Founding co-Director
CMU Block Center for Technology & Society, Chief Ethicist
CMU President’s Task Force on Campus Climate, Co-chair

Most important recent publications & presentations

Lütge, C., Poszler, F., Acosta, A. J., Danks, D., Gottehrer, G., Mihet-Popa, L., & Naseer, A. (2021). AI4People: Ethical guidelines for the automotive sector – fundamental requirements and practical recommendations. International Journal of Technoethics, 12(1), 101-125.

Zhou, Y., & Danks, D. (2020). Different “intelligibility” for different folks. In A. Markham, J. Powles, T. Walsh, & A. L. Washington (Eds.), Proceedings of the 2020 AAAI/ACM Conference on Artificial Intelligence, Ethics, & Society (pp. 194-200). New York: ACM.

Danks, D. (2019). The value of trustworthy AI. In Proceedings of the 2019 AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

Danks, D., & Plis, S. M. (2019). Amalgamating evidence of dynamics. Synthese, 196(8), 3213-3230.

Geary, T., & Danks, D. (2019). Balancing the benefits of autonomous vehicles. In Proceedings of the 2019 AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society.

Danks, D. (2018). Privileged (default) causal cognition: A mathematical analysis. Frontiers in Psychology, 9: 498. doi:10.3389/fpsyg.2018.00498

Danks, D., & Ippoliti, E. (Eds.) (2018). Building theories: Heuristics and hypotheses in science. Berlin: Springer-Verlag.

LaRosa, E., & Danks, D. (2018). Impacts on trust of healthcare AI. In Proceedings of the 2018 AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. doi:10.1145/3278721.3278771

London, A. J., & Danks, D. (2018). Regulating autonomous vehicles: A policy proposal. In Proceedings of the 2018 AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society. doi:10.1145/3278721.3278763

Roff, H. M., & Danks, D. (2018). “Trust but Verify”: The difficulty of trusting autonomous weapons systems. Journal of Military Ethics, 17, 2-20.

Danks, D., & London, A. J. (2017). Algorithmic bias in autonomous systems. In C. Sierra (Ed.), Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 4691-4697).

Danks, D., & London, A. J. (2017). Regulating autonomous systems: Beyond standards. Intelligent Systems, 32(1), 88-91.

Hyttinen, A., Plis, S., Järvisalo, M., Eberhardt, F., & Danks, D. (2017). A constraint optimization approach to causal discovery from subsampled time series data. International Journal of Approximate Reasoning, 90, 208-225.

Wellen, S., & Danks, D. (2016). Adaptively rational learning. Minds & Machines, 26(1), 87-102. DOI: 10.1007/s11023-015-9370-1