Transformation of the Research Enterprise


Sandra Braman is Professor of Communication at the University of Wisconsin-Milwaukee and an EDUCAUSE Center for Applied Research (ECAR) Fellow. This article was produced in conjunction with a forthcoming ECAR study on the engagement of IT in research.

Illustration by Noah Woods, 2006

By Sandra Braman

At least since the time of Plato and Socrates, humans have been aware that the technologies we use to create and share knowledge affect what it is that we know. Thus it is not surprising that over the last few decades, innovations in information technology have brought about fundamentally new approaches to research.1 Nor is it surprising that, as a result, research methods, practices, and institutions are evolving into forms quite different from those that have dominated modern science. These changes are so significant that they are leading to theoretical innovations as well.


Colleges and universities making decisions about such matters as whether or not to centralize IT support for research, how far to go with the development of research problem-specific software, and how best to coordinate on-campus activities with those that take place elsewhere do so within a dynamic environment, one shaped not only by technological innovations but also by politics. Over the last couple of decades, computational science has come into its own as a discipline distinct from computer science; whereas the latter involves research on computing, computational science is the use of computing for research in the physical, life, and social sciences and in the humanities. This has transformed those who work with information technology from providers of services into full collaborative peers with researchers. In addition, the value of computational science to U.S. security, economic strength, and competitiveness in science and technology has been established. Thus, federal and state governments both enable (through funding) and constrain (through laws and regulations) research institutions and activities.

For these reasons, those involved in information technology in higher education need to be aware of the past, present, and future of IT engagement with research. This article reviews the political, economic, and intellectual developments that shaped the U.S. agenda for research and information technology, describes the contemporary research environment and identifies features that have IT implications, and highlights key computational, networking, and data challenges for tomorrow.

The National Agenda for Research and Information Technology

In the United States, the national agenda for research and information technology developed out of a long appreciation of the value of scientific and technological innovation to society as a whole. The inclusion of intellectual property rights in the U.S. Constitution was one manifestation of the political value placed on this insight, and by the early nineteenth century, diverse government units, even the U.S. Post Office, were engaged in scientific activity. A role for colleges and universities in creating and distributing knowledge in support of national goals was institutionalized via the Morrill, Hatch, and Smith-Lever Acts of Congress between 1862 and 1914, and college and university faculty were heavily involved in governmental activity during and following World War I. But it was not until after World War II and the launch of the National Science Foundation (NSF) in 1950 that the groundwork for today's developments was laid.

The NSF began its efforts in this area by supporting the development of campus computing centers. By the mid-1970s, however, it was still the case that only those researchers who worked with agencies such as the Department of Energy and NASA had access to supercomputers, limiting the scope of research for most faculty members. Those in experimental computer science and computation began to organize to communicate their concerns about the scholarly, and national, consequences of these restrictions, issuing the Feldman Report in 1979. They were soon followed by those in the physics community, who released the Press Report in 1981, and those in the physical and biological sciences, who produced the Lax Report the next year.2 Taken together, this suite of reports argued for the revitalization of experimental computer science to generate products of tangible use in research (including the development of new types of computer systems to solve otherwise intractable problems); highlighted the importance of computation for the intellectual, economic, and military strength of the United States; and pointed to the need for more supercomputers, and more access to those supercomputers, for researchers based in academia.

Political support for the achievement of these goals received a tremendous stimulus when the Japanese government launched its $500 million Fifth Generation Computer Project in 1981. Intended to leapfrog existing technologies, this move catalyzed both the U.S. government and the U.S. scientific community. Federal funding agencies such as the NSF, the National Security Agency (NSA), and the Department of Energy believed that supporting academic access to supercomputing would serve three goals: (1) it would help protect the domestic supercomputing industry in the face of the Japanese challenge; (2) it would provide increased access to these machines for computer and computational science research; and (3) it would provide the experimental facilities needed to attract the best computer scientists to academic (rather than corporate) work, where they would then train the next generation of computer scientists. A growing number of scientists from across the disciplinary spectrum began to refer to themselves as "computational scientists," having realized that computer simulations and other new types of interactions with data, interactions not previously possible, were so sophisticated that they provided new theoretical insights into biological and physical phenomena.

In 1987, the White House Office of Science and Technology Policy (OSTP) issued a report that looked to networked computing to support research.3 In 1993, after a number of additional studies and congressional hearings, the U.S. Congress established a $4.7 billion High Performance Computing and Communications Program (HPCC). Several foci of the HPCC remain important today. First, there was an emphasis on the importance of high-performance computing and communications to national security as well as to the economic competitiveness of the United States. Second, projects to be funded were framed within the context of scientific and engineering grand challenges. Finally, research projects using high-performance computing were to be collaborative ventures along three dimensions, involving (1) partnerships among entities from academia, the private sector, and government; (2) interdisciplinary work within the sciences; and (3) computational scientists who would be treated as full collaborators with researchers rather than as service providers.

The NSF operationalized the concept of grand challenges by establishing funding programs for work that joined disciplinary researchers, computer scientists, and emerging information technologies to solve fundamental science and engineering problems. In 1993 the NSF Blue Ribbon Panel on High Performance Computing summarized the intentions and goals of these projects, which differed from others funded by the NSF in that they were intended to accelerate progress in virtually every branch of science and engineering concurrently (rather than being directed at single disciplines) in order to stimulate the U.S. economy as a whole.4 Admitting that much of how science and engineering are practiced would be transformed if the panel's recommendations were implemented, the NSF identified several issues that needed attention: the removal of technological and implementation barriers to the continued rapid evolution of high-performance computing; the achievement of scalable access to a pyramid of computing resources; the democratization of the base of participation in high-performance computing; and the development of future intellectual and management leadership for high-performance computing. Lewis Branscomb, who edited the report for the committee, also analyzed its implications for further policy. Policy-makers, he argued, should recognize emerging technologies essential to meeting the grand challenges as "critical technologies," the development of which would need other types of legal and regulatory support, such as exemptions from antitrust law and changes in the tax structure.5

At the time, the continued U.S. domination of the world in science and technology was assumed. Only a decade later, however, the situation had changed significantly. Successes in analysis of the genome had dramatized the utility of computational science. It was no longer clear that the United States dominated the world in science and technology, and the political environment regarding definitions of the knowledge and technologies critical for survival had shifted. The NSF reconsidered its position on supercomputing centers and, in 2003, issued a report focused on the physical and biological sciences.6 This report outlined a new strategic orientation around cyberinfrastructure, referring to the evolution of distributed, or grid, computing, which uses the telecommunications network to combine the computing capacity of multiple sites. The concept of networked computing expanded to include tasks such as the harvesting of computing cycles, massive storage, and the management of data in both raw and analyzed forms over the long term. Such matters as middleware, top-level administrative leadership, and the development of new analytical tools and research methods were all identified as key to the knowledge-production infrastructure. The NSF called for increasing the speed and capacity of computers another 100-1,000 times in order to serve current research needs.

A second NSF report focused on the particular cyberinfrastructure needs of the social sciences, arguing that making progress in this area would be important not only for research on society but for knowledge production in general because of what can be learned from social scientists about human-computer interactions and about the nature of collaboration at a distance.7 Recommendations included emphasizing funding and institutional support for training social scientists in the uses of advanced computational techniques, as well as the skills needed to succeed within the context of distributed research teams. The report also identified policy issues such as security, confidentiality, and privacy.

The American Council of Learned Societies joined, in 2005, with a number of other groups to offer recommendations regarding the computational needs of those in the humanities.8 This report noted that data produced by those in the humanities contributes to the cultural record, currently fragmented across institutional boundaries, often made obscure through redaction and other archival-processing activities, or even made unavailable because it is proprietary. As in the social sciences, there is still a great deal of work to be done developing computational environments, software, and instrumentation in the humanities, though the first humanities project to use digital infrastructure (Project Gutenberg, which offers free digital versions of full texts of important books) was launched as long ago as 1971. Those in the humanities increasingly collaborate with researchers in the physical and life sciences, as appreciation for the value of expertise in such areas as document and image analysis grows. Humanities databases can be as massive as or even more massive than those of the physical and biological sciences. For example, the Sloan Digital Sky Survey used x-ray, infrared, and visible-light images to survey more than 100,000,000 celestial objects and produced a dataset of more than 40 terabytes of information; by contrast, the Survivors of the Shoah Visual History Foundation's analysis of video interviews of Holocaust survivors has produced 180 terabytes of data.9

Building on work by the Library of Congress and other federal agencies, the National Science Board, which governs the NSF, produced a 2005 report10 on long-lived data needs and problems: the relative ephemerality of digital storage media; the constancy and rate of technological innovation; the desire to revisit data, collected for one purpose, through different analytical lenses that will throw light on additional research questions; and the growing complexity and time-sensitivity of data curation. The data collection universe for any single institution now includes data collections, related software, and hardware and communication links, along with data authors, managers, users, data scientists, research centers, and other institutions. Three different types of data-storage entities were distinguished in this report. Focused research collections support single research projects in which a limited group of researchers participate. Intermediate-level resource collections span various research projects and teams from a specific facility or center. Reference collections have global user populations and impact, incorporating data of multiple types across disciplinary, institutional, geopolitical, research project, and data type boundaries.

In June 2005, the President's Information Technology Advisory Committee (PITAC) issued a report reemphasizing the importance of computational science for U.S. competitiveness and for the nation's capacity to address problems such as nuclear fusion, the folding of proteins, and the global spread of disease.11 In addition to strengthening computational and long-term data-storage capabilities and supporting the interinstitutional relationships needed to carry out large-scale distributed research, PITAC recommended the creation of national software sustainability centers to ensure that archived data would remain accessible irrespective of innovations in hardware and software. The committee warned that without new initiatives, U.S. leadership in computing and computationally intense science would continue to deteriorate. Interestingly, the long-standing White House committee was dissolved by President George W. Bush shortly after it issued this report.

Also in 2005, the Mathematical Association of America released a report that exemplified how disciplinary associations are thinking about the changes taking place.12 For most of the twentieth century, mathematics was linked most closely with physics and engineering, but today biology is seen as the stimulus for innovation. This report noted that evidence is now as often mathematical as observational, and it remarked on the shift in the relative prestige of various disciplines as they become more computationally intense.

National policies remain critically important for the nature of academic research. Unfortunately, an appreciation of the knowledge economy does not always translate into economic support for colleges and universities as the sites of knowledge production and distribution. Still, the number of institutional players continues to grow.

The Contemporary Research Environment

As the New York Times science writer George Johnson has noted, today "all science is computer science."13 Over a longer history, from the invention of the printing press and movable type in Western Europe more than five hundred years ago through the sensor web of the twenty-first century, technologies have made possible developments in the production of knowledge along several dimensions. We've gone from

• individuals working alone, to
• individuals interacting in face-to-face conversations with others, to
• individuals thinking about the work of others across space and time as accessed by print, to
• individuals developing new ideas based on the systematization of written records, to
• individuals observing the natural world, to
• individuals systematizing observations about the natural world, to
• individuals working in laboratories, to
• teams working in laboratories, to
• teams working with massive amounts of data derived from iterations of studies, to
• teams working collaboratively across laboratory, institutional, and national boundaries, to
• entirely new realms of research opening up as a result of high-performance computing.

The cumulative effects of these developments have formed the contemporary research environment, which is characterized by a number of features pertinent to the higher education IT field.

Computation is the third branch of science, along with theory and experimentation. For modern science as it developed over the last several hundred years, knowledge was produced through iterative interactions between theory and experimentation. Today, computation is seen as a third fundamental element of knowledge production. Kenneth Wilson's distinction among theory, experimentation, and computation has been particularly influential.14 Experimentalists design and use scientific instruments to make measurements, undertake controlled and reproducible experiments, and analyze both the results of those experiments and the errors that appear within them. Theorists focus on relationships among experimental quantities, the principles that underlie these relationships, and the mathematical concepts and techniques needed to apply the principles to specific cases. Computational scientists focus on algorithms for solving scientific problems and their operationalization via software, the design of computational experiments and the analysis of errors within them, and the identification of underlying mathematical frameworks, laws, and models. Some describe visualization as the fourth branch of science because it has turned out to be so valuable in stimulating new conceptual and theoretical insights.15

Every discipline is becoming computationally intense. The first discipline to become computationally intense was physics, followed by chemistry and then biology. Today, all disciplines have taken up computation. For example, the term mechatronics is used to describe the integration of computers and mechanical engineering. In the field of English, a database of every piece of scholarship about Hamlet, linked to every line of the play, is now critical to the more than five hundred items still written on the subject each year; and software written to analyze the genome is now being used to determine which versions of Chaucer's texts are the earliest. Musicians are being fully wired inside and out so that researchers can study what happens with their bodies as they play and sing. And ethnographers are using computers to comparatively analyze results from studies conducted at diverse sites.16

Large research projects are more likely to be transdisciplinary than to be disciplinarily bound. The expectation that computationally intense research projects would be interdisciplinary in nature was inherent in the design of the Internet and has been enunciated as a basic principle for all research involving information technology since the appearance of the NSF's grand challenges. The notion of computing and communications as research infrastructure, introduced in the Bardon Report of 1983,17 has been helpful in this regard, orienting attention around problems to be solved rather than around disciplinary boundaries. The heterogeneity of research teams is a corollary.

Research is more likely to be carried out in the context of a problem-oriented application than within the context of a specific academic community. The enunciation of grand challenges started the process of shifting research funding attention toward specific problems and away from discipline-bound and theory-driven basic research. In recent years, U.S. national security concerns have raised the salience of research results intended to have short-term utility.18 As a corollary, researchers are increasingly required to respond to social and political demands to be accountable rather than isolating themselves within an ivory tower.

Large research projects are more likely to involve working with the results of data from multiple studies at multiple sites across multiple time periods than working with single studies alone. High-performance computing has made it possible to analyze much larger aggregates of data, expanding vision across space and time in ways previously not possible. Researchers working with such datasets need support in developing software capable of analyzing patterns across types of data, as well as institutional arrangements that ensure access to that data irrespective of where it resides or in what form.

The distinction between basic and applied research has fallen away. Though Vannevar Bush's 1945 distinction between basic and applied research is still common in public discourse, sociologists of knowledge reject the distinction between basic and applied science as one-dimensional and replace it with a more complex array of linkages among various kinds of scientific activities. Similarly, the distinction between science and technology is falling away, since information technology is simultaneously the subject and the tool of inquiry. Technological innovation can develop out of existing technologies without additional fundamental research, and the use of technologies can itself generate new basic knowledge. For higher education institutions, these changes provide additional reasons to encourage the institutional development or adaptation of software to serve campus research needs, since doing so can result in patents and other successes for the college or university.

Research and IT Challenges for the Future

Colleges and universities seeking to increase the extent to which those in information technology engage with research face challenges in three areas: computation, networking, and data. Of course, these three are intertwined. Computation often involves networking, networking can involve data, and many data problems must be resolved before computation can take place.

Computation

In the area of computation, the problem of providing enough capacity not only endures but may be growing as the number of disciplines dependent on high-performance computing increases, the amount of data being created and stored each year multiplies, mature supercomputing facilities fail to maintain their refresh cycles, and the need for access to high-performance computing spreads to the classroom.

Grid computing fills some of the gap, but often campus networks form bottlenecks for data transfer; in addition, not everyone is able to access grid computing capabilities, although the 2005 NSF commitment to further support grid computing nationwide should make a difference in the future. Meanwhile, new techniques being used to maximize the utility of existing resources include the following: harvesting unused cycles available on a single campus; harvesting unused cycles available within the multiple campuses of a research network; harvesting unused cycles contributed to specific research projects by members of the community; and using older computing systems in classrooms where teaching also now involves high-performance computing, rather than discarding obsolete equipment. There are institutional innovations as well. At Princeton University, researchers pooled resources to purchase a supercomputer to be used by all, and at Virginia Tech, one thousand ordinary Mac G5s were linked together using volunteer student labor to produce a supercomputer for use by researchers.
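The cycle-harvesting techniques listed above all share one pattern: run research jobs opportunistically when a machine would otherwise sit idle. The minimal Python sketch below illustrates that pattern only; the article names no particular system (Condor- or BOINC-style scavenging are real-world analogues), and fetch_task and submit_result are hypothetical stand-ins for whatever queueing service a campus pool might provide. It assumes a Unix host.

```python
# Illustrative sketch of cycle harvesting, not a system described in the article.
import os
import time

LOAD_THRESHOLD = 0.25   # assumed: treat the host as "idle" below this per-core load
POLL_SECONDS = 60       # assumed polling interval

def machine_is_idle() -> bool:
    """Return True when the host's recent load average suggests spare cycles."""
    one_minute_load, _, _ = os.getloadavg()        # Unix-only
    return one_minute_load / os.cpu_count() < LOAD_THRESHOLD

def harvest_cycles(fetch_task, submit_result):
    """Run queued research tasks only while the machine is otherwise idle."""
    while True:
        if machine_is_idle():
            task = fetch_task()                    # e.g., pull a work unit from a campus queue
            if task is not None:
                submit_result(task())              # run it and return the result to the pool
        time.sleep(POLL_SECONDS)                   # back off either way before polling again
```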

Institutional support systems often lag behind the available technological systems. The management of LambdaRail connections, for example, requires a conceptual shift because it is such a different way of organizing activity. There is some concern that unfamiliarity with the management of qualitatively different types of technological systems, such as LambdaRail, may contribute to their underuse. This issue suggests the need to reconsider the types of training available to CIOs and other upper-level administrators, as well as to faculty who would make use of these opportunities.

The problems raised by the effort to build large and long-lived datasets designed for multiple uses are discussed below. Once the data-conceptualization issues are resolved, however, technical issues remain. For example, in order for computer programs to make use of the concepts in an ontology, there must be a common inference syntax. Standard interchange formats are required to translate the same knowledge into a variety of symbol-level representations. And it is proving to be quite time-consuming to conceptualize the structure of tasks that systems must perform in such a way that problem-solving methods themselves can become reusable. Because there are differences in the types of systems and computer architectures that best serve the resolution of various research problems, the desire to be able to subject data to multiple uses also presents new hardware design constraints.
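To make the idea of a standard interchange format concrete, the sketch below (my example, not the article's) expresses a toy ontology fragment in Turtle, one widely used RDF interchange syntax, and queries it with SPARQL via the open-source rdflib library. The ex: vocabulary and the two dataset records are invented for illustration.

```python
# A tiny ontology fragment in a standard interchange syntax (Turtle), parsed
# and queried with rdflib. Vocabulary and data are invented for this example.
from rdflib import Graph

TURTLE_DATA = """
@prefix ex: <http://example.org/research#> .

ex:SkySurveyImages a ex:Dataset ;
    ex:discipline "astronomy" ;
    ex:sizeTerabytes 40 .

ex:OralHistoryInterviews a ex:Dataset ;
    ex:discipline "history" ;
    ex:sizeTerabytes 180 .
"""

g = Graph()
g.parse(data=TURTLE_DATA, format="turtle")

# Because both records use one shared schema, a single query spans disciplines.
QUERY = """
PREFIX ex: <http://example.org/research#>
SELECT ?dataset ?size WHERE {
    ?dataset a ex:Dataset ;
             ex:sizeTerabytes ?size .
}
"""
for dataset, size in g.query(QUERY):
    print(dataset, size)
```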

Networking

Networking issues include involvement with a variety of national and international grids, attention to the problem of shared interfaces at the user end, management of remote instrumentation and sensors at the input end, and governance problems.

The boundaries of any single institution's research computing infrastructure now include regional, national, and international resources. In addition to computational grids, research involves access grids (for collaboration among distributed researchers), data grids (for accessing and integrating heterogeneous datasets), and sensor grids (for collecting real-time data about matters such as traffic flows and electronic transactions). The fact that the architectures for each of these types of grids are often not contiguous with each other raises additional management problems. Indeed, researchers at a single institution may be involved in multiple access, data, and sensor grids.

Grid computing also creates issues that must be resolved at the points of data input and output. On the input end, there are as yet no consensual norms for how to manage remote instrumentation and data collected through the sensor web; solutions are being developed on a project-by-project basis. The number of issues in this area continues to rise with technological innovation. Clinical medical practice, for example, is a forerunner of a field in which researchers would like to be able to put information into databases, analyze that information, and query databases for additional information while in the field and engaged in active problem-solving. Wireless networking and the ubiquitous sensor web present problems unique to their environment. On the output end, there is a need for user interfaces that are consistent irrespective of computing platform, a matter particularly important in collaborations, since successful distributed research projects require WYSIWIS (What You See Is What I See) interfaces.

Just as it will take decades to resolve governance issues raised by the global information infrastructure, so the disjunctures among organizational, political, geographic, network, and research practice boundaries present a number of new governance problems. Organizational structures being discussed and experimented with include management by disciplinary associations, leadership by individual campus IT organizations, expansion of the roles of the Internet2 organization, oversight by the federal government, and creation of an entirely new organization to manage grid-related matters.

Data

Data issues now dominate many discussions about how to manage IT engagement with research: the amount of data that must be handled has grown; the research habit of using only one type of data in a research project has been replaced by the use of diverse datasets subjected to common analytical frameworks; there is an interest in ensuring a long life for datasets to maximize possibilities for knowledge reuse; new types of data repositories suggest the need for administrator and researcher training in the management of that data; data can be sensitive in a variety of ways addressed by governmental as well as institutional policy; and there is still a paucity of software for the analysis of many different types of datasets. The impact of resolving these issues is expected to be significant, for as the analysis of gene sequences has demonstrated, the ability to mine very large data collections of global importance can open up entirely new paths of research.

The amount of data being created and stored each year continues to grow at exponential rates, with consequences for computing capacity. It has been estimated that in 2003, 5 exabytes (an exabyte is 10^18 bytes) of new information was created, with 92 percent born digital and 40 percent created in the United States.19 This is the same volume of information that is believed would result if all words ever spoken by human beings were digitized. Single experiments can now yield as much as 1 petabyte (10^15 bytes). It is estimated that the amount of data in all of the holdings of all U.S. academic libraries is 2 petabytes and that, if digitized, all printed material in existence would have a total volume of 200 petabytes. The Large Hadron Collider just coming online at CERN is expected to yield multiple petabytes per experiment, to be used by 2,000 researchers around the world.
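The scale comparisons in that paragraph are easier to keep straight with the prefixes written out. The short calculation below is added purely for illustration, using the decimal (SI) definitions of the units and the article's own figures.

```python
# Decimal unit definitions and the article's estimates, restated for comparison.
PETABYTE = 10**15  # bytes
EXABYTE = 10**18   # bytes

new_information_2003 = 5 * EXABYTE     # estimated new information created in 2003
academic_libraries = 2 * PETABYTE      # holdings of all U.S. academic libraries
all_printed_material = 200 * PETABYTE  # all printed material, if digitized
single_experiment = 1 * PETABYTE       # upper end for a single experiment

# One year of new information is 25 times the entire digitized print record...
print(new_information_2003 / all_printed_material)  # 25.0
# ...and one petabyte-scale experiment equals half of all academic library holdings.
print(single_experiment / academic_libraries)       # 0.5
```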

The amount of data keeps expanding, for a number of reasons. Many kinds of information originally produced in analog form are being digitized (e.g., manuscripts, transcripts, and visual information). The sensor web is vastly expanding the amount of data available, often in real time. Distributed computing and advanced networking support for the transport of data from one site to another increases the amount of data in circulation and the amount of data accessible from any single location. Often data is generated, transported, analyzed, and used entirely within the world of machine-machine communication, without human intervention. The NSF is trying to stimulate the development of new uses of data, such as dynamic data-driven simulations that could be used for purposes such as forecasting tornadoes. Codification of tacit information also expands the universe of data. And knowledge reuse creates additional data from the results of research projects. The growing use of simulations and modeling means that data about possible futures also may need to have long lives. Since it is clear that our ability to meaningfully and effectively make use of all of the data available will not keep up with its growth, institutions, disciplines, and the government will need to make policy, infrastructure, and maintenance decisions regarding how much, and which, information to keep.

We're most likely to achieve our long-term data-storage goals if the collections of global importance are managed by just a few institutions serving as community proxies. These institutions would have responsibility for collection access, collection structure, technical standards and processes for data curation, ontology development, annotation, and peer review. One example of what such a community proxy might look like is offered by the Data Central service recently announced by the University of California, San Diego. Available to researchers at any institution, this service includes a commitment to keep data collections accessible for one hundred years. Other types of data collections may best be handled by disciplinary groups, teams devoted to specific large-scale problems, or the institutions of key researchers. And since research funders are beginning to require that grant proposals give explicit attention to long-term data-storage issues, institutions with grant-funded researchers need to begin addressing this problem immediately.

The functions of researchers as both data authors and data users are traditional but are receiving renewed attention in terms of training, certification, monitoring, establishment of funded job lines, and assurance of career paths in today's environment. New categories of professional responsibility are necessary in order to work with long-lived data storage, categories such as specialists in metadata and individuals who can provide an interface between researchers and collections. The University of Michigan is the first institution to step forward to provide leadership in training individuals for these new career paths.

The desire to maximize knowledge reuse has become a much more important factor in the design of libraries, databases, storage collections, and archives. Knowledge reuse occurs in several different ways: conducting a secondary analysis of a researcher's raw data, analyzed either by another in the same field or by people in other fields; revisiting data through new analytical lenses; analyzing data of multiple types through a single analytical lens; synthesizing the results of many different types of studies for simultaneous analysis of multiple types of data about the same problem; adapting analyses of data for application in new contexts; and visualizing results of data analyses.

Meeting the needs of knowledge reuse, as well as the shared use of data across disciplines as required by interdisciplinary research, heightens the need for information infrastructure around data. Metadata (data about data) is of growing importance as the number, the size, and the range of uses of data collections grow. Cross-walks are a very traditional way for information scientists to link data from one discipline with data from another discipline. Efforts to reach common ways of thinking about data pertaining to a research problem that is shared across disciplines build new ontologies. Some people, such as those involved in the effort to develop a semantic web, believe it will be possible to discuss data in ways that will be comprehensible to anyone from any discipline, geographic locale, culture, organizational context, or society.
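As an illustration of what a cross-walk does (my example, not the article's), the sketch below maps records described under two hypothetical disciplinary schemas onto a shared vocabulary so that a single query can span both collections; all field names and records are invented.

```python
# Hypothetical metadata cross-walk: rename fields from two disciplinary schemas
# into one shared vocabulary so a single search can cover both collections.
ASTRONOMY_TO_SHARED = {"object_id": "identifier", "obs_date": "date", "instrument": "source"}
ORAL_HISTORY_TO_SHARED = {"interview_no": "identifier", "recorded_on": "date", "archive": "source"}

def crosswalk(record: dict, mapping: dict) -> dict:
    """Rename a record's fields into the shared vocabulary, dropping unmapped fields."""
    return {shared: record[local] for local, shared in mapping.items() if local in record}

sky_image = {"object_id": "SDSS-J1234", "obs_date": "2003-05-17", "instrument": "imaging camera"}
interview = {"interview_no": "VHF-00042", "recorded_on": "1996-11-02", "archive": "Shoah Foundation"}

unified = [crosswalk(sky_image, ASTRONOMY_TO_SHARED),
           crosswalk(interview, ORAL_HISTORY_TO_SHARED)]
# Both records can now be queried on the same field names, e.g. "date".
print([r["date"] for r in unified])
```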

A number of laws, regulations, and policy concerns affect the treatment of data. There is, for example, a tension between the need to centralize data that is being used by multiple parties for multiple purposes and the interest in dispersing data in order to respond to vulnerabilities highlighted by the 9/11 experience and to maximize access. Privacy and security issues are becoming more important, and more complex. International research collaborations generate policy problems when the laws of various nation-states regarding the treatment of data do not harmonize with each other. The growing ability to revisit data collected in the past raises questions about how to handle information collected from cultures or in periods in which there were very different ideas about what constitutes informed consent.

Conclusion

Over the last few decades, a number of themes that shape the context for IT engagement with research in higher education have emerged at the intersection of national policy, disciplinary developments, the evolution of research methods, and institutional habits: the emergence of computation as a fundamental and distinct stage of the research process, the impact of globalization, the growth in scale of research projects, the trend toward interdisciplinary research collaborations, the recognition of the value of knowledge reuse, and the democratization of research. The tensions generated by these themes are familiar to those in information technology: centralization versus decentralization; the inertia of historical habit versus the need for innovations in practice; the speed of technological innovation versus the speed of institutional innovation; the requirements of the academic institution versus those of external funding agencies; desires of faculty versus mandates for administrators; the needs of the many versus the needs of the few; and knowledge in service of the public versus development of intellectual property by an institution versus personal career ambitions. These transformations of the research enterprise offer new opportunities for researchers, administrators of higher education institutions, and IT specialists.

    Notes

1. National Science Board, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (Washington, D.C.: National Science Foundation, 2005).
2. Jerome A. Feldman and William R. Sutherland, "Rejuvenating Experimental Computer Science: A Report to the National Science Foundation and Others," Communications of the ACM, vol. 22, no. 9 (September 1979): 497-502 (the Feldman Report); William H. Press, ed., "Prospectus for Computational Physics," Report by the Subcommittee on Computational Facilities for Theoretical Research to the Advisory Committee for Physics, Division of Physics, National Science Foundation, March 15, 1981 (the Press Report); Peter Lax, ed., "Report of the Panel on Large-Scale Computing in Science and Engineering," Coordinating Committee NSF/DOD, December 26, 1982 (the Lax Report).
3. U.S. Office of Science and Technology Policy (OSTP), A Research and Development Strategy for High Performance Computing (Washington, D.C.: Government Printing Office, 1987).
4. NSF Blue Ribbon Panel on High Performance Computing, From Desktop to Teraflop: Exploiting the U.S. Lead in High Performance Computing (Washington, D.C.: National Science Foundation, 1993).
5. Lewis Branscomb, ed., Empowering Technology: Implementing a U.S. Strategy (Cambridge: MIT Press, 1993).
6. NSF Blue Ribbon Advisory Panel on Cyberinfrastructure, Revolutionizing Science and Engineering through Cyberinfrastructure (Washington, D.C.: National Science Foundation, 2003).
7. Francine Berman and Henry Brady, NSF SBE-CISE Workshop on Cyberinfrastructure and the Social Sciences (Washington, D.C.: National Science Foundation, 2005).
8. American Council of Learned Societies (ACLS), The Draft Report of the Commission on Cyberinfrastructure for Humanities and Social Sciences (New York: American Council of Learned Societies, 2005).
9. Ibid.
10. National Science Board, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (Washington, D.C.: National Science Foundation, 2005).
11. President's Information Technology Advisory Committee (PITAC), Computational Science: Ensuring America's Competitiveness (Arlington, Va.: National Coordination Office for Information Technology & Research, 2005).
12. Lynn Arthur Steen, ed., Math & Bio 2010: Linking Undergraduate Disciplines (Washington, D.C.: Mathematical Association of America, 2005).
13. George Johnson, "All Science Is Computer Science," New York Times, March 25, 2001.
14. Kenneth G. Wilson, "Grand Challenges to Computational Science," Future Generation Computer Systems, vol. 5, no. 2/3 (September 1989): 171-89.
15. Author interview with Larry Smarr, Director of the California Institute for Telecommunications and Information Technology and Professor of Computer Science at the University of California, San Diego, August 4, 2005.
16. Burton Bollag, "Finding Land Mines before Lebanon's Children Do," Chronicle of Higher Education, June 17, 2005; Jeffrey R. Young, "Database Will Hold the Mirror Up to Hamlet, with All Commentary on the Play," Chronicle of Higher Education, May 27, 2005; Katherine S. Mangan, "Medicine for Musicians," Chronicle of Higher Education, October 15, 2004.
17. Marcel Bardon and Kent Curtis, "A National Computing Environment for Academic Research," NSF Working Group on Computers for Research, July 1983.
18. Executive Office of the President of the United States, Office of Science and Technology Policy, "Science and Technology: A Foundation for Homeland Security," April 2005.
19. Peter Lyman and Hal R. Varian, "How Much Information 2003?"