This article was downloaded by: [Dicle University]
On: 12 November 2014, At: 23:01
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/wmgl20

Maps & GIS Data Libraries in the Era of Big Data and Cloud Computing
Daniel Goldberg, Miriam Olivares, Zhongxia Li & Andrew G. Klein (Texas A&M University, College Station, Texas, USA)
Published online: 21 Apr 2014.

To cite this article: Daniel Goldberg, Miriam Olivares, Zhongxia Li & Andrew G. Klein (2014) Maps & GIS Data Libraries in the Era of Big Data and Cloud Computing, Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives, 10:1, 100-122, DOI: 10.1080/15420353.2014.893944

To link to this article: http://dx.doi.org/10.1080/15420353.2014.893944

Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

Maps & GIS Data Libraries in the Era of Big Data and Cloud Computing


Journal of Map And Geography Libraries, 10:100–122, 2014
Published with license by Taylor & Francis
ISSN: 1542-0353 print / 1542-0361 online
DOI: 10.1080/15420353.2014.893944

Maps & GIS Data Libraries in the Era of Big Data and Cloud Computing

DANIEL GOLDBERG, MIRIAM OLIVARES, ZHONGXIA LI, and ANDREW G. KLEIN

Texas A&M University, College Station, Texas, USA

In upcoming years, two major changes in the computing landscape will reshape how map and GIS data libraries (MGDLs) perform their core functions. These advancements, cloud computing and the “Big Data era,” offer both opportunities and challenges for libraries. Commercial cloud computing solutions are available as on-demand services; low-cost, internal private clouds are now a financial possibility. Simultaneously, Geographic Information Science (GISci) data and services have advanced at an ever-increasing pace into the “Big Data” era, swelling the types and amounts of GIS data and services available. These two shifts have affected and will continue to affect the entire GIS world, and MGDLs, in response, will now be required to collect, curate, and make available more data and services than ever. The MGDL community must be prepared to respond in order to remain effective. This article explores the evolving landscapes within which MGDLs must operate and examines how their roles and operational organization will be impacted. The hope of the authors is that such analyses will spark community-driven discussion to motivate the next major phase of research and implementation in the world of MGDLs.

KEYWORDS big data, cloud computing, GIS, library

© Daniel Goldberg, Miriam Olivares, Zhongxia Li, and Andrew G. Klein

Address correspondence to Dr. Daniel Goldberg, Geography, 810 Eller M&N, TAMU 3147, College Station, TX 77843-3147. E-mail: [email protected]

INTRODUCTION

Many higher education institutions worldwide house a maps and geographic information science (GIS) data library (MGDL). In some instances, these units are within the university’s library system. In others, they are joint ventures between the libraries and other campus IT entities, and in still others they are run primarily as a college- or department-level service. As with many other university units that perform specialized services, the administrative, operational, and organizational structure of these units varies dramatically between institutions. Given this heterogeneity in approach, it is not surprising that the types of data and services MGDLs provide vary among educational institutions, as do the customer bases which they serve. Generally, these administrative units provide data storage, retrieval, and archival services for paper maps and geographic artifacts as well as digital GIS data in both raster and vector form to a broad user base.

Despite the inherent differences across institutions, how MGDLs function has remained relatively stable for some time. However, MGDLs, like the rest of academe, now face disruptive changes in science, information technology (IT), hardware, software, and end-user character and expectations that will force reconsideration of the roles, services, and technical structures of these organizations (Perera 2008). The first of these changes is the so-called big data age in which we now live and work (Lohr 2012). In this new era, massive amounts of data are continually produced and must be collected, stored, and made discoverable and accessible. The second is the rapid emergence and impending market dominance of cloud and virtual computing (Baun et al. 2011; Bughin, Chui, and Manyika 2010; Manyika et al. 2011; Vouk 2004) wherein data and services are hosted remotely “on the cloud” or on small virtual “slices” of powerful servers instead of on individual dedicated hardware. The third is the maturity of GIS-related software systems and applications that are now capable of providing a variety of high-quality data management and GIS services at low to no cost. The fourth is the changing expectations of a changing user base, which has expanded beyond the traditional disciplinary homes on university campuses to encompass a far greater range of interests and levels of expertise.

These changes are having a direct impact on MGDLs, whose fundamental missions include hosting and archiving data and providing a broad range of services to a varied set of patrons. These changes simultaneously increase and decrease costs: the cost of storing increased data volumes rises while the cost of the required computing infrastructure falls. Beyond these changes, MGDLs are affected because the diversity of services they must offer their changing customer base and the level of training and specialization required for library staff to operate, support, and advise their customers increase in parallel.

Simultaneously, as GIS and spatial thinking pervade university campuses, more users are becoming exposed to the technology, data, and services that facilitate GIS research, education, and service (Spiegel and Kinikin 2004; Sinton and Lund 2007; Ramasubramanian and Goldberg 2012). The MGDL’s user base is rapidly expanding as GIS use grows outward from its traditional core of disciplines. These users have a wide-ranging set of technical abilities and experiences that may differ considerably from those of the MGDL’s former core patrons. New users from disciplines that have not traditionally offered training in database or data management, cartographic design, or spatial analysis or modeling may require user-friendly, familiar interfaces to interact with MGDL resources, such as Web maps similar to those found on commercial mapping sites like Google Maps. New users from data-heavy disciplines, although technically skilled, might be unfamiliar with common GIS data formats and standard GIS software, requiring the development of new software systems or services to convert from GIS data formats such as shapefiles into formats such as NetCDF that are less common in GIS but popular in other fields. Cloud and virtual computing may hold the keys to meeting the needs of these new types of MGDL users.

In the face of this profusion of computing and data storage, why do universities, and in particular MGDLs, continue to invest in hardware, infrastructure, and staff to support their own servers and data centers and provide their own services? Shouldn’t they be following the lead of major businesses and international corporations by shifting computing and data services to the cloud? Some signs point to yes; major institutions such as Yale and the University of Southern California (USC) now rely on Google Gmail for their e-mail. Students, faculty, and staff around the world routinely make use of backup and synchronization services provided by Dropbox. Coders in computer science and geography departments at universities worldwide use the online service GitHub to facilitate collaborative programming projects.

This article examines how the structure of and services provided by MGDLs on university campuses are impacted by these new environments within which MGDLs must operate, with a particular eye to identifying challenges and opportunities. To set the stage, a detailed introduction to the new computing techniques and resources available to MGDLs is presented, along with a detailed description of the new data landscape that these organizations must support. Next, the primary roles and core responsibilities of MGDLs are discussed, especially those affected by the described changes in data, computing, and users. Technical operations, including providing GIS data curation and access services and GIS development, support, and advising capacities that occur in these organizations on many campuses, are discussed as indicators of which libraries are being affected and how. This discussion is followed by an analysis of various administrative and operational configurations that MGDLs can take and the technical and service roles they can provide. This analysis illustrates how universities can organize the collection, aggregation, storage, curation, and dissemination of GIS data differently. We also point out how the operational configuration of a library within its institution affects its ability to adapt its infrastructure, the types of data it can store, and the services it can provide to keep pace with and leverage these changes in data, science, computing, and patrons. A final analysis links each of the changes that MGDLs must respond to across their core areas of services and organization to identify challenges and opportunities afforded by the changing landscape of technology, data, and end users.

The Changing Landscape of Computing—Setting the Stage

To understand the present role of MGDLs and the challenges they face, it is helpful to reflect on the challenges MGDLs have already dealt with over the past two decades and how they have adapted to address these challenges. History tells us that the availability of technology has always been a driving force behind the services offered by MGDLs and that the types, amounts, and qualities of the services offered have directly impacted the level of engagement MGDLs have been able to achieve with their university campus patrons. As early as the 1980s, MGDLs faced issues in keeping pace with changing technology. North (1989) discussed the effect of advances in laser technology for storing data on compact and optical discs, questioned whether libraries were ready for such technology, and described how microcomputing costs at that time were decreasing while storage capacity was increasing.

The early 1990s saw a slow proliferation of GIS services due to the scarcity of geographic digital data, software applications that were not fully developed or were difficult to operate, and users’ lack of knowledge (Herold, Gale, and Turner 1999). In addition, computing infrastructure was identified as a major challenge to establishing new GIS services in the 1990s (Cobb 1995). Starting in the mid-1990s, GIS awareness began to rise alongside advances in available technologies; the benefits, engagement, and literacy among library communities offering GIS services also began to increase (Adler 1995).

In 1999, the ARL GIS Literacy Project published a survey on GIS in the library realm; at the time, sixty-four out of seventy-two respondent institutions were offering GIS services, most of them based in a government document department or a map library (ARL 1999). Often, GIS services influenced the role of map, document, reference, or subject librarians because of the absence of GIS librarians (Argentati 1997). Some issues identified at the time were the interdisciplinary and complex nature of GIS reference questions, the pressing need to upgrade computer infrastructure to keep up with software advancements, the role of libraries to serve as data depositories, and the necessity of training librarians in GIS (ARL 1999).

Fast forward to today. The last decade has witnessed two major changes in the modern computing landscape: cloud computing and the big data era. These, coupled with advances in the capabilities of free and open-source software systems, have once again fundamentally reshaped the underlying technologies and platforms that MGDLs use to perform their core functions (Brown, Chui, and Manyika 2011). If we use history as our guide, understanding the ways in which these technologies can be used to benefit the MGDL community will provide opportunities to better serve MGDL patrons into the future.

Big Data

It has been well documented across many scientific disciplines that human society has entered the age of big data (Manyika et al. 2011). At no prior juncture in history have data been generated at rates seen today in nearly every field of academic pursuit. Every day, and in some instances every hour, millions upon millions of consumers interact with services that generate data. Similarly, vast sensor arrays distributed around the globe and in space pump out data readings nonstop. These data are voluminous, they arrive at high velocities, they encompass many disparate things, and they are, in many cases, exceptionally valuable. Examples include Internet search-engine data that can be used to generate knowledge about trends in disease outbreaks or eating preferences, online shopping data from outlets like Amazon that can be used to generate knowledge about trends in consumer spending and economies writ large, and in situ sensors monitoring environmental conditions that might generate new knowledge about the links between exposure and disease. Each of these data sources provides valuable time-stamped transactional data at the level of the individual for business decision makers, government policymakers, and researchers. These vast volumes of data have the potential to enable research into some of the world’s most interesting and pressing problems, including predicting the future state of the world’s climate, planning for disasters, simulating outbreaks, and testing targeted intervention strategies to mitigate loss or benefit society.

Geographic information science (GIS) is no exception to this big data trend in science, and GIS data have advanced quickly into the big data era (Shekhar et al. 2012). The dramatic increase in the volumes of GIS data being collected has impacted the full spectrum of the GIS world from commercial enterprises to government agencies and, in no small way, MGDLs. The smartphones in the pockets of nearly every Internet-connected person in the world generate spatially enabled, real-time data about the locations, activities, and preferences of their owners and their interactions with the world around them; the networks of buoys at sea capture environmental data about the wet 71 percent of our world; the satellites perched above the planet capture images of our world and beyond. Spatially enabled data are continually streaming from data sources into GIS databases that describe the earth, its inhabitants, modern societies, and individuals at unprecedented scales of both frequency and resolution.


Cloud Computing

Although the term “cloud computing” means many things to many people, nearly everyone would agree that it at least refers to storing one’s data online using services available via the Internet, rather than storing it locally on a personal computer. Similarly, cloud computing is also generally understood to refer to hosting or using services over the Internet on computers located somewhere else in the world that the consumer does not generally own. These types of service-based usages relieve end users of the need to purchase hardware, install services, and provide hardware and software support and maintenance. Users simply use remote services as needed, sometimes free, and they need not worry about the underlying technology that keeps them running. To achieve this goal, many cloud-computing providers have developed specialized software that allows them to link thousands of commodity computers to act as one enormous “cloud” of computing resources of which end users can avail themselves when their service requests are processed by the service provider.

Cloud computing is emerging as the most-used means of organizing computing resources for massive-scale needs, overtaking all prior forms of computing architecture. It has already changed the way average Internet-connected citizens interact with online digital data and the services that process or otherwise transmit or manipulate it (Armbrust et al. 2010; Miller 2008). For example, the general public has become accustomed to using services and data not stored or computed locally on their own computers, tablets, or phones, but instead accessed via the Internet through Web browsers, desktop applications, and of course, mobile apps. Common Web search services like Google and Yahoo, online shopping Web sites like Amazon, online video services like Netflix and Hulu, and online document management systems like Dropbox and Google Drive all represent examples of commonly used, cloud-based services and data storage. Users now readily and willingly store personal data such as photos, homework, medical records, and financial information in cloud services for availability and safekeeping. Tweens, teens, parents, and grandparents communicate and share experiences such as restaurant and movie reviews using cloud-enabled services like Facebook and Yelp.

There are many reasons cloud computing should be attractive to MGDLs. Its operational capabilities are numerous and varied. It is distributed, meaning that services and data are split across many machines in potentially many physical locations. Cloud computing is reliable, meaning that the data stored in the cloud or services running on the cloud should always be available from any Internet-connected device or computer. It is scalable, meaning that the processing power driving a particular service or data delivery should shrink or grow to meet changing demand. Cloud computing is cheap, meaning that the cost to persons and organizations that pay to host data and services on the cloud should be no more than, and ideally a fraction of, the cost of hosting those data and services locally on their own hardware. Cloud computing is secure, meaning that one person’s or one organization’s data can be viewed, exported, or downloaded only by authorized users.

Virtual Computing

Rapid decreases in server hardware prices including, most importantly, central processing units (CPUs) and memory have recently occurred so that powerful servers can now be purchased for a fraction of the investment necessary just five years ago. To realize economies of scale, it is now common for organizations to purchase a few very powerful servers rather than an enormous quantity of low-performance desktops. The resources of these physical servers (memory, CPU cycles) are logically separated, or “virtualized,” into many smaller logical machines, called virtual hosts, for performing dedicated tasks (Baun et al. 2011; Bughin, Chui, and Manyika 2010; Manyika et al. 2011; Vouk 2004).

Virtual computing can accomplish many of the same operationally desirable characteristics as cloud computing. The primary difference between the two is that virtual computing often relies on hardware still physically present on the university campus. However, just like cloud computing, virtual computing is distributed: computing can be spread across many physical machines. It is reliable: virtual hosts can be set to automatically “fail over” to different physical hosts, and entire data centers can be replicated in real time should catastrophic failure occur at one location. It is scalable: additional physical hosts can be inserted into an existing cluster with little effort, and virtual hosts can release resources when not in use to adjust capacity to changing resource demands. It is inexpensive: for under $50,000 an organization could purchase and install an environment powerful enough to run the computing, data, and service needs for possibly an entire department within a major research university. It is secure: data and services can be accessed only by authorized users. Hosting hardware locally on campus may also be an attractive option when data have associated security or confidentiality issues or when systems need to integrate with other university systems such as single sign-on services that allow university user accounts to work across the many systems of a university.
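The partitioning and failover behavior described above can be illustrated with a small sketch. It is a toy model only, not how any particular hypervisor works: the `PhysicalHost` class, the `place` and `fail_over` functions, the first-fit placement rule, and all host names and sizes are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalHost:
    """A physical server whose cores and memory are sliced into virtual hosts."""
    name: str
    cores: int
    memory_gb: int
    guests: dict = field(default_factory=dict)  # guest name -> (cores, memory_gb)

    def free(self):
        """Return (free cores, free memory in GB) after subtracting all guests."""
        used_cores = sum(c for c, _ in self.guests.values())
        used_mem = sum(m for _, m in self.guests.values())
        return self.cores - used_cores, self.memory_gb - used_mem

    def can_fit(self, cores, memory_gb):
        free_cores, free_mem = self.free()
        return cores <= free_cores and memory_gb <= free_mem


def place(cluster, guest, cores, memory_gb):
    """Place a virtual host on the first physical host with spare capacity."""
    for host in cluster:
        if host.can_fit(cores, memory_gb):
            host.guests[guest] = (cores, memory_gb)
            return host.name
    raise RuntimeError(f"no capacity left for {guest}")


def fail_over(cluster, failed_name):
    """Move every guest off a failed physical host onto the surviving hosts."""
    failed = next(h for h in cluster if h.name == failed_name)
    survivors = [h for h in cluster if h.name != failed_name]
    moved = {}
    for guest, (cores, memory_gb) in list(failed.guests.items()):
        moved[guest] = place(survivors, guest, cores, memory_gb)
        del failed.guests[guest]
    return moved


# Hypothetical two-server cluster hosting three MGDL services.
cluster = [PhysicalHost("server-a", 16, 128), PhysicalHost("server-b", 16, 128)]
place(cluster, "map-tiles", 4, 32)
place(cluster, "spatial-db", 8, 64)
place(cluster, "geocoder", 2, 16)
moved = fail_over(cluster, "server-a")
```

In this toy model, scaling out amounts to appending another `PhysicalHost` to `cluster`; a production environment would add live migration, replication, and access control, none of which are modeled here.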

Advances in Native GIS Support in Databases

In the world of GIS, long gone are the days of the solitary GIS analyst or scientist working on a single, self-contained data set with services and computing techniques located solely on his or her desktop. Today’s GIS users routinely ingest data into their analysis from on-demand base maps and data layers hosted in the cloud or on remote servers. Chief among the technological advancements that have enabled this new method of doing GIS work is the built-in support for spatial data types (points, lines, polygons) and spatial operations (search, buffer, intersect, clip, etc.) in free and open-source relational database management systems and in newer database types such as NoSQL and object-oriented databases. Prior to these developments, storing digital GIS data required specialized software such as ArcGIS Spatial Data Engine in addition to a database (West 2001). This meant that storing and providing access to large amounts of spatial data required the purchase, installation, and maintenance of several different systems, increasing costs and requiring specialized training for staff.
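To make the idea of spatial operations concrete, the following sketch implements two of them, a bounding-box intersection test and a point-in-polygon search, in plain Python. It illustrates the kind of work a spatially enabled database performs natively; it is not the implementation used by any database system. The function names, the campus footprint, and the point coordinates are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon whose vertices are `ring`?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    """Return the bounding box of a polygon as (min_x, min_y, max_x, max_y)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def bboxes_intersect(a, b):
    """True when two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(points, ring):
    """Return the names of the points that fall inside the polygon."""
    min_x, min_y, max_x, max_y = bbox(ring)
    hits = []
    for name, (x, y) in points.items():
        # Cheap bounding-box filter first, exact geometric test second.
        if min_x <= x <= max_x and min_y <= y <= max_y and point_in_polygon(x, y, ring):
            hits.append(name)
    return hits


# Hypothetical rectangular campus footprint and three named points.
campus = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
points = {"library": (1.0, 1.0), "stadium": (5.0, 1.0), "lab": (3.5, 2.5)}
found = search(points, campus)
```

The two-stage pattern in `search` (a coarse bounding-box filter followed by an exact test) mirrors, in simplified form, how spatial indexes are commonly used to avoid running expensive geometry tests on every record.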

Support for spatial data is now a standard feature of nearly all commercial-grade database systems including Microsoft SQL Server, Oracle, MySQL, and PostgreSQL, the latter two both being freely available. This means that users seeking to host and provide access to massive quantities of spatial data can do so for low to no cost. The adoption of Open Geospatial Consortium (OGC) standards-based data structures for representing spatial data, such as Well-Known Text, Well-Known Binary, Geography Markup Language, and Keyhole Markup Language (KML), now means that the data can be transferred in and out of various database systems and consumed by client applications, transparently to the user. The result of these combined advances in spatial data storage infrastructures is that organizations can now easily and cheaply host large amounts of spatial data and make these data available to their users in formats they can readily make use of. In conjunction with cloud and virtual machine technologies, the costs of GIS computing and spatial data storage are the lowest in history.
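As a small illustration of why these standard encodings make data portable, the sketch below writes the same point as Well-Known Text, Well-Known Binary, and a KML fragment using only the Python standard library. The helper names and the coordinates are assumptions chosen for illustration, the KML output is a bare `Placemark` element rather than a complete KML document, and only the simplest geometry type (a two-dimensional point) is handled.

```python
import struct
import xml.etree.ElementTree as ET


def point_to_wkt(lon, lat):
    """Encode a 2D point as Well-Known Text."""
    return f"POINT ({lon} {lat})"


def point_to_wkb(lon, lat):
    """Encode a 2D point as Well-Known Binary (21 bytes).

    Layout: 1 byte for byte order (1 = little-endian), a 4-byte unsigned
    integer geometry type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, lon, lat)


def wkb_to_point(data):
    """Decode a little-endian WKB point back into (x, y)."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are handled in this sketch")
    return x, y


def point_to_kml(name, lon, lat):
    """Encode a named point as a KML Placemark fragment (no <kml> root element)."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    point = ET.SubElement(placemark, "Point")
    ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    return ET.tostring(placemark, encoding="unicode")


# Illustrative coordinates only (longitude, latitude).
lon, lat = -96.34, 30.62
wkt = point_to_wkt(lon, lat)
wkb = point_to_wkb(lon, lat)
kml = point_to_kml("Map and GIS Library", lon, lat)
```

Because each encoding carries the same coordinates, a client that understands any one of them can consume the point without knowing which database system stored it.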

Maps and GIS Data Library Configurations

To understand how advances in computing technology, new forms of data, and expanding user bases with increasing expectations impact MGDLs, it is useful to first understand how these entities can be organized within universities. Every university organizes its research, administration, and educational enterprises differently, and the management of GIS data and services is no exception. How GIS data and services are organized, housed, and developed, and how GIS is taught on campuses, critically influence the technical infrastructures used, expertise employed, and services offered by MGDLs. Various organizational structures afford different opportunities to adapt practices and adopt new technology approaches.

To begin, campus GIS units often go by different names across institutions such as geospatial centers, data service units, or map and GIS data libraries. Some examples include the Geospatial (GIS) Data Services at North Carolina State University[1], the Map and GIS Library[2] at Texas A&M University, and the Stanford Geospatial Center[3] at Stanford’s Branner Earth Sciences Library. Some institutions are actively organizing GIS across campus, such as the University of Minnesota’s USpatial Initiative[4].

Three prototypical arrangements that can be seen across universities are outlined below. They are by no means an exhaustive enumeration of all possible approaches, but they still serve to illustrate how the effects of big data, cloud/virtual computing, and broad or narrow user groups are relevant to each. The strengths and weaknesses of each organizational approach are characterized by the key qualities of cost, reliability, and expertise. Today’s changing computing landscape affects not only the types and amount of data and services that can be provided but, even more importantly, user expectations of each.

Independent or Stand-alone Library Units

At one end of the spectrum are dedicated MGDLs, which are stand-alone units within a larger university library system. In many instances, these units employ their own specialized staff, house their own software and hardware infrastructures, and curate large collections of both digital and hard-copy geographic data and geographic services. Library personnel manage all aspects of the data stored by the unit, along with the development, support, and hosting of the systems which serve these data out to on- and off-campus users. This organizational structure requires staff to have in-depth knowledge of GIS data, services, and applications in addition to standard best practices in data acquisition, curation, and delivery. The Map and GIS Library (Texas A&M University) and the Digital Map and Geospatial Information Center at the Lewis Library (Princeton University)[5] are examples of stand-alone organizations within a larger university library system.

Joint Ventures with Information Technology Services

A second organizational strategy employed by universities is to share the responsibilities of GIS data storage, dissemination, and support between the university library system and the university information technology services, which often handles the setup, management, and maintenance of specialized GIS databases on behalf of the university library. In many cases, the MGDL is treated as a customer, just like any other academic or nonacademic unit requiring storage and serving of large data volumes. Here, library staff are not required to have in-depth knowledge of server management, redundancy, backup, and so forth; instead they focus more effort on acquiring, curating, disseminating, and using GIS data. Examples of this partnership are found at NYU Data Services[6] (New York University) and the GIS Center[7] at Tisch Library (Tufts University).

College- and Department-level Coordination

MGDLs are also housed within traditional academic departments or colleges instead of as components of a larger university library administrative unit. The IT hardware, software, storage, and services are centralized within a group serving the GIS needs of the department or college. In these instances, one or more academic units or a centralized group takes on the role of data manager, curator, and service provider. One example of this arrangement is the USC Spatial Sciences Institute[8]. In addition to being standalone entities, MGDLs can exist at the department/college/unit level.

MGDLs at the department/college/unit level may not be as formalized as the first two examples, and they may not offer the same breadth of GIS services. Nonetheless, they are common within research universities where students, faculty, and staff work on shared GIS data as part of multiperson projects. The department, college, or unit maintains its own hardware and software infrastructure. The staff that supports this infrastructure might be highly trained in IT setup, support, and management, but not as well versed in GIS terminology, data, computing, services, or utilization.

Community-specific Coordination

Although housed at a single university, an MGDL may have a mission of serving the needs of a specific research community. There may be on-campus groups that produce GIS data, often of a specialized nature (for example, related to a particular research project), and even curate and provide access to these data. Or they can be more general, such as the Polar Geospatial Center[9] at the University of Minnesota, which provides GIS expertise not to the campus community but to the polar research community.

Roles and Responsibilities of Maps and GIS Data Libraries

The core functions that MGDLs perform in service of university communities vary greatly across institutions. These include core library functions such as collecting, archiving, cataloging, indexing (Steinhart 2006), and curating paper maps and other nondigital, tangible GIS or GIS-related data and artifacts. Other common MGDL roles include curating government data, archiving historical data, and digitizing paper-based and nonreadable file formats.

In addition to these traditional library roles, the core mission of MGDLs nearly always focuses on managing spatial information. This fact differentiates MGDLs from other library branches because much of the digital media they manage are GIS files, or spatially enabled tabular data that provide additional capabilities to GIS users far beyond that of the nonspatial data found in other repositories. To fulfill the need of effectively managing spatial data, MGDL staff must have traditional librarian skills as well as updated knowledge and skills that concentrate on digital librarianship and, often, training in GIS or related disciplines (Abresch, Hanson, and Reehling 2008).

These organizations often develop, but always maintain, systems to enable searching, retrieval, and storage of digital GIS data and in many cases provide some means for end-users to perform these tasks in a self-service manner (Ferguson 2013). Many organizations see this as a central role and provide GIS data for direct use within GIS applications and research 24/7 via Web services such as Web feature services and Web map services. In more advanced scenarios, these organizations build domain-specific desktop and Web applications to make these data more accessible to wide audiences, to serve specific research needs, or to expose geoprocessing services that operate on GIS data and allow users to perform complex spatial processing tasks with or on their own data.
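To make the idea of self-service access through a Web feature service concrete, the following minimal sketch builds a GetFeature request URL using the key-value parameters defined by the OGC WFS 2.0.0 standard. The endpoint `https://gis.example.edu/wfs`, the layer name `library:county_parcels`, and the helper `build_wfs_getfeature_url` are hypothetical and are not taken from any MGDL described in this article; the bounding-box axis order in practice depends on the layer's coordinate reference system.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_wfs_getfeature_url(base_url, type_name, bbox, max_features=100):
    """Build a WFS 2.0.0 GetFeature request URL (key-value-pair encoding).

    bbox is (min_x, min_y, max_x, max_y); axis order is assumed here and
    depends on the coordinate reference system of the layer being served.
    """
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,          # WFS 2.0.0 name for the layer parameter
        "bbox": ",".join(str(v) for v in bbox),
        "count": str(max_features),      # WFS 2.0.0 limit on returned features
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = build_wfs_getfeature_url(
    "https://gis.example.edu/wfs",
    "library:county_parcels",
    (-97.0, 30.0, -96.0, 31.0),
)
print(url)
```

A desktop GIS client or a script could issue this request at any hour without staff involvement, which is the self-service property the paragraph above describes.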

In addition to data services, it is common for MGDLs to act as the hub of GIS activity on campus. This role can entail such service-related activities as providing training (webinars or short and week-long courses), hosting on-campus GIS events such as GIS Day, building connections between departments, advertising GIS classes, linking students with professors and courses, and advancing GIS in nonacademic units by linking university operations with GIS research. Additional administrative roles that GIS libraries fulfill include managing the GIS site license and distributing evaluation copies of GIS software. Some MGDLs also manage GIS hardware and software purchasing and licensing for the university. Very active MGDLs also seek to serve the campus community by actively establishing partnerships across campus and with local, state, and national government agencies and with international partners (Dixon 2006).

An Example Scenario for the Maps and GIS Data Library of the Future

To explore specific opportunities and challenges that MGDLs will be confronted with in the big data era, as well as discuss the specific advantages and disadvantages afforded by new computing paradigms and GIS software resources in the context of changing user bases and user expectations, it might be useful to develop a scenario exemplifying many of these issues. For this purpose, we will consider the Gulf of Mexico Coastal Ocean Observing System (GCOOS),[10] a long-term initiative designed to monitor the offshore and nearshore marine environment of the Gulf of Mexico. This project helps researchers, policymakers, and the general public protect human health and safety, understand and protect marine ecosystems, and encourage responsible natural resource extraction and commerce on our planet’s waterways (Simoniello et al. 2011). This project is only one of several ocean observing systems across the globe. It was funded by numerous stakeholders at the local, regional, state, and federal levels including the State of Texas, NOAA, and the U.S. National Science Foundation (NSF).

In the GCOOS and other ocean observing systems (OOSs), data continually stream onshore from an extensive network of buoys floating in the Gulf of Mexico, each of which is laden with numerous sensors. GCOOS’s individual buoys spend their entire lives at sea, constantly sampling the air and marine environment, including temperature, salinity, wind speed and direction, current speed and direction, and a wealth of other information about meteorological, environmental, and biological conditions coming from known spatial locations, all of which is time stamped. These data represent the investment of millions upon millions of tax dollars and untold numbers of man-hours. They have enabled entire careers in oceanographic research, they are a staple data set in oceanographic teaching, in the Gulf States in particular, and they are routinely used at all levels of local, regional, state, and federal government for policymaking, both nationally and internationally. To achieve a return on these investments in terms of advancing our understanding of human and natural systems, building national and international economic trade, and promoting the thoughtful stewardship of our natural resources, these data must be made available to as many researchers as wish to work with them. They must be readily available, at all times, forever. They must be discoverable and usable, and they should not need to be stored in duplicate by every researcher who needs to work with them. In sum, these data, like many other field-collected data sets, are extremely valuable and are perfect candidates for curation by the MGDL of the future.

These data sets exhibit many, but not all, of the characteristics associated with big data. They are huge: readings from each individual sensor on each individual buoy can be reported at varying temporal resolutions, often as frequently as one reading per second. This works out to a very large amount of data because there are multiple sensors on each buoy, many buoys, and many seconds in the many years for which GCOOS plans to operate. Bigger data sets certainly exist (stock trading data, genome data, particle data, and the like), but the GCOOS data are of a respectable size for an initial entree into the big data arena by an MGDL inclined to curate them.
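The multiplication described above (sensors per buoy, times buoys, times seconds per year) can be worked through in a few lines. The network size and the bytes stored per reading below are purely hypothetical placeholders chosen for round numbers; they are not GCOOS figures.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def readings_per_year(buoys, sensors_per_buoy, readings_per_second=1):
    """Number of readings a network produces in one year."""
    return buoys * sensors_per_buoy * readings_per_second * SECONDS_PER_YEAR


def raw_bytes_per_year(buoys, sensors_per_buoy, bytes_per_reading):
    """Uncompressed storage needed for one year at one reading per second."""
    return readings_per_year(buoys, sensors_per_buoy) * bytes_per_reading


# Hypothetical network: 100 buoys, 10 sensors each, 32 bytes per stored reading.
n = readings_per_year(100, 10)
size = raw_bytes_per_year(100, 10, 32)
print(f"{n:,} readings/year, about {size / 1e12:.2f} TB/year uncompressed")
```

Under these assumed values the network produces roughly 31.5 billion readings and about one terabyte of raw data per year, which grows linearly with every added buoy, sensor, and year of operation.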

If MGDL librarians were to take this route, they would be thinking about how they would store and provide access to this rich set of real-time, streaming, spatiotemporal data flowing in from individual sensors on individual buoys within different networks. As soon as they began, they would start exposing cracks in current MGDL systems, organization, and structure that could cause them to consider these data unsuitable for curation within an MGDL. These challenges are big and include issues of data ownership and responsibility, data quality and reliability, data currency and frequency of updates, and the structure and formats used to interchange and represent these data. Despite these major challenges in handling big data, the MGDL community of today should attempt to follow in the footsteps of earlier implementers who fought through major concerns and limitations to build GIS into libraries, which resulted in the sophisticated MGDLs existing around the globe today. These and other issues not listed here are not small problems, and cloud and virtual computing cannot solve them all, but utilizing these new technologies can start the MGDL community down a path toward achieving major breakthroughs one at a time.

Data Ownership and Responsibility

Beginning with the most obvious issue first, we find that these data are someone else’s problem. An operation such as GCOOS could not exist without substantial public and private funding. This could mean that there is no reason an MGDL should house these data because some other organizations are being paid to collect and disseminate this information. In fact, the NSF has recently overhauled the data management requirements for all NSF-funded research (NSF 2013). One component of a proper data management plan is designing for long-term curation and dissemination of data obtained, collected, or computed with public funds. However, if one of the primary roles of libraries in our society is to act as the repository of human knowledge, it could be argued that an MGDL would be the perfect place to store these spatial data for long-term curation. This arrangement leverages a library’s core strengths to make these data discoverable and usable by the largest possible audience.

Data Quality and Reliability

Second, these data are dirty. Despite the best efforts of scientists and technicians, sensors fail and produce erroneous readings, especially in harsh marine conditions. Buoys come off their moorings and drift to the wrong locations. Network outages and battery failures result in lost readings for prolonged periods. Such issues are present in all environmental or remotely sensed data and present consistent headaches for data managers. In library settings, where the data stored and provided to library customers are supposed to be of final production quality, this issue might be prevalent enough to warrant omitting these data layers from library catalogs. However, techniques exist to validate the quality of these data and quantify levels of expected accuracy at the buoy and network levels. The parties responsible for the aggregation of these data can undoubtedly produce error bounds and quality assessments of sufficient rigor to make these data suitable for inclusion in the data warehouses of university libraries.

Data Currency and Update Frequency

Third, these data are continuously being updated. Data from the sensors in the GCOOS network arrive incessantly as time-stamped readings continuously stream in from each of the network’s multiple remote platforms. Such long-term readings form the basis for much of modern spatiotemporal analytical science and are one of the hallmarks of big data. However, this new form of streaming data collection poses interesting challenges to MGDLs that strive to make it usable, discoverable, storable, and archivable within their collections.

Would every reading inserted into a curated data layer constitute a new version? What about thousands or millions of readings all recorded at the same time? Is each of those a new and distinct version, or are they all part of a single update? What about hardware errors that cause clock drift on the devices out at sea, resulting in readings taken at the same time but reported to be at different times when the data arrive? These are just a few of the questions these streaming data pose.
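The article leaves these questions open. Purely as an illustration of one possible convention, and not a policy proposed by the authors, the sketch below treats every ingest batch as a single version and flags readings whose self-reported time disagrees with their arrival time by more than a tolerance. The field names (`batch_id`, `reported_at`, `received_at`), the 30-second tolerance, and the sample values are all assumptions, and a real system would need to separate clock drift from ordinary transmission delay.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_into_versions(readings):
    """Group readings by ingest batch so that each batch is one version."""
    versions = defaultdict(list)
    for r in readings:
        versions[r["batch_id"]].append(r)
    return dict(versions)


def flag_clock_drift(readings, tolerance=timedelta(seconds=30)):
    """Return readings whose reported time is more than `tolerance`
    away from the time they were received (a crude drift heuristic)."""
    return [
        r for r in readings
        if abs(r["received_at"] - r["reported_at"]) > tolerance
    ]


# Hypothetical sample: two batches, one sensor with a clock three minutes slow.
t0 = datetime(2014, 1, 1, 12, 0, 0)
readings = [
    {"sensor": "buoy-A/temp", "batch_id": 1, "value": 18.2,
     "reported_at": t0, "received_at": t0 + timedelta(seconds=5)},
    {"sensor": "buoy-B/temp", "batch_id": 1, "value": 18.9,
     "reported_at": t0 - timedelta(minutes=3),
     "received_at": t0 + timedelta(seconds=5)},
    {"sensor": "buoy-A/temp", "batch_id": 2, "value": 18.3,
     "reported_at": t0 + timedelta(seconds=60),
     "received_at": t0 + timedelta(seconds=66)},
]

versions = group_into_versions(readings)
suspect = flag_clock_drift(readings)
print(len(versions), "versions;", [r["sensor"] for r in suspect], "flagged")
```

Under this convention, millions of simultaneous readings would count as one update rather than millions of versions; other conventions, such as versioning by calendar day, are equally defensible.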

Data Standards and Formats

Fourth, and perhaps most importantly, big data derived from sensor measurements are generally recorded, transmitted, stored, and utilized in specialized, often nonstandard, formats that are unfamiliar to many GIS users. These users are accustomed to dealing with traditional GIS formats such as shapefiles, geodatabases, and to some extent Keyhole Markup Language (KML), which are not suited for high-resolution spatiotemporal data. Historically, GIS data formats have not involved time; thus GIS users have traditionally mixed, often without thought, information collected at different times. These data formats are essentially flat files containing individual rows associated with individual spatial locations. The columns describe the spatial location and nonspatial attributes associated with the object at one or more periods of time. To include a temporal component in these files, one of two methods can be employed.

In the first, the nonspatial attribute columns are duplicated as additional columns, one for each instance of time. This approach violates many of the normal forms of database design as the structure of a table should not depend on the data contained in it. This approach breaks fundamental best practices in database design because the number of columns in a table should not be determined by the number of readings a sensor has reported. Not all sensors report the same number of readings, meaning that some columns may be null if no readings are reported for that particular sensor. Such an approach could also result in an unbounded number of columns, because each time a new reading is added, additional sets of columns must be added to accommodate those readings.

In the second method, new rows are added to the file each time a new reading is observed. This results in panel data, or repeated measurements of the same phenomenon over time. Although better than the previous approach because the number of columns is constrained by the number of attributes associated with the sensor, files created in this manner are still inefficient from a database perspective because the geographic data and other stationary data values (those that do not change over time) are repeated for each of the sensor measurements.
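The two flat-file methods can be compared side by side on a toy data set. The sketch below is illustrative only: the sensor identifiers, coordinates, temperatures, and the helper names `to_wide` and `to_long` are invented for this example.

```python
# Hypothetical readings: (sensor_id, longitude, latitude, timestamp, temperature)
readings = [
    ("S1", -94.90, 28.98, "2014-01-01T00:00:00Z", 18.2),
    ("S1", -94.90, 28.98, "2014-01-01T00:00:01Z", 18.3),
    ("S1", -94.90, 28.98, "2014-01-01T00:00:02Z", 18.3),
    ("S2", -90.48, 28.87, "2014-01-01T00:00:00Z", 19.0),
]


def to_wide(readings):
    """Method 1: one row per sensor, one temperature column per timestamp."""
    timestamps = sorted({r[3] for r in readings})
    header = ["sensor_id", "lon", "lat"] + [f"temp_{t}" for t in timestamps]
    rows = {}
    for sensor_id, lon, lat, ts, temp in readings:
        # Create the sensor's row on first sight, padded with nulls.
        row = rows.setdefault(
            sensor_id, [sensor_id, lon, lat] + [None] * len(timestamps)
        )
        row[3 + timestamps.index(ts)] = temp
    return header, list(rows.values())


def to_long(readings):
    """Method 2: one row per reading; the location repeats on every row."""
    header = ["sensor_id", "lon", "lat", "timestamp", "temp"]
    return header, [list(r) for r in readings]


wide_header, wide_rows = to_wide(readings)
long_header, long_rows = to_long(readings)
null_cells = sum(v is None for row in wide_rows for v in row)

print(len(wide_header), "columns in wide form,", null_cells, "null cells")
print(len(long_header), "columns in long form,", len(long_rows), "rows")
```

In the wide form the column count grows with every new timestamp and sensor S2 carries null cells for the seconds it did not report; in the long form the column count is fixed, but the coordinates of S1 are stored three times.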

The shortcomings associated with both these formats contrast with relational database structures whose multiple tables are used to store and represent the attributes of geographic objects at different times. This relational approach is routinely achieved with traditional shapefiles and geodatabase structures by using joins across tables. However, users who have attempted to achieve this goal are often frustrated with the limitations of current desktop GIS applications in handling relationships across numerous tables on very large data sets.
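A minimal sketch of the relational alternative, using Python's built-in `sqlite3` module, stores each sensor's location once and each reading once, then recombines them with a join. The table and column names (`sensor`, `reading`, `observed_at`) and the sample values are assumptions made for illustration; a production system would more likely use a spatially enabled database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stationary attributes: stored once per sensor.
    CREATE TABLE sensor (
        sensor_id TEXT PRIMARY KEY,
        lon REAL NOT NULL,
        lat REAL NOT NULL
    );
    -- Time-varying attributes: one row per reading.
    CREATE TABLE reading (
        sensor_id TEXT NOT NULL REFERENCES sensor(sensor_id),
        observed_at TEXT NOT NULL,
        temperature REAL,
        PRIMARY KEY (sensor_id, observed_at)
    );
""")

conn.executemany(
    "INSERT INTO sensor VALUES (?, ?, ?)",
    [("S1", -94.90, 28.98), ("S2", -90.48, 28.87)],
)
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [
        ("S1", "2014-01-01T00:00:00Z", 18.2),
        ("S1", "2014-01-01T00:00:01Z", 18.3),
        ("S1", "2014-01-01T00:00:02Z", 18.3),
        ("S2", "2014-01-01T00:00:00Z", 19.0),
    ],
)

# The join rebuilds the long-form view on demand without storing it.
rows = conn.execute("""
    SELECT s.sensor_id, s.lon, s.lat, r.observed_at, r.temperature
    FROM reading AS r
    JOIN sensor AS s ON s.sensor_id = r.sensor_id
    ORDER BY s.sensor_id, r.observed_at
""").fetchall()

for row in rows:
    print(row)
```

The schema stays fixed no matter how many readings arrive, and a correction to a buoy's position touches a single row in `sensor` rather than every reading.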

These and other storage and representation shortcomings of traditional GIS data formats have led to the development of discipline-, project-, and agency-specific data storage and transmission formats that are commonly used for either environmentally sensed data or data highly focused on time-series analysis. One example is NetCDF, a data format routinely used in climatic, atmospheric, and oceanographic science settings and which would be of great use to MGDL customers interested in such phenomena. The challenge for MGDL librarians who seek entry into the real-time, massive, and pervasive data storage, curation, and management business is how to integrate standard file formats used within the GIS community with the formats specific to communities that generate, store, and utilize spatially and temporally enabled data. For data to be used, they must be provided in formats known and used in the communities of interest. Because one library mission is to make data accessible, MGDLs represent an ideal setting in which to undertake initiatives aimed at making data created by diverse communities discoverable and usable by clients outside their historical user base.

Considerations of the Maps and GIS Data Library of the Future

There is no doubt that MGDLs will continue to be impacted by new technologies like cloud and virtual computing as well as big data. MGDLs will need to adapt themselves, their staff, their computing infrastructure, and the services they provide to take advantage of the assets that these changes bring. However, no single strategy will fit all organizations; there is a myriad of wrong strategies that could be adopted with disastrous results, in the worst case resulting in wasted resources and unhappy library patrons. Alternatively, in the best case, these technology advances could streamline operations, making them more resistant to failure and adaptable to change, while at the same time providing rich, novel, and expanded services to entirely new customers, enabling innovative research, education, and outreach.

With the prior GCOOS data set example in mind, what benefits could cloud computing and virtual machines provide that would allow an MGDL to be successful in such an endeavor? The following sections detail several factors that university administrators and MGDL managers and staff would need to weigh in determining the feasibility of creating such a system.

Up-front and Ongoing Costs

Cost is a primary driver for libraries providing GIS data and services. Fortunately, storage is decreasing in cost, enabling a higher archival service capacity (Erwin and Sweetkind-Singer 2009). The costs of hardware, software, and storage, along with the availability and cost of staff (and their concomitant training), must also be considered. Important questions abound, but they include, at a minimum: What would be the cost of installing, maintaining, and supporting the data and services provided by the MGDL? Are there considerable setup or ongoing costs to keep the services running? Can economies of scale be achieved by combining back-end infrastructure to support GIS databases as well as other library databases? Is specialized staff necessary? Are these individuals high priced and is expensive, specialized training or effort required? What are the best practices for backing up critical GIS data while maintaining an affordable daily operation?

One of the hallmarks of both cloud and virtual computing is that they are cost effective. In the case of cloud computing, an MGDL pays only for what it uses. The cost of renting storage or computer space would not include the direct hardware cost of purchasing servers or hard-disk arrays, because the cloud provider typically pays for these when the cloud is built. The MGDL organization instead pays fees based on the usage of both disk space and CPU time separately. Similarly, there may be no cost for the setup, installation, support, or maintenance of the actual servers used, because the cloud provider would do these as well. No specialized staff would be necessary to manage the servers, only staff necessary to manage the data and systems related to the tasks of importing and making the data available, as would be the case if the MGDL hosted the data on a physical server in a rack in its data center.

In contrast, an on-campus, hosted, virtual machine approach would require that all these costs be covered by the MGDL or another organization within the university. Specialized staff would be needed to set up and maintain the virtual machine environment, and the hardware would need to be purchased, both servers and disk arrays. But if the data are sensitive, such as personal health information, Institutional Review Boards (IRBs) or other regulations may require that these data remain on campus; in that case, a virtual machine approach would be the only option.
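The contrast between the two cost structures can be expressed as a back-of-the-envelope comparison. Every number below (storage volume, prices, hardware cost, hardware lifetime, staff cost) is a hypothetical placeholder, and spreading the hardware purchase evenly over its lifetime is a simplifying assumption of this sketch rather than a method from the article.

```python
def cloud_monthly_cost(storage_gb, cpu_hours, price_per_gb_month, price_per_cpu_hour):
    """Pay-for-what-you-use: disk space and CPU time are billed separately."""
    return storage_gb * price_per_gb_month + cpu_hours * price_per_cpu_hour


def on_campus_monthly_cost(hardware_cost, lifetime_months, staff_cost_per_month):
    """Purchased servers and disk arrays spread over their lifetime, plus staff."""
    return hardware_cost / lifetime_months + staff_cost_per_month


# Hypothetical inputs, for illustration only.
cloud = cloud_monthly_cost(
    storage_gb=2000, cpu_hours=720,
    price_per_gb_month=0.05, price_per_cpu_hour=0.10,
)
campus = on_campus_monthly_cost(
    hardware_cost=36000, lifetime_months=48, staff_cost_per_month=1500,
)
print(f"cloud: {cloud:.2f} per month, on campus: {campus:.2f} per month")
```

The comparison deliberately omits the staff who manage the data themselves, since the text above notes that this effort is needed under either approach; an MGDL would substitute its own quotes and salaries before drawing any conclusion.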

Availability and Reliability

Organizations like MGDLs that collect, store, curate, and provide data and services as core operational functions must, for better or worse, meet certain user expectations that are not generally placed on campus academic departments or research organizations. By virtue of the role of a library, these organizations serve a user population that expects data and services to be always available, at any time from any place, and, most importantly, quickly and reliably. Thus MGDLs must consider: How reliable are their provided maps and services? Are quality-of-service guarantees in place for customers for their data and services? Can these data and services be accessed at all times and from anywhere? Do off-site, real-time replicated systems exist in case of catastrophic failure at a primary data center? Are there dedicated support personnel in place to handle IT hardware and software issues? Is there a help desk available to walk users through the use of GIS data and services? Are personnel on call to reboot services?

In the context of a cloud-based solution, many of these issues become nonissues because it is the responsibility of the cloud provider to guarantee quality-of-service levels, as well as backups, replication, and appropriate connectivity to support users when, where, and how fast they need data. A virtual-machine solution would require that all these issues be addressed by dedicated staff; sufficient investments in hardware, software, and connectivity; and appropriate disaster planning. In both situations, the domain knowledge necessary to support a help desk would still fall primarily to MGDL staff.

Levels of GIS-specific IT Expertise Required

The types of services an MGDL provides require a wide spectrum of general and specific GIS and IT expertise. For example, data hosting and curation services using traditional MGDL scenarios of an in-house data center and self-managed servers require staff that can set up, administer, and maintain a series of hardware resources to ensure consistent levels of access. Given that libraries have been responsible for the storage and maintenance of voluminous digital data since such data arrived on university campuses, it is not a stretch to assume that staff with server expertise are available either within or to the MGDLs. However, GIS data and services differ from other forms of digital data in several key ways. First, GIS data volumes at high spatial and temporal resolution, or covering a large geographic extent, can be large enough that storage space becomes an issue. Second, GIS data have by their very nature a spatial component, which until recently required specialized databases or application layers to enable efficient storage and retrieval. This meant that IT support staff within MGDLs had to become well versed in additional specialized server, file, and database platforms, beyond the normal scope of library operations. Third, MGDLs that provide access to GIS data layers may have additional bandwidth requirements due to the complexity of geographic data; for example, millions of data points may be required to delineate the boundaries of land parcels for a single county.

In a cloud-based solution, the cost of supporting massive GIS databases can be calculated in advance if the data set sizes are known and are relatively stable once agreements are made and if specialized IT staff are not needed to manage the server or file systems. However, GIS administrative staff will most likely still need to manage the actual data, because this is a specialized task that most cloud providers do not undertake. An organization could choose to store as much or as little data on the cloud as it wished in a heterogeneous data storage approach, trading accessibility and reliability for cost. A virtual machine solution would require that all specialized staffing needs be met in-house.

Responsiveness to Changes, Trends, and Fads in Technology

Like other technology-heavy fields, GIS both benefits and suffers from rapid advances drawn from its base disciplines, including computer science and engineering, information technology, math, and statistics. Today GIS is also being impacted by rapid advances in database technology, internet connectivity and technology, mobile devices and sensors, and desktop and server infrastructures. The pace of these advances will continue to quicken. MGDLs are particularly susceptible to the changes wrought by technical advances. Library administrators must decide whether they should rapidly embrace these changes and attempt to keep up with the latest GIS protocols and advances in data storage and transmission. Librarians need to ask themselves, “Is there a long-term benefit to becoming early adopters of any of these new techniques or technologies or is it better to wait until they become de facto or actual standards used as best practices in industry, government, and academic settings?”

Cloud and virtual machine-based solutions would give MGDLs a way to prototype new GISs, data formats, and service technologies as they are developed by researchers. One of the main benefits common to both approaches is that creating additional, small, virtual test machines or cloud instances is generally inexpensive enough and rapid enough to enable MGDLs to be on the technological cutting edge if they so wish. An additional benefit is that if a project is abandoned, the virtual machine or cloud instance can simply be deleted.

Enabling Data as Publication

Recently there has been a push to publish the data supporting research reports (Lawrence et al. 2011; Costello 2009). Such a project could be a perfect role for the MGDL of the future because it fits its core mission of data curation: once published, data will be available all the time, for all time. In order for MGDLs to attract customers to provide their data sets, MGDLs must convince them that sufficient levels of redundancy, reliability, and trustworthiness are in place (Costello 2009; Klump 2011).

MGDLs and institutional repositories can help fulfill the requirement of data sustainability that funding institutions (such as NSF) are currently imposing on research endeavors. Academic departments and individual research labs, in contrast, have less expertise in curation of research data. Departments or colleges are certainly not well suited for this type of data curation because it requires a dedicated budget to maintain the uptime considered essential if data are to be treated as a product with the hopes of future research using and expanding upon them. Many departments might find unfunded curation and hosting of data impossible given their financial situation.

Ensuring consistent, reliable, on-demand data availability to an unknown number of future users represents a difficult challenge for MGDLs that seek to host data as citable publications or resources. If a particular data set should become extremely popular, requests for the data or service could quickly mount. The hallmarks of both cloud computing and virtualization are reliability, redundancy, scalability, and availability. The problem of always-available data can be readily solved using external commercial cloud service providers or by creating on- or off-campus virtualization solutions.

DISCUSSION AND CONCLUSIONS

MGDLs are once again at a point in their history where they are confronted with a changing landscape of technological options to support the services they provide. New types and dramatically different amounts of data must be stored and supported to enable these services. User bases are changing, and users’ needs will only continue to grow and become more diverse in the future. This is not the first time these types of issues have been faced by this community. The paths by which existing MGDLs have grown from their creation until the present have been marked by the community’s ability to react swiftly to adopt new technological developments that enable them to further their core mission of serving their campuses efficiently and cost effectively.

Libraries need to continue using traditional best practices concerningmetadata, curation, and archiving, while taking innovative approaches to dealwith streaming big data that will fuel discovery among researcher commu-nities and facilitate scholarly work under the premises of the ever-changingtechnological era while maintaining the principles of librarianship. MGDLshave an opportunity to take a leadership role in the developing strategiesand approaches for providing a needed and fundable service to the researchcommunity in their potential ability to act as repositories for very large, veryimportant, and very expensive data sets.

However, changes in the ways MGDLs operate, store, and provide data,along with expanding the types of data needing support, will require keydecisions about the services they provide in the future that will affect allaspects of service planning, budgeting, support, maintenance, and delivery(Fan et al. 2013). In turn, these decisions affect hardware and softwarepurchasing and hiring decisions. Furthermore, administrators of MGDLs mustmake choices that will allow them to remain vigilant to changes and newtrends in research, funding, and technology, and to strategically adapt theirservices accordingly so as to remain relevant and avoid obsolescence.

This article has outlined the broad set of services MGDLs offer anddiscussed the technology, staffing, and expertise required in terms of gen-eral information technology, specialized hardware and software require-ments, and specific GIS expertise necessary to provide these services.Cloud computing and virtual machines each provides significant, sometimessimilar—sometimes complementary—strengths as two potential options thatcould be utilized to help MGDL meet these and other future goals. Thenew computing capabilities have the potential to reduce hardware and laborcosts and responsibilities, while at the same time increase the capacity ofMGDLs and provide platforms to launch new services to better serve theirconstituents.

However, cloud computing and virtual machines are not the silver bullets that will solve every issue confronting MGDLs. Staff with specialized GIS training in the management and use of spatial data will always be necessary. Dramatically increasing data sizes will increase costs, no matter if a cloud solution, virtual machine, or standard server approach is used. Adding new services and data types to expand the user base also increases support costs.

The intent of this article was not to provide a recipe that any MGDL could follow; instead, it was to highlight key areas for thoughtful reflection within an organization so that options could be discussed, weighed, and appropriate decisions made in accordance with the constraints and needs of a particular MGDL. No one option will work for two MGDLs, and two options may be equally viable for one MGDL. Further analysis as well as prototype implementations need to be attempted by MGDLs of different types, MGDLs using different institutional organizational frameworks, and MGDLs providing different levels and types of services to different types of end users. Published results of both success stories and miserable failures will provide important data with which future decisions can be made.

NOTES

1. https://www.lib.ncsu.edu/gis/
2. http://library.tamu.edu/maps-gis
3. http://lib.stanford.edu/GIS
4. https://uspatial.umn.edu/
5. http://www.princeton.edu/~geolib/gis/
6. http://nyu.libguides.com/GIS
7. https://wikis.uit.tufts.edu/confluence/display/GISatTufts/Home
8. http://spatial.usc.edu/
9. http://www.pgc.umn.edu
10. gcoos.org

REFERENCES

Abresch, J., A. Hanson, and P. Reehling. 2008. “Geographic Information and Library Education.” In Integrating Geographic Information Systems into Library Services: A Guide for Academic Libraries, ed. J. Abresch. Hershey, PA: Information Science Publishing.

Adler, P. S. 1995. Special issue of geographic information systems (GIS) and academic libraries: An introduction. The Journal of Academic Librarianship 21: 233–235.

Argentati, C. D. 1997. Expanding horizons for GIS services in academic libraries. The Journal of Academic Librarianship 23(6): 463–468. doi:10.1016/S0099-1333(97)90170-1

Armbrust, M., A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, and G. Lee. 2010. A view of cloud computing. Communications of the ACM 53(4): 50–58.

Association of Research Libraries. 1999. The ARL Geographic Information Systems Literacy Project. SPEC Kit 238. Washington, DC: ARL Office of Leadership and Management Services.

Baun, C., M. Kunze, J. Nimis, and S. Tai. 2011. Cloud computing: Web-based dynamic IT services. Berlin, Heidelberg: Springer-Verlag.

Brown, B., M. Chui, and J. Manyika. 2011. Are you ready for the era of ‘big data’? McKinsey Quarterly 4: 24–35.

Bughin, J., M. Chui, and J. Manyika. 2010. Clouds, big data, and smart assets: Ten tech-enabled business trends to watch. McKinsey Quarterly 56(1): 75–86.

Cobb, D. A. 1995. Developing GIS relationships. The Journal of Academic Librarianship 21(4): 275–277.

Costello, M. J. 2009. Motivating online publication of data. BioScience 59(5): 418–427.

Dixon, J. B. 2006. Essential collaboration: GIS and the academic library. Journal of Map & Geography Libraries 2(2): 5–20.

Erwin, T., and J. Sweetkind-Singer. 2009. The National Geospatial Digital Archive: A collaborative project to archive geospatial data. Journal of Map & Geography Libraries 6(1): 6–25.

Fan, X., S. Wu, Y. Ren, and F. Deng. 2013. “An Approach to Providing Cloud GIS Services Based on Scalable Cluster.” 21st International Conference on Geoinformatics, Kaifeng, China, June 20–22. IEEE Xplore Digital Library, 1–4. doi:10.1109/Geoinformatics.2013.6626132

Ferguson, C. 2013. Technology left behind—GIS and the library: Part 1. Against the Grain 18(6): 42.

Herold, P., T. D. Gale, and T. P. Turner. 1999. Optimizing web access to geospatial data: The Cornell University Geospatial Information Repository (CUGIR). Issues in Science and Technology Librarianship 21. http://www.istl.org/99-winter/article2.html

Klump, J. 2011. Criteria for the trustworthiness of data centres. D-Lib Magazine 17(1): 6.

Lawrence, B., C. Jones, B. Matthews, S. Pepler, and S. Callaghan. 2011. Citation and peer review of data: Moving towards formal data publication. International Journal of Digital Curation 6(2): v.

Lohr, S. 2012. The Age of Big Data. New York Times Sunday Review. Opinion Pages, February 11. http://www.nytimes.com/2012/02/12/sunday-review/big-datas-impact-in-the-world.html?pagewanted=all&_r=0

Manyika, J., M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers. 2011. Big data: The Next Frontier for Innovation, Competition, and Productivity. (Report. McKinsey Global Institute) http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

Miller, M. 2008. Cloud computing: Web-based applications that change the way you work and collaborate online. Edinburgh Gate, Harlow, UK: Que Publishing, Pearson Education.

National Science Foundation. 2013. Chapter II: Proposal Preparation Instructions, Data Management Plan. http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_2.jsp#dmp

North, G. W. 1989. Will your library be the spatial data information center of the future? Inspel 23(2): 130–136.

Perera, P. 2008. “The (unknown) Role of Map Librarian and the Challenges Faced in Satisfying the Cartographic User’s Needs.” In e-LIS, e-prints in Library and Information Science. Presented at the National Conference on Library & Information Studies (NACLIS 2008), June 24. Colombo, Sri Lanka: Sri Lanka Foundation Institute.

Ramasubramanian, L., and D. W. Goldberg. 2012. GIS adoption and use on college campuses: An end-of-year review and look ahead to 2012. Directions Magazine. Jan. 12. http://www.directionsmag.com/articles/gis-adoption-and-use-on-college-campuses-an-end-of-year-review-and-loo/225329

Shekhar, S., V. Gunturi, M. R. Evans, and K. Yang. 2012. “Spatial Big-Data Challenges Intersecting Mobility and Cloud Computing.” In Proceedings of the Eleventh ACM International Workshop on Data Engineering for Wireless and Mobile Access. Eleventh ACM International Workshop, Scottsdale, AZ, May 20–24. New York: Association for Computing Machinery, 1–6.

Simoniello, C., A. E. Jochens, M. K. Howard, J. Swaykos, D. R. Levin, D. Stone, B. Kirkpatrick, and S. Kobara. 2011. Making Sense of Ocean Sensing: The Gulf of Mexico Coastal Ocean Observing System Links Observations to Applications. In Sensing Technologies for Global Health, Military Medicine, Disaster Response, and Environmental Monitoring; and Biometric Technology for Human Identification VIII, 802918. (Proceedings of SPIE; Vol. 8029), doi:10.1117/12.883115

Sinton, D. S., and J. J. Lund. 2007. Understanding place: GIS and mapping across the curriculum. Redlands, CA: ESRI.

Spiegel, S., and J. Kinikin. 2004. Promoting geographic information system usage across campus. Computers in Libraries 24(5): 10–12.

Steinhart, G. 2006. Libraries as distributors of geospatial data: Data management policies as tools for managing partnerships. Library Trends 55(2): 264–284.

Vouk, M. A. 2004. Cloud computing: Issues, research and implementations. Journal of Computing and Information Technology 16(4): 235–246.

West, R. 2001. Understanding ArcSDE: GIS by ESRI. Redlands, CA: Esri Press.
