

Cover Page for Proposal Submitted to the National Aeronautics and Space Administration

NASA Proposal Number : TBD on Submit

NASA PROCEDURE FOR HANDLING PROPOSALS

This proposal shall be used and disclosed for evaluation purposes only, and a copy of this Government notice shall be applied to any reproduction or abstract thereof. Any authorized restrictive notices that the submitter places on this proposal shall also be strictly complied with. Disclosure of this proposal for any reason outside the Government evaluation purposes shall be made only to the extent authorized by the Government.

SECTION I - Proposal Information

Principal Investigator : Alyssa Goodman
E-mail Address : [email protected]
Phone Number : 617-495-9278
Street Address (1) : 60 Garden St
Street Address (2) : MS 42
City : Cambridge
State / Province : MA
Postal Code : 02138-1516
Country Code : US
Proposal Title : The ADS All Sky Survey
Proposed Start Date : 01 / 01 / 2012
Proposed End Date : 12 / 31 / 2013
Total Budget : 399,954.00
Year 1 Budget : 194,255.00
Year 2 Budget : 205,699.00
Year 3 Budget : 0.00
Year 4 Budget : 0.00

SECTION II - Application Information

NASA Program Announcement Number : NNH11ZDA001N-ADAP
NASA Program Announcement Title : Astrophysics Data Analysis
For Consideration By NASA Organization (the soliciting organization, or the organization to which an unsolicited proposal is submitted) : NASA, Headquarters, Science Mission Directorate, Astrophysics
Date Submitted :
Submission Method : Electronic Submission Only
Grants.gov Application Identifier :
Applicant Proposal Identifier :
Type of Application : New
Predecessor Award Number :
Other Federal Agencies to Which Proposal Has Been Submitted :
International Participation : Yes
Type of International Participation : Facility

SECTION III - Submitting Organization Information

DUNS Number : 082359691
CAGE Code : 1NQH4
Employer Identification Number (EIN or TIN) : 042103580
Organization Type : 2J
Organization Name (Standard/Legal Name) : Harvard College
Company Division :
Organization DBA Name :
Division Number :
Street Address (1) : 1350 MASS AVE STE 600
Street Address (2) :
City : CAMBRIDGE
State / Province : MA
Postal Code : 02138-3846
Country Code : USA

SECTION IV - Proposal Point of Contact Information

Name : Alyssa Goodman
Email Address : [email protected]
Phone Number : 617-495-9278

SECTION V - Certification and Authorization

Certification of Compliance with Applicable Executive Orders and U.S. Code

By submitting the proposal identified in the Cover Sheet/Proposal Summary in response to this Research Announcement, the Authorizing Official of the proposing organization (or the individual proposer if there is no proposing organization) as identified below:

• certifies that the statements made in this proposal are true and complete to the best of his/her knowledge;

• agrees to accept the obligations to comply with NASA award terms and conditions if an award is made as a result of this proposal; and

• confirms compliance with all provisions, rules, and stipulations set forth in the two Certifications and one Assurance contained in this NRA (namely, (i) the Assurance of Compliance with the NASA Regulations Pursuant to Nondiscrimination in Federally Assisted Programs, and (ii) Certifications, Disclosures, and Assurances Regarding Lobbying and Debarment and Suspension).

Willful provision of false information in this proposal and/or its supporting documents, or in reports required under an ensuing award, is a criminal offense (U.S. Code, Title 18, Section 1001).

Authorized Organizational Representative (AOR) Name AOR E-mail Address Phone Number

AOR Signature (Must have AOR's original signature. Do not sign "for" AOR.) Date

FORM NRESS-300 Version 3.0 Apr 09


PI Name : Alyssa Goodman

Organization Name : Harvard College

NASA Proposal Number : TBD on Submit

Proposal Title : The ADS All Sky Survey

SECTION VI - Team Members

Team Member Role : PI
Team Member Name : Alyssa Goodman
Contact Phone : 617-495-9278
E-mail Address : [email protected]
Organization/Business Relationship : Harvard College
Cage Code : 1NQH4
DUNS# : 082359691
International Participation : No
U.S. Government Agency :
Total Funds Requested : 0.00

Team Member Role : Co-I
Team Member Name : August Muench
Contact Phone : 617-495-7979
E-mail Address : [email protected]
Organization/Business Relationship : Smithsonian Institution/Smithsonian Astrophysical Observatory
Cage Code : 1PPP1
DUNS# : 003261823
International Participation : No
U.S. Government Agency : Smithsonian Institution
Total Funds Requested : 1,000.00

Team Member Role : Co-I/Science PI
Team Member Name : Alberto Pepe
Contact Phone : 310-600-3929
E-mail Address : [email protected]
Organization/Business Relationship : HARVARD COLLEGE, PRESIDENT & FELLOWS OF
Cage Code : 82368
DUNS# : 001963263
International Participation : No
U.S. Government Agency :
Total Funds Requested : 0.00


SECTION VII - Project Summary

We will create the first interactive sky map of astronomers' understanding of the Universe over time. We will accomplish this goal by turning the NASA Astrophysics Data System (ADS), widely known for its unrivaled value as a literature resource, into a data resource.

GIS and GPS systems have made it commonplace to see and explore information about goings-on on Earth in the context of maps and timelines. Our proposal shows an example of a program that lets a user explore which countries have been mentioned in the New York Times, on what dates, and in what kinds of articles. By analogy, the goal of our project is to enable this kind of exploration, on the sky, for the full corpus of astrophysical literature available through ADS.

Our group's expertise and collaborations uniquely position us to create this interactive sky map of the literature, which we call the "ADS All-Sky Survey." To create this survey, we will follow these principal steps.

First, by analogy to "geotagging," we will "astrotag" the ADS literature. Many "astrotags" effectively already exist, thanks to curation efforts at both CDS and NED. These efforts have created links to "source" positions on the sky associated with each of the millions of articles in the ADS. Our collaboration with ADS and CDS will let us automatically extract astrotags for all existing and future ADS holdings.

The new ADS Labs, which our group helps to develop, includes the ability for researchers to filter article search results using a variety of "facets" (properties of the article, sources, keywords, authors, observatories, etc.). Using only extracted astrotags and facets, we can create functionality like what is described in the Times example above: we can offer a map of the density of positions' "mentions" on the sky, filterable by the properties of those mentions. Using this map, researchers will be able to discover, interactively and visually, what regions have been studied for what reasons, at what times, and by whom.

Second, where images can be extracted from articles, we will attempt to "astroreference" those images in order to allow for their overlay on the sky. "Astroreferencing" is the analog of "georeferencing," where coordinate information is used to overlay information on maps. Our first pass at astroreferencing will be made using the astrometry.net program, in collaboration with one of its creators. If enough optically visible stars are present in an image, astrometry.net can place it where it goes on the sky. Only a small fraction of ADS holdings contain images solvable by astrometry.net, but for the articles which do, reviving the data in this way holds tremendous value, especially in the case of historically important observations.

Lastly, we will also astroreference images by text-mining to extract "metadata" buried in the figure captions and text.

As it is built, the ADSASS will effectively create dynamic data layers of astrotags and astroreferenced images. Users will be able to explore these layers using a wide variety of free all-sky data viewers. Our group and our collaborators have been involved in the development of the WorldWide Telescope and Aladin programs, so we will use those to develop examples of how we intend for the ADSASS to be used. But we plan to ensure that the data feed represented by the ADSASS will be ingestible by any program capable of understanding sky coordinates and all-sky views.

Our proposal can only give a glimpse into the wealth of science it will enable, which includes everything from observation-planning to data discovery to studying the sky distributions of classes of objects. Just as it would have been hard to predict the full and amazing impact of GIS and GPS on society, it is similarly hard to gauge the full impact of the NASA ADSASS. The ADS on its own is already the envy of other sciences as a unified research tool; with the advent of the ADSASS, NASA will have led the way to the future once again.


SECTION VIII - Other Project Information

Proprietary Information

Is proprietary/privileged information included in this application?

Yes

International Collaboration

Does this project involve activities outside the U.S. or partnership with International Collaborators?

Yes

Principal Investigator : No
Co-Investigator : No
Collaborator : No
Equipment : No
Facilities : Yes

Explanation : CDS/Observatoire Astronomique - access to SIMBAD object database internal

NASA Civil Servant Project Personnel

Are NASA civil servant personnel participating as team members on this project (include funded and unfunded)?

No

Fiscal Year Fiscal Year Fiscal Year Fiscal Year Fiscal Year

Number of FTEs Number of FTEs Number of FTEs Number of FTEs Number of FTEs


Historical Site/Object Impact

Does this project have the potential to affect historic, archeological, or traditional cultural sites (such as Native American burial or ceremonial grounds) or historic objects (such as an historic aircraft or spacecraft)?

No

Explanation:


SECTION IX - Program Specific Data

Question 1 : Short Title:

Answer: The ADS All Sky Survey

Question 2 : Type of institution:

Answer: Educational Organization

Question 3 : Will any funding be provided to a federal government organization including NASA Centers, JPL, other Federal agencies, government laboratories, or Federally Funded Research and Development Centers (FFRDCs)?

Answer: Yes

Question 4 : Is this Federal government organization a different organization from the proposing (PI) organization?

Answer: N/A

Question 5 : Does this proposal include the use of NASA-provided high end computing?

Answer: No

Question 6 : Research Category:

Answer: 1) Theory/computer modeling

Question 7 : Team Members Missing From Cover Page:

Answer:

Alberto Accomazzi, SAO ADS, Cambridge, MA, facility provider
Alberto Conti, STScI, Baltimore, MD, facility provider and data analyst
Thomas Boch, CDS, Strasbourg, FRANCE, resource provider
Rahul Dave, Harvard College Observatory, Cambridge, MA, data analyst
Jonathan Fay, Microsoft Research, Redmond, WA, software developer and resource provider
David Hogg, NYU, Astrometry.net, resource provider

Question 8 : This proposal contains information and/or data that are subject to U.S. export control laws and regulations including Export Administration Regulations (EAR) and International Traffic in Arms Regulations (ITAR).

Answer: No

Question 9 : I have identified the export-controlled material in this proposal.

Answer: N/A

Question 10 : I acknowledge that the inclusion of such material in this proposal may complicate the government's ability to evaluate the proposal.

Answer: N/A

Question 11 : Are you planning for undergraduate students to be involved in the conduct of the proposed investigation?

Answer: No

Question 12 : If yes, how many different undergraduate students?

Answer: N/A

Question 13 : What is the total number of student-months of involvement for all undergraduate students over the life of the proposed investigation?

Answer:

Question 14 : Provide the names and current year (1,2,3,4) for any undergraduate students that have already been identified.

Answer:

Question 15 : Are you planning for graduate students to be involved in the conduct of the proposed investigation?

Answer: No

Question 16 : If yes, how many different graduate students?

Answer: N/A

Question 17 : What is the total number of student-months of involvement for all graduate students over the life of the proposed investigation?

Answer:

Question 18 : Provide the names and current year (1,2,3,4, etc.) for any graduate students that have already been identified.

Answer:

Question 19 : Primary Research Area:

Answer: Astrophysical Databases

Question 20 : Secondary Research Area:

Answer: Interstellar Medium and Galactic Structure

Question 21 : Principal Dataset:

Answer: Other (must identify)

Question 22 : If other, please identify the dataset.

Answer: NASA Astrophysics Data System (ADS)

Question 23 : Additional Dataset:

Answer: Other (must identify)

Question 24 : If other, please identify the dataset.

Answer: CDS SIMBAD Object Database and NASA/NED Object databases


CONTENTS

1. Scientific Justification
   1. Introduction
   2. Technical Approach and Methodology
   3. Perceived Impact: Sample Science
   4. Plan of Work
   5. Data Sharing Plan
   6. Relevance of Proposed Work to NASA
2. References
3. Biographical Sketches
4. Current and Pending Support
5. Budget Justification
   Narrative
   Details
6. Letters of Support
7. Subcontracts


1. INTRODUCTION

Overview

Our proposal will turn the NASA ADS, widely known for its unrivaled value as a literature source, into a data source. Many astronomers do not realize that the “D” in “ADS” stands for data, and that the service as originally envisioned included access to tables, spectra, and images as well as to articles (Murray et al. 1992). Today, our “Seamless Astronomy” group (a collaboration including ADS staff and several others) is working to make the original ADS plans come true, by connecting ADS’s literature holdings to data sources in multiple new ways.

In ongoing work not covered by this proposal, our group is working with existing data repository systems at Harvard to create a platform, interoperable with ADS, that enables future papers to include persistent links to the data sources used within them. But, what about the past? This proposal is about extracting data that are discoverable within the millions of articles now on ADS by automatically adding metadata, thereby making old papers come alive as the source for a fully “new” historical data layer available to researchers worldwide.

While some of the data we will resurrect comes originally from NASA missions, our primary “NASA” data source in this proposal is the NASA ADS archive itself. We realize that this makes our proposal a bit unusual, but we are convinced that the ADS is an untapped source of valuable data for astronomical research.

Outcomes

Before we explain why and how we think we can bring ADS data to life, allow us to explain what will be produced when we are done, and how we think it will be used.

We will create a data layer, the ADS All-Sky Survey (ADSASS), usable by any program that can read FITS files, or understand astronomical coordinates, based on all data and pointers to data currently “extractable” from ADS articles. Examples of such data include:

1. existing “data” pointers within or linked to the article, e.g. through CDS’s SIMBAD or NASA’s NED service;

2. optical[1] data shown as images within articles (even when no coordinates or scale are given with the figure);

3. non-optical data shown as images within articles, in cases where enough metadata are available within the text to assign either a position or source name, and optionally a plate scale and waveband.

With this data layer in hand, we envision two main initial “products” as our objectives: 1) an all-sky map showing what parts of the sky have been written about in the literature, and for what reasons; and 2) a historical “data layer” offering literature-extracted images for analysis and/or overlay on contextual images. We request funding here only for these two initial products, but we emphasize that it should not be difficult for researchers (including us) to extend this service to include catalog data, spectral data, and more advanced literature filtering.
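As a rough illustration of the first product, the sketch below counts literature “mentions” per sky cell, which is the basic quantity an all-sky density map would display. The function name `sky_density`, the 10-degree cell size, and the input format (pairs of right ascension and declination in decimal degrees) are assumptions made for illustration; they are not specified in this proposal. The sample positions are approximate coordinates for M31 and M42.

```python
from collections import Counter


def sky_density(mentions, bin_deg=10.0):
    """Count mentions per (RA, Dec) cell of size bin_deg degrees.

    mentions: iterable of (ra_deg, dec_deg) pairs, an assumed input format.
    Returns a Counter keyed by (ra_index, dec_index).
    """
    n_ra = int(360 / bin_deg)
    n_dec = int(180 / bin_deg)
    counts = Counter()
    for ra_deg, dec_deg in mentions:
        # Wrap RA into [0, 360) so that 360 degrees lands in the first cell.
        i = int((ra_deg % 360.0) / bin_deg) % n_ra
        # Shift Dec from [-90, +90] to [0, 180]; clamp the north pole
        # into the last cell.
        j = min(int((dec_deg + 90.0) / bin_deg), n_dec - 1)
        counts[(i, j)] += 1
    return counts


# Two mentions near M31 and one near M42 (approximate positions).
sample = [(10.68, 41.27), (10.70, 41.30), (83.82, -5.39)]
density = sky_density(sample)
```

Equal-angle cells like these are not equal in area (they shrink toward the poles), so a production map would more plausibly use an equal-area pixelization such as HEALPix; the simple grid is used here only to keep the sketch self-contained.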

[1] By “optical” we mean any image where our methodology, which relies for now on astrometry.net’s use of optical and near-infrared sky surveys of stellar positions, can be used to assign a location and scale to an image.


Astrotagging = Geotagging for Astronomy

What we advance in this proposal is the astronomical equivalent of Earth-based “geotagging.” Our intent is to “astrotag”[2] the holdings of the NASA ADS database, providing spatial and temporal information for as much of the astronomical literature as possible. The pathways and methodologies we will use to assign astrotags to images and articles will vary based on the data source and quality, as will the utility of the tags once assigned.

In cases where only an approximate location (e.g. “M31”) and time (e.g. date of publication) are available, tags allow an article to be associated with a region defined by a name, much as the geotag “US” gives enough information for a resource to be counted as within the United States. In cases where coordinate information can be derived from an image within an article, those coordinates additionally allow for precise overlay of that image on contextual substrates. We call this procedure “astroreferencing,” by analogy to “georeferencing.” And, in cases where well-dated astrotagged and/or astroreferenced resources refer to the same region, time series can be built.
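The distinction between a name-only astrotag and an astroreferenced one, and the way dated tags for the same region can be assembled into a time series, can be sketched as a small data structure. This is a minimal sketch under stated assumptions: the class name `Astrotag`, its field names, the helper `time_series`, and the sample records are all hypothetical and are not taken from ADS, SIMBAD, or NED.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Astrotag:
    """A space-and-time tag for one article (hypothetical structure)."""
    article_id: str                      # placeholder identifier
    year: int                            # time stamp, e.g. publication year
    object_name: Optional[str] = None    # approximate location, e.g. "M31"
    ra_deg: Optional[float] = None       # set only when coordinates are known
    dec_deg: Optional[float] = None

    @property
    def is_astroreferenced(self) -> bool:
        # Precise overlay on the sky needs both coordinates.
        return self.ra_deg is not None and self.dec_deg is not None


def time_series(tags):
    """Group tags by object name and sort each group by year."""
    groups = defaultdict(list)
    for tag in tags:
        if tag.object_name is not None:
            groups[tag.object_name].append(tag)
    return {name: sorted(group, key=lambda t: t.year)
            for name, group in groups.items()}


# Made-up sample tags, for illustration only.
tags = [
    Astrotag("paper-a", 1995, "M31"),
    Astrotag("paper-b", 1950, "M31", ra_deg=10.68, dec_deg=41.27),
    Astrotag("paper-c", 2001, "M42"),
]
series = time_series(tags)
```

Keeping the coordinates optional mirrors the text: every tag carries a time stamp, a name-only tag still places an article in a region, and only tags with coordinates support image overlay.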

The outcome of our efforts will directly parallel what can be done with geotagged and georeferenced literature relevant to places and dates in Earth’s history. Consider the New York Times example shown in Figure 1, taken from a recent Harvard student project (Jacopille 2011). The Figure shows a screen shot of an interactive application (http://cs171.org/2011/projects/web/Jacopille_David/jacopille/) that allows users to understand which countries were mentioned in the Times, on which dates, and to interactively filter the results by various properties of the article (facets). We have annotated this example to show the generic features common to geotagging and astrotagging.

Figure 1. Snapshot of an interactive visualization of New York Times articles. [Annotated features: interactive timeline; selectable facets used for filtering; live (interactive) “heatmap” of article density; color scale key.]

[2] We define “astrotag” to include both space and time (e.g. a “geotag” and a “time stamp”).

In the Times example, the single set of “facets” at the bottom has been set to show properties of the article like “total words,” but in a more full-blown application they could easily be set to draw upon much more sophisticated and combinable facets, such as a search for “wedding,” “includes color photo,” and “Prince William.” Such a search would, in this map/timeline visualization, then highlight all the countries and dates where the Times had run stories with color photos pertaining to weddings and Prince William. The user would then be able to browse relevant articles in their space-time context, retrieve articles using a variety of existing services for doing so, and understand which (clusters of?) countries are most interested in Prince William getting married, and why.

Expected Significance

Just as it would have been (and still is) hard to foresee the full range of amazing GIS/GPS applications resulting from widespread geotagging, it is hard for us to fully anticipate the wealth of new science to be enabled by the astrotagging and astroreferencing of the NASA ADS. Nonetheless, we do offer several generalizable astronomical examples (having nothing to do with Prince William!) in Section 3. The current significance of the NASA ADS as a literature resource cannot be overstated. Unlocking the data currently hiding within the ADS will multiply ADS’s value by offering unprecedented synthetic views of our understanding of the Universe.

2. TECHNICAL APPROACH AND METHODOLOGY: HOW THE ADS ALL-SKY SURVEY IS TO BE CREATED

The procedures, data sources, and outcomes that are part of the proposed ADS All-Sky Survey project are depicted, in a summarized format, in Figure 2.

Data sources

The top portion of Figure 2 displays the major data sources that we intend to employ in this project. The primary database is the literature of the NASA ADS[3], maintained by the Harvard-Smithsonian Center for Astrophysics (CfA). The additional database is the SIMBAD object catalogue maintained by the Centre de Données astronomiques de Strasbourg (CDS)[4].

The ADS is the premier literature resource for the astronomical community. It maintains three bibliographic databases containing roughly 9 million records, 4.5 million scanned pages, and nearly a million fulltext articles. Integrated in its databases, the ADS provides access and pointers to a wealth of external resources, including electronic articles, data catalogs and archives. This additional set of resources and metadata is used to provide the new “faceted” search engine for ADS. The faceted ADS engine essentially aggregates additional metadata features about articles (e.g. author names, object names, year of publication, related archives) and offers these features (the “facets”) in a filtering mechanism (see Figure 3). The faceted search engine is part of a larger experimental suite of new ADS tools, referred to as “ADS Labs,” at labs.adsabs.harvard.edu.
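The faceted filtering described above can be sketched in a few lines. This is only an illustration of the general idea (aggregate metadata values per article, then keep the articles that match every selected value); it is not the ADS Labs implementation, and the record layout, the function names `facet_filter` and `facet_counts`, and the sample records are all assumptions.

```python
from collections import Counter


def facet_filter(records, **selected):
    """Keep records whose facets contain every selected value.

    records: list of dicts mapping a facet name to a list of values
    (an assumed layout). selected: facet_name=value pairs.
    """
    def matches(record):
        return all(value in record.get(facet, ())
                   for facet, value in selected.items())
    return [record for record in records if matches(record)]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    counts = Counter()
    for record in records:
        counts.update(record.get(facet, ()))
    return counts


# Made-up records, for illustration only.
records = [
    {"id": "p1", "author": ["Huchra, J."], "object": ["M31"], "year": [1990]},
    {"id": "p2", "author": ["Huchra, J."], "object": ["M87"], "year": [1992]},
    {"id": "p3", "author": ["Other, A."], "object": ["M31"], "year": [1990]},
]
hits = facet_filter(records, author="Huchra, J.", object="M31")
by_object = facet_counts(records, "object")
```

Selecting several facets at once narrows the result set, while `facet_counts` gives the per-value totals that a filtering panel would typically show next to each option.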

[3] See attached letter of support from Alberto Accomazzi, representing ADS.

[4] See attached letter of support from Thomas Boch, representing CDS.


The ADS Labs engine employs disparate information about bibliographic records to provide advanced filtering and searching of scholarly articles. For example, one can perform a query for all articles written by author “John Huchra” that discuss galaxy “M31”. This faceting mechanism is offered thanks to ADS’ data integration and collaboration with a number of services and archives. In this case, author information (e.g., “John Huchra”) is stored, maintained, and disambiguated via a local ADS database at the CfA. Object information (e.g., the fact that a given

[Figure 2 (diagram; text recovered from the figure). Labels: data sources; primary database (Literature, NASA archives); additional database (Object Data); image extraction; optical images; non-optical images; article-object matching; text mining; astrometric measurement; astrotagged literature; ADS All-Sky Survey database; all-sky literature heatmap; historical data layer. The figure also embeds a sample journal page (E. O’Sullivan et al.) as an example of the literature.]

While the abundances in neighbouring regions are comparable,there is a clear trend for higher abundances in the inner jets (the threecentral regions of the east-to-west profile) and declining abundance

Figure 3. Abundance map of the core of AWM 4, with GMRT 610-MHzcontours overlaid. Rectangular regions were used to examine the variationin abundance across and along the jet. The white cross marks the positionof the radio core.

outside that area. The abundance of the westernmost region is lowerthan the abundances in the inner jet at 90 per cent significance.Combining regions of similar metallicity, we find that the innerpart of the jets (regions 3–5 of the east-to-west profile or region 3of the north-to-south profile) is more enriched than the regions atthe eastern end of the jet at 3.2! significance, but only at a 2.0!

level in comparison to the western regions. However, comparingthe inner jet to a combination of the extreme western and easternregions shows a 3.4! difference. The northern and southern regions,combined in pairs, are less abundant at the 2.4–2.7! level or 3! ,if all four are simultaneously fitted. In general, we conclude thatthe high-abundance region is more extended from east to west thanfrom north to south, following the jet, and that its abundance issignificantly greater than its surroundings, by !0.4 Z".

3.2 Accuracy of the spectral maps and potential sources of bias

To test the accuracy of the maps, we defined regions covering spe-cific temperature and abundance features, extracted spectra fromthese regions and fitted them. The regions contain between !660and !2900 net counts in the 0.7–7.0 keV band. While the spectralextraction and fitting process is identical in mapping and normalspectral analysis, these regions were not constrained to contain afixed number of counts, so should provide a test of the smoothing-like effect of the mapping process. It also allows us to determinehow well the variation within map regions corresponds to the un-certainty on the normal spectral fit. Fig. 6 shows comparisons ofthe range of temperatures and abundances found in the map regionswith the values derived from the spectral fits.

In general, the maps appear to provide an accurate estimate ofboth the temperature and abundance. One spectral fit, for region A,finds a significantly higher temperature than the map suggests. Thisappears to be a smoothing issue, the spectral extraction regions usedto create the map being significantly larger than region A, whichcontains only !900 net counts (0.7–7.0 keV). The abundance mea-surements typically have larger uncertainties, and a greater vari-ation in map pixel values, particularly in the highest-abundanceregions. The highest abundance of any spectral fit is found in re-gion B. The region contains !1200 net counts, again suggesting that

C# 2010 The Authors, MNRAS 411, 1833–1842Monthly Notices of the Royal Astronomical Society C# 2010 RAS

astrotagging workflow

Astro-referencedimages

1836 E. O’Sullivan et al.

The most notable feature in the abundance map (Fig. 1, middlepanel) is a general roughly solar abundance region in the centreof the map, which is asymmetric and clumpy, indicating unevenenrichment of the ICM. Regions of supersolar abundance extendalong the jets to the radius of the western knot and the base of theeastern lobe, with some extension to the north. The solar abundanceregion is somewhat more extended south of the jets, but is lessconsistent, with patches of both high and low abundance. At largerradii, the abundances decline, but there is considerable variationfrom !0.4 Z" regions in the east and west to solar abundances inthe north and south.

The correlation between the supersolar abundance region andthe radio jets suggests that enriched material is being entrainedoutwards from the core of NGC 6051. A branch or clump of highabundances also extends north or north-east from the central galaxy.This may indicate a trail of material left behind the galaxy, sincethe bending of the radio jets suggests that NGC 6051 is movingsouth. Alternatively, it could indicate that the entrainment of gasto the east is less closely confined around the jet than is the caseon the west. Neither the highest abundance features nor the largernear-solar region correlates with the stellar structure of NGC 6051.The high-abundance feature extends roughly across the minor-axisof the galaxy, but is more extended that the D25 ellipse.

A comparison of the maps with galaxies in the field of view showsno clear correlations. IC 4588, an early-type galaxy at redshift 0.051,falls at the western edge of the large, cool, low-abundance regionto the south-east of the eastern radio lobe (region 1 in Fig. 1). It ispossible that the cool material is associated with the galaxy, perhapsas part of a galaxy group. Koranyi & Geller (2002) find a smallnumber of galaxies at approximately the same recession velocity.However, there is no clear surface brightness structure in the regionand there are insufficient counts to allow us to identify any additionalspectral components. An apparent radio source coincident with IC4588 is seen in the 610-MHz contours, but a comparison with theavailable GMRT and VLA maps at other frequencies suggests thatwhile there is a source at this position, its apparent extension is theresult of a noise feature.

3.1 Metal enrichment along the jets

Figs 3 and 4 show the map of best-fitting abundance values, com-pared to the 610 MHz radio structure and the 90 per cent upper andlower bound maps. The central abundance feature, which correlateswith the jets is clear in all three maps. Maps of the fitted statisticshow variation across the field, but do not appear correlated withthe temperature or abundance maps. This suggests that the apparentfeatures are not the product of poor spectral fits in particular regions.We test this conclusion more thoroughly in Section 3.2.

To examine the high abundances associated with the radio jets,we placed a number of rectangular regions along and across thejet, shown in Fig. 3. Smaller regions are used in the inner part ofthe jet to allow us to look for any central abundance peak, largerregions outside to minimize the uncertainties on abundance. Spectrawere extracted from these regions and fitted with an absorbed APEC

model. The resulting abundances are shown in Fig. 5. The east-to-west profile uses the two large rectangular regions at each endand the smaller rectangles along the jet; the north-to-south profilecompares the upper and lower pairs of large rectangular regions, andthe central region comprises the three small rectangles combined.

While the abundances in neighbouring regions are comparable,there is a clear trend for higher abundances in the inner jets (the threecentral regions of the east-to-west profile) and declining abundance

Figure 3. Abundance map of the core of AWM 4, with GMRT 610-MHzcontours overlaid. Rectangular regions were used to examine the variationin abundance across and along the jet. The white cross marks the positionof the radio core.

outside that area. The abundance of the westernmost region is lowerthan the abundances in the inner jet at 90 per cent significance.Combining regions of similar metallicity, we find that the innerpart of the jets (regions 3–5 of the east-to-west profile or region 3of the north-to-south profile) is more enriched than the regions atthe eastern end of the jet at 3.2! significance, but only at a 2.0!

level in comparison to the western regions. However, comparingthe inner jet to a combination of the extreme western and easternregions shows a 3.4! difference. The northern and southern regions,combined in pairs, are less abundant at the 2.4–2.7! level or 3! ,if all four are simultaneously fitted. In general, we conclude thatthe high-abundance region is more extended from east to west thanfrom north to south, following the jet, and that its abundance issignificantly greater than its surroundings, by !0.4 Z".

3.2 Accuracy of the spectral maps and potential sources of bias

To test the accuracy of the maps, we defined regions covering spe-cific temperature and abundance features, extracted spectra fromthese regions and fitted them. The regions contain between !660and !2900 net counts in the 0.7–7.0 keV band. While the spectralextraction and fitting process is identical in mapping and normalspectral analysis, these regions were not constrained to contain afixed number of counts, so should provide a test of the smoothing-like effect of the mapping process. It also allows us to determinehow well the variation within map regions corresponds to the un-certainty on the normal spectral fit. Fig. 6 shows comparisons ofthe range of temperatures and abundances found in the map regionswith the values derived from the spectral fits.

In general, the maps appear to provide an accurate estimate ofboth the temperature and abundance. One spectral fit, for region A,finds a significantly higher temperature than the map suggests. Thisappears to be a smoothing issue, the spectral extraction regions usedto create the map being significantly larger than region A, whichcontains only !900 net counts (0.7–7.0 keV). The abundance mea-surements typically have larger uncertainties, and a greater vari-ation in map pixel values, particularly in the highest-abundanceregions. The highest abundance of any spectral fit is found in re-gion B. The region contains !1200 net counts, again suggesting that

C# 2010 The Authors, MNRAS 411, 1833–1842Monthly Notices of the Royal Astronomical Society C# 2010 RAS

using the ADS All-Sky Survey

Dataviewers

Figure 2. The ADS All-Sky Survey proposal: Technical plan

4

Page 14: Cover Page for Proposal NASA Proposal Number Submitted to ...projects.iq.harvard.edu/files/seamlessastronomy/files/nasa_adap_adsall... · Cover Page for Proposal Submitted to the

article discusses “M31”) is offered thanks to services of CDS5, such as SIMBAD and VizieR. In particular, the SIMBAD astronomical database (displayed on the top right portion of Figure 2) provides manually curated catalogue data, cross-identifications, and co-ordinates for astronomical objects. The present collaboration and interoperability plan between the ADS and the CDS already allows ADS articles to be coupled with information about the celestial objects discussed in those articles.

Astrotagging workflow

The proposed workflow involves a number of steps with varying levels of difficulty. Please refer to the project management scheme in Section 4 for a detailed timeline of the proposed workflow. We will begin our astrotagging procedure with the simplest step of our workflow: the article-object matching, indicated in Figure 2 as step “1.” As explained above, the ongoing collaboration between ADS and object databases (CDS/SIMBAD and NASA/NED), as well as interoperability between underlying services, makes this first step of the astrotagging procedure straightforward. We will use all the holdings of the ADS library (both fulltext articles and bibliographic records) for which object references exist, and begin populating a new database of astrotagged literature (shown in the middle of Figure 2). This mechanism of article-object matching will generate an initial database that associates ADS articles with their manually curated object references provided by CDS.

It is important to note that in order to create this database we will simply combine resources that are already made available by ADS and CDS. The purpose of this first step is to aggregate and publish the conceptual linkages that already exist between ADS and CDS holdings. In the context of the example given above, we will be able to query the ADS for all articles written by author “John Huchra” that discuss galaxy “M31” and display the results via a data visualizer. (Note that this query is more interesting than it sounds, because it returns a filterable list of articles about M31 and about related objects discussed in the literature for apparently similar reasons.)
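As a rough illustration of the step-1 matching, the sketch below joins two small in-memory tables: one linking bibcodes to curated object names, and one linking object names to coordinates. The table names, field layout, and helper functions are assumptions made for this example; they do not reflect the actual ADS or SIMBAD schemas or interfaces. The sample row reuses the Huchra & Brodie (1987) example from the text.

```python
# Hypothetical stand-ins for curated holdings; real data would come from ADS and CDS/SIMBAD.
ARTICLE_OBJECTS = {"1987AJ.....93..779H": ["M 31"]}
ARTICLE_AUTHORS = {"1987AJ.....93..779H": ["Huchra", "Brodie"]}
OBJECT_COORDS = {"M 31": ("00 42 44.330", "+41 16 07.50")}


def build_astrotag_table(article_objects, object_coords):
    """Return one row per (bibcode, object) pair with coordinates and publication year."""
    rows = []
    for bibcode, objects in article_objects.items():
        year = int(bibcode[:4])  # ADS bibcodes begin with the four-digit publication year
        for name in objects:
            if name not in object_coords:
                continue  # no curated position for this object, so skip it
            ra, dec = object_coords[name]
            rows.append(
                {"bibcode": bibcode, "object": name, "ra": ra, "dec": dec, "year": year}
            )
    return rows


def articles_by_author_and_object(rows, authors_by_bibcode, author, obj):
    """Return sorted bibcodes of articles by `author` that are tagged with `obj`."""
    return sorted(
        {
            row["bibcode"]
            for row in rows
            if row["object"] == obj
            and any(author in name for name in authors_by_bibcode.get(row["bibcode"], []))
        }
    )


rows = build_astrotag_table(ARTICLE_OBJECTS, OBJECT_COORDS)
print(articles_by_author_and_object(rows, ARTICLE_AUTHORS, "Huchra", "M 31"))
```

Keeping the join separate from the query mirrors the idea that the astrotag database is built once from existing linkages and then filtered by facet.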

A map that shows, on the sky, the density of all SIMBAD objects, weighted by how frequently they appear in the literature, has already been developed by Thomas Boch at CDS⁶. Our proposed new service includes full integration with the ADS database, allowing for faceted searching and live updates. Users will be able to filter results via facets including authors, keywords, archives, missions, objects, dates, and more. As shown in Figure 3, ADS Labs can already display results of a literature search as an author network or a paper citation network, but astrotagging ADS will allow users to also visualize articles on the sky (e.g., via a dynamic heatmap generated through an existing data viewer).
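To make the idea of a literature-weighted density map concrete, here is a minimal sketch that sums per-object literature counts into cells of a plain RA/Dec grid. It is a simplification: a production map would more likely use an equal-area all-sky tessellation, and the function name, the cell size, and the sample counts below are illustrative assumptions rather than real SIMBAD or ADS values.

```python
from collections import Counter


def literature_heatmap(objects, cell_deg=1.0):
    """Sum literature counts into RA/Dec cells of width `cell_deg` degrees.

    `objects` is an iterable of (ra_deg, dec_deg, n_refs) tuples.
    Returns a Counter keyed by (dec_row, ra_col) cell indices.
    """
    n_ra = int(round(360.0 / cell_deg))
    n_dec = int(round(180.0 / cell_deg))
    heat = Counter()
    for ra_deg, dec_deg, n_refs in objects:
        col = int((ra_deg % 360.0) / cell_deg) % n_ra
        # Clamp so that Dec = +90 falls in the last row instead of overflowing.
        row = min(int((dec_deg + 90.0) / cell_deg), n_dec - 1)
        heat[(row, col)] += n_refs
    return heat


# Illustrative positions and counts only.
sample = [(10.68, 41.27, 5), (10.90, 41.90, 3), (200.0, -30.0, 2)]
heat = literature_heatmap(sample, cell_deg=1.0)
print(heat.most_common(1))
```

Equal-angle cells are not equal in area on the sky, so cells near the poles cover less solid angle than cells near the equator; that is one reason an equal-area scheme would be preferable for a real heatmap.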

5 In many cases, especially for extragalactic sources, the NASA NED service also provides curated links to source information, and ADS ingests and offers access to this information as well.

6 A video of the CDS implementation is available at http://aladin.u-strasbg.fr/java/CDSMaps.htm

The initial database of astrotags in the ADS All-Sky Survey will be created by associating every bibliographic record in ADS with object references from SIMBAD/CDS. “Astrotags” are spatial and temporal annotations about the objects discussed in a paper, so, by way of example, the astrotags relative to one of the papers displayed in Figure 3 (Huchra & Brodie, 1987) would consist of the following elements:

1. the ADS bibcode of the article in question (e.g., 1987AJ.....93..779H),
2. the referenced objects, each identified by their SIMBAD reference (e.g., M 31),
3. object co-ordinates and epoch, identified by RA and DEC (e.g., 00 42 44.330, +41 16 07.50),
4. a timestamp, identified as the year of publication of the article (e.g., 1987).
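The four astrotag elements listed above map naturally onto a small record type. The sketch below is one possible representation, not a committed schema: the class name, field names, and helper functions are assumptions for illustration. It also converts the sexagesimal coordinates from the example into decimal degrees, which is the form most sky viewers expect.

```python
from dataclasses import dataclass


def ra_to_degrees(ra: str) -> float:
    """Convert an 'HH MM SS.sss' right ascension to decimal degrees."""
    hours, minutes, seconds = (float(part) for part in ra.split())
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_to_degrees(dec: str) -> float:
    """Convert a '+DD MM SS.ss' declination to decimal degrees."""
    text = dec.strip()
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (float(part) for part in text.lstrip("+-").split())
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)


@dataclass(frozen=True)
class Astrotag:
    bibcode: str      # element 1: ADS bibcode
    object_name: str  # element 2: SIMBAD object reference
    ra: str           # element 3: right ascension, sexagesimal
    dec: str          # element 3: declination, sexagesimal
    year: int         # element 4: publication year

    def position_degrees(self):
        """Return (RA, Dec) in decimal degrees."""
        return ra_to_degrees(self.ra), dec_to_degrees(self.dec)


tag = Astrotag("1987AJ.....93..779H", "M 31", "00 42 44.330", "+41 16 07.50", 1987)
print(tag.position_degrees())
```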

Figure 3. ADS/CDS All-Sky map of John Huchra’s articles about M31 (mockup)

This straightforward article-object matching procedure (step “1” in Figure 2) will only be possible for a portion of the ADS database, however. There are millions of articles in the ADS for which no object categorization exists. Yet, the objects discussed in articles for which no manual annotation exists can, in many cases, be deduced from the images contained within them. The next step of our proposal involves automatically inspecting ADS articles for object and/or coordinate information. This part of the investigation is presently limited to the “only” 1.2 million articles for which the full text exists, either in electronic format or as a scanned document.

As shown in Figure 2, we plan to extract images from all fulltext articles in the ADS. Image extraction will be performed via a variety of methods, depending on the available formats. For example, articles for which LaTeX source is available can be easily dealt with, as images are submitted separately. Scanned documents and other articles only available in PDF will be processed using image detection programs. The database of extracted images will consist of many different kinds of images. Among them, we will find optical images for which we will perform astrometric calibration (step “2” in the astrotagging procedure of Figure 2).

A sample extracted optical image is presented in Figure 4. The image was extracted from a recent article published in the Astrophysical Journal by Indriolo et al. (2010). Although the image presents some light annotation (labels A-F), it can be easily resolved astrometrically. The engine that we will use to assign astrometric calibration metadata to optical images is astrometry.net (Lang et al., 2010). We plan to set up a dedicated astrometry.net⁷ server at the Center for Astrophysics to run on the database of extracted images. For example, when fed to astrometry.net, the optical image of Figure 4 would be correctly resolved as:

RA,Dec center: (06:17:4.161, +22:33:23.918); Orientation: -179.77 deg E of N; Pixel scale: 4.56 arcsec/pixel; Field size: 77.89 x 58.19 arcminutes; Field contains: Propus (ηGem), IC 443.

Figure 4. Image of IC 443 from the Second Palomar Observatory Sky Survey (POSS-II), extracted from Indriolo et al. (2010)

7 David Hogg, one of astrometry.net’s inventors, has agreed to advise us on the optimization of astrometry.net for the ADSASS. Jonathan Fay, creator of WWT at MSR, is also interested in optimizing astrometry.net and will consult on this aspect of our work as well. Both Dr. Hogg’s and Mr. Fay’s letters of support are attached.

8 In subsequent versions of the ADSASS, by using semantic text-mining we may be able to offer multiple time stamps, giving observation dates, in addition to publication dates. Here, though, we rely exclusively on publication dates.

The referenced objects (supernova remnant IC 443 and the triple-star system Propus=ηGem) returned by astrometry.net for the image in Figure 4 would become part of the astrotags attached to this article, in addition to a time stamp given by publication date⁸. In this case, however, in addition to astrotagging the article, we would also astroreference the image, using the coordinates, orientation, and pixel scale returned by astrometry.net. (Recall that the term “astroreferencing” is used by analogy to “georeferencing,” which refers to the alignment of a “map” for overlay within a given coordinate system.) In this way, we will start populating an additional database of literature-extracted images which can be overlain on the sky. This database of astroreferenced images, shown in the center of Figure 2, represents the second component of our ADS All-Sky Survey database.
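As an illustration of what an astroreferenced image record might hold, the sketch below stores the center, orientation, and pixel scale quoted above for the IC 443 image and derives the field size from them. The class and field names are assumptions for this example, and the pixel dimensions (1025 x 766) are not given in the text: they are back-computed here from the quoted field size and pixel scale, so treat them as approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AstroreferencedImage:
    source: str                # article the image was extracted from
    center_ra: str             # field center, as reported by the solver
    center_dec: str
    orientation_deg: float     # degrees E of N
    pixel_scale_arcsec: float  # arcseconds per pixel
    width_px: int              # assumed, not reported in the text
    height_px: int             # assumed, not reported in the text

    def field_size_arcmin(self):
        """Return the (width, height) of the field in arcminutes."""
        width = self.width_px * self.pixel_scale_arcsec / 60.0
        height = self.height_px * self.pixel_scale_arcsec / 60.0
        return width, height


ic443 = AstroreferencedImage(
    source="Indriolo et al. (2010)",
    center_ra="06:17:4.161",
    center_dec="+22:33:23.918",
    orientation_deg=-179.77,
    pixel_scale_arcsec=4.56,
    width_px=1025,
    height_px=766,
)
width, height = ic443.field_size_arcmin()
print(f"{width:.1f} x {height:.1f} arcminutes")
```

The derived field size agrees with the quoted 77.89 x 58.19 arcminutes to within rounding, which is a useful consistency check on any stored solution.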

Some images contained in the ADS literature will have to be processed before being resolved astrometrically, however. This is because some optical images present in papers are excessively annotated or segmented into panes, making astrometric calibration difficult. An example is provided in Figure 5. The image, extracted from a recent paper by Ofek et al. (2010), shows four different views of the field of Supernova PTF 09UJ embedded in a single image. While some of the individual panes could, in principle, be resolved astrometrically, the image as a whole would fail astrometric calibration. In cases like these, we plan to segment the image in a programmatic fashion and subsequently resolve, astrotag, and store the segmented images independently from one another. Numerous image segmentation techniques have been presented in the literature. For the purpose of this proposal we plan to evaluate techniques that segment images based on fractional calculus (Marazzato & Sparavigna, 2009), random walks (Grady, 2006), and edge flow (Ma & Manjunath, 2000).
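For multi-pane figures with clean blank gutters, even a very simple split can recover the individual panes. The sketch below is that simple baseline, not any of the segmentation techniques cited above: it looks for rows and columns that contain only background pixels and cuts along them. The function names and the assumption of a uniform background value of 255 are illustrative.

```python
def _runs(flags):
    """Return (start, stop) index pairs for consecutive True values in `flags`."""
    runs, start = [], None
    for index, flag in enumerate(flags):
        if flag and start is None:
            start = index
        elif not flag and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def split_panes(image, background=255):
    """Split a grayscale image (a list of pixel rows) into panes.

    Panes are separated by rows or columns made entirely of `background`.
    Returns (top, bottom, left, right) boxes with exclusive bottom/right bounds.
    """
    row_has_content = [any(px != background for px in row) for row in image]
    boxes = []
    for top, bottom in _runs(row_has_content):
        band = image[top:bottom]
        col_has_content = [
            any(row[col] != background for row in band) for col in range(len(band[0]))
        ]
        for left, right in _runs(col_has_content):
            boxes.append((top, bottom, left, right))
    return boxes


W = 255  # background value
four_panes = [
    [0, 0, W, 0, 0],
    [0, 0, W, 0, 0],
    [W, W, W, W, W],
    [0, 0, W, 0, 0],
    [0, 0, W, 0, 0],
]
print(split_panes(four_panes))
```

Scanned pages rarely have perfectly uniform gutters, so a real pipeline would need a tolerance on the background test or one of the more robust techniques named in the text.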

We will attempt astrometric calibration, coupled with the automated suite of image segmentation procedures discussed above, on all images from the 1.2 million full-text records in ADS. All the images for which image processing and calibration fail will be classified as “non-optical images”⁹ (see Figure 2).
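The fall-through logic described above (solve the whole image, otherwise segment and solve each pane, otherwise classify as “non-optical”) can be sketched as a small function. The `solve` and `segment` arguments are hypothetical callables standing in for an astrometric solver and a segmentation routine; neither corresponds to a real API.

```python
def classify_image(image, solve, segment):
    """Return ("optical", solutions) or ("non-optical", []).

    `solve(image)` is assumed to return a solution object, or None on failure.
    `segment(image)` is assumed to return a list of sub-images (panes).
    """
    solution = solve(image)
    if solution is not None:
        return "optical", [solution]

    # Whole-image calibration failed: try each pane independently.
    pane_solutions = [s for s in (solve(pane) for pane in segment(image)) if s is not None]
    if pane_solutions:
        return "optical", pane_solutions

    return "non-optical", []


# Toy stand-ins: an "image" is a string, and only strings starting with "sky" solve.
def toy_solve(img):
    return {"field": img} if img.startswith("sky") else None


def toy_segment(img):
    return img.split("|")


print(classify_image("sky-A", toy_solve, toy_segment))
print(classify_image("text|sky-B|sky-C", toy_solve, toy_segment))
print(classify_image("plot|chart", toy_solve, toy_segment))
```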

Figure 5. Optical images of supernova PTF 09UJ split into four different panes.

9 This terminology is for convenience. It is not meant to imply that every image not solvable by the procedures described above is outside the optical band. Additionally, some near-infrared (technically “non-optical”) images are in fact solvable using astrometry.net.

When astrometric measurement fails, enough textual metadata may be available to assign either a position or a source name, and optionally a platescale and waveband, to the images (step “3” of Figure 2). An example of a literally “non-optical” image is shown in Figure 6. Extracted from a recent paper by O’Sullivan et al. (2011), Figure 6 is a metal abundance map of the core of the AWM 4 galaxy cluster. In our proposed workflow, this figure would be extracted from the article and astrometric calibration would be attempted, but without success. Yet, the figure caption gives enough information to extract the needed source name and wavelength metadata. In this case, it is clear from the caption that the source depicted in this figure is the AWM 4 cluster of galaxies. SIMBAD or NED can tell us that AWM 4 is also known as RXC J1604.9+2355¹⁰, with coordinates RA: 16 04 57.0, DEC: +23 55 14. Thus, using figure caption “metadata,” this article/figure would be astrotagged with name and coordinate information relative to the aforementioned catalog objects.

10 “AWM 4” gives several matches in SIMBAD and NED, so actually several source names would be linked.

In order to automate this text mining, we plan to parse the captions of non-optical images searching for object names, taken from the CDS SIMBAD object catalog. When a matching object name is found in a caption, the article and image will be associated with the object name and coordinates.
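A minimal sketch of the caption-matching idea follows. It checks a caption against a small dictionary of object names using regular expressions that tolerate variable or missing spaces inside a name (so “AWM 4” and “AWM4” both match). The catalog excerpt is a hypothetical stand-in for the SIMBAD object list, populated only with names and coordinates quoted in this proposal; the function names are likewise assumptions.

```python
import re

# Hypothetical catalog excerpt: object name -> (RA, Dec) as quoted in the text.
CATALOG = {
    "AWM 4": ("16 04 57.0", "+23 55 14"),
    "M 31": ("00 42 44.330", "+41 16 07.50"),
}


def _name_pattern(name):
    """Build a regex for `name` that allows any amount of internal whitespace."""
    parts = [re.escape(part) for part in name.split()]
    # Lookarounds stop "AWM 4" from matching inside "AWM 41" or "XAWM 4".
    return re.compile(r"(?<![A-Za-z0-9])" + r"\s*".join(parts) + r"(?![A-Za-z0-9])")


def match_objects(caption, catalog):
    """Return the catalog names mentioned in `caption`, in catalog order."""
    return [name for name in catalog if _name_pattern(name).search(caption)]


caption = (
    "Abundance map of the core of AWM 4, with GMRT 610-MHz contours overlaid."
)
for name in match_objects(caption, CATALOG):
    print(name, CATALOG[name])
```

Plain dictionary matching of this kind will produce false positives for short or ambiguous names, so a real pipeline would likely add context checks before accepting a match.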

As the savvy reader will have noted by now, not all of the “steps” labeled in Figure 2 are of equal difficulty. In fact, steps 1, 2, and 3 are labeled in order of increasing difficulty. As our Gantt chart in Section 4 shows, we do think it possible to address all the steps within the two years of funding requested here, but it is important to note that even just the astrotagging possible using the existing ADS↔CDS(SIMBAD) links (step 1) will produce an ADS All-Sky Survey that will be of tremendous value. We are confident that this first step can be accomplished within Year 1, and that images easily solved by astrometry.net (step 2) can be astroreferenced expeditiously as well. Step 3 is more ambitious, but as public domain tools for text-mining continue to improve at remarkable rates, it should also be possible within Year 2.

Using the ADS All-Sky Survey

Now imagine that we have created the ADS All-Sky Survey shown in the middle of Figure 2. How will we make it usable?

Thanks to the proliferation of beautiful “virtual observatory” data viewing systems over the past few years, the options for visualizing and exploring the All-Sky Literature Heat Map and Historical Data Layer shown in Figure 2 are many. Since astronomers have their own preferences amongst data viewing tools, our long-range (two-year) plan is to produce versions of the ADSASS viewable in any product that can open all-sky image files using HEALPix or TOAST projection schemes. Sample compatible all-sky viewers include WorldWide Telescope (Microsoft), Aladin (CDS), Google Sky (Google), and MASTview¹¹ (NASA). Making the ADSASS accessible through this wide range of tools will encourage its use and ensure compatibility across Mac, Windows, UNIX and Linux operating systems and across physical platforms (including mobile OS devices).

We will carry out initial ADSASS visualization/use experiments using the WorldWide Telescope (WWT) “web client” platform (Windows and Mac compatible), in collaboration with WWT

11 The MASTview project at NASA/STScI is led by Alberto Conti (see attached letter of support).

Figure 6. Non-optical image. The name of the object depicted (AWM 4) can be deduced from the image caption.

1836 E. O’Sullivan et al.

The most notable feature in the abundance map (Fig. 1, middlepanel) is a general roughly solar abundance region in the centreof the map, which is asymmetric and clumpy, indicating unevenenrichment of the ICM. Regions of supersolar abundance extendalong the jets to the radius of the western knot and the base of theeastern lobe, with some extension to the north. The solar abundanceregion is somewhat more extended south of the jets, but is lessconsistent, with patches of both high and low abundance. At largerradii, the abundances decline, but there is considerable variationfrom !0.4 Z" regions in the east and west to solar abundances inthe north and south.

The correlation between the supersolar abundance region andthe radio jets suggests that enriched material is being entrainedoutwards from the core of NGC 6051. A branch or clump of highabundances also extends north or north-east from the central galaxy.This may indicate a trail of material left behind the galaxy, sincethe bending of the radio jets suggests that NGC 6051 is movingsouth. Alternatively, it could indicate that the entrainment of gasto the east is less closely confined around the jet than is the caseon the west. Neither the highest abundance features nor the largernear-solar region correlates with the stellar structure of NGC 6051.The high-abundance feature extends roughly across the minor-axisof the galaxy, but is more extended that the D25 ellipse.

A comparison of the maps with galaxies in the field of view showsno clear correlations. IC 4588, an early-type galaxy at redshift 0.051,falls at the western edge of the large, cool, low-abundance regionto the south-east of the eastern radio lobe (region 1 in Fig. 1). It ispossible that the cool material is associated with the galaxy, perhapsas part of a galaxy group. Koranyi & Geller (2002) find a smallnumber of galaxies at approximately the same recession velocity.However, there is no clear surface brightness structure in the regionand there are insufficient counts to allow us to identify any additionalspectral components. An apparent radio source coincident with IC4588 is seen in the 610-MHz contours, but a comparison with theavailable GMRT and VLA maps at other frequencies suggests thatwhile there is a source at this position, its apparent extension is theresult of a noise feature.

3.1 Metal enrichment along the jets

Figs 3 and 4 show the map of best-fitting abundance values, com-pared to the 610 MHz radio structure and the 90 per cent upper andlower bound maps. The central abundance feature, which correlateswith the jets is clear in all three maps. Maps of the fitted statisticshow variation across the field, but do not appear correlated withthe temperature or abundance maps. This suggests that the apparentfeatures are not the product of poor spectral fits in particular regions.We test this conclusion more thoroughly in Section 3.2.

To examine the high abundances associated with the radio jets,we placed a number of rectangular regions along and across thejet, shown in Fig. 3. Smaller regions are used in the inner part ofthe jet to allow us to look for any central abundance peak, largerregions outside to minimize the uncertainties on abundance. Spectrawere extracted from these regions and fitted with an absorbed APEC

model. The resulting abundances are shown in Fig. 5. The east-to-west profile uses the two large rectangular regions at each endand the smaller rectangles along the jet; the north-to-south profilecompares the upper and lower pairs of large rectangular regions, andthe central region comprises the three small rectangles combined.

While the abundances in neighbouring regions are comparable,there is a clear trend for higher abundances in the inner jets (the threecentral regions of the east-to-west profile) and declining abundance

Figure 3. Abundance map of the core of AWM 4, with GMRT 610-MHzcontours overlaid. Rectangular regions were used to examine the variationin abundance across and along the jet. The white cross marks the positionof the radio core.

outside that area. The abundance of the westernmost region is lowerthan the abundances in the inner jet at 90 per cent significance.Combining regions of similar metallicity, we find that the innerpart of the jets (regions 3–5 of the east-to-west profile or region 3of the north-to-south profile) is more enriched than the regions atthe eastern end of the jet at 3.2! significance, but only at a 2.0!

level in comparison to the western regions. However, comparing the inner jet to a combination of the extreme western and eastern regions shows a 3.4σ difference. The northern and southern regions, combined in pairs, are less abundant at the 2.4–2.7σ level or 3σ, if all four are simultaneously fitted. In general, we conclude that the high-abundance region is more extended from east to west than from north to south, following the jet, and that its abundance is significantly greater than its surroundings, by ∼0.4 Z⊙.

3.2 Accuracy of the spectral maps and potential sources of bias

To test the accuracy of the maps, we defined regions covering specific temperature and abundance features, extracted spectra from these regions and fitted them. The regions contain between ∼660 and ∼2900 net counts in the 0.7–7.0 keV band. While the spectral extraction and fitting process is identical in mapping and normal spectral analysis, these regions were not constrained to contain a fixed number of counts, so should provide a test of the smoothing-like effect of the mapping process. It also allows us to determine how well the variation within map regions corresponds to the uncertainty on the normal spectral fit. Fig. 6 shows comparisons of the range of temperatures and abundances found in the map regions with the values derived from the spectral fits.

In general, the maps appear to provide an accurate estimate of both the temperature and abundance. One spectral fit, for region A, finds a significantly higher temperature than the map suggests. This appears to be a smoothing issue, the spectral extraction regions used to create the map being significantly larger than region A, which contains only ∼900 net counts (0.7–7.0 keV). The abundance measurements typically have larger uncertainties, and a greater variation in map pixel values, particularly in the highest-abundance regions. The highest abundance of any spectral fit is found in region B. The region contains ∼1200 net counts, again suggesting that

© 2010 The Authors, MNRAS 411, 1833–1842. Monthly Notices of the Royal Astronomical Society © 2010 RAS

developer Jonathan Fay12. The PI and Co-I have been involved in the development of WWT from its inception, and WWT (along with Aladin) is already a linked data viewer used by ADS Labs. In addition, WWT offers a wealth of contextual all-sky images and smaller professional and amateur images of popular sources all over the sky.

In Figure 7, we show a screen shot of the “astroreferenced” image shown in Figure 4 placed on the sky in WWT using the astrometry.net solution. We have used some of the existing features of WWT to: 1) overlay a list of 2MASS sources in the area retrieved via a VO search; 2) overlay a high-resolution image of part of the supernova remnant; and 3) open a window offering links to other services a researcher could access from within WWT to find out more about the region under study.

What we do not show in Figure 7 (because it does not yet exist!) is the filterable “heat map” of the literature in this whole area that the astrotagging discussed above will make possible. Instead, the reader should take a look back at the inset panel of Figure 3, where unfiltered citations (around M31) are shown as a heatmap displayed in Aladin. Such a heatmap will also be easily displayed in any all-sky viewer, including WWT, Aladin, Google Sky or MASTview. The combination of the heatmap with astroreferenced images like the one shown in Figure 7 will offer users unprecedented insights, as discussed in Section 3, below.

As shown in Figure 2, results from ADSASS searches will be easily combined with existing NASA archival data using any all-sky viewer.

12 See attached letter of support from Jonathan Fay, representing Microsoft Research.

[Figure 7 annotations: 2MASS sources overlain using VO search within WWT; contextual high-res optical image available within WWT, overlain; image extracted from Journal; lookup options for the selected position (linked research services).]

Figure 7. Annotated screen shot of the supernova remnant from Figure 4 placed into context within WorldWide Telescope. The astrometry.net solution is used to automatically “astroreference” the figure extracted from the Journal for placement on the sky.

3. PERCEIVED IMPACT: SAMPLE SCIENCE

The following science use cases are offered to make explicit the range of scientific research that the main products of this proposal will enable or enhance. Cases are grouped to illustrate the scientific impact of this proposal’s individual products (see Figure 2) and the results of a complete ADS All-Sky Survey.

All Sky Literature Heat Map

“Where to look?” At a recent AAS meeting, the Co-I (Gus Muench) was chatting with a graduate student (Adam Ginsburg, U. Colorado) about his use of the ADS literature database. During the conversation, Mr. Ginsburg explained that he had “created ‘heat maps’ [his words!] of the Orion molecular cloud to figure out which parts of the cloud had existing molecular line data (or where it was missing) and where specific science topics (e.g., jets and outflows) had been explored in published works.” Gus Muench was thrilled to hear that the need for subject-specific heat mapping was great enough to drive a graduate student to do it himself for his area of interest. The ADS All-Sky Survey will draw on the constantly-updated full inventory of astrotagged ADS literature, so it will enable on-demand faceted visual discovery of “what science has been explored where” for any target or topic, with no special custom efforts by graduate students.

Discovering Data The faceted heat map of the ADSASS will enhance research far beyond planning for observations: it will also enable new forms of data discovery.

Consider the problem of discovering published data on the “radial velocities of young stars” in a particular region of interest. A purely text-based literature search using those words would miss targets of interest only mentioned in the literature as “host stars” in “exoplanet” surveys or as “stellar vermin” in “extragalactic” studies. One does not, for example, find the Valenti & Fischer (2005) “SPOCS” catalog of stars surveyed for exoplanets in a search for velocities of “young stars near molecular clouds,” because that catalog contains none of those keywords. Yet, the SPOCS catalog contains valuable data for a study of the velocities of young stars.

Now imagine that instead of a text-only search, you could discover, by exploring the faceted heat map of the ADSASS, that one of your favorite nearby star-forming regions is also a favorite of exoplanet hunters. You then wonder how good the stellar radial velocity measurements are in those exoplanet papers, so you look in the papers retrieved by faceting your visualization-aided literature search. In a few papers, you discover valuable stellar radial velocity measurements in your region of interest: data you would likely have missed in a text-only search.

In other examples, researchers could gain insight from the sky-coverage of particular classes of sources. Today’s distribution of gamma-ray-burst papers would, for example, make it very clear that these sources are cosmological, and not related to the orientation of the Solar System, or of our Galaxy. The GRB distribution issue is an old (and now clearly solved) one, but the facetable ADSASS will enable similar future investigations, of much more subtle phenomena, with ease.

Researchers’ explorations of the ADSASS heat map will continually enable discovery of “new” data resources, thanks to the independence of faceted visual (sky-based) searches from the keyword-oriented text-based approach common to searching today.
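To make the idea of a faceted heat map concrete, the following is a minimal sketch in Python, not the ADSASS implementation. It assumes each astrotagged paper is reduced to a sky position and a set of keywords; the record layout, the function names `sky_cell` and `heat_map`, and the sample records are all hypothetical. It bins positions on a simple equal-angle grid, whereas a production service would more likely use an equal-area sky tessellation.

```python
from collections import Counter

def sky_cell(ra_deg, dec_deg, cell_deg=1.0):
    """Map a sky position to an integer (column, row) cell on an equal-angle grid."""
    col = int((ra_deg % 360.0) // cell_deg)    # RA wrapped into [0, 360)
    row = int((dec_deg + 90.0) // cell_deg)    # Dec shifted into [0, 180]
    return col, row

def heat_map(records, facet=None, cell_deg=1.0):
    """Count papers per sky cell, optionally keeping only those tagged with `facet`."""
    counts = Counter()
    for rec in records:
        if facet is not None and facet not in rec["keywords"]:
            continue  # paper does not carry the requested facet
        counts[sky_cell(rec["ra"], rec["dec"], cell_deg)] += 1
    return counts

# Hypothetical astrotagged records (positions in degrees); not real ADS data.
records = [
    {"id": "paper-A", "ra": 83.8, "dec": -5.4, "keywords": {"outflows", "young stars"}},
    {"id": "paper-B", "ra": 83.9, "dec": -5.1, "keywords": {"molecular lines"}},
    {"id": "paper-C", "ra": 83.2, "dec": -5.9, "keywords": {"outflows"}},
    {"id": "paper-D", "ra": 10.7, "dec": 41.3, "keywords": {"exoplanets"}},
]

all_papers = heat_map(records)                      # unfiltered literature density
outflow_papers = heat_map(records, facet="outflows")  # same map, one facet selected
```

Comparing `all_papers` with `outflow_papers` cell by cell is the kind of “what science has been explored where” question the heat map is meant to answer: a cell that is dense in the unfiltered map but sparse for a given facet marks a well-studied region where that topic has received little attention.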

Historical Data Layer

Astrophysical Events A synoptic view of images published in the literature has some of its most powerful science applications in enabling and enhancing the study of astrophysical “events.” Eruptive events, such as an accretion-driven outburst in a very young star, reveal objects that were previously invisible. Synoptic monitoring surveys such as the Palomar Transient Factory are designed to identify these eruptive events (e.g., Covey et al. 2011) but on relatively short timescales. The ADS Historical Data Layer extends the time baseline for many parts of the sky back almost 100 years, much longer than the epoch provided by the Digital Sky Surveys. This layer will capture other useful image types: for example, pioneering near-infrared images first published in the late 1980s can be compared to the NASA IPAC 2MASS data, extending the search for eruptive young stars to redder wavelengths. Eruptive events may well be recurrent on contemporary timescales, as Aspin et al. (2006) have found using archived glass plates to characterize the multiple outbursts of V1647 Orionis in McNeil’s Nebula. Similarly, the ADS Historical Data Layer will provide the long time baselines necessary to identify solitary or recurrent events by capturing data in papers that were not saved and cannot be otherwise recovered. The number of sources where useful time-series can be recovered from literature-based images alone is not likely to be large, but we expect that in certain cases the data currently locked up in historical ADS images will prove critical nonetheless.

Astrophysical Motions Has a supernova remnant expanded or an astrophysical jet shifted over time? Looking for expansion and/or light echoes in SNRs (e.g., the Rest et al. 2011 confirmation of the Cas A asymmetry) or the proper motion of shocked gas are core methods for characterizing the underlying astrophysical process. The recent study by Lang & Hogg (2011), which used thousands of online images of Comet Holmes to derive its orbit, is a good example of how a synoptic survey based on a search for “all images” of one object can yield amazingly accurate science.

The ADSASS Historical Data Layer will not yield the sheer number of images that online searches can for any one object (e.g. for Comet Holmes), but it will give much richer metadata, in the form of the full caption (and article!) that discusses each image. The power of the Historical Data Layer will stem from both the liberated images themselves, and from the richness of the descriptive information attached to those images in the papers from which they are drawn.

The All-Sky ADS Survey

The ADSASS shuffles together “data” and “literature” resources into a single complete view of the extant resources for any astrophysical object on the sky. It is worth repeating that object databases like SIMBAD and NED aggregate objects from multiple resources if and only if a literature reference exists that “ties” them together. Creating object aggregations is the means of building multi-band spectral energy distributions with complete provenance. Creating these new aggregations today is a manual process of cataloging data from ADS and CDS and acquiring, on an image-by-image basis, NASA archival data using tools such as Datascope or Skyview. WWT offers a glimpse of the utility of doing this better: it already includes multi-band images of many galaxies, as well as links to ADS search-by-source from within the program itself. The ADS All-Sky Survey will greatly enhance the scientific workflow by tying literature and data searching together in one resource.
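As an illustration of position-based aggregation with provenance, the sketch below gathers every resource that falls within a given radius of a target position and records where each one came from. It is a simplified sketch under stated assumptions, not a description of how SIMBAD, NED, or the ADSASS aggregate objects: the record fields, the function names, and the sample entries are hypothetical, and a real service would use indexed cone searches rather than a linear scan.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def aggregate(target, resources, radius_deg):
    """Collect resources within radius_deg of the target, keeping their provenance."""
    matches = []
    for res in resources:
        sep = angular_separation_deg(target["ra"], target["dec"], res["ra"], res["dec"])
        if sep <= radius_deg:
            matches.append({"source": res["source"], "kind": res["kind"], "sep_deg": sep})
    return sorted(matches, key=lambda m: m["sep_deg"])  # nearest first

# Hypothetical entries: one literature figure, one archival image, one unrelated field.
target = {"ra": 10.6847, "dec": 41.2687}
resources = [
    {"source": "literature-figure-1", "kind": "literature", "ra": 10.70, "dec": 41.27},
    {"source": "archive-image-1", "kind": "data", "ra": 10.60, "dec": 41.30},
    {"source": "literature-figure-2", "kind": "literature", "ra": 83.80, "dec": -5.40},
]

nearby = aggregate(target, resources, radius_deg=0.5)
```

Because each match keeps its `source` and `kind`, literature and archival data end up in one list for the same patch of sky without losing track of where each entry originated.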

4. PLAN OF WORK

As we hope the detailed Gantt chart above shows, our group and our colleagues have been thinking about how to do this project for quite some time. Our co-location and collaboration with the ADS group as part of our “Seamless Astronomy” efforts13 has already facilitated the creation of ADS Labs itself, and the establishment of an ADS All-Sky map is a logical and exciting next step. Our long-standing collaboration with Microsoft Research, and our association14 with the Virtual Astronomical Observatory (VAO, formerly NVO) and the International Virtual Observatory Alliance (IVOA) position us well to understand what will and will not be possible in the near-term future, and how our efforts on the ADSASS integrate with other efforts amongst IVOA partners (e.g. at CDS). Our extended collaboration (please see letters of support), involving key members of the ADS, CDS, WWT, MAST, and astrometry.net teams, is uniquely suited to create the ADS All-Sky Survey.

The budget requested here represents only the increment, above our currently funded efforts within the Seamless Astronomy collaboration and ADS Labs, needed to enable the creation of the ADSASS.

In the chart above, the column headings have meanings as follows: “Primary” indicates the person doing the majority of the work on a particular task; “Rspnsbl” indicates the person responsible for the task being completed, and often working in close collaboration with the person listed under “Primary;” “Collab,” if filled-in, shows the organization with which we will collaborate most closely for each task listed. The chart has been generated using the online collaborative project management tool smartsheet.com, which we will continue to use to track this project’s development.

5. DATA SHARING PLAN

Our proposed project is essentially a “data sharing” plan. As we have explained above, the full ADSASS will be made available, through ADS and other organizations requesting access, at no charge. The sites using ADSASS services will clearly identify the sponsor for the project as NASA.

We expect that the ADSASS and its underlying database will be of interest to: 1) researchers in astrophysics; 2) information scientists; and 3) the general public. Given the goals of the ADAP program, we have only described the relevance of our work to group “1” (astrophysicists). It is important to point out here, though, that the ADS team has a long history of collaboration with information scientists, and that we will provide customized data sets to that community upon request. In addition, the PI is also PI of the WorldWide Telescope Ambassadors (WWTA) Program, which uses WWT as a STEM teaching tool. We will likely offer a follow-on Outreach proposal to this one, focused on incorporating views of the ADSASS within WWT as a fantastic educational resource.

13 http://projects.iq.harvard.edu/seamlessastronomy/

14 At present, the PI is on the Science Advisory committee of the VAO. The Co-I is involved in user support and testing related to the VAO, leading the “Online Astronomy User Group” at the Center for Astrophysics.

6. RELEVANCE OF THE PROPOSED WORK TO NASA PROGRAMS

The ADS as it was envisioned in the late 1980s and early 1990s (Murray et al. 1992) anticipated today’s environment of an online system of distributed astronomical data resources, integrated via clever software systems, but its goals were not fully technologically achievable until now, more than 20 years later. The ADS visionaries saw that NASA’s archives were about to grow rapidly thanks to the advent of the Great Observatories, but they could not fully foresee the wealth of the additional NASA (and other) data sets that would also become rapidly available to researchers online. As we all know, ADS evolved into a literature service, while CDS and NASA archives and database systems (e.g. NED) primarily took up the data charge. The ADSASS represents a fantastic chance to finally re-unite NASA’s data and literature holdings into the most powerful astrophysical research tool to date. The NASA ADS is already the envy of other sciences as a unified research tool; with the advent of the ADSASS, NASA will have led the way to the future once again.

7. REFERENCES

Aspin, C. et al. 2006 “The 1966-1967 Outburst of V1647 Orionis and the Appearance of McNeil's Nebula”. The Astronomical Journal, vol. 132, p. 1298-1306.

Covey et al. 2011 “PTF10nvg: An Outbursting Class I Protostar in the Pelican/North American Nebula”. The Astronomical Journal, vol. 141, id. 40.

Grady, L. 2006. “Random Walks for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1768-1783.

Huchra, J. & Brodie, J. 1987. “The M87 globular cluster system. I - Dynamics,” Astronomical Journal, vol. 93, p. 779-784. 

Indriolo, N., et al. 2010. “Investigating the Cosmic-ray Ionization Rate Near the Supernova Remnant IC 443 through H3+ Observations”. The Astrophysical Journal, vol. 724, id. 2, pp. 1357-1365.

Jacopille, D. 2011, unpublished student project in Harvard CS171 (taught by Prof. H.P. Pfister, Dr. A. Pepe), “Data Visualization,” http://cs171.org/2011/projects/web/Jacopille_David/jacopille/

Lang, D., Hogg, D. W., Mierle, K., Blanton, M., & Roweis, S. 2010. “Astrometry.net: Blind astrometric calibration of arbitrary astronomical images”, The Astronomical Journal, vol. 139, pp. 1782–1800.

Lang, D. & Hogg, D. W. 2011 “Searching for comets on the World Wide Web: The orbit of 17P/Holmes from the behavior of photographers”. arXiv e-prints, 1103.6038.

Ma W. Y., & Manjunath B. S. 2000. “EdgeFlow: a technique for boundary detection and image segmentation.”, IEEE Trans Image Process., vol. 9, id. 8, pp. 1375-88.

Marazzato, R. & Sparavigna, A.C. 2009 “Astronomical image processing based on fractional calculus: the AstroFracTool”, Instrumentation and Methods for Astrophysics.

Murray, S.S., Brugel, E.W., Eichhorn, G., Farris, A., Good, J.C., Kurtz, M.J., Nousek, J.A., & Stoner, J.L. 1992, “The NASA Astrophysics Data System: A Heterogeneous Distributed Processing System Application,” European Southern Observatory Conference and Workshop Proceedings, 43, 387.

Ofek, E. O. et al. 2010 “Supernova PTF 09UJ: A Possible Shock Breakout from a Dense Circumstellar Wind”. The Astrophysical Journal, vol. 724, id. 2, pp. 1396-1401

O’Sullivan, E., Giacintucci, S., David, L. P., Vrtilek, J. M., & Raychaudhury, S. 2011 “A deep Chandra observation of the poor cluster AWM 4 – II. The role of the radio jets in enriching the intracluster medium”. Mon. Not. R. Astron. Soc., vol. 411, pp. 1833–1842.

Rest, A. et al. 2011 “Direct Confirmation of the Asymmetry of the Cas A Supernova with Light Echoes” The Astrophysical Journal, vol. 732, id. 3.

Valenti & Fischer. 2005 “Spectroscopic Properties of Cool Stars (SPOCS). I. 1040 F, G, and K Dwarfs from Keck, Lick, and AAT Planet Search Programs”. The Astrophysical Journal Supplement Series, vol. 159, p. 141-166.

Goodman

BIOGRAPHICAL SKETCH FOR ALYSSA A. GOODMAN, MAY 2011

Professional Preparation

Massachusetts Institute of Technology, Physics, Sc.B., 1984

Harvard University, Physics, A.M., 1986

Harvard University, Physics, Ph.D., 1989

University of California, Berkeley, Astronomy, President’s Fellow, 1989-92

Appointments

1999- Professor of Astronomy, Harvard University

2008- (Inaugural) Scholar-in-Residence, WGBH Boston

2005-2008 (Founding) Director, Initiative in Innovative Computing, Harvard University

2001-2002 Visiting Fellow, Yale University (Sabbatical)

1996-1999 Associate Professor of Astronomy, Harvard University

1992-1996 Assistant Professor of Astronomy, Harvard University

1995-1997 Head Tutor, Harvard University Astronomy Department

1995- Research Associate, Smithsonian Astrophysical Observatory

1989-1992 President’s Fellow, University of California, Berkeley

Selected Honors, Awards, and Elected Positions

2009 Fellow, American Association for the Advancement of Science (AAAS)

2008 Microsoft Academic Partner

2007 Apple Science Innovator (one of 10, nationally), Apple Computer

2007 Chair of Astronomy, American Association for the Advancement of Science

1998 Newton Lacy Pierce Prize, American Astronomical Society

Annotated Bibliography

Instead of providing a standard (ADS-style) list of publications, I highlight a small fraction of our group’s recent papers where an ADS All-Sky Survey as described in this proposal would have been of especially great value in the study. After each bibliography entry is a “Q” followed by questions explaining why the ADSASS would have been so relevant.

Kirk, H., Pineda, J.E., Johnstone, D. & Goodman, A. 2010, The Dynamics of Dense Cores in the Perseus Molecular Cloud. II. The Relationship Between Dense Cores and the Cloud, ApJ, 723, 457-475 Q What 13CO maps of molecular clouds have been discussed in the literature in regions where many NH3 or N2H+ cores have also been surveyed? How unique is the Perseus database? Are comparisons possible?

Goodman, A. A., Pineda, J. E. & Schnee, S. L. 2009, The "True" Column Density Distribution in Star-Forming Molecular Clouds, ApJ, 692, 91-103 Q Which molecular clouds have been surveyed using NIR extinction mapping, and where are they on the sky? Are they at similar height off the Galactic plane? How many were done between 2000 and the present, when high-quality NIR maps would have been used?

Goodman, A. A., Rosolowsky, E. W., Borkin, M. A., Foster, J. B., Halle, M., Kauffmann, J. & Pineda, J. E. 2009, A role for self-gravity at multiple length scales in the process of star formation, Nature, 457, 63-66 Q Have we missed other regions where all of the kinds of data (molecular line, sub-mm and mm continuum, near-IR, optical, and Spitzer c2d sources) included in this paper have been studied together? And, if so, are any near Perseus, or in regions like Perseus on the sky, where direct comparisons would be possible?

Li, H. B., Dowell, C. D., Goodman, A., Hildebrand, R. & Novak, G. 2009, Anchoring Magnetic Field in Turbulent Molecular Clouds, ApJ, 704, 891-897 Q Where are all the optical polarization maps of background starlight ever made, and how many can we superimpose on the sky thanks to their extraction from figures via astrometry.net? (Those polarization maps give contextual information on the large-scale magnetic field.)

Schnee, S., Li, J., Goodman, A. A. & Sargent, A. I. 2008, Dust Emission from the Perseus Molecular Cloud, arXiv e-prints, 0805.4215 Q In what regions of the sky has line-of-sight dust temperature been measured? Are optical images suitable for extinction map comparisons available there, and what do they look like overlain on dust maps (e.g. within WWT)? How extended are the regions mapped?

Rosolowsky, E. W. et al. 2008, An Ammonia Spectral Atlas of Dense Cores in Perseus, ApJS, 175, 509-521 Q Show me the position of every ammonia core ever mapped, on the sky. Are they all in well-studied molecular clouds (defined as clouds having >10 papers written about them within a defined linear extent)? Are there clouds that look like they’ve been “missed” in earlier core surveys, and should we suggest targeting them?

Pineda, J. E., Caselli, P. & Goodman, A. A. 2008, CO Isotopologues in the Perseus Molecular Cloud Complex: the X-factor and Regional Variations, ApJ, 679, 481-496 Q Where has the X-factor (CO to H2 ratio) been measured on the sky? How significant are large-scale (region-to-region) variations?

Foster, J. B. & Goodman, A. A. 2006, Cloudshine: New Light on Dark Clouds, ApJ, 636, L105-L108 Q What is the general relationship between regions where dust column density has been mapped out (e.g. using extinction) and known HII regions? How significant is the non-uniform radiation field created by bright local HII regions or B stars in comparison with the general interstellar radiation field in illuminating dark clouds and causing “cloudshine”?

Goodman, A. A. & Arce, H. G. 2004, PV Cephei: Young Star Caught Speeding?, ApJ, 608, 831-845 Q Show me, on the sky, every paper ever written about stars in the region between NGC 7023 (where PV Ceph is claimed to have come from) and the present location of PV Ceph (10 degrees away)… then, filter those according to criteria such as “known proper motion” or “young star” or “B-star,” or “fast-moving.”

Management Experience

As (founding) Director of the Initiative in Innovative Computing (IIC) at Harvard, AG built a new institution at Harvard to address and answer scientific questions unanswerable without bringing domain scientists and computer scientists into closer collaboration. The IIC created and fostered one dozen new “e-Science” projects at Harvard, all of which are still ongoing. The projects range from mapping out the wiring diagram of the human brain to collaboration systems to astrophysics. AG stepped down from IIC to re-focus on Astronomy and Data Visualization, but she still leads the “Seamless Astronomy” and “Astronomical Medicine” efforts which grew out of her work at the IIC. The staff of the IIC, which AG supervised when she left, comprised more than 30 scientists, technologists, engineers and staff.

Presently, AG leads the “Seamless Astronomy” effort based at the CfA, which is a collaboration amongst 24 people at three institutions, and she was also PI of the COMPLETE Survey of Star-Forming Regions, which involved 20 investigators in four countries. COMPLETE is widely considered one of the most successful coordinated surveys of the physics of star formation to date.

Community Outreach, and Related Activity

AG is principal scientific advisor to the WorldWide Telescope (WWT) project at Microsoft Research. She is also PI of the “WorldWide Telescope Ambassadors Program” (WWTA; wwtambassadors.org), which uses WWT as a teaching platform online, in schools, and at public science events. WWTA is a collaboration amongst Harvard, Microsoft Research and WGBH (our media partner). The ADSASS will be included as a layer in the public release of WWT and therefore distributed for use by the public at large, both informally and through the WWTA program.

Pepe

BIOGRAPHICAL SKETCH FOR ALBERTO PEPE, MAY 2011

Professional Preparation

University College London, UK, Astrophysics, B.Sc., 2002

University College London, UK, Computer Science, M.Sc., 2003

University of California, Los Angeles, Information Studies, Ph.D., 2010

Appointments

2010- Postdoctoral Research Fellow, Center for Astrophysics, Harvard University

2010- Head Teaching Fellow, School of Engineering and Applied Sciences, Harvard University

2009 Teaching Assistant, Department of Design Media Arts, UC, Los Angeles

2008 Course designer and teaching assistant, Dept. of Information Studies, UC, Los Angeles

2006–2010 Graduate Research Assistant, CENS, Center for Embedded Networked Sensing, LA, CA

2007 Visiting fellow, Digital Library Research and Prototyping Group, LANL, Los Alamos, NM

2004–2006 Marie Curie fellow, CERN, Geneva, Switzerland

2004 Research fellow, CINECA, InterUniversity Consortium, University of Bologna, Italy

Selected Honors, Awards, and Elected Positions

2010 American Society for Information Science & Technology, Best Dissertation Award

2006–2010 Microsoft Research, Technical Computing Fellowship

2004–2006 European Commission, Marie Curie Fellowship

2004 Consiglio Nazionale delle Ricerche (CNR-Italy), Assegno di Ricerca

Selected Relevant Publications

Collaboration in sensor network research: an in-depth longitudinal analysis of assortative mixing patterns. Alberto Pepe, Marko A. Rodriguez. Scientometrics. Volume 84, Number 3. 2010.

From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web. Alberto Pepe, Matthew Mayernik, Christine L. Borgman, Herbert Van De Sompel. Journal of the American Society for Information Science and Technology (JASIST). Volume 61, Issue 3. doi:10.1002/asi.21263. 2010.

Moving Archival Practices Upstream: An Exploration of the Life Cycle of Ecological Sensing Data in Collaborative Field Research. Jillian C. Wallis, Christine L. Borgman, Matthew S. Mayernik, Alberto Pepe. International Journal of Digital Curation. Vol. 3. Issue 1. 2008.

Drowning in Data: Digital Library Architecture to Support Scientists’ Use of Embedded Sensor Networks. Christine L. Borgman, Jillian C. Wallis, Matthew S. Mayernik, Alberto Pepe. Proceedings of ACM IEEE Joint Conference on Digital Libraries 2007.

Protocols for Scholarly Communication. Alberto Pepe, Joanne Yeomans. Astronomical Society of Pacific Conference Series Vol. 377. San Francisco: ASP, 2007. ISBN 978-1-58381-316-4

BIOGRAPHICAL SKETCH FOR AUGUST A. MUENCH

Muench is an Astrophysicist in the High Energy Astrophysics division of the Smithsonian Astrophysical Observatory with duties that include: performing scientific research based on multi-wavelength research and data mining aimed at the scientific exploitation of virtual observatory standards and tools; advising other Observatory scientists in the use of VAO (Virtual Astronomical Observatory) facilities for their research; leading quality assurance and testing of online astronomy tools with a focus on science end user feedback; and developing data publishing protocols for exposing scientist data products as online resources. Muench is a member of the VAO User Support team and a manager of the CfA Seamless Astronomy consortium.

Professional Preparation

Georgia Institute of Technology, Physics, Sc.B., 1995

University of Florida, Astronomy, Ph.D., 2002

Appointments

2010- Astrophysicist, Smithsonian Astrophysical Observatory

2009-2010 Infrared Astronomer, Smithsonian Astrophysical Observatory

2008-2009 Research Associate, Harvard College Observatory; Program Manager, WorldWide Telescope Pro, Initiative in Innovative Computing at Harvard

2005-2008 Visiting Scientist, Smithsonian Astrophysical Observatory

2003-2005 Postdoctoral Fellow, Smithsonian Astrophysical Observatory

2002-2003 Research Fellow, Spitzer Science Center, California Institute of Technology

Service

Served as Publications Referee for the Astronomical Journal, the Astrophysical Journal, Monthly Notices, and Astronomy and Astrophysics. Served as Proposal Reviewer for NASA ADAP, EPOESS.

Relevant Experience

− Program manager for WorldWide Telescope Pro, a project (Goodman, PI) funded by Microsoft Research (MSR) to integrate professional research tools, such as virtual observatory queries, into the WWT interface.

− Lead on a NASA funded EPO project to use WWT to explain to the general public how NASA scientists use multi-wavelength data to answer fundamental questions about the universe.

− Performed data mining of the ADS and arXiv literature databases to explore how datasets were cited and linked in the refereed literature. Oral presentation of results at 218th AAS Meeting, May 2011.

− Collaborating with an existing data repository system at Harvard to create a platform, interoperable with ADS, that enables future papers to include persistent links to the data sources used within them.

− Leads the Online Astronomy User Group at the Observatory with the dual goals of first, exposing or training research scientists in using online astronomy tools and second, gathering feedback on the science enabling features and needs of the research community.

11 rue de l’Université F-67000 Strasbourg Tél. : +33 (0)3 68 85 24 10 Fax : +33 (0)3 68 85 24 32 www.unistra.fr

Prof. Alyssa Goodman
60 Garden Street, MS 42
Cambridge, MA 02138

Strasbourg, 12 May, 2011

Dear Alyssa,

I acknowledge that I am identified by name as collaborator to the investigation, entitled “ADS All-Sky Survey” that you submitted to the NASA Research Announcement NNH11ZDA001N-ADAP, and that I intend to carry out all responsibilities identified for me in this proposal. I understand that the extent and justification of my participation as stated in this proposal will be considered during peer review in determining in part the merits of this proposal. I have read the entire proposal, including the management plan and budget, and I agree that the proposal correctly describes my commitment to the proposed investigation. For the purposes of conducting work for this investigation, my participating organization is the Observatoire astronomique de Strasbourg.

I work as a software engineer at CDS (Centre de Données astronomiques de Strasbourg). CDS collects and distributes curated astronomical information through online services such as SIMBAD, which provides basic data, bibliography and cross-identifications for more than 5 million astronomical objects.

My tasks at CDS include the development of the Aladin visualization tool and the development of Virtual Observatory standards and tools.

I think that my expertise in visualization and in handling large data, as well as my advanced knowledge of CDS services, will help in reaching the goals of the project.

Sincerely,

Thomas Boch CDS, Observatoire astronomique de Strasbourg [email protected]


Alyssa Goodman 60 Garden Street, MS 42 Cambridge, MA 02138

3 May, 2011

Dear Prof. Alyssa Goodman,

I acknowledge that I am identified by name as collaborator to the investigation, entitled “ADS All-Sky Survey” that you submitted to the NASA Research Announcement NNH11ZDA001N-ADAP, and that I intend to carry out all responsibilities identified for me in this proposal. I understand that the extent and justification of my participation as stated in this proposal will be considered during peer review in determining in part the merits of this proposal. I have read the entire proposal, including the management plan and budget, and I agree that the proposal correctly describes my commitment to the proposed investigation. For the purposes of conducting work for this investigation, my participating organization is the Smithsonian Astrophysical Observatory.

With over 4.5 million scanned pages of astronomical literature, the NASA Astrophysics Data System (ADS) article archive provides a potential source of observational image data and metadata waiting to be mined. It is in our communal interest to work with projects such as the “ADS All-Sky Survey” to explore novel ways to extract and expose this information in a systematic way. Additionally, the integration of novel visualization tools with ADS’s own UI efforts will provide new functionality to the ADS Labs platform (http://adslabs.org). We plan to collaborate with the project members to provide them access to the literature archive, modifying our access APIs as appropriate to support this and other text and image mining efforts by the community. Our expertise in image processing, feature extraction and metadata enrichment will be useful to accomplish the stated goals.

Sincerely, Alberto Accomazzi Program Manager, NASA Astrophysics Data System [email protected]


Microsoft Corporation is an equal opportunity employer.

Alyssa Goodman 60 Garden Street, MS 42 Cambridge, MA 02138

3 May, 2011

Dear Prof. Alyssa Goodman,

I acknowledge that I am identified by name as collaborator to the investigation, entitled “ADS All-Sky Survey” that you submitted to the NASA Research Announcement NNH11ZDA001N-ADAP, and that I intend to carry out all responsibilities identified for me in this proposal. I understand that the extent and justification of my participation as stated in this proposal will be considered during peer review in determining in part the merits of this proposal. I have read the entire proposal, including the management plan and budget, and I agree that the proposal correctly describes my commitment to the proposed investigation. For the purposes of conducting work for this investigation, my participating organization is Microsoft Research.

I bring to this project my skills in software development and astronomical image processing. I have worked at Microsoft for 18 years, and have led the development of several extensive projects in both imaging and astronomy fields. For the last four years I have served as the principal software architect on Microsoft Research WorldWide Telescope. In that capacity I have been involved in Virtual Observatory projects and standards bodies (NVO/VOA, IVOA), and I have actively collaborated with university astronomy research departments and planetariums. The resources of the WorldWide Telescope client and server infrastructure can easily accommodate the needs of the project, and I will be collaborating in making these resources available and accessible to the project.

Sincerely,

Jonathan Fay Microsoft Research


Alyssa Goodman 60 Garden Street, MS 42 Cambridge, MA 02138

3 May, 2011

Dear Prof. Alyssa Goodman,

I acknowledge that I am identified by name as collaborator to the investigation, entitled “ADS All-Sky Survey” that you submitted to the NASA Research Announcement NNH11ZDA001N-ADAP, and that I intend to carry out all responsibilities identified for me in this proposal. I understand that the extent and justification of my participation as stated in this proposal will be considered during peer review in determining in part the merits of this proposal. I have read the entire proposal, including the management plan and budget, and I agree that the proposal correctly describes my commitment to the proposed investigation. For the purposes of conducting work for this investigation, my participating organization is the Space Telescope Science Institute.

Dr. Alberto Conti is an Astronomer and the NASA Multimission Archive Scientist at the Space Telescope Science Institute (MAST). He is well known for his work on Galaxy Formation and numerous pieces of work on sky survey data. Dr. Conti has been a leader in developing data mining tools to deliver Optical and UV data to the astronomical community as part of MAST for the past 9 years. As Hubble Chief Engineer, he was responsible for the Data Management System of the Hubble Space Telescope, and has been an early proponent of the use of modern data mining and visualization techniques across the astronomical community. He is the co-creator of the GoogleSky concept and a team member of Microsoft's World Wide Telescope. Dr. Conti will provide continuing advice to the “ADS All-Sky Survey” team, participate in regular team meetings, and assist in the analysis, interpretation, and publication of the results from this proposal.

Sincerely,

Dr. Alberto Conti MAST Archive Scientist Space Telescope Science Institute
