Pan-STARRS PS1 Published Science Products Subsystem
Presentation to the PS1 Science Council August 1, 2007
What is PSPS?
• Responsible for managing the catalogs of digital data
• PS1 PSPS will not receive image files, which are retained by IPP
• Three significant PS1 I/O threads:
– Ingest of detections and initial celestial object data from IPP
– Ingest of moving object data from MOPS
– User queries of detection/object data records
What is PSPS?
• Web Based Interface (WBI) – the “link” with the human
• Data Retrieval Layer (DRL) – the “gate-keeper” of the data collections
• PS1 data collection managers
– Object Data Manager (ODM)
– Solar System Data Manager (SSDM)
• Provide the connection protocol for other (future/PS4) data collection managers; e.g.,
– “Postage stamp” cutouts
– Complete Metadata database
– Cumulative sky image server
– Filtered transient database (and other special clients)
[Diagram: Human; WBI and other software clients; DRL; data managers (ODM, SSDM, other DM); upstream IPP and MOPS subsystems.]
PSPS Components Overview/Terminology
• DRL: Data Retrieval Layer
– Software clients, not humans, are PDCs
– Connects to DMs
• PDC: Published Data Client
– WBI: Web Based Interface
– External PDCs (non-PSPS)
• DM: Data Manager (generic)
– ODM: Object Data Manager
– SSDM: Solar System Data Manager
[Diagram: PSPS component architecture. Labeled elements: WBI and other Published Data Clients; DRL with its Standard User API, Administrator API, and Data Manager API; ODM and SSDM; the PSPS-IPP and PSPS-MOPS interfaces (metadata, detections, raw science data, science data, IDs); Preferred Science Client (Data Provider); future Pan-STARRS subsystem; non-Pan-STARRS system. Legend: interface contract, interface dependency, Pan-STARRS subsystem, PSPS component, future component, data manager.]
PSPS Development Status
• DRL - a Request For Proposals has been issued to software developers to code the DRL designed by SAIC. This layer includes APIs to connect to the web clients and the databases.
• ODM - a cooperative agreement is being developed with Johns Hopkins University’s Department of Physics & Astronomy to develop the ODM, leveraging their experience from the Sloan Digital Sky Survey database work.
• SSDM - will be a working clone of the MOPS science client database (and a hot spare for the MOPS system).
PSPS Development Status
• WBI - Web clients to access the ODM and SSDM will include those already developed for the MOPS, the “Gator” clone developed at IfA, and a port of the SDSS “CasJobs” client. These will use the new DRL API being developed in the lead item above.
• End-to-end testing of the PSPS structure can be accomplished using the DRL, the ported MOPS web client, and a MOPS clone on the backend. This can be done while the ODM is still under development.
The Object Data Manager
• The ODM is the major component of the PSPS, both in terms of size and complexity. It’s more than a simple archive.
• The ODM will hold & provide user access to:
– Catalogs of all individual focal plane (P2) detections.
– Catalogs of detections from all stacked images.
– Catalogs of all derived objects.
– Catalogs of high-significance detections in difference images (when they become available).
– “Blobs” of low-significance detections from difference images.
– Sufficient metadata to allow the user to determine the provenance of any observation.
ODM - Not Your Traditional Astronomical Database!
• Unlike SDSS or 2MASS, we are not waiting until the project is over to generate the database; we’ll publish data as we go!
• Data releases? The concept doesn’t apply here! We will probably keep monthly snapshots of the object catalog as the project proceeds.
• Our logical data structure will allow the user to track how an object’s properties change as new (better) information is added over time. (It’s possible but not necessarily easy!)
ODM Prototyping Goals
The prototyping effort now underway at JHU is intended to demonstrate:
– Data ingest (primarily detection to object correlation)
– Scalability (physical data schema) - aka partitioning
– Publishing (moving data from ingest pipeline to query side storage) in a way that has minimal impact on queries
Prototype ODM Structure
[Diagram: prototype ODM structure. Labeled elements: Web Based Interface (WBI); Query Manager (QM); Data Storage (DS), with a main PS1 database (PartitionsMap, Objects, LnkToObj, Meta, Detections) joined by linked servers to partition databases P1 to Pm, each holding [Objects_p], [LnkToObj_p], [Detections_p], and Meta; Data Transformation Layer (DX); Data Loading Pipeline (DLP), with LoadAdmin (PartitionMap) joined by linked servers to LoadSupport1 to LoadSupportn, each holding objZoneIndx, orphans, Detections_l, and LnkToObj_l. Legend: database, full table, [partitioned table], output table, partitioned view.]
Existing Components (from SDSS)
The prototype will utilize the following existing SDSS components:
– Data Loading Pipeline (sqlLoader)
– Self-extracting Documentation & Diagnostics
– SQL Query Workbench (CasJobs)
– Spatial Library (Spherical/HTM)
Functionality Under Development
New components for the prototype include:
– Data Transformation Layer (input to loader)
– Simulated Data (SDSS data & simulated galactic plane)
– Sample Queries (verify query performance)
– Cross-Match Functionality (detection-object correlation)
– Data Partitioning Procedures (partition across multi-node cluster for parallel data access)
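The detection-object correlation step listed above can be illustrated with a zone-based cross-match. This is only a sketch under stated assumptions: the slides do not describe the matching algorithm, so the declination-zone bucketing, the function names (zone_of, angular_sep_deg, cross_match), the 0.01 degree zone height, and the 1 arcsecond radius are all hypothetical. The returned links and orphans merely echo the LnkToObj and orphans tables named in the prototype diagram.

```python
import math
from collections import defaultdict


def zone_of(dec_deg, zone_height_deg):
    """Index of the declination zone containing dec_deg (zone 0 starts at -90)."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(detections, objects, radius_deg, zone_height_deg=0.01):
    """Link each detection to its nearest object within radius_deg.

    detections and objects map an id to (ra_deg, dec_deg).
    Returns (links, orphans): links maps detection id to object id,
    orphans lists detections with no object inside the radius.
    """
    # Bucket objects by declination zone (hypothetical stand-in for a zone index).
    zones = defaultdict(list)
    for obj_id, (ra, dec) in objects.items():
        zones[zone_of(dec, zone_height_deg)].append((obj_id, ra, dec))

    # Number of neighbouring zones that can contain a match.
    reach = math.ceil(radius_deg / zone_height_deg)

    links, orphans = {}, []
    for det_id, (ra, dec) in detections.items():
        z = zone_of(dec, zone_height_deg)
        best_id, best_sep = None, radius_deg
        for zz in range(z - reach, z + reach + 1):
            # Toy version: scans the whole zone; a real index would also cut on RA.
            for obj_id, ora, odec in zones.get(zz, ()):
                sep = angular_sep_deg(ra, dec, ora, odec)
                if sep <= best_sep:
                    best_id, best_sep = obj_id, sep
        if best_id is None:
            orphans.append(det_id)
        else:
            links[det_id] = best_id
    return links, orphans


# Made-up coordinates, for illustration only.
objects = {1: (10.0, 20.0), 2: (10.0, 20.5)}
detections = {"a": (10.0001, 20.0001), "b": (10.0, 20.4999), "c": (200.0, -5.0)}
links, orphans = cross_match(detections, objects, radius_deg=1.0 / 3600.0)
```

Because only objects in nearby zones are compared, zones can be assigned to different nodes and matched in parallel, which is the motivation for pairing cross-matching with partitioning.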
PS1 Logical Data Schema
PS1 Data Tables & Sizes
tablename           cols  byte/row      rows  total (TB)   Prototype   DR1  comments
AltModels              7      1547        10   1.547E-08   1.547E-08     0
CameraConfig           5       287        30    8.61E-09    8.61E-09     0
FileGroupMap           4      4335       100   4.335E-07   4.335E-07     0
IndexMap               7      2301       100   2.301E-07   2.301E-07     0
Objects               88       420  5.50E+09        2.31       0.693  2.31  5 billion stars + 500 million galaxies = total number of objects
ObjZoneIndx            7        63  5.50E+09      0.3465     0.10395  0.35  for circular and especially neighbor queries [optional but good to have it at least in prototype]
PartitionMap           3      4111       100   4.111E-07   4.111E-07     0
PhotoCal              10       151      1000    1.51E-07    1.51E-07     0  Long-term stability of camera
PhotozRecipes          2       267        10    2.67E-09    2.67E-09     0  Descriptors of photo-z algorithms
SkyCells               2        10     50000   0.0000005   0.0000005     0  Definitions of regions
Surveys                2       267        30    8.01E-09    8.01E-09     0  Survey index and text descriptor
DropP2ToObj            4        39  4.00E+06    0.000156   1.337E-05     0  Are these two really the same?
DropStackToObj         4        39  4.00E+06    0.000156   1.337E-05     0
P2AltFits             13        71  1.51E+10     1.06855     0.09159  0.31  10% of P2 detections x 3.5 years
P2FrameMeta           18       343  1.05E+06  0.00036015   3.087E-05     0
P2ImageMeta           64      2870  6.72E+07    0.192864   0.0165312  0.06  1000 images/night x 64/frame x 300 nights x 3.5 years
P2PsfFits             34       183  1.51E+11     27.5415      2.3607  7.87  total P2 detections/yr x 3.5 years
P2ToObj                3        31  1.51E+11      4.6655      0.3999  1.33  Linking table - same size as P2PsfFits detections
P2ToStack              2        15  1.51E+11      2.2575      0.1935  0.65
StackDeltaAltFits     13        71  3.68E+09    0.260925    0.022365  0.07  10% of StackHiSigDeltas - comets, trails etc.
StackHiSigDeltas      32       167  3.68E+10     6.13725     0.52605  1.75  7 sq deg x 5000/image x 1000 images/night x 300 nights x 3.5 years (upper bound)
StackLowSigDelta       2      5000  1.65E+06     0.00825   0.0007071     0  Numerical noise - varbinary (FITS table) so need to get average size
StackMeta             49      1551    700000   0.0010857   0.0003257     0  30000 (for 3-pi survey) x 5 filters (round to 200k) x 3.5 years
StackModelFits       131       535  7.50E+09      4.0125   0.3439286  1.15  number of galaxies x 3 copies x 5 filters
StackPsfFits          44       215  8.25E+10     17.7375   1.5203571  5.07  total objects x 5 filters x 3 copies
StackToObj             4        39  8.25E+10      3.2175   0.2757857  0.92  Linking table same size as StackP2Fits
StationaryTransient    2        23  5.00E+08      0.0115   0.0009857     0  Linking table - assume 10% of stars are transients
sum                                           69.7695986   6.5497356  21.8  Total data size
indices                                       13.9539197   1.3099471  4.37  Assume 20% overhead for database indices
total                                         83.7235183   7.8596827  26.2  Total size of database
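The sizing arithmetic behind the table can be reproduced in a few lines. This is a minimal sketch: the function names are hypothetical, and the use of decimal terabytes (1e12 bytes) is an assumption inferred from the Objects row, where 5.50E+09 rows x 420 bytes gives 2.31 TB. Some row counts in the table are rounded for display, so not every row reproduces to the last digit.

```python
BYTES_PER_TB = 1e12  # assumption: decimal terabytes, consistent with the Objects row


def table_size_tb(rows, bytes_per_row):
    """Raw table size in TB: row count times bytes per row."""
    return rows * bytes_per_row / BYTES_PER_TB


def with_index_overhead(data_tb, overhead=0.20):
    """Database size after adding the index overhead (20% in the table)."""
    return data_tb * (1.0 + overhead)


# Values copied from the table.
objects_tb = table_size_tb(5.50e9, 420)      # Objects
stack_psf_tb = table_size_tb(8.25e10, 215)   # StackPsfFits
total_tb = with_index_overhead(69.7695986)   # "sum" row plus indices

print(f"Objects: {objects_tb:.2f} TB")
print(f"StackPsfFits: {stack_psf_tb:.4f} TB")
print(f"Total with indices: {total_tb:.4f} TB")
```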
User Interfaces
• The DRL authenticates “users” on a per machine basis.
• Our initial implementation will be via a secure web server providing access to the following clients:
– A port of CasJobs from SDSS to access the ODM
– A “Gator” like menu driven tool to access the ODM
– Perl tools (developed by MOPS) to access the SSDM
• Machine access (PDCs) will be configured to attach to the DRL directly.
System Expansion
• The addition of other data collections (e.g., value-added products) is accommodated within our PSPS design.
– The basic PSPS design provides well-defined APIs to the outside (WBI and PDCs) and inside (databases)
– Data collections need not be Relational Database Management Systems (RDBMS), but must obey the DRL-DM API
– Databases need not all be the same type, e.g., ODM will use MSSQL and SSDM will be built on MySQL.
• Although not part of the original PSPS design, we can provide inter-database communications (below the DRL) via well-defined mechanisms (e.g., ODBC, JDBC) to allow queries that cross the data collections hosted by the PSPS. These would be limited to read operations.
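The idea that any data collection can sit behind the DRL as long as it obeys a common contract can be sketched as follows. This is an illustration only, not the actual DRL-DM API: the class names (DataManager, SqlDM, ListDM, DRL) and the single find method are hypothetical, and an in-memory SQLite database stands in for a relational back end such as MSSQL or MySQL.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataManager(ABC):
    """Hypothetical read-only contract every data collection must satisfy."""

    @abstractmethod
    def find(self, column, value):
        """Return all records whose column equals value, as a list of dicts."""


class SqlDM(DataManager):
    """A data manager backed by a relational database (SQLite as a stand-in)."""

    def __init__(self, conn, table):
        if not table.isidentifier():
            raise ValueError("invalid table name")
        self.conn, self.table = conn, table

    def find(self, column, value):
        if not column.isidentifier():
            raise ValueError("invalid column name")
        cur = self.conn.execute(
            f"SELECT * FROM {self.table} WHERE {column} = ?", (value,))
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]


class ListDM(DataManager):
    """A non-relational data collection: a plain list of records."""

    def __init__(self, records):
        self.records = list(records)

    def find(self, column, value):
        return [dict(r) for r in self.records if r.get(column) == value]


class DRL:
    """Toy retrieval layer that routes read requests to registered managers."""

    def __init__(self):
        self._dms = {}

    def register(self, name, dm):
        if not isinstance(dm, DataManager):
            raise TypeError("data collections must implement DataManager")
        self._dms[name] = dm

    def find(self, dm_name, column, value):
        return self._dms[dm_name].find(column, value)


# Made-up records, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (obj_id INTEGER, ra REAL)")
conn.executemany("INSERT INTO objects VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

drl = DRL()
drl.register("ODM", SqlDM(conn, "objects"))
drl.register("SSDM", ListDM([{"orbit_id": 7, "label": "demo"}]))

odm_rows = drl.find("ODM", "obj_id", 2)
ssdm_rows = drl.find("SSDM", "orbit_id", 7)
```

Clients see the same call regardless of what is behind it, which is what lets a new data collection be added without changing the WBI or other PDCs.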
Development Schedule
• Award DRL development contract - August 2007
• ODM Prototyping through end of September 2007
• Critical Design Review - end of October 2007
• Hire PSPS Software Engineers (IfA & JHU) - October 2007
• Complete DRL development and perform integration & end-to-end tests using MOPS DB and web interface - April 2008
• Complete integration of the ODM from JHU into the PSPS & full subsystem testing of the system - August 2008