Bill Reach US Planck Data Analysis Review • 9-10 May 2006
US Planck Data Analysis Review
The Planck Early Release Compact Source Catalog - Status
Bill Reach, Gene Kopan, and Tim Pearson
Early Release Compact Source Catalog (ERCSC)
What it is
– A list of compact bright sources
  • Galaxies, quasars, AGN
  • SZ clusters
  • Compact interstellar clouds (HII regions and molecular clouds)
  • Others
– A catalog using both HFI and LFI data
– Returned to DPCs within 6 months of the first “full” sky coverage
– Released to the public 3 months after it is provided to the DPCs
– Intended for rapid follow-up of “interesting” sources while Herschel (and other systems) are still available
What it is not
– It is not a real-time detection system
  • Not even near-real-time
  • Not appropriate for follow-up of flaring sources
  • Not appropriate for follow-up of new solar system objects
– It is not the final Planck catalog
  • Calibration will not be finalized
  • Completeness levels will not be guaranteed
  • There may be false detections
– It is not a polarization catalog
  • Polarization-sensitive bolometers will be summed
ERCSC Specifications
Requirements
– Source lists, one for each frequency (9 altogether)
  • No requirements on band-merging, uncertainties, or confused areas (e.g. Galactic plane); best reasonable efforts will be made
  • No requirements for polarization, but ERCSC can be run on Stokes Q, U maps on request from DPCs
  • The DPCs will release the catalogs 3 months after receipt
Goals
– 90% reliability of compact source identifications (over cutoff)
Expected Performance
– Flux density cutoff: SNR 10 or better
– Flux density accuracy: better than 30%
– Positional accuracy: better than FWHM/5 (1 sigma radial)
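The expected-performance numbers above can be turned into a simple per-source check. The sketch below is illustrative only: the beam widths in `ASSUMED_FWHM_ARCMIN` are assumed nominal values for the nine bands and do not come from these slides, and the `Detection` fields and function names are hypothetical.

```python
from dataclasses import dataclass

# ASSUMPTION: illustrative beam FWHM (arcmin) per band (GHz); not from the slides.
ASSUMED_FWHM_ARCMIN = {30: 33.0, 44: 24.0, 70: 14.0, 100: 10.0, 143: 7.1,
                       217: 5.0, 353: 5.0, 545: 5.0, 857: 5.0}


@dataclass
class Detection:
    """Hypothetical container for one single-band detection."""
    freq_ghz: int
    snr: float             # detection signal-to-noise ratio
    flux_frac_err: float   # total fractional flux density error
    pos_err_arcmin: float  # 1-sigma radial position error


def position_limit_arcmin(freq_ghz: int) -> float:
    """Positional accuracy limit FWHM/5 for the given band."""
    return ASSUMED_FWHM_ARCMIN[freq_ghz] / 5.0


def meets_expected_performance(det: Detection) -> bool:
    """True if the detection satisfies all three expected-performance lines."""
    return (det.snr >= 10.0
            and det.flux_frac_err < 0.30
            and det.pos_err_arcmin < position_limit_arcmin(det.freq_ghz))
```

With these assumed beams, a 143 GHz source would need a radial position error below about 1.4 arcmin to pass.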
ERCSC Dependencies
Required Inputs from HFI and LFI DPCs
– Calibrated, cleaned data with pointing reconstruction for LFI and HFI within 4 weeks of receipt on ground (e.g. weekly deliveries)
– Detector calibration (instrument model) for both LFI and HFI
  • ERCSC to be produced from calibrated data available two months before catalog due date (i.e. calibration available at S1+4 months)
– HFI/LFI L2 pipeline codes (and installation help) for preprocessing (NERSC/IPAC)
– Realistic simulated data for development and testing
  • A substantial fraction of a full-sky survey is required, at all frequencies and including as much realism as possible, before launch for integration testing
USPDC provides to HFI and LFI DPCs
– Source lists within 6 months of completion of first full sky survey coverage
– Code that produced source list, within six months of delivery of source list
  • USPDC will participate in DPC and Planck end-to-end test schedules until launch. Intermediate code deliveries will be made then.
  • There is no requirement that the ERCSC code will run outside of the USPDC environment.
  • The USPDC will make no formal code deliveries between launch and delivery of the catalog.
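The delivery deadlines quoted so far all hang off one date, the completion of the first full sky coverage (read here as "S1"). The sketch below works through that arithmetic. It assumes S1 means completion of the first sky survey, the example date is hypothetical, and the helper names are illustrative.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months (day clamped to 28 so the result is valid)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def ercsc_schedule(s1: date) -> dict:
    """Latest dates implied by the slides, relative to S1 (assumed meaning:
    completion of first full sky coverage)."""
    return {
        "calibration_available": add_months(s1, 4),   # two months before catalog due
        "source_lists_to_dpcs": add_months(s1, 6),    # within 6 months of S1
        "public_release": add_months(s1, 6 + 3),      # 3 months after DPC receipt
        "code_delivery": add_months(s1, 6 + 6),       # within 6 months of source list
    }


# Hypothetical S1 date, for illustration only.
schedule = ercsc_schedule(date(2010, 1, 15))
```

These are upper bounds ("within N months"), so actual deliveries could be earlier.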
ERCSC Processing
[Flow diagram]
HFI DPC, LFI DPC
→ (FTP) IPAC Planck Data Archive
→ HFI L2 Preprocessing / LFI Preprocessing (L2 products)
→ (FTP) USPDC Destriping / Mapmaking
→ ERCSC Source Detection / Extraction
→ ERCSC QA / Catalog Production
→ ERCSC
Destriping / mapmaking steps:
• High-pass filter (optional)
• Assign pointing to samples
• Destripe (Springtide)
• Assign samples to pixels
• Coadd into maps
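The last two mapmaking steps (assign samples to pixels, coadd into maps) amount to a binned average. The sketch below assumes pixel indices have already been computed from the pointing; it does not use Springtide or the Unified Mapmaking Code, and `coadd_map` is a hypothetical name. It returns the three per-pixel products a per-detector map would carry: signal, sample variance, and counts.

```python
import numpy as np


def coadd_map(pix, tod, npix):
    """Bin time-ordered samples into a map.

    pix  : integer pixel index of each sample (assumed precomputed from pointing)
    tod  : calibrated sample values, same length as pix
    npix : number of pixels in the output map
    Returns (signal, variance, counts); unobserved pixels are NaN.
    """
    pix = np.asarray(pix)
    tod = np.asarray(tod, dtype=float)
    counts = np.bincount(pix, minlength=npix)
    sum1 = np.bincount(pix, weights=tod, minlength=npix)
    sum2 = np.bincount(pix, weights=tod ** 2, minlength=npix)

    signal = np.full(npix, np.nan)
    hit = counts > 0
    signal[hit] = sum1[hit] / counts[hit]

    # Unbiased sample variance needs at least two hits in a pixel.
    variance = np.full(npix, np.nan)
    multi = counts > 1
    variance[multi] = ((sum2[multi] - counts[multi] * signal[multi] ** 2)
                       / (counts[multi] - 1))
    return signal, variance, counts
```

Destriping would be applied to `tod` before this step; it is left out here to keep the sketch short.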
ERCSC Operations
ERCSC generated from 1st sky coverage
Receive data…weekly major updates
Make new maps…monthly map updates
– Must be able to make map from 7 months of data in << 1 month
Make new source list…monthly update
Software upgrades
– Allow for at least 2 complete reprocessings (software versions) before delivery
– Versions:
  • P1 = prelaunch pipeline
  • P2 = first post-launch update, 1-3 months after getting 1st survey data
  • P3 = final update, operational version to generate product delivered back to Planck project
ERCSC Processing steps
Input calibrated, position-tagged TOD
– Will use PIOremote and L2 pipeline for HFI
– Will use calibrated timeline in exchange format for LFI (under negotiation)
Clean glitches & instrument signatures
Destripe to match scans (Springtide)
Generate maps per detector
– Signal, sample variance, counts
– Unified Mapmaking Code same for both HFI and LFI
Filter maps with symmetric ‘matched’ compact source filters
Detect and extract sources from per band maps
Band merge (best efforts)
Quality Analysis
Output catalog
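The filter-then-detect steps can be sketched on a small flat-sky patch. This is a minimal illustration, not the ERCSC implementation: it assumes a circular Gaussian beam and white noise of known level (in which case the matched filter is the beam itself), and all function names are hypothetical.

```python
import numpy as np


def gaussian_beam(shape, fwhm_pix):
    """Peak-normalized circular Gaussian centred on pixel (0, 0), periodic grid."""
    sigma = fwhm_pix / np.sqrt(8.0 * np.log(2.0))
    y = np.fft.fftfreq(shape[0]) * shape[0]   # signed pixel offsets 0, 1, ..., -1
    x = np.fft.fftfreq(shape[1]) * shape[1]
    r2 = y[:, None] ** 2 + x[None, :] ** 2
    return np.exp(-0.5 * r2 / sigma ** 2)


def matched_filter_snr(image, fwhm_pix, noise_sigma):
    """White-noise matched filter: correlate the map with the beam.

    Returns (amplitude, snr): the best-fit source amplitude at every pixel
    and its signal-to-noise ratio.
    """
    beam = gaussian_beam(image.shape, fwhm_pix)
    # The beam is symmetric, so convolution equals correlation.
    corr = np.fft.irfft2(np.fft.rfft2(image) * np.fft.rfft2(beam), s=image.shape)
    norm = np.sum(beam ** 2)
    return corr / norm, corr / (noise_sigma * np.sqrt(norm))


def find_peaks(snr, threshold=10.0):
    """Pixels at or above threshold that are the maximum of their 3x3 neighbourhood."""
    peaks = []
    for y, x in zip(*np.nonzero(snr >= threshold)):
        neigh = snr[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        if snr[y, x] >= neigh.max():
            peaks.append((int(y), int(x)))
    return peaks
```

The default threshold of 10 mirrors the SNR 10 flux density cutoff. Real maps have correlated noise and foregrounds, so the filter shape would need to account for the noise power spectrum.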
ERCSC Development Tasks
HFI L2 Preprocessing code
– Port to local machines
– Evaluate/augment
USPDC Mapmaking/GCP/M3 code
– Port to local machines
– Evaluate/augment
Point source detection/extraction algorithms
– Evaluate/trade off
– Implement
Bandmerging code
Quality assessment tools
Pipeline & software integration
End-to-end testing
– In conjunction with major Planck tests, ingest data at IPAC and attempt to run those parts of ERCSC software that exist
Near-Term Plans FY 06
Obtain all-frequency simulated images with point sources
– Simulations underway by three groups
  • JPL/ADG (soon): all frequency, single “super” bolometer, 1 sky coverage TOD (30 GHz done, 217 GHz part done)
  • HFI/IAP: 40-day TOD simulations for ERCSC and data transfer test (in conjunction with DM test)
  • LFI: TBD for data transfer test
Evaluate point source detection/extraction algorithms (Pearson)
Get HFI/LFI L2 (DM) preprocessing codes running locally
– Test data ingestion using data from HFI/LFI transfer tests
Get USPDC Unified Mapping/GCP/M3 code running locally
– Generate maps for detection/extraction testing
ERCSC Milestones
Inter-related milestones related to data transfer and handling are in a separate schedule.

5/06 Obtain Level S simulations, all frequencies
6/06 Install PISTOU, kst
5/06 Obtain predicted source catalog for all frequencies
2/07 Predict ERCSC contents (number, type of sources; simulated source lists)
7/07 Run Level S data through L2 testbed

12/06 Select detect/extract option
4/07 Build detect/extract modules
7/07 Integrate detect/extract with IPAC implementations of L2 pipelines
12/07 Run detect/extract on simulated data
3/08 Design/develop Catalog generator
5/08 Integrate Catalog generator with USPDC pipeline
7/08 Run Level S data through Catalog generator

7/07 Design/develop QA tool
4/08 Design/modify/develop Bandmerger for QA
5/08 Bandmerge on simulated source lists
5/08 Develop ability to associate extractions with prior catalogs
9/08 Test ability to associate extracted sources with prior sources (C&R)
US Planck Data Analysis Review
Source Extraction
Tim Pearson
ERCSC Source Extraction (1/3)
Locate sources in single-band maps (detection)
– Filter maps and search for peaks
  • Matched filters (with assumptions about noise statistics)
  • “Robust” filters (e.g., Mexican hat)
  • Bayesian methods
Estimate source parameters (extraction)
– Position, flux density
– Polarization (Q, U) for bright sources
– Angular size (resolved sources): best effort
Band merge (best efforts)
– Cross-correlation of single-band catalogs
– Parameter estimation using simultaneous fits to several bands
  • Spectral index
  • SZ cluster detection (?)
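Band merging by cross-correlation of single-band catalogs can be sketched as a nearest-neighbour positional match, followed by a two-point spectral index for each matched pair. This is an illustration under stated assumptions: catalogs are lists of dicts with hypothetical keys `ra`, `dec` (degrees) and `flux`, the match radius is a free parameter, the search is brute force, and the index uses the convention S ∝ ν^α.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def band_merge(cat_a, cat_b, radius_deg):
    """For each source in cat_a, find the nearest cat_b source within radius_deg.

    Returns a list of (index_a, index_b, separation_deg).
    """
    pairs = []
    for i, a in enumerate(cat_a):
        best_j, best_sep = None, radius_deg
        for j, b in enumerate(cat_b):
            sep = angular_sep_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            pairs.append((i, best_j, best_sep))
    return pairs


def spectral_index(flux1, nu1, flux2, nu2):
    """Two-point spectral index alpha, assuming S proportional to nu**alpha."""
    return math.log(flux2 / flux1) / math.log(nu2 / nu1)
```

A match radius tied to the larger of the two beams would be one plausible choice. Simultaneous multi-band fits would replace the two-point index.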
ERCSC Source Extraction (2/3)
Make use of published algorithms
– Development of new algorithms may occur but is not necessary to meet goals
Use code developed elsewhere when possible
Write new code as necessary
Compare algorithms using simulated sky maps and WMAP data
– Compare extracted source parameters with input
– As a function of source flux density
– In regions with different foreground contamination
Estimate reliability and completeness
– Reliability: fraction of sources found that are real
– Completeness: fraction of real sources that are found
Estimate parameter accuracy
– Positional accuracy
– Photometric accuracy
Goal: choose one algorithm that meets requirements by mid 2007
– Balance criteria of reliability, completeness, accuracy, run time
– Implement full pipeline and test with simulated data
Further development of algorithms prior to launch
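The reliability and completeness definitions above translate directly into code once extracted sources are matched to the simulation input. The sketch below assumes flat-sky (x, y) pixel positions and a greedy one-to-one match within a fixed radius; both choices, and the function names, are illustrative.

```python
import math


def match_catalogs(extracted, truth, radius):
    """Greedy one-to-one match of (x, y) positions within `radius`.

    Returns a list of (extracted_index, truth_index) pairs.
    """
    used = set()
    pairs = []
    for i, (ex, ey) in enumerate(extracted):
        best_j, best_d = None, radius
        for j, (tx, ty) in enumerate(truth):
            if j in used:
                continue
            d = math.hypot(ex - tx, ey - ty)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


def reliability_completeness(extracted, truth, radius):
    """Reliability: fraction of extracted sources that are real.
    Completeness: fraction of real (input) sources that were found."""
    n_match = len(match_catalogs(extracted, truth, radius))
    reliability = n_match / len(extracted) if extracted else float("nan")
    completeness = n_match / len(truth) if truth else float("nan")
    return reliability, completeness
```

Running this in bins of input flux density, or separately for regions with different foreground levels, would give the comparisons listed on this slide.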
ERCSC Source Extraction (3/3): Proposed Algorithms
Matched filter: Tegmark & de Oliveira-Costa 1998; Vio et al.
Mexican hat wavelet: Cayón et al. 2000; Vielva et al. 2001, 2003
Neyman-Pearson detector and biparametric scale adaptive filter: López-Caniego et al. 2005
Adaptive top-hat filter: Chiang et al. 2002
Bayesian approach: Hobson & McLachlan 2003; Savage & Oliver 2005
MOPEX: Makovoz & Marleau 2005
SExtractor: Bertin & Arnouts 1996 (blended sources)
etc.
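As one concrete instance from this list, a Mexican hat wavelet filter on a flat-sky patch might look like the sketch below. It is not taken from any of the cited codes: the kernel is the standard 2D form (2 − r²/R²) exp(−r²/2R²) without normalization, the scale R is a free parameter, and the function names are hypothetical.

```python
import numpy as np


def mexican_hat_kernel(shape, scale_pix):
    """2D Mexican hat kernel centred on pixel (0, 0) of a periodic grid."""
    y = np.fft.fftfreq(shape[0]) * shape[0]   # signed pixel offsets
    x = np.fft.fftfreq(shape[1]) * shape[1]
    r2 = (y[:, None] ** 2 + x[None, :] ** 2) / scale_pix ** 2
    kernel = (2.0 - r2) * np.exp(-0.5 * r2)
    # Enforce exactly zero mean on the discrete grid so flat backgrounds vanish.
    return kernel - kernel.mean()


def mexican_hat_filter(image, scale_pix):
    """Convolve the map with the Mexican hat kernel via FFT."""
    kernel = mexican_hat_kernel(image.shape, scale_pix)
    return np.fft.irfft2(np.fft.rfft2(image) * np.fft.rfft2(kernel),
                         s=image.shape)
```

Because the kernel has zero mean, a constant background filters to zero while compact sources are retained, which is why this family is described as "robust" against assumptions about the noise.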
US Planck Data Analysis Review
US Planck Data Center Archive at IPAC
Bill Reach
IPAC Planck Team
Bill Reach
– IPAC team lead, data center design
Gene Kopan
– ERCSC system architect
Tim Pearson
– Source detection and extraction algorithms
Brendan Crill
– HFI instrument, lab testing, telemetry
Ben Rusholme (June 2006)
– Preprocessing pipelines, QA
Graca Rocha (TBD)
– Algorithms, maps, simulations
1 more scientist being recruited
Code developer being recruited
IPAC Planck Data Archive
Service for the US portion of the Planck team
– Retrieve data from DPCs
  • IPAC is single point of contact for mission data
  • Start with simulations and laboratory test data before launch
  • Start with PV phase for flight data
  • Continue through end of mission
  • Maintain data securely
– Support ERCSC development and operations
– Distribute to US team members in accord with Planck policies
– Generate calibrated data from raw, upon instrument model update
Service for the US community
– Prepare a mission archive capable of supporting NASA researchers into the future
– Same products and documentation as available from ESA, reorganized for our servers but not rewritten
– Limited US community support (public webpage, email helpdesk)
– Deliver entire archive (“in a box”) to Infrared Science Archive (IRSA) for “eternity”
IPAC Planck Archive: Operations
Import data from HFI and LFI DPCs in their native format
– Includes detector data, pointing data, and instrument models
– Start during PV phase
– At least weekly transfers
Import data products from HFI and LFI DPCs
– Calibrated time-ordered data
– Sky maps
Preprocess data using L2 (modified) pipelines at IPAC
– Limited processing capability at IPAC, at level 2 (L2)
– No processing for US team after 2011
– Algorithm development support for US team ideas that need to feed back to DPCs
Export preprocessed data to NERSC for mapmaking
– Support secure transfer to NERSC
– Consultation support for installing DPC software at NERSC
IPAC Planck Processing
[Flow diagram]
HFI DPC (PIOremote and L2), LFI DPC (LFIraw and L2)
→ (FTP) IPAC Planck Data Archive
  • Secure storage
  • Access to US co-Is as per DPC policies
→ HFI L2 Preprocessing / LFI Preprocessing
  • Calibrate (apply instrument model)
  • Deglitch (flagging)
→ (FTP) USPDC Destriping / Mapmaking
  • High-pass filter (optional)
  • Assign pointing to samples
  • Destripe (Springtide)
  • Assign samples to pixels
  • Coadd into maps
→ ERCSC Source Detection / Extraction
→ ERCSC QA / Catalog Production
→ ERCSC
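The two preprocessing steps named in the diagram, calibrate and deglitch (flagging), can be sketched in their simplest form. This is a toy stand-in, not the HFI or LFI L2 code: the "instrument model" is reduced to an assumed offset and gain, and glitches are flagged as outliers from the median using a robust (MAD-based) noise estimate. All names and the 5-sigma default are assumptions.

```python
import numpy as np


def calibrate(raw, gain, offset):
    """Toy instrument model: subtract an offset, then apply a gain."""
    return (np.asarray(raw, dtype=float) - offset) * gain


def flag_glitches(tod, nsigma=5.0):
    """Return a boolean mask that is True where a sample looks like a glitch.

    Uses the median and the median absolute deviation (MAD), scaled by
    1.4826 to approximate a Gaussian sigma, so that the glitches
    themselves do not inflate the noise estimate.
    """
    tod = np.asarray(tod, dtype=float)
    med = np.median(tod)
    sigma = 1.4826 * np.median(np.abs(tod - med))
    return np.abs(tod - med) > nsigma * sigma
```

A global median is only reasonable over short, signal-free stretches. On real timelines the sky signal varies along the scan, so a running median or a signal estimate would be subtracted first.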
IPAC/Planck Computing
IPAC/Planck Computing
[Figure panels: Phase 1 (FY06), Phase 2 (FY07)]
Data Archive: Milestones
6/06 Install PIOlib
8/06 Compile individual L2 module
7/07 Integrate L2 modules into testbed

8/06 Negotiate data transfer format with HFI DPC
8/06 Negotiate data transfer format with LFI DPC
6/06 Obtain simulated data from HFI DPC
6/07 Obtain flightlike data from HFI DPC
9/06 Obtain sample data from LFI DPC
9/07 Obtain flightlike data from LFI DPC
9/06 Develop data security plan
12/07 Design prototype archive interface for US co-Is

4/07 Integrate L2 modules onto USPDC pipeline
9/07 Run L2 pipeline on testbed with HFI data
9/07 Run LFI-L2 pipeline on testbed with LFI data
10/07 Run HFI and LFI data through Unified mapmaking code

6/06 Install 20 TB disk, linux servers (8 node)
6/07 Install 40 TB disk, linux servers (24 node)
Planck/IPAC Current Schedule
Activity Name
Mission events
launch
cruise
PV phase
1st sky survey
ERCSC delivery
ERCSC release by DPC
ERCSC simulations
Generate simulated data
Install visualization tools (kst, PISTOU)
Predict ERCSC contents
process simulated timelines
ERCSC source extraction
select detect/extract method
Build detect/extract modules
Integrate detect/extract modules
run simulated data through catalog
Catalog and QA
design/develop QA tool
adapt Bandmerger for ERCSC QA
associate extractions with prior catalogs
ERCSC Operations...
Data Transfer
negotiate data transfer format with HFI
negotiate data transfer format with LFI
install PIOlib,PIOremote
obtain simulated data from HFI DPC
obtain simulated data from LFI DPC
Data Archive
develop data security plan
install 20 TB disk, 8 node linux server
install 40 TB disk, 24 node linux server
prototype archive interface for US co-Is
deploy US co-I archive
develop IRSA interface
deploy IRSA interface
Data processing
compile individual L2 modules
integrate L2 modules into testbed pipeline
validate USPDC preprocessing pipeline
run HFI-L2 preprocessing simulated data
run LFI-L2 preprocessing simulated data
run through Unified mapmaking code
[Gantt chart: monthly time axis from Feb 2006 through Jun 2011]
US Planck Data Analysis Review