Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Pre-Processing and Calibration for ‘Million Source Shallow Survey’
V.N. Pandey(Kapteyn Institute/ASTRON) for LOFAR Offline Processing Team
April 1st, 2009 CALIM 09, Socorro
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
2
3
MSSS (MS3) and LOFAR processing chain 1
Outline
4
- Ongoing development/Open Issues
Preprocessing (DP3) - Role + Capabilities
Calibration (BBS) – Role + Capabilities
5 - Ongoing development/Open issues
DP3 - Default Pre-Processing Pipeline BBS - Black Board Selfcal System MS3 - Million Source Shallow Survey (60MHz and 150MHz)
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
MSSS - Specifications
Sky Coverage (Ω) 2π str 2π str
FOV 120 deg sqr 20 deg sqr
Observation time 45 min x 4 15 min x 4
Max Baseline 10 km 10 km
PSF 82.5’’ 33’’
Integration time 1 s 1 s
Freq resln, BW 0.76 KHz, 8MHz 0.76 KHz, 8MHz
Total data size 407 Tbyte 2.3 Pbyte
Point source sensitivity(σ) I 5 mJy 0.5 mJy
N(S>30σ ) 2.7x105 1.05x105
LBA (60MHz) HBA (150MHz) *To be carried out with LOFAR 20(13+7)
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Processing Chain - LOFAR
OFFLINE CLUSTER
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3, BBS in Automatic Offline Processing
CEP (BLUE GENE/P)
Data stream from Stations
SB0 SB1 SB2 subbands spread along frequency axis SubBand N
Correlated Data Copied to Offline Storage/Cluster
DP3 (Default Pre-Processing Pipeline)!!
SB0 SB1 SB2 SB N
BBS (Black Board System) DISTRIBUTED + Multi-Threaded Processing
GDS
SB0 SB1 SB2 SB N
MWI (Master Worker Imager – CImager) DISTRIBUTED PROCESSING
Data -> Preprocessed, Flagged and Compressed
Data ->Calibrated
Images, Sources …..
…
…
…
Y Y Y Y
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3 – Aims/Objectives
Carry out all Pre-Calibration common default steps in an efficient way.
1. Flagging – Pre-Flags and Algorithm based 2. Phase Correction due to diff clocks at stns 3. Sub-Band Pass correction 4. Global Band Pass correction 5. Compression along Frequency Axis 6. Compression along Time Axis
7. Combining Different Sub Bands into a single Mset.
I N T E G R A T E D
D I S T R I B U T E D
Software
Parameter Set File, Cluster Description file, …
• Runs in a Distributed (Purely Data) way on different compute nodes Each Measurement Set processed on one processor • Reads the Measurement set only once for all steps • Single Parset file for all the subbands
Source Code (C++)
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3 – Capabilities 1. Flagging
• Generic code to accommodate - different flagging algorithms - multiple cascade flaggers • Currently used - MAD flagger (Hampel filter) (Two dimensional – freq & time) - Mirroring on the ends- avoids edge effects - Can run on data column desired * - Multiple flaggers implemented
ALGORITHM BASED APRIOR Knowledge Based
REGION CUM CRITERIA APPROACH (under progress!!) REGION: - Time, Freq, baseline, Correlation, Target, Direction CRITERIA: - A condition which Pixel in the region should satisfy
• Satisfactorily results on actual data • May be just good enough for MSSS (Surely not for Transients) • Later Hierarchical system of flaggers
Advantage: Provides region specific suitable flagging Ex. Threshold a function of correlation, baseline etc.
R1- C1, C2, …. R2 – C1, C6… R1
R2
R3 R
Emphasis on using all instrument specific information regarding RFI environment
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Pre-Flagging -> Region cum criteria
• All operations expressed for entire observation expressed as region cum criteria statements • Regions can overlap and have multiple criteria • example R1 C1&C2, C1|| C2 Use as much information as possible: Station beam calibration meta data, Satellite bands, Transmitter, ionosphere information, Bootstrap based on user inputs
Pre-Flagging refers to flagging bad data based on aprior information/estimates before RFI detection Algorithms take charge - Estimate must not use neighborhood visibilities Region – Time range, Freq range, Correlation, Baseline, Target source, Direction (can be many) Time range – Sidereal time, local time, UTC, Integration number, time since start Freq Range – MHz, Subband number, Channel number Baseline – (ant1,ant2), (baseline length, Direction) Direction – any direction in sky, Zenith baseline length for example can be an expression (u>0λ & u<30λ, direction1)
Criteria - a condition visibility should satisfy but should not involve neighborhood visibilities example ALL, amp>0.8, amp> 0.8 f(ν) || amp<0.1f(ν)
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3 – example: Flagger
Tim
e (4
hour
s)
Tim
e(4
hour
s)
Frequency -> Frequency ->
Before Flag
After Flag
MS9315 (40MHzOct 24,2008)
CS1_us0 and CS1_us1 (XX corrleation)
Integ 30s
No absolute Threshold Flagging
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3 example
One channel A single Baseline
Typical RFI Cases which can be handled with ease
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3 - Flagging Example
Time index
Am
plitu
de (X
X)
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3 – Capabilities
• Correct for phase differences introduced due to different clocks at stations • Predetermined Table of corrections to be applied • At this stage exact algorithm is under investigation
• Bandpass shape within each Sub-band due to Polyphase filter bank • Predetermined table as a function of frequency • Already implemented at CEP (Blue Gene)
• In case we need improvements, correction may be implemented.
• Global Bandpass correction due to Antennae response to • Pre-determined table as a function of frequency • Presently Estimation using BBS global solver under investigation
Implementation to have multiple tables of corrections in one step
2. Correcting for clock phases, and cable delays
3. Correcting for sub-band shape
4. Correcting for Global Bandpass
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
DP3: Data Compression 5. Compression along frequency axis
6. Compression along time axis
• Implemented together, any one may be selected or switched off • Compressed pixel = mean ( unflagged values ) • Weight column appropriately modified depending on number of pixels flagged • Time and Frequency of compressed (Averaged pixel) ** - MASK ? (Long baseline work?) • Multiple stage compression allowed
Performance of DP3
• >97% CPU usage most of the time (Data sets of ~ 5Gbyte)
0 0 1 1 1 1 1 1 1 1 1 0 0 1 1
Time
Freq
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Flag Category – Multiple Bits • Instead of single flag for each visibility, Use multiple bits to store flags by flag category • Each bit tagged by User/KSP, Parset/algorithm log, dependencies of other flag bits, time.. • log warning when conflicting conditions • An intermediate file produced which expresses the parset in terms of condensed statements.
- For being able to select desired combination of flag categories based on their performance - Comparative study of different flagging algorithms - Flagging during calibration ex. based on gain solutions can be stored - Flagging based on residual data can be stored - Each KSP may have different strategy for flags, so all can be accommodated - Minimal increased disk space (few bits compared to 8 bytes per correlation visibility) - One can combine algorithms of more than one user - For EoR it may be critical to have this information
Data can reside at one place and users can have only model and corrected data..?
A few advantages
A1 A2 A3 A4 0 1 0 0 1 1 0 1 0 1 0 2 0 1 1 1
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Inferences, issues
• Complete DP3 Pipeline has been tested, works successfully.
• Computational Speed improved – steps integrated in one.
• Integrated in the offline Pipeline.
• Code Generic – support Multiple flaggers and Multiple levels of compression.
• Flags stored bitwise based on flag category
• Time and frequency centroid of averaged visibilities in compressed MS under discussion.
• Performance?
• Regularly used from ?
• RFI for Transients..
• Does not support flagging using data across different subbands/days together.
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
LOFAR Calibration - BBS
● Pool of independent processes operate on shared memory ● A central control process examines the black board and decides what is
to be done next depending on the current state. ● Software to do matrix self-calibration
● Emphasis on performance, batch-mode operation, and distributed processing of large data volumes. Well integrated with SAS/MAC
Commonly used reduction packages for aperture synthesis data — AIPS : VLA, WSRT, GMRT, ATCA, VLBI,… — Miriad : VLA, ATCA, WSRT,… — NEWSTAR : WSRT — AIPS++ : WSRT, VLA, … and Now CASA For LOFAR, with all it novel /complicated aspects, we need to do much better. Two packages have been, and continue to be, developed: — MeqTrees is being used to develop/simulate our understanding — BBS will be implementing efficiently/optimally what we have learned
BBS (Based on Black Board Design Pattern for Distributed computing)
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
BBS
Sky model parameters
Processing strategy
Instrument model parameters
Calibrated visibility data
Observed visibility data
Environment
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
BBS Capabilities ● Processing strategy
– List of processing steps
● Processing steps have the following attributes: – Data selection
– Operation to perform
– Sky model
– Instrument model
● Supported operations – Simulation
– Subtraction, Addition
– Correction
– Parameter fitting
– Easy to add other operations
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
BBS - Capabilities ● Supported parameter types
– Constants
– Polynomials of frequency and/or time
● Supported source models – Point source
– Elliptical Gaussian
● Supported model components – Bandpass
– (Directional) Gain
– Basic SPAM
– Dipole beam ● Analytical model (S. Yatawatta) ● Semi-analytical model (J.P. Hamaker) Dipole beam
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Processing strategy ● Data will be read only once to minimize disk I/O ● Processing strategy
– List of processing steps to perform – Hierarchical specification
Peel0 Peel1
Solve Subtract
Root
Solve Subtract
Root.Steps = [Peel0, Peel1] Peel0.Model.Sources = [CasA] Peel0.Steps = [Solve, Subtract] Peel1.Model.Sources = [CygA] Peel1.Steps = [Solve, Subtract]
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Data distribution ● The visibility data of an observation is stored in a
distributed fashion – Individual subbands are distributed over the cluster
● This is only an issue when fitting parameters – Parameter fitting combines visibility data from a certain
domain in frequency and time – All other operations are embarrassingly parallel
● BBS supports parameter fitting across subbands using separate solver processes
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Calibration group 0
Calibration process 0
Sol
ver
Calibration process 1
Sol
ver
Calibration process 2
Sol
ver
Calibration group 1
Calibration process 3
Sol
ver
Calibration process 4 S
olve
r
Calibration group 2
Calibration process 5
Sol
ver
Calibration process 6
Sol
ver
Solver process 1
Merge equations Solve
Solver process 0
Merge equations Solve
Solver process 2
Merge equations Solve
Equations
Model parameters
Equations
Model parameters
Equations
Model parameters
SB0
SB1
SB2
SB3
SB4
SB5
SB6
Parameter fitting
Use of Storage node CPUs?
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Inspecting the output
● Parameter values are stored in a (distributed) parameter database
● Python interface (Parm Facade) is available
-Enables easy plotting, analysis of solutions-speed up commissioning
Time Time
A M P L I T U D E
P h a s e
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
LBA image using BBS Average { 24hrs, 36* Sub Bands: 0-35} 38-62 MHz
>500 sources
Band width(5 MHz)
Res ~ 0.50
Rms ~ 1 Jy
Initial deep all sky wide field (full hem
isphere centered on North C
elestial Pole)
18h
12h
Sun
Tycho
Taurus
00h
06h
CygA (residual) 1% level
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Open Issues ● High quality images with test stations routinely obtained ● Source positions agree - <1% of HPBW(0.50) on different days (avg) - <3% of HPBW with NVSS Technical - Control framework general enough - Scaling, Multi-Threading, Performance - Scheduling/Failing nodes?
Algorithms ● Simultaneous solution v. peeling ● Solver robustness ● Alternative solver algorithms
– Models ● Beam ● Ionosphere, Mosaicing
V.N.Pandey et. al. DPPP and BBS for MSSS CALIM 2009, Socorro
Thank you