
VU University Amsterdam, Faculty of Sciences

Stichting Astron, Research & Development

University of Warsaw, Faculty of Mathematics, Computer Science and Mechanics

Joint Master of Science Programme

Tomasz Witaszczyk

Student no. 235854 (UW), 2002647 (VU)

Radio frequency interference

mitigation for software telescopes.

Master's thesis

in COMPUTER SCIENCE

Supervisors:

Dr. Rob V. van Nieuwpoort, VU University Amsterdam

Dr. John W. Romein, ASTRON (Netherlands Institute for Radio Astronomy)

August 2010

Supervisor's statement

Hereby I confirm that the present thesis was prepared under my supervision and that it fulfils the requirements for the degree of Master of Computer Science.

Date Supervisor's signature

Author's statement

Hereby I declare that the present thesis was prepared by me and none of its contents was obtained by means that are against the law.

I also declare that the present thesis is a part of the Joint Master of Science Programme of the University of Warsaw and the Vrije Universiteit in Amsterdam. The thesis has never before been a subject of any procedure of obtaining an academic degree.

Moreover, I declare that the present version of the thesis is identical to the attached electronic version.

Date Author's signature

Abstract

Radio astronomy is a rapidly growing discipline of science. Astronomers keep building more powerful telescopes. The difficulties of building bigger parabolic dishes force them to switch to telescopes that consist of thousands of small antennas. Data from all antennas are later processed in a central processing unit. We call such telescopes software telescopes. LOFAR, which is currently under development, is one such telescope. Unfortunately, celestial objects are not the only sources of waves that can be received by radio telescopes. From the astronomers' point of view, signals from all other sources are called Radio Frequency Interference (RFI). RFI mitigation is a challenging problem in radio astronomy. Several mitigation techniques have been developed over the years, but most of them operate offline on stored data. Online RFI mitigation is different from, and more difficult than, offline mitigation, as computing power is limited and only a small part of the data can be inspected at a time. Although some observations with LOFAR use only online processing, the online software currently includes no RFI mitigation techniques. Moreover, in some online processing pipelines, the sampled data from different stations are added, so if the data from one station is bad, the sum of the samples is bad as well and the output harms the astronomical data quality. Currently, no mechanism is available that detects and avoids this behavior. This thesis addresses this problem. To explore the possibilities for detecting RFI and misbehaving stations, the RFI Processing Library was developed and integrated with the existing LOFAR software correlator. Based on a comparison of efficiency and accuracy, the Threshold Blanking algorithm is recommended for the LOFAR online software. For detecting and removing stations with corrupted data, the Pre Correlation Stations Detector algorithm has been chosen.

Keywords

software telescope, RFI, mitigation, Radio Frequency Interference, interference, LOFAR, BlueGene, signal processing, radioastronomy, library

Thesis domain (Socrates-Erasmus subject area codes)

11.3 Computer Science

Subject classification

J. Computer Applications
J.2 Physical Sciences and Engineering
J.2.2 Astronomy

Contents

Acknowledgments
Introduction
1. Basics of radio astronomy
  1.1. Overview
  1.2. Radio telescopes
2. LOFAR Telescope
  2.1. Concept of LOFAR
  2.2. Overall architecture
  2.3. Online Processing Software
3. Radio Frequency Interference
  3.1. Sources of RFI
  3.2. Pro-active mitigation strategies
  3.3. Reactive Mitigation Strategies
    3.3.1. Blanking in time
    3.3.2. Blanking in frequency
    3.3.3. Flagging
    3.3.4. Summary
  3.4. RFI Mitigation for LOFAR
4. RFI Processing Library
  4.1. Main concept
  4.2. Architecture of the library
  4.3. Using the library
    4.3.1. Pre-requisites
    4.3.2. Adapter classes
    4.3.3. Running the algorithms
    4.3.4. Feedback loop
    4.3.5. Statistics
  4.4. Related Libraries
5. RFI Removal Algorithms
  5.1. Rationale
  5.2. Implemented RFI algorithms
    5.2.1. Threshold Blanking
    5.2.2. Parametrized Threshold Blanking
    5.2.3. Var Threshold Blanking
    5.2.4. Sum Threshold Blanking
    5.2.5. Auto Threshold Blanking
    5.2.6. The APB Algorithm
    5.2.7. Threshold Flagging
    5.2.8. Var Threshold Flagging
  5.3. Adding algorithms to the RFI Processing Library
  5.4. Implemented station detectors
    5.4.1. PreCorrelation Detector
    5.4.2. PostCorrelation Detector
  5.5. Adding station detectors to the RFI Processing Library
6. Results obtained for LOFAR
  6.1. Integration with LOFAR correlator software
    6.1.1. Adapters
  6.2. Testing methodology
  6.3. Results for the post correlation algorithms
  6.4. Results for the pre correlation algorithms
    6.4.1. Frequency/Time row average
  6.5. Results for the station detectors
  6.6. Processing time
  6.7. Results of comparisons
    6.7.1. Comparisons between precorrelation algorithms
    6.7.2. Results obtained by the feedback mechanism
7. Conclusions
  7.1. Recommendation for LOFAR telescope
  7.2. Future Work
Bibliography

List of Figures

1.1. Electromagnetic Spectrum
1.2. Full-size replica of Jansky's telescope
1.3. Grote Reber's first radio telescope
1.4. Very Large Array
2.1. LOFAR Low Band Antennas
2.2. Overview of the LOFAR processing
2.3. Online processing pipelines
3.1. Example of RFI in time-frequency domain
4.1. Architecture of the library
6.1. Visualization of the LOFAR data on the model airplanes subband - reference
6.2. Visualization of the Threshold Flagging results with threshold = 0.005
6.3. Visualization of the Threshold Flagging results with threshold = 0.5
6.4. Visualization of the Threshold Flagging results with threshold = 0.05
6.5. Visualization of the Threshold Flagging results with threshold = 0.05 (clean subband)
6.6. Visualization of the Var Threshold Flagging results with threshold = 0.01 and window size = 4
6.7. Visualization of the Var Threshold Flagging results with threshold = 0.001 and window size = 11
6.8. Visualization of the Var Threshold Flagging results with threshold = 0.0005 and window size = 15
6.9. Percentage of marked samples by the Threshold Blanking algorithm
6.10. Percentage of marked samples by the Parametrized Threshold Blanking algorithm
6.11. Percentage of marked samples by the Var Threshold Blanking algorithm with window size = 3
6.12. Percentage of marked samples by the Var Threshold Blanking algorithm with threshold = 1200
6.13. Percentage of marked samples by the Sum Threshold Blanking algorithm with window size = 3
6.14. Percentage of marked samples by the Sum Threshold Blanking algorithm with threshold = 2000 * window size
6.15. Percentage of marked samples by the Auto Threshold Blanking algorithm (average variant)
6.16. Percentage of marked samples by the Auto Threshold Blanking algorithm (median variant)
6.17. Percentage of marked samples in a time row
6.18. Percentage of marked samples in a frequency row
6.19. Processing time for one subband and 5 stations - Precorrelation Algorithms
6.20. Processing time for one subband and 5 stations - Statistics
6.21. Processing time for one subband and 5 stations - Postcorrelation Algorithms

List of Tables

6.1. Pre Correlation Stations Detector - percentage of marked stations on clear data
6.2. Pre Correlation Stations Detector - percentage of marked disturbed chunks of data
6.3. Comparison between precorrelation algorithms - TV Subband

Acknowledgments

I would like to thank my supervisor, Dr. Rob V. van Nieuwpoort, for all his help and support during this project. His advice, backed by his impressive knowledge of both computer science and astronomy, was invaluable, and without it I would not have been able to complete this project. Just as importantly, Dr. van Nieuwpoort has a great sense of humour and it is a pleasure to work with him.

I would also like to thank Dr. John W. Romein, my co-supervisor from the Netherlands Institute for Radio Astronomy (ASTRON). I am grateful for his valuable comments, which significantly improved the thesis.

This thesis is the result of a joint project of VU Amsterdam and ASTRON. I would like to thank all the people at ASTRON for giving me the chance to work with the newest radio telescope and the opportunity to participate in the RFI Mitigation workshop, and for all the support I received during the project. Special thanks go to the astronomers, especially Willem Baan, Andrei Offringa and Peter Fridman. They helped me understand the basics of radio astronomy and gave valuable advice that improved the final result.


Introduction

Astronomy is one of the oldest sciences in human history. However, until the middle of the 20th century, people were able to observe the sky only in visible light. Then astronomers discovered that celestial objects also emit waves in other parts of the electromagnetic spectrum, among them radio waves. This is how the discipline called radio astronomy began. The basics of radio astronomy can be found in chapter 1.

Until a few years ago, radio telescopes mainly looked like big parabolic dishes. As building ever bigger dishes became extremely difficult, astronomers had to find another way to make telescopes more sensitive. New telescopes use thousands of small antennas instead. Their signals are processed together in a central processing unit, which is often a software solution. We call such telescopes software telescopes. The LOw Frequency ARray (LOFAR) is one such radio telescope, developed by ASTRON, the Netherlands Institute for Radio Astronomy. A detailed description of LOFAR and its central processing software can be found in chapter 2.

Unfortunately, celestial objects are not the only sources of radio waves received by the sensitive telescope antennas. Satellites, aircraft, TV and radio stations, and many other sources emit waves that disturb astronomical observations. All such signals are called Radio Frequency Interference (RFI). A brief explanation of what RFI is and a review of existing RFI mitigation techniques can be found in chapter 3.

Some of the observations done with the LOFAR telescope, such as pulsar detection, are entirely online, so the received data is not stored anywhere for further processing. Therefore, RFI has a great impact on the astronomical data quality. Moreover, in the beam-forming processing pipeline, the sampled data from different stations is added. If the data from one station is bad, because of either RFI or misbehavior of the station, the sum of the samples is bad as well. Currently, no mechanism is available that detects and avoids this behavior. This thesis addresses this problem.

To address this problem, we created a generic RFI processing library. A description of the library's capabilities and a short manual can be found in chapter 4. Using the library, we can answer the questions of whether online mitigation is possible and efficient, and whether we need to change existing data structures in the LOFAR software. By integrating the library with the existing LOFAR source code, we can test the implemented techniques on real observation data.

As many RFI mitigation techniques have been developed, the RFI Processing Library allows us to use many of them and compare them with each other. A list of the currently implemented algorithms and their descriptions can be found in chapter 5.

The beginning of chapter 6 describes how the RFI Processing Library has been integrated with the LOFAR software. The most important part of that chapter is the results obtained with the library on real observation data.

Finally, chapter 7 gives a short analysis of the results and recommends techniques for online RFI mitigation for the LOFAR telescope.


Chapter 1

Basics of radio astronomy

1.1. Overview

According to Wikipedia [1], astronomy is the scientific study of celestial objects (such as stars, planets, comets, nebulae, star clusters and galaxies). The first observations of the night sky were made in ancient times, which makes astronomy one of the oldest sciences.

Before the invention of the telescope, people were able to analyze only objects that were visible to the naked eye. They believed that the Earth is the center of the universe and that everything that can be seen in the sky rotates around it. This is known as the geocentric model. This model was replaced during the Renaissance by Nicolaus Copernicus, who introduced the heliocentric model, in which the Sun is the center of our solar system.

Figure 1.1: Electromagnetic Spectrum

The invention of the optical telescope was a great improvement for astronomical observations. Scientists were able to discover many new stars and other objects. Still, the only source of information about celestial objects was visible light, which is only a small part of the electromagnetic spectrum, as can be seen in Figure 1.1. Before the early 1930s, astronomers did not know that celestial objects can emit signals at other frequencies.

Figure 1.2: Full-size replica of Jansky's telescope

The first discovery of radio frequency signals from astronomical sources was made by Karl Jansky. Like many scientific discoveries, it was made while looking for something completely different. While working as an engineer at the Bell Telephone Laboratories, Jansky was investigating radio frequency interference from thunderstorms. To that end, he built an antenna that was tuned to respond to radiation at a wavelength of 14.6 meters and rotated in a complete circle on old Ford tires every 20 minutes [2]. Figure 1.2 shows a full-size replica of Jansky's telescope.

Apart from the signal coming from thunderstorms, he discovered a signal coming from an unknown source. He found that the power of this unknown static changed in a complete cycle of 24 hours and was correlated with the Earth's rotation. At first, Jansky thought that the source of this signal was the Sun, but after further research he discovered that the source was the Milky Way, and he published his finding in 1933.

More information about Jansky's discovery can be found in [2].

Figure 1.3: Grote Reber's first radio telescope

For the next few years after Jansky's discovery, no one paid much attention to it. The first person to pick up his finding was Grote Reber. He decided to build his own radio telescope and finished it in September 1937 [1]. As can be seen in Figure 1.3, his telescope looks more like modern radio telescopes. At first, he set up his telescope to receive signals at high frequencies, and he failed to find any signals from outer space. Because of this failure, he modified his telescope twice to operate in lower frequency bands. In 1938 he successfully confirmed Jansky's discovery and focused on creating a radio frequency sky map. He achieved this goal in 1941, resulting in the first sky map based on the non-visible spectrum.

1.2. Radio telescopes

Radio frequency waves from outer sources that can pass through the atmosphere range from 10 MHz to 1000 GHz (1 mm to 100 meters). These waves cannot be seen by the human eye, but they can be detected by sensitive radio antennas. Radio antennas used in astronomy are called radio telescopes. They can be used as a single antenna or in an array.

Most of the telescopes that are currently used look like large parabolic dishes that can be directed to any point in the sky. The dishes reflect the waves and gather them at a central point. An example of an array of such telescopes can be seen in Figure 1.4.

Figure 1.4: Very Large Array

Using two or more telescopes together is called arraying. All the antennas in an array receive data simultaneously, and all the data is combined into one signal. This gives astronomers more detailed knowledge about celestial objects, because a technique called interferometry lets all the telescopes act as parts of one huge radio telescope. More information about arraying and interferometry can be found in [4] and [5].

Antennas are usually set up for a range of frequencies, also called a band. After being gathered, the data from the antennas is converted into digital form by analog-to-digital converters. Then the data is processed using modern signal processing techniques.

A few years ago, most of the digital signal processing was done using dedicated solutions like FPGAs (Field Programmable Gate Arrays). This solution has several disadvantages:

• It is hard to modify.

• It is hard to implement.

• In case of efficiency problems, adding more devices is very expensive.

That is why many observatories currently process data using software solutions: the so-called software telescopes. Compared to hardware solutions, they are much easier to modify and can be parametrized in many ways. By adding more processing units, we can solve efficiency problems without changing anything in the software.


Chapter 2

LOFAR Telescope

2.1. Concept of LOFAR

LOFAR stands for LOw Frequency ARray and is an array radio telescope that operates in a low frequency band (10 - 250 MHz). The main difference from most modern radio telescopes is that, instead of big and expensive dishes, it consists of thousands of small antennas. Signals from all the antennas are combined in software running on an IBM BlueGene/P supercomputer. This chapter describes the concepts behind the LOFAR telescope and the architecture of the software solution.

2.2. Overall architecture

Figure 2.1: LOFAR Low Band Antennas

LOFAR currently consists of a 2-kilometer wide compact core area of 20 stations, 16 remote stations with a maximum distance of 125 km, and 8 international stations with a maximum distance of 1300 km [6]. In the near future LOFAR will have approximately 64 stations. Each station consists of two types of antennas: 48 to 96 Low Band Antennas (LBA) and 48 to 96 High Band Antennas (HBA), which gives a total of many thousands of antennas.

All antennas are dual polarization. Low Band Antennas (see figure 2.1) operate in the 10-80 MHz band and High Band Antennas operate in the 110-250 MHz range. It is pointless to observe in the 80-110 MHz frequency range because of FM radio transmissions.

Antennas are grouped in stations to create a hierarchical structure and to decrease the amount of transferred data. Combining all the data centrally would be too computationally expensive, so signals are combined locally, within the station, using FPGAs. At the station, the analog-to-digital conversion is done, and then the signals from all the receivers inside the station are pre-processed.

Data from the stations is transmitted to the central processing unit over a Wide-Area Network, for which dedicated light paths were created. UDP is used as the transport protocol. It is an unreliable protocol, but data loss can be tolerated, and using TCP would be too expensive and too hard to implement on FPGAs.

The transferred data consists of samples: complex numbers, encoded as complex integers, that represent the amplitude and phase of the received signal. After being received on the BlueGene/P supercomputer, the data is filtered and split into smaller frequency ranges. Many pipelines work in parallel, responsible for beam forming, correlating, etc., which shows the flexibility of the software processing solution. Online processing is described in more detail in section 2.3.

Processed and correlated data is stored on the storage cluster for further processing, which can be done off-line.

Figure 2.2: Overview of the LOFAR processing

An overview of the data processing in LOFAR can be seen in figure 2.2. A more detailed description of the architecture of LOFAR can be found in [6].

2.3. Online Processing Software

All the online processing of the LOFAR data is done on the BlueGene/P supercomputer. This supercomputer consists of two types of nodes: Input/Output nodes (called I/O nodes) and compute nodes. For each type, a dedicated application was created: IONProc and CNProc, respectively.

The main tasks of the IONProc application are to receive the station UDP data, to buffer the data for up to 2.5 seconds, and to forward it to the compute nodes [6]. After the data is processed on the compute nodes, the I/O nodes receive the data and send it to the storage nodes.


I/O nodes chop the data streams that come from the stations into chunks of one frequency subband and approximately one second of time. Such a chunk is the unit of data that is sent to a compute node for further processing [6].

As processing a chunk takes longer than one second, we need more than one compute node, and chunks need to be distributed over them. Scheduling is done using a round-robin algorithm: a compute node receives a chunk, processes it, sends the results back and waits in the queue for the next chunk.
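The round-robin distribution described above can be sketched as follows. This is only an illustration, not the LOFAR scheduler: the names `min_nodes` and `assign_chunks`, the 2.5-second processing time and the node count are hypothetical.

```python
import math
from itertools import cycle

def min_nodes(processing_s, chunk_s=1.0):
    """Smallest number of nodes that keeps up with one chunk per chunk_s."""
    return math.ceil(processing_s / chunk_s)

def assign_chunks(num_chunks, num_nodes):
    """Map each chunk index to a compute node index in cyclic order.

    Node k receives chunks k, k + num_nodes, k + 2 * num_nodes, and so on.
    """
    nodes = cycle(range(num_nodes))
    return [next(nodes) for _ in range(num_chunks)]

# Assumed example: a chunk of one second takes 2.5 seconds to process.
nodes_needed = min_nodes(processing_s=2.5)
schedule = assign_chunks(num_chunks=7, num_nodes=nodes_needed)
print(nodes_needed)  # 3
print(schedule)      # [0, 1, 2, 0, 1, 2, 0]
```

With three nodes, each node receives a new chunk every three seconds, which leaves it enough time to finish the previous one.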

Before a compute node can perform real computations, a data exchange has to be performed. An I/O node receives all frequency subbands from one station, while a compute node requires one subband from all stations. After the data exchange is finished, each compute node can perform signal processing.
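Conceptually, the data exchange is a transpose from a station-major to a subband-major layout. The sketch below shows only this regrouping on in-memory lists; the function name `exchange` and the string labels are hypothetical, and the real exchange happens over the supercomputer's network.

```python
def exchange(per_station):
    """Regroup chunks from station-major to subband-major order.

    per_station[s][b] is the chunk of subband b received from station s
    (the view of one I/O node). In the result, entry [b][s] holds the same
    chunk, so result[b] is what one compute node needs: one subband from
    all stations.
    """
    return [list(chunks) for chunks in zip(*per_station)]

# Hypothetical labels standing in for real sample buffers.
per_station = [
    ["st0-sb0", "st0-sb1", "st0-sb2"],
    ["st1-sb0", "st1-sb1", "st1-sb2"],
]
per_subband = exchange(per_station)
print(per_subband[1])  # ['st0-sb1', 'st1-sb1']
```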

As we can see in figure 2.2, there are several processing pipelines, among others:

• Pulsar detection pipeline.

• Epoch of Re-ionization pipeline.

• Imaging pipeline.

• Transient sources pipeline.

The results from some of the pipelines are stored on the external storage system, but the common part of all the pipelines is online processing. Online processing consists of several steps, all of which are shown in figure 2.3.

Each pipeline uses only some of those steps. For example, the correlation step is not used during pulsar detection. A short description of all the steps follows.

The first step is data conversion. Samples that come from the I/O nodes are 4-bit, 8-bit or 16-bit integer samples. As the BlueGene is much better at handling floating-point operations, they are converted into 32-bit big-endian floating-point numbers. Conversion is done after the data exchange, to decrease the size of the data sent between nodes.
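A minimal sketch of this conversion, assuming that each complex sample arrives as a pair of integers (real, imaginary). The function name `to_float32_be` and the sample values are hypothetical; the byte layout simply follows the "32-bit big-endian" description above.

```python
import struct

def to_float32_be(samples):
    """Pack (real, imag) integer pairs as big-endian 32-bit floats."""
    flat = [float(value) for pair in samples for value in pair]
    return struct.pack(">%df" % len(flat), *flat)

# Three hypothetical 8-bit complex samples as (real, imag) pairs.
raw = [(-128, 127), (3, -4), (0, 1)]
packed = to_float32_be(raw)

print(len(packed))  # 24 bytes: 3 samples * 2 components * 4 bytes
print(struct.unpack(">6f", packed))  # (-128.0, 127.0, 3.0, -4.0, 0.0, 1.0)
```

The three 8-bit samples occupy 6 bytes before conversion and 24 bytes afterwards, which illustrates why it pays to exchange the data between nodes before converting it.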

Next, the converted data is processed by a Poly-Phase Filter bank (PPF). The main task of the PPF is to split a frequency subband into a number of narrower frequency channels. After this step, we have a higher frequency resolution, but to avoid an increase in the data size, the time resolution is lower.

The PPF consists of two parts:

• A Finite Impulse Response (FIR) filter, which multiplies each sample by a real weight factor generated on the fly.

• A Fast Fourier Transform (FFT), which transforms the original function in the time domain to a function in the frequency domain.
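The two parts can be illustrated with a small, unoptimized filter bank. This is a sketch under stated assumptions rather than the LOFAR implementation: the Hann-windowed sinc weights, the plain DFT used instead of an FFT, and the names `prototype_weights`, `dft` and `ppf` are all chosen here for illustration.

```python
import cmath
import math

def prototype_weights(num_channels, num_taps):
    """Hann-windowed sinc low-pass filter (an assumed, simple design).

    Needs num_channels * num_taps >= 2.
    """
    length = num_channels * num_taps
    center = (length - 1) / 2.0
    weights = []
    for i in range(length):
        x = (i - center) / num_channels
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
        weights.append(sinc * hann)
    return weights

def dft(values):
    """Plain O(n^2) discrete Fourier transform; an FFT computes the same."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * p / n)
            for p, v in enumerate(values))
        for k in range(n)
    ]

def ppf(samples, num_channels, num_taps):
    """Return one spectrum of num_channels channels per input block."""
    weights = prototype_weights(num_channels, num_taps)
    window = num_channels * num_taps
    spectra = []
    for start in range(0, len(samples) - window + 1, num_channels):
        # FIR part: weight num_taps consecutive blocks and add them up.
        folded = [
            sum(weights[t * num_channels + p]
                * samples[start + t * num_channels + p]
                for t in range(num_taps))
            for p in range(num_channels)
        ]
        # Fourier part: go from the time domain to the frequency domain.
        spectra.append(dft(folded))
    return spectra

# A test tone centered on channel 1 of 4.
tone = [cmath.exp(2j * math.pi * 1 * n / 4) for n in range(32)]
spectra = ppf(tone, num_channels=4, num_taps=2)
strongest = max(range(4), key=lambda k: abs(spectra[0][k]))
print(len(spectra))  # 7 spectra from 32 input samples
print(strongest)     # 1
```

Every spectrum consumes four new input samples and yields four channels, so the data volume stays the same while the time resolution drops by the number of channels.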

LOFAR stations are placed at many different geographical locations, so radio waves from celestial sources arrive at them at different times. Therefore, all signals have to be shifted before further processing, which is done during the phase shift correction step.
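One common way to apply such a shift to channelized data is to rotate the phase of each complex sample by an angle proportional to the channel frequency and the remaining delay. The sketch below assumes this approach; the function name `phase_correct`, the sign convention and the numbers are hypothetical and not taken from the LOFAR software.

```python
import cmath
import math

def phase_correct(samples, frequency_hz, delay_s):
    """Rotate complex samples to compensate a delay at one frequency.

    A delay of delay_s seconds corresponds to frequency_hz * delay_s
    cycles of phase; the negative sign is an assumed convention.
    """
    factor = cmath.exp(-2j * math.pi * frequency_hz * delay_s)
    return [s * factor for s in samples]

# Hypothetical numbers: a 50 MHz channel and a 5 ns delay give a rotation
# of a quarter turn (50e6 * 5e-9 = 0.25 cycles).
corrected = phase_correct([1 + 0j], frequency_hz=50e6, delay_s=5e-9)
print(abs(corrected[0]))  # the magnitude is unchanged; only the phase moves
```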

The bandpass correction step compensates for an artifact introduced by a filter bank that runs on the FPGAs in the stations [6]. During this step, each sample is once again multiplied by a real, channel-dependent value. This cannot be done during preprocessing at the station, as the artifact is visible only in the data processed by the PPF.
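In code, this step is a per-channel scaling of every spectrum. The sketch below uses made-up correction factors; real factors would be derived from the response of the station filter bank, and the name `bandpass_correct` is hypothetical.

```python
def bandpass_correct(spectrum, correction):
    """Scale each channel of one spectrum by its real correction factor."""
    if len(spectrum) != len(correction):
        raise ValueError("one correction factor per channel is required")
    return [sample * factor for sample, factor in zip(spectrum, correction)]

# Hypothetical 4-channel spectrum and correction factors.
spectrum = [2 + 2j, 1 - 1j, 4 + 0j, 0 + 3j]
correction = [1.0, 0.5, 0.25, 2.0]
flattened = bandpass_correct(spectrum, correction)
print(flattened)
```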

Superstation beam forming adds the samples from a group of stations that are geographically close, so that they form a virtual superstation with increased sensitivity.
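The addition itself is an element-wise sum over stations, as the following sketch with hypothetical sample values shows (the name `form_superstation` is an assumption). It also makes visible why a single misbehaving station is harmful: its samples end up in every element of the sum.

```python
def form_superstation(station_samples):
    """Add the samples of several stations element by element."""
    return [sum(values) for values in zip(*station_samples)]

good = [
    [1 + 1j, 2 + 0j, 0 - 1j],  # station A
    [1 - 1j, 0 + 2j, 0 + 1j],  # station B
]
print(form_superstation(good))

# A third, misbehaving station dominates the whole superstation signal.
bad_station = [500 + 0j, 500 + 0j, 500 + 0j]
print(form_superstation(good + [bad_station]))
```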


Figure 2.3: Online processing pipelines

The most computationally expensive operation is correlation. During this step, samples from single stations (or virtual superstations) are correlated. As the signals from celestial sources are very weak and single antennas receive mainly noise, it is essential to find the statistical coherence. The samples of each pair of stations are correlated by multiplying the sample of one station by the complex conjugate of the sample of the other station [6].
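A minimal sketch of this operation for a single frequency channel. Summing the products over time is an assumption made here to show how coherence builds up; the function name `correlate` and the sample values are hypothetical.

```python
from itertools import combinations

def correlate(station_samples):
    """Return {(i, j): value} for every pair of stations with i < j.

    Each value is the sum over time of sample_i * conj(sample_j).
    """
    result = {}
    for i, j in combinations(range(len(station_samples)), 2):
        result[(i, j)] = sum(
            a * b.conjugate()
            for a, b in zip(station_samples[i], station_samples[j])
        )
    return result

# Three hypothetical stations with two time samples each.
stations = [
    [1 + 1j, 0 + 1j],
    [1 - 1j, 1 + 0j],
    [2 + 0j, 0 + 0j],
]
visibilities = correlate(stations)
print(sorted(visibilities))  # [(0, 1), (0, 2), (1, 2)]
print(visibilities[(0, 1)])  # 3j
```

The number of pairs grows quadratically with the number of stations (n stations give n(n-1)/2 pairs), which is consistent with correlation being the most expensive step.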

A more detailed description of each step and of the correlator software as a whole can be found in [6]. An explanation of standard signal processing methods can be found in [7].


Chapter 3

Radio Frequency Interference

Unfortunately for radio astronomy, celestial objects are not the only sources of radio waves that can be observed on Earth. Over years of technological progress, people have developed devices that emit radio waves, such as TV antennas. In addition, human activity, such as flying airplanes, causes reflection of existing waves. All those artificial waves are called Radio Frequency Interference (RFI).

Waves emitted and reflected by human activity interfere with those emitted by celestial objects, making observation much harder. This chapter describes the main sources of RFI, as well as the main strategies used to solve this problem.

An example of RFI in the time-frequency domain can be seen in figure 3.1.

Figure 3.1: Example of RFI in time-frequency domain

3.1. Sources of RFI

According to [9], we can distinguish four main categories of RFI sources:

• Satellites

Satellites are a serious problem for radio observations. Some of them have signals strong enough to even destroy sensitive telescope antennas, so during the observation we have to be very careful to avoid receiving signals from them. Luckily, their position in orbit can be easily determined, so we can schedule observations accordingly. However, signals from some sources, such as GPS satellites, can always be received.

• Aircraft

Transmissions from flying airplanes are very short-term, so they can be ignored in long-term averaged observations. However, they can affect some observations and cannot be easily predicted.

• Ground-based

There are plenty of ground-based RFI sources, such as TV and FM antennas, cell phone towers, etc. Apart from devices whose main purpose is to emit waves, any electronic installation can affect observations if it is close enough to the observatory. Astronomers try to build observatories as far from all such sources as they can, but it is not possible to avoid all ground-based RFI.

• Observatory-based

Modern observatories contain a lot of high-end electronic and computing installations,so they are sources of some RFI themselves.

3.2. Pro-active mitigation strategies

By pro-active mitigation strategies we mean all strategies that aim to avoid RFI in the first place. Those are the best strategies because, with a clean electromagnetic spectrum, RFI is no longer a problem. Unfortunately, some RFI, like GPS signals, cannot be avoided.

Examples of such strategies are:

• Regulation

Many di�erent societies want to use some part of electromagnetic spectrum for theirown purposes. They need it for all sorts of wireless communication, data broadcasting,positioning systems etc. However, because of the requests from radioastronomers, somespectral bands are reserved for needs of astronomy, so observations in those bands canbe done without impact of RFI.

• Radio Quiet Zones

In cooperation with local authorities, astronomical observatories are often placed in so-called `radio quiet zones`, which protect them from some ground-based RFI. Within a radius of a few kilometers from the antennas, the use of all radio wave emitters is prohibited. Sometimes, radio quiet zones forbid only the use of specific frequencies, as observations are done only in this specific part of the spectrum. Still, this strategy does not help with some types of RFI, such as satellites.

• The Observatory Environment

A significant part of the radio interference in observations comes from the observatories themselves, because high-end electronic devices emit electromagnetic waves. Solutions include extensive RFI shielding around the emitting devices, screened rooms and RFI-tight cabinets. Everything has to be monitored full-time to avoid any leaks.


3.3. Reactive Mitigation Strategies

By reactive mitigation strategies we mean all strategies in which we detect RFI in the data stream received from the antennas. Then, we remove the marked samples from the data stream or adjust the level of the affected data.

3.3.1. Blanking in time

Blanking in time is the most popular reactive strategy. It can be done on analog, preprocessed data as well as on digitized samples.

The main idea is that the observer sets a threshold level that is used to distinguish RFI from RFI-free data. In the simplest variant we iterate over all the data in time order, and samples with values above the threshold are marked. We can also implement more complex solutions that make decisions based on mean values (past and future), standard deviations etc.

Blanking in time has many advantages. It is:

• simple to understand,

• easy to implement,

• fast (has low complexity),

• simple to automate,

• quite effective.

As astronomy pipelines are already compute intensive, low complexity is the most important reason for astronomers to choose this strategy.

Examples of implemented blanking algorithms can be found in [10].

3.3.2. Blanking in frequency

Because modern software telescopes are real-time systems, it is sometimes impossible to implement sophisticated blanking in time strategies. For example, we cannot make a decision based on the values of future samples. Instead, it is sometimes a better strategy to iterate over the data in frequency order, identifying frequencies that contain RFI.

3.3.3. Flagging

Flagging is very similar to blanking, with one big difference: flagging operates on data after correlation.

Flagging is currently done offline for the LOFAR telescope. Details can be found in [8].

3.3.4. Summary

Some additional, less popular techniques have been developed over the years. Examples are:

• Null Steering.

• Adaptive Filters. Adaptive filters can be used when a copy of the RFI is available. For example, we can record the data from nearby electronic emitters. Then we can match the data received from the telescope antennas against the recorded copy and remove the interference. More details can be found in [11].

• Mitigation in array imaging stage.

• Spatial filtering.

Choosing the most adequate strategy depends on many factors, and there is no single universal strategy that is suitable for all cases.

3.4. RFI Mitigation for LOFAR

Currently there is no online RFI mitigation for the LOFAR telescope. This problem mainly affects the science pipelines that are computed only online, like the pulsar detection pipeline. Still, even for the other pipelines, the data on the storage nodes are saved without any RFI mitigation applied.

A solution for offline RFI mitigation for LOFAR data has been created (Offringa et al. [8]). It is software that detects and removes RFI in radio measurement sets, a standard file type for storing radio data that is also used in LOFAR.

The algorithms implemented in Offringa's RFI software are:

• Threshold Flagging

• Var Threshold Flagging

• Sum Threshold Flagging

• CUSUM method

• Surface fitting and smoothing

• Singular Value Decomposition

The first three of them are described in section 5.2. The rest are described in [8].

It is very hard to use Surface fitting and smoothing or Singular Value Decomposition for online mitigation purposes, as they are more computationally expensive: only the first four algorithms have linear complexity [8].

Chapter 4

RFI Processing Library

4.1. Main concept

As explained in the previous chapter, there are many techniques to mitigate radio frequency interference. Choosing the most suitable technique depends on many factors that differ for each telescope, such as the signal-to-noise ratio, the characteristics of the RFI and others. To make this decision easier, we have decided to create a generic RFI processing library that can be easily integrated with any existing software telescope.

The main goals for the library are:

• It has to be written in a programming language commonly used to program software telescopes.

• It has to be data format independent.

• Adding RFI detection algorithms or misbehaving stations detection algorithms to the library has to be relatively simple.

• It has to enable the comparisons of algorithms.

Because C++ is currently the language most commonly chosen for software telescopes (e.g. LOFAR, VLA), we have decided to use it for the implementation.

To be independent of the data format used in a telescope, the library is based on metaprogramming concepts. In order to use it, an adapter class for the data has to be implemented. This class is responsible for creating data iterators, on which the algorithms operate.

To add an algorithm to the library, we simply write a class that inherits from the base class for the algorithms and overrides the detect() method. When we have algorithms that we want to test, we can add as many of them as we want to the main class of the library and run it. As a result we obtain many useful statistics, such as the percentage of marked samples (also per time / frequency row) and relative comparisons of each pair of algorithms.

4.2. Architecture of the library

The RFI Processing Library is a template-based library. Most of the classes are templates with an adapter class, a data iterator and a sample type as template parameters. More details about the concept that has to be fulfilled by the template parameters can be found in section 4.3.2. Because the design of the library is generic, it can be easily integrated with any existing system, no matter what the types of the samples are and how they are kept in memory.

Figure 4.1: Architecture of the library

The main class in the library is InterferenceMitigator. It is responsible for scheduling the detection process of all the algorithms and keeping the results. This class can also write the results to a given output stream.

InterferenceMitigator owns collections of both Pre- and PostCorrelationAlgorithms. Those algorithms are used during the detection process. Results for each algorithm are stored in instances of the Result class, while results of the comparison between them can be found in instances of the ResultsCompared class. One instance represents the results of one algorithm or one pair of algorithms.

PreCorrelationAlgorithm and PostCorrelationAlgorithm are abstract template classes for algorithms. All algorithms have to inherit from one of those classes and override the detect() method. Marking samples has to be done using the mark() method.

CorrelatedDataAdapter and UncorrelatedDataAdapter are base classes that provide an interface for adapters. Data adapters written by the user can, but do not have to, inherit from those classes.

StationDetector is the abstract template class for algorithms that detect misbehaving stations, using data either before or after correlation. All detection algorithms have to inherit from this class (with either a correlated or an uncorrelated data adapter as a template parameter) and override the detect() method.

4.3. Using the library

4.3.1. Pre-requisites

In order to use the library some pre-requisites have to be fulfilled:

• Telescope software has to be written in C++ or in another language that can be linked with C++.

• The signals from the telescope have to be transformed by the Fourier transformation and represented as complex numbers.

• The sample type has to be comparable with double.

• The Boost library has to be installed on the system [15].

4.3.2. Adapter classes

If the pre-requisites are fulfilled, we have to create two adapter classes:

• Adapter of precorrelation data.

• Adapter of postcorrelation data.

Adapter classes are a standard use of the Adapter design pattern. The main responsibility of those classes is to adapt data, uncorrelated or correlated respectively, to a format that can be understood by the library. To achieve this goal, each of those classes has to implement the appropriate concept:

• for uncorrelated data:

class UncorrelatedDataAdapter
{
public:
    unsigned int getNrOfChannels() const;
    unsigned int getNrOfPolarizations() const;
    unsigned int getNrOfStations() const;
    unsigned int getNrOfSamples() const;

    void getIteratorsFrequency(IteratorType &begin, IteratorType &end,
                               int stationNr, int polarizationNr, int sampleNr);
    void getIteratorsTime(IteratorType &begin, IteratorType &end,
                          int stationNr, int polarizationNr, int channelNr);
    void getIteratorsStations(IteratorType &begin, IteratorType &end,
                              int polarizationNr, int channelNr, int sampleNr);
};

• for correlated data:


class CorrelatedDataAdapter
{
public:
    unsigned int getNrOfChannels() const;
    unsigned int getNrOfPolarizations() const;
    unsigned int getNrOfStations() const;
    unsigned int getNrOfSamples() const;

    void getIteratorsFrequency(IteratorType &begin, IteratorType &end,
                               int stationNr1, int stationNr2,
                               int polarizationNr1, int polarizationNr2, int sampleNr);
    void getIteratorsTime(IteratorType &begin, IteratorType &end,
                          int stationNr1, int stationNr2,
                          int polarizationNr1, int polarizationNr2, int channelNr);
    void getIteratorsBaselines(IteratorType &begin, IteratorType &end,
                               int polarizationNr1, int polarizationNr2,
                               int channelNr, int sampleNr);
};

The methods getNrOfChannels(), getNrOfPolarizations(), getNrOfSamples() and getNrOfStations() simply return information about the size of the data.

The most important methods in both cases are getIteratorsTime() and getIteratorsFrequency(). For each of these methods, the first two parameters (IteratorType &begin, IteratorType &end) are output parameters: the method is responsible for storing in them iterators pointing to the beginning and the end of the data. All subsequent parameters are input parameters. They specify over which part of the data we want to iterate.

The only difference between the methods for correlated and uncorrelated data is that we have to specify information for one station in the uncorrelated case, and for two stations in the correlated case.

The methods getIteratorsStations() and getIteratorsBaselines() have to be implemented only if we want to use the feature that allows us to detect and remove all the data from a particular station.

In addition, in adapters we can define two public types:

• IteratorType

• SampleType

The advantage of doing so is that those are the default template parameters for mitigator objects and algorithms, so we do not have to specify them explicitly, which makes using the library much easier from the programmer's point of view.
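To illustrate the concept, below is a minimal sketch of an adapter for uncorrelated data. It is not part of the library: the class name FlatUncorrelatedAdapter, the memory layout and the choice of std::complex<float> as the sample type are assumptions made for this example. Only getIteratorsFrequency() is shown, because with the assumed layout a frequency row is contiguous and a plain pointer can serve as the iterator; getIteratorsTime() would additionally need a strided iterator.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical adapter over one flat array laid out as
// [station][polarization][sample][channel], so that all channels of one
// station/polarization/sample are contiguous in memory.
class FlatUncorrelatedAdapter
{
public:
    // Public types picked up as default template parameters (section 4.3.2).
    typedef std::complex<float> SampleType;
    typedef const SampleType *IteratorType;

    FlatUncorrelatedAdapter(const std::vector<SampleType> &data,
                            unsigned int nrStations, unsigned int nrPolarizations,
                            unsigned int nrSamples, unsigned int nrChannels)
        : data_(data), nrStations_(nrStations), nrPolarizations_(nrPolarizations),
          nrSamples_(nrSamples), nrChannels_(nrChannels)
    {
        assert(data.size() == static_cast<std::size_t>(nrStations) *
                                  nrPolarizations * nrSamples * nrChannels);
    }

    unsigned int getNrOfChannels() const { return nrChannels_; }
    unsigned int getNrOfPolarizations() const { return nrPolarizations_; }
    unsigned int getNrOfStations() const { return nrStations_; }
    unsigned int getNrOfSamples() const { return nrSamples_; }

    // Stores iterators over all channels of one station/polarization/sample.
    void getIteratorsFrequency(IteratorType &begin, IteratorType &end,
                               int stationNr, int polarizationNr, int sampleNr)
    {
        std::size_t row = (static_cast<std::size_t>(stationNr) * nrPolarizations_ +
                           polarizationNr) * nrSamples_ + sampleNr;
        begin = data_.data() + row * nrChannels_;
        end = begin + nrChannels_;
    }

private:
    const std::vector<SampleType> &data_; // not owned; must outlive the adapter
    unsigned int nrStations_, nrPolarizations_, nrSamples_, nrChannels_;
};
```

Because the adapter defines IteratorType and SampleType, a template that takes it as its first parameter could deduce the other two from it, which is the convenience described above.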

4.3.3. Running the algorithms

When we have both adapters implemented we can run the mitigation software. The first step is to create the main mitigation object, an instance of the class InterferenceMitigator. The most important elements of this class can be seen below:

template<
    typename UncorrelatedDataAdapter,
    typename CorrelatedDataAdapter,
    typename UncorrelatedDataIterator = typename UncorrelatedDataAdapter::IteratorType,
    typename UncorrelatedSampleType = typename UncorrelatedDataAdapter::SampleType,
    typename CorrelatedDataIterator = typename CorrelatedDataAdapter::IteratorType,
    typename CorrelatedSampleType = typename CorrelatedDataAdapter::SampleType>
class InterferenceMitigator
{
public:
    typedef PreCorrelationAlgorithm<UncorrelatedDataAdapter,
        UncorrelatedDataIterator, UncorrelatedSampleType> UncorrelatedAlgorithm;
    typedef PostCorrelationAlgorithm<CorrelatedDataAdapter,
        CorrelatedDataIterator, CorrelatedSampleType> CorrelatedAlgorithm;
    typedef StationDetector<UncorrelatedDataAdapter,
        UncorrelatedDataIterator, UncorrelatedSampleType> UncorrelatedDetector;
    typedef StationDetector<CorrelatedDataAdapter,
        CorrelatedDataIterator, CorrelatedSampleType> CorrelatedDetector;

    InterferenceMitigator();

    /* RFI detection */
    void addAlgorithm(CorrelatedAlgorithm &algorithm);
    void addAlgorithm(UncorrelatedAlgorithm &algorithm);
    void runUncorrelated(UncorrelatedDataAdapter &uncorrelatedAdapter);
    void runCorrelated(CorrelatedDataAdapter &correlatedAdapter);

    /* Bad stations detection */
    void addCorrDetector(CorrelatedDetector &detector);
    void addUncorrDetector(UncorrelatedDetector &detector);
    void runStationsCorr(CorrelatedDataAdapter &adapter);
    void runStationsUncorr(UncorrelatedDataAdapter &adapter);

    /* Results */
    void outputToFile(std::string &file);
    void outputToFile(const char *file);
    void output();
};


As we can see in the listing, the InterferenceMitigator class is a template with six parameters. To create an instance of this class, we have to define the types of the adapter classes, of the iterators over both correlated and uncorrelated data, and of the samples. However, if we have defined IteratorType and SampleType inside the adapter classes as specified in section 4.3.2, we only have to pass two parameters, so the typical use of this class looks like InterferenceMitigator<UncorrelatedDataAdapter, CorrelatedDataAdapter>.

After the instance of the main object has been created, we can add algorithms. We have two types of algorithms:

• Algorithms working on uncorrelated data.

• Algorithms working on correlated data.

For each of them there is an abstract template class, called PreCorrelationAlgorithm and PostCorrelationAlgorithm respectively. The signatures of those classes look as follows:

• PreCorrelationAlgorithm

template<typename DataAdapter,
    typename DataIterator = typename DataAdapter::IteratorType,
    typename SampleType = typename DataAdapter::SampleType>
class PreCorrelationAlgorithm {};

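• PostCorrelationAlgorithm

template<typename DataAdapter,
    typename DataIterator = typename DataAdapter::IteratorType,
    typename SampleType = typename DataAdapter::SampleType>
class PostCorrelationAlgorithm {};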

As we can see above, the main template parameter in both cases is DataAdapter, and once again, if we have the public types defined in the adapter, it is the only template parameter we have to pass.

PostCorrelationAlgorithm and PreCorrelationAlgorithm are abstract classes, so we cannot create instances of them. All the algorithms in the library inherit from one of those classes. To use an algorithm, we first have to create an instance of one of the derived classes. The list of algorithms currently implemented in the library can be found in chapter 5. Once created, algorithms can be added to the main object using the overloaded method addAlgorithm().

There is no limit on the number of tested algorithms, but the complexity of comparing them is $O(n^2 k)$, where $n$ is the number of algorithms and $k$ is the number of samples. This comes from the fact that we have to compare each pair of algorithms and there are $n(n-1)/2$ pairs.

Algorithms can be parametrized. For example, a simple threshold algorithm takes the value of the threshold. Parameters are passed during construction of an object. If we want to compare the effectiveness of one algorithm with two different parameter values, this is not a problem: we can add many instances of one class to a single mitigator object.

After setting up the collection of algorithms, we can run the detection process. This is done simply by calling either the runCorrelated() or the runUncorrelated() method of the mitigator object with an instance of the data adapter as a parameter. All detecting and comparing processes are included in those methods.

If we want to detect misbehaving stations, we can add detector objects to the mitigator. Just like in the RFI algorithms case, we have two types of detectors:

• Detectors working on uncorrelated data.

• Detectors working on correlated data.

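Each detector has to inherit from the StationDetector class, independently of the type of data used. The signature of this class looks as follows:

template<typename DataAdapter,
    typename DataIterator = typename DataAdapter::IteratorType,
    typename SampleType = typename DataAdapter::SampleType>
class StationDetector {};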

Once again, the main template parameter is DataAdapter. By checking the type of the adapter, we can determine whether a particular detection algorithm works on the data after or before the correlation process.

The list of detectors currently implemented in the library can be found in chapter 5.

The results and statistics of all the detection processes are stored inside the mitigator object. In order to see them we should use the output() method. By default, results are written to the std::cerr stream. If we want to redirect them to a file, we should first call the outputToFile() method with the name of the file as a parameter.

A simple, complete example of using the RFI Processing Library can be found below:

SampleData data;
SampleDataAdapter adapter(data);

PreCorrelationThresholdAlgorithm<SampleDataAdapter> preThreshold(20);
PostCorrelationThresholdAlgorithm<SampleDataAdapter> postThreshold(20);
CorrDetector<SampleDataAdapter> postDetector(2, 0.5, 3);
UncorrDetector<SampleDataAdapter> preDetector(2, 0.5);

InterferenceMitigator<SampleDataAdapter, SampleDataAdapter> mitigator;
mitigator.addAlgorithm(preThreshold);
mitigator.addAlgorithm(postThreshold);
mitigator.addCorrDetector(postDetector);
mitigator.addUncorrDetector(preDetector);

mitigator.runUncorrelated(adapter);
mitigator.runCorrelated(adapter);
mitigator.runStationsCorr(adapter);
mitigator.runStationsUncorr(adapter);

mitigator.output();

4.3.4. Feedback loop

While some processing pipelines do not use correlated data at all, some RFI (for example RFI located close to only one of the stations) is easier to detect in this data. To make this possible we have created a mechanism in the library called the feedback loop.

The feedback loop is a mechanism that analyzes the results of the detection process performed on correlated data in order to mark uncorrelated samples, so that the uncorrelated data benefits from post-correlation RFI detection. By keeping the results of this process in memory, those marks can then be used in online processing pipelines (e.g. beam forming mode) that use only uncorrelated data.

The main idea is to check whether, for a particular moment and channel, the samples of all the baselines that include a given station were marked by the post-correlation algorithm. If this condition is fulfilled, we mark the sample for the given station/moment/channel in the uncorrelated data.

The algorithm of the feedback mechanism looks as follows:

for each polarization
    for each sample in time
        for each frequency channel
            for each station
                if sample is marked for all baselines including this station
                    if data is integrated over time
                        mark all integrated samples for this channel/station/polarization
                    else
                        mark particular sample for this channel/station/polarization

As we can see above, the feedback loop can work with data that is either integrated or not integrated in the time domain.
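The non-integrated case of the pseudo code above can be sketched in C++ as follows. This is an illustration under simplifying assumptions, not the library's implementation: the CorrelatedFlags structure, its flat indexing, the restriction to a single polarization and the use of cross-correlations only (no autocorrelations) are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flag table for correlated data of one polarization.
// Baselines are unordered station pairs (a, b) with a != b.
struct CorrelatedFlags
{
    std::size_t nrStations, nrSamples, nrChannels;
    std::vector<bool> marked;

    CorrelatedFlags(std::size_t stations, std::size_t samples, std::size_t channels)
        : nrStations(stations), nrSamples(samples), nrChannels(channels),
          marked(stations * (stations - 1) / 2 * samples * channels, false)
    {
        assert(stations >= 2);
    }

    // Flat index of the flag for baseline (a, b) at a given sample and channel.
    std::size_t index(std::size_t a, std::size_t b,
                      std::size_t sample, std::size_t channel) const
    {
        assert(a != b && a < nrStations && b < nrStations);
        std::size_t low = a < b ? a : b, high = a < b ? b : a;
        std::size_t baseline = high * (high - 1) / 2 + low;
        return (baseline * nrSamples + sample) * nrChannels + channel;
    }
};

// Returns flags for uncorrelated data, indexed as
// (station * nrSamples + sample) * nrChannels + channel. A sample is marked
// only if it is marked on every baseline that includes the station.
std::vector<bool> feedbackLoop(const CorrelatedFlags &corr)
{
    std::vector<bool> result(corr.nrStations * corr.nrSamples * corr.nrChannels, false);
    for (std::size_t sample = 0; sample < corr.nrSamples; ++sample)
        for (std::size_t channel = 0; channel < corr.nrChannels; ++channel)
            for (std::size_t station = 0; station < corr.nrStations; ++station) {
                bool allMarked = true;
                for (std::size_t other = 0; other < corr.nrStations && allMarked; ++other)
                    if (other != station)
                        allMarked = corr.marked[corr.index(station, other, sample, channel)];
                result[(station * corr.nrSamples + sample) * corr.nrChannels + channel] = allMarked;
            }
    return result;
}
```

For time-integrated data, the assignment in the innermost loop would be repeated for every uncorrelated sample that contributed to the integrated one.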

The second advantage of having feedback is that comparisons between correlated and uncorrelated data become possible: we can compare the results of RFI detection on uncorrelated data with the marks that the feedback loop derives for the same samples from correlated data.

4.3.5. Statistics

As a result of the detection process we get the following set of statistics for each algorithm:

• The number of processed samples.

• The number of marked samples.

• The percentage of marked samples.

• The percentage of marked samples in a time row, in which any sample was marked.

• The percentage of marked samples in a frequency row, in which any sample was marked.

• The number of marked samples per station.

As a result of the comparison process we get the following set of statistics for each pair of algorithms:

• Percentage of samples marked by the first algorithm that were also marked by the second algorithm.

• Percentage of samples marked by the second algorithm that were also marked by the first algorithm.
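As an illustration of the pairwise statistic, the sketch below computes, for two sets of marks over the same samples, the fraction of samples marked by the first algorithm that were also marked by the second. The function name overlapRatio, the representation of marks as std::vector<bool> and the convention of returning 0 when the first algorithm marked nothing are assumptions of this example, not the library's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction (0..1) of the samples marked in `first` that are also marked in
// `second`. Returns 0 if `first` marks no samples at all.
double overlapRatio(const std::vector<bool> &first, const std::vector<bool> &second)
{
    assert(first.size() == second.size());
    std::size_t markedByFirst = 0, markedByBoth = 0;
    for (std::size_t i = 0; i < first.size(); ++i) {
        if (first[i]) {
            ++markedByFirst;
            if (second[i])
                ++markedByBoth;
        }
    }
    return markedByFirst == 0 ? 0.0
                              : static_cast<double>(markedByBoth) / markedByFirst;
}
```

Calling the function twice with the arguments swapped gives both directions of the comparison for one pair. One pass over the $k$ samples per pair, for $n(n-1)/2$ pairs, is consistent with the $O(n^2 k)$ cost stated in section 4.3.3.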

For each station detection algorithm, we get for each station the number of seconds in which the station was recognized as misbehaving.

Below we can see sample output for two uncorrelated RFI algorithms and one station detector:

Uncorrelated results:

0) Samples: 118272000, Marked: 0.268176, FreqRovAvg: 2.2243, TimeRowAvg: 1.52696

Results per station: 8224 1506 25242 19 282186

1) Samples: 118272000, Marked: 0.704076, FreqRovAvg: 4.98852, TimeRowAvg: 3.03029

Results per station: 14962 5095 40935 694 771039

0, 1) First_To_Second ratio: 1, Second_To_First ratio: 0.388596

Uncorrelated Results:

0) 0 0 5 0 40

4.4. Related Libraries

As software correlators are relatively new, very few similar libraries have been implemented.

One similar product is Offringa's [8] project, already described in section 3.4. The difference between it and the RFI Processing Library we describe here is that it can work only offline on stored data after correlation, while the RFI Processing Library can be integrated into any online pipeline and detect RFI in data both before and after correlation.

A second approach to RFI mitigation in software has been used in the DifX Correlator, which has served as a testbed for online RFI algorithms. Still, while the RFI Processing Library can be integrated with any software correlator, the solution created in the DifX Correlator is suitable to work only within the rest of that correlator's software. Currently, it uses only the Spectral Kurtosis approach to detect RFI. A description of the use of the DifX Correlator as a testbed for RFI algorithms can be found in [12].

Chapter 5

RFI Removal Algorithms

5.1. Rationale

The RFI Processing Library was created to see whether it is possible to implement online RFI mitigation for the LOFAR telescope. As correlating samples is already very computationally expensive and is a real-time process, algorithms used for RFI mitigation have to be extremely fast and effective. For LOFAR purposes, we can afford only a few floating point operations per sample. For that reason, the algorithms chosen for implementation are relatively simple.

5.2. Implemented RFI algorithms

5.2.1. Threshold Blanking

This is the simplest algorithm in the library. The main idea is to mark samples whose real part is above a given level. The only parameter of this algorithm is the value of the threshold. A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            for each iterated sample
                if real part of sample exceeds threshold
                    mark sample

The algorithm is very simple, but very accurate in handling short bursts of RFI. The pseudocode above uses iterators over the frequency domain, but the algorithm can be applied to both the time and the frequency domain, and the results are the same.
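The innermost loop of this algorithm can be sketched as a small function template over an iterator range, in the spirit of the iterators provided by the adapters. The function name thresholdBlanking and the use of a std::vector<bool> to hold the marks are assumptions of this sketch; in the library itself, marking is done through the mark() method of the base class.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <iterator>
#include <vector>

// Marks every sample in [begin, end) whose real part exceeds the threshold.
// `marked` must hold one entry per sample in the range.
// Returns the number of samples newly marked by this call.
template <typename Iterator>
std::size_t thresholdBlanking(Iterator begin, Iterator end, double threshold,
                              std::vector<bool> &marked)
{
    assert(marked.size() == static_cast<std::size_t>(std::distance(begin, end)));
    std::size_t newlyMarked = 0, i = 0;
    for (Iterator it = begin; it != end; ++it, ++i) {
        if (it->real() > threshold && !marked[i]) {
            marked[i] = true;
            ++newlyMarked;
        }
    }
    return newlyMarked;
}
```

The cost is one comparison per sample, which fits the budget of a few floating point operations per sample mentioned in section 5.1.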

5.2.2. Parametrized Threshold Blanking

Parametrized threshold blanking is a modification of the simple threshold algorithm described above. The difference is that, if we find a sample with a value exceeding the threshold, we mark not only that particular sample, but also a rectangular area around it. The parameters of the algorithm are:

• The value of the threshold.


• The size of the rectangle in frequency domain.

• The size of the rectangle in time domain.

A description of the algorithm in pseudo code can be found below:

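for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            for each iterated sample
                if real part of sample exceeds threshold
                    mark all the samples from the given rectangle
                    around that sample that have not been marked yet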

The advantage of this approach is that, in the case of a really strong burst, the sample that exceeds the threshold is usually not the only one affected by the source of RFI, so we also mark its closest neighborhood. Just like the first algorithm, it can be used in both the time and the frequency domain.
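A minimal sketch of the rectangle marking for the data of one station and polarization is given below. It works on a time/frequency grid of real parts stored row by row. The function name, the grid layout, and the interpretation of the two size parameters as the number of neighbours marked on each side of the offending sample are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// realParts holds nrSamples rows of nrChannels values (index t * nrChannels + f).
// For every value above the threshold, marks the rectangle that extends
// timeSize rows and freqSize columns on each side, clipped to the grid.
std::vector<bool> parametrizedThresholdBlanking(
    const std::vector<double> &realParts,
    std::size_t nrSamples, std::size_t nrChannels, double threshold,
    std::size_t freqSize, std::size_t timeSize)
{
    assert(realParts.size() == nrSamples * nrChannels);
    std::vector<bool> marked(realParts.size(), false);
    for (std::size_t t = 0; t < nrSamples; ++t) {
        for (std::size_t f = 0; f < nrChannels; ++f) {
            if (realParts[t * nrChannels + f] <= threshold)
                continue;
            // Clip the rectangle without underflowing the unsigned indices.
            std::size_t t0 = t >= timeSize ? t - timeSize : 0;
            std::size_t t1 = std::min(nrSamples - 1, t + timeSize);
            std::size_t f0 = f >= freqSize ? f - freqSize : 0;
            std::size_t f1 = std::min(nrChannels - 1, f + freqSize);
            for (std::size_t tt = t0; tt <= t1; ++tt)
                for (std::size_t ff = f0; ff <= f1; ++ff)
                    marked[tt * nrChannels + ff] = true;
        }
    }
    return marked;
}
```

With both sizes set to zero the function reduces to the simple threshold blanking of section 5.2.1.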

5.2.3. Var Threshold Blanking

Var Threshold Blanking is another modification of the threshold algorithm. To mark a sample as invalid, a given number of consecutive samples in the frequency domain have to exceed the threshold value. The main reason to use this algorithm is to find RFI that is not as strong as single bursts, but is spread across multiple frequencies. Because of that, it can be used only in the frequency domain.

Parameters of Var Threshold Blanking are:

• The value of the threshold.

• The number of samples in a row that have to exceed the threshold level (�ag border).


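A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            count = 0
            get iterator over frequency
            for each iterated sample
                if real part of sample exceeds threshold
                    increment count
                    if count > flag border
                        mark processed sample
                    else if count == flag border
                        to_mark = count - 1
                        while to_mark >= 0
                            mark sample, which is to_mark positions before processed sample
                            decrement to_mark
                else
                    count = 0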

This is a simplified version of the algorithm described in [8], adapted to data before correlation.
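The run counting can be sketched in C++ for a single frequency row as shown below. The function name and the representation of the row as a vector of real parts are assumptions of this example. When the run of consecutive exceeding samples reaches the flag border, the samples of the run are marked retroactively; every further exceeding sample in the same run is marked immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marks samples that belong to a run of at least flagBorder consecutive
// values above the threshold. `row` holds the real parts of one frequency row.
std::vector<bool> varThresholdBlanking(const std::vector<double> &row,
                                       double threshold, std::size_t flagBorder)
{
    assert(flagBorder > 0);
    std::vector<bool> marked(row.size(), false);
    std::size_t count = 0; // length of the current run of exceeding samples
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (row[i] > threshold) {
            ++count;
            if (count > flagBorder) {
                marked[i] = true; // run already long enough, extend it
            } else if (count == flagBorder) {
                // Run just became long enough: mark all of its samples.
                for (std::size_t k = 0; k < count; ++k)
                    marked[i - k] = true;
            }
        } else {
            count = 0; // run broken
        }
    }
    return marked;
}
```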

5.2.4. Sum Threshold Blanking

Sum threshold blanking is a variant of the previous algorithm. The only difference is that, instead of testing whether each of the samples in a row exceeds some value, we test whether the sum of a given number of consecutive samples fulfills this condition. The parameters are:

• Value of the threshold for the sum.

• Number of summed samples.


A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            sum = 0
            clear samples Queue
            get iterator over frequency
            for each iterated sample
                sum += real part of sample
                push real part of sample into samples Queue
                if size of samples Queue > number of summed samples
                    sum -= oldest value, removed from the samples Queue
                if sum exceeds threshold
                    if in last iteration sample was marked
                        mark this sample
                    else
                        mark last N samples (including processed one),
                        where N = number of summed samples

This is a simplified version of the algorithm described in [8], adapted to data before correlation.
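The sliding sum can be sketched for one frequency row with a std::deque acting as the samples queue. The function name and the vector-of-real-parts input are assumptions of this example, and "in last iteration sample was marked" is interpreted here as "the sum exceeded the threshold in the previous iteration". At the start of the row, fewer than N samples may be available, so the retroactive marking is clipped to the samples seen so far.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Marks samples covered by a window of `window` consecutive values whose sum
// exceeds the threshold. `row` holds the real parts of one frequency row.
std::vector<bool> sumThresholdBlanking(const std::vector<double> &row,
                                       double threshold, std::size_t window)
{
    assert(window > 0);
    std::vector<bool> marked(row.size(), false);
    std::deque<double> queue; // values currently inside the window
    double sum = 0.0;
    bool previousExceeded = false;
    for (std::size_t i = 0; i < row.size(); ++i) {
        sum += row[i];
        queue.push_back(row[i]);
        if (queue.size() > window) {
            sum -= queue.front(); // drop the oldest value from the window
            queue.pop_front();
        }
        if (sum > threshold) {
            if (previousExceeded) {
                marked[i] = true; // earlier window samples are already marked
            } else {
                for (std::size_t k = 0; k < queue.size(); ++k)
                    marked[i - k] = true;
            }
            previousExceeded = true;
        } else {
            previousExceeded = false;
        }
    }
    return marked;
}
```

Keeping a running sum makes the cost per sample constant, independent of the window length.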

5.2.5. Auto Threshold Blanking

This is the first algorithm in the library with a dynamically determined threshold. For each chunk of data, it calculates the mean and standard deviation of the real values of the samples, and sets the threshold level to mean + aggressiveness * standard deviation, where aggressiveness is a parameter of the algorithm. A description of the algorithm in pseudo code can be found below:

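for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            calculate mean of real parts of samples
            calculate standard deviation of real parts of samples
            set threshold level to mean + aggressiveness * standard deviation
            for each iterated sample
                if real part of sample exceeds threshold level
                    mark sample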

A second variant of this algorithm, the median variant, has been implemented as well. The only difference is that this variant sets the threshold level to median + aggressiveness * standard deviation.
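The mean variant can be sketched for one chunk of data as follows. The function name and the input as a vector of real parts are assumptions of this example, as is the use of the population standard deviation (division by the number of samples); the text does not state which estimator the library uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Marks samples above mean + aggressiveness * standard deviation, where both
// statistics are computed from the chunk itself.
std::vector<bool> autoThresholdBlanking(const std::vector<double> &chunk,
                                        double aggressiveness)
{
    assert(aggressiveness >= 0.0);
    std::vector<bool> marked(chunk.size(), false);
    if (chunk.empty())
        return marked;

    double mean = 0.0;
    for (std::size_t i = 0; i < chunk.size(); ++i)
        mean += chunk[i];
    mean /= chunk.size();

    double variance = 0.0;
    for (std::size_t i = 0; i < chunk.size(); ++i)
        variance += (chunk[i] - mean) * (chunk[i] - mean);
    variance /= chunk.size(); // population variance (an assumption)

    const double level = mean + aggressiveness * std::sqrt(variance);
    for (std::size_t i = 0; i < chunk.size(); ++i)
        marked[i] = chunk[i] > level;
    return marked;
}
```

For example, for nine samples of value 1 and one sample of value 11, the mean is 2 and the standard deviation is 3, so with an aggressiveness of 2 the level is 8 and only the last sample is marked. Note that a strong burst raises both the mean and the standard deviation, so the threshold adapts to the chunk but is also pulled up by the RFI itself.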

5.2.6. The APB Algorithm

The APB Algorithm is a threshold algorithm with dynamic calculation of the threshold level. The main idea is, like in the Auto Threshold Blanking algorithm, to mark samples that exceed the level $\mu + \mathit{aggr} \cdot \sigma$, where $\mu$ is the mean of the samples, $\sigma$ is their standard deviation, and $\mathit{aggr}$ describes how aggressive the algorithm should be.

The APB Algorithm differs from Auto Threshold Blanking in two main ways:

• It works on norms of samples, instead of their real parts.

• Statistics are estimated.

The Algorithm has the following parameters:

• aggr - aggressiveness of the algorithm.

• FIFO - length of the queue used to calculate the mean and standard deviation.

• n-blank - number of samples that have to be marked around the sample exceeding the threshold (has to be lower than the FIFO length).

• step - used to increase efficiency by checking only one in every step samples.


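A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency domain
            detectInLine()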

where the detectInLine() method looks as follows:

calculate initial mean of norms for first FIFO samples
calculate initial standard deviation of norms for first FIFO samples
iterate over samples with given step
    estimate new mean of norms
    estimate new standard deviation of norms
    if norm of given sample exceeds mean + aggr * standard deviation
        mark (n-blank / 2) samples before and after this sample

As this algorithm flags independent samples, theoretically it can be used in both the time and the frequency domain.

A detailed description of the algorithm can be found in [14].

5.2.7. Threshold Flagging

Threshold flagging, in contrast to the algorithms described in the previous sections, works on data after correlation. The main idea is the same as in the algorithm described in section 5.2.1: we mark samples with values that are above the given level. Just like in the blanking version, the only parameter of this algorithm is the threshold level.

A description of the algorithm in pseudo code can be found below:

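for each baseline (pair of stations)
    for each polarization
        for each moment in time domain
            get iterator over frequency
            for each iterated sample
                if value of sample exceeds threshold
                    mark sample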

5.2.8. Var Threshold Flagging

Var threshold flagging is an adaptation of the algorithm described in section 5.2.3 to data after correlation. It has the same parameters, and the only difference is that it works on baselines (pairs of stations) instead of single stations.


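A description of the algorithm in pseudo code can be found below:

for each baseline (pair of stations)
    for each polarization
        for each moment in time domain
            count = 0
            get iterator over frequency
            for each iterated sample
                if value of sample exceeds threshold
                    increment count
                    if count > flag border
                        mark processed sample
                    else if count == flag border
                        to_mark = count - 1
                        while to_mark >= 0
                            mark sample, which is to_mark positions before processed sample
                            decrement to_mark
                else
                    count = 0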

5.3. Adding algorithms to the RFI Processing Library

To add a new algorithm to the library, we have to create a class that inherits from one of two classes:

• PreCorrelationAlgorithm - for algorithms working on data before correlation.

• PostCorrelationAlgorithm - for algorithms working on data after correlation.

The algorithm has to override the pure virtual method detect(). This method is called by the InterferenceMitigator object; its goal is to detect invalid samples.

In the detect() method we should use either getIteratorsFrequency() or getIteratorsTime() to obtain iterators over the data. Which one to choose depends on whether we want to iterate over the time or over the frequency domain. If the algorithm decides that a particular sample is invalid, it has to call the mark() method of the base class. This method is responsible for updating the internal structures used for calculating statistics and comparing algorithms with each other.

In the base classes, the getMedian(), getAverage() and getStdDev() methods are defined. They can be used, for example, to determine the threshold.

5.4. Implemented station detectors

5.4.1. PreCorrelation Detector

The PreCorrelation Detector detects corrupted stations based on the samples before correlation. By calculating the mean and standard deviation of the real values of samples across all the stations, we determine a window in which the real values should fit. If the value from some station does not fit into this window, we increment the counter of marked samples for this particular station. If the counter exceeds a given level, the station is marked as corrupted.

The Detector has the following parameters:

• aggr - aggressiveness of the algorithm.

• part ∈ (0, 1) - part of all the samples that have to be disturbed to mark the station.

The algorithm looks as follows:

for each time chunk of data
    stationsCounter = 0
    for each sample, channel, polarization
        calculate mean of real parts of samples
        calculate standard deviation of real parts of samples
        low = mean - aggr * standard deviation
        high = mean + aggr * standard deviation
        for each station
            if real part of sample is not in (low, high)
                increment stationsCounter[station]
            if stationsCounter[station] exceeds (the number of all samples per station * part)
                mark station as corrupted on this chunk of data
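The detector can be sketched in C++ for one time chunk as follows. The function name and the input layout (one vector of real parts per station, with the sample, channel and polarization indices flattened into a single index) are assumptions made for this example. Two details also differ slightly from the pseudo code and are choices of this sketch: values exactly on the window boundary count as fitting, and the final comparison against the limit is done once after all samples have been counted, which yields the same set of corrupted stations.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// realParts[station][k] is the real part of sample k of that station, where k
// enumerates all (sample, channel, polarization) combinations of one chunk.
// Returns, per station, whether it is considered corrupted on this chunk.
std::vector<bool> detectCorruptedStations(
    const std::vector<std::vector<double> > &realParts, double aggr, double part)
{
    const std::size_t nrStations = realParts.size();
    assert(nrStations > 0 && part > 0.0 && part < 1.0);
    const std::size_t nrSamples = realParts[0].size();
    std::vector<std::size_t> counter(nrStations, 0);

    for (std::size_t k = 0; k < nrSamples; ++k) {
        // Window across all stations for this sample/channel/polarization.
        double mean = 0.0;
        for (std::size_t s = 0; s < nrStations; ++s) {
            assert(realParts[s].size() == nrSamples);
            mean += realParts[s][k];
        }
        mean /= nrStations;
        double variance = 0.0;
        for (std::size_t s = 0; s < nrStations; ++s)
            variance += (realParts[s][k] - mean) * (realParts[s][k] - mean);
        const double stdDev = std::sqrt(variance / nrStations);
        const double low = mean - aggr * stdDev, high = mean + aggr * stdDev;

        for (std::size_t s = 0; s < nrStations; ++s)
            if (realParts[s][k] < low || realParts[s][k] > high)
                ++counter[s];
    }

    std::vector<bool> corrupted(nrStations, false);
    for (std::size_t s = 0; s < nrStations; ++s)
        corrupted[s] = counter[s] > nrSamples * part;
    return corrupted;
}
```

One consequence of computing the window from the stations themselves is worth noting: with N stations, a single outlier can deviate from the mean by at most $\sqrt{N-1}$ standard deviations, so for small N the aggressiveness has to be chosen below that value for a lone bad station to be detectable.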

5.4.2. PostCorrelation Detector

The PostCorrelation Detector is very similar to the PreCorrelation Detector. The main difference is that it works on the data after correlation. Because of that, we aggregate the data over baselines (pairs of stations) instead of stations.

The Detector has the following parameters:

• aggr > 0 - aggressiveness of the algorithm.

• baselines - number of baselines including a particular station that have to be disturbed to mark the station.

• part ∈ (0, 1) - part of all the samples that have to be disturbed to mark the station.

The algorithm looks as follows:

for each time chunk of data
    stationsCounter[station] = 0 for each station
    for each sample, channel, polarization
        stationsCounterLocal[station] = 0 for each station
        calculate mean of real parts of samples across baselines
        calculate standard deviation of real parts of samples across baselines
        low = mean - aggr * standard deviation
        high = mean + aggr * standard deviation
        for each baseline(station1, station2)
            if real part of sample is not in (low, high)
                increment stationsCounterLocal[station1]
                increment stationsCounterLocal[station2]
        for each station
            if stationsCounterLocal[station] exceeds baselines
                increment stationsCounter[station]
            if stationsCounter[station] exceeds (the number of samples per baseline * part)
                mark station as corrupted on this chunk of data
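A possible Python rendering of this pseudocode is shown below. It is only a sketch under stated assumptions: each "cell" stands for one channel/polarization combination and is modelled as a dictionary from a baseline (station1, station2) to its integrated complex sample; the function name and the toy data are hypothetical.

```python
import statistics
from itertools import combinations

def post_correlation_detect(cells, n_stations, aggr, baselines, part):
    """Return one boolean per station: True if the station is marked as corrupted."""
    counters = [0] * n_stations
    for cell in cells:
        local = [0] * n_stations
        reals = [value.real for value in cell.values()]
        mean = statistics.mean(reals)
        std = statistics.pstdev(reals)
        low, high = mean - aggr * std, mean + aggr * std
        for (s1, s2), value in cell.items():
            if not (low < value.real < high):
                local[s1] += 1
                local[s2] += 1
        for station in range(n_stations):
            if local[station] > baselines:
                counters[station] += 1
    return [count > len(cells) * part for count in counters]

# Toy data: 6 stations, every baseline that includes station 2 is disturbed.
cell = {(a, b): complex(10.0 if 2 in (a, b) else 1.0, 0.0)
        for a, b in combinations(range(6), 2)}
result = post_correlation_detect([cell] * 3, 6, aggr=1.0, baselines=2, part=0.5)
print(result)  # [False, False, True, False, False, False]
```

Note that a single bad station disturbs every baseline it takes part in, so with few stations a large share of the baselines is affected and the window (low, high) widens accordingly.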

5.5. Adding station detectors to the RFI Processing Library

To add a new station detector to the library, we have to create a class that inherits from the StationDetector template class.

The detector has to override the abstract virtual method detect(). This method is called by the InterferenceMitigator object; its goal is to detect invalid stations. The methods getIteratorsBaselines() and getIteratorsStations() can be used to iterate over samples in the stations domain. The detect() method is called once for each chunk of data. The result of the detection process is a vector of boolean values that states, for each station, whether it is corrupted or not.


Chapter 6

Results obtained for LOFAR

6.1. Integration with LOFAR correlator software

To test the algorithms, a dedicated version of the processing software has been created. Instead of using the live data streams from the stations, it takes the data from existing stored observations. The tests can be run offline, outside the BlueGene/P computer. The great advantage of doing so is that the tests are repeatable on the same data, allowing us to compare the effectiveness of the algorithms.

6.1.1. Adapters

To make the data understandable by the library, two adapter classes have been created: one for the data before correlation and one for the data after correlation.

The class that contains the precorrelated data after the FFT transformation is called FilteredData. Samples are stored in a four-dimensional multi array from the Boost library, containing data represented by complex float numbers. The dimensions are:

• The number of samples (time domain).

• The number of channels (frequency domain).

• The polarization of sample (X or Y).

• The number of stations.

The FilteredDataAdapter class has been created to adapt the data. To provide the required functionality, a one-dimensional view of the array is created, and iterators to the beginning and end of this view are returned. The class also defines two types:

• SampleType - as float.

• IteratorType - as MultiDimArray<std::complex<float>, 1>::iterator - standard Boost iterator over a one-dimensional array (details can be found in [15]).
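To make the idea of a one-dimensional view more concrete, the following Python sketch walks along one axis of a four-dimensional array stored in a flat buffer. It assumes row-major storage in the order of the dimensions listed above; the function names are hypothetical and the sketch does not reproduce the Boost API.

```python
def strides(shape):
    """Row-major strides (in elements) for the given shape."""
    result = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        result[i] = result[i + 1] * shape[i + 1]
    return result

def view_1d(flat, shape, fixed, axis):
    """Yield the elements along `axis`; all other indices are taken from `fixed`."""
    st = strides(shape)
    base = sum(i * s for d, (i, s) in enumerate(zip(fixed, st)) if d != axis)
    for k in range(shape[axis]):
        yield flat[base + k * st[axis]]

# Toy array: 2 samples x 3 channels x 2 polarizations x 2 stations.
shape = (2, 3, 2, 2)
flat = list(range(24))
# Iterate over the frequency domain (axis 1) for sample 1, polarization 1, station 0.
print(list(view_1d(flat, shape, (1, 0, 1, 0), axis=1)))  # [14, 18, 22]
```

The same helper with axis=0 iterates over the time domain, which corresponds to the choice between iterating over time and over frequency.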

The class that contains the correlated data is called CorrelatedData. Samples are stored in a four-dimensional multi array from the Boost library, containing data represented by complex float numbers. The dimensions are:

• The number of channels (frequency domain).

• The number of baselines (two stations combined).

• The polarization of sample from the first station (X or Y).

• The polarization of sample from the second station (X or Y).

Note that there is no time domain, because the samples over a whole second are integrated into one by complex addition (for details, see [6]). To adapt the data, the CorrelatedDataAdapter class has been created. This class returns an iterator over a one-dimensional view as well. The types defined by this class are exactly the same as in the FilteredDataAdapter case.
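The integration step can be illustrated with a short Python sketch. It assumes the usual correlator operation, in which a sample from one station is multiplied by the complex conjugate of the sample from the other station and the products are accumulated; the function name is hypothetical and the details of the LOFAR implementation are described in [6].

```python
def correlate(station_a, station_b):
    """Integrate one baseline: sum of a[t] * conj(b[t]) over all time samples."""
    acc = 0j
    for a, b in zip(station_a, station_b):
        acc += a * b.conjugate()
    return acc

# Correlating a signal with itself yields its total power (a real number).
signal = [1 + 2j, 3 - 1j]
print(correlate(signal, signal))  # (15+0j)
```

Because all time samples of the integration period collapse into a single value per baseline, an algorithm working on correlated data can no longer tell which time samples were affected.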

The InterferenceMitigator with a FilteredDataAdapter and a CorrelatedDataAdapter as template parameters is created in the main processing class of the real-time processing software. The detection process on FilteredData is performed just after the FFT transformation, while on CorrelatedData it is done after the correlation process.

According to Figure 2.3, in the LOFAR processing pipeline RFI mitigation on the uncorrelated data is done after the PPF (6), while on the correlated data it is done just after the correlation (16). The feedback mechanism affects the data after the superstation beam forming (9).


6.2. Testing methodology

The tests have been performed on an idle server with the following configuration:

• Intel(R) Core(TM) i7 CPU 920 2.67GHz - 8 cores

• 6 GB RAM

• Linux jupiter 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 04:38:19 UTC 2010 x86_64 GNU/Linux

Real observation files were used, containing the raw station output data of an observation with 5 stations and 5 subbands. 5 MPI processes working in parallel were used. The observation was about two hours long and was done on Wed Apr 28 2010 19:25:03 GMT+0200. The subbands and stations were chosen to present different types of RFI.

The list of subbands:

• 138 (27MC radio wavelength)

• 183 (model airplanes, the stations are close to a model airplane airfield)

• 256 (clean)

• 282 (TV)

• 283 (TV)

The stations used (CS = core station, RS = remote station):

• CS004

• CS006 (close to CS004)

• RS205 (electric fence to keep sheep in the field)

• RS306

• RS208 (far away from the core)


6.3. Results for the post correlation algorithms

In Figure 6.1 we can see the visualized correlated data in the time/frequency domain. The red part represents clean data, while the white part represents RFI detected by the offline reference flagger on the complete dataset. As the data contain a lot of RFI, we will use this figure as a reference for estimating the power of the post correlation algorithms throughout this section.

Figure 6.1: Visualization of the LOFAR data on the model airplanes subband - reference

Blue points in each of the following figures represent the samples marked by the particular algorithm described in the figure's caption.

In Figure 6.2 we can see the results obtained by the Threshold Flagging Algorithm with the threshold level set to 0.005. Although a large part of the RFI is correctly marked, there is a huge number of false positives, which disqualifies this setting for practical use.

In Figure 6.3 we can see the results obtained by the Threshold Flagging Algorithm with the threshold level set to 0.5. In this case there are no false positives, but the amount of correctly marked RFI drops sharply. Therefore, a threshold level that high cannot be used in practice either.

In Figure 6.4 we can see the results obtained by the Threshold Flagging Algorithm with the threshold level set to 0.05. In this case the amount of correctly marked samples is almost the same as with the threshold level set to 0.005, and there are still no false positives. That makes 0.05 the most suitable choice for the threshold level in the Threshold Flagging algorithm. In Figure 6.5 we can see the results obtained by this algorithm on the clean subband for a longer period of time. Also in this case, the algorithm does not produce any false positives, while a significant part of the RFI is marked.

In Figures 6.6 - 6.8 there are results obtained by the Var Threshold Flagging Algorithm. The threshold level in each case is set lower than 0.5, as for this level even the Threshold Flagging algorithm produces no false positives.


Figure 6.2: Visualization of the Threshold Flagging results with threshold = 0.005


Figure 6.5: Visualization of the Threshold Flagging results with threshold = 0.05 (clean subband)

In Figure 6.6 we can see the results obtained by the Var Threshold Flagging Algorithm with the threshold level set to 0.01 and the window size set to 4. There are almost no false positives, but the amount of correctly flagged samples is lower than the amount achieved by the Threshold Flagging algorithm with the threshold level set to 0.05. Unfortunately, as can be seen in Figures 6.7 and 6.8, decreasing the threshold value only increases the false positive ratio, while the amount of correctly marked samples stays constant.

Figure 6.6: Visualization of the Var Threshold Flagging results with threshold = 0.01 and window size = 4




6.4. Results for the pre correlation algorithms

As no tool for visualizing uncorrelated data has been created, for the algorithms working on precorrelated samples we provide an analysis of the results based on the percentage of samples marked by each algorithm. Because the subbands were carefully chosen, we can estimate the power of the algorithms.

Figures 6.9 - 6.16 show the percentage of marked samples depending on the algorithm parameter. Each line represents one subband.

As the impact of RFI on the clean subband should be relatively small, we can treat the percentage of marked samples on the clean subband as a false-positive ratio.

We can see that increasing the threshold (or window size) decreases the false-positive ratio and increases the false-negative ratio. Samples from the TV subbands, which obviously should be marked and are marked by the algorithms with a lower threshold level, are no longer marked when we increase those values.

As we can see in Figure 6.10, some algorithms can have a large false-positive ratio in particular cases. In this case, model airplanes emit very short but relatively strong bursts. If we mark not only the sample that is above the threshold but also the samples before and after it, we drastically increase the false-positive ratio.
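The effect described above can be reproduced with a small Python sketch. It is only an illustration: the `radius` parameter and both function names are hypothetical and stand in for whatever neighbourhood the Parametrized Threshold Blanking algorithm actually uses.

```python
def threshold_blanking(amplitudes, threshold):
    """Mark every sample whose amplitude exceeds the threshold."""
    return {i for i, a in enumerate(amplitudes) if a > threshold}

def with_neighbours(marked, n, radius=1):
    """Also mark `radius` samples before and after every marked sample."""
    widened = set()
    for i in marked:
        widened.update(range(max(0, i - radius), min(n, i + radius + 1)))
    return widened

# Two very short bursts in otherwise quiet data.
amplitudes = [1, 1, 9, 1, 1, 1, 9, 1]
marked = threshold_blanking(amplitudes, threshold=5)
widened = with_neighbours(marked, len(amplitudes))
print(sorted(marked))   # [2, 6]
print(sorted(widened))  # [1, 2, 3, 5, 6, 7]
```

For isolated single-sample bursts, marking one neighbour on each side triples the number of marked samples, and all of the additional samples are clean.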

Figures 6.12 and 6.14 show that increasing the window size, even to small values like 5, decreases the percentage of marked samples to the level of statistical error. Only in the model airplanes subband is some RFI spread over frequencies at a single moment in time. That makes those algorithms not very useful.

Sum Threshold marks a lot more samples than Var Threshold. Many of those samples are false positives - one sample with an extremely high value affects the marking of the samples next to it.

In Figures 6.15 and 6.16 we can see the results obtained by the Auto Threshold Blanking algorithm. As the percentage of marked samples is almost the same for all the subbands and does not differ across the stations, none of the variants can be used in practice for determining the threshold. Results from the APB algorithm are not shown, as for the frequency domain it seems to be completely useless. The flagged data seem to be completely random, with the same amount of flagged samples for each station/subband.

Figure 6.9: Percentage of marked samples by the Threshold Blanking algorithm

6.4.1. Frequency/Time row average

Currently, the LOFAR software can only mark whole time/frequency rows of a chunk of data as invalid. To see whether a mechanism for marking individual samples is needed, we have created a mechanism that checks how many samples are marked in a row that contains at least one marked sample.

As samples after correlation are integrated over time, the results are relevant only for the pre-correlation algorithms.

In Figure 6.17 we can see how many samples with different timestamps and the same frequency channel are marked, for those frequencies that have at least one sample marked as invalid.

In Figure 6.18 we can see how many samples from different frequency channels with the same timestamp are marked, for those timestamps that have at least one sample marked as invalid.


Figure 6.10: Percentage of marked samples by the Parametrized Threshold Blanking algorithm

As we can see in both figures, only in the model airplanes subband is RFI concentrated in particular places. In that case the loss of good samples caused by marking the whole frequency/time row may be acceptable.

In the other cases, the percentages are between 0 and 10 percent, so marking whole rows would discard at least nine good samples for every bad one.
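The row average discussed in this section can be computed as in the following Python sketch. The flag matrix layout (one list of booleans per row) and the function names are assumptions made for the illustration.

```python
def row_average(flags):
    """Average percentage of marked samples over the rows with at least one marked sample."""
    hit_rows = [row for row in flags if any(row)]
    if not hit_rows:
        return 0.0
    return 100.0 * sum(sum(row) / len(row) for row in hit_rows) / len(hit_rows)

def transpose(flags):
    """Swap rows and columns, to switch between time rows and frequency rows."""
    return [list(column) for column in zip(*flags)]

flags = [
    [True, False, False, False],
    [False, False, False, False],
    [True, True, False, False],
]
print(row_average(flags))  # 37.5
```

A row average of p percent means that marking whole rows would discard (100 - p) / p good samples for every bad one, for example 9 good samples per bad sample at 10 percent.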

6.5. Results for the station detectors

Tables 6.1 and 6.2 present data obtained by the Pre Correlation Station Detector for three different subbands:

• The radio subband.

• The clean subband.

• The TV subband.

The first row of each table gives the aggressiveness of the algorithm; the first column gives the percentage of samples that have to be corrupted to mark the station as misbehaving. Each field in the table has three values, for the radio, clean, and TV subband, respectively.

In Table 6.1 we can see the percentage of chunks of data that have been marked for the given data. As all the stations were behaving correctly, we can look at the values in the table as the false positive ratio. As we can see, the values in the top left corner of the table, with aggressiveness around 1.0 and required percentage of marked samples around 10, are close to 100, so their usefulness is close to 0. On the other hand, the algorithm with aggressiveness around 2 and required percentage of samples above 30 had no false positives.

Figure 6.11: Percentage of marked samples by the Var Threshold Blanking algorithm with window size = 3

Required % \ Aggressiveness   1.25              1.50              1.75              2.00              2.25
10                            100 / 100 / 100   80 / 99 / 100     20 / 19 / 3       0 / 0 / 0         0 / 0 / 0
20                            60 / 99 / 100     20 / 20 / 10      11 / 1 / 0        0 / 0 / 0         0 / 0 / 0
30                            20 / 20 / 19      19 / 3 / 0        0 / 0 / 0         0 / 0 / 0         0 / 0 / 0
40                            20 / 7 / 0        2 / 0 / 0         0 / 0 / 0         0 / 0 / 0         0 / 0 / 0

Table 6.1: Pre Correlation Station Detector - percentage of marked stations on clean data

To test the algorithm, we have created a mechanism for injecting an artificial disturbance into the samples from a given station. Two different approaches have been used:

• Multiplying all the samples from the particular station by a given constant factor.

• Multiplying all the samples from the given station by a random number in (−f, f), where f is a given factor.
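Both approaches can be sketched in a few lines of Python. The function names are hypothetical, and random.uniform draws from the closed interval [−f, f] rather than the open one, which makes no practical difference for this illustration.

```python
import random

def disturb_constant(samples, factor):
    """Multiply every sample of a station by a constant factor."""
    return [s * factor for s in samples]

def disturb_random(samples, factor, rng=random):
    """Multiply every sample of a station by a random number between -factor and factor."""
    return [s * rng.uniform(-factor, factor) for s in samples]

station = [1 + 1j, 2j, -3 + 0j]
print(disturb_constant(station, 4))  # [(4+4j), 8j, (-12+0j)]
```

Passing a seeded generator, for example `disturb_random(station, 4.0, random.Random(0))`, keeps the injected disturbance repeatable between test runs.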

A set of tests using both approaches has been performed, with factors ranging from 2 to 8. Table 6.2 contains the results of the tests. The values in the table represent the percentage of artificially disturbed chunks of data that have been correctly recognized as coming from a misbehaving station.

Figure 6.12: Percentage of marked samples by the Var Threshold Blanking algorithm with threshold = 1200

Required % \ Aggressiveness   1.25              1.50              1.75              2.00              2.25
10                            100 / 100 / 100   100 / 100 / 100   100 / 100 / 100   0 / 0 / 0         0 / 0 / 0
20                            100 / 100 / 100   100 / 100 / 100   50 / 50 / 50      0 / 0 / 0         0 / 0 / 0
30                            100 / 100 / 100   50 / 80 / 79      50 / 50 / 50      0 / 0 / 0         0 / 0 / 0
40                            50 / 85 / 82      50 / 50 / 50      50 / 50 / 50      0 / 0 / 0         0 / 0 / 0

Table 6.2: Pre Correlation Stations Detector - percentage of marked disturbed chunks of data

As we can see, for the aggressiveness set to 1.5 and the required percentage of samples set to 20, all the disturbed stations were recognized correctly. Unfortunately, the false positive ratio in this case is still significant. Still, if the quality of the astronomical data for a particular observation is crucial, we can afford to lose some stations that behave relatively badly in order to get completely clean output.

For the aggressiveness set to 1.75 and the required percentage of samples set to 30, we still correctly recognize 50 percent of the artificial disturbances with no false positives at all. If we cannot afford a loss of good data, this seems to be the most suitable choice.

Figure 6.13: Percentage of marked samples by the Sum Threshold Blanking algorithm with window size = 3

Unfortunately, the results obtained for the Post Correlation Stations Detector were not satisfying - either all the stations were marked or none of them.

6.6. Processing time

In Figure 6.19 we can see the processing time for one subband and 5 stations for the precorrelation algorithms. As LOFAR will have approximately 64 stations, those results should be multiplied by a factor of about 13. The tests were performed on the test server, so the results can differ on the BlueGene/P supercomputer.

We can see that the processing times for the first 3 algorithms are very similar, around 15 ms, so for 64 stations they would be around 200 ms. The Sum Threshold Blanking algorithm is more computationally expensive - its processing time is around 3 times larger than in the other cases, while its results are very similar to (or worse than) those of the Var Threshold Blanking algorithm.

The cost of adding a dynamically calculated threshold can be seen for the Auto Threshold Blanking algorithm. While the results are highly unsatisfying, the calculation of the mean and standard deviation makes the algorithm twice as expensive as in the standard case. The median variant is even more expensive. The APB algorithm, which works on norms, is extremely expensive.

Figure 6.14: Percentage of marked samples by the Sum Threshold Blanking algorithm with threshold = 2000 * window size

The PreCorrelation Stations Detector algorithm is very fast, as calculating the mean and standard deviation across a few stations is not computationally expensive. Therefore, given the good behavior of this algorithm, it is recommended to integrate this solution with the existing software.

If we want to create our own mechanism for determining the threshold, Figure 6.20 shows the time needed for calculating the essential statistics using the built-in solutions. As we can see, the time needed for calculating an average is around 5 ms, which is less than half of the time needed to perform simple thresholding, but the time needed for calculating the median is 6 times larger, as the selection algorithm is computationally expensive [13]. Calculating the standard deviation is only two times more expensive than calculating the average value, but the average value has to be calculated first.

In Figure 6.21 we can see the processing time for the postcorrelation algorithms. The times look very low, but they grow quadratically with the number of stations. To get the processing time for 64 stations we have to multiply those results by a factor of about 200, which gives results only a little lower than in the precorrelation case.
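The scaling factors used in this section can be checked with a few lines of Python. The helper names are hypothetical; whether autocorrelations are counted as baselines is an assumption, so both variants are shown.

```python
def linear_factor(n_target, n_measured):
    """Scaling factor for work that grows linearly with the number of stations."""
    return n_target / n_measured

def cross_baselines(n):
    """Number of station pairs, without autocorrelations."""
    return n * (n - 1) // 2

def all_baselines(n):
    """Number of station pairs, including autocorrelations."""
    return n * (n + 1) // 2

linear = linear_factor(64, 5)                        # 12.8
quad_cross = cross_baselines(64) / cross_baselines(5)  # 2016 / 10 = 201.6
quad_all = all_baselines(64) / all_baselines(5)        # 2080 / 15, about 138.7
print(linear, 15 * linear)   # precorrelation: 15 ms scales to roughly 192 ms
print(quad_cross, quad_all)
```

The factor of about 200 quoted above corresponds to counting only cross-correlations (2016 baselines for 64 stations against 10 for 5 stations).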

Figure 6.15: Percentage of marked samples by the Auto Threshold Blanking algorithm (average variant)

6.7. Results of comparisons

6.7.1. Comparisons between precorrelation algorithms

The results of the comparisons between the precorrelation algorithms show that the algorithms were designed to find different types of RFI.

Comparisons between algorithms of the same type with different parameters can be omitted - it is obvious that an algorithm with a lower threshold marks all the samples marked by the algorithm with a higher threshold, plus some additional samples.

In Table 6.3 we compare the algorithms. The value in column A and row B is the average percentage of the samples marked by algorithm B that have also been marked by algorithm A.

                         Threshold   Parametrized Threshold   Var Threshold   Sum Threshold
Threshold                -           85                       29              60
Parametrized Threshold   40          -                        15              37
Var Threshold            85          90                       -               90
Sum Threshold            55          83                       60              -

Table 6.3: Comparison between precorrelation algorithms - TV Subband
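A single entry of such a comparison table could be computed as in the Python sketch below. The set-based representation of marked samples and the function name are assumptions made for the illustration.

```python
def overlap_percent(marked_a, marked_b):
    """Percentage of the samples marked by B that were also marked by A.

    This corresponds to the entry in column A and row B of the table.
    """
    if not marked_b:
        return 0.0
    return 100.0 * len(marked_a & marked_b) / len(marked_b)

marked_a = {1, 2, 3, 4}
marked_b = {3, 4, 5, 6, 7}
print(overlap_percent(marked_a, marked_b))  # 40.0
print(overlap_percent(marked_b, marked_a))  # 50.0
```

The measure is not symmetric, which is why the table has different values above and below its diagonal.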

Figure 6.16: Percentage of marked samples by the Auto Threshold Blanking algorithm (median variant)

As we can see in the table, none of the algorithms marks more than 40 percent of the samples marked by the Parametrized Threshold Blanking algorithm - the false positive ratio of this algorithm is very high.

The Var Threshold Blanking algorithm is specialized in finding RFI which is spread over multiple frequencies, so it marks only 30 percent of the samples marked by the simple threshold algorithm - the false negative ratio of the Var Threshold algorithm for short single bursts is very high. On the other hand, almost all the samples marked by the Var Threshold algorithm are marked by the other ones.

The Sum Threshold algorithm marks 90 percent of the samples marked by the Var Threshold algorithm. At the same time, it marks 60 percent of the samples marked by the simple threshold - twice as many as the Var Threshold Blanking algorithm. This means that, with a similar true positive ratio, the false negative ratio of the Sum Threshold Blanking algorithm is lower.

Table 6.3 presents the results from the TV subband, but the results from the other subbands were comparable.

6.7.2. Results obtained by the feedback mechanism

The results using the feedback loop were highly unsatisfying - the percentage of samples marked by the feedback mechanism in the data before correlation was very low, even for high percentages of samples marked in the correlated data (around 10 percent). Checking whether the samples from all the baselines including a particular station are marked as RFI in the data after correlation is not a good criterion for the feedback mechanism.

Figure 6.17: Percentage of marked samples in a time row

For example, the feedback mechanism for the Threshold Blanking algorithm with the threshold level set to 0.005, which gives a huge number of false positives (see Section 6.3), marked only 0.3 percent of the samples in the uncorrelated data. Keeping in mind that, because of the data integration, we mark only whole time rows, this amount is extremely small.

We have tried to change the mechanism so that a higher percentage of samples would be marked. The feedback mechanism was made less strict: to mark a sample in the uncorrelated data as invalid, only a part of the baselines including the station has to be marked as invalid by the post correlation algorithm. Still, even after setting the required number of affected baselines to 2, the valuable Threshold Blanking algorithm with the threshold level set to 0.05 marked less than one percent of the uncorrelated samples. Therefore, the feedback mechanism in its current state cannot be used for the LOFAR telescope.
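The two feedback criteria described above can be sketched in Python as follows. The function name, the `required` parameter and the representation of flagged baselines as station pairs are assumptions made for the illustration, not the interface of the actual feedback mechanism.

```python
def feedback_stations(flagged_baselines, n_stations, required=None):
    """Decide per station whether its uncorrelated data should be marked.

    flagged_baselines: pairs (station1, station2) marked by a post correlation algorithm.
    required: number of flagged baselines needed; None selects the strict variant,
              in which every cross baseline of the station has to be flagged.
    """
    counts = [0] * n_stations
    for a, b in flagged_baselines:
        counts[a] += 1
        if b != a:
            counts[b] += 1
    if required is None:
        required = n_stations - 1
    return [count >= required for count in counts]

flagged = {(0, 1), (0, 2), (0, 3), (1, 2)}
print(feedback_stations(flagged, 4))              # [True, False, False, False]
print(feedback_stations(flagged, 4, required=2))  # [True, True, True, False]
```

Lowering `required` marks more stations, but it also propagates flags to stations that merely share a baseline with a disturbed one.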

Figure 6.18: Percentage of marked samples in a frequency row

Figure 6.19: Processing time for one subband and 5 stations - Precorrelation Algorithms

Figure 6.20: Processing time for one subband and 5 stations - Statistics

Figure 6.21: Processing time for one subband and 5 stations - Postcorrelation Algorithms

Chapter 7

Conclusions

7.1. Recommendation for LOFAR telescope

As some of the science pipelines in the LOFAR telescope do not use correlated data at all and the results of the feedback mechanism are highly unsatisfying, implementing at least one of the algorithms that work on the data before correlation is essential. As compute nodes get chunks with only one second of data, and each time they can receive a different subband, we have to keep in mind that we should focus more on the frequency domain.

Therefore, we have:

• The APB Algorithm - it was designed to work in the time domain, and according to the results it is not applicable to the frequency domain (see Section 6.4).

• The Parametrized Threshold Blanking algorithm - it has a very high false-positive ratio (see Figure 6.10).

• The Var Threshold Blanking algorithm and the Sum Threshold Blanking algorithm - in the frequency domain they are supposed to detect RFI which is spread over multiple frequency channels, but according to the results, increasing the window size to values more than 5 drastically decreases the number of marked samples (see Figures 6.12 and 6.14), so they do not fulfill their primary goal.

• The Threshold Blanking Algorithm - it detects most of the data marked by the other algorithms and has a very low false-positive ratio (see Section 6.4).

According to the facts presented above, the simple Threshold Blanking Algorithm seems to be the most suitable choice for online LOFAR RFI mitigation on the data before correlation.

As the algorithm is very simple, it can be implemented on the BlueGene/P supercomputer easily and efficiently. Unfortunately, the time/frequency row average is very low (see Section 6.4.1), and marking a whole time or frequency row would mean a big loss of good data. Therefore, a mechanism that makes marking particular samples possible has to be implemented as well.

The threshold for the algorithm can be set as a constant at a level around 1200, where the false positive ratio seems to be very small and most of the data influenced by RFI is removed. Because the algorithm is simple, we can also afford to determine the threshold level dynamically, using the implemented methods getMedian(), getAverage() and getStdDev(), but as the existing methods do not seem to be accurate (see Figures 6.15 and 6.16), new ones have to be developed.

As a tool for recognizing misbehaving stations we recommend implementing the Pre Correlation Station Detector - the results for this algorithm were very promising (see Section 6.5) and the algorithm is very efficient. Depending on the desired true/false positive ratio, we can choose the most suitable parameters based on the achieved results.

As the results obtained for the post correlation algorithms are not outstanding (see Section 6.3), online RFI mitigation is essential mainly for the pipelines that do not use the correlated data, such as the beam forming mode, and the feedback mechanism is not working as expected, we do not recommend any algorithm that works on post correlation data.

7.2. Future Work

Future work includes:

• Implementing the recommended algorithms on the BlueGene/P supercomputer, possibly in assembly language.

• Implementing a mechanism in the software correlator for marking particular samples(and not just rows / columns).

• Research into methods of dynamically determining the threshold levels.

• Adding more algorithms to the RFI Processing Library - both working on data before and after correlation.


Bibliography

[1] Wikipedia, http://en.wikipedia.org/wiki/

[2] Miller, Diane F. (1998). Basics of Radio Astronomy for the Goldstone-Apple Valley Radio Telescope. http://www2.jpl.nasa.gov/radioastronomy/

[3] Burke, Bernard F.; Graham-Smith, Francis (2002). An Introduction to Radio Astronomy (2nd ed.). Cambridge University Press.

[4] Felli, M.; Spencer, R. E. (1989). Very long baseline interferometry. Techniques and applications. Kluwer Academic Publishers.

[5] Thompson, A. R.; Moran, J. M.; Swenson Jr., G. W. (2001). Interferometry and Synthesis in Radio Astronomy (2nd ed.). John Wiley & Sons, Inc.

[6] Romein, J. W.; Broekema, P. C.; Mol, J. D.; van Nieuwpoort, R. V. The LOFAR Correlator: Implementation and Performance Analysis. ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'10), Bangalore, India, pp. 169-178, January 2010.

[7] McClellan, J.; Schafer, R.; Yoder, M. (2003). Signal Processing First. Pearson Education Inc.

[8] Offringa, A. R.; de Bruyn, A. G.; Biehl, M.; Zaroubi, S.; Bernardi, G.; Pandey, V. N. Post-correlation radio frequency interference classification methods. Astronomy & Astrophysics 378, 327-344 (2001).

[9] Kesteven, M. The Current Status of RFI Mitigation in Radioastronomy. http://www.atnf.csiro.au/people/Michael.Kesteven/papers/

[10] Baan, W. A.; Fridman, P. A.; Millenaar, R. P. Radio Frequency Interference Mitigation at the Westerbork Synthesis Array: Algorithms, Test Observations and System Implementation. The Astronomical Journal, 128:933-949, August 2004.

[11] Kesteven, M.; Hobbs, G.; Clement, R. Adaptive Filters Revisited - RFI Mitigation in Pulsar Observations. Radio Science, Vol. 40, RS5S06, 10 pp., 2005.

[12] Deller, A. Software correlators as testbeds for RFI algorithms. RFI Mitigation Workshop, 29 - 31 March 2010, Groningen.

[13] Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. (2001). Introduction to Algorithms (2nd ed.). The Massachusetts Institute of Technology.

[14] Niamsuwan, N.; Johnson, J. T.; Ellingson, S. W. Examination of a simple pulse blanking technique for RFI mitigation. Radio Science, Vol. 40, RS5S03, 11 pp., 2005.

[15] Boost library documentation, http://www.boost.org/doc/
