HERA Memo 86: H1C IDR2.2 pre-processing and power spectrum analyses

Nick Kern and the HERA Power Spectrum and Analysis teams
December, 2020

Abstract

In this memo, we present an overview of the power spectrum analysis applied to HERA Phase I (H1C) data from the Internal Data Release 2, working on the output of the main IDR2.2 HERA reduction pipeline (see Dillon2019). This memo highlights some of the additional pre-processing steps that occur before power spectrum estimation, and gives an overview of the power spectrum analysis behind HERA's first upper limit paper (HERA Collaboration in prep).

Note: Due to HERA policy, all power spectra in this memo are intentionally censored such that they are cast into normalized, dimensionless units.

In [1]:
from IPython.display import Image

1. Introduction

This memo starts with the calibrated, RFI flagged and LST binned data products from the IDR2.2 dataset (see Dillon2019 for details). A number of important steps remain to be appied to the data, which we discuss here, and have internally called the "pre-processing" pipeline (pre-power spectrum, but post-main analysis). Throughout the memo, we will highlight some of the major plots that have motivated the analysis choices, and discuss the number of revisions made to the analysis and why they were made. Additionally, we will briefly discuss some of the pipeline validation work that also informs some of these choices.

Note to internal members: More in-depth discussion on many of these aspects can be found in notebooks in the H1C_IDR2 GitHub repository.

Note on the pipeline: The second stage "pre-processing" pipeline can be found in the HERA-Team H1C_IDR2 repo under pipeline/preprocess_data.py, and the batch power spectrum pipeline is similarly under pipeline/pspec_pipe.py. Please direct questions to Nick.

Note on the data: The data discussed are currently (as of July, 2020) located on the NRAO's lustre filesystem, at /lustre/aoc/projects/hera/H1C_IDR2/IDR2_pspec/v2. The v2 stands for the second version of the pre-processing pipeline, which is less about algorithm tweaking and more about conforming to different data standards between IDR2.1 and IDR2.2 (i.e. UVH5), in addition to including the reflection calibration and cross-talk subtraction steps. The 4-pol UVH5 data files that are output from these steps are located in the data directory, and have the following file extensions

 OCRSL : LST binned +P : inpainted * +X : reflection calibrated and xtalk subtracted

  • +T : coherent LST time averaged
  • +K : pseudo-stokes visibilities
  • OCRSLPXTK : full analysis & pre-processing output

2. Additional RFI Inspection

See H1C_IDR2/notebooks/pspec_progress/IDR2_RFI_and_spw_selection.ipynb

Low-level RFI in the data that is missed by our RFI flagging algorithms can be a problem for high dynamic range power spectrum measurements. Motivated by observed foreground leakage from a first round of power spectrum analysis, we looked at foreground filtered visibilities and found evidence for some low-level narrowband RFI that was missed by our RFI algorithms and was furthermore not rejected by the median-clipping stage of the LST binner.

We filter the data by running a wideband delay CLEAN on redundantly averaged data, modeling sources from $\pm2000$ ns and inspecting the residual visibilities. We see a lot of structure in the residuals, and we take a conservative approach by flagging only narrowband features that seem to appear across multiple baseline groups. When we flag a channel, we flag it for all times and all baselines.

In [13]:
Image(filename='imgs/lowband_clean_rfi.png', width=900, )
Out[13]:

Figure 1 | Delay CLEAN residuals for three baseline groups over the H1C low band with the default flags after LST binning (top panels) and after introducing new narrowband flags (bottom panels) marked in white.

In [14]:
Image(filename='imgs/midband_clean_rfi.png', width=900, )
Out[14]:

Figure 2 | Delay CLEAN residuals for three baseline groups over the H1C mid band with the default flags after LST binning (top panels) and after introducing new narrowband flags (bottom panels) marked in white.

In [15]:
Image(filename='imgs/hiband_clean_rfi.png', width=900, )
Out[15]:

Figure 3 | Delay CLEAN residuals for three baseline groups over the H1C hi band with the default flags after LST binning (top panels) and after introducing new narrowband flags (bottom panels) marked in white.

After manual inspection, these are the channel ranges we flag for narrowband RFI for all times and all baselines

[135, 136], [169, 171], [424, 425], [570, 573], [697, 698], [815, 816]

In addition to flagging entire frequency channels, we also flag certain integrations based on low number counts in the LST binner (or nsample, as we call it internally), which is set at any integration that has a frequency-averaged nsample < 10 counts. This mainly picks up integrations near the edges of the LST range, and also integrations near the Fornax A transit when overflagging seems persistant.

In [20]:
Image(filename='imgs/nsample_lst_cuts.png', width=900, )
Out[20]:

Figure 4 | Nsample count directly after LST binning (left) and after enacting minimum frequency-averaged nsample cuts and manual LST cuts (right). Flagged regions are masked as white. The LST range of 3.43 -- 3.82 hours is the only manually flagged LST range, enacted due to the complex patch of flags at 170 MHz, whose origin is not fully understood, but causes problems with wideband inpainting. The minimum freq-averaged nsample cut flags regions at the LST boundaries as well as the region of low nsample at ~4.1 hours, and a few integrations that have high flag occupancy (e.g. ~10.5 hours).

Note that due to computational reasons, all of the flags discussed in this section are actually applied to the data during the inpainting step (but before we actually run the inpainting algorithm), which we discuss next.

3. Inpainting

Data inpainting is needed becuase certain frequencies and times in the visibilities are flagged due to RFI and other related problems. This produces discontinuities in the data that corrupt otherwise foreground-free Fourier modes when taking a Fourier transform across frequency. Inpainting is in a sense trying to deconvolve the impact of nulled data on the power spectrum. We do this with the delay-based CLEAN algorithm (Parsons+2009).

We run wideband inpainting on each time integration for each baseline independently. This is done by creating a delay-space visibility model, and then inserting this model into the flagged frequency channels. Unflagged data are not affected by this procedure. The following parameters were used with the hera_cal.vis_clean.VisClean object, which is itself derived from uvtools.dspec.fourier_filter (itself derived from aipy.deconv.clean).

{'standoff': 50.0, 'horizon': 1.0, 'min_dly': 2000.0, 'tol': 1e-05, 'autotol': 5e-07, 'maxiter': 100, 'window': 'tukey', 'alpha': 0.15, 'skip_wgt': 0.2, 'gain': 0.9, 'edgecut_low': 100, 'edgecut_hi': 150, 'zeropad': 256}

In [21]:
Image(filename='imgs/inpainted_vis.png', width=900, )
Out[21]:

Figure 5 | Visibility amplitude before inpainting (left, black regions denote flags) and after inpainting (right, black regions denote flags). Any integration that is fully flagged remains fully flagged. The manual flagging from the previous section are implemented in this step before deriving the CLEAN model. Inpainting is unreliable in flagged regions where a large number of channels are flagged (see Aguirre et al. 2021 in prep.), but can handle narrowband flags quite well depending on the CLEAN tolerance. The main power spectrum windows are the two relatively clean parts of the band (already largely free of narrowband flags) from 117 -- 132 MHz and 150 -- 168 MHz.

4. Reflections

Cable reflections are seen in the H1C system, arising from the termination of the two coaxial cables, one from the feed to the receiver of 150 meters and the other from the receiver to the container of 20 meters (Kern et al. 2020, Fagnoni et al. 2021). However, we also see hints of complex subreflections spanning a range of delays from 150 -- 1500 ns.

Kern et al. 2020 show that the reflection terms exhibit non-trivial frequency dependence, meaning that better reflection fits can be achieved by solving for reflection terms within subbands. We do this by splitting the H1C band into two separate subbands, separated at the 138 MHz ORBCOMM feature.

We model all reflections with 28 terms (chosen by seeing the calibrated auto-correlation visibility converge at the $\sim10^{-4}$ level in dynamic range. This is done sequentiall, first allowing 3 modes to be fit from 75 -- 800 ns, then 5 modes to be fit between 100 - 800 ns, etc. (see the dly_ranges parameter below). These were chosen manually by visual inspection of the auto-correlations, but seem to suppress auto-correlation structure evenly across the total range of $\sim$100 -- 1500 ns.

The reflection parameters used are

cal_ext : 'refcal.calfits'
time_avg : True
expand_times : True   # if time_avg, inflate gains to full time range of file
wgt_by_nsample : False
dly_ranges : 
- [75, 800, 3]
- [100, 800, 5]
- [200, 800, 5]
- [250, 800, 5]
- [300, 800, 5]
- [700, 1500, 5]
#### initial guess parameters
window : 'bh'
alpha : 0.1
edgecut_low : [100, 400] 
edgecut_hi : [524, 174]
combine_spws : True
spw_boundaries :
- [100, 400]
- [400, 850]  # channel boundaries when combining multispw. default is to use edgecuts.
zeropad : 256
Nphs : 100
fthin : None
ref_sig_cut : 1.0
#### refinement parameters
opt_maxiter : 0
opt_buffer : [15, 30]
opt_method : 'BFGS'
opt_tol : 0.001
#### time smoothcal parameters
smooth_ref : True  # smooth calibration across time
time_scale : 14400  # FWHM of Gaussian in seconds
Nmirror : 0  # number of times to mirror time axis
smooth_cal_ext : '_smooth'  # extension to ...{}.calfits field

Note that we do not use the small-scale refinement proposed by Kern et al. 2019 (evidenced b ythe opt_maxiter=0), as this takes a lot longer and we found that it did not have a significant impact on the derived solutions after time smoothing. Note that the reflection gains are also smoothed in time on a 4 hour timescale, which is motivated by the fact that the solutions do not converge at some time integrations (infrequently), and smoothing them in time seems to help, and is also consistent with how degrees of freedom in the gain solutions across time are dealt with in the analysis pipeline.

In [26]:
Image(filename='imgs/refgain_waterfall.png', width=900, )
Out[26]:

Figure 6 | Waterfall of reflection gain amplitude for spectral window (spw) 1 (left of black line) and spw 2 (right of black line), which are concatenated into a single calfits file (*allspws.refcal.calfits). We show the gain before time smoothing (left subplot) and after time smoothing with a 4 hour timescale (right subplot). The 4 hour timescale was chosen after manual inspection of the smoothed gains--a 4 hour scale mitigates fast timescale fluctuations while keeping broad time structure, and is roughly consistent with the gain smoothing timescale of 6 hours from the analysis pipeline.

In [25]:
Image(filename='imgs/refgain_dly.png', width=900, )
Out[25]:

Figure 7 | Delay space representation of the derived reflection gains for a single antenna-pol from a single time step for spw1 (blue) and spw2 (orange). While the 1200 ns feature is largely consistent from spw1 to spw2, the delay shoulder has a more complex frequency dependence.

In [28]:
Image(filename='imgs/refcal_data.png', width=900, )
Out[28]:

Figure 8 | Auto-correlation data with reflection calibration (purple) and without reflection calibration (green). The major cable features at 250 and 1200 ns are suppressed by at least 1 OOM in the visibility, and the delay shoulder is suppressed by a factor of a few. The number of reflection modes (28) was chosen such that the delay shoulder roughly converged to the level of the delay response of the autos for $\tau>1500$ ns.

5. Crosstalk Filtering

Crosstalk filter is applied by using a fringe-rate high-pass filter Kern et al. 2020. We first delay transform the visibilities (across each spw independently), and use SVD to decompose the delay waterfall into seprable delay and time modes for all delays that fall at and outside the horizon of each baseline (from $\tau_{\rm horizon}$ to 4000 ns).

Note that the fringe rates we choose to filter is set by a specification derived in Kern et al. 2019, which was derived at 120 MHz. However, the boundaries of the first spectral window dips slightly below this, so we have added a conservative buffer to the fringe-rates such that we filter slightly less aggressively. The full parameters used for xtalk filtering is tabulated below

# algorithm parameters
method : "svd-gp"  # options = ['frfilter', 'svd-gp', 'linear_filter']
max_frate_coeffs : [0.024, -0.28]  # max_frate [mHz] = x1 * EW_bl_len [m] + x2
max_frate : 0.75
# svd method parameters
min_dly : 100
max_dly : 4000
horizon : 0.8
standoff : 0
window : 'bh'
alpha : 0.1
edgecut_low : [100, 400]
edgecut_hi : [624, 150]
gp_frate_degrade : 0.0
gp_Nmirror : 200
Nkeep : 30
xthin : None
project_vmodes : False
frate_thresh : 0.06  # if max_frate [mHz] is below this, don't filter it
rm_intermediate_files : False
tol : 0.00000001

where the buffer is a max_frate_coeffs[1] of -0.28, rather than -0.24 (a buffer of .04 mHz in fringe-rate space).

In [31]:
Image(filename='imgs/xtalk_filter.png', width=900, )
Out[31]:

Figure 9 | Visibilities after inpainting (left column) in frequency space (top), delay space (center, windowed from 148 -- 168 MHz), and fringe-rate space (bottom, same frequency window). We also show the data after reflection calibration and xtalk filtering (right column). The strong discontinuities across frequency in the xtalk subtracted visibilities are a result of running xtalk filtering on each spw separately, which is done because reflection gains from each spw are applied before xtalk filtering.

6. Coherent LST Averaging

We perform coherent LST averaging of the visibilities by taking a contiguous block in time, rephasing the drift-scan data to a common pointing center, and then uniformly averaging the complex data. Note that the only weighting applied to the average is are the flags of the data, which after inpainting are constructed to be separable in time and frequency. In other words, there are no time-dependent flags (other than fully flagged integrations). Coherently averaging time-dependent flags can lead to spurious spectral structure Offringa et al. 2019. We perform numerical tests to set the length scale of the time average in notebooks/validation/LST_averaging_signal_loss.ipynb, and also in Aguirre et al. 2021. For baselines of ~<100 meters, we can coherently average data over a 428 second window while only inducing ~1% signal loss on the EoR power (that is $k$ independent). We therefore coherently average over a 214 second window, as our power spectrum estimator will interleave the remaining time bins when squaring the visibilities.

We also from pseudo stokes visibilities following the CASA convention (e.g. Equation 4 of Moore et al. 2017)

7. Power Spectra

We form power spectra of the pseudo I visibilities using the delay spectrum technique. The un-normalized power spectrum is defined as

\begin{align} \large q = x_1^\dagger Q x_2, \end{align}

where $Q$ is a matrix that Fourier transforms $x_1$ and $x_2$ and squares them, and also applies a spectral taper (or apodization) along the frequency axis prior to the FT. We use a blackman harris window for all power spectra. We cross multiply all baseline-pairs within a redundant set (excluding a baseline paired with itself), and further interleave the time array such that $x_1$ and $x_2$ are always adjacent time integrations.

We form power spectra over two spectral windows (spanning channels [175, 335] and [515, 695]) with average redshifts of 10.4 and 7.9, respectively. We also compute the standard $P_N$ thermal noise floor for each baseline-pair and time integration (accounting for each antennas measured $T_{\rm sys}$), as well as the $P_{SN}$ errorbar outlined in Tan et al. 2021 in prep. We also use the Fagnoni EM model of the H1C primary beam to compute beam areas that are used in the power spectrum normalization (pseudo-I beam lives here /lustre/aoc/projects/hera/nkern/beams/NF_HERA_IQ_power_beam_healpix128.fits).

8. Incoherent Averaging

After forming power spectra for all baseline-pairs for all the remaining time integrations, we are left with a large number of power spectra that we can incoherently average together. As noted, we also form two measurements of uncertainty for each baseline-pair and time integration, the $P_N$ thermal noise floor (which is $k$ independent) and $P_{SN}$ uncertainty, which is $k$ dependent. When performing incoherent averaging, we always weight the power spectra by $P_N$ but propagate $P_{SN}$ through the average.

In [13]:
Image(filename='imgs/inc_red_avg.png', width=900, )
Out[13]:

Figure 10 | Incoherently averaging the three shortest redundant baseline groups, followed by averaging the $|{\rm Re}(\bar{P})|$ over an LST range of 1 -- 2.5 hours. The dashed lines show the propagated $P_N\times\sqrt{2/\pi}$ (see Kern et al. 2020b for why $\sqrt{2/\pi}$ appears).

Note that although we are technically averaging the redundant baseline power spectra incoherently (i.e. squaring the visibilities and then averaging), we are actually performing a near coherent average by summing both the conjugated baseline-pair and the un-conjugated baseline pair. To explain this better, consider the following example.

If we have two redundant visibilities $V_1, V_2$ (defined implicitly in Fourier space), and we have two independent noise realizations of these visibilities (e.g. $V_1^a$ and $V_1^b$) then the power spectrum of their coherent average would equal

\begin{align} \large P_{\rm coh} = \frac{1}{4}(V_1^a+V_2^a)(V_1^b+V_2^b)^\ast, \end{align}

which we can see will have no noise-bias (because of the independent noise realizations), and can be expanded out as

\begin{align} \large P_{\rm coh} = \frac{1}{4}(V_1^aV_1^{b\ast} + V_2^aV_2^{b\ast} + V_1^aV_2^{b\ast} + V_2^aV_1^{b\ast}) \end{align}

Obviously, we can form the same coherently averaged power spectrum by simply incoherently averaging all terms in the above equation.

However, we may have reason to believe that the auto-baseline terms (i.e. the $V_1 V_1^\ast$ terms) may be more susceptible to systematics than the cross terms--any systematic that is uncorrelated between baselines will be mean-zero in the cross-correlation, and will thus average down. For example, cable reflections, being antenna-dependent, will be largely uncorrelated between baselines (except for baselines that share an antenna), and the crosstalk systematics are highly baseline dependent. Therefore, we can form a pseudo coherently (redundant) averaged power spectrum by forming all auto and cross-baseline power spectra, but only incoherently averaging the cross-baseline terms (throwing out the auto-baseline terms). We call this a cross coherent average:

\begin{align} \large P_{\rm cross\ coh} = \frac{1}{2}(V_1^aV_2^{b\ast} + V_2^aV_1^{b\ast}). \end{align}

Indeed, in the plots above, we only average cross-baseline terms and throw out auto-baseline terms. Although if we keep the auto-baseline terms and throw out the cross tmers, we define a truly incoherent average

\begin{align} \large P_{\rm inc} = \frac{1}{2}(V_1^aV_1^{b\ast} + V_2^aV_2^{b\ast}), \end{align}

which we will use shortly.

The only reason why throwing out the auto-baseline terms in our average may be a problem is if we think that the cross-baseline terms are significantly more prone to signal loss than the auto-baseline terms. Why might this be the case? This could be the case if the baselines are non-negligibly non-redundant, particularly in phase, which will cause decoherence in the signal when cross correlated. A good mental example of this is simply the response of a single baseline at two nearby but non-identical drift-scan time integrations. To leading order, you can think of the two visibilities as identical but with one having a phasing term applied to it (in this case a known phasing term, but for now we will assume this phasing term is unknown and therefore not correctable). Cross correlating the two visibilities will result in power that is stricly less than the power that would arise from cross correlating the visibilities at the same time integration.

This is the essence of our redundant average signal loss test. This simple test compares the cross coherent average ($P_{\rm coh}$) against the incoherent average ($P_{\rm inc}$), the latter of which is simply just the sum of the auto-baseline terms discussed above. We furthermore restrict the comparison to the $\tau=0$ ns mode, which isolates foreground power indicent from the main lobe of the primary beam (where we expect most of the EoR power to be sourced). Comparing the ratio of these terms tells us how much fractional power is lost in the cross coherent average due to intrinsic non-redundancies between the baselines. Note that for technical reasons, we divide by the time-average of $P_{\rm inc}$ as it can go through nulls at certain times causing the ratio to spike: taking the time average largely mitigates this pathology.

In [18]:
Image(filename='imgs/redundancy_sigloss.png', width=700, )
Out[18]:

Figure 11 | Redundant average signal loss test for the nine shortest baseline groups at an LST range when diffuse foregrounds fill the primary beam (transit of the galactic anti-center). The blue dashed line is the weighted average of the baseline groups, weighted by the number of baselines in each group. This shows that signal loss arising from baseline-to-baseline non-redundancies (regardless of what they are from), is limited to an average of about 1% in power. Note that $P_{\rm coh}$ here is actually the $P_{\rm cross\ coh}$ defined above.

Time averaging

The last axis over which we can average the power spectra are the remaining LST bins. We select three "fields" that correspond to ~2 hour chunks of time over the full ~12 hour LST range of the IDR2.2 dataset that have the least amount of foreground contamination. See notebooks/pspec_progress/pspec_LST_selection.ipynb for details on how these are selected. In summary, we pick LST ranges that reject known problematic foreground sources, specifically Fornax A and the galactic anti-center, as well as give a 0.5 hour buffer from the edges of the LST ranges due to xtalk subtraction (see Aguirre et al. 2021 for details).

9. Sensitivity of Final Limits to Analysis Choices in Power Spectrum Estimation

Even after the major analysis and "pre-processing" steps, there are a multitude of knobs we can tune in the process of power spectrum estimation and averaging. One test of the robustness of our final limits is to assess how sensitive these are when perturbing these knobs slightly. We present some of these tests to demonstrate the sensitivity of the final power spectrum limits to these analysis choices. For simplicity, we will focus on how they impact the Band 2 Field 1 limits.

LST range

The LST range for Field 1 is 1.25 - 2.7 hours. 1.25 hours is ~0.5 hours beyond the beginning of the LST range, and 2.7 hours is ~0.5 hours before the Fornax A transit. Here we explore how the limits change as we change this slightly.

In [2]:
Image(filename='imgs/lst_shift.png', width=900, )
Out[2]:

Figure 12 | The $2\sigma$ upper limit of the Band 2 Field 1 power spectrum at the $k=0.192\ h\ {\rm Mpc}^{-1}$ Fourier mode having shifted the LST range by $\Delta t$ (holding the total duration constant).

Binning in $k$

The native delay resolution for Band 1 is 56.8 ns, which, given a Planck cosmology, corresponds to $\Delta k_\parallel = 0.0318\ h\ {\rm Mpc}^{-1}$. For the analysis, we decimate the $k$ bins to limit their correlation (having applied an apodization function in frequency), such that our sampling in drawn from (0, 2.5) in steps of $\Delta k=0.064\ h\ {\rm Mpc}^{-1}$. However, we can sample the $k$ bins with whatever cadence we choose. Here, we will slightly perturb these values and see how the power spectrum at $k\sim0.2\ h\ {\rm Mpc}^{-1}$ behaves.

In [6]:
Image(filename='imgs/dk_binning_pspec.png', width=900, )
Out[6]:

Figure 13 | Power spectra $P(k)$ (left) and $\Delta^2$ (right) plus $2\sigma$ while shifting the starting $k$ bin, and thus the location of the $k\sim0.19$ dip. As Figure 15 shows, the $\Delta k$ shift does not change the amplitude of the dip in $P(k)$, but because of the $k^3$ nature of the dimensionless power spectrum, it leads to a slight difference in the upper limit depending on where the bin is sampled.

In [7]:
Image(filename='imgs/dk_binning.png', width=900, )
Out[7]:

Figure 14 | The $2\sigma$ upper limit from Band 2 Field 1 while shifting the starting $k$ value such that the k bin near $k\sim0.19$ is slightly shifted. The left plot shows the upper limit in $P(k)$ while the right plot shows it in $\Delta^2$. We see that the shifting of the $k$ bins leads to more or less sensitive upper limits in $\Delta^2$ simply due to its $k^3$ dependence. Note that H1C analysis uses the unshifted $k$ binning, with a best upper limit represented by the black star.

Wedge buffer

In [5]:
Image(filename='imgs/wedge_buff.png', width=900, )
Out[5]:

Figure 15 | The $2\sigma$ upper limit from Band 2 Field 1 at $k=0.192\ h\ {\rm Mpc}^{-1}$ while increasing the foreground wedge buffer $\Delta k_\parallel$. For Band 2, a wedge buffer of 200 nanoseconds equals $\Delta k_\parallel$ of $1.11\ h\ {\rm Mpc}^{-1}$.

10. Summary

This memo walks through some of the major analysis steps prior to (and including) power spectrum estimation, but after the main analysis pipeline outlined in the H1C IDR2.2 memo (Dillon2019). This includes delay-based CLEAN inpainting over RFI flags, reflection calibration and crosstalk subtraction, coherent time averaging, power spectrum estimation, and incoherent time and redundant baseline averaging. Finally, we presented some tests on the final Band 2 Field 1 limits that demonstrate the sensitivity of the limits while perturbing some of the end-stage analysis choices, such as the duration of the LST range, the binning in $k$ and the wedge buffer in the spherical averaging.

With the exception of the sensitivity tests, all of the material discussed here can be found in more detail in the various notebooks located in H1C_IDR2/notebooks.

11. Appendix A: Abscal Bias

Aguirre et al. 2021 in prep show that the absolute calibration procedure of HERA Collab et al. 2021 in prep is a biased estimator of the overall flux scale in the limit of low SNR visibilities (e.g. Boonstra et al. 2003). Due to the fact that drift-scan visibilities are calibrated at all time integrations (even those with low SNR), we find that absolute calibration results in a ~4% bias on the recovered flux density scale (biased low). This can be corrected for in the power spectrum.

To estimate this bias we image the calibrated visibilities and compare to the calibration model--Aguirre et al. 2021 in prep showed this method is an accurate way to estimate the abscal bias.

In [33]:
Image(filename='imgs/spw1_abscal_bias.png', width=900, )
Out[33]:

Figure 16 | An dirty, MFS image of the abscal model (left), the post calibrated data (center) and their ratio across the Band 1 spectral window. Taking the peak flux of the zenith sources yields a (XX and YY averaged) model / data ratio of 1.0459. This implies a necessary 9% bias correction on the Band 1 power spectra.

In [34]:
Image(filename='imgs/spw2_abscal_bias.png', width=900, )
Out[34]:

Figure 17 | Same but for Band 2, yielding a model / data ratio of 1.0387, implying a necessary 8% bias correction on the Band 2 power spectra.

12. Appendix B: Faraday-Rotated Pulsar Emission

See notebooks/polarization_studies/blob_systematic_imaging.ipynb for more details.

We discovered high delay "blob" systematics in the H1C system at very low dynamic range (only visible in the LST binned dataset). These blobs are not persistent in time (timescale of a beam crossing), appear at high delays, and only show up strongly in the XX, YY and pQ visibilities. Furthermore, the delay at which they show up decreases at higher frequencies. This was eventually found to be the Faraday-rotated component of pulsar emission, similar to the predictions from that of Moore et al. 2013. Constructing spectral cubes of the data during the LSTs of the systematics, and forming a rotation measure synthesis of the cubes yields a bright source at the rotation measure of catalogued pulsars. Although the XX, YY and pQ visibilities are contaminated, these features appear to drop out of the pseudo-I visibilities.

In [54]:
Image(filename='imgs/pstokes.rms.157.png', width=500)
Out[54]:
In [52]:
Image(filename='imgs/pstokes.rms.205.png', width=500)
Out[52]:
In [53]:
Image(filename='imgs/pstokes.rms.224.png', width=500)
Out[53]:

Figure 18 | RM synthesis of spectral cube data showing detections of high RM pulsars. Note these data have been inpainted, crosstalk subtracted and redundantly averaged.