David W. Hogg (1, 2, 3) and Andrew R. Casey (4, 5)

1. Center for Cosmology and Particle Physics, Department of Physics, New York University
2. Max-Planck-Institut für Astronomie, Heidelberg
3. Center for Computational Astrophysics, Flatiron Institute
4. School of Physics & Astronomy, Monash University
5. Centre of Excellence for Astrophysics in Three Dimensions (ASTRO-3D)

ABSTRACT

When there are many observations of an astronomical source—many images with different dithers, or many spectra taken at different barycentric velocities—it is standard practice to shift and stack the data, to (for example) make a high signal-to-noise average image or mean spectrum. Bound-saturating measurements are made by manipulating a likelihood function, where the data are treated as fixed, and model parameters are modified to fit the data. Traditional shifting and stacking of data can be converted into a model-fitting procedure, such that the data are not modified, and yet the output is the shift-adjusted mean. The key component of this conversion is a spectral model that is completely flexible but also a continuous function of wavelength (or position in the case of imaging) that can represent any signal being measured by the device after any reasonable translation (or rotation or field distortion). The benefits of a modeling approach are myriad: The sacred data never are modified. Noise maps, data gaps, and bad-data masks don't require interpolation. The output can take the form of an image or spectrum evaluated on a pixel grid, as is traditional. In addition to shifts, the model can account for line-spread or point-spread function variations, world-coordinate-system variations, and calibration or normalization variations. The noise in the output becomes uncorrelated across neighboring pixels as the shifts deliver good coverage in some sense. The only cost is a small increase in computational complexity over that of traditional methods.
We demonstrate the method with a small data example and we provide open-source sample code for re-use.

Keywords: data analysis — image processing — spectroscopy — statistics
The information-theoretic answer is that the covariance matrix C delivering the uncertainties on y is

C = X [X^T C^{-1} X]^{-1} X^T ,

where C^{-1} on the right-hand side is the inverse of the full (block-diagonal) data covariance assembled from the individual C_i. The naive uncertainty variances are the diagonals of this matrix (and not the inverses of the diagonals of its inverse). But, very importantly, this is only a correct usage when the input C_i matrices really represent the true uncertainties going in. If you have any concerns about that, then it is way safer to get your uncertainty estimates from bootstrap or jackknife (for a related example, see Hogg & Villar 2021).
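The distinction between the diagonals of the covariance matrix and the inverses of the diagonals of its inverse can be made concrete with a small numpy sketch (not the authors' code; the toy design matrix and weights are invented for illustration):

```python
# Toy linear fit: the correct per-parameter variances are the diagonals
# of the parameter covariance, NOT the inverses of the diagonals of the
# inverse covariance. All data here are invented for illustration.
import numpy as np

rng = np.random.default_rng(17)
n, p = 50, 4
X = rng.normal(size=(n, p))                 # toy design matrix
w = 1.0 / rng.uniform(0.5, 2.0, size=n)     # diagonal of C^{-1}

A = X.T @ (w[:, None] * X)                  # inverse parameter covariance
Ctheta = np.linalg.inv(A)                   # parameter covariance

var_right = np.diag(Ctheta)                 # correct variances
var_naive = 1.0 / np.diag(A)                # the mistaken shortcut
# For positive-definite A, var_naive <= var_right element-wise,
# so the shortcut systematically underestimates the uncertainties.
```

The element-wise inequality holds for any positive-definite A, which is why the shortcut looks plausible but silently shrinks the error bars.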
In the end, most users want the combined spectrum y (and perhaps the pixel positions x, uncertainty information, and propagated meta data). Some users will additionally want (or want to reconstruct) the parameters and the basis functions gj(x). For these reasons it often makes sense to set M ≥ P such that the delivered y fully determines the parameters. But that's a detailed question about your users and your goals.

Implementation notes

Although equations like (4) show matrix multiplies between C^{-1} and X, it is not a good idea to perform these multiplies as true matrix multiplies if the C^{-1} matrix is diagonal (as it often is). When C^{-1} is diagonal, the matrix multiply can be done as a broadcast of the vector of diagonal values instead, which is faster (and way less memory intensive).
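The broadcast trick can be sketched in numpy (toy shapes, not the paper's code): the two routes below give the same normal-equations matrix, but the broadcast never materializes the n-by-n diagonal matrix.

```python
# When C^{-1} is diagonal, replace the dense product X^T C^{-1} X with
# a broadcast over the vector of diagonal values. Toy data invented.
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 8
X = rng.normal(size=(n, p))
w = rng.uniform(0.5, 2.0, size=n)     # diagonal of C^{-1} (inverse variances)

slow = X.T @ np.diag(w) @ X           # builds an n x n matrix: O(n^2) memory
fast = X.T @ (w[:, None] * X)         # broadcast: never forms the n x n matrix
# slow and fast agree to machine precision
```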
Although equations like (4) show inverses, it is almost never a good idea to use the native linear-algebra inverse function inv() in any implementation of this method. The reason is that we don't want the inverse matrix itself, we want the outcome of the inverse matrix multiplied by X^T C^{-1} Y. For this use case, it is more precise and more numerically stable to use the linear-algebra solve() or lstsq() functions. The only use for inv() is where the full inverse of a matrix is required, as when estimating the full C by (8). Whenever performing inv(), solve(), or lstsq() operations, it is sensible to begin by checking the condition number (the ratio of the largest to the smallest singular value) of the matrix in play. Condition numbers larger than 10^8 are usually trouble, even when the mathematics is being performed at double precision. When condition numbers get large, a different basis or a re-scale of some basis functions might be a good idea.
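A minimal numpy sketch of this advice (toy data, not the paper's code): check the condition number, then obtain the parameters via solve() or lstsq() rather than forming the explicit inverse.

```python
# Solve for theta = (X^T C^{-1} X)^{-1} X^T C^{-1} Y without ever
# calling inv(). Toy data invented for illustration.
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 6
X = rng.normal(size=(n, p))
w = 1.0 / rng.uniform(0.5, 2.0, size=n)   # diagonal of C^{-1}
Y = rng.normal(size=n)

A = X.T @ (w[:, None] * X)                # normal-equations matrix
b = X.T @ (w * Y)

# condition numbers above ~1e8 are usually trouble at double precision
assert np.linalg.cond(A) < 1e8

theta = np.linalg.solve(A, b)             # preferred over inv(A) @ b

# equivalent route: weighted least squares via lstsq()
sw = np.sqrt(w)
theta_lstsq, *_ = np.linalg.lstsq(X * sw[:, None], Y * sw, rcond=None)
```

Both routes agree to machine precision, and neither forms the explicit inverse of A.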
If X is either very sparse (as it is in a b-spline basis) or Fourier (as it is in a sine-and-cosine basis) then there are very fast ways to do the linear algebra. We could have used the finufft package (Barnett et al. 2019) to do the mathematics fast, because we have a Fourier basis here. Sparse linear-algebra packages, like that in scipy (Virtanen et al. 2020), are available; they would similarly speed up the code when using spline-like or wavelet-like bases. If you use any non-uniform fast Fourier transform package, the programming interface of that package will constrain how the frequencies must be ordered in the code, and it will probably require a complex-number implementation. These have impacts on how the code is written. In particular the fact that the output combined spectrum y (8)
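The sparse-basis route can be sketched with scipy.sparse (a hedged illustration: the triangular "hat" basis below is a stand-in for a b-spline basis, and the data are invented):

```python
# Sparse design matrix + sparse solve of the normal equations,
# as suggested for spline-like bases. Toy hat-function basis.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, p = 400, 40
x = np.linspace(0.0, 1.0, n)
knots = np.linspace(0.0, 1.0, p)
width = knots[1] - knots[0]

# triangular ("hat") basis: each column is nonzero only near its knot,
# so the design matrix is mostly zeros and worth storing sparsely
dense = np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / width, 0.0, None)
X = sp.csr_matrix(dense)

rng = np.random.default_rng(7)
Y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=n)

A = (X.T @ X).tocsc()                 # sparse normal-equations matrix
theta = spla.spsolve(A, X.T @ Y)      # sparse solve, no dense inverse
```

The same pattern carries over to weighted fits: scale the rows of X and Y by the square roots of the inverse variances before forming the normal equations.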