Astronomical Techniques - Noise and Signal

Sources of noise in data-taking: We may have to contend with purely instrumental noise sources - grain statistics in photography, amplifier noise in photomultipliers or CCDs, thermal noise in cooled detectors, etc. The most fundamental limit is that set by photon noise. Photons arrive randomly, with a certain mean rate set by the radiation intensity and wavelength. Always keep in mind that the photon noise may include a contribution from sources that are a nuisance, such as airglow or scattered sun- or moonlight, and that the numbers of photons in calculations must include this "background". Also note that a form of noise can be introduced in digitizing a signal (roundoff or truncation) or in further processing, for example in use of calibration data less accurate than the desired results.

It can be important to watch the sources of error for a given measurement. In succeeding steps, errors may accumulate. Statistically independent errors add in quadrature. To track this, you may need to carry an error term through all steps of a calculation, or simply add all the relative errors in quadrature. Doing this before carrying out a series of measurements can be very instructive.

Poisson processes

Consider a process (such as detection of a photon in all but the highest radiation densities) with constant probability per unit time; that is, the probability of detecting a photon in dt is ldt for dt infinitesimal, and the expected number detected in finite time interval DT is then l DT. What is the expected distribution of the number actually detected in various statistically independent time intervals, or experimental trials?

This was worked out by Poisson in 1831, and later by Gossett (writing as "Student" due to employment restrictions at the Guinness Brewery). If we define Wn(t) as the probability of seeing exactly n events in a time interval T, we have

Probability of n+1 events in time t = Wn+1(t)
Probability of 0 events in time dt = 1 - ldt
Probability of n events in time t = Wn(t)
Probability of 1 event in time dt = l dt
Probability of n-1 events in time t = Wn-1(t)
Probability of 2 events in time dt = (l dt)2
which last is of negligible importance by construction of differentials. This gives a recursion relation Wn+1(t+dt) = (1 - l dt) Wn+1(t) + Wn(t) l dt and a resulting differential equation d (Wn+1(t))/dt = l [Wn(t) - Wn+1(t)] for each n. With the boundary condition Wn(0)=0, the family of equations has the general solution Wn+1(t) = ((lt)n+1 e-lt)/(n+1)! If we let M be the average number of events in time t (which is to say l t, usually the quantity we want to measure), then we have Wn(M) = Mn e-M /n! as the distribution of probabilities for a Poisson process. Note that this is properly normalized (i.e. the probability of measuring one or another value of n is unity). The mean value is s n Wn(M) = M and it has standard deviation given by
s2 = S[n2 Mn e-M /n! - 2Mn Mn e-M/n! + M2 Wn(M)] = M
or s = M1/2 (so-called root-N statistics or errors). (I am indebted to George Blumenthal for the form of this derivation.)

For large n, the probability distribution approaches the normal or Gaussian distribution

Wn(M) = (1/[(2 p)1/2 s] e-(M-n)2/ 2s2.
For this distribution, the standard deviation s has immediately useful meaning:
P(n > M+ s) = 0.16
P(n > M+2 s) = 0.023
P(n > M+3 s) = 0.0013
(likewise for the distribution below M.)

Gaussians have many useful properties. For example, the convolution of two Gaussians of different s is another Gaussian with s2 = s12 + s22. The Fourier transform of a Gaussian is another Gaussian. instrumental profiles are frequently Gaussian enough that this analytically useful form is used as a fitting function with or without explicit motivation.

The transition from small-number Poisson statistics to a Gaussian is one application of the central limit theorem. This states that for an arbitrary distribution f(x),if one repeatedly measures the mean of f from sufficiently large numbers of trials, the measured means will themselves obey a normal distribution. This enters almost intuitively into experimental and observational design.



Statistics and Data Assessment

There are several commonly used statistics for a distribution, expressing either some measure of the zero point or width. These include mean, median, and mode for the zero point, and various moments for the width (and more details). The mean (or average) is defined by ò x f(x) dx where f(x) is the normalized distribution. The mean may be strongly affected by (possibly spurious) outlying values, so that for some applications the median or mode may be preferable. As an example, say we want to know the sky brightness in some image, where the mean may be contaminated by faint stars, cosmic-ray events, etc. Here, the mode will be a better estimate of what we're after, though still not perfect - to do it right, one would need to understand the distributions of the contaminating processes. The more you know a priori, the better you can do...

We are often interested in what we can learn from data sets via statistical tests. As examples, what is the average luminosity of RR Lyrae stars, or the average X-ray-to-visual ratio for solar type stars? Different kinds of tests are in order, depending on whether there is a physically motivated model to test or parameter to measure. Sometimes, we can only ask whether a data set is consistent with another one or not, or whether two kinds of objects are the same or different. These reflect parametric and nonparametric statistics.

Parametric statistics may include model fitting and parameter estimation. Linear regression is one example. Other commonly used techniques include least-squares function fitting and c2 minimization.

{\it Least-squares fitting}\thinspace , in its linear variety, is used when there is reason to believe that the data represent points of the form yi = S an fn(xi). A special case is fitting a line, yi = a0 + a1 xi . The usual derivation takes all the scatter in data as along the y axis; more complicated variants exist in the case of substantial errors in both variables. The goodness-of-fit criterion here is that the best fit is defined to be the one for which the sum S(y' - yi)2 is minimized, where y' is the value predicted for a given set of coefficients an. It can be shown that minimizing the squared residuals from a given functional fit is yields sets of normal equations from the given data of the following form:

Sf12(xi) + Sf1(x1) f2(xi) + Sf1(xi) f3(xi) + ... = S f1(xi) yi
forming a set of n equations in n unknowns which can be solved by appropriate methods from linear algebra. The functions fn need not be analytic; I have used this to ask what kinds of stars are needed to reproduce a galaxy spectrum, where the fn were simply the spectra of stars and stellar groupings tabulated at 4000 points. One book which I have found to spell this out in some detail is Scientific Data Analaysis: an Introduction to Overdetermined Systems by R.L. Branham, Jr. (Springer-Verlag, 1990).

Another widely-useful fitting technique, especially important for data sets where the errors may not be normally distributed or differ widely among data points, is c2 fitting. Here, one finds the values of parameters for some fitting function that minimizes the quantity c2 = S (y' - yi)/s2 for which it is important that one have an accurate assessment of s (be honest, now!). Not only does this give parameter estimates, but also tells how well they represent the data. For a good fit, the reduced c2 (which is the value above divided by n-p, the number of data values minus the number of parameter values determined in the fit) should be close to 1. This approach can also be used to tell whether specific components of a proposed model are significant in a statistical sense. Mapping c2 as a function of the parameters allows study of the allowed parameter space and any correlation of errors in their estimates. There can be subtleties in applictaion when the fitting function has most of its weight in a small part of the data space, or for small-number statistics (n < 5 or so gives biases results).

Often, either the formulation of a problem or the nature of the data do not admit a parametric approach. This happens when we want to ask, "Are these two samples drawn from the same parent population?", or when there are upper or lower limits as well as genuine measurements. In the first case, one may use comparisons of ranked lists or the Kolmogorov-Smirnov test, which compares cumulative distribution functions of two samples or a sample and a model. The maximum difference between the CDFs for a given sample size is the critical statistic. In the second case, there is a body of approaches known as survival analysis; Feigelson and collaborators have brought this work to astronomers' attention, and produced Fortran routines for their implementation. See, for example, Isobe et al 1990, ApJ 364, 104; Babu and Feigelson 1992, Commun. Statist. Comput. Simul. 22, 533; and Feigelson and Babu 1992, ApJ 397, 55

Statistical sleaze - when do you reject points, and when do you "accept" a hypothesis?

There has been much recent attention to so-called Bayesian analysis, in which the likelihood of a hypothesis given a particular set of data and model is evaluated using Bayes' theorem:

P(B | A) = P(A | B) P(B) / P(A).
As noted in, for example, Statistics in Theory and Practice (R. Lupton, Princeton 1993) this is not at all controversial. What is more controversial, and constitutes Bayes' postulate, is the claim that all prior probabilities (or priors for various outcomes should be taken as equal in the absence of specific contrary evidence. Some of us are leery of this, since it is in fact a specific claim about the probabilities of events that we know nothing about. However, in those cases that we really do understand this well, this can be a powerful way to use our experimental abilities very efficiently. The result can be inverted to yield maximum-likelihood estimates of parameters. As more and more data are available, Bayesian estimates approach "classical" estimates, since the prior is less relevant when large amounts of data exist.


« Detectors | Telescopes »

Course home page | Bill Keel's Home Page | Image Usage and Copyright Info | UA Astronomy

keel@bildad.astr.ua.edu
Last changes: 1/2006	  © 2000-2006