Astrostatistics Tools (astropy.stats
)¶
Introduction¶
The astropy.stats
package holds statistical functions or algorithms
used in astronomy. While the scipy.stats
and statsmodel packages contains a
wide range of statistical tools, they are general-purpose packages and
are missing some tools that are particularly useful or specific to
astronomy. This package is intended to provide such functionality,
but not to replace scipy.stats
if its implementation satisfies
astronomers’ needs.
Getting Started¶
A number of different tools are contained in the stats package, and they can be accessed by importing them:
>>> from astropy import stats
A full list of the different tools are provided below. Please see the
documentation for their different usage. For example, sigma clipping,
which is common way to estimate the background of an image, can be
performed with the sigma_clip()
function. By
default, the function returns a masked array where outliers are
masked:
>>> data = [1, 5, 6, 8, 100, 5, 3, 2]
>>> stats.sigma_clip(data, sigma=2, maxiters=5)
masked_array(data=[1, 5, 6, 8, --, 5, 3, 2],
mask=[False, False, False, False, True, False, False, False],
fill_value=999999)
Alternatively, the SigmaClip
class provides an
object-oriented interface to sigma clipping, which also returns a
masked array by default:
>>> sigclip = stats.SigmaClip(sigma=2, maxiters=5)
>>> sigclip(data)
masked_array(data=[1, 5, 6, 8, --, 5, 3, 2],
mask=[False, False, False, False, True, False, False, False],
fill_value=999999)
In addition, there are also several convenience functions for making
the calculation of statistics even easier. For example,
sigma_clipped_stats()
will return the mean,
median, and standard deviation of a sigma-clipped array:
>>> stats.sigma_clipped_stats(data, sigma=2, maxiters=5)
(4.2857142857142856, 5.0, 2.2497165354319457)
There are also tools for calculating robust statistics, sampling the data, circular statistics, confidence limits, spatial statistics, and adaptive histograms.
Most tools are fairly self-contained, and include relevant examples in their docstrings.
Using astropy.stats
¶
More detailed information on using the package is provided on separate pages, listed below.
Constants¶
The astropy.stats
package defines two constants useful for
converting between Gaussian sigma and full width at half maximum
(FWHM):
-
gaussian_sigma_to_fwhm
¶ Factor with which to multiply Gaussian 1-sigma standard deviation to convert it to full width at half maximum (FWHM).
>>> from astropy.stats import gaussian_sigma_to_fwhm >>> gaussian_sigma_to_fwhm # doctest: +FLOAT_CMP 2.3548200450309493
-
gaussian_fwhm_to_sigma
¶ Factor with which to multiply Gaussian full width at half maximum (FWHM) to convert it to 1-sigma standard deviation.
>>> from astropy.stats import gaussian_fwhm_to_sigma >>> gaussian_fwhm_to_sigma # doctest: +FLOAT_CMP 0.42466090014400953
See Also¶
scipy.stats
- This scipy package contains a variety of useful statistical functions and
classes. The functionality in
astropy.stats
is intended to supplement this, not replace it.
- statsmodel
- The statsmodel package provides functionality for estimating different statistical models, tests, and data exploration.
- astroML
- The astroML package is a Python module for machine learning and data mining. Some of the tools from this package have been migrated here, but there are still a number of tools there that are useful for astronomy and statistical analysis.
astropy.visualization.hist()
- The
histogram()
routine and related functionality defined here are used within theastropy.visualization.hist()
function. For a discussion of these methods for determining histogram binnings, see Choosing Histogram Bins.
Performance Tips¶
If you are finding sigma-clipping to be slow, and if you haven’t already done so,
consider installing the bottleneck
package, which will speed up some of the internal computations. In addition, if
you are using standard functions for cenfunc
and/or stdfunc
, make sure
you specify these as strings rather than passing a Numpy function - that is,
use:
>>> sigma_clip(array, cenfunc='median')
instead of:
>>> sigma_clip(array, cenfunc=np.nanmedian)
Using strings will allow the sigma clipping algorithm to pick the fastest implementation available for finding the median.
Reference/API¶
astropy.stats Package¶
This subpackage contains statistical tools provided for or used by Astropy.
While the scipy.stats
package contains a wide range of statistical
tools, it is a general-purpose package, and is missing some that are
particularly useful to astronomy or are used in an atypical way in
astronomy. This package is intended to provide such functionality, but
not to replace scipy.stats
if its implementation satisfies
astronomers’ needs.
Functions¶
akaike_info_criterion (log_likelihood, …) |
Computes the Akaike Information Criterion (AIC). |
akaike_info_criterion_lsq (ssr, n_params, …) |
Computes the Akaike Information Criterion assuming that the observations are Gaussian distributed. |
bayesian_blocks (t[, x, sigma, fitness]) |
Compute optimal segmentation of data with Scargle’s Bayesian Blocks |
bayesian_info_criterion (log_likelihood, …) |
Computes the Bayesian Information Criterion (BIC) given the log of the likelihood function evaluated at the estimated (or analytically derived) parameters, the number of parameters, and the number of samples. |
bayesian_info_criterion_lsq (ssr, n_params, …) |
Computes the Bayesian Information Criterion (BIC) assuming that the observations come from a Gaussian distribution. |
binned_binom_proportion (x, success[, bins, …]) |
Binomial proportion and confidence interval in bins of a continuous variable x . |
binom_conf_interval (k, n[, conf, interval]) |
Binomial proportion confidence interval given k successes, n trials. |
biweight_location (data[, c, M, axis]) |
Compute the biweight location. |
biweight_midcorrelation (x, y[, c, M, …]) |
Compute the biweight midcorrelation between two variables. |
biweight_midcovariance (data[, c, M, …]) |
Compute the biweight midcovariance between pairs of multiple variables. |
biweight_midvariance (data[, c, M, axis, …]) |
Compute the biweight midvariance. |
biweight_scale (data[, c, M, axis, …]) |
Compute the biweight scale. |
bootstrap (data[, bootnum, samples, bootfunc]) |
Performs bootstrap resampling on numpy arrays. |
calculate_bin_edges (a[, bins, range, weights]) |
Calculate histogram bin edges like numpy.histogram_bin_edges . |
cdf_from_intervals (breaks, totals) |
Construct a callable piecewise-linear CDF from a pair of arrays. |
circcorrcoef (alpha, beta[, axis, …]) |
Computes the circular correlation coefficient between two array of circular data. |
circmean (data[, axis, weights]) |
Computes the circular mean angle of an array of circular data. |
circmoment (data[, p, centered, axis, weights]) |
Computes the p -th trigonometric circular moment for an array of circular data. |
circvar (data[, axis, weights]) |
Computes the circular variance of an array of circular data. |
fold_intervals (intervals) |
Fold the weighted intervals to the interval (0,1). |
freedman_bin_width (data[, return_bins]) |
Return the optimal histogram bin width using the Freedman-Diaconis rule |
histogram (a[, bins, range, weights]) |
Enhanced histogram function, providing adaptive binnings |
histogram_intervals (n, breaks, totals) |
Histogram of a piecewise-constant weight function. |
interval_overlap_length (i1, i2) |
Compute the length of overlap of two intervals. |
jackknife_resampling (data) |
Performs jackknife resampling on numpy arrays. |
jackknife_stats (data, statistic[, conf_lvl]) |
Performs jackknife estimation on the basis of jackknife resamples. |
knuth_bin_width (data[, return_bins, quiet]) |
Return the optimal histogram bin width using Knuth’s rule. |
kuiper (data[, cdf, args]) |
Compute the Kuiper statistic. |
kuiper_false_positive_probability (D, N) |
Compute the false positive probability for the Kuiper statistic. |
kuiper_two (data1, data2) |
Compute the Kuiper statistic to compare two samples. |
mad_std (data[, axis, func, ignore_nan]) |
Calculate a robust standard deviation using the median absolute deviation (MAD). |
median_absolute_deviation (data[, axis, …]) |
Calculate the median absolute deviation (MAD). |
poisson_conf_interval (n[, interval, sigma, …]) |
Poisson parameter confidence interval given observed counts |
rayleightest (data[, axis, weights]) |
Performs the Rayleigh test of uniformity. |
scott_bin_width (data[, return_bins]) |
Return the optimal histogram bin width using Scott’s rule |
sigma_clip (data[, sigma, sigma_lower, …]) |
Perform sigma-clipping on the provided data. |
sigma_clipped_stats (data[, mask, …]) |
Calculate sigma-clipped statistics on the provided data. |
signal_to_noise_oir_ccd (t, source_eps, …) |
Computes the signal to noise ratio for source being observed in the optical/IR using a CCD. |
vonmisesmle (data[, axis]) |
Computes the Maximum Likelihood Estimator (MLE) for the parameters of the von Mises distribution. |
vtest (data[, mu, axis, weights]) |
Performs the Rayleigh test of uniformity where the alternative hypothesis H1 is assumed to have a known mean angle mu . |
Classes¶
BoxLeastSquares (t, y[, dy]) |
Compute the box least squares periodogram |
BoxLeastSquaresResults (*args) |
The results of a BoxLeastSquares search |
Events ([p0, gamma, ncp_prior]) |
Bayesian blocks fitness for binned or unbinned events |
FitnessFunc ([p0, gamma, ncp_prior]) |
Base class for bayesian blocks fitness functions |
LombScargle (t, y[, dy, fit_mean, …]) |
Compute the Lomb-Scargle Periodogram. |
PointMeasures ([p0, gamma, ncp_prior]) |
Bayesian blocks fitness for point measures |
RegularEvents (dt[, p0, gamma, ncp_prior]) |
Bayesian blocks fitness for regular events |
RipleysKEstimator (area[, x_max, y_max, …]) |
Estimators for Ripley’s K function for two-dimensional spatial data. |
SigmaClip ([sigma, sigma_lower, sigma_upper, …]) |
Class to perform sigma clipping. |