accessor

Accessor to manipulate histograms.

An accessor registered as hist is made available on xarray.DataArray for various histogram manipulations.

Functions

remove_flow(coord)

Remove flow bins from a coordinate.

Classes

HistDataArrayAccessor(obj)

Histogram accessor for DataArrays.

class HistDataArrayAccessor(obj)

Histogram accessor for DataArrays.

Important

Accessor registered under hist.

Validity

  • The coordinates of the bins must be named <variable>_bins.

  • The array must be named <variable(s)_name>_<histogram or pdf>: histogram if it is not normalized, and pdf if it is normalized as a probability density function. For a multi-dimensional histogram, the variable names are separated by underscores, for instance Temp_Sal_histogram.
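The naming rules above can be illustrated with a small parser. This is a plain-Python sketch, not the accessor's code: parse_hist_name and bins_coord_name are hypothetical helpers that split a name such as Temp_Sal_histogram into its variables and tell whether it is normalized, which is also the information is_normalized() relies on.

```python
from __future__ import annotations


def parse_hist_name(name: str) -> tuple[list[str], bool]:
    """Split '<variables>_<histogram|pdf>' into (variables, is_normalized).

    Hypothetical helper illustrating the naming convention.
    """
    stem, sep, kind = name.rpartition("_")
    if not sep or not stem or kind not in ("histogram", "pdf"):
        raise ValueError(f"{name!r} does not follow <variables>_<histogram|pdf>")
    # 'pdf' marks a histogram normalized as a probability density function
    return stem.split("_"), kind == "pdf"


def bins_coord_name(variable: str) -> str:
    """Expected name of the bins coordinate of a variable (<variable>_bins)."""
    return f"{variable}_bins"


variables, normalized = parse_hist_name("Temp_Sal_histogram")
print(variables, normalized)                    # ['Temp', 'Sal'] False
print([bins_coord_name(v) for v in variables])  # ['Temp_bins', 'Sal_bins']
```

Because underscores separate variables, this naive split cannot tell apart a variable whose own name contains an underscore; the convention itself is ambiguous in that case.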

Each bins coordinate may contain attributes:

  • bin_type: the class name of the Boost axis type that was used. If absent, the accessor assumes the bins are regularly spaced and tries to infer the rightmost edge.

  • right_edge: the rightmost edge position, only necessary for Regular and Variable bins.

  • underflow and overflow: integers that indicate whether the corresponding flow bins are present. If absent, no flow bins are assumed.
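The right_edge inference described above can be sketched as follows. This assumes, as the edges() description suggests, that the coordinate stores the N left edges and that the rightmost edge is the extra piece of information; edges_from_left is a hypothetical helper, not part of the accessor.

```python
from __future__ import annotations


def edges_from_left(left: list[float], right_edge: float | None = None) -> list[float]:
    """Rebuild the N+1 bin edges from N left edges (hypothetical helper).

    Without a right_edge attribute, assume regular spacing and place the
    rightmost edge one step after the last left edge.
    """
    if right_edge is None:
        if len(left) < 2:
            raise ValueError("need at least two bins to infer the rightmost edge")
        step = left[1] - left[0]  # regular spacing assumed
        right_edge = left[-1] + step
    return [*left, right_edge]


print(edges_from_left([0.0, 0.5, 1.0]))        # [0.0, 0.5, 1.0, 1.5]
print(edges_from_left([0.0, 1.0, 4.0], 10.0))  # [0.0, 1.0, 4.0, 10.0]
```

For Variable bins the spacing is irregular, so the inference would be wrong; that is why right_edge is needed for them.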

Backend for computations

Statistical computations are delegated to scipy.stats.rv_histogram. Consequently, chunking along the bins dimensions is not supported (which should not be a problem in most cases).

Parameters:

obj (DataArray)

is_normalized()

Whether the histogram is normalized (based on the array name).

Return type:

bool

bins(variable=None, flow=True)

Return the bins coordinate for a given variable.

Parameters:
  • variable (str | None) – Can be omitted for 1D histograms.

  • flow (bool) – Remove flow bins if False.

Return type:

DataArray

edges(variable=None, flow=True)

Return the edges of the bins (including the rightmost edge).

Not supported for bins of the discrete types “IntCategory” and “StrCategory”.

Parameters:
  • variable (str | None) – Can be omitted for 1D histograms.

  • flow (bool) – Remove flow bins if False.

Return type:

DataArray

centers(variable=None, flow=True)

Return the centers of all bins.

Not supported for bin type “StrCategory”. The centers of IntCategory bins are bins + 0.5. The center of a flow bin is its position (np.inf, for instance).

Parameters:
  • variable (str | None) – Can be omitted for 1D histograms.

  • flow (bool) – Remove flow bins if False.

Return type:

DataArray
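The center rules above can be mirrored in plain Python. This sketch assumes flow bins are represented by an infinite outer edge (-inf for underflow, +inf for overflow); bin_centers and int_category_centers are hypothetical helpers, not the accessor's implementation.

```python
import math


def bin_centers(edges: list[float]) -> list[float]:
    """Midpoint of each bin given its N+1 edges (hypothetical helper).

    Flow bins are assumed to have an infinite outer edge; their center is
    that infinite position.
    """
    centers = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        if math.isinf(lo):
            centers.append(lo)  # underflow bin, e.g. -inf
        elif math.isinf(hi):
            centers.append(hi)  # overflow bin, e.g. +inf
        else:
            centers.append(0.5 * (lo + hi))
    return centers


def int_category_centers(categories: list[int]) -> list[float]:
    """IntCategory centers, following the documented bins + 0.5 rule."""
    return [c + 0.5 for c in categories]


print(bin_centers([-math.inf, 0.0, 1.0, 3.0, math.inf]))  # [-inf, 0.5, 2.0, inf]
print(int_category_centers([0, 1, 5]))                    # [0.5, 1.5, 5.5]
```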

widths(variable=None, flow=True)

Return the widths of all bins.

Flow bins and StrCategory bins have a width of 1.

Parameters:
  • variable (str | None) – Can be omitted for 1D histograms.

  • flow (bool) – Remove flow bins if False.

Return type:

DataArray
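A minimal sketch of the width rule above, again assuming flow bins carry an infinite outer edge. bin_widths and str_category_widths are hypothetical names used only for illustration.

```python
import math


def bin_widths(edges: list[float]) -> list[float]:
    """Width of each bin; flow bins (infinite outer edge) get width 1."""
    return [
        1.0 if math.isinf(lo) or math.isinf(hi) else hi - lo
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


def str_category_widths(categories: list[str]) -> list[float]:
    """StrCategory bins have no numeric extent: width 1 each."""
    return [1.0] * len(categories)


print(bin_widths([-math.inf, 0.0, 1.0, 3.0, math.inf]))  # [1.0, 1.0, 2.0, 1.0]
print(str_category_widths(["a", "b"]))                   # [1.0, 1.0]
```

Using 1 for flow bins keeps downstream divisions (such as normalization) finite.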

areas(variables=None, flow=True)

Return the areas of the bins.

The area is the product of the widths of all specified bins. The area of any point that corresponds to a flow bin in at least one dimension is equal to 1.

Parameters:
  • variables (Sequence[str] | None) – Variables whose bins are included in the product. If None, all variables are used.

  • flow (bool) – Remove flow bins if False.

Return type:

DataArray
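The area rule above, including the special case for flow bins, can be sketched in plain Python. This assumes flow bins have an infinite outer edge and includes every axis passed in (the variables selection is not modeled); bin_areas is a hypothetical helper returning a dict keyed by bin multi-index rather than a DataArray.

```python
import math
from itertools import product


def _widths_and_flow(edges):
    """(width, is_flow) for each bin of one axis; flow bins have an infinite edge."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        flow = math.isinf(lo) or math.isinf(hi)
        out.append((1.0 if flow else hi - lo, flow))
    return out


def bin_areas(*axes_edges):
    """Map each multi-index (i, j, ...) to its bin area (hypothetical helper).

    The area is the product of the widths along every axis, except that it
    is 1 whenever the bin is a flow bin along at least one axis.
    """
    axes = [_widths_and_flow(edges) for edges in axes_edges]
    areas = {}
    for index in product(*(range(len(axis)) for axis in axes)):
        cells = [axes[dim][i] for dim, i in enumerate(index)]
        if any(flow for _, flow in cells):
            areas[index] = 1.0
        else:
            areas[index] = math.prod(width for width, _ in cells)
    return areas


areas = bin_areas([0.0, 1.0, 3.0], [-math.inf, 0.0, 0.25])
print(areas)  # {(0, 0): 1.0, (0, 1): 0.25, (1, 0): 1.0, (1, 1): 0.5}
```

Note that setting flow widths to 1 alone would not suffice: the product would then equal the width along the other axis, hence the explicit override.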

normalize(variables=None)

Return a normalized histogram.

Raises an error if the histogram is already normalized.

Parameters:

variables (str | Sequence[str] | None) – The variable(s), i.e. dimensions, along which to normalize.

Return type:

DataArray
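For a 1D histogram without flow bins, normalizing as a probability density amounts to dividing each count by the total count times the bin width, so that the densities integrate to 1. The sketch below shows that arithmetic with a hypothetical normalize_1d helper; the refusal on a _pdf name and the _histogram to _pdf rename follow the naming convention, but whether the accessor renames the array exactly this way is an assumption.

```python
def normalize_1d(counts: list[float], edges: list[float], name: str):
    """Turn a 1D count histogram into a probability density (hypothetical helper).

    density_i = counts_i / (total * width_i), so sum(density_i * width_i) == 1.
    """
    if name.endswith("_pdf"):
        raise ValueError("histogram is already normalized")
    if not name.endswith("_histogram"):
        raise ValueError(f"unexpected array name {name!r}")
    total = sum(counts)
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    density = [c / (total * w) for c, w in zip(counts, widths)]
    # Assumed rename following the <variables>_<histogram|pdf> convention
    return density, name[: -len("_histogram")] + "_pdf"


density, new_name = normalize_1d([2, 6, 2], [0.0, 1.0, 2.0, 4.0], "Temp_histogram")
print(new_name)  # Temp_pdf
```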

remove_flow(variables=None)

Remove flow bins.

Parameters:

variables (Sequence[str] | None) – Variables for which to remove flow bins. If not specified, remove for all variables.

Return type:

DataArray

apply_func(func, variable=None, flow=True, **kwargs)

Apply a function to a bins coordinate.

Parameters:
  • func (Callable[[DataArray], DataArray]) – Callable that transforms the N+1 edges. It does not need to handle the right_edge attribute.

  • variable (str | None) – The variable to transform. This is equivalent to computing a histogram of func(ds[variable], **kwargs). Can be omitted for 1D histograms.

  • kwargs – Passed to the function.

  • flow (bool)

Return type:

DataArray
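The equivalence stated above holds because, for a strictly increasing func, a value in [edges[i], edges[i+1]) maps into [func(edges[i]), func(edges[i+1])), so only the edges change and the counts stay the same. A minimal sketch with a hypothetical apply_to_edges helper; the monotonicity check is this sketch's own assumption, not documented accessor behavior.

```python
def apply_to_edges(edges, func, **kwargs):
    """Apply func to every one of the N+1 edges (hypothetical helper).

    Counts of an unnormalized histogram are unchanged as long as func is
    strictly increasing over the edges, which this sketch checks.
    """
    new_edges = [func(e, **kwargs) for e in edges]
    if any(b <= a for a, b in zip(new_edges[:-1], new_edges[1:])):
        raise ValueError("func must be strictly increasing over the edges")
    return new_edges


# Extra keyword arguments are forwarded to func, as with **kwargs above
print(apply_to_edges([1.0, 2.0, 3.0], lambda x, power: x**power, power=2))
# [1.0, 4.0, 9.0]
```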

scale(factor, variable=None, flow=True)

Transform a bins coordinate by scaling it.

Parameters:
  • factor (float) – Factor by which to scale the coordinate values.

  • variable (str | None) – The variable to scale. This is equivalent to computing a histogram of factor * ds[variable].

  • flow (bool)

Return type:

DataArray
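The equivalence above can be checked numerically: histogramming scaled data with scaled edges gives the same counts as the original histogram. The sketch uses a small hypothetical histogram function (bins closed on the left, last bin closed on both sides) and assumes a positive factor, here a km to m conversion.

```python
from bisect import bisect_right


def histogram(data, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        if edges[0] <= x <= edges[-1]:
            i = min(bisect_right(edges, x) - 1, len(counts) - 1)
            counts[i] += 1
    return counts


data = [0.1, 0.4, 0.7, 1.2, 1.9]  # e.g. distances in km
edges = [0.0, 0.5, 1.0, 2.0]
factor = 1000.0                   # km -> m, positive factor assumed

scaled_edges = [factor * e for e in edges]
print(histogram(data, edges))                               # [2, 1, 2]
print(histogram([factor * x for x in data], scaled_edges))  # [2, 1, 2]
```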

ppf(q, variable=None)

Return the percent point function (the inverse of the cumulative distribution function) at q.

Uses scipy.stats.rv_histogram for computation.

Parameters:
  • q (float) – Must be between 0 and 1.

  • variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray
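scipy.stats.rv_histogram treats the density as constant inside each bin, so the CDF is piecewise linear through the bin edges and the PPF is its inverse. The sketch below reproduces that interpolation for a single 1D histogram in plain Python; ppf here is a hypothetical stand-in, not the accessor method, and it skips the loop over other dimensions.

```python
from bisect import bisect_left
from itertools import accumulate


def ppf(q: float, counts: list[float], edges: list[float]) -> float:
    """Inverse of the piecewise-linear CDF of a 1D histogram (hypothetical helper)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = sum(counts)
    cum = [0.0] + [c / total for c in accumulate(counts)]  # CDF at each edge
    j = bisect_left(cum, q)  # first edge where the CDF reaches q
    if j == 0:
        return edges[0]
    # Linear interpolation inside bin j-1, where the density is uniform
    frac = (q - cum[j - 1]) / (cum[j] - cum[j - 1])
    return edges[j - 1] + frac * (edges[j] - edges[j - 1])


counts, edges = [1, 2, 1], [0.0, 1.0, 2.0, 3.0]
print(ppf(0.25, counts, edges))  # 1.0
print(ppf(0.5, counts, edges))   # 1.5
```

The median documented below is simply this function evaluated at q = 0.5.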

median(variable=None)

Return the median value of the distribution.

Uses scipy.stats.rv_histogram for computation.

Parameters:

variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray

mean(variable=None)

Return the mean value of the distribution.

Uses scipy.stats.rv_histogram for computation.

Parameters:

variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray
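With a density that is uniform inside each bin (as in scipy.stats.rv_histogram), the mean is the probability-weighted average of the bin centers. A minimal 1D sketch with a hypothetical hist_mean helper:

```python
def hist_mean(counts, edges):
    """Mean of the distribution whose density is uniform inside each bin."""
    total = sum(counts)
    # Each bin contributes its probability mass times its center
    return sum(
        (c / total) * 0.5 * (lo + hi)
        for c, lo, hi in zip(counts, edges[:-1], edges[1:])
    )


print(hist_mean([1, 2, 1], [0.0, 1.0, 2.0, 3.0]))  # 1.5
```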

cdf(x, variable=None)

Return the cumulative distribution function at x.

Uses scipy.stats.rv_histogram for computation.

Parameters:
  • x (float) – Value of the variable at which to evaluate the CDF. The returned values lie between 0 and 1.

  • variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray
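Evaluating the CDF mirrors the PPF sketch in reverse: locate the bin containing x and interpolate linearly between the cumulative fractions at its edges, which is how a uniform-within-bin density behaves. A plain-Python 1D sketch with a hypothetical cdf helper:

```python
from bisect import bisect_right
from itertools import accumulate


def cdf(x: float, counts: list[float], edges: list[float]) -> float:
    """Piecewise-linear CDF of a 1D histogram (hypothetical helper)."""
    if x <= edges[0]:
        return 0.0
    if x >= edges[-1]:
        return 1.0
    total = sum(counts)
    cum = [0.0] + [c / total for c in accumulate(counts)]  # CDF at each edge
    i = bisect_right(edges, x) - 1  # index of the bin containing x
    frac = (x - edges[i]) / (edges[i + 1] - edges[i])
    return cum[i] + frac * (cum[i + 1] - cum[i])


counts, edges = [1, 2, 1], [0.0, 1.0, 2.0, 3.0]
print(cdf(1.5, counts, edges))  # 0.5
```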

var(variable=None)

Return the variance of the distribution.

Uses scipy.stats.rv_histogram for computation.

Parameters:

variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray

std(variable=None)

Return the standard deviation of the distribution.

Uses scipy.stats.rv_histogram for computation.

Parameters:

variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray
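For a density that is uniform on each bin [a, b], the second moment of a bin is (a² + ab + b²)/3, so variance and standard deviation follow from E[X²] − E[X]². A 1D sketch covering both var and std above, with hypothetical helpers hist_var and hist_std:

```python
import math


def hist_var(counts, edges):
    """Variance of a histogram with uniform density inside each bin.

    E[X] = sum p_i (a_i + b_i) / 2 and E[X^2] = sum p_i (a_i^2 + a_i b_i + b_i^2) / 3.
    """
    total = sum(counts)
    mean = second = 0.0
    for c, a, b in zip(counts, edges[:-1], edges[1:]):
        p = c / total  # probability mass of the bin
        mean += p * (a + b) / 2
        second += p * (a * a + a * b + b * b) / 3
    return second - mean**2


def hist_std(counts, edges):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(hist_var(counts, edges))


counts, edges = [1, 2, 1], [0.0, 1.0, 2.0, 3.0]
print(hist_var(counts, edges))  # approximately 7/12
print(hist_std(counts, edges))
```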

moment(order, variable=None)

Return the non-central moment of the distribution of the given order.

Uses scipy.stats.rv_histogram for computation.

Parameters:
  • order (int) – Order of moment, order>=1.

  • variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Return type:

DataArray
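For a uniform density on each bin [a, b], integrating xⁿ gives (bⁿ⁺¹ − aⁿ⁺¹)/(n + 1), weighted by the bin density. The sketch below sums those contributions for one 1D histogram, which is the non-central moment of the given order; hist_moment is a hypothetical helper, not the accessor method.

```python
def hist_moment(order, counts, edges):
    """Non-central moment E[X**order] for a uniform density inside each bin.

    Each bin contributes density_i * (b**(n+1) - a**(n+1)) / (n+1).
    """
    if order < 1:
        raise ValueError("order must be >= 1")
    total = sum(counts)
    result = 0.0
    for c, a, b in zip(counts, edges[:-1], edges[1:]):
        density = c / (total * (b - a))  # normalized density in the bin
        result += density * (b ** (order + 1) - a ** (order + 1)) / (order + 1)
    return result


counts, edges = [1, 2, 1], [0.0, 1.0, 2.0, 3.0]
print(hist_moment(1, counts, edges))  # 1.5 (the mean)
```

Order 2 gives E[X²], from which the variance above follows.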

interval(confidence, variable=None)

Return the confidence interval with equal areas around the median.

The interval is computed as [ppf(p_tail), ppf(1 - p_tail)] with p_tail = (1 - confidence)/2.

Uses scipy.stats.rv_histogram for computation.

Parameters:
  • confidence (float) – Probability that a value falls within the returned range. Must be between 0 and 1.

  • variable (str | None) – Variable along which to apply this function. All rv_histogram functions apply to a 1D histogram, so we loop over all other dimensions.

Returns:

dataset – Dataset with variables confidence_low and confidence_high, corresponding to the low and high values of the confidence interval.

Return type:

Dataset
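The interval formula above only needs the PPF at the two tail probabilities. This 1D sketch reuses the piecewise-linear PPF and returns a plain dict whose keys mirror the confidence_low and confidence_high variables of the returned Dataset; both functions are hypothetical stand-ins for illustration.

```python
from bisect import bisect_left
from itertools import accumulate


def ppf(q, counts, edges):
    """Inverse of the piecewise-linear CDF of a 1D histogram."""
    total = sum(counts)
    cum = [0.0] + [c / total for c in accumulate(counts)]
    j = bisect_left(cum, q)
    if j == 0:
        return edges[0]
    frac = (q - cum[j - 1]) / (cum[j] - cum[j - 1])
    return edges[j - 1] + frac * (edges[j] - edges[j - 1])


def interval(confidence, counts, edges):
    """Equal-tailed interval [ppf(p_tail), ppf(1 - p_tail)] (hypothetical helper)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    p_tail = (1.0 - confidence) / 2
    return {
        "confidence_low": ppf(p_tail, counts, edges),
        "confidence_high": ppf(1.0 - p_tail, counts, edges),
    }


print(interval(0.5, [1, 2, 1], [0.0, 1.0, 2.0, 3.0]))
# {'confidence_low': 1.0, 'confidence_high': 2.0}
```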

remove_flow(coord)

Remove flow bins from a coordinate.

Parameters:

coord (DataArray)

Return type:

DataArray
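A plain-Python sketch of dropping flow bins from a bins coordinate, driven by the underflow and overflow attributes. It assumes the underflow bin, when present, is stored first and the overflow bin last; remove_flow here is a hypothetical stand-in operating on a list and a dict, not the module function itself. The histogram values along that dimension would need the same slicing.

```python
import math


def remove_flow(values, attrs):
    """Drop flow bins from a bins coordinate (hypothetical helper).

    Assumes the underflow bin, if present, is first and the overflow bin,
    if present, is last, as flagged by the 'underflow'/'overflow' attributes.
    """
    start = 1 if attrs.get("underflow", 0) else 0
    stop = len(values) - 1 if attrs.get("overflow", 0) else len(values)
    new_attrs = {**attrs, "underflow": 0, "overflow": 0}
    return values[start:stop], new_attrs


values = [-math.inf, 0.0, 1.0, 2.0, math.inf]
print(remove_flow(values, {"underflow": 1, "overflow": 1}))
# ([0.0, 1.0, 2.0], {'underflow': 0, 'overflow': 0})
```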