# Statistics

The Statistics module contains basic statistics functionality.

`Statistics.std`

— Function.`std(v; corrected::Bool=true, mean=nothing, dims)`

Compute the sample standard deviation of a vector or array `v`, optionally along the given dimensions. The algorithm returns an estimator of the generative distribution's standard deviation under the assumption that each entry of `v` is an IID sample drawn from that generative distribution. For a vector, this computation is equivalent to calculating `sqrt(sum((v .- mean(v)).^2) / (length(v) - 1))`. A pre-computed `mean` may be provided. If `corrected` is `true`, then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n` is the number of elements entering each sum (`length(v)` when no `dims` is given).

If the array contains `NaN` or `missing` values, the result is also `NaN` or `missing` (`missing` takes precedence if the array contains both). Use the `skipmissing` function to omit `missing` entries and compute the standard deviation of non-missing values.
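The scaling rule and the `missing` behavior described above can be checked on a small vector. This is a minimal sketch; the data values and variable names are illustrative choices, not part of the original documentation.

```julia
using Statistics

v = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = length(v)

# Sum of squared deviations from the mean, shared by both estimators.
ss = sum((v .- mean(v)) .^ 2)

# Default (corrected=true): the sum is scaled with n - 1.
s_corrected = std(v)
manual_corrected = sqrt(ss / (n - 1))

# corrected=false: the sum is scaled with n.
s_uncorrected = std(v; corrected=false)
manual_uncorrected = sqrt(ss / n)

# A pre-computed mean can be supplied through the `mean` keyword.
s_with_mean = std(v; mean=mean(v))

# `missing` propagates unless it is skipped explicitly.
with_missing = std([1.0, missing, 3.0])
without_missing = std(skipmissing([1.0, missing, 3.0]))
```

Here `s_corrected` and `manual_corrected` should agree up to floating-point rounding, as should the uncorrected pair.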

`Statistics.stdm`

— Function.`stdm(v, m; corrected::Bool=true)`

Compute the sample standard deviation of a vector `v` with known mean `m`. If `corrected` is `true`, then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n = length(v)`.

If the array contains `NaN` or `missing` values, the result is also `NaN` or `missing` (`missing` takes precedence if the array contains both). Use the `skipmissing` function to omit `missing` entries and compute the standard deviation of non-missing values.
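As a small illustration of `stdm`, the sketch below passes a mean that has already been computed and compares the result with `std`. The data and variable names are illustrative assumptions.

```julia
using Statistics

v = [1.0, 2.0, 3.0, 4.0]
m = mean(v)                      # known mean, computed once

# With the default corrected=true the sum is scaled with n - 1.
s = stdm(v, m)

# With corrected=false the sum is scaled with n.
s_uncorrected = stdm(v, m; corrected=false)
```

Supplying `m` is mainly useful when the mean is already available from an earlier step.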

`Statistics.var`

— Function.`var(v; dims, corrected::Bool=true, mean=nothing)`

Compute the sample variance of a vector or array `v`, optionally along the given dimensions. The algorithm returns an estimator of the generative distribution's variance under the assumption that each entry of `v` is an IID sample drawn from that generative distribution. For a vector, this computation is equivalent to calculating `sum(abs2, v .- mean(v)) / (length(v) - 1)`. If `corrected` is `true`, then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n` is the number of elements entering each sum (`length(v)` when no `dims` is given). A pre-computed `mean` over the region may be provided.

If the array contains `NaN` or `missing` values, the result is also `NaN` or `missing` (`missing` takes precedence if the array contains both). Use the `skipmissing` function to omit `missing` entries and compute the variance of non-missing values.
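The effect of the `dims` keyword is easiest to see on a small matrix. The following sketch uses an illustrative 2×2 array; the names are assumptions made for this example.

```julia
using Statistics

A = [1.0 2.0;
     3.0 6.0]

# Variance down each column (reduces dimension 1), giving a 1×2 result.
col_var = var(A; dims=1)

# Variance across each row (reduces dimension 2), giving a 2×1 result.
row_var = var(A; dims=2)

# Without `dims`, all four entries are treated as one sample.
all_var = var(A)

# Uncorrected variance of the same sample (scaled with n instead of n - 1).
all_var_uncorrected = var(A; corrected=false)
```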

`Statistics.varm`

— Function.`varm(v, m; dims, corrected::Bool=true)`

Compute the sample variance of a collection `v` with known mean(s) `m`, optionally over the given dimensions. `m` may contain means for each dimension of `v`. If `corrected` is `true`, then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n` is the number of elements entering each sum (`length(v)` when no `dims` is given).

If the array contains `NaN` or `missing` values, the result is also `NaN` or `missing` (`missing` takes precedence if the array contains both). Use the `skipmissing` function to omit `missing` entries and compute the variance of non-missing values.
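A brief sketch of `varm` with per-column means follows. The matrix and variable names are illustrative; the shape of `m` is chosen to match the result of `mean(A; dims=1)`.

```julia
using Statistics

A = [1.0 2.0;
     3.0 6.0]

# Column means as a 1×2 array, matching the shape of the reduced result.
m = mean(A; dims=1)

# Variance of each column around its supplied mean.
col_var = varm(A, m; dims=1)

# For a plain vector, a scalar mean is enough.
v = [1.0, 2.0, 3.0, 4.0]
v_var = varm(v, mean(v))
```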

`Statistics.cor`

— Function.`cor(x::AbstractVector)`

Return the number one.

`cor(X::AbstractMatrix; dims::Int=1)`

Compute the Pearson correlation matrix of the matrix `X` along the dimension `dims`.

`cor(x::AbstractVector, y::AbstractVector)`

Compute the Pearson correlation between the vectors `x` and `y`.

`cor(X::AbstractVecOrMat, Y::AbstractVecOrMat; dims=1)`

Compute the Pearson correlation between the vectors or matrices `X` and `Y` along the dimension `dims`.
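The `cor` methods above can be exercised on data with a known linear relationship. In this sketch the vectors are constructed so that the expected correlations are exactly `1` and `-1` up to rounding; all names are illustrative.

```julia
using Statistics

x = [1.0, 2.0, 3.0, 4.0]
y = 2 .* x .+ 1        # increasing linear function of x
z = -x                 # decreasing linear function of x

r_xy = cor(x, y)       # close to 1
r_xz = cor(x, z)       # close to -1

# With the default dims=1, each column of X is treated as one variable,
# so the result is a 3×3 correlation matrix.
X = hcat(x, y, z)
C = cor(X)

# The same correlation written out from cov and std.
r_manual = cov(x, y) / (std(x) * std(y))
```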

`Statistics.cov`

— Function.`cov(x::AbstractVector; corrected::Bool=true)`

Compute the variance of the vector `x`. If `corrected` is `true` (the default) then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n = length(x)`.

`cov(X::AbstractMatrix; dims::Int=1, corrected::Bool=true)`

Compute the covariance matrix of the matrix `X` along the dimension `dims`. If `corrected` is `true` (the default) then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n = size(X, dims)`.

`cov(x::AbstractVector, y::AbstractVector; corrected::Bool=true)`

Compute the covariance between the vectors `x` and `y`. If `corrected` is `true` (the default), computes $\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x) (y_i-\bar y)^*$ where $*$ denotes the complex conjugate and `n = length(x) = length(y)`. If `corrected` is `false`, computes $\frac{1}{n}\sum_{i=1}^n (x_i-\bar x) (y_i-\bar y)^*$.

`cov(X::AbstractVecOrMat, Y::AbstractVecOrMat; dims::Int=1, corrected::Bool=true)`

Compute the covariance between the vectors or matrices `X` and `Y` along the dimension `dims`. If `corrected` is `true` (the default) then the sum is scaled with `n-1`, whereas the sum is scaled with `n` if `corrected` is `false`, where `n = size(X, dims) = size(Y, dims)`.
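The covariance formulas above can be written out by hand and compared with `cov`. This is a minimal sketch with illustrative real-valued data, for which the complex conjugate has no effect.

```julia
using Statistics

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 7.0]
n = length(x)

# Sum of products of deviations, with the conjugate on the y term.
s = sum((x .- mean(x)) .* conj.(y .- mean(y)))

c_corrected = cov(x, y)                      # scaled with n - 1
c_uncorrected = cov(x, y; corrected=false)   # scaled with n

# Covariance matrix with columns as variables (dims=1 is the default).
C = cov(hcat(x, y))
```

The off-diagonal entries of `C` hold the covariance between the two columns, and the diagonal holds their variances.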

`Statistics.mean!`

— Function.`mean!(r, v)`

Compute the mean of `v` over the singleton dimensions of `r`, and write results to `r`.

**Examples**

```
julia> v = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> mean!([1., 1.], v)
2-element Array{Float64,1}:
1.5
3.5
julia> mean!([1. 1.], v)
1×2 Array{Float64,2}:
2.0 3.0
```

`Statistics.mean`

— Function.`mean(itr)`

Compute the mean of all elements in a collection.

If `itr` contains `NaN` or `missing` values, the result is also `NaN` or `missing` (`missing` takes precedence if `itr` contains both). Use the `skipmissing` function to omit `missing` entries and compute the mean of non-missing values.

**Examples**

```
julia> mean(1:20)
10.5
julia> mean([1, missing, 3])
missing
julia> mean(skipmissing([1, missing, 3]))
2.0
```

`mean(f::Function, itr)`

Apply the function `f` to each element of collection `itr` and take the mean.

```
julia> mean(√, [1, 2, 3])
1.3820881233139908
julia> mean([√1, √2, √3])
1.3820881233139908
```

`mean(A::AbstractArray; dims)`

Compute the mean of an array over the given dimensions.

**Examples**

```
julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> mean(A, dims=1)
1×2 Array{Float64,2}:
2.0 3.0
julia> mean(A, dims=2)
2×1 Array{Float64,2}:
1.5
3.5
```

`Statistics.median!`

— Function.`median!(v)`

Like `median`, but may overwrite the input vector.
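Because `median!` may reorder its argument, one way to protect the original data is to pass a copy. The sketch below illustrates this; the variable names are illustrative.

```julia
using Statistics

v = [3, 1, 2]

# Work on a copy so that `v` keeps its original order.
m = median!(copy(v))

# Calling median!(v) directly avoids the copy, but the order of `v`
# should not be relied on afterwards.
```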

`Statistics.median`

— Function.`median(itr)`

Compute the median of all elements in a collection. For an even number of elements no exact median element exists, so the result is the mean of the two middle elements.

If `itr` contains `NaN` or `missing` values, the result is also `NaN` or `missing` (`missing` takes precedence if `itr` contains both). Use the `skipmissing` function to omit `missing` entries and compute the median of non-missing values.

**Examples**

```
julia> median([1, 2, 3])
2.0
julia> median([1, 2, 3, 4])
2.5
julia> median([1, 2, missing, 4])
missing
julia> median(skipmissing([1, 2, missing, 4]))
2.0
```

`median(A::AbstractArray; dims)`

Compute the median of an array along the given dimensions.

**Examples**

```
julia> median([1 2; 3 4], dims=1)
1×2 Array{Float64,2}:
2.0 3.0
```

`Statistics.middle`

— Function.`middle(x)`

Compute the middle of a scalar value, which is equivalent to `x` itself, but of the type of `middle(x, x)` for consistency.

`middle(x, y)`

Compute the middle of two reals `x` and `y`, which is equivalent in both value and type to computing their mean (`(x + y) / 2`).
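The scalar and two-argument forms can be tried directly. In this short sketch the integer inputs are illustrative; the point is that the results are floating-point even for integer arguments.

```julia
using Statistics

# One argument: same value as x, but with the type of middle(x, x).
a = middle(3)

# Two arguments: the mean of x and y.
b = middle(1, 4)
```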

`middle(range)`

Compute the middle of a range, which consists of computing the mean of its extrema. Since a range is sorted, the mean is computed from the first and last elements.

```
julia> middle(1:10)
5.5
```

`middle(a)`

Compute the middle of an array `a`, which consists of finding its extrema and then computing their mean.

```
julia> a = [1,2,3.6,10.9]
4-element Array{Float64,1}:
1.0
2.0
3.6
10.9
julia> middle(a)
5.95
```

`Statistics.quantile!`

— Function.`quantile!([q::AbstractArray, ] v::AbstractVector, p; sorted=false)`

Compute the quantile(s) of a vector `v` at a specified probability or vector or tuple of probabilities `p` on the interval [0,1]. If `p` is a vector, an optional output array `q` may also be specified. (If not provided, a new output array is created.) The keyword argument `sorted` indicates whether `v` can be assumed to be sorted; if `false` (the default), then the elements of `v` will be partially sorted in-place.

Quantiles are computed via linear interpolation between the points `((k-1)/(n-1), v[k])`, for `k = 1:n`, where `v[k]` denotes the `k`-th smallest element of `v` and `n = length(v)`. This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R default.

An `ArgumentError` is thrown if `v` contains `NaN` or `missing` values.

- Hyndman, R.J. and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", *The American Statistician*, Vol. 50, No. 4, pp. 361-365

**Examples**

```
julia> x = [3, 2, 1];
julia> quantile!(x, 0.5)
2.0
julia> x
3-element Array{Int64,1}:
1
2
3
julia> y = zeros(3);
julia> quantile!(y, x, [0.1, 0.5, 0.9]) === y
true
julia> y
3-element Array{Float64,1}:
1.2
2.0
2.8
```

`Statistics.quantile`

— Function.`quantile(itr, p; sorted=false)`

Compute the quantile(s) of a collection `itr` at a specified probability or vector or tuple of probabilities `p` on the interval [0,1]. The keyword argument `sorted` indicates whether `itr` can be assumed to be sorted.

Quantiles are computed via linear interpolation between the points `((k-1)/(n-1), v[k])`, for `k = 1:n`, where `v[k]` denotes the `k`-th smallest element of `itr` and `n` is the number of elements. This corresponds to Definition 7 of Hyndman and Fan (1996), and is the same as the R default.

An `ArgumentError` is thrown if `itr` contains `NaN` or `missing` values. Use the `skipmissing` function to omit `missing` entries and compute the quantiles of non-missing values.

- Hyndman, R.J. and Fan, Y. (1996) "Sample Quantiles in Statistical Packages", *The American Statistician*, Vol. 50, No. 4, pp. 361-365
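The interpolation rule described above can be sketched by hand. The helper `quantile_type7` below is hypothetical and not part of `Statistics`; it is only an illustration of linear interpolation between the points `((k-1)/(n-1), v[k])` under the assumption of real-valued input without `NaN` or `missing`, and it is not the library's implementation.

```julia
using Statistics

# Hypothetical helper, written for illustration only.
function quantile_type7(data::AbstractVector{<:Real}, p::Real)
    0 <= p <= 1 || throw(ArgumentError("p must lie in [0, 1]"))
    isempty(data) && throw(ArgumentError("data must not be empty"))
    v = sort(data)                 # v[k] is the k-th smallest element
    n = length(v)
    h = (n - 1) * p + 1            # fractional 1-based position for p
    k = floor(Int, h)              # index of the lower neighbour
    k >= n && return float(v[n])   # p == 1 (or n == 1): largest element
    frac = h - k                   # distance past the lower neighbour
    return v[k] + frac * (v[k + 1] - v[k])
end

q_manual = quantile_type7([3, 2, 1], 0.5)
q_library = quantile([3, 2, 1], 0.5)
```

On inputs such as these, `q_manual` and `q_library` are expected to agree up to floating-point rounding.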

**Examples**

```
julia> quantile(0:20, 0.5)
10.0
julia> quantile(0:20, [0.1, 0.5, 0.9])
3-element Array{Float64,1}:
2.0
10.0
18.0
julia> quantile(skipmissing([1, 10, missing]), 0.5)
5.5
```