http://zorba.io/modules/excel/statistical

View as XML or JSON.

This is a library module offering a part of the set of statistical functions defined by Microsoft Excel 2003.

Function Summary

avedev ($numbers as xs:anyAtomicType+) as xs:anyAtomicType

Returns the average of the absolute deviations of data points from their mean.

average ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the average (arithmetic mean) of the arguments.

averagea ($numbers as xs:anyAtomicType+) as xs:anyAtomicType

Calculates the average (arithmetic mean) of the values in the sequence of arguments.

count ($numbers as xs:anyAtomicType*) as xs:integer

Counts the number of cells that contain numbers or values castable to numeric.

counta ($numbers as xs:anyAtomicType*) as xs:integer

Counts the number of values that are not empty.

countblank ($cells as xs:anyAtomicType*) as xs:integer

Counts the empty values in a sequence.

large ($numbers as xs:anyAtomicType*, $k as xs:integer) as xs:anyAtomicType

Returns the k-th largest value in a data set.

max ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the largest number in a sequence.

maxa ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the largest value in a list of arguments.

median ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the median of the given numbers.

min ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the smallest number in a sequence.

mina ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the smallest value in a list of arguments.

mode ($numbers as xs:anyAtomicType*) as xs:anyAtomicType

Returns the most frequently occurring, or repetitive, value in a sequence.

percentile ($numbers as xs:anyAtomicType*, $k_at as xs:anyAtomicType) as xs:anyAtomicType

Returns the k-th percentile of values in a sequence.

percentrank ($numbers as xs:anyAtomicType*, $x as xs:anyAtomicType) as xs:decimal

Returns the rank of a value in a data set as a percentage of the data set.

prob ($x_range as xs:anyAtomicType+, $prob_range as xs:anyAtomicType+, $range_lower_limit as xs:anyAtomicType) as xs:anyAtomicType

This is the same as above, only that upper_limit is not specified.

prob ($x_range as xs:anyAtomicType+, $prob_range as xs:anyAtomicType+, $range_lower_limit as xs:anyAtomicType, $upper_limit as xs:anyAtomicType) as xs:anyAtomicType

Returns the probability that values in a range are between two limits.

quartile ($numbers as xs:anyAtomicType*, $quart as xs:integer) as xs:anyAtomicType

Returns the quartile of a data set.

rank ($x as xs:anyAtomicType, $numbers as xs:anyAtomicType*) as xs:decimal

This RANK function is same as the above, only that $order_ascending is set by default to false.

rank ($x as xs:anyAtomicType, $numbers as xs:anyAtomicType*, $order_ascending as xs:boolean) as xs:decimal

Returns the rank of a number in a list of numbers.

slope ($known_y as xs:anyAtomicType+, $known_x as xs:anyAtomicType+) as xs:anyAtomicType

Returns the slope of the linear regression line through data points in known_y's and known_x's.

small ($numbers as xs:anyAtomicType*, $k as xs:integer) as xs:anyAtomicType

This function computes the k-th smallest value in a data set.

standardize ($x as xs:anyAtomicType, $mean as xs:anyAtomicType, $standard_dev as xs:anyAtomicType) as xs:double

Returns a normalized value from a distribution characterized by mean and standard_dev.

var ($numbers as xs:anyAtomicType+) as xs:anyAtomicType

Estimates variance based on a sample.

vara ($numbers as xs:anyAtomicType+) as xs:anyAtomicType

Estimates variance based on a sample.

varp ($numbers as xs:anyAtomicType+) as xs:anyAtomicType

Calculates variance based on the entire population.

varpa ($numbers as xs:anyAtomicType+) as xs:anyAtomicType

Calculates variance based on the entire population.

Functions

avedev#1

declare  function excel:avedev($numbers as xs:anyAtomicType+) as xs:anyAtomicType
Returns the average of the absolute deviations of data points from their mean. The formula is sum(abs(x - average_x))/n, where n is the count of x in the sequence.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. Sequence can be of any length from 1 up.

Returns

xs:anyAtomicType
The formula result

average#1

declare  function excel:average($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the average (arithmetic mean) of the arguments. Arguments can be empty values, otherwise must be castable to numeric. If sequence is empty then zero is returned. The sequence can be of any length.

Parameters

numbers as xs:anyAtomicType
The sequence of numbers or empty values.

Returns

xs:anyAtomicType
The sum of all numbers divided by the number of non-empty values.

averagea#1

declare  function excel:averagea($numbers as xs:anyAtomicType+) as xs:anyAtomicType
Calculates the average (arithmetic mean) of the values in the sequence of arguments. Arguments can be of any type. The numbers are added, and the sum is divided by the size of entire sequence.

Parameters

numbers as xs:anyAtomicType
the sequence of values of any type. The sequence can be of any length, from 1 up.

Returns

xs:anyAtomicType
The result

count#1

declare  function excel:count($numbers as xs:anyAtomicType*) as xs:integer
Counts the number of cells that contain numbers or values castable to numeric.

Parameters

numbers as xs:anyAtomicType
The sequence of values, of any length.

Returns

xs:integer
The count of numbers.

counta#1

declare  function excel:counta($numbers as xs:anyAtomicType*) as xs:integer
Counts the number of values that are not empty. Empty values are the one with string value "".

Parameters

numbers as xs:anyAtomicType
the sequence of values of any type, any length

Returns

xs:integer
The count of non-empty values

countblank#1

declare  function excel:countblank($cells as xs:anyAtomicType*) as xs:integer
Counts the empty values in a sequence. The empty values are the ones with string value "". The value 0 is not counted.

Parameters

cells as xs:anyAtomicType
the sequence of values, of any length

Returns

xs:integer
The count

large#2

declare  function excel:large($numbers as xs:anyAtomicType*, $k as xs:integer) as xs:anyAtomicType
Returns the k-th largest value in a data set. If n is the number of data points in a range, then LARGE(array,1) returns the largest value, and LARGE(array,n) returns the smallest value.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
k as xs:integer
the position of largest value, with value from 1 to count of values

Returns

xs:anyAtomicType
The k-th largest value as numeric type

max#1

declare  function excel:max($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the largest number in a sequence.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length.

Returns

xs:anyAtomicType
The max

maxa#1

declare  function excel:maxa($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the largest value in a list of arguments. In this implementation there is no difference between MAX and MAXA.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric The sequence can be of any length.

Returns

xs:anyAtomicType
The max

median#1

declare  function excel:median($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the median of the given numbers. The median is the number in the middle of a set of numbers. Half the numbers have values that are greater than the median, and half the numbers have values that are less than the median.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers, of any length

Returns

xs:anyAtomicType
for odd count of numbers return the number in the middle of the sorted sequence. For even count of numbers return the average of the two numbers in the middle.

min#1

declare  function excel:min($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the smallest number in a sequence.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length.

Returns

xs:anyAtomicType
The min

mina#1

declare  function excel:mina($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the smallest value in a list of arguments. In this implementation there is no difference between MAX and MAXA.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric The sequence can be of any length.

Returns

xs:anyAtomicType
The min

mode#1

declare  function excel:mode($numbers as xs:anyAtomicType*) as xs:anyAtomicType
Returns the most frequently occurring, or repetitive, value in a sequence. Arguments must be castable to numeric.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers, of any length

Returns

xs:anyAtomicType
The most occuring number

percentile#2

declare  function excel:percentile($numbers as xs:anyAtomicType*, $k_at as xs:anyAtomicType) as xs:anyAtomicType
Returns the k-th percentile of values in a sequence. If k is not a multiple of 1/(n - 1), PERCENTILE interpolates to determine the value at the k-th percentile. The function is computed by (max-min)*k + min

Parameters

numbers as xs:anyAtomicType
the sequence of numbers, of any length
k_at as xs:anyAtomicType
the percentile, with value between 0 .. 1 inclusive

Returns

xs:anyAtomicType
The computed percentile

percentrank#2

declare  function excel:percentrank($numbers as xs:anyAtomicType*, $x as xs:anyAtomicType) as xs:decimal
Returns the rank of a value in a data set as a percentage of the data set. If x does not match one of the values in array, PERCENTRANK interpolates to return the correct percentage rank.

The formula is uses: (RANK - 1) / (size - 1) .

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numbers. The sequence can be of any length, from 1 up.
x as xs:anyAtomicType
is the value for which you want to know the rank

Returns

xs:decimal
The percentage of rank.

prob#3

declare  function excel:prob($x_range as xs:anyAtomicType+, $prob_range as xs:anyAtomicType+, $range_lower_limit as xs:anyAtomicType) as xs:anyAtomicType
This is the same as above, only that upper_limit is not specified. The probability is computed only for range_lower_limit.

Parameters

x_range as xs:anyAtomicType
is the range of numeric values of x with which there are associated probabilities. This does not need to be ordered.
prob_range as xs:anyAtomicType
is a set of probabilities associated with values in x_range.
range_lower_limit as xs:anyAtomicType
is the value for which you want a probability.

Returns

xs:anyAtomicType
The probability of the range_lower_limit value

prob#4

declare  function excel:prob($x_range as xs:anyAtomicType+, $prob_range as xs:anyAtomicType+, $range_lower_limit as xs:anyAtomicType, $upper_limit as xs:anyAtomicType) as xs:anyAtomicType
Returns the probability that values in a range are between two limits.

Parameters

x_range as xs:anyAtomicType
is the range of numeric values of x with which there are associated probabilities. This does not need to be ordered.
prob_range as xs:anyAtomicType
is a set of probabilities associated with values in x_range.
range_lower_limit as xs:anyAtomicType
is the lower bound on the value for which you want a probability.
upper_limit as xs:anyAtomicType
is the upper bound on the value for which you want a probability.

Returns

xs:anyAtomicType
The probability of the entire range

quartile#2

declare  function excel:quartile($numbers as xs:anyAtomicType*, $quart as xs:integer) as xs:anyAtomicType
Returns the quartile of a data set.

Parameters

numbers as xs:anyAtomicType
sequence of numbers or values castable to numbers. The sequence can be of any length, from 1 up.
quart as xs:integer
one of the values 0, 1, 2, 3, 4 with meaning:
0
compute minimum value
1
compute first quartile (25th percentile)
2
compute median value (50th percentile)
3
compute third quartile (75th percentile)
4
compute maximum value

Returns

xs:anyAtomicType
the computed quartile, as numeric type

rank#2

declare  function excel:rank($x as xs:anyAtomicType, $numbers as xs:anyAtomicType*) as xs:decimal
This RANK function is same as the above, only that $order_ascending is set by default to false.

Parameters

x as xs:anyAtomicType
The number whose rank you want to find.
numbers as xs:anyAtomicType
the sequence of numbers or values castable to numbers. The sequence can be of any length.

Returns

xs:decimal
The rank of $x.

rank#3

declare  function excel:rank($x as xs:anyAtomicType, $numbers as xs:anyAtomicType*, $order_ascending as xs:boolean) as xs:decimal
Returns the rank of a number in a list of numbers. The rank of a number is its size relative to other values in a list. (If you were to sort the list, the rank of the number would be its position.) RANK gives duplicate numbers the same rank.

Parameters

x as xs:anyAtomicType
The number whose rank you want to find.
numbers as xs:anyAtomicType
The sequence of numbers or values castable to numbers. The sequence can be of any length.
order_ascending as xs:boolean
A boolean having the meaning:
false
then rank the number as if the sequence was sorted in descending order.
true
then rank the number as if the sequence was sorted in ascending order.

Returns

xs:decimal
The rank of $x.

slope#2

declare  function excel:slope($known_y as xs:anyAtomicType+, $known_x as xs:anyAtomicType+) as xs:anyAtomicType
Returns the slope of the linear regression line through data points in known_y's and known_x's. The slope is the vertical distance divided by the horizontal distance between any two points on the line, which is the rate of change along the regression line. It computes the formula:

sum((x - average_x)(y - average_y)) / sum((x - average_x)^2)

where average_x and average_y are computed with AVERAGE function.

Parameters

known_y as xs:anyAtomicType
the sequence of y numbers. The sequence can be of any length, from 1 up.
known_x as xs:anyAtomicType
the sequence of x numbers. The sequence can be of any length, from 1 up.

Returns

xs:anyAtomicType
The slope value, as numeric type

small#2

declare  function excel:small($numbers as xs:anyAtomicType*, $k as xs:integer) as xs:anyAtomicType
This function computes the k-th smallest value in a data set. Use this function to return values with a particular relative standing in a data set. If n is the number of data points in array, SMALL(array,1) equals the smallest value, and SMALL(array,n) equals the largest value.

Parameters

numbers as xs:anyAtomicType
A sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.
k as xs:integer
The position (from the smallest) in the sequence of data to return. Must have value between 1 and size of sequence.

Returns

xs:anyAtomicType
The k-th smallest value of $numbers.

standardize#3

declare  function excel:standardize($x as xs:anyAtomicType, $mean as xs:anyAtomicType, $standard_dev as xs:anyAtomicType) as xs:double
Returns a normalized value from a distribution characterized by mean and standard_dev.

The formula is (x - mean) / standard_dev .

Parameters

x as xs:anyAtomicType
is the value you want to normalize
mean as xs:anyAtomicType
is the arithmetic mean of the distribution.
standard_dev as xs:anyAtomicType
is the standard deviation of the distribution.

Returns

xs:double
The normalized x, as numeric type

var#1

declare  function excel:var($numbers as xs:anyAtomicType+) as xs:anyAtomicType
Estimates variance based on a sample.

The formula is sum(x - average_x)^2 / (n - 1).

average_x is computed with AVERAGE function.

n is the count of numbers from the sequence, excluding empty values.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.

Returns

xs:anyAtomicType
The variance, as numeric type

vara#1

declare  function excel:vara($numbers as xs:anyAtomicType+) as xs:anyAtomicType
Estimates variance based on a sample.

The formula is sum(x - average_x)^2 / (n - 1).

average_x is computed with AVERAGE function.

n is the size of sequence, including empty values.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.

Returns

xs:anyAtomicType
The variance, as numeric type

varp#1

declare  function excel:varp($numbers as xs:anyAtomicType+) as xs:anyAtomicType
Calculates variance based on the entire population.

The formula is sum(x - average_x)^2 / n.

average_x is computed with AVERAGE function.

n is the count of numbers from the sequence, excluding empty values.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.

Returns

xs:anyAtomicType
The variance, as numeric type

varpa#1

declare  function excel:varpa($numbers as xs:anyAtomicType+) as xs:anyAtomicType
Calculates variance based on the entire population.

The formula is sum(x - average_x)^2 / n.

average_x is computed with AVERAGE function.

n is the size of sequence, including empty values.

Parameters

numbers as xs:anyAtomicType
the sequence of numbers or values castable to numeric. The sequence can be of any length, from 1 up.

Returns

xs:anyAtomicType
The variance, as numeric type