This is gsl-ref.info, produced by makeinfo version 4.0 from
gsl-ref.texi.

INFO-DIR-SECTION Scientific software
START-INFO-DIR-ENTRY
* gsl-ref: (gsl-ref).                   GNU Scientific Library - Reference
END-INFO-DIR-ENTRY

   This file documents the GNU Scientific Library.

   Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001 The GSL Team.

   Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
copy of the license is included in the section entitled "GNU Free
Documentation License".



File: gsl-ref.info,  Node: The Gamma Distribution,  Next: The Flat (Uniform) Distribution,  Prev: The Levy skew alpha-Stable Distribution,  Up: Random Number Distributions

The Gamma Distribution
======================

 - Random: double gsl_ran_gamma (const gsl_rng * R, double A, double B)
     This function returns a random variate from the gamma
     distribution.  The distribution function is,

          p(x) dx = {1 \over \Gamma(a) b^a} x^{a-1} e^{-x/b} dx

     for x > 0.

 - Function: double gsl_ran_gamma_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     gamma distribution with parameters A and B, using the formula
     given above.



File: gsl-ref.info,  Node: The Flat (Uniform) Distribution,  Next: The Lognormal Distribution,  Prev: The Gamma Distribution,  Up: Random Number Distributions

The Flat (Uniform) Distribution
===============================

 - Random: double gsl_ran_flat (const gsl_rng * R, double A, double B)
     This function returns a random variate from the flat (uniform)
     distribution from A to B. The distribution is,

          p(x) dx = {1 \over (b-a)} dx

     if a <= x < b and 0 otherwise.

 - Function: double gsl_ran_flat_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     uniform distribution from A to B, using the formula given above.



File: gsl-ref.info,  Node: The Lognormal Distribution,  Next: The Chi-squared Distribution,  Prev: The Flat (Uniform) Distribution,  Up: Random Number Distributions

The Lognormal Distribution
==========================

 - Random: double gsl_ran_lognormal (const gsl_rng * R, double ZETA,
          double SIGMA)
     This function returns a random variate from the lognormal
     distribution.  The distribution function is,

          p(x) dx = {1 \over x \sqrt{2 \pi \sigma^2} } \exp(-(\ln(x) - \zeta)^2/2 \sigma^2) dx

     for x > 0.

 - Function: double gsl_ran_lognormal_pdf (double X, double ZETA,
          double SIGMA)
     This function computes the probability density p(x) at X for a
     lognormal distribution with parameters ZETA and SIGMA, using the
     formula given above.



File: gsl-ref.info,  Node: The Chi-squared Distribution,  Next: The F-distribution,  Prev: The Lognormal Distribution,  Up: Random Number Distributions

The Chi-squared Distribution
============================

   The chi-squared distribution arises in statistics.  If Y_i are n
independent gaussian random variates with zero mean and unit variance
then the sum-of-squares,

     X = \sum_i Y_i^2

has a chi-squared distribution with n degrees of freedom.

 - Random: double gsl_ran_chisq (const gsl_rng * R, double NU)
     This function returns a random variate from the chi-squared
     distribution with NU degrees of freedom. The distribution function
     is,

          p(x) dx = {1 \over 2 \Gamma(\nu/2) } (x/2)^{\nu/2 - 1} \exp(-x/2) dx

     for x >= 0.

 - Function: double gsl_ran_chisq_pdf (double X, double NU)
     This function computes the probability density p(x) at X for a
     chi-squared distribution with NU degrees of freedom, using the
     formula given above.



File: gsl-ref.info,  Node: The F-distribution,  Next: The t-distribution,  Prev: The Chi-squared Distribution,  Up: Random Number Distributions

The F-distribution
==================

   The F-distribution arises in statistics.  If Y_1 and Y_2 are
chi-squared deviates with \nu_1 and \nu_2 degrees of freedom then the
ratio,

     X = { (Y_1 / \nu_1) \over (Y_2 / \nu_2) }

has an F-distribution F(x;\nu_1,\nu_2).

 - Random: double gsl_ran_fdist (const gsl_rng * R, double NU1, double
          NU2)
     This function returns a random variate from the F-distribution
     with degrees of freedom NU1 and NU2. The distribution function is,

          p(x) dx =
             { \Gamma((\nu_1 + \nu_2)/2)
                  \over \Gamma(\nu_1/2) \Gamma(\nu_2/2) }
             \nu_1^{\nu_1/2} \nu_2^{\nu_2/2}
             x^{\nu_1/2 - 1} (\nu_2 + \nu_1 x)^{-\nu_1/2 -\nu_2/2} dx

     for x >= 0.

 - Function: double gsl_ran_fdist_pdf (double X, double NU1, double NU2)
     This function computes the probability density p(x) at X for an
     F-distribution with NU1 and NU2 degrees of freedom, using the
     formula given above.



File: gsl-ref.info,  Node: The t-distribution,  Next: The Beta Distribution,  Prev: The F-distribution,  Up: Random Number Distributions

The t-distribution
==================

   The t-distribution arises in statistics.  If Y_1 has a normal
distribution and Y_2 has a chi-squared distribution with \nu degrees of
freedom then the ratio,

     X = { Y_1 \over \sqrt{Y_2 / \nu} }

has a t-distribution t(x;\nu) with \nu degrees of freedom.

 - Random: double gsl_ran_tdist (const gsl_rng * R, double NU)
     This function returns a random variate from the t-distribution.
     The distribution function is,

          p(x) dx = {\Gamma((\nu + 1)/2) \over \sqrt{\pi \nu} \Gamma(\nu/2)}
             (1 + x^2/\nu)^{-(\nu + 1)/2} dx

     for -\infty < x < +\infty.

 - Function: double gsl_ran_tdist_pdf (double X, double NU)
     This function computes the probability density p(x) at X for a
     t-distribution with NU degrees of freedom, using the formula given
     above.



File: gsl-ref.info,  Node: The Beta Distribution,  Next: The Logistic Distribution,  Prev: The t-distribution,  Up: Random Number Distributions

The Beta Distribution
=====================

 - Random: double gsl_ran_beta (const gsl_rng * R, double A, double B)
     This function returns a random variate from the beta distribution.
     The distribution function is,

          p(x) dx = {\Gamma(a+b) \over \Gamma(a) \Gamma(b)} x^{a-1} (1-x)^{b-1} dx

     for 0 <= x <= 1.

 - Function: double gsl_ran_beta_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     beta distribution with parameters A and B, using the formula given
     above.



File: gsl-ref.info,  Node: The Logistic Distribution,  Next: The Pareto Distribution,  Prev: The Beta Distribution,  Up: Random Number Distributions

The Logistic Distribution
=========================

 - Random: double gsl_ran_logistic (const gsl_rng * R, double A)
     This function returns a random variate from the logistic
     distribution.  The distribution function is,

          p(x) dx = { \exp(-x/a) \over a (1 + \exp(-x/a))^2 } dx

     for -\infty < x < +\infty.

 - Function: double gsl_ran_logistic_pdf (double X, double A)
     This function computes the probability density p(x) at X for a
     logistic distribution with scale parameter A, using the formula
     given above.



File: gsl-ref.info,  Node: The Pareto Distribution,  Next: The Spherical Distribution (2D & 3D),  Prev: The Logistic Distribution,  Up: Random Number Distributions

The Pareto Distribution
=======================

 - Random: double gsl_ran_pareto (const gsl_rng * R, double A, double B)
     This function returns a random variate from the Pareto
     distribution of order A.  The distribution function is,

          p(x) dx = (a/b) / (x/b)^{a+1} dx

     for x >= b.

 - Function: double gsl_ran_pareto_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     Pareto distribution with exponent A and scale B, using the formula
     given above.



File: gsl-ref.info,  Node: The Spherical Distribution (2D & 3D),  Next: The Weibull Distribution,  Prev: The Pareto Distribution,  Up: Random Number Distributions

The Spherical Distribution (2D & 3D)
====================================

   The spherical distributions generate random vectors, located on a
spherical surface.  They can be used as random directions, for example
in the steps of a random walk.

 - Random: void gsl_ran_dir_2d (const gsl_rng * R, double *X, double *Y)
 - Random: void gsl_ran_dir_2d_trig_method (const gsl_rng * R, double
          *X, double *Y)
     This function returns a random direction vector v = (X,Y) in two
     dimensions.  The vector is normalized such that |v|^2 = x^2 + y^2
     = 1.  The obvious way to do this is to take a uniform random
     number between 0 and 2\pi and let X and Y be the sine and cosine
     respectively.  Two trig functions would have been expensive in the
     old days, but with modern hardware implementations, this is
     sometimes the fastest way to go.  This is the case for my home
     Pentium (but not the case for my Sun Sparcstation 20 at work).
     One can avoid the trig evaluations by choosing X and Y in the
     interior of a unit circle (choose them at random from the interior
     of the enclosing square, and then reject those that are outside
     the unit circle), and then dividing by \sqrt{x^2 + y^2}.  A much
     cleverer approach, attributed to von Neumann (See Knuth, v2, 3rd
     ed, p140, exercise 23), requires neither trig nor a square root.
     In this approach, U and V are chosen at random from the interior
     of a unit circle, and then x=(u^2-v^2)/(u^2+v^2) and
     y=2uv/(u^2+v^2).
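
     The trig-free approach can be sketched as follows, with
     x = (u^2-v^2)/(u^2+v^2) and y = 2uv/(u^2+v^2), so that x^2 + y^2 =
     1.  This is an illustration under stated assumptions rather than
     the library's implementation: the function names are hypothetical,
     and the C library `rand' stands in for a `gsl_rng' only to keep
     the sketch self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in uniform source on (-1,1); real code would draw from a
   gsl_rng.  rand() is an assumption of this sketch. */
static double
uniform_pm1 (void)
{
  return 2.0 * ((rand () + 0.5) / (RAND_MAX + 1.0)) - 1.0;
}

/* Map a point (u,v) inside the unit circle, excluding the origin,
   to a unit vector using only arithmetic (no trig, no sqrt). */
void
dir_2d_from_uv (double u, double v, double *x, double *y)
{
  double s = u * u + v * v;
  assert (s > 0.0 && s <= 1.0);
  *x = (u * u - v * v) / s;
  *y = 2.0 * u * v / s;
}

/* Rejection step: draw (u,v) from the enclosing square until the
   point falls inside the unit circle, then apply the map above. */
void
dir_2d_sketch (double *x, double *y)
{
  double u, v, s;

  do
    {
      u = uniform_pm1 ();
      v = uniform_pm1 ();
      s = u * u + v * v;
    }
  while (s > 1.0 || s == 0.0);

  dir_2d_from_uv (u, v, x, y);
}
```

     The map doubles the polar angle of (u,v); since that angle is
     uniform on the circle, so is the angle of the resulting direction.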

 - Random: void gsl_ran_dir_3d (const gsl_rng * R, double *X, double
          *Y, double * Z)
     This function returns a random direction vector v = (X,Y,Z) in
     three dimensions.  The vector is normalized such that |v|^2 = x^2
     + y^2 + z^2 = 1.  The method employed is due to Robert E. Knop
     (CACM 13, 326 (1970)), and explained in Knuth, v2, 3rd ed, p136.
     It uses the surprising fact that the distribution projected along
     any axis is actually uniform (this is only true for 3d).

 - Random: void gsl_ran_dir_nd (const gsl_rng * R, int N, double *X)
     This function returns a random direction vector v =
     (x_1,x_2,...,x_n) in N dimensions.  The vector is normalized such
     that |v|^2 = x_1^2 + x_2^2 + ... + x_n^2 = 1.  The method uses the
     fact that a multivariate gaussian distribution is spherically
     symmetric.  Each component is generated to have a gaussian
     distribution, and then the components are normalized.  The method
     is described by Knuth, v2, 3rd ed, p135-136, and attributed to G.
     W. Brown, Modern Mathematics for the Engineer (1956).


File: gsl-ref.info,  Node: The Weibull Distribution,  Next: The Type-1 Gumbel Distribution,  Prev: The Spherical Distribution (2D & 3D),  Up: Random Number Distributions

The Weibull Distribution
========================

 - Random: double gsl_ran_weibull (const gsl_rng * R, double A, double
          B)
     This function returns a random variate from the Weibull
     distribution.  The distribution function is,

          p(x) dx = {b \over a^b} x^{b-1}  \exp(-(x/a)^b) dx

     for x >= 0.

 - Function: double gsl_ran_weibull_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     Weibull distribution with scale A and exponent B, using the
     formula given above.



File: gsl-ref.info,  Node: The Type-1 Gumbel Distribution,  Next: The Type-2 Gumbel Distribution,  Prev: The Weibull Distribution,  Up: Random Number Distributions

The Type-1 Gumbel Distribution
==============================

 - Random: double gsl_ran_gumbel1 (const gsl_rng * R, double A, double
          B)
     This function returns a random variate from the Type-1 Gumbel
     distribution.  The Type-1 Gumbel distribution function is,

          p(x) dx = a b \exp(-(b \exp(-ax) + ax)) dx

     for -\infty < x < \infty.

 - Function: double gsl_ran_gumbel1_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     Type-1 Gumbel distribution with parameters A and B, using the
     formula given above.



File: gsl-ref.info,  Node: The Type-2 Gumbel Distribution,  Next: General Discrete Distributions,  Prev: The Type-1 Gumbel Distribution,  Up: Random Number Distributions

The Type-2 Gumbel Distribution
==============================

 - Random: double gsl_ran_gumbel2 (const gsl_rng * R, double A, double
          B)
     This function returns a random variate from the Type-2 Gumbel
     distribution.  The Type-2 Gumbel distribution function is,

          p(x) dx = a b x^{-a-1} \exp(-b x^{-a}) dx

     for 0 < x < \infty.

 - Function: double gsl_ran_gumbel2_pdf (double X, double A, double B)
     This function computes the probability density p(x) at X for a
     Type-2 Gumbel distribution with parameters A and B, using the
     formula given above.



File: gsl-ref.info,  Node: General Discrete Distributions,  Next: The Poisson Distribution,  Prev: The Type-2 Gumbel Distribution,  Up: Random Number Distributions

General Discrete Distributions
==============================

   Given K discrete events with different probabilities P[k], produce a
random value k consistent with its probability.

   The obvious way to do this is to preprocess the probability list by
generating a cumulative probability array with K+1 elements:

       C[0] = 0
     C[k+1] = C[k]+P[k].

Note that this construction produces C[K]=1.  Now choose a uniform
deviate u between 0 and 1, and find the value of k such that C[k] <= u
< C[k+1].  Although this in principle requires of order \log K steps per
random number generation, they are fast steps, and if you use something
like \lfloor uK \rfloor as a starting point, you can often do pretty
well.
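
   As a rough sketch of this cumulative-table method (the function names
are hypothetical, and the explicit normalization step is an addition so
that unnormalized weights can also be used):

```c
#include <assert.h>
#include <stddef.h>

/* Fill C[0..K] from weights P[0..K-1]:
   C[0] = 0, C[k+1] = C[k] + P[k], then normalize so C[K] = 1. */
void
cumulative_table (const double *P, size_t K, double *C)
{
  size_t k;
  double total;

  assert (K > 0);
  C[0] = 0.0;
  for (k = 0; k < K; k++)
    {
      assert (P[k] >= 0.0);
      C[k + 1] = C[k] + P[k];
    }

  total = C[K];
  assert (total > 0.0);
  for (k = 0; k < K; k++)
    C[k] /= total;
  C[K] = 1.0;
}

/* Binary search for the k with C[k] <= u < C[k+1], given a uniform
   deviate 0 <= u < 1.  Takes of order log K steps. */
size_t
cumulative_lookup (const double *C, size_t K, double u)
{
  size_t lo = 0, hi = K;        /* invariant: C[lo] <= u < C[hi] */

  assert (u >= 0.0 && u < 1.0);
  while (hi - lo > 1)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (u < C[mid])
        hi = mid;
      else
        lo = mid;
    }
  return lo;
}
```

With weights {1, 2, 1} the table is {0, 0.25, 0.75, 1}, so a deviate
u = 0.5 selects k = 1.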

   But faster methods have been devised.  Again, the idea is to
preprocess the probability list, and save the result in some form of
lookup table; then the individual calls for a random discrete event can
go rapidly.  An approach invented by G. Marsaglia (Generating discrete
random numbers in a computer, Comm ACM 6, 37-38 (1963)) is very clever,
and readers interested in examples of good algorithm design are
directed to this short and well-written paper.  Unfortunately, for
large K, Marsaglia's lookup table can be quite large.

   A much better approach is due to Alastair J. Walker (An efficient
method for generating discrete random variables with general
distributions, ACM Trans on Mathematical Software 3, 253-256 (1977);
see also Knuth, v2, 3rd ed, p120-121,139).  This requires two lookup
tables, one floating point and one integer, but both only of size K.
After preprocessing, the random numbers are generated in O(1) time,
even for large K.  The preprocessing suggested by Walker requires
O(K^2) effort, but that is not actually necessary, and the
implementation provided here only takes O(K) effort.  In general, more
preprocessing leads to faster generation of the individual random
numbers, but a diminishing return is reached pretty early.  Knuth points
out that the optimal preprocessing is combinatorially difficult for
large K.
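
   To make the two-table idea concrete, here is a sketch of an alias
table with O(K) preprocessing.  It uses a common stack-based
construction and is not necessarily how the library builds its tables;
the type `alias_table' and all function names are hypothetical, and the
caller is assumed to supply the two uniform deviates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical table type for this sketch (not gsl_ran_discrete_t). */
typedef struct
{
  size_t K;
  double *F;                    /* F[k]: chance of keeping column k */
  size_t *A;                    /* A[k]: alias returned otherwise */
} alias_table;

alias_table *
alias_preproc (size_t K, const double *P)
{
  size_t k, nbelow = 0, nabove = 0;
  size_t *below, *above;
  double total = 0.0;
  alias_table *t;

  assert (K > 0);
  t = malloc (sizeof *t);
  below = malloc (K * sizeof *below);
  above = malloc (K * sizeof *above);
  assert (t && below && above);
  t->K = K;
  t->F = malloc (K * sizeof *t->F);
  t->A = malloc (K * sizeof *t->A);
  assert (t->F && t->A);

  for (k = 0; k < K; k++)
    {
      assert (P[k] >= 0.0);
      total += P[k];
    }
  assert (total > 0.0);

  /* Scale so the average column height is 1, then split columns
     into those below and those at or above the average. */
  for (k = 0; k < K; k++)
    {
      t->F[k] = P[k] * K / total;
      t->A[k] = k;
      if (t->F[k] < 1.0)
        below[nbelow++] = k;
      else
        above[nabove++] = k;
    }

  /* Top up each short column from a tall one.  Every pass retires
     one column, so the loop runs at most K times. */
  while (nbelow > 0 && nabove > 0)
    {
      size_t s = below[--nbelow];
      size_t l = above[nabove - 1];
      t->A[s] = l;
      t->F[l] -= 1.0 - t->F[s];
      if (t->F[l] < 1.0)
        {
          nabove--;
          below[nbelow++] = l;
        }
    }

  /* Leftover columns differ from 1 only by rounding. */
  while (nabove > 0)
    t->F[above[--nabove]] = 1.0;
  while (nbelow > 0)
    t->F[below[--nbelow]] = 1.0;

  free (below);
  free (above);
  return t;
}

/* O(1) draw from two uniform deviates u1, u2 in [0,1). */
size_t
alias_draw (const alias_table *t, double u1, double u2)
{
  size_t k = (size_t) (u1 * t->K);
  if (k >= t->K)
    k = t->K - 1;               /* guard against rounding */
  return (u2 < t->F[k]) ? k : t->A[k];
}

/* Recover the normalized probability of event k, an O(K) scan. */
double
alias_pdf (const alias_table *t, size_t k)
{
  size_t j;
  double p = 0.0;
  for (j = 0; j < t->K; j++)
    {
      if (j == k)
        p += t->F[j];
      if (t->A[j] == k)
        p += 1.0 - t->F[j];
    }
  return p / t->K;
}

void
alias_free (alias_table *t)
{
  free (t->F);
  free (t->A);
  free (t);
}
```

For weights {1, 2, 1} the construction gives F = {0.75, 1, 0.75} with
columns 0 and 2 both aliased to event 1.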

   This method can be used to speed up some of the discrete random
number generators below, such as the binomial distribution.  To use it
for something like the Poisson Distribution, a modification would have
to be made, since it only takes a finite set of K outcomes.

 - Random: gsl_ran_discrete_t * gsl_ran_discrete_preproc (size_t K,
          const double * P)
     This function returns a pointer to a structure that contains the
     lookup table for the discrete random number generator.  The array
     P[] contains the probabilities of the discrete events; these array
     elements must all be positive, but they needn't add up to one (so
     you can think of them more generally as "weights") - the
     preprocessor will normalize appropriately.  This return value is
     used as an argument for the `gsl_ran_discrete' function below.

 - Random: size_t gsl_ran_discrete (const gsl_rng * R, const
          gsl_ran_discrete_t * G)
     After the preprocessor, above, has been called, you use this
     function to get the discrete random numbers.

 - Random: double gsl_ran_discrete_pdf (size_t K, const
          gsl_ran_discrete_t * G)
     Returns the probability P[k] of observing the variable K.  Since
     P[k] is not stored as part of the lookup table, it must be
     recomputed; this computation takes O(K), so if K is large and you
     care about the original array P[k] used to create the lookup
     table, then you should just keep this original array P[k] around.

 - Random: void gsl_ran_discrete_free (gsl_ran_discrete_t * G)
     De-allocates the lookup table pointed to by G.


File: gsl-ref.info,  Node: The Poisson Distribution,  Next: The Bernoulli Distribution,  Prev: General Discrete Distributions,  Up: Random Number Distributions

The Poisson Distribution
========================

 - Random: unsigned int gsl_ran_poisson (const gsl_rng * R, double MU)
     This function returns a random integer from the Poisson
     distribution with mean MU.  The probability distribution for
     Poisson variates is,

          p(k) = {\mu^k \over k!} \exp(-\mu)

     for k >= 0.

 - Function: double gsl_ran_poisson_pdf (unsigned int K, double MU)
     This function computes the probability p(k) of obtaining K from a
     Poisson distribution with mean MU, using the formula given above.



File: gsl-ref.info,  Node: The Bernoulli Distribution,  Next: The Binomial Distribution,  Prev: The Poisson Distribution,  Up: Random Number Distributions

The Bernoulli Distribution
==========================

 - Random: unsigned int gsl_ran_bernoulli (const gsl_rng * R, double P)
     This function returns either 0 or 1, the result of a Bernoulli
     trial with probability P.  The probability distribution for a
     Bernoulli trial is,

          p(0) = 1 - p
          p(1) = p


 - Function: double gsl_ran_bernoulli_pdf (unsigned int K, double P)
     This function computes the probability p(k) of obtaining K from a
     Bernoulli distribution with probability parameter P, using the
     formula given above.



File: gsl-ref.info,  Node: The Binomial Distribution,  Next: The Negative Binomial Distribution,  Prev: The Bernoulli Distribution,  Up: Random Number Distributions

The Binomial Distribution
=========================

 - Random: unsigned int gsl_ran_binomial (const gsl_rng * R, double P,
          unsigned int N)
     This function returns a random integer from the binomial
     distribution, the number of successes in N independent trials with
     probability P.  The probability distribution for binomial variates
     is,

          p(k) = {n! \over k! (n-k)! } p^k (1-p)^{n-k}

     for 0 <= k <= n.

 - Function: double gsl_ran_binomial_pdf (unsigned int K, double P,
          unsigned int N)
     This function computes the probability p(k) of obtaining K from a
     binomial distribution with parameters P and N, using the formula
     given above.



File: gsl-ref.info,  Node: The Negative Binomial Distribution,  Next: The Pascal Distribution,  Prev: The Binomial Distribution,  Up: Random Number Distributions

The Negative Binomial Distribution
==================================

 - Random: unsigned int gsl_ran_negative_binomial (const gsl_rng * R,
          double P, double N)
     This function returns a random integer from the negative binomial
     distribution, the number of failures occurring before N successes
     in independent trials with probability P of success.  The
     probability distribution for negative binomial variates is,

          p(k) = {\Gamma(n + k) \over \Gamma(k+1) \Gamma(n) } p^n (1-p)^k

     Note that n is not required to be an integer.

 - Function: double gsl_ran_negative_binomial_pdf (unsigned int K,
          double P, double N)
     This function computes the probability p(k) of obtaining K from a
     negative binomial distribution with parameters P and N, using the
     formula given above.



File: gsl-ref.info,  Node: The Pascal Distribution,  Next: The Geometric Distribution,  Prev: The Negative Binomial Distribution,  Up: Random Number Distributions

The Pascal Distribution
=======================

 - Random: unsigned int gsl_ran_pascal (const gsl_rng * R, double P,
          unsigned int N)
     This function returns a random integer from the Pascal
     distribution.  The Pascal distribution is simply a negative
     binomial distribution with an integer value of n,

          p(k) = {(n + k - 1)! \over k! (n - 1)! } p^n (1-p)^k

     for k >= 0.

 - Function: double gsl_ran_pascal_pdf (unsigned int K, double P,
          unsigned int N)
     This function computes the probability p(k) of obtaining K from a
     Pascal distribution with parameters P and N, using the formula
     given above.



File: gsl-ref.info,  Node: The Geometric Distribution,  Next: The Hypergeometric Distribution,  Prev: The Pascal Distribution,  Up: Random Number Distributions

The Geometric Distribution
==========================

 - Random: unsigned int gsl_ran_geometric (const gsl_rng * R, double P)
     This function returns a random integer from the geometric
     distribution, the number of independent trials with probability P
     until the first success.  The probability distribution for
     geometric variates is,

          p(k) = p (1-p)^{k-1}

     for k >= 1.

 - Function: double gsl_ran_geometric_pdf (unsigned int K, double P)
     This function computes the probability p(k) of obtaining K from a
     geometric distribution with probability parameter P, using the
     formula given above.



File: gsl-ref.info,  Node: The Hypergeometric Distribution,  Next: The Logarithmic Distribution,  Prev: The Geometric Distribution,  Up: Random Number Distributions

The Hypergeometric Distribution
===============================

 - Random: unsigned int gsl_ran_hypergeometric (const gsl_rng * R,
          unsigned int N1, unsigned int N2, unsigned int T)
     This function returns a random integer from the hypergeometric
     distribution.  The probability distribution for hypergeometric
     random variates is,

          p(k) = C(n_1, k) C(n_2, t - k) / C(n_1 + n_2, t)

     where C(a,b) = a!/(b!(a-b)!).  The domain of k is max(0,t-n_2),
     ..., min(t,n_1).

 - Function: double gsl_ran_hypergeometric_pdf (unsigned int K,
          unsigned int N1, unsigned int N2, unsigned int T)
     This function computes the probability p(k) of obtaining K from a
     hypergeometric distribution with parameters N1, N2, T, using the
     formula given above.



File: gsl-ref.info,  Node: The Logarithmic Distribution,  Next: Shuffling and Sampling,  Prev: The Hypergeometric Distribution,  Up: Random Number Distributions

The Logarithmic Distribution
============================

 - Random: unsigned int gsl_ran_logarithmic (const gsl_rng * R, double
          P)
     This function returns a random integer from the logarithmic
     distribution.  The probability distribution for logarithmic random
     variates is,

          p(k) = {-1 \over \log(1-p)} {p^k \over k}

     for k >= 1.

 - Function: double gsl_ran_logarithmic_pdf (unsigned int K, double P)
     This function computes the probability p(k) of obtaining K from a
     logarithmic distribution with probability parameter P, using the
     formula given above.



File: gsl-ref.info,  Node: Shuffling and Sampling,  Next: Random Number Distribution Examples,  Prev: The Logarithmic Distribution,  Up: Random Number Distributions

Shuffling and Sampling
======================

   The following functions allow the shuffling and sampling of a set of
objects.  The algorithms rely on a random number generator as source of
randomness and a poor quality generator can lead to correlations in the
output.  In particular it is important to avoid generators with a short
period.  For more information see Knuth, v2, 3rd ed, Section 3.4.2,
"Random Sampling and Shuffling".

 - Random: void gsl_ran_shuffle (const gsl_rng * R, void * BASE, size_t
          N, size_t SIZE)
     This function randomly shuffles the order of N objects, each of
     size SIZE, stored in the array BASE[0..N-1].  The output of the
     random number generator R is used to produce the permutation.  The
     algorithm generates all possible N! permutations with equal
     probability, assuming a perfect source of random numbers.

     The following code shows how to shuffle the numbers from 0 to 51,

          int i;
          int a[52];
          
          for (i = 0; i < 52; i++)
            {
              a[i] = i;
            }
          
          gsl_ran_shuffle (r, a, 52, sizeof (int));
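
     For readers curious how such a shuffle can produce every
     permutation with equal probability, the following sketch shows
     the classic exchange method (often called the Fisher-Yates or
     Knuth shuffle) on an array of arbitrary objects.  It is an
     illustration, not the library's source: the function names are
     hypothetical and the C library `rand' stands in for the generator
     R, so its quality limits the quality of the shuffle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a gsl_rng: uniform integer in [0, n-1] from rand().
   Values at or above the largest multiple of n are rejected so the
   result is unbiased. */
static size_t
uniform_index (size_t n)
{
  unsigned long range = (unsigned long) RAND_MAX + 1UL;
  unsigned long limit, r;

  assert (n > 0 && n <= range);
  limit = range / n * n;
  do
    r = (unsigned long) rand ();
  while (r >= limit);
  return (size_t) (r % n);
}

/* Shuffle n objects of the given size stored contiguously at base. */
void
shuffle_sketch (void *base, size_t n, size_t size)
{
  char *a = base;
  char *tmp;
  size_t i;

  if (n < 2 || size == 0)
    return;

  tmp = malloc (size);
  assert (tmp != NULL);

  /* Swap element i with a uniformly chosen element j <= i. */
  for (i = n - 1; i > 0; i--)
    {
      size_t j = uniform_index (i + 1);
      if (j != i)
        {
          memcpy (tmp, a + i * size, size);
          memcpy (a + i * size, a + j * size, size);
          memcpy (a + j * size, tmp, size);
        }
    }

  free (tmp);
}
```

     Calling `shuffle_sketch (a, 52, sizeof (int))' on the array above
     rearranges it in place; every value from 0 to 51 still appears
     exactly once.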


 - Random: int gsl_ran_choose (const gsl_rng * R, void * DEST, size_t
          K, void * SRC, size_t N, size_t SIZE)
     This function fills the array DEST[0..K-1] with K objects taken
     randomly from the N elements of the array SRC[0..N-1].  The
     objects are each of size SIZE.  The output of the random number
     generator R is used to make the selection.  The algorithm ensures
     all possible samples are equally likely, assuming a perfect source
     of randomness.

     The objects are sampled _without_ replacement, thus each object can
     only appear once in DEST.  It is required that K be less than
     or equal to N.  The objects in DEST will be in the same relative
     order as those in SRC.  You will need to call `gsl_ran_shuffle(r,
     dest, k, size)' if you want to randomize the order.

     The following code shows how to select a random sample of three
     unique numbers from the set 0 to 99,

          int i;
          double a[3], b[100];
          
          for (i = 0; i < 100; i++)
            {
              b[i] = (double) i;
            }
          
          gsl_ran_choose (r, a, 3, b, 100, sizeof (double));
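
     One way to obtain an order-preserving sample without replacement
     is selection sampling (Knuth's Algorithm S), in which object i is
     accepted with probability (K - taken) / (N - i).  The sketch below
     illustrates that technique; the function names are hypothetical,
     `rand' is a stand-in for the generator R, and the code should not
     be read as the library's source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in uniform deviate strictly inside (0,1). */
static double
uniform01 (void)
{
  return (rand () + 0.5) / (RAND_MAX + 1.0);
}

/* Copy k of the n objects in src to dest, keeping their relative
   order.  Returns 0 on success, -1 if k > n. */
int
choose_sketch (void *dest, size_t k, const void *src, size_t n,
               size_t size)
{
  const char *s = src;
  char *d = dest;
  size_t i, taken = 0;

  if (k > n)
    return -1;

  for (i = 0; i < n && taken < k; i++)
    {
      /* Accept object i with probability (k - taken) / (n - i);
         once n - i == k - taken every remaining object is kept. */
      if ((double) (n - i) * uniform01 () < (double) (k - taken))
        {
          memcpy (d + taken * size, s + i * size, size);
          taken++;
        }
    }

  assert (taken == k);
  return 0;
}
```

     Because objects are visited in order, the three values selected
     from B in the example above would come out in increasing order.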


 - Random: void gsl_ran_sample (const gsl_rng * R, void * DEST, size_t
          K, void * SRC, size_t N, size_t SIZE)
     This function is like `gsl_ran_choose' but samples K items from
     the original array of N items SRC with replacement, so the same
     object can appear more than once in the output sequence DEST.
     There is no requirement that K be less than N in this case.


File: gsl-ref.info,  Node: Random Number Distribution Examples,  Next: Random Number Distribution References and Further Reading,  Prev: Shuffling and Sampling,  Up: Random Number Distributions

Examples
========

   The following program demonstrates the use of a random number
generator to produce variates from a distribution.  It prints 10
samples from the Poisson distribution with a mean of 3.

     #include <stdio.h>
     #include <gsl/gsl_rng.h>
     #include <gsl/gsl_randist.h>
     
     int
     main (void)
     {
       const gsl_rng_type * T;
       gsl_rng * r;
     
       int i, n = 10;
       double mu = 3.0;
     
       /* create a generator chosen by the
          environment variable GSL_RNG_TYPE */
     
       gsl_rng_env_setup();
     
       T = gsl_rng_default;
       r = gsl_rng_alloc (T);
     
       /* print n random variates chosen from
          the poisson distribution with mean
          parameter mu */
     
       for (i = 0; i < n; i++)
         {
           unsigned int k = gsl_ran_poisson (r, mu);
           printf(" %u", k);
         }
     
       printf("\n");
       gsl_rng_free (r);
       return 0;
     }

If the library and header files are installed under `/usr/local' (the
default location) then the program can be compiled with these options,

     gcc demo.c -lgsl -lgslcblas -lm

Here is the output of the program,

     $ ./a.out
      4 2 3 3 1 3 4 1 3 5

The variates depend on the seed used by the generator.  The seed for the
default generator type `gsl_rng_default' can be changed with the
`GSL_RNG_SEED' environment variable to produce a different stream of
variates,

     $ GSL_RNG_SEED=123 ./a.out
     GSL_RNG_SEED=123
      1 1 2 1 2 6 2 1 8 7

The following program generates a random walk in two dimensions.

     #include <stdio.h>
     #include <gsl/gsl_rng.h>
     #include <gsl/gsl_randist.h>
     
     int
     main (void)
     {
       int i;
       double x = 0, y = 0, dx, dy;
     
       const gsl_rng_type * T;
       gsl_rng * r;
     
       gsl_rng_env_setup();
       T = gsl_rng_default;
       r = gsl_rng_alloc (T);
     
       printf("%g %g\n", x, y);
     
       for (i = 0; i < 10; i++)
         {
           gsl_ran_dir_2d (r, &dx, &dy);
           x += dx; y += dy;
           printf("%g %g\n", x, y);
         }
       gsl_rng_free (r);
       return 0;
     }

[Figure: example output from the program, three 10-step random walks
from the origin.]



File: gsl-ref.info,  Node: Random Number Distribution References and Further Reading,  Prev: Random Number Distribution Examples,  Up: Random Number Distributions

References and Further Reading
==============================

For an encyclopaedic coverage of the subject readers are advised to
consult the book `Non-Uniform Random Variate Generation' by Luc
Devroye.  It covers every imaginable distribution and provides hundreds
of algorithms.

     Luc Devroye, `Non-Uniform Random Variate Generation',
     Springer-Verlag, ISBN 0-387-96305-7.

The subject of random variate generation is also reviewed by Knuth, who
describes algorithms for all the major distributions.

     Donald E. Knuth, `The Art of Computer Programming: Seminumerical
     Algorithms' (Vol 2, 3rd Ed, 1997), Addison-Wesley, ISBN 0201896842.

The Particle Data Group provides a short review of techniques for
generating distributions of random numbers in the "Monte Carlo" section
of its Annual Review of Particle Physics.

     `Review of Particle Properties' R.M. Barnett et al., Physical
     Review D54, 1 (1996) <http://pdg.lbl.gov/>.

The Review of Particle Physics is available online in PostScript and PDF
format.


File: gsl-ref.info,  Node: Statistics,  Next: Histograms,  Prev: Random Number Distributions,  Up: Top

Statistics
**********

   This chapter describes the statistical functions in the library.  The
basic statistical functions include routines to compute the mean,
variance and standard deviation.  More advanced functions allow you to
calculate absolute deviations, skewness, and kurtosis as well as the
median and arbitrary percentiles.  The algorithms use recurrence
relations to compute average quantities in a stable way, without large
intermediate values that might overflow.

   The functions are available in versions for datasets in the standard
floating-point and integer types.  The versions for double precision
floating-point data have the prefix `gsl_stats' and are declared in the
header file `gsl_statistics_double.h'.  The versions for integer data
have the prefix `gsl_stats_int' and are declared in the header file
`gsl_statistics_int.h'.

* Menu:

* Mean and standard deviation and variance::
* Absolute deviation::
* Higher moments (skewness and kurtosis)::
* Autocorrelation::
* Covariance::
* Weighted Samples::
* Maximum and Minimum values::
* Median and Percentiles::
* Example statistical programs::
* Statistics References and Further Reading::


File: gsl-ref.info,  Node: Mean and standard deviation and variance,  Next: Absolute deviation,  Up: Statistics

Mean, Standard Deviation and Variance
=====================================

 - Statistics: double gsl_stats_mean (const double DATA[], size_t
          STRIDE, size_t N)
     This function returns the arithmetic mean of DATA, a dataset of
     length N with stride STRIDE.  The arithmetic mean, or "sample
     mean", is denoted by \Hat\mu and defined as,

          \Hat\mu = (1/N) \sum x_i

     where x_i are the elements of the dataset DATA.  For samples drawn
     from a Gaussian distribution the variance of \Hat\mu is \sigma^2 /
     N.
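
     The recurrence-based averaging mentioned in the chapter
     introduction can be sketched as follows.  This is a minimal
     illustration under assumptions, not the library's source: the
     name `sketch_mean' is hypothetical, and the running-mean update
     is one common way to avoid forming a large intermediate sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: arithmetic mean of n
   elements read every `stride` slots, using a running-mean update so
   that no large intermediate sum is formed. */
double sketch_mean(const double data[], size_t stride, size_t n)
{
    double mean = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        /* mean_{i+1} = mean_i + (x_i - mean_i) / (i + 1) */
        mean += (data[i * stride] - mean) / (double) (i + 1);
    }
    return mean;
}
```

     With a stride of 2 the sketch reads every second element, which
     is how one column of an interleaved array would be averaged.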

 - Statistics: double gsl_stats_variance (const double DATA[], size_t
          STRIDE, size_t N)
     This function returns the estimated, or "sample", variance of
     DATA, a dataset of length N with stride STRIDE.  The estimated
     variance is denoted by \Hat\sigma^2 and is defined by,

          \Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2

     where x_i are the elements of the dataset DATA.  Note that the
     normalization factor of 1/(N-1) results from the derivation of
     \Hat\sigma^2 as an unbiased estimator of the population variance
     \sigma^2.  For samples drawn from a Gaussian distribution the
     variance of \Hat\sigma^2 itself is 2 \sigma^4 / N.

     This function computes the mean via a call to `gsl_stats_mean'.  If
     you have already computed the mean then you can pass it directly to
     `gsl_stats_variance_m'.
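
     As an illustration of the 1/(N-1) normalization, the definition
     can be written out directly.  The helper `sketch_variance' below
     is a hypothetical name and a plain two-pass transcription of the
     formula; the library routine may organize the computation
     differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the library routine: two-pass sample
   variance with the 1/(N-1) normalization. */
double sketch_variance(const double data[], size_t stride, size_t n)
{
    double mean = 0.0, sumsq = 0.0;
    size_t i;

    assert(n > 1);               /* 1/(N-1) is undefined for N = 1 */

    for (i = 0; i < n; i++)      /* first pass: sample mean */
        mean += data[i * stride];
    mean /= (double) n;

    for (i = 0; i < n; i++) {    /* second pass: squared deviations */
        const double d = data[i * stride] - mean;
        sumsq += d * d;
    }
    return sumsq / (double) (n - 1);
}
```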

 - Statistics: double gsl_stats_variance_m (const double DATA[], size_t
          STRIDE, size_t N, double MEAN)
     This function returns the sample variance of DATA relative to the
     given value of MEAN.  The variance is computed with \Hat\mu
     replaced by the value of MEAN that you supply,

          \Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2

 - Statistics: double gsl_stats_sd (const double DATA[], size_t STRIDE,
          size_t N)
 - Statistics: double gsl_stats_sd_m (const double DATA[], size_t
          STRIDE, size_t N, double MEAN)
     The standard deviation is defined as the square root of the
     variance.  These functions return the square root of the
     corresponding variance functions above.

 - Statistics: double gsl_stats_variance_with_fixed_mean (const double
          DATA[], size_t STRIDE, size_t N, double MEAN)
     This function computes an unbiased estimate of the variance of
     DATA when the population mean MEAN of the underlying distribution
     is known _a priori_.  In this case the estimator for the variance
     uses the factor 1/N and the sample mean \Hat\mu is replaced by the
     known population mean \mu,

          \Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2


 - Statistics: double gsl_stats_sd_with_fixed_mean (const double
          DATA[], size_t STRIDE, size_t N, double MEAN)
     This function calculates the standard deviation of DATA for a
     fixed population mean MEAN.  The result is the square root of the
     corresponding variance function.


File: gsl-ref.info,  Node: Absolute deviation,  Next: Higher moments (skewness and kurtosis),  Prev: Mean and standard deviation and variance,  Up: Statistics

Absolute deviation
==================

 - Statistics: double gsl_stats_absdev (const double DATA[], size_t
          STRIDE, size_t N)
     This function computes the absolute deviation from the mean of
     DATA, a dataset of length N with stride STRIDE.  The absolute
     deviation from the mean is defined as,

          absdev  = (1/N) \sum |x_i - \Hat\mu|

     where x_i are the elements of the dataset DATA.  The absolute
     deviation from the mean provides a more robust measure of the
     width of a distribution than the variance.  This function computes
     the mean of DATA via a call to `gsl_stats_mean'.

 - Statistics: double gsl_stats_absdev_m (const double DATA[], size_t
          STRIDE, size_t N, double MEAN)
     This function computes the absolute deviation of the dataset DATA
     relative to the given value of MEAN,

          absdev  = (1/N) \sum |x_i - mean|

     This function is useful if you have already computed the mean of
     DATA (and want to avoid recomputing it), or wish to calculate the
     absolute deviation relative to another value (such as zero, or the
     median).
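
     The formula above can be sketched as a short loop.  The name
     `sketch_absdev_m' and its CENTER argument are hypothetical and
     stand in for the MEAN argument described above; passing the
     median instead of the mean gives the deviation about the median.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the library routine: mean absolute
   deviation of the data about a caller-supplied center value. */
double sketch_absdev_m(const double data[], size_t stride, size_t n,
                       double center)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        const double d = data[i * stride] - center;
        sum += (d < 0.0) ? -d : d;   /* |x_i - center| */
    }
    return sum / (double) n;
}
```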


File: gsl-ref.info,  Node: Higher moments (skewness and kurtosis),  Next: Autocorrelation,  Prev: Absolute deviation,  Up: Statistics

Higher moments (skewness and kurtosis)
======================================

 - Statistics: double gsl_stats_skew (const double DATA[], size_t
          STRIDE, size_t N)
     This function computes the skewness of DATA, a dataset of length N
     with stride STRIDE.  The skewness is defined as,

          skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3

     where x_i are the elements of the dataset DATA.  The skewness
     measures the asymmetry of the tails of a distribution.

     The function computes the mean and estimated standard deviation of
     DATA via calls to `gsl_stats_mean' and `gsl_stats_sd'.

 - Statistics: double gsl_stats_skew_m_sd (const double DATA[], size_t
          STRIDE, size_t N, double MEAN, double SD)
     This function computes the skewness of the dataset DATA using the
     given values of the mean MEAN and standard deviation SD,

          skew = (1/N) \sum ((x_i - mean)/sd)^3

     This function is useful if you have already computed the mean and
     standard deviation of DATA and want to avoid recomputing them.

 - Statistics: double gsl_stats_kurtosis (const double DATA[], size_t
          STRIDE, size_t N)
     This function computes the kurtosis of DATA, a dataset of length N
     with stride STRIDE.  The kurtosis is defined as,

          kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4)  - 3

     The kurtosis measures how sharply peaked a distribution is,
     relative to its width.  The kurtosis is normalized to zero for a
     Gaussian distribution.

 - Statistics: double gsl_stats_kurtosis_m_sd (const double DATA[],
          size_t STRIDE, size_t N, double MEAN, double SD)
     This function computes the kurtosis of the dataset DATA using the
     given values of the mean MEAN and standard deviation SD,

          kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3

     This function is useful if you have already computed the mean and
     standard deviation of DATA and want to avoid recomputing them.
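
     The skewness and kurtosis definitions with a supplied mean and
     standard deviation can be transcribed as below.  Both function
     names are hypothetical, and the code is a direct reading of the
     two formulas rather than the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: (1/N) sum ((x_i - mean)/sd)^3 */
double sketch_skew_m_sd(const double data[], size_t stride, size_t n,
                        double mean, double sd)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0 && sd > 0.0);
    for (i = 0; i < n; i++) {
        const double z = (data[i * stride] - mean) / sd;
        sum += z * z * z;
    }
    return sum / (double) n;
}

/* Hypothetical helper: ((1/N) sum ((x_i - mean)/sd)^4) - 3 */
double sketch_kurtosis_m_sd(const double data[], size_t stride, size_t n,
                            double mean, double sd)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0 && sd > 0.0);
    for (i = 0; i < n; i++) {
        const double z = (data[i * stride] - mean) / sd;
        sum += z * z * z * z;
    }
    return sum / (double) n - 3.0;   /* zero for a Gaussian */
}
```

     For the dataset {0, 0, 0, 4} the sample mean is 1 and the sample
     standard deviation is 2, so these values can be passed in
     directly without recomputing them.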


File: gsl-ref.info,  Node: Autocorrelation,  Next: Covariance,  Prev: Higher moments (skewness and kurtosis),  Up: Statistics

Autocorrelation
===============

 - Function: double gsl_stats_lag1_autocorrelation (const double
          DATA[], const size_t STRIDE, const size_t N)
     This function computes the lag-1 autocorrelation of the dataset
     DATA, defined as,

          a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu)
                 \over
                 \sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)}


 - Function: double gsl_stats_lag1_autocorrelation_m (const double
          DATA[], const size_t STRIDE, const size_t N, const double
          MEAN)
     This function computes the lag-1 autocorrelation of the dataset
     DATA using the given value of the mean MEAN.
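
     A direct reading of the lag-1 ratio, with the mean supplied by
     the caller, might look like the sketch below.  The function name
     is hypothetical.  The numerator sums products of consecutive
     deviations, so it starts at the second element; the denominator
     sums squared deviations over all elements.  The library may
     evaluate the same ratio with a different arrangement of the
     arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lag-1 autocorrelation about a given mean.
   The caller must ensure the data are not all equal to `mean`,
   otherwise the denominator is zero. */
double sketch_lag1_autocorrelation_m(const double data[], size_t stride,
                                     size_t n, double mean)
{
    double num = 0.0, den;
    size_t i;

    assert(n > 1);
    den = (data[0] - mean) * (data[0] - mean);
    for (i = 1; i < n; i++) {
        const double d0 = data[(i - 1) * stride] - mean;
        const double d1 = data[i * stride] - mean;
        num += d0 * d1;   /* consecutive pair (x_{i-1}, x_i) */
        den += d1 * d1;
    }
    return num / den;
}
```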



File: gsl-ref.info,  Node: Covariance,  Next: Weighted Samples,  Prev: Autocorrelation,  Up: Statistics

Covariance
==========

 - Function: double gsl_stats_covariance (const double DATA1[], const
          size_t STRIDE1, const double DATA2[], const size_t STRIDE2,
          const size_t N)
     This function computes the covariance of the datasets DATA1 and
     DATA2 which must both be of the same length N.

          covar = (1/(n - 1)) \sum_{i = 1}^{n} (x_i - \Hat x) (y_i - \Hat y)


 - Function: double gsl_stats_covariance_m (const double DATA1[], const
          size_t STRIDE1, const double DATA2[], const size_t STRIDE2,
          const size_t N, const double MEAN1, const double MEAN2)
     This function computes the covariance of the datasets DATA1 and
     DATA2 using the given values of the means, MEAN1 and MEAN2.
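
     The covariance definition, with each dataset read through its
     own stride, can be sketched as follows.  The name
     `sketch_covariance' is hypothetical and the two-pass layout is
     an assumption chosen for clarity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sample covariance of two equal-length datasets,
   each with its own stride, using the 1/(n-1) normalization. */
double sketch_covariance(const double data1[], size_t stride1,
                         const double data2[], size_t stride2, size_t n)
{
    double mean1 = 0.0, mean2 = 0.0, sum = 0.0;
    size_t i;

    assert(n > 1);
    for (i = 0; i < n; i++) {        /* first pass: both means */
        mean1 += data1[i * stride1];
        mean2 += data2[i * stride2];
    }
    mean1 /= (double) n;
    mean2 /= (double) n;

    for (i = 0; i < n; i++)          /* second pass: cross products */
        sum += (data1[i * stride1] - mean1) * (data2[i * stride2] - mean2);

    return sum / (double) (n - 1);
}
```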



File: gsl-ref.info,  Node: Weighted Samples,  Next: Maximum and Minimum values,  Prev: Covariance,  Up: Statistics

Weighted Samples
================

   The functions described in this section allow the computation of
statistics for weighted samples.  The functions accept an array of
samples, x_i, with associated weights, w_i.  Each sample x_i is
considered as having been drawn from a Gaussian distribution with
variance \sigma_i^2.  The sample weight w_i is defined as the
reciprocal of this variance, w_i = 1/\sigma_i^2.  Setting a weight to
zero corresponds to removing a sample from a dataset.

 - Statistics: double gsl_stats_wmean (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N)
     This function returns the weighted mean of the dataset DATA with
     stride STRIDE and length N, using the set of weights W with stride
     WSTRIDE and length N.  The weighted mean is defined as,

          \Hat\mu = (\sum w_i x_i) / (\sum w_i)
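
     A minimal sketch of the weighted mean, assuming the plain
     ratio-of-sums form of the formula above.  The name `sketch_wmean'
     is hypothetical.  Samples with zero weight are skipped, which
     matches the statement that a zero weight removes a sample from
     the dataset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: weighted mean (sum w_i x_i) / (sum w_i). */
double sketch_wmean(const double w[], size_t wstride,
                    const double data[], size_t stride, size_t n)
{
    double wsum = 0.0, wxsum = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        const double wi = w[i * wstride];
        if (wi > 0.0) {              /* zero weight: sample is ignored */
            wsum += wi;
            wxsum += wi * data[i * stride];
        }
    }
    assert(wsum > 0.0);              /* at least one sample must count */
    return wxsum / wsum;
}
```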

 - Statistics: double gsl_stats_wvariance (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N)
     This function returns the estimated variance of the dataset DATA
     with stride STRIDE and length N, using the set of weights W with
     stride WSTRIDE and length N.  The estimated variance of a weighted
     dataset is defined as,

          \Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2)))
                          \sum w_i (x_i - \Hat\mu)^2

     Note that this expression reduces to an unweighted variance with
     the familiar 1/(N-1) factor when there are N equal non-zero
     weights.
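
     The reduction to the 1/(N-1) factor can be checked numerically
     with a direct transcription of the formula.  The helper
     `sketch_wvariance' is a hypothetical name and is not the library
     routine.  With N equal weights w, the prefactor becomes
     N w / (N^2 w^2 - N w^2) = 1 / (w (N-1)), and the remaining factor
     of w cancels against the weights inside the sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: weighted variance
   ((sum w) / ((sum w)^2 - sum w^2)) * sum w_i (x_i - wmean)^2 */
double sketch_wvariance(const double w[], size_t wstride,
                        const double data[], size_t stride, size_t n)
{
    double a = 0.0;      /* sum of weights */
    double b = 0.0;      /* sum of squared weights */
    double wx = 0.0;     /* sum of w_i * x_i */
    double s = 0.0;      /* sum of w_i * (x_i - wmean)^2 */
    double wmean;
    size_t i;

    for (i = 0; i < n; i++) {
        const double wi = w[i * wstride];
        a += wi;
        b += wi * wi;
        wx += wi * data[i * stride];
    }
    assert(a * a - b > 0.0);   /* needs at least two non-zero weights */
    wmean = wx / a;

    for (i = 0; i < n; i++) {
        const double d = data[i * stride] - wmean;
        s += w[i * wstride] * d * d;
    }
    return (a / (a * a - b)) * s;
}
```

     For the data {0, 0, 0, 4} with every weight equal to 2, the
     sketch gives 4, the same value as the unweighted sample variance
     of that dataset.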

 - Statistics: double gsl_stats_wvariance_m (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N, double
          WMEAN)
     This function returns the estimated variance of the weighted
     dataset DATA using the given weighted mean WMEAN.

 - Statistics: double gsl_stats_wsd (const double W[], size_t WSTRIDE,
          const double DATA[], size_t STRIDE, size_t N)
     The standard deviation is defined as the square root of the
     variance.  This function returns the square root of the
     corresponding variance function `gsl_stats_wvariance' above.

 - Statistics: double gsl_stats_wsd_m (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N, double
          WMEAN)
     This function returns the square root of the corresponding variance
     function `gsl_stats_wvariance_m' above.

 - Statistics: double gsl_stats_wvariance_with_fixed_mean (const double
          W[], size_t WSTRIDE, const double DATA[], size_t STRIDE,
          size_t N, const double MEAN)
     This function computes an unbiased estimate of the variance of the
     weighted dataset DATA when the population mean MEAN of the
     underlying distribution is known _a priori_.  In this case the
     estimator for the variance replaces the sample mean \Hat\mu by the
     known population mean \mu,

          \Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i)

 - Statistics: double gsl_stats_wsd_with_fixed_mean (const double W[],
          size_t WSTRIDE, const double DATA[], size_t STRIDE, size_t N,
          const double MEAN)
     The standard deviation is defined as the square root of the
     variance.  This function returns the square root of the
     corresponding variance function above.

 - Statistics: double gsl_stats_wabsdev (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N)
     This function computes the weighted absolute deviation from the
     weighted mean of DATA.  The weighted absolute deviation from the
     mean is defined as,

          absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i)

 - Statistics: double gsl_stats_wabsdev_m (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N, double
          WMEAN)
     This function computes the absolute deviation of the weighted
     dataset DATA about the given weighted mean WMEAN.

 - Statistics: double gsl_stats_wskew (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N)
     This function computes the weighted skewness of the dataset DATA,

          skew = (\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^3) / (\sum w_i)

 - Statistics: double gsl_stats_wskew_m_sd (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N, double
          WMEAN, double WSD)
     This function computes the weighted skewness of the dataset DATA
     using the given values of the weighted mean and weighted standard
     deviation, WMEAN and WSD.

 - Statistics: double gsl_stats_wkurtosis (const double W[], size_t
          WSTRIDE, const double DATA[], size_t STRIDE, size_t N)
     This function computes the weighted kurtosis of the dataset DATA,

          kurtosis = ((\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^4) / (\sum w_i)) - 3

 - Statistics: double gsl_stats_wkurtosis_m_sd (const double W[],
          size_t WSTRIDE, const double DATA[], size_t STRIDE, size_t N,
          double WMEAN, double WSD)
     This function computes the weighted kurtosis of the dataset DATA
     using the given values of the weighted mean and weighted standard
     deviation, WMEAN and WSD.


File: gsl-ref.info,  Node: Maximum and Minimum values,  Next: Median and Percentiles,  Prev: Weighted Samples,  Up: Statistics

Maximum and Minimum values
==========================

 - Statistics: double gsl_stats_max (const double DATA[], size_t
          STRIDE, size_t N)
     This function returns the maximum value in DATA, a dataset of
     length N with stride STRIDE.  The maximum value is defined as the
     value of the element x_i which satisfies x_i >= x_j for all j.

     If you want instead to find the element with the largest absolute
     magnitude you will need to apply `fabs' or `abs' to your data
     before calling this function.

 - Statistics: double gsl_stats_min (const double DATA[], size_t
          STRIDE, size_t N)
     This function returns the minimum value in DATA, a dataset of
     length N with stride STRIDE.  The minimum value is defined as the
     value of the element x_i which satisfies x_i <= x_j for all j.

     If you want instead to find the element with the smallest absolute
     magnitude you will need to apply `fabs' or `abs' to your data
     before calling this function.

 - Statistics: void gsl_stats_minmax (double * MIN, double * MAX, const
          double DATA[], size_t STRIDE, size_t N)
     This function finds both the minimum and maximum values MIN, MAX
     in DATA in a single pass.

 - Statistics: size_t gsl_stats_max_index (const double DATA[], size_t
          STRIDE, size_t N)
     This function returns the index of the maximum value in DATA, a
     dataset of length N with stride STRIDE.  The maximum value is
     defined as the value of the element x_i which satisfies x_i >= x_j
     for all j.  When there are several equal maximum elements then the
     first one is chosen.

 - Statistics: size_t gsl_stats_min_index (const double DATA[], size_t
          STRIDE, size_t N)
     This function returns the index of the minimum value in DATA, a
     dataset of length N with stride STRIDE.  The minimum value is
     defined as the value of the element x_i which satisfies x_i <= x_j
     for all j.  When there are several equal minimum elements then the
     first one is chosen.

 - Statistics: void gsl_stats_minmax_index (size_t * MIN_INDEX, size_t
          * MAX_INDEX, const double DATA[], size_t STRIDE, size_t N)
     This function finds the indexes of the minimum and maximum values
     in DATA in a single pass and stores them in MIN_INDEX, MAX_INDEX.
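
     A single-pass search for both indexes can be sketched as below.
     The name `sketch_minmax_index' is hypothetical.  Strict
     comparisons are used so that, when several elements are equal to
     the extreme value, the first one is kept, as described for the
     separate index functions above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: indexes of the minimum and maximum elements,
   found in one pass over the strided data. */
void sketch_minmax_index(size_t *min_index, size_t *max_index,
                         const double data[], size_t stride, size_t n)
{
    double min, max;
    size_t i, imin = 0, imax = 0;

    assert(n > 0);
    min = max = data[0];
    for (i = 1; i < n; i++) {
        const double xi = data[i * stride];
        if (xi < min) { min = xi; imin = i; }   /* strict: keep first */
        if (xi > max) { max = xi; imax = i; }   /* strict: keep first */
    }
    *min_index = imin;
    *max_index = imax;
}
```

     The returned values are element indexes, so the corresponding
     array slot is `data[index*stride]'.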


File: gsl-ref.info,  Node: Median and Percentiles,  Next: Example statistical programs,  Prev: Maximum and Minimum values,  Up: Statistics

Median and Percentiles
======================

   The median and percentile functions described in this section
operate on sorted data.  For convenience we use "quantiles", measured
on a scale of 0 to 1, instead of percentiles (which use a scale of 0 to
100).

 - Statistics: double gsl_stats_median_from_sorted_data (const double
          SORTED_DATA[], size_t STRIDE, size_t N)
     This function returns the median value of SORTED_DATA, a dataset
     of length N with stride STRIDE.  The elements of the array must be
     in ascending numerical order.  There are no checks to see whether
     the data are sorted, so the function `gsl_sort' should always be
     used first.

     When the dataset has an odd number of elements the median is the
     value of element (n-1)/2.  When the dataset has an even number of
     elements the median is the mean of the two nearest middle values,
     elements (n-1)/2 and n/2.  Since the algorithm for computing the
     median involves interpolation this function always returns a
     floating-point number, even for integer data types.
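
     The odd and even cases described above can be written with
     integer division on the element indexes.  The sketch below uses
     a hypothetical name and assumes the input is already sorted in
     ascending order; like the function described above, it performs
     no check for this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: median of data already sorted ascending. */
double sketch_median_from_sorted(const double sorted[], size_t stride,
                                 size_t n)
{
    size_t lhs, rhs;

    assert(n > 0);
    lhs = (n - 1) / 2;   /* lower middle element */
    rhs = n / 2;         /* upper middle element */

    if (lhs == rhs)      /* odd n: a single middle element */
        return sorted[lhs * stride];

    /* even n: mean of the two middle elements */
    return (sorted[lhs * stride] + sorted[rhs * stride]) / 2.0;
}
```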

 - Statistics: double gsl_stats_quantile_from_sorted_data (const double
          SORTED_DATA[], size_t STRIDE, size_t N, double F)
     This function returns a quantile value of SORTED_DATA, a
     double-precision array of length N with stride STRIDE.  The
     elements of the array must be in ascending numerical order.  The
     quantile is determined by F, a fraction between 0 and 1.  For
     example, to compute the value of the 75th percentile F should have
     the value 0.75.

     There are no checks to see whether the data are sorted, so the
     function `gsl_sort' should always be used first.

     The quantile is found by interpolation, using the formula

          quantile = (1 - \delta) x_i + \delta x_{i+1}

     where i is `floor'((n - 1)f) and \delta is (n-1)f - i.

     Thus the minimum value of the array (`sorted_data[0*stride]') is
     given by F equal to zero, the maximum value
     (`sorted_data[(n-1)*stride]') is given by F equal to one and the
     median value is given by F equal to 0.5.
     Since the algorithm for computing quantiles involves
     interpolation this function always returns a floating-point
     number, even for integer data types.
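
     The interpolation formula can be sketched as follows.  The name
     `sketch_quantile_from_sorted' is hypothetical, and the input is
     assumed to be sorted in ascending order.  Because (n-1)f is never
     negative, converting it to an unsigned integer truncates toward
     zero and therefore gives the same result as `floor'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: quantile f (0 <= f <= 1) of sorted data by
   linear interpolation between neighbouring elements. */
double sketch_quantile_from_sorted(const double sorted[], size_t stride,
                                   size_t n, double f)
{
    double index, delta;
    size_t lhs;

    assert(n > 0);
    assert(f >= 0.0 && f <= 1.0);

    index = f * (double) (n - 1);
    lhs = (size_t) index;             /* i = floor((n-1) f) */
    delta = index - (double) lhs;     /* delta = (n-1) f - i */

    if (lhs == n - 1)                 /* f = 1: no right neighbour */
        return sorted[lhs * stride];

    return (1.0 - delta) * sorted[lhs * stride]
           + delta * sorted[(lhs + 1) * stride];
}
```

     For the four sorted values {10, 20, 30, 40}, F equal to 0.25
     gives (n-1)f = 0.75, so the result lies three quarters of the way
     from 10 to 20.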