REVIEW OF STATISTICS
. sample space S = the set of all possible outcomes that are of interest to the investigator.
. event A = a subset of the sample space S.
. events Ai and Aj are disjoint if Ai ∩ Aj = ∅.
. Axioms of probabilities: Pr(.) is a probability function if it satisfies:
1. Pr(Ai) ≥ 0 for any Ai ⊂ S
2. Pr(S) = 1
3. Pr(Ai ∪ Aj) = Pr(Ai) + Pr(Aj) if Ai and Aj are disjoint events.
. A random variable X is a real-valued function that has a specific value at each point of the sample space.
For any B ⊂ S, the probability that X ∈ B is Pr(X ∈ B).
. A distribution function is the function F(t) = Pr(X ≤ t) such that:
1. F(t) is non-decreasing and continuous from the right
2. F(-∞) = 0
3. F(+∞) = 1.
. A probability function = f(x) where x can be a discrete or a continuous variable.
- discrete case: when X can take a countable number of distinct values: x1, x2, x3, ... Then,
f(xi) = Pr(X = xi),
and
Pr(X ∈ B) = Σi {f(xi): xi ∈ B}.
- continuous case: the function f(x) satisfies
Pr(X ∈ B) = ∫x∈B f(x) dx,
where f(x) = dF(x)/dx.
Note: Pr(X = x) = 0 in the continuous case.
. In the multivariate case, x = (x1, x2, ..., xn) where n is the number of random variables.
- The joint distribution function of x = (x1, x2, ..., xn) is Fn(x) = Pr(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn).
- The marginal distribution of the subset (x1, x2, ..., xk), k < n, is Fk(x1, x2, ..., xk) = Fn(x1, x2, ..., xk, ∞, ..., ∞).
. The marginal probability function is
fk(x1, …, xk) = ∫ … ∫ fn(x1, …, xn) dxk+1 … dxn, in the continuous case,
and
fk(x1, …, xk) = Σxk+1 … Σxn fn(x1, …, xn), in the discrete case,
where fn(x1, ..., xn) is the joint probability function.
. The random variables (x1, x2, ..., xn) are independent if
Fn(x1, x2, ..., xn) = F1(x1) F2(x2) ... Fn(xn)
or
fn(x1, x2, ..., xn) = f1(x1) f2(x2) ... fn(xn).
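As an illustration, the marginals and the independence check can be computed directly from a discrete joint probability table; the Python sketch below uses an invented 3×2 table f(x, y), so the numbers carry no special meaning.

```python
import numpy as np

# Invented joint probability table f(x, y): rows index x, columns index y.
f = np.array([[0.10, 0.20],
              [0.15, 0.30],
              [0.05, 0.20]])

fx = f.sum(axis=1)   # marginal f1(x): sum over y
fy = f.sum(axis=0)   # marginal f2(y): sum over x

# x and y are independent iff f(x, y) = f1(x) f2(y) in every cell.
print(np.allclose(f, np.outer(fx, fy)))   # False for this table
```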
. Conditional distribution: Let f(x,y) be the joint probability function for (x, y). Then,
g1(x) = ∫ f(x, y) dy is the marginal probability function for x,
and
g2(y) = ∫ f(x, y) dx is the marginal probability function for y.
The conditional probability function of x given y is
h1(x|y) = f(x, y)/g2(y),
and the conditional probability function of y given x is
h2(y|x) = f(x, y)/g1(x).
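Continuing with the same kind of invented joint table, a short sketch of the conditional probability functions h1(x|y) and h2(y|x):

```python
import numpy as np

# Invented joint probability table f(x, y): rows index x, columns index y.
f = np.array([[0.10, 0.20],
              [0.15, 0.30],
              [0.05, 0.20]])

g1 = f.sum(axis=1)       # marginal probability function of x
g2 = f.sum(axis=0)       # marginal probability function of y

h1 = f / g2              # h1(x|y) = f(x, y)/g2(y): one column per value of y
h2 = f / g1[:, None]     # h2(y|x) = f(x, y)/g1(x): one row per value of x

# Each conditional distribution sums to 1 over its free variable.
print(h1.sum(axis=0))    # -> [1. 1.]
print(h2.sum(axis=1))    # -> [1. 1. 1.]
```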
. Bayes theorem:
h2(y|x) = h1(x|y) g2(y) / ∫ h1(x|y) g2(y) dy
in the continuous case, and
h2(y|x) = h1(x|y) g2(y) / Σy h1(x|y) g2(y)
in the discrete case.
Proof: (in the continuous case)
h2(y|x) = f(x, y)/g1(x) = h1(x|y) g2(y) / ∫ f(x, y) dy = h1(x|y) g2(y) / ∫ h1(x|y) g2(y) dy.
Q.E.D.
In the case where x corresponds to sample information, g2(y) is called the prior probability, h1(x|y) is called the likelihood function of the sample, and h2(y|x) is called the posterior probability.
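A minimal numerical sketch of the discrete case, with an invented prior over two values of y and an invented likelihood for one observed x:

```python
import numpy as np

prior = np.array([0.5, 0.5])        # g2(y), invented
likelihood = np.array([0.8, 0.3])   # h1(x|y) at the observed x, invented

# h2(y|x) = h1(x|y) g2(y) / sum_y h1(x|y) g2(y)
posterior = likelihood * prior / np.sum(likelihood * prior)
print(posterior)                    # -> [0.727..., 0.272...]
```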
. Expectations: The expected value of some function r(x) is given by
E[r(x)] = ∫ r(x) f(x) dx, in the continuous case,
or
E[r(x)] = Σx r(x) f(x), in the discrete case,
where E is the "expectation operator."
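Both cases are easy to check numerically; the sketch below takes r(x) = x² with an invented three-point discrete distribution, and approximates the continuous case for x uniform on (0, 1) by averaging over a fine grid:

```python
import numpy as np

def r(x):
    return x**2

# Discrete case: E[r(x)] = sum_x r(x) f(x).
x = np.array([0, 1, 2])
f = np.array([0.2, 0.5, 0.3])       # invented probabilities
print(np.sum(r(x) * f))             # -> 1.7

# Continuous case: E[r(x)] for x ~ Uniform(0, 1), where f(x) = 1,
# approximated by averaging r over a fine grid on (0, 1).
grid = np.linspace(0.0, 1.0, 100_001)
print(np.mean(r(grid)))             # -> ~1/3
```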
. The kth moment: Choose r(x) = x^k, k = 1, 2, ... Then,
mk = E(x^k) is the kth moment of x.
if k = 1, then m1 = E(x) = the mean (or average) of x, a common measure of the "location" of x.
if k = 2, then m2 = E(x²) = the second moment of x.
if k = 3, then m3 = E(x³) = the third moment of x, ...
. The kth central moment: Choose r(x) = (x - m1)^k, k = 2, 3, ... Then, Mk = E[(x - m1)^k] is the kth central moment of x.
if k = 2, then M2 = E[(x - m1)²] = the variance of x, a common measure of the "spread" or "dispersion" of x.
if k = 3, then M3 = E[(x - m1)³], the third central moment of x.
if k = 4, then M4 = E[(x - m1)⁴], the fourth central moment of x, ...
. Note: variance of x = V(x) = E[(x - m1)²] = E(x² + m1² - 2xm1) = m2 - m1².
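The identity m2 - m1² is easy to verify by simulation; the sketch below uses an arbitrary exponential sample:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # any distribution works

m1 = np.mean(x)       # first moment
m2 = np.mean(x**2)    # second moment
print(m2 - m1**2)     # m2 - m1^2 ...
print(np.var(x))      # ... agrees with the directly computed variance
```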
. Other measures:
standard deviation = (M2)½
coefficient of variation = (M2)½/m1
relative skewness = M3/(M2)^1.5
relative kurtosis = M4/(M2)²
covariance = Cov(x, y) = E[(x - E(x))(y - E(y))]
= E[xy - x E(y) - y E(x) + E(x) E(y)]
= E(xy) - E(x) E(y)
correlation = ρ(x, y) = Cov(x, y)/[M2(x) M2(y)]½; -1 ≤ ρ ≤ 1.
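These measures can all be estimated from a sample; the sketch below uses a gamma sample (chosen because it is visibly skewed) and an invented noisy companion variable for the correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)
y = x + rng.normal(size=x.size)     # invented variable correlated with x

m1 = np.mean(x)
M2 = np.mean((x - m1)**2)
M3 = np.mean((x - m1)**3)
M4 = np.mean((x - m1)**4)

print(np.sqrt(M2))              # standard deviation
print(np.sqrt(M2) / m1)         # coefficient of variation
print(M3 / M2**1.5)             # relative skewness (~1.414 for this gamma)
print(M4 / M2**2)               # relative kurtosis (~6 for this gamma)
print(np.corrcoef(x, y)[0, 1])  # correlation rho(x, y)
```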
. Let: x = (x1, x2, ..., xn)' = (n×1) vector with mean E(x) = μ = (μ1, μ2, ..., μn)' and variance Σ = {σij}, where σii = V(xi) is the variance of xi, σij = Cov(xi, xj) is the covariance of xi with xj, and Σ is an (n×n) matrix. Let y = Ax + b. Then,
E(y) = A E(x) + b = Aμ + b
V(y) = A V(x) A' = AΣA'.
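A quick simulation check of these two identities, with an invented mean vector, covariance matrix, A, and b:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -1.0])                 # invented mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])             # invented covariance matrix
A = np.array([[1.0, 2.0],
              [0.0, 3.0]])                 # invented transformation
b = np.array([0.5, -0.5])

x = rng.multivariate_normal(mu, Sigma, size=1_000_000)
y = x @ A.T + b                            # y = Ax + b, row by row

print(y.mean(axis=0), A @ mu + b)                 # E(y) vs A mu + b
print(np.cov(y, rowvar=False), A @ Sigma @ A.T)   # V(y) vs A Sigma A'
```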
. Note: If x and y are independently distributed with finite variances, then Cov(x, y) = 0 and V(x + y) = V(x) + V(y).
. Chebyshev inequality: If V(x) exists (i.e., if it is finite), then
Pr[|x - E(x)| ≥ t] ≤ V(x)/t² for any t > 0.
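A simulation sketch of the inequality for a standard normal sample and t = 2 (the exact probability is about 0.046, comfortably below the bound of 0.25):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)   # any distribution with finite variance

t = 2.0
lhs = np.mean(np.abs(x - x.mean()) >= t)   # Pr[|x - E(x)| >= t]
rhs = x.var() / t**2                       # V(x)/t^2
print(lhs, "<=", rhs)                      # ~0.0455 <= 0.25
```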
. Moment Generating Function: G(t) = E(e^(tx))
If mr exists (i.e., if it is finite), then the moment generating function G(t) satisfies
[d^r G(t)/dt^r]t=0 = E(x^r) = mr, r = 1, 2, 3, ...
Proof: A Taylor series expansion of e^(tx) around tx = 0 gives
G(t) = E[1 + tx + (tx)²/2! + (tx)³/3! + ...].
Evaluating the derivative of this expression with respect to t at t = 0 gives the desired result.
Q.E.D.
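The result can be checked numerically: the sketch below recovers the first two moments of an exponential(λ) variable from its known MGF, G(t) = λ/(λ - t), using central finite differences at t = 0.

```python
# Finite-difference sketch: moments from the MGF of an exponential(lam) variable.
lam = 2.0

def G(t):
    return lam / (lam - t)   # MGF, valid for t < lam

h = 1e-4
m1 = (G(h) - G(-h)) / (2 * h)            # G'(0)  = E(x)   = 1/lam
m2 = (G(h) - 2 * G(0.0) + G(-h)) / h**2  # G''(0) = E(x^2) = 2/lam^2
print(m1, m2)                            # -> ~0.5, ~0.5
```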
. Conditional Expectation: Let f(x, y) be a joint probability function, g2(y) be the marginal probability function of y, and h1(x|y) = f(x, y)/g2(y) be the conditional probability function of x given y. The conditional expectation of a random variable x given y is the expectation based on the conditional probability h1(x|y). The unconditional expectation Ex,y of some function r(x, y) is given by
Ex,y r(x, y) = Ey[Ex|y r(x, y)],
where Ex|y is the conditional expectation operator and Ey is the expectation based on the marginal probability of y.
Proof: Ex,y r(x, y) = Σx,y r(x, y) f(x, y)
= Σx,y r(x, y) h1(x|y) g2(y)
= Σy [Σx r(x, y) h1(x|y)] g2(y)
= Ey[Ex|y r(x, y)].
Q.E.D.
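A simulation sketch of the result for an invented two-stage model: y ~ Poisson(3) and x | y ~ Binomial(y, 0.4), with r(x, y) = x.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.poisson(lam=3.0, size=1_000_000)
x = rng.binomial(n=y, p=0.4)       # draw x | y for each simulated y

print(x.mean())                    # unconditional E(x)
print(np.mean(0.4 * y))            # E_y[E(x|y)] = E_y[0.4 y] -> both ~1.2
```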
. Conjugate Distributions: In a Bayesian framework, a distribution is conjugate if, for some likelihood function, the prior and posterior distributions belong to the same family.
Example: a normal prior for the unknown mean of a random sample from a normal distribution (with known variance) yields a normal posterior.
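A sketch of that example, with invented prior parameters and a known data variance; the posterior mean and variance follow the standard normal-normal updating formulas:

```python
import numpy as np

m0, v0 = 0.0, 4.0     # prior mean and variance for the unknown mean (invented)
sigma2 = 1.0          # known variance of each observation (invented)

rng = np.random.default_rng(5)
x = rng.normal(loc=1.5, scale=np.sqrt(sigma2), size=50)   # simulated sample
n = x.size

v1 = 1.0 / (1.0/v0 + n/sigma2)        # posterior variance
m1 = v1 * (m0/v0 + x.sum()/sigma2)    # posterior mean
print(m1, v1)                         # posterior is again normal: N(m1, v1)
```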
. Some Special Discrete Distributions (probability function f(x), moment generating function G(t), mean, and variance):
. Binomial:
f(x) = [n!/(x!(n-x)!)] p^x (1-p)^(n-x), G(t) = (pe^t + (1-p))^n, mean = np, variance = np(1-p),
for 0 < p < 1; x = 0, 1, ..., n.
. Bernoulli = Binomial when n = 1
. Negative Binomial:
f(x) = [(r+x-1)!/(x!(r-1)!)] p^r (1-p)^x, G(t) = [p/(1-(1-p)e^t)]^r, mean = r(1-p)/p, variance = r(1-p)/p²,
for 0 < p < 1; x = 0, 1, 2, ...; G(t) defined for (1-p)e^t < 1.
. Geometric = Negative Binomial when r = 1
. Poisson:
f(x) = e^(-λ) λ^x/x!, G(t) = exp(λ(e^t - 1)), mean = λ, variance = λ,
for x = 0, 1, 2, ...; λ > 0.
. Uniform:
f(x) = 1/n, mean = (n+1)/2, variance = (n²-1)/12,
for x = 1, 2, ..., n; n = integer.
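A quick simulation spot-check of one row of the table (binomial, with invented n and p):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 10, 0.3                     # invented parameters
x = rng.binomial(n=n, p=p, size=1_000_000)

print(x.mean(), n * p)             # mean     ~ np       = 3.0
print(x.var(), n * p * (1 - p))    # variance ~ np(1-p)  = 2.1
```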
. Some Special Continuous Distributions (probability function f(x), moment generating function G(t), mean, and variance):
. Beta:
f(x) = [Γ(α+β)/(Γ(α)Γ(β))] x^(α-1) (1-x)^(β-1), mean = α/(α+β), variance = αβ/[(α+β)²(α+β+1)],
for 0 < x < 1; α > 0; β > 0.
. Uniform:
f(x) = 1/(b-a), mean = (b+a)/2, variance = (b-a)²/12,
for a < x < b.
. Normal:
f(x) = [1/(2πσ²)½] exp{-(x-μ)²/(2σ²)}, G(t) = exp{μt + σ²t²/2}, mean = μ, variance = σ²,
for σ > 0.
. Gamma:
f(x) = [β^α/Γ(α)] x^(α-1) e^(-βx), G(t) = (β/(β-t))^α for t < β, mean = α/β, variance = α/β²,
for α > 0; β > 0; x > 0.
. Exponential = Gamma with α = 1
. Chi Square = Gamma with α = k/2; β = 1/2; k = positive integer
. Pareto:
f(x) = θk^θ/x^(θ+1), mean = θk/(θ-1) for θ > 1, variance = θk²/[(θ-2)(θ-1)²] for θ > 2,
for x > k > 0; θ > 0.
. Lognormal:
f(x) = [1/(xσ(2π)½)] exp{-(log(x)-m)²/(2σ²)}, mean = exp(m + σ²/2), variance = [exp(σ²)-1] exp(2m+σ²),
for x > 0; σ > 0.
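And a matching spot-check of the gamma row; note that numpy parameterizes the gamma by shape α and scale 1/β:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, beta = 3.0, 2.0             # invented parameters
x = rng.gamma(shape=alpha, scale=1.0/beta, size=1_000_000)

print(x.mean(), alpha / beta)      # mean     ~ alpha/beta   = 1.5
print(x.var(), alpha / beta**2)    # variance ~ alpha/beta^2 = 0.75
```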
Note: n! = n (n-1) (n-2) … 1.
Γ(α) = ∫0∞ y^(α-1) e^(-y) dy
= 1 if α = 1
= (α-1)! if α is a positive integer.