3. Review of Calculus
Let $f: X \to \mathbb{R}^m$ be a function from X to $\mathbb{R}^m$, where $X \subset \mathbb{R}^n$.
3.1. Limits of a function
Let $b \in X$, $k \in \mathbb{R}^m$.
Definition: Assume that f is defined on a neighborhood $S \subset \mathbb{R}^n$ of b, except possibly at b. The limit of f at b, denoted by $k = \lim_{x \to b} f(x)$, exists if, for all $\varepsilon > 0$, there is a $\delta > 0$ such that,
if $x \in X$ and $0 < \|x - b\| < \delta$,
then $\|f(x) - k\| < \varepsilon$.
Notes: . f(x) does not have to be defined at b.
. if f(b) exists, it can be that $k \neq f(b)$.
. f must be defined in a neighborhood of b.
. even if f is defined on a neighborhood S containing b, the limit of f at b may not exist.
Example 1: f(x) = 0 if $x \neq 0$
= 1 if x = 0.
Then: $\lim_{x \to b} f(x) = 0$ for all $b \in \mathbb{R}$ (e.g., b = 0)
f(0) = 1.
Example 2: f(x) = 1 if $x \geq 0$
= -1 if x < 0.
Then: $\lim_{x \to 0} f(x)$ does not exist
f(0) = 1.
Example 3: f(x) = sin(1/x), $x \in \mathbb{R} \setminus \{0\}$.
Then: $\lim_{x \to 0} f(x)$ does not exist
f(0) does not exist.
Alternative Definition: Assume that f is defined on a neighborhood $S \subset \mathbb{R}^n$ of b, except possibly at b. The limit of f at b, denoted by $k = \lim_{x \to b} f(x)$, exists if:
$k = \lim_{j \to \infty} f(x_j)$ for every sequence $\{x_j: j = 1, 2, \ldots\}$ in X such that $x_j \neq b$ for all j, and $x_j \to b$.
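The sequence definition can be illustrated numerically on Example 3. The sketch below evaluates f(x) = sin(1/x) along two sequences that both converge to 0. The two sequences and the helper names are illustrative choices, not part of the notes. Because the values of f settle on different numbers along the two sequences, no single k can satisfy the definition.

```python
import math

def f(x):
    """Example 3: f(x) = sin(1/x), defined for x != 0."""
    return math.sin(1.0 / x)

def seq_a(j):
    # x_j = 1/(j*pi), so that sin(1/x_j) = sin(j*pi) = 0
    return 1.0 / (j * math.pi)

def seq_b(j):
    # x_j = 1/(pi/2 + 2*pi*j), so that sin(1/x_j) = 1
    return 1.0 / (math.pi / 2 + 2 * math.pi * j)

# Both sequences tend to 0 and never equal 0.
values_a = [f(seq_a(j)) for j in range(1, 50)]
values_b = [f(seq_b(j)) for j in range(1, 50)]

print(max(abs(v) for v in values_a))      # close to 0 (up to rounding)
print(min(values_b))                      # close to 1 (up to rounding)
```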
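Let $f: X \to \mathbb{R}$ and $g: X \to \mathbb{R}$ (m = 1) be such that $\lim_{x \to b} f(x) = k$ and $\lim_{x \to b} g(x) = p$. Then:
. the limits k and p are unique
. $\lim_{x \to b} (f + g)(x) = k + p$
. $\lim_{x \to b} (f g)(x) = k p$
. $\lim_{x \to b} (f/g)(x) = k/p$ if $p \neq 0$ and $g(x) \neq 0$ for all $x \in X$
. if $f(x) \leq g(x)$ for all $x \neq b$ in some neighborhood of b, then $k \leq p$.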
3.2. Continuity of f
Let $f: X \to \mathbb{R}^m$, $X \subset \mathbb{R}^n$.
Definition: The function f is said to be continuous at $b \in X$ if, for all $\varepsilon > 0$, there exists a $\delta > 0$ such that, if $x \in X$ and $\|x - b\| < \delta$,
then $\|f(x) - f(b)\| < \varepsilon$.
Alternative Definitions:
1/ The function f is continuous at $b \in X$ if and only if, for all sequences $\{x_j: j = 1, 2, \ldots\}$ in X,
if $\lim_{j \to \infty} x_j = b$, then $\lim_{j \to \infty} f(x_j) = f(b)$.
2/ The function f is continuous at $b \in X$ if and only if, for all open sets $V \subset \mathbb{R}^m$ such that $f(b) \in V$, there is an open set $U \subset \mathbb{R}^n$ such that $b \in U$ and $f(z) \in V$ for all $z \in U \cap X$.
3/ The function f is continuous on X if and only if, for each open set $V \subset \mathbb{R}^m$, there is an open set $U \subset \mathbb{R}^n$ such that $f^{-1}(V) = U \cap X$, where $f^{-1}(V) = \{x \in X: f(x) \in V\}$.
Note: . f must be defined at b
. f does not have to be defined in a neighborhood of b.
Example: X = {b} = a single point. Then f is continuous at b, but f does not have a limit at b.
If f is defined at $b \in X$ and in a neighborhood $S \subset X$ of b, then f is continuous at b if and only if $\lim_{x \to b} f(x) = f(b)$.
Definition:
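If $A \subset X$, f is said to be continuous on A if f is continuous at all $x \in A$.
If A = X, f is said to be continuous if f is continuous at all $x \in X$.
Let $f: X \to \mathbb{R}^m$ and $g: X \to \mathbb{R}^m$ be continuous functions, $X \subset \mathbb{R}^n$. Then
. f + g is also continuous
. f g is also continuous (for m = 1)
. f/g is also continuous (for m = 1) if $g(x) \neq 0$ for all $x \in X$.
Let $f: X \to Y$ and $g: Y \to \mathbb{R}^m$ be continuous functions, $X \subset \mathbb{R}^n$, $Y \subset \mathbb{R}^k$. Then
$g \circ f: X \to \mathbb{R}^m$ is also continuous.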
3.3. Intermediate Value Theorem
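Let $f: X \to \mathbb{R}$, $X \subset \mathbb{R}^n$.
Theorem 1: Let f be a continuous function on X, a convex subset of $\mathbb{R}^n$.
Let $x_1 \in X$, $x_2 \in X$, and $f(x_1) > f(x_2)$.
Then, given any $c \in \mathbb{R}$ such that $f(x_1) > c > f(x_2)$, there exists a $\theta$, $0 < \theta < 1$, such that
$f[\theta x_1 + (1 - \theta) x_2] = c$.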
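A minimal sketch of how the weight in Theorem 1 could be located numerically: restrict f to the segment between the two points and bisect on the weight. The function f, the two points and the level c below are made-up illustrations, and bisection is just one convenient root finder; it works here because the restricted function is continuous, is below c at weight 0 and above c at weight 1.

```python
def f(x):
    # Hypothetical continuous function on R^2, chosen for illustration.
    return x[0] ** 2 + x[1]

def find_theta(f, x1, x2, c, tol=1e-12):
    """Bisection on phi(theta) = f(theta*x1 + (1-theta)*x2) - c over [0, 1].

    Assumes f(x1) > c > f(x2), so phi(0) < 0 < phi(1).
    """
    def phi(theta):
        point = [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]
        return f(point) - c

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            hi = mid   # the crossing lies to the left of mid
        else:
            lo = mid   # the crossing lies to the right of mid
    return 0.5 * (lo + hi)

x1, x2, c = [2.0, 1.0], [0.0, 0.0], 2.0   # f(x1) = 5 > c > f(x2) = 0
theta = find_theta(f, x1, x2, c)
point = [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]
print(theta, f(point))
```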
3.4. Differentiability of f
Let $f: X \to \mathbb{R}^m$, X being an open set in $\mathbb{R}^n$.
Definition: The function $f: X \to \mathbb{R}^m$ is differentiable at $x \in X$ if there exists an $(m \times n)$ matrix A such that, for all $\varepsilon > 0$, there exists a $\delta > 0$ such that,
if $y \in X$ and $\|x - y\| < \delta$,
then $\|f(x) - f(y) - A [x - y]\| \leq \varepsilon \|x - y\|$.
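Alternative Definition: The function $f: X \to \mathbb{R}^m$ is differentiable at $x \in X$ if there exists an $(m \times n)$ matrix A such that
$$\lim_{y \to x} \frac{\|f(y) - f(x) - A [y - x]\|}{\|y - x\|} = 0,$$
where "$y \to x$" means "for all sequences $\{y_j: j = 1, 2, \ldots\}$ such that $y_j \neq x$ and $y_j \to x$". If m = n = 1, this can be written as
$$\lim_{y \to x} \frac{f(y) - f(x)}{y - x} = A.$$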
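The $(m \times n)$ matrix A is called the gradient or derivative of f at x and is denoted by Df(x). (Note that Df(x) is often written as f'(x).) If f is differentiable, then Df is a function $Df: X \to \mathbb{R}^{m \times n}$. In general, we can write f(x) as an $(m \times 1)$ vector of functions
$$f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix},$$
where $f_j(x)$ denotes the j-th function, j = 1, 2, ..., m, and $x = (x_1, x_2, \ldots, x_n) \in X \subset \mathbb{R}^n$.
Then Df can be written as the $(m \times n)$ matrix
$$Df(x) = \begin{bmatrix} \partial f_1/\partial x_1 & \cdots & \partial f_1/\partial x_n \\ \vdots & \ddots & \vdots \\ \partial f_m/\partial x_1 & \cdots & \partial f_m/\partial x_n \end{bmatrix},$$
where the (i, j)-th element $\partial f_i/\partial x_j$, evaluated at x, is interpreted as the derivative of $f_i(x)$ with respect to $x_j \in \mathbb{R}$, i = 1, 2, ..., m, j = 1, 2, ..., n.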
Interpretation: Consider a linear function g(y) = b + A y, $y \in X$. We look to approximate f at x by the linear function g(y). We want f(x) = g(x) = b + A x, implying b = f(x) - A x, or g(y) = A [y - x] + f(x).
We also want $\|f(y) - g(y)\| / \|y - x\|$ to be small for y in the neighborhood of x. This implies that $\|f(y) - f(x) - A [y - x]\| / \|y - x\|$ is also small in the neighborhood of x.
. If f is differentiable at all points in X, then f is differentiable on X.
. A differentiable function on X is necessarily continuous on X.
. If $Df: X \to \mathbb{R}^{m \times n}$ is a continuous function, then f is continuously differentiable on X. This is denoted by $f \in C^1$, where $C^1$ is the class of continuously differentiable functions.
Examples of derivatives, where $f: X \to \mathbb{R}$, with $X \subset \mathbb{R}$:
. $f(x) = a + b x$: $Df(x) = b$
. $f(x) = a + b x + c x^2$: $Df(x) = b + 2c x$
. $f(x) = a + b x + c x^2 + d x^3$: $Df(x) = b + 2c x + 3d x^2$
. $f(x) = a + b x^c$: $Df(x) = b c x^{c-1}$
. $f(x) = a + b e^{c x}$: $Df(x) = b c e^{c x}$
. $f(x) = a + b \ln(c x)$: $Df(x) = b/x$
And assuming that the functions g and h are differentiable:
. $f(x) = g(x) + h(x)$: $Df(x) = Dg(x) + Dh(x)$
. $f(x) = g(x) h(x)$: $Df(x) = Dg(x) h(x) + g(x) Dh(x)$ (the "product rule")
. $f(x) = g(x)/h(x)$, $h(x) \neq 0$: $Df(x) = Dg(x)/h(x) - Dh(x) g(x)/[h(x)]^2$ (the "quotient rule")
. $f(x) = g(h(x))$: $Df(x) = Dg(h(x)) Dh(x)$ (the "chain rule")
. $f(x) = e^{g(x)}$: $Df(x) = e^{g(x)} Dg(x)$
. $f(x) = \ln(g(x))$, $g(x) > 0$: $Df(x) = Dg(x)/g(x)$
. $f(x) = \sin(g(x))$: $Df(x) = \cos(g(x)) Dg(x)$
. $f(x) = \cos(g(x))$: $Df(x) = -\sin(g(x)) Dg(x)$
Note: Let y be an $(n \times 1)$ vector, x be an $(n \times 1)$ vector, and A be an $(n \times n)$ matrix.
. $f(x) = y^T x$, then $Df(x) = y^T$
. $f(x) = x^T A x$, A being symmetric, then $Df(x) = 2 x^T A$
. $f(x) = A x$, then $Df(x) = A$.
Example: Consider
$f(x) = x^2 \sin(1/x^2)$ if $x \neq 0$
= 0 if x = 0.
It follows that
$Df(x) = 2x \sin(1/x^2) - (2/x) \cos(1/x^2)$ for $x \neq 0$.
Note that the right-hand side of the above expression is not well defined at x = 0.
Yet, $Df(0) = \lim_{x \to 0} [(f(x) - f(0))/x] = \lim_{x \to 0} [x \sin(1/x^2)] = 0$.
The function f is differentiable everywhere, but Df is not continuous at x = 0. Thus $f \notin C^1$.
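As a rough numerical check of this example, the sketch below computes the difference quotient at 0 for shrinking steps and evaluates the derivative formula along a sequence of points approaching 0. The step sizes and the sequence are illustrative choices. The quotient is bounded by the step size and so tends to 0, while the derivative grows without bound in absolute value along the chosen sequence, which is why Df cannot be continuous at 0.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x^2) for x != 0, and f(0) = 0
    return x * x * math.sin(1.0 / (x * x)) if x != 0 else 0.0

def df(x):
    # Derivative formula from the text, valid only for x != 0
    return 2 * x * math.sin(1.0 / (x * x)) - (2.0 / x) * math.cos(1.0 / (x * x))

# Difference quotient at 0: (f(h) - f(0))/h = h sin(1/h^2), so |quotient| <= |h|
steps = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [(f(h) - f(0.0)) / h for h in steps]

# Along x_k = 1/sqrt(2*pi*k): sin(1/x^2) = 0 and cos(1/x^2) = 1,
# so Df(x_k) = -2*sqrt(2*pi*k), which is unbounded as x_k -> 0.
xs = [1.0 / math.sqrt(2 * math.pi * k) for k in (1, 10, 100, 1000)]
slopes = [df(x) for x in xs]

print(quotients)
print(slopes)
```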
3.5. Intermediate value theorem for Df(x)
Let $f: X \to \mathbb{R}$, $X \subset \mathbb{R}$.
Theorem 2: Let f be differentiable on X, a convex subset of $\mathbb{R}$. Let $x_1 \in X$, $x_2 \in X$, and $Df(x_1) < Df(x_2)$. Then, given any $c \in \mathbb{R}$ such that $Df(x_1) < c < Df(x_2)$, there exists a $\theta \in \mathbb{R}$, $0 < \theta < 1$, such that $Df[\theta x_1 + (1 - \theta) x_2] = c$.
Notes: . This result does not assume that $f \in C^1$.
. The theorem implies that the derivative has some "minimal continuity properties" even if it is not continuous: it cannot have "jump discontinuities".
. The theorem obviously holds if $f \in C^1$.
3.6. Mean value theorem
Let $f: X \to \mathbb{R}$, $X \subset \mathbb{R}^n$.
Theorem 3: Let f be differentiable on X, an open convex subset of $\mathbb{R}^n$. Let $x_1 \in X$ and $x_2 \in X$ be $(n \times 1)$ vectors. Then, there exists a $\theta \in \mathbb{R}$, $0 \leq \theta \leq 1$, such that
$f(x_2) - f(x_1) = Df(\theta x_1 + (1 - \theta) x_2) [x_2 - x_1]$
where Df(x) is a $(1 \times n)$ vector of the derivative of f at x.
Note: $f \in C^1$ is not required for the theorem to hold.
Proof (for the case n = 1, with $x_1 \neq x_2$): Consider the function
$g(x) = f(x_2) - f(x) + [(f(x_2) - f(x_1))/(x_2 - x_1)] (x - x_2)$.
Note that $g(x_1) = 0$ and $g(x_2) = 0$.
If g(x) is constant between $x_1$ and $x_2$, then Dg(x) = 0 for any x between $x_1$ and $x_2$.
If g(x) is not constant between $x_1$ and $x_2$, it attains either a maximum or a minimum at a point strictly between $x_1$ and $x_2$. If it attains a maximum, let that maximum be at point $x_3$. Under differentiability, $Dg(x_3) = 0$ (otherwise $x_3$ would not be a maximum). If it attains a minimum, let that minimum be at point $x_4$. Under differentiability, $Dg(x_4) = 0$ (otherwise $x_4$ would not be a minimum).
Thus, in any situation, there is a point $x_0$ between $x_1$ and $x_2$ that satisfies $Dg(x_0) = 0$. Differentiating g(x) yields
$Dg(x) = -Df(x) + (f(x_2) - f(x_1))/(x_2 - x_1)$.
Evaluating this expression at $x_0$ gives the desired result.
3.7. Partial Derivatives
Let $f: X \to \mathbb{R}$, where X is an open set in $\mathbb{R}^n$. Let $e_i$ be the $(n \times 1)$ unit vector $e_i = (e_{i1}, e_{i2}, \ldots, e_{in})^T$ such that $e_{ij} = 0$ if $i \neq j$
= 1 if i = j.
Definition: The i-th partial derivative of f at $x \in X$ (or the partial derivative of f with respect to $x_i$ at x) is
$\partial f/\partial x_i(x) = \lim_{t \to 0} [(f(x + t e_i) - f(x))/t]$, i = 1, 2, ..., n.
. If f is differentiable at x, then all partial derivatives $\partial f/\partial x_i$ exist at x, i = 1, 2, ..., n, and $Df(x) = [\partial f/\partial x_1(x), \partial f/\partial x_2(x), \ldots, \partial f/\partial x_n(x)]$ is a $(1 \times n)$ vector where $\partial f/\partial x_i$ is the partial derivative of f with respect to $x_i$.
. If all partial derivatives of f exist and are continuous at x, then Df(x) exists and the $(1 \times n)$ gradient vector can be written as $Df(x) = [\partial f/\partial x_1(x), \partial f/\partial x_2(x), \ldots, \partial f/\partial x_n(x)]$.
. $f \in C^1$ if and only if all partial derivatives of f exist and are continuous on X.
Note: All partial derivatives of a differentiable function always exist. However, the existence of partial derivatives does not imply that the function is differentiable.
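Example: $f(x, y) = x y/(x^2 + y^2)^{1/2}$ for $x \neq 0$
= 0 for x = 0.
We have:
$\partial f/\partial y(x, 0) = x/(x^2)^{1/2} = x/|x|$ for all $x \neq 0$ (equal to 1 for x > 0 and to -1 for x < 0)
$\partial f/\partial x(0, y) = y/|y|$ for all $y \neq 0$ (equal to 1 for y > 0 and to -1 for y < 0)
$\partial f/\partial y(0, 0) = \lim_{y \to 0} [(f(0, y) - f(0, 0))/y] = \lim_{y \to 0} [(0 - 0)/y] = 0$
$\partial f/\partial x(0, 0) = 0$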
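But, with Df(0, 0) = [0, 0], along the path y = x,
$$\lim_{x \to 0} \frac{|f(x, x) - f(0, 0) - Df(0, 0) (x, x)^T|}{\|(x, x)\|} = \lim_{x \to 0} \frac{x^2/(2 x^2)^{1/2}}{(2 x^2)^{1/2}} = \frac{1}{2} \neq 0,$$
implying that f(x, y) is not differentiable at (x, y) = (0, 0). This is a case where the partial derivatives of f exist everywhere, although they are not continuous at (x, y) = (0, 0).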
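The failure of differentiability at the origin can be checked numerically. The sketch below evaluates the approximation error relative to the distance from the origin, using the only candidate derivative [0, 0] given by the partial derivatives at (0, 0). The helper name `ratio` and the sample step sizes are illustrative assumptions. Along an axis the ratio is zero, but along the diagonal it stays at one half, so it does not vanish as the origin is approached.

```python
import math

def f(x, y):
    # f(x, y) = x*y / sqrt(x^2 + y^2) for x != 0, and 0 for x = 0
    return x * y / math.sqrt(x * x + y * y) if x != 0 else 0.0

def ratio(x, y):
    # |f(x, y) - f(0, 0) - Df(0, 0) (x, y)^T| / ||(x, y)||, with Df(0, 0) = [0, 0]
    return abs(f(x, y) - f(0.0, 0.0)) / math.hypot(x, y)

steps = (1e-1, 1e-3, 1e-6)
diagonal = [ratio(t, t) for t in steps]    # path y = x
axis = [ratio(t, 0.0) for t in steps]      # path y = 0

print(diagonal)
print(axis)
```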
3.8. Directional derivatives
Let $f: X \to \mathbb{R}$, where X is an open subset of $\mathbb{R}^n$.
Definition: The directional derivative of f at x in the direction $h \in \mathbb{R}^n$, denoted Df(x, h), is
$Df(x, h) = \lim_{t \to 0^+} [(f(x + t h) - f(x))/t]$,
where "$t \to 0^+$" means "$t \to 0$, t > 0".
Notes: If f is differentiable at x, then
. Df(x, h) exists
. Df(x, h) = Df(x) h
. Df(x, h) = -Df(x, -h)
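The notes above can be checked on a small example. The sketch below compares a one-sided difference quotient with the product of the gradient and the direction. The quadratic function, the point x and the direction h are made up for illustration, and the step size t is an arbitrary small positive number.

```python
def f(x):
    # Hypothetical differentiable function on R^2
    return x[0] ** 2 + 3 * x[0] * x[1]

def grad(x):
    # Analytic gradient Df(x) = [2*x1 + 3*x2, 3*x1]
    return [2 * x[0] + 3 * x[1], 3 * x[0]]

def directional(f, x, h, t=1e-6):
    # One-sided difference quotient (f(x + t h) - f(x)) / t with t > 0
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(shifted) - f(x)) / t

x = [1.0, 2.0]
h = [0.5, -1.0]
minus_h = [-hi for hi in h]

exact = sum(gi * hi for gi, hi in zip(grad(x), h))   # Df(x) h
forward = directional(f, x, h)                       # approximates Df(x, h)
backward = directional(f, x, minus_h)                # approximates Df(x, -h)

print(exact, forward, backward)
```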
3.9. Higher order derivatives
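Let $f: X \to \mathbb{R}$, where X is an open subset of $\mathbb{R}^n$.
Let f be differentiable on X. Let Df be differentiable such that $\partial f/\partial x_i: X \to \mathbb{R}$ is differentiable at x, i = 1, 2, ..., n.
Denote the directional derivative of $\partial f/\partial x_i$ in the direction $e_j$ at x by
$$\frac{\partial^2 f}{\partial x_j \partial x_i}(x) = \lim_{t \to 0^+} \frac{\partial f/\partial x_i(x + t e_j) - \partial f/\partial x_i(x)}{t}.$$
The function f is twice differentiable at x with second derivatives $D^2 f(x)$ given by the $(n \times n)$ matrix
$$D^2 f(x) = \begin{bmatrix} \partial^2 f/\partial x_1 \partial x_1 & \cdots & \partial^2 f/\partial x_1 \partial x_n \\ \vdots & \ddots & \vdots \\ \partial^2 f/\partial x_n \partial x_1 & \cdots & \partial^2 f/\partial x_n \partial x_n \end{bmatrix},$$
with all elements evaluated at x.
If f is twice differentiable for all $x \in X$, then f is twice differentiable on X.
If f is twice differentiable on X and $\partial^2 f/\partial x_i \partial x_j(x)$ is a continuous function from X to $\mathbb{R}$ for i, j = 1, 2, ..., n, then f is twice continuously differentiable on X. This is denoted by $f \in C^2$, where $C^2$ is the class of twice continuously differentiable functions.
Theorem 4: (Young's theorem): If $f \in C^2$, then $D^2 f(x)$ is an $(n \times n)$ symmetric matrix, where
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x)$$
for all i, j = 1, 2, ..., n, and for all $x \in X$.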
Note: The existence of the second-order partial derivatives of f is not, by itself, enough for Young's theorem to hold in general. To see that, consider the following example.
Example: Consider
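$f(x_1, x_2) = 0$ if $(x_1, x_2) = (0, 0)$,
$= (x_1^3 x_2 - x_1 x_2^3)/(x_1^2 + x_2^2)$ otherwise.
Note that
- $\partial f/\partial x_1(0, x_2) = -x_2$
- $\partial f/\partial x_2(x_1, 0) = x_1$
- $\partial^2 f/\partial x_2 \partial x_1(0, 0) = -1$
- $\partial^2 f/\partial x_1 \partial x_2(0, 0) = 1$.
It follows that $\partial^2 f/\partial x_2 \partial x_1(0, 0) \neq \partial^2 f/\partial x_1 \partial x_2(0, 0)$, i.e. that Young's theorem does not apply at $(x_1, x_2) = (0, 0)$. Note that $\partial^2 f/\partial x_1 \partial x_2(y, y) = 0$ for any y > 0. It follows that $\partial^2 f/\partial x_1 \partial x_2(x_1, x_2)$ is not continuous at $(x_1, x_2) = (0, 0)$, i.e. that $f \notin C^2$.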
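The two mixed partial derivatives at the origin can be approximated with nested central differences, as a rough check of the values quoted above. The helper names `d1` and `d2` and the step sizes are illustrative choices. The inner step is taken much smaller than the outer step so that the first-order partials are resolved before they are differenced again.

```python
def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return (x1 ** 3 * x2 - x1 * x2 ** 3) / (x1 ** 2 + x2 ** 2)

def d1(x1, x2, h=1e-6):
    # Central difference approximation of df/dx1
    return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)

def d2(x1, x2, h=1e-6):
    # Central difference approximation of df/dx2
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

k = 1e-2  # outer step, much larger than the inner step h

# d/dx2 of (df/dx1) at the origin, i.e. d^2 f / dx2 dx1 (0, 0)
d21 = (d1(0.0, k) - d1(0.0, -k)) / (2 * k)
# d/dx1 of (df/dx2) at the origin, i.e. d^2 f / dx1 dx2 (0, 0)
d12 = (d2(k, 0.0) - d2(-k, 0.0)) / (2 * k)

print(d21, d12)
```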
3.10. Taylor Series
Let $f: X \to \mathbb{R}$, where X is an open convex subset of $\mathbb{R}$.
Let $f^{(k)}(x)$ denote the k-th derivative of f at x, k = 1, 2, ...
The following theorem gives the Taylor series expansion of f.
Theorem 5: (Taylor's theorem): Let r be a non-negative integer. Suppose that $f^{(k)}(x)$ exists and is continuous on X for k = 1, 2, ..., r, and that $f^{(r+1)}(x)$ exists for all $x \in X$. Let $x_0 \in X$ and $x \in X$. Then,
$f(x) = f(x_0) + \sum_{k=1}^{r} (1/k!) f^{(k)}(x_0) [x - x_0]^k + R_r(x, x_0)$, (TE)
where $k! = k (k-1) (k-2) \cdots (2) (1)$ is the factorial of k and the remainder term $R_r(x, x_0)$ satisfies
$R_r(x, x_0) = [1/(r+1)!] f^{(r+1)}(\theta x_0 + (1 - \theta) x) [x - x_0]^{r+1}$
for some $\theta$, $0 \leq \theta \leq 1$.
Expression (TE) is a Taylor series expansion of f at $x_0$. The mean value theorem is a special case (when r = 0).
Proof: Let $P(x) = f(x_0) + \sum_{k=1}^{r} (1/k!) f^{(k)}(x_0) [x - x_0]^k$. Note that $P(x_0) = f(x_0)$, and $P^{(k)}(x_0) = f^{(k)}(x_0)$, for k = 1, ..., r. Consider the function
$g(t) = f(t) - P(t) - M [t - x_0]^{r+1}$,
where $M = [f(x) - P(x)]/[(x - x_0)^{r+1}]$ (with $x \neq x_0$), so that g(x) = 0.
Note that $g(x_0) = 0$, and $g^{(k)}(x_0) = 0$, for k = 1, ..., r. Also note that $g^{(r+1)}(t) = f^{(r+1)}(t) - (r+1)! M$. Using these results, repeated use of the mean value theorem implies that there exists a point $x_1$ between x and $x_0$ where $g^{(r+1)}(x_1) = 0$. It follows that
$0 = g^{(r+1)}(x_1) = f^{(r+1)}(x_1) - (r+1)! M$, or
$M = [1/(r+1)!] f^{(r+1)}(x_1)$.
Substituting this result into g(t) and using g(x) = 0, this gives
$f(x) = P(x) + [1/(r+1)!] f^{(r+1)}(x_1) [x - x_0]^{r+1}$,
which is (TE).
By definition of $f^{(r+1)}(x)$, note that the remainder $R_r(x, x_0)$ in (TE) satisfies $\lim_{x \to x_0} \{R_r(x, x_0)/|x - x_0|^r\} = 0$. It follows that the function f(x) can be locally approximated in the neighborhood of $x_0$ by the r-th order Taylor series approximation
$f(x) \approx f(x_0) + \sum_{k=1}^{r} (1/k!) f^{(k)}(x_0) [x - x_0]^k$.
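A small numerical sketch of (TE) for the exponential function, chosen here because every derivative of exp is exp itself. The expansion point, the evaluation point and the order r are arbitrary illustrative values. The code compares the actual remainder with the largest value the remainder formula allows on the interval between the two points.

```python
import math

def taylor_exp(x, x0, r):
    """r-th order Taylor polynomial of exp at x0, evaluated at x.

    The k = 0 term is f(x0); each derivative f^(k)(x0) equals exp(x0).
    """
    return sum(math.exp(x0) * (x - x0) ** k / math.factorial(k)
               for k in range(r + 1))

x0, x, r = 0.0, 0.5, 3
approx = taylor_exp(x, x0, r)
remainder = math.exp(x) - approx

# Remainder formula: R_r = f^(r+1)(point between x0 and x) (x - x0)^(r+1) / (r+1)!
# Since exp is increasing, the derivative is at most exp(max(x, x0)) on the interval.
bound = math.exp(max(x, x0)) * abs(x - x0) ** (r + 1) / math.factorial(r + 1)

print(approx, remainder, bound)
```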
3.11. Inverse function theorem
Let $f: X \to \mathbb{R}^n$, $X \subset \mathbb{R}^n$.
Theorem 6: Let $f: X \to \mathbb{R}^n$ be a $C^1$ function. Let $X \subset \mathbb{R}^n$ be an open set. Let $y \in X$ such that the $(n \times n)$ matrix Df(y) is invertible. Let x = f(y). Then
1/ there exist open sets U and $V \subset \mathbb{R}^n$ such that:
. $x \in U$, $y \in V$,
. f is one-to-one on V
. f(V) = U (implying that $f: V \to U$ is onto)
2/ the inverse function $f^{-1} = g: U \to V$:
. exists and is unique such that g(f(y)) = y for all $y \in V$
. g satisfies $Dg(x) = [Df(y)]^{-1}$ for all $x \in U$, y = g(x), where Dg(x) is the $(n \times n)$ Jacobian matrix of $g = f^{-1}$ at x, and Df(y) is the $(n \times n)$ Jacobian matrix of f at y.
Notes: . In general, the Jacobian matrix is not symmetric
. det(Df(y)) is called the Jacobian of f at y.
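A one-dimensional worked example (n = 1) may help fix ideas. The function below is chosen for illustration and is not taken from the notes. It shows the derivative of the inverse equal to the reciprocal of the derivative of f, evaluated at the matching point.

```latex
\begin{aligned}
f(y) &= e^{y}, \qquad X = \mathbb{R}, \qquad Df(y) = e^{y} \neq 0 \ \text{for all } y, \\
g(x) &= f^{-1}(x) = \ln x \quad \text{on } U = (0, \infty), \\
Dg(x) &= \frac{1}{x} = \frac{1}{e^{y}} = [Df(y)]^{-1} \quad \text{with } y = g(x) = \ln x .
\end{aligned}
```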
3.12. Implicit function theorem
Let $f: X \to \mathbb{R}^n$, $X \subset \mathbb{R}^{m+n}$, $m \geq 1$, $n \geq 1$.
Let $f \in C^1$.
Let $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $(x, y) \in X$.
Let the $(n \times (m+n))$ matrix $Df(x, y) = [Df_x(x, y), Df_y(x, y)]$, where $Df_x(x, y)$ is the $(n \times m)$ matrix of the derivatives of f with respect to x, and $Df_y(x, y)$ is the $(n \times n)$ matrix of derivatives of f with respect to y.
Theorem 7: Let X be an open subset of $\mathbb{R}^{m+n}$. Let $(x^*, y^*) \in X$ such that the $(n \times n)$ matrix $Df_y(x^*, y^*)$ is invertible. Then, there exists a neighborhood $U \subset \mathbb{R}^m$ of $x^*$, and a $C^1$ function $g: U \to \mathbb{R}^n$ such that:
. $(x, g(x)) \in X$ for all $x \in U$
. $g(x^*) = y^*$
. $f(x, g(x)) = f(x^*, y^*)$ for all $x \in U$
. $Dg(x) = -[Df_y(x, y)]^{-1} Df_x(x, y)$ for all $x \in U$, where y = g(x).
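A worked example with m = n = 1 illustrates the four conclusions. The function (the unit circle) and the reference point are chosen for illustration and are not part of the notes. The neighborhood is written as a generic small interval around the reference point.

```latex
\begin{aligned}
f(x, y) &= x^{2} + y^{2}, \qquad (x^{*}, y^{*}) = (0.6, 0.8), \qquad f(x^{*}, y^{*}) = 1, \\
Df_x(x, y) &= 2x, \qquad Df_y(x, y) = 2y, \qquad Df_y(x^{*}, y^{*}) = 1.6 \neq 0, \\
g(x) &= \sqrt{1 - x^{2}} \quad \text{for } x \in U = (0.6 - \eta,\ 0.6 + \eta),\ \eta > 0 \text{ small}, \qquad g(0.6) = 0.8, \\
f(x, g(x)) &= x^{2} + (1 - x^{2}) = 1 = f(x^{*}, y^{*}), \\
Dg(x) &= -\frac{x}{\sqrt{1 - x^{2}}} = -[2y]^{-1} (2x) = -[Df_y(x, y)]^{-1} Df_x(x, y), \qquad y = g(x).
\end{aligned}
```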
3.13. Concave functions
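Let $f: X \to \mathbb{R}$, $X \subset \mathbb{R}^m$.
Let X be a convex set in $\mathbb{R}^m$.
Definition: A function $f: X \to \mathbb{R}$ is a concave function on X if, for all $x_1 \in X$ and $x_2 \in X$, and for all $\theta \in \mathbb{R}$, $0 \leq \theta \leq 1$,
$f[\theta x_1 + (1 - \theta) x_2] \geq \theta f(x_1) + (1 - \theta) f(x_2)$.
Definition: A function $f: X \to \mathbb{R}$ is strictly concave on X if, for all $x_1 \in X$ and $x_2 \in X$ with $x_1 \neq x_2$, and for all $\theta \in \mathbb{R}$, $0 < \theta < 1$,
$f[\theta x_1 + (1 - \theta) x_2] > \theta f(x_1) + (1 - \theta) f(x_2)$.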
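Let X be a convex subset of $\mathbb{R}^m$, and $f: X \to \mathbb{R}$. Then:
1/ f is a concave function if and only if
$\{(x, a) \in X \times \mathbb{R}: f(x) \geq a\}$ is a convex set in $\mathbb{R}^{m+1}$.
2/ if f is a concave function, then
for all $a \in \mathbb{R}$, $\{x \in X: f(x) \geq a\}$ is a convex set in $\mathbb{R}^m$.
3/ (Jensen's inequality) f is a concave function if and only if
for m > 1, $f[\sum_{i=1}^{m} \theta_i x_i] \geq \sum_{i=1}^{m} \theta_i f(x_i)$
whenever $x_i \in X \subset \mathbb{R}^m$, $\theta_i \in \mathbb{R}$, $\theta_i \geq 0$, i = 1, 2, ..., m, and $\sum_{i=1}^{m} \theta_i = 1$.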
4/ If X is a convex subset of $\mathbb{R}^m$ and $f: X \to \mathbb{R}$ is concave on X, then
f need not be continuous on X.
Example: f(x) = 1 + x for x > 0
= 0 for x = 0,
where $X = \{x: x \in \mathbb{R}, x \geq 0\}$. Then f is concave but not continuous on X.
5/ If X is an open convex subset of $\mathbb{R}^m$ and $f: X \to \mathbb{R}$ is concave on X, then
f is a continuous function on X.
6/ Let X be an open convex subset of $\mathbb{R}^m$, and $f \in C^1$ on X, $f: X \to \mathbb{R}$. Then f is concave on X if and only if
$f(x_2) - f(x_1) \leq Df(x_1) [x_2 - x_1]$
where $Df(x_1)$ is the $(1 \times m)$ gradient vector of f at $x_1$, and $x_1 \in X$ and $x_2 \in X$ are $(m \times 1)$ vectors.
7/ Let X be an open convex subset of $\mathbb{R}^m$, and $f \in C^1$ on X, $f: X \to \mathbb{R}$. Then f is concave on X if and only if
$[Df(x_2) - Df(x_1)] [x_2 - x_1] \leq 0$
where Df(x) is the $(1 \times m)$ gradient vector of f at x, and $x_1 \in X$ and $x_2 \in X$ are $(m \times 1)$ vectors.
8/ Let X be an open convex subset of $\mathbb{R}^m$, and $f \in C^1$ on X, $f: X \to \mathbb{R}$. Then f is strictly concave on X if and only if
$f(x_2) - f(x_1) < Df(x_1) [x_2 - x_1]$
whenever $x_1 \neq x_2$, where $Df(x_1)$ is the $(1 \times m)$ gradient vector of f at $x_1$, and $x_1 \in X$ and $x_2 \in X$ are $(m \times 1)$ vectors.
9/ Let X be an open convex subset of $\mathbb{R}^m$, and $f \in C^1$ on X, $f: X \to \mathbb{R}$. Then f is strictly concave on X if and only if
$[Df(x_2) - Df(x_1)] [x_2 - x_1] < 0$
whenever $x_1 \neq x_2$, where Df(x) is the $(1 \times m)$ gradient vector of f at x, and $x_1 \in X$ and $x_2 \in X$ are $(m \times 1)$ vectors.
10/ Let X be an open convex subset of $\mathbb{R}^m$, and $f \in C^2$ on X, $f: X \to \mathbb{R}$. Then f is concave on X if and only if
$H(x) = D^2 f(x)$ is an $(m \times m)$ negative semi-definite matrix,
for all $x \in X$, where the $(m \times m)$ matrix $D^2 f(x)$ is called the Hessian of f at x.
11/ Let X be an open convex subset of $\mathbb{R}^m$, and $f \in C^2$ on X, $f: X \to \mathbb{R}$. If the $(m \times m)$ matrix $H(x) = D^2 f(x)$ is negative definite for all $x \in X$, then the function f is strictly concave on X.
Example (the converse of 11/ does not hold): $f(x) = -x^4$, $X = \mathbb{R}$. Then $f \in C^2$ on X and f is strictly concave on X. Yet, $D^2 f(x) = -12 x^2 \leq 0$ for all $x \in X$, with $D^2 f(0) = 0$. Thus, $D^2 f(x)$ is only negative semi-definite, and not negative definite, on X.
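Results 10/ and 11/ can be illustrated on a quadratic function with a constant Hessian. In the sketch below, the function, its Hessian and the sampling range are made-up illustrations. The sign test on the leading principal minors is the standard criterion for a symmetric 2 by 2 matrix, and the random sampling only spot-checks the defining inequality of concavity; it is not a proof.

```python
import random

def f(x):
    # Hypothetical quadratic: f(x) = -x1^2 - x1*x2 - x2^2
    return -x[0] ** 2 - x[0] * x[1] - x[1] ** 2

H = [[-2.0, -1.0], [-1.0, -2.0]]  # constant Hessian D^2 f(x) of the quadratic above

def is_negative_definite_2x2(h):
    # For a symmetric 2x2 matrix: negative definite iff h11 < 0 and det(h) > 0
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return h[0][0] < 0 and det > 0

def concavity_gap(f, x1, x2, theta):
    # f(theta*x1 + (1-theta)*x2) - [theta*f(x1) + (1-theta)*f(x2)]; >= 0 if f is concave
    mix = [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]
    return f(mix) - (theta * f(x1) + (1 - theta) * f(x2))

rng = random.Random(0)  # fixed seed so the spot check is reproducible
gaps = []
for _ in range(1000):
    x1 = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    x2 = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    gaps.append(concavity_gap(f, x1, x2, rng.uniform(0, 1)))

print(is_negative_definite_2x2(H), min(gaps))
```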
Let $f: X \to \mathbb{R}$, where X is a convex subset of $\mathbb{R}^m$.
Definition: The function f is convex on X if and only if (-f) is concave on X.
Definition: The function f is strictly convex on X if and only if (-f) is strictly concave on X.
3.14. Quasi-concave function
Let $f: X \to \mathbb{R}$, where $X \subset \mathbb{R}^m$ is a convex set.
Definition: f is quasi-concave on X if
$f(x_2) \geq f(x_1)$ implies that $f[\theta x_1 + (1 - \theta) x_2] \geq f(x_1)$
for all $x_1 \in X$, $x_2 \in X$, $\theta \in \mathbb{R}$, $0 \leq \theta \leq 1$.
Definition: f is strictly quasi-concave on X if
$f(x_2) \geq f(x_1)$ implies that $f[\theta x_1 + (1 - \theta) x_2] > f(x_1)$
for all $x_1 \in X$, $x_2 \in X$, $x_1 \neq x_2$, $\theta \in \mathbb{R}$, $0 < \theta < 1$.
. Let $f: X \to \mathbb{R}$, where X is a convex subset of $\mathbb{R}^m$. Then f is quasi-concave on X if and only if
$\{x \in X: f(x) \geq b\}$ is a convex set for all $b \in \mathbb{R}$.
. A concave function is necessarily quasi-concave. However, a quasi-concave function is not necessarily concave.
. Let $f: X \to \mathbb{R}$, $f \in C^1$. Let X be an open convex subset of $\mathbb{R}^m$. Then the function f is quasi-concave on X if and only if
$f(x_2) \geq f(x_1)$ implies that $Df(x_1) [x_2 - x_1] \geq 0$
where Df(x) is the $(1 \times m)$ gradient vector of f at x, and $x_1 \in X$ and $x_2 \in X$ are $(m \times 1)$ vectors.
Definition: A function $f: X \to \mathbb{R}$ is quasi-convex on $X \subset \mathbb{R}^m$ if and only if (-f) is quasi-concave on X.
Definition: A function $f: X \to \mathbb{R}$ is strictly quasi-convex on $X \subset \mathbb{R}^m$ if and only if (-f) is strictly quasi-concave on X.
3.15. Homogeneous function
Let $f: X \to \mathbb{R}$, where $X = \{x: x \in \mathbb{R}^m, x \geq 0\}$.
Definition: A function $f: X \to \mathbb{R}$ is homogeneous of degree r if, for all $x \in X$ and $t \in \mathbb{R}$, t > 0,
$f(t x) = t^r f(x)$.
Theorem 8: Let $f: X \to \mathbb{R}$ be a homogeneous function of degree r, $f \in C^1$. Then
Df(x) is homogeneous of degree (r-1) on X.
Theorem 9: (Euler's theorem): Let $f: X \to \mathbb{R}$ be a homogeneous function of degree r, $f \in C^1$. Then
$Df(x) x = r f(x)$
for all $x \in X$, where Df(x) is the $(1 \times m)$ gradient vector, and $x \in \mathbb{R}^m$ is an $(m \times 1)$ column vector.
Definition: A function $f: X \to \mathbb{R}$ is homothetic if
$f = g \circ h$, where $h: X \to \mathbb{R}$ and $g: \mathbb{R} \to \mathbb{R}$,
h is homogeneous of degree one, and
g is a monotonic increasing function (i.e., for all $z_1, z_2 \in \mathbb{R}$,
$z_2 > z_1$ implies that $g(z_2) > g(z_1)$).
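Theorems 8 and 9 can be checked numerically on a Cobb-Douglas function. The exponents, the evaluation point and the scaling factor below are made-up illustrations; with exponents 0.3 and 0.5 the function is homogeneous of degree r = 0.8. The gradient is written out analytically rather than approximated.

```python
def f(x1, x2):
    # Hypothetical Cobb-Douglas function, homogeneous of degree 0.3 + 0.5 = 0.8
    return x1 ** 0.3 * x2 ** 0.5

def grad(x1, x2):
    # Analytic gradient Df(x) = [0.3 x1^(-0.7) x2^0.5, 0.5 x1^0.3 x2^(-0.5)]
    return (0.3 * x1 ** (-0.7) * x2 ** 0.5, 0.5 * x1 ** 0.3 * x2 ** (-0.5))

r = 0.8
x1, x2, t = 2.0, 3.0, 1.7

# Definition of homogeneity: f(t x) = t^r f(x)
scaled = f(t * x1, t * x2)

# Euler's theorem: Df(x) x = r f(x)
g1, g2 = grad(x1, x2)
euler_lhs = g1 * x1 + g2 * x2
euler_rhs = r * f(x1, x2)

# Theorem 8: Df(t x) = t^(r-1) Df(x)
s1, s2 = grad(t * x1, t * x2)

print(scaled, t ** r * f(x1, x2))
print(euler_lhs, euler_rhs)
```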