3. Review of Calculus

Let f: X → R^m be a function from X to R^m, X ⊂ R^n.

3.1. Limits of a function

Let b ∈ X, k ∈ R^m.

Definition: Assume that f is defined on a neighborhood S ⊂ R^n of b, except possibly at b. The limit of f at b, denoted by k = lim_{x→b} f(x), exists if, for all ε > 0, there is a δ > 0 such that,

if x ∈ X and 0 < ||x - b|| < δ,

then ||f(x) - k|| < ε.

Notes: . f does not have to be defined at b.

. if f(b) exists, it can be that k ≠ f(b).

. f must be defined in a neighborhood of b.

. even if f is defined on a neighborhood S containing b, the limit of f at b may not exist.

Example 1: f(x) = 0 if x ≠ 0

= 1 if x = 0.

Then: lim_{x→b} f(x) = 0 for all b ∈ R (e.g., b = 0), although

f(0) = 1.

Example 2: f(x) = 1 if x ≥ 0

= -1 if x < 0.

Then: lim_{x→0} f(x) does not exist, although

f(0) = 1.

Example 3: f(x) = sin(1/x), x ∈ R\{0}.

Then: lim_{x→0} f(x) does not exist, and

f(0) does not exist.

Alternative Definition: Assume that f is defined on a neighborhood S ⊂ R^n of b, except possibly at b. The limit of f at b, denoted by k = lim_{x→b} f(x), exists if:

k = lim_{j→∞} f(x_j) for every sequence {x_j: j = 1, 2, ...} in X such that x_j ≠ b for all j, and x_j → b.
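The sequence definition can be illustrated numerically with Example 3. The sketch below (the two sequences are chosen here for illustration and are not part of the original notes) builds two sequences converging to b = 0 along which f(x) = sin(1/x) approaches different values, so that the limit at 0 cannot exist.

```python
import math

def f(x):
    # Example 3: f(x) = sin(1/x), defined for x != 0
    return math.sin(1.0 / x)

# Two sequences x_j -> 0 with x_j != 0 for all j (illustrative choices)
seq_a = [1.0 / (j * math.pi) for j in range(1, 200)]                     # sin(j*pi) = 0
seq_b = [1.0 / (math.pi / 2 + 2 * math.pi * j) for j in range(1, 200)]   # sin(pi/2 + 2*pi*j) = 1

vals_a = [f(x) for x in seq_a]
vals_b = [f(x) for x in seq_b]

print(max(abs(v) for v in vals_a))   # close to 0, up to rounding error
print(min(vals_b))                   # close to 1, up to rounding error
```

Since f(x_j) tends to 0 along the first sequence and to 1 along the second, no single k satisfies the definition.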

Let f: X → R, g: X → R (m = 1), such that lim_{x→b} f(x) = k and lim_{x→b} g(x) = p. Then:

. the limits k and p are unique

. lim_{x→b} (f + g)(x) = k + p

. lim_{x→b} (f · g)(x) = k · p

. lim_{x→b} (f/g)(x) = k/p if p ≠ 0 and g(x) ≠ 0 for all x ∈ X

. if f(x) ≤ g(x) for all x ≠ b in some neighborhood of b, then k ≤ p.

3.2. Continuity of f

Let f: X → R^m, X ⊂ R^n.

Definition: The function f is said to be continuous at b ∈ X if, for all ε > 0, there exists a δ > 0 such that, if x ∈ X and ||x - b|| < δ,

then ||f(x) - f(b)|| < ε.

Alternative Definitions:

1/ The function f is continuous at b ∈ X if and only if, for all sequences {x_j: j = 1, 2, …} in X,

if lim_{j→∞} x_j = b, then lim_{j→∞} f(x_j) = f(b).

2/ The function f is continuous at b ∈ X if and only if, for all open sets V ⊂ R^m such that f(b) ∈ V, there is an open set U ⊂ R^n such that b ∈ U and f(z) ∈ V for all z ∈ U ∩ X.

3/ The function f is continuous at every point of X if and only if, for each open set V ⊂ R^m, there is an open set U ⊂ R^n such that f^-1(V) = U ∩ X, where f^-1(V) = {x ∈ X: f(x) ∈ V}.

Note: . f must be defined at b

. f does not have to be defined in a neighborhood of b.

Example: X = {b} = a single point. Then f is continuous at b, but f does not have a limit at b.

If f is defined at b ∈ X and in a neighborhood S ⊂ X of b, then f is continuous at b if and only if lim_{x→b} f(x) = f(b).

Definition:

If A ⊂ X, f is said to be continuous on A if f is continuous at all x ∈ A.

If A = X, f is said to be continuous if f is continuous at all x ∈ X.

Let f: X → R^m and g: X → R^m be continuous functions, X ⊂ R^n, then

. f + g is also continuous

. f · g is also continuous (for m = 1)

. f/g is also continuous (for m = 1) if g(x) ≠ 0 for all x ∈ X.

Let f: X → Y and g: Y → R^m be continuous functions, X ⊂ R^n, Y ⊂ R^k, then

g ∘ f: X → R^m is also continuous.

3.3. Intermediate Value Theorem

Let f: X → R, X ⊂ R^n.

Theorem 1: Let f be a continuous function on X, a convex subset of R^n.

Let x_1 ∈ X, x_2 ∈ X, and f(x_1) > f(x_2).

Then, given any c ∈ R such that f(x_1) > c > f(x_2), there exists a θ, 0 < θ < 1, such that

f[θ x_1 + (1-θ) x_2] = c.
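Theorem 1 is an existence result, but for a concrete continuous f the convex-combination weight can be located by bisection along the segment between the two points. The sketch below is only an illustration: the helper name `find_theta`, the test function and the points are hypothetical choices, not part of the original notes.

```python
def find_theta(f, x1, x2, c, tol=1e-12):
    """Bisection on h(theta) = f(theta*x1 + (1 - theta)*x2) - c over [0, 1]."""
    def point(theta):
        return [theta * a + (1 - theta) * b for a, b in zip(x1, x2)]

    def h(theta):
        return f(point(theta)) - c

    lo, hi = 0.0, 1.0
    # h(0) = f(x2) - c < 0 and h(1) = f(x1) - c > 0 by assumption
    assert h(lo) < 0 < h(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical example: f(x) = x_1^2 + x_2 on R^2
f = lambda x: x[0] ** 2 + x[1]
x1, x2 = [2.0, 1.0], [0.0, 0.0]        # f(x1) = 5 > f(x2) = 0
theta = find_theta(f, x1, x2, c=3.0)   # solves 4*theta^2 + theta = 3
print(theta)
```

Here the segment is {(2θ, θ): 0 ≤ θ ≤ 1}, so the equation to solve is 4θ^2 + θ = 3, whose root in (0, 1) is θ = 0.75.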

3.4. Differentiability of f

Let f: X → R^m, X being an open set in R^n.

Definition: The function f: X → R^m is differentiable at x ∈ X if there exists a (m×n) matrix A such that, for all ε > 0, there exists a δ > 0 such that,

if y ∈ X and ||x - y|| < δ,

then ||f(x) - f(y) - A · [x - y]|| ≤ ε ||x - y||.

Alternative Definition: The function f: X → R^m is differentiable at x ∈ X if there exists a (m×n) matrix A such that

lim_{y→x} ||f(y) - f(x) - A · [y - x]|| / ||y - x|| = 0,

where “y→x” means “for all sequences {y_j: j = 1, 2, …} such that y_j ≠ x and y_j → x”. If m = n = 1, this can be written as

A = lim_{y→x} [f(y) - f(x)]/[y - x].

The (m×n) matrix A is called the gradient or derivative of f at x and is denoted by Df(x). (Note that Df(x) is often written as f’(x)). If f is differentiable, then Df is a function Df: X → R^(m×n). In general, we can write f(x) as a (m×1) vector of functions

f(x) = [f_1(x), f_2(x), ..., f_m(x)]^T,

where f_i(x) denotes the i-th function, i = 1, 2, ..., m, and x = (x_1, x_2, ..., x_n) ∈ X ⊂ R^n.

Then Df can be written as the (m×n) matrix

Df(x) =
[ Df_1(x_1, .)  Df_1(x_2, .)  …  Df_1(x_n, .) ]
[ Df_2(x_1, .)  Df_2(x_2, .)  …  Df_2(x_n, .) ]
[      …             …        …       …       ]
[ Df_m(x_1, .)  Df_m(x_2, .)  …  Df_m(x_n, .) ]

where the (i, j)-th element Df_i(x_j, .) is interpreted as the derivative of f_i(x) with respect to x_j ∈ R, i = 1, 2, ..., m, j = 1, 2, ..., n.

Interpretation: Consider a linear function g(y) = b + A · y, y ∈ X. We look to approximate f at x by the linear function g(y). We want f(x) = g(x) = b + A · x, implying b = f(x) - A · x, or g(y) = A · [y - x] + f(x).

We also want {||f(y) - g(y)||/||y - x||} to be small for y in the neighborhood of x. This requires that {||f(y) - f(x) - A · [y - x]||/||y - x||} be small in the neighborhood of x.
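The linear-approximation interpretation can be checked numerically. The following sketch uses a hypothetical function f: R^2 → R^2 with a hand-computed derivative matrix (both are assumptions made for illustration) and evaluates the ratio ||f(y) - f(x) - A · [y - x]||/||y - x|| for points y approaching x.

```python
import math

def f(x):
    # Hypothetical example f: R^2 -> R^2
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def Df(x):
    # Its (2x2) derivative matrix, computed by hand
    return [[x[1], x[0]], [math.cos(x[0]), 2 * x[1]]]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def approx_ratio(x, y):
    """||f(y) - f(x) - A [y - x]|| / ||y - x|| with A = Df(x)."""
    A = Df(x)
    d = [yi - xi for yi, xi in zip(y, x)]
    lin = [sum(A[i][j] * d[j] for j in range(len(d))) for i in range(len(A))]
    err = [fy - fx - l for fy, fx, l in zip(f(y), f(x), lin)]
    return norm(err) / norm(d)

x = [1.0, 2.0]
ratios = [approx_ratio(x, [x[0] + s, x[1] - s]) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)   # the ratio shrinks as y approaches x
```

For this smooth example the ratio falls roughly in proportion to ||y - x||, which is what differentiability at x requires in the limit.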

. If f is differentiable at all points in X, then f is differentiable on X.

. A differentiable function on X is necessarily continuous on X.

. If Df: X → R^(m×n) is a continuous function, then f is continuously differentiable on X. This is denoted by f ∈ C^1, where C^1 is the class of continuously differentiable functions.

Examples of derivatives, where f: X → R, with X ⊂ R:

. f(x) = a + b x: Df(x) = b

. f(x) = a + b x + c x^2: Df(x) = b + 2c x

. f(x) = a + b x + c x^2 + d x^3: Df(x) = b + 2c x + 3d x^2

. f(x) = a + b x^c: Df(x) = b c x^(c-1)

. f(x) = a + b e^(c x): Df(x) = b c e^(c x)

. f(x) = a + b ln(c x), c x > 0: Df(x) = b/x

And assuming that the functions g and h are differentiable:

. f(x) = g(x) + h(x): Df(x) = Dg(x) + Dh(x)

. f(x) = g(x) · h(x): Df(x) = Dg(x) · h(x) + g(x) · Dh(x) (the “product rule”)

. f(x) = g(x)/h(x), h(x) ≠ 0: Df(x) = Dg(x)/h(x) - Dh(x) · g(x)/[h(x)]^2 (the “quotient rule”)

. f(x) = g(h(x)): Df(x) = Dg(h(x)) · Dh(x) (the “chain rule”)

. f(x) = e^g(x): Df(x) = e^g(x) · Dg(x)

. f(x) = ln(g(x)), g(x) > 0: Df(x) = Dg(x)/g(x)

. f(x) = sin(g(x)): Df(x) = cos(g(x)) · Dg(x)

. f(x) = cos(g(x)): Df(x) = -sin(g(x)) · Dg(x)

Note: Let y be a (n×1) vector, x be a (n×1) vector, and A be a (n×n) matrix.

. f(x) = y^T x, then Df(x) = y^T

. f(x) = x^T A x, A being symmetric, then Df(x) = 2 x^T A

. f(x) = A x, then Df(x) = A.
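The rule for the quadratic form can be verified against finite differences. The sketch below is an illustration under stated assumptions: the matrix A and the point x are arbitrary choices, and matrices are stored as plain lists of rows.

```python
def quad_form(A, x):
    """f(x) = x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def grad_formula(A, x):
    """Row vector 2 x^T A (valid when A is symmetric)."""
    n = len(x)
    return [2 * sum(x[i] * A[i][j] for i in range(n)) for j in range(n)]

def grad_numeric(A, x, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((quad_form(A, xp) - quad_form(A, xm)) / (2 * h))
    return g

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, chosen for illustration
x = [1.0, -2.0]
print(grad_formula(A, x))      # 2 x^T A
print(grad_numeric(A, x))      # should agree up to rounding error
```

If A were not symmetric, the same finite-difference check would instead match x^T (A + A^T), which reduces to 2 x^T A in the symmetric case.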

Example: Consider

f(x) = x^2 sin(1/x^2) if x ≠ 0

= 0 if x = 0.

It follows that

Df(x) = 2x sin(1/x^2) - (2/x) cos(1/x^2) for x ≠ 0.

Note that the right-hand side of the above expression is not well defined at x = 0.

Yet, Df(0) = lim_{x→0} [(f(x) - f(0))/x] = lim_{x→0} [x sin(1/x^2)] = 0.

The function f is differentiable everywhere, but Df is not continuous at x = 0. Thus f ∉ C^1.
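A short numerical sketch makes the two claims of this example concrete: the difference quotients at 0 shrink to 0, while Df is unbounded near 0. The particular sequence used to approach 0 is an illustrative choice, not part of the original notes.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / (x * x)) if x != 0 else 0.0

def Df(x):
    # Formula valid for x != 0 only
    return 2 * x * math.sin(1 / x**2) - (2 / x) * math.cos(1 / x**2)

# Difference quotients at 0: (f(t) - f(0))/t = t sin(1/t^2), bounded by |t|
steps = (1e-1, 1e-2, 1e-3)
quotients = [(f(t) - f(0.0)) / t for t in steps]

# Along x_j = 1/sqrt(2*pi*j) -> 0: sin(1/x_j^2) = 0 and cos(1/x_j^2) = 1,
# so Df(x_j) = -2/x_j diverges to minus infinity
xs = [1 / math.sqrt(2 * math.pi * j) for j in (1, 100, 10000)]
derivs = [Df(x) for x in xs]
print(quotients)
print(derivs)
```

The derivative values grow without bound in absolute value as x_j approaches 0, so Df cannot be continuous at 0 even though Df(0) = 0 exists.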

3.5. Intermediate value theorem for Df(x)

Let f: X → R, X ⊂ R.

Theorem 2: Let f be differentiable on X, a convex subset of R. Let x_1 ∈ X, x_2 ∈ X, and Df(x_1) < Df(x_2). Then, given any c ∈ R such that Df(x_1) < c < Df(x_2), there exists a θ ∈ R, 0 < θ < 1, such that Df[θ x_1 + (1-θ) x_2] = c.

Notes: . This result does not assume that f ∈ C^1.

. The theorem implies that the derivative has some "minimal continuity properties" even if it is not continuous: it cannot have "jump discontinuities".

. The theorem obviously holds if f ∈ C^1.

3.6. Mean value theorem

Let f: X → R, X ⊂ R^n.

Theorem 3: Let f be differentiable on X, an open convex subset of R^n. Let x_1 ∈ X and x_2 ∈ X be (n×1) vectors. Then, there exists a θ ∈ R, 0 ≤ θ ≤ 1, such that

f(x_2) - f(x_1) = Df(θ x_1 + (1-θ) x_2) · [x_2 - x_1]

where Df(x) is a (1×n) vector of the derivative of f at x.

Note: f ∈ C^1 is not required for the theorem to hold.

Proof (for n = 1, with x_1 ≠ x_2): Consider the function

g(x) = f(x_2) - f(x) + [(f(x_2) - f(x_1))/(x_2 - x_1)] (x - x_2).

Note that g(x_1) = 0 and g(x_2) = 0.

If g(x) is constant between x_1 and x_2, then Dg(x) = 0 for any x between x_1 and x_2.

If g(x) is not constant between x_1 and x_2, it attains either a maximum or a minimum strictly between x_1 and x_2, since g is continuous and takes the same value at both end points. If it attains a maximum, let that maximum be at point x_3. Under differentiability, at x_3 between x_1 and x_2, Dg(x_3) = 0 (otherwise x_3 would not be a maximum). If it attains a minimum, let that minimum be at point x_4. Under differentiability, at x_4 between x_1 and x_2, Dg(x_4) = 0 (otherwise x_4 would not be a minimum).

Thus, in any situation, there is a point x_0 between x_1 and x_2 that satisfies Dg(x_0) = 0. Differentiating g(x) yields

Dg(x) = -Df(x) + (f(x_2) - f(x_1))/(x_2 - x_1).

Evaluating this expression at x_0 gives Df(x_0) · [x_2 - x_1] = f(x_2) - f(x_1), which is the desired result.

3.7. Partial Derivatives

Let f: X → R, where X is an open set in R^n. Let e_i be the (n×1) unit vector e_i = (e_i1, e_i2, ..., e_in)^T such that e_ij = 0 if i ≠ j

= 1 if i = j.

Definition: The i-th partial derivative of f at x ∈ X (or the partial derivative of f with respect to x_i at x) is

∂f/∂x_i(x) = lim_{t→0} [(f(x + t e_i) - f(x))/t], i = 1, 2, ..., n.

. If f is differentiable at x, then all partial derivatives ∂f/∂x_i exist at x, i = 1, 2, ..., n, and Df(x) = [∂f/∂x_1(x), ∂f/∂x_2(x), …, ∂f/∂x_n(x)] is a (1×n) vector where ∂f/∂x_i is the partial derivative of f with respect to x_i.

. If all partial derivatives of f exist and are continuous at x, then Df(x) exists and the (1×n) gradient vector can be written as Df(x) = [∂f/∂x_1(x), ∂f/∂x_2(x), …, ∂f/∂x_n(x)].

. f ∈ C^1 if and only if all partial derivatives of f exist and are continuous on X.

Note: All partial derivatives of a differentiable function always exist. However, the existence of partial derivatives does not imply that the function is differentiable.

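Example: f(x, y) = x y/(x^2 + y^2)^(1/2) for (x, y) ≠ (0, 0)

= 0 for (x, y) = (0, 0).

We have:

∂f/∂y(x, 0) = x/(x^2)^(1/2) = x/|x| for all x ≠ 0 (equal to 1 for x > 0 and to -1 for x < 0)

∂f/∂x(0, y) = y/|y| for all y ≠ 0 (equal to 1 for y > 0 and to -1 for y < 0)

∂f/∂y(0, 0) = lim_{y→0} [(f(0, y) - f(0, 0))/y] = lim_{y→0} [(0 - 0)/y] = 0

∂f/∂x(0, 0) = 0

But, with Df(0, 0) = [0, 0], along the line y = x,

|f(x, x) - f(0, 0) - Df(0, 0) · (x, x)^T| / ||(x, x)|| = [x^2/(2 x^2)^(1/2)] / (2 x^2)^(1/2) = 1/2 for all x ≠ 0,

which does not converge to 0 as x → 0, implying that f(x, y) is not differentiable at (0, 0). This is a case where the partial derivatives of f exist everywhere, although they are not continuous at (x, y) = (0, 0).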
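The failure of differentiability at the origin can be seen numerically. The sketch below (a minimal illustration; the step sizes are arbitrary choices) evaluates the partial derivatives at (0, 0) by difference quotients and then the linear-approximation error ratio along the line y = x, taking Df(0, 0) = [0, 0].

```python
import math

def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x * y / math.sqrt(x * x + y * y)

t = 1e-6
# Partial derivatives at the origin via difference quotients
fx0 = (f(t, 0.0) - f(0.0, 0.0)) / t
fy0 = (f(0.0, t) - f(0.0, 0.0)) / t

# With Df(0, 0) = [0, 0], differentiability at the origin would require
# |f(x, y) - f(0, 0)| / ||(x, y)|| -> 0 along every path to (0, 0)
ratios = [abs(f(s, s)) / math.sqrt(2 * s * s) for s in (1e-1, 1e-3, 1e-6)]
print(fx0, fy0)
print(ratios)   # stays near 1/2 instead of shrinking
```

Both partial derivatives at the origin are 0, yet the error ratio along y = x does not shrink, so no matrix A can satisfy the definition of differentiability at (0, 0).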

3.8. Directional derivatives

Let f: X → R, where X is an open subset of R^n.

Definition: The directional derivative of f at x in the direction h ∈ R^n, denoted Df(x, h), is

Df(x, h) = lim_{t→0+} [(f(x + t · h) - f(x))/t],

where “t→0+” means “t→0, t > 0”.

Notes: If f is differentiable at x, then

. Df(x, h) exists

. Df(x, h) = Df(x) · h

. Df(x, h) = -Df(x, -h)
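These properties can be illustrated with one-sided difference quotients. In the sketch below, the helper `directional` and both test functions are hypothetical choices: one function is differentiable, the other has a kink at which directional derivatives exist but the sign-reversal property fails.

```python
def directional(f, x, h, t=1e-7):
    """One-sided difference quotient approximating Df(x, h)."""
    xt = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(xt) - f(x)) / t

smooth = lambda x: x[0] ** 2 + 3 * x[1]   # differentiable, Df(x) = [2 x_1, 3]
kink = lambda x: abs(x[0])                # not differentiable where x_1 = 0

x, h = [1.0, 2.0], [0.5, -1.0]
neg_h = [-hi for hi in h]
d_smooth = directional(smooth, x, h)           # Df(x) . h = 2*0.5 + 3*(-1) = -2
d_smooth_neg = directional(smooth, x, neg_h)   # about +2 = -Df(x, h)

origin = [0.0, 0.0]
d_kink = directional(kink, origin, [1.0, 0.0])        # equals 1
d_kink_neg = directional(kink, origin, [-1.0, 0.0])   # also 1, not -1
print(d_smooth, d_smooth_neg, d_kink, d_kink_neg)
```

For the kinked function both one-sided directional derivatives at the origin equal 1, so Df(x, h) = -Df(x, -h) fails there, consistent with the function not being differentiable at that point.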

3.9. Higher order derivatives

Let f: X → R, where X is an open subset of R^n.

Let f be differentiable on X. Let Df be differentiable such that ∂f/∂x_i: X → R is differentiable at x, i = 1, 2, …, n.

Denote the directional derivative of ∂f/∂x_i in the direction e_j at x by

∂^2f/∂x_j∂x_i(x) = lim_{t→0} [(∂f/∂x_i(x + t e_j) - ∂f/∂x_i(x))/t], i, j = 1, 2, …, n.

The function f is twice differentiable at x with second derivatives D^2f(x) given by the (n×n) matrix

D^2f(x) =
[ ∂^2f/∂x_1∂x_1(x)  ∂^2f/∂x_1∂x_2(x)  …  ∂^2f/∂x_1∂x_n(x) ]
[ ∂^2f/∂x_2∂x_1(x)  ∂^2f/∂x_2∂x_2(x)  …  ∂^2f/∂x_2∂x_n(x) ]
[        …                 …          …         …         ]
[ ∂^2f/∂x_n∂x_1(x)  ∂^2f/∂x_n∂x_2(x)  …  ∂^2f/∂x_n∂x_n(x) ]

If f is twice differentiable for all x ∈ X, then f is twice differentiable on X.

If f is twice differentiable on X and ∂^2f/∂x_i∂x_j(x) is a continuous function from X to R for i, j = 1, 2, …, n, then f is twice continuously differentiable on X. This is denoted by f ∈ C^2, where C^2 is the class of twice continuously differentiable functions.

Theorem 4: (Young theorem): If f ∈ C^2, then D^2f(x) is a (n×n) symmetric matrix, where

∂^2f/∂x_i∂x_j(x) = ∂^2f/∂x_j∂x_i(x)

for all i, j = 1, 2, …, n, and for all x ∈ X.

Note: The function f being just twice differentiable is not enough for Young theorem to hold in general. To see that, consider the following example.

Example: Consider

f(x_1, x_2) = 0 if (x_1, x_2) = (0, 0),

= (x_1^3 x_2 - x_1 x_2^3)/(x_1^2 + x_2^2) otherwise.

Note that

  • ∂f/∂x_1(0, x_2) = -x_2
  • ∂f/∂x_2(x_1, 0) = x_1
  • ∂^2f/∂x_2∂x_1(0, 0) = -1
  • ∂^2f/∂x_1∂x_2(0, 0) = 1.

It follows that ∂^2f/∂x_2∂x_1(0, 0) ≠ ∂^2f/∂x_1∂x_2(0, 0), i.e. that Young theorem does not apply at (x_1, x_2) = (0, 0). Note that ∂^2f/∂x_1∂x_2(y, y) = 0 for any y > 0. It follows that ∂^2f/∂x_1∂x_2(x_1, x_2) is not continuous at (x_1, x_2) = (0, 0), i.e. that f ∉ C^2.
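The asymmetry of the mixed partial derivatives at the origin can be reproduced with nested finite differences. This is a rough numerical sketch: the step sizes h and k are assumptions, with the outer step deliberately much larger than the inner one so that the inner quotient approximates the first partial derivative before the outer quotient is taken.

```python
def f(x1, x2):
    if x1 == 0 and x2 == 0:
        return 0.0
    return (x1**3 * x2 - x1 * x2**3) / (x1**2 + x2**2)

def d1(x1, x2, h=1e-6):
    """Central-difference estimate of the partial derivative with respect to x_1."""
    return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)

def d2(x1, x2, h=1e-6):
    """Central-difference estimate of the partial derivative with respect to x_2."""
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

k = 1e-3   # outer step, much larger than the inner step h
d21 = (d1(0.0, k) - d1(0.0, -k)) / (2 * k)   # derivative in x_2 of df/dx_1 at (0, 0)
d12 = (d2(k, 0.0) - d2(-k, 0.0)) / (2 * k)   # derivative in x_1 of df/dx_2 at (0, 0)
print(d21, d12)   # approximately -1 and +1
```

The two estimates have opposite signs, matching the values -1 and 1 obtained analytically.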

3.10. Taylor Series

Let f: X → R, where X is an open convex subset of R.

Let f^(k)(x) denote the k-th derivative of f at x, k = 1, 2, …

The following theorem gives the Taylor series expansion of f.

Theorem 5: (Taylor theorem): Let r be a non-negative integer. Suppose that f^(k)(x) exists and is continuous on X for k = 1, 2, …, r, and that f^(r+1)(x) exists for all x ∈ X. Let x_0 ∈ X and x ∈ X. Then,

f(x) = f(x_0) + ∑_{k=1,…,r} (1/k!) · f^(k)(x_0) · [x - x_0]^k + R_r(x, x_0), (TE)

where k! = [k (k-1) (k-2)…(2) (1)] is the factorial of k and the remainder term R_r(x, x_0) satisfies

R_r(x, x_0) = 1/[(r+1)!] · f^(r+1)(θ x_0 + (1-θ) x) · [x - x_0]^(r+1)

for some θ, 0 ≤ θ ≤ 1.

Expression (TE) is a Taylor series expansion of f at x_0. The mean value theorem is a special case (when r = 0).

Proof: Let P(x) = f(x_0) + ∑_{k=1,…,r} (1/k!) · f^(k)(x_0) · [x - x_0]^k. Note that P(x_0) = f(x_0), and P^(k)(x_0) = f^(k)(x_0), for k = 1, …, r. For x ≠ x_0, consider the function

g(t) = f(t) - P(t) - M [t - x_0]^(r+1),

where M = [f(x) - P(x)]/[(x - x_0)^(r+1)] so that g(x) = 0.

Note that g(x_0) = 0, and g^(k)(x_0) = 0, for k = 1, …, r. Also note that g^(r+1)(t) = f^(r+1)(t) - (r+1)! M. Using these results, repeated use of the mean value theorem implies that there exists a point x_1 between x and x_0 where g^(r+1)(x_1) = 0. It follows that

0 = g^(r+1)(x_1) = f^(r+1)(x_1) - (r+1)! M, or

M = [1/(r+1)!] f^(r+1)(x_1).

Substituting this result into g(t) and using g(x) = 0, this gives

f(x) = P(x) + [1/(r+1)!] f^(r+1)(x_1) [x - x_0]^(r+1),

which is (TE).

By definition of f^(r+1)(x), note that the remainder R_r(x, x_0) in (TE) satisfies lim_{x→x_0} {R_r(x, x_0)/|x - x_0|^r} = 0. It follows that the function f(x) can be locally approximated in the neighborhood of x_0 by the r-th order Taylor series approximation

f(x) ≈ f(x_0) + ∑_{k=1,…,r} (1/k!) · f^(k)(x_0) · [x - x_0]^k.
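As a small numerical illustration of the Taylor approximation and its remainder, the sketch below uses f(x) = exp(x), chosen here because every derivative of exp is exp itself (the function, the expansion point and the evaluation point are illustrative assumptions). It compares the actual remainder with the bound implied by the remainder formula.

```python
import math

def taylor_exp(x, x0, r):
    """r-th order Taylor polynomial of exp around x0 (every derivative of exp is exp)."""
    return sum(math.exp(x0) * (x - x0) ** k / math.factorial(k) for k in range(r + 1))

x0, x = 0.0, 0.5
for r in range(5):
    approx = taylor_exp(x, x0, r)
    remainder = math.exp(x) - approx
    # Remainder formula: exp(xi) (x - x0)^(r+1) / (r+1)! for some xi between x0 and x,
    # so it is bounded by its value at the end point where exp is largest
    bound = math.exp(max(x, x0)) * abs(x - x0) ** (r + 1) / math.factorial(r + 1)
    print(r, approx, remainder, bound)
```

For each order r the actual remainder lies below the bound, and both shrink quickly as r increases.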

3.11. Inverse function theorem

Let f: X → R^n, X ⊂ R^n.

Theorem 6: Let f: X → R^n be a C^1 function. Let X ⊂ R^n be an open set. Let y ∈ X such that the (n×n) matrix Df(y) is invertible. Let x = f(y). Then

1/ there exist open sets U ⊂ R^n and V ⊂ R^n such that:

. x ∈ U, y ∈ V,

. f is one-to-one on V

. f(V) = U (implying that f: V → U is onto)

2/ the inverse function f^-1 = g: U → V:

. exists and is unique such that g(f(y)) = y for all y ∈ V

. g satisfies Dg(x) = [Df(y)]^-1 for all x ∈ U, y = g(x), where Dg(x) is the (n×n) Jacobian matrix of g = f^-1 at x, and Df(y) is the (n×n) Jacobian matrix of f at y.

Notes: . In general, the Jacobian matrix is not symmetric

. det(Df(y)) is called the Jacobian of f at y.

3.12. Implicit function theorem

Let f: X → R^n, X ⊂ R^(m+n), m ≥ 1, n ≥ 1.

Let f ∈ C^1.

Let x ∈ R^m, y ∈ R^n, (x, y) ∈ X.

Let the (n×(m+n)) matrix Df(x, y) = [Df_x(x, y), Df_y(x, y)], where Df_x(x, y) is the (n×m) matrix of the derivatives of f with respect to x, and Df_y(x, y) is the (n×n) matrix of the derivatives of f with respect to y.

Theorem 7: Let X be an open subset of R^(m+n). Let (x*, y*) ∈ X such that the (n×n) matrix Df_y(x*, y*) is invertible. Then, there exists a neighborhood U ⊂ R^m of x*, and a C^1 function g: U → R^n such that:

. (x, g(x)) ∈ X for all x ∈ U

. g(x*) = y*

. f(x, g(x)) = f(x*, y*) for all x ∈ U

. Dg(x) = -[Df_y(x, y)]^-1 Df_x(x, y) for all x ∈ U, where y = g(x).
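A one-dimensional case (m = n = 1) where the implicit function is known in closed form gives a simple check of the derivative formula. The function f and the point (x*, y*) below are hypothetical choices made for illustration.

```python
import math

def f(x, y):
    # Hypothetical example: the unit circle is the level set f(x, y) = 1
    return x * x + y * y

def g(x):
    # Explicit solution of f(x, g(x)) = 1 near (x*, y*) = (0.6, 0.8)
    return math.sqrt(1.0 - x * x)

x_star, y_star = 0.6, 0.8
Dfx, Dfy = 2 * x_star, 2 * y_star      # derivatives of f with respect to x and y at (x*, y*)
dg_formula = -Dfx / Dfy                # -[Df_y]^-1 Df_x = -x*/y*

h = 1e-6
dg_numeric = (g(x_star + h) - g(x_star - h)) / (2 * h)   # central difference on g
print(dg_formula, dg_numeric)          # both close to -0.75
```

The theorem applies because Df_y(x*, y*) = 1.6 is nonzero. At the points (1, 0) and (-1, 0), where Df_y = 0, the circle cannot be written locally as y = g(x).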

3.13. Concave functions

Let f: X → R, X ⊂ R^m.

Let X be a convex set in R^m.

Definition: A function f: X → R is a concave function on X if, for all x_1 ∈ X and x_2 ∈ X, and for all θ ∈ R, 0 ≤ θ ≤ 1,

f[θ x_1 + (1-θ) x_2] ≥ θ f(x_1) + (1-θ) f(x_2).

Definition: A function f: X → R is strictly concave on X if, for all x_1 ∈ X and x_2 ∈ X, x_1 ≠ x_2, and for all θ ∈ R, 0 < θ < 1,

f[θ x_1 + (1-θ) x_2] > θ f(x_1) + (1-θ) f(x_2).

Let X be a convex subset of R^m, and f: X → R. Then:

1/ f is a concave function if and only if

{(x, a) ∈ X×R: f(x) ≥ a} is a convex set in R^(m+1).

2/ if f is a concave function, then

for all a ∈ R, {x ∈ X: f(x) ≥ a} is a convex set in R^m.

3/ (Jensen’s inequality) f is a concave function if and only if

for k > 1, f[∑_{i=1,…,k} θ_i x_i] ≥ ∑_{i=1,…,k} θ_i f(x_i)

whenever x_i ∈ X ⊂ R^m, θ_i ∈ R, θ_i ≥ 0, i = 1, 2, …, k, and ∑_{i=1,…,k} θ_i = 1.

4/ If X is a convex subset of R^m and f: X → R is concave on X, then

f need not be continuous on X.

Example: f(x) = 1 + x for x > 0

= 0 for x = 0,

where X = {x: x ∈ R, x ≥ 0}. Then, f is concave but not continuous on X.

5/ If X is an open convex subset of R^m and f: X → R is concave on X, then

f is a continuous function on X.

6/ Let X be an open convex subset of R^m, and f ∈ C^1 on X, f: X → R. Then f is concave on X if and only if

f(x_2) - f(x_1) ≤ Df(x_1) · [x_2 - x_1]

where Df(x_1) is the (1×m) gradient vector of f at x_1, and x_1 ∈ X and x_2 ∈ X are (m×1) vectors.

7/ Let X be an open convex subset of R^m, and f ∈ C^1 on X, f: X → R. Then f is concave on X if and only if

[Df(x_2) - Df(x_1)] · [x_2 - x_1] ≤ 0

where Df(x) is the (1×m) gradient vector of f at x, and x_1 ∈ X and x_2 ∈ X are (m×1) vectors.

8/ Let X be an open convex subset of R^m, and f ∈ C^1 on X, f: X → R. Then f is strictly concave on X if and only if

f(x_2) - f(x_1) < Df(x_1) · [x_2 - x_1]

whenever x_1 ≠ x_2, where Df(x_1) is the (1×m) gradient vector of f at x_1, and x_1 ∈ X and x_2 ∈ X are (m×1) vectors.

9/ Let X be an open convex subset of R^m, and f ∈ C^1 on X, f: X → R. Then f is strictly concave on X if and only if

[Df(x_2) - Df(x_1)] · [x_2 - x_1] < 0

whenever x_1 ≠ x_2, where Df(x) is the (1×m) gradient vector of f at x, and x_1 ∈ X and x_2 ∈ X are (m×1) vectors.

10/ Let X be an open convex subset of R^m, and f ∈ C^2 on X, f: X → R. Then f is concave on X if and only if

H(x) = D^2f(x) is a (m×m) negative semi-definite matrix,

for all x ∈ X, where the (m×m) matrix D^2f(x) is called the Hessian of f at x.

11/ Let X be an open convex subset of R^m, and f ∈ C^2 on X, f: X → R. If the (m×m) matrix H(x) = D^2f(x) is negative definite for all x ∈ X, then the function f is strictly concave on X.

Example: f(x) = -x^4, X = R. Then f ∈ C^2 on X and f is strictly concave on X. Yet, D^2f(x) = -12 x^2 ≤ 0 for all x ∈ X, with D^2f(0) = 0. Thus, D^2f(x) is only negative semi-definite, and not negative definite on X: the condition in 11/ is sufficient but not necessary for strict concavity.
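The example can be checked on a finite grid. The sketch below is only a spot check on a hypothetical grid of points and weights, not a proof: it tests the strict concavity inequality for all pairs of distinct grid points and separately tests whether the second derivative is strictly negative at every grid point.

```python
def f(x):
    return -x ** 4

def D2f(x):
    return -12 * x ** 2

grid = [i / 4 for i in range(-8, 9)]   # points in [-2, 2], including 0
thetas = [0.25, 0.5, 0.75]

# Strict concavity inequality for all pairs of distinct grid points
strictly_concave_on_grid = all(
    f(t * x1 + (1 - t) * x2) > t * f(x1) + (1 - t) * f(x2)
    for x1 in grid for x2 in grid if x1 != x2 for t in thetas
)

# Negative definiteness (here: D2f(x) < 0) fails at x = 0
negative_definite_everywhere = all(D2f(x) < 0 for x in grid)
print(strictly_concave_on_grid, negative_definite_everywhere)
```

The first check passes while the second fails at x = 0, in line with the example.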

Let f: X → R, where X is a convex subset of R^m.

Definition: The function f is convex on X if and only if (-f) is concave on X.

Definition: The function f is strictly convex on X if and only if (-f) is strictly concave on X.

3.14. Quasi-concave function

Let f: X → R, where X ⊂ R^m is a convex set.

Definition: f is quasi-concave on X if

f(x_2) ≥ f(x_1) implies that f[θ x_1 + (1-θ) x_2] ≥ f(x_1)

for all x_1 ∈ X, x_2 ∈ X, θ ∈ R, 0 ≤ θ ≤ 1.

Definition: f is strictly quasi-concave on X if

f(x_2) ≥ f(x_1) implies that f[θ x_1 + (1-θ) x_2] > f(x_1)

for all x_1 ∈ X, x_2 ∈ X, x_1 ≠ x_2, θ ∈ R, 0 < θ < 1.

. Let f: X → R, where X is a convex subset of R^m. Then f is quasi-concave on X if and only if

{x ∈ X: f(x) ≥ b} is a convex set for all b ∈ R.

. A concave function is necessarily quasi-concave. However, a quasi-concave function is not necessarily concave.

. Let f: X → R, f ∈ C^1. Let X be an open convex subset of R^m. Then the function f is quasi-concave on X if and only if

f(x_2) ≥ f(x_1) implies that Df(x_1) · [x_2 - x_1] ≥ 0

where Df(x) is the (1×m) gradient vector of f at x, and x_1 ∈ X and x_2 ∈ X are (m×1) vectors.

Definition: A function f: X → R is quasi-convex on X ⊂ R^m if and only if (-f) is quasi-concave on X.

Definition: A function f: X → R is strictly quasi-convex on X ⊂ R^m if and only if (-f) is strictly quasi-concave on X.

3.15. Homogeneous function

Let f: X → R, where X = {x: x ∈ R^m, x ≥ 0}.

Definition: A function f: X → R is homogeneous of degree r if, for all x ∈ X and t ∈ R, t > 0,

f(t x) = t^r f(x).

Theorem 8: Let f: X → R be a homogeneous function of degree r, f ∈ C^1. Then

Df(x) is homogeneous of degree (r-1) on X.

Theorem 9: (Euler theorem): Let f: X → R be a homogeneous function of degree r, f ∈ C^1. Then

Df(x) · x = r f(x)

for all x ∈ X, where Df(x) is the (1×m) gradient vector, and x ∈ R^m is a (m×1) column vector.
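Both theorems can be checked numerically on a simple homogeneous function. The sketch below assumes a hypothetical Cobb-Douglas form with exponents 0.3 and 0.5 (so r = 0.8) and a hand-computed gradient; the evaluation point and the scaling factor t are arbitrary.

```python
def f(x, a=0.3, b=0.5):
    # Hypothetical Cobb-Douglas function, homogeneous of degree r = a + b
    return x[0] ** a * x[1] ** b

def Df(x, a=0.3, b=0.5):
    # Gradient computed by hand
    return [a * x[0] ** (a - 1) * x[1] ** b, b * x[0] ** a * x[1] ** (b - 1)]

x, t, r = [2.0, 3.0], 1.7, 0.8
tx = [t * xi for xi in x]

homog_gap = f(tx) - t ** r * f(x)                               # f(t x) - t^r f(x)
euler_gap = sum(d * xi for d, xi in zip(Df(x), x)) - r * f(x)   # Df(x) x - r f(x)
grad_gap = max(abs(u - t ** (r - 1) * v)                        # Df(t x) - t^(r-1) Df(x)
               for u, v in zip(Df(tx), Df(x)))
print(homog_gap, euler_gap, grad_gap)   # all zero up to rounding error
```

All three gaps vanish up to floating-point rounding, consistent with the definition of homogeneity, Theorem 8 and Theorem 9.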

Definition: A function f: X → R is homothetic if

f = g ∘ h, where h: X → R and g: R → R,

h is homogeneous of degree one, and

g: R → R is a monotonic increasing function (i.e., for all z_1, z_2 ∈ R,

z_2 > z_1 implies that g(z_2) > g(z_1)).