Study of a fast Fourier transform algorithm

In this note I want to record my experience with manually implementing a fast Fourier transform algorithm. FFTs are usually implemented using in-built functions in standard mathematical software packages such as MATLAB but I decided to try to carry out the computations manually from first principles to obtain a more in-depth understanding of the calculations involved. I expected this to be a difficult task as the underlying equations look formidable at first sight, but it turned out to be relatively straightforward and quite instructive.

A number of different FFT algorithms have been developed and the underlying equations are provided in numerous journal articles and books. I chose to implement an 8-point (real-valued) FFT using the equations and approach discussed in the book Approximation Theory and Methods by M. J. D. Powell. At the time of writing, the relevant pages are freely available online and I will display some of the relevant sections below.

The classical Fourier analysis problem is to express a periodic function f(x) of period 2\pi in the form

f(x) = \frac{1}{2}a_0 + a_1 \cos(x) + a_2 \cos(2x) + \cdots + b_1 \sin(x) + b_2 \sin(2x) + \cdots

In the context of approximation theory the above Fourier series is truncated so that some n > 0 is the highest value of j for which at least one of a_j or b_j is non-zero, i.e.,

q(x) = \frac{1}{2}a_0 + \sum_{j=1}^n\big[a_j \cos(jx) + b_j \sin(jx) \big]

We then regard q(x) as a trigonometric polynomial of degree n which approximates the function f(x) (this approximation can be shown to minimise a least squares distance function).

The task is to find formulas for the coefficients a_j and b_j and this is achieved using the following formulas for the average values over a period of \sin(jx) \cos(kx), \sin(jx) \sin(kx) and \cos(jx) \cos(kx) respectively:

\frac{1}{2 \pi} \int_{-\pi}^{\pi} \sin(jx) \cos(kx) dx = 0

\frac{1}{2 \pi} \int_{-\pi}^{\pi} \sin(jx) \sin(kx) dx  = \left \{ \begin{array}{c c}  0 & j \neq k \\  \frac{1}{2} & j = k \neq 0 \\  0 & j = k = 0  \end{array} \right.

\frac{1}{2 \pi} \int_{-\pi}^{\pi} \cos(jx) \cos(kx) dx  = \left \{ \begin{array}{c c}  0 & j \neq k \\  \frac{1}{2} & j = k \neq 0 \\  1 & j = k = 0  \end{array} \right.

Then to find the value of the coefficient a_0 in the Fourier series expansion of f(x) we take the average value of the series by integrating on (-\pi, \pi) term by term. We get

\frac{1}{2 \pi} \int_{-\pi}^{\pi} f(x) dx = \frac{1}{2 \pi} \int_{-\pi}^{\pi} \frac{1}{2} a_0 dx

with all the other terms on the right-hand side equal to zero because they can all be regarded as integrals of \sin(kx) \cos(jx) or of \cos(jx) \cos(kx) with j = 0 and k \neq 0 (i.e., j \neq k). We are thus left with

a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) dx

This is the formula for calculating the coefficient a_0 in the Fourier series expansion of a periodic function f(x).

To find the coefficient a_1 we multiply both sides of the generic Fourier series expansion by \cos(x) to get

f(x)\cos(x) = \frac{1}{2}a_0\cos(x) + a_1 \cos^2(x) + a_2 \cos(2x)\cos(x) + \cdots

+ b_1 \sin(x)\cos(x) + b_2 \sin(2x)\cos(x) + \cdots

and again find the average value of each term. This time, all the terms on the right-hand side are zero except the one corresponding to a_1 so we get

\frac{1}{2 \pi} \int_{-\pi}^{\pi} f(x)\cos(x) dx = a_1 \frac{1}{2 \pi} \int_{-\pi}^{\pi} \cos^2(x) dx = \frac{1}{2}a_1

and thus

a_1 = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(x) dx

In general, to find a_j we multiply both sides of the generic Fourier series expansion by \cos(jx) and again find the average value of each term to get

a_j = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(jx) dx

Analogously, to obtain the formula for b_j we multiply both sides of the generic Fourier series expansion by \sin(jx) and take average values. We find

b_j = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(jx) dx
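As a quick illustration of these coefficient formulas (my own example, not from Powell), the following short Python/SymPy sketch evaluates them for the simple choice f(x) = x, for which the odd symmetry gives a_j = 0 and b_j = 2(-1)^{j+1}/j:

```python
import sympy as sp

# Fourier coefficients of f(x) = x on (-pi, pi) via the formulas above.
x = sp.symbols('x')
f = x
for j in range(1, 4):
    a_j = sp.integrate(f * sp.cos(j * x), (x, -sp.pi, sp.pi)) / sp.pi
    b_j = sp.integrate(f * sp.sin(j * x), (x, -sp.pi, sp.pi)) / sp.pi
    print(j, a_j, b_j)   # expect a_j = 0 and b_1 = 2, b_2 = -1, b_3 = 2/3
```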

This classical setup assumes that the values of the function f(x) are known for all x in the interval [-\pi, \pi]. When this is not the case and we only have the values of f(x) for a discrete set of points in the interval [-\pi, \pi], we need to use a discretised form of Fourier series approximation. Powell approaches this in Section 13.3 of his book as follows:

Therefore the classical formulas

a_j = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos(jx) dx

and

b_j = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin(jx) dx

are replaced by formulas (13.36) and (13.37) in the excerpt above. To illustrate how these formulas work in a concrete example, suppose the problem is to obtain an approximation of a function f(x) by a trigonometric polynomial of degree 2 given only the four function values f(0) = 0.2, f(\pi/2) = 0.25, f(\pi) = 1.0 and f(3\pi/2) = 0.5. Thus, we need to obtain an approximation to f of the form

q(x) = \frac{1}{2}a_0 + \sum_{j=1}^2\big[a_j \cos(jx) + b_j \sin(jx) \big]

Setting N = 4, we use formulas (13.36) and (13.37) to evaluate the coefficients a_0, a_1, a_2, b_1 and b_2 as follows:

a_j = \frac{2}{N} \sum_{k=0}^{N-1} f\big(\frac{2\pi k}{N}\big) \cos\big(\frac{2\pi j k}{N}\big)

so that

a_0 = \frac{2}{4} \big[f(0) + f\big(\frac{2\pi}{4}\big) + f\big(\frac{4\pi}{4}\big) + f\big(\frac{6\pi}{4}\big)\big] = 0.975

a_1 = \frac{2}{4} \big[f(0)\cos(0) + f\big(\frac{2\pi}{4}\big)\cos\big(\frac{2\pi}{4}\big) + f\big(\frac{4\pi}{4}\big)\cos\big(\frac{4\pi}{4}\big) + f\big(\frac{6\pi}{4}\big)\cos\big(\frac{6\pi}{4}\big)\big] = -0.4

a_2 = \frac{2}{4} \big[f(0)\cos(0) + f\big(\frac{2\pi}{4}\big)\cos\big(\frac{4\pi}{4}\big) + f\big(\frac{4\pi}{4}\big)\cos\big(\frac{8\pi}{4}\big) + f\big(\frac{6\pi}{4}\big)\cos\big(\frac{12\pi}{4}\big)\big] = 0.225

b_j = \frac{2}{N} \sum_{k=0}^{N-1} f\big(\frac{2\pi k}{N}\big) \sin\big(\frac{2\pi j k}{N}\big)

so that

b_1 = \frac{2}{4} \big[f(0)\sin(0) + f\big(\frac{2\pi}{4}\big)\sin\big(\frac{2\pi}{4}\big) + f\big(\frac{4\pi}{4}\big)\sin\big(\frac{4\pi}{4}\big) + f\big(\frac{6\pi}{4}\big)\sin\big(\frac{6\pi}{4}\big)\big] = -0.125

b_2 = \frac{2}{4} \big[f(0)\sin(0) + f\big(\frac{2\pi}{4}\big)\sin\big(\frac{4\pi}{4}\big) + f\big(\frac{4\pi}{4}\big)\sin\big(\frac{8\pi}{4}\big) + f\big(\frac{6\pi}{4}\big)\sin\big(\frac{12\pi}{4}\big)\big] = 0

Therefore the required approximation is

q(x) = 0.4875 - 0.4 \cos(x) - 0.125 \sin(x) + 0.225 \cos(2x) + 0 \sin(2x)
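As a quick numerical check of this example (a sketch of my own in Python rather than MATLAB), the coefficients can be recomputed directly from formulas (13.36) and (13.37):

```python
import math

# Recompute the coefficients of the four-point example from formulas (13.36), (13.37).
f = [0.2, 0.25, 1.0, 0.5]          # f(0), f(pi/2), f(pi), f(3*pi/2)
N = len(f)
for j in range(3):
    a_j = (2.0 / N) * sum(f[k] * math.cos(2 * math.pi * j * k / N) for k in range(N))
    b_j = (2.0 / N) * sum(f[k] * math.sin(2 * math.pi * j * k / N) for k in range(N))
    print(j, a_j, b_j)
# Expected: a_0 = 0.975, a_1 = -0.4, b_1 = -0.125, a_2 = 0.225, b_2 = 0 (up to rounding)
```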

The fast Fourier transform is designed to speed up the calculations in discrete Fourier series approximations of the above kind in cases when N is large. Powell discusses this in Section 13.4 of his book as follows:

Each iteration of the FFT process depends on the value of the iteration variable m, which initially is set at 1. In each subsequent iteration, the value of m is set at twice the previous value, so in the second iteration we have m = 2, in the third iteration we have m = 4, in the fourth iteration we have m = 8 and so on. This continues until the value of the iteration variable reaches N (which in this particular FFT approach is required to be a power of 2).

At the start of the first iteration we set m = 1 and j = 0. Formula (13.46) then reduces to

a[1, \alpha, 0] = 2f(\alpha)

and formula (13.47) reduces to

b[1, \alpha, 0] = 0

where for each m the values of \alpha are the numbers in the set

\big\{\frac{2\pi k}{N}: k = 0, 1, \ldots, \frac{N}{m}-1\big\}

So in the first iteration we use the above formulas to generate N values of a[1, \alpha, 0] and b[1, \alpha, 0].

We then use formulas (13.50) to obtain the numbers a[1, \alpha, 1] and b[1, \alpha, 1]. We get

a[1, \alpha, 1] = a[1, \alpha, 0] = 2f(\alpha)

b[1, \alpha, 1] = -b[1, \alpha, 0] = 0

Thus, the numbers a[1, \alpha, 1] and b[1, \alpha, 1] are exactly the same as the numbers a[1, \alpha, 0] and b[1, \alpha, 0] in this first iteration. At this point we are ready to begin the second iteration.

For the second iteration we double the iteration variable and use the results from the first iteration to compute the numbers a[2, \alpha, j] and b[2, \alpha, j] for j = 0, 1 using formulas (13.48) and (13.49). The variable k runs from 0 to \frac{N}{2} - 1 in this second iteration. We then use formulas (13.50) to obtain the numbers a[2, \alpha, j] and b[2, \alpha, j] for j = 2 and at this point we are ready to begin the third iteration.

For the third iteration we double the iteration variable again (so m = 4) and use the results from the second iteration to compute the numbers a[4, \alpha, j] and b[4, \alpha, j] for j = 0, 1, 2 using formulas (13.48) and (13.49). The variable k runs from 0 to \frac{N}{4} - 1 in this third iteration. We then use formulas (13.50) to obtain the numbers a[4, \alpha, j] and b[4, \alpha, j] for j = 3, 4 and at this point we are ready to begin the fourth iteration.

For the fourth iteration we double the iteration variable again (so m = 8) and use the results from the third iteration to compute the numbers a[8, \alpha, j] and b[8, \alpha, j] for j = 0, 1, 2, 3, 4 using formulas (13.48) and (13.49). The variable k runs from 0 to \frac{N}{8} - 1 in this fourth iteration. We then use formulas (13.50) to obtain the numbers a[8, \alpha, j] and b[8, \alpha, j] for j = 5, 6, 7, 8 and at this point we are ready to begin the fifth iteration.

Thus we see that at the end of each iteration the numbers a[m, \alpha, j] and b[m, \alpha, j] are known, where j takes all the integer values in [0, m] and the variable k runs from 0 to \frac{N}{m}-1.

The iterations continue in this fashion until we arrive at the numbers a_j = a[N, 0, j] and b_j = b[N, 0, j] for j = 0, 1, 2, \ldots, \frac{1}{2}N (that is, \frac{1}{2}N + 1 values of j) and these provide the required coefficients for the approximation.
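Before working through a concrete example by hand, here is a short Python sketch of the iterative scheme as I read it from formulas (13.46)-(13.50); the dictionary indexing by (m, k, j), with \alpha = 2\pi k/N, is my own bookkeeping rather than anything prescribed by Powell:

```python
import math

def powell_fft(f_vals):
    """My reading of the iterative scheme (13.46)-(13.50).
    f_vals: N = 2**p sample values f(2*pi*k/N), k = 0, ..., N-1.
    Returns lists of coefficients a_j, b_j for j = 0, ..., N/2."""
    N = len(f_vals)
    a, b = {}, {}
    # First iteration, m = 1: (13.46), (13.47), then (13.50)
    for k in range(N):
        a[(1, k, 0)] = 2.0 * f_vals[k]
        b[(1, k, 0)] = 0.0
        a[(1, k, 1)] = a[(1, k, 0)]
        b[(1, k, 1)] = -b[(1, k, 0)]
    m = 1
    while m < N:
        step = N // (2 * m)            # index shift corresponding to alpha + pi/m
        for k in range(step):          # k = 0, ..., N/(2m) - 1
            for j in range(m + 1):     # j = 0, ..., m via (13.48) and (13.49)
                c, s = math.cos(math.pi * j / m), math.sin(math.pi * j / m)
                a[(2 * m, k, j)] = 0.5 * (a[(m, k, j)] + c * a[(m, k + step, j)]
                                          - s * b[(m, k + step, j)])
                b[(2 * m, k, j)] = 0.5 * (b[(m, k, j)] + s * a[(m, k + step, j)]
                                          + c * b[(m, k + step, j)])
            for j in range(m + 1, 2 * m + 1):   # remaining j via the symmetry (13.50)
                a[(2 * m, k, j)] = a[(2 * m, k, 2 * m - j)]
                b[(2 * m, k, j)] = -b[(2 * m, k, 2 * m - j)]
        m *= 2
    return ([a[(N, 0, j)] for j in range(N // 2 + 1)],
            [b[(N, 0, j)] for j in range(N // 2 + 1)])

# Check against the four-point example above:
# powell_fft([0.2, 0.25, 1.0, 0.5]) -> a ≈ [0.975, -0.4, 0.225], b ≈ [0.0, -0.125, 0.0]
```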

To illustrate how all of this works in a concrete example, I will solve a problem which Powell himself presents at the end of this chapter of his book. The problem reads as follows (and the interested reader should at this point try to solve the problem himself/herself before looking at my solution):

Thus, we need to obtain an approximation to the data of the form

q(x) = \frac{1}{2}a_0 + \sum_{j=1}^3\big[a_j \cos(jx) + b_j \sin(jx) \big]

Applying the iterative process described above with N = 8, we begin by setting m = 1, j = 0 and k = 0, 1, \ldots, 7 in formulas (13.46) and (13.47). Formula (13.46) reduces to

a[1, \alpha, 0] = 2f(\alpha)

so we get the eight values

a\big[1, \frac{2\pi \cdot 0}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 0}{8}\big) = -0.224356

a\big[1, \frac{2\pi \cdot 1}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 1}{8}\big) = 2.159318

a\big[1, \frac{2\pi \cdot 2}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 2}{8}\big) = 4.345334

a\big[1, \frac{2\pi \cdot 3}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 3}{8}\big) = 0.753214

a\big[1, \frac{2\pi \cdot 4}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 4}{8}\big) = -0.642824

a\big[1, \frac{2\pi \cdot 5}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 5}{8}\big) = -1.056226

a\big[1, \frac{2\pi \cdot 6}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 6}{8}\big) = -1.124652

a\big[1, \frac{2\pi \cdot 7}{8}, 0\big] = 2f\big(\frac{2\pi \cdot 7}{8}\big) = -0.932522

and formula (13.47) reduces to

b[1, \alpha, 0] = 0

which gives us

b\big[1, \frac{2\pi \cdot 0}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 1}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 2}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 3}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 4}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 5}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 6}{8}, 0\big] = 0

b\big[1, \frac{2\pi \cdot 7}{8}, 0\big] = 0

Next we calculate the numbers a[1, \alpha, 1] and b[1, \alpha, 1] using equations (13.50), which tell us that

a[1, \alpha, 1] = a[1, \alpha, 0] = 2f(\alpha)

b[1, \alpha, 1] = -b[1, \alpha, 0] = 0

Thus, the numbers a[1, \alpha, 1] and b[1, \alpha, 1] are exactly the same as the numbers a[1, \alpha, 0] and b[1, \alpha, 0] in this first iteration. Writing them out explicitly:

a\big[1, \frac{2\pi \cdot 0}{8}, 1\big] = -0.224356

a\big[1, \frac{2\pi \cdot 1}{8}, 1\big] = 2.159318

a\big[1, \frac{2\pi \cdot 2}{8}, 1\big] = 4.345334

a\big[1, \frac{2\pi \cdot 3}{8}, 1\big] = 0.753214

a\big[1, \frac{2\pi \cdot 4}{8}, 1\big] = -0.642824

a\big[1, \frac{2\pi \cdot 5}{8}, 1\big] = -1.056226

a\big[1, \frac{2\pi \cdot 6}{8}, 1\big] = -1.124652

a\big[1, \frac{2\pi \cdot 7}{8}, 1\big] = -0.932522

b\big[1, \frac{2\pi \cdot 0}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 1}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 2}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 3}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 4}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 5}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 6}{8}, 1\big] = 0

b\big[1, \frac{2\pi \cdot 7}{8}, 1\big] = 0

The above results can be summarised in a table for convenience:

We are now ready to begin the second iteration. The iteration variable becomes m = 2 and we will have j = 0, 1, k = 0, 1, 2, 3. We compute the numbers a[2, \alpha, 0], a[2, \alpha, 1], b[2, \alpha, 0] and b[2, \alpha, 1] using formulas (13.48) and (13.49). We get the formulas

a[2, \alpha, 0] = \frac{1}{2}\big\{a[1, \alpha, 0] + \cos(\pi \cdot 0/1)a[1, \alpha + \frac{\pi}{1}, 0] - \sin(\pi \cdot 0/1)b[1, \alpha + \frac{\pi}{1}, 0]\big\}

= \frac{1}{2}\big\{a[1, \frac{2\pi \cdot k}{8}, 0] + a[1, \frac{2\pi}{8}(k + 4), 0]\big\}

a[2, \alpha, 1] = \frac{1}{2}\big\{a[1, \alpha, 1] + \cos(\pi \cdot 1/1)a[1, \alpha + \frac{\pi}{1}, 1] - \sin(\pi \cdot 1/1)b[1, \alpha + \frac{\pi}{1}, 1]\big\}

= \frac{1}{2}\big\{a[1, \frac{2\pi \cdot k}{8}, 1] - a[1, \frac{2\pi}{8}(k + 4), 1]\big\}

b[2, \alpha, 0] = \frac{1}{2}\big\{b[1, \alpha, 0] + \sin(\pi \cdot 0/1)a[1, \alpha + \frac{\pi}{1}, 0] + \cos(\pi \cdot 0/1)b[1, \alpha + \frac{\pi}{1}, 0]\big\} = 0

b[2, \alpha, 1] = \frac{1}{2}\big\{b[1, \alpha, 1] + \sin(\pi \cdot 1/1)a[1, \alpha + \frac{\pi}{1}, 1] + \cos(\pi \cdot 1/1)b[1, \alpha + \frac{\pi}{1}, 1]\big\} = 0

These then give us the values

a[2, \frac{2\pi \cdot 0}{8}, 0] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 0}{8}, 0] + a[1, \frac{2\pi}{8}(0+ 4), 0]\big\}

= \frac{1}{2}(-0.224356 - 0.642824) = -0.43359

a[2, \frac{2\pi \cdot 1}{8}, 0] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 1}{8}, 0] + a[1, \frac{2\pi}{8}(1+ 4), 0]\big\}

= \frac{1}{2}(2.159318 - 1.056226) = 0.551546

a[2, \frac{2\pi \cdot 2}{8}, 0] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 2}{8}, 0] + a[1, \frac{2\pi}{8}(2+ 4), 0]\big\}

= \frac{1}{2}(4.345334 - 1.124652) = 1.610341

a[2, \frac{2\pi \cdot 3}{8}, 0] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 3}{8}, 0] + a[1, \frac{2\pi}{8}(3+ 4), 0]\big\}

= \frac{1}{2}(0.753214 - 0.932522) = -0.089654

a[2, \frac{2\pi \cdot 0}{8}, 1] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 0}{8}, 1] - a[1, \frac{2\pi}{8}(0+ 4), 1]\big\}

= \frac{1}{2}(-0.224356 + 0.642824) = 0.209234

a[2, \frac{2\pi \cdot 1}{8}, 1] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 1}{8}, 1] - a[1, \frac{2\pi}{8}(1+ 4), 1]\big\}

= \frac{1}{2}(2.159318 + 1.056226) = 1.607772

a[2, \frac{2\pi \cdot 2}{8}, 1] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 2}{8}, 1] - a[1, \frac{2\pi}{8}(2+ 4), 1]\big\}

= \frac{1}{2}(4.345334 + 1.124652) = 2.734993

a[2, \frac{2\pi \cdot 3}{8}, 1] = \frac{1}{2}\big\{a[1, \frac{2\pi \cdot 3}{8}, 1] - a[1, \frac{2\pi}{8}(3+ 4), 1]\big\}

= \frac{1}{2}(0.753214 + 0.932522) = 0.842868

b[2, \frac{2\pi \cdot 0}{8}, 0] = 0

b[2, \frac{2\pi \cdot 1}{8}, 0] = 0

b[2, \frac{2\pi \cdot 2}{8}, 0] = 0

b[2, \frac{2\pi \cdot 3}{8}, 0] = 0

b[2, \frac{2\pi \cdot 0}{8}, 1] = 0

b[2, \frac{2\pi \cdot 1}{8}, 1] = 0

b[2, \frac{2\pi \cdot 2}{8}, 1] = 0

b[2, \frac{2\pi \cdot 3}{8}, 1] = 0

Next we calculate the numbers a[2, \alpha, 2] and b[2, \alpha, 2] using equations (13.50), which tell us that

a[2, \alpha, 2] = a[2, \alpha, 0]

b[2, \alpha, 2] = -b[2, \alpha, 0] = 0

Thus, the numbers a[2, \alpha, 2] and b[2, \alpha, 2] are exactly the same as the numbers a[2, \alpha, 0] and b[2, \alpha, 0] in the second iteration. Writing them out explicitly:

a[2, \frac{2\pi \cdot 0}{8}, 2] = -0.43359

a[2, \frac{2\pi \cdot 1}{8}, 2] = 0.551546

a[2, \frac{2\pi \cdot 2}{8}, 2] = 1.610341

a[2, \frac{2\pi \cdot 3}{8}, 2] = -0.089654

b[2, \frac{2\pi \cdot 0}{8}, 2] = 0

b[2, \frac{2\pi \cdot 1}{8}, 2] = 0

b[2, \frac{2\pi \cdot 2}{8}, 2] = 0

b[2, \frac{2\pi \cdot 3}{8}, 2] = 0

The above results are summarised in the following table for convenience:

We are now ready to begin the third iteration. The iteration variable becomes m = 4 and we will have j = 0, 1, 2, k = 0, 1. We compute the numbers a[4, \alpha, 0], a[4, \alpha, 1], a[4, \alpha, 2], b[4, \alpha, 0], b[4, \alpha, 1] and b[4, \alpha, 2] using formulas (13.48) and (13.49). We get the formulas

a[4, \alpha, 0] = \frac{1}{2}\big\{a[2, \alpha, 0] + \cos(\pi \cdot 0/2)a[2, \alpha + \frac{\pi}{2}, 0] - \sin(\pi \cdot 0/2)b[2, \alpha + \frac{\pi}{2}, 0]\big\}

= \frac{1}{2}\big\{a[2, \frac{2\pi \cdot k}{8}, 0] + a[2, \frac{2\pi}{8}(k + 2), 0]\big\}

a[4, \alpha, 1] = \frac{1}{2}\big\{a[2, \alpha, 1] + \cos(\pi \cdot 1/2)a[2, \alpha + \frac{\pi}{2}, 1] - \sin(\pi \cdot 1/2)b[2, \alpha + \frac{\pi}{2}, 1]\big\}

= \frac{1}{2}a[2, \frac{2\pi \cdot k}{8}, 1]

a[4, \alpha, 2] = \frac{1}{2}\big\{a[2, \alpha, 2] + \cos(\pi \cdot 2/2)a[2, \alpha + \frac{\pi}{2}, 2] - \sin(\pi \cdot 2/2)b[2, \alpha + \frac{\pi}{2}, 2]\big\}

= \frac{1}{2}\big\{a[2, \frac{2\pi \cdot k}{8}, 2] - a[2, \frac{2\pi}{8}(k + 2), 2]\big\}

b[4, \alpha, 0] = \frac{1}{2}\big\{b[2, \alpha, 0] + \sin(\pi \cdot 0/2)a[2, \alpha + \frac{\pi}{2}, 0] + \cos(\pi \cdot 0/2)b[2, \alpha + \frac{\pi}{2}, 0]\big\} = 0

b[4, \alpha, 1] = \frac{1}{2}\big\{b[2, \alpha, 1] + \sin(\pi \cdot 1/2)a[2, \alpha + \frac{\pi}{2}, 1] + \cos(\pi \cdot 1/2)b[2, \alpha + \frac{\pi}{2}, 1]\big\}

= \frac{1}{2} a[2, \frac{2\pi}{8}(k+2), 1]

b[4, \alpha, 2] = \frac{1}{2}\big\{b[2, \alpha, 2] + \sin(\pi \cdot 2/2)a[2, \alpha + \frac{\pi}{2}, 2] + \cos(\pi \cdot 2/2)b[2, \alpha + \frac{\pi}{2}, 2]\big\} = 0

These then give us the values

a[4, \frac{2\pi \cdot 0}{8}, 0] = \frac{1}{2}\big\{a[2, \frac{2\pi \cdot 0}{8}, 0] + a[2, \frac{2\pi}{8}(0 + 2), 0]\big\}

= \frac{1}{2}(-0.43359 + 1.610341) = 0.5883755

a[4, \frac{2\pi \cdot 1}{8}, 0] = \frac{1}{2}\big\{a[2, \frac{2\pi \cdot 1}{8}, 0] + a[2, \frac{2\pi}{8}(1 + 2), 0]\big\}

= \frac{1}{2}(0.551546 - 0.089654) = 0.230946

a[4, \frac{2\pi \cdot 0}{8}, 1] = \frac{1}{2}a[2, \frac{2\pi \cdot 0}{8}, 1]

= \frac{1}{2} \cdot 0.209234 = 0.104617

a[4, \frac{2\pi \cdot 1}{8}, 1] = \frac{1}{2}a[2, \frac{2\pi \cdot 1}{8}, 1]

= \frac{1}{2} \cdot 1.607772 = 0.803886

a[4, \frac{2\pi \cdot 0}{8}, 2] = \frac{1}{2}\big\{a[2, \frac{2\pi \cdot 0}{8}, 2] - a[2, \frac{2\pi}{8}(0 + 2), 2]\big\}

= \frac{1}{2}(-0.43359 - 1.610341) = -1.0219655

a[4, \frac{2\pi \cdot 1}{8}, 2] = \frac{1}{2}\big\{a[2, \frac{2\pi \cdot 1}{8}, 2] - a[2, \frac{2\pi}{8}(1 + 2), 2]\big\}

= \frac{1}{2}(0.551546 + 0.089654) = 0.3206

b[4, \frac{2\pi \cdot 0}{8}, 0] = 0

b[4, \frac{2\pi \cdot 1}{8}, 0] = 0

b[4, \frac{2\pi \cdot 0}{8}, 1] = \frac{1}{2} a[2, \frac{2\pi}{8}(0+2), 1]

= \frac{1}{2} \cdot 2.734993 = 1.3674965

b[4, \frac{2\pi \cdot 1}{8}, 1] = \frac{1}{2} a[2, \frac{2\pi}{8}(1+2), 1]

= \frac{1}{2} \cdot 0.842868 = 0.421434

b[4, \frac{2\pi \cdot 0}{8}, 2] = 0

b[4, \frac{2\pi \cdot 1}{8}, 2] = 0

Next we calculate the numbers a[4, \alpha, 3], a[4, \alpha, 4], b[4, \alpha, 3] and b[4, \alpha, 4] using equations (13.50), which tell us that

a[4, \alpha, 3] = a[4, \alpha, 1]

a[4, \alpha, 4] = a[4, \alpha, 0]

b[4, \alpha, 3] = -b[4, \alpha, 1]

b[4, \alpha, 4] = -b[4, \alpha, 0]

Thus, we can use the previously computed values for a and b in the third iteration to get these. Writing them out explicitly:

a[4, \frac{2\pi \cdot 0}{8}, 3] = 0.104617

a[4, \frac{2\pi \cdot 1}{8}, 3] = 0.803886

a[4, \frac{2\pi \cdot 0}{8}, 4] = 0.5883755

a[4, \frac{2\pi \cdot 1}{8}, 4] = 0.230946

b[4, \frac{2\pi \cdot 0}{8}, 3] = -1.3674965

b[4, \frac{2\pi \cdot 1}{8}, 3] = -0.421434

b[4, \frac{2\pi \cdot 0}{8}, 4] = 0

b[4, \frac{2\pi \cdot 1}{8}, 4] = 0

The above results are summarised in the following table:

We are now ready to begin the fourth (and final) iteration. The iteration variable becomes m = 8 and we have j = 0, 1, 2, 3, 4, k = 0. We compute the numbers a[8, 0, 0], a[8, 0, 1], a[8, 0, 2], a[8, 0, 3], a[8, 0, 4], b[8, 0, 0], b[8, 0, 1], b[8, 0, 2], b[8, 0, 3] and b[8, 0, 4] using formulas (13.48) and (13.49). We get

a[8, 0, 0] = \frac{1}{2}\big\{a[4, 0, 0] + \cos(\pi \cdot 0/4)a[4, 0 + \frac{\pi}{4}, 0] - \sin(\pi \cdot 0/4)b[4, 0 + \frac{\pi}{4}, 0]\big\}

= \frac{1}{2}\big\{a[4, \frac{2\pi \cdot 0}{8}, 0] + a[4, \frac{2\pi}{8}(0 + 1), 0]\big\} = \frac{1}{2}(0.5883755 + 0.230946) = 0.40966075

a[8, 0, 1] = \frac{1}{2}\big\{a[4, 0, 1] + \cos(\pi \cdot 1/4)a[4, 0 + \frac{\pi}{4}, 1] - \sin(\pi \cdot 1/4)b[4, 0 + \frac{\pi}{4}, 1]\big\}

= \frac{1}{2}\big\{a[4, \frac{2\pi \cdot 0}{8}, 1] + \frac{1}{\sqrt{2}}a[4, \frac{2\pi}{8}(0 + 1), 1] -\frac{1}{\sqrt{2}}b[4, \frac{2\pi}{8}(0 + 1), 1]\big\}

= \frac{1}{2}(0.104617 + \frac{1}{\sqrt{2}} \cdot 0.803886 - \frac{1}{\sqrt{2}} \cdot 0.421434) = 0.187525701

a[8, 0, 2] = \frac{1}{2}\big\{a[4, 0, 2] + \cos(\pi \cdot 2/4)a[4, 0 + \frac{\pi}{4}, 2] - \sin(\pi \cdot 2/4)b[4, 0 + \frac{\pi}{4}, 2]\big\}

= \frac{1}{2}a[4, \frac{2\pi \cdot 0}{8}, 2] = \frac{1}{2} \cdot (-1.0219655) = -0.51098275

a[8, 0, 3] = \frac{1}{2}\big\{a[4, 0, 3] + \cos(\pi \cdot 3/4)a[4, 0 + \frac{\pi}{4}, 3] - \sin(\pi \cdot 3/4)b[4, 0 + \frac{\pi}{4}, 3]\big\}

= \frac{1}{2}\big\{a[4, \frac{2\pi \cdot 0}{8}, 3] - \frac{1}{\sqrt{2}}a[4, \frac{2\pi}{8}(0 + 1), 3] -\frac{1}{\sqrt{2}}b[4, \frac{2\pi}{8}(0 + 1), 3]\big\}

= \frac{1}{2}(0.104617 - \frac{1}{\sqrt{2}} \cdot 0.803886 + \frac{1}{\sqrt{2}} \cdot 0.421434) = -0.082908701

a[8, 0, 4] = \frac{1}{2}\big\{a[4, 0, 4] + \cos(\pi \cdot 4/4)a[4, 0 + \frac{\pi}{4}, 4] - \sin(\pi \cdot 4/4)b[4, 0 + \frac{\pi}{4}, 4]\big\}

= \frac{1}{2}(a[4, \frac{2\pi \cdot 0}{8}, 4] - a[4, \frac{2\pi}{8}(0 + 1), 4]) = \frac{1}{2} (0.5883755 - 0.230946) = 0.17871475

b[8, 0, 0] = \frac{1}{2}\big\{b[4, 0, 0] + \sin(\pi \cdot 0/4)a[4, 0 + \frac{\pi}{4}, 0] + \cos(\pi \cdot 0/4)b[4, \frac{2\pi}{8}(0 + 1), 0]\big\}

= 0

b[8, 0, 1] = \frac{1}{2}\big\{b[4, 0, 1] + \sin(\pi \cdot 1/4)a[4, 0 + \frac{\pi}{4}, 1] + \cos(\pi \cdot 1/4)b[4, 0 + \frac{\pi}{4}, 1]\big\}

= \frac{1}{2}(b[4, 0, 1] + \frac{1}{\sqrt{2}} \cdot a[4, \frac{2\pi}{8}(0 + 1), 1] + \frac{1}{\sqrt{2}} \cdot b[4, \frac{2\pi}{8}(0 + 1), 1])

= \frac{1}{2}(1.3674965 + \frac{1}{\sqrt{2}} \cdot 0.803886 + \frac{1}{\sqrt{2}} \cdot 0.421434) = 1.116964291

b[8, 0, 2] = \frac{1}{2}\big\{b[4, 0, 2] + \sin(\pi \cdot 2/4)a[4, 0 + \frac{\pi}{4}, 2] + \cos(\pi \cdot 2/4)b[4, 0 + \frac{\pi}{4}, 2]\big\}

= \frac{1}{2} \cdot a[4, \frac{2\pi}{8}(0 + 1), 2] = \frac{1}{2} \cdot 0.3206 = 0.1603

b[8, 0, 3] = \frac{1}{2}\big\{b[4, 0, 3] + \sin(\pi \cdot 3/4)a[4, 0 + \frac{\pi}{4}, 3] + \cos(\pi \cdot 3/4)b[4, 0 + \frac{\pi}{4}, 3]\big\}

= \frac{1}{2}(b[4, 0, 3] + \frac{1}{\sqrt{2}} \cdot a[4, \frac{2\pi}{8}(0 + 1), 3] - \frac{1}{\sqrt{2}} \cdot b[4, \frac{2\pi}{8}(0 + 1), 3])

= \frac{1}{2}(-1.3674965 + \frac{1}{\sqrt{2}} \cdot 0.803886 + \frac{1}{\sqrt{2}} \cdot 0.421434) = -0.250532209

b[8, 0, 4] = \frac{1}{2}\big\{b[4, 0, 4] + \sin(\pi \cdot 4/4)a[4, 0 + \frac{\pi}{4}, 4] + \cos(\pi \cdot 4/4)b[4, 0 + \frac{\pi}{4}, 4]\big\} = 0

The above results are summarised in the following table:

The required approximation is then

q(x) = 0.204830375+0.187525701\cos(x)+1.116964291\sin(x)

-0.51098275\cos(2x)+0.1603\sin(2x)

-0.082908701\cos(3x)-0.250532209\sin(3x)

The graph of q(x) is plotted in the figures below for comparison with the initial data (the second figure shows more of the periodic pattern):
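As a cross-check on the hand calculations (a sketch of my own, not part of Powell's text), the same coefficients can be recovered by evaluating formulas (13.36) and (13.37) directly, using sample values equal to half of the tabulated numbers 2f(\alpha):

```python
import math

# Direct evaluation of (13.36) and (13.37) as a cross-check on the FFT arithmetic above.
two_f = [-0.224356, 2.159318, 4.345334, 0.753214,
         -0.642824, -1.056226, -1.124652, -0.932522]
f = [v / 2.0 for v in two_f]       # the sample values f(2*pi*k/8)
N = len(f)
for j in range(4):
    a_j = (2.0 / N) * sum(f[k] * math.cos(2 * math.pi * j * k / N) for k in range(N))
    b_j = (2.0 / N) * sum(f[k] * math.sin(2 * math.pi * j * k / N) for k in range(N))
    print(j, round(a_j, 9), round(b_j, 9))
# j = 0 gives a_0 = 0.40966075 (so a_0/2 = 0.204830375), and j = 1, 2, 3 reproduce the
# coefficients of cos(jx) and sin(jx) in q(x) above, up to rounding.
```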

Illustrating the correspondence between 1-forms and vectors using de Broglie waves

I was asked by a student to clarify the issues surrounding an exercise in the famous book Gravitation written by Misner, Thorne and Wheeler (MTW). The exercise appears as follows in Chapter 2, Section 5:

[Excerpt from MTW, Chapter 2, Section 5, showing Exercise 2.1]

The student found a discussion about this problem in Physics Stack Exchange but remained confused as the exercise and the surrounding issues were not dealt with clearly enough in the posts there. I could not find anything else about this online so I tried to clarify the situation a bit more myself. I want to record my thoughts about this in the present note. (The reader should try to solve Exercise 2.1 above for himself/herself at this point, before continuing to read my discussion below).

The key point of this section is that Equation (2.14), the defining equation of 1-forms, can be shown to be physically valid (as well as being just a mathematical definition) using de Broglie waves in quantum mechanics. The notation in MTW is not ideal, so we will replace the notation \langle \mathbf{\tilde{p}}, \mathbf{v} \rangle for a 1-form evaluated at a vector \mathbf{v} by the notation \mathbf{\tilde{p}}(\mathbf{v}). What MTW are then saying is that given any vector \mathbf{p} we can define a corresponding 1-form as

\mathbf{\tilde{p}} = \langle \mathbf{p}, \ \rangle

which is to be viewed as a function waiting for a vector input (to be placed in the empty space on the right-hand side of the angle brackets). When the vector input \mathbf{v} is supplied, the 1-form will then yield the number

\mathbf{\tilde{p}}(\mathbf{v}) = \langle \mathbf{p}, \mathbf{v} \rangle = \mathbf{p} \cdot \mathbf{v}

In Exercise 2.1 we are asked to verify the validity of this equation using the de Broglie wave

\psi = e^{i \phi} = \exp[i(\mathbf{k}\cdot \mathbf{x} - \omega t)]

The phase is the angular argument \phi = \mathbf{k}\cdot \mathbf{x} - \omega t which specifies the position of the wave from some starting point. The phase is parameterised by the wave vector \mathbf{k} which is such that |\mathbf{k}| = 2 \pi/\lambda where \lambda is the wavelength, and by the angular frequency \omega = 2 \pi f where f is the frequency of the relevant oscillator.

It is a well known fact (and it is easy to verify) that given any real-valued function of a vector \phi(\mathbf{x}), the gradient vector \partial \phi/\partial \mathbf{x} is orthogonal to the level surfaces of \phi. In the case of the phase of a de Broglie wave we have

\frac{\partial \phi}{\partial \mathbf{x}} = \mathbf{k}

so the wave vector is the (position) gradient vector of the phase \phi and therefore \mathbf{k} must be orthogonal to loci of constant phase.

In the case of circular waves, for example, these loci of constant phase are circles with centre at the source of the waves and the wave vectors \mathbf{k} point radially outwards at right angles to them, as indicated in the diagram.

To get a diagrammatic understanding of the relationship between 1-forms and vectors, we can imagine focusing on a very small neighbourhood around some point located among these loci of constant phase. On this very small scale, the loci of constant phase will look flat rather than circular, but the wave vectors \mathbf{k} will still be orthogonal to them. What we do is interpret this local pattern of (flat) surfaces of constant phase as the 1-form \mathbf{\tilde{k}}. This 1-form corresponding to the wave vector \mathbf{k} is

\mathbf{\tilde{k}} = \langle \mathbf{k}, \ \rangle

and as before we interpret this as a function waiting for a vector input. When it receives a vector input, say \mathbf{v}, it will output a number computed as the scalar product of \mathbf{k} and \mathbf{v}. Thus we can write

\mathbf{\tilde{k}}(\mathbf{v}) = \langle \mathbf{k}, \mathbf{v} \rangle = \mathbf{k} \cdot \mathbf{v}

As indicated in the diagram, the vector \mathbf{v} which we supply to \mathbf{\tilde{k}} will be at an angle to the wave vector \mathbf{k}. If the vector \mathbf{v} is parallel to the loci of constant phase then \mathbf{\tilde{k}}(\mathbf{v}) = 0 because \mathbf{k} and \mathbf{v} will be orthogonal. In the language of 1-forms, this would be interpreted by saying that the vector \mathbf{v} will not pierce the 1-form \mathbf{\tilde{k}} because it will not cross any of the loci of constant phase. Conversely, if the vector \mathbf{v} is parallel to the wave vector \mathbf{k} (orthogonal to the loci of constant phase), we would say that \mathbf{v} will pierce the 1-form \mathbf{\tilde{k}} as much as possible, because it will cross as many loci of constant phase as it possibly can. Between these extremes we will get intermediate values of the 1-form \mathbf{\tilde{k}}(\mathbf{v}). The key idea, then, is that the set of loci of constant phase in the neighbourhood of a point is the diagrammatic representation of the 1-form \mathbf{\tilde{k}}. When we feed a vector \mathbf{v} into this 1-form we get a measure \mathbf{\tilde{k}}(\mathbf{v}) of how many loci of constant phase the vector pierces. This is the language being used by MTW in the prelude to Exercise 2.1 above.
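The following tiny Python sketch (my own toy illustration, not taken from MTW) mimics this picture in flat two-dimensional space: the 1-form built from \mathbf{k} is just a function that takes a vector and returns the scalar product, so a vector lying in a surface of constant phase gives zero and a vector along \mathbf{k} gives the largest value for its length:

```python
# Toy illustration: the 1-form built from a vector k acts on vectors via the dot product.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

k = (2.0, 0.0)                   # wave vector, orthogonal to the surfaces of constant phase
k_tilde = lambda v: dot(k, v)    # the 1-form: a function "waiting for a vector input"

print(k_tilde((0.0, 1.0)))   # 0.0 -> a vector lying in a phase surface pierces no surfaces
print(k_tilde((1.0, 0.0)))   # 2.0 -> a vector along k pierces as many surfaces as possible
```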

To actually solve Exercise 2.1, begin by recalling from quantum mechanics that a photon’s momentum \mathbf{p} is such that |\mathbf{p}| = E/c where E = hf is the photonic energy and f is the frequency of the oscillator. Since \lambda f = c where \lambda is the photon’s wavelength, we have E = hc/\lambda so the magnitude of the photon’s momentum is

|\mathbf{p}| = \frac{E}{c} = \frac{h}{\lambda} = \hbar \frac{2\pi}{\lambda} = \hbar |\mathbf{k}|

and in fact

\mathbf{p} = \hbar \mathbf{k}

Note that therefore

\lambda = \frac{h}{|\mathbf{p}|}

Famously, de Broglie’s idea in his 1924 PhD thesis was that this wavelength formula applies not just to photons but also to massive particles such as electrons, for which the momentum \mathbf{p} would be calculated as

\mathbf{p} = m \mathbf{u}

where m is the mass of the particle and \mathbf{u} is its four-velocity in Minkowski spacetime. Note that this four-velocity is such that \mathbf{u}\cdot\mathbf{u} = -1 (easily demonstrated using the - +++ metric of Minkowski spacetime).

Thus we have

\mathbf{p} = m \mathbf{u} = \hbar \mathbf{k}

so

\mathbf{u} = \frac{\hbar}{m} \mathbf{k}

In the prelude to Exercise 2.1, MTW say

relabel the surfaces of \mathbf{\tilde{k}} by \hbar \times phase, thereby obtaining the momentum 1-form \mathbf{\tilde{p}}. Pierce this 1-form with any vector \mathbf{v}, and find the result that \mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v}).

Following the authors’ instructions, we relabel the surfaces of \mathbf{\tilde{k}} (i.e., the loci of constant phase) by multiplying by \hbar to get the 1-form

\mathbf{\tilde{p}} = \hbar \mathbf{\tilde{k}} = \hbar \langle \mathbf{k}, \ \rangle

As usual, this 1-form is a linear function waiting for a vector input. Supplying the input \mathbf{v} we then get

\mathbf{\tilde{p}}(\mathbf{v}) = \hbar \langle \mathbf{k}, \mathbf{v} \rangle = \hbar \mathbf{k} \cdot \mathbf{v}

But this is exactly what we get when we work out \mathbf{p} \cdot \mathbf{v} since

\mathbf{p} \cdot \mathbf{v} = m \mathbf{u} \cdot \mathbf{v} = m \frac{\hbar}{m} \mathbf{k} \cdot \mathbf{v} = \hbar \mathbf{k} \cdot \mathbf{v}

Thus, we have solved Exercise 2.1 by showing that \mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v}) is in accord with the quantum mechanical properties of de Broglie waves, as claimed by MTW.

Invariance under rotations in space and conservation of angular momentum

In a previous note I studied in detail the mathematical setup of Noether’s Theorem and its proof. I briefly illustrated the mathematical machinery by considering invariance under translations in time, giving the law of conservation of energy, and invariance under translations in space, giving the law of conservation of linear momentum. I briefly mentioned that invariance under rotations in space would also yield the law of conservation of angular momentum but I did not work this out explicitly. I want to quickly do this in the present note.

We imagine a particle of unit mass moving freely in the absence of any potential field, and tracing out a path \gamma(t) in the (x, y)-plane of a three-dimensional Euclidean coordinate system between times t_1 and t_2, with the z-coordinate everywhere zero along this path. The angular momentum of the particle at time t with respect to the origin of the coordinate system is given by

\mathbf{L} = \mathbf{r} \times \mathbf{v}

= (\mathbf{i} x + \mathbf{j} y) \times (\mathbf{i} \dot{x} + \mathbf{j} \dot{y})

= \mathbf{k} x \dot{y} - \mathbf{k} y \dot{x}

= \mathbf{k} (x \dot{y} - y \dot{x})

where \times is the vector product operation. Alternatively, we could have obtained this as

\mathbf{L} = \mathbf{r} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x & y & 0 \\ \dot{x} & \dot{y} & 0 \end{vmatrix}

= \mathbf{k} (x \dot{y} - y \dot{x})

In terms of Lagrangian mechanics, the path \gamma(t) followed by the particle will be a stationary path of the action functional

S[\gamma(t)] = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

(in the absence of a potential field the total energy consists only of kinetic energy).

Now imagine that the entire path \gamma(t) is rotated bodily anticlockwise in the (x, y)-plane through an angle \theta. This corresponds to a one-parameter transformation

\overline{t} \equiv \Phi(t, x, y, \dot{x}, \dot{y}; \theta) = t

\overline{x} \equiv \Psi_1(t, x, y, \dot{x}, \dot{y}; \theta) = x \cos \theta - y \sin \theta

\overline{y} \equiv \Psi_2(t, x, y, \dot{x}, \dot{y}; \theta) = x \sin \theta + y \cos \theta

which reduces to the identity when \theta = 0. We have

d\overline{t} = dt

\dot{\overline{x}}^2 = \dot{x}^2 \cos^2 \theta + \dot{y}^2 \sin^2 \theta - 2 \dot{x} \dot{y} \sin \theta \cos \theta

\dot{\overline{y}}^2 = \dot{x}^2 \sin^2 \theta + \dot{y}^2 \cos^2 \theta + 2 \dot{x} \dot{y} \sin \theta \cos \theta

and therefore

\dot{x}^2 + \dot{y}^2 = \dot{\overline{x}}^2 + \dot{\overline{y}}^2

so the action functional is invariant under this rotation since

S[\overline{\gamma}(t)] = \int_{t_1}^{t_2} d\overline{t} \frac{1}{2}(\dot{\overline{x}}^2 + \dot{\overline{y}}^2) = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2) = S[\gamma(t)]
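This invariance can be confirmed with a one-line symbolic computation (a quick check of my own using SymPy):

```python
import sympy as sp

# Check that xdot^2 + ydot^2 is unchanged by the rotation through theta.
theta, xd, yd = sp.symbols('theta xdot ydot', real=True)
xd_bar = xd * sp.cos(theta) - yd * sp.sin(theta)
yd_bar = xd * sp.sin(theta) + yd * sp.cos(theta)
print(sp.simplify(xd_bar**2 + yd_bar**2 - (xd**2 + yd**2)))   # -> 0
```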

Therefore Noether’s theorem applies. Let

F(t, x, y, \dot{x}, \dot{y}) = \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

Then Noether’s theorem in this case says

\frac{\partial F}{\partial \dot{x}} \psi_1 + \frac{\partial F}{\partial \dot{y}} \psi_2 + \big(F - \frac{\partial F}{\partial \dot{x}} \dot{x} - \frac{\partial F}{\partial \dot{y}} \dot{y}\big) \phi = const.

where

\phi \equiv \frac{\partial \Phi}{\partial \theta} \big|_{\theta = 0} = 0

\psi_1 \equiv \frac{\partial \Psi_1}{\partial \theta} \big|_{\theta = 0} = -y

\psi_2 \equiv \frac{\partial \Psi_2}{\partial \theta} \big|_{\theta = 0} = x

We have

\frac{\partial F}{\partial \dot{x}} = \dot{x}

\frac{\partial F}{\partial \dot{y}} = \dot{y}

Therefore Noether’s theorem gives us (remembering \phi = 0)

-\dot{x} y + \dot{y} x = const.

The expression on the left-hand side of this equation is the angular momentum of the particle (cf. the brief discussion of angular momentum at the start of this note), so this result is precisely the statement that the angular momentum is conserved. Noether’s theorem shows us that this is a direct consequence of the invariance of the action functional of the particle under rotations in space.
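As a simple numerical illustration (my own sketch, with made-up initial conditions), a free particle moving in a straight line does indeed keep x\dot{y} - y\dot{x} constant along its path:

```python
# For a free particle x(t) = x0 + vx*t, y(t) = y0 + vy*t, the quantity x*ydot - y*xdot
# should stay equal to x0*vy - y0*vx at every time t.
x0, y0, vx, vy = 1.0, -2.0, 0.3, 0.7       # made-up initial conditions
for t in [0.0, 1.0, 5.0, 10.0]:
    x, y = x0 + vx * t, y0 + vy * t
    print(t, x * vy - y * vx)              # same value (1.3) at every t
```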

A mathematical formulation of Feynman’s ‘mirage on a hot road’

In his famous Feynman Lectures on Physics, Richard Feynman provided an intuitive explanation of how a ‘mirage on a hot road’ can arise due to the bending of light rays from the sky in accordance with Fermat’s Principle (see The Feynman Lectures on Physics, Volume I, Chapter 26). Feynman wrote the following:

[Excerpt from The Feynman Lectures on Physics, Vol. I, Chapter 26, including Fig. 26-8]

I was discussing this with a beginning engineering student who did not quite understand why the mirage makes it look as if the water is actually on the road. I explained this by augmenting Feynman’s Fig. 26-8 above as follows:

The bent light ray starting at point A and entering the observer’s eye at point B is interpreted by the observer as having followed a straight line path emanating from the road, as indicated in the diagram. Thus, the observer sees the image of the sky on the road surface and interprets it as a shimmering pool of water.

Having done this, the question then arose as to how one could go about constructing an explicit mathematical model of the above scenario, yielding a suitable equation for the curved light ray from A to B, a linear equation for the apparent straight line path seen by the observer, and explicit coordinates for the point on the road where the image of the sky is seen by the observer. This turned out to be an interesting exercise involving Fermat’s Principle and the Calculus of Variations and is what I want to record here.

Suppose the light ray begins at point A = (a, b) at time t_1, and enters the observer’s eye at point B = (-a, b) at time t_2. Fermat’s Principle (see, e.g., this Wikipedia article) says that the path followed by the light ray is such as to make the optical length functional

S[y] = \int_A^B n ds

stationary, where n = c/v is the refractive index of the medium through which the light passes, c is the speed of light in a vacuum and v = ds/dt is the speed of light in the medium. This functional can be derived (up to a multiplicative constant) from the ‘Principle of Least Time’ by noting that the time taken by the light ray is

T = \int_{t_1}^{t_2} dt = \int_{t_1}^{t_2} \frac{1}{c} \frac{c}{v} \frac{ds}{dt} dt = \int_A^B \frac{n}{c} ds = \frac{1}{c} S

The light ray will find the path that minimises this time of travel.

To apply this setup to the mirage in Feynman’s lecture we need to model the refractive index as a function of the y-coordinate in my amended diagram above, which measures the height above the road. As Feynman says, light goes faster in the hot region near the road than in the cooler region higher up. Thus, since the refractive index is inversely proportional to v, it should be an increasing function of the height above the road y. To get a toy model for the scenario in Feynman’s lecture let us make the simplest possible assumption that the refractive index is a simple linear function of y, namely

n(y) = \alpha + \beta y

with \alpha and \beta both positive. Then since the arc-length element is

ds = dx \sqrt{1 + y^{\prime \ 2}}

we can write the optical length functional as

S[y] = \int_A^B n ds = \int_{a}^{-a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = -\int_{-a}^{a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

We find the stationary path for this functional using the Calculus of Variations. Let

F(x, y, y^{\prime}) = (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

Since this does not depend directly on x, the problem admits a first-integral of the form

y^{\prime} \frac{\partial F}{\partial y^{\prime}} - F = C

where C is a constant. We have

\frac{\partial F}{\partial y^{\prime}} = \frac{(\alpha + \beta y)y^{\prime}}{\sqrt{1 + y^{\prime \ 2}}}

Therefore the first-integral for this problem is

\frac{(\alpha + \beta y)y^{\prime \ 2}}{\sqrt{1 + y^{\prime \ 2}}} - (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = C

Multiplying through by \sqrt{1 + y^{\prime \ 2}}/ \alpha, absorbing \alpha into the constant term, and writing \delta \equiv \beta/\alpha we get

(1 + \delta y) y^{\prime \ 2} - (1 + \delta y)(1 + y^{\prime \ 2}) = C\sqrt{1 + y^{\prime \ 2}}

\iff

-(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}

\implies

y^{\prime} = \frac{\pm \sqrt{(1+\delta y)^2 - C^2}}{C}

This is a first-order differential equation for y which can be solved by separation of variables. We get the integral equation

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \pm \int \frac{dx}{C}

To solve the integral on the left-hand side, make the change of variable

(1 + \delta y) = C \sec \theta

\implies

\delta \ dy = C \sec \theta \ \tan \theta \ d \theta

Then, since \sqrt{C^2 \sec^2 \theta - C^2} = C \tan \theta,

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \int \frac{C \sec \theta \ \tan \theta \ d \theta}{\delta \sqrt{C^2 \sec^2 \theta - C^2}}

= \frac{1}{\delta}\int \sec \theta \ d \theta

= \frac{1}{\delta} \ln \big[\sec \theta + \tan \theta \big] + const.

= \frac{1}{\delta} \ln \bigg[\frac{(1 + \delta y) + \sqrt{(1+\delta y)^2 - C^2}}{C}\bigg] + const.

= \frac{1}{\delta} \cosh^{-1}\bigg[\frac{1 + \delta y}{C}\bigg] + const.

For the integral on the right-hand side of the integral equation we get

\pm \int \frac{dx}{C} = \pm \frac{x}{C} + const.

Therefore the integral equation reduces to

\frac{1}{\delta} \cosh^{-1}\bigg[\frac{1 + \delta y}{C}\bigg] = \pm \frac{x}{C} + const.

\implies

1 + \delta y = C \cosh\big(\pm\frac{\delta x}{C} + const.\big)

Since \cosh is an even function, both choices of sign give the same family of solutions, so we may take the solution of the first-integral equation to be

y = \frac{C \cosh\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}

Note that this solution has the behaviour required of the curved light ray in my amended diagram above, namely y \rightarrow \infty as x \rightarrow \pm \infty.

Furthermore, we have y(a) = y(-a) = b and therefore we require

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(-\frac{\delta a}{C} + const. \big)

But

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \ \cosh(const.) + \sinh\big(\frac{\delta a}{C}\big) \ \sinh(const.)

and

\cosh\big(-\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \ \cosh(const.) - \sinh\big(\frac{\delta a}{C}\big) \ \sinh(const.)

These cannot be equal unless \sinh(const.) = 0 \implies const. = 0. Thus, our solution for y reduces to

y = \frac{C \cosh\big(\frac{\delta x}{C}\big) - 1}{\delta}

with the constant C determined in terms of a and b by

b = \frac{C \cosh\big(\frac{\delta a}{C}\big) - 1}{\delta}

This is the equation of the curved path of the light ray from the sky in Feynman’s diagram. The slope of y at point B = (-a, b) is

y^{\prime}(-a) = -\sinh\big(\frac{\delta a}{C}\big)

The straight line with this gradient passing through the point B has equation

y = \big(b - a\sinh\big(\frac{\delta a}{C}\big)\big) - \sinh\big(\frac{\delta a}{C}\big)x

This is the equation of the straight line emanating from the x-axis to the observer’s eye in my amended diagram above. On the x-axis we have y = 0 in the straight-line equation so

x = \frac{b}{\sinh\big(\frac{\delta a}{C}\big)} - a

This is the point on the x-axis at which the observer in my amended diagram will see the mirage.
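To make the toy model fully concrete, here is a short Python sketch (entirely my own choice of made-up parameter values for a, b and \delta) that solves b = (C\cosh(\delta a/C) - 1)/\delta for C by bisection and then computes the point on the road where the mirage appears:

```python
import math

# Made-up illustrative parameters: endpoints A = (a, b), B = (-a, b), and
# refractive index n(y) = alpha + beta*y with delta = beta/alpha.
a, b, delta = 100.0, 1.5, 1.0e-5

def g(C):
    # b = (C*cosh(delta*a/C) - 1)/delta  <=>  g(C) = 0
    return (C * math.cosh(delta * a / C) - 1.0) / delta - b

lo, hi = 0.9, 1.1          # bracket chosen by inspection: g(lo) < 0 < g(hi) for these values
for _ in range(60):        # plain bisection
    mid = 0.5 * (lo + hi)
    if g(mid) > 0.0:
        hi = mid
    else:
        lo = mid
C = 0.5 * (lo + hi)

slope = math.sinh(delta * a / C)     # magnitude of y'(-a)
x_mirage = b / slope - a             # where the apparent straight-line ray meets the road
print(C, x_mirage)                   # for these numbers the image appears roughly 1.4 km away
```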

Notes on Sturm-Liouville theory

Sturm-Liouville theory was developed in the 19th century in the context of solving differential equations. When one studies it in depth, however, one experiences a sudden realisation that this is the mathematics underlying a lot of quantum mechanics. In quantum mechanics we envisage a quantum state (a time-dependent function) expressed as a superposition of eigenfunctions of a self-adjoint operator (usually referred to as a Hermitian operator) representing an observable. The coefficients of the eigenfunctions in this superposition are probability amplitudes. A measurement of the observable quantity represented by the Hermitian operator produces one of the eigenvalues of the operator with a probability equal to the square of the probability amplitude attached to the eigenfunction corresponding to that eigenvalue in the superposition. It is the fact that the operator is self-adjoint that ensures the eigenvalues are real (and thus observable), and furthermore, that the eigenfunctions corresponding to the eigenvalues form a complete and orthogonal set of functions enabling quantum states to be represented as a superposition in the first place (i.e., an eigenfunction expansion akin to a Fourier series). The Sturm-Liouville theory of the 19th century has essentially this same structure and in fact Sturm-Liouville eigenvalue problems are important more generally in mathematical physics precisely because they frequently arise in attempting to solve commonly-encountered partial differential equations (e.g., Poisson’s equation, the diffusion equation, the wave equation, etc.), particularly when the method of separation of variables is employed.

I want to get an overview of Sturm-Liouville theory in the present note and will begin by considering a nice discussion of a vibrating string problem in Courant & Hilbert’s classic text, Methods of Mathematical Physics (Volume I). Although the problem is simple and the treatment in Courant & Hilbert a bit terse, it (remarkably) brings up a lot of the key features of Sturm-Liouville theory which apply more generally in a wide variety of physics problems. I will then consider Sturm-Liouville theory in a more general setting emphasising the role of the Sturm-Liouville differential operator, and finally I will illustrate further the occurrence of Sturm-Liouville systems in physics by looking at the eigenvalue problems encountered when solving Schrödinger’s equation for the hydrogen atom.

On page 287 of Volume I, Courant & Hilbert write the following:

Equation (12) here is the one-dimensional wave equation

\frac{\partial^2 u}{\partial x^2} = \mu^2 \frac{\partial^2 u}{\partial t^2}

which (as usual) the authors are going to solve by using a separation of variables of the form

u(x, t) = v(x) g(t)

As Courant & Hilbert explain, the problem then involves finding the function v(x) by solving the second-order homogeneous linear differential equation

\frac{\partial^2 v}{\partial x^2} + \lambda v = 0

subject to the boundary conditions

v(0) = v(\pi) = 0

Although not explicitly mentioned by Courant & Hilbert at this stage, equations (13) and (13a) in fact constitute a full blown Sturm-Liouville eigenvalue problem. Despite being very simple, this setup captures many of the typical features encountered in a wide variety of such problems in physics. It is instructive to explore the text underneath equation (13a):

Not all these requirements can be fulfilled for arbitrary values of the constant \lambda.

… the boundary conditions can be fulfilled if and only if \lambda = n^2 is the square of an integer n.

To clarify this, we can try to solve (13) and (13a) for the three possible cases: \lambda < 0, \lambda = 0 and \lambda > 0. Suppose first that \lambda < 0. Then -\lambda > 0 and the auxiliary equation for (13) is

D^2 = - \lambda

\implies

D = \pm \sqrt{- \lambda}

Thus, we can write the general solution of (13) in this case as

v = \alpha e^{\sqrt{-\lambda} x} + \beta e^{-\sqrt{-\lambda} x} = A \mathrm{cosh} \big(\sqrt{-\lambda} x\big) + B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)

where A and B are constants to be determined from the boundary conditions. From the boundary condition v(0) = 0 we conclude that A = 0 so the equation reduces to

v = B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)

But from the boundary condition v(\pi) = 0 we are forced to conclude that B = 0 since \mathrm{sinh} \big(\sqrt{-\lambda} \pi\big) \neq 0. Therefore there is only the trivial solution v(x) = 0 in the case \lambda < 0.

Next, suppose that \lambda = 0. Then equation (13) reduces to

\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} = 0

\implies

v = A + Bx

From the boundary condition v(0) = 0 we must conclude that A = 0, and the boundary condition v(\pi) = 0 means we are also forced to conclude that B = 0. Thus, again, there is only the trivial solution v(x) = 0 in the case \lambda = 0.

We see that nontrivial solutions can only be obtained when \lambda > 0. In this case we have -\lambda < 0 and the auxiliary equation is

D^2 = - \lambda

\implies

D = \pm i \sqrt{\lambda}

Thus, we can write the general solution of (13) in this case as

v = \alpha e^{i \sqrt{\lambda} x} + \beta e^{- i \sqrt{\lambda} x} = A \mathrm{cos} \big(\sqrt{\lambda} x\big) + B \mathrm{sin} \big(\sqrt{\lambda} x\big)

where A and B are again to be determined from the boundary conditions. From the boundary condition v(0) = 0 we conclude that A = 0 so the equation reduces to

v = B \mathrm{sin} \big(\sqrt{\lambda} x\big)

But from the boundary condition v(\pi) = 0 we must conclude that, if B \neq 0, then we must have \sqrt{\lambda} = n where n = 1, 2, 3, \ldots. Thus, we find that for each n = 1, 2, 3, \ldots, the eigenvalues of this Sturm-Liouville problem are \lambda_n = n^2, and the corresponding eigenfunctions are v = B \mathrm{sin}\big(n x\big). The coefficient B is undetermined and must be specified through some normalisation process, for example by setting the integral of v^2 between 0 and \pi equal to 1 and then finding the value of B that is consistent with this. In Courant & Hilbert they have (implicitly) simply set B = 1.
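For instance, carrying out that normalisation explicitly with SymPy (a small sketch of my own) gives B = \sqrt{2/\pi}:

```python
import sympy as sp

# Choose B so that the integral of v^2 = (B*sin(n*x))^2 over (0, pi) equals 1.
x = sp.symbols('x')
n = sp.symbols('n', positive=True, integer=True)
B = sp.symbols('B', positive=True)
norm = sp.simplify(sp.integrate((B * sp.sin(n * x))**2, (x, 0, sp.pi)))   # = pi*B**2/2
print(sp.solve(sp.Eq(norm, 1), B))   # -> B = sqrt(2/pi) (sympy may print an equivalent form)
```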

Some features of this solution are typical of Sturm-Liouville eigenvalue problems in physics more generally. For example, the eigenvalues are real (rather than complex) numbers, there is a minimum eigenvalue (\lambda_1 = 1) but not a maximum one, and for each eigenvalue there is a unique eigenfunction (up to a multiplicative constant). Also, importantly, the eigenfunctions here form a complete and orthogonal set of functions. Orthogonality refers to the fact that the integral of a product of any two distinct eigenfunctions over the interval (0, \pi) is zero, i.e.,

\int_0^{\pi} \mathrm{sin}(nx) \mathrm{sin}(mx) \mathrm{d} x = 0

for n \neq m, as can easily be demonstrated in the same way as in the theory of Fourier series. Completeness refers to the fact that over the interval (0, \pi) the infinite set of functions \mathrm{sin} (nx), n = 1, 2, 3, \ldots, can be used to represent any sufficiently well behaved function f(x) using a Fourier series of the form

f(x) = \sum_{n=1}^{\infty} a_n \mathrm{sin} (nx)
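The orthogonality relation is easy to spot-check symbolically (a quick verification of my own, not a proof):

```python
import sympy as sp

# Spot check of the orthogonality of sin(n*x) and sin(m*x) over (0, pi) for n != m.
x = sp.symbols('x')
for n, m in [(1, 2), (2, 5), (3, 7)]:
    print(n, m, sp.integrate(sp.sin(n * x) * sp.sin(m * x), (x, 0, sp.pi)))   # each -> 0
```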

All of this is alluded to (without explicit explanation at this stage) in the subsequent part of this section of Courant & Hilbert’s text, where they go on to provide the general solution of the vibrating string problem. They write the following:

The properties of completeness and orthogonality of the eigenfunctions are again a typical feature of the solutions of Sturm-Liouville eigenvalue problems more generally, and this is one of the main reasons why Sturm-Liouville theory is so important to the solution of physical problems involving differential equations. To get a better understanding of this, I will now develop Sturm-Liouville theory in a more general setting by starting with a standard second-order homogeneous linear differential equation of the form

\alpha(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \beta(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \gamma(x) y = 0

where the variable x is confined to an interval a \leq x \leq b.

Let

p(x) = \mathrm{exp} \bigg(\int \mathrm{d} x \frac{\beta(x)}{\alpha(x)}\bigg)

q(x) = \frac{\gamma(x)}{\alpha(x)} p(x)

Dividing the differential equation by \alpha(x) and multiplying through by p(x) we get

p(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \frac{\beta(x)}{\alpha(x)} p(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \frac{\gamma(x)}{\alpha(x)} p(x) y = 0

\iff

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} y}{\mathrm{d} x} \bigg) + q(x) y = 0

\iff

L y = 0

where

L = \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} }{\mathrm{d} x} \bigg) + q(x)

is called the Sturm-Liouville differential operator. Thus, we see already that a wide variety of second-order differential equations encountered in physics will be able to be put into a form involving the operator L, so results concerning the properties of L will have wide applicability.
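As a small illustration of this reduction (my own example, using Hermite's equation y'' - 2xy' + \lambda y = 0 purely because the integrating factor comes out neatly), the functions p(x) and q(x) can be computed symbolically:

```python
import sympy as sp

# Reduce alpha*y'' + beta*y' + gamma*y = 0 to Sturm-Liouville form via
# p = exp(integral of beta/alpha) and q = (gamma/alpha)*p.
# Illustration: Hermite's equation, alpha = 1, beta = -2x, gamma = lam.
x, lam = sp.symbols('x lam')
alpha, beta, gamma = sp.Integer(1), -2 * x, lam

p = sp.exp(sp.integrate(beta / alpha, x))
q = gamma / alpha * p
print(p)   # exp(-x**2)
print(q)   # lam*exp(-x**2)
```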

Using the Sturm-Liouville operator we can now write the defining differential equation of Sturm-Liouville theory in an eigenvalue-eigenfunction format that is very reminiscent of the setup in quantum mechanics outlined at the start of this note. The defining differential equation is

L \phi = - \lambda w \phi

where w(x) is a real-valued positive weight function and \lambda is an eigenvalue corresponding to the eigenfunction \phi. This differential equation is often written out in full as

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

with x \in [a, b]. In Sturm-Liouville problems, the functions p(x), q(x) and w(x) are specified at the start and, crucially, the function \phi is required to satisfy particular boundary conditions at a and b. The boundary conditions are a key aspect of each Sturm-Liouville problem; for a given form of the differential equation, different boundary conditions can produce very different problems. Solving a Sturm-Liouville problem involves finding the values of \lambda for which there exist non-trivial solutions of the defining differential equation above subject to the specified boundary conditions. The vibrating string problem in Courant & Hilbert (discussed above) is a simple example. We obtain the differential equation (13) in that problem by setting p(x) = 1, q(x) = 0 and w(x) = 1 in the defining Sturm-Liouville differential equation.

We would now like to prove that the eigenvalues in a Sturm-Liouville problem will always be real and that the eigenfunctions will form an orthogonal set of functions, as claimed earlier. To do this, we need to consider a few more developments. In Sturm-Liouville theory we can apply L to both real and complex functions, and a key role is played by the concept of the inner product of such functions. Using the notation f(x)^{*} to denote the complex conjugate of the function f(x), we define the inner product of two functions f and g over the interval a \leq x \leq b as

(f, g) = \int_a^b \mathrm{d} x f(x)^{*} g(x)

and we define the weighted inner product as

(f, g)_w = \int_a^b \mathrm{d} x  w(x) f(x)^{*} g(x)

where w(x) is the real-valued positive weight function mentioned earlier. A key result in the theory is Lagrange’s identity, which says that for any two complex-valued functions of a real variable u(x) and v(x), we have

v(Lu)^{*} - u^{*} Lv = \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

This follows from the form of L, since

v(Lu)^{*} - u^{*} Lv = v\bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + q(x) u^{*}\bigg] - u^{*} \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) + q(x) v\bigg]

= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + \frac{\mathrm{d} v}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) - \frac{\mathrm{d} u^{*}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) v \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) u^{*} \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]
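For real-valued u and v the identity can also be confirmed symbolically (a sketch of my own; for complex-valued functions one simply replaces u by its conjugate as above):

```python
import sympy as sp

# Check of Lagrange's identity v*(L u) - u*(L v) = d/dx[ p*(v*u' - u*v') ]
# for the operator L y = (p*y')' + q*y, with real-valued u and v.
x = sp.symbols('x')
p, q, u, v = (sp.Function(name)(x) for name in ('p', 'q', 'u', 'v'))

def L(y):
    return sp.diff(p * sp.diff(y, x), x) + q * y

lhs = v * L(u) - u * L(v)
rhs = sp.diff(p * (v * sp.diff(u, x) - u * sp.diff(v, x)), x)
print(sp.simplify(sp.expand(lhs - rhs)))   # -> 0
```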

Using the inner product notation, we can write Lagrange’s identity in an alternative form that reveals the crucial role played by the boundary conditions in a Sturm-Liouville problem. We have

(Lu, v) - (u, Lv) = \int_a^b (Lu)^{*} v \mathrm{d} x - \int_a^b u^{*} Lv \mathrm{d} x

= \int_a^b \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg] \mathrm{d} x

= \int_a^b \mathrm{d} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

= \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b

For some boundary conditions the final term here is zero and then we will have

(Lu, v) = (u, Lv)

When this happens, the operator in conjunction with the boundary conditions is said to be self-adjoint. As an example, a so-called regular Sturm-Liouville problem involves solving the differential equation

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

subject to what are called separated boundary conditions, taking the form

A_1 \phi(a) + A_2 \phi^{\prime}(a) = 0

and

B_1 \phi(b) + B_2 \phi^{\prime}(b) = 0

In this case, the operator L is self-adjoint. To see this, suppose the functions u and v satisfy these boundary conditions. Then at a we have

A_1 u(a)^{*} + A_2 u^{\prime}(a)^{*} = 0

and

A_1 v(a) + A_2 v^{\prime}(a) = 0

from which we can deduce that

\frac{u^{\prime}(a)^{*}}{u(a)^{*}} = -\frac{A_1}{A_2} = \frac{v^{\prime}(a)}{v(a)}

\implies

v(a) u^{\prime}(a)^{*} = u(a)^{*} v^{\prime}(a)

Similarly, at the boundary point b we find that

v(b) u^{\prime}(b)^{*} = u(b)^{*} v^{\prime}(b)

These results then imply

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0

so the operator L is self-adjoint as claimed. As another example, a singular Sturm-Liouville problem involves solving the same differential equation as in the regular problem, but on an interval where p(x) vanishes at a or at b (or at both), while remaining positive for a < x < b; at an endpoint where p vanishes, the requirement is simply that \phi remain bounded there. If p(x) does not vanish at one of the boundary points, then \phi is required to satisfy the same separated boundary condition at that point as in the regular problem. At each endpoint the boundary term is therefore annihilated either by the vanishing of p(x) (the solutions remaining bounded) or by the separated boundary condition, so we will have

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0

in this case too, so the operator L will also be self-adjoint in the case of a singular Sturm-Liouville problem. As a final example, suppose the Sturm-Liouville problem involves solving the same differential equation as before, but with periodic boundary conditions of the form

\phi(a) = \phi(b)

\phi^{\prime}(a) = \phi^{\prime}(b)

and

p(a) = p(b)

Then if u and v are two functions satisfying these boundary conditions we will have

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b

= p(b) \bigg(v(b) u^{\prime}(b)^{*} - u(b)^{*} v^{\prime}(b)\bigg) - p(a) \bigg(v(a) u^{\prime}(a)^{*} - u(a)^{*} v^{\prime}(a)\bigg)

= p(a) \bigg[\bigg(v(b) u^{\prime}(b)^{*} - v(a) u^{\prime}(a)^{*}\bigg) + \bigg(u(a)^{*} v^{\prime}(a) - u(b)^{*} v^{\prime}(b)\bigg)\bigg] = 0

where each bracketed difference vanishes because u, u^{\prime}, v and v^{\prime} all take the same values at a and b. So again, the operator L will be self-adjoint in the case of periodic boundary conditions. We will see later that the singular and periodic cases arise when attempting to solve Schrödinger’s equation for the hydrogen atom.

The key reason for focusing so much on the self-adjoint property of the operator L is that the eigenvalues of a self-adjoint operator are always real, and the eigenfunctions are orthogonal. Note that by orthogonality of the eigenfunctions in the more general context we mean that

(\phi_n, \phi_m)_w = \int_a^b \mathrm{d} x w(x) \phi_n(x)^{*} \phi_m(x) = 0

whenever \phi_n(x) and \phi_m(x) are eigenfunctions corresponding to two distinct eigenvalues.

To prove that the eigenvalues are always real, suppose that \phi(x) is an eigenfunction corresponding to an eigenvalue \lambda. Then we have

L \phi = - \lambda w \phi

and so

(L \phi, \phi) = (- \lambda w \phi, \phi) = \int_a^b (- \lambda w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*} \int_a^b (w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*}\int_a^b \mathrm{d}x w(x)|\phi(x)|^2

But we also have

(\phi, L \phi) = (\phi, - \lambda w \phi) = \int_a^b \phi^{*}(- \lambda w \phi) \mathrm{d} x = -\lambda \int_a^b \phi^{*} (w \phi) \mathrm{d} x = -\lambda\int_a^b \mathrm{d}x w(x)|\phi(x)|^2

Therefore if the operator is self-adjoint we can write

(L \phi, \phi) - (\phi, L \phi) = (\lambda - \lambda^{*}) \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 = 0

\implies

\lambda = \lambda^{*}

since \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 > 0, so the eigenvalues must be real. In particular, this must be the case for regular and singular Sturm-Liouville problems, and for Sturm-Liouville problems involving periodic boundary conditions.

To prove that the eigenfunctions are orthogonal, let \phi(x) and \psi(x) denote two eigenfunctions corresponding to distinct eigenvalues \lambda and \mu respectively. Then we have

L \phi = - \lambda w \phi

L \psi = - \mu w \psi

and so, using the self-adjoint property together with the fact (just established) that the eigenvalue \lambda is real, we can write

(L \phi, \psi) - (\phi, L \psi) = \int_a^b (- \lambda w \phi)^{*} \psi \mathrm{d} x - \int_a^b \phi^{*} (- \mu w \psi) \mathrm{d} x

= (\mu - \lambda) \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0

Since the eigenvalues are distinct, the only way this can happen is if

(\phi, \psi)_w = \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0

so the eigenfunctions must be orthogonal as claimed.
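
Both of these facts are easy to see numerically in a concrete case. Here is a minimal sketch (my own illustration in Python, assuming numpy is available) which discretises the simplest regular Sturm-Liouville problem with a central-difference matrix and confirms that the computed eigenvalues are real, and that eigenvectors belonging to distinct eigenvalues are orthogonal in the discrete version of the weighted inner product (with w = 1):

import numpy as np

# Simplest regular Sturm-Liouville problem: p = w = 1, q = 0 on [0, pi] with
# phi(0) = phi(pi) = 0, i.e. phi'' + lambda phi = 0, whose exact eigenvalues are 1, 4, 9, ...
N = 200
x = np.linspace(0.0, np.pi, N + 2)      # grid including the two boundary points
h = x[1] - x[0]

# Central-difference matrix for -d^2/dx^2 acting on the interior grid points
A = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2

evals, evecs = np.linalg.eig(A)         # deliberately not using a symmetric solver
order = np.argsort(evals.real)
evals, evecs = evals[order], evecs[:, order]

print(np.max(np.abs(evals.imag)))       # ~0: the eigenvalues are real
print(evals.real[:4])                   # approximately 1, 4, 9, 16
print(h * np.dot(evecs[:, 0].real, evecs[:, 1].real))   # ~0: discrete (phi_1, phi_2)_w with w = 1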

In addition to being orthogonal, the eigenfunctions \phi_n(x), n = 1, 2, 3, \dots, of a Sturm-Liouville problem with specified boundary conditions also form a complete set of functions (I will not prove this here), which means that any sufficiently well-behaved function f(x) for which \int_a^b\mathrm{d} x |f(x)|^2 exists can be represented by a Fourier series of the form

f(x) = \sum_{n=1}^{\infty} a_n \phi_n(x)

for x \in [a, b], where the coefficients a_n are given by the formula

a_n = \frac{(\phi_n, f)_w}{(\phi_n, \phi_n)_w} = \frac{\int_a^b \mathrm{d}x w(x) \phi_n(x)^{*} f(x)}{\int_a^b \mathrm{d}x w(x) |\phi_n(x)|^2}
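
As a concrete illustration of this coefficient formula, here is a minimal sketch (Python with numpy and scipy; the choice of eigenfunctions \phi_n(x) = \sin(nx) on [0, \pi] with w = 1 is simply the regular problem used in the sketch above, and the sample function f is my own arbitrary choice) which computes the first few coefficients and checks that the partial sum reproduces f at a test point:

import numpy as np
from scipy.integrate import quad

# Eigenfunctions phi_n(x) = sin(n x) on [0, pi] with weight w(x) = 1; f is a sample function
f = lambda x: x * (np.pi - x)
phi = lambda n, x: np.sin(n * x)

def coeff(n):
    # a_n = (phi_n, f)_w / (phi_n, phi_n)_w with w = 1
    num, _ = quad(lambda x: phi(n, x) * f(x), 0.0, np.pi)
    den, _ = quad(lambda x: phi(n, x) ** 2, 0.0, np.pi)
    return num / den

n_max = 30
a = [coeff(n) for n in range(1, n_max + 1)]
x0 = 1.0
print(sum(a[n - 1] * phi(n, x0) for n in range(1, n_max + 1)), f(x0))   # the two values agree closely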

It is the completeness and orthogonality of the eigenfunctions that makes Sturm-Liouville theory so useful in solving linear differential equations, because (for example) it means that the solutions of many second-order inhomogeneous linear differential equations of the form

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)

with suitable boundary conditions can be expressed as a linear combination of the eigenfunctions of the corresponding Sturm-Liouville problem

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

with the same boundary conditions. To illustrate this, suppose this Sturm-Liouville problem with boundary conditions \phi(a) = \phi(b) = 0 has an infinite set of eigenvalues \lambda_k and corresponding eigenfunctions \phi_k(x), k = 1, 2, 3, \dots, which are orthogonal and form a complete set. We will assume that the solution of the inhomogeneous differential equation above is an infinite series of the form

\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)

where the coefficients a_k are constants, and we will find these coefficients using the orthogonality of the eigenfunctions. Since for each k it is true that

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k = - \lambda_k w(x) \phi_k

we can write

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q \phi

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \sum_{k = 1}^{\infty} a_k \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \sum_{k=1}^{\infty} a_k \phi_k

= \sum_{k=1}^{\infty} a_k \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k\bigg]

= \sum_{k=1}^{\infty} a_k\big[- \lambda_k w(x) \phi_k\big]

= - \sum_{k=1}^{\infty} a_k \lambda_k w(x) \phi_k

Thus, in the inhomogeneous equation

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)

we can, relabelling the variable x as u for convenience in what follows, put

F(u) = - \sum_{k=1}^{\infty} a_k \lambda_k w(u) \phi_k(u)

To find the m-th coefficient a_m we can multiply both sides by \phi_m(u)^{*} and integrate, assuming also that none of the eigenvalues \lambda_k is zero so that we may divide by \lambda_m below. By orthogonality, all the terms in the sum on the right will vanish except the one involving \phi_m(u). We will get

\int_a^b \phi_m(u)^{*} F(u) \mathrm{d}u = - \int_a^b a_m \lambda_m w(u) \phi_m(u)^{*}\phi_m(u) \mathrm{d} u = -a_m \lambda_m (\phi_m, \phi_m)_w

\implies

a_m = -\int_a^b \frac{\phi_m(u)^{*} F(u)}{\lambda_m (\phi_m, \phi_m)_w}\mathrm{d} u

Having found a formula for the coefficients a_k, we can now write the solution of the original inhomogeneous differential equation as

\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)

= \sum_{k = 1}^{\infty} \bigg(-\int_a^b \frac{\phi_k(u)^{*} F(u)}{\lambda_k (\phi_k, \phi_k)_w}\mathrm{d} u\bigg) \phi_k(x)

= \int_a^b \mathrm{d} u \bigg(-\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}\bigg) F(u)

= \int_a^b \mathrm{d} u G(x, u) F(u)

where

G(x, u) \equiv -\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}
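
To make this construction concrete, here is a minimal numerical sketch (Python with numpy and scipy) for the particular choice p = w = 1, q = 0 on [0, \pi] with \phi(a) = \phi(b) = 0, so that \phi_k(x) = \sin(kx), \lambda_k = k^2 and (\phi_k, \phi_k)_w = \pi/2; these specific choices are mine, purely for illustration. It computes the coefficients a_k from the formula above and compares the resulting series solution of \phi^{\prime\prime} = F with the exact solution for F = 1:

import numpy as np
from scipy.integrate import quad

# Concrete case: phi_k(x) = sin(k x), lambda_k = k^2, (phi_k, phi_k)_w = pi / 2.
# We solve phi'' = F(x) for F = 1, whose exact solution is phi(x) = x (x - pi) / 2.
F = lambda u: 1.0

def a_k(k):
    # a_k = -(integral of phi_k(u) F(u) du) / (lambda_k (phi_k, phi_k)_w)
    integral, _ = quad(lambda u: np.sin(k * u) * F(u), 0.0, np.pi)
    return -integral / (k ** 2 * (np.pi / 2))

def phi(x, k_max=50):
    return sum(a_k(k) * np.sin(k * x) for k in range(1, k_max + 1))

x0 = np.pi / 2
print(phi(x0), x0 * (x0 - np.pi) / 2)   # both approximately -pi^2 / 8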

To conclude this note, I want to go back to a previous note in which I explored in detail the solution of Schrödinger’s equation for the hydrogen atom by the method of separation of variables. This approach reduced Schrödinger’s partial differential equation to a set of three uncoupled ordinary differential equations, which we can now see are in fact Sturm-Liouville problems. As discussed in my previous note, Schrödinger’s three-dimensional equation for the hydrogen atom can be written in spherical polar coordinates as

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0

and after solving this by the usual separation of variables approach starting from the assumption that the \psi function can be expressed as a product

\psi(r, \theta, \phi) = R(r) \Phi(\phi) \Theta(\theta)

we end up with an equation for R (the radial equation) of the form

\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0

and equations for \Phi and \Theta of the forms

\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0

and

\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0

respectively. Taking each of these in turn, we first observe that the radial equation is of the Sturm-Liouville form with p(r) = r^2 and eigenvalues corresponding to the energy term E in the equation. The variable r can range between 0 and \infty and the boundary conditions are formulated in such a way that the solutions of the radial equation remain bounded as r \rightarrow 0 and go to zero as r \rightarrow \infty. Furthermore, since p(0) = 0, the radial equation is a singular Sturm-Liouville problem. Next, we observe that the equation for \Phi is essentially the same as equation (13) for the vibrating string in the extract from Courant & Hilbert discussed at the start of this note. The azimuth angle \phi can take any value in (-\infty, \infty) but the function \Phi must take a single value at each point in space (since this is a required property of the quantum wave function of which \Phi is a constituent). It follows that the function \Phi must be periodic since it must take the same value at \phi and \phi + 2\pi for any given \phi. This condition implies the conditions \Phi(0) = \Phi(2 \pi) and \Phi^{\prime}(0) = \Phi^{\prime}(2\pi). Furthermore, we have p(\phi) = 1 for all \phi. Thus, the equation for \Phi is a Sturm-Liouville problem with periodic boundary conditions, with k playing the role of the eigenvalue. Finally, as discussed in my previous note, the \Theta equation can be rewritten (with the separation constant k written as m^2) as

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

\iff

\frac{d}{d x}\bigg((1 - x^2) \frac{d \Theta}{d x}\bigg) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

where x = \cos \theta and thus -1 \leq x \leq 1. This is a Sturm-Liouville problem with p(x) = 1 - x^2 and \lambda as the eigenvalue, and the boundary conditions are given by the requirement that \Theta should remain bounded on the whole interval -1 \leq x \leq 1, including the endpoints x = \pm 1 (i.e., \theta = 0 and \theta = \pi). Since p(x) = 0 at both ends of the interval [-1, 1], this equation can be classified as a singular Sturm-Liouville problem.
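
As a small numerical illustration of the orthogonality predicted by the general theory for this singular problem, and assuming the standard fact that for m = 0 the solutions bounded on [-1, 1] are the Legendre polynomials P_l(x) with \lambda = l(l+1), the following sketch (Python with scipy) checks a few of the orthogonality integrals with weight w(x) = 1:

from scipy.integrate import quad
from scipy.special import eval_legendre

# Orthogonality of the Legendre polynomials P_l(x) on [-1, 1] with weight w(x) = 1
for l, lp in [(2, 3), (1, 4), (0, 5)]:
    integral, _ = quad(lambda x: eval_legendre(l, x) * eval_legendre(lp, x), -1.0, 1.0)
    print(l, lp, integral)              # all approximately 0

norm, _ = quad(lambda x: eval_legendre(2, x) ** 2, -1.0, 1.0)
print(norm, 2.0 / (2 * 2 + 1))          # both equal the known normalisation 2/(2l+1) = 0.4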

 

Overview of the Lie theory of rotations

A Lie group is a group which is also a smooth differentiable manifold. Every Lie group has an associated tangent space at the identity called its Lie algebra. As a vector space, the Lie algebra is often easier to study than the associated Lie group and can reveal most of what we need to know about the group. This is one of the general motivations for Lie theory. Tables of common Lie groups and their associated Lie algebras can be found in standard references. All closed matrix groups (i.e., closed subgroups of a general linear group) are Lie groups. An example of a matrix Lie group is the D-dimensional rotation group SO(D). This group is linked to a set of D(D-1)/2 antisymmetric matrices which span the associated Lie algebra, usually denoted by \mathfrak{so}(D). Like all Lie algebras corresponding to Lie groups, the Lie algebra \mathfrak{so}(D) is characterised by a Lie bracket operation which here takes the form of commutation relations between the above-mentioned antisymmetric matrices, satisfying the formula

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}

The link between \mathfrak{so}(D) and SO(D) is provided by the matrix exponential map \mathrm{exp}: \mathfrak{so}(D) \rightarrow SO(D) in the sense that each point in the Lie algebra is mapped to a corresponding point in the Lie group by matrix exponentiation. Furthermore, the exponential map defines parametric paths passing through the identity element in the Lie group. The tangent vectors obtained by differentiating these parametric paths and evaluating the derivatives at the identity are the elements of the Lie algebra, showing that the Lie algebra is the tangent space of the associated Lie group manifold.

In the rest of this note I will unpack some aspects of the above brief summary without going too much into highly technical details. The Lie theory of rotations is based on a simple symmetry/invariance consideration, namely that rotations leave the scalar products of vectors invariant. In particular, they leave the lengths of vectors invariant. The Lie theory approach is much more easily generalisable to higher dimensions than the elementary trigonometric approach using the familiar rotation matrices in two and three dimensions. Instead of obtaining the familiar trigonometric rotation matrices by analysing the trigonometric effects of rotations, we will see below that they arise in Lie theory from the exponential map linking the Lie algebra \mathfrak{so}(D) to the rotation group SO(D), in a kind of matrix analogue of Euler’s formula e^{ix} = \mathrm{cos}x +  i \mathrm{sin}x.

Begin by considering rotations in D-dimensional Euclidean space as being implemented by multiplying vectors by a D \times D rotation matrix R(\vec{\theta}) which is a continuous function of some parameter vector \vec{\theta} such that R(\vec{0}) = I. In Lie theory we regard these rotations as being infinitesimally small, in the sense that they move us away from the identity by an infinitesimally small amount. If \mathrm{d}\vec{x} is the column vector of coordinate differentials, then the rotation embodied in R(\vec{\theta}) is implemented as

\mathrm{d}\vec{x}^{\ \prime} = R \mathrm{d}\vec{x}

Since we require lengths to remain unchanged after rotation, we have

\mathrm{d}\vec{x}^{\ \prime \ T} \mathrm{d}\vec{x}^{\ \prime} = \mathrm{d}\vec{x}^{\ T}R^T R \mathrm{d}\vec{x} = \mathrm{d}\vec{x}^{\ T}\mathrm{d}\vec{x}

which implies

R^T R = I

In other words, the matrix R must be orthogonal. Furthermore, since the determinant of a product is the product of the determinants, and the determinant of a transpose is the same as the original determinant, we can write

\mathrm{det}(R^T R) = (\mathrm{det}R)^2 = \mathrm{det}(I) = 1

Therefore we must have

\mathrm{det}(R) = \pm 1

But we can exclude the case \mathrm{det}(R) = -1 because the set of orthogonal matrices with negative determinants produces reflections. For example, the orthogonal matrix

\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}

has determinant -1 and results in a reflection in the x-axis when applied to a vector. Here we are only interested in rotations, which we can now define as having orthogonal transformation matrices R such that \mathrm{det}(R) = 1. Matrices which have unit determinant are called special, so focusing purely on rotations means that we are dealing exclusively with the set of special orthogonal matrices of dimension D, denoted by SO(D).

It is straightforward to verify that SO(D) constitutes a group under matrix multiplication: the product of two special orthogonal matrices is again special orthogonal, the identity matrix I belongs to the set, each element R \in SO(D) has an inverse R^{-1} = R^T which is itself special orthogonal, and matrix multiplication is associative. Note that closure means a rotation matrix times a rotation matrix must give another rotation matrix, so this is another property that R(\vec{\theta}) needs to satisfy.

The fact that SO(D) is also a differentiable manifold, and therefore a Lie group, follows in a technical way (which I will not delve into here) from the fact that SO(D) is a closed subgroup of the set of all invertible D \times D real matrices, usually denoted by GL(D, \mathbb{R}), and this itself is a manifold of dimension D^2. The latter fact is demonstrated easily by noting that the determinant function M \mapsto \mathrm{det}(M), viewed as a function on the space of all D \times D real matrices, is continuous, and GL(D, \mathbb{R}) is the inverse image under this function of the open set \mathbb{R} - \{0\}. Thus, GL(D, \mathbb{R}) is itself an open subset in the D^2-dimensional linear space of all the D \times D real matrices, and thus a manifold of dimension D^2. The matrix Lie group SO(D) is a manifold of dimension \frac{D(D-1)}{2}, not D^2. One way to appreciate this is to observe that the condition R^T R = I for every R \in SO(D) means that you only need to specify \frac{D(D-1)}{2} off-diagonal elements to specify each R. In other words, there are D^2 elements in each R but the condition R^T R = I imposes \frac{D(D+1)}{2} equations linking them, so the number of 'free' elements in each R \in SO(D) is only D^2 - \frac{D(D+1)}{2} = \frac{D(D-1)}{2}. We will see shortly that \frac{D(D-1)}{2} is also the dimension of \mathfrak{so}(D), which must be the case given that \mathfrak{so}(D) is to be the tangent space of the manifold SO(D) (the dimension of a manifold is the dimension of its tangent space).

If we now Taylor-expand R(\vec{\theta}) to first order about \vec{\theta} = \vec{0} we get

R(\vec{\theta}) \approx I + A

where A is an infinitesimal matrix of order \vec{\theta} and we will (for now) ignore terms like A^2, A^3, \ldots which are of second and higher order in \vec{\theta}. Now substituting R = I + A into R^T R = I and discarding the second-order term A^T A, we get

(I + A)^T (I + A) = I + A^T + A = I

\implies

A^T = -A

Thus, the matrix A must be antisymmetric. In fact, A will be a linear combination of some elementary antisymmetric basis matrices which play a crucial role in the theory, so we will explore this more. Since a sum of antisymmetric matrices is antisymmetric, and a scalar multiple of an antisymmetric matrix is antisymmetric, the set of all D \times D antisymmetric matrices is a vector space. This vector space has a basis provided by some elementary antisymmetric matrices containing only two non-zero elements each, the two non-zero elements in each matrix appearing in corresponding positions either side of the main diagonal and having opposite signs (this is what makes the matrices antisymmetric). Since there are \frac{D(D-1)}{2} distinct pairs of possible off-diagonal positions for these two non-zero elements, the basis has dimension \frac{D(D-1)}{2} and, as will be seen shortly, this vector space in fact turns out to be the Lie algebra \mathfrak{so}(D). The basis matrices will be written as J_{(mn)}, where the distinct indices m and n identify the pair of corresponding off-diagonal positions in which the two non-zero elements appear (each unordered pair contributes a single basis matrix, since J_{(nm)} = -J_{(mn)}). With m and n fixed, the element in the w-th row and k-th column of the matrix J_{(mn)} is given by the formula

(J_{(mn)})_{wk} = \delta_{mw} \delta_{nk} - \delta_{mk} \delta_{nw}

To clarify this, we will consider the antisymmetric basis matrices for D = 2, D = 3 and D = 4. In the case D = 2 we have \frac{D(D-1)}{2} = 1 so there is a single antisymmetric matrix. Setting m = 1, n = 2, we get (J_{(12)})_{12} = 1 - 0 = 1 and (J_{(12)})_{21} = 0 - 1 = -1 so the antisymmetric matrix is

J_{(12)} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}

In the case D = 3 we have \frac{D(D-1)}{2} = 3 antisymmetric basis matrices corresponding to the three possible pairs of off-diagonal positions for the two non-zero elements in each matrix. Following the same approach as in the previous case, these can be written as

J_{(12)} = \begin{bmatrix} 0 & 1 & 0\\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

J_{(23)} = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{bmatrix}

J_{(31)} = \begin{bmatrix} 0 & 0 & -1\\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}

Finally, in the case D = 4 we have \frac{D(D-1)}{2} = 6 antisymmetric basis matrices corresponding to the six possible pairs of off-diagonal positions for the two non-zero elements in each matrix. These can be written as

J_{(12)} = \begin{bmatrix} 0 & 1 & 0 & 0\\ -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(23)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(31)} = \begin{bmatrix} 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(41)} = \begin{bmatrix} 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \end{bmatrix}

J_{(42)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}

J_{(43)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
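
These matrices are easy to generate programmatically. The following minimal sketch (my own illustration in Python, assuming numpy is available) builds J_{(mn)} for any D directly from the index formula above, and can be used to check the matrices listed for D = 3 and D = 4:

import numpy as np

# Build J_(mn) from (J_(mn))_{wk} = delta_{mw} delta_{nk} - delta_{mk} delta_{nw}
# (indices here are 1-based, to match the text)
def J(m, n, D):
    A = np.zeros((D, D))
    A[m - 1, n - 1] += 1.0
    A[n - 1, m - 1] -= 1.0      # J_(mm) correctly comes out as the zero matrix
    return A

print(J(1, 2, 3))    # compare with the displayed J_(12) for D = 3
print(J(3, 1, 3))    # compare with the displayed J_(31) for D = 3
print(J(4, 2, 4))    # compare with the displayed J_(42) for D = 4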

So in the case of a general infinitesimal rotation in D-dimensional space of the form R(\vec{\theta}) \approx I + A, the antisymmetric matrix A will be a linear combination of the \frac{D(D-1)}{2} antisymmetric basis matrices J_{(mn)} of the form

A = \sum_m \sum_n \theta_{(mn)} J_{(mn)}

But note that using the standard matrix exponential series we have

e^{A} = I + A + \frac{1}{2}A^2 + \cdots \approx I + A

This suggests

R(\vec{\theta}) \approx e^A

and in fact this relationship between rotations and the exponentials of antisymmetric matrices turns out to be exact, not just an approximation. To see this, observe that A and A^{\ T} commute since A^{\ T}A = AA^{\ T} = -A^2. This means that

(e^{A})^{\ T} e^A = e^{A^{\ T}} e^A = e^{A^{\ T} + A} = e^0 = I

(note that in matrix exponentiation e^Ae^B = e^{A+B} only if A and B commute – see below). Since the diagonal elements of an antisymmetric matrix are always zero, we also have

\mathrm{det}(e^A) = e^{tr(A)} = e^0 = 1

Thus, e^A is both special and orthogonal, so it must be an element of SO(D). Conversely, suppose e^A \in SO(D). Then we must have

(e^{A})^{\ T} e^A = I

\iff

e^{A^{\ T}}e^A = I

\iff

e^{A^{\ T}} = e^{-A}

\implies

A^{\ T} = -A

so A is antisymmetric (strictly speaking, this last implication requires A to lie in a neighbourhood of the zero matrix on which the matrix exponential is one-to-one, which is all we need here).
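
This two-way relationship is easy to check numerically. The following minimal sketch (Python, assuming numpy and scipy are available) exponentiates a randomly generated antisymmetric matrix and confirms that the result is orthogonal with unit determinant:

import numpy as np
from scipy.linalg import expm

# Exponentiate a randomly generated antisymmetric matrix and check the result lies in SO(D)
rng = np.random.default_rng(0)
D = 5
M = rng.standard_normal((D, D))
A = M - M.T                                   # antisymmetric by construction

R = expm(A)
print(np.max(np.abs(R.T @ R - np.eye(D))))    # ~0: R is orthogonal
print(np.linalg.det(R))                       # ~1: R is special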

So we have a tight link between \mathfrak{so}(D) and SO(D) via matrix exponentiation. We can do a couple of things with this. First, for any real parameter t \in \mathbb{R} and antisymmetric basis matrix J_{(mn)}, we have R(t) \equiv e^{t J_{(mn)}} \in SO(D) and this defines a parametric path through SO(D) which passes through its identity element at t = 0. Differentiating with respect to t and evaluating the derivative at t = 0 we find that

R^{\ \prime}(0) = J_{(mn)}

which indicates that the antisymmetric basis matrices J_{(mn)} are tangent vectors of the manifold SO(D) at the identity, and that the set of \frac{D(D-1)}{2} antisymmetric basis matrices spans the tangent space of SO(D) at the identity. Another thing we can do with the matrix exponential map is quickly recover the elementary rotation matrix in the case D = 2. Noting that J_{(12)}^2 = -I and separating the exponential series into even and odd terms in the usual way we find that

R(\theta) = e^{\theta J_{(12)}} = \mathrm{cos}\theta I + \mathrm{sin}\theta J_{(12)} = \begin{bmatrix} \mathrm{cos}\theta & \mathrm{sin}\theta \\ -\mathrm{sin}\theta & \mathrm{cos}\theta \end{bmatrix}

where the single real number \theta here is the angle of rotation. This is the matrix analogue of Euler’s formula e^{ix} = \mathrm{cos}x +  i \mathrm{sin}x that was mentioned earlier.
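
The matrix version of Euler’s formula can also be checked numerically, for example with the following minimal sketch (Python, assuming numpy and scipy are available) comparing e^{\theta J_{(12)}} with the trigonometric rotation matrix:

import numpy as np
from scipy.linalg import expm

theta = 0.7
J12 = np.array([[0.0, 1.0], [-1.0, 0.0]])

R_exp = expm(theta * J12)
R_trig = np.array([[np.cos(theta), np.sin(theta)],
                   [-np.sin(theta), np.cos(theta)]])
print(np.max(np.abs(R_exp - R_trig)))   # ~0: the two matrices agree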

To further elucidate how the antisymmetric basis matrices J_{(mn)} form a Lie algebra which is closely tied to the matrix Lie group SO(D), we will show that the commutation relation between them is closed (i.e., that the commutator of two antisymmetric basis matrices is itself antisymmetric), and that these commutators play a crucial role in ensuring the closure of the group SO(D) (i.e., in ensuring that a rotation multiplied by a rotation produces another rotation). First, suppose that A and B are two distinct antisymmetric matrices. Then since the transpose of a product is the product of the transposes in reverse order we can write

([A, B])^{\ T} = (AB - BA)^{\ T} = (AB)^{\ T} - (BA)^{\ T}

= B^{\ T} A^{\ T}  - A^{\ T} B^{\ T} = BA - AB = - [A, B]

This shows that the commutator of two antisymmetric matrices is itself antisymmetric, so the commutator can be written as a linear combination of the antisymmetric basis matrices J_{(mn)}. Furthermore, since we can write A = \sum_{m, n} \theta_{(mn)} J_{(mn)} and B = \sum_{p, q} \theta_{(pq)}^{\ \prime} J_{(pq)}, we have

[A, B] = \sum_{m, n, p, q} \theta_{(mn)} \theta_{(pq)}^{\ \prime}[J_{(mn)}, J_{(pq)}]

so every commutator between antisymmetric matrices can be written in terms of the commutators [J_{(mn)}, J_{(pq)}] of the antisymmetric basis matrices. Next, suppose we exponentiate the antisymmetric matrices A and B to obtain the rotations e^A and e^B. Since SO(D) is closed, it must be the case that

e^A e^B = e^C

where e^C is another rotation and therefore C is an antisymmetric matrix. To see the role of the commutator between antisymmetric matrices in ensuring this, we will expand both sides. For the left-hand side we get

e^A e^B = (I + A + \frac{1}{2}A^2 + \cdots)(I + B + \frac{1}{2}B^2 + \cdots)

= I + A + B + \frac{1}{2}A^2 + \frac{1}{2}B^2 + AB + \cdots

= I + A + B + \frac{1}{2}(A^2 + AB + BA + B^2) + \frac{1}{2}[A, B] + \cdots

= I + A + B + \frac{1}{2}(A + B)^2 + \frac{1}{2}[A, B] + \cdots

For the right-hand side we get

e^C = I + C + \frac{1}{2}C^2 + \cdots

Since C = A + B to first order, the term \frac{1}{2}C^2 reproduces the term \frac{1}{2}(A + B)^2 up to second order, so equating the two expansions and cancelling these terms gives

C = A + B + \frac{1}{2}[A, B] + \cdots

where the remaining terms on the right-hand side are of third and higher order in A and B. A result known as the Baker-Campbell-Hausdorff formula shows that these remaining terms are in fact all nested commutators of A and B. The series for C with a few additional terms expressed in this way is

C = A + B + \frac{1}{2}[A, B]

+ \frac{1}{12}\big([A, [A, B]] + [B, [B, A]]\big)

- \frac{1}{24}[B, [A, [A, B]]]

- \frac{1}{720}\big([B, [B, [B, [B, A]]]] + [A, [A, [A, [A, B]]]]\big) + \cdots

This shows that e^A e^B \neq e^{A + B} unless A and B commute, since only in this case do all the commutator terms in the series for C vanish. Since the commutator of two antisymmetric matrices is itself antisymmetric, this result also shows that C is an antisymmetric matrix, and therefore e^C must be a rotation.
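
The low-order terms of this series can be checked numerically. Here is a minimal sketch (Python, assuming numpy and scipy are available) which generates two small random antisymmetric matrices, computes C exactly as the matrix logarithm of e^A e^B, and compares it with the series truncated after the \frac{1}{12} terms; it also confirms that the truncated series is itself antisymmetric:

import numpy as np
from scipy.linalg import expm, logm

def comm(X, Y):
    return X @ Y - Y @ X

rng = np.random.default_rng(1)
D = 4

def small_antisymmetric(scale=0.05):
    M = scale * rng.standard_normal((D, D))
    return M - M.T

A, B = small_antisymmetric(), small_antisymmetric()

C_exact = logm(expm(A) @ expm(B)).real
C_series = (A + B + 0.5 * comm(A, B)
            + (comm(A, comm(A, B)) + comm(B, comm(B, A))) / 12.0)

print(np.max(np.abs(C_exact - C_series)))     # small: the first neglected term is fourth order
print(np.max(np.abs(C_series + C_series.T)))  # ~0: the truncated series is antisymmetric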

Since every commutator between antisymmetric matrices can be written in terms of the commutators [J_{(mn)}, J_{(pq)}] of the antisymmetric basis matrices, a general formula for the latter would seem to be useful. In fact, the formula given earlier, namely

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}

completely characterises the Lie algebra \mathfrak{so}(D). To conclude this note we will therefore derive this formula ab initio, starting from the formula

(J_{(mn)})_{wk} = \delta_{mw} \delta_{nk} - \delta_{mk} \delta_{nw}

for the wk-th element of each matrix J_{(mn)}. We have

[J_{(mn)}, J_{(pq)}] = J_{(mn)} J_{(pq)} - J_{(pq)} J_{(mn)}

Focus on J_{(mn)} J_{(pq)} first. Using the Einstein summation convention, the product of the w-th row of J_{(mn)} with the k-th column of J_{(pq)} is

(J_{(mn)})_{wz} (J_{(pq)})_{zk}

= (\delta_{mw} \delta_{nz} - \delta_{mz} \delta_{nw})(\delta_{pz} \delta_{qk} - \delta_{pk} \delta_{qz})

= \delta_{mw} \delta_{qk} \delta_{nz} \delta_{pz} + \delta_{nw} \delta_{pk} \delta_{mz} \delta_{qz} - \delta_{mw} \delta_{pk} \delta_{nz} \delta_{qz} - \delta_{nw} \delta_{qk} \delta_{mz} \delta_{pz}

Now focus on J_{(pq)} J_{(mn)}. The product of the w-th row of J_{(pq)} with the k-th column of J_{(mn)} is

(J_{(pq)})_{wz} (J_{(mn)})_{zk}

= (\delta_{pw} \delta_{qz} - \delta_{pz} \delta_{qw})(\delta_{mz} \delta_{nk} - \delta_{mk} \delta_{nz})

= \delta_{pw} \delta_{nk} \delta_{qz} \delta_{mz} + \delta_{qw} \delta_{mk} \delta_{pz} \delta_{nz} - \delta_{pw} \delta_{mk} \delta_{qz} \delta_{nz} - \delta_{qw} \delta_{nk} \delta_{pz} \delta_{mz}

So the element in the w-th row and k-th column of [J_{(mn)}, J_{(pq)}] is

\delta_{mw} \delta_{qk} \delta_{nz} \delta_{pz} + \delta_{nw} \delta_{pk} \delta_{mz} \delta_{qz} + \delta_{pw} \delta_{mk} \delta_{qz} \delta_{nz} + \delta_{qw} \delta_{nk} \delta_{pz} \delta_{mz}

- \delta_{mw} \delta_{pk} \delta_{nz} \delta_{qz} -  \delta_{nw} \delta_{qk} \delta_{mz} \delta_{pz} - \delta_{pw} \delta_{nk} \delta_{qz} \delta_{mz} - \delta_{qw} \delta_{mk} \delta_{pz} \delta_{nz}

But notice that

\delta_{nz} \delta_{pz} = \delta_{np}

and similarly for the other Einstein summation terms. Thus, the above sum reduces to

(\delta_{mw} \delta_{qk} - \delta_{qw} \delta_{mk})\delta_{np} + (\delta_{nw} \delta_{pk} - \delta_{pw} \delta_{nk})\delta_{mq}

+ (\delta_{pw} \delta_{mk} - \delta_{mw} \delta_{pk})\delta_{nq} + (\delta_{qw} \delta_{nk} - \delta_{nw} \delta_{qk})\delta_{mp}

But

(\delta_{mw} \delta_{qk} - \delta_{mk} \delta_{qw})\delta_{np} = \delta_{np} (J_{(mq)})_{wk}

(\delta_{nw} \delta_{pk} - \delta_{nk} \delta_{pw})\delta_{mq} = \delta_{mq} (J_{(np)})_{wk}

(\delta_{mk} \delta_{pw} - \delta_{mw} \delta_{pk})\delta_{nq} = - \delta_{nq} (J_{(mp)})_{wk}

(\delta_{nk} \delta_{qw} - \delta_{nw} \delta_{qk})\delta_{mp} = - \delta_{mp} (J_{(nq)})_{wk}

Thus the element in the w-th row and k-th column of [J_{(mn)}, J_{(pq)}] is

\delta_{np} (J_{(mq)})_{wk} + \delta_{mq} (J_{(np)})_{wk} - \delta_{mp} (J_{(nq)})_{wk} - \delta_{nq} (J_{(mp)})_{wk}

Extending this to the matrix as a whole gives the required formula:

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}
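
As a final check on this derivation, the following minimal sketch (Python, assuming numpy is available) spot-checks the commutation relation numerically for randomly chosen index values, building the basis matrices from the same index formula as before:

import numpy as np

# Spot-check the commutation relation for randomly chosen index values in D = 5
def J(m, n, D):
    A = np.zeros((D, D))
    A[m - 1, n - 1] += 1.0
    A[n - 1, m - 1] -= 1.0
    return A

def delta(i, j):
    return 1.0 if i == j else 0.0

D = 5
rng = np.random.default_rng(2)
for _ in range(20):
    m, n, p, q = rng.integers(1, D + 1, size=4)
    lhs = J(m, n, D) @ J(p, q, D) - J(p, q, D) @ J(m, n, D)
    rhs = (delta(n, p) * J(m, q, D) + delta(m, q) * J(n, p, D)
           - delta(m, p) * J(n, q, D) - delta(n, q) * J(m, p, D))
    assert np.allclose(lhs, rhs)
print("commutation relations verified for 20 random index choices")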

A note on Green’s theorem in the plane and finding areas enclosed by parametric curves

Green’s theorem in the plane says that if P, Q, \partial P/\partial y and \partial Q/\partial x are single-valued and continuous in a simply connected region \mathfrak{R} bounded by a simple closed curve C, then

\oint_C P \mathrm{d}x + Q \mathrm{d}y = \iint_{\mathfrak{R}}\big(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\big) \mathrm{d}x \mathrm{d}y

where the line integral along C is in the anti-clockwise direction, as shown in the sketch. The theorem allows one to replace a double-integral over the region \mathfrak{R} by a line integral around the boundary curve C, or vice versa, whichever is the easier one to solve. It acts as a template for generating a multitude of useful formulas of this kind that can be tailored to suit particular situations by carefully choosing the forms of the functions P and Q. For example, in the context of integration of vector functions, if we write a two-dimensional vector V in unit-vector form as V = i V_x + j V_y, then putting Q = V_x and P = - V_y in Green’s theorem gives the divergence theorem in two dimensions, whereas putting Q = V_y and P = V_x gives Stokes’ theorem in two dimensions. Another result like this is obtained by putting Q = x and P = -y in Green’s theorem. This yields a formula for calculating the area enclosed by the simple closed curve C in the sketch above:

\oint_C x\mathrm{d}y - y \mathrm{d}x = \iint_{\mathfrak{R}} \big(\frac{\partial x}{\partial x} - \frac{\partial (-y)}{\partial y}\big) \mathrm{d}x \mathrm{d}y = 2 \iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y

\implies

\iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y = \frac{1}{2} \oint_C x\mathrm{d}y - y \mathrm{d}x

In the present note I want to quickly record a couple of observations about how this last result extends to cases in which the relevant curve C is defined parametrically rather than in terms of Cartesian coordinates.

First, the result can easily be adapted to obtain a very useful formula for finding the areas of closed parametric curves. If the curve C is defined parametrically by x(t) and y(t), then simply changing variables from x and y to t in the formula above gives

\iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y = \frac{1}{2} \oint_{C} x \mathrm{d}y - y \mathrm{d}x = \frac{1}{2}  \oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t

The expression on the right-hand side can immediately be applied to a huge range of problems involving finding areas of closed parametric curves. As a simple initial illustration to check that it works, we can use it to confirm that the area of a circle of radius r is \pi r^2. The circle is described by the Cartesian equation x^2 + y^2 = r^2 but has a parametric representation x = r\mathrm{cos} (t), y = r \mathrm{sin} (t), with t ranging from 0 to 2 \pi. Therefore \dot{x} = -r\mathrm{sin} (t) and \dot{y} = r\mathrm{cos} (t). Putting these into the formula we get

\frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_{t=0}^{2 \pi} (r^2\mathrm{cos}^2(t) + r^2\mathrm{sin}^2(t)) \mathrm{d}t = \frac{r^2}{2} \int_{t=0}^{2 \pi} \mathrm{d} t = \pi r^2

as expected.

As a slightly more interesting example, we can find the area of the main cardioid in the Mandelbrot set. This has parametric representation x = \frac{1}{2} \mathrm{cos}(t) - \frac{1}{4} \mathrm{cos}(2t), y = \frac{1}{2} \mathrm{sin}(t) - \frac{1}{4} \mathrm{sin}(2t) with t ranging from 0 to 2 \pi (see, e.g., Weisstein, Eric W., Mandelbrot set, From MathWorld – A Wolfram Web Resource). We find that \dot{x} = -\frac{1}{2} \mathrm{sin}(t) + \frac{1}{2} \mathrm{sin}(2t) and \dot{y} = \frac{1}{2} \mathrm{cos}(t) - \frac{1}{2} \mathrm{cos}(2t), and therefore

x\dot{y} = \frac{1}{4}  \mathrm{cos}^2(t) - \frac{3}{8} \mathrm{cos}(t) \mathrm{cos}(2t) + \frac{1}{8}\mathrm{cos}^2(2t)

y\dot{x} = -\frac{1}{4} \mathrm{sin}^2(t) + \frac{3}{8} \mathrm{sin}(t)\mathrm{sin}(2t) - \frac{1}{8}\mathrm{sin}^2(2t)

x\dot{y} - y\dot{x} = \frac{3}{8} - \frac{3}{8}\mathrm{cos}(t)

Putting this into the formula we find that the area of the main cardioid in the Mandelbrot set is

\frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_{t=0}^{2 \pi} \big(\frac{3}{8} - \frac{3}{8}\mathrm{cos}(t)\big) \mathrm{d}t = \frac{3\pi}{8}
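
This is easy to confirm by direct numerical quadrature of the parametric area formula, for example with the following minimal sketch (Python, assuming numpy and scipy are available):

import numpy as np
from scipy.integrate import quad

# Numerical check of the cardioid area using the parametric area formula directly
x = lambda t: 0.5 * np.cos(t) - 0.25 * np.cos(2 * t)
y = lambda t: 0.5 * np.sin(t) - 0.25 * np.sin(2 * t)
xdot = lambda t: -0.5 * np.sin(t) + 0.5 * np.sin(2 * t)
ydot = lambda t: 0.5 * np.cos(t) - 0.5 * np.cos(2 * t)

area, _ = quad(lambda t: 0.5 * (x(t) * ydot(t) - y(t) * xdot(t)), 0.0, 2 * np.pi)
print(area, 3 * np.pi / 8)   # both approximately 1.178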

The second observation I want to make here is that we can sometimes use the same formula to find the area of a region by integrating along a parametric curve that is not closed. This seems surprising at first because we obtained the formula using Green’s theorem, which explicitly requires the curve C to be closed. As an illustration of this situation, consider the problem of finding the area A in the diagram, namely the area of the region bounded by the arc joining the two points (x_1, y_1) and (x_2, y_2) and the two straight line segments joining the origin to these points. The arc is assumed to have parametric representation x = f(t), y = g(t), such that (x_1, y_1) = (f(t_1), g(t_1)) and (x_2, y_2) = (f(t_2), g(t_2)), with t_2 > t_1. The claim is then that the area A is given by the same formula as before, but applied only along the arc joining (x_1, y_1) and (x_2, y_2) rather than all the way around the enclosed region. Thus, we are claiming that

A = \frac{1}{2} \int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t

To prove this we first note that since x = f(t), we can write t = f^{-1}(x), so y can be written as a function of x as y = g(f^{-1}(x)). The area under the arc joining (x_1, y_1) and (x_2, y_2) is then given by

\int_{x_1}^{x_2} g(f^{-1}(x)) \mathrm{d}x

Changing variables in this integral from x to t we find that t_1 = f^{-1}(x_1), t_2 = f^{-1}(x_2), g(f^{-1}(x)) = g(t) = y and \mathrm{d}x = f^{\prime}(t) \mathrm{d}t = \dot{x} \mathrm{d}t. Thus, we find that the area under the arc is given by

\int_{x_1}^{x_2} g(f^{-1}(x)) \mathrm{d}x = \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t

By simple geometry we can then see that the area A is given by

\frac{1}{2}x_2y_2 - \frac{1}{2}x_1y_1 - \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t

where \frac{1}{2}x_2y_2 is the area under the line joining 0 and (x_2, y_2), and \frac{1}{2}x_1y_1 is the area under the line joining 0 and (x_1, y_1).

Next, we imagine flipping the graph over so that y is now along the horizontal axis and x is along the vertical axis. We can proceed in the same way as before to find the area under the curve from this point of view. In Cartesian form the area under the curve is given by

\int_{y_1}^{y_2} f(g^{-1}(y)) \mathrm{d}y

and upon changing variables from y to t this becomes \int_{t_1}^{t_2} x \dot{y} \mathrm{d}t. But, returning to the original graph, we  see that the sum of the two areas \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t and \int_{t_1}^{t_2} x \dot{y} \mathrm{d}t is the same as the difference x_2y_2 - x_1y_1, where x_2y_2 is the area of the rectangle with vertices at (0, 0), (0, y_2), (x_2, y_2) and (x_2, 0), and x_1y_1 is the area of the smaller rectangle with vertices at (0, 0), (0, y_1), (x_1, y_1) and (x_1, 0). Thus, we can write

x_2y_2 - x_1y_1 = \int_{t_1}^{t_2} (x \dot{y} + y \dot{x}) \mathrm{d}t

\iff

\frac{1}{2}x_2y_2 - \frac{1}{2}x_1y_1 - \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t= \frac{1}{2}\int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t

This proves the result since we saw above that the expression on the left-hand side gives the area A.

As a simple application of this result, suppose the arc in the above scenario is part of a parabola with Cartesian equation y = x^2 and we want to find the area A when (x_1, y_1) = (1, 1) and (x_2, y_2) = (2, 4). The parabola has a parametric representation x = t, y = t^2 and t ranges from 1 to 2 along the arc. We have \dot{x} = 1 and \dot{y} = 2t, so putting these into the formula we find that

A = \frac{1}{2} \int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_1^2 t^2 \mathrm{d}t = \frac{7}{6}
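
As a quick check, the following minimal sketch (Python, assuming scipy is available) evaluates the parametric sector formula for this example numerically and compares it with the elementary decomposition used in the proof (the two triangle areas minus the area under the arc); all three numbers agree with 7/6:

from scipy.integrate import quad

# The parabola example: x(t) = t, y(t) = t^2 for t between 1 and 2
A_formula, _ = quad(lambda t: 0.5 * (t * (2.0 * t) - (t ** 2) * 1.0), 1.0, 2.0)

under_arc, _ = quad(lambda t: t ** 2, 1.0, 2.0)
A_geometric = 0.5 * 2 * 4 - 0.5 * 1 * 1 - under_arc

print(A_formula, A_geometric, 7.0 / 6.0)   # all equal to 1.1666...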