Illustrating the correspondence between 1-forms and vectors using de Broglie waves

I was asked by a student to clarify the issues surrounding an exercise in the famous book Gravitation written by Misner, Thorne and Wheeler (MTW). The exercise appears as follows in Chapter 2, Section 5:

[Image: Exercise 2.1 as it appears in MTW, Chapter 2, Section 5]

The student found a discussion about this problem in Physics Stack Exchange but remained confused as the exercise and the surrounding issues were not dealt with clearly enough in the posts there. I could not find anything else about this online so I tried to clarify the situation a bit more myself. I want to record my thoughts about this in the present note. (The reader should try to solve Exercise 2.1 above for himself/herself at this point, before continuing to read my discussion below).

The key point of this section is that Equation (2.14), the defining equation of 1-forms, can be shown to be physically valid (as well as being just a mathematical definition) using de Broglie waves in quantum mechanics. The notation in MTW is not ideal, so we will replace the notation \langle \mathbf{\tilde{p}}, \mathbf{v} \rangle for a 1-form evaluated at a vector \mathbf{v} by the notation \mathbf{\tilde{p}}(\mathbf{v}). What MTW are then saying is that given any vector \mathbf{p} we can define a corresponding 1-form as

\mathbf{\tilde{p}} = \langle \mathbf{p}, \ \rangle

which is to be viewed as a function waiting for a vector input (to be placed in the empty space on the right-hand side of the angle brackets). When the vector input \mathbf{v} is supplied, the 1-form will then yield the number

\mathbf{\tilde{p}}(\mathbf{v}) = \langle \mathbf{p}, \mathbf{v} \rangle = \mathbf{p} \cdot \mathbf{v}

In Exercise 2.1 we are asked to verify the validity of this equation using the de Broglie wave

\psi = e^{i \phi} = \exp[i(\mathbf{k}\cdot \mathbf{x}- \omega t)]

The phase is the angular argument \phi = \mathbf{k}\cdot \mathbf{x} - \omega t which specifies the position of the wave from some starting point. The phase is parameterised by the wave vector \mathbf{k} which is such that |\mathbf{k}| = 2 \pi/\lambda where \lambda is the wavelength, and by the angular frequency \omega = 2 \pi f where f is the frequency of the relevant oscillator.

It is a well known fact (and it is easy to verify) that given any real-valued function of a vector \phi(\mathbf{x}), the gradient vector \partial \phi/\partial \mathbf{x} is orthogonal to the level surfaces of \phi. In the case of the phase of a de Broglie wave we have

\frac{\partial \phi}{\partial \mathbf{x}} = \mathbf{k}

so the wave vector is the (position) gradient vector of the phase \phi and therefore \mathbf{k} must be orthogonal to loci of constant phase.

In the case of circular waves, for example, these loci of constant phase are circles with centre at the source of the waves and the wave vectors \mathbf{k} point radially outwards at right angles to them, as indicated in the diagram.

To get a diagrammatic understanding of the relationship between 1-forms and vectors, we can imagine focusing on a very small neighbourhood around some point located among these loci of constant phase. On this very small scale, the loci of constant phase will look flat rather than circular, but the wave vectors \mathbf{k} will still be orthogonal to them. What we do is interpret this local pattern of (flat) surfaces of constant phase as the 1-form \mathbf{\tilde{k}}. This 1-form corresponding to the wave vector \mathbf{k} is

\mathbf{\tilde{k}} = \langle \mathbf{k}, \ \rangle

and as before we interpret this as a function waiting for a vector input. When it receives a vector input, say \mathbf{v}, it will output a number computed as the scalar product of \mathbf{k} and \mathbf{v}. Thus we can write

\mathbf{\tilde{k}}(\mathbf{v}) = \langle \mathbf{k}, \mathbf{v} \rangle = \mathbf{k} \cdot \mathbf{v}

As indicated in the diagram, the vector \mathbf{v} which we supply to \mathbf{\tilde{k}} will be at an angle to the wave vector \mathbf{k}. If the vector \mathbf{v} is parallel to the loci of constant phase then \mathbf{\tilde{k}}(\mathbf{v}) = 0 because \mathbf{k} and \mathbf{v} will be orthogonal. In the language of 1-forms, this would be interpreted by saying that the vector \mathbf{v} will not pierce the 1-form \mathbf{\tilde{k}} because it will not cross any of the loci of constant phase. Conversely, if the vector \mathbf{v} is parallel to the wave vector \mathbf{k} (orthogonal to the loci of constant phase), we would say that \mathbf{v} will pierce the 1-form \mathbf{\tilde{k}} as much as possible, because it will cross as many loci of constant phase as it possibly can. Between these extremes we will get intermediate values of the 1-form \mathbf{\tilde{k}}(\mathbf{v}). The key idea, then, is that the set of loci of constant phase in the neighbourhood of a point is the diagrammatic representation of the 1-form \mathbf{\tilde{k}}. When we feed a vector \mathbf{v} into this 1-form we get a measure \mathbf{\tilde{k}}(\mathbf{v}) of how many loci of constant phase the vector pierces. This is the language being used by MTW in the prelude to Exercise 2.1 above.
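
As a quick numerical illustration of this piercing picture, here is a minimal sketch (the wave vector and test vectors are arbitrary choices of mine) showing how \mathbf{\tilde{k}}(\mathbf{v}) falls from its maximum value to zero as \mathbf{v} rotates from being parallel to \mathbf{k} to lying within a surface of constant phase:

```python
import numpy as np

# Arbitrary wave vector in the plane: |k| = 2*pi/lambda with lambda = 1.
k = np.array([2 * np.pi, 0.0])

def k_tilde(v):
    """The 1-form k~ evaluated on the vector v, i.e. the scalar product k . v."""
    return np.dot(k, v)

# Unit vectors at increasing angles to k; at 90 degrees v lies in a surface
# of constant phase and pierces no surfaces at all.
for angle_deg in (0, 30, 60, 90):
    theta = np.radians(angle_deg)
    v = np.array([np.cos(theta), np.sin(theta)])
    print(f"angle {angle_deg:>2} deg:  k~(v) = {k_tilde(v):.4f}")
```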

To actually solve Exercise 2.1, begin by recalling from quantum mechanics that a photon’s momentum \mathbf{p} is such that |\mathbf{p}| = E/c where E = hf is the photon’s energy and f is its frequency. Since \lambda f = c where \lambda is the photon’s wavelength, we have E = hc/\lambda so the magnitude of the photon’s momentum is

|\mathbf{p}| = \frac{E}{c} = \frac{h}{\lambda} = \hbar \frac{2\pi}{\lambda} = \hbar |\mathbf{k}|

and in fact

\mathbf{p} = \hbar \mathbf{k}

Note that therefore

\lambda = \frac{h}{|\mathbf{p}|}

Famously, de Broglie’s idea in his 1924 PhD thesis was that this wavelength formula applies not just to photons but also to massive particles such as electrons, for which the momentum \mathbf{p} would be calculated as

\mathbf{p} = m \mathbf{u}

where m is the mass of the particle and \mathbf{u} is its four-velocity in Minkowski spacetime. Note that this four-velocity is such that \mathbf{u}\cdot\mathbf{u} = -1 (easily demonstrated using the (-+++) signature metric of Minkowski spacetime).

Thus we have

\mathbf{p} = m \mathbf{u} = \hbar \mathbf{k}

so

\mathbf{u} = \frac{\hbar}{m} \mathbf{k}

In the prelude to Exercise 2.1, MTW say

relabel the surfaces of \mathbf{\tilde{k}} by \hbar \times phase, thereby obtaining the momentum 1-form \mathbf{\tilde{p}}. Pierce this 1-form with any vector \mathbf{v}, and find the result that \mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v}).

Following the authors’ instructions, we relabel the surfaces of \mathbf{\tilde{k}} (i.e., the loci of constant phase) by multiplying by \hbar to get the 1-form

\mathbf{\tilde{p}} = \hbar \mathbf{\tilde{k}} = \hbar \langle \mathbf{k}, \ \rangle

As usual, this 1-form is a linear function waiting for a vector input. Supplying the input \mathbf{v} we then get

\mathbf{\tilde{p}}(\mathbf{v}) = \hbar \langle \mathbf{k}, \mathbf{v} \rangle = \hbar \mathbf{k} \cdot \mathbf{v}

But this is exactly what we get when we work out \mathbf{p} \cdot \mathbf{v} since

\mathbf{p} \cdot \mathbf{v} = m \mathbf{u} \cdot \mathbf{v} = m \frac{\hbar}{m} \mathbf{k} \cdot \mathbf{v} = \hbar \mathbf{k} \cdot \mathbf{v}

Thus, we have solved Exercise 2.1 by showing that \mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v}) is in accord with the quantum mechanical properties of de Broglie waves, as claimed by MTW.
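
The same check can be done numerically. The following minimal sketch (in units with \hbar = c = 1, and with a boost speed and test vector invented by me purely for illustration) confirms that \mathbf{u}\cdot\mathbf{u} = -1 in the (-+++) metric and that \mathbf{p}\cdot\mathbf{v} = \hbar \, \mathbf{k}\cdot\mathbf{v} for an arbitrary vector \mathbf{v}:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])    # Minkowski metric with signature (-+++)
hbar, m = 1.0, 1.0                       # natural units, unit mass

def dot(a, b):
    """Minkowski scalar product a . b."""
    return a @ eta @ b

# Four-velocity of a particle moving at speed 0.6 along the x-axis.
beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
u = gamma * np.array([1.0, beta, 0.0, 0.0])
print("u.u =", dot(u, u))                # -1, as required

k = (m / hbar) * u                       # wave 4-vector, from u = (hbar/m) k
p = m * u                                # 4-momentum p = m u

v = np.array([2.0, -0.3, 1.7, 0.5])      # an arbitrary test 4-vector
print("p.v        =", dot(p, v))
print("hbar (k.v) =", hbar * dot(k, v))  # same number: p~(v) = p.v
```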

Invariance under rotations in space and conservation of angular momentum

In a previous note I studied in detail the mathematical setup of Noether’s Theorem and its proof. I briefly illustrated the mathematical machinery by considering invariance under translations in time, giving the law of conservation of energy, and invariance under translations in space, giving the law of conservation of linear momentum. I briefly mentioned that invariance under rotations in space would also yield the law of conservation of angular momentum but I  did not work this out explicitly. I want to quickly do this in the present note.

We imagine a particle of unit mass moving freely in the absence of any potential field, and tracing out a path \gamma(t) in the (x, y)-plane of a three-dimensional Euclidean coordinate system between times t_1 and t_2, with the z-coordinate everywhere zero along this path. The angular momentum of the particle at time t with respect to the origin of the coordinate system is given by

\mathbf{L} = \mathbf{r} \times \mathbf{v}

= (\mathbf{i} x + \mathbf{j} y) \times (\mathbf{i} \dot{x} + \mathbf{j} \dot{y})

= \mathbf{k} x \dot{y} - \mathbf{k} y \dot{x}

= \mathbf{k} (x \dot{y} - y \dot{x})

where \times is the vector product operation. Alternatively, we could have obtained this as

\mathbf{L} = \mathbf{r} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x & y & 0 \\ \dot{x} & \dot{y} & 0 \end{vmatrix}

= \mathbf{k} (x \dot{y} - y \dot{x})

In terms of Lagrangian mechanics, the path \gamma(t) followed by the particle will be a stationary path of the action functional

S[\gamma(t)] = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

(in the absence of a potential field the total energy consists only of kinetic energy).

Now imagine that the entire path \gamma(t) is rotated bodily anticlockwise in the (x, y)-plane through an angle \theta. This corresponds to a one-parameter transformation

\overline{t} \equiv \Phi(t, x, y, \dot{x}, \dot{y}; \theta) = t

\overline{x} \equiv \Psi_1(t, x, y, \dot{x}, \dot{y}; \theta) = x \cos \theta - y \sin \theta

\overline{y} \equiv \Psi_2(t, x, y, \dot{x}, \dot{y}; \theta) = x \sin \theta + y \cos \theta

which reduces to the identity when \theta = 0. We have

d\overline{t} = dt

\dot{\overline{x}}^2 = \dot{x}^2 \cos^2 \theta + \dot{y}^2 \sin^2 \theta - 2 \dot{x} \dot{y} \sin \theta \cos \theta

\dot{\overline{y}}^2 = \dot{x}^2 \sin^2 \theta + \dot{y}^2 \cos^2 \theta + 2 \dot{x} \dot{y} \sin \theta \cos \theta

and therefore

\dot{x}^2 + \dot{y}^2 = \dot{\overline{x}}^2 + \dot{\overline{y}}^2

so the action functional is invariant under this rotation since

S[\overline{\gamma}(t)] = \int_{t_1}^{t_2} d\overline{t} \frac{1}{2}(\dot{\overline{x}}^2 + \dot{\overline{y}}^2) = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2) = S[\gamma(t)]
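
This invariance is easy to confirm symbolically; here is a minimal sympy sketch (the symbol names are my own):

```python
import sympy as sp

xdot, ydot, theta = sp.symbols('xdot ydot theta', real=True)

# Velocity components after rotating the whole path through the angle theta.
xbar_dot = xdot * sp.cos(theta) - ydot * sp.sin(theta)
ybar_dot = xdot * sp.sin(theta) + ydot * sp.cos(theta)

# The difference of the kinetic energies simplifies to zero, confirming that
# (1/2)(xdot^2 + ydot^2) is invariant under the rotation.
print(sp.simplify(xbar_dot**2 + ybar_dot**2 - (xdot**2 + ydot**2)))   # 0
```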

Therefore Noether’s theorem applies. Let

F(t, x, y, \dot{x}, \dot{y}) = \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

Then Noether’s theorem in this case says

\frac{\partial F}{\partial \dot{x}} \psi_1 + \frac{\partial F}{\partial \dot{y}} \psi_2 + \big(F - \frac{\partial F}{\partial \dot{x}} \dot{x} - \frac{\partial F}{\partial \dot{y}} \dot{y}\big) \phi = const.

where

\phi \equiv \frac{\partial \Phi}{\partial \theta} \big|_{\theta = 0} = 0

\psi_1 \equiv \frac{\partial \Psi_1}{\partial \theta} \big|_{\theta = 0} = -y

\psi_2 \equiv \frac{\partial \Psi_2}{\partial \theta} \big|_{\theta = 0} = x

We have

\frac{\partial F}{\partial \dot{x}} = \dot{x}

\frac{\partial F}{\partial \dot{y}} = \dot{y}

Therefore Noether’s theorem gives us (remembering \phi = 0)

-\dot{x} y + \dot{y} x = const.

The expression on the left-hand side of this equation is the angular momentum of the particle (cf. the brief discussion of angular momentum at the start of this note), so this result is precisely the statement that the angular momentum is conserved. Noether’s theorem shows us that this is a direct consequence of the invariance of the action functional of the particle under rotations in space.
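
As a simple numerical sanity check (with made-up initial conditions for a free particle moving on a straight line), the quantity x\dot{y} - y\dot{x} indeed stays constant along the trajectory:

```python
import numpy as np

# Free particle of unit mass moving in a straight line with constant velocity.
x0, y0 = 1.0, -2.0
vx, vy = 0.5, 1.5

t = np.linspace(0.0, 10.0, 6)
x, y = x0 + vx * t, y0 + vy * t
xdot, ydot = np.full_like(t, vx), np.full_like(t, vy)

L = x * ydot - y * xdot          # angular momentum about the origin
print(L)                         # the same value (2.5) at every sampled time
```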

A mathematical formulation of Feynman’s ‘mirage on a hot road’

In his famous Feynman Lectures on Physics, Richard Feynman provided an intuitive explanation of how a ‘mirage on a hot road’ can arise due to the bending of light rays from the sky in accordance with Fermat’s Principle (see The Feynman Lectures on Physics, Volume I, Chapter 26). Feynman wrote the following:

[Excerpt and Fig. 26-8 from The Feynman Lectures on Physics, Volume I, Chapter 26]

I was discussing this with a beginning engineering student who did not quite understand why the mirage makes it look as if the water is actually on the road. I explained this by augmenting Feynman’s Fig. 26-8 above as follows:

The bent light ray starting at point A and entering the observer’s eye at point B is interpreted by the observer as having followed a straight line path emanating from the road, as indicated in the diagram. Thus, the observer sees the image of the sky on the road surface and interprets it as a shimmering pool of water.

Having done this, the question then arose as to how one could go about constructing an explicit mathematical model of the above scenario, yielding a suitable equation for the curved light ray from A to B, a linear equation for the apparent straight line path seen by the observer, and explicit coordinates for the point on the road where the image of the sky is seen by the observer. This turned out to be an interesting exercise involving Fermat’s Principle and the Calculus of Variations and is what I want to record here.

Suppose the light ray begins at point A = (a, b) at time t_1, and enters the observer’s eye at point B = (-a, b) at time t_2. Fermat’s Principle (see, e.g., this Wikipedia article) says that the path followed by the light ray is such as to make the optical length functional

S[y] = \int_A^B n ds

stationary, where n = c/v is the refractive index of the medium through which the light passes, c is the speed of light in a vacuum and v = ds/dt is the speed of light in the medium. This functional can be derived (up to a multiplicative constant) from the ‘Principle of Least Time’ by noting that the time taken by the light ray is

T = \int_{t_1}^{t_2} dt = \int_{t_1}^{t_2} \frac{1}{c} \frac{c}{v} \frac{ds}{dt} dt = \int_A^B \frac{n}{c} ds = \frac{1}{c} S

The light ray will find the path that minimises this time of travel.

To apply this setup to the mirage in Feynman’s lecture we need to model the refractive index as a function of the y-coordinate in my amended diagram above, which measures the height above the road. As Feynman says, light goes faster in the hot region near the road than in the cooler region higher up. Thus, since the refractive index is inversely proportional to v, it should be an increasing function of the height above the road y. To get a toy model for the scenario in Feynman’s lecture let us make the simplest possible assumption that the refractive index is a simple linear function of y, namely

n(y) = \alpha + \beta y

with \alpha and \beta both positive. Then since the arc-length element is

ds = dx \sqrt{1 + y^{\prime \ 2}}

we can write the optical length functional as

S[y] = \int_A^B n ds = \int_{a}^{-a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = -\int_{-a}^{a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

We find the stationary path for this functional using the Calculus of Variations. Let

F(x, y, y^{\prime}) = (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

Since this does not depend directly on x, the problem admits a first-integral of the form

y^{\prime} \frac{\partial F}{\partial y^{\prime}} - F = C

where C is a constant. We have

\frac{\partial F}{\partial y^{\prime}} = \frac{(\alpha + \beta y)y^{\prime}}{\sqrt{1 + y^{\prime \ 2}}}

Therefore the first-integral for this problem is

\frac{(\alpha + \beta y)y^{\prime \ 2}}{\sqrt{1 + y^{\prime \ 2}}} - (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = C

Multiplying through by \sqrt{1 + y^{\prime \ 2}}/ \alpha, absorbing \alpha into the constant term, and writing \delta \equiv \beta/\alpha we get

(1 + \delta y) y^{\prime \ 2} - (1 + \delta y)(1 + y^{\prime \ 2}) = C\sqrt{1 + y^{\prime \ 2}}

\iff

-(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}

Since 1 + \delta y > 0, the constant C must be negative here; replacing C by -C so that we can work with a positive constant, this becomes

(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}

\implies

y^{\prime} = \frac{\pm \sqrt{(1+\delta y)^2 - C^2}}{C}

This is a first-order differential equation for y which can be solved by separation of variables. We get the integral equation

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \pm \int \frac{dx}{C}

To solve the integral on the left-hand side, make the change of variable

(1 + \delta y) = C \sec \theta

\implies

\delta \, dy = C \sec \theta \, \tan \theta \, d\theta

Then

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \int \frac{C \sec \theta \tan \theta \, d\theta}{\delta \sqrt{C^2 \sec^2 \theta - C^2}}

= \frac{1}{\delta}\int \tan \theta \, d\theta

= \frac{1}{\delta} \ln[\sec \theta] + const.

= \frac{1}{\delta} \ln \big[\frac{(1 + \delta y)}{C}\big] + const.

For the integral on the right-hand side of the integral equation we get

\pm \int \frac{dx}{C} = \pm \frac{x}{C} + const.

Therefore the integral equation reduces to

\frac{1}{\delta} \ln \big[\frac{(1 + \delta y)}{C}\big] = \pm \frac{x}{C} + const.

\implies

y = \frac{C\exp\big(\pm\frac{\delta x}{C} + const.\big) - 1}{\delta}

This seems to represent two possible solutions for the first-integral equation, which we may write as

y_1 = \frac{C\exp\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}

y_2 = \frac{C\exp\big(- \big[ \frac{\delta x}{C} + const. \big] \big) - 1}{\delta}

However, for the curved light ray in my amended diagram above we must have y \rightarrow \infty as x \rightarrow \pm \infty. This condition is not satisfied by either of y_1 or y_2 on their own, but it is satisfied by their sum. We will therefore take the solution of the first integral equation to be

y = \frac{y_1 + y_2}{2}

= \frac{C}{\delta}\bigg[\frac{\exp\big(\frac{\delta x}{C} + const.\big) + \exp\big(- \big[ \frac{\delta x}{C} + const. \big] \big)}{2}\bigg] - \frac{1}{\delta}

= \frac{C \cosh\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}

Furthermore, we have y(a) = y(-a) = b and therefore we require

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(-\frac{\delta a}{C} + const. \big)

But

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \, \cosh(const.) + \sinh\big(\frac{\delta a}{C}\big) \, \sinh(const.)

and

\cosh\big(-\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \, \cosh(const.) - \sinh\big(\frac{\delta a}{C}\big) \, \sinh(const.)

These cannot be equal unless \sinh(const.) = 0 \implies const. = 0. Thus, our solution for y reduces to

y = \frac{C \cosh\big(\frac{\delta x}{C}\big) - 1}{\delta}

with the constant C determined in terms of a and b by

b = \frac{C \cosh\big(\frac{\delta a}{C}\big) - 1}{\delta}

This is the equation of the curved path of the light ray from the sky in Feynman’s diagram. The slope of y at point B = (-a, b) is

y^{\prime}(-a) = -\sinh\big(\frac{\delta a}{C}\big)

The straight line with this gradient passing through the point B has equation

y = \big(b - a\sinh\big(\frac{\delta a}{C}\big)\big) - \sinh\big(\frac{\delta a}{C}\big)x

This is the equation of the straight line emanating from the x-axis to the observer’s eye in my amended diagram above. On the x-axis we have y = 0 in the straight-line equation so

x = \frac{b}{\sinh\big(\frac{\delta a}{C}\big)} - a

This is the point on the x-axis at which the observer in my amended diagram will see the mirage.
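
To see concrete numbers come out of this, here is a minimal numerical sketch (all of the input values, the ray half-span a, eye height b and gradient ratio \delta = \beta/\alpha, are invented by me purely for illustration) that solves b = (C\cosh(\delta a/C) - 1)/\delta for C and then locates the mirage point on the road:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative (made-up) values: ray endpoints at x = +/- a, eye/sky-point height b,
# refractive-index gradient ratio delta = beta/alpha (per metre).
a, b, delta = 50.0, 1.5, 2.0e-4

# Solve b = (C*cosh(delta*a/C) - 1)/delta for the constant C (the shallow-ray root).
f = lambda C: (C * np.cosh(delta * a / C) - 1.0) / delta - b
C = brentq(f, 1.0, 1.001)

slope = np.sinh(delta * a / C)        # magnitude of y' at the observer's eye
x_mirage = b / slope - a              # where the apparent straight line meets the road

print(f"C = {C:.6f}")
print(f"mirage seen on the road at x = {x_mirage:.1f} (observer stands at x = {-a})")
```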

Notes on Sturm-Liouville theory

Sturm-Liouville theory was developed in the 19th century in the context of solving differential equations. When one studies it in depth, however, one experiences a sudden realisation that this is the mathematics underlying a lot of quantum mechanics. In quantum mechanics we envisage a quantum state (a time-dependent function) expressed as a superposition of eigenfunctions of a self-adjoint operator (usually referred to as a Hermitian operator) representing an observable. The coefficients of the eigenfunctions in this superposition are probability amplitudes. A measurement of the observable quantity represented by the Hermitian operator produces one of the eigenvalues of the operator with a probability equal to the square of the probability amplitude attached to the eigenfunction corresponding to that eigenvalue in the superposition. It is the fact that the operator is self-adjoint that ensures the eigenvalues are real (and thus observable), and furthermore, that the eigenfunctions corresponding to the eigenvalues form a complete and orthogonal set of functions enabling quantum states to be represented as a superposition in the first place (i.e., an eigenfunction expansion akin to a Fourier series). The Sturm-Liouville theory of the 19th century has essentially this same structure and in fact Sturm-Liouville eigenvalue problems are important more generally in mathematical physics precisely because they frequently arise in attempting to solve commonly-encountered partial differential equations (e.g., Poisson’s equation, the diffusion equation, the wave equation, etc.), particularly when the method of separation of variables is employed.

I want to get an overview of Sturm-Liouville theory in the present note and will begin by considering a nice discussion of a vibrating string problem in Courant & Hilbert’s classic text, Methods of Mathematical Physics (Volume I). Although the problem is simple and the treatment in Courant & Hilbert a bit terse, it (remarkably) brings up a lot of the key features of Sturm-Liouville theory which apply more generally in a wide variety of physics problems. I will then consider Sturm-Liouville theory in a more general setting emphasising the role of the Sturm-Liouville differential operator, and finally I will illustrate further the occurrence of Sturm-Liouville systems in physics by looking at the eigenvalue problems encountered when solving Schrödinger’s equation for the hydrogen atom.

On page 287 of Volume I, Courant & Hilbert write the following:

Equation (12) here is the one-dimensional wave equation

\frac{\partial^2 u}{\partial x^2} = \mu^2 \frac{\partial^2 u}{\partial t^2}

which (as usual) the authors are going to solve by using a separation of variables of the form

u(x, t) = v(x) g(t)

As Courant & Hilbert explain, the problem then involves finding the function v(x) by solving the second-order homogeneous linear differential equation

\frac{\partial^2 v}{\partial x^2} + \lambda v = 0

subject to the boundary conditions

v(0) = v(\pi) = 0

Although not explicitly mentioned by Courant & Hilbert at this stage, equations (13) and (13a) in fact constitute a full blown Sturm-Liouville eigenvalue problem. Despite being very simple, this setup captures many of the typical features encountered in a wide variety of such problems in physics. It is instructive to explore the text underneath equation (13a):

Not all these requirements can be fulfilled for arbitrary values of the constant \lambda.

… the boundary conditions can be fulfilled if and only if \lambda = n^2 is the square of an integer n.

To clarify this, we can try to solve (13) and (13a) for the three possible cases: \lambda < 0, \lambda = 0 and \lambda > 0. Suppose first that \lambda < 0. Then -\lambda > 0 and the auxiliary equation for (13) is

D^2 = - \lambda

\implies

D = \pm \sqrt{- \lambda}

Thus, we can write the general solution of (13) in this case as

v = \alpha e^{\sqrt{-\lambda} x} + \beta e^{-\sqrt{-\lambda} x} = A \mathrm{cosh} \big(\sqrt{-\lambda} x\big) + B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)

where A and B are constants to be determined from the boundary conditions. From the boundary condition v(0) = 0 we conclude that A = 0 so the equation reduces to

v = B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)

But from the boundary condition v(\pi) = 0 we are forced to conclude that B = 0 since \mathrm{sinh} \big(\sqrt{-\lambda} \pi\big) \neq 0. Therefore there is only the trivial solution v(x) = 0 in the case \lambda < 0.

Next, suppose that \lambda = 0. Then equation (13) reduces to

\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} = 0

\implies

v = A + Bx

From the boundary condition v(0) = 0 we must conclude that A = 0, and the boundary condition v(\pi) = 0 means we are also forced to conclude that B = 0. Thus, again, there is only the trivial solution v(x) = 0 in the case \lambda = 0.

We see that nontrivial solutions can only be obtained when \lambda > 0. In this case we have -\lambda < 0 and the auxiliary equation is

D^2 = - \lambda

\implies

D = \pm i \sqrt{\lambda}

Thus, we can write the general solution of (13) in this case as

v = \alpha e^{i \sqrt{\lambda} x} + \beta e^{- i \sqrt{\lambda} x} = A \mathrm{cos} \big(\sqrt{\lambda} x\big) + B \mathrm{sin} \big(\sqrt{\lambda} x\big)

where A and B are again to be determined from the boundary conditions. From the boundary condition v(0) = 0 we conclude that A = 0 so the equation reduces to

v = B \mathrm{sin} \big(\sqrt{\lambda} x\big)

But from the boundary condition v(\pi) = 0 we must conclude that, if B \neq 0, then we must have \sqrt{\lambda} = n where n = 1, 2, 3, \ldots. Thus, we find that for each n = 1, 2, 3, \ldots, the eigenvalues of this Sturm-Liouville problem are \lambda_n = n^2, and the corresponding eigenfunctions are v = B \mathrm{sin}\big(n x\big). The coefficient B is undetermined and must be specified through some normalisation process, for example by setting the integral of v^2 between 0 and \pi equal to 1 and then finding the value of B that is consistent with this. In Courant & Hilbert they have (implicitly) simply set B = 1.

Some features of this solution are typical of Sturm-Liouville eigenvalue problems in physics more generally. For example, the eigenvalues are real (rather than complex) numbers, there is a minimum eigenvalue (\lambda_1 = 1) but not a maximum one, and for each eigenvalue there is a unique eigenfunction (up to a multiplicative constant). Also, importantly, the eigenfunctions here form a complete and orthogonal set of functions. Orthogonality refers to the fact that the integral of a product of any two distinct eigenfunctions over the interval (0, \pi) is zero, i.e.,

\int_0^{\pi} \mathrm{sin}(nx) \mathrm{sin}(mx) \mathrm{d} x = 0

for n \neq m, as can easily be demonstrated in the same way as in the theory of Fourier series. Completeness refers to the fact that over the interval (0, \pi) the infinite set of functions \mathrm{sin} (nx), n = 1, 2, 3, \ldots, can be used to represent any sufficiently well behaved function f(x) using a Fourier series of the form

f(x) = \sum_{n=1}^{\infty} a_n \mathrm{sin} (nx)
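
Both properties are easy to check numerically. A minimal sketch (the test function x(\pi - x) is my own arbitrary choice) that verifies one orthogonality integral and reconstructs the test function from a truncated sine series:

```python
import numpy as np

x = np.linspace(0.0, np.pi, 2001)
dx = x[1] - x[0]
integrate = lambda g: np.sum(g) * dx                      # simple quadrature on (0, pi)

# Orthogonality: the integral of sin(nx) sin(mx) over (0, pi) vanishes for n != m.
print(integrate(np.sin(2 * x) * np.sin(5 * x)))            # ~ 0

# Completeness: expand the test function f(x) = x(pi - x) in the eigenfunctions sin(nx).
f = x * (np.pi - x)
coeffs = [integrate(f * np.sin(n * x)) / integrate(np.sin(n * x)**2) for n in range(1, 51)]
f_series = sum(a_n * np.sin(n * x) for n, a_n in enumerate(coeffs, start=1))
print(np.max(np.abs(f - f_series)))                        # small truncation error
```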

All of this is alluded to (without explicit explanation at this stage) in the subsequent part of this section of Courant & Hilbert’s text, where they go on to provide the general solution of the vibrating string problem. They write the following:

The properties of completeness and orthogonality of the eigenfunctions are again a typical feature of the solutions of Sturm-Liouville eigenvalue problems more generally, and this is one of the main reasons why Sturm-Liouville theory is so important to the solution of physical problems involving differential equations. To get a better understanding of this, I will now develop Sturm-Liouville theory in a more general setting by starting with a standard second-order homogeneous linear differential equation of the form

\alpha(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \beta(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \gamma(x) y = 0

where the variable x is confined to an interval a \leq x \leq b.

Let

p(x) = \mathrm{exp} \bigg(\int \mathrm{d} x \frac{\beta(x)}{\alpha(x)}\bigg)

q(x) = \frac{\gamma(x)}{\alpha(x)} p(x)

Dividing the differential equation by \alpha(x) and multiplying through by p(x) we get

p(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \frac{\beta(x)}{\alpha(x)} p(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \frac{\gamma(x)}{\alpha(x)} p(x) y = 0

\iff

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} y}{\mathrm{d} x} \bigg) + q(x) y = 0

\iff

L y = 0

where

L = \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} }{\mathrm{d} x} \bigg) + q(x)

is called the Sturm-Liouville differential operator. Thus, we see already that a wide variety of second-order differential equations encountered in physics will be able to be put into a form involving the operator L, so results concerning the properties of L will have wide applicability.
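
As a concrete illustration (the example is my own choice, not one used in the text), the Hermite equation y^{\prime\prime} - 2xy^{\prime} + 2ny = 0 has \alpha = 1, \beta = -2x and \gamma = 2n, and the recipe above brings it into Sturm-Liouville form with p(x) = e^{-x^2}. A minimal sympy sketch verifying that the Sturm-Liouville form is just p(x) times the original equation:

```python
import sympy as sp

x, n = sp.symbols('x n')
y = sp.Function('y')

alpha, beta, gamma = 1, -2 * x, 2 * n            # Hermite equation y'' - 2x y' + 2n y = 0

p = sp.exp(sp.integrate(beta / alpha, x))        # p(x) = exp(-x^2)
q = (gamma / alpha) * p                          # q(x) = 2n exp(-x^2)

sl_form = sp.diff(p * sp.diff(y(x), x), x) + q * y(x)
original = alpha * sp.diff(y(x), x, 2) + beta * sp.diff(y(x), x) + gamma * y(x)
print(sp.simplify(sl_form - p * original))       # 0
```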

Using the Sturm-Liouville operator we can now write the defining differential equation of Sturm-Liouville theory in an eigenvalue-eigenfunction format that is very reminiscent of the setup in quantum mechanics outlined at the start of this note. The defining differential equation is

L \phi = - \lambda w \phi

where w(x) is a real-valued positive weight function and \lambda is an eigenvalue corresponding to the eigenfunction \phi. This differential equation is often written out in full as

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

with x \in [a, b]. In Sturm-Liouville problems, the functions p(x), q(x) and w(x) are specified at the start and, crucially, the function \phi is required to satisfy particular boundary conditions at a and b. The boundary conditions are a key aspect of each Sturm-Liouville problem; for a given form of the differential equation, different boundary conditions can produce very different problems. Solving a Sturm-Liouville problem involves finding the values of \lambda for which there exist non-trivial solutions of the defining differential equation above subject to the specified boundary conditions. The vibrating string problem in Courant & Hilbert (discussed above) is a simple example. We obtain the differential equation (13) in that problem by setting p(x) = 1, q(x) = 0 and w(x) = 1 in the defining Sturm-Liouville differential equation.

We would now like to prove that the eigenvalues in a Sturm-Liouville problem will always be real and that the eigenfunctions will form an orthogonal set of functions, as claimed earlier. To do this, we need to consider a few more developments. In Sturm-Liouville theory we can apply L to both real and complex functions, and a key role is played by the concept of the inner product of such functions. Using the notation f(x)^{*} to denote the complex conjugate of the function f(x), we define the inner product of two functions f and g over the interval a \leq x \leq b as

(f, g) = \int_a^b \mathrm{d} x f(x)^{*} g(x)

and we define the weighted inner product as

(f, g)_w = \int_a^b \mathrm{d} x  w(x) f(x)^{*} g(x)

where w(x) is the real-valued positive weight function mentioned earlier. A key result in the theory is Lagrange’s identity, which says that for any two complex-valued functions of a real variable u(x) and v(x), we have

v(Lu)^{*} - u^{*} Lv = \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

This follows from the form of L, since

v(Lu)^{*} - u^{*} Lv = v\bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + q(x) u^{*}\bigg] - u^{*} \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) + q(x) v\bigg]

= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + \frac{\mathrm{d} v}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) - \frac{\mathrm{d} u^{*}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

(here we have added and subtracted the equal quantity p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d}x}\frac{\mathrm{d}v}{\mathrm{d}x}, which leaves the expression unchanged but allows each pair of terms to be combined by the product rule)

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) v \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) u^{*} \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]
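
This identity can also be verified symbolically. A minimal sympy sketch (with p and q treated as real, and with an independent function us standing in for u^{*} so that the conjugation is purely a relabelling):

```python
import sympy as sp

x = sp.symbols('x')
p, q, us, v = (sp.Function(name)(x) for name in ('p', 'q', 'us', 'v'))   # us plays the role of u*

# The Sturm-Liouville operator L f = (p f')' + q f.
L = lambda f: sp.diff(p * sp.diff(f, x), x) + q * f

lhs = v * L(us) - us * L(v)                 # v (Lu)* - u* (Lv), with p and q real
rhs = sp.diff(p * (v * sp.diff(us, x) - us * sp.diff(v, x)), x)

print(sp.simplify(lhs - rhs))               # 0
```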

Using the inner product notation, we can write Lagrange’s identity in an alternative form that reveals the crucial role played by the boundary conditions in a Sturm-Liouville problem. We have

(Lu, v) - (u, Lv) = \int_a^b (Lu)^{*} v \mathrm{d} x - \int_a^b u^{*} Lv \mathrm{d} x

= \int_a^b \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg] \mathrm{d} x

= \int_a^b \mathrm{d} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

= \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b

For some boundary conditions the final term here is zero and then we will have

(Lu, v) = (u, Lv)

When this happens, the operator in conjunction with the boundary conditions is said to be self-adjoint. As an example, a so-called regular Sturm-Liouville problem involves solving the differential equation

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

subject to what are called separated boundary conditions, taking the form

A_1 \phi(a) + A_2 \phi^{\prime}(a) = 0

and

B_1 \phi(b) + B_2 \phi^{\prime}(b) = 0

In this case, the operator L is self-adjoint. To see this, suppose the functions u and v satisfy these boundary conditions. Then at a we have

A_1 u(a)^{*} + A_2 u^{\prime}(a)^{*} = 0

and

A_1 v(a) + A_2 v^{\prime}(a) = 0

from which we can deduce that

\frac{u^{\prime}(a)^{*}}{u(a)^{*}} = -\frac{A_1}{A_2} = \frac{v^{\prime}(a)}{v(a)}

\implies

v(a) u^{\prime}(a)^{*} = u(a)^{*} v^{\prime}(a)

Similarly, at the boundary point b we find that

v(b) u^{\prime}(b)^{*} = u(b)^{*} v^{\prime}(b)

These results then imply

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0

so the operator L is self-adjoint as claimed. As another example, a singular Sturm-Liouville problem involves solving the same differential equation as in the regular problem, but subject to the boundary condition that p(x) is zero at either a or b or both, while being positive for a < x < b. If p(x) does not vanish at one of the boundary points, then \phi is required to satisfy the same boundary condition at that point as in the regular problem. Clearly we will have

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0

in this case too, so the operator L will also be self-adjoint in the case of a singular Sturm-Liouville problem. As a final example, suppose the Sturm-Liouville problem involves solving the same differential equation as before, but with periodic boundary conditions of the form

\phi(a) = \phi(b)

\phi^{\prime}(a) = \phi^{\prime}(b)

and

p(a) = p(b)

Then if u and v are two functions satisfying these boundary conditions we will have

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b

= p(b) \bigg(v(b) u^{\prime}(b)^{*} - u(b)^{*} v^{\prime}(b)\bigg) - p(a) \bigg(v(a) u^{\prime}(a)^{*} - u(a)^{*} v^{\prime}(a)\bigg)

= p(a) \bigg[\bigg(v(b) u^{\prime}(b)^{*} - v(a) u^{\prime}(a)^{*}\bigg) + \bigg(u(a)^{*} v^{\prime}(a) - u(b)^{*} v^{\prime}(b)\bigg)\bigg] = 0

So again, the operator L will be self-adjoint in the case of periodic boundary conditions. We will see later that the singular and periodic cases arise when attempting to solve Schrödinger’s equation for the hydrogen atom.

The key reason for focusing so much on the self-adjoint property of the operator L is that the eigenvalues of a self-adjoint operator are always real, and the eigenfunctions are orthogonal. Note that by orthogonality of the eigenfunctions in the more general context we mean that

(\phi_n, \phi_m)_w = \int_a^b \mathrm{d} x w(x) \phi_n(x)^{*} \phi_m(x) = 0

whenever \phi_n(x) and \phi_m(x) are eigenfunctions corresponding to two distinct eigenvalues.

To prove that the eigenvalues are always real, suppose that \phi(x) is an eigenfunction corresponding to an eigenvalue \lambda. Then we have

L \phi = - \lambda w \phi

and so

(L \phi, \phi) = (- \lambda w \phi, \phi) = \int_a^b (- \lambda w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*} \int_a^b (w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*}\int_a^b \mathrm{d}x w(x)|\phi(x)|^2

But we also have

(\phi, L \phi) = (\phi, - \lambda w \phi) = \int_a^b \phi^{*}(- \lambda w \phi) \mathrm{d} x = -\lambda \int_a^b \phi^{*} (w \phi) \mathrm{d} x = -\lambda\int_a^b \mathrm{d}x w(x)|\phi(x)|^2

Therefore if the operator is self-adjoint we can write

(L \phi, \phi) - (\phi, L \phi) = (\lambda - \lambda^{*}) \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 = 0

\implies

\lambda = \lambda^{*}

since \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 > 0, so the eigenvalues must be real. In particular, this must be the case for regular and singular Sturm-Liouville problems, and for Sturm-Liouville problems involving periodic boundary conditions.

To prove that the eigenfunctions are orthogonal, let \phi(x) and \psi(x) denote two eigenfunctions corresponding to distinct eigenvalues \lambda and \mu respectively. Then we have

L \phi = - \lambda w \phi

L \psi = - \mu w \psi

and so by the self-adjoint property we can write

(L \phi, \psi) - (\phi, L \psi) = \int_a^b (- \lambda w \phi)^{*} \psi \mathrm{d} x - \int_a^b \phi^{*} (- \mu w \psi) \mathrm{d} x

= (\mu - \lambda) \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0

Since the eigenvalues are distinct, the only way this can happen is if

(\phi, \psi)_w = \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0

so the eigenfunctions must be orthogonal as claimed.

In addition to being orthogonal, the eigenfunctions \phi_n(x), n = 1, 2, 3, \dots, of a Sturm-Liouville problem with specified boundary conditions also form a complete set of functions (I will not prove this here), which means that any sufficiently well-behaved function f(x) for which \int_a^b\mathrm{d} x |f(x)|^2 exists can be represented by a Fourier series of the form

f(x) = \sum_{n=1}^{\infty} a_n \phi_n(x)

for x \in [a, b], where the coefficients a_n are given by the formula

a_n = \frac{(\phi_n, f)_w}{(\phi_n, \phi_n)_w} = \frac{\int_a^b \mathrm{d}x w(x) \phi_n(x)^{*} f(x)}{\int_a^b \mathrm{d}x w(x) |\phi_n(x)|^2}

It is the completeness and orthogonality of the eigenfunctions that makes Sturm-Liouville theory so useful in solving linear differential equations, because (for example) it means that the solutions of many second-order inhomogeneous linear differential equations of the form

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)

with suitable boundary conditions can be expressed as a linear combination of the eigenfunctions of the corresponding Sturm-Liouville problem

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

with the same boundary conditions. To illustrate this, suppose this Sturm-Liouville problem with boundary conditions \phi(a) = \phi(b) = 0 has an infinite set of eigenvalues \lambda_k and corresponding eigenfunctions \phi_k(x), k = 1, 2, 3, \dots, which are orthogonal and form a complete set. We will assume that the solution of the inhomogeneous differential equation above is an infinite series of the form

\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)

where the coefficients a_k are constants, and we will find these coefficients using the orthogonality of the eigenfunctions. Since for each k it is true that

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k = - \lambda_k w(x) \phi_k

we can write

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q \phi

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \sum_{k = 1}^{\infty} a_k \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \sum_{k=1}^{\infty} a_k \phi_k

= \sum_{k=1}^{\infty} a_k \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k\bigg]

= \sum_{k=1}^{\infty} a_k\big[- \lambda_k w(x) \phi_k\big]

= - \sum_{k=1}^{\infty} a_k \lambda_k w(x) \phi_k

Thus, in the inhomogeneous equation

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)

we can put (relabelling the independent variable as u, since it is about to become an integration variable)

F(u) = - \sum_{k=1}^{\infty} a_k \lambda_k w(u) \phi_k(u)

To find the mth coefficient a_m we can multiply both sides by \phi_m(u)^{*} and integrate. By orthogonality, all the terms in the sum on the right will vanish except the one involving \phi_m(u). We will get

\int_a^b \phi_m(u)^{*} F(u) \mathrm{d}u = - \int_a^b a_m \lambda_m w(u) \phi_m(u)^{*}\phi_m(u) \mathrm{d} u = -a_m \lambda_m (\phi_m, \phi_m)_w

\implies

a_m = -\int_a^b \frac{\phi_m(u)^{*} F(u)}{\lambda_m (\phi_m, \phi_m)_w}\mathrm{d} u

Having found a formula for the coefficients a_k, we can now write the solution of the original inhomogeneous differential equation as

\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)

= \sum_{k = 1}^{\infty} \bigg(-\int_a^b \frac{\phi_k(u)^{*} F(u)}{\lambda_k (\phi_k, \phi_k)_w}\mathrm{d} u\bigg) \phi_k(x)

= \int_a^b \mathrm{d} u \bigg(-\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}\bigg) F(u)

= \int_a^b \mathrm{d} u G(x, u) F(u)

where

G(x, u) \equiv -\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}
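
As a concrete check of this construction, here is a minimal numerical sketch using the vibrating-string problem from earlier (p = 1, q = 0, w = 1, \phi(0) = \phi(\pi) = 0, eigenvalues k^2 and eigenfunctions \sin(kx)) together with a right-hand side F = 1 of my own choosing, for which \phi^{\prime\prime} = F with those boundary conditions has the exact solution x(x - \pi)/2:

```python
import numpy as np

x = np.linspace(0.0, np.pi, 401)
u = np.linspace(0.0, np.pi, 401)
du = u[1] - u[0]

# Truncated Green's function G(x, u) = -sum_k sin(kx) sin(ku) / (k^2 * (phi_k, phi_k)_w),
# with (phi_k, phi_k)_w = pi/2 for the eigenfunctions sin(kx) on (0, pi).
K = 200
G = -sum(np.outer(np.sin(k * x), np.sin(k * u)) / (k**2 * (np.pi / 2)) for k in range(1, K + 1))

F = np.ones_like(u)                     # inhomogeneous term F(u) = 1
phi = (G @ F) * du                      # phi(x) = integral of G(x, u) F(u) du

print(np.max(np.abs(phi - x * (x - np.pi) / 2)))   # small: matches the exact solution
```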

To conclude this note, I want to go back to a previous note in which I explored in detail the solution of Schrödinger’s equation for the hydrogen atom by the method of separation of variables. This approach reduced Schrödinger’s partial differential equation into a set of three uncoupled ordinary differential equations which we can now see are in fact Sturm-Liouville problems. As discussed in my previous note, Schrödinger’s three-dimensional equation for the hydrogen atom can be written in spherical polar coordinates as

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0

and after solving this by the usual separation of variables approach starting from the assumption that the \psi function can be expressed as a product

\psi(r, \theta, \phi) = R(r) \Phi(\phi) \Theta(\theta)

we end up with an equation for R (the radial equation) of the form

\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0

and equations for \Phi and \Theta of the forms

\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0

and

\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0

respectively. Taking each of these in turn, we first observe that the radial equation is of the Sturm-Liouville form with p(r) = r^2 and eigenvalues corresponding to the energy term E in the equation. The variable r can range between 0 and \infty and the boundary conditions are formulated in such a way that the solutions of the radial equation remain bounded as r \rightarrow 0 and go to zero as r \rightarrow \infty. Furthermore, since p(0) =0, the radial equation is a singular Sturm-Liouville problem. Next, we observe that the equation for \Phi is essentially the same as equation (13) for the vibrating string in the extract from Courant & Hilbert discussed at the start of this note. The azimuth angle \phi can take any value in (-\infty, \infty) but the function \Phi must take a single value at each point in space (since this is a required property of the quantum wave function which \Phi is a constituent of). It follows that the function \Phi must be periodic since it must take the same value at \phi and \phi + 2\pi for any given \phi. This condition implies the conditions \Phi(0) = \Phi(2 \pi) and \Phi^{\prime}(0) = \Phi^{\prime}(2\pi). Furthermore, we have p(\phi) = 1 for all \phi. Thus, the equation for \Phi is a Sturm-Liouville problem with periodic boundary conditions. Finally, as discussed in my previous note, the \Theta equation can be rewritten as

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

\iff

\frac{d}{d x}\bigg((1 - x^2) \frac{d \Theta}{d x}\bigg) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

where x = \cos \theta (so that -1 \leq x \leq 1) and the separation constant k has been written as m^2, with m an integer as required by the periodicity of \Phi. This is a Sturm-Liouville problem with p(x) = 1 - x^2, and the boundary conditions are given by the requirement that \Theta should remain bounded for all x in [-1, 1]. Since p(x) = 0 at both ends of the interval [-1, 1], this equation is classified as a singular Sturm-Liouville problem. The eigenvalue in this equation is \lambda.
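
As a quick symbolic check of this last classification (using sympy's built-in associated Legendre functions, and sample values l = 2, m = 1 of my own choosing), the function \Theta = P_l^m(x) satisfies the equation above with \lambda = l(l+1):

```python
import sympy as sp

x = sp.symbols('x')
l, m = 2, 1                                   # sample quantum numbers (my choice)

Theta = sp.assoc_legendre(l, m, x)            # associated Legendre function P_l^m(x)
lam = l * (l + 1)

ode = sp.diff((1 - x**2) * sp.diff(Theta, x), x) + (lam - m**2 / (1 - x**2)) * Theta
print(sp.simplify(ode))                       # 0
```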

 

Overview of the Lie theory of rotations

A Lie group is a group which is also a smooth differentiable manifold. Every Lie group has an associated tangent space called a Lie algebra. As a vector space, the Lie algebra is often easier to study than the associated Lie group and can reveal most of what we need to know about the group. This is one of the general motivations for Lie theory. A table of some common Lie groups and their associated Lie algebras can be found here. All matrix groups are Lie groups. An example of a matrix Lie group is the D-dimensional rotation group SO(D). This group is linked to a set of D(D-1)/2 antisymmetric matrices which form the associated Lie algebra, usually denoted by \mathfrak{so}(D). Like all Lie algebras corresponding to Lie groups, the Lie algebra \mathfrak{so}(D) is characterised by a Lie bracket operation which here takes the form of commutation relations between the above-mentioned antisymmetric matrices, satisfying the formula

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}

The link between \mathfrak{so}(D) and SO(D) is provided by the matrix exponential map \mathrm{exp}: \mathfrak{so}(D) \rightarrow SO(D) in the sense that each point in the Lie algebra is mapped to a corresponding point in the Lie group by matrix exponentiation. Furthermore, the exponential map defines parametric paths passing through the identity element in the Lie group. The tangent vectors obtained by differentiating these parametric paths and evaluating the derivatives at the identity are the elements of the Lie algebra, showing that the Lie algebra is the tangent space of the associated Lie group manifold.

In the rest of this note I will unpack some aspects of the above brief summary without going too much into highly technical details. The Lie theory of rotations is based on a simple symmetry/invariance consideration, namely that rotations leave the scalar products of vectors invariant. In particular, they leave the lengths of vectors invariant. The Lie theory approach is much more easily generalisable to higher dimensions than the elementary trigonometric approach using the familiar rotation matrices in two and three dimensions. Instead of obtaining the familiar trigonometric rotation matrices by analysing the trigonometric effects of rotations, we will see below that they arise in Lie theory from the exponential map linking the Lie algebra \mathfrak{so}(D) to the rotation group SO(D), in a kind of matrix analogue of Euler’s formula e^{ix} = \mathrm{cos}x +  i \mathrm{sin}x.

Begin by considering rotations in D-dimensional Euclidean space as being implemented by multiplying vectors by a D \times D rotation matrix R(\vec{\theta}) which is a continuous function of some parameter vector \vec{\theta} such that R(\vec{0}) = I. In Lie theory we regard these rotations as being infinitesimally small, in the sense that they move us away from the identity by an infinitesimally small amount. If \mathrm{d}\vec{x} is the column vector of coordinate differentials, then the rotation embodied in R(\vec{\theta}) is implemented as

\mathrm{d}\vec{x}^{\ \prime} = R \mathrm{d}\vec{x}

Since we require lengths to remain unchanged after rotation, we have

\mathrm{d}\vec{x}^{\ \prime \ T} \mathrm{d}\vec{x}^{\ \prime} = \mathrm{d}\vec{x}^{\ T}R^T R \mathrm{d}\vec{x} = \mathrm{d}\vec{x}^{\ T}\mathrm{d}\vec{x}

which implies

R^T R = I

In other words, the matrix R must be orthogonal. Furthermore, since the determinant of a product is the product of the determinants, and the determinant of a transpose is the same as the original determinant, we can write

\mathrm{det}(R^T R) = (\mathrm{det}R)^2 = \mathrm{det}(I) = 1

Therefore we must have

\mathrm{det}(R) = \pm 1

But we can exclude the case \mathrm{det}(R) = -1 because the set of orthogonal matrices with negative determinants produces reflections. For example, the orthogonal matrix

\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}

has determinant -1 and results in a reflection in the x-axis when applied to a vector. Here we are only interested in rotations, which we can now define as having orthogonal transformation matrices R such that \mathrm{det}(R) = 1. Matrices which have unit determinant are called special, so focusing purely on rotations means that we are dealing exclusively with the set of special orthogonal matrices of dimension D, denoted by SO(D).

It is straightforward to verify that SO(D) constitutes a group with the operation of matrix multiplication. It is closed, has an identity element I, each element R \in SO(D) has an inverse (since the determinant is nonzero), and matrix multiplication is associative. Note that this means a rotation matrix times a rotation matrix must give another rotation matrix, so this is another property R(\vec{\theta}) needs to satisfy.

The fact that SO(D) is also a differentiable manifold, and therefore a Lie group, follows in a technical way (which I will not delve into here) from the fact that SO(D) is a closed subgroup of the set of all invertible D \times D real matrices, usually denoted by GL(D, \mathbb{R}), and this itself is a manifold of dimension D^2. The latter fact is demonstrated easily by noting that for M \in GL(D, \mathbb{R}), the determinant function M \mapsto \mathrm{det}(M) is continuous, and GL(D, \mathbb{R}) is the inverse image under this function of the open set \mathbb{R} - \{0\}. Thus, GL(D, \mathbb{R}) is itself an open subset in the D^2-dimensional linear space of all the D \times D real matrices, and thus a manifold of dimension D^2. The matrix Lie group SO(D) is a manifold of dimension \frac{D(D-1)}{2}, not D^2. One way to appreciate this is to observe that the condition R^T R = I for every R \in SO(D) means that you only need to specify \frac{D(D-1)}{2} off-diagonal elements to specify each R. In other words, there are D^2 elements in each R but the condition R^T R = I means that there are \frac{D(D+1)}{2} equations linking them, so the number of `free’ elements in each R \in SO(D) is only D^2 - \frac{D(D+1)}{2} = \frac{D(D-1)}{2}. We will see shortly that \frac{D(D-1)}{2} is also the dimension of \mathfrak{so}(D), which must be the case given that \mathfrak{so}(D) is to be the tangent space of the manifold SO(D) (the dimension of a manifold is the dimension of its tangent space).

If we now Taylor-expand R(\vec{\theta}) to first order about \vec{\theta} = \vec{0} we get

R(\vec{\theta}) \approx I + A

where A is an infinitesimal matrix of order \vec{\theta} and we will (for now) ignore terms like A^2, A^3, \ldots which are of second and higher order in \vec{\theta}. Now substituting R = I + A into R^T R = I we get

(I + A)^T (I + A) = I + A^T + A = I

\implies

A^T = -A

Thus, the matrix A must be antisymmetric. In fact, A will be a linear combination of some elementary antisymmetric basis matrices which play a crucial role in the theory, so we will explore this further. Since a sum of antisymmetric matrices is antisymmetric, and a scalar multiple of an antisymmetric matrix is antisymmetric, the set of all D \times D antisymmetric matrices is a vector space. This vector space has a basis provided by some elementary antisymmetric matrices containing only two non-zero elements each, the two non-zero elements in each matrix appearing in corresponding positions either side of the main diagonal and having opposite signs (this is what makes the matrices antisymmetric). Since there are \frac{D(D-1)}{2} distinct pairs of possible off-diagonal positions for these two non-zero elements, the basis has dimension \frac{D(D-1)}{2} and, as will be seen shortly, this vector space in fact turns out to be the Lie algebra \mathfrak{so}(D). The basis matrices will be written as J_{(mn)} where m and n identify the pair of corresponding off-diagonal positions in which the two non-zero elements will appear. We will let m run through the numbers 1, 2, \ldots, D in order, and with each pair m and n fixed, the element in the w-th row and k-th column of each matrix J_{(mn)} is then given by the formula

(J_{(mn)})_{wk} = \delta_{mw} \delta_{nk} - \delta_{mk} \delta_{nw}

To clarify this, we will consider the antisymmetric basis matrices for D = 2, D = 3 and D = 4. In the case D = 2 we have \frac{D(D-1)}{2} = 1 so there is a single antisymmetric matrix. Setting m = 1, n = 2, we get (J_{(12)})_{12} = 1 - 0 = 1 and (J_{(12)})_{21} = 0 - 1 = -1 so the antisymmetric matrix is

J_{(12)} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}

In the case D = 3 we have \frac{D(D-1)}{2} = 3 antisymmetric basis matrices corresponding to the three possible pairs of off-diagonal positions for the two non-zero elements in each matrix. Following the same approach as in the previous case, these can be written as

J_{(12)} = \begin{bmatrix} 0 & 1 & 0\\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}

J_{(23)} = \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 1\\ 0 & -1 & 0 \end{bmatrix}

J_{(31)} = \begin{bmatrix} 0 & 0 & -1\\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}

Finally, in the case D = 4 we have \frac{D(D-1)}{2} = 6 antisymmetric basis matrices corresponding to the six possible pairs of off-diagonal positions for the two non-zero elements in each matrix. These can be written as

J_{(12)} = \begin{bmatrix} 0 & 1 & 0 & 0\\ -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(23)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(31)} = \begin{bmatrix} 0 & 0 & -1 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}

J_{(41)} = \begin{bmatrix} 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 1 & 0 & 0 & 0 \end{bmatrix}

J_{(42)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}

J_{(43)} = \begin{bmatrix} 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{bmatrix}

So in the case of a general infinitesimal rotation in D-dimensional space of the form R(\vec{\theta}) \approx I + A, the antisymmetric matrix A will be a linear combination of the \frac{D(D-1)}{2} antisymmetric basis matrices J_{(mn)} of the form

A = \sum_m \sum_n \theta_{(mn)} J_{(mn)}

But note that using the standard matrix exponential series we have

e^{A} = I + A + \frac{1}{2}A^2 + \cdots \approx I + A

This suggests

R(\vec{\theta}) \approx e^A

and in fact this relationship between rotations and the exponentials of antisymmetric matrices turns out to be exact, not just an approximation. To see this, observe that A and A^{\ T} commute since A^{\ T}A = AA^{\ T} = -A^2. This means that

(e^{A})^{\ T} e^A = e^{A^{\ T}} e^A = e^{A^{\ T} + A} = e^0 = I

(note that in matrix exponentiation e^Ae^B = e^{A+B} only if A and B commute – see below). Since the diagonal elements of an antisymmetric matrix are always zero, we also have

\mathrm{det}(e^A) = e^{tr(A)} = e^0 = 1

Thus, e^A is both special and orthogonal, so it must be an element of SO(D). Conversely, suppose e^A \in SO(D). Then we must have

(e^{A})^{\ T} e^A = I

\iff

e^{A^{\ T}}e^A = I

\iff

e^{A^{\ T}} = e^{-A}

\implies

A^{\ T} = -A

so A is antisymmetric.

So we have a tight link between \mathfrak{so}(D) and SO(D) via matrix exponentiation. We can do a couple of things with this. First, for any real parameter t \in \mathbb{R} and antisymmetric basis matrix J_{(mn)}, we have R(t) \equiv e^{t J_{(mn)}} \in SO(D) and this defines a parametric path through SO(D) which passes through its identity element at t = 0. Differentiating with respect to t and evaluating the derivative at t = 0 we find that

R^{\ \prime}(0) = J_{(mn)}

which indicates that the antisymmetric basis matrices J_{(mn)} are tangent vectors of the manifold SO(D) at the identity, and that the \frac{D(D-1)}{2} antisymmetric basis matrices span the tangent space of SO(D), i.e., the Lie algebra \mathfrak{so}(D). Another thing we can do with the matrix exponential map is quickly recover the elementary rotation matrix in the case D = 2. Noting that J_{(12)}^2 = -I and separating the exponential series into even and odd terms in the usual way, we find that

R(\theta) = e^{\theta J_{(12)}} = \mathrm{cos}\theta I + \mathrm{sin}\theta J_{(12)} = \begin{bmatrix} \mathrm{cos}\theta & \mathrm{sin}\theta \\ -\mathrm{sin}\theta & \mathrm{cos}\theta \end{bmatrix}

where the single real number \theta here is the angle of rotation. This is the matrix analogue of Euler’s formula e^{ix} = \mathrm{cos}x +  i \mathrm{sin}x that was mentioned earlier.
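
This can be checked numerically in a couple of lines (again assuming numpy and scipy are available): the matrix exponential of \theta J_{(12)} should reproduce the elementary rotation matrix with the sign convention used above.

import numpy as np
from scipy.linalg import expm

theta = 0.7
J12 = np.array([[0.0, 1.0], [-1.0, 0.0]])
R_exp = expm(theta * J12)                        # e^{theta J_(12)}
R_formula = np.array([[np.cos(theta),  np.sin(theta)],
                      [-np.sin(theta), np.cos(theta)]])
print(np.allclose(R_exp, R_formula))             # True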

To further elucidate how the antisymmetric basis matrices J_{(mn)} form a Lie algebra which is closely tied to the matrix Lie group SO(D), we will show that the set of antisymmetric matrices is closed under the commutator (i.e., that the commutator of two antisymmetric matrices is itself antisymmetric), and that these commutators play a crucial role in ensuring the closure of the group SO(D) (i.e., in ensuring that a rotation multiplied by a rotation produces another rotation). First, suppose that A and B are two antisymmetric matrices. Then since the transpose of a product is the product of the transposes in reverse order we can write

([A, B])^{\ T} = (AB - BA)^{\ T} = (AB)^{\ T} - (BA)^{\ T}

= B^{\ T} A^{\ T}  - A^{\ T} B^{\ T} = BA - AB = - [A, B]

This shows that the commutator of two antisymmetric matrices is itself antisymmetric, so the commutator can be written as a linear combination of the antisymmetric basis matrices J_{(mn)}. Furthermore, since we can write A = \sum_{m, n} \theta_{(mn)} J_{(mn)} and B = \sum_{p, q} \theta_{(pq)}^{\ \prime} J_{(pq)}, we have

[A, B] = \sum_{m, n, p, q} \theta_{(mn)} \theta_{(pq)}^{\ \prime}[J_{(mn)}, J_{(pq)}]

so every commutator between antisymmetric matrices can be written in terms of the commutators [J_{(mn)}, J_{(pq)}] of the antisymmetric basis matrices. Next, suppose we exponentiate the antisymmetric matrices A and B to obtain the rotations e^A and e^B. Since SO(D) is closed, it must be the case that

e^A e^B = e^C

where e^C is another rotation and therefore C is an antisymmetric matrix. To see the role of the commutator between antisymmetric matrices in ensuring this, we will expand both sides. For the left-hand side we get

e^A e^B = (I + A + \frac{1}{2}A^2 + \cdots)(I + B + \frac{1}{2}B^2 + \cdots)

= I + A + B + \frac{1}{2}A^2 + \frac{1}{2}B^2 + AB + \cdots

= I + A + B + \frac{1}{2}(A^2 + AB + BA + B^2) + \frac{1}{2}[A, B] + \cdots

= I + A + B + \frac{1}{2}(A + B)^2 + \frac{1}{2}[A, B] + \cdots

For the right-hand side we get

e^C =  I + C + \frac{1}{2}C^2 + \cdots

Equating the two expansions, and noting that C = A + B to lowest order so that \frac{1}{2}C^2 = \frac{1}{2}(A + B)^2 plus higher-order terms, the quadratic terms \frac{1}{2}(A + B)^2 cancel and we are left with

C = A + B + \frac{1}{2}[A, B] + \cdots

where the remaining terms are of third and higher order in A and B. A result known as the Baker-Campbell-Hausdorff formula shows that these remaining terms are in fact all nested commutators of A and B. The series for C with a few additional terms expressed in this way is

C = A + B + \frac{1}{2}[A, B]

+ \frac{1}{12}\big([A, [A, B]] + [B, [B, A]]\big)

- \frac{1}{24}[B, [A, [A, B]]]

- \frac{1}{720}\big([B, [B, [B, [B, A]]]] + [A, [A, [A, [A, B]]]]\big) + \cdots

This shows that e^A e^B \neq e^{A + B} unless A and B commute, since only in this case do all the commutator terms in the series for C vanish. Since the commutator of two antisymmetric matrices is itself antisymmetric, this result also shows that C is an antisymmetric matrix, and therefore e^C must be a rotation.
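
The following sketch (assuming numpy and scipy are available; the helper names antisym and comm are my own) checks this numerically: for two small random antisymmetric matrices A and B it computes C = \mathrm{log}(e^A e^B), verifies that C is antisymmetric, and verifies that the first few Baker-Campbell-Hausdorff terms reproduce C up to small higher-order corrections.

import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)

def antisym(D, scale):
    M = rng.normal(size=(D, D)) * scale
    return M - M.T

def comm(X, Y):
    return X @ Y - Y @ X

D = 3
A, B = antisym(D, 0.05), antisym(D, 0.05)   # small, so the series converges quickly
C = logm(expm(A) @ expm(B)).real            # the C for which e^A e^B = e^C

print(np.allclose(C, -C.T))                 # C is antisymmetric
bch = (A + B + 0.5 * comm(A, B)
       + (comm(A, comm(A, B)) + comm(B, comm(B, A))) / 12.0)
print(np.max(np.abs(C - bch)))              # tiny: only higher-order terms remain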

Since every commutator between antisymmetric matrices can be written in terms of the commutators [J_{(mn)}, J_{(pq)}] of the antisymmetric basis matrices, a general formula for the latter would seem to be useful. In fact, the formula given earlier, namely

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}

completely characterises the Lie algebra \mathfrak{so}(D). To conclude this note we will therefore derive this formula ab initio, starting from the formula

(J_{(mn)})_{wk} = \delta_{mw} \delta_{nk} - \delta_{mk} \delta_{nw}

for the wk-th element of each matrix J_{(mn)}. We have

[J_{(mn)}, J_{(pq)}] = J_{(mn)} J_{(pq)} - J_{(pq)} J_{(mn)}

Focus on J_{(mn)} J_{(pq)} first. Using the Einstein summation convention, the product of the w-th row of J_{(mn)} with the k-th column of J_{(pq)} is

(J_{(mn)})_{wz} (J_{(pq)})_{zk}

= (\delta_{mw} \delta_{nz} - \delta_{mz} \delta_{nw})(\delta_{pz} \delta_{qk} - \delta_{pk} \delta_{qz})

= \delta_{mw} \delta_{qk} \delta_{nz} \delta_{pz} + \delta_{nw} \delta_{pk} \delta_{mz} \delta_{qz} - \delta_{mw} \delta_{pk} \delta_{nz} \delta_{qz} - \delta_{nw} \delta_{qk} \delta_{mz} \delta_{pz}

Now focus on J_{(pq)} J_{(mn)}. The product of the w-th row of J_{(pq)} with the k-th column of J_{(mn)} is

(J_{(pq)})_{wz} (J_{(mn)})_{zk}

= (\delta_{pw} \delta_{qz} - \delta_{pz} \delta_{qw})(\delta_{mz} \delta_{nk} - \delta_{mk} \delta_{nz})

= \delta_{pw} \delta_{nk} \delta_{qz} \delta_{mz} + \delta_{qw} \delta_{mk} \delta_{pz} \delta_{nz} - \delta_{pw} \delta_{mk} \delta_{qz} \delta_{nz} - \delta_{qw} \delta_{nk} \delta_{pz} \delta_{mz}

So the element in the w-th row and k-th column of [J_{(mn)}, J_{(pq)}] is

\delta_{mw} \delta_{qk} \delta_{nz} \delta_{pz} + \delta_{nw} \delta_{pk} \delta_{mz} \delta_{qz} + \delta_{pw} \delta_{mk} \delta_{qz} \delta_{nz} + \delta_{qw} \delta_{nk} \delta_{pz} \delta_{mz}

- \delta_{mw} \delta_{pk} \delta_{nz} \delta_{qz} -  \delta_{nw} \delta_{qk} \delta_{mz} \delta_{pz} - \delta_{pw} \delta_{nk} \delta_{qz} \delta_{mz} - \delta_{qw} \delta_{mk} \delta_{pz} \delta_{nz}

But notice that

\delta_{nz} \delta_{pz} = \delta_{np}

and similarly for the other Einstein summation terms. Thus, the above sum reduces to

(\delta_{mw} \delta_{qk} - \delta_{qw} \delta_{mk})\delta_{np} + (\delta_{nw} \delta_{pk} - \delta_{pw} \delta_{nk})\delta_{mq}

+ (\delta_{pw} \delta_{mk} - \delta_{mw} \delta_{pk})\delta_{nq} + (\delta_{qw} \delta_{nk} - \delta_{nw} \delta_{qk})\delta_{mp}

But

(\delta_{mw} \delta_{qk} - \delta_{mk} \delta_{qw})\delta_{np} = \delta_{np} (J_{(mq)})_{wk}

(\delta_{nw} \delta_{pk} - \delta_{nk} \delta_{pw})\delta_{mq} = \delta_{mq} (J_{(np)})_{wk}

(\delta_{mk} \delta_{pw} - \delta_{mw} \delta_{pk})\delta_{nq} = - \delta_{nq} (J_{(mp)})_{wk}

(\delta_{nk} \delta_{qw} - \delta_{nw} \delta_{qk})\delta_{mp} = - \delta_{mp} (J_{(nq)})_{wk}

Thus the element in the w-th row and k-th column of [J_{(mn)}, J_{(pq)}] is

\delta_{np} (J_{(mq)})_{wk} + \delta_{mq} (J_{(np)})_{wk} - \delta_{mp} (J_{(nq)})_{wk} - \delta_{nq} (J_{(mp)})_{wk}

Extending this to the matrix as a whole gives the required formula:

[J_{(mn)}, J_{(pq)}] = \delta_{np}J_{(mq)} + \delta_{mq}J_{(np)} - \delta_{mp}J_{(nq)} - \delta_{nq}J_{(mp)}
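
As a final check, the index formula for (J_{(mn)})_{wk} makes it easy to verify this commutation relation numerically for every pair of generators, for example in the case D = 4 (a minimal sketch assuming numpy is available; 0-based indices are used, which does not affect the check).

import numpy as np
from itertools import product

D = 4
delta = np.eye(D)

def J(m, n):
    # (J_(mn))_{wk} = delta_{mw} delta_{nk} - delta_{mk} delta_{nw}
    return np.outer(delta[m], delta[n]) - np.outer(delta[n], delta[m])

ok = True
for m, n, p, q in product(range(D), repeat=4):
    lhs = J(m, n) @ J(p, q) - J(p, q) @ J(m, n)
    rhs = (delta[n, p] * J(m, q) + delta[m, q] * J(n, p)
           - delta[m, p] * J(n, q) - delta[n, q] * J(m, p))
    ok &= np.allclose(lhs, rhs)
print(ok)  # True: the relation holds for every pair of generators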

A note on Green’s theorem in the plane and finding areas enclosed by parametric curves

Green’s theorem in the plane says that if P, Q, \partial P/\partial y and \partial Q/\partial x are single-valued and continuous in a simply connected region \mathfrak{R} bounded by a simple closed curve C, then

\oint_C P \mathrm{d}x + Q \mathrm{d}y = \iint_{\mathfrak{R}}\big(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\big) \mathrm{d}x \mathrm{d}y

where the line integral along C is in the anti-clockwise direction, as shown in the sketch. The theorem allows one to replace a double-integral over the region \mathfrak{R} by a line integral around the boundary curve C, or vice versa, whichever is the easier one to solve. It acts as a template for generating a multitude of useful formulas of this kind that can be tailored to suit particular situations by carefully choosing the forms of the functions P and Q. For example, in the context of integration of vector functions, if we write a two-dimensional vector V in unit-vector form as V = i V_x + j V_y, then putting Q = V_x and P = - V_y in Green’s theorem gives the divergence theorem in two dimensions, whereas putting Q = V_y and P = V_x gives Stokes’ theorem in two dimensions. Another result like this is obtained by putting Q = x and P = -y in Green’s theorem. This yields a formula for calculating the area enclosed by the simple closed curve C in the sketch above:

\oint_C x\mathrm{d}y - y \mathrm{d}x = \iint_{\mathfrak{R}} \big(\frac{\partial x}{\partial x} - \frac{\partial (-y)}{\partial y}\big) \mathrm{d}x \mathrm{d}y = 2 \iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y

\implies

\iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y = \frac{1}{2} \oint_C x\mathrm{d}y - y \mathrm{d}x

In the present note I want to quickly record a couple of observations about how this last result extends to cases in which the relevant curve C is defined parametrically rather than in terms of Cartesian coordinates.

First, the result can easily be adapted to obtain a very useful formula for finding the areas of closed parametric curves. If the curve C is defined parametrically by x(t) and y(t), then simply changing variables from x and y to t in the formula above gives

\iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y = \frac{1}{2} \oint_{C} x \mathrm{d}y - y \mathrm{d}x = \frac{1}{2}  \oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t

The expression on the right-hand side can immediately be applied to a huge range of problems involving finding areas of closed parametric curves. As a simple initial illustration to check that it works, we can use it to confirm that the area of a circle of radius r is \pi r^2. The circle is described by the Cartesian equation x^2 + y^2 = r^2 but has a parametric representation x = r\mathrm{cos} (t), y = r \mathrm{sin} (t), with t ranging from 0 to 2 \pi. Therefore \dot{x} = -r\mathrm{sin} (t) and \dot{y} = r\mathrm{cos} (t). Putting these into the formula we get

\frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_{t=0}^{2 \pi} (r^2\mathrm{cos}^2(t) + r^2\mathrm{sin}^2(t)) \mathrm{d}t = \frac{r^2}{2} \int_{t=0}^{2 \pi} \mathrm{d} t = \pi r^2

as expected.

As a slightly more interesting example, we can find the area of the main cardioid in the Mandelbrot set. This has parametric representation x = \frac{1}{2} \mathrm{cos}(t) - \frac{1}{4} \mathrm{cos}(2t), y = \frac{1}{2} \mathrm{sin}(t) - \frac{1}{4} \mathrm{sin}(2t) with t ranging from 0 to 2 \pi (see, e.g., Weisstein, Eric W., Mandelbrot set, From MathWorld – A Wolfram Web Resource). We find that \dot{x} = -\frac{1}{2} \mathrm{sin}(t) + \frac{1}{2} \mathrm{sin}(2t) and \dot{y} = \frac{1}{2} \mathrm{cos}(t) - \frac{1}{2} \mathrm{cos}(2t), and therefore

x\dot{y} = \frac{1}{4}  \mathrm{cos}^2(t) - \frac{3}{8} \mathrm{cos}(t) \mathrm{cos}(2t) + \frac{1}{8}\mathrm{cos}^2(2t)

y\dot{x} = -\frac{1}{4} \mathrm{sin}^2(t) + \frac{3}{8} \mathrm{sin}(t)\mathrm{sin}(2t) - \frac{1}{8}\mathrm{sin}^2(2t)

x\dot{y} - y\dot{x} = \frac{3}{8} - \frac{3}{8}\mathrm{cos}(t)

Putting this into the formula we find that the area of the main cardioid in the Mandelbrot set is

\frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_{t=0}^{2 \pi} \big(\frac{3}{8} - \frac{3}{8}\mathrm{cos}(t)\big) \mathrm{d}t = \frac{3\pi}{8}
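
Both of these areas are easy to confirm by direct numerical quadrature. The sketch below (assuming numpy and scipy are available; parametric_area is my own helper name) evaluates \frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t for the circle of radius r = 1 and for the main cardioid, recovering \pi and 3\pi/8 respectively.

import numpy as np
from scipy.integrate import quad

def parametric_area(x, y, xdot, ydot, t1, t2):
    # (1/2) * integral of (x*ydot - y*xdot) dt along the curve
    value, _ = quad(lambda t: x(t) * ydot(t) - y(t) * xdot(t), t1, t2)
    return 0.5 * value

# Circle of radius 1: area should be pi
print(parametric_area(np.cos, np.sin,
                      lambda t: -np.sin(t), np.cos, 0.0, 2.0 * np.pi), np.pi)

# Main cardioid of the Mandelbrot set: area should be 3*pi/8
x  = lambda t: 0.5 * np.cos(t) - 0.25 * np.cos(2 * t)
y  = lambda t: 0.5 * np.sin(t) - 0.25 * np.sin(2 * t)
xd = lambda t: -0.5 * np.sin(t) + 0.5 * np.sin(2 * t)
yd = lambda t: 0.5 * np.cos(t) - 0.5 * np.cos(2 * t)
print(parametric_area(x, y, xd, yd, 0.0, 2.0 * np.pi), 3.0 * np.pi / 8.0)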

The second observation I want to make here is that we can sometimes use the same formula to find the area of a region by integrating along a parametric curve that is not closed. This seems surprising at first because we obtained the formula using Green’s theorem which explicitly requires the curve C to be closed. As an illustration of this situation, consider the problem of finding the area A in the diagram. The arc joining the two points (x_1, y_1) and (x_2, y_2) is assumed to have parametric representation x = f(t), y = g(t), such that (x_1, y_1) = (f(t_1), g(t_1)) and (x_2, y_2) = (f(t_2), g(t_2)), with t_2 > t_1. The claim is then that the area A in the diagram is given by the same formula as before, but applied only along the arc joining (x_1, y_1) and (x_2, y_2) rather than all the way around the enclosed region. Thus, we are claiming that

A = \frac{1}{2} \int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t

To prove this we first note that since x = f(t), we can write t = f^{-1}(x) (assuming f is invertible over the relevant range of t), so y can be written as a function of x as y = g(f^{-1}(x)). The area under the arc joining (x_1, y_1) and (x_2, y_2) is then given by

\int_{x_1}^{x_2} g(f^{-1}(x)) \mathrm{d}x

Changing variables in this integral from x to t we find that t_1 = f^{-1}(x_1), t_2 = f^{-1}(x_2), g(f^{-1}(x)) = g(t) = y and \mathrm{d}x = f^{\prime}(t) \mathrm{d}t = \dot{x} \mathrm{d}t. Thus, we find that the area under the arc is given by

\int_{x_1}^{x_2} g(f^{-1}(x)) \mathrm{d}x = \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t

By simple geometry we can then see that the area A is given by

\frac{1}{2}x_2y_2 - \frac{1}{2}x_1y_1 - \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t

where \frac{1}{2}x_2y_2 is the area under the line joining 0 and (x_2, y_2), and \frac{1}{2}x_1y_1 is the area under the line joining 0 and (x_1, y_1).

Next, we imagine flipping the graph over so that y is now along the horizontal axis and x is along the vertical axis. We can proceed in the same way as before to find the area under the curve from this point of view. In Cartesian form the area under the curve is given by

\int_{y_1}^{y_2} f(g^{-1}(y)) \mathrm{d}y

and upon changing variables from y to t this becomes \int_{t_1}^{t_2} x \dot{y} \mathrm{d}t. But, returning to the original graph, we  see that the sum of the two areas \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t and \int_{t_1}^{t_2} x \dot{y} \mathrm{d}t is the same as the difference x_2y_2 - x_1y_1, where x_2y_2 is the area of the rectangle with vertices at (0, 0), (0, y_2), (x_2, y_2) and (x_2, 0), and x_1y_1 is the area of the smaller rectangle with vertices at (0, 0), (0, y_1), (x_1, y_1) and (x_1, 0). Thus, we can write

x_2y_2 - x_1y_1 = \int_{t_1}^{t_2} (x \dot{y} + y \dot{x}) \mathrm{d}t

\iff

\frac{1}{2}x_2y_2 - \frac{1}{2}x_1y_1 - \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t= \frac{1}{2}\int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t

This proves the result since we saw above that the expression on the left-hand side gives the area A.

As a simple application of this result, suppose the arc in the above scenario is part of a parabola with Cartesian equation y = x^2 and we want to find the area A when (x_1, y_1) = (1, 1) and (x_2, y_2) = (2, 4). The parabola has a parametric representation x = t, y = t^2 and t ranges from 1 to 2 along the arc. We have \dot{x} = 1 and \dot{y} = 2t, so putting these into the formula we find that

A = \frac{1}{2} \int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_1^2 t^2 \mathrm{d}t = \frac{7}{6}
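
A quick numerical cross-check of this example (assuming numpy and scipy are available): the parametric formula applied along the arc alone and the geometric decomposition used in the proof both give 7/6.

import numpy as np
from scipy.integrate import quad

# Arc of y = x^2 parameterised as x = t, y = t^2, from (1, 1) to (2, 4)
x  = lambda t: t
y  = lambda t: t * t
xd = lambda t: 1.0
yd = lambda t: 2.0 * t

A_formula, _ = quad(lambda t: 0.5 * (x(t) * yd(t) - y(t) * xd(t)), 1.0, 2.0)

# Same area from the geometric decomposition: triangle up to (2, 4)
# minus triangle up to (1, 1) minus the area under the arc
under_arc, _ = quad(lambda t: y(t) * xd(t), 1.0, 2.0)
A_geometric = 0.5 * 2 * 4 - 0.5 * 1 * 1 - under_arc

print(A_formula, A_geometric, 7.0 / 6.0)  # all equal 7/6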

Study of a proof of Noether’s theorem and its application to conservation laws in physics

While I have for a long time been aware of Noether’s theorem and its relevance to symmetry and conservation laws in physics, I have only recently taken the time to closely explore its mathematical proof. In the present note I want to record some notes I made on the mathematical nuances involved in a proof of Noether’s theorem and the mathematical relevance of the theorem to some simple conservation laws in classical physics, namely, the conservation of energy and the conservation of linear momentum. Noether’s Theorem has important applications in a wide range of classical mechanics problems as well as in quantum mechanics and Einstein’s relativity theory. It is also used in the study of certain classes of partial differential equations that can be derived from variational principles.

The theorem was first published by Emmy Noether in 1918. An English translation of the full original paper is available here. An interesting book by Yvette Kosmann-Schwarzbach also presents an English translation of Noether’s 1918 paper and discusses in detail the history of the theorem’s development and its impact on theoretical physics in the 20th Century. (Kosmann-Schwarzbach, Y, 2011, The Noether Theorems: Invariance and Conservation Laws in the Twentieth Century. Translated by Bertram Schwarzbach. Springer). At the time of writing, the book is freely downloadable from here.

Mathematical setup of Noether’s theorem

The case I explore in detail here is that of a variational calculus functional of the form

S[y] = \int_a^b \mathrm{d}x F(x, y, y^{\prime})

where x is a single independent variable and y = (y_1, y_2, \ldots, y_n) is a vector of n dependent variables. The functional has stationary paths defined by the usual Euler-Lagrange equations of variational calculus. Noether’s theorem concerns how the value of this functional is affected by families of continuous transformations of the dependent and independent variables (e.g., translations, rotations) that are defined in terms of one or more real parameters. The case I explore in detail here involves transformations defined in terms of only a single parameter, call it \delta. The transformations can be represented in general terms as

\overline{x} = \Phi(x, y, y^{\prime}; \delta)

\overline{y}_k = \Psi_k(x, y, y^{\prime}; \delta)

for k = 1, 2, \ldots, n. The functions \Phi and \Psi_k are assumed to have continuous first derivatives with respect to all the variables, including the parameter \delta. Furthermore, the transformations must reduce to identities when \delta = 0, i.e.,

x \equiv \Phi(x, y, y^{\prime}; 0)

y_k \equiv \Psi_k(x, y, y^{\prime}; 0)

for k = 1, 2, \ldots, n. As concrete examples, translations and rotations are continuous differentiable transformations that can be defined in terms of a single parameter and that reduce to identities when the parameter takes the value zero.

Noether’s theorem concerns infinitesimally small changes in the dependent and independent variables, so we can assume |\delta| \ll 1 and then use perturbation theory to prove the theorem. Treating \overline{x} and \overline{y}_k as functions of \delta and Taylor-expanding them about \delta = 0 we get

\overline{x}(\delta) = \overline{x}(0) + \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

where

\phi(x, y, y^{\prime}) \equiv \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}

and

\overline{y}_k (\delta) = \overline{y}_k (0) + \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

where

\psi_k (x, y, y^{\prime}) \equiv \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0} for k = 1, 2, \ldots, n.

Noether’s theorem then says that whenever the functional S[y] is invariant under the above family of transformations, i.e., whenever

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

for all c and d such that a \leq c < d \leq b, where \overline{c} = \Phi(c, y(c), y^{\prime}(c); \delta) and \overline{d} = \Phi(d, y(d), y^{\prime}(d); \delta), then for each stationary path of S[y] the following equation holds:

\sum_{k = 1}^n \frac{\partial F}{\partial y_k^{\prime}}\psi_k + \bigg(F - \sum_{k = 1}^n y_k^{\prime}\frac{\partial F}{\partial y_k^{\prime}}\bigg)\phi = \mathrm{constant}

As illustrated below, this remarkable equation encodes a number of conservation laws in physics: conservation of energy, of linear momentum and of angular momentum follow when the relevant action functional is invariant under translations in time, translations in space, and rotations in space respectively. Thus, Noether’s theorem is often expressed as the statement that whenever a system has a continuous symmetry there is a corresponding quantity whose value is conserved.

Application of the theorem to familiar conservation laws in classical physics

It is, of course, not necessary to use the full machinery of Noether’s theorem for simple examples of conservation laws in classical physics. The theorem is most useful in unfamiliar situations in which it can reveal conserved quantities which were not previously known. However, going through the motions in simple cases clarifies how the mathematical machinery works in more sophisticated and less familiar situations.

To obtain the law of the conservation of energy in the simplest possible scenario, consider a particle of mass m moving along a straight line in a time-invariant potential field V(x) with position at time t given by the function x(t). The Lagrangian formulation of mechanics then says that the path followed by the particle will be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2 - V(x)\big)

The Euler-Lagrange equation for this functional would give Newton’s second law as the equation governing the particle’s motion. With regard to demonstrating energy conservation, we notice that the Lagrangian, which is more generally of the form L(t, x, \dot{x}) when there is a time-varying potential, here takes the simpler form L(x, \dot{x}) because there is no explicit dependence on time. Therefore we might expect the functional to be invariant under translations in time, and thus Noether’s theorem to hold. We will verify this. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \phi + O(\delta^2) \equiv t + \delta

and

\overline{x}(\delta) = x + \delta \cdot 0 + O(\delta^2) \equiv x

From the first equation we see that \phi = 1 in the case of a simple translation in time by an amount \delta, and from the second equation we see that \psi = 0, which simply reflects the fact that we are only translating in the time direction. The invariance of the functional under these transformations can easily be demonstrated by writing

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\overline{x}, \dot{\overline{x}}) = \int_{\overline{0}-\delta}^{\overline{T}-\delta} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t L(x, \dot{x})

where the limits in the second integral follow from the change of the time variable from \overline{t} to t. Thus, Noether’s theorem holds and with \phi = 1 and \psi = 0 the fundamental equation in the theorem reduces to

L - \dot{x}\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

Evaluating the terms on the left-hand side we get

\frac{1}{2}m\dot{x}^2 - V(x) - \dot{x} m\dot{x} =\mathrm{constant}

\iff

\frac{1}{2}m\dot{x}^2 + V(x) = E = \mathrm{constant}

which is of course the statement of the conservation of energy.
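
As an illustrative numerical check (assuming numpy and scipy are available; the potential V(x) = \frac{1}{2}kx^2 is just a convenient example), we can integrate the Euler-Lagrange equation m\ddot{x} = -V^{\prime}(x) and confirm that the Noether charge \frac{1}{2}m\dot{x}^2 + V(x) stays constant along the resulting trajectory.

import numpy as np
from scipy.integrate import solve_ivp

# A particle in the time-independent potential V(x) = (1/2) k x^2
m, k = 1.0, 2.0
V  = lambda x: 0.5 * k * x**2
Vp = lambda x: k * x

def newton(t, state):
    x, v = state
    return [v, -Vp(x) / m]   # Euler-Lagrange equation: m x'' = -V'(x)

sol = solve_ivp(newton, (0.0, 10.0), [1.0, 0.0], dense_output=True,
                rtol=1e-10, atol=1e-12)
ts = np.linspace(0.0, 10.0, 200)
xs, vs = sol.sol(ts)
E = 0.5 * m * vs**2 + V(xs)  # the Noether charge for time-translation symmetry
print(E.min(), E.max())      # constant to within the integration tolerance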

To obtain the law of conservation of linear momentum in the simplest possible scenario, assume now that the above particle is moving freely in the absence of any potential field, so V(x) = 0 and the only energy involved is kinetic energy. The path followed by the particle will now be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(\dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2\big)

The Euler-Lagrange equation for this functional would give Newton’s first law as the equation governing the particle’s motion (constant velocity in the absence of any forces). To get the law of conservation of linear momentum we will consider a translation in space rather than time, and check that the action functional is invariant under such translations. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \cdot 0 + O(\delta^2) \equiv t

and

\overline{x}(\delta) = x + \delta \psi + O(\delta^2) \equiv x + \delta

From the first equation we see that \phi = 0 reflecting the fact that we are only translating in the space direction, and from the second equation we see that \psi = 1 in the case of a simple translation in space by an amount \delta. The invariance of the functional under these transformations can easily be demonstrated by noting that \dot{\overline{x}} = \dot{x}, so we can write

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\dot{\overline{x}}) = \int_0^{T} \mathrm{d}t L(\dot{x})

since the limits of integration are not affected by the translation in space. Thus, Noether’s theorem holds and with \phi = 0 and \psi = 1 the fundamental equation in the theorem reduces to

\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

\iff

m\dot{x} = \mathrm{constant}

This is, of course, the statement of the conservation of linear momentum.

Proof of Noether’s theorem

To prove Noether’s theorem we will begin with the transformed functional

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

We will substitute into this the linearised forms of the transformations, namely

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

and

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

for k = 1, 2, \ldots, n, and then expand to first order in \delta. Note that the integration limits are, to first order in \delta,

\overline{c} = c + \delta \phi(c)

and

\overline{d} = d + \delta \phi(d)

Using the linearised forms of the transformations and writing \psi = (\psi_1, \psi_2, \ldots, \psi_n) we get

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \frac{\mathrm{d}x}{\mathrm{d}\overline{x}}

\frac{\mathrm{d}\overline{x}}{\mathrm{d}x} = 1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}

Inverting the second equation we get

\frac{\mathrm{d}x}{\mathrm{d}\overline{x}} = \big(1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big)^{-1} = 1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} + O(\delta^2)

Using this in the first equation we find, to first order in \delta,

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \big(1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big) = \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)
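
This first-order expansion is easy to confirm with a symbolic check (a minimal sketch assuming sympy is available; the symbols yp, psp and php stand for \mathrm{d}y/\mathrm{d}x, \mathrm{d}\psi/\mathrm{d}x and \mathrm{d}\phi/\mathrm{d}x respectively).

import sympy as sp

d = sp.symbols('delta')
yp, psp, php = sp.symbols('yp psp php')

expr = (yp + d * psp) / (1 + d * php)
print(sp.series(expr, d, 0, 2))  # yp + delta*(psp - php*yp) + O(delta**2)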

Making the necessary substitutions we can then write the transformed functional as

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

= \int_{\overline{c}-\delta \phi(c)}^{\overline{d}-\delta \phi(d)} \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

Treating F as a function of \delta and expanding about \delta = 0 to first order we get

F(\delta) = F(0) + \delta \frac{\partial F}{\partial \delta}\big|_{\delta = 0}

= F(x, y, y^{\prime}) + \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

Then using the expression for \frac{\mathrm{d}\overline{x}}{\mathrm{d}x} above, the transformed functional becomes

\int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \int_c^d \mathrm{d}x \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F(x, y, y^{\prime}) + O(\delta^2)

Ignoring the second order term in \delta we can thus write

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \delta \int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg)

Since the functional is invariant, however, this implies

\int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg) = 0

We now manipulate this equation by integrating the terms involving \frac{\mathrm{d}\phi}{\mathrm{d}x} and \frac{\mathrm{d}\psi_k}{\mathrm{d}x} by parts. We get

\int_c^d \mathrm{d}x \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} = \bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg]_c^d

- \int_c^d \mathrm{d}x \phi \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

and

\int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} = \bigg[\sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d - \int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\psi_k

Substituting these into the equation gives

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \bigg(\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can manipulate this equation further by expanding the integrand in the second term on the left-hand side. We get

\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

= \frac{\partial F}{\partial x} - \frac{\partial F}{\partial x} - \sum_{k=1}^n \frac{\partial F}{\partial y_k}y^{\prime}_k - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n y^{\prime}_k \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)

= \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

Thus, the equation becomes

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can now see at a glance that the second and third terms on the left-hand side must vanish because of the Euler-Lagrange expressions appearing in the brackets (which are identically zero on stationary paths). Thus we arrive at the equation

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d = 0

which, since c and d are arbitrary, proves that the expression inside the square brackets is constant along the stationary path, as per Noether’s theorem.