# Invariance under rotations in space and conservation of angular momentum

In a previous note I studied in detail the mathematical setup of Noether’s Theorem and its proof. I briefly illustrated the mathematical machinery by considering invariance under translations in time, giving the law of conservation of energy, and invariance under translations in space, giving the law of conservation of linear momentum. I briefly mentioned that invariance under rotations in space would also yield the law of conservation of angular momentum, but I did not work this out explicitly. I want to do this quickly in the present note.

We imagine a particle of unit mass moving freely in the absence of any potential field, and tracing out a path $\gamma(t)$ in the $(x, y)$-plane of a three-dimensional Euclidean coordinate system between times $t_1$ and $t_2$, with the $z$-coordinate everywhere zero along this path. The angular momentum of the particle at time $t$ with respect to the origin of the coordinate system is given by

$\mathbf{L} = \mathbf{r} \times \mathbf{v}$

$= (\mathbf{i} x + \mathbf{j} y) \times (\mathbf{i} \dot{x} + \mathbf{j} \dot{y})$

$= \mathbf{k} x \dot{y} - \mathbf{k} y \dot{x}$

$= \mathbf{k} (x \dot{y} - y \dot{x})$

where $\times$ is the vector product operation. Alternatively, we could have obtained this as

$\mathbf{L} = \mathbf{r} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x & y & 0 \\ \dot{x} & \dot{y} & 0 \end{vmatrix}$

$= \mathbf{k} (x \dot{y} - y \dot{x})$
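As a quick sanity check, the cross product can be verified numerically; the sketch below (with arbitrary sample values for $\mathbf{r}$ and $\mathbf{v}$, chosen purely for illustration) confirms that only the $\mathbf{k}$-component survives and that it equals $x \dot{y} - y \dot{x}$.

```python
# Numerical sanity check (sample values assumed): for planar r and v the
# cross product r x v has only a z-component, equal to x*ydot - y*xdot.
def cross(a, b):
    """3-D cross product of two vectors given as (x, y, z) tuples."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

r = (2.0, 3.0, 0.0)    # position (x, y, 0)
v = (0.5, -1.5, 0.0)   # velocity (xdot, ydot, 0)
L = cross(r, v)
print(L)                         # only the z-component is non-zero
print(r[0]*v[1] - r[1]*v[0])     # x*ydot - y*xdot
```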

In terms of Lagrangian mechanics, the path $\gamma(t)$ followed by the particle will be a stationary path of the action functional

$S[\gamma(t)] = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2)$

(in the absence of a potential field the total energy consists only of kinetic energy).

Now imagine that the entire path $\gamma(t)$ is rotated bodily anticlockwise in the $(x, y)$-plane through an angle $\theta$. This corresponds to a one-parameter transformation

$\overline{t} \equiv \Phi(t, x, y, \dot{x}, \dot{y}; \theta) = t$

$\overline{x} \equiv \Psi_1(t, x, y, \dot{x}, \dot{y}; \theta) = x \cos \theta - y \sin \theta$

$\overline{y} \equiv \Psi_2(t, x, y, \dot{x}, \dot{y}; \theta) = x \sin \theta + y \cos \theta$

which reduces to the identity when $\theta = 0$. We have

$d\overline{t} = dt$

$\dot{\overline{x}}^2 = \dot{x}^2 \cos^2 \theta + \dot{y}^2 \sin^2 \theta - 2 \dot{x} \dot{y} \sin \theta \cos \theta$

$\dot{\overline{y}}^2 = \dot{x}^2 \sin^2 \theta + \dot{y}^2 \cos^2 \theta + 2 \dot{x} \dot{y} \sin \theta \cos \theta$

and therefore

$\dot{x}^2 + \dot{y}^2 = \dot{\overline{x}}^2 + \dot{\overline{y}}^2$

so the action functional is invariant under this rotation since

$S[\overline{\gamma}(t)] = \int_{t_1}^{t_2} d\overline{t} \frac{1}{2}(\dot{\overline{x}}^2 + \dot{\overline{y}}^2) = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2) = S[\gamma(t)]$
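The invariance can also be confirmed numerically; the following sketch (sample velocity components and rotation angle assumed) checks that the rotated speed squared equals the original, so the integrand of the action is unchanged.

```python
import math

# Check (sample values assumed) that rotating the velocity through any angle
# theta leaves xdot^2 + ydot^2, and hence the Lagrangian, unchanged.
xdot, ydot, theta = 1.2, -0.7, 0.9
xbar_dot = xdot*math.cos(theta) - ydot*math.sin(theta)
ybar_dot = xdot*math.sin(theta) + ydot*math.cos(theta)
print(xdot**2 + ydot**2, xbar_dot**2 + ybar_dot**2)   # equal
```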

Therefore Noether’s theorem applies. Let

$F(t, x, y, \dot{x}, \dot{y}) = \frac{1}{2}(\dot{x}^2 + \dot{y}^2)$

Then Noether’s theorem in this case says

$\frac{\partial F}{\partial \dot{x}} \psi_1 + \frac{\partial F}{\partial \dot{y}} \psi_2 + \big(F - \frac{\partial F}{\partial \dot{x}} \dot{x} - \frac{\partial F}{\partial \dot{y}} \dot{y}\big) \phi = const.$

where

$\phi \equiv \frac{\partial \Phi}{\partial \theta} \big|_{\theta = 0} = 0$

$\psi_1 \equiv \frac{\partial \Psi_1}{\partial \theta} \big|_{\theta = 0} = -y$

$\psi_2 \equiv \frac{\partial \Psi_2}{\partial \theta} \big|_{\theta = 0} = x$

We have

$\frac{\partial F}{\partial \dot{x}} = \dot{x}$

$\frac{\partial F}{\partial \dot{y}} = \dot{y}$

Therefore Noether’s theorem gives us (remembering $\phi = 0$)

$-\dot{x} y + \dot{y} x = const.$

The expression on the left-hand side of this equation is the angular momentum of the particle (cf. the brief discussion of angular momentum at the start of this note), so this result is precisely the statement that the angular momentum is conserved. Noether’s theorem shows us that this is a direct consequence of the invariance of the action functional of the particle under rotations in space.
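For a free particle the conservation law can also be checked directly, since the stationary path is a straight line traversed at constant velocity; this sketch (initial conditions assumed for illustration) evaluates $x \dot{y} - y \dot{x}$ at several times and finds the same value each time.

```python
# For a free particle x(t) = x0 + vx*t, y(t) = y0 + vy*t (sample initial
# conditions assumed), the quantity x*ydot - y*xdot is the same at all times.
x0, y0, vx, vy = 1.0, 2.0, 0.3, -0.4

def angular_momentum(t):
    x, y = x0 + vx*t, y0 + vy*t
    return x*vy - y*vx

values = [angular_momentum(t) for t in (0.0, 1.0, 5.0, 17.3)]
print(values)   # all entries equal x0*vy - y0*vx
```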

# A mathematical formulation of Feynman’s ‘mirage on a hot road’

In his famous Feynman Lectures on Physics, Richard Feynman provided an intuitive explanation of how a ‘mirage on a hot road’ can arise due to the bending of light rays from the sky in accordance with Fermat’s Principle (see The Feynman Lectures on Physics, Volume I, Chapter 26). Feynman wrote the following:

I was discussing this with a beginning engineering student who did not quite understand why the mirage makes it look as if the water is actually on the road. I explained this by augmenting Feynman’s Fig. 26-8 above as follows:

The bent light ray starting at point A and entering the observer’s eye at point B is interpreted by the observer as having followed a straight line path emanating from the road, as indicated in the diagram. Thus, the observer sees the image of the sky on the road surface and interprets it as a shimmering pool of water.

Having done this, the question then arose as to how one could go about constructing an explicit mathematical model of the above scenario, yielding a suitable equation for the curved light ray from A to B, a linear equation for the apparent straight line path seen by the observer, and explicit coordinates for the point on the road where the image of the sky is seen by the observer. This turned out to be an interesting exercise involving Fermat’s Principle and the Calculus of Variations and is what I want to record here.

Suppose the light ray begins at point $A = (a, b)$ at time $t_1$, and enters the observer’s eye at point $B = (-a, b)$ at time $t_2$. Fermat’s Principle (see, e.g., this Wikipedia article) says that the path followed by the light ray is such as to make the optical length functional

$S[y] = \int_A^B n ds$

stationary, where $n = c/v$ is the refractive index of the medium through which the light passes, $c$ is the speed of light in a vacuum and $v = ds/dt$ is the speed of light in the medium. This functional can be derived (up to a multiplicative constant) from the ‘Principle of Least Time’ by noting that the time taken by the light ray is

$T = \int_{t_1}^{t_2} dt = \int_{t_1}^{t_2} \frac{1}{c} \frac{c}{v} \frac{ds}{dt} dt = \int_A^B \frac{n}{c} ds = \frac{1}{c} S$

The light ray will find the path that minimises this time of travel.

To apply this setup to the mirage in Feynman’s lecture we need to model the refractive index as a function of the $y$-coordinate in my amended diagram above, which measures the height above the road. As Feynman says, light goes faster in the hot region near the road than in the cooler region higher up. Thus, since the refractive index is inversely proportional to $v$, it should be an increasing function of the height above the road $y$. To get a toy model for the scenario in Feynman’s lecture let us make the simplest possible assumption that the refractive index is a simple linear function of $y$, namely

$n(y) = \alpha + \beta y$

with $\alpha$ and $\beta$ both positive. Then since the arc-length element is

$ds = dx \sqrt{1 + y^{\prime \ 2}}$

we can write the optical length functional as

$S[y] = \int_A^B n ds = \int_{a}^{-a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = -\int_{-a}^{a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}$

We find the stationary path for this functional using the Calculus of Variations. Let

$F(x, y, y^{\prime}) = (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}$

Since this does not depend directly on $x$, the problem admits a first-integral of the form

$y^{\prime} \frac{\partial F}{\partial y^{\prime}} - F = C$

where $C$ is a constant. We have

$\frac{\partial F}{\partial y^{\prime}} = \frac{(\alpha + \beta y)y^{\prime}}{\sqrt{1 + y^{\prime \ 2}}}$

Therefore the first-integral for this problem is

$\frac{(\alpha + \beta y)y^{\prime \ 2}}{\sqrt{1 + y^{\prime \ 2}}} - (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = C$

Multiplying through by $\sqrt{1 + y^{\prime \ 2}}/ \alpha$, absorbing $\alpha$ into the constant term, and writing $\delta \equiv \beta/\alpha$ we get

$(1 + \delta y) y^{\prime \ 2} - (1 + \delta y)(1 + y^{\prime \ 2}) = C\sqrt{1 + y^{\prime \ 2}}$

$\iff$

$-(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}$

$\implies$

$y^{\prime} = \frac{\pm \sqrt{(1+\delta y)^2 - C^2}}{C}$

This is a first-order differential equation for $y$ which can be solved by separation of variables. We get the integral equation

$\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \pm \int \frac{dx}{C}$

To solve the integral on the left-hand side, make the change of variable

$(1 + \delta y) = C \sec \theta$

$\implies$

$\delta \, dy = C \sec \theta \tan \theta \, d\theta$

Then

$\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \int \frac{C \sec \theta \tan \theta \, d\theta}{\delta \sqrt{C^2 \sec^2 \theta - C^2}}$

$= \frac{1}{\delta}\int \sec \theta \, d\theta$

$= \frac{1}{\delta} \ln \big[\sec \theta + \tan \theta\big] + const.$

$= \frac{1}{\delta} \ln \bigg[\frac{(1 + \delta y) + \sqrt{(1 + \delta y)^2 - C^2}}{C}\bigg] + const.$

using $\tan \theta = \sqrt{\sec^2 \theta - 1}$ to revert to the original variable. For the integral on the right-hand side of the integral equation we get

$\pm \int \frac{dx}{C} = \pm \frac{x}{C} + const.$

Therefore the integral equation reduces to

$\frac{1}{\delta} \ln \bigg[\frac{(1 + \delta y) + \sqrt{(1 + \delta y)^2 - C^2}}{C}\bigg] = \pm \frac{x}{C} + const.$

$\implies$

$(1 + \delta y) + \sqrt{(1 + \delta y)^2 - C^2} = C \exp\big(\pm \tfrac{\delta x}{C} + const.\big)$

Writing $u \equiv \pm \frac{\delta x}{C} + const.$, isolating the square root and squaring both sides gives

$(1 + \delta y)^2 - C^2 = C^2 e^{2u} - 2 C e^{u} (1 + \delta y) + (1 + \delta y)^2$

$\implies$

$1 + \delta y = \frac{C}{2}\big(e^{u} + e^{-u}\big) = C \cosh u$

Since $\cosh$ is an even function, the $\pm$ sign can be absorbed into the constant of integration, so the solution of the first-integral equation is

$y = \frac{C \cosh\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}$

This solution also has the behaviour required of the curved light ray in my amended diagram above, namely $y \rightarrow \infty$ as $x \rightarrow \pm \infty$.
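It is easy to check that this $y$ satisfies the first-order equation $y^{\prime} = \pm \sqrt{(1+\delta y)^2 - C^2}/C$ obtained earlier; the sketch below (sample values of $C$, $\delta$ and the integration constant assumed) compares $y^{\prime\,2}$ with $\big((1+\delta y)^2 - C^2\big)/C^2$ at a few points.

```python
import math

# Verify numerically (sample C, delta and integration constant k assumed)
# that y = (C*cosh(delta*x/C + k) - 1)/delta satisfies
# y'^2 = ((1 + delta*y)^2 - C^2) / C^2.
C, delta, k = 1.3, 0.2, 0.4

def y(x):
    return (C*math.cosh(delta*x/C + k) - 1)/delta

def yprime(x):
    return math.sinh(delta*x/C + k)   # derivative of y(x)

for x in (-2.0, 0.0, 1.5):
    lhs = yprime(x)**2
    rhs = ((1 + delta*y(x))**2 - C**2)/C**2
    print(lhs, rhs)   # agree at every sample point
```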

Furthermore, we have $y(a) = y(-a) = b$ and therefore we require

$\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(-\frac{\delta a}{C} + const. \big)$

But

$\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \cosh(const.) + \sinh\big(\frac{\delta a}{C}\big) \sinh(const.)$

and

$\cosh\big(-\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \cosh(const.) - \sinh\big(\frac{\delta a}{C}\big) \sinh(const.)$

These cannot be equal unless $\sinh(const.) = 0 \implies const. = 0$. Thus, our solution for $y$ reduces to

$y = \frac{C \cosh\big(\frac{\delta x}{C}\big) - 1}{\delta}$

with the constant $C$ determined in terms of $a$ and $b$ by

$b = \frac{C \cosh\big(\frac{\delta a}{C}\big) - 1}{\delta}$

This is the equation of the curved path of the light ray from the sky in Feynman’s diagram. The slope of $y$ at point $B = (-a, b)$ is

$y^{\prime}(-a) = -\sinh\big(\frac{\delta a}{C}\big)$

The straight line with this gradient passing through the point $B$ has equation

$y = \big(b - a \sinh\big(\frac{\delta a}{C}\big)\big) - \sinh\big(\frac{\delta a}{C}\big)x$

This is the equation of the straight line emanating from the $x$-axis to the observer’s eye in my amended diagram above. On the $x$-axis we have $y = 0$ in the straight-line equation so

$x = \frac{b}{\sinh\big(\frac{\delta a}{C}\big)} - a$
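To make this concrete, the two conditions can be solved numerically; in the sketch below all parameter values ($a$, $b$, $\delta$) are assumed purely for illustration, $C$ is found by bisection from $b = \big(C \cosh(\delta a / C) - 1\big)/\delta$, and the mirage point then follows from the formula above.

```python
import math

# Worked numeric example (all parameter values assumed): eye/sky height
# b = 1.6 m, half-separation a = 100 m, index gradient ratio delta = 1e-4.
a, b, delta = 100.0, 1.6, 1e-4

def f(C):
    # residual of the condition b = (C*cosh(delta*a/C) - 1)/delta
    return (C*math.cosh(delta*a/C) - 1)/delta - b

lo, hi = 1.0, 2.0            # f(lo) < 0 < f(hi) for these parameters
for _ in range(60):          # bisection
    mid = 0.5*(lo + hi)
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
C = 0.5*(lo + hi)

x_mirage = b/math.sinh(delta*a/C) - a
print(C, x_mirage)   # mirage appears on the road ahead of the observer
```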

This is the point on the $x$-axis at which the observer in my amended diagram will see the mirage.

# Notes on Sturm-Liouville theory

Sturm-Liouville theory was developed in the 19th century in the context of solving differential equations. When one studies it in depth, however, one experiences a sudden realisation that this is the mathematics underlying a lot of quantum mechanics. In quantum mechanics we envisage a quantum state (a time-dependent function) expressed as a superposition of eigenfunctions of a self-adjoint operator (usually referred to as a Hermitian operator) representing an observable. The coefficients of the eigenfunctions in this superposition are probability amplitudes. A measurement of the observable quantity represented by the Hermitian operator produces one of the eigenvalues of the operator with a probability equal to the square of the probability amplitude attached to the eigenfunction corresponding to that eigenvalue in the superposition. It is the fact that the operator is self-adjoint that ensures the eigenvalues are real (and thus observable), and furthermore, that the eigenfunctions corresponding to the eigenvalues form a complete and orthogonal set of functions enabling quantum states to be represented as a superposition in the first place (i.e., an eigenfunction expansion akin to a Fourier series). The Sturm-Liouville theory of the 19th century has essentially this same structure and in fact Sturm-Liouville eigenvalue problems are important more generally in mathematical physics precisely because they frequently arise in attempting to solve commonly-encountered partial differential equations (e.g., Poisson’s equation, the diffusion equation, the wave equation, etc.), particularly when the method of separation of variables is employed.

I want to get an overview of Sturm-Liouville theory in the present note and will begin by considering a nice discussion of a vibrating string problem in Courant & Hilbert’s classic text, Methods of Mathematical Physics (Volume I). Although the problem is simple and the treatment in Courant & Hilbert a bit terse, it (remarkably) brings up a lot of the key features of Sturm-Liouville theory which apply more generally in a wide variety of physics problems. I will then consider Sturm-Liouville theory in a more general setting emphasising the role of the Sturm-Liouville differential operator, and finally I will illustrate further the occurrence of Sturm-Liouville systems in physics by looking at the eigenvalue problems encountered when solving Schrödinger’s equation for the hydrogen atom.

On page 287 of Volume I, Courant & Hilbert write the following:

Equation (12) here is the one-dimensional wave equation

$\frac{\partial^2 u}{\partial x^2} = \mu^2 \frac{\partial^2 u}{\partial t^2}$

which (as usual) the authors are going to solve by using a separation of variables of the form

$u(x, t) = v(x) g(t)$

As Courant & Hilbert explain, the problem then involves finding the function $v(x)$ by solving the second-order homogeneous linear differential equation

$\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} + \lambda v = 0$

subject to the boundary conditions

$v(0) = v(\pi) = 0$

Although not explicitly mentioned by Courant & Hilbert at this stage, equations (13) and (13a) in fact constitute a full-blown Sturm-Liouville eigenvalue problem. Despite being very simple, this setup captures many of the typical features encountered in a wide variety of such problems in physics. It is instructive to explore the text underneath equation (13a):

Not all these requirements can be fulfilled for arbitrary values of the constant $\lambda$.

… the boundary conditions can be fulfilled if and only if $\lambda = n^2$ is the square of an integer $n$.

To clarify this, we can try to solve (13) and (13a) for the three possible cases: $\lambda < 0$, $\lambda = 0$ and $\lambda > 0$. Suppose first that $\lambda < 0$. Then $-\lambda > 0$ and the auxiliary equation for (13) is

$D^2 = - \lambda$

$\implies$

$D = \pm \sqrt{- \lambda}$

Thus, we can write the general solution of (13) in this case as

$v = \alpha e^{\sqrt{-\lambda} x} + \beta e^{-\sqrt{-\lambda} x} = A \mathrm{cosh} \big(\sqrt{-\lambda} x\big) + B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)$

where $A$ and $B$ are constants to be determined from the boundary conditions. From the boundary condition $v(0) = 0$ we conclude that $A = 0$ so the equation reduces to

$v = B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)$

But from the boundary condition $v(\pi) = 0$ we are forced to conclude that $B = 0$ since $\mathrm{sinh} \big(\sqrt{-\lambda} \pi\big) \neq 0$. Therefore there is only the trivial solution $v(x) = 0$ in the case $\lambda < 0$.

Next, suppose that $\lambda = 0$. Then equation (13) reduces to

$\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} = 0$

$\implies$

$v = A + Bx$

From the boundary condition $v(0) = 0$ we must conclude that $A = 0$, and the boundary condition $v(\pi) = 0$ means we are also forced to conclude that $B = 0$. Thus, again, there is only the trivial solution $v(x) = 0$ in the case $\lambda = 0$.

We see that nontrivial solutions can only be obtained when $\lambda > 0$. In this case we have $-\lambda < 0$ and the auxiliary equation is

$D^2 = - \lambda$

$\implies$

$D = \pm i \sqrt{\lambda}$

Thus, we can write the general solution of (13) in this case as

$v = \alpha e^{i \sqrt{\lambda} x} + \beta e^{- i \sqrt{\lambda} x} = A \mathrm{cos} \big(\sqrt{\lambda} x\big) + B \mathrm{sin} \big(\sqrt{\lambda} x\big)$

where $A$ and $B$ are again to be determined from the boundary conditions. From the boundary condition $v(0) = 0$ we conclude that $A = 0$ so the equation reduces to

$v = B \mathrm{sin} \big(\sqrt{\lambda} x\big)$

But from the boundary condition $v(\pi) = 0$ we must conclude that, if $B \neq 0$, then we must have $\sqrt{\lambda} = n$ where $n = 1, 2, 3, \ldots$. Thus, we find that for each $n = 1, 2, 3, \ldots$, the eigenvalues of this Sturm-Liouville problem are $\lambda_n = n^2$, and the corresponding eigenfunctions are $v = B \mathrm{sin}\big(n x\big)$. The coefficient $B$ is undetermined and must be specified through some normalisation process, for example by setting the integral of $v^2$ between $0$ and $\pi$ equal to $1$ and then finding the value of $B$ that is consistent with this. In Courant & Hilbert they have (implicitly) simply set $B = 1$.
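This eigenvalue computation can be verified symbolically; the sketch below (using sympy) checks that $v = \sin(nx)$ satisfies $v^{\prime\prime} + n^2 v = 0$ and both boundary conditions for any positive integer $n$.

```python
import sympy as sp

# Check that v = sin(n*x) solves v'' + lambda*v = 0 with lambda = n^2
# and satisfies v(0) = v(pi) = 0 for positive integer n.
x = sp.symbols('x')
n = sp.symbols('n', integer=True, positive=True)
v = sp.sin(n*x)
residual = sp.simplify(sp.diff(v, x, 2) + n**2*v)
print(residual, v.subs(x, 0), v.subs(x, sp.pi))   # all zero
```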

Some features of this solution are typical of Sturm-Liouville eigenvalue problems in physics more generally. For example, the eigenvalues are real (rather than complex) numbers, there is a minimum eigenvalue ($\lambda_1 = 1$) but not a maximum one, and for each eigenvalue there is a unique eigenfunction (up to a multiplicative constant). Also, importantly, the eigenfunctions here form a complete and orthogonal set of functions. Orthogonality refers to the fact that the integral of a product of any two distinct eigenfunctions over the interval $(0, \pi)$ is zero, i.e.,

$\int_0^{\pi} \mathrm{sin}(nx) \mathrm{sin}(mx) \mathrm{d} x = 0$

for $n \neq m$, as can easily be demonstrated in the same way as in the theory of Fourier series. Completeness refers to the fact that over the interval $(0, \pi)$ the infinite set of functions $\mathrm{sin} (nx)$, $n = 1, 2, 3, \ldots$, can be used to represent any sufficiently well behaved function $f(x)$ using a Fourier series of the form

$f(x) = \sum_{n=1}^{\infty} a_n \mathrm{sin} (nx)$
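The orthogonality relation is easy to confirm numerically for the sine eigenfunctions; the sketch below approximates $\int_0^{\pi} \mathrm{sin}(nx)\,\mathrm{sin}(mx)\,\mathrm{d}x$ by a midpoint rule (grid size assumed) and recovers $0$ for $n \neq m$, and $\pi/2$ for $n = m$.

```python
import math

# Midpoint-rule approximation of the inner product of sin(n*x) and sin(m*x)
# on (0, pi); N is an assumed grid size.
def inner(n, m, N=20000):
    h = math.pi/N
    return h*sum(math.sin(n*(i + 0.5)*h)*math.sin(m*(i + 0.5)*h)
                 for i in range(N))

print(inner(2, 3))   # ~0, orthogonality for n != m
print(inner(4, 4))   # ~pi/2, the normalisation integral for n = m
```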

All of this is alluded to (without explicit explanation at this stage) in the subsequent part of this section of Courant & Hilbert’s text, where they go on to provide the general solution of the vibrating string problem. They write the following:

The properties of completeness and orthogonality of the eigenfunctions are again a typical feature of the solutions of Sturm-Liouville eigenvalue problems more generally, and this is one of the main reasons why Sturm-Liouville theory is so important to the solution of physical problems involving differential equations. To get a better understanding of this, I will now develop Sturm-Liouville theory in a more general setting by starting with a standard second-order homogeneous linear differential equation of the form

$\alpha(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \beta(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \gamma(x) y = 0$

where the variable $x$ is confined to an interval $a \leq x \leq b$.

Let

$p(x) = \mathrm{exp} \bigg(\int \mathrm{d} x \frac{\beta(x)}{\alpha(x)}\bigg)$

$q(x) = \frac{\gamma(x)}{\alpha(x)} p(x)$

Dividing the differential equation by $\alpha(x)$ and multiplying through by $p(x)$ we get

$p(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \frac{\beta(x)}{\alpha(x)} p(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \frac{\gamma(x)}{\alpha(x)} p(x) y = 0$

$\iff$

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} y}{\mathrm{d} x} \bigg) + q(x) y = 0$

$\iff$

$L y = 0$

where

$L = \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} }{\mathrm{d} x} \bigg) + q(x)$

is called the Sturm-Liouville differential operator. Thus, we see already that a wide variety of second-order differential equations encountered in physics will be able to be put into a form involving the operator $L$, so results concerning the properties of $L$ will have wide applicability.
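The reduction to Sturm-Liouville form can be checked symbolically for a concrete choice of coefficients; in the sketch below the coefficient functions $\alpha$, $\beta$, $\gamma$ are assumed purely for illustration.

```python
import sympy as sp

# Verify that with p = exp(integral of beta/alpha) and q = (gamma/alpha)*p,
# the Sturm-Liouville form d/dx(p y') + q y equals (p/alpha) times the
# original equation's left-hand side (sample alpha, beta, gamma assumed).
x = sp.symbols('x')
y = sp.Function('y')(x)
alpha, beta, gamma = 1 + x**2, 2*x, sp.Integer(3)
p = sp.exp(sp.integrate(beta/alpha, x))      # here p = 1 + x^2
q = (gamma/alpha)*p
sl_form = sp.diff(p*sp.diff(y, x), x) + q*y
original = alpha*sp.diff(y, x, 2) + beta*sp.diff(y, x) + gamma*y
print(sp.simplify(sl_form - (p/alpha)*original))   # 0
```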

Using the Sturm-Liouville operator we can now write the defining differential equation of Sturm-Liouville theory in an eigenvalue-eigenfunction format that is very reminiscent of the setup in quantum mechanics outlined at the start of this note. The defining differential equation is

$L \phi = - \lambda w \phi$

where $w(x)$ is a real-valued positive weight function and $\lambda$ is an eigenvalue corresponding to the eigenfunction $\phi$. This differential equation is often written out in full as

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0$

with $x \in [a, b]$. In Sturm-Liouville problems, the functions $p(x)$, $q(x)$ and $w(x)$ are specified at the start and, crucially, the function $\phi$ is required to satisfy particular boundary conditions at $a$ and $b$. The boundary conditions are a key aspect of each Sturm-Liouville problem; for a given form of the differential equation, different boundary conditions can produce very different problems. Solving a Sturm-Liouville problem involves finding the values of $\lambda$ for which there exist non-trivial solutions of the defining differential equation above subject to the specified boundary conditions. The vibrating string problem in Courant & Hilbert (discussed above) is a simple example. We obtain the differential equation (13) in that problem by setting $p(x) = 1$, $q(x) = 0$ and $w(x) = 1$ in the defining Sturm-Liouville differential equation.

We would now like to prove that the eigenvalues in a Sturm-Liouville problem will always be real and that the eigenfunctions will form an orthogonal set of functions, as claimed earlier. To do this, we need to consider a few more developments. In Sturm-Liouville theory we can apply $L$ to both real and complex functions, and a key role is played by the concept of the inner product of such functions. Using the notation $f(x)^{*}$ to denote the complex conjugate of the function $f(x)$, we define the inner product of two functions $f$ and $g$ over the interval $a \leq x \leq b$ as

$(f, g) = \int_a^b \mathrm{d} x f(x)^{*} g(x)$

and we define the weighted inner product as

$(f, g)_w = \int_a^b \mathrm{d} x w(x) f(x)^{*} g(x)$

where $w(x)$ is the real-valued positive weight function mentioned earlier. A key result in the theory is Lagrange’s identity, which says that for any two complex-valued functions of a real variable $u(x)$ and $v(x)$, we have

$v(Lu)^{*} - u^{*} Lv = \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]$

This follows from the form of $L$, since

$v(Lu)^{*} - u^{*} Lv = v\bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + q(x) u^{*}\bigg] - u^{*} \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) + q(x) v\bigg]$

$= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)$

$= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + \frac{\mathrm{d} v}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) - \frac{\mathrm{d} u^{*}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)$

$= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) v \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) u^{*} \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)$

$= \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]$
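Lagrange’s identity can also be verified symbolically; the sketch below treats $u$, $v$, $p$ and $q$ as arbitrary real-valued functions, so the complex conjugates drop out.

```python
import sympy as sp

# Symbolic check of Lagrange's identity for real-valued functions,
# where L f = d/dx(p f') + q f.
x = sp.symbols('x')
p, q = sp.Function('p')(x), sp.Function('q')(x)
u, v = sp.Function('u')(x), sp.Function('v')(x)

def L(f):
    return sp.diff(p*sp.diff(f, x), x) + q*f

lhs = v*L(u) - u*L(v)
rhs = sp.diff(p*(v*sp.diff(u, x) - u*sp.diff(v, x)), x)
print(sp.simplify(lhs - rhs))   # 0
```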

Using the inner product notation, we can write Lagrange’s identity in an alternative form that reveals the crucial role played by the boundary conditions in a Sturm-Liouville problem. We have

$(Lu, v) - (u, Lv) = \int_a^b (Lu)^{*} v \mathrm{d} x - \int_a^b u^{*} Lv \mathrm{d} x$

$= \int_a^b \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg] \mathrm{d} x$

$= \int_a^b \mathrm{d} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]$

$= \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b$

For some boundary conditions the final term here is zero and then we will have

$(Lu, v) = (u, Lv)$

When this happens, the operator in conjunction with the boundary conditions is said to be self-adjoint. As an example, a so-called regular Sturm-Liouville problem involves solving the differential equation

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0$

subject to what are called separated boundary conditions, taking the form

$A_1 \phi(a) + A_2 \phi^{\prime}(a) = 0$

and

$B_1 \phi(b) + B_2 \phi^{\prime}(b) = 0$

In this case, the operator $L$ is self-adjoint. To see this, suppose the functions $u$ and $v$ satisfy these boundary conditions. Then at $a$ we have

$A_1 u(a)^{*} + A_2 u^{\prime}(a)^{*} = 0$

and

$A_1 v(a) + A_2 v^{\prime}(a) = 0$

from which we can deduce that

$\frac{u^{\prime}(a)^{*}}{u(a)^{*}} = -\frac{A_1}{A_2} = \frac{v^{\prime}(a)}{v(a)}$

$\implies$

$v(a) u^{\prime}(a)^{*} = u(a)^{*} v^{\prime}(a)$

Similarly, at the boundary point $b$ we find that

$v(b) u^{\prime}(b)^{*} = u(b)^{*} v^{\prime}(b)$

These results then imply

$\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0$

so the operator $L$ is self-adjoint as claimed. As another example, a singular Sturm-Liouville problem involves solving the same differential equation as in the regular problem, but with $p(x)$ vanishing at $a$, at $b$, or at both, while remaining positive for $a < x < b$; at a singular endpoint the solution $\phi$ and its derivative are required to remain bounded, and if $p(x)$ does not vanish at one of the boundary points, then $\phi$ is required to satisfy the same boundary condition at that point as in the regular problem. We will then again have

$\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0$

in this case too, so the operator $L$ will also be self-adjoint in the case of a singular Sturm-Liouville problem. As a final example, suppose the Sturm-Liouville problem involves solving the same differential equation as before, but with periodic boundary conditions of the form

$\phi(a) = \phi(b)$

$\phi^{\prime}(a) = \phi^{\prime}(b)$

and

$p(a) = p(b)$

Then if $u$ and $v$ are two functions satisfying these boundary conditions we will have

$\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b$

$= p(b) \bigg(v(b) u^{\prime}(b)^{*} - u(b)^{*} v^{\prime}(b)\bigg) - p(a) \bigg(v(a) u^{\prime}(a)^{*} - u(a)^{*} v^{\prime}(a)\bigg)$

$= p(a) \bigg[\bigg(v(b) u^{\prime}(b)^{*} - v(a) u^{\prime}(a)^{*}\bigg) + \bigg(u(a)^{*} v^{\prime}(a) - u(b)^{*} v^{\prime}(b)\bigg)\bigg] = 0$

So again, the operator $L$ will be self-adjoint in the case of periodic boundary conditions. We will see later that the singular and periodic cases arise when attempting to solve Schrödinger’s equation for the hydrogen atom.

The key reason for focusing so much on the self-adjoint property of the operator $L$ is that the eigenvalues of a self-adjoint operator are always real, and the eigenfunctions are orthogonal. Note that by orthogonality of the eigenfunctions in the more general context we mean that

$(\phi_n, \phi_m)_w = \int_a^b \mathrm{d} x w(x) \phi_n(x)^{*} \phi_m(x) = 0$

whenever $\phi_n(x)$ and $\phi_m(x)$ are eigenfunctions corresponding to two distinct eigenvalues.

To prove that the eigenvalues are always real, suppose that $\phi(x)$ is an eigenfunction corresponding to an eigenvalue $\lambda$. Then we have

$L \phi = - \lambda w \phi$

and so

$(L \phi, \phi) = (- \lambda w \phi, \phi) = \int_a^b (- \lambda w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*} \int_a^b (w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*}\int_a^b \mathrm{d}x w(x)|\phi(x)|^2$

But we also have

$(\phi, L \phi) = (\phi, - \lambda w \phi) = \int_a^b \phi^{*}(- \lambda w \phi) \mathrm{d} x = -\lambda \int_a^b \phi^{*} (w \phi) \mathrm{d} x = -\lambda\int_a^b \mathrm{d}x w(x)|\phi(x)|^2$

Therefore if the operator is self-adjoint we can write

$(L \phi, \phi) - (\phi, L \phi) = (\lambda - \lambda^{*}) \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 = 0$

$\implies$

$\lambda = \lambda^{*}$

since $\int_a^b \mathrm{d}x w(x)|\phi(x)|^2 > 0$, so the eigenvalues must be real. In particular, this must be the case for regular and singular Sturm-Liouville problems, and for Sturm-Liouville problems involving periodic boundary conditions.

To prove that the eigenfunctions are orthogonal, let $\phi(x)$ and $\psi(x)$ denote two eigenfunctions corresponding to distinct eigenvalues $\lambda$ and $\mu$ respectively. Then we have

$L \phi = - \lambda w \phi$

$L \psi = - \mu w \psi$

and so by the self-adjoint property we can write

$(L \phi, \psi) - (\phi, L \psi) = \int_a^b (- \lambda w \phi)^{*} \psi \mathrm{d} x - \int_a^b \phi^{*} (- \mu w \psi) \mathrm{d} x$

$= (\mu - \lambda) \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0$

Since the eigenvalues are distinct, the only way this can happen is if

$(\phi, \psi)_w = \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0$

so the eigenfunctions must be orthogonal as claimed.

In addition to being orthogonal, the eigenfunctions $\phi_n(x)$, $n = 1, 2, 3, \dots$, of a Sturm-Liouville problem with specified boundary conditions also form a complete set of functions (I will not prove this here), which means that any sufficiently well-behaved function $f(x)$ for which $\int_a^b\mathrm{d} x |f(x)|^2$ exists can be represented by a Fourier series of the form

$f(x) = \sum_{n=1}^{\infty} a_n \phi_n(x)$

for $x \in [a, b]$, where the coefficients $a_n$ are given by the formula

$a_n = \frac{(\phi_n, f)_w}{(\phi_n, \phi_n)_w} = \frac{\int_a^b \mathrm{d}x w(x) \phi_n(x)^{*} f(x)}{\int_a^b \mathrm{d}x w(x) |\phi_n(x)|^2}$
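As a concrete illustration (with the function $f$, the interval and the grid size all assumed for the example), the sketch below expands $f(x) = x(\pi - x)$ in the eigenfunctions $\mathrm{sin}(nx)$ of the vibrating-string problem, where $w = 1$, and checks the partial sum against $f$ at a sample point.

```python
import math

# Eigenfunction (Fourier sine) expansion of f(x) = x*(pi - x) on (0, pi),
# using a_n = (phi_n, f)_w / (phi_n, phi_n)_w with w = 1 and a midpoint rule.
def integrate(g, N=4000):
    h = math.pi/N
    return h*sum(g((i + 0.5)*h) for i in range(N))

f = lambda x: x*(math.pi - x)
coeffs = []
for n in range(1, 20):
    num = integrate(lambda x: math.sin(n*x)*f(x))
    den = integrate(lambda x: math.sin(n*x)**2)      # equals pi/2
    coeffs.append(num/den)

x0 = 1.0
partial_sum = sum(a*math.sin((n + 1)*x0) for n, a in enumerate(coeffs))
print(partial_sum, f(x0))   # the partial sum is close to f(x0)
```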

It is the completeness and orthogonality of the eigenfunctions that makes Sturm-Liouville theory so useful in solving linear differential equations, because (for example) it means that the solutions of many second-order inhomogeneous linear differential equations of the form

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)$

with suitable boundary conditions can be expressed as a linear combination of the eigenfunctions of the corresponding Sturm-Liouville problem

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0$

with the same boundary conditions. To illustrate this, suppose this Sturm-Liouville problem with boundary conditions $\phi(a) = \phi(b) = 0$ has an infinite set of eigenvalues $\lambda_k$ and corresponding eigenfunctions $\phi_k(x)$, $k = 1, 2, 3, \dots$, which are orthogonal and form a complete set. We will assume that the solution of the inhomogeneous differential equation above is an infinite series of the form

$\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)$

where the coefficients $a_k$ are constants, and we will find these coefficients using the orthogonality of the eigenfunctions. Since for each $k$ it is true that

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k = - \lambda_k w(x) \phi_k$

we can write

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q \phi$

$= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \sum_{k = 1}^{\infty} a_k \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \sum_{k=1}^{\infty} a_k \phi_k$

$= \sum_{k=1}^{\infty} a_k \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k\bigg]$

$= \sum_{k=1}^{\infty} a_k\big[- \lambda_k w(x) \phi_k\big]$

$= - \sum_{k=1}^{\infty} a_k \lambda_k w(x) \phi_k$

Thus, in the inhomogeneous equation

$\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)$

we can put
(renaming the variable as $u$ to distinguish it from the $x$ at which the solution will later be evaluated)

$F(u) = - \sum_{k=1}^{\infty} a_k \lambda_k w(u) \phi_k(u)$

To find the $m$th coefficient $a_m$ we can multiply both sides by $\phi_m(u)^{*}$ and integrate over $u$ from $a$ to $b$. By orthogonality, all the terms in the sum on the right will vanish except the one involving $\phi_m(u)$. We will get

$\int_a^b \phi_m(u)^{*} F(u) \mathrm{d}u = - \int_a^b a_m \lambda_m w(u) \phi_m(u)^{*}\phi_m(u) \mathrm{d} u = -a_m \lambda_m (\phi_m, \phi_m)_w$

$\implies$

$a_m = -\int_a^b \frac{\phi_m(u)^{*} F(u)}{\lambda_m (\phi_m, \phi_m)_w}\mathrm{d} u$

Having found a formula for the coefficients $a_k$, we can now write the solution of the original inhomogeneous differential equation as

$\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)$

$= \sum_{k = 1}^{\infty} \bigg(-\int_a^b \frac{\phi_k(u)^{*} F(u)}{\lambda_k (\phi_k, \phi_k)_w}\mathrm{d} u\bigg) \phi_k(x)$

$= \int_a^b \mathrm{d} u \bigg(-\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}\bigg) F(u)$

$= \int_a^b \mathrm{d} u G(x, u) F(u)$

where

$G(x, u) \equiv -\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}$
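The eigenfunction-series solution can be checked on a concrete case. The following sketch assumes (my own test problem, not from the note) $p = 1$, $q = 0$, $w = 1$ on $[0, \pi]$ with $\phi(0) = \phi(\pi) = 0$, so that $\phi_k = \sin(kx)$, $\lambda_k = k^2$ and $(\phi_k, \phi_k)_w = \pi/2$, and solves $\phi^{\prime\prime} = F(x)$ with $F(x) = x$:

```python
import numpy as np

# Sketch: solve phi'' = F(x) on [0, pi] with phi(0) = phi(pi) = 0 via the
# eigenfunction series, using a_k = -(phi_k, F)_w / (lambda_k (phi_k, phi_k)_w).
x = np.linspace(0.0, np.pi, 4001)
dx = x[1] - x[0]
trap = lambda f: (f.sum() - 0.5 * (f[0] + f[-1])) * dx  # trapezoid rule

F = x                          # right-hand side F(x) = x
phi = np.zeros_like(x)
for k in range(1, 201):        # truncated series, 200 terms
    phi_k = np.sin(k * x)
    a_k = -trap(phi_k * F) / (k**2 * (np.pi / 2))
    phi += a_k * phi_k

# Direct integration of phi'' = x with the same boundary conditions:
exact = x**3 / 6 - np.pi**2 * x / 6
print(np.max(np.abs(phi - exact)))  # small truncation error
```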

To conclude this note, I want to go back to a previous note in which I explored in detail the solution of Schrödinger’s equation for the hydrogen atom by the method of separation of variables. This approach reduced Schrödinger’s partial differential equation into a set of three uncoupled ordinary differential equations which we can now see are in fact Sturm-Liouville problems. As discussed in my previous note, Schrödinger’s three-dimensional equation for the hydrogen atom can be written in spherical polar coordinates as

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

and after solving this by the usual separation of variables approach starting from the assumption that the $\psi$ function can be expressed as a product

$\psi(r, \theta, \phi) = R(r) \Phi(\phi) \Theta(\theta)$

we end up with an equation for $R$ (the radial equation) of the form

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

and equations for $\Phi$ and $\Theta$ of the forms

$\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0$

and

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0$

respectively. Taking each of these in turn, we first observe that the radial equation is of the Sturm-Liouville form with $p(r) = r^2$ and eigenvalues corresponding to the energy term $E$ in the equation. The variable $r$ can range between $0$ and $\infty$ and the boundary conditions are formulated in such a way that the solutions of the radial equation remain bounded as $r \rightarrow 0$ and go to zero as $r \rightarrow \infty$. Furthermore, since $p(0) = 0$, the radial equation is a singular Sturm-Liouville problem. Next, we observe that the equation for $\Phi$ is essentially the same as equation (13) for the vibrating string in the extract from Courant & Hilbert discussed at the start of this note. The azimuth angle $\phi$ can take any value in $(-\infty, \infty)$ but the function $\Phi$ must take a single value at each point in space (since this is a required property of the quantum wave function of which $\Phi$ is a constituent). It follows that $\Phi$ must be periodic, since it must take the same value at $\phi$ and $\phi + 2\pi$ for any given $\phi$. This implies the conditions $\Phi(0) = \Phi(2 \pi)$ and $\Phi^{\prime}(0) = \Phi^{\prime}(2\pi)$. Furthermore, we have $p(\phi) = 1$ for all $\phi$. Thus, the equation for $\Phi$ is a Sturm-Liouville problem with periodic boundary conditions. Finally, as discussed in my previous note, the $\Theta$ equation can be rewritten (writing the separation constant as $k = m^2$) as

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

$\iff$

$\frac{d}{d x}\bigg((1 - x^2) \frac{d \Theta}{d x}\bigg) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

where $x = \cos \theta$ and thus $-1 \leq x \leq 1$. This is a Sturm-Liouville problem with $p(x) = 1 - x^2$ and eigenvalue $\lambda$, and the boundary conditions are given by the requirement that $\Theta$ should remain bounded for all $x \in [-1, 1]$. Since $p(x) = 0$ at both ends of the interval $[-1, 1]$, this equation is also classified as a singular Sturm-Liouville problem.
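With $m = 0$ this is the Legendre equation, whose bounded solutions are the Legendre polynomials $P_l(x)$ with $\lambda = l(l+1)$. As a quick numerical sanity check (my own sketch, not from the note), we can verify by finite differences that $P_2(x) = (3x^2 - 1)/2$ satisfies the equation with $\lambda = 6$:

```python
import numpy as np

# Sketch: check that P_2(x) = (3x^2 - 1)/2 satisfies
# d/dx((1 - x^2) dTheta/dx) + 6 Theta = 0, i.e. the m = 0 case with
# lambda = l(l+1) = 6, using finite differences away from the endpoints.
x = np.linspace(-0.9, 0.9, 1801)      # stay away from the singular endpoints
h = x[1] - x[0]
theta = 0.5 * (3 * x**2 - 1)          # P_2(x)
dtheta = np.gradient(theta, h)        # d Theta / dx
lhs = np.gradient((1 - x**2) * dtheta, h) + 6 * theta
print(np.max(np.abs(lhs[2:-2])))      # residual should be near zero
```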

# A note on Green’s theorem in the plane and finding areas enclosed by parametric curves

Green’s theorem in the plane says that if $P$, $Q$, $\partial P/\partial y$ and $\partial Q/\partial x$ are single-valued and continuous in a simply connected region $\mathfrak{R}$ bounded by a simple closed curve $C$, then

$\oint_C P \mathrm{d}x + Q \mathrm{d}y = \iint_{\mathfrak{R}}\big(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\big) \mathrm{d}x \mathrm{d}y$

where the line integral along $C$ is in the anti-clockwise direction, as shown in the sketch. The theorem allows one to replace a double integral over the region $\mathfrak{R}$ by a line integral around the boundary curve $C$, or vice versa, whichever is the easier one to evaluate. It acts as a template for generating a multitude of useful formulas of this kind that can be tailored to suit particular situations by carefully choosing the forms of the functions $P$ and $Q$. For example, in the context of integration of vector functions, if we write a two-dimensional vector $V$ in unit-vector form as $V = i V_x + j V_y$, then putting $Q = V_x$ and $P = - V_y$ in Green’s theorem gives the divergence theorem in two dimensions, whereas putting $Q = V_y$ and $P = V_x$ gives Stokes’ theorem in two dimensions. Another result like this is obtained by putting $Q = x$ and $P = -y$ in Green’s theorem. This yields a formula for calculating the area enclosed by the simple closed curve $C$ in the sketch above:

$\oint_C x\mathrm{d}y - y \mathrm{d}x = \iint_{\mathfrak{R}} \big(\frac{\partial x}{\partial x} - \frac{\partial (-y)}{\partial y}\big) \mathrm{d}x \mathrm{d}y = 2 \iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y$

$\implies$

$\iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y = \frac{1}{2} \oint_C x\mathrm{d}y - y \mathrm{d}x$
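In discrete form this boundary-integral area formula is just the classic shoelace formula for polygons; here is a minimal sketch (my own illustration, not from the original note):

```python
# Sketch: for a polygon traversed anti-clockwise, the boundary integral
# (1/2) * oint (x dy - y dx) evaluated edge by edge reduces to the
# shoelace formula.
def shoelace_area(vertices):
    """Signed area of a polygon given as a list of (x, y) vertices."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - y1 * x2     # exact value of (x dy - y dx) on one edge
    return 0.5 * s

print(shoelace_area([(0, 0), (2, 0), (2, 1), (0, 1)]))  # 2x1 rectangle -> 2.0
```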

In the present note I want to quickly record a couple of observations about how this last result extends to cases in which the relevant curve $C$ is defined parametrically rather than in terms of Cartesian coordinates.

First, the result can easily be adapted to obtain a very useful formula for finding the areas of closed parametric curves. If the curve $C$ is defined parametrically by $x(t)$ and $y(t)$, then simply changing variables from $x$ and $y$ to $t$ in the formula above gives

$\iint_{\mathfrak{R}} \mathrm{d}x \mathrm{d}y = \frac{1}{2} \oint_{C} x \mathrm{d}y - y \mathrm{d}x = \frac{1}{2} \oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t$

The expression on the right-hand side can immediately be applied to a huge range of problems involving finding areas of closed parametric curves. As a simple initial illustration to check that it works, we can use it to confirm that the area of a circle of radius $r$ is $\pi r^2$. The circle is described by the Cartesian equation $x^2 + y^2 = r^2$ but has a parametric representation $x = r\cos (t)$, $y = r \sin (t)$, with $t$ ranging from $0$ to $2 \pi$. Therefore $\dot{x} = -r\sin (t)$ and $\dot{y} = r\cos (t)$. Putting these into the formula we get

$\frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_{t=0}^{2 \pi} (r^2\cos^2(t) + r^2\sin^2(t)) \mathrm{d}t = \frac{r^2}{2} \int_{t=0}^{2 \pi} \mathrm{d} t = \pi r^2$

as expected.

As a slightly more interesting example, we can find the area of the main cardioid in the Mandelbrot set. This has parametric representation $x = \frac{1}{2} \cos(t) - \frac{1}{4} \cos(2t)$, $y = \frac{1}{2} \sin(t) - \frac{1}{4} \sin(2t)$ with $t$ ranging from $0$ to $2 \pi$ (see, e.g., Weisstein, Eric W., Mandelbrot set, From MathWorld – A Wolfram Web Resource). We find that $\dot{x} = -\frac{1}{2} \sin(t) + \frac{1}{2} \sin(2t)$ and $\dot{y} = \frac{1}{2} \cos(t) - \frac{1}{2} \cos(2t)$, and therefore

$x\dot{y} = \frac{1}{4} \cos^2(t) - \frac{3}{8} \cos(t) \cos(2t) + \frac{1}{8}\cos^2(2t)$

$y\dot{x} = -\frac{1}{4} \sin^2(t) + \frac{3}{8} \sin(t)\sin(2t) - \frac{1}{8}\sin^2(2t)$

$x\dot{y} - y\dot{x} = \frac{3}{8} - \frac{3}{8}\cos(t)$

Putting this into the formula we find that the area of the main cardioid in the Mandelbrot set is

$\frac{1}{2}\oint_C (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_{t=0}^{2 \pi} \big(\frac{3}{8} - \frac{3}{8}\cos(t)\big) \mathrm{d}t = \frac{3\pi}{8}$
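The value $3\pi/8$ can be corroborated numerically by evaluating the closed-curve formula with quadrature; a minimal sketch (my own check):

```python
import numpy as np

# Sketch: evaluate (1/2) * oint (x y' - y x') dt numerically for the
# main-cardioid parametrisation and compare with 3*pi/8.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
x = 0.5 * np.cos(t) - 0.25 * np.cos(2 * t)
y = 0.5 * np.sin(t) - 0.25 * np.sin(2 * t)
xdot = -0.5 * np.sin(t) + 0.5 * np.sin(2 * t)
ydot = 0.5 * np.cos(t) - 0.5 * np.cos(2 * t)

integrand = x * ydot - y * xdot
dt = t[1] - t[0]
# Trapezoid rule for (1/2) * integral of the integrand over one period:
area = 0.5 * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dt
print(area, 3 * np.pi / 8)  # the two values agree
```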

The second observation I want to make here is that we can sometimes use the same formula to find the area of a region by integrating along a parametric curve that is not closed. This seems surprising at first because we obtained the formula using Green’s theorem which explicitly requires the curve $C$ to be closed. As an illustration of this situation, consider the problem of finding the area $A$ in the diagram. The arc joining the two points $(x_1, y_1)$ and $(x_2, y_2)$ is assumed to have parametric representation $x = f(t)$, $y = g(t)$, such that $(x_1, y_1) = (f(t_1), g(t_1))$ and $(x_2, y_2) = (f(t_2), g(t_2))$, with $t_2 > t_1$. The claim is then that the area $A$ in the diagram is given by the same formula as before, but applied only along the arc joining $(x_1, y_1)$ and $(x_2, y_2)$ rather than all the way around the enclosed region. Thus, we are claiming that

$A = \frac{1}{2} \int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t$

To prove this we first note that since $x = f(t)$, and assuming $f$ is invertible along the arc, we can write $t = f^{-1}(x)$, so $y$ can be written as a function of $x$ as $y = g(f^{-1}(x))$. The area under the arc joining $(x_1, y_1)$ and $(x_2, y_2)$ is then given by

$\int_{x_1}^{x_2} g(f^{-1}(x)) \mathrm{d}x$

Changing variables in this integral from $x$ to $t$ we find that $t_1 = f^{-1}(x_1)$, $t_2 = f^{-1}(x_2)$, $g(f^{-1}(x)) = g(t) = y$ and $\mathrm{d}x = f^{\prime}(t) \mathrm{d}t = \dot{x} \mathrm{d}t$. Thus, we find that the area under the arc is given by

$\int_{x_1}^{x_2} g(f^{-1}(x)) \mathrm{d}x = \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t$

By simple geometry we can then see that the area $A$ is given by

$\frac{1}{2}x_2y_2 - \frac{1}{2}x_1y_1 - \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t$

where $\frac{1}{2}x_2y_2$ is the area under the line joining $0$ and $(x_2, y_2)$, and $\frac{1}{2}x_1y_1$ is the area under the line joining $0$ and $(x_1, y_1)$.

Next, we imagine flipping the graph over so that $y$ is now along the horizontal axis and $x$ is along the vertical axis. We can proceed in the same way as before to find the area under the curve from this point of view. In Cartesian form the area under the curve is given by

$\int_{y_1}^{y_2} f(g^{-1}(y)) \mathrm{d}y$

and upon changing variables from $y$ to $t$ this becomes $\int_{t_1}^{t_2} x \dot{y} \mathrm{d}t$. But, returning to the original graph, we see that the sum of the two areas $\int_{t_1}^{t_2} y \dot{x} \mathrm{d}t$ and $\int_{t_1}^{t_2} x \dot{y} \mathrm{d}t$ is the same as the difference $x_2y_2 - x_1y_1$ (since $x \dot{y} + y \dot{x} = \frac{\mathrm{d}}{\mathrm{d}t}(xy)$), where $x_2y_2$ is the area of the rectangle with vertices at $(0, 0)$, $(0, y_2)$, $(x_2, y_2)$ and $(x_2, 0)$, and $x_1y_1$ is the area of the smaller rectangle with vertices at $(0, 0)$, $(0, y_1)$, $(x_1, y_1)$ and $(x_1, 0)$. Thus, we can write

$x_2y_2 - x_1y_1 = \int_{t_1}^{t_2} (x \dot{y} + y \dot{x}) \mathrm{d}t$

$\iff$

$\frac{1}{2}x_2y_2 - \frac{1}{2}x_1y_1 - \int_{t_1}^{t_2} y \dot{x} \mathrm{d}t= \frac{1}{2}\int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t$

This proves the result since we saw above that the expression on the left-hand side gives the area $A$.

As a simple application of this result, suppose the arc in the above scenario is part of a parabola with Cartesian equation $y = x^2$ and we want to find the area $A$ when $(x_1, y_1) = (1, 1)$ and $(x_2, y_2) = (2, 4)$. The parabola has the parametric representation $x = t$, $y = t^2$ and $t$ ranges from $1$ to $2$ along the arc. We have $\dot{x} = 1$ and $\dot{y} = 2t$, so putting these into the formula we find that

$A = \frac{1}{2} \int_{t_1}^{t_2} (x \dot{y} - y \dot{x}) \mathrm{d}t = \frac{1}{2}\int_1^2 t^2 \mathrm{d}t = \frac{7}{6}$
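We can check $A = 7/6$ numerically both ways: via the open-arc formula and via the geometric decomposition used in the proof. A minimal sketch (my own check):

```python
import numpy as np

# Sketch: for the parabola arc x = t, y = t^2 with t in [1, 2], compute
# (1) the open-arc formula  A = (1/2) * int (x y' - y x') dt, and
# (2) the decomposition     A = (1/2) x2 y2 - (1/2) x1 y1 - int y x' dt,
# and check that both give 7/6.
t = np.linspace(1.0, 2.0, 100001)
x, y = t, t**2
xdot, ydot = np.ones_like(t), 2 * t
dt = t[1] - t[0]
trap = lambda f: (f.sum() - 0.5 * (f[0] + f[-1])) * dt  # trapezoid rule

arc_formula = 0.5 * trap(x * ydot - y * xdot)
geometric = 0.5 * 2 * 4 - 0.5 * 1 * 1 - trap(y * xdot)
print(arc_formula, geometric)  # both equal 7/6
```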

# Study of a proof of Noether’s theorem and its application to conservation laws in physics

While I have for a long time been aware of Noether’s theorem and its relevance to symmetry and conservation laws in physics, I have only recently taken the time to closely explore its mathematical proof. In the present note I want to record some notes I made on the mathematical nuances involved in a proof of Noether’s theorem and the mathematical relevance of the theorem to some simple conservation laws in classical physics, namely, the conservation of energy and the conservation of linear momentum. Noether’s Theorem has important applications in a wide range of classical mechanics problems as well as in quantum mechanics and Einstein’s relativity theory. It is also used in the study of certain classes of partial differential equations that can be derived from variational principles.

The theorem was first published by Emmy Noether in 1918. An English translation of the full original paper is available here. An interesting book by Yvette Kosmann-Schwarzbach also presents an English translation of Noether’s 1918 paper and discusses in detail the history of the theorem’s development and its impact on theoretical physics in the 20th Century. (Kosmann-Schwarzbach, Y, 2011, The Noether Theorems: Invariance and Conservation Laws in the Twentieth Century. Translated by Bertram Schwarzbach. Springer). At the time of writing, the book is freely downloadable from here.

Mathematical setup of Noether’s theorem

The case I explore in detail here is that of a variational calculus functional of the form

$S[y] = \int_a^b \mathrm{d}x F(x, y, y^{\prime})$

where $x$ is a single independent variable and $y = (y_1, y_2, \ldots, y_n)$ is a vector of $n$ dependent variables. The functional has stationary paths defined by the usual Euler-Lagrange equations of variational calculus. Noether’s theorem concerns how the value of this functional is affected by families of continuous transformations of the dependent and independent variables (e.g., translations, rotations) that are defined in terms of one or more real parameters. Here I restrict attention to transformations defined in terms of only a single parameter, call it $\delta$. These can be represented in general terms as

$\overline{x} = \Phi(x, y, y^{\prime}; \delta)$

$\overline{y}_k = \Psi_k(x, y, y^{\prime}; \delta)$

for $k = 1, 2, \ldots, n$. The functions $\Phi$ and $\Psi_k$ are assumed to have continuous first derivatives with respect to all the variables, including the parameter $\delta$. Furthermore, the transformations must reduce to identities when $\delta = 0$, i.e.,

$x \equiv \Phi(x, y, y^{\prime}; 0)$

$y_k \equiv \Psi_k(x, y, y^{\prime}; 0)$

for $k = 1, 2, \ldots, n$. As concrete examples, translations and rotations are continuous differentiable transformations that can be defined in terms of a single parameter and that reduce to identities when the parameter takes the value zero.

Noether’s theorem applies to infinitesimally small changes in the dependent and independent variables, so we can assume $|\delta| \ll 1$ and then use perturbation theory to prove the theorem. Treating $\overline{x}$ and $\overline{y}_k$ as functions of $\delta$ and Taylor-expanding them about $\delta = 0$ we get

$\overline{x}(\delta) = \overline{x}(0) + \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)$

$\iff$

$\overline{x}(\delta) = x + \delta \phi + O(\delta^2)$

where

$\phi(x, y, y^{\prime}) \equiv \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}$

and

$\overline{y}_k (\delta) = \overline{y}_k (0) + \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)$

$\iff$

$\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)$

where

$\psi_k (x, y, y^{\prime}) \equiv \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}$ for $k = 1, 2, \ldots, n$.

Noether’s theorem then says that whenever the functional $S[y]$ is invariant under the above family of transformations, i.e., whenever

$\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})$

for all $c$ and $d$ such that $a \leq c < d \leq b$, where $\overline{c} = \Phi(c, y(c), y^{\prime}(c); \delta)$ and $\overline{d} = \Phi(d, y(d), y^{\prime}(d); \delta)$, then for each stationary path of $S[y]$ the following equation holds:

$\sum_{k = 1}^n \frac{\partial F}{\partial y_k^{\prime}}\psi_k + \bigg(F - \sum_{k = 1}^n y_k^{\prime}\frac{\partial F}{\partial y_k^{\prime}}\bigg)\phi = \mathrm{constant}$

As illustrated below, this remarkable equation encodes a number of conservation laws in physics, including conservation of energy, linear momentum and angular momentum, given that the relevant equations of motion are invariant under translations in time, translations in space and rotations in space respectively. Thus, Noether’s theorem is often expressed as the statement that whenever a system has a continuous symmetry there is a corresponding quantity whose value is conserved.

Application of the theorem to familiar conservation laws in classical physics

It is, of course, not necessary to use the full machinery of Noether’s theorem for simple examples of conservation laws in classical physics. The theorem is most useful in unfamiliar situations in which it can reveal conserved quantities which were not previously known. However, going through the motions in simple cases clarifies how the mathematical machinery works in more sophisticated and less familiar situations.

To obtain the law of the conservation of energy in the simplest possible scenario, consider a particle of mass $m$ moving along a straight line in a time-invariant potential field $V(x)$ with position at time $t$ given by the function $x(t)$. The Lagrangian formulation of mechanics then says that the path followed by the particle will be a stationary path of the action functional

$\int_0^{T} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2 - V(x)\big)$

The Euler-Lagrange equation for this functional would give Newton’s second law as the equation governing the particle’s motion. With regard to demonstrating energy conservation, we notice that the Lagrangian, which is more generally of the form $L(t, x, \dot{x})$ when there is a time-varying potential, here takes the simpler form $L(x, \dot{x})$ because there is no explicit dependence on time. Therefore we might expect the functional to be invariant under translations in time, and thus Noether’s theorem to hold. We will verify this. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

$\overline{t}(\delta) = t + \delta \phi + O(\delta^2) \equiv t + \delta$

and

$\overline{x}(\delta) = x + \delta \cdot 0 + O(\delta^2) \equiv x$

From the first equation we see that $\phi = 1$ in the case of a simple translation in time by an amount $\delta$, and from the second equation we see that $\psi = 0$, which simply reflects the fact that we are only translating in the time direction. The invariance of the functional under these transformations can easily be demonstrated by writing

$\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\overline{x}, \dot{\overline{x}}) = \int_{\overline{0}-\delta}^{\overline{T}-\delta} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t L(x, \dot{x})$

where the limits in the second integral follow from the change of the time variable from $\overline{t}$ to $t$. Thus, Noether’s theorem holds and with $\phi = 1$ and $\psi = 0$ the fundamental equation in the theorem reduces to

$L - \dot{x}\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}$

Evaluating the terms on the left-hand side we get

$\frac{1}{2}m\dot{x}^2 - V(x) - \dot{x} m\dot{x} =\mathrm{constant}$

$\iff$

$\frac{1}{2}m\dot{x}^2 + V(x) = E = \mathrm{constant}$

which is of course the statement of the conservation of energy.
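This conservation is easy to observe in a numerical simulation. The following is a minimal sketch (my own check; the quartic potential $V(x) = x^4/4$ and the velocity Verlet integrator are assumptions, not from the note):

```python
# Sketch: integrate Newton's second law for m = 1 in the time-invariant
# potential V(x) = x^4/4 (force F = -dV/dx = -x^3) with velocity Verlet,
# and watch E = (1/2) m xdot^2 + V(x) stay constant along the motion.
m, dt, steps = 1.0, 1e-3, 20000
x, v = 1.0, 0.0
force = lambda x: -x**3
energy = lambda x, v: 0.5 * m * v**2 + 0.25 * x**4

E0 = energy(x, v)
drift = 0.0
a = force(x) / m
for _ in range(steps):
    x += v * dt + 0.5 * a * dt**2
    a_new = force(x) / m
    v += 0.5 * (a + a_new) * dt
    a = a_new
    drift = max(drift, abs(energy(x, v) - E0))

print(drift)  # maximum deviation of E from its initial value stays tiny
```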

To obtain the law of conservation of linear momentum in the simplest possible scenario, assume now that the above particle is moving freely in the absence of any potential field, so $V(x) = 0$ and the only energy involved is kinetic energy. The path followed by the particle will now be a stationary path of the action functional

$\int_0^{T} \mathrm{d}t L(\dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2\big)$

The Euler-Lagrange equation for this functional would give Newton’s first law as the equation governing the particle’s motion (constant velocity in the absence of any forces). To get the law of conservation of linear momentum we will consider a translation in space rather than time, and check that the action functional is invariant under such translations. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

$\overline{t}(\delta) = t + \delta \cdot 0 + O(\delta^2) \equiv t$

and

$\overline{x}(\delta) = x + \delta \psi + O(\delta^2) \equiv x + \delta$

From the first equation we see that $\phi = 0$ reflecting the fact that we are only translating in the space direction, and from the second equation we see that $\psi = 1$ in the case of a simple translation in space by an amount $\delta$. The invariance of the functional under these transformations can easily be demonstrated by noting that $\dot{\overline{x}} = \dot{x}$, so we can write

$\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\dot{\overline{x}}) = \int_0^{T} \mathrm{d}t L(\dot{x})$

since the limits of integration are not affected by the translation in space. Thus, Noether’s theorem holds and with $\phi = 0$ and $\psi = 1$ the fundamental equation in the theorem reduces to

$\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}$

$\iff$

$m\dot{x} = \mathrm{constant}$

This is, of course, the statement of the conservation of linear momentum.

Proof of Noether’s theorem

To prove Noether’s theorem we will begin with the transformed functional

$\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})$

We will substitute into this the linearised forms of the transformations, namely

$\overline{x}(\delta) = x + \delta \phi + O(\delta^2)$

and

$\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)$

for $k = 1, 2, \ldots, n$, and then expand to first order in $\delta$. Note that the integration limits are, to first order in $\delta$,

$\overline{c} = c + \delta \phi(c)$

and

$\overline{d} = d + \delta \phi(d)$

Using the linearised forms of the transformations and writing $\psi = (\psi_1, \psi_2, \ldots, \psi_n)$ we get

$\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \frac{\mathrm{d}x}{\mathrm{d}\overline{x}}$

$\frac{\mathrm{d}\overline{x}}{\mathrm{d}x} = 1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}$

Inverting the second equation we get

$\frac{\mathrm{d}x}{\mathrm{d}\overline{x}} = \big(1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big)^{-1} = 1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} + O(\delta^2)$

Using this in the first equation we find, to first order in $\delta$,

$\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \big(1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big) = \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)$

Making the necessary substitutions we can then write the transformed functional as

$\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})$

$= \int_{\overline{c}-\delta \phi(c)}^{\overline{d}-\delta \phi(d)} \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)$

$= \int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)$

Treating $F$ as a function of $\delta$ and expanding about $\delta = 0$ to first order we get

$F(\delta) = F(0) + \delta \frac{\partial F}{\partial \delta}\big|_{\delta = 0}$

$= F(x, y, y^{\prime}) + \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)$

Then using the expression for $\frac{\mathrm{d}\overline{x}}{\mathrm{d}x}$ above, the transformed functional becomes

$\int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)$

$= \int_c^d \mathrm{d}x F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)$

$+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)$

$= \int_c^d \mathrm{d}x F(x, y, y^{\prime})$

$+ \int_c^d \mathrm{d}x \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)$

$+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F(x, y, y^{\prime}) + O(\delta^2)$

Ignoring the second order term in $\delta$ we can thus write

$\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})$

$+ \delta \int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg)$

Since the functional is invariant, however, this implies

$\int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg) = 0$

We now manipulate this equation by integrating the terms involving $\frac{\mathrm{d}\phi}{\mathrm{d}x}$ and $\frac{\mathrm{d}\psi_k}{\mathrm{d}x}$ by parts. We get

$\int_c^d \mathrm{d}x \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} = \bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg]_c^d$

$- \int_c^d \mathrm{d}x \phi \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)$

and

$\int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} = \bigg[\sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d - \int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\psi_k$

Substituting these into the equation gives

$\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d$

$+ \int_c^d \mathrm{d}x \phi \bigg(\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg)$

$+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0$

We can manipulate this equation further by expanding the integrand in the second term on the left-hand side. We get

$\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)$

$= \frac{\partial F}{\partial x} - \frac{\partial F}{\partial x} - \sum_{k=1}^n \frac{\partial F}{\partial y_k}y^{\prime}_k - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n y^{\prime}_k \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)$

$= \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)$

Thus, the equation becomes

$\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d$

$+ \int_c^d \mathrm{d}x \phi \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)$

$+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0$

We can now see at a glance that the second and third terms on the left-hand side must vanish because of the Euler-Lagrange expressions appearing in the brackets (which are identically zero on stationary paths). Thus we arrive at the equation

$\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d = 0$

which shows that the expression inside the square brackets takes the same value at $c$ as at $d$. Since the endpoints are arbitrary, that expression is constant along the stationary path, as per Noether’s theorem.
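As a concrete illustration of the conserved quantity (a numerical check of my own, not part of the proof): take $n = 1$ and $F = \frac{1}{2}(y^{\prime})^2 - \frac{1}{2}y^2$, whose Euler-Lagrange equation $y^{\prime\prime} = -y$ has the stationary path $y = \cos x$. Since $F$ has no explicit $x$-dependence it is invariant under the translation family $\phi = 1$, $\psi_1 = 0$, and the bracketed expression reduces to $F - y^{\prime}\,\partial F/\partial y^{\prime}$, which should be constant along the path:

```python
import math

# Stationary path y = cos x of F = (y')^2/2 - y^2/2 (Euler-Lagrange: y'' = -y),
# with the translation family phi = 1, psi_1 = 0.
y  = math.cos
yp = lambda x: -math.sin(x)

def conserved(x):
    F = 0.5 * yp(x)**2 - 0.5 * y(x)**2
    return F - yp(x) * yp(x)        # F - y' * dF/dy'

vals = [conserved(x / 10) for x in range(20)]   # sample along the path
assert max(vals) - min(vals) < 1e-12
```

(The constant value here is $-\frac{1}{2}(y^{\prime 2} + y^2) = -\frac{1}{2}$, i.e. minus the conserved `energy’ of this oscillator-like problem.)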

# Calculus derivation of a surprising result of Galileo’s

An interesting paper by Herman Erlichson discusses Galileo’s attempt to prove that the minimum time path for a particle falling from a point on the lower quadrant of a circle is via the circle itself. (I attach the paper here: Erlichson, H, 1998, Galileo’s Work on Swiftest Descent from a Circle And How He Almost Proved the Circle Itself Was the Minimum Time Path, The American Mathematical Monthly, Vol 105, No 4, 338-347). I was particularly intrigued by equation (1) in this paper, which asserts that the time taken for a particle to fall along a straight line ending at the bottom of the circle is independent of the starting point on the circle. Thus, for example, a particle following a longer chord starting higher up on the circle would take exactly the same time to reach the bottom as a particle following a shorter chord starting lower down on the circle. Galileo discussed this result in his book, The Two New Sciences.

(I attach the English translation of Galileo’s book here: galileo two new sciences).

Underneath equation (1) in his paper, Erlichson gives a quick heuristic derivation of this result, but I couldn’t resist exploring it more deeply using calculus methods (which were not available in Galileo’s time – calculus was developed by Isaac Newton, who was born at almost the same time that Galileo died). I want to record these calculations in the present note, in particular showing first how the time taken along an arbitrary straight line definitely does depend on the starting point, and then showing how this dependence instantly disappears when the starting point of the straight line is taken to lie on a circle. I find it remarkable how this happens algebraically – nothing fundamentally changes about the straight line other than assuming that it does or does not start on a circle!

For the purposes of this note I have amended Figure 1 in Erlichson’s paper by imposing a coordinate system as shown below:

The origin of the coordinate system is taken to be at C and we suppose that the point D has coordinates $(X, nX)$. The equation of the straight line DC is $y = nx$. Let us ignore the circular arc connecting D and C for the moment and work out the time it would take for a particle of mass $m$ to fall from D to C along the straight line. We can do this by considering the energy of the particle at point D. If it starts at rest (i.e., its initial velocity is zero), the energy at D consists only of the particle’s potential energy given by $mgy(X) = mgnX$. Potential energy is transferred to kinetic energy as the particle moves down the line and the total energy at each $x$-coordinate is given by

$\frac{1}{2}mv^2 + mgnx = mgnX$

$\iff$

$v = \sqrt{2gn(X - x)}$

Given that $y = nx$ we see that $y^{\prime} = n$, so the element of distance along the straight line is $\mathrm{d}s = \mathrm{d}x\sqrt{1 + (y^{\prime})^2} = \mathrm{d}x\sqrt{1 + n^2}$. Therefore the speed along the line is

$\frac{\mathrm{d}s}{\mathrm{d}t} = \frac{\mathrm{d}x}{\mathrm{d}t}\sqrt{1+n^2}$

Setting this equal to $v$ and solving for $\frac{\mathrm{d}x}{\mathrm{d}t}$ we get

$\frac{\mathrm{d}x}{\mathrm{d}t} = \sqrt{\frac{2gn(X-x)}{1+n^2}}$

We now observe that the time of descent $T$ will be given by the integral

$T = \int_0^T \mathrm{d}t = \int_0^X \frac{\mathrm{d}t}{\mathrm{d}x} \mathrm{d}x = \int_0^X \sqrt{\frac{1+n^2}{2gn(X-x)}}\,\mathrm{d}x$

$= \sqrt{\frac{1+n^2}{2gn}} \int_0^X (X - x)^{-1/2}\,\mathrm{d}x$

$= 2\sqrt{X} \sqrt{\frac{1+n^2}{2gn}}$

$= \sqrt{\frac{2X}{gn}}\sqrt{1+n^2}$
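As a sanity check on this formula (my own addition): the component of gravity along the incline is $a = gn/\sqrt{1+n^2}$ and the length of DC is $X\sqrt{1+n^2}$, so elementary constant-acceleration kinematics, $s = \frac{1}{2}aT^2$, should reproduce the same $T$. A short Python check, with arbitrary illustrative values of $g$, $n$ and $X$:

```python
import math

g, n, X = 9.81, 1.3, 2.0                 # arbitrary slope and starting point (assumed)
a = g * n / math.sqrt(1 + n**2)          # gravity component along the incline
length = X * math.sqrt(1 + n**2)         # distance from D to C along the line

T_kinematics = math.sqrt(2 * length / a)                       # from s = a T^2 / 2
T_formula = math.sqrt(2 * X / (g * n)) * math.sqrt(1 + n**2)   # formula above
assert abs(T_kinematics - T_formula) < 1e-12
```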

Thus, in this case, the formula for $T$ definitely depends on the starting point $X$. But watch what happens when we now assume that the point D lies on a circle of radius $L$ as shown in the diagram. The circle has equation

$x^2 + (y - L)^2 = L^2$

so the equation of the circular arc joining D and C is

$y = L + \sqrt{L^2 - x^2}$

as indicated in the diagram. At the point D we therefore have

$nX = L + \sqrt{L^2 - X^2}$

$\iff$

$n = \frac{L}{X} + \sqrt{\frac{L^2}{X^2} - 1}$

so

$n^2 = \frac{L^2}{X^2} + \frac{2L}{X}\sqrt{\frac{L^2}{X^2} - 1} + \frac{L^2}{X^2} - 1$

and therefore

$1 + n^2 = \frac{2L^2}{X^2} + \frac{2L}{X}\sqrt{\frac{L^2}{X^2} - 1}$

Substituting these expressions for $n$ and $1 + n^2$ in the formula for $T$ we get

$T = \sqrt{\frac{2X}{gn}}\sqrt{1+n^2}$

$= \sqrt{\frac{2X\big(\frac{2L^2}{X^2} + \frac{2L}{X}\sqrt{\frac{L^2}{X^2} - 1}\big)}{g\big(\frac{L}{X} + \sqrt{\frac{L^2}{X^2} - 1}\big)}}$

$= \sqrt{\frac{4L\big(\frac{L}{X} + \sqrt{\frac{L^2}{X^2} - 1}\big)}{g\big(\frac{L}{X} + \sqrt{\frac{L^2}{X^2} - 1}\big)}}$

$= \sqrt{\frac{4L}{g}}$

$= 2\sqrt{\frac{L}{g}}$

which is equation (1) in Erlichson’s paper. The dependence on the starting point $X$ has now vanished, so all starting points on the lower quadrant of the circle will yield the same time of travel to the origin!
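This independence is easy to confirm numerically. The short Python check below (my own; the values of $L$, $g$ and the sample points are arbitrary) computes $T$ for several chords starting at different points on the circle and verifies that they all equal $2\sqrt{L/g}$:

```python
import math

g, L = 9.81, 1.0                       # arbitrary illustrative values
T_circle = 2 * math.sqrt(L / g)        # Erlichson's equation (1)

times = []
for X in (0.1, 0.3, 0.5, 0.8, 0.99):   # several starting x-coordinates on the circle
    n = L / X + math.sqrt((L / X)**2 - 1)          # slope of the chord from D
    times.append(math.sqrt(2 * X / (g * n)) * math.sqrt(1 + n**2))

assert all(abs(T - T_circle) < 1e-9 for T in times)
```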

# Simple variational setups yielding Newton’s Second Law and Schrödinger’s equation

It is a delightful fact that one can get both the fundamental equation of classical mechanics (Newton’s Second Law) and the fundamental equation of quantum mechanics (Schrödinger’s equation) by solving very simple variational problems based on the familiar conservation of mechanical energy equation

$K + U = E$

In the present note I want to briefly set these out emphasising the common underlying structure provided by the conservation of mechanical energy and the calculus of variations. The kinetic energy $K$ will be taken to be

$K = \frac{1}{2}m \dot{x}^2 = \frac{p^2}{2m}$

where $\dot{x} = \frac{\mathrm{d}x}{\mathrm{d}t}$ is the particle’s velocity, $p = m\dot{x}$ is its momentum, and $m$ is its mass. The potential energy $U$ will be regarded as some function of $x$ only.

To obtain Newton’s Second Law we find the stationary path followed by the particle with respect to the functional

$S[x] = \int_{t_1}^{t_2} L(t, x, \dot{x})\, \mathrm{d}t = \int_{t_1}^{t_2} (K - U)\, \mathrm{d}t$

The function $L(t, x, \dot{x}) = K - U$ is usually termed the `Lagrangian’ in classical mechanics, and the functional $S[x]$ is usually called the `action’. The Euler-Lagrange equation for this calculus of variations problem is

$\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\big(\frac{\partial L}{\partial \dot{x}}\big) = 0$

and this is Newton’s Second Law in disguise! We have

$\frac{\partial L}{\partial x} = -\frac{\mathrm{d}U}{\mathrm{d}x} \equiv F$

$\frac{\partial L}{\partial \dot{x}} = m\dot{x} \equiv p$

and

$\frac{\mathrm{d}}{\mathrm{d}t} \big(\frac{\partial L}{\partial \dot{x}}\big) = \frac{\mathrm{d}p}{\mathrm{d}t} = m\ddot{x} \equiv ma$

so substituting these into the Euler-Lagrange equation we get Newton’s Second Law, $F = ma$.
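The stationarity of the action along the Newtonian trajectory can also be checked numerically. The sketch below (my own illustration; the harmonic potential $U = \frac{1}{2}kx^2$, the discretisation, and the perturbation are all choices of mine) estimates the first variation of $S$ and confirms that it vanishes along the true solution $x(t) = \cos(\omega t)$ but not along a non-solution path:

```python
import math

m, k = 1.0, 1.0                      # unit mass and spring constant (assumed)
w = math.sqrt(k / m)
t1, t2, N = 0.0, 1.0, 2000           # time interval and number of steps

def action(path):
    """Midpoint-rule discretisation of S = integral of (m x'^2/2 - k x^2/2) dt."""
    dt = (t2 - t1) / N
    S = 0.0
    for i in range(N):
        tm = t1 + (i + 0.5) * dt
        xd = (path(tm + dt / 2) - path(tm - dt / 2)) / dt   # central derivative
        S += (0.5 * m * xd**2 - 0.5 * k * path(tm)**2) * dt
    return S

def first_variation(path):
    """dS/d(eps) at eps = 0 for a perturbation vanishing at the endpoints."""
    eta = lambda t: math.sin(math.pi * (t - t1) / (t2 - t1))
    eps = 1e-4
    return (action(lambda t: path(t) + eps * eta(t))
            - action(lambda t: path(t) - eps * eta(t))) / (2 * eps)

solution = lambda t: math.cos(w * t)       # solves m x'' = -k x
other    = lambda t: math.cos(2 * w * t)   # does not solve it

assert abs(first_variation(solution)) < 1e-4   # stationary: variation ~ 0
assert abs(first_variation(other)) > 0.1       # non-stationary: variation is not
```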

To obtain Schrödinger’s equation we introduce a function

$\psi(x) = \exp\big(\frac{1}{\hbar}\int p\,\mathrm{d}x\big)$

where $p = m \dot{x}$ is again the momentum of the particle and $\hbar$ is the reduced Planck’s constant from quantum mechanics. (Note that $\int p\,\mathrm{d}x$ has units of length$^2$ mass time$^{-1}$, which are also the units of $\hbar$, so dividing by $\hbar$ makes the exponent dimensionless, as an exponent must be.) We then have

$\ln \psi = \frac{1}{\hbar}\int p\,\mathrm{d}x$

and differentiating both sides gives

$\frac{\psi^{\prime}}{\psi} = \frac{1}{\hbar} p$

so

$p^2 = \hbar^2 \big(\frac{\psi^{\prime}}{\psi}\big)^2$

Therefore we can write the kinetic energy as

$K = \frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2$

and putting this into the conservation of mechanical energy equation gives

$\frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2 + U = E$

$\iff$

$\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2 = 0$

We now seek the function $\psi$ that makes the following functional stationary

$T[\psi] = \int_{-\infty}^{\infty} M(x, \psi, \psi^{\prime}) \mathrm{d}x = \int_{-\infty}^{\infty} \big(\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2\big)\mathrm{d}x$

The Euler-Lagrange equation for this calculus of variations problem is

$\frac{\partial M}{\partial \psi} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial M}{\partial \psi^{\prime}}\big) = 0$

and this is Schrödinger’s equation in disguise! We have

$\frac{\partial M}{\partial \psi} = 2(U - E)\psi$

$\frac{\partial M}{\partial \psi^{\prime}} = \frac{\hbar^2}{m} \psi^{\prime}$

and

$\frac{\mathrm{d}}{\mathrm{d}x} \big(\frac{\partial M}{\partial \psi^{\prime}}\big) = \frac{\hbar^2}{m} \psi^{\prime \prime}$

so substituting these into the Euler-Lagrange equation we get

$2(U - E) \psi - \frac{\hbar^2}{m} \psi^{\prime \prime} = 0$

$\iff$

$-\frac{\hbar^2}{2m} \frac{\mathrm{d}^2 \psi}{\mathrm{d} x^2} + U \psi = E \psi$

and this is the (time-independent) Schrödinger equation for a particle of mass $m$ with fixed total energy $E$ in a potential $U(x)$ on the line $-\infty < x < \infty$.
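As a quick final check (again my own illustration, using the standard textbook harmonic-oscillator ground state rather than anything from the derivation above), one can verify pointwise by finite differences that $\psi(x) = e^{-m\omega x^2/2\hbar}$ with $E = \hbar\omega/2$ satisfies this equation for $U = \frac{1}{2}m\omega^2 x^2$:

```python
import math

hbar, m, w = 1.0, 1.0, 1.0                      # units chosen for convenience
U   = lambda x: 0.5 * m * w**2 * x**2           # harmonic potential (illustrative)
psi = lambda x: math.exp(-m * w * x**2 / (2 * hbar))   # textbook ground state
E   = 0.5 * hbar * w                            # its energy

h = 1e-4
residuals = []
for x in (-1.5, -0.3, 0.0, 0.8, 2.0):
    psi2 = (psi(x + h) - 2 * psi(x) + psi(x - h)) / h**2   # finite-difference psi''
    residuals.append(-hbar**2 / (2 * m) * psi2 + U(x) * psi(x) - E * psi(x))

assert max(abs(r) for r in residuals) < 1e-6
```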