Invariance under rotations in space and conservation of angular momentum

In a previous note I studied in detail the mathematical setup of Noether’s Theorem and its proof. I briefly illustrated the mathematical machinery by considering invariance under translations in time, giving the law of conservation of energy, and invariance under translations in space, giving the law of conservation of linear momentum. I briefly mentioned that invariance under rotations in space would also yield the law of conservation of angular momentum but I did not work this out explicitly. I want to do this quickly in the present note.

We imagine a particle of unit mass moving freely in the absence of any potential field, and tracing out a path \gamma(t) in the (x, y)-plane of a three-dimensional Euclidean coordinate system between times t_1 and t_2, with the z-coordinate everywhere zero along this path. The angular momentum of the particle at time t with respect to the origin of the coordinate system is given by

\mathbf{L} = \mathbf{r} \times \mathbf{v}

= (\mathbf{i} x + \mathbf{j} y) \times (\mathbf{i} \dot{x} + \mathbf{j} \dot{y})

= \mathbf{k} x \dot{y} - \mathbf{k} y \dot{x}

= \mathbf{k} (x \dot{y} - y \dot{x})

where \times is the vector product operation. Alternatively, we could have obtained this as

\mathbf{L} = \mathbf{r} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x & y & 0 \\ \dot{x} & \dot{y} & 0 \end{vmatrix}

= \mathbf{k} (x \dot{y} - y \dot{x})

In terms of Lagrangian mechanics, the path \gamma(t) followed by the particle will be a stationary path of the action functional

S[\gamma(t)] = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

(in the absence of a potential field the Lagrangian consists only of the kinetic energy).

Now imagine that the entire path \gamma(t) is rotated bodily anticlockwise in the (x, y)-plane through an angle \theta. This corresponds to a one-parameter transformation

\overline{t} \equiv \Phi(t, x, y, \dot{x}, \dot{y}; \theta) = t

\overline{x} \equiv \Psi_1(t, x, y, \dot{x}, \dot{y}; \theta) = x \cos \theta - y \sin \theta

\overline{y} \equiv \Psi_2(t, x, y, \dot{x}, \dot{y}; \theta) = x \sin \theta + y \cos \theta

which reduces to the identity when \theta = 0. We have

d\overline{t} = dt

\dot{\overline{x}}^2 = \dot{x}^2 \cos^2 \theta + \dot{y}^2 \sin^2 \theta - 2 \dot{x} \dot{y} \sin \theta \cos \theta

\dot{\overline{y}}^2 = \dot{x}^2 \sin^2 \theta + \dot{y}^2 \cos^2 \theta + 2 \dot{x} \dot{y} \sin \theta \cos \theta

and therefore

\dot{x}^2 + \dot{y}^2 = \dot{\overline{x}}^2 + \dot{\overline{y}}^2

so the action functional is invariant under this rotation since

S[\overline{\gamma}(t)] = \int_{t_1}^{t_2} d\overline{t} \frac{1}{2}(\dot{\overline{x}}^2 + \dot{\overline{y}}^2) = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2) = S[\gamma(t)]

Therefore Noether’s theorem applies. Let

F(t, x, y, \dot{x}, \dot{y}) = \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

Then Noether’s theorem in this case says

\frac{\partial F}{\partial \dot{x}} \psi_1 + \frac{\partial F}{\partial \dot{y}} \psi_2 + \big(F - \frac{\partial F}{\partial \dot{x}} \dot{x} - \frac{\partial F}{\partial \dot{y}} \dot{y}\big) \phi = const.

where

\phi \equiv \frac{\partial \Phi}{\partial \theta} \big|_{\theta = 0} = 0

\psi_1 \equiv \frac{\partial \Psi_1}{\partial \theta} \big|_{\theta = 0} = -y

\psi_2 \equiv \frac{\partial \Psi_2}{\partial \theta} \big|_{\theta = 0} = x

We have

\frac{\partial F}{\partial \dot{x}} = \dot{x}

\frac{\partial F}{\partial \dot{y}} = \dot{y}

Therefore Noether’s theorem gives us (remembering \phi = 0)

-\dot{x} y + \dot{y} x = const.

The expression on the left-hand side of this equation is the angular momentum of the particle about the origin (cf. the brief discussion of angular momentum at the start of this note, where we found \mathbf{L} = \mathbf{k}(x \dot{y} - y \dot{x})), so this result is precisely the statement that the angular momentum is conserved. Noether’s theorem shows us that this is a direct consequence of the invariance of the action functional of the particle under rotations in space.
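As a quick numerical sanity check of this conservation law (a minimal sketch of my own, not part of the original argument), we can evaluate x \dot{y} - y \dot{x} along the straight-line trajectory of a free particle of unit mass with arbitrary illustrative initial conditions:

```python
import numpy as np

# Free particle of unit mass: x'' = y'' = 0, so the trajectory is linear in t.
x0, y0, vx, vy = 1.0, 0.5, -0.3, 0.8     # arbitrary illustrative initial conditions

t = np.linspace(0.0, 10.0, 101)
x, y = x0 + vx * t, y0 + vy * t
xdot = np.full_like(t, vx)
ydot = np.full_like(t, vy)

L = x * ydot - y * xdot                  # angular momentum x*ydot - y*xdot
print(np.allclose(L, L[0]))              # True: constant along the whole path
```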

A mathematical formulation of Feynman’s ‘mirage on a hot road’

In his famous Feynman Lectures on Physics, Richard Feynman provided an intuitive explanation of how a ‘mirage on a hot road’ can arise due to the bending of light rays from the sky in accordance with Fermat’s Principle (see The Feynman Lectures on Physics, Volume I, Chapter 26). Feynman wrote the following:

[Extract from The Feynman Lectures on Physics, Vol. I, Chapter 26, containing Fig. 26-8]

I was discussing this with a beginning engineering student who did not quite understand why the mirage makes it look as if the water is actually on the road. I explained this by augmenting Feynman’s Fig. 26-8 above as follows:

The bent light ray starting at point A and entering the observer’s eye at point B is interpreted by the observer as having followed a straight line path emanating from the road, as indicated in the diagram. Thus, the observer sees the image of the sky on the road surface and interprets it as a shimmering pool of water.

Having done this, the question then arose as to how one could go about constructing an explicit mathematical model of the above scenario, yielding a suitable equation for the curved light ray from A to B, a linear equation for the apparent straight line path seen by the observer, and explicit coordinates for the point on the road where the image of the sky is seen by the observer. This turned out to be an interesting exercise involving Fermat’s Principle and the Calculus of Variations and is what I want to record here.

Suppose the light ray begins at point A = (a, b) at time t_1, and enters the observer’s eye at point B = (-a, b) at time t_2. Fermat’s Principle (see, e.g., this Wikipedia article) says that the path followed by the light ray is such as to make the optical length functional

S[y] = \int_A^B n ds

stationary, where n = c/v is the refractive index of the medium through which the light passes, c is the speed of light in a vacuum and v = ds/dt is the speed of light in the medium. This functional can be derived (up to a multiplicative constant) from the ‘Principle of Least Time’ by noting that the time taken by the light ray is

T = \int_{t_1}^{t_2} dt = \int_{t_1}^{t_2} \frac{1}{c} \frac{c}{v} \frac{ds}{dt} dt = \int_A^B \frac{n}{c} ds = \frac{1}{c} S

The light ray will find the path that minimises this time of travel.

To apply this setup to the mirage in Feynman’s lecture we need to model the refractive index as a function of the y-coordinate in my amended diagram above, which measures the height above the road. As Feynman says, light goes faster in the hot region near the road than in the cooler region higher up. Since the refractive index is inversely proportional to v, it should therefore be an increasing function of the height y above the road. To get a toy model for the scenario in Feynman’s lecture, let us make the simplest possible assumption, namely that the refractive index is a linear function of y,

n(y) = \alpha + \beta y

with \alpha and \beta both positive. Then since the arc-length element is

ds = dx \sqrt{1 + y^{\prime \ 2}}

we can write the optical length functional as

S[y] = \int_A^B n ds = \int_{a}^{-a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = -\int_{-a}^{a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

We find the stationary path for this functional using the Calculus of Variations. Let

F(x, y, y^{\prime}) = (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

Since this does not depend directly on x, the problem admits a first integral of the form

y^{\prime} \frac{\partial F}{\partial y^{\prime}} - F = C

where C is a constant. We have

\frac{\partial F}{\partial y^{\prime}} = \frac{(\alpha + \beta y)y^{\prime}}{\sqrt{1 + y^{\prime \ 2}}}

Therefore the first integral for this problem is

\frac{(\alpha + \beta y)y^{\prime \ 2}}{\sqrt{1 + y^{\prime \ 2}}} - (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = C

Multiplying through by \sqrt{1 + y^{\prime \ 2}}/ \alpha, absorbing \alpha into the constant term, and writing \delta \equiv \beta/\alpha we get

(1 + \delta y) y^{\prime \ 2} - (1 + \delta y)(1 + y^{\prime \ 2}) = C\sqrt{1 + y^{\prime \ 2}}

\iff

-(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}

Since 1 + \delta y > 0, the constant C here must be negative, so we may replace -C by a positive constant which (abusing notation slightly) we again call C, giving

(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}

\implies

y^{\prime} = \frac{\pm \sqrt{(1+\delta y)^2 - C^2}}{C}

This is a first-order differential equation for y which can be solved by separation of variables. We get the integral equation

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \pm \int \frac{dx}{C}

To solve the integral on the left-hand side, make the change of variable

(1 + \delta y) = C \sec \theta

\implies

\delta \, dy = C \sec \theta \tan \theta \, d\theta

Then

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \int \frac{C \sec \theta \tan \theta \, d\theta}{\delta \sqrt{C^2 \sec^2 \theta - C^2}}

= \frac{1}{\delta}\int \sec \theta \, d\theta

= \frac{1}{\delta} \ln\big[\sec \theta + \tan \theta\big] + const.

= \frac{1}{\delta} \ln \bigg[\frac{(1 + \delta y) + \sqrt{(1+\delta y)^2 - C^2}}{C}\bigg] + const.

For the integral on the right-hand side of the integral equation we get

\pm \int \frac{dx}{C} = \pm \frac{x}{C} + const.

Therefore the integral equation reduces to

\frac{1}{\delta} \ln \bigg[\frac{(1 + \delta y) + \sqrt{(1+\delta y)^2 - C^2}}{C}\bigg] = \pm \frac{x}{C} + const.

Exponentiating both sides gives

(1 + \delta y) + \sqrt{(1+\delta y)^2 - C^2} = C \exp\big(\pm \frac{\delta x}{C} + const.\big)

Isolating the square root and squaring, the (1 + \delta y)^2 terms cancel and we can solve for 1 + \delta y:

1 + \delta y = \frac{C}{2}\bigg[\exp\big(\pm \frac{\delta x}{C} + const.\big) + \exp\big(- \big[\pm \frac{\delta x}{C} + const.\big]\big)\bigg] = C \cosh\big(\pm \frac{\delta x}{C} + const.\big)

Since \cosh is an even function, the two sign choices give the same family of solutions, so we may take

y = \frac{C \cosh\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}

Note that this solution automatically has the behaviour y \rightarrow \infty as x \rightarrow \pm \infty required of the curved light ray in my amended diagram above.

Furthermore, we have y(a) = y(-a) = b and therefore we require

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(-\frac{\delta a}{C} + const. \big)

But

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \cosh(const.) + \sinh\big(\frac{\delta a}{C}\big) \sinh(const.)

and

\cosh\big(-\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \cosh(const.) - \sinh\big(\frac{\delta a}{C}\big) \sinh(const.)

These cannot be equal unless \sinh(const.) = 0, which implies const. = 0. Thus, our solution for y reduces to

y = \frac{C \cosh\big(\frac{\delta x}{C}\big) - 1}{\delta}

with the constant C determined in terms of a and b by

b = \frac{C \cosh\big(\frac{\delta a}{C}\big) - 1}{\delta}

This is the equation of the curved path of the light ray from the sky in Feynman’s diagram. The slope of y at point B = (-a, b) is

y^{\prime}(-a) = -\sinh\big(\frac{\delta a}{C}\big)

The straight line with this gradient passing through the point B has equation

y = \big(b - a \sinh\big(\frac{\delta a}{C}\big)\big) - \sinh\big(\frac{\delta a}{C}\big) x

This is the equation of the straight line emanating from the x-axis to the observer’s eye in my amended diagram above. On the x-axis we have y = 0 in the straight-line equation so

x = \frac{b}{\sinh\big(\frac{\delta a}{C}\big)} - a

This is the point on the x-axis at which the observer in my amended diagram will see the mirage.
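To make the result concrete, here is a minimal numerical sketch (the values of a, b and \delta are arbitrary assumptions of mine for illustration, not values from Feynman’s discussion) that solves b = (C \cosh(\delta a/C) - 1)/\delta for C and then evaluates the mirage point:

```python
import numpy as np
from scipy.optimize import brentq

# Arbitrary illustrative values: half-separation a, source/eye height b,
# and refractive-index gradient ratio delta = beta/alpha (units 1/length).
a, b, delta = 100.0, 1.5, 1e-5

# Solve b = (C*cosh(delta*a/C) - 1)/delta for the constant C.
f = lambda C: (C * np.cosh(delta * a / C) - 1.0) / delta - b
C = brentq(f, 1.0, 10.0)

# Point on the road (y = 0) at which the observer sees the image of the sky.
x_mirage = b / np.sinh(delta * a / C) - a
print(C, x_mirage)
```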

Notes on Sturm-Liouville theory

Sturm-Liouville theory was developed in the 19th century in the context of solving differential equations. When one studies it in depth, however, one experiences a sudden realisation that this is the mathematics underlying a lot of quantum mechanics. In quantum mechanics we envisage a quantum state (a time-dependent function) expressed as a superposition of eigenfunctions of a self-adjoint operator (usually referred to as a Hermitian operator) representing an observable. The coefficients of the eigenfunctions in this superposition are probability amplitudes. A measurement of the observable quantity represented by the Hermitian operator produces one of the eigenvalues of the operator with a probability equal to the square of the probability amplitude attached to the eigenfunction corresponding to that eigenvalue in the superposition. It is the fact that the operator is self-adjoint that ensures the eigenvalues are real (and thus observable), and furthermore, that the eigenfunctions corresponding to the eigenvalues form a complete and orthogonal set of functions enabling quantum states to be represented as a superposition in the first place (i.e., an eigenfunction expansion akin to a Fourier series). The Sturm-Liouville theory of the 19th century has essentially this same structure and in fact Sturm-Liouville eigenvalue problems are important more generally in mathematical physics precisely because they frequently arise in attempting to solve commonly-encountered partial differential equations (e.g., Poisson’s equation, the diffusion equation, the wave equation, etc.), particularly when the method of separation of variables is employed.

I want to get an overview of Sturm-Liouville theory in the present note and will begin by considering a nice discussion of a vibrating string problem in Courant & Hilbert’s classic text, Methods of Mathematical Physics (Volume I). Although the problem is simple and the treatment in Courant & Hilbert a bit terse, it (remarkably) brings up a lot of the key features of Sturm-Liouville theory which apply more generally in a wide variety of physics problems. I will then consider Sturm-Liouville theory in a more general setting emphasising the role of the Sturm-Liouville differential operator, and finally I will illustrate further the occurrence of Sturm-Liouville systems in physics by looking at the eigenvalue problems encountered when solving Schrödinger’s equation for the hydrogen atom.

On page 287 of Volume I, Courant & Hilbert discuss the vibrating string problem. Their equation (12) is the one-dimensional wave equation

\frac{\partial^2 u}{\partial x^2} = \mu^2 \frac{\partial^2 u}{\partial t^2}

which (as usual) the authors are going to solve by using a separation of variables of the form

u(x, t) = v(x) g(t)

As Courant & Hilbert explain, the problem then involves finding the function v(x) by solving the second-order homogeneous linear differential equation

\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} + \lambda v = 0

subject to the boundary conditions

v(0) = v(\pi) = 0

Although not explicitly mentioned by Courant & Hilbert at this stage, equations (13) and (13a) in fact constitute a full-blown Sturm-Liouville eigenvalue problem. Despite being very simple, this setup captures many of the typical features encountered in a wide variety of such problems in physics. It is instructive to explore the text underneath equation (13a):

Not all these requirements can be fulfilled for arbitrary values of the constant \lambda.

… the boundary conditions can be fulfilled if and only if \lambda = n^2 is the square of an integer n.

To clarify this, we can try to solve (13) and (13a) for the three possible cases: \lambda < 0, \lambda = 0 and \lambda > 0. Suppose first that \lambda < 0. Then -\lambda > 0 and the auxiliary equation for (13) is

D^2 = - \lambda

\implies

D = \pm \sqrt{- \lambda}

Thus, we can write the general solution of (13) in this case as

v = \alpha e^{\sqrt{-\lambda} x} + \beta e^{-\sqrt{-\lambda} x} = A \mathrm{cosh} \big(\sqrt{-\lambda} x\big) + B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)

where A and B are constants to be determined from the boundary conditions. From the boundary condition v(0) = 0 we conclude that A = 0 so the equation reduces to

v = B \mathrm{sinh} \big(\sqrt{-\lambda} x\big)

But from the boundary condition v(\pi) = 0 we are forced to conclude that B = 0 since \mathrm{sinh} \big(\sqrt{-\lambda} \pi\big) \neq 0. Therefore there is only the trivial solution v(x) = 0 in the case \lambda < 0.

Next, suppose that \lambda = 0. Then equation (13) reduces to

\frac{\mathrm{d}^2 v}{\mathrm{d} x^2} = 0

\implies

v = A + Bx

From the boundary condition v(0) = 0 we must conclude that A = 0, and the boundary condition v(\pi) = 0 means we are also forced to conclude that B = 0. Thus, again, there is only the trivial solution v(x) = 0 in the case \lambda = 0.

We see that nontrivial solutions can only be obtained when \lambda > 0. In this case we have -\lambda < 0 and the auxiliary equation is

D^2 = - \lambda

\implies

D = \pm i \sqrt{\lambda}

Thus, we can write the general solution of (13) in this case as

v = \alpha e^{i \sqrt{\lambda} x} + \beta e^{- i \sqrt{\lambda} x} = A \mathrm{cos} \big(\sqrt{\lambda} x\big) + B \mathrm{sin} \big(\sqrt{\lambda} x\big)

where A and B are again to be determined from the boundary conditions. From the boundary condition v(0) = 0 we conclude that A = 0 so the equation reduces to

v = B \mathrm{sin} \big(\sqrt{\lambda} x\big)

But from the boundary condition v(\pi) = 0 we conclude that if B \neq 0 then we must have \sqrt{\lambda} = n where n = 1, 2, 3, \ldots. Thus, we find that for each n = 1, 2, 3, \ldots the eigenvalues of this Sturm-Liouville problem are \lambda_n = n^2, and the corresponding eigenfunctions are v = B \mathrm{sin}\big(n x\big). The coefficient B is undetermined and must be specified through some normalisation process, for example by setting the integral of v^2 between 0 and \pi equal to 1 and then finding the value of B consistent with this. Courant & Hilbert have (implicitly) simply set B = 1.

Some features of this solution are typical of Sturm-Liouville eigenvalue problems in physics more generally. For example, the eigenvalues are real (rather than complex) numbers, there is a minimum eigenvalue (\lambda_1 = 1) but not a maximum one, and for each eigenvalue there is a unique eigenfunction (up to a multiplicative constant). Also, importantly, the eigenfunctions here form a complete and orthogonal set of functions. Orthogonality refers to the fact that the integral of a product of any two distinct eigenfunctions over the interval (0, \pi) is zero, i.e.,

\int_0^{\pi} \mathrm{sin}(nx) \mathrm{sin}(mx) \mathrm{d} x = 0

for n \neq m, as can easily be demonstrated in the same way as in the theory of Fourier series. Completeness refers to the fact that over the interval (0, \pi) the infinite set of functions \mathrm{sin} (nx), n = 1, 2, 3, \ldots, can be used to represent any sufficiently well behaved function f(x) using a Fourier series of the form

f(x) = \sum_{n=1}^{\infty} a_n \mathrm{sin} (nx)
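As a numerical illustration of this completeness property (a sketch of my own, with f chosen arbitrarily), the following reconstructs f(x) = x(\pi - x) from the first 49 eigenfunctions, computing each coefficient from the orthogonality relation and the fact that \int_0^{\pi} \mathrm{sin}^2(nx) \mathrm{d}x = \pi/2:

```python
import numpy as np

# Reconstruct f(x) = x*(pi - x) on (0, pi) from the eigenfunctions sin(n*x).
N = 2000
dx = np.pi / N
x = (np.arange(N) + 0.5) * dx                  # midpoint grid on (0, pi)
f = x * (np.pi - x)

approx = np.zeros_like(x)
for n in range(1, 50):
    phi = np.sin(n * x)
    a_n = np.sum(f * phi) * dx / (np.pi / 2)   # a_n = (phi_n, f)/(phi_n, phi_n)
    approx += a_n * phi

print(np.max(np.abs(f - approx)))              # small: the series converges to f
```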

All of this is alluded to (without explicit explanation at this stage) in the subsequent part of this section of Courant & Hilbert’s text, where they go on to provide the general solution of the vibrating string problem.

The properties of completeness and orthogonality of the eigenfunctions are again a typical feature of the solutions of Sturm-Liouville eigenvalue problems more generally, and this is one of the main reasons why Sturm-Liouville theory is so important to the solution of physical problems involving differential equations. To get a better understanding of this, I will now develop Sturm-Liouville theory in a more general setting by starting with a standard second-order homogeneous linear differential equation of the form

\alpha(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \beta(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \gamma(x) y = 0

where the variable x is confined to an interval a \leq x \leq b.

Let

p(x) = \mathrm{exp} \bigg(\int \mathrm{d} x \frac{\beta(x)}{\alpha(x)}\bigg)

q(x) = \frac{\gamma(x)}{\alpha(x)} p(x)

Dividing the differential equation by \alpha(x) and multiplying through by p(x) we get

p(x) \frac{\mathrm{d}^2 y}{\mathrm{d} x^2} + \frac{\beta(x)}{\alpha(x)} p(x) \frac{\mathrm{d} y}{\mathrm{d} x} + \frac{\gamma(x)}{\alpha(x)} p(x) y = 0

\iff

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} y}{\mathrm{d} x} \bigg) + q(x) y = 0

\iff

L y = 0

where

L = \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} }{\mathrm{d} x} \bigg) + q(x)

is called the Sturm-Liouville differential operator. Thus, we see already that a wide variety of second-order differential equations encountered in physics will be able to be put into a form involving the operator L, so results concerning the properties of L will have wide applicability.
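As a small symbolic sketch of this construction (using Hermite’s equation y^{\prime\prime} - 2xy^{\prime} + \lambda y = 0 as an illustrative example of my own choosing), SymPy can compute the integrating factor p(x) directly:

```python
import sympy as sp

# Hermite's equation: alpha = 1, beta = -2x, gamma = lam.
x, lam = sp.symbols('x lam')
alpha, beta, gamma = sp.Integer(1), -2 * x, lam

# Integrating factor p(x) = exp( integral of beta/alpha dx ), as in the text.
p = sp.exp(sp.integrate(beta / alpha, x))
q = sp.simplify(gamma / alpha * p)

print(p)   # exp(-x**2)
print(q)   # lam*exp(-x**2): the equation becomes (exp(-x**2) y')' + lam*exp(-x**2) y = 0
```

Note that the term \lambda e^{-x^2} here plays the role of the \lambda w(x) term in the eigenvalue format introduced next, with q(x) = 0 and weight w(x) = e^{-x^2}.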

Using the Sturm-Liouville operator we can now write the defining differential equation of Sturm-Liouville theory in an eigenvalue-eigenfunction format that is very reminiscent of the setup in quantum mechanics outlined at the start of this note. The defining differential equation is

L \phi = - \lambda w \phi

where w(x) is a real-valued positive weight function and \lambda is an eigenvalue corresponding to the eigenfunction \phi. This differential equation is often written out in full as

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

with x \in [a, b]. In Sturm-Liouville problems, the functions p(x), q(x) and w(x) are specified at the start and, crucially, the function \phi is required to satisfy particular boundary conditions at a and b. The boundary conditions are a key aspect of each Sturm-Liouville problem; for a given form of the differential equation, different boundary conditions can produce very different problems. Solving a Sturm-Liouville problem involves finding the values of \lambda for which there exist non-trivial solutions of the defining differential equation above subject to the specified boundary conditions. The vibrating string problem in Courant & Hilbert (discussed above) is a simple example. We obtain the differential equation (13) in that problem by setting p(x) = 1, q(x) = 0 and w(x) = 1 in the defining Sturm-Liouville differential equation.

We would now like to prove that the eigenvalues in a Sturm-Liouville problem will always be real and that the eigenfunctions will form an orthogonal set of functions, as claimed earlier. To do this, we need to consider a few more developments. In Sturm-Liouville theory we can apply L to both real and complex functions, and a key role is played by the concept of the inner product of such functions. Using the notation f(x)^{*} to denote the complex conjugate of the function f(x), we define the inner product of two functions f and g over the interval a \leq x \leq b as

(f, g) = \int_a^b \mathrm{d} x f(x)^{*} g(x)

and we define the weighted inner product as

(f, g)_w = \int_a^b \mathrm{d} x  w(x) f(x)^{*} g(x)

where w(x) is the real-valued positive weight function mentioned earlier. A key result in the theory is Lagrange’s identity, which says that for any two complex-valued functions of a real variable u(x) and v(x), we have

v(Lu)^{*} - u^{*} Lv = \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

This follows from the form of L, since

v(Lu)^{*} - u^{*} Lv = v\bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + q(x) u^{*}\bigg] - u^{*} \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) + q(x) v\bigg]

= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= v \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) + \frac{\mathrm{d} v}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - u^{*} \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg) - \frac{\mathrm{d} u^{*}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) v \frac{\mathrm{d}u^{*}}{\mathrm{d} x} \bigg) - \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) u^{*} \frac{\mathrm{d} v}{\mathrm{d} x} \bigg)

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

Using the inner product notation, we can write Lagrange’s identity in an alternative form that reveals the crucial role played by the boundary conditions in a Sturm-Liouville problem. We have

(Lu, v) - (u, Lv) = \int_a^b (Lu)^{*} v \mathrm{d} x - \int_a^b u^{*} Lv \mathrm{d} x

= \int_a^b \frac{\mathrm{d}}{\mathrm{d} x} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg] \mathrm{d} x

= \int_a^b \mathrm{d} \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]

= \bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b

For some boundary conditions the final term here is zero and then we will have

(Lu, v) = (u, Lv)

When this happens, the operator in conjunction with the boundary conditions is said to be self-adjoint. As an example, a so-called regular Sturm-Liouville problem (one in which p(x) and w(x) are strictly positive on the whole interval [a, b]) involves solving the differential equation

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

subject to what are called separated boundary conditions, taking the form

A_1 \phi(a) + A_2 \phi^{\prime}(a) = 0

and

B_1 \phi(b) + B_2 \phi^{\prime}(b) = 0

In this case, the operator L is self-adjoint. To see this, suppose the functions u and v satisfy these boundary conditions. Then at a we have

A_1 u(a)^{*} + A_2 u^{\prime}(a)^{*} = 0

and

A_1 v(a) + A_2 v^{\prime}(a) = 0

from which (assuming for simplicity that A_2, u(a) and v(a) are nonzero) we can deduce that

\frac{u^{\prime}(a)^{*}}{u(a)^{*}} = -\frac{A_1}{A_2} = \frac{v^{\prime}(a)}{v(a)}

\implies

v(a) u^{\prime}(a)^{*} = u(a)^{*} v^{\prime}(a)

Similarly, at the boundary point b we find that

v(b) u^{\prime}(b)^{*} = u(b)^{*} v^{\prime}(b)

These results then imply

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0

so the operator L is self-adjoint as claimed. As another example, a singular Sturm-Liouville problem involves solving the same differential equation as in the regular problem, but on an interval where p(x) vanishes at a or at b (or both) while being positive for a < x < b. At a singular endpoint the separated boundary condition is replaced by the requirement that \phi and \phi^{\prime} remain bounded there; if p(x) does not vanish at one of the boundary points, then \phi is required to satisfy the same boundary condition at that point as in the regular problem. Since p vanishes precisely at the endpoints where the boundary values are otherwise unconstrained, we will have

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b = 0

in this case too, so the operator L will also be self-adjoint in the case of a singular Sturm-Liouville problem. As a final example, suppose the Sturm-Liouville problem involves solving the same differential equation as before, but with periodic boundary conditions of the form

\phi(a) = \phi(b)

\phi^{\prime}(a) = \phi^{\prime}(b)

and

p(a) = p(b)

Then if u and v are two functions satisfying these boundary conditions we will have

\bigg[p(x) \bigg(v \frac{\mathrm{d}u^{*}}{\mathrm{d}x} - u^{*}\frac{\mathrm{d}v}{\mathrm{d}x}\bigg) \bigg]_a^b

= p(b) \bigg(v(b) u^{\prime}(b)^{*} - u(b)^{*} v^{\prime}(b)\bigg) - p(a) \bigg(v(a) u^{\prime}(a)^{*} - u(a)^{*} v^{\prime}(a)\bigg)

= p(a) \bigg[\bigg(v(b) u^{\prime}(b)^{*} - v(a) u^{\prime}(a)^{*}\bigg) + \bigg(u(a)^{*} v^{\prime}(a) - u(b)^{*} v^{\prime}(b)\bigg)\bigg] = 0

So again, the operator L will be self-adjoint in the case of periodic boundary conditions. We will see later that the singular and periodic cases arise when attempting to solve Schrödinger’s equation for the hydrogen atom.

The key reason for focusing so much on the self-adjoint property of the operator L is that the eigenvalues of a self-adjoint operator are always real, and the eigenfunctions are orthogonal. Note that by orthogonality of the eigenfunctions in the more general context we mean that

(\phi_n, \phi_m)_w = \int_a^b \mathrm{d} x w(x) \phi_n(x)^{*} \phi_m(x) = 0

whenever \phi_n(x) and \phi_m(x) are eigenfunctions corresponding to two distinct eigenvalues.

To prove that the eigenvalues are always real, suppose that \phi(x) is an eigenfunction corresponding to an eigenvalue \lambda. Then we have

L \phi = - \lambda w \phi

and so

(L \phi, \phi) = (- \lambda w \phi, \phi) = \int_a^b (- \lambda w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*} \int_a^b (w \phi)^{*} \phi \mathrm{d} x = -\lambda^{*}\int_a^b \mathrm{d}x w(x)|\phi(x)|^2

But we also have

(\phi, L \phi) = (\phi, - \lambda w \phi) = \int_a^b \phi^{*}(- \lambda w \phi) \mathrm{d} x = -\lambda \int_a^b \phi^{*} (w \phi) \mathrm{d} x = -\lambda\int_a^b \mathrm{d}x w(x)|\phi(x)|^2

Therefore if the operator is self-adjoint we can write

(L \phi, \phi) - (\phi, L \phi) = (\lambda - \lambda^{*}) \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 = 0

\implies

\lambda = \lambda^{*}

since \int_a^b \mathrm{d}x w(x)|\phi(x)|^2 > 0, so the eigenvalues must be real. In particular, this must be the case for regular and singular Sturm-Liouville problems, and for Sturm-Liouville problems involving periodic boundary conditions.

To prove that the eigenfunctions are orthogonal, let \phi(x) and \psi(x) denote two eigenfunctions corresponding to distinct eigenvalues \lambda and \mu respectively. Then we have

L \phi = - \lambda w \phi

L \psi = - \mu w \psi

and so, using the self-adjoint property and the fact just proved that the eigenvalues are real, we can write

(L \phi, \psi) - (\phi, L \psi) = \int_a^b (- \lambda w \phi)^{*} \psi \mathrm{d} x - \int_a^b \phi^{*} (- \mu w \psi) \mathrm{d} x

= (\mu - \lambda) \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0

Since the eigenvalues are distinct, the only way this can happen is if

(\phi, \psi)_w = \int_a^b \mathrm{d}x w(x)\phi(x)^{*} \psi(x) = 0

so the eigenfunctions must be orthogonal as claimed.
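Both results are easy to see numerically (a minimal finite-difference sketch of my own; discretisation is not discussed in the text). For the vibrating string problem the discretised operator is a symmetric matrix, so its eigenvalues come out real (close to -n^2) and its eigenvectors orthogonal:

```python
import numpy as np

# Discretise L = d^2/dx^2 on (0, pi) with phi(0) = phi(pi) = 0 using the
# standard 3-point finite difference; the resulting matrix is symmetric.
N = 500
h = np.pi / N
L = (np.diag(-2.0 * np.ones(N - 1)) +
     np.diag(np.ones(N - 2), 1) +
     np.diag(np.ones(N - 2), -1)) / h**2

evals, evecs = np.linalg.eigh(L)          # eigh is for symmetric matrices
print(np.sort(-evals)[:4])                # ~ [1, 4, 9, 16], i.e. lambda_n = n^2
print(abs(evecs[:, 0] @ evecs[:, 1]))     # ~ 0: eigenvectors are orthogonal
```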

In addition to being orthogonal, the eigenfunctions \phi_n(x), n = 1, 2, 3, \dots, of a Sturm-Liouville problem with specified boundary conditions also form a complete set of functions (I will not prove this here), which means that any sufficiently well-behaved function f(x) for which \int_a^b\mathrm{d} x |f(x)|^2 exists can be represented by a Fourier series of the form

f(x) = \sum_{n=1}^{\infty} a_n \phi_n(x)

for x \in [a, b], where the coefficients a_n are given by the formula

a_n = \frac{(\phi_n, f)_w}{(\phi_n, \phi_n)_w} = \frac{\int_a^b \mathrm{d}x w(x) \phi_n(x)^{*} f(x)}{\int_a^b \mathrm{d}x w(x) |\phi_n(x)|^2}

It is the completeness and orthogonality of the eigenfunctions that makes Sturm-Liouville theory so useful in solving linear differential equations, because (for example) it means that the solutions of many second-order inhomogeneous linear differential equations of the form

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)

with suitable boundary conditions can be expressed as a linear combination of the eigenfunctions of the corresponding Sturm-Liouville problem

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + \big(q(x) + \lambda w(x)\big) \phi = 0

with the same boundary conditions. To illustrate this, suppose this Sturm-Liouville problem with boundary conditions \phi(a) = \phi(b) = 0 has an infinite set of eigenvalues \lambda_k and corresponding eigenfunctions \phi_k(x), k = 1, 2, 3, \dots, which are orthogonal and form a complete set. We will assume that the solution of the inhomogeneous differential equation above is an infinite series of the form

\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)

where the coefficients a_k are constants, and we will find these coefficients using the orthogonality of the eigenfunctions. Since for each k it is true that

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k = - \lambda_k w(x) \phi_k

we can write

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q \phi

= \frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \sum_{k = 1}^{\infty} a_k \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \sum_{k=1}^{\infty} a_k \phi_k

= \sum_{k=1}^{\infty} a_k \bigg[\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p \frac{\mathrm{d} \phi_k}{\mathrm{d} x} \bigg) + q \phi_k\bigg]

= \sum_{k=1}^{\infty} a_k\big[- \lambda_k w(x) \phi_k\big]

= - \sum_{k=1}^{\infty} a_k \lambda_k w(x) \phi_k

Thus, in the inhomogeneous equation

\frac{\mathrm{d}}{\mathrm{d} x} \bigg(p(x) \frac{\mathrm{d} \phi}{\mathrm{d} x} \bigg) + q(x) \phi = F(x)

we can put

F(u) = - \sum_{k=1}^{\infty} a_k \lambda_k w(u) \phi_k(u)

To find the mth coefficient a_m we can multiply both sides by \phi_m(u)^{*} and integrate. By orthogonality, all the terms in the sum on the right will vanish except the one involving \phi_m(u). We will get

\int_a^b \phi_m(u)^{*} F(u) \mathrm{d}u = - \int_a^b a_m \lambda_m w(u) \phi_m(u)^{*}\phi_m(u) \mathrm{d} u = -a_m \lambda_m (\phi_m, \phi_m)_w

\implies

a_m = -\int_a^b \frac{\phi_m(u)^{*} F(u)}{\lambda_m (\phi_m, \phi_m)_w}\mathrm{d} u

Having found a formula for the coefficients a_k, we can now write the solution of the original inhomogeneous differential equation as

\phi(x) = \sum_{k = 1}^{\infty} a_k \phi_k(x)

= \sum_{k = 1}^{\infty} \bigg(-\int_a^b \frac{\phi_k(u)^{*} F(u)}{\lambda_k (\phi_k, \phi_k)_w}\mathrm{d} u\bigg) \phi_k(x)

= \int_a^b \mathrm{d} u \bigg(-\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}\bigg) F(u)

= \int_a^b \mathrm{d} u G(x, u) F(u)

where

G(x, u) \equiv -\sum_{k = 1}^{\infty} \frac{\phi_k(u)^{*} \phi_k(x)}{\lambda_k (\phi_k, \phi_k)_w}
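Here is a minimal numerical check of this Green’s function formula (an illustrative example of my own, not from the text) for the simplest case p = w = 1, q = 0 on (0, \pi), where \phi_k(x) = \mathrm{sin}(kx), \lambda_k = k^2 and (\phi_k, \phi_k)_w = \pi/2; solving \phi^{\prime\prime} = F with F(u) = 1 should reproduce the exact solution \phi(x) = x(x - \pi)/2:

```python
import numpy as np

# Build G(x, u) from the eigenfunction expansion and apply it to F(u) = 1.
x = np.linspace(0.0, np.pi, 401)
du = x[1] - x[0]
F = np.ones_like(x)

G = np.zeros((x.size, x.size))
for k in range(1, 200):
    G -= np.outer(np.sin(k * x), np.sin(k * x)) / (k**2 * (np.pi / 2))

phi = G @ F * du                                    # phi(x) = int G(x,u) F(u) du
print(np.max(np.abs(phi - x * (x - np.pi) / 2)))    # small: matches x*(x - pi)/2
```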

To conclude this note, I want to go back to a previous note in which I explored in detail the solution of Schrödinger’s equation for the hydrogen atom by the method of separation of variables. This approach reduced Schrödinger’s partial differential equation to a set of three uncoupled ordinary differential equations, which we can now see are in fact Sturm-Liouville problems. As discussed in my previous note, Schrödinger’s three-dimensional equation for the hydrogen atom can be written in spherical polar coordinates as

\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0

and after solving this by the usual separation of variables approach starting from the assumption that the \psi function can be expressed as a product

\psi(r, \theta, \phi) = R(r) \Phi(\phi) \Theta(\theta)

we end up with an equation for R (the radial equation) of the form

\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0

and equations for \Phi and \Theta of the forms

\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0

and

\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0

respectively. Taking each of these in turn, we first observe that the radial equation (after multiplying through by r^2) is of the Sturm-Liouville form with p(r) = r^2, with the energy E playing the role of the eigenvalue. The variable r can range between 0 and \infty and the boundary conditions are formulated in such a way that the solutions of the radial equation remain bounded as r \rightarrow 0 and go to zero as r \rightarrow \infty. Furthermore, since p(0) = 0, the radial equation is a singular Sturm-Liouville problem. Next, we observe that the equation for \Phi is essentially the same as equation (13) for the vibrating string in the extract from Courant & Hilbert discussed at the start of this note. The azimuth angle \phi can take any value in (-\infty, \infty) but the function \Phi must take a single value at each point in space (since this is a required property of the quantum wave function of which \Phi is a constituent). It follows that the function \Phi must be periodic, since it must take the same value at \phi and \phi + 2\pi for any given \phi. This periodicity implies the conditions \Phi(0) = \Phi(2 \pi) and \Phi^{\prime}(0) = \Phi^{\prime}(2\pi). Furthermore, we have p(\phi) = 1 for all \phi. Thus, the equation for \Phi is a Sturm-Liouville problem with periodic boundary conditions. Finally, as discussed in my previous note, writing k = m^2 (with m an integer, as required by the periodicity of \Phi), the \Theta equation can be rewritten as

(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

\iff

\frac{d}{d x}\bigg((1 - x^2) \frac{d \Theta}{d x}\bigg) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0

where x = \cos \theta and thus -1 \leq x \leq 1. This is a Sturm-Liouville problem with p(x) = 1 - x^2 and the boundary conditions are given by the requirement that \Theta(\theta) should remain bounded for all x. Since p(x) = 0 at both ends of the interval [-1, 1], this equation can be classified as a singular Sturm-Liouville problem. The eigenvalue is \lambda in this equation.
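As a quick numerical check of the orthogonality that this singular problem guarantees (a sketch of my own, using SciPy’s associated Legendre function lpmv), the bounded solutions are the associated Legendre functions P_l^m(x) with \lambda = l(l+1), and distinct degrees l of the same order m are orthogonal on [-1, 1] with weight w(x) = 1:

```python
import numpy as np
from scipy.special import lpmv
from scipy.integrate import quad

# Inner product of P_2^1 and P_3^1 over [-1, 1] with weight w(x) = 1.
m = 1
inner, _ = quad(lambda x: lpmv(m, 2, x) * lpmv(m, 3, x), -1.0, 1.0)
print(abs(inner))   # ~ 0: eigenfunctions with distinct eigenvalues are orthogonal
```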

 

Study of a proof of Noether’s theorem and its application to conservation laws in physics

While I have for a long time been aware of Noether’s theorem and its relevance to symmetry and conservation laws in physics, I have only recently taken the time to closely explore its mathematical proof. In the present note I want to record some notes I made on the mathematical nuances involved in a proof of Noether’s theorem and the mathematical relevance of the theorem to some simple conservation laws in classical physics, namely, the conservation of energy and the conservation of linear momentum. Noether’s Theorem has important applications in a wide range of classical mechanics problems as well as in quantum mechanics and Einstein’s relativity theory. It is also used in the study of certain classes of partial differential equations that can be derived from variational principles.

The theorem was first published by Emmy Noether in 1918. An English translation of the full original paper is available here. An interesting book by Yvette Kosmann-Schwarzbach also presents an English translation of Noether’s 1918 paper and discusses in detail the history of the theorem’s development and its impact on theoretical physics in the 20th Century. (Kosmann-Schwarzbach, Y, 2011, The Noether Theorems: Invariance and Conservation Laws in the Twentieth Century. Translated by Bertram Schwarzbach. Springer). At the time of writing, the book is freely downloadable from here.

Mathematical setup of Noether’s theorem

The case I explore in detail here is that of a variational calculus functional of the form

S[y] = \int_a^b \mathrm{d}x F(x, y, y^{\prime})

where x is a single independent variable and y = (y_1, y_2, \ldots, y_n) is a vector of n dependent variables. The functional has stationary paths defined by the usual Euler-Lagrange equations of variational calculus. Noether’s theorem concerns how the value of this functional is affected by families of continuous transformations of the dependent and independent variables (e.g., translations, rotations) that are defined in terms of one or more real parameters. The case I explore in detail here involves transformations defined in terms of only a single parameter, call it \delta. The transformations can be represented in general terms as

\overline{x} = \Phi(x, y, y^{\prime}; \delta)

\overline{y}_k = \Psi_k(x, y, y^{\prime}; \delta)

for k = 1, 2, \ldots, n. The functions \Phi and \Psi_k are assumed to have continuous first derivatives with respect to all the variables, including the parameter \delta. Furthermore, the transformations must reduce to identities when \delta = 0, i.e.,

x \equiv \Phi(x, y, y^{\prime}; 0)

y_k \equiv \Psi_k(x, y, y^{\prime}; 0)

for k = 1, 2, \ldots, n. As concrete examples, translations and rotations are continuous differentiable transformations that can be defined in terms of a single parameter and that reduce to identities when the parameter takes the value zero.

Noether’s theorem concerns infinitesimally small changes in the dependent and independent variables, so we can assume |\delta| \ll 1 and then use perturbation theory to prove the theorem. Treating \overline{x} and \overline{y}_k as functions of \delta and Taylor-expanding them about \delta = 0 we get

\overline{x}(\delta) = \overline{x}(0) + \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

where

\phi(x, y, y^{\prime}) \equiv \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}

and

\overline{y}_k (\delta) = \overline{y}_k (0) + \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

where

\psi_k (x, y, y^{\prime}) \equiv \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}

for k = 1, 2, \ldots, n.

Noether’s theorem then says that whenever the functional S[y] is invariant under the above family of transformations, i.e., whenever

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

for all c and d such that a \leq c < d \leq b, where \overline{c} = \Phi(c, y(c), y^{\prime}(c); \delta) and \overline{d} = \Phi(d, y(d), y^{\prime}(d); \delta), then for each stationary path of S[y] the following equation holds:

\sum_{k = 1}^n \frac{\partial F}{\partial y_k^{\prime}}\psi_k + \bigg(F - \sum_{k = 1}^n y_k^{\prime}\frac{\partial F}{\partial y_k^{\prime}}\bigg)\phi = \mathrm{constant}

As illustrated below, this remarkable equation encodes a number of conservation laws in physics: conservation of energy, of linear momentum and of angular momentum follow from the invariance of the relevant action functionals under translations in time, translations in space, and rotations in space respectively. Thus, Noether’s theorem is often expressed as the statement that whenever a system has a continuous symmetry there is a corresponding quantity whose value is conserved.

Application of the theorem to familiar conservation laws in classical physics

It is, of course, not necessary to use the full machinery of Noether’s theorem for simple examples of conservation laws in classical physics. The theorem is most useful in unfamiliar situations in which it can reveal conserved quantities which were not previously known. However, going through the motions in simple cases clarifies how the mathematical machinery works in more sophisticated and less familiar situations.

To obtain the law of the conservation of energy in the simplest possible scenario, consider a particle of mass m moving along a straight line in a time-invariant potential field V(x) with position at time t given by the function x(t). The Lagrangian formulation of mechanics then says that the path followed by the particle will be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2 - V(x)\big)

The Euler-Lagrange equation for this functional would give Newton’s second law as the equation governing the particle’s motion. With regard to demonstrating energy conservation, we notice that the Lagrangian, which is more generally of the form L(t, x, \dot{x}) when there is a time-varying potential, here takes the simpler form L(x, \dot{x}) because there is no explicit dependence on time. Therefore we might expect the functional to be invariant under translations in time, and thus Noether’s theorem to hold. We will verify this. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \phi + O(\delta^2) \equiv t + \delta

and

\overline{x}(\delta) = x + \delta \cdot 0 + O(\delta^2) \equiv x

From the first equation we see that \phi = 1 in the case of a simple translation in time by an amount \delta, and from the second equation we see that \psi = 0, which simply reflects the fact that we are only translating in the time direction. The invariance of the functional under these transformations can easily be demonstrated by writing

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\overline{x}, \dot{\overline{x}}) = \int_{\overline{0}-\delta}^{\overline{T}-\delta} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t L(x, \dot{x})

where the limits in the second integral follow from the change of the time variable from \overline{t} to t. Thus, Noether’s theorem holds and with \phi = 1 and \psi = 0 the fundamental equation in the theorem reduces to

L - \dot{x}\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

Evaluating the terms on the left-hand side we get

\frac{1}{2}m\dot{x}^2 - V(x) - \dot{x} m\dot{x} =\mathrm{constant}

\iff

\frac{1}{2}m\dot{x}^2 + V(x) = E = \mathrm{constant}

which is of course the statement of the conservation of energy.
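As a quick numerical sanity check (my own sketch, with an arbitrary illustrative potential V(x) = x^4/4), integrating Newton’s second law for a time-invariant potential shows the energy staying constant to within the integrator’s tolerance:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Particle of mass m in the time-invariant potential V(x) = x**4/4,
# so Newton's second law reads x'' = -V'(x)/m = -x**3/m.
m = 2.0
rhs = lambda t, s: [s[1], -s[0]**3 / m]
sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0], rtol=1e-10, atol=1e-10)

x, v = sol.y
E = 0.5 * m * v**2 + x**4 / 4
print(E.max() - E.min())   # ~ 0: energy is conserved along the trajectory
```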

To obtain the law of conservation of linear momentum in the simplest possible scenario, assume now that the above particle is moving freely in the absence of any potential field, so V(x) = 0 and the only energy involved is kinetic energy. The path followed by the particle will now be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(\dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2\big)

The Euler-Lagrange equation for this functional would give Newton’s first law as the equation governing the particle’s motion (constant velocity in the absence of any forces). To get the law of conservation of linear momentum we will consider a translation in space rather than time, and check that the action functional is invariant under such translations. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \cdot 0 + O(\delta^2) \equiv t

and

\overline{x}(\delta) = x + \delta \psi + O(\delta^2) \equiv x + \delta

From the first equation we see that \phi = 0 reflecting the fact that we are only translating in the space direction, and from the second equation we see that \psi = 1 in the case of a simple translation in space by an amount \delta. The invariance of the functional under these transformations can easily be demonstrated by noting that \dot{\overline{x}} = \dot{x}, so we can write

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\dot{\overline{x}}) = \int_0^{T} \mathrm{d}t L(\dot{x})

since the limits of integration are not affected by the translation in space. Thus, Noether’s theorem holds and with \phi = 0 and \psi = 1 the fundamental equation in the theorem reduces to

\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

\iff

m\dot{x} = \mathrm{constant}

This is, of course, the statement of the conservation of linear momentum.

Proof of Noether’s theorem

To prove Noether’s theorem we will begin with the transformed functional

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

We will substitute into this the linearised forms of the transformations, namely

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

and

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

for k = 1, 2, \ldots, n, and then expand to first order in \delta. Note that the integration limits are, to first order in \delta,

\overline{c} = c + \delta \phi(c)

and

\overline{d} = d + \delta \phi(d)

Using the linearised forms of the transformations and writing \psi = (\psi_1, \psi_2, \ldots, \psi_n) we get

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \frac{\mathrm{d}x}{\mathrm{d}\overline{x}}

\frac{\mathrm{d}\overline{x}}{\mathrm{d}x} = 1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}

Inverting the second equation we get

\frac{\mathrm{d}x}{\mathrm{d}\overline{x}} = \big(1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big)^{-1} = 1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} + O(\delta^2)

Using this in the first equation we find, to first order in \delta,

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \big(1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big) = \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)

Making the necessary substitutions we can then write the transformed functional as

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

= \int_{\overline{c}-\delta \phi(c)}^{\overline{d}-\delta \phi(d)} \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

Treating F as a function of \delta and expanding about \delta = 0 to first order we get

F(\delta) = F(0) + \delta \frac{\partial F}{\partial \delta}\big|_{\delta = 0}

= F(x, y, y^{\prime}) + \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

Then using the expression for \frac{\mathrm{d}\overline{x}}{\mathrm{d}x} above, the transformed functional becomes

\int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \int_c^d \mathrm{d}x \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F(x, y, y^{\prime}) + O(\delta^2)

Ignoring the second order term in \delta we can thus write

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \delta \int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg)

Since the functional is invariant, however, this implies

\int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg) = 0

We now manipulate this equation by integrating the terms involving \frac{\mathrm{d}\phi}{\mathrm{d}x} and \frac{\mathrm{d}\psi_k}{\mathrm{d}x} by parts. We get

\int_c^d \mathrm{d}x \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} = \bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg]_c^d

- \int_c^d \mathrm{d}x \phi \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

and

\int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} = \bigg[\sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d - \int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\psi_k

Substituting these into the equation gives

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \bigg(\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can manipulate this equation further by expanding the integrand in the second term on the left-hand side. We get

\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

= \frac{\partial F}{\partial x} - \frac{\partial F}{\partial x} - \sum_{k=1}^n \frac{\partial F}{\partial y_k}y^{\prime}_k - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n y^{\prime}_k \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)

= \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

Thus, the equation becomes

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can now see at a glance that the second and third terms on the left-hand side must vanish because of the Euler-Lagrange expressions appearing in the brackets (which are identically zero on stationary paths). Thus we arrive at the equation

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d = 0

which shows that the expression inside the square brackets takes the same value at any two points c and d, i.e., it is constant along the stationary path, as per Noether’s theorem.

Simple variational setups yielding Newton’s Second Law and Schrödinger’s equation

It is a delightful fact that one can get both the fundamental equation of classical mechanics (Newton’s Second Law) and the fundamental equation of quantum mechanics (Schrödinger’s equation) by solving very simple variational problems based on the familiar conservation of mechanical energy equation

K + U = E

In the present note I want to briefly set these out emphasising the common underlying structure provided by the conservation of mechanical energy and the calculus of variations. The kinetic energy K will be taken to be

K = \frac{1}{2}m \dot{x}^2 = \frac{p^2}{2m}

where \dot{x} = \frac{\mathrm{d}x}{\mathrm{d}t} is the particle’s velocity, p = m\dot{x} is its momentum, and m is its mass. The potential energy U will be regarded as some function of x only.

To obtain Newton’s Second Law we find the stationary path followed by the particle with respect to the functional

S[x] = \int_{t_1}^{t_2} L(t, x, \dot{x}) dt  = \int_{t_1}^{t_2} (K - U) dt

The function L(t, x, \dot{x}) = K - U is usually termed the ‘Lagrangian’ in classical mechanics. The functional S[x] is usually called the ‘action’. The Euler-Lagrange equation for this calculus of variations problem is

\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\big(\frac{\partial L}{\partial \dot{x}}\big) = 0

and this is Newton’s Second Law in disguise! We have

\frac{\partial L}{\partial x} = -\frac{\mathrm{d}U}{\mathrm{d}x} \equiv F

\frac{\partial L}{\partial \dot{x}} = m\dot{x} \equiv p

and

\frac{\mathrm{d}}{\mathrm{d}t} \big(\frac{\partial L}{\partial \dot{x}}\big) = \frac{\mathrm{d}p}{\mathrm{d}t} = m\ddot{x} \equiv ma

so substituting these into the Euler-Lagrange equation we get Newton’s Second Law, F = ma.
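
For readers who like to verify such manipulations mechanically, here is a minimal SymPy sketch (my own, not part of the original note) that derives the Euler-Lagrange equation for L = \frac{1}{2}m\dot{x}^2 - U(x) symbolically, using SymPy's euler_equations helper:

import sympy as sp
from sympy.calculus.euler import euler_equations

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')
U = sp.Function('U')

# Lagrangian L = K - U for a particle of mass m in a potential U(x)
L = sp.Rational(1, 2) * m * sp.diff(x(t), t)**2 - U(x(t))

# euler_equations returns dL/dx - d/dt(dL/dxdot) = 0
eom = euler_equations(L, x(t), t)[0]
print(eom)   # equivalent to m*xddot = -dU/dx, i.e. F = ma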

To obtain Schrödinger’s equation we introduce a function

\psi(x) = \exp\big(\frac{1}{\hbar}\int p \, \mathrm{d}x\big)

where p = m \dot{x} is again the momentum of the particle and \hbar is the reduced Planck constant from quantum mechanics. (Note that \int p \, \mathrm{d}x has units of length^2 mass time^{-1}, which are also the units of \hbar, so dividing by \hbar makes the argument of the exponential dimensionless, as it must be). We then have

\ln \psi = \frac{1}{\hbar}\int p \, \mathrm{d}x

and differentiating both sides gives

\frac{\psi^{\prime}}{\psi} = \frac{1}{\hbar} p

so

p^2 = \hbar^2 \big(\frac{\psi^{\prime}}{\psi}\big)^2

Therefore we can write the kinetic energy as

K = \frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2

and putting this into the conservation of mechanical energy equation gives

\frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2 + U = E

\iff

\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2 = 0

We now find the function \psi which is a stationary path of the functional

T[\psi] = \int_{-\infty}^{\infty} M(x, \psi, \psi^{\prime}) \mathrm{d}x = \int_{-\infty}^{\infty}  \big(\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2\big)\mathrm{d}x

The Euler-Lagrange equation for this calculus of variations problem is

\frac{\partial M}{\partial \psi} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial M}{\partial \psi^{\prime}}\big) = 0

and this is Schrödinger’s equation in disguise! We have

\frac{\partial M}{\partial \psi} = 2(U - E)\psi

\frac{\partial M}{\partial \psi^{\prime}} = \frac{\hbar^2}{m} \psi^{\prime}

and

\frac{\mathrm{d}}{\mathrm{d}x} \big(\frac{\partial M}{\partial \psi^{\prime}}\big) = \frac{\hbar^2}{m} \psi^{\prime \prime}

so substituting these into the Euler-Lagrange equation we get

2(U - E) \psi - \frac{\hbar^2}{m} \psi^{\prime \prime} = 0

\iff

-\frac{\hbar^2}{2m} \frac{\mathrm{d}^2 \psi}{\mathrm{d} x^2} + U \psi = E \psi

and this is the (time-independent) Schrödinger equation for a particle of mass m with fixed total energy E in a potential U(x) on the line -\infty < x < \infty.
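
The same mechanical check works for this second functional. A minimal SymPy sketch (again my own, mirroring the check in the previous section) recovers the time-independent Schrödinger equation from M:

import sympy as sp
from sympy.calculus.euler import euler_equations

x = sp.symbols('x')
hbar, m, E = sp.symbols('hbar m E', positive=True)
psi = sp.Function('psi')
U = sp.Function('U')

# Integrand M(x, psi, psi') of the functional T[psi]
M = hbar**2 / (2 * m) * sp.diff(psi(x), x)**2 + (U(x) - E) * psi(x)**2

eq = euler_equations(M, psi(x), x)[0]
print(sp.simplify(eq))   # equivalent to -(hbar^2/2m) psi'' + U psi = E psi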

Derivation of Euler’s equations of motion for a perfect fluid from Newton’s second law

Having read a number of highly technical derivations of Euler's equations of motion for a perfect fluid, I feel that the mathematical meanderings tend to obscure the underlying physics. In this note I want to explore the derivation from a more physically intuitive point of view. The dynamics of a fluid element of mass m are governed by Newton's second law, which says that the vector sum of the forces acting on the fluid element is equal to the mass of the element times its acceleration. Thus,

\vec{F} = m \vec{a}

The net force \vec{F} can be decomposed into two distinct types of forces, so-called body forces \vec{W} that act on the entire fluid element (e.g., the fluid element’s weight due to gravity) and stresses \vec{S} such as pressures and shears that act upon the surfaces enclosing the fluid element. For the purposes of deriving the differential form of Euler’s equation we will focus on the net force per unit volume acting on the fluid element, \vec{f}, which we will decompose into a weight per unit volume \vec{w} and a net stress force per unit volume \vec{s}. The weight per unit volume is simply obtained as

\vec{w} = \rho \vec{g}

where

\rho = \frac{m}{V}

is the mass density of the fluid (i.e., mass per unit volume) and \vec{g} is the acceleration due to gravity. In index notation, the equation for the i-th component of the weight per unit volume is

w^i = \rho g^i

The net stress force per unit volume, \vec{s}, is a little more complicated to derive since it involves the rank-2 stress tensor \tau^{ij}. This tensor contains nine components and is usually represented as a 3 \times 3 symmetric matrix. In Cartesian coordinates the components along the main diagonal, namely \tau^{xx}, \tau^{yy} and \tau^{zz}, represent normal stresses, i.e., forces per unit area acting orthogonally to the planes whose normal vector is identified by the first superscript, as indicated in the diagram below. (Note that a stress is a force per unit area, so to convert a stress tensor component \tau^{ij} into a force it would be necessary to multiply it by the area over which it acts).

In the diagram each normal stress is shown as a tension, i.e., a normal stress pointing away from the surface. When a normal stress points towards the surface it acts upon, it is called a pressure.

The off-diagonal components of the stress tensor represent shear stresses, i.e., forces per unit area that point along the sides of the fluid element, parallel to these sides rather than normal to them. These shear stresses are shown in the following diagram.

Shear stresses only arise when there is some kind of friction in the fluid. A perfect fluid is friction-free so there are no shear stresses. Euler’s equation only applies to perfect fluids so for the derivation of the equation we can ignore the off-diagonal components of the stress tensor.

The normal stresses along the main diagonal are usually written as

\tau^{xx} = - p^x

\tau^{yy} = - p^y

\tau^{zz} = - p^z

where p stands for pressure and the negative sign reflects the fact that a pressure points in the opposite direction to a tension.

In a perfect fluid the pressure is isotropic, i.e., the same in all directions, so we have

p^x = p^y = p^z = p

Therefore the stress tensor of a perfect fluid with isotropic pressure reduces to

\tau^{ij} = -p \delta^{ij}

where \delta^{ij} is the Kronecker delta (and may be thought of here as the metric tensor of Cartesian 3-space).

Now suppose we consider the net stress (force per unit area) in the y-direction of an infinitesimal volume element.

The stress on the right-hand face can be approximated using a Taylor series expansion as being equal to the stress on the left plus a differential adjustment based on its gradient and the length dy. If we take the stress on the right to be pointing in the positive direction and the one on the left as pointing in the negative (opposite) direction, the net stress in the y-direction is given by

\tau^{yy} + \frac{\partial \tau^{yy}}{\partial y} dy - \tau^{yy} = \frac{\partial \tau^{yy}}{\partial y} dy

Similarly, the net stresses in the x and z-directions are

\frac{\partial \tau^{xx}}{\partial x} dx

and

\frac{\partial \tau^{zz}}{\partial z} dz

To convert these net stresses to net forces we multiply each one by the area on which it acts. Thus, the net forces on the fluid element (in vector form) are

\big(\frac{\partial \tau^{xx}}{\partial x} dxdydz\big) \vec{i}

\big(\frac{\partial \tau^{yy}}{\partial y} dxdydz\big) \vec{j}

\big(\frac{\partial \tau^{zz}}{\partial z} dxdydz\big) \vec{k}

The total net force on the fluid element is then

\big(\frac{\partial \tau^{xx}}{\partial x} \  \vec{i} + \frac{\partial \tau^{yy}}{\partial y} \  \vec{j} + \frac{\partial \tau^{zz}}{\partial z} \  \vec{k}\big) dxdydz

Switching from tensions to pressures using \tau^{ij} = -p \delta^{ij} and dividing through by the volume dxdydz we finally get the net stress force per unit volume to be

\vec{s} = -\big(\frac{\partial p}{\partial x} \  \vec{i} + \frac{\partial p}{\partial y} \  \vec{j} + \frac{\partial p}{\partial z} \  \vec{k}\big)

In index notation, the equation for the i-th component of this net stress force per unit volume is written as

s^i = -\frac{\partial p}{\partial x^i}

We have now completed the analysis of the net force on the left-hand side of Newton’s second law.

On the right-hand side of Newton’s second law we have mass times acceleration, where acceleration is the change in velocity with time. To obtain an expression for this we observe that the velocity of a fluid element may change for two different reasons. First, the velocity field may vary over time at each point in space. Second, the velocity may vary from point to point in space (at any given time). Thus, we consider the velocity field to be a function of the time coordinate as well as the three spatial coordinates, so

\vec{v} = \vec{v}(t, x, y, z) = v^x(t, x, y, z) \ \vec{i} + v^y(t, x, y, z) \ \vec{j} + v^z(t, x, y, z) \ \vec{k}

Considering the i-th component of this velocity field, the total differential is

dv^i = \frac{\partial v^i}{\partial t} \ dt + \frac{\partial v^i}{\partial x} \ dx + \frac{\partial v^i}{\partial y} \ dy + \frac{\partial v^i}{\partial z} \ dz

so the total derivative with respect to time is

\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^x \frac{\partial v^i}{\partial x} + v^y \frac{\partial v^i}{\partial y} + v^z \frac{\partial v^i}{\partial z}

where I have used

v^x = \frac{dx}{dt}

v^y = \frac{dy}{dt}

v^z = \frac{dz}{dt}

We can write this more compactly using the Einstein summation convention as

\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}

This is then the i-th component of the acceleration vector on the right-hand side of Newton’s second law. In component form, therefore, we can write mass times acceleration per unit volume for the fluid element as

\rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)

This completes the analysis of the mass times acceleration term on the right-hand side of Newton’s second law.

In per-unit-volume form, Newton’s second law for a fluid element is

\vec{w} + \vec{s} = \rho \vec{a}

and writing this in the component forms derived above we get the standard form of Euler’s equations of motion for a perfect fluid:

\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)
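
To make the convective term on the right-hand side concrete, here is a small NumPy sketch (my own illustration, with an arbitrarily chosen velocity field, not part of the derivation). It evaluates the acceleration v^j \partial v^i / \partial x^j for the steady rigid-rotation field \vec{v} = (-y, x), whose exact convective acceleration is the centripetal vector (-x, -y); the \partial v^i / \partial t term vanishes because the field is steady:

import numpy as np

xs = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(xs, xs, indexing='ij')
vx, vy = -Y, X                            # steady rigid-rotation velocity field

# Central-difference gradients; axis 0 is x, axis 1 is y (indexing='ij')
dvx_dx, dvx_dy = np.gradient(vx, xs, xs)
dvy_dx, dvy_dy = np.gradient(vy, xs, xs)

# Convective acceleration v^j dv^i/dx^j
ax = vx * dvx_dx + vy * dvx_dy
ay = vx * dvy_dx + vy * dvy_dy

print(np.allclose(ax, -X), np.allclose(ay, -Y))   # True True: centripetal (-x, -y)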


On the classification of singularities, with an application to non-rotating black holes

In mathematics a singularity is a point at which a mathematical object (e.g., a function) is not defined or behaves `badly' in some way. Singularities can be isolated (e.g., removable singularities, poles and essential singularities) or nonisolated (e.g., branch cuts). For teaching purposes, I want to delve into some of the mathematical aspects of isolated singularities in this note using simple examples involving the complex sine function. I will not consider nonisolated singularities in detail. These are briefly discussed with some examples on this Wikipedia page. I will also briefly look at how singularities arise in the context of black hole physics in a short final section.

[Diagram: a punctured open disc centred at \alpha]

Definition: A function f has an isolated singularity at the point \alpha if f is analytic on a punctured open disc \{z: 0 < |z - \alpha| < r \}, where r > 0, but not at \alpha itself.

Note that a function f is analytic at a point \alpha if it is differentiable on a region containing \alpha. Strangely, a function can have a derivative at a point without being analytic there. For example, the function f(z) = |z|^2 has a derivative at z = 0 but at no other point, as can easily be verified using the Cauchy-Riemann equations. Therefore this function is not analytic at z = 0. Also note with regard to the definition of an isolated singularity that the function MUST be analytic on the `whole’ of the punctured open disc for the singularity to be defined. For example, despite appearances, the function

f(z) = \frac{1}{\sqrt{z}}

does not have an isolated singularity at z = 0 (in the sense of the definition above) because it is impossible to define a punctured open disc centred at 0 on which f(z) is analytic: the principal square root z \rightarrow \sqrt{z} is discontinuous everywhere on the negative real axis, so f(z) fails to be analytic there.
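
One can see this discontinuity numerically. In the sketch below (my own, using Python's cmath module and its principal square root) the value of \sqrt{z} jumps from approximately +i to -i as z crosses the negative real axis, and the same jump appears in 1/\sqrt{z}:

import cmath

eps = 1e-12
above = cmath.sqrt(-1 + eps * 1j)   # just above the negative real axis
below = cmath.sqrt(-1 - eps * 1j)   # just below it
print(above, below)                 # approximately 1j and -1j: a jump of size 2

# The same discontinuity appears in f(z) = 1/sqrt(z), so no punctured disc
# centred at 0 exists on which f is analytic.
print(1 / above, 1 / below)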

I find it appealing that all three types of isolated singularity (removable, poles and essential singularities) can be illustrated by using members of the following family of functions:

f(z) = \frac{\sin(z^m)}{z^n}

where m, n \in \mathbb{Z}. For example, if m = n = 1 we get

f_1(z) = \frac{\sin(z)}{z}

which has a removable singularity at z = 0. If m = 1, n = 3 we get

f_2(z) = \frac{\sin(z)}{z^3}

which has a pole of order 2 at z = 0. Finally, if m = -1, n = 0 we get

f_3(z) = \sin\big( \frac{1}{z} \big)

which has an essential singularity at z = 0. In each of these three cases, the function is not analytic at z = 0 but is analytic on a punctured open disc with centre 0, e.g., \{z: 0 < |z| < 1\} or indeed \mathbb{C} - \{0\} (which can be thought of as a punctured disc with infinite radius). In what follows I will use these three examples to delve into structural definitions of the three types of singularity. I will then explore their classification using Laurent series expansions.

Structural definitions of isolated singularities

Removable singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a removable singularity at \alpha if there is a function g which is analytic at \alpha such that

f(z) = g(z) for 0 < |z - \alpha| < r

We can see that g extends the analyticity of f to include \alpha, so we say that g is an analytic extension of f to the disc

\{z: |z - \alpha| < r \}

With removable singularities we always have that \lim_{z \rightarrow \alpha} f(z) exists since

\lim_{z \rightarrow \alpha} f(z) = g(\alpha)

(this will not be true for the other types of singularity) and the name of this singularity comes from the fact that we can effectively `remove’ the singularity by defining f(\alpha) = g(\alpha).

To apply this to the function

f_1(z) = \frac{\sin(z)}{z}

we first observe that the Maclaurin series expansion of \sin(z) is

\sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \cdots for z \in \mathbb{C}

Therefore we can write

f_1(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C} - \{0\}

If we then set

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

we see that g(z) extends the analyticity of f_1(z) to include z = 0. We also see that

\lim_{z \rightarrow 0} f_1(z) = g(0)

Therefore f_1(z) has a removable singularity at z = 0.
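
A one-line SymPy check (my own sketch) confirms that the limit exists, which is the hallmark of a removable singularity:

import sympy as sp

z = sp.symbols('z')
print(sp.limit(sp.sin(z) / z, z, 0))   # 1, so defining f_1(0) = g(0) = 1 removes the singularity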

Poles of order k, k > 0

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has a pole of order k at \alpha if there is a function g, analytic at \alpha with g(\alpha) \neq 0, such that

f(z) = \frac{g(z)}{(z - \alpha)^k} for 0 < |z - \alpha| < r

With poles of order k we always have that

f(z) \rightarrow \infty as z \rightarrow \alpha

(which distinguishes them from removable singularities)

and

\lim_{z \rightarrow \alpha} (z - \alpha)^k f(z)

exists and is nonzero (since \lim_{z \rightarrow \alpha} (z - \alpha)^k f(z) = g(\alpha) \neq 0).

To apply this to the function

f_2(z) = \frac{\sin(z)}{z^3}

we first observe that

f_2(z) = \frac{\sin(z)/z}{z^2} = \frac{g(z)}{z^2} for z \in \mathbb{C} - \{0\}

where g is the function

g(z) = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots for z \in \mathbb{C}

Since g(0) = 1 \neq 0, we see that f_2(z) behaves like \frac{1}{z^2} near z = 0 and

f_2(z) \rightarrow \infty as z \rightarrow 0

so the singularity at z = 0 is not removable. We also see that

\lim_{z \rightarrow 0} z ^2 f_2(z) = g(0) = 1

Therefore the function f_2(z) has a pole of order 2 at z = 0.
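
Again this can be checked in one line with SymPy (my own sketch): the limit of z^2 f_2(z) exists and is nonzero, which is exactly the criterion for a pole of order 2:

import sympy as sp

z = sp.symbols('z')
print(sp.limit(z**2 * sp.sin(z) / z**3, z, 0))   # 1 = g(0), nonzero: a pole of order 2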

Essential singularities

Suppose a function f is analytic on the punctured open disc

\{z: 0 < |z - \alpha| < r\}

and has a singularity at \alpha. The function f has an essential singularity at \alpha if the singularity is neither removable nor a pole. Such a singularity cannot be removed in any way, including by multiplying by any power (z - \alpha)^k, hence the name.

With essential singularities we have that

\lim_{z \rightarrow \alpha} f(z)

does not exist, and f(z) does not tend to infinity as z \rightarrow \alpha.

To apply this to the function

f_3(z) = \sin\big( \frac{1}{z}\big)

we observe that if we restrict the function to the real axis and consider a sequence of points

z_n = \frac{2}{(2n + 1) \pi}

then we have that z_n \rightarrow 0 whereas

f_3(z_n) = \sin\big(\frac{(2n + 1) \pi}{2}\big) = (-1)^n

Therefore

\lim_{z \rightarrow 0} f_3(z)

does not exist, so the singularity is not removable, but it is also the case that

f_3(z) \not\rightarrow \infty as z \rightarrow 0

so the singularity is not a pole. Since it is neither a removable singularity nor a pole, it must be an essential singularity.
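
The oscillation along the sequence z_n is easy to see numerically; a small sketch (my own):

import numpy as np

n = np.arange(8)
zn = 2.0 / ((2 * n + 1) * np.pi)    # z_n -> 0 along the positive real axis
print(np.sin(1.0 / zn))             # alternates 1, -1, 1, -1, ... so no limit exists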

Classification of isolated singularities using Laurent series

By Laurent’s Theorem, a function f which is analytic on an open annulus

A = \{z: 0 \leq r_1 < |z - \alpha| < r_2 \leq \infty \}

[Diagram: the annulus A]

(shown in the diagram) can be represented as an extended power series of the form

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

= \cdots + \frac{a_{-2}}{(z - \alpha)^2} + \frac{a_{-1}}{(z - \alpha)} + a_0 + a_1 (z - \alpha) + a_2 (z - \alpha)^2 + \cdots

for z \in A, which converges at all points in the annulus. It is an `extended’ power series because it involves negative powers of (z - \alpha). (The part of the power series involving negative powers is often referred to as the singular part. The part involving non-negative powers is referred to as the analytic part). This extended power series representation is the Laurent series about \alpha for the function f on the annulus A. Laurent series are also often used in the case when A is a punctured open disc, in which case we refer to the series as the Laurent series about \alpha for the function f.

The Laurent series representation of a function on an annulus A is unique. We can often use simple procedures, such as finding ordinary Maclaurin or Taylor series expansions, to obtain an extended power series and we can feel safe in the knowledge that the power series thus obtained must be the Laurent series.

Laurent series expansions can be used to classify singularities by virtue of the following result: If a function f has a singularity at \alpha and if its Laurent series expansion about \alpha is

f(z) = \sum_{n = -\infty}^{\infty} a_n(z - \alpha)^n

then

(a) f has a removable singularity at \alpha iff a_n = 0 for all n < 0;

(b) f has a pole of order k at \alpha iff a_n = 0 for all n < -k and a_{-k} \neq 0;

(c) f has an essential singularity at \alpha iff a_n \neq 0 for infinitely many n < 0.

To apply this to our three examples, observe that the function

f_1(z) = \frac{\sin(z)}{z}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z} = 1 - \frac{z^2}{3!} + \frac{z^4}{5!} - \frac{z^6}{7!} + \cdots

for z \in \mathbb{C} - \{0\}. This has no non-zero coefficients in its singular part (i.e., it only has an analytic part) so the singularity is a removable one.

The function

f_2(z) = \frac{\sin(z)}{z^3}

has a singularity at 0 and its Laurent series expansion about 0 is

\frac{\sin(z)}{z^3} = \frac{1}{z^2} - \frac{1}{3!} + \frac{z^2}{5!} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n = 0 for all n < -2 and a_{-2} \neq 0, so the singularity in this case is a pole of order 2.

Finally, the function

f_3(z) = \sin\big( \frac{1}{z} \big)

has a singularity at 0 and its Laurent series expansion about 0 is

\sin \big(\frac{1}{z} \big) = \frac{1}{z} - \frac{1}{3! z^3} + \frac{1}{5! z^5} - \cdots

for z \in \mathbb{C} - \{0\}. This has a_n \neq 0 for infinitely many n < 0 so the singularity here is an essential singularity.
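
All three Laurent expansions above can be reproduced with SymPy's series machinery. A short sketch (my own; for the essential singularity I expand \sin(w) about w = 0 and substitute w = 1/z, since the expansion is easiest to obtain that way):

import sympy as sp

z, w = sp.symbols('z w')

print(sp.series(sp.sin(z) / z, z, 0, 8))      # only non-negative powers: removable
print(sp.series(sp.sin(z) / z**3, z, 0, 6))   # leading term 1/z**2: pole of order 2

# Essential singularity: expand sin(w) about w = 0 and substitute w = 1/z
laurent = sp.sin(w).series(w, 0, 8).removeO().subs(w, 1 / z)
print(laurent)   # 1/z - 1/(6*z**3) + 1/(120*z**5) - ...: infinitely many negative powers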

Singularities in Schwarzschild black holes

One often hears about singularities in the context of black hole physics and I wanted to quickly look at singularities in the particular case of non-rotating black holes. A detailed investigation of the various singularities that appear in exact solutions of Einstein’s field equations was conducted in the 1960s and 1970s by Penrose, Hawking, Geroch and others. See, e.g., this paper by Penrose and Hawking. There is now a vast literature on this topic. The following discussion is just my own quick look at how the ideas might arise.

The spacetime of a non-rotating spherical black hole is usually analysed using the Schwarzschild solution of the Einstein field equations for an isolated spherical mass m. In spherical coordinates this is the metric

\Delta \tau = \bigg[ \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2} \bigg\{\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)} + r^2(\Delta \theta)^2 + r^2 \sin^2 \theta (\Delta \phi)^2\bigg\} \bigg]^{1/2}

where

k = \frac{2mG}{c^2}

and m is the mass of the spherically symmetric static object exterior to which the Schwarzschild metric applies. If we consider only radial motion (i.e., world lines for which \Delta \theta = \Delta \phi = 0) the Schwarzschild metric simplifies to

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t)^2 - \frac{1}{c^2}\frac{(\Delta r)^2}{\big(1 - \frac{k}{r}\big)}

We can see that the coefficient of (\Delta r)^2 in the metric becomes infinite at r = k, so there is apparently a singularity here. However, this singularity is `removable' by re-expressing the metric in a new set of coordinates, r and t^{\prime}, known as Eddington-Finkelstein coordinates. The transformed metric has the form

(\Delta \tau)^2 = \big(1 - \frac{k}{r}\big) (\Delta t^{\prime})^2 - \frac{2k \Delta t^{\prime} \Delta r}{cr} - \frac{(\Delta r)^2}{c^2}\big(1 + \frac{k}{r}\big)

which does not behave badly at r = k. In general relativity, this type of removable singularity is known as a coordinate singularity. Another example is the apparent singularity at the 90^{\circ} latitude in spherical coordinates, which disappears when a different coordinate system is used.
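
A quick numerical look (my own sketch, in geometrised units with c = k = 1) shows the (\Delta r)^2 coefficient blowing up at r = k in the Schwarzschild form while the corresponding Eddington-Finkelstein coefficient stays finite:

import numpy as np

k, c = 1.0, 1.0                                   # geometrised units: k = 2mG/c^2 = 1
for r in [2.0, 1.1, 1.001, 1.0000001]:
    schw = 1.0 / (c**2 * (1.0 - k / r))           # Schwarzschild (Delta r)^2 coefficient (up to sign)
    ef = (1.0 + k / r) / c**2                     # Eddington-Finkelstein coefficient (up to sign)
    print(f"r = {r}: Schwarzschild {schw:.3g}, Eddington-Finkelstein {ef:.3g}")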

Since the term \big(1 - \frac{k}{r} \big) in the Schwarzschild metric blows up as r \rightarrow 0, it appears that we also have a singularity at this point. This one is not removable, and in terms of the earlier discussion it can in fact be recognised as a pole of order 1 (also called a simple pole).