# A fluid-mechanical visualisation of the quantum-mechanical continuity equation

The concept of a probability current is useful in quantum mechanics for analysing quantum scattering and tunnelling phenomena, among other things. However, I have noticed that the same rather abstract and non-visual approach to introducing probability currents is repeated almost verbatim in every textbook (see also, e.g., the Wikipedia article on the probability current). The standard approach essentially involves defining a probability current from the outset as

$\vec{j} = \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

and then using Schrödinger’s equation to show that this satisfies a fluid-like continuity equation of the form

$\frac{\partial \rho}{\partial t} + \nabla \cdot \vec{j} = 0$

with

$\rho \equiv \Psi^{*} \Psi$

In the present note I want to briefly explore a more intuitive and visual approach involving a model of the actual flow of a 'probability fluid'. I want to begin with a fluid-mechanical model and then obtain the standard expression for the quantum-mechanical continuity equation from this, rather than starting out with an abstract definition of the probability current and then showing that this satisfies a continuity equation. The essential problem one faces when trying to do this is that although in classical mechanics the position $\vec{r}(t)$ of a point particle and its velocity $\vec{v}(t) = d\vec{r}(t)/dt$ are well defined, this is not the case in conventional quantum mechanics. Quantum mechanics is done probabilistically, treating a particle as a wave packet such that the square of the amplitude of the corresponding wave function acts as a probability density which can be used to measure the probability that the particle will occupy a particular region of space at a particular time. It is not possible to say definitively where a particular particle will be at a particular time in quantum mechanics, which makes it difficult to apply the conventional deterministic equations of fluid mechanics.

A huge specialist literature on quantum hydrodynamics has in fact arisen which tries to circumvent this problem in a number of ways. A standard reference is Wyatt, R. E., 2005, Quantum Dynamics with Trajectories: Introduction to Quantum Hydrodynamics (Springer). The route that a large part of this literature has taken is intriguing because it is based on Bohmian mechanics, an approach to quantum mechanics developed by David Bohm in 1952 which is regarded by most mainstream physicists today as unconventional. The key feature of the Bohmian mechanics approach is that classical-like particle trajectories are possible. Using this approach one can obtain Newtonian-like equations of motion analogous to those in conventional fluid mechanics and this is how this particular literature seems to have chosen to treat quantum particle trajectories in a fluid-like way. Attempts have also been made to introduce mathematically equivalent approaches, but defined within conventional quantum mechanics (see, e.g., Brandt, S. et al, 1998, Quantile motion and tunneling, Physics Letters A, Volume 249, Issue 4, pp. 265-270).

In the present note I am not looking to solve any elaborate problems so I will simply consider a free quantum wave packet which is not acted upon by any forces and try to visualise probability currents and the quantum continuity equation in a fluid-like way by using the momentum vector operator $\hat{\vec{p}}$ to characterise the velocity of the particle. I will then show that the probability current obtained in this fluid-mechanical model is the same as the one defined abstractly in textbooks.

In quantum mechanics, calculations are done using operators to represent observables. Every possible observable that one might be interested in for the purposes of experiment has a corresponding operator which the mathematics of quantum mechanics can work on to produce predictions. The key operator for the purposes of the present note is the momentum vector operator

$\hat{\vec{p}} = -i \hbar \nabla$

which is the quantum mechanical analogue of the classical momentum vector

$\vec{p} = m \vec{v}$

The key idea for the present note is to regard the velocity vector of the quantum particle as being represented by the operator

$\frac{\hat{\vec{p}}}{m}$

by analogy with the classical velocity vector which can be obtained as

$\vec{v} = \frac{\vec{p}}{m}$
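As a quick illustration of this analogy (a SymPy sketch of my own, using a sample one-dimensional plane-wave state), the plane wave $\mathrm{e}^{ikx}$ is an eigenstate of $\hat{\vec{p}}/m$, and its eigenvalue $\hbar k/m$ is exactly the classical velocity of a particle with momentum $\hbar k$:

```python
import sympy as sp

x = sp.symbols('x', real=True)
hbar, m, k = sp.symbols('hbar m k', positive=True)

# A 1D plane wave (a sample state, chosen for illustration)
Psi = sp.exp(sp.I*k*x)

# Velocity operator p_hat/m with p_hat = -i hbar d/dx
v_op = lambda f: (-sp.I*hbar/m) * sp.diff(f, x)

# The plane wave is an eigenstate of p_hat/m; the eigenvalue hbar*k/m
# is the classical velocity of a particle with momentum hbar*k
eigenvalue = sp.simplify(v_op(Psi) / Psi)
print(eigenvalue)  # hbar*k/m
```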

We will imagine the total probability mass

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \Psi \text{d}V = 1$

as a fluid in steady flow throughout the whole of space and obeying mass conservation. The fact that the flow is steady reflects the fact that there are no forces acting on the quantum particle in this model, so we must have

$\frac{\partial }{\partial t}\big[\frac{\hat{\vec{p}}}{m}\big] = 0$

The velocity can vary from point to point in the probability fluid but at any given point it cannot be varying over time.

In a classical fluid we have a mass density per unit volume $\rho$ and we regard the velocity vector $\vec{v}$ as a volumetric flow rate per unit area, i.e., the volume of fluid that would pass through a unit area per unit time. Then $\rho \vec{v}$ is the mass flow rate per unit area, i.e., the mass of fluid that would pass through a unit area per unit time. In quantum mechanics we can regard the probability mass density per unit volume $\rho \equiv \Psi^{*} \Psi$ as analogous to the mass density of a classical fluid. We can interpret $\frac{\hat{\vec{p}}}{m}$ as the volumetric flow rate per unit area, i.e., the volume of probability fluid that would pass through a unit area per unit time. When doing probability calculations with quantum mechanical operators we usually 'sandwich' the operator between $\Psi^{*}$ and $\Psi$, so following that approach here we can define the probability current density as

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$

This is to be interpreted as the probability mass flow rate per unit area, i.e., the amount of probability mass that would pass through a unit area per unit time, analogous to $\vec{j} = \rho \vec{v}$ in the classical case. To see how close the analogy is, suppose the quantum wave function is that of a plane wave

$\Psi(x, y, z, t) = A\mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

Then

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi = A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} \nabla A \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} A \ i \ \vec{k} \ \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A^2 \frac{\hbar \vec{k}}{m}$

$= \rho \vec{v}$

which looks just like the mass flow rate in the classical case with $\rho = \Psi^{*} \Psi$ and $\vec{v} \equiv \frac{\hbar \vec{k}}{m}$. Note that in this example the probability current density formula we are using, namely $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$, turned out to be real-valued. Unfortunately this will not always be the case. Since the probability current vector must always be real-valued, the fluid-mechanical model in the present note will only be applicable in cases when this is true for the formula $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$.
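The plane-wave calculation above can also be checked symbolically. The following SymPy sketch (my own verification, taking $A$ real and positive as in the text) reproduces the result componentwise:

```python
import sympy as sp

# All coordinates real; amplitude and physical constants positive
x, y, z, t = sp.symbols('x y z t', real=True)
A, hbar, m, omega = sp.symbols('A hbar m omega', positive=True)
kx, ky, kz = sp.symbols('k_x k_y k_z', real=True)

# Plane wave Psi = A exp(i(k.r - omega t))
phase = kx*x + ky*y + kz*z - omega*t
Psi = A * sp.exp(sp.I * phase)

# Current Psi* (p_hat/m) Psi with p_hat = -i hbar grad, component by component
j = [sp.simplify(sp.conjugate(Psi) * (-sp.I*hbar/m) * sp.diff(Psi, q))
     for q in (x, y, z)]
print(j)  # each component simplifies to A**2 * hbar * k_i / m, i.e. rho * v_i
```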

As in classical fluid mechanics, a continuity equation can now be derived by considering the net outflow of probability mass from an infinitesimal fluid element of volume $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$.

Considering only the $y$-component for the moment, we see from the diagram that on the left-hand side we have the probability mass flow rate coming into the volume element through the left-hand face. The mass flow rate coming out of the fluid element through the right-hand face can be approximated using a Taylor series expansion as being equal to the mass flow rate through the left-hand face plus a differential adjustment based on the gradient of the probability current density and the length $\mathrm{d} y$. The net probability mass flow rate in the $y$-direction is then obtained by subtracting the left-hand term from the right-hand term to get

$\frac{\partial }{\partial y} \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi\big) \mathrm{d}V$

Using similar arguments for the $x$ and $z$-directions, the net mass flow rate out of the fluid element in all three directions is then

$\nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) \mathrm{d} V$

Now, the probability mass inside the fluid element is $\rho \mathrm{d} V$ where $\rho = \Psi^{*} \Psi$ and if there is a net outflow of probability fluid this mass will be decreasing at the rate

$- \frac{\partial \rho}{\partial t} \mathrm{d} V$

Equating the two expressions and dividing through by the volume of the fluid element we get the equation of continuity

$\frac{\partial \rho}{\partial t} + \nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) = 0$

What I want to do now is show that if we work out $\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$ we will get the same formula for the probability current as the one usually given in quantum mechanics textbooks. We have

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{m} \Psi^{*} \nabla \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi$

We now note that since the probability current density must be a real vector, and the two (identical) terms in the last equality each contribute half of it, each term must itself be real. A real quantity is unaffected if we take its complex conjugate, so taking the complex conjugate of the second term in the last equality we get

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi + \frac{i \hbar}{2m} \Psi \nabla \Psi^{*}$

$= \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

$= \vec{j}$

This is exactly the expression for the probability current density that appears in textbooks, but rather than introducing it 'out of nowhere' at the beginning, we have obtained it naturally as a result of a fluid-mechanical model.
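As a spot check of this equality, the following SymPy sketch (my own, using a sample one-dimensional wavefunction with real part $\mathrm{e}^{-x^2}$ and imaginary part $\sin 3x$, both hypothetical choices) confirms that the real part of $\Psi^{*}\frac{\hat{\vec{p}}}{m}\Psi$ coincides with the textbook current:

```python
import sympy as sp

x = sp.symbols('x', real=True)
hbar, m = sp.symbols('hbar m', positive=True)

# Sample wavefunction Psi = u + i w with real u, w (hypothetical choices)
u = sp.exp(-x**2)
w = sp.sin(3*x)
Psi = u + sp.I*w
Psic = sp.conjugate(Psi)  # equals u - i w since u and w are real

# Fluid-mechanical expression Psi* (p_hat/m) Psi (one dimension)
fluid = Psic * (-sp.I*hbar/m) * sp.diff(Psi, x)

# Textbook current j = (i hbar / 2m)(Psi dPsi*/dx - Psi* dPsi/dx)
textbook = (sp.I*hbar/(2*m)) * (Psi*sp.diff(Psic, x) - Psic*sp.diff(Psi, x))

# The real part of the fluid expression equals the textbook current
print(sp.simplify(sp.re(fluid) - textbook))  # 0
```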

# Calculation of a quantum-mechanical commutator in three dimensions

I needed to work out the commutator $[\hat{H}, \hat{\vec{r}} \ ]$, where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator and $\hat{\vec{r}}$ is the 3D position vector operator. It is difficult to find any textbook or online source that explicitly goes through the calculation of this three-dimensional case (in fact, I have not been able to find any) so I am recording my calculation step-by-step in this note.

The commutator $[\hat{H}, \hat{\vec{r}} \ ]$ is a vector operator with components

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H}$

$[\hat{H}, \hat{y} \ ] = \hat{H} \ \hat{y} - \hat{y} \ \hat{H}$

and

$[\hat{H}, \hat{z} \ ] = \hat{H} \ \hat{z} - \hat{z} \ \hat{H}$

To evaluate these, note that the momentum operator (in position space) is

$\hat{\vec{p}} = -i \ \hbar \nabla$

and so we have

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

$= \frac{1}{2m} \hat{\vec{p}} \cdot \hat{\vec{p}} + \hat{U}$

$= \frac{1}{2m}(\hat{p}_x^{2} + \hat{p}_y^{2} + \hat{p}_z^{2}) + \hat{U}$

Looking at the $x$-component of $[\hat{H}, \hat{\vec{r}} \ ]$ we therefore have

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{\hat{p}_x^{2} \hat{x} + \hat{p}_y^{2} \hat{x} + \hat{p}_z^{2} \hat{x}}{2m} + \hat{U} \ \hat{x} - \big(\frac{\hat{x} \hat{p}_x^{2} + \hat{x} \hat{p}_y^{2} + \hat{x} \hat{p}_z^{2}}{2m} + \hat{x} \ \hat{U}\big)$

$= \frac{1}{2m}([\hat{p}_x^2, \hat{x} \ ] + [\hat{p}_y^2, \hat{x} \ ] + [\hat{p}_z^2, \hat{x} \ ]) + [\hat{U}, \hat{x} \ ]$

Since $\hat{U}$ acts simply by multiplication by the potential function $U$, and multiplication is commutative, we have $[\hat{U}, \hat{x} \ ] = 0$. I will now show that we also have

$[\hat{p}_y^2, \hat{x} \ ] = [\hat{p}_z^2, \hat{x} \ ] = 0$

To see this, let us first work out in detail the effect of $[\hat{p}_y, \hat{x} \ ]$ on a wavefunction $\Psi$. We have

$[\hat{p}_y, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial y} + \hat{x }i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial y} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} = 0$

where the last equality follows from the fact that $\hat{x}$ does not depend on $y$. Thus, $[\hat{p}_y, \hat{x} \ ] = 0$.

We can now easily show that $[\hat{p}_y^2, \hat{x} \ ] = 0$ because using the basic result for commutators that

$[AB, C] = A[B, C] + [A, C]B$

(easy to prove by writing out the terms in full) we find that

$[\hat{p}_y^2, \hat{x} \ ] = \hat{p}_y \ [\hat{p}_y, \hat{x} \ ] + [\hat{p}_y, \hat{x} \ ] \ \hat{p}_y = 0$

Identical arguments show that $[\hat{p}_z^2, \hat{x} \ ] = 0$. Thus, we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ]$
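The commutator identity $[AB, C] = A[B, C] + [A, C]B$ used above can also be spot-checked numerically with random matrices (a quick sketch of my own, not part of the original argument):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

# Commutator of two matrices
comm = lambda X, Y: X @ Y - Y @ X

# [AB, C] = A[B, C] + [A, C]B
lhs = comm(A @ B, C)
rhs = A @ comm(B, C) + comm(A, C) @ B
print(np.allclose(lhs, rhs))  # True
```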

It now only remains to work out $[\hat{p}_x^2, \hat{x} \ ]$ and we can do this by first working out in detail the effect of $[\hat{p}_x, \hat{x} \ ]$ on a wavefunction $\Psi$ (this is of course the 'canonical commutation relation' of quantum mechanics). We have

$[\hat{p}_x, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial x} + \hat{x } i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial x} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} = - i \ \hbar \Psi$

where the last equality follows from the fact that $\hat{x}$ is the same as multiplying by $x$ so its derivative with respect to $x$ equals $1$. Thus, $[\hat{p}_x, \hat{x} \ ] = - i \ \hbar$. Then we find that

$[\hat{p}_x^2, \hat{x} \ ] = \hat{p}_x \ [\hat{p}_x, \hat{x} \ ] + [\hat{p}_x, \hat{x} \ ] \ \hat{p}_x = -2 i \ \hbar \hat{p}_x$

and we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ] = - \frac{i \ \hbar}{m} \hat{p}_x$
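This result can be verified symbolically. The SymPy sketch below (my own check, with an arbitrary wavefunction $\Psi$ and an arbitrary multiplicative potential $U$) applies $\hat{H}\hat{x} - \hat{x}\hat{H}$ directly and confirms the answer $-\frac{i \hbar}{m}\hat{p}_x$:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
hbar, m = sp.symbols('hbar m', positive=True)
Psi = sp.Function('Psi')(x, y, z)   # arbitrary wavefunction
U = sp.Function('U')(x, y, z)       # arbitrary multiplicative potential

# Hamiltonian and x-momentum operators acting on a function
lap = lambda f: sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)
H = lambda f: -hbar**2/(2*m) * lap(f) + U*f
px = lambda f: -sp.I*hbar*sp.diff(f, x)

# [H, x] Psi = H(x Psi) - x H(Psi); should equal -(i hbar / m) p_x Psi
commutator = sp.expand(H(x*Psi) - x*H(Psi))
target = sp.expand(-(sp.I*hbar/m) * px(Psi))
print(sp.simplify(commutator - target))  # 0
```

Note how the potential terms and the $y$ and $z$ derivatives cancel automatically, just as in the hand calculation above.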

Identical arguments show that

$\hat{H} \ \hat{y} - \hat{y} \ \hat{H} = \frac{1}{2m} [\hat{p}_y^2, \hat{y} \ ] = - \frac{i \ \hbar}{m} \hat{p}_y$

and

$\hat{H} \ \hat{z} - \hat{z} \ \hat{H} = \frac{1}{2m} [\hat{p}_z^2, \hat{z} \ ] = - \frac{i \ \hbar}{m} \hat{p}_z$

Thus, we finally reach our desired expression for the Hamiltonian-position commutator in three dimensions:

$[\hat{H}, \hat{\vec{r}} \ ] = - \frac{i \ \hbar}{m} \hat{\vec{p}}$

As an application of this result we will consider the problem of working out the expectation of position and velocity for a quantum particle. In three-dimensional space the quantum wave function is

$\Psi = \Psi(x, y, z, t)$

and we obtain the probability density function as

$\rho(x, y, z, t) = \Psi^{*} \Psi$

The wavefunction $\Psi$ satisfies the time-dependent Schrödinger equation

$i \hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\big(\frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} + \frac{\partial^2 \Psi}{\partial z^2}\big) + U\Psi$

where $U = U(x, y, z, t)$ is some potential energy function. We can write the Schrödinger equation in operator form using Dirac notation as

$\hat{H} |\Psi \rangle = i \hbar \frac{\partial}{\partial t} |\Psi \rangle$

where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator (the Hamiltonian form of total energy) and

$i \hbar \frac{\partial }{\partial t}$

is the total energy operator. Note that the complex conjugate of the wavefunction, $\Psi^{*}$, satisfies the complex conjugate of the Schrödinger equation, which in Dirac notation reads

$\langle \Psi | \hat{H} = - \langle \Psi | i \hbar \frac{\partial}{\partial t}$

In quantum mechanics we find the expected position $\langle \vec{r}(t) \rangle$ of the particle by integrating the position operator $\hat{\vec{r}}$ over all space, sandwiched between $\Psi^{*}$ and $\Psi$. Thus, letting $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$ we have

$\langle \vec{r}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \hat{\vec{r}} \ \Psi \text{d}V \equiv \langle \Psi \ | \hat{\vec{r}} \ | \Psi \rangle$

where in the last term I have switched to using Dirac notation which will be useful shortly. The expected velocity can then be obtained by differentiating this integral with respect to $t$. We get

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \frac{\partial \hat{\vec{r}}}{\partial t} \ \Psi \text{d}V$

The second triple integral on the right-hand side is zero because the position operator does not depend on time. The integrand in the first triple integral can be manipulated by using the operator form of the Schrödinger equation and Dirac notation to write

$\frac{\partial \Psi}{\partial t} = \frac{\partial}{\partial t} | \Psi \rangle = \frac{1}{i \ \hbar} \ \hat{H} \ | \Psi \rangle$

and

$\frac{\partial \Psi^{*}}{\partial t} = \langle \Psi | \frac{\partial}{\partial t} = -\frac{1}{i \ \hbar} \ \langle \Psi | \hat{H}$

Thus, we have

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V$

$= - \frac{1}{i \ \hbar} \langle \Psi |\hat{H} \ \hat{\vec{r}} \ | \Psi \rangle + \frac{1}{i \ \hbar} \langle \Psi | \hat{\vec{r}} \ \hat{H} | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \hat{H} \ \hat{\vec{r}} \ - \hat{\vec{r}} \ \hat{H} \ | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \ [\hat{H}, \hat{\vec{r}} \ ] \ | \Psi \rangle$

$= \frac{1}{m} \langle \Psi | \ \hat{\vec{p}} \ | \Psi \rangle$

$= \big \langle \frac{\hat{\vec{p}}}{m} \big \rangle$

where the last two equalities follow from the fact that $[\hat{H}, \hat{\vec{r}} \ ]$ is the commutator of $\hat{H}$ and $\hat{\vec{r}}$, which is equal to $-\frac{i \hbar}{m} \hat{\vec{p}}$ as we saw above. Therefore the expected velocity of a quantum particle looks a lot like the velocity of a classical particle (momentum divided by mass). The idea that quantum mechanical expectations exhibit Newtonian-like behaviour is the essence of the Ehrenfest Theorem of quantum mechanics.

# Derivation of Euler’s equations of motion for a perfect fluid from Newton’s second law

Having read a number of highly technical derivations of Euler’s equations of motion for a perfect fluid I feel that the mathematical meanderings tend to obscure the underlying physics. In this note I want to explore the derivation from a more physically intuitive point of view. The dynamics of a fluid element of mass $m$ are governed by Newton’s second law which says that the vector sum of the forces acting on the fluid element is equal to the mass of the element times its acceleration. Thus,

$\vec{F} = m \vec{a}$

The net force $\vec{F}$ can be decomposed into two distinct types of forces, so-called body forces $\vec{W}$ that act on the entire fluid element (e.g., the fluid element’s weight due to gravity) and stresses $\vec{S}$ such as pressures and shears that act upon the surfaces enclosing the fluid element. For the purposes of deriving the differential form of Euler’s equation we will focus on the net force per unit volume acting on the fluid element, $\vec{f}$, which we will decompose into a weight per unit volume $\vec{w}$ and a net stress force per unit volume $\vec{s}$. The weight per unit volume is simply obtained as

$\vec{w} = \rho \vec{g}$

where

$\rho = \frac{m}{V}$

is the mass density of the fluid (i.e., mass per unit volume) and $\vec{g}$ is the acceleration due to gravity. In index notation, the equation for the $i$-th component of the weight per unit volume is

$w^i = \rho g^i$

The net stress force per unit volume, $\vec{s}$, is a little more complicated to derive since it involves the rank-2 stress tensor $\tau^{ij}$. This tensor contains nine components and is usually represented as a $3 \times 3$ symmetric matrix. In Cartesian coordinates the components along the main diagonal, namely $\tau^{xx}$, $\tau^{yy}$ and $\tau^{zz}$, represent normal stresses, i.e., forces per unit area acting orthogonally to the planes whose normal vector is identified by the first superscript, as indicated in the diagram below. (Note that a stress is a force per unit area, so to convert a stress tensor component $\tau^{ij}$ into a force it would be necessary to multiply it by the area over which it acts).

In the diagram each normal stress is shown as a tension, i.e., a normal stress pointing away from the surface. When a normal stress points towards the surface it acts upon, it is called a pressure.

The off-diagonal components of the stress tensor represent shear stresses, i.e., forces per unit area that point along the sides of the fluid element, parallel to these sides rather than normal to them. These shear stresses are shown in the following diagram.

Shear stresses only arise when there is some kind of friction in the fluid. A perfect fluid is friction-free so there are no shear stresses. Euler’s equation only applies to perfect fluids so for the derivation of the equation we can ignore the off-diagonal components of the stress tensor.

The normal stresses along the main diagonal are usually written as

$\tau^{xx} = - p^x$

$\tau^{yy} = - p^y$

$\tau^{zz} = - p^z$

where $p$ stands for pressure and the negative sign reflects the fact that a pressure points in the opposite direction to a tension.

In a perfect fluid the pressure is isotropic, i.e., the same in all directions, so we have

$p^x = p^y = p^z = p$

Therefore the stress tensor of a perfect fluid with isotropic pressure reduces to

$\tau^{ij} = -p \delta^{ij}$

where $\delta^{ij}$ is the Kronecker delta (and may be thought of here as the metric tensor of Cartesian 3-space).

Now suppose we consider the net stress (force per unit area) in the $y$-direction of an infinitesimal volume element.

The stress on the right-hand face can be approximated using a Taylor series expansion as being equal to the stress on the left plus a differential adjustment based on its gradient and the length $dy$. If we take the stress on the right to be pointing in the positive direction and the one on the left as pointing in the negative (opposite) direction, the net stress in the $y$-direction is given by

$\tau^{yy} + \frac{\partial \tau^{yy}}{\partial y} dy - \tau^{yy} = \frac{\partial \tau^{yy}}{\partial y} dy$

Similarly, the net stresses in the $x$ and $z$-directions are

$\frac{\partial \tau^{xx}}{\partial x} dx$

and

$\frac{\partial \tau^{zz}}{\partial z} dz$

To convert these net stresses to net forces we multiply each one by the area on which it acts. Thus, the net forces on the fluid element (in vector form) are

$\big(\frac{\partial \tau^{xx}}{\partial x} dxdydz\big) \vec{i}$

$\big(\frac{\partial \tau^{yy}}{\partial y} dxdydz\big) \vec{j}$

$\big(\frac{\partial \tau^{zz}}{\partial z} dxdydz\big) \vec{k}$

The total net force on the fluid element is then

$\big(\frac{\partial \tau^{xx}}{\partial x} \ \vec{i} + \frac{\partial \tau^{yy}}{\partial y} \ \vec{j} + \frac{\partial \tau^{zz}}{\partial z} \ \vec{k}\big) dxdydz$

Switching from tensions to pressures using $\tau^{ij} = -p \delta^{ij}$ and dividing through by the volume $dxdydz$ we finally get the net stress force per unit volume to be

$\vec{s} = -\big(\frac{\partial p}{\partial x} \ \vec{i} + \frac{\partial p}{\partial y} \ \vec{j} + \frac{\partial p}{\partial z} \ \vec{k}\big)$

In index notation, the equation for the $i$-th component of this net pressure per unit volume is written as

$s^i = -\frac{\partial p}{\partial x^i}$

We have now completed the analysis of the net force on the left-hand side of Newton’s second law.

On the right-hand side of Newton’s second law we have mass times acceleration, where acceleration is the change in velocity with time. To obtain an expression for this we observe that the velocity of a fluid element may change for two different reasons. First, the velocity field may vary over time at each point in space. Second, the velocity may vary from point to point in space (at any given time). Thus, we consider the velocity field to be a function of the time coordinate as well as the three spatial coordinates, so

$\vec{v} = \vec{v}(t, x, y, z) = v^x(t, x, y, z) \ \vec{i} + v^y(t, x, y, z) \ \vec{j} + v^z(t, x, y, z) \ \vec{k}$

Considering the $i$-th component of this velocity field, the total differential is

$dv^i = \frac{\partial v^i}{\partial t} \ dt + \frac{\partial v^i}{\partial x} \ dx + \frac{\partial v^i}{\partial y} \ dy + \frac{\partial v^i}{\partial z} \ dz$

so the total derivative with respect to time is

$\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^x \frac{\partial v^i}{\partial x} + v^y \frac{\partial v^i}{\partial y} + v^z \frac{\partial v^i}{\partial z}$

where I have used

$v^x = \frac{dx}{dt}$

$v^y = \frac{dy}{dt}$

$v^z = \frac{dz}{dt}$

We can write this more compactly using the Einstein summation convention as

$\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}$

This is then the $i$-th component of the acceleration vector on the right-hand side of Newton’s second law. In component form, therefore, we can write mass times acceleration per unit volume for the fluid element as

$\rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

This completes the analysis of the mass times acceleration term on the right-hand side of Newton’s second law.

In per-unit-volume form, Newton’s second law for a fluid element is

$\vec{w} + \vec{s} = \rho \vec{a}$

and writing this in the component forms derived above we get the standard form of Euler’s equations of motion for a perfect fluid:

$\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$
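As a sanity check on this final equation, consider rigid rotation about the $z$-axis with gravity ignored: the advective term produces the familiar centripetal acceleration, and a pressure field rising quadratically towards the rim balances it. The SymPy sketch below (my own example; the flow and pressure fields are hypothetical choices) verifies this:

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
rho, omega, p0 = sp.symbols('rho omega p0', positive=True)

# Rigid rotation about the z-axis (sample steady flow; gravity ignored)
coords = [x, y, z]
v = [-omega*y, omega*x, 0]

# Advective acceleration (v . grad) v^i -- the nonlinear term in Euler's equation
accel = [sum(v[j]*sp.diff(v[i], coords[j]) for j in range(3)) for i in range(3)]
print(accel)  # centripetal: [-omega**2*x, -omega**2*y, 0]

# Pressure rising quadratically towards the rim (hypothetical choice)
p = p0 + rho*omega**2*(x**2 + y**2)/2

# Euler: -dp/dx^i = rho * (advective acceleration), since the flow is steady
balanced = all(sp.simplify(-sp.diff(p, coords[i]) - rho*accel[i]) == 0
               for i in range(3))
print(balanced)  # True
```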

# Alternative approaches to formulating geodesic equations on Riemannian manifolds and proof of their equivalence

A geodesic can be defined as an extremal path between two points on a manifold in the sense that it minimises or maximises some criterion of interest (e.g., minimises distance travelled, maximises proper time, etc.). Such a path will satisfy geodesic equations equivalent to the Euler-Lagrange equations of the calculus of variations. A geodesic can also be defined in a conceptually different way as the 'straightest' possible path between two points on a manifold. In this case the path will satisfy geodesic equations derived by requiring parallel transport of a tangent vector along the path. Although these are conceptually different ways of defining geodesics, they are mathematically equivalent. In the present note I want to explore the derivation of geodesic equations in these two different ways and prove their mathematical equivalence.

Now, in the calculus of variations we typically define a system’s action $S$ to be the time-integral of a Lagrangian $L$:

$S \equiv \int^{t_B}_{t_A} L(q_i, \dot{q_i}) dt$

where $L(q_i, \dot{q_i})$ says that the Lagrangian is a function of position coordinates $q_i$ and velocities $\dot{q_i}$ (and $i$ ranges over however many coordinates there are). We find the trajectory that yields a desired extremal value of the action $S$ as the one that satisfies the Euler-Lagrange equations

$0 = \frac{d}{dt} \big(\frac{\partial L}{\partial \dot{q_i}} \big) - \frac{\partial L}{\partial q_i}$

Let us now suppose that we are facing an exactly analogous situation in which there are two points on the manifold, $A$ and $B$, and we are considering possible paths between them to try to find the extremal one. We can describe any path between $A$ and $B$ by specifying the coordinates of the points along it as functions of a parameter $\sigma$ that goes from a value of $0$ at $A$ to a value of $1$ at $B$, i.e., by specifying the functions $x^{\mu}(\sigma)$. Noting that the line element can be written as

$ds^2 = g_{\mu \gamma} dx^{\mu} dx^{\gamma}$

we can write the length of a particular path as

$s = \int \sqrt{ds^2} = \int^1_0 \sqrt{g_{\mu \gamma} \frac{dx^{\mu}}{d \sigma} \frac{dx^{\gamma}}{d \sigma}} d \sigma$

Note that the metric is a function of the coordinates of points along the path, which in turn are functions of the parameter $\sigma$, i.e., $g_{\mu \gamma} = g_{\mu \gamma}(x^{\alpha}(\sigma))$. This situation is exactly analogous to the usual calculus of variations scenario because, writing $\dot{x}^{\alpha} \equiv d x^{\alpha}/d \sigma$, we see that we have a Lagrangian function

$L(x^{\alpha}, \dot{x}^{\alpha}) = \sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

and we hope to find the path $x^{\mu}(\sigma)$ that makes the integral of the Lagrangian extreme. This will be the path that satisfies the Euler-Lagrange equations

$0 = \frac{d}{d \sigma} \big(\frac{\partial L}{\partial \dot{x}^{\alpha}}\big) - \frac{\partial L}{\partial x^{\alpha}}$

This corresponds to $N$ separate differential equations in an $N$-dimensional manifold, one equation for each value of the index $\alpha$.

We can manipulate the Euler-Lagrange equations to get geodesic equations which are easier to use in particular contexts. First, note that

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{\partial}{\partial \dot{x}^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} (g_{\mu \gamma} \ \delta^{\mu}_{\alpha} \ \dot{x}^{\gamma} + g_{\mu \gamma} \ \dot{x}^{\mu} \ \delta^{\ \gamma}_{\alpha})$

because, for example, $\partial \dot{x}^{\mu}/\partial \dot{x}^{\alpha} = \delta^{\mu}_{\alpha}$. Also note that the metric is treated as a constant as it depends on $x^{\alpha}$ not on $\dot{x}^{\alpha}$. Doing the sums over the Kronecker deltas we get

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{2L}(g_{\alpha \gamma} \ \dot{x}^{\gamma} + g_{\mu \alpha} \ \dot{x}^{\mu})$

$= \frac{1}{2L}(g_{\alpha \mu} \ \dot{x}^{\mu} + g_{\alpha \mu} \ \dot{x}^{\mu})$

$= \frac{1}{L} g_{\alpha \mu} \ \dot{x}^{\mu}$

But notice that since

$s = \int L d \sigma$

we have

$\frac{ds}{d \sigma} = L$

so

$\frac{1}{L} = \frac{d \sigma}{ds}$

and we can write

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{L} g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma} \frac{d \sigma}{ds}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{ds}$

Next, we have

$\frac{\partial L}{\partial x^{\alpha}} = \frac{\partial}{\partial x^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \dot{x}^{\mu} \dot{x}^{\gamma}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{d \sigma} \frac{d x^{\gamma}}{d \sigma} \frac{d \sigma}{ds}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Putting these results into the Euler-Lagrange equations we get

$0 = \frac{d}{d \sigma} \big(g_{\alpha \mu} \frac{d x^{\mu}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Finally, multiplying through by $d \sigma/ds$ we get

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

where I have also renamed $\mu \rightarrow \beta$ in the first term to make it clearer that the Einstein summations in the first and second terms are independent. This is the first version of the geodesic equations, derived by requiring that the path between the points $A$ and $B$ should be extremal in the sense of satisfying the Euler-Lagrange equations of the calculus of variations.

We will now derive a second version of the geodesic equations by requiring the geodesic to be a path that is locally straight. In differential geometry a path is defined as straight if it parallel transports its own tangent vector, i.e., if the tangent vector does not change as we move an infinitesimal step along the path. If we take an arbitrary point on the path to be $x^{\mu} \ e_{\mu}$ and we take $ds$ to be an infinitesimal displacement along the path, then a tangent vector to the path is

$\frac{d x^{\mu}}{d \sigma} e_{\mu}$

and we want

$\frac{d}{ds}\big(\frac{d x^{\mu}}{d \sigma}e_{\mu} \big) = \frac{d^2 x^{\mu}}{ds d \sigma} e_{\mu} + \frac{d x^{\mu}}{d \sigma} \frac{d e_{\mu}}{ds} = 0$

Multiplying through by $d \sigma/ds$ this gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d e_{\mu}}{ds} = 0$

But

$\frac{d e_{\mu}}{ds} = \frac{\partial e_{\mu}}{\partial x^{\gamma}} \frac{d x^{\gamma}}{d s}$

$= \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha}$

Putting this into the equation gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha} = 0$

To enable us to factor out the basis vector we can rename the indices in the second term as $\mu \rightarrow \alpha$ and $\gamma \rightarrow \beta$ to get

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} e_{\mu} = 0$

$\iff$

$\big[\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} \big] e_{\mu} = 0$

$\implies$

$\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = 0$

This is the second version of the geodesic equations, derived by assuming that the path between the two points on the manifold is locally straight.

We now have two seemingly different versions of the geodesic equations, namely

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

and

$0 = \frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta}$

We will next show that they are in fact mathematically equivalent. Starting from the first version, we can expand out the brackets to get

$0 = \frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

$\iff$

$0 = \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

Now we rename the indices as follows: swap $\alpha \leftrightarrow \sigma$ in the first term; rename $\alpha \rightarrow \sigma$, $\sigma \rightarrow \beta$ and $\beta \rightarrow \alpha$ in the second term; $\alpha \rightarrow \sigma$ and $\beta \rightarrow \mu$ in the third term; and $\alpha \rightarrow \sigma$, $\mu \rightarrow \alpha$, $\gamma \rightarrow \beta$ in the fourth term. We get

$0 = \frac{1}{2}\frac{\partial g_{\sigma \beta}}{\partial x^{\alpha}}\frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\sigma \alpha}}{\partial x^{\beta}} \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2} - \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}} \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds}$

We can write this as

$0 = \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} \frac{1}{2} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}] + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2}$

Finally, multiplying through by the inverse metric $g^{\mu \sigma}$ and using the facts that

$g^{\mu \sigma} \ g_{\sigma \nu} = \delta^{\mu}_{\nu}$

and

$\Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = \frac{1}{2} g^{\mu \sigma} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}]$

we get

$0 = \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} + \frac{d^2 x^{\mu}}{ds^2}$

which is the second version of the geodesic equation. Thus, the two versions are equivalent as claimed.
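For readers who like to check index gymnastics by machine, the equivalence can also be verified symbolically for a concrete metric. The following sketch (my own illustration, not part of the derivation) uses Python's sympy to confirm, for the plane polar metric $ds^2 = dr^2 + r^2 d\theta^2$, that the first (variational) form equals $g_{\sigma \mu}$ times the second (locally straight) form:

```python
import sympy as sp

s, rr, tt = sp.symbols('s r theta', positive=True)
coords = [rr, tt]
g = sp.Matrix([[1, 0], [0, rr**2]])   # plane polar metric g_{alpha beta}
ginv = g.inv()
n = 2

# Christoffel symbols Gamma^m_{ab} from the formula quoted in the derivation above
Gamma = [[[sum(sp.Rational(1, 2)*ginv[m, k]*(sp.diff(g[a, k], coords[b])
          + sp.diff(g[k, b], coords[a]) - sp.diff(g[a, b], coords[k]))
          for k in range(n)) for b in range(n)] for a in range(n)] for m in range(n)]

# A curve (r(s), theta(s)) and its velocity components dx^mu/ds
R, T = sp.Function('R')(s), sp.Function('T')(s)
sub = {rr: R, tt: T}
xdot = [sp.diff(R, s), sp.diff(T, s)]

for sig in range(n):
    # First form: d/ds(g_{sig b} dx^b/ds) - (1/2) d_sig g_{ab} dx^a/ds dx^b/ds
    v1 = sp.diff(sum(g[sig, b].subs(sub)*xdot[b] for b in range(n)), s) \
        - sp.Rational(1, 2)*sum(sp.diff(g[a, b], coords[sig]).subs(sub)*xdot[a]*xdot[b]
                                for a in range(n) for b in range(n))
    # Second form, contracted with g_{sig m}
    v2 = sum(g[sig, m].subs(sub)*(sp.diff(xdot[m], s)
        + sum(Gamma[m][a][b].subs(sub)*xdot[a]*xdot[b]
              for a in range(n) for b in range(n))) for m in range(n))
    assert sp.simplify(v1 - v2) == 0

print("the two geodesic forms agree for the polar metric")
```

The check runs over both free-index values $\sigma = r, \theta$, so it exercises every term in the index-renaming argument above.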

# Geometric interpretation of Christoffel symbols and some alternative approaches to calculating them

In a classic paper in 1869, Elwin Bruno Christoffel (1829-1900) introduced his famous Christoffel symbols $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ to represent an array of numbers describing a metric connection. They are also known as connection coefficients (and sometimes, less respectfully, as ‘Christ-awful symbols’). In differential geometry one usually first encounters them when studying covariant derivatives of tensors in tensor calculus. For example, suppose we try to differentiate the contravariant vector $A = A^{\alpha} e_{\alpha}$, where $e_{\alpha}$ denotes a coordinate basis vector (and we are using the Einstein summation convention). We get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \frac{\partial e_{\alpha}}{\partial x^{\beta}}$

In general, the partial derivative in the second term on the right will result in another vector which we can write in terms of its coordinate basis as

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

This defines the Christoffel symbol $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$. The downstairs indices refer to the rate of change of the basis vector $e_{\alpha}$ with respect to the coordinate variable $x^{\beta}$, in the direction of the coordinate basis vector $e_{\gamma}$ ($\gamma$ being the upstairs index). Substituting the second equation into the first we get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

To enable us to factor out the coordinate basis vector we can exchange the symbols $\alpha$ and $\gamma$ in the second term on the right to get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta} \ e_{\alpha}$

$= \big( \frac{\partial A^{\alpha}}{\partial x^{\beta}} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}\big) \ e_{\alpha}$

The expression in the bracket is called the covariant derivative of the contravariant vector $A$, i.e., the rate of change of $A^{\alpha}$ in each of the directions $\beta$ of the coordinate system $x^{\beta}$. It has the important property that it is itself tensorial (unlike the ordinary partial derivative of the tensor on its own). This covariant derivative is often written using the notation

$\nabla_{\beta} \ A^{\alpha} = \partial_{\beta} \ A^{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$

Having thus established the meaning of the Christoffel symbols, one then goes on to work out that the covariant derivative of a one-form is

$\nabla_{\beta} \ A_{\alpha} = \partial_{\beta} \ A_{\alpha} - A_{\gamma} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$

and that the covariant derivatives of higher rank tensors are constructed from the building blocks of $\nabla_{\beta} \ A^{\alpha}$ and $\nabla_{\beta} \ A_{\alpha}$ by adding a $\Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$ term for each upper index $\gamma$ and a $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ term for each lower index $\gamma$. For example, the covariant derivative of the $(1, 1)$ rank-2 tensor $X^{\mu}_{\hphantom{\mu} \sigma}$ is

$\nabla_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} = \partial_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} + X^{\alpha}_{\hphantom{\alpha} \sigma} \ \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} - X^{\mu}_{\hphantom{\mu} \alpha} \ \Gamma^{\alpha}_{\hphantom{\alpha} \sigma \beta}$

Christoffel symbols then go on to play vital roles in other areas of differential geometry, perhaps most notably as key components in the definition of the Riemann curvature tensor.
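To make the covariant derivative formula concrete, here is a small Python/sympy sketch (my own illustration, not part of the original text) that implements $\nabla_{\beta} A^{\alpha}$ for plane polar coordinates, hard-coding the Christoffel symbol values that are derived later in this note:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
coords = [r, th]
n = 2

# Plane polar Christoffel symbols Gamma^c_{ab} (indices: 0 = r, 1 = theta);
# these values are derived later in the note
Gam = [[[sp.Integer(0)]*n for _ in range(n)] for _ in range(n)]
Gam[0][1][1] = -r            # Gamma^r_{theta theta}
Gam[1][0][1] = 1/r           # Gamma^theta_{r theta}
Gam[1][1][0] = 1/r           # Gamma^theta_{theta r}

def cov_deriv_vector(A):
    """nabla_b A^a = d_b A^a + A^c Gamma^a_{cb}, returned as a nested list indexed [a][b]."""
    return [[sp.simplify(sp.diff(A[a], coords[b])
             + sum(A[c]*Gam[a][c][b] for c in range(n)))
             for b in range(n)] for a in range(n)]

# Example: the field with polar components A = (1, 0)
print(cov_deriv_vector([sp.Integer(1), sp.Integer(0)]))   # [[0, 0], [0, 1/r]]
```

The only surviving term, $\nabla_{\theta} A^{\theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} = 1/r$, illustrates how the connection contributes even when the components $A^{\alpha}$ are constant.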

It is possible to have a working knowledge of all of this without truly understanding at a deep level, say geometrically, what Christoffel symbols really mean. In the present note I want to delve a bit more deeply into how one might calculate and interpret Christoffel symbols geometrically. I also want to explore some alternative ways of calculating them in the context of a simple plane polar coordinate system $(r, \theta)$ which is related to the usual Cartesian $(x, y)$ coordinate system via the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

In an $n$-dimensional manifold there are potentially $n^3$ Christoffel symbols to be calculated, though this number is usually reduced by symmetries. In the present plane polar coordinate case, we will need to calculate $2^3 = 8$ Christoffel symbols. These are

$\Gamma^{r}_{\hphantom{r} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r}$

$\Gamma^{r}_{\hphantom{r} \theta r}$

$\Gamma^{r}_{\hphantom{r} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta}$

$\Gamma^{r}_{\hphantom{r} r \theta}$

Geometric approach

Consider the situation shown in the diagram below where two vectors $(e_{\theta})_P$ and $(e_{\theta})_S$ of the basis vector field $e_{\theta}$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_{\theta})_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

Now, in plane polar coordinates the magnitude of $e_{\theta}$ is

$|e_{\theta}| = r$

Therefore the length of the arc $L$ in the diagram is

$L = r \Delta \theta$

If $\Delta \theta$ is small, we have

$L \approx |\Delta_{\theta} e_{\theta}|$

where $\Delta_{\theta} e_{\theta}$ is the vector connecting the endpoints of $(e_{\theta})_P$ and $(e_{\theta})_S$, i.e., $\Delta_{\theta} e_{\theta} = (e_{\theta})_S - (e_{\theta})_P$.

Therefore

$|\Delta_{\theta} e_{\theta}| \approx r \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we get

$|d_{\theta} e_{\theta}| = r d \theta$

From the diagram we see that $d_{\theta} e_{\theta}$ points in the opposite direction of $e_r$. Therefore we have

$d_{\theta} e_{\theta} = - r d \theta e_r$

(note that in plane polar coordinates $e_r$ is of unit length). From this equation we have

$\frac{d_{\theta} e_{\theta}}{d \theta} \equiv \frac{\partial e_{\theta}}{\partial \theta} = -r e_r$

But from the definition of Christoffel symbols we have

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta}$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

We have obtained the first two Christoffel symbols on our list from the geometric setup and the nice thing about this approach is that we can see what the underlying changes in the coordinate basis vectors looked like.

To obtain the next two Christoffel symbols on our list, we consider a change in the vector field $e_{\theta}$ due to a displacement in the radial direction from $P$ to $Q$ in the following diagram:

We have moved outwards by a small amount $\Delta r$ and as a result the length of the vectors in the vector field $e_{\theta}$ has increased by a small amount $|\Delta_r e_{\theta}|$ as shown in the diagram. From the diagram we see that the proportions of the two increases must be the same, so we have

$\frac{|\Delta_r e_{\theta}|}{|e_{\theta}|} = \frac{\Delta r}{r}$

or

$|\Delta_r e_{\theta}| = \Delta r \frac{1}{r} |e_{\theta}|$

Passing to the differential limit as $\Delta r \rightarrow 0$ we get

$|d_r e_{\theta}| = dr \frac{1}{r} |e_{\theta}|$

Since $d_r e_{\theta}$ is directed along the vector $e_{\theta}$ we can write the vector equation

$d_r e_{\theta} = dr \frac{1}{r} e_{\theta}$

so

$\frac{d_r e_{\theta}}{dr} \equiv \frac{\partial e_{\theta}}{\partial r} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

We have thus found two more Christoffel symbols from the geometrical setup. To get the next two Christoffel symbols on our list we observe that the basis vector field $e_r$ does not change as we move in the radial direction (either in magnitude or direction) so we must have

$\frac{\partial e_r}{\partial r} = 0$

where the right hand side here denotes a zero vector. But we know that

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta}$

so we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Finally, to get the last two remaining Christoffel symbols on our list, we consider a change in the vector field $e_r$ due to an angular displacement. In the diagram below two vectors $(e_r)_P$ and $(e_r)_S$ of the basis vector field $e_r$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_r)_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

The arc length $L$ is

$L = |e_r| \Delta \theta = \Delta \theta$

(since the magnitude of the coordinate basis vector $e_r$ is $|e_r| = 1$). But for small $\Delta \theta$ we also have

$L \approx |\Delta_{\theta} e_r|$

where $\Delta_{\theta} e_r$ is the vector connecting the endpoints of $(e_r)_P$ and $(e_r)_S$, i.e., $\Delta_{\theta} e_r = (e_r)_S - (e_r)_P$. Therefore

$|\Delta_{\theta} e_r| = \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we have

$|d_{\theta} e_r| = d \theta$

But $d_{\theta} e_r$ has the same direction as $e_{\theta}$. Therefore

$d_{\theta} e_r = \frac{1}{r} d \theta e_{\theta}$

where the factor $\frac{1}{r}$ is needed to correct for the magnitude $r$ of $e_{\theta}$ (we only want the direction of $e_{\theta}$ here). Therefore we see that

$\frac{d_{\theta} e_r}{d \theta} \equiv \frac{\partial e_r}{\partial \theta} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

This completes the geometric calculation of all the Christoffel symbols for plane polar coordinates.

Algebraic approach

It is possible to calculate the eight Christoffel symbols quite easily for plane polar coordinates by first expressing the basis vectors $e_r$ and $e_{\theta}$ in terms of the Cartesian basis vectors $e_x$ and $e_y$. Note that basis vectors transform covariantly (with the same matrix as the components of a one-form), i.e.,

$e^{\prime}_{\alpha} = \frac{\partial x^{\beta}}{\partial x^{\prime \alpha}} e_{\beta}$

We use the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

to calculate the coefficients. We get

$e_r = \frac{\partial x}{\partial r} e_x + \frac{\partial y}{\partial r} e_y$

$e_{\theta} = \frac{\partial x}{\partial \theta} e_x + \frac{\partial y}{\partial \theta} e_y$

and therefore

$e_r = \cos \theta e_x + \sin \theta e_y$

$e_{\theta} = -r \sin \theta e_x + r \cos \theta e_y$

Then we calculate the Christoffel symbols as follows. First,

$\frac{\partial e_r}{\partial r} = 0$

so

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta} = 0$

and we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Next,

$\frac{\partial e_r}{\partial \theta} = - \sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

Next,

$\frac{\partial e_{\theta}}{\partial \theta} = -r \cos \theta e_x - r \sin \theta e_y = -r e_r$

so

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta} = -r e_r$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

Finally,

$\frac{\partial e_{\theta}}{\partial r} = -\sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

Metric tensor approach

The previous approach relied on knowing the functional relationship between the Cartesian coordinates $(x, y)$ and the plane polar coordinates $(r, \theta)$. There is another more generally useful method of calculating the Christoffel symbols from the components of the metric tensor, using the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

I will first derive this formula from first principles, then use it to find the Christoffel symbols for the plane polar coordinates case.

The first step is to show that Christoffel symbols are symmetric in their lower indices, i.e.,

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

as this property will be needed in the derivation of the formula. To prove the symmetry property we start from the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Suppose we now decompose the basis vectors $e_{\alpha}$ in a local Cartesian coordinate system. Then using the covariant transformation rule for basis vectors we have

$e_{\alpha} = \frac{\partial x^m}{\partial x^{\alpha}} e_m$

where the $x^m$ are the Cartesian coordinates and the $e_m$ are the coordinate basis vectors (which are constant in both magnitude and direction in the Cartesian system). Differentiating gives

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

Equating the expressions for $\frac{\partial e_{\alpha}}{\partial x^{\beta}}$ we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

But then

$\Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\beta} \partial x^{\alpha}} e_m$

so it follows from Young’s Theorem (equality of cross-partials) that

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

We conclude that Christoffel symbols are symmetric in their lower indices, as claimed.

Note too that the components $g_{\mu \gamma}$ of the general metric tensor are also symmetric with respect to their indices. This follows from the defining equation of the metric tensor components in terms of the basis vector fields $e_{\gamma}$, namely

$g_{\mu \gamma} \equiv e_{\mu} \cdot e_{\gamma}$

Since $e_{\mu} \cdot e_{\gamma} = e_{\gamma} \cdot e_{\mu}$

the metric is symmetric, i.e.,

$g_{\mu \gamma} = g_{\gamma \mu}$

To derive the formula for the Christoffel symbols in terms of the metric tensor components, we begin again with the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Taking the scalar product with another basis vector on both sides we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} \cdot e_{\mu} = \frac{\partial e_{\alpha}}{\partial x^{\beta}} \cdot e_{\mu}$

$= \frac{\partial (e_{\alpha} \cdot \ e_{\mu})}{\partial x^{\beta}} - e_{\alpha} \cdot \frac{\partial e_{\mu}}{\partial x^{\beta}}$

$= \frac{\partial g_{\alpha \mu}}{\partial x^{\beta}} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ e_{\alpha} \cdot \ e_{\rho}$

$= \partial_{\beta} \ g_{\alpha \mu} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

Therefore we have

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

In the second term on the right hand side we can rename $\rho \rightarrow \gamma$ and use the fact that the metric is symmetric to reverse the indices. We get

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\gamma}_{\hphantom{\gamma} \mu \beta} \ g_{\gamma \alpha}$

By cyclically renaming the indices $\beta$, $\alpha$, and $\mu$ we can generate two more similar equations. From the cyclic permutation $\beta$, $\alpha$, $\mu$ $\rightarrow$ $\alpha$, $\mu$, $\beta$ we get

$\partial_{\alpha} \ g_{\mu \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \mu \alpha} \ g_{\gamma \beta} + \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ g_{\gamma \mu}$

and from the cyclic permutation $\alpha$, $\mu$, $\beta$ $\rightarrow$ $\mu$, $\beta$, $\alpha$ we get

$\partial_{\mu} \ g_{\beta \alpha} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \mu} \ g_{\gamma \alpha} + \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \mu} \ g_{\gamma \beta}$

Now we add the first two equations and subtract the third to get

$\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha} = 2 \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu}$

where we have taken advantage of the symmetry in the lower indices of the Christoffel symbols to cancel some terms. Using the fact that

$g^{\mu \gamma} \ g_{\gamma \sigma} = \delta^{\mu}_{\sigma}$

we multiply both sides by $\frac{1}{2}g^{\mu \gamma}$ to get the final formula:

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2}g^{\mu \gamma}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha}]$

$= \frac{1}{2}g^{\gamma \mu}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

This is made easier to remember by noting the following facts. A factor of the inverse metric generates the Christoffel symbol’s upper index. The negative term has the symbol’s lower indices as the indices of the metric. The other two terms in the bracket are cyclic permutations of this last term.

Having derived the formula we can now employ it to calculate the eight Christoffel symbols for plane polar coordinates. We can work out the metric tensor using the distance formula

$ds^2 = dx^2 + dy^2$

with the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

Then

$dx = \cos \theta dr - r \sin \theta d \theta$

$dy = \sin \theta dr + r \cos \theta d \theta$

so

$dx^2 = \cos^2 \theta dr^2 + r^2 \sin^2 \theta d \theta^2 - 2 r \sin \theta \cos \theta dr d \theta$

$dy^2 = \sin^2 \theta dr^2 + r^2 \cos^2 \theta d \theta^2 + 2 r \sin \theta \cos \theta dr d \theta$

Therefore the metric in plane polar coordinates is

$ds^2 = dx^2 + dy^2 = dr^2 + r^2 d \theta^2$

The metric tensor is therefore

$[g_{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix}$

and the inverse metric is

$[g^{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{r^2} \end{pmatrix}$

Now, in the formula for $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ the indices $\alpha$, $\beta$, $\gamma$ and $\mu$ represent the polar coordinates $r$ and $\theta$ in various permutations. Inspection of $[g_{\alpha \beta}]$ shows that the only partial derivative of a metric component which does not vanish is

$\partial_r \ g_{\theta \theta} = \partial_r (r^2) = 2r$

Inspection of $[g^{\alpha \beta}]$ shows that the only non-vanishing components of the inverse metric are

$g^{rr} = 1$

and

$g^{\theta \theta} = \frac{1}{r^2}$

Substituting these values of the metric tensor components into the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

we get

$\Gamma^{r}_{\hphantom{r} \theta \theta} = \frac{1}{2} g^{r r} \big( - \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} (-2r) = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$
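The formula lends itself to automation. Here is a short Python/sympy helper (an illustrative sketch of my own) that computes every Christoffel symbol directly from a given metric and reproduces the values above:

```python
import sympy as sp

def christoffel(g, coords):
    """Gamma^c_{ab} = (1/2) g^{cm} (d_b g_{am} + d_a g_{mb} - d_m g_{ab})."""
    n = len(coords)
    ginv = g.inv()
    return [[[sp.simplify(sum(sp.Rational(1, 2)*ginv[c, m]
              * (sp.diff(g[a, m], coords[b]) + sp.diff(g[m, b], coords[a])
                 - sp.diff(g[a, b], coords[m])) for m in range(n)))
              for b in range(n)] for a in range(n)] for c in range(n)]

r, th = sp.symbols('r theta', positive=True)
G = christoffel(sp.Matrix([[1, 0], [0, r**2]]), [r, th])

# Indices: 0 = r, 1 = theta
print(G[0][1][1])   # Gamma^r_{theta theta} = -r
print(G[1][0][1])   # Gamma^theta_{r theta} = 1/r
print(G[1][1][0])   # Gamma^theta_{theta r} = 1/r
print(G[0][0][0], G[1][1][1], G[0][1][0], G[0][0][1], G[1][0][0])   # all zero
```

The same helper works unchanged for any metric, e.g. the 2-sphere metric $\mathrm{diag}(1, \sin^2\theta)$, which is one advantage of the metric tensor approach over the geometric and algebraic ones.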

# On Lie derivatives of tensor fields

Differential geometry provides a number of ways of extending the familiar notion of the derivative of a real-valued function to enable differentiation of various types of tensor fields that ‘live’ on manifolds, such as scalars, contravariant vectors, one-forms (also known as covariant vectors) and mixed tensors. The problem that needs to be overcome in such cases is the fact that partial differentiation of tensors is generally not tensorial, i.e., the result is not itself a tensor. The reason is that the process of differentiation involves subtracting tensors living in different tangent spaces on a curved manifold, so their difference does not transform in the same way as either of the tensors individually. For example, consider a contravariant vector field $X^a$. In the tangent space at point $P$ the transformation law for $X^a$ is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_P X^b$

whereas in the tangent space at point $Q$ the transformation law is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_Q X^b$

If we imagine these two tangent spaces at points $P$ and $Q$ on the manifold separated by distance $\delta u$, the derivative would involve computing

$\lim_{\delta u \rightarrow 0} \frac{[X^a]_P - [X^a]_Q}{\delta u}$

and the difference in the numerator will clearly not transform like either of the contravariant vectors individually since their transformation matrices are evaluated at different points. The derivative will not therefore be a tensor itself.

The usual way of getting around this is to introduce some kind of auxiliary field on to the manifold which provides a link between the two tensors, thus enabling them to transform in the same way (with respect to the auxiliary field) when they are subtracted. In the present note I want to explore the particular concept of the Lie derivative of a tensor field (named after the mathematician Marius Sophus Lie, 1842-1899) which employs this auxiliary field approach by introducing a contravariant vector field on to the manifold. To this end, suppose we define a vector field $X^a (x)$. This can be used to define streamlines in the manifold (also called a congruence of curves) as the solutions of the ordinary differential equations

$\frac{dx^a}{du} = X^a (x(u))$

where $u$ is a parameter determining the position on a given streamline. The equations encapsulate the fact that each point on a streamline has a tangent vector belonging to the vector field.

Example. Using the notation $x^i$ to denote the $i$-th coordinate, suppose the vector field is $X = (1, x^2(u))$. Then $X^1 = 1$, $X^2 = x^2$ and the streamlines for this vector field are obtained by solving simultaneously the differential equations

$\frac{dx^1}{du} = 1$

$\frac{dx^2}{du} = x^2$

Solving the first equation gives

$x^1 = u + c^1$

Solving the second equation gives

$x^2 = e^{c^2}e^u$

Using the solution of the first equation to substitute for $u$ in the solution of the second one we get

$x^2 = e^{c^2} e^{x^1 - c^1} = e^{c^2 - c^1}e^{x^1} \equiv Ce^{x^1}$

Therefore the streamlines of the vector field $(1, x^2(u))$ are the graphs of the equation

$x^2 = Ce^{x^1}$

where $C$ is a constant. Some of the streamlines are shown in the figure below.
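As a quick numerical sanity check (my own illustration), we can integrate the streamline equations with a standard fourth-order Runge-Kutta step and confirm that the combination $x^2 e^{-x^1}$ stays constant along a streamline, as the closed-form solution $x^2 = Ce^{x^1}$ requires:

```python
import numpy as np

def rhs(x):
    # dx1/du = 1, dx2/du = x2
    return np.array([1.0, x[1]])

x = np.array([0.0, 2.0])   # initial point, so the streamline constant is C = 2
h = 0.01
for _ in range(200):       # integrate from u = 0 to u = 2
    k1 = rhs(x)
    k2 = rhs(x + h/2*k1)
    k3 = rhs(x + h/2*k2)
    k4 = rhs(x + h*k3)
    x = x + h/6*(k1 + 2*k2 + 2*k3 + k4)

C = x[1]*np.exp(-x[0])
print(C)   # stays very close to 2
```

Any other starting point gives the same behaviour with a different constant $C$, matching the family of exponential streamlines described above.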

Now suppose we want to find the Lie derivative of a tensor field, $T^{ab \cdots}_{cd \cdots}(x)$, using the vector field $X^a(x)$. The essential idea is to use the streamlines of the vector field to link the tensor at some point $P$, $T^{ab\cdots}_{cd \cdots}(P)$, with the tensor at some neighbouring point $Q$, $T^{ab\cdots}_{cd \cdots}(Q)$, in such a way that the two will have the same transformation matrix at point $Q$ (with respect to the auxiliary vector field). We can then subtract the two tensors at $Q$ and so define the derivative at $P$ by a limiting process as $Q$ tends to $P$. In all such cases, the technique begins by considering a coordinate transformation from $P$ to $Q$ of the form

$x^{\prime a} = x^a + \delta u X^a(x)$

where $\delta u$ is arbitrarily small. The point $Q$ with coordinates $x^{\prime a}$ lies on the streamline through $P$ which the vector field $X^a(x)$ generates. Differentiating the coordinate transformation we get

$\frac{\partial x^{\prime a}}{\partial x^b} = \frac{\partial x^a}{\partial x^b} + \delta u \ \partial_b X^a$

$= \delta^a_b + \delta u \ \partial_b X^a$

where $\delta^a_b$ is the Kronecker delta and $\partial_b \equiv \frac{\partial}{\partial x^b}$. What we now do is consider the effect of the above coordinate transformation on the tensor field $T^{ab\cdots}_{cd \cdots}$ at the points $P$ and $Q$. In what follows I will employ this general procedure to obtain the Lie derivative formulas with respect to a contravariant vector field $X^a(x)$ in the case of a scalar field $\phi$, a contravariant vector field $Y^a$, a one-form field $Y_a$, and a general mixed tensor field $T^{ab\cdots}_{cd \cdots}$.

The Lie derivative of a scalar field $\phi$

Not surprisingly this is the easiest case to deal with since scalars are invariants: the values of a scalar field defined over a manifold do not change under a change in the coordinate system being used. The value of the scalar field at the point $P$ will simply be $\phi(x)$ and the value at the point $Q$ will be

$\phi(x^{\prime}) = \phi(x^c + \delta u X^c)$

We can expand this in a Taylor series about the point $P$ with coordinates $x$ to get the first-order approximation

$\phi(x^{\prime}) \approx \phi(x) + \delta u X^c \ \partial_c \phi(x)$

The Lie derivative of the scalar field with respect to the contravariant vector field $X^a(x)$ is then

$L_X \phi = \lim_{\delta u \rightarrow 0} \frac{\phi(x^{\prime}) - \phi(x)}{\delta u} = X^c \ \partial_c \phi$

$\equiv X^a \ \partial_a \phi$

We observe that the Lie derivative of the scalar field $\phi(x)$ with respect to the vector field $X^a(x)$ is actually the directional derivative of $\phi$ in the direction of the vector $X^a$. In the differential geometry literature in this area it is common to associate the contravariant vector field $X$ with the linear differential operator $X^a \ \partial_a$ (which operates on any real-valued function $f$ to produce another function $g$) and essentially treat them as the same object. Given a point $P$ on the manifold, one thinks of the partial differential operators $[\partial_a]_P$ as constituting a basis for all the vectors in the tangent space at $P$, so that any vector at $P$ can be written as a linear combination of the $[\partial_a]_P$ in the form

$[X]_P = [X^a]_P [\partial_a]_P$

This is the intuitive justification for treating the vector field $X$ and the linear differential operator $X^a \ \partial_a$ as being the same thing. Under this convention, one often sees the Lie derivative of a scalar field $\phi$ with respect to the contravariant vector field $X$ written as

$L_X \phi = X \phi$
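The limit definition above can be checked symbolically. The following sketch (assuming Python with sympy; the scalar field $\phi = x^2 y$ and vector field $X^a = (y, -x)$ are arbitrary illustrative choices, not taken from the text) computes the difference quotient directly and compares it with the directional derivative $X^a \ \partial_a \phi$:

```python
import sympy as sp

x, y, du = sp.symbols('x y du')
phi = x**2 * y            # example scalar field (illustrative assumption)
X = [y, -x]               # example vector field X^a (illustrative assumption)

# phi evaluated at the dragged point x'^a = x^a + du X^a
phi_shifted = phi.subs({x: x + du*X[0], y: y + du*X[1]}, simultaneous=True)

# Lie derivative as the limit of the difference quotient
L_X_phi = sp.limit(sp.expand((phi_shifted - phi)/du), du, 0)

# the directional-derivative formula X^a ∂_a φ
formula = X[0]*sp.diff(phi, x) + X[1]*sp.diff(phi, y)
assert sp.simplify(L_X_phi - formula) == 0
```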

The Lie derivative of a contravariant vector field $Y^a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the contravariant vector field $Y^a(x)$ at $P$ is mapped to

$Y^{\prime a}(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^b} Y^b(x)$

$= (\delta^a_b + \delta u \ \partial_b X^a) Y^b(x)$

$= Y^a(x) + \delta u \ Y^b \ \partial_b X^a$

The vector already at $Q$, namely $Y^a(x^{\prime})$, has a first-order Taylor series approximation about $x$ of the form

$Y^a(x^{\prime}) = Y^a(x^c + \delta u X^c)$

$\approx Y^a(x) + \delta u \ X^c \ \partial_c Y^a(x)$

The Lie derivative with respect to the vector field $X^a(x)$ is then given by

$L_X Y^a = \lim_{\delta u \rightarrow 0} \frac{Y^a (x^{\prime}) - Y^{\prime a}(x^{\prime})}{\delta u}$

$= X^c \ \partial_c Y^a - Y^b \ \partial_b X^a$

$\equiv X^b \ \partial_b Y^a - Y^b \ \partial_b X^a$

Under the convention of associating the vector field $X$ with the linear differential operator $X^a \partial_a$, one often sees the Lie derivative of a contravariant vector field $Y^a$ with respect to the field $X$ written as

$L_X Y^a = [X, Y]^a$

where $[X, Y] = XY - YX$ is called the Lie bracket (or commutator) of the two vector fields $X$ and $Y$. This is a new vector field (and therefore linear differential operator) that can be written alternatively as

$[X, Y] = X(Y^a \partial_a) - Y(X^a \partial_a)$

$= X^b \partial_b (Y^a \partial_a) - Y^b \partial_b (X^a \partial_a)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a + X^b Y^a(\partial_b \partial_a - \partial_a \partial_b)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a$

where the last equality follows from the fact that the second term in the penultimate line will always vanish by Young’s Theorem (equality of cross-partials). Therefore the $a$-th component of the vector field $[X, Y]$ is the one that appears in the expression of the Lie derivative $L_X Y^a$ above.
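Both the cancellation of the second-derivative terms and the identification of $[X, Y]^a$ with $X^b \partial_b Y^a - Y^b \partial_b X^a$ can be verified symbolically. This sketch (assuming Python with sympy; the two vector fields are arbitrary illustrative choices) applies the operators $XY - YX$ to a generic function $f$ and compares the result with the bracket components acting on $f$:

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)
X = [x*y, y**2]           # example components X^a (illustrative assumption)
Y = [sp.sin(x), x + y]    # example components Y^a (illustrative assumption)

def lie_bracket(X, Y):
    # [X, Y]^a = X^b ∂_b Y^a - Y^b ∂_b X^a
    return [sum(X[b]*sp.diff(Y[a], coords[b]) - Y[b]*sp.diff(X[a], coords[b])
                for b in range(2)) for a in range(2)]

# apply XY - YX to a generic scalar f; mixed partials of f cancel
f = sp.Function('f')(x, y)
Xop = lambda g: sum(X[a]*sp.diff(g, coords[a]) for a in range(2))
Yop = lambda g: sum(Y[a]*sp.diff(g, coords[a]) for a in range(2))
commutator = sp.expand(Xop(Yop(f)) - Yop(Xop(f)))

# the first-order operator built from the bracket components
bracket = lie_bracket(X, Y)
direct = sp.expand(sum(bracket[a]*sp.diff(f, coords[a]) for a in range(2)))
assert sp.simplify(commutator - direct) == 0
```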

The Lie derivative of a one-form (covariant vector) field $Y_a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the one-form (covariant vector) field $Y_a(x)$ at $P$ is mapped to

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

To work out the transformation matrix here we need to write the coordinate transformation as

$x^{\prime b} = x^b + \delta u X^b(x)$

$\implies$

$x^b = x^{\prime b} - \delta u X^b(x)$

Partially differentiating we get

$\frac{\partial x^b}{\partial x^{\prime a}} = \delta^b_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^b$

$= \delta^b_a - \delta u \ \partial_c X^b \frac{\partial x^c}{\partial x^{\prime a}}$

$= \delta^b_a - \delta u \ \partial_c X^b \big(\delta^c_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^c\big)$

$= \delta^b_a - \delta u \ \partial_a X^b + O((\delta u)^2)$

We can ignore the $O((\delta u)^2)$ terms as they will disappear in the limiting process of the differentiation, so we have

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

$= \big( \delta^b_a - \delta u \ \partial_a X^b \big) Y_b(x)$

$= Y_a(x) - \delta u Y_b \ \partial_a X^b$

Again taking a first-order Taylor series approximation about $x$ at the point $Q$ we get that

$Y_a(x^{\prime}) = Y_a(x^c + \delta u X^c)$

$\approx Y_a(x) + \delta u X^c \ \partial_c Y_a(x)$

Then the Lie derivative of the one-form field $Y_a(x)$ with respect to the contravariant vector field $X^a(x)$ is obtained as

$L_X Y_a = \lim_{\delta u \rightarrow 0} \frac{Y_a(x^{\prime}) - Y^{\prime}_a (x^{\prime})}{\delta u}$

$= X^c \partial_c Y_a + Y_b \partial_a X^b$

$= X^b \partial_b Y_a + Y_b \partial_a X^b$
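A useful consistency check on the three formulas obtained so far is that the Lie derivative obeys the Leibniz rule on the contraction $Y_a Z^a$, which is a scalar. The sketch below (assuming Python with sympy; the fields are arbitrary illustrative choices) verifies that $L_X(Y_a Z^a) = (L_X Y_a) Z^a + Y_a (L_X Z^a)$:

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)
X  = [y**2, x]            # contravariant field X^a (illustrative assumption)
Yd = [x*y, sp.cos(y)]     # one-form components Y_a (illustrative assumption)
Z  = [x + y, x*y]         # contravariant field Z^a (illustrative assumption)

D = lambda g, b: sp.diff(g, coords[b])

# L_X Y_a = X^b ∂_b Y_a + Y_b ∂_a X^b
LXYd = [sum(X[b]*D(Yd[a], b) + Yd[b]*D(X[b], a) for b in range(2))
        for a in range(2)]
# L_X Z^a = X^b ∂_b Z^a - Z^b ∂_b X^a
LXZ = [sum(X[b]*D(Z[a], b) - Z[b]*D(X[a], b) for b in range(2))
       for a in range(2)]

s = sum(Yd[a]*Z[a] for a in range(2))       # the scalar contraction Y_a Z^a
LXs = sum(X[b]*D(s, b) for b in range(2))   # scalar Lie derivative X^b ∂_b s
leibniz = sum(LXYd[a]*Z[a] + Yd[a]*LXZ[a] for a in range(2))
assert sp.simplify(LXs - leibniz) == 0
```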

The Lie derivative of a mixed tensor field $T^{ab\cdots}_{cd \cdots}$

A good prototype for this case is the Lie derivative of the simplest type of mixed tensor field, the rank-2 tensor of type $(1, 1)$ represented as $T^a_b(x)$. We will therefore consider this case first and then use it to extrapolate to a general mixed tensor field of type $(p, q)$ represented as $T^{ab \cdots}_{cd \cdots}(x)$.

Under the coordinate transformation

$x^{\prime a} = x^a + \delta u X^a(x)$

the mixed tensor field $T^a_b(x)$ transforms as

$T^{\prime a}_b(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^c} \frac{\partial x^d}{\partial x^{\prime b}} T^c_d(x)$

$= (\delta^a_c + \delta u \ \partial_c X^a)(\delta^d_b - \delta u \ \partial_b X^d) T^c_d$

$= (\delta^a_c + \delta u \ \partial_c X^a)(T^c_b - \delta u T^c_d \ \partial_b X^d)$

$= T^a_b(x) - \delta u T^a_d \ \partial_b X^d + \delta u T^c_b \ \partial_c X^a + O((\delta u)^2)$

Under a first-order Taylor series approximation about $x$, the tensor at $Q$ can be written

$T^a_b(x^{\prime}) = T^a_b(x^c + \delta u X^c)$

$\approx T^a_b(x) + \delta u X^c \ \partial_c T^a_b$

The Lie derivative of $T^a_b(x)$ with respect to the contravariant vector field $X^a(x)$ is then

$L_X T^a_b = \lim_{\delta u \rightarrow 0} \frac{T^a_b(x^{\prime}) - T^{\prime a}_b(x^{\prime})}{\delta u}$

$= X^c \ \partial_c T^a_b + T^a_d \ \partial_b X^d - T^c_b \ \partial_c X^a$

We observe that the contravariant index $a$ contributes a term of the form $-T^c_b \ \partial_c X^a$ while the covariant index $b$ contributes a term of the form $T^a_d \ \partial_b X^d$.

Now consider the general mixed tensor $T^{ab \cdots}_{cd \cdots}$. The first-order Taylor series approximation of $T^{ab \cdots}_{cd \cdots}(x^{\prime})$ about $x$ gives

$T^{ab \cdots}_{cd \cdots}(x^e + \delta u X^e)$

$\approx T^{ab \cdots}_{cd \cdots}(x) + \delta u X^e \ \partial_e T^{ab \cdots}_{cd \cdots}(x)$

Therefore the first term of the Lie derivative will be $X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$. This is of the same form as the first term of $L_X T^a_b$. Conveniently, it turns out that from then on each contravariant and covariant index in $T^{ab \cdots}_{cd \cdots}$ will contribute terms like the corresponding terms we saw above in $L_X T^a_b$. Therefore the Lie derivative of the general mixed tensor field $T^{ab \cdots}_{cd \cdots}(x)$ with respect to the contravariant vector field $X^a(x)$ will be of the form

$L_X T^{ab \cdots}_{cd \cdots} = X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$

$- \ T^{eb \cdots}_{cd \cdots} \ \partial_e X^a \ - \ T^{ae \cdots}_{cd \cdots} \ \partial_e X^b \ - \ \cdots$

$+ \ T^{ab \cdots}_{ed \cdots} \ \partial_c X^e \ + \ T^{ab \cdots}_{ce \cdots} \ \partial_d X^e \ + \ \cdots$
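The general formula can be sanity-checked against the earlier vector and one-form results by taking the product tensor $T^a_b = Y^a Z_b$, for which the Leibniz rule $L_X(Y^a Z_b) = (L_X Y^a) Z_b + Y^a (L_X Z_b)$ must hold. This sketch (assuming Python with sympy; all fields are arbitrary illustrative choices) confirms the $(1, 1)$ case component by component:

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = (x, y)
n = 2
X  = [x + y, x*y]         # X^a (illustrative assumption)
Yc = [y, x**2]            # contravariant Y^a (illustrative assumption)
Zd = [sp.sin(y), x]       # one-form Z_b (illustrative assumption)
D = sp.diff

def lie_T(T):
    # L_X T^a_b = X^c ∂_c T^a_b + T^a_d ∂_b X^d - T^c_b ∂_c X^a
    return [[sum(X[c]*D(T[a][b], coords[c]) for c in range(n))
             + sum(T[a][d]*D(X[d], coords[b]) for d in range(n))
             - sum(T[c][b]*D(X[a], coords[c]) for c in range(n))
             for b in range(n)] for a in range(n)]

# the product tensor T^a_b = Y^a Z_b
T = [[Yc[a]*Zd[b] for b in range(n)] for a in range(n)]
LT = lie_T(T)

# Lie derivatives of the two factors from the earlier formulas
LXY = [sum(X[c]*D(Yc[a], coords[c]) - Yc[c]*D(X[a], coords[c])
           for c in range(n)) for a in range(n)]
LXZ = [sum(X[c]*D(Zd[b], coords[c]) + Zd[c]*D(X[c], coords[b])
           for c in range(n)) for b in range(n)]
prod_rule = [[LXY[a]*Zd[b] + Yc[a]*LXZ[b] for b in range(n)] for a in range(n)]

assert all(sp.simplify(LT[a][b] - prod_rule[a][b]) == 0
           for a in range(n) for b in range(n))
```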

# A closer look at how the principal, orbital and magnetic quantum numbers arise in Schrödinger’s theory of the hydrogen atom

In this note I want to explore some aspects of the solution of Schrödinger’s three-dimensional wave equation in spherical polar coordinates which pertain to the three main quantum numbers characterising the electron in a hydrogen atom: the electron’s principal, orbital and magnetic quantum numbers. In particular, I have noticed that many physics and mathematics sources tend to gloss over some (or all) of the mathematical details of how permissibility conditions on the three quantum numbers emerge naturally from validity constraints on the relevant underlying differential equations. I want to bring out these details clearly in the present note.

In general, four quantum numbers are needed to fully describe atomic electrons in many-electron atoms. These four numbers and their permissible values are:

Principal quantum number $\tilde{n} = 1, 2, 3, \ldots$

Orbital quantum number $l = 0, 1, 2, \ldots, (\tilde{n} - 1)$

Magnetic quantum number $m_l = 0, \pm 1, \pm 2, \ldots, \pm l$

Spin magnetic quantum number $m_s = -\frac{1}{2}, +\frac{1}{2}$

The principal quantum number determines the electron’s energy, the orbital quantum number its orbital angular-momentum magnitude, the magnetic quantum number its orbital angular-momentum direction, and the spin magnetic quantum number its spin direction.

I have noticed that it is often not explained clearly why, for example, the orbital quantum number cannot exceed the principal quantum number minus one, or why the magnitude of the magnetic quantum number cannot exceed that of the orbital quantum number. I will explore issues like these in depth in the context of the time-independent Schrödinger equation for the hydrogen atom, which only involves the first three quantum numbers. I will not discuss the spin magnetic quantum number in the present note.
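The permissibility rules listed above can be made concrete by enumerating the allowed combinations. A small sketch (Python; the $n^2$ degeneracy count is a standard consequence of the listed ranges rather than something stated explicitly above) counts the orbital states, ignoring spin, for a given principal quantum number:

```python
def states(n):
    # allowed (l, m_l) pairs: l = 0, ..., n - 1 and m_l = -l, ..., l
    return [(l, m_l) for l in range(n) for m_l in range(-l, l + 1)]

# summing 2l + 1 over l = 0..n-1 gives n^2 orbital states per shell
for n in range(1, 6):
    assert len(states(n)) == n**2
```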

Schrödinger’s wave equation for the hydrogen atom

In Cartesian coordinates, Schrödinger’s three-dimensional equation for the electron in the hydrogen atom is

$\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

where $m_e$ denotes the electron mass. The potential energy $U$ is the electric potential energy of a charge $-e$ given that it is at distance $r$ from another charge $+e$, namely

$U = -\frac{e^2}{4 \pi \epsilon_0 r}$

It is necessary to change variables in Schrödinger’s equation since the potential energy is a function of radial distance $r$ rather than the Cartesian coordinate variables $x$, $y$ and $z$. Given the spherical symmetry of the atom, it is sensible to proceed by changing the variables in Schrödinger’s equation to those of spherical polar coordinates (rather than changing the $r$ variable in $U$ to Cartesian coordinates using $r = \sqrt{x^2 + y^2 + z^2}$). Only the variables in the Laplacian part of Schrödinger’s equation need to be changed, so we can use the approach in my previous note on changing variables in Laplace’s equation to get

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

I will now temporarily simplify things by using the representation of the square of the angular momentum operator in spherical polar coordinates which I obtained in another previous note, namely

$L^2 = - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$

$= - \hbar^2 r^2 \big( \frac{1}{r^2 \sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$

Using this to replace the two middle terms in Schrödinger’s equation and rearranging we get

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{2m_e}{\hbar^2}(E - U) \psi = \frac{L^2}{\hbar^2 r^2} \psi$

This equation can now be solved by the usual separation of variables approach. We assume that the $\psi$ function can be expressed as a product

$\psi(r, \theta, \phi) = R(r) Y(\theta, \phi)$

and then substitute this back into the wave equation to get

$\frac{Y}{r^2} \frac{d}{ d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e}{\hbar^2}(E - U) R Y = \frac{R}{\hbar^2 r^2} L^2 Y$

Dividing through by $\frac{R Y}{r^2}$ we get

$\frac{1}{R} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e r^2}{\hbar^2}(E - U) = \frac{1}{\hbar^2 Y} L^2 Y$

Since the left-hand side of this equation depends only on $r$ while the right-hand side depends only on $\theta$ and $\phi$, both sides must be equal to some constant which we can call $\lambda$. Setting the left and right-hand sides equal to $\lambda$ in turn and rearranging slightly we finally get the radial equation

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

and the angular equation

$L^2 Y = \lambda \hbar^2 Y$

We can now apply separation of variables again to the angular equation. Rewriting the operator $L^2$ in full the angular equation becomes

$- \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} \big) = \lambda \hbar^2 Y$

which simplifies to

$\frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} + \lambda Y = 0$

We assume that the $Y$ function can be written as the product

$Y(\theta, \phi) = \Theta (\theta) \Phi (\phi)$

Substituting this into the angular equation gives

$\frac{\Phi}{\sin \theta}\frac{d}{d \theta} \big( \sin \theta \frac{d \Theta}{d \theta}\big) + \frac{\Theta}{\sin^2 \theta}\frac{d^2 \Phi}{d \phi^2} + \lambda \Theta \Phi = 0$

Multiplying through by $\frac{\sin^2 \theta}{\Theta \Phi}$ and rearranging we get

$-\frac{1}{\Phi} \frac{d^2 \Phi}{d \phi^2} = \frac{\sin^2 \theta}{\Theta}\big[ \frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \lambda \Theta\big]$

Since the left-hand side of this equation depends only on $\phi$ while the right-hand side depends only on $\theta$, both sides must be equal to some constant which we can provisionally call $k$. Setting the left and right-hand sides equal to $k$ in turn and rearranging we get

$\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0$

and

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0$

We now have three ordinary differential equations that need to be solved, one for $\Phi$, one for $\Theta$ and one for $R$. We will solve each of them in turn.

The equation for $\Phi$

The equation for $\Phi$ is a straightforward second-order differential equation with auxiliary equation

$\zeta^2 + k = 0$

implying $\zeta = \pm \sqrt{-k}$ if $k < 0$ and $\zeta = \pm i \sqrt{k}$ if $k > 0$. Therefore it has a general solution of the form

$\Phi(\phi) = Ae^{\sqrt{-k} \phi} + Be^{- \sqrt{-k} \phi}$

if $k < 0$ and $\Phi(\phi) = Ae^{i \sqrt{k} \phi} + Be^{- i \sqrt{k} \phi}$ if $k > 0$, where $A$ and $B$ are arbitrary constants. Now, the azimuth angle $\phi$ can take any value in $(-\infty, \infty)$ but the function $\Phi$ must take a single value at each point in space (since this is a required property of the quantum wave function which $\Phi$ is a constituent of). It follows that the function $\Phi$ must be periodic since it must take the same value at $\phi$ and $\phi + 2\pi$ for any given $\phi$. This imposes two constraints on the form of the general solution: (1) it cannot consist only of exponential functions with real arguments since these are not periodic (thus ruling out the first general solution above and thereby implying that the separation constant $k$ must be nonnegative); (2) $\sqrt{k}$ must be an integer. Given these constraints, it is customary in quantum mechanics to denote $\pm \sqrt{k}$ by the letter $m$ (it is called the magnetic quantum number) and to specify the separation constant in the angular equations as $m^2$, which guarantees its nonnegativity. We then state the general solution of the equation for $\Phi$ as

$\Phi(\phi) = Ae^{i m \phi} + Be^{- i m \phi}$

In practice, a particular electron wave function will involve only a single value of $m$ so only the first of the two terms in the general solution will be necessary. We can therefore state the general solution of the equation for $\Phi$ for a given magnetic quantum number $m$ as

$\Phi(\phi) \propto e^{i m \phi}$
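The single-valuedness argument above can be illustrated numerically: $e^{i m \phi}$ returns to the same value after a full revolution only when $m$ is an integer. A quick sketch (Python; the sample angle and the non-integer test value are arbitrary choices):

```python
import cmath, math

def Phi(m, phi):
    # azimuthal solution Φ(φ) ∝ e^{imφ}
    return cmath.exp(1j*m*phi)

phi0 = 0.7  # arbitrary azimuth angle (illustrative assumption)

# integer m: Φ is single-valued under φ → φ + 2π
assert abs(Phi(2, phi0 + 2*math.pi) - Phi(2, phi0)) < 1e-12
# non-integer m: single-valuedness fails
assert abs(Phi(0.5, phi0 + 2*math.pi) - Phi(0.5, phi0)) > 1e-6
```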

The equation for $\Theta$

Given that we now know the separation constant for the angular equations is either zero or a positive square number $k = m^2$, we can write the equation for $\Theta$ as

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Expanding the first term we get

$\frac{1}{\sin \theta} \cos \theta \frac{d \Theta}{d \theta} + \frac{d^2 \Theta}{d \theta^2} + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

I am now going to multiply and divide the first two terms by $\sin^2 \theta$ to get

$\sin^2 \theta \big(\frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} \big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

$\iff$

$\sin^2 \theta \big(- \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} + \frac{2 \cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Now we can make the change of variable $x = \cos \theta$ which implies $dx = - \sin \theta d \theta$ and therefore

$\frac{d \theta}{dx} = - \frac{1}{\sin \theta}$

$\frac{d \Theta}{d x} = \frac{d \Theta}{d \theta} \frac{d \theta}{d x} = - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta}$

$\frac{d^2 \Theta}{d x^2} = \frac{d}{d \theta} \big[ - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta} \big] \frac{d \theta}{d x} = - \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2}$

Using these in the amended form of the $\Theta$ equation together with the fact that $\sin^2 \theta$ = $1 - x^2$, the $\Theta$ equation becomes

$(1 - x^2) \big(\frac{d^2 \Theta}{d x^2} - \frac{2x}{1 - x^2} \frac{d \Theta}{d x} \big) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

$\iff$

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

We will solve this equation first for the case $m = 0$ (the solutions will be Legendre polynomials) and use these results to construct solutions for the case $m \neq 0$ (the solutions here will be the associated Legendre functions). Setting $m = 0$ we get

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \lambda \Theta = 0$

which has the form of a well known differential equation known as Legendre’s equation. It can be solved by assuming a series solution of the form

$\Theta = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots + a_n x^n + \cdots$

and then differentiating it term by term twice to get

$\Theta^{\prime} = a_1 + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + \cdots + n a_n x^{n-1} + \cdots$

and

$\Theta^{\prime \prime} = 2 a_2 + 6 a_3 x + 12 a_4 x^2 + 20 a_5 x^3 + \cdots + n (n - 1) a_n x^{n-2} + \cdots$

We now substitute these into Legendre’s equation and set the coefficient of each power of $x$ equal to zero (because $\Theta$ must satisfy Legendre’s equation identically). We find that the coefficient of the $x^n$ term satisfies

$(n + 2)(n + 1) a_{n + 2} + (\lambda - n(n + 1)) a_n = 0$

which implies

$a_{n + 2} = - \frac{(\lambda - n(n + 1))}{(n + 2)(n + 1)} a_n$

This formula makes it possible to find any even coefficient as a multiple of $a_0$ and any odd coefficient as a multiple of $a_1$. The general solution of our Legendre equation is then a sum of two series involving two arbitrary constants $a_0$ and $a_1$:

$\Theta = a_0 \bigg \{1 - \frac{\lambda}{2!} x^2 + \frac{\lambda (\lambda - 6)}{4!} x^4 - \frac{\lambda(\lambda - 6)(\lambda - 20)}{6!} x^6 + \cdots \bigg \}$

$+ a_1 \bigg \{x - \frac{(\lambda - 2)}{3!} x^3 + \frac{(\lambda - 2)(\lambda - 12)}{5!} x^5 - \frac{(\lambda - 2)(\lambda - 12)(\lambda - 30)}{7!} x^7 + \cdots \bigg \}$

Both of the series in this sum converge for $x^2 < 1$ but in general they do not converge for $x^2 = 1$. This is a problem for us because in our change of variables we set $x = \cos \theta$ and we want solutions that converge for all possible values of $\theta$ including those that result in $x^2 = 1$. It turns out that the only way to get such solutions is to choose integer values of $\lambda$ that make either the $a_0$ or the $a_1$ series in the above sum terminate (the other series will generally be divergent so we remove it by setting the corresponding arbitrary constant equal to zero). This requires $\lambda$ to take values in the quadratic sequence $0$, $2$, $6$, $12$, $20$, $30$, $42$, $56 \ldots$ The $l$-th term of this sequence is $l(l + 1)$, so the separation constant $\lambda$ must be of this form, i.e., $\lambda = l(l + 1)$ for some $l = 0, 1, 2, 3, \dots$. When $l$ takes an even value the $a_0$ series will terminate and we can set $a_1 = 0$ to make the other series vanish. Conversely, when $l$ takes an odd value the $a_1$ series will terminate and we can set $a_0 = 0$ to make the other series vanish.
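The termination behaviour is easy to see by iterating the recurrence directly. The sketch below (Python; the choice $l = 2$, i.e. $\lambda = 6$, is an arbitrary illustration) shows that the even ($a_0$) series stops after the $x^2$ term, exactly as described:

```python
def legendre_series_coeffs(lam, a0, a1, nmax=12):
    # a_{n+2} = -(lam - n(n+1)) / ((n+2)(n+1)) * a_n
    a = [a0, a1] + [0.0]*(nmax - 1)
    for n in range(nmax - 1):
        a[n + 2] = -(lam - n*(n + 1)) / ((n + 2)*(n + 1)) * a[n]
    return a

# λ = l(l+1) with l = 2 terminates the even series: Θ = a0 (1 - 3x^2)
coeffs = legendre_series_coeffs(lam=6, a0=1.0, a1=0.0)
assert coeffs[:4] == [1.0, 0.0, -3.0, 0.0]
assert all(c == 0.0 for c in coeffs[4:])   # every later coefficient vanishes
```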

From the eigenvalue equation for $L^2$ given earlier ($L^2 Y = \lambda \hbar^2 Y$) it is clear that the magnitude of the orbital angular momentum is $L = \sqrt{l(l + 1)} \hbar$. It is interesting to see how the form of this arises mathematically from considering series solutions to Legendre’s equation above. The parameter $l$ is called the orbital angular momentum quantum number.

Note that negative integral values of $l$ are allowed but they simply give solutions already obtained for positive values. For example, $l = -2$ gives $\lambda = 2$ and this makes the $a_1$ series terminate, yielding the polynomial solution

$\Theta = a_1 x$

This is exactly the same solution as the one that would be obtained if $l = 1$. It is therefore customary to restrict $l$ to nonnegative values. Each possible value of $l$ gives a polynomial solution to Legendre’s equation. For $l = 0$ we get $\Theta = a_0$, for $l = 1$ we get $\Theta = a_1 x$, for $l = 2$ we get $\Theta = a_0 - 3 a_0 x^2$, and so on. If the value of $a_0$ or $a_1$ in each polynomial equation is selected so that $\Theta = 1$ when $x = 1$ the resulting polynomials are called Legendre polynomials, denoted by $P_l(x)$. Given that for each $l$ we have $P_l(1) = 1$ the first few Legendre polynomials are

$P_0(x) = 1$

$P_1(x) = x$

$P_2(x) = \frac{1}{2}(3 x^2 - 1)$

$P_3(x) = \frac{1}{2}\big(5 x^3 - 3x \big)$

These are the physically acceptable solutions to Legendre’s equation for $\Theta$ above.
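Both defining properties of the Legendre polynomials — the normalisation $P_l(1) = 1$ and the fact that they satisfy Legendre's equation with $\lambda = l(l + 1)$ — can be confirmed symbolically. A sketch assuming Python with sympy (whose built-in `legendre` uses the same normalisation):

```python
import sympy as sp

x = sp.symbols('x')
for l in range(4):
    P = sp.legendre(l, x)
    # normalisation P_l(1) = 1
    assert P.subs(x, 1) == 1
    # P_l solves (1 - x^2) Θ'' - 2x Θ' + l(l + 1) Θ = 0
    residual = (1 - x**2)*sp.diff(P, x, 2) - 2*x*sp.diff(P, x) + l*(l + 1)*P
    assert sp.expand(residual) == 0
```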

We now consider the solutions for $m \neq 0$ of the equation

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

We now know that $\lambda = l(l + 1)$ so we can write this in, and we can also add the subscript $l$ to $m$ since the solutions to this equation will involve a link between the values of the orbital angular momentum and magnetic quantum numbers. The equation we need to solve becomes

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big[l(l + 1) - \frac{m_l^2}{1 - x^2} \big] \Theta = 0$

The link between $l$ and $m_l$ arises from the fact that we are constrained in trying to solve this equation: it encompasses the case $m_l = 0$ for which the physically acceptable solutions are the Legendre polynomials $P_l(x)$. Therefore the physically allowable solutions for the above equation must include the Legendre polynomials as a special case. We can find these by using the series approach again and it turns out that the physically acceptable solutions are the so-called associated Legendre functions which take the form

$P_l^{m_l}(x) = (1 - x^2)^{m_l/2}\frac{d^{m_l}}{d x^{m_l}}P_l(x)$

Now, each Legendre polynomial $P_l(x)$ is a polynomial of degree $l$. Therefore the $m_l$-th order derivative in $P_l^{m_l}$ will equal zero if $|m_l| > l$, so for physically acceptable solutions we must impose the constraint $|m_l| \leq l$ in the differential equation for $\Theta$. This is where the link between the quantum numbers $l$ and $m_l$ comes from in the quantum theory of the hydrogen atom: given a value of $l$ the acceptable values of $m_l$ are integers in the range $-l \leq m_l \leq l$.

Finally, note two things: (1) $P_l^{m_l}(x)$ reduces to the Legendre polynomial $P_l(x)$ when $m_l = 0$, which is what we needed. (2) A negative value for $m_l$ does not change $m_l^2$ in the original differential equation so a solution for positive $m_l$ is also a solution for the corresponding negative $m_l$. Thus many references define the associated Legendre function $P_l^{m_l}(x)$ for $-l \leq m_l \leq l$ as $P_l^{|m_l|}(x)$.
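Both the derivative formula for $P_l^{m_l}(x)$ (taken here for $m_l \geq 0$) and the vanishing that forces $|m_l| \leq l$ can be checked directly. A sketch assuming Python with sympy:

```python
import sympy as sp

x = sp.symbols('x')

def assoc_legendre_P(l, m, x):
    # P_l^m(x) = (1 - x^2)^{m/2} d^m/dx^m P_l(x), for m >= 0
    return (1 - x**2)**sp.Rational(m, 2) * sp.diff(sp.legendre(l, x), x, m)

# m = 0 recovers the ordinary Legendre polynomial
assert sp.simplify(assoc_legendre_P(3, 0, x) - sp.legendre(3, x)) == 0
# the m-th derivative of a degree-l polynomial vanishes when m > l,
# which is why physically acceptable solutions need |m_l| <= l
assert assoc_legendre_P(2, 3, x) == 0
```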

To conclude, given values for the quantum numbers $l$ and $m_l$, the general solution of the equation for $\Theta$ can be written as

$\Theta(\theta) \propto P_l^{m_l}(\cos \theta)$

To clarify where the principal quantum number comes from, the final equation we need to deal with is the radial equation

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

Writing $\lambda = l(l + 1)$ and replacing $U$ with the formula for the potential energy we get

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}\big(\frac{e^2}{4 \pi \epsilon_0 r} + E\big) - \frac{l(l + 1)}{r^2} \big] R = 0$

$\iff$

$\frac{d^2 R}{d r^2} + \frac{2}{r} \frac{d R}{d r} + \frac{2m_e}{\hbar^2} \big[ E + \frac{e^2}{4 \pi \epsilon_0 r} - \frac{l(l + 1) \hbar^2}{2 m_e r^2} \big] R = 0$

We are only interested in solutions for which the electron is bound within the atom, so we take $E < 0$ (the negative energy of the electron is the amount of energy that must be supplied to it to free it from the atom). In order to solve the above equation it is customary to make the change of variable

$\rho = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} r$

and define the dimensionless constant

$\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2}$

If we then specify $R = R(\rho)$ we have

$\frac{d R}{d r} = \frac{d R}{d \rho} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d R}{d \rho}$

$\frac{2}{r} \frac{d R}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{2}{\rho} \frac{d R}{d \rho}$

$\frac{d^2 R}{d r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d^2 R}{d \rho^2} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{d^2 R}{d \rho^2}$

$\frac{2 m_e}{\hbar^2} E + \frac{2 m_e}{\hbar^2} \frac{e^2}{4 \pi \epsilon_0 r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big \{\frac{1}{4}\frac{e^2}{4 \pi \epsilon_0 r} \big(-\frac{1}{E}\big) - \frac{1}{4}\big \} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big(\frac{\tau}{\rho} - \frac{1}{4}\big)$

$\frac{l(l + 1)}{r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{l(l + 1)}{\rho^2}$

Using these results we can rewrite the differential equation as

$\big(-\frac{8 m_e E}{\hbar^2}\big) \bigg \{\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) \bigg \} = 0$

$\iff$

$\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0$

To make further progress we consider the behaviour of this differential equation as $\rho \rightarrow \infty$. It reduces to

$\frac{d^2 R}{d \rho^2} - \frac{1}{4} R = 0$

which is a straightforward second-order differential equation with auxiliary equation

$\zeta^2 - \frac{1}{4} = 0$

$\implies \zeta = \pm \frac{1}{2}$

The positive solution to the auxiliary equation implies a term in the general solution of the form $e^{\rho/2}$ which is unacceptable since it explodes as $\rho \rightarrow \infty$. Therefore we only accept the negative solution to the auxiliary equation and the general solution for $R$ as $\rho \rightarrow \infty$ must be of the form

$R \propto e^{-\rho/2}$

This suggests we can try an exact solution of the full differential equation of the form

$R = e^{-\rho/2} F(\rho)$

Differentiating this twice we get

$\frac{d R}{d \rho} = -\frac{1}{2} e^{-\rho/2} F(\rho) + e^{-\rho/2} F^{\prime} (\rho)$

$\frac{d^2 R}{d \rho^2} = \frac{1}{4} e^{-\rho/2} F(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) + e^{-\rho/2} F^{\prime \prime} (\rho)$

Substituting these into the differential equation

$\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0$

gives

$F^{\prime \prime}(\rho) + \frac{(2 - \rho)}{\rho} F^{\prime}(\rho) + \big[\frac{(\tau - 1)}{\rho} - \frac{l(l + 1)}{\rho^2}\big] F(\rho) = 0$

$\iff$

$\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0$

We can now try to solve this latest version of the differential equation by the method of Frobenius, which involves assuming a generalised power series solution of the form

$F(\rho) = a_0 \rho^s + a_1 \rho^{s + 1} + a_2 \rho^{s + 2} + \cdots$

Differentiating twice we get

$F^{\prime}(\rho) = s a_0 \rho^{s-1} + (s + 1) a_1 \rho^s + (s + 2) a_2 \rho^{s + 1} + \cdots$

$F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^{s-2} + s (s + 1) a_1 \rho^{s-1} + (s + 1) (s + 2) a_2 \rho^s + \cdots$

Then the terms appearing in the differential equation have the generalised power series forms

$\rho^2 F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^s + s (s + 1) a_1 \rho^{s+1} + (s + 1) (s + 2) a_2 \rho^{s+2} + \cdots$

$2 \rho F^{\prime}(\rho) = 2 s a_0 \rho^s + 2 (s + 1) a_1 \rho^{s+1} + 2 (s + 2) a_2 \rho^{s + 2} + \cdots$

$-\rho^2 F^{\prime}(\rho) = - s a_0 \rho^{s+1} - (s + 1) a_1 \rho^{s+2} - (s + 2) a_2 \rho^{s + 3} - \cdots$

$(\tau - 1) \rho F(\rho) = (\tau - 1) a_0 \rho^{s+1} + (\tau - 1) a_1 \rho^{s + 2} + (\tau - 1) a_2 \rho^{s + 3} + \cdots$

$-l(l + 1) F(\rho) = -l(l + 1) a_0 \rho^s - l(l + 1) a_1 \rho^{s + 1} - l(l + 1) a_2 \rho^{s + 2} - \cdots$

Summing these terms (remembering that the sum must be identically equal to zero) we find the coefficient of $\rho^s$ to be

$[s(s - 1) + 2s - l(l+1)]a_0 = 0$

$\implies$

$s(s + 1) - l(l+1) = 0$

$\implies$

$s = l$ or $s = -l - 1$

Now, when $s = -l - 1$ the first term of the power series for $F(\rho)$ is $a_0/\rho^{l+1}$ which explodes as $\rho \rightarrow 0$. This is unacceptable so we discard this solution and set $s = l$.

For the coefficient of $\rho^{s + n}$ we get

$[(s+n)(s+n-1) + 2(s+n) - l(l+1)]a_n + [(\tau - 1) - (s+n-1)]a_{n-1} = 0$

Setting $s = l$ and rearranging gives us the recurrence equation

$a_n = \frac{(l + n - \tau)}{(l+n+1)(l+n) - l(l+1)} a_{n-1}$

From this recurrence equation we observe that

$a_n \approx \frac{1}{n}a_{n-1} \approx \frac{1}{n!}a_0$

as $n \rightarrow \infty$. We deduce from this that the series for $F(\rho)$ behaves like $a_0 \rho^l \sum_n \frac{\rho^n}{n!} = a_0 \rho^l e^{\rho}$ for large $\rho$, and therefore $R = e^{-\rho/2} F(\rho)$ behaves like $a_0 \rho^l e^{\rho/2}$. However, this diverges as $\rho \rightarrow \infty$ which is unacceptable, so we conclude that the series for $F(\rho)$ must terminate at some value of $n$ which we will call $N$. In this case we have $a_{N+1} = 0$ which the recurrence equation tells us can only happen if

$\tau = l + N + 1 \equiv \tilde{n}$

This is how the principal quantum number $\tilde{n}$ first appears. Now, we have

$\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2} = \tilde{n}$

$\iff$

$\big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \big(-\frac{m_e}{2 \hbar^2}\big) \frac{1}{E} = \tilde{n}^2$

$\iff E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}$

These are the famous bound-state energy eigenvalues for $\tilde{n} = 1, 2, \ldots$. This is the same formula for the energy levels of the hydrogen atom that Niels Bohr obtained by intuitive means in his 1913 ‘solar system’ model of atomic structure.
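Putting numbers into the eigenvalue formula reproduces the familiar Bohr values. A sketch (Python; the physical constants are CODATA values hard-coded here by assumption rather than taken from a library):

```python
import math

# CODATA values (assumption: hard-coded for a self-contained check)
m_e  = 9.1093837015e-31   # electron mass, kg
e    = 1.602176634e-19    # elementary charge, C
hbar = 1.054571817e-34    # reduced Planck constant, J s
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m

def energy(n):
    # E_n = -(m_e / 2 hbar^2) (e^2 / 4 pi eps0)^2 / n^2
    return -(m_e/(2*hbar**2)) * (e**2/(4*math.pi*eps0))**2 / n**2

# ground state is about -13.6 eV, and E_n scales as 1/n^2
assert abs(energy(1)/e + 13.6057) < 1e-3
assert abs(energy(2)/energy(1) - 0.25) < 1e-12
```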

As stated above, the integer $\tilde{n}$ is called the principal quantum number. Recall that $\tilde{n} = l + N + 1$ and $N$ cannot be smaller than zero. It follows that

$\tilde{n} - l - 1 \geq 0$

$\iff$

$l \leq \tilde{n} - 1$

This explains why for a given value of $\tilde{n}$ the allowable values of $l$ are $l = 0, 1, 2, \dots, (\tilde{n}-1)$.
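The termination argument can be checked by iterating the radial recurrence with exact arithmetic. The sketch below (Python; the values $l = 1$, $N = 2$ are an arbitrary illustration) confirms that choosing $\tau = l + N + 1$ makes every coefficient beyond $a_N$ vanish:

```python
from fractions import Fraction

def radial_coeffs(tau, l, n_terms=10):
    # a_n = (l + n - tau) / ((l + n + 1)(l + n) - l(l + 1)) * a_{n-1}, a_0 = 1
    a = [Fraction(1)]
    for n in range(1, n_terms + 1):
        a.append(Fraction(l + n - tau, (l + n + 1)*(l + n) - l*(l + 1)) * a[-1])
    return a

# with tau = l + N + 1 the series terminates after the rho^N term
l, N = 1, 2
a = radial_coeffs(tau=l + N + 1, l=l)
assert a[N] != 0 and all(c == 0 for c in a[N + 1:])
```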

Returning to the solution of

$\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0$

the above discussion suggests that we should look for a solution of the form

$F(\rho) = a_0 \rho^l L(\rho)$

where $L(\rho)$ is a polynomial (rather than an infinite series). Differentiating this twice gives

$F^{\prime}(\rho) = a_0 l \rho^{l-1}L(\rho) + a_0 \rho^l L^{\prime}(\rho)$

$F^{\prime \prime} (\rho) = a_0 (l-1) l \rho^{l-2}L(\rho) + 2 a_0 l \rho^{l-1} L^{\prime}(\rho) + a_0 \rho^l L^{\prime \prime}(\rho)$

Substituting these into the differential equation and setting $\tau = \tilde{n}$ we get

$\rho^{l+2} L^{\prime \prime}(\rho) + (2l + 2 - \rho) \rho^{l+1}L^{\prime}(\rho) + (\tilde{n} - 1 - l)\rho^{l+1}L(\rho) = 0$

$\iff$

$\rho L^{\prime \prime}(\rho) + (2l + 2 - \rho) L^{\prime}(\rho) + (\tilde{n} - 1 - l) L(\rho) = 0$

$\iff$

$\rho L^{\prime \prime}(\rho) + (\alpha + 1 - \rho) L^{\prime}(\rho) + \tilde{n}^{*} L(\rho) = 0$

where $\alpha \equiv 2l + 1$ and $\tilde{n}^{*} \equiv \tilde{n} - 1 - l$. This last form is a well-known differential equation whose physically acceptable solutions in the present context are the associated Laguerre polynomials given by the formula

$L_{\tilde{n}^{*}}^{(\alpha)}(\rho) = \sum_{j = 0}^{\tilde{n}^{*}} (-1)^j \frac{(\tilde{n}^{*} + \alpha)!}{(\tilde{n}^{*}-j)!(\alpha + j)!}\frac{\rho^j}{j!}$
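The finite sum can be implemented directly. As a sanity check (a sketch, comparing against the standard low-order closed forms $L_0^{(\alpha)} = 1$ and $L_1^{(\alpha)}(\rho) = \alpha + 1 - \rho$, which are not derived in the text):

```python
from math import factorial

def laguerre(n_star, alpha, rho):
    """Associated Laguerre polynomial L_{n*}^{(alpha)}(rho) via the explicit sum
    sum_j (-1)^j (n* + alpha)! / [(n* - j)! (alpha + j)!] * rho^j / j!"""
    return sum((-1)**j * factorial(n_star + alpha)
               / (factorial(n_star - j) * factorial(alpha + j))
               * rho**j / factorial(j)
               for j in range(n_star + 1))

# Compare against the known closed forms for n* = 0 and n* = 1
print(laguerre(0, 3, 0.7))               # L_0 is identically 1
print(laguerre(1, 3, 0.7), 3 + 1 - 0.7)  # both equal alpha + 1 - rho
```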

For given quantum numbers $\tilde{n}$ and $l$, the solution of the radial equation for $R$ is then

$R_{\tilde{n}l}(r) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho)$

## Final form of the electron wave function $\psi$

Putting everything together, for given principal quantum number $\tilde{n}$, orbital quantum number $l$ and magnetic quantum number $m_l$, the wave function of the electron in the hydrogen atom is

$\psi_{\tilde{n} l m_l}(r, \theta, \phi) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho) P_l^{m_l}(\cos \theta) e^{i m_l \phi}$

where

$\rho = \big(-\frac{8 m_e E_{\tilde{n}}}{\hbar^2}\big)^{1/2} r$

and

$E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}$
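As a final consistency check (a numerical sketch; the Bohr radius $a_0 = 4\pi\epsilon_0\hbar^2/(m_e e^2)$ is a standard quantity but does not appear in the derivation above), substituting $E_{\tilde{n}}$ into the definition of $\rho$ gives the scale factor $2/(\tilde{n}a_0)$, so that for $\tilde{n} = 1$ the ground-state radial factor $e^{-\rho/2}$ is just $e^{-r/a_0}$:

```python
import math

# Physical constants in SI units (CODATA values)
m_e = 9.1093837015e-31   # electron mass, kg
hbar = 1.054571817e-34   # reduced Planck constant, J s
e = 1.602176634e-19      # elementary charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity, F/m

a0 = 4 * math.pi * eps0 * hbar**2 / (m_e * e**2)  # Bohr radius, ~5.29e-11 m

def rho_coefficient(n):
    """The factor multiplying r in rho, i.e. sqrt(-8 m_e E_n / hbar^2)."""
    E_n = -(m_e / (2 * hbar**2)) * (e**2 / (4 * math.pi * eps0))**2 / n**2
    return math.sqrt(-8 * m_e * E_n / hbar**2)

for n in (1, 2, 3):
    # The ratio below should be ~1.0 for every n, confirming rho = 2r/(n a0)
    print(n, rho_coefficient(n) * n * a0 / 2)
```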