A note on reverse engineering the Navier-Stokes equations

The Navier-Stokes equations are the fundamental equations of fluid mechanics, analogous to, say, Maxwell’s equations in the case of electromagnetism. They are important for applications in science and engineering but they are difficult to solve analytically so real-life applications rely almost exclusively on computer-aided methods. In fact, the equations are still not yet fully understood mathematically. Some basic questions about the existence and nature of possible solutions remain unanswered. Because of their importance, the Clay Mathematics Institute has offered a prize of one million dollars to anyone who can clarify some specific fundamental questions about them (see the CMI web page for details).

The equations are named after Claude-Louis Navier (1785-1836) and George Gabriel Stokes (1819-1903) who worked on their development independently. Some other leading contemporary mathematicians were also involved and I was intrigued to learn that all of these mathematicians used Euler’s equations of motion as a starting point in their derivations of the Navier-Stokes equations. Euler’s equations had been derived much earlier in 1757 by the great mathematician Leonhard Euler (1707-1783). In tensor notation, using the Einstein summation convention, the Euler equations take the form

$\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

I explored a derivation of these equations from Newton’s second law in a previous post. To better understand how the two sets of equations are related, it occurred to me that it might be an interesting exercise, and probably not too difficult, to try to reverse engineer’ the Navier-Stokes equations using Euler’s equations as a starting point. I want to briefly record my exploration of this idea in the present note.

I am interested in the incompressible fluid version of the Navier-Stokes equations which can be written in tensor form, again using the Einstein summation convention, as

$\rho g^i - \frac{\partial p}{\partial x^i} + \mu \frac{\partial}{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big) = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

where $\mu > 0$ is a viscosity coefficient specific to the fluid in question. Putting the Euler equations and the Navier-Stokes equations side by side like this makes it clear that the key difference between them is the addition of the viscosity-related terms on the left-hand side of the Navier-Stokes equations. To clarify things a bit more it is helpful to see the equations laid out fully. The Euler equations constitute a three-equation system which looks like this:

$\rho g^1 - \frac{\partial p}{\partial x^1} = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)$

$\rho g^2 - \frac{\partial p}{\partial x^2} = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)$

$\rho g^3 - \frac{\partial p}{\partial x^3} = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)$

The Navier-Stokes equations also constitute a three-equation system which looks like this:

$\rho g^1 - \frac{\partial p}{\partial x^1} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)$

$\rho g^2 - \frac{\partial p}{\partial x^2} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)$

$\rho g^3 - \frac{\partial p}{\partial x^3} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)$

Looking at the pattern of the indices appearing in the viscosity-related terms on the left-hand side of the Navier-Stokes equations, one is immediately reminded of the indices identifying the positions of the elements of a matrix and we can in fact arrange the viscosity-related terms in matrix form as

$\begin{bmatrix} \frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\end{bmatrix}$

From the point of view of reverse engineering the Navier-Stokes equations this is highly suggestive because in deriving the Euler equations we encountered one key matrix, the rank-2 stress tensor $\tau^{ij}$. Could the above matrix be related to the $3 \times 3$ stress tensor? The answer is yes, and in fact we have that

$\begin{bmatrix} \frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\end{bmatrix} = \begin{bmatrix} \frac{\partial \tau^{11}}{\partial x^1} & \frac{\partial \tau^{12}}{\partial x^2} & \frac{\partial \tau^{13}}{\partial x^3}\\ \ \\ \frac{\partial \tau^{21}}{\partial x^1} & \frac{\partial \tau^{22}}{\partial x^2} & \frac{\partial \tau^{23}}{\partial x^3}\\ \ \\ \frac{\partial \tau^{31}}{\partial x^1} & \frac{\partial \tau^{32}}{\partial x^2} & \frac{\partial \tau^{33}}{\partial x^3} \end{bmatrix}$

In deriving the Euler equations what we did was to assume that the fluid was friction-free so that there were no shear forces acting on any of the faces of a given fluid volume element. This means that the fluid was being modelled as being inviscid, i.e., having no viscosity. As a result, we kept only the diagonal elements of the stress tensor and re-interpreted these as pressures. Approximating the net pressure in each direction on a fluid volume element using a Taylor series expansion we were then led to include only the first-order partials of the diagonal terms of the stress tensor (interpreted as the first-order partials of pressure) in the Euler equations.

In the case of the Navier-Stokes equations we are no longer assuming that the fluid is inviscid so we are including all the first-order partials of the stress tensor in the equations. Thus, the Navier-Stokes equations could be written as

$\rho g^1 - \frac{\partial p}{\partial x^1} + \mu \big(\frac{\partial \tau^{11}}{\partial x^1} + \frac{\partial \tau^{12}}{\partial x^2} + \frac{\partial \tau^{13}}{\partial x^3}\big) = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)$

$\rho g^2 - \frac{\partial p}{\partial x^2} + \mu \big(\frac{\partial \tau^{21}}{\partial x^1} + \frac{\partial \tau^{22}}{\partial x^2} + \frac{\partial \tau^{23}}{\partial x^3}\big) = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)$

$\rho g^3 - \frac{\partial p}{\partial x^3} + \mu \big(\frac{\partial \tau^{31}}{\partial x^1} + \frac{\partial \tau^{32}}{\partial x^2} + \frac{\partial \tau^{33}}{\partial x^3}\big) = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)$

Detailed consideration of the forces acting on an infinitesimal volume element in the case of a viscous fluid lead to expressions for the stress tensor components as functions of the first-order partials of the components of the velocity vector of the form

$\tau^{ij} = \tau^{ji} = \mu \big(\frac{\partial v^i}{\partial x^j} + \frac{\partial v^j}{\partial x^i}\big)$

The first-order partials of these are then of the form

$\frac{\partial \tau^{ij}}{\partial x^j} = \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big) + \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^j}{\partial x^i}\big)$

But note that with incompressible flow the mass density of the fluid is a constant so the divergence of the velocity vanishes (this is implied directly by the continuity equation with constant mass density). Therefore the second term on the right-hand side is zero since

$\frac{\partial }{\partial x^j}\big(\frac{\partial v^j}{\partial x^i}\big) = \frac{\partial }{\partial x^i}\big(\frac{\partial v^j}{\partial x^j}\big)$

$= \frac{\partial }{\partial x^i}\big(\frac{\partial v^1}{\partial x^1}\big) + \frac{\partial }{\partial x^i}\big(\frac{\partial v^2}{\partial x^2}\big) + \frac{\partial }{\partial x^i}\big(\frac{\partial v^3}{\partial x^3}\big) = \frac{\partial }{\partial x^i} \text{div}(\vec{v}) = 0$

Thus we are left with

$\frac{\partial \tau^{ij}}{\partial x^j} = \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big)$

which are the extra terms appearing on the left-hand side of the Navier-Stokes equations.

A fluid-mechanical visualisation of the quantum-mechanical continuity equation

The concept of a probability current is useful in quantum mechanics for analysing quantum scattering and tunnelling phenomena, among other things. However, I have noticed that the same rather abstract and non-visual approach to introducing probability currents is repeated almost verbatim in every textbook (also see, e.g., this Wikipedia article). The standard approach essentially involves defining a probability current from the outset as

$\vec{j} = \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

and then using Schrödinger’s equation to show that this satisfies a fluid-like continuity equation of the form

$\frac{\partial \rho}{\partial t} + \nabla \cdot \vec{j} = 0$

with

$\rho \equiv \Psi^{*} \Psi$

In the present note I want to briefly explore a more intuitive and visual approach involving a model of the actual flow of a probability fluid’. I want to begin with a fluid-mechanical model and then obtain the standard expression for the quantum-mechanical continuity equation from this, rather than starting out with an abstract definition of the probability current and then showing that this satisfies a continuity equation. The essential problem one faces when trying to do this is that although in classical mechanics the position $\vec{r}(t)$ of a point particle and its velocity $\vec{v}(t) = d\vec{r}(t)/dt$ are well defined, this is not the case in conventional quantum mechanics. Quantum mechanics is done probabilistically, treating a particle as a wave packet such that the square of the amplitude of the corresponding wave function acts as a probability density which can be used to measure the probability that the particle will occupy a particular region of space at a particular time. It is not possible to say definitively where a particular particle will be at a particular time in quantum mechanics, which makes it difficult to apply the conventional deterministic equations of fluid mechanics.

A huge specialist literature on quantum hydrodynamics has in fact arisen which tries to circumvent this problem in a number of ways. A standard reference is Wyatt, R. E., 2005, Quantum Dynamics with Trajectories: Introduction to Quantum Hydrodynamics (Springer). The route that a large part of this literature has taken is intriguing because it is based on Bohmian mechanics, an approach to quantum mechanics developed by David Bohm in 1952 which is regarded by most mainstream physicists today as unconventional. The key feature of the Bohmian mechanics approach is that classical-like particle trajectories are possible. Using this approach one can obtain Newtonian-like equations of motion analogous to those in conventional fluid mechanics and this is how this particular literature seems to have chosen to treat quantum particle trajectories in a fluid-like way. Attempts have also been made to introduce mathematically equivalent approaches, but defined within conventional quantum mechanics (see, e.g., Brandt, S. et al, 1998, Quantile motion and tunneling, Physics Letters A, Volume 249, Issue 4, pp. 265-270).

In the present note I am not looking to solve any elaborate problems so I will simply consider a free quantum wave packet which is not acted upon by any forces and try to visualise probability currents and the quantum continuity equation in a fluid-like way by using the momentum vector operator $\hat{\vec{p}}$ to characterise the velocity of the particle. I will then show that the probability current obtained in this fluid-mechanical model is the same as the one defined abstractly in textbooks.

In quantum mechanics, calculations are done using operators to represent observables. Every possible observable that one might be interested in for the purposes of experiment has a corresponding operator which the mathematics of quantum mechanics can work on to produce predictions. The key operator for the purposes of the present note is the momentum vector operator

$\hat{\vec{p}} = -i \hbar \nabla$

which is the quantum mechanical analogue of the classical momentum vector

$\vec{p} = m \vec{v}$

The key idea for the present note is to regard the velocity vector of the quantum particle as being represented by the operator

$\frac{\hat{\vec{p}}}{m}$

by analogy with the classical velocity vector which can be obtained as

$\vec{v} = \frac{\vec{p}}{m}$

We will imagine the total probability mass

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \Psi \text{d}V = 1$

as a fluid in steady flow throughout the whole of space and obeying mass conservation. The fact that the flow is steady reflects the fact that there are no forces acting on the quantum particle in this model, so we must have

$\frac{\partial }{\partial t}\big[\frac{\hat{\vec{p}}}{m}\big] = 0$

The velocity can vary from point to point in the probability fluid but at any given point it cannot be varying over time.

In a classical fluid we have a mass density per unit volume $\rho$ and we regard the velocity vector $\vec{v}$ as a volumetric flow rate per unit area, i.e., the volume of fluid that would pass through a unit area per unit time. Then $\rho \vec{v}$ is the mass flow rate per unit area, i.e., the mass of fluid that would pass through a unit area per unit time. In quantum mechanics we can regard the probability mass density per unit volume $\rho \equiv \Psi^{*} \Psi$ as analogous to the mass density of a classical fluid. We can interpret $\frac{\hat{\vec{p}}}{m}$ as the volumetric flow rate per unit area, i.e., the volume of probability fluid that would pass through a unit area per unit time. When doing probability calculations with quantum mechanical operators we usually sandwich’ the operator between $\Psi^{*}$ and $\Psi$, so following that approach here we can define the probability current density as

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$

This is to be interpreted as the probability mass flow rate per unit area, i.e., the amount of probability mass that would pass through a unit area per unit time, analogous to $\vec{j} = \rho \vec{v}$ in the classical case. To see how close the analogy is, suppose the quantum wave function is that of a plane wave

$\Psi(x, y, z, t) = A\mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

Then

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi = A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} \nabla A \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} A \ i \ \vec{k} \ \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A^2 \frac{\hbar \vec{k}}{m}$

$= \rho \vec{v}$

which looks just like the mass flow rate in the classical case with $\rho = \Psi^{*} \Psi$ and $\vec{v} \equiv \frac{\hbar \vec{k}}{m}$. Note that in this example the probability current density formula we are using, namely $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$, turned out to be real-valued. Unfortunately this will not always be the case. Since the probability current vector must always be real-valued, the fluid-mechanical model in the present note will only be applicable in cases when this is true for the formula $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$.

As in classical fluid mechanics, a continuity equation can now be derived by considering the net outflow of probability mass from an infinitesimal fluid element of volume $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$.

Considering only the $y$-component for the moment, we see from the diagram that on the left-hand side we have the probability mass flow rate coming into the volume element through the left-hand face. The mass flow rate coming out of the fluid element through the right-hand face can be approximated using a Taylor series expansion as being equal to the mass flow rate through the left-hand face plus a differential adjustment based on the gradient of the probability current density and the length $\mathrm{d} y$. The net probability mass flow rate in the $y$-direction is then obtained by subtracting the left-hand term from the right-hand term to get

$\frac{\partial }{\partial y} \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi\big) \mathrm{d}V$

Using similar arguments for the $x$ and $z$-directions, the net mass flow rate out of the fluid element in all three directions is then

$\nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) \mathrm{d} V$

Now, the probability mass inside the fluid element is $\rho \mathrm{d} V$ where $\rho = \Psi^{*} \Psi$ and if there is a net outflow of probability fluid this mass will be decreasing at the rate

$- \frac{\partial \rho}{\partial t} \mathrm{d} V$

Equating the two expressions and dividing through by the volume of the fluid element we get the equation of continuity

$\frac{\partial \rho}{\partial t} + \nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) = 0$

What I want to do now is show that if we work out $\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$ we will get the same formula for the probability current as the one usually given in quantum mechanics textbooks. We have

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{m} \Psi^{*} \nabla \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi$

We now note that since the probability current density must be a real vector, the last two terms above must be real. Therefore they are not affected in any way if we take their complex conjugate. Taking the complex conjugate of the second term in the last equality we get

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi + \frac{i \hbar}{2m} \Psi \nabla \Psi^{*}$

$= \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

$= \vec{j}$

This is exactly the expression for the probability current density that appears in textbooks, but rather than introducing it out of nowhere’ at the beginning, we have obtained it naturally as a result of a fluid-mechanical model.

Calculation of a quantum-mechanical commutator in three dimensions

I needed to work out the commutator $[\hat{H}, \hat{\vec{r}} \ ]$, where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator and $\hat{\vec{r}}$ is the 3D position vector operator. It is difficult to find any textbook or online source that explicitly goes through the calculation of this three-dimensional case (in fact, I have not been able to find any) so I am recording my calculation step-by-step in this note.

The commutator $[\hat{H}, \hat{\vec{r}} \ ]$ is a vector operator with components

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H}$

$[\hat{H}, \hat{y} \ ] = \hat{H} \ \hat{y} - \hat{y} \ \hat{H}$

and

$[\hat{H}, \hat{z} \ ] = \hat{H} \ \hat{z} - \hat{z} \ \hat{H}$

To evaluate these, note that the momentum operator (in position space) is

$\hat{\vec{p}} = -i \ \hbar \nabla$

and so we have

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

$= \frac{1}{2m} \hat{\vec{p}} \cdot \hat{\vec{p}} + \hat{U}$

$= \frac{1}{2m}(\hat{p}_x^{2} + \hat{p}_y^{2} + \hat{p}_z^{2}) + \hat{U}$

Looking at the $x$-component of $[\hat{H}, \hat{\vec{r}} \ ]$ we therefore have

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{\hat{p}_x^{2} \hat{x} + \hat{p}_y^{2} \hat{x} + \hat{p}_z^{2} \hat{x}}{2m} + \hat{U} \ \hat{x} - \big(\frac{\hat{x} \hat{p}_x^{2} + \hat{x} \hat{p}_y^{2} + \hat{x} \hat{p}_z^{2}}{2m} + \hat{x} \ \hat{U}\big)$

$= \frac{1}{2m}([\hat{p}_x^2, \hat{x} \ ] + [\hat{p}_y^2, \hat{x} \ ] + [\hat{p}_z^2, \hat{x} \ ]) + [\hat{U}, \hat{x} \ ]$

Since multiplication is commutative we have $[\hat{U}, \hat{x} \ ] = 0$. I will now show that we also have

$[\hat{p}_y^2, \hat{x} \ ] = [\hat{p}_z^2, \hat{x} \ ] = 0$

To see this, let us first work out in detail the effect of $[\hat{p}_y, \hat{x} \ ]$ on a wavefunction $\Psi$. We have

$[\hat{p}_y, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial y} + \hat{x }i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial y} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} = 0$

where the last equality follows from the fact that $\hat{x}$ does not depend on $y$. Thus, $[\hat{p}_y, \hat{x} \ ] = 0$.

We can now easily show that $[\hat{p}_y^2, \hat{x} \ ] = 0$ because using the basic result for commutators that

$[AB, C] = A[B, C] + [A, C]B$

(easy to prove by writing out the terms in full) we find that

$[\hat{p}_y^2, \hat{x} \ ] = \hat{p}_y \ [\hat{p}_y, \hat{x} \ ] + [\hat{p}_y, \hat{x} \ ] \ \hat{p}_y = 0$

Identical arguments show that $[\hat{p}_z^2, \hat{x} \ ] = 0$. Thus, we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ]$

It now only remains to work out $[\hat{p}_x^2, \hat{x} \ ]$ and we can do this by first working out in detail the effect of $[\hat{p}_x, \hat{x} \ ]$ on a wavefunction $\Psi$ (this is of course the canonical commutation relation’ of quantum mechanics). We have

$[\hat{p}_x, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial x} + \hat{x } i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial x} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} = - i \ \hbar \Psi$

where the last equality follows from the fact that $\hat{x}$ is the same as multiplying by $x$ so its derivative with respect to $x$ equals $1$. Thus, $[\hat{p}_x, \hat{x} \ ] = - i \ \hbar$. Then we find that

$[\hat{p}_x^2, \hat{x} \ ] = \hat{p}_x \ [\hat{p}_x, \hat{x} \ ] + [\hat{p}_x, \hat{x} \ ] \ \hat{p}_x = -2 i \ \hbar \hat{p}_x$

and we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ] = - \frac{i \ \hbar}{m} \hat{p}_x$

Identical arguments show that

$\hat{H} \ \hat{y} - \hat{y} \ \hat{H} = \frac{1}{2m} [\hat{p}_y^2, \hat{y} \ ] = - \frac{i \ \hbar}{m} \hat{p}_y$

and

$\hat{H} \ \hat{z} - \hat{z} \ \hat{H} = \frac{1}{2m} [\hat{p}_z^2, \hat{z} \ ] = - \frac{i \ \hbar}{m} \hat{p}_z$

Thus, we finally reach our desired expression for the Hamiltonian-position commutator in three dimensions:

$[\hat{H}, \hat{\vec{r}} \ ] = - \frac{i \ \hbar}{m} \hat{\vec{p}}$

As an application of this result we will consider the problem of working out the expectation of position and velocity for a quantum particle. In three-dimensional space the quantum wave function is

$\Psi = \Psi(x, y, z, t)$

and we obtain the probability density function as

$\rho(x, y, z, t) = \Psi^{*} \Psi$

The wavefunction $\Psi$ satisfies the time-dependent Schrödinger equation

$i \hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\big(\frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} + \frac{\partial^2 \Psi}{\partial z^2}\big) + U\Psi$

where $U = U(x, y, z, t)$ is some potential energy function. We can write the Schrödinger equation in operator form using Dirac notation as

$\hat{H} |\Psi \rangle = i \hbar \frac{\partial}{\partial t} |\Psi \rangle$

where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator (the Hamiltonian form of total energy) and

$i \hbar \frac{\partial }{\partial t}$

is the total energy operator. Note that the complex conjugate of the wavefunction $\Psi^{*}$ satisfies the Schrödinger equation written in Dirac notation as

$\langle \Psi | \hat{H} = - \langle \Psi | i \hbar \frac{\partial}{\partial t}$

In quantum mechanics we find the expected position $\langle \vec{r}(t) \rangle$ of the particle by integrating the position operator $\hat{\vec{r}}$ over all space, sandwiched between $\Psi^{*}$ and $\Psi$. Thus, letting $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$ we have

$\langle \vec{r}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \hat{\vec{r}} \ \Psi \text{d}V \equiv \langle \Psi \ | \hat{\vec{r}} \ | \Psi \rangle$

where in the last term I have switched to using Dirac notation which will be useful shortly. The expected velocity can then be obtained by differentiating this integral with respect to $t$. We get

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \frac{\partial \hat{\vec{r}}}{\partial t} \ \Psi \text{d}V$

The second triple integral on the right-hand side is zero because the position operator does not depend on time. The integrand in the first triple integral can be manipulated by using the operator form of the Schrödinger equation and Dirac notation to write

$\frac{\partial \Psi}{\partial t} = \frac{\partial}{\partial t} | \Psi \rangle = \frac{1}{i \ \hbar} \ \hat{H} \ | \Psi \rangle$

and

$\frac{\partial \Psi^{*}}{\partial t} = \langle \Psi | \frac{\partial}{\partial t} = -\frac{1}{i \ \hbar} \ \langle \Psi | \hat{H}$

Thus, we have

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V$

$= - \frac{1}{i \ \hbar} \langle \Psi |\hat{H} \ \hat{\vec{r}} \ | \Psi \rangle + \frac{1}{i \ \hbar} \langle \Psi | \hat{\vec{r}} \ \hat{H} | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \hat{H} \ \hat{\vec{r}} \ - \hat{\vec{r}} \ \hat{H} \ | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \ [\hat{H}, \hat{\vec{r}} \ ] \ | \Psi \rangle$

$= \frac{1}{m} \langle \Psi | \ \hat{\vec{p}} \ | \Psi \rangle$

$= \big \langle \frac{\hat{\vec{p}}}{m} \big \rangle$

where the last two equalities follow from the fact that $[\hat{H}, \hat{\vec{r}} \ ]$ is the commutator of $\hat{H}$ and $\hat{\vec{r}}$, which is equal to $-\frac{i \hbar}{m} \hat{\vec{p}}$ as we saw above. Therefore the expected velocity of a quantum particle looks a lot like the velocity of a classical particle (momentum divided by mass). The idea that quantum mechanical expectations exhibit Newtonian-like behaviour is the essence of the Ehrenfest Theorem of quantum mechanics.

Derivation of Euler’s equations of motion for a perfect fluid from Newton’s second law

Having read a number of highly technical derivations of Euler’s equations of motion for a perfect fluid I feel that the mathematical meanderings tend to obscure the underlying physics. In this note I want to explore the derivation from a more physically intuitive point of view. The dynamics of a fluid element of mass $m$ are governed by Newton’s second law which says that the vector sum of the forces acting on the fluid element is equal to the mass of the element times its acceleration. Thus,

$\vec{F} = m \vec{a}$

The net force $\vec{F}$ can be decomposed into two distinct types of forces, so-called body forces $\vec{W}$ that act on the entire fluid element (e.g., the fluid element’s weight due to gravity) and stresses $\vec{S}$ such as pressures and shears that act upon the surfaces enclosing the fluid element. For the purposes of deriving the differential form of Euler’s equation we will focus on the net force per unit volume acting on the fluid element, $\vec{f}$, which we will decompose into a weight per unit volume $\vec{w}$ and a net stress force per unit volume $\vec{s}$. The weight per unit volume is simply obtained as

$\vec{w} = \rho \vec{g}$

where

$\rho = \frac{m}{V}$

is the mass density of the fluid (i.e., mass per unit volume) and $\vec{g}$ is the acceleration due to gravity. In index notation, the equation for the $i$-th component of the weight per unit volume is

$w^i = \rho g^i$

The net stress force per unit volume, $\vec{s}$, is a little more complicated to derive since it involves the rank-2 stress tensor $\tau^{ij}$. This tensor contains nine components and is usually represented as a $3 \times 3$ symmetric matrix. In Cartesian coordinates the components along the main diagonal, namely $\tau^{xx}$, $\tau^{yy}$ and $\tau^{zz}$, represent normal stresses, i.e., forces per unit area acting orthogonally to the planes whose normal vector is identified by the first superscript, as indicated in the diagram below. (Note that a stress is a force per unit area, so to convert a stress tensor component $\tau^{ij}$ into a force it would be necessary to multiply it by the area over which it acts).

In the diagram each normal stress is shown as a tension, i.e., a normal stress pointing away from the surface. When a normal stress points towards the surface it acts upon, it is called a pressure.

The off-diagonal components of the stress tensor represent shear stresses, i.e., forces per unit area that point along the sides of the fluid element, parallel to these sides rather than normal to them. These shear stresses are shown in the following diagram.

Shear stresses only arise when there is some kind of friction in the fluid. A perfect fluid is friction-free so there are no shear stresses. Euler’s equation only applies to perfect fluids so for the derivation of the equation we can ignore the off-diagonal components of the stress tensor.

The normal stresses along the main diagonal are usually written as

$\tau^{xx} = - p^x$

$\tau^{yy} = - p^y$

$\tau^{zz} = - p^z$

where $p$ stands for pressure and the negative sign reflects the fact that a pressure points in the opposite direction to a tension.

In a perfect fluid the pressure is isotropic, i.e., the same in all directions, so we have

$p^x = p^y = p^z = p$

Therefore the stress tensor of a perfect fluid with isotropic pressure reduces to

$\tau^{ij} = -p \delta^{ij}$

where $\delta^{ij}$ is the Kronecker delta (and may be thought of here as the metric tensor of Cartesian 3-space).

Now suppose we consider the net stress (force per unit area) in the y-direction of an infinitesimal volume element.

The stress on the right-hand face can be approximated using a Taylor series expansion as being equal to the stress on the left plus a differential adjustment based on its gradient and the length $dy$. If we take the stress on the right to be pointing in the positive direction and the one on the left as pointing in the negative (opposite) direction, the net stress in the y-direction is given by

$\tau^{yy} + \frac{\partial \tau^{yy}}{\partial y} dy - \tau^{yy} = \frac{\partial \tau^{yy}}{\partial y} dy$

Similarly, the net stresses in the $x$ and $z$-directions are

$\frac{\partial \tau^{xx}}{\partial x} dx$

and

$\frac{\partial \tau^{zz}}{\partial z} dz$

To convert these net stresses to net forces we multiply each one by the area on which it acts. Thus, the net forces on the fluid element (in vector form) are

$\big(\frac{\partial \tau^{xx}}{\partial x} dxdydz\big) \vec{i}$

$\big(\frac{\partial \tau^{yy}}{\partial y} dxdydz\big) \vec{j}$

$\big(\frac{\partial \tau^{zz}}{\partial z} dxdydz\big) \vec{k}$

The total net force on the fluid element is then

$\big(\frac{\partial \tau^{xx}}{\partial x} \ \vec{i} + \frac{\partial \tau^{yy}}{\partial y} \ \vec{j} + \frac{\partial \tau^{zz}}{\partial z} \ \vec{k}\big) dxdydz$

Switching from tensions to pressures using $\tau^{ij} = -p \delta^{ij}$ and dividing through by the volume $dxdydz$ we finally get the net stress force per unit volume to be

$\vec{s} = -\big(\frac{\partial p}{\partial x} \ \vec{i} + \frac{\partial p}{\partial y} \ \vec{j} + \frac{\partial p}{\partial z} \ \vec{k}\big)$

In index notation, the equation for the $i$-th component of this net pressure per unit volume is written as

$s^i = -\frac{\partial p}{\partial x^i}$

We have now completed the analysis of the net force on the left-hand side of Newton’s second law.

On the right-hand side of Newton’s second law we have mass times acceleration, where acceleration is the change in velocity with time. To obtain an expression for this we observe that the velocity of a fluid element may change for two different reasons. First, the velocity field may vary over time at each point in space. Second, the velocity may vary from point to point in space (at any given time). Thus, we consider the velocity field to be a function of the time coordinate as well as the three spatial coordinates, so

$\vec{v} = \vec{v}(t, x, y, z) = v^x(t, x, y, z) \ \vec{i} + v^y(t, x, y, z) \ \vec{j} + v^z(t, x, y, z) \ \vec{k}$

Considering the $i$-th component of this velocity field, the total differential is

$dv^i = \frac{\partial v^i}{\partial t} \ dt + \frac{\partial v^i}{\partial x} \ dx + \frac{\partial v^i}{\partial y} \ dy + \frac{\partial v^i}{\partial z} \ dz$

so the total derivative with respect to time is

$\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} \ dt + v^x \frac{\partial v^i}{\partial x} + v^y \frac{\partial v^i}{\partial y} + v^z \frac{\partial v^i}{\partial z}$

where I have used

$v^x = \frac{dx}{dt}$

$v^y = \frac{dy}{dt}$

$v^z = \frac{dz}{dt}$

We can write this more compactly using the Einstein summation convention as

$\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}$

This is then the $i$-th component of the acceleration vector on the right-hand side of Newton’s second law. In component form, therefore, we can write mass times acceleration per unit volume for the fluid element as

$\rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

This completes the analysis of the mass times acceleration term on the right-hand side of Newton’s second law.

In per-unit-volume form, Newton’s second law for a fluid element is

$\vec{w} + \vec{s} = \rho \vec{a}$

and writing this in the component forms derived above we get the standard form of Euler’s equations of motion for a perfect fluid:

$\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

Alternative approaches to formulating geodesic equations on Riemannian manifolds and proof of their equivalence

A geodesic can be defined as an extremal path between two points on a manifold in the sense that it minimises or maximises some criterion of interest (e.g., minimises distance travelled, maximises proper time, etc). Such a path will satisfy some geodesic equations equivalent to the Euler-Lagrange equations of the calculus of variations. A geodesic can also be defined in a conceptually different way as the straightest’ possible path between two points on a manifold. In this case the path will satisfy geodesic equations derived by requiring parallel transport of a tangent vector along the path. Although these are conceptually different ways of defining geodesics, they are mathematically equivalent. In the present note I want to explore the derivation of geodesic equations in these two different ways and prove their mathematical equivalence.

Now, in the calculus of variations we typically define a system’s action $S$ to be the time-integral of a Lagrangian $L$:

$S \equiv \int^{t_B}_{t_A} L(q_i, \dot{q_i}) dt$

where $L(q_i, \dot{q_i})$ says that the Lagrangian is a function of position coordinates $q_i$ and velocities $\dot{q_i}$ (and $i$ ranges over however many coordinates there are). We find the trajectory that yields a desired extremal value of the action $S$ as the one that satisfies the Euler-Lagrange equations

$0 = \frac{d}{dt} \big(\frac{\partial L}{\partial \dot{q_i}} \big) - \frac{\partial L}{\partial q_i}$

Let us now suppose that we are facing an exactly analogous situation in which there are two points on the manifold, $A$ and $B$, and we are considering possible paths between them to try to find the extremal one. We can describe any path between $A$ and $B$ by specifying the coordinates of the points along it as functions of a parameter $\sigma$ that goes from a value of $0$ at $A$ to a value of $1$ at $B$, i.e., by specifying the functions $x^{\mu}(\sigma)$. Noting that the line element can be written as

$ds^2 = g_{\mu \gamma} dx^{\mu} dx^{\gamma}$

we can write the length of a particular path as

$s = \int \sqrt{ds^2} = \int^1_0 \sqrt{g_{\mu \gamma} \frac{dx^{\mu}}{d \sigma} \frac{dx^{\gamma}}{d \sigma}} d \sigma$

Note that the metric is a function of the coordinates of points along the path, which in turn are functions of the parameter $\sigma$, i.e., $g_{\mu \gamma} = g_{\mu \gamma}(x^{\alpha}(\sigma))$. This situation is exactly analogous to the usual calculus of variations scenario because, writing $\dot{x}^{\alpha} \equiv d x^{\alpha}/d \sigma$, we see that we have a Lagrangian function

$L(x^{\alpha}, \dot{x}^{\alpha}) = \sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

and we hope to find the path $x^{\mu}(\sigma)$ that makes the integral of the Lagrangian extreme. This will be the path that satisfies the Euler-Lagrange equations

$0 = \frac{d}{d \sigma} \big(\frac{\partial L}{\partial \dot{x}^{\alpha}}\big) - \frac{\partial L}{\partial x^{\alpha}}$

This corresponds to $N$ separate differential equations in an $N$-dimensional manifold, one equation for each value of the index $\alpha$.

We can manipulate the Euler-Lagrange equations to get geodesic equations which are easier to use in particular contexts. First, note that

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{\partial}{\partial \dot{x}^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} (g_{\mu \gamma} \ \delta^{\mu}_{\alpha} \ \dot{x}^{\gamma} + g_{\mu \gamma} \ \dot{x}^{\mu} \ \delta^{\ \gamma}_{\alpha})$

because, for example, $\partial \dot{x}^{\mu}/\partial \dot{x}^{\alpha} = \delta^{\mu}_{\alpha}$. Also note that the metric is treated as a constant as it depends on $x^{\alpha}$ not on $\dot{x}^{\alpha}$. Doing the sums over the Kronecker deltas we get

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{2L}(g_{\alpha \gamma} \ \dot{x}^{\gamma} + g_{\mu \alpha} \ \dot{x}^{\mu})$

$= \frac{1}{2L}(g_{\alpha \mu} \ \dot{x}^{\mu} + g_{\alpha \mu} \ \dot{x}^{\mu})$

$= \frac{1}{L} g_{\alpha \mu} \ \dot{x}^{\mu}$

But notice that since

$s = \int L d \sigma$

we have

$\frac{ds}{d \sigma} = L$

so

$\frac{1}{L} = \frac{d \sigma}{ds}$

and we can write

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{L} g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma} \frac{d \sigma}{ds}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{ds}$

Next, we have

$\frac{\partial L}{\partial x^{\alpha}} = \frac{\partial}{\partial x^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \dot{x}^{\mu} \dot{x}^{\gamma}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{d \sigma} \frac{d x^{\gamma}}{d \sigma} \frac{d \sigma}{ds}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Putting these results into the Euler-Lagrange equations we get

$0 = \frac{d}{d \sigma} \big(g_{\alpha \mu} \frac{d x^{\mu}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Finally, multiplying through by $d \sigma/ds$ we get

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

where I have also renamed $\mu \rightarrow \beta$ in the first term to make it clearer that the Einstein summations in the first and second terms are independent. This is the first version of the geodesic equations, derived by requiring that the path between the points $A$ and $B$ should be extremal in the sense of satisfying the Euler-Lagrange equations of the calculus of variations.

We will now derive a second version of the geodesic equations by requiring the geodesic to be a path that is locally straight. In differential geometry a path is defined as straight if it parallel transports its own tangent vector, i.e., if the tangent vector does not change as we move an infinitesimal step along the path. If we take an arbitrary point on the path to be $x^{\mu} \ e_{\mu}$ and we take $ds$ to be an infinitesimal displacement along the path, then a tangent vector to the path is

$\frac{d x^{\mu}}{d \sigma} e_{\mu}$

and we want

$\frac{d}{ds}\big(\frac{d x^{\mu}}{d \sigma}e_{\mu} \big) = \frac{d^2 x^{\mu}}{ds d \sigma} e_{\mu} + \frac{d x^{\mu}}{d \sigma} \frac{d e_{\mu}}{ds} = 0$

Multiplying through by $d \sigma/ds$ this gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d e_{\mu}}{ds} = 0$

But

$\frac{d e_{\mu}}{ds} = \frac{\partial e_{\mu}}{\partial x^{\gamma}} \frac{d x^{\gamma}}{d s}$

$= \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha}$

Putting this into the equation gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha} = 0$

To enable us to factor out the basis vector we can rename the indices in the second term as $\mu \rightarrow \alpha$ and $\gamma \rightarrow \beta$ to get

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} e_{\mu} = 0$

$\iff$

$\big[\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} \big] e_{\mu} = 0$

$\implies$

$\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = 0$

This is the second version of the geodesic equations, derived by assuming that the path between the two points on the manifold is locally straight.

We now have two seemingly different versions of the geodesic equations, namely

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

and

$0 = \frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta}$

We will next show that they are in fact mathematically equivalent. Starting from the first version, we can expand out the brackets to get

$0 = \frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

$\iff$

$0 = \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

Now we rename the indices as follows: $\sigma \rightarrow \alpha$ in the first term; $\sigma \rightarrow \beta$ in the second term; $\beta \rightarrow \mu$ and $\alpha \rightarrow \sigma$ in the third term; and $\alpha \rightarrow \sigma$, $\mu \rightarrow \alpha$, $\gamma \rightarrow \beta$ in the fourth term. We get

$0 = \frac{1}{2}\frac{\partial g_{\sigma \beta}}{\partial x^{\alpha}}\frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\sigma \alpha}}{\partial x^{\beta}} \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2} - \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}} \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds}$

We can write this as

$0 = \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} \frac{1}{2} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}] + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2}$

Finally, multiplying through by $g^{\mu \sigma}$ and using the facts that

$g^{\mu \sigma} \ g_{\sigma \mu} = 1$

and

$\Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = \frac{1}{2} g^{\mu \sigma} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}]$

we get

$0 = \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} + \frac{d^2 x^{\mu}}{ds^2}$

which is the second version of the geodesic equation. Thus, the two versions are equivalent as claimed.

Geometric interpretation of Christoffel symbols and some alternative approaches to calculating them

In a classic paper in 1869, Elwin Bruno Christoffel (1829-1900) introduced his famous Christoffel symbols $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ to represent an array of numbers describing a metric connection. They are also known as connection coefficients (and sometimes less respectfully as Christ-awful symbols’). In differential geometry one usually first encounters them when studying covariant derivatives of tensors in tensor calculus. For example, suppose we try to differentiate the contravariant vector $A = A^{\alpha} e_{\alpha}$, where $e_{\alpha}$ denotes a coordinate basis vector (and we are using the Einstein summation convention). We get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \frac{\partial e_{\alpha}}{\partial x^{\beta}}$

In general, the partial derivative in the second term on the right will result in another vector which we can write in terms of its coordinate basis as

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

This defines the Christoffel symbol $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$. The downstairs indices refer to the rate of change of the basis components $e_{\alpha}$ with respect to the coordinate variable $x^{\beta}$ in the direction of the coordinate basis vector $e_{\gamma}$ ($\gamma$ being the upstairs index). Substituting the second equation into the first we get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

To enable us to factor out the coordinate basis vector we can exchange the symbols $\alpha$ and $\gamma$ in the second term on the right to get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta} \ e_{\alpha}$

$= \big( \frac{\partial A^{\alpha}}{\partial x^{\beta}} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}\big) \ e_{\alpha}$

The expression in the bracket is called the covariant derivative of the contravariant vector $A$, i.e., the rate of change of $A^{\alpha}$ in each of the directions $\beta$ of the coordinate system $x^{\beta}$. It has the important property that it is itself tensorial (unlike the ordinary partial derivative of the tensor on its own). This covariant derivative is often written using the notation

$\nabla_{\beta} \ A^{\alpha} = \partial_{\beta} \ A^{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$

Having thus established the meaning of the Christoffel symbols, one then goes on to work out that the covariant derivative of a one-form is

$\nabla_{\beta} \ A_{\alpha} = \partial_{\beta} \ A_{\alpha} - A_{\gamma} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$

and that the covariant derivatives of higher rank tensors are constructed from the building blocks of $\nabla_{\beta} \ A^{\alpha}$ and $\nabla_{\beta} \ A_{\alpha}$ by adding a $\Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$ term for each upper index $\gamma$ and a $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ term for each lower index $\gamma$. For example, the covariant derivative of the $(1, 1)$ rank-2 tensor $X^{\mu}_{\hphantom{\mu} \sigma}$ is

$\nabla_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} = \partial_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} + X^{\alpha}_{\hphantom{\mu} \sigma} \ \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} - X^{\mu}_{\hphantom{\mu} \alpha} \ \Gamma^{\alpha}_{\hphantom{\alpha} \sigma \beta}$

Christoffel symbols then go on to play vital roles in other areas of differential geometry, perhaps most notably as key components in the definition of the Riemann curvature tensor.

It is possible to have a working knowledge of all of this without truly understanding at a deep level, say geometrically, what Christoffel symbols really mean. In the present note I want to delve a bit more deeply into how one might calculate and interpret Christoffel symbols geometrically. I also want to explore some alternative ways of calculating them in the context of a simple plane polar coordinate system $(r, \theta)$ which is related to the usual Cartesian $(x, y)$ coordinate system via the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

In an $n-$dimensional manifold there are potentially $n^3$ Christoffel symbols to be calculated, though this number is usually reduced by symmetries. In the present plane polar coordinate case, we will need to calculate $2^3 = 8$ Christoffel symbols. These are

$\Gamma^{r}_{\hphantom{r} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r}$

$\Gamma^{r}_{\hphantom{r} \theta r}$

$\Gamma^{r}_{\hphantom{r} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta}$

$\Gamma^{r}_{\hphantom{r} r \theta}$

Geometric approach
Consider the situation shown in the diagram below where two vectors $(e_{\theta})_P$ and $(e_{\theta})_S$ of the basis vector field $e_{\theta}$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_{\theta})_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

Now, in plane polar coordinates the magnitude of $e_{\theta}$ is

$|e_{\theta}| = r$

Therefore the length of the arc $L$ in the diagram is

$L = r \Delta \theta$

If $\Delta \theta$ is small, we have

$L \approx |\Delta_{\theta} e_{\theta}|$

where $\Delta_{\theta} e_{\theta}$ is the vector connecting the endpoints of $(e_{\theta})_P$ and $(e_{\theta})_S$, i.e., $\Delta_{\theta} e_{\theta} = (e_{\theta})_S - (e_{\theta})_P$.

Therefore

$|\Delta_{\theta} e_{\theta}| \approx r \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we get

$|d_{\theta} e_{\theta}| = r d \theta$

From the diagram we see that $d_{\theta} e_{\theta}$ points in the opposite direction of $e_r$. Therefore we have

$d_{\theta} e_{\theta} = - r d \theta e_r$

(note that in plane polar coordinates $e_r$ is of unit length). From this equation we have

$\frac{d_{\theta} e_{\theta}}{d \theta} \equiv \frac{\partial e_{\theta}}{\partial \theta} = -r e_r$

But from the definition of Christoffel symbols we have

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta}$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

We have obtained the first two Christoffel symbols on our list from the geometric setup and the nice thing about this approach is that we can see what the underlying changes in the coordinate basis vectors looked like.

To obtain the next two Christoffel symbols on our list, we consider a change in the vector field $e_{\theta}$ due to a displacement in the radial direction from $P$ to $Q$ in the following diagram:

We have moved outwards by a small amount $\Delta r$ and as a result the length of the vectors in the vector field $e_{\theta}$ has increased by a small amount $|\Delta_r e_{\theta}|$ shown in the diagram. From the diagram we see that the proportions of the two increases must be same, so we have

$\frac{|\Delta_r e_{\theta}|}{|e_{\theta}|} = \frac{\Delta r}{r}$

or

$|\Delta_r e_{\theta}| = \Delta r \frac{1}{r} |e_{\theta}|$

Passing to the differential limit as $\Delta r \rightarrow 0$ we get

$|d_r e_{\theta}| = dr \frac{1}{r} |e_{\theta}|$

Since $d_r e_{\theta}$ is directed along the vector $e_{\theta}$ we can write the vector equation

$d_r e_{\theta} = dr \frac{1}{r} e_{\theta}$

so

$\frac{d_r e_{\theta}}{dr} \equiv \frac{\partial e_{\theta}}{\partial r} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

We have thus found two more Christoffel symbols from the geometrical setup. To get the next two Christoffel symbols on our list we observe that the basis vector field $e_r$ does not change as we move in the radial direction (either in magnitude or direction) so we must have

$\frac{\partial e_r}{\partial r} = 0$

where the right hand side here denotes a zero vector. But we know that

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta}$

so we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Finally, to get the last two remaining Christoffel symbols on our list, we consider a change in the vector field $e_r$ due to an angular displacement. In the diagram below two vectors $(e_r)_P$ and $(e_r)_S$ of the basis vector field $e_r$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_r)_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

The arc length $L$ is

$L = |e_r| \Delta \theta = \Delta \theta$

(since the magnitude of the coordinate basis vector $e_r$ is $|e_r| = 1$). But for small $\Delta \theta$ we also have

$L \approx |\Delta_{\theta} e_r|$

where $\Delta_{\theta} e_r$ is the vector connecting the endpoints of $(e_r)_P$ and $(e_r)_S$, i.e., $\Delta_{\theta} e_r = (e_r)_S - (e_r)_P$. Therefore

$|\Delta_{\theta} e_r| = \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we have

$|d_{\theta} e_r| = d \theta$

But $d_{\theta} e_r$ has the same direction as $e_{\theta}$. Therefore

$d_{\theta} e_r = \frac{1}{r} d \theta e_{\theta}$

where the factor $\frac{1}{r}$ is needed to correct for the magnitude $r$ of $e_{\theta}$ (we only want the direction of $e_{\theta}$ here). Therefore we see that

$\frac{d_{\theta} e_r}{d \theta} \equiv \frac{\partial e_r}{\partial \theta} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

This completes the geometric calculation of all the Christoffel symbols for plane polar coordinates.

Algebraic approach

It is possible to calculate the eight Christoffel symbols quite easily for plane polar coordinates by first expressing the basis components $e_r$ and $e_{\theta}$ in terms of the Cartesian components $e_x$ and $e_y$. Note that these basis components are one-forms, so they transform as

$e^{\prime}_{\alpha} = \frac{\partial x^{\beta}}{\partial x^{\prime \alpha}} e_{\beta}$

We use the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

to calculate the coefficients. We get

$e_r = \frac{\partial x}{\partial r} e_x + \frac{\partial y}{\partial r} e_y$

$e_{\theta} = \frac{\partial x}{\partial \theta} e_x + \frac{\partial y}{\partial \theta} e_{\theta}$

and therefore

$e_r = \cos \theta e_x + \sin \theta e_y$

$e_{\theta} = -r \sin \theta e_x + r \cos \theta e_y$

Then we calculate the Christoffel symbols as follows. First,

$\frac{\partial e_r}{\partial r} = 0$

so

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta} = 0$

and we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Next,

$\frac{\partial e_r}{\partial \theta} = - \sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

Next,

$\frac{\partial e_{\theta}}{\partial \theta} = -r \cos \theta e_x - r \sin \theta e_y = -r e_r$

so

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta} = -r e_r$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

Finally,

$\frac{\partial e_{\theta}}{\partial r} = -\sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

Metric tensor approach

The previous approach relied on knowing the functional relationship between the Cartesian coordinates $(x, y)$ and the plane polar coordinates $(r, \theta)$. There is another more generally useful method of calculating the Christoffel symbols from the components of the metric tensor, using the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

I will first derive this formula from first principles, then use it to find the Christoffel symbols for the plane polar coordinates case.

The first step is to show that Christoffel symbols are symmetric in their lower indices, i.e.,

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

as this property will be needed in the derivation of the formula. To prove the symmetry property we start from the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Suppose we now decompose the basis vectors $e_{\alpha}$ in a local Cartesian coordinate system. Then using the transformation rule for one-forms we have

$e_{\alpha} = \frac{\partial x^m}{\partial x^{\alpha}} e_m$

where the $x^m$ are the Cartesian coordinates and the $e_m$ are the coordinate basis vectors (which are constant in both magnitude and direction in the Cartesian system). Differentiating gives

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

Equating the expressions for $\frac{\partial e_{\alpha}}{\partial x^{\beta}}$ we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

But then

$\Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\beta} \partial x^{\alpha}} e_m$

so it follows from Young’s Theorem (equality of cross-partials) that

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

We conclude that Christoffel symbols are symmetric in their lower indices, as claimed.

Note too that the components $g_{\mu \gamma}$ of the general metric tensor are also symmetric with respect to their indices. This follows from the defining equation of the metric tensor components in terms of the basis vector fields $e_{\gamma}$, namely

$g_{\mu \gamma} \equiv e_{\mu} \cdot e_{\gamma}$

Since $e_{\mu} \cdot e_{\gamma} = e_{\gamma} \cdot e_{\mu}$

the metric is symmetric, i.e.,

$g_{\mu \gamma} = g_{\gamma \mu}$

To derive the formula for the Christoffel symbols in terms of the metric tensor components, we begin again with the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Taking the scalar product with another basis vector on both sides we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} \cdot e_{\mu} = \frac{\partial e_{\alpha}}{\partial x^{\beta}} \cdot e_{\mu}$

$= \frac{\partial (e_{\alpha} \cdot \ e_{\mu})}{\partial x^{\beta}} - e_{\alpha} \cdot \frac{\partial e_{\mu}}{\partial x^{\beta}}$

$= \frac{\partial g_{\alpha \mu}}{\partial x^{\beta}} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ e_{\alpha} \cdot \ e_{\rho}$

$= \partial_{\beta} \ g_{\alpha \mu} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

Therefore we have

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

In the second term on the right hand side we can rename $\rho \rightarrow \gamma$ and use the fact that the metric is symmetric to reverse the indices. We get

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\gamma}_{\hphantom{\gamma} \mu \beta} \ g_{\gamma \alpha}$

By cyclically renaming the indices $\beta$, $\alpha$, and $\mu$ we can generate two more similar equations. From the cyclic permutation $\beta$, $\alpha$, $\mu$ $\rightarrow$ $\alpha$, $\mu$, $\beta$ we get

$\partial_{\alpha} \ g_{\mu \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \mu \alpha} \ g_{\gamma \beta} + \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ g_{\gamma \mu}$

and from the cyclic permutation $\alpha$, $\mu$, $\beta$ $\rightarrow$ $\mu$, $\beta$, $\alpha$ we get

$\partial_{\mu} \ g_{\beta \alpha} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \mu} \ g_{\gamma \alpha} + \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \mu} \ g_{\gamma \beta}$

Now we add the first two equations and subtract the third to get

$\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha} = 2 \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu}$

where we have taken advantage of the symmetry in the lower indices of the Christoffel symbols to cancel some terms. Using the fact that

$g^{\mu \gamma} \ g_{\gamma \mu} = 1$

we multiply both sides by $\frac{1}{2}g^{\mu \gamma}$ to get the final formula:

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2}g^{\mu \gamma}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha}]$

$= \frac{1}{2}g^{\gamma \mu}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

This is made easier to remember by noting the following facts. A factor of the inverse metric generates the Christoffel symbol’s upper index. The negative term has the symbol’s lower indices as the indices of the metric. The other two terms in the bracket are cyclic permutations of this last term.

Having derived the formula we can now employ it to calculate the eight Christoffel symbols for plane polar coordinates. We can work out the metric tensor using the distance formula

$ds^2 = dx^2 + dy^2$

with the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

Then

$dx = \cos \theta dr - r \sin \theta d \theta$

$dy = \sin \theta dr + r \cos \theta d \theta$

so

$dx^2 = \cos^2 \theta dr^2 + r^2 \sin^2 \theta d \theta^2 - 2 r \sin \theta \cos \theta dr d \theta$

$dy^2 = \sin^2 \theta dr^2 + r^2 \cos^2 \theta d \theta^2 + 2 r \sin \theta \cos \theta dr d \theta$

Therefore the metric in plane polar coordinates is

$ds^2 = dx^2 + dy^2 = dr^2 + r^2 d \theta^2$

The metric tensor is therefore

$[g_{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ \ \\ 0 & r^2 \end{pmatrix}$

and the inverse metric is

$[g^{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ \ \\ 0 & \frac{1}{r^2} \end{pmatrix}$

Now, in the formula for $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ the indices $\alpha$, $\beta$, $\gamma$ and $\mu$ represent the polar coordinates $r$ and $\theta$ in various permutations. Inspection of $[g_{\alpha \beta}]$ shows that the only partial derivative terms which do not equal zero are

$\partial_r \ g_{\theta \theta} = \partial_r (r^2) = 2r$

Inspection of $[g^{\alpha \beta}]$ shows that this equals zero except when

$g^{rr} = 1$

and

$g^{\theta \theta} = \frac{1}{r^2}$

Substituting these values of the metric tensor components into the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

we get

$\Gamma^{r}_{\hphantom{r} \theta \theta} = \frac{1}{2} g^{r r} \big( - \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} (-2r) = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

On Lie derivatives of tensor fields

Differential geometry provides a number of ways of extending the familiar notion of the derivative of a real-valued function to enable differentiation of various types of tensor fields that live’ on manifolds, such as scalars, contravariant vectors, one-forms (also known as covariant vectors) and mixed tensors. The problem that needs to be overcome in such cases is the fact that partial differentiation of tensors is generally not tensorial, i.e., the result is not itself a tensor. The reason is that the process of differentiation involves subtracting tensors living in different tangent spaces on a curved manifold, so their difference does not transform in the same way as either of the tensors individually. For example, consider a contravariant vector field $X^a$. In the tangent space at point $P$ the transformation law for $X^a$ is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_P X^b$

whereas in the tangent space at point $Q$ the transformation law is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_Q X^b$

If we imagine these two tangent spaces at points $P$ and $Q$ on the manifold separated by distance $\delta u$, the derivative would involve computing

$\lim_{\delta u \rightarrow 0} \frac{[X^a]_P - [X^a]_Q}{\delta u}$

and the difference in the numerator will clearly not transform like either of the contravariant vectors individually since their transformation matrices are evaluated at different points. The derivative will not therefore be a tensor itself.

The usual way of getting around this is to introduce some kind of auxiliary field on to the manifold which provides a link between the two tensors, thus enabling them to transform in the same way (with respect to the auxiliary field) when they are subtracted. In the present note I want to explore the particular concept of the Lie derivative of a tensor field (named after the mathematician Marius Sophus Lie, 1842-1899) which employs this auxiliary field approach by introducing a contravariant vector field on to the manifold. To this end, suppose we define a vector field $X^a (x)$. This can be used to define streamlines in the manifold (also called a congruence of curves) as the solutions of the ordinary differential equations

$\frac{dx^a}{du} = X^a (x(u))$

where $u$ is a parameter determining the position on a given streamline. The equations encapsulate the fact that each point on a streamline has a tangent vector belonging to the vector field.

Example Using the notation $x^i$ to denote the $i$-th coordinate, suppose the vector field is $X = (1, x^2(u))$. Then $X^1 = 1$, $X^2 = x^2$ and the streamlines for this vector field are obtained by solving simultaneously the differential equations

$\frac{dx^1}{du} = 1$

$\frac{dx^2}{du} = x^2$

Solving the first equation gives

$x^1 = u + c^1$

Solving the second equation gives

$x^2 = e^{c^2}e^u$

Using the solution of the first equation to substitute for $u$ in the solution of the second one we get

$x^2 = e^{c^2} e^{x^1 - c^1} = e^{c^2 - c^1}e^{x^1} \equiv Ce^{x^1}$

Therefore the streamlines of the vector field $(1, x^2(u))$ are the graphs of the equation

$x^2 = Ce^{x^1}$

where $C$ is a constant. Some of the streamlines are shown in the figure below.

Now suppose we want to find the Lie derivative of a tensor field, $T^{ab \cdots}_{cd \cdots}(x)$, using the vector field $X^a(x)$. The essential idea is to use the streamlines of the vector field to link the tensor at some point $P$, $T^{ab\cdots}_{cd \cdots}(P)$, with the tensor at some neighbouring point $Q$, $T^{ab\cdots}_{cd \cdots}(Q)$, in such a way that the two will have the same transformation matrix at point $Q$ (with respect to the auxiliary vector field). We can then subtract the two tensors at $Q$ and so define the derivative at $P$ by a limiting process as $Q$ tends to $P$. In all such cases, the technique begins by considering a coordinate transformation from $P$ to $Q$ of the form

$x^{\prime a} = x^a + \delta u X^a(x)$

where $\delta u$ is arbitrarily small. The point $Q$ with coordinates $x^{\prime a}$ lies on the streamline through $P$ which the vector field $X^a(x)$ generates. Differentiating the coordinate transformation we get

$\frac{\partial x^{\prime a}}{\partial x^b} = \frac{\partial x^a}{\partial x^b} + \delta u \ \partial_b X^a$

$= \delta^a_b + \delta u \ \partial_b X^a$

where $\delta^a_b$ is the Kronecker delta and $\partial_b \equiv \frac{\partial}{\partial x^b}$. What we now do is consider the effect of the above coordinate transformation on the tensor field $T^{ab\cdots}_{cd \cdots}$ at the points $P$ and $Q$. In what follows I will employ this general procedure to obtain the Lie derivative formulas with respect to a contravariant vector field $X^a(x)$ in the case of a scalar field $\phi$, a contravariant vector field $Y^a$, a one-form field $Y_a$, and a general mixed tensor field $T^{ab\cdots}_{cd \cdots}$.

The Lie derivative of a scalar field $\phi$

Not surprisingly this is the easiest case to deal with since scalars are invariants: the values of a scalar field defined over a manifold do not change under a change in the coordinate system being used. The value of the scalar field at the point $P$ will simply be $\phi(x)$ and the value at the point $Q$ will be

$\phi(x^{\prime}) = \phi(x^c + \delta u X^c)$

We can expand this in a Taylor series about the point $P$ with coordinates $x$ to get the first-order approximation

$\phi(x^{\prime}) \approx \phi(x) + \delta u X^c \ \partial_c \phi(x)$

The Lie derivative of the scalar field with respect to the contravariant vector field $X^a(x)$ is then

$L_X \phi = \lim_{\delta u \rightarrow 0} \frac{\phi(x^{\prime}) - \phi(x)}{\delta u} = X^c \ \partial_c \phi$

$\equiv X^a \ \partial_a \phi$

We observe that the Lie derivative of the scalar field $\phi(x)$ with respect to the vector field $X^a(x)$ is actually the directional derivative of $\phi$ in the direction of the vector $X^a$. In the differential geometry literature in this area it is common to associate the contravariant vector field $X$ with the linear differential operator $X^a \ \partial_a$ (which operates on any real-valued function $f$ to produce another function $g$) and essentially treat them as the same object. Given a point $P$ on the manifold, one thinks of the partial differential operators $[\partial_a]_P$ as constituting a basis for all the vectors in the tangent space at $P$, so that any vector at $P$ can be written as a linear combination of the $[\partial_a]_P$ in the form

$[X]_P = [X^a]_P [\partial_a]_P$

This is the intuitive justification for treating the vector field $X$ and the linear differential operator $X^a \ \partial_a$ as being the same things. Under this convention, one often sees the Lie derivative of a scalar field $\phi$ with respect to the contravariant vector field $X$ written as

$L_X \phi = X \phi$

The Lie derivative of a contravariant vector field $Y^a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the contravariant vector field $Y^a(x)$ at $P$ is mapped to

$Y^{\prime a}(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^b} Y^b(x)$

$= (\delta^a_b + \delta u \ \partial_b X^a) Y^b(x)$

$= Y^a(x) + \delta u \ Y^b \ \partial_b X^a$

The vector already at $Q$, namely $Y^a(x^{\prime})$, has a first-order Taylor series approximation about $x$ of the form

$Y^a(x^{\prime}) = Y^a(x^c + \delta u X^c)$

$\approx Y^a(x) + \delta u \ X^c \ \partial_c Y^a(x)$

The Lie derivative with respect to the vector field $X^a(x)$ is then given by

$L_X Y^a = \lim_{\delta u \rightarrow 0} \frac{Y^a (x^{\prime}) - Y^{\prime a}(x^{\prime})}{\delta u}$

$= X^c \ \partial_c Y^a - Y^b \ \partial_b X^a$

$\equiv X^b \ \partial_b Y^a - Y^b \ \partial_b X^a$

Under the convention of associating the vector field $X$ with the linear differential operator $X^a \partial_a$, one often sees the Lie derivative of a contravariant vector field $Y^a$ with respect to the field $X$ written as

$L_X Y^a = [X, Y]^a$

where $[X, Y] = XY - YX$ is called the Lie bracket (or commutator) of the two vector fields $X$ and $Y$. This is a new vector field (and therefore linear differential operator) that can be written alternatively as

$[X, Y] = X(Y^a \partial_a) - Y(X^a \partial_a)$

$= X^b \partial_b (Y^a \partial_a) - Y^b \partial_b (X^a \partial_a)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a + X^a Y^b(\partial_b \partial_a - \partial_a \partial_b)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a$

where the last equality follows from the fact that the second term in the penultimate line will always vanish by Young’s Theorem (equality of cross-partials). Therefore the $a$-th component of the vector field $[X, Y]$ is the one that appears in the expression of the Lie derivative $L_X Y^a$ above.

The Lie derivative of a one-form (covariant vector) field $Y_a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the one-form (covariant vector) field $Y_a(x)$ at $P$ is mapped to

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

To work out the transformation matrix here we need to write the coordinate transformation as

$x^{\prime b} = x^b + \delta u X^b(x)$

$\implies$

$x^b = x^{\prime b} - \delta u X^b(x)$

Partially differentiating we get

$\frac{\partial x^b}{\partial x^{\prime a}} = \delta^b_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^b$

$= \delta^b_a - \delta u \ \partial_c X^b \frac{\partial x^c}{\partial x^{\prime a}}$

$= \delta^b_a - \delta u \ \partial_c X^b \big(\delta^c_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^a\big)$

$= \delta^b_a - \delta u \ \partial_a X^b + O((\delta u)^2)$

We can ignore the $O((\delta u)^2)$ terms as they will disappear in the limiting process of the differentiation, so we have

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

$= \big( \delta^b_a - \delta u \ \partial_a X^b \big) Y_b(x)$

$= Y_a(x) - \delta u Y_b \ \partial_a X^b$

Again taking a first-order Taylor series approximation about $x$ at the point $Q$ we get that

$Y_a(x^{\prime}) = Y_a(x^c + \delta u X^c)$

$\approx Y_a(x) + \delta u X^c \ \partial_c Y_a(x)$

Then the Lie derivative of the one-form field $Y_a(x)$ with respect to the contravariant vector field $X^a(x)$ is obtained as

$L_X Y_a = \lim_{\delta u \rightarrow 0} \frac{Y_a(x^{\prime}) - Y^{\prime}_a (x^{\prime})}{\delta u}$

$= X^c \partial_c Y_a + Y_b \partial_a X^b$

$= X^b \partial_b Y_a + Y_b \partial_a X^b$

The Lie derivative of a mixed tensor field $T^{ab\cdots}_{cd \cdots}$

A good prototype for this case is the Lie derivative of the simplest type of mixed tensor field, the rank-2 tensor of type $(1, 1)$ represented as $T^a_b(x)$. We will therefore consider this case first and then use it to extrapolate to a general mixed tensor field of type $(p, q)$ represented as $T^{ab \cdots}_{cd \cdots}(x)$.

Under the coordinate transformation

$x^{\prime a} = x^a + \delta u X^a(x)$

the mixed tensor field $T^a_b(x)$ transforms as

$T^{\prime a}_b(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^c} \frac{\partial x^d}{\partial x^{\prime b}} T^c_d(x)$

$= (\delta^a_c + \delta u \ \partial_c X^a)(\delta^d_b - \delta u \ \partial_b X^d) T^c_d$

$= (\delta^a_c + \delta u \ \partial_c X^a)(T^c_b - \delta u T^c_d \ \partial_b X^d)$

$= T^a_b(x) - \delta u T^a_d \ \partial_b X^d + \delta u T^c_b \ \partial_c X^a + O((\delta u)^2)$

Under a first-order Taylor series approximation about $x$, the tensor at $Q$ can be written

$T^a_b(x^{\prime}) = T^a_b(x^c + \delta u X^c)$

$\approx T^a_b(x) + \delta u X^c \ \partial_c T^a_b$

The Lie derivative of $T^a_b(x)$ with respect to the contravariant vector field $X^a(x)$ is then

$L_X T^a_b = \lim_{\delta u \rightarrow 0} \frac{T^a_b(x^{\prime}) - T^{\prime a}_b(x^{\prime})}{\delta u}$

$= X^c \ \partial_c T^a_b + T^a_d \ \partial_b X^d - T^c_b \ \partial_c X^a$

We observe that the contravariant index $a$ contributes a term of the form $-T^c_b \ \partial_c X^a$ while the covariant index $b$ contributes a term of the form $T^a_d \ \partial_b X^d$.

Now consider the general mixed tensor $T^{ab \cdots}_{cd \cdots}$. The first-order Taylor series approximation of $T^{ab \cdots}_{cd \cdots}(x^{\prime})$ about $x$ gives

$T^{ab \cdots}_{cd \cdots}(x^e + \delta u X^e)$

$\approx T^{ab \cdots}_{cd \cdots}(x) + \delta u X^e \ \partial_e T^{ab \cdots}_{cd \cdots}(x)$

Therefore the first-term of the Lie derivative will be $X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$. This is of the same form as the first term of $L_X T^a_b$. Conveniently, it turns out that from then on each contravariant and covariant index in $T^{ab \cdots}_{cd \cdots}$ will contribute terms like the corresponding terms we saw above in $L_X T^a_b$. Therefore the Lie derivative of the general mixed tensor field $T^{ab \cdots}_{cd \cdots}(x)$ with respect to the contravariant vector field $X^a(x)$ will be of the form

$L_X T^{ab \cdots}_{cd \cdots} = X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$

$- \ T^{eb \cdots}_{cd \cdots} \ \partial_e X^a \ - \ T^{ae \cdots}_{cd \cdots} \ \partial_e X^b \ - \ \cdots$

$+ \ T^{ab \cdots}_{ed \cdots} \ \partial_c X^e \ + \ T^{ab \cdots}_{ce \cdots} \ \partial_d X^e \ + \ \cdots$