# On the electrodynamical part of Einstein’s paper ‘Zur Elektrodynamik bewegter Körper’

Albert Einstein (1879-1955) introduced his special theory of relativity in a 1905 paper, Zur Elektrodynamik bewegter Körper (On the Electrodynamics of Moving Bodies, Annalen der Physik, 17 (10): 891–921). The thoughtfulness and confidence of the 26-year-old Einstein are impressive, and what is also striking is that the paper contains no references to other research publications. The only person Einstein acknowledges is his best friend Michele Angelo Besso (1873-1955), with the now famous concluding sentence: ‘Zum Schlusse bemerke ich, daß mir beim Arbeiten an dem hier behandelten Probleme mein Freund und Kollege M. Besso treu zur Seite stand und daß ich demselben manche wertvolle Anregung verdanke.’ (‘In conclusion I wish to say that in working at the problem here dealt with I have had the loyal assistance of my friend and colleague M. Besso, and that I am indebted to him for several valuable suggestions.’) The two remained close until they died a month apart fifty years later.

While reading through Einstein’s work I could not help tinkering with his electrodynamical equations and comparing them to the modern equations of classical and relativistic electrodynamics. I want to record some of my jottings about this in the present note, particularly in relation to two key sections of his paper:

§$3$ Theorie der Koordinaten- und Zeittransformation… (Theory of the Transformation of Coordinates and Times…)

and

§$6$ Transformation der Maxwell-Hertzschen Gleichungen für den leeren Raum. (Transformation of the Maxwell-Hertz Equations for Empty Space.)

What Einstein envisages throughout these two sections is a pair of inertial reference frames $K$ and $k$ which are in what we now call ‘standard configuration’. That is, frame $k$ is moving in the positive $x$-direction relative to frame $K$ with speed $v$ such that all the Cartesian coordinate axes in the two frames remain parallel and such that the origins and axes of the Cartesian coordinate systems in $K$ and $k$ coincide perfectly at some initial time point. Under these conditions the coordinates of an event using the coordinate system in the $k$-frame can be expressed in terms of the coordinates of the same event using the coordinate system in the $K$-frame by a set of transformation equations which Einstein derives in §$3$ of his paper, and which are now known as the Lorentz transformation equations. Einstein expresses these equations as follows on page 902 of the published paper:

(‘It follows from this relation and the one previously found that $\phi(v) = 1$, so that the transformation equations which have been found become: $\cdots$.’) Note that Einstein uses the symbol $V$ to denote the speed of light whereas we now use the symbol $c$.

Shortly after Einstein published this in 1905, Hermann Minkowski (1864-1909) realised that the special theory of relativity could be better understood by positing the existence of a four-dimensional spacetime. In Minkowski spacetime we have four-vectors consisting of a time coordinate, $ct$, and three spatial coordinates, $x$, $y$ and $z$. Note that the time coordinate takes the form $ct$ in order to make it have the same units as the spatial coordinates (i.e., units of length). The full coordinate transformations between Einstein’s two inertial frames $K$ and $k$ in standard configuration would be obtained in Minkowski spacetime as

$\begin{bmatrix} ct^{\prime}\\ \ \\x^{\prime} \\ \ \\ y^{\prime} \\ \ \\ z^{\prime} \end{bmatrix} = \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \ 0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \ 1 & \ \ 0 \\ \ \\ 0 & 0 & \ 0 & \ \ 1 \end{bmatrix} \begin{bmatrix} ct \\ \ \\x \\ \ \\ y \\ \ \\ z \end{bmatrix}$

where, using the same notation as Einstein,

$\beta = \frac{1}{\sqrt{1 - \big(\frac{v}{c}\big)^2}}$
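As a small worked step (using nothing beyond the matrix equation above), multiplying out the rows gives the four component equations

$ct^{\prime} = \beta \big(ct - \big(\frac{v}{c}\big) x \big)$

$x^{\prime} = \beta (x - vt)$

$y^{\prime} = y$

$z^{\prime} = z$

Dividing the first of these by $c$ gives $t^{\prime} = \beta \big(t - \frac{v x}{c^2}\big)$, which is the form in which the time transformation is usually quoted.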

The coefficient matrix is actually a rank-2 tensor of type (1, 1) usually denoted by $\Lambda^{\mu}_{\hphantom{\mu} \nu}$. It is the application of this matrix to a four-vector in the $K$-frame that would produce the required Lorentz transformation to a corresponding (primed) four-vector in the $k$-frame when $K$ and $k$ are in standard configuration.
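To make the action of this matrix concrete, here is a minimal numerical sketch in plain Python. It is only an illustration: the function names, the sample event and the choice of units with $c = 1$ are my own assumptions. It applies the boost to a four-vector, checks that the quantity $(ct)^2 - x^2 - y^2 - z^2$ is unchanged, and checks that boosting with $-v$ undoes the transformation.

```python
import math

def lorentz_boost(v, c=1.0):
    """Boost matrix for frames in standard configuration (motion along x)."""
    ratio = v / c
    beta = 1.0 / math.sqrt(1.0 - ratio * ratio)  # Einstein's beta
    return [
        [beta, -beta * ratio, 0.0, 0.0],
        [-beta * ratio, beta, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply_matrix(matrix, four_vector):
    """Multiply a 4x4 matrix into a four-vector (ct, x, y, z)."""
    return [sum(m * x for m, x in zip(row, four_vector)) for row in matrix]

def interval(four_vector):
    """Squared spacetime interval (ct)^2 - x^2 - y^2 - z^2."""
    ct, x, y, z = four_vector
    return ct * ct - x * x - y * y - z * z

event_K = [5.0, 3.0, 1.0, -2.0]                       # hypothetical event in K
event_k = apply_matrix(lorentz_boost(0.6), event_K)   # same event seen from k
event_back = apply_matrix(lorentz_boost(-0.6), event_k)

print("event in k:", event_k)
print("intervals:", interval(event_K), interval(event_k))
print("boosted back to K:", event_back)
```

The two printed intervals should agree up to rounding, and the last line should reproduce the original event.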

At the start of §$6$ on page 907 of the published paper Einstein writes the following:

(‘Let the Maxwell-Hertz equations for empty space hold for the stationary system K, so that we have: $\ldots$ where $(X, Y, Z)$ denotes the vector of the electric force, and $(L, M, N)$ that of the magnetic force.’) The key point that Einstein is trying to make in §$6$ is that when the coordinate transformations he derived in §$3$ are applied to electromagnetic processes satisfying the above Maxwell-Hertz equations, it is found that the vectors $(X, Y, Z)$ and $(L, M, N)$ themselves satisfy transformation equations of the form

(page 909 of Einstein’s published paper). Here the primed letters are the components of the vectors with respect to the coordinate system in the $k$-frame and the unprimed letters are the components with respect to the coordinate system in the $K$-frame. These equations show that the components of $(X, Y, Z)$ and $(L, M, N)$ do not all remain unchanged when we switch from one inertial reference frame to another. In the standard configuration scenario, only the $X$ and $L$ components remain unchanged. In contrast, looking at the $Y^{\prime}$ equation, for example, we see that an event which from the point of view of the $k$-frame would be regarded as being due solely to an ‘electric force’ $Y^{\prime}$ would be regarded from the point of view of the $K$-frame as being due to a combination of an electric force $Y$ and a magnetic force $N$. Thus, Einstein writes on page 910 that ‘die elektrischen und magnetischen Kräfte keine von dem Bewegungszustande des Koordinatensystems unabhängige Existenz besitzen.’ (‘electric and magnetic forces do not exist independently of the state of motion of the system of coordinates.’)
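For reference, in Einstein’s notation, with $V$ the speed of light and $\beta = 1/\sqrt{1 - (v/V)^2}$, the six transformation equations being discussed here take the form

$X^{\prime} = X, \quad L^{\prime} = L$

$Y^{\prime} = \beta \big(Y - \frac{v}{V} N \big), \quad M^{\prime} = \beta \big(M + \frac{v}{V} Z \big)$

$Z^{\prime} = \beta \big(Z + \frac{v}{V} M \big), \quad N^{\prime} = \beta \big(N - \frac{v}{V} Y \big)$

The $Y^{\prime}$ equation shows explicitly how the electric component $Y$ and the magnetic component $N$ are mixed together.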

Not surprisingly, the terminology and notation that Einstein uses in 1905 seem rather archaic and obscure from a modern perspective and I could not help jotting down modern interpretations of what he was saying as I was reading his paper. To begin with, the ‘Maxwell-Hertz equations for empty space’ that Einstein refers to at the start can be viewed as arising from a simple scenario in which there is a changing charge and current distribution within a certain region of space, but we are considering the fields produced by this source of radiation in the free space outside the region. In this free space outside the region the charge and current densities (denoted by $\rho$ and $\vec{J}$ respectively) are everywhere zero and the differential form of the Maxwell equations in SI units reduces to

$\nabla \cdot \vec{E} = 0$

$\nabla \cdot \vec{B} = 0$

$\mu_0 \epsilon_0 \frac{\partial \vec{E}}{\partial t} = \nabla \times \vec{B}$

$\frac{\partial \vec{B}}{\partial t} = - \nabla \times \vec{E}$

where the parameters $\epsilon_0$ and $\mu_0$ and the speed of light $c$ are related by the equation

$c^2 = \frac{1}{\mu_0 \epsilon_0}$

The vector $\vec{E} = (E_x, E_y, E_z)$ denotes the electric field, which is defined at any given point as the electric force per unit charge on an infinitesimally small positive test charge placed at that point. This corresponds to the vector $(X, Y, Z)$ in Einstein’s paper, which he called ‘den Vektor der elektrischen [Kraft]’ (‘the vector of the electric force’). The vector $\vec{B} = (B_x, B_y, B_z)$ denotes the magnetic field and this corresponds to the vector $(L, M, N)$ in Einstein’s paper. A moving charge creates a magnetic field in all of space and this magnetic field exerts a force on any other moving charge. (Charges at rest do not produce magnetic fields, nor do magnetic fields exert forces on them.)

In his paper Einstein actually employs an alternative formulation of the above Maxwell equations in which quantities are measured in Gaussian units rather than SI units. This involves defining

$\epsilon_0 = \frac{1}{4 \pi c}$

(and therefore $\mu_0 = \frac{4 \pi}{c}$) and also rescaling the electric field vector as

$\vec{E} = c^{-1} \vec{E}_{SI}$

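where $\vec{E}_{SI}$ denotes the electric field vector expressed in SI units. When these changes are substituted into the above Maxwell equations (so that, for example, $\mu_0 \epsilon_0 \, \partial \vec{E}_{SI}/\partial t = c^{-2} \, \partial (c \vec{E})/\partial t = c^{-1} \, \partial \vec{E}/\partial t$), the equations become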

$\nabla \cdot \vec{E} = 0$

$\nabla \cdot \vec{B} = 0$

$\frac{1}{c} \frac{\partial \vec{E}}{\partial t} = \nabla \times \vec{B}$

$\frac{1}{c} \frac{\partial \vec{B}}{\partial t} = - \nabla \times \vec{E}$

The last two of these are the equations appearing at the start of §$6$ of Einstein’s paper, with $V \equiv c$ there. Incidentally, these are also the equations that are responsible for the emergence of electromagnetic waves, as can easily be demonstrated by taking the curl of both sides of the equations to get

$\frac{1}{c} \frac{\partial (\nabla \times \vec{E})}{\partial t} = \nabla \times (\nabla \times \vec{B})$

$\frac{1}{c} \frac{\partial (\nabla \times \vec{B})}{\partial t} = - \nabla \times (\nabla \times \vec{E})$

On the left-hand sides I have taken the curl operator into the partial derivative which is permissible since curl does not involve time. We can then replace the curl inside each partial derivative by the corresponding term in Maxwell’s equations. Using a standard identity in vector calculus the right-hand-sides become

$\nabla \times (\nabla \times \vec{B}) = \nabla (\nabla \cdot \vec{B}) - \nabla^{2} \vec{B}$

$\nabla \times (\nabla \times \vec{E}) = \nabla (\nabla \cdot \vec{E}) - \nabla^{2} \vec{E}$

Since the divergence terms are zero in free space we are left only with the Laplacian terms on the right-hand sides. Putting the left and right-hand sides together we get the electromagnetic wave equations

$\nabla^2 \vec{B} = \frac{1}{c^2} \frac{\partial^2 \vec{B}}{\partial t^2}$

$\nabla^2 \vec{E} = \frac{1}{c^2} \frac{\partial^2 \vec{E}}{\partial t^2}$
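As a quick sanity check of these wave equations, the following sketch tests numerically that one transverse field component of a plane wave travelling in the $x$-direction, $\sin(k(x - ct))$, satisfies $\partial^2 E/\partial x^2 = c^{-2} \, \partial^2 E/\partial t^2$. It is only an illustration under my own assumptions: the values of the wave speed and wavenumber are arbitrary, and the derivatives are approximated by central finite differences rather than computed exactly.

```python
import math

C = 3.0        # wave speed in arbitrary units (hypothetical value)
K_WAVE = 2.0   # wavenumber (hypothetical value)

def field(x, t):
    """One transverse field component of a plane wave moving in +x."""
    return math.sin(K_WAVE * (x - C * t))

def second_derivative(f, u, h=1e-4):
    """Central-difference estimate of the second derivative of f at u."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / (h * h)

def wave_residual(x, t):
    """d2E/dx2 - (1/c^2) d2E/dt2, which should vanish for a solution."""
    d2_dx2 = second_derivative(lambda s: field(s, t), x)
    d2_dt2 = second_derivative(lambda s: field(x, s), t)
    return d2_dx2 - d2_dt2 / C**2

residuals = [abs(wave_residual(x, t))
             for x in (0.0, 0.3, 1.7)
             for t in (0.0, 0.5)]
print("largest residual:", max(residuals))
```

The largest residual should be tiny compared with the size of the individual second derivatives, which are of order $k^2$.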

With regard to Einstein’s demonstration that the electric and magnetic fields get mixed together under a Lorentz transformation, we would show this now via a Lorentz transformation of a rank-2 electromagnetic field tensor of the form

$F^{\mu \nu} = \begin{bmatrix} 0 & \frac{1}{c} E_x & \frac{1}{c} E_y & \frac{1}{c} E_z \\ \ \\ -\frac{1}{c} E_x & 0 & B_z & -B_y \\ \ \\-\frac{1}{c} E_y & -B_z & 0 & B_x \\ \ \\ -\frac{1}{c}E_z & B_y & -B_x & 0 \end{bmatrix}$

Note that we are using SI units here, which is why the coefficient $\frac{1}{c}$ appears in front of the electric field components. This coefficient would disappear if we were using Gaussian units. To apply the Lorentz transformation to the electromagnetic field tensor we observe that since $F^{\mu \nu}$ is a rank-2 tensor of type $(2, 0)$ it transforms according to

$F^{\prime \ \alpha \beta} = \frac{\partial x^{\prime \alpha}}{\partial x^{\mu}} \frac{\partial x^{\prime \beta}}{\partial x^{\nu}} F^{\mu \nu}$

Since

$\Lambda^{\alpha}_{\hphantom{\alpha} \mu} \equiv \frac{\partial x^{\prime \alpha}}{\partial x^{\mu}}$

we obtain the required Lorentz transformation of the electromagnetic field tensor as

$F^{\prime \ \alpha \beta} = \Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}$

But $\Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}$ involves a sum over the shared second index $\nu$, i.e., multiplying corresponding terms in a row of the matrix $\Lambda^{\beta}_{\hphantom{\beta} \nu}$ and a row of the matrix $F^{\mu \nu}$ and adding them up, which is the same as the matrix multiplication

$[F^{\mu \nu}] [\Lambda^{\beta}_{\hphantom{\beta} \nu}]^T$

where the $T$ denotes the matrix transpose. (Note that the transformation matrix for frames in standard configuration is symmetric, so it is unaffected by transposition. Note also that the $\beta$ appearing as a tensor index here should not be confused with Einstein’s factor $\beta$ appearing inside the transformation matrix.) Applying $\Lambda^{\alpha}_{\hphantom{\alpha} \mu}$ on the left then amounts to multiplying the above matrix product on the left by the matrix $[\Lambda^{\alpha}_{\hphantom{\alpha} \mu}]$. Thus, we are able to compute the required Lorentz transformation of the electromagnetic field tensor as a matrix product. We get

$F^{\prime \ \alpha \beta} = \Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}$

$= [\Lambda^{\alpha}_{\hphantom{\alpha} \mu}] [F^{\mu \nu}] [\Lambda^{\beta}_{\hphantom{\beta} \nu}]^T$

$= \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \ 0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \ 1 & \ \ 0 \\ \ \\ 0 & 0 & \ 0 & \ \ 1 \end{bmatrix} \begin{bmatrix} 0 & \frac{1}{c} E_x & \frac{1}{c} E_y & \frac{1}{c} E_z \\ \ \\ -\frac{1}{c} E_x & 0 & B_z & -B_y \\ \ \\-\frac{1}{c} E_y & -B_z & 0 & B_x \\ \ \\ -\frac{1}{c}E_z & B_y & -B_x & 0 \end{bmatrix} \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \ 0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \ 1 & \ \ 0 \\ \ \\ 0 & 0 & \ 0 & \ \ 1 \end{bmatrix}$

$= \begin{bmatrix} 0 & \frac{E_x}{c} & \beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg) & \beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg)\\ \ \\ -\frac{E_x}{c} & 0 & \beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg) & -\beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg) \\ \ \\ -\beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg) & -\beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg) & 0 & B_x \\ \ \\ -\beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg) & \beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg) & -B_x & 0 \end{bmatrix}$

$\equiv \begin{bmatrix} 0 & \frac{E_x^{\prime}}{c} & \frac{E_y^{\prime}}{c} & \frac{E_z^{\prime}}{c} \\ \ \\ -\frac{E_x^{\prime}}{c} & 0 & B_z^{\prime} & -B_y^{\prime} \\ \ \\-\frac{E_y^{\prime}}{c} & -B_z^{\prime} & 0 & B_x^{\prime} \\ \ \\ -\frac{E_z^{\prime}}{c} & B_y^{\prime} & -B_x^{\prime} & 0 \end{bmatrix}$

Comparing the entries in the last two matrices we get exactly the same relations as Einstein did on page 909 of his paper, except that we are using SI units here so the electric field components are divided by $c$:

$\frac{E_x^{\prime}}{c} = \frac{E_x}{c}$

$\frac{E_y^{\prime}}{c} = \beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg)$

$\frac{E_z^{\prime}}{c} = \beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg)$

$B_x^{\prime} = B_x$

$B_y^{\prime} = \beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg)$

$B_z^{\prime} = \beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg)$
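The whole calculation can be checked numerically. The sketch below is an illustration only: the function names, the field values and the choice $c = 2$ in arbitrary units are my own assumptions. It builds $F^{\mu \nu}$ from $\vec{E}$ and $\vec{B}$ using the layout above, carries out the double sum $\Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}$ explicitly, reads off the primed fields and compares them with the six component relations just obtained (multiplied through by $c$ where needed).

```python
import math

def boost(v, c):
    """Transformation matrix for frames in standard configuration."""
    ratio = v / c
    beta = 1.0 / math.sqrt(1.0 - ratio * ratio)
    return [[beta, -beta * ratio, 0.0, 0.0],
            [-beta * ratio, beta, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def field_tensor(E, B, c):
    """F^{mu nu} in SI units, laid out as in the matrix in the text."""
    Ex, Ey, Ez = (component / c for component in E)
    Bx, By, Bz = B
    return [[0.0, Ex, Ey, Ez],
            [-Ex, 0.0, Bz, -By],
            [-Ey, -Bz, 0.0, Bx],
            [-Ez, By, -Bx, 0.0]]

def transform(L, F):
    """F'^{ab} = L^a_m L^b_n F^{mn}, summed explicitly over m and n."""
    idx = range(4)
    return [[sum(L[a][m] * L[b][n] * F[m][n] for m in idx for n in idx)
             for b in idx] for a in idx]

def fields_from_tensor(F, c):
    """Read E and B back out of a tensor with the layout used above."""
    E = (c * F[0][1], c * F[0][2], c * F[0][3])
    B = (F[2][3], -F[1][3], F[1][2])
    return E, B

def component_formulas(E, B, v, c):
    """The six component relations, with the E equations multiplied by c."""
    beta = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    E_new = (Ex, beta * (Ey - v * Bz), beta * (Ez + v * By))
    B_new = (Bx, beta * (By + v * Ez / c**2), beta * (Bz - v * Ey / c**2))
    return E_new, B_new

C, V = 2.0, 1.2            # illustrative values in arbitrary units
E = (1.0, 2.0, -0.5)       # hypothetical field components in the K-frame
B = (0.3, -0.7, 1.1)

F_new = transform(boost(V, C), field_tensor(E, B, C))
E_tensor, B_tensor = fields_from_tensor(F_new, C)
E_formula, B_formula = component_formulas(E, B, V, C)
max_difference = max(abs(a - b) for a, b in
                     zip(E_tensor + B_tensor, E_formula + B_formula))
print("largest discrepancy:", max_difference)
```

The explicit double sum is used here instead of a matrix product so that the index pattern in the tensor transformation law is visible in the code; the transformed tensor should also come out antisymmetric, like the original.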

# A note on reverse engineering the Navier-Stokes equations

The Navier-Stokes equations are the fundamental equations of fluid mechanics, analogous to, say, Maxwell’s equations in the case of electromagnetism. They are important for applications in science and engineering but they are difficult to solve analytically, so real-life applications rely almost exclusively on computer-aided methods. In fact, the equations are still not fully understood mathematically: some basic questions about the existence and nature of possible solutions remain unanswered. Because of their importance, the Clay Mathematics Institute has offered a prize of one million dollars to anyone who can clarify some specific fundamental questions about them (see the CMI web page for details).

The equations are named after Claude-Louis Navier (1785-1836) and George Gabriel Stokes (1819-1903) who worked on their development independently. Some other leading contemporary mathematicians were also involved and I was intrigued to learn that all of these mathematicians used Euler’s equations of motion as a starting point in their derivations of the Navier-Stokes equations. Euler’s equations had been derived much earlier in 1757 by the great mathematician Leonhard Euler (1707-1783). In tensor notation, using the Einstein summation convention, the Euler equations take the form

$\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

I explored a derivation of these equations from Newton’s second law in a previous post. To better understand how the two sets of equations are related, it occurred to me that it might be an interesting exercise, and probably not too difficult, to try to ‘reverse engineer’ the Navier-Stokes equations using Euler’s equations as a starting point. I want to briefly record my exploration of this idea in the present note.

I am interested in the incompressible fluid version of the Navier-Stokes equations which can be written in tensor form, again using the Einstein summation convention, as

$\rho g^i - \frac{\partial p}{\partial x^i} + \mu \frac{\partial}{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big) = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

where $\mu > 0$ is a viscosity coefficient specific to the fluid in question. Putting the Euler equations and the Navier-Stokes equations side by side like this makes it clear that the key difference between them is the addition of the viscosity-related terms on the left-hand side of the Navier-Stokes equations. To clarify things a bit more it is helpful to see the equations laid out fully. The Euler equations constitute a three-equation system which looks like this:

$\rho g^1 - \frac{\partial p}{\partial x^1} = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)$

$\rho g^2 - \frac{\partial p}{\partial x^2} = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)$

$\rho g^3 - \frac{\partial p}{\partial x^3} = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)$

The Navier-Stokes equations also constitute a three-equation system which looks like this:

$\rho g^1 - \frac{\partial p}{\partial x^1} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)$

$\rho g^2 - \frac{\partial p}{\partial x^2} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)$

$\rho g^3 - \frac{\partial p}{\partial x^3} + \mu \bigg(\frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) + \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) + \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\bigg) = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)$

Looking at the pattern of the indices appearing in the viscosity-related terms on the left-hand side of the Navier-Stokes equations, one is immediately reminded of the indices identifying the positions of the elements of a matrix and we can in fact arrange the viscosity-related terms in matrix form as

$\begin{bmatrix} \frac{\partial}{\partial x^1}\big(\frac{\partial v^1}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^1}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^1}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^2}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^2}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^2}{\partial x^3}\big)\\ \ \\ \frac{\partial}{\partial x^1}\big(\frac{\partial v^3}{\partial x^1}\big) & \frac{\partial}{\partial x^2}\big(\frac{\partial v^3}{\partial x^2}\big) & \frac{\partial}{\partial x^3}\big(\frac{\partial v^3}{\partial x^3}\big)\end{bmatrix}$

From the point of view of reverse engineering the Navier-Stokes equations this is highly suggestive because in deriving the Euler equations we encountered one key matrix, the rank-2 stress tensor $\tau^{ij}$. Could the above matrix be related to the $3 \times 3$ stress tensor? The answer is yes. For an incompressible fluid, $\mu$ times the sum of the entries in each row of the above matrix equals the sum of the entries in the corresponding row of the matrix of first-order partials of the stress tensor

$\begin{bmatrix} \frac{\partial \tau^{11}}{\partial x^1} & \frac{\partial \tau^{12}}{\partial x^2} & \frac{\partial \tau^{13}}{\partial x^3}\\ \ \\ \frac{\partial \tau^{21}}{\partial x^1} & \frac{\partial \tau^{22}}{\partial x^2} & \frac{\partial \tau^{23}}{\partial x^3}\\ \ \\ \frac{\partial \tau^{31}}{\partial x^1} & \frac{\partial \tau^{32}}{\partial x^2} & \frac{\partial \tau^{33}}{\partial x^3} \end{bmatrix}$

Note that the correspondence holds for the row sums rather than entry by entry, as the calculation at the end of this note makes clear.

In deriving the Euler equations what we did was to assume that the fluid was friction-free so that there were no shear forces acting on any of the faces of a given fluid volume element. This means that the fluid was being modelled as being inviscid, i.e., having no viscosity. As a result, we kept only the diagonal elements of the stress tensor and re-interpreted these as pressures. Approximating the net pressure in each direction on a fluid volume element using a Taylor series expansion we were then led to include only the first-order partials of the diagonal terms of the stress tensor (interpreted as the first-order partials of pressure) in the Euler equations.

In the case of the Navier-Stokes equations we are no longer assuming that the fluid is inviscid so, alongside the pressure terms, we are including all the first-order partials of the viscous stress tensor in the equations. Thus, the Navier-Stokes equations could be written as

$\rho g^1 - \frac{\partial p}{\partial x^1} + \big(\frac{\partial \tau^{11}}{\partial x^1} + \frac{\partial \tau^{12}}{\partial x^2} + \frac{\partial \tau^{13}}{\partial x^3}\big) = \rho \big(\frac{\partial v^1}{\partial t} + v^1 \frac{\partial v^1}{\partial x^1} + v^2 \frac{\partial v^1}{\partial x^2} + v^3 \frac{\partial v^1}{\partial x^3}\big)$

$\rho g^2 - \frac{\partial p}{\partial x^2} + \big(\frac{\partial \tau^{21}}{\partial x^1} + \frac{\partial \tau^{22}}{\partial x^2} + \frac{\partial \tau^{23}}{\partial x^3}\big) = \rho \big(\frac{\partial v^2}{\partial t} + v^1 \frac{\partial v^2}{\partial x^1} + v^2 \frac{\partial v^2}{\partial x^2} + v^3 \frac{\partial v^2}{\partial x^3}\big)$

$\rho g^3 - \frac{\partial p}{\partial x^3} + \big(\frac{\partial \tau^{31}}{\partial x^1} + \frac{\partial \tau^{32}}{\partial x^2} + \frac{\partial \tau^{33}}{\partial x^3}\big) = \rho \big(\frac{\partial v^3}{\partial t} + v^1 \frac{\partial v^3}{\partial x^1} + v^2 \frac{\partial v^3}{\partial x^2} + v^3 \frac{\partial v^3}{\partial x^3}\big)$

Detailed consideration of the forces acting on an infinitesimal volume element in the case of a viscous fluid leads to expressions for the stress tensor components as functions of the first-order partials of the components of the velocity vector of the form

$\tau^{ij} = \tau^{ji} = \mu \big(\frac{\partial v^i}{\partial x^j} + \frac{\partial v^j}{\partial x^i}\big)$

The first-order partials of these are then of the form

$\frac{\partial \tau^{ij}}{\partial x^j} = \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big) + \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^j}{\partial x^i}\big)$

But note that with incompressible flow the mass density of the fluid is a constant so the divergence of the velocity vanishes (this is implied directly by the continuity equation with constant mass density). Therefore the second term on the right-hand side is zero since

$\frac{\partial }{\partial x^j}\big(\frac{\partial v^j}{\partial x^i}\big) = \frac{\partial }{\partial x^i}\big(\frac{\partial v^j}{\partial x^j}\big)$

$= \frac{\partial }{\partial x^i}\big(\frac{\partial v^1}{\partial x^1}\big) + \frac{\partial }{\partial x^i}\big(\frac{\partial v^2}{\partial x^2}\big) + \frac{\partial }{\partial x^i}\big(\frac{\partial v^3}{\partial x^3}\big) = \frac{\partial }{\partial x^i} \text{div}(\vec{v}) = 0$

Thus we are left with

$\frac{\partial \tau^{ij}}{\partial x^j} = \mu \frac{\partial }{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big)$

which are the extra terms appearing on the left-hand side of the Navier-Stokes equations.
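This last identity can be checked numerically for a particular incompressible flow. The sketch below is only an illustration under my own assumptions: the velocity field $(\sin x \cos y, -\cos x \sin y, 0)$ is a divergence-free field chosen for convenience, the viscosity value and test point are arbitrary, and all derivatives are approximated by central finite differences. It compares $\frac{\partial \tau^{ij}}{\partial x^j}$ (summed over $j$) with $\mu \frac{\partial}{\partial x^j}\big(\frac{\partial v^i}{\partial x^j}\big)$ and with the exact value worked out by hand.

```python
import math

MU = 0.7    # hypothetical viscosity coefficient
H = 1e-3    # finite-difference step

def velocity(p):
    """A divergence-free velocity field chosen for illustration."""
    x, y, z = p
    return (math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y), 0.0)

def partial(f, p, j, h=H):
    """Central-difference estimate of df/dx^j at the point p."""
    up = list(p)
    up[j] += h
    down = list(p)
    down[j] -= h
    return (f(up) - f(down)) / (2.0 * h)

def stress(p, i, j):
    """tau^{ij} = mu (dv^i/dx^j + dv^j/dx^i)."""
    dvi_dxj = partial(lambda q: velocity(q)[i], p, j)
    dvj_dxi = partial(lambda q: velocity(q)[j], p, i)
    return MU * (dvi_dxj + dvj_dxi)

def stress_divergence(p, i):
    """d tau^{ij} / dx^j, summed over j."""
    return sum(partial(lambda q, j=j: stress(q, i, j), p, j) for j in range(3))

def viscous_term(p, i):
    """mu d/dx^j (dv^i/dx^j), summed over j."""
    return MU * sum(
        partial(lambda q, j=j: partial(lambda r: velocity(r)[i], q, j), p, j)
        for j in range(3)
    )

point = (0.4, 1.1, -0.3)    # arbitrary test point
x, y, _ = point
exact = (-2.0 * MU * math.sin(x) * math.cos(y),
         2.0 * MU * math.cos(x) * math.sin(y),
         0.0)

for i in range(3):
    print(i + 1, stress_divergence(point, i), viscous_term(point, i), exact[i])
```

For each $i$ the three printed numbers should agree up to the finite-difference error.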

# A fluid-mechanical visualisation of the quantum-mechanical continuity equation

The concept of a probability current is useful in quantum mechanics for analysing quantum scattering and tunnelling phenomena, among other things. However, I have noticed that the same rather abstract and non-visual approach to introducing probability currents is repeated almost verbatim in every textbook (also see, e.g., this Wikipedia article). The standard approach essentially involves defining a probability current from the outset as

$\vec{j} = \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

and then using Schrödinger’s equation to show that this satisfies a fluid-like continuity equation of the form

$\frac{\partial \rho}{\partial t} + \nabla \cdot \vec{j} = 0$

with

$\rho \equiv \Psi^{*} \Psi$

In the present note I want to briefly explore a more intuitive and visual approach involving a model of the actual flow of a ‘probability fluid’. I want to begin with a fluid-mechanical model and then obtain the standard expression for the quantum-mechanical continuity equation from this, rather than starting out with an abstract definition of the probability current and then showing that this satisfies a continuity equation. The essential problem one faces when trying to do this is that although in classical mechanics the position $\vec{r}(t)$ of a point particle and its velocity $\vec{v}(t) = d\vec{r}(t)/dt$ are well defined, this is not the case in conventional quantum mechanics. Quantum mechanics is done probabilistically, treating a particle as a wave packet such that the square of the amplitude of the corresponding wave function acts as a probability density which can be used to measure the probability that the particle will occupy a particular region of space at a particular time. It is not possible to say definitively where a particular particle will be at a particular time in quantum mechanics, which makes it difficult to apply the conventional deterministic equations of fluid mechanics.

A huge specialist literature on quantum hydrodynamics has in fact arisen which tries to circumvent this problem in a number of ways. A standard reference is Wyatt, R. E., 2005, Quantum Dynamics with Trajectories: Introduction to Quantum Hydrodynamics (Springer). The route that a large part of this literature has taken is intriguing because it is based on Bohmian mechanics, an approach to quantum mechanics developed by David Bohm in 1952 which is regarded by most mainstream physicists today as unconventional. The key feature of the Bohmian mechanics approach is that classical-like particle trajectories are possible. Using this approach one can obtain Newtonian-like equations of motion analogous to those in conventional fluid mechanics and this is how this particular literature seems to have chosen to treat quantum particle trajectories in a fluid-like way. Attempts have also been made to introduce mathematically equivalent approaches, but defined within conventional quantum mechanics (see, e.g., Brandt, S. et al, 1998, Quantile motion and tunneling, Physics Letters A, Volume 249, Issue 4, pp. 265-270).

In the present note I am not looking to solve any elaborate problems so I will simply consider a free quantum wave packet which is not acted upon by any forces and try to visualise probability currents and the quantum continuity equation in a fluid-like way by using the momentum vector operator $\hat{\vec{p}}$ to characterise the velocity of the particle. I will then show that the probability current obtained in this fluid-mechanical model is the same as the one defined abstractly in textbooks.

In quantum mechanics, calculations are done using operators to represent observables. Every possible observable that one might be interested in for the purposes of experiment has a corresponding operator which the mathematics of quantum mechanics can work on to produce predictions. The key operator for the purposes of the present note is the momentum vector operator

$\hat{\vec{p}} = -i \hbar \nabla$

which is the quantum mechanical analogue of the classical momentum vector

$\vec{p} = m \vec{v}$

The key idea for the present note is to regard the velocity vector of the quantum particle as being represented by the operator

$\frac{\hat{\vec{p}}}{m}$

by analogy with the classical velocity vector which can be obtained as

$\vec{v} = \frac{\vec{p}}{m}$

We will imagine the total probability mass

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \Psi \text{d}V = 1$

as a fluid in steady flow throughout the whole of space and obeying mass conservation. The fact that the flow is steady reflects the fact that there are no forces acting on the quantum particle in this model, so we must have

$\frac{\partial }{\partial t}\big[\frac{\hat{\vec{p}}}{m}\big] = 0$

The velocity can vary from point to point in the probability fluid but at any given point it cannot be varying over time.

In a classical fluid we have a mass density per unit volume $\rho$ and we regard the velocity vector $\vec{v}$ as a volumetric flow rate per unit area, i.e., the volume of fluid that would pass through a unit area per unit time. Then $\rho \vec{v}$ is the mass flow rate per unit area, i.e., the mass of fluid that would pass through a unit area per unit time. In quantum mechanics we can regard the probability mass density per unit volume $\rho \equiv \Psi^{*} \Psi$ as analogous to the mass density of a classical fluid. We can interpret $\frac{\hat{\vec{p}}}{m}$ as the volumetric flow rate per unit area, i.e., the volume of probability fluid that would pass through a unit area per unit time. When doing probability calculations with quantum mechanical operators we usually ‘sandwich’ the operator between $\Psi^{*}$ and $\Psi$, so following that approach here we can define the probability current density as

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$

This is to be interpreted as the probability mass flow rate per unit area, i.e., the amount of probability mass that would pass through a unit area per unit time, analogous to $\vec{j} = \rho \vec{v}$ in the classical case. To see how close the analogy is, suppose the quantum wave function is that of a plane wave

$\Psi(x, y, z, t) = A\mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

Then

$\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi = A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} \nabla A \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A \mathrm{e}^{-i(\vec{k} \cdot \vec{r} - \omega t)} \frac{(-i \hbar)}{m} A \ i \ \vec{k} \ \mathrm{e}^{i(\vec{k} \cdot \vec{r} - \omega t)}$

$= A^2 \frac{\hbar \vec{k}}{m}$

$= \rho \vec{v}$

which looks just like the mass flow rate in the classical case with $\rho = \Psi^{*} \Psi = A^2$ (taking the amplitude $A$ to be real) and $\vec{v} \equiv \frac{\hbar \vec{k}}{m}$. Note that in this example the probability current density formula we are using, namely $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$, turned out to be real-valued. Unfortunately this will not always be the case. Since the probability current vector must always be real-valued, the fluid-mechanical model in the present note is only applicable in cases where $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi$ is real.
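The plane-wave result can be spot-checked numerically. The following is a minimal sketch in one dimension, assuming natural units ($\hbar = m = 1$), a real amplitude, and illustrative values for $A$, $k$ and $\omega$ that are not taken from the text; the helper names are hypothetical.

```python
import cmath

HBAR = 1.0  # natural units (assumption for illustration)
M = 1.0     # particle mass (assumption)
A, K, OMEGA = 0.5, 2.0, 2.0  # real amplitude, wavenumber, angular frequency (assumptions)

def plane_wave(x, t):
    """One-dimensional plane wave A * exp(i(kx - wt))."""
    return A * cmath.exp(1j * (K * x - OMEGA * t))

def sandwiched_current(psi, x, t, h=1e-5):
    """Psi^* (p/m) Psi with p = -i*hbar*d/dx, using a central difference."""
    dpsi_dx = (psi(x + h, t) - psi(x - h, t)) / (2 * h)
    return psi(x, t).conjugate() * (-1j * HBAR / M) * dpsi_dx

x0, t0 = 0.3, 0.7
current = sandwiched_current(plane_wave, x0, t0)
rho = abs(plane_wave(x0, t0)) ** 2   # Psi^* Psi = A^2
v = HBAR * K / M                     # hbar k / m
print(current, rho * v)
```

Up to finite-difference error, the two printed values should agree and the imaginary part of `current` should be negligible, in line with $\Psi^{*} \frac{\hat{\vec{p}}}{m} \Psi = \rho \vec{v}$ for a plane wave.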

As in classical fluid mechanics, a continuity equation can now be derived by considering the net outflow of probability mass from an infinitesimal fluid element of volume $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$.

Considering only the $y$-component for the moment, we see from the diagram that on the left-hand side we have the probability mass flow rate coming into the volume element through the left-hand face. The mass flow rate coming out of the fluid element through the right-hand face can be approximated using a Taylor series expansion as being equal to the mass flow rate through the left-hand face plus a differential adjustment based on the gradient of the probability current density and the length $\mathrm{d} y$. The net probability mass flow rate in the $y$-direction is then obtained by subtracting the left-hand term from the right-hand term to get

$\frac{\partial }{\partial y} \big(\Psi^{*}\frac{\hat{p}_y}{m} \Psi\big) \mathrm{d}V$

Using similar arguments for the $x$ and $z$-directions, the net mass flow rate out of the fluid element in all three directions is then

$\nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) \mathrm{d} V$

Now, the probability mass inside the fluid element is $\rho \mathrm{d} V$ where $\rho = \Psi^{*} \Psi$ and if there is a net outflow of probability fluid this mass will be decreasing at the rate

$- \frac{\partial \rho}{\partial t} \mathrm{d} V$

Equating the two expressions and dividing through by the volume of the fluid element we get the equation of continuity

$\frac{\partial \rho}{\partial t} + \nabla \cdot \big(\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi \big) = 0$

What I want to do now is show that if we work out $\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$ we will get the same formula for the probability current as the one usually given in quantum mechanics textbooks. We have

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{m} \Psi^{*} \nabla \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi$

We now note that since the probability current density must be a real vector, the last two terms above must be real. Therefore they are not affected in any way if we take their complex conjugate. Taking the complex conjugate of the second term in the last equality we get

$\Psi^{*}\frac{\hat{\vec{p}}}{m} \Psi$

$= - \frac{i \hbar}{2m} \Psi^{*} \nabla \Psi + \frac{i \hbar}{2m} \Psi \nabla \Psi^{*}$

$= \frac{i \hbar}{2m}(\Psi \nabla \Psi^{*} - \Psi^{*} \nabla \Psi)$

$= \vec{j}$

This is exactly the expression for the probability current density that appears in textbooks, but rather than introducing it ‘out of nowhere’ at the beginning, we have obtained it naturally as a result of a fluid-mechanical model.
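To illustrate the caveat about real-valuedness, the sketch below evaluates both expressions for a superposition of two plane waves in one dimension. The wavenumbers, the evaluation point and the units ($\hbar = m = 1$) are assumptions chosen only for illustration. For this wavefunction the sandwiched expression is complex, and the textbook expression $\vec{j}$ coincides with its real part.

```python
import cmath

HBAR, M = 1.0, 1.0   # natural units (assumption)
K1, K2 = 1.0, 3.0    # two wavenumbers (assumption)

def psi(x):
    """Superposition of two plane waves at a fixed time."""
    return cmath.exp(1j * K1 * x) + 0.5 * cmath.exp(1j * K2 * x)

def dpsi(x):
    """Exact spatial derivative of psi."""
    return 1j * K1 * cmath.exp(1j * K1 * x) + 0.5j * K2 * cmath.exp(1j * K2 * x)

x0 = 0.4
# Psi^* (p/m) Psi with p = -i*hbar*d/dx
sandwich = psi(x0).conjugate() * (-1j * HBAR / M) * dpsi(x0)
# Textbook current (i*hbar/2m)(Psi grad Psi^* - Psi^* grad Psi)
textbook = (1j * HBAR / (2 * M)) * (psi(x0) * dpsi(x0).conjugate()
                                    - psi(x0).conjugate() * dpsi(x0))
print(sandwich, textbook)
```

So the conjugation step in the derivation above is justified exactly when the sandwiched expression is already real, as it is for a single plane wave.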

# Calculation of a quantum-mechanical commutator in three dimensions

I needed to work out the commutator $[\hat{H}, \hat{\vec{r}} \ ]$, where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator and $\hat{\vec{r}}$ is the 3D position vector operator. It is difficult to find any textbook or online source that explicitly goes through the calculation of this three-dimensional case (in fact, I have not been able to find any) so I am recording my calculation step-by-step in this note.

The commutator $[\hat{H}, \hat{\vec{r}} \ ]$ is a vector operator with components

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H}$

$[\hat{H}, \hat{y} \ ] = \hat{H} \ \hat{y} - \hat{y} \ \hat{H}$

and

$[\hat{H}, \hat{z} \ ] = \hat{H} \ \hat{z} - \hat{z} \ \hat{H}$

To evaluate these, note that the momentum operator (in position space) is

$\hat{\vec{p}} = -i \ \hbar \nabla$

and so we have

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

$= \frac{1}{2m} \hat{\vec{p}} \cdot \hat{\vec{p}} + \hat{U}$

$= \frac{1}{2m}(\hat{p}_x^{2} + \hat{p}_y^{2} + \hat{p}_z^{2}) + \hat{U}$

Looking at the $x$-component of $[\hat{H}, \hat{\vec{r}} \ ]$ we therefore have

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{\hat{p}_x^{2} \hat{x} + \hat{p}_y^{2} \hat{x} + \hat{p}_z^{2} \hat{x}}{2m} + \hat{U} \ \hat{x} - \big(\frac{\hat{x} \hat{p}_x^{2} + \hat{x} \hat{p}_y^{2} + \hat{x} \hat{p}_z^{2}}{2m} + \hat{x} \ \hat{U}\big)$

$= \frac{1}{2m}([\hat{p}_x^2, \hat{x} \ ] + [\hat{p}_y^2, \hat{x} \ ] + [\hat{p}_z^2, \hat{x} \ ]) + [\hat{U}, \hat{x} \ ]$

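Since $\hat{U}$ and $\hat{x}$ both act on a wavefunction by ordinary multiplication (by $U$ and by $x$ respectively), and multiplication is commutative, we have $[\hat{U}, \hat{x} \ ] = 0$. I will now show that we also have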

$[\hat{p}_y^2, \hat{x} \ ] = [\hat{p}_z^2, \hat{x} \ ] = 0$

To see this, let us first work out in detail the effect of $[\hat{p}_y, \hat{x} \ ]$ on a wavefunction $\Psi$. We have

$[\hat{p}_y, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial y} + \hat{x }i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial y} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial y}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial y} = 0$

where the last equality follows from the fact that $\hat{x}$ does not depend on $y$. Thus, $[\hat{p}_y, \hat{x} \ ] = 0$.

We can now easily show that $[\hat{p}_y^2, \hat{x} \ ] = 0$ because using the basic result for commutators that

$[AB, C] = A[B, C] + [A, C]B$

(easy to prove by writing out the terms in full) we find that

$[\hat{p}_y^2, \hat{x} \ ] = \hat{p}_y \ [\hat{p}_y, \hat{x} \ ] + [\hat{p}_y, \hat{x} \ ] \ \hat{p}_y = 0$

Identical arguments show that $[\hat{p}_z^2, \hat{x} \ ] = 0$. Thus, we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ]$

It now only remains to work out $[\hat{p}_x^2, \hat{x} \ ]$ and we can do this by first working out in detail the effect of $[\hat{p}_x, \hat{x} \ ]$ on a wavefunction $\Psi$ (this is of course the ‘canonical commutation relation’ of quantum mechanics). We have

$[\hat{p}_x, \hat{x} \ ] \Psi = - i \ \hbar \frac{\partial (\hat{x} \Psi)}{\partial x} + \hat{x } i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \hat{x} \frac{\partial \Psi}{\partial x} - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} + \hat{x} i \ \hbar \frac{\partial \Psi}{\partial x}$

$= - i \ \hbar \Psi \frac{\partial \hat{x}}{\partial x} = - i \ \hbar \Psi$

where the last equality follows from the fact that $\hat{x}$ is the same as multiplying by $x$ so its derivative with respect to $x$ equals $1$. Thus, $[\hat{p}_x, \hat{x} \ ] = - i \ \hbar$. Then we find that

$[\hat{p}_x^2, \hat{x} \ ] = \hat{p}_x \ [\hat{p}_x, \hat{x} \ ] + [\hat{p}_x, \hat{x} \ ] \ \hat{p}_x = -2 i \ \hbar \hat{p}_x$

and we can conclude that

$[\hat{H}, \hat{x} \ ] = \hat{H} \ \hat{x} - \hat{x} \ \hat{H} = \frac{1}{2m} [\hat{p}_x^2, \hat{x} \ ] = - \frac{i \ \hbar}{m} \hat{p}_x$

Identical arguments show that

$\hat{H} \ \hat{y} - \hat{y} \ \hat{H} = \frac{1}{2m} [\hat{p}_y^2, \hat{y} \ ] = - \frac{i \ \hbar}{m} \hat{p}_y$

and

$\hat{H} \ \hat{z} - \hat{z} \ \hat{H} = \frac{1}{2m} [\hat{p}_z^2, \hat{z} \ ] = - \frac{i \ \hbar}{m} \hat{p}_z$

Thus, we finally reach our desired expression for the Hamiltonian-position commutator in three dimensions:

$[\hat{H}, \hat{\vec{r}} \ ] = - \frac{i \ \hbar}{m} \hat{\vec{p}}$
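The $x$-component of this result can be checked numerically by applying both sides to a test wavefunction. The sketch below is one-dimensional and assumes $\hbar = m = 1$, a Gaussian-modulated test function and a harmonic potential; these choices, and the finite-difference operators, are illustrative assumptions rather than part of the derivation.

```python
import cmath

HBAR, M = 1.0, 1.0  # natural units (assumption)

def psi(x):
    """Hypothetical test wavefunction (not normalised)."""
    return cmath.exp(-x * x + 1.5j * x)

def U(x):
    """Hypothetical potential energy function."""
    return 0.5 * x * x

def d1(f, x, h=1e-3):
    """Central first difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-3):
    """Central second difference."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def H(f):
    """Hamiltonian -hbar^2/(2m) d^2/dx^2 + U acting on f."""
    return lambda x: -HBAR**2 / (2 * M) * d2(f, x) + U(x) * f(x)

def X(f):
    """Position operator: multiplication by x."""
    return lambda x: x * f(x)

def P(f):
    """Momentum operator -i*hbar*d/dx."""
    return lambda x: -1j * HBAR * d1(f, x)

x0 = 0.4
lhs = H(X(psi))(x0) - X(H(psi))(x0)   # [H, x] psi
rhs = -1j * HBAR / M * P(psi)(x0)     # -(i*hbar/m) p_x psi
print(lhs, rhs)
```

The potential drops out of the commutator, as expected from $[\hat{U}, \hat{x} \ ] = 0$, so the two printed values should agree to within rounding error.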

As an application of this result we will consider the problem of working out the expectation of position and velocity for a quantum particle. In three-dimensional space the quantum wave function is

$\Psi = \Psi(x, y, z, t)$

and we obtain the probability density function as

$\rho(x, y, z, t) = \Psi^{*} \Psi$

The wavefunction $\Psi$ satisfies the time-dependent Schrödinger equation

$i \hbar \frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\big(\frac{\partial^2 \Psi}{\partial x^2} + \frac{\partial^2 \Psi}{\partial y^2} + \frac{\partial^2 \Psi}{\partial z^2}\big) + U\Psi$

where $U = U(x, y, z, t)$ is some potential energy function. We can write the Schrödinger equation in operator form using Dirac notation as

$\hat{H} |\Psi \rangle = i \hbar \frac{\partial}{\partial t} |\Psi \rangle$

where

$\hat{H} = -\frac{\hbar^2}{2m} \nabla^2 + \hat{U}$

is the Hamiltonian operator (the Hamiltonian form of total energy) and

$i \hbar \frac{\partial }{\partial t}$

is the total energy operator. Note that the complex conjugate of the wavefunction $\Psi^{*}$ satisfies the Schrödinger equation written in Dirac notation as

$\langle \Psi | \hat{H} = - \langle \Psi | i \hbar \frac{\partial}{\partial t}$

In quantum mechanics we find the expected position $\langle \vec{r}(t) \rangle$ of the particle by integrating the position operator $\hat{\vec{r}}$ over all space, sandwiched between $\Psi^{*}$ and $\Psi$. Thus, letting $\mathrm{d}V \equiv \mathrm{d} x \mathrm{d} y \mathrm{d} z$ we have

$\langle \vec{r}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \hat{\vec{r}} \ \Psi \text{d}V \equiv \langle \Psi \ | \hat{\vec{r}} \ | \Psi \rangle$

where in the last term I have switched to using Dirac notation which will be useful shortly. The expected velocity can then be obtained by differentiating this integral with respect to $t$. We get

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V + \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Psi^{*} \ \frac{\partial \hat{\vec{r}}}{\partial t} \ \Psi \text{d}V$

The second triple integral on the right-hand side is zero because the position operator does not depend on time. The integrand in the first triple integral can be manipulated by using the operator form of the Schrödinger equation and Dirac notation to write

$\frac{\partial \Psi}{\partial t} = \frac{\partial}{\partial t} | \Psi \rangle = \frac{1}{i \ \hbar} \ \hat{H} \ | \Psi \rangle$

and

$\frac{\partial \Psi^{*}}{\partial t} = \langle \Psi | \frac{\partial}{\partial t} = -\frac{1}{i \ \hbar} \ \langle \Psi | \hat{H}$

Thus, we have

$\langle \vec{v}(t) \rangle = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \big[ \frac{\partial \Psi^{*}}{\partial t} \ \hat{\vec{r}} \ \Psi + \Psi^{*} \ \hat{\vec{r}} \ \frac{\partial \Psi}{\partial t} \big] \text{d}V$

$= - \frac{1}{i \ \hbar} \langle \Psi |\hat{H} \ \hat{\vec{r}} \ | \Psi \rangle + \frac{1}{i \ \hbar} \langle \Psi | \hat{\vec{r}} \ \hat{H} | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \hat{H} \ \hat{\vec{r}} \ - \hat{\vec{r}} \ \hat{H} \ | \Psi \rangle$

$= \frac{i}{\hbar} \langle \Psi | \ [\hat{H}, \hat{\vec{r}} \ ] \ | \Psi \rangle$

$= \frac{1}{m} \langle \Psi | \ \hat{\vec{p}} \ | \Psi \rangle$

$= \big \langle \frac{\hat{\vec{p}}}{m} \big \rangle$

where the last two equalities follow from the fact that $[\hat{H}, \hat{\vec{r}} \ ]$ is the commutator of $\hat{H}$ and $\hat{\vec{r}}$, which is equal to $-\frac{i \hbar}{m} \hat{\vec{p}}$ as we saw above. Therefore the expected velocity of a quantum particle looks a lot like the velocity of a classical particle (momentum divided by mass). The idea that quantum mechanical expectations exhibit Newtonian-like behaviour is the essence of the Ehrenfest Theorem of quantum mechanics.
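As a small illustration of the right-hand side, the sketch below evaluates $\big\langle \frac{\hat{p}}{m} \big\rangle$ for a one-dimensional Gaussian wave packet by numerical integration. The packet shape, its parameters and the units ($\hbar = m = 1$) are assumptions made for the example; for such a packet the result should come out as $\hbar k_0 / m$.

```python
import cmath
import math

HBAR, M = 1.0, 1.0        # natural units (assumption)
A_WIDTH, K0 = 0.5, 1.5    # packet width parameter and mean wavenumber (assumptions)

def psi(x):
    """Normalised Gaussian wave packet with mean wavenumber K0."""
    norm = (2 * A_WIDTH / math.pi) ** 0.25
    return norm * cmath.exp(-A_WIDTH * x * x + 1j * K0 * x)

def dpsi(x):
    """Exact spatial derivative of psi."""
    return (-2 * A_WIDTH * x + 1j * K0) * psi(x)

def expectation_p_over_m(lo=-10.0, hi=10.0, n=4000):
    """Trapezoidal estimate of the integral of Psi^* (p/m) Psi over [lo, hi]."""
    h = (hi - lo) / n
    total = 0j
    for i in range(n + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * psi(x).conjugate() * (-1j * HBAR / M) * dpsi(x)
    return total * h

mean_velocity = expectation_p_over_m()
print(mean_velocity, HBAR * K0 / M)
```

This only evaluates $\langle \hat{p} \rangle / m$ at a single instant; confirming that it equals the time derivative of $\langle \vec{r}(t) \rangle$ would require evolving the packet in time, which is not attempted here.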

# Derivation of Euler’s equations of motion for a perfect fluid from Newton’s second law

Having read a number of highly technical derivations of Euler’s equations of motion for a perfect fluid I feel that the mathematical meanderings tend to obscure the underlying physics. In this note I want to explore the derivation from a more physically intuitive point of view. The dynamics of a fluid element of mass $m$ are governed by Newton’s second law which says that the vector sum of the forces acting on the fluid element is equal to the mass of the element times its acceleration. Thus,

$\vec{F} = m \vec{a}$

The net force $\vec{F}$ can be decomposed into two distinct types of forces, so-called body forces $\vec{W}$ that act on the entire fluid element (e.g., the fluid element’s weight due to gravity) and stresses $\vec{S}$ such as pressures and shears that act upon the surfaces enclosing the fluid element. For the purposes of deriving the differential form of Euler’s equation we will focus on the net force per unit volume acting on the fluid element, $\vec{f}$, which we will decompose into a weight per unit volume $\vec{w}$ and a net stress force per unit volume $\vec{s}$. The weight per unit volume is simply obtained as

$\vec{w} = \rho \vec{g}$

where

$\rho = \frac{m}{V}$

is the mass density of the fluid (i.e., mass per unit volume) and $\vec{g}$ is the acceleration due to gravity. In index notation, the equation for the $i$-th component of the weight per unit volume is

$w^i = \rho g^i$

The net stress force per unit volume, $\vec{s}$, is a little more complicated to derive since it involves the rank-2 stress tensor $\tau^{ij}$. This tensor contains nine components and is usually represented as a $3 \times 3$ symmetric matrix. In Cartesian coordinates the components along the main diagonal, namely $\tau^{xx}$, $\tau^{yy}$ and $\tau^{zz}$, represent normal stresses, i.e., forces per unit area acting orthogonally to the planes whose normal vector is identified by the first superscript, as indicated in the diagram below. (Note that a stress is a force per unit area, so to convert a stress tensor component $\tau^{ij}$ into a force it would be necessary to multiply it by the area over which it acts).

In the diagram each normal stress is shown as a tension, i.e., a normal stress pointing away from the surface. When a normal stress points towards the surface it acts upon, it is called a pressure.

The off-diagonal components of the stress tensor represent shear stresses, i.e., forces per unit area that point along the sides of the fluid element, parallel to these sides rather than normal to them. These shear stresses are shown in the following diagram.

Shear stresses only arise when there is some kind of friction in the fluid. A perfect fluid is friction-free so there are no shear stresses. Euler’s equation only applies to perfect fluids so for the derivation of the equation we can ignore the off-diagonal components of the stress tensor.

The normal stresses along the main diagonal are usually written as

$\tau^{xx} = - p^x$

$\tau^{yy} = - p^y$

$\tau^{zz} = - p^z$

where $p$ stands for pressure and the negative sign reflects the fact that a pressure points in the opposite direction to a tension.

In a perfect fluid the pressure is isotropic, i.e., the same in all directions, so we have

$p^x = p^y = p^z = p$

Therefore the stress tensor of a perfect fluid with isotropic pressure reduces to

$\tau^{ij} = -p \delta^{ij}$

where $\delta^{ij}$ is the Kronecker delta (and may be thought of here as the metric tensor of Cartesian 3-space).

Now suppose we consider the net stress (force per unit area) in the $y$-direction of an infinitesimal volume element.

The stress on the right-hand face can be approximated using a Taylor series expansion as being equal to the stress on the left plus a differential adjustment based on its gradient and the length $dy$. If we take the stress on the right to be pointing in the positive direction and the one on the left as pointing in the negative (opposite) direction, the net stress in the $y$-direction is given by

$\tau^{yy} + \frac{\partial \tau^{yy}}{\partial y} dy - \tau^{yy} = \frac{\partial \tau^{yy}}{\partial y} dy$

Similarly, the net stresses in the $x$ and $z$-directions are

$\frac{\partial \tau^{xx}}{\partial x} dx$

and

$\frac{\partial \tau^{zz}}{\partial z} dz$

To convert these net stresses to net forces we multiply each one by the area on which it acts. Thus, the net forces on the fluid element (in vector form) are

$\big(\frac{\partial \tau^{xx}}{\partial x} dxdydz\big) \vec{i}$

$\big(\frac{\partial \tau^{yy}}{\partial y} dxdydz\big) \vec{j}$

$\big(\frac{\partial \tau^{zz}}{\partial z} dxdydz\big) \vec{k}$

The total net force on the fluid element is then

$\big(\frac{\partial \tau^{xx}}{\partial x} \ \vec{i} + \frac{\partial \tau^{yy}}{\partial y} \ \vec{j} + \frac{\partial \tau^{zz}}{\partial z} \ \vec{k}\big) dxdydz$

Switching from tensions to pressures using $\tau^{ij} = -p \delta^{ij}$ and dividing through by the volume $dxdydz$ we finally get the net stress force per unit volume to be

$\vec{s} = -\big(\frac{\partial p}{\partial x} \ \vec{i} + \frac{\partial p}{\partial y} \ \vec{j} + \frac{\partial p}{\partial z} \ \vec{k}\big)$

In index notation, the equation for the $i$-th component of this net stress force per unit volume is written as

$s^i = -\frac{\partial p}{\partial x^i}$
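The relation between the face pressures and the pressure gradient can be illustrated numerically. The sketch below uses a hypothetical pressure field and a small cube of side `d`, sums the pressure forces on the two faces normal to the $y$-axis, divides by the cube's volume, and compares the result with $-\partial p / \partial y$.

```python
def pressure(x, y, z):
    """Hypothetical smooth pressure field used only for illustration."""
    return 2.0 + 3.0 * y + y * y + 0.5 * x * z

def net_y_force_per_volume(x, y, z, d=1e-4):
    """Net pressure force in the y-direction on a cube of side d, per unit volume.

    Pressures are evaluated at the centres of the two faces normal to y."""
    xc, zc = x + d / 2, z + d / 2
    force_left = pressure(xc, y, zc) * d * d        # pushes the element in +y
    force_right = -pressure(xc, y + d, zc) * d * d  # pushes the element in -y
    return (force_left + force_right) / d**3

def dp_dy(x, y, z, h=1e-6):
    """Central-difference estimate of the pressure gradient in y."""
    return (pressure(x, y + h, z) - pressure(x, y - h, z)) / (2 * h)

s_y = net_y_force_per_volume(1.0, 0.5, 2.0)
print(s_y, -dp_dy(1.0, 0.5, 2.0))
```

The two values differ only by a term of order `d`, which vanishes as the cube shrinks to a point.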

We have now completed the analysis of the net force on the left-hand side of Newton’s second law.

On the right-hand side of Newton’s second law we have mass times acceleration, where acceleration is the change in velocity with time. To obtain an expression for this we observe that the velocity of a fluid element may change for two different reasons. First, the velocity field may vary over time at each point in space. Second, the velocity may vary from point to point in space (at any given time). Thus, we consider the velocity field to be a function of the time coordinate as well as the three spatial coordinates, so

$\vec{v} = \vec{v}(t, x, y, z) = v^x(t, x, y, z) \ \vec{i} + v^y(t, x, y, z) \ \vec{j} + v^z(t, x, y, z) \ \vec{k}$

Considering the $i$-th component of this velocity field, the total differential is

$dv^i = \frac{\partial v^i}{\partial t} \ dt + \frac{\partial v^i}{\partial x} \ dx + \frac{\partial v^i}{\partial y} \ dy + \frac{\partial v^i}{\partial z} \ dz$

so the total derivative with respect to time is

$\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^x \frac{\partial v^i}{\partial x} + v^y \frac{\partial v^i}{\partial y} + v^z \frac{\partial v^i}{\partial z}$

where I have used

$v^x = \frac{dx}{dt}$

$v^y = \frac{dy}{dt}$

$v^z = \frac{dz}{dt}$

We can write this more compactly using the Einstein summation convention as

$\frac{dv^i}{dt} = \frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}$

This is then the $i$-th component of the acceleration vector on the right-hand side of Newton’s second law. In component form, therefore, we can write mass times acceleration per unit volume for the fluid element as

$\rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$

This completes the analysis of the mass times acceleration term on the right-hand side of Newton’s second law.
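A one-dimensional example may help to show why both terms of the total derivative are needed. In the sketch below the velocity field $v(t, x) = x/(1+t)$ is a hypothetical choice: a fluid element starting at $x_0$ follows $x = x_0 (1 + t)$ and so moves with constant velocity $x_0$, which means its acceleration is zero even though the field changes in time at every fixed point.

```python
def v(t, x):
    """Hypothetical 1D velocity field v = x / (1 + t)."""
    return x / (1.0 + t)

def material_derivative(t, x, h=1e-5):
    """Return (dv/dt following the fluid element, local partial dv/dt)."""
    dv_dt = (v(t + h, x) - v(t - h, x)) / (2 * h)   # change at a fixed point
    dv_dx = (v(t, x + h) - v(t, x - h)) / (2 * h)   # change from point to point
    return dv_dt + v(t, x) * dv_dx, dv_dt

total, local = material_derivative(1.0, 2.0)
print(total, local)
```

Here the local term is $-0.5$ and the convective term is $+0.5$, so the total derivative vanishes to within rounding error, matching the constant velocity of each fluid element.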

In per-unit-volume form, Newton’s second law for a fluid element is

$\vec{w} + \vec{s} = \rho \vec{a}$

and writing this in the component forms derived above we get the standard form of Euler’s equations of motion for a perfect fluid:

$\rho g^i - \frac{\partial p}{\partial x^i} = \rho \big(\frac{\partial v^i}{\partial t} + v^j \frac{\partial v^i}{\partial x^j}\big)$
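A simple sanity check of this equation is the hydrostatic case, in which the fluid is at rest and the right-hand side vanishes. The sketch below assumes a constant density, a $z$-axis pointing upwards, and illustrative values for the density, gravity and surface pressure.

```python
RHO = 1000.0          # density in kg/m^3 (water, assumed constant)
G_Z = -9.81           # z-component of g in m/s^2, with z pointing up (assumption)
P_SURFACE = 101325.0  # pressure at z = 0 in Pa (assumption)

def pressure(z):
    """Hydrostatic pressure p(z) = p0 + rho * g^z * z, increasing with depth (z < 0)."""
    return P_SURFACE + RHO * G_Z * z

def euler_residual_z(z, h=1e-3):
    """rho * g^z - dp/dz for a fluid at rest, where the acceleration term is zero."""
    dp_dz = (pressure(z + h) - pressure(z - h)) / (2 * h)
    return RHO * G_Z - dp_dz

residual = euler_residual_z(-5.0)
print(residual)
```

The residual is zero to within rounding error, so the weight per unit volume is exactly balanced by the pressure gradient.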

# Alternative approaches to formulating geodesic equations on Riemannian manifolds and proof of their equivalence

A geodesic can be defined as an extremal path between two points on a manifold in the sense that it minimises or maximises some criterion of interest (e.g., minimises distance travelled, maximises proper time, etc). Such a path will satisfy some geodesic equations equivalent to the Euler-Lagrange equations of the calculus of variations. A geodesic can also be defined in a conceptually different way as the ‘straightest’ possible path between two points on a manifold. In this case the path will satisfy geodesic equations derived by requiring parallel transport of a tangent vector along the path. Although these are conceptually different ways of defining geodesics, they are mathematically equivalent. In the present note I want to explore the derivation of geodesic equations in these two different ways and prove their mathematical equivalence.

Now, in the calculus of variations we typically define a system’s action $S$ to be the time-integral of a Lagrangian $L$:

$S \equiv \int^{t_B}_{t_A} L(q_i, \dot{q_i}) dt$

where $L(q_i, \dot{q_i})$ says that the Lagrangian is a function of position coordinates $q_i$ and velocities $\dot{q_i}$ (and $i$ ranges over however many coordinates there are). We find the trajectory that yields a desired extremal value of the action $S$ as the one that satisfies the Euler-Lagrange equations

$0 = \frac{d}{dt} \big(\frac{\partial L}{\partial \dot{q_i}} \big) - \frac{\partial L}{\partial q_i}$

Let us now suppose that we are facing an exactly analogous situation in which there are two points on the manifold, $A$ and $B$, and we are considering possible paths between them to try to find the extremal one. We can describe any path between $A$ and $B$ by specifying the coordinates of the points along it as functions of a parameter $\sigma$ that goes from a value of $0$ at $A$ to a value of $1$ at $B$, i.e., by specifying the functions $x^{\mu}(\sigma)$. Noting that the line element can be written as

$ds^2 = g_{\mu \gamma} dx^{\mu} dx^{\gamma}$

we can write the length of a particular path as

$s = \int \sqrt{ds^2} = \int^1_0 \sqrt{g_{\mu \gamma} \frac{dx^{\mu}}{d \sigma} \frac{dx^{\gamma}}{d \sigma}} d \sigma$

Note that the metric is a function of the coordinates of points along the path, which in turn are functions of the parameter $\sigma$, i.e., $g_{\mu \gamma} = g_{\mu \gamma}(x^{\alpha}(\sigma))$. This situation is exactly analogous to the usual calculus of variations scenario because, writing $\dot{x}^{\alpha} \equiv d x^{\alpha}/d \sigma$, we see that we have a Lagrangian function

$L(x^{\alpha}, \dot{x}^{\alpha}) = \sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

and we hope to find the path $x^{\mu}(\sigma)$ that makes the integral of the Lagrangian extreme. This will be the path that satisfies the Euler-Lagrange equations

$0 = \frac{d}{d \sigma} \big(\frac{\partial L}{\partial \dot{x}^{\alpha}}\big) - \frac{\partial L}{\partial x^{\alpha}}$

This corresponds to $N$ separate differential equations in an $N$-dimensional manifold, one equation for each value of the index $\alpha$.

We can manipulate the Euler-Lagrange equations to get geodesic equations which are easier to use in particular contexts. First, note that

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{\partial}{\partial \dot{x}^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} (g_{\mu \gamma} \ \delta^{\mu}_{\alpha} \ \dot{x}^{\gamma} + g_{\mu \gamma} \ \dot{x}^{\mu} \ \delta^{\ \gamma}_{\alpha})$

because, for example, $\partial \dot{x}^{\mu}/\partial \dot{x}^{\alpha} = \delta^{\mu}_{\alpha}$. Also note that the metric is treated as a constant as it depends on $x^{\alpha}$ not on $\dot{x}^{\alpha}$. Doing the sums over the Kronecker deltas we get

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{2L}(g_{\alpha \gamma} \ \dot{x}^{\gamma} + g_{\mu \alpha} \ \dot{x}^{\mu})$

$= \frac{1}{2L}(g_{\alpha \mu} \ \dot{x}^{\mu} + g_{\alpha \mu} \ \dot{x}^{\mu})$

$= \frac{1}{L} g_{\alpha \mu} \ \dot{x}^{\mu}$

But notice that since

$s = \int L d \sigma$

we have

$\frac{ds}{d \sigma} = L$

so

$\frac{1}{L} = \frac{d \sigma}{ds}$

and we can write

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{L} g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma} \frac{d \sigma}{ds}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{ds}$
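The intermediate result $\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{L} g_{\alpha \mu} \ \dot{x}^{\mu}$ can be spot-checked numerically. The sketch below uses the plane polar metric $g = \mathrm{diag}(1, r^2)$ purely as an example, with an arbitrary point and arbitrary velocities, and compares a finite-difference derivative of $L$ with the formula.

```python
import math

def metric(r, theta):
    """Plane polar metric g = diag(1, r^2) (chosen here only as an example)."""
    return [[1.0, 0.0], [0.0, r * r]]

def lagrangian(x, xdot):
    """L = sqrt(g_mn xdot^m xdot^n)."""
    g = metric(*x)
    return math.sqrt(sum(g[m][n] * xdot[m] * xdot[n]
                         for m in range(2) for n in range(2)))

def dL_dxdot_numeric(x, xdot, a, h=1e-6):
    """Central-difference derivative of L with respect to xdot^a."""
    up = list(xdot)
    up[a] += h
    dn = list(xdot)
    dn[a] -= h
    return (lagrangian(x, up) - lagrangian(x, dn)) / (2 * h)

def dL_dxdot_formula(x, xdot, a):
    """g_am xdot^m / L, the closed form derived in the text."""
    g = metric(*x)
    return sum(g[a][m] * xdot[m] for m in range(2)) / lagrangian(x, xdot)

x, xdot = (2.0, 0.3), (0.7, -0.4)   # (r, theta) and (dr/dsigma, dtheta/dsigma)
for a in range(2):
    print(dL_dxdot_numeric(x, xdot, a), dL_dxdot_formula(x, xdot, a))
```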

Next, we have

$\frac{\partial L}{\partial x^{\alpha}} = \frac{\partial}{\partial x^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \dot{x}^{\mu} \dot{x}^{\gamma}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{d \sigma} \frac{d x^{\gamma}}{d \sigma} \frac{d \sigma}{ds}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Putting these results into the Euler-Lagrange equations we get

$0 = \frac{d}{d \sigma} \big(g_{\alpha \mu} \frac{d x^{\mu}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Finally, multiplying through by $d \sigma/ds$ we get

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

where I have also renamed $\mu \rightarrow \beta$ in the first term to make it clearer that the Einstein summations in the first and second terms are independent. This is the first version of the geodesic equations, derived by requiring that the path between the points $A$ and $B$ should be extremal in the sense of satisfying the Euler-Lagrange equations of the calculus of variations.

We will now derive a second version of the geodesic equations by requiring the geodesic to be a path that is locally straight. In differential geometry a path is defined as straight if it parallel transports its own tangent vector, i.e., if the tangent vector does not change as we move an infinitesimal step along the path. If we take an arbitrary point on the path to be $x^{\mu} \ e_{\mu}$ and we take $ds$ to be an infinitesimal displacement along the path, then a tangent vector to the path is

$\frac{d x^{\mu}}{d \sigma} e_{\mu}$

and we want

$\frac{d}{ds}\big(\frac{d x^{\mu}}{d \sigma}e_{\mu} \big) = \frac{d^2 x^{\mu}}{ds d \sigma} e_{\mu} + \frac{d x^{\mu}}{d \sigma} \frac{d e_{\mu}}{ds} = 0$

Multiplying through by $d \sigma/ds$ this gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d e_{\mu}}{ds} = 0$

But

$\frac{d e_{\mu}}{ds} = \frac{\partial e_{\mu}}{\partial x^{\gamma}} \frac{d x^{\gamma}}{d s}$

$= \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha}$

Putting this into the equation gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha} = 0$

To enable us to factor out the basis vector we can rename the dummy indices in the second term as $\mu \rightarrow \alpha$, $\gamma \rightarrow \beta$ and $\alpha \rightarrow \mu$ to get

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} e_{\mu} = 0$

$\iff$

$\big[\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} \big] e_{\mu} = 0$

$\implies$

$\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = 0$

This is the second version of the geodesic equations, derived by assuming that the path between the two points on the manifold is locally straight.

We now have two seemingly different versions of the geodesic equations, namely

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

and

$0 = \frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta}$

We will next show that they are in fact mathematically equivalent. Starting from the first version, we can expand out the brackets to get

$0 = \frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

$\iff$

$0 = \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

Now we rename the free index $\alpha \rightarrow \sigma$ in every term and relabel the dummy indices as follows: $\sigma \rightarrow \alpha$ in the first term; $\sigma \rightarrow \beta$ and $\beta \rightarrow \alpha$ in the second term; $\beta \rightarrow \mu$ in the third term; and $\mu \rightarrow \alpha$, $\gamma \rightarrow \beta$ in the fourth term. We get

$0 = \frac{1}{2}\frac{\partial g_{\sigma \beta}}{\partial x^{\alpha}}\frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\sigma \alpha}}{\partial x^{\beta}} \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2} - \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}} \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds}$

We can write this as

$0 = \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} \frac{1}{2} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}] + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2}$

Finally, multiplying through by the inverse metric $g^{\nu \sigma}$ (using a fresh index $\nu$ so that it does not clash with the dummy index $\mu$) and using the facts that

$g^{\nu \sigma} \ g_{\sigma \mu} = \delta^{\nu}_{\mu}$

and

$\Gamma^{\nu}_{\hphantom{\nu} \alpha \beta} = \frac{1}{2} g^{\nu \sigma} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}]$

we get

$0 = \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\nu}_{\hphantom{\nu} \alpha \beta} + \frac{d^2 x^{\nu}}{ds^2}$

which, after renaming $\nu \rightarrow \mu$, is the second version of the geodesic equation. Thus, the two versions are equivalent as claimed.
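As a concrete illustration, both versions can be written out for the unit sphere with metric $ds^2 = d\theta^2 + \sin^2 \theta \ d\phi^2$, for which the only non-zero Christoffel symbols are $\Gamma^{\theta}_{\hphantom{\theta} \phi \phi} = -\sin \theta \cos \theta$ and $\Gamma^{\phi}_{\hphantom{\phi} \theta \phi} = \Gamma^{\phi}_{\hphantom{\phi} \phi \theta} = \cot \theta$. The sketch below hard-codes the resulting residuals (the choice of manifold and the function names are assumptions made for the example) and evaluates them on the equator, on a circle of latitude, and at an arbitrary set of values.

```python
import math

def version1(th, dth, ddth, dph, ddph):
    """Residuals of d/ds(g_ab x'^b) - (1/2) d_a g_mg x'^m x'^g on the unit sphere."""
    s, c = math.sin(th), math.cos(th)
    res_theta = ddth - s * c * dph**2
    res_phi = 2 * s * c * dth * dph + s**2 * ddph
    return res_theta, res_phi

def version2(th, dth, ddth, dph, ddph):
    """Residuals of x''^m + Gamma^m_ab x'^a x'^b on the unit sphere."""
    s, c = math.sin(th), math.cos(th)
    res_theta = ddth - s * c * dph**2
    res_phi = ddph + 2 * (c / s) * dth * dph
    return res_theta, res_phi

# Equator traversed at unit speed: theta = pi/2, phi = s (a great circle).
equator = (math.pi / 2, 0.0, 0.0, 1.0, 0.0)
# Circle of latitude theta = pi/4 traversed at unit speed: phi = s / sin(theta).
latitude = (math.pi / 4, 0.0, 0.0, math.sqrt(2.0), 0.0)
# Arbitrary values of (theta, theta', theta'', phi', phi'').
arbitrary = (1.1, 0.3, -0.2, 0.8, 0.5)

for name, state in [("equator", equator), ("latitude", latitude), ("arbitrary", arbitrary)]:
    print(name, version1(*state), version2(*state))
```

The equator satisfies both versions, while the circle of latitude fails the $\theta$-equation in both. At arbitrary values the $\phi$-residuals differ only by the factor $g_{\phi \phi} = \sin^2 \theta$, which is the lowering of the index by the metric seen in the equivalence proof.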

# Geometric interpretation of Christoffel symbols and some alternative approaches to calculating them

In a classic paper in 1869, Elwin Bruno Christoffel (1829-1900) introduced his famous Christoffel symbols $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ to represent an array of numbers describing a metric connection. They are also known as connection coefficients (and sometimes less respectfully as ‘Christ-awful symbols’). In differential geometry one usually first encounters them when studying covariant derivatives of tensors in tensor calculus. For example, suppose we try to differentiate the contravariant vector $A = A^{\alpha} e_{\alpha}$, where $e_{\alpha}$ denotes a coordinate basis vector (and we are using the Einstein summation convention). We get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \frac{\partial e_{\alpha}}{\partial x^{\beta}}$

In general, the partial derivative in the second term on the right will result in another vector which we can write in terms of its coordinate basis as

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

This defines the Christoffel symbol $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$. The downstairs indices refer to the rate of change of the basis vector $e_{\alpha}$ with respect to the coordinate variable $x^{\beta}$, and the upstairs index $\gamma$ picks out the component of this rate of change in the direction of the coordinate basis vector $e_{\gamma}$. Substituting the second equation into the first we get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

To enable us to factor out the coordinate basis vector we can exchange the symbols $\alpha$ and $\gamma$ in the second term on the right to get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta} \ e_{\alpha}$

$= \big( \frac{\partial A^{\alpha}}{\partial x^{\beta}} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}\big) \ e_{\alpha}$

The expression in the bracket is called the covariant derivative of the contravariant vector $A$, i.e., the rate of change of $A^{\alpha}$ in each of the directions $\beta$ of the coordinate system $x^{\beta}$. It has the important property that it is itself tensorial (unlike the ordinary partial derivative of the tensor on its own). This covariant derivative is often written using the notation

$\nabla_{\beta} \ A^{\alpha} = \partial_{\beta} \ A^{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$
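The tensorial character of this expression can be illustrated numerically. The following is a minimal sketch, not part of the original derivation, which assumes the plane polar Christoffel symbols obtained later in this note ($\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$, $\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$, all others zero). It takes the constant Cartesian vector $e_x$, whose polar components are $A^r = \cos \theta$ and $A^{\theta} = -\frac{\sin \theta}{r}$, and compares the ordinary partial derivatives of the components with the covariant derivative. The function names, the index convention (0 for $r$, 1 for $\theta$) and the finite-difference step are my own choices for illustration.

```python
import math

R, THETA = 0, 1  # index convention assumed for this sketch: 0 = r, 1 = theta


def christoffel(r, theta):
    """Plane polar Christoffel symbols, stored as gamma[upper][lower1][lower2]."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[R][THETA][THETA] = -r
    gamma[THETA][R][THETA] = 1.0 / r
    gamma[THETA][THETA][R] = 1.0 / r
    return gamma


def constant_field(r, theta):
    """Polar components (A^r, A^theta) of the constant Cartesian vector e_x."""
    return [math.cos(theta), -math.sin(theta) / r]


def partial_derivatives(field, r, theta, h=1e-6):
    """partial[alpha][beta] = d A^alpha / d x^beta, by central differences."""
    point = [r, theta]
    partial = [[0.0, 0.0], [0.0, 0.0]]
    for beta in range(2):
        plus, minus = list(point), list(point)
        plus[beta] += h
        minus[beta] -= h
        a_plus, a_minus = field(*plus), field(*minus)
        for alpha in range(2):
            partial[alpha][beta] = (a_plus[alpha] - a_minus[alpha]) / (2 * h)
    return partial


def covariant_derivative(field, r, theta, h=1e-6):
    """nabla[alpha][beta] = partial_beta A^alpha + A^gamma Gamma^alpha_{gamma beta}."""
    gamma = christoffel(r, theta)
    a = field(r, theta)
    partial = partial_derivatives(field, r, theta, h)
    return [[partial[alpha][beta]
             + sum(a[g] * gamma[alpha][g][beta] for g in range(2))
             for beta in range(2)] for alpha in range(2)]


# The partial derivatives of the components are not all zero...
print(partial_derivatives(constant_field, 2.0, 0.7))
# ...but the covariant derivative of this constant vector field vanishes
# (up to finite-difference error).
print(covariant_derivative(constant_field, 2.0, 0.7))
```

The vector field itself does not change from point to point, so its covariant derivative should vanish in any coordinate system, even though the polar components $A^r$ and $A^{\theta}$ vary. The $\Gamma$ terms supply exactly the correction needed.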

Having thus established the meaning of the Christoffel symbols, one then goes on to work out that the covariant derivative of a one-form is

$\nabla_{\beta} \ A_{\alpha} = \partial_{\beta} \ A_{\alpha} - A_{\gamma} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$

and that the covariant derivatives of higher rank tensors are constructed from the building blocks of $\nabla_{\beta} \ A^{\alpha}$ and $\nabla_{\beta} \ A_{\alpha}$ by adding a $\Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$ term for each upper index $\alpha$ and subtracting a $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ term for each lower index $\alpha$. For example, the covariant derivative of the $(1, 1)$ rank-2 tensor $X^{\mu}_{\hphantom{\mu} \sigma}$ is

$\nabla_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} = \partial_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} + X^{\alpha}_{\hphantom{\mu} \sigma} \ \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} - X^{\mu}_{\hphantom{\mu} \alpha} \ \Gamma^{\alpha}_{\hphantom{\alpha} \sigma \beta}$

Christoffel symbols then go on to play vital roles in other areas of differential geometry, perhaps most notably as key components in the definition of the Riemann curvature tensor.

It is possible to have a working knowledge of all of this without truly understanding at a deep level, say geometrically, what Christoffel symbols really mean. In the present note I want to delve a bit more deeply into how one might calculate and interpret Christoffel symbols geometrically. I also want to explore some alternative ways of calculating them in the context of a simple plane polar coordinate system $(r, \theta)$ which is related to the usual Cartesian $(x, y)$ coordinate system via the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

In an $n$-dimensional manifold there are potentially $n^3$ Christoffel symbols to be calculated, though this number is usually reduced by symmetries. In the present plane polar coordinate case, we will need to calculate $2^3 = 8$ Christoffel symbols. These are

$\Gamma^{r}_{\hphantom{r} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r}$

$\Gamma^{r}_{\hphantom{r} \theta r}$

$\Gamma^{r}_{\hphantom{r} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta}$

$\Gamma^{r}_{\hphantom{r} r \theta}$

Geometric approach

Consider the situation shown in the diagram below where two vectors $(e_{\theta})_P$ and $(e_{\theta})_S$ of the basis vector field $e_{\theta}$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_{\theta})_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

Now, in plane polar coordinates the magnitude of $e_{\theta}$ is

$|e_{\theta}| = r$

Therefore the length of the arc $L$ in the diagram is

$L = r \Delta \theta$

If $\Delta \theta$ is small, we have

$L \approx |\Delta_{\theta} e_{\theta}|$

where $\Delta_{\theta} e_{\theta}$ is the vector connecting the endpoints of $(e_{\theta})_P$ and $(e_{\theta})_S$, i.e., $\Delta_{\theta} e_{\theta} = (e_{\theta})_S - (e_{\theta})_P$.

Therefore

$|\Delta_{\theta} e_{\theta}| \approx r \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we get

$|d_{\theta} e_{\theta}| = r d \theta$

From the diagram we see that $d_{\theta} e_{\theta}$ points in the opposite direction of $e_r$. Therefore we have

$d_{\theta} e_{\theta} = - r d \theta e_r$

(note that in plane polar coordinates $e_r$ is of unit length). From this equation we have

$\frac{d_{\theta} e_{\theta}}{d \theta} \equiv \frac{\partial e_{\theta}}{\partial \theta} = -r e_r$

But from the definition of Christoffel symbols we have

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta}$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

We have obtained the first two Christoffel symbols on our list from the geometric setup and the nice thing about this approach is that we can see what the underlying changes in the coordinate basis vectors looked like.
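The small-angle argument above can be checked numerically. This is only a sketch: it assumes the Cartesian components $e_r = (\cos \theta, \sin \theta)$ and $e_{\theta} = (-r \sin \theta, r \cos \theta)$ that are derived in the algebraic approach later in this note, and the sample point and step $\Delta \theta$ are arbitrary choices of mine.

```python
import math


def e_r(r, theta):
    """Cartesian components of the radial basis vector."""
    return (math.cos(theta), math.sin(theta))


def e_theta(r, theta):
    """Cartesian components of the angular basis vector (magnitude r)."""
    return (-r * math.sin(theta), r * math.cos(theta))


r, theta, dtheta = 2.0, 0.7, 1e-4  # hypothetical sample point P and small step

at_P = e_theta(r, theta)
at_S = e_theta(r, theta + dtheta)

# Delta_theta e_theta = (e_theta)_S - (e_theta)_P, compared in Cartesian components
delta = (at_S[0] - at_P[0], at_S[1] - at_P[1])
length = math.hypot(delta[0], delta[1])

# Ratio of |Delta_theta e_theta| to the arc length r * dtheta (close to 1)
length_ratio = length / (r * dtheta)

# Cosine of the angle between Delta_theta e_theta and e_r (close to -1)
radial = e_r(r, theta)
cosine = (delta[0] * radial[0] + delta[1] * radial[1]) / length

print(length_ratio, cosine)
```

Both printed numbers approach $1$ and $-1$ respectively as $\Delta \theta$ shrinks, which is the content of $d_{\theta} e_{\theta} = -r d\theta e_r$.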

To obtain the next two Christoffel symbols on our list, we consider a change in the vector field $e_{\theta}$ due to a displacement in the radial direction from $P$ to $Q$ in the following diagram:

We have moved outwards by a small amount $\Delta r$ and as a result the length of the vectors in the vector field $e_{\theta}$ has increased by a small amount $|\Delta_r e_{\theta}|$ shown in the diagram. From the diagram we see that the proportions of the two increases must be the same, so we have

$\frac{|\Delta_r e_{\theta}|}{|e_{\theta}|} = \frac{\Delta r}{r}$

or

$|\Delta_r e_{\theta}| = \Delta r \frac{1}{r} |e_{\theta}|$

Passing to the differential limit as $\Delta r \rightarrow 0$ we get

$|d_r e_{\theta}| = dr \frac{1}{r} |e_{\theta}|$

Since $d_r e_{\theta}$ is directed along the vector $e_{\theta}$ we can write the vector equation

$d_r e_{\theta} = dr \frac{1}{r} e_{\theta}$

so

$\frac{d_r e_{\theta}}{dr} \equiv \frac{\partial e_{\theta}}{\partial r} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

We have thus found two more Christoffel symbols from the geometrical setup. To get the next two Christoffel symbols on our list we observe that the basis vector field $e_r$ does not change as we move in the radial direction (either in magnitude or direction) so we must have

$\frac{\partial e_r}{\partial r} = 0$

where the right hand side here denotes a zero vector. But we know that

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta}$

so we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Finally, to get the last two remaining Christoffel symbols on our list, we consider a change in the vector field $e_r$ due to an angular displacement. In the diagram below two vectors $(e_r)_P$ and $(e_r)_S$ of the basis vector field $e_r$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_r)_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

The arc length $L$ is

$L = |e_r| \Delta \theta = \Delta \theta$

(since the magnitude of the coordinate basis vector $e_r$ is $|e_r| = 1$). But for small $\Delta \theta$ we also have

$L \approx |\Delta_{\theta} e_r|$

where $\Delta_{\theta} e_r$ is the vector connecting the endpoints of $(e_r)_P$ and $(e_r)_S$, i.e., $\Delta_{\theta} e_r = (e_r)_S - (e_r)_P$. Therefore

$|\Delta_{\theta} e_r| \approx \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we have

$|d_{\theta} e_r| = d \theta$

But $d_{\theta} e_r$ has the same direction as $e_{\theta}$. Therefore

$d_{\theta} e_r = \frac{1}{r} d \theta e_{\theta}$

where the factor $\frac{1}{r}$ is needed to correct for the magnitude $r$ of $e_{\theta}$ (we only want the direction of $e_{\theta}$ here). Therefore we see that

$\frac{d_{\theta} e_r}{d \theta} \equiv \frac{\partial e_r}{\partial \theta} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

This completes the geometric calculation of all the Christoffel symbols for plane polar coordinates.

Algebraic approach

It is possible to calculate the eight Christoffel symbols quite easily for plane polar coordinates by first expressing the basis vectors $e_r$ and $e_{\theta}$ in terms of the Cartesian basis vectors $e_x$ and $e_y$. Note that coordinate basis vectors transform in the same way as the components of a one-form, i.e.,

$e^{\prime}_{\alpha} = \frac{\partial x^{\beta}}{\partial x^{\prime \alpha}} e_{\beta}$

We use the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

to calculate the coefficients. We get

$e_r = \frac{\partial x}{\partial r} e_x + \frac{\partial y}{\partial r} e_y$

$e_{\theta} = \frac{\partial x}{\partial \theta} e_x + \frac{\partial y}{\partial \theta} e_y$

and therefore

$e_r = \cos \theta e_x + \sin \theta e_y$

$e_{\theta} = -r \sin \theta e_x + r \cos \theta e_y$

Then we calculate the Christoffel symbols as follows. First,

$\frac{\partial e_r}{\partial r} = 0$

so

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta} = 0$

and we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Next,

$\frac{\partial e_r}{\partial \theta} = - \sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

Next,

$\frac{\partial e_{\theta}}{\partial \theta} = -r \cos \theta e_x - r \sin \theta e_y = -r e_r$

so

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta} = -r e_r$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

Finally,

$\frac{\partial e_{\theta}}{\partial r} = -\sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$
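The algebraic procedure can also be mimicked numerically. The sketch below differentiates the Cartesian components of $e_r$ and $e_{\theta}$ by central differences and then reads off the coefficients of $e_r$ and $e_{\theta}$ in each derivative. The decomposition step uses the fact that $e_r$ and $e_{\theta}$ are orthogonal with $|e_r| = 1$ and $|e_{\theta}| = r$. The function names, the index convention (0 for $r$, 1 for $\theta$) and the step size are illustrative assumptions rather than anything fixed by the derivation.

```python
import math


def basis(r, theta):
    """Cartesian components of [e_r, e_theta] at the point (r, theta)."""
    return [(math.cos(theta), math.sin(theta)),
            (-r * math.sin(theta), r * math.cos(theta))]


def decompose(v, r, theta):
    """Return (a, b) such that v = a * e_r + b * e_theta."""
    e_r, e_t = basis(r, theta)
    a = v[0] * e_r[0] + v[1] * e_r[1]              # |e_r|^2 = 1
    b = (v[0] * e_t[0] + v[1] * e_t[1]) / (r * r)  # |e_theta|^2 = r^2
    return (a, b)


def christoffel_numeric(r, theta, h=1e-6):
    """gamma[upper][alpha][beta] from d e_alpha / d x^beta = Gamma^gamma_{alpha beta} e_gamma."""
    point = [r, theta]
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    for beta in range(2):
        plus, minus = list(point), list(point)
        plus[beta] += h
        minus[beta] -= h
        b_plus, b_minus = basis(*plus), basis(*minus)
        for alpha in range(2):
            # Central-difference derivative of e_alpha with respect to x^beta
            derivative = tuple((p - m) / (2 * h)
                               for p, m in zip(b_plus[alpha], b_minus[alpha]))
            components = decompose(derivative, r, theta)
            for upper in range(2):
                gamma[upper][alpha][beta] = components[upper]
    return gamma


gamma = christoffel_numeric(2.0, 0.7)
print(gamma[0][1][1])  # Gamma^r_{theta theta}, expected close to -r
print(gamma[1][0][1])  # Gamma^theta_{r theta}, expected close to 1/r
print(gamma[1][1][0])  # Gamma^theta_{theta r}, expected close to 1/r
```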

Metric tensor approach

The previous approach relied on knowing the functional relationship between the Cartesian coordinates $(x, y)$ and the plane polar coordinates $(r, \theta)$. There is another more generally useful method of calculating the Christoffel symbols from the components of the metric tensor, using the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

I will first derive this formula from first principles, then use it to find the Christoffel symbols for the plane polar coordinates case.

The first step is to show that Christoffel symbols are symmetric in their lower indices, i.e.,

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

as this property will be needed in the derivation of the formula. To prove the symmetry property we start from the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Suppose we now decompose the basis vectors $e_{\alpha}$ in a local Cartesian coordinate system. Then using the transformation rule for basis vectors (the same rule as for the components of a one-form) we have

$e_{\alpha} = \frac{\partial x^m}{\partial x^{\alpha}} e_m$

where the $x^m$ are the Cartesian coordinates and the $e_m$ are the coordinate basis vectors (which are constant in both magnitude and direction in the Cartesian system). Differentiating gives

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

Equating the expressions for $\frac{\partial e_{\alpha}}{\partial x^{\beta}}$ we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

But then

$\Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\beta} \partial x^{\alpha}} e_m$

so it follows from Young’s Theorem (equality of cross-partials) that

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

We conclude that Christoffel symbols are symmetric in their lower indices, as claimed.
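As a quick worked check of this argument in the plane polar case (taking $x^m = (x, y)$ with $x = r \cos \theta$ and $y = r \sin \theta$, and using the expression $e_{\theta} = -r \sin \theta e_x + r \cos \theta e_y$ from the algebraic approach), the mixed second partial derivatives are

$\frac{\partial^2 x}{\partial r \partial \theta} = \frac{\partial^2 x}{\partial \theta \partial r} = -\sin \theta$

$\frac{\partial^2 y}{\partial r \partial \theta} = \frac{\partial^2 y}{\partial \theta \partial r} = \cos \theta$

so that

$\Gamma^{\gamma}_{\hphantom{\gamma} r \theta} \ e_{\gamma} = \Gamma^{\gamma}_{\hphantom{\gamma} \theta r} \ e_{\gamma} = -\sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

which agrees with the values $\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$ and $\Gamma^{r}_{\hphantom{r} r \theta} = \Gamma^{r}_{\hphantom{r} \theta r} = 0$ found earlier.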

Note too that the components $g_{\mu \gamma}$ of the general metric tensor are also symmetric with respect to their indices. This follows from the defining equation of the metric tensor components in terms of the basis vector fields $e_{\gamma}$, namely

$g_{\mu \gamma} \equiv e_{\mu} \cdot e_{\gamma}$

Since $e_{\mu} \cdot e_{\gamma} = e_{\gamma} \cdot e_{\mu}$

the metric is symmetric, i.e.,

$g_{\mu \gamma} = g_{\gamma \mu}$

To derive the formula for the Christoffel symbols in terms of the metric tensor components, we begin again with the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Taking the scalar product with another basis vector on both sides we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} \cdot e_{\mu} = \frac{\partial e_{\alpha}}{\partial x^{\beta}} \cdot e_{\mu}$

$= \frac{\partial (e_{\alpha} \cdot \ e_{\mu})}{\partial x^{\beta}} - e_{\alpha} \cdot \frac{\partial e_{\mu}}{\partial x^{\beta}}$

$= \frac{\partial g_{\alpha \mu}}{\partial x^{\beta}} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ e_{\alpha} \cdot \ e_{\rho}$

$= \partial_{\beta} \ g_{\alpha \mu} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

Therefore we have

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

In the second term on the right hand side we can rename $\rho \rightarrow \gamma$ and use the fact that the metric is symmetric to reverse the indices. We get

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\gamma}_{\hphantom{\gamma} \mu \beta} \ g_{\gamma \alpha}$

By cyclically renaming the indices $\beta$, $\alpha$, and $\mu$ we can generate two more similar equations. From the cyclic permutation $\beta$, $\alpha$, $\mu$ $\rightarrow$ $\alpha$, $\mu$, $\beta$ we get

$\partial_{\alpha} \ g_{\mu \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \mu \alpha} \ g_{\gamma \beta} + \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ g_{\gamma \mu}$

and from the cyclic permutation $\alpha$, $\mu$, $\beta$ $\rightarrow$ $\mu$, $\beta$, $\alpha$ we get

$\partial_{\mu} \ g_{\beta \alpha} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \mu} \ g_{\gamma \alpha} + \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \mu} \ g_{\gamma \beta}$

Now we add the first two equations and subtract the third to get

$\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha} = 2 \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu}$

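where we have taken advantage of the symmetry in the lower indices of the Christoffel symbols to cancel some terms. Using the fact that the inverse metric satisfies

$g^{\mu \nu} \ g_{\gamma \mu} = \delta^{\nu}_{\gamma}$

we multiply both sides by $\frac{1}{2}g^{\mu \nu}$, which isolates $\Gamma^{\nu}_{\hphantom{\nu} \alpha \beta}$, and then rename the free index $\nu \rightarrow \gamma$ to get the final formula: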

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2}g^{\mu \gamma}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha}]$

$= \frac{1}{2}g^{\gamma \mu}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

This is made easier to remember by noting the following facts. A factor of the inverse metric generates the Christoffel symbol’s upper index. The negative term has the symbol’s lower indices as the indices of the metric. The other two terms in the bracket are cyclic permutations of this last term.
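The formula translates almost line for line into code. The following is a minimal two-dimensional sketch, assuming the metric is supplied as a function returning a $2 \times 2$ nested list and that its derivatives are approximated by central differences. The plane polar metric $\mathrm{diag}(1, r^2)$ used as the test case is the one worked out below in this note. All function names and the step size are hypothetical choices for illustration.

```python
def polar_metric(x):
    """Metric components g_{alpha beta} for plane polar coordinates x = (r, theta)."""
    r, theta = x
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving the components g^{alpha beta}."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivatives(metric, x, h=1e-6):
    """dg[c][a][b] = d g_{ab} / d x^c, by central differences."""
    dg = []
    for c in range(2):
        plus, minus = list(x), list(x)
        plus[c] += h
        minus[c] -= h
        g_plus, g_minus = metric(plus), metric(minus)
        dg.append([[(g_plus[a][b] - g_minus[a][b]) / (2 * h) for b in range(2)]
                   for a in range(2)])
    return dg


def christoffel_from_metric(metric, x, h=1e-6):
    """gamma[c][a][b] = (1/2) g^{c m} (d_b g_{a m} + d_a g_{m b} - d_m g_{a b})."""
    g_inv = inverse_2x2(metric(x))
    dg = metric_derivatives(metric, x, h)
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    for c in range(2):          # upper index
        for a in range(2):      # first lower index
            for b in range(2):  # second lower index
                gamma[c][a][b] = 0.5 * sum(
                    g_inv[c][m] * (dg[b][a][m] + dg[a][m][b] - dg[m][a][b])
                    for m in range(2))
    return gamma


gamma = christoffel_from_metric(polar_metric, [2.0, 0.7])
print(gamma[0][1][1])  # Gamma^r_{theta theta}, expected close to -r
print(gamma[1][0][1])  # Gamma^theta_{r theta}, expected close to 1/r
```

Unlike the algebraic approach, nothing here refers to Cartesian coordinates: only the metric components are needed.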

Having derived the formula we can now employ it to calculate the eight Christoffel symbols for plane polar coordinates. We can work out the metric tensor using the distance formula

$ds^2 = dx^2 + dy^2$

with the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

Then

$dx = \cos \theta dr - r \sin \theta d \theta$

$dy = \sin \theta dr + r \cos \theta d \theta$

so

$dx^2 = \cos^2 \theta dr^2 + r^2 \sin^2 \theta d \theta^2 - 2 r \sin \theta \cos \theta dr d \theta$

$dy^2 = \sin^2 \theta dr^2 + r^2 \cos^2 \theta d \theta^2 + 2 r \sin \theta \cos \theta dr d \theta$

Therefore the metric in plane polar coordinates is

$ds^2 = dx^2 + dy^2 = dr^2 + r^2 d \theta^2$

The metric tensor is therefore

$[g_{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ \ \\ 0 & r^2 \end{pmatrix}$

and the inverse metric is

$[g^{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ \ \\ 0 & \frac{1}{r^2} \end{pmatrix}$
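The same metric components can be cross-checked against the definition $g_{\mu \gamma} \equiv e_{\mu} \cdot e_{\gamma}$ given earlier. The sketch below assumes the Cartesian components of $e_r$ and $e_{\theta}$ from the algebraic approach and simply forms the dot products. The function names and the sample point are illustrative only.

```python
import math


def basis(r, theta):
    """Cartesian components of [e_r, e_theta] at the point (r, theta)."""
    return [(math.cos(theta), math.sin(theta)),
            (-r * math.sin(theta), r * math.cos(theta))]


def metric_from_basis(r, theta):
    """g[a][b] = e_a . e_b, using the Cartesian dot product."""
    e = basis(r, theta)
    return [[e[a][0] * e[b][0] + e[a][1] * e[b][1] for b in range(2)]
            for a in range(2)]


def matrix_product(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


r, theta = 2.0, 0.7  # hypothetical sample point
g = metric_from_basis(r, theta)
g_inverse = [[1.0, 0.0], [0.0, 1.0 / (r * r)]]  # inverse metric quoted above

print(g)                             # close to [[1, 0], [0, r^2]]
print(matrix_product(g_inverse, g))  # close to the identity matrix
```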

Now, in the formula for $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ the indices $\alpha$, $\beta$, $\gamma$ and $\mu$ represent the polar coordinates $r$ and $\theta$ in various permutations. Inspection of $[g_{\alpha \beta}]$ shows that the only partial derivative of a metric component which does not equal zero is

$\partial_r \ g_{\theta \theta} = \partial_r (r^2) = 2r$

Inspection of $[g^{\alpha \beta}]$ shows that the only components of the inverse metric which do not equal zero are

$g^{rr} = 1$

and

$g^{\theta \theta} = \frac{1}{r^2}$

Substituting these values of the metric tensor components into the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

we get

$\Gamma^{r}_{\hphantom{r} \theta \theta} = \frac{1}{2} g^{r r} \big( - \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} (-2r) = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$
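As a final consistency check, these eight values can be substituted back into the relation $\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\gamma}_{\hphantom{\gamma} \mu \beta} \ g_{\gamma \alpha}$ from which the formula was derived. The sketch below does this for every choice of $\alpha$, $\mu$ and $\beta$, using the closed-form metric, its single nonzero derivative $\partial_r \ g_{\theta \theta} = 2r$, and the Christoffel symbols listed above. The function name and the index convention (0 for $r$, 1 for $\theta$) are my own.

```python
R, THETA = 0, 1  # index convention assumed for this sketch: 0 = r, 1 = theta


def max_compatibility_residual(r):
    """Largest |d_b g_{am} - (Gamma^c_{ab} g_{cm} + Gamma^c_{mb} g_{ca})| over all a, m, b."""
    g = [[1.0, 0.0], [0.0, r * r]]

    # dg[b][a][m] = d g_{am} / d x^b; only d_r g_{theta theta} = 2r is nonzero
    dg = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    dg[R][THETA][THETA] = 2.0 * r

    # gamma[c][a][b] = Gamma^c_{ab}, using the values found above
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[R][THETA][THETA] = -r
    gamma[THETA][R][THETA] = 1.0 / r
    gamma[THETA][THETA][R] = 1.0 / r

    worst = 0.0
    for a in range(2):
        for m in range(2):
            for b in range(2):
                lhs = dg[b][a][m]
                rhs = sum(gamma[c][a][b] * g[c][m] + gamma[c][m][b] * g[c][a]
                          for c in range(2))
                worst = max(worst, abs(lhs - rhs))
    return worst


print(max_compatibility_residual(2.0))  # zero up to rounding error
```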