Illustrating the correspondence between 1-forms and vectors using de Broglie waves

I was asked by a student to clarify the issues surrounding an exercise in the famous book Gravitation written by Misner, Thorne and Wheeler (MTW). The exercise appears as follows in Chapter 2, Section 5:

The key point of this section is that Equation (2.14), the defining equation of 1-forms, can be shown to be physically valid (as well as being just a mathematical definition) using de Broglie waves in quantum mechanics. The notation in MTW is not ideal, so we will replace the notation $\langle \mathbf{\tilde{p}}, \mathbf{v} \rangle$ for a 1-form evaluated at a vector $\mathbf{v}$ by the notation $\mathbf{\tilde{p}}(\mathbf{v})$. What MTW are then saying is that given any vector $\mathbf{p}$ we can define a corresponding 1-form as

$\mathbf{\tilde{p}} = \langle \mathbf{p}, \ \rangle$

which is to be viewed as a function waiting for a vector input (to be placed in the empty space on the right-hand side of the angle brackets). When the vector input $\mathbf{v}$ is supplied, the 1-form will then yield the number

$\mathbf{\tilde{p}}(\mathbf{v}) = \langle \mathbf{p}, \mathbf{v} \rangle = \mathbf{p} \cdot \mathbf{v}$

In Exercise 2.1 we are asked to verify the validity of this equation using the de Broglie wave

$\psi = e^{i \phi} = \exp[i(\mathbf{k}\cdot \mathbf{x}- \omega t)]$

The phase is the angular argument $\phi = \mathbf{k}\cdot \mathbf{x} - \omega t$ which specifies the position of the wave from some starting point. The phase is parameterised by the wave vector $\mathbf{k}$ which is such that $|\mathbf{k}| = 2 \pi/\lambda$ where $\lambda$ is the wavelength, and by the angular frequency $\omega = 2 \pi f$ where $f$ is the frequency of the relevant oscillator.

It is a well known fact (and it is easy to verify) that given any real-valued function of a vector $\phi(\mathbf{x})$, the gradient vector $\partial \phi/\partial \mathbf{x}$ is orthogonal to the level surfaces of $\phi$. In the case of the phase of a de Broglie wave we have

$\frac{\partial \phi}{\partial \mathbf{x}} = \mathbf{k}$

so the wave vector is the (position) gradient vector of the phase $\phi$ and therefore $\mathbf{k}$ must be orthogonal to loci of constant phase.
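As a quick numerical sanity check of the claim that the gradient of the phase is the wave vector, here is a minimal Python sketch. The particular values of $\mathbf{k}$, $\omega$, $t$ and the evaluation point are hypothetical and chosen only for illustration; the gradient is estimated by central differences rather than computed symbolically.

```python
def phase(x, k, omega, t):
    """Phase phi = k.x - omega*t of a plane wave."""
    return sum(ki * xi for ki, xi in zip(k, x)) - omega * t

def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at the point x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        forward[i] += h
        backward = list(x)
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

k = [3.0, -1.0, 2.0]     # hypothetical wave vector
omega, t = 5.0, 0.7      # hypothetical angular frequency and time
x0 = [0.4, 1.2, -0.3]    # arbitrary evaluation point

grad_phi = numerical_gradient(lambda x: phase(x, k, omega, t), x0)
print(grad_phi)          # should agree with k up to rounding error
```

Because the phase is linear in $\mathbf{x}$, the estimate does not depend on the point chosen or on the value of $t$.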

In the case of circular waves, for example, these loci of constant phase are circles with centre at the source of the waves and the wave vectors $\mathbf{k}$ point radially outwards at right angles to them, as indicated in the diagram.

To get a diagrammatic understanding of the relationship between 1-forms and vectors, we can imagine focusing on a very small neighbourhood around some point located among these loci of constant phase. On this very small scale, the loci of constant phase will look flat rather than circular, but the wave vectors $\mathbf{k}$ will still be orthogonal to them. What we do is interpret this local pattern of (flat) surfaces of constant phase as the 1-form $\mathbf{\tilde{k}}$. This 1-form corresponding to the wave vector $\mathbf{k}$ is

$\mathbf{\tilde{k}} = \langle \mathbf{k}, \ \rangle$

and as before we interpret this as a function waiting for a vector input. When it receives a vector input, say $\mathbf{v}$, it will output a number computed as the scalar product of $\mathbf{k}$ and $\mathbf{v}$. Thus we can write

$\mathbf{\tilde{k}}(\mathbf{v}) = \langle \mathbf{k}, \mathbf{v} \rangle = \mathbf{k} \cdot \mathbf{v}$

As indicated in the diagram, the vector $\mathbf{v}$ which we supply to $\mathbf{\tilde{k}}$ will be at an angle to the wave vector $\mathbf{k}$. If the vector $\mathbf{v}$ is parallel to the loci of constant phase then $\mathbf{\tilde{k}}(\mathbf{v}) = 0$ because $\mathbf{k}$ and $\mathbf{v}$ will be orthogonal. In the language of 1-forms, this would be interpreted by saying that the vector $\mathbf{v}$ will not pierce the 1-form $\mathbf{\tilde{k}}$ because it will not cross any of the loci of constant phase. Conversely, if the vector $\mathbf{v}$ is parallel to the wave vector $\mathbf{k}$ (orthogonal to the loci of constant phase), we would say that $\mathbf{v}$ will pierce the 1-form $\mathbf{\tilde{k}}$ as much as possible, because it will cross as many loci of constant phase as it possibly can. Between these extremes we will get intermediate values of the 1-form $\mathbf{\tilde{k}}(\mathbf{v})$. The key idea, then, is that the set of loci of constant phase in the neighbourhood of a point is the diagrammatic representation of the 1-form $\mathbf{\tilde{k}}$. When we feed a vector $\mathbf{v}$ into this 1-form we get a measure $\mathbf{\tilde{k}}(\mathbf{v})$ of how many loci of constant phase the vector pierces. This is the language being used by MTW in the prelude to Exercise 2.1 above.
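The idea of a 1-form as "a function waiting for a vector input" can be sketched directly in Python as a closure. This is only an illustration in a flat two-dimensional space with the ordinary dot product; the wave vector and the test vectors are hypothetical.

```python
import math

def one_form(k):
    """Return the 1-form k~ = <k, >: a linear function awaiting a vector."""
    def k_tilde(v):
        return sum(ki * vi for ki, vi in zip(k, v))
    return k_tilde

k = (2.0, 0.0)            # hypothetical wave vector pointing along x
k_tilde = one_form(k)

# Feed in unit vectors at 0, 45 and 90 degrees to k and see how many
# surfaces of constant phase each one "pierces".
for degrees in (0, 45, 90):
    a = math.radians(degrees)
    v = (math.cos(a), math.sin(a))
    print(degrees, round(k_tilde(v), 6))
```

The output is largest for the vector parallel to $\mathbf{k}$, zero (up to rounding) for the vector lying along the loci of constant phase, and intermediate in between.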

To actually solve Exercise 2.1, begin by recalling from quantum mechanics that a photon’s momentum $\mathbf{p}$ is such that $|\mathbf{p}| = E/c$ where $E = hf$ is the photonic energy and $f$ is the frequency of the oscillator. Since $\lambda f = c$ where $\lambda$ is the photon’s wavelength, we have $E = hc/\lambda$ so the magnitude of the photon’s momentum is

$|\mathbf{p}| = \frac{E}{c} = \frac{h}{\lambda} = \hbar \frac{2\pi}{\lambda} = \hbar |\mathbf{k}|$

and in fact

$\mathbf{p} = \hbar \mathbf{k}$

Note that therefore

$\lambda = \frac{h}{|\mathbf{p}|}$

Famously, de Broglie’s idea in his 1924 PhD thesis was that this wavelength formula applies not just to photons but also to massive particles such as electrons, for which the momentum $\mathbf{p}$ would be calculated as

$\mathbf{p} = m \mathbf{u}$

where $m$ is the mass of the particle and $\mathbf{u}$ is its four-velocity in Minkowski spacetime. Note that this four-velocity is such that $\mathbf{u}\cdot\mathbf{u} = -1$ (easily demonstrated using the $- +++$ metric of Minkowski spacetime).

Thus we have

$\mathbf{p} = m \mathbf{u} = \hbar \mathbf{k}$

so

$\mathbf{u} = \frac{\hbar}{m} \mathbf{k}$

In the prelude to Exercise 2.1, MTW say

relabel the surfaces of $\mathbf{\tilde{k}}$ by $\hbar \times \text{phase}$, thereby obtaining the momentum 1-form $\mathbf{\tilde{p}}$. Pierce this 1-form with any vector $\mathbf{v}$, and find the result that $\mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v})$.

Following the authors’ instructions, we relabel the surfaces of $\mathbf{\tilde{k}}$ (i.e., the loci of constant phase) by multiplying by $\hbar$ to get the 1-form

$\mathbf{\tilde{p}} = \hbar \mathbf{\tilde{k}} = \hbar \langle \mathbf{k}, \ \rangle$

As usual, this 1-form is a linear function waiting for a vector input. Supplying the input $\mathbf{v}$ we then get

$\mathbf{\tilde{p}}(\mathbf{v}) = \hbar \langle \mathbf{k}, \mathbf{v} \rangle = \hbar \mathbf{k} \cdot \mathbf{v}$

But this is exactly what we get when we work out $\mathbf{p} \cdot \mathbf{v}$ since

$\mathbf{p} \cdot \mathbf{v} = m \mathbf{u} \cdot \mathbf{v} = m \frac{\hbar}{m} \mathbf{k} \cdot \mathbf{v} = \hbar \mathbf{k} \cdot \mathbf{v}$

Thus, we have solved Exercise 2.1 by showing that $\mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v})$ is in accord with the quantum mechanical properties of de Broglie waves, as claimed by MTW.
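The relabelling step can also be checked numerically. The following is a minimal sketch, assuming for simplicity an ordinary Euclidean dot product on spatial 3-vectors (not the full Minkowski product); the wave vector and test vector are hypothetical, and the helper names `one_form` and `scale` are my own.

```python
HBAR = 1.054571817e-34   # reduced Planck constant in J s

def one_form(w):
    """Return the 1-form w~ = <w, > as a function awaiting a vector."""
    def w_tilde(v):
        return sum(wi * vi for wi, vi in zip(w, v))
    return w_tilde

def scale(c, form):
    """Relabel the surfaces of a 1-form by the constant factor c."""
    return lambda v: c * form(v)

k = (1.0e10, -2.0e10, 0.5e10)   # hypothetical wave vector in rad/m
v = (0.3, 0.1, -0.7)            # arbitrary test vector

k_tilde = one_form(k)
p_tilde = scale(HBAR, k_tilde)              # p~ = hbar k~
p = tuple(HBAR * ki for ki in k)            # p = hbar k
p_dot_v = sum(pi * vi for pi, vi in zip(p, v))

print(p_tilde(v), p_dot_v)      # the two numbers should agree
```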

Alternative approaches to formulating geodesic equations on Riemannian manifolds and proof of their equivalence

A geodesic can be defined as an extremal path between two points on a manifold in the sense that it minimises or maximises some criterion of interest (e.g., minimises distance travelled, maximises proper time, etc). Such a path will satisfy some geodesic equations equivalent to the Euler-Lagrange equations of the calculus of variations. A geodesic can also be defined in a conceptually different way as the ‘straightest’ possible path between two points on a manifold. In this case the path will satisfy geodesic equations derived by requiring parallel transport of a tangent vector along the path. Although these are conceptually different ways of defining geodesics, they are mathematically equivalent. In the present note I want to explore the derivation of geodesic equations in these two different ways and prove their mathematical equivalence.

Now, in the calculus of variations we typically define a system’s action $S$ to be the time-integral of a Lagrangian $L$:

$S \equiv \int^{t_B}_{t_A} L(q_i, \dot{q_i}) dt$

where $L(q_i, \dot{q_i})$ says that the Lagrangian is a function of position coordinates $q_i$ and velocities $\dot{q_i}$ (and $i$ ranges over however many coordinates there are). We find the trajectory that yields a desired extremal value of the action $S$ as the one that satisfies the Euler-Lagrange equations

$0 = \frac{d}{dt} \big(\frac{\partial L}{\partial \dot{q_i}} \big) - \frac{\partial L}{\partial q_i}$

Let us now suppose that we are facing an exactly analogous situation in which there are two points on the manifold, $A$ and $B$, and we are considering possible paths between them to try to find the extremal one. We can describe any path between $A$ and $B$ by specifying the coordinates of the points along it as functions of a parameter $\sigma$ that goes from a value of $0$ at $A$ to a value of $1$ at $B$, i.e., by specifying the functions $x^{\mu}(\sigma)$. Noting that the line element can be written as

$ds^2 = g_{\mu \gamma} dx^{\mu} dx^{\gamma}$

we can write the length of a particular path as

$s = \int \sqrt{ds^2} = \int^1_0 \sqrt{g_{\mu \gamma} \frac{dx^{\mu}}{d \sigma} \frac{dx^{\gamma}}{d \sigma}} d \sigma$
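To make this length integral concrete, here is a minimal Python sketch that evaluates it numerically for a given path and metric. As an assumption for illustration, I use the plane polar metric $g_{rr} = 1$, $g_{\theta \theta} = r^2$ and a quarter circle of radius 2, whose length should be $\pi$; the function names are hypothetical.

```python
import math

def polar_metric(x):
    """Assumed metric components g for plane polar coordinates x = (r, theta)."""
    r, _ = x
    return [[1.0, 0.0], [0.0, r * r]]

def path_length(path, metric, n=2000, h=1e-6):
    """Midpoint-rule estimate of the integral of sqrt(g x' x') for sigma in [0, 1]."""
    total = 0.0
    for i in range(n):
        sigma = (i + 0.5) / n
        x = path(sigma)
        # central-difference estimate of the velocity dx/dsigma
        xp, xm = path(sigma + h), path(sigma - h)
        xdot = [(a - b) / (2 * h) for a, b in zip(xp, xm)]
        g = metric(x)
        speed_sq = sum(g[m][c] * xdot[m] * xdot[c]
                       for m in range(2) for c in range(2))
        total += math.sqrt(speed_sq) / n
    return total

# Quarter circle of radius 2: r = 2, theta = (pi / 2) * sigma
quarter_circle = lambda s: (2.0, 0.5 * math.pi * s)
length = path_length(quarter_circle, polar_metric)
print(length, math.pi)
```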

Note that the metric is a function of the coordinates of points along the path, which in turn are functions of the parameter $\sigma$, i.e., $g_{\mu \gamma} = g_{\mu \gamma}(x^{\alpha}(\sigma))$. This situation is exactly analogous to the usual calculus of variations scenario because, writing $\dot{x}^{\alpha} \equiv d x^{\alpha}/d \sigma$, we see that we have a Lagrangian function

$L(x^{\alpha}, \dot{x}^{\alpha}) = \sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

and we hope to find the path $x^{\mu}(\sigma)$ that makes the integral of the Lagrangian extreme. This will be the path that satisfies the Euler-Lagrange equations

$0 = \frac{d}{d \sigma} \big(\frac{\partial L}{\partial \dot{x}^{\alpha}}\big) - \frac{\partial L}{\partial x^{\alpha}}$

This corresponds to $N$ separate differential equations in an $N$-dimensional manifold, one equation for each value of the index $\alpha$.

We can manipulate the Euler-Lagrange equations to get geodesic equations which are easier to use in particular contexts. First, note that

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{\partial}{\partial \dot{x}^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} (g_{\mu \gamma} \ \delta^{\mu}_{\alpha} \ \dot{x}^{\gamma} + g_{\mu \gamma} \ \dot{x}^{\mu} \ \delta^{\ \gamma}_{\alpha})$

because, for example, $\partial \dot{x}^{\mu}/\partial \dot{x}^{\alpha} = \delta^{\mu}_{\alpha}$. Also note that the metric is treated as a constant as it depends on $x^{\alpha}$ not on $\dot{x}^{\alpha}$. Doing the sums over the Kronecker deltas we get

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{2L}(g_{\alpha \gamma} \ \dot{x}^{\gamma} + g_{\mu \alpha} \ \dot{x}^{\mu})$

$= \frac{1}{2L}(g_{\alpha \mu} \ \dot{x}^{\mu} + g_{\alpha \mu} \ \dot{x}^{\mu})$

$= \frac{1}{L} g_{\alpha \mu} \ \dot{x}^{\mu}$
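This closed form can be compared against a direct finite-difference derivative of the Lagrangian. The sketch below assumes, purely for illustration, the plane polar metric $g_{rr} = 1$, $g_{\theta \theta} = r^2$ and an arbitrary point and velocity.

```python
import math

def g(x):
    """Assumed plane polar metric at x = (r, theta)."""
    r, _ = x
    return [[1.0, 0.0], [0.0, r * r]]

def lagrangian(x, xdot):
    """L = sqrt(g_{mu gamma} xdot^mu xdot^gamma)."""
    gx = g(x)
    return math.sqrt(sum(gx[m][c] * xdot[m] * xdot[c]
                         for m in range(2) for c in range(2)))

x = (1.5, 0.4)        # arbitrary point (r, theta)
xdot = (0.3, -0.8)    # arbitrary velocity (dr/dsigma, dtheta/dsigma)

h = 1e-6
L = lagrangian(x, xdot)
numeric, closed_form = [], []
for a in range(2):
    up = list(xdot)
    up[a] += h
    dn = list(xdot)
    dn[a] -= h
    # central difference in the velocity component xdot^a
    numeric.append((lagrangian(x, up) - lagrangian(x, dn)) / (2 * h))
    # (1/L) g_{a mu} xdot^mu
    closed_form.append(sum(g(x)[a][m] * xdot[m] for m in range(2)) / L)

print(numeric)
print(closed_form)
```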

But notice that since

$s = \int L d \sigma$

we have

$\frac{ds}{d \sigma} = L$

so

$\frac{1}{L} = \frac{d \sigma}{ds}$

and we can write

$\frac{\partial L}{\partial \dot{x}^{\alpha}} = \frac{1}{L} g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{d \sigma} \frac{d \sigma}{ds}$

$= g_{\alpha \mu} \frac{d x^{\mu}}{ds}$

Next, we have

$\frac{\partial L}{\partial x^{\alpha}} = \frac{\partial}{\partial x^{\alpha}}\sqrt{g_{\mu \gamma} \ \dot{x}^{\mu} \ \dot{x}^{\gamma}}$

$= \frac{1}{2L} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \dot{x}^{\mu} \dot{x}^{\gamma}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{d \sigma} \frac{d x^{\gamma}}{d \sigma} \frac{d \sigma}{ds}$

$= \frac{1}{2} \frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Putting these results into the Euler-Lagrange equations we get

$0 = \frac{d}{d \sigma} \big(g_{\alpha \mu} \frac{d x^{\mu}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{d \sigma}$

Finally, multiplying through by $d \sigma/ds$ we get

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

where I have also renamed $\mu \rightarrow \beta$ in the first term to make it clearer that the Einstein summations in the first and second terms are independent. This is the first version of the geodesic equations, derived by requiring that the path between the points $A$ and $B$ should be extremal in the sense of satisfying the Euler-Lagrange equations of the calculus of variations.

We will now derive a second version of the geodesic equations by requiring the geodesic to be a path that is locally straight. In differential geometry a path is defined as straight if it parallel transports its own tangent vector, i.e., if the tangent vector does not change as we move an infinitesimal step along the path. If we take an arbitrary point on the path to be $x^{\mu} \ e_{\mu}$ and we take $ds$ to be an infinitesimal displacement along the path, then a tangent vector to the path is

$\frac{d x^{\mu}}{d \sigma} e_{\mu}$

and we want

$\frac{d}{ds}\big(\frac{d x^{\mu}}{d \sigma}e_{\mu} \big) = \frac{d^2 x^{\mu}}{ds d \sigma} e_{\mu} + \frac{d x^{\mu}}{d \sigma} \frac{d e_{\mu}}{ds} = 0$

Multiplying through by $d \sigma/ds$ this gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d e_{\mu}}{ds} = 0$

But

$\frac{d e_{\mu}}{ds} = \frac{\partial e_{\mu}}{\partial x^{\gamma}} \frac{d x^{\gamma}}{d s}$

$= \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha}$

Putting this into the equation gives

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds} \Gamma^{\alpha}_{\hphantom{\alpha} \mu \gamma} e_{\alpha} = 0$

To enable us to factor out the basis vector we can rename the indices in the second term as $\mu \rightarrow \alpha$ and $\gamma \rightarrow \beta$ to get

$\frac{d^2 x^{\mu}}{ds^2} e_{\mu} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} e_{\mu} = 0$

$\iff$

$\big[\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} \big] e_{\mu} = 0$

$\implies$

$\frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} = 0$

This is the second version of the geodesic equations, derived by assuming that the path between the two points on the manifold is locally straight.
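To see these equations in action, here is a minimal Python sketch that integrates them in plane polar coordinates with a hand-written fourth-order Runge-Kutta step. It assumes the standard plane polar Christoffel symbols $\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$ and $\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} = 1/r$ (all others zero), quoted here without derivation. In the flat plane the geodesic should come out as a straight line when converted back to Cartesian coordinates.

```python
import math

def geodesic_rhs(state):
    """Right-hand side for state = (r, theta, dr/ds, dtheta/ds)."""
    r, theta, dr, dtheta = state
    d2r = r * dtheta * dtheta            # from Gamma^r_{theta theta} = -r
    d2theta = -2.0 * dr * dtheta / r     # from Gamma^theta_{r theta} = 1/r (twice)
    return (dr, dtheta, d2r, d2theta)

def rk4_step(state, ds):
    """One classical Runge-Kutta step of size ds."""
    def shifted(base, slope, factor):
        return tuple(b + factor * s for b, s in zip(base, slope))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, ds / 2))
    k3 = geodesic_rhs(shifted(state, k2, ds / 2))
    k4 = geodesic_rhs(shifted(state, k3, ds))
    return tuple(s + ds / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

# Start at (x, y) = (1, 0) moving in the +y direction with unit speed:
# r = 1, theta = 0, dr/ds = 0, dtheta/ds = 1.
state = (1.0, 0.0, 0.0, 1.0)
steps, ds = 1000, 0.001
for _ in range(steps):
    state = rk4_step(state, ds)

r, theta = state[0], state[1]
x, y = r * math.cos(theta), r * math.sin(theta)
print(x, y)   # the straight line x = 1, y = s predicts (1, 1) at s = 1
```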

We now have two seemingly different versions of the geodesic equations, namely

$0 = \frac{d}{ds} \big(g_{\alpha \beta} \frac{d x^{\beta}}{ds} \big) - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

and

$0 = \frac{d^2 x^{\mu}}{ds^2} + \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta}$

We will next show that they are in fact mathematically equivalent. Starting from the first version, we can expand out the brackets to get

$0 = \frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

$\iff$

$0 = \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}}\frac{dx^{\sigma}}{ds} \frac{dx^{\beta}}{ds} + g_{\alpha \beta} \frac{d^2 x^{\beta}}{ds^2} - \frac{1}{2}\frac{\partial g_{\mu \gamma}}{\partial x^{\alpha}} \frac{d x^{\mu}}{ds} \frac{d x^{\gamma}}{ds}$

Now we rename the indices as follows: $\sigma \rightarrow \alpha$ in the first term; $\sigma \rightarrow \beta$ in the second term; $\beta \rightarrow \mu$ and $\alpha \rightarrow \sigma$ in the third term; and $\alpha \rightarrow \sigma$, $\mu \rightarrow \alpha$, $\gamma \rightarrow \beta$ in the fourth term. We get

$0 = \frac{1}{2}\frac{\partial g_{\sigma \beta}}{\partial x^{\alpha}}\frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + \frac{1}{2}\frac{\partial g_{\sigma \alpha}}{\partial x^{\beta}} \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2} - \frac{1}{2}\frac{\partial g_{\alpha \beta}}{\partial x^{\sigma}} \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds}$

We can write this as

$0 = \frac{dx^{\alpha}}{ds} \frac{dx^{\beta}}{ds} \frac{1}{2} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}] + g_{\sigma \mu} \frac{d^2 x^{\mu}}{ds^2}$

Finally, multiplying through by $g^{\nu \sigma}$ (and summing over $\sigma$), and using the facts that

$g^{\nu \sigma} \ g_{\sigma \mu} = \delta^{\nu}_{\mu}$

and

$\Gamma^{\nu}_{\hphantom{\nu} \alpha \beta} = \frac{1}{2} g^{\nu \sigma} [\partial_{\alpha} \ g_{\beta \sigma} + \partial_{\beta} \ g_{\sigma \alpha} - \partial_{\sigma} \ g_{\alpha \beta}]$

we get, after renaming the free index $\nu \rightarrow \mu$,

$0 = \frac{d x^{\alpha}}{ds} \frac{d x^{\beta}}{ds} \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} + \frac{d^2 x^{\mu}}{ds^2}$

which is the second version of the geodesic equation. Thus, the two versions are equivalent as claimed.
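The equivalence can also be illustrated numerically on a concrete example. The following minimal sketch assumes the plane polar metric $g_{rr} = 1$, $g_{\theta \theta} = r^2$ (so that the only non-zero Christoffel symbols are $\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$ and $\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} = 1/r$) and takes the straight line $x = 1$, $y = s$, i.e., $r(s) = \sqrt{1 + s^2}$, $\theta(s) = \arctan s$. It evaluates the left-over (residual) of each version of the geodesic equations by finite differences; both should vanish up to discretisation error.

```python
import math

def r_of(s):
    return math.sqrt(1.0 + s * s)

def theta_of(s):
    return math.atan(s)

def d(f, s, h=1e-4):
    """Central-difference first derivative."""
    return (f(s + h) - f(s - h)) / (2 * h)

def d2(f, s, h=1e-4):
    """Central-difference second derivative."""
    return (f(s + h) - 2 * f(s) + f(s - h)) / (h * h)

def first_version(s):
    """Residuals of d/ds(g_ab x'^b) - (1/2) d_a g_mc x'^m x'^c for a = r, theta."""
    r, dth = r_of(s), d(theta_of, s)
    res_r = d(lambda u: d(r_of, u), s) - r * dth ** 2
    res_theta = d(lambda u: r_of(u) ** 2 * d(theta_of, u), s)
    return res_r, res_theta

def second_version(s):
    """Residuals of x''^m + Gamma^m_ab x'^a x'^b for m = r, theta."""
    r, dr, dth = r_of(s), d(r_of, s), d(theta_of, s)
    res_r = d2(r_of, s) - r * dth ** 2
    res_theta = d2(theta_of, s) + 2.0 * dr * dth / r
    return res_r, res_theta

residuals = [first_version(s) + second_version(s) for s in (-0.5, 0.0, 0.8)]
for row in residuals:
    print(row)   # all four entries in each row should be close to zero
```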

Geometric interpretation of Christoffel symbols and some alternative approaches to calculating them

In a classic paper in 1869, Elwin Bruno Christoffel (1829-1900) introduced his famous Christoffel symbols $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ to represent an array of numbers describing a metric connection. They are also known as connection coefficients (and sometimes less respectfully as ‘Christ-awful symbols’). In differential geometry one usually first encounters them when studying covariant derivatives of tensors in tensor calculus. For example, suppose we try to differentiate the contravariant vector $A = A^{\alpha} e_{\alpha}$, where $e_{\alpha}$ denotes a coordinate basis vector (and we are using the Einstein summation convention). We get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \frac{\partial e_{\alpha}}{\partial x^{\beta}}$

In general, the partial derivative in the second term on the right will result in another vector which we can write in terms of its coordinate basis as

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

This defines the Christoffel symbol $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$. The downstairs indices indicate that we are considering the rate of change of the basis vector $e_{\alpha}$ with respect to the coordinate variable $x^{\beta}$, and the upstairs index $\gamma$ picks out the component of this rate of change in the direction of the coordinate basis vector $e_{\gamma}$. Substituting the second equation into the first we get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\alpha} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

To enable us to factor out the coordinate basis vector we can exchange the symbols $\alpha$ and $\gamma$ in the second term on the right to get

$\frac{\partial A}{\partial x^{\beta}} = \frac{\partial A^{\alpha}}{\partial x^{\beta}} e_{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta} \ e_{\alpha}$

$= \big( \frac{\partial A^{\alpha}}{\partial x^{\beta}} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}\big) \ e_{\alpha}$

The expression in the bracket is called the covariant derivative of the contravariant vector $A$, i.e., the rate of change of $A^{\alpha}$ in each of the directions $\beta$ of the coordinate system $x^{\beta}$. It has the important property that it is itself tensorial (unlike the ordinary partial derivative of the tensor on its own). This covariant derivative is often written using the notation

$\nabla_{\beta} \ A^{\alpha} = \partial_{\beta} \ A^{\alpha} + A^{\gamma} \Gamma^{\alpha}_{\hphantom{\alpha} \gamma \beta}$
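A small numerical example may help to show what the extra $\Gamma$ term is doing. The sketch below works in plane polar coordinates and assumes the standard values $\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$ and $\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} = 1/r$ (all others zero), quoted here as an assumption. It takes the constant Cartesian vector field $e_x$, whose polar components are $A^r = \cos \theta$ and $A^{\theta} = -\sin \theta / r$. These components vary from point to point, so their partial derivatives are non-zero, but the covariant derivative should vanish because the field itself is constant.

```python
import math

def gamma(up, lo1, lo2, r):
    """Assumed plane polar Christoffel symbols; index 0 = r, 1 = theta."""
    if (up, lo1, lo2) == (0, 1, 1):
        return -r
    if (up, lo1, lo2) in ((1, 0, 1), (1, 1, 0)):
        return 1.0 / r
    return 0.0

def A(x):
    """Polar components (A^r, A^theta) of the constant Cartesian field e_x."""
    r, theta = x
    return (math.cos(theta), -math.sin(theta) / r)

def covariant_derivative(field, x, h=1e-6):
    """nabla[beta][alpha] = d_beta A^alpha + A^gamma Gamma^alpha_{gamma beta}."""
    r = x[0]
    nabla = [[0.0, 0.0], [0.0, 0.0]]
    for beta in range(2):
        xp = list(x)
        xp[beta] += h
        xm = list(x)
        xm[beta] -= h
        Ap, Am = field(xp), field(xm)
        for alpha in range(2):
            partial = (Ap[alpha] - Am[alpha]) / (2 * h)
            connection = sum(field(x)[c] * gamma(alpha, c, beta, r)
                             for c in range(2))
            nabla[beta][alpha] = partial + connection
    return nabla

nabla_A = covariant_derivative(A, (1.7, 0.6))
print(nabla_A)   # every entry should be close to zero
```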

Having thus established the meaning of the Christoffel symbols, one then goes on to work out that the covariant derivative of a one-form is

$\nabla_{\beta} \ A_{\alpha} = \partial_{\beta} \ A_{\alpha} - A_{\gamma} \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$

and that the covariant derivatives of higher rank tensors are constructed from the building blocks of $\nabla_{\beta} \ A^{\alpha}$ and $\nabla_{\beta} \ A_{\alpha}$ by adding a $\Gamma$ term of the first kind for each upper index and subtracting a $\Gamma$ term of the second kind for each lower index. For example, the covariant derivative of the $(1, 1)$ rank-2 tensor $X^{\mu}_{\hphantom{\mu} \sigma}$ is

$\nabla_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} = \partial_{\beta} \ X^{\mu}_{\hphantom{\mu} \sigma} + X^{\alpha}_{\hphantom{\alpha} \sigma} \ \Gamma^{\mu}_{\hphantom{\mu} \alpha \beta} - X^{\mu}_{\hphantom{\mu} \alpha} \ \Gamma^{\alpha}_{\hphantom{\alpha} \sigma \beta}$

Christoffel symbols then go on to play vital roles in other areas of differential geometry, perhaps most notably as key components in the definition of the Riemann curvature tensor.

It is possible to have a working knowledge of all of this without truly understanding at a deep level, say geometrically, what Christoffel symbols really mean. In the present note I want to delve a bit more deeply into how one might calculate and interpret Christoffel symbols geometrically. I also want to explore some alternative ways of calculating them in the context of a simple plane polar coordinate system $(r, \theta)$ which is related to the usual Cartesian $(x, y)$ coordinate system via the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

In an $n$-dimensional manifold there are potentially $n^3$ Christoffel symbols to be calculated, though this number is usually reduced by symmetries. In the present plane polar coordinate case, we will need to calculate $2^3 = 8$ Christoffel symbols. These are

$\Gamma^{r}_{\hphantom{r} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta}$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r}$

$\Gamma^{r}_{\hphantom{r} \theta r}$

$\Gamma^{r}_{\hphantom{r} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r r}$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta}$

$\Gamma^{r}_{\hphantom{r} r \theta}$

Geometric approach

Consider the situation shown in the diagram below where two vectors $(e_{\theta})_P$ and $(e_{\theta})_S$ of the basis vector field $e_{\theta}$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_{\theta})_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

Now, in plane polar coordinates the magnitude of $e_{\theta}$ is

$|e_{\theta}| = r$

Therefore the length of the arc $L$ in the diagram is

$L = r \Delta \theta$

If $\Delta \theta$ is small, we have

$L \approx |\Delta_{\theta} e_{\theta}|$

where $\Delta_{\theta} e_{\theta}$ is the vector connecting the endpoints of $(e_{\theta})_P$ and $(e_{\theta})_S$, i.e., $\Delta_{\theta} e_{\theta} = (e_{\theta})_S - (e_{\theta})_P$.

Therefore

$|\Delta_{\theta} e_{\theta}| \approx r \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we get

$|d_{\theta} e_{\theta}| = r d \theta$

From the diagram we see that $d_{\theta} e_{\theta}$ points in the opposite direction of $e_r$. Therefore we have

$d_{\theta} e_{\theta} = - r d \theta e_r$

(note that in plane polar coordinates $e_r$ is of unit length). From this equation we have

$\frac{d_{\theta} e_{\theta}}{d \theta} \equiv \frac{\partial e_{\theta}}{\partial \theta} = -r e_r$

But from the definition of Christoffel symbols we have

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta}$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

We have obtained the first two Christoffel symbols on our list from the geometric setup and the nice thing about this approach is that we can see what the underlying changes in the coordinate basis vectors looked like.

To obtain the next two Christoffel symbols on our list, we consider a change in the vector field $e_{\theta}$ due to a displacement in the radial direction from $P$ to $Q$ in the following diagram:

We have moved outwards by a small amount $\Delta r$ and as a result the length of the vectors in the vector field $e_{\theta}$ has increased by a small amount $|\Delta_r e_{\theta}|$ shown in the diagram. From the diagram we see that the proportions of the two increases must be the same, so we have

$\frac{|\Delta_r e_{\theta}|}{|e_{\theta}|} = \frac{\Delta r}{r}$

or

$|\Delta_r e_{\theta}| = \Delta r \frac{1}{r} |e_{\theta}|$

Passing to the differential limit as $\Delta r \rightarrow 0$ we get

$|d_r e_{\theta}| = dr \frac{1}{r} |e_{\theta}|$

Since $d_r e_{\theta}$ is directed along the vector $e_{\theta}$ we can write the vector equation

$d_r e_{\theta} = dr \frac{1}{r} e_{\theta}$

so

$\frac{d_r e_{\theta}}{dr} \equiv \frac{\partial e_{\theta}}{\partial r} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

We have thus found two more Christoffel symbols from the geometrical setup. To get the next two Christoffel symbols on our list we observe that the basis vector field $e_r$ does not change as we move in the radial direction (either in magnitude or direction) so we must have

$\frac{\partial e_r}{\partial r} = 0$

where the right-hand side here denotes a zero vector. But we know that

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta}$

so we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Finally, to get the last two remaining Christoffel symbols on our list, we consider a change in the vector field $e_r$ due to an angular displacement. In the diagram below two vectors $(e_r)_P$ and $(e_r)_S$ of the basis vector field $e_r$ are drawn emanating from points $P$ and $S$ respectively:

If we parallel transport the vector $(e_r)_P$ from $P$ to $S$ we end up with the situation shown in the next diagram:

The arc length $L$ is

$L = |e_r| \Delta \theta = \Delta \theta$

(since the magnitude of the coordinate basis vector $e_r$ is $|e_r| = 1$). But for small $\Delta \theta$ we also have

$L \approx |\Delta_{\theta} e_r|$

where $\Delta_{\theta} e_r$ is the vector connecting the endpoints of $(e_r)_P$ and $(e_r)_S$, i.e., $\Delta_{\theta} e_r = (e_r)_S - (e_r)_P$. Therefore

$|\Delta_{\theta} e_r| \approx \Delta \theta$

Passing to the differential limit as $\Delta \theta \rightarrow 0$ we have

$|d_{\theta} e_r| = d \theta$

But $d_{\theta} e_r$ has the same direction as $e_{\theta}$. Therefore

$d_{\theta} e_r = \frac{1}{r} d \theta e_{\theta}$

where the factor $\frac{1}{r}$ is needed to correct for the magnitude $r$ of $e_{\theta}$ (we only want the direction of $e_{\theta}$ here). Therefore we see that

$\frac{d_{\theta} e_r}{d \theta} \equiv \frac{\partial e_r}{\partial \theta} = \frac{1}{r} e_{\theta}$

But

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

This completes the geometric calculation of all the Christoffel symbols for plane polar coordinates.

Algebraic approach

It is possible to calculate the eight Christoffel symbols quite easily for plane polar coordinates by first expressing the basis vectors $e_r$ and $e_{\theta}$ in terms of the Cartesian basis vectors $e_x$ and $e_y$. Note that coordinate basis vectors transform in the same way as the components of a one-form, i.e., as

$e^{\prime}_{\alpha} = \frac{\partial x^{\beta}}{\partial x^{\prime \alpha}} e_{\beta}$

We use the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

to calculate the coefficients. We get

$e_r = \frac{\partial x}{\partial r} e_x + \frac{\partial y}{\partial r} e_y$

$e_{\theta} = \frac{\partial x}{\partial \theta} e_x + \frac{\partial y}{\partial \theta} e_y$

and therefore

$e_r = \cos \theta e_x + \sin \theta e_y$

$e_{\theta} = -r \sin \theta e_x + r \cos \theta e_y$

Then we calculate the Christoffel symbols as follows. First,

$\frac{\partial e_r}{\partial r} = 0$

so

$\frac{\partial e_r}{\partial r} = \Gamma^{r}_{\hphantom{r} r r} e_r + \Gamma^{\theta}_{\hphantom{\theta} r r} e_{\theta} = 0$

and we conclude

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

Next,

$\frac{\partial e_r}{\partial \theta} = - \sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_r}{\partial \theta} = \Gamma^{\theta}_{\hphantom{\theta} r \theta} e_{\theta} + \Gamma^{r}_{\hphantom{r} r \theta} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$

Next,

$\frac{\partial e_{\theta}}{\partial \theta} = -r \cos \theta e_x - r \sin \theta e_y = -r e_r$

so

$\frac{\partial e_{\theta}}{\partial \theta} = \Gamma^{r}_{\hphantom{r} \theta \theta} e_r + \Gamma^{\theta}_{\hphantom{\theta} \theta \theta} e_{\theta} = -r e_r$

Therefore we conclude

$\Gamma^{r}_{\hphantom{r} \theta \theta} = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

Finally,

$\frac{\partial e_{\theta}}{\partial r} = -\sin \theta e_x + \cos \theta e_y = \frac{1}{r} e_{\theta}$

so

$\frac{\partial e_{\theta}}{\partial r} = \Gamma^{\theta}_{\hphantom{\theta} \theta r} e_{\theta} + \Gamma^{r}_{\hphantom{r} \theta r} e_r = \frac{1}{r} e_{\theta}$

from which we conclude

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$
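The algebraic calculation lends itself to a simple numerical check. In the following minimal Python sketch, the basis vectors $e_r$ and $e_{\theta}$ are written in Cartesian components, differentiated by central differences, and the result is projected back onto $e_r$ and $e_{\theta}$ (which are orthogonal, so each coefficient is a dot product divided by the squared length of the target basis vector). The function names and the evaluation point are hypothetical.

```python
import math

def e_r(r, theta):
    """Cartesian components of the polar basis vector e_r."""
    return (math.cos(theta), math.sin(theta))

def e_theta(r, theta):
    """Cartesian components of the polar basis vector e_theta."""
    return (-r * math.sin(theta), r * math.cos(theta))

BASIS = {"r": e_r, "theta": e_theta}

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def christoffel(upper, lower, wrt, r, theta, h=1e-6):
    """Gamma^upper_{lower wrt}: component along e_upper of d(e_lower)/d(x^wrt)."""
    e = BASIS[lower]
    if wrt == "r":
        plus, minus = e(r + h, theta), e(r - h, theta)
    else:
        plus, minus = e(r, theta + h), e(r, theta - h)
    derivative = tuple((p - m) / (2 * h) for p, m in zip(plus, minus))
    target = BASIS[upper](r, theta)
    # e_r and e_theta are orthogonal, so project and divide by |e_upper|^2
    return dot(derivative, target) / dot(target, target)

r0, theta0 = 2.0, 0.9
symbols = {(u, l, w): christoffel(u, l, w, r0, theta0)
           for u in BASIS for l in BASIS for w in BASIS}
for key, value in symbols.items():
    print(key, round(value, 6))
```

At $r = 2$ the non-zero entries should be close to $-2$ for $\Gamma^{r}_{\hphantom{r} \theta \theta}$ and $1/2$ for $\Gamma^{\theta}_{\hphantom{\theta} r \theta}$ and $\Gamma^{\theta}_{\hphantom{\theta} \theta r}$.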

Metric tensor approach

The previous approach relied on knowing the functional relationship between the Cartesian coordinates $(x, y)$ and the plane polar coordinates $(r, \theta)$. There is another more generally useful method of calculating the Christoffel symbols from the components of the metric tensor, using the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

I will first derive this formula from first principles, then use it to find the Christoffel symbols for the plane polar coordinates case.
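As a preview of how the formula is used in practice, here is a minimal Python sketch that evaluates it by central differences. It assumes the plane polar metric $g_{rr} = 1$, $g_{\theta \theta} = r^2$, $g_{r \theta} = g_{\theta r} = 0$ (consistent with $|e_r| = 1$ and $|e_{\theta}| = r$), with the inverse metric written out by hand; the indices 0 and 1 stand for $r$ and $\theta$.

```python
def metric(x):
    """Assumed plane polar metric g_ab at x = (r, theta)."""
    r, _ = x
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_metric(x):
    """Inverse metric g^ab for the diagonal polar metric."""
    r, _ = x
    return [[1.0, 0.0], [0.0, 1.0 / (r * r)]]

def d_metric(x, k, h=1e-6):
    """Central-difference partial derivative of every g_ab with respect to x^k."""
    xp = list(x)
    xp[k] += h
    xm = list(x)
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)]
            for a in range(2)]

def christoffel(c, a, b, x):
    """Gamma^c_{ab} = (1/2) g^{cm} (d_b g_{am} + d_a g_{mb} - d_m g_{ab})."""
    ginv = inverse_metric(x)
    dg = [d_metric(x, k) for k in range(2)]   # dg[k][i][j] = d_k g_ij
    return 0.5 * sum(
        ginv[c][m] * (dg[b][a][m] + dg[a][m][b] - dg[m][a][b])
        for m in range(2)
    )

x0 = (2.0, 0.9)
gamma = [[[christoffel(c, a, b, x0) for b in range(2)] for a in range(2)]
         for c in range(2)]
print(gamma[0][1][1])                    # Gamma^r_{theta theta}
print(gamma[1][0][1], gamma[1][1][0])    # Gamma^theta_{r theta}, Gamma^theta_{theta r}
```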

The first step is to show that Christoffel symbols are symmetric in their lower indices, i.e.,

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

as this property will be needed in the derivation of the formula. To prove the symmetry property we start from the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Suppose we now decompose the basis vectors $e_{\alpha}$ in a local Cartesian coordinate system. Then, since coordinate basis vectors transform in the same way as the components of a one-form, we have

$e_{\alpha} = \frac{\partial x^m}{\partial x^{\alpha}} e_m$

where the $x^m$ are the Cartesian coordinates and the $e_m$ are the coordinate basis vectors (which are constant in both magnitude and direction in the Cartesian system). Differentiating gives

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

Equating the expressions for $\frac{\partial e_{\alpha}}{\partial x^{\beta}}$ we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\alpha} \partial x^{\beta}} e_m$

But then

$\Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ e_{\gamma} = \frac{\partial^2 x^m}{\partial x^{\beta} \partial x^{\alpha}} e_m$

so it follows from Young’s Theorem (equality of cross-partials), together with the linear independence of the basis vectors $e_{\gamma}$, that

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha}$

We conclude that Christoffel symbols are symmetric in their lower indices, as claimed.

Note too that the components $g_{\mu \gamma}$ of the general metric tensor are also symmetric with respect to their indices. This follows from the defining equation of the metric tensor components in terms of the basis vector fields $e_{\gamma}$, namely

$g_{\mu \gamma} \equiv e_{\mu} \cdot e_{\gamma}$

Since $e_{\mu} \cdot e_{\gamma} = e_{\gamma} \cdot e_{\mu}$

the metric is symmetric, i.e.,

$g_{\mu \gamma} = g_{\gamma \mu}$

To derive the formula for the Christoffel symbols in terms of the metric tensor components, we begin again with the defining equation for Christoffel symbols,

$\frac{\partial e_{\alpha}}{\partial x^{\beta}} \equiv \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma}$

Taking the scalar product with another basis vector on both sides we get

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ e_{\gamma} \cdot e_{\mu} = \frac{\partial e_{\alpha}}{\partial x^{\beta}} \cdot e_{\mu}$

$= \frac{\partial (e_{\alpha} \cdot \ e_{\mu})}{\partial x^{\beta}} - e_{\alpha} \cdot \frac{\partial e_{\mu}}{\partial x^{\beta}}$

$= \frac{\partial g_{\alpha \mu}}{\partial x^{\beta}} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ e_{\alpha} \cdot \ e_{\rho}$

$= \partial_{\beta} \ g_{\alpha \mu} - \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

Therefore we have

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\rho}_{\hphantom{\rho} \mu \beta} \ g_{\alpha \rho}$

In the second term on the right hand side we can rename $\rho \rightarrow \gamma$ and use the fact that the metric is symmetric to reverse the indices. We get

$\partial_{\beta} \ g_{\alpha \mu} = \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu} + \Gamma^{\gamma}_{\hphantom{\gamma} \mu \beta} \ g_{\gamma \alpha}$

By cyclically renaming the indices $\beta$, $\alpha$, and $\mu$ we can generate two more similar equations. From the cyclic permutation $\beta$, $\alpha$, $\mu$ $\rightarrow$ $\alpha$, $\mu$, $\beta$ we get

$\partial_{\alpha} \ g_{\mu \beta} = \Gamma^{\gamma}_{\hphantom{\gamma} \mu \alpha} \ g_{\gamma \beta} + \Gamma^{\gamma}_{\hphantom{\gamma} \beta \alpha} \ g_{\gamma \mu}$

and from the cyclic permutation $\alpha$, $\mu$, $\beta$ $\rightarrow$ $\mu$, $\beta$, $\alpha$ we get

$\partial_{\mu} \ g_{\beta \alpha} = \Gamma^{\gamma}_{\hphantom{\gamma} \beta \mu} \ g_{\gamma \alpha} + \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \mu} \ g_{\gamma \beta}$

Now we add the first two equations and subtract the third to get

$\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha} = 2 \Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} \ g_{\gamma \mu}$

where we have taken advantage of the symmetry in the lower indices of the Christoffel symbols to cancel some terms. Using the fact that the inverse metric satisfies

$g^{\nu \mu} \ g_{\mu \gamma} = \delta^{\nu}_{\gamma}$

we multiply both sides by $\frac{1}{2}g^{\nu \mu}$ (summing over $\mu$) and then relabel the free index $\nu \rightarrow \gamma$ to get the final formula:

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2}g^{\mu \gamma}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\beta \alpha}]$

$= \frac{1}{2}g^{\gamma \mu}[\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

This is made easier to remember by noting the following facts. A factor of the inverse metric generates the Christoffel symbol’s upper index. The negative term has the symbol’s lower indices as the indices of the metric. The other two terms in the bracket are cyclic permutations of this last term.

Having derived the formula we can now employ it to calculate the eight Christoffel symbols for plane polar coordinates. We can work out the metric tensor using the distance formula

$ds^2 = dx^2 + dy^2$

with the conversion equations

$x = r \cos \theta$

$y = r \sin \theta$

Then

$dx = \cos \theta dr - r \sin \theta d \theta$

$dy = \sin \theta dr + r \cos \theta d \theta$

so

$dx^2 = \cos^2 \theta dr^2 + r^2 \sin^2 \theta d \theta^2 - 2 r \sin \theta \cos \theta dr d \theta$

$dy^2 = \sin^2 \theta dr^2 + r^2 \cos^2 \theta d \theta^2 + 2 r \sin \theta \cos \theta dr d \theta$

Therefore the metric in plane polar coordinates is

$ds^2 = dx^2 + dy^2 = dr^2 + r^2 d \theta^2$

The metric tensor is therefore

$[g_{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ \ \\ 0 & r^2 \end{pmatrix}$

and the inverse metric is

$[g^{\alpha \beta}] = \begin{pmatrix} 1 & 0 \\ \ \\ 0 & \frac{1}{r^2} \end{pmatrix}$
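The metric just obtained can also be checked numerically from the definition $g_{\mu \gamma} \equiv e_{\mu} \cdot e_{\gamma}$. The following is only a minimal sketch: the function names and the sample point $(r, \theta) = (2, 0.7)$ are illustrative assumptions, and the basis vectors are approximated by central differences of the conversion equations rather than computed exactly.

```python
import math


def polar_to_cartesian(coords):
    """Conversion equations x = r cos(theta), y = r sin(theta)."""
    r, theta = coords
    return [r * math.cos(theta), r * math.sin(theta)]


def basis_vectors(to_cartesian, coords, h=1e-6):
    """Cartesian components of e_alpha = d(x, y)/d(x^alpha), by central differences."""
    basis = []
    for alpha in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[alpha] += h
        minus[alpha] -= h
        p, m = to_cartesian(plus), to_cartesian(minus)
        basis.append([(pi - mi) / (2 * h) for pi, mi in zip(p, m)])
    return basis


def metric_from_map(to_cartesian, coords):
    """g_{alpha beta} = e_alpha . e_beta using the Euclidean dot product."""
    e = basis_vectors(to_cartesian, coords)
    return [[sum(u * v for u, v in zip(e_a, e_b)) for e_b in e] for e_a in e]


# Illustrative sample point (r, theta) = (2.0, 0.7)
g = metric_from_map(polar_to_cartesian, [2.0, 0.7])
print(g)  # close to [[1, 0], [0, r**2]] with r = 2
```

Up to finite-difference error, the printed matrix should agree with $[g_{\alpha \beta}]$ above evaluated at $r = 2$.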

Now, in the formula for $\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta}$ the indices $\alpha$, $\beta$, $\gamma$ and $\mu$ represent the polar coordinates $r$ and $\theta$ in various permutations. Inspection of $[g_{\alpha \beta}]$ shows that the only partial derivative of a metric component which does not equal zero is

$\partial_r \ g_{\theta \theta} = \partial_r (r^2) = 2r$

Inspection of $[g^{\alpha \beta}]$ shows that the only components of the inverse metric which do not equal zero are

$g^{rr} = 1$

and

$g^{\theta \theta} = \frac{1}{r^2}$

Substituting these values of the metric tensor components into the formula

$\Gamma^{\gamma}_{\hphantom{\gamma} \alpha \beta} = \frac{1}{2} g^{\gamma \mu} [\partial_{\beta} \ g_{\alpha \mu} + \partial_{\alpha} \ g_{\mu \beta} - \partial_{\mu} \ g_{\alpha \beta}]$

we get

$\Gamma^{r}_{\hphantom{r} \theta \theta} = \frac{1}{2} g^{r r} \big( - \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} (-2r) = -r$

$\Gamma^{\theta}_{\hphantom{\theta} \theta \theta} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} \theta r} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} \theta r} = 0$

$\Gamma^{r}_{\hphantom{r} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r r} = 0$

$\Gamma^{\theta}_{\hphantom{\theta} r \theta} = \frac{1}{2} g^{\theta \theta} \big( \partial_r \ g_{\theta \theta}\big) = \frac{1}{2} \frac{1}{r^2} 2r = \frac{1}{r}$

$\Gamma^{r}_{\hphantom{r} r \theta} = 0$
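The eight values above can be cross-checked by evaluating the metric formula numerically. The sketch below is not part of the derivation: it assumes a two-dimensional metric (the helper `inverse_2x2` only handles that case), approximates the partial derivatives $\partial_{\mu} \ g_{\alpha \beta}$ by central differences, and uses an arbitrarily chosen sample point $(r, \theta) = (2, 0.7)$. All function names are illustrative.

```python
def polar_metric(coords):
    """Metric components g_{alpha beta} for plane polar coordinates (r, theta)."""
    r, _theta = coords
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_2x2(g):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivatives(metric, coords, h=1e-5):
    """dg[mu][a][b] approximates partial_mu g_{ab} by central differences."""
    n = len(coords)
    dg = []
    for mu in range(n):
        plus = list(coords)
        minus = list(coords)
        plus[mu] += h
        minus[mu] -= h
        g_plus, g_minus = metric(plus), metric(minus)
        dg.append([[(g_plus[a][b] - g_minus[a][b]) / (2 * h)
                    for b in range(n)] for a in range(n)])
    return dg


def christoffel(metric, coords, h=1e-5):
    """gamma[c][a][b] approximates Gamma^c_{ab} from the metric formula."""
    n = len(coords)
    g_inv = inverse_2x2(metric(coords))
    dg = metric_derivatives(metric, coords, h)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for c in range(n):
        for a in range(n):
            for b in range(n):
                # (1/2) g^{c m} [d_b g_{a m} + d_a g_{m b} - d_m g_{a b}]
                gamma[c][a][b] = 0.5 * sum(
                    g_inv[c][m] * (dg[b][a][m] + dg[a][m][b] - dg[m][a][b])
                    for m in range(n))
    return gamma


R, THETA = 0, 1  # index labels for the coordinates (r, theta)
gamma = christoffel(polar_metric, [2.0, 0.7])
print(gamma[R][THETA][THETA])   # approximately -r = -2.0
print(gamma[THETA][R][THETA])   # approximately 1/r = 0.5
```

The index order `gamma[c][a][b]` mirrors $\Gamma^{c}_{\hphantom{c} a b}$, so the symmetry in the lower indices can be checked by comparing `gamma[c][a][b]` with `gamma[c][b][a]`.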

On Lie derivatives of tensor fields

Differential geometry provides a number of ways of extending the familiar notion of the derivative of a real-valued function to enable differentiation of various types of tensor fields that ‘live’ on manifolds, such as scalars, contravariant vectors, one-forms (also known as covariant vectors) and mixed tensors. The problem that needs to be overcome in such cases is the fact that partial differentiation of tensors is generally not tensorial, i.e., the result is not itself a tensor. The reason is that the process of differentiation involves subtracting tensors living in different tangent spaces on a curved manifold, so their difference does not transform in the same way as either of the tensors individually. For example, consider a contravariant vector field $X^a$. In the tangent space at point $P$ the transformation law for $X^a$ is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_P X^b$

whereas in the tangent space at point $Q$ the transformation law is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_Q X^b$

If we imagine these two tangent spaces at points $P$ and $Q$ on the manifold separated by distance $\delta u$, the derivative would involve computing

$\lim_{\delta u \rightarrow 0} \frac{[X^a]_Q - [X^a]_P}{\delta u}$

and the difference in the numerator will clearly not transform like either of the contravariant vectors individually since their transformation matrices are evaluated at different points. The derivative will not therefore be a tensor itself.

The usual way of getting around this is to introduce some kind of auxiliary field on to the manifold which provides a link between the two tensors, thus enabling them to transform in the same way (with respect to the auxiliary field) when they are subtracted. In the present note I want to explore the particular concept of the Lie derivative of a tensor field (named after the mathematician Marius Sophus Lie, 1842-1899) which employs this auxiliary field approach by introducing a contravariant vector field on to the manifold. To this end, suppose we define a vector field $X^a (x)$. This can be used to define streamlines in the manifold (also called a congruence of curves) as the solutions of the ordinary differential equations

$\frac{dx^a}{du} = X^a (x(u))$

where $u$ is a parameter determining the position on a given streamline. The equations encapsulate the fact that each point on a streamline has a tangent vector belonging to the vector field.

Example. Using the notation $x^i$ to denote the $i$-th coordinate (so that $x^2$ below is the second coordinate, not a square), suppose the vector field is $X = (1, x^2(u))$. Then $X^1 = 1$, $X^2 = x^2$ and the streamlines for this vector field are obtained by solving simultaneously the differential equations

$\frac{dx^1}{du} = 1$

$\frac{dx^2}{du} = x^2$

Solving the first equation gives

$x^1 = u + c^1$

Solving the second equation gives

$x^2 = e^{c^2}e^u$

Using the solution of the first equation to substitute for $u$ in the solution of the second one we get

$x^2 = e^{c^2} e^{x^1 - c^1} = e^{c^2 - c^1}e^{x^1} \equiv Ce^{x^1}$

Therefore the streamlines of the vector field $(1, x^2(u))$ are the graphs of the equation

$x^2 = Ce^{x^1}$

where $C$ is a constant. Some of the streamlines are shown in the figure below.
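The streamline equation can be checked numerically by integrating the differential equations directly. The sketch below assumes a classical fourth-order Runge-Kutta integrator (my choice of method, not something the example depends on) and an arbitrary starting point $(x^1, x^2) = (0, 1.5)$, for which $C = 1.5$.

```python
import math


def field(x):
    """The example vector field X = (1, x^2), with x = (x^1, x^2)."""
    return (1.0, x[1])


def rk4_step(x, du):
    """One classical fourth-order Runge-Kutta step of dx/du = X(x)."""
    k1 = field(x)
    k2 = field(tuple(xi + 0.5 * du * ki for xi, ki in zip(x, k1)))
    k3 = field(tuple(xi + 0.5 * du * ki for xi, ki in zip(x, k2)))
    k4 = field(tuple(xi + du * ki for xi, ki in zip(x, k3)))
    return tuple(xi + du * (a + 2 * b + 2 * c + d) / 6.0
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))


def streamline(start, du=0.01, steps=200):
    """Points along the streamline through `start`."""
    points = [start]
    for _ in range(steps):
        points.append(rk4_step(points[-1], du))
    return points


points = streamline((0.0, 1.5))
# Along a streamline, x^2 * exp(-x^1) should stay equal to the constant C.
constants = [x2 * math.exp(-x1) for x1, x2 in points]
print(min(constants), max(constants))  # both close to C = 1.5
```

Changing the starting point changes $C$, which selects a different member of the family of streamlines.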

Now suppose we want to find the Lie derivative of a tensor field, $T^{ab \cdots}_{cd \cdots}(x)$, using the vector field $X^a(x)$. The essential idea is to use the streamlines of the vector field to link the tensor at some point $P$, $T^{ab\cdots}_{cd \cdots}(P)$, with the tensor at some neighbouring point $Q$, $T^{ab\cdots}_{cd \cdots}(Q)$, in such a way that the two will have the same transformation matrix at point $Q$ (with respect to the auxiliary vector field). We can then subtract the two tensors at $Q$ and so define the derivative at $P$ by a limiting process as $Q$ tends to $P$. In all such cases, the technique begins by considering a coordinate transformation from $P$ to $Q$ of the form

$x^{\prime a} = x^a + \delta u X^a(x)$

where $\delta u$ is arbitrarily small. The point $Q$ with coordinates $x^{\prime a}$ lies on the streamline through $P$ which the vector field $X^a(x)$ generates. Differentiating the coordinate transformation we get

$\frac{\partial x^{\prime a}}{\partial x^b} = \frac{\partial x^a}{\partial x^b} + \delta u \ \partial_b X^a$

$= \delta^a_b + \delta u \ \partial_b X^a$

where $\delta^a_b$ is the Kronecker delta and $\partial_b \equiv \frac{\partial}{\partial x^b}$. What we now do is consider the effect of the above coordinate transformation on the tensor field $T^{ab\cdots}_{cd \cdots}$ at the points $P$ and $Q$. In what follows I will employ this general procedure to obtain the Lie derivative formulas with respect to a contravariant vector field $X^a(x)$ in the case of a scalar field $\phi$, a contravariant vector field $Y^a$, a one-form field $Y_a$, and a general mixed tensor field $T^{ab\cdots}_{cd \cdots}$.

The Lie derivative of a scalar field $\phi$

Not surprisingly this is the easiest case to deal with since scalars are invariants: the values of a scalar field defined over a manifold do not change under a change in the coordinate system being used. The value of the scalar field at the point $P$ will simply be $\phi(x)$ and the value at the point $Q$ will be

$\phi(x^{\prime}) = \phi(x^c + \delta u X^c)$

We can expand this in a Taylor series about the point $P$ with coordinates $x$ to get the first-order approximation

$\phi(x^{\prime}) \approx \phi(x) + \delta u X^c \ \partial_c \phi(x)$

The Lie derivative of the scalar field with respect to the contravariant vector field $X^a(x)$ is then

$L_X \phi = \lim_{\delta u \rightarrow 0} \frac{\phi(x^{\prime}) - \phi(x)}{\delta u} = X^c \ \partial_c \phi$

$\equiv X^a \ \partial_a \phi$

We observe that the Lie derivative of the scalar field $\phi(x)$ with respect to the vector field $X^a(x)$ is actually the directional derivative of $\phi$ in the direction of the vector $X^a$. In the differential geometry literature in this area it is common to associate the contravariant vector field $X$ with the linear differential operator $X^a \ \partial_a$ (which operates on any real-valued function $f$ to produce another function $g$) and essentially treat them as the same object. Given a point $P$ on the manifold, one thinks of the partial differential operators $[\partial_a]_P$ as constituting a basis for all the vectors in the tangent space at $P$, so that any vector at $P$ can be written as a linear combination of the $[\partial_a]_P$ in the form

$[X]_P = [X^a]_P [\partial_a]_P$

This is the intuitive justification for treating the vector field $X$ and the linear differential operator $X^a \ \partial_a$ as being the same thing. Under this convention, one often sees the Lie derivative of a scalar field $\phi$ with respect to the contravariant vector field $X$ written as

$L_X \phi = X \phi$
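The limiting process in the definition of $L_X \phi$ can be illustrated numerically. In the sketch below the vector field is the one from the streamline example, while the scalar field $\phi = \sin(x^1) (x^2)^2$ and the sample point are arbitrary choices made purely for illustration. The difference quotient with a small but finite $\delta u$ is compared with the hand-computed value of $X^a \ \partial_a \phi$.

```python
import math


def vector_field(x):
    """Example vector field X = (1, x^2) from the streamline example."""
    return (1.0, x[1])


def scalar_field(x):
    """An arbitrary smooth scalar field chosen for illustration."""
    return math.sin(x[0]) * x[1] ** 2


def lie_derivative_scalar(phi, X, x, du=1e-6):
    """Difference quotient [phi(x + du X) - phi(x)] / du for small du."""
    x_prime = tuple(xi + du * Xi for xi, Xi in zip(x, X(x)))
    return (phi(x_prime) - phi(x)) / du


point = (0.3, 2.0)
numeric = lie_derivative_scalar(scalar_field, vector_field, point)
# Hand-computed X^a d_a phi = 1 * cos(x^1) (x^2)^2 + x^2 * 2 sin(x^1) x^2
exact = (math.cos(point[0]) * point[1] ** 2
         + 2 * math.sin(point[0]) * point[1] ** 2)
print(numeric, exact)  # the two values should agree to several decimal places
```

The agreement is only approximate because $\delta u$ is finite; shrinking it (within floating-point limits) tightens the match.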

The Lie derivative of a contravariant vector field $Y^a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the contravariant vector field $Y^a(x)$ at $P$ is mapped to

$Y^{\prime a}(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^b} Y^b(x)$

$= (\delta^a_b + \delta u \ \partial_b X^a) Y^b(x)$

$= Y^a(x) + \delta u \ Y^b \ \partial_b X^a$

The vector already at $Q$, namely $Y^a(x^{\prime})$, has a first-order Taylor series approximation about $x$ of the form

$Y^a(x^{\prime}) = Y^a(x^c + \delta u X^c)$

$\approx Y^a(x) + \delta u \ X^c \ \partial_c Y^a(x)$

The Lie derivative with respect to the vector field $X^a(x)$ is then given by

$L_X Y^a = \lim_{\delta u \rightarrow 0} \frac{Y^a (x^{\prime}) - Y^{\prime a}(x^{\prime})}{\delta u}$

$= X^c \ \partial_c Y^a - Y^b \ \partial_b X^a$

$\equiv X^b \ \partial_b Y^a - Y^b \ \partial_b X^a$

Under the convention of associating the vector field $X$ with the linear differential operator $X^a \partial_a$, one often sees the Lie derivative of a contravariant vector field $Y^a$ with respect to the field $X$ written as

$L_X Y^a = [X, Y]^a$

where $[X, Y] = XY - YX$ is called the Lie bracket (or commutator) of the two vector fields $X$ and $Y$. This is a new vector field (and therefore linear differential operator) that can be written alternatively as

$[X, Y] = X(Y^a \partial_a) - Y(X^a \partial_a)$

$= X^b \partial_b (Y^a \partial_a) - Y^b \partial_b (X^a \partial_a)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a + X^a Y^b(\partial_a \partial_b - \partial_b \partial_a)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a$

where the last equality follows from the fact that the second term in the penultimate line will always vanish by Young’s Theorem (equality of cross-partials). Therefore the $a$-th component of the vector field $[X, Y]$ is the one that appears in the expression of the Lie derivative $L_X Y^a$ above.
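The component formula for $[X, Y]^a$ is easy to evaluate numerically. The following sketch approximates the partial derivatives by central differences; the two vector fields used (a rotation field and a constant translation field) are hypothetical examples that do not appear elsewhere in this note.

```python
def partial(field, x, b, h=1e-5):
    """Central-difference approximation of d_b field^a at x, for all a."""
    plus = list(x)
    minus = list(x)
    plus[b] += h
    minus[b] -= h
    return [(p - m) / (2 * h) for p, m in zip(field(plus), field(minus))]


def lie_bracket(X, Y, x):
    """Components [X, Y]^a = X^b d_b Y^a - Y^b d_b X^a at the point x."""
    n = len(x)
    Xx, Yx = X(x), Y(x)
    result = [0.0] * n
    for b in range(n):
        dY = partial(Y, x, b)
        dX = partial(X, x, b)
        for a in range(n):
            result[a] += Xx[b] * dY[a] - Yx[b] * dX[a]
    return result


def rotation(x):
    """Hypothetical example field X = (-x^2, x^1), generating rotations."""
    return [-x[1], x[0]]


def translation(x):
    """Hypothetical example field Y = (1, 0), generating x^1-translations."""
    return [1.0, 0.0]


bracket = lie_bracket(rotation, translation, [0.4, -1.2])
print(bracket)  # close to [0, -1]
```

Swapping the arguments should flip the sign of every component, reflecting the antisymmetry $[X, Y] = -[Y, X]$.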

The Lie derivative of a one-form (covariant vector) field $Y_a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the one-form (covariant vector) field $Y_a(x)$ at $P$ is mapped to

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

To work out the transformation matrix here we need to write the coordinate transformation as

$x^{\prime b} = x^b + \delta u X^b(x)$

$\implies$

$x^b = x^{\prime b} - \delta u X^b(x)$

Partially differentiating we get

$\frac{\partial x^b}{\partial x^{\prime a}} = \delta^b_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^b$

$= \delta^b_a - \delta u \ \partial_c X^b \frac{\partial x^c}{\partial x^{\prime a}}$

$= \delta^b_a - \delta u \ \partial_c X^b \big(\delta^c_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^c\big)$

$= \delta^b_a - \delta u \ \partial_a X^b + O((\delta u)^2)$

We can ignore the $O((\delta u)^2)$ terms as they will disappear in the limiting process of the differentiation, so we have

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

$= \big( \delta^b_a - \delta u \ \partial_a X^b \big) Y_b(x)$

$= Y_a(x) - \delta u Y_b \ \partial_a X^b$

Again taking a first-order Taylor series approximation about $x$ at the point $Q$ we get that

$Y_a(x^{\prime}) = Y_a(x^c + \delta u X^c)$

$\approx Y_a(x) + \delta u X^c \ \partial_c Y_a(x)$

Then the Lie derivative of the one-form field $Y_a(x)$ with respect to the contravariant vector field $X^a(x)$ is obtained as

$L_X Y_a = \lim_{\delta u \rightarrow 0} \frac{Y_a(x^{\prime}) - Y^{\prime}_a (x^{\prime})}{\delta u}$

$= X^c \partial_c Y_a + Y_b \partial_a X^b$

$= X^b \partial_b Y_a + Y_b \partial_a X^b$
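As a consistency check on the sign of the second term, one can verify that this formula and the earlier formula for a contravariant vector field together reproduce the scalar result when applied to a contraction. Here $Z^a$ is an arbitrary contravariant vector field introduced only for this check, and I assume (without deriving it in this note) that the Lie derivative obeys the product rule. Then

$(L_X Y_a) Z^a + Y_a (L_X Z^a) = (X^b \partial_b Y_a + Y_b \partial_a X^b) Z^a + Y_a (X^b \partial_b Z^a - Z^b \partial_b X^a)$

$= X^b \partial_b (Y_a Z^a) + Y_b Z^a \partial_a X^b - Y_a Z^b \partial_b X^a$

$= X^b \partial_b (Y_a Z^a)$

since the last two terms in the second line cancel after relabelling the dummy indices $a \leftrightarrow b$ in one of them. The result is exactly the Lie derivative of the scalar field $Y_a Z^a$, as it should be.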

The Lie derivative of a mixed tensor field $T^{ab\cdots}_{cd \cdots}$

A good prototype for this case is the Lie derivative of the simplest type of mixed tensor field, the rank-2 tensor of type $(1, 1)$ represented as $T^a_b(x)$. We will therefore consider this case first and then use it to extrapolate to a general mixed tensor field of type $(p, q)$ represented as $T^{ab \cdots}_{cd \cdots}(x)$.

Under the coordinate transformation

$x^{\prime a} = x^a + \delta u X^a(x)$

the mixed tensor field $T^a_b(x)$ transforms as

$T^{\prime a}_b(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^c} \frac{\partial x^d}{\partial x^{\prime b}} T^c_d(x)$

$= (\delta^a_c + \delta u \ \partial_c X^a)(\delta^d_b - \delta u \ \partial_b X^d) T^c_d$

$= (\delta^a_c + \delta u \ \partial_c X^a)(T^c_b - \delta u T^c_d \ \partial_b X^d)$

$= T^a_b(x) - \delta u T^a_d \ \partial_b X^d + \delta u T^c_b \ \partial_c X^a + O((\delta u)^2)$

Under a first-order Taylor series approximation about $x$, the tensor at $Q$ can be written

$T^a_b(x^{\prime}) = T^a_b(x^c + \delta u X^c)$

$\approx T^a_b(x) + \delta u X^c \ \partial_c T^a_b$

The Lie derivative of $T^a_b(x)$ with respect to the contravariant vector field $X^a(x)$ is then

$L_X T^a_b = \lim_{\delta u \rightarrow 0} \frac{T^a_b(x^{\prime}) - T^{\prime a}_b(x^{\prime})}{\delta u}$

$= X^c \ \partial_c T^a_b + T^a_d \ \partial_b X^d - T^c_b \ \partial_c X^a$

We observe that the contravariant index $a$ contributes a term of the form $-T^c_b \ \partial_c X^a$ while the covariant index $b$ contributes a term of the form $T^a_d \ \partial_b X^d$.

Now consider the general mixed tensor $T^{ab \cdots}_{cd \cdots}$. The first-order Taylor series approximation of $T^{ab \cdots}_{cd \cdots}(x^{\prime})$ about $x$ gives

$T^{ab \cdots}_{cd \cdots}(x^e + \delta u X^e)$

$\approx T^{ab \cdots}_{cd \cdots}(x) + \delta u X^e \ \partial_e T^{ab \cdots}_{cd \cdots}(x)$

Therefore the first term of the Lie derivative will be $X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$. This is of the same form as the first term of $L_X T^a_b$. Conveniently, it turns out that from then on each contravariant and covariant index in $T^{ab \cdots}_{cd \cdots}$ will contribute terms like the corresponding terms we saw above in $L_X T^a_b$. Therefore the Lie derivative of the general mixed tensor field $T^{ab \cdots}_{cd \cdots}(x)$ with respect to the contravariant vector field $X^a(x)$ will be of the form

$L_X T^{ab \cdots}_{cd \cdots} = X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$

$- \ T^{eb \cdots}_{cd \cdots} \ \partial_e X^a \ - \ T^{ae \cdots}_{cd \cdots} \ \partial_e X^b \ - \ \cdots$

$+ \ T^{ab \cdots}_{ed \cdots} \ \partial_c X^e \ + \ T^{ab \cdots}_{ce \cdots} \ \partial_d X^e \ + \ \cdots$
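To illustrate how the general formula is read off in practice, here are two special cases, written for a purely covariant rank-2 tensor field $T_{ab}$ and a purely contravariant rank-2 tensor field $T^{ab}$ (both chosen only as examples). Each covariant index contributes one positive term and each contravariant index contributes one negative term:

$L_X T_{ab} = X^e \ \partial_e T_{ab} + T_{eb} \ \partial_a X^e + T_{ae} \ \partial_b X^e$

$L_X T^{ab} = X^e \ \partial_e T^{ab} - T^{eb} \ \partial_e X^a - T^{ae} \ \partial_e X^b$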

A problem involving the use of exterior derivatives of differential forms to re-express the classical gradient, curl and divergence operations

Modern differential geometry makes extensive use of differential forms and the concept of exterior derivatives of differential forms developed by the French mathematician Élie Cartan (1869-1951). A Wikipedia article about exterior derivatives of differential forms can be found here. As alluded to in this article, exterior derivatives of differential forms encompass a lot of results usually expressed in terms of vector fields in classical vector calculus. In particular, there is a duality between 1-forms, 2-forms and vector fields which allows the classical gradient, curl and divergence operations of vector calculus to be fully subsumed within the realm of exterior derivatives. In the present note I want to briefly explore how these three differentiation operations of vector calculus can be replaced with Cartan’s exterior derivative. The necessary notation and motivation for this are nicely encapsulated in the following problem which appears in Barrett O’Neill’s Elementary Differential Geometry book (Revised Second Edition, p.33):

This problem was also the subject of an interesting Mathematics Stack Exchange discussion which can be found here. The reader should attempt to solve this problem by himself/herself before reading my solution below.

To solve part (a), we use the fact that if $f$ is a differentiable real-valued function on $\mathbb{R}^3$ and $\bold{v}_p$ is a tangent vector with point of application $\bold{p}$ and vector part $\bold{v}$, then the differential $df$ of $f$ is the 1-form such that

$df(\bold{v}_p) = \sum v_i \frac{\partial f}{\partial x_i}(\bold{p}) = \sum \frac{\partial f}{\partial x_i}(\bold{p}) dx_i(\bold{v}_p)$

(where the last equality uses the fact that the differentials of the natural coordinate functions evaluated at a tangent vector are equal to the coordinates $v_i$ of the vector part of the tangent vector). But using the correspondence (1) between 1-forms and vector fields in the problem we can then write

$df(\bold{v}_p) = \sum \frac{\partial f}{\partial x_i}(\bold{p}) dx_i(\bold{v}_p) \stackrel{\mathrm{(1)}}{\longleftrightarrow} \sum \frac{\partial f}{\partial x_i}(\bold{p}) U_i(\bold{p}) = \text{grad } f(\bold{p})$

(where the $U_i(\bold{p})$ are the natural frame field vectors at the point of application $\bold{p}$). Therefore we have shown that

$df \stackrel{\mathrm{(1)}}{\longleftrightarrow} \text{grad } f$

I emphasised a specific tangent vector argument $\bold{v}_p$ in the above solution but I will not do this in the solutions for (b) and (c) as the notation becomes too cumbersome. To solve part (b), we consider the 1-form

$\phi = f_1 dx_1 + f_2 dx_2 + f_3 dx_3$

This corresponds under (1) to the vector field $V = \sum f_i U_i$. The exterior derivative of $\phi$ is the 2-form

$d \phi = df_1 \wedge dx_1 + df_2 \wedge dx_2 + df_3 \wedge dx_3$

$=$

$\big(\frac{\partial f_1}{\partial x_1} dx_1 + \frac{\partial f_1}{\partial x_2} dx_2 + \frac{\partial f_1}{\partial x_3} dx_3 \big) \wedge dx_1$

$+ \big(\frac{\partial f_2}{\partial x_1} dx_1 + \frac{\partial f_2}{\partial x_2} dx_2 + \frac{\partial f_2}{\partial x_3} dx_3 \big) \wedge dx_2$

$+ \big(\frac{\partial f_3}{\partial x_1} dx_1 + \frac{\partial f_3}{\partial x_2} dx_2 + \frac{\partial f_3}{\partial x_3} dx_3 \big) \wedge dx_3$

$=$

$-\frac{\partial f_1}{\partial x_2} dx_1 dx_2 - \frac{\partial f_1}{\partial x_3} dx_1 dx_3$

$+ \frac{\partial f_2}{\partial x_1} dx_1 dx_2 - \frac{\partial f_2}{\partial x_3} dx_2 dx_3$

$+ \frac{\partial f_3}{\partial x_1} dx_1 dx_3 + \frac{\partial f_3}{\partial x_2} dx_2 dx_3$

$= \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) dx_1 dx_2 + \big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \big) dx_1 dx_3 + \big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) dx_2 dx_3$

But using the correspondence (2) between 2-forms and vector fields in the problem we can then write

$d \phi = \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) dx_1 dx_2 + \big( \frac{\partial f_3}{\partial x_1} - \frac{\partial f_1}{\partial x_3} \big) dx_1 dx_3 + \big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) dx_2 dx_3$

$\stackrel{\mathrm{(2)}}{\longleftrightarrow}$

$\big( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} \big) U_1 + \big( \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} \big) U_2 + \big( \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \big) U_3$

$= \text{curl } V$

Therefore we have shown that

$d \phi \stackrel{\mathrm{(2)}}{\longleftrightarrow} \text{curl } V$
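As a quick illustration of this result, consider the particular choice $f_1 = -x_2$, $f_2 = x_1$, $f_3 = 0$ (picked here purely as an example), so that $V = -x_2 U_1 + x_1 U_2$. Then

$\phi = -x_2 dx_1 + x_1 dx_2$

$d \phi = -dx_2 \wedge dx_1 + dx_1 \wedge dx_2 = 2 dx_1 dx_2$

$\stackrel{\mathrm{(2)}}{\longleftrightarrow} 2 U_3$

which agrees with the classical calculation, since the only nonzero component of $\text{curl } V$ is $\frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} = 1 - (-1) = 2$ along $U_3$.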

Finally, to solve part (c) we can consider the 2-form

$\eta = f_1 dy dz - f_2 dx dz + f_3 dx dy$

which has a correspondence with the vector field $V = \sum f_i U_i$ of the type (2) in the problem, that is,

$\eta \stackrel{\mathrm{(2)}}{\longleftrightarrow} V$

The exterior derivative of $\eta$ is the 3-form

$d \eta = df_1 \wedge dy dz - df_2 \wedge dx dz + df_3 \wedge dx dy$

Since products of differentials containing the same differential twice are eliminated, and since $dy \ dx \ dz = - dx \ dy \ dz$ by the alternation rule, we see that this reduces to

$d \eta = \big( \frac{\partial f_1}{\partial x} dx \big) dy dz - \big( \frac{\partial f_2}{\partial y} dy \big) dx dz + \big( \frac{\partial f_3}{\partial z} dz \big) dx dy$

$= \big(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z} \big) dx dy dz$

$= (\text{div } V) dx dy dz$