Study of a proof of Noether’s theorem and its application to conservation laws in physics

While I have for a long time been aware of Noether’s theorem and its relevance to symmetry and conservation laws in physics, I have only recently taken the time to closely explore its mathematical proof. In the present note I want to record some notes I made on the mathematical nuances involved in a proof of Noether’s theorem and the mathematical relevance of the theorem to some simple conservation laws in classical physics, namely, the conservation of energy and the conservation of linear momentum. Noether’s theorem has important applications in a wide range of classical mechanics problems as well as in quantum mechanics and Einstein’s relativity theory. It is also used in the study of certain classes of partial differential equations that can be derived from variational principles.

The theorem was first published by Emmy Noether in 1918. An English translation of the full original paper is available here. An interesting book by Yvette Kosmann-Schwarzbach also presents an English translation of Noether’s 1918 paper and discusses in detail the history of the theorem’s development and its impact on theoretical physics in the 20th Century. (Kosmann-Schwarzbach, Y, 2011, The Noether Theorems: Invariance and Conservation Laws in the Twentieth Century. Translated by Bertram Schwarzbach. Springer). At the time of writing, the book is freely downloadable from here.

Mathematical setup of Noether’s theorem

The case I explore in detail here is that of a variational calculus functional of the form

S[y] = \int_a^b \mathrm{d}x F(x, y, y^{\prime})

where x is a single independent variable and y = (y_1, y_2, \ldots, y_n) is a vector of n dependent variables. The functional has stationary paths defined by the usual Euler-Lagrange equations of variational calculus. Noether’s theorem concerns how the value of this functional is affected by families of continuous transformations of the dependent and independent variables (e.g., translations, rotations) that are defined in terms of one or more real parameters. The case I explore in detail here involves transformations defined in terms of only a single parameter, call it \delta. The transformations can be represented in general terms as

\overline{x} = \Phi(x, y, y^{\prime}; \delta)

\overline{y}_k = \Psi_k(x, y, y^{\prime}; \delta)

for k = 1, 2, \ldots, n. The functions \Phi and \Psi_k are assumed to have continuous first derivatives with respect to all the variables, including the parameter \delta. Furthermore, the transformations must reduce to identities when \delta = 0, i.e.,

x \equiv \Phi(x, y, y^{\prime}; 0)

y_k \equiv \Psi_k(x, y, y^{\prime}; 0)

for k = 1, 2, \ldots, n. As concrete examples, translations and rotations are continuous differentiable transformations that can be defined in terms of a single parameter and that reduce to identities when the parameter takes the value zero.

Noether’s theorem concerns infinitesimally small changes in the dependent and independent variables, so we can take |\delta| \ll 1 and use perturbation theory to prove the theorem. Treating \overline{x} and \overline{y}_k as functions of \delta and Taylor-expanding them about \delta = 0 we get

\overline{x}(\delta) = \overline{x}(0) + \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

where

\phi(x, y, y^{\prime}) \equiv \frac{\partial \Phi}{\partial \delta} \big|_{\delta = 0}

and

\overline{y}_k (\delta) = \overline{y}_k (0) + \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0}(\delta - 0) + O(\delta^2)

\iff

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

where

\psi_k (x, y, y^{\prime}) \equiv \frac{\partial \Psi_k}{\partial \delta} \big|_{\delta = 0} for k = 1, 2, \ldots, n.

Noether’s theorem then says that whenever the functional S[y] is invariant under the above family of transformations, i.e., whenever

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

for all c and d such that a \leq c < d \leq b, where \overline{c} = \Phi(c, y(c), y^{\prime}(c); \delta) and \overline{d} = \Phi(d, y(d), y^{\prime}(d); \delta), then for each stationary path of S[y] the following equation holds:

\sum_{k = 1}^n \frac{\partial F}{\partial y_k^{\prime}}\psi_k + \bigg(F - \sum_{k = 1}^n y_k^{\prime}\frac{\partial F}{\partial y_k^{\prime}}\bigg)\phi = \mathrm{constant}

As illustrated below, this remarkable equation encodes a number of conservation laws in physics: conservation of energy follows when the relevant equations of motion are invariant under translations in time, conservation of linear momentum when they are invariant under translations in space, and conservation of angular momentum when they are invariant under rotations in space. Thus, Noether’s theorem is often expressed as the statement that whenever a system has a continuous symmetry there must be a corresponding quantity whose value is conserved.

Application of the theorem to familiar conservation laws in classical physics

It is, of course, not necessary to use the full machinery of Noether’s theorem for simple examples of conservation laws in classical physics. The theorem is most useful in unfamiliar situations in which it can reveal conserved quantities which were not previously known. However, going through the motions in simple cases clarifies how the mathematical machinery works in more sophisticated and less familiar situations.

To obtain the law of the conservation of energy in the simplest possible scenario, consider a particle of mass m moving along a straight line in a time-invariant potential field V(x) with position at time t given by the function x(t). The Lagrangian formulation of mechanics then says that the path followed by the particle will be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2 - V(x)\big)

The Euler-Lagrange equation for this functional would give Newton’s second law as the equation governing the particle’s motion. With regard to demonstrating energy conservation, we notice that the Lagrangian, which is more generally of the form L(t, x, \dot{x}) when there is a time-varying potential, here takes the simpler form L(x, \dot{x}) because there is no explicit dependence on time. Therefore we might expect the functional to be invariant under translations in time, and thus Noether’s theorem to hold. We will verify this. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \phi + O(\delta^2) \equiv t + \delta

and

\overline{x}(\delta) = x + \delta \cdot 0 + O(\delta^2) \equiv x

From the first equation we see that \phi = 1 in the case of a simple translation in time by an amount \delta, and from the second equation we see that \psi = 0, which simply reflects the fact that we are only translating in the time direction. The invariance of the functional under these transformations can easily be demonstrated by writing

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\overline{x}, \dot{\overline{x}}) = \int_{\overline{0}-\delta}^{\overline{T}-\delta} \mathrm{d}t L(x, \dot{x}) = \int_0^{T} \mathrm{d}t L(x, \dot{x})

where the limits in the second integral follow from the change of the time variable from \overline{t} to t. Thus, Noether’s theorem holds and with \phi = 1 and \psi = 0 the fundamental equation in the theorem reduces to

L - \dot{x}\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

Evaluating the terms on the left-hand side we get

\frac{1}{2}m\dot{x}^2 - V(x) - \dot{x} m\dot{x} =\mathrm{constant}

\iff

\frac{1}{2}m\dot{x}^2 + V(x) = E = \mathrm{constant}

which is of course the statement of the conservation of energy.
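As a quick machine check of this calculation, here is a minimal sympy sketch (the harmonic potential V(x) = kx^2/2 is my own choice, purely for concreteness) verifying that the Noether charge L - \dot{x}\,\partial L/\partial \dot{x} is constant in time once the equation of motion m\ddot{x} = -V^{\prime}(x) is imposed:

```python
import sympy as sp

t, m, k = sp.symbols('t m k', positive=True)
x = sp.Function('x')
xdot = x(t).diff(t)

# Lagrangian with the (illustrative) harmonic potential V(x) = k*x**2/2
L = sp.Rational(1, 2)*m*xdot**2 - sp.Rational(1, 2)*k*x(t)**2

# Noether charge for time translation (phi = 1, psi = 0): L - xdot*dL/dxdot,
# which is minus the total energy
Q = L - xdot*L.diff(xdot)

# On stationary paths the equation of motion m*x'' = -k*x holds,
# so dQ/dt should reduce to zero
dQ = Q.diff(t).subs(x(t).diff(t, 2), -k*x(t)/m)
print(sp.simplify(dQ))  # prints 0
```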

To obtain the law of conservation of linear momentum in the simplest possible scenario, assume now that the above particle is moving freely in the absence of any potential field, so V(x) = 0 and the only energy involved is kinetic energy. The path followed by the particle will now be a stationary path of the action functional

\int_0^{T} \mathrm{d}t L(\dot{x}) = \int_0^{T} \mathrm{d}t \big(\frac{1}{2}m\dot{x}^2\big)

The Euler-Lagrange equation for this functional would give Newton’s first law as the equation governing the particle’s motion (constant velocity in the absence of any forces). To get the law of conservation of linear momentum we will consider a translation in space rather than time, and check that the action functional is invariant under such translations. In the context of the mathematical setup of Noether’s theorem above, we can write the relevant transformations as

\overline{t}(\delta) = t + \delta \cdot 0 + O(\delta^2) \equiv t

and

\overline{x}(\delta) = x + \delta \psi + O(\delta^2) \equiv x + \delta

From the first equation we see that \phi = 0 reflecting the fact that we are only translating in the space direction, and from the second equation we see that \psi = 1 in the case of a simple translation in space by an amount \delta. The invariance of the functional under these transformations can easily be demonstrated by noting that \dot{\overline{x}} = \dot{x}, so we can write

\int_{\overline{0}}^{\overline{T}} \mathrm{d}\overline{t} L(\dot{\overline{x}}) = \int_0^{T} \mathrm{d}t L(\dot{x})

since the limits of integration are not affected by the translation in space. Thus, Noether’s theorem holds and with \phi = 0 and \psi = 1 the fundamental equation in the theorem reduces to

\frac{\partial L}{\partial \dot{x}} = \mathrm{constant}

\iff

m\dot{x} = \mathrm{constant}

This is, of course, the statement of the conservation of linear momentum.
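The corresponding check for momentum is almost trivial, but for completeness here is the same kind of sympy sketch confirming that \partial L/\partial \dot{x} = m\dot{x} has vanishing time derivative once the free equation of motion \ddot{x} = 0 is imposed:

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')
xdot = x(t).diff(t)

# Free-particle Lagrangian; a space translation has phi = 0, psi = 1,
# so the Noether charge reduces to dL/dxdot = m*xdot
L = sp.Rational(1, 2)*m*xdot**2
p = L.diff(xdot)

# On stationary paths x'' = 0, so dp/dt should reduce to zero
print(p.diff(t).subs(x(t).diff(t, 2), 0))  # prints 0
```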

Proof of Noether’s theorem

To prove Noether’s theorem we will begin with the transformed functional

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

We will substitute into this the linearised forms of the transformations, namely

\overline{x}(\delta) = x + \delta \phi + O(\delta^2)

and

\overline{y}_k (\delta) = y_k + \delta \psi_k + O(\delta^2)

for k = 1, 2, \ldots, n, and then expand to first order in \delta. Note that the integration limits are, to first order in \delta,

\overline{c} = c + \delta \phi(c)

and

\overline{d} = d + \delta \phi(d)

Using the linearised forms of the transformations and writing \psi = (\psi_1, \psi_2, \ldots, \psi_n) we get

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \frac{\mathrm{d}x}{\mathrm{d}\overline{x}}

\frac{\mathrm{d}\overline{x}}{\mathrm{d}x} = 1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}

Inverting the second equation we get

\frac{\mathrm{d}x}{\mathrm{d}\overline{x}} = \big(1 + \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big)^{-1} = 1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} + O(\delta^2)

Using this in the first equation we find, to first order in \delta,

\frac{\mathrm{d} \overline{y}}{\mathrm{d}\overline{x}} = \big(\frac{\mathrm{d}y}{\mathrm{d}x} + \delta \frac{\mathrm{d}\psi}{\mathrm{d}x} \big) \big(1 - \delta \frac{\mathrm{d}\phi}{\mathrm{d}x}\big) = \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)

Making the necessary substitutions we can then write the transformed functional as

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime})

= \int_{\overline{c}-\delta \phi(c)}^{\overline{d}-\delta \phi(d)} \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

Treating F as a function of \delta and expanding about \delta = 0 to first order we get

F(\delta) = F(0) + \delta \frac{\partial F}{\partial \delta}\big|_{\delta = 0}

= F(x, y, y^{\prime}) + \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

Then using the expression for \frac{\mathrm{d}\overline{x}}{\mathrm{d}x} above, the transformed functional becomes

\int_c^d \mathrm{d}x \frac{ \mathrm{d}\overline{x}}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F\bigg(x + \delta \phi, y+\delta \psi, \frac{\mathrm{d}y}{\mathrm{d}x} + \delta\big(\frac{\mathrm{d}\psi}{\mathrm{d}x} - \frac{\mathrm{d}y}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big) \bigg)

= \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \int_c^d \mathrm{d}x \delta \bigg(\frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} - \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\frac{\mathrm{d}\phi}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \delta \frac{\mathrm{d}\phi}{\mathrm{d}x} F(x, y, y^{\prime}) + O(\delta^2)

Ignoring the second order term in \delta we can thus write

\int_{\overline{c}}^{\overline{d}} \mathrm{d} \overline{x} F(\overline{x}, \overline{y}, \overline{y}^{\prime}) = \int_c^d \mathrm{d}x F(x, y, y^{\prime})

+ \delta \int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg)

Since the functional is invariant, however, this implies

\int_c^d \mathrm{d}x \bigg(\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} + \frac{\partial F}{\partial x}\phi + \sum_{k=1}^n \big(\frac{\partial F}{\partial y_k}\psi_k + \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x}\big)\bigg) = 0

We now manipulate this equation by integrating the terms involving \frac{\mathrm{d}\phi}{\mathrm{d}x} and \frac{\mathrm{d}\psi_k}{\mathrm{d}x} by parts. We get

\int_c^d \mathrm{d}x \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\frac{\mathrm{d}\phi}{\mathrm{d}x} = \bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg]_c^d

- \int_c^d \mathrm{d}x \phi \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

and

\int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\frac{\mathrm{d}\psi_k}{\mathrm{d}x} = \bigg[\sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d - \int_c^d \mathrm{d}x \sum_{k=1}^n \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\psi_k

Substituting these into the equation gives

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \bigg(\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can manipulate this equation further by expanding the integrand in the second term on the left-hand side. We get

\frac{\partial F}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}x}\big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big)

= \frac{\partial F}{\partial x} - \frac{\partial F}{\partial x} - \sum_{k=1}^n \frac{\partial F}{\partial y_k}y^{\prime}_k - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}y^{\prime \prime}_k + \sum_{k=1}^n y^{\prime}_k \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)

= \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

Thus, the equation becomes

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d

+ \int_c^d \mathrm{d}x \phi \sum_{k=1}^n y^{\prime}_k \bigg(\frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big) - \frac{\partial F}{\partial y_k}\bigg)

+ \int_c^d \mathrm{d}x \sum_{k=1}^n \psi_k \bigg(\frac{\partial F}{\partial y_k} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial F}{\partial y^{\prime}_k}\big)\bigg) = 0

We can now see at a glance that the second and third terms on the left-hand side must vanish because of the Euler-Lagrange expressions appearing in the brackets (which are identically zero on stationary paths). Thus we arrive at the equation

\bigg[\phi \big(F - \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k} \frac{\mathrm{d}y_k}{\mathrm{d}x}\big) + \sum_{k=1}^n \frac{\partial F}{\partial y^{\prime}_k}\psi_k\bigg]_c^d = 0

which proves that the formula inside the square brackets is constant as per Noether’s theorem.
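To see this quantity doing its job in a concrete case, the following sympy sketch (my own toy example, not taken from the sources above) uses F = (y^{\prime 2} - y^2)/2, which has no explicit x-dependence and is therefore invariant under translations in x (\phi = 1, \psi = 0), and evaluates the conserved quantity along the stationary path y = \sin x:

```python
import sympy as sp

x = sp.symbols('x')
y, yp = sp.symbols('y yp')  # y and y' treated as independent arguments of F

# F with no explicit x-dependence, hence invariant under x-translation
F = (yp**2 - y**2)/2

# Noether charge with phi = 1, psi = 0: F - y'*dF/dy'
Q = F - yp*F.diff(yp)

# Evaluate along the stationary path y = sin(x) (the EL equation is y'' = -y)
Q_on_path = Q.subs({y: sp.sin(x), yp: sp.cos(x)})
print(sp.simplify(Q_on_path))  # prints -1/2, a constant as the theorem demands
```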

Calculus derivation of a surprising result of Galileo’s

An interesting paper by Herman Erlichson discusses Galileo’s attempt to prove that the minimum time path for a particle falling from a point on the lower quadrant of a circle is via the circle itself. (I attach the paper here: Erlichson, H, 1998, Galileo’s Work on Swiftest Descent from a Circle And How He Almost Proved the Circle Itself Was the Minimum Time Path, The American Mathematical Monthly, Vol 105, No 4, 338-347). I was particularly intrigued by equation (1) in this paper which asserts that the time taken for a particle to fall along a straight line starting at the circle is independent of the starting point. Thus, for example, a particle following a longer chord starting higher up on the circle would take exactly the same time to reach the bottom as a particle following a shorter chord starting lower down on the circle. Galileo explained it in his book, The Two New Sciences, as follows:

[Figure: Galileo’s explanation of the equal descent times, quoted from Two New Sciences]

(I attach the English translation of Galileo’s book here: galileo two new sciences).

Underneath equation (1) in Erlichson’s paper, Erlichson gives a quick heuristic derivation of this result but I couldn’t resist exploring it more deeply using calculus methods (which were not available in Galileo’s time – calculus was developed later by Isaac Newton, who was born in the year Galileo died, and independently by Gottfried Wilhelm Leibniz). I want to record these calculations in the present note, in particular showing first how the time taken along an arbitrary straight line definitely does depend on the starting point, and then showing how this dependence instantly `disappears’ when the starting point of the straight line is taken to lie on a circle. I find it remarkable how this happens algebraically – nothing fundamentally changes about the straight line other than assuming that it does or does not start on a circle!

For the purposes of this note I have amended Figure 1 in Erlichson’s paper by imposing a coordinate system as shown below:

The origin of the coordinate system is taken to be at C and we suppose that the point D has coordinates (X, nX). The equation of the straight line DC is y = nx. Let us ignore the circular arc connecting D and C for the moment and work out the time it would take for a particle of mass m to fall from D to C along the straight line. We can do this by considering the energy of the particle at point D. If it starts at rest (i.e., its initial velocity is zero), the energy at D consists only of the particle’s potential energy given by mgy(X) = mgnX. Potential energy is transferred to kinetic energy as the particle moves down the line and the total energy at each x-coordinate is given by

\frac{1}{2}mv^2 + mgnx = mgnX

\iff

v = \sqrt{2gn(X - x)}

Given that y = nx we see that y^{\prime} = n, so the element of distance along the straight line is \mathrm{d}s = \mathrm{d}x\sqrt{1 + (y^{\prime})^2} = \mathrm{d}x\sqrt{1 + n^2}. Therefore the particle’s speed along the line is

\frac{\mathrm{d}x}{\mathrm{d}t}\sqrt{1+n^2}

Setting this equal to v and solving for \frac{\mathrm{d}x}{\mathrm{d}t} we get

\frac{\mathrm{d}x}{\mathrm{d}t} = \sqrt{\frac{2gn(X-x)}{1+n^2}}

We now observe that the time of descent T will be given by the integral

T = \int_0^T \mathrm{d}t = \int_0^X \frac{\mathrm{d}t}{\mathrm{d}x} \mathrm{d}x = \int_0^X \sqrt{\frac{1+n^2}{2gn(X-x)}}\mathrm{d}x

= \sqrt{\frac{1+n^2}{2gn}} \int_0^X (X - x)^{-1/2}\mathrm{d}x

= 2\sqrt{X} \sqrt{\frac{1+n^2}{2gn}}

= \sqrt{\frac{2X}{gn}}\sqrt{1+n^2}

Thus, in this case, the formula for T definitely depends on the starting point X. But watch what happens when we now assume that the point D lies on a circle of radius L as shown in the diagram. The circle has equation

x^2 + (y - L)^2 = L^2

so the equation of the circular arc joining D and C is

y = L - \sqrt{L^2 - x^2}

as indicated in the diagram. At the point D we therefore have

nX = L - \sqrt{L^2 - X^2}

\iff

n = \frac{L}{X} - \sqrt{\frac{L^2}{X^2} - 1}

so

n^2 = \frac{L^2}{X^2} - \frac{2L}{X}\sqrt{\frac{L^2}{X^2} - 1} + \frac{L^2}{X^2} - 1

and therefore

1 + n^2 = \frac{2L^2}{X^2} - \frac{2L}{X}\sqrt{\frac{L^2}{X^2} - 1}

Substituting these expressions for n and 1 + n^2 in the formula for T we get

T = \sqrt{\frac{2X}{gn}}\sqrt{1+n^2}

= \sqrt{\frac{2X\big(\frac{2L^2}{X^2} - \frac{2L}{X}\sqrt{\frac{L^2}{X^2} - 1}\big)}{g\big(\frac{L}{X} - \sqrt{\frac{L^2}{X^2} - 1}\big)}}

= \sqrt{\frac{4L\big(\frac{L}{X} - \sqrt{\frac{L^2}{X^2} - 1}\big)}{g\big(\frac{L}{X} - \sqrt{\frac{L^2}{X^2} - 1}\big)}}

= \sqrt{\frac{4L}{g}}

= 2\sqrt{\frac{L}{g}}

which is equation (1) in Erlichson’s paper. The dependence on the starting point X has now vanished, so all starting points on the lower quadrant of the circle will yield the same time of travel to the origin!
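A quick numerical sanity check of this (a small Python sketch, with the values of g, L and X chosen arbitrarily by me) confirms that the chord-time formula collapses to 2\sqrt{L/g} whatever the starting point:

```python
import math

# For several starting points X on a circle of radius L, n is fixed by
# n*X = L - sqrt(L**2 - X**2); the descent time along the chord should
# always come out as 2*sqrt(L/g), independently of X
g, L = 9.81, 1.0
for X in (0.2, 0.5, 0.9):
    n = L/X - math.sqrt((L/X)**2 - 1)
    T = math.sqrt(2*X*(1 + n**2)/(g*n))
    print(f"X = {X}: T = {T:.6f}  vs  2*sqrt(L/g) = {2*math.sqrt(L/g):.6f}")
```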


Simple variational setups yielding Newton’s Second Law and Schrödinger’s equation

It is a delightful fact that one can get both the fundamental equation of classical mechanics (Newton’s Second Law) and the fundamental equation of quantum mechanics (Schrödinger’s equation) by solving very simple variational problems based on the familiar conservation of mechanical energy equation

K + U = E

In the present note I want to briefly set these out emphasising the common underlying structure provided by the conservation of mechanical energy and the calculus of variations. The kinetic energy K will be taken to be

K = \frac{1}{2}m \dot{x}^2 = \frac{p^2}{2m}

where \dot{x} = \frac{\mathrm{d}x}{\mathrm{d}t} is the particle’s velocity, p = m\dot{x} is its momentum, and m is its mass. The potential energy U will be regarded as some function of x only.

To obtain Newton’s Second Law we find the stationary path followed by the particle with respect to the functional

S[x] = \int_{t_1}^{t_2} L(t, x, \dot{x}) \mathrm{d}t  = \int_{t_1}^{t_2} (K - U) \mathrm{d}t

The function L(t, x, \dot{x}) = K - U is usually termed the `Lagrangian’ in classical mechanics. The functional S[x] is usually called the `action’. The Euler-Lagrange equation for this calculus of variations problem is

\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\big(\frac{\partial L}{\partial \dot{x}}\big) = 0

and this is Newton’s Second Law in disguise! We have

\frac{\partial L}{\partial x} = -\frac{\mathrm{d}U}{\mathrm{d}x} \equiv F

\frac{\partial L}{\partial \dot{x}} = m\dot{x} \equiv p

and

\frac{\mathrm{d}}{\mathrm{d}t} \big(\frac{\partial L}{\partial \dot{x}}\big) = \frac{\mathrm{d}p}{\mathrm{d}t} = m\ddot{x} \equiv ma

so substituting these into the Euler-Lagrange equation we get Newton’s Second Law, F = ma.
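The whole computation can be reproduced in a few lines of sympy, keeping the potential U generic; this is just a minimal machine transcription of the steps above:

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')
U = sp.Function('U')
xdot = x(t).diff(t)

# Lagrangian L = K - U
L = sp.Rational(1, 2)*m*xdot**2 - U(x(t))

# Euler-Lagrange expression dL/dx - d/dt(dL/dxdot)
EL = L.diff(x(t)) - (L.diff(xdot)).diff(t)
# Setting this expression to zero gives m*x'' = -dU/dx, i.e. F = ma
print(EL)
```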

To obtain Schrödinger’s equation we introduce a function

\psi(x) = \exp\big(\frac{1}{\hbar}\int p\,\mathrm{d}x\big)

where p = m \dot{x} is again the momentum of the particle and \hbar is the reduced Planck’s constant from quantum mechanics. (Note that \int p \mathrm{d}x has units of \mathrm{length}^2 \times \mathrm{mass} \times \mathrm{time}^{-1}, which are also the units of \hbar, so dividing by \hbar makes the exponent dimensionless, as it must be; the function \psi(x) as defined here is then dimensionless). We then have

\ln \psi = \frac{1}{\hbar}\int p\,\mathrm{d}x

and differentiating both sides gives

\frac{\psi^{\prime}}{\psi} = \frac{1}{\hbar} p

so

p^2 = \hbar^2 \big(\frac{\psi^{\prime}}{\psi}\big)^2

Therefore we can write the kinetic energy as

K = \frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2

and putting this into the conservation of mechanical energy equation gives

\frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2 + U = E

\iff

\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2 = 0

We now find the function \psi that makes the following functional stationary:

T[\psi] = \int_{-\infty}^{\infty} M(x, \psi, \psi^{\prime}) \mathrm{d}x = \int_{-\infty}^{\infty}  \big(\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2\big)\mathrm{d}x

The Euler-Lagrange equation for this calculus of variations problem is

\frac{\partial M}{\partial \psi} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial M}{\partial \psi^{\prime}}\big) = 0

and this is Schrödinger’s equation in disguise! We have

\frac{\partial M}{\partial \psi} = 2(U - E)\psi

\frac{\partial M}{\partial \psi^{\prime}} = \frac{\hbar^2}{m} \psi^{\prime}

and

\frac{\mathrm{d}}{\mathrm{d}x} \big(\frac{\partial M}{\partial \psi^{\prime}}\big) = \frac{\hbar^2}{m} \psi^{\prime \prime}

so substituting these into the Euler-Lagrange equation we get

2(U - E) \psi - \frac{\hbar^2}{m} \psi^{\prime \prime} = 0

\iff

-\frac{\hbar^2}{2m} \frac{\mathrm{d}^2 \psi}{\mathrm{d} x^2} + U \psi = E \psi

and this is the (time-independent) Schrödinger equation for a particle of mass m with fixed total energy E in a potential U(x) on the line -\infty < x < \infty.
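Again the Euler-Lagrange computation can be checked mechanically; here is a minimal sympy sketch, with U kept generic:

```python
import sympy as sp

x, hbar, m, E = sp.symbols('x hbar m E', positive=True)
psi = sp.Function('psi')
U = sp.Function('U')
psip = psi(x).diff(x)

# The integrand M(x, psi, psi') from the functional T[psi] above
M = hbar**2/(2*m)*psip**2 + (U(x) - E)*psi(x)**2

# Euler-Lagrange expression dM/dpsi - d/dx(dM/dpsi')
EL = M.diff(psi(x)) - (M.diff(psip)).diff(x)
print(sp.expand(EL))  # = 2*(U - E)*psi - (hbar**2/m)*psi''
```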

Alternative approaches for a differential equation involving the square of a derivative

In the course of working out a calculus of variations problem I came across the nonlinear first-order differential equation

\big( \frac{\mathrm{d} y}{\mathrm{d} x}\big)^2 + y = c

where c is a constant. The boundary conditions of the original variational problem were y(0) = 0 and y(1) = 1. Differential equations involving squares of derivatives can be tricky and I decided to take the opportunity to try to find at least a couple of different ways of solving this relatively simple one that might be applicable in harder cases. I want to record these here. (The reader should pause at this point and try to solve this equation on their own before continuing).

One approach that worked nicely and that is generalisable to more difficult cases is to differentiate the equation, giving

2y^{\prime} y^{\prime \prime} + y^{\prime} = 0

\iff

y^{\prime}(2 y^{\prime \prime} + 1) = 0

This yields two simpler differential equations, namely

y^{\prime} = 0

and

2y^{\prime \prime} + 1 = 0

The first one implies that y is a constant, but this is not consistent with the boundary conditions. We therefore discard it, and the problem then reduces to solving the second equation, a simple second-order differential equation which can be written as

\frac{\mathrm{d}^2 y}{\mathrm{d} x^2} = -\frac{1}{2}

Integrating twice we get the general solution

y = -\frac{1}{4} x^2 + c_1 x + c_2

where c_1 and c_2 are arbitrary constants. Using y(0) = 0 and y(1) = 1 we find that c_1 = \frac{5}{4} and c_2 = 0. Therefore the required final solution of the differential equation with the squared derivative is

y = -\frac{1}{4} x^2 + \frac{5}{4} x

The other approach I tried and that worked in this case was to rearrange the equation with the squared derivative directly in order to use separation of variables. Again this is something that might be worth trying in more difficult situations. In the present case, writing c_1 for the constant c, we have

\big( \frac{\mathrm{d} y}{\mathrm{d} x}\big)^2 + y = c_1

so

(\mathrm{d} y)^2 = (c_1 - y) (\mathrm{d} x)^2

or

\big(\frac{\mathrm{d} y}{\sqrt{c_1 - y}}\big)^2 = (\mathrm{d} x)^2

and therefore

\big(\int \frac{\mathrm{d} y}{\sqrt{c_1 - y}}\big)^2 = (\int \mathrm{d} x)^2

So the problem reduces to solving

\int \frac{\mathrm{d} y}{\sqrt{c_1 - y}} = \int \mathrm{d} x

Carrying out the integrations we get

-2 \sqrt{c_1 - y} = x + c_2

which yields the general solution

y = c_1 - \frac{1}{4} x^2 - \frac{1}{2} c_2 x - \frac{1}{4} c_2^2

Using the boundary conditions we find that c_1 = \frac{25}{16} and c_2 = -\frac{5}{2}. Substituting these into the general solution we again find the final solution of the differential equation with the squared derivative to be

y = -\frac{1}{4} x^2 + \frac{5}{4} x
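Both routes can be checked in one go with sympy: the following minimal sketch verifies that the final solution satisfies the original squared-derivative equation (with the constant coming out as 25/16) together with both boundary conditions:

```python
import sympy as sp

x = sp.symbols('x')
y = -sp.Rational(1, 4)*x**2 + sp.Rational(5, 4)*x

# (y')^2 + y should be a constant...
print(sp.simplify(y.diff(x)**2 + y))   # prints 25/16

# ...and the boundary conditions y(0) = 0, y(1) = 1 should hold
print(y.subs(x, 0), y.subs(x, 1))      # prints 0 1
```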

A note on the structure of the higher-order terms in multivariate Taylor expansions

I recently had to think a bit more deeply than is usually necessary about the nature of the higher-order terms in multivariate Taylor expansions of smooth functions of the form f : \mathbb{R}^m \rightarrow \mathbb{R}. The result was a realisation that the fundamental structure of these higher-order terms is actually quite simple and that working them out manually brings up some interesting combinatorial issues, including use of the partition function from advanced number theory. I want to record some of my thoughts about this here.

For a univariate function f : \mathbb{R} \rightarrow \mathbb{R}, Taylor’s theorem says that if f is n times differentiable at some point a \in \mathbb{R} then there exists a function h_n : \mathbb{R} \rightarrow \mathbb{R} such that

f(x) - f(a) =

f^{\prime}(a)(x - a) + \frac{1}{2!} f^{\prime \prime}(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n + h_n(x)(x - a)^n

where h_n(x) \rightarrow 0 as x \rightarrow a. The n-th order Taylor polynomial

f(a) + f^{\prime}(a)(x - a) + \frac{1}{2!} f^{\prime \prime}(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n

then serves as an approximation to the function f(x). If the function f is infinitely differentiable and the remainder term tends to zero, we can approximate f to any desired level of accuracy by choosing the Taylor polynomial to be of high enough degree.

In the univariate case the n-th order term in the Taylor polynomial is

\frac{1}{n!}f^{(n)}(a)(x - a)^n

In the present note I am interested in how this generalises to the multivariate case f : \mathbb{R}^m \rightarrow \mathbb{R}, where the Taylor expansion is about a vector \mathbf{a} = (a_1, a_2, \ldots, a_m). In the bivariate case m = 2, where the Taylor expansion is about the point \mathbf{a} = (a_1, a_2), the n-th order term in the Taylor expansion is given by the formula

\frac{1}{n!}\sum_{k=0}^{n} \binom{n}{k} \frac{\partial^n f}{\partial x_1^{n-k} \partial x_2^{k}}\bigg|_{\mathbf{a}} (x_1 - a_1)^{n-k}(x_2 - a_2)^k

This enables any higher-order term in the Taylor polynomial to be quickly obtained (and notice that the binomial coefficients can be read off the appropriate row of Pascal’s triangle). For example, the fourth-order term would be

\frac{1}{4!} \bigg[\frac{\partial^4f}{\partial x_1^4}\bigg|_{\mathbf{a}} (x_1 - a_1)^4 + 4 \frac{\partial^4 f}{\partial x_1^3 \partial x_2}\bigg|_{\mathbf{a}} (x_1 - a_1)^3 (x_2 - a_2) \ \ +

6 \frac{\partial^4 f}{\partial x_1^2 \partial x_2^2}\bigg|_{\mathbf{a}} (x_1 - a_1)^2 (x_2 - a_2)^2 + 4 \frac{\partial^4 f}{\partial x_1 \partial x_2^3}\bigg|_{\mathbf{a}} (x_1 - a_1) (x_2 - a_2)^3 \  \  +

\frac{\partial^4f}{\partial x_2^4}\bigg|_{\mathbf{a}} (x_2 - a_2)^4\bigg]

In order to generalise this as easily as possible to multivariate cases with m > 2, I found it convenient to adopt the following notation:

\partial_i \equiv \frac{\partial f}{\partial x_i} \bigg|_{\mathbf{a}}

\partial_{ij} \equiv \frac{\partial^2 f}{\partial x_i \partial x_j} \bigg|_{\mathbf{a}}

\partial_{ijk} \equiv \frac{\partial^3 f}{\partial x_i \partial x_j \partial x_k} \bigg|_{\mathbf{a}}

and so on, and

v_i \equiv (x_i - a_i)

Given the function f : \mathbb{R}^m \rightarrow \mathbb{R}, we allow the indices above to take values from 1 to m. To my surprise, I then realised that if we use the Einstein summation convention, it is possible to represent any Taylor expansion as

f(\mathbf{x}) = f(\mathbf{a}) + \partial_i v_i + \frac{1}{2!} \partial_{ij} v_i v_j + \frac{1}{3!} \partial_{ijk} v_i v_j v_k + \frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l + \cdots

This generic formula will apply to any number of variables m. Notice that the number of indices in each term corresponds to the order of that term. So, for example, the fifth-order term in the Taylor expansion will involve five indices, the sixth-order term will involve six indices, and so on.

To test out this formula, let us use it to work out the fourth-order term for the bivariate case m = 2 again. The relevant term in the above form of the Taylor expansion is

\frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l

Since each of the indices i, j, k, l will be allowed to take values 1 or 2, the above expression represents a sum involving 2^4 = 16 terms. In general, the n-th order term in the Taylor expansion of a function f : \mathbb{R}^m \rightarrow \mathbb{R} will involve m^n terms. The 16 terms in the above case are

\frac{1}{4!} \bigg[ \partial_{1111} v_1 v_1 v_1 v_1 + \partial_{1112} v_1 v_1 v_1 v_2 + \partial_{1121} v_1 v_1 v_2 v_1 + \partial_{1122} v_1 v_1 v_2 v_2 \ +

\partial_{1211} v_1 v_2 v_1 v_1 + \partial_{1212} v_1 v_2 v_1 v_2 + \partial_{1221} v_1 v_2 v_2 v_1 + \partial_{1222} v_1 v_2 v_2 v_2 \ +

\partial_{2111} v_2 v_1 v_1 v_1 + \partial_{2112} v_2 v_1 v_1 v_2 + \partial_{2121} v_2 v_1 v_2 v_1 + \partial_{2122} v_2 v_1 v_2 v_2 \ +

\partial_{2211} v_2 v_2 v_1 v_1 + \partial_{2212} v_2 v_2 v_1 v_2 + \partial_{2221} v_2 v_2 v_2 v_1 + \partial_{2222} v_2 v_2 v_2 v_2\bigg]

However, due to the equality of cross-partials involving the same index numbers, this reduces to

\frac{1}{4!}\bigg[\partial_{1111} v_1^4 + 4 \partial_{1112} v_1^3 v_2 + 6 \partial_{1122} v_1^2 v_2^2 + 4 \partial_{1222} v_1 v_2^3 + \partial_{2222} v_2^4\bigg]

which matches what we previously obtained using the formula.
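The grouping step can be automated: the short Python sketch below enumerates all 2^4 index tuples, sorts each one (legitimate because of the equality of cross-partials), and counts the multiplicities, reproducing the coefficients 1, 4, 6, 4, 1:

```python
from itertools import product
from collections import Counter

# Enumerate all 2**4 index tuples (i, j, k, l) with entries in {1, 2} and
# group them by sorted index multiset; group sizes are the coefficients
counts = Counter(tuple(sorted(idx)) for idx in product((1, 2), repeat=4))
for idx, coeff in sorted(counts.items()):
    print(idx, coeff)
# (1, 1, 1, 1) 1
# (1, 1, 1, 2) 4
# (1, 1, 2, 2) 6
# (1, 2, 2, 2) 4
# (2, 2, 2, 2) 1
```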

In a similar way, we could proceed to systematically calculate the fourth-order term in the trivariate case m = 3. The relevant term in the generic Taylor expansion is the same one as before,  namely

\frac{1}{4!} \partial_{ijkl} v_i v_j v_k v_l

However, since each of the indices will take values from 1 to 3 in the trivariate case, the above expression now represents a sum of 3^4 = 81 terms. The 81 terms are

\frac{1}{4!} \bigg[\partial_{1111} v_1 v_1 v_1 v_1 + \partial_{1121} v_1 v_1 v_2 v_1 + \partial_{1131} v_1 v_1 v_3 v_1 \ +

\partial_{1112} v_1 v_1 v_1 v_2 + \partial_{1122} v_1 v_1 v_2 v_2 + \partial_{1132} v_1 v_1 v_3 v_2 \ +

\partial_{1113} v_1 v_1 v_1 v_3 + \partial_{1123} v_1 v_1 v_2 v_3 + \partial_{1133} v_1 v_1 v_3 v_3 \ +

\partial_{1211} v_1 v_2 v_1 v_1 + \partial_{1221} v_1 v_2 v_2 v_1 + \partial_{1231} v_1 v_2 v_3 v_1 \ +

\partial_{1212} v_1 v_2 v_1 v_2 + \partial_{1222} v_1 v_2 v_2 v_2 + \partial_{1232} v_1 v_2 v_3 v_2 \ +

\partial_{1213} v_1 v_2 v_1 v_3 + \partial_{1223} v_1 v_2 v_2 v_3 + \partial_{1233} v_1 v_2 v_3 v_3 \ +

\partial_{1311} v_1 v_3 v_1 v_1 + \partial_{1321} v_1 v_3 v_2 v_1 + \partial_{1331} v_1 v_3 v_3 v_1 \ +

\partial_{1312} v_1 v_3 v_1 v_2 + \partial_{1322} v_1 v_3 v_2 v_2 + \partial_{1332} v_1 v_3 v_3 v_2 \ +

\partial_{1313} v_1 v_3 v_1 v_3 + \partial_{1323} v_1 v_3 v_2 v_3 + \partial_{1333} v_1 v_3 v_3 v_3 \ +

\partial_{2111} v_2 v_1 v_1 v_1 + \partial_{2121} v_2 v_1 v_2 v_1 + \partial_{2131} v_2 v_1 v_3 v_1 \ +

\partial_{2112} v_2 v_1 v_1 v_2 + \partial_{2122} v_2 v_1 v_2 v_2 + \partial_{2132} v_2 v_1 v_3 v_2 \ +

\partial_{2113} v_2 v_1 v_1 v_3 + \partial_{2123} v_2 v_1 v_2 v_3 + \partial_{2133} v_2 v_1 v_3 v_3 \ +

\partial_{2211} v_2 v_2 v_1 v_1 + \partial_{2221} v_2 v_2 v_2 v_1 + \partial_{2231} v_2 v_2 v_3 v_1 \ +

\partial_{2212} v_2 v_2 v_1 v_2 + \partial_{2222} v_2 v_2 v_2 v_2 + \partial_{2232} v_2 v_2 v_3 v_2 \ +

\partial_{2213} v_2 v_2 v_1 v_3 + \partial_{2223} v_2 v_2 v_2 v_3 + \partial_{2233} v_2 v_2 v_3 v_3 \ +

\partial_{2311} v_2 v_3 v_1 v_1 + \partial_{2321} v_2 v_3 v_2 v_1 + \partial_{2331} v_2 v_3 v_3 v_1 \ +

\partial_{2312} v_2 v_3 v_1 v_2 + \partial_{2322} v_2 v_3 v_2 v_2 + \partial_{2332} v_2 v_3 v_3 v_2 \ +

\partial_{2313} v_2 v_3 v_1 v_3 + \partial_{2323} v_2 v_3 v_2 v_3 + \partial_{2333} v_2 v_3 v_3 v_3 \ +

\partial_{3111} v_3 v_1 v_1 v_1 + \partial_{3121} v_3 v_1 v_2 v_1 + \partial_{3131} v_3 v_1 v_3 v_1 \ +

\partial_{3112} v_3 v_1 v_1 v_2 + \partial_{3122} v_3 v_1 v_2 v_2 + \partial_{3132} v_3 v_1 v_3 v_2 \ +

\partial_{3113} v_3 v_1 v_1 v_3 + \partial_{3123} v_3 v_1 v_2 v_3 + \partial_{3133} v_3 v_1 v_3 v_3 \ +

\partial_{3211} v_3 v_2 v_1 v_1 + \partial_{3221} v_3 v_2 v_2 v_1 + \partial_{3231} v_3 v_2 v_3 v_1 \ +

\partial_{3212} v_3 v_2 v_1 v_2 + \partial_{3222} v_3 v_2 v_2 v_2 + \partial_{3232} v_3 v_2 v_3 v_2 \ +

\partial_{3213} v_3 v_2 v_1 v_3 + \partial_{3223} v_3 v_2 v_2 v_3 + \partial_{3233} v_3 v_2 v_3 v_3 \ +

\partial_{3311} v_3 v_3 v_1 v_1 + \partial_{3321} v_3 v_3 v_2 v_1 + \partial_{3331} v_3 v_3 v_3 v_1 \ +

\partial_{3312} v_3 v_3 v_1 v_2 + \partial_{3322} v_3 v_3 v_2 v_2 + \partial_{3332} v_3 v_3 v_3 v_2 \ +

\partial_{3313} v_3 v_3 v_1 v_3 + \partial_{3323} v_3 v_3 v_2 v_3 + \partial_{3333} v_3 v_3 v_3 v_3 \bigg]

Again due to equality of cross-partials involving the same index numbers, this reduces to

\frac{1}{4!} \bigg[ \partial_{1111} v_1^4 + \partial_{2222} v_2^4 + \partial_{3333} v_3^4 + 4 \partial_{1112} v_1^3 v_2 + 4 \partial_{1113} v_1^3 v_3 +

4 \partial_{2223} v_2^3 v_3 + 4 \partial_{1222} v_1 v_2^3 + 4 \partial_{1333} v_1 v_3^3 + 4 \partial_{2333} v_2 v_3^3 +

6 \partial_{1122} v_1^2 v_2^2 + 6 \partial_{1133} v_1^2 v_3^2 + 6 \partial_{2233} v_2^2 v_3^2 + 12 \partial_{1123} v_1^2 v_2 v_3 + 12 \partial_{1223} v_1 v_2^2 v_3 + 12 \partial_{1233} v_1 v_2 v_3^2 \bigg]

Note that the coefficient of each of the terms is the multinomial coefficient

\binom{k}{k_1, k_2, \ldots, k_m} = \frac{k!}{k_1! k_2! \dots k_m!}

where k_1 + k_2 + \cdots + k_m = k. This gives the number of ordered arrangements of k objects in which there are k_1 objects of type 1, k_2 objects of type 2, \ldots, k_m objects of type m.

In our case we have k = 4 and m = 3. Thus, the coefficient of \partial_{1122} v_1^2 v_2^2 for example is 6, because there are two 1s, two 2s and zero 3s, and setting k_1 = 2, k_2 = 2, k_3 = 0 we get

\binom{k}{k_1, k_2, k_3} = \frac{k!}{k_1! k_2! k_3!} = \frac{4!}{2! 2! 0!} = \frac{24}{4} = 6

Similarly, in the case of \partial_{1123} v_1^2 v_2 v_3 the coefficient is 12 because k_1 = 2, k_2 = 1, k_3 = 1, so

\binom{k}{k_1, k_2, k_3} = \frac{4!}{2! 1! 1!} = \frac{24}{2} = 12

Note also that within the square brackets in the reduced expression above there are 15 different `types’ of terms. Why 15? The answer has to do with the concept of partitions from number theory. A partition of a positive integer is a way of writing that integer as a sum of positive integers. Two sums that differ only in the order of the integers are considered to be the same partition. Now, the number 4 has five partitions, namely

4

3 + 1

2 + 2

2 + 1 + 1

1 + 1 + 1 + 1

In the multinomial coefficients in the square brackets in the reduced expression above we are trying to obtain the sum k = 4 in all possible ways by assigning values to k_1, k_2 and k_3. There are 3 ways of achieving this with the partition 4 (one example is k_1 = 4, k_2 = 0, k_3 = 0); there are 6 ways of achieving this with the partition 3 + 1 (one example is k_1 = 3, k_2 = 1, k_3 = 0); there are 3 ways of achieving this with the partition 2 + 2 (one example is k_1 = 2, k_2 = 2, k_3 = 0); there are 3 ways of achieving this with the partition 2 + 1 + 1 (one example is k_1 = 2, k_2 =1, k_3 = 1); and finally it is impossible to use the partition 1 + 1 + 1 + 1 since we are only summing three terms. Thus there are 15 ways to achieve the sum k = 4 by applying the partitions of 4 to the values of k_1, k_2 and k_3, which explains why there are 15 terms in the square brackets in the reduced expression above.

Can we use these insights to quickly work out the fourth-order term in the Taylor expansion of the quadvariate function f : \mathbb{R}^4 \rightarrow \mathbb{R}? Since m = 4 and n = 4, this fourth-order term will be a sum of 4^4 = 256 terms. Can we work out how many terms there will be in the reduced expression in which equivalent cross-partials are grouped together? Yes we can, using the partitions of 4 again. In the multinomial coefficients in the square brackets in the reduced expression we will be trying to obtain the sum k = 4 in all possible ways by assigning values to k_1, k_2, k_3 and k_4. There are 4 ways of achieving this with the partition 4 (one example is k_1 = 4, k_2 = 0, k_3 = 0, k_4 = 0); there are 12 ways of achieving this with the partition 3 + 1 (one example is k_1 = 3, k_2 = 1, k_3 = 0, k_4 = 0); there are 6 ways of achieving this with the partition 2 + 2 (one example is k_1 = 2, k_2 = 2, k_3 = 0, k_4 = 0); there are 12 ways of achieving this with the partition 2 + 1 + 1 (one example is k_1 = 2, k_2 =1, k_3 = 1, k_4 = 0); and finally there is exactly one way to use the partition 1 + 1 + 1 + 1, namely k_1 = 1, k_2 = 1, k_3 = 1, k_4 = 1. Thus there are 35 ways to achieve the sum k = 4 by applying the partitions of 4 to the values of k_1, k_2, k_3 and k_4, so there should be 35 different types of terms in the reduced expression. These will be

\frac{1}{4!} \bigg[\partial_{1111} v_1^4 + \partial_{2222} v_2^4 + \partial_{3333} v_3^4 + \partial_{4444} v_4^4 \ +

4 \partial_{1112} v_1^3 v_2 + 4 \partial_{1222} v_1 v_2^3 + 4 \partial_{2223} v_2^3 v_3 \ +

4 \partial_{3444} v_3 v_4^3 + 4 \partial_{1113} v_1^3 v_3 + 4 \partial_{1333} v_1 v_3^3 \ +

4 \partial_{2333} v_2 v_3^3 + 4 \partial_{2224} v_2^3 v_4 + 4 \partial_{1114} v_1^3 v_4 \ +

4 \partial_{1444} v_1 v_4^3 + 4 \partial_{3334} v_3^3 v_4 + 4 \partial_{2444} v_2 v_4^3 \ +

6 \partial_{1122} v_1^2 v_2^2 + 6 \partial_{2233} v_2^2 v_3^2 + 6 \partial_{1133} v_1^2 v_3^2 \ +

6 \partial_{2244} v_2^2 v_4^2 + 6 \partial_{1144} v_1^2 v_4^2 + 6 \partial_{3344} v_3^2 v_4^2 \ +

12 \partial_{1123} v_1^2 v_2 v_3 + 12 \partial_{1124} v_1^2 v_2 v_4 + 12 \partial_{1134} v_1^2 v_3 v_4 \ +

12 \partial_{1223} v_1 v_2^2 v_3 + 12 \partial_{1224} v_1 v_2^2 v_4 + 12 \partial_{1334} v_1 v_3^2 v_4 \ +

12 \partial_{1344} v_1 v_3 v_4^2 + 12 \partial_{1244} v_1 v_2 v_4^2 + 12 \partial_{1233} v_1 v_2 v_3^2 \ +

12 \partial_{2334} v_2 v_3^2 v_4 + 12 \partial_{2234} v_2^2 v_3 v_4 + 12 \partial_{2344} v_2 v_3 v_4^2 \ +

24 \partial_{1234} v_1 v_2 v_3 v_4 \bigg]

The coefficient on each of these terms is the relevant multinomial coefficient, and note that these coefficients add up to 256. The following are examples of how I calculated the multinomial coefficients in this quadvariate case:

For the term involving \partial_{1112} we have k_1 = 3, k_2 = 1, k_3 = 0, k_4 = 0, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{3! 1! 0! 0!} = \frac{24}{6} = 4

For the term involving \partial_{1122} we have k_1 = 2, k_2 = 2, k_3 = 0, k_4 = 0, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{2! 2! 0! 0!} = \frac{24}{4} = 6

For the term involving \partial_{1123} we have k_1 = 2, k_2 = 1, k_3 = 1, k_4 = 0, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{2! 1! 1! 0!} = \frac{24}{2} = 12

Finally, for the term involving \partial_{1234} we have k_1 = 1, k_2 = 1, k_3 = 1, k_4 = 1, so

\binom{k}{k_1, k_2, k_3, k_4} = \frac{4!}{1! 1! 1! 1!} = \frac{24}{1} = 24
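All of the counting in this section can be verified with the same enumeration idea; the sketch below reproduces, for m = 2, 3, 4, both the number of distinct term types (5, 15, 35) and the fact that the multinomial coefficients sum to m^4 (16, 81, 256):

```python
from itertools import product
from collections import Counter

# Group the m**n index tuples by sorted index multiset: the number of
# groups is the number of distinct term types in the reduced expression,
# and the group sizes are the multinomial coefficients
def reduced_terms(m, n):
    return Counter(tuple(sorted(idx)) for idx in product(range(1, m + 1), repeat=n))

for m in (2, 3, 4):
    c = reduced_terms(m, 4)
    print(m, len(c), sum(c.values()))
# 2 5 16
# 3 15 81
# 4 35 256
```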

A study of Euler’s untranslated paper `De integratione aequationum differentialium’

The following are some translations and annotations I made while studying a previously untranslated paper in Latin by Leonhard Euler (1707-1783) entitled De integratione aequationum differentialium (On the integration of differential equations), 1763. (Eneström index number E269, obtained from the Euler archive). This contains, among other things, some interesting treatments of linear first-order equations as well as nonlinear first-order equations such as Bernoulli’s equation and Riccati’s equation. I was curious to see what Euler had to say about these, particularly Riccati’s equation. (Jacopo Francesco Riccati (1676-1754) was a Venetian mathematician who introduced an important class of nonlinear first-order differential equations which now bear his name, and which Euler investigated in greater depth).

To begin with, I will translate the first paragraph of the introduction to the paper as it gives a nice overview of the central idea as well as giving a flavour of Euler’s writing style:

1.

I consider herein differential equations of first order, involving only two variables, which one may represent in the general form M dx + N dy = 0, where M and N denote any functions of a pair of variables x and y. It is demonstrated that such equations always imply a certain relationship between the variables x and y, such that for any given value of one of them, the values for the other can be determined. Since moreover this definite relationship between the two variables is to be found through integration, the integral equation, regarded in its greatest generality, will accommodate a new constant quantity which provided that it depends completely on our choice enables as it were an infinite number of integral equations to be encompassed, which should all be equally compatible with the differential equation. 

Mathematical comments: What Euler is essentially saying in this first paragraph is that he is going to consider methods for integrating first-order differential equations of the form M(x, y) dx + N(x, y) dy = 0 in order to obtain corresponding relationships between the variables x and y which he calls `integral equations’ (aequationes integrales) of the form V(x, y) = Const. The term on the right-hand side is an arbitrary constant and no matter what value is assigned to this constant, the differential of this `integral equation’ will always be of the same form M dx + N dy = 0. \square

After some further introductory comments, the rest of Euler’s long paper consists of a multitude of problems, theorems, corollaries and examples concerning differential equations of this type. The first problem that Euler considers is important because it more or less sets the scene for the rest of the paper:

Problem 1.

7. Given that the differential equation M dx + N dy = 0 is such that \big(\frac{d M}{d y} \big) = \big(\frac{d N}{d x} \big), find its integral equation.   

Solution.

If it is to be that \big(\frac{d M}{d y} \big) = \big(\frac{d N}{d x} \big), then a function must exist defined over a pair of variables x and y, which when it is differentiated yields M dx + N dy. Let V be this function, and given that dV = M dx + N dy, M dx will be the differential of V when only the variable x is considered, and N dy will be its differential when only the variable y is considered. From here therefore V will be found, if either M dx is integrated with y regarded as a constant, or N dy is integrated with x regarded as a constant: and in this way the task is reduced to the integration of a differential formula involving a single variable, which might be achieved either by algebraic methods, or else quadrature of curves might be needed. But since it is the case that the quantity V can be found in two ways, and one integration treats as a constant any function whatever of y, while the other treats as a constant any function whatever of x, so it may then be that both V = \int M dx + Y and V = \int N dy + X hold, and it is always permissible to define these functions Y of y, and X of x, such that \int M dx + Y = \int N dy + X, and this is in all cases readily guaranteed. When this is done it is evident that since the quantity V is to be the integral of the formula M dx + N dy, the complete integral equation of the proposed equation M dx + N dy = 0 will then be V = Const., as this involves a constant which is dependent on our choice. 

Mathematical comments: The first thing to observe here is that Euler uses the notation \big(\frac{d M}{d y} \big) and \big(\frac{d N}{d x} \big) to represent the partial derivatives \frac{\partial M}{\partial y} and \frac{\partial N}{\partial x}. At the start of the solution he says that if \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} then a function V(x, y) must exist which yields M dx + N dy when it is totally differentiated. The reason for this immediately becomes clear if we totally differentiate V(x, y) to get dV = \frac{\partial V}{\partial x} dx + \frac{\partial V}{\partial y} dy. We see that M = \frac{\partial V}{\partial x} and N = \frac{\partial V}{\partial y} so the condition \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} simply amounts to saying that the cross-partials of V are equal, i.e., \frac{\partial^2 V}{\partial x \partial y} = \frac{\partial^2 V}{\partial y \partial x}. This must be true for any well-behaved function. So if the condition \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} holds it will always be possible to find a function V whose total differential is M dx + N dy. Conversely, if it is not true that \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}, there can be no function whose total differential is M dx + N dy. Euler deals with this latter case in a subsequent problem (see below). Since M = \frac{\partial V}{\partial x} we see that V = \int M dx + Y where Y is any function of y. Similarly, since N = \frac{\partial V}{\partial y} we have that V = \int N dy + X where X is any function of x. Equating these two expressions for V we can then work out what the functions X and Y must be in order to make \int M dx + Y and \int N dy + X mutually compatible. This is essentially the strategy that Euler is advocating here. \square

Euler provides a number of examples to illustrate the above procedure. The following is the first one:

Example 1.

12. Integrate this differential equation:

2axy dx + axx dy - y^3 dx - 3xyy dy = 0

The equation compared with the form M dx + N dy = 0 implies

M = 2axy - y^3 and N = axx - 3xyy.

Then the first thing to be considered is whether in this case the problem can be solved? To which end we inquire as to the values of:

\big(\frac{d M}{d y}\big) = 2ax - 3yy, and \big(\frac{d N}{d x}\big) = 2ax - 3yy,

which since they are equal, the suggested method must necessarily succeed. Then it will be discovered, with y taken to be constant:

\int M dx = axxy - y^3 x + Y

which is of such a form that if the differential is taken, treating x as a constant, it will yield:

axx dy - 3yyx dy + dY = N dy,

and upon replacing N with its value axx - 3xyy, the result will be dY = 0, from which we get Y = 0, or Y = const. Hence we will have the required integral equation:

axxy - y^3 x = Const.
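Euler’s result here is easy to check with sympy: the candidate integral V = axxy - y^3x should return M and N as its partial derivatives, which is exactly what the following minimal sketch confirms:

```python
import sympy as sp

x, y, a = sp.symbols('x y a')

# Euler's Example 1: the integral V = a*x*x*y - y**3*x should have
# dV/dx = M = 2*a*x*y - y**3 and dV/dy = N = a*x*x - 3*x*y*y
V = a*x**2*y - y**3*x
print(sp.expand(V.diff(x)))  # 2*a*x*y - y**3
print(sp.expand(V.diff(y)))  # a*x**2 - 3*x*y**2
```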

In the next extract, Euler introduces an integrating factor method to deal with situations in which the condition \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x} does not hold:

Theorem.

16. If for the differential equation M dx + N dy = 0 it is not the case that \big(\frac{d M}{d y}\big) = \big(\frac{d N}{d x}\big), a multiplier is always available by means of which multiplication of the formula M dx + N dy will make it become integrable. 

Proof.

Since it is not the case that \big(\frac{d M}{d y}\big) = \big(\frac{d N}{d x}\big), also the formula M dx + N dy will not be integrable, or rather a function of x and y does not exist whose differential is M dx + N dy. In truth however it is not the formula M dx + N dy, but the equation M dx + N dy = 0, whose integral is being sought; and since this equation still stands if it is multiplied by any function L of x and y, so that it may be written L M dx + L N dy = 0, it is demonstrated that it is always possible to find a function L of this kind, such that the formula L M dx + L N dy = 0 will be integrable. For this to come about it will be necessary that:

\big(\frac{d LM}{d y}\big) = \big(\frac{d LN}{d x}\big)

or if it is supposed that dL = P dx + Q dy, then since \big(\frac{d L}{d y}\big) = Q, and \big(\frac{d L}{d x}\big) = P, the function L should be constructed so that:

L \big(\frac{d M}{d y}\big) + M Q = L \big(\frac{d N}{d x}\big) + NP

But it is apparent that this condition is sufficient to define a function L, which when it multiplies the formula M dx + N dy will make it integrable. 

Mathematical comments: What Euler is doing here is deriving a procedure for finding an integrating factor L(x, y) which will make LM dx + LN d y = 0 integrable. If this is integrable there will be a function V(x, y) whose total differential is LM dx + LN d y. But the total differential of V is d V = \frac{\partial V}{\partial x} dx + \frac{\partial V}{\partial y} dy from which we see that

\frac{\partial V}{\partial x} = LM and \frac{\partial V}{\partial y} = LN

Therefore Euler’s condition

\frac{\partial LM}{\partial y} = \frac{\partial LN}{\partial x}

again amounts to saying that the cross-partials of V must be equal. Expanding Euler’s condition we get

L \frac{\partial M}{\partial y} + M \frac{\partial L}{\partial y} = L \frac{\partial N}{\partial x} + N \frac{\partial L}{\partial x}

which when it is assumed that dL = P dx + Q dy we can write as

L \frac{\partial M}{\partial y} + MQ = L \frac{\partial N}{\partial x} + NP

This is the equation at the end of Euler’s proof and we can see that it is nothing more than a statement of the equality of the cross-partials of V. Euler does not explicitly take the final step, saying simply that it is evident from this that the required function L can be defined, but what he means is that L can now be defined in terms of the functions M, N, P and Q as

L = \frac{NP - MQ}{\frac{\partial M}{\partial y} - \frac{\partial N}{\partial x}}

When M dx + N dy is multiplied through by a function L defined in this way, the result will be a differential equation that is integrable. \square

I will now jump ahead in Euler’s paper to an application of the above procedure in the familiar case of a linear first-order differential equation:

Problem 4.

34. If the proposed differential equation is to be 

P dx + Q y dx + R dy = 0

where P, Q and R denote any functions whatever of x, and moreover the other variable y does not have more than one dimension, find the multiplier which makes it integrable. 

Solution.

Comparison of this equation with the form M dx + N dy implies M = P + Qy and N = R, and so we get 

\big(\frac{d M}{d y}\big) = Q and \big(\frac{d N}{d x}\big) = \frac{d R}{d x}

If now L is to be set up as the multiplier in question, let dL = p dx + q dy, and then the equation ought to satisfy

\frac{Np - Mq}{L} = Q - \frac{d R}{d x} = \frac{Rp - (P + Qy)q}{L}

Now since Q - \frac{d R}{d x} is to be a function only of x, L will likewise have to be accepted as a function only of x, so that q = 0 and dL = p dx; whence it will be the case that:

Q - \frac{d R}{d x} = \frac{Rp}{L}, or rather Q dx - dR = \frac{R dL}{L}

and therefore \frac{d L}{L} = \frac{Q dx}{R} - \frac{d R}{R}. Hence upon integration will be obtained lL = \int \frac{Q dx}{R} - lR, and taking e to be the number whose hyperbolic logarithm equals unity, it turns out that 

L = \frac{1}{R} e^{\int \frac{Q dx}{R}}

Then as a result of this discovered multiplier the integral equation will be:

\int \frac{P dx}{R} e^{\int \frac{Q dx}{R}} + y e^{\int \frac{Q dx}{R}} = Const.

Mathematical comments: In view of the previous theorem introducing the integrating factor approach using Euler’s idiosyncratic notation, Euler’s explanation of the solution to this problem is largely self-explanatory. Perhaps only the final step needs clarification. Having found the multiplier L we can multiply the original differential equation by it to get

L P dx + L Q y dx + L R dy = 0

or

\frac{P dx}{R} e^{\int \frac{Q dx}{R}} + \frac{Q dx}{R} y e^{\int \frac{Q dx}{R}} + e^{\int \frac{Q dx}{R}} dy = 0

or

\frac{P dx}{R} e^{\int \frac{Q dx}{R}} + d \big(y e^{\int \frac{Q dx}{R}}\big) = 0

This can now be integrated directly to get the final equation obtained by Euler. Although the calculation is straightforward, it is interesting to compare Euler’s approach here with the type of argument usually given in modern treatments of linear first-order differential equations. In modern treatments the differential equation would usually be given as

\frac{d y}{d x} + Py = Q

We would then seek an integrating factor L(x) which would enable us to write

\frac{d}{dx}(y L) = QL

and this could then be integrated directly to find y. The unknown function L is found by first expanding the left-hand side to get

L \frac{d y}{d x} + y \frac{d L}{d x} = QL

Dividing through by L we get

\frac{d y}{d x} + \big(\frac{d L/d x}{L}\big)y = Q

Comparing this with the original equation we see that

\frac{1}{L} \frac{d L}{d x} = P

and so

L = e^{\int P dx}

Putting this into \frac{d}{dx}(y L) = QL and integrating directly we then get the required solution

y e^{\int P dx} = \int Q e^{\int P dx} dx + Const.

which is equivalent to the one obtained by Euler. \square
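As a concrete illustration of this modern recipe, the following sympy sketch uses the arbitrary choices P = 1/x and Q = x (my own, not taken from the text above) to compute L = e^{\int P dx} and compares the resulting solution with sympy’s own ODE solver:

import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')
C = sp.symbols('C')

P, Q = 1/x, x                             # illustrative coefficient functions

L = sp.exp(sp.integrate(P, x))            # integrating factor L = e^{int P dx} = x
sol = (sp.integrate(Q * L, x) + C) / L    # from y L = int Q L dx + C
print(sp.expand(sol))                     # x**2/3 + C/x

# Cross-check with sympy's ODE solver:
print(sp.dsolve(sp.Eq(y(x).diff(x) + P * y(x), Q)))   # e.g. Eq(y(x), C1/x + x**2/3)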

In the next problem Euler considers a nonlinear first-order differential equation which today is known as Bernoulli’s equation:

Problem 5.

37. If the proposed differential equation is to be: 

P y^n dx + Q y dx + R dy = 0

where P, Q and R denote any functions whatever of x, find the multiplier which makes it integrable. 

Solution.

Then it will be that M = P y^n + Qy and N = R, and hence

\big(\frac{d M}{d y}\big) = n P y^{n - 1} + Q, and \big(\frac{d N}{d x}\big) = \frac{d R}{d x}

Therefore assuming that the multiplier in question is L and dL = p dx + q dy, based on our previous findings it will be the case that:

 \frac{Rp - P y^n q - Q y q}{L} = n P y^{n - 1} + Q - \frac{d R}{d x}

If it is imagined that L = S y^m, where S is a function only of x, it will be the case that p = \frac{y^m d S}{d x}, and q = m S y^{m - 1}, so that substituting these values will yield:

\frac{R d S}{S d x} - m P y^{n - 1} - m Q = n P y^{n - 1} + Q - \frac{d R}{d x}

In order for this equation to hold it needs to take m = -n, so it will become

\frac{R d S}{S d x} = (1 - n) Q - \frac{d R}{d x} , or \frac{d S}{S} = \frac{(1 - n) Q dx}{R} - \frac{d R}{R}

Then since integration will yield S = \frac{1}{R} e^{(1 - n) \int \frac{Q dx}{R}}, the required multiplier, with m = -n, will be

L = \frac{y^{-n}}{R} e^{(1 - n) \int \frac{Q dx}{R}}

and the integral equation will be

\frac{y^{1 - n}}{1 - n} e^{(1 - n) \int \frac{Q dx}{R}} + \int \frac{P dx}{R} e^{(1 - n) \int \frac{Q dx}{R}} =  Const.

Mathematical comments: Again in view of the previous material, Euler’s solution to this problem is largely self-explanatory. To get the final equation we multiply the original differential equation by the multiplier to get

L P y^n dx + L Q y dx + L R dy = 0

or

\frac{P dx}{R} e^{(1 - n) \int \frac{Q dx}{R}} + \frac{Q dx}{R} y^{1 - n} e^{(1 - n) \int \frac{Q dx}{R}} + y^{-n} e^{(1 - n) \int \frac{Q dx}{R}} dy = 0

or

\frac{P dx}{R} e^{(1 - n) \int \frac{Q dx}{R}} + d \big(\frac{y^{1 - n}}{1 - n} e^{(1 - n) \int \frac{Q dx}{R}}\big) = 0

and this last equation can now be integrated directly. As with the previous problem, it is interesting to compare Euler’s approach to solving this problem with the way it would be done in modern treatments. Prior to Euler, the equation in this problem was also studied by two Bernoulli brothers, James Bernoulli (1654-1705) and John Bernoulli (1667-1748), as well as by Gottfried Wilhelm Leibniz (1646-1716). Ironically, the modern approach is essentially the one developed by John Bernoulli before Euler tackled it (John Bernoulli was Euler’s tutor). In modern times the equation would usually be written as

\frac{d y}{d x} + P y = Q y^n

and John Bernoulli’s approach was to make the change of variable z = y^{1 - n}. Then

\frac{d z}{d x} = \frac{1 - n}{y^n} \frac{d y}{d x}

and we can rewrite the original equation using this new variable as

\frac{d z}{d x} + (1 - n) P z = (1 - n) Q

This is now a first-order linear equation of the type solved in the previous problem. Following the approach discussed there, we seek a multiplier L which enables us to write the equation as

\frac{d}{dx} (z L) = (1 - n) Q L

Expanding the left-hand side and comparing with the original equation we find that

\frac{1}{L} \frac{d L}{d x} = (1 - n) P

so the multiplier is

L = e^{(1 - n) \int P dx}

Then we can write

\frac{d}{dx}\big(z e^{(1 - n) \int P dx}\big) = (1 - n) Q e^{(1 - n) \int P dx}

This can be integrated directly to get

z e^{(1 - n) \int P dx} = (1 - n) \int Q e^{(1 - n) \int P dx} dx + Const.

or

y^{1-n} e^{(1 - n) \int P dx} = (1 - n) \int Q e^{(1 - n) \int P dx} dx + Const.

which is equivalent to Euler’s solution. \square
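The substitution itself is easy to automate. The following sympy sketch applies John Bernoulli’s change of variable to the illustrative case P = 1, Q = 1, n = 2 (constants of my own choosing), solves the resulting linear equation for z, and verifies the answer in the original Bernoulli equation:

import sympy as sp

x = sp.symbols('x')
z = sp.Function('z')

P, Q, n = 1, 1, 2                         # illustrative constants

# z = y^{1-n} turns dy/dx + P y = Q y^n into dz/dx + (1-n) P z = (1-n) Q:
zsol = sp.dsolve(sp.Eq(z(x).diff(x) + (1 - n) * P * z(x), (1 - n) * Q)).rhs
y_sol = zsol**sp.Rational(1, 1 - n)       # invert the substitution: y = z^{1/(1-n)}

# Verify in the original Bernoulli equation:
resid = sp.diff(y_sol, x) + P * y_sol - Q * y_sol**n
print(sp.simplify(resid))                 # 0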

The final extract from Euler’s paper that I want to include here is a problem dealing with a nonlinear differential equation now known as Riccati’s equation. The equation was apparently first referred to as `Riccati’s equation’ in 1770 (only a few years after Euler wrote this paper) by the mathematician Jean-Baptiste le Rond d’Alembert (1717-1783):

Problem 9. 

56. For this proposed differential equation:

dy + P y dx + Q y y dx + R dx = 0

where P, Q and R are functions only of x, if it is determined that this equation satisfies y = v where v is a function of x, find the multipliers which make this equation integrable. 

Solution.

Since this equation is satisfied by the value y = v, it will be the case that

dv + P v dx + Q v v dx + R dx = 0;

if one then puts y = v + \frac{1}{z}, one will obtain

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

or:

dz - (P + 2Qv)z dx - Qdx = 0

which is rendered integrable by means of the multiplier

e^{-\int(P + 2Qv)dx}.

This multiplier will thus fit the proposed equation when it itself is multiplied by zz. When therefore it is the case that z = \frac{1}{y - v}, the multiplier that will render the proposed equation integrable will be

\frac{1}{(y - v)^2} \ e^{-\int(P + 2Qv)dx}

For the sake of brevity let e^{-\int(P + 2Qv)dx} = S. Since the integral of the equation dz - (P + 2Qv)z dx - Qdx = 0 is

Sz - \int QS dx = Const.

all the required multipliers will be encompassed by this form:

\frac{S}{(y - v)^2} funct. \big(\frac{S}{y - v} - \int QS dx \big)

where by hypothesis v is a known function of x, and also S = e^{-\int(P + 2Qv)dx}

Mathematical comments: In this problem Euler has developed a method for solving Riccati’s equation when a particular integral is known. The basic idea is that if the particular solution v(x) is known, then the change of variables y = v + \frac{1}{z} gives a simple linear first-order equation for z of the form dz - (P + 2Qv)z dx - Qdx = 0. This is because substituting y = v + \frac{1}{z} into Riccati’s equation we get

dy + Py dx + Qyy dx + Rdx =

\big(dv - \frac{1}{zz} dz\big) + \big(Pv dx + \frac{P dx}{z}\big) + \big(Qvv dx + \frac{2Qvdx}{z} + \frac{Qdx}{zz}\big) + Rdx =0

which reduces to

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

\iff

dz - (P + 2Qv)z dx - Qdx = 0

The integrating factor which makes this simple linear first-order differential equation integrable can be found straightforwardly using the method of Problem 4, and is e^{-\int(P + 2Qv)dx}. Then the integrating factor that makes the equation

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

integrable will be

zz e^{-\int(P + 2Qv)dx} = \frac{1}{(y - v)^2} \ e^{-\int(P + 2Qv)dx}

To clarify the last part of Euler’s solution, note that if we multiply through the equation dz - (P + 2Qv)z dx - Qdx = 0 by S = e^{-\int(P + 2Qv)dx} we get

Sdz - S(P + 2Qv)z dx - SQdx = 0

\iff

d(Sz) - QSdx = 0

This last equation can be integrated directly to give

Sz - \int QS dx = Const.

or

\frac{S}{y - v} - \int QS dx = Const.

Euler makes the point that in general the integrating factor for all cases of Riccati’s equation in which a particular solution v(x) is known will be of the form

\frac{S}{(y - v)^2} f\big(\frac{S}{y - v} - \int QS dx\big)

where f is some unspecified function. This works because, along solutions, the argument of the unspecified function f is an arbitrary constant, so df = 0. Thus, for example, starting from

-\frac{d z}{zz} + \frac{P d x}{z} + \frac{2Qv d x}{z} + \frac{Q d x}{zz} = 0

we can multiply through by -\frac{Sf}{(y - v)^2} instead of just -\frac{S}{(y - v)^2} to get

Sf dz - Sf (P + 2Qv) z dx - SfQ dx = 0

and since df = 0 we can add the term zS df to get

Sf dz - Sf (P + 2Qv) z dx + zSdf - SfQ dx = 0

\iff

d\big(Sfz\big) - QSf dx = 0

and this can be integrated directly to get

Sfz - \int QSf dx = Const. \square
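As a quick check of Euler’s reduction, here is a sympy sketch for the illustrative Riccati equation y^{\prime} + y^2 - (1 + x^2) = 0 (that is, P = 0, Q = 1 and R = -(1 + x^2), an example of my own), which has the particular solution v = x:

import sympy as sp

x = sp.symbols('x')
z = sp.Function('z')

P, Q = 0, 1
R = -(1 + x**2)
v = x            # known particular solution of dy/dx + P y + Q y^2 + R = 0

# y = v + 1/z gives the linear equation dz/dx - (P + 2Qv) z - Q = 0:
zsol = sp.dsolve(sp.Eq(z(x).diff(x) - (P + 2*Q*v) * z(x) - Q, 0)).rhs
y_sol = v + 1/zsol

# Verify the general solution in the original Riccati equation:
resid = sp.diff(y_sol, x) + P * y_sol + Q * y_sol**2 + R
print(sp.simplify(resid))    # 0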

On the electrodynamical part of Einstein’s paper ‘Zur Elektrodynamik bewegter Körper’

Albert Einstein (1879-1955) introduced his special theory of relativity in a 1905 paper Zur Elektrodynamik bewegter Körper (On the Electrodynamics of Moving Bodies, Annalen der Physik. 17 (10): 891–921). The thoughtfulness and confidence of the 26-year-old Einstein are impressive, and it is also striking that the paper contains no references to other research publications. The only person Einstein acknowledges is his best friend Michele Angelo Besso (1873-1955), in the now famous concluding sentence: Zum Schlusse bemerke ich, daß mir beim Arbeiten an dem hier behandelten Probleme mein Freund und Kollege M. Besso treu zur Seite stand und daß ich demselben manche wertvolle Anregung verdanke. (`In conclusion I wish to say that in working at the problem here dealt with I have had the loyal assistance of my friend and colleague M. Besso, and that I am indebted to him for several valuable suggestions.’) The two remained close until they died a month apart fifty years later.

While reading through Einstein’s work I could not help tinkering with his electrodynamical equations and comparing them to the modern equations of classical and relativistic electrodynamics. I want to record some of my jottings about this in the present note, particularly in relation to two key sections of his paper:

§3 Theorie der Koordinaten- und Zeittransformation… (Theory of the Transformation of Coordinates and Times…)

and

§6 Transformation der Maxwell-Hertzschen Gleichungen für den leeren Raum. (Transformation of the Maxwell-Hertz Equations for Empty Space.)

What Einstein envisages throughout these two sections is a pair of inertial reference frames K and k which are in what we now call `standard configuration’. That is, frame k is moving in the positive x-direction relative to frame K with speed v such that all the Cartesian coordinate axes in the two frames remain parallel and such that the origins and axes of the Cartesian coordinate systems in K and k coincide perfectly at some initial time point. Under these conditions the coordinates of an event using the coordinate system in the k-frame can be expressed in terms of the coordinates of the same event using the coordinate system in the K-frame by a set of transformation equations which Einstein derives in §3 of his paper, and which are now known as the Lorentz transformation equations. Einstein expresses these equations as follows on page 902 of the published paper:

\tau = \beta \big(t - \frac{v}{V^2} x\big)

\xi = \beta (x - vt)

\eta = y

\zeta = z

\beta = \frac{1}{\sqrt{1 - \big(\frac{v}{V}\big)^2}}

(`It follows from this relation and the one previously found that \phi(v) = 1, so that the transformation equations which have been found become: \cdots.’) Note that Einstein uses the symbol V to denote the speed of light whereas we now use the symbol c.

Shortly after Einstein published this in 1905, Hermann Minkowski (1864-1909) realised that the special theory of relativity could be better understood by positing the existence of a four-dimensional spacetime. In Minkowski spacetime we have four-vectors consisting of a time coordinate, ct, and three spatial coordinates, x, y and z. Note that the time coordinate takes the form ct in order to make it have the same units as the spatial coordinates (i.e., units of length). The full coordinate transformations between Einstein’s two inertial frames K and k in standard configuration would be obtained in Minkowski spacetime as

\begin{bmatrix} ct^{\prime}\\ \ \\x^{\prime} \\ \ \\ y^{\prime} \\ \ \\ z^{\prime} \end{bmatrix} = \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \  0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \  1 & \  \ 0 \\ \ \\ 0 & 0 & \  0 & \  \  1 \end{bmatrix} \begin{bmatrix} ct \\ \ \\x \\ \ \\ y \\ \ \\ z \end{bmatrix}

where, using the same notation as Einstein,

\beta = \frac{1}{\sqrt{1 - \big(\frac{v}{c}\big)^2}}

The coefficient matrix is actually a rank-2 tensor of type (1, 1) usually denoted by \Lambda^{\mu}_{\hphantom{\mu} \nu}. It is the application of this matrix to a four-vector in the K-frame that would produce the required Lorentz transformation to a corresponding (primed) four-vector in the k-frame when K and k are in standard configuration.
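For readers who like to compute, here is a small numpy sketch (with an arbitrary illustrative speed v = 0.6c and event coordinates of my own choosing) that builds this matrix and confirms that it leaves the interval (ct)^2 - x^2 - y^2 - z^2 unchanged:

import numpy as np

c = 299_792_458.0                # speed of light in m/s
v = 0.6 * c                      # illustrative relative speed of frame k
beta = 1.0 / np.sqrt(1.0 - (v / c)**2)

# Coefficient matrix for frames in standard configuration:
Lam = np.array([[ beta,         -beta * v / c, 0.0, 0.0],
                [-beta * v / c,  beta,         0.0, 0.0],
                [ 0.0,           0.0,          1.0, 0.0],
                [ 0.0,           0.0,          0.0, 1.0]])

event_K = np.array([c * 1.0, 2.0e8, 3.0e7, 0.0])   # (ct, x, y, z) in frame K
event_k = Lam @ event_K                            # same event in frame k

# The spacetime interval is preserved:
eta = np.diag([1.0, -1.0, -1.0, -1.0])
print(event_K @ eta @ event_K, event_k @ eta @ event_k)   # equal up to rounding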

At the start of §6 on page 907 of the published paper Einstein writes the following:

\frac{1}{V} \frac{\partial X}{\partial t} = \frac{\partial N}{\partial y} - \frac{\partial M}{\partial z}, \qquad \frac{1}{V} \frac{\partial L}{\partial t} = \frac{\partial Y}{\partial z} - \frac{\partial Z}{\partial y}

\frac{1}{V} \frac{\partial Y}{\partial t} = \frac{\partial L}{\partial z} - \frac{\partial N}{\partial x}, \qquad \frac{1}{V} \frac{\partial M}{\partial t} = \frac{\partial Z}{\partial x} - \frac{\partial X}{\partial z}

\frac{1}{V} \frac{\partial Z}{\partial t} = \frac{\partial M}{\partial x} - \frac{\partial L}{\partial y}, \qquad \frac{1}{V} \frac{\partial N}{\partial t} = \frac{\partial X}{\partial y} - \frac{\partial Y}{\partial x}

(‘Let the Maxwell-Hertz equations for empty space hold for the stationary system K, so that we have: \ldots where (X, Y, Z) denotes the vector of the electric force, and (L, M, N) that of the magnetic force.’) The key point that Einstein is trying to make in §6 is that when the coordinate transformations he derived in §3 are applied to electromagnetic processes satisfying the above Maxwell-Hertz equations, it is found that the vectors (X, Y, Z) and (L, M, N) themselves satisfy transformation equations of the form

X^{\prime} = X, \qquad L^{\prime} = L

Y^{\prime} = \beta \big(Y - \frac{v}{V} N\big), \qquad M^{\prime} = \beta \big(M + \frac{v}{V} Z\big)

Z^{\prime} = \beta \big(Z + \frac{v}{V} M\big), \qquad N^{\prime} = \beta \big(N - \frac{v}{V} Y\big)

(page 909 of Einstein’s published paper). Here the primed letters are the components of the vectors with respect to the coordinate system in the k-frame and the unprimed letters are the components with respect to the coordinate system in the K-frame. These equations show that the components of (X, Y, Z) and (L, M, N) do not all remain unchanged when we switch from one inertial reference frame to another. In the standard configuration scenario, only the X and L components remain unchanged. In contrast, looking at the Y^{\prime} equation, for example, we see that an event which from the point of view of the k-frame would be regarded as being due solely to an `electric force’ Y^{\prime} would be regarded from the point of view of the K-frame as being due to a combination of an electric force Y and a magnetic force N. Thus, Einstein writes on page 910 that die elektrischen und magnetischen Kräfte keine von dem Bewegungszustande des Koordinatensystems unabhängige Existenz besitzen. (`electric and magnetic forces do not exist independently of the state of motion of the system of coordinates.’)

Not surprisingly, the terminology and notation that Einstein uses in 1905 seem rather archaic and obscure from a modern perspective, and I could not help jotting down modern interpretations of what he was saying as I was reading his paper. To begin with, the `Maxwell-Hertz equations for empty space’ that Einstein refers to at the start can be viewed as arising from a simple scenario in which there is a changing charge and current distribution within a certain region of space, but we consider the fields produced by this source only in the free space outside that region. There the charge and current densities (denoted by \rho and \vec{J} respectively) are everywhere zero, and the differential form of the Maxwell equations in SI units reduces to

\nabla \cdot \vec{E} = 0

\nabla \cdot \vec{B} = 0

\mu_0 \epsilon_0 \frac{\partial \vec{E}}{\partial t} = \nabla \times \vec{B}

\frac{\partial \vec{B}}{\partial t} = - \nabla \times \vec{E}

where the parameters \epsilon_0 and \mu_0 and the speed of light c are related by the equation

c^2 = \frac{1}{\mu_0 \epsilon_0}

The vector \vec{E} = (E_x, E_y, E_z) denotes the electric field, which is defined at any given point as the electric force per unit charge on an infinitesimally small positive test charge placed at that point. This corresponds to the vector (X, Y, Z) in Einstein’s paper, which he called den Vektor der elektrischen Kraft (‘the vector of the electric force’). The vector \vec{B} = (B_x, B_y, B_z) denotes the magnetic field and corresponds to the vector (L, M, N) in Einstein’s paper. A moving charge creates a magnetic field in all of space and this magnetic field exerts a force on any other moving charge. (Charges at rest do not produce magnetic fields, nor do magnetic fields exert forces on them.)

In his paper Einstein actually employs an alternative formulation of the above Maxwell equations in which quantities are measured in Gaussian units rather than SI units. This involves defining

\epsilon_0 = \frac{1}{4 \pi c}

(and therefore \mu_0 = \frac{4 \pi}{c}) and also rescaling the electric field vector as

\vec{E} = c^{-1} \vec{E}_{SI}

where \vec{E}_{SI} denotes the electric field vector expressed in SI units. When these changes are substituted into the above Maxwell equations, the equations become

\nabla \cdot \vec{E} = 0

\nabla \cdot \vec{B} = 0

\frac{1}{c} \frac{\partial \vec{E}}{\partial t} = \nabla \times \vec{B}

\frac{1}{c} \frac{\partial \vec{B}}{\partial t} = - \nabla \times \vec{E}

The last two of these are the equations appearing at the start of §6 of Einstein’s paper, with V \equiv c there. Incidentally, these are also the equations that are responsible for the emergence of electromagnetic waves, as can easily be demonstrated by taking the curl of both sides of the equations to get

\frac{1}{c} \frac{\partial (\nabla \times \vec{E})}{\partial t} = \nabla \times (\nabla \times \vec{B})

\frac{1}{c} \frac{\partial (\nabla \times \vec{B})}{\partial t} = - \nabla \times (\nabla \times \vec{E})

On the left-hand sides I have moved the curl operator inside the partial time derivative, which is permissible since the curl involves only spatial derivatives. We can then replace the curl inside each partial derivative by the corresponding term in Maxwell’s equations. Using a standard identity of vector calculus, the right-hand sides become

\nabla \times (\nabla \times \vec{B}) = \nabla (\nabla \cdot \vec{B}) - \nabla^{2} \vec{B}

\nabla \times (\nabla \times \vec{E}) = \nabla (\nabla \cdot \vec{E}) - \nabla^{2} \vec{E}

Since the divergence terms are zero in free space we are left only with the Laplacian terms on the right-hand sides. Putting the left and right-hand sides together we get the electromagnetic wave equations

\nabla^2 \vec{B} = \frac{1}{c^2} \frac{\partial^2 \vec{B}}{\partial t^2}

\nabla^2 \vec{E} = \frac{1}{c^2} \frac{\partial^2 \vec{E}}{\partial t^2}
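As a simple check of these wave equations, the following sympy snippet verifies that a plane wave E_0 \sin(k(x - ct)) (a toy field component of my own choosing) satisfies them:

import sympy as sp

x, t, c, E0, k = sp.symbols('x t c E_0 k', positive=True)

E = E0 * sp.sin(k * (x - c * t))     # one Cartesian component, propagating in x

# For a field depending only on x, the Laplacian reduces to d^2/dx^2:
resid = sp.diff(E, x, 2) - sp.diff(E, t, 2) / c**2
print(sp.simplify(resid))            # 0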

With regard to Einstein’s demonstration that the electric and magnetic fields get mixed together under a Lorentz transformation, we would nowadays show this via a Lorentz transformation of a rank-2 electromagnetic field tensor of the form

F^{\mu \nu} = \begin{bmatrix} 0 & \frac{1}{c} E_x & \frac{1}{c} E_y & \frac{1}{c} E_z \\ \ \\ -\frac{1}{c} E_x & 0 & B_z & -B_y \\ \ \\-\frac{1}{c} E_y & -B_z & 0 & B_x \\ \ \\ -\frac{1}{c}E_z & B_y & -B_x & 0 \end{bmatrix}

Note that we are using SI units here, which is why the coefficient \frac{1}{c} appears in front of the electric field components. This coefficient would disappear if we were using Gaussian units. To apply the Lorentz transformation to the electromagnetic field tensor we observe that since F^{\mu \nu} is a rank-2 tensor of type (2, 0) it transforms according to

F^{\prime \  \alpha \beta} = \frac{\partial x^{\prime \alpha}}{\partial x^{\mu}} \frac{\partial x^{\prime \beta}}{\partial x^{\nu}} F^{\mu \nu}

Since

\Lambda^{\alpha}_{\hphantom{\alpha} \mu} \equiv \frac{\partial x^{\prime \alpha}}{\partial x^{\mu}}

we obtain the required Lorentz transformation of the electromagnetic field tensor as

F^{\prime \  \alpha \beta} = \Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}

But \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu} contracts the second index of \Lambda with the second (column) index of F, which is the same as the matrix multiplication

[F^{\mu \nu}] [\Lambda^{\beta}_{\hphantom{\beta} \nu}]^T

where the T denotes the matrix transpose. (Note that the transformation matrix for frames in standard configuration is symmetric, so it is unaffected by transposition). Applying \Lambda^{\alpha}_{\hphantom{\alpha} \mu} on the left then amounts to multiplying the above matrix product on the left by the matrix [\Lambda^{\alpha}_{\hphantom{\alpha} \mu}]. Thus, we are able to compute the required Lorentz transformation of the electromagnetic field tensor as a matrix product. We get

F^{\prime \  \alpha \beta} = \Lambda^{\alpha}_{\hphantom{\alpha} \mu} \Lambda^{\beta}_{\hphantom{\beta} \nu} F^{\mu \nu}

=  [\Lambda^{\alpha}_{\hphantom{\alpha} \mu}] [F^{\mu \nu}] [\Lambda^{\beta}_{\hphantom{\beta} \nu}]^T

= \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \  0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \  1 & \  \ 0 \\ \ \\ 0 & 0 & \  0 & \  \  1 \end{bmatrix} \begin{bmatrix} 0 & \frac{1}{c} E_x & \frac{1}{c} E_y & \frac{1}{c} E_z \\ \ \\ -\frac{1}{c} E_x & 0 & B_z & -B_y \\ \ \\-\frac{1}{c} E_y & -B_z & 0 & B_x \\ \ \\ -\frac{1}{c}E_z & B_y & -B_x & 0 \end{bmatrix} \begin{bmatrix} \beta & -\beta \big(\frac{v}{c}\big) & 0 & \  0\\ \ \\ -\beta \big(\frac{v}{c}\big) & \beta & 0 & \ 0 \\ \ \\0 & 0 & \  1 & \  \ 0 \\ \ \\ 0 & 0 & \  0 & \  \  1 \end{bmatrix}

= \begin{bmatrix} 0 & \frac{E_x}{c} & \beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg) & \beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg)\\ \ \\ -\frac{E_x}{c} & 0 & \beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg) & -\beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg) \\ \ \\ -\beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg) & -\beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg) & 0 & B_x \\ \ \\ -\beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg) & \beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg) & -B_x & 0 \end{bmatrix}

\equiv \begin{bmatrix} 0 & \frac{E_x}{c}^{\prime} & \frac{E_y}{c}^{\prime} & \frac{E_z}{c}^{\prime} \\ \ \\ -\frac{E_x}{c}^{\prime} & 0 & B_z^{\prime} & -B_y^{\prime} \\ \ \\-\frac{E_y}{c}^{\prime} & -B_z^{\prime} & 0 & B_x^{\prime} \\ \ \\ -\frac{E_z}{c}^{\prime} & B_y^{\prime} & -B_x^{\prime} & 0 \end{bmatrix}

Comparing the entries in the last two matrices we get exactly the same relations as Einstein did on page 909 of his paper, except that we are using SI units here so the electric field components are divided by c:

\frac{E_x}{c}^{\prime} = \frac{E_x}{c}

\frac{E_y}{c}^{\prime} = \beta \bigg(\frac{E_y}{c} - \big(\frac{v}{c} \big) B_z \bigg)

\frac{E_z}{c}^{\prime} = \beta \bigg(\frac{E_z}{c} + \big(\frac{v}{c} \big) B_y \bigg)

B_x^{\prime} = B_x

B_y^{\prime} = \beta \bigg(B_y + \big(\frac{v}{c} \big) \frac{E_z}{c} \bigg)

B_z^{\prime} = \beta \bigg(B_z - \big(\frac{v}{c} \big) \frac{E_y}{c} \bigg)
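Finally, the matrix computation above is easy to reproduce numerically. The following numpy sketch (with arbitrary illustrative field values and v = 0.6c, working in units where c = 1) forms \Lambda F \Lambda^T and checks a couple of its entries against the component formulas just listed:

import numpy as np

c = 1.0                          # work in units with c = 1 for simplicity
v = 0.6
beta = 1.0 / np.sqrt(1.0 - (v / c)**2)

# Illustrative field components (arbitrary values):
Ex, Ey, Ez = 1.0, 2.0, 3.0
Bx, By, Bz = 0.5, -1.5, 2.5

F = np.array([[ 0.0,   Ex/c,  Ey/c,  Ez/c],
              [-Ex/c,  0.0,   Bz,   -By ],
              [-Ey/c, -Bz,    0.0,   Bx ],
              [-Ez/c,  By,   -Bx,    0.0]])

Lam = np.array([[ beta,      -beta*v/c, 0.0, 0.0],
                [-beta*v/c,   beta,     0.0, 0.0],
                [ 0.0,        0.0,      1.0, 0.0],
                [ 0.0,        0.0,      0.0, 1.0]])

Fp = Lam @ F @ Lam.T             # F' = Lambda F Lambda^T

# Compare two entries with the component formulas above:
print(np.isclose(Fp[0, 2], beta * (Ey/c - (v/c) * Bz)))    # True: the E_y'/c entry
print(np.isclose(Fp[1, 3], -beta * (By + (v/c) * Ez/c)))   # True: the -B_y' entry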