Illustrating the correspondence between 1-forms and vectors using de Broglie waves

I was asked by a student to clarify the issues surrounding an exercise in the famous book Gravitation written by Misner, Thorne and Wheeler (MTW). The exercise appears as follows in Chapter 2, Section 5:


The student found a discussion about this problem in Physics Stack Exchange but remained confused as the exercise and the surrounding issues were not dealt with clearly enough in the posts there. I could not find anything else about this online so I tried to clarify the situation a bit more myself. I want to record my thoughts about this in the present note. (The reader should try to solve Exercise 2.1 above for himself/herself at this point, before continuing to read my discussion below).

The key point of this section is that Equation (2.14), the defining equation of 1-forms, can be shown to be physically valid (as well as being just a mathematical definition) using de Broglie waves in quantum mechanics. The notation in MTW is not ideal, so we will replace the notation \langle \mathbf{\tilde{p}}, \mathbf{v} \rangle for a 1-form evaluated at a vector \mathbf{v} by the notation \mathbf{\tilde{p}}(\mathbf{v}). What MTW are then saying is that given any vector \mathbf{p} we can define a corresponding 1-form as

\mathbf{\tilde{p}} = \langle \mathbf{p}, \ \rangle

which is to be viewed as a function waiting for a vector input (to be placed in the empty space on the right-hand side of the angle brackets). When the vector input \mathbf{v} is supplied, the 1-form will then yield the number

\mathbf{\tilde{p}}(\mathbf{v}) = \langle \mathbf{p}, \mathbf{v} \rangle = \mathbf{p} \cdot \mathbf{v}

In Exercise 2.1 we are asked to verify the validity of this equation using the de Broglie wave

\psi = e^{i \phi} = \exp[i(\mathbf{k}\cdot \mathbf{x} - \omega t)]

The phase is the angular argument \phi = \mathbf{k}\cdot \mathbf{x} - \omega t which specifies the position of the wave from some starting point. The phase is parameterised by the wave vector \mathbf{k} which is such that |\mathbf{k}| = 2 \pi/\lambda where \lambda is the wavelength, and by the angular frequency \omega = 2 \pi f where f is the frequency of the relevant oscillator.

It is a well known fact (and it is easy to verify) that given any real-valued function of a vector \phi(\mathbf{x}), the gradient vector \partial \phi/\partial \mathbf{x} is orthogonal to the level surfaces of \phi. In the case of the phase of a de Broglie wave we have

\frac{\partial \phi}{\partial \mathbf{x}} = \mathbf{k}

so the wave vector is the (position) gradient vector of the phase \phi and therefore \mathbf{k} must be orthogonal to loci of constant phase.

In the case of circular waves, for example, these loci of constant phase are circles with centre at the source of the waves and the wave vectors \mathbf{k} point radially outwards at right angles to them, as indicated in the diagram.

To get a diagrammatic understanding of the relationship between 1-forms and vectors, we can imagine focusing on a very small neighbourhood around some point located among these loci of constant phase. On this very small scale, the loci of constant phase will look flat rather than circular, but the wave vectors \mathbf{k} will still be orthogonal to them. What we do is interpret this local pattern of (flat) surfaces of constant phase as the 1-form \mathbf{\tilde{k}}. This 1-form corresponding to the wave vector \mathbf{k} is

\mathbf{\tilde{k}} = \langle \mathbf{k}, \ \rangle

and as before we interpret this as a function waiting for a vector input. When it receives a vector input, say \mathbf{v}, it will output a number computed as the scalar product of \mathbf{k} and \mathbf{v}. Thus we can write

\mathbf{\tilde{k}}(\mathbf{v}) = \langle \mathbf{k}, \mathbf{v} \rangle = \mathbf{k} \cdot \mathbf{v}

As indicated in the diagram, the vector \mathbf{v} which we supply to \mathbf{\tilde{k}} will be at an angle to the wave vector \mathbf{k}. If the vector \mathbf{v} is parallel to the loci of constant phase then \mathbf{\tilde{k}}(\mathbf{v}) = 0 because \mathbf{k} and \mathbf{v} will be orthogonal. In the language of 1-forms, this would be interpreted by saying that the vector \mathbf{v} will not pierce the 1-form \mathbf{\tilde{k}} because it will not cross any of the loci of constant phase. Conversely, if the vector \mathbf{v} is parallel to the wave vector \mathbf{k} (orthogonal to the loci of constant phase), we would say that \mathbf{v} will pierce the 1-form \mathbf{\tilde{k}} as much as possible, because it will cross as many loci of constant phase as it possibly can. Between these extremes we will get intermediate values of the 1-form \mathbf{\tilde{k}}(\mathbf{v}). The key idea, then, is that the set of loci of constant phase in the neighbourhood of a point is the diagrammatic representation of the 1-form \mathbf{\tilde{k}}. When we feed a vector \mathbf{v} into this 1-form we get a measure \mathbf{\tilde{k}}(\mathbf{v}) of how many loci of constant phase the vector pierces. This is the language being used by MTW in the prelude to Exercise 2.1 above.
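This piercing picture is easy to check numerically. The following Python sketch (with arbitrarily chosen values for \mathbf{k}, \mathbf{x} and \mathbf{v}, purely for illustration) confirms that displacing the base point by \mathbf{v} changes the phase \phi = \mathbf{k} \cdot \mathbf{x} (at fixed t) by exactly \mathbf{k} \cdot \mathbf{v} = \mathbf{\tilde{k}}(\mathbf{v}), and that a vector lying in a surface of constant phase pierces nothing:

```python
import math

# Illustrative values (my own choices, not from MTW)
k = (3.0, 4.0, 0.0)           # wave vector
x = (0.2, -1.0, 0.5)          # base point
v = (1.0, 2.0, -1.0)          # displacement vector supplied to the 1-form

def dot(u, w):
    """Euclidean scalar product of two 3-tuples."""
    return sum(a * b for a, b in zip(u, w))

# Phase change along v equals k . v, i.e. the 1-form k~ evaluated at v
phase_change = dot(k, tuple(xi + vi for xi, vi in zip(x, v))) - dot(k, x)
assert abs(phase_change - dot(k, v)) < 1e-9

# A vector orthogonal to k lies in a surface of constant phase: no piercing
v_parallel = (4.0, -3.0, 0.0)
assert abs(dot(k, v_parallel)) < 1e-9

print(phase_change / (2 * math.pi))   # constant-phase surfaces pierced
```

Dividing the phase change by 2\pi gives the number of constant-phase surfaces crossed, which is exactly the diagrammatic "piercing" count described by MTW.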

To actually solve Exercise 2.1, begin by recalling from quantum mechanics that a photon’s momentum \mathbf{p} is such that |\mathbf{p}| = E/c where E = hf is the photon’s energy and f is the frequency of the oscillator. Since \lambda f = c where \lambda is the photon’s wavelength, we have E = hc/\lambda so the magnitude of the photon’s momentum is

|\mathbf{p}| = \frac{E}{c} = \frac{h}{\lambda} = \hbar \frac{2\pi}{\lambda} = \hbar |\mathbf{k}|

and in fact

\mathbf{p} = \hbar \mathbf{k}

Note that therefore

\lambda = \frac{h}{|\mathbf{p}|}

Famously, de Broglie’s idea in his 1924 PhD thesis was that this wavelength formula applies not just to photons but also to massive particles such as electrons, for which the momentum \mathbf{p} would be calculated as

\mathbf{p} = m \mathbf{u}

where m is the mass of the particle and \mathbf{u} is its four-velocity in Minkowski spacetime. Note that this four-velocity is such that \mathbf{u}\cdot\mathbf{u} = -1 (easily demonstrated using the (-+++) metric signature of Minkowski spacetime).

Thus we have

\mathbf{p} = m \mathbf{u} = \hbar \mathbf{k}


\mathbf{u} = \frac{\hbar}{m} \mathbf{k}

In the prelude to Exercise 2.1, MTW say

relabel the surfaces of \mathbf{\tilde{k}} by \hbar \times phase, thereby obtaining the momentum 1-form \mathbf{\tilde{p}}. Pierce this 1-form with any vector \mathbf{v}, and find the result that \mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v}).

Following the authors’ instructions, we relabel the surfaces of \mathbf{\tilde{k}} (i.e., the loci of constant phase) by multiplying by \hbar to get the 1-form

\mathbf{\tilde{p}} = \hbar \mathbf{\tilde{k}} = \hbar \langle \mathbf{k}, \ \rangle

As usual, this 1-form is a linear function waiting for a vector input. Supplying the input \mathbf{v} we then get

\mathbf{\tilde{p}}(\mathbf{v}) = \hbar \langle \mathbf{k}, \mathbf{v} \rangle = \hbar \mathbf{k} \cdot \mathbf{v}

But this is exactly what we get when we work out \mathbf{p} \cdot \mathbf{v} since

\mathbf{p} \cdot \mathbf{v} = m \mathbf{u} \cdot \mathbf{v} = m \frac{\hbar}{m} \mathbf{k} \cdot \mathbf{v} = \hbar \mathbf{k} \cdot \mathbf{v}

Thus, we have solved Exercise 2.1 by showing that \mathbf{p} \cdot \mathbf{v} = \mathbf{\tilde{p}}(\mathbf{v}) is in accord with the quantum mechanical properties of de Broglie waves, as claimed by MTW.

Invariance under rotations in space and conservation of angular momentum

In a previous note I studied in detail the mathematical setup of Noether’s Theorem and its proof. I briefly illustrated the mathematical machinery by considering invariance under translations in time, giving the law of conservation of energy, and invariance under translations in space, giving the law of conservation of linear momentum. I briefly mentioned that invariance under rotations in space would also yield the law of conservation of angular momentum but I did not work this out explicitly. I want to quickly do this in the present note.

We imagine a particle of unit mass moving freely in the absence of any potential field, and tracing out a path \gamma(t) in the (x, y)-plane of a three-dimensional Euclidean coordinate system between times t_1 and t_2, with the z-coordinate everywhere zero along this path. The angular momentum of the particle at time t with respect to the origin of the coordinate system is given by

\mathbf{L} = \mathbf{r} \times \mathbf{v}

= (\mathbf{i} x + \mathbf{j} y) \times (\mathbf{i} \dot{x} + \mathbf{j} \dot{y})

= \mathbf{k} x \dot{y} - \mathbf{k} y \dot{x}

= \mathbf{k} (x \dot{y} - y \dot{x})

where \times is the vector product operation. Alternatively, we could have obtained this as

\mathbf{L} = \mathbf{r} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x & y & 0 \\ \dot{x} & \dot{y} & 0 \end{vmatrix}

= \mathbf{k} (x \dot{y} - y \dot{x})

In terms of Lagrangian mechanics, the path \gamma(t) followed by the particle will be a stationary path of the action functional

S[\gamma(t)] = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

(in the absence of a potential field the total energy consists only of kinetic energy).

Now imagine that the entire path \gamma(t) is rotated bodily anticlockwise in the (x, y)-plane through an angle \theta. This corresponds to a one-parameter transformation

\overline{t} \equiv \Phi(t, x, y, \dot{x}, \dot{y}; \theta) = t

\overline{x} \equiv \Psi_1(t, x, y, \dot{x}, \dot{y}; \theta) = x \cos \theta - y \sin \theta

\overline{y} \equiv \Psi_2(t, x, y, \dot{x}, \dot{y}; \theta) = x \sin \theta + y \cos \theta

which reduces to the identity when \theta = 0. We have

d\overline{t} = dt

\dot{\overline{x}}^2 = \dot{x}^2 \cos^2 \theta + \dot{y}^2 \sin^2 \theta - 2 \dot{x} \dot{y} \sin \theta \cos \theta

\dot{\overline{y}}^2 = \dot{x}^2 \sin^2 \theta + \dot{y}^2 \cos^2 \theta + 2 \dot{x} \dot{y} \sin \theta \cos \theta

and therefore

\dot{x}^2 + \dot{y}^2 = \dot{\overline{x}}^2 + \dot{\overline{y}}^2

so the action functional is invariant under this rotation since

S[\overline{\gamma}(t)] = \int_{t_1}^{t_2} d\overline{t} \frac{1}{2}(\dot{\overline{x}}^2 + \dot{\overline{y}}^2) = \int_{t_1}^{t_2} dt \frac{1}{2}(\dot{x}^2 + \dot{y}^2) = S[\gamma(t)]

Therefore Noether’s theorem applies. Let

F(t, x, y, \dot{x}, \dot{y}) = \frac{1}{2}(\dot{x}^2 + \dot{y}^2)

Then Noether’s theorem in this case says

\frac{\partial F}{\partial \dot{x}} \psi_1 + \frac{\partial F}{\partial \dot{y}} \psi_2 + \big(F - \frac{\partial F}{\partial \dot{x}} \dot{x} - \frac{\partial F}{\partial \dot{y}} \dot{y}\big) \phi = const.

where

\phi \equiv \frac{\partial \Phi}{\partial \theta} \big|_{\theta = 0} = 0

\psi_1 \equiv \frac{\partial \Psi_1}{\partial \theta} \big|_{\theta = 0} = -y

\psi_2 \equiv \frac{\partial \Psi_2}{\partial \theta} \big|_{\theta = 0} = x

We have

\frac{\partial F}{\partial \dot{x}} = \dot{x}

\frac{\partial F}{\partial \dot{y}} = \dot{y}

Therefore Noether’s theorem gives us (remembering \phi = 0)

-\dot{x} y + \dot{y} x = const.

The expression on the left-hand side of this equation is the angular momentum of the particle (cf. the brief discussion of angular momentum at the start of this note), so this result is precisely the statement that the angular momentum is conserved. Noether’s theorem shows us that this is a direct consequence of the invariance of the action functional of the particle under rotations in space.
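As a quick numerical sanity check (a minimal Python sketch with arbitrary initial conditions of my own choosing), we can confirm that x \dot{y} - y \dot{x} stays constant along a straight free-particle path:

```python
# Free particle of unit mass in the plane: x(t) = x0 + vx*t, y(t) = y0 + vy*t.
# Check that the angular momentum x*ydot - y*xdot is constant along the motion.
x0, y0, vx, vy = 1.0, -2.0, 0.3, 0.7   # arbitrary illustrative values

def angular_momentum(t):
    x, y = x0 + vx * t, y0 + vy * t
    return x * vy - y * vx

values = [angular_momentum(t) for t in (0.0, 1.0, 2.5, 10.0)]
assert max(values) - min(values) < 1e-12   # conserved to rounding error
print(values[0])   # equals x0*vy - y0*vx
```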

A mathematical formulation of Feynman’s ‘mirage on a hot road’

In his famous Feynman Lectures on Physics, Richard Feynman provided an intuitive explanation of how a ‘mirage on a hot road’ can arise due to the bending of light rays from the sky in accordance with Fermat’s Principle (see The Feynman Lectures on Physics, Volume I, Chapter 26). Feynman wrote the following:

[Feynman’s explanation and Fig. 26-8, from The Feynman Lectures on Physics, Vol. I, Ch. 26, quoted here]

I was discussing this with a beginning engineering student who did not quite understand why the mirage makes it look as if the water is actually on the road. I explained this by augmenting Feynman’s Fig. 26-8 above as follows:

The bent light ray starting at point A and entering the observer’s eye at point B is interpreted by the observer as having followed a straight line path emanating from the road, as indicated in the diagram. Thus, the observer sees the image of the sky on the road surface and interprets it as a shimmering pool of water.

Having done this, the question then arose as to how one could go about constructing an explicit mathematical model of the above scenario, yielding a suitable equation for the curved light ray from A to B, a linear equation for the apparent straight line path seen by the observer, and explicit coordinates for the point on the road where the image of the sky is seen by the observer. This turned out to be an interesting exercise involving Fermat’s Principle and the Calculus of Variations and is what I want to record here.

Suppose the light ray begins at point A = (a, b) at time t_1, and enters the observer’s eye at point B = (-a, b) at time t_2. Fermat’s Principle (see, e.g., the Wikipedia article on Fermat’s principle) says that the path followed by the light ray is such as to make the optical length functional

S[y] = \int_A^B n ds

stationary, where n = c/v is the refractive index of the medium through which the light passes, c is the speed of light in a vacuum and v = ds/dt is the speed of light in the medium. This functional can be derived (up to a multiplicative constant) from the ‘Principle of Least Time’ by noting that the time taken by the light ray is

T = \int_{t_1}^{t_2} dt = \int_{t_1}^{t_2} \frac{1}{c} \frac{c}{v} \frac{ds}{dt} dt = \int_A^B \frac{n}{c} ds = \frac{1}{c} S

The light ray will find the path that minimises this time of travel.

To apply this setup to the mirage in Feynman’s lecture we need to model the refractive index as a function of the y-coordinate in my amended diagram above, which measures the height above the road. As Feynman says, light goes faster in the hot region near the road than in the cooler region higher up. Thus, since the refractive index is inversely proportional to v, it should be an increasing function of the height above the road y. To get a toy model for the scenario in Feynman’s lecture let us make the simplest possible assumption that the refractive index is a simple linear function of y, namely

n(y) = \alpha + \beta y

with \alpha and \beta both positive. Then since the arc-length element is

ds = dx \sqrt{1 + y^{\prime \ 2}}

we can write the optical length functional as

S[y] = \int_A^B n ds = \int_{a}^{-a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = -\int_{-a}^{a} dx (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

We find the stationary path for this functional using the Calculus of Variations. Let

F(x, y, y^{\prime}) = (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}}

Since this does not depend directly on x, the problem admits a first-integral of the form

y^{\prime} \frac{\partial F}{\partial y^{\prime}} - F = C

where C is a constant. We have

\frac{\partial F}{\partial y^{\prime}} = \frac{(\alpha + \beta y)y^{\prime}}{\sqrt{1 + y^{\prime \ 2}}}

Therefore the first-integral for this problem is

\frac{(\alpha + \beta y)y^{\prime \ 2}}{\sqrt{1 + y^{\prime \ 2}}} - (\alpha + \beta y)\sqrt{1 + y^{\prime \ 2}} = C

Multiplying through by \sqrt{1 + y^{\prime \ 2}}/ \alpha, absorbing \alpha into the constant term, and writing \delta \equiv \beta/\alpha we get

(1 + \delta y) y^{\prime \ 2} - (1 + \delta y)(1 + y^{\prime \ 2}) = C\sqrt{1 + y^{\prime \ 2}}


-(1 + \delta y) = C\sqrt{1 + y^{\prime \ 2}}


y^{\prime} = \frac{\pm \sqrt{(1+\delta y)^2 - C^2}}{C}

This is a first-order differential equation for y which can be solved by separation of variables. We get the integral equation

\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \pm \int \frac{dx}{C}

To solve the integral on the left-hand side, make the change of variable

(1 + \delta y) = C \sec \theta


\delta dy = C \sec \theta \ \tan \theta \ d \theta


\int \frac{dy}{\sqrt{(1+\delta y)^2 - C^2}} = \int \frac{C \sec \theta \ \tan \theta \ d \theta}{\delta \sqrt{C^2 \sec^2 \theta - C^2}}

= \frac{1}{\delta}\int \tan \theta \ d \theta

= \frac{1}{\delta} \ln[\sec \theta] + const.

= \frac{1}{\delta} \ln \big[\frac{(1 + \delta y)}{C}\big] + const.

For the integral on the right-hand side of the integral equation we get

\pm \int \frac{dx}{C} = \pm \frac{x}{C} + const.

Therefore the integral equation reduces to

\frac{1}{\delta} \ln \big[\frac{(1 + \delta y)}{C}\big] = \pm \frac{x}{C} + const.


y = \frac{C\exp\big(\pm\frac{\delta x}{C} + const.\big) - 1}{\delta}

This seems to represent two possible solutions of the first-integral equation, which we may write as

y_1 = \frac{C\exp\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}

y_2 = \frac{C\exp\big(- \big[ \frac{\delta x}{C} + const. \big] \big) - 1}{\delta}

However, for the curved light ray in my amended diagram above we must have y \rightarrow \infty as x \rightarrow \pm \infty. This condition is not satisfied by either of y_1 or y_2 on their own, but it is satisfied by their sum. We will therefore take the solution of the first integral equation to be

y = \frac{y_1 + y_2}{2}

= \frac{C}{\delta}\bigg[\frac{\exp\big(\frac{\delta x}{C} + const.\big) + \exp\big(- \big[ \frac{\delta x}{C} + const. \big] \big)}{2}\bigg] - \frac{1}{\delta}

= \frac{C \cosh\big(\frac{\delta x}{C} + const.\big) - 1}{\delta}

Furthermore, we have y(a) = y(-a) = b and therefore we require

\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(-\frac{\delta a}{C} + const. \big)


\cosh\big(\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \ \cosh(const.) + \sinh\big(\frac{\delta a}{C}\big) \ \sinh(const.)


\cosh\big(-\frac{\delta a}{C} + const. \big) = \cosh\big(\frac{\delta a}{C}\big) \ \cosh(const.) - \sinh\big(\frac{\delta a}{C}\big) \ \sinh(const.)

These can be equal only if \sinh(const.) = 0, which implies const. = 0. Thus, our solution for y reduces to

y = \frac{C \cosh\big(\frac{\delta x}{C}\big) - 1}{\delta}

with the constant C determined in terms of a and b by

b = \frac{C \cosh\big(\frac{\delta a}{C}\big) - 1}{\delta}

This is the equation of the curved path of the light ray from the sky in Feynman’s diagram. The slope of y at point B = (-a, b) is

y^{\prime}(-a) = -\sinh\big(\frac{\delta a}{C}\big)

The straight line with this gradient passing through the point B has equation

y = \big(b - a\sinh\big(\frac{\delta a}{C}\big)\big) - \sinh\big(\frac{\delta a}{C}\big)x

This is the equation of the straight line emanating from the x-axis to the observer’s eye in my amended diagram above. On the x-axis we have y = 0 in the straight-line equation so

x = \frac{b}{\sinh\big(\frac{\delta a}{C}\big)} - a

This is the point on the x-axis at which the observer in my amended diagram will see the mirage.
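To make this concrete, here is a small Python sketch that solves b = (C \cosh(\delta a/C) - 1)/\delta for C by bisection and then computes the mirage point, using illustrative values \delta = 0.1, a = 1, b = 5 that are my own choices rather than anything from the analysis above:

```python
import math

# Hypothetical toy parameters: n(y) proportional to (1 + delta*y)
delta, a, b = 0.1, 1.0, 5.0

def height(C, x):
    """Curved ray y(x) = (C*cosh(delta*x/C) - 1)/delta from the first integral."""
    return (C * math.cosh(delta * x / C) - 1.0) / delta

# Solve height(C, a) = b for C by bisection; on the bracket (0.5, 10)
# height is increasing in C, with height(0.5, a) < b < height(10, a).
lo, hi = 0.5, 10.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if height(mid, a) > b:
        hi = mid
    else:
        lo = mid
C = 0.5 * (lo + hi)

# The ray passes through A = (a, b) and B = (-a, b)
assert abs(height(C, a) - b) < 1e-9
assert abs(height(C, -a) - b) < 1e-9

# Apparent straight-line path seen at B meets the road (y = 0) at:
s = math.sinh(delta * a / C)
x_mirage = b / s - a
print(C, x_mirage)
```

With these values the mirage appears far down the road from the observer, consistent with the very shallow apparent line of sight.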

Simple variational setups yielding Newton’s Second Law and Schrödinger’s equation

It is a delightful fact that one can get both the fundamental equation of classical mechanics (Newton’s Second Law) and the fundamental equation of quantum mechanics (Schrödinger’s equation) by solving very simple variational problems based on the familiar conservation of mechanical energy equation

K + U = E

In the present note I want to briefly set these out emphasising the common underlying structure provided by the conservation of mechanical energy and the calculus of variations. The kinetic energy K will be taken to be

K = \frac{1}{2}m \dot{x}^2 = \frac{p^2}{2m}

where \dot{x} = \frac{\mathrm{d}x}{\mathrm{d}t} is the particle’s velocity, p = m\dot{x} is its momentum, and m is its mass. The potential energy U will be regarded as some function of x only.

To obtain Newton’s Second Law we find the stationary path followed by the particle with respect to the functional

S[x] = \int_{t_1}^{t_2} L(t, x, \dot{x}) dt  = \int_{t_1}^{t_2} (K - U) dt

The function L(t, x, \dot{x}) = K - U is usually termed the `Lagrangian’ in classical mechanics. The functional S[x] is usually called the `action’. The Euler-Lagrange equation for this calculus of variations problem is

\frac{\partial L}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\big(\frac{\partial L}{\partial \dot{x}}\big) = 0

and this is Newton’s Second Law in disguise! We have

\frac{\partial L}{\partial x} = -\frac{\mathrm{d}U}{\mathrm{d}x} \equiv F

\frac{\partial L}{\partial \dot{x}} = m\dot{x} \equiv p


\frac{\mathrm{d}}{\mathrm{d}t} \big(\frac{\partial L}{\partial \dot{x}}\big) = \frac{\mathrm{d}p}{\mathrm{d}t} = m\ddot{x} \equiv ma

so substituting these into the Euler-Lagrange equation we get Newton’s Second Law, F = ma.
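We can also see the stationarity directly by discretising the action. The following Python sketch (my own toy example with U(x) = mgx, i.e. uniform gravity, not something discussed above) compares the action of the classical free-fall path with perturbed paths sharing its endpoints; for this Lagrangian the stationary path is in fact a minimum:

```python
import math

# Discretised action S = sum (K - U) dt for a particle in U(x) = m*g*x.
m, g = 1.0, 9.8
N = 1000
dt = 1.0 / N
ts = [i * dt for i in range(N + 1)]

def action(path):
    """Midpoint-rule approximation of int (K - U) dt along a sampled path."""
    S = 0.0
    for i in range(N):
        v = (path[i + 1] - path[i]) / dt
        x_mid = 0.5 * (path[i] + path[i + 1])
        S += (0.5 * m * v * v - m * g * x_mid) * dt
    return S

# Classical path: free fall from rest, x(t) = -g t^2 / 2 (satisfies m xddot = -dU/dx)
classical = [-0.5 * g * t * t for t in ts]
S0 = action(classical)

# Perturbations eps*sin(pi t) vanish at the endpoints, as required
for eps in (0.1, -0.1, 0.01):
    perturbed = [x + eps * math.sin(math.pi * t) for x, t in zip(classical, ts)]
    assert action(perturbed) > S0   # classical action is smaller

print(S0)   # should be close to the exact value g**2 / 3
```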

To obtain Schrödinger’s equation we introduce a function

\psi(x) = \exp\big(\frac{1}{\hbar}\int p\mathrm{d}x\big)

where p = m \dot{x} is again the momentum of the particle and \hbar is the reduced Planck constant from quantum mechanics. (Note that \int p \mathrm{d}x has units of \text{length}^2 \ \text{mass} \ \text{time}^{-1}, so we need to remove these by dividing by \hbar, which has the same units; the exponent, and hence the function \psi(x) as defined here, is then dimensionless). We then have

\ln \psi = \frac{1}{\hbar}\int p\mathrm{d}x

and differentiating both sides gives

\frac{\psi^{\prime}}{\psi} = \frac{1}{\hbar} p


p^2 = \hbar^2 \big(\frac{\psi^{\prime}}{\psi}\big)^2

Therefore we can write the kinetic energy as

K = \frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2

and putting this into the conservation of mechanical energy equation gives

\frac{\hbar^2}{2m}\big(\frac{\psi^{\prime}}{\psi}\big)^2 + U = E


\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2 = 0

We now find the stationary path followed by the particle with respect to the functional

T[\psi] = \int_{-\infty}^{\infty} M(x, \psi, \psi^{\prime}) \mathrm{d}x = \int_{-\infty}^{\infty}  \big(\frac{\hbar^2}{2m} (\psi^{\prime})^2 + (U - E) \psi^2\big)\mathrm{d}x

The Euler-Lagrange equation for this calculus of variations problem is

\frac{\partial M}{\partial \psi} - \frac{\mathrm{d}}{\mathrm{d}x}\big(\frac{\partial M}{\partial \psi^{\prime}}\big) = 0

and this is Schrödinger’s equation in disguise! We have

\frac{\partial M}{\partial \psi} = 2(U - E)\psi

\frac{\partial M}{\partial \psi^{\prime}} = \frac{\hbar^2}{m} \psi^{\prime}


\frac{\mathrm{d}}{\mathrm{d}x} \big(\frac{\partial M}{\partial \psi^{\prime}}\big) = \frac{\hbar^2}{m} \psi^{\prime \prime}

so substituting these into the Euler-Lagrange equation we get

2(U - E) \psi - \frac{\hbar^2}{m} \psi^{\prime \prime} = 0


-\frac{\hbar^2}{2m} \frac{\mathrm{d}^2 \psi}{\mathrm{d} x^2} + U \psi = E \psi

and this is the (time-independent) Schrödinger equation for a particle of mass m with fixed total energy E in a potential U(x) on the line -\infty < x < \infty.
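As a numerical sanity check of this equation (not of the variational derivation itself), the following Python sketch verifies by finite differences that the harmonic-oscillator ground state \psi(x) = e^{-x^2/2} satisfies the time-independent Schrödinger equation with U(x) = x^2/2, \hbar = m = 1 and E = 1/2 at a few sample points:

```python
import math

# Harmonic-oscillator check with hbar = m = 1 (standard textbook ground state)
hbar = m = 1.0
E = 0.5
U = lambda x: 0.5 * x * x
psi = lambda x: math.exp(-0.5 * x * x)

h = 1e-4   # finite-difference step
for x in (-1.5, 0.0, 0.7, 2.0):
    # Central second difference approximates psi''(x)
    psi2 = (psi(x + h) - 2 * psi(x) + psi(x - h)) / h ** 2
    lhs = -(hbar ** 2 / (2 * m)) * psi2 + U(x) * psi(x)
    assert abs(lhs - E * psi(x)) < 1e-5   # Schrodinger equation holds

print("Schrodinger equation satisfied at sample points")
```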

Alternative approaches for a differential equation involving the square of a derivative

In the course of working out a calculus of variations problem I came across the nonlinear first-order differential equation

\big( \frac{\mathrm{d} y}{\mathrm{d} x}\big)^2 + y = c

where c is a constant. The boundary conditions of the original variational problem were y(0) = 0 and y(1) = 1. Differential equations involving squares of derivatives can be tricky and I decided to take the opportunity to try to find at least a couple of different ways of solving this relatively simple one that might be applicable in harder cases. I want to record these here. (The reader should pause at this point and try to solve this equation by himself/herself before continuing).

One approach that worked nicely and that is generalisable to more difficult cases is to differentiate the equation, giving

2y^{\prime} y^{\prime \prime} + y^{\prime} = 0


y^{\prime}(2 y^{\prime \prime} + 1) = 0

This yields two simpler differential equations, namely

y^{\prime} = 0


2y^{\prime \prime} + 1 = 0

The first one implies that y is a constant but this is not consistent with the boundary conditions. We therefore ignore this one and the problem then reduces to solving the second simple second-order differential equation, which can be written as

\frac{\mathrm{d}^2 y}{\mathrm{d} x^2} = -\frac{1}{2}

Integrating twice we get the general solution

y = -\frac{1}{4} x^2 + c_1 x + c_2

where c_1 and c_2 are arbitrary constants. Using y(0) = 0 and y(1) = 1 we find that c_1 = \frac{5}{4} and c_2 = 0. Therefore the required final solution of the differential equation with the squared derivative is

y = -\frac{1}{4} x^2 + \frac{5}{4} x

The other approach I tried and that worked in this case was to rearrange the equation with the squared derivative directly in order to use separation of variables. Again this is something that might be worth trying in more difficult situations. In the present case we have

\big( \frac{\mathrm{d} y}{\mathrm{d} x}\big)^2 + y = c_1


(\mathrm{d} y)^2 = (c_1 - y) (\mathrm{d} x)^2


\big(\frac{\mathrm{d} y}{\sqrt{c_1 - y}}\big)^2 = (\mathrm{d} x)^2

and therefore

\big(\int \frac{\mathrm{d} y}{\sqrt{c_1 - y}}\big)^2 = (\int \mathrm{d} x)^2

So the problem reduces to solving

\int \frac{\mathrm{d} y}{\sqrt{c_1 - y}} = \int \mathrm{d} x

Carrying out the integrations we get

-2 \sqrt{c_1 - y} = x + c_2

which yields the general solution

y = c_1 - \frac{1}{4} x^2 - \frac{1}{2} c_2 x - \frac{1}{4} c_2^2

Using the boundary conditions we find that c_1 = \frac{25}{16} and c_2 = -\frac{5}{2}. Substituting these into the general solution we again find the final solution of the differential equation with the squared derivative to be

y = -\frac{1}{4} x^2 + \frac{5}{4} x
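Both routes lead to the same solution, which can be confirmed in a couple of lines of Python:

```python
# Verify that y = -x^2/4 + 5x/4 satisfies (y')^2 + y = 25/16
# together with the boundary conditions y(0) = 0 and y(1) = 1.
y = lambda x: -0.25 * x * x + 1.25 * x
dy = lambda x: -0.5 * x + 1.25    # derivative of y

assert y(0.0) == 0.0 and y(1.0) == 1.0
for x in (0.0, 0.3, 0.5, 0.9, 1.0):
    assert abs(dy(x) ** 2 + y(x) - 25.0 / 16.0) < 1e-12

print("checks pass")
```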

On an integral calculus approach to finding the area of a quadrarc

I recently gave a ‘stretch and challenge’ problem to some of my GCSE maths students which involved finding the area of a quadrarc. A quadrarc, shown shaded in the diagram, is the shape enclosed by four arcs having identical radius but with centres at the four corners of a square. Finding the formula for the area of a quadrarc is in principle just within the reach of a high-attaining GCSE maths student since it can be done using only plane geometry techniques covered in the later stages of a GCSE course. However, it is a challenging problem and although some of my students got quite close none of them managed to solve it fully. The same plane geometry techniques are covered more thoroughly in the foundation year of A-level maths courses so this approach should be more feasible for first-year A-level maths students.

When exploring alternative ways to solve the problem myself, I employed an integral calculus approach that struck me as also being potentially very pedagogically useful for A-level students. It involves a number of techniques from coordinate geometry and integration that are in principle within the reach of such students in the later stages of their studies, so the quadrarc problem is also eminently useful as a ‘stretch and challenge’ revision problem for high-attaining A-level maths students if it is specified that integral calculus is to be used (rather than plane geometry). I have not been able to find any integral calculus solutions to the quadrarc problem in the literature which bring out the potentially useful pedagogical features at A-level, so I want to record my solution in the present note emphasising these features. The reader should try to solve the problem of finding the area of a quadrarc for himself/herself before reading any further.

In what follows we will always take the main square within which the quadrarc lies to be of side length r. I will begin by quickly setting out a plane geometry solution which involves finding the areas of an equilateral triangle, a sector, and a segment (diagrams A, B and C below).


Given that the main square has side length r, the area of the equilateral triangle in diagram A is (using the sine rule for area)

\frac{1}{2} r^2 \sin 60^{\circ} = \frac{\sqrt{3}}{4} r^2

The area of the sector in diagram B is

\frac{60^{\circ}}{360^{\circ}} \pi r^2 = \frac{\pi}{6} r^2

so the area of the segment shaded in diagram C is

\frac{\pi}{6} r^2 - \frac{\sqrt{3}}{4} r^2

Finally we obtain the shaded area in diagram D by deducting a triangle and two segments from the area of a quarter circle:

\frac{\pi}{4} r^2 - \frac{\sqrt{3}}{4} r^2 - 2\big( \frac{\pi}{6} r^2 - \frac{\sqrt{3}}{4} r^2 \big) = \frac{\sqrt{3}}{4} r^2 - \frac{\pi}{12} r^2

The area of the quadrarc is then the area of the square minus four of the areas in diagram D:

\text{Area of quadrarc} = r^2 - 4 \big( \frac{\sqrt{3}}{4} r^2 - \frac{\pi}{12} r^2 \big)

= \big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2



This same formula for the area of a quadrarc, \big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2, can be derived using integral calculus as follows.


Let the origin be at the bottom left corner of the square with increasing x-direction to the right and increasing y-direction upwards. Based on the equation of a circle with centre at (r, 0) and radius r, the equation of the arc through R and P is

y_{RP} = \sqrt{r^2 - (x - r)^2}

Similarly, based on the equation of a circle with centre at (r, r), the equation of the arc through R and S is

y_{RS} = r - \sqrt{r^2 - (x - r)^2}

The x-coordinate of the point R is given by the intersection of y_{RP} and y_{RS}, i.e.,

r - \sqrt{r^2 - (x - r)^2} = \sqrt{r^2 - (x - r)^2}


4x^2 - 8xr + r^2 = 0


x = r \pm \frac{\sqrt{3}}{2}r

Therefore the x-coordinate of the point R is

r \big(1 - \frac{\sqrt{3}}{2} \big)

Note that we need to discard the other solution, namely r \big(1 + \frac{\sqrt{3}}{2} \big), because it exceeds r and therefore lies outside the main square. By symmetry, the x-coordinate of both the points P and S is \frac{r}{2}. Therefore half of the area of the quadrarc should be given by the integral

\int_{r(1 - \sqrt{3}/2)}^{r/2} (y_{RP} - y_{RS}) dx = \int_{r(1 - \sqrt{3}/2)}^{r/2} (2\sqrt{r^2 - (x - r)^2} - r) dx

= 2 \int_{r(1 - \sqrt{3}/2)}^{r/2} \sqrt{r^2 - (x - r)^2} dx - \frac{(\sqrt{3} - 1)}{2}r^2

From tables of standard integrals we have that

\int\sqrt{a^2 - z^2} dz = \frac{1}{2} z \sqrt{a^2 - z^2} + \frac{1}{2} a^2 \arctan\big( \frac{z}{\sqrt{a^2 - z^2}}\big)

Therefore let’s employ the change of variable z = x - r. Then dz = dx and we have z = -\frac{\sqrt{3}}{2}r when x = r\big(1 - \frac{\sqrt{3}}{2}\big) whereas z = -\frac{r}{2} when x = \frac{r}{2}. Our integral becomes

2 \int_{- (\sqrt{3}/2)r}^{-r/2} \sqrt{r^2 - z^2} dz = \bigg[ z\sqrt{r^2 - z^2} + r^2 \arctan\big( \frac{z}{\sqrt{r^2 - z^2}}\big)\bigg]_{- (\sqrt{3}/2)r}^{-r/2}

= -\frac{r}{2}\sqrt{r^2 - \frac{r^2}{4}} + r^2 \arctan\bigg( \frac{-\frac{r}{2}}{\sqrt{r^2 - \frac{r^2}{4}}}\bigg) + \frac{\sqrt{3}}{2}r \sqrt{r^2 - \frac{3r^2}{4}} + r^2 \arctan\bigg( \frac{\frac{\sqrt{3}r}{2}}{\sqrt{r^2 - \frac{3r^2}{4}}}\bigg)

= r^2 \bigg\{ \arctan(\sqrt{3}) - \arctan\big( \frac{1}{\sqrt{3}}\big) \bigg\}

= \big\{ \frac{\pi}{3} - \frac{\pi}{6}\big\} r^2 = \frac{\pi}{6}r^2

Therefore half the area of the quadrarc is

\int_{r(1 - \sqrt{3}/2)}^{r/2} (y_{RP} - y_{RS}) dx

= 2 \int_{r(1 - \sqrt{3}/2)}^{r/2} \sqrt{r^2 - (x - r)^2} dx - \frac{(\sqrt{3} - 1)}{2}r^2

= \bigg(\frac{\pi}{6} - \frac{(\sqrt{3} - 1)}{2}\bigg)r^2

The full area of the quadrarc is then

\big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2

as before.
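The integral calculus route is easy to check numerically. This Python sketch (using the midpoint rule, with an arbitrary choice r = 2 for illustration) integrates y_{RP} - y_{RS} over half the quadrarc and doubles the result:

```python
import math

# Numerical check of the quadrarc area formula (pi/3 - (sqrt(3) - 1)) r^2
r = 2.0
y_RP = lambda x: math.sqrt(r * r - (x - r) ** 2)        # arc through R and P
y_RS = lambda x: r - math.sqrt(r * r - (x - r) ** 2)    # arc through R and S

x_lo = r * (1 - math.sqrt(3) / 2)   # x-coordinate of R
x_hi = r / 2                        # x-coordinate of P and S
N = 200000
h = (x_hi - x_lo) / N

# Midpoint rule for int (y_RP - y_RS) dx over half the quadrarc
half = sum((y_RP(x_lo + (i + 0.5) * h) - y_RS(x_lo + (i + 0.5) * h)) * h
           for i in range(N))

exact = (math.pi / 3 - (math.sqrt(3) - 1)) * r * r
assert abs(2 * half - exact) < 1e-6
print(2 * half)
```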




A note on Bragg’s law for X-ray scattering

In 1912, at the age of only 22, William L. Bragg wrote a paper which essentially began the field now known as X-ray crystallography. Earlier that year, Max von Laue and others had passed X-rays through crystals of copper sulphate and zinc sulphide, obtaining patterns of dots on a photographic film. These patterns were the first evidence that X-rays could be diffracted by the regularly arranged atoms in crystals. The X-ray diffraction pattern produced by zinc sulphide was particularly sharp and clear, and seemed to show a four-fold symmetry.

Laue attempted to explain the patterns of dots mathematically but was not completely successful. He had tried to set up the maths based on overly complicated and partially incorrect models of what was going on at the atomic level. At this time the young William L. Bragg was just finishing a mathematics degree at Cambridge University. He heard about Laue’s work from his father, William H. Bragg, who was a mathematician and physicist himself and had been working on X-rays at the University of Leeds. When thinking about the problem, the young William Bragg envisaged a much simpler model than Laue’s in which the planes in the crystal lattice acted like mirrors. The incoming X-ray beams, interpreted as electromagnetic waves with a particular wavelength \lambda, would be reflected by the planes in the crystal lattice. The outgoing beams would then interfere with one another to give either constructive or destructive interference, depending on whether the crests of the waves were in phase or out of phase. The diagram below shows the standard set-up for deriving Bragg’s law that appears in every textbook and article on this topic.


The incident beams are at an angle \theta to the planes in the lattice and are reflected at the same angle. The extra distance travelled by the lower beam is the sum of the lengths of the two black arrows in the diagram. Elementary trigonometry shows that the length of each black arrow is \text{d} \sin \theta, so the extra distance travelled by the lower beam is 2 \text{d} \sin \theta. A black dot is produced on the photographic film only if the extra distance travelled by the lower beam is a whole number of wavelengths, since in this case there will be constructive interference of the outgoing waves and the reinforced beam will register strongly. If the extra distance travelled by the lower beam is not exactly equal to a whole number of wavelengths, there will be some X-ray beam nearby that will interfere destructively with it and they will cancel each other out. For example, the diagram below shows how the beams cancel each other out when the extra distance travelled by the lower beam is an odd multiple of half a wavelength. In this case, the scattering of the X-ray beams will not be detectable and no black spot will appear in the corresponding position on the photographic film.


It follows that it is possible to calculate the positions of the black spots on the photographic film using the equation

2 \text{d} \sin \theta = \text{n} \lambda

where n is a positive integer. This is Bragg's law. A black spot appears on the photographic film only at those positions where the three quantities d, \theta and \lambda simultaneously satisfy Bragg's law; otherwise no diffraction spot will appear on the film. This is the law that the young William Bragg published in his 1912 paper. Together, the two William Braggs continued to investigate crystal structures using X-ray diffraction, and father and son were jointly awarded the Nobel Prize in Physics for this work in 1915. At twenty-five, the younger Bragg remains the youngest ever Nobel laureate in the sciences.
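Bragg's law is easy to explore numerically. The sketch below (an illustration of mine, not from the original note) solves 2 \text{d} \sin \theta = \text{n} \lambda for the allowed angles \theta_n = \arcsin(n\lambda/2d); there is one such angle for each positive integer n with n\lambda \leq 2d. The numerical values for d and \lambda are assumptions chosen for illustration, roughly the NaCl lattice spacing and the Cu K-\alpha wavelength:

```python
import math

def bragg_angles(d: float, wavelength: float) -> list:
    """Return the Bragg angles theta_n (in degrees) satisfying
    2 d sin(theta) = n * lambda, for every positive integer n with
    n * lambda <= 2 d.  d and wavelength must be in the same units."""
    angles = []
    n = 1
    while n * wavelength <= 2 * d:
        angles.append(math.degrees(math.asin(n * wavelength / (2 * d))))
        n += 1
    return angles

# Illustrative (assumed) values: d = 2.82 angstroms, lambda = 1.54 angstroms
for n, theta in enumerate(bragg_angles(2.82, 1.54), start=1):
    print(f"n = {n}: theta = {theta:.2f} degrees")
```

With these values only three orders of diffraction are possible, since 4\lambda > 2d.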

This is a lovely story and the standard textbook set-up above for deriving Bragg’s law seems straightforward enough. However, a bright A-level student of mine expressed dissatisfaction with it and asked if there is some alternative approach to deriving Bragg’s law that involves an explicit subtraction operation, i.e., an approach showing that a specific distance travelled by the lower X-ray beam minus a specific distance travelled by the upper X-ray beam equals 2 \text{d} \sin \theta. I felt this was an interesting question but when we tried to find an alternative approach in the literature we noticed that every textbook and every source online that we looked at had exactly this same set-up, or something very similar to it that the student also did not like! We eventually managed to come up with a calculation of the kind the student wanted to see by varying the standard textbook set-up slightly, and this is what I want to record in the present note.

We altered the standard set-up by moving the two short perpendiculars to the X-ray beams symmetrically away from the point of incidence of the upper beam, as indicated in the following diagram.


A convenient place to put the blue perpendiculars is at the points of intersection of the lower X-ray beam with the upper crystal plane, which is what I have done in the diagram above. This then allows the type of explicit subtraction operation to be carried out that the student wanted to see. All we needed to do was to demonstrate that the distance shown in red in the diagram below, minus the distance shown in green, exactly equals 2 \text{d} \sin \theta.

From the diagram we see immediately that

\text{h} = \frac{\text{d}}{\sin \theta}

\frac{\text{j}}{\text{k}} = \cos \theta

\frac{\text{d}}{\text{k}} = \tan \theta

so that

\text{j} = \text{k} \cos \theta = \frac{\text{d}}{\tan \theta} \cos \theta

and the desired subtraction operation is then

2 \text{h} - 2 \text{j} = \frac{2 \text{d}}{\sin \theta} - \frac{2 \text{d}}{\tan \theta} \cos \theta = 2 \text{d} \bigg( \frac{1}{\sin \theta} - \frac{\cos^2 \theta}{\sin \theta} \bigg) = 2 \text{d} \sin \theta

as required. One can now imagine the blue perpendiculars sliding back together towards the centre of the diagram. The difference between the red and the green distances will always remain equal to 2 \text{d} \sin \theta as the blue perpendiculars slide inwards but the green distance will eventually shrink to zero. We will then have arrived back at the standard textbook set-up and the extra distance travelled by the lower X-ray beam will (of course) remain 2 \text{d} \sin \theta.
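The subtraction identity 2\text{h} - 2\text{j} = 2 \text{d} \sin \theta can also be confirmed numerically. The following sketch (my own check, using the definitions of h and j from the diagram above) compares the two sides for a few arbitrary angles:

```python
import math

def extra_path(d: float, theta: float) -> float:
    """Path-length difference 2h - 2j from the modified diagram, where
    h = d / sin(theta) and j = (d / tan(theta)) * cos(theta)."""
    h = d / math.sin(theta)
    j = d / math.tan(theta) * math.cos(theta)
    return 2 * h - 2 * j

# Compare against 2 d sin(theta) for a few angles (theta in radians)
d = 2.0
for theta in (0.3, 0.7, 1.2):
    print(extra_path(d, theta), 2 * d * math.sin(theta))
```

Since 1 - \cos^2 \theta = \sin^2 \theta, the two printed values agree at every angle, exactly as in the algebraic derivation.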