# On Lie derivatives of tensor fields

Differential geometry provides a number of ways of extending the familiar notion of the derivative of a real-valued function to enable differentiation of various types of tensor fields that ‘live’ on manifolds, such as scalars, contravariant vectors, one-forms (also known as covariant vectors) and mixed tensors. The problem that needs to be overcome in such cases is the fact that partial differentiation of tensors is generally not tensorial, i.e., the result is not itself a tensor. The reason is that the process of differentiation involves subtracting tensors living in different tangent spaces on a curved manifold, so their difference does not transform in the same way as either of the tensors individually. For example, consider a contravariant vector field $X^a$. In the tangent space at point $P$ the transformation law for $X^a$ is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_P X^b$

whereas in the tangent space at point $Q$ the transformation law is

$X^{\prime a} = \big[\frac{\partial x^{\prime a}}{\partial x^b}\big]_Q X^b$

If we imagine these two tangent spaces at points $P$ and $Q$ on the manifold separated by distance $\delta u$, the derivative would involve computing

$\lim_{\delta u \rightarrow 0} \frac{[X^a]_Q - [X^a]_P}{\delta u}$

and the difference in the numerator will clearly not transform like either of the contravariant vectors individually since their transformation matrices are evaluated at different points. The derivative will not therefore be a tensor itself.

The usual way of getting around this is to introduce some kind of auxiliary field onto the manifold which provides a link between the two tensors, thus enabling them to transform in the same way (with respect to the auxiliary field) when they are subtracted. In the present note I want to explore the particular concept of the Lie derivative of a tensor field (named after the mathematician Marius Sophus Lie, 1842–1899) which employs this auxiliary field approach by introducing a contravariant vector field onto the manifold. To this end, suppose we define a vector field $X^a (x)$. This can be used to define streamlines in the manifold (also called a congruence of curves) as the solutions of the ordinary differential equations

$\frac{dx^a}{du} = X^a (x(u))$

where $u$ is a parameter determining the position on a given streamline. The equations encapsulate the fact that each point on a streamline has a tangent vector belonging to the vector field.

Example: Using the notation $x^i$ to denote the $i$-th coordinate, suppose the vector field is $X = (1, x^2(u))$. Then $X^1 = 1$, $X^2 = x^2$ and the streamlines for this vector field are obtained by solving simultaneously the differential equations

$\frac{dx^1}{du} = 1$

$\frac{dx^2}{du} = x^2$

Solving the first equation gives

$x^1 = u + c^1$

Solving the second equation gives

$x^2 = e^{c^2}e^u$

Using the solution of the first equation to substitute for $u$ in the solution of the second one we get

$x^2 = e^{c^2} e^{x^1 - c^1} = e^{c^2 - c^1}e^{x^1} \equiv Ce^{x^1}$

Therefore the streamlines of the vector field $(1, x^2(u))$ are the graphs of the equation

$x^2 = Ce^{x^1}$

where $C$ is a constant. Some of the streamlines are shown in the figure below.
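As a quick check, the streamline calculation can be reproduced symbolically. The following sketch (my own illustration using sympy; the function names are arbitrary) solves the two differential equations and confirms that the curve $x^2 = Ce^{x^1}$ satisfies $dx^2/dx^1 = X^2/X^1 = x^2$:

```python
import sympy as sp

u, x, C = sp.symbols('u x C')
f1, f2 = sp.Function('f1'), sp.Function('f2')  # f1 = x^1(u), f2 = x^2(u)

# Solve dx^1/du = 1 and dx^2/du = x^2 separately
sol1 = sp.dsolve(sp.Eq(f1(u).diff(u), 1), f1(u))
sol2 = sp.dsolve(sp.Eq(f2(u).diff(u), f2(u)), f2(u))
print(sol1)   # Eq(f1(u), C1 + u)
print(sol2)   # Eq(f2(u), C1*exp(u))

# Eliminating u: along a streamline dx^2/dx^1 = X^2/X^1 = x^2,
# and x^2 = C*exp(x^1) indeed satisfies this first-order equation
streamline = C * sp.exp(x)
assert sp.simplify(streamline.diff(x) - streamline) == 0
```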

Now suppose we want to find the Lie derivative of a tensor field, $T^{ab \cdots}_{cd \cdots}(x)$, using the vector field $X^a(x)$. The essential idea is to use the streamlines of the vector field to link the tensor at some point $P$, $T^{ab\cdots}_{cd \cdots}(P)$, with the tensor at some neighbouring point $Q$, $T^{ab\cdots}_{cd \cdots}(Q)$, in such a way that the two will have the same transformation matrix at point $Q$ (with respect to the auxiliary vector field). We can then subtract the two tensors at $Q$ and so define the derivative at $P$ by a limiting process as $Q$ tends to $P$. In all such cases, the technique begins by considering a coordinate transformation from $P$ to $Q$ of the form

$x^{\prime a} = x^a + \delta u X^a(x)$

where $\delta u$ is arbitrarily small. The point $Q$ with coordinates $x^{\prime a}$ lies on the streamline through $P$ which the vector field $X^a(x)$ generates. Differentiating the coordinate transformation we get

$\frac{\partial x^{\prime a}}{\partial x^b} = \frac{\partial x^a}{\partial x^b} + \delta u \ \partial_b X^a$

$= \delta^a_b + \delta u \ \partial_b X^a$

where $\delta^a_b$ is the Kronecker delta and $\partial_b \equiv \frac{\partial}{\partial x^b}$. What we now do is consider the effect of the above coordinate transformation on the tensor field $T^{ab\cdots}_{cd \cdots}$ at the points $P$ and $Q$. In what follows I will employ this general procedure to obtain the Lie derivative formulas with respect to a contravariant vector field $X^a(x)$ in the case of a scalar field $\phi$, a contravariant vector field $Y^a$, a one-form field $Y_a$, and a general mixed tensor field $T^{ab\cdots}_{cd \cdots}$.

## The Lie derivative of a scalar field $\phi$

Not surprisingly this is the easiest case to deal with since scalars are invariants: the values of a scalar field defined over a manifold do not change under a change in the coordinate system being used. The value of the scalar field at the point $P$ will simply be $\phi(x)$ and the value at the point $Q$ will be

$\phi(x^{\prime}) = \phi(x^c + \delta u X^c)$

We can expand this in a Taylor series about the point $P$ with coordinates $x$ to get the first-order approximation

$\phi(x^{\prime}) \approx \phi(x) + \delta u X^c \ \partial_c \phi(x)$

The Lie derivative of the scalar field with respect to the contravariant vector field $X^a(x)$ is then

$L_X \phi = \lim_{\delta u \rightarrow 0} \frac{\phi(x^{\prime}) - \phi(x)}{\delta u} = X^c \ \partial_c \phi$

$\equiv X^a \ \partial_a \phi$

We observe that the Lie derivative of the scalar field $\phi(x)$ with respect to the vector field $X^a(x)$ is actually the directional derivative of $\phi$ in the direction of the vector $X^a$. In the differential geometry literature in this area it is common to associate the contravariant vector field $X$ with the linear differential operator $X^a \ \partial_a$ (which operates on any real-valued function $f$ to produce another function $g$) and essentially treat them as the same object. Given a point $P$ on the manifold, one thinks of the partial differential operators $[\partial_a]_P$ as constituting a basis for all the vectors in the tangent space at $P$, so that any vector at $P$ can be written as a linear combination of the $[\partial_a]_P$ in the form

$[X]_P = [X^a]_P [\partial_a]_P$

This is the intuitive justification for treating the vector field $X$ and the linear differential operator $X^a \ \partial_a$ as being the same things. Under this convention, one often sees the Lie derivative of a scalar field $\phi$ with respect to the contravariant vector field $X$ written as

$L_X \phi = X \phi$
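Since $L_X \phi$ is just the directional derivative $X^a \partial_a \phi$, it is straightforward to compute symbolically. Below is a minimal sketch (my own illustration, reusing the field $X = (1, x^2)$ from the earlier example; the scalar field chosen is arbitrary):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
X = (sp.Integer(1), x2)          # the vector field X = (1, x^2)

def lie_scalar(X, phi, coords):
    # L_X phi = X^a d_a phi, the directional derivative of phi along X
    return sum(Xa * sp.diff(phi, c) for Xa, c in zip(X, coords))

phi = x1**2 * x2
print(lie_scalar(X, phi, coords))
```

For $\phi = (x^1)^2 x^2$ this prints $(x^1)^2 x^2 + 2x^1 x^2$, i.e. $1 \cdot \partial_1 \phi + x^2 \, \partial_2 \phi$.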

## The Lie derivative of a contravariant vector field $Y^a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the contravariant vector field $Y^a(x)$ at $P$ is mapped to

$Y^{\prime a}(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^b} Y^b(x)$

$= (\delta^a_b + \delta u \ \partial_b X^a) Y^b(x)$

$= Y^a(x) + \delta u \ Y^b \ \partial_b X^a$

The vector already at $Q$, namely $Y^a(x^{\prime})$, has a first-order Taylor series approximation about $x$ of the form

$Y^a(x^{\prime}) = Y^a(x^c + \delta u X^c)$

$\approx Y^a(x) + \delta u \ X^c \ \partial_c Y^a(x)$

The Lie derivative with respect to the vector field $X^a(x)$ is then given by

$L_X Y^a = \lim_{\delta u \rightarrow 0} \frac{Y^a (x^{\prime}) - Y^{\prime a}(x^{\prime})}{\delta u}$

$= X^c \ \partial_c Y^a - Y^b \ \partial_b X^a$

$\equiv X^b \ \partial_b Y^a - Y^b \ \partial_b X^a$

Under the convention of associating the vector field $X$ with the linear differential operator $X^a \partial_a$, one often sees the Lie derivative of a contravariant vector field $Y^a$ with respect to the field $X$ written as

$L_X Y^a = [X, Y]^a$

where $[X, Y] = XY - YX$ is called the Lie bracket (or commutator) of the two vector fields $X$ and $Y$. This is a new vector field (and therefore linear differential operator) that can be written alternatively as

$[X, Y] = X(Y^a \partial_a) - Y(X^a \partial_a)$

$= X^b \partial_b (Y^a \partial_a) - Y^b \partial_b (X^a \partial_a)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a + X^a Y^b(\partial_b \partial_a - \partial_a \partial_b)$

$= (X^b \partial_b Y^a - Y^b \partial_b X^a) \partial_a$

where the last equality follows from the fact that the second term in the penultimate line will always vanish by Young’s Theorem (equality of cross-partials). Therefore the $a$-th component of the vector field $[X, Y]$ is the one that appears in the expression of the Lie derivative $L_X Y^a$ above.
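The component formula can be checked directly. The sketch below (an illustration of mine; the two fields are arbitrary choices) computes $[X, Y]^a = X^b \partial_b Y^a - Y^b \partial_b X^a$ for concrete fields and verifies the antisymmetry $[X, Y] = -[Y, X]$:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)

def lie_vector(X, Y, coords):
    # (L_X Y)^a = X^b d_b Y^a - Y^b d_b X^a = [X, Y]^a
    n = len(coords)
    return tuple(
        sum(X[b]*sp.diff(Y[a], coords[b]) - Y[b]*sp.diff(X[a], coords[b])
            for b in range(n))
        for a in range(n))

X = (sp.Integer(1), x2)
Y = (x2, x1)

XY = lie_vector(X, Y, coords)
YX = lie_vector(Y, X, coords)
print(XY)                        # (x2, 1 - x1)

# The Lie bracket is antisymmetric: [X, Y] = -[Y, X]
assert all(sp.simplify(a + b) == 0 for a, b in zip(XY, YX))
```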

## The Lie derivative of a one-form (covariant vector) field $Y_a$

Under the coordinate transformation from $P$ to $Q$ given by

$x^{\prime a} = x^a + \delta u X^a(x)$

the one-form (covariant vector) field $Y_a(x)$ at $P$ is mapped to

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

To work out the transformation matrix here we need to write the coordinate transformation as

$x^{\prime b} = x^b + \delta u X^b(x)$

$\implies$

$x^b = x^{\prime b} - \delta u X^b(x)$

Partially differentiating we get

$\frac{\partial x^b}{\partial x^{\prime a}} = \delta^b_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^b$

$= \delta^b_a - \delta u \ \partial_c X^b \frac{\partial x^c}{\partial x^{\prime a}}$

$= \delta^b_a - \delta u \ \partial_c X^b \big(\delta^c_a - \delta u \frac{\partial}{\partial x^{\prime a}} X^c\big)$

$= \delta^b_a - \delta u \ \partial_a X^b + O((\delta u)^2)$

We can ignore the $O((\delta u)^2)$ terms as they will disappear in the limiting process of the differentiation, so we have

$Y^{\prime}_a(x^{\prime}) = \frac{\partial x^b}{\partial x^{\prime a}} Y_b(x)$

$= \big( \delta^b_a - \delta u \ \partial_a X^b \big) Y_b(x)$

$= Y_a(x) - \delta u Y_b \ \partial_a X^b$

Again taking a first-order Taylor series approximation about $x$ at the point $Q$ we get that

$Y_a(x^{\prime}) = Y_a(x^c + \delta u X^c)$

$\approx Y_a(x) + \delta u X^c \ \partial_c Y_a(x)$

Then the Lie derivative of the one-form field $Y_a(x)$ with respect to the contravariant vector field $X^a(x)$ is obtained as

$L_X Y_a = \lim_{\delta u \rightarrow 0} \frac{Y_a(x^{\prime}) - Y^{\prime}_a (x^{\prime})}{\delta u}$

$= X^c \partial_c Y_a + Y_b \partial_a X^b$

$= X^b \partial_b Y_a + Y_b \partial_a X^b$
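A useful consistency check on the sign in this formula is the Leibniz rule: the contraction $Y_a Z^a$ is a scalar, so $L_X(Y_a Z^a)$ computed as a directional derivative must equal $(L_X Y)_a Z^a + Y_a (L_X Z)^a$. The sketch below (my own check, with arbitrary smooth component functions) verifies this:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
n = len(coords)

def lie_scalar(X, phi):
    # L_X phi = X^b d_b phi
    return sum(X[b]*sp.diff(phi, coords[b]) for b in range(n))

def lie_vector(X, Z):
    # (L_X Z)^a = X^b d_b Z^a - Z^b d_b X^a
    return tuple(sum(X[b]*sp.diff(Z[a], coords[b]) - Z[b]*sp.diff(X[a], coords[b])
                     for b in range(n)) for a in range(n))

def lie_oneform(X, Y):
    # (L_X Y)_a = X^b d_b Y_a + Y_b d_a X^b
    return tuple(sum(X[b]*sp.diff(Y[a], coords[b]) + Y[b]*sp.diff(X[b], coords[a])
                     for b in range(n)) for a in range(n))

# Arbitrary smooth component functions
X = (x2, x1*x2)
Y = (sp.sin(x1), x1 + x2)   # one-form components Y_a
Z = (x1**2, x2)             # vector components Z^a

# Leibniz rule: L_X(Y_a Z^a) = (L_X Y)_a Z^a + Y_a (L_X Z)^a
lhs = lie_scalar(X, sum(Y[a]*Z[a] for a in range(n)))
rhs = sum(lie_oneform(X, Y)[a]*Z[a] + Y[a]*lie_vector(X, Z)[a] for a in range(n))
assert sp.simplify(lhs - rhs) == 0
print("Leibniz rule verified")
```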

## The Lie derivative of a mixed tensor field $T^{ab\cdots}_{cd \cdots}$

A good prototype for this case is the Lie derivative of the simplest type of mixed tensor field, the rank-2 tensor of type $(1, 1)$ represented as $T^a_b(x)$. We will therefore consider this case first and then use it to extrapolate to a general mixed tensor field of type $(p, q)$ represented as $T^{ab \cdots}_{cd \cdots}(x)$.

Under the coordinate transformation

$x^{\prime a} = x^a + \delta u X^a(x)$

the mixed tensor field $T^a_b(x)$ transforms as

$T^{\prime a}_b(x^{\prime}) = \frac{\partial x^{\prime a}}{\partial x^c} \frac{\partial x^d}{\partial x^{\prime b}} T^c_d(x)$

$= (\delta^a_c + \delta u \ \partial_c X^a)(\delta^d_b - \delta u \ \partial_b X^d) T^c_d$

$= (\delta^a_c + \delta u \ \partial_c X^a)(T^c_b - \delta u T^c_d \ \partial_b X^d)$

$= T^a_b(x) - \delta u T^a_d \ \partial_b X^d + \delta u T^c_b \ \partial_c X^a + O((\delta u)^2)$

Under a first-order Taylor series approximation about $x$, the tensor at $Q$ can be written

$T^a_b(x^{\prime}) = T^a_b(x^c + \delta u X^c)$

$\approx T^a_b(x) + \delta u X^c \ \partial_c T^a_b$

The Lie derivative of $T^a_b(x)$ with respect to the contravariant vector field $X^a(x)$ is then

$L_X T^a_b = \lim_{\delta u \rightarrow 0} \frac{T^a_b(x^{\prime}) - T^{\prime a}_b(x^{\prime})}{\delta u}$

$= X^c \ \partial_c T^a_b + T^a_d \ \partial_b X^d - T^c_b \ \partial_c X^a$

We observe that the contravariant index $a$ contributes a term of the form $-T^c_b \ \partial_c X^a$ while the covariant index $b$ contributes a term of the form $T^a_d \ \partial_b X^d$.

Now consider the general mixed tensor $T^{ab \cdots}_{cd \cdots}$. The first-order Taylor series approximation of $T^{ab \cdots}_{cd \cdots}(x^{\prime})$ about $x$ gives

$T^{ab \cdots}_{cd \cdots}(x^e + \delta u X^e)$

$\approx T^{ab \cdots}_{cd \cdots}(x) + \delta u X^e \ \partial_e T^{ab \cdots}_{cd \cdots}(x)$

Therefore the first term of the Lie derivative will be $X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$. This is of the same form as the first term of $L_X T^a_b$. Conveniently, it turns out that each contravariant and covariant index in $T^{ab \cdots}_{cd \cdots}$ then contributes a term like the corresponding term we saw above in $L_X T^a_b$. Therefore the Lie derivative of the general mixed tensor field $T^{ab \cdots}_{cd \cdots}(x)$ with respect to the contravariant vector field $X^a(x)$ will be of the form

$L_X T^{ab \cdots}_{cd \cdots} = X^e \ \partial_e T^{ab \cdots}_{cd \cdots}$

$- \ T^{eb \cdots}_{cd \cdots} \ \partial_e X^a \ - \ T^{ae \cdots}_{cd \cdots} \ \partial_e X^b \ - \ \cdots$

$+ \ T^{ab \cdots}_{ed \cdots} \ \partial_c X^e \ + \ T^{ab \cdots}_{ce \cdots} \ \partial_d X^e \ + \ \cdots$
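One way to test the index pattern is again through the Leibniz rule: for a product tensor $T^a_b = Y^a Z_b$ the Lie derivative must satisfy $(L_X T)^a_b = (L_X Y)^a Z_b + Y^a (L_X Z)_b$. The sketch below (my own consistency check, with arbitrary component functions) verifies this for the type-$(1,1)$ formula $L_X T^a_b = X^c \partial_c T^a_b - T^c_b \partial_c X^a + T^a_c \partial_b X^c$:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)
n = len(coords)

def lie_vector(X, Y):
    # (L_X Y)^a = X^c d_c Y^a - Y^c d_c X^a
    return [sum(X[c]*sp.diff(Y[a], coords[c]) - Y[c]*sp.diff(X[a], coords[c])
                for c in range(n)) for a in range(n)]

def lie_oneform(X, Z):
    # (L_X Z)_a = X^c d_c Z_a + Z_c d_a X^c
    return [sum(X[c]*sp.diff(Z[a], coords[c]) + Z[c]*sp.diff(X[c], coords[a])
                for c in range(n)) for a in range(n)]

def lie_mixed(X, T):
    # (L_X T)^a_b = X^c d_c T^a_b - T^c_b d_c X^a + T^a_c d_b X^c
    return [[sum(X[c]*sp.diff(T[a][b], coords[c])
                 - T[c][b]*sp.diff(X[a], coords[c])
                 + T[a][c]*sp.diff(X[c], coords[b]) for c in range(n))
             for b in range(n)] for a in range(n)]

X = (x2, x1)
Y = (x1*x2, sp.exp(x1))     # vector components Y^a
Z = (x2, sp.cos(x2))        # one-form components Z_b

# Product tensor T^a_b = Y^a Z_b must obey the Leibniz rule
T = [[Y[a]*Z[b] for b in range(n)] for a in range(n)]
LT = lie_mixed(X, T)
LY, LZ = lie_vector(X, Y), lie_oneform(X, Z)
assert all(sp.simplify(LT[a][b] - (LY[a]*Z[b] + Y[a]*LZ[b])) == 0
           for a in range(n) for b in range(n))
print("Leibniz check for T^a_b passed")
```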

# A closer look at how the principal, orbital and magnetic quantum numbers arise in Schrödinger’s theory of the hydrogen atom

In this note I want to explore some aspects of the solution of Schrödinger’s three-dimensional wave equation in spherical polar coordinates which pertain to the three main quantum numbers characterising the electron in a hydrogen atom: the electron’s principal, orbital and magnetic quantum numbers. In particular, I have noticed that many physics and mathematics sources tend to gloss over some (or all) of the mathematical details of how permissibility conditions on the three quantum numbers emerge naturally from validity constraints on the relevant underlying differential equations. I want to bring out these details clearly in the present note.

In general, four quantum numbers are needed to fully describe atomic electrons in many-electron atoms. These four numbers and their permissible values are:

Principal quantum number $\tilde{n} = 1, 2, 3, \ldots$

Orbital quantum number $l = 0, 1, 2, \ldots, (\tilde{n} - 1)$

Magnetic quantum number $m_l = 0, \pm 1, \pm 2, \ldots, \pm l$

Spin magnetic quantum number $m_s = -\frac{1}{2}, +\frac{1}{2}$

The principal quantum number determines the electron’s energy, the orbital quantum number its orbital angular-momentum magnitude, the magnetic quantum number its orbital angular-momentum direction, and the spin magnetic quantum number its spin direction.

I have noticed that it is often not explained clearly why, for example, the orbital quantum number cannot exceed the principal quantum number minus one, or why the magnitude of the magnetic quantum number cannot exceed that of the orbital quantum number. I will explore issues like these in depth in the context of the time-independent Schrödinger equation for the hydrogen atom, which only involves the first three quantum numbers. I will not discuss the spin magnetic quantum number in the present note.

## Schrödinger’s wave equation for the hydrogen atom

In Cartesian coordinates, Schrödinger’s three-dimensional equation for the electron in the hydrogen atom is

$\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \frac{\partial^2 \psi}{\partial z^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

where $m_e$ denotes the electron mass. The potential energy $U$ is the electric potential energy of a charge $-e$ given that it is at distance $r$ from another charge $+e$, namely

$U = -\frac{e^2}{4 \pi \epsilon_0 r}$

It is necessary to change variables in Schrödinger’s equation since the potential energy is a function of radial distance $r$ rather than the Cartesian coordinate variables $x$, $y$ and $z$. Given the spherical symmetry of the atom, it is sensible to proceed by changing the variables in Schrödinger’s equation to those of spherical polar coordinates (rather than changing the $r$ variable in $U$ to Cartesian coordinates using $r = \sqrt{x^2 + y^2 + z^2}$). Only the variables in the Laplacian part of Schrödinger’s equation need to be changed, so we can use the approach in my previous note on changing variables in Laplace’s equation to get

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial \psi}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 \psi}{\partial \phi^2} + \frac{2m_e}{\hbar^2}(E - U) \psi = 0$

I will now temporarily simplify things by using the representation of the square of the angular momentum operator in spherical polar coordinates which I obtained in another previous note, namely

$L^2 = - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$

$= - \hbar^2 r^2 \big( \frac{1}{r^2 \sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$

Using this to replace the two middle terms in Schrödinger’s equation and rearranging we get

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial \psi}{\partial r}\big) + \frac{2m_e}{\hbar^2}(E - U) \psi = \frac{L^2}{\hbar^2 r^2} \psi$

This equation can now be solved by the usual separation of variables approach. We assume that the $\psi$ function can be expressed as a product

$\psi(r, \theta, \phi) = R(r) Y(\theta, \phi)$

and then substitute this back into the wave equation to get

$\frac{Y}{r^2} \frac{d}{ d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e}{\hbar^2}(E - U) R Y = \frac{R}{\hbar^2 r^2} L^2 Y$

Dividing through by $\frac{R Y}{r^2}$ we get

$\frac{1}{R} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \frac{2m_e r^2}{\hbar^2}(E - U) = \frac{1}{\hbar^2 Y} L^2 Y$

Since the left-hand side of this equation depends only on $r$ while the right-hand side depends only on $\theta$ and $\phi$, both sides must be equal to some constant which we can call $\lambda$. Setting the left and right-hand sides equal to $\lambda$ in turn and rearranging slightly we finally get the radial equation

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

and the angular equation

$L^2 Y = \lambda \hbar^2 Y$

We can now apply separation of variables again to the angular equation. Rewriting the operator $L^2$ in full the angular equation becomes

$- \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} \big) = \lambda \hbar^2 Y$

which simplifies to

$\frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial Y}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2 Y}{\partial \phi^2} + \lambda Y = 0$

We assume that the $Y$ function can be written as the product

$Y(\theta, \phi) = \Theta (\theta) \Phi (\phi)$

Substituting this into the angular equation gives

$\frac{\Phi}{\sin \theta}\frac{d}{d \theta} \big( \sin \theta \frac{d \Theta}{d \theta}\big) + \frac{\Theta}{\sin^2 \theta}\frac{d^2 \Phi}{d \phi^2} + \lambda \Theta \Phi = 0$

Multiplying through by $\frac{\sin^2 \theta}{\Theta \Phi}$ and rearranging we get

$-\frac{1}{\Phi} \frac{d^2 \Phi}{d \phi^2} = \frac{\sin^2 \theta}{\Theta}\big[ \frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \lambda \Theta\big]$

Since the left-hand side of this equation depends only on $\phi$ while the right-hand side depends only on $\theta$, both sides must be equal to some constant which we can provisionally call $k$. Setting the left and right-hand sides equal to $k$ in turn and rearranging we get

$\frac{d^2 \Phi}{d \phi^2} + k \Phi = 0$

and

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{k}{\sin^2 \theta}\big) \Theta = 0$

We now have three ordinary differential equations that need to be solved, one for $\Phi$, one for $\Theta$ and one for $R$. We will solve each of them in turn.

## The equation for $\Phi$

The equation for $\Phi$ is a straightforward second-order differential equation with auxiliary equation

$\zeta^2 + k = 0$

implying $\zeta = \pm \sqrt{-k}$ if $k < 0$ and $\zeta = \pm i \sqrt{k}$ if $k > 0$. Therefore it has a general solution of the form

$\Phi(\phi) = Ae^{\sqrt{-k} \phi} + Be^{- \sqrt{-k} \phi}$

if $k < 0$ and $\Phi(\phi) = Ae^{i \sqrt{k} \phi} + Be^{- i \sqrt{k} \phi}$ if $k > 0$, where $A$ and $B$ are arbitrary constants. Now, the azimuth angle $\phi$ can take any value in $(-\infty, \infty)$ but the function $\Phi$ must take a single value at each point in space (since this is a required property of the quantum wave function which $\Phi$ is a constituent of). It follows that the function $\Phi$ must be periodic since it must take the same value at $\phi$ and $\phi + 2\pi$ for any given $\phi$. This imposes two constraints on the form of the general solution: (1) it cannot consist only of exponential functions with real arguments since these are not periodic (thus ruling out the first general solution above and thereby implying that the separation constant $k$ must be nonnegative); (2) $\sqrt{k}$ must be an integer. Given these constraints, it is customary in quantum mechanics to denote $\pm \sqrt{k}$ by the letter $m$ (it is called the magnetic quantum number) and to specify the separation constant in the angular equations as $m^2$, which guarantees its nonnegativity. We then state the general solution of the equation for $\Phi$ as

$\Phi(\phi) = Ae^{i m \phi} + Be^{- i m \phi}$

In practice, a particular electron wave function will involve only a single value of $m$ so only the first of the two terms in the general solution will be necessary. We can therefore state the general solution of the equation for $\Phi$ for a given magnetic quantum number $m$ as

$\Phi(\phi) \propto e^{i m \phi}$
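These statements are easy to confirm symbolically. Below is a minimal sympy check (my own illustration) that $e^{im\phi}$ satisfies the $\Phi$ equation for any integer $m$, and that the single-valuedness condition $\Phi(\phi + 2\pi) = \Phi(\phi)$ holds for an integer exponent but fails for a non-integer one:

```python
import sympy as sp

phi = sp.symbols('phi', real=True)
m = sp.symbols('m', integer=True)

Phi = sp.exp(sp.I*m*phi)

# e^{i m phi} solves Phi'' + m^2 Phi = 0
assert sp.simplify(Phi.diff(phi, 2) + m**2*Phi) == 0

# Single-valuedness Phi(phi + 2*pi) = Phi(phi): checked for m = 3
Phi3 = sp.exp(3*sp.I*phi)
assert sp.simplify(Phi3.subs(phi, phi + 2*sp.pi) - Phi3) == 0

# ...but a non-integer exponent (here 1/2) picks up a factor of -1
Phih = sp.exp(sp.I*phi/2)
assert sp.simplify(Phih.subs(phi, phi + 2*sp.pi) + Phih) == 0
print("all checks passed")
```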

## The equation for $\Theta$

Given that we now know the separation constant for the angular equations is either zero or a positive square number $k = m^2$, we can write the equation for $\Theta$ as

$\frac{1}{\sin \theta}\frac{d}{d \theta}\big(\sin \theta \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Expanding the first term we get

$\frac{1}{\sin \theta} \cos \theta \frac{d \Theta}{d \theta} + \frac{d^2 \Theta}{d \theta^2} + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

I am now going to multiply and divide the first two terms by $\sin^2 \theta$ to get

$\sin^2 \theta \big(\frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} \big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

$\iff$

$\sin^2 \theta \big(- \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2} + \frac{2 \cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta}\big) + \big( \lambda - \frac{m^2}{\sin^2 \theta}\big) \Theta = 0$

Now we can make the change of variable $x = \cos \theta$ which implies $dx = - \sin \theta d \theta$ and therefore

$\frac{d \theta}{dx} = - \frac{1}{\sin \theta}$

$\frac{d \Theta}{d x} = \frac{d \Theta}{d \theta} \frac{d \theta}{d x} = - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta}$

$\frac{d^2 \Theta}{d x^2} = \frac{d}{d \theta} \big[ - \frac{1}{\sin \theta} \frac{d \Theta}{d \theta} \big] \frac{d \theta}{d x} = - \frac{\cos \theta}{\sin^3 \theta} \frac{d \Theta}{d \theta} + \frac{1}{\sin^2 \theta} \frac{d^2 \Theta}{d \theta^2}$

Using these in the amended form of the $\Theta$ equation together with the fact that $\sin^2 \theta$ = $1 - x^2$, the $\Theta$ equation becomes

$(1 - x^2) \big(\frac{d^2 \Theta}{d x^2} - \frac{2x}{1 - x^2} \frac{d \Theta}{d x} \big) + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

$\iff$

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

We will solve this equation first for the case $m = 0$ (the solutions will be Legendre polynomials) and use these results to construct solutions for the case $m \neq 0$ (the solutions here will be the associated Legendre functions). Setting $m = 0$ we get

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \lambda \Theta = 0$

which has the form of a well known differential equation known as Legendre’s equation. It can be solved by assuming a series solution of the form

$\Theta = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots + a_n x^n + \cdots$

and then differentiating it term by term twice to get

$\Theta^{\prime} = a_1 + 2 a_2 x + 3 a_3 x^2 + 4 a_4 x^3 + \cdots + n a_n x^{n-1} + \cdots$

and

$\Theta^{\prime \prime} = 2 a_2 + 6 a_3 x + 12 a_4 x^2 + 20 a_5 x^3 + \cdots + n (n - 1) a_n x^{n-2} + \cdots$

We now substitute these into Legendre’s equation and set the coefficient of each power of $x$ equal to zero (because $\Theta$ must satisfy Legendre’s equation identically). We find that the coefficient of the $x^n$ term satisfies

$(n + 2)(n + 1) a_{n + 2} + (\lambda - n(n + 1)) a_n = 0$

which implies

$a_{n + 2} = - \frac{(\lambda - n(n + 1))}{(n + 2)(n + 1)} a_n$

This formula makes it possible to find any even coefficient as a multiple of $a_0$ and any odd coefficient as a multiple of $a_1$. The general solution of our Legendre equation is then a sum of two series involving two arbitrary constants $a_0$ and $a_1$:

$\Theta = a_0 \bigg \{1 - \frac{\lambda}{2!} x^2 + \frac{\lambda (\lambda - 6)}{4!} x^4 - \frac{\lambda(\lambda - 6)(\lambda - 20)}{6!} x^6 + \cdots \bigg \}$

$+ a_1 \bigg \{x - \frac{(\lambda - 2)}{3!} x^3 + \frac{(\lambda - 2)(\lambda - 12)}{5!} x^5 - \frac{(\lambda - 2)(\lambda - 12)(\lambda - 30)}{7!} x^7 + \cdots \bigg \}$

Both of the series in this sum converge for $x^2 < 1$ but in general they do not converge for $x^2 = 1$. This is a problem for us because in our change of variables we set $x = \cos \theta$ and we want solutions that converge for all possible values of $\theta$ including those that result in $x^2 = 1$. It turns out that the only way to get such solutions is to choose integer values of $\lambda$ that make either the $a_0$ or the $a_1$ series in the above sum terminate (the other series will generally be divergent so we remove it by setting the corresponding arbitrary constant equal to zero). This requires $\lambda$ to take values in the quadratic sequence $0$, $2$, $6$, $12$, $20$, $30$, $42$, $56 \ldots$ The $l$-th term of this sequence is $l(l + 1)$, so the separation constant $\lambda$ must be of this form, i.e., $\lambda = l(l + 1)$ for some $l = 0, 1, 2, 3, \dots$. When $l$ takes an even value the $a_0$ series will terminate and we can set $a_1 = 0$ to make the other series vanish. Conversely, when $l$ takes an odd value the $a_1$ series will terminate and we can set $a_0 = 0$ to make the other series vanish.
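The termination condition can be seen directly by iterating the recurrence relation for the coefficients. Below is a small illustrative script (the function name and seeding convention are my own) using exact rational arithmetic: with $\lambda = l(l+1)$ the relevant series terminates, while a generic $\lambda$ produces nonzero coefficients indefinitely.

```python
from fractions import Fraction

def series_coeffs(lam, seed, n_max=12):
    """Coefficients a_n from a_{n+2} = -(lam - n(n+1)) / ((n+2)(n+1)) * a_n,
    seeded with a_0 = 1 (seed=0, even series) or a_1 = 1 (seed=1, odd series)."""
    a = [Fraction(0)] * (n_max + 1)
    a[seed] = Fraction(1)
    for n in range(n_max - 1):
        a[n + 2] = -Fraction(lam - n*(n + 1), (n + 2)*(n + 1)) * a[n]
    return a

# lam = l(l+1) with l = 2: the even series terminates after the x^2 term
coeffs = series_coeffs(lam=2*3, seed=0)
assert coeffs[0] == 1 and coeffs[2] == -3 and all(c == 0 for c in coeffs[3:])

# A generic lam not of the form l(l+1) never terminates
generic = series_coeffs(lam=5, seed=0)
assert all(c != 0 for c in generic[0:12:2])
print("termination check passed")
```

For $l = 2$ the surviving coefficients are $a_0 = 1$, $a_2 = -3$, matching the solution $\Theta = a_0(1 - 3x^2)$ discussed below.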

From the eigenvalue equation for $L^2$ given earlier ($L^2 Y = \lambda \hbar^2 Y$) it is clear that the magnitude of the orbital angular momentum is $L = \sqrt{l(l + 1)} \hbar$. It is interesting to see how the form of this arises mathematically from considering series solutions to Legendre’s equation above. The parameter $l$ is called the orbital angular momentum quantum number.

Note that negative integral values of $l$ are allowed but they simply give solutions already obtained for positive values. For example, $l = -2$ gives $\lambda = 2$ and this makes the $a_1$ series terminate, yielding the polynomial solution

$\Theta = a_1 x$

This is exactly the same solution as the one that would be obtained if $l = 1$. It is therefore customary to restrict $l$ to nonnegative values. Each possible value of $l$ gives a polynomial solution to Legendre’s equation. For $l = 0$ we get $\Theta = a_0$, for $l = 1$ we get $\Theta = a_1 x$, for $l = 2$ we get $\Theta = a_0 - 3 a_0 x^2$, and so on. If the value of $a_0$ or $a_1$ in each polynomial equation is selected so that $\Theta = 1$ when $x = 1$ the resulting polynomials are called Legendre polynomials, denoted by $P_l(x)$. Given that for each $l$ we have $P_l(1) = 1$ the first few Legendre polynomials are

$P_0(x) = 1$

$P_1(x) = x$

$P_2(x) = \frac{1}{2}(3 x^2 - 1)$

$P_3(x) = \frac{1}{2}\big(5 x^3 - 3x \big)$

These are the physically acceptable solutions to Legendre’s equation for $\Theta$ above.
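These properties are easy to verify symbolically. The sketch below (using sympy's built-in legendre, purely as an illustration) confirms that each of the first few $P_l$ is normalised so that $P_l(1) = 1$ and satisfies Legendre's equation with $\lambda = l(l+1)$:

```python
import sympy as sp

x = sp.symbols('x')

# The first few Legendre polynomials, normalised so P_l(1) = 1,
# each satisfy (1 - x^2) P'' - 2x P' + l(l+1) P = 0
for l in range(4):
    P = sp.legendre(l, x)
    assert P.subs(x, 1) == 1
    assert sp.expand((1 - x**2)*P.diff(x, 2) - 2*x*P.diff(x) + l*(l + 1)*P) == 0
print("Legendre checks passed")
```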

We now consider the solutions for $m \neq 0$ of the equation

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big(\lambda - \frac{m^2}{1 - x^2} \big) \Theta = 0$

We now know that $\lambda = l(l + 1)$, so we can write this in, and we can also add the subscript $l$ to $m$ since the solutions to this equation will involve a link between the values of the orbital angular momentum and magnetic quantum numbers. The equation we need to solve becomes

$(1 - x^2) \frac{d^2 \Theta}{d x^2} - 2x \frac{d \Theta}{d x} + \big[l(l + 1) - \frac{m_l^2}{1 - x^2} \big] \Theta = 0$

The link between $l$ and $m_l$ arises from the fact that we are constrained in trying to solve this equation: it encompasses the case $m_l = 0$ for which the physically acceptable solutions are the Legendre polynomials $P_l(x)$. Therefore the physically allowable solutions for the above equation must include the Legendre polynomials as a special case. We can find these by using the series approach again and it turns out that the physically acceptable solutions are the so-called associated Legendre functions which take the form

$P_l^{m_l}(x) = (1 - x^2)^{m_l/2}\frac{d^{m_l}}{d x^{m_l}}P_l(x)$

Now, each Legendre polynomial $P_l(x)$ is a polynomial of degree $l$. Therefore the $m_l$-th order derivative in $P_l^{m_l}$ will equal zero if $|m_l| > l$, so for physically acceptable solutions we must impose the constraint $|m_l| \leq l$ in the differential equation for $\Theta$. This is where the link between the quantum numbers $l$ and $m_l$ comes from in the quantum theory of the hydrogen atom: given a value of $l$ the acceptable values of $m_l$ are integers in the range $-l \leq m_l \leq l$.

Finally, note two things: (1) $P_l^{m_l}(x)$ reduces to the Legendre polynomial $P_l(x)$ when $m_l = 0$, which is what we needed. (2) A negative value for $m_l$ does not change $m_l^2$ in the original differential equation so a solution for positive $m_l$ is also a solution for the corresponding negative $m_l$. Thus many references define the associated Legendre function $P_l^{m_l}(x)$ for $-l \leq m_l \leq l$ as $P_l^{|m_l|}(x)$.
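The vanishing of $P_l^{m_l}$ for $|m_l| > l$ can be seen concretely. The sketch below (my own check, with a helper implementing the formula above via sympy's built-in legendre) verifies that $P_2^1$ is a nonzero solution of the associated Legendre equation while $P_2^3$ is identically zero:

```python
import sympy as sp

x = sp.symbols('x')

def assoc_legendre(l, m, x):
    # P_l^m(x) = (1 - x^2)^(m/2) d^m/dx^m P_l(x), for m >= 0
    return (1 - x**2)**sp.Rational(m, 2) * sp.diff(sp.legendre(l, x), x, m)

# P_2^1 solves (1 - x^2) f'' - 2x f' + (l(l+1) - m^2/(1 - x^2)) f = 0
P21 = assoc_legendre(2, 1, x)
ode = (1 - x**2)*P21.diff(x, 2) - 2*x*P21.diff(x) + (2*3 - 1/(1 - x**2))*P21
assert sp.simplify(ode) == 0

# ...but P_2^3 vanishes identically: the third derivative of a
# degree-2 polynomial is zero. Hence the requirement |m_l| <= l.
assert assoc_legendre(2, 3, x) == 0
print(sp.simplify(P21))   # 3*x*sqrt(1 - x**2)
```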

To conclude, given values for the quantum numbers $l$ and $m_l$, the general solution of the equation for $\Theta$ can be written as

$\Theta(\theta) \propto P_l^{m_l}(\cos \theta)$

To clarify where the principal quantum number comes from, the final equation we need to deal with is the radial equation

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}(E - U) - \frac{\lambda}{r^2} \big] R = 0$

Writing $\lambda = l(l + 1)$ and replacing $U$ with the formula for the potential energy we get

$\frac{1}{r^2} \frac{d}{d r}\big( r^2 \frac{d R}{d r}\big) + \big[ \frac{2m_e}{\hbar^2}\big(\frac{e^2}{4 \pi \epsilon_0 r} + E\big) - \frac{l(l + 1)}{r^2} \big] R = 0$

$\iff$

$\frac{d^2 R}{d r^2} + \frac{2}{r} \frac{d R}{d r} + \frac{2m_e}{\hbar^2} \big[ E + \frac{e^2}{4 \pi \epsilon_0 r} - \frac{l(l + 1) \hbar^2}{2 m_e r^2} \big] R = 0$

We are only interested in solutions for which the electron is bound within the atom, so we take $E < 0$ (the negative energy of the electron is the amount of energy that must be supplied to it to free it from the atom). In order to solve the above equation it is customary to make the change of variable

$\rho = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} r$

and define the dimensionless constant

$\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2}$

If we then regard $R$ as a function of $\rho$ we have

$\frac{d R}{d r} = \frac{d R}{d \rho} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d R}{d \rho}$

$\frac{2}{r} \frac{d R}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{2}{\rho} \frac{d R}{d \rho}$

$\frac{d^2 R}{d r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big)^{1/2} \frac{d^2 R}{d \rho^2} \frac{d \rho}{d r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{d^2 R}{d \rho^2}$

$\frac{2 m_e}{\hbar^2} E + \frac{2 m_e}{\hbar^2} \frac{e^2}{4 \pi \epsilon_0 r} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big \{\frac{1}{4}\frac{e^2}{4 \pi \epsilon_0 r} \big(-\frac{1}{E}\big) - \frac{1}{4}\big \} = \big(-\frac{8 m_e E}{\hbar^2}\big) \big(\frac{\tau}{\rho} - \frac{1}{4}\big)$

$\frac{l(l + 1)}{r^2} = \big(-\frac{8 m_e E}{\hbar^2}\big) \frac{l(l + 1)}{\rho^2}$

Using these results we can rewrite the differential equation as

$\big(-\frac{8 m_e E}{\hbar^2}\big) \bigg \{\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) \bigg \} = 0$

$\iff$

$\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0$

To make further progress we consider the behaviour of this differential equation as $\rho \rightarrow \infty$. It reduces to

$\frac{d^2 R}{d \rho^2} - \frac{1}{4} R = 0$

which is a straightforward linear second-order differential equation with constant coefficients, whose auxiliary equation is

$\zeta^2 - \frac{1}{4} = 0$

$\implies \zeta = \pm \frac{1}{2}$

The positive solution to the auxiliary equation implies a term in the general solution of the form $e^{\rho/2}$ which is unacceptable since it explodes as $\rho \rightarrow \infty$. Therefore we only accept the negative solution to the auxiliary equation and the general solution for $R$ as $\rho \rightarrow \infty$ must be of the form

$R \propto e^{-\rho/2}$

This suggests we can try an exact solution of the full differential equation of the form

$R = e^{-\rho/2} F(\rho)$

Differentiating this twice we get

$\frac{d R}{d \rho} = -\frac{1}{2} e^{-\rho/2} F(\rho) + e^{-\rho/2} F^{\prime} (\rho)$

$\frac{d^2 R}{d \rho^2} = \frac{1}{4} e^{-\rho/2} F(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) - \frac{1}{2} e^{-\rho/2} F^{\prime}(\rho) + e^{-\rho/2} F^{\prime \prime} (\rho)$

Substituting these into the differential equation

$\frac{d^2 R}{d \rho^2} + \frac{2}{\rho} \frac{d R}{d \rho} + \big[\frac{\tau}{\rho} - \frac{1}{4} - \frac{l(l + 1)}{\rho^2}\big] R(\rho) = 0$

gives

$F^{\prime \prime}(\rho) + \frac{(2 - \rho)}{\rho} F^{\prime}(\rho) + \big[\frac{(\tau - 1)}{\rho} - \frac{l(l + 1)}{\rho^2}\big] F(\rho) = 0$

$\iff$

$\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0$

We can now try to solve this latest version of the differential equation by the method of Frobenius, which involves assuming a generalised power series solution of the form

$F(\rho) = a_0 \rho^s + a_1 \rho^{s + 1} + a_2 \rho^{s + 2} + \cdots$

Differentiating twice we get

$F^{\prime}(\rho) = s a_0 \rho^{s-1} + (s + 1) a_1 \rho^s + (s + 2) a_2 \rho^{s + 1} + \cdots$

$F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^{s-2} + s (s + 1) a_1 \rho^{s-1} + (s + 1) (s + 2) a_2 \rho^s + \cdots$

Then the terms appearing in the differential equation have the generalised power series forms

$\rho^2 F^{\prime \prime}(\rho) = (s - 1) s a_0 \rho^s + s (s + 1) a_1 \rho^{s+1} + (s + 1) (s + 2) a_2 \rho^{s+2} + \cdots$

$2 \rho F^{\prime}(\rho) = 2 s a_0 \rho^s + 2 (s + 1) a_1 \rho^{s+1} + 2 (s + 2) a_2 \rho^{s + 2} + \cdots$

$-\rho^2 F^{\prime}(\rho) = - s a_0 \rho^{s+1} - (s + 1) a_1 \rho^{s+2} - (s + 2) a_2 \rho^{s + 3} - \cdots$

$(\tau - 1) \rho F(\rho) = (\tau - 1) a_0 \rho^{s+1} + (\tau - 1) a_1 \rho^{s + 2} + (\tau - 1) a_2 \rho^{s + 3} + \cdots$

$-l(l + 1) F(\rho) = -l(l + 1) a_0 \rho^s - l(l + 1) a_1 \rho^{s + 1} - l(l + 1) a_2 \rho^{s + 2} - \cdots$

Summing these terms (remembering that the sum must be identically equal to zero) we find the coefficient of $\rho^s$ to be

$[s(s - 1) + 2s - l(l+1)]a_0 = 0$

$\implies$

$s(s + 1) - l(l+1) = 0$

$\implies$

$s = l$ or $s = -l - 1$

Now, when $s = -l - 1$ the first term of the power series for $F(\rho)$ is $a_0/\rho^{l+1}$ which explodes as $\rho \rightarrow 0$. This is unacceptable so we discard this solution and set $s = l$.

For the coefficient of $\rho^{s + n}$ we get

$[(s+n)(s+n-1) + 2(s+n) - l(l+1)]a_n + [(\tau - 1) - (s+n-1)]a_{n-1} = 0$

Setting $s = l$ and rearranging gives us the recurrence equation

$a_n = \frac{(l + n - \tau)}{(l+n+1)(l+n) - l(l+1)} a_{n-1}$

From this recurrence equation we observe that

$a_n \rightarrow \frac{1}{n}a_{n-1}$

as $n \rightarrow \infty$, so that $a_n \sim \frac{1}{n!}a_0$ for large $n$. We deduce from this that the series for $F(\rho)$ behaves like $a_0 \rho^l \sum \frac{\rho^n}{n!}$ and therefore $R = e^{-\rho/2} F(\rho)$ behaves like $a_0 \rho^l e^{\rho/2}$. However, this diverges as $\rho \rightarrow \infty$, which is unacceptable, so we conclude that the series for $F(\rho)$ must terminate at some value of $n$, which we will call $N$. In this case we have $a_{N+1} = 0$, which the recurrence equation tells us can only happen if

$\tau = l + N + 1 \equiv \tilde{n}$
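This termination is easy to confirm by iterating the recurrence equation directly. A small stdlib sketch (the particular values of $l$ and $N$ are arbitrary choices for illustration):

```python
# Iterate a_n = (l + n - tau) / [(l+n+1)(l+n) - l(l+1)] * a_{n-1}.
# With tau = l + N + 1 the coefficient a_{N+1}, and hence every later
# coefficient, vanishes exactly, terminating the series.

def coeffs(l, tau, nmax, a0=1.0):
    a = [a0]
    for n in range(1, nmax + 1):
        denom = (l + n + 1) * (l + n) - l * (l + 1)
        a.append((l + n - tau) / denom * a[-1])
    return a

l, N = 2, 3
a = coeffs(l, tau=l + N + 1, nmax=N + 2)
assert a[N] != 0 and a[N + 1] == 0 and a[N + 2] == 0
```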

This is how the principal quantum number $\tilde{n}$ first appears. Now, we have

$\tau = \frac{e^2}{4 \pi \epsilon_0 \hbar} \big(-\frac{m_e}{2 E} \big)^{1/2} = \tilde{n}$

$\iff$

$\big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \big(-\frac{m_e}{2 \hbar^2}\big) \frac{1}{E} = \tilde{n}^2$

$\iff E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}$

These are the famous bound-state energy eigenvalues for $\tilde{n} = 1, 2, \ldots$. This is the same formula for the energy levels of the hydrogen atom that Niels Bohr obtained by intuitive means in his 1913 ‘solar system’ model of atomic structure.
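As a numerical sanity check (not part of the original derivation), the formula can be evaluated with CODATA values of the constants; $E_1$ should come out near the familiar $-13.6$ eV:

```python
# Evaluate E_n = -(m_e / 2 hbar^2) (e^2 / 4 pi eps0)^2 / n^2 in SI units.
import math

hbar = 1.054571817e-34   # J s
m_e  = 9.1093837015e-31  # kg
e    = 1.602176634e-19   # C
eps0 = 8.8541878128e-12  # F/m

def E(n):
    """Bound-state energy E_n in joules."""
    return -(m_e / (2 * hbar**2)) * (e**2 / (4 * math.pi * eps0))**2 / n**2

E1_eV = E(1) / e                      # convert to electron-volts
assert abs(E1_eV + 13.6057) < 1e-3    # ground state near -13.6 eV
assert abs(E(2) / E(1) - 0.25) < 1e-12  # levels scale as 1/n^2
```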

As stated above, the integer $\tilde{n}$ is called the principal quantum number. Recall that $\tilde{n} = l + N + 1$ and $N$ cannot be smaller than zero. It follows that

$\tilde{n} - l - 1 \geq 0$

$\iff$

$l \leq \tilde{n} - 1$

This explains why for a given value of $\tilde{n}$ the allowable values of $l$ are $l = 0, 1, 2, \dots, (\tilde{n}-1)$.
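A short sketch makes the resulting state counting explicit: combining $l \leq \tilde{n} - 1$ with $-l \leq m_l \leq l$ gives $\tilde{n}^2$ admissible $(l, m_l)$ pairs for each $\tilde{n}$ (ignoring spin):

```python
# Enumerate the valid (l, m_l) pairs for a given principal quantum number.
def quantum_numbers(n):
    return [(l, m) for l in range(n) for m in range(-l, l + 1)]

# n = 2: l = 0 gives one state, l = 1 gives three, so 4 = n^2 states
assert len(quantum_numbers(2)) == 4
# in general, sum of (2l + 1) over l = 0 .. n-1 is n^2
assert all(len(quantum_numbers(n)) == n * n for n in range(1, 6))
```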

Returning to the solution of

$\rho^2 F^{\prime \prime}(\rho) + \rho (2 - \rho) F^{\prime}(\rho) + \big[\rho (\tau - 1) - l(l + 1)\big] F(\rho) = 0$

the above discussion suggests that we should look for a solution of the form

$F(\rho) = a_0 \rho^l L(\rho)$

where $L(\rho)$ is a polynomial (rather than an infinite series). Differentiating this twice gives

$F^{\prime}(\rho) = a_0 l \rho^{l-1}L(\rho) + a_0 \rho^l L^{\prime}(\rho)$

$F^{\prime \prime} (\rho) = a_0 (l-1) l \rho^{l-2}L(\rho) + 2 a_0 l \rho^{l-1} L^{\prime}(\rho) + a_0 \rho^l L^{\prime \prime}(\rho)$

Substituting these into the differential equation and setting $\tau = \tilde{n}$ we get

$\rho^{l+2} L^{\prime \prime}(\rho) + (2l + 2 - \rho) \rho^{l+1}L^{\prime}(\rho) + (\tilde{n} - 1 - l)\rho^{l+1}L(\rho) = 0$

$\iff$

$\rho L^{\prime \prime}(\rho) + (2l + 2 - \rho) L^{\prime}(\rho) + (\tilde{n} - 1 - l) L(\rho) = 0$

$\iff$

$\rho L^{\prime \prime}(\rho) + (\alpha + 1 - \rho) L^{\prime}(\rho) + \tilde{n}^{*} L(\rho) = 0$

where $\alpha \equiv 2l + 1$ and $\tilde{n}^{*} \equiv \tilde{n} - 1 - l$. This last form is a well-known differential equation (the associated Laguerre equation) whose physically acceptable solutions in the present context are the associated Laguerre polynomials given by the formula

$L_{\tilde{n}^{*}}^{(\alpha)}(\rho) = \sum_{j = 0}^{\tilde{n}^{*}} (-1)^j \frac{(\tilde{n}^{*} + \alpha)!}{(\tilde{n}^{*}-j)!(\alpha + j)!}\frac{\rho^j}{j!}$
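As a check (a pure-Python sketch; the helper names are my own), the polynomial built from this sum does satisfy the differential equation $\rho L^{\prime\prime} + (\alpha + 1 - \rho)L^{\prime} + \tilde{n}^{*}L = 0$:

```python
# Build the coefficients of the associated Laguerre polynomial from the sum
# above and verify it solves  rho L'' + (alpha + 1 - rho) L' + n* L = 0.
from math import factorial

def assoc_laguerre_coeffs(nstar, alpha):
    """Coefficients (lowest degree first) of L_{n*}^{(alpha)}(rho)."""
    return [(-1)**j * factorial(nstar + alpha)
            // (factorial(nstar - j) * factorial(alpha + j)) / factorial(j)
            for j in range(nstar + 1)]

def polyval(c, x):
    return sum(ck * x**k for k, ck in enumerate(c))

def polyder(c):
    return [k * ck for k, ck in enumerate(c)][1:]

nstar, alpha = 3, 5              # e.g. l = 2 and principal quantum number 6
L  = assoc_laguerre_coeffs(nstar, alpha)
L1 = polyder(L)
L2 = polyder(L1)
# check the ODE residual at a few sample points
for rho in (0.3, 1.7, 4.2):
    residual = (rho * polyval(L2, rho)
                + (alpha + 1 - rho) * polyval(L1, rho)
                + nstar * polyval(L, rho))
    assert abs(residual) < 1e-9
```

For instance the formula gives $L_1^{(0)}(\rho) = 1 - \rho$, the familiar first Laguerre polynomial.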

For given quantum numbers $\tilde{n}$ and $l$, the solution of the radial equation for $R$ is then

$R_{\tilde{n}l}(r) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho)$

**Final form of the electron wave function $\psi$**

Putting everything together, for given principal quantum number $\tilde{n}$, orbital quantum number $l$ and magnetic quantum number $m_l$, the wave function of the electron in the hydrogen atom is

$\psi_{\tilde{n} l m_l}(r, \theta, \phi) \propto e^{-\rho/2}\rho^l L_{\tilde{n} - l - 1}^{(2l + 1)}(\rho) P_l^{m_l}(\cos \theta) e^{i m_l \phi}$

where

$\rho = \big(-\frac{8 m_e E_{\tilde{n}}}{\hbar^2}\big)^{1/2} r$

and

$E_{\tilde{n}} = \big(-\frac{m_e}{2 \hbar^2}\big) \big(\frac{e^2}{4 \pi \epsilon_0}\big)^2 \frac{1}{\tilde{n}^2}$

# Changing variables in the square of the quantum mechanical angular momentum operator

A particle of mass $m$ with position vector $\mathbf{r}$ and velocity $\mathbf{v}$ (with respect to some specified origin) has a linear momentum vector $\mathbf{p} = m \mathbf{v}$ and angular momentum vector

$\mathbf{L} = \mathbf{r} \times \mathbf{p} = m \mathbf{r} \times \mathbf{v}$

where $\times$ is the vector product operation. The magnitude of the angular momentum vector is $L = rmv\sin \theta$ where $\theta$ is the angle between $\mathbf{r}$ and $\mathbf{v}$. The direction of $\mathbf{L}$ is given by the right-hand rule when the vectors $\mathbf{r}$ and $\mathbf{v}$ are placed with their tails at the same point: one curls the fingers of the right hand in the direction of rotation of $\mathbf{r}$ into $\mathbf{v}$ and the thumb then points in the direction of $\mathbf{L}$. One can find the components of $\mathbf{L}$ in the $x$, $y$ and $z$ directions in Cartesian coordinates using

$\mathbf{L} = \mathbf{r} \times \mathbf{p} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \ \\x & y & z \\ \ \\p_x & p_y & p_z \end{vmatrix}$

$= (yp_z - zp_y)\mathbf{i} + (zp_x - xp_z)\mathbf{j} + (xp_y - yp_x)\mathbf{k}$

Therefore the components of $\mathbf{L}$ in Cartesian coordinates are

$L_x = yp_z - zp_y$

$L_y = zp_x - xp_z$

$L_z = xp_y - yp_x$

In classical mechanics the angular momentum magnitude $L$ can take a continuum of values but in quantum mechanics only certain discrete values for $L$ are permissible. Furthermore, the linear momentum vector $\mathbf{p}$ appearing in $\mathbf{L}$ must obey Heisenberg’s uncertainty principle in each direction, i.e.,

$\Delta x \Delta p_x \geq \frac{\hbar}{2}$

in the $x$-direction and similarly for the $y$ and $z$ directions. These features of quantized variables like $\mathbf{p}$ and $\mathbf{L}$ make it necessary to do calculations with them in quantum mechanics using their operator representations. (For example, on the quantum scale one cannot calculate the expectation of momentum in the $x$-direction using an integral involving momentum as a function of $x$ because no such function can exist: Heisenberg’s uncertainty principle prevents accurate knowledge of momentum when the value of $x$ is known exactly. One must therefore use the operator representation of momentum rather than momentum as some function of $x$ in order to calculate the expectation.) It is a basic postulate of quantum mechanics that every observable quantity characterising a physical system can be represented by a quantum mechanical operator obtained by expressing the quantity in terms of $\mathbf{r}$ and $\mathbf{p}$ and then replacing the vector $\mathbf{p}$ by $- i \hbar \nabla$ where

$\nabla = \mathbf{i} \frac{\partial}{\partial x} + \mathbf{j} \frac{\partial}{\partial y} + \mathbf{k} \frac{\partial}{\partial z}$

and its components $p_x$, $p_y$ and $p_z$ by

$p_x = - i \hbar \frac{\partial}{\partial x}$

$p_y = - i \hbar \frac{\partial}{\partial y}$

$p_z = - i \hbar \frac{\partial}{\partial z}$

Taking this on board, we can then write $\mathbf{L}$ and its components $L_x$, $L_y$ and $L_z$ in quantum mechanical operator form in Cartesian coordinates as

$\mathbf{L} = - i \hbar (\mathbf{r} \times \nabla)$

$L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)$

$L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)$

$L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)$

In order to perform some calculations involving Schrödinger’s equation I needed to employ the square of the quantum mechanical angular momentum operator $L^2$, but in spherical polar coordinates rather than Cartesian coordinates, where

$L^2 = L_x^2 + L_y^2 + L_z^2$

I used the matrix calculus approach in my previous note to achieve the necessary change of variables in $L^2$. In the present note I want to record the details of this calculation as I have never seen this approach used elsewhere. (This change of variables can also be done using a scale factor method based on vector calculus which I will not go into here).

As in my previous note we begin the matrix calculus approach with the standard conversion equations for spherical polar coordinates:

$x = r \sin \theta \cos \phi$

$y = r \sin \theta \sin \phi$

$z = r \cos \theta$

Differentiating with respect to the vector $(r, \theta, \phi)$ we get

$\begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix}$

We can then use the matrix version of the chain rule to write

$\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}$

We can solve this system by inverting the coefficient matrix to get

$\begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \ \\ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \ \\ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

Using the equations in this system together with the standard conversion equations we then have

$y \frac{\partial F}{\partial z} = r \sin \theta \sin \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \sin \phi \frac{\partial F}{\partial r} - \sin^2 \theta \sin \phi \frac{\partial F}{\partial \theta}$

and

$z \frac{\partial F}{\partial y} = r \cos \theta \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial r} + \cos^2 \theta \sin \phi \frac{\partial F}{\partial \theta} + \frac{\cos \theta \cos \phi}{\sin \theta}\frac{\partial F}{\partial \phi}$

Subtracting the second expression from the first, and dropping the arbitrary function $F$ from the partial derivatives, we can then write the angular momentum operator in the $x$-direction in terms of spherical polar coordinates as

$L_x = - i \hbar \big( y \frac{\partial}{\partial z} - z \frac{\partial}{\partial y} \big)$

$= - i \hbar \big( - \cot \theta \cos \phi \frac{\partial}{\partial \phi} - \sin \phi \frac{\partial}{\partial \theta} \big)$

$= i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big)$

Similarly we have

$z \frac{\partial F}{\partial x} = r \cos \theta \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial r} + \cos^2 \theta \cos \phi \frac{\partial F}{\partial \theta} - \frac{\cos \theta \sin \phi}{\sin \theta}\frac{\partial F}{\partial \phi}$

and

$x \frac{\partial F}{\partial z} = r \sin \theta \cos \phi \big( \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta} \big) = r \sin \theta \cos \theta \cos \phi \frac{\partial F}{\partial r} - \sin^2 \theta \cos \phi \frac{\partial F}{\partial \theta}$

Therefore

$L_y = - i \hbar \big( z \frac{\partial}{\partial x} - x \frac{\partial}{\partial z} \big)$

$= - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)$

Finally, we have

$x \frac{\partial F}{\partial y} = r \sin \theta \cos \phi \big( \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} + \cos^2 \phi \frac{\partial F}{\partial \phi}$

and

$y \frac{\partial F}{\partial x} = r \sin \theta \sin \phi \big( \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}\big)$

$= r \sin^2 \theta \cos \phi \sin \phi \frac{\partial F}{\partial r} + \sin \theta \cos \theta \sin \phi \cos \phi \frac{\partial F}{\partial \theta} - \sin^2 \phi \frac{\partial F}{\partial \phi}$

Therefore

$L_z = - i \hbar \big( x \frac{\partial}{\partial y} - y \frac{\partial}{\partial x} \big)$

$= - i \hbar \frac{\partial}{\partial \phi}$
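This result for $L_z$ can be sanity-checked with finite differences: for a smooth test function, $(x\,\partial_y - y\,\partial_x)f$ computed in Cartesian coordinates should agree with $\partial f/\partial \phi$ computed on the spherical form of $f$ (a stdlib sketch; the test function and sample point are arbitrary choices):

```python
# Finite-difference check that (x d/dy - y d/dx) f = d/dphi of f in
# spherical polar coordinates (this is L_z up to the factor -i hbar).
import math

def f(x, y, z):          # arbitrary smooth test function
    return x * x * y - y * z * z + x * y * z

def d(g, t, h=1e-6):     # central difference of a one-argument function
    return (g(t + h) - g(t - h)) / (2 * h)

r, th, ph = 1.3, 0.8, 0.5
x = r * math.sin(th) * math.cos(ph)
y = r * math.sin(th) * math.sin(ph)
z = r * math.cos(th)

lhs = x * d(lambda yy: f(x, yy, z), y) - y * d(lambda xx: f(xx, y, z), x)
rhs = d(lambda pp: f(r * math.sin(th) * math.cos(pp),
                     r * math.sin(th) * math.sin(pp),
                     r * math.cos(th)), ph)
assert abs(lhs - rhs) < 1e-6
```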

Having obtained the components of $\mathbf{L}$ in spherical polar coordinates we can now finally calculate the operator representation of $L^2$ as

$L^2 = L_x^2 + L_y^2 + L_z^2$

$= \big[ i \hbar \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} + \sin \phi \frac{\partial}{\partial \theta} \big) \big]^2 + \big[ - i \hbar \big(\cos \phi \frac{\partial}{\partial \theta} - \cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) \big]^2 + \big[ - i \hbar \frac{\partial}{\partial \phi} \big]^2$

$= - \hbar^2 \big[ \sin^2 \phi \frac{\partial^2}{\partial \theta^2} + \big(\sin \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big) + \big(\cot \theta \cos \phi \frac{\partial}{\partial \phi} \big)\big(\sin \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \cos^2 \phi \frac{\partial^2}{\partial \phi^2} - \cot^2 \theta \cos \phi \sin \phi \frac{\partial}{\partial \phi} \big]$

$- \hbar^2 \big[ \cos^2 \phi \frac{\partial^2}{\partial \theta^2} - \big(\cos \phi \frac{\partial}{\partial \theta} \big) \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big) - \big(\cot \theta \sin \phi \frac{\partial}{\partial \phi} \big)\big(\cos \phi \frac{\partial}{\partial \theta} \big) + \cot^2 \theta \sin^2 \phi \frac{\partial^2}{\partial \phi^2} + \cot^2 \theta \sin \phi \cos \phi \frac{\partial}{\partial \phi} \big]$

$- \hbar^2 \frac{\partial^2}{\partial \phi^2}$

(note that the first-order $\cot^2 \theta \sin \phi \cos \phi \frac{\partial}{\partial \phi}$ terms produced by the squared operators cancel between the two brackets, as do the mixed $\frac{\partial^2}{\partial \theta \partial \phi}$ terms arising from the cross products)

$= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2} - \sin \phi \frac{1}{\sin^2 \theta} \cos \phi \frac{\partial}{\partial \phi} + \cot \theta \cos^2 \phi \frac{\partial}{\partial \theta}$

$+ \cos \phi \frac{1}{\sin^2 \theta} \sin \phi \frac{\partial}{\partial \phi} + \cot \theta \sin^2 \phi \frac{\partial}{\partial \theta}\big)$

$= - \hbar^2 \big( \frac{\partial^2}{\partial \theta^2} + \cot \theta \frac{\partial}{\partial \theta} + \cot^2 \theta \frac{\partial^2}{\partial \phi^2} + \frac{\partial^2}{\partial \phi^2}\big)$

$= - \hbar^2 \big( \frac{1}{\sin \theta}\frac{\partial}{\partial \theta} \big( \sin \theta \frac{\partial}{\partial \theta}\big) + \frac{1}{\sin^2 \theta}\frac{\partial^2}{\partial \phi^2} \big)$
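A quick numerical check of this final formula (with $\hbar = 1$): acting on $\cos \theta$, which is proportional to the spherical harmonic $Y_1^0$, the operator should return $l(l+1) = 2$ times the function (a finite-difference sketch; the helper names are my own):

```python
# Finite-difference implementation of
#   L^2 = -[ (1/sin th) d/dth (sin th d/dth) + (1/sin^2 th) d^2/dph^2 ]
# (hbar = 1), checked on the eigenfunction cos(theta) with eigenvalue 2.
import math

def L2(f, th, ph, h=1e-5):
    def g(t):  # sin(t) * df/dtheta evaluated at t
        return math.sin(t) * (f(t + h, ph) - f(t - h, ph)) / (2 * h)
    term1 = (g(th + h) - g(th - h)) / (2 * h) / math.sin(th)
    term2 = (f(th, ph + h) - 2 * f(th, ph) + f(th, ph - h)) / h**2 \
            / math.sin(th)**2
    return -(term1 + term2)

f = lambda th, ph: math.cos(th)       # proportional to Y_1^0
th, ph = 0.9, 0.4
assert abs(L2(f, th, ph) - 2 * math.cos(th)) < 1e-4
```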

# A matrix calculus approach to changing variables in Laplace’s equation

In three-dimensional rectangular coordinates, the partial differential equation known as Laplace’s equation takes the form

$\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} = 0$

This equation is applicable to a wide range of problems in physics but it is often necessary to make a change of variables from rectangular to spherical polar coordinates in order to better match the spherical symmetry of particular contexts. Until now I had only ever worked out this change of variables using a scale factor method based on the formula for the Laplacian of a function $F$ in general orthogonal curvilinear coordinates:

$\nabla^2 F = \frac{1}{h_1 h_2 h_3}\bigg[ \frac{\partial }{\partial x_1}\big( \frac{h_2 h_3}{h_1}\frac{\partial F}{\partial x_1} \big) + \frac{\partial }{\partial x_2}\big( \frac{h_1 h_3}{h_2}\frac{\partial F}{\partial x_2} \big) + \frac{\partial }{\partial x_3}\big( \frac{h_1 h_2}{h_3}\frac{\partial F}{\partial x_3} \big) \bigg]$

This formula is derived using basic results from vector calculus and allows Laplace’s equation to be easily expressed in any orthogonal coordinate system once the form of the Euclidean metric is known in that coordinate system. (Orthogonal coordinate systems are those for which the coordinate basis vectors are always orthogonal at each point). The Euclidean metric in general orthogonal coordinates $x_1$, $x_2$, $x_3$ will take the form

$ds^2 = h_1^2 dx_1^2 + h_2^2 dx_2^2 + h_3^2 dx_3^2$

The coefficients $h_1$, $h_2$ and $h_3$ are the scale factors appearing in the Laplacian formula. In the metric, these convert the coordinate differentials $dx_i$ into lengths $h_i dx_i$.

In the case of rectangular coordinates we have $x_1 = x$, $x_2 = y$, $x_3 = z$ and $h_1 = h_2 = h_3 = 1$ so the Euclidean metric and the Laplacian formula reduce to the familiar forms

$ds^2 = dx^2 + dy^2 + dz^2$

and

$\nabla^2 F = \frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2}$

The scale factors are all equal to 1 in this case since the coordinate differentials are already in units of length. In the case of spherical polar coordinates (an orthogonal coordinate system) we can find the scale factors and hence the Laplacian by using the standard conversion equations

$x = r \sin \theta \cos \phi$

$y = r \sin \theta \sin \phi$

$z = r \cos \theta$

Taking the differentials of these, squaring them and adding them we find that the Euclidean metric in spherical polar coordinates takes the form

$ds^2 = dr^2 + r^2 d \theta^2 + r^2 \sin^2 \theta d \phi^2$

The scale factors in this case are therefore $h_1 = 1$, $h_2 = r$ and $h_3 = r \sin \theta$. Putting these into the Laplacian formula we immediately get

$\nabla^2 F = \frac{1}{r^2 \sin \theta} \bigg[ \frac{\partial }{\partial r}\big( \frac{r^2 \sin \theta}{1}\frac{\partial F}{\partial r} \big) + \frac{\partial }{\partial \theta}\big( \frac{r \sin \theta}{r}\frac{\partial F}{\partial \theta} \big) + \frac{\partial }{\partial \phi}\big( \frac{r}{r \sin \theta}\frac{\partial F}{\partial \phi} \big) \bigg]$

$= \frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2}$

Therefore Laplace’s equation in spherical polar coordinates is

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2} = 0$
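The two forms of the Laplacian can be compared numerically with finite differences: for a smooth test function they should agree at any point away from the polar axis (a stdlib sketch; the test function and sample point are arbitrary choices):

```python
# Compare the Cartesian Laplacian with the spherical polar form
#   (1/r^2) d/dr (r^2 dF/dr) + (1/(r^2 sin th)) d/dth (sin th dF/dth)
#   + (1/(r^2 sin^2 th)) d^2F/dph^2
import math

def f_cart(x, y, z):
    return x * y + z * z * x      # arbitrary smooth test function

def lap_cart(x, y, z, h=1e-4):
    d2 = lambda g, t: (g(t + h) - 2 * g(t) + g(t - h)) / h**2
    return (d2(lambda u: f_cart(u, y, z), x)
            + d2(lambda u: f_cart(x, u, z), y)
            + d2(lambda u: f_cart(x, y, u), z))

def f_sph(r, th, ph):
    return f_cart(r * math.sin(th) * math.cos(ph),
                  r * math.sin(th) * math.sin(ph),
                  r * math.cos(th))

def lap_sph(r, th, ph, h=1e-4):
    dr  = lambda rr: rr * rr * (f_sph(rr + h, th, ph) - f_sph(rr - h, th, ph)) / (2 * h)
    dth = lambda tt: math.sin(tt) * (f_sph(r, tt + h, ph) - f_sph(r, tt - h, ph)) / (2 * h)
    d2ph = (f_sph(r, th, ph + h) - 2 * f_sph(r, th, ph) + f_sph(r, th, ph - h)) / h**2
    return ((dr(r + h) - dr(r - h)) / (2 * h) / r**2
            + (dth(th + h) - dth(th - h)) / (2 * h) / (r * r * math.sin(th))
            + d2ph / (r * math.sin(th))**2)

r, th, ph = 1.2, 0.7, 0.3
x = r * math.sin(th) * math.cos(ph)
y = r * math.sin(th) * math.sin(ph)
z = r * math.cos(th)
assert abs(lap_cart(x, y, z) - lap_sph(r, th, ph)) < 1e-4
```

(For this particular test function the exact Laplacian is $2x$, which both forms reproduce.)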

I recently had occasion to use Laplace’s equation in spherical polar coordinates again and this time I decided to justify the change of variables to myself using a matrix calculus approach instead of the above scale factor method. This turned out to be a laborious but interesting process. In the present note I want to record my calculations using the matrix calculus approach in detail as I have never seen this approach used elsewhere. The advantage of the matrix calculus approach is that it works for any coordinate system, irrespective of whether or not it is orthogonal.

In the matrix calculus approach we use the standard conversion equations for spherical polar coordinates

$x = r \sin \theta \cos \phi$

$y = r \sin \theta \sin \phi$

$z = r \cos \theta$

to work out the coefficient matrix in the system

$\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial y}{\partial r} & \frac{\partial z}{\partial r}\\ \ \\ \frac{\partial x}{\partial \theta} & \frac{\partial y}{\partial \theta} & \frac{\partial z}{\partial \theta}\\ \ \\ \frac{\partial x}{\partial \phi} & \frac{\partial y}{\partial \phi} & \frac{\partial z}{\partial \phi}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}$

$\big($which of course is equivalent to the set of equations

$\frac{\partial F}{\partial r} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial r} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial r}$

$\frac{\partial F}{\partial \theta} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial \theta} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial \theta}$

$\frac{\partial F}{\partial \phi} = \frac{\partial F}{\partial x} \frac{\partial x}{\partial \phi} + \frac{\partial F}{\partial y} \frac{\partial y}{\partial \phi} + \frac{\partial F}{\partial z} \frac{\partial z}{\partial \phi} \big)$

We get the system

$\begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z} \end{bmatrix}$

We can solve this system either by inverting the coefficient matrix or by using Cramer’s rule. Using Cramer’s rule we get

$\frac{\partial F}{\partial x} = \frac{\begin{vmatrix} \frac{\partial F}{\partial r} & \sin \theta \sin \phi & \cos \theta\\ \ \\ \frac{\partial F}{\partial \theta} & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ \frac{\partial F}{\partial \phi} & r \sin \theta \cos \phi & 0\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \sin^2 \theta \cos \phi \frac{\partial F}{\partial r} + r \cos \theta \sin \theta \cos \phi \frac{\partial F}{\partial \theta} - r \sin \phi \frac{\partial F}{\partial \phi} }{r^2 \sin \theta}$
$= \sin \theta \cos \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \cos \phi}{r} \frac{\partial F}{\partial \theta} - \frac{\sin \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}$

and similarly

$\frac{\partial F}{\partial y} = \frac{\begin{vmatrix} \sin \theta \cos \phi & \frac{\partial F}{\partial r} & \cos \theta\\ \ \\ r \cos \theta \cos \phi & \frac{\partial F}{\partial \theta} & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & \frac{\partial F}{\partial \phi} & 0\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \sin^2 \theta \sin \phi \frac{\partial F}{\partial r} + r \cos \theta \sin \theta \sin \phi \frac{\partial F}{\partial \theta} + r \cos \phi \frac{\partial F}{\partial \phi} }{r^2 \sin \theta}$
$= \sin \theta \sin \phi \frac{\partial F}{\partial r} + \frac{\cos \theta \sin \phi}{r} \frac{\partial F}{\partial \theta} + \frac{\cos \phi}{r \sin \theta} \frac{\partial F}{\partial \phi}$

and

$\frac{\partial F}{\partial z} = \frac{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \frac{\partial F}{\partial r}\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & \frac{\partial F}{\partial \theta}\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & \frac{\partial F}{\partial \phi}\end{vmatrix}}{\begin{vmatrix} \sin \theta \cos \phi & \sin \theta \sin \phi & \cos \theta\\ \ \\ r \cos \theta \cos \phi & r \cos \theta \sin \phi & - r \sin \theta\\ \ \\ - r \sin \theta \sin \phi & r \sin \theta \cos \phi & 0\end{vmatrix}} = \frac{r^2 \cos \theta \sin \theta \frac{\partial F}{\partial r} - r \sin^2 \theta \frac{\partial F}{\partial \theta}}{r^2 \sin \theta}$
$= \cos \theta \frac{\partial F}{\partial r} - \frac{\sin \theta}{r} \frac{\partial F}{\partial \theta}$

We can write these solutions more concisely in matrix form as

$\begin{bmatrix} \frac{\partial F}{\partial x}\\ \ \\ \frac{\partial F}{\partial y} \\ \ \\ \frac{\partial F}{\partial z}\end{bmatrix} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\\ \ \\ \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\\ \ \\ \cos \theta & -\frac{\sin \theta}{r} & 0\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

The coefficient matrix here is, of course, the inverse of the coefficient matrix in the original system.
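This inverse relationship is easy to verify numerically at a sample point (a stdlib sketch; the sample values are arbitrary):

```python
# Check that the inverse coefficient matrix times the original Jacobian-style
# coefficient matrix gives the 3x3 identity at a sample point.
import math

r, th, ph = 1.4, 0.6, 1.1
J = [[math.sin(th) * math.cos(ph),      math.sin(th) * math.sin(ph),      math.cos(th)],
     [r * math.cos(th) * math.cos(ph),  r * math.cos(th) * math.sin(ph), -r * math.sin(th)],
     [-r * math.sin(th) * math.sin(ph), r * math.sin(th) * math.cos(ph),  0.0]]
Jinv = [[math.sin(th) * math.cos(ph), math.cos(th) * math.cos(ph) / r, -math.sin(ph) / (r * math.sin(th))],
        [math.sin(th) * math.sin(ph), math.cos(th) * math.sin(ph) / r,  math.cos(ph) / (r * math.sin(th))],
        [math.cos(th), -math.sin(th) / r, 0.0]]

# Jinv @ J should be the identity matrix
prod = [[sum(Jinv[i][k] * J[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
for i in range(3):
    for j in range(3):
        assert abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```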

To find the second-order partials $\frac{\partial^2F}{\partial x^2}$, $\frac{\partial^2F}{\partial y^2}$ and $\frac{\partial^2F}{\partial z^2}$, let $G \equiv \frac{\partial F}{\partial x}$, $H \equiv \frac{\partial F}{\partial y}$ and $I \equiv \frac{\partial F}{\partial z}$. Then we need to find $\frac{\partial G}{\partial x}$, $\frac{\partial H}{\partial y}$ and $\frac{\partial I}{\partial z}$. We can use the first, second and third equations respectively in the above matrix system to find these, with $F$ replaced by $G$, $H$ and $I$ respectively. Thus,

$\frac{\partial G}{\partial x} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial G}{\partial r}\\ \ \\ \frac{\partial G}{\partial \theta} \\ \ \\ \frac{\partial G}{\partial \phi} \end{bmatrix}$

$\frac{\partial H}{\partial y} = \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial H}{\partial r}\\ \ \\ \frac{\partial H}{\partial \theta} \\ \ \\ \frac{\partial H}{\partial \phi} \end{bmatrix}$

$\frac{\partial I}{\partial z} = \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix} \begin{bmatrix} \frac{\partial I}{\partial r}\\ \ \\ \frac{\partial I}{\partial \theta} \end{bmatrix}$

But differentiating

$G \equiv \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

with respect to $r$, $\theta$ and $\phi$ we get

$\begin{bmatrix} \frac{\partial G}{\partial r}\\ \ \\ \frac{\partial G}{\partial \theta} \\ \ \\ \frac{\partial G}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \cos \phi\\ \ \\ \frac{\cos \theta \cos \phi}{r} \\ \ \\ -\frac{\sin \phi}{r \sin \theta} \end{bmatrix} + \begin{bmatrix} 0 & -\frac{\cos \theta \cos \phi}{r^2} & \frac{\sin \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \cos \phi & -\frac{\sin \theta \cos \phi}{r} & \frac{\sin \phi \cos \theta}{r \sin^2 \theta}\\ \ \\ -\sin \theta \sin \phi & -\frac{\cos \theta \sin \phi}{r} & -\frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

Similarly, differentiating

$H \equiv \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

with respect to $r$, $\theta$ and $\phi$ we get

$\begin{bmatrix} \frac{\partial H}{\partial r}\\ \ \\ \frac{\partial H}{\partial \theta} \\ \ \\ \frac{\partial H}{\partial \phi}\end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \sin \phi\\ \ \\ \frac{\cos \theta \sin \phi}{r} \\ \ \\ \frac{\cos \phi}{r \sin \theta} \end{bmatrix} + \begin{bmatrix} 0 & -\frac{\cos \theta \sin \phi}{r^2} & -\frac{\cos \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \sin \phi & -\frac{\sin \theta \sin \phi}{r} & -\frac{\cos \phi \cos \theta}{r \sin^2 \theta}\\ \ \\ \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

and differentiating

$I \equiv \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \end{bmatrix}$

with respect to $r$ and $\theta$ we get

$\begin{bmatrix} \frac{\partial I}{\partial r}\\ \ \\ \frac{\partial I}{\partial \theta} \end{bmatrix} = \begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix} \begin{bmatrix} \cos \theta \\ \ \\ -\frac{\sin \theta}{r} \end{bmatrix} + \begin{bmatrix} 0 & \frac{\sin \theta}{r^2}\\ \ \\ -\sin \theta & -\frac{\cos \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \end{bmatrix}$

Therefore we have

$\frac{\partial G}{\partial x} = \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \cos \phi\\ \ \\ \frac{\cos \theta \cos \phi}{r} \\ \ \\ -\frac{\sin \phi}{r \sin \theta} \end{bmatrix}$

$+ \begin{bmatrix} \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} 0 & -\frac{\cos \theta \cos \phi}{r^2} & \frac{\sin \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \cos \phi & -\frac{\sin \theta \cos \phi}{r} & \frac{\sin \phi \cos \theta}{r \sin^2 \theta}\\ \ \\ -\sin \theta \sin \phi & -\frac{\cos \theta \sin \phi}{r} & -\frac{\cos \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

and similarly

$\frac{\partial H}{\partial y} = \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta} & \frac{\partial^2 F}{\partial r \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} & \frac{\partial^2 F}{\partial \theta \partial \phi}\\ \ \\ \frac{\partial^2 F}{\partial \phi \partial r} & \frac{\partial^2 F}{\partial \phi \partial \theta} & \frac{\partial^2 F}{\partial \phi^2}\end{bmatrix} \begin{bmatrix} \sin \theta \sin \phi\\ \ \\ \frac{\cos \theta \sin \phi}{r} \\ \ \\ \frac{\cos \phi}{r \sin \theta} \end{bmatrix}$

$+ \begin{bmatrix} \sin \theta \sin \phi & \frac{\cos \theta \sin \phi}{r} & \frac{\cos \phi}{r \sin \theta}\end{bmatrix}\begin{bmatrix} 0 & -\frac{\cos \theta \sin \phi}{r^2} & -\frac{\cos \phi}{r^2 \sin \theta}\\ \ \\ \cos \theta \sin \phi & -\frac{\sin \theta \sin \phi}{r} & -\frac{\cos \phi \cos \theta}{r \sin^2 \theta}\\ \ \\ \sin \theta \cos \phi & \frac{\cos \theta \cos \phi}{r} & -\frac{\sin \phi}{r \sin \theta}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \\ \ \\ \frac{\partial F}{\partial \phi} \end{bmatrix}$

and

$\frac{\partial I}{\partial z} = \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix}\begin{bmatrix} \frac{\partial^2 F}{\partial r^2} & \frac{\partial^2 F}{\partial r \partial \theta}\\ \ \\ \frac{\partial^2 F}{\partial \theta \partial r} & \frac{\partial^2 F}{\partial \theta^2} \end{bmatrix} \begin{bmatrix} \cos \theta \\ \ \\ -\frac{\sin \theta}{r} \end{bmatrix} + \begin{bmatrix} \cos \theta & -\frac{\sin \theta}{r} \end{bmatrix}\begin{bmatrix} 0 & \frac{\sin \theta}{r^2}\\ \ \\ -\sin \theta & -\frac{\cos \theta}{r}\end{bmatrix} \begin{bmatrix} \frac{\partial F}{\partial r}\\ \ \\ \frac{\partial F}{\partial \theta} \end{bmatrix}$

Laplace’s equation is given by $\frac{\partial G}{\partial x} + \frac{\partial H}{\partial y} + \frac{\partial I}{\partial z} = 0$ so we need to work out the three matrix equations above and sum them. The result will be a sum involving the nine partials $\frac{\partial^2 F}{\partial r^2}$, $\frac{\partial^2 F}{\partial \theta^2}$, $\frac{\partial^2 F}{\partial \phi^2}$, $\frac{\partial^2 F}{\partial r \partial \theta}$, $\frac{\partial^2 F}{\partial r \partial \phi}$, $\frac{\partial^2 F}{\partial \theta \partial \phi}$, $\frac{\partial F}{\partial r}$, $\frac{\partial F}{\partial \theta}$ and $\frac{\partial F}{\partial \phi}$. I found that the most convenient way to work out the sum was by working out the coefficients of each of these nine partials individually in turn. The result is

$\frac{\partial G}{\partial x} + \frac{\partial H}{\partial y} + \frac{\partial I}{\partial z} =$

$\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} =$

$\big \{\sin^2 \theta \cos^2 \phi + \sin^2 \theta \sin^2 \phi + \cos^2 \theta \big \} \frac{\partial^2 F}{\partial r^2}$

$+ \big \{ \frac{\cos^2 \theta \cos^2 \phi}{r^2} + \frac{\cos^2 \theta \sin^2 \phi}{r^2} + \frac{\sin^2 \theta}{r^2} \big \} \frac{\partial^2 F}{\partial \theta^2}$

$+ \big \{ \frac{\sin^2 \phi}{r^2 \sin^2 \theta} + \frac{\cos^2 \phi}{r^2 \sin^2 \theta}\big \} \frac{\partial^2 F}{\partial \phi^2}$

$+ \big \{ \frac{\sin \theta \cos \theta \cos^2 \phi}{r} + \frac{\sin \theta \cos \theta \sin^2 \phi}{r} - \frac{\sin \theta \cos \theta}{r}\big \} \frac{\partial^2 F}{\partial r \partial \theta}$

$+ \big \{ -\frac{\cos \phi \sin \phi}{r} + \frac{\cos \phi \sin \phi}{r} \big \} \frac{\partial^2 F}{\partial r \partial \phi}$

$+ \big \{ -\frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} -\frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} + \frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} + \frac{\cos \theta \cos \phi \sin \phi}{r^2 \sin \theta} \big \} \frac{\partial^2 F}{\partial \theta \partial \phi}$

$+ \big \{ \frac{\cos^2 \theta \cos^2 \phi}{r} + \frac{\sin^2 \phi}{r} + \frac{\cos^2 \theta \sin^2 \phi}{r} + \frac{\cos^2 \phi}{r} + \frac{\sin^2 \theta}{r}\big \} \frac{\partial F}{\partial r}$

$+ \big \{ -\frac{2\sin \theta \cos \theta \cos^2 \phi}{r^2} + \frac{\cos \theta \sin^2 \phi}{r^2 \sin \theta} -\frac{2\sin \theta \cos \theta \sin^2 \phi}{r^2} + \frac{\cos \theta \cos^2 \phi}{r^2 \sin \theta} + \frac{2\cos \theta \sin \theta}{r^2} \big \} \frac{\partial F}{\partial \theta}$

$+ \big \{ \frac{\cos \phi \sin \phi}{r^2} + \frac{2 \cos \phi \sin \phi}{r^2 \sin^2 \theta} - \frac{\cos \phi \sin \phi}{r^2} - \frac{2 \cos \phi \sin \phi}{r^2 \sin^2 \theta} \big \} \frac{\partial F}{\partial \phi}$

which simplifies to

$\frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2} + \frac{\partial^2 F}{\partial z^2} =$

$\frac{\partial^2 F}{\partial r^2} + \frac{1}{r^2} \frac{\partial^2 F}{\partial \theta^2} + \frac{1}{r^2 \sin^2 \theta} \frac{\partial^2 F}{\partial \phi^2} + \frac{2}{r} \frac{\partial F}{\partial r} + \frac{\cos \theta}{r^2 \sin \theta} \frac{\partial F}{\partial \theta}$

$= \frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2}$

Therefore Laplace’s equation in spherical polar coordinates is

$\frac{1}{r^2} \frac{\partial }{\partial r}\big( r^2 \frac{\partial F}{\partial r}\big) + \frac{1}{r^2 \sin \theta}\frac{\partial }{\partial \theta}\big( \sin \theta \frac{\partial F}{\partial \theta} \big) + \frac{1}{r^2 \sin^2 \theta}\frac{\partial^2 F}{\partial \phi^2} = 0$

as before.
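The spherical-polar form just derived can be sanity-checked symbolically against the Cartesian Laplacian. The sketch below (assuming SymPy is available) uses the arbitrary test function $x^2 y + z^3$; nothing about this choice is special, any sufficiently smooth function would do.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
r, th, ph = sp.symbols('r theta phi', positive=True)

G = x**2*y + z**3                                        # arbitrary test function
cart_lap = sp.diff(G, x, 2) + sp.diff(G, y, 2) + sp.diff(G, z, 2)

# Rewrite G in spherical polars and apply the spherical Laplacian
sub = {x: r*sp.sin(th)*sp.cos(ph),
       y: r*sp.sin(th)*sp.sin(ph),
       z: r*sp.cos(th)}
F = G.subs(sub)
sph_lap = (sp.diff(r**2*sp.diff(F, r), r)/r**2
           + sp.diff(sp.sin(th)*sp.diff(F, th), th)/(r**2*sp.sin(th))
           + sp.diff(F, ph, 2)/(r**2*sp.sin(th)**2))

# The two expressions should agree identically
residual = sp.simplify(sph_lap - cart_lap.subs(sub))
```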

# On an integral calculus approach to finding the area of a quadrarc

I recently gave a ‘stretch and challenge’ problem to some of my GCSE maths students which involved finding the area of a quadrarc. A quadrarc, shown shaded in the diagram, is the shape enclosed by four arcs having identical radius but with centres at the four corners of a square. Finding the formula for the area of a quadrarc is in principle just within the reach of a high-attaining GCSE maths student since it can be done using only plane geometry techniques covered in the later stages of a GCSE course. However, it is a challenging problem and although some of my students got quite close none of them managed to solve it fully. The same plane geometry techniques are covered more thoroughly in the foundation year of A-level maths courses so this approach should be more feasible for first-year A-level maths students.

When exploring alternative ways to solve the problem myself, I employed an integral calculus approach that struck me as also being potentially very pedagogically useful for A-level students. It involves a number of techniques from coordinate geometry and integration that are in principle within the reach of such students in the later stages of their studies, so the quadrarc problem is also eminently useful as a ‘stretch and challenge’ revision problem for high-attaining A-level maths students if it is specified that integral calculus is to be used (rather than plane geometry). I have not been able to find any integral calculus solutions to the quadrarc problem in the literature which bring out the potentially useful pedagogical features at A-level, so I want to record my solution in the present note emphasising these features. The reader should try to solve the problem of finding the area of a quadrarc for themselves before reading any further.

In what follows we will always take the main square within which the quadrarc lies to be of side length $r$. I will begin by quickly setting out a plane geometry solution which involves finding the areas of an equilateral triangle, a sector, and a segment (diagrams A, B and C below).

Given that the main square has side length $r$, the area of the equilateral triangle in diagram A is (using the sine rule for area)

$\frac{1}{2} r^2 \sin 60^{\circ} = \frac{\sqrt{3}}{4} r^2$

The area of the sector in diagram B is

$\frac{60^{\circ}}{360^{\circ}} \pi r^2 = \frac{\pi}{6} r^2$

so the area of the segment shaded in diagram C is

$\frac{\pi}{6} r^2 - \frac{\sqrt{3}}{4} r^2$

Finally we obtain the shaded area in diagram D by deducting a triangle and two segments from the area of a quarter circle:

$\frac{\pi}{4} r^2 - \frac{\sqrt{3}}{4} r^2 - 2\big( \frac{\pi}{6} r^2 - \frac{\sqrt{3}}{4} r^2 \big) = \frac{\sqrt{3}}{4} r^2 - \frac{\pi}{12} r^2$

The area of the quadrarc is then the area of the square minus four of the areas in diagram D:

$r^2 - 4 \big( \frac{\sqrt{3}}{4} r^2 - \frac{\pi}{12} r^2 \big)$

$= \big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2$

This same formula for the area of a quadrarc, $\big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2$, can be derived using integral calculus as follows.

Let the origin be at the bottom left corner of the square with increasing x-direction to the right and increasing y-direction upwards. Based on the equation of a circle with centre at $(r, 0)$ and radius $r$, the equation of the arc through $R$ and $P$ is

$y_{RP} = \sqrt{r^2 - (x - r)^2}$

Similarly, based on the equation of a circle with centre at $(r, r)$, the equation of the arc through $R$ and $S$ is

$y_{RS} = r - \sqrt{r^2 - (x - r)^2}$

The x-coordinate of the point $R$ is given by the intersection of $y_{RP}$ and $y_{RS}$, i.e.,

$r - \sqrt{r^2 - (x - r)^2} = \sqrt{r^2 - (x - r)^2}$

$\iff$

$4x^2 - 8xr + r^2 = 0$

$\implies$

$x = r \pm \frac{\sqrt{3}}{2}r$

Therefore the x-coordinate of the point $R$ is

$r \big(1 - \frac{\sqrt{3}}{2} \big)$

Note that we need to discard the other solution, namely $r \big(1 + \frac{\sqrt{3}}{2} \big)$, because it exceeds $r$ and therefore lies outside the main square. By symmetry, the x-coordinate of both the points $P$ and $S$ is $\frac{r}{2}$. Therefore half of the area of the quadrarc should be given by the integral

$\int_{r(1 - \sqrt{3}/2)}^{r/2} (y_{RP} - y_{RS}) dx = \int_{r(1 - \sqrt{3}/2)}^{r/2} (2\sqrt{r^2 - (x - r)^2} - r) dx$

$= 2 \int_{r(1 - \sqrt{3}/2)}^{r/2} \sqrt{r^2 - (x - r)^2} dx - \frac{(\sqrt{3} - 1)}{2}r^2$

From tables of standard integrals we have that

$\int\sqrt{a^2 - z^2} dz = \frac{1}{2} z \sqrt{a^2 - z^2} + \frac{1}{2} a^2 \arctan\big( \frac{z}{\sqrt{a^2 - z^2}}\big)$
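(If this antiderivative is unfamiliar, it can be checked by differentiation; here is a quick sketch, assuming SymPy is available.)

```python
import sympy as sp

a, w = sp.symbols('a w', positive=True)   # w plays the role of z in the text

antiderivative = (w*sp.sqrt(a**2 - w**2)/2
                  + a**2*sp.atan(w/sp.sqrt(a**2 - w**2))/2)

# Differentiating should recover the integrand sqrt(a^2 - w^2)
residual = sp.simplify(sp.diff(antiderivative, w) - sp.sqrt(a**2 - w**2))
```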

Therefore let’s employ the change of variable $z = x - r$. Then $dz = dx$ and we have $z = -\frac{\sqrt{3}}{2}r$ when $x = r\big(1 - \frac{\sqrt{3}}{2}\big)$ whereas $z = -\frac{r}{2}$ when $x = \frac{r}{2}$. Our integral becomes

$2 \int_{- (\sqrt{3}/2)r}^{-r/2} \sqrt{r^2 - z^2} dz = \bigg[ z\sqrt{r^2 - z^2} + r^2 \arctan\big( \frac{z}{\sqrt{r^2 - z^2}}\big)\bigg]_{- (\sqrt{3}/2)r}^{-r/2}$

$= -\frac{r}{2}\sqrt{r^2 - \frac{r^2}{4}} + r^2 \arctan\bigg( \frac{-\frac{r}{2}}{\sqrt{r^2 - \frac{r^2}{4}}}\bigg) + \frac{\sqrt{3}}{2}r \sqrt{r^2 - \frac{3r^2}{4}} + r^2 \arctan\bigg( \frac{\frac{\sqrt{3}r}{2}}{\sqrt{r^2 - \frac{3r^2}{4}}}\bigg)$

$= r^2 \bigg\{ \arctan(\sqrt{3}) - \arctan\big( \frac{1}{\sqrt{3}}\big) \bigg\}$

$= \big\{ \frac{\pi}{3} - \frac{\pi}{6}\big\} r^2 = \frac{\pi}{6}r^2$

Therefore half the area of the quadrarc is

$\int_{r(1 - \sqrt{3}/2)}^{r/2} (y_{RP} - y_{RS}) dx$

$= 2 \int_{r(1 - \sqrt{3}/2)}^{r/2} \sqrt{r^2 - (x - r)^2} dx - \frac{(\sqrt{3} - 1)}{2}r^2$

$= \bigg(\frac{\pi}{6} - \frac{(\sqrt{3} - 1)}{2}\bigg)r^2$

The full area of the quadrarc is then

$\big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2$

as before.
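The closed-form area can also be cross-checked purely numerically. The sketch below (standard library only, taking $r = 1$) applies a midpoint-rule quadrature to the integrand derived above and compares the result against $\big( \frac{\pi}{3} - (\sqrt{3} - 1) \big) r^2$.

```python
import math

r = 1.0
a, b = r*(1 - math.sqrt(3)/2), r/2        # x-limits for half the quadrarc

def integrand(x):
    # y_RP - y_RS = 2*sqrt(r^2 - (x - r)^2) - r
    return 2*math.sqrt(r**2 - (x - r)**2) - r

# Midpoint rule on a fine grid
n = 100_000
h = (b - a)/n
half_area = h*sum(integrand(a + (i + 0.5)*h) for i in range(n))
numeric_area = 2*half_area

closed_form = (math.pi/3 - (math.sqrt(3) - 1))*r**2
```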

# A note on Bragg’s law for X-ray scattering

In 1912, at the age of only 22, William L. Bragg wrote a paper which essentially began the field now known as X-ray crystallography. Earlier that year, Max von Laue and others had passed X-rays through crystals of copper sulphate and zinc sulphide, obtaining patterns of dots on a photographic film. These patterns were the first evidence that X-rays could be diffracted by the regularly arranged atoms in crystals. The X-ray diffraction pattern produced by zinc sulphide was particularly sharp and clear, and seemed to show a four-fold symmetry.

Laue attempted to explain the patterns of dots mathematically but was not completely successful. He had tried to set up the maths based on overly complicated and partially incorrect models of what was going on at the atomic level. At this time the young William L. Bragg was just finishing a mathematics degree at Cambridge University. He heard about Laue’s work from his father, William H. Bragg, who was a mathematician and physicist himself and had been working on X-rays at the University of Leeds. When thinking about the problem, the young William Bragg envisaged a much simpler model than Laue’s in which the planes in the crystal lattice acted like mirrors. The incoming X-ray beams, interpreted as electromagnetic waves with a particular wavelength $\lambda$, would be reflected by the planes in the crystal lattice. The outgoing beams would then interfere with one another to give either constructive or destructive interference, depending on whether the crests of the waves were in phase or out of phase. The diagram below shows the standard set-up for deriving Bragg’s law that appears in every textbook and article on this topic.

The incident beams are at an angle $\theta$ to the planes in the lattice and are reflected at the same angle. The extra distance travelled by the lower beam is the sum of the lengths of the two black arrows in the diagram. Elementary trigonometry shows that the length of each black arrow is $\text{d} \sin \theta$, so the extra distance travelled by the lower beam is $2 \text{d} \sin \theta$. A black dot is produced on the photographic film only if the extra distance travelled by the lower beam is a whole number of wavelengths, since in this case there will be constructive interference of the outgoing waves and the reinforced beam will register strongly. If the extra distance travelled by the lower beam is not exactly equal to a whole number of wavelengths, there will be some X-ray beam nearby that will interfere destructively with it and they will cancel each other out. For example, the diagram below shows how the beams cancel each other out when the extra distance travelled by the lower beam is an odd multiple of half a wavelength. In this case, the scattering of the X-ray beams will not be detectable and no black spot will appear in the corresponding position on the photographic film.

It follows that it is possible to calculate the positions of the black spots on the photographic film using the equation

$2 \text{d} \sin \theta = \text{n} \lambda$

where n is an integer. This is Bragg’s law. A black spot appears on the photographic film only at those positions where the three quantities d, $\theta$ and $\lambda$ simultaneously satisfy Bragg’s law; otherwise no diffraction spot will appear on the film. This is the law that the young William Bragg published in his 1912 paper. Together, the two William Braggs continued to investigate crystal structures using X-ray diffraction, and father and son were jointly awarded the Nobel Prize in Physics for this work in 1915. The young William Bragg was only twenty-five and remains the youngest ever Nobel laureate in the sciences.
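Bragg’s law is straightforward to apply computationally. The sketch below solves $2d\sin\theta = n\lambda$ for the scattering angle; the lattice spacing and wavelength used are illustrative assumptions (typical diffraction-scale values), not figures from the text.

```python
import math

def bragg_angle_deg(d, wavelength, n=1):
    """Return theta (degrees) satisfying 2*d*sin(theta) = n*wavelength,
    or None when that order of diffraction cannot occur."""
    s = n*wavelength/(2*d)
    if s > 1:
        return None          # sin(theta) cannot exceed 1
    return math.degrees(math.asin(s))

# Assumed illustrative values: a 0.282 nm lattice spacing and a
# 0.154 nm X-ray wavelength.
d, lam = 0.282e-9, 0.154e-9
theta1 = bragg_angle_deg(d, lam, n=1)     # first-order reflection angle
```

Note that higher orders $n$ eventually fail to produce a spot at all, since $\sin\theta$ cannot exceed 1.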

This is a lovely story and the standard textbook set-up above for deriving Bragg’s law seems straightforward enough. However, a bright A-level student of mine expressed dissatisfaction with it and asked if there is some alternative approach to deriving Bragg’s law that involves an explicit subtraction operation, i.e., an approach showing that a specific distance travelled by the lower X-ray beam minus a specific distance travelled by the upper X-ray beam equals $2 \text{d} \sin \theta$. I felt this was an interesting question but when we tried to find an alternative approach in the literature we noticed that every textbook and every source online that we looked at had exactly this same set-up, or something very similar to it that the student also did not like! We eventually managed to come up with a calculation of the kind the student wanted to see by varying the standard textbook set-up slightly, and this is what I want to record in the present note.

We altered the standard set-up by moving the two short perpendiculars to the X-ray beams symmetrically away from the point of incidence of the upper beam, as indicated in the following diagram.

A convenient place to put the blue perpendiculars is at the points of intersection of the lower X-ray beam with the upper crystal plane, which is what I have done in the diagram above. This then allows the type of explicit subtraction operation to be carried out that the student wanted to see. All we needed to do was to demonstrate that the distance shown in red in the diagram below, minus the distance shown in green, exactly equals $2 \text{d} \sin \theta$.

From the diagram we see immediately that

$\text{h} = \frac{\text{d}}{\sin \theta}$

$\frac{\text{j}}{\text{k}} = \cos \theta$

$\frac{\text{d}}{\text{k}} = \tan \theta$

Therefore

$\text{j} = \text{k} \cos \theta = \frac{\text{d}}{\tan \theta} \cos \theta$

and the desired subtraction operation is then

$2 \text{h} - 2 \text{j} = \frac{2 \text{d}}{\sin \theta} - \frac{2 \text{d}}{\tan \theta} \cos \theta = 2 \text{d} \bigg( \frac{1}{\sin \theta} - \frac{\cos^2 \theta}{\sin \theta} \bigg) = 2 \text{d} \sin \theta$

as required. One can now imagine the blue perpendiculars sliding back together towards the centre of the diagram. The difference between the red and the green distances will always remain equal to $2 \text{d} \sin \theta$ as the blue perpendiculars slide inwards but the green distance will eventually shrink to zero. We will then have arrived back at the standard textbook set-up and the extra distance travelled by the lower X-ray beam will (of course) remain $2 \text{d} \sin \theta$.
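The subtraction identity above reduces to $2d\sin\theta$ by the Pythagorean identity, and a computer algebra system confirms this directly (a sketch, assuming SymPy is available):

```python
import sympy as sp

d, th = sp.symbols('d theta', positive=True)

h = d/sp.sin(th)                 # the red distance is 2h
k = d/sp.tan(th)
j = k*sp.cos(th)                 # the green distance is 2j

# Difference between the red and green distances
path_difference = sp.simplify(2*h - 2*j)
```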

# On passing from the discrete to the continuous form of Slutsky’s identity in microeconomics

I have noticed that there is a sharp ‘jump’ in the literature concerning Slutsky’s decomposition equation for the effects of a price change on demand. Elementary discussions usually employ a discrete version of Slutsky’s equation which assumes a relatively large price change. This is typically illustrated in a ‘pivot-shift’ type diagram which is familiar to all undergraduate microeconomics students. In more advanced (typically postgraduate) treatments, however, the discussion suddenly jumps to using a full-blown partial differential equation form of Slutsky’s identity assuming an infinitesimal price change. The partial differential equation form is usually expressed in terms of a Hicksian demand function. I have not been able to find any appealing discussions of how the set-up underlying the discrete form of Slutsky’s equation naturally evolves into the partial differential equation form as we pass from a large price change to the limit of an infinitesimally small one, so in this note I want to explore that passage to the limit.

Slutsky’s decomposition equation expresses the effects of a price change on Marshallian demand in terms of a pure substitution effect, which is always negative, and an income effect, which can be positive or negative depending on whether the good is a normal good or an inferior good respectively. I will employ a simple two-good numerical example to illustrate the discrete form of Slutsky’s equation, using a Cobb-Douglas utility function of the form

$u(x, y) = x^{1/2}y^{1/2}$

The mathematical problem is to find the combination of $x$ and $y$ which maximises this utility function subject to the budget constraint

$p_1x + p_2y = m$

The Cobb-Douglas utility function is globally concave and smooth so we are guaranteed to find a unique interior solution by partial differentiation. One normally proceeds by taking the natural logarithm of the utility function (this is a monotonic transformation so does not affect the preferences represented by the original utility function) and setting up the Lagrangian for the problem, namely

$L = \frac{1}{2}\ln x + \frac{1}{2}\ln y + \lambda (m - p_1x -p_2y)$

Taking first-order partial derivatives with respect to $x$, $y$ and $\lambda$ and setting them equal to zero we get

$\frac{\partial L}{\partial x} = \frac{1}{2x} - \lambda p_1 = 0$

$\frac{\partial L}{\partial y} = \frac{1}{2y} - \lambda p_2 = 0$

$\frac{\partial L}{\partial \lambda} = m - p_1x - p_2y = 0$

This is a system of three equations in three unknowns. Dividing the first equation by the second and rearranging one obtains

$\frac{y}{x} = \frac{p_1}{p_2}$

Solving the third equation for $x$ we get

$x = \frac{m}{p_1} - \frac{p_2}{p_1}y$

and substituting this into the equation above we get

$\frac{y}{\frac{m}{p_1} - \frac{p_2}{p_1}y} = \frac{p_1}{p_2}$

$\iff$

$y = \frac{m}{2p_2}$

This is the uncompensated demand function (often also called the Marshallian demand function) for good $y$. By symmetry, the uncompensated demand function for good $x$ is

$x = \frac{m}{2p_1}$

(Note that rearranging the demand function for $y$ we get $p_2y = \frac{m}{2}$ which says that the consumer will spend exactly half of the income on $y$, and similarly for $x$. Whenever the Cobb-Douglas utility function is in a form in which the exponents on the goods are fractions which sum to $1$, these fractions tell us the proportions of income which will be spent on the corresponding goods. Our utility function was of the form $u(x, y) = x^{1/2}y^{1/2}$ so one-half of the total income is spent on each good, as we confirmed with the above calculation).
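The first-order conditions above can also be solved mechanically. A sketch with SymPy (assumed available) that recovers both Marshallian demand functions from the Lagrangian:

```python
import sympy as sp

x, y, lam, m, p1, p2 = sp.symbols('x y lam m p_1 p_2', positive=True)

# Lagrangian of the log-transformed Cobb-Douglas problem
L = sp.log(x)/2 + sp.log(y)/2 + lam*(m - p1*x - p2*y)

foc = [sp.diff(L, v) for v in (x, y, lam)]   # first-order conditions
sol = sp.solve(foc, [x, y, lam], dict=True)[0]

demand_x, demand_y = sol[x], sol[y]          # Marshallian demands
```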

To illustrate Slutsky’s decomposition of the effects of a price change into a pure substitution effect and an income effect, consider the above uncompensated demand function for $x$ and suppose that $m = \pounds 1000$ while $p_1 = \pounds 10$. The amount of $x$ demanded at this income and price is then

$x(p_1, m) = \frac{1000}{(2)(10)}= 50$

This corresponds to the amount of $x$ in the bundle A in the diagram below.

Now suppose that the price rises to $p_1^{*} = \pounds 20$.

The amount of $x$ demanded at the original income and this new price falls to

$x(p_1^{*}, m) = \frac{1000}{(2)(20)} = 25$

This corresponds to the amount of $x$ in the bundle C in the diagram.

Slutsky’s decomposition of this total change in demand begins by asking what change in income would be enough to enable the consumer to buy the original amount of $x$ at the new price. This amount of additional income is obtained as

$p_1^{*}x - p_1x = (20)(50) - (10)(50) = \pounds 500$

Therefore, ‘compensating’ the consumer by increasing the income level from $m = \pounds 1000$ to $m^{*} = \pounds 1500$ enables them to buy their original bundle A with $x = 50$. This increase in the income level corresponds to a shift outwards in the new budget line to a position represented by the blue budget line in the diagram.

In the sense that the original bundle A is affordable again (so purchasing power has remained constant), the consumer is now as well off as before, but the original bundle A is no longer the utility-maximising one at the new price and the higher income level. The consumer will want to adjust the bundle until the utility function is maximised at the new price and new income. The amount of $x$ the consumer will actually demand at the new price and new income level will be

$x(p_1^{*}, m^{*}) = \frac{1500}{(2)(20)} = 37.5$

This corresponds to the amount of $x$ in the bundle B in the diagram above, and is usually referred to in the literature as the compensated demand for $x$ (as opposed to the uncompensated demand at point A). The pure substitution effect of the price rise (i.e., abstracting from the income effect) is then the change in demand for $x$ when the price of $x$ changes to $p_1^{*}$ and at the same time the income level changes to $m^{*}$ to keep the consumer’s purchasing power constant:

$x(p_1^{*}, m^{*}) - x(p_1, m) = 37.5 - 50 = -12.5$

This is the change in the amount of $x$ represented by the shift from bundle A to bundle B in the diagram above.

In this numerical example, the pure substitution effect of the price rise accounts for exactly half of the total drop in the demand for $x$ from 50 at point A to 25 at point C. The other half of the drop in the demand for $x$ is accounted for by the income effect (sometimes called the ‘wealth’ effect) of the price rise, which is represented in the diagram above by a parallel shift inwards of the blue budget line to the position of the final budget line on which bundle C lies. This is the change in demand for $x$ when we change the income level from $m^{*}$ back to $m$, holding the price of $x$ fixed at $p_1^{*}$. Thus, the income effect is computed as

$x(p_1^{*}, m) - x(p_1^{*}, m^{*}) = 25 - 37.5 = -12.5$

The substitution effect plus the income effect together account for the full drop in the demand for $x$ as a result of moving from bundle A to bundle C in response to the price rise of $x$.

In this simple numerical example Slutsky’s decomposition equation takes the discrete form

$x(p_1^{*}, m) - x(p_1, m) = \big \{ x(p_1^{*}, m^{*}) - x(p_1, m) \big \} + \big\{ x(p_1^{*} , m) - x(p_1^{*}, m^{*}) \big \}$
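The whole numerical decomposition above can be reproduced in a few lines; a sketch in Python:

```python
def demand_x(p1, m):
    # Marshallian demand for x under u = x^(1/2) * y^(1/2)
    return m/(2*p1)

p1, p1_star, m = 10.0, 20.0, 1000.0

x_A = demand_x(p1, m)                         # bundle A
m_star = m + (p1_star - p1)*x_A               # Slutsky-compensated income
substitution = demand_x(p1_star, m_star) - demand_x(p1, m)
income = demand_x(p1_star, m) - demand_x(p1_star, m_star)
total = demand_x(p1_star, m) - demand_x(p1, m)
```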

As we pass to the limit of an infinitesimally small price change, however, Slutsky’s decomposition equation takes the form of a partial differential equation which can be derived quite naturally from the discrete form by the following arguments. Suppose we start at point A in the above diagram again, and suppose the price changes by an infinitesimal amount $\delta$, i.e.,

$p_1 \longrightarrow p_1 + \delta$

The (infinitesimal) amount of income needed to compensate the consumer so that the bundle at A remains affordable after the price change is

$(p_1 + \delta ) x - p_1 x = \delta x$

We can then rewrite the discrete form of Slutsky’s equation as

$x(p_1 + \delta, \ m) - x(p_1, \ m) =$

$\big \{ x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m) \big \} + \big \{x(p_1 + \delta, \ m) - x(p_1 + \delta, \ m + \delta x) \big \}$

$\iff$

$x(p_1 + \delta, \ m) - x(p_1, \ m) =$

$\big \{ x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m) \big \} - \big \{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m) \big \}$

Dividing through by $\delta$ and taking the limit as $\delta \rightarrow 0$ we get

$\frac{\partial x}{\partial p_1} = \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} - \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta}$

The second term on the right-hand side, the income effect, can be written as

$\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta} = x \lim_{\delta x \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1 + \delta, \ m)}{\delta x} = x \frac{\partial x}{\partial m}$
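This limit can be illustrated with a finite-difference computation. The sketch below again assumes the Cobb-Douglas demand $x = m/2p_1$ with the (assumed) values $p_1 = 1$ and $m = 100$, for which $x \frac{\partial x}{\partial m} = 50 \times \frac{1}{2} = 25$:

```python
# Finite-difference illustration that the income-effect term tends to
# x * dx/dm, assuming the Cobb-Douglas demand x(p1, m) = m / (2 * p1)
# and the (assumed) values p1 = 1, m = 100, so x = 50 and dx/dm = 0.5.
def x(p1, m):
    return m / (2 * p1)

p1, m = 1.0, 100.0
x0 = x(p1, m)        # demand at the starting point: 50.0
delta = 1e-6         # small price change standing in for the limit

income_term = (x(p1 + delta, m + delta * x0) - x(p1 + delta, m)) / delta
assert abs(income_term - x0 * 0.5) < 1e-3    # approaches x * dx/dm = 25
```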

Our partial differential equation form of Slutsky’s identity has then so far evolved to

$\frac{\partial x}{\partial p_1} = \lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} - x \frac{\partial x}{\partial m}$

Now consider the first term on the right-hand side, namely

$\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta}$

This is the effect of the (infinitesimal) price change when we eliminate the income effect by changing the income to $m + \delta x$, i.e., it is the pure price substitution effect. What we are going to do is replace the two terms in the numerator by corresponding Hicksian demand functions, usually denoted by the letter $h$. Consider first the original demand function $x(p_1, \ m)$ at point A in the diagram above. This was obtained by maximising utility subject to the price $p_1$ and income level $m$. However, we can also regard it as having been obtained by solving the dual problem of minimising the expenditure required to achieve the level of utility $u(x^A, y^A)$ associated with the bundle A in the diagram above, given the price $p_1$. We would find that the minimised expenditure level would be

$E(p_1, \ u(x^A, y^A)) = m$

and the amount of $x$ demanded would be the same as the original amount:

$x(p_1, \ m) = h(p_1, \ E(p_1, \ u(x^A, y^A)))$

Now consider the problem of minimising the expenditure required to achieve the level of utility $u(x^A, y^A)$ given the changed price $p_1 + \delta$. The minimised expenditure level would be

$E(p_1 + \delta, \ u(x^A, y^A))$

Now, it is clearly the case that this minimised expenditure is `sandwiched’ between $m$ and $m + \delta x$, i.e.,

$m < E(p_1 + \delta, \ u(x^A, y^A)) < m + \delta x$

(the upper bound holds because bundle A still attains the utility level $u(x^A, y^A)$ and costs exactly $m + \delta x$ at the new price, while the lower bound holds because the expenditure needed to reach a given utility level rises with the price of a good consumed in positive quantity), so the differences between the three quantities in the inequality become vanishingly small as $\delta \rightarrow 0$. Therefore, as long as we are taking limits as $\delta \rightarrow 0$, we can replace $x(p_1 + \delta, \ m + \delta x)$ with the Hicksian demand function

$h(p_1 + \delta, \ E(p_1 + \delta, \ u(x^A, y^A)))$

(What I am basically saying here is that as $\delta \rightarrow 0$, we can replace the point B in the diagram above with a point that is on the same indifference curve as the original bundle at point A. In this case the consumer would be compensated not by returning them to the original purchasing power after the price change, but to the original level of utility they were experiencing. As $\delta$ becomes infinitesimally small, it makes no difference which of these two points we think about for the pure substitution effect. When the price change is large, these two points diverge and we then have a distinction between a Slutsky substitution effect which involves restoring the original purchasing power, and a Hicksian substitution effect which involves restoring the original utility level. The distinction between these two disappears when the price change is infinitesimal). We then have

$\lim_{\delta \to 0} \frac{x(p_1 + \delta, \ m + \delta x) - x(p_1, \ m)}{\delta} = \lim_{\delta \to 0} \frac{h(p_1 + \delta, \ E(p_1 + \delta, \ u(x^A, y^A))) - h(p_1, E(p_1, \ u(x^A, y^A)))}{\delta}$

$= \frac{\partial h}{\partial p_1}$

We now have the final partial differential equation form of Slutsky’s decomposition equation in the way it is usually written using the Hicksian demand function:

$\frac{\partial x}{\partial p_1} = \frac{\partial h}{\partial p_1} - x \frac{\partial x}{\partial m}$

As before, this says that the total effect of a price change is composed of a pure substitution effect (with income adjusted to exactly compensate the consumer for the wealth effect of the price change) and an income effect.

To check that this partial differential equation works in the context of our Cobb-Douglas example above, we can compute the partial derivatives explicitly. Since the demand function with the above Cobb-Douglas preferences would be

$x = \frac{m}{2p_1}$

we have

$\frac{\partial x}{\partial p_1} = - \frac{m}{2(p_1)^2}$

By solving a simple expenditure minimisation problem it can easily be shown that

$h = \frac{(p_2)^{1/2}}{(p_1)^{1/2}} u$

(The expenditure minimisation problem would be to minimise $E = p_1 x + p_2 y$ subject to $x^{1/2} y^{1/2} = u$. Solving the constraint equation for $y$ and substituting into the objective function gives an unconstrained minimisation problem in the variable $x$ only. Solving this yields the above expression for the Hicksian demand function $h$).
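The expenditure minimisation described in this parenthetical note can be sanity-checked numerically. The sketch below substitutes $y = u^2/x$ into the objective and verifies that the claimed minimiser $x = \sqrt{p_2/p_1}\, u$ does no worse than nearby candidate points (the values $p_1 = 1$, $p_2 = 2$, $u = 3$ are arbitrary assumed inputs):

```python
import math

# Check that the claimed Hicksian demand h = sqrt(p2/p1) * u minimises
# E(x) = p1*x + p2*u**2/x, the expenditure after substituting y = u**2/x
# from the utility constraint x**0.5 * y**0.5 = u.
# The values p1 = 1, p2 = 2, u = 3 are arbitrary assumed inputs.
p1, p2, u = 1.0, 2.0, 3.0

def E(x):
    return p1 * x + p2 * u ** 2 / x

h = math.sqrt(p2 / p1) * u   # claimed minimiser

# Expenditure at the claimed minimiser is no larger than at nearby points
for factor in (0.5, 0.9, 0.99, 1.01, 1.1, 2.0):
    assert E(h) <= E(h * factor)
```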

Therefore we also have

$\frac{\partial h}{\partial p_1} = -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1)^{3/2}} u$

$= -\frac{1}{2}\frac{(p_2)^{1/2}}{(p_1)^{3/2}}\big(\frac{m}{2p_1}\big)^{1/2}\big(\frac{m}{2p_2}\big)^{1/2}$

$= -\frac{m}{4(p_1)^2}$

and

$-\frac{\partial x}{\partial m}\cdot x = -\frac{1}{2p_1}\frac{m}{2p_1} = -\frac{m}{4(p_1)^2}$

Putting these into the partial differential equation form of Slutsky’s equation, we see that it is satisfied: $-\frac{m}{2(p_1)^2} = -\frac{m}{4(p_1)^2} - \frac{m}{4(p_1)^2}$.
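As a final cross-check (a sketch only, with the values $p_1 = 2$, $p_2 = 3$, $m = 120$ assumed for illustration), the three partial derivatives in Slutsky’s equation can be approximated by central finite differences and compared:

```python
import math

# Finite-difference check of dx/dp1 = dh/dp1 - x * dx/dm for the
# Cobb-Douglas demands x = m/(2*p1), y = m/(2*p2), h = sqrt(p2/p1)*u.
# The values p1 = 2, p2 = 3, m = 120 are arbitrary assumed inputs.
p1, p2, m = 2.0, 3.0, 120.0
d = 1e-6   # step size for central differences

def x(p1, m):
    return m / (2 * p1)

# Utility achieved at the chosen optimum, u = x**0.5 * y**0.5
u = math.sqrt(x(p1, m)) * math.sqrt(m / (2 * p2))

def h(p1):
    return math.sqrt(p2 / p1) * u

dx_dp1 = (x(p1 + d, m) - x(p1 - d, m)) / (2 * d)    # total effect:        -15
dh_dp1 = (h(p1 + d) - h(p1 - d)) / (2 * d)          # substitution effect: -7.5
dx_dm  = (x(p1, m + d) - x(p1, m - d)) / (2 * d)    # dx/dm = 1/(2*p1) = 0.25

assert abs(dx_dp1 - (dh_dp1 - x(p1, m) * dx_dm)) < 1e-4
```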